List: Cluster
From: Mikael Ronström
Date: August 27 2004 9:22am
Subject: Re: Questions about fault tolerance.
Hi Chris,

On 2004-08-26 at 23:07, IHLING, CHRIS G (CHRIS) wrote:

> Hi All,
>
> I have successfully set up a two node replicated database 
> cluster. The problem I am having happens when I power down one node. 
> If I power down computer 1, the db on computer 2 shuts down. Also if I 
> pull the network cable from one of the nodes the db on computer 2 
> shuts down. In this scenario the db on computer 1 continues to work 
> and I can make updates. The updates get replicated after I reconnect 
> the cable and restart the db node. But I cannot restart the db on 
> computer 2 until the cable is reconnected. I have turned on all the DB 
> tracing in the config file but I don't get any messages saying what is 
> going on. It just shuts down. I will attach my config.ini file. But it 
> looks right to me. Is this supposed to work like this to avoid updates 
> being made on both sides?  If so scenario one doesn't make sense. Does 
> anyone else have these issues?
>

My best guess is that your arbitrator is on the first computer. The DB 
node on computer 2 shuts down because it is not part of a majority of 
the nodes. If that guess is right, powering down computer 1 takes the 
arbitrator down with it, and pulling the cable cuts computer 2 off from 
the arbitrator, so in both cases the node on computer 2 stops.
To survive a computer failure such as a power loss you need at least 3 
computers. With only 2 computers a surviving node has no way of knowing 
whether the other node actually died or the network was partitioned. 
Our solution to this is the arbitrator. We recommend that the 
arbitrator be the management node (configured using ArbitrationRank). 
The arbitrator's only responsibility is to vote when a node failure 
leaves only half of the DB nodes available; apart from that it does 
nothing.

So if you have access to a third machine (it obviously doesn't need a 
lot of CPU or memory), place the management server there, set its 
ArbitrationRank to 1, and set ArbitrationRank to 0 for all other API 
and MGM nodes.
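
As a rough, untested sketch of that change, the relevant sections of 
the config.ini quoted below might look something like this. The third 
[COMPUTER] entry and its host name lpsdev3-n3 are hypothetical; the 
node ids follow the quoted file, with MGM node 1 moved to the new 
machine. The [DB] sections and the defaults would stay as they are.

```
# Hypothetical third machine (host name is an assumption)
[COMPUTER]
Id:3
HostName: lpsdev3-n3

# Management server on the third machine acts as arbitrator
[MGM]
Id:1
ExecuteOnComputer:3
ArbitrationRank: 1

# Every other MGM and API node is excluded from arbitration
[MGM]
Id:2
ExecuteOnComputer:2
ArbitrationRank: 0

[API]
Id:5
ExecuteOnComputer:1
ArbitrationRank: 0

[API]
Id:6
ExecuteOnComputer:2
ArbitrationRank: 0
```

With a layout like this, losing either of the two DB computers should 
leave the surviving DB node able to reach the arbitrator on the third 
machine.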

Rgrds Mikael

>
>
> Thanks,
>
> Chris Ihling
>
> Lucent Technologies, Bell Labs Innovations.
>
>
>
> [COMPUTER DEFAULT]
>
> ByteOrder: Little
>
>
>
> [MGM DEFAULT]
>
> PortNumber: 2200
>
>
>
> [DB DEFAULT]
>
> NoOfReplicas: 2
>
> FileSystemPath: /opt/lps/current/mysql/data
>
> TimeBetweenWatchdogCheck: 30000
>
>
>
> [COMPUTER]
>
> Id:1
>
> HostName: lpsdev3-n1
>
>
>
> [COMPUTER]
>
> Id:2
>
> HostName: lpsdev3-n2
>
>
>
> [MGM]
>
> Id:1
>
> ExecuteOnComputer:1
>
>
>
> [MGM]
>
> Id:2
>
> ExecuteOnComputer:2
>
>
>
> [DB]
>
> Id:3
>
> ExecuteOnComputer:1
>
>
>
> [DB]
>
> Id:4
>
> ExecuteOnComputer:2
>
>
>
> [API]
>
> Id:5
>
> ExecuteOnComputer:1
>
>
>
> [API]
>
> Id:6
>
> ExecuteOnComputer:2
>
>
Mikael Ronström, Senior Software Architect
MySQL AB, www.mysql.com

Clustering:
http://www.infoworld.com/article/04/04/14/HNmysqlcluster_1.html

http://www.eweek.com/article2/0,1759,1567546,00.asp

