From: Andrew Morgan
Date: April 23 2012 5:41pm
Subject: RE: Cluster crash after datanode shutdown
List-Archive: http://lists.mysql.com/cluster/8307
Message-Id: <988dc50a-8783-4221-ad99-570a9b50b4b1@default>

Hi Antonio,

For now, just retry with the increased arbitration timeout, and then check the logs if it still crashes.

Andrew.

> -----Original Message-----
> From: Antonio Modesto [mailto:modesto@stripped]
> Sent: 23 April 2012 18:12
> To: Andrew Morgan
> Cc: MySQL-Cluster Lists
> Subject: RE: Cluster crash after datanode shutdown
>
> Do you want me to enable some extra debug and/or info?
>
> On Mon, 2012-04-23 at 10:05 -0700, Andrew Morgan wrote:
> > Hi Antonio,
> >
> > The next thing I'd check is the log/error files on the management node
> > and on the data node that you weren't trying to shut down.
> >
> > I notice that you have ArbitrationTimeout set to 10 milliseconds (the
> > default is 7500; I'd try increasing it).
> >
> > Regards, Andrew.
> >
> > From: Antonio Modesto [mailto:modesto@stripped]
> > Sent: 23 April 2012 17:13
> > To: Andrew Morgan
> > Cc: MySQL-Cluster Lists
> > Subject: RE: Cluster crash after datanode shutdown
> >
> > Hi, here is my config.ini:
> >
> > [NDBD DEFAULT]
> > NoOfReplicas: 2
> > DataDir: /usr/local/mysql/data
> > DataMemory: 6000M
> > IndexMemory: 2000M
> > StringMemory: 5
> > MaxNoOfConcurrentTransactions: 4096
> > MaxNoOfConcurrentOperations: 100000
> > MaxNoOfLocalOperations: 100000
> > MaxNoOfConcurrentIndexOperations: 8192
> > MaxNoOfFiredTriggers: 4000
> > TransactionBufferMemory: 1M
> > MaxNoOfConcurrentScans: 300
> > MaxNoOfLocalScans: 32
> > BatchSizePerLocalScan: 64
> > LongMessageBuffer: 1M
> > NoOfFragmentLogFiles: 300
> > FragmentLogFileSize: 16M
> > MaxNoOfOpenFiles: 40
> > InitialNoOfOpenFiles: 27
> > MaxNoOfSavedMessages: 25
> > MaxNoOfAttributes: 1500
> > MaxNoOfTables: 400
> > MaxNoOfOrderedIndexes: 200
> > MaxNoOfUniqueHashIndexes: 200
> > MaxNoOfTriggers: 770
> > LockPagesInMainMemory: 0
> > StopOnError: 1
> > Diskless: 0
> > ODirect: 0
> > TimeBetweenWatchDogCheck: 6000
> > TimeBetweenWatchDogCheckInitial: 6000
> > StartPartialTimeout: 30000
> > StartPartitionedTimeout: 60000
> > StartFailureTimeout: 1000000
> > HeartbeatIntervalDbDb: 2000
> > HeartbeatIntervalDbApi: 3000
> > TimeBetweenLocalCheckpoints: 20
> > TimeBetweenGlobalCheckpoints: 2000
> > TransactionInactiveTimeout: 0
> > TransactionDeadlockDetectionTimeout: 1200
> > DiskSyncSize: 4M
> > DiskCheckpointSpeed: 10M
> > DiskCheckpointSpeedInRestart: 100M
> > ArbitrationTimeout: 10
> > UndoIndexBuffer: 2M
> > UndoDataBuffer: 1M
> > RedoBuffer: 32M
> > LogLevelStartup: 15
> > LogLevelShutdown: 3
> > LogLevelStatistic: 0
> > LogLevelCheckpoint: 0
> > LogLevelNodeRestart: 0
> > LogLevelConnection: 0
> > LogLevelError: 15
> > LogLevelCongestion: 0
> > LogLevelInfo: 3
> > MemReportFrequency: 0
> > BackupDataBufferSize: 2M
> > BackupLogBufferSize: 2M
> > BackupMemory: 64M
> > BackupWriteSize: 32K
> > BackupMaxWriteSize: 256K
> > [MGM DEFAULT]
> > PortNumber: 1186
> > DataDir: /usr/local/mysql/mysql-cluster
> > [TCP DEFAULT]
> > SendBufferMemory: 2M
> > [NDB_MGMD]
> > NodeId: 1
> > HostName: 192.168.0.7
> > [NDBD]
> > NodeId: 2
> > HostName: 192.168.0.30
> > [NDBD]
> > NodeId: 3
> > HostName: 192.168.0.31
> > [API]
> > NodeId: 4
> > HostName: 192.168.0.30
> > [API]
> > NodeId: 5
> > HostName: 192.168.0.31
> >
> > On Mon, 2012-04-23 at 08:59 -0700, Andrew Morgan wrote:
> >
> > Hi Antonio,
> >
> > Could you please share your config.ini file? I'd like to check that your
> > ndb_mgmd process is not running on the same host as one of your data
> > nodes.
> >
> > Setting StopOnError to FALSE will tell the data node's agent to restart an
> > ndbd process if it is killed.
> >
> > Regards, Andrew.
> >
> > -----Original Message-----
> > From: Antonio Modesto [mailto:modesto@stripped]
> > Sent: 23 April 2012 15:44
> > To: MySQL-Cluster Lists
> > Subject: Cluster crash after datanode shutdown
> >
> > Hi,
> >
> > I am setting up a MySQL Cluster with 2 data nodes and 1 management node. I
> > was testing its reliability by shutting down one node and running some
> > queries against the surviving one. With a small database it worked well:
> > whichever data node I turned off, the cluster kept running. The problem
> > appears once I import my RADIUS database (about 1.5GB): if I turn one of
> > the data nodes off, the cluster stops and I receive this message:
> >
> > Node 3: Forced node shutdown completed. Caused by error 2305: 'Node
> > lost connection to other nodes and can not form a unpartitioned cluster,
> > please investigate if there are error(s) on other node(s)(Arbitration error).
> > Temporary error, restart node'.
> >
> > I've seen people on some lists recommend enabling the StopOnError
> > attribute, but I don't know its side effects.
> >
> > Thanks.
> >
> > --
> > Regards,
> > Antônio Modesto
> > IT Manager
> >
> > Praça Getúlio Vargas, 77, Sala 308, Centro
> > Santo Antônio do Monte, MG, CEP: 35560-000
> > Tel: (37) 3281-2800
> > Contact: isimples@stripped
> > http://www.isimples.com.br
>
> --
> MySQL Cluster Mailing List
> For list archives: http://lists.mysql.com/cluster
> To unsubscribe: http://lists.mysql.com/cluster
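
The two things Andrew looks for in the config.ini in this thread (an ArbitrationTimeout far below the 7500 ms default he cites, and whether ndb_mgmd shares a host with a data node) could be checked with a small script along these lines. This is only a minimal sketch, not a MySQL tool: the function names, the hand-rolled parser, and the trimmed SAMPLE text are assumptions made for illustration; only the parameter names and values are taken from the thread.

```python
import re

# Default quoted by Andrew in this thread (milliseconds).
ARBITRATION_DEFAULT_MS = 7500

# Trimmed copy of the config.ini posted above (illustrative subset only).
SAMPLE = """\
[NDBD DEFAULT]
NoOfReplicas: 2
ArbitrationTimeout: 10
[NDB_MGMD]
NodeId: 1
HostName: 192.168.0.7
[NDBD]
NodeId: 2
HostName: 192.168.0.30
[NDBD]
NodeId: 3
HostName: 192.168.0.31
"""


def parse_sections(text):
    """Return a list of (SECTION_NAME, {key: value}) pairs.

    A list is used instead of a dict because config.ini repeats
    section names such as [NDBD], once per node.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
            continue
        # Accept both "Key: value" and "Key=value".
        parts = re.split(r"\s*[:=]\s*", line, maxsplit=1)
        if sections and len(parts) == 2:
            sections[-1][1][parts[0]] = parts[1]
    return sections


def check_config(text):
    """Return human-readable warnings for the two issues raised in the thread."""
    sections = parse_sections(text)
    warnings = []

    # Check 1: ArbitrationTimeout set below the default.
    for name, params in sections:
        timeout = params.get("ArbitrationTimeout")
        if name.startswith("NDBD") and timeout is not None:
            if int(timeout) < ARBITRATION_DEFAULT_MS:
                warnings.append(
                    f"[{name}] ArbitrationTimeout is {timeout} ms, "
                    f"below the {ARBITRATION_DEFAULT_MS} ms default"
                )

    # Check 2: management node on the same host as a data node.
    mgm_hosts = {p["HostName"] for n, p in sections
                 if n == "NDB_MGMD" and "HostName" in p}
    data_hosts = {p["HostName"] for n, p in sections
                  if n == "NDBD" and "HostName" in p}
    for host in sorted(mgm_hosts & data_hosts):
        warnings.append(f"ndb_mgmd and a data node share host {host}")

    return warnings


if __name__ == "__main__":
    for warning in check_config(SAMPLE):
        print("WARNING:", warning)
```

Against the values posted in this thread, the sketch would flag only the ArbitrationTimeout setting, since ndb_mgmd (192.168.0.7) runs on a different host from both data nodes.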