List: Cluster
From: Andrew Morgan
Date: April 23 2012 5:41pm
Subject: RE: Cluster crash after datanode shutdown

Hi Antonio,

 For now, just retry with the increased arbitration timeout and then check the logs if
it still crashes.

Andrew.
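
Before retrying, it can help to confirm what the config file actually sets. The following is a minimal sketch, not an official tool: it assumes the `config.ini` layout quoted in this thread (a `[NDBD DEFAULT]` section with `Name: value` lines) and uses the 7500 ms default mentioned later in the thread as the comparison point. The function name `arbitration_timeout_warning` is hypothetical.

```python
import configparser

# Default quoted in this thread, in milliseconds; check the manual for your version.
DEFAULT_ARBITRATION_TIMEOUT_MS = 7500


def arbitration_timeout_warning(config_text, default_ms=DEFAULT_ARBITRATION_TIMEOUT_MS):
    """Return a warning string if ArbitrationTimeout is below default_ms, else None."""
    # strict=False tolerates the repeated [NDBD] and [API] sections in config.ini.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(config_text)
    section = "NDBD DEFAULT"
    if not parser.has_option(section, "ArbitrationTimeout"):
        return None  # not set explicitly, so the server default applies
    value = parser.getint(section, "ArbitrationTimeout")
    if value < default_ms:
        return f"ArbitrationTimeout is {value} ms, below the {default_ms} ms default"
    return None


# A cut-down sample using the value from the config posted in this thread.
sample = """\
[NDBD DEFAULT]
NoOfReplicas: 2
ArbitrationTimeout: 10
"""
print(arbitration_timeout_warning(sample))
```

To check a real file, pass its contents, for example `arbitration_timeout_warning(open("config.ini").read())`.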

> -----Original Message-----
> From: Antonio Modesto [mailto:modesto@stripped]
> Sent: 23 April 2012 18:12
> To: Andrew Morgan
> Cc: MySQL-Cluster Lists
> Subject: RE: Cluster crash after datanode shutdown
> 
> Do you want me to enable some extra debug and/or info?
> 
> On Mon, 2012-04-23 at 10:05 -0700, Andrew Morgan wrote:
> > Hi Antonio,
> >
> >
> >
> > The next thing I’d check is the log/error files on the management node
> > and the data node that you weren’t trying to shut down.
> >
> >
> >
> > I notice that you have ArbitrationTimeout set to 10 milliseconds (the
> > default is 7500); I’d try increasing it.
> >
> >
> >
> > Regards, Andrew.
> >
> >
> >
> > From: Antonio Modesto [mailto:modesto@stripped]
> > Sent: 23 April 2012 17:13
> > To: Andrew Morgan
> > Cc: MySQL-Cluster Lists
> > Subject: RE: Cluster crash after datanode shutdown
> >
> >
> >
> >
> > Hi, here is my config.ini:
> >
> > [NDBD DEFAULT]
> > NoOfReplicas: 2
> > DataDir: /usr/local/mysql/data
> > DataMemory: 6000M
> > IndexMemory: 2000M
> > StringMemory: 5
> > MaxNoOfConcurrentTransactions: 4096
> > MaxNoOfConcurrentOperations: 100000
> > MaxNoOfLocalOperations: 100000
> > MaxNoOfConcurrentIndexOperations: 8192
> > MaxNoOfFiredTriggers: 4000
> > TransactionBufferMemory: 1M
> > MaxNoOfConcurrentScans: 300
> > MaxNoOfLocalScans: 32
> > BatchSizePerLocalScan: 64
> > LongMessageBuffer: 1M
> > NoOfFragmentLogFiles: 300
> > FragmentLogFileSize: 16M
> > MaxNoOfOpenFiles: 40
> > InitialNoOfOpenFiles: 27
> > MaxNoOfSavedMessages: 25
> > MaxNoOfAttributes: 1500
> > MaxNoOfTables: 400
> > MaxNoOfOrderedIndexes: 200
> > MaxNoOfUniqueHashIndexes: 200
> > MaxNoOfTriggers: 770
> > LockPagesInMainMemory: 0
> > StopOnError: 1
> > Diskless: 0
> > ODirect: 0
> > TimeBetweenWatchDogCheck: 6000
> > TimeBetweenWatchDogCheckInitial: 6000
> > StartPartialTimeout: 30000
> > StartPartitionedTimeout: 60000
> > StartFailureTimeout: 1000000
> > HeartbeatIntervalDbDb: 2000
> > HeartbeatIntervalDbApi: 3000
> > TimeBetweenLocalCheckpoints: 20
> > TimeBetweenGlobalCheckpoints: 2000
> > TransactionInactiveTimeout: 0
> > TransactionDeadlockDetectionTimeout: 1200
> > DiskSyncSize: 4M
> > DiskCheckpointSpeed: 10M
> > DiskCheckpointSpeedInRestart: 100M
> > ArbitrationTimeout: 10
> > UndoIndexBuffer: 2M
> > UndoDataBuffer: 1M
> > RedoBuffer: 32M
> > LogLevelStartup: 15
> > LogLevelShutdown: 3
> > LogLevelStatistic: 0
> > LogLevelCheckpoint: 0
> > LogLevelNodeRestart: 0
> > LogLevelConnection: 0
> > LogLevelError: 15
> > LogLevelCongestion: 0
> > LogLevelInfo: 3
> > MemReportFrequency: 0
> > BackupDataBufferSize: 2M
> > BackupLogBufferSize: 2M
> > BackupMemory: 64M
> > BackupWriteSize: 32K
> > BackupMaxWriteSize: 256K
> > [MGM DEFAULT]
> > PortNumber: 1186
> > DataDir: /usr/local/mysql/mysql-cluster
> > [TCP DEFAULT]
> > SendBufferMemory: 2M
> > [NDB_MGMD]
> > NodeId: 1
> > HostName: 192.168.0.7
> > [NDBD]
> > NodeId: 2
> > HostName: 192.168.0.30
> > [NDBD]
> > NodeId: 3
> > HostName: 192.168.0.31
> > [API]
> > NodeId: 4
> > HostName: 192.168.0.30
> > [API]
> > NodeId: 5
> > HostName: 192.168.0.31
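
The concern raised in this thread, that `ndb_mgmd` might share a host with a data node, can be checked mechanically against a config like the one above. This is a rough sketch under stated assumptions: it reads only `HostName` entries, treats `[MGM]` as an alias for `[NDB_MGMD]`, and ignores per-section defaults. The helper names `hosts_by_section` and `shared_mgm_hosts` are hypothetical. A hand-rolled parser is used because the file repeats the `[NDBD]` and `[API]` section names, which a merging parser would collapse.

```python
import re


def hosts_by_section(config_text):
    """Map each section name (upper-cased) to the HostName values found under it."""
    hosts = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().upper()
            continue
        # Split on the first ':' or '=' separating the name from the value.
        parts = re.split(r"[:=]", line, maxsplit=1)
        if current and len(parts) == 2 and parts[0].strip().lower() == "hostname":
            hosts.setdefault(current, []).append(parts[1].strip())
    return hosts


def shared_mgm_hosts(config_text):
    """Return the hosts that run both a management node and a data node."""
    hosts = hosts_by_section(config_text)
    mgm = set(hosts.get("NDB_MGMD", [])) | set(hosts.get("MGM", []))
    data = set(hosts.get("NDBD", []))
    return sorted(mgm & data)


# Node sections copied from the config posted in this thread.
sample = """\
[NDB_MGMD]
NodeId: 1
HostName: 192.168.0.7
[NDBD]
NodeId: 2
HostName: 192.168.0.30
[NDBD]
NodeId: 3
HostName: 192.168.0.31
"""
print(shared_mgm_hosts(sample))
```

An empty result means no management node shares a host with a data node in the text that was checked.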
> >
> > On Mon, 2012-04-23 at 08:59 -0700, Andrew Morgan wrote:
> >
> >
> > Hi Antonio,
> >
> > Could you please share your config.ini file? I'd like to check that your
> ndb_mgmd process is not running on the same host as one of your data
> nodes.
> >
> > Setting StopOnError to FALSE will tell the data node's agent to restart an
> ndbd process if it is killed.
> >
> > Regards, Andrew.
> >
> > -----Original Message-----
> > From: Antonio Modesto [mailto:modesto@stripped]
> > Sent: 23 April 2012 15:44
> > To: MySQL-Cluster Lists
> > Subject: Cluster crash after datanode shutdown
> >
> > Hi,
> >
> > I am setting up a MySQL Cluster with 2 data nodes and 1 management node. I
> > was testing its reliability by shutting down one data node and running queries
> > against the one still alive. With a small database this worked well: whichever
> > data node I turned off, the cluster kept running. The problem appears once I
> > import my RADIUS database (about 1.5GB): if I turn one of the data nodes off,
> > the cluster stops and I receive this message:
> >
> > Node 3: Forced node shutdown completed. Caused by error 2305: 'Node
> lost connection to other nodes and can not form a unpartitioned cluster,
> please investigate if there are error(s) on other node(s)(Arbitration error).
> Temporary error, restart node'.
> >
> >
> > I've seen people on some lists recommend enabling the StopOnError
> > attribute, but I don't know its side effects.
> >
> > Thanks.
> >
> >
> >
> >
> > --
> >
> >
> > Regards,
> >
> > Antônio Modesto
> >
> > IT Manager
> >
> > Praça Getúlio Vargas, 77 – Sala 308 – Centro
> >
> > Santo Antônio do Monte – MG – CEP: 35560-000
> > Tel:(37) 3281-2800
> >
> > Contact: isimples@stripped
> > http://www.isimples.com.br
> >
> >
> 
> 
> 
> --
> MySQL Cluster Mailing List
> For list archives: http://lists.mysql.com/cluster
> To unsubscribe:    http://lists.mysql.com/cluster
> 