From: Janos Lehnhardt
Date: October 19 2012 2:31pm
Subject: Re: Installation of SQL-Node - connection established & heartbeat not working
List-Archive: http://lists.mysql.com/cluster/8422
Message-Id: <50816436.7080204@fleetboard.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hey,

thanks for the fast reply. I've invested another few hours trying to get the thing running, but unfortunately I'm still stuck on the same problem.

The servers are installed as VMs on one VMware ESX server, using the same network device in the same subnet; they are definitely not routed through any firewall or anything like that. It seems like the nodes cannot "heartbeat" with my SQL node, although they reach each other directly on the first hop. As far as I know, I've disabled every software firewall on the machines.

I'd appreciate any ideas, since I'm running out of them. :(

These are my logs.

error-log of sql-node:

121019 16:27:12 mysqld_safe Number of processes running now: 0
121019 16:27:12 mysqld_safe mysqld restarted
121019 16:27:12 [Note] Plugin 'FEDERATED' is disabled.
121019 16:27:12 [Note] NDB: NodeID is 4, management server '10.X.X.57:1186'
121019 16:27:30 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121019 16:27:30 [Note] Plugin 'FEDERATED' is disabled.
121019 16:27:30 [Note] NDB: NodeID is 4, management server '10.X.X.57:1186'
121019 16:28:00 [Note] NDB[0]: NodeID: 4, no storage nodes connected (timed out)
121019 16:28:00 [Warning] NDB: server id set to zero - changes logged to bin log with server id zero will be logged with another server id by slave mysqlds
121019 16:28:00 [Note] Starting Cluster Binlog Thread
121019 16:28:00 InnoDB: The InnoDB memory heap is disabled
121019 16:28:00 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121019 16:28:00 InnoDB: Compressed tables use zlib 1.2.3
121019 16:28:00 InnoDB: Using Linux native AIO
121019 16:28:00 InnoDB: Initializing buffer pool, size = 128.0M
121019 16:28:00 InnoDB: Completed initialization of buffer pool
121019 16:28:00 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
121019 16:28:00 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
121019 16:28:00 InnoDB: Waiting for the background threads to start
121019 16:28:01 InnoDB: 1.1.8 started; log sequence number 1595685
121019 16:28:01 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
121019 16:28:01 [Note] - '0.0.0.0' resolves to '0.0.0.0';
121019 16:28:01 [Note] Server socket created on IP: '0.0.0.0'.
121019 16:28:01 [Note] Event Scheduler: Loaded 0 events
121019 16:28:01 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.27-ndb-7.2.8-cluster-gpl' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Cluster Community Server (GPL)

(the crash comes from killing the mysqld process via the console)

error-log of management-node:

2012-10-19 16:23:46 [MgmSrvr] INFO -- Node 2: Node 4: API version 7.2.8
2012-10-19 16:23:46 [MgmSrvr] INFO -- Node 3: Node 4 Connected
2012-10-19 16:23:46 [MgmSrvr] INFO -- Node 3: Node 4: API version 7.2.8
2012-10-19 16:23:50 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2012-10-19 16:23:51 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2012-10-19 16:23:51 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 3
2012-10-19 16:23:52 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3
2012-10-19 16:23:53 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 4
2012-10-19 16:23:53 [MgmSrvr] ALERT -- Node 2: Node 4 declared dead due to missed heartbeat
2012-10-19 16:23:53 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
2012-10-19 16:23:53 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2012-10-19 16:23:53 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2012-10-19 16:23:53 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected

Regards,
Janos

On 18.10.2012 17:31, Johan De Meersman wrote:
> ----- Original Message -----
>> From: "Janos Lehnhardt"
>>
>> 2012-10-18 15:32:46 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 4
>> 2012-10-18 15:32:46 [MgmSrvr] ALERT -- Node 2: Node 4 declared dead due to missed heartbeat
>> 2012-10-18 15:32:46 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
> Well, this is obviously why it keeps being disconnected. Now the question is, why is it missing heartbeats? You say there are no firewalls. I assume all nodes are connected to the same switch, then? Plug node 4 into another port, change the cable, et cetera to rule out network issues. Use a different network card, if you can. Also check the general system logs for unusual messages.
>
> Is your system not overloaded, cpu-wise?
> Is it swapping badly? Run ping to one of the other nodes (or better still, MTR) to see how the network behaves right before the disconnect. Grab a tcpdump and see whether there's anything interesting in it.
>
> Hmm. Also, check whether it's not as simple as another device claiming the same IP. Seen that one happen before, too :-)
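
To line the ping, MTR, or tcpdump output up against the disconnects, it can help to know which data node reported the missed heartbeats and how far each count got. The following is only a rough sketch in Python: the function name `summarize_heartbeats` and the `SAMPLE` list are made up for illustration, and the regular expressions assume the management-node log format quoted in this thread; other versions may word these lines differently.

```python
import re
from collections import defaultdict

# Patterns assume the [MgmSrvr] log wording quoted in this thread.
MISSED = re.compile(r"Node (\d+): Node (\d+) missed heartbeat (\d+)")
DEAD = re.compile(r"Node (\d+): Node (\d+) declared dead due to missed heartbeat")


def summarize_heartbeats(lines):
    """Map (reporting_node, target_node) to the highest missed-heartbeat
    count seen and whether the target was declared dead by that reporter."""
    summary = defaultdict(lambda: {"max_missed": 0, "declared_dead": False})
    for line in lines:
        match = MISSED.search(line)
        if match:
            reporter, target, count = map(int, match.groups())
            entry = summary[(reporter, target)]
            entry["max_missed"] = max(entry["max_missed"], count)
            continue
        match = DEAD.search(line)
        if match:
            reporter, target = map(int, match.groups())
            summary[(reporter, target)]["declared_dead"] = True
    return dict(summary)


# Hypothetical sample: a few lines copied from the management-node log above.
SAMPLE = [
    "2012-10-19 16:23:50 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2",
    "2012-10-19 16:23:51 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2",
    "2012-10-19 16:23:51 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 3",
    "2012-10-19 16:23:52 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3",
    "2012-10-19 16:23:53 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 4",
    "2012-10-19 16:23:53 [MgmSrvr] ALERT -- Node 2: Node 4 declared dead due to missed heartbeat",
]

if __name__ == "__main__":
    for (reporter, target), info in sorted(summarize_heartbeats(SAMPLE).items()):
        print(f"node {reporter} -> node {target}: {info}")
```

In the excerpt above, both data nodes (2 and 3) miss heartbeats from node 4 within the same few seconds, which suggests looking at node 4 itself or its network path rather than at a single data node.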