Hi,
A bug report (#4585) has been filed for this issue.
Sorry for the inconvenience,
Best regards,
Johan Andersson
Devananda wrote:
> I've been experiencing this same general problem, but haven't tried to
> narrow it down to a reproducible pattern. It seems to happen in relation
> to restarting a DB node, as Jim said.
>
> Jim Hoadley wrote:
>
>> When I stop/start or restart a database node, the API (MySQL server)
>> loses its connection to the data until the node comes back online. This
>> only happens on one of my 2 nodes (BOX2); the other (BOX1) is fine. I've
>> been puzzling over this for a week or so. Did I miss something? Please
>> send any suggestions. Details below.
>>
>> BOX1 = Pentium III/1000MHz/512MB RAM
>> BOX2 = Pentium III/600MHz/512MB RAM
>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
>> Not a lot of RAM, but I'm only using a tiny test database at this point.
>> Running the MGM on a separate computer (BOX4) to help isolate the problem.
>>
>> Connected to BOX1, I issue a SELECT against test.simpsons and get the
>> expected response:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> +----+------------+
>> | id | first_name |
>> +----+------------+
>> | 2 | Lisa |
>> | 4 | Homer |
>> | 5 | Maggie |
>> | 3 | Marge |
>> | 1 | Bart |
>> +----+------------+
>> 5 rows in set (0.03 sec)
>> ----------------------------------------
>>
>> Stop node 3 on BOX1. SELECT now fails:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 1015: Can't lock file (errno: 4009)
>> ----------------------------------------
>>
>> Repeating the SELECT fails:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 2013: Lost connection to MySQL server during query
>> ----------------------------------------
>>
>> Repeating the SELECT fails again, then succeeds after node 3 is restarted:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 2006: MySQL server has gone away
>> No connection. Trying to reconnect...
>> Connection id: 1
>> Current database: test
>>
>> +----+------------+
>> | id | first_name |
>> +----+------------+
>> | 2 | Lisa |
>> | 4 | Homer |
>> | 5 | Maggie |
>> | 3 | Marge |
>> | 1 | Bart |
>> +----+------------+
>> 5 rows in set (6.55 sec)
>> ----------------------------------------
>>
>> All data is intact. BTW, new records added to node 2 on BOX2 while node 3
>> on BOX1 is down do show up (this is good).
>>
>> Here's what restarting node 3 on BOX1 with mgmd looks like (it seems
>> right to me):
>>
>> ----------------------------------------
>> NDB> show
>> Cluster Configuration
>> ---------------------
>> 2 NDB Node(s)
>> DB node: 2 (Version: 3.5.0)
>> DB node: 3 (Version: 3.5.0)
>>
>> 4 API Node(s)
>> API node: 11 (not connected)
>> API node: 12 (Version: 3.5.0)
>> API node: 13 (not connected)
>> API node: 14 (not connected)
>>
>> 1 MGM Node(s)
>> MGM node: 1 (Version: 3.5.0)
>>
>> NDB> 2 restart
>> Executing RESTART on node 2.
>> Database node 2 is being restarted.
>>
>> NDB> 2 - endTakeOver
>> ----------------------------------------
>>
>> Here is the MySQL server error log output on BOX1 as node 3 is restarted:
>>
>> ----------------------------------------
>> 040713 10:53:31 mysqld started
>> 040713 10:53:32 InnoDB: Started; log sequence number 0 44112
>> /usr/local/mysql/libexec/mysqld: ready for connections.
>> Version: '4.1.3-beta-nightly-20040628-log' socket: '/tmp/mysql.sock' port: 3306
>> 2004-07-19 11:39:15 [NDB] INFO -- Node shutdown initiated
>> mysqld got signal 11;
>> This could be because you hit a bug. It is also possible that this binary
>> or one of the libraries it was linked against is corrupt, improperly built,
>> or misconfigured. This error can also be caused by malfunctioning hardware.
>> We will try our best to scrape up some info that will hopefully help diagnose
>> the problem, but since we have already crashed, something is definitely wrong
>> and this may fail.
>>
>> key_buffer_size=16777216
>> read_buffer_size=258048
>> max_used_connections=1
>> max_connections=100
>> threads_connected=1
>> It is possible that mysqld could use up to
>> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 92783 K
>> bytes of memory
>> Hope that's ok; if not, decrease some variables in the equation.
>>
>> thd=0x8751280
>> Attempting backtrace. You can use the following information to find out
>> where mysqld died. If you see no messages after this, something went
>> terribly wrong...
>> Cannot determine thread, fp=0xafdc0c5c, backtrace may not be correct.
>> Stack range sanity check OK, backtrace follows:
>> 0x81725e0
>> 0xb749de48
>> 0x81bc7a4
>> 0x81f21c0
>> 0x81bc7a4
>> 0x81bb8e8
>> 0x81b57e7
>> 0x81ad41e
>> 0x81adf45
>> 0x81aa3fa
>> 0x8187aba
>> 0x818d8cc
>> 0x81864b0
>> 0x8186056
>> 0x81857fa
>> 0xb7497dac
>> 0xb73c3a8a
>> New value of fp=(nil) failed sanity check, terminating stack trace!
>> Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow
>> instructions on how to resolve the stack trace. Resolved
>> stack trace is much more helpful in diagnosing the problem, so please do
>> resolve it
>> Trying to get some variables.
>> Some pointers may be invalid and cause the dump to abort...
>> thd->query at 0x875d768 = select * from simpsons
>> thd->thread_id=7854
>> The manual page at http://www.mysql.com/doc/en/Crashing.html contains
>> information that should help you find out what is causing the crash.
>>
>> Number of processes running now: 0
>> 040719 11:39:24 mysqld restarted
>> InnoDB: Warning: we did not need to do crash recovery, but log scan
>> InnoDB: progressed past the checkpoint lsn 0 44112 up to lsn 0 44146
>> 040719 11:39:24 InnoDB: Started; log sequence number 0 44112
>> Restarting system
>> 2004-07-19 11:39:28 [NDB] INFO -- Ndb has terminated (pid 18367) restarting
>> 2004-07-19 11:39:28 [NDB] INFO -- Angel pid: 18366 ndb pid: 18428
>> 2004-07-19 11:39:28 [NDB] INFO -- NDB Cluster -- DB node 2
>> 2004-07-19 11:39:28 [NDB] INFO -- Version 3.5.0 (beta) --
>> 2004-07-19 11:39:28 [NDB] INFO -- Start initiated (version 3.5.0)
>> NR: setLcpActiveStatusEnd - m_participatingLQH
>> 2004-07-19 11:39:33 [NDB] INFO -- Started (version 3.5.0)
>> /usr/local/mysql/libexec/mysqld: ready for connections.
>> Version: '4.1.3-beta-nightly-20040628-log' socket: '/tmp/mysql.sock' port: 3306
>> ----------------------------------------
>>
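The memory estimate in the crash report above can be checked by hand. The following Python sketch (not part of the original report) recomputes key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections and, because the log does not print sort_buffer_size, back-solves the sort_buffer_size values consistent with the printed 92783 K. It assumes mysqld adds the terms in bytes and integer-divides by 1024; the function names are hypothetical.

```python
# Sketch (hypothetical helpers): reproduce the "could use up to ... K bytes
# of memory" estimate from the crash report.
# Assumption: mysqld sums the terms in bytes, then integer-divides by 1024.


def max_memory_kb(key_buffer_size, read_buffer_size, sort_buffer_size, max_connections):
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections, in K."""
    total = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections
    return total // 1024


def implied_sort_buffer_range(reported_kb, key_buffer_size, read_buffer_size, max_connections):
    """All sort_buffer_size values (bytes) for which max_memory_kb == reported_kb."""
    fixed = key_buffer_size + read_buffer_size * max_connections
    low = reported_kb * 1024 - fixed         # smallest allowed sort_buffer_size * max_connections
    high = (reported_kb + 1) * 1024 - fixed  # exclusive upper bound for that product
    first = -(-low // max_connections)       # ceiling division
    last = (high - 1) // max_connections
    return range(first, last + 1)


if __name__ == "__main__":
    # Values printed in the crash report: key_buffer_size=16777216,
    # read_buffer_size=258048, max_connections=100, estimate = 92783 K.
    candidates = implied_sort_buffer_range(92783, 16777216, 258048, 100)
    print(candidates.start, candidates[-1])  # 524278 524287
```

With the logged values this gives 524278 to 524287 bytes, i.e. a sort_buffer_size just under 512K; exactly 524288 bytes would have printed 92784 K.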
>> Here is the config.ini:
>>
>> ----------------------------------------
>> [DB DEFAULT]
>> NoOfReplicas: 2
>> MaxNoOfConcurrentOperations: 10000
>> DataMemory: 40M
>> IndexMemory: 12M
>> Discless: 0
>>
>> [COMPUTER]
>> Id: 1
>> ByteOrder: Little
>> HostName: BOX3
>>
>> [COMPUTER]
>> Id: 2
>> ByteOrder: Little
>> HostName: BOX2
>>
>> [COMPUTER]
>> Id: 3
>> ByteOrder: Little
>> HostName: BOX3
>>
>> [COMPUTER]
>> Id: 4
>> ByteOrder: Little
>> HostName: localhost
>>
>> [COMPUTER]
>> Id: 5
>> ByteOrder: Little
>> HostName: localhost
>>
>> [COMPUTER]
>> Id: 6
>> ByteOrder: Little
>> HostName: localhost
>>
>> [COMPUTER]
>> Id: 7
>> ByteOrder: Little
>> HostName: localhost
>>
>> [MGM]
>> Id: 1
>> ExecuteOnComputer: 1
>> PortNumber: 2200
>>
>> [DB]
>> Id: 2
>> ExecuteOnComputer: 2
>> FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-2-fs-2200
>>
>> [DB]
>> Id: 3
>> ExecuteOnComputer: 3
>> FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-3-fs-2200
>>
>> [API]
>> Id: 11
>> ExecuteOnComputer: 4
>>
>> [API]
>> Id: 12
>> ExecuteOnComputer: 5
>>
>> [API]
>> Id: 13
>> ExecuteOnComputer: 6
>>
>> [API]
>> Id: 14
>> ExecuteOnComputer: 7
>>
>> [TCP DEFAULT]
>> PortNumber: 2202
>> ----------------------------------------
>>
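To double-check which host each node ID is assigned to, the config.ini above can be walked with a small script. This is a minimal Python sketch, not an official MySQL Cluster tool: it assumes the [SECTION] / "Key: Value" layout shown above, keeps repeated sections such as [COMPUTER], and follows each node's ExecuteOnComputer to that computer's HostName. The helper names and the CONFIG_EXCERPT string (trimmed from the sections above) are hypothetical.

```python
# Minimal sketch (hypothetical helpers, not a MySQL tool): map each cluster
# node Id to its host by following ExecuteOnComputer -> [COMPUTER] HostName.
# Assumes the "[SECTION]" / "Key: Value" layout used in the config.ini above.

CONFIG_EXCERPT = """
[COMPUTER]
Id: 1
HostName: BOX3

[COMPUTER]
Id: 2
HostName: BOX2

[COMPUTER]
Id: 3
HostName: BOX3

[MGM]
Id: 1
ExecuteOnComputer: 1

[DB]
Id: 2
ExecuteOnComputer: 2

[DB]
Id: 3
ExecuteOnComputer: 3
"""


def parse_sections(text):
    """Return a list of (section_name, {key: value}) pairs, keeping repeats."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def node_hosts(text):
    """Map node Id -> (section type, HostName); "?" if the computer is unknown."""
    sections = parse_sections(text)
    computers = {fields["Id"]: fields.get("HostName", "?")
                 for name, fields in sections if name == "COMPUTER"}
    return {int(fields["Id"]): (name, computers.get(fields.get("ExecuteOnComputer"), "?"))
            for name, fields in sections if name in ("MGM", "DB", "API")}


if __name__ == "__main__":
    for node_id, (kind, host) in sorted(node_hosts(CONFIG_EXCERPT).items()):
        print(f"node {node_id} ({kind}) -> {host}")
```

On this excerpt it reports node 1 (MGM) -> BOX3, node 2 (DB) -> BOX2 and node 3 (DB) -> BOX3, which can be compared against the machines the nodes are described as running on.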
>> Here is the table status on BOX1 (the Engine type looks right, I guess):
>>
>> ----------------------------------------
>> mysql> SHOW TABLE STATUS;
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | Name     | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | simpsons | ndbcluster | 9       | Fixed      | 100  | 0              | 0           | NULL            | 0            | 0         | NULL           | NULL        | NULL        | NULL       | latin1_swedish_ci | NULL     |                |         |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> 1 row in set (0.00 sec)
>> ----------------------------------------
>>
>> Here is the table status on BOX2 (the same as BOX1, as expected):
>>
>> ----------------------------------------
>> mysql> SHOW TABLE STATUS;
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | Name     | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | simpsons | ndbcluster | 9       | Fixed      | 100  | 0              | 0           | NULL            | 0            | 0         | NULL           | NULL        | NULL        | NULL       | latin1_swedish_ci | NULL     |                |         |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> 1 row in set (0.00 sec)
>> ----------------------------------------
>>
>> Here is the /etc/hosts file from BOX1:
>>
>> ----------------------------------------
>> [root@BOX1 3.ndb_db]# cat /etc/hosts
>> # Do not remove the following line, or various programs
>> # that require network functionality will fail.
>> 127.0.0.1 localhost.localdomain localhost
>> 192.168.1.211 BOX2
>> 192.168.1.212 BOX1
>> 192.168.1.213 BOX4
>> 192.168.1.220 BOX3
>> ----------------------------------------
>>
>> Here is the /etc/hosts file from BOX2:
>>
>> ----------------------------------------
>> # Do not remove the following line, or various programs
>> # that require network functionality will fail.
>> 127.0.0.1 localhost.localdomain localhost
>> 192.168.1.211 BOX2
>> 192.168.1.212 BOX1
>> 192.168.1.213 BOX4
>> 192.168.1.220 BOX3
>> ----------------------------------------
>