List: Cluster
From: Jim Hoadley
Date: July 19 2004 7:52pm
Subject: API loses data during node restarts

When I stop/start or restart a database node, the API node (the MySQL server)
loses its connection to the data until the node comes back online. This only
happens on one of my 2 nodes (BOX2). The other (BOX1) is fine. I've been
puzzling over this for a week or so. Did I miss something? Please forward any
suggestions. Details below.

BOX1 = Pentium III/1000MHz/512MB RAM
BOX2 = Pentium III/600MHz/512MB RAM
Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
Not a lot of RAM but only using a tiny test database at this point.
Running the MGM on a separate computer (BOX4) to help isolate problem.

Connected to BOX1, issue SELECT against test.simpsons and get proper response:

----------------------------------------
mysql> select * from simpsons ;
+----+------------+
| id | first_name |
+----+------------+
|  2 | Lisa       |
|  4 | Homer      |
|  5 | Maggie     |
|  3 | Marge      |
|  1 | Bart       |
+----+------------+
5 rows in set (0.03 sec)
----------------------------------------

Stop node 3 on BOX1. SELECT now fails:

----------------------------------------
mysql> select * from simpsons ;
ERROR 1015: Can't lock file (errno: 4009)
----------------------------------------

Repeating SELECT fails:

----------------------------------------
mysql> select * from simpsons ;
ERROR 2013: Lost connection to MySQL server during query
----------------------------------------

Repeating SELECT fails again, then succeeds after node 3 is restarted:

----------------------------------------
mysql> select * from simpsons ;
ERROR 2006: MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test
 
+----+------------+
| id | first_name |
+----+------------+
|  2 | Lisa       |
|  4 | Homer      |
|  5 | Maggie     |
|  3 | Marge      |
|  1 | Bart       |
+----+------------+
5 rows in set (6.55 sec)
----------------------------------------
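
For anyone hitting the same error sequence from application code, here is a minimal sketch in Python of retrying a query when the client reports 2013 or 2006, as in the session above. QueryError, run_with_retry and run_query are hypothetical names made up for illustration, not part of any MySQL library, and treating only those two codes as retryable is an assumption. It only works around the outage on the client side; it says nothing about why the server drops the connection.

```python
import time

# MySQL client error numbers seen in the session above.
CR_SERVER_GONE_ERROR = 2006   # "MySQL server has gone away"
CR_SERVER_LOST = 2013         # "Lost connection to MySQL server during query"
RETRYABLE = {CR_SERVER_GONE_ERROR, CR_SERVER_LOST}  # assumption: retry only these


class QueryError(Exception):
    """Hypothetical exception type carrying a MySQL error number."""

    def __init__(self, errno, message):
        super().__init__(f"ERROR {errno}: {message}")
        self.errno = errno


def run_with_retry(run_query, sql, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_query(sql), retrying when the connection was lost.

    run_query is a hypothetical callable that reconnects as needed and
    raises QueryError on failure. Other errors are re-raised immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except QueryError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay)  # give the server time to come back


if __name__ == "__main__":
    # Stand-in for a real connection: fails twice, then returns rows.
    pending = [
        QueryError(2013, "Lost connection to MySQL server during query"),
        QueryError(2006, "MySQL server has gone away"),
    ]

    def fake_query(sql):
        if pending:
            raise pending.pop(0)
        return [(1, "Bart"), (2, "Lisa")]

    rows = run_with_retry(fake_query, "select * from simpsons", sleep=lambda s: None)
    print(rows)
```

In this sketch the first failure in the session (ERROR 1015 with errno 4009) is not retried and would be raised to the caller; whether it should be is a judgment call left open here.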

All data is intact. BTW new records added to node 2 on BOX2 while node 3 on
BOX1 is down show up (this is good).

Here's what restarting node 3 on BOX1 with mgmd looks like (looks right to me):

----------------------------------------
NDB> show
Cluster Configuration
---------------------
2 NDB Node(s)
DB node:        2  (Version: 3.5.0)
DB node:        3  (Version: 3.5.0)
 
4 API Node(s)
API node:       11  (not connected)
API node:       12  (Version: 3.5.0)
API node:       13  (not connected)
API node:       14  (not connected)
 
1 MGM Node(s)
MGM node:       1  (Version: 3.5.0)
 
NDB> 2 restart
Executing RESTART on node 2.
Database node 2 is being restarted.
 
NDB> 2 - endTakeOver
----------------------------------------

Here is the MySQL server error log output on BOX1 as node 3 is restarted:

----------------------------------------
040713 10:53:31  mysqld started
040713 10:53:32  InnoDB: Started; log sequence number 0 44112
/usr/local/mysql/libexec/mysqld: ready for connections.
Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock' port: 3306
2004-07-19 11:39:15 [NDB] INFO     -- Node shutdown initiated
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
 
key_buffer_size=16777216
read_buffer_size=258048
max_used_connections=1
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 92783 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
thd=0x8751280
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xafdc0c5c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x81725e0
0xb749de48
0x81bc7a4
0x81f21c0
0x81bc7a4
0x81bb8e8
0x81b57e7
0x81ad41e
0x81adf45
0x81aa3fa
0x8187aba
0x818d8cc
0x81864b0
0x8186056
0x81857fa
0xb7497dac
0xb73c3a8a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow
instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x875d768 = select * from simpsons
thd->thread_id=7854
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
 
Number of processes running now: 0
040719 11:39:24  mysqld restarted
InnoDB: Warning: we did not need to do crash recovery, but log scan
InnoDB: progressed past the checkpoint lsn 0 44112 up to lsn 0 44146
040719 11:39:24  InnoDB: Started; log sequence number 0 44112
Restarting system
2004-07-19 11:39:28 [NDB] INFO     -- Ndb has terminated (pid 18367) restarting
2004-07-19 11:39:28 [NDB] INFO     -- Angel pid: 18366 ndb pid: 18428
2004-07-19 11:39:28 [NDB] INFO     -- NDB Cluster -- DB node 2
2004-07-19 11:39:28 [NDB] INFO     -- Version 3.5.0 (beta) --
2004-07-19 11:39:28 [NDB] INFO     -- Start initiated (version 3.5.0)
NR: setLcpActiveStatusEnd - m_participatingLQH
2004-07-19 11:39:33 [NDB] INFO     -- Started (version 3.5.0)
/usr/local/mysql/libexec/mysqld: ready for connections.
Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
----------------------------------------

Here is the config.ini:

----------------------------------------
[DB DEFAULT]
NoOfReplicas: 2
MaxNoOfConcurrentOperations: 10000
DataMemory: 40M
IndexMemory: 12M
Discless: 0
  
[COMPUTER]
Id: 1
ByteOrder: Little
HostName: BOX3
  
[COMPUTER]
Id: 2
ByteOrder: Little
HostName: BOX2
  
[COMPUTER]
Id: 3
ByteOrder: Little
HostName: BOX3
  
[COMPUTER]
Id: 4
ByteOrder: Little
HostName: localhost
  
[COMPUTER]
Id: 5
ByteOrder: Little
HostName: localhost
  
[COMPUTER]
Id: 6
ByteOrder: Little
HostName: localhost
  
[COMPUTER]
Id: 7
ByteOrder: Little
HostName: localhost
  
[MGM]
Id: 1
ExecuteOnComputer: 1
PortNumber: 2200
  
[DB]
Id: 2
ExecuteOnComputer: 2
FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-2-fs-2200
  
[DB]
Id: 3
ExecuteOnComputer: 3
FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-3-fs-2200
  
[API]
Id: 11
ExecuteOnComputer: 4
  
[API]
Id: 12
ExecuteOnComputer: 5
  
[API]
Id: 13
ExecuteOnComputer: 6
  
[API]
Id: 14
ExecuteOnComputer: 7
  
[TCP DEFAULT]
PortNumber: 2202
----------------------------------------
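
A quick way to double-check which host each node id ends up on is to follow the ExecuteOnComputer references in a config.ini like the one above. The following is a minimal sketch in Python under a few assumptions: the parser is hand-rolled (section names such as [COMPUTER] repeat, which standard INI parsers tend to reject or merge), the function names parse_sections and node_hosts are made up for illustration, and CONFIG_TEXT is a trimmed excerpt of the file above rather than the whole thing.

```python
# Trimmed excerpt of the config.ini above (only the keys needed here).
CONFIG_TEXT = """\
[COMPUTER]
Id: 1
HostName: BOX3

[COMPUTER]
Id: 2
HostName: BOX2

[COMPUTER]
Id: 3
HostName: BOX3

[COMPUTER]
Id: 4
HostName: localhost

[MGM]
Id: 1
ExecuteOnComputer: 1

[DB]
Id: 2
ExecuteOnComputer: 2

[DB]
Id: 3
ExecuteOnComputer: 3

[API]
Id: 11
ExecuteOnComputer: 4
"""


def parse_sections(text):
    """Return a list of (section_name, {key: value}) pairs, keeping duplicates."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def node_hosts(text):
    """Map (node_type, node_id) to the HostName of the computer it runs on."""
    sections = parse_sections(text)
    computers = {s["Id"]: s.get("HostName")
                 for name, s in sections if name == "COMPUTER"}
    result = {}
    for name, s in sections:
        # "DB DEFAULT" and "TCP DEFAULT" are skipped: only exact node sections count.
        if name in ("MGM", "DB", "API"):
            result[(name, s["Id"])] = computers.get(s.get("ExecuteOnComputer"))
    return result


if __name__ == "__main__":
    for (kind, node_id), host in sorted(node_hosts(CONFIG_TEXT).items()):
        print(f"{kind} node {node_id} -> {host}")
```

With the excerpt embedded here, DB node 2 resolves to BOX2 and DB node 3 resolves to BOX3, which can then be compared against the hosts the nodes are actually started on.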

Here is table status on BOX1 (database Engine type looks right, I guess):

----------------------------------------
mysql> SHOW TABLE STATUS;
+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
| Name      | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment                                   |
+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
| simpsons  | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |                                           |
+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
1 row in set (0.00 sec)
----------------------------------------

Here is table status on BOX2 (obviously, same as BOX1):

----------------------------------------
mysql> SHOW TABLE STATUS;
+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
| Name     | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
| simpsons | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |         |
+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)
----------------------------------------

Here is /etc/hosts file from BOX1:

----------------------------------------
[root@BOX1 3.ndb_db]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.211           BOX2
192.168.1.212           BOX1
192.168.1.213           BOX4
192.168.1.220           BOX3
----------------------------------------

Here is /etc/hosts file from BOX2:

----------------------------------------
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.211           BOX2
192.168.1.212           BOX1
192.168.1.213           BOX4
192.168.1.220           BOX3
----------------------------------------





		
