List: Cluster
From: Devananda
Date: July 19 2004 9:21pm
Subject: Re: API loses data during node restarts
I've been experiencing this same general problem, but haven't tried to 
narrow it down to a reproducible pattern. It seems to happen in relation 
to restarting a DB node, like Jim said.
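
For anyone who wants to narrow this down, here is a minimal sketch of a 
probe loop that repeats a query and records which error code each attempt 
returns. It is only an illustration: QueryError, probe and run_query are 
hypothetical names, not part of any MySQL driver, and the set of codes is 
just the three errors that show up in Jim's session below. You would have 
to wrap your own client call so that it raises QueryError with the 
server's error number.

```python
import time

# Error codes seen in the quoted session below (assumed to be transient
# while a DB node is restarting; that is a guess, not a confirmed fact).
TRANSIENT_ERRORS = {1015, 2006, 2013}


class QueryError(Exception):
    """Hypothetical stand-in for a MySQL driver's error type."""

    def __init__(self, code, message=""):
        super().__init__(f"ERROR {code}: {message}")
        self.code = code


def probe(run_query, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_query() up to `attempts` times and record each outcome.

    Returns a list of (attempt_number, error_code_or_None) tuples and
    stops at the first success. Errors outside TRANSIENT_ERRORS are
    re-raised so that unrelated failures are not hidden.
    """
    history = []
    for attempt in range(1, attempts + 1):
        try:
            run_query()
        except QueryError as exc:
            if exc.code not in TRANSIENT_ERRORS:
                raise
            history.append((attempt, exc.code))
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
        else:
            history.append((attempt, None))
            break
    return history
```

Running this against each API node while a DB node is stopped and 
restarted, and comparing the recorded histories, might show whether the 
failure follows the DB node or the MySQL server.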

Jim Hoadley wrote:

>When I stop/start or restart a database node, the API (MySQL server) loses
>connection with the data until the node comes back online. This only happens on
>one of my 2 nodes (BOX2). The other (BOX1) is fine. Been puzzling over this for
>a week or so. Something I missed? Please forward any suggestions. Details
>below.
>
>BOX1 = Pentium III/1000MHz/512MB RAM
>BOX2 = Pentium III/600MHz/512MB RAM
>Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
>Not a lot of RAM but only using a tiny test database at this point.
>Running the MGM on a separate computer (BOX4) to help isolate problem.
>
>Connected to BOX1, issue SELECT against test.simpsons and get proper response:
>
>----------------------------------------
>mysql> select * from simpsons ;
>+----+------------+
>| id | first_name |
>+----+------------+
>|  2 | Lisa       |
>|  4 | Homer      |
>|  5 | Maggie     |
>|  3 | Marge      |
>|  1 | Bart       |
>+----+------------+
>5 rows in set (0.03 sec)
>----------------------------------------
>
>Stop node 3 on BOX1. SELECT now fails:
>
>----------------------------------------
>mysql> select * from simpsons ;
>ERROR 1015: Can't lock file (errno: 4009)
>----------------------------------------
>
>Repeating SELECT fails:
>
>----------------------------------------
>mysql> select * from simpsons ;
>ERROR 2013: Lost connection to MySQL server during query
>----------------------------------------
>
>Repeating SELECT fails again, then succeeds after node 3 is restarted:
>
>----------------------------------------
>mysql> select * from simpsons ;
>ERROR 2006: MySQL server has gone away
>No connection. Trying to reconnect...
>Connection id:    1
>Current database: test
> 
>+----+------------+
>| id | first_name |
>+----+------------+
>|  2 | Lisa       |
>|  4 | Homer      |
>|  5 | Maggie     |
>|  3 | Marge      |
>|  1 | Bart       |
>+----+------------+
>5 rows in set (6.55 sec)
>----------------------------------------
>
>All data is intact. BTW new records added to node 2 on BOX2 while node 3 on
>BOX1 is down show up (this is good).
>
>Here's what restarting node 3 on BOX1 with mgmd looks like (looks right to me):
>
>----------------------------------------
>NDB> show
>Cluster Configuration
>---------------------
>2 NDB Node(s)
>DB node:        2  (Version: 3.5.0)
>DB node:        3  (Version: 3.5.0)
> 
>4 API Node(s)
>API node:       11  (not connected)
>API node:       12  (Version: 3.5.0)
>API node:       13  (not connected)
>API node:       14  (not connected)
> 
>1 MGM Node(s)
>MGM node:       1  (Version: 3.5.0)
> 
>NDB> 2 restart
>Executing RESTART on node 2.
>Database node 2 is being restarted.
> 
>NDB> 2 - endTakeOver
>----------------------------------------
>
>Here is the MySQL server error log output on BOX1 as node 3 is restarted:
>
>----------------------------------------
>040713 10:53:31  mysqld started
>040713 10:53:32  InnoDB: Started; log sequence number 0 44112
>/usr/local/mysql/libexec/mysqld: ready for connections.
>Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock' port: 3306
>2004-07-19 11:39:15 [NDB] INFO     -- Node shutdown initiated
>mysqld got signal 11;
>This could be because you hit a bug. It is also possible that this binary
>or one of the libraries it was linked against is corrupt, improperly built,
>or misconfigured. This error can also be caused by malfunctioning hardware.
>We will try our best to scrape up some info that will hopefully help diagnose
>the problem, but since we have already crashed, something is definitely wrong
>and this may fail.
> 
>key_buffer_size=16777216
>read_buffer_size=258048
>max_used_connections=1
>max_connections=100
>threads_connected=1
>It is possible that mysqld could use up to
>key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 92783 K
>bytes of memory
>Hope that's ok; if not, decrease some variables in the equation.
> 
>thd=0x8751280
>Attempting backtrace. You can use the following information to find out
>where mysqld died. If you see no messages after this, something went
>terribly wrong...
>Cannot determine thread, fp=0xafdc0c5c, backtrace may not be correct.
>Stack range sanity check OK, backtrace follows:
>0x81725e0
>0xb749de48
>0x81bc7a4
>0x81f21c0
>0x81bc7a4
>0x81bb8e8
>0x81b57e7
>0x81ad41e
>0x81adf45
>0x81aa3fa
>0x8187aba
>0x818d8cc
>0x81864b0
>0x8186056
>0x81857fa
>0xb7497dac
>0xb73c3a8a
>New value of fp=(nil) failed sanity check, terminating stack trace!
>Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow
>instructions on how to resolve the stack trace. Resolved
>stack trace is much more helpful in diagnosing the problem, so please do
>resolve it
>Trying to get some variables.
>Some pointers may be invalid and cause the dump to abort...
>thd->query at 0x875d768 = select * from simpsons
>thd->thread_id=7854
>The manual page at http://www.mysql.com/doc/en/Crashing.html contains
>information that should help you find out what is causing the crash.
> 
>Number of processes running now: 0
>040719 11:39:24  mysqld restarted
>InnoDB: Warning: we did not need to do crash recovery, but log scan
>InnoDB: progressed past the checkpoint lsn 0 44112 up to lsn 0 44146
>040719 11:39:24  InnoDB: Started; log sequence number 0 44112
>Restarting system
>2004-07-19 11:39:28 [NDB] INFO     -- Ndb has terminated (pid 18367) restarting
>2004-07-19 11:39:28 [NDB] INFO     -- Angel pid: 18366 ndb pid: 18428
>2004-07-19 11:39:28 [NDB] INFO     -- NDB Cluster -- DB node 2
>2004-07-19 11:39:28 [NDB] INFO     -- Version 3.5.0 (beta) --
>2004-07-19 11:39:28 [NDB] INFO     -- Start initiated (version 3.5.0)
>NR: setLcpActiveStatusEnd - m_participatingLQH
>2004-07-19 11:39:33 [NDB] INFO     -- Started (version 3.5.0)
>/usr/local/mysql/libexec/mysqld: ready for connections.
>Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
>----------------------------------------
>
>Here is the config.ini:
>
>----------------------------------------
>[DB DEFAULT]
>NoOfReplicas: 2
>MaxNoOfConcurrentOperations: 10000
>DataMemory: 40M
>IndexMemory: 12M
>Discless: 0
>  
>[COMPUTER]
>Id: 1
>ByteOrder: Little
>HostName: BOX3
>  
>[COMPUTER]
>Id: 2
>ByteOrder: Little
>HostName: BOX2
>  
>[COMPUTER]
>Id: 3
>ByteOrder: Little
>HostName: BOX3
>  
>[COMPUTER]
>Id: 4
>ByteOrder: Little
>HostName: localhost
>  
>[COMPUTER]
>Id: 5
>ByteOrder: Little
>HostName: localhost
>  
>[COMPUTER]
>Id: 6
>ByteOrder: Little
>HostName: localhost
>  
>[COMPUTER]
>Id: 7
>ByteOrder: Little
>HostName: localhost
>  
>[MGM]
>Id: 1
>ExecuteOnComputer: 1
>PortNumber: 2200
>  
>[DB]
>Id: 2
>ExecuteOnComputer: 2
>FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-2-fs-2200
>  
>[DB]
>Id: 3
>ExecuteOnComputer: 3
>FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-3-fs-2200
>  
>[API]
>Id: 11
>ExecuteOnComputer: 4
>  
>[API]
>Id: 12
>ExecuteOnComputer: 5
>  
>[API]
>Id: 13
>ExecuteOnComputer: 6
>  
>[API]
>Id: 14
>ExecuteOnComputer: 7
>  
>[TCP DEFAULT]
>PortNumber: 2202
>----------------------------------------
>
>Here is table status on BOX1 (database Engine type looks right, I guess):
>
>----------------------------------------
>mysql> SHOW TABLE STATUS;
>+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>| Name      | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment                                   |
>+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>| simpsons  | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |                                           |
>+-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>1 row in set (0.00 sec)
>----------------------------------------
>
>Here is table status on BOX2 (obviously, same as BOX1):
>
>----------------------------------------
>mysql> SHOW TABLE STATUS;
>+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>| Name     | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
>+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>| simpsons | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |         |
>+----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>1 row in set (0.00 sec)
>----------------------------------------
>
>Here is /etc/hosts file from BOX1:
>
>----------------------------------------
>[root@BOX1 3.ndb_db]# cat /etc/hosts
># Do not remove the following line, or various programs
># that require network functionality will fail.
>127.0.0.1               localhost.localdomain localhost
>192.168.1.211           BOX2
>192.168.1.212           BOX1
>192.168.1.213           BOX4
>192.168.1.220           BOX3
>----------------------------------------
>
>Here is /etc/hosts file from BOX2:
>
>----------------------------------------
># Do not remove the following line, or various programs
># that require network functionality will fail.
>127.0.0.1               localhost.localdomain localhost
>192.168.1.211           BOX2
>192.168.1.212           BOX1
>192.168.1.213           BOX4
>192.168.1.220           BOX3
>----------------------------------------
>