List: Cluster
From: Johan Andersson
Date: July 19 2004 9:32pm
Subject: Re: API loses data during node restarts
Hi,
A bug report (4585) relating to this has been filed.
Sorry for the inconvenience,

b.r,
Johan Andersson

Devananda wrote:

> I've been experiencing this same general problem, but haven't tried to 
> narrow it down to a reproducible pattern. Seems to happen in relation 
> to restarting a DB node, like Jim said.
>
> Jim Hoadley wrote:
>
>> When I stop/start or restart a database node, the API (MySQL server) loses
>> connection with the data until the node comes back online. This only happens on
>> one of my 2 nodes (BOX2). The other (BOX1) is fine. Been puzzling over this for
>> a week or so. Something I missed? Please forward any suggestions. Details
>> below.
>>
>> BOX1 = Pentium III/1000MHz/512MB RAM
>> BOX2 = Pentium III/600MHz/512MB RAM
>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
>> Not a lot of RAM but only using a tiny test database at this point.
>> Running the MGM on a separate computer (BOX4) to help isolate problem.
>>
>> Connected to BOX1, issue SELECT against test.simpsons and get proper response:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> +----+------------+
>> | id | first_name |
>> +----+------------+
>> |  2 | Lisa       |
>> |  4 | Homer      |
>> |  5 | Maggie     |
>> |  3 | Marge      |
>> |  1 | Bart       |
>> +----+------------+
>> 5 rows in set (0.03 sec)
>> ----------------------------------------
>>
>> Stop node 3 on BOX1. SELECT now fails:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 1015: Can't lock file (errno: 4009)
>> ----------------------------------------
>>
>> Repeating SELECT fails:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 2013: Lost connection to MySQL server during query
>> ----------------------------------------
>>
>> Repeating SELECT fails again, then succeeds after node 3 is restarted:
>>
>> ----------------------------------------
>> mysql> select * from simpsons ;
>> ERROR 2006: MySQL server has gone away
>> No connection. Trying to reconnect...
>> Connection id:    1
>> Current database: test
>>
>> +----+------------+
>> | id | first_name |
>> +----+------------+
>> |  2 | Lisa       |
>> |  4 | Homer      |
>> |  5 | Maggie     |
>> |  3 | Marge      |
>> |  1 | Bart       |
>> +----+------------+
>> 5 rows in set (6.55 sec)
>> ----------------------------------------
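
The transcript above shows three different errors (1015, 2013, 2006) before the query succeeds again. As a minimal client-side sketch, assuming an application simply wants to ride out such a window, a retry loop could look like the following. `QueryError`, `run_with_retry` and `make_flaky_query` are hypothetical names for illustration, not part of any MySQL driver, and treating these three codes as retryable is an assumption. This does not address the mysqld crash itself.

```python
import time

# Error codes copied from the transcript above. Treating them as
# retryable is an assumption, not documented driver behaviour.
RETRYABLE_CODES = {1015, 2006, 2013}


class QueryError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error code."""

    def __init__(self, code, message):
        super().__init__(f"ERROR {code}: {message}")
        self.code = code


def run_with_retry(run_query, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_query() until it succeeds or the attempts are used up.

    Non-retryable errors and the last failed attempt are re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except QueryError as err:
            if err.code not in RETRYABLE_CODES or attempt == attempts:
                raise
            sleep(delay)


def make_flaky_query():
    """Simulate the sequence in the transcript: three errors, then rows."""
    failures = [
        QueryError(1015, "Can't lock file (errno: 4009)"),
        QueryError(2013, "Lost connection to MySQL server during query"),
        QueryError(2006, "MySQL server has gone away"),
    ]
    rows = [(2, "Lisa"), (4, "Homer"), (5, "Maggie"), (3, "Marge"), (1, "Bart")]

    def query():
        if failures:
            raise failures.pop(0)
        return rows

    return query


if __name__ == "__main__":
    result = run_with_retry(make_flaky_query(), delay=0.0)
    print(len(result), "rows")
```

In a real client, `run_query` would wrap the driver call and translate the driver's own exception into something carrying the numeric error code.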
>>
>> All data is intact. BTW new records added to node 2 on BOX2 while node 3 on
>> BOX1 is down show up (this is good).
>>
>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks right to me):
>>
>> ----------------------------------------
>> NDB> show
>> Cluster Configuration
>> ---------------------
>> 2 NDB Node(s)
>> DB node:        2  (Version: 3.5.0)
>> DB node:        3  (Version: 3.5.0)
>>
>> 4 API Node(s)
>> API node:       11  (not connected)
>> API node:       12  (Version: 3.5.0)
>> API node:       13  (not connected)
>> API node:       14  (not connected)
>>
>> 1 MGM Node(s)
>> MGM node:       1  (Version: 3.5.0)
>>
>> NDB> 2 restart
>> Executing RESTART on node 2.
>> Database node 2 is being restarted.
>>
>> NDB> 2 - endTakeOver
>> ----------------------------------------
>>
>> Here is the MySQL server error log output on BOX1 as node 3 is restarted:
>>
>> ----------------------------------------
>> 040713 10:53:31  mysqld started
>> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
>> /usr/local/mysql/libexec/mysqld: ready for connections.
>> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
>> 2004-07-19 11:39:15 [NDB] INFO     -- Node shutdown initiated
>> mysqld got signal 11;
>> This could be because you hit a bug. It is also possible that this binary
>> or one of the libraries it was linked against is corrupt, improperly built,
>> or misconfigured. This error can also be caused by malfunctioning hardware.
>> We will try our best to scrape up some info that will hopefully help diagnose
>> the problem, but since we have already crashed, something is definitely wrong
>> and this may fail.
>>
>> key_buffer_size=16777216
>> read_buffer_size=258048
>> max_used_connections=1
>> max_connections=100
>> threads_connected=1
>> It is possible that mysqld could use up to
>> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 92783 K
>> bytes of memory
>> Hope that's ok; if not, decrease some variables in the equation.
>>
>> thd=0x8751280
>> Attempting backtrace. You can use the following information to find out
>> where mysqld died. If you see no messages after this, something went
>> terribly wrong...
>> Cannot determine thread, fp=0xafdc0c5c, backtrace may not be correct.
>> Stack range sanity check OK, backtrace follows:
>> 0x81725e0
>> 0xb749de48
>> 0x81bc7a4
>> 0x81f21c0
>> 0x81bc7a4
>> 0x81bb8e8
>> 0x81b57e7
>> 0x81ad41e
>> 0x81adf45
>> 0x81aa3fa
>> 0x8187aba
>> 0x818d8cc
>> 0x81864b0
>> 0x8186056
>> 0x81857fa
>> 0xb7497dac
>> 0xb73c3a8a
>> New value of fp=(nil) failed sanity check, terminating stack trace!
>> Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow
>> instructions on how to resolve the stack trace. Resolved
>> stack trace is much more helpful in diagnosing the problem, so please do
>> resolve it
>> Trying to get some variables.
>> Some pointers may be invalid and cause the dump to abort...
>> thd->query at 0x875d768 = select * from simpsons
>> thd->thread_id=7854
>> The manual page at http://www.mysql.com/doc/en/Crashing.html contains
>> information that should help you find out what is causing the crash.
>>
>> Number of processes running now: 0
>> 040719 11:39:24  mysqld restarted
>> InnoDB: Warning: we did not need to do crash recovery, but log scan
>> InnoDB: progressed past the checkpoint lsn 0 44112 up to lsn 0 44146
>> 040719 11:39:24  InnoDB: Started; log sequence number 0 44112
>> Restarting system
>> 2004-07-19 11:39:28 [NDB] INFO     -- Ndb has terminated (pid 18367) restarting
>> 2004-07-19 11:39:28 [NDB] INFO     -- Angel pid: 18366 ndb pid: 18428
>> 2004-07-19 11:39:28 [NDB] INFO     -- NDB Cluster -- DB node 2
>> 2004-07-19 11:39:28 [NDB] INFO     -- Version 3.5.0 (beta) --
>> 2004-07-19 11:39:28 [NDB] INFO     -- Start initiated (version 3.5.0)
>> NR: setLcpActiveStatusEnd - m_participatingLQH
>> 2004-07-19 11:39:33 [NDB] INFO     -- Started (version 3.5.0)
>> /usr/local/mysql/libexec/mysqld: ready for connections.
>> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
>> ----------------------------------------
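
The memory estimate in the crash log above can be checked by hand. The sketch below plugs the logged values into the logged formula. Note that `sort_buffer_size` is not printed in the log: the value 524280 used here is an assumption, picked only because it reproduces the logged total of 92783 K, and rounding down to whole K is likewise assumed.

```python
# Values printed in the crash log above.
KEY_BUFFER_SIZE = 16777216
READ_BUFFER_SIZE = 258048
MAX_CONNECTIONS = 100

# ASSUMPTION: sort_buffer_size is not printed in the log. 524280 bytes is
# a guess chosen because it reproduces the logged total.
SORT_BUFFER_SIZE = 524280


def max_memory_kib(key_buffer, read_buffer, sort_buffer, max_connections):
    """Evaluate the formula from the log and return whole K, rounded down."""
    total_bytes = key_buffer + (read_buffer + sort_buffer) * max_connections
    return total_bytes // 1024


if __name__ == "__main__":
    estimate = max_memory_kib(
        KEY_BUFFER_SIZE, READ_BUFFER_SIZE, SORT_BUFFER_SIZE, MAX_CONNECTIONS
    )
    print(estimate, "K")
```

With these inputs the total is 95010016 bytes, roughly 91 MB, which is well below the 512MB of RAM reported for both boxes.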
>>
>> Here is the config.ini:
>>
>> ----------------------------------------
>> [DB DEFAULT]
>> NoOfReplicas: 2
>> MaxNoOfConcurrentOperations: 10000
>> DataMemory: 40M
>> IndexMemory: 12M
>> Discless: 0
>>  
>> [COMPUTER]
>> Id: 1
>> ByteOrder: Little
>> HostName: BOX3
>>  
>> [COMPUTER]
>> Id: 2
>> ByteOrder: Little
>> HostName: BOX2
>>  
>> [COMPUTER]
>> Id: 3
>> ByteOrder: Little
>> HostName: BOX3
>>  
>> [COMPUTER]
>> Id: 4
>> ByteOrder: Little
>> HostName: localhost
>>  
>> [COMPUTER]
>> Id: 5
>> ByteOrder: Little
>> HostName: localhost
>>  
>> [COMPUTER]
>> Id: 6
>> ByteOrder: Little
>> HostName: localhost
>>  
>> [COMPUTER]
>> Id: 7
>> ByteOrder: Little
>> HostName: localhost
>>  
>> [MGM]
>> Id: 1
>> ExecuteOnComputer: 1
>> PortNumber: 2200
>>  
>> [DB]
>> Id: 2
>> ExecuteOnComputer: 2
>> FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-2-fs-2200
>>  
>> [DB]
>> Id: 3
>> ExecuteOnComputer: 3
>> FileSystemPath: /var/ndbcluster/mysql-test/ndbcluster/node-3-fs-2200
>>  
>> [API]
>> Id: 11
>> ExecuteOnComputer: 4
>>  
>> [API]
>> Id: 12
>> ExecuteOnComputer: 5
>>  
>> [API]
>> Id: 13
>> ExecuteOnComputer: 6
>>  
>> [API]
>> Id: 14
>> ExecuteOnComputer: 7
>>  
>> [TCP DEFAULT]
>> PortNumber: 2202
>> ----------------------------------------
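
One thing that can be checked mechanically in a config.ini like the one above is which host each [COMPUTER] entry points at. The sketch below is a hypothetical helper, not an NDB tool: it parses the `Key: Value` sections by hand (section names repeat, so a stock INI parser would not fit well) and reports hostnames shared by several computer ids, as well as hosts from /etc/hosts that the config never mentions. Only the Id and HostName lines are copied in. Whether anything it reports is a real misconfiguration or just a result of how hostnames were edited for the post cannot be determined from the message.

```python
from collections import defaultdict

# [COMPUTER] sections copied from the config.ini above (ByteOrder omitted).
CONFIG_TEXT = """\
[COMPUTER]
Id: 1
HostName: BOX3

[COMPUTER]
Id: 2
HostName: BOX2

[COMPUTER]
Id: 3
HostName: BOX3

[COMPUTER]
Id: 4
HostName: localhost

[COMPUTER]
Id: 5
HostName: localhost

[COMPUTER]
Id: 6
HostName: localhost

[COMPUTER]
Id: 7
HostName: localhost
"""

# Host names listed in the /etc/hosts files quoted in the message.
KNOWN_HOSTS = {"BOX1", "BOX2", "BOX3", "BOX4"}


def parse_sections(text):
    """Return a list of (section_name, {key: value}) pairs, in file order."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def hosts_by_computer(sections):
    """Map each HostName to the list of COMPUTER ids that use it."""
    hosts = defaultdict(list)
    for name, fields in sections:
        if name == "COMPUTER":
            hosts[fields["HostName"]].append(int(fields["Id"]))
    return dict(hosts)


def shared_hosts(hosts, ignore=("localhost",)):
    """Hostnames used by more than one computer id, ignoring localhost."""
    return {h: ids for h, ids in hosts.items() if len(ids) > 1 and h not in ignore}


if __name__ == "__main__":
    hosts = hosts_by_computer(parse_sections(CONFIG_TEXT))
    print("shared:", shared_hosts(hosts))
    print("never referenced:", sorted(KNOWN_HOSTS - set(hosts)))
```

For this input the script reports BOX3 as shared by computer ids 1 and 3, and BOX1 and BOX4 as never referenced, which may be worth comparing against the description of where the MGM and DB nodes run.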
>>
>> Here is table status on BOX1 (database Engine type looks right, I guess):
>>
>> ----------------------------------------
>> mysql> SHOW TABLE STATUS;
>> +-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>> | Name      | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment                                   |
>> +-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>> | simpsons  | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |                                           |
>> +-----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+-------------------------------------------+
>> 1 row in set (0.00 sec)
>> ----------------------------------------
>>
>> Here is table status on BOX2 (obviously, same as BOX1):
>>
>> ----------------------------------------
>> mysql> SHOW TABLE STATUS;
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | Name     | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> | simpsons | ndbcluster |       9 | Fixed      |  100 |              0 |           0 |            NULL |            0 |         0 |           NULL | NULL        | NULL        | NULL       | latin1_swedish_ci |     NULL |                |         |
>> +----------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-------------------+----------+----------------+---------+
>> 1 row in set (0.00 sec)
>> ----------------------------------------
>>
>> Here is /etc/hosts file from BOX1:
>>
>> ----------------------------------------
>> [root@BOX1 3.ndb_db]# cat /etc/hosts
>> # Do not remove the following line, or various programs
>> # that require network functionality will fail.
>> 127.0.0.1               localhost.localdomain localhost
>> 192.168.1.211           BOX2
>> 192.168.1.212           BOX1
>> 192.168.1.213           BOX4
>> 192.168.1.220           BOX3
>> ----------------------------------------
>>
>> Here is /etc/hosts file from BOX2:
>>
>> ----------------------------------------
>> # Do not remove the following line, or various programs
>> # that require network functionality will fail.
>> 127.0.0.1               localhost.localdomain localhost
>> 192.168.1.211           BOX2
>> 192.168.1.212           BOX1
>> 192.168.1.213           BOX4
>> 192.168.1.220           BOX3
>> ----------------------------------------
>>
>
