List: Cluster
From: Mikael Ronström
Date: July 19 2004 11:12pm
Subject: Re: API loses data during node restarts
Hi Jim,
The error code you get, 4009, means that the MySQL server believes that
the cluster is down.
It is related to the bug reported by Johan but is not the same, so I
suggest you go ahead and file a new bug report to ensure that it gets
priority in development.

From what I could see in your email, it seems that one of the mysqld
processes finds the cluster down (in error), which is reported with
error code 4009. Then, immediately after doing so, the MySQL server
itself fails. Does that analysis seem correct?
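
To make that sequence easier to spot in client output, here is a minimal
sketch in Python that sorts the error lines from the quoted transcript into
the two stages described above. The function name classify and the constant
names are hypothetical, and the interpretations are only a reading of this
thread, not an official error table.

```python
import re

# Error numbers copied from the mysql client transcript quoted in this thread.
CLUSTER_DOWN_ERRNO = 4009  # errno in parentheses: mysqld believes the cluster is down
LOST_CONNECTION = 2013     # "Lost connection to MySQL server during query"
SERVER_GONE = 2006         # "MySQL server has gone away"


def classify(line):
    """Return a rough diagnosis for one line of mysql client output."""
    line = line.strip()
    code_match = re.match(r"ERROR (\d+):", line)
    if code_match is None:
        return "not an error line"
    code = int(code_match.group(1))
    # The storage-engine errno, when present, is printed as "(errno: N)".
    errno_match = re.search(r"\(errno: (\d+)\)", line)
    errno = int(errno_match.group(1)) if errno_match else None
    if errno == CLUSTER_DOWN_ERRNO:
        return "mysqld believes the cluster is down"
    if code in (LOST_CONNECTION, SERVER_GONE):
        return "mysqld itself went away (crash or restart)"
    return "unclassified"
```

Feeding it the three failing responses from the transcript, in order, should
show the first stage once (errno 4009) followed by the second stage twice
(2013, then 2006), which is the pattern I am asking about.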

Rgrds Mikael

On 2004-07-20 at 00:36, Jim Hoadley wrote:

> Johan --
>
> Thanks for the fast response! I read bug report 4585. It says:
>
> -   Description:
> -   If entire DB cluster goes down, then the mysqld servers should retry
> -   connecting to the DB. The mysql servers must not give up trying to
> -   reconnect to DB nodes.
> -
> -   If the mysqld is not restarted after a cluster restart and a query is
> -   executed on that mysqld, then the mysqld will crash. Not so nice.
> -
> -   How to repeat:
> -   1. restart cluster
> -   2. issue a query on one mysqld server
> -
> -   Suggested fix:
> -   Let there be a configurable option (--ndbcluster_timeout) for how long
> -   the mysqld should try to reconnect to the db nodes.
> -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
>
> Not sure we're talking about the same issue. I'm not taking the entire
> cluster down, just one of the nodes. In that case, shouldn't the API
> seamlessly and instantly read from another node?
>
> 1) I have a 2-node cluster with 2 replicas, with an API running on each
>    node.
> 2) I run a shell script that connects to the first API and executes one
>    SELECT query per second. I can stop either DB node and everything
>    still works.
> 3) I run the same script against the second API. I can stop the DB node
>    on the *other* computer, but if I stop the DB node on the same
>    computer that the API is running on, mysqld reports it can't get a
>    lock on the data file until the node comes back up.
> 4) When the node is started again the API begins answering queries again.
>
> Comments? Thanks again for taking the time to look at my problem.
>
> -- Jim
>
>
> --- Johan Andersson <johan@stripped> wrote:
>> Hi,
>> A bug report (4585) relating to this has been filed.
>> Sorry for your inconvenience,
>>
>> b.r,
>> Johan Andersson
>>
>> Devananda wrote:
>>
>>> I've been experiencing this same general problem, but haven't tried to
>>> narrow it down to a reproducible pattern. Seems to happen in relation
>>> to restarting a DB node, like Jim said.
>>>
>>> Jim Hoadley wrote:
>>>
>>>> When I stop/start or restart a database node, the API (MySQL server)
>>>> loses connection with the data until the node comes back online. This
>>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine.
>>>> Been puzzling over this for a week or so. Something I missed? Please
>>>> forward any suggestions. Details below.
>>>>
>>>> BOX1 = Pentium III/1000MHz/512MB RAM
>>>> BOX2 = Pentium III/600MHz/512MB RAM
>>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
>>>> Not a lot of RAM but only using a tiny test database at this point.
>>>> Running the MGM on a separate computer (BOX4) to help isolate problem.
>>>>
>>>> Connected to BOX1, issue SELECT against test.simpsons and get proper response:
>>>>
>>>> ----------------------------------------
>>>> mysql> select * from simpsons ;
>>>> +----+------------+
>>>> | id | first_name |
>>>> +----+------------+
>>>> |  2 | Lisa       |
>>>> |  4 | Homer      |
>>>> |  5 | Maggie     |
>>>> |  3 | Marge      |
>>>> |  1 | Bart       |
>>>> +----+------------+
>>>> 5 rows in set (0.03 sec)
>>>> ----------------------------------------
>>>>
>>>> Stop node 3 on BOX1. SELECT now fails:
>>>>
>>>> ----------------------------------------
>>>> mysql> select * from simpsons ;
>>>> ERROR 1015: Can't lock file (errno: 4009)
>>>> ----------------------------------------
>>>>
>>>> Repeating SELECT fails:
>>>>
>>>> ----------------------------------------
>>>> mysql> select * from simpsons ;
>>>> ERROR 2013: Lost connection to MySQL server during query
>>>> ----------------------------------------
>>>>
>>>> Repeating SELECT fails again, then succeeds after node 3 is restarted:
>>>>
>>>> ----------------------------------------
>>>> mysql> select * from simpsons ;
>>>> ERROR 2006: MySQL server has gone away
>>>> No connection. Trying to reconnect...
>>>> Connection id:    1
>>>> Current database: test
>>>>
>>>> +----+------------+
>>>> | id | first_name |
>>>> +----+------------+
>>>> |  2 | Lisa       |
>>>> |  4 | Homer      |
>>>> |  5 | Maggie     |
>>>> |  3 | Marge      |
>>>> |  1 | Bart       |
>>>> +----+------------+
>>>> 5 rows in set (6.55 sec)
>>>> ----------------------------------------
>>>>
>>>> All data is intact. BTW new records added to node 2 on BOX2 while node 3 on
>>>> BOX1 is down show up (this is good).
>>>>
>>>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks right to me):
>>>>
>>>> ----------------------------------------
>>>> NDB> show
>>>> Cluster Configuration
>>>> ---------------------
>>>> 2 NDB Node(s)
>>>> DB node:        2  (Version: 3.5.0)
>>>> DB node:        3  (Version: 3.5.0)
>>>>
>>>> 4 API Node(s)
>>>> API node:       11  (not connected)
>>>> API node:       12  (Version: 3.5.0)
>>>> API node:       13  (not connected)
>>>> API node:       14  (not connected)
>>>>
>>>> 1 MGM Node(s)
>>>> MGM node:       1  (Version: 3.5.0)
>>>>
>>>> NDB> 2 restart
>>>> Executing RESTART on node 2.
>>>> Database node 2 is being restarted.
>>>>
>>>> NDB> 2 - endTakeOver
>>>> ----------------------------------------
>>>>
>>>> Here is the MySQL server error log output on BOX1 as node 3 is restarted:
>>>>
>>>> ----------------------------------------
>>>> 040713 10:53:31  mysqld started
>>>> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
>>>> /usr/local/mysql/libexec/mysqld: ready for connections.
>>>> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
>>>> 2004-07-19 11:39:15 [NDB] INFO     -- Node shutdown initiated
>>>> mysqld got signal 11;
>>>> This could be because you hit a bug. It is also possible that this binary
>>>> or one of the libraries it was linked against is corrupt, improperly built,
>>>> or misconfigured. This error can also be caused by malfunctioning hardware.
>>>> We will try our best to scrape up some info that will hopefully help diagnose
>>>> the problem, but since we have already crashed, something is definitely wrong
>>>> and this may fail.
>>>>
>>>> key_buffer_size=16777216
>>>> read_buffer_size=258048
>>>> max_used_connections=1
>>>> max_connections=100
>>>> threads_connected=1
>>>> It is possible that mysqld could use up to
>>>> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 92783 K
>>>> bytes of memory
>>>> Hope that's ok; if not, decrease some variables in the equation.
>>>>
>>>> thd=0x8751280
>>>> Attempting backtrace. You can use the following information to find out
>>>> where mysqld died. If you see no messages after this, something went
>>>> terribly wrong...
>>>> Cannot determine thread, fp=0xafdc0c5c, backtrace may not be correct.
>>>> Stack range sanity check OK, backtrace follows:
>>>> 0x81725e0
>>>> 0xb749de48
>>>> 0x81bc7a4
>>>> 0x81f21c0
>>>> 0x81bc7a4
>>>> 0x81bb8e8
>>>> 0x81b57e7
>>>> 0x81ad41e
>>>> 0x81adf45
>>>> 0x81aa3fa
>>>> 0x8187aba
>>>> 0x818d8cc
>>>> 0x81864b0
>>>> 0x8186056
>>>> 0x81857fa
>>>> 0xb7497dac
>>>> 0xb73c3a8a
>>>> New value of fp=(nil) failed sanity check, terminating stack trace!
>>>> Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow
>>>> instructions on how to resolve the stack trace. Resolved
>>>> stack trace is much more helpful in diagnosing the problem, so please do
>>>> resolve it
>>>> Trying to get some variables.
>>>> Some pointers may be invalid and cause the dump to abort...
>>>> thd->query at 0x875d768 = select * from simpsons
>>>> thd->thread_id=7854
>>>> The manual page at http://www.mysql.com/doc/en/Crashing.html contains
>>>> information that should help you find out what is causing the crash.
>>>>
>>>> Number of processes running now: 0
>>>> 040719 11:39:24  mysqld restarted
>>>> InnoDB: Warning: we did not need to do crash recovery, but log scan
>>>> InnoDB: progressed past the checkpoint lsn 0 44112 up to lsn 0 44146
>>>> 040719 11:39:24  InnoDB: Started; log sequence number 0 44112
>>>> Restarting system
>>>> 2004-07-19 11:39:28 [NDB] INFO     -- Ndb has terminated (pid 18367) restarting
>>>> 2004-07-19 11:39:28 [NDB] INFO     -- Angel pid: 18366 ndb pid: 18428
>>>> 2004-07-19 11:39:28 [NDB] INFO     -- NDB Cluster -- DB node 2
>>>> 2004-07-19 11:39:28 [NDB] INFO     -- Version 3.5.0 (beta) --
>>>> 2004-07-19 11:39:28 [NDB] INFO     -- Start initiated (version 3.5.0)
>>>> NR: setLcpActiveStatusEnd - m_participatingLQH
>>>> 2004-07-19 11:39:33 [NDB] INFO     -- Started (version 3.5.0)
>>>> /usr/local/mysql/libexec/mysqld: ready for connections.
>>
> === message truncated ===
>
Mikael Ronström, Senior Software Architect
MySQL AB, www.mysql.com

Clustering:
http://www.infoworld.com/article/04/04/14/HNmysqlcluster_1.html

http://www.eweek.com/article2/0,1759,1567546,00.asp

