From: Jim Hoadley
Date: July 20 2004 12:22am
Subject: Re: API loses data during node restarts
Mikael --

Yes, the MySQL server reports "ERROR 1015: Can't lock file (errno: 4009)" the 
instant a database node running on the same computer as the API is stopped.
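
For anyone collecting these messages from a client log, here is a minimal sketch of how the error lines in this thread could be split into the MySQL error code and the trailing errno. The function name and the notes table are hypothetical; the only meaning recorded for 4009 is the one Mikael gives in his reply, and the table is not a complete list of NDB error codes.

```python
import re

# Matches mysql client error lines as they appear in this thread, e.g.
# "ERROR 1015: Can't lock file (errno: 4009)". The errno part is optional.
ERROR_LINE = re.compile(r"^ERROR (\d+): (.*?)(?: \(errno: (\d+)\))?$")

# Assumption: meaning taken from Mikael's reply in this thread only.
NDB_ERRNO_NOTES = {4009: "mysqld believes the cluster is down"}


def parse_error_line(line):
    """Return a dict describing one client error line, or None if no match."""
    m = ERROR_LINE.match(line.strip())
    if m is None:
        return None
    code, message, errno = m.groups()
    errno_value = int(errno) if errno else None
    return {
        "code": int(code),
        "message": message,
        "errno": errno_value,
        "note": NDB_ERRNO_NOTES.get(errno_value),
    }


if __name__ == "__main__":
    print(parse_error_line("ERROR 1015: Can't lock file (errno: 4009)"))
```

Lines without an errno suffix, such as the later "ERROR 2013" and "ERROR 2006" messages, come back with `errno` set to `None`.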

In thinking about this problem please remember that I consistently see this
behavior on one node but never on the other, and that the MySQL server API
always resumes serving cluster data once the database node starts back up. 
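
The once-per-second SELECT test described in the quoted message below could be sketched roughly as follows. This is not the shell script I actually ran; it is an illustrative Python version, and `probe`, `outage_windows` and the `run_query` callable are hypothetical names. `run_query` is assumed to execute the SELECT against one API and raise an exception when the query fails.

```python
import time


def probe(run_query, attempts, interval=1.0, sleep=time.sleep):
    """Call run_query() once per interval and record each outcome.

    Returns a list of (attempt_number, ok, detail) tuples, where detail
    holds the exception text for failed attempts.
    """
    results = []
    for n in range(1, attempts + 1):
        try:
            run_query()
            results.append((n, True, ""))
        except Exception as exc:
            results.append((n, False, str(exc)))
        if n < attempts:
            sleep(interval)
    return results


def outage_windows(results):
    """Collapse consecutive failed attempts into (first, last) ranges."""
    windows = []
    start = None
    last = None
    for n, ok, _ in results:
        if not ok:
            if start is None:
                start = n
            last = n
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows
```

Running the same probe against each API while a DB node is stopped and restarted would show whether the failed attempts line up with the node being down on one box only. In practice `run_query` might wrap the `mysql` command-line client with a `select * from test.simpsons` statement, but that wiring is left out here.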

I will follow your suggestion and file a bug report. Thanks!

-- Jim


--- Mikael Ronström wrote:
> Hi Jim,
> The error code you get, 4009, means that the mysql server believes that
> the cluster is down.
> It is related to the bug reported by Johan but not the same so I
> suggest you go ahead and file a new bug report to ensure that it gets 
> priority in development.
> 
> From what I could see in your email it seems as if one of the mysqld
> processes finds the cluster down (in error), which is reported with
> error code 4009. Then immediately after doing so the mysql server
> itself fails. Does that analysis seem correct?
> 
> Rgrds Mikael
> 
> 2004-07-20 kl. 00.36 skrev Jim Hoadley:
> 
> > Johan --
> >
> > Thanks for the fast response! I read bug report 4585. It says:
> >
> > -   Description:
> > -   If the entire DB cluster goes down, then the mysqld servers should
> > -   retry connecting to the DB. The mysql servers must not give up
> > -   trying to reconnect to DB nodes.
> > -
> > -   If the mysqld is not restarted after a cluster restart and a query
> > -   is executed on that mysqld, then the mysqld will crash. Not so nice.
> > -
> > -   How to repeat:
> > -   1. restart cluster
> > -   2. issue a query on one mysqld server
> > -
> > -   Suggested fix:
> > -   Let there be a configurable option (--ndbcluster_timeout) for how
> > -   long the mysqld should try to reconnect to the db nodes.
> > -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
> >
> > Not sure we're talking about the same issue. I'm not taking the entire
> > cluster down, just one of the nodes. In that case, shouldn't the API
> > seamlessly and instantly read from another node?
> >
> > 1) I have a 2-node cluster with 2 replicas, with an API running on each
> >    node.
> > 2) I run a shell script that connects to the first API and executes one
> >    SELECT query per second. I can stop either DB node and everything
> >    still works.
> > 3) I run the same script against the second API. I can stop the DB node
> >    on the *other* computer, but if I stop the DB node on the same
> >    computer that the API is running on, mysqld reports it can't get a
> >    lock on the data file until the node comes back up.
> > 4) When the node is started again the API begins answering queries again.
> >
> > Comments? Thanks again for taking the time to look at my problem.
> >
> > -- Jim
> >
> >
> > --- Johan Andersson <johan@stripped> wrote:
> >> Hi,
> >> A bug report (4585) relating to this has been filed.
> >> Sorry for your inconvenience,
> >>
> >> b.r,
> >> Johan Andersson
> >>
> >> Devananda wrote:
> >>
> >>> I've been experiencing this same general problem, but haven't tried to
> >>> narrow it down to a reproducible pattern. Seems to happen in relation
> >>> to restarting a DB node, like Jim said.
> >>>
> >>> Jim Hoadley wrote:
> >>>
> >>>> When I stop/start or restart a database node, the API (MySQL server)
> >>>> loses connection with the data until the node comes back online. This
> >>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine.
> >>>> Been puzzling over this for a week or so. Something I missed? Please
> >>>> forward any suggestions. Details below.
> >>>>
> >>>> BOX1 = Pentium III/1000MHz/512MB RAM
> >>>> BOX2 = Pentium III/600MHz/512MB RAM
> >>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
> >>>> Not a lot of RAM but only using a tiny test database at this point.
> >>>> Running the MGM on a separate computer (BOX4) to help isolate problem.
> >>>>
> >>>> Connected to BOX1, issue SELECT against test.simpsons and get proper
> >>>> response:
> >>>>
> >>>> ----------------------------------------
> >>>> mysql> select * from simpsons ;
> >>>> +----+------------+
> >>>> | id | first_name |
> >>>> +----+------------+
> >>>> |  2 | Lisa       |
> >>>> |  4 | Homer      |
> >>>> |  5 | Maggie     |
> >>>> |  3 | Marge      |
> >>>> |  1 | Bart       |
> >>>> +----+------------+
> >>>> 5 rows in set (0.03 sec)
> >>>> ----------------------------------------
> >>>>
> >>>> Stop node 3 on BOX1. SELECT now fails:
> >>>>
> >>>> ----------------------------------------
> >>>> mysql> select * from simpsons ;
> >>>> ERROR 1015: Can't lock file (errno: 4009)
> >>>> ----------------------------------------
> >>>>
> >>>> Repeating SELECT fails:
> >>>>
> >>>> ----------------------------------------
> >>>> mysql> select * from simpsons ;
> >>>> ERROR 2013: Lost connection to MySQL server during query
> >>>> ----------------------------------------
> >>>>
> >>>> Repeating SELECT fails again, then succeeds after node 3 is restarted:
> >>>>
> >>>> ----------------------------------------
> >>>> mysql> select * from simpsons ;
> >>>> ERROR 2006: MySQL server has gone away
> >>>> No connection. Trying to reconnect...
> >>>> Connection id:    1
> >>>> Current database: test
> >>>>
> >>>> +----+------------+
> >>>> | id | first_name |
> >>>> +----+------------+
> >>>> |  2 | Lisa       |
> >>>> |  4 | Homer      |
> >>>> |  5 | Maggie     |
> >>>> |  3 | Marge      |
> >>>> |  1 | Bart       |
> >>>> +----+------------+
> >>>> 5 rows in set (6.55 sec)
> >>>> ----------------------------------------
> >>>>
> >>>> All data is intact. BTW new records added to node 2 on BOX2 while node 3
> >>>> on BOX1 is down show up (this is good).
> >>>>
> >>>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks right
> >>>> to me):
> >>>>
> >>>> ----------------------------------------
> >>>> NDB> show
> >>>> Cluster Configuration
> >>>> ---------------------
> >>>> 2 NDB Node(s)
> >>>> DB node:        2  (Version: 3.5.0)
> >>>> DB node:        3  (Version: 3.5.0)
> >>>>
> >>>> 4 API Node(s)
> >>>> API node:       11  (not connected)
> >>>> API node:       12  (Version: 3.5.0)
> >>>> API node:       13  (not connected)
> >>>> API node:       14  (not connected)
> >>>>
> >>>> 1 MGM Node(s)
> >>>> MGM node:       1  (Version: 3.5.0)
> >>>>
> >>>> NDB> 2 restart
> >>>> Executing RESTART on node 2.
> >>>> Database node 2 is being restarted.
> >>>>
> >>>> NDB> 2 - endTakeOver
> >>>> ----------------------------------------
> >>>>
> >>>> Here is the MySQL server error log output on BOX1 as node 3 is restarted:
> >>>>
> >>>> ----------------------------------------
> >>>> 040713 10:53:31  mysqld started
> >>>> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
> >>>> /usr/local/mysql/libexec/mysqld: ready for connections.
> >>>> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'  port: 3306
> >>>> 2004-07-19 11:39:15 [NDB] INFO     -- Node shutdown initiated
> >>>> mysqld got signal 11;
> 
=== message truncated ===



		
