List: Cluster
From: Jim Hoadley  Date: July 20 2004 12:34am
Subject:Re: API loses data during node restarts
Johan --

Johan> And it does not reconnect when the cluster is up.
Johan> [...]
Johan> This also means that the mysqld server must be able to reconnect to ndb
Johan> cluster storage engine.

FYI in my case it does reconnect. After a "[nodeid] restart" it resumes serving
data within 8 seconds.

Thanks again.

-- Jim

--- Johan Andersson <johan@stripped> wrote:
> 
> 
> Mikael Ronström wrote:
> > From what I could see in your email it seems as if one of the mysqld 
> > process finds the cluster down
> > (in error) which is reported with error code 4009. Then immediately 
> > after doing so the mysql server
> > itself fails. Does that analysis seem correct?
> 
> The first query would generate error code 4009 and a subsequent query 
> generates a mysqld crash. This is also the case with bug 4585 when the 
> cluster goes down: the table handler does not gracefully clean up after 
> connection loss. And it does not reconnect when the cluster is up. In 
> either case, the mysqld server must never crash, since there may be data 
> stored in other storage engines (InnoDB, MyISAM). This 
> also means that the mysqld server must be able to reconnect to ndb 
> cluster storage engine.
> 
> b.r.,
> johan
> 
> > Rgrds Mikael
> >
> > 2004-07-20 kl. 00.36 skrev Jim Hoadley:
> >
> >> Johan --
> >>
> >> Thanks for the fast response! I read bug report 4585. It says:
> >>
> >> -   Description:
> >> -   If entire DB cluster goes down, then the mysqld servers should retry
> >> -   connecting to the DB. The mysql servers must not give up trying to
> >> -   reconnect to DB nodes.
> >> -
> >> -   If the mysqld is not restarted after a cluster restart and a query is
> >> -   executed on that mysqld, then the mysqld will crash. Not so nice.
> >> -
> >> -   How to repeat:
> >> -   1. restart cluster
> >> -   2. issue a query on one mysqld server
> >> -
> >> -   Suggested fix:
> >> -   Let there be a configurable option (--ndbcluster_timeout) for how long
> >> -   the mysqld should try to reconnect to the db nodes.
> >> -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
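[Editor's sketch, not part of the original message.] The retry semantics suggested in the bug report (keep trying to reconnect, with 0 meaning "retry forever") can be illustrated roughly as below. This is only a minimal sketch in Python, not how mysqld or the NDB handler implements anything. The names reconnect, connect, timeout_s and retry_interval_s are hypothetical, and the connect callable stands in for whatever actually opens the connection to the DB nodes.

```python
import time


def reconnect(connect, timeout_s=0, retry_interval_s=1.0,
              sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or timeout_s seconds have passed.

    timeout_s == 0 means "retry forever", mirroring the semantics proposed
    for --ndbcluster_timeout in the bug report. Everything here is a
    hypothetical illustration, not the server's real code.

    Returns a tuple (connection, number_of_attempts).
    """
    # No deadline at all when the caller asks for endless retries.
    deadline = None if timeout_s == 0 else clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError:
            # Give up only if a deadline exists and has been reached.
            if deadline is not None and clock() >= deadline:
                raise TimeoutError(
                    "gave up after %d attempts" % attempts)
            sleep(retry_interval_s)
```

The sleep and clock parameters are injectable only so the loop can be exercised without real waiting. A caller would normally pass just connect and, optionally, a timeout.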
> >>
> >> Not sure we're talking about the same issue. I'm not taking the entire
> >> cluster down, just one of the nodes. In that case, shouldn't the API
> >> seamlessly and instantly read from another node?
> >>
> >> 1) I have a 2-node cluster with 2 replicas, with an API running on each
> >>    node.
> >> 2) I run a shell script that connects to the first API and executes one
> >>    SELECT query per second. I can stop either DB node and everything
> >>    still works.
> >> 3) I run the same script against the second API. I can stop the DB node
> >>    on the *other* computer, but if I stop the DB node on the same
> >>    computer that the API is running on, mysqld reports it can't get a
> >>    lock on the data file until the node comes back up.
> >> 4) When the node is started again the API begins answering queries again.
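[Editor's sketch, not part of the original message.] The test described in the steps above (one query per second while a DB node is stopped and restarted) could be approximated with a small probe loop like the one below. Jim's actual shell script is not shown in the thread, so this is an assumption-laden stand-in: probe, longest_outage and the run_query callable are hypothetical names, and run_query would need to wrap a real client call to the API node under test.

```python
import time


def probe(run_query, count, interval_s=1.0, sleep=time.sleep):
    """Run run_query() `count` times, pausing interval_s between runs.

    Returns a list of (index, ok, error_text) tuples so that failures
    during a node restart can be lined up against the restart time.
    run_query is a hypothetical callable supplied by the caller.
    """
    results = []
    for i in range(count):
        try:
            run_query()
            results.append((i, True, ""))
        except Exception as exc:  # record any failure, keep probing
            results.append((i, False, str(exc)))
        if i < count - 1:
            sleep(interval_s)
    return results


def longest_outage(results):
    """Return the longest run of consecutive failed probes."""
    longest = current = 0
    for _, ok, _ in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```

With a one-second interval, the value from longest_outage gives a rough outage length in seconds, which is the kind of figure quoted elsewhere in the thread.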
> >>
> >> Comments? Thanks again for taking the time to look at my problem.
> >>
> >> -- Jim
> >>
> >>
> >> --- Johan Andersson <johan@stripped> wrote:
> >>
> >>> Hi,
> >>> A bug report (4585) relating to this has been filed.
> >>> Sorry for your inconvenience,
> >>>
> >>> b.r,
> >>> Johan Andersson
> >>>
> >>> Devananda wrote:
> >>>
> >>>> I've been experiencing this same general problem, but haven't tried to
> >>>> narrow it down to a reproducible pattern. Seems to happen in relation
> >>>> to restarting a DB node, like Jim said.
> >>>>
> >>>> Jim Hoadley wrote:
> >>>>
> >>>>> When I stop/start or restart a database node, the API (MySQL server)
> >>>>> loses connection with the data until the node comes back online. This
> >>>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine.
> >>>>> Been puzzling over this for a week or so. Something I missed? Please
> >>>>> forward any suggestions. Details below.
> >>>>>
> >>>>> BOX1 = Pentium III/1000MHz/512MB RAM
> >>>>> BOX2 = Pentium III/600MHz/512MB RAM
> >>>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
> >>>>> Not a lot of RAM but only using a tiny test database at this point.
> >>>>> Running the MGM on a separate computer (BOX4) to help isolate the
> >>>>> problem.
> >>>>>
> >>>>> Connected to BOX1, issue SELECT against test.simpsons and get proper
> >>>>> response:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> +----+------------+
> >>>>> | id | first_name |
> >>>>> +----+------------+
> >>>>> |  2 | Lisa       |
> >>>>> |  4 | Homer      |
> >>>>> |  5 | Maggie     |
> >>>>> |  3 | Marge      |
> >>>>> |  1 | Bart       |
> >>>>> +----+------------+
> >>>>> 5 rows in set (0.03 sec)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Stop node 3 on BOX1. SELECT now fails:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 1015: Can't lock file (errno: 4009)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Repeating SELECT fails:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 2013: Lost connection to MySQL server during query
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Repeating SELECT fails again, then succeeds after node 3 is 
> >>>>> restarted:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 2006: MySQL server has gone away
> >>>>> No connection. Trying to reconnect...
> >>>>> Connection id:    1
> >>>>> Current database: test
> >>>>>
> >>>>> +----+------------+
> >>>>> | id | first_name |
> >>>>> +----+------------+
> >>>>> |  2 | Lisa       |
> >>>>> |  4 | Homer      |
> >>>>> |  5 | Maggie     |
> >>>>> |  3 | Marge      |
> >>>>> |  1 | Bart       |
> >>>>> +----+------------+
> >>>>> 5 rows in set (6.55 sec)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> All data is intact. BTW new records added to node 2 on BOX2 while
> >>>>> node 3 on BOX1 is down show up (this is good).
> >>>>>
> >>>>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks
> >>>>> right to me):
> >>>>>
> >>>>> ----------------------------------------
> >>>>> NDB> show
> >>>>> Cluster Configuration
> >>>>> ---------------------
> >>>>> 2 NDB Node(s)
> >>>>> DB node:        2  (Version: 3.5.0)
> >>>>> DB node:        3  (Version: 3.5.0)
> >>>>>
> >>>>> 4 API Node(s)
> >>>>> API node:       11  (not connected)
> >>>>> API node:       12  (Version: 3.5.0)
> >>>>> API node:       13  (not connected)
> >>>>> API node:       14  (not connected)
> >>>>>
> >>>>> 1 MGM Node(s)
> >>>>> MGM node:       1  (Version: 3.5.0)
> >>>>>
> >>>>> NDB> 2 restart
> >>>>> Executing RESTART on node 2.
> >>>>> Database node 2 is being restarted.
> >>>>>
> >>>>> NDB> 2 - endTakeOver
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Here is the MySQL server error log output on BOX1 as node 3 is
> >>>>> restarted:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> 040713 10:53:31  mysqld started
> >>>>> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
> 
=== message truncated ===
