Johan --
Johan> And it does not reconnect when the cluster is up.
Johan> [...]
Johan> This also means that the mysqld server must be able to reconnect to ndb
Johan> cluster storage engine.
FYI in my case it does reconnect. After a "[nodeid] restart" it resumes serving
data within 8 seconds.
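
In case anyone wants to time that recovery window on their own setup, here is a minimal sketch of the kind of once-per-second probe I'm running. It is an illustration, not my actual script: `measure_recovery` and its parameters are hypothetical names, and `run_query` stands for whatever callable issues the SELECT and raises on failure.

```python
import time
from typing import Callable, Optional


def measure_recovery(run_query: Callable[[], object],
                     interval: float = 1.0,
                     max_wait: float = 60.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll run_query until it succeeds.

    Returns the seconds elapsed until the first successful query,
    or None if max_wait passes without a success.
    """
    start = clock()
    while True:
        try:
            run_query()              # any exception counts as "still down"
            return clock() - start
        except Exception:
            if clock() - start >= max_wait:
                return None          # gave up waiting
            sleep(interval)          # wait before the next probe
```

To point it at a real server, `run_query` could wrap something like `subprocess.run(["mysql", "-e", "select * from simpsons", "test"], check=True)`; that wiring is an assumption about the local setup and is untested here.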
Thanks again.
-- Jim
--- Johan Andersson <johan@stripped> wrote:
>
>
> Mikael Ronström wrote:
> > From what I could see in your email it seems as if one of the mysqld
> > processes finds the cluster down (in error), which is reported with
> > error code 4009. Then immediately after doing so the mysql server
> > itself fails. Does that analysis seem correct?
>
> The first query would generate error code 4009 and a subsequent query
> generates a mysqld crash. This is also the case with bug 4585 when the
> cluster goes down. The table handler does not gracefully clean up after
> connection loss. And it does not reconnect when the cluster is up.
> Either way, the mysqld server must never crash, since there may be data
> stored in other storage engines (InnoDB, MyISAM). This
> also means that the mysqld server must be able to reconnect to ndb
> cluster storage engine.
>
> b.r.,
> johan
>
> > Rgrds Mikael
> >
> > 2004-07-20 kl. 00.36 skrev Jim Hoadley:
> >
> >> Johan --
> >>
> >> Thanks for the fast response! I read bug report 4585. It says:
> >>
> >> - Description:
> >> - If entire DB cluster goes down, then the mysqld servers should retry
> >> - connecting to the DB. The mysql servers must not give up trying to
> >> - reconnect to DB nodes.
> >> -
> >> - If the mysqld is not restarted after a cluster restart and a query
> >> - is executed on that mysqld, then the mysqld will crash. Not so nice.
> >> -
> >> - How to repeat:
> >> - 1. restart cluster
> >> - 2. issue a query on one mysqld server
> >> -
> >> - Suggested fix:
> >> - Let be there be a configurable option (--ndbcluster_timeout) for how
> >> - long the mysqld should try to reconnect to the db nodes.
> >> - --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
> >>
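For what it's worth, the retry semantics proposed in that bug report (a timeout, with 0 meaning "retry forever") can be sketched as below. This only illustrates the proposal: `--ndbcluster_timeout` is a suggested option, not one I know to exist, and `connect_with_retry`, `connect`, and the other names are hypothetical, not anything from the mysqld source.

```python
import time


def connect_with_retry(connect, timeout_s=0, interval_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds and return its result.

    timeout_s == 0 means retry forever, mirroring the semantics the bug
    report proposes for --ndbcluster_timeout (a proposal, not a real flag).
    Raises TimeoutError once a non-zero timeout has elapsed.
    """
    deadline = None if timeout_s == 0 else clock() + timeout_s
    while True:
        try:
            return connect()
        except ConnectionError as exc:
            # Give up only when a finite deadline exists and has passed.
            if deadline is not None and clock() >= deadline:
                raise TimeoutError(f"gave up after {timeout_s}s") from exc
            sleep(interval_s)
```

The point of the sketch is that a failed connection attempt becomes an error the caller can handle, rather than something that takes the whole server down.
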
> >> Not sure we're talking about the same issue. I'm not taking the entire
> >> cluster down, just one of the nodes. In that case, shouldn't the API
> >> seamlessly and instantly read from another node?
> >>
> >> 1) I have a 2-node cluster with 2 replicas, with an API running on each
> >>    node.
> >> 2) I run a shell script that connects to the first API and executes one
> >>    SELECT query per second. I can stop either DB node and everything
> >>    still works.
> >> 3) I run the same script against the second API. I can stop the DB node
> >>    on the *other* computer, but if I stop the DB node on the same
> >>    computer that the API is running on, mysqld reports it can't get a
> >>    lock on the data file until the node comes back up.
> >> 4) When the node is started again the API begins answering queries
> >>    again.
> >>
> >> Comments? Thanks again for taking the time to look at my problem.
> >>
> >> -- Jim
> >>
> >>
> >> --- Johan Andersson <johan@stripped> wrote:
> >>
> >>> Hi,
> >>> A bug report (4585) relating to this has been filed.
> >>> Sorry for your inconvenience,
> >>>
> >>> b.r,
> >>> Johan Andersson
> >>>
> >>> Devananda wrote:
> >>>
> >>>> I've been experiencing this same general problem, but haven't tried to
> >>>> narrow it down to a reproducible pattern. Seems to happen in relation
> >>>> to restarting a DB node, like Jim said.
> >>>>
> >>>> Jim Hoadley wrote:
> >>>>
> >>>>> When I stop/start or restart a database node, the API (MySQL server)
> >>>>> loses connection with the data until the node comes back online. This
> >>>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine.
> >>>>> Been puzzling over this for a week or so. Something I missed? Please
> >>>>> forward any suggestions. Details below.
> >>>>>
> >>>>> BOX1 = Pentium III/1000MHz/512MB RAM
> >>>>> BOX2 = Pentium III/600MHz/512MB RAM
> >>>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
> >>>>> Not a lot of RAM but only using a tiny test database at this point.
> >>>>> Running the MGM on a separate computer (BOX4) to help isolate the
> >>>>> problem.
> >>>>>
> >>>>> Connected to BOX1, issue SELECT against test.simpsons and get proper
> >>>>> response:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> +----+------------+
> >>>>> | id | first_name |
> >>>>> +----+------------+
> >>>>> |  2 | Lisa       |
> >>>>> |  4 | Homer      |
> >>>>> |  5 | Maggie     |
> >>>>> |  3 | Marge      |
> >>>>> |  1 | Bart       |
> >>>>> +----+------------+
> >>>>> 5 rows in set (0.03 sec)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Stop node 3 on BOX1. SELECT now fails:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 1015: Can't lock file (errno: 4009)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Repeating SELECT fails:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 2013: Lost connection to MySQL server during query
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Repeating SELECT fails again, then succeeds after node 3 is
> >>>>> restarted:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> mysql> select * from simpsons ;
> >>>>> ERROR 2006: MySQL server has gone away
> >>>>> No connection. Trying to reconnect...
> >>>>> Connection id: 1
> >>>>> Current database: test
> >>>>>
> >>>>> +----+------------+
> >>>>> | id | first_name |
> >>>>> +----+------------+
> >>>>> |  2 | Lisa       |
> >>>>> |  4 | Homer      |
> >>>>> |  5 | Maggie     |
> >>>>> |  3 | Marge      |
> >>>>> |  1 | Bart       |
> >>>>> +----+------------+
> >>>>> 5 rows in set (6.55 sec)
> >>>>> ----------------------------------------
> >>>>>
> >>>>> All data is intact. BTW new records added to node 2 on BOX2 while
> >>>>> node 3 on BOX1 is down show up (this is good).
> >>>>>
> >>>>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks
> >>>>> right to me):
> >>>>>
> >>>>> ----------------------------------------
> >>>>> NDB> show
> >>>>> Cluster Configuration
> >>>>> ---------------------
> >>>>> 2 NDB Node(s)
> >>>>> DB node: 2 (Version: 3.5.0)
> >>>>> DB node: 3 (Version: 3.5.0)
> >>>>>
> >>>>> 4 API Node(s)
> >>>>> API node: 11 (not connected)
> >>>>> API node: 12 (Version: 3.5.0)
> >>>>> API node: 13 (not connected)
> >>>>> API node: 14 (not connected)
> >>>>>
> >>>>> 1 MGM Node(s)
> >>>>> MGM node: 1 (Version: 3.5.0)
> >>>>>
> >>>>> NDB> 2 restart
> >>>>> Executing RESTART on node 2.
> >>>>> Database node 2 is being restarted.
> >>>>>
> >>>>> NDB> 2 - endTakeOver
> >>>>> ----------------------------------------
> >>>>>
> >>>>> Here is the MySQL server error log output on BOX1 as node 3 is
> >>>>> restarted:
> >>>>>
> >>>>> ----------------------------------------
> >>>>> 040713 10:53:31 mysqld started
> >>>>> 040713 10:53:32 InnoDB: Started; log sequence number 0 44112
>
=== message truncated ===