List: Cluster
From: Jim Hoadley
Date: July 22 2004 12:13am
Subject: Re: API loses data during node restarts
Mikael and Johan --

I changed HeartbeatIntervalDbApi as suggested and saw no improvement.

I reinstalled mysql-4.1.3-beta-nightly-20040628 and the problem began occurring
on both nodes. So my earlier statement is no longer true:

> > I consistently see this
> > behavior on one node but never on the other

Next, I deleted MySQL and the Cluster software and reinstalled both using a new
code snapshot: mysql-4.1.4-beta-nightly-20040721.

Same results. Happens for either node. The MySQL server reports "ERROR 1015:
Can't lock file (errno: 4009)" the instant a DB node running on the same
computer as the API is stopped, and MySQL resumes serving data when the node
comes back up. 

So I'm looking for help, again.
I reported this as Bug#4641.

Thanks.

-- Jim

--- Mikael Ronström wrote:
> Hi Jim,
> When you say that it only happens to the mysql server executing on the
> same machine as the crashing node, it gives me an idea that it might be
> a timing issue. It could be that your machine is so busy writing error
> logs while stopping the storage node that the mysql server on the same
> machine doesn't get a chance to send heartbeats.
>
> If this is true, you should change the config parameter
> HeartbeatIntervalDbApi. Its default setting is 1500 milliseconds, which
> means that a heartbeat must be responded to within 4.5-6.0 seconds.
> Normally this is not a problem, but if the machine is very busy and
> starts swapping processes in and out, it can sometimes be a problem. So
> you could try setting this to something like 10000 milliseconds and see
> if that helps.
> 
> Rgrds Mikael
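Mikael's numbers imply that an API node is treated as dead after roughly three to four missed heartbeat intervals (4.5 s / 1.5 s = 3 and 6.0 s / 1.5 s = 4). The Python sketch below only does that arithmetic. The 3x and 4x multipliers are inferred from the figures in his message, not taken from the NDB source, and `detection_window_ms` is a hypothetical helper.

```python
def detection_window_ms(heartbeat_interval_ms: int) -> tuple[int, int]:
    """Return (min, max) milliseconds before a silent API node is declared dead.

    Assumption: the window spans 3 to 4 heartbeat intervals, inferred from
    the "1500 ms -> 4.5-6.0 s" figures in the message above.
    """
    if heartbeat_interval_ms <= 0:
        raise ValueError("heartbeat interval must be positive")
    return 3 * heartbeat_interval_ms, 4 * heartbeat_interval_ms


# Compare the default setting with the value suggested in the thread.
for interval in (1500, 10000):
    lo, hi = detection_window_ms(interval)
    print(f"HeartbeatIntervalDbApi={interval} ms -> {lo / 1000:.1f}-{hi / 1000:.1f} s")
```

Under that assumption, the 10000 ms setting Mikael suggests widens the window to roughly 30-40 seconds, giving a busy or swapping host more time to answer before it is disconnected.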
> 
> On 2004-07-20 at 02.22, Jim Hoadley wrote:
> 
> > Mikael --
> >
> > Yes, the MySQL server reports "ERROR 1015: Can't lock file (errno: 4009)"
> > the instant a database node running on the same computer as the API is
> > stopped.
> >
> > In thinking about this problem, please remember that I consistently see
> > this behavior on one node but never on the other, and that the MySQL
> > server API always resumes serving cluster data once the database node
> > starts back up.
> >
> > I will follow your suggestion and file a bug report. Thanks!
> >
> > -- Jim
> >
> >
> > --- Mikael Ronström wrote:
> >> Hi Jim,
> >> The error code you get, 4009, means that the mysql server believes that
> >> the cluster is down. It is related to the bug reported by Johan but not
> >> the same, so I suggest you go ahead and file a new bug report to ensure
> >> that it gets priority in development.
> >>
> >> From what I could see in your email, it seems as if one of the mysqld
> >> processes finds the cluster down (in error), which is reported with
> >> error code 4009. Then, immediately after doing so, the mysql server
> >> itself fails. Does that analysis seem correct?
> >>
> >> Rgrds Mikael
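The client errors in this thread follow the mysql command-line format `ERROR <code>: <message>`. A small helper like the hypothetical `classify_error` below could tag each probe result, separating the 4009 case Mikael describes (mysqld believes the cluster is down) from the client losing its connection (2013, 2006). The labels are this sketch's own, not MySQL or NDB terminology.

```python
import re

# Matches lines such as "ERROR 1015: Can't lock file (errno: 4009)".
_ERROR_RE = re.compile(r"ERROR (\d+): (.*?)(?: \(errno: (\d+)\))?$")

# Hypothetical labels for the client-side codes seen in this thread.
_CLIENT_CODES = {
    2006: "server gone away",
    2013: "lost connection during query",
}


def classify_error(line: str) -> str:
    """Return a coarse label for one mysql client error line."""
    match = _ERROR_RE.match(line.strip())
    if match is None:
        return "unrecognized"
    code = int(match.group(1))
    errno = int(match.group(3)) if match.group(3) else None
    if code == 1015 and errno == 4009:
        # Mikael's reading: mysqld believes the cluster is down.
        return "cluster reported down"
    return _CLIENT_CODES.get(code, f"other error {code}")
```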
> >>
> >> On 2004-07-20 at 00.36, Jim Hoadley wrote:
> >>
> >>> Johan --
> >>>
> >>> Thanks for the fast response! I read bug report 4585. It says:
> >>>
> >>> -   Description:
> >>> -   If entire DB cluster goes down, then the mysqld servers should retry
> >>> -   connecting to the DB. The mysql servers must not give up trying to
> >>> -   reconnect to DB nodes.
> >>> -
> >>> -   If the mysqld is not restarted after a cluster restart and a query is
> >>> -   executed on that mysqld, then the mysqld will crash. Not so nice.
> >>> -
> >>> -   How to repeat:
> >>> -   1. restart cluster
> >>> -   2. issue a query on one mysqld server
> >>> -
> >>> -   Suggested fix:
> >>> -   Let there be a configurable option (--ndbcluster_timeout) for how long
> >>> -   the mysqld should try to reconnect to the db nodes.
> >>> -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
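`--ndbcluster_timeout` is only proposed in the bug report above; nothing in this thread says it was implemented. As an illustration of the proposed semantics (0 means retry forever, otherwise give up once the timeout expires), here is a hypothetical Python retry loop, assuming a caller-supplied `connect` callable that returns True once the DB nodes are reachable.

```python
import time
from typing import Callable


def reconnect_with_timeout(
    connect: Callable[[], bool],
    timeout_s: float,
    retry_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call connect() until it succeeds or timeout_s elapses.

    Mirrors the *proposed* --ndbcluster_timeout semantics: a timeout of 0
    means retry forever. Returns True on success, False on timeout.
    """
    deadline = None if timeout_s == 0 else clock() + timeout_s
    while True:
        if connect():
            return True
        # Give up only when a finite deadline has been reached.
        if deadline is not None and clock() >= deadline:
            return False
        sleep(retry_delay_s)
```

Injecting `sleep` and `clock` keeps the loop testable without a real cluster; a real client would pass its own connection attempt as `connect`.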
> >>>
> >>> Not sure we're talking about the same issue. I'm not taking the entire
> >>> cluster down, just one of the nodes. In that case, shouldn't the API
> >>> seamlessly and instantly read from another node?
> >>>
> >>> 1) I have a 2-node cluster with 2 replicas, with an API running on each
> >>>    node.
> >>> 2) I run a shell script that connects to the first API and executes one
> >>>    SELECT query per second. I can stop either DB node and everything
> >>>    still works.
> >>> 3) I run the same script against the second API. I can stop the DB node
> >>>    on the *other* computer, but if I stop the DB node on the same
> >>>    computer that the API is running on, mysqld reports it can't get a
> >>>    lock on the data file until the node comes back up.
> >>> 4) When the node is started again, the API begins answering queries
> >>>    again.
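Steps 2 and 3 describe a probe that issues one SELECT per second against an API node. The original shell script isn't shown, so the sketch below is a hypothetical Python equivalent: `run_query` stands in for whatever client call actually issues the SELECT, and the probe only records which ticks failed so that an outage can be lined up against the times a DB node was stopped and started.

```python
import time
from typing import Callable


def probe(
    run_query: Callable[[], None],
    ticks: int,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[int, str]]:
    """Run run_query() once per interval; return (tick, status) pairs.

    status is "ok" on success, otherwise the exception text, so an outage
    shows up as a run of consecutive failing ticks.
    """
    results = []
    for tick in range(ticks):
        try:
            run_query()
            results.append((tick, "ok"))
        except Exception as exc:  # any client error counts as a failed tick
            results.append((tick, str(exc)))
        if tick < ticks - 1:
            sleep(interval_s)
    return results
```

Running one probe per API while stopping each DB node in turn would show whether failures are confined to the API on the same host, as described in step 3.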
> >>>
> >>> Comments? Thanks again for taking the time to look at my problem.
> >>>
> >>> -- Jim
> >>>
> >>>
> >>> --- Johan Andersson <johan@stripped> wrote:
> >>>> Hi,
> >>>> A bug report (4585) relating to this has been filed.
> >>>> Sorry for your inconvenience,
> >>>>
> >>>> b.r,
> >>>> Johan Andersson
> >>>>
> >>>> Devananda wrote:
> >>>>
> >>>>> I've been experiencing this same general problem, but haven't tried
> >>>>> to narrow it down to a reproducible pattern. It seems to happen in
> >>>>> relation to restarting a DB node, like Jim said.
> >>>>>
> >>>>> Jim Hoadley wrote:
> >>>>>
> >>>>>> When I stop/start or restart a database node, the API (MySQL server)
> >>>>>> loses connection with the data until the node comes back online. This
> >>>>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine.
> >>>>>> Been puzzling over this for a week or so. Something I missed? Please
> >>>>>> forward any suggestions. Details below.
> >>>>>>
> >>>>>> BOX1 = Pentium III/1000MHz/512MB RAM
> >>>>>> BOX2 = Pentium III/600MHz/512MB RAM
> >>>>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
> >>>>>> Not a lot of RAM, but I'm only using a tiny test database at this
> >>>>>> point. Running the MGM on a separate computer (BOX4) to help isolate
> >>>>>> the problem.
> >>>>>>
> >>>>>> Connected to BOX1, issue SELECT against test.simpsons and get the
> >>>>>> proper response:
> >>>>>>
> >>>>>> ----------------------------------------
> >>>>>> mysql> select * from simpsons ;
> >>>>>> +----+------------+
> >>>>>> | id | first_name |
> >>>>>> +----+------------+
> >>>>>> |  2 | Lisa       |
> >>>>>> |  4 | Homer      |
> >>>>>> |  5 | Maggie     |
> >>>>>> |  3 | Marge      |
> >>>>>> |  1 | Bart       |
> >>>>>> +----+------------+
> >>>>>> 5 rows in set (0.03 sec)
> >>>>>> ----------------------------------------
> >>>>>>
> >>>>>> Stop node 3 on BOX1. SELECT now fails:
> >>>>>>
> >>>>>> ----------------------------------------
> >>>>>> mysql> select * from simpsons ;
> >>>>>> ERROR 1015: Can't lock file (errno: 4009)
> >>>>>> ----------------------------------------
> >>>>>>
> >>>>>> Repeating SELECT fails:
> >>>>>>
> >>>>>> ----------------------------------------
> >>>>>> mysql> select * from simpsons ;
> >>>>>> ERROR 2013: Lost connection to MySQL server during query
> >>>>>> ----------------------------------------
> >>>>>>
> >>>>>> Repeating SELECT fails again, then succeeds after node 3 is
> >>>>>> restarted:
> >>>>>>
> >>>>>> ----------------------------------------
> >>>>>> mysql> select * from simpsons ;
> >>>>>> ERROR 2006: MySQL server has gone away
> >>>>>> No connection. Trying to reconnect...
> >>>>>> Connection id:    1
> >>>>>> Current database: test
> >>>>>>
> >>>>>> +----+------------+
> >>>>>> | id | first_name |
> >>>>>> +----+------------+
> >>>>>> |  2 | Lisa       |
> >>>>>> |  4 | Homer      |
> >>>>>> |  5 | Maggie     |
> >>>>>> |  3 | Marge      |
> 
=== message truncated ===

Thread:
API loses data during node restarts (Jim Hoadley, 19 Jul)
  • Re: API loses data during node restarts (Devananda, 19 Jul)
    • Re: API loses data during node restarts (Johan Andersson, 19 Jul)
      • Re: API loses data during node restarts (Jim Hoadley, 20 Jul)
        • Re: API loses data during node restarts (Justin Swanhart, 20 Jul)
          • Re: API loses data during node restarts (Mikael Ronström, 20 Jul)
          • Re: API loses data during node restarts (Devananda, 20 Jul)
            • Re: API loses data during node restarts (Johan Andersson, 20 Jul)
        • Re: API loses data during node restarts (Mikael Ronström, 20 Jul)
          • Re: API loses data during node restarts (Johan Andersson, 20 Jul)
            • Re: API loses data during node restarts (Jim Hoadley, 20 Jul)
          • Re: API loses data during node restarts (Jim Hoadley, 20 Jul)
            • Re: API loses data during node restarts (Mikael Ronström, 20 Jul)
              • Re: API loses data during node restarts (Jim Hoadley, 22 Jul)