List: Cluster
From: Justin Swanhart
Date: July 19 2004 11:08pm
Subject:Re: API loses data during node restarts
Do you have a [TCP] section for connections between
the second API server and the first DB server?

(assuming your app ids are 4 and 5 and your db ids are
2 and 3)

#DB1 / APP1
[TCP]  
node1=2
node2=4

#DB2 / APP1
[TCP]  
node1=3
node2=4

#DB1 / APP2
[TCP]
node1=2
node2=5

#DB2 / APP2
[TCP]
node1=3
node2=5
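
The point of the four sections above is that every API node needs a connection entry to every DB node. A minimal Python sketch of that pairing follows, assuming the same ids as above (DB nodes 2 and 3, API nodes 4 and 5) and the node1/node2 keys exactly as written in the example; the function names tcp_sections and missing_pairs are hypothetical helpers for illustration, not part of any MySQL tool, and the key names may differ in your version's config format.

```python
from itertools import product


def tcp_sections(db_ids, api_ids):
    """Build one [TCP] section per (API node, DB node) pair.

    The node1/node2 key names are copied from the example in this
    message; they are an assumption, not a verified config syntax.
    """
    blocks = []
    # Outer loop over API nodes, inner loop over DB nodes, so the
    # output order matches the hand-written sections above.
    for api_id, db_id in product(api_ids, db_ids):
        blocks.append(
            f"# DB {db_id} / APP {api_id}\n"
            f"[TCP]\n"
            f"node1={db_id}\n"
            f"node2={api_id}\n"
        )
    return "\n".join(blocks)


def missing_pairs(configured, db_ids, api_ids):
    """Return the (db_id, api_id) pairs that have no [TCP] section yet."""
    wanted = {(db_id, api_id) for api_id in api_ids for db_id in db_ids}
    return sorted(wanted - set(configured))


if __name__ == "__main__":
    print(tcp_sections(db_ids=[2, 3], api_ids=[4, 5]))
    # Example: a config where the second API has no link to the first DB.
    print(missing_pairs([(2, 4), (3, 4), (3, 5)], [2, 3], [4, 5]))
```

Running missing_pairs against the pairs actually present in a config file is a quick way to spot an API node that can only reach one of the DB nodes.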


--- Jim Hoadley <j_hoadley@stripped> wrote:
> Johan --
> 
> Thanks for the fast response! I read bug report
> 4585. It says:
> 
> -   Description:
> -   If the entire DB cluster goes down, then the mysqld servers should retry
> -   connecting to the DB. The mysql servers must not give up trying to
> -   reconnect to DB nodes.
> -
> -   If the mysqld is not restarted after a cluster restart and a query is
> -   executed on that mysqld, then the mysqld will crash. Not so nice.
> -
> -   How to repeat:
> -   1. restart cluster
> -   2. issue a query on one mysqld server
> -
> -   Suggested fix:
> -   Let there be a configurable option (--ndbcluster_timeout) for how long
> -   the mysqld should try to reconnect to the db nodes.
> -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
> 
> Not sure we're talking about the same issue. I'm not taking the entire cluster
> down, just one of the nodes. In that case, shouldn't the API seamlessly and
> instantly read from another node?
> 
> 1) I have a 2-node cluster with 2 replicas, with an API running on each node.
> 2) I run a shell script that connects to the first API and executes one SELECT
>    query per second. I can stop either DB node and everything still works.
> 3) I run the same script against the second API. I can stop the DB node on the
>    *other* computer, but if I stop the DB node on the same computer that the API
>    is running on, mysqld reports it can't get a lock on the data file until the
>    node comes back up.
> 4) When the node is started again the API begins answering queries again.
> 
> Comments? Thanks again for taking the time to look at my problem.
> 
> -- Jim
> 
> 
> --- Johan Andersson <johan@stripped> wrote:
> > Hi,
> > A bug report (4585) relating to this has been filed.
> > Sorry for your inconvenience,
> > 
> > b.r,
> > Johan Andersson
> > 
> > Devananda wrote:
> > 
> > > I've been experiencing this same general problem, but haven't tried to
> > > narrow it down to a reproducible pattern. Seems to happen in relation
> > > to restarting a DB node, like Jim said.
> > >
> > > Jim Hoadley wrote:
> > >
> > >> When I stop/start or restart a database node, the API (MySQL server) loses
> > >> connection with the data until the node comes back online. This only happens on
> > >> one of my 2 nodes (BOX2). The other (BOX1) is fine. Been puzzling over this for
> > >> a week or so. Something I missed? Please forward any suggestions. Details
> > >> below.
> > >>
> > >> BOX1 = Pentium III/1000MHz/512MB RAM
> > >> BOX2 = Pentium III/600MHz/512MB RAM
> > >> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
> > >> Not a lot of RAM but only using a tiny test database at this point.
> > >> Running the MGM on a separate computer (BOX4) to help isolate problem.
> > >>
> > >> Connected to BOX1, issue SELECT against test.simpsons and get proper response:
> > >>
> > >> ----------------------------------------
> > >> mysql> select * from simpsons ;
> > >> +----+------------+
> > >> | id | first_name |
> > >> +----+------------+
> > >> |  2 | Lisa       |
> > >> |  4 | Homer      |
> > >> |  5 | Maggie     |
> > >> |  3 | Marge      |
> > >> |  1 | Bart       |
> > >> +----+------------+
> > >> 5 rows in set (0.03 sec)
> > >> ----------------------------------------
> > >>
> > >> Stop node 3 on BOX1. SELECT now fails:
> > >>
> > >> ----------------------------------------
> > >> mysql> select * from simpsons ;
> > >> ERROR 1015: Can't lock file (errno: 4009)
> > >> ----------------------------------------
> > >>
> > >> Repeating SELECT fails:
> > >>
> > >> ----------------------------------------
> > >> mysql> select * from simpsons ;
> > >> ERROR 2013: Lost connection to MySQL server during query
> > >> ----------------------------------------
> > >>
> > >> Repeating SELECT fails again, then succeeds after node 3 is restarted:
> > >>
> > >> ----------------------------------------
> > >> mysql> select * from simpsons ;
> > >> ERROR 2006: MySQL server has gone away
> > >> No connection. Trying to reconnect...
> > >> Connection id:    1
> > >> Current database: test
> > >>
> > >> +----+------------+
> > >> | id | first_name |
> > >> +----+------------+
> > >> |  2 | Lisa       |
> > >> |  4 | Homer      |
> > >> |  5 | Maggie     |
> > >> |  3 | Marge      |
> > >> |  1 | Bart       |
> > >> +----+------------+
> > >> 5 rows in set (6.55 sec)
> > >> ----------------------------------------
> > >>
> > >> All data is intact. BTW new records added to node 2 on BOX2 while node 3 on
> > >> BOX1 is down show up (this is good).
> > >>
> > >> Here's what restarting node 3 on BOX1 with mgmd looks like (looks right to me):
> > >>
> > >> ----------------------------------------
> > >> NDB> show
> > >> Cluster Configuration
> > >> ---------------------
> > >> 2 NDB Node(s)
> > >> DB node:        2  (Version: 3.5.0)
> > >> DB node:        3  (Version: 3.5.0)
> > >>
> > >> 4 API Node(s)
> > >> API node:       11  (not connected)
> > >> API node:       12  (Version: 3.5.0)
> > >> API node:       13  (not connected)
> > >> API node:       14  (not connected)
> > >>
> > >> 1 MGM Node(s)
> > >> MGM node:       1  (Version: 3.5.0)
> > >>
> > >> NDB> 2 restart
> > >> Executing RESTART on node 2.
> > >> Database node 2 is being restarted.
> > >>
> > >> NDB> 2 - endTakeOver
> > >> ----------------------------------------
> > >>
> > >> Here is the MySQL server error log output on BOX1 as node 3 is restarted:
> > >>
> > >> ----------------------------------------
> > >> 040713 10:53:31  mysqld started
> > >> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
> > >> /usr/local/mysql/libexec/mysqld: ready for connections.
> > >> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'
> 
=== message truncated ===
