List: Cluster
From: Mikael Ronström
Date: July 19 2004 11:30pm
Subject: Re: API loses data during node restarts
Hi,

On 2004-07-20 at 01:08, Justin Swanhart wrote:

> Do you have a [TCP] section for connections between
> the second API server and the first DB server?
>

If you don't specify any connections, the default is that you get a
connection between each pair of storage nodes and between each API
server and every storage node. It might, however, be worthwhile to
specify each connection explicitly, to test whether there is a bug in
the code that generates the connections.
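
As a rough illustration of that default, here is a minimal Python sketch that writes out one [TCP] section per connection, so each can be specified explicitly. The function name tcp_sections is hypothetical, the node ids (2 and 3 for storage nodes, 4 and 5 for API servers) are only the ones Justin assumes in the quoted text below, and the node1/node2 keys are copied from his example. It is not the code the cluster itself uses to generate connections.

```python
from itertools import combinations


def tcp_sections(db_ids, api_ids):
    """Return config text with one [TCP] section per connection.

    Assumed topology (taken from the description above): every pair of
    storage nodes is connected, and every API server is connected to
    every storage node. API servers are not connected to each other.
    """
    db_ids = sorted(db_ids)
    api_ids = sorted(api_ids)

    # Storage node <-> storage node connections.
    pairs = list(combinations(db_ids, 2))
    # API server <-> storage node connections.
    pairs += [(db, api) for api in api_ids for db in db_ids]

    # Key names node1/node2 follow the quoted example; treat them as an
    # assumption for any other version of the configuration format.
    blocks = [f"[TCP]\nnode1={a}\nnode2={b}\n" for a, b in pairs]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(tcp_sections(db_ids=[2, 3], api_ids=[4, 5]))
```

For the assumed ids this gives five sections: one between the two storage nodes, plus the four API-to-storage-node sections listed in the quoted example.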

Rgrds Mikael

> (assuming your app ids are 4 and 5 and your db ids are
> 2 and 3)
>
> #DB1 / APP1
> [TCP]
> node1=2
> node2=4
>
> #DB2 / APP1
> [TCP]
> node1=3
> node2=4
>
> #DB1 / APP2
> [TCP]
> node1=2
> node2=5
>
> #DB2 / APP2
> [TCP]
> node1=3
> node2=5
>
>
> --- Jim Hoadley <j_hoadley@stripped> wrote:
>> Johan --
>>
>> Thanks for the fast response! I read bug report 4585. It says:
>>
>> -   Description:
>> -   If the entire DB cluster goes down, then the mysqld servers should retry
>> -   connecting to the DB. The mysql servers must not give up trying to
>> -   reconnect to DB nodes.
>> -
>> -   If the mysqld is not restarted after a cluster restart and a query is
>> -   executed on that mysqld, then the mysqld will crash. Not so nice.
>> -
>> -   How to repeat:
>> -   1. restart cluster
>> -   2. issue a query on one mysqld server
>> -
>> -   Suggested fix:
>> -   Let there be a configurable option (--ndbcluster_timeout) for how long
>> -   the mysqld should try to reconnect to the db nodes.
>> -   --ndbcluster_timeout={0,0x7fffffff} and let 0 be retry forever.
>>
>> Not sure we're talking about the same issue. I'm not taking the entire
>> cluster down, just one of the nodes. In that case, shouldn't the API
>> seamlessly and instantly read from another node?
>>
>> 1) I have a 2-node cluster with 2 replicas, with an API running on each node.
>> 2) I run a shell script that connects to the first API and executes one SELECT
>>    query per second. I can stop either DB node and everything still works.
>> 3) I run the same script against the second API. I can stop the DB node on the
>>    *other* computer, but if I stop the DB node on the same computer that the
>>    API is running on, mysqld reports it can't get a lock on the data file until
>>    the node comes back up.
>> 4) When the node is started again, the API begins answering queries again.
>>
>> Comments? Thanks again for taking the time to look at my problem.
>>
>> -- Jim
>>
>>
>> --- Johan Andersson <johan@stripped> wrote:
>>> Hi,
>>> A bug report (4585) relating to this has been filed.
>>> Sorry for your inconvenience,
>>>
>>> b.r,
>>> Johan Andersson
>>>
>>> Devananda wrote:
>>>
>>>> I've been experiencing this same general problem, but haven't tried to
>>>> narrow it down to a reproducible pattern. Seems to happen in relation
>>>> to restarting a DB node, like Jim said.
>>>>
>>>> Jim Hoadley wrote:
>>>>
>>>>> When I stop/start or restart a database node, the API (MySQL server)
>>>>> loses connection with the data until the node comes back online. This
>>>>> only happens on one of my 2 nodes (BOX2). The other (BOX1) is fine. Been
>>>>> puzzling over this for a week or so. Something I missed? Please forward
>>>>> any suggestions. Details below.
>>>>>
>>>>> BOX1 = Pentium III/1000MHz/512MB RAM
>>>>> BOX2 = Pentium III/600MHz/512MB RAM
>>>>> Both running mysql-4.1.3-beta-nightly-20040628.tar.gz.
>>>>> Not a lot of RAM but only using a tiny test database at this point.
>>>>> Running the MGM on a separate computer (BOX4) to help isolate problem.
>>>>>
>>>>> Connected to BOX1, issue SELECT against test.simpsons and get proper
>>>>> response:
>>>>>
>>>>> ----------------------------------------
>>>>> mysql> select * from simpsons ;
>>>>> +----+------------+
>>>>> | id | first_name |
>>>>> +----+------------+
>>>>> |  2 | Lisa       |
>>>>> |  4 | Homer      |
>>>>> |  5 | Maggie     |
>>>>> |  3 | Marge      |
>>>>> |  1 | Bart       |
>>>>> +----+------------+
>>>>> 5 rows in set (0.03 sec)
>>>>> ----------------------------------------
>>>>>
>>>>> Stop node 3 on BOX1. SELECT now fails:
>>>>>
>>>>> ----------------------------------------
>>>>> mysql> select * from simpsons ;
>>>>> ERROR 1015: Can't lock file (errno: 4009)
>>>>> ----------------------------------------
>>>>>
>>>>> Repeating SELECT fails:
>>>>>
>>>>> ----------------------------------------
>>>>> mysql> select * from simpsons ;
>>>>> ERROR 2013: Lost connection to MySQL server during query
>>>>> ----------------------------------------
>>>>>
>>>>> Repeating SELECT fails again, then succeeds after node 3 is restarted:
>>>>>
>>>>> ----------------------------------------
>>>>> mysql> select * from simpsons ;
>>>>> ERROR 2006: MySQL server has gone away
>>>>> No connection. Trying to reconnect...
>>>>> Connection id:    1
>>>>> Current database: test
>>>>>
>>>>> +----+------------+
>>>>> | id | first_name |
>>>>> +----+------------+
>>>>> |  2 | Lisa       |
>>>>> |  4 | Homer      |
>>>>> |  5 | Maggie     |
>>>>> |  3 | Marge      |
>>>>> |  1 | Bart       |
>>>>> +----+------------+
>>>>> 5 rows in set (6.55 sec)
>>>>> ----------------------------------------
>>>>>
>>>>> All data is intact. BTW new records added to node 2 on BOX2 while
>>>>> node 3 on BOX1 is down show up (this is good).
>>>>>
>>>>> Here's what restarting node 3 on BOX1 with mgmd looks like (looks
>>>>> right to me):
>>>>>
>>>>> ----------------------------------------
>>>>> NDB> show
>>>>> Cluster Configuration
>>>>> ---------------------
>>>>> 2 NDB Node(s)
>>>>> DB node:        2  (Version: 3.5.0)
>>>>> DB node:        3  (Version: 3.5.0)
>>>>>
>>>>> 4 API Node(s)
>>>>> API node:       11  (not connected)
>>>>> API node:       12  (Version: 3.5.0)
>>>>> API node:       13  (not connected)
>>>>> API node:       14  (not connected)
>>>>>
>>>>> 1 MGM Node(s)
>>>>> MGM node:       1  (Version: 3.5.0)
>>>>>
>>>>> NDB> 2 restart
>>>>> Executing RESTART on node 2.
>>>>> Database node 2 is being restarted.
>>>>>
>>>>> NDB> 2 - endTakeOver
>>>>> ----------------------------------------
>>>>>
>>>>> Here is the MySQL server error log output on BOX1 as node 3 is
>>>>> restarted:
>>>>>
>>>>> ----------------------------------------
>>>>> 040713 10:53:31  mysqld started
>>>>> 040713 10:53:32  InnoDB: Started; log sequence number 0 44112
>>>>> /usr/local/mysql/libexec/mysqld: ready for connections.
>>>>> Version: '4.1.3-beta-nightly-20040628-log'  socket: '/tmp/mysql.sock'
>>
> === message truncated ===
>
>
Mikael Ronström, Senior Software Architect
MySQL AB, www.mysql.com

Clustering:
http://www.infoworld.com/article/04/04/14/HNmysqlcluster_1.html

http://www.eweek.com/article2/0,1759,1567546,00.asp

