List: Cluster
From: Matthew Boehm
Date: October 27 2009 11:48pm
Subject: RE: benchmarking. node failure. Signal lost.
How do I verify that it has been set/changed from the default value?

ndb_config -c "host=15001LDMGR01" --type=ndbd --query=nodeid,datamemory,sendbuffermemory
Unknown query option: sendbuffermemory

I set this option in the manager's config.ini:

[tcp default]
SendBufferMemory = 4M
ReceiveBufferMemory = 4M

I then did a rolling restart, and only did an initial restart on the manager.
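
As a partial check, the values in the file itself can at least be read back and converted to bytes. This is a minimal sketch in Python, not an official tool: it only shows what config.ini says on disk, not what the running nodes actually loaded. The function names are made up for illustration, and it assumes the K/M/G suffixes mean powers of 1024.

```python
import configparser

# Assumption of this sketch: K, M and G suffixes are powers of 1024.
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as '4M' or '2097152' to an integer byte count."""
    text = value.strip()
    if text and text[-1].upper() in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1].upper()]
    return int(text)


def tcp_default_buffers(config_text):
    """Return the [tcp default] buffer sizes in bytes, or None where unset."""
    # strict=False tolerates repeated sections (e.g. several [ndbd] blocks);
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    # Section names are matched case-insensitively here.
    section = next(
        (s for s in parser.sections() if s.lower() == "tcp default"), None
    )
    result = {}
    for key in ("SendBufferMemory", "ReceiveBufferMemory"):
        raw = parser.get(section, key, fallback=None) if section else None
        result[key] = to_bytes(raw) if raw is not None else None
    return result


sample = """
[tcp default]
SendBufferMemory = 4M
ReceiveBufferMemory = 4M
"""
print(tcp_default_buffers(sample))
# prints {'SendBufferMemory': 4194304, 'ReceiveBufferMemory': 4194304}
```

To use it on a real file, pass the file's contents as the string. A None value would mean the parameter is not set in [tcp default] and the default applies.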

Cluster v7.0.8a

-Matthew

> -----Original Message-----
> From: Rayson Ho [mailto:rayrayson@stripped]
> Sent: Tuesday, October 27, 2009 6:01 PM
> To: Boehm, Matthew
> Cc: cluster@stripped
> Subject: Re: benchmarking. node failure. Signal lost.
> 
> On 10/27/09, Boehm, Matthew <mboehm@stripped> wrote:
> > Was running sysbench on 2 node cluster (2 NDB, 1 MYSQLD). When I got to
> > 256 threads/connections in my benchmark, NDB01 failed/crashed. 128
> > threads/connections went fine. It appears 256 was too many and that TCP
> > stack got overloaded? Manual says SendBufferMemory defaults to 2MB. Is
> > the solution to increase to 4MB?
> 
> It's documented in the manual:
> http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-config-send-buffers.html
> 
> And to increase the number of file descriptors:
> http://www.cs.uwaterloo.ca/~brecht/servers/openfiles.html
> 
> Rayson
> 
> 
> >
> > -Matthew
> >
> > From ndb_1_error.log:
> >
> > Time: Tuesday 27 October 2009 - 17:19:42
> > Status: Permanent error, external action needed
> > Message: Signal lost, out of send buffer memory, please increase
> > SendBufferMemory or lower the load (Resource configuration error)
> > Error: 6052
> > Error data: Remote note id 10.
> > Error object: TransporterCallback.cpp
> > Program: /usr/sbin/ndbmtd
> > Pid: 6826 thr: 3
> > Version: mysql-5.1.37 ndb-7.0.8a
> > Trace: /var/lib/mysql/cluster/ndb_1_trace.log.1
> > /var/lib/mysql/cluster/ndb_1_trace.log.1_t1
> > /var/lib/mysql/cluster/ndb_1_
> > /mysql/cluster/ndb_1_trace.log.1_t1 /var/lib/mysql/cluster/ndb_1_tra
> >
> > From ndb_1_out.log:
> >
> > 2009-10-27 17:18:35 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck
> in:
> > Job Handling elapsed=100
> > 2009-10-27 17:18:35 [ndbd] INFO     -- Watchdog: User time: 467861
> > System time: 245535
> > 2009-10-27 17:18:38 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck
> in:
> > Job Handling elapsed=99
> > 2009-10-27 17:18:38 [ndbd] INFO     -- Watchdog: User time: 467896
> > System time: 245562
> > 2009-10-27 17:18:40 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck
> in:
> > Job Handling elapsed=100
> > 2009-10-27 17:18:40 [ndbd] INFO     -- Watchdog: User time: 467930
> > System time: 245598
> > send lock node 20 waiting for lock, contentions: 1400 spins: 4333994
> > sendbufferpool waiting for lock, contentions: 1800 spins: 27980
> > sendbufferpool waiting for lock, contentions: 2000 spins: 28855
> > sendbufferpool waiting for lock, contentions: 2200 spins: 32284
> > sendbufferpool waiting for lock, contentions: 2400 spins: 36214
> > send lock node 20 waiting for lock, contentions: 1600 spins: 4544753
> > 2009-10-27 17:19:11 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck
> in:
> > Job Handling elapsed=100
> > 2009-10-27 17:19:11 [ndbd] INFO     -- Watchdog: User time: 468330
> > System time: 245791
> > sendbufferpool waiting for lock, contentions: 2600 spins: 39928
> > sendbufferpool waiting for lock, contentions: 2800 spins: 43699
> > send lock node 20 waiting for lock, contentions: 1800 spins: 4764414
> > sendbufferpool waiting for lock, contentions: 3000 spins: 48076
> > sendbufferpool waiting for lock, contentions: 3200 spins: 52114
> > send lock node 20 waiting for lock, contentions: 2000 spins: 4989164
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck
> in:
> > Job Handling elapsed=100
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468708
> > System time: 245989
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=99
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468710
> > System time: 245994
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=199
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468713
> > System time: 246002
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=100
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468713
> > System time: 246002
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=299
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468715
> > System time: 246011
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=199
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468715
> > System time: 246011
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=399
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468717
> > System time: 246020
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=300
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468717
> > System time: 246020
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=499
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468720
> > System time: 246029
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=399
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468720
> > System time: 246029
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=599
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468722
> > System time: 246037
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=500
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468722
> > System time: 246037
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=699
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468724
> > System time: 246045
> > 2009-10-27 17:19:42 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=599
> > 2009-10-27 17:19:42 [ndbd] INFO     -- Watchdog: User time: 468724
> > System time: 246045
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=799
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468726
> > System time: 246053
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=700
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468726
> > System time: 246053
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=899
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468728
> > System time: 246062
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=799
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468728
> > System time: 246062
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=999
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468728
> > System time: 246076
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=899
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468728
> > System time: 246076
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1099
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468729
> > System time: 246083
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=999
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468729
> > System time: 246083
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1199
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468730
> > System time: 246090
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1099
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468730
> > System time: 246090
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1299
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468731
> > System time: 246098
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1199
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468731
> > System time: 246098
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1399
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468732
> > System time: 246106
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1299
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468732
> > System time: 246106
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1499
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468734
> > System time: 246112
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1399
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468734
> > System time: 246112
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1599
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468735
> > System time: 246119
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1499
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468735
> > System time: 246119
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1699
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468735
> > System time: 246127
> > 2009-10-27 17:19:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1599
> > 2009-10-27 17:19:43 [ndbd] INFO     -- Watchdog: User time: 468735
> > System time: 246127
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1799
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468736
> > System time: 246133
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1699
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468736
> > System time: 246133
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1899
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468737
> > System time: 246141
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1799
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468737
> > System time: 246141
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck
> in:
> > Job Handling elapsed=1999
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468738
> > System time: 246148
> > 2009-10-27 17:19:44 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck
> in:
> > Checking connections elapsed=1899
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog: User time: 468738
> > System time: 246148
> > Warning: 1 thread(s) did not stop before starting crash dump.
> > Warning: 1 thread(s) did not stop before starting crash dump.
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Remote note id 10.
> > 2009-10-27 17:19:44 [ndbd] INFO     -- TransporterCallback.cpp
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog shutting down system
> > 2009-10-27 17:19:44 [ndbd] INFO     -- Watchdog shutdown completed -
> > exiting
> > 2009-10-27 17:19:44 [ndbd] ALERT    -- Node 1: Forced node shutdown
> > completed. Caused by error 6052: 'Signal lost, out of send buffer
> > memory,
> > please increase SendBufferMemory or lower the load(Resource
> > configuration error). Permanent error, external action needed'.
> >
> > --
> > MySQL Cluster Mailing List
> > For list archives: http://lists.mysql.com/cluster
> > To unsubscribe:
> http://lists.mysql.com/cluster?unsub=1
> >
> >