List: Cluster
From: Richard McCluskey   Date: November 4 2009 4:43pm
Subject: Re: NDBD addition failed
Hi Andrew,


On Wed, 2009-11-04 at 16:29 +0000, Andrew Hutchings wrote:
> Hello Richard,
> 
> On Wed, 2009-11-04 at 11:02 -0500, Richard McCluskey wrote:
> > so I added a physical NDBD process (id=5) to the running cluster. After
> > about an hour of copying all the tablespace info, ndbd_mgm showed this :
> 
> Can you please attach your config.ini and let us know the steps you took
> to get in this state?
> 

Before I added the new NDBD process, ndb_mgm showed this:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=4	@10.32.4.20  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)
id=5 (not connected, accepting connect from 10.32.4.30)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.32.4.10  (mysql-5.1.35 ndb-7.0.7)

[mysqld(API)]	2 node(s)
id=2	@10.32.4.40  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.32.4.50  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm> 


When I started the new node with 'ndbd -v', the output switched to:

Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=4	@10.32.4.20  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)
id=5	@10.32.4.30  (mysql-5.1.35 ndb-7.0.7, starting, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.32.4.10  (mysql-5.1.35 ndb-7.0.7)

[mysqld(API)]	2 node(s)
id=2	@10.32.4.40  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.32.4.50  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm>

This looks good, since the starting node (node 5) is assigned to
nodegroup 0. However, once ndb_mgm reported 'Node 5 started', node 5
showed up as 'no nodegroup', and its data and index usage were reported
as 0%:

Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=4	@10.32.4.20  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)
id=5	@10.32.4.30  (mysql-5.1.35 ndb-7.0.7, no nodegroup)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.32.4.10  (mysql-5.1.35 ndb-7.0.7)

[mysqld(API)]	2 node(s)
id=2	@10.32.4.40  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.32.4.50  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm> Node 4: Data usage is 52%(183764 32K pages of total 347520)
ndb_mgm> Node 4: Index usage is 25%(44406 8K pages of total 173856)
Node 5: Index usage is 0%(0 8K pages of total 173856)
Node 5: Data usage is 0%(0 32K pages of total 347520)
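To check the arithmetic behind these figures, here is a minimal sketch that parses the data-node lines from `show` and the `... usage is` lines, then converts pages to MB. The helper names (`parse_usage`, `parse_data_nodes`, `to_mb`) are hypothetical, and the sketch assumes the exact output format shown above, pasted in as a string. On node 4, 347520 pages × 32 KB comes to 10860 MB, matching `DataMemory=10860M` in the config.ini below. The 173856 index pages × 8 KB come to about 1358 MB, in line with `IndexMemory=1358M`. So node 5 was allocated the same memory as node 4. It simply reports no nodegroup and no pages in use.

```python
import re

# "Node 4: Data usage is 52%(183764 32K pages of total 347520)"
# findall() scans anywhere in the text, so a leading "ndb_mgm> " prompt is harmless.
USAGE_RE = re.compile(
    r"Node (\d+): (Data|Index) usage is \d+%\((\d+) (\d+)K pages of total (\d+)\)"
)
# "id=5  @10.32.4.30  (mysql-5.1.35 ndb-7.0.7, no nodegroup)"
NODE_RE = re.compile(r"^id=(\d+)\s+@(\S+)\s+\((.*)\)\s*$", re.MULTILINE)


def parse_usage(text):
    """Return {(node_id, "Data"|"Index"): (used_pages, total_pages, page_kb)}."""
    return {
        (int(node), kind): (int(used), int(total), int(page_kb))
        for node, kind, used, page_kb, total in USAGE_RE.findall(text)
    }


def parse_data_nodes(show_text):
    """Return {node_id: nodegroup or None} from the [ndbd(NDB)] block of `show`."""
    block = show_text.split("[ndbd(NDB)]", 1)[1].split("\n\n", 1)[0]
    nodes = {}
    for node_id, _host, status in NODE_RE.findall(block):
        group = re.search(r"Nodegroup: (\d+)", status)
        nodes[int(node_id)] = int(group.group(1)) if group else None
    return nodes


def to_mb(pages, page_kb):
    """Convert a page count to MB (1 MB = 1024 KB)."""
    return pages * page_kb / 1024


SESSION = """\
[ndbd(NDB)]  2 node(s)
id=4  @10.32.4.20  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)
id=5  @10.32.4.30  (mysql-5.1.35 ndb-7.0.7, no nodegroup)

[ndb_mgmd(MGM)]  1 node(s)
id=1  @10.32.4.10  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm> Node 4: Data usage is 52%(183764 32K pages of total 347520)
ndb_mgm> Node 4: Index usage is 25%(44406 8K pages of total 173856)
Node 5: Index usage is 0%(0 8K pages of total 173856)
Node 5: Data usage is 0%(0 32K pages of total 347520)
"""

nodes = parse_data_nodes(SESSION)
usage = parse_usage(SESSION)
for node_id, group in sorted(nodes.items()):
    used, total, page_kb = usage[(node_id, "Data")]
    label = "no nodegroup" if group is None else f"nodegroup {group}"
    print(f"node {node_id}: {label}, DataMemory {to_mb(used, page_kb):.1f} "
          f"of {to_mb(total, page_kb):.1f} MB in use")
    if group is None or used == 0:
        # Flag the symptom described above: no nodegroup and/or no data pages.
        print(f"  -> flag node {node_id}: not in a nodegroup or no data pages used")
```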


Running tail -f /var/lib/mysql/ndb_4_out.log (the output log of the working
node) showed this:



2009-11-04 09:45:55 [ndbd] INFO     -- findNeighbours from: 1954 old (left: 65535 right: 65535) new (5 5)
2009-11-04 10:43:38 [ndbd] INFO     -- granting dict lock to 5
2009-11-04 10:43:38 [ndbd] INFO     -- clearing dict lock for 5

Meanwhile, ndb_5_out.log on the connecting node showed this:


== === == == == === == == == == == = === = === = === == == == ==

[go2admin@sqlDnTwo mysql]$ cat ndb_5_out.log 
2009-11-04 09:46:14 [ndbd] INFO     -- Angel pid: 18016 ndb pid: 18017
NDBMT: non-mt
2009-11-04 09:46:14 [ndbd] INFO     -- NDB Cluster -- DB node 5
2009-11-04 09:46:14 [ndbd] INFO     -- mysql-5.1.35 ndb-7.0.7 --
2009-11-04 09:46:14 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
2009-11-04 09:46:14 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 10864Mb initial: 11248Mb
Adding 8192Mb to ZONE_LO (1,262142)
Adding 3056Mb to ZONE_LO (262145,97791)
2009-11-04 09:46:14 [ndbd] INFO     -- Start initiated (mysql-5.1.35 ndb-7.0.7)
2009-11-04 09:46:14 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=99
2009-11-04 09:46:14 [ndbd] INFO     -- Watchdog: User time: 0  System time: 13
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
(this line appears 27 times in a row)
WOPool::init(61, 9)
RWPool::init(22, 13)
WARNING: timerHandlingLab now: 66357658 sent: 66357246 diff: 412
RWPool::init(42, 18)
RWPool::init(62, 13)
Using 1 fragments per node
RWPool::init(c2, 18)
RWPool::init(e2, 16)
WOPool::init(41, 8)
RWPool::init(82, 12)
RWPool::init(a2, 52)
WOPool::init(21, 10)
WARNING: timerHandlingLab now: 66357816 sent: 66357658 diff: 158
2009-11-04 09:46:14 [ndbd] INFO     -- Start phase 0 completed 
2009-11-04 09:46:14 [ndbd] INFO     -- CM_REGREF from Node 5 to our Node 5. Cause = Election without selecting new candidate
2009-11-04 09:46:14 [ndbd] INFO     -- Initial start, waiting for 4 to connect,  nodes [ all: 4 and 5 connected: 5 no-wait:  ]
2009-11-04 09:46:14 [ndbd] INFO     -- CM_REGCONF president = 4, own Node = 5, our dynamic id = 2
2009-11-04 09:46:14 [ndbd] INFO     -- findNeighbours from: 2042 old (left: 65535 right: 65535) new (4 4)
2009-11-04 09:46:14 [ndbd] INFO     -- We are Node 5 with dynamic ID 2, our left neighbour is Node 4, our right is Node 4
2009-11-04 09:46:14 [ndbd] INFO     -- Start phase 1 completed 
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
(this line appears 35 times in a row)
execSTART_RECREQ chaning srnodes from 0000000000000030 to 0000000000000020
Applying undo to LCP: 0
2009-11-04 10:34:24 [ndbd] INFO     -- Undo head - undo_log_group_4info_sms.log page: 1 lsn: 0
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_sessions.log(0)
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_locationserver.log(0)
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_feedengine_ibatis.log(0)
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_adtracker.log(0)
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_activemq.log(0)
2009-11-04 10:34:24 [ndbd] INFO     --    - next - undo_log_group_go2.log(0)
2009-11-04 10:34:24 [ndbd] INFO     -- Logfile group: 7 
2009-11-04 10:34:24 [ndbd] INFO     --   head: undo_log_group_4info_sms.log page: 1
2009-11-04 10:34:24 [ndbd] INFO     --   tail: undo_log_group_4info_sms.log page: 1
2009-11-04 10:34:24 [ndbd] INFO     -- Flushing page cache after undo completion
2009-11-04 10:34:24 [ndbd] INFO     -- Flushing complete

== == == == == == == === == === == == === == == == == === == 
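To see where the time went during node 5's start, here is a minimal sketch that reports the largest gap between consecutive timestamped lines of the log, skipping untimestamped ones. `largest_gap` is a hypothetical helper. It assumes each log entry sits on one line and begins with the `YYYY-MM-DD HH:MM:SS [ndbd] LEVEL --` prefix seen here. Fed the excerpt below, it finds 48 minutes 10 seconds between `Start phase 1 completed` (09:46:14) and the undo-log messages (10:34:24). That is the stretch in which only untimestamped lines, such as the `NDBFS/AsyncFile` allocations, were written.

```python
from datetime import datetime
import re

# Timestamped ndbd log lines look like "2009-11-04 09:46:14 [ndbd] INFO     -- ...".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] \w+\s+-- (.*)$")


def largest_gap(lines):
    """Return (gap, message_before, message_after) for the largest time gap
    between consecutive timestamped lines; other lines are skipped."""
    prev = None
    best = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # e.g. "NDBFS/AsyncFile: ..." carries no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(2).strip()
        if prev is not None:
            gap = ts - prev[0]
            if best is None or gap > best[0]:
                best = (gap, prev[1], msg)
        prev = (ts, msg)
    return best


LOG = """\
2009-11-04 09:46:14 [ndbd] INFO     -- Start phase 0 completed
2009-11-04 09:46:14 [ndbd] INFO     -- Start phase 1 completed
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Applying undo to LCP: 0
2009-11-04 10:34:24 [ndbd] INFO     -- Undo head - undo_log_group_4info_sms.log page: 1 lsn: 0
2009-11-04 10:34:24 [ndbd] INFO     -- Flushing complete
""".splitlines()

gap, before, after = largest_gap(LOG)
print(f"largest gap {gap} between '{before}' and '{after}'")
```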

Finally, here is my config.ini:

[go2admin@admin01 mysql-cluster]$ cat config.ini 
[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M

[NDB_MGMD DEFAULT]
PortNumber=1186
Datadir=/var/lib/mysql-cluster

[NDB_MGMD]
Id=1
Hostname=10.32.4.10
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/var/lib/mysql
DataMemory=10860M
IndexMemory=1358M
LockPagesInMainMemory=0

MaxNoOfConcurrentOperations=100000

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=10000
MaxNoOfUniqueHashIndexes=2500
MaxNoOfAttributes=120000
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL
NoOfFragmentLogFiles=64
RedoBuffer=32M

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

MemReportFrequency=30
BackupReportFrequency=10

### Params for setting logging
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

### Params for increasing Disk throughput
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M
#Reports indicate that ODirect=1 can cause I/O errors (OS error code 5) on some systems. You must test.
ODirect=1

### Watchdog
#TimeBetweenWatchdogCheckInitial=30000

### TransactionInactiveTimeout  - should be enabled in Production
#TransactionInactiveTimeout=30000
### CGE 6.3 - REALTIME EXTENSIONS
#RealTimeScheduler=1
#SchedulerExecutionTimer=80
#SchedulerSpinTimer=40

### DISK DATA
SharedGlobalMemory=384M
#read my blog how to set this:
DiskPageBufferMemory=2048M

### Multithreading
MaxNoOfExecutionThreads=4
#BatchSizePerLocalScan=512

[MYSQLD DEFAULT]
BatchSize=512
#BatchByteSize=2048K
#MaxScanBatchSize=2048K

[MYSQLD]
Id=2
Hostname=10.32.4.40

[MYSQLD]
Id=3
Hostname=10.32.4.50

[NDBD]
Id=4
Hostname=10.32.4.20

[NDBD]
Id=5
Hostname=10.32.4.30

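When cross-checking the `show` output against this file, note how NDB assigns node groups by default. Data nodes are placed into node groups in order of node ID, `NoOfReplicas` nodes per group. With `NoOfReplicas=2` and data nodes 4 and 5, a single node group 0 containing both nodes is expected, which is what `ndb_mgm` reported while node 5 was starting. The sketch below computes that expectation. `parse_ndb_config` and `expected_nodegroups` are hypothetical helpers, and the sketch ignores any explicit `NodeGroup` parameter. It uses a hand-rolled reader because Python's `configparser` rejects repeated section names such as `[NDBD]` by default.

```python
def parse_ndb_config(text):
    """Hand-rolled reader for an NDB config.ini. Returns (defaults, sections):
    defaults maps a section type (e.g. "NDBD") to its [... DEFAULT] values,
    sections lists (type, values) for every other section, in file order.
    Keys are lowercased so parameter names match regardless of case."""
    defaults, sections, current = {}, [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().upper()
            if name.endswith(" DEFAULT"):
                current = defaults.setdefault(name[: -len(" DEFAULT")], {})
            else:
                current = {}
                sections.append((name, current))
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    return defaults, sections


def expected_nodegroups(defaults, sections):
    """Default implicit grouping: data nodes sorted by Id, NoOfReplicas per
    group. Ignores any explicit NodeGroup parameter."""
    replicas = int(defaults["NDBD"]["noofreplicas"])
    ids = sorted(int(values["id"]) for name, values in sections if name == "NDBD")
    return {node_id: i // replicas for i, node_id in enumerate(ids)}


CONFIG = """\
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=10860M

[NDBD]
Id=4
Hostname=10.32.4.20

[NDBD]
Id=5
Hostname=10.32.4.30
"""

defaults, sections = parse_ndb_config(CONFIG)
print(expected_nodegroups(defaults, sections))  # {4: 0, 5: 0}
```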


Let me know if you need anything else.


Richard

> Kind Regards


-- 
Richard McCluskey
Senior Engineer
go2Media, Inc.
rmccluskey@stripped
(617) 671-0057
 
http://go2.com  - The best entertainment guide on mobile.
Thread
NDBD addition failed - Richard McCluskey, 4 Nov
  • Re: NDBD addition failed - Andrew Hutchings, 4 Nov
    • Re: NDBD addition failed - Richard McCluskey, 4 Nov
      • Re: NDBD addition failed - Andrew Hutchings, 5 Nov
        • Re: NDBD addition failed - Andrew Hutchings, 5 Nov