List: Cluster
From: Richard McCluskey
Date: October 26 2009 2:47pm
Subject: RE: GCP stop signal brings down data node
Currently the delete batch process is in the hundreds-of-rows realm, as
we are still ramping up, so I don't think that can be the issue...
Thanks for the thought though, I'll definitely make sure we batch things
in the future.
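
For reference, a minimal sketch of the kind of batched delete described
below, so that no single transaction touches more than a fixed number of
rows. It uses Python's built-in sqlite3 module purely as a stand-in so it
runs anywhere; the table name `records` and the column `last_seen` are
hypothetical, not our real schema. Against MySQL the same loop would
normally issue a plain `DELETE ... LIMIT n` instead of the rowid subquery.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=10000):
    """Delete rows older than `cutoff` in chunks of at most `batch_size`.

    Each chunk is committed on its own, so no single transaction grows
    beyond `batch_size` rows. Returns the total number of rows deleted.
    `records` and `last_seen` are hypothetical names for illustration.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM records WHERE rowid IN ("
            "SELECT rowid FROM records WHERE last_seen < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction after every chunk
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
    return total
```

The batch size of 10000 is just the figure mentioned in this thread; the
right value for a given cluster is something to measure, not assume.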

Richard 


On Mon, 2009-10-26 at 15:37 +0100, Sammut, Etienne, VF-MT wrote:
> Hi Richard
> 
> But what is the amount of deleted records during the night? I found this
> problem when my delete statement had to delete more than 200k records ..
> thus I modified my delete procedure to delete batches of 10000. Hope
> this helps you
> 
> Regards
> Etienne Sammut 
> 
> -----Original Message-----
> From: Richard McCluskey [mailto:rmccluskey@stripped] 
> Sent: Monday, October 26, 2009 3:35 PM
> To: Jonas Oreland
> Cc: cluster@stripped
> Subject: Re: GCP stop signal brings down data node
> 
> On Mon, 2009-10-26 at 15:24 +0100, Jonas Oreland wrote:
> > What kind of transactions do you run ?
> > Disk-based NDB is currently a bit sensitive to "big" transactions
> > 
> 90% of our DB work is single-record reads/writes. We do use some
> functions and stored procedures, but we do nothing that pulls large
> datasets.
> E.g. our current most common table has 9.8 million rows, but we only
> ever pull out single records by primary key. The cleanup cron for stale
> records runs in the middle of the night when our traffic is almost nil.
> 
> I hope this is what you were asking !
> 
> Richard
> 
> 
> 
> > /Jonas
> > 
> > Richard McCluskey wrote:
> > > Got this lovely message from a data node this weekend :
> > > 
> > > Time: Sunday 25 October 2009 - 15:04:01
> > > Status: Temporary error, restart node
> > > Message: System error, node killed during node restart by other node
> > > (Internal error, programming error or missing error message, please
> > > report a bug)
> > > Error: 2303
> > > Error data: Node 4 killed this node because GCP stop was detected
> > > Error object: NDBCNTR (Line: 263) 0x0000000a
> > > Program: ndbd
> > > Pid: 21156
> > > Trace: /var/lib/mysql/ndb_4_trace.log.5
> > > Version: mysql-5.1.35 ndb-7.0.7
> > > ***EOM***
> > > 
> > > Interesting thing is there was no other node that got restarted. I did
> > > look this up and see that it can be caused by slow disk (we use disk
> > > based NDB), or insufficient disk throughput but my disks are raided
> > > SCSI, so I doubt it is that. From my config.ini (relevant parts only) :
> > > 
> > > DataMemory=10860M
> > > IndexMemory=1358M
> > > SharedGlobalMemory=384M
> > > DiskPageBufferMemory=2048M
> > > 
> > > I did the following calculations for memory allocation :
> > > 
> > > total memory   = 16384 MB
> > > OS reqs        = 1126 MB
> > > Buffer Memory  = 900 MB
> > > DataMemory     = 10860 MB
> > > IndexMemory    = 1358 MB
> > > 
> > > DPBM = 0.8 * (total memory - (OS + Buffer + data + index))
> > > DPBM = 0.8 * (16384 - (1126 + 900 + 10860 + 1358))
> > > DPBM = 1712
> > > 
> > > Therefore DiskPageBufferMemory should be minimum 1712M, so setting it to
> > > 2048M should leave me loads of room right ?
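
As a quick check of the arithmetic quoted above, here is a small sketch
that recomputes it. The numbers are copied from the quoted calculation and
config.ini; the function and constant names are made up for this example
and are not NDB settings or APIs. It only reproduces the sums, it says
nothing about how ndbd actually allocates memory.

```python
# All values in MB, copied from the calculation quoted above.
TOTAL_MB = 16384
OS_MB = 1126
BUFFER_MB = 900
DATA_MEMORY_MB = 10860
INDEX_MEMORY_MB = 1358
CONFIGURED_DPBM_MB = 2048  # DiskPageBufferMemory as set in config.ini


def remaining_mb():
    """Memory left after OS, buffers, DataMemory and IndexMemory."""
    return TOTAL_MB - (OS_MB + BUFFER_MB + DATA_MEMORY_MB + INDEX_MEMORY_MB)


def dpbm_estimate_mb(percent=80):
    """The 0.8 * remainder figure, in integer arithmetic to avoid float noise."""
    return remaining_mb() * percent // 100


print(remaining_mb())                       # 2140
print(dpbm_estimate_mb())                   # 1712
print(remaining_mb() - CONFIGURED_DPBM_MB)  # 92
```

So the 1712 figure checks out, and 2048M sits 336M above it while leaving
about 92M of the 2140M remainder. Whether that counts as room to spare
depends on whether 1712M is read as a floor or as a budget.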
> > > 
> > > So can anyone tell me why we are having this issue with Stopped
> > > datanodes ? It isn't doing much for the pointy-haired bosses'
> > > confidence !
> > > 
> > > Obviously let me know if I really need to file a bug, and I'll upload
> > > the tracelog etc ...
> > > 
> > > Richard
> > > 
> > > 
> > 
> 
> -- 
> MySQL Cluster Mailing List
> For list archives: http://lists.mysql.com/cluster
> To unsubscribe:
> http://lists.mysql.com/cluster?unsub=1
> 