List: Cluster
From: Johan Andersson   Date: July 23 2004 9:12pm
Subject: Re: unable to start (infinite crash loop) and workaround
Hi,

Thank you very much.

Devananda wrote:

>
> While not running any inserts, the results of /sbin/hdparm were
> Timing buffer-cache reads:   3744 MB in  2.00 seconds = 1872.00 MB/sec
> Timing buffered disk reads:  140 MB in  3.02 seconds =  46.36 MB/sec
>
> At the time of this crash, I was only running inserts from 3 API 
> nodes. I can just as easily run from 1 or from all 5. Today, I cleared 
> all the data directories, started up fresh, and began running inserts 
> from 4 of the 5 API nodes, and watched vmstat. The 'bo' column was 
> consistently over 1000, usually around 5000, and sometimes spiked to 
> 9- or 10,000.


So the disk is pretty loaded (assuming you have 4096-byte blocks), and 
also considering that write performance is generally slower than read. 
Change TimeBetweenGlobalCheckpoints to 500 ms, and also set 
TimeBetweenLocalCheckpoints to 20. Both of these changes will make the 
disk writes spread out a bit more. Try that; I am curious to know if 
there is any difference in the blocks-out ("bo") column.
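
For reference, here is roughly how those two settings could look in 
config.ini. This is just a sketch: the section name ([NDBD DEFAULT] 
below) is an assumption on my part, so put the parameters wherever your 
data node defaults already live:

   [NDBD DEFAULT]
   # values as suggested above; a restart of the ndbd nodes is most
   # likely needed for them to take effect
   TimeBetweenGlobalCheckpoints=500
   TimeBetweenLocalCheckpoints=20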

Also, it would be interesting to see cat /proc/loadavg when you have 
these peaks.
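
If it helps, a small loop like this (just a sketch; pick whatever 
interval you like) can record the load average alongside vmstat while 
the inserts are running:

   # sample the load average every 5 seconds during the insert run
   while true; do date; cat /proc/loadavg; sleep 5; done

   # in another terminal, keep watching the "bo" column
   vmstat 5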

I will distribute the information to the right people. Thank you very much.

Good luck to you too,
johan

> The only thing running on these boxes, besides the cluster, is PHP, 
> which I use for the insert script.
>
>
> Good luck! Let me know if there's anything more I can do.
>
> Best regards,
> Devananda
>
>
>
> Johan Andersson wrote:
>
>> Hi,
>>
>> Thank you for providing us with important test data! I have also 
>> noticed problems with system restart and your information is very 
>> valuable. We are very interested in getting the tracefiles. Can you 
>> put together the information as follows:
>>
>> devananda_mgm.tgz (cluster.log + config.ini)
>> devananda_db12.tgz (tracefiles + error.log)
>> devananda_db13.tgz (tracefiles + error.log)
>> devananda_db14.tgz (tracefiles + error.log)
>> devananda_db15.tgz (tracefiles + error.log)
>>
>> Send this to me privately (because of the size of the attachments) 
>> and I will distribute it to the right people!
>>
>> Also, I noticed that your NDB nodes started to miss heartbeats. So I 
>> have a couple of questions and recommendations:
>>
>> * Was the system heavily loaded when the nodes started to miss 
>> heartbeats?
>> * What hardware are you using (disk subsystem (IDE, SCSI), CPU)?
>> * Do two or more NDB nodes share a single disk?
>> * Can you run /sbin/hdparm -Tt /dev/hdX (where X is the drive that
>> holds the NDB filesystem)?
>>
>> If the system is heavily loaded and the disks are slow, then there is 
>> a chance that the NDB nodes can miss heartbeats. This can happen 
>> because the NDB nodes write checkpoints and transaction logs to 
>> disk, which can be very disk intensive. Could you run vmstat 1 (a 
>> program that at least exists on Linux) and tell me how many blocks 
>> per second are written to disk (look for the "bo" column), and also 
>> what block size and file system (ext3, reiserfs, ...) you are 
>> currently using?
>>
>> A way to flatten out the disk writes is to change 
>> TimeBetweenLocalCheckpoints to ~500. This means that the REDO log 
>> buffers will be flushed to disk more often, so each individual disk 
>> write is smaller. Otherwise, during high (write) load the REDO log 
>> buffers can become big, resulting in a lot of information that must 
>> be written to disk at once. During these disk writes on a very 
>> loaded system, other processes can be stalled because they must wait 
>> for I/O, so heartbeats might not be sent as they should.
>>
>> In any case, you should always be able to do a system restart. Sorry 
>> for the inconvenience caused, and thanks again for your help.
>>
>> Best regards,
>> Johan Andersson
>>
>>
>>
>> Devananda wrote:
>>
>>> Sorry! I forgot to post the workaround.
>>>
>>> I executed 'all stop' and deleted the NDB data storage directory on 
>>> 2 of my 4 DB nodes (one from each pair). When I restarted the 
>>> cluster, it started slowly and then copied data over onto the 2 that 
>>> I deleted. This worked once, but I am having trouble making it work 
>>> a second time.

