----- Original Message -----
> From: "Janos Lehnhardt" <janos.lehnhardt@stripped>
> 2012-10-18 15:32:46 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 4
> 2012-10-18 15:32:46 [MgmSrvr] ALERT -- Node 2: Node 4 declared dead due to missed
> 2012-10-18 15:32:46 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
Well, this is obviously why it keeps getting disconnected. Now the question is, why is it
missing heartbeats? You say there are no firewalls. I assume all nodes are connected to the
same switch, then? Plug node 4 into another port, change the cable, et cetera to rule out
network issues. Use a different network card, if you can. Also check the general system
logs for unusual messages.
Is the system overloaded, CPU-wise? Is it swapping badly? Run ping against one of the
other nodes (or better still, MTR) to see how the network behaves right before the
disconnect. Grab a tcpdump and see whether anything interesting shows up there.
Hmm. Also, check whether it's something as simple as another device claiming the same IP.
Seen that one happen before, too :-)
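
To line the ping/MTR/tcpdump output up against the disconnects, it helps to have the
exact timestamps of the missed heartbeats. Below is a rough Python sketch that pulls
them out of the management log. It assumes every relevant line looks like the WARNING
line you quoted; the function name and the command-line handling are just made up for
illustration, and the log path is whatever you pass in.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumed format, taken from the quoted log line:
# 2012-10-18 15:32:46 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 4
MISSED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[MgmSrvr\]\s+WARNING\s+--\s+"
    r"Node (?P<reporter>\d+): Node (?P<target>\d+) missed heartbeat (?P<count>\d+)"
)


def missed_heartbeats(lines):
    """Return (timestamp, reporting node, silent node, heartbeat count) tuples."""
    events = []
    for line in lines:
        m = MISSED.search(line)
        if m is None:
            continue  # ALERT/INFO lines and anything else are skipped
        events.append((
            datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            int(m.group("reporter")),
            int(m.group("target")),
            int(m.group("count")),
        ))
    return events


if __name__ == "__main__":
    # Hypothetical usage: python missed_hb.py /path/to/cluster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        events = missed_heartbeats(fh)
    for ts, reporter, target, count in events:
        print(f"{ts}  node {reporter} reports node {target} missed heartbeat {count}")
    # Which node goes silent most often?
    for node, n in Counter(e[2] for e in events).most_common():
        print(f"node {node}: {n} missed-heartbeat warnings")
```

If the warnings cluster around particular times of day, compare those times with the
system logs and the ping output for the same moments.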
Linux Bier Wanderung 2012, now also available in Belgium!
August, 12 to 19, Diksmuide, Belgium - http://lbw2012.tuxera.be