I have a production cluster running with 2 data nodes, 2 SQL nodes and 1
management node.
I also have a slave of one of the above servers, using the InnoDB plugin for its data,
which is running fine.
One day, while trying to change some parameters related to disk-based tables,
I got an error
and the cluster was not able to restart/recover. In this case, I had to
start the cluster with the --initial
option again and reload/restore the data from the slave. But this took
considerable time (around 2 hours).
I was safe because it was off-peak time, so it did not impact the customers.
How can I handle this kind of complete cluster failure, so as to
have no downtime at all
or at least to recover quickly?
I am sure somebody has faced this kind of issue before.
Advice/guidance in this regard
would be highly appreciated.
Thank you all in advance.