I have been trying to understand the behavior of SHOW SLAVE HOSTS, and it doesn't seem
to match the documentation, so I went to the source and got even more confused :-)
I'm sure I am committing several sins here, including looking at 5.1 code while running
5.0.40, but the code I'm looking at is the Doxygen-ized 5.1 version of
sql/repl_failsafe.cc at
http://dev.mysql.com/sources/doxygen/mysql-5.1/repl__failsafe_8cc-source.html
So far, I find there is a hash table called slave_list, which register_slave() inserts
into and which is read by the function (whose name I have now forgotten and can't
find) that SHOW SLAVE HOSTS calls.
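A toy model of that reading, in Python rather than the server's C++, and not taken from the MySQL code: a table keyed by server_id that registration writes to and the SHOW SLAVE HOSTS handler reads from. Only the names slave_list and register_slave() come from the source file; SlaveInfo, unregister_slave() and list_slave_hosts() are names made up for illustration, and removing an entry on disconnect is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveInfo:
    """Illustrative record; fields mirror the SHOW SLAVE HOSTS columns."""
    server_id: int
    host: str
    port: int
    master_id: int


# Toy stand-in for the slave_list hash table, keyed by server_id.
slave_list = {}


def register_slave(info):
    """Insert the entry for this server_id, overwriting any older one."""
    slave_list[info.server_id] = info


def unregister_slave(server_id):
    """Assumed counterpart: drop the entry when the slave goes away."""
    slave_list.pop(server_id, None)


def list_slave_hosts():
    """Hypothetical reader: the rows a SHOW SLAVE HOSTS handler would return."""
    return sorted(slave_list.values(), key=lambda s: s.server_id)


register_slave(SlaveInfo(20, "fries", 3306, 21))
register_slave(SlaveInfo(40, "fresno", 3306, 21))
print([s.host for s in list_slave_hosts()])  # ['fries', 'fresno']
```

In this model an entry only disappears if something calls unregister_slave(); a server that vanishes without doing so leaves its row behind.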
It looks to me like the command doesn't work quite the way it is supposed to. As I read
it, each slave should know at all times which other slaves are connected, because each
slave reports to and reads from its master, and the master updates the slave whenever
another slave connects or disconnects (I think -- I am not very good at reading the
source). Yet on my 5.0.40 setup, I have the following replication topology:
portland
  => fries
  => fresno
       => nepal
And on these servers, I see the following:
on portland:
+-----------+--------+------+-------------------+-----------+
| Server_id | Host   | Port | Rpl_recovery_rank | Master_id |
+-----------+--------+------+-------------------+-----------+
|        40 | fresno | 3306 |                 0 |        21 |
|        20 | fries  | 3306 |                 0 |        21 |
+-----------+--------+------+-------------------+-----------+
on fries:
+-----------+----------+------+-------------------+-----------+
| Server_id | Host     | Port | Rpl_recovery_rank | Master_id |
+-----------+----------+------+-------------------+-----------+
|        21 | portland | 3306 |                 0 |        11 |
|        40 | fresno   | 3306 |                 0 |        21 |
|        20 | fries    | 3306 |                 0 |        21 |
+-----------+----------+------+-------------------+-----------+
on fresno:
+-----------+----------+------+-------------------+-----------+
| Server_id | Host     | Port | Rpl_recovery_rank | Master_id |
+-----------+----------+------+-------------------+-----------+
|         9 | nepal    | 3306 |                 0 |        40 |
|        40 | fresno   | 3306 |                 0 |        21 |
|        20 | fries    | 3306 |                 0 |        21 |
|        21 | portland | 3306 |                 0 |        11 |
+-----------+----------+------+-------------------+-----------+
on nepal:
+-----------+----------+------+-------------------+-----------+
| Server_id | Host     | Port | Rpl_recovery_rank | Master_id |
+-----------+----------+------+-------------------+-----------+
|        21 | portland | 3306 |                 0 |        11 |
|        20 | fries    | 3306 |                 0 |        21 |
|        40 | fresno   | 3306 |                 0 |        21 |
|        42 | portland | 3306 |                 0 |        11 |
|         9 | nepal    | 3306 |                 0 |        40 |
+-----------+----------+------+-------------------+-----------+
I'm sure you have guessed some of these servers have swapped roles at various times.
For example, portland used to be a slave of usa, which it replaced (after an OS
rebuild) and which is no longer in use. Likewise, I think nepal used to be a slave of
portland, a very long time ago -- probably six months ago. But all of these servers
have surely been restarted, if not given a new OS, during the swapping. Why the
obsolete entry for portland (server_id 42; it is currently server_id 21) on nepal?
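One way to spot this kind of leftover mechanically is to group the rows by host and port and flag any endpoint that appears under more than one server_id. The sketch below is Python with the rows copied from the nepal output above; the function name duplicate_endpoints() is made up for illustration, and it cannot tell which of the duplicate ids is the stale one.

```python
from collections import defaultdict

# Rows copied from the SHOW SLAVE HOSTS output on nepal:
# (Server_id, Host, Port, Rpl_recovery_rank, Master_id)
NEPAL_ROWS = [
    (21, "portland", 3306, 0, 11),
    (20, "fries", 3306, 0, 21),
    (40, "fresno", 3306, 0, 21),
    (42, "portland", 3306, 0, 11),
    (9, "nepal", 3306, 0, 40),
]


def duplicate_endpoints(rows):
    """Return {(host, port): [server_ids]} for endpoints listed more than once."""
    by_endpoint = defaultdict(list)
    for server_id, host, port, _rank, _master_id in rows:
        by_endpoint[(host, port)].append(server_id)
    return {ep: sorted(ids) for ep, ids in by_endpoint.items() if len(ids) > 1}


print(duplicate_endpoints(NEPAL_ROWS))  # {('portland', 3306): [21, 42]}
```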
What should this command really show in my setup? Should each of the four machines
show the same thing? (I think they are meant to) Should a server unregister itself
when it is stopped, and is the old entry for portland on nepal therefore a bug?
Finally, a question on the code itself, from the file linked above:
    Asks the master for the list of its other connected slaves.
    This is for failsafe replication:
    in order for failsafe replication to work, the servers involved in
    replication must know of each other. We accomplish this by having each
    slave report to the master how to reach it, and on connection, each
    slave receives information about where the other slaves are.
Shouldn't each slave also receive information about the other slaves whenever a new
slave connects? I realize this becomes one of those O(n(n-1)) kinds of problems but it
seems like the only way to get correct behavior -- unless only one server (the topmost
master in the replication tree) ever stores any information about which slaves are
connected. But then I imagine this isn't exactly failsafe.
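To make the difference concrete, here is a small Python simulation. It is a toy under stated assumptions (one master, slaves connecting one at a time, nobody disconnecting), not the MySQL protocol, and simulate() is a made-up name. It compares "snapshot on connection only" with "snapshot on connection plus a push to every already-connected slave".

```python
def simulate(n_slaves, broadcast):
    """Connect n_slaves one at a time to a single master.

    Each slave receives a snapshot of the registered slaves when it connects.
    If broadcast is True, the master also pushes the updated list to every
    slave that was already connected. Returns (views, messages_sent).
    """
    registered = []  # the master's own list
    views = {}       # slave id -> the list that slave currently holds
    messages = 0
    for slave in range(1, n_slaves + 1):
        if broadcast:
            for other in registered:
                views[other] = registered + [slave]
                messages += 1
        registered.append(slave)
        views[slave] = list(registered)  # snapshot on connection
        messages += 1
    return views, messages


views, sent = simulate(4, broadcast=False)
print(views[1], sent)  # [1] 4
views, sent = simulate(4, broadcast=True)
print(views[1], sent)  # [1, 2, 3, 4] 10
```

With snapshots only, the first slave never learns about the later ones. With the push, every view is complete, at the price of n(n-1)/2 extra messages on top of the n snapshots, which is the quadratic growth mentioned above.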
Thanks for reading my disjointed thoughts and questions!
Baron
--
Baron Schwartz
http://www.xaprb.com/