List: General Discussion
From: Sasha Pachev
Date: April 29 2002 3:55pm
Subject: Re: Slave crashes with SEGV on master shutdown

On Monday 29 April 2002 04:39 am, David Harper wrote:
> >Description:
> 	I'm running the pre-compiled Compaq Alpha (OSF1) version of MySQL 3.23.49
> 	with master/slave replication.
> 
> 	The master mysqld is running on one machine, the slave on another.
> 	Everything works fine until I shutdown the master server. The slave
> 	then immediately crashes due to a segmentation violation fault. Here
> 	are the lines from the mysqld_multi.log file:
> 
> /nfs/pathsoft/external/mysql-3.23.49/libexec/mysqld: ready for connections
> 020426 11:26:15  Slave: connected to master 'slave@babel:14641',  replication started in log 'mysql.001' at position 156
> 020426 11:27:03  Slave: received 0 length packet from server, apparent master shutdown:  (0)
> 020426 11:27:03  Slave: Failed reading log event, reconnecting to retry, log 'mysql.002' position 73
> mysqld got signal 11;
> 
> 	I can make the slave server crash *every* time it loses its connection
> 	to the master server.
> 
> 	It's not a hardware problem on one machine, because I have run the master
> 	and slave servers on several combinations of machines and the slave
> 	crashes *every* time.
> 
> 	It might help you to know that when I run a slave server on an i386 Linux
> 	machine, it survives when the master server on the Alpha machine is shut
> 	down, and it happily reconnects when I restart the master server.
> 
> 	This leads me to think that the problem is in the slave code, and is
> 	specific to the build for Compaq Alphas.
> 
> 	I built mysqld from the source code with the --with-debug option specified
> 	to the configure script. Then I duplicated the slave server crash and found
> 	that the problem is in the code which tried to re-connect to the master.
> 	Specifically, the SEGV fault occurs within call to gethostbyname_r. Here is
> 	the debugger traceback:
> 
> (ladebug) where
> >0  0x12025a538 in __nxm_thread_kill(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #1  0x120242ac4 in pthread_kill(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #2  0x1201b48d4 in write_core(sig=11) "stacktrace.c":220
> #3  0x120103f48 in handle_segfault(sig=11) "mysqld.cc":1287
> #4  0x120287bcc in __sigtramp(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #5  0x1202b2b48 in rewind(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #6  0x1202928e4 in UnknownProcedure1FromFile1780(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #7  0x120293330 in UnknownProcedure13FromFile1780(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #8  0x1202953e0 in __gethostbyname_r(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #9  0x12021c6e4 in my_gethostbyname_r(name=0x1400788d8="babel", result=0x20000f3b318, buffer=0x20000f3b418="", buflen=8840, h_errnop=0x20000f3b338) "my_pthread.c":440
> #10 0x1201b27ec in mc_mysql_connect(mysql=0x20000f3d6e8, host=0x1400788d8="babel", user=0x1400994a0="slave", passwd=0x140098320="mylittlesecret", db=0x0, port=14641, unix_socket=0x0, client_flag=133) "mini_client.cc":622
> #11 0x1201b20ac in mc_mysql_reconnect(mysql=0x14009fb00) "mini_client.cc":416
> #12 0x1201ae040 in safe_reconnect(thd=0x140079400, mysql=0x14009fb00, mi=0x14005bb20) "slave.cc":1517
> #13 0x1201adae8 in handle_slave(arg=0x0) "slave.cc":1384
> #14 0x12023f648 in __thdBase(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
>

David:

At this point, I have two theories:

a) There is something wrong with your gethostbyname_r function.
b) MySQL has a subtle buffer overrun (probably only a couple of bytes) that,
in your case, happens to corrupt some critical structures in __gethostbyname_r.

If a) is the case, I would first test how well gethostbyname_r handles 
sequences of repeated calls. I would imagine the bug manifests only 
with a certain name resolution setup.
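
A repeated-call test along these lines could be sketched as below. This is only
a minimal sketch, not MySQL code: it assumes the six-argument glibc form of
gethostbyname_r, whereas other platforms (including, as far as I know, the
Alpha OSF1 libc) declare it with a different argument list, so the call would
need adapting there. The function name resolve_repeatedly and the 8192-byte
buffer are hypothetical choices made for illustration.

```c
#define _GNU_SOURCE
#include <netdb.h>
#include <stdio.h>

/*
 * Hypothetical helper: resolve `host` `iterations` times in a row using
 * the reentrant resolver and return the number of failed lookups.
 * ASSUMPTION: glibc-style gethostbyname_r(name, ret, buf, buflen,
 * result, h_errnop); other systems use a different signature.
 */
int resolve_repeatedly(const char *host, int iterations)
{
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        struct hostent he;              /* filled in by the resolver      */
        struct hostent *result = NULL;  /* points at `he` on success      */
        char buf[8192];                 /* scratch space (arbitrary size) */
        int herr = 0;                   /* resolver error code            */

        int rc = gethostbyname_r(host, &he, buf, sizeof buf, &result, &herr);
        if (rc != 0 || result == NULL) {
            failures++;
            fprintf(stderr, "lookup %d of %s failed: rc=%d h_errno=%d\n",
                    i, host, rc, herr);
        }
    }
    return failures;
}
```

Calling resolve_repeatedly("babel", 1000) from a small main() on the slave
machine, and then again with the master's numeric address, would show whether
repeated lookups alone are enough to trigger the crash under a given name
resolution setup.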

As a temporary workaround, I would suggest using a numeric IP address, or 
trying a different name resolution configuration (e.g. put the master in 
/etc/hosts instead of the name server, or vice versa).

-- 
MySQL Development Team
For technical support contracts, visit https://order.mysql.com/?ref=mspa
   __  ___     ___ ____  __ 
  /  |/  /_ __/ __/ __ \/ /   Sasha Pachev <sasha@stripped>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
/_/  /_/\_, /___/\___\_\___/  Provo, Utah, USA
       <___/                  