From: Hendrik Woltersdorf
Date: October 22 2012 8:36am
Subject: Re: cluster 5.5.27-ndb-7.2.8 crash with signal 11 in ndb_binlog_thread_func
List-Archive: http://lists.mysql.com/cluster/8423
Message-Id: <50850589.8070403@xcom.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

On 17.10.2012 11:38, Hendrik Woltersdorf wrote:
> Hi,
>
> after upgrading from version 7.1.18 to 7.2.8 we regularly get a crash
> in all systems under the conditions described in bug report #65979:
>
> 09:38:47 UTC - mysqld got signal 11 ;
> This could be because you hit a bug. It is also possible that this binary
> or one of the libraries it was linked against is corrupt, improperly
> built,
> or misconfigured. This error can also be caused by malfunctioning
> hardware.
> We will try our best to scrape up some info that will hopefully help
> diagnose the problem, but since we have already crashed,
> something is definitely wrong and this may fail.
>
> key_buffer_size=268435456
> read_buffer_size=262144
> max_used_connections=1
> max_threads=200
> thread_count=1
> connection_count=1
> It is possible that mysqld could use up to
> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
> 418037 K bytes of memory
> Hope that's ok; if not, decrease some variables in the equation.
>
> Thread pointer: 0x1ed3b260
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> stack_bottom = 496f3e98 thread_stack 0x40000
> /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x852145]
> /usr/sbin/mysqld(handle_fatal_signal+0x3e1)[0x708a01]
> /lib64/libpthread.so.0[0x327d40de70]
> /lib64/libc.so.6(strlen+0x30)[0x327c876170]
> /usr/sbin/mysqld[0x9fabf0]
> /usr/sbin/mysqld(ndb_binlog_thread_func+0x19da)[0xa0166a]
> /lib64/libpthread.so.0[0x327d4062f7]
> /lib64/libc.so.6(clone+0x6d)[0x327c8ce85d]
>
> Trying to get some variables.
> Some pointers may be invalid and cause the dump to abort.
> Query (0): is an invalid pointer
> Connection ID (thread ID): 1
> Status: NOT_KILLED
>
> I can reproduce this crash by running a "replace into" statement on a
> table where all columns are part of the primary key,
> e.g.:
> 'CREATE TABLE `sec_type` (
> `client_id` varchar(10) NOT NULL,
> `sec_type` varchar(10) NOT NULL,
> PRIMARY KEY (`client_id`,`sec_type`),
> KEY `SECTYPE` (`client_id`,`sec_type`)
> ) ENGINE=ndbcluster DEFAULT CHARSET=latin1'
>
> insert into sec_type values('1','TPIN');
> replace into sec_type values('1','TPIN');
>
> The "replace" crashes all SQL nodes in the cluster that have binary
> logging activated.
> Usually we install the generic binaries package, but a test with RPMs
> showed the same crash.
>
> All application developers gave me their word that they never use
> this kind of "replace". The general query log did not show any suspicious
> statements.
>
> Any ideas on how to find the killer statement and/or how to solve this
> problem?
> I do not want to downgrade to 7.1.* again.
>

For further tests I got the source code and patched it so that it no longer crashes at:

->store(first->next_master_log_file, (uint)strlen(first->next_master_log_file), &my_charset_bin);

in ha_ndbcluster_binlog.cc. The patch skips filling the columns next_file and next_position in ndb_binlog_index if first->next_master_log_file == NULL. This works.

During these tests I saw that in the crash case nothing gets written to the binary log. That makes sense, because in the end nothing has changed. But what makes no sense to me is that a row gets written to ndb_binlog_index when nothing gets written to the binary log. That might explain why input data is missing.

In the bug tracker, bug #65979 is set to "Not a Bug". Is someone able to set the status to "Is a Bug" or something else? (I can't find a button to do this.)

kind regards
Hendrik Woltersdorf
XCOM AG
Tel.: 0375/27008-580
Tel. mobile: 015222999584
Email: Hendrik.Woltersdorf@stripped
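
For illustration only, the NULL guard described in the message can be sketched as a standalone C++ fragment. This is not the actual patch and not the MySQL Cluster source: the types BinlogIndexRow and IndexColumns, the function fill_next_columns, and the field next_master_log_pos are hypothetical stand-ins. Only the name next_master_log_file and the columns next_file and next_position come from the message. The assumption is that the crash comes from calling strlen() on a NULL pointer, which matches the strlen frame in the backtrace.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for the per-epoch data referred to as "first" in the
// message. Only next_master_log_file is a name taken from the original post.
struct BinlogIndexRow {
    const char *next_master_log_file;        // may be NULL if nothing was logged
    unsigned long long next_master_log_pos;  // assumed name, for illustration
};

// Hypothetical stand-in for the two ndb_binlog_index columns in question.
struct IndexColumns {
    bool has_next;                 // false: columns left unfilled
    std::string next_file;
    unsigned long long next_position;
};

// Fill next_file / next_position only when a file name is present.
// strlen() on a NULL pointer is undefined behaviour, so the pointer is
// checked before it is used.
IndexColumns fill_next_columns(const BinlogIndexRow &row) {
    IndexColumns cols;
    cols.has_next = false;
    cols.next_position = 0;
    if (row.next_master_log_file != nullptr) {
        cols.next_file.assign(row.next_master_log_file,
                              std::strlen(row.next_master_log_file));
        cols.next_position = row.next_master_log_pos;
        cols.has_next = true;
    }
    return cols;
}
```

A guard like this only avoids the crash. It does not answer the open question of why a row is written to ndb_binlog_index at all when nothing is written to the binary log.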