From: Ole John Aske
Date: August 14 2012 11:25am
Subject: bzr push into mysql-5.5-cluster-7.2 branch (ole.john.aske:3974 to 3975) Bug#14489398
List-Archive: http://lists.mysql.com/commits/144561
X-Bug: 14489398
Message-Id: <20120814112548.15961.56448.3975@fimafeng09.no.oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

 3975 Ole John Aske	2012-08-14
      Fix for Bug#14489398 CLUSTER BACKUP FAILS WHEN USING NDBMTD AND CONFIGURED MULTIPLE LQHS

      Fix regression introduced by push:

        3923 Ole John Aske	2012-05-23
        This is the improved 'save_mem.patch' from Mikael R. patch set

      That push tried to save memory used for inter-thread communication
      buffers by not allocating buffers between threads that were assumed
      not to communicate. LDM <-> LDM communication was one of the places
      where such buffers were removed.

      However, the BACKUP block is part of the LDM thread, and during
      backup the first BACKUP instance, in LDM thread #1, acts as
      client/coordinator. Thus it *will* communicate with the other LDM
      threads!

      This fix introduces special handling of LDM thread #1 and 'opens up'
      communication between that thread and the other LDM threads.

      Instead of writing a completely new testcase to cover this fix, I have
      introduced 'ndb_restore_misc.cnf', which sets
      'MaxNoOfExecutionThreads=4' so that at least one of the existing
      backup-restore tests is run with multiple LDM threads.

      If BACKUP is later made fully multithreaded, the changes to mt.cpp
      in this patch should be reverted.
    added:
      mysql-test/suite/ndb/t/ndb_restore_misc.cnf
    modified:
      storage/ndb/src/kernel/vm/mt.cpp
 3974 Frazer Clement	2012-08-13 [merge]
      Merge 7.1->7.2

    modified:
      storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
=== added file 'mysql-test/suite/ndb/t/ndb_restore_misc.cnf'
--- a/mysql-test/suite/ndb/t/ndb_restore_misc.cnf	1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/ndb/t/ndb_restore_misc.cnf	2012-08-14 11:23:36 +0000
@@ -0,0 +1,7 @@
+!include suite/ndb/my.cnf
+
+[cluster_config]
+# If MT-ndb, there should be multiple BACKUP blocks
+# (There is one pr LDM)
+MaxNoOfExecutionThreads=4
+

=== modified file 'storage/ndb/src/kernel/vm/mt.cpp'
--- a/storage/ndb/src/kernel/vm/mt.cpp	2012-06-18 13:54:07 +0000
+++ b/storage/ndb/src/kernel/vm/mt.cpp	2012-08-14 11:23:36 +0000
@@ -4068,6 +4068,17 @@ is_ldm_thread(unsigned thr_no)
     thr_no < NUM_MAIN_THREADS+globalData.ndbMtLqhThreads;
 }
 
+/**
+ * All LDM threads are not created equal:
+ * First LDMs BACKUP-thread act as client during BACKUP
+ * (See usage of Backup::UserBackupInstanceKey)
+ */
+static bool
+is_first_ldm_thread(unsigned thr_no)
+{
+  return thr_no == NUM_MAIN_THREADS;
+}
+
 static bool
 is_tc_thread(unsigned thr_no)
 {
@@ -4099,9 +4110,16 @@ may_communicate(unsigned from, unsigned
   }
   else if (is_ldm_thread(from))
   {
-    // LQH threads can communicates with TC-, main- and itself
+    // First LDM is special as it may act as internal client
+    // during backup, and thus communicate with other LDMs:
+    if (is_first_ldm_thread(from) && is_ldm_thread(to))
+      return true;
+
+    // All LDM threads can communicates with TC-, main-
+    // itself, and the BACKUP client (above)
     return is_main_thread(to) ||
            is_tc_thread(to) ||
+           is_first_ldm_thread(to) ||
            (to == from);
   }
   else if (is_tc_thread(from))
@@ -4426,6 +4444,14 @@ compute_jb_pages(struct EmulatorData * e
     job_queue_pages_per_thread;
 
   /**
+   * First LDM thread is special as it will act as client
+   * during backup. It will send to, and receive from (2x)
+   * the 'num_lqh_threads - 1' other LQH threads.
+   */
+  tot += 2 * (num_lqh_threads-1) *
+    job_queue_pages_per_thread;
+
+  /**
    * TC threads can communicate with SPJ-, LQH- and main threads.
    * Cannot communicate with receive threads and other TC threads,
    * but as SPJ is located together with TC, it is counted as it

No bundle (reason: useless for push emails).
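
For readers who want to try the new rule outside the kernel, here is a minimal standalone sketch of the LDM branch of may_communicate() and of the extra page count added to compute_jb_pages(). It is not the mt.cpp code. ThreadLayout, ldm_may_send_to(), extra_backup_pages() and check_example() are hypothetical names, and the thread ordering (main threads, then LDM threads, then TC threads) is an assumption that stands in for NUM_MAIN_THREADS and globalData. The zero-thread guard is also an addition that the patch does not have.

```cpp
#include <cassert>

// Hypothetical, simplified thread layout (assumption): main threads come
// first, then LDM threads, then TC threads. The real mt.cpp takes these
// values from NUM_MAIN_THREADS and globalData.
struct ThreadLayout {
  unsigned num_main_threads;
  unsigned num_ldm_threads;
  unsigned num_tc_threads;
};

inline bool is_main_thread(const ThreadLayout& l, unsigned thr_no) {
  return thr_no < l.num_main_threads;
}

inline bool is_ldm_thread(const ThreadLayout& l, unsigned thr_no) {
  return thr_no >= l.num_main_threads &&
         thr_no < l.num_main_threads + l.num_ldm_threads;
}

// The first LDM thread hosts the BACKUP instance that acts as client.
inline bool is_first_ldm_thread(const ThreadLayout& l, unsigned thr_no) {
  return thr_no == l.num_main_threads;
}

inline bool is_tc_thread(const ThreadLayout& l, unsigned thr_no) {
  const unsigned first_tc = l.num_main_threads + l.num_ldm_threads;
  return thr_no >= first_tc && thr_no < first_tc + l.num_tc_threads;
}

// Mirrors only the LDM branch of may_communicate() after the patch.
// 'from' is expected to be an LDM thread.
inline bool ldm_may_send_to(const ThreadLayout& l, unsigned from, unsigned to) {
  // The first LDM may talk to every LDM thread during backup.
  if (is_first_ldm_thread(l, from) && is_ldm_thread(l, to))
    return true;
  // Every LDM may talk to main, TC, the first LDM, and itself.
  return is_main_thread(l, to) ||
         is_tc_thread(l, to) ||
         is_first_ldm_thread(l, to) ||
         (to == from);
}

// Extra job-buffer pages for the first LDM: one send and one receive
// direction (2x) towards each of the other LDM threads.
inline unsigned extra_backup_pages(unsigned num_ldm_threads,
                                   unsigned pages_per_thread) {
  if (num_ldm_threads == 0)
    return 0;  // guard added in this sketch only, avoids unsigned wrap
  return 2 * (num_ldm_threads - 1) * pages_per_thread;
}

// Small self-check with 2 main, 4 LDM and 1 TC thread:
// thread numbers 0-1 are main, 2-5 are LDM, 6 is TC.
inline void check_example() {
  const ThreadLayout l{2, 4, 1};
  assert(ldm_may_send_to(l, 2, 4));   // first LDM -> other LDM
  assert(ldm_may_send_to(l, 4, 2));   // other LDM -> first LDM
  assert(!ldm_may_send_to(l, 3, 4));  // two non-first LDMs stay closed
  assert(extra_backup_pages(4, 10) == 60);
}
```

Calling check_example() from a small main() exercises the sketch. The pair (3, 4) staying closed is the memory saving from push 3923 that this fix keeps.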