From: Ole John Aske
Date: November 9 2011 8:55am
Subject: bzr push into mysql-5.1-telco-7.0 branch (ole.john.aske:4640 to 4641)
List-Archive: http://lists.mysql.com/commits/141875
Message-Id: <20111109085524.9DD9E233@fimafeng09.norway.sun.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

 4641 Ole John Aske	2011-11-09
      Fix for better load balancing of SPJ operations and reduced execution latency:

      Add a deterministic skew in the fragment list such that it is rotated
      one place for each SPJ request. This has the effect that a non-parallel
      scan is not started on the same subset of fragments by all SPJ executors.
      This results in better load balancing and reduced latency.

    modified:
      storage/ndb/src/kernel/blocks/dbspj/Dbspj.hpp
      storage/ndb/src/kernel/blocks/dbspj/DbspjMain.cpp

 4640 Ole John Aske	2011-11-09
      SPJ: Fix for buffered rows possibly referring to garbage after memory
      pages have been reorganized:

      When the SPJ block buffers rows, it also converts the RowPtr from
      RT_SECTION to RT_LINEAR type as part of ::storeRow(). After ::storeRow()
      has allocated memory for the row itself, the row is put into a 'map' by
      add_to_map(). However, as add_to_map() also does its own memory
      allocation, that may trigger a reorg of the memory page in order to
      reclaim freed memory blocks. Such a reorg may cause the previously
      allocated row to be moved, and the row pointer fetched earlier to become
      invalid - thus referring to garbage.

      Solution is to refetch the ptr after add_to_map() and fill in the
      returned RowPtr& as the final action.

      ....

      No MTR test, as I am only able to reproduce this with my rewritten
      calculate_batch_size() plus some additional hacks for testing it. These
      hacks effectively force a really tiny 'BatchByteSize' to be calculated.
      That in turn causes lots of repeated fetching (and buffering) from bushy
      scan queries.
    modified:
      storage/ndb/src/kernel/blocks/dbspj/DbspjMain.cpp

=== modified file 'storage/ndb/src/kernel/blocks/dbspj/Dbspj.hpp'
--- a/storage/ndb/src/kernel/blocks/dbspj/Dbspj.hpp	2011-09-29 11:43:27 +0000
+++ b/storage/ndb/src/kernel/blocks/dbspj/Dbspj.hpp	2011-11-09 08:54:55 +0000
@@ -871,6 +871,7 @@ public:
     Uint32 m_senderRef;
     Uint32 m_senderData;
     Uint32 m_rootResultData;
+    Uint32 m_rootFragId;
     Uint32 m_transId[2];
     TreeNode_list::Head m_nodes;
     TreeNodeCursor_list::Head m_cursor_nodes;

=== modified file 'storage/ndb/src/kernel/blocks/dbspj/DbspjMain.cpp'
--- a/storage/ndb/src/kernel/blocks/dbspj/DbspjMain.cpp	2011-11-09 08:39:50 +0000
+++ b/storage/ndb/src/kernel/blocks/dbspj/DbspjMain.cpp	2011-11-09 08:54:55 +0000
@@ -482,6 +482,7 @@ Dbspj::do_init(Request* requestP, const
   requestP->m_outstanding = 0;
   requestP->m_transId[0] = req->transId1;
   requestP->m_transId[1] = req->transId2;
+  requestP->m_rootFragId = LqhKeyReq::getFragmentId(req->fragmentData);
   bzero(requestP->m_lookup_node_data, sizeof(requestP->m_lookup_node_data));
 #ifdef SPJ_TRACE_TIME
   requestP->m_cnt_batches = 0;
@@ -777,6 +778,7 @@ Dbspj::do_init(Request* requestP, const
   requestP->m_transId[0] = req->transId1;
   requestP->m_transId[1] = req->transId2;
   requestP->m_rootResultData = req->resultData;
+  requestP->m_rootFragId = req->fragmentNoKeyLen;
   bzero(requestP->m_lookup_node_data, sizeof(requestP->m_lookup_node_data));
 #ifdef SPJ_TRACE_TIME
   requestP->m_cnt_batches = 0;
@@ -4630,12 +4632,17 @@ Dbspj::execDIH_SCAN_TAB_CONF(Signal* sig
   Ptr requestPtr;
   m_request_pool.getPtr(requestPtr, treeNodePtr.p->m_requestPtrI);
 
+  // Add a skew in the fragment lists such that we don't scan
+  // the same subset of frags fram all SPJ requests in case of
+  // the scan not being ' T_SCAN_PARALLEL'
+  Uint16 fragNoOffs = requestPtr.p->m_rootFragId % fragCount;
+
   Ptr fragPtr;
   Local_ScanFragHandle_list list(m_scanfraghandle_pool, data.m_fragments);
   if (likely(m_scanfraghandle_pool.seize(requestPtr.p->m_arena, fragPtr)))
   {
     jam();
-    fragPtr.p->init(0);
+    fragPtr.p->init(fragNoOffs);
     fragPtr.p->m_treeNodePtrI = treeNodePtr.i;
     list.addLast(fragPtr);
   }
@@ -4701,10 +4708,11 @@ Dbspj::execDIH_SCAN_TAB_CONF(Signal* sig
     {
       jam();
       Ptr fragPtr;
+      Uint16 fragNo = (fragNoOffs+i) % fragCount;
       if (likely(m_scanfraghandle_pool.seize(requestPtr.p->m_arena, fragPtr)))
       {
         jam();
-        fragPtr.p->init(i);
+        fragPtr.p->init(fragNo);
         fragPtr.p->m_treeNodePtrI = treeNodePtr.i;
         list.addLast(fragPtr);
       }

No bundle (reason: useless for push emails).
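
For readers who want to see the effect of the skew in isolation, the following is a minimal standalone sketch of the rotation arithmetic used in the 4641 patch. It is not code from Dbspj: the function name rotatedFragOrder and the use of std::vector are assumptions made purely for illustration, and only the expressions `rootFragId % fragCount` and `(fragNoOffs + i) % fragCount` are taken from the diff above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Dbspj): returns the order in which
// fragment numbers would be added to the fragment list when the list is
// rotated by the root fragment id, as in the patch above.
std::vector<std::uint32_t> rotatedFragOrder(std::uint32_t rootFragId,
                                            std::uint32_t fragCount)
{
  std::vector<std::uint32_t> order;
  if (fragCount == 0)
    return order;  // nothing to scan; also avoids a modulo by zero

  order.reserve(fragCount);
  // Same offset computation as 'fragNoOffs' in the patch.
  const std::uint32_t fragNoOffs = rootFragId % fragCount;
  for (std::uint32_t i = 0; i < fragCount; i++)
  {
    // i == 0 yields fragNoOffs itself, matching init(fragNoOffs);
    // later iterations wrap around past the last fragment.
    order.push_back((fragNoOffs + i) % fragCount);
  }
  return order;
}
```

With four fragments, requests rooted in fragments 0 and 1 would start at fragment 0 and fragment 1 respectively, so under this sketch two non-parallel scans no longer begin on the same fragment.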