From: vasil.dimov Date: August 16 2010 2:23pm Subject: bzr commit into mysql-5.5-innodb branch (vasil.dimov:3156) Bug#53761 List-Archive: http://lists.mysql.com/commits/115812 X-Bug: 53761 Message-Id: <20100816142401.CB4932E0AB@mail.v5d.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5853949907862923917==" --===============5853949907862923917== MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline #At file:///usr/local/devel/bzrroot/server/mysql-5.5-innodb/ based on revid:inaam.rana@stripped 3156 Vasil Dimov 2010-08-16 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the n With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of In the bug report one of the examples has a btree with a snippet of the leaf level li page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages. With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation. In the bug report one of the examples has a btree with a snippet of the leaf level like this: page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=450 records per page which went terribly wrong. With this change page2 and page3 will be read and the exact number of records will be returned. Approved by: Sunny (rb://401) modified: storage/innobase/btr/btr0cur.c storage/innobase/include/btr0cur.h === modified file 'storage/innobase/btr/btr0cur.c' --- a/storage/innobase/btr/btr0cur.c revid:inaam.rana@stripped +++ b/storage/innobase/btr/btr0cur.c revid:vasil.dimov@stripped @@ -3153,6 +3153,7 @@ btr_cur_add_path_info( { btr_path_t* slot; rec_t* rec; + page_t* page; ut_a(cursor->path_arr); @@ -3175,8 +3176,155 @@ btr_cur_add_path_info( slot = cursor->path_arr + (root_height - height); + page = page_align(rec); + slot->nth_rec = page_rec_get_n_recs_before(rec); - slot->n_recs = page_get_n_recs(page_align(rec)); + slot->n_recs = page_get_n_recs(page); + slot->page_no = page_get_page_no(page); + slot->page_level = btr_page_get_level_low(page); +} + +/*******************************************************************//** +Estimate the number of rows between slot1 and slot2 for any level on a +B-tree. This function starts from slot1->page and reads a few pages to +the right, counting their records. If we reach slot2->page quickly then +we know exactly how many records there are between slot1 and slot2 and +we set is_n_rows_exact to TRUE. If we cannot reach slot2->page quickly +then we calculate the average number of records in the pages scanned +so far and assume that all pages that we did not scan up to slot2->page +contain the same number of records, then we multiply that average to +the number of pages between slot1->page and slot2->page (which is +n_rows_on_prev_level). In this case we set is_n_rows_exact to FALSE. +@return number of rows (exact or estimated) */ +static +ib_int64_t +btr_estimate_n_rows_in_range_on_level( +/*==================================*/ + dict_index_t* index, /*!< in: index */ + btr_path_t* slot1, /*!< in: left border */ + btr_path_t* slot2, /*!< in: right border */ + ib_int64_t n_rows_on_prev_level, /*!< in: number of rows + on the previous level for the + same descend paths; used to + determine the numbe of pages + on this level */ + ibool* is_n_rows_exact) /*!< out: TRUE if the returned + value is exact i.e. not an + estimation */ +{ + ulint space; + ib_int64_t n_rows; + ulint n_pages_read; + ulint page_no; + ulint zip_size; + ulint level; + + space = dict_index_get_space(index); + + n_rows = 0; + n_pages_read = 0; + + /* Assume by default that we will scan all pages between + slot1->page_no and slot2->page_no */ + *is_n_rows_exact = TRUE; + + /* add records from slot1->page_no which are to the right of + the record which serves as a left border of the range, if any */ + if (slot1->nth_rec < slot1->n_recs) { + n_rows += slot1->n_recs - slot1->nth_rec; + } + + /* add records from slot2->page_no which are to the left of + the record which servers as a right border of the range, if any */ + if (slot2->nth_rec > 1) { + n_rows += slot2->nth_rec - 1; + } + + /* count the records in the pages between slot1->page_no and + slot2->page_no (non inclusive), if any */ + + zip_size = fil_space_get_zip_size(space); + + /* Do not read more than this number of pages in order not to hurt + performance with this code which is just an estimation. If we read + this many pages before reaching slot2->page_no then we estimate the + average from the pages scanned so far */ + #define N_PAGES_READ_LIMIT 10 + + page_no = slot1->page_no; + level = slot1->page_level; + + do { + mtr_t mtr; + page_t* page; + buf_block_t* block; + + mtr_start(&mtr); + + /* fetch the page */ + block = buf_page_get(space, zip_size, page_no, RW_S_LATCH, + &mtr); + + page = buf_block_get_frame(block); + + /* It is possible that the tree has been reorganized in the + meantime and this is a different page. If this happens the + calculated estimate will be bogus, which is not fatal as + this is only an estimate. We are sure that a page with + page_no exists because InnoDB never frees pages, only + reuses them. */ + if (fil_page_get_type(page) != FIL_PAGE_INDEX + || btr_page_get_index_id(page) != index->id + || btr_page_get_level_low(page) != level) { + + /* The page got reused for something else */ + goto inexact; + } + + n_pages_read++; + + if (page_no != slot1->page_no) { + /* Do not count the records on slot1->page_no, + we already counted them before this loop. */ + n_rows += page_get_n_recs(page); + } + + page_no = btr_page_get_next(page, &mtr); + + mtr_commit(&mtr); + + if (n_pages_read == N_PAGES_READ_LIMIT + || page_no == FIL_NULL) { + /* Either we read too many pages or + we reached the end of the level without passing + through slot2->page_no, the tree must have changed + in the meantime */ + goto inexact; + } + + } while (page_no != slot2->page_no); + + return(n_rows); + +inexact: + + *is_n_rows_exact = FALSE; + + /* We did interrupt before reaching slot2->page */ + + if (n_pages_read > 0) { + /* The number of pages on this level is + n_rows_on_prev_level, multiply it by the + average number of recs per page so far */ + n_rows = n_rows_on_prev_level + * n_rows / n_pages_read; + } else { + /* The tree changed before we could even + start with slot1->page_no */ + n_rows = 10; + } + + return(n_rows); } /*******************************************************************//** @@ -3201,6 +3349,7 @@ btr_estimate_n_rows_in_range( ibool diverged_lot; ulint divergence_level; ib_int64_t n_rows; + ibool is_n_rows_exact; ulint i; mtr_t mtr; @@ -3243,6 +3392,7 @@ btr_estimate_n_rows_in_range( /* We have the path information for the range in path1 and path2 */ n_rows = 1; + is_n_rows_exact = TRUE; diverged = FALSE; /* This becomes true when the path is not the same any more */ diverged_lot = FALSE; /* This becomes true when the paths are @@ -3258,7 +3408,7 @@ btr_estimate_n_rows_in_range( if (slot1->nth_rec == ULINT_UNDEFINED || slot2->nth_rec == ULINT_UNDEFINED) { - if (i > divergence_level + 1) { + if (i > divergence_level + 1 && !is_n_rows_exact) { /* In trees whose height is > 1 our algorithm tends to underestimate: multiply the estimate by 2: */ @@ -3270,7 +3420,9 @@ btr_estimate_n_rows_in_range( to over 1 / 2 of the estimated rows in the whole table */ - if (n_rows > index->table->stat_n_rows / 2) { + if (n_rows > index->table->stat_n_rows / 2 + && !is_n_rows_exact) { + n_rows = index->table->stat_n_rows / 2; /* If there are just 0 or 1 rows in the table, @@ -3296,10 +3448,15 @@ btr_estimate_n_rows_in_range( divergence_level = i; } } else { - /* Maybe the tree has changed between - searches */ - - return(10); + /* It is possible that + slot1->nth_rec >= slot2->nth_rec + if, for example, we have a single page + tree which contains (inf, 5, 6, supr) + and we select where x > 20 and x < 30; + in this case slot1->nth_rec will point + to the supr record and slot2->nth_rec + will point to 6 */ + n_rows = 0; } } else if (diverged && !diverged_lot) { @@ -3323,8 +3480,9 @@ btr_estimate_n_rows_in_range( } } else if (diverged_lot) { - n_rows = (n_rows * (slot1->n_recs + slot2->n_recs)) - / 2; + n_rows = btr_estimate_n_rows_in_range_on_level( + index, slot1, slot2, n_rows, + &is_n_rows_exact); } } } === modified file 'storage/innobase/include/btr0cur.h' --- a/storage/innobase/include/btr0cur.h revid:inaam.rana@stripped +++ b/storage/innobase/include/btr0cur.h revid:vasil.dimov@stripped @@ -615,6 +615,11 @@ struct btr_path_struct{ order); value ULINT_UNDEFINED denotes array end */ ulint n_recs; /*!< number of records on the page */ + ulint page_no; /*!< no of the page containing the record */ + ulint page_level; /*!< level of the page, if later we fetch + the page under page_no and it is no different + level then we know that the tree has been + reorganized */ }; #define BTR_PATH_ARRAY_N_SLOTS 250 /*!< size of path array (in slots) */ --===============5853949907862923917== MIME-Version: 1.0 Content-Type: text/bzr-bundle; charset="us-ascii"; name="bzr/vasil.dimov@stripped" Content-Transfer-Encoding: 7bit Content-Disposition: inline # Bazaar merge directive format 2 (Bazaar 0.90) # revision_id: vasil.dimov@stripped # target_branch: file:///usr/local/devel/bzrroot/server/mysql-5.5-\ # innodb/ # testament_sha1: 10cee41fd1915b17fec7e9e9151a2b350a2629d5 # timestamp: 2010-08-16 17:24:01 +0300 # source_branch: file:///usr/local/devel/bzrroot/server/mysql-5.5-\ # bugfixing/ # base_revision_id: inaam.rana@stripped\ # 6so7vug3ykcay5n2 # # Begin bundle IyBCYXphYXIgcmV2aXNpb24gYnVuZGxlIHY0CiMKQlpoOTFBWSZTWSW0aCgABnb/gHAwAAF7f/// /+dfwL////5gEBwu2fUPUusXNd2ToQS7rbdtZazd3d2A52pum6KqhOwBTTkvhJEQE1PEnomTATJp NoEyaanp6phD1MmmhkDTQShNATQmTRAap6mg2jU9QMgABkABpoCJijUA0NABoAAAAAAAAABJqJEa NTSNDRqehPQjIaaaaDJoANA0DQA5hNAaA0aMI0GI0xMmJoMI0DIBkwEiQIAgJk0Q9Jkp5Hoo2iNN NAABoBk0vNJEGEAyITpRIUxiKMQRVXj+/9sKDioswJaKXKqUCsVggzJ7hyeXu55G2ihEQYojVUKR YLNo/k2ltNJQLMKlCKsiLGKaHe0+rf19OffXobym+UnzI+/Zg+pfqOXbNB/jg+nvlyUUjcFOF3gO xFmIFkB6n2uqaEpEQQzJBGToTECMDXE3xBUVBospxAlELC1qpsHFE/5fXoKHN8V0XVZV4uIe6zi4 9Ph01Zuqjfgz/Ixae2bZjp11q2FOG3ZLtjiJhMMZr2IgknZE+rOT33c7paotyxkwS6bZtMbLf3tK 9V8zZTk0yRzeHIixhHOG68sT4m/3fabeGI6QrOe2COTPBunpH90HvuONojXspmu9ruHM4zMgZlzy 41mSk9PK+GaaaE3IcvhqjPcY5t215Tx0JMm0UftnnGdW7vV3u4nZxc69/H7ufczkn0z3/7ZNuU2V Lr8Ky+WZaFbtXUWExy2m75LTKqyb79hmtg/Acnl8C4gkWB0k/WzHnfEU1OETiHWOzxV0kZm/Iu3l FJl9MxhNSf/LfAlsuhMqGnD7xlhlDIcsrQKoKLZsl3zzqr2UxqX8HweSWa50M3c+4gRs/dKCwaxs zWkjntyMuPbvz1V70MiUk1pWkSMQ2Y2Gw9gMevdwwLHRXTmwi2SwYMM5vIqVYvuZIo+VKkcsSZbh qQJYmrJ4eMLeM+ymfZLF1wvcHUTZTZWFdVEmvMkVttqMuVC+CN2VcAbfGIoWMGGwPRSXLXLxXpVB dlvKnB6prqmJmJlDAhhItOidRSIwb6vF+4dnu6lkYZrs3zVlkRrbWTfcMtia7cI93rSjRoQZblGZ edtJjNOL4Fpa2hZctQzgG3M0I0jCkdRm3sYDCaQZYPOZCGUiRuyGiOaYjlHwaDOThKMrYLS9JDyW hcIPAwVeJBpXS4w3pciwsMfxd8qy2ZQUJWrXLQhki6hHlSPPPd7ZhR7t9EKdCep0ciXcl7uuq+zT 8OPKdnfoAgwsx39KnKB16DvtbwH38ZkKztNBKdA33cDpmRsGfRza9LPVKpmJoO2/lhYki9Jh+TXl pYo5QHiOnHAG8uHqVssawE9X3xEysgcvgvYwiEIkLPOeqCrv8dXxsv29O+C5S6MhztRS+mCVrr37 N+dLwNxPDCf0TXWcgGuvBxFYxEiIbiCBjbeE/QL013fJ7/pxA521N31fGWhKdOU173413taiJYUj fBVLUkjIag6mANfOeXjfNhJOXcdpo7hpA0MktXFiJFEiOhSdZBCz7hSdbRlAnMqvRpNI9uASV2MM qSOgrU3EfSKwYtWdb4UYQzyPmFuMKeZQQd0vpsVQuj6+BytJ6cCdKi4l0V17NjXjXiHamxIDTPty afFKgWHw9zzCZAXI99SvzNyUJbCuZ96UhPkc6GSkcevfubGTTSVV6kLPnkdCRTfJZErzeREgMZW6 jInmSNGZPHJ4EgK5C31R4nc2N/wEdC7UwMJnkp4GwrrntWFMCsC0B8h+oetiKgbYQSyuuFeVKal0 14ixhnmrUpGa34IWegEDFRB/jqqkIKw8dxFYlUulUfnEo/NNi2PXtB5Qnyq+KnRhgec8CMppujuU 3G4gWzE+sbpORESx4kjhv2Fveg+IEU6Xs40TsSG/ICOto4m2/AHjMpATqnxSAFVnNcYLuhCMDghQ SPL2eilk4CTKa3uT4IRbXUKHMPY4nA8qQe4i2yNhMIJvUDZt3EwNE/aFbsdvKpYg0MfnOGyb3BUV RGzFQlx41Jm82BM1MTlLh06M9B8ZeIduy20z5bJbXKbzVeEgpALNXSiE6LQR006Z5+vn3rCMXIad UefObzTvXuV18hjQu91xlc17Pokffj4GDbUChNUWn67vdv4A1jDZSGMppG7/N0ERenYhBJMLoNyZ Bw8R818j6Hry19NTqgrtwfh0fv6TE/U9XqVOI34NXPVDkIV403mCfxoaHp9BZCgKDumdD0IKQiKk hhjeDTIbEh+U5PMguCux4KYnMFxVONA/3DvOxbhHNHrZL4MN0vgyLnlmIdCZFSjB4xmZCymQlBu3 BSlhSGKgvrAhERCSUz94NVqfQrzPO8sfA7EAylGPQSBBjYRFR4lYvKm4tFShrBKtAQIebjPPc6vD zWaaKyKqDFgHWQhUGBFUiQoFs8nRj61RlGpuKb6nxVOneVdrRDkZAzL4OVWhjGfuLBuKBcqcFrNL elmooMZcfbuSK0WlcGk7sIDeD9TSbdyijSpjYvXyNN0Jr9jFpymUzHuHc4rVwExMahVBUwb2E4Ay CY4rPsfXMZIw8Pkh50b5kNgxZ0AJO0fbpWckQgNiFzFZDhMQsCs2mMUv4127UOo7lkFxisswMhvt f8pQr4cgLiIueSJE1l9UOjTIEqhXkJd3CXay8BNGpArgGwLy+PI1FQpSko6xJQ4k1VmV4aZY8ZRU EYMXFZjlRhDFdzfycx1+c2Avh5jlkFg5NkSQ5voEgDbI5w8+yFFm/+OgYitwhJoY50wZkhoUUiVV cOnchg1tmbL5nIjY/O8m/zvPsor9A+LKJzxkUqW4VlO6KuZCjMkVJVKHbZQbkrKjNgrm1297qbsA WDMhDaaDsbQGUFZMYvEmtWoVXEj20YogzVv2PS7OaTnNMZ891yzam0Wnia0dFjPnMAmeDD2tHj4z clfhDNDHG9RQRXAhENUGhLU9e1Jb2WMCAO18mklPle9HHDLhpWRkWcZFYnIMAPd5cj2dj2HaNotw JBuDHTibkI4Rzcfwgmjf4FlObgbggUC3gcqct+wNrV6Yjs3G6JykkmVTJ4xylVBNHaFZa0SiUjM0 nAQcCBbGqPq0jKKVPSbu1o9w/Yxb8DE0IBbDPHeWQ0mLMQr+2QiysgyVDqcUqaCHuRTsMjXJDM7p Cdt6JYUpZDOMdXTq6jIpbAy9BAVGBKRFSWB/7M45Ra4nDE5ZidEQusjgJEWZnWLVAURhBGSIwkGM ZEJCmWBUxRIrHMUvlOWzt9zIAbO9wDY8pYFJ2yJUNsKARJJ8yCkIodOagUvIhA0hyGlJgqMvRGXB C7JcanHvDaW0JyKeKCXDiXHsZy/C0Zka4YgTR9FksUVMwWYrYWECkEfEBg30JF4M/A4PwOFhxr0K HnQvPScCaZeHncm2uqc+mo+7ivBE7DJLXC/g2e4mCih0I6QgrDPSr6TMDMDDMm8jigN8D6PsvI3y rHpD0pvbrHjw7TXrICuM16vANETI0rdnRAlDmA0zor1wRJDbEwJs++iRAZIpCCRYuuIktfmPI6km A2G1DIeJTmqdCHfcCwwmpLzqtZIvQwauliimghFxbEqaNZqb3e1+GhLLEw3g2heBGCl01TGbUiGl BogIHQG1LdIoFAIZ4uzn0JFtSgK+ya+J4BiFgphlZk3hfOFdXVS5UthRRoVUWW3ZjAoZnYwHjmwK Rh7RmEgmeZ2joopHc1ZSiYKO6RmrlMzAk2w6lQtUkuFpDePDmjezE3JC9kJXtvXLLxUIW6kTsCub YLA0kENdoWzaTbCxEOJuC7h0wNxeUwhsknG3NM5OX3EuCgjPW4HsUI7DPzKTB0GeLRUckw6ofXlo AHq6mRMKCVzNlmF0Q/nHCXMUHndBYNcjOmTr7DfDR9glci8ReUagSoeLOZvIDerkyhuFRmghGkjT BG1HlpeCor5u8kAQMEdMK3BxJoc4vOJQiHaAQ6IyYHsAiuVuQylxS0zDLbWVCFk71CDQ6Tv0dRme tmU1SJxHY7HM/tUzuEkxsVlwNLkzFsRRhlYUVRiCGi2cyRyU2F4mKRsHAKglCBo5/FdMRbDjUSVg zCBvFwOVRVTTT5wlZ5zAUDayAysF6XsqLv5EJ3pfyOzlRX6jOmk0TFNG1GuoeNsvQ1eSSGmCLlde bSTjAWMySkYS5ViJehro0m6E/XBgWMKX0JlmTExzsQSCBkBZqg1MnDLpEiR3PCpBRg6pklKLRFp6 sFAhhMV9/r1FM7cMyL5FODIZB+B7jAuBOlkWoLE0ZyVRtdOkJSYmwk0Q0WGHNfUonfUiPTtOZr1Y Yybna1/hlRHmxLejdG9ErLpewUVMV6lKO/UlRjIsN18uSrWRKkPEVITnAuPem9YBLvZmOpORE7GX wwoGosr1g0kDU8WsbTJQ2EmXA1woE8xTIPkAaFVqZSO8bgK74Q6XGBYGZiEDlSdlCdgZLYRZmhtY E2mySipse/IQYFjkRRG1majjwskYDWrwRevFewGJjAZMKSOtMxULaEw2m1gqc+h1+8uDrK0e07Vd EiBxEMaWLHdiVyAdEb+AX4UBCRRpE+VGSVdNWKhrpD7WPfGN4yRKG7MfqetMP6uAsep3JCg5KJoJ cSQj60yo2u3cl4tQwh9iTS97BeTSk6iQWgwISA7DMHmeHHWZQHIYgSKEDlAie5VLH2AT1IOHmsex 0rmX1VykFFiYXckU4UJAltGgoA== --===============5853949907862923917==--