From: Michael Widenius Date: April 8 2008 10:19am Subject: re: setting key_del when applying REDOs: bug? List-Archive: http://lists.mysql.com/maria/2 Message-Id: <18427.18077.352960.15203@narttu.mysql.fi> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi! >>>>> "Guilhem" == Guilhem Bichot writes: Guilhem> Hello, Guilhem> in _ma_apply_redo_index_new_page() there is: Guilhem> /* free_page is 0 if we shouldn't set key_del */ Guilhem> if (free_page) Guilhem> { Guilhem> if (free_page != IMPOSSIBLE_PAGE_NO) Guilhem> share->state.key_del= (my_off_t) free_page * share->block_size; Guilhem> else Guilhem> share->state.key_del= HA_OFFSET_ERROR; Guilhem> } Guilhem> and in _ma_apply_redo_index_free_page() there is: Guilhem> share->state.key_del= ((my_off_t) page) * share->block_size; Guilhem> I wonder: shouldn't both settings be included in a Guilhem> if (cmp_translog_addr(lsn, share->state.is_of_horizon) >= 0) Guilhem> { set key_del } Guilhem> ? Guilhem> Otherwise, imagine the following. Guilhem> At runtime: Guilhem> key_del is empty Guilhem> DELETE ...; frees page 1 (key_del=1) Guilhem> DELETE ...; frees page 2 (key_del=2, page 2 points to page 1) Guilhem> page 2 gets flushed, not page 1 Guilhem> checkpoint happens (page 1 is in list of dirty pages, not page 2), Guilhem> which flushes state (so key_del=2 goes to disk) Guilhem> At REDO applying: Guilhem> checkpoint says REDO for page 2 should not even be read and page 2 Guilhem> should not even be loaded from disk Guilhem> key_del is 2 Where in the code is this logic to skip executing redo entries ? The current code for REDO_INDEX_FREE_PAGE assumes that the log entry is always executed (even if the page is not read). It's possible to skip the code in _ma_apply_redo_index_new_page() starting from if (!(buff= pagecache_read(...), but I don't now if it's good to skip the first part of the code (setting key_del, key_root and file_size) Guilhem> REDO for freeing of page 1 is executed as page 1 is in dirty-pages Guilhem> list of checkpoint, sets key_del to 1 (because if() missing) Guilhem> REDO for freeing page 2 is skipped Guilhem> So we have page 2 marked deleted, pointing to page 1, and key_del is 1 Guilhem> instead of 2: this makes a wrong table (page 2 is lost forever). Yes, this may happen if we don't execute the first part of _ma_apply_redo_index_new_page() for all pages. I assume adding a test of cmp_translog_addr() should also be safe. Guilhem> Scenario with INSERT is worse: Guilhem> At runtime: Guilhem> key_del is 2 (page 2 and 1 have been deleted, 2 points to 1) Guilhem> INSERT ...; uses page 2 (key_del= 1) Guilhem> INSERT ...; uses page 1 (key_del empty) Guilhem> page 1 gets flushed, not page 2 Guilhem> checkpoint happens (page 2 is in list of dirty pages, not page 1), Guilhem> which flushes state (so key_del=empty goes to disk) Guilhem> At REDO applying: Guilhem> checkpoint says REDO for page 1 should not even be read and page 1 Guilhem> should not even be loaded from disk Guilhem> key_del empty Guilhem> REDO for new-ing of page 2 is executed as page 2 is in dirty-pages Guilhem> list of checkpoint, sets key_del to 1 (because if() missing) Guilhem> REDO for new-ing page 1 is skipped Guilhem> So we have key_del=1 though page 1 is a non-deleted page, next INSERT Guilhem> will have a problem (error or destroying important data). Guilhem> Do you remember of a reason why the if() is not there? Can I add it in Guilhem> both functions? The reason is that the code assumes that this code part is executed for all REDO, in which case it should always be safe to set the del. So the question is at what point you plan to skip reading the pages. If you want to skip calling _ma_apply_redo_index_new_page() altogether, then you need to add tests for cmp_translog_addr(). If you only want to skip reading the not used pages and add the test if a page should be read just before pagecache_read(), then the current code is probably safe. Feel free to go with either method. Regards, Monty