List: Maria Storage Engine
From: Guilhem Bichot
Date: November 12 2008 9:37am
Subject: Re: Versioning for delete & update (for transactional tables) (4619)
Hello,

Michael Widenius wrote, on 11/11/2008 08:34 PM:
>>>>>> "G" == Guilhem Bichot <guilhem@stripped> writes:
> G> I have some questions about the specs.
> 
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> TASK...........: Versioning for delete & update (for transactional tables)
>>> DESCRIPTION:
>>>
>>> Versioning for delete & update (for transactional tables)
>>>
>>>
>>> HIGH-LEVEL SPECIFICATION:
>>>
>>> For delete, instead of physically deleting rows when maria_delete() is
>>> called, we will change the delete internally to an update where the
>>> row and its keys are tagged with the current transaction id as their
>>> delete trans id.
> 
> G> Will there be a new type of REDO log record to describe this change done 
> G> to the row or key? Or does some existing REDO log record need to be 
> G> slightly changed?
> G> If there is a "yes" to one of these questions, we need to update 
> G> recovery's code to add/modify "log record replay" functions.
> 
> Haven't thought about this in detail yet. Don't think we need a new
> type of REDO.  The main change is that instead of doing a delete of
> the row we will be doing an update.

Ok.
And will there be a new type of UNDO?

>>> LOW-LEVEL DESIGN:
> 
> <cut>
> 
>>> Storing of transid (Trid):
>>>
>>> Trid is max 6 bytes long
>>>
>>> First, Trid is converted to a smaller number by using
>>> trid= trid - create_trid.
> 
> G> I think this needs a note:
> G> Right now maria_create() executes with dummy_transaction_object as far 
> G> as I know. Either this can be changed to a real non-dummy TRN, or 
> G> maria_create() can just fetch the current value of the global TrID 
> G> generator and use that as create_trid.
> 
> What's wrong with using a dummy_transaction_object?

It does not have a proper transaction id (it's just 0), which would have 
been a problem if you had used its id to fill create_trid. But 
you're not, so no problem.

> Don't we need this to log the create of the table?

We log the CREATE, using dummy_transaction_object.

> For create_trid, we use trnman_get_min_safe_trid() which basically
> fetches the safest value to use for a TrId.

Ok.

> We can't use the current value as a long running transaction may want
> to write to the newly created table.

Indeed.
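
To make the point about long running transactions concrete, here is a small
Python sketch. It assumes that a row's transid is stored as the non-negative
difference trid - create_trid, and it uses a hypothetical min_safe_trid()
helper as a stand-in for trnman_get_min_safe_trid(); the real function may
compute its value differently. The numbers are invented for illustration.

```python
def relative_trid(trid, create_trid):
    """Return trid - create_trid, the smaller number that gets stored."""
    relative = trid - create_trid
    if relative < 0:
        # Assumption: a row's transid is never stored as a negative
        # offset from create_trid.
        raise ValueError("transid %d is older than create_trid %d"
                         % (trid, create_trid))
    return relative


def min_safe_trid(active_trids, next_trid):
    """Hypothetical stand-in for trnman_get_min_safe_trid().

    Assumption: a value is safe if no running transaction has a
    smaller id.
    """
    return min(list(active_trids) + [next_trid])


active_trids = [990, 1005]  # transactions still running at CREATE time
next_trid = 1010            # current value of the global TrID generator

safe_create_trid = min_safe_trid(active_trids, next_trid)
```

With create_trid set to the current generator value (1010), transaction 990
writing to the new table would give 990 - 1010 = -20. With the safe value
(990 here) the difference is 0.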

> <cut>
> 
>>> Prefix bytes 244 to 249 are reserved for negative transid, that can be used
>>> when we pack transid relative to each other on a key block.
>>>
>>> We have to store transid in high-byte-first order to be able to do a
>>> fast byte-per-byte comparison of them without packing them up.
> 
> G> you mean "without unpacking them" ?
> 
> Yes.
> 
> G> I don't understand the sentence, could you please explain the scenario 
> G> why this high-byte-first helps?
> 
> High byte first allows us to compare byte per byte. The first byte
> that differs tells us which TrID is larger. (larger byte is larger)

Ok
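
A minimal Python sketch of why high-byte-first order allows this comparison.
It assumes both values are stored with the same number of bytes; the helper
names are made up for illustration and are not Maria functions.

```python
def to_high_byte_first(trid, length=6):
    """Store trid in `length` bytes, most significant byte first."""
    return trid.to_bytes(length, "big")


def compare_bytes(a, b):
    """Compare two equal-length byte strings byte per byte.

    Returns -1, 0 or 1. The first byte that differs decides.
    """
    for x, y in zip(a, b):
        if x != y:
            return 1 if x > y else -1
    return 0


# High byte first: the byte comparison agrees with the numeric order.
high_first = compare_bytes(to_high_byte_first(1010), to_high_byte_first(2011))

# Low byte first: 256 is stored as 00 01 and 255 as FF 00, so the first
# differing byte wrongly says that 256 is the smaller value.
low_first = compare_bytes((256).to_bytes(2, "little"),
                          (255).to_bytes(2, "little"))
```

Here high_first is -1 (1010 < 2011, correct) and low_first is also -1, which
is the wrong answer for 256 versus 255.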

>>> ------------
>>>
>>> For example, assuming we have the following data:
>>>
>>> key_data:               1                (4 byte integer)
>>> pointer_to_row:         2 << 8 + 3 = 515 (page 2, row 3)
>>> table_create_transid    1000             Defined at create table time
>>> transid                 1010             Transaction that created row
>>> delete_transid          2011             Transaction that deleted row
>>>
>>> In addition we assume the table is created with a data pointer length
>>> of 4 bytes (this is automatically calculated based on the medium length of rows
>>> and the given max number of rows)
>>>
>>> The binary data for the key would then look like this in hex:
>>>
>>> 00 00 00 01     Key
>>> 00 00 04 07     (515 << 1) + 1         ;  The last 1 is marker that key cont.
>>> 15              ((1000-1010) << 1) + 1 ;  The last 1 is marker that key cont.
> G> you mean 1000-1010 or 1010-1000 ?
> 
> Sorry, meant 1010-1000
> 
>>> FB 07 E6        length byte and  ((2011 - 1000) << 1) = 07 E6
> 
> G> Could you please add "07E6 is 2 bytes and so 249 + 2 = 251 = FB" ?
> 
> Can do.

... nomination.
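
For reference, the arithmetic of the example can be checked with a short
Python sketch. It follows the layout quoted above (4-byte key, 4-byte row
pointer, then the two transids relative to table_create_transid, shifted left
by one with the low bit marking that more data follows). Two things are
assumptions of mine: the function names, and the limit of 244 below which a
value is stored as a single byte (that part of the spec was cut; I inferred
it from "prefix bytes 244 to 249 are reserved"). The length prefix 249 + N
comes from the FB byte in the example.

```python
SINGLE_BYTE_LIMIT = 244   # assumption: smaller values fit in one byte
LENGTH_PREFIX_BASE = 249  # from the example: 249 + 2 = 251 = FB


def pack_trid(value):
    """Pack an already shifted, non-negative transid value."""
    if value < SINGLE_BYTE_LIMIT:
        return bytes([value])
    nbytes = (value.bit_length() + 7) // 8
    if nbytes > 6:
        raise ValueError("Trid is max 6 bytes long")
    # Length byte, then the value in high-byte-first order.
    return bytes([LENGTH_PREFIX_BASE + nbytes]) + value.to_bytes(nbytes, "big")


def encode_key(key_data, page, row, create_trid, trid, delete_trid,
               pointer_len=4):
    """Build the key bytes for the worked example."""
    row_pointer = (page << 8) + row                        # page 2, row 3 -> 515
    out = key_data.to_bytes(4, "big")                      # 4 byte integer key
    out += ((row_pointer << 1) | 1).to_bytes(pointer_len, "big")
    out += pack_trid(((trid - create_trid) << 1) | 1)      # 1: key continues
    out += pack_trid((delete_trid - create_trid) << 1)     # 0: last item
    return out


encoded = encode_key(1, 2, 3, create_trid=1000, trid=1010, delete_trid=2011)
print(encoded.hex(" "))
```

Note that (515 << 1) + 1 = 1031, which is 04 07 in hex, (1010 - 1000) << 1
plus 1 is 21 = 15 hex, and (2011 - 1000) << 1 = 2022 = 07 E6.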

> 
> G> Do you need to specifically update maria_chk, maria_pack, so that they 
> G> don't fail when finding a delete_transid?
> 
> maria_pack should never see a delete_transid.
> maria_chk will ignore any rows with a delete_transid.
> (No code changes needed for this)

Why does maria_chk need no code change? Where's the magic?

> G> What about "zerofill" code, does it need an update?
> 
> No, as zerofill will never see a row with delete_transid.

ok


> (This is because you never run zerofill while there is something that
> is not purged).
> 
> What needs to be done is to not allow one to run
> repair/optimize/zerofill while there are old transactions that may see
> any of the deleted rows.

Could you please mention this in the WL?
