List: Maria Storage Engine
From: Michael Widenius
Date: November 11 2008 7:34pm
Subject: Re: Versioning for delete & update (for transactional tables) (4619)
Hi!

>>>>> "G" == Guilhem Bichot <guilhem@stripped> writes:

G> Hello Monty,
G> I have some questions about the specs.

>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> TASK...........: Versioning for delete & update (for transactional tables)
>> DESCRIPTION:
>> 
>> Versioning for delete & update (for transactional tables)
>> 
>> 
>> HIGH-LEVEL SPECIFICATION:
>> 
>> For delete, instead of physically deleting rows when maria_delete() is
>> called, we will change the delete internally to an update where the
>> row and its keys are tagged with the current transaction id as their
>> delete trans id.

G> Will there be a new type of REDO log record to describe this change done 
G> to the row or key? Or does some existing REDO log record need to be 
G> slightly changed?
G> If there is a "yes" to one of these questions, we need to update 
G> recovery's code to add/modify "log record replay" functions.

Haven't thought about this in detail yet, but I don't think we need a new
type of REDO. The main change is that instead of deleting the row we
will be doing an update. During purge we will write a delete REDO
entry.
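A minimal C sketch of "delete becomes an update": the row (or key) stays in place and only its delete trans id is set; the physical removal is left to purge. The `row_version` struct, `versioned_delete()`, and the use of 0 to mean "not deleted" are hypothetical illustrations, not Maria's actual row format or API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t TrID;  /* a TrID is at most 6 bytes; 64 bits used here */

/* Hypothetical in-memory view of one row (or key) version. */
typedef struct {
  TrID transid;         /* transaction that created the row */
  TrID delete_transid;  /* 0 means "not deleted" in this sketch */
} row_version;

/* A versioned delete: the row is not removed, only tagged with the
   deleting transaction. The physical delete happens later, in purge. */
static void versioned_delete(row_version *row, TrID current_trid)
{
  assert(row->delete_transid == 0);  /* not already deleted */
  assert(current_trid != 0);         /* 0 is reserved for "not deleted" */
  row->delete_transid = current_trid;
}
```

Because the row bytes are untouched apart from the tag, this is logged as an update rather than a delete, which matches the reply above.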

>> For update, we don't delete changed keys but instead tag them
>> with the current transaction id as their delete trans id.

G> Same question here.

Same thing here. Instead of doing a redo for delete, we will use a
redo for update.
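The same idea applied to keys on update can be sketched as below: the old key entry is kept and tagged with the updating transaction, and the new key value is added alongside it. The `key_entry` struct and `versioned_key_update()` are hypothetical; that the new key carries the current transaction as its transid is an assumption, consistent with the example key layout later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t TrID;

/* Hypothetical key entry: key value plus its version tags. */
typedef struct {
  int32_t value;
  TrID transid;         /* transaction that inserted this key */
  TrID delete_transid;  /* 0 means "still live" in this sketch */
} key_entry;

/* Versioned key update: the old key is not deleted, only tagged with the
   updating transaction as its delete trans id; the new key is appended.
   The caller must provide room for one more entry. Returns the new count. */
static size_t versioned_key_update(key_entry *keys, size_t n, size_t old_idx,
                                   int32_t new_value, TrID current_trid)
{
  assert(old_idx < n);
  assert(keys[old_idx].delete_transid == 0);
  keys[old_idx].delete_transid = current_trid;
  keys[n].value = new_value;
  keys[n].transid = current_trid;
  keys[n].delete_transid = 0;
  return n + 1;
}
```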

>> The positions of the rows and keys are stored in the transaction log (like now).
>> 
>> After the transaction has committed, the link to the last deleted
>> entry in the transaction log will be submitted to a purge handler.
>> 
>> The purge handler will wait until all transactions that started before
>> the given one have completed and will then go through the linked entries
>> in the log and execute a normal row-delete & key-deletes on the
>> stale entries.

G> Actually, trnman_end_trn() already separates committed transactions 
G> which can be purged, from those to keep: see this function, it already 
G> has a comment:
G>    /* QQ: send them to the purge thread */
G> So the purge handler does not need to wait; whatever it receives can be
G> purged immediately.

Ok; 'purge handler' should be 'transaction manager' instead. I will
replace the text in the worklog with:

"The transaction manager will wait until all transactions that started
before the given one have completed and will then tell the purge handler
to go through the linked entries in the log and execute a normal
row-delete & key-deletes on the stale entries."
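The condition the transaction manager checks can be sketched as follows, assuming TrIDs are handed out in increasing order when transactions start (so "started before" means "has a smaller trid"). `can_purge()` and the plain array of active trids are hypothetical; in Maria this bookkeeping would live in the transaction manager, around trnman_end_trn().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t TrID;

/* Returns 1 if the entries deleted by the committed transaction
   `committed_trid` can be handed to the purge handler, i.e. no still-active
   transaction started before it (has a smaller trid). */
static int can_purge(TrID committed_trid, const TrID *active, size_t n_active)
{
  assert(active != NULL || n_active == 0);
  for (size_t i = 0; i < n_active; i++)
    if (active[i] < committed_trid)
      return 0;  /* an older transaction may still see the stale entries */
  return 1;
}
```

With active transactions 1005 and 1020, a committed transaction 1010 must wait for 1005 to end, while one with trid 1000 can be purged at once.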

>> LOW-LEVEL DESIGN:

<cut>

>> Storing of transid (Trid):
>> 
>> A Trid is at most 6 bytes long.
>> 
>> First, the Trid is converted to a smaller number by using
>> trid= trid - create_trid.
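A minimal sketch of the conversion `trid= trid - create_trid`. The 6-byte bound comes from the text above; the macro name `MAX_TRID_6_BYTES` and the function `relative_trid()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t TrID;

#define MAX_TRID_6_BYTES ((TrID)0xFFFFFFFFFFFFULL)  /* hypothetical name */

/* trid= trid - create_trid: TrIDs of transactions that ran after the table
   was created become small numbers that pack into few bytes. */
static TrID relative_trid(TrID trid, TrID create_trid)
{
  assert(trid <= MAX_TRID_6_BYTES);
  assert(trid >= create_trid);  /* a negative result would wrap around */
  return trid - create_trid;
}
```

For the example later in this thread (create_trid 1000), transid 1010 becomes 10 and delete_transid 2011 becomes 1011.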

G> I think this needs a note:
G> Right now maria_create() executes with dummy_transaction_object as far 
G> as I know. Either this can be changed to a real non-dummy TRN, or 
G> maria_create() can just fetch the current value of the global TrID 
G> generator and use that as create_trid.

What's wrong with using a dummy_transaction_object?
Don't we need this to log the create of the table?

For create_trid, we use trnman_get_min_safe_trid(), which basically
fetches the safest value to use for a TrID.

We can't use the current value as a long running transaction may want
to write to the newly created table.
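Why the current TrID is not safe can be illustrated with a hypothetical stand-in for picking create_trid: take the smallest trid of any transaction still running, or the next trid to be handed out if nothing is running. This is only an assumption about what "safest value" means here, not the actual body of trnman_get_min_safe_trid(). If instead create_trid were the current value (say 1021) and a long-running transaction 1005 later wrote to the table, 1005 - 1021 would go negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t TrID;

/* Hypothetical stand-in for choosing create_trid: the smallest trid of any
   running transaction, or next_trid if nothing is running. Every transaction
   that may later write to the new table then has trid >= create_trid. */
static TrID choose_create_trid(const TrID *active, size_t n_active,
                               TrID next_trid)
{
  assert(active != NULL || n_active == 0);
  TrID min_trid = next_trid;
  for (size_t i = 0; i < n_active; i++)
    if (active[i] < min_trid)
      min_trid = active[i];
  return min_trid;
}
```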

<cut>

>> Prefix bytes 244 to 249 are reserved for negative transids, which can be used
>> when we pack transids relative to each other on a key block.
>>
>> We have to store transid in high-byte-first order to be able to do a
>> fast byte-per-byte comparison of them without packing them up.

G> you mean "without unpacking them" ?

Yes.

G> I don't understand the sentence, could you please explain the scenario 
G> why this high-byte-first helps?

High byte first allows us to compare byte per byte. The first byte
that differs tells us which TrID is larger (the TrID with the larger
byte is the larger one).
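A small sketch of why high-byte-first helps: once two 6-byte TrIDs are stored most significant byte first, a plain memcmp() (which compares bytes as unsigned char) orders them exactly like the numbers. `trid_store_be()` and `trid_cmp_be()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 6-byte TrID high byte first. */
static void trid_store_be(uint8_t to[6], uint64_t trid)
{
  assert(trid <= 0xFFFFFFFFFFFFULL);
  for (int i = 0; i < 6; i++)
    to[i] = (uint8_t)(trid >> (8 * (5 - i)));
}

/* Compare two stored TrIDs without unpacking them: the first differing
   byte decides, which memcmp() does for us. */
static int trid_cmp_be(const uint8_t a[6], const uint8_t b[6])
{
  return memcmp(a, b, 6);
}
```

With low-byte-first storage this would break: 0x0100 would start with byte 00 and 0x00FF with byte FF, so a byte-wise compare would rank 0x00FF as larger.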

>> ------------
>> 
>> For example, assume we have the following data:
>> 
>> key_data:               1                  (4 byte integer)
>> pointer_to_row:         (2 << 8) + 3 = 515 (page 2, row 3)
>> table_create_transid    1000               Defined at create table time
>> transid                 1010               Transaction that created row
>> delete_transid          2011               Transaction that deleted row
>> 
>> In addition, we assume the table is created with a data pointer length
>> of 4 bytes (this is calculated automatically from the average row length
>> and the given maximum number of rows).
>> 
>> The binary data for the key would then look like this in hex:
>> 
>> 00 00 00 01     Key
>> 00 00 04 07     (515 << 1) + 1         ;  The last 1 is a marker that the key continues
>> 15              ((1000-1010) << 1) + 1 ;  The last 1 is a marker that the key continues
G> you mean 1000-1010 or 1010-1000 ?

Sorry, I meant 1010-1000.

>> FB 07 E6        length byte and  ((2011 - 1000) << 1) = 07 E6

G> Could you please add "07E6 is 2 bytes and so 249 + 2 = 251 = FB" ?

Can do.
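The worked example can be reproduced with the sketch below (using 1010-1000, as corrected above). The packing rule for transids is inferred from the example and is an assumption: values below 244 are stored directly in one byte, larger values get a length byte of 249 + number of bytes followed by the value high byte first, and the negative prefixes 244 to 249 are not handled. `store_be()`, `pack_trid()` and `build_key()` are hypothetical names, not Maria functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of value, high byte first. */
static size_t store_be(uint8_t *to, uint64_t value, size_t n)
{
  for (size_t i = 0; i < n; i++)
    to[i] = (uint8_t)(value >> (8 * (n - 1 - i)));
  return n;
}

/* Assumed packing: < 244 in one byte, else (249 + len) then len bytes. */
static size_t pack_trid(uint8_t *to, uint64_t value)
{
  assert(value <= 0xFFFFFFFFFFFFULL);
  if (value < 244)
  {
    to[0] = (uint8_t)value;
    return 1;
  }
  size_t len = 1;
  while (len < 6 && (value >> (8 * len)) != 0)
    len++;
  to[0] = (uint8_t)(249 + len);
  return 1 + store_be(to + 1, value, len);
}

/* Key layout from the example: 4-byte key, 4-byte row pointer, transid,
   delete_transid. A low bit of 1 marks that more key parts follow. */
static size_t build_key(uint8_t *key, uint32_t key_data, uint32_t row_pos,
                        uint64_t create_trid, uint64_t trid,
                        uint64_t delete_trid)
{
  assert(trid >= create_trid && delete_trid >= create_trid);
  assert((((uint64_t)row_pos << 1) | 1) <= 0xFFFFFFFFULL);
  size_t len = 0;
  len += store_be(key + len, key_data, 4);
  len += store_be(key + len, ((uint64_t)row_pos << 1) | 1, 4);
  len += pack_trid(key + len, ((trid - create_trid) << 1) | 1);
  len += pack_trid(key + len, (delete_trid - create_trid) << 1);
  return len;
}
```

Calling build_key(key, 1, 515, 1000, 1010, 2011) yields the 12 bytes 00 00 00 01 00 00 04 07 15 FB 07 E6; the 2-byte value 07 E6 gets the length byte 249 + 2 = 251 = FB.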

G> Do you need to specifically update maria_chk, maria_pack, so that they 
G> don't fail when finding a delete_transid?

maria_pack should never see a delete_transid.
maria_chk will ignore any rows with a delete_transid.
(No code changes are needed for this.)

G> What about "zerofill" code, does it need an update?

No, as zerofill will never see a row with delete_transid.

(This is because you never run zerofill while there is something that
is not purged).

What needs to be done is to not allow anyone to run
repair/optimize/zerofill while there are old transactions that may see
any of the deleted rows.

Regards,
Monty
Thread
Re: Versioning for delete & update (for transactional tables) (4619)  Guilhem Bichot  10 Nov
  • Re: Versioning for delete & update (for transactional tables) (4619)  Oleksandr "Sanja" Byelkin  11 Nov
  • Re: Versioning for delete & update (for transactional tables) (4619)  Michael Widenius  11 Nov
    • Re: Versioning for delete & update (for transactional tables) (4619)  Guilhem Bichot  12 Nov
      • Re: Versioning for delete & update (for transactional tables) (4619)  Guilhem Bichot  12 Nov