List: Falcon Storage Engine
From: Ann W. Harrison  Date: October 30 2008 3:26pm
Subject: Re: Unthawable

I had written:

>>> Philip's cases are valuable tests, but not typical usage...  With
>>> reasonable settings, chills will happen only during loads and mass
>>> inserts.  The chances of modifying or deleting a record that's been
>>> chilled are small, and the chances of having a multi-level modify /
>>> chill / delete cycle are lower still.

Records are chilled only when the amount of data in new versions
created by a single transaction reaches a threshold set by a
configuration variable.  Nothing any other transaction does has
any effect on chilling.  Record versions are thawed when the
transaction that created them needs to re-read them, or in the
(hopefully very rare) case when the transaction that created them
has committed but is not yet write complete.
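As an illustration only - this is not Falcon source, and the
threshold name and bookkeeping are my own invention - the
per-transaction rule above can be sketched like this:

```python
# Illustrative sketch, not Falcon code.  A transaction accumulates
# the bytes of its own new record versions; once a (hypothetical)
# CHILL_THRESHOLD is crossed, those versions are chilled, i.e.
# moved to the serial log and dropped from the record cache.
# No other transaction's activity ever triggers this.

CHILL_THRESHOLD = 5 * 1024 * 1024  # assumed value, for illustration

class Transaction:
    def __init__(self):
        self.new_version_bytes = 0
        self.pending_versions = []   # version sizes still in cache
        self.chilled_versions = []   # version sizes in the serial log

    def add_version(self, record_bytes):
        self.pending_versions.append(record_bytes)
        self.new_version_bytes += record_bytes
        if self.new_version_bytes >= CHILL_THRESHOLD:
            self.chill()

    def chill(self):
        # Move this transaction's versions to the serial log.
        self.chilled_versions.extend(self.pending_versions)
        self.pending_versions.clear()
        self.new_version_bytes = 0

    def thaw(self):
        # Read chilled versions back into cache, e.g. when the
        # transaction needs to re-read its own changes.
        self.pending_versions.extend(self.chilled_versions)
        self.chilled_versions.clear()
        self.new_version_bytes = sum(self.pending_versions)
```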

> Please find below an email from Kevin on another thread that describes a 
> potential chill/thaw/backlogging scenario that is not a load or a mass 
> insert -- 

Kevin's example is a long running report that makes multiple passes
over the same data.  It's the typical example used to show the value
of repeatable read isolation mode.  It has nothing to do with chilling
or thawing.  A long-running report is very unlikely to touch record
versions created by a mass-load or update that is committed but not
write complete.  If it did, the record versions would be thawed on
first reference and would stay in the cache.

Such a report might cause backlogging, which is different from chilling.

Record versions are chilled by moving them to the serial log.  Normally
new record versions are moved to the serial log when the transaction
commits.  In the case of mass-loads and updates, records are moved when
their total amount of data hits a threshold.  This frees cache space
and (normally) has little or no extra cost since those records would
go to the serial log anyway.  The two cases where there is cost are
a rollback and a thaw.  When a transaction rolls back, its changes
are normally just dumped from the record cache; if the records have
been chilled, the cost of copying them to the serial log was wasted.
Thaws are expected to be rare because normally mass-load and mass-update
transactions don't turn around and re-read the data they've just created.
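To make that cost argument concrete - a toy accounting model, not
Falcon code - count the serial-log writes under each outcome:

```python
# Toy cost model, not Falcon code.  Chilled versions are written to
# the serial log early; at commit, the remaining versions are written.
# The total equals the version count either way, so chilling adds
# nothing on commit.  On rollback, unchilled changes are just dumped
# from the cache, so any early (chilled) writes were wasted work.

def log_writes(total_versions, chilled_versions, outcome):
    assert outcome in ("commit", "rollback")
    if outcome == "commit":
        # chilled ones are already in the log; write the rest now
        return chilled_versions + (total_versions - chilled_versions)
    # rollback: only the early chilled writes happened, all wasted
    return chilled_versions
```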

Backlogging is a different thing.  When the cache fills and can't
be emptied by scavenging unneeded old versions and mature records
that haven't been referenced recently, then chains of record versions
are written to a special tablespace called the backlog.  Backlogging
is an expensive operation - first, there's a write to the disk, and
second, the record version chains must be read back into memory
when they are referenced and again when they are scavenged.
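Purely as a sketch of the distinction - the class and numbers are
hypothetical, not Falcon's implementation - cache pressure handling
goes cheap-path-first:

```python
# Illustrative sketch, not Falcon code.  When the record cache
# fills, first scavenge versions nobody can still need; only if
# that frees too little space are whole version chains written to
# the backlog tablespace (the expensive path), from which they must
# be read back on the next reference.

class RecordCache:
    def __init__(self, scavengeable, backloggable):
        self.scavengeable = scavengeable  # bytes freeable cheaply
        self.backloggable = backloggable  # bytes freeable via backlog
        self.backlogged = 0               # bytes written to backlog

    def scavenge(self):
        freed, self.scavengeable = self.scavengeable, 0
        return freed

    def backlog(self, n):
        moved = min(n, self.backloggable)
        self.backloggable -= moved
        self.backlogged += moved
        return moved

def relieve_cache_pressure(cache, bytes_needed):
    # Cheap path: drop unneeded old versions and mature records
    # that haven't been referenced recently.
    freed = cache.scavenge()
    if freed < bytes_needed:
        # Expensive path: write version chains to the backlog.
        freed += cache.backlog(bytes_needed - freed)
    return freed
```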

Currently, backlogging happens only after a chill - that was an
easy place to do it, but it is not the only time it should be done.
Aside from currently happening at the same time, chilling and
backlogging are completely different operations.

Having got that rant out, let me go back to Kevin's complex multi-
pass report.  Currently, if such a report is running and another
connection truncates the table, the reporting transaction will
see no data after the truncation.  That is an interaction between
a non-transactional DDL statement and a repeatable read transaction,
one that makes the second transaction's reads not repeatable.

What Kevin (and Jim and I) are proposing is adding yet another
sync object to tables that, when taken in shared mode, guarantees
the continued existence of the table.  Truncating (or dropping,
or altering to drop columns) the table requires exclusive access
to that sync object.
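A minimal sketch of that shared/exclusive discipline - the class
name is hypothetical, and Falcon's actual sync objects are C++,
so this just shows the idea:

```python
import threading

# Illustrative sketch only.  Readers take the table's existence
# guard in shared mode; TRUNCATE / DROP / ALTER must take it in
# exclusive mode, so the table cannot disappear under a reader.

class ExistenceGuard:
    def __init__(self):
        self._lock = threading.Lock()
        self._readers_done = threading.Condition(self._lock)
        self._readers = 0

    def lock_shared(self):
        with self._lock:
            self._readers += 1

    def unlock_shared(self):
        with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._readers_done.notify_all()

    def lock_exclusive(self):
        self._lock.acquire()
        while self._readers:
            self._readers_done.wait()
        # caller now holds the guard exclusively

    def unlock_exclusive(self):
        self._lock.release()
```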

> attempting to generate a report by making multiple passes 
> through the entire data while writing intermediate values in a summary 
> table.  

A big multi-pass report does put a load on any database system.
If the records referenced are mature (i.e. there are no uncommitted
versions and no old versions), then they may be scavenged during
the report to free up space in the cache.  If they're required
again, they'll be read from the database again.  If any of the
records being reported on have multiple versions, they'll stay
in the cache until scavenging no longer releases enough space,
then they may be backlogged.  They'll never be chilled -
reading never causes chilling.

> If you decide to mark the start of your execution by inserting a 
> timestamp in some log table, your transaction has become an update one 
> and thus the serial log will grow and the memory will fill up with old 
> record versions.

Yes, both those things could happen, but for different reasons.

Any long-running transaction causes old versions of records to be
kept alive.  That's what's necessary to preserve a stable snapshot
of the data for an old transaction.  It doesn't matter if the old
transaction is creating a complex report or waiting for interaction
with a user who's gone on holiday.  All MVCC engines keep old
versions for old transactions.  Oracle and InnoDB keep them in
their logs and when they hit some threshold, drop them arbitrarily,
breaking the repeatable read.  PostgreSQL and Firebird store old
versions in the database, causing the database to fill with old
cruft and leading to long, boring discussions of vacuuming and
garbage collection.  Falcon keeps old versions in the record cache
or in the backlog.  But the basic fact is that an MVCC repeatable
read requires that the database keep old versions of records.
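The retention rule can be sketched as follows - illustrative only,
with a transaction-id snapshot scheme and names that are my own
simplification, not Falcon's:

```python
# Illustrative MVCC sketch, not Falcon code.  Each record keeps a
# chain of versions, newest first, each stamped with the id of the
# committed transaction that created it.  A repeatable-read
# transaction with snapshot id S sees the newest version committed
# at or before S, so any version an active old transaction might
# still need must be kept (in cache or backlog) until it ends.

class Version:
    def __init__(self, txn_id, value, older=None):
        self.txn_id = txn_id    # committing transaction's id
        self.value = value
        self.older = older      # next older version in the chain

def visible_version(newest, snapshot_id):
    v = newest
    while v is not None and v.txn_id > snapshot_id:
        v = v.older
    return v

def scavengeable(newest, oldest_active_snapshot):
    # Versions older than the one visible to the oldest active
    # transaction can be scavenged; the rest must be kept.
    keep = visible_version(newest, oldest_active_snapshot)
    return keep.older if keep is not None else None
```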

Locking systems handle the problem differently.  In a repeatable
read locking transaction, every record that's read is set to a
state where it can't be modified until the reader has committed.
In those systems, instead of filling the record cache, log, or
database with old versions, you fill the DBA's office with annoyed
users who can't get their work done until the month-end report
completes - or until they figure out who it was who went on
holiday with a transaction open.

In Falcon - and all other systems that I know of that use logs
as an intermediary between cache and database - an open write
transaction keeps the serial log from being truncated.  The serial
log will not fill memory - the number of buffers allocated to it
is set at start-up.  I suppose it's possible to run some sort of
scavenge on the log, keeping the active transaction entries and
throwing out everything that's write complete, but we don't do
that.  And I'm not sure that keeping the log small when someone
goes on holiday with an update transaction open is a problem
worth fixing.


