List:General Discussion« Previous MessageNext Message »
From:Rick James Date:May 31 2013 6:54pm
Subject:RE: Audit Table storage for Primary Key(s)
View as plain text  
UUID PRIMARY KEY (or even secondary index) -- 
Once the table gets big enough (bigger than RAM cache), each row INSERTed (or SELECTed)
will be a disk hit.  (Rule of Thumb:  only 100 hits/sec.)  This is because _random_ keys
(like UUID) make caching useless.  Actually, the slowdown will be gradual.  For example,
once the table is 5 times as big as the cache, 80% (1-1/5) of the INSERTs/SELECTs will
hit disk.
Bottom line -- Avoid UUIDs in huge tables, if at all possible.  (Exception:  The bits in
type-1 UUIDs can be rearranged to be roughly chronological.)

BIGINT -- You cannot possibly hit its max with any existing hardware.

MyISAM -- PRIMARY KEY is just another secondary index.  Secondary indexes are separate
BTrees.

InnoDB -- PRIMARY KEY and data coexist in the same BTree.  Secondary indexes are separate
BTrees.

So, assuming this "audit" table will be huge (too big to be cached), you need to carefully
consider every index, both for writing and for reading.

You mentioned that you might audit 50 tables?  An index that starts with table_name would
be inserting/selecting in 50 spots.  If the second part of the index is something
'chronological', such as an AUTO_INCREMENT or TIMESTAMP, then there would be 50 "hot
spots" in the index.  This is quite efficient.  INDEX(table_name, UUID) would be bad
because of the randomness.

InnoDB may be the preferred engine, even though the footprint is bigger.  This is because
careful design of the PK could lead to INSERTs into hot spot(s), plus SELECTs being able
to take advantage of locality of reference.  With PRIMARY KEY(table_name, ...), and
SELECT .. WHERE tablename='...', InnoDB will find all the rows together (fewer disk
hits); MyISAM will find the data scattered (more disk hits, hence slower).

Another aspect...  Would your SELECTs say "WHERE ... AND timestamp BETWEEN..." ?  And,
would you _usually_ query _recent_ times?  If so, there could be a boost from doing both
of these
** PARTITION BY RANGE(TO_DAYS(timestamp))
** Move timestamp to the _end_ of any indexes that it is in.

I would be happy to discuss these principles further.  To be able to discuss more
specifically, please provide
** Your tentative SHOW CREATE TABLE
** how big you plan for the table to become (#rows or GB),
** how much RAM you have

> -----Original Message-----
> From: Carsten Pedersen [mailto:carsten@stripped]
> Sent: Friday, May 31, 2013 4:05 AM
> Cc: [MySQL]
> Subject: Re: Audit Table storage for Primary Key(s)
> 
> Based on the little information available, I would make a lookup field
> consisting of tablename and primary keys.
> 
> (although I still believe that storing this information in the database
> in the first place is probably the wrong approach, but to each his own)
> 
> / Carsten
> 
> On 31-05-2013 12:58, Neil Tompkins wrote:
> > The kind of look ups will be trying to diagnose when and by who
> > applied a update.  So the primary key of the audit is important.  My
> > question is for performance, should the primary key be stored as a
> > indexed field like I mentioned before, or should I have a actual
> > individual field per primary key
> >
> >
> > On Fri, May 31, 2013 at 11:03 AM, Carsten Pedersen
> > <carsten@stripped <mailto:carsten@stripped>> wrote:
> >
> >     Again: Unless you can give some idea as to the kind of lookups
> you
> >     will be performing (which fields? Temporal values? etc.), it is
> >     impossible to give advice on the table structure. I wouldn't
> blame
> >     anyone for not being able to do so; saving data for debugging
> will
> >     always be a moving target and almost by definition you don't know
> >     today what you'll be looking for tomorrow.
> >
> >     That's why I think that using CSV tables _the contents of which
> can
> >     subsequently be analyzed using any of a number of text file
> >     processing tools_ may indeed be your best initial option.
> >
> >     On UUIDs vs. INTs: (1) Please do yourself a favor and read up on
> how
> >     UUIDs are generated. If it's the same server that generates all
> the
> >     UUIDs, you won't get a lot of uniqueness for the amount of space
> >     you'll be using for your data and index; (2) Please do the math
> of
> >     just how many inserts you can do per second over the next 1.000
> >     years if you use a longint auto-increment field for your PK.
> >
> >     / Carsten
> >
> >     On 31-05-2013 11 <tel:31-05-2013%2011>:14, Neil Tompkins wrote:
> >
> >         Thanks for your response.  We expect to use the Audit log
> when
> >         looking into
> >         exceptions and/or any need to debug table updates.  I don't
> >         think a CSV
> >         table would be sufficient as we are wanting to use a
> interface
> >         to query
> >         this data at least on a daily basis if not weekly.
> >
> >         I use UUID because we have currently 54 tables, of which
> >         probably 30 will
> >         be audited.  So a INT PK wouldn't work because of the number
> of
> >         updates we
> >         are applying.
> >
> >
> >         On Fri, May 31, 2013 at 9:58 AM, Carsten Pedersen
> >         <carsten@stripped <mailto:carsten@stripped>>wrote:
> >
> >             On 30-05-2013 09:27, Neil Tompkins wrote:
> >
> >                 Hi,
> >
> >                 I've created a Audit table which tracks any changed
> >                 fields for multiple
> >                 tables.  In my Audit table I'm using a UUID for the
> >                 primary key.  However
> >                 I
> >                 need to have a reference back to the primary key(s)
> of
> >                 the table audited.
> >
> >                 At the moment I've a VARCHAR field which stores
> primary
> >                 keys like
> >
> >                 1
> >                 1|2013-05-29
> >                 2|2013-05-29
> >                 2
> >                 3
> >                 1|2|2
> >                 etc
> >
> >                 Is this the best approach, or should I have a
> individual
> >                 field in the
> >                 audit
> >                 table for all primary keys.  At the moment I think
> the
> >                 max number of
> >                 primary keys on any given table is 3
> >
> >                 Thanks
> >                 Neil
> >
> >
> >             First you need to ask yourself how you expect to use the
> >             table in the
> >             future. Will you be looking up the data on a regular
> basis?
> >             Or will lookups
> >             only be something you will do in exceptional situtions?
> >
> >             What is the intended goal of having a UUID for the
> primary
> >             key rather
> >             than, say, an integer - or having no PK at all?
> >
> >             My immediate thought when reading this was "why even
> store
> >             that data in a
> >             table?" - if it's a simple log, use a log file.
> Especially
> >             if you don't
> >             know how you intend to search for data later on. There
> are
> >             many tools that
> >             are far superior to SQL when it comes to searching for
> text
> >             strings. You
> >             could even consider having a CSV table, which will give
> you
> >             an SQL
> >             interface to said text file.
> >
> >             / Carsten
> >
> >
> >
> >             --
> >             MySQL General Mailing List
> >             For list archives: http://lists.mysql.com/mysql
> >             To unsubscribe: http://lists.mysql.com/mysql
> >
> >
> >
> >
> >     --
> >     MySQL General Mailing List
> >     For list archives: http://lists.mysql.com/mysql
> >     To unsubscribe: http://lists.mysql.com/mysql
> >
> >
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql

Thread
Audit Table storage for Primary Key(s)Neil Tompkins30 May
  • Fwd: Audit Table storage for Primary Key(s)Neil Tompkins31 May
    • RE: Audit Table storage for Primary Key(s)Jason Trebilcock31 May
  • Re: Audit Table storage for Primary Key(s)Carsten Pedersen31 May
    • Re: Audit Table storage for Primary Key(s)Neil Tompkins31 May
      • Re: Audit Table storage for Primary Key(s)Carsten Pedersen31 May
        • Re: Audit Table storage for Primary Key(s)Neil Tompkins31 May
          • Re: Audit Table storage for Primary Key(s)Carsten Pedersen31 May
            • Re: Audit Table storage for Primary Key(s)Andrew Moore31 May
            • RE: Audit Table storage for Primary Key(s)Rick James31 May
              • Re: Audit Table storage for Primary Key(s)Andrew Moore31 May