List:General Discussion« Previous MessageNext Message »
From:Rick James Date:September 20 2012 4:29pm
Subject:RE: Making Myself Crazy
View as plain text  

> -----Original Message-----
> From: Jan Steinman [mailto:Jan@stripped]
> Sent: Wednesday, September 19, 2012 8:59 PM
> To: mysql@stripped
> Subject: Re: Making Myself Crazy
> Thanks for your help, Rick!
> Interspersed are some questions and rationales for you to shoot down...
> :-)
[Rick James]  :)

> > From: Rick James <rjames@stripped>
> >
> > s_product_sales_log has no PRIMARY KEY.  All InnoDB tables 'should'
> have an explicit PK.
> This table really has no identifying information. There could be two
> identical, valid rows, if the same person sold the same amount of the
> same product to the same other person on the same day.

[Rick James] The PK provides a _unique_ identification for each row.  Each other key
depends on the PK to find the row.
A PK is a UNIQUE key is an INDEX.
If you don't provide a PK, InnoDB will provide an inaccessible, 6-byte, number for the
is a common technique.  It is 4 bytes.  (Or less if you use MEDIUMINT, etc)
(Hence, I said "should", not must.)

> All the foreign keys were indexed. Is there something I don't
> understand about something a PK field does? If an individual record
> cannot be uniquely identified by its information, is there really any
> need for a primary key?
> None the less, I added field "ID" as an unsigned autoincrement INT and
> made it PK.
> > INT(5) is not what you think.  INT is always a 32-bit, 4-byte
> quantity, regardless of the number.
> >
> > Use TINYINT UNSIGNED, SMALLINT UNSIGNED, etc. wherever reasonable.
> Understood. I make all my keys UINT even when they could be smaller,
> because I've gotten into some gnarly consistency problems. with foreign
> keys.
[Rick James] Yes, types must be consistent for FKs.  (I don't know the details of what
aspects of the 'type' can be relaxed.)  Usually that happens 'automatically' when you
build the schema and decide that foo_id is TINYINT UNSIGNED, and apply that to all tables
with foo_id.
Changing the type after the fact can be trickier, especially because of PKs.

> >> KEY `is_value_added` (`is_value_added`),
> > A single-column INDEX on a flag, ENUM, and other low-cardinality
> field, is almost always useless.
> Why is that? Surely, even a flag separates the record space into two?
[Rick James] Yes, it separates.  But let's look at the effort.  There are two 'files', one
with all the data, one with all of this index.
Plan A:  Use the index.  Walk through half of the index 'file' (actually a BTree).  For
each 'row' encountered, reach over into the data -- this may be a random lookup.  Back and
forth, back and forth.
Plan B:  Ignore the index; simply scan all the data 'file'.  As you scan, check the row
for the flag.
Plan A is likely to involve more I/O, hence be slower.  So, the optimizer chooses Plan B. 
There is no exact cutoff between the two approaches; it is somewhere around 20%.

Further note, that to reach from the Index into the Data requires a BTree lookup to find
the data row.  This uses the PRIMARY KEY, which is implicitly included in the secondary
key's BTree.  (Hence one reason that InnoDB must have a PK.)

> > Performance issues...
> >> WHERE  YEAR(sales.`Date`) = '{{{1}}}'
> > won't use
> >> KEY `Date` (`Date`),
> > because the column (Date) is hidden in a function.  A workaround:
> >   WHERE `Date` >= '{{{1}}}-01-01'
> >     AND `Date` <  '{{{1}}}-01-01' + INTERVAL 1 YEAR
> Thanks! Good catch.
> > JOINing two subqueries -- There is no way to index either of them, so
> the JOIN will have to do a table scan of one temp table for every row
> of the other temp table.
> > (The alternative is to CREATE TEMPORARY TABLE... with an index, for
> one of the subqueries.)
> But I made sure the subqueries were the smallest possible sets --
> essentially, the domain of s_profit_centre, which only has 12 records.
[Rick James] Good.  With 12, the performance won't be too bad.
> I had the entire thing coded up into one massive JOIN of everything,
> and it took 30 minutes to run! By LEFT JOINing down to a dozen or fewer
> records, it seems to run in reasonable time, even though it's two
> subqueries that are not indexed.
> > It would probably be better to move the mt.tot!=0 test inside:
> >
> >                   GROUP BY  `Profit Centre`
> >           ) mt ON mt.pcid = tt.pcid
> >           WHERE  mt.tot != 0
> >   ) xx
> > -->
> >                   GROUP BY  `Profit Centre`
> >                   HAVING tot != 0                   -- added
> >           ) mt ON mt.pcid = tt.pcid
> >           -- removed:   WHERE  mt.tot != 0
> >   ) xx
> >
> > That would make mt have fewer rows, hence that unindexed JOIN could
> run faster.
> My first attempt to do that produced an error. And again, both the
> subqueries will have 12 or fewer records, so I'm wondering if this
> really helps anything.
[Rick James] From where I sit, it is hard to say, I can't tell if the HAVING would
decrease the "12".
> Thanks for your help!
[Rick James] You are welcome.  (I can also be found on
> [clip]
> ----------------
> :::: People see what they have been conditioned to see; they refuse to
> see what they don't expect to see. -- Merle P. Martin
> :::: Jan Steinman, EcoReality Co-op ::::
> --
> MySQL General Mailing List
> For list archives:
> To unsubscribe:

Re: Making Myself CrazyJan Steinman20 Sep
  • RE: Making Myself CrazyRick James20 Sep