From: Rick James Date: September 20 2012 4:29pm Subject: RE: Making Myself Crazy List-Archive: http://lists.mysql.com/mysql/228208 Message-Id: <2E7DD7ADE53B044C8C8BCD9C5829E1EB148BB5E0C1@SP2-EX07VS01.ds.corp.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable below.. > -----Original Message----- > From: Jan Steinman [mailto:Jan@stripped] > Sent: Wednesday, September 19, 2012 8:59 PM > To: mysql@stripped > Subject: Re: Making Myself Crazy >=20 > Thanks for your help, Rick! >=20 > Interspersed are some questions and rationales for you to shoot down... > :-) [Rick James] :) >=20 > > From: Rick James > > > > s_product_sales_log has no PRIMARY KEY. All InnoDB tables 'should' > have an explicit PK. >=20 > This table really has no identifying information. There could be two > identical, valid rows, if the same person sold the same amount of the > same product to the same other person on the same day. [Rick James] The PK provides a _unique_ identification for each row. Each = other key depends on the PK to find the row. A PK is a UNIQUE key is an INDEX. If you don't provide a PK, InnoDB will provide an inaccessible, 6-byte, num= ber for the purpose. id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY is a common technique. It is 4 bytes. (Or less if you use MEDIUMINT, etc) (Hence, I said "should", not must.) > All the foreign keys were indexed. Is there something I don't > understand about something a PK field does? If an individual record > cannot be uniquely identified by its information, is there really any > need for a primary key? >=20 > None the less, I added field "ID" as an unsigned autoincrement INT and > made it PK. >=20 > > INT(5) is not what you think. INT is always a 32-bit, 4-byte > quantity, regardless of the number. >=20 > > > > Use TINYINT UNSIGNED, SMALLINT UNSIGNED, etc. wherever reasonable. >=20 > Understood. I make all my keys UINT even when they could be smaller, > because I've gotten into some gnarly consistency problems. with foreign > keys. [Rick James] Yes, types must be consistent for FKs. (I don't know the deta= ils of what aspects of the 'type' can be relaxed.) Usually that happens 'a= utomatically' when you build the schema and decide that foo_id is TINYINT U= NSIGNED, and apply that to all tables with foo_id. Changing the type after the fact can be trickier, especially because of PKs= . >=20 > >> KEY `is_value_added` (`is_value_added`), > > A single-column INDEX on a flag, ENUM, and other low-cardinality > field, is almost always useless. >=20 > Why is that? Surely, even a flag separates the record space into two? [Rick James] Yes, it separates. But let's look at the effort. There are t= wo 'files', one with all the data, one with all of this index. Plan A: Use the index. Walk through half of the index 'file' (actually a = BTree). For each 'row' encountered, reach over into the data -- this may b= e a random lookup. Back and forth, back and forth. Plan B: Ignore the index; simply scan all the data 'file'. As you scan, c= heck the row for the flag. Plan A is likely to involve more I/O, hence be slower. So, the optimizer c= hooses Plan B. There is no exact cutoff between the two approaches; it is = somewhere around 20%. Further note, that to reach from the Index into the Data requires a BTree l= ookup to find the data row. This uses the PRIMARY KEY, which is implicitly= included in the secondary key's BTree. (Hence one reason that InnoDB must= have a PK.) >=20 > > Performance issues... > >> WHERE YEAR(sales.`Date`) =3D '{{{1}}}' > > won't use > >> KEY `Date` (`Date`), > > because the column (Date) is hidden in a function. A workaround: > > WHERE `Date` >=3D '{{{1}}}-01-01' > > AND `Date` < '{{{1}}}-01-01' + INTERVAL 1 YEAR >=20 > Thanks! Good catch. >=20 > > JOINing two subqueries -- There is no way to index either of them, so > the JOIN will have to do a table scan of one temp table for every row > of the other temp table. > > (The alternative is to CREATE TEMPORARY TABLE... with an index, for > one of the subqueries.) >=20 > But I made sure the subqueries were the smallest possible sets -- > essentially, the domain of s_profit_centre, which only has 12 records. [Rick James] Good. With 12, the performance won't be too bad. >=20 > I had the entire thing coded up into one massive JOIN of everything, > and it took 30 minutes to run! By LEFT JOINing down to a dozen or fewer > records, it seems to run in reasonable time, even though it's two > subqueries that are not indexed. >=20 > > It would probably be better to move the mt.tot!=3D0 test inside: > > > > GROUP BY `Profit Centre` > > ) mt ON mt.pcid =3D tt.pcid > > WHERE mt.tot !=3D 0 > > ) xx > > --> > > GROUP BY `Profit Centre` > > HAVING tot !=3D 0 -- added > > ) mt ON mt.pcid =3D tt.pcid > > -- removed: WHERE mt.tot !=3D 0 > > ) xx > > > > That would make mt have fewer rows, hence that unindexed JOIN could > run faster. >=20 > My first attempt to do that produced an error. And again, both the > subqueries will have 12 or fewer records, so I'm wondering if this > really helps anything. [Rick James] From where I sit, it is hard to say, I can't tell if the HAVIN= G would decrease the "12". >=20 > Thanks for your help! [Rick James] You are welcome. (I can also be found on forums.mysql.com.) >=20 > [clip] >=20 > ---------------- > :::: People see what they have been conditioned to see; they refuse to > see what they don't expect to see. -- Merle P. Martin > :::: Jan Steinman, EcoReality Co-op :::: >=20 >=20 >=20 >=20 >=20 > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql