Alec writes

>...Generally, Log A + log B is bound to be larger than log (A*B)...

Errm, log A + log B exactly = log(A*B) :-) .

PB

-----

Alec.Cawley@Quantel.Com wrote:
To reply to this, I think we have to understand why you have chosen to 
split the tables at all. It seems to me that this, by introducing a 
two-level lookup, is certain to be slower than any possible single table 
lookup. Generally, Log A + log B is bound to be larger than log (A*B). 

It appears that you are querying prediminantly by time. In this case, your 
index *must* start with the timestamp, not the monitor ID. I would suggest 
that you need an index on timestamp, and possible one on monitor ID - but 
not if, as you say, you never query by monitor ID at all. Do you need a 
PRIMARY KEY at all? In what way will your system break if there happen to 
be two entries with the same timestamp and monitor ID? Presumably this 
will reflect two events very close together: Wouldn't you rather store 
that fact rather than lose it? 

Generally, I would query your decision to have multiple tables by date to 
whatever. In my experience, whenever I have introduced such concepts into 
my early designs, they have disappeareed later into a better design. It 
looks to mee as if you are using a tool optimied to do fast searches on 
large databases, then crippling its ability to optimise.

I would expect the use of thousands of tables effectively to disable 
MySQL's caching capability, which is one of the biggest performance 
boosters.

        Alec






"John McCaskey" <johnm@klir.com> 
13/01/2006 17:20

To
"MySQL" <mysql@lists.mysql.com>
cc

Subject
Huge number of tables with InnoDB






Hi everyone,

 

I'm running MySQL 4.0.18 on Debian with a 2.6 linux kernel using ext3 as
the underlying filesystem for the database storage.

 

I currently have some InnoDB tables with the following structure:

 

Log_20060101 {

            Monitor_id medium int,

            Timestamp timestamp,

            Avg float,

PRIMARY KEY Monitor_id, Timestamp

}

 

We partition these tables by date as you can see as they grow very large
and they get to be slow to insert and query to over time.  We have the
idea to change the partitioning so the tables are as follows:

 

Log_[monitor_id] {

            Timestamp,

            Avg float,

PRIMARY KEY Timestamp

}

 

This seems to have several key advantages:

 

1)       Reduced disk space usage 

2)       Easier querying of data across time (but not across individual
id's, it turns out that doesn't ever really happen in our data usage
anyway though)

3)       Smaller tables, resulting in faster reads/writes, also smaller
data volume hopefully also resulting in faster reads/writes due to less
disk io neccesary

 

However, in our actual testing the 'faster writes' expectation is
getting shot down.  With 20,000 unique monitor id's and 8928 unique
timestamps inserting to the old set of tables (20051201-20051231 in this
case) is taking me about 4 hours 20 minutes.  Inserting to the 20,000
new tables (Log_0-Log_20000) is taking about 10 hours.  I expected this
to be much faster as I hoped finding the right table for an insert would
be a Hash type lookup taking linear time, while inserting into the large
tree structure in the old tables which have a much higher volume of rows
would be logarithmic time.  Where did I go wrong?  My only real thought
so far is the disk subsystem of the OS being slow with large numbers of
files, but I thought it wouldn't matter for InnoDB as the data storage
is all one file.  Does anyone know what would cause the inserts to be so
much slower? 

 

John A. McCaskey

Software Development Engineer

Klir Technologies, Inc.

johnm@klir.com

206.902.2027