From:Nicholas Elliott Date:July 11 2003 3:06pm
Subject:InnoDB Performance issues
Hey all,

I've been experimenting with the best way to store a large amount (~100GB) of data for retrieval.
Essentially, I'm storing 9 variables for approximately 1,000,000 locations a day for the
last ten years. This works out to around 4MB per variable per day - but not all variables
are always present (some values may be null).

(If you don't care about the details, I'll summarize at the end of this email).


Inserting and retrieving from a MyISAM table seemed to be approaching impossible. (Not
totally surprising.) I originally had a table along the lines of:

create table basic_daily_report(
date DATE NOT NULL,
location_id MEDIUMINT UNSIGNED NOT NULL,
variable1 float,
variable2 float....
variable9 float,
primary key (date, location_id)
);

(Just a summary of the actual table)
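For scale, here is a back-of-the-envelope check of the numbers above (assuming 4-byte floats and ten 365-day years; the exact figures are my estimates, not from measurements):

```python
# Rough sizing for the row-per-(date, location) layout described above.
# Assumes 4-byte floats; real storage adds per-row and index overhead.
locations_per_day = 1_000_000
variables = 9
days = 365 * 10  # ten years

# One variable's values for one day: 1M locations x 4 bytes = ~4MB,
# matching the "4MB a variable a day" figure above.
bytes_per_variable_per_day = locations_per_day * 4
print(bytes_per_variable_per_day / 1e6)  # ~4.0 MB

# Total raw data volume across all variables and days.
total_bytes = bytes_per_variable_per_day * variables * days
print(total_bytes / 1e9)  # ~131 GB raw; nulls bring it closer to ~100GB

# Row count for the row-per-location schema - one reason it struggled.
rows = locations_per_day * days
print(rows)  # 3,650,000,000 rows
```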

With this I had a maximum table size of around 100GB - just barely enough. I expected I
would end up segmenting by year, or something similar, as ugly as that is. I tested InnoDB
as an alternative, but we'll get to that in a second. Basically, inserting a day's worth
of data would take ages, and pretty much required an ANALYZE TABLE running for a couple of
hours every morning. Selecting was getting pretty slow as well. Eventually, I hit on the
idea of storing one row per day:


create table basic_daily_grid(
date DATE NOT NULL PRIMARY KEY,
variable1 MEDIUMBLOB,
variable2 MEDIUMBLOB ...
variable9 MEDIUMBLOB
);

I then wrote a UDF such that you pass it the variable and a location, and it returns the
exact value. This works well because every day has a constant number of locations in a
grid format, so it's simply an array lookup. So,

select grid_point(location_id, variable1) from basic_daily_grid where date=20030101

would return the right value for location_id. It turns out this is almost (95%) as fast as
the first version for selects, with the added bonus that inserts now take only ~5 seconds
per day!
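The real UDF is written against the MySQL UDF API in C; here's a minimal Python sketch of the same idea - pack each day's values for one variable into a fixed-width binary blob, then index into it by location. The names pack_grid and the 4-byte little-endian layout are my assumptions for illustration:

```python
import struct

# Hypothetical stand-in for the grid_point() UDF described above:
# each variable's daily values are packed as consecutive 4-byte floats,
# one slot per location, so a lookup is just an offset calculation.

def pack_grid(values):
    """Pack one day's values for one variable into a fixed-width blob."""
    return struct.pack("<%df" % len(values), *values)

def grid_point(location_id, blob):
    """Return the float stored for location_id (0-based array index)."""
    offset = location_id * 4  # 4 bytes per float
    return struct.unpack_from("<f", blob, offset)[0]

# Usage: five locations' worth of one variable for a single day.
blob = pack_grid([1.5, 2.5, 3.5, 4.5, 5.5])
print(grid_point(3, blob))  # 4.5
```

The fixed width is what makes this work: because every location occupies the same number of bytes, the lookup never scans the blob.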




Alas, after inserting 260 days (less than a year) I hit the MyISAM table size limit of
4.2GB - because a BLOB is a variable length field.
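The 4.2GB figure is consistent with MyISAM's default 2^32-byte limit for dynamic-row tables. A rough check of why it arrived in well under a year (the per-day sizes here are my estimates; null variables shrink the real blobs):

```python
# Why the ~4.2GB (2**32 bytes) dynamic-row limit arrives early.
# Assumed sizes only; blobs are smaller when variables are null.
table_limit = 2 ** 32                     # ~4.29e9 bytes

full_day = 9 * 4_000_000                  # 9 variables x ~4MB = 36MB/day
print(table_limit // full_day)            # ~119 days if every blob were full

observed_days = 260                       # days inserted before hitting the limit
print(table_limit / observed_days / 1e6)  # ~16.5MB/day average, i.e. many nulls
```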


-----------------------------------------------------------------


I mention all the above in case someone has an alternative solution I'm overlooking.
Possible solutions I've found are: a) use InnoDB instead, b) work with the source to
create a new field type BLOBARRAY of constant width instead of dynamic, or c) work with
the source to somehow overcome the 4.2GB limit on a dynamic-row table.


c) Seems unlikely - if the actual developers haven't done it, I probably can't.
b) Seems possible; I assume no one saw a need for a constant-width column of 4MB, so
hopefully it's not too difficult.
a) Was my first try. Inserting takes about twice as long as MyISAM... sure, I can deal
with that. Selecting a specific date is in the same ballpark as well, so little problem
there. What I'm having severe performance issues with are queries that group, or do a
count(*). For example:


mysql> explain select date from basic_daily_grid_innodb;
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+




mysql> select date from basic_daily_grid_innodb;
...
317 rows in set (0.00 sec)

mysql> explain select date, count(*) from basic_daily_grid_innodb group by date;
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+


mysql> select date, count(*) from basic_daily_grid_innodb group by date;
...
317 rows in set (2 min 54.95 sec)


I assume this is due to versioning or some other transactional feature. Or is this a bug,
or am I doing something wrong? I don't quite see why grouping items that are all unique
should be that much slower than not grouping. I need InnoDB for the unlimited table size,
but I don't (really) need transactions, commit/rollback, or checkpoints.


Any suggestions on solving this last hurdle? It's entirely likely I'll need to group by
year and average the results, or something similar - and at 3 minutes a pop, that's a
little high for my liking. Perhaps I'm expecting too much?


Thanks,
Nick Elliott
