List:General Discussion« Previous MessageNext Message »
From:electroteque Date:July 9 2003 3:23am
Subject:RE: Faster reindexing
View as plain text  
when reimporting or reinserting or whatever from a huge db i usually drop
all the indexes reimport then create them again much quicker

-----Original Message-----
From: Paul DuBois [mailto:paul@stripped]
Sent: Wednesday, July 09, 2003 1:09 PM
To: Florian Weimer; mysql@stripped
Subject: Re: Faster reindexing


At 9:39 +0200 7/7/03, Florian Weimer wrote:
>I've got a table with 100 million rows and need some indexes on it
>(one row is 126 bytes).
>
>I'm currently using MyISAM and the indexing proceeds at an
>astonishingly low rate: about 200 MB per hour.  This is rate is far
>too low; if we had to recover the database for some reason, we'd have
>to wait for days.
>
>The table looks like this:
>
>CREATE TABLE flows (
>	version    CHAR NOT NULL,
>	router     CHAR(15) NOT NULL,
>	src_ip     CHAR(15) NOT NULL,
>	dst_ip     CHAR(15) NOT NULL,
>	protocol   TINYINT UNSIGNED NOT NULL,
>	src_port   MEDIUMINT UNSIGNED NOT NULL,
>	dst_port   MEDIUMINT UNSIGNED NOT NULL,
>	packets    INTEGER UNSIGNED NOT NULL,
>	bytes      INTEGER UNSIGNED NOT NULL,
>	src_if     MEDIUMINT UNSIGNED NOT NULL,
>	dst_if     MEDIUMINT UNSIGNED NOT NULL,
>	src_as     MEDIUMINT UNSIGNED NOT NULL,
>	dst_as     MEDIUMINT UNSIGNED NOT NULL,
>	src_net    CHAR(1) NOT NULL,
>	dst_net    CHAR(1) NOT NULL,
>	direction  CHAR(1) NOT NULL,
>	class      CHAR(1) NOT NULL,
>	start_time CHAR(24),
>	end_time   CHAR(24)
>);
>
>Indexes are created using this statement:
>
>mysql> ALTER TABLE flows
>     -> ADD INDEX dst_ip (dst_ip, src_ip),
>     -> ADD INDEX dst_port (dst_port, start_time),
>     -> ADD INDEX src_ip (src_ip, start_time),
>     -> ADD INDEX time (start_time);
>
>In theory, we could represent the columns router, src_ip, dst_ip,
>start_time, end_time using integers of the appropriate size, but this
>would make ad-hoc queries harder to type (and porting our applications
>would be even more difficult).

Perhaps, but as a test, you might add a couple of extra columns to
the table, then populate them like this after loading the table:

UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
INET_ATON(dst_ip);

Then try creating the indexes using int_src_ip and int_dst_ip rather
than src_ip and dst_ip.

If it's significantly faster, you may want to reconsider whether it might
not be worth using INET_ATON(X) in your queries rather than X.

>
>Should I switch to another table type?

It's easy enough to convert the table to, e.g., InnoDB and then
create the indexes, so an empirical test should not be difficult.

--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified?  http://www.mysql.com/certification/


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/mysql?unsub=1

Thread
Faster reindexingFlorian Weimer7 Jul
  • Re: Faster reindexingPaul DuBois9 Jul
    • RE: Faster reindexingelectroteque9 Jul
      • Re: Faster reindexingDominicus Donny9 Jul
        • Re: Faster reindexingPaul DuBois9 Jul
      • Re: Faster reindexingDominicus Donny9 Jul
        • Re: Faster reindexingPaul DuBois10 Jul
          • Re: Faster reindexingJeremy Zawodny18 Jul
            • Re: Faster reindexingdaniel18 Jul
            • Re: Faster reindexingDominicus Donny18 Jul
    • RE: Faster reindexingDathan Vance Pattishall9 Jul
      • Re: Faster reindexingFlorian Weimer9 Jul
      • Re: Faster reindexingFlorian Weimer10 Jul
RE: Faster reindexingTerry Spencer9 Jul
RE: Faster reindexingRudy Metzger10 Jul