List: General Discussion
From: Rick James
Date: June 20 2012 5:55pm
Subject: RE: Indexing about 40 Billion Entries
Even if you get past the "REPAIR WITH KEYCACHE", the ALTER will still take days, maybe
weeks.

I strongly recommend you do not try to access that many rows directly.  Instead, build
summary tables, and access them.  We can discuss further.
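A minimal sketch of what I mean by a summary table (the table name, the choice of aggregates, and the population strategy here are all illustrative assumptions, not a prescription):

```sql
-- Illustrative sketch: pre-aggregate per ruid1 so routine queries
-- never have to scan the 40-billion-row base table directly.
CREATE TABLE l4_link_summary (
  ruid1         INT NOT NULL,
  link_count    INT NOT NULL,     -- number of ruid2 partners
  total_overlap BIGINT NOT NULL,  -- SUM(overlap) for this ruid1
  PRIMARY KEY (ruid1)
) ENGINE=MyISAM;

-- Populate once (for a huge table, in ranges of ruid1 rather than
-- one statement, and incrementally if new rows keep arriving):
INSERT INTO l4_link_summary (ruid1, link_count, total_overlap)
SELECT ruid1, COUNT(*), SUM(overlap)
FROM l4_link
GROUP BY ruid1;
```

Queries then hit the small summary table, e.g. `SELECT link_count FROM l4_link_summary WHERE ruid1 = 12345;`, instead of the 640GB data file.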

Unfortunately, you did not include a PRIMARY KEY when you built the table.  This
makes many operations slow.
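For comparison, declaring the key when the table is created, before the bulk load, avoids rebuilding later.  A sketch, assuming (ruid1, ruid2) uniquely identifies a row (that uniqueness is an assumption on my part):

```sql
-- Sketch: define the key up front, then bulk-load the data.
CREATE TABLE l4_link (
  ruid1   INT NOT NULL,
  ruid2   INT NOT NULL,
  overlap INT NOT NULL,
  PRIMARY KEY (ruid1, ruid2)  -- assumes each (ruid1, ruid2) pair is unique
) ENGINE=MyISAM;
```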

> I need an index for ruid1 and for ruid2.
What will your SELECTs look like?  You may be better off with
INDEX(ruid1, ruid2, overlap), and
INDEX(ruid2, ruid1, overlap)
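If, for example, your SELECTs filter on one ruid and read the other columns, such composite indexes are "covering": MyISAM can answer from the index alone without touching the 640GB data file.  A sketch (the index names are my own; adjust to your real queries):

```sql
-- Illustrative only: add both composite indexes in a single ALTER
-- so the table is rebuilt once, not twice.
ALTER TABLE l4_link
  ADD INDEX ruid1_ruid2_overlap (ruid1, ruid2, overlap),
  ADD INDEX ruid2_ruid1_overlap (ruid2, ruid1, overlap);

-- Served entirely from the first index, no data-file reads:
-- SELECT ruid2, overlap FROM l4_link WHERE ruid1 = 42;
```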

Will you be adding more rows to this table?  Or is it now static?

What are the semantics of the fields?


> -----Original Message-----
> From: Ananda Kumar [mailto:anandkl@stripped]
> Sent: Wednesday, June 20, 2012 4:37 AM
> To: Christian Koetteritzsch
> Cc: mysql@stripped
> Subject: Re: Indexing about 40 Billion Entries
> 
> It looks like the value you gave for myisam_max_sort_file_size is not
> enough for the index creation, hence it is doing a "REPAIR WITH
> KEYCACHE".
> 
> Use the query below to find the minimum value required for
> myisam_max_sort_file_size to avoid "repair with keycache":
> 
> 
> select
>         a.index_name as index_name,
>         ceil(
>         (select count(*) from `db_name`.`table_name`) * (
>         @@global.myisam_data_pointer_size +
>         sum(ifnull(1 + `a`.`sub_part` *
>                 (`b`.`character_octet_length` / `b`.`character_maximum_length`),
>             ifnull(1 + `b`.`character_octet_length`,
>                 case
>                     when `b`.`data_type` = 'tinyint'   then 1
>                     when `b`.`data_type` = 'smallint'  then 2
>                     when `b`.`data_type` = 'mediumint' then 3
>                     when `b`.`data_type` = 'int'       then 4
>                     when `b`.`data_type` = 'bigint'    then 8
>                     when `b`.`data_type` = 'float'     then 4
>                     when `b`.`data_type` = 'double'    then 8
>                     when `b`.`data_type` = 'real'      then 8
>                     when `b`.`data_type` = 'bit'       then 8
>                     when `b`.`data_type` = 'date'      then 3
>                     when `b`.`data_type` = 'datetime'  then 8
>                     when `b`.`data_type` = 'timestamp' then 4
>                     when `b`.`data_type` = 'time'      then 3
>                     when `b`.`data_type` = 'year'      then 1
>                     when `b`.`data_type` = 'enum'      then 2
>                     when `b`.`data_type` = 'set'       then 8
>                     when `b`.`data_type` = 'decimal'   then 8
>                 end)))
>         + if(`a`.`nullable` = 'YES', 1, 0)
>         + if(`b`.`character_octet_length` >= 255, 2, 0)
>         ) / 1048576) * 1048576 as `index_size`
> from `information_schema`.`statistics` `a`,
>      `information_schema`.`columns` `b`
> where `a`.`table_name` = `b`.`table_name`
>   and `a`.`table_schema` = `b`.`table_schema`
>   and `a`.`column_name` = `b`.`column_name`
>   and `a`.`table_schema` = 'db_name'
>   and `a`.`table_name` = 'table_name'
> group by `a`.`index_name`
> order by `index_size` desc limit 1;
> 
> On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch <
> ckoetteritzsch@stripped> wrote:
> 
> > Hi guys,
> >
> > As the title says I'm trying to index 40 billion entries with two
> > indexes on a server with 16 cores and 128GB RAM. The table is the one
> > below and it is a myisam table. The *.myd file is about 640GB
> >
> > DROP TABLE IF EXISTS `l4_link`;
> > CREATE TABLE  `l4_link` (
> >  `ruid1` int NOT NULL,
> >  `ruid2` int NOT NULL,
> >  `overlap` int NOT NULL
> > );
> >
> > I need an index for ruid1 and for ruid2.
> >
> > The status for this table is the following:
> >
> >           Name: l4_link
> >         Engine: MyISAM
> >        Version: 10
> >     Row_format: Fixed
> >           Rows: 39806500262
> >  Avg_row_length: 17
> >    Data_length: 676710504454
> > Max_data_length: 4785074604081151
> >   Index_length: 1024
> >      Data_free: 0
> >  Auto_increment: NULL
> >    Create_time: 2012-06-19 14:51:29
> >    Update_time: 2012-06-19 16:26:35
> >     Check_time: NULL
> >      Collation: utf8_general_ci
> >       Checksum: NULL
> >  Create_options:
> >        Comment:
> >
> > The variables for myisam are the following:
> > mysql> show global variables like '%myisam%';
> > +---------------------------+----------------------+
> > | Variable_name             | Value                |
> > +---------------------------+----------------------+
> > | myisam_data_pointer_size  | 6                    |
> > | myisam_max_sort_file_size | 9223372036853727232  |
> > | myisam_mmap_size          | 18446744073709551615 |
> > | myisam_recover_options    | BACKUP               |
> > | myisam_repair_threads     | 1                    |
> > | myisam_sort_buffer_size   | 8388608              |
> > | myisam_stats_method       | nulls_unequal        |
> > | myisam_use_mmap           | OFF                  |
> > +---------------------------+----------------------+
> > 8 rows in set (0.00 sec)
> >
> > The temp folder has about 16 TB of free space.
> >
> > When I start the indexing process, it copies the 640GB into a temp
> > file and then starts with "repair with keycache".
> > On the internet I found that if it says "repair with keycache" you
> > should increase "myisam_max_sort_file_size", but this didn't work.
> > It still says "repair with keycache" after copying the data to the
> > temp dir.
> >
> > I hope you have some idea of how to fix this.
> >
> > Thanks in advance.
> >
> > Christian
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
> >
> >