List:General Discussion« Previous MessageNext Message »
From:Harrison Fisk Date:March 3 2005 9:23pm
Subject:Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!
View as plain text  
Hi,

On Mar 3, 2005, at 3:13 PM, mos wrote:

> At 12:39 PM 3/3/2005, Harrison Fisk wrote:
>> Hi,
>>
>> On Mar 3, 2005, at 11:32 AM, mos wrote:
>>
>>> At 10:07 PM 3/2/2005, you wrote:
>>>> Hello.
>>>>
>>>> You may use ALTER TABLE .. DISABLE KEYS to speed up the loading 
>>>> process
>>>> on the MyISAM table.
>>>
>>> That may work provided I can get the keys rebuilt later using 
>>> FileSort and not KeyCache.
>>>
>>> You see the problem isn't in loading the data into the table which 
>>> occurs reasonably fast that because running "Load Data Infile" on an 
>>> empty table will disable the keys until all the data is loaded, so 
>>> explicitly disabling before hand them won't help.  After the data is 
>>> loaded, Load Data Infile will then rebuild the keys and will choose 
>>> to use "Repair with keycache" whereas "Repair with filesort" would 
>>> be 100x-1000x faster. There doesn't seem to be any way to get it to 
>>> use "Repair with filesort".
>>>
>>> So I could use disable keys as you had suggested and then rebuild 
>>> them manually with myisamchk "repair with sort" provided it doesn't 
>>> reload all the data into a temporary table (which Alter Table 
>>> usually does). If it does create a temporary table it will 
>>> physically reloads the 500 million rows a second time and I will 
>>> need another 100g of free disk space.
>>>
>>> So is there a solution to this paradox?
>>
>> It chooses which method to use based on your myisam_sort_buffer_size, 
>> myisam_max_sort_file_size and myisam_max_extra_sort_file_size.  Have 
>> you increased the size of these?  Keep in mind these are SESSION 
>> variables, so they can be set on the connection right before you LOAD 
>> DATA INFILE.
>>
>> You always have to tell MySQL how much extra memory/disk it can use, 
>> it can't assume that you want it to use it.  That is why it will use 
>> Repair by keycache, since it doesn't use any extra resources.  If you 
>> increase the amount of diskspace or memory, then it should use a 
>> Repair by sort instead.
>
> That's what I thought. I have myisam_max_sort_file_size set to 15.7GB 
> and myisam_max_extra_sort_file_size set to 2GB. I confirmed this with 
> Show Variables. I thought this would be enough to sort the indexes. I 
> have confirmed there is enough free space on the hard drive to handle 
> this both in the database directory and the mysql temp directory.
>
>
>> Also keep in mind that Repair by sort doesn't work for UNIQUE or 
>> PRIMARY KEYs.
>
> So the solution is to keep using Load Data Infile with even larger 
> max_sort variables and more disk space? There are only 4 indexes on 
> the table and the largest index is around 50 bytes. This is running on 
> an AMD3500+ with 1gb RAM.
>
> My problem is it takes about 6 hours to load in the rows using Load 
> Data Infile to the empty table (which is fine) then by doing a "Show 
> ProcessList" I'll know if it is using FileSort or KeyCache. That is a 
> lot of time to waste if it ends up using KeyCache because then I know 
> it could be a few days to index the table.

If you want to test it out to see how much is required (though it 
shouldn't matter, just set them as large as you possibly could (see 
below)), then take your table and load it once with indexes disabled.  
Then enable them.  It will immediately do one or the other.  If you are 
unhappy, then you can kill it, change variables and repeat.

> So how do people force Load Data Infile to use FileSort and not 
> KeyCache? In other words how do they know ahead of time what to set 
> myisam_max_extra_sort_file_size set and myisam_max_sort_file_size to 
> without guessing? To me it seems a lot like hit and miss guesswork. I 
> would rather see a parameter "BySort" or "ByKeyCache" added to Load 
> File command that forces it to use either FileSort or KeyCache rather 
> than having the developer cross his fingers and hope the system 
> variables are set correctly.

You don't play around with the variables.  Set them to the largest 
possible amount that you would let MySQL use (ie. how much ram and how 
much disk space).  Then it will have the information to make the 
decision properly.  It only uses what it needs, up to your limit.  If 
it sees that your limits are too small, then it won't be able to do a 
By Sort, even if you want it to.  Why aren't you setting them as large 
as you can in the first go?

Regards,

Harrison

-- 
Harrison C. Fisk, Trainer and Consultant
MySQL AB, www.mysql.com

Get a jumpstart on MySQL Cluster -- 
http://www.mysql.com/consulting/packaged/cluster.html

Thread
"Load Data Infile ... Repair with KeyCache" --WAY SLOW!mos2 Mar
  • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!Gleb Paharenko3 Mar
    • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!mos3 Mar
      • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!Harrison Fisk3 Mar
        • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!mos3 Mar
          • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!Harrison Fisk3 Mar
    • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!mos6 Mar
      • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!Harrison Fisk7 Mar
        • Re: "Load Data Infile ... Repair with KeyCache" --WAY SLOW!mos7 Mar