List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:February 28 2013 3:40pm
Subject:effects of load data local on RSS
View as plain text  
Hi!

>>>>> "Zardosht" == Zardosht Kasheff <zardosht@stripped> writes:

Zardosht> Hello all,
Zardosht> We have noticed something peculiar to our storage engine, but could
Zardosht> not reproduce with InnoDB. We notice when a user runs load data local
Zardosht> infile, that a lot of memory that we cannot account for gets used by
Zardosht> the system, sometimes leading the system to crash from being low on
Zardosht> memory. However, when running load data infile ignore, we see no such
Zardosht> issues.

Zardosht> As far as I can tell, the settings for the handler are the same for
Zardosht> load data local infile and load data infile ignore. Both load many
Zardosht> rows within the context of a single statement, and both set the flag
Zardosht> HA_EXTRA_IGNORE_DUP_KEY in handler::extra. Otherwise, as far as the
Zardosht> handler is concerned, these two look identical. Yet one utilizes much
Zardosht> more RSS than the other, and we do not see this behavior with InnoDB.

table->file->extra(HA_EXTRA_IGNORE_DUP_KEY) is only called if IGNORE
or DUP_REPLACE (replace conflicting rows) is used.

Zardosht> Does anyone have any thoughts on what may be happening?

This is a speculation, but it's true for MyISAM and could be true for
InnoDB.

When one is not using 'IGNORE', one can optimize the LOAD DATA into an
empty file as follows:

- Disable all indexes (except primary key if the index is clustered)
- Insert all rows, not updating indexes
- Scan all rows and create the indexes by sorting keys.
  - This can be either done all keys at once or key by key

The above will use more memory for creating the indexes, but it will
be notable faster than inserting rows one by one.

When using IGNORE the above can't be done as we need to skip all rows
that has a duplicate key.

MyISAM and Aria both optimize LOAD DATA the above way.
I think that newer InnoDB can do this too.

Regards,
Monty
Thread
effects of load data local on RSSZardosht Kasheff28 Feb
  • effects of load data local on RSSMichael Widenius28 Feb
    • Re: effects of load data local on RSSZardosht Kasheff28 Feb
      • Re: effects of load data local on RSSMichael Widenius1 Mar
        • Re: effects of load data local on RSSZardosht Kasheff1 Mar
          • Re: effects of load data local on RSSZardosht Kasheff6 Mar
            • Re: effects of load data local on RSSDavi Arnaut6 Mar
          • Re: effects of load data local on RSSMichael Widenius6 Mar