As I understand it, LOAD DATA LOCAL acts basically like ignore. It
What I am wondering is why would LOAD DATA LOCAL use more RSS than
LOAD DATA IGNORE for us, but not for InnoDB. Does anyone have any
On Thu, Feb 28, 2013 at 10:40 AM, Michael Widenius <monty@stripped> wrote:
>>>>>> "Zardosht" == Zardosht Kasheff <zardosht@stripped>
> Zardosht> Hello all,
> Zardosht> We have noticed something peculiar to our storage engine, but could
> Zardosht> not reproduce with InnoDB. We notice when a user runs load data local
> Zardosht> infile, that a lot of memory that we cannot account for gets used by
> Zardosht> the system, sometimes leading the system to crash from being low on
> Zardosht> memory. However, when running load data infile ignore, we see no such
> Zardosht> issues.
> Zardosht> As far as I can tell, the settings for the handler are the same for
> Zardosht> load data local infile and load data infile ignore. Both load many
> Zardosht> rows within the context of a single statement, and both set the flag
> Zardosht> HA_EXTRA_IGNORE_DUP_KEY in handler::extra. Otherwise, as far as the
> Zardosht> handler is concerned, these two look identical. Yet one utilizes much
> Zardosht> more RSS than the other, and we do not see this behavior with InnoDB.
> table->file->extra(HA_EXTRA_IGNORE_DUP_KEY) is only called if IGNORE
> or DUP_REPLACE (replace conflicting rows) is used.
> Zardosht> Does anyone have any thoughts on what may be happening?
> This is a speculation, but it's true for MyISAM and could be true for
> When one is not using 'IGNORE', one can optimize the LOAD DATA into an
> empty file as follows:
> - Disable all indexes (except primary key if the index is clustered)
> - Insert all rows, not updating indexes
> - Scan all rows and create the indexes by sorting keys.
> - This can be either done all keys at once or key by key
> The above will use more memory for creating the indexes, but it will
> be notable faster than inserting rows one by one.
> When using IGNORE the above can't be done as we need to skip all rows
> that has a duplicate key.
> MyISAM and Aria both optimize LOAD DATA the above way.
> I think that newer InnoDB can do this too.