Here's the situation. I have a table of about 900 million rows, consisting
of a bigint and an int in each row. There's an index on the bigint. The
table is referenced in a SELECT in other DB operations roughly 15 or 20
times per day.
We're under tight deadlines and some operations on the table were getting
too slow, so two weeks ago I initiated a DELETE from the table that should
remove about 600 million of the rows, based on values in the int column.
It's still running.
In the interim, it appears that the transaction has finished (doing a select
will yield rows that should be there while rows that should not are indeed
excluded), but the transaction is still "updating" in the processlist and
hasn't returned to the prompt that issued it. Further, MySQL appears to be
thrashing the disk, running about 250 transactions per second of about 20-25
KB per transaction. The end result of this is that any DB operation we run
that's small enough to be in memory is unaffected, but anything that starts
swapping to the disk is sloooow.
We have changed our workflow and I've changed the code so that the affected
table isn't hurting us directly anymore, but now it's two weeks on and very
soon our deadline will have passed and I'll need to resolve this if the
transaction doesn't finish.
At this point I don't care what I should or should not have done, or what
went wrong to cause this sort of hang. I need to know what my options are
when we have time to work on the server without impacting other people's
deadlines, which means I need a few educated guesses on what's going on with
the DB. What will happen if I stop the transaction from my client? If I kill
the process? What will happen if we reboot the server? What the heck is
MySQL DOING that's thrashing the disk anyway? Why is it not using a higher
throughput to the disk?
We're okay for now, but we're kind of without a net in that backups take 16
hours to run because of the slow disk access. So we have something, it's
just not very optimal.
But at some point in the next few weeks I'm going to have to take some sort
of action if this thing doesn't commit, and I don't want it to be offline
for several days undoing whatever it's done.
Any advice would be most appreciated.
-- Joe