From:Johan De Meersman Date:November 4 2011 10:27am
Subject:Re: Deleting Records in Big tables
----- Original Message -----
> From: "Reindl Harald" <h.reindl@stripped>
> 
> well i guess you have to sit out add the key
> wrong table design having an id-column without a key or
> something weird in the application not using the primary
> key for such operations

For high-volume insert-only tables the lack of a key could well be intentional, with the
obvious downside that any non-insert operation is going to complete in O(fuck).

Adarsh, I suggest you first work out whether the lack of a primary key was intentional. If
you decide that it is safe to add one (and that you actually want one), your fastest route
is going to be creating a second table identical to the original but with the appropriate
primary key added, INSERT ... SELECT-ing the data you want to retain, and switching out
the tables as per Ananda's suggestion. That will take roughly the same time as your delete
operation (a bit more, for the index build), but you end up with a fully defragmented
table with an index.
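As a sketch, assuming a hypothetical table called `logs` with a `created_at` column (your names and retention condition will differ):

```sql
-- 1. Create an empty structural copy and give it the new primary key.
CREATE TABLE logs_new LIKE logs;
ALTER TABLE logs_new
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- 2. Copy over only the rows you want to keep.
INSERT INTO logs_new (col_a, col_b, created_at)
  SELECT col_a, col_b, created_at
  FROM logs
  WHERE created_at >= '2011-01-01';

-- 3. Swap the tables; renaming both in one statement is atomic.
RENAME TABLE logs TO logs_old, logs_new TO logs;
```

Once you're satisfied nothing is missing, `logs_old` can be dropped at leisure.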

If, however, you find that adding a key is undesirable, I still recommend (also as per
Ananda's suggestion) that you recreate and switch out the table.

You can't do the whole operation atomically, unfortunately: DDL statements like RENAME
TABLE are never part of a larger transaction, and they can't be executed while the
affected tables are locked either. Make sure to check whether any inserts happened between
the copy and the rename. None should, really, if you type the commands as a single
semicolon-divided line, but check anyway. Ideally, of course, no clients are even
connected while you perform this.
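A minimal sketch of such a check, assuming hypothetical names: `logs_old` is the switched-out table and `created_at` is a timestamp column. Any rows stamped after the moment the INSERT ... SELECT finished were inserted during the switch and need to be carried over by hand:

```sql
-- Substitute the actual time your copy completed.
SELECT COUNT(*)
FROM logs_old
WHERE created_at > '2011-11-04 10:00:00';
-- Non-zero? Those rows missed the copy and must be re-inserted
-- into the new table before you drop logs_old.
```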

All of the above, however, is only relevant if you need to delete a LOT of the data (say,
over half) - the create/insert path is going to have to write the full records to disk,
which is much slower than just marking records as deleted in the existing table. If you
are only going to delete a (relative) handful of records, just delete them and be done
with it.

As for the segmented delete, I'm not sure that's going to be useful here. There are no
usable keys, so regardless of how many records you delete, the table is going to be fully
scanned anyway. I suspect a segmented delete will yield only a fractional speed benefit
per segment, while multiplying the total time by the number of segments you've cut the
delete into.
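To illustrate the point, a typical segmented delete looks something like the following (hypothetical names again). Without a usable index on the WHERE condition, every repetition of the statement is another full table scan:

```sql
-- Repeat until the statement affects zero rows.
-- Each pass rescans the whole table to find matches,
-- which is why segmenting doesn't help an unindexed delete.
DELETE FROM logs
WHERE created_at < '2011-01-01'
LIMIT 10000;
```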


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel