On Mon, Sep 19, 2011 at 7:19 AM, Johan De Meersman <vegivamp@stripped>wrote:
> ----- Original Message -----
> > From: "Hank" <heskin@stripped>
> > While running a -rq on a large table, I got the following error:
> > myisamchk: warning: Duplicate key for record at 54381140 against
> > record at 54380810
> > How do I find which records are duplicated (without doing the typical
> > self-join or "having cnt(*)>1" query)? This table has 144 million
> > rows, so that's not really feasible.
> Given that the error is a duplicate *key*, "select <key> from <table>
> by <key> having count(<key>) > 1" is an index-covered query, and
> should thus
> be perfectly feasible :-)
> What I'm not so sure about, is wether the duplicate key will show up
> correctly in the index - as that index may be marked corrupt - and so, if it
> falls back to a full tablescan, it's indeed going to take a long time. If it
> does, however, there's no other option anyway: the only way to do it fast is
> an index, and that index is untrustworthy.
Exactly - I can't create an index on the table until I remove the duplicate
records. I suppose I could create a non-unique index on the key fields, and
try that, but then I'd have to (1) create that index, and then (2) do the
full table scan query. Either way, it's going to take a tremendous amount
of time to do that.
Alternatively, it would be most helpful if the tools provided that find and
output the offending record block #s also provided a quick way to actually
print out those offending rows so I could track down how they got in there
in the first place.