----- Original Message -----
From: Martin Oldfield <m@stripped>
Sent: Thursday, August 12, 1999 3:56 AM
Subject: A canonical dump
> I'd like to compare two databases to see if they contain the same
> data. One approach, which seemed obvious to me, was to dump both
> databases and then diff the files. Of course this will generate lots
> of false positives because the order in which the records are returned
> won't be specified, nor indeed will the order of columns within a
> Now the latter problem can probably be ignored here because I can
> force both databases to use the same create statement, and thus get
> the same column order in both files.
> I could solve the former by sorting each file post-dump, but this
> seemed a little messy. Of course it probably makes sense to do the
> checks per table to make the sort more efficient.
> I thought this was probably quite a common problem so I thought I'd
> ask the list's advice before I implement anything.
> Martin Oldfield,
Well, for what it's worth ..
I have a table that consists of large text fields. To de-dupe them, I
decided to 'fingerprint' each one using MD5 before inserting.
I basically strip everything but letters/digits, lowercase the text and
compute the MD5. The MD5 column has a unique index, so any insert of a
duplicate fails with a duplicate-key error.
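
To make that concrete, here is a rough sketch of the idea in Python. It is
not my actual code: sqlite3 stands in for the real database, the table and
column names (docs, md5, body) and the helper names are made up for the
example, and "letters/digits" is assumed to mean ASCII only.

```python
import hashlib
import re
import sqlite3


def fingerprint(text: str) -> str:
    """Lowercase, keep only ASCII letters/digits, return the MD5 hex digest."""
    normalized = re.sub(r"[^a-z0-9]", "", text.lower())
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def insert_unique(conn: sqlite3.Connection, text: str) -> bool:
    """Insert text; return False if its fingerprint is already stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO docs (md5, body) VALUES (?, ?)",
                (fingerprint(text), text),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on md5 rejected a duplicate fingerprint.
        return False


# Hypothetical schema: the UNIQUE constraint does the de-duplication.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (md5 TEXT NOT NULL UNIQUE, body TEXT NOT NULL)")
```

With this, insert_unique(conn, "Hello, World!") goes in, and a later
insert_unique(conn, "hello world") is rejected, because both normalize to
the same string before hashing.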
Like I said, for what it's worth .. but this approach is extremely fast for
my purposes, and might serve you well in comparing the two DBs.
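
One way the same trick might carry over to your comparison, sketched in
Python under some assumptions: hash each row, sort the hashes, and compare
the sorted lists per table, so row order no longer matters. sqlite3 is only
a stand-in for whatever driver you use, the function names are invented for
the example, and it relies on both tables having the same column order (as
you said you can force with the same create statement).

```python
import hashlib
import sqlite3


def row_digests(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the sorted MD5 digests of every row in the table."""
    # The table name is interpolated directly, so it must come from a
    # trusted source (for example your own list of tables).
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    digests = []
    for row in cursor:
        # repr() of the row tuple gives an unambiguous text form of the
        # values, including NULLs (None) and embedded separators.
        canonical = repr(tuple(row))
        digests.append(hashlib.md5(canonical.encode("utf-8")).hexdigest())
    # Sorting removes any dependence on the order rows were returned in,
    # while still counting duplicate rows.
    digests.sort()
    return digests


def tables_match(conn_a: sqlite3.Connection,
                 conn_b: sqlite3.Connection,
                 table: str) -> bool:
    """True if both databases hold the same multiset of rows in the table."""
    return row_digests(conn_a, table) == row_digests(conn_b, table)
```

Calling tables_match(conn_a, conn_b, "some_table") for each table tells you
which tables differ; it won't tell you which rows, but it narrows down where
a real diff is worth running.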