List: General Discussion
From: Jay J
Date: August 12 1999 10:49am
Subject: Re: A canonical dump
----- Original Message -----
From: Martin Oldfield <m@stripped>
To: <mysql@stripped>
Sent: Thursday, August 12, 1999 3:56 AM
Subject: A canonical dump

> I'd like to compare two databases to see if they contain the same
> data. One approach, which seemed obvious to me, was to dump both
> databases and then diff the files. Of course this will generate lots
> of false positives because the order in which the records are returned
> won't be specified, nor indeed will the order of columns within a
> record.
> Now the latter problem can probably be ignored here because I can
> force both databases to use the same create statement, and thus get
> the same column order in both files.
> I could solve the former by sorting each file post-dump, but this
> seemed a little messy. Of course it probably makes sense to do the
> checks per table to make the sort more efficient.
> I thought this was probably quite a common problem so I thought I'd
> ask the list's advice before I implement anything.
> Cheers,
> --
> Martin Oldfield,

Well, for what it's worth...

I have a table that consists of large text fields. To de-dupe them, I
decided to 'fingerprint' each one with MD5 before inserting it.

Basically, I strip everything but letters and digits, lowercase the text,
and compute the MD5 of the result. The MD5 column is indexed as UNIQUE, so
any attempt to insert a duplicate fails with a duplicate-key error.
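A minimal sketch of that fingerprint-and-unique-index scheme, for illustration only. Assumptions not in the message: it uses Python's built-in sqlite3 as a stand-in for MySQL, keeps only ASCII letters and digits (the message does not say how non-ASCII text is handled), and the table, column, and function names (`documents`, `fingerprint`, `insert_document`) are hypothetical.

```python
import hashlib
import re
import sqlite3


def fingerprint(text: str) -> str:
    """MD5 hex digest of text after normalization.

    Normalization as described above: drop everything except letters and
    digits, then lowercase. Keeping only ASCII A-Z/0-9 is an assumption.
    """
    normalized = re.sub(r"[^A-Za-z0-9]", "", text).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def insert_document(conn: sqlite3.Connection, body: str) -> bool:
    """Insert body; return False if a row with the same fingerprint exists."""
    try:
        conn.execute(
            "INSERT INTO documents (fingerprint, body) VALUES (?, ?)",
            (fingerprint(body), body),
        )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on the fingerprint column rejected the row.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL table
    conn.execute(
        "CREATE TABLE documents ("
        " fingerprint CHAR(32) NOT NULL UNIQUE,"
        " body TEXT NOT NULL)"
    )
    print(insert_document(conn, "Hello, World!"))  # True
    print(insert_document(conn, "hello world"))    # False: same fingerprint
```

Because "Hello, World!" and "hello world" both normalize to "helloworld", the second insert is rejected by the index rather than by application code, which is what keeps the check cheap.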

Like I said, for what it's worth, but this approach is extremely fast for
my purposes and might serve you well in comparing the two DBs.
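One way the fingerprint idea might carry over to Martin's question, sketched under stated assumptions: hash every row of a table, then compare the two databases as multisets of row hashes, so row order no longer matters and no sorted dump is needed. Unlike the de-dupe case, no text normalization is applied, since the goal here is exact equality. This is an illustrative extension, not something described in the thread: it uses Python's sqlite3 as a stand-in for MySQL, serializes rows with `repr()` (an assumption; it distinguishes 1 from "1"), relies on both databases using the same CREATE statement so `SELECT *` returns columns in the same order, and the helper names (`row_fingerprint`, `table_fingerprints`, `compare_table`, `make_db`) are hypothetical.

```python
import hashlib
import sqlite3
from collections import Counter


def row_fingerprint(row: tuple) -> str:
    """MD5 of a row's repr(); assumes identical column order in both DBs."""
    return hashlib.md5(repr(row).encode("utf-8")).hexdigest()


def table_fingerprints(conn: sqlite3.Connection, table: str) -> Counter:
    """Multiset of row fingerprints, so duplicate rows are counted."""
    # The table name is interpolated, so it must come from a trusted list.
    cur = conn.execute(f"SELECT * FROM {table}")
    return Counter(row_fingerprint(row) for row in cur)


def compare_table(conn_a, conn_b, table: str):
    """Return (fingerprints only in A, fingerprints only in B)."""
    a = table_fingerprints(conn_a, table)
    b = table_fingerprints(conn_b, table)
    # Counter subtraction keeps only positive counts.
    return a - b, b - a


def make_db(rows):
    """Hypothetical helper: an in-memory DB with one two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    a = make_db([(1, "alice"), (2, "bob"), (3, "carol")])
    b = make_db([(3, "carol"), (1, "alice"), (2, "bobby")])
    only_a, only_b = compare_table(a, b, "t")
    print(len(only_a), len(only_b))  # 1 1
```

Insertion order differs between `a` and `b`, yet only the genuinely different row shows up on each side. Working per table, as Martin suggested, also keeps each Counter small.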

-Jay J
