On 1/4/07, Daniel Kiss <mysql@stripped> wrote:
>
> Hi All,
>
> I'm designing a database where it is required for each row in a table to
> have a unique identifier accross a bunch of running MySQL servers on a
> network.
>
>
> I'm considering two options:
>
> 1. Have a two-column primary key, where the first column is a normal
> auto incremented ID field, the second identifies the database and
> contains a constant ID per server. Something like this:
> CREATE TABLE MyTable (
> ROWID int unsigned not null auto_increment,
> DBID int unsigned not null,
> AnyData varchar(10) not null,
> PRIMARY KEY (ROWID, DBID)
> );
>
> INSERT INTO MyTable (DBID, AnyData) VALUES (8, 'Any text');
>
> 2. I would use a traditional one-column binary primary key populated by
> the built-in uuid() fuction of MySQL. Like this:
> CREATE TABLE MyTable (
> ID binary(36) not null,
> AnyData varchar(10) not null,
> PRIMARY KEY (ID)
> );
>
> INSERT INTO MyTable (ID, AnyData) VALUES (uuid(), 'Any text');
>
>
> In my view both solutions have their adventages and disadvantages. The
> first is more optimal in storage space and speed, the second is easier
> to maintain, administer and query.
> And there is another aspect, which is needed to be tested, I guess...
> How fast is the second solution when I execute complex queries based on
> primary key relations? Does it pay to use the ease of the second
> solution? Anyone has any experience in similar problems?
> What is your opinion?
Human readability is important in case something goes wrong, and I like the
first solution better. I looked at the format of UUID in the MySQL
documentation ... not all that human-friendly.
In all the database code I've written (admittedly, all on a single server),
I've always had a function (written in PHP) that returns server unique
identifiers (and a globally unique identifier is along the same lines).
They have typically been fairly long character strings that include the Unix
time (seconds and microseconds) and the PID. I typically also spin-lock
until the microtime changes--that way it is guaranteed that two processes
can't have the same PID at the same time.
I'd say go with (a)generation that you understand and control, and
(b)human-readability in case you have to dissect it.
The server-unique identifiers that I described above have the advantage that
they double as timestamps.
Just an opinion.
Dave.