"Martijn Tonies" <m.tonies@stripped> wrote on 16/03/2006 11:32:45:
> From: <Alec.Cawley@stripped>
> > "Martijn Tonies" <m.tonies@stripped> wrote on 16/03/2006 11:02:32:
> > > Well, the question still is if you should store "unknown" at all ;)
> > >
> > > Not according to Date: you should store what is known. See the
> > > about the "true propositions", from which relational databases are
> > derived
> > > (but you probably know that).
> > As someone totally unread in the theory of databases, that seems
> > puritanical. I assume that what Date would propose is that you have
> > another table (related by master key) in which, if you do not know
> > something, you do not enter it. But this means that if you have 10
> > different pieces of potentially but not necessarily available
> > about a single master record (e.g. a person), you have to do a 10-way
> > in order to retrieve all the information about them. Replacing a
> > theoretically ugly null flag with a 10 way join strikes me, as an
> > rather than a theoretician, the wrong side of the
> > trade-off.
> Using NULLs as well as de-normalization brings the risk of
> integrity problems to your storage, storing what is right is only
> a good thing.
> And when it comes to having to writing JOINs for all your queries,
> lo and behold, I bring you the wonder of the VIEW.
The VIEW eases the syntax, but does it do anything for performance? Surely
it must be much slower to read 11 different tables (Master record
containing all NOT NULL fields, and 10 slave records which may or may not
contain relevant fields)? Ignoring caching, you are going to have at least
one disk access for every NULL field (index lookup which fails) and two
for every non NULL field (index lookup, data lookup) for every null field.
This means that you have multiplied your number of disk accesses (ignoring
caching, again) by 6-11 times (assuming the master record takes two disk
accesses). That again seems a very high price to pay for theoretical