----- Original Message -----
From: "Martijn Tonies" <m.tonies@stripped>
To: "Rhino" <rhino1@stripped>; <mysql@stripped>
Sent: Tuesday, March 14, 2006 2:16 PM
Subject: Re: Accountability with MySQL
>> > >Should you have a flag for the status "movement complete" ? I would
>> > >say
>> >>yes but instead of a simple checkbox, you could store a date value.
>> >>gives you two pieces of information
>> >>a) if the date is null then the movement is not complete.
>> >>b) if the date is NOT null then the movement is complete and you know
>> >>it finished.
>> > There, Shawn and I disagree :-)
>> > In case of a very simple database, one could use NULL
>> > as a "flag" or "signal". But in general, I would avoid
>> > storing NULLs.
>> > Basics of database design: store what you know.
>> > Given that NULLs basically means the absence of a value
>> > (eg: unknown), you shouldn't be storing NULLs.
> That's a bold statement ...
>> That's simply wrong. A null means "unknown or not applicable" and is a
> Ah, so now things become clear, a NULL actually can mean
> two things? So much for clarity then...
Come on; that's not fair. "Unknown" and "not applicable" are more like
different senses of the same thing, not two opposite things.
>> perfectly valid value to use in many, many situations. Two classic
> Let me first state that there's a difference between storing NULLs
> and handling NULLs in your result.
Yes, of course there is: so what? You said NULLs should never be _stored_;
that's what I'm responding to. Using NULLs is not very _difficult_ anyway,
just a bit tedious because it is another case to handle.
> So, let's debunk these two classic examples...
Please note that I did not say that you MUST use NULLs, just that they were
perfectly valid to use in a design.
>> 1. You have a table containing test scores. Some students get every
>> on a given test wrong. Some students don't write the test at all, perhaps
>> because they were sick. In this scenario, I think the correct thing to
>> is 0 for the students who got every question wrong and NULL for the
>> who didn't write the test.
> IMO, the better thing to do in this particular case is to NOT store a test
> result for the students that did not make the test.
Okay, that might be acceptable, if it doesn't cause you to lose track of the
student altogether. But if this table was the only one that even recorded
the _existence_ of the student, you'd have a problem; if someone tried to
verify that the student had attended this school, you wouldn't know that
they had. Or in a more probable case, if that was the only test for that
course and the student missed it and then had no row in the table, you might
not have any way of knowing that they took the course! And if they later
wanted to write the exam, having recovered from their illness, your query
might have the effect of keeping them from taking the second exam: your
query would report that they had never taken the first exam so an
adminstrator might refuse to let him/her take the second exam because they
(apparently) had never been scheduled to take the first one.
Now, you could have a second table to record people who were scheduled to
take tests but failed to take them to cover that situation but I think it
would be easier to record all students in one table and then simply store a
null for any test that they fail to take and a zero for every case where a
student got every answer wrong. When you compute the class average, the
avg() function would ensure that the students who got every question wrong
would pull down the class average but that students who failed to write the
test at all would NOT skew the average because the avg() function ignores
> So, an example table with test scores:
> StudentID Int,
> TestID Int,
> Score TinyInt Unsigned
> Now, students who didn't make the test won't have a record in here.
> Perfectly valid design AND you avoid storing NULLs.
>> 2. You have a table containing employee records. One of the columns is
>> "termination date". What value do you store for a new employee's
>> date? Well, if they are a contractor on a fixed length contract, you
>> calculate the date the contract ends; fair enough. If they are a
>> employee and your area has mandatory retirement, you could calculate the
>> date they turn 65 (or whatever) and use that. But what if they are a
>> permanent employee and you don't have mandatory retirement? I would store
>> NULL to mean "I don't know right now". Then, if and when they gave notice
>> that they were leaving, I would change the termination date from NULL to
>> their last day of work.
> Why store a date column if you don't know?
> Why not use:
> EmployeeID int,
> StartingDate Date,
> TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
> Once more: perfectly valid design.
Yes, that is also a valid design but it means you have to have yet another
table that you could have avoided simply by permitting a null in the
employee table. Hey, if you really want your tables to proliferate like
this, that's up to you.
The key point is that NULLs _do_ work and are a legitimate design decision.
You don't _have_ to use them but they can save you some work and reduce the
number of tables you need. (They can also be a bit more work on the
I took your remarks to mean that NULLs were always a bad idea and were a
symptom of bad design and I strongly disagree with that.
If you are simply saying that you don't like them and prefer to use
different designs to avoid them, then I don't have any problem with that.
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.2.2/280 - Release Date: 13/03/2006