> On Friday 13 March 2009 09:48:36 Michael wrote:
>> I'm sorry for the rant, but nulls in databases make me see red! NULLS
>> ARE GARBAGE!
>> You are using a 'NULL-safe equal to operator', meaning, WHERE NOT
>> group_id = 3, and oh, by the way, I DON'T WANT ANY NULLS TO QUALIFY AS
>> And yes this will be much slower, because now we can't use the index on
>> Null is not an initial value, it is really saying that nothing has ever
>> been put into this field, item, or what-ever. It is uninitialized and
>> no one can say exactly what is in it, this is why it is marked as null.
>> Using the null-safe equal to operator seems wrong, of course too me
>> having any nulls in your data is wrong, or just plain dirty data, and I
>> wouldn't put any faith into the results from a database that contains
>> nulls, because the output is unpredictable if it is not very carefully
>> You (everyone in the SQL world) would be better off using the IS NULL
>> and IS NOT NULL operators and the IFNULL() function to find all null
>> values in their data, and initialize them to the appropriate initial
>> value. This may require discussions with the application designers or
>> project managers. Whom ever is responsible for these null values being
>> your data all the frecking time? It is people that don't really
>> understand what a null value is, and who also have the responsibility
>> of designing a database application. So people like me who write the
>> code need to add 'IS NOT NULL' to every conditional statement in our
>> What you want to do <noted in your email below> should be very simple,
>> but only if you have "CLEAN DATA". I would clean or what we call scrub
>> your data first, then you can execute a simple and very fast select
>> statement. Use a temporary table if you need to, just don't try to
>> process dirty data, the end results will only make you look bad to
>> anyone looking at the results.
>> My $0.02,
> This is news to me, and I'd like to understand, so please don't take this
> as a
> flame or anything.
> I think I understand this part: When designing the database structure, you
> don't want just one table with a million columns, most of which are empty
> any given row. What you do is break it up into several tables, with a
> relationship between the tables. Thus the term "Relational database
> management system" (RDBMS)
> theoretically you break up your data into tables so that there is never a
> applicable field for any given row. In the real world, this can lead to a
> of complexity, so many people usually cheat and have a few "does not
> fields for some records.
> Now, what I've been doing is using NULL for "does not apply" as it seems
> to be
> more true to the data model than using some sort of fake data.
> I freely admit that Nulls increase the complexity of the code, but isn't
> I would be interested in your explanation.
I agree with most everything you said, as far as the use of a RDBMS, and
especially: quoting you,
"theoretically you break up your data into tables so that there is never a
non-applicable field for any given row."
Here is where I disagree with you, and please DON'T take this as an insult
(we're just haggling over good/bad practices ), but I think it is lazy
database design when you say:
"In the real world, this can lead to a lot of complexity, so many people
usually cheat and have a few "does not apply" fields for some records."
OK! Now here is my top 5 explanations why using Nulls is a bad idea:
Explanation(5): The more you understand how the database is to be used,
and the more complexity and thought you put into your database design, the
less complex it will be to retrieve reliable information out of it.
Furthermore, (and this is probably what makes me crazy when Nulls are
evolved) after a ten year stretch of software development, where I and a
team designed our own databases, I did a nine year stretch of statistical
programming, using databases designed by other people, and Nulls in the
data made the results unpredictable, and yeah, made me crazy! I had to
write nightly processes to resolve inconsistencies in the data, if at
least report inconsistencies. You know the old saying "Garbage in =
Garbage out", to me Nulls are garbage, and if there is a good reason for
nulls to be a part of good clean data then someone please help me
Explanation(4):I think that the brilliant computer scientists who
developed the RDBMS database model, didn't want to support nulls
initially, but they had to because null is a form of data, it means "We
don't know what this is", and so it must be stored knowing that Null
should NOT be used as valid information.
Explanation(3):I think it was back in the sixties at Berkley, when some
smart guy on LSD found a use for the Null value, and now this use of Null
is taught in Universities all over the world, and considered to be a
blessed method to practice.
Explanation(2): Using a Null column, and you have an index on that column,
the index will not be used since NULL values do not get stored in an index
and that is why the optimizer will need to go to the data pages in order
to retrieve more information to suffice the requirements of the query.
Explanation(1): I'm just in a argumentative mood today.
Cheers and have a great weekend!