* Rajesh Kumar
> Roger Baklund unknowingly asked us:
> >>What would be a good way to deal with the following...
No, I did not. Scott Haneda asked the question. I replied. :) And I
recommended considering the SET column.
> But for a couple of reasons, SET is not recommend.
> 1. It introduces fragmentation, and indexes are pretty hard.
> 2. It defies the universal rule of a normalised table:
No. This is not an universal rule, and the SET column type _should_ be used
when it is the best choice. IMO.
I suppose the rule you are referring to is the first normal form (1NF).
> You cannot, no matter what, and should not, store more than one
Either you can but shouldn't, or you can't... ;)
> value in a single cell of a database table.
And how would you store a string? A single char per row? ;)
This would depend on how you define "one value". It is usefull to define a
string as "one value", and not as "a list of characters", because the latter
would be in conflict with 1NF. Another example is names: you would often
store "Baklund, Roger" in a single column, even if it is (at least) two
values: family name and given name. Similarly, a product number "X-23/b4"
may contain "multiple values", but it is still usefull to store it in a
single column, and define it as the single value "product#". Maybe the "X-"
prefix means this product is in some special category, you would still keep
a 'prod_cat' column, thus "breaking the rules" of redundancy. This is
normal, most databases of some size have such redundancies, it won't prevent
you from taking advantage of normalization, and you would still call the
database "normalized". Normalization is not an "exact science" in the real
world, you have to use what works best.
If the five checkboxes are static, they will never change, then I see no
reason not to use a SET column, unless, of course, he needs to index on
this. And I don't think he needs to, because he said what he needed was
statistics for the whole table, thus he needs to read the entire table
To stay within 1NF you can simply define a new term "checkboxvalue", which
is an integer between 0 and 31, representing all possible combinations of
the five checkboxes. This integer can be stored in a SET column, which also
gives you a nice "string interface" to the bit manipulation arithmetic you
otherwise would need to perform. And you get both: you can still use bit
arithmetic if you like.
You _can_ use an index on a SET column, but only for exact matches and
ranges, not when searching for "anyone who have checked 'car'". An exception
to this is the last item in the set, because that is assigned the highest
value. In the example in this thread, 'beetle' is the last value,
representing 16, so an indexed range check could be used to find anyone who
have or have not checked 'beetle': "WHERE choiceflag >= 16" or "WHERE
choiceflag < 16".
> Even if you did, it is going to be really hard later on to change the
> names of the SETS (this would contradict with the user's choice), and is
> going to be still harder to add another choice to your list.
Well, ALTER TABLE is not very hard... but not needing to is even easier, of
course. :) He said the list would not change.
> The best I would recommend, though it takes up more space in the table,
> is to have a separate column in another table with a one-to-one join,
> and each column as each choice, and each column with the ENUM type of
one-to-one? why not five columns in the original table? And why not go all
the way with two new tables:
choices (choiceid tinyint,choice varchar(255))
user_choices (userid int,choiceid tinyint)
> Think before you choose, for you don't want to be sorry later.
That is allways a good advice. :)