Chris Morgan wrote:
>
> Why does the ssqls.pl script use the inefficient operator [] (const char*
> field) instead of operator [] (int i)?
All the usual dynamic vs. static typing reasons, really.
SSQLS did behave that way in v2 and earlier, but I changed it in v3.0.0
because that ties a program too closely to the DB schema, making it
brittle in the face of schema changes.
Example: Add a column to a table Foo, then run a program that queries
"SELECT * FROM Foo" and places the result into a vector of Foo (an
SSQLS) with Query::storein(). Your program asplode when it tries to
assign a value to foo[n+1], because Foo contains only n fields. You
have to change the definition of Foo and recompile the program to make
it stop dumping core. There are other fun variants on this problem, if
you think about it.
One option you had to cope with such problems in v2 was to change the
SSQLS definition in your program(s) and rebuild it/them every time you
change the DB schema. That's fine in smaller systems where you can stop
all the clients, change the schema, replace all the client binaries, and
bring everything back up. It's totally impractical in larger systems,
where you cannot bring everything down at the same time. The larger a
system gets, the more likely you'll have to do incremental roll-outs.
v3 fixes this because operator= for SSQLS now skips a field assignment
when there is no field with a given name. You can get a tiny bit of
this advantage with integer indexes if you test the index against the
number of fields in the SSQLS each time, but that eats into your
execution time advantage, and still doesn't give most the advantages of
the lookup-by-name method.
Another v2 coping strategy was to change all "SELECT *" and similar
constructs to "SELECT exact,set,of,fields,supported,by,the,ssqls", so
things like storein() don't care if new columns are added. But, it
still doesn't protect against column deletions or renamings. You still
have to change and rebuild the program to track such changes. You're
not much better off than before, when you had to change the SSQLS
definition and rebuild. And, it means more SQL in your C++, and more
verbose SQL at that, something MySQL++ is supposed to fix.
v3 allows you to keep loose queries like "SELECT *" because it won't try
to populate fields that don't exist in the SSQLS, and fields that still
exist in the SSQLS but don't exist in the DB will remain untouched.
A side benefit of the new behavior is that you no longer have to declare
separate SSQLS types for each and every sub-query you have on a given
table. You can now say something like "SELECT foo,bar FROM qux", then
store it in a qux instance, which has more than just those two fields.
Inefficient, yes, but programmer time is often more expensive than
computer time. And as with everything in C++, it's just an option for
you to choose. You can still declare SSQLSes for each query subset, if
you like.
Perhaps you're now thinking about chastising me about how C++ is a
statically typed language, and how foolish it is that MySQL++ should
make you trade the advantages of static typing for the advantages of
dynamic typing. You are using C++ for a reason; if you wanted dynamic
typing, you know where to get it, no doubt.
Before you ask me to don my Nomex underwear, please reflect that static
typing comes from a single-program viewpoint. This viewpoint says that
when a C++ compiler stops complaining about type issues, you have no
more type issues. You get no such assurance when what you're really
building is a distributed system of programs, as happens when you start
using a client-server database. Now you have two separate type systems
which can change independently, but which somehow have to be made
compatible, and remain compatible as the system evolves. This is
exactly where dynamically typed systems come into their strength.
Since we had to invent a new type system in MySQL++, why not make it
dynamic, since that's the right thing in a distributed system? Once the
data gets inside your C++ code, away from MySQL++, you can keep all the
benefits of static types, where the downsides are not as severe.
> The Reference Manual (
> http://tangentsoft.org/mysql++/doc/html/refman/classmysqlpp_1_1Row.html#a23)
> specifically warns not to do that.
No, what it says is that operator[](int) is faster, not that the const
char* variant should not be used. If there's a fast option and a slow
option, the slow option should be removed unless there's at least one
case where it can pull its own weight. I gave two cases above, and
there are more besides.
Maybe you still disagree. Okay, you have two options, as I see it.
First, you can wait for MySQL++ v3.1, and then provide a patch to give
SSQLS v2 an option to make it use index based assignments.
Second, you can trawl back through the patch history of ssqls.pl known
as custom.pl in v2 and earlier) to find where I changed the way
assignments work and revert just that one change. The result should
still work, although at least one of the examples (and thus dtest) will
then fail, because that example shows off this feature. I imagine you
can live with that.
Now allow me to quiz you.
In all this benchmarking of yours, is this assignment overhead you're
worried about even 1% of the time it takes just to retrieve rows from
the DB? I would think disk and IPC overheads would totally swamp the
assignment time. Benchmarks trump intuition, so maybe you can show me
I'm wrong?
And if I do happen to be wrong about this, how much will it cost to let
Moore's Law fix it instead of the programmers? If you take the standard
18 month doubling time, the law is:
d = 547.5 * log2(p)
giving time in days to achieve a higher relative power per buck, p.
Say you have the worst-case overhead I expect, 1%. Compute power, p,
has to be 1.01 times the current level to offset that overhead, so d is
a little under 8. That means if the overhead is less than 1%, as I
expect, you have but to wait a week or less and the problem solves itself.
How does programmer time compare to hardware cost in your organization?
Unless you're rolling this system out by the millions -- in which case
the incremental rollout issues I brought up above should be biting you,
changing the calculus -- it's probably a lot cheaper to just wait for
the faster hardware.