List:MySQL++« Previous MessageNext Message »
From:Warren Young Date:May 5 2009 6:31am
Subject:Re: Why Does SSQLS populate_* Function use Inefficient operator[]?
View as plain text  
Chris Morgan wrote:
> 
> Why does the ssqls.pl script use the inefficient operator [] (const char*
> field) instead of operator [] (int i)?

All the usual dynamic vs. static typing reasons, really.

SSQLS did behave that way in v2 and earlier, but I changed it in v3.0.0 
because that ties a program too closely to the DB schema, making it 
brittle in the face of schema changes.

Example: Add a column to a table Foo, then run a program that queries 
"SELECT * FROM Foo" and places the result into a vector of Foo (an 
SSQLS) with Query::storein().  Your program asplode when it tries to 
assign a value to foo[n+1], because Foo contains only n fields.  You 
have to change the definition of Foo and recompile the program to make 
it stop dumping core.  There are other fun variants on this problem, if 
you think about it.

One option you had to cope with such problems in v2 was to change the 
SSQLS definition in your program(s) and rebuild it/them every time you 
change the DB schema.  That's fine in smaller systems where you can stop 
all the clients, change the schema, replace all the client binaries, and 
bring everything back up.  It's totally impractical in larger systems, 
where you cannot bring everything down at the same time.  The larger a 
system gets, the more likely you'll have to do incremental roll-outs.

v3 fixes this because operator= for SSQLS now skips a field assignment 
when there is no field with a given name.  You can get a tiny bit of 
this advantage with integer indexes if you test the index against the 
number of fields in the SSQLS each time, but that eats into your 
execution time advantage, and still doesn't give most the advantages of 
the lookup-by-name method.

Another v2 coping strategy was to change all "SELECT *" and similar 
constructs to "SELECT exact,set,of,fields,supported,by,the,ssqls", so 
things like storein() don't care if new columns are added.  But, it 
still doesn't protect against column deletions or renamings.  You still 
have to change and rebuild the program to track such changes.  You're 
not much better off than before, when you had to change the SSQLS 
definition and rebuild.  And, it means more SQL in your C++, and more 
verbose SQL at that, something MySQL++ is supposed to fix.

v3 allows you to keep loose queries like "SELECT *" because it won't try 
to populate fields that don't exist in the SSQLS, and fields that still 
exist in the SSQLS but don't exist in the DB will remain untouched.

A side benefit of the new behavior is that you no longer have to declare 
separate SSQLS types for each and every sub-query you have on a given 
table.  You can now say something like "SELECT foo,bar FROM qux", then 
store it in a qux instance, which has more than just those two fields. 
Inefficient, yes, but programmer time is often more expensive than 
computer time.  And as with everything in C++, it's just an option for 
you to choose.  You can still declare SSQLSes for each query subset, if 
you like.

Perhaps you're now thinking about chastising me about how C++ is a 
statically typed language, and how foolish it is that MySQL++ should 
make you trade the advantages of static typing for the advantages of 
dynamic typing.  You are using C++ for a reason; if you wanted dynamic 
typing, you know where to get it, no doubt.

Before you ask me to don my Nomex underwear, please reflect that static 
typing comes from a single-program viewpoint.  This viewpoint says that 
when a C++ compiler stops complaining about type issues, you have no 
more type issues.  You get no such assurance when what you're really 
building is a distributed system of programs, as happens when you start 
using a client-server database.  Now you have two separate type systems 
which can change independently, but which somehow have to be made 
compatible, and remain compatible as the system evolves.  This is 
exactly where dynamically typed systems come into their strength.

Since we had to invent a new type system in MySQL++, why not make it 
dynamic, since that's the right thing in a distributed system?  Once the 
data gets inside your C++ code, away from MySQL++, you can keep all the 
benefits of static types, where the downsides are not as severe.

> The Reference Manual (
> http://tangentsoft.org/mysql++/doc/html/refman/classmysqlpp_1_1Row.html#a23)
> specifically warns not to do that.  

No, what it says is that operator[](int) is faster, not that the const 
char* variant should not be used.  If there's a fast option and a slow 
option, the slow option should be removed unless there's at least one 
case where it can pull its own weight.  I gave two cases above, and 
there are more besides.

Maybe you still disagree.  Okay, you have two options, as I see it.

First, you can wait for MySQL++ v3.1, and then provide a patch to give 
SSQLS v2 an option to make it use index based assignments.

Second, you can trawl back through the patch history of ssqls.pl known 
as custom.pl in v2 and earlier) to find where I changed the way 
assignments work and revert just that one change.  The result should 
still work, although at least one of the examples (and thus dtest) will 
then fail, because that example shows off this feature.  I imagine you 
can live with that.

Now allow me to quiz you.

In all this benchmarking of yours, is this assignment overhead you're 
worried about even 1% of the time it takes just to retrieve rows from 
the DB?  I would think disk and IPC overheads would totally swamp the 
assignment time.  Benchmarks trump intuition, so maybe you can show me 
I'm wrong?

And if I do happen to be wrong about this, how much will it cost to let 
Moore's Law fix it instead of the programmers?  If you take the standard 
18 month doubling time, the law is:

	d = 547.5 * log2(p)

giving time in days to achieve a higher relative power per buck, p.

Say you have the worst-case overhead I expect, 1%.  Compute power, p, 
has to be 1.01 times the current level to offset that overhead, so d is 
a little under 8.  That means if the overhead is less than 1%, as I 
expect, you have but to wait a week or less and the problem solves itself.

How does programmer time compare to hardware cost in your organization? 
  Unless you're rolling this system out by the millions -- in which case 
the incremental rollout issues I brought up above should be biting you, 
changing the calculus -- it's probably a lot cheaper to just wait for 
the faster hardware.
Thread
Why Does SSQLS populate_* Function use Inefficient operator[]?Chris Morgan5 May
  • Re: Why Does SSQLS populate_* Function use Inefficient operator[]?Warren Young5 May
    • Re: Why Does SSQLS populate_* Function use Inefficient operator[]?Warren Young5 May
      • Re: Why Does SSQLS populate_* Function use Inefficient operator[]?Chris Morgan5 May
        • Re: Why Does SSQLS populate_* Function use Inefficient operator[]?Warren Young5 May
          • Re: Why Does SSQLS populate_* Function use Inefficient operator[]?Chris Morgan5 May