OK. Going to try this again. After reading through these emails I
think I have learned a little more about the way you are thinking.
I DO NOT want to start some kind of flame war.
However, I disagree very strongly with what you are saying. Yes, you
are right, sharding does require more complexity from the application
layer. Sorry to all you developers out there (and I can safely say
that I am NOT a developer!!).
The fundamental issue for you, as I see it, is the increased complexity
caused by sharding the application.
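To make that added complexity concrete, here is a minimal sketch of the kind of routing code a sharded application ends up carrying. It assumes a hypothetical billing app that shards by customer ID; the host names and the function names (shard_for, group_by_shard) are made up for illustration and are not taken from any real system.

```python
import hashlib

# Hypothetical shard list; these host names are invented for illustration.
SHARDS = ["billing-db-0", "billing-db-1", "billing-db-2", "billing-db-3"]


def shard_for(customer_id, shards=SHARDS):
    """Pick the shard that holds a customer's rows using a stable hash."""
    # md5 is used only to spread IDs evenly, not for security.
    digest = hashlib.md5(str(customer_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(customer_ids, shards=SHARDS):
    """Group customer IDs by shard so a multi-customer query can fan out."""
    groups = {}
    for cid in customer_ids:
        groups.setdefault(shard_for(cid, shards), []).append(cid)
    return groups
```

Every query now has to go through something like shard_for first, and anything that spans customers has to be split up per server and merged in the application. Note also that with a plain modulo scheme like this one, adding a server moves most customers to a different shard, which is part of the complexity being argued about.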
That being said, I will say this: if you develop on some other RDBMS
such as MS or Oracle, is it possible to develop something like you are
describing, an all-inclusive database that isn't "sharded"? Yep. When I
worked at Netzero in 2001, for example, we had two database servers
running Oracle, one on the east coast in Virginia and one on the west
coast in California. The east coast server was a backup of the west
coast server. So one database server did the billing for all of
Netzero's customers. Millions of customers? Absolutely. All in one
nice tidy box that I am sure was easier to develop the billing
Here is the kicker. Each box was a top of the line Sun server that had
32 processors and 32 gigs of RAM. They could handle up to 64 procs and
64 gigs. And each cost well over a million dollars for the hardware
alone. Running Oracle on it must have cost over 100,000 dollars for
software licenses. Granted, this was in 2001, but the licensing costs for
Oracle haven't gone down any that I am aware of, and the hardware cost
will still be quite steep to do this type of thing.
So I ask you this:
Would it be better to go with that scenario or something like this:
Implement the billing application using MySQL. Shard it. Create
complexity. Your hardware cost saving alone will pay for multiple
developers to handle any complexity increases. Any decent DBA is going
to be able to handle multiple servers required to operate this setup.
You will probably see a decrease in salary cost moving from Oracle to
So for the bottom line of the company it is an overall win by far. It is
only the inherent difficulty in moving complex systems from one type of
DB to another that keeps more companies from switching. Why hasn't this
happened previously? Because until version 4 of MySQL was stable there
were many features not available in MySQL that were needed by these
types of systems.
It is my contention that as the clustering capabilities of MySQL
continue to grow and mature (think of when version 6.0 goes stable)
companies will move to MySQL in droves. THEN you have the ability to
build a single "virtual" database (at least from the point of view of
your application) that will scale simply and elegantly. As I said in
the previous email it is only that 5.1 is in beta that keeps this from
being available now. And many companies, such as Kaneva, are doing this
The only reason that companies like Digg and Flickr can exist and grow at
such phenomenal rates is that they keep the cost of the development of
the system to a minimum and the overhead of operating (licensing costs
and hardware cost) down as low as possible. In addition, of course,
they need the ability to scale out very quickly. Digg didn't get any
significant funding until just recently. And yet they epitomize the web
2.0 companies. They did it by both keeping their cost down and having
the ability to grow quickly. Couldn't have done it with Oracle or MS.
Just my thoughts :)
Naz Gassiep wrote:
> The problem with sharding I have is the large amount of code
> required in the app to make it work. IMHO the app should be agnostic to
> the underlying database system (by that I don't mean the DB in use such
> as MySQL or whatever or the schema, I mean the way the DB has been
> deployed) so that changes can be made to it without having to worry
> about impacting app code. This is one of my fundamental design imperatives.
> Then again, I'm not a regular MySQL user so I don't know what is and
> is not the norm in the MySQL world.
> - Naz.
> Evaldas Imbrasas wrote:
>> You certainly have a right to disagree, but pretty much every
>> scalability talk at the MySQL conference a few weeks ago was focused
>> on data partitioning and sharding. And those talks were given by folks
>> working for some of the most popular (top 100) websites in the world.
>> It certainly looks like data partitioning is the way to go in the
>> MySQL world at this point, probably at least until production-ready
>> and feature-full MySQL Cluster is out. And even then a large percentage
>> of dotcom companies would use data partitioning instead since it can
>> be implemented on commodity hardware.
>> Once again, we're talking *really* big websites using MySQL (not
>> Oracle or SQL Server or whatever) here. Most websites won't ever need
>> to partition their production databases, and different RDBMSs might have
>> different approaches for scalability.
>> On 5/24/07, Naz Gassiep <naz@stripped> wrote:
>>> Data partitioning? Sorry, I disagree that partitioning a table into more
>>> and more servers is the way to scale properly. Perhaps putting
>>> databases' tables onto different servers with different hardware
>>> designed to meet different usage patterns is a good idea, but data
>>> partitioning was a very short lived idea in the world of databases and
>>> I'm glad that as an idea it is dying in practice.