List: General Discussion
From: B. Keith Murphy  Date: May 25 2007 5:10am
Subject: Re: Integrity on large sites
OK.  Going to try this again.  After reading through these emails I 
think I have learned a little more about the way you are thinking. 

I DO NOT want to start some kind of flame war. 

However, I disagree very strongly with what you are saying.  Yes, you 
are right, sharding does require more complexity from the application 
layer.  Sorry to all you developers out there (and I can safely say 
that I am NOT a developer!!). 

The fundamental issue for you, as I see it, is the increased complexity 
caused by sharding the application.

That being said, I will say this...if you develop on some other RDBMS 
such as MS or Oracle, is it possible to develop something like what you 
are describing, an all-inclusive database that isn't "sharded"?  Yep.  
When I worked at Netzero in 2001, for example, we had two database 
servers running Oracle, one on the east coast in Virginia and one on 
the west coast in California.  The east coast server was a backup of 
the west coast server.  So one database server did the billing for all of 
Netzero's customers.  Millions of customers..absolutely.  All in one 
nice tidy box that I am sure was easier to develop the billing 
applications around.

Here is the kicker.  Each box was a top of the line Sun server that had 
32 processors and 32 gigs of RAM.  They could handle up to 64 procs and 
64 gigs.  And each cost well over a million dollars for the hardware 
alone.  Running Oracle on it must have cost over 100,000 dollars for 
software licenses.  Granted, this was in 2001, but the licensing costs for 
Oracle haven't gone down any that I am aware of...and the hardware cost 
will still be quite steep to do this type of thing. 

So I ask you this..

Would it be better to go with that scenario or something like this:

Implement the billing application using MySQL.  Shard it.  Create 
complexity.  Your hardware cost savings alone will pay for multiple 
developers to handle any complexity increases.  Any decent DBA is going 
to be able to handle the multiple servers required to operate this setup.  
You will probably see a decrease in salary costs moving from Oracle to 
MySQL DBAs. 
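
To make the "complexity" concrete, here is a minimal sketch in Python of 
what shard-aware application code might look like.  The shard names, the 
shard_for helper and the ShardedBilling class are all made up for 
illustration, and plain in-memory dicts stand in for real MySQL 
connections.  It is one simple hash-based scheme, not how any particular 
site does it.

```python
import zlib

# Hypothetical shard names; placeholders, not real hosts.
SHARDS = ["billing-db-0", "billing-db-1", "billing-db-2", "billing-db-3"]


def shard_for(customer_id, shards=SHARDS):
    """Pick the shard that owns a customer by hashing the customer id."""
    # crc32 gives the same answer in every process, so all app servers agree.
    key = str(customer_id).encode("ascii")
    return shards[zlib.crc32(key) % len(shards)]


class ShardedBilling:
    """Toy stand-in: one dict per shard plays the role of one database server."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.stores = {name: {} for name in self.shards}

    def record_charge(self, customer_id, amount_cents):
        # Single-customer write: touches exactly one shard.
        store = self.stores[shard_for(customer_id, self.shards)]
        store[customer_id] = store.get(customer_id, 0) + amount_cents

    def balance(self, customer_id):
        # Single-customer read: also touches exactly one shard.
        store = self.stores[shard_for(customer_id, self.shards)]
        return store.get(customer_id, 0)

    def total_billed(self):
        # Cross-shard query: the application has to fan out and combine results.
        return sum(sum(store.values()) for store in self.stores.values())
```

The per-customer calls stay simple.  The extra work for the developers 
shows up in things like total_billed, where the application has to visit 
every shard itself, and in moving data around when the shard list changes 
(not shown here).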

So for the bottom line of the company it is an overall win by far.  It is 
only the inherent difficulty in moving complex systems from one type of 
DB to another that keeps more companies from switching.  Why hasn't this 
happened previously?  Because until version 4 of MySQL was stable there 
were many features not available in MySQL that were needed by these 
types of systems.

It is my contention that as the clustering capabilities of MySQL 
continue to grow and mature (think of when version 6.0 goes stable) 
companies will move to MySQL in droves.  THEN you have the ability to 
build a single "virtual" database (at least from the point of view of 
your application) that will scale simply and elegantly.  As I said in 
the previous email it is only that 5.1 is in beta that keeps this from 
being available now.  And many companies, such as Kaneva, are doing this 
right now. 

The only reason that companies like Digg and Flickr can exist and grow at 
such phenomenal rates is that they keep the cost of the development of 
the system to a minimum and the overhead of operating (licensing costs 
and hardware cost) down as low as possible.  In addition, of course, 
they need the ability to scale out very quickly.  Digg didn't get any 
significant funding until just recently.  And yet they epitomize the web 
2.0 companies.  They did it by both keeping their costs down and having 
the ability to grow quickly.  Couldn't have done it with Oracle or MS. 

Just my thoughts :)


Naz Gassiep wrote:
> Wow.
>     The problem with sharding I have is the large amount of code
> required in the app to make it work. IMHO the app should be agnostic to
> the underlying database system (by that I don't mean the DB in use such
> as MySQL or whatever or the schema, I mean the way the DB has been
> deployed) so that changes can be made to it without having to worry
> about impacting app code. This is one of my fundamental design imperatives.
>     Then again, I'm not a regular MySQL user so I don't know what is and
> is not the norm in the MySQL world.
> - Naz.
> Evaldas Imbrasas wrote:
>> You certainly have a right to disagree, but pretty much every
>> scalability talk at the MySQL conference a few weeks ago was focused
>> on data partitioning and sharding. And those talks were given by folks
>> working for some of the most popular (top 100) websites in the world.
>> It certainly looks like data partitioning is the way to go in the
>> MySQL world at this point, probably at least until production-ready
>> and feature-full MySQL Cluster is out. And even then a large percentage
>> of dotcom companies would use data partitioning instead since it can
>> be implemented on commodity hardware.
>> Once again, we're talking *really* big websites using MySQL (not
>> Oracle or SQL Server or whatever) here. Most websites won't ever need
>> to partition their production databases, and different RDBMSs might have
>> different approaches for scalability.
>> On 5/24/07, Naz Gassiep <naz@stripped> wrote:
>>> Data partitioning? Sorry, I disagree that partitioning a table into more
>>> and more servers is the way to scale properly. Perhaps putting
>>> databases' tables onto different servers with different hardware
>>> designed to meet different usage patterns is a good idea, but data
>>> partitioning was a very short lived idea in the world of databases and
>>> I'm glad that as an idea it is dying in practice.
