List: General Discussion
From: Naz Gassiep  Date: May 25 2007 3:50pm
Subject:Re: Integrity on large sites
Hey there, thanks for your comments. There are cases where sharding may
be appropriate, but you are talking about the heaviest of heavy-duty
loads. Not only that, hardware is getting to the point where it is
surpassing our needs. Remember the days when it cost $200k to run a
library database? Nowadays I could run such a DB on the old laptop I
just threw out.

The issue is not *only* application complexity, but that is a *major*
one, and ignoring it is not just a matter of budget allocation; it's the
risk that the complexity hides system-collapsing bugs. OneTel, a
multi-billion-dollar telco in Australia that I worked for in 2001 (as a
lemming), died partly because its billing system simply fell over one
day, bringing its cash flow to a dead stop. The thing was so complex
that debugging it took longer than the company's cash reserves lasted,
so it went belly up and died a gruesome death.

I know that monolithic DBs are not manageable beyond a certain point, but
sharding, in my book, is to be avoided wherever possible because far
better solutions are available, e.g., using tablespaces to put each table
on its own server. How many companies can say that one of their tables
is so large that no single machine can hold it? This approach, database
partitioning rather than data partitioning, lets you design the hardware
for each table's access patterns. The other advantage of this method is
that an application coded against a single-machine DB can be scaled to
this solution without changing a single line of app code.
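A minimal sketch of the table-per-server idea, assuming a thin routing layer sitting below the application. The hostnames, the `TABLE_PLACEMENT` map, and the `server_for_table` / `route_query` helpers are all hypothetical; in the setup described above the placement would be handled on the database side rather than in Python. What it shows: the SQL the application sends is identical to the single-server case, and only the placement map knows which machine holds which table.

```python
from __future__ import annotations

# Hypothetical placement map: which server hosts each table.
# Each table can sit on hardware suited to its own access pattern.
TABLE_PLACEMENT: dict[str, str] = {
    "customers": "db-oltp-1.example.internal",
    "invoices": "db-oltp-1.example.internal",
    "call_records": "db-bulk-1.example.internal",  # large, append-heavy
    "audit_log": "db-archive-1.example.internal",  # rarely read
}

# Tables not listed above stay on the original single server.
DEFAULT_SERVER = "db-main.example.internal"


def server_for_table(table: str) -> str:
    """Return the host that holds `table`, falling back to the default."""
    return TABLE_PLACEMENT.get(table, DEFAULT_SERVER)


def route_query(table: str, sql: str) -> tuple[str, str]:
    """Pair an unchanged SQL statement with the server it should run on."""
    return server_for_table(table), sql


if __name__ == "__main__":
    host, sql = route_query("call_records", "SELECT COUNT(*) FROM call_records")
    print(host, sql)
```

Moving `audit_log` to another box then means editing one map entry, not any of the queries.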

Incidentally, I come from the PostgreSQL world where if you truly *must*
do data sharding, it can be done at the DB level, transparently to the
app code.
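To make "transparent to the app code" concrete for the sharded case, here is a hypothetical Python facade. It is not how PostgreSQL does it: `ShardedCustomerStore`, the in-memory dicts standing in for separate servers, and the crc32-modulo placement scheme are assumptions chosen for illustration. Callers look a customer up by id and never learn which shard holds the row.

```python
from __future__ import annotations

import zlib


class ShardedCustomerStore:
    """Hypothetical facade that hides shard placement from callers.

    Each dict in `_shards` stands in for one database server; rows are
    placed by hashing the customer id (the shard key).
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self._shards: list[dict[int, dict]] = [{} for _ in range(num_shards)]

    def _shard_index(self, customer_id: int) -> int:
        # crc32 gives the same placement in every process; Python's
        # built-in hash() is randomized for str keys between runs.
        return zlib.crc32(str(customer_id).encode()) % len(self._shards)

    def put(self, customer_id: int, row: dict) -> None:
        self._shards[self._shard_index(customer_id)][customer_id] = row

    def get(self, customer_id: int) -> dict | None:
        # Keyed lookups touch exactly one shard.
        return self._shards[self._shard_index(customer_id)].get(customer_id)

    def count_all(self) -> int:
        # Queries not keyed by customer id must visit every shard
        # and merge the partial results.
        return sum(len(shard) for shard in self._shards)


if __name__ == "__main__":
    store = ShardedCustomerStore(num_shards=4)
    for cid in range(10):
        store.put(cid, {"id": cid})
    print(store.get(7), store.count_all())
```

The catch is visible in `count_all`: anything not keyed by customer id has to fan out to all shards, and with a plain modulo scheme, changing `num_shards` moves most rows to a different shard.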

- Naz.

B. Keith Murphy wrote:
> OK.  Going to try this again.  After reading through these emails I
> think I have learned a little more about the way you are thinking.
> I DO NOT want to start some kind of flame war.
> However, I disagree very strongly with what you are saying.  Yes, you
> are right, sharding does require more complexity from the application
> layer.  Sorry for all you developers out there (and I can safely say
> that I am NOT a developer!!).
> The fundamental issue for you, as I see it, is the increased
> complexity caused by sharding the application.
> That being said, I will say this: if you develop on some other RDBMS
> such as MS or Oracle, is it possible to develop something like your
> all-inclusive database that isn't "sharded"?  Yep.  When I
> worked at Netzero in 2001, for example, we had two database servers
> running Oracle, one on the east coast in Virginia and one on the west
> coast in California.  The east coast server was a backup of the west
> coast server.  So one database server did the billing for all of
> Netzero's customers.  Millions of customers?  Absolutely.  All in one
> nice tidy box that I am sure was easier to develop the billing
> applications around.
> Here is the kicker.  Each box was a top-of-the-line Sun server that
> had 32 processors and 32 gigs of RAM.  They could handle up to 64
> procs and 64 gigs.  And each cost well over a million dollars for the
> hardware alone.  Running Oracle on it must have cost over 100,000
> dollars in software licenses.  Granted, this was in 2001, but the
> licensing costs for Oracle haven't gone down any that I am aware
> of... and the hardware cost would still be quite steep for this type
> of thing.
> So I ask you this:
> Would it be better to go with that scenario or something like this?
> Implement the billing application using MySQL.  Shard it.  Create
> complexity.  Your hardware cost savings alone will pay for multiple
> developers to handle any increase in complexity.  Any decent DBA is
> going to be able to handle the multiple servers required to operate this
> setup.  You will probably see a decrease in salary costs moving from
> Oracle to MySQL DBAs.
> So for the bottom line of the company it is an overall win by far.  It
> is only the inherent difficulty in moving complex systems from one
> type of DB to another that keeps more companies from switching.  Why
> hasn't this happened previously?  Because until version 4 of MySQL was
> stable, many features needed by these types of systems were not
> available in MySQL.
> It is my contention that as the clustering capabilities of MySQL
> continue to grow and mature (think of when version 6.0 goes stable)
> companies will move to MySQL in droves.  THEN you have the ability to
> build a single "virtual" database (at least from the point of view of
> your application) that will scale simply and elegantly.  As I said in
> the previous email, the only thing keeping this from being available
> now is that 5.1 is still in beta.  And many companies, such as Kaneva,
> are doing this right now.
> The only reason that companies like Digg and Flickr can exist and grow
> at such phenomenal rates is that they keep the cost of developing
> the system to a minimum and the overhead of operating it (licensing
> and hardware costs) as low as possible.  In addition, of
> course, they need the ability to scale out very quickly.  Digg didn't
> get any significant funding until just recently.  And yet they
> epitomize the web 2.0 companies.  They did it by keeping their
> costs down and being able to grow quickly.  Couldn't have done
> it with Oracle or MS.
> Just my thoughts :)
> Keith
> Naz Gassiep wrote:
>> Wow.
>>     The problem with sharding I have is the large amount of code
>> required in the app to make it work. IMHO the app should be agnostic to
>> the underlying database system (by that I don't mean the DB in use, such
>> as MySQL or whatever, or the schema; I mean the way the DB has been
>> deployed) so that changes can be made to it without having to worry
>> about impacting app code. This is one of my fundamental design
>> imperatives.
>>     Then again, I'm not a regular MySQL user so I don't know what is and
>> is not the norm in the MySQL world.
>> - Naz.
>> Evaldas Imbrasas wrote:
>>> You certainly have a right to disagree, but pretty much every
>>> scalability talk at the MySQL conference a few weeks ago was focused
>>> on data partitioning and sharding. And those talks were given by folks
>>> working for some of the most popular (top 100) websites in the world.
>>> It certainly looks like data partitioning is the way to go in the
>>> MySQL world at this point, probably at least until production-ready
>>> and feature-full MySQL Cluster is out. And even then a large percentage
>>> of dotcom companies would use data partitioning instead since it can
>>> be implemented on commodity hardware.
>>> Once again, we're talking *really* big websites using MySQL (not
>>> Oracle or SQL Server or whatever) here. Most websites won't ever need
>>> to partition their production databases, and different RDBMSs might have
>>> different approaches for scalability.
>>> On 5/24/07, Naz Gassiep <naz@stripped> wrote:
>>>> Data partitioning? Sorry, I disagree that partitioning a table across
>>>> more and more servers is the way to scale properly. Perhaps putting
>>>> databases' tables onto different servers with different hardware
>>>> designed to meet different usage patterns is a good idea, but data
>>>> partitioning was a very short-lived idea in the world of databases, and
>>>> I'm glad that as an idea it is dying in practice.