List: General Discussion
From: B. Keith Murphy
Date: May 25 2007 1:43am
Subject: Re: Integrity on large sites

Sometimes partitioning is absolutely necessary.  If you can't run a 
cluster, how else can you really scale writes to the database?  Some 
companies can't use clustering because in 5.0.x (the "non-beta" release) 
clustering is done entirely in memory: all tables have to fit in memory 
(just like the old HEAP tables).  It isn't until 5.1.x that clustering 
allows your data to be stored on disk.  Many companies still consider 
5.1 not to be production ready.  You might disagree, but that is their 
thinking.  So, if you don't use clustering, how else are you going to 
scale an application? 

I suppose you can set up master-master replication, but that doesn't 
really scale very far, since every master still has to apply every 
write.  Some companies have huge applications with hundreds of 
gigabytes or even terabytes of data.  I think if you read carefully 
through the presentations from the recent MySQL conference by companies 
such as Digg and Flickr, you will find that they do partitioning as 
well as caching and such.  I specifically recall reading through a 
presentation by LiveJournal about how they split up their load across 
multiple machines by the very partitioning we are talking about.
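
To make it concrete, here is a rough sketch of the kind of key-based 
routing I mean.  It is only an illustration: the host names and the 
shard_for() helper are made up, and I am using a simple hash of the 
user id, which is not necessarily how any of the sites above do it 
(some keep a lookup table of user to server instead).

```python
import hashlib

# Hypothetical shard list; these host names are placeholders, not real servers.
SHARDS = [
    "db0.example.com",
    "db1.example.com",
    "db2.example.com",
    "db3.example.com",
]


def shard_for(user_id, shards=SHARDS):
    """Map a user id to one shard using a stable hash of the key."""
    # md5 is used only to spread keys evenly, not for security.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# All rows belonging to one user are read from and written to the same server.
target = shard_for(42)
```

Each server then only sees the writes for its own slice of the users, 
which is the whole point.  The catch with plain modulo hashing is that 
adding a server remaps most keys, so it is a starting point rather than 
a finished design.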

I might be missing something.  I can understand why you wouldn't want to 
work on such a system, as it certainly adds complexity to the entire 
database.  But that doesn't mean that it isn't necessary sometimes.

Just my two cents  :)


Naz Gassiep wrote:
> Data partitioning? Sorry, I disagree that partitioning a table into more
> and more servers is the way to scale properly. Perhaps putting
> databases' tables onto different servers with different hardware
> designed to meet different usage patterns is a good idea, but data
> partitioning was a very short lived idea in the world of databases and
> I'm glad that as an idea it is dying in practice.
> - Naz
> Evaldas Imbrasas wrote:
>> Since the question was about *really* big websites, the answer is both
>> yes and no.
>> Yes, they do turn off RI on the database side, simply because it's not
>> possible to enforce RI on a database system where data is partitioned
>> across server farms (or shards) both vertically and horizontally. And
>> really big websites can't survive without data partitioning.
>> No, they don't usually turn off RI just to improve performance,
>> because the gains would be minimal, and for big websites, scalability
>> is a much bigger issue than performance (although sometimes one
>> depends on the other), and data partitioning is the way to go to solve
>> the scalability problem.
>> On 5/24/07, Naz Gassiep <naz@stripped> wrote:
>>> I'm working on a project at the moment that is using MySQL, and
>>> people keep making assertions like this one:
>>> "*Really* big sites don't ever have referential integrity. Or if the
>>> few spots they do (like with financial transactions) it's implemented
>>> on the application level (via, say, optimistic locking), never the
>>> database level."
>>> A large DB working with no RI would give me nightmares. Is it really
>>> true that large sites turn RI off to improve performance? Am I just
>>> being naive in thinking that everyone runs their DBs with RI in
>>> production?
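
For what it's worth, this is roughly what "RI at the application 
level" looks like once parent and child rows live on different 
servers.  It is a sketch under assumptions: two in-memory SQLite 
databases stand in for two MySQL shards, and the table names and the 
add_order() helper are made up for illustration.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two separate shards.
# (Assumption for illustration; real shards would be separate MySQL servers.)
users_shard = sqlite3.connect(":memory:")
orders_shard = sqlite3.connect(":memory:")

users_shard.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
orders_shard.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)


def add_order(user_id, item):
    """Insert an order only if the parent user row exists on the other shard."""
    row = users_shard.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise ValueError("no such user: %d" % user_id)
    # Not atomic across shards: the user could be deleted between the
    # check above and the insert below. A database-level FK would prevent that.
    with orders_shard:
        orders_shard.execute(
            "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, item)
        )


with users_shard:
    users_shard.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

add_order(1, "book")  # parent exists on the users shard, so this succeeds
```

The gap between the check and the insert is exactly what a foreign key 
inside one database would close for you, which is why people running 
this way usually add periodic cleanup jobs or versioned rows on top.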
