Sometimes partitioning is absolutely necessary. If you can't run a
cluster - how else can you really scale writes to the database? Some
companies can't use clustering because in 5.0.x (the "non-beta" release)
clustering is done entirely in memory - all tables have to fit in memory
(just like the old heap tables). It isn't until 5.1.x that clustering
allows your data to be stored on disk. Many companies still consider 5.1
not to be production ready. You might disagree, but that is their
thinking. So, if you don't use clustering, how else are you going to
scale an application?
I suppose you can set up master-master replication - but that doesn't
really scale writes, since every master still has to apply every write.
Some companies have huge applications with hundreds of gigabytes or even
terabytes of data. I think if you read carefully through the
presentations from the recent MySQL conference by companies such as Digg
and Flickr you will find that they do partitioning as well as caching
and such. I recall specifically reading through a presentation by
LiveJournal about how they split up their load across multiple machines
by the very partitioning we are talking about.
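
To make the idea concrete, here is a minimal sketch of key-based
partitioning in Python. It is only an illustration under my own
assumptions: the shard host names and the helpers shard_for and
group_by_shard are made up, and I am not claiming this is how
LiveJournal, Digg, or Flickr actually route their data.

```python
import hashlib

# Hypothetical shard hosts; a real setup would hold connection settings.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Map a user id to exactly one shard using a stable hash."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(user_ids, shards=SHARDS):
    """Bucket ids by shard so each server only sees its own writes."""
    batches = {}
    for uid in user_ids:
        batches.setdefault(shard_for(uid, shards), []).append(uid)
    return batches
```

Each write for a given user lands on one server only, which is what
spreads the write load. The catch with plain modulo hashing is that
adding a shard remaps most keys, so a lookup table from user to shard
is a common alternative.
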
I might be missing something. I can understand why you wouldn't want to
work on such a system, as it certainly adds complexity to the entire
database. But that doesn't mean that it isn't necessary sometimes.
Just my two cents :)
Keith
Naz Gassiep wrote:
> Data partitioning? Sorry, I disagree that partitioning a table into more
> and more servers is the way to scale properly. Perhaps putting
> databases' tables onto different servers with different hardware
> designed to meet different usage patterns is a good idea, but data
> partitioning was a very short-lived idea in the world of databases and
> I'm glad that as an idea it is dying in practice.
> - Naz
>
> Evaldas Imbrasas wrote:
>
>> Since the question was about *really* big websites, the answer is both
>> yes and no.
>>
>> Yes, they do turn off RI on the database side, simply because it's not
>> possible to enforce RI on a database system where data is partitioned
>> across server farms (or shards) both vertically and horizontally. And
>> really big websites can't survive without data partitioning.
>>
>> No, they don't usually turn off RI just to improve performance,
>> because the gains would be minimal, and for big websites, scalability
>> is a much bigger issue than performance (although sometimes one
>> depends on the other), and data partitioning is the way to go to solve
>> the scalability problem.
>>
>>
>> On 5/24/07, Naz Gassiep <naz@stripped> wrote:
>>
>>> I'm working on a project at the moment that is using MySQL, and
>>> people keep making assertions like this one:
>>>
>>> "*Really* big sites don't ever have referential integrity. Or if the
>>> few spots they do (like with financial transactions) it's implemented
>>> on the application level (via, say, optimistic locking), never the
>>> database level."
>>>
>>> A large DB working with no RI would give me nightmares. Is it really
>>> true that large sites turn RI off to improve performance? Am I just
>>> being naive in thinking that everyone runs their DBs with RI in
>>> production?
>>>
>>>
>>
>
>