They'd all be identical DBs/schemas. And if we updated schema on one,
we'd update all. We're talking like 100-200 operators total (they are
airlines). They just have TONS of data each.
So one of my questions that I was hoping to hear answered (and maybe I
missed it) is this:
Are mySQL reports going to be faster in tables that are exclusive to a
single operator? In other words, if we have 100 operators data in a
single table (let's say millions of rows PER operator), are those
queries any slower (assuming proper indexes and an operator ID and all
that business). Or is the nature of B-Trees such that it shouldn't
matter either way? Logic would dictate that the RDBMS has to sift
through all the records that are NOT for a given operator (of which 99%
of the data would not be, correct?), so that seems like the hard drive
has to jump over gobs and blobs of INT/VARCHAR/TEXT/etc. that it doesn't
need. OR, again, is the filesystem/RDMBS/innodb file structured in such
a way that the index knows which inode to pop to instantly for each row,
so irrelevant rows are negligible hits to seek/read times?
And related, are there any limitations to the number of databases we can
use in a join for internal reports (or perhaps the length of a single
SQL statement). Obviously we would construct a query in PHP or Python or
something to assemble all the operator 1 ... 200 database names, so it's
not like we'd hand-craft that. But it could get a fairly lengthy
string/SQL/query to pass off to mySQL. For a simple example, perhaps we
want to know how many 'foo' there are in a given table per Operator, and
then a total of all 'foo'. (and yes this contrived example could be done
in a loop and tally, but you get the concept).
From: Arthur Fuller <fuller.artful@stripped>
To: Daevid Vincent <daevid@stripped>
Cc: mysql <mysql@stripped>
Subject: Re: Separate customer databases vs all in one
Date: Tue, 17 Mar 2009 21:17:00 -0400
Are these databases identical or merely similar? If they are structurally
identical, I'd go for one database per customer. Then you have isolation,
easy structure updates and above all, consistent front-end code, in
whatever language that occurs. Just obtain the customer ID and then use the
appropriate database. Everything else can remain the same.
The only fly in the ointment concerns whether you'd ever have the customer
need to cross databases. I would imagine that sort of thing is for internal
use, not the customers. In that case, the performance hit if any won't
impact upon the customer, just you.
On Tue, Mar 17, 2009 at 8:21 PM, Daevid Vincent <daevid@stripped> wrote:
> I'm writing a report tool wherein we have many customers who subscribe
> to this SaaS. There are millions of rows of data per customer. All
> customers are islands from each other (of course).
> Are there any major issues or benefits between storing each customer in
> their own database (with their own tables), or all lumped into a single
> At first thought, it seems that by separating them, queries should be
> faster no (as there is less data to sift though per customer)? It of
> course makes upgrading table schema a wee bit more cumbersome, but a
> simple loop and script can handle that easily enough. And since you can
> query across databases, we can still make internal aggregate reports for
> our own usage.
> For example: SELECT * FROM customerA.foo.bar JOIN customerB.foo.bar; or
> we can use UNIONS etc. too.
> Consolidating them into one would seem to bloat the tables and slow
> things down (or is the fact that mySQL uses B-Trees invalidate that
> theory)? It also makes us have to have a customer_id entry in every
> table basically (or some FK to distinguish who's data is who's). It also
> feels like it could leak data if a malformed query were to get through,
> although I'm not terribly worried about this as we do some heavy UAT
> before pushing from DEV to TEST to PROD.
> Performance is a major factor concern here given our huge data sets
> involved. Does joining across databases impose any speed/performance
> hits vs. just joining across tables within a single database?