----- Original Message -----
> From: "Manivannan S." <manivannan_s@stripped>
>
> Hi all,
>
> [lots of data]
> [slow reports]
> [wooo NoSQL magic]
Not that I want to discourage you, but my standard first question is "why do you think
NoSQL (let alone any specific product) is the right solution?" :-)
Don't get me wrong, it might be; but from what little I now know about your environment,
it sounds like applying some data warehousing techniques might suffice - and being the
cynical dinosaur that I am, I have a healthy reluctance about welding new technology onto
a stable environment.
To speed up reporting (and note that these techniques are often applied even when
implementing NoSQL solutions, too) it is usually a good first step to set up a process of
data summarization.
Basically, you pre-calculate averages, medians, groupings, whatever you need for your
reports; and your job also saves the last record IDs it's processed; then on the next
run, you only read the new records and update your summary tables to incorporate the new
data.
Suppose I have a table like this:
ID | Val
--------
1 1
2 7
3 5
4 13
I want to report the average on a daily basis, and calculating that over those rows is
unbearably slow because I'm running the process on a wristwatch from 1860 :-)
So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then gets a record saying
this:
Avg | elementCount | lastSeen
-----------------------------
6.5 4 4
Now, over the course of the day, the elements 4, 17 and 2 get added with sequential row
numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which would be slow; I can
substitute the already summarized data by Avg*elementCount. Thus, I calculate (6.5*4 +
4+17+2)/7 = 7, which is a lot faster, and my summary table now looks like this:
Avg | elementCount | lastSeen
-----------------------------
7 7 7
This is of course a stupid example, but it saves you a lot of time if you already have the
summary of several thousand elements and only need to update it for a handful. Similar
tricks are possible for a lot of typical reporting stuff - you don't need to re-calculate
data for past months over and over again, for instance - and that's what makes your
reports run fast.
Just my 2 cents :-)
/johan
--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel