List: General Discussion
From: Johan De Meersman
Date: June 13 2012 1:20pm
Subject: Re: NoSQL help
----- Original Message -----
> From: "Manivannan S." <manivannan_s@stripped>
> 
> Hi all,
> 
> [lots of data]
> [slow reports]
> [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is "why do you think
NoSQL (let alone any specific product) is the right solution?" :-)

Don't get me wrong, it might be; but from what little I now know about your environment,
it sounds like applying some data warehousing techniques might suffice - and being the
cynical dinosaur that I am, I have a healthy reluctance about welding new technology onto
a stable environment.

To speed up reporting (and note that these techniques are often applied when implementing
NoSQL solutions, too), a good first step is usually to set up a process of data
summarization.

Basically, you pre-calculate averages, medians, groupings, whatever you need for your
reports, and your job also saves the ID of the last record it has processed; then on the
next run, you only read the new records and update your summary tables to incorporate the
new data.

Suppose I have a table like this:

ID | Val
--------
 1     1
 2     7
 3     5
 4    13

I want to report the average on a daily basis, and calculating that over those rows is
unbearably slow because I'm running the process on a wristwatch from 1860 :-)

So I create a summary table, calculate (1+7+5+13)/4 = 6.5, and store a record in it that
says this:

Avg | elementCount | lastSeen
-----------------------------
6.5              4          4

Now, over the course of the day, the values 4, 17 and 2 get added with sequential IDs (5,
6 and 7). Instead of calculating (1+7+5+13+4+17+2)/7, which would be slow, I can replace
the sum of the already summarized rows with Avg*elementCount. Thus, I calculate (6.5*4 +
4+17+2)/7 = 7, which is a lot faster, and my summary table now looks like this:

Avg | elementCount | lastSeen
-----------------------------
  7              7          7
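
In code, that update step might look roughly like the sketch below. It is Python purely
for illustration; the function name and the (id, val) tuple layout are made up for this
example and do not come from any real schema.

```python
def update_summary(avg, element_count, last_seen, new_rows):
    """Fold new (id, val) rows into an existing (avg, count, last_seen) summary."""
    # Only rows with an ID beyond last_seen count as new.
    fresh = [(row_id, val) for row_id, val in new_rows if row_id > last_seen]
    if not fresh:
        return avg, element_count, last_seen
    # avg * element_count recovers the sum of everything summarized so far.
    total = avg * element_count + sum(val for _, val in fresh)
    count = element_count + len(fresh)
    return total / count, count, max(row_id for row_id, _ in fresh)


summary = update_summary(6.5, 4, 4, [(5, 4), (6, 17), (7, 2)])
print(summary)  # (7.0, 7, 7)
```

The old rows are never read again; only the three new ones and the stored summary are.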

This is of course a stupid example, but it saves you a lot of time if you already have the
summary of several thousand elements and only need to update it for a handful. Similar
tricks are possible for a lot of typical reporting stuff - you don't need to re-calculate
data for past months over and over again, for instance - and that's what makes your
reports run fast.
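
For what it's worth, here is a rough sketch of such a summarization job against an actual
database. It uses Python's built-in sqlite3 module only so that it runs standalone; the
same queries should translate to MySQL with little change. The table and column names
(data, summary, avg_val, element_count, last_seen) and the function name are assumptions
for the example.

```python
import sqlite3


def refresh_summary(conn):
    """Read only rows newer than last_seen and fold them into the summary row."""
    avg, count, last_seen = conn.execute(
        "SELECT avg_val, element_count, last_seen FROM summary"
    ).fetchone()
    # Aggregate just the new rows; already summarized rows are not touched.
    new_sum, new_count, new_last = conn.execute(
        "SELECT COALESCE(SUM(val), 0), COUNT(*), MAX(id) FROM data WHERE id > ?",
        (last_seen,),
    ).fetchone()
    if new_count == 0:
        return avg, count, last_seen
    total_count = count + new_count
    avg = (avg * count + new_sum) / total_count
    conn.execute(
        "UPDATE summary SET avg_val = ?, element_count = ?, last_seen = ?",
        (avg, total_count, new_last),
    )
    conn.commit()
    return avg, total_count, new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val REAL)")
conn.execute(
    "CREATE TABLE summary (avg_val REAL, element_count INTEGER, last_seen INTEGER)"
)
conn.execute("INSERT INTO summary VALUES (0, 0, 0)")  # empty summary to start from

conn.executemany("INSERT INTO data (val) VALUES (?)", [(1,), (7,), (5,), (13,)])
first = refresh_summary(conn)   # first run summarizes IDs 1..4
conn.executemany("INSERT INTO data (val) VALUES (?)", [(4,), (17,), (2,)])
second = refresh_summary(conn)  # second run only reads IDs 5..7
print(first, second)
```

The reports then read from the summary table instead of scanning the raw data.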


Just my 2 cents :-)
/johan

-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel