List:General Discussion« Previous MessageNext Message »
From:Manivannan S. Date:June 14 2012 9:33am
Subject:RE: NoSQL help
View as plain text  
I tried with myisam engine also. But it also taking more time to generate the report. In
my database I am having 8 innodb tables and at the same time I am joining 4 tables to get
the report.

I am maintaining 60days records because the user will try to generate the report out of 60
days in terms of second, minute, hourly, weekly and Monthly report also.

From: Ananda Kumar [mailto:anandkl@stripped]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; mysql@stripped
Subject: Re: NoSQL help

Did you try with myisam tables.
They are supposed to be good for reporting requirement
On Wed, Jun 13, 2012 at 11:52 PM, Rick James
<rjames@stripped<mailto:rjames@stripped>> wrote:
I'll second Johan's comments.

"Count the disk hits!"

One minor change:  Don't store averages in the summary table; instead store the SUM(). 
That lets you get the mathematically correct AVERAGE over any time range via
SUM(sum_foo) / SUM(count_foo)

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code.

<opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  "Count the
disk hits!"

I recently built a system that topped out at 350GB (90 days' data).  It involved hourly
ingestion of a few GB of data and a variety of "reports".  The prototype showed that most
reports would take about an hour to run.  Not good.  The final product, with summary
tables, lets the reports be run on-demand and online and each takes only a few seconds. 
By careful use of MEMORY tables, LOAD DATA, etc, the ingestion takes 5 minutes (each hour)
for the raw data and 2 minutes (total) for the 7 summary tables.  PARTITIONing was vital
for the design.  Once an hour a new partition is populated; once a day, 24 hourly
partitions are rolled into a new daily partition and the 90-day old partition is DROPped.

> -----Original Message-----
> From: Johan De Meersman [mailto:vegivamp@stripped<mailto:vegivamp@stripped>]
> Sent: Wednesday, June 13, 2012 6:20 AM
> To: Manivannan S.
> Cc: mysql@stripped<mailto:mysql@stripped>
> Subject: Re: NoSQL help
> ----- Original Message -----
> > From: "Manivannan S."
> <manivannan_s@stripped<mailto:manivannan_s@stripped>>
> >
> > Hi all,
> >
> > [lots of data]
> > [slow reports]
> > [wooo NoSQL magic]
> Not that I want to discourage you, but my standard first question is
> "why do you think NoSQL (let alone any specific product) is the right
> solution?" :-)
> Don't get me wrong, it might be; but from what little I now know about
> your environment, it sounds like applying some data warehousing
> techniques might suffice - and being the cynical dinosaur that I am, I
> have a healthy reluctance about welding new technology onto a stable
> environment.
> To speed up reporting (and note that these techniques are often applied
> even when implementing NoSQL solutions, too) it is usually a good first
> step to set up a process of data summarization.
> Basically, you pre-calculate averages, medians, groupings, whatever you
> need for your reports; and your job also saves the last record IDs it's
> processed; then on the next run, you only read the new records and
> update your summary tables to incorporate the new data.
> Suppose I have a table like this:
> ID | Val
> --------
>  1     1
>  2     7
>  3     5
>  4    13
> I want to report the average on a daily basis, and calculating that
> over those rows is unbearably slow because I'm running the process on a
> wristwatch from 1860 :-)
> So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
> gets a record saying this:
> Avg | elementCount | lastSeen
> -----------------------------
> 6.5              4          4
> Now, over the course of the day, the elements 4, 17 and 2 get added
> with sequential row numbers. Instead of calculating
> (1+7+5+13+4+17+2)/7, which would be slow; I can substitute the already
> summarized data by Avg*elementCount. Thus, I calculate (6.5*4 +
> 4+17+2)/7 = 7, which is a lot faster, and my summary table now looks
> like this:
> Avg | elementCount | lastSeen
> -----------------------------
>   7              7          7
> This is of course a stupid example, but it saves you a lot of time if
> you already have the summary of several thousand elements and only need
> to update it for a handful. Similar tricks are possible for a lot of
> typical reporting stuff - you don't need to re-calculate data for past
> months over and over again, for instance - and that's what makes your
> reports run fast.
> Just my 2 cents :-)
> /johan
> --
> Bier met grenadyn
> Is als mosterd by den wyn
> Sy die't drinkt, is eene kwezel
> Hy die't drinkt, is ras een ezel
> --
> MySQL General Mailing List
> For list archives:
> To unsubscribe:

DISCLAIMER: This email message and all attachments are confidential and may contain
information that is privileged, confidential or exempt from disclosure under applicable
law.  If you are not the intended recipient, you are notified that any dissemination,
distribution or copying of this email is strictly prohibited. If you have received this
email in error, please notify us immediately by return email or to
mailadmin@stripped and destroy the original message.  Opinions, conclusions and
other information in this message that do not relate to the official business of SPAN,
shall be understood to be neither given nor endorsed by SPAN.

NoSQL helpManivannan S .13 Jun
  • Re: NoSQL helpJohan De Meersman13 Jun
    • RE: NoSQL helpRick James13 Jun
      • Re: NoSQL helpAnanda Kumar13 Jun
        • RE: NoSQL helpManivannan S .14 Jun
          • Re: NoSQL helpAnanda Kumar14 Jun
            • RE: NoSQL helpManivannan S .14 Jun
            • RE: NoSQL helpManivannan S .14 Jun
              • Re: NoSQL helpAnanda Kumar14 Jun
                • RE: NoSQL helpRick James14 Jun
          • RE: NoSQL helpRick James14 Jun
  • Re: NoSQL helpmos13 Jun
    • Handler?hsv25 Jun