From: Rick James Date: June 14 2012 5:25pm Subject: RE: NoSQL help List-Archive: http://lists.mysql.com/mysql/227662 Message-Id: <2E7DD7ADE53B044C8C8BCD9C5829E1EB1487607E90@SP2-EX07VS01.ds.corp.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable > the report out of 60 days in terms of second, minute, hourly, weekly and = Monthly report also 1-second reports?? Wouldn't that have millions of rows? Is there anything= useful to summarize? For Second and Minute over narrow ranges, you might do fine reading the raw= data. For hourly reports an hourly summary table might be 'right'. For daily, weekly, and monthly, have a daily summary table. There would be= one row per widget per day. > -----Original Message----- > From: Manivannan S. [mailto:manivannan_s@stripped] > Sent: Thursday, June 14, 2012 2:33 AM > To: mysql@stripped > Subject: RE: NoSQL help >=20 > I tried with myisam engine also. But it also taking more time to > generate the report. In my database I am having 8 innodb tables and at > the same time I am joining 4 tables to get the report. >=20 > I am maintaining 60days records because the user will try to generate > the report out of 60 days in terms of second, minute, hourly, weekly > and Monthly report also. >=20 > From: Ananda Kumar [mailto:anandkl@stripped] > Sent: Thursday, June 14, 2012 12:32 AM > To: Rick James > Cc: Johan De Meersman; Manivannan S.; mysql@stripped > Subject: Re: NoSQL help >=20 > Did you try with myisam tables. > They are supposed to be good for reporting requirement On Wed, Jun 13, > 2012 at 11:52 PM, Rick James inc.com>> wrote: > I'll second Johan's comments. >=20 > "Count the disk hits!" >=20 > One minor change: Don't store averages in the summary table; instead > store the SUM(). That lets you get the mathematically correct AVERAGE > over any time range via > SUM(sum_foo) / SUM(count_foo) >=20 > Switching between MySQL and Mongo requires rewriting _all_ of the > relevant code. >=20 > NoSQL will be no better than MySQL for 150GB. > "Count the disk hits!" >=20 > I recently built a system that topped out at 350GB (90 days' data). It > involved hourly ingestion of a few GB of data and a variety of > "reports". The prototype showed that most reports would take about an > hour to run. Not good. The final product, with summary tables, lets > the reports be run on-demand and online and each takes only a few > seconds. By careful use of MEMORY tables, LOAD DATA, etc, the > ingestion takes 5 minutes (each hour) for the raw data and 2 minutes > (total) for the 7 summary tables. PARTITIONing was vital for the > design. Once an hour a new partition is populated; once a day, 24 > hourly partitions are rolled into a new daily partition and the 90-day > old partition is DROPped. >=20 >=20 > > -----Original Message----- > > From: Johan De Meersman > > [mailto:vegivamp@stripped] > > Sent: Wednesday, June 13, 2012 6:20 AM > > To: Manivannan S. > > Cc: mysql@stripped > > Subject: Re: NoSQL help > > > > > > ----- Original Message ----- > > > From: "Manivannan S." > > > > > > > > > > > > > > Hi all, > > > > > > [lots of data] > > > [slow reports] > > > [wooo NoSQL magic] > > > > Not that I want to discourage you, but my standard first question is > > "why do you think NoSQL (let alone any specific product) is the right > > solution?" :-) > > > > Don't get me wrong, it might be; but from what little I now know > about > > your environment, it sounds like applying some data warehousing > > techniques might suffice - and being the cynical dinosaur that I am, > I > > have a healthy reluctance about welding new technology onto a stable > > environment. > > > > To speed up reporting (and note that these techniques are often > > applied even when implementing NoSQL solutions, too) it is usually a > > good first step to set up a process of data summarization. > > > > Basically, you pre-calculate averages, medians, groupings, whatever > > you need for your reports; and your job also saves the last record > IDs > > it's processed; then on the next run, you only read the new records > > and update your summary tables to incorporate the new data. > > > > Suppose I have a table like this: > > > > ID | Val > > -------- > > 1 1 > > 2 7 > > 3 5 > > 4 13 > > > > I want to report the average on a daily basis, and calculating that > > over those rows is unbearably slow because I'm running the process on > > a wristwatch from 1860 :-) > > > > So I get a summary table, calculate (1+7+5+13)/4 =3D 6.5 and that then > > gets a record saying this: > > > > Avg | elementCount | lastSeen > > ----------------------------- > > 6.5 4 4 > > > > Now, over the course of the day, the elements 4, 17 and 2 get added > > with sequential row numbers. Instead of calculating > > (1+7+5+13+4+17+2)/7, which would be slow; I can substitute the > already > > summarized data by Avg*elementCount. Thus, I calculate (6.5*4 + > > 4+17+2)/7 =3D 7, which is a lot faster, and my summary table now looks > > like this: > > > > Avg | elementCount | lastSeen > > ----------------------------- > > 7 7 7 > > > > This is of course a stupid example, but it saves you a lot of time if > > you already have the summary of several thousand elements and only > > need to update it for a handful. Similar tricks are possible for a > lot > > of typical reporting stuff - you don't need to re-calculate data for > > past months over and over again, for instance - and that's what makes > > your reports run fast. > > > > > > Just my 2 cents :-) > > /johan > > > > -- > > Bier met grenadyn > > Is als mosterd by den wyn > > Sy die't drinkt, is eene kwezel > > Hy die't drinkt, is ras een ezel > > > > -- > > MySQL General Mailing List > > For list archives: http://lists.mysql.com/mysql > > To unsubscribe: http://lists.mysql.com/mysql >=20 > DISCLAIMER: This email message and all attachments are confidential and > may contain information that is privileged, confidential or exempt from > disclosure under applicable law. If you are not the intended > recipient, you are notified that any dissemination, distribution or > copying of this email is strictly prohibited. If you have received this > email in error, please notify us immediately by return email or to > mailadmin@stripped and destroy the original message. Opinions, > conclusions and other information in this message that do not relate to > the official business of SPAN, shall be understood to be neither given > nor endorsed by SPAN.