List: General Discussion
From: Dan Nelson
Date: May 16 2001 3:09am
Subject: Re: Large scale statistical analysis
In the last episode (May 15), Seth Northrop said:
> We are performing some rather extensive data collection
> (measurements) and are attempting to come up with the most sane
> storage mechanism to facilitate offline (Igor, MatLab, Custom Apps)
> and online (less complex web based solutions) analysis tools.  The
> problem resides in the amount of data collected; by rough estimates
> we anticipate collecting 3-5 billion rows of data (presently, this
> can be represented in 2D plots; (ie, x,y data).. though, eventually
> data will be collected in more complex fashion, but, the above
> example makes our storage problems a bit easier to digest) per data
> collection cycle.  Of course, that number could fluctuate up or down.
> The key here is that it is a lot of data.

One question to ask yourself is "what data am I likely to do queries
on?"  If you never plan on running queries based on individual sample
values, don't bother putting each point in its own row.  You can still
store things like max/min values for each cycle, so you can do queries
like "Give me all the sample groups where the recorded temperature
exceeded 50 degrees".
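
As a rough sketch of that idea (not a tested design): keep one row per
sample group with precomputed summary columns, and query those.  The
table and column names (sample_group, min_temp, max_temp) are made up
for illustration, and Python's built-in sqlite3 module stands in for
the real database server so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: one row per collection cycle ("sample group"),
# holding only summary values rather than every individual point.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_group ("
    " id INTEGER PRIMARY KEY,"
    " label TEXT,"
    " min_temp REAL,"
    " max_temp REAL)"
)

def add_group(label, temps):
    """Record the min/max of one cycle's temperature readings."""
    conn.execute(
        "INSERT INTO sample_group (label, min_temp, max_temp) VALUES (?, ?, ?)",
        (label, min(temps), max(temps)),
    )

add_group("cycle-a", [21.5, 48.0, 49.9])
add_group("cycle-b", [30.0, 51.2, 47.3])

# "Give me all the sample groups where the recorded temperature
# exceeded 50 degrees" becomes a query on the summary column.
hot = [row[0] for row in conn.execute(
    "SELECT label FROM sample_group WHERE max_temp > 50 ORDER BY id")]
print(hot)
```

The query never touches the raw points, so it stays cheap no matter
how many samples each cycle contains.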

I think a modified solution 2 is a good way to go.  Instead of
referencing external files, just use a blob field in your database, or
better yet, create a separate table with id + blob, to keep your
primary table small and easy to query.
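
One possible shape for the id + blob layout, again only a sketch under
assumptions: the names sample_group, sample_data, store() and load()
are hypothetical, the points are packed as raw doubles with Python's
array module, and sqlite3 stands in for the real server.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
# Small, easily queried primary table: metadata and summaries only.
conn.execute(
    "CREATE TABLE sample_group (id INTEGER PRIMARY KEY, label TEXT, max_y REAL)")
# Separate id + blob table holding the bulky raw data.
conn.execute(
    "CREATE TABLE sample_data (group_id INTEGER PRIMARY KEY, points BLOB)")

def store(label, xy_pairs):
    """Insert one cycle: a summary row plus its points packed into a blob."""
    # Flatten to x0, y0, x1, y1, ... as native-byte-order doubles.
    flat = array("d", [v for pair in xy_pairs for v in pair])
    cur = conn.execute(
        "INSERT INTO sample_group (label, max_y) VALUES (?, ?)",
        (label, max(y for _, y in xy_pairs)))
    conn.execute(
        "INSERT INTO sample_data (group_id, points) VALUES (?, ?)",
        (cur.lastrowid, flat.tobytes()))
    return cur.lastrowid

def load(group_id):
    """Fetch and unpack the (x, y) points for one cycle."""
    (blob,) = conn.execute(
        "SELECT points FROM sample_data WHERE group_id = ?",
        (group_id,)).fetchone()
    flat = array("d")
    flat.frombytes(blob)
    return list(zip(flat[0::2], flat[1::2]))

gid = store("cycle-a", [(0.0, 1.5), (1.0, 2.5)])
points = load(gid)
```

Queries against sample_group never have to read the blobs; the raw
points are only fetched when an analysis tool asks for a specific
cycle.  A real deployment would want to fix the byte order and record
the encoding somewhere, since array("d") uses the machine's native
format.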

-- 
	Dan Nelson
	dnelson@stripped