List:General Discussion« Previous MessageNext Message »
From:Bernd Jagla Date:March 13 2009 7:07am
Subject:question on loading data and generating uniq table
View as plain text  

I wanted to ask some more experienced mysql users to give me some advice on
a project I am currently planning.


I have a text file with three columns: strName(char6), position(integer),

This file has some 3 billion rows (3,000,000,000). There are some "str"s
that are duplicated and eventually I want to create two tables: uniqStr and
posIDX, where

uniqStr has the columns ID (integer,primary) and str (uniq,char36,index)

posIDX has the columns uniqStr_ID, strName,position

(maybe the strName can be moved to a separate table as well and just
referenced in posIDX)


Now, what I would do is load the text file into a table;

index on str; 

create the table uniqStr using a select statement;

create the table posIDX by joining the first and second table.


I am not sure this is the fastest way of doing things. Maybe creating the
uniq sequences first using e.g. uniq on the command line would be faster?
(Thereby skipping loading the first file and creating the index) 


The str where created using a sliding window on a few very long strings.

In the very end I want to search for millions of new "str"s and figure out
if and in which string and at which position they are located. (I am looking
for exact matches)

Maybe this is not even a database problem, but could be solved easier with
different tools?


Thanks for any advice/comment.



question on loading data and generating uniq tableBernd Jagla13 Mar