List:General Discussion« Previous MessageNext Message »
From:Paul DuBois Date:April 28 2003 2:11am
Subject:Re: utility for importing/cleaning data
View as plain text  
At 18:57 -0700 4/27/03, Ray Kiddy wrote:
>I am writing a utility for importing data into MySQL. The new thing 
>about my utility is that it will behave deterministically when 
>dealing with warnings or errors and data validation issues.
>
>I am thinking it will operate in two modes.
>
>The first one is easier. It will take a file that contains such as:
>
>	col1,col2,col3
>	1,foo,bar
>	2,judy,frank
>	3,adam,bob,extra
>	4,alpha,beta
>
>and create a table with 3 columns: col1 CHAR(1), col2 VARCHAR(5), 
>col3 VARCHAR(5), extra1 VARCHAR(5) and put the 4 lines into the 
>table.
>
>The benefit of this approach is that:
>
>1) it will create exactly the columns needed by the data
>2) the columns will be as big as they need to be to hold the data
>3) there will be no data loss from importing.
>
>Imagine that. No data loss. No message about 13274 warnings and no 
>way to find out what the heck happened.
>
>This mode will also take an existing table and add columns or 
>re-size the columns to get the data to fit.
>
>The second mode (or second tool) will do data validation.
>
>The idea is this. Say that if you have a bunch of data and one 
>column is a number, you can import it to be in a SMALLINT column and 
>then, if there is a problem fitting the data into the table, you can 
>separate that line out of the data file, instead of just shoving it 
>into the database, messing up the value and hoping for the best.
>
>I am writing this in C to help with its performance.
>
>I am very open to suggestions or ideas.

You might want to visit http://www.kitebird.com/mysql-cookbook and
download the recipes distribution.  Then look in the transfer directory,
where you'll find a couple of Perl scripts that are relevant to your
task:

# guess_table.pl - characterize the contents of a data file and use the
# information to guess a CREATE TABLE statement for the file

# load_diag.pl - LOAD DATA diagnostic
# Simulates the effect of running a LOAD DATA statement, but attempts
# to determine which rows and columns result in warnings or errors.

They might give you some ideas.

>
>thanx - ray


-- 
Paul DuBois
http://www.kitebird.com/
sql, query
Thread
utility for importing/cleaning dataUnknown Sender28 Apr
  • Re: utility for importing/cleaning dataPaul DuBois28 Apr