List:General Discussion« Previous MessageNext Message »
From:J S Date:June 18 2004 1:40pm
Subject:Re: load data into 2 tables and set id
View as plain text  
Shawn,

Thanks for helping on this. I really appreciate it.

>No problem!!
>
>Please post the structures of your "big_table" and your "url_table"
>(whatever you called them)  and I will help you to rewrite step 4 to count
>how many times a URL appears in the "big_table".

mysql> desc internet_usage;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| uid    | varchar(10) | YES  | MUL | NULL    |       |
| time   | datetime    | YES  |     | NULL    |       |
| ip     | varchar(20) | YES  |     | NULL    |       |
| action | varchar(20) | YES  |     | NULL    |       |
| urlid  | int(11)     | YES  |     | NULL    |       |
| size   | int(11)     | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
6 rows in set (0.03 sec)

mysql> desc url_table;
+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra    |
+----------+--------------+------+-----+---------+----------------+
| urlid    | mediumint(9) |      | PRI | NULL    | auto_increment |
| url      | text         | YES  | MUL | NULL    |                |
| hits     | mediumint(9) | YES  |     | NULL    |                |
| category | varchar(50)  | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)

mysql>

>
>Is this bulk import a process you need to do repeatedly? if so we need to
>worry about updating the count column on the next batch import and not just
>re-creating it from the new data. That will change the query significantly.

Yes it will be a repeated process. Actually I have a backlog of 6 months 
data to load!

Here's an example of what the data looks like:

mysql> select * from internet_usage limit 5;
+---------+---------------------+----------------+--------------+-------+------+
| uid     | time                | ip             | action       | urlid | 
size |
+---------+---------------------+----------------+--------------+-------+------+
| n58396  | 2004-06-07 21:12:16 | 21.38.25.204   | TCP_TUNNELED |  5999 | 
5297 |
| u344584 | 2004-06-07 21:07:12 | 21.33.136.74   | TCP_HIT      |  4494 |  
438 |
| -       | 2004-06-07 21:07:02 | 21.38.92.76    | TCP_NC_MISS  |  2134 |  
771 |
| u524797 | 2004-06-07 21:03:27 | 21.32.25.41    | TCP_NC_MISS  |   260 |  
582 |
| -       | 2004-06-07 21:09:13 | 21.201.130.240 | TCP_HIT      |  3112 |  
542 |
+---------+---------------------+----------------+--------------+-------+------+

mysql> select * from url_table limit 5;
+-------+-----------------------------------------------------------------------------------+------+---------------+
| urlid | url                                                                
                | hits | category      |
+-------+-----------------------------------------------------------------------------------+------+---------------+
|     1 | 
http://www.bbc.co.uk/horoscopes/chinesehoroscopes/images/hp-snake.gif | NULL 
| Entertainment |
|     2 | http://www.call18866.co.uk/images/logo.jpg                         
                | NULL | none          |
|     3 | http://m2.doubleclick.net/866421/0409_santoku_250.gif              
                | NULL | none          |
|     4 | http://lysto1-dc02.ww.ad.ba.com/                                   
                | NULL | Travel        |
|     5 | http://www.aboutscotland.com/edin/newstreetph/sitview.jpg          
                | NULL | Travel        |
+-------+-----------------------------------------------------------------------------------+------+---------------+
5 rows in set (0.00 sec)


One other problem I'm having here is making the rows in internet_usage 
unique. At the moment I have lots of duplicates, and I was trying to create 
a temporary table but unfortunately got an error 27 (I think this refers to 
a 2GB limit).

mysql> CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM 
internet_usage;
ERROR 1030: Got error 27 from table handler

Is there another way of doing this?

>
>Yours,
>Shawn Green
>Database Administrator
>Unimin Corporation - Spruce Pine
>
> >
> >Mos forgot to populate the url_id column in your user table. I would use
> >his same process but re-arrange it like this:
> >
> >1) create table BIG_TABLE....
> >2) load data infile....
> >3) create table URL_TABLE (
> >       url_id bigint not null auto_increment,
> >       url varchar(25) not null primary key,
> >       category ....
> >       )
> >4) INSERT IGNORE URL_TABLE (url, category)
> >       SELECT url,category
> >       FROM BIG_TABLE
> >4) create table USER_TABLE (
> >       user_id varchar?(...) primary key,
> >       date_time datetime,
> >       url_id bigint,
> >       size int
> >       )
> >
> >5) insert USER_TABLE
> >       SELECT bt.userID, bt.datetime, u.url_id, bt.size
> >       FROM BIG_TABLE bt
> >       INNER JOIN URL_TABLE u
> >       ON u.url = bt.url
> >
> >doing it this way lets the SQL engine handle the conversion of URLs to
> >their IDs in the USER_TABLE...see? No scripting required at all!
> >
> >Yours,
> >Shawn Green
> >Database Administrator
> >Unimin Corporation - Spruce Pine
> >
> >
>
>Shawn,
>
>Many thanks for your reply. It sounds like this might be quicker than my
>perl script which parses a 1GB log file and fills the database in 40 mins
>at
>the mo. (The actual parsing only takes 1.5 mins).
>There's one snag which I forgot about and that's in the url_table I have
>another column called hits which is the number of hits for each url.  I'm
>terrible at SQL and wondered if you might be able to suggest a way of doing
>
>this with the method above?
>
>Thanks,
>
>JS.
>
>_________________________________________________________________
>It's fast, it's easy and it's free. Get MSN Messenger today!
>http://www.msn.co.uk/messenger
>
>
>
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>

_________________________________________________________________
Want to block unwanted pop-ups? Download the free MSN Toolbar now!  
http://toolbar.msn.co.uk/

Thread
load data into 2 tables and set idJ S9 Jun
  • Re: load data into 2 tables and set idWilliam R. Mussatto9 Jun
  • Re: load data into 2 tables and set idmos9 Jun
Re: load data into 2 tables and set idJ S9 Jun
  • Re: load data into 2 tables and set idWilliam R. Mussatto9 Jun
Re: load data into 2 tables and set idJ S10 Jun
Re: load data into 2 tables and set idSGreen10 Jun
Re: load data into 2 tables and set idJ S18 Jun
Re: load data into 2 tables and set idSGreen18 Jun
Re: load data into 2 tables and set idJ S18 Jun
Re: load data into 2 tables and set idSGreen18 Jun
Re: load data into 2 tables and set idJ S18 Jun
Re: load data into 2 tables and set idSGreen18 Jun
Re: load data into 2 tables and set idJ S22 Jun
Re: load data into 2 tables and set idJ S22 Jun
Re: load data into 2 tables and set idJ S22 Jun
Re: load data into 2 tables and set idSGreen22 Jun
Re: load data into 2 tables and set idJ S22 Jun
Re: load data into 2 tables and set idJ S23 Jun
Re: load data into 2 tables and set idSGreen23 Jun
Re: load data into 2 tables and set idJ S23 Jun
  • Re: load data into 2 tables and set idMichael Stassen23 Jun
Re: load data into 2 tables and set idSGreen23 Jun
Re: load data into 2 tables and set idJ S23 Jun
Re: load data into 2 tables and set idSGreen23 Jun
Re: load data into 2 tables and set idJ S26 Jun
Re: load data into 2 tables and set idJ S26 Jun