List: General Discussion
From: Jay J
Date: August 2 1999 9:29am
Subject: Re: store array of integers using a BLOB? [success]
> On Mo, 1999-08-02 01:38:38 -0500, Jay Jarvinen <3pound@stripped> wrote:
> > In effect, I want to store a huge array of binary-coded integers in a
> > type field. So I could store 0 .. 4294967295 in 4 bytes (plus mysql L+4
> > bytes for the BLOB column), and avoid storing useless commas.
> Storing these integers into a BLOB field would only make sense to me
> if you want to concatenate them all into one huge binary string which
> you then store as a single column.

Yes. A massive string of fixed-length (4-byte) concatenated numbers, which
I'd then extract at runtime with something like:

 my @nums = unpack("N*", $bigstring); # 4-byte big-endian unsigned ints; or
substr or s/// or ...
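For anyone not fluent in Perl's pack templates, here is the same round-trip idea sketched in Python (illustrative only; `pack_ids`/`unpack_ids` are hypothetical names, using the standard `struct` module):

```python
import struct

def pack_ids(ids):
    """Pack unsigned 32-bit ints into a fixed-width binary string, 4 bytes each."""
    return struct.pack(">%dI" % len(ids), *ids)

def unpack_ids(blob):
    """Recover the integer list from the packed blob."""
    return list(struct.unpack(">%dI" % (len(blob) // 4), blob))

ids = [1, 42, 4294967295]          # full 0 .. 2**32-1 range fits
blob = pack_ids(ids)
assert len(blob) == 4 * len(ids)   # no commas, no decimal digits
assert unpack_ids(blob) == ids     # lossless round-trip
```

The `>` (big-endian) byte order matches Perl's `N` template, so the two would interoperate on the same BLOB.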

..... <insert more testing> .....

Silly me. I'm not lost. Not completely.

Same script, adding <perl>$bigint = $bigint x 100</perl> just before the
insert:
With 100 (diff of 9)
-rw-rw----   1 mysql    mysql         409 Aug  1 15:11
-rw-rw-r--   1 root     root          400 Aug  1 15:11 mybinnum.txt

My original tests worked with an array; I then simplified to a single number
for testing purposes - that was the gotcha.

With 3 (diff of 9):
-rw-rw----   1 mysql    mysql          21 Aug  1 15:18
-rw-rw-r--   1 root     root           12 Aug  1 15:18 mybinnum.txt

With 2 (diff of 12):
-rw-rw----   1 mysql    mysql          20 Aug  1 15:18
-rw-rw-r--   1 root     root            8 Aug  1 15:18 mybinnum.txt

Extra overhead at <= 2?

In any case, you can see once I insert more than 2 values I get the desired
constant 9-byte overhead.
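The diffs above are just the table file size minus the raw packed payload (4 bytes per integer, written to mybinnum.txt). A quick arithmetic check, reading the three listings:

```python
# (value count, mysql table file bytes, raw packed payload bytes)
observations = [(100, 409, 400), (3, 21, 12), (2, 20, 8)]

for count, table_bytes, payload in observations:
    assert payload == 4 * count            # fixed 4 bytes per integer
    print(count, table_bytes - payload)    # per-row overhead: 9, 9, 12
```

So the overhead settles at a constant 9 bytes once more than 2 values are stored, which is what the "Extra overhead at <= 2?" question below is poking at.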

<superfluous explanation>
Perhaps I wasn't clear on why I wanted something like this.

1) I'm considering using MySQL to store an inverted index for free-text
searching throughout the database, where I'd index the word column and keep a
corresponding 'binary-int-array' BLOB holding the document IDs.
2) Search 'firewalling': storing per-user words/rules which are then applied
to each search. Basically the same storage concept, except here it's a user's
ID and a packed array of 'banword' IDs, with the user ID indexed.

I can live with 4,294,967,295 being the maximum row-ID I can store.
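A minimal in-memory sketch of that storage concept (Python for illustration; `add_posting`/`lookup` are hypothetical names, and a real version would read and write the BLOB column rather than a dict):

```python
import struct

# One entry per word; the value plays the role of the BLOB column:
# a packed array of 32-bit big-endian document IDs.
index = {}

def add_posting(word, doc_id):
    """Append one 4-byte doc ID to the word's packed posting list."""
    index[word] = index.get(word, b"") + struct.pack(">I", doc_id)

def lookup(word):
    """Unpack the word's posting list back into a list of doc IDs."""
    blob = index.get(word, b"")
    return list(struct.unpack(">%dI" % (len(blob) // 4), blob))

add_posting("firewall", 7)
add_posting("firewall", 4294967295)
assert lookup("firewall") == [7, 4294967295]
```

The banword case is the same shape with user ID as the key and banword IDs in the packed array.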

"Modern Information Retrieval" gives some guidelines where a "medium
collection (200MB)" could be represented by an index 36% that size (72MB).
On my text data, just the first 1000 records has ~10,000 unique words. So
using a separate row with INT columns would be ~93,000 rows, rather than
~10,000 BLOB rows (one per word).
Per-process overhead is much larger when slurping in a giant array instead of
iterating over the rows, but with cheap memory and CPU power it seemed a win
over many, many SELECTs. Opinions, ideas?

Is saving 600 bytes per 100 numbers worth the packing/unpacking trouble?
Maybe not, but storing big integers separated by commas just didn't sit
right with me, and if a given word matched 100,000 doc-IDs .. that's around
600KB of savings. If 100 words match 100,000 documents ... etc..
</superfluous explanation>
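The savings estimate is easy to sanity-check. Assuming doc IDs near the 32-bit maximum (ten decimal digits), comma-separated text versus packed binary works out to roughly 700 bytes saved per 100 numbers; with nine-digit IDs it is the ~600 cited above:

```python
# Ten-digit doc IDs, stored as comma-separated text vs 4-byte packed binary.
ids = [4294960000 + i for i in range(100)]

text_bytes = len(",".join(str(i) for i in ids))   # 100 * 10 digits + 99 commas
packed_bytes = 4 * len(ids)                        # fixed 4 bytes per ID

print(text_bytes, packed_bytes, text_bytes - packed_bytes)   # 1099 400 699
```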

Thanks for the nudge,

-Jay J
