List:General Discussion« Previous MessageNext Message »
From:Tim Gustafson Date:June 24 2010 11:52pm
Subject:Re: selecting data for computation
View as plain text  
> select * from work_packets where work_data is null and get_lock(id,0)=1
> limit 1;

Sorry, I should have said that my example was pseudo-code that I was
typing from memory.  The query you've suggested is certainly an
improvement - I don't know why I though that the get_lock needed to be in
a subquery, but yours is certainly faster.

> My guess is your initial reason for ORDER BY RAND() was to minimize
> duplicate work, but if you're using locks it'll never happen, so you might
> as well just grab the first available packet.  That query shouldn't even
> take 2 seconds, unless you have 50k rows in the table and no index on
> work_data...

Your guess is correct.  I've re-written the query more or less as you've
suggested and it now takes 4 seconds to complete, which I think is

There is not an index on the work_data column as it's a longblob and I was
under the impression that indexing a longblob field wasn't helpful.  Maybe
I should add a work_data_size field as an integer, index that, and search
for records where work_data_size = 0.

For what it's worth, the table currently has 800,000 rows in it and is
likely to expand in the future.

Tim Gustafson

selecting data for computationTim Gustafson24 Jun
  • Re: selecting data for computationDan Nelson24 Jun
    • Re: selecting data for computationTim Gustafson25 Jun
      • Re: selecting data for computationTim Gustafson25 Jun
        • Re: selecting data for computationDan Nelson25 Jun
          • Re: selecting data for computationTim Gustafson26 Jun
            • Re: selecting data for computationBaron Schwartz28 Jun
              • Re: selecting data for computationTim Gustafson1 Jul
      • Re: selecting data for computationJoerg Bruehe25 Jun