List:General Discussion« Previous MessageNext Message »
From:Alexey Polyakov Date:April 23 2006 9:12pm
Subject:Re: Optimising for many rows and returned records (de-coupling query time to record set size for range queries)
View as plain text  
On 4/23/06, Nick Hill <nick@stripped> wrote:

> 1) I can improve performance by a factor of 2-2.5 by changing the double
> lat/lon to an integer then selecting on an integer.
>
> 2) I have concluded that for each 10 fold increase in the number of
> records, select queries take twice as long. For each doubling of the
> number of returned records, there is a sqrt(2) increase in select query
> time.

I've noticed a couple things.
1) Right now you're emulating spatial index.
2) In future, you're going to emulate partitioning.

Why do you think that doing this stuff manually is better than using
builtin capabilities?

> As the database grows, it would likely improve database performance by
> splitting an individual table into several thousand tables using the
> file system directory btree algorithm to effectively pre-select the data
> before the query is handled to the MySQL engine. This is not a neat
> solution. A much better way would be to improve the mysql index
> performance on very large numbers of records.

Selects against a table use b-trees too. Splitting data into lot of
tables won't help with selects at all (well, it may help on scans with
concurrent large data sets if data will be spread across different
physical drives, but not with regular range lookups that you're
doing). It will only help with inserts.

> Given that there is such a strong relationship between the number of
> records returned, and query time, I conclude that the whole index tree
> is matched for every given number of root x records returned. If all
> records we are matching are under a single node or under a small number
> of nodes in the index tree, perhaps there is some way of telling the
> database engine to ignore the rest of the index tree.

What is a 'root record'? Are you speaking about internal
representation of b-tree?

> Could this work, or am I misunderstanding how the index tree works? Are
> there existing optimisations which can de-couple the relationship
> between number of records and query time where the records I am
> selecting are within a small range?

For studying select query performance issues it's better think about
index as simply about a sorted array with random-access, where each
random access costs O(lgN) and accesses to adjanced data cost O(1).
If your points are spread uniformly in space, cost of select query
you've shown is O(N*lgN)


--
Alexey Polyakov
Thread
Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Nick Hill23 Apr
  • Re: Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Adam Wolff23 Apr
    • Re: Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Adam Wolff23 Apr
    • Re: Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Nick Hill24 Apr
      • Re: Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Adam Wolff24 Apr
  • Re: Optimising for many rows and returned records (de-coupling query time to record set size for range queries)Alexey Polyakov23 Apr
    • Re: Optimising for many rows and returned records (de-coupling querytime to record set size for range queries)Nick Hill23 Apr