For several years I have been involved with developing retrieval
software for a bibliographic database (the Current Index to
Statistics, see http://www.statindex.org/). Because the data were
only updated once a year we used a custom set of static indices and
custom retrieval software.
Recently I started using MySQL for another project. I have been so
impressed with its speed and flexibility that I think we could abandon
our custom methods and instead use a set of MySQL tables.
I would appreciate hearing from anyone who has used MySQL for an
application like this. In particular, I am considering the design of
the tables for the authors and how I could best facilitate searching
for multiple authors and I would appreciate some advice. The most
common type of retrieval involves intersecting the sets of references
for multiple authors or for authors and keywords/words in titles. I
want to make sure that this can be done quickly.
Consider, for example, a table of journal articles laid out as
create table articles (
seq int not null primary key,
title text not null, /* some titles exceed 255 characters */
jour smallint not null, /* separate table of journal names */
year smallint not null,
volume smallint,
pageS smallint, /* starting page */
pageF smallint /* final page */
)
The authors and keywords are stored in separate tables. Because of
the way we get the data, I think I will store the complete name of the
author in one column. The tables would be
create table authors (
seq int not null,
name varchar(50) not null,
index (seq),
index (name))
create table keywords (
seq int not null,
keyword varchar(30) not null,
index (seq),
index (keyword))
There can be several authors for the same journal article. How do I
find the articles written by a particular combination of authors? For
example, if the table contained
mysql> select * from articles;
+-----+--------------------+------+------+--------+-------+-------+
| seq | title | jour | year | volume | pageS | pageF |
+-----+--------------------+------+------+--------+-------+-------+
| 1 | Now is the time | 1 | 1999 | 20 | 1 | 200 |
| 2 | For all good men | 1 | 1999 | 20 | 201 | 222 |
| 3 | To come to the aid | 3 | 1999 | 20 | 223 | 240 |
+-----+--------------------+------+------+--------+-------+-------+
3 rows in set (0.00 sec)
mysql> select * from authors;
+-----+---------+
| seq | name |
+-----+---------+
| 1 | Able |
| 1 | Baker |
| 2 | Able |
| 2 | Charlie |
+-----+---------+
4 rows in set (0.00 sec)
how do I select only those articles that were written by 'Able' and
'Charlie' (article 2, in this case)? I can easily get those written
by 'Able' using
mysql> select a.* from articles as a, authors as w where a.seq = w.seq
-> and w.name like 'Able';
+-----+------------------+------+------+--------+-------+-------+
| seq | title | jour | year | volume | pageS | pageF |
+-----+------------------+------+------+--------+-------+-------+
| 1 | Now is the time | 1 | 1999 | 20 | 1 | 200 |
| 2 | For all good men | 1 | 1999 | 20 | 201 | 222 |
+-----+------------------+------+------+--------+-------+-------+
2 rows in set (0.00 sec)
but for 'Able' and 'Charlie' I keep coming up with an empty set
mysql> select a.* from articles as a, authors as w1, authors as w2
-> where a.seq = w1.seq and w1.seq = w2.seq and w1.name like 'Able'
-> and w2.seq like 'Charlie';
Empty set (0.01 sec)
I hope I am missing something obvious and you can enlighten me.