List:General Discussion« Previous MessageNext Message »
From:Douglas Bates Date:February 17 2000 5:32am
Subject:Bibliographic databases
View as plain text  
For several years I have been involved with developing retrieval
software for a bibliographic database (the Current Index to
Statistics, see http://www.statindex.org/).  Because the data were
only updated once a year we used a custom set of static indices and
custom retrieval software.

Recently I started using MySQL for another project.  I have been so
impressed with its speed and flexibility that I think we could abandon
our custom methods and instead use a set of MySQL tables.

I would appreciate hearing from anyone who has used MySQL for an
application like this.  In particular, I am considering the design of
the tables for the authors and how I could best facilitate searching
for multiple authors and I would appreciate some advice.  The most
common type of retrieval involves intersecting the sets of references
for multiple authors or for authors and keywords/words in titles.  I
want to make sure that this can be done quickly.

Consider, for example, a table of journal articles laid out as
 create table articles (
   seq int not null primary key,
   title text not null,    /* some titles exceed 255 characters */
   jour smallint not null, /* separate table of journal names */
   year smallint not null,
   volume smallint,
   pageS smallint,         /* starting page */
   pageF smallint          /* final page */
   )

The authors and keywords are stored in separate tables.  Because of
the way we get the data, I think I will store the complete name of the
author in one column.  The tables would be

 create table authors (
   seq int not null,
   name varchar(50) not null,
   index (seq),
   index (name))

 create table keywords (
   seq int not null,
   keyword varchar(30) not null,
   index (seq),
   index (keyword))

There can be several authors for the same journal article.  How do I
find the articles written by a particular combination of authors?  For
example, if the table contained

mysql> select * from articles;
+-----+--------------------+------+------+--------+-------+-------+
| seq | title              | jour | year | volume | pageS | pageF |
+-----+--------------------+------+------+--------+-------+-------+
|   1 | Now is the time    |    1 | 1999 |     20 |     1 |   200 |
|   2 | For all good men   |    1 | 1999 |     20 |   201 |   222 |
|   3 | To come to the aid |    3 | 1999 |     20 |   223 |   240 |
+-----+--------------------+------+------+--------+-------+-------+
3 rows in set (0.00 sec)

mysql> select * from authors;
+-----+---------+
| seq | name    |
+-----+---------+
|   1 | Able    |
|   1 | Baker   |
|   2 | Able    |
|   2 | Charlie |
+-----+---------+
4 rows in set (0.00 sec)

how do I select only those articles that were written by 'Able' and
'Charlie' (article 2, in this case)?  I can easily get those written
by 'Able' using

mysql> select a.* from articles as a, authors as w where a.seq = w.seq 
    ->     and w.name like 'Able';
+-----+------------------+------+------+--------+-------+-------+
| seq | title            | jour | year | volume | pageS | pageF |
+-----+------------------+------+------+--------+-------+-------+
|   1 | Now is the time  |    1 | 1999 |     20 |     1 |   200 |
|   2 | For all good men |    1 | 1999 |     20 |   201 |   222 |
+-----+------------------+------+------+--------+-------+-------+
2 rows in set (0.00 sec)

but for 'Able' and 'Charlie' I keep coming up with an empty set

mysql> select a.* from articles as a, authors as w1, authors as w2 
    ->   where a.seq = w1.seq and w1.seq = w2.seq and w1.name like 'Able'
    ->   and w2.seq like 'Charlie';
Empty set (0.01 sec)

I hope I am missing something obvious and you can enlighten me.
Thread
Bibliographic databasesDouglas Bates17 Feb
  • Re: Bibliographic databasesThimble Smith17 Feb
  • Re: Bibliographic databasesDouglas Bates17 Feb