Issues in Science and Technology Librarianship
Spring 2005
DOI:10.5062/F4P55KFS

[Refereed Article]

Evaluating Bibliographic Database Overlap for Marine Science Literature Using an Ecological Concept

Joan Parker
Librarian
Moss Landing Marine Laboratories/Monterey Bay Aquarium Research Institute
parker@mlml.calstate.edu

Introduction

Understanding the overlaps and disparities between bibliographic databases is an essential tool in any librarian's repertoire. Marine science librarians may find this understanding even more important because the field of marine science spans almost every scientific discipline. Familiarity with the literature of biology and zoology, as well as that of chemistry, geology, and physics, is required. There is no ultimate database to point to as the authority (Kelland 1989). All databases covering the scientific literature must be considered appropriate at one time or another for marine scientists. The ultimate challenge is to understand when and why databases overlap, and then how to explain the nature of that overlap to scientists.

Studies of database overlap are not new to the library literature.  Bearman and Kunberger (1977) are usually recognized as authors of the first quantitative analysis of database overlap.  Studies since then have varied from descriptions to rigorous analytical examination, such as Hood and Wilson (2003).  Database overlap studies relevant to marine sciences include Chisman's (1989) comparison between Zoological Record and Biological Abstracts, Markham's review of resources for macroalgae research (1990) and Hughes' (2001) analysis of unique items in Zoological Record.  With reasonable coverage of this topic in the library and information science literature, is another analysis justified?

First, a recent quantitative comparison of multiple bibliographic databases commonly used by marine scientists could not be found in the literature. Second, previous quantitative assessments of database overlap relied on multiple file searching using Dialog, with results numbering in the thousands. Could another method, using the same database interface available to end users and far fewer citations to analyze, produce equally valid results?

A review of one measure from the library literature, defined by Gluck (1990) as traditional overlap, revealed it to be analogous to an index frequently used in ecological studies to quantify the comparison of species composition between two populations (Ludwig and Reynolds 1988; Krebs 1999). With a straightforward binary version of a similarity index, the presence or absence of a species is noted for each population. Abundance of each species is not factored into the equation.  The purpose of this research was to see if substituting databases for populations and sources indexed for species provides a tool for quantifying the similarity of two bibliographic databases on a given topic using an equation familiar to marine scientists.

Methods

Four databases were selected as the most appropriate and most likely to be available to marine scientists: Aquatic Sciences and Fisheries Abstracts (ASFA), Biosis, Zoological Record (ZR) and Web of Science (WoS). Three of these are classic abstracting and indexing services. ASFA offers broad coverage of all topics of interest to marine and ocean scientists. The Food and Agriculture Organization of the United Nations, together with its international input centers and its commercial partner (CSA), produces this database. Since 1864 Zoological Record has made every attempt to cover the literature of zoology comprehensively, regardless of publication type or country of publication. Six decades later Biological Abstracts, now Biosis, was published to cover the field of theoretical and applied biology comprehensively. Biosis and Zoological Record, although originating independently, are now both owned by Thomson. While the degree of overlap between these two databases should be solely attributable to differences in content coverage, it cannot be assumed that shared ownership is not a factor. Web of Science, a citation database rather than an abstracting and indexing service, was included because it remains a perennial favorite of scientists.

Five marine science search topics were selected based on their generality and a high level of current research interest: Pacific herring, hydrothermal vents, diving behavior (or behaviour), coral bleaching, and marine protected areas. These phrases were searched only in title fields to avoid any discrepancies in indexing, as suggested by Jacso (1999). Care was taken to ensure consistent application of phrase searching across all database interfaces. Publication year limits were also imposed to eliminate database currency as a confounding factor. Only citations from 2000 to 2003 were included in the results.

Prior to conducting any searches it was decided that retrieving more than 100 or fewer than 40 citations from any database would be cause for discarding that topic. While this may appear arbitrary, the ultimate goal was to develop a technique that provided both valid and manageable search results. Retrieval of more than 100 citations would require significant amounts of time to match across four databases. Retrieval of fewer than 40 citations could result in too few sources for meaningful comparison across all databases.
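The screening rule above can be sketched in a few lines of Python. This is only an illustration of the 40 to 100 citation window: the topic names and citation counts are hypothetical and are not data from this study, and the function name `topic_is_usable` is an assumption introduced for the example.

```python
# Citation window described in the Methods: a topic is discarded if ANY
# database returns more than 100 or fewer than 40 citations.
MIN_CITATIONS = 40
MAX_CITATIONS = 100


def topic_is_usable(counts_by_database):
    """Return True only if every database returned 40 to 100 citations."""
    return all(MIN_CITATIONS <= n <= MAX_CITATIONS
               for n in counts_by_database.values())


# Hypothetical counts, for illustration only (not results from the study).
candidate_topics = {
    "topic A": {"ASFA": 85, "ZR": 47, "Biosis": 62, "WoS": 58},
    "topic B": {"ASFA": 140, "ZR": 35, "Biosis": 90, "WoS": 77},
}

usable = [topic for topic, counts in candidate_topics.items()
          if topic_is_usable(counts)]
print(usable)  # ['topic A']
```

Here the hypothetical "topic B" is dropped because one database exceeds 100 citations and another falls below 40; either condition alone would be enough.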

A pilot test was conducted on one topic, Pacific herring, to verify that sources, rather than individual citations, were the most appropriate equivalent to species for the comparison. The data from this pilot demonstrate that using citations skews the resulting coefficients in either direction. In fact, the indexing of non-journal literature, such as books and conference proceedings, where multiple citations from the same source are present, introduces abundance into an equation that is valid for presence or absence only. Relying on sources also avoids bias that could be introduced by selective indexing of any source by the database producer.

For each topic and database, the first appearance of a source was noted and subsequent occurrences were ignored. These lists of sources by topic were then compared for each pair of databases, with presence indicated by 1 and absence by 0.
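As a rough sketch of this step, the Python below reduces each database's citation list to its unique sources and then builds a presence/absence table for one database pair. The source titles are hypothetical placeholders and the function names are assumptions made for the example; the study itself describes the procedure only in prose.

```python
def unique_sources(citation_sources):
    """Keep the first appearance of each source, ignoring later repeats."""
    # dict.fromkeys preserves insertion order, so first appearances win.
    return list(dict.fromkeys(citation_sources))


def presence_table(first_db, second_db):
    """Return (source, in_first, in_second) rows; 1 = present, 0 = absent."""
    first = unique_sources(first_db)
    second = unique_sources(second_db)
    all_sources = unique_sources(first + second)
    return [(s, int(s in first), int(s in second)) for s in all_sources]


# Hypothetical data: one source title per retrieved citation.
db_one = ["Journal A", "Journal B", "Journal A", "Proceedings C"]
db_two = ["Journal B", "Journal D", "Journal B"]

table = presence_table(db_one, db_two)
for row in table:
    print(row)
```

Because repeated titles are collapsed before the comparison, a proceedings volume that contributes several citations counts once, which keeps abundance out of the presence/absence data.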

The Jaccard Index (Sj), a similarity coefficient known to avoid bias in small sample sizes, was calculated using the formula:

Sj = a/(a+b+c), where
a = number of sources common to both databases
b = number of sources unique to the first database
c = number of sources unique to the second database.

The resulting coefficient can be multiplied by 100 and expressed as a percent. The higher the percentage, the greater the similarity between results. For example, a coefficient of .60 can be expressed as 60% similar or 40% dissimilar. Practically speaking, a score of .60 means that 40% of the combined sources appear in only one of the two databases, so searching only one of them could overlook up to 40% of those sources.
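A minimal Python sketch of the calculation follows, assuming each database's results have already been reduced to a list of unique sources. The source labels S1 to S10 are hypothetical and were chosen so that the coefficient works out to .60; the function name `jaccard` is an assumption for the example.

```python
def jaccard(first_sources, second_sources):
    """Return (a, b, c, Sj) for two collections of source names."""
    first, second = set(first_sources), set(second_sources)
    a = len(first & second)   # sources common to both databases
    b = len(first - second)   # sources unique to the first database
    c = len(second - first)   # sources unique to the second database
    if a + b + c == 0:
        raise ValueError("both source lists are empty")
    return a, b, c, a / (a + b + c)


# Hypothetical source lists (not data from the study).
first_db = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
second_db = ["S1", "S2", "S3", "S4", "S5", "S6", "S9", "S10"]

a, b, c, sj = jaccard(first_db, second_db)
print(a, b, c, sj)  # 6 2 2 0.6

# Share of the combined source list missed by searching one database alone.
missed_if_first_only = c / (a + b + c)
missed_if_second_only = b / (a + b + c)
print(missed_if_first_only, missed_if_second_only)  # 0.2 0.2
```

The dissimilar 40% is b and c taken together. How much a single-database search actually misses depends on how that remainder is divided between the two databases; in this hypothetical case it splits evenly.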

Results and Discussion

The resulting coefficients ranged from .27 to .70 meaning all comparisons shared at least 27% of the sources while none shared more than 70%. These results conform to Bradford's Law of Scatter. Similarity coefficients for each database pair are grouped by topic in Table 1.  Examining these results in detail shows the highest coefficients were found for database comparisons using Pacific herring as the search term, an average of .631.  The highest single similarity, a 70% overlap, was obtained from the comparison between Zoological Record and Biosis and the lowest, 52% overlap, from the ASFA-Zoological Record comparison. These results lend credence to the idea that the literature of fish and fisheries is reasonably well controlled from a bibliographic perspective.

Table 1: Similarity coefficients by topic
Database pair  Pacific herring   Hydrothermal vents   Diving behavior   Coral bleaching   Marine protected areas
ASFA-ZR        .52               .327                 .485              .50               .275
ASFA-Biosis    .636              .351                 .484              .545              .393
ASFA-WoS       .65               .50                  .60               .571              .302
ZR-Biosis      .70               .50                  .628              .421              .375
ZR-WoS         .632              .357                 .545              .514              .42
Biosis-WoS     .65               .45                  .621              .636              .50
Mean           .631              .414                 .561              .531              .378

Diving behavior coefficients ranged from .484 to .628 with an average value of .561. Again, the highest similarity was found between Zoological Record and Biosis and the lowest, a virtual tie, between ASFA-Biosis and ASFA-Zoological Record.  Coral bleaching searches resulted in a similar range, .421 to .636, but with a slightly lower average, .531.  Here the greatest similarity occurred between Biosis and Web of Science with the Zoological Record-Biosis comparison overlapping the least, a notable divergence from other topics. While ASFA and Zoological Record are the next most dissimilar, the 42% overlap between Zoological Record and Biosis is relatively unusual. Closer examination of the non-overlapping sources provides no greater understanding of the reasons apart from basic indexing policies, specifically Zoological Record's strong tendency toward international coverage.

Hydrothermal vent searches showed less overlap than previous topics, with an average value of .414.  The most similar database pairs were ASFA-Web of Science and Zoological Record-Biosis, each with similarity coefficients of .50. ASFA and Zoological Record showed the least amount of database overlap at 33%.

Marine protected area searches produced the lowest coefficients of this study. The pair with the highest similarity value in this category was Biosis-Web of Science at .50, while the lowest (.275) was once again the ASFA-Zoological Record comparison.

While the searches on Pacific herring may indicate a well-controlled literature, these lower values provide support for the opposite conclusion. Topics such as hydrothermal vents and marine protected areas, which are studied by ecologists, geologists, marine biologists and oceanographers, may be published in an array of sources, none of which may be covered at a greater than 50% level by any secondary source. In general, the broader the topic, the lower the similarity index, with breadth defined as publication across multiple marine science disciplines.

Across-topic affinity between databases (Table 2), calculated as the average of coefficients, generally indicates that Biosis and Web of Science were the most similar pair and ASFA and Zoological Record the least similar. The 57% overlap between Biosis and Web of Science across topics is most likely due to the nature of the latter. Since Web of Science is a citation database, there is some expectation that it captures the core literature especially well. In contrast, ASFA is regarded as a resource for proceedings, agency publications, reports from developing countries and other types of gray literature. With only five topics searched, these across-database averages can only be interpreted as possible trends.

Table 2: Mean results by database pair
ASFA-ZR   ASFA-Biosis   ASFA-WoS   ZR-Biosis   ZR-WoS   Biosis-WoS
.421      .482          .525       .525        .494     .571
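The averages in Tables 1 and 2 can be recomputed from the individual coefficients in Table 1. The Python sketch below does so with exact decimal arithmetic; rounding half up to three places is an assumption (the rounding rule is not stated in the text), though it reproduces the published means. The short topic keys and the `mean` helper are names introduced for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

TOPICS = ["herring", "vents", "diving", "coral", "mpas"]

# Coefficients copied from Table 1, one list per database pair,
# in the same topic order as TOPICS.
TABLE_1 = {
    "ASFA-ZR":     [".52", ".327", ".485", ".50", ".275"],
    "ASFA-Biosis": [".636", ".351", ".484", ".545", ".393"],
    "ASFA-WoS":    [".65", ".50", ".60", ".571", ".302"],
    "ZR-Biosis":   [".70", ".50", ".628", ".421", ".375"],
    "ZR-WoS":      [".632", ".357", ".545", ".514", ".42"],
    "Biosis-WoS":  [".65", ".45", ".621", ".636", ".50"],
}


def mean(values):
    """Average decimal strings, rounded half up to three places (assumed)."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.001"),
                                          rounding=ROUND_HALF_UP)


# Mean per topic (the "Mean" row of Table 1).
topic_means = {
    topic: mean([row[i] for row in TABLE_1.values()])
    for i, topic in enumerate(TOPICS)
}

# Mean per database pair across the five topics (Table 2).
pair_means = {pair: mean(row) for pair, row in TABLE_1.items()}

for name, value in {**topic_means, **pair_means}.items():
    print(f"{name:12s} {value}")
```

Decimal arithmetic is used instead of floats because two of the averages (diving behavior and marine protected areas) fall exactly on a rounding boundary at the fourth decimal place.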

Identifying sources found in only one database provides additional understanding of database overlap. Table 3 lists, by topic, the names of each unique source.  General trends, notably the inclusion of dissertations and gray literature in ASFA, book chapters in Zoological Record and Biosis, and abstracts from major US scientific meetings in Biosis, highlight the indexing policies of each database producer.  However, each topic produced some surprising results.

Table 3. Sources found in only one database
Pacific herring
ASFA Zoo Record Biosis Web of Science
Dissertation Abstracts
Reproductive Physiology of Fish
  Zoological Science  
Diving behavior
ASFA Zoo Record Biosis Web of Science
Proc. 5th Channel Islands Symp.
Dissertation Abstracts
NOAA Tech. Memo
Otsuchi Marine Science
Monachus Guardian
Aquabiology
Reptilia
Polar Bioscience
Antarctic Ecosystems (book)
Zoological Science
Zoology
Advances in Ecology
Journal of Marine Science
Neurology
Coral bleaching
ASFA Zoo Record Biosis Web of Science
WIOMSA
Harmful Algal Blooms
DATZ
Recent Adv. Coastal Ecol.
Proc. 9th Int. Coral Reef Symp.
Ocean. Proc. CR (book)
Qatar Univ. Science J.
ESA Meeting Abs.
Zoological Science
Trends in Ecol. Evol.
Plant & Cell Phys
ASM News
Hydrothermal vents
ASFA Zoo Record Biosis Web of Science
Dissertation Abstracts
Proc. 2nd Int. Symp. DSHV
Acta Oceanologica Sinica
Encyclopedia of Ocean Sciences
Oceans 2000
Univ. Bretagne (dissertation)
Japanese J. of Benthology
DSR Part I
Przeglad Geologiczny
Bfn-Skripten
Ecology deepsea h/t vents (book)
Abs. Gen. Meeting ASM
J. Shellfish Research
FEMS Cong. Euro Micro Abs
Biol. Systems Ext. Cond (book ch.)
Doklady Akademii Nauk
Patent
Astrobiology
Comp. Biochem & Phys. A
AAAS Annual Meeting Abs
Life and Environment
Doklady Earth Science
Geochimica et Cosmo
Geothermics
Geology
Envir. Micro
Geology
Comptes Rendus Chim.
Geophys. Res. Let.
Econ. Geol
Geotimes
Marine protected areas
ASFA Zoo Record Biosis Web of Science
IUCN (Towards a strategy...)
Boston College Env. Affairs Law Rev
Research on Fisheries - 5th halieumetric
Proc. 12th Biennial Coastal Zone Conf
Economics of MPA (UBC Conf.)
CEMARE Research Paper
6th Asian Fish Forum Abs.
NAFO Science Council Res. Doc.
WIOMSA Sci. Symp. Report
Encyclopedia of Ocean Sciences
MPAs Gulf of Maine-MIT SeaGrant
ACP-EU Fisheries Res. Rep.
European Environ. Law Review
Workshop - Scallops
NSW Fisheries Final Report
Ocean Development & Int. Law
Wild Earth
Bahamas J. of Science
Monachus Guardian
Tas. Aq. & Fish. Inst. Tech Report
Proc. 9th Int. Coral Reef Symp.
Biologia Marina
Gulf Ecosystem (book chapter)
Elepaio
Endangered Species Update
Bfn-Skripten
Marine Community Ecol (book ch.)
Alaska SeaGrant Report
MPA's: tools (NAS book)
GBRMPA Res. Pub.
AFS Annual Meeting Abs.
Gulf & Caribbean Res.
Cybium

As expected, Pacific herring resulted in very few unique sources and marine protected areas very many. In between are some notable occurrences. Searches on diving behavior retrieved a NOAA Technical Memorandum uniquely from Zoological Record. As NOAA is the U.S. partner in the creation of the ASFA database, the absence of this citation from ASFA is unexpected. Likewise, only Biosis returned a result from the Journal of Marine Science. More unique sources for this search term were obtained from Zoological Record, which is not too surprising as this topic is inherently a zoological one.

Coral bleaching results followed the general trend in indexing policies described above except for the presence of the Proceedings of the 9th International Coral Reef Symposium exclusively in Zoological Record.  The topic of hydrothermal vents also resulted in marine science journals uniquely retrieved from Zoological Record (Deep Sea Research Part I) and Biosis (Journal of Shellfish Research).  Web of Science contained a large number of unique sources that generally represented the literature of geology.

The topic of marine protected areas serves as the best example of the overall richness of the ASFA database for marine science literature.  However, the fact that Zoological Record included almost as many unique sources must be noted.  Again only Biosis retrieved Gulf and Caribbean Research, a journal that is listed as a priority source on the list of serials indexed by ASFA. 

Conclusions

How do these four databases compare in terms of coverage for typical marine science topics? Is it possible to achieve some quantitative understanding of their similarity without undertaking a dissertation-level analysis? These are the questions this study attempted to answer.

In answer to the first, this study serves as further reinforcement of the wise practice of librarians to recommend searching multiple databases for full discovery of the available literature on marine science topics.  Even the most similar databases resulted in no more than a 70% overlap for any topic.  One glaring finding was the omission of core marine science journal issues from ASFA. For marine science librarians this is not unexpected as the true value of this database stems from its coverage of the worldwide aquatic sciences literature, not its indexing of core marine science journals. Unfortunately, ASFA's strength and the implications for literature searching may not be widely understood by scientists. 

Including Web of Science in this study was done solely to prove that it should not serve as an ultimate resource for marine scientists.  In only one case, the topic of hydrothermal vents, were many unique sources discovered from Web of Science.  Furthermore, these results were almost exclusively from the mainstream literature of geology, a subject no other database in this study was expected to cover well. All other databases overlapped Web of Science between thirty and sixty-five percent, falling within the upper and lower ranges of this study.

The second question arose from a need to find a relatively rapid means of providing a quantitative assessment of the similarities between two databases. Although it is impossible to prove the validity of this study, its general agreement with Bradford's Law suggests it may indeed be an acceptable method for quantifying overlap between databases. More importantly, each database comparison from start to finish required no more than an uninterrupted hour to collect, compile and analyze. This technique clearly does not match the stringent methods of an information science research project. However, it does present an alternative way for librarians to inform users of differences between seemingly similar resources without resorting to anecdotal evidence.

Literature Cited

Bearman, T.C. and W.A. Kunberger. 1977. A Study of Coverage Overlap Among Fourteen Major Science and Technology Abstracting and Indexing Services. Philadelphia: National Federation of Abstracting and Indexing Services.

Chisman, J.K. 1989. Zoological Record, Biological Abstracts and Biological Abstracts/RRM: a comparison of overlap. RQ. 29(Winter):242-247.

Gluck, M. 1990. A review of journal coverage overlap with an extension to the definition of overlap. Journal of the American Society for Information Science. 41(1):43-60.

Hood, W.W. and C.S. Wilson. 2003. Overlap in bibliographic databases. Journal of the American Society for Information Science and Technology. 54(12):1091-1103.

Hughes, J. 2001. Characterization of unique serials indexed in the Zoological Record. Issues in Science and Technology Librarianship. Number 30, Spring 2001 [Online]. Available: http://www.istl.org/01-spring/refereed.html [Accessed: 7 July 2004].

Jacso, P. 1999. Database content evaluation criteria. In: Content Evaluation of Web and CD-ROM Databases. Englewood, CO: Libraries Unlimited.

Kelland, J.L. 1989. Marine science information in non-marine databases. In: Marine Science Information Throughout the World: Sharing the Resources. (ed. by C.A. Winn, R.W. Burkhart, & J.C. Burkhart), pp. 133-137. Fort Pierce, Fla.: IAMSLIC.

Krebs, C.J. 1999. Ecological Methodology. 2nd edition. Menlo Park, CA: Benjamin Cummings.

Ludwig, J.A. and J.F. Reynolds. 1988. Statistical ecology: a primer on methods and computing. New York: Wiley.

Markham, J.W. 1990. Online retrieval strategies and database comparisons for literature on macroalgae. In: IAMSLIC at the Crossroads: Proceedings of the 15th Annual Conference (ed. by R.W. Burkhart & J.C. Burkhart), pp. 157-161. St. Petersburg, Fla.: IAMSLIC.
