Issues in Science and Technology Librarianship	Winter 1999

DOI:10.5062/F4HX19P4

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Optimizing Web Access to Geospatial Data: The Cornell University Geospatial Information Repository (CUGIR)

Philip Herold
Information Services Coordinator
Albert R. Mann Library, Cornell University
ph31@cornell.edu

Thomas D. Gale
Programmer/Analyst Librarian
Albert R. Mann Library, Cornell University

Thomas P. Turner
Metadata Librarian
Albert R. Mann Library, Cornell University

Abstract

With the aid of a 1997 Federal Geographic Data Committee CCAP Award, Cornell's Albert R. Mann Library recently established the Cornell University Geospatial Information Repository (CUGIR), a Web-based clearinghouse containing geospatial data and metadata related to New York State. The staff at Mann Library has established an efficient model for spatial data distribution. This paper describes the processes, problems, and solutions involved in the creation of a geospatial data distribution system.

Introduction

Libraries with collections of digital geospatial data, and those that serve and support the use of geographic information systems, are utilizing the Internet and World Wide Web as an efficient and flexible means to distribute collections of geospatial data. Creating a Web-based distribution system requires librarians to understand and address a wide array of issues surrounding geospatial data, including metadata and standards, partnerships and liability, and data organization and technical infrastructure. Although some of these issues mirror those associated with other, more commonly known and understood library resources, geospatial data contain attributes that require special attention and an understanding of both cartographic and geographic concepts.

Staff at the Albert R. Mann Library at Cornell University began looking at ways to disseminate geospatial data from Mann's collections via the World Wide Web in 1995, and in 1998 established a Web-based clearinghouse for New York State geospatial data and metadata. Building a clearinghouse entailed creating partnerships with local, state and Federal agencies, understanding how to interpret and apply the Federal Geographic Data Committee (FGDC) Content Standard for Geospatial Metadata, and designing a search and retrieval interface and a flexible, scalable data storage system. These tasks brought both anticipated and unforeseen challenges. This paper will examine the data dissemination model that Mann Library has adopted, and will explore the tasks and challenges that model has presented.

Why Build a Data Clearinghouse?

Since the release of the U. S. Census Bureau's TIGER/Line 1990 files in 1991, Mann Library has made strong efforts to support the use of geospatial data and geographic information systems by University faculty, students, and staff. In the early 1990s GIS applications were not widely used because there was a lack of available digital data covering fundamental aspects of geography, software applications were immature and difficult to use, and GIS technology was relatively new and its applications were not well known. In the past eight years all of these problems have largely been mitigated.

There remain, however, several impediments to the successful utilization of GIS and geospatial data. One difficulty is the high degree of technical understanding that accompanies using sophisticated and powerful GIS applications. A second issue is the requirement that users understand important cartographic and geographic concepts related to GIS. A third obstacle is the relative difficulty in accessing geospatial data sets required by users to complete projects using GIS. It is the third impediment that poses the greatest challenge to many libraries, because geospatial data is a specialized resource, and a relatively new addition to library collections.

Mann Library makes efforts to alleviate all three of these impediments, by offering workshops, self-paced tutorials, thorough documentation, and flexible consulting services designed to help users achieve the technical and conceptual understandings necessary to use GIS in their work and study. However, even for users with the requisite understanding, providing ready access to the geospatial data needed by Mann's users is problematic because there is a relative scarcity of geospatial data in usable digital formats. Most digital geospatial data are derived by converting existing analog map information into digital formats through digitizing, scanning, or geocoding processes. Most often, digital geospatial data are produced by local, state, and Federal government agencies, where the creation and distribution of this data is typically slow and scattershot. The result is that many fundamental data sets either do not yet exist, or are incomplete. The difficult task of libraries is to identify, acquire, and provide access to those data sets that are complete.

To provide fast, easy access to geospatial data in a well-organized fashion, Mann Library staff designed a Web-based system for data distribution. In our first attempt at this, in 1996, Mann staff worked with the Cornell Institute for Social and Economic Research (CISER) to convert parts of the U. S. Census Bureau's TIGER/Line 1992 files (Herold 1996). Six separate coverages (transportation, hydrography, and four sets of census and political boundaries) were converted for each of New York State's sixty-two counties and organized into a Web site with browsing tools, help, and non-standardized metadata. Users could select a county by name or from an image map and then download geospatial data describing that county.

The success of the New York State TIGER/Line system served as an impetus to develop an expanded and improved Web-based service. In 1997, Mann Library was awarded a one-year grant from the FGDC's Competitive Cooperative Agreements Program (CCAP) to build a clearinghouse node as part of the National Spatial Data Infrastructure (NSDI) Federal Geospatial Clearinghouse. The FGDC's CCAP program is designed to provide seed money (up to $40,000 in 1997) to institutions that undertake one of several types of initiatives towards building, on a local, regional, or national level, the infrastructure for creating, distributing and sharing geospatial data or standards.

Mann Library's clearinghouse node is one of over 90 such nodes located around the world (most located in North America), containing searchable metadata records describing geospatial data sets. All nodes are located on data servers using either the Z39.50 or a compatible information retrieval protocol. As a result, they can be linked to a single search interface called the Geospatial Data Clearinghouse (Federal Geographic Data Committee Geospatial Data Clearinghouse Entry Points) where the metadata contents of all 90 nodes, or any subset in combination, can be searched simultaneously. In addition, most clearinghouse nodes have their own Web sites and customized browsing and searching interfaces.

The CCAP program requires funded agencies to establish partnerships with outside agencies. Mann Library, which services Cornell's College of Agriculture and Life Sciences, College of Human Ecology, and Divisions of Biological Sciences and Nutrition, is primarily interested in working with agencies that produce and own geospatial data related to agriculture, environmental sciences, and selected social sciences. We approached the New York State Department of Environmental Conservation, the owner of many key data sets related to agriculture and the environment, and the Cornell Soil Information Systems Laboratory, where soil survey maps are currently being digitized from analog media, about forming data sharing partnership agreements.

In developing an NSDI Clearinghouse Node, Mann Library and its partners proposed to the FGDC the following objectives:

to establish and manage a National Geospatial Data Clearinghouse Node; to be accessible remotely, both through the Cornell University Library Web site and online catalog and through the NSDI Clearinghouse;
to inventory, document and provide access to geospatial data holdings of Mann Library, the New York State DEC, and the Soil Information Systems Laboratory, in accordance with existing FGDC-endorsed Content Standards for Digital Geospatial Metadata;
to develop a plan to acquire, disseminate, and assist in the development of new data products in the agricultural and environmental sciences; and
to create an on-line, ANSI/ISO Z39.50-compliant database of geospatial metadata; to be browsable and searchable (by fields, coordinates, and free-text keywords or phrases).

Cornell's Clearinghouse Node would serve to further NSDI objectives by:

participating as a node of the National Geospatial Data Clearinghouse;
providing standardized documentation of data adhering to FGDC Content Standards for Metadata;
initiating data collection and sharing within the State of New York; and
providing standardized means of on-line access to geospatial metadata and data utilizing a platform-independent information retrieval protocol.

The development of CUGIR has been accomplished through a team-based model of work and cooperation. Project staff were selected from each division within Mann Library, including Public Services, Technical Services, Collection Development and our Information Technology Section. The primary working group consisted of five regular members, each coordinating work within his or her area of specialty. Other Library staff participated on an as-needed basis. Primary responsibilities for the overall coordination of clearinghouse development were held by a Public Services Librarian with significant experience using and advising in the use of geographic information systems and geospatial data.

Data Definition, Identification, and Preparation

To develop a clearinghouse node or data repository, developers need to define the nature of the data to be collected and create a plan to develop or acquire that data. We began by creating a working collection development policy that established the criteria for data selection. In creating this document the working group addressed a number of issues that would prove essential to developing a clear collection scope and would create an identity for and give purpose to the clearinghouse. The document also serves to define the philosophy of CUGIR, specifically regarding issues of use and access. There are no downloading access restrictions imposed on CUGIR data. CUGIR collects and makes available geospatial data and metadata describing the agricultural, environmental, biological, and social characteristics of New York State. CUGIR also collects data described by the FGDC as "Framework" data -- data that has wide applicability in spatial analysis and use of geographic information systems. This type of data includes transportation, hydrographic, soils, elevation, cadastral, and other commonly used data themes. It is our intention to make data available in the most widely used data formats and in multiple formats when feasible, given the limitations and restrictions of existing resources. The collection development policy clearly states these guidelines and is included on the CUGIR site.

Once CUGIR's scope was clearly defined, staff identified data sets for preparation, documentation, and inclusion in the clearinghouse. We received an inventory from the New York State Department of Environmental Conservation (NYSDEC) and met with its representatives in June 1997 to discuss plans to create metadata for, and select for inclusion, several data sets at the state, county, and 7.5-minute quadrangle levels. We also received a status report from the Cornell Soil Information Systems Laboratory (SISL) indicating that several counties and quadrangles were in progress, with several others awaiting Federal certification. In addition, we created an inventory of data sets in Mann Library's collections that met the criteria for inclusion. Documentation and preparation of data sets to be included were prioritized, with Mann Library holdings placed directly after NYSDEC data.

Data preparation was one of the more significant activities and accomplishments of the CUGIR team. Although most data sets coming to CUGIR from agencies outside Mann Library were in the agencies' native formats and required no conversion, there was a significant amount of data conversion that took place in-house. An Arc/Info programmer was hired to perform the conversion of raw TIGER/Line 1995 data into both Arc/Info coverage (which was packaged in Arc/Info interchange (export) format for distribution) and shapefile formats. This programmer converted eleven coverages, including roads, railroads, hydrography, landmarks, and county, minor civil division, place, census tract, census block group, census block, and unified school district boundaries. The coverages were developed for each of New York State's 62 counties (a total of 682 unique geographic themes) in two formats (a total of 1,364 files derived from TIGER/Line 1995). The shapefiles were then archived using UNIX tar and compressed using the public domain software Gzip (GNU zip). Similar geospatial data processing was carried out for several USGS-produced framework-level Digital Line Graph (DLG) small-scale themes for New York.
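The file counts and the packaging step described above can be sketched in a few lines. This is only an illustration in Python, not the UNIX tar and Gzip commands the team actually ran; the function names, the file stem used in the usage note, and the choice to bundle the .shp, .shx, and .dbf components of a shapefile are assumptions made for the example.

```python
import tarfile
from pathlib import Path

COUNTIES = 62    # New York State counties
COVERAGES = 11   # themes converted from TIGER/Line 1995
FORMATS = 2      # Arc/Info export and shapefile


def expected_counts():
    """Return (unique geographic themes, total distributed files)."""
    themes = COUNTIES * COVERAGES
    return themes, themes * FORMATS


def bundle_shapefile(directory, stem, extensions=(".shp", ".shx", ".dbf")):
    """Archive the components of one shapefile into <stem>.tar.gz.

    Equivalent in spirit to running tar followed by gzip; components that
    are not present in the directory are skipped.
    """
    directory = Path(directory)
    archive = directory / f"{stem}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for ext in extensions:
            part = directory / f"{stem}{ext}"
            if part.exists():
                # Store only the file name, not the full path.
                tar.add(part, arcname=part.name)
    return archive
```

Calling `expected_counts()` reproduces the figures given above (682 themes, 1,364 files), and `bundle_shapefile("some_dir", "109rr")` would package a hypothetical Tompkins County railroads shapefile.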

It should be noted that the data conversion was performed in a way that is both scalable and replicable. Arc/Info AML (Arc Macro Language) scripts were created to automate conversion processes and run them on batch files. These need only be rerun to regenerate the same types of files in the future, and we anticipate that they can be used to convert data from the Census Bureau's 1997 release of TIGER files. AMLs created for CUGIR's data sets will be shared with others who wish to do their own conversion of TIGER. It should also be noted that conversion is a complicated and time-consuming process. It required considerable amounts of time and energy to create AMLs that ran successfully, and to include in them data improvements such as the creation of keycode fields (concatenations of FIPS codes identifying unique polygons) for census designated areas including block groups and blocks.
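The idea of a keycode field can be illustrated with a short sketch. The example below is written in Python rather than AML, and it assumes the usual Census field widths (two digits for state, three for county, six for tract, one for block group); the function name and the exact composition of the key are assumptions for illustration, not a description of the fields the CUGIR AMLs produced.

```python
def make_keycode(state, county, tract, block_group=None):
    """Concatenate zero-padded FIPS components into one identifying key.

    Assumed widths: state 2, county 3, tract 6, block group 1.
    """
    parts = [
        str(state).zfill(2),
        str(county).zfill(3),
        str(tract).zfill(6),
    ]
    if block_group is not None:
        parts.append(str(block_group))
    return "".join(parts)
```

For instance, `make_keycode("36", "109", "000100", "1")` joins the New York State code (36), the Tompkins County code (109), a tract, and a block group into a single value that can serve as a unique join key for that polygon.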

Standards and Metadata

Sharing geospatial data over the World Wide Web involves communication between remote users and data providers. For this process to be successful, metadata and information retrieval standards serve as the couriers between these groups and are central to the entire process. Basic information about geospatial and other forms of data should answer the following questions. What information is produced? Who created it? For what purpose was it created? What was the process used to create it? This data about data is called metadata (Federal Geographic Data Committee Metadata). In June, 1994, the Federal Geographic Data Committee (http://www.fgdc.gov/) established a metadata standard for describing geospatial data that can be used to assist in this process. The Content Standard for Digital Geospatial Metadata defines a minimal set of information that must be recorded about all geospatial data as well as an optional list of more detailed information. The FGDC describes the standard as a method for communicating between producers and users:

"The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data, to determine the fitness of the set of geospatial data for an intended use, to determine the means of accessing the set of geospatial data, and to successfully transfer the set of geospatial data. As such, the standard establishes the names of data elements and compound elements to be used for these purposes, the definitions of these data elements and compound elements, and information about the values that are to be provided for the data elements." (Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata).

The Content Standard defines seven basic types of information that potential users might need to know: Identification Information; Data Quality Information; Spatial Data Organization Information; Spatial Reference Information; Entity and Attribute Information; Distribution Information and Metadata Reference Information (Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata). Of these areas only Identification Information (basic information about the file such as originator, abstract, and purpose) and Metadata Reference Information (information about the production of the metadata) are defined as being mandatory for all records. All the other areas of the standard are mandatory if applicable. Within each section are sub-fields that can be defined as mandatory, mandatory if applicable, or optional. This flexibility allows metadata creators to determine the level of detail that they can provide or support based on perceived user needs. It guarantees that at least basic metadata will be recorded about each data set. Hart and Phillips (1998) provide a useful overview of metadata creation.
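The obligation levels described above lend themselves to a simple completeness check. The following Python sketch assumes a metadata record represented as a dictionary keyed by section name; that representation and the function name are hypothetical and are not part of the FGDC standard or its tools.

```python
MANDATORY = "mandatory"
IF_APPLICABLE = "mandatory if applicable"

# The seven main sections of the Content Standard and their obligations,
# as summarized in the text above.
SECTIONS = {
    "Identification Information": MANDATORY,
    "Data Quality Information": IF_APPLICABLE,
    "Spatial Data Organization Information": IF_APPLICABLE,
    "Spatial Reference Information": IF_APPLICABLE,
    "Entity and Attribute Information": IF_APPLICABLE,
    "Distribution Information": IF_APPLICABLE,
    "Metadata Reference Information": MANDATORY,
}


def missing_mandatory_sections(record):
    """Return the names of mandatory sections that are absent or empty."""
    return [
        name
        for name, obligation in SECTIONS.items()
        if obligation == MANDATORY and not record.get(name)
    ]
```

A record that supplies only Identification Information would be reported as missing Metadata Reference Information; a fuller check would also have to descend into the sub-fields of each section.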

It is important to note that the FGDC Content Standard is a content standard. It defines the content of the record rather than the method for organizing this information in a database or on a server, transferring files, or displaying material to users (Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata). Other standards are used to define those processes. A Standard Generalized Markup Language (SGML) document type definition has been created for the FGDC Content Standard. SGML is an international standard that can be used to make digital materials accessible regardless of the specific system used to store the material (Cover 1997). By using SGML, metadata records can be easily indexed and shared using a variety of software. In addition, using server software that supports the Z39.50 protocol enables records in one collection to be seamlessly searched by other systems that employ the same protocol. Lynch (1997) discusses the value of the Z39.50 protocol for digital initiatives.

By making use of the FGDC Content Standard, SGML, and Z39.50, the work done by CUGIR can be easily searched, accessed, and used by remote users.

Working with the FGDC Metadata Content Standard

During the grant timeline, metadata activities fell into four categories: staff training, creation of metadata for Mann Library-produced data, collaboration with data partners to create metadata, and planning for inclusion of data sets in the future. To accomplish these tasks, we had to decide who would work on the metadata records, what sort of training they would get and how the records would be created.

Choosing who in an organization will deal with the creation of metadata is an important starting point. Equating the creation of metadata records to the cataloging of books, Schweitzer (1998) suggests that metadata experts, not just data experts, should be involved in the process:

"Data managers who are either technically-literate scientists or scientifically-literate computer specialists [should create metadata records]. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don't assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won't see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter."

Schweitzer observes that it is not practical for every data specialist to be as familiar with the metadata record structure as is necessary to produce metadata effectively. Therefore, he suggests adjusting workflow so data producers send basic information to data or metadata managers to create metadata. This is the approach that we followed at Mann Library to develop CUGIR.

Technical Services staff at Mann Library completed the metadata work. Learning and using the FGDC metadata standard fit with other work in the department, since staff are trained to work with complex metadata structures for other library work. The Metadata Librarian in our Technical Services department was the primary staff member designated to work with geospatial metadata. However, other Technical Services staff members have been given basic training in record structure and in GIS and geospatial data concepts. This training was provided in a workshop given by Mann Library's GIS specialist in the summer of 1997. Following this introductory session, five staff members from Technical Services took part in the satellite videoconference "A Practical Guide to Metadata Implementation for GIS/LIS Professional" (Hart & Phillips 1998). This conference provided an excellent introduction to the metadata record structure and to the tools that could be used to create metadata. Since catalogers were working on the creation of metadata, staff focused on metadata records in relation to one another in a database rather than solely on the content of individual records. This focus reflects a different perspective than that of the data producer. Larsgaard (1996) describes the complexity of cataloging geospatial data and the development of the metadata schema.

Mann Library created metadata for data sets that were produced at the library from TIGER/Line files. As part of that process, important areas of the record were highlighted for mandatory inclusion in CUGIR even though they were only deemed mandatory if applicable by the FGDC standard. All areas of the record had at least basic information. In addition, theme and place keyword types were identified for mandatory inclusion. For instance, data types and attributes are always included as theme keywords and FIPS codes and state, county or quadrangle names are always included as place keywords. This approach allows us to assume consistency within the database for searching and retrieval purposes.

The data sets that were created at Mann Library were at the county level. Most of the information for the records was the same. Changes were predictable and involved differences in data set title, file name, bounding coordinates, and place keywords. In addition, the county-specific information was the same for the ten coverages created. Coverage differences involved data set title, file name, abstract, and theme keywords. To reduce the amount of time required to generate approximately 600 metadata records for these files, the Programmer/Analyst wrote a script to generate them. The Metadata Librarian created a template metadata record, a file with the county-level changes, and a file with the coverage-information changes. The script produced the 600+ records from these three files.
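The three-file approach described above can be sketched as follows. This is a minimal Python illustration, not the script written at Mann Library: the placeholder names in the template (`$county_name`, `$fips`, `$theme`), the dictionary keys, and the function name are all assumptions made for the example.

```python
from itertools import product
from string import Template


def generate_records(template_text, counties, coverages):
    """Fill one template for every county/coverage combination.

    counties and coverages are lists of dictionaries holding the values
    that vary; the result is keyed by (county FIPS code, coverage code).
    """
    template = Template(template_text)
    records = {}
    for county, coverage in product(counties, coverages):
        # Merge the county-level and coverage-level values for this record.
        fields = {**county, **coverage}
        records[(county["fips"], coverage["code"])] = template.substitute(fields)
    return records


# Hypothetical template and variable values.
TEMPLATE = (
    "Title: $theme, $county_name County, New York\n"
    "Place_Keyword: $fips\n"
    "Theme_Keyword: $theme\n"
)
COUNTY_ROWS = [
    {"fips": "109", "county_name": "Tompkins"},
    {"fips": "001", "county_name": "Albany"},
]
COVERAGE_ROWS = [
    {"code": "rr", "theme": "Railroads"},
    {"code": "hy", "theme": "Hydrography"},
]
```

With 62 county rows and ten coverage rows, `generate_records(TEMPLATE, COUNTY_ROWS, COVERAGE_ROWS)` would return 620 records, in line with the "600+" figure above. Keeping the variable values in separate files means a correction to the template only has to be made once.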

Work with Data Partners on Creation of Metadata

Mann Library also worked with data partners to produce metadata for data sets distributed through the Clearinghouse. This process was different from, but complementary to, the process used to create metadata for materials produced at Mann Library. Fields were identified for mandatory inclusion and patterns were built into theme and place keywords. However, the process used to work with data partners was iterative and used the appropriate experts at the appropriate times. The data creators worked on content issues by providing a basic metadata record for review by the Metadata Librarian. The record covered all basic information related to the data set, and the data experts were able to focus on the record and data set in question. The Metadata Librarian reviewed the record for format and consistency within the context of the database. Corporate names, theme and place keywords, and title formats are among the issues examined by the Metadata Librarian. This activity is consistent with the work done by catalogers in the Technical Services unit of Mann Library. In addition, Technical Services staff is familiar with using metadata-supporting documentation. The Metadata Librarian then returned the records to data partner staff for revisions and final work. This process enabled the data partners to focus on data provision and basic metadata information while the Metadata Librarian was able to provide metadata expertise and consultation. Future work will also involve the creation of MARC records for these data sets in the library's online catalog. These records will then be passed to two national databases, RLIN and OCLC. They will then be retrievable by searchers of those databases, and the records can also be downloaded by other libraries for their online catalogs.

During this project, metadata was created using NBII's MetaMaker (National Biological Information Infrastructure 1997), mp (USGS July 1998), and cns (USGS October 1998). These products were very useful in understanding the record structure and its requirements. It was also helpful that they worked jointly so several different software interfaces did not need to be used.

Data Organization and Technical Implementation

With data partners and funding secured, data identification, acquisition, and preparation tackled, and the process of generating standardized metadata begun, our staff began the work of building a system to distribute geospatial data and metadata. Prior to receiving a CCAP grant to become a clearinghouse node, we considered dissemination mechanisms that included the use of CD-ROM, Internet, NSDI clearinghouse node, or some combination. Although stipulations in our grant proposal limited our choice of system to a Web-based clearinghouse system, a number of implementation questions quickly followed. Specifically, we needed to address the questions of whether to build or buy software, what hardware we required, how files would be handled and how statistics would be gathered and analyzed.

Software: Build, Buy or Both?

System builders need to identify the most appropriate software infrastructure for a system that will best suit users' needs and will take advantage of current resources. The answer to the question of whether to build and/or buy software develops after enumerating specific requirements, examining time and human resource constraints, and reviewing existing software choices.

Our list of requirements revealed that the system needed to have a Web-based metadata searching facility and a geographic browsing facility supported by an interface that would integrate well with other clearinghouse nodes. Our time frame was set at something less than one year as determined by the CCAP grant. Since we had determined that much of the data conversion would be conducted in-house, we needed to limit the amount of funds that could be allocated to programmers for system development.

Our examination of geographic data distribution sites revealed that there were essentially two choices for our software architecture. First, we could build a Z39.50 database that would house and index the metadata for our system and integrate customized fields that would allow for extremely flexible Web-based browsing when combined with CGI scripts on our Web server. The second option would be to take the tested, popularly implemented indexing and searching freeware called Isite (Center for Networked Information Discovery & Retrieval) and create our browsing system separately. The first choice would allow us to create a very customizable and flexible interface that users could use to browse geographically and to search our metadata. However, this option was rejected for two reasons. First, it would require considerable resources and time to build and test it from the ground up. Secondly, despite the fact that Z39.50 protocol would be supported by such a system, it would be time intensive and difficult to attain the level of integration with other Clearinghouse Nodes that accompanies the FGDC-endorsed Isite software product.

By using the established Isite package, we had the advantage of using a tested, documented, and well-supported free product that worked well with existing nodes. Isite has facilities for simultaneously searching local and remote nodes that use the same software. Also, the FGDC Web site offers the ability to search simultaneously all clearinghouse nodes that use the Isite software (Federal Geographic Data Committee Geospatial Data Clearinghouse Entry Points). In addition, opting for the Isite solution meant that the short development time and limited human resources could be focused on Web design and browsing facilities rather than on the creation and development of an entire Z39.50 database and information retrieval system. The disadvantage of the Isite system was that it would be difficult to integrate our homegrown browsing facilities, given that the Isite product is continually being developed and upgraded.

Given time and human resources constraints and specified system requirements, one needs to make the determination to build or buy part or all of the software that will power a geospatial information dissemination system. In developing CUGIR, our circumstances warranted both build and buy (or rather borrow -- Isite is freeware). We elected to develop our own browsing system and to run the Isite metadata indexing and searching facilities in parallel. The browsing facilities consist of HTML pages containing maps and lists of geographic regions that interact with our data files via Perl CGI scripts (Cornell University Geospatial Information Repository 1998a, 1998b). This system has worked well, and the use of a file naming convention provides a high level of integrity between the systems.

Hardware

Once the software has been chosen, hardware that will support that software must be chosen. Hardware purchases depend on the type of dissemination system chosen as well as the software available or developed. Distributing material via CD-ROM will involve either contracting with a vendor to press the CDs or purchasing a CD writer. In the case of CUGIR, we had a server in place from an earlier geospatial data system implementation that would support Isite software and could sustain anticipated traffic. We needed only to purchase additional disc space to sustain the indexes generated by Isite and to house the data. Disc space is relatively inexpensive and the use of a SCSI port allows us to add more discs easily and efficiently as we acquire them by chaining the components together.

Another dissemination option is to form a partnership with a clearinghouse node. This is a viable option when the quantity of data to be shared is small or there are insufficient funds to purchase or build software or equipment. In this case, data suppliers should consider establishing a partnership with a clearinghouse node, such as CUGIR. If the data is within the clearinghouse's scope, the site developers will likely accommodate this material either free of charge or with a nominal fee.

Hardware decisions should include a system to back up your data, metadata, HTML documents, scripts, and programs regularly. CUGIR uses an 8mm magnetic tape backup system that is run on a weekly and monthly basis. The system and schedule used depend on the frequency of updates to data and metadata. Scripts and HTML files can usually be backed up by keeping local copies on the developer's machine, but maintenance of substantial amounts of data requires a more robust backup system.

File Handling

When maintaining large volumes of files from varying sources, it is useful to adopt a naming convention for metadata and data files. The naming convention provides a quick and easy means of identifying and organizing data and metadata within the site.

Each unique data file at CUGIR and its corresponding metadata file begin with the same prefix. The prefix begins with either a 3-digit code or a 2-letter, 2-digit code that represents the geographic level of the data. For example, 109 represents the New York county number for Tompkins County, while AA41 represents the quadrangle code for the 7.5-minute Monticello quadrangle in New York State. Following the geographic code is a two-letter feature code that identifies the theme of the data (e.g., 'hy' represents hydrography data). Finally, the prefix ends with a single-letter code that indicates the format of the file (e.g., 'a' represents ARC/INFO export format). For example, the file for railroads in Tompkins County in Arc/Info export format is 109rra.e00.gz. There may be a second extension that is required by software to process the file, and the final extension always indicates the means used to compress the file (Z = UNIX compress, gz = GNU compression). When distributing data files over the Web, compression is a necessity because data files are quite large. To ensure that users can open files, it is important to adopt common compression methods (e.g., UNIX compress or GNU zip). More details on the file naming convention used in CUGIR can be found within CUGIR (1998c).
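The convention described above is regular enough to be parsed mechanically. The sketch below is a Python illustration built only from the rules and examples given in this section; the regular expression, the function name, and the returned field names are assumptions, and the full list of feature and format codes is documented in CUGIR (1998c) rather than reproduced here.

```python
import re

# Prefix: 3-digit county code OR 2-letter, 2-digit quadrangle code,
# then a 2-letter feature code, then a 1-letter format code.
NAME_PATTERN = re.compile(
    r"^(?P<geo>\d{3}|[A-Za-z]{2}\d{2})"
    r"(?P<feature>[a-z]{2})"
    r"(?P<format>[a-z])"
    r"(?P<extensions>(?:\.[A-Za-z0-9]+)*)$"
)

# The final extension identifies the compression method.
COMPRESSION = {"Z": "UNIX compress", "gz": "GNU zip"}


def parse_name(filename):
    """Split a file name into the parts defined by the naming convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    extensions = [e for e in match.group("extensions").split(".") if e]
    geo = match.group("geo")
    return {
        "geo_code": geo,
        "geo_level": "county" if geo.isdigit() else "quadrangle",
        "feature": match.group("feature"),
        "format": match.group("format"),
        "extensions": extensions,
        "compression": COMPRESSION.get(extensions[-1]) if extensions else None,
    }
```

Applied to `109rra.e00.gz`, the function reports a county-level file for code 109, feature `rr`, format `a`, and GNU zip compression. A parser of this kind is the sort of building block that renaming and file-moving scripts can rely on.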

The file naming convention provides an authoritative means of naming files that arrive from a variety of producers. Fortunately, the partner organizations of CUGIR have adopted, in part, the use of FIPS (Federal Information Processing Standard) codes and either the NYS Department of Transportation or USGS quadrangle codes in their naming of files. Use of these codes provides a base from which Perl scripts can be written to rename and move files around the site quickly. This convention is aided by having a fairly standard geographic coding system (such as FIPS) at its core.
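As a rough illustration of how a standard geographic code makes renaming scriptable, the sketch below (in Python rather than Perl) maps incoming file names to names that follow the convention. Everything specific in it is an assumption made for the example: the incoming name pattern (a 5-digit FIPS state and county code, an underscore, and a theme word), the lookup tables, and the function names target_name and rename_plan. Partner files may well be named differently.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical lookup tables. Only 'hy' (hydrography), 'rr' (railroads) and
# 'a' (ARC/INFO export, .e00) are taken from the examples in the text.
FEATURE_CODES: Dict[str, str] = {"hydrography": "hy", "railroads": "rr"}
FORMAT_CODES: Dict[str, str] = {"e00": "a"}

# Hypothetical incoming pattern, e.g. "36109_railroads.e00.gz":
# 2-digit FIPS state code + 3-digit FIPS county code, theme, extensions.
INCOMING = re.compile(
    r"^(?P<state>\d{2})(?P<county>\d{3})_(?P<theme>[a-z]+)"
    r"\.(?P<ext>[a-z0-9]+)\.(?P<compression>Z|gz)$"
)


def target_name(incoming: str) -> Optional[str]:
    """Build a convention-style name, or return None if the file is not recognized."""
    match = INCOMING.match(incoming)
    if match is None:
        return None
    feature = FEATURE_CODES.get(match.group("theme"))
    fmt = FORMAT_CODES.get(match.group("ext"))
    if feature is None or fmt is None:
        return None
    return (
        f"{match.group('county')}{feature}{fmt}"
        f".{match.group('ext')}.{match.group('compression')}"
    )


def rename_plan(names: Iterable[str]) -> Dict[str, str]:
    """Map each recognized incoming name to its new name; skip the rest."""
    plan: Dict[str, str] = {}
    for name in names:
        new_name = target_name(name)
        if new_name is not None:
            plan[name] = new_name
    return plan
```

The sketch only computes a plan and does not touch the file system, so the mapping can be reviewed before any file is moved. Each pair could then be applied with a call such as pathlib.Path.rename.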

Statistics

Once a system is in place, it is important to track which data sets and metadata files are being disseminated, and how many. These statistics will be used in the future to acquire funding and to identify the most-used data files for planning purposes. Web server logs provide a basic tracking mechanism for the number of downloads from CUGIR. However, the information from these logs is in no way comprehensive and often requires considerable work to strip away unwanted data. We use two log file analyzers, Analog ({http://www.analog.cx/}) and Webalizer ({http://www.usagl.net/webalizer/}), to customize the output of Web log statistics in HTML format. Another option for developers is to pipe entries from Web server logs into a database as they are generated, which allows highly customizable reports to be generated dynamically. A recent article in WebTechniques magazine (Stein 1998) details a method for doing this with the Apache Web server and a MySQL database.
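For readers who want a sense of what stripping away unwanted data involves, here is a small Python sketch that counts successful downloads of compressed data files from log lines. It assumes the server writes the Common Log Format and that data files can be recognized by a .gz or .Z extension. The function name count_downloads is hypothetical, and the sketch is not a substitute for Analog or Webalizer.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format (assumed): host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: data files are the ones carrying a compression extension.
DATA_FILE = re.compile(r"\.(?:Z|gz)$")


def count_downloads(lines: Iterable[str]) -> Counter:
    """Count successful GET requests per compressed data file."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("method") != "GET" or match.group("status") != "200":
            continue  # skip failed requests and non-download methods
        filename = match.group("path").rsplit("/", 1)[-1]
        if DATA_FILE.search(filename):
            counts[filename] += 1
    return counts
```

In practice the function could be passed an open log file, since a file object yields one line at a time. Calling most_common() on the result lists the most frequently downloaded files first.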

Future Considerations

When Mann Library constructed CUGIR it made a long-term commitment to providing geospatial data about New York State. The partnerships we formed with data providers are not one-time data acquisition agreements, but relationships that will grow and mature as these partners continue to expand and update the range of data sets they produce. Our focus is now on the long-term relationships with data partners, on finding ways to increase the amount of geospatial data and metadata in CUGIR, and on enhancing the access we provide to the data already within the repository.

We continue to contact data-producing agencies whose data are not currently available via the Internet, encouraging them to place their data and metadata in CUGIR. We also continue to provide free metadata and data consulting services to new partners so that they can begin the difficult process of creating standardized metadata describing their data products.

Our plans include making a number of enhancements to our data-browsing interface, including more customization of the data theme and geography selection tools. We also plan to undertake a CUGIR user survey to better understand the ways in which people search and browse for geospatial data and metadata. Combining the survey results with an analysis of our access logs, we will attempt to refine CUGIR's interface to make it easier to locate and retrieve data sets and metadata.

References

Center for Networked Information Discovery and Retrieval. [Homepage]. [Online]. Available: {http://www.mcnc-rdi.org/} [February 4, 1999].

Cornell University Geospatial Information Repository. 1998a. Browse by Map. [Online]. Available: {http://cugir.mannlib.cornell.edu/mapbrowse.jsp?series=counties} [February 4, 1999].

Cornell University Geospatial Information Repository. 1998b. Browse by List. [Online]. Available: {http://cugir.mannlib.cornell.edu/browse.jsp} [February 4, 1999].

Cornell University Geospatial Information Repository. 1998c. Help & FAQ. [Online]. Available: {http://cugir.mannlib.cornell.edu/help.jsp} [February 8, 1999].

Cover, Robin. 1997. SGML: Answers to Basic Questions. [Online]. Available: {http://www.isgmlug.org/whatsgml.htm} [February 4, 1999].

Federal Geographic Data Committee. Content Standard for Digital Geospatial Metadata (CSDGM). [Online]. Available: {http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf} [February 4, 1999].

______. FGDC Metadata. [Online]. Available: {http://www.fgdc.gov/metadata} [February 4, 1999].

______. Geospatial Data Clearinghouse Entry Points. [Online]. Available: {http://clearinghouse.esri.com/} [February 4, 1999].

______. [Homepage]. [Online]. Available: http://www.fgdc.gov/ [February 4, 1999].

Hart, David and Hugh Phillips. June 10, 1998. Metadata Primer -- A "How To" Guide on Metadata Implementation. [Online]. Available: http://www.lic.wisc.edu/metadata/metaprim.htm [February 4, 1999].

Herold, Philip. 1996. Moving Geospatial Data to the Web: GIS at Mann Library. Library Hi Tech 14(4): 86-87.

Larsgaard, Mary Lynette. 1996. Cataloging Planetospatial Data in Digital Form: Old Wine, New Bottles-New Wine, Old Bottles. In: Geographic Information Systems and Libraries: Patrons, Maps, and Spatial Information. Papers Presented at the 1995 Clinic on Library Applications and Data Processing, April 10-12, 1995. (ed. by Linda C. Smith & Myke Gluck). Urbana-Champaign, IL: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.

Lynch, Clifford A. April, 1997. The Z39.50 Information Retrieval Standard, Part I: A Strategic View of Its Past, Present and Future. [Online]. Available: http://www.dlib.org/dlib/april97/04lynch.html [February 4, 1999].

National Biological Information Infrastructure. January 5, 1999. NBII MetaMaker Version 2.22. [Online]. Available: {http://www.nbii.gov/datainfo/metadata/metadata.symposium/leake/sld010.htm} [February 4, 1999].

Schweitzer, Peter. October 28, 1998. Frequently-asked Questions on FGDC Metadata. [Online]. Available: {http://geology.usgs.gov/tools/metadata/tools/doc/faq.html} [February 4, 1999].

Stein, Lincoln. 1998. Webmasters Domain: The Joy of SQL. WebTechniques: Solutions for Internet and Web Developers. Vol. 3, No. 10. [Online]. Available: http://www.webtechniques.com/ [February 4, 1999].

United States Geological Survey. October 5, 1998. Tools for Creation of Formal Metadata: cns: A Pre-parser for Formal Metadata. [Online]. Available: http://geology.usgs.gov/tools/metadata/tools/doc/cns.html [February 4, 1999].

______. July 20, 1998. Tools for Creation of Formal Metadata: mp: A Compiler for Formal Metadata. [Online]. Available: http://geology.usgs.gov/tools/metadata/tools/doc/mp.html [February 4, 1999].
