
James Card
CIS4740 Advanced Database Management
Fall 1999

Information Retrieval From Multiple Databases


The Problem

There are many databases available online, with many database structures and query languages. There is no standardized way to discover what databases are available, what data elements they contain, and how they can be queried. If you log on to a server that you know contains data that you need, how do you determine which databases it offers, what data elements they contain, and how to query them?

One step toward a solution is familiar to all of us -- even if the term is not: metadata. Computer operating systems provide metadata (file name, date, size, type). Providing metadata is a primary function of libraries: the library catalog is a database of metadata. Libraries now provide online access to huge databases of bibliographic data: data about data. The data they provide in the catalog is about the data they provide in their collections. This helps us discover and evaluate the potential usefulness of the actual data.
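As a small illustration of operating-system metadata, the following Python sketch reads a file's name, size, date, and type without opening its contents. The `describe` helper and the throwaway sample file are assumptions made for this example only.

```python
import os
import tempfile
import time

def describe(path):
    """Return a small metadata record for a file, without reading its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": time.strftime("%Y-%m-%d", time.localtime(info.st_mtime)),
        "type": os.path.splitext(path)[1] or "(none)",
    }

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"hello")
    sample_path = handle.name

record = describe(sample_path)
print(record)
os.remove(sample_path)
```

The record plays the same role as a catalog entry: it is enough to judge whether the file is worth opening.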

However, even the libraries' online databases suffer from our original problem: how do you discover what databases are available, what data elements they contain, and how they can be queried? Yes, there are lists of online bibliographic databases, and each database describes the records it contains and how to query it. But there is no way to run the same query against multiple databases without rewriting it to match each one's query language, and even that is possible only if you know in advance the query language and data structures each database provides.

Possible Solutions

There have been many efforts to resolve this problem. Some of them have focussed on building software tools to allow the various database systems to talk to each other, so that you can issue a query using your database's native mechanism and the program translates the request into a format understood by the target system. Others have established standard communication protocols that enable a query to be written using a standardized format, which is then interpreted by the various target database systems. The primary difference is that the second approach allows the retrieval of information without a database client -- the client can be as simple as a dumb teletype terminal (although I doubt that anyone actually uses them as a primary I/O mechanism for a computer anymore). This allows great flexibility in how the data is accessed and used.
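The translation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the neutral query format, the `books` table, the field mappings, and the `prefix="value"` search syntax are all invented for the example and do not correspond to any particular product or standard.

```python
import sqlite3

# A system-neutral query: find records where one field equals one value.
# The field names and the mapping tables below are hypothetical.
neutral_query = {"field": "title", "value": "Moby Dick"}

def to_sql(query, table="books", column_map=None):
    """Translate the neutral query into a parameterized SQL statement."""
    column_map = column_map or {"title": "title", "author": "author"}
    column = column_map[query["field"]]  # KeyError if the field is unknown
    return f"SELECT title, author FROM {table} WHERE {column} = ?", (query["value"],)

def to_keyword_syntax(query, prefix_map=None):
    """Translate the same query into a made-up 'prefix=value' search syntax."""
    prefix_map = prefix_map or {"title": "ti", "author": "au"}
    return f'{prefix_map[query["field"]]}="{query["value"]}"'

# Run the SQL translation against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.execute("INSERT INTO books VALUES ('Moby Dick', 'Herman Melville')")
sql, params = to_sql(neutral_query)
rows = conn.execute(sql, params).fetchall()
conn.close()

keyword_form = to_keyword_syntax(neutral_query)
print(rows)
print(keyword_form)
```

The user writes the query once; each translator hides one target system's syntax. A shared protocol moves that burden from the client to the servers, which agree to interpret a single format.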

The following is a list of potential or partial solutions to our problem, including a brief summary of their relevant features.

Conclusion

The combination of the Z39.50 and SQL standards seems to be the best near-term solution. Both are well-established standards with significant user bases, and test implementations are already running. This is, unfortunately, still a "power user" type of solution. It offers great power and flexibility to skilled users, but even when hidden behind a GUI client application it still requires more sophistication than most casual users possess.

There is a lot of effort going into XML, and I believe that within three to five years XML may surpass the Z-SQL effort in viability. XML's close association with Web browser technology promises a lot of interest and attention from both the developer and end-user communities. If every Web browser includes built-in XML capability, there will be ample incentive for database developers to provide XML interfaces.
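To make the idea of an XML interface concrete, here is a minimal Python sketch that wraps a query result in XML and reads it back with a generic parser. The `books` table, the element names, and the `rows_to_xml` helper are assumptions for illustration; a real interface would follow whatever document type the database developer publishes.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(cursor, root_tag="results", row_tag="record"):
    """Wrap each row of a query result in XML, one child element per column."""
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.execute("INSERT INTO books VALUES ('Moby Dick', 'Herman Melville')")
xml_text = rows_to_xml(conn.execute("SELECT title, author FROM books"))
conn.close()

# A client can read fields back by element name using only a generic XML parser.
parsed = ET.fromstring(xml_text)
titles = [record.findtext("title") for record in parsed.findall("record")]
print(xml_text)
print(titles)
```

Because the field names travel with the data, the client needs no database-specific software to interpret the result.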

Resources

Z+SQL, Distributed Interoperable Database Searching with Z39.50 and SQL. Distributed Systems Technology Centre: http://archive.dstc.edu.au/DDU/projects/Z3950/Z+SQL/about.html

Distributed searching across cultural resources using Z39.50 and SQL: a powerful combination. Sonya M. Finnigan, Linda J. Bird, Robert M. Colomb: http://www.csu.edu.au/special/online99/proceedings99/104b.htm

Z39.50 Made Simple. Sonya Finnigan, Nigel Ward: http://archive.dstc.edu.au/DDU/projects/ZINC/zsimple.htm

The Z39.50 Information Retrieval Standard, Part I: A Strategic View of Its Past, Present and Future. Clifford A. Lynch, D-Lib Magazine, April 1997: http://www.dlib.org/dlib/april97/04lynch.html

JCC's SQL Std. Page provides information and links related to the SQL standard: http://www.jcc.com/SQLPages/jccs_sql.htm

Textuality - Knowledge is a text-based application. Tim Bray: http://www.textuality.com/

Webopedia: Online Computer Dictionary for Internet Terms and Technical Support, includes links to the standards making organizations, tutorials, white papers and other reference materials: http://www.webopedia.com

ICONE Z39.50 client software: http://roadrunner.crxnet.com/onec.html