WORKSHOP ON ELECTRONIC TEXTS
Edited by James Daly
9-10 June 1992
Library of Congress
Supported by a Grant from the David and Lucile Packard Foundation
*** *** *** ****** *** *** ***
TABLE OF CONTENTS
Prosser Gifford and Carl Fleischhauer
Session I. Content in a New Form: Who Will Use It and What Will They Do?
James Daly (Moderator)
Avra Michelson, Overview
Susan H. Veccia, User Evaluation
Joanne Freeman, Beyond the Scholar
Session II. Show and Tell
Jacqueline Hess (Moderator)
Elli Mylonas, Perseus Project
Eric M. Calaluca, Patrologia Latina Database
Carl Fleischhauer and Ricky Erway, American Memory
Dorothy Twohig, The Papers of George Washington
Maria L. Lebron, The Online Journal of Current Clinical Trials
Lynne K. Personius, Cornell mathematics books
Session III. Distribution, Networks, and Networking:
Options for Dissemination
Robert G. Zich (Moderator)
Clifford A. Lynch
Ronald L. Larsen
Edwin B. Brownrigg
Session IV. Image Capture, Text Capture, Overview of Text and
Image Storage Formats
William L. Hooton (Moderator)
A) Principal Methods for Image Capture of Text:
direct scanning, use of microform
Anne R. Kenney
Pamela Q.J. Andre
Judith A. Zidar
Donald J. Waters
B) Special Problems: bound volumes, conservation,
reproducing printed halftones
C) Image Standards and Implications for Preservation
D) Text Conversion: OCR vs. rekeying, standards of accuracy
and use of imperfect texts, service bureaus
Judith A. Zidar
Session V. Approaches to Preparing Electronic Texts
Susan Hockey (Moderator)
Eric M. Calaluca
Session VI. Copyright Issues
Session VII. Conclusion
Prosser Gifford (Moderator)
Appendix I: Program
Appendix II: Abstracts
Appendix III: Directory of Participants
*** *** *** ****** *** *** ***
I would like to thank Carl Fleischhauer and Prosser Gifford for the
opportunity to learn about areas of human activity unknown to me a scant
ten months ago, and the David and Lucile Packard Foundation for
supporting that opportunity. The help given by others is acknowledged on
a separate page.
19 October 1992
*** *** *** ****** *** *** ***
The Workshop on Electronic Texts (1) drew together representatives of
various projects and interest groups to compare ideas, beliefs,
experiences, and, in particular, methods of placing and presenting
historical textual materials in computerized form. Most attendees gained
much in insight and outlook from the event. But the assembly did not
form a new nation, or, to put it another way, the diversity of projects
and interests was too great to draw the representatives into a cohesive,
Everyone attending the Workshop shared an interest in preserving and
providing access to historical texts. But within this broad field the
attendees represented a variety of formal, informal, figurative, and
literal groups, with many individuals belonging to more than one. These
groups may be defined roughly according to the following topics or
* Searchable coded texts
* National and international computer networks
* CD-ROM production and dissemination
* Methods and technology for converting older paper materials into
* Study of the use of digital materials by scholars and others
This summary is arranged thematically and does not follow the actual
sequence of presentations.
(1) In this document, the phrase electronic text is used to mean
any computerized reproduction or version of a document, book,
article, or manuscript (including images), and not merely a machine-
readable or machine-searchable text.
(2) The Workshop was held at the Library of Congress on 9-10 June
1992, with funding from the David and Lucile Packard Foundation.
The document that follows represents a summary of the presentations
made at the Workshop and was compiled by James DALY. This
introduction was written by DALY and Carl FLEISCHHAUER.
PRESERVATION AND IMAGING
Preservation, as that term is used by archivists,(3) was most explicitly
discussed in the context of imaging. Anne KENNEY and Lynne PERSONIUS
explained how the concept of a faithful copy and the user-friendliness of
the traditional book have guided their project at Cornell University.(4)
Although interested in computerized dissemination, participants in the
Cornell project are creating digital image sets of older books in the
public domain as a source for a fresh paper facsimile or, in a future
phase, microfilm. The books returned to the library shelves are
high-quality and useful replacements on acid-free paper that should last
a long time. To date, the Cornell project has placed little or no
emphasis on creating searchable texts; one would not be surprised to find
that the project participants view such texts as new editions, and thus
not as faithful reproductions.
In her talk on preservation, Patricia BATTIN struck an ecumenical and
flexible note as she endorsed the creation and dissemination of a variety
of types of digital copies. Do not be too narrow in defining what counts
as a preservation element, BATTIN counseled; for the present, at least,
digital copies made with preservation in mind cannot be as narrowly
standardized as, say, microfilm copies with the same objective. Setting
standards precipitously can inhibit creativity, but delay can result in
chaos, she advised.
In part, BATTIN's position reflected the unsettled nature of image-format
standards, and attendees could hear echoes of this unsettledness in the
comments of various speakers. For example, Jean BARONAS reviewed the
status of several formal standards moving through committees of experts;
and Clifford LYNCH encouraged the use of a new guideline for transmitting
document images on Internet. Testimony from participants in the National
Agricultural Library's (NAL) Text Digitization Program and LC's American
Memory project highlighted some of the challenges to the actual creation
or interchange of images, including difficulties in converting
preservation microfilm to digital form. Donald WATERS reported on the
progress of a master plan for a project at Yale University to convert
books on microfilm to digital image sets, Project Open Book (POB).
The Workshop offered rather less of an imaging practicum than planned,
but "how-to" hints emerge at various points, for example, throughout
KENNEY's presentation and in the discussion of arcana such as
thresholding and dithering offered by George THOMA and FLEISCHHAUER.
(3) Although there is a sense in which any reproductions of
historical materials preserve the human record, specialists in the
field have developed particular guidelines for the creation of
acceptable preservation copies.
(4) Titles and affiliations of presenters are given at the
beginning of their respective talks and in the Directory of
Participants (Appendix III).
THE MACHINE-READABLE TEXT: MARKUP AND USE
The sections of the Workshop that dealt with machine-readable text tended
to be more concerned with access and use than with preservation, at least
in the narrow technical sense. Michael SPERBERG-McQUEEN made a forceful
presentation on the Text Encoding Initiative's (TEI) implementation of
the Standard Generalized Markup Language (SGML). His ideas were echoed
by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL. While the
presentations made by the TEI advocates contained no practicum, their
discussion focused on the value of the finished product, what the
European Community calls reusability, but what may also be termed
durability. They argued that marking up--that is, coding--a text in a
well-conceived way will permit it to be moved from one computer
environment to another, as well as to be used by various users. Two
kinds of markup were distinguished: 1) procedural markup, which
describes the features of a text (e.g., dots on a page), and 2)
descriptive markup, which describes the structure or elements of a
document (e.g., chapters, paragraphs, and front matter).
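The reusability argument can be illustrated with a minimal sketch, offered as an assumption-laden example and not as anything shown at the Workshop. The fragment below is written as XML (a later relative of SGML) so that Python's standard library can parse it, and the element names chapter, title, and para are invented for illustration; they are not taken from the TEI guidelines. The point is only that one descriptively coded source can feed two different uses.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: each tag says what the text *is*, not how it looks.
# Element names are hypothetical, chosen for this example only.
DESCRIPTIVE = (
    "<chapter>"
    "<title>Arms and the Man</title>"
    "<para>First paragraph.</para>"
    "<para>Second paragraph.</para>"
    "</chapter>"
)

def render_plain(root):
    """One presentation: upper-case title, blank line, indented paragraphs."""
    lines = [root.findtext("title").upper(), ""]
    lines += ["    " + p.text for p in root.iter("para")]
    return "\n".join(lines)

def outline(root):
    """A second use of the same source: a structural summary."""
    return {
        "title": root.findtext("title"),
        "paragraphs": len(root.findall("para")),
    }

root = ET.fromstring(DESCRIPTIVE)
print(render_plain(root))
print(outline(root))
```

A purely procedural encoding of the same page (indent here, capitalize there) would support the first use but would give a program no reliable way to recover the second.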
The TEI proponents emphasized the importance of texts to scholarship.
They explained how heavily coded (and thus analyzed and annotated) texts
can underlie research, play a role in scholarly communication, and
facilitate classroom teaching. SPERBERG-McQUEEN reminded listeners that
a written or printed item (e.g., a particular edition of a book) is
merely a representation of the abstraction we call a text. To concern
ourselves with faithfully reproducing a printed instance of the text,
SPERBERG-McQUEEN argued, is to concern ourselves with the representation
of a representation ("images as simulacra for the text"). The TEI proponents'
interest in images tends to focus on corollary materials for use in teaching,
for example, photographs of the Acropolis to accompany a Greek text.
By the end of the Workshop, SPERBERG-McQUEEN confessed to having been
converted to a limited extent to the view that electronic images
constitute a promising alternative to microfilming; indeed, an
alternative probably superior to microfilming. But he was not convinced
that electronic images constitute a serious attempt to represent text in
electronic form. HOCKEY and MYLONAS also conceded that their experience
at the Pierce Symposium the previous week at Georgetown University and
the present conference at the Library of Congress had compelled them to
reevaluate their perspective on the usefulness of text as images.
Attendees could see that the text and image advocates were in
constructive tension, so to say.
Three non-TEI presentations described approaches to preparing
machine-readable text that are less rigorous and thus less expensive. In
the case of the Papers of George Washington, Dorothy TWOHIG explained
that the digital version will provide a not-quite-perfect rendering of
the transcribed text--some 135,000 documents--available for research
during the decades while the perfect or print version is completed.
Members of the American Memory team and the staff of NAL's Text
Digitization Program (see below) also outlined a middle ground concerning
searchable texts. In the case of American Memory, contractors produce
texts with about 99-percent accuracy that serve as "browse" or
"reference" versions of written or printed originals. End users who need
faithful copies or perfect renditions must refer to accompanying sets of
digital facsimile images or consult copies of the originals in a nearby
library or archive. American Memory staff argued that the high cost of
producing 100-percent accurate copies would prevent LC from offering
access to large parts of its collections.
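To give a rough sense of what such an accuracy level means on the page, the following sketch converts a character-accuracy percentage into an expected number of wrong characters per page. The figure of 2,000 characters per page is an assumption made for illustration only; it was not reported by any speaker.

```python
def expected_errors(chars_per_page, accuracy_percent):
    """Expected wrong characters on one page at a given character accuracy."""
    return chars_per_page * (100 - accuracy_percent) / 100

# Assumed page density, for illustration only (not a Workshop figure).
CHARS_PER_PAGE = 2000

for accuracy in (99, 100):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy}% accuracy: about {errors:g} wrong characters per page")
```

Under that assumption, a 99-percent-accurate text carries on the order of twenty wrong characters per page, which suggests why such a text is described as a "browse" or "reference" version.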
THE MACHINE-READABLE TEXT: METHODS OF CONVERSION
Although the Workshop did not include a systematic examination of the
methods for converting texts from paper (or from facsimile images) into
machine-readable form, nevertheless, various speakers touched upon this
matter. For example, WEIBEL reported that OCLC has experimented with a
merging of multiple optical character recognition systems that will
reduce errors from an unacceptable rate of 5 characters out of every
1,000 to an unacceptable rate of 2 characters out of every 1,000.
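One simple way to picture such a merging is a character-by-character majority vote among several recognition engines. The sketch below is a hypothetical illustration, not a description of OCLC's method: it assumes that the outputs are already aligned and of equal length (real systems must first solve the alignment problem), and the three sample outputs are invented.

```python
from collections import Counter

def vote(outputs):
    """Character-by-character majority vote over aligned OCR outputs."""
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("this sketch assumes aligned, equal-length outputs")
    merged = []
    for chars in zip(*outputs):
        # Keep the character that most engines agree on at this position.
        merged.append(Counter(chars).most_common(1)[0][0])
    return "".join(merged)

def errors_per_thousand(text, truth):
    """Wrong characters per 1,000, measured against a known-correct text."""
    wrong = sum(a != b for a, b in zip(text, truth))
    return 1000 * wrong / len(truth)

# Invented sample data: each engine makes one different mistake.
truth = "optical character recognition"
engine_a = "optical charactcr recognition"
engine_b = "0ptical character recognition"
engine_c = "optical character recogniti0n"

merged = vote([engine_a, engine_b, engine_c])
print(merged)
print(errors_per_thousand(engine_a, truth), errors_per_thousand(merged, truth))
```

Voting helps only where the engines err in different places; an error shared by a majority of engines survives the merge, which is consistent with a residual rate that remains above zero.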
Pamela ANDRE presented an overview of NAL's Text Digitization Program and
Judith ZIDAR discussed the technical details. ZIDAR explained how NAL
purchased hardware and software capable of performing optical character
recognition (OCR) and text conversion and used its own staff to convert
texts. The process, ZIDAR said, required extensive editing, and project
staff found themselves considering alternatives, including rekeying
and/or creating abstracts or summaries of texts. NAL reckoned costs at
$7 per page. By way of contrast, Ricky ERWAY explained that American
Memory had decided from the start to contract out conversion to external
service bureaus. The criteria used to select these contractors were cost
and quality of results, as opposed to methods of conversion. ERWAY noted
that historical documents or books often do not lend themselves to OCR.
Bound materials represent a special problem. In her experience, quality
control--inspecting incoming materials, counting errors in samples--posed
the most time-consuming aspect of contracting out conversion. ERWAY
reckoned American Memory's costs at $4 per page, but cautioned that fewer
cost-elements had been included than in NAL's figure.
OPTIONS FOR DISSEMINATION
The topic of dissemination proper emerged at various points during the
Workshop. At the session devoted to national and international computer
networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG
highlighted the virtues of Internet today and of the network that will
evolve from Internet. Listeners could discern in these narratives a
vision of an information democracy in which millions of citizens freely
find and use what they need. LYNCH noted that a lack of standards
inhibits disseminating multimedia on the network, a topic also discussed
by BESSER. LARSEN addressed the issues of network scalability and
modularity and commented upon the difficulty of anticipating the effects
of growth in orders of magnitude. BROWNRIGG talked about the ability of
packet radio to provide certain links in a network without the need for
wiring. However, the presenters also called attention to the
shortcomings and incongruities of present-day computer networks. For
example: 1) Network use is growing dramatically, but much network
traffic consists of personal communication (E-mail). 2) Large bodies of
information are available, but a user's ability to search across their
entirety is limited. 3) There are significant resources for science and
technology, but few network sources provide content in the humanities.
4) Machine-readable texts are commonplace, but the capability of the
system to deal with images (let alone other media formats) lags behind.
A glimpse of a multimedia future for networks, however, was provided by
Maria LEBRON in her overview of the Online Journal of Current Clinical
Trials (OJCCT), and the process of scholarly publishing on-line.
The contrasting form of the CD-ROM disk was never systematically
analyzed, but attendees could glean an impression from several of the
show-and-tell presentations. The Perseus and American Memory examples
demonstrated recently published disks, while the descriptions of the
IBYCUS version of the Papers of George Washington and Chadwyck-Healey's
Patrologia Latina Database (PLD) told of disks to come. According to
Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul
Migne's definitive collection of Latin texts to machine-readable form.
Although everyone could share the network advocates' enthusiasm for an
on-line future, the possibility of rolling up one's sleeves for a session
with a CD-ROM containing both textual materials and a powerful retrieval
engine made the disk seem an appealing vessel indeed. The overall
discussion suggested that the transition from CD-ROM to on-line networked
access may prove far slower and more difficult than has been anticipated.
WHO ARE THE USERS AND WHAT DO THEY DO?
Although concerned with the technicalities of production, the Workshop
never lost sight of the purposes and uses of electronic versions of
textual materials. As noted above, those interested in imaging discussed
the problematical matter of digital preservation, while the TEI proponents
described how machine-readable texts can be used in research. This latter
topic received thorough treatment in the paper read by Avra MICHELSON.
She placed the phenomenon of electronic texts within the context of
broader trends in information technology and scholarly communication.
Among other things, MICHELSON described on-line conferences that
represent a vigorous and important intellectual forum for certain
disciplines. Internet now carries more than 700 conferences, with about
80 percent of these devoted to topics in the social sciences and the
humanities. Other scholars use on-line networks for "distance learning."
Meanwhile, there has been a tremendous growth in end-user computing;
professors today are less likely than their predecessors to ask the
campus computer center to process their data. Electronic texts are one
key to these sophisticated applications, MICHELSON reported, and more and
more scholars in the humanities now work in an on-line environment.
Toward the end of the Workshop, Michael LESK presented a corollary to
MICHELSON's talk, reporting the results of an experiment that compared
the work of one group of chemistry students using traditional printed
texts and two groups using electronic sources. The experiment
demonstrated that in the event one does not know what to read, one needs
the electronic systems; the electronic systems hold no advantage at the
moment if one knows what to read, but neither do they impose a penalty.
DALY provided an anecdotal account of the revolutionizing impact of the
new technology on his previous methods of research in the field of classics.
His account, by extrapolation, served to illustrate in part the arguments
made by MICHELSON concerning the positive effects of the sudden and radical
transformation being wrought in the ways scholars work.
Susan VECCIA and Joanne FREEMAN delineated the use of electronic
materials outside the university. The most interesting aspect of their
use, FREEMAN said, could be seen as a paradox: teachers in elementary
and secondary schools requested access to primary source materials but,
at the same time, found that "primariness" itself made these materials
difficult for their students to use.
Marybeth PETERS reviewed copyright law in the United States and offered
advice during a lively discussion of this subject. But uncertainty
remains concerning the price of copyright in a digital medium, because a
solution remains to be worked out concerning management and synthesis of
copyrighted and out-of-copyright pieces of a database.
As moderator of the final session of the Workshop, Prosser GIFFORD directed
discussion to future courses of action and the potential role of LC in
advancing them. Among the recommendations that emerged were the following:
* Workshop participants should 1) begin to think about working
with image material, but structure and digitize it in such a
way that at a later stage it can be interpreted into text, and
2) find a common way to build text and images together so that
they can be used jointly at some stage in the future, with
appropriate network support, because that is how users will want
to access these materials. The Library might encourage attempts
to bring together people who are working on texts and images.
* A network version of American Memory should be developed or
consideration should be given to making the data in it
available to people interested in doing network multimedia.
Given the current dearth of digital data that is appealing and
unencumbered by extremely complex rights problems, developing a
network version of American Memory could do much to help make
network multimedia a reality.
* Concerning the thorny issue of electronic deposit, LC should
initiate a catalytic process in terms of distributed
responsibility, that is, bring together the distributed
organizations and set up a study group to look at all the
issues related to electronic deposit and see where we as a
nation should move. For example, LC might attempt to persuade
one major library in each state to deal with its state
equivalent publisher, which might produce a cooperative project
that would be equitably distributed around the country, and one
in which LC would be dealing with a minimal number of publishers
and minimal copyright problems. LC must also deal with the
concept of on-line publishing, determining, among other things,
how serials such as OJCCT might be deposited for copyright.
* Since a number of projects are planning to carry out
preservation by creating digital images that will end up in
on-line or near-line storage at some institution, LC might play
a helpful role, at least in the near term, by accelerating work on how
to catalog that information into the Research Libraries Information
Network (RLIN) and then into OCLC, so that it would be accessible.
This would reduce the possibility of multiple institutions digitizing
the same work.
The Workshop was valuable because it brought together partisans from
various groups and provided an occasion to compare goals and methods.
The more committed partisans frequently communicate with others in their
groups, but less often across group boundaries. The Workshop was also
valuable to attendees--including those involved with American Memory--who
came less committed to particular approaches or concepts. These
attendees learned a great deal, and plan to select and employ elements of
imaging, text-coding, and networked distribution that suit their
respective projects and purposes.
Still, reality rears its ugly head: no breakthrough has been achieved.
On the imaging side, one confronts a proliferation of competing
data-interchange standards and a lack of consensus on the role of digital
facsimiles in preservation. In the realm of machine-readable texts, one
encounters a reasonably mature standard but methodological difficulties
and high costs. These latter problems, of course, represent a special
impediment to the desire, as it is sometimes expressed in the popular
press, "to put the [contents of the] Library of Congress on line." In
the words of one participant, there was "no solution to the economic
problems--the projects that are out there are surviving, but it is going
to be a lot of work to transform the information industry, and so far the
investment to do that is not forthcoming" (LESK, per litteras).
*** *** *** ****** *** *** ***
GIFFORD * Origin of Workshop in current Librarian's desire to make LC's
collections more widely available * Desiderata arising from the prospect
of greater interconnectedness *
After welcoming participants on behalf of the Library of Congress,
American Memory (AM), and the National Demonstration Lab, Prosser
GIFFORD, director for scholarly programs, Library of Congress, located
the origin of the Workshop on Electronic Texts in a conversation he had
had considerably more than a year ago with Carl FLEISCHHAUER concerning
some of the issues faced by AM. On the assumption that numerous other
people were asking the same questions, the decision was made to bring
together as many of these people as possible to ask the same questions
together. In a deeper sense, GIFFORD said, the origin of the Workshop
lay in the desire of the current Librarian of Congress, James H.
Billington, to make the collections of the Library, especially those
offering unique or unusual testimony on aspects of the American
experience, available to a much wider circle of users than those few
people who can come to Washington to use them. This meant that the
emphasis of AM, from the outset, has been on archival collections of the
basic material, and on making these collections themselves available,
rather than selected or heavily edited products.
From AM's emphasis followed the questions with which the Workshop began:
who will use these materials, and in what form will they wish to use
them. But an even larger issue deserving mention, in GIFFORD's view, was
the phenomenal growth in Internet connectivity. He expressed the hope
that the prospect of greater interconnectedness than ever before would
lead to: 1) much more cooperative and mutually supportive endeavors; 2)
development of systems of shared and distributed responsibilities to
avoid duplication and to ensure accuracy and preservation of unique
materials; and 3) agreement on the necessary standards and development of
the appropriate directories and indices to make navigation
straightforward among the varied resources that are, and increasingly
will be, available. In this connection, GIFFORD requested that
participants reflect from the outset upon the sorts of outcomes they
thought the Workshop might have. Did those present constitute a group
with sufficient common interests to propose a next step or next steps,
and if so, what might those be? They would return to these questions the
FLEISCHHAUER * Core of Workshop concerns preparation and production of
materials * Special challenge in conversion of textual materials *
Quality versus quantity * Do the several groups represented share common
interests? *
Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,
emphasized that he would attempt to represent the people who perform some
of the work of converting or preparing materials and that the core of
the Workshop had to do with preparation and production. FLEISCHHAUER
then drew a distinction between the long term, when many things would be
available and connected in the ways that GIFFORD described, and the short
term, in which AM not only has wrestled with the issue of what is the
best course to pursue but also has faced a variety of technical
FLEISCHHAUER remarked on AM's endeavors to deal with a wide range of library
formats, such as motion picture collections, sound-recording collections,
and pictorial collections of various sorts, especially collections of
photographs. In the course of these efforts, AM kept coming back to
textual materials--manuscripts or rare printed matter, bound materials,
etc. Text posed the greatest conversion challenge of all. Thus, the
genesis of the Workshop, which reflects the problems faced by AM. These
problems include physical problems. For example, those in the library
and archive business deal with collections made up of fragile and rare
manuscript items, bound materials, especially the notoriously brittle
bound materials of the late nineteenth century. These are precious
cultural artifacts, however, as well as interesting sources of
information, and LC desires to retain and conserve them. AM needs to
handle things without damaging them. Guillotining a book to run it
through a sheet feeder must be avoided at all costs.
Beyond physical problems, issues pertaining to quality arose. For
example, the desire to provide users with a searchable text is affected
by the question of acceptable level of accuracy. One hundred percent
accuracy is tremendously expensive. On the other hand, the output of
optical character recognition (OCR) can be tremendously inaccurate.
Although AM has attempted to find a middle ground, uncertainty persists
as to whether or not it has discovered the right solution.
Questions of quality arose concerning images as well. FLEISCHHAUER
contrasted the extremely high level of quality of the digital images in
the Cornell Xerox Project with AM's efforts to provide a browse-quality
or access-quality image, as opposed to an archival or preservation image.
FLEISCHHAUER therefore welcomed the opportunity to compare notes.
FLEISCHHAUER observed in passing that conversations he had had about
networks have begun to signal that for various forms of media a
determination may be made that there is a browse-quality item, or a
distribution-and-access-quality item that may coexist in some systems
with a higher quality archival item that would be inconvenient to send
through the network because of its size. FLEISCHHAUER referred, of
course, to images more than to searchable text.
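The size argument can be made concrete with a little arithmetic. The sketch below computes the uncompressed size of a scanned page at two resolutions. The page dimensions, the resolutions, and the one-bit (black-and-white) depth are assumptions chosen for illustration; they are not figures reported by FLEISCHHAUER, and compression would reduce both numbers.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed values, for illustration only: a 6 x 9 inch page, 1 bit per pixel.
archival = raw_image_bytes(6, 9, dpi=600, bits_per_pixel=1)
browse = raw_image_bytes(6, 9, dpi=100, bits_per_pixel=1)

print(f"archival: {archival:,} bytes")
print(f"browse:   {browse:,} bytes")
print(f"ratio:    {archival // browse}x")
```

Because size grows with the square of the resolution, a sixfold increase in dots per inch yields a thirty-six-fold increase in data, which is why a system might send the smaller item over the network and hold the larger one in reserve.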
As AM considered those questions, several conceptual issues arose: ought
AM occasionally to reproduce materials entirely through an image set, at
other times, entirely through a text set, and in some cases, a mix?
There probably would be times when the historical authenticity of an
artifact would require that its image be used. An image might be
desirable as a recourse for users if one could not provide 100-percent
accurate text. Again, AM wondered, as a practical matter, if a
distinction could be drawn between unique items and rare printed matter
that might exist in multiple collections--that is, in ten or fifteen
libraries. In such
cases, the need for perfect reproduction would be less than for unique
items. Implicit in his remarks, FLEISCHHAUER conceded, was the admission
that AM has been tilting strongly towards quantity and drawing back a
little from perfect quality. That is, it seemed to AM that society would
be better served if more things were distributed by LC--even if they were
not quite perfect--than if fewer things, perfectly represented, were
distributed. This was stated as a proposition to be tested, with
responses to be gathered from users.
In thinking about issues related to reproduction of materials and seeing
other people engaged in parallel activities, AM deemed it useful to
convene a conference. Hence, the Workshop. FLEISCHHAUER thereupon
surveyed the several groups represented: 1) the world of images (image
users and image makers); 2) the world of text and scholarship and, within
this group, those concerned with language--FLEISCHHAUER confessed to finding
delightful irony in the fact that some of the most advanced thinkers on
computerized texts are those dealing with ancient Greek and Roman materials;
3) the network world; and 4) the general world of library science, which
includes people interested in preservation and cataloging.
FLEISCHHAUER concluded his remarks with special thanks to the David and
Lucile Packard Foundation for its support of the meeting, the American
Memory group, the Office for Scholarly Programs, the National
Demonstration Lab, and the Office of Special Events. He expressed the
hope that David Woodley Packard might be able to attend, noting that
Packard's work and the work of the foundation had sponsored a number of
projects in the text area.
SESSION I. CONTENT IN A NEW FORM: WHO WILL USE IT AND WHAT WILL THEY DO?
DALY * Acknowledgements * A new Latin authors disk * Effects of the new
technology on previous methods of research *
Serving as moderator, James DALY acknowledged the generosity of all the
presenters for giving of their time, counsel, and patience in planning
the Workshop, as well as of members of the American Memory project and
other Library of Congress staff, and the David and Lucile Packard
Foundation and its executive director, Colburn S. Wilbur.
DALY then recounted his visit in March to the Center for Electronic Texts
in the Humanities (CETH) and the Department of Classics at Rutgers
University, where an old friend, Lowell Edmunds, introduced him to the
department's IBYCUS scholarly personal computer, and, in particular, the
new Latin CD-ROM, containing, among other things, almost all classical
Latin literary texts through A.D. 200. Packard Humanities Institute
(PHI), Los Altos, California, released this disk late in 1991, with a
nominal triennial licensing fee.
Playing with the disk for an hour or so at Rutgers brought home to DALY
at once the revolutionizing impact of the new technology on his previous
methods of research. Had this disk been available two or three years
earlier, DALY contended, when he was engaged in preparing a commentary on
Book 10 of Virgil's Aeneid for Cambridge University Press, he would not
have required a forty-eight-square-foot table on which to spread the
numerous, most frequently consulted items, including some ten or twelve
concordances to key Latin authors, an almost equal number of lexica to
authors who lacked concordances, and where either lexica or concordances
were lacking, numerous editions of authors antedating and postdating Virgil.
Nor, when checking each of the average six to seven words contained in
the Virgilian hexameter for its usage elsewhere in Virgil's works or
other Latin authors, would DALY have had to maintain the laborious
mechanical process of flipping through these concordances, lexica, and
editions each time. Nor would he have had to frequent as often the
Milton S. Eisenhower Library at the Johns Hopkins University to consult
the Thesaurus Linguae Latinae. Instead of devoting countless hours, or
the bulk of his research time, to gathering data concerning Virgil's use
of words, DALY--now freed by PHI's Latin authors disk from the
tyrannical, yet in some ways paradoxically happy scholarly drudgery--
would have been able to devote that same bulk of time to analyzing and
interpreting Virgilian verbal usage.
Citing Theodore Brunner, Gregory Crane, Elli MYLONAS, and Avra MICHELSON,
DALY argued that this reversal in his style of work, made possible by the
new technology, would perhaps have resulted in better, more productive
research. Indeed, even in the course of his browsing the Latin authors
disk at Rutgers, its powerful search, retrieval, and highlighting
capabilities suggested to him several new avenues of research into
Virgil's use of sound effects. This anecdotal account, DALY maintained,
may serve to illustrate in part the sudden and radical transformation
being wrought in the ways scholars work.
MICHELSON * Elements related to scholarship and technology * Electronic
texts within the context of broader trends within information technology
and scholarly communication * Evaluation of the prospects for the use of
electronic texts * Relationship of electronic texts to processes of
scholarly communication in humanities research * New exchange formats
created by scholars * Projects initiated to increase scholarly access to
converted text * Trend toward making electronic resources available
through research and education networks * Changes taking place in
scholarly communication among humanities scholars * Network-mediated
scholarship transforming traditional scholarly practices * Key
information technology trends affecting the conduct of scholarly
communication over the next decade * The trend toward end-user computing
* The trend toward greater connectivity * Effects of these trends * Key
transformations taking place * Summary of principal arguments *
Avra MICHELSON, Archival Research and Evaluation Staff, National Archives
and Records Administration (NARA), argued that establishing who will use
electronic texts and what they will use them for involves a consideration
of both information technology and scholarship trends. This
consideration includes several elements related to scholarship and
technology: 1) the key trends in information technology that are most
relevant to scholarship; 2) the key trends in the use of currently
available technology by scholars in the nonscientific community; and 3)
the relationship between these two very distinct but interrelated trends.
The investment in understanding this relationship being made by
information providers, technologists, and public policy developers, as
well as by scholars themselves, seems to be pervasive and growing,
MICHELSON contended. She drew on collaborative work with Jeff Rothenberg
on the scholarly use of technology.
MICHELSON sought to place the phenomenon of electronic texts within the
context of broader trends within information technology and scholarly
communication. She argued that electronic texts are of most use to
researchers to the extent that the researchers' working context (i.e.,
their relevant bibliographic sources, collegial feedback, analytic tools,
notes, drafts, etc.), along with their field's primary and secondary
sources, also is accessible in electronic form and can be integrated in
ways that are unique to the on-line environment.
Evaluation of the prospects for the use of electronic texts includes two
elements: 1) an examination of the ways in which researchers currently
are using electronic texts along with other electronic resources, and 2)
an analysis of key information technology trends that are affecting the
long-term conduct of scholarly communication. MICHELSON limited her
discussion of the use of electronic texts to the practices of humanists
and noted that the scientific community was outside the panel's overview.
MICHELSON examined the nature of the current relationship of electronic
texts in particular, and electronic resources in general, to what she
maintained were, essentially, five processes of scholarly communication
in humanities research. Researchers 1) identify sources, 2) communicate
with their colleagues, 3) interpret and analyze data, 4) disseminate
their research findings, and 5) prepare curricula to instruct the next
generation of scholars and students. This examination would produce a
clearer understanding of the synergy among these five processes: the use
of electronic resources for one process tends to stimulate their use for
the other processes of scholarly communication.
For the first process of scholarly communication, the identification of
sources, MICHELSON remarked on the opportunity scholars now enjoy to
supplement traditional word-of-mouth searches for sources among their
colleagues with new forms of electronic searching. So, for example,
instead of having to visit the library, researchers are able to explore
descriptions of holdings in their offices. Furthermore, if their own
institutions' holdings prove insufficient, scholars can access more than
200 major American library catalogues over Internet, including the
universities of California, Michigan, Pennsylvania, and Wisconsin.
Direct access to the bibliographic databases offers intellectual
empowerment to scholars by presenting a comprehensive means of browsing
through libraries from their homes and offices at their convenience.
The second process of scholarly communication is communication among
scholars. Beyond the most common methods of communication, scholars are
using E-mail and a variety of new electronic communications formats
derived from it for further academic interchange. E-mail exchanges are
growing at an astonishing rate, reportedly 15 percent a month. They
currently constitute approximately half the traffic on research and
education networks. Moreover, the global spread of E-mail has been so
rapid that it is now possible for American scholars to use it to
communicate with colleagues in close to 140 other countries.
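To give a sense of scale for the reported growth rate, the following
minimal Python sketch compounds a 15 percent monthly increase. It
assumes the rate holds constant from month to month, which the report
cited does not claim; the variable names are illustrative only.

```python
import math

monthly_growth = 0.15  # the reported rate; holding it constant is an assumption

# Growth factor after twelve months of compounding.
annual_factor = (1 + monthly_growth) ** 12

# Months needed for traffic to double at that rate.
doubling_months = math.log(2) / math.log(1 + monthly_growth)

print(f"growth over one year: about {annual_factor:.2f}x")
print(f"doubling time: about {doubling_months:.1f} months")
```

Under that assumption, traffic would double roughly every five months
and grow a little more than fivefold in a year.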
Other new exchange formats created by scholars and operating on Internet
include more than 700 conferences, with about 80 percent of these devoted
to topics in the social sciences and humanities. The rate of growth of
these scholarly electronic conferences also is astonishing. From 1990 to
1991, 200 new conferences were identified on Internet. From October 1991
to June 1992, an additional 150 conferences in the social sciences and
humanities were added to this directory of listings. Scholars have
established conferences in virtually every field, within every different
discipline. For example, there are currently close to 600 active social
science and humanities conferences on topics such as art and
architecture, ethnomusicology, folklore, Japanese culture, medical
education, and gifted and talented education. The appeal to scholars of
communicating through these conferences is that, unlike any other medium,
electronic conferences today provide a forum for global communication
with peers at the front end of the research process.
Interpretation and analysis of sources constitutes the third process of
scholarly communication that MICHELSON discussed in terms of texts and
textual resources. The methods used to analyze sources fall somewhere on
a continuum from quantitative analysis to qualitative analysis.
Typically, evidence is culled and evaluated using methods drawn from both
ends of this continuum. At one end, quantitative analysis involves the
use of mathematical processes such as a count of frequencies and
distributions of occurrences or, on a higher level, regression analysis.
At the other end of the continuum, qualitative analysis typically
involves nonmathematical processes oriented toward language
interpretation or the building of theory. Aspects of this work involve
the processing--either manual or computational--of large and sometimes
massive amounts of textual sources, although the use of nontextual
sources as evidence, such as photographs, sound recordings, film footage,
and artifacts, is significant as well.
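The simplest quantitative measures mentioned above, counts of
frequencies and distributions of occurrences, can be sketched in a few
lines of Python. This is a minimal illustration under stated
assumptions: the function names word_frequencies and distribution and
the three sample passages are invented for the example and do not come
from any project discussed at the workshop.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercased word occurs in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def distribution(passages, word):
    """Return the number of occurrences of `word` in each passage, in order."""
    return [word_frequencies(p)[word] for p in passages]

passages = [
    "The library holds the catalogue.",
    "Scholars search the catalogue from the office.",
    "Networks link scholars.",
]

totals = word_frequencies(" ".join(passages))
print(totals.most_common(3))          # overall frequency count
print(distribution(passages, "the"))  # occurrences passage by passage
```

The frequency count answers how often a word occurs overall; the
distribution shows where in the corpus those occurrences fall.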
Scholars have discovered that many of the methods of interpretation and
analysis that are related to both quantitative and qualitative methods
are processes that can be performed by computers. For example, computers
can count. They can count brush strokes used in a Rembrandt painting or
perform regression analysis for understanding cause and effect. By means
of advanced technologies, computers can recognize patterns, analyze text,
and model concepts. Furthermore, computers can complete these processes
faster with more sources and with greater precision than scholars who
must rely on manual interpretation of data. But if scholars are to use
computers for these processes, source materials must be in a form
amenable to computer-assisted analysis. For this reason many scholars,
once they have identified the sources that are key to their research, are
converting them to machine-readable form. Thus, a representative example
of the numerous textual conversion projects organized by scholars around
the world in recent years to support computational text analysis is the
TLG, the Thesaurus Linguae Graecae. This project is devoted to
converting the extant ancient texts of classical Greece. (Editor's note:
according to the TLG Newsletter of May 1992, TLG was in use in thirty-two
different countries. This figure updates MICHELSON's previous count by one.)
The scholars performing these conversions have been asked to recognize
that the electronic sources they are converting for one use possess value
for other research purposes as well. As a result, during the past few
years, humanities scholars have initiated a number of projects to
increase scholarly access to converted text. So, for example, the Text
Encoding Initiative (TEI), about which more is said later in the program,
was established as an effort by scholars to determine standard elements
and methods for encoding machine-readable text for electronic exchange.
In a second effort to facilitate the sharing of converted text, scholars
have created a new institution, the Center for Electronic Texts in the
Humanities (CETH). The center estimates that there are 8,000 series of
source texts in the humanities that have been converted to
machine-readable form worldwide. CETH is undertaking an international
search for converted text in the humanities, compiling it into an
electronic library, and preparing bibliographic descriptions of the
sources for the Research Libraries Information Network's (RLIN)
machine-readable data file. The library profession has begun to initiate
large conversion projects as well, such as American Memory.
While scholars have been making converted text available to one another,
typically on disk or on CD-ROM, the clear trend is toward making these
resources available through research and education networks. Thus, the
American and French Research on the Treasury of the French Language
(ARTFL) and the Dante Project are already available on Internet.
MICHELSON summarized this section on interpretation and analysis by
noting that: 1) increasing numbers of humanities scholars in the library
community are recognizing the importance to the advancement of
scholarship of retrospective conversion of source materials in the arts
and humanities; and 2) there is a growing realization that making the
sources available on research and education networks maximizes their
usefulness for the analysis performed by humanities scholars.
The fourth process of scholarly communication is dissemination of
research findings, that is, publication. Scholars are using existing
research and education networks to engineer a new type of publication:
scholarly-controlled journals that are electronically produced and
disseminated. Although such journals are still emerging as a
communication format, their number has grown from approximately twelve
to thirty-six during the past year (July 1991 to June 1992). Most of
these electronic scholarly journals are devoted to topics in the
humanities. As with network conferences, scholarly enthusiasm for these
electronic journals stems from the medium's unique ability to advance
scholarship in a way that no other medium can do by supporting global
feedback and interchange, practically in real time, early in the research
process. Beyond scholarly journals, MICHELSON remarked on the delivery of
commercial full-text products, such as articles in professional journals,
newsletters, magazines, wire services, and reference sources. These are
being delivered via on-line local library catalogues, especially through
CD-ROMs. Furthermore, according to MICHELSON, there is general optimism
that the copyright and fees issues impeding the delivery of full text on
existing research and education networks soon will be resolved.
The final process of scholarly communication is curriculum development
and instruction, and this involves the use of computer information
technologies in two areas. The first is the development of
computer-oriented instructional tools, which includes simulations,
multimedia applications, and computer tools that are used to assist in
the analysis of sources in the classroom, etc. The Perseus Project, a
database that provides a multimedia curriculum on classical Greek
civilization, is a good example of the way in which entire curricula are
being recast using information technologies. It is anticipated that the
current difficulty in exchanging computer-based instructional software
electronically, which in turn makes it difficult for one scholar to
build upon the work of others, will be resolved before too long.
Stand-alone curricular applications that involve electronic text will be
sharable through networks, reinforcing their significance as intellectual
products as well as instructional tools.
The second aspect of electronic learning involves the use of research and
education networks for distance education programs. Such programs
interactively link teachers with students in geographically scattered
locations and rely on the availability of electronic instructional
resources. Distance education programs are gaining wide appeal among
state departments of education because of their demonstrated capacity to
bring advanced specialized course work and an array of experts to many
classrooms. A recent report found that at least 32 states operated at
least one statewide network for education in 1991, with networks under
development in many of the remaining states.
MICHELSON summarized this section by noting two striking changes taking
place in scholarly communication among humanities scholars. First is the
extent to which electronic text in particular, and electronic resources
in general, are being infused into each of the five processes described
above. As mentioned earlier, there is a certain synergy at work here.
The use of electronic resources for one process tends to stimulate their
use for other processes, because the chief course of movement is toward a
comprehensive on-line working context for humanities scholars that
includes on-line availability of key bibliographies, scholarly feedback,
sources, analytical tools, and publications. MICHELSON noted further
that the movement toward a comprehensive on-line working context for
humanities scholars is not new. In fact, it has been underway for more
than forty years in the humanities, since Father Roberto Busa began
developing an electronic concordance of the works of Saint Thomas Aquinas
in 1949. What we are witnessing today, MICHELSON contended, is not the
beginning of this on-line transition but, for at least some humanities
scholars, the turning point in the transition from a print to an
electronic working context. Coinciding with the on-line transition, the
second striking change is the extent to which research and education
networks are becoming the new medium of scholarly communication. The
existing Internet and the pending National Research and Education Network
(NREN) represent the new meeting ground where scholars are going for
bibliographic information, scholarly dialogue and feedback, the most
current publications in their field, and high-level educational
offerings. Traditional scholarly practices are undergoing tremendous
transformations as a result of the emergence and growing prominence of
what is called network-mediated scholarship.
MICHELSON next turned to the second element of the framework she proposed
at the outset of her talk for evaluating the prospects for electronic
text, namely the key information technology trends affecting the conduct
of scholarly communication over the next decade: 1) end-user computing
and 2) connectivity.
End-user computing means that the person touching the keyboard, or
performing computations, is the same as the person who initiates or
consumes the computation. The emergence of personal computers, along
with a host of other forces, such as ubiquitous computing, advances in
interface design, and the on-line transition, is prompting the consumers
of computation to do their own computing, and is thus rendering obsolete
the traditional distinction between end users and ultimate users.
The trend toward end-user computing is significant to consideration of
the prospects for electronic texts because it means that researchers are
becoming more adept at doing their own computations and, thus, more
competent in the use of electronic media. As researchers bypass
programmer intermediaries, computation is becoming central to their
thought process. This direct involvement in computing is changing the
researcher's perspective on the nature of research itself, that is, the
kinds of questions that can be posed, the analytical methodologies that
can be used, the types and amount of sources that are appropriate for
analyses, and the form in which findings are presented. The trend toward
end-user computing means that, increasingly, electronic media and
computation are being infused into all processes of humanities
scholarship, inspiring remarkable transformations in scholarly
communication.
The trend toward greater connectivity suggests that researchers are using
computation increasingly in network environments. Connectivity is
important to scholarship because it erases the distance that separates
students from teachers and scholars from their colleagues, while allowing
users to access remote databases, share information in many different
media, connect to their working context wherever they are, and
collaborate in all phases of research.
The combination of the trend toward end-user computing and the trend
toward connectivity suggests that the scholarly use of electronic
resources, already evident among some researchers, will soon become an
established feature of scholarship. The effects of these trends, along
with ongoing changes in scholarly practices, point to a future in which
humanities researchers will use computation and electronic communication
to help them formulate ideas, access sources, perform research,
collaborate with colleagues, seek peer review, publish and disseminate
results, and engage in many other professional and educational activities.
In summary, MICHELSON emphasized four points: 1) A portion of humanities
scholars already consider electronic texts the preferred format for
analysis and dissemination. 2) Scholars are using these electronic
texts, in conjunction with other electronic resources, in all the
processes of scholarly communication. 3) The humanities scholars'
working context is in the process of changing from print technology to
electronic technology, in many ways mirroring transformations that have
occurred or are occurring within the scientific community. 4) These
changes are occurring in conjunction with the development of a new
communication medium: research and education networks that are
characterized by their capacity to advance scholarship in a wholly unique
way.
MICHELSON also reiterated her three principal arguments: 1) Electronic
texts are best understood in terms of the relationship to other
electronic resources and the growing prominence of network-mediated
scholarship. 2) The prospects for electronic texts lie in their capacity
to be integrated into the on-line network of electronic resources that
comprise the new working context for scholars. 3) Retrospective conversion
of portions of the scholarly record should be a key strategy as information
providers respond to changes in scholarly communication practices.
VECCIA * AM's evaluation project and public users of electronic resources
* AM and its design * Site selection and evaluating the Macintosh
implementation of AM * Characteristics of the six public libraries
selected * Characteristics of AM's users in these libraries * Principal
ways AM is being used *
Susan VECCIA, team leader, and Joanne FREEMAN, associate coordinator,
American Memory, Library of Congress, gave a joint presentation. First,
by way of introduction, VECCIA explained her and FREEMAN's roles in
American Memory (AM). Serving principally as an observer, VECCIA has
assisted with the evaluation project of AM, placing AM collections in a
variety of different sites around the country and helping to organize and
implement that project. FREEMAN has been an associate coordinator of AM
and has been involved principally with the interpretative materials,
preparing some of the electronic exhibits and printed historical
information that accompanies AM and that is requested by users. VECCIA
and FREEMAN shared anecdotal observations concerning AM with public users
of electronic resources. Notwithstanding a fairly structured evaluation
in progress, both VECCIA and FREEMAN chose not to report on specifics in
terms of numbers, etc., because they felt it was too early in the
evaluation project to do so.
AM is an electronic archive of primary source materials from the Library
of Congress, selected collections representing a variety of formats--
photographs, graphic arts, recorded sound, motion pictures, broadsides,
and soon, pamphlets and books. In terms of the design of this system,
the interpretative exhibits have been kept separate from the primary
resources, with good reason. Accompanying this collection are printed
documentation and user guides, as well as guides that FREEMAN prepared for
teachers so that they may begin using the content of the system at once.
VECCIA described the evaluation project before talking about the public
users of AM, limiting her remarks to public libraries, because FREEMAN
would talk more specifically about schools from kindergarten to twelfth
grade (K-12). Having started in spring 1991, the evaluation currently
involves testing of the Macintosh implementation of AM. Since the
primary goal of this evaluation is to determine the most appropriate
audience or audiences for AM, very different sites were selected. This
makes evaluation difficult because of the varying degrees of technology
literacy among the sites. AM is situated in forty-four locations, of
which six are public libraries and sixteen are schools. Represented
among the schools are elementary, junior high, and high schools.
District offices also are involved in the evaluation, which will
conclude in summer 1993.
VECCIA focused the remainder of her talk on the six public libraries, one
of which doubles as a state library. They represent a range of
geographic areas and a range of demographic characteristics. For
example, three are located in urban settings, two in rural settings, and
one in a suburban setting. A range of technical expertise is to be found
among these facilities as well. For example, one is an "Apple library of
the future," while two others are rural one-room libraries--in one, AM
sits at the front desk next to a tractor manual.
All public libraries have been extremely enthusiastic, supportive, and
appreciative of the work that AM has been doing. VECCIA characterized
various users: Most users in public libraries describe themselves as
general readers; of the students who use AM in the public libraries,
those in fourth grade and above seem most interested. Public libraries
in rural sites tend to attract retired people, who have been highly
receptive to AM. Users tend to fall into two additional categories:
people interested in the content and historical connotations of these
primary resources, and those fascinated by the technology. The format
receiving the most comments has been motion pictures. The adult users in
public libraries are more comfortable with IBM computers, whereas young
people seem comfortable with either IBM or Macintosh, although most of
them seem to come from a Macintosh background. This same tendency is
found in the schools.
What kinds of things do users do with AM? In a public library there are
two main ways in which AM is being used: as an individual learning
tool, and as a leisure activity. Adult learning was one area that VECCIA
would highlight as a possible application for a tool such as AM. She
described a patron of a rural public library who comes in every day on
his lunch hour and literally reads AM, methodically going through the
collection image by image. At the end of his hour he makes an electronic
bookmark, puts it in his pocket, and returns to work. The next day he
comes in and resumes where he left off. Interestingly, this man had
never been in the library before he used AM. In another small, rural
library, the coordinator reports that AM is a popular activity for some
of the older, retired people in the community, who ordinarily would not
use "those things"--computers. Another example of adult learning in
public libraries is book groups, one of which, in particular, is using AM
as part of its reading on industrialization, integration, and urbanization
in the early 1900s.
One library reports that a family is using AM to help educate their
children. In another instance, individuals from a local museum came in
to use AM to prepare an exhibit on toys of the past. These two examples
emphasize the mission of the public library as a cultural institution,
reaching out to people who do not have the resources available to
those who live in a metropolitan area or have access to a major library.
One rural library reports that junior high school students in large
numbers came in one afternoon to use AM for entertainment. A number of
public libraries reported great interest among postcard collectors in the
Detroit collection, which was essentially a collection of images used on
postcards around the turn of the century. Train buffs are similarly
interested because that was a time of great interest in railroading.
People, it was found, relate to things that they know of firsthand. For
example, in both rural public libraries where AM was made available,
observers reported that the older people with personal remembrances of
the turn of the century were gravitating to the Detroit collection.
These examples served to underscore MICHELSON's observation regarding the
integration of electronic tools and ideas--that people learn best when
the material relates to something they know.
VECCIA made the final point that in many cases AM serves as a
public-relations tool for the public libraries that are testing it. In
one case, AM is being used as a vehicle to secure additional funding for
the library. In another case, AM has served as an inspiration to the
staff of a major local public library in the South to think about ways to
make its own collection of photographs more accessible to the public.
FREEMAN * AM and archival electronic resources in a school environment *
Questions concerning context * Questions concerning the electronic format
itself * Computer anxiety * Access and availability of the system *
Hardware * Strengths gained through the use of archival resources in
schools *
Reiterating an observation made by VECCIA, that AM is an archival
resource made up of primary materials with very little interpretation,
FREEMAN stated that the project has attempted to bridge the gap between
these bare primary materials and a school environment, and in that cause
has created guided introductions to AM collections. Loud demand from the
educational community, chiefly from teachers working with the upper
grades of elementary school through high school, greeted the announcement
that AM would be tested around the country.
FREEMAN reported not only on what was learned about AM in a school
environment, but also on several universal questions that were raised
concerning archival electronic resources in schools. She discussed
several strengths of this type of material in a school environment as
opposed to a highly structured resource that offers a limited number of
paths to follow.
FREEMAN first raised several questions about using AM in a school
environment. There is often some difficulty in developing a sense of
what the system contains. Many students sit down at a computer resource
and assume that, because AM comes from the Library of Congress, all of
American history is now at their fingertips. As a result of that sort of
mistaken judgment, some students are known to conclude that AM contains
nothing of use to them when they look for one or two things and do not
find them. It is difficult to discover that middle ground where one has
a sense of what the system contains. Some students grope toward the idea
of an archive, a new idea to them, since they have not previously
experienced what it means to have access to a vast body of somewhat
Other questions raised by FREEMAN concerned the electronic format itself.
For instance, in a school environment it is often difficult both for
teachers and students to gain a sense of what it is they are viewing.
They understand that it is a visual image, but they do not necessarily
know that it is a postcard from the turn of the century, a panoramic
photograph, or even machine-readable text of an eighteenth-century
broadside, a twentieth-century printed book, or a nineteenth-century
diary. That distinction is often difficult for people in a school
environment to grasp. Because of that, it occasionally becomes difficult
to draw conclusions from what one is viewing.
FREEMAN also noted the obvious fear of the computer, which constitutes a
difficulty in using an electronic resource. Though students in general
did not suffer from this anxiety, several older students feared that they
were computer-illiterate, an assumption that became self-fulfilling when
they searched for something but failed to find it. FREEMAN said she
believed that some teachers also fear computer resources, because they
believe they lack complete control. FREEMAN related the example of
teachers shooing away students because it was not their time to use the
system. This was a case in which the situation had to be extremely
structured so that the teachers would not feel that they had lost their
grasp on what the system contained.
A final question raised by FREEMAN concerned access and availability of
the system. She noted the occasional existence of a gap in communication
between school librarians and teachers. Often AM sits in a school
library and the librarian is the person responsible for monitoring the
system. Teachers do not always take into their world new library
resources about which the librarian is excited. Indeed, at the sites
where AM had been used most effectively within a library, the librarian
was required to go to specific teachers and instruct them in its use. As
a result, several AM sites will have in-service sessions over a summer,
in the hope that perhaps, with a more individualized link, teachers will
be more likely to use the resource.
A related issue in the school context concerned the number of
workstations available at any one location. Centralization of equipment
at the district level, with teachers invited to download things and walk
away with them, proved unsuccessful because the hours these offices were
open were also school hours.
Another issue was hardware. As VECCIA observed, a range of sites exists,
some technologically advanced and others essentially acquiring their
first computer for the primary purpose of using it in conjunction with
AM's testing. Users at technologically sophisticated sites want even
more sophisticated hardware, so that they can perform even more
sophisticated tasks with the materials in AM. But once they acquire a
newer piece of hardware, they must learn how to use that also; at an
unsophisticated site it takes an extremely long time simply to become
accustomed to the computer, not to mention the program offered with the
computer. All of these small issues raise one large question, namely,
are systems like AM truly rewarding in a school environment, or do they
simply act as innovative toys that do little more than spark interest?
FREEMAN contended that the evaluation project has revealed several strengths
that were gained through the use of archival resources in schools, including:
* Psychic rewards from using AM as a vast, rich database, with
teachers assigning various projects to students--oral presentations,
written reports, a documentary, a turn-of-the-century newspaper--
projects that start with the materials in AM but are completed using
other resources; AM thus is used as a research tool in conjunction
with other electronic resources, as well as with books and items in
the library where the system is set up.
* Students are acquiring computer literacy in a humanities context.
* This sort of system is overcoming the isolation between disciplines
that often exists in schools. For example, many English teachers are
requiring their students to write papers on historical topics
represented in AM. Numerous teachers have reported that their
students are learning critical thinking skills using the system.
* On a broader level, AM is introducing primary materials, not only
to students but also to teachers, in an environment where often
simply none exist--an exciting thing for the students because it
helps them learn to conduct research, to interpret, and to draw
their own conclusions. In learning to conduct research and what it
means, students are motivated to seek knowledge. That relates to
another positive outcome--a high level of personal involvement of
students with the materials in this system and greater motivation to
conduct their own research and draw their own conclusions.
* Perhaps the most ironic strength of these kinds of archival
electronic resources is that many of the teachers AM interviewed
were desperate, it is no exaggeration to say, not only for primary
materials but for unstructured primary materials. These would, they
thought, foster personally motivated research, exploration, and
excitement in their students. Indeed, these materials have done
just that. Ironically, however, this lack of structure produces
some of the confusion to which the newness of these kinds of
resources may also contribute. The key to effective use of archival
products in a school environment is a clear, effective introduction
to the system and to what it contains.
DISCUSSION * Nothing known, quantitatively, about the number of
humanities scholars who must see the original versus those who would
settle for an edited transcript, or about the ways in which humanities
scholars are using information technology * Firm conclusions concerning
the manner and extent of the use of supporting materials in print
provided by AM to await completion of evaluative study * A listener's
reflections on additional applications of electronic texts * Role of
electronic resources in teaching elementary research skills to students *
During the discussion that followed the presentations by MICHELSON,
VECCIA, and FREEMAN, additional points emerged.
LESK asked if MICHELSON could give any quantitative estimate of the
number of humanities scholars who must see or want to see the original,
or the best possible version of the material, versus those who typically
would settle for an edited transcript. While unable to provide a figure,
she offered her impressions as an archivist who has done some reference
work and has discussed this issue with other archivists who perform
reference, that those who use archives and those who use primary sources
for what would be considered very high-level scholarly research, as
opposed to, say, undergraduate papers, were few in number, especially
given the public interest in using primary sources to conduct
genealogical or avocational research and the kind of professional
research done by people in private industry or the federal government.
More important in MICHELSON's view was that, quantitatively, nothing is
known about the ways in which, for example, humanities scholars are using
information technology. No studies exist to offer guidance in creating
strategies. The most recent study was conducted in 1985 by the American
Council of Learned Societies (ACLS), and what it showed was that 50
percent of humanities scholars at that time were using computers. That
constitutes the extent of our knowledge.
Concerning AM's strategy for orienting people toward the scope of
electronic resources, FREEMAN could offer no hard conclusions at this
point, because she and her colleagues were still waiting to see,
particularly in the schools, what has been made of their efforts. Within
the system, however, AM has provided what are called electronic exhibits
--such as introductions to time periods and materials--and these are
intended to offer a student user a sense of what a broadside is and what
it might tell her or him. But FREEMAN conceded that the project staff
would have to talk with students next year, after teachers have had a
summer to use the materials, and attempt to discover what the students
were learning from the materials. In addition, FREEMAN described
supporting materials in print provided by AM at the request of local
teachers during a meeting held at LC. These included time lines,
bibliographies, and other materials that could be reproduced on a
photocopier in a classroom. Teachers could walk away with and use these,
and in this way gain a better understanding of the contents. But again,
reaching firm conclusions concerning the manner and extent of their use
would have to wait until next year.
As to the changes she saw occurring at the National Archives and Records
Administration (NARA) as a result of the increasing emphasis on
technology in scholarly research, MICHELSON stated that NARA at this
point was absorbing the report by her and Jeff Rothenberg addressing
strategies for the archival profession in general, although not for the
National Archives specifically. NARA is just beginning to establish its
role and what it can do. In terms of changes and initiatives that NARA
can take, no clear response could be given at this time.
GREENFIELD remarked on two trends mentioned in the session. Reflecting on
DALY's opening comments on how he could have used a Latin collection of
text in an electronic form, he said that at first he thought most scholars
would be unwilling to do that. But as he thought of that in terms of the
original meaning of research--that is, having already mastered these texts,
researching them for critical and comparative purposes--for the first time,
the electronic format made a lot of sense. GREENFIELD could envision
growing numbers of scholars learning the new technologies for that very
aspect of their scholarship and for convenience's sake.
Listening to VECCIA and FREEMAN, GREENFIELD thought of an additional
application of electronic texts. He realized that AM could be used as a
guide to lead someone to original sources. Students cannot be expected
to have mastered these sources, things they have never known about
before. Thus, AM is leading them, in theory, to a vast body of
information and giving them a superficial overview of it, enabling them
to select parts of it. GREENFIELD asked if any evidence exists that this
resource will indeed teach the new user, the K-12 students, how to do
research. Scholars already know how to do research and are applying
these new tools. But he wondered why students would go beyond picking
out things that were most exciting to them.
FREEMAN conceded the correctness of GREENFIELD's observation as applied
to a school environment. The risk is that a student would sit down at a
system, play with it, find some things of interest, and then walk away.
But in the relatively controlled situation of a school library, much will
depend on the instructions a teacher or a librarian gives a student. She
viewed the situation not as one of fine-tuning research skills but of
involving students at a personal level in understanding and researching
things. Given the guidance one can receive at school, it then becomes
possible to teach elementary research skills to students, which in fact
one particular librarian said she was teaching her fifth graders.
FREEMAN concluded that introducing the idea of following one's own path
of inquiry, which is essentially what research entails, involves more
than teaching specific skills. To these comments VECCIA added the
observation that the individual teacher and the use of a creative
resource, rather than AM itself, seemed to make the key difference.
Some schools and some teachers are making excellent use of the nature
of critical thinking and teaching skills, she said.
Concurring with these remarks, DALY closed the session with the thought that
the more that producers produced for teachers and for scholars to use with
their students, the more successful their electronic products would prove.
SESSION II. SHOW AND TELL
Jacqueline HESS, director, National Demonstration Laboratory, served as
moderator of the "show-and-tell" session. She noted that a
question-and-answer period would follow each presentation.
MYLONAS * Overview and content of Perseus * Perseus' primary materials
exist in a system-independent, archival form * A concession * Textual
aspects of Perseus * Tools to use with the Greek text * Prepared indices
and full-text searches in Perseus * English-Greek word search leads to
close study of words and concepts * Navigating Perseus by tracing down
indices * Using the iconography to perform research *
Elli MYLONAS, managing editor, Perseus Project, Harvard University, first
gave an overview of Perseus, a large, collaborative effort based at
Harvard University but with contributors and collaborators located at
numerous universities and colleges in the United States (e.g., Bowdoin,
Maryland, Pomona, Chicago, Virginia). Funded primarily by the
Annenberg/CPB Project, with additional funding from Apple, Harvard, and
the Packard Humanities Institute, among others, Perseus is a multimedia,
hypertextual database for teaching and research on classical Greek
civilization, which was released in February 1992 in version 1.0 and
distributed by Yale University Press.
Consisting entirely of primary materials, Perseus includes ancient Greek
texts and translations of those texts; catalog entries--that is, museum
catalog entries, not library catalog entries--on vases, sites, coins,
sculpture, and archaeological objects; maps; and a dictionary, among
other sources. The objects for which catalog entries exist are
accompanied by thousands of color images, which
constitute a major feature of the database. Perseus contains
approximately 30 megabytes of text, an amount that will double in
subsequent versions. In addition to these primary materials, the Perseus
Project has been building tools for using them, making access and
navigation easier, the goal being to build part of the electronic
environment discussed earlier in the morning in which students or
scholars can work with their sources.
The demonstration of Perseus will show only a fraction of the real work
that has gone into it, because the project had to face the dilemma of
what to enter when putting something into machine-readable form: should
one aim for very high quality or make concessions in order to get the
material in? Since Perseus decided to opt for very high quality, all of
its primary materials exist in a system-independent--insofar as it is
possible to be system-independent--archival form. Deciding what that
archival form would be and attaining it required much work and thought.
For example, all the texts are marked up in SGML, which will be made
compatible with the guidelines of the Text Encoding Initiative (TEI) when
they are issued.
Drawings are PostScript files, not meeting international standards, but
at least designed to go across platforms. Images, or rather the real
archival forms, consist of the best available slides, which are being
digitized. Much of the catalog material exists in database form--a form
that the average user could use, manipulate, and display on a personal
computer, but only at great cost. Thus, this is where the concession
comes in: All of this rich, well-marked-up information is stripped of
much of its content; the images are converted into bit-maps and the text
into small formatted chunks. All this information can then be imported
into HyperCard and run on a mid-range Macintosh, which is what Perseus
users have. This fact has made it possible for Perseus to attain wide
use fairly rapidly. Without those archival forms the HyperCard version
being demonstrated could not be made easily, and the project could not
have the potential to move to other forms and machines and software as
they appear, none of which information is in Perseus on the CD.
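The step from archival form to delivery form described above can be illustrated with a minimal sketch in Python. It is not the Perseus tool chain: the markup fragment, the element names (div, l), and the chunk size are all hypothetical, and Python's HTML parser merely stands in for a real SGML parser.

```python
from html.parser import HTMLParser

# Hypothetical archival fragment; element names and text are illustrative only.
ARCHIVAL = """<text><div type="play" n="PB">
<l n="1">First line of verse</l>
<l n="2">Second line of verse</l>
<l n="3">Third line of verse</l>
</div></text>"""


class LineCollector(HTMLParser):
    """Collect the text of each <l> element and discard all other markup."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._in_line = False

    def handle_starttag(self, tag, attrs):
        if tag == "l":
            self._in_line = True
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag == "l":
            self._in_line = False

    def handle_data(self, data):
        if self._in_line:
            self.lines[-1] += data


def to_chunks(archival, lines_per_chunk=2):
    """Strip the markup and group the lines into small formatted chunks."""
    collector = LineCollector()
    collector.feed(archival)
    collector.close()
    lines = collector.lines
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


chunks = to_chunks(ARCHIVAL)
for chunk in chunks:
    print(chunk)
    print("---")
```

The sketch shows the trade-off in miniature: the chunks are easy to display, but the structural information carried by the attributes (here, type and n) survives only in the archival source.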
Of the numerous multimedia aspects of Perseus, MYLONAS focused on the
textual. Part of what makes Perseus such a pleasure to use, MYLONAS
said, is this effort at seamless integration and the ability to move
around both visual and textual material. Perseus also made the decision
not to attempt to interpret its material any more than one interprets by
selecting. But, MYLONAS emphasized, Perseus is not courseware: No
syllabus exists. There is no effort to define how one teaches a topic
using Perseus, although the project may eventually collect papers by
people who have used it to teach. Rather, Perseus aims to provide
primary material in a kind of electronic library, an electronic sandbox,
so to say, in which students and scholars who are working on this
material can explore by themselves. With that, MYLONAS demonstrated
Perseus, beginning with the Perseus gateway, the first thing one sees
upon opening Perseus--an effort in part to solve the contextualizing
problem--which tells the user what the system contains.
MYLONAS demonstrated only a very small portion, beginning with primary
texts and running off the CD-ROM. Having selected Aeschylus' Prometheus
Bound, which was viewable in Greek and English in roughly the same
segments side by side, MYLONAS demonstrated tools to use with the Greek text,
something not possible with a book: looking up the dictionary entry form
of an unfamiliar word in Greek after subjecting it to Perseus'
morphological analysis for all the texts. After finding out about a
word, a user may then decide to see if it is used anywhere else in Greek.
Because vast amounts of indexing support all of the primary material, one
can find out where else all forms of a particular Greek word appear--
often not a trivial matter because Greek is highly inflected. Further,
since the story of Prometheus has to do with the origins of sacrifice, a
user may wish to study and explore sacrifice in Greek literature; by
typing sacrifice into a small window, a user goes to the English-Greek
word list--something one cannot do without the computer (Perseus has
indexed the definitions of its dictionary)--where the string sacrifice
appears in the definitions of sixty-five words. One may then find out
where any of those words is used in the work(s) of a particular author.
The English definitions are not lemmatized.
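The English-Greek word search just described amounts to indexing the definitions of a dictionary by the English words they contain. A minimal sketch in Python follows. The four-entry lexicon (transliterated headwords with simplified glosses) is illustrative only, and matching on whole words is an assumption; nothing here reproduces Perseus' actual data or code.

```python
import re
from collections import defaultdict

# Illustrative mini-lexicon: transliterated headwords, simplified glosses.
LEXICON = {
    "thusia": "a sacrifice, an offering",
    "thuo": "to offer by burning, to sacrifice",
    "hiereion": "an animal for sacrifice, a victim",
    "logos": "word, speech, account",
}


def build_definition_index(lexicon):
    """Map each English word in a definition to the headwords that use it."""
    index = defaultdict(set)
    for headword, definition in lexicon.items():
        for word in re.findall(r"[a-z]+", definition.lower()):
            index[word].add(headword)
    return index


def english_to_greek(index, english_word):
    """Return the headwords whose definitions contain the exact English word."""
    return sorted(index.get(english_word.lower(), set()))


index = build_definition_index(LEXICON)
matches = english_to_greek(index, "sacrifice")
print(matches)
```

Because the English side of such an index is not lemmatized, a query for sacrifices or sacrificed would find nothing in this sketch unless those exact forms occurred in a definition.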
All of the indices driving this kind of usage were originally devised for
speed, MYLONAS observed; in other words, all that kind of information--
all forms of all words, where they exist, the dictionary form they belong
to--were collected into databases, which will expedite searching. Then
it was discovered that one can do things searching in these databases
that could not be done searching in the full texts. Thus, although there
are full-text searches in Perseus, much of the work is done behind the
scenes, using prepared indices. Regarding the indexing done behind the
scenes, MYLONAS pointed out that without the SGML forms of the text, it
could not be done effectively. Much of this indexing is based on the
structures that are made explicit by the SGML tagging.
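The kind of prepared index described above (all forms of all words, where they occur, and the dictionary form to which each belongs) can be sketched in Python as two lookup tables. The token list below is hypothetical, with transliterated forms and invented locations, and the sketch assumes each surface form belongs to a single dictionary form, which real morphological analysis cannot assume.

```python
from collections import defaultdict

# Hypothetical analysed tokens: (work, line, surface form, dictionary form).
TOKENS = [
    ("Text A", 12, "thuei", "thuo"),
    ("Text A", 40, "ethusen", "thuo"),
    ("Text B", 7, "thuo", "thuo"),
    ("Text B", 9, "logon", "logos"),
]


def build_indices(tokens):
    """Build form -> dictionary form and dictionary form -> occurrences."""
    form_to_lemma = {}
    lemma_to_places = defaultdict(list)
    for work, line, form, lemma in tokens:
        form_to_lemma[form] = lemma
        lemma_to_places[lemma].append((work, line, form))
    return form_to_lemma, lemma_to_places


def all_occurrences(form, form_to_lemma, lemma_to_places):
    """Find every occurrence of any form of the word to which `form` belongs."""
    lemma = form_to_lemma.get(form)
    if lemma is None:
        return []
    return lemma_to_places.get(lemma, [])


form_to_lemma, lemma_to_places = build_indices(TOKENS)
places = all_occurrences("ethusen", form_to_lemma, lemma_to_places)
print(places)
```

A literal full-text search for the string thuo would find only the occurrence in Text B; the index retrieves the inflected forms as well, which is the sense in which such databases allow searches that the full texts do not.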
It was found that one of the things many of Perseus' non-Greek-reading
users do is start from the dictionary and then move into the close study
of words and concepts via this kind of English-Greek word search, by which
means they might select a concept. This exercise has been assigned to
students in core courses at Harvard--to study a concept by looking for the
English word in the dictionary, finding the Greek words, and then finding
the words in the Greek but, of course, reading across in the English.
That tells them a great deal about what a translation means as well.
Should one also wish to see images that have to do with sacrifice, that
person would go to the object key word search, which allows one to
perform a similar kind of index retrieval on the database of
archaeological objects. Without words, pictures are useless; Perseus has
not reached the point where it can do much with images that are not
cataloged. Thus, although it is possible in Perseus with text and images
to navigate by knowing where one wants to end up--for example, a
red-figure vase from the Boston Museum of Fine Arts--one can perform this
kind of navigation very easily by tracing down indices. MYLONAS
illustrated several generic scenes of sacrifice on vases. The features
demonstrated derived from Perseus 1.0; version 2.0 will implement even
better means of retrieval.
MYLONAS closed by looking at one of the pictures and noting again that
one can do a great deal of research using the iconography as well as the
texts. For instance, students in a core course at Harvard this year were
highly interested in Greek concepts of foreigners and representations of
non-Greeks. So they performed a great deal of research, both with texts
(e.g., Herodotus) and with iconography on vases and coins, on how the
Greeks portrayed non-Greeks. At the same time, art historians who study
iconography were also interested, and were able to use this material.
DISCUSSION * Indexing and searchability of all English words in Perseus *
Several features of Perseus 1.0 * Several levels of customization
possible * Perseus used for general education * Perseus' effects on
education * Contextual information in Perseus * Main challenge and
emphasis of Perseus *
Several points emerged in the discussion that followed MYLONAS's presentation.
Although MYLONAS had not demonstrated Perseus' ability to cross-search
documents, she confirmed that all English words in Perseus are indexed
and can be searched. So, for example, sacrifice could have been searched
in all texts, the historical essay, and all the catalog entries with
their descriptions--in short, in all of Perseus.
Boolean logic is not in Perseus 1.0 but will be added to the next
version, although an effort is being made not to restrict Perseus to a
database in which one just performs searching, Boolean or otherwise. It
is possible to move laterally through the documents by selecting a word
one is interested in and selecting an area of information one is
interested in and trying to look that word up in that area.
Since Perseus was developed in HyperCard, several levels of customization
are possible. Simple authoring tools exist that allow one to create
annotated paths through the information, which are useful for note-taking
and for guided tours for teaching purposes and for expository writing.
With a little more ingenuity it is possible to begin to add or substitute
material in Perseus.
Perseus has not been used so much for classics education as for general
education, where it seemed to have an impact on the students in the core
course at Harvard (a general required course that students must take in
certain areas). Students were able to use primary material much more.
The Perseus Project has an evaluation team at the University of Maryland
that has been documenting Perseus' effects on education. Perseus is very
popular, and anecdotal evidence indicates that it is having an effect at
places other than Harvard, for example, test sites at Ball State
University, Drury College, and numerous small places where opportunities
to use vast amounts of primary data may not exist. One documented effect
is that archaeological, anthropological, and philological research is
being done by the same person instead of by three different people.
The contextual information in Perseus includes an overview essay, a
fairly linear historical essay on the fifth century B.C. that provides
links into the primary material (e.g., Herodotus, Thucydides, and
Plutarch), via small gray underscoring (on the screen) of linked
passages. These are handmade links into other material.
To different extents, most of the production work was done at Harvard,
where the people and the equipment are located. Much of the
collaborative activity involved data collection and structuring, because
the main challenge and the emphasis of Perseus is the gathering of
primary material, that is, building a useful environment for studying
classical Greece, collecting data, and making it useful.
Systems-building is definitely not the main concern. Thus, much of the
work has involved writing essays, collecting information, rewriting it,
and tagging it. That can be done off site. The creative link for the
overview essay as well as for both systems and data was collaborative,
and was forged via E-mail and paper mail with professors at Pomona and
CALALUCA * PLD's principal focus and contribution to scholarship *
Various questions preparatory to beginning the project * Basis for
project * Basic rule in converting PLD * Concerning the images in PLD *
Running PLD under a variety of retrieval software * Encoding the
database a hard-fought issue * Various features demonstrated * Importance
of user documentation * Limitations of the CD-ROM version *
Eric CALALUCA, vice president, Chadwyck-Healey, Inc., demonstrated a
software interpretation of the Patrologia Latina Database (PLD). PLD's
principal focus from the beginning of the project about three-and-a-half
years ago was on converting Migne's Latin series, and in the end,
CALALUCA suggested, conversion of the text will be the major contribution
to scholarship. CALALUCA stressed that, as possibly the only private
publishing organization at the Workshop, Chadwyck-Healey had sought no
federal funds or national foundation support before embarking upon the
project, but instead had relied upon a great deal of homework and
marketing to accomplish the task of conversion.
Ever since the possibilities of computer-searching have emerged, scholars
in the field of late ancient and early medieval studies (philosophers,
theologians, classicists, and those studying the history of natural law
and the history of the legal development of Western civilization) have
been longing for a fully searchable version of Western literature, for
example, all the texts of Augustine and Bernard of Clairvaux and
Boethius, not to mention all the secondary and tertiary authors.
Various questions arose, CALALUCA said. Should one convert Migne?
Should the database be encoded? Is it necessary to do that? How should
it be delivered? What about CD-ROM? Since this is a transitional
medium, why even bother to create software to run on a CD-ROM? Since
everybody knows people will be networking information, why go to the
trouble--which is far greater with CD-ROM than with the production of
magnetic data? Finally, how does one make the data available? Can many
of the hurdles to using electronic information that some publishers have
imposed upon databases be eliminated?
The PLD project was based on the principle that computer-searching of
texts is most effective when it is done with a large database. Because
PLD represented a collection that serves so many disciplines across so
many periods, it was irresistible.
The basic rule in converting PLD was to do no harm, to avoid the sins of
intrusion in such a database: no introduction of newer editions, no
on-the-spot changes, no eradicating of all possible falsehoods from an
edition. Thus, PLD is not the final act in electronic publishing for
this discipline, but simply the beginning. The conversion of PLD has
evoked numerous unanticipated questions: How will information be used?
What about networking? Can the rights of a database be protected?
Should one protect the rights of a database? How can it be made
Those converting PLD also tried to avoid the sins of omission, that is,
excluding portions of the collections or whole sections. What about the
images? PLD is full of images; some are extremely pious
nineteenth-century representations of the Fathers, while others contain
highly interesting elements. The goal was to cover all the text of Migne
(including notes, in Greek and in Hebrew, the latter of which, in
particular, causes problems in creating a search structure), all the
indices, and even the images, which are being scanned in separately
Several North American institutions that have placed acquisition requests
for the PLD database have requested it in magnetic form without software,
which means they are already running it without software, without
anything demonstrated at the Workshop.
What cannot practically be done is go back and reconvert and re-encode
data, a time-consuming and extremely costly enterprise. CALALUCA sees
PLD as a database that can, and should, be run under a variety of
retrieval software. This will permit the widest possible searches.
Consequently, the need to produce a CD-ROM of PLD, as well as to develop
software that could handle some 1.3 gigabytes of heavily encoded text,
developed out of conversations with collection development and reference
librarians who wanted software both compassionate enough for the
pedestrian but also capable of incorporating the most detailed
lexicographical studies that a user desires to conduct. In the end, the
encoding and conversion of the data will prove the most enduring
testament to the value of the project.
The encoding of the database was also a hard-fought issue: Did the
database need to be encoded? Were there normative structures for encoding
humanist texts? Should it be SGML? What about the TEI--will it last,
will it prove useful? CALALUCA expressed some minor doubts as to whether
a data bank can be fully TEI-conformant. Every effort can be made, but
in the end to be TEI-conformant means to accept the need to make some
firm encoding decisions that can, indeed, be disputed. The TEI points
the publisher in a proper direction but does not presume to make all the
decisions for him or her. Essentially, the goal of encoding was to
eliminate, as much as possible, the hindrances to information-networking,
so that if an institution acquires a database, everybody associated with
the institution can have access to it.
CALALUCA demonstrated a portion of Volume 160, because it had the most
anomalies in it. The software was created by Electronic Book
Technologies of Providence, RI, and is called Dynatext. The software
works only with SGML-coded data.
Viewing a table of contents on the screen, the audience saw how Dynatext
treats each element as a book and attempts to simplify movement through a
volume. Familiarity with the Patrologia in print (i.e., the text, its
source, and the editions) will make the machine-readable versions highly
useful. (Software with a Windows application was sought for PLD,
CALALUCA said, because this was the main trend for scholarly use.)
CALALUCA also demonstrated how a user can perform a variety of searches
and quickly move to any part of a volume; the look-up screen provides
some basic, simple word-searching.
CALALUCA argued that one of the major difficulties is not the software.
Rather, in creating a product that will be used by scholars representing
a broad spectrum of computer sophistication, user documentation proves
to be the most important service one can provide.
CALALUCA next illustrated a truncated search under mysterium within ten
words of virtus and how one would be able to find its contents throughout
the entire database. He said that the exciting thing about PLD is that
many of the applications in the retrieval software being written for it
will exceed the capabilities of the software employed now for the CD-ROM
version. The CD-ROM faces genuine limitations, in terms of speed and
comprehensiveness, in the creation of a retrieval software to run it.
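A proximity search with truncation, of the kind CALALUCA illustrated, can be sketched in a few lines of Python. This is not Dynatext's implementation: the trailing asterisk as a truncation marker, the function names, and the Latin sample sentence (invented for the example, not taken from Migne) are all assumptions.

```python
import re


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def matches(token, pattern):
    """A trailing '*' marks a truncated term that matches any ending."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def near(text, first, second, window=10):
    """Return (i, j) word positions where `first` and `second` occur
    within `window` words of each other, in either order."""
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if matches(t, first)]
    second_pos = [j for j, t in enumerate(tokens) if matches(t, second)]
    return [
        (i, j)
        for i in first_pos
        for j in second_pos
        if i != j and abs(i - j) <= window
    ]


# Invented sample sentence, used only to exercise the functions.
SAMPLE = "magnum est mysterium fidei et virtus dei in sanctis manifestatur"
hits = near(SAMPLE, "myster*", "virtus")
print(hits)
```

The sketch scans one passage in memory; running the same query across a database of more than a gigabyte is what requires the prepared indexes and the speed that the CD-ROM version struggles to provide.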
CALALUCA said he hoped that individual scholars will download the data,
if they wish, to their personal computers, and have ready access to
important texts on a constant basis, which they will be able to use in
their research and from which they might even be able to publish.
(CALALUCA explained that the blue numbers represented Migne's column numbers,
which are the standard scholarly references. Pulling up a note, he stated
that these texts were heavily edited and the image files would appear simply
as a note as well, so that one could quickly access an image.)
FLEISCHHAUER/ERWAY * Several problems with which AM is still wrestling *
Various search and retrieval capabilities * Illustration of automatic
stemming and a truncated search * AM's attempt to find ways to connect
cataloging to the texts * AM's gravitation towards SGML * Striking a
balance between quantity and quality * How AM furnishes users recourse to
images * Conducting a search in a full-text environment * Macintosh and
IBM prototypes of AM * Multimedia aspects of AM *
A demonstration of American Memory by its coordinator, Carl FLEISCHHAUER,
and Ricky ERWAY, associate coordinator, Library of Congress, concluded
the morning session. Beginning with a collection of broadsides from the
Continental Congress and the Constitutional Convention, the only text
collection in a presentable form at the time of the Workshop, FLEISCHHAUER
highlighted several of the problems with which AM is still wrestling.
(In its final form, the disk will contain two collections, not only the
broadsides but also the full text with illustrations of a set of
approximately 300 African-American pamphlets from the period 1870 to 1910.)
As FREEMAN had explained earlier, AM has attempted to use a small amount
of interpretation to introduce collections. In the present case, the
contractor, a company named Quick Source, in Silver Spring, Md., used
software called Toolbook and put together a modestly interactive
introduction to the collection. Like the two preceding speakers,
FLEISCHHAUER argued that the real asset was the underlying collection.
FLEISCHHAUER proceeded to describe various search and retrieval
capabilities while ERWAY worked the computer. In this particular package
the "go to" pull-down allowed the user in effect to jump out of Toolbook,
where the interactive program was located, and enter the third-party
software used by AM for this text collection, which is called Personal
Librarian. This was the Windows version of Personal Librarian, a
software application put together by a company in Rockville, Md.
Since the broadsides came from the Revolutionary War period, a search was
conducted using the words British or war, with the default operator reset
as or. FLEISCHHAUER demonstrated both automatic stemming (which finds
other forms of the same root) and a truncated search. One of Personal
Librarian's strongest features, the relevance ranking, was represented by
a chart that indicated how often words being sought appeared in
documents, with the one receiving the most "hits" obtaining the highest
score. The "hit list" that is supplied takes the relevance ranking into
account, making the first hit, in effect, the one the software has
selected as the most relevant example.
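The combination of an or search, automatic stemming, and a hit list ordered by the number of hits can be sketched in Python as follows. This is not Personal Librarian's algorithm: the suffix-stripping stemmer is a crude stand-in, the scoring is a plain count of matching words, and the three sample documents are invented.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "es", "ed", "s")  # crude stand-in for a real stemmer


def stem(word):
    """Strip one common suffix, keeping at least three letters of the root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def score(document, query_terms):
    """Count the words that share a stem with any query term (or logic)."""
    stems = {stem(term.lower()) for term in query_terms}
    counts = Counter(stem(t) for t in re.findall(r"[a-z]+", document.lower()))
    return sum(counts[s] for s in stems)


def ranked_hits(documents, query_terms):
    """Return the names of matching documents, most hits first."""
    scored = [(score(text, query_terms), name) for name, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for hits, name in ordered if hits > 0]


# Invented sample documents, not text from the broadside collection.
DOCS = {
    "broadside-1": "The British fleet and the British army continue the war.",
    "broadside-2": "Resolved, that the wars of this continent be ended.",
    "broadside-3": "An ordinance concerning the post office.",
}
hit_list = ranked_hits(DOCS, ["British", "war"])
print(hit_list)
```

In the sketch, stemming lets the query term war match wars in the second document, and the first document heads the hit list because it contains three matching words to the second document's one.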
While in the text of one of the broadside documents, FLEISCHHAUER
remarked on AM's attempt to find ways to connect cataloging to the texts,
which it does in different ways in different manifestations. In the case
shown, the cataloging was pasted on: AM took MARC records that were
written as on-line records right into one of the Library's mainframe
retrieval programs, pulled them out, and handed them off to the contractor,
who massaged them somewhat to display them in the manner shown. One of
AM's questions is, Does the cataloguing normally performed in the mainframe
work in this context, or ought AM to think through adjustments?
FLEISCHHAUER made the additional point that, as far as the text goes, AM
has gravitated towards SGML (he pointed to the boldface in the upper part
of the screen). Although extremely limited in its ability to translate
or interpret SGML, Personal Librarian will furnish both bold and italics
on screen, a fairly easy thing to do but one of the ways in which
SGML is useful.
Striking a balance between quantity and quality has been a major concern
of AM, with accuracy being one of the areas where project staff have
felt that less than 100-percent accuracy was acceptable.
FLEISCHHAUER cited the example of the standard of the rekeying industry,
namely 99.95 percent; as one service bureau informed him, to go from
99.95 to 100 percent would double the cost.
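To give a sense of what the 99.95-percent standard implies, the short
calculation below estimates the number of residual errors. It is a
sketch under stated assumptions: accuracy is taken to be measured per
character (the presentation did not say), and the figure of 2,000
characters per page is invented for the example. The page count is the
approximate 12,000 printed pages that AM reports for the African-American
pamphlets.

```python
from fractions import Fraction


def expected_errors(total_characters, accuracy_percent):
    """Expected count of wrong characters at a character-level accuracy.

    accuracy_percent is passed as a string (e.g. "99.95") so that it
    converts to an exact fraction without floating-point rounding.
    """
    error_rate = (100 - Fraction(accuracy_percent)) / 100
    return total_characters * error_rate


CHARS_PER_PAGE = 2000  # assumed for illustration; not from the source
PAGES = 12_000         # approximate pamphlet page count reported by AM

per_page = expected_errors(CHARS_PER_PAGE, "99.95")
per_collection = expected_errors(CHARS_PER_PAGE * PAGES, "99.95")
print(per_page, per_collection)
```

Under these assumptions the standard leaves about one wrong character
per page, or roughly 12,000 across the whole pamphlet collection.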
FLEISCHHAUER next demonstrated how AM furnishes users recourse to images,
and at the same time recalled LESK's pointed question concerning the
number of people who would look at those images and the number who would
work only with the text. If the implication of LESK's question was
sound, FLEISCHHAUER said, it raised the stakes for text accuracy and
reduced the value of the strategy for images.
Contending that preservation is always a bugaboo, FLEISCHHAUER
demonstrated several images derived from a scan of a preservation
microfilm that AM had made. He awarded a grade of C at best, perhaps a
C minus or a C plus, for how well it worked out. Indeed, the matter of
learning if other people had better ideas about scanning in general, and,
in particular, scanning from microfilm, was one of the factors that drove
AM to attempt to think through the agenda for the Workshop. Skew, for
example, was one of the issues that AM in its ignorance had not reckoned
would prove so difficult.
Further, the handling of images of the sort shown, in a desktop computer
environment, involved a considerable amount of zooming and scrolling.
Ultimately, AM staff feel that perhaps the paper copy that is printed out
might be the most useful one, but they remain uncertain as to how much
on-screen reading users will do.
Returning to the text, FLEISCHHAUER asked viewers to imagine a person who
might be conducting a search in a full-text environment. With this
scenario, he proceeded to illustrate other features of Personal Librarian
that he considered helpful; for example, it provides the ability to
notice words as one reads. Clicking the "include" button on the bottom
of the search window pops the words that have been highlighted into the
search. Thus, a user can refine the search as he or she reads,
re-executing the search and continuing to find things in the quest for
materials. This software not only contains relevance ranking, Boolean
operators, and truncation, it also permits one to perform word algebra,
so to say, where one puts two or three words in parentheses and links
them with one Boolean operator and then a couple of words in another set
of parentheses and asks for things within so many words of others.
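The "word algebra" described here, in which one parenthesized group of
words must occur within so many words of another group, can be sketched
as below. This is an illustration of the general idea only. It does not
reproduce Personal Librarian's query syntax or implementation, and the
function names and sample sentence are assumptions.

```python
import re


def positions(words, group):
    """Indexes at which any word of the group occurs. The group acts as
    a set of alternatives joined by "or"."""
    group = {g.lower() for g in group}
    return [i for i, w in enumerate(words) if w in group]


def within(text, group_a, group_b, distance):
    """True if some word from group_a occurs within `distance` words of
    some word from group_b."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    pos_a = positions(words, group_a)
    pos_b = positions(words, group_b)
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)


# Hypothetical sample sentence, invented for the illustration.
text = ("The British army marched north while the colonial militia "
        "prepared for war.")

# (British or colonial) within 2 words of (army or militia)
near = within(text, ["British", "colonial"], ["army", "militia"], 2)

# (British) within 3 words of (war)
far = within(text, ["British"], ["war"], 3)
print(near, far)
```

The first query succeeds because "British" is adjacent to "army"; the
second fails because "British" and "war" are ten words apart.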
Until they became acquainted recently with some of the work being done in
classics, the AM staff had not realized that a large number of the
projects that involve electronic texts were being done by people with a
profound interest in language and linguistics. Their search strategies
and thinking are oriented to those fields, as is shown in particular by
the Perseus example. As amateur historians, the AM staff were thinking
more of searching for concepts and ideas than for particular words.
Obviously, FLEISCHHAUER conceded, searching for concepts and ideas and
searching for words may be two rather closely related things.
While displaying several images, FLEISCHHAUER observed that the Macintosh
prototype built by AM contains a greater diversity of formats. Echoing a
previous speaker, he said that it was easier to stitch things together in
the Macintosh, though it tended to be a little more anemic in search and
retrieval. AM, therefore, increasingly has been investigating
sophisticated retrieval engines in the IBM format.
FLEISCHHAUER demonstrated several additional examples of the prototype
interfaces: One was AM's metaphor for the network future, in which a
kind of reading-room graphic suggests how one would be able to go around
to different materials. AM contains a large number of photographs in
analog video form worked up from a videodisc, which enable users to make
copies to print or incorporate in digital documents. A frame-grabber is
built into the system, making it possible to bring an image into a window
and digitize or print it out.
FLEISCHHAUER next demonstrated sound recordings, which included texts.
Recycled from a previous project, the collection included sixty 78-rpm
phonograph records of political speeches that were made during and
immediately after World War I. These constituted approximately three
hours of audio, as AM has digitized it, which occupy 150 megabytes on a
CD. Thus, they are considerably compressed. From the catalogue card,
FLEISCHHAUER proceeded to a transcript of a speech with the audio
available and with highlighted text following it as it played.
A photograph has been added and a transcription made.
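The remark that the recordings are considerably compressed can be
checked with simple arithmetic. The sketch below compares the stated
figures (about three hours in 150 megabytes) with uncompressed CD audio
at 44,100 samples per second, 16 bits, two channels. The choice of CD
audio as the baseline and the reading of a megabyte as one million bytes
are assumptions; the presentation did not say how the reduction was
achieved, and it may reflect a lower sampling rate or monaural sound
rather than compression in the strict sense.

```python
SECONDS = 3 * 60 * 60             # about three hours of audio
STORED_BYTES = 150 * 10**6        # 150 megabytes, taken as 10**6 bytes each

# Uncompressed CD audio: 44,100 samples/s x 2 bytes x 2 channels.
CD_BYTES_PER_SECOND = 44_100 * 2 * 2

uncompressed_bytes = SECONDS * CD_BYTES_PER_SECOND
ratio = uncompressed_bytes / STORED_BYTES
stored_rate = STORED_BYTES / SECONDS  # bytes per second as stored

print(uncompressed_bytes, round(ratio, 1), round(stored_rate))
```

On these assumptions the same three hours would need about 1.9 billion
bytes as CD audio, so the stored version is smaller by a factor of
roughly 12.7, at just under 14,000 bytes per second.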
Considerable value has been added beyond what the Library of Congress
normally would do in cataloguing a sound recording, which raises several
questions for AM concerning where to draw lines about how much value it can
afford to add and at what point, perhaps, this becomes more than AM could
reasonably do or reasonably wish to do. FLEISCHHAUER also demonstrated
a motion picture. As FREEMAN had reported earlier, the motion picture
materials have proved the most popular, not surprisingly. This says more
about the medium, he thought, than about AM's presentation of it.
Because AM's goal was to bring together things that could be used by
historians or by people who were curious about history,
turn-of-the-century footage seemed to represent the most appropriate
collections from the Library of Congress in motion pictures. These were
the very first films made by Thomas Edison's company and some others at
that time. The particular example illustrated was a Biograph film,
brought in with a frame-grabber into a window. A single videodisc
contains about fifty titles and pieces of film from that period, all of
New York City. Taken together, AM believes, they provide an interesting
DISCUSSION * Using the frame-grabber in AM * Volume of material processed
and to be processed * Purpose of AM within LC * Cataloguing and the
nature of AM's material * SGML coding and the question of quality versus
quantity *
During the question-and-answer period that followed FLEISCHHAUER's
presentation, several clarifications were made.
AM is bringing in motion pictures from a videodisc. The frame-grabber
devices create a window on a computer screen, which permits users to
digitize a single frame of the movie or one of the photographs. It
produces a crude, rough-and-ready image that high school students can
incorporate into papers, and that has worked very nicely in this way.
Commenting on FLEISCHHAUER's assertion that AM was looking more at
searching ideas than words, MYLONAS argued that without words an idea
does not exist. FLEISCHHAUER conceded that he ought to have articulated
his point more clearly. MYLONAS stated that they were in fact both
talking about the same thing. By searching for words and by forcing
people to focus on the word, the Perseus Project felt that they would get
them to the idea. The way one reviews results is tailored more to one
kind of user than another.
Concerning the total volume of material that has been processed in this
way, AM at this point has in retrievable form seven or eight collections,
all of them photographic. In the Macintosh environment, for example,
there probably are 35,000-40,000 photographs. The sound recordings
number sixty items. The broadsides number about 300 items. There are
500 political cartoons in the form of drawings. The motion pictures, as
individual items, number sixty to seventy.
AM also has a manuscript collection, the life history portion of one of
the federal project series, which will contain 2,900 individual
documents, all first-person narratives. AM has in process about 350
African-American pamphlets, or about 12,000 printed pages for the period
1870-1910. Also in the works are some 4,000 panoramic photographs. AM
has recycled a fair amount of the work done by LC's Prints and
Photographs Division during the Library's optical disk pilot project in
the 1980s. For example, a special division of LC has tooled up and
thought through all the ramifications of electronic presentation of
photographs. Indeed, they are wheeling them out in great barrel loads.
The purpose of AM within the Library, it is hoped, is to catalyze several
of the other special collection divisions which have no particular
experience with, and in some cases mixed feelings about, an activity such
as AM. Moreover, in many cases the divisions may be characterized as
lacking experience not only in "electronifying" things but also in
automated cataloguing. MARC cataloguing as practiced in the United States is
heavily weighted toward the description of monograph and serial
materials, but is much thinner when one enters the world of manuscripts
and things that are held in the Library's music collection and other
units. In response to a comment by LESK, that AM's material is very
heavily photographic, and is so primarily because individual records have
been made for each photograph, FLEISCHHAUER observed that an item-level
catalogue record exists, for example, for each photograph in the Detroit
Publishing collection of 25,000 pictures. In the case of the Federal
Writers Project, for which nearly 3,000 documents exist, representing
information from twenty-six different states, AM with the assistance of
Karen STUART of the Manuscript Division will attempt to find some way not
only to have a collection-level record but perhaps a MARC record for each
state, which will then serve as an umbrella for the 100-200 documents
that come under it. But that drama remains to be enacted. The AM staff
is conservative and clings to cataloguing, though of course visitors tout
artificial intelligence and neural networks in a manner that suggests that
perhaps one need not have cataloguing or that much of it could be put aside.
The matter of SGML coding, FLEISCHHAUER conceded, returned the discussion
to the earlier treated question of quality versus quantity in the Library
of Congress. Of course, text conversion can be done with 100-percent
accuracy, but it means that when one's holdings are as vast as LC's only
a tiny amount will be exposed, whereas permitting lower levels of
accuracy can lead to exposing or sharing larger amounts, but with the
quality correspondingly impaired.
TWOHIG * A contrary experience concerning electronic options * Volume of
material in the Washington papers and a suggestion of David Packard *
Implications of Packard's suggestion * Transcribing the documents for the
CD-ROM * Accuracy of transcriptions * The CD-ROM edition of the Founding
Fathers documents *
Finding encouragement in a comment of MICHELSON's from the morning
session--that numerous people in the humanities were choosing electronic
options to do their work--Dorothy TWOHIG, editor, The Papers of George
Washington, opened her illustrated talk by noting that her experience
with literary scholars and numerous people in editing was contrary to
MICHELSON's. TWOHIG emphasized literary scholars' complete ignorance of
the technological options available to them or their reluctance or, in
some cases, their downright hostility toward these options.
After providing an overview of the five Founding Fathers projects
(Jefferson at Princeton, Franklin at Yale, John Adams at the
Massachusetts Historical Society, and Madison down the hall from her at
the University of Virginia), TWOHIG observed that the Washington papers,
like all of the projects, include both sides of the Washington
correspondence and deal with some 135,000 documents to be published with
extensive annotation in eighty to eighty-five volumes, a project that
will not be completed until well into the next century. Thus, it was
with considerable enthusiasm several years ago that the Washington Papers
Project (WPP) greeted David Packard's suggestion that the papers of the
Founding Fathers could be published easily and inexpensively, and to the
great benefit of American scholarship, via CD-ROM.
In pragmatic terms, funding from the Packard Foundation would expedite
the transcription of thousands of documents waiting to be put on disk in
the WPP offices. Further, since the costs of collecting, editing, and
converting the Founding Fathers documents into letterpress editions were
running into the millions of dollars, and the considerable staffs
involved in all of these projects were devoting their careers to
producing the work, the Packard Foundation's suggestion had a
revolutionary aspect: Transcriptions of the entire corpus of the
Founding Fathers papers would be available on CD-ROM to public and
college libraries, even high schools, at a fraction of the cost--
$100-$150 for the annual license fee--to produce a limited university
press run of 1,000 of each volume of the published papers at $45-$150 per