Ammons: Welcome to the Preservation Technology Podcast. The show that brings you the people and projects that are advancing the future of America’s heritage. I’m Kevin Ammons with the National Park Service’s National Center for Preservation Technology and Training. In this edition of the podcast, we join NCPTT’s Jeff Guin as he speaks with Kit Arrington, digital library specialist at the Library of Congress. They will discuss how the Library of Congress digitizes and shares documents online for long-term public access.
Guin: Kit, thanks for being on the podcast.
Arrington: Thank you for inviting me.
Guin: I wanted to start by asking you how long have you been with the Library of Congress, and what got you interested in the field of digital preservation?
Arrington: I’ve been at the Library for 15 years now. I came in with what was then the National Digital Library Pilot Project. And I’m working now on what has become standard Library practice: having digital elements as part of everyday work. My interest in digital preservation developed along with the Library’s digital growth and is a natural part of one of our mission mandates – preservation – which extends to the digital formats under our care.
Guin: You coordinate the digital aspects of preparing the very popular HABS, HAER, and HALS data and documentation for online presentation. These are very popular programs. Tell me a little about what they are and what that process is for actually getting the files online.
Arrington: With HABS/HAER/HALS, nothing can ever be described quickly. They are just such rich and wonderful treasures of ours. They are unique holdings for us because they are active programs and the collections are always growing, and they are absolutely one of our most popular collections, and always have been. Now that they are online, it is just wonderful to see the new audiences we are reaching. They are being used by students, historians, life-long learners, everyone. Our collaborative relationship and the high level of cooperation that we enjoy with the office of the National Park Service that oversees these programs and creates the documentation is very special and unique. As many of your listeners might know, the Historic American Buildings Survey began over 75 years ago. It was a Works Progress Administration effort to put out-of-work architects to work and to document historic properties. The Library and the Department of the Interior were a part of that from the beginning, and it has continued to this day.
The Historic American Engineering Record (HAER) was created in 1969, and the Historic American Landscapes Survey began in 2000. Between those three collections, there are now over 39,000 surveys, which contain over 500,000 measured drawings, photographs and written history pages. The digital conversion process for these materials began in 1996. And I have been working on them and with them since that time.
We worked with the National Park Service in 1996 to map their collections management database, which they had used just for tracking their own work. We took that database and mapped the records into a bibliographic format that fit in with the electronic access to our other collections that was being developed for the World Wide Web. The next year, in 1997, we began a five-year project to scan the collections that we had on-site, including the new material that was, and continues to be, added quarterly.
To maximize the efficiency of the digital conversion of a collection this large, and because the requirements for scanning a typed page and an original medium format b&w negative are very different, we had separate projects to scan the text history pages, the architectural drawings, and the original negatives. This included the nitrate negatives from the 1930s. By 2001, we had scanned and processed everything that was at the Library. During that time the HABS/HAER/HALS division of the National Park Service completely revamped their database and worked to add additional information, such as subject terms, which has enriched the records. They also transformed their workflow to include digital images of the drawings and histories as part of what they transmit to the Library, which allows for faster access online. We’re moving toward NPS providing all of the digital images. We are not yet accepting born-digital photographs as part of the archival documentation, though we have begun discussions on it in response to the realities of the decreasing availability of large format film technology. So that’s where we stand today. It’s all online and available now.
Guin: How do you share your digital files online so that the largest possible audience can actually use these files?
Arrington: The digital files of the Prints & Photographs Division collections are made available online through our online catalog, which you can find at www.loc.gov/pictures. We have item and group level records and thumbnail images available for almost all of our digitized collections, which include photographs, posters, architectural drawings, political cartoons, stereographs, glass negatives – many things. For a variety of reasons, different rights issues being the most common, the larger digital images for some items are not available offsite, though you can access them if you are here at the Library. Because they are in the public domain, all of the images in the HABS/HAER/HALS collections are available on the Web, from a thumbnail image to the highest resolution, uncompressed TIFF image.
We’re also exploring reaching out through other venues, including our collaboration with Flickr where we are now posting some of our collections.
Guin: How are people actually using this information?
Arrington: In Flickr it is very fun. All the kinds of Flickr groups where you will have “oh we like public signs” or any variety of people who have huge specialized images that they are interested in. They include a lot of the collections that we put up. You’ll see many, many of our images used in Wikipedia when people are illustrating what they are posting as an entry in Wikipedia. They’ll come to the library to find their images to illustrate it.
People post our images on their websites, and they use them in documentaries, publications, school projects, research projects, and commercial projects. One of my favorite uses of the HABS/HAER/HALS materials is a website that offers “Free Drawings and Plans.” Under categories such as “Build your own Barn,” they’ve downloaded the architectural drawings from surveys of barns and made them available, fully crediting the Library as their source.
Guin: How do you analyze the way your images are being accessed and used? Have those analytics changed your process for digitizing or sharing these files in any way?
Arrington: The examples that we’re aware of are actually mostly from our own research, things that we’ve come across or users who are contacting us with questions. It is really sort of anecdotal, from our own experience.
This is not my area of expertise, but I understand that the Library is now exploring more use of statistics software for gathering information on how our website is being accessed and utilized. At this point, in P&P we’ve only tracked very rough and general statistics for the number of folks coming to the P&P online catalog and collections, and we’ve seen these numbers grow exponentially through time, not surprisingly. We don’t currently explore those numbers at the image level. Of course, we also currently have over 1.5 million digital images available through our P&P online catalog, which are a fraction of the over 14 million items in our collections.
Guin: The Library of Congress plays an important role, worldwide, in making sure that its digital content will be accessible for future generations. How do you determine archival formats?
Arrington: In the Prints & Photographs Division, when we began our conversion projects with an RFP in 1995, we selected the TIFF file format as our archival format. We continue monitoring changes through time. For example, we’ve been keeping our eye on JP2, which some institutions are beginning to adopt as their archival format, but currently TIFF remains the most widely used and supported file format for archival images. The Library’s National Digital Information Infrastructure and Preservation Program Web site, www.digitalpreservation.gov, contains an analysis of the sustainability of file formats for the Library’s use.
Guin: How is your role changing as more content is born digital?
Arrington: We’re taking the same principles for collection, preservation and access that we’ve always followed and are applying them to the realm of born digital. In the same way that we’ve had to research how best to care for a film negative, we’re doing the same for born digital – though it’s a much more active and constantly changing process. On our own, and by taking advantage of the work and efforts of others, such as professional photography associations like the ASMP or the Federal Digitization Guidelines Initiative, we’re monitoring the changing file formats through time. We haven’t actually accepted a large number of born digital items into our collections, but only because at this point in time we have not had any significant submissions of modern works. But we’ve had enough to begin to explore and establish workflows for accepting, storing and providing access to them. A group of photo-journalist photographs that we collected following 9/11 was one of our first significant born digital acquisitions. In another area we are studying the developing “Best Practices” for preserving vector file formats, like AutoCAD, in anticipation of the eventual inclusion of those kinds of items in our architectural and engineering collections. It is now the rare architect who draws by hand.
Of course, in other parts of the Library, we have a website preservation program that preserves websites around different events through time, such as major elections or the Olympics. We worked collaboratively with the Internet Archive in the early days on the work they are doing preserving websites. So there are issues that we’re aware of as an institution, and we are collecting and sorting them out as we go along and as they change themselves.
Guin: How can the smaller heritage preservation organizations–or conscientious individuals–make sure their data is saved in archival format?
Arrington: At this point in time awareness of the importance of digital preservation has permeated the consciousness of most preservation organizations, and an increasing number of individuals. For cultural organizations in particular, with very little effort it is easy to find a number of excellent “Best Practice” guidelines to follow in their area of expertise for becoming knowledgeable about the issues to consider to best create, collect, and preserve their digital objects – whether they are text, or images, or sound files. In addition to being a resource for private, public and government organizations and institutions, the Library’s NDIIP program is offering a new resource to help individuals be more aware of how to preserve their personal digital items. It is a work in progress, but if the Library hopes to collect, for example, important photos in the future, we need to help folks understand now what they should be doing to save them!
Guin: On a personal level, you have an interest in other aspects of hands-on historic preservation. Tell me about that, and does it affect your view of the documents and files you are introducing to the virtual world?
Arrington: My mother had an anthropology background and worked in museums, then owned a used bookstore, and I fully credit her love and appreciation for what objects can teach us with my own appreciation of being able to live and learn from the past within the present. For that reason, I will say that I’ve always maintained a healthy skepticism of the longevity of digital objects – photographs being a perfect example. I have boxes of wonderful family photos that are intact and have just moved through time with our family despite years of not being touched or accessed. And they are treasures.
The digital equivalent of family histories being created today will require a much greater attention through time to be accessible to future generations. But because of my joy in accessing these “old things” today, I want to be sure that will be true for future generations who want to access the digital files of today. I’m also always questioning the preservation issues – Are our file specifications good enough to move through time? How are we backing these up? How are we tracking them? Using another photo example, just as color photo prints have fragile preservation issues, the color management of a digital file to maintain accurate color representation through time (or amongst various hardware and software) is tricky.
Really, these are the same old issues that we’ve always addressed in caring for our collections. And following the same preservation and access principles that have always guided us, we will make the best choices that we can with digital items to hold and care for them too.
Guin: Kit, thanks so much for being on the podcast.
Arrington: Thank you very much, Jeff.
Ammons: That was Jeff Guin interviewing Kit Arrington of the Library of Congress. If you would like to learn more about this project, visit our podcast show notes at the National Center for Preservation Technology and Training website. That’s ncptt.nps.gov. Until next time, goodbye everybody.