Picture a crisp new library book, with acid-free paper and hard cover, neatly shelved in the stacks, housed in a dry room with a moderated temperature, snug within the protection of a granite building. Thanks to our long-established library system, the book’s care and longevity are pretty much guaranteed. As the Library of Congress’s Caroline Arms puts it, "Whether or not that book was used, it would probably survive quite happily with no one bothering about it for several hundred years."
But the digital version of that book doesn’t have the same guaranteed protection. A digital "book" resides in an electronic medium that has few parallels and metaphors in the physical world of paper, bricks and mortar. Everything about that digital book needs to be rethought, including how to store it, how to retrieve it and what format to keep it in for long-term preservation. Caroline has spent much of the past few years not only rethinking and theorizing about the challenge but also helping to create concrete policies and practices, new foundations upon which others are building future digital libraries.
Though her degrees are in mathematics and business administration, Caroline's professional life has been focused on information technology. "I’ve spent most of my career bridging the gap between the really technical people and people who are trying to apply technology, whether it’s to doing their research or in libraries."
In 1995, Caroline came to the Library of Congress to work on the American Memory project, but the looming challenge of preserving digital content eventually caught her attention. She credits the influence of the 1996 Waters and Garrett report, "The Preservation of Digital Information." "It was the seminal report that came out at about that time," she said. "That was the report that first introduced the terms emulation and migration."
Around 2000, she saw an opportunity to make the connection between American Memory and digital preservation. "The first thing that I wrote in relation to thinking about preservation was for RLG DigiNews," she said. It was titled "Keeping Memory Alive: Practices for Preserving Digital Content at the National Digital Library Program of the Library of Congress."
At the time, there was still a profound lack of awareness in the library and publishing world about digital preservation. Caroline said that a lot has happened since 2000. There’s less of a need these days to explain and re-explain digital preservation to institutions; they "get it."
Stakeholders are beginning to work together proactively, planning for the long term. "There’s a recognition that decisions made earlier in the lifecycle of digital content have a real impact on how easy it will be to preserve [that content] later," said Caroline. "In the physical world, preservation was something that came years later. The original publisher was long out of the picture by the time you got to think of whether you were going to do mass deacidification or microfilming or whatever to preserve the content or the artifact of that book. You have to take steps early in the lifecycle [for digital content] to up the probability that the content can be preserved and reused. And these include thinking hard about the formats."
Which is exactly what Caroline and her colleague Carl Fleischhauer have done: think hard about the formats. Why? Preservation will be easier if wise choices are made about digital formats. Curators should make collection decisions knowing which formats will and won’t be easily sustainable. For a file to be useful decades from now, its format's specification and characteristics must be documented.
Caroline and Carl’s exhaustive format research led to their creation of the Digital Formats Web site, the definitive inventory of information about current and emerging digital formats. This Web site has become an essential resource among the international digital preservation community. Do a Google search on "digital formats" and their site comes up first on the list.
Caroline has also helped establish open standards. "Libraries and archives will be well served by having widely used applications, using formats that are openly specified, standards that will be maintained going forward in an open forum and not by a particular company." She emphasized two important factors in digital preservation: disclosure and adoption. Disclosure means open, publicly accessible specification; adoption means a format is widely used. Explaining adoption, she said, "If [a format] is used by lots of applications, then the preservation challenge is not just a challenge for libraries and archives but for a much broader community. So we can expect that other people will address the problem of technological obsolescence."
She was involved with the original development of the Open Archives Initiative Protocol for Metadata Harvesting. "I really believe in networking and, as much as copyright allows, letting the content out so that other people can use it in the ways that they want to," she said. "I get excited when I see people doing interesting things with our content. And I have to believe that in other fields great things are happening because [many] archival institutions are not holding their assets tightly but letting them out."
Businesses are beginning to understand the benefits of interoperability and open standards for digital formats as well. Most of their employees and customers use office software, and content they create must remain accessible. Recent laws such as Sarbanes-Oxley and HIPAA set requirements for long-term document retention. The Library of Congress and the British Library took advantage of common interests with corporate customers of Microsoft and joined the effort to prepare Office Open XML for submission as an ISO/IEC standard.
A new NDIIPP activity she is excited to be part of involves a different synergy. "Photographers are interested in the lifecycle of the images they create. Two new Preserving Creative America projects are with associations of professional photographers." They focus on guidelines for digital photographers, the formats to use for master images and practices for embedding metadata in the images. "If you have the thing, you can have at least some of the relevant metadata as well, even if the thing is now separate from its source," she said.
She finds it encouraging that the library community’s technological comprehension has grown over the past decade, but she admits, "There is still a lot to do to bring more people with stronger technical backgrounds into libraries and into the planning and management process. There's still a need for more librarians who have a better understanding of what’s easy and what’s hard about technology. And this doesn’t need to be actually doing stuff, but to be able to communicate with the technical people to try and work together in planning projects and designing systems."
There is not enough space here to list all of Caroline’s professional accomplishments. Her work has benefited cultural institutions nationally and internationally. And it is the Library’s great loss that Caroline will retire in June 2008. Many of those who work with her agree it is not an overstatement to say that she is irreplaceable. Caroline is quietly but intensely focused, what one might call "scary smart," and she works diligently and tirelessly at a level of detail most others cannot or do not want to.
She does not appear to be worried about the current state and the future of digital preservation, though. "As I’m leaving I’m confident that there are more people in the institution now who are bridging the gap between the more traditional curation and cataloging tasks and the IT aspects than there were when I got here."
Still, she frets over the niggling details right to the end. She would very much like to see her successors continue her digital format work. "Our format resource needs to keep growing," she said. "There are lots of formats we haven’t yet described."