Library of Congress

Digital Preservation

CDL Public Web Archive Service Collections Launched

July 15, 2009 -- The California Digital Library (CDL) has opened its Web Archiving Service (WAS) collections to the public. Topics in the collections range from California government agencies to Middle Eastern politics to natural disasters. The institutions that harvested and curated the websites include New York University, the University of North Texas, Stanford University and several University of California campuses.

Users can browse the public archives by URL or search by keyword. Users can also view changes over time for a given website if that website was harvested more than once. This feature is especially useful for comparing, for example, the daily reporting on the 2007 Southern California wildfires, or for quickly checking which new documents were added to a site from crawl to crawl.

CDL built the WAS to support the Web-at-Risk project, which is funded by the National Digital Information Infrastructure and Preservation Program and the University of California. The WAS enables users to select, capture, curate, preserve and provide access to archived websites, using curatorial tools developed by CDL and web-archiving tools developed by the Internet Archive.

Early in the Web-at-Risk project, CDL solicited input from librarians and archivists to determine their web-archiving needs. CDL then refined the WAS so that users can do their work by means of simple menu choices; all the technological complexity remains hidden in the background.

A curator selects a website to archive, adds some basic descriptive information, and chooses whether to capture the site once or at regular intervals and whether to harvest web pages linked to her target site. She can then initiate a harvest.

Once a website harvest is completed, the WAS emails the curator, who can then review the archived contents. She can sort the contents by document type, such as PDF or video, view detailed server reports and decide whether to keep or delete the content. She can also compare the results of separate crawls to determine which files have changed, are new, are missing or are unchanged.
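
For readers curious how a crawl-to-crawl comparison of this kind might work, the following is a minimal sketch in Python. It is not the WAS implementation, which this article does not describe. It assumes each crawl can be summarized as a mapping from URL to a content digest; the names digest and compare_crawls and the example.org URLs are hypothetical and used only for illustration.

```python
import hashlib
from typing import Dict, Set


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest, used here to detect content changes."""
    return hashlib.sha256(content).hexdigest()


def compare_crawls(earlier: Dict[str, str], later: Dict[str, str]) -> Dict[str, Set[str]]:
    """Sort URLs from two crawls into new, missing, changed and unchanged.

    Each argument maps a URL to the digest of the content captured at
    that URL (an assumed summary format, not the WAS's own).
    """
    earlier_urls = set(earlier)
    later_urls = set(later)
    shared = earlier_urls & later_urls
    return {
        "new": later_urls - earlier_urls,          # only in the later crawl
        "missing": earlier_urls - later_urls,      # only in the earlier crawl
        "changed": {u for u in shared if earlier[u] != later[u]},
        "unchanged": {u for u in shared if earlier[u] == later[u]},
    }


if __name__ == "__main__":
    # Hypothetical crawl summaries for illustration only.
    first = {
        "http://example.org/": digest(b"home v1"),
        "http://example.org/report.pdf": digest(b"report"),
        "http://example.org/old.html": digest(b"old page"),
    }
    second = {
        "http://example.org/": digest(b"home v2"),
        "http://example.org/report.pdf": digest(b"report"),
        "http://example.org/update.html": digest(b"update"),
    }
    for status, urls in compare_crawls(first, second).items():
        print(status, sorted(urls))
```

Comparing digests rather than full file contents keeps the comparison inexpensive, at the cost of reporting only that a file changed, not how.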

The "Help" feature includes video demonstrations on how to create site entries and evaluate crawl results.