Library of Congress Digital Preservation Tools

Library of Congress Digital Preservation Tools and Services Inventory

This is a list of software tools and utilities designed, developed or used by the Library of Congress in its digital preservation program. By making this list available, the Library encourages others in the preservation community to share in, and take advantage of, the work and resources of the Library.

Tool Listing

Tools are listed alphabetically by name. A suite of tools developed by the Library and its NDIIPP partners to validate and transfer data that conforms to the BagIt specification is now hosted at SourceForge.

BagIt

A format for transferring digital content. Content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content's receipt, storage and retrieval. There is no software to install. A bag consists of a base directory containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:

1. An inventory of the content files in the bag.
2. A checksum for each file.
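
The layout above can be sketched in Python. This is a minimal illustration, not the Library's tooling. It assumes the file names used in later published versions of the BagIt specification (RFC 8493): payload files under a `data/` subdirectory, a `bagit.txt` declaration, and an MD5 manifest named `manifest-md5.txt` whose lines read `<checksum>  <path>`. The function names `md5_of` and `make_bag` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(base: Path) -> Path:
    """Write bagit.txt and manifest-md5.txt for the files already in base/data.

    Assumed layout (RFC 8493 style): each manifest line is
    "<checksum>  <path>", with the path relative to the base directory.
    """
    data_dir = base / "data"
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix()
        # Two spaces match md5sum's output format, so `md5sum -c` can read it.
        lines.append(f"{md5_of(path)}  {rel}\n")
    (base / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    manifest = base / "manifest-md5.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

To use it, copy the content files into `<base>/data/` and call `make_bag(Path("<base>"))`; the manifest plays the role of the packing slip, pairing every file with its checksum.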

A slightly more sophisticated bag lists URLs instead of simple directory paths. A script then reads the tag, detects the URLs, and retrieves the files over the Internet, ten or more at a time. Retrieving several files simultaneously reduces the overall transfer time. Users can also add content metadata in another, optional file.

  • Developer: Library of Congress, California Digital Library
  • Written in: n/a
  • OS and run-time environment: n/a
  • Application: n/a
  • Documentation: BagIt Specification (PDF, 83 KB)
  • License: n/a
  • Last tool update: 05/31/08
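
The URL-based bag described above can be sketched as follows. This is a hypothetical illustration, not the Library's script: it assumes the `fetch.txt` line format from RFC 8493 (`URL LENGTH FILEPATH`, with `-` for an unknown length) and ten concurrent transfers, and it downloads with Python's standard `urllib`. The names `FetchEntry`, `parse_fetch`, `download` and `fetch_all` are made up for this sketch.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the length field is "-"
    path: str  # destination, relative to the bag's base directory


def parse_fetch(text: str) -> List[FetchEntry]:
    """Parse lines of the assumed form "URL LENGTH FILEPATH", skipping blanks."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        url, length, path = line.split(None, 2)  # path may contain spaces
        entries.append(FetchEntry(url, None if length == "-" else int(length), path))
    return entries


def download(entry: FetchEntry, base: Path) -> Path:
    """Stream one URL into its destination inside the bag.

    Note: no check that entry.path stays inside base; a real tool needs one.
    """
    dest = base / entry.path
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(entry.url) as resp, dest.open("wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def fetch_all(entries: List[FetchEntry], base: Path, workers: int = 10) -> List[Path]:
    """Download every entry, running up to `workers` transfers at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda e: download(e, base), entries))
```

Threads suit this job because each transfer spends most of its time waiting on the network, so overlapping them shortens the total transfer time.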

Bag Validator

The Bag Validator is a small Python script that validates a bag by checking for files listed in the manifest that are missing from disk, files on disk that are not listed in the manifest, and duplicate entries in the manifest.

  • Developer: Library of Congress
  • Written in: Python
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 06/20/08
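
The three checks the Bag Validator performs can be sketched as set comparisons. This is not the Library's script; it assumes an RFC 8493-style bag (payload files under `data/`, manifest lines of the form `<checksum> <path>`), and `read_manifest` and `validate_bag` are hypothetical names.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List


def read_manifest(manifest: Path) -> List[str]:
    """Return the file paths listed in a "<checksum> <path>" manifest, in order."""
    paths = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            _checksum, path = line.split(None, 1)
            paths.append(path)
    return paths


def validate_bag(base: Path, manifest_name: str = "manifest-md5.txt") -> Dict[str, List[str]]:
    """Report missing, unlisted and duplicated payload files."""
    listed = read_manifest(base / manifest_name)
    # Only the payload directory is walked; tag files are not payload.
    on_disk = {
        p.relative_to(base).as_posix()
        for p in (base / "data").rglob("*")
        if p.is_file()
    }
    counts = Counter(listed)
    return {
        "missing": sorted(set(listed) - on_disk),     # in manifest, not on disk
        "unlisted": sorted(on_disk - set(listed)),    # on disk, not in manifest
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
    }
```

A bag passes when all three lists are empty. Checksums are not recomputed here; that is a separate, slower step.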

Parallel Retriever

The Parallel Retriever is a simple Python wrapper around wget and rsync that produces a BagIt-conformant package when given a "file manifest" and a "fetch.txt" file. It has been used to transfer content from several partners hosting rsync and HTTP servers, at rates exceeding 200 Mbps over Internet2. It was originally built for rsync transfers from the Internet Archive and was later extended to support the BagIt specification and HTTP as well as rsync.

  • Developer: Library of Congress
  • Written in: Python
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 08/05/08
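
The wrapper idea can be sketched as a dispatcher that picks `wget` for HTTP(S) URLs and `rsync` for `rsync://` URLs, then runs several transfers at once. This is an illustration only: the function names, the worker count and the chosen flags are assumptions, not the Parallel Retriever's actual code. Both `wget` and `rsync` must be on the `PATH` for `retrieve` to work.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Tuple
from urllib.parse import urlparse


def build_command(url: str, dest: Path) -> List[str]:
    """Choose a transfer tool by URL scheme and return its argument list."""
    scheme = urlparse(url).scheme
    if scheme in ("http", "https"):
        # -q: quiet; -O: write the response body to dest
        return ["wget", "-q", "-O", str(dest), url]
    if scheme == "rsync":
        # -a: archive mode (preserves times, permissions, etc.)
        return ["rsync", "-a", url, str(dest)]
    raise ValueError(f"unsupported scheme: {scheme!r}")


def retrieve(pairs: List[Tuple[str, Path]], workers: int = 10) -> None:
    """Run one wget/rsync process per file, up to `workers` at a time."""

    def run(pair: Tuple[str, Path]) -> None:
        url, dest = pair
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(url, dest), check=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, pairs))  # list() re-raises the first failed transfer
```

Delegating the transfer itself to `wget` and `rsync` keeps the Python layer thin: it only decides which tool to run and how many to run at once.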

VerifyIt

VerifyIt is a shell script that verifies an MD5 bag manifest by running 11 md5sum processes in parallel.

  • Developer: Library of Congress
  • Written in: Shell script
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 07/22/08
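
VerifyIt itself is a shell script; the sketch below is a hypothetical Python analogue that recomputes each MD5 listed in a manifest using 11 concurrent workers. It uses threads rather than separate `md5sum` processes, which still hashes files in parallel because `hashlib` releases the GIL while hashing large buffers. The manifest layout (`<checksum>  <path>` per line, paths relative to the bag's base directory) and the function names are assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Optional, Tuple


def md5_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(base: Path, manifest_name: str = "manifest-md5.txt",
                    workers: int = 11) -> List[Tuple[str, str]]:
    """Return (path, problem) pairs in manifest order; empty means all files match."""
    entries = []
    for line in (base / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            expected, rel = line.split(None, 1)
            entries.append((expected.lower(), rel))

    def check(entry: Tuple[str, str]) -> Optional[Tuple[str, str]]:
        expected, rel = entry
        path = base / rel
        if not path.is_file():
            return (rel, "missing")
        return None if md5_file(path) == expected else (rel, "checksum mismatch")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the manifest lines.
        return [r for r in pool.map(check, entries) if r is not None]
```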
