Library of Congress

Digital Preservation

Library Develops Specification for Transferring Digital Content

June 2, 2008 -- The Library of Congress and the California Digital Library have jointly developed a format for transferring digital content. The BagIt format specification is based on the concept of "bag it and tag it," where digital content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content’s receipt, storage and retrieval. There is no software to install.

BagIt is an attempt to simplify large-scale data transfers between cultural institutions. It builds on the success of the Library’s data-transfer projects, including the Archive Ingest and Handling Test and the San Diego Supercomputer Center data-storage project. BagIt streamlines the process and reduces the number of "moving parts."

Though data transfer can be done by network or disk, the Library expects to receive more and more digital collections over the network. Conforming to the BagIt format will guarantee users a smooth, orderly transfer, with little or no human intervention. John Kunze, Preservation Technologies Architect at the California Digital Library and one of the principal authors of BagIt, said, "It really makes it easy for another institution to hold onto some of our content, and that's reassuring."

A bag consists of a base directory, or folder, containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:

1. An inventory of the content files in the bag
2. A checksum for each file.

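A checksum is a short value computed from a file's contents. The receiver recomputes it for each file and compares the result with the manifest to confirm that everything in the bag arrived intact. Once the content is validated, the receiver emails the sender a confirmation receipt.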
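As a rough illustration, a sender might generate such a manifest along the following lines. This is a minimal sketch, not the Library's tooling: it assumes MD5 checksums, a content subdirectory named `data`, and a manifest file named `manifest-md5.txt`, none of which this article states. The function names `file_md5` and `write_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> list[str]:
    """Write one 'checksum  relative/path' line per file under bag_dir/data.

    The 'data' directory name and the manifest file name are assumptions
    made for this sketch.
    """
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            lines.append(f"{file_md5(path)}  {relative}")
    (bag_dir / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling `write_manifest(Path("mybag"))` on a bag directory would write the manifest next to the content subdirectory and return the lines it wrote.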

This example shows a typical line from a manifest:

        408ad21d50cef31da4df6d9ed81b01a7    data/docs/example.doc

Each line contains a string of characters – the checksum – followed by the name of the file ("example.doc") and its directory path.
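The receiving side can be sketched in a similar way: read each manifest line, split it into checksum and path, recompute the checksum, and report any mismatch. The code below assumes the checksum is MD5 (the 32 hexadecimal digits in the example are consistent with MD5, but the article does not name the algorithm) and that the manifest file is called `manifest-md5.txt`. The names `parse_manifest_line` and `verify_bag` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_manifest_line(line: str) -> tuple[str, str]:
    """Split 'checksum<whitespace>path' into (checksum, path)."""
    checksum, path = line.strip().split(maxsplit=1)
    return checksum.lower(), path


def verify_bag(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> list[str]:
    """Return the manifest paths that are missing or fail the checksum test."""
    failures = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, relative = parse_manifest_line(line)
        target = bag_dir / relative
        if not target.is_file():
            failures.append(relative)
            continue
        # Reads the whole file at once; acceptable for a sketch, but large
        # files would be better hashed in chunks.
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected:
            failures.append(relative)
    return failures
```

An empty list from `verify_bag` means every listed file was present and matched its checksum, which is the point at which the receiver could send the confirmation receipt.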

A slightly more sophisticated bag lists URLs instead of simple directory paths. A script then consults the tag, detects the URLs and retrieves the files over the Internet, ten or more at a time. This type of simultaneous multiple transfer greatly reduces the overall data-transfer time.
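The simultaneous retrieval described above could look something like the following sketch, which uses a thread pool capped at ten workers. It is an illustration only: the article does not say how the Library's script works or how the URL list is laid out, so the input here is assumed to be already parsed into hypothetical `(url, relative_path)` pairs, and `download` and `fetch_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def download(url: str, destination: Path) -> Path:
    """Fetch one URL and save it, creating parent directories as needed."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        destination.write_bytes(response.read())
    return destination


def fetch_all(entries, bag_dir: Path, max_workers: int = 10, fetch=download):
    """Retrieve (url, relative_path) pairs, at most max_workers at a time.

    'fetch' is injectable so the transfer logic can be exercised without
    network access.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fetch, url, bag_dir / relative)
            for url, relative in entries
        ]
        # result() re-raises any exception raised in a worker thread.
        return [future.result() for future in futures]
```

Because each transfer spends most of its time waiting on the network, running several in parallel threads shortens the total transfer time, which is the effect the article describes. Once the files have arrived, they can be checked against the manifest checksums like any other bag.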

In another optional file, users can add metadata about the content, such as a description of the package contents and detailed contact information for the sender.

The full BagIt specification is available as a PDF (84KB).