Library of Congress

Digital Preservation

Library Releases Software Tools

January 9, 2009 -- The Library of Congress has released software tools that cultural heritage organizations can use to send and receive digital data. All the tools are open source, which means they can be freely used and modified, subject to only minimal conditions.

The tools are available through SourceForge, the technology community’s hub for open source software distribution and services, under the Library of Congress Transfer Tools project.

The project is built on the BagIt specification, a hierarchical file packaging format for exchanging digital content. The Library's Repository Development Group developed the specification jointly with the California Digital Library.
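As a rough illustration of what "hierarchical file packaging" means in practice, the Python sketch below writes a minimal bag: a `bagit.txt` declaration, a `data/` directory holding the payload, and a payload manifest listing a checksum for every payload file. It follows the layout later standardized as RFC 8493 (BagIt 1.0), so the version string and some details may differ from the draft specification current in 2009. The helper name `make_bag` and the choice of SHA-256 are illustrative assumptions, not part of the Library's tools.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag (hypothetical helper, not a Library tool).

    payload maps relative file names (using "/" separators) to bytes;
    every payload file is placed under the bag's data/ directory.
    """
    root = Path(bag_dir)
    data = root / "data"
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root, e.g. "data/report.txt".
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration: marks the directory as a bag (version per RFC 8493).
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: one "<checksum> <path>" line per payload file.
    (root / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return root


if __name__ == "__main__":
    # Demo: build a two-file bag in a throwaway directory and show its manifest.
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(tmp, {"report.txt": b"hello\n", "images/scan.tif": b"\x00\x01"})
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because the manifest travels with the payload, a recipient can confirm that every file arrived intact without consulting the sender.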

These are the first software tools the Library has formally released as open source. They support the validation and transfer of data that conforms to the BagIt specification. The Library plans to release additional tools, as they are completed, as part of a broader suite of solutions and software development resources.

Three tools are available now. Bag Validator is a Python script that validates a Bag, checking for missing, extra, and duplicate files. Parallel Retriever is a simple Python wrapper around wget and rsync that speeds up the transfer of content between locations by running transfers in parallel; it supports rsync, HTTP, and FTP. VerifyIt is a shell script that uses parallel processes to verify the checksums of the files listed in a Bag's manifest.
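To show roughly what checks like those in Bag Validator and VerifyIt involve, the Python sketch below is an approximation, not the Library's code. It assumes a bag with a `data/` payload directory and a `manifest-sha256.txt` payload manifest. The function name `check_bag` is hypothetical. "Duplicate" is interpreted here as a path listed more than once in the manifest, which is an assumption about the original tool. Checksums are verified with a thread pool rather than the separate processes VerifyIt uses.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _sha256(path):
    # Hash in 1 MiB chunks so large payload files are not read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_bag(bag_dir, workers=4):
    """Report problems in a bag (hypothetical helper, not Bag Validator itself)."""
    root = Path(bag_dir)
    listed = {}      # manifest path -> expected checksum
    duplicates = []  # assumption: paths that appear more than once in the manifest
    manifest = (root / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        checksum, path = line.split(maxsplit=1)
        if path in listed:
            duplicates.append(path)
        listed[path] = checksum.lower()
    data = root / "data"
    on_disk = (
        {p.relative_to(root).as_posix() for p in data.rglob("*") if p.is_file()}
        if data.is_dir()
        else set()
    )
    missing = sorted(set(listed) - on_disk)  # in the manifest, absent from disk
    extra = sorted(on_disk - set(listed))    # on disk, absent from the manifest
    # Recompute checksums of files that are present, several at a time.
    present = sorted(set(listed) & on_disk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        actual = list(pool.map(lambda rel: _sha256(root / rel), present))
    mismatched = [rel for rel, got in zip(present, actual) if got != listed[rel]]
    return {
        "missing": missing,
        "extra": extra,
        "duplicates": sorted(duplicates),
        "mismatched": mismatched,
    }


if __name__ == "__main__":
    # Demo: a one-file bag with a correct manifest should report no problems.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "a.txt").write_bytes(b"hello")
        digest = hashlib.sha256(b"hello").hexdigest()
        (root / "manifest-sha256.txt").write_text(f"{digest}  data/a.txt\n", encoding="utf-8")
        print(check_bag(root))
```

Threads are a reasonable fit here because `hashlib` releases the interpreter lock while hashing large buffers, so several files can be checksummed concurrently.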