Library of Congress

Digital Preservation

Extractor Tool Helps Preserve Microsoft Outlook Emails and Attachments

September 24, 2010 -- The Persistent Digital Archives and Library System (PeDALS) research project has released an open source software tool that extracts email, attachments and other objects from Microsoft Outlook Personal Folders (.pst) files and converts the messages into XML.

The PeDALS Email Extractor takes a .pst file saved from any version of Outlook and extracts its messages and attachments, retaining the folder structure present in the file.
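The extractor's actual XML schema is not described here, so what follows is only a rough, hypothetical illustration of the general idea of turning a mailbox folder tree into nested XML so that the folder hierarchy survives the conversion. It is written in Python rather than the tool's C#, skips .pst parsing entirely, and starts from an in-memory model; the `Message` and `Folder` classes and all element and attribute names are assumptions, not the PeDALS format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical message model; a real extractor would fill these
    # fields by parsing the .pst file, which this sketch does not do.
    subject: str
    sender: str
    body: str
    attachments: list[str] = field(default_factory=list)


@dataclass
class Folder:
    # Hypothetical folder model: messages plus nested subfolders.
    name: str
    messages: list[Message] = field(default_factory=list)
    subfolders: list["Folder"] = field(default_factory=list)


def folder_to_xml(folder: Folder) -> ET.Element:
    """Recursively convert a folder tree into nested <Folder> elements,
    so the XML mirrors the folder hierarchy of the source mailbox."""
    elem = ET.Element("Folder", name=folder.name)
    for msg in folder.messages:
        m = ET.SubElement(elem, "Message")
        ET.SubElement(m, "From").text = msg.sender
        ET.SubElement(m, "Subject").text = msg.subject
        ET.SubElement(m, "Body").text = msg.body
        for filename in msg.attachments:
            # Only the attachment name is recorded here; a real tool would
            # also save the attachment content itself.
            ET.SubElement(m, "Attachment", filename=filename)
    for sub in folder.subfolders:
        # Each subfolder becomes a child <Folder>, preserving nesting.
        elem.append(folder_to_xml(sub))
    return elem


if __name__ == "__main__":
    inbox = Folder(
        "Inbox",
        messages=[Message("Budget", "a@example.gov", "See attached.", ["budget.xlsx"])],
    )
    root = Folder("Personal Folders", subfolders=[inbox])
    print(ET.tostring(folder_to_xml(root), encoding="unicode"))
```

Recursion keeps the mapping simple: every folder, at any depth, is handled by the same function, so the output nesting matches the mailbox nesting without extra bookkeeping.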

The Email Extractor's advantages over similar commercial and open-source Outlook plug-ins include speed, logging and exception handling, no limit on the number of messages, and no need to import .pst files back into Outlook.

This work is part of the Persistent Digital Archives and Library System project, led by the Arizona State Library, Archives, and Public Records, which is developing a shared curatorial framework for the preservation of digital public records. The project also aims to lower the barriers to adopting the technology by keeping costs as low as possible.

“We’ve taken the tool as far as we can these past few months, and we need to start modifying it to work specifically with PeDALS,” said Pete Watters, the project’s principal investigator. “In its current state, though, it can be a pretty powerful tool for anyone who needs to quickly turn a PST file into XML that can be read or transformed by other tools.”

Both an executable version of the tool and the C# code behind it are available on SourceForge. The application needs Microsoft .NET Framework 2.0 to run, and the code can be modified using Microsoft Visual Studio 2005.

In early 2010, PeDALS developed an automated process to preserve official email records produced by Microsoft Outlook.

The PeDALS project is supported by the Library of Congress National Digital Information Infrastructure and Preservation Program.