Sustainability of Digital Formats: Planning for Library of Congress Collections


PDF/A-1, PDF for Long-term Preservation, Use of PDF 1.4

Format Description Properties

Identification and description

Full name ISO 19005-1. Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)
Description

PDF/A-1 is a constrained form of Adobe PDF version 1.4 intended to be suitable for long-term preservation of page-oriented documents for which PDF is already being used in practice. The ISO standard [ISO 19005-1:2005] was developed by a working group with representatives from government, industry, and academia and active support from Adobe Systems Incorporated.

The PDF/A family of standards attempts to maximize:

  • Device independence
  • Self-containment
  • Self-documentation

The constraints include:

  • Audio and video content are forbidden
  • JavaScript and executable file launches are prohibited
  • All fonts must be embedded and must also be legally embeddable for unlimited, universal rendering
  • Colorspaces must be specified in a device-independent manner
  • Encryption is disallowed
  • Use of standards-based metadata is mandated
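Conformance with these constraints is properly checked by a validator such as veraPDF. As a rough illustration only, the Python sketch below screens the raw bytes of a file for a few PDF name tokens that usually signal forbidden features (encryption, JavaScript, launch actions, sound and movie objects). The token list and the function names `screen_pdf_bytes` and `screen_pdf_file` are hypothetical choices for this example, not the standard's normative list. A match is a reason to look closer, not proof of non-conformance, and an empty result proves nothing: names inside compressed streams or written with `#xx` escapes are not seen.

```python
import re
from pathlib import Path
from typing import Dict

# Hypothetical, non-exhaustive selection of PDF name tokens that usually
# indicate features PDF/A-1 forbids. Not the standard's normative list.
SUSPECT_NAMES: Dict[bytes, str] = {
    b"Encrypt": "encryption dictionary",
    b"JavaScript": "JavaScript action or name tree",
    b"JS": "JavaScript code entry",
    b"Launch": "launch action",
    b"Sound": "sound annotation or action",
    b"Movie": "movie annotation or action",
}

# A PDF name is "/" followed by regular characters and ends at whitespace
# (including NUL) or a delimiter. The negative lookahead enforces that
# boundary, so "/JSX" is not reported as "/JS".
_NAME_RE = re.compile(
    rb"/("
    + b"|".join(re.escape(name) for name in SUSPECT_NAMES)
    + rb")(?![^\s\x00()<>\[\]{}/%])"
)


def screen_pdf_bytes(data: bytes) -> Dict[str, str]:
    """Report suspect name tokens found in raw PDF bytes (heuristic only)."""
    found: Dict[str, str] = {}
    for match in _NAME_RE.finditer(data):
        token = match.group(1)
        found[token.decode("ascii")] = SUSPECT_NAMES[token]
    return found


def screen_pdf_file(path: str) -> Dict[str, str]:
    """Read a whole file and screen it; fine for a sketch, not for huge files."""
    return screen_pdf_bytes(Path(path).read_bytes())


if __name__ == "__main__":
    sample = b"<< /Type /Catalog /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>"
    print(sorted(screen_pdf_bytes(sample)))  # ['JS', 'JavaScript']
```

The boundary check matters because PDF names are compared as whole tokens; without it, a key such as `/EncryptMetadata` would be miscounted as `/Encrypt`.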

The PDF/A-1 standard defines two levels of conformance: conformance level A (known as PDF/A-1a) satisfies all requirements in the specification; level B (known as PDF/A-1b) is a lower level of conformance, "encompassing the requirements of this part of ISO 19005 regarding the visual appearance of electronic documents, but not their structural or semantic properties."

Production phase A final-state format for delivery to end users and long-term preservation of the document as disseminated to users.
Relationship to other formats
    Subtype of PDF, Portable Document Format
    Subtype of PDF_1_4, PDF Version 1.4
    Subtype of PDF/A_family, PDF/A, PDF for Long-term Preservation, Family
    Has subtype PDF/A-1a, PDF for Long-term Preservation, Based on PDF 1.4, Level A Conformance
    Has subtype PDF/A-1b, PDF for Long-term Preservation, Based on PDF 1.4, Level B Conformance
    Has later version PDF/A-2, PDF for Long-term Preservation, Use of ISO 32000-1 (PDF 1.7)

Local use

LC experience or existing holdings LC was represented on the working group for the original PDF/A-1 standard and continues to be active in the development of new versions.
LC preference

See PDF/A_family.


Sustainability factors

Disclosure

Open standard, approved in May 2005 and published by ISO in October 2005. Developed under the working group ISO/TC 171 SC2, Document Imaging Applications, Application Issues, for which AIIM (The Association for Information and Image Management) acted as secretariat to the US TAG for ISO/TC 171 until 2017. NPES (now the Association for Print Technologies), secretariat to the US TAG for ISO/TC 130, was a co-sponsor.

For maintenance of the PDF/A series of standards, ISO established a Joint Working Group (ISO/TC 171/SC 2/WG 5), which also includes representation from ISO/TC 46 SC1 (Archives/Records Management), ISO/TC 130 (Graphics Technology), and ISO/TC 42 (Photography). As of October 2020, this working group operates through the non-profit PDF Association, Inc. (PDFA), now the secretariat for ISO/TC 171 SC2, which is responsible for maintenance of most PDF standards, including PDF/A-1. PDFA also acts as the U.S. Technical Advisory Group (TAG) Administrator to ISO/TC 171 SC2.

    Documentation ISO 19005-1:2005. Document management -- Electronic document file format for long-term preservation -- Part 1: Use of PDF 1.4 (PDF/A-1). The standard cannot be used without PDF Reference, Third Edition, Version 1.4, which it uses as a normative reference. Link via Internet Archive.
Adoption

Since the standard was published in late 2005, tools for creation, conversion, and validation of PDF/A-1 reached the market steadily. Adobe's own Acrobat Professional 7.0, released in 2004, allowed saving files in a form compliant with the draft standard. Acrobat 8 (2006) and later versions support the standard as published. Microsoft Office 2007 supported creation of PDF/A files through Save as PDF, originally an add-on module. Open Office introduced support for PDF/A in release 2.4 (in early 2008). LibreOffice can export documents as PDF/A-1; as of version 6.0.7.3 (current as of early 2019), this was the only version of PDF/A that could be generated.

Many commercial companies whose products are aimed at large enterprises have released tools supporting the creation, migration, and validation of PDF/A files. Pioneers included Apago, Inc., Visioneer (for scanning paper to PDF/A-1b), Callas Software, Compart Systemhaus, Luratech, Nuance, and PDF Tools AG. Many of these companies are based in Europe, where growing EU requirements to use digital formats that are formal (preferably ISO) standards have produced more market pressure than in the U.S. Starting with version 0.93 (released in January 2007), the widely used open-source Apache FOP (Formatting Objects Processor, based on the W3C's XSL-FO standard) introduced support for the minimal PDF/A profile, PDF/A-1b. Apache FOP 1.1 (released in October 2012) added some support for PDF/A-1a as part of improved support for accessibility. A large number of supporting products are listed at https://www.pdfa.org/products/ from the PDF Association.

The standards development process involved active participation by communities whose endorsement or adoption would create significant momentum for wider adoption, in the sense of requiring or preferring PDF/A over generic PDF for archival deposit or submission. Important groups are government agencies and legislative and judicial institutions. Adobe reported migration of legacy "report silos" at several (unnamed) financial institutions at a meeting of the European DLM (Document Lifecycle Management) Forum in Helsinki in November 2006. An increasing number of libraries and other archival institutions recommend or require PDF/A. For pragmatic reasons, when PDF/A is mandated, PDF/A-1b is usually acceptable. Full PDF/A-1a compliance, with tagged document structure, is hard to achieve except in a workflow that anticipates that objective from initial document creation. Libraries and archives that recommended or mandated PDF/A for textual documents deposited in a digital repository soon after the standard was published included Virginia Tech (for electronic theses), the National Archives of Norway, and the University of Texas Libraries. The United States Patent and Trademark Office (USPTO) has requirements, based on the PDF/A specification, for PDFs that it accepts for electronic filing; documents conforming to PDF/A-1 meet those requirements. In February 2014, PDF/A-1 was declared a preferred format for textual documents, scanned text, presentations, and posters for transfer of permanent electronic records to the U.S. National Archives. For scanned text, additional image quality guidance is given.

According to an announcement available on the PACER (Public Access to Court Electronic Records, for U.S. Federal Courts) web site from 2011 through December 2016, "The Judiciary is planning to change the technical standard for filing documents in the Case Management and Electronic Case Filing (CM/ECF) system from PDF to PDF/A." However, as of October 2020, there was no indication that this requirement had been implemented. Comments welcome.

A list of entities recommending or requiring use of PDF/A was found at http://www.adobe.com/enterprise/standards/pdfa/ from Adobe between 2010 and early 2013 (link now via Internet Archive). Another list of entities recommending or requiring use of PDF/A by 2011 was found at http://www.pdfa.org/2011/06/recommendations-for-pdfa/ (link now via Internet Archive) from the PDF Association, an alliance of vendors.

Funded under the EU's PREFORMA program as a project from 2015 to 2017, veraPDF was developed as a software tool and open-source library to support validation of PDF/A files against all the parts and profiles in ISO 19005.

    Licensing and patents

Adobe has a number of patents covering technology that is disclosed in the Portable Document Format (PDF) Specification, version 1.3 and later, and hence in the ISO 19005-1 specification by reference. As an ISO standard, the compliance of ISO 19005-1 with the ISO/IEC/ITU common patent policy has been vetted.

A summary of relevant information on the Adobe Web site in December 2010 at http://partners.adobe.com/public/developer/support/topic_legal_notices.html (link now via Internet Archive) follows. Note that all the patents listed on this Adobe page had probably expired as of 2019-01-01.

To promote the use of PDF for information interchange, the following patents are licensed by Adobe on a royalty-free, non-exclusive basis for the term of each patent for developing software that produces, consumes, and interprets PDF files: 5,634,064 (filed 1996-08-02, granted 1997-05-27, probably expired as of 2019-01-01); 5,737,599 (filed 1995-12-07, granted 1998-04-07, probably expired as of 2019-01-01); 5,781,785 (filed 1995-09-26, granted 1998-07-14, probably expired as of 2019-01-01); 5,819,301 (filed 1997-09-09, granted 1998-10-06, probably expired as of 2019-01-01); 6,028,583 (filed 1998-01-16, granted 2002-02-22, probably expired as of 2019-01-01); 6,289,364 (filed 1997-12-22, granted 2001-09-11, probably expired as of 2019-01-01); 6,421,460 (filed 1999-05-06, granted 2002-07-16, probably expired as of 2019-01-01). Patent 5,860,074 (filed 1997-08-14, granted 1999-01-12, probably expired as of 2019-01-01) is similarly licensed on a royalty-free, non-exclusive basis for its term but only for the purpose of developing software that produces PDF files (thus specifically excluding software that consumes and/or interprets PDF files).

In association with the adoption of PDF, version 1.7 as an ISO standard (ISO 32000-1:2008), Adobe issued a Public Patent License, granting "every individual and organization in the world the royalty-free right, under all Essential Claims that Adobe owns, to make, have made, use, sell, import and distribute Compliant Implementations."

Transparency Depends on compliant software tools for reading. Building such tools requires sophistication. PDF/A does not permit encryption.
Self-documentation Support for embedding any form of metadata for a document is extremely good. Use of XMP is mandatory for basic descriptive and identifying metadata. Other XMP metadata packages can be embedded.
External dependencies PDF/A is constrained to avoid external dependencies. All necessary fonts must be embedded.
Technical protection considerations PDF/A does not permit encryption.
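To make the XMP requirement concrete, the Python sketch below uses only the standard library's `xml.etree.ElementTree` to build a minimal XMP packet carrying the PDF/A identification properties (`pdfaid:part` and `pdfaid:conformance`) and a Dublin Core title. The function names `build_xmp` and `q` and the sample title are hypothetical. This is an illustrative fragment, not a conforming implementation: a PDF/A-1 file must also embed the packet as the document's metadata stream and satisfy the standard's other metadata rules, which the sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in XMP; pdfaid is the PDF/A identification schema.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialize with familiar prefixes


def q(prefix: str, local: str) -> str:
    """Qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp(title: str, part: int = 1, conformance: str = "B") -> str:
    """Return a minimal XMP packet (illustrative, not validated)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))

    # PDF/A identification: part number and conformance level.
    ident = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    ET.SubElement(ident, q("pdfaid", "part")).text = str(part)
    ET.SubElement(ident, q("pdfaid", "conformance")).text = conformance

    # Descriptive metadata: dc:title is a language alternative (rdf:Alt).
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    li.text = title

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The xpacket wrapper; the id value is the fixed one XMP specifies.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


if __name__ == "__main__":
    print(build_xmp("Annual Report"))
```

Putting the identification properties in their own `rdf:Description` is a common layout choice, not a requirement; XMP treats all `rdf:Description` elements with the same `rdf:about` as describing the same resource.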

Quality and functionality factors

Text
Normal rendering Good support is possible, particularly for files complying with the PDF/A-1a profile, but is not guaranteed. The PDF/A-1 format does not preclude creating documents from scanned page images using the PDF/A-1b conformance profile; such files do not necessarily support indexing of the document text or extraction of text for quotation. See the PDF/A FAQ from the PDF Association.
Integrity of document structure The logical structure of a document is only represented in a PDF/A file if the creator or process during creation takes steps to incorporate structural tagging. The PDF/A standard recommends the representation of structural hierarchy.
Integrity of layout and display PDF is designed to represent the layout of page-oriented documents.
Support for mathematics, formulae, etc. Can be represented by embedded graphics.
Functionality beyond normal rendering Annotations may be embedded. Bookmarks may be provided.

File type signifiers and format identifiers

Tag Value Note
Filename extension pdf
The standard does not indicate that a different extension should be used to distinguish PDF from PDF/A.
Internet Media Type application/pdf
Media type registered with IANA. See also PDF Family.
Magic numbers See note.  According to the PDF/A-1 specification, "Neither the version number in the header of a PDF file nor the value of the Version key in the document catalog dictionary shall be used in determining whether a file is in accordance with this part of ISO 19005." Hence, although the magic number in the header will usually be "%PDF-1.4" (see PDF_1_4), it should not be relied on.
Indicator for profile, level, version, etc. See note.  The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:conformance (closed list of text values). A PDF/A-1 file should have the integer value 1 for pdfaid:part.
Pronom PUID See note.  There is no PRONOM entry specifically for PDF/A-1. See https://www.nationalarchives.gov.uk/PRONOM/fmt/95 for profile PDF/A-1a and https://www.nationalarchives.gov.uk/PRONOM/fmt/354 for PDF/A-1b.
Wikidata Title ID See note.  There is no Wikidata Title ID specifically for PDF/A-1. See https://www.wikidata.org/wiki/Q1547957 for all versions and profiles of PDF/A. See https://www.wikidata.org/wiki/Q26541013 for profile PDF/A-1a and https://www.wikidata.org/wiki/Q26543628 for PDF/A-1b.
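Because the header version is not authoritative for PDF/A-1, a tool has to read the `pdfaid` properties from the XMP metadata instead. The Python sketch below assumes the XMP packet has already been extracted from the file as text (locating and decoding the metadata stream is out of scope) and uses the standard library's `xml.etree.ElementTree`. It accepts both serializations XMP allows for simple properties: attributes of `rdf:Description` or child elements. The function names are hypothetical; `header_version` is included only to show the header being read for information, not for a conformance decision.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
_HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data: bytes) -> Optional[str]:
    """Version from the %PDF-x.y header: informational only for PDF/A-1."""
    match = _HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def pdfa_identification(xmp: str) -> Optional[Tuple[int, str]]:
    """Return (pdfaid:part, pdfaid:conformance) from an XMP packet, or None."""
    root = ET.fromstring(xmp)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple XMP properties may appear as attributes...
        part = desc.get(f"{{{PDFAID_NS}}}part")
        conformance = desc.get(f"{{{PDFAID_NS}}}conformance")
        # ...or as child elements of rdf:Description.
        if part is None:
            part = desc.findtext(f"{{{PDFAID_NS}}}part")
        if conformance is None:
            conformance = desc.findtext(f"{{{PDFAID_NS}}}conformance")
        if part and part.strip().isdigit():
            return int(part.strip()), (conformance or "").strip()
    return None


def describe(xmp: str) -> str:
    """Human-readable label such as 'PDF/A-1b'."""
    ident = pdfa_identification(xmp)
    if ident is None:
        return "no PDF/A identification found"
    part, conformance = ident
    return f"PDF/A-{part}{conformance.lower()}"


SAMPLE_XMP = """<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

if __name__ == "__main__":
    print(header_version(b"%PDF-1.4\n"))  # 1.4 (not a conformance signal)
    print(describe(SAMPLE_XMP))  # PDF/A-1b
```

The sketch only reports what a file claims about itself; a validator such as veraPDF is needed to check that the claim is true.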

Notes

General  
History

PDF/A was developed to address the issue that large bodies of official documents and important information are maintained in PDF, but that PDF is not suitable as an archival format. The Administrative Office of the U.S. Courts was a driving force in forming a U.S. committee to initiate an ISO standard based on PDF. The activity was started in October 2002 under the joint auspices of AIIM (Association for Information and Image Management) and NPES (National Printing Equipment Suppliers, also known as the Association for Suppliers of Printing and Publishing Technologies, and recently renamed the Association for Print Technologies). NPES had sponsored the development of the PDF/X standard for prepress graphics exchange, published by ISO in March 2002 as ISO 15929:2002, and continued to be engaged in the development and maintenance of PDF/X standards in the many parts of ISO 15930.

A call for participation in the joint AIIM/NPES project was published online in late 2002. Useful details of the standardization process, including a timeline, are in a presentation from a session at the annual meeting of the Society of American Archivists in July 2005, when ISO 19005-1 had been approved but not yet published. ISO 19005-1 was published in October 2005.

Part 2 of ISO 19005, PDF/A-2 (ISO 19005-2:2011), extended the capabilities of PDF/A-1. PDF/A-2 is based on PDF version 1.7 (as defined in ISO 32000-1) rather than PDF version 1.4.


Format specifications


Useful references

URLs


Last Updated: 07/29/2022