Sustainability of Digital Formats
 Planning for Library of Congress Collections


PDF/A-3, PDF for Long-term Preservation, Use of ISO 32000-1, With Embedded Files


Identification and description Explanation of format description terms

Full name ISO 19005-3. Document management - Electronic document file format for long-term preservation - Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)
Description

PDF/A-3 is a constrained form of Adobe PDF version 1.7 (as defined in ISO 32000-1) intended to be suitable for archiving of page-oriented documents for which PDF is already being used in practice. PDF/A-3 adds a single, highly significant feature to its predecessor, the PDF/A-2 (ISO 19005-2) specification: it permits a PDF/A file to embed one or more files in any other format, not just other PDF/A files (as permitted in PDF/A-2).

See PDF/A for more information about the PDF/A family of standards. See PDF/A-2 for information about the version of the PDF/A standard that PDF/A-3 extends.

As in PDF/A-2, the PDF/A-3 standard defines three levels of conformance. Level A satisfies all requirements in the specification. Level B is a lower level of conformance, satisfying the requirements intended to be minimally necessary to ensure that the rendered visual appearance of a conforming file is preservable over the long term. The specification notes that "Level B conforming files might not have sufficiently rich internal information to allow for the preservation of the document's logical structure and content text stream in natural reading order, which is provided by Level A conformance." An intermediate level, Level U, corresponds to Level B conformance with the additional requirement that all text in the document have Unicode equivalents.

PDF/A-3 allows for embedding of files of any type, but imposes requirements beyond those in "regular" PDF 1.7 files as defined by ISO 32000-1. Files that comply with these requirements are termed "associated" files; an explicit association must be made between each embedded file and the containing PDF or an object or structure (e.g., image, page, or logical section) within the PDF. See Notes below for more detail on the association mechanism. Predefined values for relationships for associated files (in the required AFRelationship key) are Source, Data, Alternative, Supplement, and Unspecified. MIME types must be provided for associated files. The PDF/A-3 specification requires the use of application/octet-stream if a more specific MIME type is not known. The compilers of this resource have not determined whether the more explicit characterization needed to support long-term preservation (e.g., format version) can be indicated. Comments welcome. Human-readable descriptions for the associated files can be provided and are recommended. Conforming readers must provide a mechanism for a user to choose to extract and save (not open) associated files.
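The constraints on associated files described above can be sketched as a small Python check. This is an illustration only: the `AssociatedFile` class, its field names, and the `missing_recommendations` helper are hypothetical and are not part of the standard or of any PDF library. Only the five relationship values and the `application/octet-stream` fallback come from the description above. The sketch also treats the predefined relationship list as closed, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Predefined AFRelationship values listed in the description above.
AF_RELATIONSHIPS = {"Source", "Data", "Alternative", "Supplement", "Unspecified"}

# MIME type required when a more specific one is not known.
FALLBACK_MIME = "application/octet-stream"


@dataclass
class AssociatedFile:
    """Hypothetical record for one embedded file; not a real PDF API."""
    filename: str
    relationship: str
    mime_type: Optional[str] = None
    description: Optional[str] = None  # recommended, not required

    def __post_init__(self) -> None:
        # Assumption: only the predefined relationship values are accepted.
        if self.relationship not in AF_RELATIONSHIPS:
            raise ValueError(f"unknown AFRelationship: {self.relationship!r}")
        # A MIME type must always be present; fall back to the generic one.
        if not self.mime_type:
            self.mime_type = FALLBACK_MIME


def missing_recommendations(f: AssociatedFile) -> List[str]:
    """Return the names of recommended (not required) fields that are absent."""
    return [] if f.description else ["description"]


if __name__ == "__main__":
    chart_data = AssociatedFile("chart.csv", "Data")
    print(chart_data.mime_type)
    print(missing_recommendations(chart_data))
```

A record with no MIME type falls back to the generic type rather than being rejected, mirroring the rule that a MIME type is mandatory while a human-readable description is only recommended.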

See Notes below for use cases and examples that illustrate motivation for adding support for embedded files to the PDF/A-3 standard.

Production phase A final-state format for delivery to end users and long-term preservation of the document as disseminated to users.
Relationship to other formats
    Subtype of PDF, Portable Document Format
    Subtype of PDF_1_7, PDF, Version 1.7 (ISO 32000-1:2008)
    Extension of PDF/A, PDF for Long-term Preservation
    Extension of PDF/A-2, PDF for Long-term Preservation, Use of ISO 32000-1 (PDF 1.7)
    Has subtype PDF/A-3a, PDF/A-3u, PDF/A-3b, not separately described at this website.

Local use

LC experience or existing holdings LC was represented on the working group for the original PDF/A standard and continues to participate in the development of new versions.
LC preference

The Library of Congress expresses preferences for formats for content (primarily in physical form) for its collections through the "Best Edition" specification from the U.S. Copyright Office in Circular 7b. Circular 7b (as reviewed in 09/2012) lists formats acceptable for mandatory deposit of Electronic Serials available only online, in order of preference. For page-oriented renditions, PDF/A appears first on the list. Other forms of PDF are acceptable, preferably with searchable text. The preference for PDF/A was declared before PDF/A-3 was published, and the preference should not be interpreted as acceptance for copyright deposit of any files embedded in a PDF/A-3 file. The Library has not expressed a preference regarding PDF/A-3, pending community-wide experience with this version of the PDF/A format.

See PDF/A-2.


Sustainability factors

Disclosure

Open standard, published by ISO in October 2012. Developed under the auspices of ISO by the working group ISO/TC 171 SC2, Document Imaging Applications, Application Issues, for which AIIM (The Association for Information and Image Management) acts as secretariat. It is a Joint Working Group, including participation from ISO/TC 46 SC11, Archives/Records Management, ISO/TC 130, Graphics Technology, and ISO/TC 42, Photography.

    Documentation

ISO 19005-3:2012. Document management -- Electronic document file format for long-term preservation -- Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3). The standard cannot be used without ISO 32000-1, Document management -- Portable document format -- Part 1: PDF 1.7, which it uses as a normative reference.

Adoption

As of November 2012, PDF/A-3 is a brand new standard. It is too early to assess adoption of PDF/A-3 per se, although several vendors of tools supporting creation of or conversion to PDF/A have announced that they already offer support for embedded files. See PDF/A for a discussion of adoption of PDF/A in general, bearing in mind that the discussion there may still consider only PDF/A-1 and PDF/A-2. Comments welcome.

    Licensing and patents No concerns for PDF/A per se. Licensing or patent concerns may arise for embedded files.
Transparency See PDF/A in relation to PDF/A-1 and PDF/A-2. For PDF/A-3, transparency and characterization of embedded files are primary concerns for long-term preservation.
Self-documentation See PDF/A.
External dependencies See PDF/A.
Technical protection considerations See PDF/A.

Quality and functionality factors

Text
Normal rendering See PDF/A.
Integrity of document structure See PDF/A.
Integrity of layout and display See PDF/A.
Support for mathematics, formulae, etc. See PDF/A.
Functionality beyond normal rendering See PDF/A.

File type signifiers

Tag | Value | Note
Filename extension | pdf | The standard does not indicate that a different extension should be used to distinguish PDF from PDF/A.
Internet Media Type | See related format. | See PDF/A.
Magic numbers | See related format. | See PDF/A.
Indicator for profile, level, version, etc. | See note. | The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:conformance (closed list of text values). A PDF/A-3 file should have the integer value 3 for pdfaid:part.
File signature | See related format. | See PDF/A.
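
The two identification elements noted above, pdfaid:part and pdfaid:conformance, could be read from a file's XMP metadata along the following lines. This is a minimal sketch, not a validator. It assumes the XMP packet has already been extracted from the PDF as an XML string, that the pdfaid prefix is bound to the namespace URI `http://www.aiim.org/pdfa/ns/id/`, and that the closed list of conformance values is A, B, and U (the three levels described for PDF/A-3). The function names and the sample packet are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumption: namespace URIs for the PDF/A Identification schema and RDF.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample packet, reduced to the identification elements.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>3</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def _property(desc: ET.Element, name: str) -> Optional[str]:
    """Read a pdfaid property stored either as an attribute or a child element."""
    key = f"{{{PDFAID_NS}}}{name}"
    value = desc.get(key)
    if value is None:
        child = desc.find(key)
        value = child.text if child is not None else None
    return value.strip() if value else None


def read_pdfa_id(xmp_xml: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (part, conformance) found in the XMP, or None for missing values."""
    root = ET.fromstring(xmp_xml)
    part = conformance = None
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        part = part or _property(desc, "part")
        conformance = conformance or _property(desc, "conformance")
    return (int(part) if part else None, conformance)


def claims_pdfa3(xmp_xml: str) -> bool:
    """True if the metadata claims PDF/A-3 at level A, B, or U."""
    part, conformance = read_pdfa_id(xmp_xml)
    return part == 3 and conformance in {"A", "B", "U"}


if __name__ == "__main__":
    print(read_pdfa_id(SAMPLE_XMP), claims_pdfa3(SAMPLE_XMP))
```

A positive result only shows what the file claims about itself; it says nothing about whether the file actually conforms.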

Notes

General

The specification does not present the motivation for extending the PDF/A-2 format to support the embedding of files of any type. A single illustrative example appears in Annex E. This example is a PDF text document that displays a mathematical equation and includes a chart based on a spreadsheet. Associated files that might be embedded in the PDF/A-3 include a word-processing file (with relationship Source to the whole document), a MathML expression of the equation (with relationship Supplement to the structure or Form XObject that displays the equation), and a spreadsheet file and CSV file (both associated with the same chart with relationship Data).

A richer expression of use cases is made in a recording of a webinar, from vendor Luratech, entitled PDF/A-3, All change for document based processes! The webinar makes statements about the intent of the standard that are not found in the standard itself. Most significantly, the presenter indicates that in a PDF/A-3 file, the embedded files should be considered "non-archival." In other words, the source or supplementary material is considered to be of only short-term or temporary use. What should be considered as "archived" for the long term is only the primary PDF content with its visible page display. Secondly, the use cases chosen for what was primarily a marketing presentation with existing customers in mind indicate workflows and use contexts during the primary lifecycle of a document that could have benefits for entities responsible for longer-term archiving. These use cases included:

  • Hybrid archiving. If every revision of a document is accompanied by storing a PDF/A-3 file with the source word-processing file embedded, the document is archive-ready whenever editing stops.
  • Embedding richer metadata in a "native" discipline-specific or application-specific format. Although PDF/A already requires and supports XMP, many metadata schemes have rich XML representations that are not easily converted to RDF, on which XMP is based. However, rich metadata in a well-known application-specific format could be embedded and associated with the document as a whole, in addition to XMP metadata using built-in schemas.
  • Electronic invoice exchange. A German standard under development defines an invoice exchange format based on PDF/A-3. The invoice data can be represented in machine-readable form as an XML file embedded in a human-readable PDF/A-3.

Annex E of ISO 19005-3:2012 specifies that each embedded file in a PDF/A-3 file must be identified in a file specification dictionary (as described in section 7.11.3 of ISO 32000-1). The inclusion (in a Desc key) of a human-readable description of the file is recommended. The associated embedded bitstream dictionary (essentially a header for the embedded bitstream itself) must include a MIME type (in a Subtype key), using application/octet-stream if a more precise MIME type is not known. The embedded bitstream dictionary must also include a Params key that contains at least a ModDate key to indicate the latest modification date of the embedded file. The specification for the Params key (in section 7.11.4.1 of ISO 32000-1) does not seem to allow for embedding file characteristic details more specific than a MIME type. Comments welcome.

Relationships for associated files must be expressed in a new key introduced for PDF/A-3 (and expected to be in the forthcoming ISO 32000-2 standard for regular PDF). This AFRelationship key is located in the file specification dictionary. It indicates a relationship type of Source, Data, Alternative, Supplement, or Unspecified.

Relationship links are established from the document or parts of the document by use of the AF key, which contains an array of file specification dictionaries (as described above). Files associated with the entire document are represented by an AF key in the Catalog for the PDF file. Files associated with a page are represented by an AF key in the relevant Page dictionary. Files associated with an Image XObject or a Form XObject are represented by an AF key in the attributes dictionary for the object. Similarly, a file associated with a logical structure (such as an article or a table) or an annotation is represented by an AF key in the structure dictionary or annotation dictionary. A mechanism also exists to relate an associated file to a marked section of content; however, use of relationships to structures is preferred.
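
The association mechanism can be sketched in Python by modelling PDF dictionaries as plain `dict` objects. This is a hedged illustration, not a parser: it does not read real PDF files, the function names and sample values are hypothetical, and only document-level and page-level AF keys are covered. It also assumes, following ISO 32000-1 rather than the text above, that the embedded bitstream dictionary is reached through the EF entry of the file specification dictionary under its F key.

```python
from typing import Any, Dict, Iterator, List, Tuple

PdfDict = Dict[str, Any]  # stand-in for an already-parsed PDF dictionary


def iter_associated_files(catalog: PdfDict,
                          pages: List[PdfDict]) -> Iterator[Tuple[str, PdfDict]]:
    """Yield (scope, file specification dictionary) for each AF entry found."""
    for spec in catalog.get("AF", []):          # files tied to the whole document
        yield "document", spec
    for number, page in enumerate(pages, start=1):
        for spec in page.get("AF", []):         # files tied to a single page
            yield f"page {number}", spec


def check_filespec(spec: PdfDict) -> List[str]:
    """List the required entries that are missing from one file specification."""
    problems = []
    if "AFRelationship" not in spec:
        problems.append("missing AFRelationship")
    # Assumption: the embedded bitstream dictionary sits at EF -> F.
    stream = spec.get("EF", {}).get("F", {})
    if "Subtype" not in stream:
        problems.append("missing Subtype (MIME type)")
    if "ModDate" not in stream.get("Params", {}):
        problems.append("missing Params/ModDate")
    return problems


# Hypothetical example data: one source file for the document, one data file on page 1.
source_doc = {
    "F": "report.odt", "AFRelationship": "Source", "Desc": "Word-processing source",
    "EF": {"F": {"Subtype": "application/vnd.oasis.opendocument.text",
                 "Params": {"ModDate": "D:20121101120000Z"}}},
}
chart_data = {"F": "chart.csv", "EF": {"F": {"Subtype": "text/csv"}}}
catalog = {"AF": [source_doc]}
pages = [{"AF": [chart_data]}, {}]

if __name__ == "__main__":
    for scope, spec in iter_associated_files(catalog, pages):
        print(scope, spec["F"], check_filespec(spec))
```

The Desc key is left out of the checks because a human-readable description is recommended rather than required.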

See also PDF/A.

History

The primary difference between PDF/A-1 and PDF/A-2 was the use of a later underlying version of PDF. Added capabilities, all in compliance with ISO 32000-1, included:

  • Improvements to tagged PDF (for enhanced accessibility)
  • Compressed Object and XRef streams (for smaller file sizes)
  • Support for embedding of PDF/A-compliant file attachments, portable collections and PDF packages
  • Support for transparency in images
  • Support for JPEG 2000 compression for images

See also PDF/A.


Format specifications


Useful references

URLs


Last Updated: Monday, 03-Mar-2014 16:22:06 EST