Sustainability of Digital Formats: Planning for Library of Congress Collections


PDF_1_4, PDF Version 1.4

Format Description Properties

Identification and description

Full name PDF (Portable Document Format), version 1.4
Description

PDF (Portable Document Format), developed by Adobe Systems Incorporated, is described by Adobe as a general document representation language. PDF represents formatted, page-oriented documents. These documents may be structured or simple. They may contain text, images, graphics, and other multimedia content, such as video and audio. PDF also supports annotations, metadata, hypertext links, and bookmarks.

Version 1.4 of PDF was the basis for versions of the PDF/X family of standards published by ISO in 2003 and for the first version of the ISO standard PDF/A (ISO 19005-1), published in 2005.

Production phase In general, a final-state format for delivery to end users. Some versions of the PDF/X family of standards, which are primarily middle-state formats for submission of files to publications or commercial printing services, are based on PDF 1.4.
Relationship to other formats
    Subtype of PDF_family, Portable Document Format Family
    Has earlier version PDF_1_3, PDF, Versions 1.0-1.3
    Has later version PDF_1_5, PDF, Version 1.5
    Has subtype PDF/A-1, PDF for Long-term Preservation, Use of PDF 1.4. The first version of PDF/A was based on PDF 1.4.

Local use

LC experience or existing holdings

The Library of Congress creates PDFs as service formats for some content it creates or makes available, including some digitized historical materials, primarily to support convenient downloading and printing. Some of these PDFs are version 1.4. Examples (as of early 2019) include text transcriptions made for books and pamphlets digitized for American Memory in the late 1990s, such as a broadside and a travel book from 1862.

The National Digital Newspaper Program, which produces Chronicling America, requires awardees to deliver one PDF per page, according to detailed guidelines. These guidelines require XMP metadata following specific conventions and require that "The PDF will be compatible with Acrobat 5.0 or later." Because Acrobat 5.0 corresponds to PDF 1.4, this requirement ties the deliverables to PDF 1.4; in practice, as of early 2019, all the awardees and LC itself appeared to be using PDF 1.4. Each of these PDFs holds a single page image, with OCR text available for searching. Example: newspaper page from July 1930.

LC preference See PDF_family.

Sustainability factors

Disclosure Fully documented by Adobe Systems. Incorporated as a normative reference into ISO standards for PDF/A-1 and some versions of the PDF/X_family. See also PDF_family.
    Documentation PDF Reference, Third Edition. Adobe Portable Document Format, Version 1.4. Link via Internet Archive. See also PDF_family.
Adoption

PDF 1.4 is widely used as the basis for the first version of the PDF/A format (PDF/A-1). It is also used for versions of the PDF/X family of standards for prepress graphics exchange published in 2003.

In early 2019, the LibreOffice Export to PDF command produced a PDF 1.4 file. Also in early 2019, the printer/copier/scanners (MFDs) used by the Library of Congress could scan multipage documents directly to PDF 1.4 files; the MFDs supported no other PDF option.

    Licensing and patents See PDF_family.
Transparency See PDF_family.
Self-documentation Version 1.4 can include XMP metadata packages. XMP is Adobe's framework for including arbitrary blocks of metadata, using a representation in RDF.
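Because an XMP package is ordinary XML wrapped in <?xpacket begin=...?> and <?xpacket end=...?> processing instructions, an uncompressed one can often be found in a PDF simply by scanning the file's bytes. The Python sketch below illustrates this under a stated assumption: the Metadata stream is stored unfiltered. A Flate-compressed stream would have to be decoded first, which this sketch does not do. The helper name find_xmp_packets is hypothetical and not part of any library.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the packet bytes appear uncompressed in the file.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list:
    """Return the body of every uncompressed XMP packet found in `data`."""
    return [m.group(1).strip() for m in XPACKET_RE.finditer(data)]


if __name__ == "__main__":
    # Synthetic fragment of a PDF 1.4 file with an unfiltered Metadata stream.
    sample = (
        b"%PDF-1.4\n5 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
        b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
        b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>'
        b'<?xpacket end="w"?>\nendstream\nendobj\n'
    )
    for packet in find_xmp_packets(sample):
        print(packet.decode("utf-8"))
```

A byte scan like this is a quick inspection aid; reading the Metadata entry of the document Catalog is the reliable route when streams are compressed.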
External dependencies See PDF_family.
Technical protection considerations See PDF_family.

Quality and functionality factors

Text
Normal rendering See PDF_family.
Integrity of document structure See PDF_family.
Integrity of layout and display See PDF_family.
Support for mathematics, formulae, etc. See PDF_family.
Functionality beyond normal rendering See PDF_family.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | pdf | See PDF_family.
Internet Media Type | application/pdf | Media type registered with IANA. See also PDF_family.
Magic numbers | Hex: 25 50 44 46 2D 31 2E 34; ASCII: %PDF-1.4 | From PRONOM. However, the version declared in the header (%PDF-1.4) can be overridden by a Version entry in the document Catalog elsewhere in the file. See Notes below for more detail.
Pronom PUID | fmt/18 | See https://www.nationalarchives.gov.uk/PRONOM/fmt/18 for PDF 1.4.
Wikidata Title ID | Q26085326 | See https://www.wikidata.org/wiki/Q26085326 for PDF 1.4.
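The magic number for PDF 1.4 can be checked with a few lines of Python. This is a minimal sketch: the helper name has_pdf_1_4_header is hypothetical, it looks only at offset 0, and a match shows only what the header declares, not a later version that a Catalog Version entry may set.

```python
# Hex signature for PDF 1.4: 25 50 44 46 2D 31 2E 34, i.e. ASCII "%PDF-1.4".
PDF_1_4_MAGIC = bytes.fromhex("25 50 44 46 2D 31 2E 34")


def has_pdf_1_4_header(data: bytes) -> bool:
    """True if `data` begins with the %PDF-1.4 header signature.

    Only offset 0 is checked, and only the header is examined; the
    document Catalog may still declare a later version.
    """
    return data.startswith(PDF_1_4_MAGIC)


if __name__ == "__main__":
    print(has_pdf_1_4_header(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # True
    print(has_pdf_1_4_header(b"%PDF-1.7\n"))                    # False
```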

Notes

General

The PDF version can be identified in two places in a PDF file. All PDF files should have a version identified in the header with the 5 characters %PDF- followed by a version number. For PDF files conforming to ISO 32000-1:2008 or earlier specifications (i.e., prior to ISO 32000-2:2017), the version number has the form 1.N, where N is a digit between 0 and 7. For example, PDF 1.4 is identified by %PDF-1.4. However, beginning with PDF 1.4, a conforming PDF writer may use the Version entry in the document Catalog to override the version specified in the header with a later one. The location of the Catalog within the file is indicated by the Root entry of the file trailer/footer. This override feature was introduced to facilitate incremental updating of a PDF by simply adding to the end of the file. As a result, it is necessary to locate the Catalog within the file to get the correct version number. Unless the PDF is "linearized," in which case the Catalog is up front, this requires reading the trailer and then using the reference there to locate the Catalog, which will typically be compressed.

This has practical implications because format identification tools, including DROID, typically look for particular characters at the beginning of a file (i.e., in the header) to permit identification with minimal effort. DROID can look for characters at the end of the file, but is not able to follow an indirect reference or decompress file contents. When the version number is not the same in the header and the Catalog, there is potential for format identification errors.
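The two-step lookup described above (header first, then trailer, Root entry, and Catalog) can be sketched in Python. This is a simplified illustration under strong assumptions: the file uses a classic cross-reference table with a trailer keyword (not a cross-reference stream), the Catalog is an ordinary uncompressed object rather than one held in a compressed object stream, and regular expressions are adequate to find the objects. All function names are hypothetical. Because the Catalog's Version entry is meant to declare a later version than the header, the sketch reports the later of the two.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")
VERSION_RE = re.compile(rb"/Version\s*/(\d)\.(\d)")


def header_version(data: bytes) -> Optional[Version]:
    """Version declared by a %PDF-1.N header at offset 0 (strict reading)."""
    m = HEADER_RE.match(data)
    return (int(m.group(1)), int(m.group(2))) if m else None


def catalog_version(data: bytes) -> Optional[Version]:
    """Version entry of the Catalog, located via the Root entry of the last trailer."""
    trailer_pos = data.rfind(b"trailer")
    if trailer_pos == -1:
        return None  # e.g. a cross-reference stream: not handled in this sketch
    root = ROOT_RE.search(data, trailer_pos)
    if root is None:
        return None
    num, gen = root.group(1), root.group(2)
    # (?<!\d) stops object "11 0" from matching a search for object "1 0".
    obj_re = re.compile(
        rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj", re.DOTALL
    )
    # Incremental updates append new copies of an object; the last copy wins.
    copies = list(obj_re.finditer(data))
    if not copies:
        return None  # e.g. the Catalog sits in a compressed object stream
    v = VERSION_RE.search(copies[-1].group(1))
    return (int(v.group(1)), int(v.group(2))) if v else None


def effective_version(data: bytes) -> Optional[Version]:
    """The later of the header version and the Catalog's Version entry."""
    found = [v for v in (header_version(data), catalog_version(data)) if v]
    return max(found) if found else None


if __name__ == "__main__":
    original = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
                b"trailer\n<< /Size 3 /Root 1 0 R >>\n%%EOF\n")
    # An incremental update appends a new Catalog declaring PDF 1.6.
    updated = original + (
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Version /1.6 >>\nendobj\n"
        b"trailer\n<< /Size 3 /Root 1 0 R >>\n%%EOF\n")
    print(effective_version(original))  # (1, 4)
    print(effective_version(updated))   # (1, 6)
```

In the updated example the header still reads %PDF-1.4, so a tool that inspects only the first bytes would report PDF 1.4 for a file that declares PDF 1.6.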

History PDF 1.4 was published in November 2001 and corresponds to Acrobat version 5. PDF 1.4 was incorporated into versions of the PDF/X and PDF/A families of ISO standards.


Last Updated: 06/30/2022