Sustainability of Digital Formats
 Planning for Library of Congress Collections


EPUB, Electronic Publication, Version 3


Identification and description

Full name EPUB, Electronic Publication, Version 3
Description

EPUB is a format for electronic publications that combines reflowable text in a marked-up document structure with associated images for illustrations, all in a container format. Reflowable text allows the display to be optimized for the particular device used by the reader of the EPUB-formatted book, in contrast to documents with predetermined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 3, also provides for audio, including synchronization of text and audio, and for text-to-speech synthesis. Video may be embedded, but readers need not support video. Creators must provide a still image as fallback for a video clip.

The EPUB_3 specification comprises an overview and four separate specification documents:

  • Open Container Format (OCF), version 3. Specifies the mandatory container for an EPUB. OCF is based on the ZIP format.
  • EPUB Publications 3.0. Defines how the components of an EPUB publication are related, including the natural reading order and alternate representations for content elements. The Package Document it defines also holds document metadata. Specifies options for representing the content of an electronic publication, including "preferred vocabularies" for XML-based content components and "core media types", among them raster and vector image formats, that readers must support.
  • EPUB Content Documents 3.0. Defines profiles of XHTML, SVG, and CSS for use as the primary content in EPUB publications.
  • EPUB Media Overlays 3.0. Defines a format and a processing model for synchronization of text and audio.

EPUB is the unifying term for a collection of Content Documents, a Package Document, at least one Navigation Document, and other supporting files, typically in a variety of media types including structured text and graphics. Packaged in a ZIP-based OCF container, these files constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB has the extension .epub; because the container is ZIP-based, changing the extension to .zip may permit exploration of the individual files with ordinary ZIP tools.
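Because the container is ZIP-based, its entries can be listed with any ZIP library, not only by renaming the file. The following is a minimal sketch in Python, assuming only the standard library. The file names inside the sample container (OEBPS/package.opf, OEBPS/nav.xhtml) and the helper names are hypothetical and chosen for illustration; the placeholder XML bodies are not valid EPUB content.

```python
import io
import zipfile


def build_minimal_container() -> bytes:
    """Build a tiny, illustrative EPUB-like ZIP container in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Placeholder entries; names and bodies are illustrative only.
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/package.opf", "<package/>")
        archive.writestr("OEBPS/nav.xhtml", "<html/>")
    return buffer.getvalue()


def list_container_entries(epub_bytes: bytes) -> list[str]:
    """Return the names of all files stored in the container."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


entries = list_container_entries(build_minimal_container())
for name in entries:
    print(name)
```

For a file on disk, passing its path directly to zipfile.ZipFile should work the same way, regardless of whether the extension is .epub or .zip.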

Every EPUB Publication includes a single XML-based Package Document. Through a manifest element, it specifies all of the publication's constituent content documents together with their required resources; through a spine element, it defines a reading order for linear consumption; and it associates publication-level metadata and navigation information with the publication. An EPUB Navigation Document in HTML5/XHTML is a required component of an EPUB Publication; it provides the basis for both machine-readable and human-readable navigation. A mandatory toc nav element defines the primary navigation hierarchy and must be consistent with the spine. Other nav elements can support navigation options familiar from printed books, such as lists of illustrations, or from electronic documents, such as marks for significant structural components.
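The relationship between manifest and spine can be illustrated with a short Python sketch. It assumes the Package Document uses the namespace http://www.idpf.org/2007/opf and that the Navigation Document is flagged in the manifest with properties="nav"; neither detail is stated in the text above. The sample package, its file names, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace of the Package Document (not given in the text above).
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical sample package with four manifest items and a two-item spine.
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def reading_order(package_xml: str) -> list[str]:
    """Resolve each spine itemref to the href of its manifest item."""
    root = ET.fromstring(package_xml)
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def navigation_document(package_xml: str) -> Optional[str]:
    """Return the href of the manifest item flagged as the Navigation Document."""
    root = ET.fromstring(package_xml)
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if "nav" in (item.get("properties") or "").split():
            return item.get("href")
    return None


print(reading_order(SAMPLE_PACKAGE))
print(navigation_document(SAMPLE_PACKAGE))
```

The manifest lists every resource, while the spine references only the items that make up the linear reading order, which is why the cover image and the Navigation Document do not appear in the first result.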

EPUB 3 was approved by IDPF as a Recommendation in October 2011. A June 2012 resolution of ISO/IEC JTC 1/SC 34 (Document Description and Processing Languages) notes that the Korean national standards body will submit EPUB 3 to ISO/IEC JTC 1/SC 34 as a Draft Technical Specification via the JTC 1 fast-track procedure.

Production phase An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users.
Relationship to other formats
    Has earlier version EPUB_2, EPUB, Electronic Publication, Version 2. Last minor version of EPUB, Version 2 was 2.0.1, approved in 2010. There are substantial changes between EPUB_2 and EPUB_3.
    Subtype of OCF (Open Container Format), based on the ZIP archiving format, not yet separately described at this site. EPUB_3 uses version 3 of OCF.
    Contains A single XML-based Package Document not yet separately described at this site.
    May contain GIF, GIF Graphics Interchange Format, Version 89a
    May contain JFIF, JFIF, JPEG File Interchange Format
    May contain PNG, Portable Network Graphics
    May contain SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1. EPUB_3 specifies a slightly restricted version of SVG_1_1. In particular, animation objects are not permitted.
    May contain MP3_ENC, MP3 Audio Encoding. Assumed to be wrapped in the widely used de facto file format MP3_FF which wraps MP3 encoding with optional ID3 metadata blocks.
    May contain MP4_FF_2_AAC, MPEG-4 File Format, V.2, with Advanced Audio Encoding. Limited to Low Complexity audio compression. See AAC_MP4_LC, AAC (MPEG-4) Low Complexity Object.

Local use

LC experience or existing holdings No current use at LC.
LC preference As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB_3 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted, if the file is not subject to technological protection that inhibits long-term preservation and access, and if all content is stored within the EPUB container. The fact that metadata records (for example, in the ONIX schema) appear only to be available through links to external records and not embedded in an EPUB publication suggests that LC would want to receive or access metadata records in conjunction with ingestion of an EPUB publication. Any interactive functionality supported by embedded JavaScript will be harder to preserve for the long term than static reflowable content.

Sustainability factors

Disclosure Open standard, developed and maintained under the auspices of the International Digital Publishing Forum (IDPF).
    Documentation Specifications for EPUB version 3 from IDPF.
Adoption

This format is very new as of February 2012, and signs of future support for EPUB_3 are only just beginning to emerge from developers of tools supporting EPUB_2.

A free, open-source validator, EpubCheck, is available at https://github.com/IDPF/epubcheck/wiki. Readium, an IDPF project to accelerate adoption of EPUB_3, has recently released a reader plug-in for the Chrome browser and can be expected to deliver more tools in coming months.

Adobe InDesign in CS5.5 already supports some features from EPUB_3, such as support for Japanese scripts and embedded rich media.

    Licensing and patents No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for Version 3.
Transparency

Text content must be in XHTML/HTML5, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including the key and algorithm used. Embedded fonts, if used, may be obfuscated (see Notes below), which also reduces transparency.

Self-documentation

The mandatory EPUB Package Document can include unqualified Dublin Core (DCMES) metadata, and readers must recognize these elements. The dc:title, dc:identifier, and dc:language elements are mandatory (XML names are case-sensitive, and these are written in lowercase). If more than one title is present, titles are required to be given title-type properties, to allow for series/collection titles, subtitles, short titles, edition statements, etc. Also mandatory is the dcterms:modified property, which is combined with dc:identifier to act as an identifier for a particular package. A meta element may be used to define and populate other metadata elements. In addition, a link can be made to externally stored metadata records in other schemas.
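A check for the mandatory metadata described above might look like the following Python sketch. It assumes the standard Dublin Core element namespace, the Package Document namespace http://www.idpf.org/2007/opf, and that the modification date is carried in a meta element with property="dcterms:modified"; these details are assumptions not spelled out in the text. The sample package and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces (not given in the text above).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample carrying all four mandatory metadata items.
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2012-02-01T00:00:00Z</meta>
  </metadata>
</package>"""


def missing_required_metadata(package_xml: str) -> list[str]:
    """List the mandatory metadata items absent from a Package Document."""
    metadata = ET.fromstring(package_xml).find("opf:metadata", NS)
    missing = []
    # The three mandatory Dublin Core elements.
    for name in ("title", "identifier", "language"):
        if metadata is None or metadata.find(f"dc:{name}", NS) is None:
            missing.append(f"dc:{name}")
    # The modification date, assumed to be a meta element with a property attribute.
    has_modified = metadata is not None and any(
        meta.get("property") == "dcterms:modified"
        for meta in metadata.findall("opf:meta", NS)
    )
    if not has_modified:
        missing.append("dcterms:modified")
    return missing


print(missing_required_metadata(SAMPLE_PACKAGE))
```

This sketch tests only for presence. A validator such as EpubCheck performs far more thorough checks.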

The EPUB Package Document also includes a manifest of component files. A mandatory component file is an EPUB Navigation Document, which provides structural metadata to relate the various content documents through a table of contents. The mandatory spine element in the Package Document stipulates a natural reading order.

External dependencies No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor.
Technical protection considerations

In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management (DRM) terms and procedures. Commercial publishers can be expected to protect their EPUB files with DRM. Since EPUB_3 is new, it is hard to predict whether a variety of DRM techniques will emerge. For EPUB_2, the DRM scheme most widely deployed is Adobe's ADEPT mechanism. Apple uses its own DRM for iBooks; Barnes & Noble also has its own DRM scheme.

Embedded third-party fonts may be "obfuscated" by partial encryption. See Notes below for more information. The result is that any reader tool has to be capable of performing the decryption in order to be able to use the intended fonts.


Quality and functionality factors

Text
Normal rendering Good support.
Integrity of document structure The logical structure of a document is an essential feature of EPUB.
Integrity of layout and display Publishers may choose to control some aspects of layout through style-sheets. However, reflowable text, by definition, will break lines and paginate differently depending on the reading platform and user choices.
Support for mathematics, formulae, etc. EPUB_3 can contain MathML markup.
Functionality beyond normal rendering Reflowable text can adapt to reading devices with a variety of form factors. Synchronization of audio and text is supported. A pronunciation lexicon can be embedded to support text-to-speech rendering.

File type signifiers

Tag Value Note
Filename extension epub  Recommended extension for the EPUB container file.
Internet Media Type application/epub+zip  From OCF specification.
Magic numbers See note.  From OCF specification:
  • The bytes “PK” will be at the beginning of the file, followed by two additional bytes from the ZIP specification: \003 \004 (octal notation; 0x03 0x04 in hexadecimal)
  • The bytes “mimetype” will be at position 30
  • The actual MIME type (i.e., the ASCII string “application/epub+zip”) will begin at position 38
Indicator for profile, level, version, etc. See note.  The version of EPUB, in this case "3.0", is identified in the version attribute of the root <package> element in the .opf file. That file is often found in a directory named OEBPS once the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into its component files; its actual location is recorded in META-INF/container.xml.
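The signifiers in the table above lend themselves to a simple programmatic check. The Python sketch below tests the three magic-number conditions, treating the stated positions as zero-based byte offsets, and then reads the version attribute from the first entry whose name ends in .opf. Locating the package by its .opf suffix is a simplification made for this example, and the sample container and function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional


def has_epub_signature(data: bytes) -> bool:
    """Check the three magic-number conditions listed in the table."""
    return (
        data[0:4] == b"PK\x03\x04"                    # "PK" followed by \003 \004
        and data[30:38] == b"mimetype"                # entry name at position 30
        and data[38:58] == b"application/epub+zip"    # MIME type at position 38
    )


def read_package_version(data: bytes) -> Optional[str]:
    """Return the version attribute of the root element of the first .opf entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith(".opf"):
                return ET.fromstring(archive.read(name)).get("version")
    return None


def build_sample() -> bytes:
    """Build a hypothetical two-entry container for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # First entry, stored uncompressed so the MIME string is readable in place.
        archive.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip")
        archive.writestr(
            "OEBPS/package.opf",
            '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"/>',
        )
    return buffer.getvalue()


sample = build_sample()
print(has_epub_signature(sample))
print(read_package_version(sample))
```

The fixed offsets hold only when the mimetype entry is the first in the archive, is stored without compression, and has no extra field in its ZIP header, which is why the sample writes it first and uncompressed.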

Notes

General

Among the changes in EPUB_3 from EPUB_2 is the adoption of the ZIP-based container as the only serialization for an EPUB publication. The IDPF's description of the change reads: "OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a 'Filesystem Container' abstraction. This change was made in conjunction with new restrictions in Publications 3.0 restricting references to remote resources in EPUB Publications to specific media types and contexts. Taken together, these changes mean that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container, and that EPUB files must in general contain all constituent parts of the Publication, with certain well-defined exceptions." Audio and video content may be stored remotely rather than in the EPUB container. All other content must be in the EPUB container.

The EPUB 3 Core Media Types, which all conforming Reading Systems must support and which publications may include, are HTML5, XHTML, CSS, SVG, GIF, JPEG, PNG, MP3, and AAC (low complexity) in an MP4 wrapper. Publications may also include resources of other media types, but each such resource must have an alternative resource of a Core Media Type, supplied using methods defined in the EPUB specification.
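One way to picture the fallback requirement is a check that follows fallback references in the manifest until a Core Media Type is reached. The Python sketch below is illustrative only. It assumes that manifest items point to their alternative through a fallback attribute, which is just one of the methods the specification is understood to define, and the MIME strings in CORE_MEDIA_TYPES are assumed mappings of the formats named above rather than a list taken from the text. The sample manifest and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the Package Document.
NS = {"opf": "http://www.idpf.org/2007/opf"}

# Assumed MIME strings for the formats named above (not an authoritative list).
CORE_MEDIA_TYPES = {
    "application/xhtml+xml", "text/css", "image/svg+xml",
    "image/gif", "image/jpeg", "image/png", "audio/mpeg", "audio/mp4",
}

# Hypothetical manifest: a PDF with a fallback and a CSV file without one.
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="report" href="report.pdf" media-type="application/pdf" fallback="ch1"/>
    <item id="data" href="data.csv" media-type="text/csv"/>
  </manifest>
</package>"""


def items_lacking_core_fallback(package_xml: str) -> list[str]:
    """Return ids of manifest items whose fallback chain never reaches a core type."""
    root = ET.fromstring(package_xml)
    items = {
        item.get("id"): item
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    lacking = []
    for item_id, item in items.items():
        current, visited = item, set()
        while current is not None and current.get("media-type") not in CORE_MEDIA_TYPES:
            if current.get("id") in visited:  # the chain loops back on itself
                current = None
                break
            visited.add(current.get("id"))
            # A missing or dangling fallback reference ends the chain with None.
            current = items.get(current.get("fallback"))
        if current is None:
            lacking.append(item_id)
    return lacking


print(items_lacking_core_fallback(SAMPLE_PACKAGE))
```

In the sample, the PDF item is acceptable because its fallback chain ends at an XHTML document, while the CSV item is reported because it names no alternative.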

Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes of the font file using a SHA-1 digest of the publication's unique identifier, from which any whitespace characters are first stripped.
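The obfuscation step can be sketched in a few lines of Python. The text above says only that the first 1040 bytes are modified using the digest; this sketch assumes the modification is a byte-wise XOR with the 20-byte SHA-1 digest repeated cyclically, that the identifier is encoded as UTF-8 before hashing, and that "whitespace" means space, tab, carriage return, and line feed. The function names and the sample identifier are hypothetical.

```python
import hashlib

OBFUSCATED_LENGTH = 1040      # number of leading bytes modified, per the text above
WHITESPACE = " \t\r\n"        # assumed set of whitespace characters to strip


def obfuscation_key(unique_identifier: str) -> bytes:
    """SHA-1 digest of the identifier with whitespace removed (UTF-8 assumed)."""
    stripped = "".join(ch for ch in unique_identifier if ch not in WHITESPACE)
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def obfuscate_font(font_bytes: bytes, unique_identifier: str) -> bytes:
    """XOR the first 1040 bytes with the cyclically repeated key (assumed method)."""
    key = obfuscation_key(unique_identifier)
    head = bytes(
        byte ^ key[index % len(key)]
        for index, byte in enumerate(font_bytes[:OBFUSCATED_LENGTH])
    )
    # Bytes beyond the first 1040 are left untouched.
    return head + font_bytes[OBFUSCATED_LENGTH:]


# Stand-in for font data; a real font file would be read from the container.
font = bytes(range(256)) * 8
mangled = obfuscate_font(font, "urn:uuid: 1234")
restored = obfuscate_font(mangled, "urn:uuid:1234")
print(restored == font)
```

Under the XOR assumption the operation is its own inverse, so a reading system that knows the publication's unique identifier can recover the original font by applying the same function again.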

History

The Open eBook Publication Structure or "OEB", originally produced in 1999, was the precursor to EPUB.

Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. OEBPS Version 1.0.1 [OEBPS_1_0], a maintenance release, was brought out in July 2001. OEBPS Version 1.2 [OEBPS_1_2], incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002.

EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.

EPUB, Version 3, was approved as an IDPF Recommendation in October 2011. It is substantially different from EPUB, Version 2, both in using only a single form for textual content and in having support for audio, video, and scripted interactivity (through JavaScript). The two EPUB_2 formats for text content are no longer supported: one based on the Digital Talking Book [DTB_2005] format and a second based on XHTML 1.1 and compatible with OEBPS_1_2. A single new encoding for textual Content Documents is based on HTML5/XHTML and CSS3, despite the fact that both of these W3C standards are still works in progress. SVG is supported for graphics, and it is possible to have an EPUB_3 document whose "pages" consist only of graphics, for example a graphic novel. Several legacy features are deprecated. Some legacy structures may be included for compatibility of EPUB_3 documents with existing EPUB_2 readers. EPUB_3 readers are expected to render both version 2 and version 3 publications.



Last Updated: Wednesday, 05-Feb-2014 17:24:29 EST