Sustainability of Digital Formats


| Full name | EPUB, Electronic Publication, Version 3 |
|---|---|
| Description |
EPUB is a format for electronic publications with reflowable text in a marked-up document structure, with associated images for illustrations, all in a container format. Reflowable text allows the text display to be optimized for the particular display device used by the reader of the EPUB-formatted book; this is in contrast to documents with pre-determined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 3 also provides for audio, including synchronization of text and audio, and for text-to-speech synthesis. Video may be embedded, but readers need not support video. Creators must provide a still image as fallback for a video clip. The EPUB_3 specification comprises an overview and four separate specification documents.
EPUB is the unifying term used to denote a collection of Content Documents, a Package Document, at least one Navigation Document, and other supporting files, typically in a variety of media types, including structured text and graphics, packaged in a ZIP-based OCF container that constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB has the extension .epub; changing the extension to .zip may permit exploration of the individual files. Every EPUB Publication includes a single XML-based Package Document, which specifies all the publication's constituent content documents, through a manifest element, together with associated required resources, defines a reading order for linear consumption, through a spine element, and associates publication-level metadata and navigation information. An EPUB Navigation Document in HTML5/XHTML is a required component of an EPUB Publication; it provides the basis for both machine-readable and human-readable navigation. A mandatory toc nav element defines the primary navigation hierarchy and must be consistent with the spine. Other nav elements can support navigation options familiar from printed books, such as lists of illustrations, or from electronic documents, such as marks for significant structural components. EPUB 3 was approved by IDPF as a Recommendation in October 2011. A June 2012 resolution of ISO/IEC JTC 1/SC 34 (Document Description and Processing Languages) notes that EPUB 3 will be submitted to ISO/IEC JTC1/SC34 as a Draft Technical Specification via the JTC 1 fast-track procedure by the Korean national standards body. |
| Production phase | An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users |
| Relationship to other formats | |
| Has earlier version | EPUB_2, EPUB, Electronic Publication, Version 2. Last minor version of EPUB, Version 2 was 2.0.1, approved in 2010. There are substantial changes between EPUB_2 and EPUB_3. |
| Subtype of | OCF (Open Container Format), based on the ZIP archiving format, not yet separately described at this site. EPUB_3 uses version 3 of OCF. |
| Contains | A single XML-based Package Document not yet separately described at this site. |
| May contain | GIF, GIF Graphics Interchange Format, Version 89a |
| May contain | JFIF, JFIF, JPEG File Interchange Format |
| May contain | PNG, Portable Network Graphics |
| May contain | SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1. EPUB_3 specifies a slightly restricted version of SVG_1_1. In particular, animation objects are not permitted. |
| May contain | MP3_ENC, MP3 Audio Encoding. Assumed to be wrapped in the widely used de facto file format MP3_FF which wraps MP3 encoding with optional ID3 metadata blocks. |
| May contain | MP4_FF_2_AAC, MPEG-4 File Format, V.2, with Advanced Audio Encoding. Limited to Low Complexity audio compression. See AAC_MP4_LC, AAC (MPEG-4) Low Complexity Object. |
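
The container layout described above (a ZIP archive holding a Package Document with a manifest and a spine) can be explored with a few lines of Python. The following is a minimal sketch, not a conforming reading system. It assumes, following the OCF convention, that `META-INF/container.xml` points to the Package Document; the function name `read_package` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces of the OCF container file and the Package Document.
NS = {
    "ocf": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def read_package(epub):
    """Return (opf_path, manifest, spine) for an EPUB path or file object.

    read_package is a hypothetical helper name used for illustration.
    """
    with zipfile.ZipFile(epub) as zf:
        # Assumption: container.xml names the Package Document (OCF convention).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("ocf:rootfiles/ocf:rootfile", NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # The manifest lists every constituent resource with its media type.
    manifest = {
        item.get("id"): (item.get("href"), item.get("media-type"))
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    # The spine gives the linear reading order as references to manifest ids.
    spine = [ref.get("idref") for ref in package.findall("opf:spine/opf:itemref", NS)]
    return opf_path, manifest, spine
```

Resolving each spine entry against the manifest yields the content documents in reading order, for example `[manifest[idref][0] for idref in spine]`.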

| LC experience or existing holdings | No current use at LC. |
|---|---|
| LC preference | As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB_3 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted, if the file is not subject to technological protection that inhibits long-term preservation and access, and if all content is stored within the EPUB container. The fact that metadata records, for example, in the ONIX schema, appear only to be available through links to external records and not embedded in an EPUB publication suggests that LC would want to receive or access metadata records in conjunction with ingestion of an EPUB publication. Any interactive functionality supported by embedded JavaScript will be harder to preserve for the long term than static reflowable content. |

| Disclosure | Open standard, developed and maintained under the auspices of the International Digital Publishing Forum (IDPF). |
|---|---|
| Documentation | Specifications for EPUB version 3 from IDPF. |
| Adoption |
This format is very new as of February 2012, and signs of future support for EPUB_3 are only just beginning to emerge from developers of tools supporting EPUB_2. A free, open-source validator, EpubCheck, is available at http://code.google.com/p/epubcheck/. Readium, an IDPF project to accelerate adoption of EPUB_3, has recently released a reader plug-in for the Chrome browser, and can be expected to deliver more tools in coming months. Adobe InDesign in CS5.5 already supports some features from EPUB_3, such as support for Japanese scripts and embedded rich media. |
| Licensing and patents | No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for Version 3. |
| Transparency |
Text content must be in XHTML/HTML5, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including the key and algorithm used. If used, embedded fonts may be obfuscated (see Notes below), which also reduces transparency. |
| Self-documentation |
The mandatory EPUB Package Document can include unqualified Dublin Core (DCMES) metadata and readers must recognize these elements. dc:title, dc:identifier, and dc:language are mandatory. If more than one title is present, titles are required to be given title-type properties, to allow for series/collection titles, subtitles, short titles, edition statements, etc. Also mandatory is the dcterms:modified property, which is combined with dc:identifier to act as an identifier for a particular package. A meta element may be used to define and populate other metadata elements. In addition, a link can be made to externally stored metadata records in other schemas. The EPUB Package Document also includes a manifest of component files. A mandatory component file is an EPUB Navigation Document, which provides structural metadata to relate the various content documents through a table of contents. The mandatory spine element in the Package Document stipulates a natural reading order. |
| External dependencies | No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor. |
| Technical protection considerations |
In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management (DRM) terms and procedures. Commercial publishers can be expected to protect their EPUB files with DRM. Since EPUB_3 is new, it is hard to predict whether a variety of DRM techniques will emerge. For EPUB_2, the DRM scheme most widely deployed is Adobe's ADEPT mechanism. Apple uses its own DRM for iBooks; Barnes & Noble also has its own DRM scheme. Embedded third-party fonts may be "obfuscated" by partial encryption. See Notes below for more information. The result is that any reader tool has to be capable of performing the decryption in order to be able to use the intended fonts. |
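
The mandatory metadata described under Self-documentation above can be checked mechanically. The sketch below is illustrative only: it takes the bytes of a Package Document and reports which of the required items are absent. It assumes the standard Dublin Core and OPF namespaces and assumes that the modification date is carried in a `meta` element with `property="dcterms:modified"`; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_required_metadata(opf_xml):
    """Collect the mandatory metadata values from Package Document bytes.

    read_required_metadata is a hypothetical name used for illustration.
    """
    package = ET.fromstring(opf_xml)
    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        return {"title": [], "identifier": [], "language": [], "modified": []}

    def texts(tag):
        # Several titles or identifiers may be present, so return lists.
        return [(el.text or "").strip() for el in metadata.findall(tag, NS)]

    # Assumption: dcterms:modified is expressed through a <meta> element.
    modified = [
        (m.text or "").strip()
        for m in metadata.findall("opf:meta", NS)
        if m.get("property") == "dcterms:modified"
    ]
    return {
        "title": texts("dc:title"),
        "identifier": texts("dc:identifier"),
        "language": texts("dc:language"),
        "modified": modified,
    }


def missing_required(meta):
    """Return the names of mandatory items that have no value."""
    return [name for name, values in meta.items() if not values]
```

An empty list from `missing_required` indicates only that the four items are present; it does not validate their values, which is the job of a full validator such as EpubCheck.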

| Text | |
|---|---|
| Normal rendering | Good support. |
| Integrity of document structure | The logical structure of a document is an essential feature of EPUB. |
| Integrity of layout and display | Publishers may choose to control some aspects of layout through style-sheets. However, flowable text, by definition, will break lines and paginate text differently depending on the reading platform and user choices. |
| Support for mathematics, formulae, etc. | EPUB_3 can contain MathML markup. |
| Functionality beyond normal rendering | Flowable text can adapt to reading devices with a variety of form factors. Synchronization of audio and text is supported. A pronunciation lexicon can be embedded to support text-to-speech rendering. |

| Tag | Value | Note |
|---|---|---|
| Filename extension | epub | Recommended extension for the EPUB container file. |
| Internet Media Type | application/epub+zip | From OCF specification. |
| Magic numbers | See note. | From OCF specification: |
| Indicator for profile, level, version, etc. | See note. | The version of EPUB, in this case "3.0", is identified in the version attribute of the root <package> element in the .opf file, which can typically be found in the OEBPS directory when the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into their component files. |
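
The version indicator described in the last row above can be read without unpacking the archive to disk. This is a minimal sketch that follows the note literally and looks for an entry whose name ends in `.opf`; that shortcut is an assumption, and a stricter tool would resolve the Package Document through `META-INF/container.xml` instead. The function name `epub_version` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def epub_version(epub):
    """Return the version attribute of the root <package> element, or None.

    epub_version is a hypothetical helper; it accepts a path or file object.
    """
    with zipfile.ZipFile(epub) as zf:
        # Assumption: the Package Document is the first entry ending in ".opf".
        opf_names = [n for n in zf.namelist() if n.lower().endswith(".opf")]
        if not opf_names:
            return None
        root = ET.fromstring(zf.read(opf_names[0]))

    # Only trust the attribute if the root really is an OPF <package> element.
    if root.tag != f"{{{OPF_NS}}}package":
        return None
    return root.get("version")
```

The returned value is a string, so an EPUB_3 file is recognized by a comparison such as `epub_version(path) == "3.0"`.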

| General |
Among the changes in EPUB_3 from EPUB_2 is the adoption of the ZIP-based container as the only serialization for an EPUB publication. "OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a 'Filesystem Container' abstraction. This change was made in conjunction with new restrictions in Publications 3.0 restricting references to remote resources in EPUB Publications to specific media types and contexts. Taken together, these changes mean that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container, and that EPUB files must in general contain all constituent parts of the Publication, with certain well-defined exceptions." Audio and video content may be stored remotely rather than in the EPUB container. All other content must be in the EPUB container. Conforming EPUB reading systems must support: HTML5, XHTML, CSS, SVG, GIF, JPEG, PNG, MP3, and AAC (low complexity) in an MP4 wrapper. These are EPUB 3 Core Media Types that all Reading Systems must support and publications may include. Publications may include resources of other media types, but for each such resource there must be an alternative resource of a Core Media Type using methods defined in the EPUB specification. Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes of the font file using a SHA-1 digest of the publication's unique identifier, stripped of any whitespace characters. |
|---|---|
| History |
The Open eBook Publication Structure or "OEB", originally produced in 1999, was the precursor to EPUB. Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. OEBPS Version 1.0.1 [OEBPS_1_0], a maintenance release, was brought out in July 2001. OEBPS Version 1.2 [OEBPS_1_2], incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002. EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010. EPUB, Version 3, was approved as an IDPF Recommendation in October 2011. It is substantially different from EPUB, Version 2, both in using only a single form for textual content and in having support for audio, video, and scripted interactivity (through JavaScript). No longer supported are the EPUB_2 formats for text content, one based on the Digital Talking Book [DTB_2005] format and a second form based on XHTML 1.1 compatible with OEBPS_1_2. A single new encoding for textual Content Documents is based on HTML5/XHTML and CSS3, despite the fact that both of these W3C standards are still works in progress. SVG is supported for graphics and it is possible to have an EPUB_3 document whose "pages" consist only of graphics, for example for a graphic novel. Several legacy features are deprecated. Some legacy structures may be included for compatibility of EPUB_3 documents with existing EPUB_2 readers. EPUB_3 readers are expected to render both version 2 and version 3 publications. |
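
The font obfuscation method described in the General note above can be sketched as follows. The note states only that the first 1040 bytes are modified using a SHA-1 digest of the whitespace-stripped unique identifier; this sketch assumes, following the OCF 3.0 algorithm, that the modification is a byte-wise XOR with the 20-byte digest repeated as needed, and that "whitespace" means space, tab, carriage return, and line feed. The function names are hypothetical.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading font bytes that are modified


def obfuscation_key(unique_identifier):
    """Derive the 20-byte key from the publication's unique identifier."""
    # Assumption: whitespace means space, tab, carriage return, line feed.
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def obfuscate_font(data, unique_identifier):
    """XOR the first 1040 bytes of font data with the repeating key.

    Assumption: the modification is an XOR, so the same call also reverses it.
    """
    key = obfuscation_key(unique_identifier)
    head = bytes(
        byte ^ key[i % len(key)]
        for i, byte in enumerate(data[:OBFUSCATED_LENGTH])
    )
    # Bytes beyond the first 1040 are left untouched.
    return head + data[OBFUSCATED_LENGTH:]
```

Because XOR is its own inverse, a reading system that knows the unique identifier recovers the original font by applying the same function to the obfuscated bytes. This is why the technique binds the font to the publication without providing real secrecy.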
