Sustainability of Digital Formats
 Planning for Library of Congress Collections


EPUB, Electronic Publication, Version 2

Format Description Properties

Identification and description

Full name EPUB, Electronic Publication, Version 2
Description

EPUB is a format for electronic publications with reflowable text in a marked-up document structure, with associated images for illustrations, all in a container format. Reflowable text allows the text display to be optimized for the particular display device used by the reader of the EPUB-formatted book; this is in contrast to documents with pre-determined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 2 comprises three separate specifications:

  • OCF, Open Container Format, specifies the mandatory container for an EPUB. OCF is based on the ZIP format.
  • OPS, Open Publication Structure, specifies options for representation of the content of an electronic publication. Content components can be XML-based documents for the flowable text, together with raster or vector images representing illustrations. OPS specifies "preferred vocabularies" for XML-based content components and "core media types" that include raster and vector image formats that readers must support.
  • OPF, Open Packaging Format, defines how components of an OPS publication are related, defining the natural reading order, and identifying alternate representations for content elements. OPF also holds document metadata.

EPUB is the unifying term used to denote a collection of OPS Documents, an OPF Package file, and other files, typically in a variety of media types, including structured text and graphics, packaged in an OCF container that constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB has the extension .epub.
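The role of the OPF package file described above (document metadata plus the natural reading order) can be illustrated with a minimal Python sketch. The sample package document, its file names, and the helper name read_package are hypothetical and hand-written for illustration; the namespace URIs are the ones commonly used for OPF 2.0 and Dublin Core, and real-world packages may differ in detail (for example, by nesting Dublin Core elements inside a dc-metadata wrapper).

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, hand-written package document; not taken from a real book.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Title</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def read_package(opf_xml):
    """Return (Dublin Core metadata, reading order) from an OPF document."""
    root = ET.fromstring(opf_xml)

    # Collect unqualified Dublin Core elements found directly under <metadata>.
    dc = {}
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    for child in metadata:
        if child.tag.startswith(f"{{{DC_NS}}}"):
            name = child.tag.split("}", 1)[1]
            dc.setdefault(name, []).append((child.text or "").strip())

    # The manifest maps item ids to files; the spine lists ids in reading order.
    manifest = {i.get("id"): i.get("href") for i in root.iter(f"{{{OPF_NS}}}item")}
    spine = [manifest[r.get("idref")] for r in root.iter(f"{{{OPF_NS}}}itemref")]
    return dc, spine


if __name__ == "__main__":
    dc, spine = read_package(SAMPLE_OPF)
    print(dc["title"], spine)
```

The sketch resolves each spine entry through the manifest because the spine refers to items by id rather than by file name.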

EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.

Production phase An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users.
Relationship to other formats
    Subtype of OCF (Open Container Format), based on the ZIP archiving format, not yet separately described at this site.
    Contains OPF, Open Packaging Format, not yet separately described at this site.
    May contain OEBPS_1_2, Open eBook Forum Publication Structure 1.2
    May contain DTB_2005, DTB (Digital Talking Book), 2005
    May contain GIF, GIF Graphics Interchange Format, Version 89a
    May contain JFIF, JFIF, JPEG File Interchange Format
    May contain PNG, Portable Network Graphics
    May contain SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1
    May contain Other XML-based content types, including XML "islands" containing XML chunks based on non-preferred schemas or DTDs.
    Has later version EPUB_3, EPUB, Electronic Publication, Version 3

Local use

LC experience or existing holdings No current use at LC.
LC preference As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB_2 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted and the file is not subject to technological protection that inhibits long-term preservation and access.

Sustainability factors

Disclosure Open standard, developed and maintained under the auspices of the International Digital Publishing Forum (IDPF).
    Documentation Specifications for EPUB version 2.0.1 from IDPF.
Adoption

As of March 2011, it is clear that the EPUB specification has filled a consumer need for a reflowable text format that can be used with a variety of hardware and software readers, so that a purchased eBook can be read on more than one reading device owned by the purchaser. Some vendors feel that this portability will appeal to consumers, in contrast to proprietary formats that can only be read on devices supported by the vendor. EPUB is also popular as a distribution format for transcribed books out of copyright. Although the format is also sometimes used for books digitized by scanning and OCR, it is unsatisfactory unless the OCR quality is very high.

As of March 2011, hardware reading devices supporting the EPUB format include: iPad, iPhone, Sony, Nook, Kobo, and Android devices.

Software EPUB readers include: add-ons for browsers; Adobe Digital Editions; ibisReader; and web-based readers, such as Bookworm and BookGlutton. See EPUB eBook Readers from epubbooks.com for a list of reading applications and devices. Some readers can only handle files not protected by digital rights management (DRM).

Publishing software for producing EPUB publications includes: Adobe InDesign and Content Server; Calibre; free tools to convert to EPUB_2 from various formats, including TEI, Microsoft Word, and RTF, at http://code.google.com/p/epub-tools/; and Sigil, an open source ePub/eBook editor.

Despite all the strong signs of adoption, the fact that work started on version 3 of EPUB so soon, and that it will be significantly different, might indicate that some active supporters of EPUB are looking for a different balance between publisher control and user convenience (flexibility, simplicity and cost-effectiveness). Alternatively, it might simply indicate that the technology for electronic publications is in flux. Embedded rich media content, which will be supported in EPUB, Version 3, will provide more of a challenge to developers of free software.

    Licensing and patents No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for Version 2.
Transparency Text content must be in XML or XHTML, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including key and algorithm used. If used, embedded fonts may be obfuscated (see Notes below), which also reduces transparency.
Self-documentation The OPF packaging file can include unqualified Dublin Core (DCMES) metadata. It also provides structural metadata to relate the various content documents through a table of contents and to stipulate a natural reading order.
External dependencies No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor.
Technical protection considerations

In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management terms and procedures. In practice, as of March 2011, most purchased EPUBs are protected by Adobe's ADEPT DRM scheme.
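As a rough illustration of how a preservation workflow might screen for the protection mechanisms described above, the following Python sketch looks for two optional files that the OCF specification reserves in the META-INF directory: encryption.xml (encryption information) and rights.xml (rights management information). The function name protection_hints is hypothetical. The result is only a hint, since encryption.xml can also be present when nothing but embedded fonts are obfuscated.

```python
import io
import zipfile


def protection_hints(epub_bytes):
    """Report whether the optional OCF protection-related files are present."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        names = set(zf.namelist())
    return {
        # Present when any resource is encrypted (or a font is obfuscated).
        "encryption.xml": "META-INF/encryption.xml" in names,
        # Optional container element for digital rights management information.
        "rights.xml": "META-INF/rights.xml" in names,
    }


if __name__ == "__main__":
    # Build a tiny in-memory container purely for demonstration.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/encryption.xml", "<encryption/>")
    print(protection_hints(buf.getvalue()))
```

A file flagged by this check would still need inspection of the files themselves to tell font obfuscation apart from full content encryption.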


Quality and functionality factors

Text
Normal rendering Good support.
Integrity of document structure The logical structure of a document is an essential feature of EPUB.
Integrity of layout and display Publishers may choose to control some aspects of layout through style-sheets. However, flowable text, by definition, will break lines and paginate text differently depending on the reading platform and user choices.
Support for mathematics, formulae, etc. Not supported.
Functionality beyond normal rendering Flowable text can adapt to reading devices with a variety of form factors.

File type signifiers

Tag Value Note
Filename extension epub
Recommended extension for the EPUB publication file in its container format.
Internet Media Type application/epub+zip
From OCF specification.
Magic numbers See note.  From OCF specification:
  • The bytes “PK” will be at the beginning of the file
  • The bytes “mimetype” will be at position 30
  • actual MIME type (i.e., the ASCII string “application/epub+zip”) will begin at position 38
Indicator for profile, level, version, etc. See note.  The version of EPUB, in this case "2.0", is identified in the version attribute of the root <package> element in the .opf file, which is typically found in the OEBPS directory when the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into its component files.
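The signifiers listed above can be checked programmatically. The Python sketch below tests the three byte-level magic numbers and then reads the version attribute of the root package element. As an assumption beyond the text above, it locates the .opf file through the rootfile entry in META-INF/container.xml (the mechanism defined by OCF) rather than assuming an OEBPS directory. The function names and the sample container built by build_sample are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
EPUB_MIME = b"application/epub+zip"

# Hypothetical minimal files used only to build a demonstration container.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""
OPF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid"/>
"""


def has_epub_signature(data):
    """Check the magic numbers: 'PK' at 0, 'mimetype' at 30, MIME type at 38."""
    return (
        data[0:2] == b"PK"
        and data[30:38] == b"mimetype"
        and data[38:38 + len(EPUB_MIME)] == EPUB_MIME
    )


def epub_version(data):
    """Return the version attribute of the root <package> element."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    return package.get("version")


def build_sample():
    """Build an in-memory container with 'mimetype' first and uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", EPUB_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", OPF_XML)
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_sample()
    print(has_epub_signature(sample), epub_version(sample))
```

The fixed offsets only hold when the mimetype file is the first entry in the archive and is stored without compression or an extra field, which is why the sample is written that way.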

Notes

General

Conforming EPUB reading systems must support: XML, XHTML, DTBOOK (including NCX), OEBPS_1_2, CSS, GIF, JPEG, PNG, and SVG. These are OPS Core Media Types that all Reading Systems must support and publications may include. Publications may include resources of other media types, but for each such resource there must be an alternative resource of an OPS Core Media Type using methods defined in this specification or the OPF specification.
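The fallback requirement described above can be sketched as a manifest check in Python. The MIME strings in CORE_MEDIA_TYPES are an assumed mapping of the listed Core Media Types and should be verified against the OPS 2.0.1 specification. The sketch only follows the fallback attribute on manifest items and ignores other mechanisms such as fallback-style or inline XML islands. The sample manifest, its item names, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Assumed MIME types for the OPS Core Media Types; verify against the spec.
CORE_MEDIA_TYPES = {
    "application/xhtml+xml", "application/x-dtbook+xml", "application/x-dtbncx+xml",
    "application/xml", "text/x-oeb1-document", "text/x-oeb1-css", "text/css",
    "image/gif", "image/jpeg", "image/png", "image/svg+xml",
}

# Hypothetical manifest: the PDF has a fallback chain, the widget does not.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="fig-pdf" href="figure.pdf" media-type="application/pdf"
          fallback="fig-png"/>
    <item id="fig-png" href="figure.png" media-type="image/png"/>
    <item id="widget" href="widget.bin" media-type="application/x-demo"/>
  </manifest>
</package>
"""


def items_missing_core_fallback(opf_xml):
    """List manifest item ids whose fallback chain never reaches a core type."""
    root = ET.fromstring(opf_xml)
    items = {i.get("id"): i for i in root.iter(f"{{{OPF_NS}}}item")}
    missing = []
    for item_id, item in items.items():
        seen = set()
        current = item
        # Walk the chain until a core media type is found or the chain ends.
        while current is not None and current.get("media-type") not in CORE_MEDIA_TYPES:
            seen.add(current.get("id"))
            nxt = current.get("fallback")
            # Stop on a missing, dangling, or circular fallback reference.
            current = items.get(nxt) if nxt and nxt not in seen else None
        if current is None:
            missing.append(item_id)
    return missing


if __name__ == "__main__":
    print(items_missing_core_fallback(SAMPLE_OPF))
```

In the sample, the PDF item passes because its fallback is a PNG image, while the item with the made-up media type has no alternative and is reported.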

Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes using a SHA-1 digest of the publication's unique identifier.
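The obfuscation step can be sketched in Python under stated assumptions. The text above says only that the first 1040 bytes are modified using a SHA-1 digest; the sketch assumes the modification is a byte-wise XOR with the 20-byte digest repeated across those bytes, and that whitespace is stripped from the identifier before hashing, which is how the IDPF font obfuscation method is generally described. The function name and the sample identifier are hypothetical.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading bytes that are modified


def obfuscate_font(font_bytes, unique_identifier):
    """XOR the first 1040 bytes with a key derived from the identifier.

    Because XOR is its own inverse, calling the function a second time
    with the same identifier restores the original font data.
    """
    # Assumption: space, tab, CR, and LF are removed before hashing.
    cleaned = unique_identifier.translate(
        {0x20: None, 0x09: None, 0x0D: None, 0x0A: None}
    )
    key = hashlib.sha1(cleaned.encode("utf-8")).digest()  # 20 bytes

    head = bytes(
        b ^ key[i % len(key)]
        for i, b in enumerate(font_bytes[:OBFUSCATED_LENGTH])
    )
    return head + font_bytes[OBFUSCATED_LENGTH:]


if __name__ == "__main__":
    # Stand-in bytes, not a real font file.
    font = bytes(range(256)) * 8
    identifier = "urn:uuid:00000000-0000-0000-0000-000000000000"
    mangled = obfuscate_font(font, identifier)
    print(mangled != font, obfuscate_font(mangled, identifier) == font)
```

Because the key comes from the publication's own identifier, the font is only tied to that publication; this is obfuscation rather than encryption in any strong sense.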

History

The Open eBook Publication Structure or "OEB", originally produced in 1999, was the precursor to OPS.

Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. Version 1.0.1, a maintenance release, was brought out in July 2001. OEBPS Version 1.2, incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002.

EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.

EPUB, Version 3, was approved as an IDPF Recommendation in October 2011. It is substantially different from EPUB, Version 2. Many existing features are dropped, including the use of the Digital Talking Book DTB_2005 as a document content format. The preferred content format for textual content in EPUB_3 is the XHTML serialization of HTML5. New features include support for rich media and MathML. The talking book functionality is replaced by a more general SMIL-based mechanism for media overlays and support for text-to-speech pronunciation hints.



Last Updated: Thursday, 24-Oct-2013 16:08:09 EDT