Sustainability of Digital Formats
 Planning for Library of Congress Collections


OOXML Format Family -- ISO/IEC 29500 and ECMA 376


Identification and description

Full name Office Open XML (OOXML, ISO/IEC 29500, ECMA 376) Format Family
Description

This description is an overview of the family of formats defined by ISO/IEC 29500: Information technology -- Document description and processing languages -- Office Open XML File Formats and the corresponding ECMA 376 specifications.

This family of XML-based formats was designed by Microsoft to match the functionality of the proprietary binary formats that had been used as the default formats in Microsoft Office applications (Word, Excel, and PowerPoint) through Office 2003, and to be fully compatible with the existing corpus of documents.

In December 2005, a Technical Committee of the ECMA standardization organization (TC45) was established to review documentation for the proposed Office Open XML specification submitted by Microsoft. The committee incorporated expertise from large customers using Microsoft Office for enterprise systems, software vendors already developing products to read, write, and transform office documents, and archival institutions with an interest in long-term preservation. The resulting document was approved as ECMA 376 in December 2006 and was then submitted for standardization through ISO/IEC JTC 1 in early 2007. Approval as ISO/IEC 29500 followed in early 2008.

ISO/IEC 29500 incorporated many detailed changes and a restructuring of the parts. One important change was to separate the specifications for markup that supports the functional requirements of the three main content categories from the specifications for elements and attributes that support backwards compatibility and legacy formats. Legacy markup was documented in a new part under the title Transitional Migration Features. Files that comply with ISO/IEC 29500 Part 1 are termed "Strict" and files that comply with Part 4 (which is structured as textual modifications to Part 1) are termed "Transitional."

The primary members of the OOXML Format Family are document formats for the key office productivity content categories:

  • DOCX for word-processing files, specified via WordprocessingML. See DOCX/OOXML_2012.
  • XLSX for spreadsheet files, specified via SpreadsheetML. See XLSX/OOXML_2012.
  • PPTX for presentation/slideshow files, specified via PresentationML. See PPTX/OOXML_2012.

Key supporting members of the format family include:

  • A package format based on ZIP that is used as a container by all three primary OOXML content categories and other non-OOXML file formats, such as Visio .vsdx and Open XML Paper Specification .oxps files. See OPC/OOXML_2012.
  • A markup language, Markup Compatibility and Extensibility (MCE), for supporting extensibility of an XML-based format over time, so that files created by future application versions and using new elements and attributes in new namespaces can be structured in a way that earlier application versions can still read the files and take appropriate actions when faced with markup using the new namespaces. Any of the three primary content types can include markup using the MCE namespace and schema. Like OPC, MCE can be applied beyond OOXML. See MCE/OOXML_2012.

WordprocessingML, SpreadsheetML, and PresentationML are specified in Parts 1 and 4 of ISO/IEC 29500. Part 1 defines the Strict variant of these three formats. Part 4, written as a supplement to Part 1, specifies additional markup to support compatibility with various legacy applications. The Transitional variant of each of these formats allows markup documented in Part 4 in addition to that documented in Part 1.

In addition to the three main markup languages (MLs), the standard defines several supporting markup languages and schemas:

  • DrawingML, which includes markup for graphical elements in any of the three main document types, including embedded images, vector graphics for diagrams, and analytical charts derived from data in a document.
  • Office Math Markup Language (OMML), which supports the display of mathematics in the context of applications that support collaborative editing and tracked changes within mathematical expressions.
  • A schema for bibliographies.
  • Several supporting schemas for document properties (core, extended, and custom).

Part 4 includes the specification for VML, a deprecated graphics language superseded by DrawingML.

Several closely related formats are covered completely or in large part by the ISO/IEC 29500 and ECMA 376 specifications. These include documents used as templates for other documents and macro-enabled variants of the primary content types. See Notes below.

Production phase OOXML can be used in any production phase for office documents, as they are created (initial state), exchanged for editing and review (middle-state), and published (final-state).
Relationship to other formats
    Subtype of OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO/IEC 29500-2:2008-2012
    Has subtype DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses WordprocessingML in an OPC/OOXML_2012 package.
    Has subtype DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in DOCX Transitional to support backwards compatibility.
    Has subtype XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses SpreadsheetML in an OPC/OOXML_2012 package.
    Has subtype XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in XLSX Transitional to support backwards compatibility. Permits storage of dates in profile of ISO 8601 date and time format.
    Has subtype PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses PresentationML in an OPC/OOXML_2012 package.
    Has subtype PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in PPTX Transitional to support backwards compatibility.
    Has subtype Other application-specific formats fully defined by OOXML as specified in ISO/IEC 29500. These include .dotx, .potx, and .xltx, template files as used and produced by Microsoft Office products since Office 2007. The template formats are not described separately at this web site; they are essentially identical to the corresponding document formats, which are described as subtypes and linked above.
    Has modified version Other application-specific formats closely related to OOXML. These include .docm, .pptm, and .xlsm files, macro-enabled formats as produced by Microsoft Office products since Office 2007. The macro-enabled variant formats are not described at this web site at this time. See Notes below for more on Microsoft Office use of macros.
    May contain MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012
    Subtype of ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE)
    Defined via XML, Extensible Markup Language (XML)

Local use

LC experience or existing holdings

LC was represented on ECMA TC45 during the initial standardization processes and continues to be active in the maintenance of the standard through SC34/WG4.

For other aspects of LC experience with OOXML, see individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.

LC preference

See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.


Sustainability factors

Disclosure Family of formats based on international open standard. Maintained by ISO/IEC JTC1 SC34/WG4. Originated by Microsoft Corporation and first standardized through ECMA International in 2006. Approval as ISO/IEC 29500 was in 2008.
    Documentation

ISO/IEC 29500, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Parts 1-4. Latest version (2012 as of December 2014) is available from ISO/IEC Publicly Available Standards.

All editions of the OOXML standards as published by ECMA are available from ECMA-376: Office Open XML File Formats. See Notes below for a chronology.

Annex L of Part 1 is a Primer (informative rather than normative) that introduces key features of the constituent markup languages, including WordprocessingML, SpreadsheetML, and PresentationML, relating elements and attributes to intended functionality through examples.

Adoption Widely adopted. See individual subtypes for more detail.
    Licensing and patents

The specification originated from Microsoft Corporation. Current and future versions of ISO/IEC 29500 and ECMA-376 are covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).

Transparency

For transparency of the package containing the constituent parts of an OOXML file, see OPC/OOXML_2012.

Inside the OPC package, the OOXML formats are XML-based and inherently more transparent than their binary predecessors. See individual subtypes for more detail.
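The transparency described here can be illustrated with general-purpose tools. The following is a minimal sketch, assuming only Python's standard library, that opens a package as a ZIP archive and parses each XML part. The function names and the tiny in-memory demo package are hypothetical and for illustration only; the demo is not a valid DOCX file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_parts(package):
    """Return {part name: root element tag} for each XML part in a package.

    `package` may be a path or a binary file-like object.
    """
    roots = {}
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            # Assumption: parts ending in .xml or .rels hold XML; other
            # parts (images, for example) are skipped.
            if name.endswith((".xml", ".rels")):
                roots[name] = ET.fromstring(zf.read(name)).tag
    return roots


def make_demo_package():
    """Build a tiny stand-in package in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "[Content_Types].xml",
            "<Types xmlns='http://schemas.openxmlformats.org/"
            "package/2006/content-types'/>",
        )
        zf.writestr(
            "word/document.xml",
            "<w:document xmlns:w='http://schemas.openxmlformats.org/"
            "wordprocessingml/2006/main'/>",
        )
        zf.writestr("word/media/image1.png", b"\x89PNG")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for part, tag in list_xml_parts(make_demo_package()).items():
        print(part, tag)
```

No Office software is involved: a ZIP reader and an XML parser are enough to inspect the structure of a package.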

Self-documentation

See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats.

External dependencies

None, beyond XML-aware software. See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats.

Technical protection considerations As of 2014, encryption is not permitted within the OPC package [OPC/OOXML_2012] used to wrap all OOXML documents. However, an OPC package as a whole may be encrypted, and some applications that use this container format as the basis for a more specific format may use encryption during interchange or DRM for distribution.

Quality and functionality factors

Other
See individual subtypes

See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.


File type signifiers

Filename extension
    Value: docx, xlsx, pptx
    Note: And extensions used by other formats based on the OOXML specifications. See Notes below.
Internet Media Type
    Value: application/vnd.openxmlformats-officedocument.wordprocessingml.document
    Value: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    Value: application/vnd.openxmlformats-officedocument.presentationml.presentation
    Note: From IANA assignments site.
File signature
    Value: See related format.
    Note: See ZIP_PK.
XML namespace declaration
    Value: http://schemas.openxmlformats.org/.../2006/main
    Value: http://purl.oclc.org/ooxml/.../main
    Note: The first pattern is for the Transitional variants of the three OOXML document types, with the ellipses being replaced by wordprocessingml, spreadsheetml, or presentationml as appropriate. The second pattern applies to the Strict variants of the three document types, with the same substitutions for the ellipses.
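
The two namespace patterns suggest a simple way to distinguish Strict from Transitional files. The sketch below, using only Python's standard library, reads the root element of the main part and compares its namespace with the two prefixes. The part names word/document.xml, xl/workbook.xml, and ppt/presentation.xml are an assumption based on what common applications write; the authoritative location of the main part is given by the package relationships. The function names and demo packages are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TRANSITIONAL_PREFIX = "http://schemas.openxmlformats.org/"
STRICT_PREFIX = "http://purl.oclc.org/ooxml/"
# Assumed conventional names for the main part of each document type.
MAIN_PARTS = ("word/document.xml", "xl/workbook.xml", "ppt/presentation.xml")


def ooxml_variant(package):
    """Return 'Strict', 'Transitional', or 'unknown' for a package."""
    with zipfile.ZipFile(package) as zf:
        names = set(zf.namelist())
        for part in MAIN_PARTS:
            if part not in names:
                continue
            tag = ET.fromstring(zf.read(part)).tag
            # ElementTree writes a qualified tag as "{namespace}localname".
            ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
            if ns.startswith(STRICT_PREFIX):
                return "Strict"
            if ns.startswith(TRANSITIONAL_PREFIX):
                return "Transitional"
    return "unknown"


def make_package(part_name, root_name, namespace):
    """Build a one-part stand-in package in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(part_name, f"<{root_name} xmlns='{namespace}'/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    strict = make_package(
        "xl/workbook.xml", "workbook",
        "http://purl.oclc.org/ooxml/spreadsheetml/main",
    )
    print(ooxml_variant(strict))
```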

Notes

General

File extensions used: The extensions listed below are commonly used. They are not defined in ISO/IEC 29500, but most of them are specified in the many MIME type registrations made with IANA, using the pattern application/vnd.openxmlformats-officedocument.... The macro-enabled versions of the primary document types and of templates do not technically comply with the OOXML standard, which does not permit the use of macros. However, such files do use the OPC/OOXML_2012 package specification, and many parts of the package follow the official standard. See note below for more on macro-enabled files, which are documented by Microsoft.

  • Wordprocessing file extensions: .docx (document); .dotx (template). Not strictly OOXML but very closely related: .docm (macro-enabled document); .dotm (macro-enabled template)
  • Spreadsheet file extensions: .xlsx (spreadsheet); .xltx (template). Not strictly OOXML but very closely related: .xlsm (macro-enabled spreadsheet); .xltm (macro-enabled template); .xlsb (XML-based structure with binary data, for faster loading and smaller files, intended to address early performance problems for very large .xlsx spreadsheets)
  • Presentation file extensions: .pptx (presentation for editing); .ppsx (slideshow for running); .potx (presentation template); .pptm (macro-enabled presentation for editing); .ppsm (macro-enabled slideshow for running); .potm (macro-enabled template)

Macro-enabled variants: The macro-enabled variants of the OOXML formats are straightforward extensions of the primary formats without macros. They use OPC/OOXML_2012 as a package, simply adding a few parts that contain the macros and associated data. Macros for Office are written in Visual Basic for Applications (VBA). Note that macros do not work in Office for Mac 2008. In Office for Mac 2011 (the latest version as of 2014), macros are supported. However, not all macros originally written for a Windows version of Office will run on a Mac without modification to take account of differences between the implementations of VBA for the Windows and Mac (OS X) versions of Office, for example those that use ActiveX or other Windows-specific features. The additional parts for macros are defined in three supplementary documents:

  • [MS-OVBA] specifies the format for a file that holds the macro code for a VBA (Visual Basic for Applications) Project. All the macros for a workbook are included in a single file/part.
  • [MS-OFFMACRO], applicable to Office 2007, specifies the structure that connects the VBA Project part to the workbook, including an optional part for VBA Supplemental Data, and two parts that support use of older macros from Excel 4.0 (Macro Sheets and International Macro Sheets).
  • [MS-OFFMACRO2] performs the same function as [MS-OFFMACRO], but is applicable to Office 2010 and Office 2013. The two specifications are almost identical except for the standards they reference and one small new feature added to VBA with version 7.0.

Note that the VBA Editor, which is used to develop macros and is distributed as part of desktop Office applications for Windows (but hidden from users by default), offers the ability to export VBA macro code as a .BAS file (which is a regular text file). Microsoft applications offer a Save As option to drop the macro-related parts from a macro-enabled file and create a regular document. This is commonly done to archive a snapshot of a document or spreadsheet in which macros are used to update the file, perhaps based on external data. Apache OpenOffice and LibreOffice offer options as to how to handle macros on import of .xlsm files. Neither application can run all VBA macros as-is, although, according to Using Microsoft Office and LibreOffice in late 2014, "recent versions of LibreOffice can run some Visual Basic scripts" if the feature is enabled.
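
Because macro-enabled variants differ from the primary formats only by a few added parts, a package can be screened for macros without opening it in Office. The sketch below assumes that the VBA Project part is stored under the conventional name vbaProject.bin, as written by Microsoft Office applications; that name is an assumption here and is not taken from ISO/IEC 29500. The function names and demo packages are hypothetical.

```python
import io
import zipfile

# Assumed conventional file name of the VBA Project part.
VBA_PART_NAME = "vbaproject.bin"


def has_vba_project(package):
    """Return True if any part in the package is named vbaProject.bin."""
    with zipfile.ZipFile(package) as zf:
        return any(
            name.rsplit("/", 1)[-1].lower() == VBA_PART_NAME
            for name in zf.namelist()
        )


def make_package(with_macros):
    """Build a stand-in workbook package in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
        if with_macros:
            # Placeholder bytes, not a real VBA project.
            zf.writestr("xl/vbaProject.bin", b"\x00")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(has_vba_project(make_package(with_macros=True)))
    print(has_vba_project(make_package(with_macros=False)))
```

A check of this kind only reports that a macro part is present; it says nothing about what the macros do.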

History

The first XML-based formats for Word and Excel were included in the release of Office 2003; these were flat XML files. These are partially documented on the Microsoft Developer Network (MSDN) site. These formats were precursors to OOXML, with both similarities and significant differences.

The original OOXML specification was published as ECMA-376 in 2006. The primary difference between that version and the version published as ISO/IEC 29500:2008 was the split between the Strict variants of DOCX, XLSX, and PPTX (as specified in Part 1) and the Transitional variants (as defined in Part 4 in conjunction with Part 1). All versions since ISO/IEC 29500:2008 specify essentially the same format. The editions published by ISO/IEC in 2011 and 2012 consisted primarily of clarifications and corrections. In particular, modifications to Part 4 (Transitional Migration Features) have been intended to ensure that the specification corresponds to the corpus of existing documents and that interoperability between existing applications is improved rather than disrupted. See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012, for more detail on the three primary OOXML document types. The chronology of editions specifying the OOXML family of formats is:

  • ECMA-376, 1st edition (December 2006)
  • ISO/IEC 29500:2008
  • ECMA-376, 2nd edition (December 2008) [specification identical to ISO/IEC 29500:2008]
  • ISO/IEC 29500:2011
  • ECMA-376, 3rd edition (June 2011) [specification identical to ISO/IEC 29500:2011]
  • ISO/IEC 29500:2012
  • ECMA-376, 4th edition (December 2012) [specification identical to ISO/IEC 29500:2012]

A new edition of Part 3 of the specification, for Markup Compatibility and Extensibility, has been prepared. It was approved in September 2014 and will be published in early 2015. The intent of the update was to clarify the text and to emphasize the applicability of MCE beyond OOXML to support interoperability. The new edition does not introduce new features but does remove some flexibility that had not been exploited in practice and is deemed unnecessary. Most importantly, it makes the process for handling MCE on file import much clearer.

Another chronology of relevance to digital archivists is the support for OOXML formats in different versions of the Office software. Files created by the Microsoft Office applications have a /docProps/app.xml part that contains properties for the document as a whole, including <Application> and <AppVersion>. Values for AppVersion are numeric, representing internal version numbers used by Microsoft during development: 12 = Windows Office 2007 or Office for Mac 2008; 14 = Windows Office 2010 or Mac Office 2011; 15 = Office 2013.

  • Office 2007: Read, Transitional only. Write (as default), Transitional only. Based on ECMA-376, Edition 1. Also introduced read/write support for ODF 1.1.
  • Office 2010: Read, Transitional and Strict. Write (as default), Transitional only. Based on ISO/IEC 29500:2008. Also read/write support for ODF 1.1.
  • Office for Mac 2011: Read, Transitional only.  Write (as default), Transitional only. As of December 2014, there is not a desktop Office for Mac that can read or write Strict.
  • Office 2013: Read, Transitional and Strict.  Write, Transitional (as default), Strict (as option).  Based on ISO/IEC 29500:2011. Introduced read/write support for ODF 1.2.

A test of the iPad version of Excel in December 2014 revealed "Microsoft Macintosh Excel" as the value for Application and "15.0300" as the value for AppVersion. The file used the Transitional namespace. Comments welcome.
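
The Application and AppVersion values discussed above can be read directly from the /docProps/app.xml part. The following is a minimal sketch using only Python's standard library. It matches elements by local name so that it does not depend on which namespace a file uses. The lookup table repeats only the version numbers listed in this note, so it cannot distinguish platforms or identify later releases. The function names and the demo package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Major version numbers as listed in the note above.
OFFICE_BY_MAJOR = {
    "12": "Windows Office 2007 or Office for Mac 2008",
    "14": "Windows Office 2010 or Mac Office 2011",
    "15": "Office 2013",
}


def read_app_properties(package):
    """Return Application and AppVersion from docProps/app.xml, if present."""
    with zipfile.ZipFile(package) as zf:
        try:
            # ZIP item names omit the leading slash of the part name.
            data = zf.read("docProps/app.xml")
        except KeyError:
            return {}
    props = {}
    for child in ET.fromstring(data):
        local = child.tag.rsplit("}", 1)[-1]  # drop "{namespace}" if any
        if local in ("Application", "AppVersion"):
            props[local] = (child.text or "").strip()
    return props


def office_release(app_version):
    """Map an AppVersion string such as '15.0300' to a release label."""
    return OFFICE_BY_MAJOR.get(app_version.split(".", 1)[0], "unknown")


def make_demo_package():
    """Build a stand-in package using the values reported in the test above."""
    ns = ("http://schemas.openxmlformats.org/officeDocument/2006/"
          "extended-properties")
    xml = (f"<Properties xmlns='{ns}'>"
           "<Application>Microsoft Macintosh Excel</Application>"
           "<AppVersion>15.0300</AppVersion></Properties>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/app.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    props = read_app_properties(make_demo_package())
    print(props, office_release(props["AppVersion"]))
```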



Last Updated: Monday, 16-Nov-2015 10:58:18 EST