Sustainability of Digital Formats: Planning for Library of Congress Collections

OOXML Format Family -- ISO/IEC 29500 and ECMA 376

Identification and description

Full name Office Open XML (OOXML, ISO/IEC 29500, ECMA 376) Format Family
Description

This description is an overview of the family of formats defined by ISO/IEC 29500: Information technology -- Document description and processing languages -- Office Open XML File Formats and the corresponding ECMA 376 specifications.

This family of XML-based formats was designed by Microsoft to match the functionality of the proprietary binary formats that had been used as the default formats in Microsoft Office applications (Word, Excel, and PowerPoint) through Office 2003 and be fully compatible with the existing corpus of documents. In December 2005, a Technical Committee of the ECMA standardization organization (TC45) was established to review documentation for the proposed Office Open XML specification submitted by Microsoft. The committee incorporated expertise from large customers using Microsoft Office for enterprise systems, software vendors already developing products to read, write, and transform office documents, and archival institutions with an interest in long-term preservation. The resulting document was approved as ECMA 376 in December 2006 and was then submitted for standardization through ISO/IEC JTC 1 in early 2007. Approval as ISO/IEC 29500 followed in early 2008. ISO/IEC 29500 incorporated many detailed changes and a restructuring of the parts. One important change was to separate the specifications for markup that supports the functional requirements of the three main content categories from the specifications for elements and attributes that support backwards compatibility and legacy formats. Legacy markup was documented in a new part under the title Transitional Migration Features. Files that comply with ISO/IEC 29500 Part 1 are termed "Strict" and files that comply with Part 4 (which is structured as textual modifications to Part 1) are termed "Transitional."

The primary members of the OOXML Format Family are document formats for the key office productivity content categories:

  • DOCX for word-processing files, specified via WordprocessingML. See DOCX/OOXML_2012.
  • XLSX for spreadsheet files, specified via SpreadsheetML. See XLSX/OOXML_2012.
  • PPTX for presentation/slideshow files, specified via PresentationML. See PPTX/OOXML_2012.

Key supporting members of the format family include:

  • A package format based on ZIP that is used as a container by all three primary OOXML content categories and other non-OOXML file formats, such as Visio .vsdx and Open XML Paper Specification .oxps files. See OPC/OOXML_2012.
  • A markup language, Markup Compatibility and Extensibility (MCE), for supporting extensibility of an XML-based format over time, so that files created by future application versions and using new elements and attributes in new namespaces can be structured in a way that earlier application versions can still read the files and take appropriate actions when faced with markup using the new namespaces. Any of the three primary content types can include markup using the MCE namespace and schema. Like OPC, MCE can be applied beyond OOXML. See MCE/OOXML_2012.
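Because every primary OOXML format uses the ZIP-based package described above, generic ZIP tooling is enough to look inside a file. The following Python sketch is illustrative only. It assumes the [Content_Types].xml part and its namespace as defined in the OPC specification (details not given in this description); the function names and the in-memory demo package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the [Content_Types].xml part (taken from the OPC
# specification, not from this description).
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def read_content_types(package):
    """Return (defaults, overrides) declared in an OPC package.

    `package` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    # Default elements map a filename extension to a content type.
    defaults = {
        el.get("Extension"): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Default")
    }
    # Override elements map one specific part name to a content type.
    overrides = {
        el.get("PartName"): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Override")
    }
    return defaults, overrides


def build_demo_package():
    """Build a tiny hypothetical package in memory (not a valid DOCX)."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" ContentType='
        '"application/vnd.openxmlformats-officedocument.'
        'wordprocessingml.document.main+xml"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf
```

Calling read_content_types(build_demo_package()) returns the two mappings for the demo package; pointing the same function at a real .docx, .xlsx, or .pptx file should list the content types of its parts, assuming the file is a well-formed package.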

WordprocessingML, SpreadsheetML, and PresentationML are specified in Parts 1 and 4 of ISO/IEC 29500. Part 1 defines the Strict variant of these three formats. Part 4, written as a supplement to Part 1, specifies additional markup to support compatibility with various legacy applications. The Transitional variant of each of these formats allows markup documented in Part 4 in addition to that documented in Part 1. In addition to the three main markup languages (MLs), the standard defines several supporting markup languages and schemas: DrawingML, which includes markup for graphical elements in any of the three main document types, including embedded images, vector graphics for diagrams, and analytical charts derived from data in a document; Office Math Markup Language (OMML), which supports the display of mathematics in the context of applications that support collaborative editing and tracked changes within mathematical expressions; a schema for bibliographies; and several supporting schemas for document properties (core, extended, and custom). Part 4 includes the specification for VML, a deprecated graphics language superseded by DrawingML.

Several closely related formats are covered completely or in large part by the ISO/IEC 29500 and ECMA 376 specifications. These include documents used as templates for other documents and macro-enabled variants of the primary content types. See Notes below.

Production phase OOXML can be used in any production phase for office documents, as they are created (initial state), exchanged for editing and review (middle-state), and published (final-state).
Relationship to other formats
    Subtype of OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO/IEC 29500-2:2008-2012
    Has subtype DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses WordprocessingML in an OPC/OOXML_2012 package.
    Has subtype DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in DOCX Transitional to support backwards compatibility.
    Has subtype XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses SpreadsheetML in an OPC/OOXML_2012 package.
    Has subtype XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in XLSX Transitional to support backwards compatibility. Permits storage of dates in a profile of the ISO 8601 date and time format.
    Has subtype PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses PresentationML in an OPC/OOXML_2012 package.
    Has subtype PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in PPTX Transitional to support backwards compatibility.
    Has subtype Other application-specific formats fully defined by OOXML as specified in ISO/IEC 29500. These include .dotx, .potx, and .xltx, template files as used and produced by Microsoft Office products since Office 2007. The template formats are not described separately at this web site; they are essentially identical to the corresponding document formats, which are described as subtypes and linked above.
    Has modified version Other application-specific formats closely related to OOXML. These include .docm, .pptm, and .xlsm files, macro-enabled formats as produced by Microsoft Office products since Office 2007. The macro-enabled variant formats are not described at this web site at this time. See Notes below for more on Microsoft Office use of macros.
    May contain MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2015
    Subtype of ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE)
    Defined via XML, Extensible Markup Language (XML)

Local use

LC experience or existing holdings

The Library of Congress was represented on ECMA TC45 during the initial standardization processes and has continued to be active in maintenance of the standard through SC34/WG4.

For other aspects of experience with OOXML, see individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.

LC preference

The Library of Congress Recommended Format Statement (RFS) lists OOXML as an acceptable format for textual works in digital form and electronic serials. See also individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.


Sustainability factors

Disclosure Family of formats based on international open standard. Maintained by ISO/IEC JTC1 SC34/WG4. Originated by Microsoft Corporation and first standardized through ECMA International in 2006. Approval as ISO/IEC 29500 was in 2008.
    Documentation

ISO/IEC 29500, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Parts 1-4. Latest version (2016 as of May 2020) is available from ISO/IEC Publicly Available Standards.

All editions of the OOXML standards as published by ECMA are available from ECMA-376: Office Open XML File Formats. See Notes below for a chronology.

Annex L of Part 1 is a Primer (informative rather than normative) that introduces key features of the constituent markup languages, including WordprocessingML, SpreadsheetML, and PresentationML, relating elements and attributes to intended functionality through examples.

Adoption

Widely adopted. According to a March 2020 post from CIODive, "Microsoft owns nearly 90% of the office suite market, or email and authoring market, as Gartner calls it. Google holds onto just over 10%, but is gaining about 1% market share annually." See individual subtypes for more detail.

In addition to end user applications that support OOXML, some open-source software libraries are available. libOPC provides support for reading and writing OPC packages and also for processing markup using the Markup Compatibility and Extensibility (MCE) mechanisms defined in the standard. In June 2014, Microsoft released its Open XML SDK (first released for use in 2007) as open source. Apache POI - the Java API for Microsoft Documents provides some open source support for OOXML documents, but admits that not all features are handled, with XLSX support being "most developed."

As applications have introduced support for OOXML, some developers have run into interoperability problems. Many of these have been forwarded as defect reports to the working group maintaining ISO/IEC 29500 and resolved through clarifications or small corrections in new editions of the OOXML standard or statements by Microsoft as to variations from the standard in Word, Excel, and PowerPoint in [MS-OI29500] (Office Implementation Information for ISO/IEC 29500 Standards Support) and [MS-OE376] (Office Implementation Information for ECMA-376 Standards Support).

    Licensing and patents

The specification originated from Microsoft Corporation. Current and future versions of ISO/IEC 29500 and ECMA-376 are covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).

Transparency

For transparency of the package containing the constituent parts of an OOXML file, see OPC/OOXML_2012.

Inside the OPC package, the OOXML formats are XML-based and inherently more transparent than their binary predecessors. See individual subtypes for more detail.

Self-documentation

See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats.

External dependencies

None, beyond XML-aware software. See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats.

Technical protection considerations Encryption is not permitted within the OPC package [OPC/OOXML_2012] used to wrap all OOXML documents as of 2014. However, an OPC package may be encrypted, and some applications that use this container format as the basis for a more specific format may use encryption during interchange or DRM for distribution.

Quality and functionality factors

Other
See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012.


File type signifiers and format identifiers

Filename extension: docx, xlsx, pptx
    Note: And extensions used by other formats based on the OOXML specifications. See Notes below.
Internet Media Type:
    application/vnd.openxmlformats-officedocument.wordprocessingml.document
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    application/vnd.openxmlformats-officedocument.presentationml.presentation
    Note: From IANA assignments site.
File signature: See related format.
    Note: See ZIP_PK.
XML namespace declaration:
    http://schemas.openxmlformats.org/.../2006/main
    http://purl.oclc.org/ooxml/.../main
    Note: The first pattern is for the Transitional variants of the three OOXML document types, with the ellipses being replaced by wordprocessingml, spreadsheetml, or presentationml as appropriate. The second pattern applies to the Strict variants of the three document types, with the same substitutions for the ellipses.
Pronom PUID: fmt/189
    Note: See http://www.nationalarchives.gov.uk/PRONOM/fmt/189.
Wikidata Title ID: See note.
    Note: The Wikidata:WikiProject Informatics/File formats resource provides records for a large number of OOXML subtypes. See Wikidata:WikiProject Informatics/File formats/Lists/File formats. See also descriptions of subtypes on this website, for example, DOCX/OOXML_2012, XLSX/OOXML_2012, PPTX/OOXML_2012.
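The two XML namespace patterns listed among the identifiers above can be used to tell the Transitional and Strict variants apart. The following Python sketch is one possible way to do this, not an authoritative test of conformance: it only matches a namespace URI against the two patterns, with the ellipsis replaced by wordprocessingml, spreadsheetml, or presentationml. The function name is hypothetical.

```python
import re

_ML = r"(wordprocessingml|spreadsheetml|presentationml)"

# First pattern: Transitional variants of the three document types.
TRANSITIONAL = re.compile(
    r"^http://schemas\.openxmlformats\.org/" + _ML + r"/2006/main$"
)
# Second pattern: Strict variants of the three document types.
STRICT = re.compile(r"^http://purl\.oclc\.org/ooxml/" + _ML + r"/main$")


def classify_namespace(uri):
    """Return (variant, markup_language) for a main-part namespace URI.

    Returns None when the URI matches neither pattern.
    """
    for variant, pattern in (("Transitional", TRANSITIONAL), ("Strict", STRICT)):
        match = pattern.match(uri)
        if match:
            return variant, match.group(1)
    return None
```

In practice the URI would be taken from the root element of the main document part inside the package; locating that part is outside the scope of this sketch.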

Notes

General

File extensions used: The extensions listed below are commonly used. They are not defined in ISO/IEC 29500, but most of them are specified in the many MIME type registrations made with IANA, using the pattern application/vnd.openxmlformats-officedocument.... Macro-enabled versions of the primary document types and of templates are for files that do not technically comply with the OOXML standard, which does not permit the use of macros. But the files do use the OPC/OOXML_2012 package specification and many parts of the package follow the official standard. See note below for more on macro-enabled files, which are documented by Microsoft.

  • Wordprocessing file extensions: .docx (document); .dotx (template). Not strictly OOXML but very closely related: .docm (macro-enabled document); .dotm (macro-enabled template)
  • Spreadsheet file extensions: .xlsx (spreadsheet); .xltx (template). Not strictly OOXML but very closely related: .xlsm (macro-enabled spreadsheet); .xltm (macro-enabled template); .xlsb (XML-based structure with binary data, for faster loading and smaller files, intended to address early performance problems for very large .xlsx spreadsheets)
  • Presentation file extensions: .pptx (presentation for editing); .ppsx (slideshow for running); .potx (presentation template); .pptm (macro-enabled presentation for editing); .ppsm (macro-enabled slideshow for running); .potm (macro-enabled template)
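The extensions listed above can be captured in a small lookup table. The Python sketch below simply transcribes that list; the table and function names are hypothetical, and .xlsb is left out because it is a binary variant rather than a macro-enabled or template form. An extension is only a hint about a file's format, so this is not a substitute for inspecting the package itself.

```python
import os

# extension -> (content category, role, macro_enabled), per the list above.
OOXML_EXTENSIONS = {
    ".docx": ("wordprocessing", "document", False),
    ".dotx": ("wordprocessing", "template", False),
    ".docm": ("wordprocessing", "document", True),
    ".dotm": ("wordprocessing", "template", True),
    ".xlsx": ("spreadsheet", "spreadsheet", False),
    ".xltx": ("spreadsheet", "template", False),
    ".xlsm": ("spreadsheet", "spreadsheet", True),
    ".xltm": ("spreadsheet", "template", True),
    ".pptx": ("presentation", "presentation", False),
    ".ppsx": ("presentation", "slideshow", False),
    ".potx": ("presentation", "template", False),
    ".pptm": ("presentation", "presentation", True),
    ".ppsm": ("presentation", "slideshow", True),
    ".potm": ("presentation", "template", True),
}


def describe_extension(filename):
    """Return (category, role, macro_enabled) for a filename, or None."""
    extension = os.path.splitext(filename)[1].lower()
    return OOXML_EXTENSIONS.get(extension)
```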

Macro-enabled variants: The macro-enabled variants of the OOXML formats are straightforward extensions of the primary formats without macros. They use OPC/OOXML_2012 as a package, simply adding a few parts that contain the macros and associated data. Macros for Office are written in Visual Basic for Applications (VBA). Note that macros do not work in Office for Mac 2008. In Office for Mac 2011 (the latest version as of 2014), macros are supported. However, not all macros originally written for a Windows version of Office will run on a Mac without modification to take account of differences between the implementations of VBA for the Windows and Mac (OS X) versions of Office, for example those that use ActiveX or other Windows-specific features. The additional parts for macros are defined in three supplementary documents:

  • [MS-OVBA] specifies the format for a file that holds the macro code for a VBA (Visual Basic for Applications) Project. All the macros for a workbook are included in a single file/part.
  • [MS-OFFMACRO], applicable to Office 2007, specifies the structure that connects the VBA Project part to the workbook, including an optional part for VBA Supplemental Data, and two parts that support use of older macros from Excel 4.0 (Macro Sheets and International Macro Sheets).
  • [MS-OFFMACRO2] performs the same function as [MS-OFFMACRO], but is applicable to Office 2010 and Office 2013. The two specifications are almost identical except for the standards they reference and one small new feature added to VBA with version 7.0.

Note that the VBA Editor, which is used to develop macros and is distributed as part of desktop Office applications for Windows (but hidden from users by default), offers the ability to export VBA macro code as a .BAS file (which is a regular text file). Microsoft applications offer a Save As option to drop the macro-related parts from a macro-enabled file and create a regular document. This is commonly done to archive a snapshot of a document or spreadsheet in which macros are used to update the file, perhaps based on external data. Apache OpenOffice and LibreOffice offer options as to how to handle macros on import of .xlsm files. Neither application can run all VBA macros as-is, although, according to Using Microsoft Office and LibreOffice in late 2014, "recent versions of LibreOffice can run some Visual Basic scripts" if the feature is enabled.
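Since macro-enabled files differ from the primary formats mainly by a few added parts, a quick check for those parts can flag files that carry macros. The Python sketch below is a heuristic under a stated assumption: it looks for a part named vbaProject.bin, the name Microsoft Office conventionally gives the VBA Project part. That name is not stated in this description, and a more careful check would follow the package's content types and relationships instead. The function names and demo file are hypothetical.

```python
import io
import zipfile


def find_vba_parts(package):
    """Return names of ZIP members that look like VBA Project parts.

    Assumption: the part is named vbaProject.bin (a convention of
    Microsoft Office, not a rule taken from this description).
    """
    with zipfile.ZipFile(package) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def build_demo_macro_workbook():
    """Build a tiny hypothetical package in memory (not a valid XLSM)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
        zf.writestr("xl/vbaProject.bin", b"\x00")
    buf.seek(0)
    return buf
```

An empty list from find_vba_parts suggests, but does not prove, that a file contains no VBA macros.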

History

The first XML-based formats for Word and Excel were included in the release of Office 2003. These flat XML files were partially documented on the Microsoft Developer Network (MSDN) site. They were precursors to OOXML, with both similarities and significant differences.

The original OOXML specification was published as ECMA-376 in 2006. The primary difference between that version and the version published as ISO/IEC 29500:2008 was the split between the Strict variants of DOCX, XLSX, and PPTX (as specified in Part 1) and the Transitional variants (as defined in Part 4 in conjunction with Part 1). All versions since ISO/IEC 29500:2008 specify essentially the same format. The editions published by ISO/IEC in 2011 and 2012 consisted primarily of clarifications and corrections. In particular, modifications to Part 4 (Transitional Migration Features) have been intended to ensure that the specification corresponds to the corpus of existing documents and that interoperability between existing applications was improved rather than disrupted. See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012 for more detail for the three primary OOXML document types. The chronology of editions specifying the OOXML family of formats is:

  • ECMA-376, 1st edition (December 2006)
  • ISO/IEC 29500:2008
  • ECMA-376, 2nd edition (December 2008) [specification identical to ISO/IEC 29500:2008]
  • ISO/IEC 29500:2011
  • ECMA-376, 3rd edition (June 2011) [specification identical to ISO/IEC 29500:2011]
  • ISO/IEC 29500:2012
  • ECMA-376, 4th edition (December 2012) [specification identical to ISO/IEC 29500:2012]
  • ECMA-376, Part 1, 4th edition (December 2012) [specification identical to ISO/IEC 29500-1:2012]; ECMA-376, Part 4, 4th edition (December 2012) [specification identical to ISO/IEC 29500-4:2012]
  • ISO/IEC 29500-3:2015
  • ECMA-376, Part 3, 5th edition (2015) [specification identical to ISO/IEC 29500-3:2015]
  • ISO/IEC 29500-1:2016; ISO/IEC 29500-4:2016
  • ECMA-376, Part 1, 5th edition (October 2016) [specification identical to ISO/IEC 29500-1:2016]; ECMA-376, Part 4, 5th edition (October 2016) [specification identical to ISO/IEC 29500-4:2016]
  • ISO/IEC 29500-2:2021

A new edition of Part 3 of the specification, for Markup Compatibility and Extensibility, was published in early 2015. The intent of the update was to clarify the text and to emphasize the applicability of MCE beyond OOXML to support interoperability. The new edition does not introduce new features but does remove some flexibility that had not been exploited in practice and is deemed unnecessary. Most importantly, it makes the process for handling MCE on file import much clearer.

Another chronology of relevance to digital archivists is the support for OOXML formats in different versions of the Office software. See Office File Formats Overview from 2016 for a Microsoft summary of the chronology. Files created by the Microsoft Office applications have a /docProps/app.xml part that contains properties for the document as a whole, including <Application> and <AppVersion>. Values for AppVersion are numeric, representing internal version numbers used by Microsoft during development. The integral parts of the AppVersion values in files created by versions of Microsoft Office are: 12 = Windows Office 2007 or Office for Mac 2008; 14 = Windows Office 2010 or Office for Mac 2011; 15 = Windows Office 2013 or Office for Mac 2016; 16 = Windows Office 2016. Support for the Transitional and Strict variants by Office version is as follows:

  • Office 2007: Read, Transitional only. Write (as default), Transitional only. Based on ECMA-376, Edition 1. Also introduced read/write support for ODF 1.1.
  • Office 2010: Read, Transitional and Strict. Write (as default), Transitional only. Based on ISO/IEC 29500:2008. Also read/write support for ODF 1.1.
  • Office for Mac 2011: Read, Transitional only. Write (as default), Transitional only.
  • Office 2013: Read, Transitional and Strict. Write, Transitional (as default), Strict (as option). Based on ISO/IEC 29500:2011. Introduced read/write support for ODF 1.2.
  • Office 2016: Read, Transitional and Strict. Write, Transitional (as default), Strict (as option). Based on ISO/IEC 29500:2012.
  • Office for Mac 2016: Read, Transitional and Strict. Write, Transitional. Based on ISO/IEC 29500:2012. As of early 2017, there was no desktop version of Office for Mac that could write Strict.

Note that although versions of Office dated 2016 were released for both Windows and the Mac OS, they do not declare the same AppVersion value. The Windows and Mac versions of Office do not have identical codebases. Tests of the most recent iPad version of Excel in December 2014 and February 2017 revealed "Microsoft Macintosh Excel" as the value for Application and "15.0300" as the value for AppVersion. Thus, it appears that the iPad versions of Office apps are related to Office for Mac. The compilers of this resource have not had the opportunity to check AppVersion values in files created using Office 365 or Android apps. Comments welcome.
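The Application and AppVersion properties discussed in this section can be read with a few lines of code. The Python sketch below is illustrative, under two assumptions: that the properties part is stored in the ZIP container as docProps/app.xml, and that matching elements by local name (ignoring the XML namespace) is acceptable for a quick survey. The function names and the demo package are hypothetical, and the release labels merely repeat the mapping given in this section, so they are approximate (the iPad test above also reported 15).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Integral part of AppVersion -> Office releases, as listed in this section.
APP_VERSIONS = {
    12: "Windows Office 2007 or Office for Mac 2008",
    14: "Windows Office 2010 or Office for Mac 2011",
    15: "Windows Office 2013 or Office for Mac 2016",
    16: "Windows Office 2016",
}


def read_app_properties(package):
    """Return a dict with the Application and AppVersion values found."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("docProps/app.xml"))
    props = {}
    for element in root.iter():
        # Strip any "{namespace}" prefix so both variants are handled.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("Application", "AppVersion"):
            props[local_name] = (element.text or "").strip()
    return props


def describe_app_version(app_version):
    """Map an AppVersion string such as "15.0300" to a release label."""
    try:
        major = int(app_version.split(".", 1)[0])
    except ValueError:
        return None
    return APP_VERSIONS.get(major)


def build_demo_workbook():
    """Build a tiny hypothetical package in memory (not a valid XLSX)."""
    app_xml = (
        '<Properties xmlns="http://schemas.openxmlformats.org/'
        'officeDocument/2006/extended-properties">'
        "<Application>Microsoft Macintosh Excel</Application>"
        "<AppVersion>15.0300</AppVersion>"
        "</Properties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/app.xml", app_xml)
    buf.seek(0)
    return buf
```

Files written by applications other than Microsoft Office may omit these properties or use different values, so a missing or unrecognized AppVersion should not be treated as an error.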

Starting with Office 2016, Microsoft has strongly encouraged subscription-based access and frequent updates. Format support may be adjusted in updates. For example, Office for Mac 2016 (first released in mid-2015) introduced support for export to ODF in the June 2016 update.

A new edition of ISO/IEC 29500-2 (Part 2: Open Packaging Conventions) was published in 2021. This edition preserves all functionality of the previous edition and adds no new functionality, but has been extensively reorganized and brought into line with ISO practices and the other specifications in the OOXML family. Where appropriate, it now uses undated or more recent versions of standards as normative references. Particular areas that have been clarified relate to the use of non-ASCII characters in names of parts in a package and the application of digital signatures.


Last Updated: 05/06/2022