Sustainability of Digital Formats: Planning for Library of Congress Collections


Microsoft Compound File Binary File Format, Version 4


Identification and description

Full name Microsoft Compound File Binary File Format, Version 4
Description

The Microsoft Compound File Binary (CFB) file format is used for storing storage objects and stream objects in a hierarchical structure within a single file.  See CFB_3 for a full description of the file system structure.

There are two active versions of CFB, version 3 and version 4. One major distinction between the versions is the sector size: 512 bytes for version 3 and 4096 bytes for version 4.

The minimum size of a compound file is three sectors: one header, one FAT sector and one directory sector.

  • Compound files with 4096-byte sectors support 64-bit sizes for the file and for user-defined data streams, up to slightly less than 16 terabytes.
  • The maximum number of directory entries (storage objects and stream objects) is roughly 4 billion. This corresponds to a maximum directory sector chain length of slightly less than 512 GB for a 4096-byte sector compound file.
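The size figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the format description: it assumes a 128-byte directory entry and 32-bit sector numbers and entry identifiers, both taken from the CFB specification rather than from this page. The real limits are slightly lower than the ceilings computed here, presumably because a few 32-bit values are reserved.

```python
SECTOR_SIZE_V4 = 4096   # bytes per sector in CFB major version 4
DIR_ENTRY_SIZE = 128    # assumption: bytes per directory entry (CFB specification)
ADDRESSABLE = 2 ** 32   # assumption: ceiling implied by 32-bit sector numbers and entry IDs

# Minimum file: one header sector, one FAT sector, one directory sector.
min_file_bytes = 3 * SECTOR_SIZE_V4

# Ceiling on file size if every 32-bit sector number were usable.
file_ceiling_bytes = ADDRESSABLE * SECTOR_SIZE_V4

# Ceiling on the directory sector chain if every 32-bit entry ID were usable.
dir_chain_ceiling_bytes = ADDRESSABLE * DIR_ENTRY_SIZE

# Directory entries that fit in a single 4096-byte sector.
entries_per_dir_sector = SECTOR_SIZE_V4 // DIR_ENTRY_SIZE

print(min_file_bytes)                       # 12288
print(file_ceiling_bytes // 2 ** 40)        # 16  (binary terabytes)
print(dir_chain_ceiling_bytes // 2 ** 30)   # 512 (binary gigabytes)
print(entries_per_dir_sector)               # 32
```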

A file in the CFB format begins with a 512-byte header. In a compound file with 4096-byte sectors, the header occupies the first 512 bytes of the first sector and the remainder of that sector is padded with zeros. Values given below are as they occur in the physical file, for example when viewed using a hex dump utility.

  • 8-byte Hex value D0CF11E0A1B11AE1, the header signature for the CFB format. Gary Kessler notes that the beginning of this string looks like "DOCFILE".
  • 16 bytes of zeros
  • 2-byte Hex value 3E00 indicating CFB minor version 3E. The specification states that the minor version should always be indicated as 3e.
  • 2-byte Hex value 0400 indicating CFB major version 4.
  • 2-byte Hex value FEFF (the byte order mark 0xFFFE as stored on disk) indicating little-endian byte order for all integer values. This byte order applies to all CFB files.
  • 2-byte Hex value 0C00 indicating a sector shift of 12, that is, the sector size of 4096 bytes (2 to the power 12) used for major version 4.
  • 480 bytes for remainder of the 512-byte header
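The header layout listed above can be decoded in a few lines. The following is a minimal sketch in Python, assuming only the first 32 bytes described on this page. The function name `parse_cfb_header_prefix` and the dictionary keys are hypothetical, chosen for illustration, and the remaining 480 header bytes are not interpreted.

```python
import struct

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header_prefix(data: bytes) -> dict:
    """Decode the first 32 bytes of a CFB header (hypothetical helper)."""
    if len(data) < 32:
        raise ValueError("need at least the first 32 bytes of the file")
    # "<" selects little-endian with no padding: an 8-byte signature,
    # a 16-byte zero-filled field, then four unsigned 16-bit integers.
    magic, _zeros, minor, major, byte_order, sector_shift = struct.unpack_from(
        "<8s16sHHHH", data
    )
    if magic != CFB_MAGIC:
        raise ValueError("header signature does not match the CFB format")
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        "sector_shift": sector_shift,
        "sector_size": 1 << sector_shift,  # 2 to the power of the sector shift
    }


# A synthetic version 4 header built from the values listed above.
sample = CFB_MAGIC + bytes(16) + bytes.fromhex("3E000400FEFF0C00") + bytes(480)
fields = parse_cfb_header_prefix(sample)
print(fields["major_version"], hex(fields["byte_order"]), fields["sector_size"])
# prints: 4 0xfffe 4096
```

Reading the integers as little-endian is what turns the on-disk bytes FE FF into the value 0xFFFE and 0C 00 into a sector shift of 12.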
Relationship to other formats
    Has earlier version CFB_3, Microsoft Compound File Binary File Format, Version 3
    Affinity to AAF_1_1, Advanced Authoring Format (AAF) Object, Version 1.1.

Early versions of the AAF format detailed use of the structured storage systems outlined in CFB to store the objects on disk.

    Affinity to WPD, WordPerfect Document Family. According to WPD from Archiveteam.org, WordPerfect version 7 can also store documents, known as "WordPerfect Compound Files", using the Microsoft OLE Compound File format with the same WPD extensions. OLE embedded objects are stored inside a storage called PerfectOffice_OBJECT, whereas the document itself is stored as the stream PerfectOffice_MAIN. In principle, the format of this internal document part is the same as in previous versions, except that the minor version number is raised from 1 to 2.

Local use

LC experience or existing holdings See various subtypes for holdings information.
LC preference See the Recommended Formats Statement for the Library of Congress format preferences.

Sustainability factors

Disclosure See CFB_3
    Documentation See CFB_3
Adoption See CFB_3
    Licensing and patents See CFB_3
Transparency See CFB_3
Self-documentation See CFB_3
External dependencies See CFB_3
Technical protection considerations See CFB_3

Quality and functionality factors


File type signifiers and format identifiers

Tag | Value | Note
Filename extension | See related format. | See CFB_3
Magic numbers | Hex: D0 CF 11 E0 A1 B1 1A E1 | Documented in the CFB specification, in 2.2 Compound File Header. Applies to all files in CFB format; see GCK'S File Signatures Table entry for Compound Binary File format (aka OLECF).
File signature | Hex: 3E 00 04 00 FE FF 0C 00 | At byte offset 24 from beginning of file. Documented in specification at 2.2 Compound File Header. This sequence indicates CFB (Compound File Binary format) major version 4, minor version 3e. The specification states that the minor version should always be indicated as 3e.
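The two identifiers above can be combined into a simple format check. The following is a minimal sketch in Python; the function names are hypothetical and not part of any existing tool. Because it compares the full 8-byte sequence at offset 24, it would reject a version 4 file whose minor version is not 3E, which may be stricter than some readers want.

```python
MAGIC = bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1")         # at offset 0, all CFB files
V4_SIGNATURE = bytes.fromhex("3E 00 04 00 FE FF 0C 00")  # at offset 24, version 4 only
SIGNATURE_OFFSET = 24


def looks_like_cfb_v4(head: bytes) -> bool:
    """Return True if the leading bytes match both identifiers (hypothetical helper)."""
    end = SIGNATURE_OFFSET + len(V4_SIGNATURE)
    return head.startswith(MAGIC) and head[SIGNATURE_OFFSET:end] == V4_SIGNATURE


def file_looks_like_cfb_v4(path: str) -> bool:
    """Apply the same check to the first 32 bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_like_cfb_v4(handle.read(32))
```

A file that passes only the magic number test is still in the CFB format but may be version 3; see CFB_3.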

Notes

General

In addition to the Major Version field, which declares the version number in the header, the Sector Shift field specifies the sector size, and its required value depends on the declared version. If Major Version is 4, then the Sector Shift must be 0x000C, specifying a sector size of 4096 bytes.
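The pairing of Major Version and Sector Shift can be expressed as a small consistency check. The following is a minimal sketch in Python with a hypothetical function name. The version 3 shift of 0x0009 is an assumption, inferred from the 512-byte sector size given for version 3 (2 to the power 9).

```python
# Required Sector Shift for each major version.
# The version 3 value is an assumption inferred from its 512-byte sector size.
EXPECTED_SECTOR_SHIFT = {3: 0x0009, 4: 0x000C}


def sector_size(major_version: int, sector_shift: int) -> int:
    """Return the sector size in bytes, or raise if the two fields disagree."""
    expected = EXPECTED_SECTOR_SHIFT.get(major_version)
    if expected is None:
        raise ValueError(f"unsupported major version: {major_version}")
    if sector_shift != expected:
        raise ValueError(
            f"major version {major_version} requires sector shift "
            f"{expected:#06x}, found {sector_shift:#06x}"
        )
    return 1 << sector_shift  # 2 to the power of the sector shift


print(sector_size(4, 0x000C))  # 4096
print(sector_size(3, 0x0009))  # 512
```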

History  

Format specifications


Useful references


Last Updated: 11/28/2023