Sustainability of Digital Formats: Planning for Library of Congress Collections

NetCDF-3 (Network Common Data Form, version 3)

Format Description Properties Explanation of format description terms

Identification and description

Full name NetCDF-3 (Network Common Data Form, version 3)
Description

NetCDF is a set of software libraries and self-describing, machine-independent data formats for array-oriented scientific data. The first version of the format was developed in the late 1980s at the Unidata Program Center, with the objective of building a file format that would permit sharing of data among atmospheric scientists. It has found wide use in other scientific communities, with different communities developing discipline-specific conventions. The format was and is designed to be portable, platform-independent, scalable, and appendable. See Notes below for more detail on design objectives.

As of January 2015, there are four variants of the format. The first two, known as Classic and 64-bit Offset, are nearly identical and together are often referred to as netCDF-3. This format description is for netCDF-3. The later formats, based on HDF5, will be described separately. The classic format was the only format for netCDF data created between 1989 and 2004 by the reference software from Unidata. It is still the default format for new netCDF data files, and the form in which most netCDF data is stored. The intent is to maintain support for netCDF-3 indefinitely.

The netCDF-3 format supports one flexible and widely applicable data model: annotated multidimensional arrays of typed elements. The representation of these arrays uses a structure of dimensions, variables, and attributes. A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:

  • a header, containing all the information about dimensions, attributes, and variables, but not the variable data
  • a data part, comprising two subparts. Firstly, a fixed-size data portion contains the data for variables with predetermined fixed dimensions; data for each such variable is stored contiguously. Last comes the variable-size data portion, containing the data for any so-called record variables that share the single permitted unlimited dimension; this data is stored as a variable number of fixed-size records, with each record storing the data for all record variables at one point in the dataset's multidimensional space.
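
Because records are fixed-size, the position of any record can be found by arithmetic rather than by scanning the file. The following is a minimal sketch in Python, not the library's implementation: the function name, the starting offset, and the per-record sizes are hypothetical values chosen for illustration (in a real file they come from the header), and padding details are ignored.

```python
def record_offset(var_begin: int, record_size: int, record_index: int) -> int:
    """Byte offset of one record variable's data within a given record.

    var_begin    -- offset of the variable's data in record 0 (hypothetical)
    record_size  -- total bytes per record, summed over all record variables
    record_index -- zero-based index along the unlimited dimension
    """
    if record_index < 0:
        raise ValueError("record_index must be non-negative")
    return var_begin + record_index * record_size


# Hypothetical file: two record variables using 400 and 800 bytes per record,
# with the variable-size data portion starting at byte 1024.
per_record_sizes = [400, 800]
record_size = sum(per_record_sizes)
first_var_begin = 1024
second_var_begin = first_var_begin + per_record_sizes[0]

# Where the second variable's data for the fourth record (index 3) starts.
offset = record_offset(second_var_begin, record_size, record_index=3)
```

Appending a record only extends the file by one more fixed-size block, which is why the structure does not need to be redefined.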

The encoding is based on the IETF standard XDR (External Data Representation), described in RFC 4506. XDR uses a base unit of 4 bytes, with smaller data types padded to 4 bytes and variable length data types padded to a multiple of 4 bytes. Floating point numbers are represented in IEEE 754 format. An advantage of XDR is that it supports efficient location and reading of a subset without parsing preceding data.
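
The 4-byte unit and big-endian IEEE 754 representation can be illustrated with Python's standard struct module. This is a sketch of the XDR rules as summarized above, not a netCDF encoder; the helper names xdr_padded_length and xdr_pad are assumptions introduced here.

```python
import struct


def xdr_padded_length(n_bytes: int) -> int:
    """Round a byte count up to the next multiple of XDR's 4-byte unit."""
    return (n_bytes + 3) // 4 * 4


def xdr_pad(data: bytes) -> bytes:
    """Append zero bytes so the result is a multiple of 4 bytes long."""
    return data + b"\x00" * (xdr_padded_length(len(data)) - len(data))


# XDR integers and IEEE 754 floats are big-endian, hence the ">" prefix.
encoded_int = struct.pack(">i", 1)        # 4-byte signed integer
encoded_float = struct.pack(">f", 1.0)    # 4-byte IEEE 754 single precision
encoded_double = struct.pack(">d", 1.0)   # 8-byte IEEE 754 double precision
padded_name = xdr_pad(b"time!")           # 5 bytes of data padded to 8
```

Since every item occupies a predictable number of bytes, a reader can compute where a value lives and seek to it directly.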

For visualization purposes, according to Jan Heijmans in An Introduction to Distributed Visualization, netCDF-3 is "not as powerful as HDF, but it does offer all features that most applications will ever need." He goes on to suggest that it is supported by more visualization software tools than HDF and is the most logical choice if a dataset is not too complicated.

Dataset size limitations in the Classic format led to the simple change for the 64-bit offset variant of netCDF-3. Size constraints derive from several factors and are too complex to explain in this description. On a platform without Large File Support (LFS), the file size limit of the operating system (typically 2 Gbytes) is often the most significant constraint. With LFS, the limitations depend on the nature of the variables and the size of the data for individual variables. See NetCDF Classic Format Limitations and NetCDF 64-bit Offset Format Limitations from the netCDF Users Guide.

Functional shortcomings of netCDF-3 (significant in some circumstances but not all) that led to the development of netCDF-4 include the lack of support for parallel input/output, user-defined data types, and compression. NetCDF-4, which is based on HDF5, also introduced a new grouping structure and several features to facilitate better self-description.

Production phase Generally used for middle- and final-state archiving.
Relationship to other formats
    Has later version NetCDF-4, Network Common Data Form, Version 4. The netCDF software libraries support both versions 3 and 4. However, the stored data formats are very different. NetCDF-4 is stored as HDF5.

Local use

LC experience or existing holdings None
LC preference None

Sustainability factors

Disclosure

Fully and openly documented. NetCDF was developed by and is maintained and documented by the Unidata Program Center, a consortial program within UCAR (University Corporation for Atmospheric Research).

    Documentation

Software can be downloaded from http://www.unidata.ucar.edu/downloads/netcdf/index.jsp. Documentation is at http://www.unidata.ucar.edu/software/netcdf/docs/.

The format (in Classic and 64-bit offset variants) has been recognized and re-published as a standard both by NASA's Earth Science Data Systems [https://earthdata.nasa.gov/standards/netcdf-classic] and by the Open Geospatial Consortium (OGC) [OGC 10-090 and 10-092]. The organization and wording of these standards differ, but the governing BNF description for the format is the same.

Adoption

NetCDF-3 is widely used in atmospheric and earth sciences. See Where is NetCDF used? from Unidata. One indication of its popularity is the number of communities that have developed conventions for metadata and variable names. The Climate and Forecast (CF) conventions are the most fully developed and were approved as an OGC standard in February 2013. They incorporate many conventions that are applicable to other geospatial data domains.

As noted under Documentation above, NetCDF-3 has been formally adopted as a standard by OGC and NASA's Earth Science Data Systems. The Federal Geographic Data Committee includes netCDF on its list of FGDC Endorsed External Standards. See a timeline of standards body endorsements from Unidata.

NetCDF-3 is supported by significant tools for analysis and visualization, including the commercial products IDL, MATLAB, and ESRI's ArcGIS, and the open-source VTK (Visualization Toolkit). Other analysis applications that support netCDF include FERRET, GrADS, IDV, NCL, NCO, ncview, Panoply, and R. Conversion toolkits for geospatial formats, such as the open-source GDAL and the commercial FME (from Safe Software), also support netCDF-3. Unidata maintains a list of software supporting netCDF.

NetCDF-3 Classic is used as a master format for a number of collaborative projects. For example, the CMIP5 Coupled Model Intercomparison Project has standardized on netCDF-3 Classic, with one output variable per file, and using the netCDF CF (Climate and Forecast) metadata conventions in a specific way. This large collection of data is the basis for a study by the Intergovernmental Panel on Climate Change. The Alfred Wegener Institute for Polar and Marine Research has developed a module, JANEME, for JHOVE2 to characterize netCDF files and extract metadata into both a Dublin Core template and a profile of ISO 19115 adopted by the Collaborative Climate Community Data and Processing Grid (C3grid).

    Licensing and patents

No concerns.

Transparency

NetCDF-3 is a binary format that requires the netCDF software libraries for the data to be accessed and manipulated. However, the ncdump utility that is distributed with the software libraries converts the entire contents of a netCDF-3 file to an ASCII form.

Self-documentation

NetCDF-3 offers the capability to apply attributes to a file as a whole or to any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. However, conventions developed by a community enable the use of standard names for physical quantities and metadata elements. There is a recommendation that datasets identify which conventions they adhere to through a global Conventions attribute. There is also a recommendation on dataset description attributes that support discovery through digital libraries and similar services. See 2005 version of recommendation at Unidata and updated recommendation from ESIP. A global attribute Metadata_Conventions is recommended to identify use of this attribute convention.

Of particular note among conventions are the Climate and Forecast (CF) conventions, which include a substantial table of standard names and definitions for physical quantities commonly represented by data, with associated recommendations on units. This actively maintained vocabulary covers many sub-disciplines of meteorology and climatology and also includes standard names and recommendations for physical quantities relevant to other geospatial contexts.

External dependencies None beyond access to netCDF-aware software.
Technical protection considerations None.

Quality and functionality factors

Dataset
Normal functionality

The representation of self-describing arrays uses a structure of dimensions, variables, and attributes. A variable can hold a multidimensional array of data values of the same type.

Numeric data in multidimensional arrays can be of any of the following number types: 8-, 16-, and 32-bit signed integers, and 32- and 64-bit floating-point values. Character (string) data of indefinite length is also supported.
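
The sizes of these types can be tabulated with Python's struct module. This is an illustrative sketch: the NC_* names follow the type names used in the netCDF C interface, while the dictionary and the pairing with struct format codes are assumptions made here for demonstration.

```python
import struct

# Classic-model external types paired with big-endian struct format codes.
NC_TYPES = {
    "NC_BYTE": ">b",    # 8-bit signed integer
    "NC_CHAR": ">c",    # 8-bit character
    "NC_SHORT": ">h",   # 16-bit signed integer
    "NC_INT": ">i",     # 32-bit signed integer
    "NC_FLOAT": ">f",   # 32-bit IEEE 754 floating point
    "NC_DOUBLE": ">d",  # 64-bit IEEE 754 floating point
}

# Size in bytes of a single element of each type.
sizes = {name: struct.calcsize(code) for name, code in NC_TYPES.items()}
```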

Support for software interfaces (APIs, etc.)

An integral component of netCDF is a software library that provides an API (in Fortran, C, C++, Java, and other languages) to read and write files in the netCDF-3 format.

Data documentation (quality, provenance, etc.)

NetCDF-3 offers the capability to apply attributes to a file as a whole or any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. However, particular communities use conventions for naming variables and using attributes.

Beyond normal functionality

Multidimensional arrays can have one unlimited (appendable) dimension.

GIS images and datasets
Normal functionality

NetCDF-3 is not a geospatial format per se. However, it is widely used for geospatial data. In order to serve as a format for geospatial data that can be shared and used in different contexts, the description of the coordinate reference systems and projections employed must be recorded in a recognizable and unambiguous way. Two related sets of community-developed conventions are widely used for this purpose. If a dataset follows those conventions and explicitly identifies the conventions followed, its data can be imported into a GIS and combined with data in other formats for geospatial analysis.

  • The COARDS conventions (at http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html) were developed under the sponsorship of NOAA's Cooperative Ocean/Atmosphere Research Data Service in the mid-1990s to specify a standard way to document and organize longitude, latitude, vertical coordinate variables, and time or date, and how to relate other variables to that four-dimensional structure.
  • The Climate and Forecast (CF) conventions (at http://cfconventions.org/) generalize and extend the COARDS conventions. The CF conventions, of which the first version was published in 2003, define metadata elements that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. A design objective for the CF conventions is that a dataset that complies with the more constrained COARDS conventions should also comply with the CF conventions, which allow additional ways to express positions and time.
Support for GIS metadata

There is no single or recommended way to embed metadata in a specific serialization or schema in netCDF-3 files. Since XML consists of strings, XML can be embedded in netCDF files by means of string variables or attributes; however, there is no officially recommended approach. Unidata makes available a service (ncISO) as part of its THREDDS Data Server that outputs metadata from a netCDF file in a form compliant with ISO 19115 (Geographic Information -- Metadata).

Support for grids The combination of the netCDF data model and the application of the CF conventions can provide explicit and flexible support for grid-based analysis. The conventions make recommendations for grid definition and mappings that allow for grids that are not based simply on latitude and longitude.
Beyond normal functionality See Dataset Quality and Functionality factors above.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | nc | From NetCDF FAQ. The former use of .cdf as an extension was deprecated in 1994.
Internet Media Type | application/x-netcdf | From The File Extension Source.
Magic numbers | Hex: 43 44 46 01; ASCII: CDF \x01 | For classic format. From NetCDF Classic Format: The Format in Detail.
Magic numbers | Hex: 43 44 46 02; ASCII: CDF \x02 | For 64-bit offset format. From NetCDF Classic Format: The Format in Detail.
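
The magic numbers listed above are enough to tell the two netCDF-3 variants apart from the first four bytes of a file. The following Python sketch shows one way to do so; the function and dictionary names are hypothetical and are not part of the netCDF libraries.

```python
from typing import Optional

# Magic numbers from the table above, mapped to the variant they identify.
MAGIC_VARIANTS = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
}


def detect_netcdf3_variant(first_bytes: bytes) -> Optional[str]:
    """Return the netCDF-3 variant named by the 4-byte magic number, or None."""
    return MAGIC_VARIANTS.get(first_bytes[:4])


def detect_file(path: str) -> Optional[str]:
    """Read the first four bytes of a file and identify its variant."""
    with open(path, "rb") as handle:
        return detect_netcdf3_variant(handle.read(4))
```

A result of None means only that the file is not netCDF-3; it may still be a netCDF-4 file, which is stored as HDF5 and carries a different signature.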

Notes

General

The stated objectives for the netCDF format are that it be:

  • Self-Describing. A netCDF file includes information about the data it contains.
  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Scalable. A small subset of a large dataset may be accessed efficiently.
  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
History

As of January 2012, there are four variants of the netCDF binary data format:

  • the classic format, used since 1989
  • the 64-bit offset format, introduced in 2004 to support larger variables
  • the netCDF-4 format, introduced in 2008 to support more powerful forms of data representation, based on HDF5
  • the netCDF-4 classic model format, also introduced in 2008, based on HDF5, but without the data modeling extensions

Format specifications


Useful references

URLs

Books, articles, etc.

Last Updated: 07/27/2017