
The XDR Layer

XDR (eXternal Data Representation) is both a standard for describing and encoding data and a library of functions that implements that standard, allowing programmers to encode data structures in a machine-independent way. NetCDF uses XDR to represent all data, in both the header and the data parts of a file. Data written through XDR is portable: it can be read on any other machine for which the XDR library has been implemented.
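As a small illustration of what "machine-independent" means here, the sketch below encodes a 32-bit integer the way the XDR standard lays it out: four bytes, most significant byte first, regardless of the host's native byte order. It does not call the XDR library itself; the helper names put_xdr_int and get_xdr_int are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the XDR library): store a 32-bit
 * signed integer as four bytes, most significant byte first. */
void put_xdr_int(unsigned char out[4], int32_t value)
{
    uint32_t u = (uint32_t)value;      /* two's-complement bit pattern */

    assert(out != NULL);
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: the inverse of put_xdr_int. */
int32_t get_xdr_int(const unsigned char in[4])
{
    uint32_t u;

    assert(in != NULL);
    u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];

    /* Convert back to a signed value without relying on
     * implementation-defined narrowing. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(UINT32_MAX - u) - 1;
}
```

Because the byte order is fixed by the encoding rather than by the machine, a value written with put_xdr_int on one architecture decodes to the same value with get_xdr_int on any other.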

Many vendors provide an XDR library along with other C run-time libraries. The netCDF software distribution also includes Sun's portable implementation of XDR for platforms that don't already have a vendor-supplied XDR library.

The XDR layer reads and writes XDR-encoded data in netCDF files through an I/O layer implemented much like the C standard I/O (stdio) library. Hence an understanding of the standard I/O library answers most questions about multiple processes accessing data concurrently, the use of I/O buffers, and the costs of opening and closing netCDF files. In particular, it is possible to have one process writing a netCDF file while other processes read it. Data reads and writes are no more atomic than calls to stdio fread() and fwrite(). An ncsync() call (NCSNC() for FORTRAN) is analogous to the fflush() call in the standard I/O library: it writes unwritten buffered data so that other processes can read it. ncsync() also brings header changes up to date (e.g., changes to attribute values).
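The fflush() analogy can be seen with plain stdio, without any netCDF calls. The sketch below is only an illustration of the buffering behavior: the function names visible_bytes and flush_demo and the record contents are hypothetical. A writer puts a short record into a fully buffered stream, and a separate reader checks how much of the file it can see before and after the flush.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open the file independently, as another
 * reader would, and count the bytes currently visible in it. */
long visible_bytes(const char *path)
{
    FILE *rd = fopen(path, "rb");
    long n = 0;

    if (rd == NULL)
        return -1;
    while (fgetc(rd) != EOF)
        n++;
    fclose(rd);
    return n;
}

/* Write one short record through a fully buffered stream and report
 * what a reader sees before and after fflush(). Returns 0 on success. */
int flush_demo(const char *path, long *before, long *after)
{
    static const char record[] = "temperature=21.5\n";
    const size_t len = sizeof record - 1;   /* exclude the trailing NUL */
    FILE *wr;

    assert(path != NULL && before != NULL && after != NULL);
    wr = fopen(path, "wb");
    if (wr == NULL)
        return -1;

    /* Ask for full buffering so the small write stays in memory. */
    if (setvbuf(wr, NULL, _IOFBF, BUFSIZ) != 0 ||
        fwrite(record, 1, len, wr) != len) {
        fclose(wr);
        return -1;
    }
    *before = visible_bytes(path);   /* data is usually still buffered */

    if (fflush(wr) != 0) {
        fclose(wr);
        return -1;
    }
    *after = visible_bytes(path);    /* flushed data is now in the file */

    return fclose(wr) == 0 ? 0 : -1;
}
```

On typical implementations the reader sees none of the record before the flush and all of it afterwards. A reader of a netCDF file is in the same position until the writer calls ncsync().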

As in the stdio library, flushes are also performed when "seeks" occur to a different area of the file. Hence the order of read and write operations can influence I/O performance significantly. Reading data in the same order in which it was written within each record will minimize buffer flushes.

There is one unusual case where the situation is more complex: when a writer enters define mode to add some additional dimensions, variables, or attributes to an existing netCDF file that is also open for reading by other processes. In this case, when the writer leaves define mode, a new copy of the file is created with the new dimensions, attributes, or variables and the old data, but readers that still have the file open will not see the changes. You should not expect netCDF data access to work with multiple writers having the same file open for writing simultaneously.

For VMS systems, the performance penalty for permitting shared access (under the current implementation of stdio in the C run-time library) seemed too great to make shared access the default, so netCDF files on VMS are opened non-shared. This still permits multiple simultaneous readers of the same file, but one writer prevents any readers from accessing the file. Implementors can easily allow shared access for a VMS implementation, if shared access is a more important requirement than access speed.

It is possible to tune an implementation of netCDF for some platforms by replacing the I/O layer beneath XDR with a different platform-specific I/O layer. Doing so may make netCDF behave less like standard I/O, and hence change its characteristics related to data sharing, buffering, and the cost of I/O operations.

The cost of using a canonical data representation such as XDR varies according to the type of data and whether the XDR form is the same as the machine's native form for that type. XDR is especially efficient for byte, character, and short integer data.

For some data types on some machines, the time required to convert data to and from XDR form can be significant. The best case is byte arrays, for which very little conversion expense occurs, since the XDR library has built-in support for them. The netCDF implementation includes similar support added to XDR for arrays of short (16-bit) integers. The worst case is reading or writing large arrays of floating-point data on a machine that does not use IEEE floating-point as its native representation. The XDR library incurs the expense of a function call for each floating-point quantity accessed. On some architectures the cost of a function invocation for each floating-point number can dominate the cost of netCDF access to floating-point fields.

The distributed netCDF implementation is meant to be portable. Platform-specific ports that further optimize the implementation for better I/O performance or that unroll the loops in the XDR library to optimize XDR conversion of long integer and floating-point arrays are practical and desirable in cases where higher performance for data access is necessary.
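To make the per-element cost concrete, here is a minimal sketch of an array-at-a-time conversion of floating-point data, the kind of optimization a platform-specific port might make. It is not the code used by netCDF or by the XDR library. The function name floats_to_xdr is hypothetical, and the sketch assumes that the host's float is a 32-bit IEEE value stored with the same byte order as its 32-bit integers, so only the byte order has to be normalized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert `count` floats to their 4-byte,
 * most-significant-byte-first form in a single loop, with no
 * function call per element.
 * Assumption: float is a 32-bit IEEE 754 value on this host. */
void floats_to_xdr(unsigned char *out, const float *in, size_t count)
{
    size_t i;

    assert(sizeof(float) == sizeof(uint32_t));
    assert(count == 0 || (out != NULL && in != NULL));

    for (i = 0; i < count; i++) {
        uint32_t bits;

        /* Copy the bit pattern without violating aliasing rules. */
        memcpy(&bits, &in[i], sizeof bits);

        out[4 * i + 0] = (unsigned char)(bits >> 24);
        out[4 * i + 1] = (unsigned char)(bits >> 16);
        out[4 * i + 2] = (unsigned char)(bits >> 8);
        out[4 * i + 3] = (unsigned char)bits;
    }
}
```

The output buffer must hold 4 * count bytes. On a machine whose native floating-point format is not IEEE, the loop body would also have to convert the format itself, which is where most of the expense described above comes from.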
