NetCDF User's Guide - Data Access


Data Access

The netCDF interface supports several kinds of access to data in an open netCDF file:

direct (random) access to single data values,
direct access to an arbitrary generalized cross-section of data for a single variable,
record-oriented access to data for a single variable, and
record-oriented access to a subset of the variables in a record.

To directly access a single data value, you specify a netCDF file, a variable, and a multidimensional index for the variable. Files are not specified by name every time you want to access data, but instead by a small integer obtained when the file is first created or opened. Similarly, variables are not specified by name for every data access either, but by variable IDs, small integers used to identify variables in a netCDF file.

Data in a netCDF file can be accessed as single values or as hyperslabs. A hyperslab is a generalized piece of a multidimensional variable that is specified by giving the indices of a corner point and a list of edge lengths along each of the dimensions of the variable. The corner point specified must be the one with the smallest indices, that is, the one closest to the origin of the variable index space. In the C interface, the block of data values returned (or written) has the last dimension of the variable varying fastest and the first dimension varying most slowly. For FORTRAN, the order is reversed, with the first dimension of the variable varying fastest and the last dimension varying most slowly. These ordering conventions correspond to the customary order in which multidimensional variables are stored in C and FORTRAN.

As an example of hyperslab access, assume that in the example netCDF file (see section Components of a NetCDF File), you wish to read all the data for the temp variable at the second (850 millibar) level, and assume that there are currently three records (time values) in the netCDF file. Recall that the dimensions are defined as

        lat = 5, lon = 10, level = 4, time = unlimited;

and the variable temp is declared as

        float   temp(time, level, lat, lon);

in the CDL notation.

A corresponding C variable that holds data for all four levels and all three times might be declared as

#define LATS  5
#define LONS 10
#define LEVELS 4
#define TIMES 3                 /* currently */
    ...
float   temp[TIMES*LEVELS*LATS*LONS];

to keep the data in a one-dimensional array, or

    ...
float   temp[TIMES][LEVELS][LATS][LONS];

using a multidimensional array declaration. To read in a hyperslab of data for only one level at a time, for example, you could leave out the LEVELS dimension or set it to 1.

In FORTRAN, the dimensions are reversed from the CDL declaration, with the first dimension varying fastest and the record dimension as the last dimension of a record variable. Thus a FORTRAN declaration for the corresponding variable that holds all times and levels is

      PARAMETER (LATS=5, LONS=10, LEVELS=4, TIMES=3)
         ...
      REAL TEMP(LONS, LATS, LEVELS, TIMES)

To specify the hyperslab of data that represents the second level, all times, all latitudes, and all longitudes, we need to provide a corner and some edge lengths. The corner should be (0, 1, 0, 0) in C--or (1, 1, 2, 1) in FORTRAN--because you want to start at the beginning of each of the time, lon, and lat dimensions, but you want to begin at the second value of the level dimension. The edge lengths should be (3, 1, 5, 10) in C--or (10, 5, 1, 3) in FORTRAN--since you want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. You should expect to get a total of 150 float values returned (3 * 1 * 5 * 10), and should provide enough space in your array for this many. The order in which the data will be returned is with the last dimension, lon, varying fastest for C, or with the first dimension, LON, varying fastest for FORTRAN:

              C                  FORTRAN

     temp[0][1][0][0]      TEMP(1, 1, 2, 1)
     temp[0][1][0][1]      TEMP(2, 1, 2, 1)
     temp[0][1][0][2]      TEMP(3, 1, 2, 1)
     temp[0][1][0][3]      TEMP(4, 1, 2, 1)

           ...                 ...

     temp[2][1][4][7]      TEMP( 8, 5, 2, 3)
     temp[2][1][4][8]      TEMP( 9, 5, 2, 3)
     temp[2][1][4][9]      TEMP(10, 5, 2, 3)
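The ordering in the table above can be reproduced with a few lines of C. The following is only a sketch: hyperslab_size and hyperslab_index are hypothetical helpers written for this illustration, not functions of the netCDF library. They compute how many values a hyperslab holds and which element of the variable supplies the k-th returned value, assuming the C convention of the last dimension varying fastest.

```c
#include <assert.h>

#define NDIMS 4   /* temp(time, level, lat, lon) has four dimensions */

/* Hypothetical helper: number of values in a hyperslab, which is the
 * product of its edge lengths. */
long hyperslab_size(const long edges[NDIMS])
{
    long n = 1;
    int d;
    for (d = 0; d < NDIMS; d++)
        n *= edges[d];
    return n;
}

/* Hypothetical helper: indices (in the variable's index space) of the
 * k-th value returned for a hyperslab, counting k from 0, with the last
 * dimension varying fastest as in the C interface. */
void hyperslab_index(const long corner[NDIMS], const long edges[NDIMS],
                     long k, long index[NDIMS])
{
    int d;
    assert(k >= 0 && k < hyperslab_size(edges));
    for (d = NDIMS - 1; d >= 0; d--) {
        index[d] = corner[d] + k % edges[d];  /* position along dimension d */
        k /= edges[d];                        /* carry into slower dimension */
    }
}
```

With the corner (0, 1, 0, 0) and edge lengths (3, 1, 5, 10), the first value comes from temp[0][1][0][0] and the last, number 149, from temp[2][1][4][9], matching the first and last rows of the table.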

Note that the different dimension orders for the C and FORTRAN interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the two languages. In general, it does not matter whether a netCDF file is written using the C or FORTRAN interface; netCDF files written from either language may be read by programs written in the other language.
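One way to see that the two conventions describe the same stored order is to compute the position of an element under each. The sketch below uses two hypothetical functions, c_offset and fortran_offset, that are not part of any netCDF interface; they simply apply the customary C (0-based, last index fastest) and FORTRAN (1-based, first index fastest) storage rules to the example temp variable with three records.

```c
#include <assert.h>

enum { LATS = 5, LONS = 10, LEVELS = 4, TIMES = 3 };

/* Position of temp[t][lev][lat][lon] in C storage order
 * (0-based indices, last index varying fastest). */
long c_offset(long t, long lev, long lat, long lon)
{
    assert(t >= 0 && t < TIMES && lev >= 0 && lev < LEVELS);
    assert(lat >= 0 && lat < LATS && lon >= 0 && lon < LONS);
    return ((t * LEVELS + lev) * LATS + lat) * LONS + lon;
}

/* Position of TEMP(lon, lat, lev, t) in FORTRAN storage order
 * (1-based indices, first index varying fastest). */
long fortran_offset(long lon, long lat, long lev, long t)
{
    assert(t >= 1 && t <= TIMES && lev >= 1 && lev <= LEVELS);
    assert(lat >= 1 && lat <= LATS && lon >= 1 && lon <= LONS);
    return (lon - 1) + LONS * ((lat - 1) + LATS * ((lev - 1) + LEVELS * (t - 1)));
}
```

For any element, c_offset(t, lev, lat, lon) equals fortran_offset(lon + 1, lat + 1, lev + 1, t + 1), so temp[2][1][4][9] and TEMP(10, 5, 2, 3) name the same value.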

Besides the regular hyperslab access described above, the netCDF abstraction also supports the reading and writing of generalized hyperslabs. A generalized hyperslab has the same corner point and edge lengths attributes as a regular hyperslab. It allows, however, more general mappings between the points of the hyperslab and both the disk-resident values of the netCDF variable and the locations for those values in memory.

In a regular hyperslab, the mapping between the points of the hyperslab and the values of a netCDF variable can be described as contiguous: along each dimension, adjacent hyperslab points correspond to adjacent netCDF variable values. In a generalized hyperslab, however, this is not necessarily true; in any dimension, adjacent generalized hyperslab points correspond to netCDF variable values that are separated by a distance of n values. Here n is the stride of the dimension, and it need not be the same for all dimensions. Strides may reasonably vary from one (to access adjacent netCDF variable values) to the number of points in a netCDF dimension. A regular hyperslab has unity strides in all dimensions.
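The effect of a stride along a single dimension can be sketched as follows. Both functions are hypothetical helpers for illustration only, not netCDF library calls: strided_index gives the index in the netCDF variable of the i-th hyperslab point, and max_edge gives the largest edge length that stays within a dimension of a given length.

```c
#include <assert.h>

/* Hypothetical helper: index along one dimension of the netCDF variable
 * that corresponds to the i-th point (counting from 0) of a generalized
 * hyperslab starting at `corner` with the given stride. */
long strided_index(long corner, long stride, long i)
{
    assert(stride >= 1 && i >= 0);
    return corner + i * stride;
}

/* Hypothetical helper: largest edge length for which every strided point
 * still lies inside a dimension of length dim_len. */
long max_edge(long dim_len, long corner, long stride)
{
    assert(stride >= 1 && corner >= 0 && corner < dim_len);
    return (dim_len - corner + stride - 1) / stride;
}
```

For the lon dimension of length 10, a corner of 0 and a stride of 3 select the indices 0, 3, 6, and 9, so the edge length can be at most 4. A stride of 1 gives back the regular hyperslab.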

The other mapping allowed by generalized hyperslabs is between the points of the hyperslab and the memory locations of the corresponding values. In a regular hyperslab, this mapping is trivial: the structure of the in-memory values (i.e. the dimensional sizes and their order) is identical to that of the hyperslab. In a generalized hyperslab, however, this is not necessarily true; instead, an index mapping vector defines the mapping between points in the generalized hyperslab and the memory locations of the corresponding values. The offset, in bytes, from the origin of a memory-resident array to a particular point is given by the inner product of the index mapping vector with the point's coordinate offset vector.

The index mapping vector for a regular hyperslab would have, in order from the most rapidly varying dimension to the most slowly varying, the byte size of a memory-resident datum (e.g. 4 for a floating-point value), then the product of that value with the edge length of the most rapidly varying dimension of the hyperslab, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a generalized hyperslab, however, the correspondence between hyperslab points and memory locations can be radically different. For example, the following C definitions

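/* Memory-resident 2-D array of structures, dimensioned NX by NY */
struct vel {
    int flags;
    float u;
    float v;
} vel[NX][NY];
/* Byte distance in memory between adjacent points along each
   dimension of the netCDF variable vel[NY][NX] */
long imap[2] = {
    sizeof(struct vel),         /* first netCDF dimension (NY) */
    sizeof(struct vel)*NY};     /* second netCDF dimension (NX) */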

where imap is the index mapping vector, can be used to access the memory-resident values of the netCDF variable, vel[NY][NX], even though the dimensions are transposed and the data is contained in a 2-D array of structures rather than a 2-D array of floating-point values.
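To make the inner-product rule concrete, the sketch below computes the byte offset for a hyperslab point and uses it to locate the u member for that point. It is an illustration only: imap_offset and vel_u_at are hypothetical names, the sizes NX = 3 and NY = 2 are arbitrary values chosen here, and starting the offset at the u member of the first element is an assumption about how the array would be handed to the library.

```c
#include <assert.h>

#define NX 3   /* arbitrary sizes, assumed only for this illustration */
#define NY 2

struct vel {
    int flags;
    float u;
    float v;
};

/* Hypothetical helper: byte offset of a hyperslab point, computed as the
 * inner product of the index mapping vector with the point's coordinate
 * offset vector. */
long imap_offset(const long imap[], const long coord[], int ndims)
{
    long offset = 0;
    int d;
    for (d = 0; d < ndims; d++)
        offset += imap[d] * coord[d];
    return offset;
}

/* Hypothetical helper: address of the u member that holds the value for
 * point (iy, ix) of the netCDF variable vel[NY][NX], when the data live
 * in a transposed array of structures mem[NX][NY]. */
float *vel_u_at(struct vel mem[NX][NY], long iy, long ix)
{
    long imap[2] = { (long)sizeof(struct vel),
                     (long)sizeof(struct vel) * NY };
    long coord[2];
    assert(iy >= 0 && iy < NY && ix >= 0 && ix < NX);
    coord[0] = iy;   /* first netCDF dimension */
    coord[1] = ix;   /* second netCDF dimension */
    return (float *)((char *)&mem[0][0].u + imap_offset(imap, coord, 2));
}
```

Under these assumptions, the value for point (iy, ix) of the netCDF variable lands in mem[ix][iy].u, which shows both the transposition and the skipping of the other structure members.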

Note that, although the netCDF abstraction allows the use of generalized hyperslab access if warranted by the situation, it does not mandate it. If you do not need generalized hyperslab access, you may ignore this capability and use regular hyperslab access instead.

To perform conventional record-oriented access, you specify a netCDF file and a record variable (one defined with an unlimited dimension), use the record number as the index of the first dimension (the last dimension in FORTRAN), and use hyperslab access to get the record of values.
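In the C ordering, a record is therefore just a hyperslab whose first edge length is 1. The following sketch builds the corner and edge lengths for one record; record_hyperslab is a hypothetical helper, not a library function, and the resulting arrays would then be passed to whichever hyperslab read or write call is being used.

```c
#include <assert.h>

/* Hypothetical helper: fill in the corner and edge lengths that select
 * record number recnum (counting from 0) of a record variable, using the
 * C dimension order in which the record dimension comes first.
 * shape[d] is the length of dimension d; shape[0] is ignored because
 * the record dimension is unlimited. */
void record_hyperslab(int ndims, const long shape[], long recnum,
                      long corner[], long edges[])
{
    int d;
    assert(ndims >= 1 && recnum >= 0);
    corner[0] = recnum;   /* start at the requested record */
    edges[0] = 1;         /* exactly one record */
    for (d = 1; d < ndims; d++) {
        corner[d] = 0;          /* start of every other dimension */
        edges[d] = shape[d];    /* full extent of every other dimension */
    }
}
```

For temp(time, level, lat, lon), the third record has corner (2, 0, 0, 0) and edge lengths (1, 4, 5, 10), that is, 200 values.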

You may read or write multiple record variables, even if the variables are of different types, with a single call in the C interface. In this case you must write a whole record's worth of data for each desired variable. This interface is supported as a convenience for C programmers, but is not strictly necessary, since the same result may be achieved with one read or write call for each variable. An equivalent portable FORTRAN interface that replaces multiple calls with a single call is unfortunately not possible. However, data written using the C interface can still be read with the FORTRAN interface, using one call per variable.

When efficiency is a concern, you should keep in mind the order in which netCDF data are written on the disk, since the best I/O performance is achieved by reading or writing contiguous data. All variable data are ordered with the last dimension of each variable varying fastest in the C interface, which is the same as the first dimension varying fastest in the FORTRAN interface. This means that for record variables in particular, at least one disk access per record will be required for reading a value from each record. Hence reading a hyperslab that takes one value out of each record will require as many disk accesses as the number of values requested. For writing, the situation is even worse, since each record must first be read and then rewritten to change a single value within a record. If you have a choice about the order in which data are accessed or the order of the dimensions that define the shape of a variable, try to choose these two orders in harmony to avoid needless inefficiency.
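A rough way to compare access patterns is to count how many contiguous pieces of stored data a regular hyperslab touches. The sketch below is a simplified model, not a description of what the library actually does: contiguous_runs is a hypothetical helper, it assumes unity strides and the C dimension order, it ignores any buffering, and it assumes that successive records of a record variable are never adjacent on disk.

```c
#include <assert.h>

/* Hypothetical helper: number of contiguous runs of stored values covered
 * by a regular hyperslab with the given edge lengths, for a variable of
 * the given shape. For a record variable (is_record != 0), shape[0] is
 * not consulted and a run is assumed never to span more than one record. */
long contiguous_runs(int ndims, const long shape[], const long edges[],
                     int is_record)
{
    long total = 1;   /* values in the hyperslab */
    long run = 1;     /* values in one contiguous run */
    int first = is_record ? 1 : 0;
    int d;

    assert(ndims >= 1);
    for (d = 0; d < ndims; d++)
        total *= edges[d];

    /* Walk inward from the fastest-varying dimension. A run keeps growing
     * while each dimension is covered completely, and stops growing at
     * the first dimension that is only partly covered. */
    for (d = ndims - 1; d >= first; d--) {
        run *= edges[d];
        if (edges[d] != shape[d])
            break;
    }
    return total / run;
}
```

In this model, reading the second level of temp for all three records (edge lengths (3, 1, 5, 10)) touches three runs of 50 values each, while reading one value from each of the three records (edge lengths (3, 1, 1, 1)) also touches three runs but transfers only three values.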
