bronx.datagrip.netcdf

Utilities for working with NetCDF files.

This module can generate a simplified representation of the structure of a NetCDF dataset (see the netcdf_dataset_structure() function). Additionally, two NetCDF datasets can be compared: both the structure of the files and their data are compared (see the netcdf_dataset_diff() and netcdf_file_diff() functions).

Here is a demo:

First we need a helper function that creates a NetCDF dataset for demonstration purposes:

>>> import netCDF4
>>> import numpy as np
>>> def create_demo_netcdf4(nc_filename):
...     demoset = netCDF4.Dataset(nc_filename, mode='w')
...     demoset.setncattr('title', 'NetCDF Demo Data')
...     x_u = demoset.createDimension('X', 5)
...     y_u = demoset.createDimension('Y', 2)
...     g1 = demoset.createGroup('group1')
...     g2 = demoset.createGroup('group2')
...     vx = demoset.createVariable('x', np.float64, ('X', ), zlib=True)
...     vx[:] = np.arange(1, 7, 1.4)
...     vy = demoset.createVariable('y', np.float64, ('Y', ), zlib=True)
...     vy[:] = [0, 1]
...     T = g1.createVariable('T', np.float32, ('X', 'Y'))
...     T.setncattr('unit', 'Kelvin')
...     T[...] = np.reshape(np.arange(270, 280, 1), (5, 2))
...     T[1, 0] = np.ma.masked
...     z_u = g2.createDimension('Z', size=None)
...     T = g2.createVariable('T', np.float32, ('X', 'Y', 'Z'))
...     T.setncattr('unit', 'Kelvin')
...     T[..., 0] = np.reshape(np.arange(270, 280, 1), (5, 2))
...     T[..., 1] = np.reshape(np.arange(300, 310, 1), (5, 2))
...     T[0, 0, 0] = np.nan
...     return demoset

Create a NetCDF dataset and display its structure:

>>> import tempfile
>>> demofile1 = tempfile.NamedTemporaryFile(mode='wb', delete=True)
>>> demoset1 = create_demo_netcdf4(demofile1.name)
>>> demodesc1 = netcdf_dataset_structure(demoset1)
>>> demodesc1 == {
...     'dimensions': {'X': {'size': 5, 'unlimited': False},
...                    'Y': {'size': 2, 'unlimited': False}},
...     'groups': {'group1': {'dimensions': {},
...                           'groups': {},
...                           'ncattrs': {},
...                           'variables': {'T': {'datatype': np.float32,
...                                               'dimensions': ('X', 'Y'),
...                                               'filters': {'complevel': 0, },
...                                               'ncattrs': {'unit': 'Kelvin'},
...                                               'shape': (5, 2)}}},
...                'group2': {'dimensions': {'Z': {'size': 2, 'unlimited': True}},
...                           'groups': {},
...                           'ncattrs': {},
...                           'variables': {'T': {'datatype': np.float32,
...                                               'dimensions': ('X', 'Y', 'Z'),
...                                               'filters': {'complevel': 0, },
...                                               'ncattrs': {'unit': 'Kelvin'},
...                                               'shape': (5, 2, 2)}}}},
...     'ncattrs': {'title': 'NetCDF Demo Data'},
...     'variables': {'x': {'datatype': np.float64,
...                         'dimensions': ('X',),
...                         'filters': {'complevel': 4,
...                                     'shuffle': True,
...                                     'zlib': True},
...                         'ncattrs': {},
...                         'shape': (5,)},
...                   'y': {'datatype': np.float64,
...                         'dimensions': ('Y',),
...                         'filters': {'complevel': 4,
...                                     'shuffle': True,
...                                     'zlib': True},
...                         'ncattrs': {},
...                         'shape': (2,)}}}
True
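The structure returned above is a plain nested dictionary, so it can be traversed with ordinary dict code. As a minimal sketch, the hypothetical helper below (`iter_variable_paths` is not part of bronx) lists every variable in such a structure, using the `/group1/T` path style that appears in the diff report later in this demo. The input is a trimmed-down, hand-written copy of the structure, not real output.

```python
def iter_variable_paths(structure, prefix=""):
    """Yield '/'-separated paths of all variables in a nested structure dict."""
    # Variables attached directly to this level come first...
    for name in sorted(structure.get("variables", {})):
        yield f"{prefix}/{name}"
    # ...then each sub-group is visited recursively.
    for gname in sorted(structure.get("groups", {})):
        yield from iter_variable_paths(structure["groups"][gname], f"{prefix}/{gname}")


# Hand-written excerpt of the demo structure (only the keys the walk needs).
demo_structure = {
    "variables": {"x": {"shape": (5,)}, "y": {"shape": (2,)}},
    "groups": {
        "group1": {"variables": {"T": {"shape": (5, 2)}}, "groups": {}},
        "group2": {"variables": {"T": {"shape": (5, 2, 2)}}, "groups": {}},
    },
}

paths = list(iter_variable_paths(demo_structure))
print(paths)  # ['/x', '/y', '/group1/T', '/group2/T']
```

These four paths correspond to the four data arrays of the demo dataset.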

First comparison attempt (compare a dataset to itself):

>>> netcdf_dataset_diff(demoset1, demoset1)
== Comparison of the two netcdf structures: ==
(legend: created: "+"  deleted: "-"  unchanged: "="  updated: "?")
No differences
== Comparison of data available in both netcdf datasets: ==
4 data arrays out of 4 are identical
True

Create a different dataset and perform a comparison:

>>> demofile2 = tempfile.NamedTemporaryFile(mode='wb', delete=True)
>>> demoset2 = create_demo_netcdf4(demofile2.name)
>>> demoset2.delncattr('title')
>>> demoset2['group1']['T'][1, 0] = 276.
>>> demoset2['group1']['T'][1, 1] = np.ma.masked
>>> demoset2['group2']['T'].unit = 'Celsius'
>>> demoset2['group2']['T'][1, 1, 1] = 0
>>> s2_g3 = demoset2.createGroup('group3')
>>> s2_T = s2_g3.createVariable('T', np.float32, ('X', 'Y'))
>>> s2_T[...] = 0
>>> netcdf_dataset_diff(demoset1, demoset2)  
== Comparison of the two netcdf structures: ==
(legend: created: "+"  deleted: "-"  unchanged: "="  updated: "?")
? groups:
  | + group3: NetCDF4ParentStructure::<<as_dict:: ...>>
  | ? group2:
  |   | ? variables:
  |   |   | ? T:
  |   |   |   | ? ncattrs:
  |   |   |   |   | ? unit: before='Kelvin' after='Celsius'
? ncattrs:
  | - title: 'NetCDF Demo Data'
== Comparison of data available in both netcdf datasets: ==
2 data arrays out of 4 are identical
/group1/T differs
/group2/T differs
False
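The structure report above reads as a recursive comparison of two nested dictionaries in which only changed entries are printed. The following sketch shows one way such a report could be produced. It is an illustration under that assumption, not the bronx implementation, and `diff_lines` is a hypothetical name.

```python
def diff_lines(before, after, indent=""):
    """Return report lines for keys that were created (+), deleted (-) or updated (?)."""
    lines = []
    for key in sorted(set(before) | set(after)):
        if key not in before:
            lines.append(f"{indent}+ {key}: {after[key]!r}")
        elif key not in after:
            lines.append(f"{indent}- {key}: {before[key]!r}")
        elif isinstance(before[key], dict) and isinstance(after[key], dict):
            # Recurse into nested dicts; report the key only if something changed below it.
            sub = diff_lines(before[key], after[key], indent + "  | ")
            if sub:
                lines.append(f"{indent}? {key}:")
                lines.extend(sub)
        elif before[key] != after[key]:
            lines.append(f"{indent}? {key}: before={before[key]!r} after={after[key]!r}")
    return lines


# Hand-written excerpts loosely modeled on the two demo datasets.
ref = {"ncattrs": {"title": "NetCDF Demo Data"},
       "groups": {"group2": {"ncattrs": {"unit": "Kelvin"}}}}
new = {"ncattrs": {},
       "groups": {"group2": {"ncattrs": {"unit": "Celsius"}}, "group3": {}}}

report = diff_lines(ref, new)
print("\n".join(report))
```

Unchanged entries produce no lines, so comparing a dictionary to itself yields an empty report.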

Clean things up…

>>> demofile1.close()
>>> demofile2.close()

Functions

bronx.datagrip.netcdf.netcdf_dataset_diff(netcdf4_ref, netcdf4_new, verbose=True)

Compare two netCDF4.Dataset objects.

Both the structure of the NetCDF Dataset and the data will be compared.
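In the demo, the dataset contains a NaN and a masked value, yet comparing it to itself reports every data array as identical. This suggests the data comparison is not a naive element-wise equality test, since `float("nan") == float("nan")` is false in Python. How bronx handles this is not documented here. The sketch below shows one possible approach in pure Python, with `None` standing in for a masked element; all names are hypothetical.

```python
import math

MASKED = None  # stand-in for a masked (missing) element in this sketch


def values_match(a, b):
    """True if two entries agree, treating masked == masked and NaN == NaN."""
    if a is MASKED or b is MASKED:
        return a is MASKED and b is MASKED
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def arrays_identical(ref, new):
    """Compare two flat sequences of equal length entry by entry."""
    return len(ref) == len(new) and all(values_match(a, b) for a, b in zip(ref, new))


# First four entries of /group1/T in each demo dataset (row-major order).
t1_ref = [270.0, 271.0, MASKED, 273.0]
t1_new = [270.0, 271.0, 276.0, MASKED]
# An excerpt containing a NaN, as in /group2/T.
t2_ref = [float("nan"), 271.0]

same_self = arrays_identical(t1_ref, t1_ref)   # masks line up
same_other = arrays_identical(t1_ref, t1_new)  # masks and values differ
same_nan = arrays_identical(t2_ref, [float("nan"), 271.0])
print(same_self, same_other, same_nan)  # True False True
```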

bronx.datagrip.netcdf.netcdf_dataset_structure(netcdf4_obj)

Generate a representation of the structure of a netCDF4.Dataset object.
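To illustrate the shape of this representation without requiring a NetCDF file, here is a minimal sketch that converts a small object tree into nested dictionaries with the same top-level keys as in the demo. `FakeDimension`, `FakeGroup` and `describe` are hypothetical stand-ins, not netCDF4 or bronx classes, and variables are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDimension:
    """Hypothetical stand-in for a dimension."""
    size: int
    unlimited: bool = False


@dataclass
class FakeGroup:
    """Hypothetical stand-in for a dataset or group."""
    dimensions: dict = field(default_factory=dict)
    groups: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)


def describe(group):
    """Recursively convert a FakeGroup tree into plain nested dicts."""
    return {
        "dimensions": {
            name: {"size": dim.size, "unlimited": dim.unlimited}
            for name, dim in group.dimensions.items()
        },
        "groups": {name: describe(sub) for name, sub in group.groups.items()},
        "ncattrs": dict(group.attrs),
    }


root = FakeGroup(
    dimensions={"X": FakeDimension(5), "Y": FakeDimension(2)},
    groups={"group2": FakeGroup(dimensions={"Z": FakeDimension(2, unlimited=True)})},
    attrs={"title": "NetCDF Demo Data"},
)
desc = describe(root)
print(desc["groups"]["group2"]["dimensions"])  # {'Z': {'size': 2, 'unlimited': True}}
```

Because the result contains only dictionaries and simple values, two such descriptions can be compared with `==`, as the demo does.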

bronx.datagrip.netcdf.netcdf_file_diff(netcdf_file_ref, netcdf_file_new)

Compare two NetCDF files.

Classes

class bronx.datagrip.netcdf.NetCDF4DimensionStructure(netcdf4_obj)

Bases: dict

Represents the structure of a netCDF4.Dimension object.

Parameters:

netcdf4_obj – The Dimension object to process.

class bronx.datagrip.netcdf.NetCDF4ParentStructure(netcdf4_obj)

Bases: dict

Represents the structure of a netCDF4.Dataset or netCDF4.Group object.

Parameters:

netcdf4_obj – The Dataset or Group object to process.

class bronx.datagrip.netcdf.NetCDF4VariableStructure(netcdf4_obj)

Bases: dict

Represents the structure of a netCDF4.Variable object.

Parameters:

netcdf4_obj – The Variable object to process.