CSDM

class csdmpy.CSDM(filename='', version=None, description='', **kwargs)[source]

Bases: object

Create an instance of a CSDM class.

This class is based on the root CSDM object of the core scientific dataset (CSD) model. The class is a composition of the DependentVariable and Dimension instances, where an instance of the DependentVariable class describes a \(p\)-component dependent variable, and an instance of the Dimension class describes a dimension of a \(d\)-dimensional space. Additional attributes of this class are listed below.

Attributes Summary

version

Version number of the CSD model on file.

description

Description of the dataset.

read_only

If True, the data-file is serialized as read only, otherwise, False.

tags

List of tags attached to the dataset.

timestamp

Timestamp from when the file was last serialized.

geographic_coordinate

Geographic coordinate, if present, from where the file was last serialized.

dimensions

Tuple of the Dimension instances.

dependent_variables

Tuple of the DependentVariable instances.

application

Application metadata dictionary of the CSDM object.

data_structure

JSON-serialized string describing the CSDM class instance.

filename

Local file address of the current file.

Methods summary

add_dimension

Add a new Dimension instance to the CSDM object.

add_dependent_variable

Add a new DependentVariable instance to the CSDM instance.

dict

Serialize the CSDM instance as a python dictionary.

to_dict

Alias to the dict() method of the class.

dumps

Serialize the CSDM instance as a JSON data-exchange string.

astype

Return a copy of the CSDM object by converting the numeric type of each dependent variable's components to the given value.

save

Serialize the CSDM instance as a JSON data-exchange file.

copy

Create a copy of the current CSDM instance.

split

Split the dependent variables into views of individual csdm objects.

Numpy compatible attributes summary

real

Return a csdm object with only the real part of the dependent variable components.

imag

Return a csdm object with only the imaginary part of the dependent variable components.

shape

Return the count along each dimension of the csdm objects as a tuple.

T

Return a csdm object with a transpose of the dataset.

Numpy compatible method summary

max

Return a csdm object with the maximum dependent variable component along a given axis.

min

Return a csdm object with the minimum dependent variable component along a given axis.

clip

Clip the dependent variable components between the min and max values.

conj

Return a csdm object with the complex conjugate of all dependent variable components.

round

Return a csdm object by rounding the dependent variable components to the given decimals.

sum

Return a csdm object with the sum of the dependent variable components over a given dimension=axis.

mean

Return a csdm object with the mean of the dependent variable components over a given dimension=axis.

var

Return a csdm object with the variance of the dependent variable components over a given dimension=axis.

std

Return a csdm object with the standard deviation of the dependent variable components over a given dimension=axis.

prod

Return a csdm object with the product of the dependent variable components over a given dimension=axis.

Attributes documentation

version

Version number of the CSD model on file.

description

Description of the dataset. The default value is an empty string.

Example

>>> print(data.description)
A simulated sine curve.

Returns

A string of UTF-8 allowed characters describing the dataset.

Raises

TypeError – When the assigned value is not a string.

read_only

If True, the data-file is serialized as read only, otherwise, False.

By default, the CSDM object loads a copy of the .csdf(e) file, irrespective of the value of the read_only attribute. The value of this attribute may be toggled at any time after the file import. When the .csdf(e) file is serialized, it is written as read-only if the value of the read_only attribute is True.

tags

List of tags attached to the dataset.

timestamp

Timestamp from when the file was last serialized. This attribute is read-only.

The timestamp is a string representation of the Coordinated Universal Time (UTC) formatted according to the ISO 8601 standard.

Raises

AttributeError – When the attribute is modified.
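As an illustration of the format only, the following sketch builds an ISO 8601 UTC string with the Python standard library. The helper name utc_timestamp is hypothetical and is not part of csdmpy; the trailing "Z" form is chosen to match the timestamp shown in the dict() example of this page.

```python
from datetime import datetime, timezone


def utc_timestamp(moment):
    """Format an aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A fixed moment is used so that the result is reproducible.
stamp = utc_timestamp(datetime(1994, 11, 5, 13, 15, 30, tzinfo=timezone.utc))
print(stamp)
```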

geographic_coordinate

Geographic coordinate, if present, from where the file was last serialized. This attribute is read-only.

The geographic coordinates correspond to the location where the file was last serialized. If present, the geographic coordinates are described with three attributes, the required latitude and longitude, and an optional altitude.

Raises

AttributeError – When the attribute is modified.
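The three-attribute layout described above can be sketched as a small check on a plain dictionary. The function name check_geographic_coordinate is hypothetical, and treating any other key as unexpected is an assumption of this sketch rather than documented csdmpy behavior.

```python
def check_geographic_coordinate(coordinate):
    """Return True when latitude and longitude are present and no other
    key than the optional altitude appears (an assumption of this sketch)."""
    required = {"latitude", "longitude"}
    optional = {"altitude"}
    keys = set(coordinate)
    return (required <= keys) and (keys <= (required | optional))


full = {"latitude": "10 deg", "longitude": "93.2 deg", "altitude": "10 m"}
no_altitude = {"latitude": "10 deg", "longitude": "93.2 deg"}
missing_longitude = {"latitude": "10 deg"}

print(check_geographic_coordinate(full))
print(check_geographic_coordinate(missing_longitude))
```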

dimensions

Tuple of the Dimension instances.

dependent_variables

Tuple of the DependentVariable instances.

application

Application metadata dictionary of the CSDM object.

>>> print(data.application)
{}

By default, the application attribute is an empty dictionary, that is, the application metadata stored by the previous application is ignored upon file import.

The application metadata may, however, be retained with a request via the load() method. This feature may be useful to related applications where application metadata might contain additional information. The attribute may be updated with a python dictionary.

The application attribute is where an application can place its own metadata as a python dictionary, using a reverse domain name notation string as the attribute key, for example,

Example

>>> data.application = {
...     "com.example.myApp" : {
...         "myApp_key": "myApp_metadata"
...      }
... }
>>> print(data.application)
{'com.example.myApp': {'myApp_key': 'myApp_metadata'}}

Returns

Python dictionary object with the application metadata.

data_structure

JSON-serialized string describing the CSDM class instance.

The data_structure attribute is only intended for a quick preview of the dataset. The JSON-serialized string from this attribute avoids displaying large datasets. Do not use the value of this attribute to save the data to a file; instead, use the save() method of the instance.

Raises

AttributeError – When modified.

filename

Local file address of the current file.

Numpy compatible attributes documentation

real

Return a csdm object with only the real part of the dependent variable components.

imag

Return a csdm object with only the imaginary part of the dependent variable components.

shape

Return the count along each dimension of the csdm objects as a tuple.

T

Return a csdm object with a transpose of the dataset.

Methods documentation

add_dimension(*args, **kwargs)[source]

Add a new Dimension instance to the CSDM object.

There are several ways to add a new independent variable.

From a python dictionary containing valid keywords.

>>> import csdmpy as cp
>>> datamodel = cp.new()
>>> py_dictionary = {
...     'type': 'linear',
...     'increment': '5 G',
...     'count': 50,
...     'coordinates_offset': '-10 mT'
... }
>>> datamodel.add_dimension(py_dictionary)

Using keyword as the arguments.

>>> datamodel.add_dimension(
...     type = 'linear',
...     increment = '5 G',
...     count = 50,
...     coordinates_offset = '-10 mT'
... )

Using a Dimension class.

>>> var1 = cp.Dimension(type = 'linear',
...                     increment = '5 G',
...                     count = 50,
...                     coordinates_offset = '-10 mT')
>>> datamodel.add_dimension(var1)

Using a subtype class.

>>> var2 = cp.LinearDimension(count = 50,
...                  increment = '5 G',
...                  coordinates_offset = '-10 mT')
>>> datamodel.add_dimension(var2)

From a numpy array.

>>> import numpy as np
>>> array = np.arange(50)
>>> dim = cp.as_dimension(array)
>>> datamodel.add_dimension(dim)

In the third and fourth examples, the instances var1 and var2 are added to the datamodel by reference, i.e., if the instance var1 or var2 is destroyed, the datamodel instance will become corrupt. As a recommendation, always pass a copy of the Dimension instance to the add_dimension() method.
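To make the keywords used in the add_dimension() examples concrete, the sketch below computes the coordinates that such a linear dimension describes. It assumes the usual linear relation, coordinate = coordinates_offset + k × increment for k = 0 to count − 1, and does the unit conversion (1 mT = 10 G) by hand. This is plain Python, not a csdmpy call.

```python
GAUSS_PER_MILLITESLA = 10.0  # 1 mT = 10 G

count = 50
increment = 5.0  # '5 G', expressed in gauss
coordinates_offset = -10.0 * GAUSS_PER_MILLITESLA  # '-10 mT', expressed in gauss

# Assumed linear relation: offset plus an integer multiple of the increment.
coordinates = [coordinates_offset + k * increment for k in range(count)]

print(len(coordinates), coordinates[0], coordinates[-1])
```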

add_dependent_variable(*args, **kwargs)[source]

Add a new DependentVariable instance to the CSDM instance.

There are again several ways to add a new dependent variable instance.

From a python dictionary containing valid keywords.

>>> import numpy as np
>>> import csdmpy as cp
>>> datamodel = cp.new()
>>> numpy_array = (100*np.random.rand(3,50)).astype(np.uint8)
>>> py_dictionary = {
...     'type': 'internal',
...     'components': numpy_array,
...     'name': 'star',
...     'unit': 'W s',
...     'quantity_name': 'energy',
...     'quantity_type': 'pixel_3'
... }
>>> datamodel.add_dependent_variable(py_dictionary)

From a list of valid keyword arguments.

>>> datamodel.add_dependent_variable(type='internal',
...                                  name='star',
...                                  unit='W s',
...                                  quantity_type='pixel_3',
...                                  components=numpy_array)

From a DependentVariable instance.

>>> from csdmpy import DependentVariable
>>> var1 = DependentVariable(type='internal',
...                          name='star',
...                          unit='W s',
...                          quantity_type='pixel_3',
...                          components=numpy_array)
>>> datamodel.add_dependent_variable(var1)

If passing a DependentVariable instance, as a general recommendation, always pass a copy of the DependentVariable instance to the add_dependent_variable() method.
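The difference between passing a reference and passing a copy can be illustrated with plain Python objects. The sketch below uses a dictionary inside a list as a hypothetical stand-in for a variable held by a container; it does not reproduce how csdmpy stores its instances.

```python
import copy

variable = {"name": "star", "unit": "W s"}

shared = [variable]                    # the container holds a reference
isolated = [copy.deepcopy(variable)]   # the container holds its own copy

# A later change to the original object is visible through the reference,
# but not through the copy.
variable["name"] = "renamed"

print(shared[0]["name"], isolated[0]["name"])
```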

dict(update_timestamp=False, read_only=False)[source]

Serialize the CSDM instance as a python dictionary.

Parameters
  • update_timestamp (bool) – If True, timestamp is updated to current time.

  • read_only (bool) – If True, the read_only flag is set to True.

Example

>>> data.dict() 
{'csdm': {'version': '1.0', 'timestamp': '1994-11-05T13:15:30Z',
'geographic_coordinate': {'latitude': '10 deg', 'longitude': '93.2 deg',
'altitude': '10 m'}, 'description': 'A simulated sine curve.',
'dimensions': [{'type': 'linear', 'description': 'A temporal dimension.',
'count': 10, 'increment': '0.1 s', 'quantity_name': 'time','label': 'time',
'reciprocal': {'quantity_name': 'frequency'}}], 'dependent_variables':
[{'type': 'internal', 'description': 'A response dependent variable.',
'name': 'sine curve', 'encoding': 'base64', 'numeric_type': 'float32',
'quantity_type': 'scalar', 'component_labels': ['response'],'components':
['AAAAABh5Fj9xeHM/cXhzPxh5Fj8yMQ0lGHkWv3F4c79xeHO/GHkWvw==']}]}}
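The 'components' entry in the output above is a base64 string with a numeric_type of float32. As a rough sketch of what that string holds, it can be decoded with the standard library alone; little-endian byte order is an assumption of this sketch, and csdmpy performs this decoding itself when a file is loaded.

```python
import base64
import struct

encoded = "AAAAABh5Fj9xeHM/cXhzPxh5Fj8yMQ0lGHkWv3F4c79xeHO/GHkWvw=="

raw = base64.b64decode(encoded)

# Each float32 value takes 4 bytes; little-endian order is assumed here.
n_values = len(raw) // 4
values = struct.unpack(f"<{n_values}f", raw)

print(n_values)
print([round(v, 4) for v in values])
```

The number of decoded values matches the count of 10 given for the dimension in the same example.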
to_dict(update_timestamp=False, read_only=False)[source]

Alias to the dict() method of the class.

dumps(update_timestamp=False, read_only=False, version='1.0', **kwargs)[source]

Serialize the CSDM instance as a JSON data-exchange string.

Parameters
  • update_timestamp (bool) – If True, timestamp is updated to current time.

  • read_only (bool) – If True, the file is serialized as read_only.

  • version (str) – The file is serialized with the given CSD model version.

Example

>>> data.dumps()

save(filename='', read_only=False, version='1.0', output_device=None, indent=0)[source]

Serialize the CSDM instance as a JSON data-exchange file.

There are two types of file serialization extensions, .csdf and .csdfe. In the CSD model, when every DependentVariable object of a CSDM instance has an internal subtype, the CSDM instance is serialized with a .csdf file extension. If any single DependentVariable instance has an external subtype, the CSDM instance is serialized with a .csdfe file extension. The two file extensions alert the end user to the possible deserialization error associated with .csdfe files should the external data file become inaccessible.

In csdmpy, however, irrespective of the dependent variable subtypes from the serialized JSON file, by default, all instances of the DependentVariable class are treated as internal after import. Therefore, when serialized, the CSDM object should be stored as a .csdf file.

To store a file as a .csdfe file, the user must set the value of the encoding attribute of the dependent variables to raw. In that case, a binary file named filename_i.dat is generated for the \(i^\text{th}\) dependent variable, where filename is the argument passed to this method.

Note

Only dependent variables with encoding="raw" will be serialized to a binary file.
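The extension rule of the CSD model described above can be sketched as a small function. The name csd_extension is hypothetical and only restates the rule; it is not part of csdmpy, which decides how to serialize from the encoding attribute as explained above.

```python
def csd_extension(subtypes):
    """Pick the file extension from the dependent-variable subtypes.

    A single 'external' subtype is enough to require '.csdfe';
    otherwise every subtype is internal and '.csdf' applies.
    """
    if any(subtype == "external" for subtype in subtypes):
        return ".csdfe"
    return ".csdf"


print(csd_extension(["internal", "internal"]))
print(csd_extension(["internal", "external"]))
```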

Parameters
  • filename (str) – The filename of the serialized file.

  • read_only (bool) – If True, the file is serialized as read_only.

  • version (str) – The file is serialized with the given CSD model version.

  • output_device (object) – Object where the data is written. If provided, the argument filename becomes irrelevant.

Example

>>> data.save('my_file.csdf')

to_list()[source]

Return the dimension coordinates and dependent variable components as a list of numpy arrays. For multiple dependent variables, the components of each dependent variable are appended in the order of the dependent variables.

For example,
  • A 2D{1} will be packed as \([x_{0}, x_{1}, y_{0,0}]\)

  • A 2D{3} will be packed as \([x_{0}, x_{1}, y_{0,0}, y_{0,1}, y_{0,2}]\)

  • A 1D{1,2} will be packed as \([x_{0}, y_{0,0}, y_{1,0}, y_{1,1}]\)

where \(x_i\) represents the \(i^\text{th}\) dimension and \(y_{i,j}\) represents the \(j^\text{th}\) component of the \(i^\text{th}\) dependent variable.
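The packing order can be spelled out with a short helper that lists the labels in the order described above. The function packing_labels is hypothetical and produces strings only; to_list() itself returns numpy arrays.

```python
def packing_labels(n_dimensions, components_per_variable):
    """List the packed order: every x_i first, then y_{i,j} for each
    dependent variable i in turn, with its components j in order."""
    labels = [f"x{i}" for i in range(n_dimensions)]
    for i, n_components in enumerate(components_per_variable):
        labels += [f"y{i},{j}" for j in range(n_components)]
    return labels


print(packing_labels(2, [1]))      # a 2D{1} dataset
print(packing_labels(2, [3]))      # a 2D{3} dataset
print(packing_labels(1, [1, 2]))   # a 1D{1,2} dataset
```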

astype(numeric_type)[source]

Return a copy of the CSDM object by converting the numeric type of each dependent variable's components to the given value.

Parameters

numeric_type – A numpy dtype or a string with a valid numeric type

Example

>>> data_32 = data_64.astype('float32')

copy()[source]

Create a copy of the current CSDM instance.

Returns

A CSDM instance.

Example

>>> data.copy()

split()[source]

Split the dependent variables into views of individual csdm objects.

Returns

A list of CSDM objects, each with one dependent variable. The objects are returned as a view.

Example

>>> # data contains two dependent variables
>>> d1, d2 = data.split()

transpose()[source]

Return a transpose of the dependent variable data from the CSDM object.

fft(axis=0)[source]

Perform an FFT along the given dimension=axis, for a linear dimension, assuming the Nyquist-Shannon relation.

Parameters

axis – The index of the dimension along which the FFT is performed.

The FFT method uses the complex_fft attribute of the Dimension object to decide whether a forward or inverse Fourier transform is performed. If the value of the complex_fft is True, an inverse FFT is performed, otherwise a forward FFT.

For the forward FFT process, this function is equivalent to performing

phase = np.exp(-2j * np.pi * coordinates_offset * reciprocal_coordinates)
x_fft = np.fft.fftshift(np.fft.fft(x)) * phase

over all components for every dependent variable.

Similarly, for the inverse FFT process, this function is equivalent to performing

phase = np.exp(2j * np.pi * reciprocal_coordinates_offset * coordinates)
x = np.fft.ifft(np.fft.ifftshift(x_fft * phase))

over all components for every dependent variable.
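A minimal numpy sketch of the two expressions above shows that the inverse step undoes the forward step. The sample values, the coordinates offset of 0.25, and the use of np.fft.fftfreq to build the reciprocal coordinates are assumptions of this sketch, not values taken from csdmpy.

```python
import numpy as np

count = 8
increment = 0.5
coordinates_offset = 0.25  # assumed offset of the original dimension

# Sample data along the dimension (any complex array works here).
x = np.exp(-np.arange(count) / 3.0).astype(complex)

# Reciprocal coordinates centered on zero, in fftshift order (assumed grid).
reciprocal_coordinates = np.fft.fftshift(np.fft.fftfreq(count, d=increment))

# Forward process: FFT, shift, then the phase from the coordinates offset.
phase = np.exp(-2j * np.pi * coordinates_offset * reciprocal_coordinates)
x_fft = np.fft.fftshift(np.fft.fft(x)) * phase

# Inverse process: after the forward step the roles swap, so the offset of
# the original dimension is now the reciprocal coordinates offset and the
# reciprocal grid is now the coordinates.
inverse_phase = np.exp(2j * np.pi * coordinates_offset * reciprocal_coordinates)
x_back = np.fft.ifft(np.fft.ifftshift(x_fft * inverse_phase))

print(np.allclose(x_back, x))
```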

Returns

A CSDM object with the Fourier Transform data.

Numpy compatible method documentation

max(axis=None)[source]

Return a csdm object with the maximum dependent variable component along a given axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the maximum of the dependent variable components is taken. If None, the output is the maximum over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a numpy array when axis is None.

Example

>>> data2 = data.max()

min(axis=None)[source]

Return a csdm object with the minimum dependent variable component along a given axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the minimum of the dependent variable components is taken. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

clip(min=None, max=None)[source]

Clip the dependent variable components between the min and max values.

Parameters
  • min – The minimum clip value.

  • max – The maximum clip value.

Returns

A CSDM object with values clipped between min and max.

conj()[source]

Return a csdm object with the complex conjugate of all dependent variable components.

round(decimals=0)[source]

Return a csdm object by rounding the dependent variable components to the given decimals.

sum(axis=None)[source]

Return a csdm object with the sum of the dependent variable components over a given dimension=axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the sum of the dependent variable components is performed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.
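The effect of the axis argument mirrors the plain numpy reduction it is named after. The sketch below uses a bare numpy array only, as an analogy; how csdmpy maps dimension indices onto array axes is not covered here and should not be inferred from it.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)

one_axis = data.sum(axis=0)        # one axis removed
two_axes = data.sum(axis=(0, 2))   # a tuple of m = 2 axes removed
everything = data.sum(axis=None)   # reduced over all axes

print(one_axis.shape, two_axes.shape, everything)
```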

mean(axis=None)[source]

Return a csdm object with the mean of the dependent variable components over a given dimension=axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the mean of the dependent variable components is computed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

var(axis=None)[source]

Return a csdm object with the variance of the dependent variable components over a given dimension=axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the variance of the dependent variable components is computed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

std(axis=None)[source]

Return a csdm object with the standard deviation of the dependent variable components over a given dimension=axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the standard deviation of the dependent variable components is computed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

prod(axis=None)[source]

Return a csdm object with the product of the dependent variable components over a given dimension=axis.

Parameters

axis – An integer or None or a tuple of m integers corresponding to the index/indices of dimensions along which the product of the dependent variable components is performed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.