Welcome to the csdmpy documentation

Citation

PLOS ONE 15(1): e0225953; Zenodo DOI: 10.5281/zenodo.3973603

About

The csdmpy package provides Python support for the core scientific dataset (CSD) model file-exchange format [1]. The CSD model is designed as a building block in the development of a more sophisticated, portable scientific dataset file standard, and it is capable of handling a wide variety of scientific datasets both within and across disciplinary fields.

The main objective of this Python package is to make it easy for Python users to import and export files serialized with the CSD model. The package uses the NumPy library and therefore lets end users process or visualize the imported datasets with any third-party package compatible with NumPy.


The sample CSDM compliant files used in this documentation are available online.


View the core scientific dataset model (CSDM) examples gallery.


Tutorial on generating and serializing CSDM objects from Numpy arrays.


Introduction to CSDM format

The core scientific dataset (CSD) model is a lightweight, portable, versatile, and standalone data model capable of handling a variety of scientific datasets. The model encapsulates only the data values and the minimum metadata needed to accurately represent a \(p\)-component dependent variable, \((\mathbf{U}_0, \ldots, \mathbf{U}_q, \ldots, \mathbf{U}_{p-1})\), discretely sampled at \(M\) unique points in a \(d\)-dimensional coordinate space, \((\mathbf{X}_0, \mathbf{X}_1, \ldots, \mathbf{X}_k, \ldots, \mathbf{X}_{d-1})\). The model is not intended to encapsulate any information on how the data might be acquired, processed, or visualized.

The data model is versatile enough to support many use cases in spectroscopy, diffraction, and imaging. As such, the model supports multi-component datasets associated with continuous physical quantities that are discretely sampled in a multi-dimensional space associated with other carefully controlled quantities, e.g., a mass as a function of temperature; a current as a function of voltage and time; a signal voltage as a function of magnetic field gradient strength; a color image with red, green, and blue (RGB) light-intensity components as a function of two independent spatial dimensions; or the six components of the symmetric second-rank diffusion tensor in diffusion tensor MRI as a function of three independent spatial dimensions. Additionally, the model supports multiple dependent variables sharing the same \(d\)-dimensional coordinate space, for example, a simultaneous measurement of current and voltage as a function of time, or a simultaneous acquisition of air temperature, pressure, wind velocity, and solar flux as a function of Earth’s latitude and longitude. We refer to these dependent variables as correlated datasets.

The CSD model is independent of the hardware, operating system, application software, programming language, and the object-oriented file-serialization format used to serialize the CSD model to a file. Of the many available file-serialization formats, such as XML, JSON, and property lists, we chose the data-exchange-oriented JSON (JavaScript Object Notation) format because it is human-readable and easily integrated with many programming languages and field-specific application software.

CSDM

Description

The root level object of the CSD model.

Attributes

  • version (String): A required version number of the CSDM file-exchange format.

  • dimensions ([Dimension, ...]): A required ordered array of unique Dimension objects. An empty array is a valid value.

  • dependent_variables ([DependentVariable, ...]): A required array of DependentVariable objects. An empty array is a valid value.

  • tags ([String, ...]): An optional list of keywords associated with the dataset.

  • read_only (Boolean): An optional value that defaults to False. If true, the serialized file is archived.

  • timestamp (String): An optional UTC ISO-8601 timestamp recording when the CSDM-compliant file was last serialized.

  • geographic_coordinate (geographic_coordinate): An optional object with the attributes required to describe the location from which the CSDM-compliant file was last serialized.

  • description (String): An optional description of the datasets in the CSD model.

  • application (Generic): An optional generic dictionary containing application-specific metadata describing the CSDM object.
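The attribute list above can be illustrated with a plain-Python sketch that assembles a root-level object as a dictionary and serializes it with the standard json module. This is not the csdmpy API: the helper build_root_object is hypothetical, the nesting under a "csdm" key mirrors the data_structure output shown later in this documentation, and the sketch does not validate against the full CSDM specification.

```python
import json
from datetime import datetime, timezone


def build_root_object(description, tags=None):
    """Assemble a minimal root-level CSDM object (hypothetical helper, not csdmpy)."""
    root = {
        "version": "1.0",            # required
        "dimensions": [],            # required; an empty array is valid
        "dependent_variables": [],   # required; an empty array is valid
        "read_only": False,          # optional; defaults to False
        # optional UTC ISO-8601 timestamp of the last serialization
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "description": description,  # optional
    }
    if tags:
        root["tags"] = list(tags)    # optional keywords
    return {"csdm": root}


text = json.dumps(build_root_object("A new test dataset", tags=["demo"]), indent=2)
print(text)
```

Because JSON is human-readable, the printed text can be inspected directly to see which attributes were written.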

Dimension

A generalized object describing a dimension of a multi-dimensional grid/space.

Specialized Class

Attributes

  • type (DimObjectSubtype): A required enumeration literal with a valid dimension subtype.

  • label (String): An optional label of the dimension.

  • description (String): An optional description of the dimension.

  • application (Generic): An optional generic dictionary containing application-specific metadata describing the dimension.

DependentVariable

Description

A generalized object describing a dependent variable of the dataset, which holds an ordered list of \(p\) components, indexed as \(q=0\) to \(p-1\), as

\[[\mathbf{U}_0, \ldots, \mathbf{U}_q, \ldots, \mathbf{U}_{p-1}].\]

Specialized Class

Attributes

  • type (DVObjectSubtype): An enumeration literal with a valid dependent variable subtype.

  • name (String): Name of the dependent variable.

  • unit (String): The unit associated with the physical quantities describing the dependent variable.

  • quantity_name (String): Quantity name associated with the physical quantities describing the dependent variable.

  • numeric_type (NumericType): An enumeration literal with a valid numeric type.

  • quantity_type (QuantityType): An enumeration literal with a valid quantity type.

  • component_labels ([String, String, ...]): An ordered array of labels associated with the ordered array of components of the dependent variable.

  • sparse_sampling (SparseSampling): An object with the attributes required to describe sparsely sampled dependent-variable components.

  • description (String): Description of the dependent variable.

  • application (Generic): A generic dictionary containing application-specific metadata describing the dependent variable.

Enumeration

DimObjectSubtype

An enumeration with literals as the values of the Dimension object’s type attribute.

  • linear: Literal specifying an instance of a LinearDimension object.

  • monotonic: Literal specifying an instance of a MonotonicDimension object.

  • labeled: Literal specifying an instance of a LabeledDimension object.
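A minimal sketch of how the DimObjectSubtype literals could be represented and checked in plain Python. The DimObjectSubtype enum and the dimension_subtype helper are hypothetical illustrations, not part of csdmpy, and a dimension is assumed to be a plain dictionary parsed from JSON.

```python
from enum import Enum


class DimObjectSubtype(Enum):
    """Hypothetical enum of the literals allowed in a Dimension's type attribute."""
    LINEAR = "linear"
    MONOTONIC = "monotonic"
    LABELED = "labeled"


def dimension_subtype(dimension):
    """Return the subtype of a dimension dict (hypothetical helper).

    Raises ValueError if the required type attribute is missing or is not
    one of the three valid literals.
    """
    if "type" not in dimension:
        raise ValueError("'type' is a required attribute of a Dimension object.")
    # Enum lookup by value raises ValueError for unknown literals.
    return DimObjectSubtype(dimension["type"])


print(dimension_subtype({"type": "linear", "label": "time"}))
```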

DVObjectSubtype

An enumeration with literals as the values of the DependentVariable object’s type attribute.

  • internal: Literal specifying an instance of an InternalDependentVariable object.

  • external: Literal specifying an instance of an ExternalDependentVariable object.

NumericType

An enumeration with literals as the values of the DependentVariable object’s numeric_type attribute.

  • uint8: 8-bit unsigned integer

  • uint16: 16-bit unsigned integer

  • uint32: 32-bit unsigned integer

  • uint64: 64-bit unsigned integer

  • int8: 8-bit signed integer

  • int16: 16-bit signed integer

  • int32: 32-bit signed integer

  • int64: 64-bit signed integer

  • float32: 32-bit floating-point number

  • float64: 64-bit floating-point number

  • complex64: two 32-bit floating-point numbers

  • complex128: two 64-bit floating-point numbers
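To make the byte sizes in this list concrete, the sketch below maps each NumericType literal to a format code of Python's struct module and asks struct for the packed size. The mapping NUMERIC_TYPE_FORMATS is a hypothetical illustration; the little-endian "<" prefix is an assumption about byte order, and each complex value is treated as a pair of floats (real, imaginary).

```python
import struct

# Hypothetical mapping from NumericType literals to struct format codes.
# "<" selects little-endian, standard-size packing (an assumption here);
# complex types are packed as two consecutive floats (real, imaginary).
NUMERIC_TYPE_FORMATS = {
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "float32": "<f", "float64": "<d",
    "complex64": "<2f", "complex128": "<2d",
}


def bytes_per_value(numeric_type):
    """Return the number of bytes one value of numeric_type occupies."""
    return struct.calcsize(NUMERIC_TYPE_FORMATS[numeric_type])


for name in NUMERIC_TYPE_FORMATS:
    print(f"{name:>10}: {bytes_per_value(name)} bytes")
```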

QuantityType

An enumeration with literals as the values of the DependentVariable object’s quantity_type attribute. The value determines how the \(p\) components of the dependent variable are interpreted.

  • scalar

    A dependent variable with \(p=1\) component, interpreted as a scalar, \(\mathcal{S}_i=U_{0,i}\).

  • vector_n

    A dependent variable with \(p=n\) components, interpreted as vector components, \(\mathcal{V}_i= \left[ U_{0,i}, U_{1,i}, \ldots, U_{n-1,i}\right]\).

  • matrix_n_m

    A dependent variable with \(p=mn\) components, interpreted as an \(n \times m\) matrix as follows,

    \[M_i = \left[ \begin{array}{cccc} U_{0,i} & U_{m,i} & \cdots & U_{(n-1)m,i} \\ U_{1,i} & U_{m+1,i} & \cdots & U_{(n-1)m+1,i} \\ \vdots & \vdots & \ddots & \vdots \\ U_{m-1,i} & U_{2m-1,i} & \cdots & U_{nm-1,i} \end{array} \right]\]

  • symmetric_matrix_n

    A dependent variable with \(p=n(n+1)/2\) components, interpreted as a matrix symmetric about its leading diagonal, as shown below,

    \[M^{(s)}_i = \left[ \begin{array}{cccc} U_{0,i} & U_{1,i} & \cdots & U_{n-1,i} \\ U_{1,i} & U_{n,i} & \cdots & U_{2n-2,i} \\ \vdots & \vdots & \ddots & \vdots \\ U_{n-1,i} & U_{2n-2,i} & \cdots & U_{\frac{n(n+1)}{2}-1,i} \end{array} \right]\]

  • pixel_n

    A dependent variable with \(p=n\) components, interpreted as image/pixel components, \(\mathcal{P}_i= \left[ U_{0,i}, U_{1,i}, \ldots, U_{n-1,i}\right]\).

Here, \(n\) and \(m\) are integers.
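The following plain-Python sketch computes the number of components \(p\) implied by each QuantityType literal and expands the packed components of a symmetric_matrix_n value at one grid point into a full matrix, following the layout of \(M^{(s)}_i\) above (upper triangle stored row by row). The helper names number_of_components and unpack_symmetric are hypothetical and not part of csdmpy.

```python
def number_of_components(quantity_type):
    """Return p, the number of components implied by a QuantityType literal."""
    if quantity_type == "scalar":
        return 1
    parts = quantity_type.split("_")
    if parts[0] in ("vector", "pixel") and len(parts) == 2:
        return int(parts[1])                      # vector_n, pixel_n -> n
    if parts[0] == "matrix" and len(parts) == 3:
        return int(parts[1]) * int(parts[2])      # matrix_n_m -> n * m
    if parts[:2] == ["symmetric", "matrix"] and len(parts) == 3:
        n = int(parts[2])
        return n * (n + 1) // 2                   # symmetric_matrix_n -> n(n+1)/2
    raise ValueError(f"Unknown quantity type: {quantity_type!r}")


def unpack_symmetric(packed, n):
    """Expand the n(n+1)/2 packed values at one grid point into an n x n matrix.

    The upper triangle is filled row by row and mirrored below the diagonal.
    """
    if len(packed) != n * (n + 1) // 2:
        raise ValueError("expected n(n+1)/2 packed values")
    matrix = [[0] * n for _ in range(n)]
    k = 0
    for row in range(n):
        for col in range(row, n):
            matrix[row][col] = matrix[col][row] = packed[k]
            k += 1
    return matrix


print(number_of_components("symmetric_matrix_3"))
print(unpack_symmetric([0, 1, 2, 3, 4, 5], 3))
```

For \(n=3\), the packed values \(U_0, \ldots, U_5\) land at the positions shown in \(M^{(s)}_i\), with \(U_3 = U_n\) on the diagonal of the second row and \(U_5\) in the last diagonal position.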

ScalarQuantity

ScalarQuantity is an object composed of a numerical value and a unit, which may be any valid SI unit symbol or one of a number of accepted non-SI unit symbols. It is serialized in the JSON file as a string containing the numerical value followed by the unit symbol, for example,

  • “3.4 m” (SI)

  • “2.3 bar” (non-SI)
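csdmpy itself relies on astropy.units to interpret these strings. As a rough illustration of the string layout only, the hypothetical helper below splits a serialized ScalarQuantity into its numerical value and unit symbol; it does not check that the unit is valid, and treating a bare unit such as "s" as a value of 1 is an assumption of this sketch.

```python
import re

# Optional signed number (with optional exponent), optional whitespace, then the unit.
_QUANTITY = re.compile(r"^\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)?\s*(.*?)\s*$")


def parse_scalar_quantity(text):
    """Split a string such as "3.4 m" into (3.4, "m") (hypothetical helper).

    A string without a number, such as "s", is assumed to mean a value of 1.
    """
    match = _QUANTITY.match(text)
    number, unit = match.group(1), match.group(2)
    value = float(number) if number else 1.0
    return value, unit


print(parse_scalar_quantity("3.4 m"))
print(parse_scalar_quantity("2.3 bar"))
```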

Installation

Requirements

csdmpy has the following strict requirements:

Other requirements include:

Installing csdmpy

On Local machine (Using pip)

pip is the package installer for Python and is included with Python 3.4 and later. It is the easiest way to install Python packages.

$ pip install csdmpy

If you get a PermissionError, it usually means that you do not have the administrative access required to install new packages into your Python installation. In this case, consider adding the --user option to install the package into your home directory. You can read more about this option in the pip documentation.

$ pip install csdmpy --user

Upgrading to a newer version

To upgrade, run the following in a terminal or command prompt:

$ pip install csdmpy -U
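To confirm which version is installed after installing or upgrading, the following sketch queries the installed package metadata with the standard library's importlib.metadata module (Python 3.8 and later). The helper installed_version is hypothetical and returns None when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="csdmpy"):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("csdmpy version:", installed_version())
```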

On Google Colab Notebook

Colaboratory (Colab) is a Google Research project: a Jupyter notebook environment that runs entirely in the cloud. Launch a new notebook on Colab and, to install the package, type

!pip install csdmpy

in the first cell, and execute. All done! You may now start using the library.

Getting started with csdmpy package

We have put together a set of conventions for importing the csdmpy package and its related methods and attributes. We encourage users to follow these conventions for consistency. Import the package using

>>> import csdmpy as cp

To load a .csdf or a .csdfe file, use the load() method of the csdmpy module. In the following example, we load a sample test file.

>>> filename = cp.tests.test01 # replace this with your file's name.
>>> testdata1 = cp.load(filename)

Here, testdata1 is an instance of the CSDM class.

At the root level, the CSDM object includes several optional attributes that may contain additional information about the dataset. One of these is the description attribute, which briefly describes the contents of the dataset. To access its value, use

>>> testdata1.description
'A simulated sine curve.'

Accessing dimensions and dependent variables of the dataset

An instance of the CSDM object may include multiple dimensions and dependent variables. Collectively, the dimensions form a multi-dimensional grid, and the dependent variables populate this grid. In csdmpy, dimensions and dependent variables are stored as lists. To access these lists, use the dimensions and dependent_variables attributes of the CSDM object, respectively. For example,

>>> x = testdata1.dimensions
>>> y = testdata1.dependent_variables

In this example, the dataset contains one dimension and one dependent variable.

You may access an individual dimension or dependent variable by indexing these lists. For example, the dimension and dependent variable at index 0 are accessed using x[0] and y[0], respectively.

Every instance of the Dimension object has its own set of attributes that further describe the respective dimension. For example, a Dimension object may have an optional description attribute,

>>> x[0].description
'A temporal dimension.'

Similarly, every instance of the DependentVariable object has its own set of attributes. In this example, the description attribute from the dependent variable is

>>> y[0].description
'A response dependent variable.'

Coordinates along the dimension

Every dimension object contains a list of coordinates associated with every grid index along the dimension. To access these coordinates, use the coordinates attribute of the respective Dimension instance. In this example, the coordinates are

>>> x[0].coordinates
<Quantity [0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] s>

Note

x[0].coordinates returns a Quantity instance from the Astropy package. The csdmpy module uses the astropy.units module to handle physical quantities. The numerical value and the unit of a physical quantity are accessed through the value and unit attributes of the Quantity instance, respectively. Please refer to the astropy.units documentation for details. In csdmpy, Quantity.value is a NumPy array. For instance, in the above example, the underlying NumPy array of the coordinates attribute is accessed as

>>> x[0].coordinates.value
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

Components of the dependent variable

Every dependent variable object has at least one component. The number of components is determined by the quantity_type attribute of the dependent variable object. For example, a scalar quantity has one component, while a vector quantity may have multiple components. To access the components of the dependent variable, use the components attribute of the respective DependentVariable instance. For example,

>>> y[0].components
array([[ 0.0000000e+00,  5.8778524e-01,  9.5105654e-01,  9.5105654e-01,
         5.8778524e-01,  1.2246469e-16, -5.8778524e-01, -9.5105654e-01,
        -9.5105654e-01, -5.8778524e-01]], dtype=float32)

The components attribute is a NumPy array. Note that the number of dimensions of this array is \(d+1\), where \(d\) is the number of Dimension objects in the dimensions attribute. The additional axis of the NumPy array indexes the components of the dependent variable. For instance, in this example there is a single dimension, i.e., \(d=1\), and therefore the components attribute holds a two-dimensional NumPy array of shape

>>> y[0].components.shape
(1, 10)

where the first element of the shape tuple, 1, is the number of components of the dependent variable and the second element, 10, is the number of points along the dimension, i.e., the length of x[0].coordinates.

Plotting the dataset

It is always helpful to represent a scientific dataset with visual aids such as a plot or a figure instead of columns of numbers. Throughout this documentation, we therefore provide a figure or two for every example dataset. We use Python’s Matplotlib library to generate these figures; you may, however, use your favorite plotting library.

The following snippet plots the dataset from this example. Here, the axis_label is an attribute of both Dimension and DependentVariable instances, and the name is an attribute of the DependentVariable instance.

>>> import matplotlib.pyplot as plt

>>> plt.figure(figsize=(5, 3.5))  
>>> plt.plot(x[0].coordinates, y[0].components[0])  
>>> plt.xlabel(x[0].axis_label)  
>>> plt.ylabel(y[0].axis_label[0])  
>>> plt.title(y[0].name)  
>>> plt.tight_layout()  
>>> plt.show()


Serializing CSDM object to file

An instance of a CSDM object is serialized as a .csdf or .csdfe JSON-format file with the save() method. When serializing a dependent variable from the CSDM object to the data file, the csdmpy module uses the value of the dependent variable’s encoding attribute to determine the encoding of the serialized data. There are three encoding types for dependent variables:

  • none

  • base64

  • raw

Note

By default, all instances of DependentVariable from a CSDM object are serialized as base64 strings.

For the following examples, consider data as an instance of the CSDM class.

To serialize a dependent variable with a given encoding type, set its encoding attribute to the respective encoding. For example,

As none encoding

>>> data.dependent_variables[0].encoding = "none"
>>> data.save('my_file.csdf')

The above code will serialize the dependent variable at index zero to a JSON file, my_file.csdf, where each component of the dependent variable is serialized as an array of JSON numbers.

As base64 encoding

>>> data.dependent_variables[0].encoding = "base64"
>>> data.save('my_file.csdf')

The above code will serialize the dependent variable at index zero to a JSON file, my_file.csdf, where each component of the dependent variable is serialized as a base64 string.
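As a rough sketch of what base64 encoding involves, the code below packs one component as 32-bit floats, encodes the bytes as a base64 string, and decodes it again using only the standard library. The helper names are hypothetical, and the little-endian byte order is an assumption of this sketch rather than a statement about csdmpy's implementation.

```python
import array
import base64
import sys


def encode_component(values, typecode="f"):
    """Pack one component as binary and return it as a base64 string.

    typecode "f" is a 4-byte float, matching the float32 numeric type.
    Bytes are normalized to little-endian order (an assumption).
    """
    buffer = array.array(typecode, values)
    if sys.byteorder == "big":
        buffer.byteswap()
    return base64.b64encode(buffer.tobytes()).decode("ascii")


def decode_component(text, typecode="f"):
    """Inverse of encode_component: base64 string -> list of numbers."""
    buffer = array.array(typecode)
    buffer.frombytes(base64.b64decode(text))
    if sys.byteorder == "big":
        buffer.byteswap()
    return buffer.tolist()


encoded = encode_component([0.0, 0.5, -1.25])
print(encoded)
print(decode_component(encoded))
```

The values in the usage example are exactly representable as 32-bit floats, so the round trip returns them unchanged.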

As raw encoding

>>> data.dependent_variables[0].encoding = "raw"
>>> data.save('my_file.csdfe')

The above code will serialize the metadata of the dependent variable at index zero to a JSON file, my_file.csdfe, which includes a link to an external file where the components of the dependent variable are serialized as a binary array. The binary file is named my_file_0.dat, where my_file is the filename from the argument of the save() method and 0 is the index of the dependent variable in the CSDM object.
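The naming convention for the external binary file can be sketched with the standard library as follows. The helper write_raw_component is hypothetical: it writes a flat list of values as 64-bit floats, assumes little-endian byte order, and creates the file in a temporary directory for demonstration.

```python
import array
import sys
import tempfile
from pathlib import Path


def write_raw_component(csdfe_path, index, values, typecode="d"):
    """Write one dependent variable's values as a raw binary file (hypothetical).

    The file is named <stem>_<index>.dat and placed next to the .csdfe file,
    following the my_file_0.dat naming described above. typecode "d" is an
    8-byte float (float64); bytes are written little-endian (an assumption).
    """
    csdfe_path = Path(csdfe_path)
    out_path = csdfe_path.with_name(f"{csdfe_path.stem}_{index}.dat")
    buffer = array.array(typecode, values)
    if sys.byteorder == "big":
        buffer.byteswap()
    out_path.write_bytes(buffer.tobytes())
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    out = write_raw_component(Path(tmp) / "my_file.csdfe", 0, [0.0, 1.5, -2.0])
    size = out.stat().st_size  # 3 values x 8 bytes each
    print(out.name, size)
```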

Multiple encoding types

In the case of multiple dependent variables, you may choose to serialize each dependent variable with a different encoding, for example,

>>> my_data.dependent_variables[0].encoding = "raw"
>>> my_data.dependent_variables[1].encoding = "base64"
>>> my_data.dependent_variables[2].encoding = "none"
>>> my_data.dependent_variables[3].encoding = "base64"
>>> my_data.save('my_file.csdfe')

In the above example, my_data is a CSDM object containing four DependentVariable objects. Here, we serialize the dependent variable at index two with none encoding, the dependent variables at indexes one and three with base64 encoding, and the dependent variable at index zero with raw encoding.

Note

Because the dependent variable at index zero in the above example is serialized with raw encoding, which makes it an external subtype, the corresponding file should be saved with a .csdfe extension.
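The rule in the note above can be expressed as a small hypothetical helper that picks the file extension from the encodings of all dependent variables. It is an illustration only; csdmpy's save() method does not necessarily work this way.

```python
VALID_ENCODINGS = {"none", "base64", "raw"}


def required_extension(encodings):
    """Return ".csdfe" if any dependent variable uses raw encoding, else ".csdf".

    Hypothetical helper: raw-encoded components live in an external binary
    file, so the JSON file must use the external (.csdfe) extension.
    """
    unknown = set(encodings) - VALID_ENCODINGS
    if unknown:
        raise ValueError(f"Unknown encoding(s): {sorted(unknown)}")
    return ".csdfe" if "raw" in encodings else ".csdf"


print(required_extension(["raw", "base64", "none", "base64"]))
print(required_extension(["none", "base64"]))
```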

Using csdmpy objects

The csdmpy module is not limited to deserializing and serializing .csdf or .csdfe files. It can also be used to create new datasets, a feature that is especially useful when converting datasets to CSDM-compliant files.

Generating Dimension objects

LinearDimension

A LinearDimension is one whose coordinates are regularly spaced along the dimension. This type of dimension is frequently encountered in scientific datasets. There are several ways to generate a LinearDimension.

Using the Dimension class.

>>> import csdmpy as cp
>>> x = cp.Dimension(type='linear', count=10, increment="0.1 s", label="time", description="A temporal dimension.")
>>> print(x)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] s)

Using the LinearDimension class.

>>> import csdmpy as cp
>>> x1 = cp.LinearDimension(count=10, increment="0.1 s", label="time",
...                          description="A temporal dimension.")
>>> print(x1)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] s)
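A LinearDimension is described by its count and increment, so its coordinates follow \(x_j = x_{\text{offset}} + j\,\Delta x\) for \(j = 0, \ldots, \text{count}-1\). The sketch below evaluates this formula in plain Python; the helper linear_coordinates and its offset parameter are hypothetical names used for illustration, and with an offset of zero it reproduces the coordinate values printed above (without units).

```python
def linear_coordinates(count, increment, offset=0.0):
    """Return count regularly spaced coordinates: offset + j * increment.

    Hypothetical helper; units are not tracked here.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    return [offset + j * increment for j in range(count)]


coords = linear_coordinates(count=10, increment=0.1)
# Rounding hides floating-point noise such as 3 * 0.1 == 0.30000000000000004.
print([round(c, 10) for c in coords])
```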

Using NumPy array

You may also create a LinearDimension object from a one-dimensional NumPy array using the as_dimension() method.

>>> import numpy as np
>>> array = np.arange(10) * 0.1
>>> x2 = cp.as_dimension(array)
>>> print(x2)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9])

Note that the Dimension object x2 is dimensionless. You can create a physical dimension either by providing an appropriate unit as an argument to the as_dimension() method,

>>> x3 = cp.as_dimension(array, unit='s')
>>> print(x3)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] s)

or by multiplying the dimension object x2 by an appropriate ScalarQuantity.

>>> x2 *= cp.ScalarQuantity('s')
>>> print(x2)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] s)

The coordinates of the x2 LinearDimension object are

>>> x2.coordinates
<Quantity [0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] s>

where x2.coordinates is a Quantity array. The value and the unit of the quantity instance are

>>> # To access the NumPy array
>>> numpy_array = x2.coordinates.value
>>> print('numpy array =', numpy_array)
numpy array = [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]

>>> # To access the astropy unit
>>> unit = x2.coordinates.unit
>>> print('unit =', unit)
unit = s

respectively.

Note

When generating a LinearDimension object from a NumPy array, the array must be one-dimensional and regularly spaced.

>>> cp.as_dimension(np.arange(20).reshape(2, 10))  
ValueError: Cannot convert a 2 dimensional array to a Dimension object.

MonotonicDimension

A MonotonicDimension is one where the coordinates along the dimension are sampled monotonically, that is, either strictly increasing or decreasing coordinates. Like the LinearDimension, there are several ways to generate a MonotonicDimension.

Using the Dimension class.

>>> import csdmpy as cp
>>> x = cp.Dimension(type='monotonic',
...                  coordinates=['10ns', '100ns', '1µs', '10µs', '100µs',
...                               '1ms', '10ms', '100ms', '1s', '10s'])
>>> print(x)
MonotonicDimension([1.e+01 1.e+02 1.e+03 1.e+04 1.e+05 1.e+06 1.e+07 1.e+08 1.e+09 1.e+10] ns)

Using the MonotonicDimension class.

>>> import numpy as np
>>> array = np.asarray([-0.28758166, -0.22712233, -0.19913859, -0.17235106,
...                     -0.1701172, -0.10372635, -0.01817061, 0.05936719,
...                     0.18141424, 0.34758913])
>>> x = cp.MonotonicDimension(coordinates=array)*cp.ScalarQuantity('cm')
>>> print(x)
MonotonicDimension([-0.28758166 -0.22712233 -0.19913859 -0.17235106 -0.1701172  -0.10372635
 -0.01817061  0.05936719  0.18141424  0.34758913] cm)

In the above example, we generate a dimensionless MonotonicDimension from the NumPy array and then give it a physical unit by multiplying the object by an appropriate ScalarQuantity.

From NumPy arrays.

Use the as_dimension() method to convert a NumPy array into a Dimension object.

>>> numpy_array = 10 ** (np.arange(10)/10)
>>> x_dim = cp.as_dimension(numpy_array, unit='A')
>>> print(x_dim)
MonotonicDimension([1.         1.25892541 1.58489319 1.99526231 2.51188643 3.16227766
 3.98107171 5.01187234 6.30957344 7.94328235] A)

When generating a MonotonicDimension object from a NumPy array, the array must be monotonic, that is, either strictly increasing or strictly decreasing. Otherwise, an exception is raised.

>>> numpy_array = np.random.rand(10)
>>> x_dim = cp.as_dimension(numpy_array) 
Exception: Invalid array for Dimension object.

LabeledDimension

A LabeledDimension is one where the coordinates along the dimension are string labels. You can similarly generate a labeled dimension.

Using the Dimension class.

>>> import csdmpy as cp
>>> x = cp.Dimension(type='labeled',
...                  labels=['The', 'great', 'circle'])
>>> print(x)
LabeledDimension(['The' 'great' 'circle'])

Using the LabeledDimension class.

>>> x = cp.LabeledDimension(labels=['The', 'great', 'circle'])
>>> print(x)
LabeledDimension(['The' 'great' 'circle'])

From NumPy arrays or Python lists.

Use the as_dimension() method to convert a NumPy array or a Python list into a Dimension object.

>>> array = ['The', 'great', 'circle']
>>> x = cp.as_dimension(array)
>>> print(x)
LabeledDimension(['The' 'great' 'circle'])
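The way as_dimension() chooses a subtype, as described in this section, can be summarized by the hypothetical sketch below: strings give a labeled dimension, evenly spaced numbers a linear one, strictly increasing or decreasing numbers a monotonic one, and anything else is rejected. This is not csdmpy's implementation, and the floating-point tolerance used to decide whether the spacing is regular is an assumption.

```python
import math


def guess_dimension_subtype(values, rel_tol=1e-9):
    """Guess which Dimension subtype a 1-D sequence would map to (hypothetical)."""
    values = list(values)
    if values and all(isinstance(v, str) for v in values):
        return "labeled"
    if len(values) < 2 or not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("Expecting a 1-D sequence of strings or of at least two numbers.")
    steps = [b - a for a, b in zip(values, values[1:])]
    # Regular spacing (within tolerance) -> linear.
    if steps[0] != 0 and all(
        math.isclose(s, steps[0], rel_tol=rel_tol, abs_tol=1e-12) for s in steps
    ):
        return "linear"
    # Strictly increasing or strictly decreasing -> monotonic.
    if all(s > 0 for s in steps) or all(s < 0 for s in steps):
        return "monotonic"
    raise ValueError("Invalid array for Dimension object.")


print(guess_dimension_subtype([j * 0.1 for j in range(10)]))
print(guess_dimension_subtype([10 ** (j / 10) for j in range(10)]))
print(guess_dimension_subtype(["The", "great", "circle"]))
```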

Generating DependentVariable objects

A DependentVariable is where the responses of the multi-dimensional dataset reside. There are two types of DependentVariable objects, internal and external. In this section, we show how to generate DependentVariable objects of both types.

InternalDependentVariable

Single component dependent variable

Using the DependentVariable class.

>>> dv1 = cp.DependentVariable(type='internal', quantity_type='scalar',
...                            components=np.arange(10000), unit='J',
...                            description='A sample internal dependent variable.')
>>> print(dv1)
DependentVariable(
[[   0    1    2 ... 9997 9998 9999]] J, quantity_type=scalar, numeric_type=int64)

Using NumPy array

Use the as_dependent_variable() method to convert a NumPy array into a DependentVariable object. Note that this method returns a view of the NumPy array as the DependentVariable object.

>>> dv1 = cp.as_dependent_variable(np.arange(10000).astype(np.complex64), unit='J')
>>> print(dv1)
DependentVariable(
[[0.000e+00+0.j 1.000e+00+0.j 2.000e+00+0.j ... 9.997e+03+0.j
  9.998e+03+0.j 9.999e+03+0.j]] J, quantity_type=scalar, numeric_type=complex64)

You may additionally provide the quantity_type for the dependent variable,

>>> dv2 = cp.as_dependent_variable(np.arange(10000).astype(np.complex64), quantity_type='pixel_1')
>>> print(dv2)
DependentVariable(
[[0.000e+00+0.j 1.000e+00+0.j 2.000e+00+0.j ... 9.997e+03+0.j
  9.998e+03+0.j 9.999e+03+0.j]], quantity_type=pixel_1, numeric_type=complex64)

Multi-component dependent variable

To generate a multi-component DependentVariable object, set an appropriate quantity_type value (see QuantityType for details).

Using the DependentVariable class.

>>> dv1 = cp.DependentVariable(type='internal', quantity_type='vector_2',
...                            components=np.arange(10000), unit='J',
...                            description='A sample internal dependent variable.')
>>> print(dv1)
DependentVariable(
[[   0    1    2 ... 4997 4998 4999]
 [5000 5001 5002 ... 9997 9998 9999]] J, quantity_type=vector_2, numeric_type=int64)

The above example generates a two-component dependent variable.

Using NumPy array

>>> dv1 = cp.as_dependent_variable(np.arange(9000).astype(np.complex64),
...                                unit='m/s', quantity_type='symmetric_matrix_3')
>>> print(dv1)
DependentVariable(
[[0.000e+00+0.j 1.000e+00+0.j 2.000e+00+0.j ... 1.497e+03+0.j
  1.498e+03+0.j 1.499e+03+0.j]
 [1.500e+03+0.j 1.501e+03+0.j 1.502e+03+0.j ... 2.997e+03+0.j
  2.998e+03+0.j 2.999e+03+0.j]
 [3.000e+03+0.j 3.001e+03+0.j 3.002e+03+0.j ... 4.497e+03+0.j
  4.498e+03+0.j 4.499e+03+0.j]
 [4.500e+03+0.j 4.501e+03+0.j 4.502e+03+0.j ... 5.997e+03+0.j
  5.998e+03+0.j 5.999e+03+0.j]
 [6.000e+03+0.j 6.001e+03+0.j 6.002e+03+0.j ... 7.497e+03+0.j
  7.498e+03+0.j 7.499e+03+0.j]
 [7.500e+03+0.j 7.501e+03+0.j 7.502e+03+0.j ... 8.997e+03+0.j
  8.998e+03+0.j 8.999e+03+0.j]] m / s, quantity_type=symmetric_matrix_3, numeric_type=complex64)

The above example generates a six-component dependent variable.

Note

For multi-component DependentVariable objects, the size of the NumPy array must be an integer multiple of the total number of components.

>>> d1 = cp.as_dependent_variable(np.arange(127), quantity_type='pixel_2') 
ValueError: cannot reshape array of size 127 into shape (2,63)

Notice that in the above examples we use a one-dimensional NumPy array to generate a DependentVariable object. If a multi-dimensional NumPy array is given as the argument, the array is raveled (flattened) before the DependentVariable object is returned. Note that, in the core scientific dataset model, DependentVariable objects contain information only about the number of components, not about the dimensions. For example, consider the following.

>>> d2 = cp.as_dependent_variable(np.arange(6000).reshape(10,20,30), quantity_type='vector_2')
>>> print(d2)
DependentVariable(
[[   0    1    2 ... 2997 2998 2999]
 [3000 3001 3002 ... 5997 5998 5999]], quantity_type=vector_2, numeric_type=int64)

Here, a three-dimensional NumPy array is given as the argument with a quantity_type of vector_2. The DependentVariable object generated from this array contains two components, obtained by flattening the input array.
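The flattening and splitting described above can be mimicked in plain Python. In the hypothetical sketch below, nested lists stand in for a multi-dimensional NumPy array: the values are flattened in row-major order and split into \(p\) equal-length components, and a ValueError is raised when the total size is not an integer multiple of \(p\). csdmpy itself relies on NumPy for this.

```python
def flatten(values):
    """Yield the leaves of arbitrarily nested lists/tuples in row-major order."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield v


def split_components(values, p):
    """Flatten values and split them into p equal-length components (hypothetical)."""
    flat = list(flatten(values))
    if p < 1 or len(flat) % p != 0:
        raise ValueError(f"cannot split {len(flat)} values into {p} equal components")
    size = len(flat) // p
    return [flat[q * size:(q + 1) * size] for q in range(p)]


print(split_components([[0, 1], [2, 3], [4, 5]], 2))
```

With 6000 values and \(p = 2\), the second component starts at 3000, matching the vector_2 example above.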

ExternalDependentVariable

ExternalDependentVariable objects are generated similarly to InternalDependentVariable objects. The only difference is that the components of the dependent variable are located at a remote or local address.

Using the DependentVariable class.

>>> dv = cp.DependentVariable(type='external', quantity_type='scalar', unit='J',
...                           components_url='address to the binary file.',
...                           numeric_type='int64',
...                           description='A sample external dependent variable.')

A DependentVariable of type external is useful for data serialization. When used with csdmpy, all external dependent variable objects are converted to internal ones after their components are downloaded from the components_url.
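Resolving an external dependent variable amounts to reading the binary payload at components_url and decoding it into numbers. The sketch below does this with urllib and array from the standard library; fetch_external_components is a hypothetical helper, the int64 typecode and little-endian byte order are assumptions, and a local file:// URL stands in for a remote address.

```python
import array
import sys
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_external_components(components_url, typecode="q"):
    """Download raw binary values from components_url and return them as a list.

    Hypothetical helper: assumes little-endian values of a single numeric
    type ("q" is an 8-byte signed integer, matching int64).
    """
    with urlopen(components_url) as response:
        payload = response.read()
    values = array.array(typecode)
    values.frombytes(payload)
    if sys.byteorder == "big":
        values.byteswap()
    return values.tolist()


# Demonstration with a local file:// URL instead of a remote address.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "components_0.dat"
    data = array.array("q", [0, 1, 2, 3])
    if sys.byteorder == "big":
        data.byteswap()
    path.write_bytes(data.tobytes())
    components = fetch_external_components(path.as_uri())

print(components)
```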

Generating CSDM objects

An empty csdm object

To create a new empty csdm object, import the csdmpy module and create a new instance of the CSDM class following,

>>> import csdmpy as cp
>>> new_data = cp.new(description='A new test dataset')

The new() method returns an instance of the CSDM class with zero dimensions and zero dependent variables, i.e., a 0D{0} dataset. In the above example, this instance is assigned to the new_data variable. Optionally, a description may be provided as an argument of the new() method. The data structure from the above example is

>>> print(new_data.data_structure)
{
  "csdm": {
    "version": "1.0",
    "description": "A new test dataset"
  }
}

From a NumPy array

Perhaps the easiest way to generate a CSDM object is to convert the NumPy array holding the dataset into a CSDM object with the as_csdm() method, which returns a view of the array as a CSDM object. Here, the NumPy array becomes the dependent variable of the CSDM object with the given quantity_type. Unlike the as_dependent_variable() method, however, the as_csdm() method retains the shape of the NumPy array and uses this information to generate the dimensions of the CSDM object. By default, the dimensions are of the linear subtype with unit increment. Consider the following example.

>>> array = np.arange(30).reshape(3, 10)
>>> csdm_obj = cp.as_csdm(array)
>>> print(csdm_obj)
CSDM(
DependentVariable(
[[[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]
  [20 21 22 23 24 25 26 27 28 29]]], quantity_type=scalar, numeric_type=int64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]),
LinearDimension([0. 1. 2.])
)

Here, a two-dimensional NumPy array of shape (3, 10) is given as the argument of the as_csdm() method. The resulting CSDM object, csdm_obj, is a 2D{1} dataset with two linear dimensions of unit increment, with 10 and 3 points, respectively, and a single one-component dependent variable of quantity_type scalar.

Note

The order of the dimensions in the CSDM object is the reverse of the order of axes from the corresponding Numpy array. Thus, the dimension at index 0 of the CSDM object is the last axis of the Numpy array.

You may additionally provide a quantity type as an argument of the as_csdm() method. When the quantity type requires more than one component (see QuantityType), the first axis of the NumPy array must hold the components. For example,

>>> csdm_obj1 = cp.as_csdm(array, quantity_type='pixel_3')
>>> print(csdm_obj1)
CSDM(
DependentVariable(
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]], quantity_type=pixel_3, numeric_type=int64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.])
)

Here, the csdm_obj1 object is a 1D{3} dataset with a single three-component dependent variable. In this case, the length of the NumPy array along axis 0, i.e., 3, is consistent with the number of components required by the quantity type pixel_3. The remaining axes of the NumPy array are used to generate the dimensions of the csdm object; in this example, a single linear dimension with 10 points.

The following example generates a 3D{2} vector dataset. Here, the first axis of the four-dimensional NumPy array holds the components of the vector dataset, and the remaining three axes become the dimensions.

>>> array2 = np.arange(12000).reshape(2,30,20,10)
>>> csdm_obj2 = cp.as_csdm(array2, quantity_type='vector_2')
>>> print(len(csdm_obj2.dimensions), len(csdm_obj2.dependent_variables[0].components))
3 2

An exception will be raised if the quantity_type and the number of points along the first axis of the NumPy array are inconsistent, for example,

>>> csdm_obj_err = cp.as_csdm(array, quantity_type='vector_2')  
ValueError: Expecting exactly 2 components for quantity type, `vector_2`, found 3.
Make sure `array.shape[0]` is equal to the number of components supported by vector_2.
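
The consistency rule can be sketched in plain Python. The component counts per quantity type below are our reading of the CSDM specification (see QuantityType for the authoritative list); the helper names n_components and split_shape are hypothetical, this is not csdmpy's implementation, and the error message differs from the library's.

```python
def n_components(quantity_type):
    """Hypothetical helper: number of components a quantity type needs.

    Assumed rules: scalar -> 1, vector_n / pixel_n -> n,
    matrix_n_m -> n * m, symmetric_matrix_n -> n * (n + 1) / 2.
    """
    parts = quantity_type.split("_")
    if parts == ["scalar"]:
        return 1
    if parts[0] in ("vector", "pixel") and len(parts) == 2:
        return int(parts[1])
    if parts[0] == "matrix" and len(parts) == 3:
        return int(parts[1]) * int(parts[2])
    if parts[:2] == ["symmetric", "matrix"] and len(parts) == 3:
        n = int(parts[2])
        return n * (n + 1) // 2
    raise ValueError(f"unknown quantity type: {quantity_type!r}")


def split_shape(shape, quantity_type):
    """Hypothetical helper: (components, dimension counts) for an array shape.

    Multi-component data take the components from axis 0; the remaining
    axes become dimensions in reverse order.
    """
    n = n_components(quantity_type)
    if n == 1:
        # Single-component data: every axis becomes a dimension.
        return 1, list(reversed(shape))
    if shape[0] != n:
        raise ValueError(
            f"{quantity_type} needs {n} components along axis 0, found {shape[0]}"
        )
    return n, list(reversed(shape[1:]))


print(split_shape((3, 10), "pixel_3"))           # (3, [10])
print(split_shape((2, 30, 20, 10), "vector_2"))  # (2, [10, 20, 30])
try:
    split_shape((3, 10), "vector_2")
except ValueError as err:
    print("ValueError:", err)
```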

Note

Only a csdm object with a single dependent variable may be created from a NumPy array. To add more dependent variables to the CSDM object, see Adding DependentVariable objects to CSDM object.

Adding Dimension objects to CSDM object¶

There are three subtypes of Dimension objects,

  • LinearDimension

  • MonotonicDimension

  • LabeledDimension

Using an instance of the Dimension class

Please read the topic Generating Dimension objects for details on how to generate an instance of the Dimension class. Once created, use the dimensions to generate a CSDM object.

>>> linear_dim = cp.LinearDimension(count=10, increment='0.1 C/V')
>>> new_data = cp.CSDM(dimensions=[linear_dim])
>>> print(new_data)
CSDM(
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] C / V)
)

Using Python’s dictionary objects

When using Python dictionaries, the key-value pairs of each dictionary must form a valid collection for the given Dimension subtype. For example,

>>> # dictionary representation of a linear dimension.
>>> d0 = {
...     'type': 'linear',
...     'description': 'This is a linear dimension',
...     'count': 5,
...     'increment': '0.1 rad'
... }
>>> # dictionary representation of a monotonic dimension.
>>> d1 = {
...     'type': 'monotonic',
...     'description': 'This is a monotonic dimension',
...     'coordinates': ['1 m/s', '2 cm/s', '4 mm/s'],
... }
>>> # dictionary representation of a labeled dimension.
>>> d2 = {
...     'type': 'labeled',
...     'description': 'This is a labeled dimension',
...     'labels': ['Cu', 'Ag', 'Au'],
... }
>>> # add the dictionaries to the CSDM object.
>>> new_data = cp.CSDM(dimensions=[d0, d1, d2])
>>> print(new_data)
CSDM(
LinearDimension([0.  0.1 0.2 0.3 0.4] rad),
MonotonicDimension([1.    0.02  0.004] m / s),
LabeledDimension(['Cu' 'Ag' 'Au'])
)
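
Before passing dictionaries to CSDM, a quick pre-check of the required keys can catch typos early. The sketch below is hypothetical (csdmpy performs its own validation), and the required keys per subtype (count and increment for linear, coordinates for monotonic, labels for labeled) are our reading of the CSDM specification.

```python
# Hypothetical pre-check, not csdmpy's own validation. The required keys per
# subtype are assumed from the CSDM specification.
REQUIRED_KEYS = {
    "linear": {"count", "increment"},
    "monotonic": {"coordinates"},
    "labeled": {"labels"},
}


def missing_dimension_keys(d):
    """Return the required keys that the dimension dictionary `d` lacks."""
    required = REQUIRED_KEYS.get(d.get("type"))
    if required is None:
        raise ValueError(f"unknown dimension type: {d.get('type')!r}")
    return sorted(required - set(d))


d0 = {"type": "linear", "count": 5, "increment": "0.1 rad"}
d_bad = {"type": "labeled", "description": "labels are missing"}
print(missing_dimension_keys(d0))     # []
print(missing_dimension_keys(d_bad))  # ['labels']
```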

Adding DependentVariable objects to CSDM object¶

There are two subtypes of the DependentVariable class:

  • InternalDependentVariable: We refer to an instance of the DependentVariable as internal when the components of the dependent variable are listed along with the other metadata specifying the dependent variable.

  • ExternalDependentVariable: We refer to an instance of the DependentVariable as external when the components of the dependent variable are stored in an external file as binary data either locally or at a remote server.

Using an instance of the DependentVariable class

Please read the topic Generating DependentVariable objects for details on how to generate an instance of the DependentVariable class. Once created, use the dependent variables to generate a CSDM object.

>>> dv = cp.as_dependent_variable(np.arange(10))
>>> new_data = cp.CSDM(dependent_variables=[dv])
>>> print(new_data)
CSDM(
DependentVariable(
[[0 1 2 3 4 5 6 7 8 9]], quantity_type=scalar, numeric_type=int64)
)

Using Python’s dictionary objects

When using Python dictionaries, the key-value pairs of each dictionary must form a valid collection for the given DependentVariable subtype. For example,

>>> dv0 = {
...     'type': 'internal',
...     'quantity_type': 'scalar',
...     'description': 'This is an internal scalar dependent variable',
...     'unit': 'cm',
...     'components': np.arange(50)
... }
>>> dv1 = {
...     'type': 'internal',
...     'quantity_type': 'vector_2',
...     'description': 'This is an internal vector dependent variable',
...     'unit': 'cm',
...     'components': np.arange(100)
... }
>>> new_data = cp.CSDM(dependent_variables=[dv0, dv1])
>>> print(new_data)
CSDM(
DependentVariable(
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
  48 49]] cm, quantity_type=scalar, numeric_type=int64),
DependentVariable(
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
  48 49]
 [50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
  74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97
  98 99]] cm, quantity_type=vector_2, numeric_type=int64)
)
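
In the output above, the 100 values given as the components of dv1 appear as two rows of 50 values, which is the layout of a row-major reshape to (2, 50). Assuming csdmpy splits flat components this way (inferred from the printed output, not from the library's source), plain NumPy reproduces the arrangement:

```python
import numpy as np

flat = np.arange(100)  # the 'components' value given for dv1
n_components = 2       # vector_2 has two components

# Row-major reshape: one row per component, points along the second axis.
components = flat.reshape(n_components, -1)
print(components.shape)                     # (2, 50)
print(components[0, -1], components[1, 0])  # 49 50
```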

Interacting with csdmpy objects¶

Interacting with Dimension objects¶

LinearDimension¶

There are several attributes and methods associated with the LinearDimension, each controlling the coordinates along the dimension. The following section demonstrates the effect of these attributes and methods on the coordinates of the LinearDimension.

>>> import csdmpy as cp
>>> x = cp.LinearDimension(count=10, increment="0.1 s", label="time",
...                          description="A temporal dimension.")
>>> print(x)
LinearDimension([0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] s)

Attributes¶

type

This attribute returns the type of the instance.

>>> print(x.type)
linear

The attributes that modify the coordinates

count

The number of points along the dimension

>>> print('number of points =', x.count)
number of points = 10

To update the number of points, update the value of this attribute,

>>> x.count = 12
>>> print('new number of points =', x.count)
new number of points = 12

>>> print('new coordinates =', x.coordinates)
new coordinates = [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1] s

increment

>>> print('old increment =', x.increment)
old increment = 0.1 s

>>> x.increment = "10 s"
>>> print('new increment =', x.increment)
new increment = 10.0 s

>>> print('new coordinates =', x.coordinates)
new coordinates = [  0.  10.  20.  30.  40.  50.  60.  70.  80.  90. 100. 110.] s

coordinates_offset

>>> print('old reference offset =', x.coordinates_offset)
old reference offset = 0.0 s

>>> x.coordinates_offset = "1 s"
>>> print('new reference offset =', x.coordinates_offset)
new reference offset = 1.0 s

>>> print('new coordinates =', x.coordinates)
new coordinates = [  1.  11.  21.  31.  41.  51.  61.  71.  81.  91. 101. 111.] s

origin_offset

>>> print('old origin offset =', x.origin_offset)
old origin offset = 0.0 s

>>> x.origin_offset = "1 day"
>>> print('new origin offset =', x.origin_offset)
new origin offset = 1.0 d

>>> print('new coordinates =', x.coordinates)
new coordinates = [  1.  11.  21.  31.  41.  51.  61.  71.  81.  91. 101. 111.] s

The last operation updates the value of the origin offset; however, the coordinates remain unaffected. This is because the coordinates attribute refers to the reference coordinates. You may access the absolute coordinates, i.e., the reference coordinates shifted by the origin offset, through the absolute_coordinates attribute.

>>> print('absolute coordinates =', x.absolute_coordinates)
absolute coordinates = [86401. 86411. 86421. 86431. 86441. 86451. 86461. 86471. 86481. 86491.
 86501. 86511.] s
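
The values printed above follow the usual definition of a linear grid: the reference coordinate of point k is coordinates_offset + k × increment, and the absolute coordinate adds origin_offset. The sketch below reproduces the numbers with plain NumPy, with every quantity converted by hand to seconds (1 day = 86400 s). It illustrates the arithmetic only and is not csdmpy's implementation.

```python
import numpy as np

count = 12
increment = 10.0          # s
coordinates_offset = 1.0  # s
origin_offset = 86400.0   # 1 day, converted by hand to s

# Reference coordinates: coordinates_offset + k * increment, k = 0..count-1
coordinates = coordinates_offset + increment * np.arange(count)

# Absolute coordinates: reference coordinates shifted by the origin offset
absolute_coordinates = coordinates + origin_offset

print(coordinates)
print(absolute_coordinates)
```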

The attributes that modify the order of coordinates

complex_fft

If True, orders the coordinates along the dimension according to the output of a complex Fast Fourier Transform (FFT) routine.

>>> print('old coordinates =', x.coordinates)
old coordinates = [  1.  11.  21.  31.  41.  51.  61.  71.  81.  91. 101. 111.] s

>>> x.complex_fft = True
>>> print('new coordinates =', x.coordinates)
new coordinates = [-59. -49. -39. -29. -19.  -9.   1.  11.  21.  31.  41.  51.] s
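
The FFT-ordered coordinates above are consistent with shifting the grid index so that it runs from -(count // 2) to count - count // 2 - 1, which is the order NumPy's fftshift gives to fftfreq. The sketch assumes this rule (inferred from the output, not from csdmpy's source) and cross-checks the index order against numpy.fft.

```python
import numpy as np

count, increment, coordinates_offset = 12, 10.0, 1.0

# FFT-ordered grid index: runs from -(count // 2) up to count - count // 2 - 1
indices = np.arange(count) - count // 2
fft_coordinates = coordinates_offset + increment * indices
print(fft_coordinates)

# Same index order as NumPy's shifted FFT frequency grid.
assert np.allclose(indices, np.fft.fftshift(np.fft.fftfreq(count, d=1 / count)))
```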

Other attributes

period

The period of the dimension.

>>> print('old period =', x.period)
old period = inf s

>>> x.period = '10 s'
>>> print('new period =', x.period)
new period = 10.0 s

quantity_name

Returns the quantity name.

>>> print('quantity name is', x.quantity_name)
quantity name is time

label

>>> x.label
'time'

>>> x.label = 't1'
>>> x.label
't1'

axis_label

Returns a formatted string for labeling axis.

>>> x.label
't1'
>>> x.axis_label
't1 / (s)'

Methods¶

to(): This method is used for unit conversions.

>>> print('old unit =', x.coordinates.unit)
old unit = s

>>> print('old coordinates =', x.coordinates)
old coordinates = [-59. -49. -39. -29. -19.  -9.   1.  11.  21.  31.  41.  51.] s

>>> ## unit conversion
>>> x.to('min')

>>> print('new coordinates =', x.coordinates)
new coordinates = [-0.98333333 -0.81666667 -0.65       -0.48333333 -0.31666667 -0.15
  0.01666667  0.18333333  0.35        0.51666667  0.68333333  0.85      ] min

Note

In the above example, the coordinates are in FFT output order because the complex_fft attribute was set to True in an earlier step.

The argument of this method is a string containing the unit, in this case, min, whose dimensionality must be consistent with the dimensionality of the coordinates. An exception will be raised otherwise.

>>> x.to('km/s')  
Exception: The unit 'km / s' (speed) is inconsistent with the unit 'min' (time).
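
The dimensionality rule can be mimicked with a toy converter that attaches a physical type and a scale factor to each unit. The unit table and convert function below are hypothetical and cover only a few units used on this page; they illustrate the rule and are not how csdmpy handles units.

```python
import numpy as np

# Hypothetical toy unit table: unit -> (physical type, scale to SI base unit).
UNITS = {
    "s": ("time", 1.0),
    "min": ("time", 60.0),
    "mm": ("length", 1e-3),
    "cm": ("length", 1e-2),
    "m": ("length", 1.0),
}


def convert(values, from_unit, to_unit):
    """Convert values between two units of the same physical type."""
    from_type, from_scale = UNITS[from_unit]
    to_type, to_scale = UNITS[to_unit]
    if from_type != to_type:
        raise ValueError(
            f"The unit {to_unit!r} ({to_type}) is inconsistent "
            f"with the unit {from_unit!r} ({from_type})."
        )
    return np.asarray(values, dtype=float) * (from_scale / to_scale)


print(convert([-59.0, -49.0, 51.0], "s", "min"))  # about -0.98333, -0.81667, 0.85
try:
    convert([1.0], "min", "m")
except ValueError as err:
    print(err)
```
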

Changing the dimensionality¶

You may scale the dimension object, and thereby change its dimensionality, by multiplying it with the appropriate ScalarQuantity, as follows,

>>> print(x)
LinearDimension([-0.98333333 -0.81666667 -0.65       -0.48333333 -0.31666667 -0.15
  0.01666667  0.18333333  0.35        0.51666667  0.68333333  0.85      ] min)
>>> x *= cp.ScalarQuantity('m/s')
>>> print(x)
LinearDimension([-59. -49. -39. -29. -19.  -9.   1.  11.  21.  31.  41.  51.] m)
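
The coordinates return to -59, ..., 51, now in metres, because the ScalarQuantity multiplies both the values and the unit. For the first coordinate, for example:

\[ -\tfrac{59}{60}\,\mathrm{min} \times 1\,\mathrm{m/s} = -\tfrac{59}{60} \times 60\,\mathrm{s} \times 1\,\mathrm{m/s} = -59\,\mathrm{m} \]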

MonotonicDimension¶

There are several attributes and methods associated with a MonotonicDimension, controlling the coordinates along the dimension. The following section demonstrates the effect of these attributes and methods on the coordinates.

>>> import numpy as np
>>> array = np.asarray([-0.28758166, -0.22712233, -0.19913859, -0.17235106,
...                     -0.1701172, -0.10372635, -0.01817061, 0.05936719,
...                     0.18141424, 0.34758913])
>>> x = cp.MonotonicDimension(coordinates=array)*cp.ScalarQuantity('cm')

Attributes¶

The following are the attributes of the MonotonicDimension instance.

type

This attribute returns the type of the instance.

>>> print(x.type)
monotonic

The attributes that modify the coordinates

count

The number of points along the dimension

>>> print('number of points =', x.count)
number of points = 10

You may update the number of points with this attribute; however, you can only lower the number of points.

>>> x.count = 6
>>> print('new number of points =', x.count)
new number of points = 6
>>> print(x.coordinates)
[-0.28758166 -0.22712233 -0.19913859 -0.17235106 -0.1701172  -0.10372635] cm

origin_offset

>>> print('old origin offset =', x.origin_offset)
old origin offset = 0.0 cm

>>> x.origin_offset = "1 km"
>>> print('new origin offset =', x.origin_offset)
new origin offset = 1.0 km

>>> print(x.coordinates)
[-0.28758166 -0.22712233 -0.19913859 -0.17235106 -0.1701172  -0.10372635] cm

The last operation updates the value of the origin offset; however, the value of the coordinates attribute remains unchanged. This is because the coordinates attribute refers to the reference coordinates. The absolute coordinates are accessed through the absolute_coordinates attribute.

>>> print('absolute coordinates =', x.absolute_coordinates)
absolute coordinates = [99999.71241834 99999.77287767 99999.80086141 99999.82764894
 99999.8298828  99999.89627365] cm

Other attributes

label
>>> x.label = 't1'
>>> print('new label =', x.label)
new label = t1

period

>>> print('old period =', x.period)
old period = inf cm

>>> x.period = '10 m'
>>> print('new period =', x.period)
new period = 10.0 m

quantity_name

Returns the quantity name.

>>> print('quantity is', x.quantity_name)
quantity is length

Methods¶

to()

This method is used for unit conversions, as follows,

>>> print('old unit =', x.coordinates.unit)
old unit = cm
>>> print('old coordinates =', x.coordinates)
old coordinates = [-0.28758166 -0.22712233 -0.19913859 -0.17235106 -0.1701172  -0.10372635] cm

>>> ## unit conversion
>>> x.to('mm')

>>> print('new coordinates =', x.coordinates)
new coordinates = [-2.8758166 -2.2712233 -1.9913859 -1.7235106 -1.701172  -1.0372635] mm

The argument of this method is a unit, in this case, ‘mm’, whose dimensionality must be consistent with the dimensionality of the coordinates. An exception will be raised otherwise,

>>> x.to('km/s')  
Exception("Validation Failed: The unit 'km / s' (speed) is inconsistent with the unit 'mm' (length).")

Changing the dimensionality¶

You may scale the dimension object, and thereby change its dimensionality, by multiplying it with the appropriate ScalarQuantity, as follows,

>>> print(x)
MonotonicDimension([-2.8758166 -2.2712233 -1.9913859 -1.7235106 -1.701172  -1.0372635] mm)
>>> x *= cp.ScalarQuantity('2 s/mm')
>>> print(x)
MonotonicDimension([-0.57516332 -0.45424466 -0.39827718 -0.34470212 -0.3402344  -0.2074527 ] cm s / mm)
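
The unit cm s / mm in the output above is not simplified, but it is still a time, since cm / mm = 10. For the first coordinate:

\[ -2.8758166\,\mathrm{mm} \times 2\,\mathrm{s/mm} = -0.57516332\,\mathrm{cm\,s/mm} = -0.57516332 \times 10\,\mathrm{s} = -5.7516332\,\mathrm{s} \]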

Interacting with CSDM objects¶

Basic math operations¶

The csdm object supports basic mathematical operations such as additive and multiplicative operations.

Note

All operations applied to or involving the csdm objects apply only to the components of the dependent variables within the csdm object. These operations do not apply to the dimensions within the csdm object.

Consider the following csdm data object.

>>> arr1 = np.arange(6, dtype=np.float32).reshape(2, 3)
>>> csdm_obj1 = cp.as_csdm(arr1)
>>> # converting the dimension to proper physical dimensions.
>>> csdm_obj1.dimensions[0]*=cp.ScalarQuantity('2.64 m')
>>> csdm_obj1.dimensions[0].coordinates_offset = '1 km'
>>> # converting the dimension to proper physical dimensions.
>>> csdm_obj1.dimensions[1]*=cp.ScalarQuantity('10 µs')
>>> csdm_obj1.dimensions[1].coordinates_offset = '-0.5 ms'
>>> print(csdm_obj1)
CSDM(
DependentVariable(
[[[0. 1. 2.]
  [3. 4. 5.]]], quantity_type=scalar, numeric_type=float32),
LinearDimension([1000.   1002.64 1005.28] m),
LinearDimension([-500. -490.] us)
)

Additive operations involving a scalar¶

Example 1

>>> csdm_obj1 += np.pi
>>> print(csdm_obj1)
CSDM(
DependentVariable(
[[[3.1415927 4.141593  5.141593 ]
  [6.141593  7.141593  8.141593 ]]], quantity_type=scalar, numeric_type=float32),
LinearDimension([1000.   1002.64 1005.28] m),
LinearDimension([-500. -490.] us)
)

Example 2

>>> csdm_obj2 = csdm_obj1 + (2 - 4j)
>>> print(csdm_obj2)
CSDM(
DependentVariable(
[[[ 5.141593-4.j  6.141593-4.j  7.141593-4.j]
  [ 8.141593-4.j  9.141593-4.j 10.141593-4.j]]], quantity_type=scalar, numeric_type=complex64),
LinearDimension([1000.   1002.64 1005.28] m),
LinearDimension([-500. -490.] us)
)

Multiplicative operations involving scalar / ScalarQuantity¶

Example 3

>>> csdm_obj1 = cp.as_csdm(np.ones(6).reshape(2, 3))
>>> csdm_obj2 = csdm_obj1 * 4.693
>>> print(csdm_obj2)
CSDM(
DependentVariable(
[[[4.693 4.693 4.693]
  [4.693 4.693 4.693]]], quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2.]),
LinearDimension([0. 1.])
)

Example 4

>>> csdm_obj2 = csdm_obj1 * 3j/2.4
>>> print(csdm_obj2)
CSDM(
DependentVariable(
[[[0.+1.25j 0.+1.25j 0.+1.25j]
  [0.+1.25j 0.+1.25j 0.+1.25j]]], quantity_type=scalar, numeric_type=complex128),
LinearDimension([0. 1. 2.]),
LinearDimension([0. 1.])
)
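
The numeric_type changes in Examples 2 and 4 (float32 to complex64, float64 to complex128) match NumPy's own type-promotion rules for combining a real array with a Python complex scalar. Assuming csdmpy delegates this arithmetic to NumPy, plain NumPy reproduces the same result types:

```python
import numpy as np

# Example 2: float32 values plus a Python complex scalar
arr32 = np.arange(6, dtype=np.float32).reshape(2, 3) + np.float32(np.pi)
shifted = arr32 + (2 - 4j)
print(shifted.dtype)  # complex64

# Example 4: float64 values times a Python complex scalar
arr64 = np.ones((2, 3))
scaled = arr64 * 3j / 2.4
print(scaled.dtype)   # complex128
```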

You may change the dimensionality of the dependent variables by multiplying the csdm object with the appropriate scalar quantity, for example,

Example 5

>>> csdm_obj1 *= cp.ScalarQuantity('3.23 m')
>>> print(csdm_obj1)
CSDM(
DependentVariable(
[[[3.23 3.23 3.23]
  [3.23 3.23 3.23]]] m, quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2.]),
LinearDimension([0. 1.])
)

Example 6

>>> csdm_obj1 /= cp.ScalarQuantity('3.23 m')
>>> print(csdm_obj1)
CSDM(
DependentVariable(
[[[1. 1. 1.]
  [1. 1. 1.]]], quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2.]),
LinearDimension([0. 1.])
)

Additive operations involving two csdm objects¶

Additive operations between two csdm objects are supported only when the two objects have identical sets of Dimension objects and their DependentVariable objects have the same dimensionality. For example,

Example 7

>>> csdm1 = cp.as_csdm(np.ones((2,3)), unit='m/s')
>>> csdm2 = cp.as_csdm(np.ones((2,3)), unit='cm/s')
>>> csdm_obj = csdm1 + csdm2
>>> print(csdm_obj)
CSDM(
DependentVariable(
[[[1.01 1.01 1.01]
  [1.01 1.01 1.01]]] m / s, quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2.]),
LinearDimension([0. 1.])
)

An exception will be raised if the DependentVariable objects of the two csdm objects have different dimensionality.

Example 8

>>> csdm1 = cp.as_csdm(np.ones((2,3)), unit='m/s')
>>> csdm2 = cp.as_csdm(np.ones((2,3)))
>>> csdm_obj = csdm1 + csdm2 
Exception: Cannot operate on dependent variables with physical types: speed and dimensionless.

Similarly, an exception will be raised if the dimension objects of the two csdm objects are different.

Example 9

>>> csdm1 = cp.as_csdm(np.ones((2,3)), unit='m/s')
>>> csdm1.dimensions[1] = cp.MonotonicDimension(coordinates=['1 ms', '1 s'])
>>> csdm2 = cp.as_csdm(np.ones((2,3)), unit='cm/s')
>>> csdm_obj = csdm1 + csdm2 
Exception: Cannot operate on CSDM objects with different dimensions.
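
Examples 7 and 8 suggest two rules for the units: the operands must have the same physical type, and the result is expressed in the unit of the left operand (1 m/s + 1 cm/s = 1.01 m/s). A toy sketch of those rules follows. The unit table and the add_values helper are hypothetical, and the dimension check of Example 9 is not modelled.

```python
import numpy as np

# Hypothetical toy unit table: unit -> (physical type, scale to SI units)
UNITS = {
    "m/s": ("speed", 1.0),
    "cm/s": ("speed", 0.01),
    "": ("dimensionless", 1.0),
}


def add_values(a, a_unit, b, b_unit):
    """Add two arrays with units; the result is expressed in a's unit."""
    a_type, a_scale = UNITS[a_unit]
    b_type, b_scale = UNITS[b_unit]
    if a_type != b_type:
        raise ValueError(f"Cannot add physical types {a_type} and {b_type}.")
    # Convert b into a's unit before adding.
    return np.asarray(a) + np.asarray(b) * (b_scale / a_scale), a_unit


values, unit = add_values(np.ones((2, 3)), "m/s", np.ones((2, 3)), "cm/s")
print(values, unit)  # every entry is 1.01, unit m/s
```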

Basic Slicing and Indexing¶

CSDM objects support NumPy basic slicing and indexing and follow the same rules as NumPy arrays, with the indices given in the order of the CSDM dimensions. Consider the following 3D{1} csdm object.

>>> csdm1 = cp.as_csdm(np.zeros((5, 10, 20)), unit='s')
>>> csdm1.dimensions[0] = cp.as_dimension(np.arange(20)*0.5+4.3, unit='kg')
>>> csdm1.dimensions[1] = cp.as_dimension([1, 2, 3, 5, 7, 11, 13, 17, 19, 23], unit='mm')
>>> csdm1.dimensions[2] = cp.LabeledDimension(labels=list('abcde'))
>>> print(csdm1.shape)
(20, 10, 5)
>>> print(csdm1.dimensions)
[LinearDimension(count=20, increment=0.5 kg, coordinates_offset=4.3 kg, quantity_name=mass),
MonotonicDimension(coordinates=[ 1.  2.  3.  5.  7. 11. 13. 17. 19. 23.] mm, quantity_name=length, reciprocal={'quantity_name': 'wavenumber'}),
LabeledDimension(labels=['a', 'b', 'c', 'd', 'e'])]

The above object csdm1 has three dimensions, each with different dimensionality and dimension type. To retrieve a sub-grid of this 3D{1} dataset, use the NumPy indexing scheme.

Example 10

>>> sub_csdm = csdm1[0]
>>> print(sub_csdm.shape)
(10, 5)
>>> print(sub_csdm.dimensions)
[MonotonicDimension(coordinates=[ 1.  2.  3.  5.  7. 11. 13. 17. 19. 23.] mm, quantity_name=length, reciprocal={'quantity_name': 'wavenumber'}),
LabeledDimension(labels=['a', 'b', 'c', 'd', 'e'])]

The above example returns a 2D{1} cross-section of the 3D{1} dataset, corresponding to index 0 along the first dimension of csdm1, as the csdm object sub_csdm. The two dimensions in sub_csdm are the MonotonicDimension and the LabeledDimension.

Example 11

>>> sub_csdm = csdm1[::5, 2::2, :]
>>> print(sub_csdm.shape)
(4, 4, 5)
>>> print(sub_csdm.dimensions)
[LinearDimension(count=4, increment=2.5 kg, coordinates_offset=4.3 kg, quantity_name=mass),
MonotonicDimension(coordinates=[ 3.  7. 13. 19.] mm, quantity_name=length, reciprocal={'quantity_name': 'wavenumber'}),
LabeledDimension(labels=['a', 'b', 'c', 'd', 'e'])]

The above example returns a 3D{1} dataset, sub_csdm, which contains a sub-grid of the 3D{1} dataset from csdm1. In sub_csdm, the first dimension is a sub-grid of the first dimension of csdm1, where only every fifth grid point is selected. Similarly, the second dimension of sub_csdm is sampled from the second dimension of csdm1, where every second grid point is selected, starting with the entry at grid index two. The third dimension of sub_csdm is the same as the third dimension of csdm1. The attributes of the corresponding linear, monotonic, and labeled dimensions are adjusted accordingly. For example, notice the values of the count and increment attributes of the linear dimension in sub_csdm.

Example 12

>>> sub_csdm = csdm1[::5, 2::2, -3::-1]
>>> print(sub_csdm.shape)
(4, 4, 3)
>>> print(sub_csdm.dimensions)
[LinearDimension(count=4, increment=2.5 kg, coordinates_offset=4.3 kg, quantity_name=mass),
MonotonicDimension(coordinates=[ 3.  7. 13. 19.] mm, quantity_name=length, reciprocal={'quantity_name': 'wavenumber'}),
LabeledDimension(labels=['c', 'b', 'a'])]

The above example is similar to the previous one, except that the third dimension is indexed in reverse order, starting at the third entry from the end.
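
The count, increment, and offset values in Examples 11 and 12 follow from ordinary slice arithmetic: slicing a linear grid with start s and step t keeps it linear, multiplies the increment by t, and moves the offset to the coordinate of point s. The sketch below checks this with plain Python and NumPy; slice_linear is a hypothetical helper, not part of csdmpy.

```python
import numpy as np


def slice_linear(count, increment, offset, index):
    """Hypothetical helper: (count, increment, coordinates_offset) of a linear
    dimension after its coordinates are sliced with `index` (a slice object)."""
    r = range(count)[index]  # Python resolves start, stop, and step for us
    return len(r), increment * r.step, offset + increment * r.start


# First dimension of csdm1: 20 points, 0.5 kg apart, starting at 4.3 kg.
new_count, new_increment, new_offset = slice_linear(20, 0.5, 4.3, slice(None, None, 5))
print(new_count, new_increment, new_offset)  # 4 2.5 4.3

# Cross-check against slicing the explicit coordinates.
coords = 4.3 + 0.5 * np.arange(20)
assert np.allclose(coords[::5], new_offset + new_increment * np.arange(new_count))

# Monotonic and labeled dimensions simply slice their coordinates or labels.
print([1, 2, 3, 5, 7, 11, 13, 17, 19, 23][2::2])  # [3, 7, 13, 19]
print(list("abcde")[-3::-1])                       # ['c', 'b', 'a']
```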

Support for Numpy methods¶

In most cases, the csdm object may be used as if it were a NumPy array. See Supported NumPy functions for the list of all supported functions.

Methods that only operate on dimensionless dependent variables¶

Example 13

>>> csdm_obj1 = cp.as_csdm(10**(np.arange(10)/10))
>>> new_csdm1 = np.log10(csdm_obj1)
>>> print(new_csdm1)
CSDM(
DependentVariable(
[[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]], quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.])
)

Example 14

>>> new_csdm2 = np.cos(2*np.pi*new_csdm1)
>>> print(new_csdm2)
CSDM(
DependentVariable(
[[ 1.          0.80901699  0.30901699 -0.30901699 -0.80901699 -1.
  -0.80901699 -0.30901699  0.30901699  0.80901699]], quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.])
)

Example 15

>>> new_csdm2 = np.exp(new_csdm1 * cp.ScalarQuantity('K')) 
ValueError: Cannot apply `exp` to quantity with physical type `temperature`.

An exception is raised when such a method is applied to a csdm object whose dependent variables are not dimensionless.

Methods that are independent of the dependent variable dimensionality¶

Example 16

>>> new_csdm2 = np.square(new_csdm1 * cp.ScalarQuantity('K'))
>>> print(new_csdm2)
CSDM(
DependentVariable(
[[0.   0.01 0.04 0.09 0.16 0.25 0.36 0.49 0.64 0.81]] K2, quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.])
)

Example 17

>>> new_csdm1 = np.sqrt(new_csdm2)
>>> print(new_csdm1)
CSDM(
DependentVariable(
[[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]] K, quantity_type=scalar, numeric_type=float64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.])
)

Dimension reduction methods¶

Example 18

>>> csdm1 = cp.as_csdm(np.ones((10,20,30)), unit='µG')
>>> csdm1.shape
(30, 20, 10)
>>> new = np.sum(csdm1, axis=1)
>>> new.shape
(30, 10)
>>> print(new.dimensions)
[LinearDimension(count=30, increment=1.0),
LinearDimension(count=10, increment=1.0)]

Example 19

>>> csdm1 = cp.as_csdm(np.ones((10,20,30)), unit='µG')
>>> csdm1.shape
(30, 20, 10)
>>> new = np.sum(csdm1, axis=(1, 2))
>>> new.shape
(30,)
>>> print(new.dimensions)
[LinearDimension(count=30, increment=1.0)]
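
In Examples 18 and 19, the axis argument counts CSDM dimensions, which are the NumPy axes in reverse order (see the Note on dimension order earlier). Assuming the underlying array keeps NumPy's order, the same reductions can be reproduced with plain NumPy by mapping CSDM dimension i to NumPy axis ndim - 1 - i. This is our reading of the outputs, not csdmpy's code; to_numpy_axes is a hypothetical helper.

```python
import numpy as np


def to_numpy_axes(csdm_axes, ndim):
    """Hypothetical helper: map CSDM dimension indices to NumPy axis indices.
    CSDM dimension i corresponds to NumPy axis ndim - 1 - i."""
    return tuple(ndim - 1 - a for a in csdm_axes)


data = np.ones((10, 20, 30))  # NumPy shape; the CSDM shape is (30, 20, 10)

summed = data.sum(axis=to_numpy_axes((1,), data.ndim))
print(summed.shape[::-1])     # (30, 10), matching Example 18

summed2 = data.sum(axis=to_numpy_axes((1, 2), data.ndim))
print(summed2.shape[::-1])    # (30,), matching Example 19
```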

Example 20

>>> minimum = np.min(new_csdm1)
>>> print(minimum)
0.0 K
>>> np.min(new_csdm1) == new_csdm1.min()
True

Note

See Supported NumPy functions for the list of all supported functions.

Plotting CSDM object with matplotlib¶

As you may have noticed by now, a CSDM object holds basic metadata such as the label, unit, and physical quantity of the dimensions and dependent variables, which is enough to visualize the CSDM datasets on proper coordinate axes. In the following section, we illustrate how you may use the CSDM object with the matplotlib plotting library.

When plotting CSDM objects with matplotlib, we use the CSDM object’s metadata to produce a matplotlib Axes object with basic formatting, such as coordinate axis labels, dependent variable labels, and legends. You may further customize your figures; please refer to the matplotlib documentation for details.

To enable plotting CSDM objects with matplotlib, pass projection="csdm" when creating the matplotlib Axes instance, as follows,

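import matplotlib.pyplot as plt
import csdmpy as cp  # importing csdmpy makes the "csdm" projection available

ax = plt.subplot(projection="csdm")
# now add the matplotlib plotting functions to this axes.
# ax.plot(csdm_object) or
# ax.imshow(csdm_object) ... etc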

See the following examples.

1D CSDM objects with plot()|scatter()¶

1D{1} datasets¶

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

# Create a test 1D{1} dataset. ================================================

# Step-1: Create dimension objects.
x = cp.as_dimension(np.arange(10) * 0.1 + 15, unit="s", label="t1")

# Step-2: Create dependent variable objects.
y = cp.as_dependent_variable(np.random.rand(10), unit="cm", name="test-0")

# Step-3: Create the CSDM object with Dimension and Dependent variable objects.
csdm = cp.CSDM(dimensions=[x], dependent_variables=[y])


# Plot ========================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib plot function with csdm object.
ax.plot(csdm)
plt.tight_layout()
plt.show()

[Figure: oneD_plot_00_00.png]

# Scatter =====================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib plot function with csdm object.
ax.scatter(csdm, marker="x", color="red")
plt.tight_layout()
plt.show()

[Figure: oneD_plot_01_00.png]

1D{1, 1, ...} datasets¶

Plotting on the same Axes¶

When multiple single-component dependent variables are present within the CSDM object, the data from all dependent variables are plotted on the same axes. The name of each dependent variable is displayed in the legend.

Plotting on separate Axes¶

To plot the data from individual dependent variables onto separate axes, use the split() method to first split the CSDM object with n dependent variables into n CSDM objects with single dependent variables, and then plot them separately.

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

# Create a test 1D{1, 1, 1, 1} dataset. =======================================

# Step-1: Create dimension objects.
x = cp.as_dimension(np.arange(40) * 0.5 - 10, unit="µm", label="x")

# Step-2: Create dependent variable objects.
units = ["cm", "s", "m/s", ""]
y = [
    cp.as_dependent_variable(np.random.rand(40) + 10, unit=units[i], name=f"test-{i}")
    for i in range(4)
]

# Step-3: Create the CSDM object with Dimension and Dependent variable objects.
csdm = cp.CSDM(dimensions=[x], dependent_variables=y)


# Plot ========================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib plot function with csdm object.
ax.plot(csdm)
plt.title("Data plotted on the same figure")
plt.tight_layout()
plt.show()

[Figure: oneD111_plot_00_00.png]

# The plot on separate axes ===================================================

# Split the CSDM object into multiple single dependent-variable CSDM objects.
sub_type = csdm.split()

# create the axes with `projection="csdm"`
_, ax = plt.subplots(2, 2, figsize=(8, 6), subplot_kw={"projection": "csdm"})
# now use matplotlib plot function with csdm object.
ax[0, 0].plot(sub_type[0])
ax[0, 1].plot(sub_type[1])
ax[1, 0].plot(sub_type[2])
ax[1, 1].plot(sub_type[3])
plt.suptitle("Data plotted separately")  # figure-wide title over all subplots
plt.tight_layout()
plt.show()

[Figure: oneD111_plot_01_00.png]

2D CSDM objects with imshow()|contour()|contourf()¶

2D{1} datasets¶

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

# Create a test 2D{1} dataset. ================================================

# Step-1: Create dimension objects.
x1 = cp.as_dimension(np.arange(10) * 0.1 + 15, unit="s", label="t1")
x2 = cp.as_dimension(np.arange(10) * 12.5, unit="s", label="t2")

# Step-2: Create dependent variable objects.
y = cp.as_dependent_variable(np.diag(np.ones(10)), name="body-diagonal")

# Step-3: Create the CSDM object with Dimension and Dependent variable objects.
csdm = cp.CSDM(dimensions=[x1, x2], dependent_variables=[y])

# Plot imshow =================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib imshow function with csdm object.
ax.imshow(csdm, origin="upper", aspect="auto")
plt.tight_layout()
plt.show()

[Figure: twoD_plot_00_00.png]

# Plot contour ================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib contour function with csdm object.
ax.contour(csdm)
plt.tight_layout()
plt.show()

[Figure: twoD_plot_01_00.png]

2D{1, 1, ...} datasets¶

Plotting on the same Axes¶

When multiple single-component dependent variables are present within the CSDM object, the data from all dependent variables are plotted on the same axes. The name of each dependent variable is displayed along the color bar.

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

# Create a test 2D{1, 1} dataset. =============================================

# Step-1: Create dimension objects.
x1 = cp.as_dimension(np.arange(10) * 0.1 + 15, unit="s", label="t1")
x2 = cp.as_dimension(np.arange(10) * 12.5, unit="s", label="t2")

# Step-2: Create dependent variable objects.
y1 = cp.as_dependent_variable(np.diag(np.ones(10)), name="body-diagonal")
y2 = cp.as_dependent_variable(np.diag(np.ones(5), 5), name="off-body-diagonal")

# Step-3: Create the CSDM object with Dimension and Dependent variable objects.
csdm = cp.CSDM(dimensions=[x1, x2], dependent_variables=[y1, y2])

# Plot imshow =================================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib imshow function with csdm object.
ax.imshow(csdm, origin="upper", aspect="auto", cmaps=["Blues", "Reds"], alpha=0.5)
plt.tight_layout()
plt.show()

[Figure: twoD111_plot_00_00.png]

# Plot contourf ===============================================================
plt.figure(figsize=(5, 3.5))
# create the axes with `projection="csdm"`
ax = plt.subplot(projection="csdm")
# use matplotlib contourf function with csdm object.
ax.contourf(csdm, cmaps=["Blues", "Reds"], alpha=0.5)
plt.tight_layout()
plt.show()

[Figure: twoD111_plot_01_00.png]

Plotting on separate Axes¶

To plot the data from individual dependent variables onto separate axes, use the split() method to first split the CSDM object with n dependent variables into n CSDM objects with single dependent variables, and then plot them separately.

Tutorial examples on generating CSDM datasets¶

1D Datasets¶

1D{1} datasets¶

In the following example, we illustrate how one can convert a NumPy array into a CSDM object. Start by importing the NumPy and csdmpy libraries.

import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

Let’s generate a 1D NumPy array as our dataset.

test_data = np.zeros(500)
test_data[250] = 1

Create a DependentVariable object from the NumPy array.

dv = cp.as_dependent_variable(test_data, unit="%")

Create the corresponding Dimension object. Here, we create a LinearDimension object.

dim = cp.LinearDimension(count=500, increment="1 m")

Creating the CSDM object.

csdm_object = cp.CSDM(dependent_variables=[dv], dimensions=[dim])

Plot of the dataset.

plt.figure(figsize=(5, 3.5))
ax = plt.subplot(projection="csdm")
ax.plot(csdm_object)
plt.tight_layout()
plt.show()

To serialize the file, use the save method.

csdm_object.save("1D_1_dataset.csdf")

Total running time of the script: ( 0 minutes 0.127 seconds)

Gallery generated by Sphinx-Gallery

1D{1,1} datasets

In the following example, we illustrate how to convert NumPy arrays into a CSDM object. Start by importing NumPy, Matplotlib, and csdmpy.

import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

Let’s generate two 1D NumPy arrays as the dependent variables of our dataset.

test_data1 = np.zeros(500)
test_data1[250] = 1

test_data2 = np.zeros(500)
test_data2[150] = 1

Create the two DependentVariable objects from the NumPy arrays.

dv1 = cp.as_dependent_variable(test_data1, unit="%")
dv2 = cp.as_dependent_variable(test_data2, unit="J")

Create the corresponding dimension object. Here, we create a LinearDimension object.

dim = cp.LinearDimension(count=500, increment="43 cm", coordinates_offset="-0.1 km")
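For a linear dimension, the j-th coordinate is coordinates_offset + j × increment, for j = 0, 1, …, count − 1. A plain NumPy sketch of that relation for the dimension above, assuming complex_fft is left at its default (False); the conversion to metres is done by hand here, whereas csdmpy handles units itself:

```python
import numpy as np

count = 500
increment_m = 43e-2  # "43 cm" expressed in metres
offset_m = -0.1e3  # "-0.1 km" expressed in metres

# j-th coordinate of a linear dimension: offset + j * increment
coordinates_m = offset_m + increment_m * np.arange(count)
print(coordinates_m[0], coordinates_m[-1])
```

The first coordinate equals the offset, and the last lies 499 increments further along.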

Creating the CSDM object.

csdm_object = cp.CSDM(dependent_variables=[dv1, dv2], dimensions=[dim])

Plot of the dataset.

plt.figure(figsize=(5, 3.5))
ax = plt.subplot(projection="csdm")
ax.plot(csdm_object)
plt.tight_layout()
plt.show()

To serialize the file, use the save method.

csdm_object.save("1D_11_dataset.csdf")

2D Datasets

2D{1} dataset with two linear dimensions

In the following example, we illustrate how to convert a NumPy array into a CSDM object. Start by importing NumPy, Matplotlib, and csdmpy.

import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

Let’s generate a 2D NumPy array of random numbers as our dataset.

data = np.random.rand(65536).reshape(256, 256)

Create the DependentVariable object from the NumPy array.

dv = cp.as_dependent_variable(data, unit="Pa")

Create the two Dimension objects.

d0 = cp.LinearDimension(
    count=256, increment="15.23 µs", coordinates_offset="-1.95 ms", label="t1"
)

d1 = cp.LinearDimension(
    count=256, increment="10 cm", coordinates_offset="-5 m", label="x2"
)

Here, d0 and d1 are LinearDimension objects with 256 points each and increments of 15.23 µs and 10 cm, respectively.

Creating the CSDM object.

csdm_object = cp.CSDM(dependent_variables=[dv], dimensions=[d0, d1])
print(csdm_object.dimensions)

Out:

[LinearDimension(count=256, increment=15.23 µs, coordinates_offset=-1.95 ms, quantity_name=time, label=t1, reciprocal={'quantity_name': 'frequency'}),
LinearDimension(count=256, increment=10.0 cm, coordinates_offset=-5.0 m, quantity_name=length, label=x2, reciprocal={'quantity_name': 'wavenumber'})]

Plot of the dataset.

plt.figure(figsize=(5, 3.5))
ax = plt.subplot(projection="csdm")
cb = ax.imshow(csdm_object, aspect="auto")
plt.colorbar(cb)
plt.tight_layout()
plt.show()

To serialize the file, use the save method.

csdm_object.save("2D_1_dataset.csdf")

2D{1} dataset with linear and monotonic dimensions

In the following example, we illustrate how to convert a NumPy array into a CSDM object. Start by importing NumPy, Matplotlib, and csdmpy.

import matplotlib.pyplot as plt
import numpy as np

import csdmpy as cp

Let’s generate a 2D NumPy array of random numbers as our dataset.

data = np.random.rand(8192).reshape(32, 256)

Create the DependentVariable object from the numpy object.

dv = cp.as_dependent_variable(data, unit="J/(mol K)")

Create the two Dimension objects.

d0 = cp.LinearDimension(
    count=256, increment="15.23 µs", coordinates_offset="-1.95 ms", label="t1"
)

Here, d0 is a LinearDimension with 256 points and a 15.23 µs increment. You could define the second dimension as a LinearDimension as well; in this example, however, let’s make it a MonotonicDimension.

array = 10 ** (np.arange(32) / 8)
d1 = cp.as_dimension(array, unit="µs", label="t2")

The variable array is a NumPy array that is uniformly sampled on a log scale. To convert this array into a Dimension object, we use the as_dimension() method.
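The reason array cannot be described by a LinearDimension is that its spacing is not constant; only its logarithm is evenly spaced. A quick NumPy check, independent of csdmpy:

```python
import numpy as np

array = 10 ** (np.arange(32) / 8)

# Linear spacing is not constant: consecutive differences grow geometrically.
linear_steps = np.diff(array)
# Log10 spacing is constant: every step is 1/8 of a decade.
log_steps = np.diff(np.log10(array))

print(np.allclose(linear_steps, linear_steps[0]))  # False
print(np.allclose(log_steps, 0.125))  # True
```

The last value, 10**(31/8) ≈ 7.499e+03, matches the final coordinate in the output below.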

Creating the CSDM object.

csdm_object = cp.CSDM(dependent_variables=[dv], dimensions=[d0, d1])
print(csdm_object.dimensions)

Out:

[LinearDimension(count=256, increment=15.23 µs, coordinates_offset=-1.95 ms, quantity_name=time, label=t1, reciprocal={'quantity_name': 'frequency'}),
MonotonicDimension(coordinates=[1.00000000e+00 1.33352143e+00 1.77827941e+00 2.37137371e+00
 3.16227766e+00 4.21696503e+00 5.62341325e+00 7.49894209e+00
 1.00000000e+01 1.33352143e+01 1.77827941e+01 2.37137371e+01
 3.16227766e+01 4.21696503e+01 5.62341325e+01 7.49894209e+01
 1.00000000e+02 1.33352143e+02 1.77827941e+02 2.37137371e+02
 3.16227766e+02 4.21696503e+02 5.62341325e+02 7.49894209e+02
 1.00000000e+03 1.33352143e+03 1.77827941e+03 2.37137371e+03
 3.16227766e+03 4.21696503e+03 5.62341325e+03 7.49894209e+03] us, quantity_name=time, label=t2, reciprocal={'quantity_name': 'frequency'})]

Plot of the dataset.

plt.figure(figsize=(5, 3.5))
cp.plot(csdm_object)
plt.tight_layout()
plt.show()

To serialize the file, use the save method.

csdm_object.save("2D_1_dataset.csdf")

An emoji 😁 example

Let’s make use of what we learned so far and create a simple 1D{1} dataset. To make it interesting, let’s create an emoji dataset.

Start by importing the csdmpy package.

>>> import csdmpy as cp

Create a labeled dimension. Here, we use a Python dictionary.

>>> x = dict(type='labeled', labels=['🍈','🍉','🍋','🍌','🥑','🍍'])

The Python dictionary above contains two keys. The type key identifies the dimension as a labeled dimension, while the labels key holds a list of labels. In this example, the labels are emojis. This dictionary goes into the list of dimensions of the CSDM object.

Next, create a dependent variable. Similarly, set up a Python dictionary corresponding to the dependent variable object.

>>> y = dict(type='internal', numeric_type='float32', quantity_type='scalar',
...     components=[[0.5, 0.25, 1, 2, 1, 0.25]])

Here, the Python dictionary contains the type, numeric_type, quantity_type, and components keys. The value of the components key holds the data values corresponding to the labels of the labeled dimension.

Create a CSDM object from the dimension and the dependent variable, and we have a 😂 dataset.


>>> fun_data = cp.CSDM(
...     dimensions=[x],
...     dependent_variables=[y],
...     description="An emoji dataset"
... )
>>> print(fun_data.data_structure)
{
  "csdm": {
    "version": "1.0",
    "description": "An emoji dataset",
    "dimensions": [
      {
        "type": "labeled",
        "labels": [
          "🍈",
          "🍉",
          "🍋",
          "🍌",
          "🥑",
          "🍍"
        ]
      }
    ],
    "dependent_variables": [
      {
        "type": "internal",
        "numeric_type": "float32",
        "quantity_type": "scalar",
        "components": [
          [
            "0.5, 0.25, ..., 1.0, 0.25"
          ]
        ]
      }
    ]
  }
}

To serialize this dataset, use the save() method of the fun_data instance:

>>> fun_data.dependent_variables[0].encoding = 'base64'
>>> fun_data.save('my_file.csdf')

In the above code, the components of the dependent variable at index zero of the dependent_variables attribute are encoded as base64 strings before being serialized to the my_file.csdf file.
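Conceptually, base64 encoding stores the binary representation of the components as text that fits inside the JSON file. A rough standard-library and NumPy sketch of the idea for the float32 components above; the little-endian byte order is an assumption here, and csdmpy's own encoder may differ in detail:

```python
import base64

import numpy as np

components = np.asarray([0.5, 0.25, 1, 2, 1, 0.25], dtype="<f4")  # little-endian float32

# Encode: raw bytes -> base64 text that can be embedded in JSON.
encoded = base64.b64encode(components.tobytes()).decode("ascii")

# Decode: base64 text -> raw bytes -> float32 array.
decoded = np.frombuffer(base64.b64decode(encoded), dtype="<f4")

print(encoded)
print(decoded)
```

Six float32 values occupy 24 bytes, which base64 expands to 32 characters.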

You may also save the components as a binary file, in which case the dataset is serialized with a .csdfe file extension.

>>> fun_data.dependent_variables[0].encoding = 'raw'
>>> fun_data.save('my_file_raw.csdfe')

API-Reference

csdmpy

csdmpy is a Python package for importing and exporting files serialized with the core scientific dataset model file-format. The package supports a \(p\)-component dependent variable, \(\mathbf{U} \equiv \{\mathbf{U}_{0}, \ldots,\mathbf{U}_{q}, \ldots,\mathbf{U}_{p-1} \}\), which is discretely sampled at \(M\) unique points in a \(d\)-dimensional space \((\mathbf{X}_0, \ldots, \mathbf{X}_k, \ldots, \mathbf{X}_{d-1})\). The package also supports multiple dependent variables, \(\mathbf{U}_i\), sharing the same \(d\)-dimensional space.

Here, every dataset is an instance of the CSDM class, which holds a list of dimensions and dependent variables. Every dimension, \(\mathbf{X}_k\), is an instance of the Dimension class, while every dependent variable, \(\mathbf{U}_i\), is an instance of the DependentVariable class.

Methods

Methods Summary

parse_dict

Parse a CSDM compliant python dictionary and return a CSDM object.

load

Loads a .csdf/.csdfe file and returns an instance of the CSDM class.

loads

Loads a JSON serialized string as a CSDM object.

new

Creates a new instance of the CSDM class containing a 0D{0} dataset.

as_dimension

Generate and return a Dimension object from a 1D numpy array.

as_dependent_variable

Generate and return a DependentVariable object from a 1D or 2D numpy array.

as_csdm

Generate and return a view of the nD numpy array as a csdm object.

plot

A supplementary function for plotting basic 1D and 2D datasets only.

Method Documentation

csdmpy.parse_dict(dictionary)

Parse a CSDM compliant python dictionary and return a CSDM object.

Parameters

dictionary – A CSDM compliant python dictionary.

csdmpy.load(filename=None, application=False, verbose=False)

Loads a .csdf/.csdfe file and returns an instance of the CSDM class.

The file must be a JSON serialization of the CSD Model.

Example

>>> data1 = cp.load('local_address/file.csdf') 
>>> data2 = cp.load('url_address/file.csdf') 

Parameters
  • filename (str) – A local or a remote address to the .csdf or .csdfe file.

  • application (bool) – If True, the application metadata from the application that last serialized the file is imported. Default is False.

  • verbose (bool) – If True and the filename is a URL, a progress bar shows the file download status.

Returns

A CSDM instance.

csdmpy.loads(string)

Loads a JSON serialized string as a CSDM object.

Parameters

string – A JSON serialized CSDM string.

Returns

A CSDM object.

Example

>>> object_from_string = cp.loads(cp.new('A test dump').dumps())
>>> print(object_from_string.data_structure)  
{
  "csdm": {
    "version": "1.0",
    "timestamp": "2019-10-21T20:33:17Z",
    "description": "A test dump",
    "dimensions": [],
    "dependent_variables": []
  }
}

csdmpy.new(description='')

Creates a new instance of the CSDM class containing a 0D{0} dataset.

Parameters

description (str) – A string describing the csdm object. This is optional.

Example

>>> import csdmpy as cp
>>> emptydata = cp.new(description='Testing Testing 1 2 3')
>>> print(emptydata.data_structure)
{
  "csdm": {
    "version": "1.0",
    "description": "Testing Testing 1 2 3"
  }
}

Returns

A CSDM instance.

csdmpy.as_csdm(array, unit='', quantity_type='scalar')

Generate and return a view of the nD NumPy array as a CSDM object. The nD array becomes the dependent variable of the CSDM object, with the given quantity type, and the shape of the array is used to generate Dimension objects of the linear subtype.

Parameters
  • array – The nD numpy array.

  • unit – The unit for the dependent variable. The default is an empty string.

  • quantity_type – The quantity type of the dependent variable.

Example

>>> array = np.arange(30).reshape(3, 10)
>>> csdm_obj = cp.as_csdm(array)
>>> print(csdm_obj)
CSDM(
DependentVariable(
[[[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]
  [20 21 22 23 24 25 26 27 28 29]]], quantity_type=scalar, numeric_type=int64),
LinearDimension([0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]),
LinearDimension([0. 1. 2.])
)

csdmpy.as_dimension(array, unit='', type=None, **kwargs)

Generate and return a Dimension object from a 1D numpy array.

Parameters
  • array – A 1D numpy array.

  • unit – The unit of the coordinates along the dimension.

  • type – The dimension type. Valid values are linear, monotonic, labeled, or None. If None, the type is inferred from the array. The default value is None.

  • kwargs – Additional keyword arguments from the Dimension class.

Example

>>> array = np.arange(15)*0.5
>>> dim_object = cp.as_dimension(array)
>>> print(dim_object)
LinearDimension([0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7. ])
>>> array = ['The', 'great', 'circle']
>>> dim_object = cp.as_dimension(array, label='in the sky')
>>> print(dim_object)
LabeledDimension(['The' 'great' 'circle'])

csdmpy.as_dependent_variable(array, **kwargs)

Generate and return a DependentVariable object from a 1D or 2D numpy array.

Parameters
  • array – A 1D or 2D numpy array.

  • kwargs – Additional keyword arguments from the DependentVariable class.

Example

>>> array = np.arange(1e4).astype(np.complex128)
>>> dv_object = cp.as_dependent_variable(array)
>>> print(dv_object)
DependentVariable(
[[0.000e+00+0.j 1.000e+00+0.j 2.000e+00+0.j ... 9.997e+03+0.j
  9.998e+03+0.j 9.999e+03+0.j]], quantity_type=scalar, numeric_type=complex128)

csdmpy.plot(csdm_object, reverse_axis=None, range=None, **kwargs)

A supplementary function for plotting basic 1D and 2D datasets only.

Parameters
  • csdm_object – The CSDM object.

  • reverse_axis – An ordered list of booleans specifying which dimensions are displayed on a reversed axis.

  • range – A list of minimum and maximum coordinates along the dimensions. The range along each dimension is given as [min, max].

  • kwargs –

    Additional keyword arguments are passed to the matplotlib plotting functions. The following matplotlib methods are used for one- and two-dimensional datasets.

    • A 1D{1} scalar dataset uses the plt.plot() method.

    • A 1D{2} vector dataset uses the plt.quiver() method.

    • A 2D{1} scalar dataset uses the plt.imshow() method if both dimensions have a linear subtype. If either dimension is monotonic, matplotlib's NonUniformImage (matplotlib.image.NonUniformImage) is used instead.

    • A 2D{2} vector dataset uses the plt.quiver() method.

    • A 2D{3} pixel dataset uses plt.imshow(), treating the pixel dataset as an RGB image.

Returns

A matplotlib figure instance.

Example

>>> cp.plot(data_object) 

CSDM

class csdmpy.CSDM(filename='', version=None, description='', **kwargs)

Bases: object

Create an instance of a CSDM class.

This class is based on the root CSDM object of the core scientific dataset (CSD) model. The class is a composition of the DependentVariable and Dimension instances, where an instance of the DependentVariable class describes a \(p\)-component dependent variable, and an instance of the Dimension class describes a dimension of a \(d\)-dimensional space. Additional attributes of this class are listed below.

Attributes Summary

version

Version number of the CSD model on file.

description

Description of the dataset.

read_only

If True, the data file is serialized as read-only.

tags

List of tags attached to the dataset.

timestamp

Timestamp from when the file was last serialized.

geographic_coordinate

Geographic coordinate, if present, from where the file was last serialized.

dimensions

Tuple of the Dimension instances.

x

Alias for the dimensions attribute.

dependent_variables

Tuple of the DependentVariable instances.

y

Alias for the dependent_variables attribute.

application

Application metadata dictionary of the CSDM object.

data_structure

JSON serialized string describing the CSDM class instance.

filename

Local file address of the current file.

Methods summary

add_dimension

Add a new Dimension instance to the CSDM object.

add_x

Alias to the add_dimension method.

add_dependent_variable

Add a new DependentVariable instance to the CSDM instance.

add_y

Alias for add_dependent_variable method.

dict

Serialize the CSDM instance as a python dictionary.

to_dict

Alias to the dict() method of the class.

dumps

Serialize the CSDM instance as a JSON data-exchange string.

astype

Return a copy of the CSDM object, converting the numeric type of each dependent variable's components to the given value.

save

Serialize the CSDM instance as a JSON data-exchange file.

copy

Create a copy of the current CSDM instance.

split

View of the dependent-variables as individual csdm objects.

Numpy compatible attributes summary

real

Return a csdm object with only the real part of the dependent variable components.

imag

Return a csdm object with only the imaginary part of the dependent variable components.

shape

Return the count along each dimension of the csdm object.

size

Return the size of the dependent_variable components.

T

Return a csdm object with a transpose of the dataset.

Numpy compatible method summary

max

Return a csdm object with the maximum dependent variable component along a given axis.

min

Return a csdm object with the minimum dependent variable component along a given axis.

clip

Clip the dependent variable components between the min and max values.

conj

Return a csdm object with the complex conjugate of all dependent variable components.

round

Return a csdm object by rounding the dependent variable components to the given decimals.

sum

Return a csdm object with the sum of the dependent variable components over a given dimension (axis).

mean

Return a csdm object with the mean of the dependent variable components over a given dimension (axis).

var

Return a csdm object with the variance of the dependent variable components over a given dimension (axis).

std

Return a csdm object with the standard deviation of the dependent variable components over a given dimension (axis).

prod

Return a csdm object with the product of the dependent variable components over a given dimension (axis).

Attributes documentation

version

Version number of the CSD model on file.

description

Description of the dataset. The default value is an empty string.

Example

>>> print(data.description)
A simulated sine curve.

Returns

A string of UTF-8 allowed characters describing the dataset.

Raises

TypeError – When the assigned value is not a string.

read_only

If True, the data file is serialized as read-only.

By default, the CSDM object loads a copy of the .csdf(e) file, irrespective of the value of the read_only attribute. The value of this attribute may be toggled at any time after the file import. When serializing the .csdf(e) file, if the read_only attribute is True, the file is serialized as read-only.

tags

List of tags attached to the dataset.

timestamp

Timestamp from when the file was last serialized. This attribute is read-only.

The timestamp is a string representation of the Coordinated Universal Time (UTC), formatted according to the ISO-8601 standard.

Raises

AttributeError – When the attribute is modified.
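For reference, the same ISO-8601 UTC layout (as in the 2019-10-21T20:33:17Z example in the loads() documentation above) can be produced and parsed with the Python standard library. This illustrates the format only; it is not csdmpy's own implementation.

```python
import re
from datetime import datetime, timezone

fmt = "%Y-%m-%dT%H:%M:%SZ"

# Produce a UTC timestamp in the same layout.
now = datetime.now(timezone.utc).strftime(fmt)

# Parse the documented example back into a timezone-aware UTC datetime.
parsed = datetime.strptime("2019-10-21T20:33:17Z", fmt).replace(tzinfo=timezone.utc)

print(now)
print(parsed.isoformat())  # 2019-10-21T20:33:17+00:00
```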

geographic_coordinate

Geographic coordinate, if present, from where the file was last serialized. This attribute is read-only.

The geographic coordinates correspond to the location where the file was last serialized. If present, the geographic coordinates are described with three attributes, the required latitude and longitude, and an optional altitude.

Raises

AttributeError – When the attribute is modified.

dimensions

Tuple of the Dimension instances.

x

Alias for the dimensions attribute.

dependent_variables

Tuple of the DependentVariable instances.

y

Alias for the dependent_variables attribute.

application

Application metadata dictionary of the CSDM object.

>>> print(data.application)
{}

By default, the application attribute is an empty dictionary, that is, the application metadata stored by the previous application is ignored upon file import.

The application metadata may, however, be retained by setting application=True in the load() method. This feature may be useful to related applications where the application metadata might contain additional information. The attribute may be updated with a Python dictionary.

The application attribute is where an application can place its own metadata as a python dictionary object containing application specific metadata, using a reverse domain name notation string as the attribute key, for example,

Example

>>> data.application = {
...     "com.example.myApp" : {
...         "myApp_key": "myApp_metadata"
...      }
... }
>>> print(data.application)
{'com.example.myApp': {'myApp_key': 'myApp_metadata'}}

Returns

Python dictionary object with the application metadata.

data_structure

JSON serialized string describing the CSDM class instance.

The data_structure attribute is intended only for a quick preview of the dataset. The JSON serialized string from this attribute abbreviates the data values to avoid displaying large datasets. Do not use the value of this attribute to save the data to a file; instead, use the save() method of the instance.

Raises

AttributeError – When modified.

filename

Local file address of the current file.

Numpy compatible attributes documentation

real

Return a csdm object with only the real part of the dependent variable components.

imag

Return a csdm object with only the imaginary part of the dependent variable components.

shape

Return the count along each dimension of the csdm object.

size

Return the size of the dependent_variable components.

T

Return a csdm object with a transpose of the dataset.

Methods documentation

add_dimension(*args, **kwargs)

Add a new Dimension instance to the CSDM object.

There are several ways to add a new dimension. From a Python dictionary containing valid keywords:

>>> import csdmpy as cp
>>> datamodel = cp.new()
>>> py_dictionary = {
...     'type': 'linear',
...     'increment': '5 G',
...     'count': 50,
...     'coordinates_offset': '-10 mT'
... }
>>> datamodel.add_dimension(py_dictionary)

Using keyword arguments:

>>> datamodel.add_dimension(
...     type = 'linear',
...     increment = '5 G',
...     count = 50,
...     coordinates_offset = '-10 mT'
... )

Using a Dimension class.

>>> from csdmpy import Dimension
>>> var1 = Dimension(type = 'linear',
...                  increment = '5 G',
...                  count = 50,
...                  coordinates_offset = '-10 mT')
>>> datamodel.add_dimension(var1)

Using a subtype class.

>>> var2 = cp.LinearDimension(count = 50,
...                  increment = '5 G',
...                  coordinates_offset = '-10 mT')
>>> datamodel.add_dimension(var2)

From a numpy array.

>>> import numpy as np
>>> array = np.arange(50)
>>> dim = cp.as_dimension(array)
>>> datamodel.add_dimension(dim)

In the third and fourth examples, the instances var1 and var2 are added to datamodel by reference, i.e., if var1 or var2 is destroyed, the datamodel instance will become corrupt. As a recommendation, always pass a copy of the Dimension instance to the add_dimension() method.

Deprecated since version 0.4: Use cp.CSDM(dimensions=[..]) instead.

add_x(*args, **kwargs)

Alias to the add_dimension method.

add_dependent_variable(*args, **kwargs)

Add a new DependentVariable instance to the CSDM instance.

There are again several ways to add a new dependent variable instance. From a Python dictionary containing valid keywords:

>>> import numpy as np

>>> datamodel = cp.new()

>>> numpy_array = (100*np.random.rand(3,50)).astype(np.uint8)
>>> py_dictionary = {
...     'type': 'internal',
...     'components': numpy_array,
...     'name': 'star',
...     'unit': 'W s',
...     'quantity_name': 'energy',
...     'quantity_type': 'pixel_3'
... }
>>> datamodel.add_dependent_variable(py_dictionary)

From valid keyword arguments:

>>> datamodel.add_dependent_variable(type='internal',
...                                  name='star',
...                                  unit='W s',
...                                  quantity_type='pixel_3',
...                                  components=numpy_array)

From a DependentVariable instance.

>>> from csdmpy import DependentVariable
>>> var1 = DependentVariable(type='internal',
...                          name='star',
...                          unit='W s',
...                          quantity_type='pixel_3',
...                          components=numpy_array)
>>> datamodel.add_dependent_variable(var1)

If passing a DependentVariable instance, as a general recommendation, always pass a copy of the DependentVariable instance to the add_dependent_variable() method.

Deprecated since version 0.4: Use cp.CSDM(dependent_variables=[..]) instead.

add_y(*args, **kwargs)

Alias for add_dependent_variable method.

dict(update_timestamp=False, read_only=False)

Serialize the CSDM instance as a python dictionary.

Parameters
  • update_timestamp (bool) – If True, timestamp is updated to current time.

  • read_only (bool) – If True, the read_only flag is set to True.

Example

>>> data.dict()['csdm']['version']
'1.0'

to_dict(update_timestamp=False, read_only=False)

Alias to the dict() method of the class.

dumps(update_timestamp=False, read_only=False, version='1.0', **kwargs)

Serialize the CSDM instance as a JSON data-exchange string.

Parameters
  • update_timestamp (bool) – If True, timestamp is updated to current time.

  • read_only (bool) – If True, the file is serialized as read_only.

  • version (str) – The file is serialized with the given CSD model version.

Example

>>> data.dumps()[:63] # first 63 characters
'{"csdm": {"version": "1.0", "timestamp": "1994-11-05T13:15:30Z"'

save(filename='', read_only=False, version='1.0', output_device=None, indent=0)

Serialize the CSDM instance as a JSON data-exchange file.

There are two types of file serialization extensions, .csdf and .csdfe. In the CSD model, when every DependentVariable instance of a CSDM object has an internal subtype, the corresponding CSDM instance is serialized with a .csdf file extension. If any single DependentVariable instance has an external subtype, the CSDM instance is serialized with a .csdfe file extension. The two different file extensions alert the end-user to a possible deserialization error with .csdfe files, should the external data file become inaccessible.

In csdmpy, however, irrespective of the dependent variable subtypes in the serialized JSON file, all instances of the DependentVariable class are treated as internal after import by default. Therefore, when serialized, the CSDM object should be stored as a .csdf file.

To store a file as a .csdfe file, the user must set the value of the encoding attribute of the dependent variables to raw. In this case, a binary file named filename_i.dat is generated, where \(i\) is the index of the \(i^\text{th}\) dependent variable and filename is the argument of this method.

Note

Only dependent variables with encoding="raw" will be serialized to a binary file.

Parameters
  • filename (str) – The filename of the serialized file.

  • read_only (bool) – If True, the file is serialized as read_only.

  • version (str) – The file is serialized with the given CSD model version.

  • output_device (object) – Object where the data is written. If provided, the filename argument is ignored.

Example

>>> data.save('my_file.csdf')

to_list()

Return the dimension coordinates and dependent variable components as a list of numpy arrays. For multiple dependent variables, the components of each dependent variable are appended in the order of the dependent variables.

For example,
  • A 2D{1} will be packed as \([x_{0}, x_{1}, y_{0,0}]\)

  • A 2D{3} will be packed as \([x_{0}, x_{1}, y_{0,0}, y_{0,1}, y_{0,2}]\)

  • A 1D{1,2} will be packed as \([x_{0}, y_{0,0}, y_{1,0}, y_{1,1}]\)

where \(x_i\) represents the \(i^\text{th}\) dimension and \(y_{i,j}\) represents the \(j^\text{th}\) component of the \(i^\text{th}\) dependent variable.
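The packing order can be illustrated without csdmpy. In the sketch below, pack_like_to_list is a hypothetical helper (not part of csdmpy) that follows the documented order: all dimension coordinates first, then every component of every dependent variable in turn.

```python
import numpy as np


def pack_like_to_list(coordinates, dependent_variables):
    """Hypothetical helper mirroring the documented to_list() order.

    coordinates: list of 1D arrays, one per dimension (x_0, x_1, ...).
    dependent_variables: list of arrays of shape (p_i, ...), one per dependent variable.
    """
    packed = list(coordinates)
    for components in dependent_variables:
        packed.extend(components)  # iterating over axis 0 yields y_{i,0}, y_{i,1}, ...
    return packed


# A 1D{1,2} example: one dimension, a scalar and a 2-component dependent variable.
x0 = np.linspace(0, 1, 5)
y0 = np.sin(x0)[np.newaxis]  # shape (1, 5)
y1 = np.stack([np.cos(x0), x0**2])  # shape (2, 5)

packed = pack_like_to_list([x0], [y0, y1])
print(len(packed))  # 4 entries: x_0, y_{0,0}, y_{1,0}, y_{1,1}
```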

astype(numeric_type)

Return a copy of the CSDM object, converting the numeric type of each dependent variable's components to the given value.

Parameters

numeric_type – A numpy dtype or a string with a valid numeric type.

Example

>>> data_32 = data_64.astype('float32')  

copy()

Create a copy of the current CSDM instance.

Returns

A CSDM instance.

Example

>>> data2 = data.copy()

split()

View of the dependent-variables as individual csdm objects.

Returns

A list of CSDM objects, each with one dependent variable. The objects are returned as a view.

Example

>>> # data contains two dependent variables
>>> d1, d2 = data.split()  

transpose()

Return a transpose of the dependent variable data from the CSDM object.

fft(axis=0)

Perform an FFT along the given dimension (axis), assuming the Nyquist-Shannon relation for linear dimensions.

Parameters

axis – The index of the dimension along which the FFT is performed.

The FFT method uses the complex_fft attribute of the Dimension object to decide whether a forward or inverse Fourier transform is performed. If the value of the complex_fft is True, an inverse FFT is performed, otherwise a forward FFT.

For the forward FFT, this function is equivalent to performing

phase = np.exp(-2j * np.pi * coordinates_offset * reciprocal_coordinates)
x_fft = np.fft.fftshift(np.fft.fft(x)) * phase

over all components for every dependent variable.

Similarly, for the inverse FFT, this function is equivalent to performing

phase = np.exp(2j * np.pi * reciprocal_coordinates_offset * coordinates)
x = np.fft.ifft(np.fft.ifftshift(x_fft * phase))

over all components for every dependent variable.
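A NumPy-only sketch, independent of csdmpy, of why the phase term matters: a Gaussian centred at t = 0 is sampled starting from a negative coordinates_offset, so without the phase correction the shifted spectrum picks up alternating signs. The sampling values are arbitrary, and the extra factor dt, which scales the discrete sum toward the continuous Fourier transform, is added only for the comparison with the analytic result; it is not part of the forward-FFT formula above.

```python
import numpy as np

n, dt = 256, 0.01  # number of points and increment (s)
offset = -(n // 2) * dt  # coordinates_offset: first sample at t = -1.28 s
t = offset + dt * np.arange(n)
sigma = 0.1
x = np.exp(-(t**2) / (2 * sigma**2))  # Gaussian centred at t = 0

# Reciprocal coordinates, ordered to match the fftshift-ed spectrum (Hz).
f = np.fft.fftshift(np.fft.fftfreq(n, d=dt))

# Forward FFT with the phase correction described above, scaled by dt.
phase = np.exp(-2j * np.pi * offset * f)
x_fft = np.fft.fftshift(np.fft.fft(x)) * phase * dt

# The same transform without the phase term.
x_fft_no_phase = np.fft.fftshift(np.fft.fft(x)) * dt

# Analytic Fourier transform of the Gaussian: real and positive.
expected = sigma * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * sigma**2 * f**2)

print(np.allclose(x_fft, expected, atol=1e-9))  # True
print(np.allclose(x_fft_no_phase, expected, atol=1e-9))  # False
```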

Returns

A CSDM object with the Fourier Transform data.

Numpy compatible method documentation

max(axis=None)

Return a csdm object with the maximum dependent variable component along a given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the maximum of the dependent variable components is evaluated. If None, the output is the maximum over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a numpy array when axis is None.

Example

>>> data.max()
<Quantity 0.95105654>

min(axis=None)

Return a csdm object with the minimum dependent variable component along a given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the minimum of the dependent variable components is evaluated. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

clip(min=None, max=None)

Clip the dependent variable components between the min and max values.

Parameters
  • min – The minimum clip value.

  • max – The maximum clip value.

Returns

A CSDM object with values clipped between min and max.

conj()

Return a csdm object with the complex conjugate of all dependent variable components.

round(decimals=0)

Return a csdm object by rounding the dependent variable components to the given number of decimals.

sum(axis=None)

Return a csdm object with the sum of the dependent variable components over a given dimension (axis).

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the sum of the dependent variable components is performed. If None, the output is over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.
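The axis indices in these reductions refer to CSDM dimensions, not directly to NumPy axes. Following the components note under DependentVariable, the components array has shape (p × N_{d-1} × ... × N_1 × N_0), so dimension k sits on NumPy axis -1 - k behind the leading component axis. The plain-NumPy sketch below illustrates that mapping for a sum; the helper `csdm_axis_to_numpy` is hypothetical and is not the csdmpy implementation.

```python
import numpy as np

def csdm_axis_to_numpy(axis, ndim):
    """Hypothetical helper: map CSDM dimension indices to NumPy axes.

    Assumes a components array of shape (p, N_{d-1}, ..., N_1, N_0), so that
    dimension k lives on NumPy axis -1 - k.
    """
    if axis is None:
        # Reduce over every dimension but keep the component axis (axis 0).
        return tuple(range(1, ndim + 1))
    if isinstance(axis, int):
        axis = (axis,)
    return tuple(-1 - k for k in axis)

# One component (p = 1) sampled on a 2-D grid with N_0 = 4 and N_1 = 3.
components = np.arange(12.0).reshape(1, 3, 4)

# Sum along dimension 0 (the N_0 = 4 axis): one value per N_1 row remains.
along_dim0 = components.sum(axis=csdm_axis_to_numpy(0, ndim=2))
print(along_dim0)  # [[ 6. 22. 38.]]

# Sum over all dimensions: one value per component.
total = components.sum(axis=csdm_axis_to_numpy(None, ndim=2))
print(total)  # [66.]
```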

mean(axis=None)[source]¶

Return a csdm object with the mean of the dependent variable components along the given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the mean of the dependent variable components is evaluated. If None, the output is the mean over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

var(axis=None)[source]¶

Return a csdm object with the variance of the dependent variable components along the given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the variance of the dependent variable components is evaluated. If None, the output is the variance over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

std(axis=None)[source]¶

Return a csdm object with the standard deviation of the dependent variable components along the given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the standard deviation of the dependent variable components is evaluated. If None, the output is the standard deviation over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

prod(axis=None)[source]¶

Return a csdm object with the product of the dependent variable components along the given axis.

Parameters

axis – An integer, None, or a tuple of m integers corresponding to the index/indices of the dimensions along which the product of the dependent variable components is evaluated. If None, the output is the product over all dimensions per dependent variable.

Returns

A CSDM object with m dimensions removed, or a list when axis is None.

Dimension¶

LinearDimension¶

class csdmpy.LinearDimension(count, increment, complex_fft=False, **kwargs)[source]¶

Bases: csdmpy.dimension.quantitative.BaseQuantitativeDimension

LinearDimension class.

Generates an object representing a physical dimension whose coordinates are uniformly sampled along a grid dimension. See LinearDimension for details.

property complex_fft¶

If True, orders the coordinates according to FFT output order.

property coordinates¶

Return the coordinates along the dimension.

property count¶

Total number of points along the linear dimension.

dict()[source]¶

Return the LinearDimension as a python dictionary.

get_nmr_reference_offset()[source]¶

Calculate the reference offset for NMR datasets.

property increment¶

Increment along the linear dimension.

reciprocal_coordinates()[source]¶

Return reciprocal coordinates assuming the Nyquist-Shannon theorem.

reciprocal_increment()[source]¶

Return reciprocal increment assuming the Nyquist-Shannon theorem.
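Under the usual discrete Fourier relation, a linear dimension with count N and increment Δ has a reciprocal increment of 1/(NΔ). Assuming csdmpy follows that convention (an assumption; compare against reciprocal_increment() on your own data), the relation can be checked with NumPy's `np.fft.fftfreq`, whose sample spacing is exactly that value. The sizes below are arbitrary.

```python
import numpy as np

count, increment = 10, 0.5  # e.g. 10 points sampled every 0.5 s

# Reciprocal increment assumed to follow the DFT relation 1 / (N * increment).
reciprocal_increment = 1.0 / (count * increment)

# np.fft.fftfreq returns the DFT sample frequencies in FFT output order;
# fftshift sorts them into ascending order.
reciprocal_coordinates = np.fft.fftshift(np.fft.fftfreq(count, d=increment))

print(reciprocal_increment)             # 0.2
print(np.diff(reciprocal_coordinates))  # every step is (numerically) 0.2
```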

property type¶

Return the type of the dimension.

MonotonicDimension¶

class csdmpy.MonotonicDimension(coordinates, **kwargs)[source]¶

Bases: csdmpy.dimension.quantitative.BaseQuantitativeDimension

Monotonic grid dimension.

Generates an object representing a physical dimension whose coordinates are monotonically sampled along a grid dimension. See MonotonicDimension for details.

property coordinates¶

Return the coordinates along the dimension.

property coordinates_offset¶

Value at index zero, \(c_k\), along the dimension.

property count¶

Total number of points along the monotonic dimension.

dict()[source]¶

Return the MonotonicDimension as a python dictionary.

property type¶

Return the type of the dimension.

LabeledDimension¶

class csdmpy.LabeledDimension(labels, label='', description='', application={}, **kwargs)[source]¶

Bases: csdmpy.dimension.base.BaseDimension

A labeled dimension.

Generates an object representing a non-physical dimension whose coordinates are labels. See LabeledDimension for details.

property coordinates¶

Return the coordinates along the dimension. This is an alias for labels.

property count¶

Total number of labels along the dimension.

dict()[source]¶

Return the LabeledDimension as a python dictionary.

is_quantitative()[source]¶

Return True if the dimension is quantitative, otherwise False.

Returns

A Boolean.

property labels¶

Return a list of labels along the dimension.

property type¶

Return the type of the dimension.

class csdmpy.Dimension(*args, **kwargs)[source]¶

Bases: object

Dimension class.

An instance of this class describes a dimension of a multi-dimensional system. In version 1.0 of the CSD model, there are three subtypes of the Dimension class:

  • linear (LinearDimension)

  • monotonic (MonotonicDimension)

  • labeled (LabeledDimension)

Creating an instance of a dimension object

There are two ways of creating a new instance of a Dimension class.

From a python dictionary containing valid keywords.

>>> from csdmpy import Dimension
>>> dimension_dictionary = {
...     'type': 'linear',
...     'description': 'test',
...     'increment': '5 G',
...     'count': 10,
...     'coordinates_offset': '10 mT',
...     'origin_offset': '10 T'
... }
>>> x = Dimension(dimension_dictionary)

Here, dimension_dictionary is the python dictionary.

From valid keyword arguments.

>>> x = Dimension(type='linear',
...               description='test',
...               increment='5 G',
...               count=10,
...               coordinates_offset='10 mT',
...               origin_offset='10 T')

Attributes Summary

type

The dimension subtype.

description

Brief description of the dimension object.

application

Application metadata dictionary of the dimension object.

coordinates

Coordinates, \({\bf X}_k\), along the dimension.

coords

Alias for the coordinates attribute.

absolute_coordinates

Absolute coordinates, \(\bf X_k^{\rm{abs}}\), along the dimension.

count

Number of coordinates, \(N_k \ge 1\), along the dimension.

increment

Increment along a linear dimension.

coordinates_offset

Offset corresponding to the zero of the indexes array, \(\mathbf{J}_k\).

origin_offset

Origin offset, \(o_k\), along the dimension.

complex_fft

If True, the coordinates are ordered as the output of a complex FFT.

quantity_name

Quantity name associated with the physical quantities specifying the dimension.

label

Label associated with the dimension.

labels

Ordered list of labels along the Labeled dimension.

period

Period of the dimension.

axis_label

Formatted string for displaying label along the dimension axis.

data_structure

JSON serialized string describing the Dimension class instance.

Methods Summary

to

Convert the coordinates along the dimension to the unit, unit.

dict

Return Dimension object as a python dictionary.

to_dict

Alias to the dict() method of the class.

is_quantitative

Return True if the dimension is quantitative.

copy

Return a copy of the Dimension object.

reciprocal_coordinates

Return reciprocal coordinates assuming the Nyquist-Shannon theorem.

reciprocal_increment

Return reciprocal increment assuming the Nyquist-Shannon theorem.

Attributes Documentation

type¶

The dimension subtype.

There are three valid subtypes of the Dimension class. The valid literals are given by the DimObjectSubtype enumeration.

>>> print(x.type)
linear
Returns

A string with a valid dimension subtype.

Raises

AttributeError – When the attribute is modified.

description¶

Brief description of the dimension object.

The default value is an empty string, ‘’. The attribute may be modified, for example,

>>> print(x.description)
This is a test

>>> x.description = 'This is a test dimension.'
Returns

A string of UTF-8 allowed characters describing the dimension.

Raises

TypeError – When the assigned value is not a string.

application¶

Application metadata dictionary of the dimension object.

>>> print(x.application)
{}

The application attribute is where an application can place its own metadata as a python dictionary object containing application specific metadata, using a reverse domain name notation string as the attribute key, for example,

>>> x.application = {
...     "com.example.myApp" : {
...         "myApp_key": "myApp_metadata"
...      }
... }
>>> print(x.application)
{'com.example.myApp': {'myApp_key': 'myApp_metadata'}}
Returns

A python dictionary containing dimension application metadata.

coordinates¶

Coordinates, \({\bf X}_k\), along the dimension.

Example

>>> print(x.coordinates)
[100. 105. 110. 115. 120. 125. 130. 135. 140. 145.] G

For linear dimensions, the order of the coordinates also depends on the value of the complex_fft attribute. For example, when the value of the complex_fft attribute is True, the coordinates are

>>> x.complex_fft = True
>>> print(x.coordinates)
[ 75.  80.  85.  90.  95. 100. 105. 110. 115. 120.] G
Returns

A Quantity array of coordinates for quantitative dimensions, i.e., linear and monotonic, or a Numpy array for labeled dimensions.

Raises

AttributeError – When a value is assigned to the coordinates of a linear dimension.
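For a linear dimension, the coordinates follow from count, increment, and coordinates_offset as \(\mathbf{X}_k = c_k\mathbf{1} + \Delta_k \mathbf{J}_k\), writing \(\Delta_k\) for the increment and \(\mathbf{J}_k\) for the index array. The plain-Python sketch below reproduces both outputs above for the example dimension (increment 5 G, offset 10 mT = 100 G, count 10). The helper name is hypothetical, and the complex_fft index shift of count // 2 is an assumption that agrees with the [-5, ..., 4] ordering in the complex_fft example.

```python
def linear_coordinates(count, increment, coordinates_offset, complex_fft=False):
    """Hypothetical helper: coordinates of a linear dimension as plain floats.

    All quantities must already be expressed in the same unit.
    """
    # Index array J_k; when complex_fft is True, shift it so that index zero
    # sits at position count // 2 (assumed shift).
    shift = count // 2 if complex_fft else 0
    return [coordinates_offset + increment * (j - shift) for j in range(count)]

# Example dimension from this section, everything converted to gauss:
# increment = 5 G, coordinates_offset = 10 mT = 100 G, count = 10.
print(linear_coordinates(10, 5.0, 100.0))
# [100.0, 105.0, 110.0, 115.0, 120.0, 125.0, 130.0, 135.0, 140.0, 145.0]
print(linear_coordinates(10, 5.0, 100.0, complex_fft=True))
# [75.0, 80.0, 85.0, 90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0]
```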

coords¶

Alias for the coordinates attribute.

absolute_coordinates¶

Absolute coordinates, \(\bf X_k^{\rm{abs}}\), along the dimension.

This attribute is only valid for quantitative dimensions, that is, linear and monotonic dimensions. The absolute coordinates are given as

\[\mathbf{X}_k^\mathrm{abs} = \mathbf{X}_k + o_k \mathbf{1}\]

where \(\mathbf{X}_k\) are the coordinates along the dimension and \(o_k\) is the origin_offset. For example, consider

>>> print(x.origin_offset)
10.0 T
>>> print(x.coordinates[:5])
[100. 105. 110. 115. 120.] G

then the absolute coordinates are

>>> print(x.absolute_coordinates[:5])
[100100. 100105. 100110. 100115. 100120.] G

For linear dimensions, the order of the absolute_coordinates further depends on the value of the complex_fft attribute. For example, when the value of the complex_fft attribute is True, the absolute coordinates are

>>> x.complex_fft = True
>>> print(x.absolute_coordinates[:5])
[100075. 100080. 100085. 100090. 100095.] G
Returns

A Quantity array of absolute coordinates for quantitative dimensions, i.e., linear and monotonic.

Raises

AttributeError – For labeled dimensions.
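Because the origin offset (10 T) and the coordinates (in G) use different units, both must be expressed in a common unit before they are added. The sketch below does this by hand with 1 T = 10⁴ G and reproduces the first five absolute coordinates above. csdmpy itself works with Quantity objects that carry their units; this plain-float version only illustrates the arithmetic of \(\mathbf{X}_k^\mathrm{abs} = \mathbf{X}_k + o_k \mathbf{1}\).

```python
GAUSS_PER_TESLA = 1.0e4  # 1 T = 10^4 G

coordinates_g = [100.0, 105.0, 110.0, 115.0, 120.0]  # X_k, in G
origin_offset_g = 10.0 * GAUSS_PER_TESLA              # o_k = 10 T, in G

# X_k^abs = X_k + o_k: add the origin offset to every coordinate.
absolute_g = [x + origin_offset_g for x in coordinates_g]
print(absolute_g)  # [100100.0, 100105.0, 100110.0, 100115.0, 100120.0]
```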

count¶

Number of coordinates, \(N_k \ge 1\), along the dimension.

Example

>>> print(x.count)
10
>>> x.count = 5
Returns

An Integer specifying the number of coordinates along the dimension.

Raises

TypeError – When the assigned value is not an integer.

increment¶

Increment along a linear dimension.

The attribute is only valid for Dimension instances with the subtype linear. When assigning a value, the dimensionality of the value must be consistent with the dimensionality of other members specifying the dimension.

Example

>>> print(x.increment)
5.0 G
>>> x.increment = "0.1 G"
>>> print(x.coordinates)
[100.  100.1 100.2 100.3 100.4 100.5 100.6 100.7 100.8 100.9] G
Returns

A Quantity instance with the increment along the dimension.

Raises
  • AttributeError – For dimension with subtypes other than linear.

  • TypeError – When the assigned value is not a string containing a quantity or a Quantity object.

coordinates_offset¶

Offset corresponding to the zero of the indexes array, \(\mathbf{J}_k\).

When assigning a value, the dimensionality of the value must be consistent with the dimensionality of the other members specifying the dimension.

Example

>>> print(x.coordinates_offset)
10.0 mT
>>> x.coordinates_offset = "0 T"
>>> print(x.coordinates)
[ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45.] G

The attribute is invalid for labeled dimensions.

Returns

A Quantity instance with the coordinates offset.

Raises
  • AttributeError – For labeled dimensions.

  • TypeError – When the assigned value is not a string containing a quantity or a Quantity object.

origin_offset¶

Origin offset, \(o_k\), along the dimension.

When assigning a value, the dimensionality of the value must be consistent with the dimensionality of other members specifying the dimension.

Example

>>> print(x.origin_offset)
10.0 T
>>> x.origin_offset = "1e5 G"

The origin offset only affects the absolute_coordinates along the dimension. This attribute is invalid for labeled dimensions.

Returns

A Quantity instance with the origin offset.

Raises
  • AttributeError – For labeled dimensions.

  • TypeError – When the assigned value is not a string containing a quantity or a Quantity object.

complex_fft¶

If True, the coordinates are ordered as the output of a complex FFT.

This attribute is only valid for the Dimension instances with linear subtype. The value of this attribute is a boolean specifying if the coordinates along the dimension are evaluated as the output of a complex fast Fourier transform (FFT) routine. For example, consider the following Dimension object,

>>> test = Dimension(type='linear', increment='1', count=10)
>>> test.complex_fft
False
>>> print(test.coordinates)
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

>>> test.complex_fft = True
>>> print(test.coordinates)
[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.]
Returns

A Boolean.

Raises

TypeError – When the assigned value is not a boolean.
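With complex_fft set to True, index zero moves to position count // 2, which is the same arrangement NumPy produces when `np.fft.fftshift` is applied to the output of `np.fft.fftfreq`. The check below compares the two index arrays for an even and an odd count. The helper is hypothetical, the count // 2 shift for odd counts is an assumption (the example above only shows an even count), and the code does not call csdmpy.

```python
import numpy as np

def complex_fft_indices(count):
    """Hypothetical helper: index array for complex_fft=True (shift assumed count // 2)."""
    return np.arange(count) - count // 2

for count in (10, 9):
    # fftfreq(count) * count gives the integer DFT indices in FFT output
    # order; fftshift moves the zero index to the centre.
    shifted = np.fft.fftshift(np.fft.fftfreq(count)) * count
    print(count, complex_fft_indices(count), np.allclose(shifted, complex_fft_indices(count)))
```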

quantity_name¶

Quantity name associated with the physical quantities specifying the dimension.

The attribute is invalid for the labeled dimension.

>>> print(x.quantity_name)
magnetic flux density
Returns

A string with the quantity name.

Raises
  • AttributeError – For labeled dimensions.

  • NotImplementedError – When assigning a value.

label¶

Label associated with the dimension.

Example

>>> print(x.label)
field strength
>>> x.label = 'magnetic field strength'
Returns

A string containing the label.

Raises

TypeError – When the assigned value is not a string.

labels¶

Ordered list of labels along the Labeled dimension.

Consider the following labeled dimension,

>>> x2 = Dimension(
...         type='labeled',
...         labels=['Cu', 'Ag', 'Au']
...      )

then the labels along the labeled dimension are

>>> print(x2.labels)
['Cu' 'Ag' 'Au']

Note

For Labeled dimension, the coordinates attribute is an alias of labels attribute. For example,

>>> np.all(x2.coordinates == x2.labels)
True

In the above example, x2 is an instance of the Dimension class with labeled subtype.

Returns

A Numpy array with labels along the dimension.

Raises

AttributeError – For dimensions with subtype other than labeled.

period¶

Period of the dimension.

The default value of the period is infinity, i.e., the dimension is non-periodic.

Example

>>> print(x.period)
inf G
>>> x.period = '1 T'

To assign a dimension as non-periodic, one of the following may be used,

>>> x.period = '1/0 T'
>>> x.period = 'infinity µT'
>>> x.period = '∞ G'

Attention

The physical quantity of the period must be consistent with other physical quantities specifying the dimension.

Returns

A Quantity instance with the period of the dimension.

Raises
  • AttributeError – For labeled dimensions.

  • TypeError – When the assigned value is not a string containing a quantity or a Quantity object.

axis_label¶

Formatted string for displaying label along the dimension axis.

This attribute is not a part of the original core scientific dataset model; however, it is a convenient supplementary attribute that provides a formatted string ready for labeling dimension axes. For quantitative dimensions, this attribute returns the string label / unit if label is a non-empty string, otherwise quantity_name / unit. Here, quantity_name and label are attributes of the Dimension instance, and unit is the unit associated with the coordinates along the dimension. For example,

>>> x.label
'field strength'
>>> x.axis_label
'field strength / (G)'

For labeled dimensions, this attribute returns label.

Returns

A formatted label string.

Raises

AttributeError – When assigned a value.
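The labeling rule for quantitative dimensions can be sketched as a small function. The helper name and its arguments are hypothetical; the parenthesized unit follows the example output ('field strength / (G)') rather than the bare label / unit form in the prose.

```python
def axis_label(label, quantity_name, unit):
    """Hypothetical sketch of the axis_label rule for quantitative dimensions.

    Uses the label when it is a non-empty string, otherwise the quantity
    name, followed by the unit in parentheses.
    """
    name = label if label else quantity_name
    return f"{name} / ({unit})"

print(axis_label("field strength", "magnetic flux density", "G"))
# field strength / (G)
print(axis_label("", "magnetic flux density", "G"))
# magnetic flux density / (G)
```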

data_structure¶

JSON serialized string describing the Dimension class instance.

This supplementary attribute is useful for a quick preview of the dimension object. The attribute cannot be modified.

>>> print(x.data_structure)
{
  "type": "linear",
  "count": 10,
  "increment": "5.0 G",
  "coordinates_offset": "10.0 mT",
  "origin_offset": "10.0 T",
  "quantity_name": "magnetic flux density",
  "label": "field strength",
  "description": "This is a test",
  "reciprocal": {
    "quantity_name": "electrical mobility"
  }
}
Returns

A json serialized string of the dimension object.

Raises

AttributeError – When modified.

Method Documentation

to(unit='', equivalencies=None)[source]¶

Convert the coordinates along the dimension to the unit, unit.

This method is a wrapper of the to method from the Quantity class and is only valid for physical dimensions.

Example

>>> print(x.coordinates)
[100. 105. 110. 115. 120. 125. 130. 135. 140. 145.] G
>>> x.to('mT')
>>> print(x.coordinates)
[10.  10.5 11.  11.5 12.  12.5 13.  13.5 14.  14.5] mT
Parameters

unit – A string containing a unit with the same dimensionality as the coordinates along the dimension.

Raises

AttributeError – For labeled dimensions.

dict()[source]¶

Return Dimension object as a python dictionary.

Example

>>> x.dict() 
{'type': 'linear', 'description': 'This is a test', 'count': 10,
'increment': '5.0 G', 'coordinates_offset': '10.0 mT',
'origin_offset': '10.0 T', 'quantity_name': 'magnetic flux density',
'label': 'field strength'}
to_dict()[source]¶

Alias to the dict() method of the class.

is_quantitative()[source]¶

Return True if the dimension is quantitative.

Example

>>> x.is_quantitative()
True
copy()[source]¶

Return a copy of the Dimension object.

reciprocal_coordinates()[source]¶

Return reciprocal coordinates assuming the Nyquist-Shannon theorem.

reciprocal_increment()[source]¶

Return reciprocal increment assuming the Nyquist-Shannon theorem.

DependentVariable¶

class csdmpy.DependentVariable(*args, **kwargs)[source]¶

Bases: object

Create an instance of the DependentVariable class.

The instance of this class represents a dependent variable, \(\mathbf{U}\). A dependent variable holds \(p\)-component data values, where \(p>0\) is an integer. For example, a scalar is single-component (\(p=1\)), an n-component vector has \(p=n\), while a second-rank symmetric tensor in three dimensions has six unique components (\(p=6\)).

Creating a new dependent variable.

There are two ways of creating a new instance of a DependentVariable class.

From a python dictionary containing valid keywords.

>>> from csdmpy import DependentVariable
>>> import numpy as np
>>> numpy_array = np.arange(30).reshape(3,10).astype(np.float32)

>>> dependent_variable_dictionary = {
...     'type': 'internal',
...     'components': numpy_array,
...     'name': 'star',
...     'unit': 'W s',
...     'quantity_name': 'energy',
...     'quantity_type': 'pixel_3'
... }
>>> y = DependentVariable(dependent_variable_dictionary)

Here, dependent_variable_dictionary is the python dictionary.

From valid keyword arguments.

>>> y = DependentVariable(
...         type='internal',
...         name='star',
...         unit='W s',
...         quantity_type='pixel_3',
...         components=numpy_array
...     )

Attributes Summary

type

The dependent variable subtype.

description

Brief description of the dependent variables.

application

Application metadata of the DependentVariable object.

name

Name of the dependent variable.

unit

Unit associated with the dependent variable.

quantity_name

Quantity name of physical quantities associated with the dependent variable.

encoding

The encoding method used in representing the dependent variable.

numeric_type

The numeric type of the component values from the dependent variable.

quantity_type

Quantity type of the dependent variable.

component_labels

List of labels corresponding to the components of the dependent variable.

components

Component array of the dependent variable.

components_url

URL where the data components of the dependent variable are stored.

axis_label

List of formatted string labels for each component of the dependent variable.

data_structure

JSON serialized string describing the DependentVariable class instance.

Methods Summary

to

Convert the unit of the dependent variable to the unit.

dict

Return DependentVariable object as a python dictionary.

to_dict

Alias to the dict() method of the class.

copy

Return a copy of the DependentVariable object.

Attributes Documentation

type¶

The dependent variable subtype.

There are two valid subtypes of DependentVariable class with the following enumeration literals,

internal
external

corresponding to the Internal and External subclasses. By default, all instances of the DependentVariable class are assigned as internal upon import. The user may update the value of this attribute, at any time, with a string containing a valid type literal, for example,

>>> print(y.type)
internal

>>> y.type = 'external'

When type is external, the data values from the corresponding dependent variable are serialized to an external file within the same directory as the .csdfe file.

Returns

A string with a valid dependent variable subtype.

Raises

ValueError – When an invalid value is assigned.

description¶

Brief description of the dependent variables.

The default value is an empty string, ‘’.

>>> print(y.description)
A test image
>>> y.description = 'A test pixel_3 image'
>>> print(y.description)
A test pixel_3 image
Returns

A string of UTF-8 allowed characters describing the dependent variable.

Raises

TypeError – When the assigned value is not a string.

application¶

Application metadata of the DependentVariable object.

>>> print(y.application)
{}

The application attribute is where an application can place its own metadata as a python dictionary object containing the application specific metadata, using a reverse domain name notation string as the attribute key, for example,

>>> y.application = {
...     "com.example.myApp" : {
...         "myApp_key": "myApp_metadata"
...      }
... }
>>> print(y.application)
{'com.example.myApp': {'myApp_key': 'myApp_metadata'}}

Please refer to the Core Scientific Dataset Model article for details.

Returns

A python dictionary containing dependent variable application metadata.

name¶

Name of the dependent variable.

>>> y.name
'star'
>>> y.name = 'rock star'
Returns

A string containing the name of the dependent variable.

Raises

TypeError – When the assigned value is not a string.

unit¶

Unit associated with the dependent variable.

Note

The attribute cannot be modified. To convert the unit, use the to() method of the class instance.

>>> y.unit
Unit("s W")
Returns

A Unit object from the astropy.units package.

Raises

AttributeError – When assigned a value.

quantity_name¶

Quantity name of physical quantities associated with the dependent variable.

>>> y.quantity_name
'energy'
Returns

A string with the quantity name associated with the dependent variable physical quantities.

Raises

NotImplementedError – When assigning a value.

encoding¶

The encoding method used in representing the dependent variable.

The value of this attribute determines the method used when serializing or deserializing the data values to and from the file. Currently, there are three valid encoding methods:

raw
base64
none

A value, raw, means that the data values are serialized as binary data. The value, base64, implies that the data values are serialized as base64 strings, while, the value none refers to text-based serialization.

By default, the encoding attribute of all dependent variable objects is set to base64 after import. The user may update this attribute, at any time, with a string containing a valid encoding literal, for example,

>>> y.encoding = 'base64'

The value of this attribute will be used in serializing the data to the file, when using the save() method.

Returns

A string with a valid encoding type.

Raises

ValueError – If an invalid encoding value is assigned.
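To make the three encodings concrete, the sketch below serializes one float32 component with the standard-library base64 module and NumPy. It fixes little-endian byte order for the binary form, which is an assumption here rather than something stated in this section. It only shows what each encoding stores, not how csdmpy writes files.

```python
import base64
import numpy as np

component = np.arange(4, dtype="<f4")  # float32, explicitly little-endian

# raw: the binary bytes of the array.
raw_bytes = component.tobytes()

# base64: the same bytes, encoded as an ASCII string that can sit inside JSON.
b64_text = base64.b64encode(raw_bytes).decode("ascii")

# none: a plain text (list of numbers) representation of the values.
text_values = component.tolist()

# Decoding the base64 string recovers the original array exactly.
decoded = np.frombuffer(base64.b64decode(b64_text), dtype="<f4")
print(len(raw_bytes), text_values, np.array_equal(decoded, component))
# 16 [0.0, 1.0, 2.0, 3.0] True
```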

numeric_type¶

The numeric type of the component values from the dependent variable.

There are currently twelve valid numeric types in the core scientific dataset model:

  • unsigned integers: uint8, uint16, uint32, uint64

  • signed integers: int8, int16, int32, int64

  • floating point: float32, float64

  • complex: complex64, complex128

Besides, csdmpy also accepts any valid type object, such as int, float, np.complex64, as long as the type is consistent with the above twelve entries.

When assigning a valid value, this attribute updates the dtype of the Numpy array from the corresponding components attribute.

>>> y.numeric_type
'float32'

>>> print(y.components)
[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24. 25. 26. 27. 28. 29.]]

>>> y.numeric_type = 'complex64'
>>> print(y.components[:,:5])
[[ 0.+0.j  1.+0.j  2.+0.j  3.+0.j  4.+0.j]
 [10.+0.j 11.+0.j 12.+0.j 13.+0.j 14.+0.j]
 [20.+0.j 21.+0.j 22.+0.j 23.+0.j 24.+0.j]]

>>> y.numeric_type = float # python type object
>>> print(y.components[:,:5])
[[ 0.  1.  2.  3.  4.]
 [10. 11. 12. 13. 14.]
 [20. 21. 22. 23. 24.]]
Returns

A string with a valid numeric type.

Raises

ValueError – If an invalid numeric type value is assigned.
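The requirement that a type object be consistent with the twelve listed entries can be sketched with np.dtype, which converts strings, Python types, and NumPy scalar types to a canonical dtype name. The helper below is hypothetical and only mirrors the documented behavior; csdmpy's own validation may differ.

```python
import numpy as np

# The twelve numeric types listed above.
VALID_NUMERIC_TYPES = {
    "uint8", "uint16", "uint32", "uint64",
    "int8", "int16", "int32", "int64",
    "float32", "float64",
    "complex64", "complex128",
}

def normalize_numeric_type(value):
    """Hypothetical helper: map a string or type object to a valid type name."""
    name = np.dtype(value).name  # e.g. float -> 'float64'
    if name not in VALID_NUMERIC_TYPES:
        raise ValueError(f"{name!r} is not a valid CSDM numeric type.")
    return name

print(normalize_numeric_type(float))         # float64
print(normalize_numeric_type(np.complex64))  # complex64
print(normalize_numeric_type("u1"))          # uint8
```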

quantity_type¶

Quantity type of the dependent variable.

There are currently five valid quantity types,

scalar
vector_n
pixel_n
matrix_n_m
symmetric_matrix_n

where n and m are integers. The value of the attribute is modified with a string containing a valid quantity type.

>>> y.quantity_type
'pixel_3'
>>> y.quantity_type = 'vector_3'
Returns

A string with a valid quantity type.

Raises

ValueError – If an invalid value is assigned.
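Each quantity type also fixes the number of components, p, along the zeroth axis of the components array: n for vector_n and pixel_n, n × m for matrix_n_m, and n(n+1)/2 for symmetric_matrix_n, which gives the six unique components of a 3 × 3 symmetric tensor mentioned in the class description. The helper below is a hypothetical sketch of those counts, not part of csdmpy.

```python
def component_count(quantity_type):
    """Hypothetical helper: number of components p implied by a quantity type."""
    if quantity_type == "scalar":
        return 1
    if quantity_type.startswith("symmetric_matrix_"):
        n = int(quantity_type[len("symmetric_matrix_"):])
        return n * (n + 1) // 2  # unique entries of an n x n symmetric matrix
    if quantity_type.startswith("matrix_"):
        n, m = (int(v) for v in quantity_type[len("matrix_"):].split("_"))
        return n * m
    if quantity_type.startswith(("vector_", "pixel_")):
        return int(quantity_type.split("_", 1)[1])
    raise ValueError(f"Invalid quantity type: {quantity_type!r}")

print(component_count("pixel_3"))             # 3
print(component_count("symmetric_matrix_3"))  # 6
print(component_count("matrix_2_3"))          # 6
```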

component_labels¶

List of labels corresponding to the components of the dependent variable.

>>> y.component_labels
['', '', '']

To update the component_labels, assign an array of strings with the same number of elements as the number of components.

>>> y.component_labels = ['channel 0', 'channel 1', 'channel 2']

The individual labels are accessed with proper indexing, for example,

>>> y.component_labels[2]
'channel 2'
Returns

A list of component label strings.

Raises

TypeError – When the assigned value is not an array of strings.

components¶

Component array of the dependent variable.

The value of this attribute, \(\mathbb{U}\), is a Numpy array of shape \((p \times N_{d-1} \times ... N_1 \times N_0)\) where \(p\) is the number of components, and \(N_k\) is the number of points from the \(k^\mathrm{th}\) Dimension object.

Note

The shape of the components Numpy array, \((p \times N_{d-1} \times ... N_1 \times N_0)\), is the reverse of the shape of the components array, \((N_0 \times N_1 \times ... N_{d-1} \times p)\), from the CSD model. This is because the CSD model uses a column-major order to shape the components array relative to the order of the dimensions, while Numpy uses a row-major order.

The dimensionality of this Numpy array is \(d+1\), where \(d\) is the number of dimension objects. The zeroth axis, with \(p\) points, indexes the components.
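The reversed shape describes the same data in two memory orders. In the NumPy sketch below (arbitrary sizes p = 2, N_0 = 4, N_1 = 3), flattening the row-major (p × N_1 × N_0) array gives exactly the same sequence of values as flattening its transpose, shaped (N_0 × N_1 × p) as in the CSD model, in column-major order. In both layouts the N_0 index varies fastest.

```python
import numpy as np

p, n0, n1 = 2, 4, 3
# csdmpy-style components array: shape (p, N_1, N_0), row-major (C order).
components = np.arange(p * n1 * n0).reshape(p, n1, n0)

# CSD-model view: shape (N_0, N_1, p), read in column-major (Fortran) order.
csd_view = components.T
print(csd_view.shape)  # (4, 3, 2)

# Same values in the same order: N_0 varies fastest in both layouts.
same = np.array_equal(components.ravel(order="C"), csd_view.ravel(order="F"))
print(same)  # True

# Element (q, j1, j0) in NumPy corresponds to (j0, j1, q) in the CSD view.
assert components[1, 2, 3] == csd_view[3, 2, 1]
```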

This attribute can only be updated when the shape of the new array is the same as the shape of the components array.

For example,

>>> print(y.components.shape)
(3, 10)
>>> y.numeric_type
'float32'

is a three-component dependent variable with ten data values per component. The numeric type of the data values, in this example, is float32. To update the components array, assign an array of shape (3, 10) to the components attribute. In the following example, we assign a Numpy array,

>>> y.components = np.linspace(0, 255, 30, dtype='u1').reshape(3, 10)
>>> y.numeric_type
'uint8'

Notice that the value of the numeric_type attribute is automatically updated based on the dtype of the Numpy array, in this case from float32 to uint8. In the following example,

>>> try: 
...     y.components = np.random.rand(1,10).astype('u1')
... except ValueError as e:
...     print(e)
The shape of the `ndarray`, `(1, 10)`, is inconsistent with the
shape of the components array, `(3, 10)`.

a ValueError is raised because the shape of the input array (1, 10) is not consistent with the shape of the components array, (3, 10).

Returns

A Numpy array of components.

Raises

ValueError – When assigning an array whose shape is not consistent with the shape of the components array.

components_url¶

URL where the data components of the dependent variable are stored.

This attribute is only informative and cannot be modified. Its value is a string containing the local or remote address of the file where the data values are stored. The attribute is only valid for dependent variables of type external.

Returns

A string containing the URL.

Raises

AttributeError – When assigned a value.

axis_label¶

List of formatted string labels for each component of the dependent variable.

This attribute is not a part of the original core scientific dataset model; however, it is a convenient supplementary attribute that provides formatted strings ready for labeling the components of the dependent variable. The string at index i is formatted as component_labels[i] / unit if component_labels[i] is a non-empty string, otherwise quantity_name / unit. Here, quantity_name, component_labels, and unit are attributes of the DependentVariable instance. For example,

>>> y.axis_label
['energy / (s W)', 'energy / (s W)', 'energy / (s W)']
Returns

A list of formatted component label strings.

Raises

AttributeError – When assigned a value.

data_structure¶

JSON serialized string describing the DependentVariable class instance.

This supplementary attribute is useful for a quick preview of the dependent variable object. For convenience, the values from the components attribute are truncated to the first and the last two numbers per component. The encoding keyword is also hidden from this view.

>>> print(y.data_structure)
{
  "type": "internal",
  "description": "A test image",
  "name": "star",
  "unit": "s * W",
  "quantity_name": "energy",
  "numeric_type": "float32",
  "quantity_type": "pixel_3",
  "components": [
    [
      "0.0, 1.0, ..., 8.0, 9.0"
    ],
    [
      "10.0, 11.0, ..., 18.0, 19.0"
    ],
    [
      "20.0, 21.0, ..., 28.0, 29.0"
    ]
  ]
}
Returns

A json serialized string of the dependent variable object.

Raises

AttributeError – When modified.
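The truncated preview keeps the first two and last two values of each component. The hypothetical helper below produces the same strings as the output above; how it handles components with four or fewer values is an assumption, since the example does not show that case.

```python
def preview(values, keep=2):
    """Hypothetical helper: abbreviate a component as 'a, b, ..., y, z'."""
    values = [float(v) for v in values]
    if len(values) <= 2 * keep:
        # Assumed behaviour for short components: show every value.
        return ", ".join(str(v) for v in values)
    head = ", ".join(str(v) for v in values[:keep])
    tail = ", ".join(str(v) for v in values[-keep:])
    return f"{head}, ..., {tail}"

print(preview(range(10)))      # 0.0, 1.0, ..., 8.0, 9.0
print(preview(range(10, 20)))  # 10.0, 11.0, ..., 18.0, 19.0
```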

Method Documentation

to(unit)[source]¶

Convert the unit of the dependent variable to the unit.

Parameters

unit – A string containing a unit with the same dimensionality as the components of the dependent variable.

>>> y.unit
Unit("s W")
>>> print(y.components[0,5])
5.0
>>> y.to('mJ')
>>> y.unit
Unit("mJ")
>>> print(y.components[0,5])
5000.0

Note

This method is a wrapper of the to method from the Quantity class.

dict()[source]¶

Return DependentVariable object as a python dictionary.

Example

>>> y.dict() 
{'type': 'internal', 'description': 'A test image', 'name': 'star',
'unit': 's * W', 'quantity_name': 'energy', 'encoding': 'none',
'numeric_type': 'float32', 'quantity_type': 'pixel_3',
'components': [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
[10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
[20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]]}
to_dict()[source]¶

Alias to the dict() method of the class.

copy()[source]¶

Return a copy of the DependentVariable object.

Statistics¶

Methods Summary

integral

Evaluate the integral of the dependent variables over all dimensions.

mean

Evaluate the mean coordinate of a dependent variable along each dimension.

var

Evaluate the variance of the dependent variables along each dimension.

std

Evaluate the standard deviation of the dependent variables along each dimension.

Method Documentation

csdmpy.statistics.integral(csdm)[source]¶

Evaluate the integral of the dependent variables over all dimensions.

Parameters

csdm – A csdm object.

Returns

A list of integrals corresponding to the list of the dependent variables. If only one dependent variable is present, return a quantity instead.

Example

>>> import numpy as np
>>> import csdmpy as cp
>>> import csdmpy.statistics as stat
>>> x = np.arange(100) * 2 - 100.0
>>> gauss = np.exp(-((x - 5.) ** 2) / (2 * 4. ** 2))
>>> csdm = cp.as_csdm(gauss, unit='T')
>>> csdm.dimensions[0] = cp.as_dimension(x, unit="m")
>>> stat.integral(csdm)
<Quantity 10.0265131 m T>
csdmpy.statistics.mean(csdm)

Evaluate the mean coordinate of a dependent variable along each dimension.

Parameters

csdm – A csdm object.

Returns

A list of tuples, where each tuple holds the mean coordinates of the respective dependent variable along each dimension. If only one dependent variable is present, a single tuple of coordinates is returned instead.

Example

>>> stat.mean(csdm)
(<Quantity 5. m>,)
csdmpy.statistics.var(csdm)

Evaluate the variance of the dependent variables along each dimension.

Parameters

csdm – A csdm object.

Returns

A list of tuples, where each tuple holds the variance of the respective dependent variable along each dimension. If only one dependent variable is present, a single tuple is returned instead.

Example

>>> stat.var(csdm)
(<Quantity 16. m2>,)
csdmpy.statistics.std(csdm)

Evaluate the standard deviation of the dependent variables along each dimension.

Parameters

csdm – A csdm object.

Returns

A list of tuples, where each tuple holds the standard deviation of the respective dependent variable along each dimension. If only one dependent variable is present, a single tuple is returned instead.

Example

>>> stat.std(csdm)
(<Quantity 4. m>,)
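For a single-component, one-dimensional dataset on a uniform grid, the values in the examples above can be reproduced with plain NumPy. This sketch assumes that the integral is a Riemann sum (the sum of the samples times the coordinate increment) and that the mean and variance are intensity-weighted moments of the coordinates. These definitions are our assumption, not a description of csdmpy's implementation. The helper moments is hypothetical, and units are dropped.

```python
import numpy as np


def moments(x, y):
    """Return (integral, mean, var, std) of samples y on a uniform grid x.

    Hypothetical helper (not part of csdmpy); units are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[1] - x[0]                               # uniform coordinate increment
    integral = np.sum(y) * dx                      # Riemann-sum estimate of the integral
    mean = np.sum(x * y) / np.sum(y)               # intensity-weighted mean coordinate
    var = np.sum((x - mean) ** 2 * y) / np.sum(y)  # intensity-weighted variance
    return integral, mean, var, np.sqrt(var)


# Same Gaussian as in the examples above (centred at 5, standard deviation 4).
x = np.arange(100) * 2 - 100.0
gauss = np.exp(-((x - 5.0) ** 2) / (2 * 4.0 ** 2))
integral, mean, var, std = moments(x, gauss)
```

The Gaussian is finely sampled and negligible at the grid edges, so integral, mean, var, and std come out very close to the 10.0265131, 5, 16, and 4 quoted above.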

CSDMAxes

class csdmpy.helper_functions.CSDMAxes(fig, rect, facecolor=None, frameon=True, sharex=None, sharey=None, label='', xscale=None, yscale=None, box_aspect=None, **kwargs)

Bases: matplotlib.axes._axes.Axes

A custom CSDM data plot axes.

Methods Summary

plot

Generate a figure axes using the plot method from the matplotlib library.

scatter

Generate a figure axes using the scatter method from the matplotlib library.

imshow

Generate a figure axes using the imshow method from the matplotlib library.

contour

Generate a figure axes using the contour method from the matplotlib library.

contourf

Generate a figure axes using the contourf method from the matplotlib library.

Method Documentation

plot(csdm, *args, **kwargs)

Generate a figure axes using the plot method from the matplotlib library.

Applies to all one-dimensional datasets with single-component dependent variables. When there are multiple dependent variables, the data from each dependent variable is plotted on the same figure.

Parameters
  • csdm – A CSDM object of a one-dimensional dataset.

  • kwargs – Additional keyword arguments for the matplotlib plot() method.

Example

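>>> import matplotlib.pyplot as plt
>>> import csdmpy as cp  # importing csdmpy makes projection='csdm' available
>>> csdm_object = cp.load('filename.csdf')  # hypothetical file name
>>> ax = plt.subplot(projection='csdm')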
>>> ax.plot(csdm_object) 
>>> plt.show() 
scatter(csdm, *args, **kwargs)

Generate a figure axes using the scatter method from the matplotlib library.

Applies to all one-dimensional datasets with single-component dependent variables. When there are multiple dependent variables, the data from each dependent variable is plotted on the same figure.

Parameters
  • csdm – A CSDM object of a one-dimensional dataset.

  • kwargs – Additional keyword arguments for the matplotlib scatter() method.

Example

>>> ax = plt.subplot(projection='csdm') 
>>> ax.scatter(csdm_object) 
>>> plt.show() 
imshow(csdm, origin='lower', *args, **kwargs)

Generate a figure axes using the imshow method from the matplotlib library.

Applies to all two-dimensional datasets whose dependent variables have a single component (scalar), three components (pixel_3), or four components (pixel_4). A scalar dependent variable produces a colormap image, a pixel_3 dependent variable produces an RGB image, and a pixel_4 dependent variable produces an RGBA image.

When there are multiple dependent variables, the data from each dependent variable is plotted on the same figure.

Parameters
  • csdm – A CSDM object of a two-dimensional dataset with scalar, pixel_3, or pixel_4 quantity_type dependent variable.

  • origin – The matplotlib origin argument. In matplotlib, the default is ‘upper’; in csdmpy, however, the default is ‘lower’.

  • kwargs – Additional keyword arguments for the matplotlib imshow() method.

Example

>>> ax = plt.subplot(projection='csdm') 
>>> ax.imshow(csdm_object) 
>>> plt.show() 
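To show what origin='lower' and a pixel_3 dependent variable mean in plain matplotlib terms, the sketch below draws a small RGB image with imshow directly. The data layout is an assumption made for illustration: three components stored as (3, N1, N0), with dimension 0 along the image columns. The random values are also illustrative. This is not csdmpy's implementation; it only shows the matplotlib calls involved.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pixel_3 data: 3 components on an N1 x N0 grid, stored as
# (3, N1, N0) so that dimension 0 runs along the image columns (assumption).
n0, n1 = 4, 2
x0 = np.linspace(0.0, 3.0, n0)  # coordinates along dimension 0
x1 = np.linspace(0.0, 1.0, n1)  # coordinates along dimension 1
rng = np.random.default_rng(0)
components = rng.random((3, n1, n0))  # values in [0, 1), valid RGB intensities

# imshow expects RGB data as (rows, columns, channels): move the component axis last.
rgb = np.moveaxis(components, 0, -1)

fig, ax = plt.subplots()
im = ax.imshow(
    rgb,
    origin="lower",  # csdmpy's default: the first row is drawn at the bottom
    extent=[x0[0], x0[-1], x1[0], x1[-1]],  # image edges at the first/last coordinates
    aspect="auto",
)
ax.set_xlabel("dimension 0")
ax.set_ylabel("dimension 1")
plt.close(fig)  # release the figure; the sketch only builds it
```

With matplotlib's default origin='upper', the same array would be drawn with its first row at the top. In that case, the coordinates along dimension 1 would appear to decrease upward.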
contour(csdm, *args, **kwargs)

Generate a figure axes using the contour method from the matplotlib library.

Applies to all two-dimensional datasets with single-component (scalar) dependent variables. When there are multiple dependent variables, the data from each dependent variable is plotted on the same figure.

Parameters
  • csdm – A CSDM object of a two-dimensional dataset with scalar dependent variable.

  • kwargs – Additional keyword arguments for the matplotlib contour() method.

Example

>>> ax = plt.subplot(projection='csdm') 
>>> ax.contour(csdm_object) 
>>> plt.show() 
contourf(csdm, *args, **kwargs)

Generate a figure axes using the contourf method from the matplotlib library.

Applies to all two-dimensional datasets with single-component (scalar) dependent variables. When there are multiple dependent variables, the data from each dependent variable is plotted on the same figure.

Parameters
  • csdm – A CSDM object of a two-dimensional dataset with scalar dependent variable.

  • kwargs – Additional keyword arguments for the matplotlib contourf() method.

Example

>>> ax = plt.subplot(projection='csdm') 
>>> ax.contourf(csdm_object) 
>>> plt.show() 

Numpy methods

Supported NumPy functions

The csdm object supports NumPy functions, used as

>>> y = np.func(x) 

where x and y are csdm objects and func is any one of the functions listed below. Each function is applied to every component of the dependent variables of the csdm object x.
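To illustrate how a NumPy function can be applied to every component of an object that is not a plain array, the sketch below implements NumPy's __array_ufunc__ protocol on a hypothetical ToyCSDM class. We assume, without checking csdmpy's source, that CSDM objects hook into NumPy in a comparable way. ToyCSDM is not part of csdmpy and ignores units, so it does not enforce the dimensionless-component requirement noted below.

```python
import numpy as np


class ToyCSDM:
    """Hypothetical stand-in for a CSDM object (not part of csdmpy).

    Holds the coordinates of one dimension and a list of component arrays,
    one per dependent-variable component. Units are ignored.
    """

    def __init__(self, coords, components):
        self.coords = np.asarray(coords, dtype=float)
        self.components = [np.asarray(c, dtype=float) for c in components]

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handle only plain element-wise calls with one input, e.g. np.sin(x).
        if method != "__call__" or len(inputs) != 1 or kwargs:
            return NotImplemented
        # Apply the ufunc to every component; the coordinates are copied unchanged.
        return ToyCSDM(self.coords.copy(), [ufunc(c) for c in self.components])


x = ToyCSDM(coords=[0.0, 1.0, 2.0],
            components=[[0.0, np.pi / 2, np.pi], [0.0, 0.5, 1.0]])
y = np.sin(x)  # NumPy dispatches to ToyCSDM.__array_ufunc__
```

Returning a new object and leaving the coordinates untouched mirrors the description above: only the dependent-variable components change.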

Trigonometric functions

The trigonometric functions apply to the components of the dependent variables from a csdm object.

Note

The components must be dimensionless quantities.

A list of supported trigonometric functions.

  • sin – Apply sine to the components of the dependent variables.

  • cos – Apply cosine to the components of the dependent variables.

  • tan – Apply tangent to the components of the dependent variables.

  • arcsin – Apply inverse sine to the components of the dependent variables.

  • arccos – Apply inverse cosine to the components of the dependent variables.

  • arctan – Apply inverse tangent to the components of the dependent variables.

  • sinh – Apply hyperbolic sine to the components of the dependent variables.

  • cosh – Apply hyperbolic cosine to the components of the dependent variables.

  • tanh – Apply hyperbolic tangent to the components of the dependent variables.

  • arcsinh – Apply inverse hyperbolic sine to the components of the dependent variables.

  • arccosh – Apply inverse hyperbolic cosine to the components of the dependent variables.

  • arctanh – Apply inverse hyperbolic tangent to the components of the dependent variables.

Mathematical operations

The following mathematical functions apply to the components of the dependent variables from a csdm object.

Note

The components must be dimensionless quantities.

A list of supported mathematical functions.

  • exp – Calculate the exponential of the components of the dependent variables.

  • expm1 – Calculate \(e^x - 1\), where x are the components of the dependent variables.

  • exp2 – Calculate \(2^x\), where x are the components of the dependent variables.

  • log – Calculate the natural logarithm of the components of the dependent variables.

  • log1p – Calculate \(\ln(1 + x)\), the natural logarithm of one plus x, where x are the components of the dependent variables.

  • log2 – Calculate the base-2 logarithm of the components of the dependent variables.

  • log10 – Calculate the base-10 logarithm of the components of the dependent variables.

The following mathematical functions apply to the components of the dependent variables from a csdm object irrespective of the components’ dimensionality.

Arithmetic operations

  • reciprocal – Return the element-wise reciprocal.

  • positive – Return the element-wise numerical positive.

  • negative – Return the element-wise numerical negative.

Miscellaneous

  • sqrt – Return the element-wise non-negative square root.

  • cbrt – Return the element-wise cube root.

  • square – Return the element-wise square.

  • absolute – Return the element-wise absolute value.

  • fabs – Return the element-wise absolute value.

  • sign – Return the element-wise sign of the values.

Handling complex numbers

  • angle – Return the element-wise angle of a complex value.

  • real – Return the element-wise real part of a complex value.

  • imag – Return the element-wise imaginary part of a complex value.

  • conj – Return the element-wise complex conjugate.

  • conjugate – Return the element-wise complex conjugate.

Sums, products, differences

  • prod – Return the product of the components of a dependent variable along a dimension.

  • sum – Return the sum of the components of a dependent variable along a dimension.
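The sketch below shows, in plain NumPy, what reducing "along a dimension" means for a single component array. It assumes a csdmpy convention in which a component array stores its axes in reverse order of the dimensions, so that dimension 0 is the last NumPy axis. The helper reduce_along_dimension is hypothetical and not part of csdmpy.

```python
import numpy as np


def reduce_along_dimension(component, dimension, func=np.sum):
    """Reduce one component array along a CSDM dimension index.

    Hypothetical helper: assumes dimension 0 is the last NumPy axis,
    dimension 1 the second-to-last, and so on.
    """
    component = np.asarray(component)
    axis = component.ndim - 1 - dimension  # map the dimension index to a NumPy axis
    return func(component, axis=axis)


# A 2D component with N0 = 3 points along dimension 0 and N1 = 2 along dimension 1.
component = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])

sum_dim0 = reduce_along_dimension(component, 0)             # sum over each row
sum_dim1 = reduce_along_dimension(component, 1)             # sum over each column
prod_dim0 = reduce_along_dimension(component, 0, np.prod)   # product over each row
```

Each reduction removes the reduced dimension, so the two-dimensional component becomes one-dimensional.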

Rounding

  • rint – Round elements to the nearest integer.

  • around – Round elements to the given number of decimals.

  • round – Round elements to the given number of decimals.

Other functions

  • min

  • max

  • mean

  • var

  • std

Dimension-specific apodization methods

The following methods, of the form

\[y = f(a x),\]

where \(a\) is the function argument and \(x\) are the coordinates along the dimension, apodize the components of the dependent variables along the respective dimensions, that is, multiply the components by \(f(a x)\) along those dimensions. The dimensionality of \(a\) must be the reciprocal of that of \(x\). The resulting CSDM object has the same number of dimensions as the original object.
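The sketch below illustrates the exponential case. The component array is multiplied element-wise by \(\exp(a x)\), evaluated at the coordinates of the chosen dimension, so the array shape and the number of dimensions do not change. The sketch works on bare arrays with units dropped and assumes that \(a\) and \(x\) are already in reciprocal units, for example \(a\) in Hz and \(x\) in s. It also assumes that dimension 0 is the last NumPy axis. The helper apodize_exp is hypothetical and is not csdmpy.apodize.exp.

```python
import numpy as np


def apodize_exp(component, coords, a, dimension=0):
    """Multiply `component` by exp(a * coords) along one CSDM dimension.

    Hypothetical helper (not csdmpy.apodize.exp). Assumes dimension 0 is the
    last NumPy axis and that `a` and `coords` are in reciprocal units.
    """
    component = np.asarray(component, dtype=float)
    axis = component.ndim - 1 - dimension
    # Shape the 1D window so it broadcasts along `axis` only.
    shape = [1] * component.ndim
    shape[axis] = -1
    window = np.exp(a * np.asarray(coords, dtype=float)).reshape(shape)
    return component * window


component = np.ones((2, 3))                    # N1 = 2 points, N0 = 3 points
t = np.array([0.0, 1.0, 2.0])                  # coordinates along dimension 0 (s)
apodized = apodize_exp(component, t, a=-0.5)   # a = -0.5 Hz gives an exponential decay
```

Passing dimension=1 together with the two coordinates of dimension 1 applies the window along the other axis instead. In either case, the output has the same shape as the input.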

Method Summary

sin(csdm, arg[, dimension])

Apodize the components along the dimension with \(\sin(a x)\).

cos(csdm, arg[, dimension])

Apodize the components along the dimension with \(\cos(a x)\).

tan(csdm, arg[, dimension])

Apodize the components along the dimension with \(\tan(a x)\).

arcsin(csdm, arg[, dimension])

Apodize the components along the dimension with \(\arcsin(a x)\).

arccos(csdm, arg[, dimension])

Apodize the components along the dimension with \(\arccos(a x)\).

arctan(csdm, arg[, dimension])

Apodize the components along the dimension with \(\arctan(a x)\).

exp(csdm, arg[, dimension])

Apodize the components along the dimension with \(\exp(a x)\).

Method Documentation

csdmpy.apodize.sin(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\sin(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\sin(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.cos(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\cos(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\cos(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.tan(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\tan(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\tan(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.arcsin(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\arcsin(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\arcsin(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.arccos(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\arccos(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\arccos(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.arctan(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\arctan(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\arctan(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

csdmpy.apodize.exp(csdm, arg, dimension=0)

Apodize the components along the dimension with \(\exp(a x)\).

Parameters
  • csdm – A CSDM object.

  • arg – String or Quantity object. The function argument \(a\).

  • dimension – An integer or a tuple of integers corresponding to the index/indices of the dimension(s) along which the components are apodized with \(\exp(a x)\).

Returns

A CSDM object with the same number of dimensions as the original csdm object, with the components apodized along the given dimension(s).

Changelog

v0.4.1

Patch update to the CSDM dimension’s quantity_name attribute, whose value is derived from the units, for compatibility with astropy>=4.3.

v0.4

What’s new

  • The add_dimension and add_dependent_variable methods of the CSDM class are deprecated.

Bugfix

  • Fixed an error in calculating the NMR dimensionless frequency ratio (ppm) when dimension.complex_fft=False.

v0.3.5

  • Fixed the missing-library error in the pip installation.

v0.3.4

Changes

  • Image and contour plots of csdm objects no longer draw a colorbar. A colorbar can be requested separately using plt.colorbar().

v0.3.3

What’s new!

  • Added a size method to the CSDM object.

  • Added short aliases for frequently used csdm attributes and methods. The aliases are

    • dependent_variables -> y

    • dimensions -> x

    • add_dependent_variable -> add_y

    • add_dimension -> add_x

    • coordinates -> coords

Bug fixes

  • Fixed bug causing a false error when reading sparse datasets.

v0.3.2

Bug fixes

  • Fixed a bug in the fft method when applied to multi-dimensional CSDM objects.

  • Added new tutorial examples.

v0.3.1

Bug fixes

  • Fixed the phase multiplier in the CSDM.fft() method, which multiplied the signal vector by an incorrect phase.

v0.3.0

What’s new!

  • Support for matplotlib.pyplot functions from CSDM objects.
    • plot,

    • scatter,

    • imshow,

    • contour, and

    • contourf

    Now you can directly plot CSDM objects as an argument to the above matplotlib methods.

v0.2.2

Bug fixes

  • Fixed a bug where the metadata from the csdm.application key was not serialized to the file when using the csdm.save() method.

  • Fixed a bug where the transpose of a CSDM object failed to retain the quantity_type information after the transpose.

Other changes

  • Added a new diffusion tensor MRI dataset to the example gallery.

  • Added dict() as an alias to the to_dict() method for all objects.

  • Added an alias of the cp.plot() function to the CSDM object as the plot() method.

v0.2.1

What’s new!

  • Added reciprocal_coordinates() and reciprocal_increment() methods to the LinearDimension class.

  • Added the fft() method to the CSDM class.

  • Added transpose() method to the CSDM class.

v0.2.0

What’s new!

  • Added the following methods to the CSDM class:
    • __eq__() for all classes.

    • __add__() = Add two csdm objects.

    • __iadd__() = Add two csdm objects in-place.

    • __sub__() = Subtract two csdm objects.

    • __isub__() = Subtract two csdm objects in-place.

    • __mul__() = Multiply the components of the csdm object by a scalar.

    • __imul__() = Multiply the components of the csdm object by a scalar in-place.

    • __truediv__() = Divide the components of the csdm object by a scalar.

    • __itruediv__() = Divide the components of the csdm object by a scalar in-place.

    • split() = Split the dependent variables into individual csdm objects.

  • Support for Numpy dimension reduction functions
    • sum(): Sum along a given dimension.

    • prod(): Product along a given dimension.

  • Support for Numpy ufunc functions:
    • sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, arctanh, exp, exp2, log, log2, log10, expm1, log1p, negative, positive, square, absolute, fabs, rint, sign, conj, conjugate, sqrt, cbrt, reciprocal

  • Added apodization functions.
    • sin, cos, tan, arcsin, arccos, arctan, exp

Bug fixes

  • Fixed a bug in cp.plot() method.

v0.1.5

  • Added a method to convert the frequency dimension to the NMR dimensionless frequency ratio with the syntax dimension.to('ppm', 'nmr_frequency_ratio'), where dimension is a LinearDimension object.

  • The csdmpy.plot() method also displays the dimension index on the axis label.

v0.1.4

  • Added to_dict() method to the CSDM, Dimension, and DependentVariable objects.

v0.1.3

  • Fixed the warning message shown when the physical quantity name is not found in the astropy units package.

  • Added dumps and loads functions to, respectively, dump the data model to and load it from a JSON-serialized string without writing to a file.

v0.0.11 to v0.1.2

  • Added a required unsigned_integer_type key to the SparseSampling dimension.

  • Fixed minor bugs.

  • Added a tags attribute to the CSDmodel object.

  • Changed ‘sampling_interval’ key to ‘count’.

  • Changed ‘quantity’ key to ‘quantity_name’.

  • Changed ‘index_zero_value’ key to ‘coordinates_offset’.

  • Changed ‘fft_output_order’ key to ‘complex_fft’.

  • Renamed IndependentVariable class to Dimension.

  • Renamed LinearlySpacedDimension class to LinearDimension.

  • Renamed ArbitrarilySpacedDimension class to MonotonicDimension.

  • Added a reciprocal attribute to LinearDimension and MonotonicDimension classes.

  • Removed the reverse attribute from all Dimension classes.

  • Changed ‘sampling_interval’ keyword to ‘increment’.

  • Changed ‘reference_offset’ keyword to ‘index_zero_value’.

  • Changed ‘linear_spacing’ literal to ‘linear’.

  • Changed ‘arbitrarily_sampled’ literal to ‘monotonic’.

  • Changed the definition of the coordinates of the LinearDimension from

    \[X^\text{ref} = m_k J_k - c_k {\bf 1}\]

    to

    \[X^\text{ref} = m_k J_k + c_k {\bf 1},\]

    where \(c_k\) is the reference offset, \(m_k\) is the increment, and \(J_k\) is the set of integer indices along the dimension.

  • Added ‘description’ key to ‘Dimension’, ‘DependentVariable’ and ‘CSDM’ object.

  • Changed ‘CSDM’ keyword to ‘csdm’

  • Changed ‘FFT_output_order’ keyword to ‘fft_output_order’

  • Changed ‘components_URL’ keyword to ‘components_url’


Citations

1

Srivastava D.J., Vosegaard T., Massiot D., Grandinetti P.J. (2020) Core Scientific Dataset Model: A lightweight and portable model and file format for multi-dimensional scientific data. PLOS ONE 15(1): e0225953.

Additionally, if you use the csdmpy Python package, please cite the respective version, available from Zenodo.

Media coverage

Des chimistes élaborent un nouveau format pour le partage de données scientifiques (Chemists develop a new format for sharing scientific data).

Simplifying how scientists share data.

Indices and tables