Welcome to csdmpy’s documentation!¶
The csdmpy package is the Python support for the core scientific dataset (CSD) model file exchange-format. The package is based on the core scientific dataset (CSD) model, which is designed as a building block in the development of a more sophisticated portable scientific dataset file standard. The CSD model is capable of handling a wide variety of scientific datasets both within and across disciplinary fields.
The main objective of this python package is to facilitate an easy import and export of the CSD model serialized files for Python users. The package utilizes Numpy library and, therefore, offers the end-users versatility to process or visualize the imported datasets with any third-party package(s) compatible with Numpy.
The sample CSDM compliant files used in this documentation are available online.
The core scientific dataset (CSD) model¶
The core scientific dataset (CSD) model is a light-weight, portable, versatile, and standalone data model capable of handling a variety of scientific datasets. The model only encapsulates data values and the minimum metadata to accurately represent a p-component dependent variable, \((\mathbf{U}_0, ... \mathbf{U}_q, ... \mathbf{U}_{p-1})\), discretely sampled at M unique points in a d-dimensional coordinate space, \((\mathbf{X}_0, \mathbf{X}_1, ... \mathbf{X}_k, ... \mathbf{X}_{d-1})\). The model is not intended to encapsulate any information on how the data might be acquired, processed, or visualized.
The data model is versatile in allowing many use cases for most spectroscopy, diffraction, and imaging techniques. As such the model supports multi-component datasets associated with continuous physical quantities that are discretely sampled in a multi-dimensional space associated with other carefully controlled quantities, for e.g., a mass as a function of temperature, a current as a function of voltage and time, a signal voltage as a function of magnetic field gradient strength, a color image with a red, green, and blue (RGB) light intensity components as a function of two independent spatial dimensions, or the six components of the symmetric second-rank diffusion tensor MRI as a function of three independent spatial dimensions. Additionally, the model supports multiple dependent variables sharing the same \(d\)-dimensional coordinate space. For example, a simultaneous measurement of current and voltage as a function of time, simultaneous acquisition of air temperature, pressure, wind velocity, and solar-flux as a function of Earth’s latitude and longitude coordinates. We refer to these dependent variables as correlated-datasets.
The CSD model is independent of the hardware, operating system, application software, programming language, and the object-oriented file-serialization format utilized in serializing the CSD model to the file. Out of numerous file serialization formats, XML, JSON, property list, we chose the data-exchange oriented JSON (JavaScript Object Notation) file-serialization format because it is human-readable and easily integrable with any number of programming languages and field related application-software.
CSDM¶
Description¶
The root level object of the CSD model.
Attributes¶
Name |
Type |
Description |
---|---|---|
version |
String |
A required version number of CSDM file-exchange format. |
dimensions |
[Dimension, …] |
A required ordered and unique array of dimension objects. An empty array is a valid value. |
dependent_variables |
[DependentVariable, …] |
A required array of dependent-variable objects. An empty array is a valid value. |
tags |
[String, …] |
An optional list of keywords associated with the dataset. |
read_only |
Boolean |
An optional value with default as False. If true, the serialized file is archived. |
timestamp |
String |
An optional UTC ISO-8601 format timestamp from when the CSDM-compliant file was last serialized. |
geographic_coordinate |
geographic_coordinate |
An optional object with attributes required to describe the location from where the CSDM-compliant file was last serialized. |
description |
String |
An optional description of the datasets in the CSD model. |
application |
Generic |
An optional generic dictionary object containing application specific metadata describing the CSDM object. |
Dimension¶
A generalized object describing a dimension of a multi-dimensional grid/space.
Specialized Class¶
Attributes¶
Name |
Type |
Description |
---|---|---|
type |
A required enumeration literal with a valid dimension subtype. |
|
label |
String |
An optional label of the dimension. |
description |
String |
An optional description of the dimension. |
application |
Generic |
An optional generic dictionary object containing application specific metadata describing the dimension. |
DependentVariable¶
Description¶
A generalized object describing a dependent variable of the dataset, which holds an ordered list of p components, indexed as q=0 to p-1, as
Specialized Class¶
Attributes¶
Name |
Type |
Description |
---|---|---|
type |
An enumeration literal with a valid dependent variable subtype. |
|
name |
String |
Name of the dependent variable. |
unit |
String |
The unit associated with the physical quantities describing the dependent variable. |
quantity_name |
String |
Quantity name associated with the physical quantities describing the dependent variable. |
numeric_type |
An enumeration literal with a valid numeric type. |
|
quantity_type |
An enumeration literal with a valid quantity type. |
|
component_labels |
[String, String, … ] |
Ordered array of labels associated with ordered array of components of the dependent variable. |
sparse_sampling |
Object with attribute required to describe a sparsely sampled dependent variable components. |
|
description |
String |
Description of the dependent variable. |
application |
Generic |
Generic dictionary object containing application specific metadata describing the dependent variable. |
Enumeration¶
DimObjectSubtype¶
An enumeration with literals as the value of the Dimension objects’ type attribute.
Literal |
Description |
---|---|
linear |
Literal specifying an instance of a LinearDimension object. |
monotonic |
Literal specifying an instance of a MonotonicDimension object. |
labeled |
Literal specifying an instance of a LabeledDimension object. |
DVObjectSubtype¶
An enumeration with literals as the values of the DependentVariable object’ type attribute.
Literal |
Description |
---|---|
internal |
Literal specifying an instance of an InternalDependentVariable object. |
external |
Literal specifying an instance of an ExternalDependentVariable object. |
NumericType¶
An enumeration with literals as the value of the DependentVariable objects’ numeric_type attribute.
Literal |
Description |
---|---|
uint8 |
8-bit unsigned integer |
uint16 |
16-bit unsigned integer |
uint32 |
32-bit unsigned integer |
uint64 |
64-bit unsigned integer |
int8 |
8-bit signed integer |
int16 |
16-bit signed integer |
int32 |
32-bit signed integer |
int64 |
64-bit signed integer |
float32 |
32-bit floating point number |
float64 |
64-bit floating point number |
complex64 |
two 32-bit floating points numbers |
complex128 |
two 64-bit floating points numbers |
QuantityType¶
An enumeration with literals as the value of the DependentVariable objects’ quantity_type attribute. The value is used in interpreting the p-components of the dependent variable.
- scalar
A dependent variable with \(p=1\) component interpret as a scalar, \(\mathcal{S}_i=U_{0,i}\).
- vector_n
A dependent variable with \(p=n\) components interpret as vector components, \(\mathcal{V}_i= \left[ U_{0,i}, U_{1,i}, ... U_{n-1,i}\right]\).
- matrix_n_m
A dependent variable with \(p=mn\) components interpret as a \(n \times m\) matrix as follows,
\[\begin{split}M_i = \left[ \begin{array}{cccc} U_{0,i} & U_{1,i} & ... &U_{(n-1)m,i} \\ U_{1,i} & U_{m+1,i} & ... &U_{(n-1)m+1,i} \\ \vdots & \vdots & \vdots & \vdots \\ U_{m-1,i} & U_{2m-1,i} & ... &U_{nm-1,i} \end{array} \right]\end{split}\]
- symmetric_matrix_n
A dependent variable with \(p=n^2\) components interpret as a matrix symmetric about its leading diagonal as shown below,
\[\begin{split}M^{(s)}_i = \left[ \begin{array}{cccc} U_{0,i} & U_{1,i} & ... & U_{n-1,i} \\ U_{1,i} & U_{n,i} & ... &U_{2n-2,i} \\ \vdots & \vdots & \vdots & \vdots \\ U_{n-1,i} & U_{2n-2,i} & ... &U_{\frac{n(n+1)}{2}-1,i} \end{array} \right]\end{split}\]
- pixel_n
A dependent variable with \(p=n\) components interpret as image/pixel components, \(\mathcal{P}_i= \left[ U_{0,i}, U_{1,i}, ... U_{n-1,i}\right]\).
Here, the terms \(n\) and \(m\) are intergers.
ScalarQuantity¶
ScalarQuantity is an object composed of a numerical value and any valid SI unit symbol or any number of accepted non-SI unit symbols. It is serialized in the JSON file as a string containing a numerical value followed by the unit symbol, for example,
“3.4 m” (SI)
“2.3 bar” (non-SI)
csdmpy package¶
Installing csdmpy package¶
Using PIP:
PIP is a package manager for Python packages and is included with python version 3.4 and higher.
$ pip install csdmpy
The requirements for csdmpy¶
The list of packages dependencies for csdmpy module
numpy>=1.10.1 (for handling n-dimensional arrays)
setuptools>=27.3
requests>=2.21.0 (for downloading files from server)
astropy>=3.0 (for astropy units module)
matplotlib>=3.0 (for rendering plots)
Getting Started With csdmpy package¶
We have put together a set of guidelines for importing the csdmpy package and related methods and attributes. We encourage the users to follow these guidelines to promote consistency, amongst others. Import the package using
>>> import csdmpy as cp
To load a .csdf or a .csdfe file, use the load()
method of the csdmpy module. In the following example, we load a
sample test file.
>>> filename = cp.tests.test01 # replace this with your file's name.
>>> testdata1 = cp.load(filename)
Here, testdata1
is an instance of the CSDM class.
At the root level, the CSDM object includes various useful optional
attributes that may contain additional information about the dataset. One such
useful attribute is the description
key, which briefs
the end-users on the contents of the dataset. To access the value of this
attribute use,
>>> testdata1.description
'A simulated sine curve.'
Accessing the dimensions and dependent variables of the dataset¶
An instance of the CSDM object may include multiple dimensions and
dependent variables. Collectively, the dimensions form a multi-dimensional grid
system, and the dependent variables populate this grid.
In csdmpy,
dimensions and dependent variables are structured as python’s tuple object.
To access these tuples, use the dimensions
and
dependent_variables
attribute of the CSDM object,
respectively. For example,
>>> x = testdata1.dimensions
>>> y = testdata1.dependent_variables
In this example, the dataset contains one dimension and one dependent variable, and therefore, x and y are tuples with a single instance.
>>> print('x is a {0} of length {1}.'.format(type(x).__name__, len(x)))
x is a tuple of length 1.
>>> print('y is a {0} of length {1}.'.format(type(y).__name__, len(y)))
y is a tuple of length 1.
You may access the instances of individual dimension and dependent variable by
using the proper indexing. For example, the dimension and dependent variable
at index 0 may be accessed using x[0]
and y[0]
, respectively.
Every instance of the Dimension object has its own set of attributes
that further describe the respective dimension. For example, a Dimension object
may have an optional description
attribute,
>>> x[0].description
'A temporal dimension.'
Similarly, every instance of the DependentVariable object has its own set of
attributes. In this example, the
description
attribute from the dependent variable is
>>> y[0].description
'A response dependent variable.'
Coordinates along the dimension¶
Every dimension object contains a list of coordinates associated with every
grid index along the dimension. To access these coordinates, use
the coordinates
attribute of the
respective Dimension instance. In this example, the coordinates are
>>> x[0].coordinates
<Quantity [0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] s>
Note
x[0].coordinates
returns a
Quantity
instance from the
Astropy package.
The csdmpy module utilizes the units library from
astropy.units module
to handle physical quantities. The numerical value and the
unit of the physical quantities are accessed through the Quantity
instance, using the value
and the unit
attributes, respectively.
Please refer to the astropy.units
documentation for details.
In the csdmpy module, the Quantity.value
is a
Numpy array.
For instance, in the above example, the underlying Numpy array from the
coordinates attribute is accessed as
>>> x[0].coordinates.value
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
Components of the dependent variable¶
Every dependent variable object has at least one component. The number of
components of the dependent variable is determined from the
quantity_type
attribute
of the dependent variable object. For example, a scalar quantity has
one-component, while a vector quantity may have multiple components. To access
the components of the dependent variable, use the
components
attribute of the respective DependentVariable instance. For example,
>>> y[0].components
array([[ 0.0000000e+00, 5.8778524e-01, 9.5105654e-01, 9.5105654e-01,
5.8778524e-01, 1.2246469e-16, -5.8778524e-01, -9.5105654e-01,
-9.5105654e-01, -5.8778524e-01]], dtype=float32)
The components
attribute
is a Numpy array. Note, the number of dimensions of this array is \(d+1\),
where \(d\) is the number of Dimension objects from the
dimensions
attribute. The additional dimension in the
Numpy array corresponds to the number of components of the dependent variable.
For instance, in this example, there is a single dimension, i.e., \(d=1\)
and, therefore, the value of the
components
attribute holds a two-dimensional Numpy array of shape
>>> y[0].components.shape
(1, 10)
where the first element of the shape tuple, 1, is the number of
components of the dependent variable and the second element, 10, is the
number of points along the dimension, i.e., x[0].coordinates
.
Plotting the dataset¶
It is always helpful to represent a scientific dataset with visual aids such as a plot or a figure instead of columns of numbers. As such, throughout this documentation, we provide a figure or two for every example dataset. We make use of Python’s Matplotlib library for generating these figures. The users may, however, use their favorite plotting library.
Attention
Although we show code for visualizing the dataset, this documentation is not a guide for data visualization.
The following snippet plots the dataset from this example. Here, the axis_label is an attribute of both Dimension and DependentVariable instances, and the name is an attribute of the DependentVariable instance.
>>> import matplotlib.pyplot as plt
>>> plt.plot(x[0].coordinates, y[0].components[0])
>>> plt.xlabel(x[0].axis_label)
>>> plt.ylabel(y[0].axis_label[0])
>>> plt.title(y[0].name)
>>> plt.show()

See also
CSDM, Dimension, DependentVariable, Quantity, numpy array, Matplotlib library
Examples¶
In this section, we present illustrative examples for importing files serialized with the CSD model using the csdmpy package. Because the CSD model allows multi-dimensional datasets with multiple dependent variables, we use a shorthand notation of \(d\mathrm{D}\{p\}\) to indicate that a dataset has a \(p\)-component dependent variable defined on a \(d\)-dimensional coordinate grid. In the case of correlated datasets, the number of components in each dependent variable is given as a list within the curly braces, i.e., \(d\mathrm{D}\{p_0, p_1, p_2, ...\}\).
The sample CSDM compliant files used in this documentation are available online.
Example Dataset
Scalar, 1D{1} datasets¶
The 1D{1} datasets are one dimensional, \(d=1\), with one single-component, \(p=1\), dependent variable. In this section, we walk through some examples of 1D{1} datasets.
Let’s start by first importing the csdmpy module.
>>> import csdmpy as cp
Global Mean Sea Level rise dataset¶
The following dataset is the Global Mean Sea Level (GMSL) rise from the late 19th to the Early 21st Century. The original dataset was downloaded as a CSV file and subsequently converted to the CSD model format. Let’s import this file.
>>> filename = 'Test Files/gmsl/GMSL.csdf'
>>> sea_level = cp.load(filename)
The variable filename is a string with the address to the sea_level.csdf
file.
The load()
method of the csdmpy module reads the
file and returns an instance of the CSDM class, in
this case, as a variable sea_level
. For a quick preview of the data
structure, use the data_structure
attribute of this
instance.
>>> print(sea_level.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2019-05-21T13:43:00Z",
"tags": [
"Jason-2",
"satellite altimetry",
"mean sea level",
"climate"
],
"description": "Global Mean Sea Level (GMSL) rise from the late 19th to the Early 21st Century.",
"dimensions": [
{
"type": "linear",
"count": 1608,
"increment": "0.08333333333 yr",
"coordinates_offset": "1880.0416666667 yr",
"quantity_name": "time",
"reciprocal": {
"quantity_name": "frequency"
}
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Global Mean Sea Level",
"unit": "mm",
"quantity_name": "length",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"GMSL"
],
"components": [
[
"-183.0, -171.125, ..., 59.6875, 58.5"
]
]
}
]
}
}
Warning
The serialized string from the data_structure
attribute is not the same as the JSON serialization on the file.
This attribute is only intended for a quick preview of the data
structure and avoids displaying large datasets. Do not use
the value of this attribute to save the data to the file. Instead, use the
save()
method of the CSDM
class.
The tuples of the dimensions and dependent variables from this example are
>>> x = sea_level.dimensions
>>> y = sea_level.dependent_variables
respectively. The coordinates along the dimension and the component of the dependent variable are
>>> print(x[0].coordinates)
[1880.04166667 1880.125 1880.20833333 ... 2013.79166666 2013.87499999
2013.95833333] yr
>>> print(y[0].components[0])
[-183. -171.125 -164.25 ... 66.375 59.6875 58.5 ]
respectively.
Tip
Plotting a one-dimension scalar line-plot.
Before we plot this dataset, we find it convenient to write a small plotting method. This method makes it easier, later, when we describe 1D{1} examples form a variety of scientific datasets. The method follows-
>>> import matplotlib.pyplot as plt
>>> def plot1D(dataObject):
... # tuples of dependent and dimension instances.
... x = dataObject.dimensions
... y = dataObject.dependent_variables
...
... plt.figure(figsize=(4,3))
... plt.plot(x[0].coordinates, y[0].components[0].real, color='k', linewidth=0.5)
...
... plt.xlim(x[0].coordinates[0].value, x[0].coordinates[-1].value)
...
... # The axes labels and figure title.
... plt.xlabel(x[0].axis_label)
... plt.ylabel(y[0].axis_label[0])
... plt.title(y[0].name)
...
... plt.grid(color='gray', linestyle='--', linewidth=0.3)
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
A quick walk-through of the plot1D
method. The method accepts an
instance of the CSDM class as an argument. Within the method, we
make use of the instance’s attributes in addition to the matplotlib
functions. The first line assigns the tuple of the dimensions and dependent
variables to x
and y
, respectively. The following two lines add a plot of
the components of the dependent variable versus the coordinates of the dimension.
The next line sets the x-range. For labeling the axes,
we use the axis_label
attribute
of both dimension and dependent variable instances. For the figure title,
we use the name
attribute
of the dependent variable instance. The next statement adds the grid lines.
For additional information, refer to Matplotlib
documentation.
The plot1D
method is only for illustrative purposes. The users may use any
plotting library to visualize their datasets.
Now to plot the sea_level dataset.
>>> plot1D(sea_level)

Nuclear Magnetic Resonance (NMR) dataset¶
The following dataset is a \(^{13}\mathrm{C}\) time-domain NMR Bloch decay signal of ethanol. Let’s load this data file and take a quick look at its data structure. We follow the previously described steps.
>>> filename = 'Test Files/NMR/blochDecay/blochDecay.csdf'
>>> NMR_data = cp.load(filename)
>>> print(NMR_data.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2016-03-12T16:41:00Z",
"geographic_coordinate": {
"altitude": "238.9719543457031 m",
"longitude": "-83.05154573892345 °",
"latitude": "39.97968794964322 °"
},
"tags": [
"13C",
"NMR",
"spectrum",
"ethanol"
],
"description": "A time domain NMR 13C Bloch decay signal of ethanol.",
"dimensions": [
{
"type": "linear",
"count": 4096,
"increment": "0.1 ms",
"coordinates_offset": "-0.3 ms",
"quantity_name": "time",
"reciprocal": {
"coordinates_offset": "-3005.363 Hz",
"origin_offset": "75426328.86 Hz",
"quantity_name": "frequency",
"label": "13C frequency shift"
}
}
],
"dependent_variables": [
{
"type": "internal",
"numeric_type": "complex128",
"quantity_type": "scalar",
"components": [
[
"(-8899.40625-1276.7734375j), (-4606.88037109375-742.4124755859375j), ..., (37.548492431640625+20.156890869140625j), (-193.9228515625-67.06524658203125j)"
]
]
}
]
}
}
This particular example illustrates two additional attributes of the CSD model,
namely, the geographic_coordinate
and
tags
. The geographic_coordinate described the
location where the CSDM file was last serialized. You may access this
attribute through,
>>> NMR_data.geographic_coordinate
{'altitude': '238.9719543457031 m', 'longitude': '-83.05154573892345 °', 'latitude': '39.97968794964322 °'}
Similarly, the tags attribute can be accessed through,
>>> NMR_data.tags
['13C', 'NMR', 'spectrum', 'ethanol']
You may add additional tags, if so desired, to this list using the append method of python’s list class, such as
>>> NMR_data.tags.append("Bloch decay")
>>> NMR_data.tags
['13C', 'NMR', 'spectrum', 'ethanol', 'Bloch decay']
Unlike the previous example, the data structure of the NMR measurement is a complex-valued dependent variable where the values are
>>> y = NMR_data.dependent_variables
>>> print(y[0].components[0])
[-8899.40625 -1276.7734375j -4606.88037109 -742.41247559j
9486.43847656 -770.0413208j ... -70.95385742 -28.32843018j
37.54849243 +20.15689087j -193.92285156 -67.06524658j]
The coordinates along the dimension are
>>> x = NMR_data.dimensions
>>> x0 = x[0].coordinates
>>> print(x0)
[-3.000e-01 -2.000e-01 -1.000e-01 ... 4.090e+02 4.091e+02 4.092e+02] ms
Now to the plot the dataset,
>>> plot1D(NMR_data)

Electron Paramagnetic Resonance (EPR) dataset¶
The following simulation of the EPR dataset is formerly obtained as a JCAMP-DX file and subsequently converted to the CSD model file-format. The data structure of this dataset and the corresponding plot follows,
>>> filename = 'Test Files/EPR/AmanitaMuscaria_base64.csdf'
>>> EPR_data = cp.load(filename)
>>> print(EPR_data.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2015-02-26T16:41:00Z",
"description": "A Electron Paramagnetic Resonance simulated dataset.",
"dimensions": [
{
"type": "linear",
"count": 298,
"increment": "4.0 G",
"coordinates_offset": "2750.0 G",
"quantity_name": "magnetic flux density"
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Amanita.muscaria",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"Intensity Derivative"
],
"components": [
[
"0.067, 0.136, ..., -0.035, -0.137"
]
]
}
]
}
}
>>> plot1D(EPR_data)

Gas Chromatography dataset¶
The following Gas Chromatography dataset is also obtained as a JCAMP-DX file and subsequently converted to the CSD model file-format. The data structure and the plot of the gas chromatography dataset follows,
>>> filename = 'Test Files/GC/cinnamon_none.csdf'
>>> GCData = cp.load(filename)
>>> print(GCData.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2011-12-16T12:24:10Z",
"description": "A Gas Chromatography dataset of cinnamon stick.",
"dimensions": [
{
"type": "linear",
"count": 6001,
"increment": "0.0034 min",
"quantity_name": "time",
"reciprocal": {
"quantity_name": "frequency"
}
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Headspace from cinnamon stick",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"monotonic"
],
"components": [
[
"48453.0, 48444.0, ..., 48040.0, 48040.0"
]
]
}
]
}
}
>>> plot1D(GCData)

Fourier Transform Infrared Spectroscopy (FTIR) dataset¶
For the following FTIR dataset, we again convert the original JCAMP-DX file to the CSD model file-format. The data structure and the plot of the FTIR dataset follows,
>>> filename = 'Test Files/IR/caffeine_none.csdf'
>>> FTIR_data = cp.load(filename)
>>> print(FTIR_data.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2019-07-01T21:03:42Z",
"description": "An IR spectrum of caffeine.",
"dimensions": [
{
"type": "linear",
"count": 1842,
"increment": "1.930548614883216 cm^-1",
"coordinates_offset": "449.41 cm^-1",
"quantity_name": "wavenumber",
"reciprocal": {
"quantity_name": "length"
}
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Caffeine",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"Transmittance"
],
"components": [
[
"99.31053, 99.08212, ..., 100.22944, 100.22944"
]
]
}
]
}
}
>>> plot1D(FTIR_data)

Ultraviolet–visible (UV-vis) dataset¶
The following UV-vis dataset is originally downloaded as a JCAMP-DX file and consequently turned to the CSD model file-format. The data structure and the plot of the UV-vis dataset follows,
>>> filename = 'Test Files/UV-Vis/benzeneVapour_base64.csdf'
>>> UV_data = cp.load(filename)
>>> print(UV_data.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2014-09-30T11:16:33Z",
"description": "A UV-vis spectra of benzene vapours.",
"dimensions": [
{
"type": "linear",
"count": 4001,
"increment": "0.01 nm",
"coordinates_offset": "230.0 nm",
"quantity_name": "length",
"label": "wavelength",
"reciprocal": {
"quantity_name": "wavenumber"
}
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Vapour of Benzene",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"Absorbance"
],
"components": [
[
"0.25890622, 0.25923702, ..., 0.16814752, 0.16786034"
]
]
}
]
}
}
>>> plot1D(UV_data)

Mass spectrometry dataset¶
The following is an example of a sparse dataset. The acetone.csdf CSDM data file is stored as a sparse dependent variable data. Upon import, the values of the dependent variable component sparsely populate the coordinate grid. The remaining unpopulated coordinates are assigned a zero value.
>>> filename = 'Test Files/MassSpec/acetone.csdf'
>>> mass_spec = cp.load(filename)
>>> print(mass_spec.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2019-06-23T17:53:26Z",
"description": "MASS spectrum of acetone",
"dimensions": [
{
"type": "linear",
"count": 51,
"increment": "1.0",
"coordinates_offset": "10.0",
"label": "m/z"
}
],
"dependent_variables": [
{
"type": "internal",
"name": "acetone",
"numeric_type": "float32",
"quantity_type": "scalar",
"component_labels": [
"relative abundance"
],
"components": [
[
"0.0, 0.0, ..., 10.0, 0.0"
]
]
}
]
}
}
Here, the coordinates along the dimension are
>>> print(mass_spec.dimensions[0].coordinates)
[10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27.
28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45.
46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60.]
and the components of the dependent variable follow
>>> print(mass_spec.dependent_variables[0].components[0])
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 9. 9. 49. 0. 0. 79. 1000. 19. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
270. 10. 0.]
Note, only eight values were specified in the dependent variable components attribute in the .csdf file. The remaining component values are set to zero.
Now to plot the dataset.
>>> plot1D(mass_spec)

Scalar, 2D{1} datasets¶
The 2D{1} datasets are two dimensional, \(d=2\), with one single-component dependent variable, \(p=1\). Here, we present examples of 2D{1} datasets with the scalar dependent variable.
Astronomy dataset¶
The following dataset is a new observation of the Bubble Nebula acquired by The Hubble Heritage Team, in February 2016. The original dataset was obtained in the FITS format and subsequently converted to the CSD model file-format. For the convenience of illustration, we have downsampled the original dataset.
Let’s load the .csdfe file and look at its data structure.
>>> import csdmpy as cp
>>> bubble_nebula = cp.load('Test Files/BubbleNebula/Bubble.csdfe')
>>> print(bubble_nebula.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2016-02-26T16:41:00Z",
"description": "The dataset is a new observation of the Bubble Nebula acquired by The Hubble Heritage Team, in February 2016.",
"dimensions": [
{
"type": "linear",
"count": 1024,
"increment": "-0.0002581136196 °",
"coordinates_offset": "350.311874957 °",
"quantity_name": "plane angle",
"label": "Right Ascension"
},
{
"type": "linear",
"count": 1024,
"increment": "0.0001219957797701109 °",
"coordinates_offset": "61.12851494969163 °",
"quantity_name": "plane angle",
"label": "Declination"
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Bubble Nebula, 656nm",
"numeric_type": "float32",
"quantity_type": "scalar",
"components": [
[
"0.0, 0.0, ..., 0.0, 0.0"
]
]
}
]
}
}
Here, the variable bubble_nebula
is an instance of the CSDM
class. From the data structure, one finds two dimensions, labeled as
Right Ascension and Declination, and one single-component dependent
variable named Bubble Nebula, 656nm.
Let’s get the tuples of the dimension and dependent variable instances from
the bubble_nebula
instance following,
>>> x = bubble_nebula.dimensions
>>> y = bubble_nebula.dependent_variables
There are two dimension instances in x
. Let’s look
at the coordinates along each dimension, using the
coordinates
attribute of the
respective instances.
>>> print(x[0].coordinates[:10])
[350.31187496 350.31161684 350.31135873 350.31110062 350.3108425
350.31058439 350.31032628 350.31006816 350.30981005 350.30955193] deg
>>> print(x[1].coordinates[:10])
[61.12851495 61.12863695 61.12875894 61.12888094 61.12900293 61.12912493
61.12924692 61.12936892 61.12949092 61.12961291] deg
Here, we only print the first ten coordinates along the respective dimensions.
The component of the dependent variable is accessed through the
components
attribute.
>>> y00 = y[0].components[0]
Visualizing the dataset
Now, to plot the dataset.
Tip
Intensity plot.
>>> import matplotlib.pyplot as plt
>>> from matplotlib.colors import LogNorm
>>> import numpy as np
>>> def plot():
... # Figure setup.
... fig, ax = plt.subplots(1,1, figsize=(4,3))
... ax.set_facecolor('w')
...
... # the coordinates along the two dimensions
... x0 = x[0].coordinates
... x1 = x[1].coordinates
...
... # Set the extents of the image.
... extent=[x0[0].value, x0[-1].value, x1[0].value, x1[-1].value]
...
... # Log intensity image plot.
... im = ax.imshow(np.abs(y00), origin='lower', cmap='bone_r',
... norm=LogNorm(vmax=y00.max()/10, vmin=7.5e-3, clip=True),
... extent=extent, aspect='auto')
...
... # Set the axes labels and the figure tile.
... ax.set_xlabel(x[0].axis_label)
... ax.set_ylabel(x[1].axis_label)
... ax.set_title(y[0].name)
... ax.locator_params(nbins=5)
...
... # Add a colorbar.
... cbar = fig.colorbar(im)
... cbar.ax.set_ylabel(y[0].axis_label[0])
...
... # Set the x and y limits.
... ax.set_xlim([350.25, 350.1])
... ax.set_ylim([61.15, 61.22])
...
... # Add grid lines.
... ax.grid(color='gray', linestyle='--', linewidth=0.5)
...
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
>>> plot()

Nuclear Magnetic Resonance (NMR) dataset¶
The following example is a \(^{29}\mathrm{Si}\) NMR time-domain saturation recovery measurement of a highly siliceous zeolite ZSM-12. Usually, the spin recovery measurements are acquired over a rectilinear grid where measurements along one of the dimensions are non-uniform and span several orders of magnitude. In this example, we illustrate the use of monotonic dimensions for describing such datasets.
Let’s load the file.
>>> import csdmpy as cp
>>> filename = 'Test Files/NMR/satrec/satRec.csdf'
>>> NMR_2D_data = cp.load(filename)
The tuples of the dimension and dependent variable instances from the
NMR_2D_data
instance are
>>> x = NMR_2D_data.dimensions
>>> y = NMR_2D_data.dependent_variables
respectively. There are two dimension instances in this example with respective dimension data structures as
>>> print(x[0].data_structure)
{
"type": "linear",
"description": "A full echo echo acquisition along the t2 dimension using a Hahn echo.",
"count": 1024,
"increment": "80.0 µs",
"coordinates_offset": "-41.04 ms",
"quantity_name": "time",
"label": "t2",
"reciprocal": {
"coordinates_offset": "-8766.0626 Hz",
"origin_offset": "79578822.26200001 Hz",
"quantity_name": "frequency",
"label": "29Si frequency shift"
}
}
and
>>> print(x[1].data_structure)
{
"type": "monotonic",
"coordinates": [
"1 s",
"5 s",
"10 s",
"20 s",
"40 s",
"80 s"
],
"quantity_name": "time",
"label": "t1",
"reciprocal": {
"quantity_name": "frequency"
}
}
respectively. The first dimension is uniformly spaced, as indicated by the linear subtype, while the second dimension is non-linear and monotonically sampled. The coordinates along the respective dimensions are
>>> x0 = x[0].coordinates
>>> print(x0)
[-41040. -40960. -40880. ... 40640. 40720. 40800.] us
>>> x1 = x[1].coordinates
>>> print(x1)
[ 1. 5. 10. 20. 40. 80.] s
Notice, the unit of x0
is in microseconds. It might be convenient to
convert the unit to milliseconds. To do so, use the
to()
method of the respective
Dimension instance as follows,
>>> x[0].to('ms')
>>> x0 = x[0].coordinates
>>> print(x0)
[-41.04 -40.96 -40.88 ... 40.64 40.72 40.8 ] ms
As before, the components of the dependent variable are accessed using the
components
attribute.
>>> y00 = y[0].components[0]
>>> print(y00)
[[ 182.26953 +136.4989j -530.45996 +145.59097j
-648.56055 +296.6433j ... -1034.6655 +123.473114j
137.29883 +144.3381j -151.75049 -18.316727j]
[ -80.799805 +138.63733j -330.4419 -131.69786j
-356.23877 +463.6406j ... 854.9712 +373.60577j
432.64648 +525.6024j -35.51758 -141.60239j ]
[ -215.80469 +163.03308j -330.6836 -308.8578j
-1313.7393 -1557.9144j ... -979.9209 +271.06757j
-667.6211 +61.262817j 150.32227 -41.081024j]
[ 6.2421875 -163.0319j -654.5654 +372.27518j
-1209.3877 -217.7103j ... 202.91211 +910.0657j
-163.88281 +343.41882j 27.354492 +21.467224j]
[ -86.03516 -129.40945j -461.1875 -74.49284j
68.13672 -641.11975j ... 803.3242 -423.6355j
-267.3672 -226.39514j 77.77344 +80.2041j ]
[ -436.0664 -131.52814j 216.32812 +441.56696j
-577.0254 -658.17645j ... -1780.457 +454.20862j
-1765.7441 -375.72888j 407.0703 +162.24716j ]]
Visualizing the dataset
Tip
Intensity plot with cross-sections
More often than not, the code required to plot the data become exhaustive. Here is one such example.
>>> import matplotlib.pyplot as plt
>>> from matplotlib.image import NonUniformImage
>>> import numpy as np
>>> def plot_nmr_2d():
... """
... Set the extents of the image.
... To set the independent variable coordinates at the center of each image
... pixel, subtract and add half the sampling interval from the first
... and the last coordinate, respectively, of the linearly sampled
... dimension, i.e., x0.
... """
... si=x[0].increment
... extent = ((x0[0]-0.5*si).to('ms').value,
... (x0[-1]+0.5*si).to('ms').value,
... x1[0].value,
... x1[-1].value)
...
... """
... Create a 2x2 subplot grid. The subplot at the lower-left corner is for
... the image intensity plot. The subplots at the top-left and bottom-right
... are for the data slice at the horizontal and vertical cross-section,
... respectively. The subplot at the top-right corner is empty.
... """
... fig, axi = plt.subplots(2,2, figsize=(4,3),
... gridspec_kw = {'width_ratios':[4,1],
... 'height_ratios':[1,4]})
...
... """
... The image subplot quadrant.
... Add an image over a rectilinear grid. Here, only the real part of the
... data values is used.
... """
... ax = axi[1,0]
... im = NonUniformImage(ax, interpolation='nearest',
... extent=extent, cmap='bone_r')
... im.set_data(x0, x1, y00.real/y00.real.max())
...
... """Add the colorbar and the component label."""
... cbar = fig.colorbar(im)
... cbar.ax.set_ylabel(y[0].axis_label[0])
...
... """Set up the grid lines."""
... ax.images.append(im)
... for i in range(x1.size):
... ax.plot(x0, np.ones(x0.size)*x1[i], 'k--', linewidth=0.5)
... ax.grid(axis='x', color='k', linestyle='--', linewidth=0.5, which='both')
...
... """Setup the axes, add the axes labels, and the figure title."""
... ax.set_xlim([extent[0], extent[1]])
... ax.set_ylim([extent[2], extent[3]])
... ax.set_xlabel(x[0].axis_label)
... ax.set_ylabel(x[1].axis_label)
... ax.set_title(y[0].name)
...
... """Add the horizontal data slice to the top-left subplot."""
... ax0 = axi[0,0]
... top = y00[-1].real
... ax0.plot(x0, top, 'k', linewidth=0.5)
... ax0.set_xlim([extent[0], extent[1]])
... ax0.set_ylim([top.min(), top.max()])
... ax0.axis('off')
...
... """Add the vertical data slice to the bottom-right subplot."""
... ax1 = axi[1,1]
... right = y00[:,513].real
... ax1.plot(right, x1, 'k', linewidth=0.5)
... ax1.set_ylim([extent[2], extent[3]])
... ax1.set_xlim([right.min(), right.max()])
... ax1.axis('off')
...
... """Turn off the axis system for the top-right subplot."""
... axi[0,1].axis('off')
...
... plt.tight_layout(pad=0., w_pad=0., h_pad=0.)
... plt.subplots_adjust(wspace=0.025, hspace=0.05)
... plt.show()
>>> plot_nmr_2d()

Transmission Electron Microscopy (TEM) dataset¶
The following TEM dataset is a section of an early larval brain of Drosophila melanogaster used in the analysis of neuronal microcircuitry. The dataset was obtained from the TrakEM2 tutorial and subsequently converted to the CSD model file-format.
Let’s import the CSD model data-file and look at its data structure.
>>> import csdmpy as cp
>>> import matplotlib.pyplot as plt
>>> filename = 'Test Files/TEM/TEM.csdf'
>>> TEM = cp.load(filename)
>>> print(TEM.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2016-03-12T16:41:00Z",
"description": "TEM image of the early larval brain of Drosophila melanogaster used in the analysis of neuronal microcircuitry.",
"dimensions": [
{
"type": "linear",
"count": 512,
"increment": "4.0 nm",
"quantity_name": "length",
"reciprocal": {
"quantity_name": "wavenumber"
}
},
{
"type": "linear",
"count": 512,
"increment": "4.0 nm",
"quantity_name": "length",
"reciprocal": {
"quantity_name": "wavenumber"
}
}
],
"dependent_variables": [
{
"type": "internal",
"numeric_type": "uint8",
"quantity_type": "scalar",
"components": [
[
"126, 107, ..., 164, 171"
]
]
}
]
}
}
This dataset consists of two linear dimensions and one single-component dependent variable. The tuples of the dimension and the dependent variable instances from this example are
>>> x = TEM.dimensions
>>> y = TEM.dependent_variables
and the respective coordinates (viewed only for the first ten coordinates),
>>> print(x[0].coordinates[:10])
[ 0. 4. 8. 12. 16. 20. 24. 28. 32. 36.] nm
>>> print(x[1].coordinates[:10])
[ 0. 4. 8. 12. 16. 20. 24. 28. 32. 36.] nm
For convenience, let’s convert the coordinate from nm to µm using the
to()
method of the respective Dimension
instance,
>>> x[0].to('µm')
>>> x[1].to('µm')
and plot the data.
>>> def plot_image():
... plt.figure(figsize=(4,3))
...
... # Set the extents of the image plot.
... extent = [x[0].coordinates[0].value, x[0].coordinates[-1].value,
... x[1].coordinates[0].value, x[1].coordinates[-1].value]
...
... # Add the image plot.
... im = plt.imshow(y[0].components[0], origin='lower', extent=extent, cmap='gray')
...
... # Add a colorbar.
... cbar = plt.gca().figure.colorbar(im)
... cbar.ax.set_ylabel(y[0].axis_label[0])
...
... # Set up the axes label and figure title.
... plt.xlabel(x[0].axis_label)
... plt.ylabel(x[1].axis_label)
... plt.title(y[0].name)
...
... # Set up the grid lines.
... plt.grid(color='k', linestyle='--', linewidth=0.5)
...
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
>>> plot_image()

Vector, 2D{2} datasets¶
The 2D{2} datasets are two-dimensional, \(d=2\), with one two-component dependent variable, \(p=2\).
Vector dataset¶
The following is an example of a simulated electric field vector dataset of a dipole as a function of two linearly sampled spatial dimensions.
>>> import csdmpy as cp
>>> filename = 'Test Files/vector/electric_field/electric_field_raw.csdfe'
>>> vector_data = cp.load(filename)
>>> print (vector_data.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2014-09-30T11:16:33Z",
"description": "A simulated electric field dataset from an electric dipole.",
"dimensions": [
{
"type": "linear",
"count": 64,
"increment": "0.0625 cm",
"coordinates_offset": "-2.0 cm",
"quantity_name": "length",
"label": "x",
"reciprocal": {
"quantity_name": "wavenumber"
}
},
{
"type": "linear",
"count": 64,
"increment": "0.0625 cm",
"coordinates_offset": "-2.0 cm",
"quantity_name": "length",
"label": "y",
"reciprocal": {
"quantity_name": "wavenumber"
}
}
],
"dependent_variables": [
{
"type": "internal",
"name": "Electric field lines",
"unit": "C^-1 * N",
"quantity_name": "electric field strength",
"numeric_type": "float32",
"quantity_type": "vector_2",
"components": [
[
"3.7466873e-07, 3.3365018e-07, ..., 3.5343004e-07, 4.0100363e-07"
],
[
"1.6129676e-06, 1.6765767e-06, ..., 1.846712e-06, 1.7754871e-06"
]
]
}
]
}
}
The tuples of the dimension and dependent variable instances from this example are
>>> x = vector_data.dimensions
>>> y = vector_data.dependent_variables
with the respective coordinates (viewed only up to five values), as
>>> print(x[0].coordinates[:5])
[-2. -1.9375 -1.875 -1.8125 -1.75 ] cm
>>> print(x[1].coordinates[:5])
[-2. -1.9375 -1.875 -1.8125 -1.75 ] cm
In this example, the components of the dependent variable are
vectors as seen from the
quantity_type
attribute of the corresponding dependent variable instance.
>>> print(y[0].quantity_type)
vector_2
From the value vector_2, vector indicates a vector dataset, while 2 indicates the number of vector components.
Visualizing the dataset
Let’s visualize the vector data using the streamplot method from the matplotlib package. Before we could visualize, however, there is an initial processing step. We use the Numpy library for processing.
>>> import numpy as np
>>> X, Y = np.meshgrid(x[0].coordinates, x[1].coordinates)
>>> U, V = y[0].components[0], y[0].components[1]
>>> R = np.sqrt(U**2 + V**2)
>>> R/=R.min()
>>> Rlog=np.log10(R)
In the above steps, we calculate the X-Y grid points along with a scaled magnitude of the vector dataset. The magnitude is scaled such that the minimum value is one. Next, calculate the log of the scaled magnitude to visualize the intensity on a logarithmic scale.
And now, the plot.
Tip
Streamplot vector visualization
>>> import matplotlib.pyplot as plt
>>> def plot_vector():
... plt.figure(figsize=(4,3.5))
... plt.streamplot(X.value, Y.value, U, V, density =1,
... linewidth=Rlog, color=Rlog, cmap='viridis')
...
... plt.xlim([x[0].coordinates[0].value, x[0].coordinates[-1].value])
... plt.ylim([x[1].coordinates[0].value, x[1].coordinates[-1].value])
...
... # Set axes labels and figure title.
... plt.xlabel(x[0].axis_label)
... plt.ylabel(x[1].axis_label)
... plt.title(y[0].name)
...
... # Set grid lines.
... plt.grid(color='gray', linestyle='--', linewidth=0.5)
...
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
>>> plot_vector()

Pixel, 2D{3} datasets¶
The 2D{3} datasets is two dimensional, \(d=2\), with a single three-component dependent variable, \(p=3\).
Image datasets¶
A common example from this subset is perhaps the RGB image dataset. An RGB image dataset has two spatial dimensions and one dependent variable with three components corresponding to the red, green, and blue color intensities.
The following is an example of the RGB image dataset.
>>> import csdmpy as cp
>>> filename = 'Test Files/image/raccoon_image.csdf'
>>> ImageData = cp.load(filename)
>>> print (ImageData.data_structure)
{
"csdm": {
"version": "1.0",
"read_only": true,
"timestamp": "2016-03-12T16:41:00Z",
"tags": [
"racoon",
"image",
"Judy Weggelaar"
],
"description": "An RBG image of a raccoon face.",
"dimensions": [
{
"type": "linear",
"count": 1024,
"increment": "1.0",
"label": "horizontal index"
},
{
"type": "linear",
"count": 768,
"increment": "1.0",
"label": "vertical index"
}
],
"dependent_variables": [
{
"type": "internal",
"name": "raccoon",
"numeric_type": "uint8",
"quantity_type": "pixel_3",
"component_labels": [
"red",
"green",
"blue"
],
"components": [
[
"121, 138, ..., 119, 118"
],
[
"112, 129, ..., 155, 154"
],
[
"131, 148, ..., 93, 92"
]
]
}
]
}
}
The tuples of the dimension and dependent variable instances from
ImageData
instance are
>>> x = ImageData.dimensions
>>> y = ImageData.dependent_variables
respectively. There are two dimensions, and the coordinates along each dimension are
>>> print('x0 =', x[0].coordinates[:10])
x0 = [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
>>> print('x1 =', x[1].coordinates[:10])
x1 = [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
respectively. In the above example, only the first ten coordinates along each dimension are displayed.
The dependent variable is the image data, as also seen from the
quantity_type
attribute
of the corresponding DependentVariable instance.
>>> print(y[0].quantity_type)
pixel_3
From the value pixel_3, pixel indicates a pixel data point, while 3 indicates the number of pixel components.
As usual, the components of the dependent variable are accessed through
the components
attribute.
To access the individual components, use the appropriate array indexing.
For example,
>>> print (y[0].components[0])
[[121 138 153 ... 119 131 139]
[ 89 110 130 ... 118 134 146]
[ 73 94 115 ... 117 133 144]
...
[ 87 94 107 ... 120 119 119]
[ 85 95 112 ... 121 120 120]
[ 85 97 111 ... 120 119 118]]
will return an array with the first component of all data values. In this case,
the components correspond to the red color intensity, also indicated by the
corresponding component label. The label corresponding to
this component array is accessed through the
component_labels
attribute with appropriate indexing, that is
>>> print (y[0].component_labels[0])
red
To avoid displaying larger output, as an example, we print the shape of each component array (using Numpy array’s shape attribute) for the three components along with their respective labels.
>>> print (y[0].component_labels[0], y[0].components[0].shape)
red (768, 1024)
>>> print (y[0].component_labels[1], y[0].components[1].shape)
green (768, 1024)
>>> print (y[0].component_labels[2], y[0].components[2].shape)
blue (768, 1024)
The shape (768, 1024) corresponds to the number of points from the each dimension instances.
Note
In this example, since there is only one dependent variable, the index
of y is set to zero, which is y[0]
. The indices for the
components
and the
component_labels
,
on the other hand, spans through the number of components.
Now, to visualize the dataset as an RGB image, we use the matplotlib imshow method.
Tip
RGB image plot
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> def image_data():
... fig, ax = plt.subplots(1,1, figsize=(4,3))
... ax.imshow(np.moveaxis(y[0].components, 0, -1 ))
... ax.set_xlabel(x[0].axis_label)
... ax.set_ylabel(x[1].axis_label)
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
>>> image_data()

Labeled Dataset¶
The CSD model also supports labeled dimensions. In the following example, we present a mixed linear and labeled two-dimensional dataset representing the population of the country as a function of year. The dataset is obtained from The World Bank.
Import the csdmpy model and load the dataset.
>>> import csdmpy as cp
>>> import matplotlib.pyplot as plt
>>> filename = 'Test Files/labeled/population.csdf'
>>> labeled_data = cp.load(filename)
The tuple of dimension and dependent variable objects from
labeled_data
instance are
>>> x = labeled_data.dimensions
>>> y = labeled_data.dependent_variables
Since one of the dimensions is a labeled dimension, let’s make use of the
type
attribute of the dimension instances
to find out which dimension is labeled.
>>> x[0].type
'linear'
>>> x[1].type
'labeled'
Look like the second dimension is a labeled dimension with 1
>>> x[1].count
263
labels, where the first five labels are
>>> print(x[1].labels[:5])
['Aruba' 'Afghanistan' 'Angola' 'Albania' 'Andorra']
Note
For labeled dimensions, the coordinates
attribute is an alias of the labels
attribute. Therefore,
>>> print(x[1].coordinates[:5])
['Aruba' 'Afghanistan' 'Angola' 'Albania' 'Andorra']
The coordinates along the first dimension viewed up to the first ten points are
>>> print(x[0].coordinates[:10])
[1960. 1961. 1962. 1963. 1964. 1965. 1966. 1967. 1968. 1969.] yr
Plotting the dataset
You may plot this dataset however you like. Here, we use a bar graph to
represent the population of countries in the year 2017. The data
corresponding to this year is a cross-section of the dependent variable
at index 57 along the x[0]
dimension.
>>> print(x[0].coordinates[57])
2017.0 yr
To keep the plot simple, we only plot the first 20 country labels along
the x[1]
dimension.
>>> def plot_bar():
... plt.figure(figsize=(4,4))
...
... x_data = x[1].coordinates[:20]
... x_pos = np.arange(20)
... y_data = y[0].components[0][:20, 57]
...
... plt.bar(x_data, y_data, align='center', alpha=0.5)
... plt.xticks(x_pos, x_data, rotation=90)
... plt.ylabel(y[0].axis_label[0])
... plt.yscale("log")
... plt.title(y[0].name)
... plt.tight_layout(pad=0, w_pad=0, h_pad=0)
... plt.show()
>>> plot_bar()

Footnotes
- 1
In the CSD model, the attribute count is only valid for the LinearDimension. In csdmpy, however, the
count
attribute is valid for all dimension objects and returns an integer with the number of grid points along the dimension.
Using csdmpy’s objects¶
The csdmpy module is not just designed for deserializing and serializing the .csdf or .csdfe files. It can also be used to create new datasets, a feature that is most useful when converting datasets to CSDM compliant files.
How to create a new dataset¶
To create a new dataset, import the csdmpy module and create a new instance of the CSDM class following,
>>> import csdmpy as cp
>>> new_data = cp.new(description='A new test dataset')
The new()
method returns an instance of the CSDM class with zero
dimensions and dependent variables. respectively, i.e., a 0D{0} dataset.
In the above example, this instance is assigned to the new_data
variable.
Optionally, a description may also be provided as an argument of the
new()
method.
The data structure from the above example is
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dataset",
"dimensions": [],
"dependent_variables": []
}
}
How to add instances of Dimension class¶
In the previous section, we created a new dataset using
>>> import csdmpy as cp
>>> new_data = cp.new(description='A new test dimension dataset')
In this section, we illustrate, adding dimension objects to a csdm instance.
An instance of the Dimension class is added using the
add_dimension()
method of the CSDM
instance.
There are three subtypes of Dimension objects,
LinearDimension
MonotonicDimension
LabeledDimension
See also
Dimension API for further detail.
LinearDimension¶
A linear dimension is where the coordinates along the dimension are uniformly
spaced. Let’s add a LinearDimension instance to the new_data
instance,
using Python’s dictionary object as follows
>>> d0 = {
... 'type': 'linear',
... 'description': 'This is a linear dimension',
... 'count': 10,
... 'increment': '0.1 s'
... }
Here, we define the dimension type as linear and provide an increment,
along with the total number of points, count, along the dimension. Now, add
this dictionary to the new_data
instance using
>>> new_data.add_dimension(d0)
This will generate and add a LinearDimension object to the list of dimensions. The dataset is now a 1D{0} dataset with the following data structure,
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dimension dataset",
"dimensions": [
{
"type": "linear",
"description": "This is a linear dimension",
"count": 10,
"increment": "0.1 s",
"quantity_name": "time",
"reciprocal": {
"quantity_name": "frequency"
}
}
],
"dependent_variables": []
}
}
MonotonicDimension¶
Try adding another Dimension object to this dataset. This time add a monotonic dimension. A monotonic dimension is where the coordinates along the dimension, are spaced either strictly increasing or strictly decreasing. In the following example, we use a different approach for adding the dimension object, that is, using the keyword arguments as follows,
>>> new_data.add_dimension(
... type='monotonic',
... description='This is a monotonic dimension',
... coordinates=['1 µG', '2.1 mG', '12.4 G', '0.5 T', '2 T'])
The above operation generates an instance of the MonotonicDimension and adds
it to the new_data
instance, thereby, creating a 2D{0} dataset. The data
structure form the updated new_data
instance follows,
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dimension dataset",
"dimensions": [
{
"type": "linear",
"description": "This is a linear dimension",
"count": 10,
"increment": "0.1 s",
"quantity_name": "time",
"reciprocal": {
"quantity_name": "frequency"
}
},
{
"type": "monotonic",
"description": "This is a monotonic dimension",
"coordinates": [
"1 µG",
"2.1 mG",
"12.4 G",
"0.5 T",
"2 T"
],
"quantity_name": "magnetic flux density"
}
],
"dependent_variables": []
}
}
Notice, every time a new physical dimension is added, the value of the
quantity_name
attribute is
appropriately added, if applicable.
LabeledDimension¶
The third type of dimension is the labeled dimension. As the name suggests, this dimension consists of labels. This type of dimension is useful for datasets describing, for example, the ionization energy as a function of atomic symbols or the population of different countries.
Let’s add a labeled dimension to the new_data
instance.
This time pass an instance of the Dimension class as the argument of the
add_dimension()
method. To create an instance of
the Dimension class follow,
>>> from csdmpy import Dimension
>>> d1 = Dimension(
... type = 'labeled',
... description = 'This is a labeled dimensions.',
... labels = ['Cu', 'Ag', 'Au']
... )
In the above code, the variable d1
is an instance of Dimension class.
Now add this instance to the add_dimension()
method.
>>> new_data.add_dimension(d1)
This generates a 3D{0} dataset with the data structure —
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dimension dataset",
"dimensions": [
{
"type": "linear",
"description": "This is a linear dimension",
"count": 10,
"increment": "0.1 s",
"quantity_name": "time",
"reciprocal": {
"quantity_name": "frequency"
}
},
{
"type": "monotonic",
"description": "This is a monotonic dimension",
"coordinates": [
"1 µG",
"2.1 mG",
"12.4 G",
"0.5 T",
"2 T"
],
"quantity_name": "magnetic flux density"
},
{
"type": "labeled",
"description": "This is a labeled dimensions.",
"labels": [
"Cu",
"Ag",
"Au"
]
}
],
"dependent_variables": []
}
}
Attention
When using a Dimension instance as an argument of the
add_dimension()
method, one
must be aware that instances in Python are passed by reference. Therefore,
any changes to the instance d1
, in the above example, will affect the
corresponding dimension instance in the new_data
instance.
To be safe, as a general
recommendation, one should always pass a copy of the instance to the
add_dimension()
method. We allow the use of
Dimension objects as arguments because it provides an easy alternative
for copying an instance of the Dimension class from one
CSDM instance to another.
How to add instances of DependentVariable class¶
In the previous two sections, we illustrated, creating a new dataset and adding dimensions to the dataset. In this example, we illustrate adding dependent variables to the dataset. Let’s start by creating a new dataset,
>>> import csdmpy as cp
>>> new_data = cp.new(description='A new test dependent variables dataset')
An instance of the DependentVariable class is added using the
add_dependent_variable()
method of the CSDM
instance. There are two subtypes of DependentVariable class:
InternalDependentVariable: We refer to an instance of the DependentVariable as internal when the components of the dependent variable are listed along with the other metadata specifying the dependent variable.
ExternalDependentVariable: We refer to an instance of the DependentVariable as external when the components of the dependent variable are stored in an external file as binary data either locally or at a remote server.
In version 1.0, the CSD model allows dependent variables with quantity types as
scalar
vector_n
pixel_n
matrix_n_m
symmetric_matrix_n
See also
Read more about DependentVariable API.
Adding a scalar dependent variables¶
A dependent variable with quantity type scalar is a single-component dependent variable. Consider the following python dictionary
>>> d0 = {
... 'type': 'internal',
... 'quantity_type': 'scalar',
... 'description': 'This is an internal scalar dependent variable',
... 'unit': 'cm',
... 'components': [np.arange(100)]
... }
Here, the components are scalar physical quantities with the unit cm. The
components are listed as the value of the
components
keyword, and
therefore, the value of the type keyword is specified as internal.
Note
The value of the components attribute is a list of numpy array.
In csdmpy, the first dimension is reserved for the components. Since this
example has only one component, we specify it as a list of numpy array.
Alternatively, one could also assign
np.arange(100).reshape(np.newaxis, 100)
as the value of the components
attribute.
To add a dependent variable to the new_data
instance, use the
add_dependent_variable()
method as
>>> new_data.add_dependent_variable(d0)
This will generate and add a DependentVariable object to the list of dependent variables, thereby creating a 0D{1} dataset. The data structure, after adding the dependent variable is,
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dependent variables dataset",
"dimensions": [],
"dependent_variables": [
{
"type": "internal",
"description": "This is an internal scalar dependent variable",
"unit": "cm",
"quantity_name": "length",
"numeric_type": "int64",
"quantity_type": "scalar",
"components": [
[
"0, 1, ..., 98, 99"
]
]
}
]
}
}
Adding a multi-component dependent variables¶
In this next example, we demonstrate how to add a dependent variable with
multiple components. This time we use keywords as the argument of the
add_dependent_variable()
method to add a new
dependent variable.
>>> new_data.add_dependent_variable(
... type='internal',
... description='This is an internal vector dependent variable',
... quantity_type='vector_3',
... unit='kg * m / s^2',
... components=np.arange(300, dtype='complex64').reshape(3,100)
... )
Notice, the value of the components keyword is a numpy array of shape 3 x 100, where 3 is the number of components, and 100 is the number of points per component. Here we specify vector_3 as the value of the quantity_type, indicating that the three components of the dependent variable should be interpreted as vector dataset. Users may, however, substitute this value with any valid quantity_type consistent with the number of components. For example, the above dataset may also be interpreted as image data if the quantity type is given as pixel_3.
The data structure after adding the above dependent variable is
>>> print(new_data.data_structure)
{
"csdm": {
"version": "1.0",
"description": "A new test dependent variables dataset",
"dimensions": [],
"dependent_variables": [
{
"type": "internal",
"description": "This is an internal scalar dependent variable",
"unit": "cm",
"quantity_name": "length",
"numeric_type": "int64",
"quantity_type": "scalar",
"components": [
[
"0, 1, ..., 98, 99"
]
]
},
{
"type": "internal",
"description": "This is an internal vector dependent variable",
"unit": "kg * m * s^-2",
"quantity_name": "force",
"numeric_type": "complex64",
"quantity_type": "vector_3",
"components": [
[
"0j, (1+0j), ..., (98+0j), (99+0j)"
],
[
"(100+0j), (101+0j), ..., (198+0j), (199+0j)"
],
[
"(200+0j), (201+0j), ..., (298+0j), (299+0j)"
]
]
}
]
}
}
How to save datasets¶
An instance of a CSDM object is serialized as a csdf/csdfe
JSON-format file with the save()
method.
When serializing the dependent-variable from the CSDM object to the data-file,
the csdmpy module uses the value of the dependent variable’s
encoding
attribute to
determine the encoding type of the serialized data. There are three encoding
types for the dependent variables:
none
base64
raw
Note
By default, all instances of
DependentVariable
from a
CSDM
object are serialized as
base64 strings.
For the following examples, consider data
as an instance of the
CSDM
class.
Serializing dependent-variables¶
To serialize a dependent variable with a given encoding type, set the value of it’s encoding attribute to the respective encoding. For example,
As ``none`` encoding
>>> data.dependent_variables[0].encoding = "none"
>>> data.save('my_file.csdf')
The above code will serialize the dependent variable at index zero to a JSON file, my_file.csdf, where each component of the dependent variable is serialized as an array of JSON number.
As ``base64`` encoding
>>> data.dependent_variables[0].encoding = "base64"
>>> data.save('my_file.csdf')
The above code will serialize the dependent variable at index zero to a JSON file, my_file.csdf, where each component of the dependent variable is serialized as a base64 string.
As ``raw`` encoding
>>> data.dependent_variables[0].encoding = "raw"
>>> data.save('my_file.csdfe')
The above code will serialize the metadata from the dependent variable at index zero to a JSON file, my_file.csdfe, which includes a link to an external file where the components of the respective dependent variable are serialized as a binary array. The binary file is named, my_file_0.dat, where my_file is the filename from the argument of the save method, and 0 is the index number of the dependent variable from the CSDM object.
Multiple encoding types
In the case of multiple dependent-variables, you may choose to serialize each dependent variables with a different encoding, for example,
>>> my_data.dependent_variables[0].encoding = "raw"
>>> my_data.dependent_variables[1].encoding = "base64"
>>> my_data.dependent_variables[2].encoding = "none"
>>> my_data.dependent_variables[3].encoding = "base64"
>>> my_data.save('my_file.csdfe')
In the above example, my_data
is a CSDM object containing four
DependentVariable
objects. Here, we
serialize the dependent variable at index two with none
,
the dependent variables at index one and three with bae64
,
and the dependent variables at index zero with raw
encoding, respectively.
Note
Because an instance of the dependent variable, that is, the index zero in the above example, is set to be serialized with an external subtype, the corresponding file should be saved with a .csdfe extension.
An emoji 😁 example¶
Let’s make use of what we learned so far and create a simple 1D{1} dataset. To make it interesting, let’s create an emoji dataset.
Start by importing the csdmpy package.
>>> import csdmpy as cp
Create a new dataset with the new()
method.
>>> fundata = cp.new(description='An emoji dataset')
Here, fundata is an instance of the CSDM class with a 0D{0} dataset. The data structure of this instance is
>>> print(fundata.data_structure)
{
"csdm": {
"version": "1.0",
"description": "An emoji dataset",
"dimensions": [],
"dependent_variables": []
}
}
Add a labeled dimension to the fundata instance. Here, we’ll make use of python dictionary.
>>> x = dict(type='labeled', labels=['🍈','🍉','🍋','🍌','🥑','🍍'])
The above python dictionary contains two keys. The type key identifies the
dimension as a labeled dimension while the labels key holds an
array of labels. In this example, the labels are emojis. Add this dictionary
as an argument of the add_dimension()
method
of the fundata instance.
>>> fundata.add_dimension(x)
>>> print(fundata.data_structure)
{
"csdm": {
"version": "1.0",
"description": "An emoji dataset",
"dimensions": [
{
"type": "labeled",
"labels": [
"🍈",
"🍉",
"🍋",
"🍌",
"🥑",
"🍍"
]
}
],
"dependent_variables": []
}
}
We have successfully added a labeled dimension to the fundata instance.
Next, add a dependent variable. Set up a python dictionary corresponding to the
dependent variable object and add this dictionary as an argument of the
add_dependent_variable()
method of the fundata
instance.
>>> y =dict(type='internal', numeric_type='float32', quantity_type='scalar',
... components=[[0.5, 0.25, 1, 2, 1, 0.25]])
>>> fundata.add_dependent_variable(y)
Here, the python dictionary contains type, numeric_type, and components key. The value of the components key holds an array of data values corresponding to the labels from the labeled dimension.
Now, we have a 😂 dataset…
>>> print(fundata.data_structure)
{
"csdm": {
"version": "1.0",
"description": "An emoji dataset",
"dimensions": [
{
"type": "labeled",
"labels": [
"🍈",
"🍉",
"🍋",
"🍌",
"🥑",
"🍍"
]
}
],
"dependent_variables": [
{
"type": "internal",
"numeric_type": "float32",
"quantity_type": "scalar",
"components": [
[
"0.5, 0.25, ..., 1.0, 0.25"
]
]
}
]
}
}
To serialize this file, use the save()
method of the
fundata instance as
>>> fundata.dependent_variables[0].encoding = 'base64'
>>> fundata.save('my_file.csdf')
In the above code, the components from the
dependent_variables
attribute at index zero, are
encoded as base64 strings before serializing to the my_file.csdf file.
You may also save the components as a binary file, in which case, the file is serialized with a .csdfe file extension.
>>> fundata.dependent_variables[0].encoding = 'raw'
>>> fundata.save('my_file_raw.csdfe')
API-Reference¶
csdmpy¶
The csdmpy is a python package for importing and exporting files serialized with the core scientific dataset model file-format. The package supports a \(p\)-component dependent variable, \(\mathbf{U} \equiv \{\mathbf{U}_{0}, \ldots,\mathbf{U}_{q}, \ldots,\mathbf{U}_{p-1} \}\), which is discretely sampled at \(M\) unique points in a \(d\)-dimensional space \((\mathbf{X}_0, \ldots \mathbf{X}_k, \ldots \mathbf{X}_{d-1})\). Besides, the package also supports multiple dependent variables, \(\mathbf{U}_i\), sharing the same \(d\)-dimensional space.
Here, every dataset is an instance of the CSDM class, which holds a list of dimensions and dependent variables. Every dimension, \(\mathbf{X}_k\), is an instance of the Dimension class, while every dependent variable, \(\mathbf{U}_i\), is an instance of the DependentVariable class.
Methods¶
Methods Summary
|
Load a .csdf/.csdfe file and return an instance of CSDM class. |
|
Create a new instance of the CSDM class containing a 0D{0} dataset. |
|
Parse a CSDM compliant python dictionary and return a CSDM object. |
|
Loads a JSON serialized string as a CSDM object. |
|
Helper function for plotting basic 1D and 2D datasets. |
Method Documentation
-
csdmpy.
load
(filename=None, application=False)[source]¶ Load a .csdf/.csdfe file and return an instance of CSDM class.
The file must be a JSON serialization of the CSD Model.
Example
>>> data1 = cp.load('local_address/file.csdf') >>> data2 = cp.load('url_address/file.csdf')
- Parameters
filename (str) – A local or remote address to the .csdf or `.csdfe file.
application (bool) – If true, the application metadata from application that last serialized the file will be imported. Default is False.
sort_fft_order (bool) – If true, the coordinates and the components corresponding to the dimension with complex_fft as True will be sorted upon import and the corresponding complex_fft key-value will be set to False. Default is True.
- Returns
A CSDM instance.
-
csdmpy.
new
(description='')[source]¶ Create a new instance of the CSDM class containing a 0D{0} dataset.
- Parameters
description (str) – A string describing the the csdm object. This is optional.
Example
>>> import csdmpy as cp >>> emptydata = cp.new(description='Testing Testing 1 2 3') >>> print(emptydata.data_structure) { "csdm": { "version": "1.0", "description": "Testing Testing 1 2 3", "dimensions": [], "dependent_variables": [] } }
- Returns
A CSDM instance.
-
csdmpy.
parse_dict
(dictionary)[source]¶ Parse a CSDM compliant python dictionary and return a CSDM object.
- Parameters
dictionary – A CSDM compliant python dictionary.
-
csdmpy.
loads
(string)[source]¶ Loads a JSON serialized string as a CSDM object.
- Parameters
string – A JSON serialized CSDM string.
- Returns
A CSDM object
Example
>>> object_from_string = cp.loads(cp.new('A test dump').dumps()) >>> print(object_from_string.data_structure) { "csdm": { "version": "1.0", "timestamp": "2019-10-21T20:33:17Z", "description": "A test dump", "dimensions": [], "dependent_variables": [] } }
-
csdmpy.
plot
(data_object, **kwargs)[source]¶ Helper function for plotting basic 1D and 2D datasets.
- Parameters
data_object – The csdm object
reverse_axis – An array of boolean specifying which dimensions will be displayed in reverse.
kwargs – Additional kwargs which are passed to the matplotlib plot or imshow functions.
Example
>>> cp.plot(data_object)
CSDM¶
-
class
csdmpy.csdm.
CSDM
(filename='', version=None, description='', **kwargs)[source]¶ Bases:
object
Create an instance of a CSDM class.
This class is based on the root CSDM object of the core scientific dataset (CSD) model. The class is a composition of the DependentVariable and Dimension instances, where an instance of the DependentVariable class describes a \(p\)-component dependent variable, and an instance of the Dimension class describes a dimension of a \(d\)-dimensional space. Additional attributes of this class are listed below.
Attributes Summary
Version number of the CSD model on file.
Description of the dataset.
If True, the data-file is serialized as read only, otherwise, False.
Timestamp from when the file was last serialized.
Geographic coordinate, if present, from where the file was last serialized.
Tuple of Dimension instances.
Tuple of DependentVariable instances.
List of tags attached to the dataset.
Application metadata dictionary of the CSDM object.
Json serialized string describing the CSDM class instance.
Local file address of the current file.
Methods Summary
add_dimension
(*args, **kwargs)add_dependent_variable
(*args, **kwargs)Add a new DependentVariable instance to the CSDM instance.
to_dict
([update_timestamp, read_only, version])Serialize the CSDM instance as a python dictionary.
dumps
([update_timestamp, read_only, version])Serialize the CSDM instance as a JSON data-exchange string.
save
([filename, read_only, version, …])Serialize the CSDM instance as a JSON data-exchange file.
copy
()Create a copy of the current CSDM instance.
Attributes Documentation
-
version
¶ Version number of the CSD model on file.
-
description
¶ Description of the dataset.
The default value is an empty string, ‘’.
Example
>>> print(data.description) A simulated sine curve.
- Returns
A string of UTF-8 allows characters describing the dataset.
- Raises
TypeError – When the assigned value is not a string.
-
read_only
¶ If True, the data-file is serialized as read only, otherwise, False.
By default, the CSDM object loads a copy of the .csdf(e) file, irrespective of the value of the read_only attribute. The value of this attribute may be toggled at any time after the file import. When serializing the .csdf(e) file, if the value of the read_only attribute is found True, the file will be serialized as read only.
-
timestamp
¶ Timestamp from when the file was last serialized.
The timestamp stamp is a string representation of the Coordinated Universal Time (UTC) formatted according to the iso-8601 standard.
- Raises
AttributeError – When the attribute is modified.
-
geographic_coordinate
¶ Geographic coordinate, if present, from where the file was last serialized.
The geographic coordinates correspond to the location where the file was last serialized. If present, the geographic coordinates are described with three attributes, the required latitude and longitude, and an optional altitude.
- Raises
AttributeError – When the attribute is modified.
-
dependent_variables
¶ Tuple of DependentVariable instances.
List of tags attached to the dataset.
-
application
¶ Application metadata dictionary of the CSDM object.
>>> print(data.application) {}
By default, the application attribute is an empty dictionary, that is, the application metadata stored by the previous application is ignored upon file import.
The application metadata may, however, be retained with a request via the
load()
method. This feature may be useful to related applications where application metadata might contain additional information. The attribute may be updated with a python dictionary.The application attribute is where an application can place its own metadata as a python dictionary object containing application specific metadata, using a reverse domain name notation string as the attribute key, for example,
Example
>>> data.application = { ... "com.example.myApp" : { ... "myApp_key": "myApp_metadata" ... } ... } >>> print(data.application) {'com.example.myApp': {'myApp_key': 'myApp_metadata'}}
- Returns
Python dictionary object with the application metadata.
-
data_structure
¶ Json serialized string describing the CSDM class instance.
The data_structure attribute is only intended for a quick preview of the dataset. This JSON serialized string from this attribute avoids displaying large datasets. Do not use the value of this attribute to save the data to a file, instead use the
save()
methods of the instance.- Raises
AttributeError – When modified.
-
filename
¶ Local file address of the current file.
Methods Documentation
-
add_dimension
(*args, **kwargs)[source]¶ Add a new Dimension instance to the CSDM instance.
There are three ways to add a new independent variable.
From a python dictionary containing valid keywords.
>>> import csdmpy as cp >>> datamodel = cp.new() >>> py_dictionary = { ... 'type': 'linear', ... 'increment': '5 G', ... 'count': 50, ... 'coordinates_offset': '-10 mT' ... } >>> datamodel.add_dimension(py_dictionary)
From a list of valid keyword arguments.
>>> datamodel.add_dimension( ... type = 'linear', ... increment = '5 G', ... count = 50, ... coordinates_offset = '-10 mT' ... )
From an Dimension instance.
>>> from csdmpy import Dimension >>> datamodel = cp.new() >>> var1 = Dimension(type = 'linear', ... increment = '5 G', ... count = 50, ... coordinates_offset = '-10 mT') >>> datamodel.add_dimension(var1) >>> print(datamodel.data_structure) { "csdm": { "version": "1.0", "dimensions": [ { "type": "linear", "count": 50, "increment": "5.0 G", "coordinates_offset": "-10.0 mT", "quantity_name": "magnetic flux density" } ], "dependent_variables": [] } }
For the last method, the instance
var1
is added to thedatamodel
as a reference, i.e., if the instancevar1
is destroyed, thedatamodel
instance will become corrupt. As a recommendation always pass a copy of the Dimension instance to theadd_dimension()
method. We provide the later alternative for it is useful for copying an Dimension instance from one CSDM instance to another.
-
add_dependent_variable
(*args, **kwargs)[source]¶ Add a new DependentVariable instance to the CSDM instance.
There are again three ways to add a new dependent variable instance.
From a python dictionary containing valid keywords.
>>> import numpy as np >>> datamodel = cp.new() >>> numpy_array = (100*np.random.rand(3,50)).astype(np.uint8) >>> py_dictionary = { ... 'type': 'internal', ... 'components': numpy_array, ... 'name': 'star', ... 'unit': 'W s', ... 'quantity_name': 'energy', ... 'quantity_type': 'pixel_3' ... } >>> datamodel.add_dependent_variable(py_dictionary)
From a list of valid keyword arguments.
>>> datamodel.add_dependent_variable(type='internal', ... name='star', ... unit='W s', ... quantity_type='pixel_3', ... components=numpy_array)
From a DependentVariable instance.
>>> from csdmpy import DependentVariable >>> var1 = DependentVariable(type='internal', ... name='star', ... unit='W s', ... quantity_type='pixel_3', ... components=numpy_array) >>> datamodel.add_dependent_variable(var1)
If passing a DependentVariable instance, as a general recommendation, always pass a copy of the DependentVariable instance to the
add_dependent_variable()
method. We provide the later alternative as it is useful for copying a DependentVariable instance from one CSDM instance to another.
-
to_dict
(update_timestamp=False, read_only=False, version='1.0')[source]¶ Serialize the CSDM instance as a python dictionary.
- Parameters
update_timestamp (bool) – If True, timestamp is updated to current time.
read_only (bool) – If true, the read_only flag is set true.
version (str) – Serialize the dict with the given csdm version.
Example
>>> data.to_dict() {'csdm': {'version': '1.0', 'timestamp': '1994-11-05T13:15:30Z', 'geographic_coordinate': {'latitude': '10 deg', 'longitude': '93.2 deg', 'altitude': '10 m'}, 'description': 'A simulated sine curve.', 'dimensions': [{'type': 'linear', 'description': 'A temporal dimension.', 'count': 10, 'increment': '0.1 s', 'quantity_name': 'time', 'label': 'time', 'reciprocal': {'quantity_name': 'frequency'}}], 'dependent_variables': [{'type': 'internal', 'description': 'A response dependent variable.', 'name': 'sine curve', 'encoding': 'base64', 'numeric_type': 'float32', 'quantity_type': 'scalar', 'component_labels': ['response'], 'components': ['AAAAABh5Fj9xeHM/cXhzPxh5Fj8yMQ0lGHkWv3F4c79xeHO/GHkWvw==']}]}}
-
dumps
(update_timestamp=False, read_only=False, version='1.0')[source]¶ Serialize the CSDM instance as a JSON data-exchange string.
- Parameters
update_timestamp (bool) – If True, timestamp is updated to current time.
read_only (bool) – If true, the file is serialized as read_only.
version (str) – The file is serialized with the given CSD model version.
Example
>>> data.dumps()
-
save
(filename='', read_only=False, version='1.0', output_device=None)[source]¶ Serialize the CSDM instance as a JSON data-exchange file.
There are two types of file serialization extensions, .csdf and .csdfe. In the CSD model, when every instance of the DependentVariable objects from a CSDM class has an internal subtype, the corresponding CSDM instance is serialized with a .csdf file extension. If any single DependentVariable instance has an external subtype, the CSDM instance is serialized with a .csdfe file extension. The two different file extensions are used to alert the end-user of the possible deserialization error associated with the .csdfe file extensions had the external data file becomes inaccessible.
In csdmpy, however, irrespective of the dependent variable subtypes from the serialized JSON file, by default, all instances of DependentVariable class are treated an internal after import. Therefore, when serialized, the CSDM object should be stored as a .csdf file.
To store a file as a .csdfe file, the user much set the value of the
encoding
attribute from the dependent variables toraw
. In which case, a binary file named filename_i.dat will be generated where \(i\) is the \(i^\text{th}\) dependent variable. The parameter filename is an argument of this method.Note
Only dependent variables with
encoding="raw"
will be serialized to a binary file.- Parameters
filename (str) – The filename of the serialized file.
read_only (bool) – If true, the file is serialized as read_only.
version (str) – The file is serialized with the given CSD model version.
output_device (object) – Object where the data is written. If provided, the argument filename become irrelevant.
Example
>>> data.save('my_file.csdf')
-
Dimension¶
-
class
csdmpy.dimensions.
Dimension
(*args, **kwargs)[source]¶ Bases:
object
Create an instance of the Dimension class.
An instance of this class describes a dimension of a multi-dimensional system. In version 1.0 of the CSD model, there are three subtypes of the Dimension class:
Creating an instance of a dimension object
There are two ways of creating a new instance of a Dimension class.
From a python dictionary containing valid keywords.
>>> from csdmpy import Dimension >>> dimension_dictionary = { ... 'type': 'linear', ... 'description': 'test', ... 'increment': '5 G', ... 'count': 10, ... 'coordinates_offset': '10 mT', ... 'origin_offset': '10 T' ... } >>> x = Dimension(dimension_dictionary)
Here, dimension_dictionary is the python dictionary.
From valid keyword arguments.
>>> x = Dimension(type = 'linear', ... description = 'test', ... increment = '5 G', ... count = 10, ... coordinates_offset = '10 mT', ... origin_offset = '10 T')
Attributes Summary
The dimension subtype.
Brief description of the dimension object.
Application metadata dictionary of the dimension object.
Coordinates, \({\bf X}_k\), along the dimension.
Absolute coordinates, \(\bf X_k^{\rm{abs}}\), along the dimension.
Number of coordinates, \(N_k \ge 1\), along the dimension.
Increment along a linear dimension.
Offset corresponding to the zero of the indexes array, \(\mathbf{J}_k\).
Origin offset, \(o_k\), along the dimension.
Boolean specifying if the coordinates along the dimension are the output of a complex fft.
Quantity name associated with the physical quantities specifying the dimension.
Label associated with the dimension.
Ordered list of labels along the Labeled dimension.
Period of the dimension.
Formatted string for displaying label along the dimension axis.
Json serialized string describing the Dimension class instance.
Methods Summary
to
([unit, equivalencies])Convert the coordinates along the dimension to the unit, unit.
to_dict
()Return Dimension object as a python dictionary.
Return True if the dependent variable is quantitative.
Attributes Documentation
-
type
¶ The dimension subtype.
There are three valid subtypes of Dimension class. The valid literals are given by the DimObjectSubtype enumeration.
>>> print(x.type) linear
- Returns
A string with a valid dimension subtype.
- Raises
AttributeError – When the attribute is modified.
-
description
¶ Brief description of the dimension object.
The default value is an empty string, ‘’. The attribute may be modified, for example,
>>> print(x.description) This is a test >>> x.description = 'This is a test dimension.'
- Returns
A string of UTF-8 allows characters describing the dimension.
- Raises
TypeError – When the assigned value is not a string.
-
application
¶ Application metadata dictionary of the dimension object.
>>> print(x.application) {}
The application attribute is where an application can place its own metadata as a python dictionary object containing application specific metadata, using a reverse domain name notation string as the attribute key, for example,
>>> x.application = { ... "com.example.myApp" : { ... "myApp_key": "myApp_metadata" ... } ... } >>> print(x.application) {'com.example.myApp': {'myApp_key': 'myApp_metadata'}}
- Returns
A python dictionary containing dimension application metadata.
-
coordinates
¶ Coordinates, \({\bf X}_k\), along the dimension.
Example
>>> print(x.coordinates) [100. 105. 110. 115. 120. 125. 130. 135. 140. 145.] G
For linear dimensions, the order of the coordinates also depend on the value of the
complex_fft
attributes. For examples, when the value of the complex_fft attribute is True, the coordinates are>>> x.complex_fft = True >>> print(x.coordinates) [ 75. 80. 85. 90. 95. 100. 105. 110. 115. 120.] G
- Returns
A Quantity array of coordinates for quantitative dimensions, i.e. linear and monotonic.
- Returns
A Numpy array for labeled dimensions.
- Raises
AttributeError – For dimensions with subtype linear.
-
absolute_coordinates
¶ Absolute coordinates, \(\bf X_k^{\rm{abs}}\), along the dimension.
This attribute is only valid for quantitative dimensions, that is, linear and monotonic dimensions. The absolute coordinates are given as
\[\mathbf{X}_k^\mathrm{abs} = \mathbf{X}_k + o_k \mathbf{1}\]where \(\mathbf{X}_k\) are the coordinates along the dimension and \(o_k\) is the
origin_offset
. For example, consider>>> print(x.origin_offset) 10.0 T >>> print(x.coordinates[:5]) [100. 105. 110. 115. 120.] G
then the absolute coordinates are
>>> print(x.absolute_coordinates[:5]) [100100. 100105. 100110. 100115. 100120.] G
For linear dimensions, the order of the absolute_coordinates further depend on the value of the
complex_fft
attributes. For examples, when the value of the complex_fft attribute is True, the absolute coordinates are>>> x.complex_fft = True >>> print(x.absolute_coordinates[:5]) [100075. 100080. 100085. 100090. 100095.] G
- Returns
A Quantity array of absolute coordinates for quantitative dimensions, i.e linear and monotonic.
- Raises
AttributeError – For labeled dimensions.
-
count
¶ Number of coordinates, \(N_k \ge 1\), along the dimension.
Example
>>> print(x.count) 10 >>> x.count = 5
- Returns
An Integer specifying the number of coordinates along the dimension.
- Raises
TypeError – When the assigned value is not an integer.
-
increment
¶ Increment along a linear dimension.
The attribute is only valid for Dimension instances with the subtype linear. When assigning a value, the dimensionality of the value must be consistent with the dimensionality of other members specifying the dimension.
Example
>>> print(x.increment) 5.0 G >>> x.increment = "0.1 G" >>> print(x.coordinates) [100. 100.1 100.2 100.3 100.4 100.5 100.6 100.7 100.8 100.9] G
- Returns
A Quantity instance with the increment along the dimension.
- Raises
AttributeError – For dimension with subtypes other than linear.
TypeError – When the assigned value is not a string containing a quantity or a Quantity object.
-
coordinates_offset
¶ Offset corresponding to the zero of the indexes array, \(\mathbf{J}_k\).
When assigning a value, the dimensionality of the value must be consistent with the dimensionality of the other members specifying the dimension.
Example
>>> print(x.coordinates_offset) 10.0 mT >>> x.coordinates_offset = "0 T" >>> print(x.coordinates) [ 0. 5. 10. 15. 20. 25. 30. 35. 40. 45.] G
The attribute is invalid for labeled dimensions.
- Returns
A Quantity instance with the coordinates offset.
- Raises
AttributeError – For labeled dimensions.
TypeError – When the assigned value is not a string containing a quantity or a Quantity object.
-
origin_offset
¶ Origin offset, \(o_k\), along the dimension.
When assigning a value, the dimensionality of the value must be consistent with the dimensionality of other members specifying the dimension.
Example
>>> print(x.origin_offset) 10.0 T >>> x.origin_offset = "1e5 G"
The origin offset only affect the absolute_coordinates along the dimension. This attribute is invalid for labeled dimensions.
- Returns
A Quantity instance with the origin offset.
- Raises
AttributeError – For labeled dimensions.
TypeError – When the assigned value is not a string containing a quantity or a Quantity object.
-
complex_fft
¶ Boolean specifying if the coordinates along the dimension are the output of a complex fft.
This attribute is only valid for the Dimension instances with linear subtype. The value of this attribute is a boolean specifying if the coordinates along the dimension are evaluated as the output of a complex fast Fourier transform (FFT) routine. For example, consider the following Dimension object,
>>> test = Dimension( ... type='linear', ... increment = '1', ... count = 10 ... ) >>> test.complex_fft False >>> print(test.coordinates) [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] >>> test.complex_fft = True >>> print(test.coordinates) [-5. -4. -3. -2. -1. 0. 1. 2. 3. 4.]
- Returns
A Boolean.
- Raises
TypeError – When the assigned value is not a boolean.
-
quantity_name
¶ Quantity name associated with the physical quantities specifying the dimension.
The attribute is invalid for the labeled dimension.
>>> print(x.quantity_name) magnetic flux density
- Returns
A string with the quantity name.
- Raises
AttributeError – For labeled dimensions.
NotImplementedError – When assigning a value.
-
label
¶ Label associated with the dimension.
Example
>>> print(x.label) field strength >>> x.label = 'magnetic field strength'
- Returns
A string containing the label.
- Raises
TypeError – When the assigned value is not a string.
-
labels
¶ Ordered list of labels along the Labeled dimension.
Consider the following labeled dimension,
>>> x2 = Dimension( ... type='labeled', ... labels=['Cu', 'Ag', 'Au'] ... )
then the lables along the labeled dimension are
>>> print(x2.labels) ['Cu' 'Ag' 'Au']
Note
For Labeled dimension, the
coordinates
attribute is an alias oflabels
attribute. For example,>>> np.all(x2.coordinates == x2.labels) True
In the above example,
x2
is an instance of the Dimension class with labeled subtype.- Returns
A Numpy array with labels along the dimension.
- Raises
AttributeError – For dimensions with subtype other than labeled.
-
period
¶ Period of the dimension.
The default value of the period is infinity, i.e., the dimension is non-periodic.
Example
>>> print(x.period) inf G >>> x.period = '1 T'
To assign a dimension as non-periodic, one of the following may be used,
>>> x.period = '1/0 T' >>> x.period = 'infinity µT' >>> x.period = '∞ G'
Attention
The physical quantity of the period must be consistent with other physical quantities specifying the dimension.
- Returns
A Quantity instance with the period of the dimension.
- Raises
AttributeError – For labeled dimensions.
TypeError – When the assigned value is not a string containing a quantity or a Quantity object.
-
axis_label
¶ Formatted string for displaying label along the dimension axis.
This attribute is not a part of the original core scientific dataset model, however, it is a convenient supplementary attribute that provides a formated string ready for labeling dimension axes. For quantitative dimensions, this attributes returns a string, label / unit, if the label is a non-empty string, otherwise, quantity_name / unit. Here
quantity_name
andlabel
are the attributes of the Dimension instances, and unit is the unit associated with the coordinates along the dimension. For examples,>>> x.label 'field strength' >>> x.axis_label 'field strength / (G)'
For labeled dimensions, this attribute returns label.
- Returns
A formated string of label.
- Raises
AttributeError – When assigned a value.
-
data_structure
¶ Json serialized string describing the Dimension class instance.
This supplementary attribute is useful for a quick preview of the dimension object. The attribute cannot be modified.
>>> print(x.data_structure) { "type": "linear", "description": "This is a test", "count": 10, "increment": "5.0 G", "coordinates_offset": "10.0 mT", "origin_offset": "10.0 T", "quantity_name": "magnetic flux density", "label": "field strength" }
- Returns
A json serialized string of the dimension object.
- Raises
AttributeError – When modified.
Method Documentation
-
to
(unit='', equivalencies=None)[source]¶ Convert the coordinates along the dimension to the unit, unit.
This method is a wrapper of the to method from the Quantity class and is only valid for physical dimensions.
Example
>>> print(x.coordinates) [100. 105. 110. 115. 120. 125. 130. 135. 140. 145.] G >>> x.to('mT') >>> print(x.coordinates) [10. 10.5 11. 11.5 12. 12.5 13. 13.5 14. 14.5] mT
- Parameters
unit – A string containing a unit with the same dimensionality as the coordinates along the dimension.
- Raises
AttributeError – For labeled dimensions.
-
to_dict
()[source]¶ Return Dimension object as a python dictionary.
Example
>>> x.to_dict() {'type': 'linear', 'description': 'This is a test', 'count': 10, 'increment': '5.0 G', 'coordinates_offset': '10.0 mT', 'origin_offset': '10.0 T', 'quantity_name': 'magnetic flux density', 'label': 'field strength'}
-
DependentVariable¶
-
class
csdmpy.dependent_variables.
DependentVariable
(*args, **kwargs)[source]¶ Bases:
object
Create an instance of the DependentVariable class.
The instance of this class represents a dependent variable, \(\mathbf{U}\). A dependent variable holds \(p\)-component data values, where \(p>0\) is an integer. For example, a scalar is single-component (\(p=1\)), a vector may have up to n-components (\(p=n\)), while a second rank symmetric tensor have six unique component (\(p=6\)).
Creating a new dependent variable.
There are two ways of creating a new instance of a DependentVariable class.
From a python dictionary containing valid keywords.
>>> from csdmpy import DependentVariable >>> import numpy as np >>> numpy_array = np.arange(30).reshape(3,10).astype(np.float32) >>> dependent_variable_dictionary = { ... 'type': 'internal', ... 'components': numpy_array, ... 'name': 'star', ... 'unit': 'W s', ... 'quantity_name': 'energy', ... 'quantity_type': 'pixel_3' ... } >>> y = DependentVariable(dependent_variable_dictionary)
Here, dependent_variable_dictionary is the python dictionary.
From valid keyword arguments.
>>> y = DependentVariable( ... type='internal', ... name='star', ... unit='W s', ... quantity_type='pixel_3', ... components=numpy_array ... )
Attributes Summary
The dependent variable subtype.
Brief description of the dependent variables.
Application metadata of the DependentVariable object.
Name of the dependent variable.
Unit associated with the dependent variable.
Quantity name of the physical quantities associated with the dependent variable.
The encoding method used in representing the dependent variable.
The numeric type of the data values from the dependent variable.
Quantity type of the dependent variable.
List of labels corresponding to the components of the dependent variable.
Component array of the dependent variable.
URL where the data components of the dependent variable are stored.
List of formatted string labels for each component of the dependent variable.
Json serialized string describing the DependentVariable class instance.
Methods Summary
to
(unit)Convert the unit of the dependent variable to the unit.
to_dict
()Return DependentVariable object as a python dictionary.
Attributes Documentation
-
type
¶ The dependent variable subtype.
There are two valid subtypes of DependentVariable class with the following enumeration literals,
internal
external
corresponding to Internal and External sub class. By default, all instances of the DependentVariable class are assigned as internal upon import. The user may update the value of this attribute, at any time, with a string containing a valid type literal, for example,
>>> print(y.type) internal >>> y.type = 'external'
When type is external, the data values from the corresponding dependent variable are serialized to an external file within the same directory as the .csdfe file.
- Returns
A string with a valid dependent variable subtype.
- Raises
ValueError – When an invalid value is assigned.
-
description
¶ Brief description of the dependent variables.
The default value is an empty string, ‘’.
>>> print(y.description) A test image >>> y.description = 'A test pixel_3 image' >>> print(y.description) A test pixel_3 image
- Returns
A string of UTF-8 allowed characters describing the dependent variable.
- Raises
TypeError – When the assigned value is not a string.
-
application
¶ Application metadata of the DependentVariable object.
>>> print(y.application) {}
The application attribute is where an application can place its own metadata as a python dictionary object containing the application specific metadata, using a reverse domain name notation string as the attribute key, for example,
>>> y.application = { ... "com.example.myApp" : { ... "myApp_key": "myApp_metadata" ... } ... } >>> print(y.application) {'com.example.myApp': {'myApp_key': 'myApp_metadata'}}
Please refer to the Core Scientific Dataset Model article for details.
- Returns
A python dictionary containing dependent variable application metadata.
-
name
¶ Name of the dependent variable.
>>> y.name 'star' >>> y.name = 'rock star'
- Returns
A string containing the name of the dependent variable.
- Raises
TypeError – When the assigned value is not a string.
-
unit
¶ Unit associated with the dependent variable.
Note
The attribute cannot be modified. To convert the unit, use the
to()
method of the class instance.>>> y.unit Unit("s W")
- Returns
A Unit object from astropy.unit package.
- Raises
AttributeError – When assigned a value.
-
quantity_name
¶ Quantity name of the physical quantities associated with the dependent variable.
>>> y.quantity_name 'energy'
- Returns
A string with the quantity name associated with the dependent variable physical quantities .
- Raises
NotImplementedError – When assigning a value.
-
encoding
¶ The encoding method used in representing the dependent variable.
The value of this attribute determines the method used when serializing or deserializing the data values to and from the file. Currently, there are three valid encoding methods:
raw
base64
none
A value, raw, means that the data values are serialized as binary data. The value, base64, implies that the data values are serialized as base64 strings, while, the value none refers to text-based serialization.
By default, the encoding attribute of all dependent variable object are set to base64 after import. The user may update this attribute, at any time, with a string containing a valid encoding literal, for example,
>>> y.encoding = 'base64'
The value of this attribute will be used in serializing the data to the file, when using the
save()
method.- Returns
A string with a valid encoding type.
- Raises
ValueError – If an invalid encoding value is assigned.
-
numeric_type
¶ The numeric type of the data values from the dependent variable.
There are currently twelve valid numeric types:
uint8
int8
float32
complex64
uint16
int16
float64
complex128
uint32
int32
uint64
int64
When assigning a valid value, this attribute updates the dtype of the Numpy array from the corresponding
components
attribute. We recommended the use of the numeric type attribute for updating the dtype of the Numpy array. For example,>>> y.numeric_type 'float32' >>> print(y.components) [[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.] [20. 21. 22. 23. 24. 25. 26. 27. 28. 29.]] >>> y.numeric_type = 'complex64' >>> print(y.components[:,:5]) [[ 0.+0.j 1.+0.j 2.+0.j 3.+0.j 4.+0.j] [10.+0.j 11.+0.j 12.+0.j 13.+0.j 14.+0.j] [20.+0.j 21.+0.j 22.+0.j 23.+0.j 24.+0.j]]
- Returns
A string with a valid numeric type.
- Raises
ValueError – If an invalid numeric type value is assigned.
-
quantity_type
¶ Quantity type of the dependent variable.
There are currently six valid quantity types,
scalar
vector_n
pixel_n
matrix_n_m
symmetric_matrix_n
where n and m are integers. The value of the attribute is modified with a string containing a valid quantity type.
>>> y.quantity_type 'pixel_3' >>> y.quantity_type = 'vector_3'
- Returns
A string with a valid quantity type.
- Raises
ValueError – If an invalid value is assigned.
-
component_labels
¶ List of labels corresponding to the components of the dependent variable.
>>> y.component_labels ['', '', '']
To update the component_labels, assign an array of strings with same number of elements as the number of components.
>>> y.component_labels = ['channel 0', 'channel 1', 'channel 2']
The individual labels are accessed with proper indexing, for example,
>>> y.component_labels[2] 'channel 2'
- Returns
A list of component label strings.
- Raises
TypeError – When the assigned value is not an array of strings.
-
components
¶ Component array of the dependent variable.
The value of this attribute, \(\mathbb{U}\), is a Numpy array of shape \((p \times N_{d-1} \times ... N_1 \times N_0)\) where \(p\) is the number of components, and \(N_k\) is the number of points from the \(k^\mathrm{th}\) Dimension object.
Note
The shape of the components Numpy array, \((p \times N_{d-1} \times ... N_1 \times N_0)\), is reverse the shape of the components array, \((N_0 \times N_1 \times ... N_{d-1} \times p)\), from the CSD model. This is because CSD model utilizes a column-major order to shape the components array relative to the order of the dimension while Numpy utilizes a row-major order.
The dimensionality of this Numpy array is \(d+1\) where \(d\) is the number of dimension objects. The zeroth axis with \(p\) points is the number of components.
This attribute can only be updated when the shape of the new array is the same as the shape of the components array.
For example,
>>> print(y.components.shape) (3, 10) >>> y.numeric_type 'float32'
is a three-component dependent variable with ten data values per component. The numeric type of the data values, in this example, is float32. To update the components array, assign an array of shape (3, 10) to the components attribute. In the following example, we assign a Numpy array,
>>> y.components = np.linspace(0,256,30, dtype='u1').reshape(3,10) >>> y.numeric_type 'uint8'
Notice, the value of the numeric_type attribute is automatically updated based on the dtype of the Numpy array. In this case, from a float32 to uint8. In this other example,
>>> try: ... y.components = np.random.rand(1,10).astype('u1') ... except ValueError as e: ... print(e) The shape of `ndarray`, `(1, 10)`, is inconsistent with the shape of the components array, `(3, 10)`.
a ValueError is raised because the shape of the input array (1, 10) is not consistent with the shape of the components array, (3, 10).
- Returns
A Numpy array of components.
- Raises
ValueError – When assigning an array whose shape is not consistent with the shape of the components array.
-
components_url
¶ URL where the data components of the dependent variable are stored.
This attribute is only informative and cannot be modified. Its value is a string containing the local or remote address of the file where the data values are stored. The attribute is only valid for dependent variable with type, external.
- Returns
A string containing the URL.
- Raises
AttributeError – When assigned a value.
-
axis_label
¶ List of formatted string labels for each component of the dependent variable.
This attribute is not a part of the original core scientific dataset model, however, it is a convenient supplementary attribute that provides formated string ready for labeling the components of the dependent variable. The string at index i is formatted as component_labels[i] / unit if component_labels[i] is a non-empty string, otherwise, quantity_name / unit. Here, quantity_name, component_labels, and unit`are the attributes of the :ref:`dv_api instance. For example,
>>> y.axis_label ['energy / (s * W)', 'energy / (s * W)', 'energy / (s * W)']
- Returns
A list of formated component label strings.
- Raises
AttributeError – When assigned a value.
-
data_structure
¶ Json serialized string describing the DependentVariable class instance.
This supplementary attribute is useful for a quick preview of the dependent variable object. For convenience, the values from the components attribute are truncated to the first and the last two numbers per component. The encoding keyword is also hidden from this view.
>>> print(y.data_structure) { "type": "internal", "description": "A test image", "name": "star", "unit": "s * W", "quantity_name": "energy", "numeric_type": "float32", "quantity_type": "pixel_3", "components": [ [ "0.0, 1.0, ..., 8.0, 9.0" ], [ "10.0, 11.0, ..., 18.0, 19.0" ], [ "20.0, 21.0, ..., 28.0, 29.0" ] ] }
- Returns
A json serialized string of the dependent variable object.
- Raises
AttributeError – When modified.
Method Documentation
-
to
(unit)[source]¶ Convert the unit of the dependent variable to the unit.
- Parameters
unit – A string containing a unit with the same dimensionality as the components of the dependent variable.
>>> y.unit Unit("s W") >>> print(y.components[0,5]) 5.0 >>> y.to('mJ') >>> y.unit Unit("mJ") >>> print(y.components[0,5]) 5000.0
Note
This method is a wrapper of the to method from the Quantity class.
-
to_dict
()[source]¶ Return DependentVariable object as a python dictionary.
Example
>>> y.to_dict() {'type': 'internal', 'description': 'A test image', 'name': 'star', 'unit': 's * W', 'quantity_name': 'energy', 'encoding': 'none', 'numeric_type': 'float32', 'quantity_type': 'pixel_3', 'components': [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0], [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]]}
-