An emoji 😁 example¢

Let’s make use of what we learned so far and create a simple 1D{1} dataset. To make it interesting, let’s create an emoji dataset.

Start by importing the csdmpy package.

>>> import csdmpy as cp

Create a labeled dimension. Here, we make use of python dictionary.

>>> x = dict(type="labeled", labels=["🍈", "πŸ‰", "πŸ‹", "🍌", "πŸ₯‘", "🍍"])

The above python dictionary contains two keys. The type key identifies the dimension as a labeled dimension while the labels key holds an array of labels. In this example, the labels are emojis. Add this dictionary to the list of dimensions.

Next, create a dependent variable. Similarly, set up a python dictionary corresponding to the dependent variable object.

>>> y = dict(
...     type="internal",
...     numeric_type="float32",
...     quantity_type="scalar",
...     components=[[0.5, 0.25, 1, 2, 1, 0.25]],
... )

Here, the python dictionary contains type, numeric_type, and components key. The value of the components key holds an array of data values corresponding to the labels from the labeled dimension.

Create a csdm object from the dimensions and dependent variables and we have a πŸ˜‚ dataset…

>>> fun_data = cp.CSDM(
...     dimensions=[x], dependent_variables=[y], description="An emoji dataset"
... )
>>> print(fun_data.data_structure)
{
  "csdm": {
    "version": "1.0",
    "description": "An emoji dataset",
    "dimensions": [
      {
        "type": "labeled",
        "labels": [
          "🍈",
          "πŸ‰",
          "πŸ‹",
          "🍌",
          "πŸ₯‘",
          "🍍"
        ]
      }
    ],
    "dependent_variables": [
      {
        "type": "internal",
        "numeric_type": "float32",
        "quantity_type": "scalar",
        "components": [
          [
            "0.5, 0.25, ..., 1.0, 0.25"
          ]
        ]
      }
    ]
  }
}

To serialize this file, use the save() method of the fun_data instance as

>>> fun_data.dependent_variables[0].encoding = "base64"
>>> fun_data.save("my_file.csdf")

In the above code, the components from the dependent_variables attribute at index zero, are encoded as base64 strings before serializing to the my_file.csdf file.

You may also save the components as a binary file, in which case, the file is serialized with a .csdfe file extension.

>>> fun_data.dependent_variables[0].encoding = "raw"
>>> fun_data.save("my_file_raw.csdfe")