Labeled Dataset

The CSD model also supports labeled dimensions. In the following example, we present a mixed linear and labeled two-dimensional dataset representing the population of countries as a function of time. The dataset is obtained from The World Bank.
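To picture how such a mixed dataset is organized, here is a minimal NumPy sketch using hypothetical, shortened stand-ins rather than the World Bank data. It assumes the csdmpy convention that a dependent-variable component array lists its axes in reverse dimension order, so rows follow the labeled (second) dimension and columns follow the linear (first) one. The names `years`, `labels`, and `values` are illustrative only.

```python
import numpy as np

# Hypothetical, shortened stand-ins; NOT the World Bank data.
# Linear dimension: coordinate = offset + increment * index (here 1960 yr, 1 yr steps).
years = 1960.0 + 1.0 * np.arange(5)
# Labeled dimension: an array of string labels.
labels = np.array(["Aruba", "Afghanistan", "Angola"])

# Placeholder dependent-variable values (zeros, not populations).
# Assumed layout: axes in reverse dimension order, so rows follow the
# labeled (second) dimension and columns follow the linear (first) one.
values = np.zeros((labels.size, years.size))

print(values.shape)  # (3, 5)
```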

Import the csdmpy module and load the dataset.

import csdmpy as cp

filename = "https://osu.box.com/shared/static/e81to3izj5yv5m7mjq9xw7gmqez2blto.csdf"
labeled_data = cp.load(filename)

The tuples of dimension and dependent-variable objects from the labeled_data instance are

x = labeled_data.dimensions
y = labeled_data.dependent_variables

Since one of the dimensions is a labeled dimension, let’s make use of the type attribute of the dimension instances to find out which dimension is labeled.

print(x[0].type)

Out:

linear
print(x[1].type)

Out:

labeled
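Rather than inspecting each dimension by hand, the labeled dimensions can be located by filtering on the type attribute. The sketch below uses hypothetical SimpleNamespace stand-ins that mimic only the `type` attribute of the two dimension objects, so it runs without downloading the dataset; the same list comprehension should apply unchanged to the real dimension tuple `x`.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the dimension objects; only `type` is mimicked.
dims = (SimpleNamespace(type="linear"), SimpleNamespace(type="labeled"))

# Collect the indices of all labeled dimensions.
labeled_indices = [i for i, dim in enumerate(dims) if dim.type == "labeled"]

print(labeled_indices)  # [1]
```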

Here, the second dimension is the labeled dimension. Its count attribute [1] returns the number of labels,

print(x[1].count)

Out:

263

and the first five labels are

print(x[1].labels[:5])

Out:

['Aruba' 'Afghanistan' 'Angola' 'Albania' 'Andorra']

Note

For labeled dimensions, the coordinates attribute is an alias of the labels attribute.

print(x[1].coordinates[:5])

Out:

['Aruba' 'Afghanistan' 'Angola' 'Albania' 'Andorra']

The first ten coordinates along the first (linear) dimension are

print(x[0].coordinates[:10])

Out:

[1960. 1961. 1962. 1963. 1964. 1965. 1966. 1967. 1968. 1969.] yr
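The printed values follow the usual rule for a linear dimension, coordinate = offset + increment × index. The sketch below assumes an offset of 1960 yr and an increment of 1 yr, both inferred from the output above rather than read from the file, and inverts the rule to find which index holds a given year. It uses plain NumPy floats, so units are dropped.

```python
import numpy as np

# Inferred from the printed output above: first coordinate 1960 yr, step 1 yr.
offset = 1960.0
increment = 1.0

# Forward map for a linear dimension: coordinate = offset + increment * index.
first_ten = offset + increment * np.arange(10)

# Inverse map: which index holds the year 2017?
index_2017 = int(round((2017.0 - offset) / increment))

print(first_ten[0], first_ten[-1])  # 1960.0 1969.0
print(index_2017)                   # 57
```

This arithmetic, (2017 − 1960) / 1 = 57, is why index 57 is used for the year 2017 when the dataset is plotted.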

Plotting the dataset

You may plot this dataset however you like. Here, we use a bar graph to represent the population of countries in the year 2017. The data for this year is a cross-section of the dependent variable at index 57 along the first (time) dimension, as the coordinate at that index confirms:

print(x[0].coordinates[57])

Out:

2017.0 yr
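Taking a cross-section amounts to fixing one index along the time axis of the component array and keeping every label. The NumPy sketch below illustrates this on a placeholder array, assuming the layout described earlier (axis 0 over country labels, axis 1 over years). The label count of 263 comes from the text; the 58 time points are a hypothetical length chosen only so that index 57 exists, and the values are synthetic.

```python
import numpy as np

# 263 labels comes from the text; 58 time points is a hypothetical length
# chosen only so that index 57 exists.
n_labels, n_years = 263, 58

# Placeholder array laid out like the component array: axis 0 over labels,
# axis 1 over years. Entry [i, j] simply holds i * n_years + j.
values = np.arange(n_labels * n_years, dtype=float).reshape(n_labels, n_years)

year_index = 57
all_labels_2017 = values[:, year_index]  # one year, every label
first_20_2017 = values[:20, year_index]  # one year, first 20 labels (as in the plot)

print(all_labels_2017.shape, first_20_2017.shape)  # (263,) (20,)
```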

To keep the plot simple, we only plot the first 20 country labels along the labeled (second) dimension.

import matplotlib.pyplot as plt
import numpy as np

# First 20 country labels and their values at index 57 (the year 2017).
x_data = x[1].coordinates[:20]
x_pos = np.arange(20)
y_data = y[0].components[0][:20, 57]

# Bars at integer positions, relabeled with the country names.
plt.bar(x_pos, y_data, align="center", alpha=0.5)
plt.xticks(x_pos, x_data, rotation=90)
plt.ylabel(y[0].axis_label[0])
plt.yscale("log")
plt.title(y[0].name)
plt.tight_layout()
plt.show()

Footnotes

[1]

In the CSD model, the count attribute is only valid for the LinearDimension. In csdmpy, however, count is available on all dimension objects and returns the number of grid points along the dimension as an integer.
