Working with DataFrames/DataContainer

The data import functions of openqlab make extensive use of openqlab.io.data_container objects, which are convenient containers that hold measurement data in a form similar to an Excel table. These data containers are based on the pandas DataFrame and add header information that further describes the measurement data, such as RBW, frequency range, etc. Since working with DataFrames can be very convenient, let us give a short introduction to them here. The pandas documentation also has an excellent tutorial, 10 minutes to pandas.

Importing data with the io.read() function already creates a DataFrame for you, but you can also create one manually from an array of data. Let us create an array x of 100 values from the range [0, 2*pi). We then calculate the sine of these values and use them as the data of our new DataFrame. Each DataFrame also has an index, which we simply set to the original x values:

>>> import pandas as pd
>>> from numpy import cos, linspace, pi, sin
>>> x = linspace(0, 2*pi, 100, endpoint=False)
>>> df = pd.DataFrame(sin(x), index=x)
>>> df.head()
                 0
0.000000  0.000000
0.062832  0.062791
0.125664  0.125333
0.188496  0.187381
0.251327  0.248690

As you can see, df.head() gives us the first few values of our table, with the index in the first column, and the sine values in the second column. The column is named 0 here by default, so let’s change that to something more sensible:

>>> df.columns = ['sin']
>>> df.head()
               sin
0.000000  0.000000
0.062832  0.062791
0.125664  0.125333
0.188496  0.187381
0.251327  0.248690

Much better. We can easily add new columns simply by assigning to them:

>>> df['cos'] = cos(df.index)
>>> df.head()
               sin       cos
0.000000  0.000000  1.000000
0.062832  0.062791  0.998027
0.125664  0.125333  0.992115
0.188496  0.187381  0.982287
0.251327  0.248690  0.968583

See? Easy. Note how we could access the original x values via the index property of the DataFrame. Assignment also works for combining multiple DataFrames into one object, e.g. combining several measurements into one.
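
As a minimal sketch of combining measurements, the following puts two single-column DataFrames that share an index into one object. The names meas_a, meas_b and the column labels are made up for illustration and are not part of openqlab.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)

# Two hypothetical "measurements" taken on the same x values
meas_a = pd.DataFrame({'trace_a': np.sin(x)}, index=x)
meas_b = pd.DataFrame({'trace_b': np.cos(x)}, index=x)

# Column assignment aligns on the index, so both traces end up side by side
combined = meas_a.copy()
combined['trace_b'] = meas_b['trace_b']

# pd.concat along the columns gives the same table in one call
combined_alt = pd.concat([meas_a, meas_b], axis=1)
```

Assignment is handy for adding one column at a time, while pd.concat is usually more convenient when several DataFrames are joined at once.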

You can rename columns in a few ways. The easiest is to use rename(), which returns a new DataFrame. The old one is left unchanged, as you can see in the later examples, where the columns of df are still named sin and cos:

>>> df.rename(columns={'sin': 'Sine', 'cos': 'Cosine'}).head()
              Sine    Cosine
0.000000  0.000000  1.000000
0.062832  0.062791  0.998027
0.125664  0.125333  0.992115
0.188496  0.187381  0.982287
0.251327  0.248690  0.968583
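
To keep the new names, assign the result of rename() to a variable. A short sketch, which rebuilds a small df so that it runs on its own (the variable name renamed is arbitrary):

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

# rename() returns a new DataFrame; df itself keeps its column names
renamed = df.rename(columns={'sin': 'Sine', 'cos': 'Cosine'})

print(list(df.columns))       # ['sin', 'cos']
print(list(renamed.columns))  # ['Sine', 'Cosine']
```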

Let’s have some statistics:

>>> df.describe()
                sin           cos
count  1.000000e+02  1.000000e+02
mean  -1.068590e-17  4.996004e-17
std    7.106691e-01  7.106691e-01
min   -1.000000e+00 -1.000000e+00
25%   -6.956525e-01 -6.956525e-01
50%   -1.608123e-16 -1.722546e-16
75%    6.956525e-01  6.956525e-01
max    1.000000e+00  1.000000e+00
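
The result of describe() is itself a DataFrame, so individual statistics can be looked up by row and column label. A small sketch, again rebuilding df so that it is self-contained:

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

stats = df.describe()

# Look up a single entry of the summary table: row 'std', column 'sin'
sin_std = stats.loc['std', 'sin']

# The same number is available directly from the column
sin_std_direct = df['sin'].std()
```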

You can do calculations directly with the columns:

>>> (df['sin']**2 + df['cos']**2).head()
0.000000    1.0
0.062832    1.0
0.125664    1.0
0.188496    1.0
0.251327    1.0
dtype: float64

Unsurprisingly, sine squared plus cosine squared equals one. Note how the index is still preserved.
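
Because these are floating-point numbers, the sum equals one only up to rounding errors, so a check with a tolerance is safer than an exact comparison. A minimal sketch; the column name sum_of_squares is chosen here purely for illustration:

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

# Column arithmetic is element-wise and returns a Series with the same index
total = df['sin']**2 + df['cos']**2

# Compare with a tolerance rather than ==, since floats carry rounding errors
all_one = np.allclose(total, 1.0)

# The result can be stored as a new column straight away
df['sum_of_squares'] = total
```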

Plotting data in a DataFrame is straightforward:

>>> df.plot()
< will show a plot of all columns, using the index as x-axis >
>>> df.loc[0:pi].plot()
< same plot, but for the index range of 0..pi >
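
A slice of the index range can also be inspected without plotting. The sketch below uses .loc, which always slices by index label and includes both endpoints when they are present in the index. The variable name first_half is arbitrary.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

# Select all rows whose index label lies between 0 and pi
first_half = df.loc[0:np.pi]

print(first_half.index.min())           # 0.0
print(first_half.index.max() <= np.pi)  # True
```

Calling first_half.plot() then draws only that range; plotting requires matplotlib to be installed.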