Working with DataFrames/DataContainer
The data import functions of openqlab make extensive use of openqlab.io.data_container, which provides convenient objects that hold measurement data, similar to an Excel table. The data containers are based on the pandas DataFrame, plus some additional header information that further describes the measurement data, such as RBW, frequency range, etc. Since working with DataFrames can be very convenient, here is a short introduction to them. There is also pandas' excellent tutorial, 10 minutes to pandas.
Importing data with the io.read() function already creates a DataFrame for you, but you can also create one manually from an array of data. Let us create an array x of 100 values from the range [0, 2*pi). We then calculate the sine of these values and use them as the data of our new DataFrame. Each DataFrame also has an index, which we simply set to the original x values:
>>> import pandas as pd
>>> from numpy import cos, linspace, pi, sin
>>> x = linspace(0, 2*pi, 100, endpoint=False)
>>> df = pd.DataFrame(sin(x), index=x)
>>> df.head()
                 0
0.000000  0.000000
0.062832  0.062791
0.125664  0.125333
0.188496  0.187381
0.251327  0.248690
As you can see, df.head() gives us the first few rows of our table, with the index in the first column and the sine values in the second column. The column is named 0 by default, so let’s change that to something more sensible:
>>> df.columns = ['sin']
>>> df.head()
               sin
0.000000  0.000000
0.062832  0.062791
0.125664  0.125333
0.188496  0.187381
0.251327  0.248690
Much better. We can easily add new columns simply by assigning to them:
>>> df['cos'] = cos(df.index)
>>> df.head()
               sin       cos
0.000000  0.000000  1.000000
0.062832  0.062791  0.998027
0.125664  0.125333  0.992115
0.188496  0.187381  0.982287
0.251327  0.248690  0.968583
See? Easy. Note how we could access the original x values via the index property of the DataFrame. Assignment also works for combining multiple DataFrames into one object, e.g. combining several measurements into one.
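As a minimal sketch of combining measurements by assignment: the two DataFrames below, their values, and the column names are made up for illustration and are not openqlab output. Both are assumed to share the same frequency index.

```python
import numpy as np
import pandas as pd

# Two hypothetical measurements taken on the same frequency axis
freq = np.linspace(1e3, 1e6, 5)
meas_a = pd.DataFrame({'power': [-80.0, -82.0, -85.0, -87.0, -90.0]}, index=freq)
meas_b = pd.DataFrame({'power': [-70.0, -71.5, -73.0, -76.0, -79.0]}, index=freq)

# Assigning a column taken from another DataFrame aligns the values by index label
combined = pd.DataFrame(index=freq)
combined['signal'] = meas_a['power']
combined['dark noise'] = meas_b['power']

print(combined)
```

Because the values are aligned by index label, rows that exist in only one of the measurements would show up as NaN in the other column.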
You can rename columns in a few ways. The easiest is to use rename(), which returns a new DataFrame (the old one is left unchanged, as you can see in the following examples, where the columns of df are still named sin and cos):
>>> df.rename(columns={'sin': 'Sine', 'cos': 'Cosine'}).head()
              Sine    Cosine
0.000000  0.000000  1.000000
0.062832  0.062791  0.998027
0.125664  0.125333  0.992115
0.188496  0.187381  0.982287
0.251327  0.248690  0.968583
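To keep the renamed version, store the return value of rename() in a variable. The following sketch uses a tiny stand-in DataFrame (not the one from the examples above) to show that the original is left untouched:

```python
import pandas as pd

# Small stand-in DataFrame with the same column names as in the text
df = pd.DataFrame({'sin': [0.0, 1.0], 'cos': [1.0, 0.0]})

# rename() returns a new DataFrame; df itself keeps its column names
renamed = df.rename(columns={'sin': 'Sine', 'cos': 'Cosine'})

print(df.columns.tolist())       # original names
print(renamed.columns.tolist())  # new names
```

Writing df = df.rename(...) instead would replace the original with the renamed version.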
Let’s have some statistics:
>>> df.describe()
                sin           cos
count  1.000000e+02  1.000000e+02
mean  -1.068590e-17  4.996004e-17
std    7.106691e-01  7.106691e-01
min   -1.000000e+00 -1.000000e+00
25%   -6.956525e-01 -6.956525e-01
50%   -1.608123e-16 -1.722546e-16
75%    6.956525e-01  6.956525e-01
max    1.000000e+00  1.000000e+00
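The result of describe() is itself a DataFrame, so individual statistics can be picked out of it. A short sketch, rebuilding the sine/cosine table with NumPy imported as np (an assumption of this example):

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

stats = df.describe()               # DataFrame indexed by statistic name
sin_std = stats.loc['std', 'sin']   # a single entry of the summary table
col_max = df.max()                  # Series with one value per column

print(sin_std)
print(col_max)
```

Single statistics such as df.mean(), df.std() or df.max() are also available directly, without going through describe().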
You can do calculations directly with the columns:
>>> (df['sin']**2 + df['cos']**2).head()
0.000000    1.0
0.062832    1.0
0.125664    1.0
0.188496    1.0
0.251327    1.0
dtype: float64
Unsurprisingly, sine squared plus cosine squared equals one. Note how the index is still preserved.
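Since these are floating-point values, a check like this is better done with a tolerance than by exact comparison. A minimal sketch, again assuming NumPy is imported as np:

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

# Element-wise arithmetic on columns returns a Series with the same index
total = df['sin']**2 + df['cos']**2

# Compare with a tolerance rather than ==, to allow for rounding errors
print(np.allclose(total, 1.0))
print(total.index.equals(df.index))
```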
Plotting data in a DataFrame is straightforward:
>>> df.plot()
< will show a plot of all columns, using the index as x-axis >
>>> df.loc[0:pi].plot()
< same plot, but for the index range of 0..pi >
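plot() returns a matplotlib Axes object, which can be used to customize the figure further. The sketch below assumes matplotlib is installed and selects the non-interactive Agg backend so that it also runs without a display; the axis labels are chosen for illustration only.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window is opened

import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
df = pd.DataFrame({'sin': np.sin(x), 'cos': np.cos(x)}, index=x)

# Select the index range 0..pi by label, then plot all columns
ax = df.loc[0:np.pi].plot()

# The returned Axes accepts the usual matplotlib calls
ax.set_xlabel('x')
ax.set_ylabel('amplitude')
```

In an interactive session the backend line can be dropped, and ax.figure.savefig() can be used to write the plot to a file.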