DataContainer¶
For the inherited methods, see the original pandas documentation of DataFrame.
- class openqlab.io.data_container.DataContainer(*args, header: dict | None = None, header_type: str | None = None, type: str | None = None, **kwargs)¶
DataContainer inherits from pandas.DataFrame and carries a header (a dict) that stores additional information alongside the plain data.
- add(**kwargs)¶
Get Addition of DataContainer and other, element-wise (binary operator add).
Equivalent to DataContainer + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
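The union of mismatched labels and the fill_value rule described in the parameters can be sketched in plain Python. This is only an illustration of the documented behavior, not how pandas or openqlab implement it; the helper name add_with_fill and the dict-based data layout are assumptions made for the example.

```python
import math


def add_with_fill(left, right, fill_value=None):
    """Add two label -> value mappings element-wise (hypothetical helper).

    Labels are unioned. A value missing on one side is replaced by
    fill_value; if it is missing on both sides the result stays missing.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    result = {}
    # Sorting the union is a simplification chosen for this sketch.
    for label in sorted(set(left) | set(right)):
        a = left.get(label)
        b = right.get(label)
        if is_missing(a) and is_missing(b):
            result[label] = math.nan
            continue
        if fill_value is not None:
            a = fill_value if is_missing(a) else a
            b = fill_value if is_missing(b) else b
        result[label] = math.nan if is_missing(a) or is_missing(b) else a + b
    return result


left = {"circle": 0, "triangle": 3}
right = {"triangle": 1, "rectangle": 4}

no_fill = add_with_fill(left, right)                  # unmatched labels become NaN
with_fill = add_with_fill(left, right, fill_value=0)  # unmatched labels use 0
```

Without fill_value, only the label present on both sides ("triangle") gets a numeric result; with fill_value=0 every label in the union does.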
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- agg(**kwargs)¶
Aggregate using one or more operations over the specified axis.
- Parameters:
func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataContainer or when passed to DataContainer.apply.
Accepted combinations are:
function
string function name
list of functions and/or function names, e.g.
[np.sum, 'mean']
dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
- Returns:
The return can be:
scalar : when DataContainerSeries.agg is called with single function
DataContainerSeries : when DataContainer.agg is called with a single function
DataContainer : when DataContainer.agg is called with several functions
- Return type:
scalar, DataContainerSeries or DataContainer
See also
DataContainer.apply
Perform any type of operations.
DataContainer.transform
Perform transformation type operations.
pandas.DataFrame.groupby
Perform operations over groups.
pandas.DataFrame.resample
Perform operations over resampled bins.
pandas.DataFrame.rolling
Perform operations over rolling window.
pandas.DataFrame.expanding
Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow
Perform operation over exponential weighted window.
Notes
The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
A passed user-defined-function will be passed a DataContainerSeries for evaluation.
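The difference between per-axis aggregation and aggregation over the flattened data, as described in the notes above, can be shown with a small standard-library sketch. The nested list rows is a stand-in for a 2-D table and is an assumption of this example; no pandas or numpy call is involved.

```python
from statistics import mean

# Stand-in for a 3x3 table: each inner list is one row.
rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Aggregating over the index (axis=0): one result per column.
column_means = [mean(column) for column in zip(*rows)]

# Aggregating over the columns (axis=1): one result per row.
row_means = [mean(row) for row in rows]

# Flattened aggregation: a single scalar over all values.
flat_mean = mean(value for row in rows for value in row)
```

Here column_means has one entry per column, row_means one entry per row, and flat_mean collapses everything to a single number, which is the case the notes contrast with the axis-based default.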
Examples
>>> df = DataContainer([[1, 2, 3],
...                     [4, 5, 6],
...                     [7, 8, 9],
...                     [np.nan, np.nan, np.nan]],
...                    columns=['A', 'B', 'C'])
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
Aggregate different functions over the columns and rename the index of the resulting DataContainer.
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
Aggregate over the columns.
>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
- aggregate(**kwargs)¶
Aggregate using one or more operations over the specified axis.
- Parameters:
func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataContainer or when passed to DataContainer.apply.
Accepted combinations are:
function
string function name
list of functions and/or function names, e.g.
[np.sum, 'mean']
dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
- Returns:
The return can be:
scalar : when DataContainerSeries.agg is called with single function
DataContainerSeries : when DataContainer.agg is called with a single function
DataContainer : when DataContainer.agg is called with several functions
- Return type:
scalar, DataContainerSeries or DataContainer
See also
DataContainer.apply
Perform any type of operations.
DataContainer.transform
Perform transformation type operations.
pandas.DataFrame.groupby
Perform operations over groups.
pandas.DataFrame.resample
Perform operations over resampled bins.
pandas.DataFrame.rolling
Perform operations over rolling window.
pandas.DataFrame.expanding
Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow
Perform operation over exponential weighted window.
Notes
The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
A passed user-defined-function will be passed a DataContainerSeries for evaluation.
Examples
>>> df = DataContainer([[1, 2, 3],
...                     [4, 5, 6],
...                     [7, 8, 9],
...                     [np.nan, np.nan, np.nan]],
...                    columns=['A', 'B', 'C'])
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
Aggregate different functions over the columns and rename the index of the resulting DataContainer.
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
Aggregate over the columns.
>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
- apply(**kwargs)¶
Apply a function along an axis of the DataContainer.
Objects passed to the function are DataContainerSeries objects whose index is either the DataContainer’s index (axis=0) or the DataContainer’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
- Parameters:
func (function) – Function to apply to each column or row.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Axis along which the function is applied:
0 or ‘index’: apply function to each column.
1 or ‘columns’: apply function to each row.
raw (bool, default False) –
Determines if row or column is passed as a DataContainerSeries or ndarray object:
False : passes each row or column as a DataContainerSeries to the function.
True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
result_type ({'expand', 'reduce', 'broadcast', None}, default None) –
These only act when axis=1 (columns):
’expand’ : list-like results will be turned into columns.
’reduce’ : returns a DataContainerSeries if possible rather than expanding list-like results. This is the opposite of ‘expand’.
’broadcast’ : results will be broadcast to the original shape of the DataContainer, the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a DataContainerSeries of those. However if the apply function returns a DataContainerSeries these are expanded to columns.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
by_row (False or "compat", default "compat") –
Only has an effect when func is a listlike or dictlike of funcs and the func isn’t a string. If “compat”, will if possible first translate the func into pandas methods (e.g. DataContainerSeries().apply(np.sum) will be translated to DataContainerSeries().sum()). If that doesn’t work, will try call to apply again with by_row=True and if that fails, will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole DataContainerSeries at once.
Added in version 2.1.0.
engine ({'python', 'numba'}, default 'python') –
Choose between the python (default) engine or the numba engine in apply.
The numba engine will attempt to JIT compile the passed function, which may result in speedups for large DataContainers. It also supports the following engine_kwargs :
nopython (compile the function in nopython mode)
nogil (release the GIL inside the JIT compiled function)
parallel (try to apply the function in parallel over the DataContainer)
Note: Due to limitations within numba/how pandas interfaces with numba, you should only use this if raw=True
Note: The numba compiler only supports a subset of valid Python/numpy operations.
Please read more about the supported python features and supported numpy features in numba to learn what you can or cannot use in the passed function.
Added in version 2.2.0.
engine_kwargs (dict) – Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.
**kwargs – Additional keyword arguments to pass as keywords arguments to func.
- Returns:
Result of applying func along the given axis of the DataContainer.
- Return type:
See also
DataContainer.map
For elementwise operations.
DataContainer.aggregate
Only perform aggregating type operations.
DataContainer.transform
Only perform transforming type operations.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
Examples
>>> df = DataContainer([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9
Using a numpy universal function (in this case the same as np.sqrt(df)):
>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
Using a reducing function on either axis:
>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
Returning a list-like will result in a DataContainerSeries:
>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
Passing result_type='expand' will expand list-like results to columns of a DataContainer:
>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2
Returning a DataContainerSeries inside the function is similar to passing result_type='expand'. The resulting column names will be the DataContainerSeries index.
>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
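How the result_type options reshape row-wise results can be sketched without pandas. The function apply_rows and the dict-of-lists table below are hypothetical stand-ins chosen for this illustration; the sketch only handles list-like return values and does not reproduce the scalar case of 'broadcast' or the 'reduce' option.

```python
def apply_rows(table, func, result_type=None):
    """Apply func to every row of a column -> values mapping (hypothetical helper)."""
    columns = list(table)
    n_rows = len(table[columns[0]])
    # Call func once per row, passing the row as a plain list.
    results = [func([table[c][i] for c in columns]) for i in range(n_rows)]
    if result_type == "expand":
        # Each list-like result is spread over new columns numbered 0, 1, ...
        width = len(results[0])
        return {j: [r[j] for r in results] for j in range(width)}
    if result_type == "broadcast":
        # Results are written back under the original column names.
        return {c: [r[j] for r in results] for j, c in enumerate(columns)}
    # Default for list-like results: keep one list per row.
    return results


table = {"A": [4, 4, 4], "B": [9, 9, 9]}
as_lists = apply_rows(table, lambda row: [1, 2])
expanded = apply_rows(table, lambda row: [1, 2], result_type="expand")
broadcast = apply_rows(table, lambda row: [1, 2], result_type="broadcast")
```

The three results mirror the shapes in the examples above: one list per row, new integer-labelled columns, and the original column names respectively.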
- asof(**kwargs)¶
Return the last row(s) without any NaNs before where.
The last row (for each element in where, if list) without any NaN is taken. In case of a DataContainer, the last row without NaN is taken considering only the subset of columns (if not None).
If there is no good value, NaN is returned for a DataContainerSeries or a DataContainerSeries of NaN values for a DataContainer.
- Parameters:
where (date or array-like of dates) – Date(s) before which the last row(s) are returned.
subset (str or array-like of str, default None) – For DataContainer, if not None, only use these columns to check for NaNs.
- Returns:
The return can be:
scalar : when self is a DataContainerSeries and where is a scalar
DataContainerSeries: when self is a DataContainerSeries and where is an array-like, or when self is a DataContainer and where is a scalar
DataContainer : when self is a DataContainer and where is an array-like
- Return type:
scalar, DataContainerSeries, or DataContainer
See also
merge_asof
Perform an asof merge. Similar to left join.
Notes
Dates are assumed to be sorted. Raises if this is not the case.
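The lookup rule (take the last non-NaN value at or before where, on a sorted index) can be sketched with the standard library. The function asof_value is a hypothetical helper for a single column of values; it illustrates the documented behavior and is not the pandas implementation.

```python
import bisect
import math


def asof_value(index, values, where):
    """Return the last non-NaN value whose label is <= where (hypothetical helper).

    index must be sorted in ascending order, matching the note above.
    """
    # Position just past the last label that is <= where.
    position = bisect.bisect_right(index, where)
    # Walk backwards, skipping missing values.
    for i in range(position - 1, -1, -1):
        if not math.isnan(values[i]):
            return values[i]
    return math.nan


index = [10, 20, 30, 40]
values = [1.0, 2.0, math.nan, 4.0]

at_20 = asof_value(index, values, 20)   # exact match on a valid value
at_30 = asof_value(index, values, 30)   # NaN at 30 is skipped
at_5 = asof_value(index, values, 5)     # before the first label
```

The backward walk is what makes a NaN at the matching label fall through to the previous valid value.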
Examples
A DataContainerSeries and a scalar where.
>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0
For a sequence where, a DataContainerSeries is returned. The first value is NaN, because the first element of where is before the first index value.
>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64
Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.
>>> s.asof(30)
2.0
Take all columns into consideration
>>> df = DataContainer({'a': [10., 20., 30., 40., 50.],
...                     'b': [None, None, None, None, 500]},
...                    index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                            '2018-02-27 09:02:00',
...                                            '2018-02-27 09:03:00',
...                                            '2018-02-27 09:04:00',
...                                            '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN
Take a single column into consideration
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                        a   b
2018-02-27 09:03:30  30.0 NaN
2018-02-27 09:04:30  40.0 NaN
- combine(**kwargs)¶
Perform column-wise combine with another DataContainer.
Combines a DataContainer with other DataContainer using func to element-wise combine columns. The row and column indexes of the resulting DataContainer will be the union of the two.
- Parameters:
other (DataContainer) – The DataContainer to merge column-wise.
func (function) – Function that takes two series as inputs and return a DataContainerSeries or a scalar. Used to merge the two DataContainers column by columns.
fill_value (scalar value, default None) – The value to fill NaNs with prior to passing any column to the merge func.
overwrite (bool, default True) – If True, columns in self that do not exist in other will be overwritten with NaNs.
- Returns:
Combination of the provided DataContainers.
- Return type:
See also
DataContainer.combine_first
Combine two DataContainer objects and default to non-null values in frame calling the method.
Examples
Combine using a simple function that chooses the smaller column.
>>> df1 = DataContainer({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataContainer({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3
Example using a true element-wise combine function.
>>> df1 = DataContainer({'A': [5, 0], 'B': [2, 4]})
>>> df2 = DataContainer({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
Using fill_value fills Nones prior to passing the column to the merge function.
>>> df1 = DataContainer({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataContainer({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0
However, if the same element in both DataContainers is None, that None is preserved.
>>> df1 = DataContainer({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataContainer({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  3.0
Example that demonstrates the use of overwrite and behavior when the axes differ between the DataContainers.
>>> df1 = DataContainer({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataContainer({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0
Demonstrating the preference of the passed in DataContainer.
>>> df2 = DataContainer({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
- combine_first(**kwargs)¶
Update null elements with value in the same location in other.
Combine two DataContainer objects by filling null values in one DataContainer with non-null values from the other DataContainer. The row and column indexes of the resulting DataContainer will be the union of the two. When calling first.combine_first(second), the result keeps the value from first wherever both first.loc[index, col] and second.loc[index, col] are not missing; values from second are used only where first is missing.
- Parameters:
other (DataContainer) – Provided DataContainer to use to fill null values.
- Returns:
The result of combining the provided DataContainer with the other object.
- Return type:
See also
DataContainer.combine
Perform series-wise operation on two DataContainers using a given function.
Examples
>>> df1 = DataContainer({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataContainer({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
Null values still persist if the location of that null value does not exist in other.
>>> df1 = DataContainer({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataContainer({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
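The fill rule behind combine_first can be written out in plain Python as a rough sketch. The helper below and its (row, column) -> value dictionaries are assumptions made for illustration, not the pandas or openqlab implementation; the data mirrors the last example above.

```python
import math


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def combine_first(first, second):
    """Fill missing cells of first from second over the union of labels (hypothetical helper)."""
    rows = sorted({r for r, _ in first} | {r for r, _ in second})
    cols = sorted({c for _, c in first} | {c for _, c in second})
    result = {}
    for r in rows:
        for c in cols:
            value = first.get((r, c))
            if is_missing(value):
                # Only fall back to second where first has nothing.
                value = second.get((r, c))
            result[(r, c)] = math.nan if is_missing(value) else value
    return result


first = {(0, "A"): None, (1, "A"): 0, (0, "B"): 4, (1, "B"): None}
second = {(1, "B"): 3, (2, "B"): 3, (1, "C"): 1, (2, "C"): 1}
combined = combine_first(first, second)
```

Cell (0, "A") stays missing because second has no value at that location, while (1, "B") is filled from second.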
- corrwith(**kwargs)¶
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of DataContainer with rows or columns of DataContainerSeries or DataContainer. DataContainers are first aligned along both axes before computing the correlations.
- Parameters:
other (DataContainer, DataContainerSeries) – Object with which to compute correlations.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.
drop (bool, default False) – Drop missing indices from result.
method ({'pearson', 'kendall', 'spearman'} or callable) –
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable : callable with input two 1d ndarrays and returning a float.
numeric_only (bool, default False) –
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
Pairwise correlations.
- Return type:
See also
DataContainer.corr
Compute pairwise correlation of columns.
Examples
>>> index = ["a", "b", "c", "d", "e"]
>>> columns = ["one", "two", "three", "four"]
>>> df1 = DataContainer(np.arange(20).reshape(5, 4), index=index, columns=columns)
>>> df2 = DataContainer(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
>>> df1.corrwith(df2)
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
>>> df2.corrwith(df1, axis=1)
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
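The align-then-correlate idea for the default column-wise case with the Pearson method can be sketched in plain Python. The helpers pearson and corrwith_columns and the nested-dict layout are hypothetical; columns missing from the other table are simply skipped here, which is a simplification of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return math.nan  # undefined for constant data
    return cov / (spread_x * spread_y)


def corrwith_columns(left, right):
    """Correlate matching columns after aligning on shared row labels (hypothetical helper)."""
    result = {}
    for column, left_values in left.items():
        if column not in right:
            continue
        shared = [label for label in left_values if label in right[column]]
        result[column] = pearson([left_values[l] for l in shared],
                                 [right[column][l] for l in shared])
    return result


labels = ["a", "b", "c", "d", "e"]
columns = ["one", "two", "three", "four"]
# Same numbers as np.arange(20).reshape(5, 4) and np.arange(16).reshape(4, 4).
left = {c: {r: 4 * i + j for i, r in enumerate(labels)} for j, c in enumerate(columns)}
right = {c: {r: 4 * i + j for i, r in enumerate(labels[:4])} for j, c in enumerate(columns)}

correlations = corrwith_columns(left, right)
```

Row "e" exists only in left, so it is dropped during alignment and each column pair is correlated over rows "a" to "d".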
- div(**kwargs)¶
Get Floating division of DataContainer and other, element-wise (binary operator truediv).
Equivalent to DataContainer / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- divide(**kwargs)¶
Get Floating division of DataContainer and other, element-wise (binary operator truediv).
Equivalent to DataContainer / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- dot(**kwargs)¶
Compute the matrix multiplication between the DataContainer and other.
This method computes the matrix product between the DataContainer and the values of an other DataContainerSeries, DataContainer or a numpy array.
It can also be called using self @ other.
- Parameters:
other (DataContainerSeries, DataContainer or array-like) – The other object to compute the matrix product with.
- Returns:
If other is a DataContainerSeries, return the matrix product between self and other as a DataContainerSeries. If other is a DataContainer or a numpy.array, return the matrix product of self and other in a DataContainer of a np.array.
- Return type:
See also
DataContainerSeries.dot
Similar method for DataContainerSeries.
Notes
The dimensions of DataContainer and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataContainer and the index of other must contain the same values, as they will be aligned prior to the multiplication.
The dot method for DataContainerSeries computes the inner product, instead of the matrix product here.
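The alignment requirement from the notes (column names of the left operand must match the index labels of other, and are aligned before multiplying) can be sketched as follows. The function dot_aligned and its list/dict inputs are hypothetical stand-ins used only to illustrate the rule.

```python
def dot_aligned(rows, columns, series):
    """Matrix-vector product with label alignment (hypothetical helper).

    rows    : list of row lists, one value per entry in columns
    columns : column labels of the left operand
    series  : mapping label -> value for the right operand
    """
    if set(columns) != set(series):
        raise ValueError("column labels and series labels must contain the same values")
    # Values are looked up by label, so the order of the series does not matter.
    return [sum(row[j] * series[label] for j, label in enumerate(columns))
            for row in rows]


rows = [[0, 1, -2, -1],
        [1, 1, 1, 1]]
columns = [0, 1, 2, 3]

series = {0: 1, 1: 1, 2: 2, 3: 1}
shuffled = {1: 1, 0: 1, 2: 2, 3: 1}   # same labels in a different order

product = dot_aligned(rows, columns, series)
product_shuffled = dot_aligned(rows, columns, shuffled)
```

Because values are matched by label rather than by position, reordering the right operand leaves the product unchanged, while a label mismatch raises an error.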
Examples
Here we multiply a DataContainer with a DataContainerSeries.
>>> df = DataContainer([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64
Here we multiply a DataContainer with another DataContainer.
>>> other = DataContainer([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
    0   1
0   1   4
1   2   2
Note that the dot method gives the same result as @.
>>> df @ other
    0   1
0   1   4
1   2   2
The dot method also works if other is an np.array.
>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
    0   1
0   1   4
1   2   2
Note how shuffling of the objects does not change the result.
>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0    -4
1     5
dtype: int64
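The label alignment described in the Notes can be checked against a plain positional product. The sketch below uses pandas.DataFrame and pandas.Series as stand-ins, on the assumption that DataContainer (which inherits from pandas.DataFrame) keeps the same dot semantics. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-ins for DataContainer / DataContainerSeries (assumption: same semantics).
df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])

# Labelled operand: columns of df are aligned with the index of s first.
by_label = df.dot(s)

# Reordering the labelled operand does not change the result,
# because alignment happens on labels, not positions.
s_shuffled = s.reindex([3, 2, 1, 0])
shuffled_result = df.dot(s_shuffled)

# A bare NumPy array carries no labels, so its values are used by position.
raw_result = df.dot(s_shuffled.to_numpy())

print(by_label.tolist())         # aligned product
print(shuffled_result.tolist())  # same as above
print(raw_result.tolist())       # positional product, generally different
```

When the order of the values matters, passing a labelled object rather than a bare array keeps the product independent of row order.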
- floordiv(**kwargs)¶
Get integer division of DataContainer and other, element-wise (binary operator floordiv).
Equivalent to DataContainer // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
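The shared examples above do not call floordiv itself, so here is a minimal sketch of it. It uses pandas.DataFrame in place of DataContainer, assuming the subclass does not alter the arithmetic. The divisor 7 and the frame named other are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Method form and operator form give the same element-wise integer division.
method_result = df.floordiv(7)
operator_result = df // 7

# 'other' lacks the 'degrees' column. Without fill_value that column would be
# all NaN; with fill_value=1 the missing divisor is treated as 1.
other = pd.DataFrame({'angles': [1, 2, 3]},
                     index=['circle', 'triangle', 'rectangle'])
unfilled = df.floordiv(other)
filled = df.floordiv(other, fill_value=1)

print(method_result)
print(filled)
```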
- isin(**kwargs)¶
Whether each element in the DataContainer is contained in values.
- Parameters:
values (iterable, DataContainerSeries, DataContainer or dict) – The result will only be true at a location if all the labels match. If values is a DataContainerSeries, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataContainer, then both the index and column labels must match.
- Returns:
DataContainer of booleans showing whether each element in the DataContainer is contained in values.
- Return type:
See also
DataContainer.eq
Equality test for DataContainer.
DataContainerSeries.isin
Equivalent method on DataContainerSeries.
DataContainerSeries.str.contains
Test if pattern or regex is contained within a string of a DataContainerSeries or Index.
Examples
>>> df = DataContainer({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                    index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
When values is a list, check whether every value in the DataContainer is present in the list (which animals have 0 or 2 legs or wings):
>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True
To check if values is not in the DataContainer, use the ~ operator:
>>> ~df.isin([0, 2])
        num_legs  num_wings
falcon     False      False
dog         True      False
When values is a dict, we can pass values to check for each column separately:
>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True
When values is a DataContainerSeries or DataContainer, the index and column must match. Note that ‘falcon’ does not match based on the number of legs in other.
>>> other = DataContainer({'num_legs': [8, 3], 'num_wings': [0, 2]},
...                       index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon     False       True
dog        False      False
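A common use of the boolean result is row filtering. The following sketch again uses pandas.DataFrame as a stand-in for DataContainer (an assumption). It reduces the mask with all or any along the columns, and the names mask, rows_all and rows_any are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# Per-column membership test: legs must be 2, wings must be 0.
mask = df.isin({'num_legs': [2], 'num_wings': [0]})

# Keep rows where every column matched, or where at least one did.
rows_all = df[mask.all(axis=1)]
rows_any = df[mask.any(axis=1)]

print(mask)
print(rows_all)
print(rows_any)
```

Here no animal satisfies both conditions, while each animal satisfies one of them.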
- join(**kwargs)¶
Join columns of another DataContainer.
Join columns with other DataContainer either on index or on a key column. Efficiently join multiple DataContainer objects by index at once by passing a list.
- Parameters:
other (DataContainer, DataContainerSeries, or a list containing any combination of them) – Index should be similar to one of the columns in this one. If a DataContainerSeries is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataContainer.
on (str, list of str, or array-like, optional) – Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataContainer must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataContainer. Like an Excel VLOOKUP operation.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'left') –
How to handle the operation of the two objects.
left: use calling DataContainer’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling DataContainer’s index (or column if on is specified) with other’s index, and sort it lexicographically.
inner: form intersection of calling DataContainer’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
lsuffix (str, default '') – Suffix to use from left DataContainer’s overlapping columns.
rsuffix (str, default '') – Suffix to use from right DataContainer’s overlapping columns.
sort (bool, default False) – Order result DataContainer lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
validate (str, optional) –
If specified, checks if join is of specified type.
“one_to_one” or “1:1”: check if join keys are unique in both left and right datasets.
“one_to_many” or “1:m”: check if join keys are unique in left dataset.
“many_to_one” or “m:1”: check if join keys are unique in right dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
Added in version 1.5.0.
- Returns:
A DataContainer containing columns from both the caller and other.
- Return type:
See also
DataContainer.merge
For column(s)-on-column(s) operations.
Notes
Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataContainer objects.
Examples
>>> df = DataContainer({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'], ... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df key A 0 K0 A0 1 K1 A1 2 K2 A2 3 K3 A3 4 K4 A4 5 K5 A5
>>> other = DataContainer({'key': ['K0', 'K1', 'K2'], ... 'B': ['B0', 'B1', 'B2']})
>>> other key B 0 K0 B0 1 K1 B1 2 K2 B2
Join DataContainers using their indexes.
>>> df.join(other, lsuffix='_caller', rsuffix='_other') key_caller A key_other B 0 K0 A0 K0 B0 1 K1 A1 K1 B1 2 K2 A2 K2 B2 3 K3 A3 NaN NaN 4 K4 A4 NaN NaN 5 K5 A5 NaN NaN
If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataContainer will have key as its index.
>>> df.set_index('key').join(other.set_index('key')) A B key K0 A0 B0 K1 A1 B1 K2 A2 B2 K3 A3 NaN K4 A4 NaN K5 A5 NaN
Another option to join using the key columns is to use the on parameter. DataContainer.join always uses other’s index but we can use any column in df. This method preserves the original DataContainer’s index in the result.
>>> df.join(other.set_index('key'), on='key') key A B 0 K0 A0 B0 1 K1 A1 B1 2 K2 A2 B2 3 K3 A3 NaN 4 K4 A4 NaN 5 K5 A5 NaN
Using non-unique key values shows how they are matched.
>>> df = DataContainer({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'], ... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df key A 0 K0 A0 1 K1 A1 2 K1 A2 3 K3 A3 4 K0 A4 5 K1 A5
>>> df.join(other.set_index('key'), on='key', validate='m:1') key A B 0 K0 A0 B0 1 K1 A1 B1 2 K1 A2 B1 3 K3 A3 NaN 4 K0 A4 B0 5 K1 A5 B1
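To see how the how and validate parameters interact with non-unique keys, consider the sketch below. It uses pandas.DataFrame in place of DataContainer (an assumption) and requires a pandas version in which join accepts validate (1.5.0 or later, per the parameter description above). The flag one_to_one_rejected is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']}).set_index('key')

# Default how='left': every row of df is kept, K3 has no match in other.
left = df.join(other, on='key')

# how='inner': only rows whose key appears in both objects remain.
inner = df.join(other, on='key', how='inner')

# validate='1:1' demands unique keys on both sides; 'K1' repeats in df,
# so pandas raises MergeError instead of silently duplicating rows.
one_to_one_rejected = False
try:
    df.join(other, on='key', validate='1:1')
except pd.errors.MergeError:
    one_to_one_rejected = True

print(left)
print(inner)
print(one_to_one_rejected)
```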
- melt(**kwargs)¶
Unpivot a DataContainer from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataContainer into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
- Parameters:
id_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to use as identifier variables.
value_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name (scalar, default None) – Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name (scalar, default 'value') – Name to use for the ‘value’ column, can’t be an existing column label.
col_level (scalar, optional) – If columns are a MultiIndex then use this level to melt.
ignore_index (bool, default True) – If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.
- Returns:
Unpivoted DataContainer.
- Return type:
See also
melt
Identical method.
pivot_table
Create a spreadsheet-style pivot table as a DataContainer.
DataContainer.pivot
Return reshaped DataContainer organized by given index / column values.
DataContainer.explode
Explode a DataContainer from list-like columns to long format.
Notes
Reference the user guide for more examples.
Examples
>>> df = DataContainer({'A': {0: 'a', 1: 'b', 2: 'c'}, ... 'B': {0: 1, 1: 3, 2: 5}, ... 'C': {0: 2, 1: 4, 2: 6}}) >>> df A B C 0 a 1 2 1 b 3 4 2 c 5 6
>>> df.melt(id_vars=['A'], value_vars=['B']) A variable value 0 a B 1 1 b B 3 2 c B 5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C']) A variable value 0 a B 1 1 b B 3 2 c B 5 3 a C 2 4 b C 4 5 c C 6
The names of ‘variable’ and ‘value’ columns can be customized:
>>> df.melt(id_vars=['A'], value_vars=['B'], ... var_name='myVarname', value_name='myValname') A myVarname myValname 0 a B 1 1 b B 3 2 c B 5
Original index values can be kept around:
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False) A variable value 0 a B 1 1 b B 3 2 c B 5 0 a C 2 1 b C 4 2 c C 6
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')] >>> df A B C D E F 0 a 1 2 1 b 3 4 2 c 5 6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B']) A variable value 0 a B 1 1 b B 3 2 c B 5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')]) (A, D) variable_0 variable_1 value 0 a B E 1 1 b B E 3 2 c B E 5
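Since melt and pivot are listed as related operations, a short round trip may help show what "wide to long" means in practice. This is a sketch with pandas.DataFrame standing in for DataContainer (an assumption). The column names channel and reading are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [1, 3, 5],
                   'C': [2, 4, 6]})

# Wide to long: B and C become rows, labelled by the new 'channel' column.
long = df.melt(id_vars='A', var_name='channel', value_name='reading')

# Long back to wide: one column per distinct 'channel' value.
wide = long.pivot(index='A', columns='channel', values='reading')

print(long)
print(wide)
```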
- merge(**kwargs)¶
Merge DataContainer or named DataContainerSeries objects with a database-style join.
A named DataContainerSeries object is treated as a DataContainer with a single named column.
The join is done on columns or indexes. If joining columns on columns, the DataContainer indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Warning
If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
- Parameters:
right (DataContainer or named DataContainerSeries) – Object to merge with.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'inner') –
Type of merge to be performed.
left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
on (label or list) – Column or index level names to join on. These must be found in both DataContainers. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataContainers.
left_on (label or list, or array-like) – Column or index level names to join on in the left DataContainer. Can also be an array or list of arrays of the length of the left DataContainer. These arrays are treated as if they are columns.
right_on (label or list, or array-like) – Column or index level names to join on in the right DataContainer. Can also be an array or list of arrays of the length of the right DataContainer. These arrays are treated as if they are columns.
left_index (bool, default False) – Use the index from the left DataContainer as the join key(s). If it is a MultiIndex, the number of keys in the other DataContainer (either the index or a number of columns) must match the number of levels.
right_index (bool, default False) – Use the index from the right DataContainer as the join key. Same caveats as left_index.
sort (bool, default False) – Sort the join keys lexicographically in the result DataContainer. If False, the order of the join keys depends on the join type (how keyword).
suffixes (list-like, default is ("_x", "_y")) – A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.
copy (bool, default True) –
If False, avoid copy if possible.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write
pd.options.mode.copy_on_write = True
indicator (bool or str, default False) – If True, adds a column to the output DataContainer called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataContainer, “right_only” for observations whose merge key only appears in the right DataContainer, and “both” if the observation’s merge key is found in both DataContainers.
validate (str, optional) –
If specified, checks if merge is of specified type.
“one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
“one_to_many” or “1:m”: check if merge keys are unique in left dataset.
“many_to_one” or “m:1”: check if merge keys are unique in right dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
- Returns:
A DataContainer of the two merged objects.
- Return type:
See also
merge_ordered
Merge with optional filling/interpolation.
merge_asof
Merge on nearest keys.
DataContainer.join
Similar method using indices.
Examples
>>> df1 = DataContainer({'lkey': ['foo', 'bar', 'baz', 'foo'], ... 'value': [1, 2, 3, 5]}) >>> df2 = DataContainer({'rkey': ['foo', 'bar', 'baz', 'foo'], ... 'value': [5, 6, 7, 8]}) >>> df1 lkey value 0 foo 1 1 bar 2 2 baz 3 3 foo 5 >>> df2 rkey value 0 foo 5 1 bar 6 2 baz 7 3 foo 8
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1.merge(df2, left_on='lkey', right_on='rkey') lkey value_x rkey value_y 0 foo 1 foo 5 1 foo 1 foo 8 2 bar 2 bar 6 3 baz 3 baz 7 4 foo 5 foo 5 5 foo 5 foo 8
Merge DataContainers df1 and df2 with specified left and right suffixes appended to any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey', ... suffixes=('_left', '_right')) lkey value_left rkey value_right 0 foo 1 foo 5 1 foo 1 foo 8 2 bar 2 bar 6 3 baz 3 baz 7 4 foo 5 foo 5 5 foo 5 foo 8
Merge DataContainers df1 and df2, but raise an exception if the DataContainers have any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False)) Traceback (most recent call last): ... ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')
>>> df1 = DataContainer({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = DataContainer({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
     a  b
0  foo  1
1  bar  2
>>> df2
     a  c
0  foo  3
1  baz  4
>>> df1.merge(df2, how='inner', on='a')
     a  b  c
0  foo  1  3
>>> df1.merge(df2, how='left', on='a')
     a  b    c
0  foo  1  3.0
1  bar  2  NaN
>>> df1 = DataContainer({'left': ['foo', 'bar']})
>>> df2 = DataContainer({'right': [7, 8]})
>>> df1
  left
0  foo
1  bar
>>> df2
   right
0      7
1      8
>>> df1.merge(df2, how='cross')
  left  right
0  foo      7
1  foo      8
2  bar      7
3  bar      8
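The indicator parameter is described above but not exercised in the examples. A minimal sketch follows, using pandas.DataFrame as a stand-in for DataContainer (an assumption). The dictionary source_by_key is only a convenience for inspecting the result.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

# Outer merge keeps keys from both sides; indicator=True adds a '_merge'
# column recording where each key was found.
merged = df1.merge(df2, how='outer', on='a', indicator=True)

# Map each key to its origin for a quick check.
source_by_key = dict(zip(merged['a'], merged['_merge'].astype(str)))

print(merged)
print(source_by_key)
```

Filtering on the indicator column (for example, keeping only "left_only" rows) is one way to express an anti-join.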
- mod(**kwargs)¶
Get modulo of DataContainer and other, element-wise (binary operator mod).
Equivalent to DataContainer % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
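The shared examples above do not call mod directly. The sketch below does, with pandas.DataFrame standing in for DataContainer (an assumption), and also checks the usual identity between floordiv and mod for a positive divisor. The divisor 7 is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Method form and operator form give the same element-wise remainder.
remainder = df.mod(7)
operator_remainder = df % 7

# Quotient times divisor plus remainder reconstructs the original values.
reconstructed = df.floordiv(7).mul(7).add(remainder)

print(remainder)
print(reconstructed.equals(df))
```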
- mul(**kwargs)¶
Get multiplication of DataContainer and other, element-wise (binary operator mul).
Equivalent to DataContainer * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
- pivot(**kwargs)¶
Return reshaped DataContainer organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataContainer. This function does not support data aggregation; multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.
- Parameters:
columns (str or object or a list of str) – Column to use to make new DataContainer’s columns.
index (str or object or a list of str, optional) – Column to use to make new DataContainer’s index. If not given, uses existing index.
values (str, object or a list of the previous, optional) – Column(s) to use for populating new DataContainer’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
- Returns:
Returns reshaped DataContainer.
- Return type:
- Raises:
ValueError – When there are any index, columns combinations with multiple values. Use DataContainer.pivot_table when you need to aggregate.
See also
DataContainer.pivot_table
Generalization of pivot that can handle duplicate values for one index/column pair.
DataContainer.unstack
Pivot based on the index values instead of a column.
wide_to_long
Wide panel to long format. Less flexible but more user-friendly than melt.
Notes
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods.
Reference the user guide for more examples.
Examples
>>> df = DataContainer({'foo': ['one', 'one', 'one', 'two', 'two', ... 'two'], ... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], ... 'baz': [1, 2, 3, 4, 5, 6], ... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) >>> df foo bar baz zoo 0 one A 1 x 1 one B 2 y 2 one C 3 z 3 two A 4 q 4 two B 5 w 5 two C 6 t
>>> df.pivot(index='foo', columns='bar', values='baz') bar A B C foo one 1 2 3 two 4 5 6
>>> df.pivot(index='foo', columns='bar')['baz'] bar A B C foo one 1 2 3 two 4 5 6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']) baz zoo bar A B C A B C foo one 1 2 3 x y z two 4 5 6 q w t
You could also assign a list of column names or a list of index names.
>>> df = DataContainer({ ... "lev1": [1, 1, 1, 2, 2, 2], ... "lev2": [1, 1, 2, 1, 1, 2], ... "lev3": [1, 2, 1, 2, 1, 2], ... "lev4": [1, 2, 3, 4, 5, 6], ... "values": [0, 1, 2, 3, 4, 5]}) >>> df lev1 lev2 lev3 lev4 values 0 1 1 1 1 0 1 1 1 2 2 1 2 1 2 1 3 2 3 2 1 2 4 3 4 2 1 1 5 4 5 2 2 2 6 5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values") lev2 1 2 lev3 1 2 1 2 lev1 1 0.0 1.0 2.0 NaN 2 4.0 3.0 NaN 5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values") lev3 1 2 lev1 lev2 1 1 0.0 1.0 2 2.0 NaN 2 1 4.0 3.0 2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df = DataContainer({"foo": ['one', 'one', 'two', 'two'],
...                     "bar": ['A', 'A', 'B', 'C'],
...                     "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4
Notice that the first two rows are the same for our index and columns arguments.
>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape
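When duplicates like these are expected, pivot_table (referenced under Raises and See also) can aggregate them instead of failing. The following sketch uses pandas.DataFrame as a stand-in for DataContainer (an assumption). The choice of aggfunc='sum' is illustrative, and any other aggregation could be used.

```python
import pandas as pd

df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "baz": [1, 2, 3, 4]})

# pivot refuses to reshape because ('one', 'A') occurs twice.
pivot_failed = False
try:
    df.pivot(index='foo', columns='bar', values='baz')
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicate pair by aggregating its values.
table = df.pivot_table(index='foo', columns='bar', values='baz', aggfunc='sum')

print(pivot_failed)
print(table)
```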
- plot(*args, **kwargs) ndarray | Axes ¶
Make plots of Series or DataFrame.
Uses the backend specified by the option plotting.backend. By default, matplotlib is used.
- Parameters:
data (Series or DataFrame) – The object for which the method is called.
x (label or position, default None) – Only used if data is a DataFrame.
y (label, position or list of label, positions, default None) – Allows plotting of one column versus another. Only used if data is a DataFrame.
kind (str) –
The kind of plot to produce:
‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot (DataFrame only)
‘hexbin’ : hexbin plot (DataFrame only)
ax (matplotlib axes object, default None) – An axes of the current figure.
subplots (bool or sequence of iterables, default False) –
Whether to group columns into subplots:
False: No subplots will be used.
True: Make separate subplots for each column.
sequence of iterables of column labels: Create a subplot for each group of columns. For example [(‘a’, ‘c’), (‘b’, ‘d’)] will create 2 subplots: one with columns ‘a’ and ‘c’, and one with columns ‘b’ and ‘d’. Remaining columns that aren’t specified will be plotted in additional subplots (one per column).
Added in version 1.5.0.
sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None, otherwise False if an ax is passed in. Be aware that passing in both an ax and sharex=True will alter all x axis labels for all axes in a figure.
sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to invisible.
layout (tuple, optional) – (rows, columns) for the layout of subplots.
figsize (a tuple (width, height) in inches) – Size of a figure object.
use_index (bool, default True) – Use index as ticks for x axis.
title (str or list) – Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.
grid (bool, default None (matlab style default)) – Axis grid lines.
legend (bool or {'reverse'}) – Place legend on axis subplots.
style (list or dict) – The matplotlib line style per column.
logx (bool or 'sym', default False) – Use log scaling or symlog scaling on x axis.
logy (bool or 'sym', default False) – Use log scaling or symlog scaling on y axis.
loglog (bool or 'sym', default False) – Use log scaling or symlog scaling on both x and y axes.
xticks (sequence) – Values to use for the xticks.
yticks (sequence) – Values to use for the yticks.
xlim (2-tuple/list) – Set the x limits of the current axes.
ylim (2-tuple/list) – Set the y limits of the current axes.
xlabel (label, optional) –
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the x-column name for planar plots.
Changed in version 2.0.0: Now applicable to histograms.
ylabel (label, optional) –
Name to use for the ylabel on y-axis. Default will show no ylabel, or the y-column name for planar plots.
Changed in version 2.0.0: Now applicable to histograms.
rot (float, default None) – Rotation for ticks (xticks for vertical, yticks for horizontal plots).
fontsize (float, default None) – Font size for xticks and yticks.
colormap (str or matplotlib colormap object, default None) – Colormap to select colors from. If string, load colormap with that name from matplotlib.
colorbar (bool, optional) – If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’ plots).
position (float) – Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center).
table (bool, Series or DataFrame, default False) – If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
yerr (DataFrame, Series, array-like, dict and str) – See Plotting with Error Bars for detail.
xerr (DataFrame, Series, array-like, dict and str) – Equivalent to yerr.
stacked (bool, default False in line and bar plots, and True in area plot) – If True, create stacked plot.
secondary_y (bool or sequence, default False) – Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis.
mark_right (bool, default True) – When using a secondary_y axis, automatically mark the column labels with “(right)” in the legend.
include_bool (bool, default is False) – If True, boolean values can be plotted.
backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
**kwargs – Options to pass to matplotlib plotting method.
- Returns:
If the backend is not the default matplotlib one, the return value will be the object returned by the backend.
- Return type:
matplotlib.axes.Axes or numpy.ndarray of them
Notes
See the matplotlib documentation online for more on this subject.
If kind = ‘bar’ or ‘barh’, you can specify relative alignments for bar plot layout with the position keyword, from 0 (left/bottom-end) to 1 (right/top-end). The default is 0.5 (center).
Examples
For Series:
For DataFrame:
For SeriesGroupBy:
For DataFrameGroupBy:
- pow(**kwargs)¶
Get Exponential power of DataContainer and other, element-wise (binary operator pow).
Equivalent to DataContainer ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
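The shared examples above do not exercise pow itself. The following is a minimal sketch of pow and its fill_value parameter. It assumes a plain pandas.DataFrame as a stand-in for DataContainer (which is documented as inheriting from it); the variable names are illustrative only.

```python
import pandas as pd

# Assumption: pandas.DataFrame stands in for DataContainer here.
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Method form and operator form of element-wise exponentiation.
squared_method = df.pow(2)
squared_operator = df ** 2

# 'exponents' has no 'degrees' column, so alignment leaves that side missing.
# fill_value=1 substitutes an exponent of 1 there, leaving 'degrees' unchanged.
exponents = pd.DataFrame({'angles': [2, 2, 2]},
                         index=['circle', 'triangle', 'rectangle'])
filled = df.pow(exponents, fill_value=1)
```

Without fill_value, the unmatched 'degrees' column would come back as NaN, as in the `df * other` example.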
- radd(**kwargs)¶
Get Addition of DataContainer and other, element-wise (binary operator radd).
Equivalent to other + DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
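For radd specifically, the following sketch compares the method with the reflected operator and shows fill_value replacing an existing NaN before the addition. It assumes a plain pandas.DataFrame as a stand-in for DataContainer; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: pandas.DataFrame stands in for DataContainer here.
df = pd.DataFrame({'a': [1.0, np.nan], 'b': [3.0, 4.0]})

# radd(10) computes 10 + df; the NaN in column 'a' propagates.
via_method = df.radd(10)
via_operator = 10 + df

# With fill_value=0 the existing NaN is treated as 0 before adding.
filled = df.radd(10, fill_value=0)
```

Addition is commutative, so radd and add agree for numeric data; the reflected form matters more for subtraction, division, and powers.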
- rdiv(**kwargs)¶
Get Floating division of DataContainer and other, element-wise (binary operator rtruediv).
Equivalent to other / DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- reindex_like(**kwargs)¶
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
- Parameters:
other (Object of the same data type) – Its row and column indices are used to define the new indices of this object.
method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –
Method to use for filling holes in reindexed DataContainer. Please note: this is only applicable to DataContainers/DataContainerSeries with a monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use next valid observation to fill gap
nearest: use nearest valid observations to fill gap.
copy (bool, default True) –
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write: pd.options.mode.copy_on_write = True
limit (int, default None) – Maximum number of consecutive labels to fill for inexact matches.
tolerance (optional) –
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, DataContainerSeries, and must be the same size as the index and its dtype must exactly match the index’s type.
- Returns:
Same type as caller, but with changed indices on each axis.
- Return type:
See also
DataContainer.set_index
Set row labels.
DataContainer.reset_index
Remove row labels or move them to new columns.
DataContainer.reindex
Change to new indices or expand indices.
Notes
Same as calling .reindex(index=other.index, columns=other.columns,...).
Examples
>>> df1 = DataContainer([[24.3, 75.7, 'high'],
...                      [31, 87.8, 'high'],
...                      [22, 71.6, 'medium'],
...                      [35, 95, 'medium']],
...                     columns=['temp_celsius', 'temp_fahrenheit',
...                              'windspeed'],
...                     index=pd.date_range(start='2014-02-12',
...                                         end='2014-02-15', freq='D'))

>>> df1
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium

>>> df2 = DataContainer([[28, 'low'],
...                      [30, 'low'],
...                      [35.1, 'medium']],
...                     columns=['temp_celsius', 'windspeed'],
...                     index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
...                                             '2014-02-15']))

>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium

>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
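The Notes state that reindex_like is the same as calling reindex with the other object's index and columns. The sketch below checks that equivalence on a reduced, numeric-only version of the example data. It assumes a plain pandas.DataFrame as a stand-in for DataContainer; the names template and sparse are illustrative.

```python
import pandas as pd

# Assumption: pandas.DataFrame stands in for DataContainer here.
template = pd.DataFrame(
    {'temp_celsius': [24.3, 31.0, 22.0, 35.0],
     'temp_fahrenheit': [75.7, 87.8, 71.6, 95.0]},
    index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

sparse = pd.DataFrame(
    {'temp_celsius': [28.0, 30.0, 35.1]},
    index=pd.DatetimeIndex(['2014-02-12', '2014-02-13', '2014-02-15']))

# Conform 'sparse' to the row and column labels of 'template'.
via_like = sparse.reindex_like(template)

# The explicit spelling described in the Notes.
via_reindex = sparse.reindex(index=template.index, columns=template.columns)

# Labels absent from 'sparse' are filled with NaN.
missing_day = via_like.loc[pd.Timestamp('2014-02-14'), 'temp_celsius']
```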
- rfloordiv(**kwargs)¶
Get Integer division of DataContainer and other, element-wise (binary operator rfloordiv).
Equivalent to other // DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
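Since the shared examples never call rfloordiv, here is a small sketch contrasting it with floordiv. It assumes a plain pandas.DataFrame as a stand-in for DataContainer, and it uses non-zero values (a change from the example data) so that no division by zero occurs.

```python
import pandas as pd

# Assumption: pandas.DataFrame stands in for DataContainer here.
df = pd.DataFrame({'angles': [1, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reflected form: the scalar is the dividend, the frame is the divisor.
reflected = df.rfloordiv(1000)   # same as 1000 // df
reflected_operator = 1000 // df

# Forward form for contrast: the frame is the dividend.
forward = df.floordiv(1000)      # same as df // 1000
```

Here reflected holds 1000, 333, 250 for angles and 2, 5, 2 for degrees, while forward is zero everywhere because every value is below 1000.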
- rmod(**kwargs)¶
Get Modulo of DataContainer and other, element-wise (binary operator rmod).
Equivalent to other % DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
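As a sketch of how rmod differs from mod, the snippet below takes the remainder of a scalar divided by each element. It assumes a plain pandas.DataFrame as a stand-in for DataContainer and uses non-zero values (a change from the example data) to stay clear of modulo by zero.

```python
import pandas as pd

# Assumption: pandas.DataFrame stands in for DataContainer here.
df = pd.DataFrame({'angles': [1, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reflected form: remainder of 1000 divided by each element.
reflected = df.rmod(1000)        # same as 1000 % df
reflected_operator = 1000 % df

# Forward form for contrast: remainder of each element divided by 1000.
forward = df.mod(1000)           # same as df % 1000
```

Because every value in df is smaller than 1000, the forward result equals df itself, while the reflected result is 0, 1, 0 for angles and 280, 100, 280 for degrees.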
- rmul(**kwargs)¶
Get Multiplication of DataContainer and other, element-wise (binary operator rmul).
Equivalent to other * DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant with the reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis with the operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080

Multiply a DataContainer of different shape with the operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- rpow(**kwargs)¶
Get Exponential power of DataContainer and other, element-wise (binary operator rpow).
Equivalent to other ** DataContainer, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with operator version which return the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']), ... axis='index') angles degrees circle -1 359 triangle 2 179 rectangle 3 359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
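The shared examples above do not exercise the reverse power operation itself. Since the reverse version named in this entry is pow, the entry presumably documents rpow. The following is a minimal sketch under that assumption, using a plain pandas DataFrame as a stand-in for DataContainer (which inherits from it); the variable names are illustrative only.

```python
import pandas as pd

# Stand-in assumption: a plain DataFrame behaves like DataContainer here.
df = pd.DataFrame({'angles': [0, 3, 4]},
                  index=['circle', 'triangle', 'rectangle'])

# rpow puts the frame in the exponent, i.e. 2 ** df
reverse = df.rpow(2)

# pow puts the frame in the base, i.e. df ** 2
forward = df.pow(2)

print(reverse['angles'].tolist())
print(forward['angles'].tolist())
```

The two calls differ only in which operand is the base, which is the whole point of the reverse variants.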
- rsub(**kwargs)¶
Get Subtraction of DataContainer and other, element-wise (binary operator rsub).
Equivalent to
other - DataContainer
, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is sub. This method is one of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']), ... axis='index') angles degrees circle -1 359 triangle 2 179 rectangle 3 359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
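None of the shared examples calls rsub directly. As a minimal sketch (again with a plain pandas DataFrame standing in for DataContainer, and illustrative variable names), the following compares other - frame with rsub and shows what fill_value changes when the frame contains a missing value.

```python
import numpy as np
import pandas as pd

# Stand-in assumption: a plain DataFrame behaves like DataContainer here.
df = pd.DataFrame({'angles': [0, 3, np.nan]},
                  index=['circle', 'triangle', 'rectangle'])

# Operator form and method form: the missing value stays missing.
via_operator = 10 - df
via_method = df.rsub(10)

# With fill_value, the NaN in df is replaced by 0 before computing 10 - df.
filled = df.rsub(10, fill_value=0)

print(via_method)
print(filled)
```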
- rtruediv(**kwargs)¶
Get Floating division of DataContainer and other, element-wise (binary operator rtruediv).
Equivalent to
other / DataContainer
, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is truediv. This method is one of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']), ... axis='index') angles degrees circle -1 359 triangle 2 179 rectangle 3 359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
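The examples above use rdiv rather than rtruediv. In pandas the two names refer to the same operation, so a small check along the following lines may help; it is a sketch with a plain DataFrame standing in for DataContainer, not a statement about any DataContainer-specific behavior.

```python
import numpy as np
import pandas as pd

# Stand-in assumption: a plain DataFrame behaves like DataContainer here.
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

via_method = df.rtruediv(10)   # 10 / df, element-wise
via_operator = 10 / df         # operator form
via_alias = df.rdiv(10)        # rdiv is the pandas alias of rtruediv

print(via_method)
```

Dividing by the zero in the angles column yields inf rather than raising, which matches the rdiv output shown in the examples.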
- sub(**kwargs)¶
Get Subtraction of DataContainer and other, element-wise (binary operator sub).
Equivalent to
DataContainer - other
, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rsub. This method is one of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = DataContainer({'angles': [0, 3, 4], ... 'degrees': [360, 180, 360]}, ... index=['circle', 'triangle', 'rectangle']) >>> df angles degrees circle 0 360 triangle 3 180 rectangle 4 360
Add a scalar with the operator version, which returns the same results.
>>> df + 1 angles degrees circle 1 361 triangle 4 181 rectangle 5 361
>>> df.add(1) angles degrees circle 1 361 triangle 4 181 rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10) angles degrees circle 0.0 36.0 triangle 0.3 18.0 rectangle 0.4 36.0
>>> df.rdiv(10) angles degrees circle inf 0.027778 triangle 3.333333 0.055556 rectangle 2.500000 0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2] angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub([1, 2], axis='columns') angles degrees circle -1 358 triangle 2 178 rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']), ... axis='index') angles degrees circle -1 359 triangle 2 179 rectangle 3 359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2}) angles degrees circle 0 720 triangle 0 360 rectangle 0 720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index') angles degrees circle 0 0 triangle 6 360 rectangle 12 1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]}, ... index=['circle', 'triangle', 'rectangle']) >>> other angles circle 0 triangle 3 rectangle 4
>>> df * other angles degrees circle 0 NaN triangle 9 NaN rectangle 16 NaN
>>> df.mul(other, fill_value=0) angles degrees circle 0 0.0 triangle 9 0.0 rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6], ... 'degrees': [360, 180, 360, 360, 540, 720]}, ... index=[['A', 'A', 'A', 'B', 'B', 'B'], ... ['circle', 'triangle', 'rectangle', ... 'square', 'pentagon', 'hexagon']]) >>> df_multindex angles degrees A circle 0 360 triangle 3 180 rectangle 4 360 B square 4 360 pentagon 5 540 hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0) angles degrees A circle NaN 1.0 triangle 1.0 1.0 rectangle 1.0 1.0 B square 0.0 0.0 pentagon 0.0 0.0 hexagon 0.0 0.0
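The note that mismatched indices are unioned together, and the role of fill_value in that case, can be illustrated with two small frames whose row labels only partly overlap. This is a sketch using plain pandas DataFrames as stand-ins for DataContainer; the frames a and b are made up for the illustration.

```python
import pandas as pd

# Stand-in assumption: plain DataFrames behave like DataContainers here.
a = pd.DataFrame({'x': [1.0, 2.0]}, index=['r1', 'r2'])
b = pd.DataFrame({'x': [10.0, 20.0]}, index=['r2', 'r3'])

# Operator form: the result covers the union of both indices,
# and rows present in only one operand become NaN.
plain = a - b

# Method form with fill_value: a value missing on one side is treated as 0.
filled = a.sub(b, fill_value=0)

print(plain)
print(filled)
```

Only r2 exists in both frames, so it is the only row with a value in the operator result; with fill_value=0 the rows r1 and r3 are computed as 1 - 0 and 0 - 20.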
- to_csv(path_or_buf: FilepathOrBuffer, header: List[str] | bool = True, mode: str = 'w', **kwargs) None ¶
- to_csv(path_or_buf: None, header: List[str] | bool = True, mode: str = 'w', **kwargs) str
Write object to a comma-separated values (csv) file.
- Parameters:
path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode ({'w', 'x', 'a'}, default 'w') –
Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:
’w’, truncate the file first.
’x’, exclusive creation, failing if the file already exists.
’a’, append to the end of file if it exists.
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
Added in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Passing compression options as keys in dict is supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
lineterminator (str, optional) –
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ on Linux, ‘\r\n’ on Windows).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library ‘csv’ module.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
- Returns:
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type:
None or str
See also
read_csv
Load a CSV file into a DataFrame.
to_excel
Write DataFrame to an Excel file.
Examples
Create ‘out.csv’ containing ‘df’ without indices
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False)
Create ‘out.zip’ containing ‘out.csv’
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
To write a csv file to a new folder or nested folder, you first need to create the folder using either pathlib or os:
>>> from pathlib import Path
>>> filepath = Path('folder/subfolder/out.csv')
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs('folder/subfolder', exist_ok=True)
>>> df.to_csv('folder/subfolder/out.csv')
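Several of the formatting parameters described above (sep, na_rep, float_format, lineterminator) are easiest to understand by inspecting the returned string when path_or_buf is None. The following is a minimal sketch using a plain pandas DataFrame as a stand-in; DataContainer overrides to_csv, so its output may contain additional header information that is not shown here. The lineterminator keyword assumes pandas 1.5 or newer, and the column names are made up.

```python
import numpy as np
import pandas as pd

# Stand-in assumption: plain DataFrame; DataContainer output may differ.
df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
                   'score': [1.5, np.nan]})

# No path given, so the CSV text is returned as a string.
text = df.to_csv(
    index=False,          # omit the row labels
    sep=';',              # field delimiter
    na_rep='NA',          # representation of missing values
    float_format='%.2f',  # format applied to floating point numbers
    lineterminator='\n',  # fixed newline, independent of the OS
)
print(text)
```

Pinning lineterminator keeps the result identical across operating systems, which is convenient when the string is compared in tests.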
- to_hdf(filepath_or_buffer: str | PathLike | IO, key: str, **kwargs) None ¶
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file, please use append mode and a different key.
Warning
One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
For more information see the user guide.
- Parameters:
path_or_buf (str or pandas.HDFStore) – File path or HDFStore object.
key (str) – Identifier for the group in the store.
mode ({'a', 'w', 'r+'}, default 'a') –
Mode to open file:
’w’: write, a new file is created (an existing file with the same name would be deleted).
’a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
’r+’: similar to ‘a’, but the file must already exist.
complevel ({0-9}, default None) – Specifies a compression level for data. A value of 0 or None disables compression.
complib ({'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib') – Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.
append (bool, default False) – For Table formats, append the input data to the existing.
format ({'fixed', 'table', None}, default 'fixed') –
Possible values:
’fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.
’table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.
If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.
index (bool, default True) – Write DataFrame index as a column.
min_itemsize (dict or int, optional) – Map column names to minimum string sizes for columns.
nan_rep (Any, optional) – How to represent null values as str. Not allowed with append=True.
dropna (bool, default False, optional) – Remove missing values.
data_columns (list of columns or True, optional) – List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format=’table’.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
encoding (str, default "UTF-8")
See also
read_hdf
Read from HDF file.
DataFrame.to_orc
Write a DataFrame to the binary orc format.
DataFrame.to_parquet
Write a DataFrame to the binary parquet format.
DataFrame.to_sql
Write to a SQL table.
DataFrame.to_feather
Write out feather-format for DataFrames.
DataFrame.to_csv
Write out to a csv file.
Examples
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')
Reading from HDF file:
>>> pd.read_hdf('data.h5', 'df')
   A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64
- to_json(filepath_or_buffer: FilepathOrBuffer, mode: str = 'w', orient: str | None = None, **kwargs) None ¶
- to_json(filepath_or_buffer: None, mode: str = 'w', orient: str | None = None, **kwargs) str
Convert the object to a JSON string.
Note that NaNs and None will be converted to null, and datetime objects will be converted to UNIX timestamps.
- Parameters:
path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.
orient (str) –
Indication of expected JSON string format.
Series:
default is ‘index’
allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.
DataFrame:
default is ‘columns’
allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}.
The format of the JSON string:
’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
’records’ : list like [{column -> value}, … , {column -> value}]
’index’ : dict like {index -> {column -> value}}
’columns’ : dict like {column -> {index -> value}}
’values’ : just the values array
’table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
Describing the data, where data component is like orient='records'.
date_format ({None, 'epoch', 'iso'}) – Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.
double_precision (int, default 10) – The number of decimal places to use when encoding floating point values. The possible maximal value is 15. Passing double_precision greater than 15 will raise a ValueError.
force_ascii (bool, default True) – Force encoded string to be ASCII.
date_unit (str, default 'ms' (milliseconds)) – The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler (callable, default None) – Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
lines (bool, default False) – If ‘orient’ is ‘records’ write out line-delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list-like.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
index (bool or None, default None) – The index is only used when ‘orient’ is ‘split’, ‘index’, ‘columns’, or ‘table’. Of these, ‘index’ and ‘columns’ do not support index=False.
indent (int, optional) – Length of whitespace used to indent each record.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
mode (str, default 'w' (writing)) – Specify the IO mode for output when supplying a path_or_buf. Accepted args are ‘w’ (writing) and ‘a’ (append) only. mode=’a’ is only supported when lines is True and orient is ‘records’.
- Returns:
If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.
- Return type:
None or str
See also
read_json
Convert a JSON string to pandas object.
Notes
The behavior of indent=0 varies from the stdlib, which does not indent the output but does insert newlines. Currently, indent=0 and the default indent=None are equivalent in pandas, though this may change in a future release.
orient='table' contains a ‘pandas_version’ field under ‘schema’. This stores the version of pandas used in the latest revision of the schema.
Examples
>>> from json import loads, dumps
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}
Encoding/decoding a DataFrame using 'records' formatted JSON. Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]
Encoding/decoding a DataFrame using 'index' formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}
Encoding/decoding a DataFrame using 'columns' formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}
Encoding/decoding a DataFrame using 'values' formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}
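The lines parameter is described above but not demonstrated. As a minimal sketch, the following writes line-delimited JSON with orient='records' and parses it back one line at a time using the standard json module. A plain pandas DataFrame stands in for DataContainer, whose own to_json override may add header information not shown here.

```python
import json

import pandas as pd

# Stand-in assumption: plain DataFrame; DataContainer output may differ.
df = pd.DataFrame([["a", "b"], ["c", "d"]], columns=["col 1", "col 2"])

# One JSON object per line; lines=True requires orient="records".
text = df.to_json(orient="records", lines=True)

# Each non-empty line is a standalone JSON document.
rows = [json.loads(line) for line in text.splitlines() if line]
print(rows)
```

Because every line is self-contained, this layout is the one that supports mode='a' for appending, as noted in the mode parameter description.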
- transform(**kwargs)¶
Call func on self producing a DataContainer with the same axis shape as self.
- Parameters:
func (function, str, list-like or dict-like) –
Function to use for transforming the data. If a function, must either work when passed a DataContainer or when passed to DataContainer.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
function
string function name
list-like of functions and/or function names, e.g. [np.exp, 'sqrt']
dict-like of axis labels -> functions, function names or list-like of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
- Returns:
A DataContainer that must have the same length as self.
- Return type:
- Raises:
ValueError – If the returned DataContainer has a different length than self.
See also
DataContainer.agg
Only perform aggregating type operations.
DataContainer.apply
Invoke function on a DataContainer.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
Examples
>>> df = DataContainer({'A': range(3), 'B': range(1, 4)})
>>> df
   A  B
0  0  1
1  1  2
2  2  3
>>> df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4
Even though the resulting DataContainer must have the same length as the input DataContainer, it is possible to provide several input functions:
>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056
You can call transform on a GroupBy object:
>>> df = DataContainer({
...     "Date": [
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
...     "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120
>>> df.groupby('Date')['Data'].transform('sum')
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
>>> df = DataContainer({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
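The dict-like form of func and the ValueError case are described above but not shown. The following is a minimal sketch with a plain pandas DataFrame standing in for DataContainer: it maps a different function to each column, then shows that an aggregating function such as 'sum' is rejected because its result does not keep the original index.

```python
import numpy as np
import pandas as pd

# Stand-in assumption: a plain DataFrame behaves like DataContainer here.
df = pd.DataFrame({'A': [0, 1, 4], 'B': [1, 2, 3]})

# Dict-like func: column label -> function applied to that column.
out = df.transform({'A': np.sqrt, 'B': lambda x: x * 10})
print(out)

# An aggregation collapses the rows, so transform refuses it.
try:
    df.transform('sum')
    raised = False
except ValueError:
    raised = True
print(raised)
```

For reductions like this, agg (listed under See also) is the intended method.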
- truediv(**kwargs)¶
Get Floating division of DataContainer and other, element-wise (binary operator truediv).
Equivalent to
DataContainer / other
, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rtruediv. This method is one of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters:
other (scalar, sequence, DataContainerSeries, dict or DataContainer) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For DataContainerSeries input, axis to match DataContainerSeries index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataContainer alignment, with this value before computation. If data in both corresponding DataContainer locations is missing the result will be missing.
- Returns:
Result of the arithmetic operation.
- Return type:
DataContainer
See also
DataContainer.add
Add DataContainers.
DataContainer.sub
Subtract DataContainers.
DataContainer.mul
Multiply DataContainers.
DataContainer.div
Divide DataContainers (float division).
DataContainer.truediv
Divide DataContainers (float division).
DataContainer.floordiv
Divide DataContainers (integer division).
DataContainer.mod
Calculate modulo (remainder after division).
DataContainer.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
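The interaction between index union and fill_value can be seen in a small case. This is a sketch only: it uses pandas.DataFrame in place of DataContainer (assuming the inherited truediv is unchanged), and the frames left and right are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-ins for two DataContainers with partially overlapping indices.
left = pd.DataFrame({"x": [1.0, np.nan, 4.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [np.nan, 8.0, 2.0]}, index=["b", "c", "d"])

# Without fill_value, any label missing on either side yields NaN.
plain = left.truediv(right)

# With fill_value, a value missing on one side only is replaced before dividing;
# label "b" is missing on both sides, so it stays NaN.
filled = left.truediv(right, fill_value=1.0)
```

The result index is the union a, b, c, d. Only c has data on both sides, so plain is NaN everywhere else, while filled is NaN only at b.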
Examples
>>> df = DataContainer({'angles': [0, 3, 4],
...                     'degrees': [360, 180, 360]},
...                    index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and DataContainerSeries by axis with operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.
>>> df.mul({'angles': 0, 'degrees': 2})
           angles  degrees
circle          0      720
triangle        0      360
rectangle       0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
           angles  degrees
circle          0        0
triangle        6      360
rectangle      12     1080
Multiply a DataContainer of different shape with operator version.
>>> other = DataContainer({'angles': [0, 3, 4]},
...                       index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.
>>> df_multindex = DataContainer({'angles': [0, 3, 4, 4, 5, 6],
...                               'degrees': [360, 180, 360, 360, 540, 720]},
...                              index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                     ['circle', 'triangle', 'rectangle',
...                                      'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
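The examples above are shared across the arithmetic wrappers and never call truediv itself. As a minimal sketch of the method documented in this entry, the following compares truediv and rtruediv with the / operator. It uses pandas.DataFrame as a stand-in, assuming DataContainer inherits both methods without changes.

```python
import pandas as pd

# Stand-in for the DataContainer used in the examples above.
df = pd.DataFrame({"angles": [0, 3, 4], "degrees": [360, 180, 360]},
                  index=["circle", "triangle", "rectangle"])

forward = df.truediv(10)   # element-wise df / 10
reverse = df.rtruediv(10)  # element-wise 10 / df (operands swapped)

same_forward = forward.equals(df / 10)
same_reverse = reverse.equals(10 / df)
```

Both comparisons should hold, and the zero in the angles column turns into inf under rtruediv, matching the rdiv output shown earlier.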