Bar of proportion of two variables - pandas

I have a pandas DataFrame as shown below:
import numpy as np
import pandas as pd
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
'baseline': [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
'endline': [1, 0, np.nan, 1, 0, 0, 1, np.nan, 1, 0, 0, 1, 0, 0, 1, 0, np.nan, np.nan, 1, 0, 1, np.nan, 0, 1, 0, 1, 0, np.nan, 1, 0, np.nan, 0, 0, 0, np.nan, 1, np.nan, 1, np.nan, 0, np.nan, 1, 1, 0, 1, 1, 1, 0, 1, 1],
'gender': ['male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female']
}
df = pd.DataFrame(data)
df.head(n = 5)
The challenge is that the endline column may have some missing values. My goal is to have 2 bars for each variable, side by side, as shown below.
Thanks in advance!

Seaborn prefers its data in "long form". Pandas' melt can convert the given dataframe to combine the 'baseline' and 'endline' columns.
By default, sns.barplot shows the mean when multiple y-values belong to the same x-value. You can supply a different estimator, e.g. one that sums the values, divides by the number of values, and multiplies by 100 to get a percentage.
Here is some code to get you started:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {
'id': range(1, 51),
'baseline': [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
'endline': [1, 0, np.nan, 1, 0, 0, 1, np.nan, 1, 0, 0, 1, 0, 0, 1, 0, np.nan, np.nan, 1, 0, 1, np.nan, 0, 1, 0, 1, 0, np.nan, 1, 0, np.nan, 0, 0, 0, np.nan, 1, np.nan, 1, np.nan, 0, np.nan, 1, 1, 0, 1, 1, 1, 0, 1, 1]
}
df = pd.DataFrame(data)
sns.set_style('white')
ax = sns.barplot(data=df.melt(value_vars=['baseline', 'endline']),
                 x='variable', y='value',
                 estimator=lambda x: np.sum(x) / np.size(x) * 100, ci=None,
                 color='cornflowerblue')
ax.bar_label(ax.containers[0], fmt='%.1f %%', fontsize=20)
sns.despine(ax=ax, left=True)
ax.grid(True, axis='y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
ax.set_xlabel('')
ax.set_ylabel('')
plt.tight_layout()
plt.show()
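As a quick cross-check of the bar heights, the same percentages can be computed with pandas alone. This is only a sketch on a small made-up frame (the name toy and its values are assumptions, not the data above). Note that pandas' mean skips NaN by default, whereas np.sum(x) / np.size(x) would return NaN if missing values reached it, so the estimator above relies on seaborn discarding missing values first, which as far as I know it does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing endline response.
toy = pd.DataFrame({
    "baseline": [1, 0, 1, 1],
    "endline": [1, np.nan, 0, 1],
})

# Same reshaping as in the plot: one row per (variable, value) pair.
long_form = toy.melt(value_vars=["baseline", "endline"])

# Share of 1s per variable, in percent; NaN rows are skipped by mean().
pct = long_form.groupby("variable")["value"].mean() * 100
print(pct)
```

Here the endline percentage is computed over the three non-missing responses only, which is the same denominator the bar plot uses.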

Related

CountVectorizer not working in ColumnTransformer

Combining CountVectorizer() with ColumnTransformer() gives me an error. Here is a reproduced case:
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
# Create a sample data frame
df = pd.DataFrame({
    'corpus': ['This is the first document.', 'This document is the second document.', 'And this is the third one.',
               'Is this the first document?', 'I have the fourth document'],
    'word_length': [27, 37, 26, 27, 26]
})
text_feature = ["corpus"]
count_transformer = CountVectorizer()
# Create the ColumnTransformer
ct = ColumnTransformer(transformers=[
        ("count", count_transformer, text_feature)],
    remainder='passthrough')
ct.fit_transform(df)
The output says:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 5
I tried the code below, which does the job but doesn't scale as easily as ColumnTransformer():
np.c_[count_transformer.fit_transform(df["corpus"]).toarray(), df["word_length"].values]
The result is the numpy array below:
array([[ 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 27],
       [ 0, 2, 0, 0, 0, 1, 0, 1, 1, 0, 1, 37],
       [ 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 26],
       [ 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 27],
       [ 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 26]], dtype=int64)
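One likely cause, offered as a sketch rather than a confirmed diagnosis: CountVectorizer expects a 1-D sequence of strings, but a list selector such as ["corpus"] makes ColumnTransformer pass a 2-D frame, so the vectorizer sees a single "document" (hence size 1 versus size 5 in the error). Passing the column name as a plain string hands the transformer a 1-D column instead. The variable names below mirror the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer

df = pd.DataFrame({
    "corpus": ["This is the first document.",
               "This document is the second document.",
               "And this is the third one.",
               "Is this the first document?",
               "I have the fourth document"],
    "word_length": [27, 37, 26, 27, 26],
})

ct = ColumnTransformer(
    # A scalar string (not ["corpus"]) selects a 1-D column for the vectorizer.
    transformers=[("count", CountVectorizer(), "corpus")],
    remainder="passthrough",
)

# The result may be a sparse matrix or a dense array depending on its density.
result = ct.fit_transform(df)
print(result.shape)
```

With several text columns, one (name, CountVectorizer(), "column") entry per column keeps the same pattern.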

prestashop - Can't load Order status at line 242 in file classes/PaymentModule.php

When I place an order with an obligation to pay, I get the following error:
[PrestaShopException]
Can't load Order status
at line 242 in file classes/PaymentModule.php
237. }
238.
239. $order_status = new OrderState((int) $id_order_state, (int) $this->context->language->id);
240. if (!Validate::isLoadedObject($order_status)) {
241. PrestaShopLogger::addLog('PaymentModule::validateOrder - Order Status cannot be loaded', 3, null, 'Cart', (int) $id_cart, true);
242. throw new PrestaShopException('Can\'t load Order status');
243. }
244.
245. if (!$this->active) {
246. PrestaShopLogger::addLog('PaymentModule::validateOrder - Module is not active', 3, null, 'Cart', (int) $id_cart, true);
247. die(Tools::displayError());
PaymentModuleCore->validateOrder - [line 58 - modules/ps_wirepayment/controllers/front/validation.php] - [9 Arguments]
Ps_WirepaymentValidationModuleFrontController->postProcess - [line 270 - classes/controller/Controller.php]
ControllerCore->run - [line 509 - classes/Dispatcher.php]
DispatcherCore->dispatch - [line 24 - override/classes/Dispatcher.php]
Dispatcher->dispatch - [line 28 - index.php]
Insert the following into the ps_order_state table with phpMyAdmin:
INSERT INTO `ps_order_state` (`id_order_state`, `invoice`, `send_email`, `module_name`, `color`, `unremovable`, `hidden`, `logable`, `delivery`, `shipped`, `paid`, `deleted`) VALUES
(1, 0, 1, 'cheque', 'RoyalBlue', 1, 0, 0, 0, 0, 0, 0),
(2, 1, 1, '', 'LimeGreen', 1, 0, 1, 0, 0, 1, 0),
(3, 1, 1, '', 'DarkOrange', 1, 0, 1, 0, 0, 1, 0),
(4, 1, 1, '', 'BlueViolet', 1, 0, 1, 1, 1, 1, 0),
(5, 1, 0, '', '#108510', 1, 0, 1, 1, 1, 1, 0),
(6, 0, 1, '', 'Crimson', 1, 0, 0, 0, 0, 0, 0),
(7, 1, 1, '', '#ec2e15', 1, 0, 0, 0, 0, 0, 0),
(8, 0, 1, '', '#8f0621', 1, 0, 0, 0, 0, 0, 0),
(9, 1, 1, '', 'HotPink', 1, 0, 0, 0, 0, 1, 0),
(10, 0, 1, 'bankwire', 'RoyalBlue', 1, 0, 0, 0, 0, 0, 0),
(11, 0, 0, '', 'RoyalBlue', 1, 0, 0, 0, 0, 0, 0),
(12, 1, 1, '', 'LimeGreen', 1, 0, 1, 0, 0, 1, 0),
(13, 1, 0, '', '#DDEEFF', 0, 0, 1, 0, 0, 0, 0);
Your ps_wirepayment module is trying to set an order status to an invalid or deleted id_order_state.
If you haven't modified it, the bankwire module relies on the "PS_OS_BANKWIRE" value in ps_configuration,
so make sure that ID is valid and links to an existing order state in your database.

Selecting the value at a given date for each lat/lon point in xarray

I have an xr.DataArray object that holds a day of 2015 (as a cftime.DatetimeNoLeap object) for each lat-lon point on the grid.
date_matrix2015
<xarray.DataArray (lat: 160, lon: 320)>
array([[cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0)],
[cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0)],
[cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 12, 11, 12, 0, 0, 0)],
...,
[cftime.DatetimeNoLeap(2015, 3, 14, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 3, 14, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 3, 14, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0)],
[cftime.DatetimeNoLeap(2015, 9, 15, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 15, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 15, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 15, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 15, 12, 0, 0, 0)],
[cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0),
cftime.DatetimeNoLeap(2015, 9, 16, 12, 0, 0, 0)]], dtype=object)
Coordinates:
year int64 2015
* lat (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
* lon (lon) float64 0.0 1.125 2.25 3.375 4.5 ... 355.5 356.6 357.8 358.9
I have another xr.DataArray on the same lat-lon grid for vertical velocity (omega) that has data for every day in 2015. At each lat-lon point I would like to select the velocity value on the corresponding day given in date_matrix2015. Ideally I would like to do something like this:
omega.sel(time=date_matrix2015)
I have tried constructing the new dataarray manually with iteration, but I haven't had much luck.
Does anyone have any ideas? Thank you in advance!
------------EDIT---------------
Here is a minimal reproducible example for the problem. To clarify what I am looking for: I have two DataArrays, one for daily precipitation values, and one for daily omega values. I want to determine for each lat/lon point the day that saw the maximum precipitation (I think I have done this part correctly). From there I want to select at each lat/lon point the omega value that occurred on the day of maximum precipitation. So ultimately I would like to end up with a DataArray of omega values that has two dimensions, lat and lon, where the value at each lat/lon point is the omega value on the day of maximum rainfall at that location.
import numpy as np
import xarray as xr
import pandas as pd
precip = np.abs(8*np.random.randn(10,10,10))
omega = 15*np.random.randn(10,10,10)
lat = np.arange(0,10)
lon = np.arange(0, 10)
##Note: actual data resolution is 160x320
dates = pd.date_range('01-01-2015', '01-10-2015')
precip_da = xr.DataArray(precip).rename({'dim_0':'time', 'dim_1':'lat', 'dim_2':'lon'}).assign_coords({'time':dates, 'lat':lat, 'lon':lon})
omega_da = xr.DataArray(omega).rename({'dim_0':'time', 'dim_1':'lat', 'dim_2':'lon'}).assign_coords({'time':dates, 'lat':lat, 'lon':lon})
#Find Date of maximum precip for each lat lon point and store in an array
maxDateMatrix = precip_da.idxmax(dim='time')
#For each lat lon point, select the value from omega_da on the day of maximum precip (i.e. the date given at that location in the maxDateMatrix)
You can pair da.sel with da.idxmax to select the values at the index of the maxima along any number of dimensions:
In [10]: omega_da.sel(time=precip_da.idxmax(dim='time'))
Out[10]:
<xarray.DataArray (lat: 10, lon: 10)>
array([[ 17.72211193, -16.20781517, 9.65493368, -28.16691093,
18.8756182 , 16.81924325, -20.55251804, -18.36625778,
-19.57938236, -10.77385357],
[ 3.95402784, -5.28478105, -8.6632994 , 2.46787932,
20.53981254, -4.74908659, 9.5274101 , -1.08191372,
9.4637305 , -10.91884369],
[-31.30033085, 6.6284144 , 8.15945444, 5.74849304,
12.49505739, 2.11797825, -18.12861347, 7.27497695,
5.16197504, -32.99882591],
...
[-34.73945635, 24.40515233, 14.56982584, 12.16550083,
-8.3558104 , -20.16328749, -33.89051472, -0.09599935,
2.65689584, 29.54056082],
[-18.8660847 , -7.58120994, 15.57632568, 4.19142695,
8.71046261, 9.05684805, 8.48128361, 0.34166869,
8.41090015, -2.31386572],
[ -4.38999926, 17.00411671, 16.66619606, 24.99390669,
-14.01424591, 19.85606151, -16.87897 , 12.84205521,
-16.78824975, -6.33920671]])
Coordinates:
time (lat, lon) datetime64[ns] 2015-01-01 2015-01-01 ... 2015-01-10
* lat (lat) int64 0 1 2 3 4 5 6 7 8 9
* lon (lon) int64 0 1 2 3 4 5 6 7 8 9
See the great section of the xarray docs on Indexing and Selecting Data for more info, especially the section on Advanced Indexing, which goes into using DataArrays as indexers for powerful reshaping operations.
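To see what this selection does on data small enough to check by hand, here is a minimal sketch with made-up values (the 3x2x2 arrays are assumptions chosen for readability, not the real grid). It uses the same idxmax plus sel pairing and shows that the result keeps only the lat and lon dimensions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny deterministic grid: 3 days, 2 lats, 2 lons (hypothetical values).
dates = pd.date_range("2015-01-01", periods=3)
coords = {"time": dates, "lat": [0, 1], "lon": [0, 1]}

precip_da = xr.DataArray(
    np.array([[[1.0, 9.0], [2.0, 0.0]],
              [[5.0, 3.0], [8.0, 1.0]],
              [[4.0, 2.0], [6.0, 7.0]]]),
    dims=("time", "lat", "lon"), coords=coords)

# omega[t, i, j] = 4*t + 2*i + j, so each selected value reveals its day.
omega_da = xr.DataArray(
    np.arange(12.0).reshape(3, 2, 2),
    dims=("time", "lat", "lon"), coords=coords)

# Date of maximum precipitation at each (lat, lon) point.
max_dates = precip_da.idxmax(dim="time")

# Vectorized indexing: one time label per grid point.
omega_at_max = omega_da.sel(time=max_dates)
print(omega_at_max.values)
```

Because the indexer itself has dimensions (lat, lon), the time dimension is replaced by those dimensions rather than broadcast against them.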

Vectorize this for loop in numpy

I am trying to compute matrix z (defined below) in python with numpy.
Here's my current solution (using 1 for loop)
def E(x):
    # pi, lamb, n and k are defined elsewhere
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z
Is it possible to completely vectorize this computation? I need to do this computation for thousands of iterations, and matrix operations in numpy are much faster than huge for loops.
Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:
def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z
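To make the broadcasting in alt_E concrete, here is a minimal sketch on small made-up inputs (the values of x, lamb and pi below are assumptions for illustration). It uses math.factorial instead of scipy.special.factorial to stay dependency-free, and compares the broadcast result with the row-by-row loop.

```python
import math
import numpy as np

# Hypothetical small inputs: n = 4 observations, k = 2 components.
x = np.array([0, 1, 2, 3])
lamb = np.array([0.8, 1.2])
pi = np.array([0.6, 0.4])

# Turn x into a column so (4, 1) broadcasts against (2,) to give (4, 2).
x_col = x[:, None]
fact = np.array([math.factorial(int(v)) for v in x])[:, None]

z = pi * np.exp(-lamb) * lamb ** x_col / fact   # shape (4, 2)
z /= z.sum(axis=1, keepdims=True)               # normalize each row

# Reference: the original loop, one row at a time.
z_loop = np.zeros((len(x), len(lamb)))
for i in range(len(x)):
    v = pi * np.exp(-lamb) * lamb ** x[i] / math.factorial(int(x[i]))
    z_loop[i, :] = v / v.sum()

print(np.allclose(z, z_loop))
```

The key step is x[:, None]: every per-row scalar in the loop becomes a column, so NumPy fills in the k components along the second axis.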
I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size,
alt_E is about 120x faster than E:
In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop
In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop
In [34]: 11500/94.7
Out[34]: 121.43611404435057
This is the setup I used for the benchmark:
import math
import numpy as np
import scipy.special as special
def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z
def E(x):
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * \
            np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z
n = 576
k = 2
x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])
lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])
assert np.allclose(alt_E(x), E(x))
By the way, E could also be calculated using scipy.stats.poisson:
import scipy.stats as stats
pois = stats.poisson(mu=lamb)
def alt_E2(x):
    z = pi * pois.pmf(x[:,None])
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z
but this does not turn out to be faster, at least for arrays of this length:
In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop
In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop
For larger x, alt_E2 is faster:
In [104]: x = np.random.random(10000)
In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop
In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop

Convert string to integer pandas dataframe index

I have a pandas dataframe with a MultiIndex. Unfortunately, one of the index levels stores years as strings,
e.g. '2010', '2011'.
How do I convert these to integers?
More concretely
MultiIndex(levels=[[u'2010', u'2011'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, , ...]], names=[u'Year', u'Month'])
.
df_cbs_prelim_total.index.set_levels(df_cbs_prelim_total.index.get_level_values(0).astype('int'))
seems to do it, but not inplace. Any proper way of changing them?
Cheers,
Mike
It will probably be cleaner to do this before you assign it as the index (as @EdChum points out), but when you already have it as the index, you can indeed use set_levels to alter the labels of one level of your MultiIndex. This is a bit cleaner than your code (you can use index.levels[..]):
In [165]: idx = pd.MultiIndex.from_product([[1,2,3], ['2011','2012','2013']])
In [166]: idx
Out[166]:
MultiIndex(levels=[[1, 2, 3], [u'2011', u'2012', u'2013']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [167]: idx.levels[1]
Out[167]: Index([u'2011', u'2012', u'2013'], dtype='object')
In [168]: idx = idx.set_levels(idx.levels[1].astype(int), level=1)
In [169]: idx
Out[169]:
MultiIndex(levels=[[1, 2, 3], [2011, 2012, 2013]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
You have to reassign it to save the changes (as is done above, in your case this would be df_cbs_prelim_total.index = df_cbs_prelim_total.index.set_levels(...))
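Applied to a DataFrame, the reassignment could look like the sketch below. The frame, its column name and its values are made up for illustration; only the level names Year and Month follow the question.

```python
import pandas as pd

# Hypothetical frame with string years in the first index level.
idx = pd.MultiIndex.from_product(
    [["2010", "2011"], [0, 1, 2]], names=["Year", "Month"])
df = pd.DataFrame({"value": range(6)}, index=idx)

# set_levels returns a new MultiIndex, so assign it back to df.index.
# Convert the unique level values (index.levels[0]), not get_level_values.
df.index = df.index.set_levels(
    df.index.levels[0].astype(int), level="Year")

print(df.index.levels[0])
```

After this, lookups use integer years, e.g. df.loc[(2010, 1)].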