Reorder pandas DataFrame columns when the DataFrame has a columns.name - pandas

I want to reorder the columns of a DataFrame generated by crosstab. However, the method I used doesn't work, and I think it is because the DataFrame has a columns.name.
Example data:
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df
levels converted
0 High True
1 High True
2 Mid True
3 Low False
4 Low False
5 Low True
6 Mid False
Then I used crosstab to count the values:
cb = pd.crosstab(df['levels'], df['converted'])
cb
converted False True
levels
High 0 2
Low 2 1
Mid 1 1
I want to swap the order of the two columns. I tried cb[[True, False]] and got the error ValueError: Item wrong length 2 instead of 3.
I guess it's because the DataFrame has a columns.name, which is converted.

Try sort_index. Because the column labels are booleans, pandas reads cb[[True, False]] as a boolean mask rather than a list of labels, so ordinary label selection does not work here:
cb.sort_index(axis=1,ascending=False)
Out[190]:
converted True False
levels
High 2 0
Low 1 2
Mid 1 1
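A note on the cause, offered as a minimal sketch rather than a full diagnosis: a list of booleans passed to [] is treated as a row mask, which is why the error mentions lengths 2 and 3 (two flags, three rows). The columns.name attribute is not involved. Selecting by position with iloc sidesteps the ambiguity. The variable name swapped is only illustrative.

```python
import pandas as pd

d = {'levels': ['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'],
     'converted': [True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
cb = pd.crosstab(df['levels'], df['converted'])

# A list of booleans is read as a row mask, so its length must match the rows.
try:
    cb[[True, False]]
except ValueError as err:
    print(err)

# Reverse the column order by position, so the labels are never read as a mask.
swapped = cb.iloc[:, ::-1]
print(swapped)
```

This only reverses whatever order the columns already have, so it suits the two-column case here; for an arbitrary order, a label-based method is the better fit.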

You can try the DataFrame reindex method, as below:
import pandas as pd
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
print(df)
cb = pd.crosstab(df['levels'],df['converted'])
print(cb)
column_titles = [True,False]
cb=cb.reindex(columns=column_titles)
print(cb)

Related

How to show rows with data which are not equal?

I have two tables
import pandas as pd
import numpy as np
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
df1 = pd.DataFrame(np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
print(df1.equals(df2))
I want to compare them. I want the same result I would get from df.compare(df1), or at least something close to it. I can't use that function because my interpreter reports that 'DataFrame' object has no attribute 'compare'.
First approach:
Let's compare value by value:
In [1183]: eq_df = df1.eq(df2)
In [1196]: eq_df
Out[1200]:
a b c
0 True True False
1 True True True
2 True True True
Then let's reduce it down to see which rows are equal for all columns
from functools import reduce
In [1285]: eq_ser = reduce(np.logical_and, (eq_df[c] for c in eq_df.columns))
In [1288]: eq_ser
Out[1293]:
0 False
1 True
2 True
dtype: bool
Now we can print out the rows which are not equal
In [1310]: df1[~eq_ser]
Out[1315]:
a b c
0 1 2 4
In [1316]: df2[~eq_ser]
Out[1316]:
a b c
0 1 2 3
Second approach:
from collections import namedtuple
from typing import Tuple

def diff_dataframes(
    df1, df2, compare_cols=None
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Given two dataframes and column(s) to compare, return three dataframes with rows:
    - common between the two dataframes
    - found only in the left dataframe
    - found only in the right dataframe
    """
    df1 = df1.fillna(pd.NA)
    # indicator=True adds a "_merge" column saying where each row came from
    df = df1.merge(df2.fillna(pd.NA), how="outer", on=compare_cols, indicator=True)
    df_both = df.loc[df["_merge"] == "both"].drop(columns="_merge")
    df_left = df.loc[df["_merge"] == "left_only"].drop(columns="_merge")
    df_right = df.loc[df["_merge"] == "right_only"].drop(columns="_merge")
    tup = namedtuple("df_diff", ["common", "left", "right"])
    return tup(df_both, df_left, df_right)
Usage:
In [1366]: b, l, r = diff_dataframes(df1, df2)
In [1371]: l
Out[1371]:
a b c
0 1 2 4
In [1372]: r
Out[1372]:
a b c
3 1 2 3
Third approach:
In [1440]: eq_ser = df1.eq(df2).sum(axis=1).eq(len(df1.columns))
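As a hedged variation on the third approach, the row mask can be built with ne and any, and the differing rows placed side by side to approximate what compare would show. This is a sketch that assumes both frames share the same index and columns; the names diff_rows and side_by_side and the keys 'df1' and 'df2' are illustrative only.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df1 = pd.DataFrame(np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

# True for every row where at least one column differs.
# Note that NaN is never equal to NaN under this comparison.
diff_rows = df1.ne(df2).any(axis=1)

# Put the differing rows next to each other, labelled by source frame.
side_by_side = pd.concat([df1[diff_rows], df2[diff_rows]],
                         axis=1, keys=['df1', 'df2'])
print(side_by_side)
```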

How do I call my columns after transposing?

I have a DataFrame that I want to transpose. After doing this I need to access the columns, but they are set as the index. I have tried resetting the index, to no avail.
index False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
I want the True and False columns to be regular columns, but they are part of the index and I cannot call
df['True']
Expected Output:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
and when I call True and False I want each to be a column, not an index
df['True']
.214997
.457944
Try the reset_index(), set_index() and rename_axis() methods:
out= (df.reset_index()
.set_index(['level_0','index'])
.rename_axis(index=[None,None]))
output of out:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
I'm not sure what you mean by "I want it to be a column not an index".
If you want to avoid having the 'index' name displayed as a header, you can do the following:
>>> import pandas as pd
>>>
>>> d = {
... 'index':['Scan_Periodicity_%_Changed','Assets_Scanned_%_Changed'],
... 'False':[0.78,0.54],
... 'True':[0.21,0.45]
... }
>>>
>>> df = pd.DataFrame(d)
>>>
>>> df = df.set_index('index')
>>>
>>> df.index.name = None
>>>
>>>
>>> df
False True
Scan_Periodicity_%_Changed 0.78 0.21
Assets_Scanned_%_Changed 0.54 0.45
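One further possibility, stated as an assumption because the question does not show how the frame was built: if the column labels are the booleans False and True rather than the strings 'False' and 'True', then df['True'] fails simply because no column has that string label. The sketch below reconstructs such a frame by hand and converts the labels to strings.

```python
import pandas as pd

# Assumed reconstruction: the column labels are booleans, not strings.
df = pd.DataFrame(
    {False: [0.785003, 0.542056], True: [0.214997, 0.457944]},
    index=['Scan_Periodicity_%_Changed', 'Assets_Scanned_%_Changed'],
)

# A scalar boolean label selects the column directly.
by_bool = df[True]

# Convert every label to its string form so df['True'] works.
df.columns = df.columns.map(str)
by_str = df['True']
print(by_str)
```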

Generate combinations with specified order with itertools.combinations

I used itertools.combinations to generate combinations of a DataFrame's index. I'd like the combinations in a specified order (High - Mid - Low).
Example
from itertools import combinations
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df_ = pd.crosstab(df['levels'], df['converted'])
df_
converted False True
levels
High 0 2
Low 2 1
Mid 1 1
list(combinations(df_.index, 2)) returns [('High', 'Low'), ('High', 'Mid'), ('Low', 'Mid')]
I'd like the third group to be ('Mid', 'Low'). How can I achieve this?
Use DataFrame.reindex before generating the combinations. Note that the first and second tuples in the resulting list also swap places:
order = ['High','Mid','Low']
a = list(combinations(df_.reindex(order).index, 2))
print (a)
[('High', 'Mid'), ('High', 'Low'), ('Mid', 'Low')]
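Since only the index labels are needed, the combinations can also be built from the ordered list itself without reindexing the frame. This is a minimal sketch; the name present is illustrative, and the filter simply skips any label in order that does not occur in the index.

```python
from itertools import combinations

import pandas as pd

d = {'levels': ['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'],
     'converted': [True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df_ = pd.crosstab(df['levels'], df['converted'])

order = ['High', 'Mid', 'Low']

# Keep only labels that exist in the index, in the desired order.
present = [level for level in order if level in df_.index]

# combinations preserves the order of its input sequence.
pairs = list(combinations(present, 2))
print(pairs)
```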

Add items to a dataframe if item in column already present

Other than brute forcing it with loops, given a dataframe df:
A B C
0 True 1 23.0
1 False 2 25.0
2 ... ... ....
and a list of dicts lod:
[{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}...]
I would like to add the first element of the lod {A: True, B:2, C:23} because 23.0 is already in the df C column, but not the second element {A: True, B:1, C:24} because 24 is not a value in the C column of df.
In other words, add each item of the list of dicts to the DataFrame only if its C value is already present in the DataFrame's C column; otherwise continue to the next element.
You can convert the list of dicts to a DataFrame, then filter it with isin:
add=pd.DataFrame([{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}])
s=pd.concat([df,add[add.C.isin(df.C)]])
s
Out[464]:
A B C
0 True 1 23.0
1 False 2 25.0
0 True 2 23.0
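The appended row keeps its original label 0, so the result has a repeated index. If that is unwanted, passing ignore_index=True to concat renumbers the rows. A minimal sketch, using a two-row stand-in for df since the full frame is not shown in the question:

```python
import pandas as pd

# Two-row stand-in for the question's df (the remaining rows are elided there).
df = pd.DataFrame({'A': [True, False], 'B': [1, 2], 'C': [23.0, 25.0]})
lod = [{'A': True, 'B': 2, 'C': 23}, {'A': True, 'B': 1, 'C': 24}]

add = pd.DataFrame(lod)

# Keep only the candidate rows whose C value already occurs in df.
matching = add[add['C'].isin(df['C'])]

# ignore_index=True renumbers the rows instead of repeating label 0.
s = pd.concat([df, matching], ignore_index=True)
print(s)
```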

Recode a pandas.Series containing 0, 1, and NaN to False, True, and NaN

Suppose I have a Series with NaNs:
pd.Series([0, 1, None, 1])
I want to transform this to be equal to:
pd.Series([False, True, None, True])
You'd think x == 1 would suffice, but instead, this returns:
pd.Series([False, True, False, True])
where the null value has become False. This is because np.nan == 1 returns False, whereas in R the equivalent comparison with a missing value returns NA.
Is there a nice, vectorized way to get what I want?
Maybe map can do it:
import pandas as pd
x = pd.Series([0, 1, None, 1])
print(x.map({1: True, 0: False}))
0 False
1 True
2 NaN
3 True
dtype: object
You can use where (here s is the Series from the question):
In [11]: (s == 1).where(s.notnull(), np.nan)
Out[11]:
0 0
1 1
2 NaN
3 1
dtype: float64
Note: in this output, True and False have been cast to the floats 1 and 0.
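Another option, sketched here on the assumption that pandas 1.0 or later is available, is the nullable 'boolean' extension dtype, which stores True, False and a missing marker without falling back to object or float. The conversion assumes every non-missing value is exactly 0 or 1; other values raise an error.

```python
import pandas as pd

x = pd.Series([0, 1, None, 1])  # stored as float64, with NaN for None

# Assumes pandas >= 1.0 and that all non-missing values are 0 or 1.
result = x.astype('boolean')
print(result)
```

The missing entry is shown as <NA> and the dtype is boolean, so later logical operations propagate the missing value instead of silently treating it as False.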