I want to reorder columns of a dataframe generated from crosstab. However, the method I used doesn't work because it has
example data
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
levels converted
0 High True
1 High True
2 Mid True
3 Low False
4 Low False
5 Low True
6 Mid False
Then I used crosstab to count it:
cb = pd.crosstab(df['levels'], df['converted'])
converted False True
High 0 2
Low 2 1
Mid 1 1
I want to swap the order of the two columns. I tried cb[[True, False]] and got error ValueError: Item wrong length 2 instead of 3.
I guess it's because it has, which is converted

Try sort_index on the columns. Because the column labels are bool, pandas treats a list such as [True, False] as a boolean mask rather than as labels, so ordinary label slicing does not work.
converted True False
High 2 0
Low 1 2
Mid 1 1
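The code for the sort_index suggestion is not shown above, so the following is only a minimal sketch of what it might look like, rebuilt from the question's example data. It assumes that sorting the column axis in descending order is what was meant, and it adds a positional alternative with iloc; the variable names `swapped` and `swapped_by_position` are illustrative.

```python
import pandas as pd

d = {'levels': ['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'],
     'converted': [True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
cb = pd.crosstab(df['levels'], df['converted'])

# Sort the column labels in descending order: True sorts after False,
# so descending order puts True first.
swapped = cb.sort_index(axis=1, ascending=False)

# Positional alternative: reverse the columns without touching the labels,
# which avoids the boolean-mask interpretation entirely.
swapped_by_position = cb.iloc[:, ::-1]

print(swapped)
```

Both variants select by axis order or position, so the boolean labels are never passed as an indexer.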

You can try the DataFrame reindex method, as below:
import pandas as pd
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
cb = pd.crosstab(df['levels'],df['converted'])
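column_titles = [True, False]
# reindex looks the labels up by value, so the bool list is not treated as a mask
cb = cb.reindex(columns=column_titles)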


How to show rows with data which are not equal?

I have two tables
import pandas as pd
import numpy as np
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
df1 = pd.DataFrame(np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
I want to compare them and get the same result as the compare function would give, or at least something close to it. I can't use that function because my interpreter reports that 'DataFrame' object has no attribute 'compare'.
First approach:
Let's compare value by value:
In [1183]: eq_df = df1.eq(df2)
In [1196]: eq_df
a b c
0 True True False
1 True True True
2 True True True
Then let's reduce it to a single boolean per row, showing which rows are equal in all columns:
from functools import reduce
In [1285]: eq_ser = reduce(np.logical_and, (eq_df[c] for c in eq_df.columns))
In [1288]: eq_ser
0 False
1 True
2 True
dtype: bool
Now we can print out the rows which are not equal
In [1310]: df1[~eq_ser]
a b c
0 1 2 4
In [1316]: df2[~eq_ser]
a b c
0 1 2 3
Second approach:
from collections import namedtuple
from typing import Tuple

def diff_dataframes(
    df1, df2, compare_cols=None
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Given two dataframes and column(s) to compare, return three dataframes with rows:
    - common between the two dataframes
    - found only in the left dataframe
    - found only in the right dataframe
    """
    df1 = df1.fillna(pd.NA)
    # indicator=True adds a "_merge" column saying where each row came from
    df = df1.merge(df2.fillna(pd.NA), how="outer", on=compare_cols, indicator=True)
    df_both = df.loc[df["_merge"] == "both"].drop(columns="_merge")
    df_left = df.loc[df["_merge"] == "left_only"].drop(columns="_merge")
    df_right = df.loc[df["_merge"] == "right_only"].drop(columns="_merge")
    tup = namedtuple("df_diff", ["common", "left", "right"])
    return tup(df_both, df_left, df_right)
In [1366]: b, l, r = diff_dataframes(df1, df2)
In [1371]: l
a b c
0 1 2 4
In [1372]: r
a b c
3 1 2 3
Third approach:
In [1440]: eq_ser = df1.eq(df2).sum(axis=1).eq(len(df1.columns))
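The third approach can also be written with all(axis=1) instead of counting matches. The sketch below is one possible variant, not the answer's own code: it reuses df1 and df2 from the question and places the differing rows side by side with pd.concat. The names `neq_rows` and `diff` and the keys "df1" and "df2" are illustrative. Note that eq treats NaN as unequal to NaN, so rows containing missing values would be reported as different.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df1 = pd.DataFrame(np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

# True for every row in which at least one column differs
neq_rows = ~df1.eq(df2).all(axis=1)

# Put the differing rows of both frames next to each other,
# labelled by the frame they came from
diff = pd.concat([df1[neq_rows], df2[neq_rows]], axis=1, keys=['df1', 'df2'])
print(diff)
```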

How do I call my columns after transposing?

I have a dataframe that I want to transpose. After transposing, I need to access the columns, but they are set as the index. I have tried resetting the index, to no avail.
index False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
I want the True and False columns to be regular columns, but they are part of the index and I cannot call them.
Expected Output:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
When I call True and False, I want each to be a column, not an index.
Try chaining the reset_index(), set_index() and rename_axis() methods:
out= (df.reset_index()
output of out:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
Not sure what you mean by "I want it to be a column not an index".
If you just don't want the 'index' name displayed as a header, you can do the following:
>>> import pandas as pd
>>> d = {
... 'index':['Scan_Periodicity_%_Changed','Assets_Scanned_%_Changed'],
... 'False':[0.78,0.54],
... 'True':[0.21,0.45]
... }
>>> df = pd.DataFrame(d)
>>> df = df.set_index('index')
>>> df.index.name = None
>>> df
False True
Scan_Periodicity_%_Changed 0.78 0.21
Assets_Scanned_%_Changed 0.54 0.45
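As a rough illustration of the same idea with rename_axis, here is a sketch that assumes the frame has a regular column named 'index' plus boolean column labels False and True, as the question's printout suggests. The exact shape of the original frame is not known, so the construction below is an assumption, and `out` and `true_col` are illustrative names.

```python
import pandas as pd

# Assumed input: labels in a column called 'index', boolean column labels
df = pd.DataFrame({
    'index': ['Scan_Periodicity_%_Changed', 'Assets_Scanned_%_Changed'],
    False: [0.785003, 0.542056],
    True: [0.214997, 0.457944],
})

# Move the labels into the row index, then drop the index name so that
# 'index' no longer shows up as a header
out = df.set_index('index').rename_axis(index=None)

# A single boolean label selects the column of that name
true_col = out[True]
print(out)
```

Here False and True stay ordinary column labels, and only the row labels live in the index.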

Generate combinations with specified order with itertools.combinations

I used itertools.combinations to generate combinations for a dataframe's index. I'd like the combinations in specified order --> (High - Mid - Low)
from itertools import combinations
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df_ = pd.crosstab(df['levels'], df['converted'])
converted False True
High 0 2
Low 2 1
Mid 1 1
list(combinations(df_.index, 2)) returns [('High', 'Low'), ('High', 'Mid'), ('Low', 'Mid')]
I'd like the third group to be ('Mid', 'Low'). How can I achieve this?
Use DataFrame.reindex first. Note that the first and second pairs in the resulting list are swapped as well:
order = ['High','Mid','Low']
a = list(combinations(df_.reindex(order).index, 2))
print (a)
[('High', 'Mid'), ('High', 'Low'), ('Mid', 'Low')]
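If reindexing the frame is not wanted, the labels themselves can be sorted by their position in the desired order before being passed to combinations. This is an alternative sketch, not part of the answer above; it assumes every index label appears in `order`, and the name `pairs` is illustrative.

```python
from itertools import combinations

import pandas as pd

d = {'levels': ['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'],
     'converted': [True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df_ = pd.crosstab(df['levels'], df['converted'])

order = ['High', 'Mid', 'Low']

# Sort the index labels by where they appear in `order`;
# combinations() then emits pairs in that order
pairs = list(combinations(sorted(df_.index, key=order.index), 2))
print(pairs)
```

combinations keeps the order of its input, so any ordering applied beforehand carries through to the pairs.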

Add items to a dataframe if item in column already present

Other than brute forcing it with loops, given a dataframe df:
A B C
0 True 1 23.0
1 False 2 25.0
2 ... ... ....
and a list of dicts lod:
[{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}...]
I would like to add the first element of the lod {A: True, B:2, C:23} because 23.0 is already in the df C column, but not the second element {A: True, B:1, C:24} because 24 is not a value in the C column of df.
So add all items of the list of dicts to the dataframe on a column value already being in the dataframe, otherwise continue to the next element.
You can convert the list of dicts to a DataFrame, then filter it with isin before appending:
add=pd.DataFrame([{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}])
A B C
0 True 1 23.0
1 False 2 25.0
0 True 2 23.0
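The filtering and appending step is not shown in the answer above, so here is a minimal sketch of how it might be done with isin and pd.concat. The two-row df is an assumption built from the rows visible in the question (the elided rows are left out), and `keep` and `result` are illustrative names.

```python
import pandas as pd

# Assumed sample frame: only the two rows visible in the question
df = pd.DataFrame({'A': [True, False], 'B': [1, 2], 'C': [23.0, 25.0]})
lod = [{'A': True, 'B': 2, 'C': 23}, {'A': True, 'B': 1, 'C': 24}]

add = pd.DataFrame(lod)

# Keep only the candidate rows whose C value already occurs in df
keep = add[add['C'].isin(df['C'])]

# Append the surviving rows; the original row labels are kept
result = pd.concat([df, keep])
print(result)
```

Passing ignore_index=True to pd.concat would renumber the rows instead of keeping the repeated label 0.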

Recode a pandas.Series containing 0, 1, and NaN to False, True, and NaN

Suppose I have a Series with NaNs:
pd.Series([0, 1, None, 1])
I want to transform this to be equal to:
pd.Series([False, True, None, True])
You'd think x == 1 would suffice, but instead, this returns:
pd.Series([False, True, False, True])
where the null value has become False. This is because np.nan == 1 returns False, rather than None or np.nan as in R.
Is there a nice, vectorized way to get what I want?
Maybe map can do it:
import pandas as pd
x = pd.Series([0, 1, None, 1])
print(x.map({1: True, 0: False}))
0 False
1 True
2 NaN
3 True
dtype: object
You can use where:
In [11]: (s == 1).where(s.notnull(), np.nan)
0 0
1 1
2 NaN
3 1
dtype: float64
Note: False and True have been cast to the floats 0 and 1.
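Another option, not mentioned in the answers above, is pandas' nullable "boolean" extension dtype, which keeps missing values as <NA> instead of falling back to object or float. The sketch below assumes pandas 1.0 or later and that the series contains only 0, 1 and missing values; `recoded` is an illustrative name.

```python
import pandas as pd

x = pd.Series([0, 1, None, 1])

# Cast to the nullable boolean dtype: 0 -> False, 1 -> True, NaN -> <NA>
recoded = x.astype('boolean')
print(recoded)
```

Values other than 0, 1 and missing would make the cast fail, so this only fits data that is already coded that way.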