Add items to a dataframe if item in column already present - pandas

Other than brute forcing it with loops, given a dataframe df:
A B C
0 True 1 23.0
1 False 2 25.0
2 ... ... ....
and a list of dicts lod:
[{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}...]
I would like to add the first element of the lod {A: True, B:2, C:23} because 23.0 is already in the df C column, but not the second element {A: True, B:1, C:24} because 24 is not a value in the C column of df.
So: add each item of the list of dicts to the dataframe if one of its column values is already present in the dataframe; otherwise continue to the next element.

You can convert the list of dicts to a DataFrame, then filter with isin:
add=pd.DataFrame([{'A': True, 'B':2, 'C':23}, {'A': True, 'B':1, 'C':24}])
s=pd.concat([df,add[add.C.isin(df.C)]])
s
Out[464]:
A B C
0 True 1 23.0
1 False 2 25.0
0 True 2 23.0
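A self-contained version of the answer above, as a sketch (the df contents are assumed from the example in the question):

```python
import pandas as pd

# The original dataframe; values follow the example above.
df = pd.DataFrame({'A': [True, False], 'B': [1, 2], 'C': [23.0, 25.0]})

# List of dicts to conditionally append.
lod = [{'A': True, 'B': 2, 'C': 23}, {'A': True, 'B': 1, 'C': 24}]

add = pd.DataFrame(lod)
# Keep only the rows whose C value already appears in df['C'], then append.
s = pd.concat([df, add[add['C'].isin(df['C'])]], ignore_index=True)
print(s)
```

isin compares by value, so the integer 23 in the dicts matches the float 23.0 in df.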

Related

Join 2 data frames with special columns matching

I want to join two dataframes and get the result below. I tried many ways, but they all fail.
I want only the texts in df2['A'] which contain a text from df1['A']. What do I need to change in my code?
I want:
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3
import pandas as pd
df1 = pd.DataFrame(
{
"A": ["A0", "A1", "A2", "A3"],
})
df2 = pd.DataFrame(
{ "A": ["A0_link0", "A1_link1", "A2_link2", "A3_link3", "A4_link4", 'An_linkn'],
"B" : ["B0_link0", "B1_link1", "B2_link2", "B3_link3", "B4_link4", 'Bn_linkn']
})
result = pd.concat([df1, df2], ignore_index=True, join= "inner", sort=False)
print(result)
Create an intermediate dataframe and map:
d = (df2.assign(key=df2['A'].str.extract(r'([^_]+)', expand=False))
       .set_index('key'))
df1['A'].map(d['A'])
Output:
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3
Name: A, dtype: object
Or use merge if you want several columns from df2: df1.merge(d, left_on='A', right_index=True)
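Written out, the merge variant might look like the sketch below (df1/df2 as defined in the question). Note that the overlapping 'A' column gets _x/_y suffixes in the result:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]})
df2 = pd.DataFrame(
    {"A": ["A0_link0", "A1_link1", "A2_link2", "A3_link3", "A4_link4", "An_linkn"],
     "B": ["B0_link0", "B1_link1", "B2_link2", "B3_link3", "B4_link4", "Bn_linkn"]})

# Index df2 by the prefix before the underscore, then merge on that index.
d = df2.assign(key=df2['A'].str.extract(r'([^_]+)', expand=False)).set_index('key')
result = df1.merge(d, left_on='A', right_index=True)
print(result)
```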
You can set the index to the An prefix and pd.concat on columns:
result = (pd.concat([df1.set_index(df1['A']),
df2.set_index(df2['A'].str.split('_').str[0])],
axis=1, join="inner", sort=False)
.reset_index(drop=True))
print(result)
A A B
0 A0 A0_link0 B0_link0
1 A1 A1_link1 B1_link1
2 A2 A2_link2 B2_link2
3 A3 A3_link3 B3_link3
Or filter df2 directly by checking the prefix:
df2.A.loc[df2.A.str.split('_', expand=True).iloc[:, 0].isin(df1.A)]
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3

How do I call my columns after transposing?

I have a dataframe I want to transpose. After doing this I need to call the columns, but they are set as an index. I have tried resetting the index, to no avail.
index False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
I want the True and False columns to be regular columns but they are part of the index and I cannot call
df['True']
Expected Output:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
and when I call True and False, I want them to be columns, not an index:
df['True']
.214997
.457944
Try the reset_index(), set_index() and rename_axis() methods:
out= (df.reset_index()
.set_index(['level_0','index'])
.rename_axis(index=[None,None]))
output of out:
False True
0 Scan_Periodicity_%_Changed 0.785003 0.214997
1 Assets_Scanned_%_Changed 0.542056 0.457944
Not sure what you mean by "I want it to be a column not an index".
If you just want the 'index' label not to be displayed as a header, you can do:
>>> import pandas as pd
>>> d = {
...     'index': ['Scan_Periodicity_%_Changed', 'Assets_Scanned_%_Changed'],
...     'False': [0.78, 0.54],
...     'True': [0.21, 0.45]
... }
>>> df = pd.DataFrame(d)
>>> df = df.set_index('index')
>>> df.index.name = None
>>> df
False True
Scan_Periodicity_%_Changed 0.78 0.21
Assets_Scanned_%_Changed 0.54 0.45
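One point worth noting: after a transpose like this, the columns are usually the booleans True/False, not the strings 'True'/'False', which is why df['True'] fails. A sketch, with the index and values assumed from the example:

```python
import pandas as pd

# A frame like the one in the question, before transposing
# (names and values follow the example above).
df = pd.DataFrame({'Scan_Periodicity_%_Changed': [0.785003, 0.214997],
                   'Assets_Scanned_%_Changed': [0.542056, 0.457944]},
                  index=[False, True])

t = df.T                         # rows become the columns False and True
t = t.rename_axis(columns=None)  # drop any leftover columns name

# Access with the boolean True, not the string 'True'.
print(t[True])
```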

Reorder pandas dataframe columns that has columns.name

I want to reorder the columns of a dataframe generated from crosstab. However, the method I used doesn't work, because the frame has a columns.name.
example data
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
df
levels converted
0 High True
1 High True
2 Mid True
3 Low False
4 Low False
5 Low True
6 Mid False
Then I used crosstab to count it:
cb = pd.crosstab(df['levels'], df['converted'])
cb
converted False True
levels
High 0 2
Low 2 1
Mid 1 1
I want to swap the order of the two columns. I tried cb[[True, False]] and got error ValueError: Item wrong length 2 instead of 3.
I guess it's because it has columns.name, which is converted
Try sort_index: when the columns are bool, a boolean list is treated as a mask, so normal column selection does not work.
cb.sort_index(axis=1,ascending=False)
Out[190]:
converted True False
levels
High 2 0
Low 1 2
Mid 1 1
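A self-contained version of the sort_index approach, using the example data from the question:

```python
import pandas as pd

d = {'levels': ['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'],
     'converted': [True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)

# Boolean columns sort with True > False, so descending order puts True first.
cb = pd.crosstab(df['levels'], df['converted']).sort_index(axis=1, ascending=False)
print(cb)
```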
You can try the DataFrame reindex method, as below:
import pandas as pd
d = {'levels':['High', 'High', 'Mid', 'Low', 'Low', 'Low', 'Mid'], 'converted':[True, True, True, False, False, True, False]}
df = pd.DataFrame(data=d)
print(df)
cb = pd.crosstab(df['levels'],df['converted'])
print(cb)
column_titles = [True,False]
cb=cb.reindex(columns=column_titles)
print(cb)

How to find the value by checking the flag

data frame is below
uid,col1,col2,flag
1001,a,b,{'a':True,'b':False}
1002,a,b,{'a':False,'b':True}
out
a
b
By checking the flag: if the a flag is true, put a in the out column; if the b flag is true, put b in the out column.
IIUC, you can use dot after DataFrame constructor:
m=pd.DataFrame(df['flag'].tolist()).fillna(False)
final=df.assign(New=m.dot(m.columns))
print(final)
uid col1 col2 flag New
0 1001 a b {'a': True, 'b': False} a
1 1002 a b {'a': False, 'b': True} b
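A runnable sketch of the dot trick (the frame is assumed from the example; the trick works because booleans multiply with the string column names, so each row concatenates the names of its True columns):

```python
import pandas as pd

# Frame as in the question: flag holds a dict of booleans per row.
df = pd.DataFrame({'uid': [1001, 1002],
                   'col1': ['a', 'a'],
                   'col2': ['b', 'b'],
                   'flag': [{'a': True, 'b': False},
                            {'a': False, 'b': True}]})

# Expand the dicts into a boolean frame, then dot with its column names.
m = pd.DataFrame(df['flag'].tolist()).fillna(False)
final = df.assign(New=m.dot(m.columns))
print(final[['uid', 'New']])
```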
If you just want to evaluate the flags column (and col1 and col2 won't be used in any way as per your question), then you can simply get the first key from the flags dict where the value is True:
df.flag.apply(lambda x: next((k for k,v in x.items() if v), ''))
(instead of '' you can, of course, supply any other value for the case that none of the values in the dict is True)
Example:
import pandas as pd
import io
import ast
s = '''uid,col1,col2,flag
1001,a,b,"{'a':True,'b':False}"
1002,a,b,"{'a':False,'b':True}"
1003,a,b,"{'a':True,'b':True}"
1004,a,b,"{'a':False,'b':False}"'''
df = pd.read_csv(io.StringIO(s))
df.flag = df.flag.map(ast.literal_eval)
df['out'] = df.flag.apply(lambda x: next((k for k,v in x.items() if v), ''))
Result
uid col1 col2 flag out
0 1001 a b {'a': True, 'b': False} a
1 1002 a b {'a': False, 'b': True} b
2 1003 a b {'a': True, 'b': True} a
3 1004 a b {'a': False, 'b': False}
Method 1
We can also use Series.apply
to convert the dictionaries to a DataFrame, then remove the False cells with boolean indexing + DataFrame.stack and select a or b from the index with Index.get_level_values:
s = df['flag'].apply(pd.Series)
df['new']=s[s].stack().index.get_level_values(1)
#df['new']=np.dot(s,s.columns) #or this
print(df)
Method 2:
We can also check the items with Series.apply and save the key in a list if the value is True.
Finally we use Series.explode if we want to get rid of the list.
df['new']=df['flag'].apply(lambda x: [k for k,v in x.items() if v])
df = df.explode('new')
print(df)
or without apply:
df=df.assign(new=[[k for k,v in d.items() if v] for d in df['flag']]).explode('new')
print(df)
Output
uid col1 col2 flag new
0 1001 a b {'a': True, 'b': False} a
1 1002 a b {'a': False, 'b': True} b
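Method 2 written out as a self-contained sketch (the frame is assumed from the example):

```python
import pandas as pd

df = pd.DataFrame({'uid': [1001, 1002],
                   'col1': ['a', 'a'],
                   'col2': ['b', 'b'],
                   'flag': [{'a': True, 'b': False},
                            {'a': False, 'b': True}]})

# Collect every key whose value is True, then explode so each key
# gets its own row (a row with two True flags would be duplicated).
df['new'] = [[k for k, v in d.items() if v] for d in df['flag']]
df = df.explode('new')
print(df[['uid', 'new']])
```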

Recode a pandas.Series containing 0, 1, and NaN to False, True, and NaN

Suppose I have a Series with NaNs:
pd.Series([0, 1, None, 1])
I want to transform this to be equal to:
pd.Series([False, True, None, True])
You'd think x == 1 would suffice, but instead, this returns:
pd.Series([False, True, False, True])
where the null value has become False. This is because np.nan == 1 returns False, rather than None or np.nan as in R.
Is there a nice, vectorized way to get what I want?
Maybe map can do it:
import pandas as pd
x = pd.Series([0, 1, None, 1])
print(x.map({1: True, 0: False}))
0 False
1 True
2 NaN
3 True
dtype: object
You can use where:
In [11]: (x == 1).where(x.notnull())
Out[11]:
0 0
1 1
2 NaN
3 1
dtype: float64
Note: the True and False have been cast to float as 0 and 1.
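On modern pandas (1.0 or later, an assumption about your version), the nullable 'boolean' dtype avoids the cast to float and keeps a real missing value. A sketch building on the map answer:

```python
import pandas as pd

x = pd.Series([0, 1, None, 1])

# map leaves NaN as NaN; casting to the nullable 'boolean' dtype then
# yields True / False / <NA> instead of objects or floats.
out = x.map({1: True, 0: False}).astype('boolean')
print(out)
```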