Any smarter way to use three where clauses? - pandas

I have an input like this:
Material       Country  CS     RR     MR     VR
XCEPH8710L-13  IT       False  False  True   False
XC4PH8720H-13  GR       False  False  True   False
XCEPH8711H-13  MO       False  True   True   True
XCEPH8710H-13  IT       False  False  True   True
QCEPH8710H-13  RU       False  False  True   True
QCEMH8810-13   IN       False  False  True   True
QCEPH8710H-13  HK       False  True   True   False
XCEPH8710H-13  IT       True   True   True   False
XCEPH8710H-13  RU       True   True   True   False
XCEPH8710H-13  PO       False  True   True   True
QCEPH8710H-13  ES       False  False  True   False
XCEPH8710H-13  IN       False  False  False  False
QCEPH8710H-13  IT       False  False  False  False
XCEPH8710H-13  RU       False  False  True   False
XCEPH8710H-13  RO       False  True   False  False
XCEPH8710H-13  MN       True   False  False  False
I want my output to include a Comments column like the one in the snippet below:
Material       Country  CS     RR     MR     VR     Comments
XCEPH8710L-13  IT       False  False  True   False  Update Database
XC4PH8720H-13  GR       False  False  True   False  Update Database
XCEPH8711H-13  MO       False  True   True   True   Restricted
XCEPH8710H-13  IT       False  False  True   True   Update Database
QCEPH8710H-13  RU       False  False  True   True   Update Database
QCEMH8810-13   IN       False  False  True   True   Update Database
QCEPH8710H-13  HK       False  True   True   False  Restricted
XCEPH8710H-13  IT       True   True   True   False  Restricted
XCEPH8710H-13  RU       True   True   True   False  Restricted
XCEPH8710H-13  PO       False  True   True   True   Restricted
QCEPH8710H-13  ES       False  False  True   False  Restricted
XCEPH8710H-13  IN       False  False  False  False  Abort
QCEPH8710H-13  IT       False  False  False  False  Abort
XCEPH8710H-13  RU       False  False  True   False  Restricted
XCEPH8710H-13  RO       False  True   False  False  Abort
XCEPH8710H-13  MN       True   False  False  False  Abort
The code I have used is below, but the comments do not change dynamically. Is there a better way to do this using pandas, numpy, or anything else?
import janitor  # pyjanitor; importing it registers DataFrame.case_when

df.case_when(
    df.RR.eq("True") & df.CS.eq("False"), "Restricted",
    df.RR.eq("False") & df.CS.eq("True"), "UPDATE database",
    df.CS.eq("True"), "Abort",
    "black",  # default value when no condition matches
    column_name="Comments",
)
Logic:
Priority 1: Any False value of MR or VR should be 'Abort'.
Priority 2: Any True value of MR or VR should be acceptable and then checked on RR. If RR is 'True' then 'Restricted'.
Priority 3: Any True value of MR or VR should be acceptable and then checked on RR. If CS is 'False' then 'UPDATE Database'.
See the table below; it should make the logic clearer:
CS     RR     MR and VR  Comments
True   False  False      Abort
False  True   False      Abort
False  False  True       UPDATE Database
True   True   False      Restricted
False  True   True       Restricted
True   False  True       NA
True   True   True       Restricted
False  False  False      Abort
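One possible way to express the three priorities is numpy.select, which returns the choice for the first condition that matches in each row. This is only a sketch under assumptions the question does not confirm: the flag columns hold real booleans (if they hold the strings "True"/"False", convert them first, for example with df[col].eq("True")), a row is acceptable when MR or VR is True, and rows that match no rule get "NA". The sample rows and the helper name accepted are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; real data would come from the question's table.
df = pd.DataFrame({
    "CS": [False, False, True, False, True],
    "RR": [False, True, True, False, False],
    "MR": [True, True, True, False, True],
    "VR": [False, True, False, False, True],
})

# Assumption: a row passes priority 1 when at least one of MR / VR is True.
accepted = df["MR"] | df["VR"]

# np.select checks the conditions in order, so list order encodes priority.
conditions = [
    ~accepted,   # priority 1: neither MR nor VR is True
    df["RR"],    # priority 2: accepted and RR is True
    ~df["CS"],   # priority 3: accepted, RR is False, CS is False
]
choices = ["Abort", "Restricted", "Update Database"]

df["Comments"] = np.select(conditions, choices, default="NA")
print(df)
```

Because the conditions are evaluated in order, later conditions do not need to repeat the earlier checks; adding a new rule means inserting one condition and one choice at the right position.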

Related

Compare values across multiple columns in a dataframe for each cp_id

I have a dataframe which looks like the one below. The cp_id values are duplicated and have different values in each column. I need to take the first occurrence of each value in every column and compare it with all the others for the same cp_id. The result should be an additional column telling where there is a mismatch.
The resulting dataframe should look like the one below:
What is the most optimal way to achieve this?
The idea is to test whether the number of unique values per group is not 1, using GroupBy.transform with DataFrameGroupBy.nunique. Because several columns can mismatch at once, DataFrame.dot with the column names is used to build the error column, with the values joined by commas:
m = df.groupby('cp_id').transform('nunique').ne(1)
df['Result'] = m.dot(m.columns + ' is not match,').str[:-1]
Details:
print (m)
p_code channel subchannel
0 True False False
1 True False False
2 True False False
3 True False False
4 True False False
5 True False False
6 True False False
7 True False False
8 True False False
9 True False False
10 True False False
11 True False False
12 True False False
13 True False False
14 False False True
15 False False True
16 True False False
17 True False False
18 True False False
19 True False False
20 True False False
If you need to compare against the first value per group instead, the output is different:
m = df.groupby('cp_id').transform('first').ne(df.drop('cp_id', axis=1))
df['Result'] = m.dot(m.columns + ' is not match,').str[:-1]
Details:
print (m)
p_code channel subchannel
0 False False False
1 True False False
2 False False False
3 True False False
4 False False False
5 True False False
6 True False False
7 True False False
8 False False False
9 True False False
10 False False False
11 True False False
12 False False False
13 True False False
14 False False False
15 False False True
16 False False False
17 True False False
18 False False False
19 False False False
20 True False False
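To try both variants without the original data, here is a small self-contained sketch. The frame is made up (only the column names cp_id, p_code and channel come from the question), so treat it as an illustration rather than the asker's data. The names m_any, m_first, by_group and by_first are hypothetical.

```python
import pandas as pd

# Hypothetical sample: cp_id 1 disagrees on p_code, cp_id 2 on channel.
df = pd.DataFrame({
    "cp_id": [1, 1, 2, 2],
    "p_code": ["a", "b", "c", "c"],
    "channel": ["x", "x", "y", "z"],
})

# Variant 1: flag every row of a group in which a column
# has more than one unique value.
m_any = df.groupby("cp_id").transform("nunique").ne(1)
by_group = m_any.dot(m_any.columns + " is not match,").str[:-1]

# Variant 2: flag only the rows that differ from the first row of their group.
m_first = df.groupby("cp_id").transform("first").ne(df.drop("cp_id", axis=1))
by_first = m_first.dot(m_first.columns + " is not match,").str[:-1]

print(by_group.tolist())
print(by_first.tolist())
```

The dot product works here because True times a string keeps the string and False times a string gives an empty string, so each row ends up with the concatenated names of its mismatching columns.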

Boolean and of three columns in pandas

I have a dataframe as shown below
ID F1 F2 F3
1 True False False
2 True True True
3 False False False
4 True False False
5 True True True
From the above, I want to create a new column, CONSIDER, which is True only if F1, F2 and F3 are all True:
ID F1 F2 F3 CONSIDER
1 True False False False
2 True True True True
3 False False False False
4 True False False False
5 True True True True
Use DataFrame.all with filtered columns in list:
df['CONSIDER'] = df[['F1','F2','F3']].all(axis=1)
print (df)
ID F1 F2 F3 CONSIDER
0 1 True False False False
1 2 True True True True
2 3 False False False False
3 4 True False False False
4 5 True True True True

Making subset of Julia dataframe with values greater than x

For instance if I create a dataframe as follows
X = rand(10, 10)
X = convert(DataFrame, X)
How do I show only the values > 0.5?
Thanks in advance.
We should use the most practical tool to accomplish what we want. I don't think a DataFrame is of much use here, so I suggest you convert it back to a numeric array. Of course, you're not planning to use this on an array of random numbers.
using DataFrames
X = rand(10, 10)
X = convert(DataFrame, X)
X = convert(Matrix, X)
ind = X .> 0.5
vals = X[ind]
ind
vals
#=
Main> ind
10×10 BitArray{2}:
false false true true true true false false true true
true false false true false false true true false false
true true false true false true true false false true
false true false true true false false true true false
true false true true true true false true true true
true false false true true true true true true true
false false true false false true true false false true
true true false true true false true true false false
false false false true true true false true false false
true true true false true true true false false false
Main> vals
57-element Array{Float64,1}:
0.52951
0.743512
0.576547
0.697369
0.951203
0.656204
⋮
0.803584
0.714883
0.730805
0.529729
0.845263
=#
You can check the position of the subset in the original array in ind and the actual values in vals.

pd.concat with multiindex

I have a multi-indexed df and I want to perform an elementwise operation on it that differs depending on a string in the level 1 column and then combine them using the same index/column structure.
dic = {'X':pd.DataFrame(np.random.randn(10, 2), columns = ['A','B']),
'Y':pd.DataFrame(np.random.randn(10, 2), columns = ['A','B']),
'Z':pd.DataFrame(np.random.randn(10, 2), columns = ['A','B'])}
multi = pd.concat(dic.values(),axis=1,keys=dic.keys())
a = multi[multi.filter(like='A').columns].applymap(lambda x: x>=1 and x <= 2)
b = multi[multi.filter(like='B').columns].applymap(lambda x: x>=-1 and x <= 1)
pd.concat([a,b], axis = 1) gives me the correct data
Out[164]:
X Y Z X Y Z
A A A B B B
0 False False False True False True
1 True True False True True False
2 False False False True False True
3 False False False False True True
4 False False False True True True
5 False False True False True True
6 False False False False True True
7 False False False True True True
8 False True False True False True
9 False False False True False False
But I want it displayed like this:
Out[168]:
X Y Z
A B A B A B
0 False True False False False True
1 True True True True False False
2 False True False False False True
3 False False False True False True
4 False True False True False True
5 False False False True True True
6 False False False True False True
7 False True False True False True
8 False True True False False True
9 False True False False False False
Add sort_index(axis=1) so the columns are regrouped under each level 0 key:
pd.concat([a,b], axis = 1).sort_index(axis=1)
Out[162]:
X Y Z
A B A B A B
0 False True False True False False
1 False True False True False True
2 False True False True False False
3 False True False True False True
4 False False False True False True
5 False True False False False True
6 False True False True False False
7 False True False True False True
8 False True True True False True
9 False False False True False False
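For a reproducible check of the column order, here is a minimal sketch. It swaps a few details relative to the question, all of them my own choices: a seeded generator instead of np.random.randn, DataFrame.xs instead of filter(like=...), and vectorized comparisons instead of applymap. The names frames, a_vals, b_vals, combined and result are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
frames = {
    key: pd.DataFrame(rng.standard_normal((5, 2)), columns=["A", "B"])
    for key in ["X", "Y", "Z"]
}
multi = pd.concat(frames, axis=1)  # dict keys become level 0 of the columns

# Select each level 1 label while keeping both column levels.
a_vals = multi.xs("A", axis=1, level=1, drop_level=False)
b_vals = multi.xs("B", axis=1, level=1, drop_level=False)

# Elementwise range checks, one rule per level 1 label.
a = (a_vals >= 1) & (a_vals <= 2)
b = (b_vals >= -1) & (b_vals <= 1)

combined = pd.concat([a, b], axis=1)   # columns grouped by A, then by B
result = combined.sort_index(axis=1)   # columns regrouped under X, Y, Z

print(list(combined.columns))
print(list(result.columns))
```

Sorting works here because the desired order (X, Y, Z, then A, B inside each) happens to be the lexicographic order of the column tuples; for a custom order, reindexing with multi.columns would be an alternative.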

Ignore NaN in Boolean slicing in Pandas

I need to find (and remove) all rows where all elements are greater than some constant, ignoring NaN values:
In[23]: df
Out[23]:
0 1 2 3 4
290 4.0 16.0 18.0 12.0 8.0
291 16.0 18.0 12.0 8.0 9.0
292 18.0 12.0 8.0 9.0 9.0
293 12.0 8.0 9.0 9.0 15.0
294 8.0 9.0 9.0 15.0 18.0
295 9.0 9.0 15.0 18.0 18.0
296 9.0 15.0 18.0 18.0 16.0
297 15.0 18.0 18.0 16.0 20.0
298 18.0 18.0 16.0 20.0 NaN
299 18.0 16.0 20.0 NaN 16.0
300 16.0 20.0 NaN 16.0 14.0
301 20.0 NaN 16.0 14.0 NaN
302 NaN 16.0 14.0 NaN 16.0
303 16.0 14.0 NaN 16.0 15.0
304 14.0 NaN 16.0 15.0 15.0
305 NaN 16.0 15.0 15.0 12.0
306 16.0 15.0 15.0 12.0 16.0
307 15.0 15.0 12.0 16.0 15.0
308 15.0 12.0 16.0 15.0 14.0
309 12.0 16.0 15.0 14.0 17.0
The naive approach:
In[24]:df>10
Out[24]:
0 1 2 3 4
290 False True True True False
291 True True True False False
292 True True False False False
293 True False False False True
294 False False False True True
295 False False True True True
296 False True True True True
297 True True True True True
298 True True True True False
299 True True True False True
300 True True False True True
301 True False True True False
302 False True True False True
303 True True False True True
304 True False True True True
305 False True True True True
306 True True True True True
307 True True True True True
308 True True True True True
309 True True True True True
which misses several legitimate rows, because NaN > 10 evaluates to False.
I need rows 297-309 to be removed. How can I amend the Boolean indexing to ignore NaN values?
You need to combine the boolean condition with isnull using the bitwise | (or) operator. Also wrap each condition in parentheses because of operator precedence:
In [326]:
(df > 10) | (df.isnull())
Out[326]:
0 1 2 3 4
290 False True True True False
291 True True True False False
292 True True False False False
293 True False False False True
294 False False False True True
295 False False True True True
296 False True True True True
297 True True True True True
298 True True True True True
299 True True True True True
300 True True True True True
301 True True True True True
302 True True True True True
303 True True True True True
304 True True True True True
305 True True True True True
306 True True True True True
307 True True True True True
308 True True True True True
309 True True True True True
Use isnull with | (or):
mask = (df>10) | df.isnull()
#alternatively
#mask = (df.gt(10)) | df.isnull()
print (mask)
0 1 2 3 4
290 False True True True False
291 True True True False False
292 True True False False False
293 True False False False True
294 False False False True True
295 False False True True True
296 False True True True True
297 True True True True True
298 True True True True True
299 True True True True True
300 True True True True True
301 True True True True True
302 True True True True True
303 True True True True True
304 True True True True True
305 True True True True True
306 True True True True True
307 True True True True True
308 True True True True True
309 True True True True True
isnull returns True for NaNs:
print (df.isnull())
0 1 2 3 4
290 False False False False False
291 False False False False False
292 False False False False False
293 False False False False False
294 False False False False False
295 False False False False False
296 False False False False False
297 False False False False False
298 False False False False True
299 False False False True False
300 False False True False False
301 False True False False True
302 True False False True False
303 False False True False False
304 False True False False False
305 True False False False False
306 False False False False False
307 False False False False False
308 False False False False False
309 False False False False False
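Both answers stop at the mask. To actually remove the rows, one option is to reduce the mask per row with all(axis=1) and keep the rows where it is False. A minimal sketch on a made-up frame (the index labels are borrowed from the question, the values are shortened to three columns; rows_to_drop and filtered are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the question's data.
df = pd.DataFrame(
    [
        [4.0, 16.0, 18.0],     # has a value <= 10, should stay
        [15.0, 18.0, 20.0],    # all > 10, should go
        [18.0, np.nan, 16.0],  # all non-NaN values > 10, should go
        [np.nan, 9.0, 12.0],   # has a value <= 10, should stay
    ],
    index=[290, 297, 298, 302],
)

# True where the value is > 10 or missing.
mask = (df > 10) | df.isnull()

# A row is dropped when every cell satisfies the mask.
rows_to_drop = mask.all(axis=1)
filtered = df[~rows_to_drop]
print(filtered)
```

Note that with this mask a row consisting only of NaN would also be dropped; if that is not wanted, the condition could be combined with df.notnull().any(axis=1).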