If a query returns the table:

col  sub_col
A    A
A    B
A    C
A    A
A    B
A    C
B    A
B    B
B    B
B    C
B    A
B    B
B    B
B    C
Output should be like:

col  sub_col  order_by_sub_col
A    A        0
A    B        1
A    C        2
A    A        0
A    B        1
A    C        2
B    A        0
B    B        1
B    B        2
B    C        3
B    A        0
B    B        1
B    B        2
B    C        3
This is my dataframe:

import pandas as pd

d = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
     'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'D'],
     'col2': ['C', 'C', 'D', 'E', 'F', 'F', 'G', 'H'],
     'data': ['abc', 'def', 'ghk', 'lmn', 'opq', 'rst', 'uvw', 'xyz']}
df = pd.DataFrame(d)
I want to find all values in col2 for each unique value in col1. Think of col1 as a house and col2 as the devices in it.
Output:

col1 col2 data
A    C    abc
          def
     D    ghk
B    E    lmn
     F    opq
          rst
C    G    uvw
D    H    xyz
Update:
Since my original dataset is large (98k rows), it would be great if I could also get a list of the values in col1 that have more than one row in col2. Based on my output above, that list would be ['A', 'B'].
If you insist on getting exactly that output, here's one way:
import numpy as np

# Drop duplicate (col1, col2) pairs, drop the id column, and reset the index
df = df.drop_duplicates(subset=['col1', 'col2']).drop('id', axis=1).reset_index(drop=True)
# Blank out repeated col1 values so each group label appears only once
df['col1'] = np.where(df.col1.duplicated(), '', df.col1)
Which produces:

  col1 col2 data
0    A    C  abc
1         D  ghk
2    B    E  lmn
3         F  opq
4    C    G  uvw
5    D    H  xyz
You might even want to go as far as:
df = df.set_index('col1')
Which produces:

     col2 data
col1
A       C  abc
        D  ghk
B       E  lmn
        F  opq
C       G  uvw
D       H  xyz
To export to csv or excel simply do one of the following:
df.to_csv('filename.csv')
df.to_excel('filename.xlsx')
UPDATE: Based on the update in the question, the list of values from col1 can be obtained from the original (unmodified) dataframe as follows:

list(df.groupby('col1').col1.filter(lambda x: len(x) > 1).unique())
Which produces:
['A', 'B']
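If "more than one row" should actually mean more than one distinct col2 value, a nunique-based variant (a sketch on the sample dataframe from the question) avoids counting exact duplicates like the two (A, C) rows:

```python
import pandas as pd

d = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
     'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'D'],
     'col2': ['C', 'C', 'D', 'E', 'F', 'F', 'G', 'H']}
df = pd.DataFrame(d)

# Number of distinct col2 values for each col1 group
counts = df.groupby('col1')['col2'].nunique()

# Keep only the col1 values with more than one distinct col2
result = counts[counts > 1].index.tolist()
print(result)  # ['A', 'B']
```

On this sample both approaches agree, but they would differ if a group contained several rows with the same col2 value and nothing else.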
Try groupby() with an application of unique():
In [26]: df.groupby('col1').col2.unique()
Out[26]:
col1
A    [C, D]
B    [E, F]
C       [G]
D       [H]
Name: col2, dtype: object
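If a plain Python structure is more convenient downstream than a Series of arrays, the result converts to a dict of lists (a sketch on the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'D'],
                   'col2': ['C', 'C', 'D', 'E', 'F', 'F', 'G', 'H']})

# Map each col1 value to the list of distinct col2 values in its group
mapping = df.groupby('col1')['col2'].unique().apply(list).to_dict()
print(mapping)  # {'A': ['C', 'D'], 'B': ['E', 'F'], 'C': ['G'], 'D': ['H']}
```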
I'm trying to write a query that will return distinct rows while excluding rows that don't have maximum data.
table1
col1 col2 col3 col4 col5
one a b c d
two a b d
three a b c
four a c d
five a b
six a c
seven a e
Basically, I want a query that will return the following from the table above
col1 col2 col3 col4 col5
one a b c d
six a c
seven a e
I have a large dataset (about 1M rows).
Within this dataset, I want to find certain values in one of the columns (or in multiple columns).
For example, df contains:
col1 col2 col3
-------------------
a b c
d e f
g h i
j k l
m n o
What I'm looking for is to search each row and, if the given value exists, output a "YES" in a new col4.
Any help?
Thanks
Scenario 1: search the whole dataframe
We can use DataFrame.eq with any over the column axis, i.e. per row. This means that if the value a is in any of the columns of a row, we get True:
df['indicator'] = df.eq('a').any(axis=1)
col1 col2 col3 indicator
0 a b c True
1 d e f False
2 g h i False
3 j k l False
4 m n o False
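Since the question asked for a literal "YES" in a new column, the boolean indicator can be mapped to strings with numpy.where (a sketch; the 'NO' label for non-matches is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('adgjm'),
                   'col2': list('behkn'),
                   'col3': list('cfilo')})

# 'YES' where any cell in the row equals 'a', 'NO' everywhere else
df['col4'] = np.where(df.eq('a').any(axis=1), 'YES', 'NO')
print(df['col4'].tolist())  # ['YES', 'NO', 'NO', 'NO', 'NO']
```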
Scenario 2: search only some columns
We can apply the same logic to a sub-selection of columns by using iloc to select, for example, the first two columns:
df['indicator'] = df.iloc[:, :2].eq('d').any(axis=1)
col1 col2 col3 indicator
0 a b c False
1 d e f True
2 g h i False
3 j k l False
4 m n o False
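If several candidate values need to be checked at once, DataFrame.isin follows the same pattern (a sketch; 'a' and 'e' are arbitrary example values):

```python
import pandas as pd

df = pd.DataFrame({'col1': list('adgjm'),
                   'col2': list('behkn'),
                   'col3': list('cfilo')})

# True if any cell in the row matches one of the candidate values
df['indicator'] = df.isin(['a', 'e']).any(axis=1)
print(df['indicator'].tolist())  # [True, True, False, False, False]
```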
Count duplicate records by using linq
Col1 col2
x a
x a
x b
x b
y c
y c
y d
y d
z e
z e
z f
Now I want the counts, like the following:
x a 2
x b 2
y c 2
y d 2
Can anyone assist me with the LINQ, please? My attempt:
table
    .GroupBy(x => new { x.col1, x.col2 })
    .Select(g => new { g.Key.col1, g.Key.col2, Count = g.Count() });
var Result =
from t in table
group t by new
{
t.col1,
t.col2,
} into gt
select new
{
col1 = gt.Key.col1,
col2 = gt.Key.col2,
count = gt.Count(),
};