Create new column based on information in two other columns

I have a data frame with 2 columns of information that I want to compare in order to create a condition in a new column.
PPT 1 2
1   A 1
2   A 2
3   A 3
4   B 1
5   B 2
6   B 3
7   C 1
8   C 2
9   C 3
I want the new column to provide a categorisation based on columns 1 and 2 using the following criteria:
if column 1 is A and column 2 is 1, column 3 should be YES
if column 1 is B and column 2 is 2, column 3 should be YES
if column 1 is C and column 2 is 3, column 3 should be YES
in all other cases, column 3 should be NO
PPT 1 2 3
1   A 1 YES
2   A 2 NO
3   A 3 NO
4   B 1 NO
5   B 2 YES
6   B 3 NO
7   C 1 NO
8   C 2 NO
9   C 3 YES

Make a mask with all of your conditions; then you can create the new column set to 'NO' and change all matching rows to 'YES':
# Find rows where the conditions are True
mask = (((df['1'] == 'A') & (df['2'] == 1))
        | ((df['1'] == 'B') & (df['2'] == 2))
        | ((df['1'] == 'C') & (df['2'] == 3)))
# Default the new column to 'NO', then flip the matching rows to 'YES'
df['3'] = 'NO'
df.loc[mask, '3'] = 'YES'
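If the list of letter/value pairs grows, a mapping-based variant may scale better than chaining masks. This is a minimal sketch, assuming the columns are literally named '1' and '2' as in the example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   '2': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Map each letter to the value that should yield YES, then compare
# the mapped value against column '2' in one vectorised step.
match = {'A': 1, 'B': 2, 'C': 3}
df['3'] = np.where(df['1'].map(match) == df['2'], 'YES', 'NO')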

Related

How to check pair of string values in a column, after grouping the dataframe using ID column?

I have a dataframe containing 2 columns: ID and Code.
ID Code Flag
1 A 0
1 C 1
1 B 1
2 A 0
2 B 1
3 A 0
4 C 0
Within each ID, if Code 'A' exists with 'B' or 'C', then it should flag 1.
I tried groupby('ID') with filter(), but it is not giving the expected result. Could anyone please help?
You can do the following:
First use df.groupby('ID') and concatenate each group's codes with 'sum' to create a helper column. Then assign the value 1 where a row has B or C as its Code and the helper column contains an A:
df['s'] = df.groupby('ID').Code.transform('sum')
df['Flag'] = 0
df.loc[((df.Code == 'B') | (df.Code == 'C')) & df.s.str.contains('A'), 'Flag'] = 1
df = df.drop(columns='s')
Output:
ID Code Flag
0 1 A 0
1 1 C 1
2 1 B 1
3 2 A 0
4 2 B 1
5 3 A 0
6 4 C 0
You can use boolean masks: a direct one for B/C and a per-group one for A; then combine them and convert to integer:
# is the Code a B or C?
m1 = df['Code'].isin(['B', 'C'])
# is there also an A in the same group?
m2 = df['Code'].eq('A').groupby(df['ID']).transform('any')
# if both are True, flag 1
df['Flag'] = (m1 & m2).astype(int)
Output:
ID Code Flag
0 1 A 0
1 1 C 1
2 1 B 1
3 2 A 0
4 2 B 1
5 3 A 0
6 4 C 0
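For reference, here is a self-contained version of the mask approach, with the DataFrame rebuilt from the example above:
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 4],
                   'Code': ['A', 'C', 'B', 'A', 'B', 'A', 'C']})

# direct mask: is the Code a B or C?
m1 = df['Code'].isin(['B', 'C'])
# group mask: does the same ID also contain an A?
m2 = df['Code'].eq('A').groupby(df['ID']).transform('any')
df['Flag'] = (m1 & m2).astype(int)
print(df)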

Is there any way to display duplicate column value once in multiple rows in SQL?

I have been researching and Googling forever but cannot find an answer. I have repeated values in one column and would like to show a repeated value only on its first row, leaving it blank on the following rows. Is that even possible in SQL?
What I have:
A B C
A 2 3
A 2 4
B 4 4
B 3 4
C 3 9
What I would like:
A B C
A 2 3
A   4
B 4 4
B 3 4
C   9
Use this:
SELECT A,
       CASE WHEN LAG(B) OVER (ORDER BY A) = B THEN '' ELSE CONVERT(VARCHAR, B) END AS B,
       C
FROM TABLENAME;
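If you ever need the same presentation trick on the pandas side, the LAG comparison translates directly to shift. A minimal sketch, rebuilding the example table and assuming column B is the one to blank:
import pandas as pd

df = pd.DataFrame({'A': ['A', 'A', 'B', 'B', 'C'],
                   'B': [2, 2, 4, 3, 3],
                   'C': [3, 4, 4, 4, 9]})

# Blank out B wherever it repeats the previous row's value,
# mirroring LAG(B) OVER (ORDER BY A) = B.
dup = df['B'].eq(df['B'].shift())
df['B'] = df['B'].astype(str).mask(dup, '')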

How to make one column match duplicates in another column

This problem is out of my ability range and I can’t get anywhere with it beyond knowing I can probably use LEAD, LAG or maybe a cursor?
Here is a breakdown of the table and question:
row_id is always an IDENTITY(1, 1) column.
The set_id column always starts out in groups of 3s (two 0s for the first set_id, don't worry about why).
The letter column is alphabetic. There are varying counts of duplicates.
Here's the original table:
row_id set_id letter
1      0      A
2      0      A
3      1      A
4      1      B
5      1      B
6      2      B
7      2      B
8      2      C
9      3      C
10     3      C
11     3      D
12     4      D
13     4      D
14     4      D
What I need is code that, whenever the letter in the next row is a duplicate, gives that row the same set_id as the previous row, in a new alt_set_id column.
If that doesn't make sense, here is the result I want:
row_id set_id letter alt_set_id
1      0      A      0
2      0      A      0
3      1      A      0
4      1      B      1
5      1      B      1
6      2      B      1
7      2      B      1
8      2      C      2
9      3      C      2
10     3      C      2
11     3      D      3
12     4      D      3
13     4      D      3
14     4      D      3
Here's where I am with the code so far; I'm not really close, but I think I'm on the right path:
SELECT
    *,
    CASE
        WHEN letter = [letter in next row] THEN 'yes'
        ELSE 'no'
    END AS [next row a duplicate?],
    'tbd' AS alt_set_id
FROM
    (SELECT
         *,
         LEAD(letter) OVER (ORDER BY row_id) AS [letter in next row]
     FROM sort_test) AS dt
That query has the below result set, which is something I think I can work with, but it doesn't feel very efficient and I'm not yet getting the result needed in the alt_set_id column:
row_id set_id letter letter in next row next row a duplicate? alt_set_id
1      0      A      A                  yes                   tbd
2      0      A      A                  yes                   tbd
3      1      A      B                  no                    tbd
4      1      B      B                  yes                   tbd
5      1      B      B                  yes                   tbd
6      2      B      B                  yes                   tbd
7      2      B      C                  no                    tbd
8      2      C      C                  yes                   tbd
9      3      C      C                  yes                   tbd
10     3      C      D                  no                    tbd
11     3      D      D                  yes                   tbd
12     4      D      D                  yes                   tbd
13     4      D      D                  yes                   tbd
14     4      D      NULL               no                    tbd
Thanks for any help!
Based on your example data, you want the minimum set_id for each letter. If so, use window functions:
select t.*, min(set_id) over (partition by letter) as alt_set_id
from sort_test t;
If I understand correctly, a simple correlated subquery will give you the desired result:
select *, (select min(set_id) from t t2 where t2.letter = t.letter) as alt_set_id
from t
See working DB Fiddle
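For readers coming from the pandas questions above, the same "minimum set_id per letter" idea is a one-line groupby transform. A sketch, rebuilding the example table:
import pandas as pd

sort_test = pd.DataFrame({
    'row_id': range(1, 15),
    'set_id': [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'letter': list('AAABBBBCCCDDDD')})

# min(set_id) OVER (PARTITION BY letter) becomes transform('min')
sort_test['alt_set_id'] = sort_test.groupby('letter')['set_id'].transform('min')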

Most efficient way to set a dataframe column by indexing into other columns

I have a large DataFrame. One of my columns contains the names of other columns. I want to evaluate this column and set each row to the value of the referenced column:
| A | B | C | Column |
|---|---|---|--------|
| 1 | 3 | 4 | B |
| 2 | 5 | 3 | A |
| 3 | 5 | 9 | C |
Desired output:
| A | B | C | Column |
|---|---|---|--------|
| 1 | 3 | 4 | 3 |
| 2 | 5 | 3 | 2 |
| 3 | 5 | 9 | 9 |
I am achieving this result using:
df.apply(lambda d: eval("d." + d['Column']), axis=1)
But it is very slow, even using swifter. Is there a more efficient way of performing this?
For better performance, use df.to_numpy():
In [365]: df['Column'] = df.to_numpy()[df.index, df.columns.get_indexer(df.Column)]
In [366]: df
Out[366]:
A B C Column
0 1 3 4 3
1 2 5 3 2
2 3 5 9 9
For Pandas < 1.2.0, use lookup:
df['Column'] = df.lookup(df.index, df['Column'])
From 1.2.0 onwards, lookup is deprecated; you can just use a for loop:
df['Column'] = [df.at[idx, r['Column']] for idx, r in df.iterrows()]
Output:
A B C Column
0 1 3 4 3
1 2 5 3 2
2 3 5 9 9
Since lookup is going to be deprecated, try the NumPy method with get_indexer:
df['new'] = df.values[df.index,df.columns.get_indexer(df.Column)]
df
Out[75]:
A B C Column new
0 1 3 4 B 3
1 2 5 3 A 2
2 3 5 9 C 9
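Both NumPy-based answers perform the same two steps, which may be easier to see separated. A minimal sketch, rebuilding the example frame:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 5, 5],
                   'C': [4, 3, 9], 'Column': ['B', 'A', 'C']})

# Step 1: turn each referenced column name into its integer position.
cols = df.columns.get_indexer(df['Column'])   # e.g. array([1, 0, 2])
# Step 2: pick one cell per row with row/column fancy indexing.
df['Column'] = df.to_numpy()[df.index, cols]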

Updating multiple columns based on multiple conditions

I have the below table with results for both morning and afternoon sessions (for different periods).
I would like to update the results based on a simple condition:
check whether there was a change between two consecutive morning sessions; if not, add 5 to the score.
Example: for ID=1, Mor2=C and Mor3=C, so Score_M3 = 5+5 = 10 (new value). All updated values are marked in the 'Wanted' table.
How can I write this in SQL? I will have a lot of columns and IDs.
My dataset:
ID Mor1 Aft1 Mor2 Aft2 Mor3 Aft3 Score_M1 Score_A1 Score_M2 Score_A2 Score_M3 Score_A3
1 A A C B C B 1 1 1 1 5 6
2 C C C B C B 1 1 1 1 4 5
3 A A A A A A 1 1 1 1 4 1
Wanted :
ID Mor1 Aft1 Mor2 Aft2 Mor3 Aft3 Score_M1 Score_A1 Score_M2 Score_A2 Score_M3 Score_A3
1 A A C B C B 1 1 1 1 *10 6
2 C C C B C B 1 1 *6 1 *9 5
3 A A A A A A 1 1 *6 1 *9 1
Here is some SQL to get you started; you can add more columns as you see fit.
We can restate the condition as SAME, rather than changed:
If Mor1 = Mor2 then add 5 to Score_M2.
If Mor2 = Mor3 then add 5 to Score_M3.
UPDATE [StackOver].[dbo].[UpdateMultiCols]
SET [Score_M2] = Score_M2 +
        CASE WHEN Mor1 = Mor2 THEN 5 ELSE 0 END,
    [Score_M3] = Score_M3 +
        CASE WHEN Mor2 = Mor3 THEN 5 ELSE 0 END
GO
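For comparison with the pandas questions above, the same conditional increment is a vectorised boolean multiply. A sketch covering only the morning columns shown:
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'Mor1': ['A', 'C', 'A'], 'Mor2': ['C', 'C', 'A'],
                   'Mor3': ['C', 'C', 'A'],
                   'Score_M2': [1, 1, 1], 'Score_M3': [5, 4, 4]})

# True compares to 1, so the equality mask times 5 adds 5 exactly
# where two consecutive morning sessions are the same.
df['Score_M2'] += df['Mor1'].eq(df['Mor2']) * 5
df['Score_M3'] += df['Mor2'].eq(df['Mor3']) * 5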