Updating multiple columns based on multiple conditions - sql

I have the table below with results for both Morning and Afternoon sessions (for different periods).
I would like to update the results based on a simple condition:
Check whether there was a change between two consecutive morning sessions; if not, add 5 to the score.
Example: for ID=1, Mor2=C and Mor3=C, so Score_M3 = 5+5 = 10 (new value). All updated values are marked with * in the 'Wanted' table.
How can I write this in SQL? I will have a lot of columns and IDs.
My dataset:
ID Mor1 Aft1 Mor2 Aft2 Mor3 Aft3 Score_M1 Score_A1 Score_M2 Score_A2 Score_M3 Score_A3
1 A A C B C B 1 1 1 1 5 6
2 C C C B C B 1 1 1 1 4 5
3 A A A A A A 1 1 1 1 4 1
Wanted :
ID Mor1 Aft1 Mor2 Aft2 Mor3 Aft3 Score_M1 Score_A1 Score_M2 Score_A2 Score_M3 Score_A3
1 A A C B C B 1 1 1 1 *10 6
2 C C C B C B 1 1 *6 1 *9 5
3 A A A A A A 1 1 *6 1 *9 1

Here is some SQL to get you started; you can add more columns in the same way.
The condition is easier to express as "same" rather than "change":
If Mor1 = Mor2, then add 5 to Score_M2
If Mor2 = Mor3, then add 5 to Score_M3
UPDATE [StackOver].[dbo].[UpdateMultiCols]
SET
    [Score_M2] = Score_M2 + CASE WHEN Mor1 = Mor2 THEN 5 ELSE 0 END
    ,[Score_M3] = Score_M3 + CASE WHEN Mor2 = Mor3 THEN 5 ELSE 0 END
GO
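To check the UPDATE against the sample data, here is a minimal sketch in Python using the standard-library sqlite3 module and an in-memory database. The table name scores is a placeholder chosen for this sketch (not the original [StackOver].[dbo].[UpdateMultiCols]), only the morning columns are included, and the SQL Server brackets and GO are dropped. It relies on the SET expressions being evaluated against each row's values from before the update.

```python
import sqlite3

# In-memory database; "scores" is a hypothetical table name for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (ID INTEGER, Mor1 TEXT, Mor2 TEXT, Mor3 TEXT, "
    "Score_M1 INTEGER, Score_M2 INTEGER, Score_M3 INTEGER)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "C", "C", 1, 1, 5),
        (2, "C", "C", "C", 1, 1, 4),
        (3, "A", "A", "A", 1, 1, 4),
    ],
)

# Add 5 to a morning score when the session equals the previous morning session.
conn.execute(
    """
    UPDATE scores
    SET Score_M2 = Score_M2 + CASE WHEN Mor1 = Mor2 THEN 5 ELSE 0 END,
        Score_M3 = Score_M3 + CASE WHEN Mor2 = Mor3 THEN 5 ELSE 0 END
    """
)

updated_scores = conn.execute(
    "SELECT ID, Score_M1, Score_M2, Score_M3 FROM scores ORDER BY ID"
).fetchall()
print(updated_scores)
```

On the three sample rows this should reproduce the starred values in the 'Wanted' table. For more periods, each additional Score_Mn column would need its own CASE line comparing Mor(n-1) with Mor(n).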

Related

Create new column based on information in two other columns

I have a data frame with 2 columns of information that I want to compare to create a new condition in a new column.
PPT 1 2
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 B 3
7 C 1
8 C 2
9 C 3
I want the new column to provide a categorisation based on columns 1 and 2 using the following criteria:
if column 1 is A and column 2 equals 1, column 3 should be YES
if column 1 is B and column 2 equals 2, column 3 should be YES
if column 1 is C and column 2 equals 3, column 3 should be YES
In all other instances, column 3 should be NO
PPT 1 2 3
1 A 1 YES
2 A 2 NO
3 A 3 NO
4 B 1 NO
5 B 2 YES
6 B 3 NO
7 C 1 NO
8 C 2 NO
9 C 3 YES
Make a mask with all of your conditions, then create the new column set to 'NO' and change all matching rows to 'YES':
# Find rows where conditions are True
mask = ((df['1'] == 'A') & (df['2'] == 1)) | ((df['1'] == 'B') & (df['2'] == 2)) | ((df['1'] == 'C') & (df['2'] == 3))
# Create new column using rows
df['3'] = 'NO'
df.loc[mask, '3'] = 'YES'
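If the list of letter/number pairs grows, the same mask idea can be driven by a lookup table instead of a long chain of conditions. The following is a minimal sketch under that assumption; the dictionary name wanted is hypothetical, and the data frame is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample data from the question (columns are named '1' and '2').
df = pd.DataFrame({
    "1": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "2": [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# Hypothetical lookup: the value column '2' must have for each letter in column '1'.
wanted = {"A": 1, "B": 2, "C": 3}

# Map each letter to its wanted value and compare row by row.
# Letters missing from the lookup map to NaN, which never compares equal.
mask = df["1"].map(wanted).eq(df["2"])

df["3"] = "NO"
df.loc[mask, "3"] = "YES"
print(df)
```

The trade-off is that this only covers conditions of the form "letter X pairs with exactly one value"; anything more complex still needs explicit boolean conditions as in the answer above.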

How to check pair of string values in a column, after grouping the dataframe using ID column?

I have a dataframe containing 2 columns: ID and Code.
ID Code Flag
1 A 0
1 C 1
1 B 1
2 A 0
2 B 1
3 A 0
4 C 0
Within each ID, if Code 'A' exists together with 'B' or 'C', then the 'B' and 'C' rows should be flagged 1.
I tried groupby('ID') with filter(), but it does not give the expected result. Could anyone please help?
You can do the following:
First use df.groupby('ID') and concatenate the codes using 'sum' to create a new column. Then assign the value 1 if a row has B or C as Code and the new column contains an A:
df['s'] = df.groupby('ID').Code.transform('sum')
df['Flag'] = 0
df.loc[((df.Code == 'B') | (df.Code == 'C')) & df.s.str.contains('A'), 'Flag'] = 1
df = df.drop(columns = 's')
Output:
ID Code Flag
0 1 A 0
1 1 C 1
2 1 B 1
3 2 A 0
4 2 B 1
5 3 A 0
6 4 C 0
You can use boolean masks, direct for B/C, per group for A, then combine them and convert to integer:
# is the Code a B or C?
m1 = df['Code'].isin(['B', 'C'])
# is there also an A in the same group?
m2 = df['Code'].eq('A').groupby(df['ID']).transform('any')
# if both are True, flag 1
df['Flag'] = (m1&m2).astype(int)
Output:
ID Code Flag
0 1 A 0
1 1 C 1
2 1 B 1
3 2 A 0
4 2 B 1
5 3 A 0
6 4 C 0
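For reuse, the boolean-mask approach can be wrapped in a small helper. This is only a sketch: the function name flag_partners and its parameters are hypothetical, and it assumes the rule is "flag rows whose code is one of the partner codes when the anchor code occurs in the same ID group".

```python
import pandas as pd

def flag_partners(df, anchor="A", partners=("B", "C")):
    """Return a 0/1 Series: 1 for partner rows whose ID group also contains the anchor."""
    is_partner = df["Code"].isin(list(partners))
    # True for every row of a group if any row in that group has the anchor code.
    group_has_anchor = df["Code"].eq(anchor).groupby(df["ID"]).transform("any")
    return (is_partner & group_has_anchor).astype(int)

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 4],
    "Code": ["A", "C", "B", "A", "B", "A", "C"],
})
df["Flag"] = flag_partners(df)
print(df)
```

Comparing codes with eq and isin avoids substring matching, so this variant should also behave sensibly if codes are longer than one character (for example 'BA'), where a str.contains('A') check on concatenated codes could match unintentionally.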

Is there a function to do one-hot encoding and remove duplicates in R?

I have this database
ID LABEL
1 A
1 B
2 B
3 c
I'm trying to do a one-hot encoding, which I was able to do. However, I also need to remove the duplicated IDs. My one-hot encoded output currently looks like this:
ID A B C
1 1 0 0
1 0 1 0
2 0 1 0
3 0 0 1
and I need the final database to look like this:
ID A B C
1 1 1 0
2 0 1 0
3 0 0 1
This is my code:
dummy <- dummyVars('~ .', data = data_to_be_encoded)
encoded_data <- data.frame(predict(dummy, newdata = data_to_be_encoded))

How to make one column match duplicates in another column

This problem is out of my ability range and I can’t get anywhere with it beyond knowing I can probably use LEAD, LAG or maybe a cursor?
Here is a breakdown of the table and question:
row_id is always an IDENTITY(1, 1) column.
The set_id column always starts out in groups of 3 (except for the first set_id, which has only two 0s; don't worry about why).
The letter column is alphabetic. There are varying counts of duplicates.
Here's the original table:
row_id set_id letter
1 0 A
2 0 A
3 1 A
4 1 B
5 1 B
6 2 B
7 2 B
8 2 C
9 3 C
10 3 C
11 3 D
12 4 D
13 4 D
14 4 D
What I need is code that does the following: if the next row has the same letter as the current row, then the alt_set_id of the next row should be the same as that of the current row.
If that doesn't make sense, here is the result I want:
row_id set_id letter alt_set_id
1 0 A 0
2 0 A 0
3 1 A 0
4 1 B 1
5 1 B 1
6 2 B 1
7 2 B 1
8 2 C 2
9 3 C 2
10 3 C 2
11 3 D 3
12 4 D 3
13 4 D 3
14 4 D 3
Here's where I am with the code so far. I'm not really close, but I think I am on the right path:
SELECT
    *,
    CASE
        WHEN letter = [letter in next row]
            THEN 'yes'
        ELSE 'no'
    END AS 'next row a duplicate?',
    'tbd' AS alt_set_id
FROM
    (SELECT
        *,
        LEAD(letter) OVER (ORDER BY row_id) AS 'letter in next row'
    FROM
        sort_test) AS dt
WHERE
    row_id = row_id
That query has the below result set, which is something I think I can work with, but it doesn't feel very efficient and I'm not yet getting the result needed in the alt_set_id column:
row_id  set_id  letter  letter in next row  next row a duplicate?  alt_set_id
1       0       A       A                   yes                    tbd
2       0       A       A                   yes                    tbd
3       1       A       B                   no                     tbd
4       1       B       B                   yes                    tbd
5       1       B       B                   yes                    tbd
6       2       B       B                   yes                    tbd
7       2       B       C                   no                     tbd
8       2       C       C                   yes                    tbd
9       3       C       C                   yes                    tbd
10      3       C       D                   no                     tbd
11      3       D       D                   yes                    tbd
12      4       D       D                   yes                    tbd
13      4       D       D                   yes                    tbd
14      4       D       NULL                no                     tbd
Thanks for any help!
Based on your example data, you want the minimum set_id for each letter. If so, use a window function:
select t.*, min(set_id) over (partition by letter) as alt_set_id
from sort_test t;
If I understand correctly, a simple correlated subquery will give you the desired result:
select *, (select Min(set_Id) from t t2 where t2.letter=t.letter) as alt_set_id
from t
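Both queries can be tried on the sample data with a short Python sketch using the standard-library sqlite3 module. This is an illustration only: it assumes an SQLite build with window-function support (3.25 or later) rather than the SQL Server setup in the question, and it uses the table name sort_test for both queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sort_test (row_id INTEGER PRIMARY KEY, set_id INTEGER, letter TEXT)"
)

# Sample data from the question: 14 rows, row_id starting at 1.
set_ids = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
letters = "AAABBBBCCCDDDD"
conn.executemany(
    "INSERT INTO sort_test (row_id, set_id, letter) VALUES (?, ?, ?)",
    [(i + 1, s, l) for i, (s, l) in enumerate(zip(set_ids, letters))],
)

# Window-function version: minimum set_id within each letter.
window_rows = conn.execute(
    "SELECT row_id, MIN(set_id) OVER (PARTITION BY letter) AS alt_set_id "
    "FROM sort_test ORDER BY row_id"
).fetchall()

# Correlated-subquery version of the same idea.
subquery_rows = conn.execute(
    "SELECT row_id, (SELECT MIN(t2.set_id) FROM sort_test t2 "
    "WHERE t2.letter = t.letter) AS alt_set_id "
    "FROM sort_test t ORDER BY row_id"
).fetchall()

alt_set_ids = [alt for _, alt in window_rows]
print(alt_set_ids)
```

Both approaches depend on each letter appearing in a single contiguous run, as in the sample. If a letter could reappear later in a separate run, the minimum per letter would merge the two runs, which may not be what is wanted.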

Pandas merge conflict rows by counts?

Conflict rows are two rows that have the same feature but different labels, like this:
feature label
a 1
a 0
Now, I want to merge these conflict rows into a single label chosen by their counts. If feature a has more rows labeled 1 than 0, then a will be labeled as 1. Otherwise, a should be labeled as 0.
I can find these conflicts with df1 = df.groupby('feature', as_index=False).nunique() followed by df1 = df1[df1['label'] == 2], and their value counts with df2 = df.groupby("feature")["label"].value_counts().reset_index(name="counts").
But how do I get these conflict rows and their counts in one DataFrame (df_conflict = ?), and then merge them by counts (df_merged = merge(df))?
Let's take df = pd.DataFrame({"feature":['a','a','b','b','a','c','c','d'],'label':[1,0,0,1,1,0,0,1]}) as an example.
feature label
0 a 1
1 a 0
2 b 0
3 b 1
4 a 1
5 c 0
6 c 0
7 d 1
df_conflict should be :
feature label counts
a 1 2
a 0 1
b 0 1
b 1 1
And df_merged will be:
feature label
a 1
b 0
c 0
d 1
I think you first need to keep only the groups with more than one unique label, using DataFrameGroupBy.nunique with GroupBy.transform, and then apply SeriesGroupBy.value_counts:
df1 = df[df.groupby('feature')['label'].transform('nunique').gt(1)]
df_conflict = df1.groupby('feature')['label'].value_counts().reset_index(name='count')
print (df_conflict)
feature label count
0 a 1 2
1 a 0 1
2 b 0 1
3 b 1 1
For the second part, get the most frequent label for each feature:
df_merged = df.groupby('feature')['label'].agg(lambda x: x.value_counts().index[0]).reset_index()
print (df_merged)
feature label
0 a 1
1 b 0
2 c 0
3 d 1
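The question asks for label 1 only when there are more 1s than 0s, with 0 otherwise, which includes ties such as feature b. A minimal sketch that makes this tie rule explicit, assuming the labels are strictly 0/1 (the variable name share_of_ones is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "feature": ["a", "a", "b", "b", "a", "c", "c", "d"],
    "label": [1, 0, 0, 1, 1, 0, 0, 1],
})

# For 0/1 labels the group mean is the share of 1s.
share_of_ones = df.groupby("feature")["label"].mean()

# Strictly more 1s than 0s means a share above 0.5; ties fall back to 0.
df_merged = share_of_ones.gt(0.5).astype(int).reset_index(name="label")
print(df_merged)
```

This does not depend on how value_counts orders labels with equal counts, but it would need rethinking for more than two label values.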