SQL: compare the values of 2 columns and select the row with the max value per group - sql

I have a table like this:
GROUP NAME Value_1 Value_2
1 ABC 0 0
1 DEF 4 4
50 XYZ 6 6
50 QWE 6 7
100 XYZ 26 2
100 QWE 26 2
What I would like to do is group by GROUP and select the name with the highest Value_1. If the Value_1 values are the same, compare Value_2 and select the row with the higher one. If they are still the same, select the first one.
The output should look something like this:
GROUP NAME Value_1 Value_2
1 DEF 4 4
50 QWE 6 7
100 XYZ 26 2
The challenge for me is that I don't know how many categories there are in NAME, so a simple CASE WHEN does not work. Thanks for any help.

You can use window functions to solve the bulk of your problem:
-- "group" is a reserved word, so it has to be quoted (the quoting syntax varies by database)
select t.*
from (select t.*,
             row_number() over (partition by "group" order by value_1 desc, value_2 desc) as seqnum
      from t
     ) t
where seqnum = 1;
The one caveat is the condition:
If they're still the same, select the first one.
SQL tables represent unordered (multi-) sets. There is no "first" one unless a column specifies the ordering. The best you can do is choose an arbitrary value when all the other values are the same.
That said, you might have another column that has an ordering. If so, add that as a third key to the order by.
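As a minimal sketch of that third ordering key, the following runs the query against the sample data through Python's built-in sqlite3 module. The `id` column is hypothetical (it is not in the question) and stands in for whatever column defines "first"; window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical column that records the intended row order.
conn.execute(
    'CREATE TABLE t (id INTEGER, "group" INTEGER, name TEXT, '
    "value_1 INTEGER, value_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "ABC", 0, 0),
        (2, 1, "DEF", 4, 4),
        (3, 50, "XYZ", 6, 6),
        (4, 50, "QWE", 6, 7),
        (5, 100, "XYZ", 26, 2),
        (6, 100, "QWE", 26, 2),
    ],
)

query = """
SELECT "group", name, value_1, value_2
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY "group"
                                ORDER BY value_1 DESC, value_2 DESC, id) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY "group"
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'DEF', 4, 4), (50, 'QWE', 6, 7), (100, 'XYZ', 26, 2)]
```

With `id` as the last key, the tie in group 100 is resolved deterministically in favour of the row that was inserted first.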

Related

SQL - Need to find duplicates where one column can have multiple values

I am pretty sure this SQL requires using GROUP BY and HAVING, but not sure how to write it.
I have a table similar to this:
ID Cust# Order# ItemCode DataPoint1 DataPoint2
1 001 123 I xxxyyyxxx 123456
2 001 123 Insert xxxyyyxxx 123456
3 001 123 Delete asdf 9999
4 001 123 D asdf 9999
In this table Rows 1 & 2 are effectively duplicates, as are rows 3 & 4.
This is determined by the ItemCode having the value of 'I' or 'Insert' in rows 1 & 2. And 'D' or 'Delete' in rows 3 & 4.
How could I write a SQL select statement that returns rows 2 and 4? I am interested in pulling out the duplicated rows with the higher ID value.
Thanks for any help.
Replace the "offending" column with a consistent value. Then, you can use row_number() or a similar mechanism:
select t.*
from (select t.*,
             row_number() over (partition by Cust#, Order#, left(ItemCode, 1), DataPoint1, DataPoint2
                                order by id asc
                               ) as seqnum
      from t
     ) t
where seqnum > 1;
Note: Not all databases support left(), but all support the functionality somehow. This does assume that the first character of the ItemCode is sufficient to identify identical rows, regardless of the value.
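As an illustration of the "supported somehow" remark, here is a small sketch using Python's sqlite3 module, where `substr(item_code, 1, 1)` plays the role of `left()`. The snake_case column names (`cust`, `order_no`, `item_code`, and so on) are renamed stand-ins for the question's columns, chosen to avoid quoting issues; they are assumptions, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are simplified stand-ins for Cust#, Order#, ItemCode, ...
conn.execute(
    "CREATE TABLE t (id INTEGER, cust TEXT, order_no TEXT, item_code TEXT, "
    "data_point1 TEXT, data_point2 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "123", "I", "xxxyyyxxx", "123456"),
        (2, "001", "123", "Insert", "xxxyyyxxx", "123456"),
        (3, "001", "123", "Delete", "asdf", "9999"),
        (4, "001", "123", "D", "asdf", "9999"),
    ],
)

query = """
SELECT id, item_code
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY cust, order_no,
                                             substr(item_code, 1, 1),
                                             data_point1, data_point2
                                ORDER BY id) AS seqnum
      FROM t) AS ranked
WHERE seqnum > 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 'Insert'), (4, 'D')]
```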

How to compute a running sum within each group in SQL

I have a table like this:
code Quantity
1 5
1 6
2 2
2 1-
3 4
.
.
How can I make it look like this?
code Quantity remain
1 5 5
1 6 11
2 2 2
2 1- 1
3 4 4
.
.
Your query presumes an ordering of the rows. I will assume you have such a column.
Assuming the values are numbers (1- ???), then you can simply use a cumulative sum:
select t.*,
sum(quantity) over (partition by code order by ?) as remaining
from t;
The ? is for the column that specifies the ordering.
You can do a window sum, but you need a column to unambiguously order the records within groups sharing the same code. I assumed that this column is called id.
select t.*,
       sum(quantity) over (partition by code order by id) remain
from mytable t
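A runnable sketch of the window sum, using Python's sqlite3 module. It assumes an ordering column named `id` (as in the second answer) and reads the question's "1-" as the number -1; both are assumptions rather than facts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is the assumed ordering column; "1-" from the question is taken as -1.
conn.execute("CREATE TABLE mytable (id INTEGER, code INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 6), (3, 2, 2), (4, 2, -1), (5, 3, 4)],
)

query = """
SELECT code, quantity,
       sum(quantity) OVER (PARTITION BY code ORDER BY id) AS remain
FROM mytable
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 5, 5), (1, 6, 11), (2, 2, 2), (2, -1, 1), (3, 4, 4)]
```

Because `id` is unique, the default window frame (everything up to the current row) gives a strict running total inside each `code`.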

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need an SQL select query (for MS SQL Server) that selects a single ID for each group, the one with the lowest rank. Each group must return only a single ID, even if two rows share the same rank (as IDs 1 and 3 do in the above table). I've tried selecting the min value, but the requirement that only one row be returned, and that the value returned be the ID column, is throwing me.
Does anyone know how to do this?
Use row_number():
select t.*
from (select t.*,
             row_number() over (partition by groupid order by rank) as seqnum
      from t
     ) t
where seqnum = 1;
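A quick sketch of this on the sample data, run through Python's sqlite3 module rather than SQL Server. One addition that is not in the answer above: `id` is appended to the `order by` so that ties on rank are broken deterministically (lowest ID wins); without it, the row chosen among ties is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, groupid INTEGER, "rank" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 10), (5, 2, 1), (6, 3, 1), (7, 4, 5)],
)

query = """
SELECT id, groupid
FROM (SELECT t.*,
             -- id is an added tie-breaker, not part of the original answer
             row_number() OVER (PARTITION BY groupid ORDER BY "rank", id) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY groupid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (5, 2), (6, 3), (7, 4)]
```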

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
      ,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number to get positions in each group assigned with the above query.
select id, value, row_number() over(partition by value, rnum_diff order by id) as pos_in_grp
from (select t.*
            ,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
      from tbl t
     ) t
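To check the approach on the question's input, here is a small sketch that runs the same query through Python's sqlite3 module (not Hive, so this only demonstrates the logic; SQLite 3.25 or later is assumed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1)],
)

query = """
SELECT id, value,
       row_number() OVER (PARTITION BY value, rnum_diff ORDER BY id) AS pos_in_grp
FROM (SELECT t.*,
             row_number() OVER (ORDER BY id)
               - row_number() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
      FROM tbl AS t) AS runs
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
# [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 1, 1), (6, 1, 2), (7, 1, 3)]
```

The row-number difference is 0 for ids 1 and 2 and 2 for ids 5 to 7, which is what separates the two runs of value 1.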

Delete rows that are duplicated and follow each other consecutively

It's hard to formulate, so I'll just show an example, and you are welcome to edit my question and title.
Suppose I have a table:
flag id value datetime
0 b 1 343 13
1 a 1 23 12
2 b 1 21 11
3 b 1 32 10
4 c 2 43 11
5 d 2 43 10
6 d 2 32 9
7 c 2 1 8
For each id I want to squeeze the table by the flag column, such that all duplicate flag values that follow each other collapse into one row with sum aggregation. Desired result:
flag id value
0 b 1 343
1 a 1 23
2 b 1 53
3 c 2 75
4 d 2 32
5 c 2 1
P.S.: I found functions like CONDITIONAL_CHANGE_EVENT, which seem to be able to do that, but the examples in the docs don't work for me.
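Use the difference of row numbers approach to assign a group to each run of consecutive rows with the same flag, then sum the values within each run. The flag has to be part of the grouping key, because the row-number difference alone can coincide for runs of different flags (for example, the flags a, b, a in that order give differences 0, 1, 1). Grouping, rather than select distinct over a window sum, also keeps two separate runs apart when they happen to have the same flag and the same total.
select id, flag, sum(value) as finalvalue
from (
    select t.*,
           row_number() over(partition by id order by datetime)
             - row_number() over(partition by id, flag order by datetime) as grp
    from tbl t
) t
group by id, flag, grp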
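The run-collapsing idea can be tried on the question's sample rows with Python's sqlite3 module. This is a sketch under a few assumptions: the `datetime` column is renamed `dt`, the runs are aggregated with `group by id, flag, grp`, and the output is ordered by descending `dt` within each id to mirror the layout of the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "dt" stands in for the question's datetime column.
conn.execute("CREATE TABLE tbl (flag TEXT, id INTEGER, value INTEGER, dt INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        ("b", 1, 343, 13), ("a", 1, 23, 12), ("b", 1, 21, 11), ("b", 1, 32, 10),
        ("c", 2, 43, 11), ("d", 2, 43, 10), ("d", 2, 32, 9), ("c", 2, 1, 8),
    ],
)

query = """
SELECT id, flag, sum(value) AS total
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY id ORDER BY dt)
               - row_number() OVER (PARTITION BY id, flag ORDER BY dt) AS grp
      FROM tbl AS t) AS runs
GROUP BY id, flag, grp          -- one output row per run of equal flags
ORDER BY id, min(dt) DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
# [(1, 'b', 343), (1, 'a', 23), (1, 'b', 53), (2, 'c', 43), (2, 'd', 75), (2, 'c', 1)]
```

For id 1 the two adjacent b rows (21 and 32) collapse to 53 while the later b row (343) stays separate; for id 2 the two adjacent d rows collapse to 75.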
Here's an approach which uses CONDITIONAL_CHANGE_EVENT:
select
    flag,
    id,
    sum(value) value
from (
    select
        conditional_change_event(flag) over (order by datetime desc) part,
        flag,
        id,
        value
    from so
) t
group by part, flag, id
order by part;
The result is different from your desired result stated in the question because of order by datetime. Adding a separate column for the row number and sorting on that gives the correct result.