Mark the record with the lowest value in a group in SQL - sql

I have a table that looks like this:
ID1   ID2   Name
111   223   ABC
111   225   ABC
111   227   ABC
113   234   DEF
113   242   DEF
113   248   DEF
113   259   DEF
113   288   DEF
What I am trying to achieve is to mark the record that has the lowest value in the ID2 column within every ID1 group, using a SELECT statement, e.g.:
ID1   ID2   Name   R
111   223   ABC    Y
111   225   ABC
111   227   ABC
113   234   DEF    Y
113   242   DEF
113   248   DEF
113   259   DEF
113   288   DEF
116   350   GHI    Y
116   356   GHI
How do I achieve this in a SELECT statement?

The window functions should do the trick. Use dense_rank() instead of row_number() if you want tied lowest values to all be marked.
Select *
     , R = case when row_number() over (partition by ID1, Name order by ID2) = 1
                then 'Y'
                else ''
           end
From YourTable
I should add: window functions can be invaluable. They are well worth the time it takes to experiment with them.
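To try the idea without a SQL Server instance, here is a minimal sketch that runs the same CASE/ROW_NUMBER() pattern through Python's built-in sqlite3 module. SQLite stands in for SQL Server here (it needs SQLite 3.25 or newer for window functions), so the `R = ...` alias is written as `... AS R`; the function name `mark_lowest` and the `ties` flag are assumptions made for this illustration only.

```python
import sqlite3

def mark_lowest(rows, ties=False):
    """Return (ID1, ID2, Name, R) rows, with R = 'Y' on the lowest ID2 per group.

    With ties=True, DENSE_RANK() is used so that tied lowest values are all marked.
    """
    rank_fn = "DENSE_RANK" if ties else "ROW_NUMBER"
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (ID1 INTEGER, ID2 INTEGER, Name TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT ID1, ID2, Name,
               CASE WHEN {rank_fn}() OVER (PARTITION BY ID1, Name ORDER BY ID2) = 1
                    THEN 'Y' ELSE '' END AS R
        FROM YourTable
        ORDER BY ID1, ID2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [
    (111, 223, "ABC"), (111, 225, "ABC"), (111, 227, "ABC"),
    (113, 234, "DEF"), (113, 242, "DEF"),
]

for row in mark_lowest(sample):
    print(row)
```

Only the first row of each ID1 group (223 and 234 in this sample) comes back with 'Y'.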

Related

SQL to find related rows in Loop in ANSI SQL or Snowflake SQL

I have a requirement to link all related customer IDs and assign one Unified Cust ID to every related CUST_ID.
For example, given the data below:
INPUT DATA
PK_ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
1       123         456         567
2       898         567         780
3       999         780         111
4       111         222         333
Rows that share any value across CUST_ID_1/CUST_ID_2/CUST_ID_3 need to be linked, and one Unified ID assigned to all the linked rows.
OUTPUT DATA
Unified ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
1000         123         456         567
1000         898         567         780
1000         999         780         111
1000         111         222         333
I am trying to do this with a self join, but the number of joins needed cannot be fixed in advance. Is there a function or ANSI SQL feature that can help with this?
What I have tried:
CREATE TEMP TBL_TEMP AS(
SELECT A.PK_ID
FROM TBL A
LEFT JOIN TBL B
ON A.CUST_ID_1=B.CUST_ID_1
AND A.PK_ID<>B.PK_ID)
UPDATE TBL
FROM TBL_TEMP
SET UNIFIED_ID=SEQ_UNIF_ID.nextval
WHERE TBL.PK_ID=TBL_TEMP.PK_ID
I would have to write this update for each column and run it multiple times.
If you are OK with gaps in the sequence values, the following is what I can come up with for now.
update cust_temp a
set unified_id = t.unified_id
from
(
    select
        case
            when (select count(*) from cust_temp b
                  where arrays_overlap(array_construct(a.cust_id_1, a.cust_id_2, a.cust_id_3),
                                       array_construct(b.cust_id_1, b.cust_id_2, b.cust_id_3))) > 1 -- match across data-set
            then 1000 -- same value for common rows
            else ts.nextval -- using sequence for non-common rows
        end unified_id,
        a.cust_id_1, a.cust_id_2, a.cust_id_3
    from cust_temp a, table(getnextval(SEQ_UNIF_ID)) ts
) t
where t.cust_id_1 = a.cust_id_1
  and t.cust_id_2 = a.cust_id_2
  and t.cust_id_3 = a.cust_id_3;
Updated data-set
select * from cust_temp;
UNIFIED_ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
1000         123         456         567
1000         898         567         780
1000         111         222         333
20000        100         200         300
1000         999         780         111
1000         234         123         901
23000        260         360         460
24000        160         560         760
Original data set -
select * from cust_temp;
UNIFIED_ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
NULL         123         456         567
NULL         898         567         780
NULL         111         222         333
NULL         100         200         300
NULL         999         780         111
NULL         234         123         901
NULL         260         360         460
NULL         160         560         760
The arrays_overlap logic is thanks to @Simeon.
The following procedure can be used:
EXECUTE IMMEDIATE $$
DECLARE
    duplicate number;
    x number;
BEGIN
    duplicate := (select count(cnt)
                  from (select a.unified_id, count(*) cnt
                        from cust_temp a,
                             cust_temp b
                        where arrays_overlap(array_construct(a.cust_id_1, a.cust_id_2, a.cust_id_3),
                                             array_construct(b.cust_id_1, b.cust_id_2, b.cust_id_3))
                          AND a.cust_id_1 != b.cust_id_1
                          AND a.cust_id_2 != b.cust_id_2
                          AND a.cust_id_3 != b.cust_id_3
                        group by a.unified_id)
                  where cnt > 1);
    for x in 1 to duplicate do
        update cust_temp a
        set a.unified_id = (select min(b.unified_id) uid
                            from cust_temp b
                            where arrays_overlap(array_construct(a.cust_id_1, a.cust_id_2, a.cust_id_3),
                                                 array_construct(b.cust_id_1, b.cust_id_2, b.cust_id_3)));
    end for;
END;
$$
;
This produces the following output data-set:
UNIFIED_ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
1000         100         200         300
2000         123         456         567
2000         898         567         780
2000         111         222         333
2000         999         780         111
2000         234         123         901
7000         260         360         460
8000         160         560         760
8000         186         160         766
for this input data-set:
UNIFIED_ID   CUST_ID_1   CUST_ID_2   CUST_ID_3
1000         100         200         300
2000         123         456         567
3000         898         567         780
4000         111         222         333
5000         999         780         111
6000         234         123         901
7000         260         360         460
8000         160         560         760
9000         186         160         766
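The update inside the procedure's loop replaces each row's unified_id with the smallest unified_id among the rows whose customer IDs overlap with it. Below is a minimal plain-Python sketch of that same propagation, offered only to make the logic easier to follow. It differs from the procedure in two ways that are assumptions of this sketch: it repeats until nothing changes instead of running a counted number of passes, and it updates rows one at a time rather than in a single set-based statement. The function name `propagate_min_ids` is hypothetical.

```python
def propagate_min_ids(rows):
    """rows: list of (unified_id, cust_id_1, cust_id_2, cust_id_3) tuples.

    Repeatedly gives each row the smallest unified_id found among rows that
    share at least one customer id with it, until no row changes.
    """
    ids = [r[0] for r in rows]
    keys = [set(r[1:]) for r in rows]
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            # A row always overlaps with itself, so min() never sees an empty input.
            best = min(ids[j] for j in range(len(rows)) if keys[i] & keys[j])
            if best < ids[i]:
                ids[i] = best
                changed = True
    return [(ids[i],) + tuple(rows[i][1:]) for i in range(len(rows))]

input_rows = [
    (1000, 100, 200, 300),
    (2000, 123, 456, 567),
    (3000, 898, 567, 780),
    (4000, 111, 222, 333),
    (5000, 999, 780, 111),
    (6000, 234, 123, 901),
    (7000, 260, 360, 460),
    (8000, 160, 560, 760),
    (9000, 186, 160, 766),
]

for row in propagate_min_ids(input_rows):
    print(row)
```

On the input data-set above, rows 2000 through 6000 are chained together through 567, 780, 111 and 123, so they all end up with 2000, and 9000 joins 8000 through 160.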

SQL Server: LAG() OVER (ORDER BY Y) apply same result for duplicate Y value

When I use
LAG(Static_Col_2, 1) OVER (ORDER BY Static_Col_1) AS LAGged_Col
I get these results:
Static_Col_1 Static_Col_2 LAGged_Col
----------------------------------------
1 456 NULL
2 457 456
3 458 457
4 459 458
5 460 459
5 461 460
5 462 461
But I want:
Static_Col_1 Static_Col_2 LAGged_Col
----------------------------------------
1 456 NULL
2 457 456
3 458 457
4 459 458
5 460 459
5 461 459
5 462 459
When '5' repeats the LAG should point to '4' every time.
I don't think you can do this in SQL Server with a simple window function. You can nest window functions or use a group by/join:
select t.*, tt.prev_col2
from t join
     (select col1, lag(max(col2)) over (order by col1) as prev_col2
      from t
      group by col1
     ) tt
     on t.col1 = tt.col1
order by 1;
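The group by/join approach can be checked against the question's data with a small sketch using Python's built-in sqlite3 module. SQLite (3.25 or newer) stands in for SQL Server here, and the aggregation is split into two CTEs instead of nesting max() inside lag(); the CTE names `per_key` and `lagged` and the function name are assumptions of this sketch.

```python
import sqlite3

def lag_by_distinct_key(rows):
    """Return (col1, col2, prev_col2), where prev_col2 comes from the previous
    distinct col1 value, so duplicate col1 rows all get the same lagged value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH per_key AS (
            -- one row per col1, carrying its largest col2
            SELECT col1, MAX(col2) AS max_col2
            FROM t
            GROUP BY col1
        ),
        lagged AS (
            -- lag over the de-duplicated keys
            SELECT col1, LAG(max_col2) OVER (ORDER BY col1) AS prev_col2
            FROM per_key
        )
        SELECT t.col1, t.col2, lagged.prev_col2
        FROM t
        JOIN lagged ON t.col1 = lagged.col1
        ORDER BY t.col1, t.col2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

data = [(1, 456), (2, 457), (3, 458), (4, 459), (5, 460), (5, 461), (5, 462)]

for row in lag_by_distinct_key(data):
    print(row)
```

All three rows with col1 = 5 come back with 459, the value belonging to col1 = 4.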

Retrieving data from unnormalized vertically ordered table

I have the following relation:
employeevalue(id, name, value, code)
id name value code
101 bobby 150 100
101 bobby 12 150
101 bobby 14.6 200
102 mary 189 100
102 mary 128 150
102 mary 112 200
103 john 112 100
103 john 13 150
103 john 76 200
Where code 100 is value1, 150 is value2 and 200 is value3. How could I write an SQL statement to retrieve the following from this table?
id name value1 value2 value3
101 bobby 150 12 14.6
102 mary 189 128 112
103 john 112 13 76
You can do this with conditional aggregation:
select id, name,
       max(case when code = 100 then value end) as value1,
       max(case when code = 150 then value end) as value2,
       max(case when code = 200 then value end) as value3
from employeevalue
group by id, name;
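As a quick check of the conditional aggregation, the sketch below loads the question's rows into an in-memory SQLite database through Python's sqlite3 module and runs the pivot. SQLite is only a stand-in for whichever database is actually in use, and the function name `pivot_values` is an assumption of this sketch.

```python
import sqlite3

def pivot_values(rows):
    """Pivot (id, name, value, code) rows into one row per employee."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE employeevalue (id INTEGER, name TEXT, value REAL, code INTEGER)"
    )
    con.executemany("INSERT INTO employeevalue VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, name,
               MAX(CASE WHEN code = 100 THEN value END) AS value1,
               MAX(CASE WHEN code = 150 THEN value END) AS value2,
               MAX(CASE WHEN code = 200 THEN value END) AS value3
        FROM employeevalue
        GROUP BY id, name
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

data = [
    (101, "bobby", 150, 100), (101, "bobby", 12, 150), (101, "bobby", 14.6, 200),
    (102, "mary", 189, 100), (102, "mary", 128, 150), (102, "mary", 112, 200),
    (103, "john", 112, 100), (103, "john", 13, 150), (103, "john", 76, 200),
]

for row in pivot_values(data):
    print(row)
```

Each CASE expression yields NULL for rows with a different code, and MAX ignores NULLs, so every output column picks up exactly the one matching value per employee.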

group by column not having specific value

I am trying to obtain a list of Case_Ids where the case does not contain a specific RoleId, using Microsoft SQL Server 2012.
For example, I would like to obtain a collection of Case_Id's that do not contain a RoleId of 4.
So from the data set below the query would exclude Case_Id's 49, 50, and 53.
Id RoleId Person_Id Case_Id
--------------------------------------
108 4 108 49
109 1 109 49
110 4 110 50
111 1 111 50
112 1 112 51
113 2 113 52
114 1 114 52
115 7 115 53
116 4 116 53
117 3 117 53
So far I have tried the following
SELECT Case_Id
FROM [dbo].[caseRole] cr
WHERE cr.RoleId!=4
GROUP BY Case_Id ORDER BY Case_Id
The not exists operator seems to fit your need exactly:
SELECT DISTINCT Case_Id
FROM [dbo].[caseRole] cr
WHERE NOT EXISTS (SELECT *
FROM [dbo].[caseRole] cr_inner
WHERE cr_inner.Case_Id = cr.case_id
AND cr_inner.RoleId = 4);
Just add a having clause instead of where:
SELECT Case_Id
FROM [dbo].[caseRole] cr
GROUP BY Case_Id
HAVING SUM(case when cr.RoleId = 4 then 1 else 0 end) = 0
ORDER BY Case_Id;
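The original attempt fails because WHERE filters individual rows, not whole cases: a case with a RoleId of 4 still survives through its other rows. The sketch below compares all three queries on the question's data using Python's sqlite3 module. SQLite stands in for SQL Server, so the `[dbo].` schema prefix is dropped; the `QUERIES` dictionary and `run_all` are names made up for this illustration.

```python
import sqlite3

ROWS = [
    (108, 4, 108, 49), (109, 1, 109, 49),
    (110, 4, 110, 50), (111, 1, 111, 50),
    (112, 1, 112, 51),
    (113, 2, 113, 52), (114, 1, 114, 52),
    (115, 7, 115, 53), (116, 4, 116, 53), (117, 3, 117, 53),
]

QUERIES = {
    # The question's attempt: removes rows with RoleId 4, not cases.
    "where_filter": """
        SELECT Case_Id FROM caseRole
        WHERE RoleId != 4
        GROUP BY Case_Id ORDER BY Case_Id
    """,
    "not_exists": """
        SELECT DISTINCT Case_Id FROM caseRole cr
        WHERE NOT EXISTS (SELECT * FROM caseRole cr_inner
                          WHERE cr_inner.Case_Id = cr.Case_Id
                            AND cr_inner.RoleId = 4)
        ORDER BY Case_Id
    """,
    "having": """
        SELECT Case_Id FROM caseRole
        GROUP BY Case_Id
        HAVING SUM(CASE WHEN RoleId = 4 THEN 1 ELSE 0 END) = 0
        ORDER BY Case_Id
    """,
}

def run_all(rows):
    """Run every query and return {query name: list of Case_Id values}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE caseRole (Id INTEGER, RoleId INTEGER, Person_Id INTEGER, Case_Id INTEGER)"
    )
    con.executemany("INSERT INTO caseRole VALUES (?, ?, ?, ?)", rows)
    results = {
        name: [r[0] for r in con.execute(sql).fetchall()]
        for name, sql in QUERIES.items()
    }
    con.close()
    return results

for name, case_ids in run_all(ROWS).items():
    print(name, case_ids)
```

The row-level filter still returns cases 49, 50 and 53, while both the NOT EXISTS and the HAVING versions return only 51 and 52.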

Using sequences to create group ID

I'm attempting to create group_ids based on a set of item_ids. The only indication that the item_ids are part of a single group is the fact that item_ids are sequential. For example, based on the first two columns below, the output I want is the third:
item item_id group_id
ABC 282 2
ABC 283 2
ABC 284 2
ABC 285 2
ABC 051 3
ABC 052 3
ABC 189 4
ABC 231 5
ABC 232 5
ABC 233 5
ABC 234 5
ABC 247 6
ABC 248 6
ABC 249 6
ABC 250 6
ABC 091 7
ABC 092 7
The group_id doesn't necessarily have to be sequential itself; it only has to be unique. I attempted this with the following code:
create sequence seq
start with 1
minvalue 1
increment by 1
cache 20;
select seq.nextval from dual; --to initialize the sequence
select
item,
item_id,
case when diff = 1 then seq.currval else seq.nextval end group_id
from
(
select
item,
item_id,
item_id - lag(item_id, 1, 0) over (order by 1) diff
from
(
select
item,
item_id
from
table
)
);
But get the following output:
item item_id group_id
ABC 282 2
ABC 283 3
ABC 284 4
ABC 285 5
ABC 051 6
ABC 052 7
ABC 189 8
ABC 231 9
ABC 232 10
ABC 233 11
ABC 234 12
ABC 247 13
ABC 248 14
ABC 249 15
ABC 250 16
ABC 091 17
ABC 092 18
When looking for the cause of the problem, I found an excellent explanation by user ShannonSeverance that details why my solution won't work. However, it didn't provide any suggestions on how to move forward.
Does anyone have any ideas?
You have a problem, because SQL tables are inherently unordered. The following "should" logically work, although it won't in practice:
select ii.*, (item_id - rownum) as grp_id
from item_ids ii;
A sequence of item_ids in order minus the row number is constant. You can use that for a group, at least for a given item. To handle multiple items, concatenate the values together:
select ii.*, item||'-'||(item_id - rownum) as grp_id
from item_ids ii;
To really make this work, you need to add an order by -- this guarantees the ordering of the results from the select. This might work, assuming that there are "holes" between the groups:
select ii.*, item||'-'||(item_id - rownum) as grp_id
from item_ids ii
order by item, item_id;
Otherwise, you need some other column to determine the proper ordering for the items.
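One caveat with the queries above: in Oracle, rownum is assigned before the final order by is applied, so the subtraction may not see the rows in sorted order. A commonly used alternative, which is a swap-in and not what this answer wrote, is the row_number() analytic function with its own order by. The sketch below runs that variant on the question's data through Python's sqlite3 module, with SQLite (3.25 or newer) standing in for Oracle; the function name `assign_groups` is an assumption of this sketch.

```python
import sqlite3

def assign_groups(rows):
    """Return (item, item_id, grp_id); consecutive item_ids share a grp_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE item_ids (item TEXT, item_id INTEGER)")
    con.executemany("INSERT INTO item_ids VALUES (?, ?)", rows)
    query = """
        SELECT item, item_id,
               item || '-' ||
               (item_id - ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_id)) AS grp_id
        FROM item_ids
        ORDER BY item, item_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

item_id_values = [282, 283, 284, 285, 51, 52, 189, 231, 232, 233, 234,
                  247, 248, 249, 250, 91, 92]
sample = [("ABC", v) for v in item_id_values]

for row in assign_groups(sample):
    print(row)
```

Within a run of consecutive item_ids, both item_id and the row number grow by one per row, so their difference stays constant and works as a group key. The resulting grp_id values are unique per run but not sequential, which the question says is acceptable.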