Get comma-delimited distinct rows - SQL

I have a table similar to the one below. What I am trying to build is a query that returns the distinct vendors as a single concatenated string, ordered by Eid descending. For the example below it should return:
045-FH;799-HD;67-3M
Eid Vendor
1 67-3M
2 67-3M
3 67-3M
4 799-HD
5 799-HD
6 045-FH
7 045-FH
This is the SQL query I have, but I am unable to use ORDER BY and DISTINCT at the same time. Any help is appreciated.
select distinct vendor
from tblVendor
order by Eid

You could try something like this. The CTE ensures the list is unique. The ordering of the string aggregation is handled by WITHIN GROUP (ORDER BY Eid DESC).
with unq_cte as (
select *, row_number() over (partition by Vendor order by Eid desc) rn
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor))
select string_agg(Vendor, ',') within group (order by Eid desc)
from unq_cte
where rn=1;
(No column name)
045-FH,799-HD,67-3M
[Edit] Alternatively, you could use SELECT TOP 1 WITH TIES to ensure the vendor list is unique:
with unq_cte as (
select top 1 with ties *
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor)
order by row_number() over (partition by Vendor order by Eid desc))
select string_agg(Vendor, ',') within group (order by Eid desc)
from unq_cte;
[Edit 2] Prior to SQL Server 2017 you could use STUFF and FOR XML PATH to aggregate the string (instead of STRING_AGG):
with unq_cte as (
select top 1 with ties *
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor)
order by row_number() over (partition by Vendor order by Eid desc))
select stuff((select ',' + Vendor
              from unq_cte
              order by Eid desc
              for xml path('')), 1, 1, '');
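As an aside, the expected output in the question is semicolon-delimited rather than comma-delimited; assuming that is what you actually want, only the separator changes. A minimal sketch against the first unq_cte above:
-- same dedup CTE as the first example, just a different separator
select string_agg(Vendor, ';') within group (order by Eid desc)
from unq_cte
where rn=1;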

Related

How to filter DISTINCT records and order them using the LISTAGG function

SELECT
s_id
,CASE WHEN LISTAGG(X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))= '' THEN NULL
ELSE LISTAGG (X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))
END AS item_id_txt
FROM table_1 X
JOIN table_2 Z
ON Z.cmn_id = X.cmn_id
WHERE s_id IN('38301','40228')
GROUP BY s_id;
When I run the above query, I'm getting the same values repeated in the ITEM_ID_TXT column. I want to display only the DISTINCT values.
S_ID ITEM_ID_TXT
38301 618444,618444,618444,618444,618444,618444,36184
40228 616162,616162,616162,616162,616162,616162,616162
I also want the concatenated values to be ordered by item_pg_nbr.
I can use DISTINCT inside the LISTAGG function, but that won't give a result ordered by item_pg_nbr.
I need your input on this.
Since you cannot use different columns for the DISTINCT and the ORDER BY within the group, one approach would be:
1. Deduplicate while grabbing the minimum item_pg_nbr.
2. LISTAGG and order by that minimum item_pg_nbr.
create or replace table T1(S_ID int, ITEM_ID int, ITEM_PG_NBR int);
insert into T1 (S_ID, ITEM_ID, ITEM_PG_NBR) values
(1, 1, 3),
(1, 2, 9), -- Adding a non-distinct ITEM_ID within group
(1, 2, 2),
(1, 3, 1),
(2, 1, 1),
(2, 2, 2),
(2, 3, 3);
with X as
(
select S_ID, ITEM_ID, min(ITEM_PG_NBR) MIN_PG_NBR
from T1 group by S_ID, ITEM_ID
)
select S_ID, listagg(ITEM_ID, ',') within group (order by MIN_PG_NBR)
from X group by S_ID
;
I guess the question then becomes what happens when you have duplicates within group? It would seem logical that the minimum item_pg_nbr should be used for the order by, but you could just as easily use the max or some other value.
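For example, a minimal variation of the query above that orders duplicated ITEM_IDs by their latest page number instead:
-- same dedup step, but keeping the maximum page number per ITEM_ID
with X as
(
select S_ID, ITEM_ID, max(ITEM_PG_NBR) MAX_PG_NBR
from T1 group by S_ID, ITEM_ID
)
select S_ID, listagg(ITEM_ID, ',') within group (order by MAX_PG_NBR)
from X group by S_ID
;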

SQL - return multiple values in the same column

Using SSMS, I have a table that looks like this:
Agent  Location
A      1
A      2
B      3
B      4
How do I run a query to get:
Agent  Location
A      1,2
B      3,4
You could try a Self Join on the Agent field like this:
SELECT
AGENT_A as AGENT,
CONCAT(CONCAT(LOCATION_A, ','), LOCATION_B) as LOCATION
FROM (
SELECT
A.AGENT as AGENT_A,
A.LOCATION as LOCATION_A,
B.AGENT as AGENT_B,
B.LOCATION as LOCATION_B
FROM SSMS as A
LEFT JOIN SSMS as B
on A.Agent = B.Agent) as T
WHERE LOCATION_A < LOCATION_B
Here you can see a Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=00cb9ecb7f0584c2436a0ee6bca6a30b
I use SQL Server 2019, but this also works for Azure SQL Database. If you want to return only distinct values, I'd suggest using RANK() OVER() to discard locations that are duplicated across agents. There is one drawback: a location shared by several agents is kept only by the first agent in Agent order.
The code makes this clearer:
Create table Agents
(
Agent char(1),
Location int
)
insert into Agents
VALUES
('A', 1),
('A', 2),
('A', 6),
('B', 3),
('B', 4),
('C', 1),
('C', 4),
('C', 5)
select Agent, STRING_AGG([Location], ',') WITHIN GROUP (ORDER BY Location ASC) as Locations
from
(
select Agent, Location, rank() over (partition by [Location] order by Agent) as rnk
from Agents
) as t
--will return agents with distinct locations, because they have the rank equals to 1
where t.rnk = 1
group by Agent
Here is a link to test it: SQLize Online
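With the sample data above, this should return something like the following; note that C keeps only location 5, since locations 1 and 4 rank first under agents A and B:
Agent  Locations
A      1,2,6
B      3,4
C      5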
A recursive CTE:
WITH cte1 as (
SELECT
Agent,
CAST(Location AS VARCHAR(MAX)) Location,
row_number() over (partition by Agent order by Location) R
FROM SSMS
),
ctec as (
SELECT Agent, count(*) as c
FROM SSMS
GROUP BY Agent),
cte2 (Agent, Location, i, L) as (
SELECT
Agent,
CONCAT(Location,'') Location,
1 as i ,
Location L
from cte1
where R=1
union all
select
cte2.Agent,
CONCAT(cte2.Location, ',', cte1.Location),
i+1,
cte1.Location
from cte1
inner join cte2 on cte2.Agent=cte1.Agent
and cte1.Location > cte2.Location and cte1.R = i+1
inner join ctec on cte2.Agent= ctec.Agent
where i < ctec.c
)
SELECT Agent,Location
FROM cte2
WHERE i=(select c from ctec where ctec.Agent=cte2.Agent)
ORDER BY Agent;
see: DBFIDDLE
Output, with some added data:
INSERT INTO SSMS VALUES ('C', '5');
INSERT INTO SSMS VALUES ('C', '6');
INSERT INTO SSMS VALUES ('C', '7');
INSERT INTO SSMS VALUES ('D', '5');
INSERT INTO SSMS VALUES ('D', '3');
INSERT INTO SSMS VALUES ('D', '1');
INSERT INTO SSMS VALUES ('D', '2');
Agent  Location
A      1,2
B      3,4
C      5,6,7
D      1,2,3,5

highest consecutive values in a column SQL

I have the table below:
create table test (Id int, Name char);
insert into test values
(1, 'A'),
(2, 'A'),
(3, 'B'),
(4, 'B'),
(5, 'B'),
(6, 'B'),
(7, 'C'),
(8, 'B'),
(9, 'B');
I want to print the Name that appears consecutively at least four times.
Expected Output:
Name
B
I have tried different approaches similar to the SQL below (which resulted in two values, B & C), but nothing worked.
My SQL attempt:
select Name from
(select t.*, row_number() over (order by Id asc) as grpcnt,
row_number() over (partition by Name order by Id) as grpcnt1 from test t) test
where (grpcnt-grpcnt1)>=3
group by Name,(grpcnt-grpcnt1) ;
Try removing the WHERE clause and applying your filter in a HAVING clause based on the counts. Moreover, since you are interested in at least four occurrences, your filter should be >= 4. For example, using your modified query:
select
Name
from (
select
*,
row_number() over (order by Id asc) as grpcnt,
row_number() over (partition by Name order by Id) as grpcnt1
from test
) t
group by Name,(grpcnt-grpcnt1)
HAVING COUNT(Name)>=4;
View the working demo on db fiddle.
If your id is your counter you can do this:
select *
from test t
where exists (
select count(*)
from test
where name='B'
and id <= t.id and id > (t.id - 4)
having count(*) = 4
);
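A sketch of the same idea without hardcoding 'B', assuming (as in the sample data) that Id has no gaps: for each row, count how many of the four Ids ending at that row share its Name.
-- returns B for the sample data: only B fills a full window of four consecutive Ids
select distinct t.Name
from test t
where (select count(*)
       from test
       where Name = t.Name
       and Id <= t.Id and Id > (t.Id - 4)) = 4;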

How to tag a group of repeating items if the ids are consecutive for n rows?

Building in the test for consecutive ids is proving difficult without breaking it down into parts or using a cursor, which I'd like to avoid.
pseudo query -
SELECT all
FROM table with the same description on multiple adjacent rows for >= 4 rows
and set tag = 'y' and order by id
(id,description, tag),
(1, 'xxx', 'n'),
(2, 'xxx', 'n'),
(3, 'xxx', 'n'),
(7, 'xxx', 'n'),
(5, 'xxx', 'n'),
(8, 'xxx', 'n'),
(4, 'xxx', 'n'),
(6, 'zzz', 'n')
desired result
(1, 'xxx', 'y')
(2, 'xxx', 'y')
(3, 'xxx', 'y')
(4, 'xxx', 'y')
(5, 'xxx', 'y')
This is known as a gaps-and-islands problem. Something like this should work:
;with cte as
(SELECT id,
description,
tag = 'y' ,
cnt = Count(*)over(partition by description, grp)
FROM (SELECT *,
grp = Sum(CASE WHEN prev_description = description THEN 0 ELSE 1 END)Over(Order by id)
FROM (SELECT *,
prev_description = Lag(description) OVER(ORDER BY id)
FROM Yourtable) a) b
GROUP BY id, description, grp
)
Select * from cte
Where cnt >= 4
Another approach using Row_Number
;with cte as
(SELECT id,
description,
tag = 'y' ,
cnt = Count(*)over(partition by description, grp)
FROM (select Grp = row_number()over(order by id) -
row_number()over(partition by description order by id), *
from Yourtable) b
GROUP BY id, description, grp)
Select * from cte
Where cnt >= 4
I think this will do it
select *, 'y' as 'newTag'
from ( select *
, count(*) over (partition by [description], grp) as 'grpSize'
from ( select *
, ( [id] - row_number() over (partition by [description] order by [id]) ) as grp
from [consecutive]
) tt
) ttt
where grpSize >= 4
order by [description], grp, [id]

How can I do a distinct sum?

I am trying to create a "score" statistic which is derived from the value of a certain column, calculated as the sum of a case expression. Unfortunately, the query structure needs to be a full outer join (this is simplified from the actual query, and the join structure survives from the original code), and thus the sum is incorrect, since each row may occur many times. I could group by the unique key; however, that breaks other aggregate functions that are in the same query.
What I really want to do is sum (case when ... distinct claim_id) which of course does not exist; is there an approach that will do what I need? Or does this have to be two queries?
This is on Redshift, in case it matters.
create table t1 (id int, proc_date date, claim_id int, proc_code char(1));
create table t2 (id int, diag_date date, claim_id int);
insert into t1 (id, proc_date, claim_id, proc_code)
values (1, '2012-01-01', 0, 'a'),
(2, '2009-02-01', 1, 'b'),
(2, '2019-02-01', 2, 'c'),
(2, '2029-02-01', 3, 'd'),
(3, '2016-04-02', 4, 'e'),
(4, '2005-01-03', 5, 'f'),
(5, '2008-02-03', 6, 'g');
insert into t2 (id, diag_date, claim_id)
values (4, '2004-01-01', 20),
(5, '2010-02-01', 21),
(6, '2007-04-02', 22),
(5, '2011-02-01', 23),
(6, '2008-04-02', 24),
(5, '2012-02-01', 25),
(6, '2009-04-02', 26),
(7, '2002-01-03', 27),
(8, '2001-02-03', 28);
select id, sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end), count(distinct t1.claim_id) as proc_count, min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id order by id;
You can separate out your conditional aggregates into a CTE or subquery and use OVER(PARTITION BY id) to get an id-level aggregate without grouping, something like this:
with cte AS (SELECT *,sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) OVER(PARTITION BY id) AS Some_Sum
, min(proc_date) OVER(PARTITION BY id) as min_proc_date
FROM t1
)
select id
, Some_Sum
, count(distinct cte.claim_id) as proc_count
, min_proc_date
from cte
full outer join t2 using (id)
group by id,Some_Sum,min_proc_Date
order by id;
Demo: SQL Fiddle
Note that you'll have to add these aggregates to the GROUP BY in the outer query, and the fields in your PARTITION BY should match the t1 fields you previously used in the GROUP BY. In this case that is just id, but if your full query had other t1 fields in the GROUP BY, be sure to add them to the PARTITION BY as well.
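For instance, a minimal sketch of that extension, where proc_year (derived here from proc_date purely for illustration) stands in for whatever extra t1 column your real GROUP BY uses:
-- proc_year is illustrative only: substitute the extra t1 columns from your real GROUP BY
with cte as (
    SELECT *,
           extract(year from proc_date) as proc_year,
           sum(case when proc_code='a' then 5
                    when proc_code='b' then 10
                    when proc_code='c' then 15
                    when proc_code='d' then 20
                    when proc_code='e' then 25
                    when proc_code='f' then 30
                    when proc_code='g' then 35 end)
               OVER(PARTITION BY id, extract(year from proc_date)) AS Some_Sum,
           min(proc_date) OVER(PARTITION BY id, extract(year from proc_date)) as min_proc_date
    FROM t1
)
select id, proc_year, Some_Sum,
       count(distinct cte.claim_id) as proc_count,
       min_proc_date
from cte
full outer join t2 using (id)
group by id, proc_year, Some_Sum, min_proc_date
order by id;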
You can use a subquery (grouped by id and claim_id) and then regroup:
with base as (
select id, avg(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) as value_proc,
t1.claim_id , min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id, t1.claim_id order by id, t1.claim_id)
select id, sum(value_proc), count(distinct claim_id) as proc_count, min(min_proc_date) as min_proc_date
from base
group by id
order by id;
Note that I suggest AVG for the internal subquery, but if you are sure that the same claim_id always has the same letter (and therefore the same integer value), you can use MAX or MIN instead. If not, I prefer this approach.