Highest consecutive values in a column in SQL

I have the table below:
create table test (Id int, Name char);
insert into test values
(1, 'A'),
(2, 'A'),
(3, 'B'),
(4, 'B'),
(5, 'B'),
(6, 'B'),
(7, 'C'),
(8, 'B'),
(9, 'B');
I want to print the Name that appears at least four times consecutively.
Expected Output:
Name
B
I have tried different approaches similar to the SQL below (which returned two values, B and C), but nothing worked.
My SQL attempt:
select Name from
    (select t.*,
            row_number() over (order by Id asc) as grpcnt,
            row_number() over (partition by Name order by Id) as grpcnt1
     from test t) sub
where (grpcnt - grpcnt1) >= 3
group by Name, (grpcnt - grpcnt1);

Try removing the WHERE clause and applying your filter in a HAVING clause based on the counts. Moreover, since you are interested in at least four occurrences, the filter should be >= 4. For example, using your modified query:
select
    Name
from (
    select
        *,
        row_number() over (order by Id asc) as grpcnt,
        row_number() over (partition by Name order by Id) as grpcnt1
    from test
) t
group by Name, (grpcnt - grpcnt1)
having count(Name) >= 4;
View working demo on db fiddle
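For intuition, here is a small diagnostic query (my own sketch, not part of the answer above) that shows why the difference of the two row numbers identifies each consecutive run:
-- Inspect the intermediate values produced by the two ROW_NUMBER calls.
select
    Id,
    Name,
    row_number() over (order by Id) as grpcnt,
    row_number() over (partition by Name order by Id) as grpcnt1,
    row_number() over (order by Id)
      - row_number() over (partition by Name order by Id) as grp
from test
order by Id;
-- Within each consecutive run of the same Name the difference (grp) stays constant,
-- so grouping by (Name, grp) isolates each run and HAVING COUNT(Name) >= 4 keeps the long ones.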

If your Id is your counter, you can do this:
select *
from test t
where exists (
    select count(*)
    from test
    where name = 'B'
      and id <= t.id and id > (t.id - 4)
    having count(*) = 4
);
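If you want just the Name rather than the matching rows, a hedged variation of the same idea correlates on t.Name instead of hard-coding 'B' (still assuming, as above, that Id is a consecutive counter):
select distinct t.Name
from test t
where exists (
    select count(*)
    from test x
    where x.Name = t.Name
      and x.Id <= t.Id
      and x.Id > t.Id - 4
    having count(*) = 4
);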

Related

How to filter a table based on queried ids from another table in Snowflake

I'm trying to filter a table based on the queried result from another table.
create temporary table test_table (id number, col_a varchar);
insert into test_table values
(1, 'a'),
(2, 'b'),
(3, 'aa'),
(4, 'a'),
(6, 'bb'),
(7, 'a'),
(8, 'c');
create temporary table test_table_2 (id number, col varchar);
insert into test_table_2 values
(1, 'aa'),
(2, 'bb'),
(3, 'cc'),
(4, 'dd'),
(6, 'ee'),
(7, 'ff'),
(8, 'gg');
Here I want to find all the ids in test_table that have the value 'a' in col_a, and then filter test_table_2 to the rows with one of those ids. I've tried the way below, but got an error: SQL compilation error: syntax error line 6 at position 39 unexpected 'cte'.
with cte as
(
select id from test_table
where col_a = 'a'
)
select * from test_table_2 where id in cte;
The approach below does work, but with large tables it tends to be very slow. Is there a better, more efficient way that scales to very large tables?
with cte as
(
select id from test_table
where col_a = 'a'
)
select t2.* from test_table_2 t2 join cte on t2.id=cte.id;
I would express this using exists logic:
SELECT id
FROM test_table_2 t2
WHERE EXISTS (
SELECT 1
FROM test_table t1
WHERE t2.id = t1.id AND
t1.col_a = 'a'
);
This has one advantage over a join in that Snowflake can stop scanning test_table as soon as it finds a match for a given test_table_2 row.
Your first error can be fixed as below. Joins are usually better suited for lookups than EXISTS or IN clauses when you have a large table.
with cte as
(
select id from test_table
where col_a = 'a'
)
select * from test_table_2 where id in (select distinct id from cte);
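For what it's worth, the same fix works without the CTE at all, since IN handles duplicate ids on its own; a minimal equivalent sketch:
select *
from test_table_2
where id in (select id from test_table where col_a = 'a');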

How to filter DISTINCT records and order them using the LISTAGG function

SELECT
s_id
,CASE WHEN LISTAGG(X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))= '' THEN NULL
ELSE LISTAGG (X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))
END AS item_id_txt
FROM table_1 X
JOIN table_2 Z
ON Z.cmn_id = X.cmn_id
WHERE s_id IN('38301','40228')
GROUP BY s_id;
When I run the above query, I get the same values repeated in the ITEM_ID_TXT column. I want to display only the DISTINCT values.
S_ID ITEM_ID_TXT
38301 618444,618444,618444,618444,618444,618444,36184
40228 616162,616162,616162,616162,616162,616162,616162
I also want the concatenated values to be ordered by item_pg_nbr.
I can use DISTINCT in the LISTAGG function, but that won't give the result ordered by item_pg_nbr.
I'd appreciate your input on this.
Since you cannot use different columns for the DISTINCT and the ORDER BY within group, one approach would be:
1. Deduplicate while grabbing the minimum item_pg_nbr.
2. LISTAGG and order by the minimum item_pg_nbr.
create or replace table T1(S_ID int, ITEM_ID int, ITEM_PG_NBR int);
insert into T1 (S_ID, ITEM_ID, ITEM_PG_NBR) values
(1, 1, 3),
(1, 2, 9), -- Adding a non-distinct ITEM_ID within group
(1, 2, 2),
(1, 3, 1),
(2, 1, 1),
(2, 2, 2),
(2, 3, 3);
with X as
(
select S_ID, ITEM_ID, min(ITEM_PG_NBR) MIN_PG_NBR
from T1 group by S_ID, ITEM_ID
)
select S_ID, listagg(ITEM_ID, ',') within group (order by MIN_PG_NBR)
from X group by S_ID
;
I guess the question then becomes what happens when you have duplicates within group? It would seem logical that the minimum item_pg_nbr should be used for the order by, but you could just as easily use the max or some other value.
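Applied to the original query, the same two-step pattern might look like the sketch below. This is only a sketch: it assumes the table and column names exactly as posted, the dedup CTE name is mine, and it drops the empty-string CASE wrapper on the assumption that it is no longer needed once duplicates are grouped away.
with dedup as (
    select
        X.s_id,
        X.item_id,
        min(try_to_number(Z.item_pg_nbr)) as min_pg_nbr
    from table_1 X
    join table_2 Z
      on Z.cmn_id = X.cmn_id
    where X.s_id in ('38301', '40228')
    group by X.s_id, X.item_id
)
select
    s_id,
    listagg(item_id, ',') within group (order by min_pg_nbr) as item_id_txt
from dedup
group by s_id;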

SQL - return multiple values in the same column

Using SSMS, I have a table that looks like this:
Agent  Location
A      1
A      2
B      3
B      4
How do I run a query to get this?
Agent  Location
A      1,2
B      3,4
You could try a Self Join on the Agent field like this:
SELECT
AGENT_A as AGENT,
CONCAT(CONCAT(LOCATION_A, ','), LOCATION_B) as LOCATION
FROM (
SELECT
A.AGENT as AGENT_A,
A.LOCATION as LOCATION_A,
B.AGENT as AGENT_B,
B.LOCATION as LOCATION_B
FROM SSMS as A
LEFT JOIN SSMS as B
on A.Agent = B.Agent) as T
WHERE LOCATION_A < LOCATION_B
Here you can see a Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=00cb9ecb7f0584c2436a0ee6bca6a30b
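If you are on SQL Server 2017 or later and only need all locations per agent (with no de-duplication across agents), a minimal STRING_AGG sketch covers the output as asked, assuming the table is named SSMS as in the other answers:
-- Aggregate each agent's locations into one comma-separated string.
SELECT Agent,
       STRING_AGG(Location, ',') WITHIN GROUP (ORDER BY Location) AS Location
FROM SSMS
GROUP BY Agent;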
I use SQL Server 2019, but this also works for Azure SQL Database. If you want to return only distinct values, I'd suggest using rank() over() to discard locations that are duplicated across agents. The only drawback is that each shared location is credited to the first available agent. The code makes this clearer:
Create table Agents
(
Agent char(1),
Location int
)
insert into Agents
VALUES
('A', 1),
('A', 2),
('A', 6),
('B', 3),
('B', 4),
('C', 1),
('C', 4),
('C', 5)
select Agent, STRING_AGG([Location], ',') WITHIN GROUP (ORDER BY Location ASC) as Locations
from
(
select Agent, Location, rank() over (partition by [Location] order by Agent) as rnk
from Agents
) as t
--will return agents with distinct locations, because they have the rank equals to 1
where t.rnk = 1
group by Agent
Here is a link to test it: SQLize Online
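For reference, if I have traced the ranking correctly, the query above returns the following against the sample Agents data (locations 1 and 4 are credited to A and B because those agents come first alphabetically):
Agent  Locations
A      1,2,6
B      3,4
C      5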
A recursive CTE:
WITH cte1 as (
SELECT
Agent,
CAST(Location AS VARCHAR(MAX)) Location,
row_number() over (partition by Agent order by Location) R
FROM SSMS
),
ctec as (
SELECT Agent, count(*) as c
FROM SSMS
GROUP BY Agent),
cte2 (Agent, Location, i, L) as (
SELECT
Agent,
CONCAT(Location,'') Location,
1 as i ,
Location L
from cte1
where R=1
union all
select
cte2.Agent,
CONCAT(cte2.Location, ',', cte1.Location),
i+1,
cte1.Location
from cte1
inner join cte2 on cte2.Agent=cte1.Agent
and cte1.Location > cte2.Location and cte1.R = i+1
inner join ctec on cte2.Agent= ctec.Agent
where i < ctec.c
)
SELECT Agent,Location
FROM cte2
WHERE i=(select c from ctec where ctec.Agent=cte2.Agent)
ORDER BY Agent;
see: DBFIDDLE
output, with some added data:
INSERT INTO SSMS VALUES ('C', '5');
INSERT INTO SSMS VALUES ('C', '6');
INSERT INTO SSMS VALUES ('C', '7');
INSERT INTO SSMS VALUES ('D', '5');
INSERT INTO SSMS VALUES ('D', '3');
INSERT INTO SSMS VALUES ('D', '1');
INSERT INTO SSMS VALUES ('D', '2');
Agent  Location
A      1,2
B      3,4
C      5,6,7
D      1,2,3,5

Get comma delimited distinct rows

I have a table similar to the one below. What I am trying to build is a query that returns a concatenated string of distinct vendors, ordered by Eid descending. For the example below it should return
045-FH;799-HD;67-3M
Eid Vendor
1 67-3M
2 67-3M
3 67-3M
4 799-HD
5 799-HD
6 045-FH
7 045-FH
This is the SQL query I have, but I'm unable to use ORDER BY and DISTINCT at the same time. Any help is appreciated.
select distinct vendor
from tblVendor
order by Eid
You could try something like this. The CTE ensures the list is unique. The ordering of the string aggregation is handled by WITHIN GROUP (ORDER BY Eid DESC).
with unq_cte as (
select *, row_number() over (partition by Vendor order by Eid desc) rn
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor))
select string_agg(Vendor, ',') within group (order by Eid desc)
from unq_cte
where rn=1;
(No column name)
045-FH,799-HD,67-3M
[Edit] Alternatively, you could use SELECT TOP 1 WITH TIES to ensure the vendor list is unique:
with unq_cte as (
select top 1 with ties *
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor)
order by row_number() over (partition by Vendor order by Eid desc))
select string_agg(Vendor, ',') within group (order by Eid desc)
from unq_cte;
[Edit 2] Prior to SQL Server 2017 you could use STUFF and FOR XML to aggregate the string (instead of STRING_AGG):
with unq_cte as (
select top 1 with ties *
from (values (1, '67-3M'),
(2, '67-3M'),
(3, '67-3M'),
(4, '799-HD'),
(5, '799-HD'),
(6, '045-FH'),
(7, '045-FH')) tblVendor(Eid, Vendor)
order by row_number() over (partition by Vendor order by Eid desc))
select stuff((select ',' + Vendor
              from unq_cte
              order by Eid desc
              for xml path('')), 1, 1, '');
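One hedged refinement of my own: if vendor names can contain characters such as & or <, the FOR XML form will entity-encode them; adding TYPE and .value() avoids that (same sample data as above):
with unq_cte as (
    select top 1 with ties *
    from (values (1, '67-3M'),
                 (2, '67-3M'),
                 (3, '67-3M'),
                 (4, '799-HD'),
                 (5, '799-HD'),
                 (6, '045-FH'),
                 (7, '045-FH')) tblVendor(Eid, Vendor)
    order by row_number() over (partition by Vendor order by Eid desc))
select stuff((select ',' + Vendor
              from unq_cte
              order by Eid desc
              for xml path(''), type).value('.', 'varchar(max)'), 1, 1, '');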

How can I do a distinct sum?

I am trying to create a "score" statistic which is derived from the value of a certain column, calculated as the sum of a case expression. Unfortunately, the query structure needs to be a full outer join (this is simplified from the actual query, and the join structure survives from the original code), and thus the sum is incorrect, since each row may occur many times. I could group by the unique key; however, that breaks other aggregate functions that are in the same query.
What I really want to do is sum (case when ... distinct claim_id) which of course does not exist; is there an approach that will do what I need? Or does this have to be two queries?
This is on redshift, in case it matters.
create table t1 (id int, proc_date date, claim_id int, proc_code char(1));
create table t2 (id int, diag_date date, claim_id int);
insert into t1 (id, proc_date, claim_id, proc_code)
values (1, '2012-01-01', 0, 'a'),
(2, '2009-02-01', 1, 'b'),
(2, '2019-02-01', 2, 'c'),
(2, '2029-02-01', 3, 'd'),
(3, '2016-04-02', 4, 'e'),
(4, '2005-01-03', 5, 'f'),
(5, '2008-02-03', 6, 'g');
insert into t2 (id, diag_date, claim_id)
values (4, '2004-01-01', 20),
(5, '2010-02-01', 21),
(6, '2007-04-02', 22),
(5, '2011-02-01', 23),
(6, '2008-04-02', 24),
(5, '2012-02-01', 25),
(6, '2009-04-02', 26),
(7, '2002-01-03', 27),
(8, '2001-02-03', 28);
select id, sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end), count(distinct t1.claim_id) as proc_count, min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id order by id;
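To make the inflation concrete (my own diagnostic, not part of the question): counting join multiplicity per id shows why the plain SUM over-counts. For example, id 5 matches three t2 rows, so its single 35-point 'g' row is summed three times.
-- How many joined rows does each id produce?
select id, count(*) as joined_rows
from t1 full outer join t2 using (id)
group by id
order by id;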
You can separate your conditional aggregates out into a CTE or subquery and use OVER (PARTITION BY id) to get an id-level aggregate without grouping, something like this:
with cte AS (SELECT *,sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) OVER(PARTITION BY id) AS Some_Sum
, min(proc_date) OVER(PARTITION BY id) as min_proc_date
FROM t1
)
select id
, Some_Sum
, count(distinct cte.claim_id) as proc_count
, min_proc_date
from cte
full outer join t2 using (id)
group by id,Some_Sum,min_proc_Date
order by id;
Demo: SQL Fiddle
Note that you'll have to add these window aggregates to the GROUP BY in the outer query. The fields in your PARTITION BY should match the t1 fields you previously used in the GROUP BY (in this case just id); if your full query has other t1 fields in the GROUP BY, be sure to add them to the PARTITION BY as well.
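As a hedged sketch of that point, suppose the real query also grouped by proc_code; the window aggregates would then need to partition by the same pair of columns (the CASE expression is abbreviated here for brevity):
with cte as (
    select t1.*,
           sum(case when proc_code = 'a' then 5
                    when proc_code = 'b' then 10
                    else 0 end) over (partition by id, proc_code) as some_sum,
           min(proc_date) over (partition by id, proc_code) as min_proc_date
    from t1
)
select id, proc_code, some_sum,
       count(distinct cte.claim_id) as proc_count,
       min_proc_date
from cte
full outer join t2 using (id)
group by id, proc_code, some_sum, min_proc_date
order by id;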
You can use a subquery (grouped by id and claim_id) and then regroup:
with base as (
select id, avg(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) as value_proc,
t1.claim_id , min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id, t1.claim_id order by id, t1.claim_id)
select id, sum(value_proc), count(distinct claim_id) as proc_count, min(min_proc_date) as min_proc_date
from base
group by id
order by id;
Note that I suggest avg for the inner subquery, but if you are sure the same claim_id always carries the same letter, you could just as well use max or min, which keep the value an integer. If not, avg is the safer choice.