Get the count of longest streak including the break point - sql

I am working on a problem where I need the length of the longest streak per customer, and the count must also include the row at which the streak breaks. My table looks like this:
+-----------------+--------+-------+
| customer_number | Months | Flags |
+-----------------+--------+-------+
| 1 | 12 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 1 | 8 | 1 |
| 1 | 9 | 1 |
| 1 | 10 | 1 |
| 1 | 11 | 1 |
| 6 | 12 | 1 |
| 6 | 1 | 1 |
| 6 | 2 | 1 |
| 6 | 3 | 1 |
| 6 | 4 | 1 |
| 6 | 5 | 4 |
| 6 | 9 | 1 |
| 6 | 10 | 1 |
| 6 | 11 | 1 |
| 7 | 5 | 1 |
| 8 | 9 | 1 |
| 8 | 10 | 1 |
| 8 | 11 | 1 |
| 9 | 9 | 1 |
| 9 | 10 | 1 |
| 9 | 11 | 1 |
| 10 | 11 | 1 |
+-----------------+--------+-------+
and my desired output is
+----------+--------------------+
| Customer | Consecutive streak |
+----------+--------------------+
| 1 | 10 |
| 6 | 6 |
| 7 | 1 |
| 8 | 3 |
| 9 | 3 |
| 10 | 1 |
+----------+--------------------+
the code I have
SELECT customer_number, max(streak) max_consecutive_streak FROM (
SELECT customer_number, COUNT(*) as streak
FROM
(select *,
(row_number() over (order by customer_number) -
row_number() over (order by customer_number)
) as counts
from table1
) cc
group by customer_number, counts
)
GROUP BY 1;
It works, but for customer_number 6 it returns 5 and I want 6: the row with flag 4 should be counted as part of the longest streak, because that is the point where the streak breaks. Any idea how I can achieve that?

You can use a cte with row_number:
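with cte(r, id, flag) as (
-- list the columns explicitly so they match the three CTE column names;
-- add a real ordering column to the order by if the table has one
select row_number() over (order by c.customer_number), c.customer_number, c.flags from customers c
),
freq(id, t, f) as (
select c2.id, c2.f, count(*) from
-- f = number of break rows (flag != 1, assuming 1 marks a streak) strictly before this row,
-- so the break row still belongs to the streak it ends
(select c.id, (select coalesce(sum(c1.flag != 1), 0) from cte c1 where c1.id=c.id and c1.r < c.r) f from cte c)
c2 group by c2.id, c2.f
)
select id, max(f) from freq group by id;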
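As a rough, runnable illustration of the same idea, here is a sketch that runs a window-function variant in SQLite through Python's `sqlite3` module (this needs SQLite 3.25 or later). It assumes that flag 1 marks a streak and that any other flag is a break. The `seq` column is hypothetical: the table in the question has no ordering column, so insertion order stands in for one here.

```python
import sqlite3

# Rows (customer_number, months, flags) copied from the question for customers 6, 7 and 8.
ROWS = [
    (6, 12, 1), (6, 1, 1), (6, 2, 1), (6, 3, 1), (6, 4, 1), (6, 5, 4),
    (6, 9, 1), (6, 10, 1), (6, 11, 1),
    (7, 5, 1),
    (8, 9, 1), (8, 10, 1), (8, 11, 1),
]

QUERY = """
WITH marked AS (
    SELECT customer_number,
           -- number of break rows strictly before this row, so the break
           -- row itself stays in the streak that it ends
           COALESCE(SUM(flags <> 1) OVER (
               PARTITION BY customer_number ORDER BY seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM table1
),
streaks AS (
    SELECT customer_number, grp, COUNT(*) AS streak
    FROM marked
    GROUP BY customer_number, grp
)
SELECT customer_number, MAX(streak)
FROM streaks
GROUP BY customer_number
ORDER BY customer_number
"""

def longest_streaks(rows):
    con = sqlite3.connect(":memory:")
    # seq is a hypothetical ordering column filled in insertion order.
    con.execute(
        "CREATE TABLE table1 (seq INTEGER PRIMARY KEY, "
        "customer_number INTEGER, months INTEGER, flags INTEGER)"
    )
    con.executemany(
        "INSERT INTO table1 (customer_number, months, flags) VALUES (?, ?, ?)", rows
    )
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(longest_streaks(ROWS))
```

For customer 6 the first six rows (five 1s plus the 4) share group 0, which gives the desired 6 rather than 5.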

Related

How to assign duplicate increment in SQL?

While going through the rows, each time the text "NEW" appears in the Calc column, the count in the Results column should be incremented. The count starts at 1.
It should look like this on the output:
The following uses an id column to resolve the order issue. Replace that with your corresponding expression. This also addresses the requirement to start the display sequence with 1 and also show 0 for the 'NEW' rows.
The SQL (updated):
SELECT logs.*
, CASE WHEN text = 'NEW' THEN 0
ELSE
COALESCE(SUM(CASE WHEN text = 'NEW' THEN 1 END) OVER (PARTITION BY xrank ORDER BY id)+1, 1)
END AS display
FROM logs
ORDER BY id
The result:
+----+-------+------+---------+
| id | xrank | text | display |
+----+-------+------+---------+
| 1 | 1 | A | 1 |
| 2 | 1 | B | 1 |
| 3 | 1 | C | 1 |
| 4 | 1 | NEW | 0 |
| 5 | 1 | D | 2 |
| 6 | 1 | Q | 2 |
| 7 | 1 | B | 2 |
| 8 | 1 | NEW | 0 |
| 9 | 1 | D | 3 |
| 10 | 1 | Z | 3 |
| 11 | 2 | A | 1 |
| 12 | 2 | B | 1 |
| 13 | 2 | C | 1 |
| 14 | 2 | NEW | 0 |
| 15 | 2 | D | 2 |
| 16 | 2 | Q | 2 |
| 17 | 2 | B | 2 |
| 18 | 2 | NEW | 0 |
| 19 | 2 | D | 3 |
| 20 | 2 | Z | 3 |
+----+-------+------+---------+
You need a column that specifies the ordering for the table. With that, just use a cumulative sum:
select t.*,
1 + sum(case when Calc = 'NEW' then 1 else 0 end) over (partition by Rank_Id order by Seq) as display
from t;
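A minimal sketch of the cumulative-sum query above, run in SQLite through Python's `sqlite3` module (SQLite 3.25 or later). The sample rows are made up for illustration, and `Seq` is the assumed ordering column.

```python
import sqlite3

# Hypothetical sample rows: (Seq, Rank_Id, Calc).
SAMPLE = [
    (1, 1, "A"), (2, 1, "B"), (3, 1, "NEW"), (4, 1, "D"), (5, 1, "NEW"), (6, 1, "Z"),
    (7, 2, "A"), (8, 2, "NEW"), (9, 2, "C"),
]

def running_display(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Seq INTEGER, Rank_Id INTEGER, Calc TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Running count of 'NEW' rows seen so far within each Rank_Id, plus 1.
    result = con.execute("""
        SELECT Calc,
               1 + SUM(CASE WHEN Calc = 'NEW' THEN 1 ELSE 0 END)
                       OVER (PARTITION BY Rank_Id ORDER BY Seq) AS display
        FROM t
        ORDER BY Seq
    """).fetchall()
    con.close()
    return result

for calc, display in running_display(SAMPLE):
    print(calc, display)
```

Note that in this form a 'NEW' row already carries the incremented number; wrapping the expression in a CASE, as in the first query, is what turns those rows into 0.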

Query with WITH clause and COUNT subquery

In the query below, I don't get the results I would expect. Any insights why? How could I reformulate the query to get the desired results?
Schema (SQLite v3.30)
WITH RECURSIVE
cnt(x,y) AS (VALUES(0,ABS(Random()%3)) UNION ALL SELECT x+1, ABS(Random()%3) FROM cnt WHERE x<10),
i_rnd as (SELECT r1.x, r1.y, (SELECT COUNT(*) FROM cnt as r2 WHERE r2.y<=r1.y) as idx FROM cnt as r1)
SELECT * FROM i_rnd ORDER BY y;
result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 3 |
| 5 | 0 | 6 |
| 8 | 0 | 5 |
| 9 | 0 | 4 |
| 10 | 0 | 2 |
| 3 | 1 | 4 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
expected result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 5 |
| 5 | 0 | 5 |
| 8 | 0 | 5 |
| 9 | 0 | 5 |
| 10 | 0 | 5 |
| 3 | 1 | 6 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
In other words, idx should indicate how many rows have a y less than or equal to the y of the row considered.
I would just use:
select cnt.*,
count(*) over (order by y)
from cnt;
Here is a db<>fiddle.
The issue with your code is probably that the CTE is re-evaluated each time it is called, so the values are not consistent -- a problem with volatile functions in CTEs.
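To see the window-function version without the volatility problem, here is a small sketch that stores the y values from the expected result in a real table first, so every reference sees the same data, and then applies `count(*) over (order by y)`. It uses Python's `sqlite3` module and assumes SQLite 3.25 or later.

```python
import sqlite3

# y values for x = 0..10, taken from the expected result in the question.
YS = [2, 0, 2, 1, 2, 0, 2, 2, 0, 0, 0]

def idx_rows(ys):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cnt (x INTEGER, y INTEGER)")
    con.executemany("INSERT INTO cnt VALUES (?, ?)", list(enumerate(ys)))
    # With ORDER BY and no explicit frame, the default RANGE frame includes
    # all peers of the current row, so ties on y get the same count.
    result = con.execute("""
        SELECT x, y, COUNT(*) OVER (ORDER BY y) AS idx
        FROM cnt
        ORDER BY y, x
    """).fetchall()
    con.close()
    return result

for row in idx_rows(YS):
    print(row)
```

Because the random values are materialized once, the counts are consistent: 5 for y = 0, 6 for y = 1, and 11 for y = 2.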

Ranking different combinations in SQL

I'm working with pharmacy data and I'm trying to rank the use of three specific medications (A, B, C) amongst a large group of patients. In short, I want to figure out the top 12 combinations of these meds that people are using. So for instance, patient 1 might take meds A + B, patient 2 takes A + C, patient 3 takes B + C, patient 4 takes A + B, and so forth. I did some digging and there are 25 possible combinations to rank. I want my output to look something like this:
The tables I'm working with look like this:
Currently I'm breaking the drugs up into different combination groups by doing something like this:
select distinct concat(substance_name, dosage, unit) as Drug_Dose_Combo,
count(distinct user_id) as Patients
from pharmacy_data a join drug_reference_table b
on a.drug_code=b.drug_code
group by 1
order by 2 desc
However, this seems very inefficient so I'm looking for a better way of building this out. I don't necessarily need to use a rank() here, I just want the output to look similar to what I've outlined above.
Maybe something like (Untested):
WITH meds_taken AS
(SELECT sum(CASE WHEN d.drug_name = :namea THEN 1 ELSE 0 END) AS drug_a
, sum(CASE WHEN d.drug_name = :nameb THEN 1 ELSE 0 END) AS drug_b
, sum(CASE WHEN d.drug_name = :namec THEN 1 ELSE 0 END) AS drug_c
FROM pharmacy_data AS p
JOIN drug_reference AS d ON p.drug_code = d.drug_code
GROUP BY p.user_id)
, med_counts AS
(SELECT drug_a, drug_b, drug_c, count(*) AS "user total"
FROM meds_taken
GROUP BY drug_a, drug_b, drug_c)
SELECT rank() OVER (ORDER BY "user total" DESC) AS rank
, drug_a, drug_b, drug_c, "user total"
FROM med_counts
ORDER BY "user total" DESC;
Alright, it's not too clear what you are looking for, but you did indicate that you want to perform some sort of frequency analysis based on combinations of up to three pharmaceutical products.
The first step in an analysis like this is to take the pharmacy data and, for each user_id, determine the sets of 1, 2, and 3 drug_dose combinations that they participate in. Since you may want to do the same analysis on the substance_name, drug_name, and/or drug_code, I'm going to throw the kitchen sink at it and do all four. Not knowing what sort of DB you have on the back end, I'm going to use SQL Server 2017 for this example. The concepts are applicable to DBs such as Oracle, MySQL, PostgreSQL and others, though the syntax may differ.
To create the drug_code and other combinations I'll first join the pharmacy_data table to the drug_reference table and then use a recursive query on the composite data:
with usage_info as (
select pd.user_id
, dr.drug_code
, dr.drug_name
, dr.substance_name
, concat(dr.substance_name,dr.dosage,dr.unit) drug_dose
from pharmacy_data pd
join drug_reference dr
on dr.drug_code = pd.drug_code
), recur(user_id, combo_id, dc_combo, dc_combo_size, dn_combo, sn_combo, dd_combo, last_dc) as (
-- Anchor part
select user_id
, cast(cast(drug_code as binary(4)) as varbinary(max))
, cast(drug_code as varchar(max))
, 1
, cast(drug_name as varchar(max))
, cast(substance_name as varchar(max))
, cast(drug_dose as varchar(max))
, drug_code
from usage_info
union all
-- Recursive Part
select prev.user_id
, prev.combo_id+cast(curr.drug_code as binary(4))
, prev.dc_combo+','+cast(curr.drug_code as varchar(max))
, prev.dc_combo_size+1
, prev.dn_combo+','+curr.drug_name
, prev.sn_combo+','+curr.substance_name
, prev.dd_combo+','+curr.drug_dose
, curr.drug_code
from recur prev
join usage_info curr
on prev.user_id = curr.user_id
and prev.last_dc < curr.drug_code
and prev.dc_combo_size < 3 -- Maximum combination size
)
Selecting from the above common table expressions for the data provided in your question:
select * from recur;
shows some irregularities in the groupings for the dn_combo, sn_combo, and possibly the dd_combo columns. For example, there are dn_combos for both 'CAZERTA,BEXERA' and 'BEXERA,CAZERTA', which really should be equivalent.
To rectify this I'll normalize the combinations by splitting them up and recombining them in sorted order. In the process I'll also deduplicate any instance where a user_id may have two or more equivalent but not identical products e.g. two different doses of the same medication:
, combos as (
select user_id
, combo_id
, dc_combo
, dc_combo_size
, -- Normalize and deduplicate Drug_Name combos
(select string_agg(value,',') within group (order by value)
from (select distinct value from string_split(dn_combo,',')) dn
) dn_combo
, (select count(distinct value) from string_split(dn_combo,',')) dn_combo_size
, -- Normalize and deduplicate Substance_Name combos
(select string_agg(value,',') within group (order by value)
from (select distinct value from string_split(sn_combo,',')) sn
) sn_combo
, (select count(distinct value) from string_split(sn_combo,',')) sn_combo_size
, -- Normalize and deduplicate Drug_Dose combos
(select string_agg(value,',') within group (order by value)
from (select distinct value from string_split(dd_combo,',')) ddc
) dd_combo
, (select count(distinct value) from string_split(dd_combo,',')) dd_combo_size
from recur
)
Now, while you could just select the count(user_id) over (partition by <grouping_column>) to get the occurrence frequency of each drug combination, those numbers could be inflated. For example, if your data had an additional user_id of 999 with drug_codes 50, 100, 200, and 350 (that's two different doses of BEXERA along with AXIOM and CAZERTA), then user_id 999 would show up multiple times for every combination that includes BEXERA. Depending on your database flavor you could just select the count(DISTINCT user_id) over (partition by <grouping_column>), but SQL Server 2017 doesn't allow the distinct operator in analytic functions. </shrug> We can still do it; it just takes another step to identify the unique values per group. Enter common table expression combo2, where we compute row numbers across various partitions:
, combo2 as (
select user_id
, combo_id
, dc_combo
, dc_combo_size
, row_number() over (partition by dc_combo, user_id order by dc_combo) dc_uid_rn
, dn_combo
, dn_combo_size
, row_number() over (partition by dn_combo, user_id order by dc_combo) dn_uid_rn
, row_number() over (partition by dn_combo, dc_combo order by user_id) dn_combo_rn
, sn_combo
, sn_combo_size
, row_number() over (partition by sn_combo, user_id order by dc_combo) sn_uid_rn
, row_number() over (partition by sn_combo, dc_combo order by user_id) sn_combo_rn
, dd_combo
, dd_combo_size
, row_number() over (partition by dd_combo, user_id order by dc_combo) dd_uid_rn
, row_number() over (partition by dd_combo, dc_combo order by user_id) dd_combo_rn
from combos
)
Finally, calculate the counts, of which there are two types. The uid_cnt columns are counts of distinct user_ids for each combination, and the combo_cnt columns indicate the number of distinct drug_code combinations that make up the less granular groupings:
select user_id
, combo_id
, dc_combo
, dc_combo_size
, count(case dc_uid_rn when 1 then 1 end) over (partition by dc_combo) dc_uid_cnt
, dn_combo
, dn_combo_size
, count(case dn_uid_rn when 1 then 1 end) over (partition by dn_combo) dn_uid_cnt
, count(case dn_combo_rn when 1 then 1 end) over (partition by dn_combo) dn_combo_cnt
, sn_combo
, sn_combo_size
, count(case sn_uid_rn when 1 then 1 end) over (partition by sn_combo) sn_uid_cnt
, count(case sn_combo_rn when 1 then 1 end) over (partition by sn_combo) sn_combo_cnt
, dd_combo
, dd_combo_size
, count(case dd_uid_rn when 1 then 1 end) over (partition by dd_combo) dd_uid_cnt
, count(case dd_combo_rn when 1 then 1 end) over (partition by dd_combo) dd_combo_cnt
from combo2
order by dn_combo, dd_combo
All together with my additional sample data the above code results in the following table. To see it in action please see the SQL Fiddle:
| user_id | dc_combo | dc_combo_size | dc_uid_cnt | dn_combo | dn_combo_size | dn_uid_cnt | dn_combo_cnt | sn_combo | sn_combo_size | sn_uid_cnt | sn_combo_cnt | dd_combo | dd_combo_size | dd_uid_cnt | dd_combo_cnt |
|---------|-------------|---------------|------------|----------------------|---------------|------------|--------------|---------------------------------|---------------|------------|--------------|-------------------------------------------------|---------------|------------|--------------|
| 3 | 200 | 1 | 2 | AXIOM | 1 | 4 | 3 | nsaid | 1 | 4 | 3 | nsaid10mg | 1 | 2 | 1 |
| 999 | 200 | 1 | 2 | AXIOM | 1 | 4 | 3 | nsaid | 1 | 4 | 3 | nsaid10mg | 1 | 2 | 1 |
| 175 | 300 | 1 | 1 | AXIOM | 1 | 4 | 3 | nsaid | 1 | 4 | 3 | nsaid25mg | 1 | 1 | 1 |
| 1 | 25 | 1 | 1 | AXIOM | 1 | 4 | 3 | nsaid | 1 | 4 | 3 | nsaid5mg | 1 | 1 | 1 |
| 999 | 200,350 | 2 | 1 | AXIOM,BEXERA | 2 | 3 | 5 | nsaid,potassium | 2 | 3 | 5 | nsaid10mg,potassium12mg | 2 | 1 | 1 |
| 999 | 50,200,350 | 3 | 1 | AXIOM,BEXERA | 2 | 3 | 5 | nsaid,potassium | 2 | 3 | 5 | nsaid10mg,potassium12mg,potassium20mg | 3 | 1 | 1 |
| 999 | 50,200 | 2 | 1 | AXIOM,BEXERA | 2 | 3 | 5 | nsaid,potassium | 2 | 3 | 5 | nsaid10mg,potassium20mg | 2 | 1 | 1 |
| 175 | 50,300 | 2 | 1 | AXIOM,BEXERA | 2 | 3 | 5 | nsaid,potassium | 2 | 3 | 5 | nsaid25mg,potassium20mg | 2 | 1 | 1 |
| 1 | 25,50 | 2 | 1 | AXIOM,BEXERA | 2 | 3 | 5 | nsaid,potassium | 2 | 3 | 5 | nsaid5mg,potassium20mg | 2 | 1 | 1 |
| 999 | 100,200,350 | 3 | 1 | AXIOM,BEXERA,CAZERTA | 3 | 2 | 3 | nsaid,potassium,sodium chloride | 3 | 2 | 3 | nsaid10mg,potassium12mg,sodium chloride10mg | 3 | 1 | 1 |
| 999 | 50,100,200 | 3 | 1 | AXIOM,BEXERA,CAZERTA | 3 | 2 | 3 | nsaid,potassium,sodium chloride | 3 | 2 | 3 | nsaid10mg,potassium20mg,sodium chloride10mg | 3 | 1 | 1 |
| 1 | 25,50,100 | 3 | 1 | AXIOM,BEXERA,CAZERTA | 3 | 2 | 3 | nsaid,potassium,sodium chloride | 3 | 2 | 3 | nsaid5mg,potassium20mg,sodium chloride10mg | 3 | 1 | 1 |
| 999 | 100,200 | 2 | 1 | AXIOM,CAZERTA | 2 | 2 | 2 | nsaid,sodium chloride | 2 | 2 | 2 | nsaid10mg,sodium chloride10mg | 2 | 1 | 1 |
| 1 | 25,100 | 2 | 1 | AXIOM,CAZERTA | 2 | 2 | 2 | nsaid,sodium chloride | 2 | 2 | 2 | nsaid5mg,sodium chloride10mg | 2 | 1 | 1 |
| 201 | 350 | 1 | 2 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium12mg | 1 | 2 | 1 |
| 999 | 350 | 1 | 2 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium12mg | 1 | 2 | 1 |
| 999 | 50,350 | 2 | 1 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium12mg,potassium20mg | 2 | 1 | 1 |
| 378 | 400 | 1 | 1 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium15mg | 1 | 1 | 1 |
| 1 | 50 | 1 | 3 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium20mg | 1 | 3 | 1 |
| 175 | 50 | 1 | 3 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium20mg | 1 | 3 | 1 |
| 999 | 50 | 1 | 3 | BEXERA | 1 | 5 | 4 | potassium | 1 | 5 | 4 | potassium20mg | 1 | 3 | 1 |
| 999 | 50,100,350 | 3 | 1 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium12mg,potassium20mg,sodium chloride10mg | 3 | 1 | 1 |
| 999 | 100,350 | 2 | 1 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium12mg,sodium chloride10mg | 2 | 1 | 1 |
| 201 | 350,450 | 2 | 1 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium12mg,sodium chloride30mg | 2 | 1 | 1 |
| 378 | 100,400 | 2 | 1 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium15mg,sodium chloride10mg | 2 | 1 | 1 |
| 1 | 50,100 | 2 | 2 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium20mg,sodium chloride10mg | 2 | 2 | 1 |
| 999 | 50,100 | 2 | 2 | BEXERA,CAZERTA | 2 | 4 | 5 | potassium,sodium chloride | 2 | 4 | 5 | potassium20mg,sodium chloride10mg | 2 | 2 | 1 |
| 1 | 100 | 1 | 3 | CAZERTA | 1 | 4 | 2 | sodium chloride | 1 | 4 | 2 | sodium chloride10mg | 1 | 3 | 1 |
| 378 | 100 | 1 | 3 | CAZERTA | 1 | 4 | 2 | sodium chloride | 1 | 4 | 2 | sodium chloride10mg | 1 | 3 | 1 |
| 999 | 100 | 1 | 3 | CAZERTA | 1 | 4 | 2 | sodium chloride | 1 | 4 | 2 | sodium chloride10mg | 1 | 3 | 1 |
| 201 | 450 | 1 | 1 | CAZERTA | 1 | 4 | 2 | sodium chloride | 1 | 4 | 2 | sodium chloride30mg | 1 | 1 | 1 |
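The normalize-and-count idea from this answer can also be sketched outside SQL. The following Python sketch is only an illustration, not a translation of the T-SQL: it builds every combination of up to three drug codes per user, sorts and deduplicates the drug names so that 'CAZERTA,BEXERA' and 'BEXERA,CAZERTA' collapse into one key, and counts distinct users per key. The two dictionaries are a small subset read off the result table above; the function name and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# drug_code -> drug_name, read off the result table (subset only).
DRUG_NAMES = {25: "AXIOM", 50: "BEXERA", 100: "CAZERTA",
              200: "AXIOM", 300: "AXIOM", 350: "BEXERA"}

# user_id -> drug codes used, again a subset of the sample data.
USAGE = {1: [25, 50, 100], 175: [50, 300], 999: [50, 100, 200, 350]}

def users_per_name_combo(usage, names, max_size=3):
    """Count distinct users for each normalized drug-name combination."""
    users_by_combo = defaultdict(set)
    for user_id, codes in usage.items():
        for size in range(1, max_size + 1):
            for code_combo in combinations(sorted(codes), size):
                # Sorting a set of names normalizes the order and removes
                # duplicates such as two doses of the same drug.
                key = ",".join(sorted({names[code] for code in code_combo}))
                # A set per key means each user is counted once per combo.
                users_by_combo[key].add(user_id)
    return {combo: len(users) for combo, users in users_by_combo.items()}

counts = users_per_name_combo(USAGE, DRUG_NAMES)
for combo, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(n, combo)
```

Collecting users in a set plays the role that the row_number() columns play in combo2: user 999 contributes to 'BEXERA' only once even though two of their drug codes map to it.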

Postgresql change value based on the change of another field

I have a Postgres table like this:
id | value
----+-------
1 | 100
2 | 100
3 | 100
4 | 100
5 | 200
6 | 200
7 | 200
8 | 100
9 | 100
10 | 300
I'd like to have a table like this:
id | value |new_id
----+---------+-----
1 | 100 | 1
2 | 100 | 1
3 | 100 | 1
4 | 100 | 1
5 | 200 | 2
6 | 200 | 2
7 | 200 | 2
8 | 100 | 3
9 | 100 | 3
10 | 300 | 4
I'd like a new field, new_id, that changes whenever value changes and stays the same until value changes again.
My question is similar to this one, but I could not find a solution.
You can identify sequences where the value is the same by using a difference of row_number(). After getting the difference, you have a group identifier and can calculate the minimum id for each group. Then, dense_rank() will renumber the values based on this ordering.
It looks like this:
select t.id, t.value, dense_rank() over (order by minid) as new_id
from (select t.*, min(id) over (partition by value, grp) as minid
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
) t
You can see what happens to your sample data:
id | value | grp | minid | new_id |
----+-------+-----+-------+--------+
1 | 100 | 0 | 1 | 1 |
2 | 100 | 0 | 1 | 1 |
3 | 100 | 0 | 1 | 1 |
4 | 100 | 0 | 1 | 1 |
5 | 200 | 4 | 5 | 2 |
6 | 200 | 4 | 5 | 2 |
7 | 200 | 4 | 5 | 2 |
8 | 100 | 3 | 8 | 3 |
9 | 100 | 3 | 8 | 3 |
10 | 300 | 9 | 10 | 4 |
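To try the query on the sample data without a Postgres instance, here is a small sketch that runs the same window functions in SQLite through Python's `sqlite3` module (SQLite 3.25 or later). The table name `t` and the subquery aliases are placeholders chosen for this sketch.

```python
import sqlite3

# The value column from the question, for id = 1..10.
VALUES = [100, 100, 100, 100, 200, 200, 200, 100, 100, 300]

QUERY = """
SELECT id, value, DENSE_RANK() OVER (ORDER BY minid) AS new_id
FROM (SELECT s.*, MIN(id) OVER (PARTITION BY value, grp) AS minid
      FROM (SELECT t.*,
                   -- constant within each run of equal values
                   ROW_NUMBER() OVER (ORDER BY id)
                   - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS grp
            FROM t) AS s) AS u
ORDER BY id
"""

def new_ids(values):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(values, start=1)))
    result = [row[2] for row in con.execute(QUERY)]
    con.close()
    return result

print(new_ids(VALUES))
```

The second run of 100 (ids 8 and 9) gets its own new_id because its grp differs from that of the first run, even though the value is the same.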

Sequential Group By in sql server

For this Table:
+----+--------+-------+
| ID | Status | Value |
+----+--------+-------+
| 1 | 1 | 4 |
| 2 | 1 | 7 |
| 3 | 1 | 9 |
| 4 | 2 | 1 |
| 5 | 2 | 7 |
| 6 | 1 | 8 |
| 7 | 1 | 9 |
| 8 | 2 | 1 |
| 9 | 0 | 4 |
| 10 | 0 | 3 |
| 11 | 0 | 8 |
| 12 | 1 | 9 |
| 13 | 3 | 1 |
+----+--------+-------+
I need to sum sequential groups with the same Status to produce this result.
+--------+------------+
| Status | Sum(Value) |
+--------+------------+
| 1 | 20 |
| 2 | 8 |
| 1 | 17 |
| 2 | 1 |
| 0 | 15 |
| 1 | 9 |
| 3 | 1 |
+--------+------------+
How can I do that in SQL Server?
NB: The values in the ID column are contiguous.
Per the tag I added to your question this is a gaps and islands problem.
The best performing solution will likely be
WITH T
AS (SELECT *,
ID - ROW_NUMBER() OVER (PARTITION BY [STATUS] ORDER BY [ID]) AS Grp
FROM YourTable)
SELECT [STATUS],
SUM([VALUE]) AS [SUM(VALUE)]
FROM T
GROUP BY [STATUS],
Grp
ORDER BY MIN(ID)
If the ID values were not guaranteed contiguous as stated then you would need to use
ROW_NUMBER() OVER (ORDER BY [ID]) -
ROW_NUMBER() OVER (PARTITION BY [STATUS] ORDER BY [ID]) AS Grp
instead in the CTE definition.
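As a quick check of the non-contiguous case, the sketch below runs the two-ROW_NUMBER() variant in SQLite through Python's `sqlite3` module (SQLite 3.25 or later). The IDs from the question are multiplied by 10 here on purpose, which is an assumption made for the demonstration, so that `ID - ROW_NUMBER()` alone would no longer group correctly.

```python
import sqlite3

# (ID, Status, Value) from the question, with IDs spread out to 10, 20, ...
DATA = [
    (10, 1, 4), (20, 1, 7), (30, 1, 9), (40, 2, 1), (50, 2, 7),
    (60, 1, 8), (70, 1, 9), (80, 2, 1), (90, 0, 4), (100, 0, 3),
    (110, 0, 8), (120, 1, 9), (130, 3, 1),
]

QUERY = """
WITH T AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY ID)
           - ROW_NUMBER() OVER (PARTITION BY Status ORDER BY ID) AS Grp
    FROM YourTable
)
SELECT Status, SUM(Value)
FROM T
GROUP BY Status, Grp
ORDER BY MIN(ID)
"""

def sequential_sums(data):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (ID INTEGER, Status INTEGER, Value INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", data)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(sequential_sums(DATA))
```

The sums match the desired result from the question, one row per island, in the order in which the islands appear.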