Redshift SQL - Count Sequences of Repeating Values Within Groups

I have a table that looks like this:
| id | date_start | gap_7_days |
| -- | ------------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 |
| 1 | 2021-06-13 00:00:00 | 0 |
| 1 | 2021-06-19 00:00:00 | 0 |
| 1 | 2021-06-27 00:00:00 | 0 |
| 2 | 2021-07-04 00:00:00 | 1 |
| 2 | 2021-07-11 00:00:00 | 1 |
| 2 | 2021-07-18 00:00:00 | 1 |
| 2 | 2021-07-25 00:00:00 | 1 |
| 2 | 2021-08-01 00:00:00 | 1 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-09 00:00:00 | 0 |
| 2 | 2021-08-16 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
| 2 | 2021-08-30 00:00:00 | 1 |
| 2 | 2021-08-31 00:00:00 | 0 |
| 2 | 2021-09-01 00:00:00 | 0 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-15 00:00:00 | 1 |
| 2 | 2021-08-22 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
For each ID, I check whether consecutive date_start values are 7 days apart, and put a 1 or 0 in gap_7_days accordingly.
I want to do the following (using Redshift SQL only):
Get the length of each sequence of consecutive 1s in gap_7_days for each ID
Expected output:
| id | date_start | gap_7_days | sequence_length |
| -- | ------------------- | --------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 | |
| 1 | 2021-06-13 00:00:00 | 0 | |
| 1 | 2021-06-19 00:00:00 | 0 | |
| 1 | 2021-06-27 00:00:00 | 0 | |
| 2 | 2021-07-04 00:00:00 | 1 | 6 |
| 2 | 2021-07-11 00:00:00 | 1 | 6 |
| 2 | 2021-07-18 00:00:00 | 1 | 6 |
| 2 | 2021-07-25 00:00:00 | 1 | 6 |
| 2 | 2021-08-01 00:00:00 | 1 | 6 |
| 2 | 2021-08-08 00:00:00 | 1 | 6 |
| 2 | 2021-08-09 00:00:00 | 0 | |
| 2 | 2021-08-16 00:00:00 | 1 | 3 |
| 2 | 2021-08-23 00:00:00 | 1 | 3 |
| 2 | 2021-08-30 00:00:00 | 1 | 3 |
| 2 | 2021-08-31 00:00:00 | 0 | |
| 2 | 2021-09-01 00:00:00 | 0 | |
| 2 | 2021-08-08 00:00:00 | 1 | 4 |
| 2 | 2021-08-15 00:00:00 | 1 | 4 |
| 2 | 2021-08-22 00:00:00 | 1 | 4 |
| 2 | 2021-08-23 00:00:00 | 1 | 4 |
Get the number of sequences for each ID
Expected output:
| id | num_sequences |
| -- | ------------------- |
| 1 | 0 |
| 2 | 3 |
How can I achieve this?

If you want the number of sequences, just look at the previous value. When the current value is "1" and the previous is NULL or 0, then you have a new sequence.
So:
select id,
       sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select t.*,
             lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
      from t
     ) t
group by id;
If you actually want the lengths of the sequences, as in the intermediate results, then ask a new question. That information is not needed for this question.
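For the sequence lengths asked about in the first part of the question, one possible approach is a gaps-and-islands query. The following is only a sketch, not part of the original answer. It assumes the table is named t, as in the query above, and that date_start defines the row order within each id. The sample rows are not strictly ordered by date_start, so if the displayed order is what matters, a separate ordering column would be needed. The grp column is an illustrative name.

```sql
-- Sketch only: assumes table t and that date_start orders the rows within each id.
select id,
       date_start,
       gap_7_days,
       -- only rows inside a run of 1s get a length; 0 rows stay null
       case when gap_7_days = 1
            then count(*) over (partition by id, grp, gap_7_days)
       end as sequence_length
from (select t.*,
             -- running count of 0s: every 0 starts a new group, so an
             -- unbroken run of 1s shares a single grp value
             sum(case when gap_7_days = 0 then 1 else 0 end)
                 over (partition by id
                       order by date_start
                       rows between unbounded preceding and current row) as grp
      from t
     ) t;
```

The explicit frame clause is there because Redshift requires one when an aggregate window function has an ORDER BY. Partitioning the final count by gap_7_days as well as grp keeps the 0 row that opens a group out of the count for the 1s that follow it.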

Related

cumulative amount to current_date

base_table
month id sales cumulative_sales
2021-01-01 33205 10 10
2021-02-01 33205 15 25
Based on the base table above, I would like to add rows up to the current month, even for months with no sales.
Expected table
month id sales cumulative_sales
2021-01-01 33205 10 10
2021-02-01 33205 15 25
2021-03-01 33205 0 25
2021-04-01 33205 0 25
2021-05-01 33205 0 25
.........
2021-11-01 33205 0 25
My query stops at
select month, id, sales,
sum(sales) over (partition by id
order by month
rows between unbounded preceding and current row) as cumulative_sales
from base_table

This works, assuming the month column is constrained to hold only first-of-the-month dates. Use the desired hard-coded start date, or use another CTE to get the earliest date from base_table:
with base_table as (
    select *
    from (values
         ('2021-01-01'::date,33205,10)
        ,('2021-02-01'      ,33205,15)
        ,('2021-01-01'      ,12345,99)
        ,('2021-04-01'      ,12345,88)
    ) dat("month",id,sales)
)
select cal.dt::date
      ,list.id
      ,coalesce(dat.sales,0) as sales
      ,coalesce(sum(dat.sales) over (partition by list.id order by cal.dt),0) as cumulative_sales
from generate_series('2020-06-01' /* use desired start date here */,current_date,'1 month') cal(dt)
cross join (select distinct id from base_table) list
left join base_table dat on dat."month" = cal.dt and dat.id = list.id
;
Results:
| dt | id | sales | cumulative_sales |
+------------+-------+-------+------------------+
| 2020-06-01 | 12345 | 0 | 0 |
| 2020-07-01 | 12345 | 0 | 0 |
| 2020-08-01 | 12345 | 0 | 0 |
| 2020-09-01 | 12345 | 0 | 0 |
| 2020-10-01 | 12345 | 0 | 0 |
| 2020-11-01 | 12345 | 0 | 0 |
| 2020-12-01 | 12345 | 0 | 0 |
| 2021-01-01 | 12345 | 99 | 99 |
| 2021-02-01 | 12345 | 0 | 99 |
| 2021-03-01 | 12345 | 0 | 99 |
| 2021-04-01 | 12345 | 88 | 187 |
| 2021-05-01 | 12345 | 0 | 187 |
| 2021-06-01 | 12345 | 0 | 187 |
| 2021-07-01 | 12345 | 0 | 187 |
| 2021-08-01 | 12345 | 0 | 187 |
| 2021-09-01 | 12345 | 0 | 187 |
| 2021-10-01 | 12345 | 0 | 187 |
| 2021-11-01 | 12345 | 0 | 187 |
| 2020-06-01 | 33205 | 0 | 0 |
| 2020-07-01 | 33205 | 0 | 0 |
| 2020-08-01 | 33205 | 0 | 0 |
| 2020-09-01 | 33205 | 0 | 0 |
| 2020-10-01 | 33205 | 0 | 0 |
| 2020-11-01 | 33205 | 0 | 0 |
| 2020-12-01 | 33205 | 0 | 0 |
| 2021-01-01 | 33205 | 10 | 10 |
| 2021-02-01 | 33205 | 15 | 25 |
| 2021-03-01 | 33205 | 0 | 25 |
| 2021-04-01 | 33205 | 0 | 25 |
| 2021-05-01 | 33205 | 0 | 25 |
| 2021-06-01 | 33205 | 0 | 25 |
| 2021-07-01 | 33205 | 0 | 25 |
| 2021-08-01 | 33205 | 0 | 25 |
| 2021-09-01 | 33205 | 0 | 25 |
| 2021-10-01 | 33205 | 0 | 25 |
| 2021-11-01 | 33205 | 0 | 25 |
The cross join pairs every date output by generate_series() with every id value from base_table.
The left join ensures that no dt+id pairs get dropped from the output when no such record exists in base_table.
The coalesce() functions ensure that the sales and cumulative_sales show 0 instead of null for dt+id combinations that don't exist in base_table. Remove them if you don't mind seeing nulls in those columns.
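The variant mentioned at the start of this answer, where the start date comes from the data instead of being hard-coded, might look roughly like this. It is a sketch that assumes base_table exists as a real table (rather than the inline CTE above) and that the database supports generate_series() and lateral joins, as PostgreSQL does. The names bounds and first_month are illustrative.

```sql
-- Sketch only: assumes base_table is a real table with columns
-- "month" (date), id and sales; bounds/first_month are made-up names.
with bounds as (
    select min("month") as first_month
    from base_table
)
select cal.dt::date as dt
      ,list.id
      ,coalesce(dat.sales, 0) as sales
      ,coalesce(sum(dat.sales) over (partition by list.id order by cal.dt), 0) as cumulative_sales
from bounds
-- one row per month from the earliest month in the data up to today
cross join lateral generate_series(bounds.first_month::timestamp
                                  ,current_date::timestamp
                                  ,interval '1 month') cal(dt)
cross join (select distinct id from base_table) list
left join base_table dat on dat."month" = cal.dt and dat.id = list.id
order by list.id, cal.dt;
```

With this version the output starts at the earliest month present in base_table, so ids whose first sale comes later still get leading rows with 0 sales.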

Dense rank is not generating rows correctly

I have a table A:
Create table A(
Name varchar(10),
Number integer,
Exc integer,
D1 date
)
I have inserted 11 rows.
Sel * from A;
+ -----+--------+-----+------------+
| NAME | NUMBER | EXC | D1 |
+ -----+--------+-----+------------+
| a | 1 | 1 | 2020-02-03 |
| a | 1 | 2 | 2020-02-03 |
| a | 1 | 3 | 2020-02-03 |
| a | 1 | 4 | 2020-02-03 |
| a | 1 | 1 | 2020-02-04 |
| a | 1 | 2 | 2020-02-04 |
| a | 1 | 3 | 2020-02-04 |
| a | 1 | 1 | 2020-02-05 |
| a | 1 | 2 | 2020-02-05 |
| a | 1 | 3 | 2020-02-05 |
| a | 1 | 4 | 2020-02-05 |
+ -----+--------+-----+------------+
Now, when I apply dense rank like below:
sel vt.*,dense_rank() OVER(PARTITION BY Name,Number,EXC ORDER BY D1 ) AS rn
from vt;
Output:
+ -----+--------+-----+------------+----+
| NAME | NUMBER | EXC | D1 | RN |
+ -----+--------+-----+------------+----+
| a | 1 | 1 | 2020-02-03 | 1 |
| a | 1 | 2 | 2020-02-03 | 1 |
| a | 1 | 3 | 2020-02-03 | 1 |
| a | 1 | 4 | 2020-02-03 | 1 |
| a | 1 | 1 | 2020-02-04 | 2 |
| a | 1 | 2 | 2020-02-04 | 2 |
| a | 1 | 3 | 2020-02-04 | 2 |
| a | 1 | 1 | 2020-02-05 | 3 |
| a | 1 | 2 | 2020-02-05 | 3 |
| a | 1 | 3 | 2020-02-05 | 3 |
| a | 1 | 4 | 2020-02-05 | 2 |
+ -----+--------+-----+------------+----+
Expected:
+ -----+--------+-----+------------+----+
| NAME | NUMBER | EXC | D1 | RN |
+ -----+--------+-----+------------+----+
| a | 1 | 1 | 2020-02-03 | 1 |
| a | 1 | 2 | 2020-02-03 | 1 |
| a | 1 | 3 | 2020-02-03 | 1 |
| a | 1 | 4 | 2020-02-03 | 1 |
| a | 1 | 1 | 2020-02-04 | 2 |
| a | 1 | 2 | 2020-02-04 | 2 |
| a | 1 | 3 | 2020-02-04 | 2 |
| a | 1 | 1 | 2020-02-05 | 3 |
| a | 1 | 2 | 2020-02-05 | 3 |
| a | 1 | 3 | 2020-02-05 | 3 |
| a | 1 | 4 | 2020-02-05 | 3 | <-- Difference here
+ -----+--------+-----+------------+----+

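With EXC in the PARTITION BY clause, each EXC value is ranked separately, and EXC = 4 appears on only two dates, so its second date gets rank 2. Removing EXC from the partition gives the results that you expect: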
DENSE_RANK() OVER(PARTITION BY Name, Number ORDER BY D1)
Demo on DB Fiddle:
name | number | exc | d1 | rn
:--- | -----: | --: | :--------- | :-
a | 1 | 1 | 2020-02-03 | 1
a | 1 | 2 | 2020-02-03 | 1
a | 1 | 3 | 2020-02-03 | 1
a | 1 | 4 | 2020-02-03 | 1
a | 1 | 1 | 2020-02-04 | 2
a | 1 | 2 | 2020-02-04 | 2
a | 1 | 3 | 2020-02-04 | 2
a | 1 | 1 | 2020-02-05 | 3
a | 1 | 2 | 2020-02-05 | 3
a | 1 | 3 | 2020-02-05 | 3
a | 1 | 4 | 2020-02-05 | 3
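To see the difference directly, both rankings can be computed side by side. This is a small sketch against the vt table from the question; the column aliases rn_with_exc and rn_without_exc are illustrative.

```sql
-- Sketch only: vt is the table name used in the question's query.
select vt.*,
       -- ranks dates separately for every (Name, Number, EXC) combination
       dense_rank() over (partition by Name, Number, EXC order by D1) as rn_with_exc,
       -- ranks dates once per (Name, Number), regardless of EXC
       dense_rank() over (partition by Name, Number order by D1) as rn_without_exc
from vt
order by D1, EXC;
```

For the row with EXC = 4 on 2020-02-05, rn_with_exc is 2 because EXC = 4 has no row on 2020-02-04, while rn_without_exc is 3.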

Do you simply want to rank all rows by date? Then remove the partition clause, so that the ranking is not done within groups.
select name, number, exc, d1, dense_rank() over (order by d1) as rn
from vt
order by d1, name, number, exc;

Get next result with specific ORDER BY satisfying the WHERE clause

Given a TripID, I need to grab the next result that satisfies certain criteria (TripSource <> 1 AND HasLot = 1). The difficulty is that "the next trip" is defined by "ORDER BY TripDate, TripOrder", so TripID has nothing to do with the order.
(I'm using SQL Server 2008, so I can't use LEAD or LAG but I'm also interested in answers using them.)
Example datasource:
+--------+-------------------------+-----------+------------+--------+
| TripID | TripDate | TripOrder | TripSource | HasLot |
+--------+-------------------------+-----------+------------+--------+
1. | 37172 | 2019-08-01 00:00:00.000 | 0 | 1 | 0 |
2. | 37211 | 2019-08-01 00:00:00.000 | 1 | 1 | 0 |
3. | 37198 | 2019-08-01 00:00:00.000 | 2 | 2 | 1 |
4. | 37213 | 2019-08-01 00:00:00.000 | 3 | 1 | 0 |
5. | 37245 | 2019-08-02 00:00:00.000 | 0 | 1 | 0 |
6. | 37279 | 2019-08-02 00:00:00.000 | 1 | 1 | 0 |
7. | 37275 | 2019-08-02 00:00:00.000 | 2 | 1 | 0 |
8. | 37264 | 2019-08-02 00:00:00.000 | 3 | 2 | 0 |
9. | 37336 | 2019-08-03 00:00:00.000 | 0 | 1 | 1 |
10. | 37320 | 2019-08-05 00:00:00.000 | 0 | 1 | 0 |
11. | 37354 | 2019-08-05 00:00:00.000 | 1 | 1 | 0 |
12. | 37329 | 2019-08-05 00:00:00.000 | 2 | 1 | 0 |
13. | 37373 | 2019-08-06 00:00:00.000 | 0 | 1 | 0 |
14. | 37419 | 2019-08-06 00:00:00.000 | 1 | 1 | 0 |
15. | 37421 | 2019-08-06 00:00:00.000 | 2 | 1 | 0 |
16. | 37414 | 2019-08-06 00:00:00.000 | 3 | 1 | 1 |
17. | 37459 | 2019-08-07 00:00:00.000 | 0 | 2 | 1 |
18. | 37467 | 2019-08-07 00:00:00.000 | 1 | 1 | 0 |
19. | 37463 | 2019-08-07 00:00:00.000 | 2 | 1 | 0 |
20. | 37461 | 2019-08-07 00:00:00.000 | 3 | 0 | 0 |
+--------+-------------------------+-----------+------------+--------+
Results I need:
Given TripID 37211 (Row 2.) I need to get 37198 (Row 3.)
Given TripID 37198 (Row 3.) I need to get 37459 (Row 17.)
Given TripID 37459 (Row 17.) I need to get null
Given TripID 37463 (Row 19.) I need to get null

You can use a correlated subquery or outer apply:
select t.*, t2.tripid as next_tripid
from trips t outer apply
     (select top (1) t2.*
      from trips t2
      where t2.tripsource <> 1 and t2.haslot = 1 and
            (t2.tripdate > t.tripdate or
             (t2.tripdate = t.tripdate and t2.triporder > t.triporder)
            )
      -- ascending order, so that top (1) picks the nearest following trip
      order by t2.tripdate asc, t2.triporder asc
     ) t2;
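The correlated-subquery form mentioned above could be sketched as follows for a single trip. The variable @TripID and the alias next_tripid are illustrative, and the table name trips is taken from the query above. It uses only syntax available in SQL Server 2008.

```sql
-- Sketch only: @TripID is a hypothetical parameter; trips is the table name
-- used in the answer above.
declare @TripID int = 37211;

select t.tripid,
       (select top (1) t2.tripid
        from trips t2
        where t2.tripsource <> 1
          and t2.haslot = 1
          -- strictly after the given trip in (TripDate, TripOrder) order
          and (t2.tripdate > t.tripdate
               or (t2.tripdate = t.tripdate and t2.triporder > t.triporder))
        order by t2.tripdate, t2.triporder) as next_tripid
from trips t
where t.tripid = @TripID;
```

Because the lookup is a scalar subquery, a trip with no later match still returns one row, with next_tripid set to null, which matches the null cases listed in the question.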

Count rows each month of a year - SQL Server

I have a table "Product" as :
| ProductId | ProductCatId | Price | Date | Deadline |
--------------------------------------------------------------------
| 1 | 1 | 10.00 | 2016-01-01 | 2016-01-27 |
| 2 | 2 | 10.00 | 2016-02-01 | 2016-02-27 |
| 3 | 3 | 10.00 | 2016-03-01 | 2016-03-27 |
| 4 | 1 | 10.00 | 2016-04-01 | 2016-04-27 |
| 5 | 3 | 10.00 | 2016-05-01 | 2016-05-27 |
| 6 | 3 | 10.00 | 2016-06-01 | 2016-06-27 |
| 7 | 1 | 20.00 | 2016-01-01 | 2016-01-27 |
| 8 | 2 | 30.00 | 2016-02-01 | 2016-02-27 |
| 9 | 1 | 40.00 | 2016-03-01 | 2016-03-27 |
| 10 | 4 | 15.00 | 2016-04-01 | 2016-04-27 |
| 11 | 1 | 25.00 | 2016-05-01 | 2016-05-27 |
| 12 | 5 | 55.00 | 2016-06-01 | 2016-06-27 |
| 13 | 5 | 55.00 | 2016-06-01 | 2016-01-27 |
| 14 | 5 | 55.00 | 2016-06-01 | 2016-02-27 |
| 15 | 5 | 55.00 | 2016-06-01 | 2016-03-27 |
I want to create a stored procedure that counts the rows of Product for each month, restricted to the current year (Year = CurrentYear), like this:
| Month| SumProducts | SumExpiredProducts |
-------------------------------------------
| 1 | 3 | 3 |
| 2 | 3 | 3 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 2 | 2 |
What should I do?

You can use a query like the following:
SELECT MONTH([Date]) AS [Month],
COUNT(*) AS SumProducts ,
COUNT(CASE WHEN [Date] > Deadline THEN 1 END) AS SumExpiredProducts
FROM mytable
WHERE YEAR([Date]) = YEAR(GETDATE())
GROUP BY MONTH([Date])
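Since the question asks for a stored procedure, the query could be wrapped along these lines. This is only a sketch: dbo.CountProductsByMonth is a hypothetical name, and mytable stands for the Product table, as in the query above.

```sql
-- Sketch only: the procedure name is hypothetical; mytable stands for Product.
CREATE PROCEDURE dbo.CountProductsByMonth
AS
BEGIN
    SET NOCOUNT ON;

    SELECT MONTH([Date]) AS [Month],
           COUNT(*) AS SumProducts,
           -- same "expired" condition as the query above
           COUNT(CASE WHEN [Date] > Deadline THEN 1 END) AS SumExpiredProducts
    FROM mytable
    WHERE YEAR([Date]) = YEAR(GETDATE())
    GROUP BY MONTH([Date])
    ORDER BY MONTH([Date]);
END;
```

It would then be called with EXEC dbo.CountProductsByMonth in a separate batch. Note that the expected counts in the question (3, 3, 3, 2, 2, 2) appear to line up with grouping on Deadline rather than [Date], so the grouping column and the expiry condition may need adjusting to the intended definition.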

Get last value with delta from previous row

I have data
| account | type | position | created_date |
|---------|------|----------|------|
| 1 | 1 | 1 | 2016-08-01 00:00:00 |
| 2 | 1 | 2 | 2016-08-01 00:00:00 |
| 1 | 2 | 2 | 2016-08-01 00:00:00 |
| 2 | 2 | 1 | 2016-08-01 00:00:00 |
| 1 | 1 | 2 | 2016-08-02 00:00:00 |
| 2 | 1 | 1 | 2016-08-02 00:00:00 |
| 1 | 2 | 1 | 2016-08-03 00:00:00 |
| 2 | 2 | 2 | 2016-08-03 00:00:00 |
| 1 | 1 | 2 | 2016-08-04 00:00:00 |
| 2 | 1 | 1 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 2016-08-07 00:00:00 |
| 2 | 2 | 1 | 2016-08-07 00:00:00 |
I need to get the last position for each pair (account, type, position) along with the delta from the previous position. I'm trying to use window functions, but I only get all rows and can't group them or pick the last one.
SELECT
account,
type,
FIRST_VALUE(position) OVER w AS position,
FIRST_VALUE(position) OVER w - LEAD(position, 1, 0) OVER w AS delta,
created_date
FROM table
WINDOW w AS (PARTITION BY account ORDER BY created_date DESC)
I have result
| account | type | position | delta | created_date |
|---------|------|----------|-------|--------------|
| 1 | 1 | 1 | 1 | 2016-08-01 00:00:00 |
| 1 | 1 | 2 | 1 | 2016-08-02 00:00:00 |
| 1 | 1 | 2 | 0 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 2 | 2016-08-01 00:00:00 |
| 1 | 2 | 1 | -1 | 2016-08-03 00:00:00 |
| 1 | 2 | 2 | 1 | 2016-08-07 00:00:00 |
| 2 | 1 | 2 | 2 | 2016-08-01 00:00:00 |
| 2 | 2 | 1 | 1 | 2016-08-01 00:00:00 |
| and so on |
But I need only the last record for each account/type pair:
| account | type | position | delta | created_date |
|---------|------|----------|-------|--------------|
| 1 | 1 | 2 | 0 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 1 | 2016-08-07 00:00:00 |
| 2 | 1 | 1 | 0 | 2016-08-04 00:00:00 |
| and so on |
Sorry for my bad language and Thanks for any help.

My "best" try..
WITH cte_delta AS (
SELECT
account,
type,
FIRST_VALUE(position) OVER w AS position,
FIRST_VALUE(position) OVER w - LEAD(position, 1, 0) OVER w AS delta,
created_date
FROM table
WINDOW w AS (PARTITION BY account ORDER BY created_date DESC)
),
cte_date AS (
SELECT
account,
type,
MAX(created_date) AS created_date
FROM cte_delta
GROUP BY account, type
)
SELECT cd.*
FROM
cte_delta cd,
cte_date ct
WHERE
cd.account = ct.account
AND cd.type = ct.type
AND cd.created_date = ct.created_date
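A more compact alternative might partition by both account and type and keep only the newest row per pair. This is a sketch, assuming PostgreSQL-style window syntax as in the attempt above. The table name positions is hypothetical (the post uses the placeholder "table"), and the delta for a pair with only one row is assumed to be 0.

```sql
-- Sketch only: positions is a hypothetical table name; a pair with a single
-- row is assumed to have delta 0.
SELECT account, type, position, delta, created_date
FROM (
    SELECT account,
           type,
           position,
           created_date,
           -- difference from the previous row of the same account/type pair
           position - LAG(position, 1, position) OVER w AS delta,
           -- 1 marks the newest row of each pair
           ROW_NUMBER() OVER (PARTITION BY account, type
                              ORDER BY created_date DESC) AS rn
    FROM positions
    WINDOW w AS (PARTITION BY account, type ORDER BY created_date)
) ranked
WHERE rn = 1
ORDER BY account, type;
```

On the sample data this yields position 2 with delta 0 for account 1 / type 1, position 2 with delta 1 for account 1 / type 2, and position 1 with delta 0 for account 2 / type 1, in line with the desired output in the question.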
