Category Entry and Exit Dates per ID AND Category

I have the following table, where ID is the unique identifier. An ID can move from category to category, both up and down. My table records each day an ID stays in a given category. I am trying to identify the start date and the end date of each stay of an ID in a given category. The problem is that an ID can move up a category and then move back down to its original category after a certain number of days. Here is my table as an example with only 1 ID:
ID Category Date
1 1 2021-01-01
1 1 2021-01-02
...
1 1 2021-01-24
1 2 2021-01-25
...
1 2 2021-02-15
1 1 2021-02-16
...
1 1 2021-04-20
1 2 2021-04-21
When I take MIN(Date) and MAX(Date) grouped by ID and Category, the result says the account was in Category 1 from 2021-01-01 to 2021-04-20 and in Category 2 from 2021-01-25 to 2021-04-21. I am trying to track the movements of the account through each category step by step, so in my ideal result the movements would be tracked as:
ID Category StartDate EndDate
1 1 2021-01-01 2021-01-24
1 2 2021-01-25 2021-02-15
1 1 2021-02-16 2021-04-20
1 2 2021-04-21 NULL (or GETDATE())
How can I achieve this result? Any help would be appreciated. I tried using the RANK() function but because the table records every single day, it seems useless.

This is a type of gaps-and-islands problem that is most easily solved using the difference of row numbers:
select id, category, min(date) as start_date, max(date) as end_date
from (select t.*,
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, category order by date) as seqnum_2
      from t
     ) t
group by id, category, (seqnum - seqnum_2);
Actually, the difference of row numbers is only the simplest approach because you have not specified the database. Since the table has one row per day, you can instead subtract a sequence of numbers from the date to get a constant that defines each group. That looks like:
select id, category, min(date) as start_date, max(date) as end_date
from (select t.*,
             row_number() over (partition by id, category order by date) as seqnum
      from t
     ) t
group by id, category, date - seqnum * interval '1 day';
However, the date arithmetic varies by database.
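To see why the difference of row numbers identifies each stay, here is a minimal Python sketch of the same logic run on the sample data from the question. It assumes the rows elided with "..." are one row per day, as the question describes; the names category_spells and daily_rows are made up for this illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def category_spells(rows):
    """Collapse daily (id, category, day) rows into (id, category, start, end) spells."""
    seq_per_id = defaultdict(int)      # row_number() over (partition by id order by date)
    seq_per_id_cat = defaultdict(int)  # row_number() over (partition by id, category order by date)
    spells = {}
    for id_, category, day in sorted(rows, key=lambda r: (r[0], r[2])):
        seq_per_id[id_] += 1
        seq_per_id_cat[(id_, category)] += 1
        # The difference stays constant while the ID remains in the same category.
        island = (id_, category, seq_per_id[id_] - seq_per_id_cat[(id_, category)])
        start, end = spells.get(island, (day, day))
        spells[island] = (min(start, day), max(end, day))
    return sorted(
        ((i, c, s, e) for (i, c, _), (s, e) in spells.items()),
        key=lambda r: (r[0], r[2]),
    )


def daily_rows(id_, category, first, last):
    """Hypothetical helper: one row per day from first to last, inclusive."""
    return [(id_, category, first + timedelta(days=n))
            for n in range((last - first).days + 1)]


# Sample data from the question, with the "..." rows assumed to be daily.
sample = (
    daily_rows(1, 1, date(2021, 1, 1), date(2021, 1, 24))
    + daily_rows(1, 2, date(2021, 1, 25), date(2021, 2, 15))
    + daily_rows(1, 1, date(2021, 2, 16), date(2021, 4, 20))
    + daily_rows(1, 2, date(2021, 4, 21), date(2021, 4, 21))
)

for spell in category_spells(sample):
    print(spell)
```

Like the SQL, this reports the last recorded day as the end of the still-open stay rather than NULL; turning that into NULL or the current date would need an extra step.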

Related

Calculate the streaks of visit of users limited to 7

I am trying to calculate the consecutive visits a user makes on an app. I used the rank function to determine the streaks maintained by each user. However, my requirement is that a streak should not exceed 7 days.
For instance, if a user visits the app on 9 consecutive days, they will have 2 different streaks: one with count 7 and the other with count 2.
Using MaxCompute. It's similar to MySQL.
I have the following table named visitors_data:
user_id visit_date
murtaza 01-01-2021
john 01-01-2021
murtaza 02-01-2021
murtaza 03-01-2021
murtaza 04-01-2021
john 01-01-2021
murtaza 05-01-2021
murtaza 06-01-2021
john 02-01-2021
john 03-01-2021
murtaza 07-01-2021
murtaza 08-01-2021
murtaza 09-01-2021
john 20-01-2021
john 21-01-2021
Output should look like this:
user_id streak
murtaza 7
murtaza 2
john 3
john 2
I was able to get the streaks by the following query, but I could not limit the streaks to 7.
WITH groups AS (
SELECT user_id,
RANK() OVER (ORDER BY user_id, visit_date) AS RANK,
visit_date,
DATEADD(visit_date, -RANK() OVER (ORDER BY user_id, visit_date), 'dd') AS date_group
FROM visitors_data
ORDER BY user_id, visit_date)
SELECT
user_id,
COUNT(*) AS streak
FROM groups
GROUP BY
user_id,
date_group
HAVING COUNT(*)>1
ORDER BY COUNT(*);

My thinking ran along similar lines to forpas':
SELECT user_id, COUNT(*) streak
FROM
(
SELECT
user_id, streak,
FLOOR((ROW_NUMBER() OVER (PARTITION BY user_id, streak ORDER BY visit_date)-1)/7) substreak
FROM
(
SELECT
user_id, visit_date,
SUM(runtot) OVER (PARTITION BY user_id ORDER BY visit_date) streak
FROM (
SELECT
user_id, visit_date,
CASE WHEN DATE_ADD(visit_date, INTERVAL -1 DAY) = LAG(visit_date) OVER (PARTITION BY user_id ORDER BY visit_date) THEN 0 ELSE 1 END as runtot
FROM visitors_data
GROUP BY user_id, visit_date
) x
) y
) z
GROUP BY user_id, streak, substreak
As an explanation of how this works: a usual trick for counting runs of successive records is to use LAG to examine the previous record. If it is exactly one day earlier, put a 0; otherwise put a 1. This means the first record of a consecutive run is 1 and the rest are 0, so the column ends up looking like 1,0,0,0,1,0... SUM OVER ORDER BY sums this in a "running total" fashion. This effectively forms a counter that ticks up every time the start of a run is encountered, so a run of 4 days followed by a gap and then a run of 3 days looks like 1,1,1,1,2,2,2. That counter is the "streak ID number".
If this is then fed into a row numbering that partitions by the streak ID number, it establishes an incrementing counter that restarts every time the streak ID changes. If we subtract 1 from this so it runs from 0 instead of 1, then we can divide it by 7 and round down to get a "sub streak ID", which for our 9-long streak is 0,0,0,0,0,0,0,1,1 (and so on: a streak of 25 would have 7 zeroes, 7 ones, 7 twos, and 4 threes).
All that remains then is to group by the user, the streak ID, and the sub streak ID, and count the result.
Looking at the data as it stands before the final group and count should give some idea of how it all works.
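The steps described above can be traced in a few lines of Python. This is only a sketch of the same idea, not a translation of the query: capped_streaks is a made-up name, and the visits list is the sample data from the question, including john's duplicate row for 01-01-2021.

```python
from collections import Counter
from datetime import date, timedelta


def capped_streaks(visits, cap=7):
    """Return (user_id, streak_id, substreak_id, length) for each capped streak."""
    counts = Counter()
    prev_user = prev_day = None
    streak_id = position = 0
    # set() drops duplicate visits, like the GROUP BY user_id, visit_date in the query.
    for user, day in sorted(set(visits)):
        if user != prev_user:
            streak_id, position = 1, 0      # first row of a user starts streak 1
        elif day - prev_day != timedelta(days=1):
            streak_id, position = streak_id + 1, 0  # gap found: the running total ticks up
        else:
            position += 1                   # zero-based row number within the streak
        counts[(user, streak_id, position // cap)] += 1
        prev_user, prev_day = user, day
    return [(u, s, sub, n) for (u, s, sub), n in sorted(counts.items())]


visits = (
    [("murtaza", date(2021, 1, d)) for d in range(1, 10)]
    + [("john", date(2021, 1, d)) for d in (1, 1, 2, 3, 20, 21)]
)

for row in capped_streaks(visits):
    print(row)
```

The integer division position // cap plays the role of FLOOR((ROW_NUMBER() - 1) / 7) in the query.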

With a mix of window functions and aggregation:
SELECT user_id, COALESCE(NULLIF(MAX(counter) % 7, 0), 7) streak
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY user_id, grp ORDER BY visit_date) counter
FROM (
SELECT *, SUM(flag) OVER (PARTITION BY user_id ORDER BY visit_date) grp
FROM (
SELECT *, COALESCE(DATE_ADD(visit_date, INTERVAL -1 DAY) <>
LAG(visit_date) OVER (PARTITION BY user_id ORDER BY visit_date), 1) flag
FROM (SELECT DISTINCT * FROM visitors_data) t
) t
) t
) t
GROUP BY user_id, grp, FLOOR((counter - 1) / 7)

You could break them up after the fact. For instance, if you never have more than 21 (the four values of n below cover streaks of up to 28 days):
SELECT user_id, LEAST(gu.streak - (n.n - 1) * 7, 7) AS streak
FROM (SELECT user_id, COUNT(*) AS streak
      FROM groups
      GROUP BY user_id, date_group
      HAVING COUNT(*) > 1
     ) gu JOIN
     (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
     ) n
     ON gu.streak > (n.n - 1) * 7
ORDER BY user_id, n.n;
Each streak joins to one row of n per chunk of up to 7 days. LEAST() caps every chunk at 7 and leaves the remainder in the last chunk.
If you have an indeterminate number range for the longest streak, you can do something similar with a recursive CTE.
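For intuition, breaking a streak length into chunks "after the fact" is plain arithmetic and works for a streak of any length. A minimal Python sketch, where split_streak is a made-up name and the default cap of 7 comes from the question:

```python
def split_streak(length, cap=7):
    """Split a streak length into chunks of at most `cap` days."""
    # Each chunk starts at a multiple of cap; the last chunk holds the remainder.
    return [min(cap, length - start) for start in range(0, length, cap)]


print(split_streak(9))
print(split_streak(25))
```

A recursive CTE would generate the same chunk start offsets (0, 7, 14, ...) that range() produces here.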

Count new entries day by day

I would like to count new IDs on each day. By new, I mean new relative to the day before.
Assume we have a table:
Date Id
2021-01-01 1
2021-01-02 4
2021-01-02 5
2021-01-02 6
2021-01-03 1
2021-01-03 5
2021-01-03 7
My desired output would look like this:
Date Count(NewId)
2021-01-01 1
2021-01-02 3
2021-01-03 2

You can use two levels of aggregation:
select date, count(*)
from (select id, min(date) as date
from t
group by id
) i
group by date
order by date;
If by "relative to the day before" you mean that you want to count someone as new whenever they have no record on the previous day, then use lag() . . . carefully:
select date,
sum(case when prev_date = date - interval '1' day then 0 else 1 end)
from (select t.*,
lag(date) over (partition by id order by date) as prev_date
from t
) t
group by date
order by date;
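The two readings of "new" give different counts on the sample data, which a short Python sketch makes visible. The function names are made up for this illustration, and the rows are the table from the question.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def first_appearances(rows):
    """Count each id once, on the first date it ever appears."""
    first_seen = {}
    for day, id_ in rows:
        if id_ not in first_seen or day < first_seen[id_]:
            first_seen[id_] = day
    return dict(sorted(Counter(first_seen.values()).items()))


def new_vs_previous_day(rows):
    """Count ids on each date that have no row on the day before."""
    ids_by_day = defaultdict(set)
    for day, id_ in rows:
        ids_by_day[day].add(id_)
    one_day = timedelta(days=1)
    return {
        day: len(ids - ids_by_day.get(day - one_day, set()))
        for day, ids in sorted(ids_by_day.items())
    }


rows = [
    (date(2021, 1, 1), 1),
    (date(2021, 1, 2), 4), (date(2021, 1, 2), 5), (date(2021, 1, 2), 6),
    (date(2021, 1, 3), 1), (date(2021, 1, 3), 5), (date(2021, 1, 3), 7),
]

print(first_appearances(rows))
print(new_vs_previous_day(rows))
```

Only the second reading counts id 1 as new again on 2021-01-03, which matches the desired output in the question.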

Here is another way, probably the simplest:
select t1.date, count(*)
from t t1
where t1.id not in (select t2.id from t t2 where t2.date = t1.date - interval '1 day')
group by t1.date

Maybe this other option could also do the job, but to be honest I would prefer @GordonLinoff's answer:
select date, count(*)
from your_table t
where not exists (
select 1
from your_table tt
where tt.Id=t.id
and tt.date = date_sub(t.date,1)
)
group by date

Add a column with customers orders count at the time they passed the order

I have the following table
order_id created_at customer_id
1 2020-01-02 11
2 2020-02-03 12
3 2020-02-03 11
I would like to add a column "customer_orders_count" that assigns to each transaction the number of orders the customer had placed at that point, i.e. obtain this table:
order_id created_at customer_id customer_orders_count
1 2020-01-02 11 1
2 2020-02-03 12 1
3 2020-02-03 11 2
My problem is that I can't find how to calculate a local "customer_orders_count" that depends on each order. I only managed to add a column with the global "customer_orders_count", so for example for the first row (order_id=1) I get customer_orders_count=2, whereas I would like it to be 1.
Does anyone have an idea?

Use cumulative count:
with mytable as (
select 1 as order_id, date '2020-01-02' as created_at, 11 as customer_id union all
select 2, '2020-02-03', 12 union all
select 3 , '2020-02-03', 11
)
select *, count(*) over (partition by customer_id order by created_at) as customer_orders_count
from mytable
order by order_id

Use row_number():
select t.*,
row_number() over (partition by customer_id order by created_at) as customer_order_count
from t;
This is subtly different from using a cumulative count(). This version guarantees that the numbers for a given customer are never duplicated, even when the dates are the same. A cumulative count has no such guarantee.
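The difference only shows up when one customer has two orders with the same created_at. The Python sketch below mimics both behaviours. Assumptions: order 4 is a hypothetical extra row added to create a tie, ties in the row number are broken by order_id (SQL makes no such promise), and number_orders is a made-up name.

```python
from collections import defaultdict


def number_orders(orders):
    """Map order_id -> (row_number, cumulative_count) within each customer."""
    by_customer = defaultdict(list)
    for order_id, created_at, customer_id in orders:
        by_customer[customer_id].append((created_at, order_id))
    result = {}
    for items in by_customer.values():
        items.sort()  # by created_at, then order_id as an assumed tie-breaker
        for row_number, (created_at, order_id) in enumerate(items, start=1):
            # A cumulative count with ORDER BY includes all rows tied on created_at.
            cumulative = sum(1 for other_date, _ in items if other_date <= created_at)
            result[order_id] = (row_number, cumulative)
    return result


orders = [
    (1, "2020-01-02", 11),
    (2, "2020-02-03", 12),
    (3, "2020-02-03", 11),
    (4, "2020-02-03", 11),  # hypothetical row, not in the question
]

for order_id, numbers in sorted(number_orders(orders).items()):
    print(order_id, numbers)
```

Orders 3 and 4 get row numbers 2 and 3, but both get a cumulative count of 3.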

Joining client records based on overlapping date ranges in oracle SQL

I have a dataset that looks like this:
Client id stayId start_date end_date type
1 101 1-1-2010 20-7-2010 A
1 105 1-7-2010 30-12-2010 A
2 108. 8-10-2012 10-12-2012 B
2 108. 8-10-2012 10-12-2012 B
I want to merge rows with overlapping date ranges and take the highest stayId, but only if the client id and type match. How should I do this in Oracle SQL?
The result would look like this:
Client id stayId start_date end_date type
1 105 1-1-2010 30-12-2010 A
2 108. 8-10-2012 10-12-2012 B
2 108. 01-01-2013 13-10-2013 B

This is a type of gaps-and-islands problem. It looks tricky, because there can be arbitrary overlaps -- I suspect that the overlap might even be an earlier record, as in:
|------| |-------|
|------------------|
For this version, I recommend a cumulative max to identify the rows with no overlap. These rows start the "islands". Then, a cumulative sum identifies the islands (the sum of rows where there is no overlap). The final step is aggregation:
select clientid, type, max(stayid),
min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date >= start_date then 0 else 1 end) over
(partition by clientid, type
order by start_date
) as grp
from (select t.*,
max(end_date) over (partition by clientid, type
order by start_date
range between unbounded preceding and interval '1' day preceding
) as prev_end_date
from t
) t
) t
group by clientid, type, grp;
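The same island logic can be sketched in Python: keep a running maximum end date per client and type, and start a new island whenever a row begins after that maximum. This is an illustration under assumptions, not the Oracle query itself: merge_stays is a made-up name, the dates are read as day-month-year, "108." is taken to mean stayId 108, and rows sharing a start date are treated as overlapping.

```python
from datetime import date


def merge_stays(stays):
    """Merge overlapping (client_id, stay_id, start, end, type) rows per client and type."""
    merged = []
    for client_id, stay_id, start, end, type_ in sorted(
        stays, key=lambda s: (s[0], s[4], s[2])
    ):
        last = merged[-1] if merged else None
        same_group = last is not None and (last[0], last[4]) == (client_id, type_)
        if same_group and start <= last[3]:
            # Overlap: extend the island, keeping the highest stay_id and latest end date.
            merged[-1] = (client_id, max(last[1], stay_id), last[2],
                          max(last[3], end), type_)
        else:
            # No earlier row reaches this start date, so a new island begins.
            merged.append((client_id, stay_id, start, end, type_))
    return merged


stays = [
    (1, 101, date(2010, 1, 1), date(2010, 7, 20), "A"),
    (1, 105, date(2010, 7, 1), date(2010, 12, 30), "A"),
    (2, 108, date(2012, 10, 8), date(2012, 12, 10), "B"),
    (2, 108, date(2012, 10, 8), date(2012, 12, 10), "B"),
]

for stay in merge_stays(stays):
    print(stay)
```

Because the island's end date is the maximum seen so far, a long early stay absorbs later short stays that fall inside it, which is the case the diagram above illustrates.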

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
For each row I want to add up the prices of the previous rows. Row 1 is 0 because it has no previous row.
I also want the ratio, e.g. 10/30 = 0.33.

You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;

I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;

Please also check another method (isnull() is SQL Server syntax; use coalesce() in other databases):
with cte as (
    select *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
           SUM(price) OVER (PARTITION BY id ORDER BY date) ss
    from yourTable
)
select id, status_count, isnull(ss, 0) - price price
from cte
