Populating a date grid based on timestamps - SQL

I have the following two tables that I want to join:
select
user_id
, date
from users
where id = 13
order by date desc
user_id | date
------: | :---------
13 | 2020-06-31
13 | 2020-06-30
13 | 2020-06-29
13 | 2020-06-28
13 | 2020-06-27
13 | 2020-06-26
select
user_id
, date
, orders_count_sum
from orders
where user_id = 13
order by date desc
user_id | date | orders_count_sum
------: | :--------- | ---------------:
13 | 2020-06-30 | 3
13 | 2020-06-27 | 2
13 | 2020-06-26 | 1
I want to join the orders table to users, so that I get orders_count_sum populated over dates:
user_id | date | orders_count_sum
------: | :--------- | ---------------:
13 | 2020-06-31 | 3
13 | 2020-06-30 | 3
13 | 2020-06-29 | 2
13 | 2020-06-28 | 2
13 | 2020-06-27 | 2
13 | 2020-06-26 | 1
A left join here only shows orders_count_sum for the dates that exist in the second table. How can I populate every date in the users table with the latest orders_count_sum available as of that date?

You can use lead() to find each order row's next date, and then join on the resulting date range:
select u.user_id, u.date, o.orders_count_sum
from users u join
(select o.*,
lead(date) over (partition by o.user_id order by date) as next_date
from orders o
) o
on o.user_id = u.user_id and
u.date >= o.date and
(u.date < o.next_date or o.next_date is null);
You can filter down to a particular user if you also want that.
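The range join above can be tried end to end with a small sketch. This one uses Python's built-in sqlite3 module purely for illustration (an assumption: the bundled SQLite is 3.25 or later, which is needed for window functions). The table contents mirror the question's sample data, with the month shifted to May so that every date is valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER, date TEXT);
    CREATE TABLE orders (user_id INTEGER, date TEXT, orders_count_sum INTEGER);
    INSERT INTO users VALUES
        (13, '2020-05-26'), (13, '2020-05-27'), (13, '2020-05-28'),
        (13, '2020-05-29'), (13, '2020-05-30'), (13, '2020-05-31');
    INSERT INTO orders VALUES
        (13, '2020-05-26', 1), (13, '2020-05-27', 2), (13, '2020-05-30', 3);
""")

# Each orders row is valid from its own date up to (but excluding) the
# next order date for the same user; the last row is open-ended.
filled = conn.execute("""
    SELECT u.user_id, u.date, o.orders_count_sum
    FROM users u
    JOIN (
        SELECT o.*,
               LEAD(date) OVER (PARTITION BY user_id ORDER BY date) AS next_date
        FROM orders o
    ) o
      ON o.user_id = u.user_id
     AND u.date >= o.date
     AND (u.date < o.next_date OR o.next_date IS NULL)
    ORDER BY u.date DESC
""").fetchall()

for row in filled:
    print(row)
```

Because this is an inner join, user dates earlier than the user's first order would drop out of the result; a left join would keep them with a NULL count.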

Consider the approach below:
select user_id, date,
last_value(orders_count_sum ignore nulls) over win orders_count_sum
from users u
left join orders o
using (user_id, date)
window win as (partition by user_id order by date rows between unbounded preceding and current row)
If applied to the sample data in your question, the output is: [output image not preserved]
As you might have noticed, I changed the month in your sample data from 06 to 05, as there is no such date as June 31 unless you are in the world of Priestley's fantasy book.

Related

Group items from the first time + certain time period

I want to group orders from the same customer if they happen within 10 minutes of the first order, then find the next first order and group them and so on.
Example of the expected output:
Customer | group | orders
-------: | ----: | :-----
6 | 1 | 3
6 | 2 | 4,5
6 | 3 | 8
7 | 1 | 9,10
7 | 2 | 11,12
7 | 3 | 13
id customer time
3 6 2021-05-12 12:14:22.000000
4 6 2021-05-12 12:24:24.000000
5 6 2021-05-12 12:29:16.000000
8 6 2021-05-12 13:01:40.000000
9 7 2021-05-14 12:13:11.000000
10 7 2021-05-14 12:20:01.000000
11 7 2021-05-14 12:45:00.000000
12 7 2021-05-14 12:48:41.000000
13 7 2021-05-14 12:58:16.000000
18 9 2021-05-18 12:22:13.000000
25 15 2021-05-18 13:44:02.000000
26 16 2021-05-17 09:39:02.000000
27 16 2021-05-18 19:38:43.000000
28 17 2021-05-18 15:40:02.000000
29 18 2021-05-19 15:32:53.000000
30 18 2021-05-19 15:45:56.000000
31 18 2021-05-19 16:29:09.000000
34 15 2021-05-24 15:45:14.000000
35 15 2021-05-24 15:45:14.000000
36 19 2021-05-24 17:14:53.000000
Here is what I have currently. I think it is not restricting the comparison to the same customer in case when d.StartTime > dateadd(minute, 10, c.first_time), so it compares the StartTime of all orders across all customers.
with
data as (select Customer,StartTime,Id, row_number() over(partition by Customer order by StartTime) rn from orders t),
cte as (
select d.*, StartTime as first_time
from data d
where rn = 1
union all
select d.*,
case when d.StartTime > dateadd(minute, 10, c.first_time)
then d.StartTime
else c.first_time
end
from cte c
inner join data d on d.rn = c.rn + 1
)
select c.*, dense_rank() over(partition by Customer order by first_time) grp
from cte c;
I have two databases (MySQL & SQL Server) having similar schema so either would work for me.
Try the following on SQL Server:
SELECT customer,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY grp) AS group_no,
STRING_AGG(id, ',') AS orders
FROM
(
SELECT id,customer, [time],
(DATEDIFF(SECOND, MIN([time]) OVER (PARTITION BY CUSTOMER), [time])/60)/10 grp
FROM orders
) T
GROUP BY customer, grp
ORDER BY customer
See a demo.
According to your posted requirement, you are trying to divide the period between the first order date and the last order date into groups (or, let's say, time frames), each 10 minutes long.
What I did in this query: for each customer order, find the difference in seconds between the order date and the minimum date (the customer's first order date), convert it to whole minutes with an integer division by 60, and then integer-divide by 10 to get its time frame number. For example, for a difference of 599 s the frame number is 599/60 = 9 min, 9/10 = 0. For a difference of 620 s the frame number is 620/60 = 10 min, 10/10 = 1.
After defining the correct group/time frame for each order, you can simply use the STRING_AGG function to get the desired output. Note that STRING_AGG is available in SQL Server 2017 (14.x) and later.
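The frame arithmetic described above can be checked with a short Python sketch. The helper name `frame_number` and the in-memory order list are hypothetical, introduced only for illustration; the rows are the customer 6 and customer 7 orders from the question.

```python
from collections import defaultdict
from datetime import datetime

# (id, customer, time) rows copied from the question's sample data.
orders = [
    (3, 6, "2021-05-12 12:14:22"),
    (4, 6, "2021-05-12 12:24:24"),
    (5, 6, "2021-05-12 12:29:16"),
    (8, 6, "2021-05-12 13:01:40"),
    (9, 7, "2021-05-14 12:13:11"),
    (10, 7, "2021-05-14 12:20:01"),
    (11, 7, "2021-05-14 12:45:00"),
    (12, 7, "2021-05-14 12:48:41"),
    (13, 7, "2021-05-14 12:58:16"),
]


def frame_number(elapsed_seconds):
    """Whole minutes since the first order, integer-divided into 10-minute frames."""
    return (elapsed_seconds // 60) // 10


parsed = [(oid, cust, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
          for oid, cust, ts in orders]

# First order time per customer, the equivalent of MIN(time) OVER (PARTITION BY customer).
first_time = {}
for _, cust, ts in parsed:
    first_time[cust] = min(first_time.get(cust, ts), ts)

# Collect order ids per (customer, frame number), in time order.
groups = defaultdict(list)
for oid, cust, ts in sorted(parsed, key=lambda r: r[2]):
    elapsed = int((ts - first_time[cust]).total_seconds())
    groups[(cust, frame_number(elapsed))].append(oid)

grouped = dict(sorted(groups.items()))
for key, ids in grouped.items():
    print(key, ids)
```

The frames are fixed windows anchored at each customer's first order, so two orders only a couple of minutes apart can still land in different frames if a 10-minute boundary falls between them. On this sample the result matches the expected grouping from the question.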

How to return the IDs of users who buy in several consecutive months?

How can I get all user_id values from the data below for rows where the same user_id appears in consecutive months, up to a given date in the date column?
For example, given the table below...
date | user_id
:--------- | ------:
2018-11-01 | 13
2018-11-01 | 13
2018-11-01 | 14
2018-11-01 | 15
2018-12-01 | 13
2019-01-01 | 13
2019-01-01 | 14
...supposing I want to get the user_id values for consecutive months prior to (but not including) 2019-01-01 then I'd have this as my output:
user_id | m_year
------: | :------
13 | 2018-11
13 | 2018-12
13 | 2019-01
A window function can probably be applied here.
If you want to aggregate by user and year-month:
select
t.user_id,
to_char(date_trunc('month',t.date),'YYYY-MM') as m_year
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
order by t.user_id, m_year
But if you only want those with consecutive months, then a little extra is needed.
select
user_id,
to_char(ym,'YYYY-MM') as m_year
from
(
select t.user_id
, date_trunc('month',t.date) as ym
, lag(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as prev_ym
, lead(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as next_ym
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
) q
where (ym - prev_ym <= '31 days'::interval or
next_ym - ym <= '31 days'::interval)
order by user_id, ym
user_id | m_year
------: | :------
13 | 2018-11
13 | 2018-12
13 | 2019-01
db<>fiddle here
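For readers without a PostgreSQL instance at hand, the same lag/lead idea can be sketched with Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). This is a variation on the query above, not a translation of it: SQLite has no date_trunc, so the sketch uses a month index (year * 12 + month) and treats two months as consecutive when their indexes differ by exactly 1. The table name `purchases` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (date TEXT, user_id INTEGER);
    INSERT INTO purchases VALUES
        ('2018-11-01', 13), ('2018-11-01', 13), ('2018-11-01', 14),
        ('2018-11-01', 15), ('2018-12-01', 13), ('2019-01-01', 13),
        ('2019-01-01', 14);
""")

consecutive = conn.execute("""
    WITH months AS (
        -- One row per user and calendar month.
        SELECT DISTINCT
               user_id,
               strftime('%Y-%m', date) AS m_year,
               CAST(strftime('%Y', date) AS INTEGER) * 12
                 + CAST(strftime('%m', date) AS INTEGER) AS m_idx
        FROM purchases
        WHERE date < '2019-02-01'
    ),
    neighbours AS (
        SELECT user_id, m_year, m_idx,
               LAG(m_idx)  OVER (PARTITION BY user_id ORDER BY m_idx) AS prev_idx,
               LEAD(m_idx) OVER (PARTITION BY user_id ORDER BY m_idx) AS next_idx
        FROM months
    )
    -- Keep a month if the previous or the next month is also present.
    SELECT user_id, m_year
    FROM neighbours
    WHERE m_idx - prev_idx = 1 OR next_idx - m_idx = 1
    ORDER BY user_id, m_idx
""").fetchall()

for row in consecutive:
    print(row)
```

User 14 bought in 2018-11 and 2019-01 but not in 2018-12, so neither of those months qualifies and only user 13 is returned.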
You don't need a window function for this specific query. Just try:
SELECT DISTINCT ON (user_id) user_id, date_trunc('month', date :: date) AS m_year
FROM your_table

SQLite query - Limit occurrence of value

I have a query that returns the result below. How can I limit each value in the 4th column to a single occurrence?
19 1 _BOURC01 1
20 1 _BOURC01 3 2019-11-18
20 1 _BOURC01 3 2017-01-02
21 1 _BOURC01 6
22 1 _BOURC01 10
23 1 _BOURC01 13 2016-06-06
24 1 _BOURC01 21 2016-09-19
My Query:
SELECT "_44_SpeakerSpeech"."id" AS "id", "_44_SpeakerSpeech"."active" AS "active", "_44_SpeakerSpeech"."id_speaker" AS "id_speaker", "_44_SpeakerSpeech"."Speech" AS "Speech", "34 Program Weekend"."date" AS "date"
FROM "_44_SpeakerSpeech"
LEFT JOIN "_34_programWeekend" "34 Program Weekend" ON "_44_SpeakerSpeech"."Speech" = "34 Program Weekend"."theme_id"
WHERE "id_speaker" = "_BOURC01"
ORDER BY id_speaker, Speech, date DESC
Thanks
I think this is what you want here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY s.id, s.active, s.id_speaker, s.Speech
ORDER BY p.date DESC) rn
FROM "_44_SpeakerSpeech" s
LEFT JOIN "_34_programWeekend" p ON s.Speech = p.theme_id
WHERE s.id_speaker = '_BOURC01'
)
SELECT id, active, id_speaker, Speech, date
FROM cte
WHERE rn = 1;
This logic assumes that when two or more records have the same column values (excluding the date), you want to retain only the record with the latest date.
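A minimal sketch of that ROW_NUMBER filter, run through Python's built-in sqlite3 module (assuming SQLite 3.25 or later). The simplified table names `speaker_speech` and `program_weekend` and the explicit column list are assumptions made for the example; the rows are reconstructed from the result set shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speaker_speech (id INTEGER, active INTEGER,
                                 id_speaker TEXT, speech INTEGER);
    CREATE TABLE program_weekend (theme_id INTEGER, date TEXT);
    INSERT INTO speaker_speech VALUES
        (19, 1, '_BOURC01', 1),  (20, 1, '_BOURC01', 3),
        (21, 1, '_BOURC01', 6),  (22, 1, '_BOURC01', 10),
        (23, 1, '_BOURC01', 13), (24, 1, '_BOURC01', 21);
    INSERT INTO program_weekend VALUES
        (3, '2019-11-18'), (3, '2017-01-02'),
        (13, '2016-06-06'), (21, '2016-09-19');
""")

latest = conn.execute("""
    WITH cte AS (
        SELECT s.id, s.active, s.id_speaker, s.speech, p.date,
               -- Number the joined rows per speech row, newest date first.
               ROW_NUMBER() OVER (
                   PARTITION BY s.id, s.active, s.id_speaker, s.speech
                   ORDER BY p.date DESC
               ) AS rn
        FROM speaker_speech s
        LEFT JOIN program_weekend p ON s.speech = p.theme_id
        WHERE s.id_speaker = '_BOURC01'
    )
    SELECT id, active, id_speaker, speech, date
    FROM cte
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```

Speech 3 appears once, with its latest date, and speeches without any matching program row are kept with a NULL date because of the left join.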

SQL - Query to return active subscriptions on a given day

I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid|purchasedate|expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
2019-01-01|1
2019-01-02|2
2019-01-03|3
2019-01-04|3
Below is for BigQuery Standard SQL
#standardSQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test and play with the above using dummy data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with the output below:
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1
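Outside BigQuery, the same calendar-join idea can be sketched with a recursive CTE in place of GENERATE_DATE_ARRAY. The example below uses Python's built-in sqlite3 module; the table name `subscriptions` is hypothetical and the rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (userid INTEGER, purchasedate TEXT, expirydate TEXT);
    INSERT INTO subscriptions VALUES
        (1, '2019-01-01', '2019-02-01'),
        (2, '2019-01-02', '2019-02-02'),
        (3, '2019-01-03', '2019-02-03'),
        (3, '2019-01-04', '2019-03-03');
""")

daily = conn.execute("""
    WITH RECURSIVE calendar(day) AS (
        -- Start at the earliest purchase and step one day at a time
        -- until the latest expiry date is reached.
        SELECT MIN(purchasedate) FROM subscriptions
        UNION ALL
        SELECT date(day, '+1 day')
        FROM calendar
        WHERE day < (SELECT MAX(expirydate) FROM subscriptions)
    )
    SELECT c.day, COUNT(DISTINCT s.userid) AS active_subscriptions
    FROM calendar c
    JOIN subscriptions s
      ON c.day BETWEEN s.purchasedate AND s.expirydate
    GROUP BY c.day
    ORDER BY c.day
""").fetchall()

active_by_day = dict(daily)
print(len(daily), active_by_day["2019-01-04"])
```

COUNT(DISTINCT userid) is what keeps user 3, who holds two overlapping subscriptions, from being counted twice on the same day.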
You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchase_date as dte from t) d left join
t
on d.dte >= t.purchase_date and
d.dte <= t.expiry_date
group by d.dte
order by d.dte;
EDIT:
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchase_date, t.expiry_date, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.
I named the table 'test_expirydate' and used your data, and this one works:
select
tb1.expirydate,
count(*) as total
from test_expirydate as tb1
left join (
select
expirydate
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I am not sure whether it works in other cases, but it is fine with the current data.
Note that I interpreted the left column as the expiration date.

Current record with group by function

I am trying to get the most recent aggregate value per session_id for a given userid:
session_id 3 has two records; the most recent agg value is 80.00
session_id 4 has four records; the most recent agg value is 95.00
session_id 6 has three records; the most recent agg value is 72.00
Table: session_agg
id session_id userid agg date
-- ---------- ------ ----- -------
1 3 11 60.00 1573561586
4 3 11 80.00 1573561586
6 4 11 35.00 1573561749
7 4 11 50.00 1573561751
8 4 11 70.00 1573561912
10 4 11 95.00 1573561921
11 6 14 40.00 1573561945
12 6 14 67.00 1573561967
13 6 14 72.00 1573561978
select id, session_id, userid, agg, date from session_agg
WHERE date IN (select MAX(date) from session_agg GROUP BY session_id) AND
userid = 11
If you want to stick with your current approach, then you need to correlate the session_id in the subquery which checks for the max date for each session:
SELECT id, session_id, userid, agg, date
FROM session_agg sa1
WHERE
date = (SELECT MAX(date) FROM session_agg sa2 WHERE sa2.session_id = sa1.session_id) AND
userid = 11;
But, if your version of SQL supports analytic functions, ROW_NUMBER is an easier way to do this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY date DESC) rn
FROM session_agg
)
SELECT id, session_id, userid, agg, date
FROM cte
WHERE rn = 1;
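The ROW_NUMBER approach can be tried on the question's sample rows with Python's built-in sqlite3 module (assuming SQLite 3.25 or later). One detail in this sketch is an addition rather than part of the answer above: the two rows for session_id 3 share the same date value, so `id DESC` is used as a tie-breaker, on the assumption that a higher id means a more recent record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_agg (id INTEGER, session_id INTEGER, userid INTEGER,
                              agg REAL, date INTEGER);
    INSERT INTO session_agg VALUES
        (1, 3, 11, 60.0, 1573561586),  (4, 3, 11, 80.0, 1573561586),
        (6, 4, 11, 35.0, 1573561749),  (7, 4, 11, 50.0, 1573561751),
        (8, 4, 11, 70.0, 1573561912),  (10, 4, 11, 95.0, 1573561921),
        (11, 6, 14, 40.0, 1573561945), (12, 6, 14, 67.0, 1573561967),
        (13, 6, 14, 72.0, 1573561978);
""")

latest_per_session = conn.execute("""
    WITH cte AS (
        SELECT *,
               -- Newest row first; id breaks ties between equal dates (assumption).
               ROW_NUMBER() OVER (
                   PARTITION BY session_id
                   ORDER BY date DESC, id DESC
               ) AS rn
        FROM session_agg
    )
    SELECT id, session_id, userid, agg
    FROM cte
    WHERE rn = 1
    ORDER BY session_id
""").fetchall()

for row in latest_per_session:
    print(row)
```

Without the tie-breaker, which of the two session_id 3 rows gets rn = 1 is not guaranteed, and the correlated-subquery version would return both of them.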