How to fill a table with missing dates in PostgreSQL with previous data - sql

I have a table:
date        user_id  state
8/12/2021   1        visit
9/12/2021   1        registered
12/12/2021  1        order
In this table I only have updates of users' states, but I can't see a user's state on an arbitrary date. How can I add rows for the missing dates and fill them with the previous value, so that the table becomes:
date        user_id  state
8/12/2021   1        visit
9/12/2021   1        registered
10/12/2021  1        registered
11/12/2021  1        registered
12/12/2021  1        order

Here's one attempt. The CTE user_dates gets the min and max dates for each user, which are then fed to generate_series, i.e. each user is associated with every date between their first and last date.
In the inner select we create a group (grp) that starts at each non-null state and covers the consecutive null states that follow it.
In the outer select we pick the first_value of state for each such grp.
with user_dates(f, t, user_id) as (
    select min(T.dt), max(T.dt), user_id
    from T
    group by user_id
)
select user_id, dt, grp,
       first_value(state) over (partition by user_id, grp order by dt)
from (
    select ud.user_id
         , cal.dt::date
         , state
         , count(T.state) over (partition by user_id order by cal.dt) as grp
    from user_dates ud
    cross join generate_series(ud.f::timestamp, ud.t::timestamp, interval '1 day') cal (dt)
    left join T using (dt, user_id)
) as tmp
order by user_id, dt;
user_id dt grp first_value
1 2021-12-08 1 visit
1 2021-12-09 2 registered
1 2021-12-10 2 registered
1 2021-12-11 2 registered
1 2021-12-12 3 order
You can remove grp from the select; it's only there for illustration.
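For reference, the query above assumes a table T with columns dt, user_id and state. A minimal sketch of that setup (column names inferred from the query, sample rows from the question, dates read as DD/MM/YYYY) would be:
-- Assumed schema plus the question's sample rows.
create table T (dt date, user_id int, state text);

insert into T (dt, user_id, state) values
    ('2021-12-08', 1, 'visit'),
    ('2021-12-09', 1, 'registered'),
    ('2021-12-12', 1, 'order');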

Related

Time Between First and Second Records SQL

I am trying to calculate the time between the first and second records. My thought was to add a ranking for each record and then do a calculation on RN 2 - RN 1. I'm struggling to actually get the subquery to do RN2-RN1.
SAMPLE Data:
user_id             date                 rn
698998737289929044  2021-04-08 11:27:38  1
698998737289929044  2021-04-08 12:20:25  2
698998737289929044  2021-04-01 13:23:59  3
732850336550572910  2021-03-23 06:13:25  1
598830651911547855  2021-03-11 11:56:53  1
SELECT
user_id,
date,
row_number() over(partition by user_id order by date) as RN
FROM event_table
GROUP BY user_id, date
You can join the result with itself to get the first and second row.
For example:
with
q as (
-- your query here
)
select
f.user_id,
f.date,
s.date - f.date as diff
from q f
left join q s on s.user_id = f.user_id and s.rn = 2
where f.rn = 1
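If you'd rather avoid the self-join, the same result can be had with lead() inside the CTE itself. A sketch, assuming the columns are user_id and date as in your query:
-- Compute the gap to the next event per user, then keep the first row only.
with q as (
    select
        user_id,
        date,
        row_number() over (partition by user_id order by date) as rn,
        lead(date) over (partition by user_id order by date) as next_date
    from event_table
)
select user_id, date, next_date - date as diff
from q
where rn = 1
As with the left join above, diff is NULL for users that have only a single event.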

SQL 30 day active user query

I have a table of users and how many events they fired on a given date:
DATE        USERID  EVENTS
2021-08-27  1       5
2021-07-25  1       7
2021-07-23  2       3
2021-07-20  3       9
2021-06-22  1       9
2021-05-05  1       4
2021-05-05  2       2
2021-05-05  3       6
2021-05-05  4       8
2021-05-05  5       1
I want to create a table showing the number of active users for each date, with an active user defined as someone who has fired an event on the given date or in any of the preceding 30 days.
DATE        ACTIVE_USERS
2021-08-27  1
2021-07-25  3
2021-07-23  2
2021-07-20  2
2021-06-22  1
2021-05-05  5
I tried the following query which returned only the users who were active on the specified date:
SELECT COUNT(DISTINCT USERID), DATE
FROM table
WHERE DATE >= (CURRENT_DATE() - interval '30 days')
GROUP BY 2 ORDER BY 2 DESC;
I also tried using a window function with ROWS BETWEEN, but I seem to end up with the same result:
SELECT
DATE,
SUM(ACTIVE_USERS) AS ACTIVE_USERS
FROM
(
SELECT
DATE,
CASE
WHEN SUM(EVENTS) OVER (PARTITION BY USERID ORDER BY DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) >= 1 THEN 1
ELSE 0
END AS ACTIVE_USERS
FROM table
)
GROUP BY 1
ORDER BY 1
I'm using SQL:ANSI on Snowflake. Any suggestions would be much appreciated.
This is tricky to do with window functions, because count(distinct) is not permitted in them. You can use a self-join:
select t1.date, count(distinct t2.userid)
from table t1 join
     table t2
     on t2.date <= t1.date and
        t2.date > t1.date - interval '30 day'
group by t1.date;
However, that can be expensive. One solution is to "unpivot" the data: record, per user, the dates where they go "in" and "out" of the active state and then take a cumulative sum:
with d as ( -- calculate the dates with "ins" and "outs"
    select userid, date, +1 as inc
    from table
    union all
    select userid, date + interval '30 day', -1 as inc
    from table
),
d2 as ( -- accumulate to get the net actives per day
    select date, userid, sum(inc) as change_on_day,
           sum(sum(inc)) over (partition by userid order by date) as running_inc
    from d
    group by date, userid
),
d3 as ( -- summarize into active periods
    select userid, min(date) as start_date, max(date) as end_date
    from (select d2.*,
                 sum(case when running_inc = 0 then 1 else 0 end) over (partition by userid order by date) as active_period
          from d2
         ) d2
    where running_inc > 0
    group by userid
)
select d.date, count(d3.userid)
from (select distinct date from table) d left join
     d3
     on d.date >= start_date and d.date < end_date
group by d.date;
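To sanity-check either query, here is a minimal test setup with the sample data from the question (the table name user_events is my assumption; the question just calls it table), plus the rolling count written with Snowflake's dateadd():
-- Assumed test table holding the question's sample rows.
create table user_events (date date, userid int, events int);

insert into user_events (date, userid, events) values
    ('2021-08-27', 1, 5),
    ('2021-07-25', 1, 7),
    ('2021-07-23', 2, 3),
    ('2021-07-20', 3, 9),
    ('2021-06-22', 1, 9),
    ('2021-05-05', 1, 4),
    ('2021-05-05', 2, 2),
    ('2021-05-05', 3, 6),
    ('2021-05-05', 4, 8),
    ('2021-05-05', 5, 1);

-- Rolling 30-day distinct-user count per observed date (same idea as the
-- self-join above, written with Snowflake's dateadd).
select d.date, count(distinct e.userid) as active_users
from (select distinct date from user_events) d
join user_events e
  on e.date <= d.date
 and e.date > dateadd(day, -30, d.date)
group by d.date
order by d.date desc;
Run against these rows, the distinct-count join reproduces the expected ACTIVE_USERS column above.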

How to cross join but using latest value in BIGQUERY

I have this table below
date        id  value
2021-01-01  1   3
2021-01-04  1   5
2021-01-05  1   10
And I expect output like this, where the date column increases daily and the value column carries forward the last known value for an id:
date        id  value
2021-01-01  1   3
2021-01-02  1   3
2021-01-03  1   3
2021-01-04  1   5
2021-01-05  1   10
2021-01-06  1   10
I think I can use a cross join, but I can't get my expected output, and I suspect there is special syntax/logic to solve this.
Consider below approach
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
I would write this using:
select dte, t.id, t.value
from (select t.*,
             lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_day
      from `table` t
     ) t cross join
     unnest(generate_date_array(
         date,
         ifnull(
             date_add(next_day, interval -1 day), -- generate missing date rows
             (select max(date) from `table`)      -- add last row
         )
     )) dte;
Note that this requires neither union all nor a window-function fill such as last_value to carry the values forward.
An alternative solution uses last_value. You may explore the following query and customize the logic that generates the days, if needed.
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day
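Note that the last_value query above orders over all rows, which is fine for the single id in the sample; with several ids you would also need the id in the generated grid and a PARTITION BY in the window. A sketch of that extension, reusing the mydataset.newtable names from above:
-- Sketch: build a day x id grid, then carry each id's value forward.
WITH days AS (
  SELECT day
  FROM (SELECT MIN(date) AS min_dt, MAX(date) AS max_dt FROM `mydataset.newtable`),
       UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) AS day
),
ids AS (
  SELECT DISTINCT id FROM `mydataset.newtable`
)
SELECT
  d.day,
  i.id,
  LAST_VALUE(t.value IGNORE NULLS) OVER (PARTITION BY i.id ORDER BY d.day) AS value
FROM days d
CROSS JOIN ids i
LEFT JOIN `mydataset.newtable` t
  ON t.date = d.day AND t.id = i.id
ORDER BY i.id, d.day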

Get latest entry in each week over several years period

I have the following table to store history for entities:
Date Id State
-------------------------------------
2017-10-10 1 0
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-6 2 4
2018-3-7 2 0
I want to get the last entry for each Id within each one-week period, e.g.
Date Id State
-------------------------------------
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-7 2 0
I tried to use PARTITION BY:
select
ID
,Date
,State
,DatePart(week,Date) as weekNumber
from TableA
where Date = (
select max(Date) over (Partition by Id Order by DatePart(week, Date) Desc)
)
order by ID
but it still gives me more than one result per week.
You can use ROW_NUMBER():
SELECT a.*
FROM (SELECT a.*, ROW_NUMBER() OVER (PARTITION BY a.id, DATEPART(WK, a.Date) ORDER BY a.Date DESC) AS Seq
FROM tablea a
) a
WHERE seq = 1
ORDER BY id, Date;
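One caveat: the data spans several years, and DATEPART(WK, ...) alone would put week 40 of 2017 and week 40 of 2018 into the same partition. A variant that also partitions by year (assuming SQL Server, given the DATEPART syntax):
-- Sketch: one row per id per calendar week, keeping the year in the partition.
SELECT a.*
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY a.id,
                                             DATEPART(YEAR, a.Date),
                                             DATEPART(WK, a.Date)
                                ORDER BY a.Date DESC) AS Seq
      FROM tablea a
     ) a
WHERE Seq = 1
ORDER BY id, Date;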

Teradata SQL - Operation with max-min dates

Suppose I have the following table in Teradata SQL.
How can I get the variation in price between the latest and earliest date, at the user level? Regards
Initial table
user date price
1 1-1 10
1 2-1 20
1 3-1 30
2 1-1 12
2 2-1 22
2 3-1 32
3 1-1 13
3 2-1 23
3 3-1 33
Final table
user var_price
1 30/10-1
2 32/12-1
3 33/13-1
Try this-
SELECT B.[user],
CAST(SUM(B.max_price) AS VARCHAR)+'/'+CAST(SUM(B.min_price) AS VARCHAR)+ '-1' var_price,
SUM(B.max_price)/SUM(B.min_price) -1 calculated_var_price
FROM
(
SELECT * FROM
(
SELECT [user],0 max_price,price min_price,ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE) RN
FROM your_table
)A WHERE RN = 1
UNION ALL
SELECT * FROM
(
SELECT [user],price max_price,0 min_price, ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE DESC) RN
FROM your_table
)A WHERE RN = 1
)B
GROUP BY B.[user]
Output is-
user var_price calculated_var_price
1 30/10-1 2
2 32/12-1 1
3 33/13-1 1
Is this what you want?
select user, max(price) / min(price) - 1
from t
group by user;
Your values are monotonically increasing, so max() and min() seem like the simplest solution.
EDIT:
You can use window functions:
select user, max(last_price) / max(first_price) - 1
from (select t.*,
             first_value(price) over (partition by user order by date rows between unbounded preceding and current row) as first_price,
             first_value(price) over (partition by user order by date desc rows between unbounded preceding and current row) as last_price
      from t
     ) t
group by user;
select user
      ,price as first_price
      ,last_value(price)
       over (partition by user
             order by date
             rows between unbounded preceding and unbounded following) as last_price
from mytab
qualify
   row_number() -- lowest date only
   over (partition by user
         order by date) = 1
This returns the row with the lowest date per user and adds the price from the latest date.
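If you also want the var_price display string from the Final table alongside the numeric ratio, a shorter sketch in Teradata-style SQL (the table name mytab and the quoting of the reserved word user are assumptions) could be:
-- Latest/earliest price per user as both a display string and a number.
-- Relies on prices being monotonically increasing with date, as in the sample.
SELECT "user",
       CAST(MAX(price) AS VARCHAR(10)) || '/' ||
       CAST(MIN(price) AS VARCHAR(10)) || '-1'   AS var_price,
       CAST(MAX(price) AS FLOAT) / MIN(price) - 1 AS calculated_var_price
FROM mytab
GROUP BY "user";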