SQL: Fill null date and null value in the row - sql

I have a table like this:
| user | date       | Balance |
|------|------------|---------|
| AAA  | 2019-10-25 | 100     |
| AAA  | 2019-10-23 | 125     |
| AAA  | 2019-10-22 | 150     |
| AAA  | 2019-10-20 | 100     |
I want to fill each missing date with the previous date's value, and also extend the series up to the current date (and any other missing days), carrying the previous value forward. The rows in bold are the ones to be generated:
| user | date           | Balance |
|------|----------------|---------|
| AAA  | **2019-10-27** | **100** |
| AAA  | **2019-10-26** | **100** |
| AAA  | 2019-10-25     | 100     |
| AAA  | **2019-10-24** | **125** |
| AAA  | 2019-10-23     | 125     |
| AAA  | 2019-10-22     | 150     |
| AAA  | **2019-10-21** | **100** |
| AAA  | 2019-10-20     | 100     |

The key here is generating the dates:
select u.dte
from (values (sequence(cast('2019-10-20' as date),
                       cast('2019-10-27' as date),
                       interval '1' day
                      )
             )
     ) v(date_array) cross join
     unnest(v.date_array) u(dte)
Then, you can use this information to fill in the values:
with dates as (
      select u.dte
      from (values (sequence(cast('2019-10-20' as date),
                             cast('2019-10-27' as date),
                             interval '1' day
                            )
                   )
           ) v(date_array) cross join
           unnest(v.date_array) u(dte)
     )
select user, dte,
       max(balance) over (partition by user, grp) as balance
from (select d.dte, u.user, t.balance,
             count(t.user) over (partition by u.user order by d.dte) as grp
      from dates d cross join
           (select distinct user from t) u left join
           t
           on t.date = d.dte and t.user = u.user
     ) du
order by user, dte;
The final query is implementing lag(ignore nulls). What it does is assign a grouping based on the presence of a record in your data -- that is what the count(t.user) over () is doing. The outer select then spreads this value over the entire group.
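To make the grouping concrete, here is what grp and the filled balance look like for the sample user, traced by hand in ascending date order:

| dte        | t.balance | grp | max(balance) over grp |
|------------|-----------|-----|-----------------------|
| 2019-10-20 | 100       | 1   | 100                   |
| 2019-10-21 | null      | 1   | 100                   |
| 2019-10-22 | 150       | 2   | 150                   |
| 2019-10-23 | 125       | 3   | 125                   |
| 2019-10-24 | null      | 3   | 125                   |
| 2019-10-25 | 100       | 4   | 100                   |
| 2019-10-26 | null      | 4   | 100                   |
| 2019-10-27 | null      | 4   | 100                   |

Every missing date inherits the grp of the last present row before it, so the group's only non-null balance is exactly the value being carried forward.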
EDIT:
According to Piotr's comment:
with dates as (
      select u.dte
      from (values (sequence(cast('2019-10-20' as date),
                             cast('2019-10-27' as date),
                             interval '1' day
                            )
                   )
           ) v(date_array) cross join
           unnest(v.date_array) u(dte)
     )
select u.user, d.dte,
       coalesce(t.balance,
                -- no grp needed: ignore nulls reaches back past the gaps
                lag(t.balance) ignore nulls over (partition by u.user order by d.dte)
               ) as balance
from dates d cross join
     (select distinct user from t) u left join
     t
     on t.date = d.dte and t.user = u.user
order by u.user, d.dte;

Related

Repeat rows cumulative

I have this table
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
I am trying to write a query to produce this other table:
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 1 | 10 |
| 2021/05/03 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
| 2021/05/04 | 2 | 20 |
| 2021/05/04 | 3 | 30 |
The idea is that each date should have all the previous distinct ids with their number, and if an id is repeated then only the last value should be considered.
One way is to expand out all the rows for each date. Then take the most recent value using qualify:
with t as (
      select date '2021-05-01' as date, 1 as id, 10 as number union all
      select date '2021-05-02' as date, 2 as id, 20 as number union all
      select date '2021-05-03' as date, 3 as id, 30 as number union all
      select date '2021-05-04' as date, 1 as id, 20 as number
     )
select d.date, t.id, t.number
from t join
     (select date
      from (select min(date) as min_date, max(date) as max_date
            from t
           ) tt cross join
           unnest(generate_date_array(min_date, max_date, interval 1 day)) date
     ) d
     on t.date <= d.date
where 1=1  -- dummy filter: BigQuery requires a WHERE, GROUP BY, or HAVING clause in order to use QUALIFY
qualify row_number() over (partition by d.date, t.id order by t.date desc) = 1
order by 1, 2, 3;
A more efficient method doesn't generate all the rows and then filter them. Instead, it just generates the rows that are needed by generating the appropriate dates within each row. That requires a couple of window functions to get the "next" date for each id and the maximum date in the data:
with t as (
      select date '2021-05-01' as date, 1 as id, 10 as number union all
      select date '2021-05-02' as date, 2 as id, 20 as number union all
      select date '2021-05-03' as date, 3 as id, 30 as number union all
      select date '2021-05-04' as date, 1 as id, 20 as number
     )
select date, t.id, t.number
from (select t.*,
             date_add(lead(date) over (partition by id order by date), interval -1 day) as next_date,
             max(date) over () as max_date
      from t
     ) t cross join
     unnest(generate_date_array(date, coalesce(next_date, max_date))) date
order by 1, 2, 3;
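Traced by hand on the sample data, the inner select produces the following, and each row then expands into the range date .. coalesce(next_date, max_date):

| date       | id | number | next_date  | max_date   | expands to               |
|------------|----|--------|------------|------------|--------------------------|
| 2021-05-01 | 1  | 10     | 2021-05-03 | 2021-05-04 | 2021-05-01 .. 2021-05-03 |
| 2021-05-04 | 1  | 20     | null       | 2021-05-04 | 2021-05-04               |
| 2021-05-02 | 2  | 20     | null       | 2021-05-04 | 2021-05-02 .. 2021-05-04 |
| 2021-05-03 | 3  | 30     | null       | 2021-05-04 | 2021-05-03 .. 2021-05-04 |

Together these ranges yield exactly the nine requested rows, with no over-generation and no qualify step.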
Consider this less verbose approach:
select t1.date, t2.id, t2.number
from (
select *, array_agg(struct(date, id,number)) over(order by date) arr
from `project.dataset.table`
) t1, unnest(arr) t2
where true  -- dummy filter so that QUALIFY can be used
qualify row_number() over (partition by t1.date, t2.id order by t2.date desc) = 1
# order by date, id
If applied to the sample data in your question, the output matches the desired result shown above.

PostgreSQL: Filter select query by comparing against other rows

Suppose I have a table of Events that lists a userId and the time the Event occurred:
+----+--------+----------------------------+
| id | userId | time |
+----+--------+----------------------------+
| 1 | 46 | 2020-07-22 11:22:55.307+00 |
| 2 | 190 | 2020-07-13 20:57:07.138+00 |
| 3 | 17 | 2020-07-11 11:33:21.919+00 |
| 4 | 46 | 2020-07-22 10:17:11.104+00 |
| 5 | 97 | 2020-07-13 20:57:07.138+00 |
| 6 | 17 | 2020-07-04 11:33:21.919+00 |
| 6 | 17 | 2020-07-11 09:23:21.919+00 |
+----+--------+----------------------------+
I want to get the list of events that had a previous event on the same day, by the same user. The result for the above table would be:
+----+--------+----------------------------+
| id | userId | time |
+----+--------+----------------------------+
| 1 | 46 | 2020-07-22 11:22:55.307+00 |
| 3 | 17 | 2020-07-11 11:33:21.919+00 |
+----+--------+----------------------------+
How can I perform a select query that filters results by evaluating them against other rows in the table?
This can be done using an EXISTS condition:
select t1.*
from the_table t1
where exists (select *
              from the_table t2
              where t2.userid = t1.userid           -- for the same user
                and t2.time::date = t1.time::date   -- on the same day
                and t2.time < t1.time);             -- but earlier that day
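If the table is large, a composite index can support the correlated lookup; a minimal sketch (the index name is an assumption):

-- hypothetical index covering the per-user, per-day lookup above
create index the_table_userid_time_idx on the_table (userid, time);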
You can use lag():
select t.*
from (select t.*,
             lag(time) over (partition by userid, time::date order by time) as prev_time
      from t
     ) t
where prev_time is not null;
Here is a db<>fiddle.
Or row_number():
select t.*
from (select t.*,
             row_number() over (partition by userid, time::date order by time) as seqnum
      from t
     ) t
where seqnum >= 2;
You can use LAG() to find the previous row for a user. Then a simple comparison will tell you whether it occurred on the same day or not.
For example:
select *
from (
    select *,
           lag(time) over (partition by userId order by time) as prev_time
    from t
) x
where time::date = prev_time::date
You can use the ROW_NUMBER() analytic function:
SELECT id, userId, time
FROM
(
    SELECT ROW_NUMBER() OVER (PARTITION BY UserId, date_trunc('day', time) ORDER BY time) AS rn,
           t.*
    FROM Events t
) q
WHERE rn > 1
in order to return every event that has an earlier event for the same UserId on the same day.

Split a date range in SQL Server

I'm struggling with a solution for a problem but I couldn't find anything similar here.
I have a table "A" like:
+---------+------------+------------+-----------+
| user_id | from | to | attribute |
+---------+------------+------------+-----------+
| 1 | 2020-01-01 | 2020-12-31 | abc |
+---------+------------+------------+-----------+
and I get a table "B" like:
+---------+------------+------------+-----------+
| user_id | from | to | attribute |
+---------+------------+------------+-----------+
| 1 | 2020-03-01 | 2020-04-15 | def |
+---------+------------+------------+-----------+
And what I need is:
+---------+------------+------------+-----------+
| user_id | from | to | attribute |
+---------+------------+------------+-----------+
| 1 | 2020-01-01 | 2020-02-29 | abc |
| 1 | 2020-03-01 | 2020-04-15 | def |
| 1 | 2020-04-16 | 2020-12-31 | abc |
+---------+------------+------------+-----------+
I tried just using insert and update, but I couldn't figure out how to do both simultaneously. Is there a much simpler way? I read about CTEs; could this be an approach?
I'd be very thankful for your help!
Edit: more examples
TABLE A
| user_id | from | to | attribute |
+=========+============+============+===========+
| 1 | 2020-01-01 | 2020-12-31 | atr1 |
| 1 | 2021-01-01 | 2021-12-31 | atr2 |
| 2 | 2020-01-01 | 2021-06-15 | atr1 |
| 3 | 2020-01-01 | 2021-06-15 | atr3 |
TABLE B
| user_id | from | to | attribute |
+=========+============+============+===========+
| 1 | 2020-09-01 | 2021-02-15 | atr3 |
| 2 | 2020-04-15 | 2020-05-31 | atr2 |
| 3 | 2021-04-01 | 2022-01-01 | atr1 |
OUTPUT:
| user_id | from | to | attribute |
+=========+============+============+===========+
| 1 | 2020-01-01 | 2020-08-31 | atr1 |
| 1 | 2020-09-01 | 2021-02-15 | atr3 |
| 1 | 2021-02-16 | 2021-12-31 | atr2 |
| 2 | 2020-01-01 | 2020-04-14 | atr1 |
| 2 | 2020-04-15 | 2020-05-31 | atr2 |
| 2 | 2020-06-01 | 2021-06-15 | atr1 |
| 3 | 2020-01-01 | 2021-03-31 | atr3 |
| 3 | 2021-04-01 | 2022-01-01 | atr1 |
Initially I only asked how to split the date range and create a new row, because the new attribute from table B falls inside a range from table A. But that is only part of the problem; hopefully it is clearer with the new dataset.
Sample data:
create table #TableA( userid int, fromdt date
,todt date, attribute varchar(10))
insert into #TableA (userid , fromdt , todt , attribute)
values
( 1 ,'2020-01-01','2020-12-31' , 'atr1' ),
( 1 ,'2021-01-01','2021-12-31' , 'atr2' ),
( 2 ,'2020-01-01','2021-06-15' , 'atr1' ),
( 3 ,'2020-01-01','2021-06-15' , 'atr3' )
create table #TableB( userid int,fromdt date
,todt date, attribute varchar(10))
insert into #TableB (userid,fromdt, todt, attribute)
values
( 1 ,'2020-09-01','2021-02-15' , 'atr3' ),
( 2 ,'2020-04-15','2020-05-31' , 'atr2' ),
( 3 ,'2021-04-01','2022-01-01' , 'atr1' )
;
The script:
;WITH CTE
AS (
    SELECT *
    FROM #TableA
    UNION ALL
    SELECT *
    FROM #TableB
)
,CTE2
AS (
    SELECT userid
        ,min(fromdt) minfromdt
        ,max(todt) maxtodt
    FROM CTE
    GROUP BY userid
)
,CTE3
AS (
    SELECT c.userid
        ,c.fromdt
        ,c.todt
        ,c.attribute
        ,LEAD(c.fromdt, 1) OVER (PARTITION BY c.userid ORDER BY c.fromdt) LeadFromdt
    FROM CTE c
)
,CTE4
AS (
    SELECT c3.userid
        ,c3.fromdt
        ,CASE
            WHEN c3.todt > c3.LeadFromdt
                THEN dateadd(day, -1, c3.leadfromdt)
            ELSE c3.todt
         END AS Todt
        ,c3.attribute
    FROM CTE3 c3
)
,CTE5
AS (
    SELECT userid
        ,fromdt
        ,todt
        ,attribute
    FROM CTE4
    UNION ALL
    SELECT c2.userid
        ,dateadd(day, 1, c4.Todt) AS Fromdt
        ,maxtodt AS Todt
        ,c4.attribute
    FROM CTE2 c2
    CROSS APPLY (
        SELECT TOP 1 c4.todt
            ,c4.attribute
        FROM cte4 c4
        WHERE c2.userid = c4.userid
        ORDER BY c4.Todt DESC
    ) c4
    WHERE c2.maxtodt > c4.Todt
)
SELECT *
FROM CTE5
ORDER BY userid
    ,fromdt

drop table #TableA, #TableB
Note: your expected output appears to be wrong in places. Also, if you find sample data for which this script does not work, please append it to the same example.
The easiest way is to work with a calendar table. You can create one and reuse it later.
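For instance, a minimal calendar table could be set up like this (a sketch: the table name, the date range, and the row-generation trick are assumptions; adjust to your data):

-- Hypothetical AllDates calendar covering 2020 through 2022 (1096 days)
create table AllDates (theDate date primary key);

insert into AllDates (theDate)
select dateadd(day, n.n, cast('2020-01-01' as date))
from (select top (1096) row_number() over (order by (select null)) - 1 as n
      from sys.all_objects) n;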
When you have one (here I called it "AllDates"), you can do something like this:
WITH cte
as
(
    select ad.theDate, u.userid, isnull(b.attrib, a.attrib) as attrib,
           ROW_NUMBER() over (PARTITION BY u.userid, isnull(b.attrib, a.attrib) ORDER BY ad.theDate)
         - ROW_NUMBER() over (PARTITION BY u.userid ORDER BY ad.theDate) as grp
    from AllDates ad
    cross join (select userid from tableA union select userid from tableB) u
    left join tableB b on ad.theDate between b.frm and b.toD and u.userid = b.userid
    left join tableA a on ad.theDate between a.frm and a.toD and u.userid = a.userid
    where b.frm is not null
       or a.frm is not null
)
SELECT userid, attrib, min(theDate) as frmD, max(theDate) as toD
FROM cte
GROUP BY userid, attrib, grp
ORDER BY 1, 3;
If I understand the request correctly, the data from table A should be merged into table B to fill the gaps, based on four scenarios. Here is how I achieved it:
/*
Scenario 1 - Use dates from B as base to be filled in from A
- Start and end dates from B
*/
SELECT
    B.UserId,
    B.StartDate,
    B.EndDate,
    B.Attr
FROM #tmpB AS B
UNION
/*
Scenario 2 - Start date between start and end date of another record
- End date from B plus one day as start date
- End date from A as end date
*/
SELECT
    B.UserId,
    DATEADD(DD, 1, B.EndDate) AS StartDate,
    A.EndDate,
    A.Attr
FROM #tmpB AS B
JOIN #tmpA AS A ON
    B.UserId = A.UserId
    AND B.StartDate < A.StartDate
    AND B.EndDate > A.StartDate
UNION
/*
Scenario 3 - End date between start and end date of another record, or both dates between start and end date of another record
- Start date from A as start date
- Start date from B minus one day as end date
*/
SELECT
    B.UserId,
    A.StartDate,
    DATEADD(DD, -1, B.StartDate) AS EndDate,
    A.Attr
FROM #tmpB AS B
JOIN #tmpA AS A ON
    B.UserId = A.UserId
    AND (B.StartDate < A.EndDate AND B.EndDate > A.EndDate
         OR B.StartDate BETWEEN A.StartDate AND A.EndDate AND B.EndDate BETWEEN A.StartDate AND A.EndDate)
UNION
/*
Scenario 4 - Both dates between start and end date of another record
- End date from B plus one day as start date
- End date from A as end date
*/
SELECT
    B.UserId,
    DATEADD(DD, 1, B.EndDate) AS StartDate,
    A.EndDate,
    A.Attr
FROM #tmpB AS B
JOIN #tmpA AS A ON
    B.UserId = A.UserId
    AND B.StartDate BETWEEN A.StartDate AND A.EndDate
    AND B.EndDate BETWEEN A.StartDate AND A.EndDate
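For reference, the query above assumes temp tables shaped roughly like this (a sketch; the column names are inferred from the query, not given in the question):

create table #tmpA (UserId int, StartDate date, EndDate date, Attr varchar(10));
create table #tmpB (UserId int, StartDate date, EndDate date, Attr varchar(10));

insert into #tmpA values (1, '2020-01-01', '2020-12-31', 'atr1'), (2, '2020-01-01', '2021-06-15', 'atr1');
insert into #tmpB values (1, '2020-09-01', '2021-02-15', 'atr3'), (2, '2020-04-15', '2020-05-31', 'atr2');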

Psql - generate series with running total

I have the following table:
create table account_info(
id int not null unique,
creation_date date,
deletion_date date,
gather boolean)
Adding sample data to it:
insert into account_info(id,creation_date,deletion_date,gather)
values(1,'2019-09-10',null,true),
(2,'2019-09-12',null,true),
(3,'2019-09-14','2019-10-08',true),
(4,'2019-09-15','2019-09-18',true),
(5,'2019-09-22',null,false),
(6,'2019-09-27','2019-09-29',true),
(7,'2019-10-04','2019-10-17',false),
(8,null,'2019-10-20',true),
(9,'2019-10-12',null,true),
(10,'2019-10-18',null,true)
I would like to see how many accounts have been added grouped by week and how many accounts have been deleted grouped by week.
I have tried the following:
select dd, count(distinct ai.id) as created ,count(distinct ai2.id) as deleted
from generate_series('2019-09-01'::timestamp,
'2019-10-21'::timestamp, '1 week'::interval) dd
left join account_info ai on ai.creation_date::DATE <= dd::DATE
left join account_info ai2 on ai2.deletion_date::DATE <=dd::DATE
where ai.gather is true
and ai2.gather is true
group by dd
order by dd asc
This produces the following output:
dd | Created | Deleted |
+------------+---------+---------+
| 2019-09-22 | 4 | 1 |
| 2019-09-29 | 5 | 2 |
| 2019-10-06 | 5 | 2 |
| 2019-10-13 | 6 | 3 |
| 2019-10-20 | 7 | 4 |
This output shows me the running total of how many have been created and how many have been deleted.
However, I would like to see something like this:
+------------+---------+---------+-------------------+-------------------+
| dd | Created | Deleted | Total Sum Created | Total Sum Deleted |
+------------+---------+---------+-------------------+-------------------+
| 2019-09-22 | 4 | 1 | 4 | 1 |
| 2019-09-29 | 1 | 1 | 5 | 2 |
| 2019-10-06 | NULL | NULL | 5 | 2 |
| 2019-10-13 | 1 | 1 | 6 | 3 |
| 2019-10-20 | 1 | 1 | 7 | 4 |
I get an error message when trying to sum up the created and deleted columns in psql, as I cannot nest aggregate functions.
You could just turn your existing query to a subquery and use lag() to compute the difference between consecutive records:
select
dd,
created - coalesce(lag(created) over(order by dd), 0) created,
deleted - coalesce(lag(deleted) over(order by dd), 0) deleted,
created total_sum_created,
deleted total_sum_deleted
from (
select
dd,
count(distinct ai.id) as created ,
count(distinct ai2.id) as deleted
from
generate_series(
'2019-09-01'::timestamp,
'2019-10-21'::timestamp,
'1 week'::interval
) dd
left join account_info ai
on ai.creation_date::DATE <= dd::DATE and ai.gather is true
left join account_info ai2
on ai2.deletion_date::DATE <=dd::DATE and ai2.gather is true
group by dd
) x
order by dd asc
I moved the conditions ai.gather is true and ai2.gather is true to the on side of the joins: putting these conditions in the where clause basically turns your left joins into inner joins.
Demo on DB Fiddle:
| dd | created | deleted | total_sum_created | total_sum_deleted |
| ------------------------ | ------- | ------- | ----------------- | ----------------- |
| 2019-09-01T00:00:00.000Z | 0 | 0 | 0 | 0 |
| 2019-09-08T00:00:00.000Z | 0 | 0 | 0 | 0 |
| 2019-09-15T00:00:00.000Z | 4 | 0 | 4 | 0 |
| 2019-09-22T00:00:00.000Z | 0 | 1 | 4 | 1 |
| 2019-09-29T00:00:00.000Z | 1 | 1 | 5 | 2 |
| 2019-10-06T00:00:00.000Z | 0 | 0 | 5 | 2 |
| 2019-10-13T00:00:00.000Z | 1 | 1 | 6 | 3 |
| 2019-10-20T00:00:00.000Z | 1 | 1 | 7 | 4 |
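A minimal, self-contained illustration of why the placement matters (hypothetical values): the unmatched week w2 survives with a count of 0 when the filter sits in the on clause, but it vanishes if a.gather is moved to a where clause.

select w.wk, count(a.id) as matched
from (values ('w1'), ('w2')) w(wk)
left join (values ('w1', 1, true)) a(wk, id, gather)
  on a.wk = w.wk and a.gather
group by w.wk
order by w.wk;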
Another option would be to use lag() in combination with generate_series() to generate a list of date ranges. Then you can do just one join on the original table, and do conditional aggregation in the outer query:
select
dd,
count(distinct case
when ai.creation_date::date <= dd::date and ai.creation_date::date > lag_dd::date
then ai.id
end) created,
count(distinct case
when ai.deletion_date::date <= dd::date and ai.deletion_date::date > lag_dd::date
then ai.id
end) deleted,
count(distinct case
when ai.creation_date::date <= dd::date
then ai.id
end) total_sum_created,
count(distinct case
when ai.deletion_date::date <= dd::date
then ai.id
end) total_sum_deleted
from
(
select dd, lag(dd) over(order by dd) lag_dd
from generate_series(
'2019-09-01'::timestamp,
'2019-10-21'::timestamp,
'1 week'::interval
) dd
) dd
left join account_info ai on ai.gather is true
group by dd
order by dd
Demo on DB Fiddle
A lateral join and aggregation are well suited to this problem. If you are content with the weeks in the data:
select date_trunc('week', dte) as week,
sum(is_create) as creates_in_week,
sum(is_delete) as deletes_in_week,
sum(sum(is_create)) over (order by min(v.dte)) as running_creates,
sum(sum(is_delete)) over (order by min(v.dte)) as running_deletes
from account_info ai cross join lateral
(values (ai.creation_date, 1, 0), (ai.deletion_date, 0, 1)
) v(dte, is_create, is_delete)
where v.dte is not null and ai.gather
group by week
order by week;
If you want it for a specified set of weeks:
select gs.wk,
sum(v.is_create) as creates_in_week,
sum(v.is_delete) as deletes_in_week,
sum(sum(v.is_create)) over (order by min(v.dte)) as running_creates,
sum(sum(v.is_delete)) over (order by min(v.dte)) as running_deletes
from generate_series('2019-09-01'::timestamp,
'2019-10-21'::timestamp, '1 week'::interval) gs(wk) left join
( account_info ai cross join lateral
(values (ai.creation_date, 1, 0), (ai.deletion_date, 0, 1)
) v(dte, is_create, is_delete)
)
on v.dte >= gs.wk and
v.dte < gs.wk + interval '1 week'
where dte is not null and ai.gather
group by gs.wk
order by gs.wk;
Here is a db<>fiddle.
You can generate the results you want using a series of CTEs to build up the data tables:
with dd as
(select *
from generate_series('2019-09-01'::timestamp,
'2019-10-21'::timestamp, '1 week'::interval) d),
ddl as
(select d, coalesce(lag(d) over (order by d), '1970-01-01'::timestamp) as pd
from dd),
counts as
(select d, count(distinct ai.id) as created, count(distinct ai2.id) as deleted
from ddl
left join account_info ai on ai.creation_date::DATE > ddl.pd::DATE AND ai.creation_date::DATE <= ddl.d::DATE AND ai.gather is true
left join account_info ai2 on ai2.deletion_date::DATE > ddl.pd::DATE AND ai2.deletion_date::DATE <= ddl.d::DATE AND ai2.gather is true
group by d)
select d, created, deleted,
sum(created) over (rows unbounded preceding) as "total created",
sum(deleted) over (rows unbounded preceding) as "total deleted"
from counts
order by d asc
Note that the gather condition needs to be part of the left join to avoid turning those into inner joins.
Output:
d created deleted total created total deleted
2019-09-01 00:00:00 0 0 0 0
2019-09-08 00:00:00 0 0 0 0
2019-09-15 00:00:00 4 0 4 0
2019-09-22 00:00:00 0 1 4 1
2019-09-29 00:00:00 1 1 5 2
2019-10-06 00:00:00 0 0 5 2
2019-10-13 00:00:00 1 1 6 3
2019-10-20 00:00:00 1 1 7 4
Note this query gives the results for the week ending with d. If you want results for the week starting with d, the lag can be changed to lead. You can see this in my demo.
Demo on dbfiddle
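For concreteness, here is a sketch of that lead variant: only the ddl CTE and the join bounds change, and the 'infinity' sentinel (an assumption) keeps the last week open-ended:

with dd as
  (select *
   from generate_series('2019-09-01'::timestamp,
                        '2019-10-21'::timestamp, '1 week'::interval) d),
ddl as
  (select d, coalesce(lead(d) over (order by d), 'infinity'::timestamp) as nd
   from dd),
counts as
  (select d, count(distinct ai.id) as created, count(distinct ai2.id) as deleted
   from ddl
   left join account_info ai on ai.creation_date::DATE >= ddl.d::DATE AND ai.creation_date < ddl.nd AND ai.gather is true
   left join account_info ai2 on ai2.deletion_date::DATE >= ddl.d::DATE AND ai2.deletion_date < ddl.nd AND ai2.gather is true
   group by d)
select d, created, deleted,
       sum(created) over (order by d) as "total created",
       sum(deleted) over (order by d) as "total deleted"
from counts
order by d asc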

Select rows which repeat every month

I am trying to solve what looks like a simple task at first glance.
I have a transactions table.
| name |entity_id| amount | date |
|--------|---------|--------|------------|
| Github | 1 | 4.80 | 01/01/2014 |
| itunes | 2 | 2.80 | 22/01/2014 |
| Github | 1 | 4.80 | 01/02/2014 |
| Foods | 3 | 24.80 | 01/02/2014 |
| amazon | 4 | 14.20 | 01/03/2014 |
| amazon | 4 | 14.20 | 01/04/2014 |
I have to select the rows that repeat every month on the same day with the same amount for an entity_id (subscriptions). Thanks for the help.
If your date column is created as a date type, you could use a recursive CTE to collect continuations, and after that eliminate duplicate rows with distinct on (also, you should rename that column, because date is a reserved name in SQL):
with recursive recurring as (
select name, entity_id, amount, date as first_date, date as last_date, 0 as lvl
from transactions
union all
select r.name, r.entity_id, r.amount, r.first_date, t.date, r.lvl + 1
from recurring r
join transactions t
on row(t.name, t.entity_id, t.amount, t.date - interval '1' month)
= row(r.name, r.entity_id, r.amount, r.last_date)
)
select distinct on (name, entity_id, amount) *
from recurring
order by name, entity_id, amount, lvl desc
SQLFiddle
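Since every transaction enters the CTE at lvl 0, the distinct on result also includes one-off rows. To keep only entries that actually repeat, the final select can filter out the singletons (a sketch replacing the last select above):

select distinct on (name, entity_id, amount) *
from recurring
where lvl >= 1   -- at least one same-day, same-amount hit a month later
order by name, entity_id, amount, lvl desc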
Or group it by day, for example:
select entity_id, amount, max(date), min(date), count(*)
from transactions
group by entity_id, amount, date_part('day', date)
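To restrict this to entries that actually recur, and to surface the day of month they recur on, a having filter can be added; a sketch with an assumed threshold of two occurrences:

select entity_id, amount, date_part('day', date) as day_of_month,
       min(date) as first_seen, max(date) as last_seen, count(*) as occurrences
from transactions
group by entity_id, amount, date_part('day', date)
having count(*) >= 2;  -- assumption: two same-day, same-amount months count as a subscription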