Get all rows from one table stream and the previous row in time from another table - sql

Suppose I have one table (table_1) and one table stream (stream_1) that captures changes made to table_1, in my case only inserts of new rows. Once I have acted on these changes, the rows are removed from stream_1 but remain in table_1.
From that I would like to calculate delta values for var1 (var1 - lag(var1) as delta_var1), partitioned by customer, and leave var2 as it is. The data in table_1 could look something like this:
timemessage customerid var1 var2
2021-04-01 06:00:00 1 10 5
2021-04-01 07:00:00 2 100 7
2021-04-01 08:00:00 1 20 10
2021-04-01 09:00:00 1 40 3
2021-04-01 15:00:00 2 150 5
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4
And the data in stream_1 that I want to act on could look like this:
timemessage customerid var1 var2
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4
But to be able to calculate delta_var1 for all customers I would need the previous row in time for each customer before the ones in stream_1.
For example: To be able to calculate how much var1 has increased for customerid = 1 between 2021-04-01 09:00:00 and 2021-04-01 23:00:00 I want to include the 2021-04-01 09:00:00 row for customerid = 1 in my output.
So I would like to write a select that returns all rows in stream_1 plus, for each customerid, the previous row in time from table_1. For the table_1 and stream_1 shown above, the wanted output is the following:
timemessage customerid var1 var2
2021-04-01 09:00:00 1 40 3
2021-04-01 15:00:00 2 150 5
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4

So, given that your wanted output has the "last value per day", you want a QUALIFY that keeps only the wanted rows, using ROW_NUMBER partitioned by customerid and timemessage. Assuming the accumulator only ever increases, you can order by accumulatedvalue descending, thus:
WITH data(timemessage, customerid, accumulatedvalue) AS (
SELECT * FROM VALUES
('2021-04-01', 1, 10)
,('2021-04-01', 2, 100)
,('2021-04-02', 1, 20)
,('2021-04-03', 1, 40)
,('2021-04-03', 2, 150)
,('2021-04-04', 1, 50)
,('2021-04-04', 2, 180)
,('2021-04-05', 1, 55)
,('2021-04-05', 2, 200)
)
SELECT * FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY customerid,timemessage ORDER BY accumulatedvalue DESC) = 1
ORDER BY 1,2;
gives:
TIMEMESSAGE CUSTOMERID ACCUMULATEDVALUE
2021-04-01 1 10
2021-04-01 2 100
2021-04-02 1 20
2021-04-03 1 40
2021-04-03 2 150
2021-04-04 1 50
2021-04-04 2 180
2021-04-05 1 55
2021-04-05 2 200

If you can trust your data, and the data in table2 starts right after the data in table1, then you can just take the last record for each customer from table1 and union it with table2:
select * from table1
-- keep only the latest row per customer
qualify row_number() over (partition by customerid order by timemessage desc) = 1
union all
select * from table2
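If not, restrict table1 to rows that are older than a table2 row for the same customer before picking the latest one:
select a.* from table1 a
join table2 b
on a.customerid = b.customerid
and a.timemessage < b.timemessage
-- latest qualifying table1 row per customer
qualify row_number() over (partition by a.customerid order by a.timemessage desc) = 1
union all
select * from table2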
You can also add a condition so that the lookup does not go back more than 1 day (or 1 hour, or whatever interval is safe to look at) for better performance.
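In the question, table_1 still contains the rows that are pending in stream_1, so "the latest row per customer" in table_1 would itself be a stream row. One way around that is to compare against the earliest pending row per customer. What follows is only a minimal sketch under stated assumptions: it uses SQLite through Python's sqlite3 as a stand-in for the real database, QUALIFY is not available there so ROW_NUMBER is filtered in an outer query, and the CTE names first_new and previous are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1  (timemessage TEXT, customerid INTEGER, var1 INTEGER, var2 INTEGER);
CREATE TABLE stream_1 (timemessage TEXT, customerid INTEGER, var1 INTEGER, var2 INTEGER);
INSERT INTO table_1 VALUES
  ('2021-04-01 06:00:00', 1, 10, 5),
  ('2021-04-01 07:00:00', 2, 100, 7),
  ('2021-04-01 08:00:00', 1, 20, 10),
  ('2021-04-01 09:00:00', 1, 40, 3),
  ('2021-04-01 15:00:00', 2, 150, 5),
  ('2021-04-01 23:00:00', 1, 50, 6),
  ('2021-04-02 06:00:00', 2, 180, 2),
  ('2021-04-02 07:00:00', 1, 55, 9),
  ('2021-04-02 08:00:00', 2, 200, 4);
INSERT INTO stream_1 VALUES
  ('2021-04-01 23:00:00', 1, 50, 6),
  ('2021-04-02 06:00:00', 2, 180, 2),
  ('2021-04-02 07:00:00', 1, 55, 9),
  ('2021-04-02 08:00:00', 2, 200, 4);
""")

query = """
WITH first_new AS (   -- earliest pending row per customer (hypothetical name)
    SELECT customerid, MIN(timemessage) AS first_time
    FROM stream_1
    GROUP BY customerid
),
previous AS (         -- table_1 rows strictly older than that, newest first
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.customerid
                              ORDER BY t.timemessage DESC) AS rn
    FROM table_1 t
    JOIN first_new f
      ON f.customerid = t.customerid
     AND t.timemessage < f.first_time
)
SELECT timemessage, customerid, var1, var2 FROM previous WHERE rn = 1
UNION ALL
SELECT timemessage, customerid, var1, var2 FROM stream_1
ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this returns the four stream rows plus the 09:00 row for customer 1 and the 15:00 row for customer 2, so LAG(var1) has a predecessor for every pending row.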

Related

How to count consecutive days in a table where days are duplicated "PostgreSQL"

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 4 1 2023-01-27 10:00:00
5 1 5 1 2023-01-28 10:00:00
The closest I could get is this query, which only works if the user has trained on one ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
FROM logs
WHERE created_at >= '2023-01-24 00:00:00'
AND user_id = 1) x
GROUP BY grp
logs table:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 4 1 2023-01-27 10:00:00
5 1 5 1 2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 3 2 2023-01-26 10:00:00
5 1 4 1 2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user has trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.
Select only the distinct data of interest first:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (
          -- one row per day, so several grounds on the same day count once
          -- even when their created_at times differ
          select day, min(created_at) AS created_at
          from logs
          where created_at >= '2023-01-24 00:00:00'
          AND user_id = 1
          group by day) t
     ) x
GROUP BY grp
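To check the deduplication idea against the sample data where the user trains on two grounds on day 3, here is a small sketch. It assumes SQLite via Python's sqlite3 rather than PostgreSQL (the window-function logic is the same), and it adds an outer MAX, which is not part of the answer, to return only the longest streak.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (id INTEGER, user_id INTEGER, day INTEGER,
                   ground_id INTEGER, created_at TEXT);
INSERT INTO logs VALUES
  (1, 1, 1, 1, '2023-01-24 10:00:00'),
  (2, 1, 2, 1, '2023-01-25 10:00:00'),
  (3, 1, 3, 1, '2023-01-26 10:00:00'),
  (4, 1, 3, 2, '2023-01-26 10:00:00'),
  (5, 1, 4, 1, '2023-01-27 10:00:00');
""")

query = """
SELECT MAX(days_in_row)
FROM (
    SELECT COUNT(*) AS days_in_row
    FROM (
        -- consecutive days share the same (row_number - day) value
        SELECT day, ROW_NUMBER() OVER (ORDER BY day) - day AS grp
        FROM (
            -- collapse several sessions on one day into a single row
            SELECT DISTINCT day
            FROM logs
            WHERE user_id = 1
              AND created_at >= '2023-01-24 00:00:00'
        )
    )
    GROUP BY grp
)
"""
longest_streak = conn.execute(query).fetchone()[0]
print(longest_streak)  # prints 4
```

Because duplicates are removed before ROW_NUMBER is applied, days 1 to 4 all land in the same group.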

SQL - Calculate the average of a value in a table B from date range in table A

I am constructing a table in SQL like this
TABLE A
obj_id start_date end_date
1 2021-03-01 2022-08-02
1 2020-06-01 2021-07-02
2 2021-05-03 2022-08-04
3 2021-04-21 2022-06-05
And I have another table
TABLE B
obj_id date value
1 2021-04-12 21.45
3 2022-06-15 19.02
1 2020-11-02 3.11
2 2022-05-23 45.20
1 2022-07-31 32.45
3 2021-09-01 22.56
2 2021-10-10 34.04
I want to add to TABLE A a column with average value of TABLE B for corresponding obj_id of values where TABLE B date falls between TABLE A date range.
Expected result
TABLE A
obj_id start_date end_date average value
1 2021-03-01 2022-08-02 26.95 <-- Average value of 21.45 and 32.45 excluding 3.11 from average because date in table B is outside date range in table A
1 2020-06-01 2021-07-02 etc.
2 2021-05-03 2022-08-04 etc.
3 2021-04-21 2022-06-05 etc.
Sample query:
select
a.obj_id,
a.start_date,
a.end_date,
avg(b.value) as average
from table_a a
inner join table_b b
on a.obj_id = b.obj_id
and b.date >= a.start_date
and b.date <= a.end_date
group by
a.obj_id,
a.start_date,
a.end_date
order by
a.obj_id
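The sample query can be tried against the data from the question with a short sketch. Two things here are assumptions rather than part of the answer: SQLite via Python's sqlite3 stands in for the real database, and a LEFT JOIN is used instead of the inner join so that a date range with no matching rows in table B is still listed (with a NULL average).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (obj_id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE table_b (obj_id INTEGER, date TEXT, value REAL);
INSERT INTO table_a VALUES
  (1, '2021-03-01', '2022-08-02'),
  (1, '2020-06-01', '2021-07-02'),
  (2, '2021-05-03', '2022-08-04'),
  (3, '2021-04-21', '2022-06-05');
INSERT INTO table_b VALUES
  (1, '2021-04-12', 21.45),
  (3, '2022-06-15', 19.02),
  (1, '2020-11-02', 3.11),
  (2, '2022-05-23', 45.20),
  (1, '2022-07-31', 32.45),
  (3, '2021-09-01', 22.56),
  (2, '2021-10-10', 34.04);
""")

query = """
SELECT a.obj_id, a.start_date, a.end_date,
       ROUND(AVG(b.value), 2) AS average
FROM table_a a
LEFT JOIN table_b b
  ON b.obj_id = a.obj_id
 AND b.date BETWEEN a.start_date AND a.end_date   -- inclusive on both ends
GROUP BY a.obj_id, a.start_date, a.end_date
ORDER BY a.obj_id, a.start_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For obj_id 1 and the range 2021-03-01 to 2022-08-02 this averages 21.45 and 32.45 to 26.95, matching the expected result in the question.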

How can I join two tables on an ID and a DATE RANGE in SQL

I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. HOWEVER, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments.
Therefore, I would like to join T1 on to T2 on ID & on T1Date(+ or - 7 days). There is no other way to match the two tables and to align the records other than using the date range, as this is a poorly designed database. I hope for some help with this as I am stumped.
Here is some sample data:
Table #1:
ID RAssessmentDate
1 2020-01-03
1 2020-03-03
1 2020-05-03
2 2020-01-09
2 2020-04-09
3 2022-07-21
4 2020-06-30
4 2020-12-30
4 2021-06-30
4 2021-12-30
Table #2:
ID NAssessmentDate
1 2020-01-07
1 2020-03-02
1 2020-05-03
2 2020-01-09
2 2020-07-06
2 2020-04-10
3 2022-07-21
4 2021-01-03
4 2021-06-28
4 2022-01-02
4 2022-06-26
I would like my end result table to look like this:
ID RAssessmentDate NAssessmentDate
1 2020-01-03 2020-01-07
1 2020-03-03 2020-03-02
1 2020-05-03 2020-05-03
2 2020-01-09 2020-01-09
2 2020-04-09 2020-04-10
2 NULL 2020-07-06
3 2022-07-21 2022-07-21
4 2020-06-30 NULL
4 2020-12-30 2021-01-03
4 2021-06-30 2021-06-28
4 2021-12-30 2022-01-02
4 NULL 2022-01-02
Try this:
SELECT
    COALESCE(a.ID, b.ID) ID,
    a.RAssessmentDate,
    b.NAssessmentDate
FROM (
    SELECT
        -- number each ID's rows chronologically so the n-th R pairs with the n-th N
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RAssessmentDate) RowId, *
    FROM table1
) a
FULL OUTER JOIN (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAssessmentDate) RowId, *
    FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
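Pairing rows by position does not by itself enforce the "within 7 days" condition from the question. As an alternative, and only as a sketch under stated assumptions, the join condition can compare the dates directly. This uses SQLite via Python's sqlite3, where julianday() is SQLite's date function (other databases would use their own date-difference function). The full outer join is emulated with a LEFT JOIN plus the unmatched rows of table2, so the sketch also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, RAssessmentDate TEXT);
CREATE TABLE table2 (ID INTEGER, NAssessmentDate TEXT);
INSERT INTO table1 VALUES
  (1, '2020-01-03'), (1, '2020-03-03'), (1, '2020-05-03'),
  (2, '2020-01-09'), (2, '2020-04-09'),
  (3, '2022-07-21'),
  (4, '2020-06-30'), (4, '2020-12-30'), (4, '2021-06-30'), (4, '2021-12-30');
INSERT INTO table2 VALUES
  (1, '2020-01-07'), (1, '2020-03-02'), (1, '2020-05-03'),
  (2, '2020-01-09'), (2, '2020-07-06'), (2, '2020-04-10'),
  (3, '2022-07-21'),
  (4, '2021-01-03'), (4, '2021-06-28'), (4, '2022-01-02'), (4, '2022-06-26');
""")

query = """
-- every R assessment, with an N assessment of the same ID within 7 days if any
SELECT r.ID, r.RAssessmentDate, n.NAssessmentDate
FROM table1 r
LEFT JOIN table2 n
  ON n.ID = r.ID
 AND ABS(julianday(n.NAssessmentDate) - julianday(r.RAssessmentDate)) <= 7
UNION ALL
-- N assessments that found no R partner
SELECT n.ID, NULL, n.NAssessmentDate
FROM table2 n
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 r
    WHERE r.ID = n.ID
      AND ABS(julianday(n.NAssessmentDate) - julianday(r.RAssessmentDate)) <= 7
)
ORDER BY 1, 2, 3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this yields 12 rows, including the unmatched (2, NULL, 2020-07-06) and (4, 2020-06-30, NULL). If two assessments of one ID fall within 7 days of the same partner, this condition would pair both, so a real solution may need a tie-breaking rule.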

link a value from one table to another and slice one table based on columns from another table in sql

Suppose I have a first table like this:
tbl1:
eventid date1 date2
A 2020-06-21 2020-06-28
B 2020-05-13 2020-05-24
C 2020-07-20 2020-06-28
I also have a second table with a quantity and a date:
tbl2:
quantity date
5 2020-06-24
13 2020-07-24
8 2020-07-28
8 2020-06-20
12 2020-06-27
9 2020-06-29
10 2020-05-24
11 2020-05-12
18 2020-05-18
9 2020-05-14
7 2020-07-18
12 2020-07-21
Now I want to select only the rows from table 2 whose dates fall between the dates of table 1, AND to add a column to that result with each row containing A, B or C (the eventid from table 1), so that we can see which date in table 2 belongs to which eventid.
So my end result would look like:
quantity date eventid
5 2020-06-24 1
13 2020-07-24 3
8 2020-07-28 3
12 2020-06-27 1
10 2020-05-24 2
18 2020-05-18 2
9 2020-05-14 2
12 2020-07-21 3
I've been staring at it for ages now because I need an efficient way to do it.
Is there an efficient way of obtaining the desired result?
This looks like a join:
select t2.*, t1.eventid
from tbl2 t2 join
     tbl1 t1
     -- both bounds come from tbl1 (tbl2 has no date2 column)
     on t2.date >= t1.date1 and t2.date <= t1.date2;
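A quick way to try the range join on the sample data is sketched below, using SQLite via Python's sqlite3 as a stand-in. One loud assumption: event C is listed in the question with date2 = 2020-06-28, which is before its date1, so the sketch assumes 2020-07-28 was meant. With the data exactly as posted, C would match no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl1 (eventid TEXT, date1 TEXT, date2 TEXT);
CREATE TABLE tbl2 (quantity INTEGER, date TEXT);
INSERT INTO tbl1 VALUES
  ('A', '2020-06-21', '2020-06-28'),
  ('B', '2020-05-13', '2020-05-24'),
  ('C', '2020-07-20', '2020-07-28');  -- ASSUMED end date, see note above
INSERT INTO tbl2 VALUES
  (5, '2020-06-24'), (13, '2020-07-24'), (8, '2020-07-28'), (8, '2020-06-20'),
  (12, '2020-06-27'), (9, '2020-06-29'), (10, '2020-05-24'), (11, '2020-05-12'),
  (18, '2020-05-18'), (9, '2020-05-14'), (7, '2020-07-18'), (12, '2020-07-21');
""")

query = """
SELECT t2.quantity, t2.date, t1.eventid
FROM tbl2 t2
JOIN tbl1 t1
  ON t2.date BETWEEN t1.date1 AND t1.date2   -- inclusive range from tbl1
ORDER BY t2.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Under that assumption the join keeps 8 of the 12 rows, the same quantity and date pairs as the desired result, labeled with the letter eventid instead of 1, 2, 3.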

Select value on next date to be calculated on current date SQL

I have the following table:
ID GroupID oDate oTime oValue
1 A 2014-06-01 00:00:00 100
2 A 2014-06-01 01:00:00 200
3 A 2014-06-01 02:00:00 300
4 A 2014-06-02 00:00:00 400
5 A 2014-06-02 01:00:00 425
6 A 2014-06-02 02:00:00 475
7 B 2014-06-01 00:00:00 1000
8 B 2014-06-01 01:00:00 1500
9 B 2014-06-01 02:00:00 2000
10 B 2014-06-02 00:00:00 3000
11 B 2014-06-02 01:00:00 3100
12 A 2014-06-03 00:00:00 525
13 A 2014-06-03 01:00:00 600
14 A 2014-06-03 02:00:00 625
I want to have the following result:
GroupID oDate oResult
A 2014-06-01 300
A 2014-06-02 125
B 2014-06-01 2000
oResult is coming from:
Value on next date at 00:00:00 subtract value on selected date at 00:00:00.
For example, I want to know the Result for 2014-06-01. Then,
2014-06-02 00:00:00 400 subtract 2014-06-01 00:00:00 100
oResult = 400 - 100 = 300
How can I achieve this in SQL syntax?
Thank you.
You can write a query using Common Table Expression as :
;with CTE as
( select row_number() over ( partition by GroupID, oDate order by oTime Asc) as rownum,
GroupID, oDate, oValue,oTime
from Test
)
select CTE.GroupID,CTE1.oDate, (CTE.oValue - CTE1.oValue) as oResult
from CTE
inner join CTE as CTE1 on datediff (day,CTE1.oDate, CTE.oDate) = 1
and CTE1.rownum= CTE.rownum
and CTE1.GroupID= CTE.GroupID
where CTE.rownum = 1
You can use cross apply operator here
Please check this,
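select a.GroupID, a.oDate, (ab.oValue - a.oValue) as oResult
from T as a
cross apply
(
    -- midnight reading of the next calendar day in the same group
    select top 1 b.oValue from T as b
    where b.GroupID = a.GroupID
    and b.oDate = dateadd(day, 1, a.oDate)
    and b.oTime = '00:00:00.0000000'
) as ab
where a.oTime = '00:00:00.0000000'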
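On engines that support window functions, the same "next day's midnight value minus today's" calculation can also be written with LEAD. This is a hedged sketch and not one of the answers above: it assumes SQLite via Python's sqlite3 (so date(oDate, '+1 day') replaces SQL Server's date functions) and uses the table name Test from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (ID INTEGER, GroupID TEXT, oDate TEXT, oTime TEXT, oValue INTEGER);
INSERT INTO Test VALUES
  (1,  'A', '2014-06-01', '00:00:00', 100),
  (2,  'A', '2014-06-01', '01:00:00', 200),
  (3,  'A', '2014-06-01', '02:00:00', 300),
  (4,  'A', '2014-06-02', '00:00:00', 400),
  (5,  'A', '2014-06-02', '01:00:00', 425),
  (6,  'A', '2014-06-02', '02:00:00', 475),
  (7,  'B', '2014-06-01', '00:00:00', 1000),
  (8,  'B', '2014-06-01', '01:00:00', 1500),
  (9,  'B', '2014-06-01', '02:00:00', 2000),
  (10, 'B', '2014-06-02', '00:00:00', 3000),
  (11, 'B', '2014-06-02', '01:00:00', 3100),
  (12, 'A', '2014-06-03', '00:00:00', 525),
  (13, 'A', '2014-06-03', '01:00:00', 600),
  (14, 'A', '2014-06-03', '02:00:00', 625);
""")

query = """
SELECT GroupID, oDate, next_value - oValue AS oResult
FROM (
    -- look only at the midnight readings, one per group and date
    SELECT GroupID, oDate, oValue,
           LEAD(oValue) OVER (PARTITION BY GroupID ORDER BY oDate) AS next_value,
           LEAD(oDate)  OVER (PARTITION BY GroupID ORDER BY oDate) AS next_date
    FROM Test
    WHERE oTime = '00:00:00'
)
-- keep a row only if the following reading really is the next calendar day
WHERE next_date = date(oDate, '+1 day')
ORDER BY GroupID, oDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This prints the three rows from the wanted result: A / 2014-06-01 / 300, A / 2014-06-02 / 125, and B / 2014-06-01 / 2000. The last date of each group drops out because it has no following day.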