Join tables with dates within intervals of 5 min (get avg)


I want to join two tables on a timestamp. The problem is that the two tables don't have exactly matching timestamps, so I want to join each row to a nearby timestamp within a 5-minute interval.
This query needs to use two common table expressions (CTEs). Each CTE should get the timestamps and group them with AVG so that they can match.
TABLE_A:
Freezer | Timestamp | Temperature_1
1 | 2018-04-25 09:45:00 | 10
1 | 2018-04-25 09:50:00 | 11
1 | 2018-04-25 09:55:00 | 11
TABLE_B:
Freezer | Timestamp | Temperature_2
1 | 2018-04-25 09:46:00 | 15
1 | 2018-04-25 09:52:00 | 13
1 | 2018-04-25 09:59:00 | 12
My desired result would be:
Freezer | Timestamp | Temperature_1 | Temperature_2
1 | 2018-04-25 09:45:00 | 10 | 15
1 | 2018-04-25 09:50:00 | 11 | 13
1 | 2018-04-25 09:55:00 | 11 | 12
The query I'm currently working on is:
WITH Temperatures_1 AS (
SELECT Freezer, Temperature_1, Timestamp
FROM TABLE_A
),
Temperatures_2 AS (
SELECT Freezer, Temperature_2, Timestamp
FROM TABLE_B
)
SELECT A.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 as A
RIGHT JOIN Temperatures_2 as B
ON A.FREEZER = B.FREEZER
WHERE A.Timestamp = B.Timestamp

You may want to modify your join criteria instead of filtering the output. Use BETWEEN to bracket the join on the timestamps. I chose +/- 150 seconds because that is 2-1/2 minutes to either side, which is half of the 5-minute range you want to match. You may need something different.
;WITH Temperatures_1 AS (
SELECT Freezer, Temperature_1, Timestamp
FROM TABLE_A
),
Temperatures_2 AS (
SELECT Freezer, Temperature_2, Timestamp
FROM TABLE_B
)
SELECT B.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 as A
RIGHT JOIN Temperatures_2 as B
ON A.FREEZER = B.FREEZER
-- match A rows that fall within +/- 150 seconds of the B timestamp
AND A.Timestamp BETWEEN DATEADD(SECOND, -150, B.Timestamp)
AND DATEADD(SECOND, 150, B.Timestamp)
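The following is a runnable sketch of the same +/-150-second window join on the sample data. It uses Python's built-in `sqlite3` instead of SQL Server, so SQLite's `datetime(ts, '-150 seconds')` stands in for `DATEADD`. `B LEFT JOIN A` stands in for `A RIGHT JOIN B`. The lowercase table and column names (`table_a`, `ts`, and so on) are assumptions made for the sketch.

```python
import sqlite3

# In-memory copy of the sample data from the question (timestamps as ISO text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (freezer INTEGER, ts TEXT, temperature_1 REAL);
CREATE TABLE table_b (freezer INTEGER, ts TEXT, temperature_2 REAL);
INSERT INTO table_a VALUES
  (1, '2018-04-25 09:45:00', 10),
  (1, '2018-04-25 09:50:00', 11),
  (1, '2018-04-25 09:55:00', 11);
INSERT INTO table_b VALUES
  (1, '2018-04-25 09:46:00', 15),
  (1, '2018-04-25 09:52:00', 13),
  (1, '2018-04-25 09:59:00', 12);
""")

# Every B reading is kept; an A reading joins only if its timestamp falls
# within +/-150 seconds of the B timestamp. ISO strings compare correctly as text.
rows = conn.execute("""
WITH temperatures_1 AS (
    SELECT freezer, temperature_1, ts FROM table_a
),
temperatures_2 AS (
    SELECT freezer, temperature_2, ts FROM table_b
)
SELECT b.freezer, a.ts, a.temperature_1, b.temperature_2
FROM temperatures_2 AS b
LEFT JOIN temperatures_1 AS a
  ON a.freezer = b.freezer
 AND a.ts BETWEEN datetime(b.ts, '-150 seconds') AND datetime(b.ts, '+150 seconds')
ORDER BY b.ts
""").fetchall()

for row in rows:
    print(row)
```

On this data, the 09:59 reading is 240 seconds from 09:55. That is outside the +/-150 s window, so it comes back with a NULL Temperature_1. Widening the window would change that.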

You should change the join key so that it includes the timestamp, rounded to the same 5-minute grid on both sides (tables A and B).
First check whether table A's datetime is less than 2.5 minutes past a 5-minute boundary. If it is, round it down to that boundary; if it is more, round it up to the next 5-minute boundary. Do the same for table B. Alternatively, do this rounding in the CTEs and keep the RIGHT JOIN from your query unchanged.
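Here is a minimal sketch of the rounding approach. It runs in SQLite through Python's `sqlite3`, not the asker's database. Each CTE rounds timestamps to the nearest 5-minute boundary and averages the readings per bucket with AVG, as the question asks. The join then becomes a plain equality. The bucket arithmetic on Unix seconds and the names `table_a`, `ts` and `bucket` are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (freezer INTEGER, ts TEXT, temperature_1 REAL);
CREATE TABLE table_b (freezer INTEGER, ts TEXT, temperature_2 REAL);
INSERT INTO table_a VALUES
  (1, '2018-04-25 09:45:00', 10),
  (1, '2018-04-25 09:50:00', 11),
  (1, '2018-04-25 09:55:00', 11);
INSERT INTO table_b VALUES
  (1, '2018-04-25 09:46:00', 15),
  (1, '2018-04-25 09:52:00', 13),
  (1, '2018-04-25 09:59:00', 12);
""")

# (epoch + 150) / 300 * 300 uses integer division, so it rounds to the nearest
# 300-second boundary. Readings in the same bucket are averaged.
rows = conn.execute("""
WITH temperatures_1 AS (
    SELECT freezer,
           datetime((CAST(strftime('%s', ts) AS INTEGER) + 150) / 300 * 300,
                    'unixepoch') AS bucket,
           AVG(temperature_1) AS temperature_1
    FROM table_a
    GROUP BY freezer, bucket
),
temperatures_2 AS (
    SELECT freezer,
           datetime((CAST(strftime('%s', ts) AS INTEGER) + 150) / 300 * 300,
                    'unixepoch') AS bucket,
           AVG(temperature_2) AS temperature_2
    FROM table_b
    GROUP BY freezer, bucket
)
SELECT b.freezer, b.bucket, a.temperature_1, b.temperature_2
FROM temperatures_2 AS b
LEFT JOIN temperatures_1 AS a
  ON a.freezer = b.freezer AND a.bucket = b.bucket
ORDER BY b.bucket
""").fetchall()

for row in rows:
    print(row)
```

Rounding to the nearest boundary puts 09:59 into the 10:00 bucket, so it does not pair with 09:55. Truncating instead (dropping the `+ 150`) puts 09:46, 09:52 and 09:59 into the 09:45, 09:50 and 09:55 buckets. That reproduces the desired result for this sample.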

Related

SQL - Fuzzy JOIN on Timestamp columns within X amount of time

Say I have two tables:
a:
timestamp | precipitation
2015-08-03 21:00:00 UTC | 3
2015-08-03 22:00:00 UTC | 3
2015-08-04 3:00:00 UTC | 4
2016-02-04 18:00:00 UTC | 4
and b:
timestamp | loc
2015-08-03 21:23:00 UTC | San Francisco
2016-02-04 16:04:00 UTC | New York
I want to fuzzy-join the tables so that every row in b tries to join to a row in a. Criteria:
The time difference is within 60 minutes. If no match exists within 60 minutes, do not include that row in the output.
In the case of a tie, where a row in b could join to two rows in a, pick the closest one in time.
Example Output:
timestamp | loc | precipitation
2015-08-03 21:00:00 UTC | San Francisco | 3

What you need is an ASOF join. I don't think there is an easy way to do this with BigQuery. Other databases like Kinetica (and I think ClickHouse) support ASOF functions that can be used to perform 'fuzzy' joins.
The syntax for Kinetica would be something like the following.
SELECT *
FROM a
LEFT JOIN b
ON ASOF(a.timestamp, b.timestamp, INTERVAL '0' MINUTES, INTERVAL '60' MINUTES, MIN)
The ASOF function above sets up an interval of 60 minutes within which to look for matches on the right side table. When there are multiple matches, it selects the one that is closest (MAX would pick the one that is farthest away).

Based on my understanding of the data you provided, I think the query below should work for your use case.
create temporary table a as(
select TIMESTAMP('2015-08-03 21:00:00 UTC') as ts, 3 as precipitation union all
select TIMESTAMP('2015-08-03 22:00:00 UTC'), 3 union all
select TIMESTAMP('2015-08-04 3:00:00 UTC'), 4 union all
select TIMESTAMP('2016-02-04 18:00:00 UTC'), 4
);
create temporary table b as(
select TIMESTAMP('2015-08-03 21:23:00 UTC') as ts, 'San Francisco' as loc union all
select TIMESTAMP('2016-02-04 16:04:00 UTC') as ts, 'New York' as loc
);
select b_ts,a_ts,loc,precipitation,diff_time_sec
from(
select b.ts b_ts, a.ts a_ts, b.loc, a.precipitation,
ABS(TIMESTAMP_DIFF(b.ts, a.ts, SECOND)) as diff_time_sec
from b
inner join a on b.ts between timestamp_sub(a.ts, interval 60 MINUTE) and timestamp_add(a.ts, interval 60 MINUTE)
)
qualify RANK() OVER(partition by b_ts ORDER BY diff_time_sec) = 1
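SQLite has no QUALIFY clause, but the same "closest match within 60 minutes" idea can be written with a window function and an outer filter. The sketch below uses Python's `sqlite3` and needs SQLite 3.25+ for window functions. It uses ROW_NUMBER with `a_ts` as a tie-breaker, so each b row yields exactly one match. The BigQuery answer's RANK would keep tied rows instead. The CTE names `pairs` and `ranked` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (ts TEXT, precipitation INTEGER);
CREATE TABLE b (ts TEXT, loc TEXT);
INSERT INTO a VALUES
  ('2015-08-03 21:00:00', 3),
  ('2015-08-03 22:00:00', 3),
  ('2015-08-04 03:00:00', 4),
  ('2016-02-04 18:00:00', 4);
INSERT INTO b VALUES
  ('2015-08-03 21:23:00', 'San Francisco'),
  ('2016-02-04 16:04:00', 'New York');
""")

rows = conn.execute("""
WITH pairs AS (
    -- candidate pairs: a-rows within 60 minutes of each b-row
    SELECT b.ts AS b_ts, a.ts AS a_ts, b.loc, a.precipitation,
           ABS(CAST(strftime('%s', b.ts) AS INTEGER)
             - CAST(strftime('%s', a.ts) AS INTEGER)) AS diff_sec
    FROM b
    JOIN a ON a.ts BETWEEN datetime(b.ts, '-60 minutes')
                       AND datetime(b.ts, '+60 minutes')
),
ranked AS (
    -- number candidates per b-row, closest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY b_ts
                                 ORDER BY diff_sec, a_ts) AS rn
    FROM pairs
)
SELECT b_ts, a_ts, loc, precipitation, diff_sec
FROM ranked
WHERE rn = 1
""").fetchall()

print(rows)
```

New York has no a-row within 60 minutes, so the inner join drops it, as the criteria require.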

how to calculate occupancy on the basis of admission and discharge dates

Suppose I have patient admission-wise (claim-wise) data like the sample below. The data type of the patient_id and hosp_id columns is VARCHAR.
Table name: claims
rec_no | patient_id | hosp_id | admn_date | discharge_date
1 | 1 | 1 | 01-01-2020 | 10-01-2020
2 | 2 | 1 | 31-12-2019 | 11-01-2020
3 | 1 | 1 | 11-01-2020 | 15-01-2020
4 | 3 | 1 | 04-01-2020 | 10-01-2020
5 | 1 | 2 | 16-01-2020 | 17-01-2020
6 | 4 | 2 | 01-01-2020 | 10-01-2020
7 | 5 | 2 | 02-01-2020 | 11-01-2020
8 | 6 | 2 | 03-01-2020 | 12-01-2020
9 | 7 | 2 | 04-01-2020 | 13-01-2020
10 | 2 | 1 | 31-12-2019 | 10-01-2020
I have another table that stores the bed strength (maximum occupancy) of each hospital.
Table name: beds
hosp_id | bed_strength
1 | 3
2 | 4
Expected results: for each hospital, I want to find the dates on which its declared bed strength was exceeded.
Code I have tried: nothing, as I am new to SQL. However, I can solve this in R with the following strategy:
pivot_longer the dates
tidyr::complete() the missing dates in between
summarise or aggregate the results for each date.
I would also like to know whether this can be done in SQL without pivoting, because the claims table has 15 million+ rows and pivoting really slows down the process. Please help.

You can use generate_series() to do something very similar in Postgres. For the occupancy by date:
select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(c.admn_date, c.discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date;
Then use this as a subquery to get the dates that exceed the threshold:
select hd.*, b.bed_strength
from (select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(c.admn_date, c.discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date
) hd join
beds b
using (hosp_id)
where hd.occupancy > b.bed_strength
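SQLite has no built-in generate_series(), so this sketch expands each stay into one row per day with a recursive CTE instead. It runs through Python's `sqlite3` and counts both the admission and discharge days, like generate_series with both bounds. The sample dates are assumed to be DD-MM-YYYY and are rewritten as ISO strings. The CTE name `stay_days` is illustrative. It uses the same expansion idea as the answer, so it does not avoid the row blow-up on 15 million claims.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (rec_no INTEGER, patient_id TEXT, hosp_id TEXT,
                     admn_date TEXT, discharge_date TEXT);
CREATE TABLE beds (hosp_id TEXT, bed_strength INTEGER);
INSERT INTO claims VALUES
  (1, '1', '1', '2020-01-01', '2020-01-10'),
  (2, '2', '1', '2019-12-31', '2020-01-11'),
  (3, '1', '1', '2020-01-11', '2020-01-15'),
  (4, '3', '1', '2020-01-04', '2020-01-10'),
  (5, '1', '2', '2020-01-16', '2020-01-17'),
  (6, '4', '2', '2020-01-01', '2020-01-10'),
  (7, '5', '2', '2020-01-02', '2020-01-11'),
  (8, '6', '2', '2020-01-03', '2020-01-12'),
  (9, '7', '2', '2020-01-04', '2020-01-13'),
  (10, '2', '1', '2019-12-31', '2020-01-10');
INSERT INTO beds VALUES ('1', 3), ('2', 4);
""")

rows = conn.execute("""
WITH RECURSIVE stay_days(rec_no, hosp_id, day, discharge_date) AS (
    -- first day of each stay
    SELECT rec_no, hosp_id, admn_date, discharge_date FROM claims
    UNION ALL
    -- add one day at a time until the discharge date is reached
    SELECT rec_no, hosp_id, date(day, '+1 day'), discharge_date
    FROM stay_days
    WHERE day < discharge_date
)
SELECT s.hosp_id, s.day, COUNT(*) AS occupancy, b.bed_strength
FROM stay_days AS s
JOIN beds AS b ON b.hosp_id = s.hosp_id
GROUP BY s.hosp_id, s.day, b.bed_strength
HAVING COUNT(*) > b.bed_strength
ORDER BY s.hosp_id, s.day
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, only hospital 1 exceeds its 3 beds: 4 patients are present on each day from 2020-01-04 to 2020-01-10. Hospital 2 peaks at exactly 4, which is not greater than its bed strength.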

How to average values in one table based on the condition involving another table in SQL?

I have two tables. One defines time intervals (beginning and end). Time intervals are not equal in length. Another contains product ID, start and end date of the product.
TableOne:
Interval | StartDateTime | EndDateTime
202020201 | 2020-01-01 00:00:00 | 2020-02-10 00:00:00
202020202 | 2020-02-10 00:00:00 | 2020-02-20 00:00:00
TableTwo:
ProductID | ProductStartDateTime | ProductEndDateTime
ASSDWE1 | 2018-01-04 00:12:00 | 2020-04-10 20:00:30
ADFGHER | 2020-01-05 00:11:30 | 2020-01-19 00:00:00
ASDFVBN | 2017-10-10 00:12:10 | 2020-02-23 00:23:23
I need to compute the average length of the products from TableTwo that existed during each time interval defined in TableOne. If a product existed throughout a time interval from TableOne, its length for that interval is defined as the time from its start date until the end of the interval.
I tried the following:
select
a.*,
(select
AVG(datediff(day, b.ProductStartDateTime, IIF (b.ProductEndDateTime> a.EndDateTime, a.EndDateTime
,b.ProductEndDateTime))) --compute average length of the products
FROM #TableTwo b
WHERE ( not (b.ProductEndDateTime <= a.StartDateTime ) and not (b.ProductStartDateTime >= a.EndDateTime) )
-- select products that existed during interval from #TableOne
) as AverageProductLength
from #TableOne a
I get the error "Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression."
The result I want:
Interval | StartDateTime | EndDateTime | AverageProductLength
202020201 | 2020-01-01 00:00:00 | 2020-02-10 00:00:00 | 23
202020202 | 2020-02-10 00:00:00 | 2020-02-20 00:00:00 | 34.5
Is there a way I can do the averaging?

Problems with complex query

There are two tables.
In the first I have columns:
id - a person
time - the time of receiving the bonus (timestamp)
money - size of bonus
And the second:
id
time - time of getting a rank (timestamp)
range - military rank (int)
The task is to retrieve the total amount and number of bonuses received by people holding the rank of captain (range = 7), aggregated by day.
I have no idea how to build a table from this data. I can summarize the data by day like this:
SELECT DISTINCTROW Payment.user_id AS user_id, Sum(IIf(IsNull(Payment.money),0,Payment.money)) AS [Sum - money], Count(Payment.money) AS [Count - Payment], Format(Payment.time, "Short Date") as day
FROM Payment
GROUP BY Payment.user_id, Format (Payment.time, "Short Date")
Having ((Count(Payment.money) > 0));
Can you help me with the second part and summarize the results? Thanks.
For example: first table (Payment):
user_id time money
a 01.01.10 00:00:00 15,00
a 01.01.10 10:00:00 2,00
a 03.01.10 00:00:00 3,00
c 04.01.10 00:00:00 4,00
c 04.01.10 00:05:00 5,00
d 06.01.10 00:00:00 6,00
e 07.01.10 00:00:00 7,00
e 08.01.10 00:00:00 8,00
The second one:
user_id time range
a 01.01.10 00:00:00 6
a 01.01.10 09:00:00 7
a 04.01.10 00:00:00 8
b 04.01.10 00:00:00 4
c 04.01.10 00:05:00 7
d 06.01.10 00:00:00 5
e 07.01.10 00:00:00 6
f 08.01.10 00:00:00 6
g 08.01.10 00:00:00 7
I expected:
user_id time sum
a 01.01.10 2
a 03.01.10 3
c 04.01.10 5

Here is one possible method using joins:
select t1.user_id, datevalue(p.time) as [time], sum(p.money) as [sum]
from
(
(select t.user_id, t.time from rank t where t.range = 7) t1
inner join payment p on t1.user_id = p.user_id
)
left join
(select t.user_id, t.time from rank t where t.range > 7) t2 on p.user_id = t2.user_id
where
p.time >= t1.time and (t2.user_id is null or p.time < t2.time)
group by
t1.user_id, datevalue(p.time)
I have assumed that your second table is called rank (this was not stated in your question).
Here, the subquery t1 obtains the set of users with range = 7 (captain), and the subquery t2 obtains the set of users with range > 7. I then select all records with a payment date greater than or equal to the date of promotion to captain, but less than any subsequent promotion (if it exists).
This yields the following result:
+---------+------------+------+
| user_id | time | sum |
+---------+------------+------+
| a | 01/01/2010 | 2.00 |
| a | 03/01/2010 | 3.00 |
| c | 04/01/2010 | 5.00 |
+---------+------------+------+
Unless I have misunderstood, I would argue that your expected result is incorrect as the payment below occurs before user_id = c achieved the rank of captain:
c 04.01.10 00:00:00 4,00
c 04.01.10 00:05:00 7
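Below is the same join logic sketched in SQLite through Python's `sqlite3`. It assumes the DD.MM.YY dates mean January 2010 and rewrites them as ISO strings. The rank table is named `ranks` here, which is hypothetical, just as the answer's `rank` is. `range` is quoted to be safe because RANGE is an SQLite keyword. One deliberate change: t2 is reduced to each user's earliest promotion above captain (`MIN(time)`). Otherwise a user with several later promotions could match multiple t2 rows and duplicate payments. The query also returns the number of bonuses, which the task asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (user_id TEXT, time TEXT, money REAL);
CREATE TABLE ranks (user_id TEXT, time TEXT, "range" INTEGER);
INSERT INTO payment VALUES
  ('a', '2010-01-01 00:00:00', 15.00), ('a', '2010-01-01 10:00:00', 2.00),
  ('a', '2010-01-03 00:00:00', 3.00),  ('c', '2010-01-04 00:00:00', 4.00),
  ('c', '2010-01-04 00:05:00', 5.00),  ('d', '2010-01-06 00:00:00', 6.00),
  ('e', '2010-01-07 00:00:00', 7.00),  ('e', '2010-01-08 00:00:00', 8.00);
INSERT INTO ranks VALUES
  ('a', '2010-01-01 00:00:00', 6), ('a', '2010-01-01 09:00:00', 7),
  ('a', '2010-01-04 00:00:00', 8), ('b', '2010-01-04 00:00:00', 4),
  ('c', '2010-01-04 00:05:00', 7), ('d', '2010-01-06 00:00:00', 5),
  ('e', '2010-01-07 00:00:00', 6), ('f', '2010-01-08 00:00:00', 6),
  ('g', '2010-01-08 00:00:00', 7);
""")

rows = conn.execute("""
SELECT t1.user_id, date(p.time) AS pay_day,
       SUM(p.money) AS total, COUNT(*) AS n_bonuses
FROM (SELECT user_id, time FROM ranks WHERE "range" = 7) AS t1   -- promoted to captain
JOIN payment AS p ON p.user_id = t1.user_id
LEFT JOIN (SELECT user_id, MIN(time) AS time                      -- first promotion above captain
           FROM ranks WHERE "range" > 7
           GROUP BY user_id) AS t2 ON t2.user_id = p.user_id
WHERE p.time >= t1.time
  AND (t2.user_id IS NULL OR p.time < t2.time)
GROUP BY t1.user_id, date(p.time)
ORDER BY t1.user_id, pay_day
""").fetchall()

for row in rows:
    print(row)
```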

Missing Expression

I have two tables. One contains just the dates, such as:
table1: select display_date from dates; -- displays every date of the month (01-31)
display_date
01-OCT-14
02-OCT-14
03-OCT-14
... and so on
table2: select display_date, weekday, day, month from employee_Day -- contains only some dates of the month (01, 04, 05, etc.), not all of them
display_date | weekday | day | month
01-OCT-14 | 7 | 01 | 10
04-OCT-14 | 5 | 04 | 10
I need to join these two tables and get all of the dates, with null values for the records that don't exist in table 2. I need output like the following:
display_date | weekday | day | month
01-OCT-14 | 7 | 01 | 10
02-OCT-14 | 5 | 02 | 10
03-OCT-14 | 4 | 03 | 10
select a.display_date, b.weekday, b.day, b.month from (subquery1) a, (subquery2) b where TO_CHAR(TO_DATE(a.DISPLAY_DATE,'DD-MON-RR'),'DD')= TO_CHAR(b.DAY_NUMBER)(+);
subquery1: select the first table's values
subquery2: get the second table's values
I am getting a "missing expression" error.
I need to match on the DISPLAY_DATE column. If table 2 has no value for a display_date, the row from table 1 still has to appear in the result.
I can't use UNION because the columns in tables 1 and 2 are different.
Any ideas?

You need to use a LEFT JOIN:
SELECT d.display_date, e.weekday, e.day, e.month
FROM Dates d
LEFT JOIN employee_Day e
ON d.display_date = e.display_date

I suppose you need something like this:
select a.display_date,
nvl(b.weekday, to_char(a.display_date, 'D')) weekday,
nvl(b.day, to_char(a.display_date, 'DD')) day,
nvl(b.month, to_char(a.display_date, 'MM')) month
from table1 a left join table2 b on a.display_date = b.display_date
order by a.display_date;
Oracle recommends using the ANSI OUTER JOIN syntax rather than the (+) operator.
NVL(expr1, expr2) returns expr2 if expr1 is null, and expr1 otherwise.
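For comparison, here is a sketch of the LEFT JOIN plus NVL pattern in SQLite through Python's `sqlite3`, where COALESCE plays the role of NVL. Assumptions: the dates are stored as ISO strings, only the first five days of October 2014 are loaded, and `day`/`month` are text columns. The weekday fallback is computed as `strftime('%w') + 1` (1 = Sunday). That mirrors Oracle's 'D' format only under a territory setting where weeks start on Sunday. Values that exist in `employee_day` are kept as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (display_date TEXT);
CREATE TABLE employee_day (display_date TEXT, weekday INTEGER, day TEXT, month TEXT);
INSERT INTO dates VALUES
  ('2014-10-01'), ('2014-10-02'), ('2014-10-03'), ('2014-10-04'), ('2014-10-05');
INSERT INTO employee_day VALUES
  ('2014-10-01', 7, '01', '10'),
  ('2014-10-04', 5, '04', '10');
""")

# Every date from `dates` is kept; missing employee_day values are filled
# from the date itself, as NVL does in the Oracle answer.
rows = conn.execute("""
SELECT d.display_date,
       COALESCE(e.weekday, CAST(strftime('%w', d.display_date) AS INTEGER) + 1) AS weekday,
       COALESCE(e.day,   strftime('%d', d.display_date)) AS day,
       COALESCE(e.month, strftime('%m', d.display_date)) AS month
FROM dates AS d
LEFT JOIN employee_day AS e ON e.display_date = d.display_date
ORDER BY d.display_date
""").fetchall()

for row in rows:
    print(row)
```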
