Calculate duration between two rows in T-SQL

Good afternoon! Could anyone help me to solve the task? I have a table:
Id | Date | Reason
-: | :--------------- | :--------
1 | 2020-01-01 10:00 | Departure
1 | 2020-01-01 12:20 | Arrival
1 | 2020-01-02 14:30 | Departure
1 | 2020-01-02 19:20 | Arrival
1 | 2020-01-03 15:40 | Departure
1 | 2020-01-04 19:20 | Arrival
2 | 2020-02-03 15:40 | Departure
2 | 2020-02-04 19:20 | Arrival
3 | 2020-03-05 15:40 | Departure
3 | 2020-03-05 19:20 | Arrival
3 | 2020-03-06 16:28 | Departure
3 | 2020-03-06 21:00 | Arrival
I need to estimate the average duration for each Id. As a first step, I want to get a table like this, for example for Id = 1:
Id | Duration (minutes)
-: | -----------------:
1 | 140
1 | 290
1 | 1660
How can I achieve that with a T-SQL query?

Assuming the rows are perfectly interleaved, you can use lead():
select t.*,
       datediff(minute, date, next_date) as diff_minutes
from (select t.*,
             lead(date) over (partition by id order by date) as next_date
      from t
     ) t
where reason = 'Departure';
If you want the results for only one id, you can filter in either the subquery or the outer query.
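As a cross-check of what the lead() query is expected to return, here is a minimal plain-Python sketch of the same pairing logic, run on part of the sample data. The function name departure_durations is made up for illustration. One difference from the SQL: a Departure with no following row is skipped here, whereas the query would return NULL for it.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question (ids 1 and 2): (id, timestamp, reason)
rows = [
    (1, "2020-01-01 10:00", "Departure"),
    (1, "2020-01-01 12:20", "Arrival"),
    (1, "2020-01-02 14:30", "Departure"),
    (1, "2020-01-02 19:20", "Arrival"),
    (1, "2020-01-03 15:40", "Departure"),
    (1, "2020-01-04 19:20", "Arrival"),
    (2, "2020-02-03 15:40", "Departure"),
    (2, "2020-02-04 19:20", "Arrival"),
]


def departure_durations(rows):
    """Return (id, minutes) for each Departure, measured to the next row of the same id."""
    parsed = sorted(
        (row_id, datetime.strptime(stamp, "%Y-%m-%d %H:%M"), reason)
        for row_id, stamp, reason in rows
    )
    result = []
    for _, group in groupby(parsed, key=lambda row: row[0]):
        group = list(group)
        # Pairing each row with its successor mirrors
        # lead(date) over (partition by id order by date).
        for (row_id, start, reason), (_, next_date, _) in zip(group, group[1:]):
            if reason == "Departure":
                minutes = int((next_date - start).total_seconds() // 60)
                result.append((row_id, minutes))
    return result


durations = departure_durations(rows)
for row_id, minutes in durations:
    print(row_id, minutes)
```

Averaging the minutes per id afterwards gives the average duration the question asks for; in SQL that would be an avg(...) with group by id over the query above.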

Related

How can I create a new column count in a SQL table where count = 1 if the hours column >= 6, else count = 0

I aim to first achieve this:
id | employee | Datelog | TimeIn | TimeOut | Hours | Count
-: | :------- | :--------- | :------- | :------- | :------- | ----:
5 | Two | 2022-08-10 | 09:00:00 | 16:00:00 | 07:00:00 | 1
4 | Two | 2022-08-09 | 09:00:00 | 16:00:00 | 07:00:00 | 1
3 | Two | 2022-08-08 | 09:00:00 | 16:00:00 | 07:00:00 | 1
2 | One | 2022-08-05 | 09:00:00 | 16:00:00 | 07:00:00 | 1
1 | Two | 2022-08-04 | 09:00:00 | 10:00:00 | 01:00:00 | 0
My main objective is then to give a bonus of 2k to employees whose TotalCount per month is >= 3:
employee | Month | TotalCount | Bonus
:------- | :----- | ---------: | ----:
Two | August | 3 | 2000
One | August | 1 | 0
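Here's an answer using Postgres. It is mostly generic; only extracting the month from datelog may need slightly different syntax in other databases. Grouping by the month as well as the employee keeps the count per month when the data spans several months.
select employee
      ,date_part('month', datelog) as month
      ,count(*) as total_count
      ,case when count(*) >= 3 then 2000 else 0 end as bonus
from t
where hours >= time '06:00:00'
group by employee, date_part('month', datelog)
employee | month | total_count | bonus
:------- | ----: | ----------: | ----:
Two | 8 | 3 | 2000
One | 8 | 1 | 0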
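The counting rule can also be sketched in plain Python, which makes the two thresholds explicit. This is only an illustration on the sample rows; the function name monthly_bonus and its parameters are assumptions, and grouping by year as well as month is an added safeguard that the question does not ask for.

```python
from collections import Counter
from datetime import date, time

# Sample rows from the question: (employee, datelog, hours worked)
logs = [
    ("Two", date(2022, 8, 10), time(7, 0)),
    ("Two", date(2022, 8, 9), time(7, 0)),
    ("Two", date(2022, 8, 8), time(7, 0)),
    ("One", date(2022, 8, 5), time(7, 0)),
    ("Two", date(2022, 8, 4), time(1, 0)),
]


def monthly_bonus(logs, min_hours=time(6, 0), min_days=3, bonus=2000):
    """Count qualifying days per (employee, year, month) and attach the bonus."""
    # A day counts only when at least min_hours were worked (the Count column).
    counts = Counter(
        (employee, day.year, day.month)
        for employee, day, hours in logs
        if hours >= min_hours
    )
    return {key: (n, bonus if n >= min_days else 0) for key, n in counts.items()}


bonuses = monthly_bonus(logs)
for (employee, year, month), (total, amount) in bonuses.items():
    print(employee, year, month, total, amount)
```

As in the SQL, an employee with no qualifying day in a month does not appear in the result at all.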

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format; create_date can never be later than event_date.
I need to calculate, on a rolling n-day basis (let's say 3), the sum of units where the create_date and event_date both fall within the same 3-day window. The data is illustrative: each event_date can have 500+ different create_dates associated with it, and the number isn't constant. Some event_dates may be missing.
So for 2022-02-03, I only want to sum units where both the event_date and create_date values are between 2022-02-01 and 2022-02-03.
event_date | create_date | rowid | units
:--------- | :---------- | ----: | ----:
2022-02-01 | 2022-01-20 | 1 | 100
2022-02-01 | 2022-02-01 | 2 | 100
2022-02-02 | 2022-01-21 | 3 | 100
2022-02-02 | 2022-01-23 | 4 | 100
2022-02-02 | 2022-01-31 | 5 | 100
2022-02-02 | 2022-02-02 | 6 | 100
2022-02-03 | 2022-01-30 | 7 | 100
2022-02-03 | 2022-02-01 | 8 | 100
2022-02-03 | 2022-02-03 | 9 | 100
2022-02-05 | 2022-02-01 | 10 | 100
2022-02-05 | 2022-02-03 | 11 | 100
The output I need is below (in brackets I added the rows to include in the calculation for each date, but my result only needs the numerical sum). I tried calculating using either date on its own, but neither returned the results I needed.
date | units
:--------- | :------------------
2022-02-01 | 100 (Row 2)
2022-02-02 | 300 (Row 2,5,6)
2022-02-03 | 400 (Row 2,6,8,9)
2022-02-04 | 200 (Row 6,9)
2022-02-05 | 200 (Row 9,11)
In Python I solved this with a function that looped over the dates and filtered a dataframe for each one, but I am struggling to do the same in SQL.
Thank you!
Consider the approach below (it relies on BigQuery functions such as generate_date_array):
with events_dates as (
  select date from (
    select min(event_date) min_date, max(event_date) max_date
    from your_table
  ), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg('' || rowid) rows_included
from events_dates
left join your_table
  on create_date between date - 2 and date
  and event_date between date - 2 and date
group by date
Applied to the sample data in your question, this returns one row per calendar date, including dates with no events, with the summed units and the ids of the rows included.
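To make the window logic concrete, here is a small plain-Python sketch of the same calculation on the sample rows. The function name rolling_units and the window_days parameter are assumptions for illustration. Unlike the left join, which yields NULL for a date with no matching rows, this sketch reports 0. Note that four rows of 100 units qualify for 2022-02-03, so the sum for that date is 400.

```python
from datetime import date, timedelta

# Sample rows from the question: (event_date, create_date, rowid, units)
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 1, 100),
    (date(2022, 2, 1), date(2022, 2, 1), 2, 100),
    (date(2022, 2, 2), date(2022, 1, 21), 3, 100),
    (date(2022, 2, 2), date(2022, 1, 23), 4, 100),
    (date(2022, 2, 2), date(2022, 1, 31), 5, 100),
    (date(2022, 2, 2), date(2022, 2, 2), 6, 100),
    (date(2022, 2, 3), date(2022, 1, 30), 7, 100),
    (date(2022, 2, 3), date(2022, 2, 1), 8, 100),
    (date(2022, 2, 3), date(2022, 2, 3), 9, 100),
    (date(2022, 2, 5), date(2022, 2, 1), 10, 100),
    (date(2022, 2, 5), date(2022, 2, 3), 11, 100),
]


def rolling_units(rows, window_days=3):
    """Sum units per calendar day where both dates fall in the trailing window."""
    first = min(event for event, _, _, _ in rows)
    last = max(event for event, _, _, _ in rows)
    totals = {}
    day = first
    while day <= last:  # every calendar day, like generate_date_array
        start = day - timedelta(days=window_days - 1)
        totals[day] = sum(
            units
            for event, created, _, units in rows
            if start <= event <= day and start <= created <= day
        )
        day += timedelta(days=1)
    return totals


totals = rolling_units(rows)
for day, units in totals.items():
    print(day, units)
```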

Count consecutive recurring values

I am struggling to find any info on this on the internet after a couple of hours of searching, trial, error and failure. We have the following table structure:
Name | EventDateTime | Mark
:--- | :------------------ | :------
Dave | 2021-03-24 09:00:00 | Present
Dave | 2021-03-24 14:00:00 | Absent
Dave | 2021-03-25 09:00:00 | Absent
Dave | 2021-03-26 09:00:00 | Absent
Dave | 2021-03-27 09:00:00 | Present
Dave | 2021-03-27 14:00:00 | Absent
Dave | 2021-03-28 09:00:00 | Absent
Dave | 2021-03-29 10:00:00 | Absent
Dave | 2021-03-30 13:00:00 | Absent
Jane | 2021-03-30 13:00:00 | Absent
These are basically attendance registers for people at events. We need to pull a report to see who we have not had contact from for more than x consecutive days. Consecutive here means consecutive days on which they have events in the data, not consecutive calendar days. Also, if there is a Present on a day where they were also Absent, the count needs to start again from the next day they were absent.
The first issue I've got is getting the distinct dates on which there are only absences; the second is getting the number of consecutive days of absence. I've done the second in MySQL with variables but struggled to migrate it to PostgreSQL, where the reporting is done.
An example of the output I'd want is:
Name | EventDateTime | Mark | ConsecCount
:--- | :------------------ | :------ | ----------:
Dave | 2021-03-24 09:00:00 | Present | 0
Dave | 2021-03-24 14:00:00 | Absent | 0
Dave | 2021-03-25 09:00:00 | Absent | 1
Dave | 2021-03-26 09:00:00 | Absent | 2
Dave | 2021-03-27 09:00:00 | Present | 0
Dave | 2021-03-27 14:00:00 | Absent | 0
Dave | 2021-03-28 09:00:00 | Absent | 1
Dave | 2021-03-29 10:00:00 | Absent | 2
Dave | 2021-03-30 13:00:00 | Absent | 3
Jane | 2021-03-30 13:00:00 | Absent | 0
This table is currently at 639931 records and they have been generated since 1st October and will continue to grow at this rate.
Any help, or advice on where to start, would be great.
This can be achieved using window functions as follows:
WITH with_row_numbers AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) AS this_row_number,
    (CASE WHEN Mark = 'Present' THEN ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) ELSE 0 END) AS row_number_if_present
  FROM events
)
SELECT
  Name,
  EventDateTime,
  Mark,
  -- rows since the most recent Present, minus one, floored at zero
  GREATEST(0, this_row_number - MAX(row_number_if_present) OVER (PARTITION BY Name ORDER BY EventDateTime) - 1) AS ConsecCount
FROM with_row_numbers
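The arithmetic behind the query (current row number minus the row number of the latest Present, minus one, floored at zero) can be traced with a short plain-Python sketch on the sample data. The function name consecutive_absences is an assumption for illustration, and the sketch relies on the timestamps being ISO-formatted strings so that sorting them is chronological.

```python
from collections import defaultdict

# Sample rows from the question: (name, event_datetime, mark)
events = [
    ("Dave", "2021-03-24 09:00:00", "Present"),
    ("Dave", "2021-03-24 14:00:00", "Absent"),
    ("Dave", "2021-03-25 09:00:00", "Absent"),
    ("Dave", "2021-03-26 09:00:00", "Absent"),
    ("Dave", "2021-03-27 09:00:00", "Present"),
    ("Dave", "2021-03-27 14:00:00", "Absent"),
    ("Dave", "2021-03-28 09:00:00", "Absent"),
    ("Dave", "2021-03-29 10:00:00", "Absent"),
    ("Dave", "2021-03-30 13:00:00", "Absent"),
    ("Jane", "2021-03-30 13:00:00", "Absent"),
]


def consecutive_absences(events):
    """Append the count of rows since the latest Present, minus one, floored at 0."""
    row_number = defaultdict(int)    # ROW_NUMBER() per name
    last_present = defaultdict(int)  # running MAX(row_number_if_present) per name
    result = []
    for name, when, mark in sorted(events):  # by name, then by timestamp
        row_number[name] += 1
        if mark == "Present":
            last_present[name] = row_number[name]
        count = max(0, row_number[name] - last_present[name] - 1)
        result.append((name, when, mark, count))
    return result


counted = consecutive_absences(events)
for row in counted:
    print(*row)
```

The subtraction of one is what makes the first Absent after a Present count as 0, matching the ConsecCount column in the question.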
Original answer with LATERAL join:
WITH with_row_numbers AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime)
  FROM events e
)
SELECT
  t1.Name,
  t1.EventDateTime,
  t1.Mark,
  GREATEST(0, t1.ROW_NUMBER - COALESCE(sub.prev_present_row_number, 0) - 1) AS ConsecCount
FROM with_row_numbers AS t1
CROSS JOIN LATERAL (
  SELECT MAX(row_number) AS prev_present_row_number
  FROM with_row_numbers t2
  WHERE t2.Name = t1.Name
    AND t2.EventDateTime <= t1.EventDateTime
    AND t2.Mark = 'Present'
) sub

SQL : GROUP and MAX multiple columns

I am a SQL beginner. Can anyone please help me with a SQL query?
My table looks like this:
PatientID Date Time Temperature
1 1/10/2020 9:15 36.2
1 1/10/2020 20:00 36.5
1 2/10/2020 8:15 36.1
1 2/10/2020 18:20 36.3
2 1/10/2020 9:15 36.7
2 1/10/2020 20:00 37.5
2 2/10/2020 8:15 37.1
2 2/10/2020 18:20 37.6
3 1/10/2020 8:15 36.2
3 2/10/2020 18:20 36.3
How can I get each patient's maximum temperature for each day?
PatientID Date Temperature
1 1/10/2020 36.5
1 2/10/2020 36.3
2 1/10/2020 37.5
2 2/10/2020 37.6
Thanks in advance!
For this dataset, simple aggregation seems sufficient:
select patientid, date, max(temperature) temperature
from mytable
group by patientid, date
On the other hand, if there are other columns that you want to display on the row that has the maximum daily temperature, then it is different. You need some filtering; one option uses window functions (note that rank() keeps every row tied for the maximum):
select *
from (
    select t.*,
        rank() over(partition by patientid, date order by temperature desc) as rn
    from mytable t
) t
where rn = 1
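For comparison, here is a plain-Python sketch of what the rank-based filter does: it keeps the whole row (including the time) that holds each patient's daily maximum, and keeps all rows when several tie. The function name daily_max_rows is an assumption for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (patient_id, date, time, temperature)
readings = [
    (1, "1/10/2020", "9:15", 36.2),
    (1, "1/10/2020", "20:00", 36.5),
    (1, "2/10/2020", "8:15", 36.1),
    (1, "2/10/2020", "18:20", 36.3),
    (2, "1/10/2020", "9:15", 36.7),
    (2, "1/10/2020", "20:00", 37.5),
    (2, "2/10/2020", "8:15", 37.1),
    (2, "2/10/2020", "18:20", 37.6),
    (3, "1/10/2020", "8:15", 36.2),
    (3, "2/10/2020", "18:20", 36.3),
]


def daily_max_rows(readings):
    """Keep the full row(s) with the highest temperature per (patient, date)."""
    best = defaultdict(list)
    for row in readings:
        key = (row[0], row[1])  # partition by patientid, date
        top = best[key]
        if not top or row[3] > top[0][3]:
            best[key] = [row]   # new maximum replaces earlier rows
        elif row[3] == top[0][3]:
            top.append(row)     # ties are all kept, as rank() = 1 would
    return [row for rows in best.values() for row in rows]


max_rows = daily_max_rows(readings)
for row in max_rows:
    print(*row)
```

Using row_number() instead of rank() in the SQL would return exactly one row per patient and day even when temperatures tie.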

How to link two tables but only take the MAX value from one table in PostgreSQL?

I have two tables
exchange_rates
TIMESTAMP curr1 curr2 rate
2018-04-01 00:00:00 EUR GBP 0.89
2018-04-01 01:30:00 EUR GBP 0.92
2018-04-01 01:20:00 USD GBP 1.23
and
transactions
TIMESTAMP user curr amount
2018-04-01 18:00:00 1 EUR 23.12
2018-04-01 14:00:00 1 USD 15.00
2018-04-01 01:00:00 2 EUR 55.00
I want to link these two tables on 1. currency and 2. TIMESTAMP in the following way:
curr in transactions must be equal to curr1 in exchange_rates
TIMESTAMP in exchange_rates must be less than or equal to TIMESTAMP in transactions (so we only pick up the exchange rate that was relevant at the time of transaction)
I have this:
SELECT
trans.TIMESTAMP, trans.user,
-- Multiply the amount in transactions by the corresponding rate in exchange_rates
trans.amount * er.rate AS "Converted Amount"
FROM transactions trans, exchange_rates er
WHERE trans.curr = er.curr1
AND er.TIMESTAMP <= trans.TIMESTAMP
ORDER BY trans.user
but this matches too many rows: the output has more rows than there are in transactions.
DESIRED OUTPUT:
TIMESTAMP user Converted Amount
2018-04-01 18:00:00 1 21.27
2018-04-01 14:00:00 1 18.45
2018-04-01 01:00:00 2 48.95
The logic behind the Converted Amount:
row 1: user spent at 18:00 so take the rate that is less than or equal to the TIMESTAMP in exchange_rates i.e. 0.92 for EUR at 01:30
row 2: user spent at 14:00 so take the rate that is less than or equal to the TIMESTAMP in exchange_rates i.e. 1.23 for USD at 01:20
row 3: user spent at 01:00 so take the rate that is less than or equal to the TIMESTAMP in exchange_rates i.e. 0.89 for EUR at 00:00
How can I do this in postgresql 9.6?
You can use a LATERAL join (PostgreSQL's counterpart of CROSS APPLY in SQL Server) and limit the result to the first row that matches your conditions. The example names the TIMESTAMP and user columns dt and usr.
select t.dt, t.usr, t.amount * e.rate as conv_amount
from transactions t
join lateral (select *
              from exchange_rates er
              where t.curr = er.curr1
              and er.dt <= t.dt
              order by er.dt desc
              limit 1) e on true;
dt | usr | conv_amount
:------------------ | --: | ----------:
2018-04-01 18:00:00 | 1 | 21.2704
2018-04-01 14:00:00 | 1 | 18.4500
2018-04-01 01:00:00 | 2 | 48.9500
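The "latest rate at or before the transaction" rule that the lateral subquery implements can be sketched in plain Python as follows. The function name convert is an assumption for illustration; Decimal is used so the products match the figures above exactly, and ISO-formatted timestamp strings are compared directly because they sort chronologically.

```python
from decimal import Decimal

# Sample data from the question
# exchange_rates: (timestamp, curr1, curr2, rate)
rates = [
    ("2018-04-01 00:00:00", "EUR", "GBP", Decimal("0.89")),
    ("2018-04-01 01:30:00", "EUR", "GBP", Decimal("0.92")),
    ("2018-04-01 01:20:00", "USD", "GBP", Decimal("1.23")),
]
# transactions: (timestamp, user, curr, amount)
transactions = [
    ("2018-04-01 18:00:00", 1, "EUR", Decimal("23.12")),
    ("2018-04-01 14:00:00", 1, "USD", Decimal("15.00")),
    ("2018-04-01 01:00:00", 2, "EUR", Decimal("55.00")),
]


def convert(transactions, rates):
    """Convert each amount with the most recent rate at or before the transaction."""
    converted = []
    for stamp, user, curr, amount in transactions:
        candidates = [
            (rate_stamp, rate)
            for rate_stamp, curr1, _, rate in rates
            if curr1 == curr and rate_stamp <= stamp
        ]
        if candidates:  # like the inner join, drop transactions with no rate
            _, rate = max(candidates)  # order by dt desc limit 1
            converted.append((stamp, user, amount * rate))
    return converted


converted = convert(transactions, rates)
for stamp, user, value in converted:
    print(stamp, user, value)
```

In the SQL, switching to left join lateral ... on true would keep transactions that have no earlier rate, with a NULL converted amount.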