Calculate time difference from previous record - sql

I have a set of data where I want to determine the difference in days between the Begin_Time and End_Time for every 2 records to determine the processing time. I'm familiar with DateDiff('d', End_Time, Begin_Time) to determine the processing time on the same row, but how do I determine this against the previous record? For example, something like DateDiff(Record2.Begin_Time, Record1.End_Time), then DateDiff(Record4.Begin_Time, Record3.End_Time), then DateDiff(Record6.Begin_Time, Record5.End_Time), etc. It doesn't have to use the DateDiff function; I'm just using that to illustrate my question. Thanks.
Record Begin_Time End_Time Processing_Time
1 11/23/2020 11/24/2020 1
2 11/23/2020 11/24/2020 1
3 11/30/2020 11/30/2020 0
4 11/30/2020 11/30/2020 0
5 11/2/2020 11/3/2020 1
6 11/2/2020 11/3/2020 1
7 11/3/2020 11/5/2020 2
8 11/3/2020 11/5/2020 2

An approach could be like this:
-- pair each even record with the odd record just before it
Select DateDiff(YourTableEven.Begin_time, YourTableOdd.End_Time)
From YourTable AS YourTableEven
Join YourTable AS YourTableOdd ON YourTableOdd.Record = YourTableEven.Record - 1
Where YourTableEven.Record % 2 = 0
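For anyone who wants to try the pairing idea without a database server, here is a minimal sketch that runs the same kind of self-join against an in-memory SQLite database from Python. The table name `records`, the ISO date strings, and `julianday` standing in for DateDiff are assumptions made for this sketch; the rows are the sample data from the question.

```python
import sqlite3

# Sample rows from the question: (record, begin_time, end_time).
# Dates are rewritten as ISO strings (an assumption) so SQLite can parse them.
ROWS = [
    (1, "2020-11-23", "2020-11-24"),
    (2, "2020-11-23", "2020-11-24"),
    (3, "2020-11-30", "2020-11-30"),
    (4, "2020-11-30", "2020-11-30"),
    (5, "2020-11-02", "2020-11-03"),
    (6, "2020-11-02", "2020-11-03"),
    (7, "2020-11-03", "2020-11-05"),
    (8, "2020-11-03", "2020-11-05"),
]


def paired_differences(rows):
    """Return (odd_record, even_record, days) for each consecutive odd/even pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (record INTEGER PRIMARY KEY, begin_time TEXT, end_time TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    query = """
        SELECT o.record, e.record,
               CAST(julianday(e.begin_time) - julianday(o.end_time) AS INTEGER)
        FROM records AS e
        JOIN records AS o ON o.record = e.record - 1  -- odd record just before the even one
        WHERE e.record % 2 = 0
        ORDER BY e.record
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


pairs = paired_differences(ROWS)
for odd, even, days in pairs:
    print(f"Record {even} begin - Record {odd} end = {days} day(s)")
```

With the sample rows the differences come out negative or zero, because each even record begins before the preceding odd record ends.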

Related

Creating 2 additional columns based on past dates - PostgresSQL

I'm seeking some help after spending a lot of time searching to no avail, so I decided to post here. I'm rather new to SQL, so any help is greatly appreciated. I've tried a few things, e.g. GROUP BY and BETWEEN, but can't seem to get it right.
On the PrestoSQL server, I have a table as shown below starting with columns Date, ID and COVID. Using GROUP BY ID, I would like to create a column EverCOVIDBefore which looks back at all past dates of the COVID column to see if there was ever COVID = 1 or not, as well as another column called COVID_last_2_mth which checks if there was ever COVID = 1 within the past 2 months.
(Highlighted columns are my expected outcomes)
Link to dataset: https://drive.google.com/file/d/1Sc5Olrx9g2A36WnLcCFMU0YTQ3-qWROU/view?usp=sharing
You can do:
select *,
max(covid) over(partition by id order by date) as ever_covid_before,
max(covid) over(partition by id order by date
range between interval '2 month' preceding and current row)
as covid_last_two_months
from t
Result:
date id covid ever_covid_before covid_last_two_months
----------- --- ------ ------------------ ---------------------
2020-01-15 1 0 0 0
2020-02-15 1 0 0 0
2020-03-15 1 1 1 1
2020-04-15 1 0 1 1
2020-05-15 1 0 1 1
2020-06-15 1 0 1 0
2020-01-15 2 0 0 0
2020-02-15 2 1 1 1
2020-03-15 2 0 1 1
2020-04-15 2 0 1 1
2020-05-15 2 0 1 0
2020-06-15 2 1 1 1
See running example at db<>fiddle.
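If you want to sanity-check the result table without a server that supports interval-based window frames, here is a rough sketch using Python's built-in sqlite3. It does not use the window functions from the answer; it swaps in correlated subqueries, which express the same "look back over earlier rows for this id" idea. The column name `obs_date` and the use of SQLite's `date(..., '-2 months')` modifier are assumptions of this sketch.

```python
import sqlite3

# Rows taken from the answer's result table: (obs_date, id, covid).
ROWS = [
    ("2020-01-15", 1, 0), ("2020-02-15", 1, 0), ("2020-03-15", 1, 1),
    ("2020-04-15", 1, 0), ("2020-05-15", 1, 0), ("2020-06-15", 1, 0),
    ("2020-01-15", 2, 0), ("2020-02-15", 2, 1), ("2020-03-15", 2, 0),
    ("2020-04-15", 2, 0), ("2020-05-15", 2, 0), ("2020-06-15", 2, 1),
]


def covid_flags(rows):
    """Return (obs_date, id, covid, ever_covid_before, covid_last_two_months)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (obs_date TEXT, id INTEGER, covid INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT cur.obs_date, cur.id, cur.covid,
               -- any covid = 1 on or before the current row's date
               (SELECT MAX(p.covid) FROM t AS p
                 WHERE p.id = cur.id AND p.obs_date <= cur.obs_date),
               -- any covid = 1 within the two months up to the current row's date
               (SELECT MAX(p.covid) FROM t AS p
                 WHERE p.id = cur.id
                   AND p.obs_date BETWEEN date(cur.obs_date, '-2 months')
                                      AND cur.obs_date)
        FROM t AS cur
        ORDER BY cur.id, cur.obs_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


flags = covid_flags(ROWS)
for row in flags:
    print(row)
```

The correlated subqueries are easier to port between engines but will usually be slower than a window function on a large table.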

Generating columns for daily stats in SQL

I have a table that currently looks like this (simplified to illustrate my issue):
Thing| Date
1 2022-12-12
2 2022-11-05
3 2022-11-18
4 2022-12-01
1 2022-11-02
2 2022-11-21
5 2022-12-03
5 2022-12-08
2 2022-11-18
1 2022-11-20
I would like to generate the following:
Thing| 2022-11 | 2022-12
1 2 1
2 3 0
3 1 0
4 0 1
5 0 2
I'm new to SQL and can't quite figure this out - would I use some sort of FOR loop equivalent in my SELECT clause? I'm happy to figure out the exact syntax myself, I just need someone to point me in the right direction.
Thank you!
You may use conditional aggregation as the following:
Select Thing,
Count(Case When Date Between '2022-11-01' And '2022-11-30' Then 1 End) As '2022-11',
Count(Case When Date Between '2022-12-01' And '2022-12-31' Then 1 End) As '2022-12'
From table_name
Group By Thing
Order By Thing
See a demo.
The count function counts only non-null values. For each row that does not match the condition, the Case expression returns null, so that row is not counted.
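On the "FOR loop equivalent" part of the question: SQL itself has no loop in the SELECT list, but the repeated Count(Case ...) columns can be generated by whatever code builds the query. Here is a small sketch of that, assuming Python with the built-in sqlite3 module, a hypothetical table `things(thing, day)`, and SQLite's `strftime` to extract the month. The month labels are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

# Sample rows from the question: (thing, day).
ROWS = [
    (1, "2022-12-12"), (2, "2022-11-05"), (3, "2022-11-18"), (4, "2022-12-01"),
    (1, "2022-11-02"), (2, "2022-11-21"), (5, "2022-12-03"), (5, "2022-12-08"),
    (2, "2022-11-18"), (1, "2022-11-20"),
]


def monthly_counts(rows, months):
    """Return {thing: {month: count}} using one COUNT(CASE ...) column per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE things (thing INTEGER, day TEXT)")
    conn.executemany("INSERT INTO things VALUES (?, ?)", rows)
    # One conditional-aggregation column per requested month.
    count_cols = ", ".join(
        "COUNT(CASE WHEN strftime('%Y-%m', day) = ? THEN 1 END)" for _ in months
    )
    query = f"SELECT thing, {count_cols} FROM things GROUP BY thing ORDER BY thing"
    result = {}
    for thing, *counts in conn.execute(query, list(months)):
        result[thing] = dict(zip(months, counts))
    conn.close()
    return result


table = monthly_counts(ROWS, ["2022-11", "2022-12"])
for thing, per_month in table.items():
    print(thing, per_month)
```

Rows that fall outside a given month make the CASE return NULL, so they add nothing to that month's count, which is why things 2 and 3 show 0 for 2022-12.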

assign value to a group pandas based on a day difference between them

I have a dataframe with ID and date (and the calculated day difference between the rows for the same ID)
ID date day_difference
1 27/06/2019 0
1 28/06/2019 1
1 29/06/2019 1
1 01/07/2019 2
1 02/07/2019 1
1 03/07/2019 1
1 05/07/2019 2
2 27/06/2019 0
2 28/06/2019 1
2 29/06/2019 1
2 01/08/2019 33
2 02/08/2019 1
2 03/08/2019 1
2 04/08/2019 1
which I would like to group by ID and calculate the total duration, with one condition: if the day difference is bigger than 30 days, reuse that ID and start a new group, counting the duration from the first day after the 30-day gap.
Desired result
ID Duration
1 8
2 3
2 4
Thanks.
You can do:
(df.groupby(['ID', df.day_difference.gt(30).cumsum()])
.agg(ID=('ID','first'), Duration=('ID','count'))
.reset_index(drop=True)
)
Output:
ID Duration
0 1 7
1 2 3
2 2 4
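To make the grouping key in that snippet easier to follow, here is the same idea written out in plain Python, without pandas. It is only a sketch: a running counter goes up by one every time a gap larger than 30 days is seen, and rows are then counted per (ID, counter) pair. The function name `durations` and the `max_gap` parameter are made up for illustration. Like the answer, it counts rows per block, so ID 1 comes out as 7 rather than the 8 in the desired result.

```python
from collections import Counter

# (ID, day_difference) pairs from the question, in the original row order.
ROWS = [
    (1, 0), (1, 1), (1, 1), (1, 2), (1, 1), (1, 1), (1, 2),
    (2, 0), (2, 1), (2, 1), (2, 33), (2, 1), (2, 1), (2, 1),
]


def durations(rows, max_gap=30):
    """Count rows per (ID, block), starting a new block after each gap > max_gap."""
    block = 0  # plays the role of day_difference.gt(30).cumsum()
    counts = Counter()  # keeps insertion order, so output follows the input order
    for id_, diff in rows:
        if diff > max_gap:
            block += 1  # a long gap opens a new block for the rows that follow
        counts[(id_, block)] += 1
    return [(id_, n) for (id_, _block), n in counts.items()]


result = durations(ROWS)
for id_, duration in result:
    print(id_, duration)
```

The counter is global rather than per ID, exactly as in the cumsum version; that is fine because the ID is part of the grouping key as well.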

Creating min and max values and comparing them to timestamp values sql

I have a PostgreSQL database and I have a table that I am looking to query to determine which presses have been updated between the first cycle created_timestamp and the most recent cycle created_timestamp. Here is an example of the table, which is called event_log_summary.
press_id cycle_number created_timestamp
1 1 2020-02-07 16:07:52
1 2 2020-02-07 16:07:53
1 3 2020-02-07 16:07:54
1 4 2020-04-01 13:23:10
2 1 2020-01-13 8:33:23
2 2 2020-01-13 8:33:24
2 3 2020-01-13 8:33:25
3 1 2020-02-21 18:45:44
3 2 2020-02-21 18:45:45
3 3 2020-02-26 14:22:12
This is the query I used to get a three-column output of press_id, minCycle, and maxCycle. I then want to compare the maxCycle created_timestamp to the minCycle created_timestamp and see whether there is at least x amount of time between the two, say at least 1 day, but I am unsure how to implement that.
SELECT
press_id,
MIN(cycle_number) AS minCycle,
MAX(cycle_number) AS maxCycle
FROM
event_log_detail
GROUP BY
press_id
I have tried different things like using WHERE (MAX(cycle_number) - MIN(cycle_number)) > 1, but I am pretty new to SQL and don't quite know how to implement this. The output I am looking for, the presses with a difference of at least one day, would be the following:
press_id
1
3
Presses 1 and 3 have a maximum cycle created_timestamp that is at least 1 day later than their minimum cycle created_timestamp. I am just looking for the press_ids whose first cycle and last cycle are at least 1 day apart. I don't need any other information in the output, just one column with the press_ids. Any help would be appreciated. Thanks.
You can use a HAVING clause:
select press_id,
max(created_timestamp) - min(created_timestamp) as diff
from event_log_detail
group by press_id
having max(created_timestamp) >= min(created_timestamp) + interval '1 day';
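Here is a quick way to try the HAVING idea locally, sketched with Python's built-in sqlite3 instead of PostgreSQL. SQLite has no interval type, so this sketch assumes `julianday` differences (measured in days) as a stand-in, and it assumes zero-padded timestamps so that MIN and MAX on the text values sort chronologically. The `min_days` parameter is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (press_id, cycle_number, created_timestamp).
# Hours are zero-padded here (an assumption) so the strings sort chronologically.
ROWS = [
    (1, 1, "2020-02-07 16:07:52"), (1, 2, "2020-02-07 16:07:53"),
    (1, 3, "2020-02-07 16:07:54"), (1, 4, "2020-04-01 13:23:10"),
    (2, 1, "2020-01-13 08:33:23"), (2, 2, "2020-01-13 08:33:24"),
    (2, 3, "2020-01-13 08:33:25"),
    (3, 1, "2020-02-21 18:45:44"), (3, 2, "2020-02-21 18:45:45"),
    (3, 3, "2020-02-26 14:22:12"),
]


def presses_with_gap(rows, min_days=1):
    """Return press_ids whose first and last timestamps are at least min_days apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log_detail "
        "(press_id INTEGER, cycle_number INTEGER, created_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO event_log_detail VALUES (?, ?, ?)", rows)
    query = """
        SELECT press_id
        FROM event_log_detail
        GROUP BY press_id
        HAVING julianday(MAX(created_timestamp))
             - julianday(MIN(created_timestamp)) >= ?
        ORDER BY press_id
    """
    result = [press_id for (press_id,) in conn.execute(query, (min_days,))]
    conn.close()
    return result


presses = presses_with_gap(ROWS)
print(presses)
```

The filter has to live in HAVING rather than WHERE because it depends on the aggregates, which only exist after the rows have been grouped.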

SQL - Retrieve Closest Lower Value

When no row matches a yearsOfService value exactly, I would like to retrieve the pay for the closest lower yearsOfService.
For instance, 10 yearsOfService should return the value 650.00 and 14 yearsOfService should return the value 840.00 from the incentive table below:
ID Pay yearsOfService
1 125.00 0
2 156.00 2
3 188.00 3
4 206.00 4
5 650.00 6
6 840.00 14
7 585.00 22
8 495.00 23
9 385.00 24
10 250.00 25
I have tried several different approaches; including:
SELECT TOP 1 (pay) as incentivePay
FROM incentive
WHERE yearsOfService = '10'
This works, but only for yearsOfService values that match exactly.
With 10 yearsOfService:
RESULTSET = [1 650.00]
Any ideas?
Please try:
SELECT TOP 1 (pay) as incentivePay
FROM incentive
WHERE yearsOfService <= '10'
ORDER BY yearsOfService desc
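For a quick check of the "filter to lower-or-equal, sort descending, take the first row" pattern, here is a sketch using Python's built-in sqlite3. SQLite has no TOP, so this sketch assumes LIMIT 1 as the equivalent, and it treats yearsOfService as an integer column. The helper names `make_db` and `closest_lower_pay` are made up for illustration.

```python
import sqlite3

# Rows from the question's incentive table: (id, pay, yearsOfService).
ROWS = [
    (1, 125.00, 0), (2, 156.00, 2), (3, 188.00, 3), (4, 206.00, 4),
    (5, 650.00, 6), (6, 840.00, 14), (7, 585.00, 22), (8, 495.00, 23),
    (9, 385.00, 24), (10, 250.00, 25),
]


def make_db(rows):
    """Create an in-memory incentive table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE incentive (id INTEGER, pay REAL, yearsOfService INTEGER)")
    conn.executemany("INSERT INTO incentive VALUES (?, ?, ?)", rows)
    return conn


def closest_lower_pay(conn, years):
    """Return the pay for the highest yearsOfService that is <= years, or None."""
    row = conn.execute(
        """
        SELECT pay
        FROM incentive
        WHERE yearsOfService <= ?
        ORDER BY yearsOfService DESC
        LIMIT 1
        """,
        (years,),
    ).fetchone()
    return row[0] if row else None


conn = make_db(ROWS)
for years in (10, 14, 30):
    print(years, closest_lower_pay(conn, years))
```

An exact match is still returned when one exists, because the filter is lower-or-equal and the exact row then sorts first.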