Calculate overlap time in seconds for groups in SQL

I have a bunch of timestamps grouped by ID and type in the sample data shown below.
I would like to find the overlapping time between the start_time and end_time columns, in seconds, for each ID and for each lead/follower combination. I would like to show the overlap time only on the first record of each group, which will always be the "lead" type.
For example, for ID 1, the follower's start and end times in row 3 overlap with the lead's in row 1 for 193 seconds (from 09:00:00 to 09:03:13). The follower's times in row 3 also overlap with the lead's in row 2 for 133 seconds (from 09:01:00 to 09:03:13). That's a total of 326 seconds (193 + 133).
I used the partition clause to rank rows by ID and type and order them by start_time as a start.
How do I get the overlap column?
row#  ID  type      start_time           end_time             rank  overlap
1     1   lead      2020-05-07 09:00:00  2020-05-07 09:03:34  1     326
2     1   lead      2020-05-07 09:01:00  2020-05-07 09:03:13  2
3     1   follower  2020-05-07 08:59:00  2020-05-07 09:03:13  1
4     2   lead      2020-05-07 11:23:00  2020-05-07 11:33:00  1     540
5     2   follower  2020-05-07 11:27:00  2020-05-07 11:32:00  1
6     3   lead      2020-05-07 14:45:00  2020-05-07 15:00:00  1     305
7     3   follower  2020-05-07 14:44:00  2020-05-07 14:44:45  1
8     3   follower  2020-05-07 14:50:00  2020-05-07 14:55:05  2

In your example, the times completely cover the total duration. If this is always true, you can use the following logic:
select id,
       sum(datediff(second, start_time, end_time)) -
           datediff(second, min(start_time), max(end_time)) as overlap
from t
group by id;
To add this as an additional column, then either use window functions or join in the result from the above query.
If the overall time has gaps, then the problem is quite a bit more complicated. I would suggest that you ask a new question and set up a db fiddle for the problem.
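The "sum of durations minus total span" arithmetic can be sanity-checked against the ID 1 rows. A minimal sketch using Python's sqlite3 (the table name t and the sample values come from the question; SQLite has no DATEDIFF, so strftime('%s', ...) converts timestamps to epoch seconds instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, type TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "lead",     "2020-05-07 09:00:00", "2020-05-07 09:03:34"),
    (1, "lead",     "2020-05-07 09:01:00", "2020-05-07 09:03:13"),
    (1, "follower", "2020-05-07 08:59:00", "2020-05-07 09:03:13"),
])

# Sum of each row's duration minus the overall span = total overlap.
row = conn.execute("""
    SELECT id,
           SUM(strftime('%s', end_time) - strftime('%s', start_time))
           - (strftime('%s', MAX(end_time)) - strftime('%s', MIN(start_time))) AS overlap
    FROM t
    GROUP BY id
""").fetchone()
print(row)  # (1, 326)
```

The individual durations are 214 + 133 + 253 = 600 seconds, the span from 08:59:00 to 09:03:34 is 274 seconds, and 600 - 274 = 326, matching the expected overlap.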

Tried this a couple of ways and got it to work.
I first joined two copies of the table, one per type ('lead' and 'follower'), and used CASE expressions to take the later of the two start times and the earlier of the two end times for each lead/follower combination. I stored the result in a temp table.
CASE
    WHEN lead_table.start_time > follower_table.start_time THEN lead_table.start_time
    WHEN lead_table.start_time < follower_table.start_time THEN follower_table.start_time
    ELSE lead_table.start_time
END AS overlap_start_time,
CASE
    WHEN follower_table.end_time < lead_table.end_time THEN follower_table.end_time
    WHEN follower_table.end_time > lead_table.end_time THEN lead_table.end_time
    ELSE lead_table.end_time
END AS overlap_end_time
Then I created an outer query over the temp table to find the difference between the overlap start and end times, in seconds, for each lead and follower combination:
SELECT temp_table.id,
       temp_table.overlap_start_time,
       temp_table.overlap_end_time,
       DATEDIFF_BIG(second,
                    temp_table.overlap_start_time,
                    temp_table.overlap_end_time) AS overlap_time
FROM temp_table
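The clip-then-diff approach amounts to taking GREATEST of the starts and LEAST of the ends per pair. A sketch of the whole join with SQLite (the table name events is an assumption), where the two-argument scalar MAX/MIN functions play the role of GREATEST/LEAST and negative (non-)overlaps are clipped to zero:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, type TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "lead",     "2020-05-07 09:00:00", "2020-05-07 09:03:34"),
    (1, "lead",     "2020-05-07 09:01:00", "2020-05-07 09:03:13"),
    (1, "follower", "2020-05-07 08:59:00", "2020-05-07 09:03:13"),
])

# Per lead/follower pair: overlap = max(0, least(ends) - greatest(starts)),
# summed per id. MIN/MAX with two arguments are scalar in SQLite.
total = conn.execute("""
    SELECT l.id,
           SUM(MAX(0, strftime('%s', MIN(l.end_time, f.end_time))
                    - strftime('%s', MAX(l.start_time, f.start_time)))) AS overlap
    FROM events l
    JOIN events f ON f.id = l.id AND f.type = 'follower'
    WHERE l.type = 'lead'
    GROUP BY l.id
""").fetchone()
print(total)  # (1, 326)
```

This reproduces the 193 + 133 = 326 seconds worked out in the question, and unlike the total-span trick it also handles gaps, since each pair is clipped individually.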

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month-end date. The SQL below is the query used to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#;
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
    Format([CERT_EXP_DTE], "yyyy-mm") AS YearMonth,
    Count(*) AS AllInspectors,
    Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
    dbo_Insp_Type
GROUP BY
    Format([CERT_EXP_DTE], "yyyy-mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
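The Sum(Abs(...)) idiom in the query above is worth calling out: Access stores True as -1, so Abs() turns the comparison into a 0/1 counter inside a GROUP BY. A sketch of the same counting trick with SQLite (where a true comparison is already 1, so a plain SUM works), using a few of the sample dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insp (id INTEGER, cert_exp_dte TEXT)")
conn.executemany("INSERT INTO insp VALUES (?, ?)", [
    (10, "2021-11-05"), (11, "2021-12-01"), (12, "2021-12-24"),
    (1, "2022-01-15"), (2, "2022-01-23"),
])

# "Valid" mirrors the DateSerial(..., 2) test: expiration on or after
# the 2nd of its own month. Summing the boolean counts matching rows.
rows = conn.execute("""
    SELECT strftime('%Y-%m', cert_exp_dte) AS year_month,
           COUNT(*) AS all_inspectors,
           SUM(cert_exp_dte >= date(cert_exp_dte, 'start of month', '+1 day')) AS valid_inspectors
    FROM insp
    GROUP BY year_month
    ORDER BY year_month
""").fetchall()
print(rows)  # [('2021-11', 1, 1), ('2021-12', 2, 1), ('2022-01', 2, 2)]
```

The 2021-12 group shows the effect: 2021-12-01 fails the day >= 2 test while 2021-12-24 passes, giving 2 total but 1 valid, matching the result table.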
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of 50 months, but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count(), referencing a textbox on a form for the start date:
SELECT Count(IIf(CERT_EXP_DTE >= Forms!formname!tbxDate, 1, Null)) AS Dt1,
       Count(IIf(CERT_EXP_DTE >= DateAdd("m", 1, Forms!formname!tbxDate), 1, Null)) AS Dt2,
       ...
FROM dbo_Insp_Type
Using the above data, the following is the output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in the criteria and it did not make a difference for this sample data.
Dt1  Dt2
10   8
Or a report with 60 textboxes, each calling a DCount() expression with the same criteria as used in the query.
Or a VBA procedure that writes data to a 'temp' table.

Query repeating event in a given time range

I have a set of events (time in MM/DD) that can be repeated by different users:
EventId  Event  Time                User
1        Start  06/01/2012 10:05AM  1
1        End    06/05/2012 10:45AM  1
2        Start  07/07/2012 09:55AM  2
2        End    09/07/2012 11:05AM  2
3        Start  09/01/2012 11:05AM  2
3        End    09/03/2012 11:05AM  2
I want to get, using SQL, the events a user has done in a specified time range. For instance, given 06/06/2012 and 09/02/2012 I am expecting to get:
EventId  Event  Time                User
2        Start  07/07/2012 09:55AM  2
2        End    09/07/2012 11:05AM  2
3        Start  09/01/2012 11:05AM  2
Any idea on how to deal with this?
A basic range query should work here:
SELECT *
FROM yourTable
WHERE Time >= '2012-06-06'::date AND Time < '2012-09-03'::date;
This assumes you want records falling on June 6, 2012 to September 2 of the same year.
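A sketch of that half-open filter with SQLite (table and column names are assumptions; times are stored as ISO strings so plain string comparison orders correctly). Note that this filters individual rows by their own timestamp, so the End row of event 2 (09/07) falls outside the range and is excluded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, event TEXT, time TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "Start", "2012-06-01 10:05", 1),
    (1, "End",   "2012-06-05 10:45", 1),
    (2, "Start", "2012-07-07 09:55", 2),
    (2, "End",   "2012-09-07 11:05", 2),
    (3, "Start", "2012-09-01 11:05", 2),
    (3, "End",   "2012-09-03 11:05", 2),
])

# Half-open range: >= start of June 6, < start of September 3,
# so all of September 2 is included.
rows = conn.execute("""
    SELECT event_id, event FROM events
    WHERE time >= '2012-06-06' AND time < '2012-09-03'
    ORDER BY event_id
""").fetchall()
print(rows)  # [(2, 'Start'), (3, 'Start')]
```

If whole events that overlap the range are wanted (as the question's expected output suggests for event 2), the Start and End rows would need to be paired up first, e.g. with a self-join on EventId, before testing overlap.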

How to calculate price change criteria over a rolling timeframe for market data stored in a PostgreSQL database

I have a PostgreSQL 10 database with a table named Period that stores historical price data for several cryptocurrencies. I'm trying to query this data to find where the price has increased by a certain percentage over a rolling timeframe, e.g. 5% within one day. The relevant table columns are "product_id", "end_time" (seconds since the Unix epoch), and "close" (the closing price of the period).
Update for clarification:
I want to query every period in the database sorted chronologically by end_time
I want the results to have additional calculated columns (I called them buy_signal and sell_signal in my example; they correspond to the min_period/max_period below) based on the following logic:
Find the period with the max close price, max_period, that occurs within one day of the current period (1440 minutes; the table has a row for each minute)
Find the period with the min close price, min_period, that occurs within one day of the current period AND where min_period.end_time < max_period.end_time
If the difference between the closing prices of min_period and max_period is > 5%, then the buy_signal column of the row containing min_period is true and the sell_signal column of the max_period row is true
Most of this is pretty straightforward, but I'm stuck on the condition emphasized in the second step. It seems like I need to actually get a reference/alias to max_period in order to add that criterion to the min_period query.
Given the following simplified data:
id product_id start_time end_time close
0 BTC-USD 100000 100060 100
1 BTC-USD 100060 100120 99
2 BTC-USD 100120 100180 101
3 BTC-USD 100180 100240 105
4 BTC-USD 100240 100300 104
5 BTC-USD 100300 100360 102
6 BTC-USD 100360 100420 100
7 BTC-USD 100420 100480 98
I'd like to be able to query these periods with additional calculated columns to indicate when a local minimum/maximum has occurred and when the difference between the min and max is greater than the threshold. For example:
id product_id start_time end_time close buy_signal sell_signal
0 BTC-USD 100000 100060 100
1 BTC-USD 100060 100120 99 true
2 BTC-USD 100120 100180 101
3 BTC-USD 100180 100240 105 true
4 BTC-USD 100240 100300 104
5 BTC-USD 100300 100360 102
6 BTC-USD 100360 100420 100
7 BTC-USD 100420 100480 98
Notice row with id 7 is a local min but it doesn't come before the local max.
The complexity of this query is far beyond my SQL experience and I've spent weeks looking through stack overflow answers and SQL features in my spare time but this ugly query is as far as I've been able to get:
SELECT
    end_time,
    close,
    MAX(close) OVER w AS max_close,
    MIN(close) OVER w AS min_close,
    (MAX(close) OVER w - MIN(close) OVER w) / MIN(close) OVER w AS change_percent,
    CASE WHEN (MAX(close) OVER w - MIN(close) OVER w) / MIN(close) OVER w > 0.05
         THEN true
    END AS buy_signal
FROM Period
WHERE product_id = 'BTC-USD'
WINDOW w AS (ORDER BY end_time ROWS BETWEEN CURRENT ROW AND 1439 FOLLOWING)
The obvious problem is that this doesn't differentiate between the order of the min and the max; it selects the lowest value in the window without regard to whether the min came chronologically before the max based on end_time. I've been struggling with how to combine that time filter with the windowing to find the lowest price that came before the highest price and piece everything together. Another (acceptable) limitation is that this only works for one product_id at a time, but ideally I'd like the query results to include multiple product_ids ordered by end_time, with the signals being product_id-specific.
My goal is to feed the paginated query results into a java application (I'm using Hibernate) that trains a model using the buy and sell signals. I could use the java application to pre-process the data but with millions of rows it would take hours just to execute and add more complexity than the query option. Apologies for the long question but I didn't want to leave anything out. Any help would be greatly appreciated.
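The min-before-max constraint is easy to state procedurally even though it is awkward in a single window expression. A hedged Python sketch of the intended signal logic on the sample closes from the question (one forward scan per row, a prototype of the semantics rather than an efficient query):

```python
closes = [100, 99, 101, 105, 104, 102, 100, 98]  # sample rows, ids 0..7
window = 1440  # one day of 1-minute periods

buy = [False] * len(closes)
sell = [False] * len(closes)
for i in range(len(closes)):
    w = closes[i:i + window]
    # Index of the max close within the forward window.
    max_off = max(range(len(w)), key=lambda j: w[j])
    if max_off == 0:
        continue  # no period strictly before the max to be a min
    # Min close restricted to periods strictly before the max.
    min_off = min(range(max_off), key=lambda j: w[j])
    if (w[max_off] - w[min_off]) / w[min_off] > 0.05:
        buy[i + min_off] = True
        sell[i + max_off] = True

print(buy, sell)  # buy at id 1 (close 99), sell at id 3 (close 105)
```

This marks exactly the rows shown in the expected output (buy on id 1, sell on id 3) and skips id 7, the local min that comes after the max. Translating this to SQL would likely mean a lateral join or a two-pass approach: first locate the max per window, then the min constrained to end_time < the max's end_time.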

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (see https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
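The LAG(...) OVER (PARTITION BY ... ORDER BY ...) pattern can be sketched with SQLite (3.25+ for window functions); "group" has to be quoted since, as noted, it is a SQL keyword, and strftime('%s', ...) stands in for DATEDIFF:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("group" INTEGER, id INTEGER, time TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, "16:00:00"), (1, 2, "16:02:00"), (1, 3, "16:03:00"),
    (2, 4, "16:09:00"), (2, 5, "16:10:00"), (2, 6, "16:14:00"),
])

# PARTITION BY "group" restarts LAG at each group boundary,
# so the first row of each group diffs against NULL.
rows = conn.execute("""
    SELECT "group", id,
           strftime('%s', time) - strftime('%s', LAG(time) OVER
               (PARTITION BY "group" ORDER BY id)) AS diff_seconds
    FROM t
    ORDER BY id
""").fetchall()
print(rows)
```

The result is [(1, 1, None), (1, 2, 120), (1, 3, 60), (2, 4, None), (2, 5, 60), (2, 6, 240)]: the difference resets to NULL at id 4, exactly the behavior the question asks for.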

Detect Intervals

id_person  transaction  internation_in  internation_out
1          456465       2015-01-01      2015-02-01
2          564564       2015-02-03      2015-04-02
3          4564654      2015-01-01      2015-01-05
4          4564646      2015-01-01      2015-02-04
4          4564656      2015-03-01      2015-04-15
4          87899465     2015-05-16      2015-05-25
5          56456456     2015-01-01      2015-01-08
5          45456546     2015-02-04      2015-03-04
I want to know how to get, grouped by id_person, the difference (interval in hours) between the internation_out of one transaction and the internation_in of the next transaction.
I tried lag and lead but I can't group by id_person.
I want this result, using id_person 4 as an example:
id_person  transaction  Gap
4          4564646      NULL
4          4564656      The result of (2015-03-01 - 2015-02-04)
4          87899465     The result of (2015-05-16 - 2015-04-15)
If your time periods are not overlapping (and yours are not), then there is a simple calculation for the total gap: it is the span from the earliest start to the latest end, minus the sum of the durations on each row. So, you don't need lead() or lag():
select id_person,
       (case when count(*) > 1
             then max(internation_out) - min(internation_in) -
                  sum(internation_out - internation_in)
        end) as gap_duration
from t
group by id_person;
Note that this returns NULL if there is only one row for the person. If you want 0, then you don't need the case.
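The span-minus-durations identity can be verified for id_person 4 with a few lines of Python (dates from the question; Python date arithmetic stands in for SQL date subtraction):

```python
from datetime import date

# (internation_in, internation_out) pairs for id_person 4
periods = [
    (date(2015, 1, 1),  date(2015, 2, 4)),
    (date(2015, 3, 1),  date(2015, 4, 15)),
    (date(2015, 5, 16), date(2015, 5, 25)),
]

# Span from earliest start to latest end, minus the covered days.
span = (max(e for _, e in periods) - min(s for s, _ in periods)).days
covered = sum((e - s).days for s, e in periods)
print(span - covered)  # 56
```

The 56 days break down into the two gaps the question asks about: 25 days (2015-02-04 to 2015-03-01) plus 31 days (2015-04-15 to 2015-05-16). If per-transaction gaps (or hours) are needed rather than a per-person total, lag() over a partition by id_person would still be the tool.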