Calculate Average between columns by comparing two rows in SQL Server - sql

I have the below table
BidID  AppID  AppStatus  StatusTime
1      1      In Review  2019-01-02 12:00:00
1      1      Approved   2019-01-02 13:00:00
1      2      In Review  2019-01-04 13:00:00
1      2      Approved   2019-01-04 14:00:00
2      2      In Review  2019-01-07 15:00:00
2      2      Approved   2019-01-07 17:00:00
3      1      In Review  2019-01-09 13:00:00
4      1      Approved   2019-01-09 13:00:00
What I am trying to do is calculate the average StatusTime difference, in minutes, using the following logic:
First group by BidID and then by AppID, and then calculate the difference between the StatusTime of the In Review row and the StatusTime of the Approved row.
For example: first group by BidID, then by AppID. Within each group, find the In Review status and the next Approved status, and calculate the time difference between the two dates.
BidID  AppID  Time difference (In Review to Approved)                             BidAverage
1      1      1 hour  (2019-01-02 15:48:42.000 - 2019-01-02 12:33:36.000)         1.5
1      2      2 hours (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000)
2      2      1 hour  (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000)         1
3      1      No calculation since there is no Approved
4      1      No calculation since there is no In Review before Approved
Final Average (1.5 + 1) / 2 = 1.25 for the table
I have already figured out the time difference excluding Saturday (Time Difference Excluding Weekend) using David's suggestion.
What I am not sure about is how to check that an AppStatus of In Review comes first and Approved comes after it, and only then calculate the time difference. If there is no Approved row, as for BidID 3, that pair should be left out of the average. The differences should then be averaged across the AppIDs and then across the BidIDs.
Thanks

I think you can just use min() and max() for simplicity to get the times for the bid/app pairs. The rest is just aggregation and more aggregation.
The processing you describe seems to be:
select avg(avg_bid_diff)
from (select BidID, avg(diff * 1.0) as avg_bid_diff
      from (select BidID, AppID,
                   -- seconds from the In Review row to the Approved row
                   datediff(second, min(StatusTime), max(StatusTime)) as diff
            from t
            where AppStatus in ('In Review', 'Approved')
            group by BidID, AppID
            having count(*) = 2  -- keep only pairs that have both statuses
           ) ba
      group by BidID
     ) b;
This makes assumptions that are consistent with the provided data -- that the statuses don't have duplicates for the bid/app pairs and that approval is always after review.
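If those assumptions do not hold, one option is to pair each In Review row explicitly with the earliest later Approved row. The following is only a minimal sketch of that idea, run against SQLite through Python's sqlite3 module so that it is self-contained. The table name t and the strftime('%s', ...) arithmetic are assumptions of the sketch; on SQL Server the difference would be taken with datediff() instead.

```python
import sqlite3

# Sample rows from the question: (BidID, AppID, AppStatus, StatusTime).
ROWS = [
    (1, 1, "In Review", "2019-01-02 12:00:00"),
    (1, 1, "Approved",  "2019-01-02 13:00:00"),
    (1, 2, "In Review", "2019-01-04 13:00:00"),
    (1, 2, "Approved",  "2019-01-04 14:00:00"),
    (2, 2, "In Review", "2019-01-07 15:00:00"),
    (2, 2, "Approved",  "2019-01-07 17:00:00"),
    (3, 1, "In Review", "2019-01-09 13:00:00"),
    (4, 1, "Approved",  "2019-01-09 13:00:00"),
]

# Join every In Review row (r) to the Approved rows (a) of the same bid/app
# that come strictly later, keep the earliest one, then average per bid and
# finally across bids. Pairs without a match drop out of the inner join.
QUERY = """
select avg(avg_bid_diff)
from (select BidID, avg(diff_min) as avg_bid_diff
      from (select r.BidID, r.AppID,
                   (cast(strftime('%s', min(a.StatusTime)) as integer)
                    - cast(strftime('%s', r.StatusTime) as integer)) / 60.0 as diff_min
            from t r
            join t a
              on a.BidID = r.BidID
             and a.AppID = r.AppID
             and a.AppStatus = 'Approved'
             and a.StatusTime > r.StatusTime
            where r.AppStatus = 'In Review'
            group by r.BidID, r.AppID, r.StatusTime
           ) ba
      group by BidID
     ) b
"""

def average_review_minutes(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (BidID int, AppID int, AppStatus text, StatusTime text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchone()[0]

print(average_review_minutes(ROWS))  # 90.0
```

With the sample table, bid 1 averages 60 minutes and bid 2 averages 120 minutes, so the overall figure is 90 minutes. Bids 3 and 4 contribute nothing because the join finds no In Review row followed by an Approved row.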

Related

SQL Query - Identifying entries between payment dates greater than 6 years

I have this table (in reality it has more fields but for simplicity, it will demonstrate what I'm after)
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2015-01-01    £1.00
Normal        1          2017-01-01    £2.00
Reversal      1          2022-01-09    £3.00
Normal        2          2016-12-29    £3.00
Reversal      2          2022-01-02    £4.00
I need 2 specific things from this:
I need all entries where there is over 6 years' difference between any given payment dates (when it has been greater than or equal to 6 years from the date of the latest payment). I don't need to count them; I just need the query to return all the entries that meet this criterion.
I also need it to flag cases where a normal payment hasn't been made for 6 years or more from today's date, but a reversal has occurred within the last 6 years. (This might need to be a separate query, but I'm open to suggestions.)
I'm using Data Lake (Hue).
Thank you.
I've tried to run a sub query with join and union but I'm not getting the desired results so will need to start from scratch. Any advice/insight on this is greatly appreciated.
Ideally, query one will show:
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2015-01-01    £1.00
Normal        1          2017-01-01    £2.00
Normal        2          2016-12-29    £3.00
Query 2 results should show:
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2017-01-01    £2.00
Reversal      1          2022-01-09    £3.00
Normal        2          2016-12-29    £3.00
Reversal      2          2022-01-02    £4.00

How to measure an average count from a set of days each with their own data points, in SQL/LookerML

I have the following table:
id | decided_at | reviewer
1 2020-08-10 13:00 john
2 2020-08-10 14:00 john
3 2020-08-10 16:00 john
4 2020-08-12 14:00 jane
5 2020-08-12 17:00 jane
6 2020-08-12 17:50 jane
7 2020-08-12 19:00 jane
What I would like to do is, for each day, get the difference between the earliest and the latest decided_at, together with the total count of ids from the earliest to the latest (both included). Currently, I'm only able to get this data for the past day.
Desired output:
Date | Time(h) | Count | reviewer
2020-08-10 3 3 john
2020-08-12 5 4 jane
From this, I would like to show the average over the past x number of days.
Example:
If today was the 13th, filter on the past 2 days (48 hours)
Output:
reviewer | reviews/hour
jane 5/4 = 1.25
Example 2:
If today was the 13th, filter on the past 3 days (72 hours)
reviewer | reviews/hour
john 3/3 = 1
jane 5/4 = 1.25
Ideally, this would be done in LookML without a derived table. Otherwise, a solution in SQL would be great and I can try to convert it to LookML.
Thanks!
In SQL, one solution is to use two levels of aggregation:
select reviewer, sum(cnt) / sum(diff_h) review_per_hour
from (
    select
        reviewer,
        date(decided_at) decided_date,
        count(*) cnt,
        -- alias must match the diff_h referenced by the outer query
        timestampdiff(hour, min(decided_at), max(decided_at)) diff_h
    from mytable
    where decided_at >= current_date - interval 2 day
    group by reviewer, date(decided_at)
) t
group by reviewer
The subquery filters on the date range, aggregates by reviewer and day, and computes the number of records and the difference between the minimum and the maximum date, as hours. Then, the outer query aggregates by reviewer and does the final computation.
The actual function to compute the date difference varies across databases; timestampdiff() is supported in MySQL - other engines all have alternatives.
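As a rough way to try the two-level aggregation on the sample rows, here is a sketch that runs the same shape of query on SQLite from Python. The reference date is passed in as a parameter instead of using current_date, and strftime('%s', ...) stands in for timestampdiff(); both are assumptions of the sketch, not part of the original query.

```python
import sqlite3

# Sample rows from the question: (id, decided_at, reviewer).
ROWS = [
    (1, "2020-08-10 13:00", "john"),
    (2, "2020-08-10 14:00", "john"),
    (3, "2020-08-10 16:00", "john"),
    (4, "2020-08-12 14:00", "jane"),
    (5, "2020-08-12 17:00", "jane"),
    (6, "2020-08-12 17:50", "jane"),
    (7, "2020-08-12 19:00", "jane"),
]

# Inner query: one row per reviewer and day with the review count and the
# hours between the first and last decision. Outer query: totals per reviewer.
QUERY = """
select reviewer, sum(cnt) * 1.0 / sum(diff_h) as reviews_per_hour
from (select reviewer,
             date(decided_at) as decided_date,
             count(*) as cnt,
             (cast(strftime('%s', max(decided_at)) as integer)
              - cast(strftime('%s', min(decided_at)) as integer)) / 3600.0 as diff_h
      from mytable
      where decided_at >= date(:today, :window)
      group by reviewer, date(decided_at)
     ) t
group by reviewer
order by reviewer
"""

def reviews_per_hour(today, days):
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (id int, decided_at text, reviewer text)")
    con.executemany("insert into mytable values (?, ?, ?)", ROWS)
    params = {"today": today, "window": f"-{days} days"}
    return con.execute(QUERY, params).fetchall()

print(reviews_per_hour("2020-08-13", 3))  # [('jane', 0.8), ('john', 1.0)]
print(reviews_per_hour("2020-08-13", 2))  # [('jane', 0.8)]
```

Note that jane has 4 reviews over 5 hours, so reviews per hour comes out as 0.8; the 5/4 = 1.25 figure in the question is hours per review.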

Calculate overlap time in seconds for groups in SQL

I have a bunch of timestamps grouped by ID and type in the sample data shown below.
I would like to find overlapped time between start_time and end_time columns in seconds for each group of ID and between each lead and follower combinations. I would like to show the overlap time only for the first record of each group which will always be the "lead" type.
For example, for ID 1, the follower's start and end times in row 3 overlap with the lead's in row 1 for 193 seconds (from 09:00:00 to 09:03:13). The follower's times in row 3 also overlap with the lead's in row 2 for 133 seconds (from 09:01:00 to 09:03:13). That's a total of 326 seconds (193 + 133).
I used the partition clause to rank rows by ID and type and order them by start_time as a start.
How do I get the overlap column?
row#  ID  type      start_time           end_time             rank  overlap
1     1   lead      2020-05-07 09:00:00  2020-05-07 09:03:34  1     326
2     1   lead      2020-05-07 09:01:00  2020-05-07 09:03:13  2
3     1   follower  2020-05-07 08:59:00  2020-05-07 09:03:13  1
4     2   lead      2020-05-07 11:23:00  2020-05-07 11:33:00  1     540
4     2   follower  2020-05-07 11:27:00  2020-05-07 11:32:00  1
5     3   lead      2020-05-07 14:45:00  2020-05-07 15:00:00  1     305
6     3   follower  2020-05-07 14:44:00  2020-05-07 14:44:45  1
7     3   follower  2020-05-07 14:50:00  2020-05-07 14:55:05  2
In your example, the times completely cover the total duration. If this is always true, you can use the following logic:
select id,
       (sum(datediff(second, start_time, end_time)) -
        datediff(second, min(start_time), max(end_time))
       ) as overlap
from t
group by id;
To add this as an additional column, then either use window functions or join in the result from the above query.
If the overall time has gaps, then the problem is quite a bit more complicated. I would suggest that you ask a new question and set up a db fiddle for the problem.
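To see how much the "no gaps" condition matters, the formula can be checked on the rows for IDs 1 and 3 from the question. This is a sketch on SQLite via Python, with strftime('%s', ...) used in place of datediff() as an assumption of the sketch.

```python
import sqlite3

# Rows for IDs 1 and 3 from the question: (id, type, start_time, end_time).
ROWS = [
    (1, "lead",     "2020-05-07 09:00:00", "2020-05-07 09:03:34"),
    (1, "lead",     "2020-05-07 09:01:00", "2020-05-07 09:03:13"),
    (1, "follower", "2020-05-07 08:59:00", "2020-05-07 09:03:13"),
    (3, "lead",     "2020-05-07 14:45:00", "2020-05-07 15:00:00"),
    (3, "follower", "2020-05-07 14:44:00", "2020-05-07 14:44:45"),
    (3, "follower", "2020-05-07 14:50:00", "2020-05-07 14:55:05"),
]

# Sum of the individual durations minus the overall span, per id.
QUERY = """
select id,
       sum(cast(strftime('%s', end_time) as integer)
           - cast(strftime('%s', start_time) as integer))
       - (cast(strftime('%s', max(end_time)) as integer)
          - cast(strftime('%s', min(start_time)) as integer)) as overlap
from t
group by id
order by id
"""

def overlap_by_formula(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id int, type text, start_time text, end_time text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

print(overlap_by_formula(ROWS))  # [(1, 326), (3, 290)]
```

ID 1 has no gaps and gives the expected 326 seconds. ID 3 has a 15-second gap between 14:44:45 and 14:45:00, so the formula returns 290 rather than the 305 in the question.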
I tried this a couple of ways and got it to work.
I first joined two tables holding the individual records for each type, 'lead' and 'follower'. For each lead and follower combination, a case expression picks the later of the two start times and another picks the earlier of the two end times. I stored the result in a temp table.
CASE
    WHEN lead_table.start_time > follower_table.start_time THEN lead_table.start_time
    WHEN lead_table.start_time < follower_table.start_time THEN follower_table.start_time
    ELSE lead_table.start_time  -- equal start times: either value works
END as overlap_start_time,
CASE
    WHEN follower_table.end_time < lead_table.end_time THEN follower_table.end_time
    WHEN follower_table.end_time > lead_table.end_time THEN lead_table.end_time
    ELSE lead_table.end_time  -- equal end times: either value works
END as overlap_end_time
I then wrote an outer query against that temp table to get the difference in seconds between the overlap start and end times for each lead and follower combination:
select temp_table.id,
       temp_table.overlap_start_time,
       temp_table.overlap_end_time,
       DATEDIFF_BIG(second,
                    temp_table.overlap_start_time,
                    temp_table.overlap_end_time) as overlap_time
FROM temp_table
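The pairwise approach can also be written as a single query. The sketch below runs it on SQLite from Python, using a self-join instead of separate lead and follower tables, SQLite's two-argument max()/min() in place of the case expressions, and a clamp at zero for pairs that do not overlap at all. All of these are choices made for the sketch, not taken from the answer.

```python
import sqlite3

# Sample rows from the question: (id, type, start_time, end_time).
ROWS = [
    (1, "lead",     "2020-05-07 09:00:00", "2020-05-07 09:03:34"),
    (1, "lead",     "2020-05-07 09:01:00", "2020-05-07 09:03:13"),
    (1, "follower", "2020-05-07 08:59:00", "2020-05-07 09:03:13"),
    (2, "lead",     "2020-05-07 11:23:00", "2020-05-07 11:33:00"),
    (2, "follower", "2020-05-07 11:27:00", "2020-05-07 11:32:00"),
    (3, "lead",     "2020-05-07 14:45:00", "2020-05-07 15:00:00"),
    (3, "follower", "2020-05-07 14:44:00", "2020-05-07 14:44:45"),
    (3, "follower", "2020-05-07 14:50:00", "2020-05-07 14:55:05"),
]

# For each lead/follower pair with the same id, the overlap runs from the
# later start to the earlier end; a negative value means no overlap.
QUERY = """
select l.id,
       sum(max(0,
               cast(strftime('%s', min(l.end_time, f.end_time)) as integer)
               - cast(strftime('%s', max(l.start_time, f.start_time)) as integer)
              )) as overlap
from t l
join t f
  on f.id = l.id
 and f.type = 'follower'
where l.type = 'lead'
group by l.id
order by l.id
"""

def pairwise_overlap(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id int, type text, start_time text, end_time text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

print(pairwise_overlap(ROWS))  # [(1, 326), (2, 300), (3, 305)]
```

This reproduces the 326 and 305 seconds from the question for IDs 1 and 3. For ID 2 it gives 300 seconds (11:27:00 to 11:32:00), which differs from the 540 listed in the question's table.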

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between consecutive rows within each group. Using LAG() and DATEDIFF() (see https://stackoverflow.com/a/43055820), I currently have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
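A small sketch of the partition by behaviour on the sample data, run on SQLite from Python (window functions need SQLite 3.25 or later). The column is named grp here because group is a keyword, and strftime('%s', ...) replaces datediff(); both are assumptions of the sketch.

```python
import sqlite3

# Sample rows from the question: (grp, id, time).
ROWS = [
    (1, 1, "16:00:00"),
    (1, 2, "16:02:00"),
    (1, 3, "16:03:00"),
    (2, 4, "16:09:00"),
    (2, 5, "16:10:00"),
    (2, 6, "16:14:00"),
]

# lag() only looks back within the current partition, so the first row of
# every group has no previous row and its difference is NULL.
QUERY = """
select grp, id,
       cast(strftime('%s', time) as integer)
       - cast(strftime('%s', lag(time) over (partition by grp order by id)) as integer)
         as diff_seconds
from t
order by id
"""

def consecutive_differences(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (grp int, id int, time text)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in consecutive_differences(ROWS):
    print(row)
# (1, 1, None)
# (1, 2, 120)
# (1, 3, 60)
# (2, 4, None)
# (2, 5, 60)
# (2, 6, 240)
```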

Detect Intervals

id_person transaction internation_in internation_out
1 456465 2015-01-01 2015-02-01
2 564564 2015-02-03 2015-04-02
3 4564654 2015-01-01 2015-01-05
4 4564646 2015-01-01 2015-02-04
4 4564656 2015-03-01 2015-04-15
4 87899465 2015-05-16 2015-05-25
5 56456456 2015-01-01 2105-01-08
5 45456546 2015-02-04 2015-03-04
I want to know how to compute, per id_person, the difference (interval in hours) between the internation_out of one transaction and the internation_in of the next transaction.
I tried lag and lead, but I can't get them to work per id_person.
I want this result, using id_person 4 as an example:
id_person transaction Gap
4 4564646 Null
4 4564656 The result of (2015-02-04- 2015-03-01)
4 87899465 The result of (2015-04-15- 2015-05-16)
If your time periods are not overlapping (and yours are not), then there is a simple calculation for the gaps: it is the total number of days from the beginning to the end minus the total on each row. So, you don't need lead() or lag():
select id_person,
(case when count(*) > 1
then (max(internation_out) - min(internation_in) -
sum(internation_out - internation_in)
)
end) as gap_duration
from table t
group by id_person;
Note that this returns NULL if there is only one row for the person. If you want 0, then you don't need the case.
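The query above returns one total per person. If the gap is wanted on each row, as in the question's expected output, lag() with partition by id_person is one way to get it. The sketch below runs on SQLite from Python using a subset of the sample rows; the column name txn (transaction is a keyword) and the julianday() arithmetic are assumptions of the sketch.

```python
import sqlite3

# Rows for persons 1 and 4 from the question:
# (id_person, txn, internation_in, internation_out).
ROWS = [
    (1, 456465,   "2015-01-01", "2015-02-01"),
    (4, 4564646,  "2015-01-01", "2015-02-04"),
    (4, 4564656,  "2015-03-01", "2015-04-15"),
    (4, 87899465, "2015-05-16", "2015-05-25"),
]

# Gap in hours between the previous row's internation_out and this row's
# internation_in, restarting for every id_person.
QUERY = """
select id_person, txn,
       cast((julianday(internation_in)
             - julianday(lag(internation_out)
                         over (partition by id_person order by internation_in))
            ) * 24 as integer) as gap_hours
from t
order by id_person, internation_in
"""

def gaps_per_row(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (id_person int, txn int, internation_in text, internation_out text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in gaps_per_row(ROWS):
    print(row)
# (1, 456465, None)
# (4, 4564646, None)
# (4, 4564656, 600)
# (4, 87899465, 744)
```

For person 4 the two gaps are 25 and 31 days (600 and 744 hours). Their sum, 56 days, is what the aggregate formula gives: 144 days from first in to last out, minus 88 days spent inside the three intervals.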