Efficient way to calculate average time between row dates grouped by ID

Suppose I have a table like this:
thedate ID
2014-10-20 14:13:42.063 1
2014-10-20 14:13:43.063 1
2014-10-20 14:13:47.063 1
2014-10-20 14:12:50.063 2
2014-10-20 14:13:49.063 2
2014-10-20 14:13:54.063 2
2014-10-20 14:20:24.063 2
2014-10-20 14:13:02.063 3
To replicate a similar toybox table as in this example you can use the following code:
create table #tmp (thedate datetime, ID int);
insert into #tmp (thedate, ID) values
(dateadd(s,0,getdate()),1), (dateadd(s,1,getdate()),1), (dateadd(s,5,getdate()),1),
(dateadd(s,-52,getdate()),2), (dateadd(s,7,getdate()),2), (dateadd(s,12,getdate()),2),(dateadd(s,402,getdate()),2),
(dateadd(s,-40,getdate()),3)
For each ID I want the average time between consecutive dates. The database is huge (many IDs, and many dates per ID), so the query has to be very efficient. I want a result like this:
ID AvgTime (seconds)
1 2,5
2 151,333333333333
3 NULL
The following code does what I want, but it is way too slow:
select
a.ID,
(select top 1 avg(cast(datediff(s,(select max(thedate)
from #tmp c where ID = b.ID
and thedate < b.thedate)
,thedate) as float)) over (partition by b.ID)
from #tmp b where ID = a.ID)
from #tmp a group by ID
Does anyone know how to do this efficiently?

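The differences between consecutive dates telescope: their sum is simply the maximum minus the minimum. So the average is the maximum minus the minimum, divided by one less than the count. You can use this to write a relatively simple query: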
select id,
cast(datediff(second, min(thedate), max(thedate)) as float) / (count(*) - 1)
from #tmp
group by id;
If some of the ids only have one row, then you'll want to check for potential divide by 0:
select id,
(case when count(*) > 1
then cast(datediff(second, min(thedate), max(thedate)) as float) / (count(*) - 1)
end) as AvgDiff
from #tmp
group by id;
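
As a sanity check on the min/max shortcut, here is a minimal sketch that runs the same formula in SQLite (through Python's sqlite3 module) and compares it with a brute-force average of consecutive gaps. It is not T-SQL: the table name tmp, the column secs and the use of integer second offsets instead of datetime values are assumptions made for portability, since SQLite has no datediff.

```python
import sqlite3

# Sample rows from the question, with thedate reduced to integer second
# offsets (an assumption for this sketch; SQLite has no datediff).
rows = [(0, 1), (1, 1), (5, 1), (-52, 2), (7, 2), (12, 2), (402, 2), (-40, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tmp (secs integer, id integer)")
conn.executemany("insert into tmp (secs, id) values (?, ?)", rows)

# Same shape as the answer: (max - min) / (count - 1), NULL for a single row.
sql = """
select id,
       case when count(*) > 1
            then (max(secs) - min(secs)) * 1.0 / (count(*) - 1)
       end as avg_diff
from tmp
group by id
order by id
"""
by_formula = dict(conn.execute(sql).fetchall())


def avg_gap_brute_force(values):
    """Average of the differences between consecutive sorted values."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None


grouped = {}
for secs, id_ in rows:
    grouped.setdefault(id_, []).append(secs)
by_gaps = {id_: avg_gap_brute_force(v) for id_, v in grouped.items()}
```

Both dictionaries should agree for every ID, including the single-row ID, which comes back as None (NULL) in both.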

Related

Find date in specific group which is next to current group avoiding undesired groups

Assume table called t1:
create table t1(
dates date,
groups number
);
insert into t1 values('01.03.2020', 1);
insert into t1 values('02.03.2020', 2);
insert into t1 values('10.03.2020', 3);
insert into t1 values('01.04.2020', 10);
insert into t1 values('02.04.2020', 20);
insert into t1 values('10.04.2020', 3);
DATES GROUPS
01.03.2020 1
02.03.2020 2
10.03.2020 3
01.04.2020 10
02.04.2020 20
10.04.2020 3
I need to add a column that, for each row, holds the DATES value of the nearest later row whose GROUPS value equals 3.
Desired result:
DATES GROUPS DATE_OF_NEXT_3D_GROUP
01.03.2020 1 10.03.2020
02.03.2020 2 10.03.2020
10.03.2020 3 NULL(or could be 10.04.2020 from next 3d group)
01.04.2020 10 10.04.2020
02.04.2020 20 10.04.2020
10.04.2020 3 NULL(or date from next 3d group)
... ... ...
Appreciate your help

I strongly, strongly recommend using analytic functions for this rather than a correlated subquery:
select dates, groups,
(case when groups <> 3
then min(case when groups = 3 then dates end) over (order by dates desc)
end)
from t1
order by 1;
Analytic functions are designed for this type of operation and should have much better performance.

You can achieve this with a subquery:
select dates,
groups,
(select min(dates)
from t1 b
where b.groups = 3
and b.dates > a.dates) as next_g3_date
from t1 a;
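
To make the logic of both answers concrete, here is a small Python sketch of what is being computed: a single pass over the rows in descending date order, carrying along the most recently seen group-3 date. The function name next_group3_dates is hypothetical, and the sketch assumes dates are unique; it illustrates the idea, not how any database actually evaluates the query.

```python
from datetime import date

# Sample rows from the question as (dates, groups).
rows = [
    (date(2020, 3, 1), 1),
    (date(2020, 3, 2), 2),
    (date(2020, 3, 10), 3),
    (date(2020, 4, 1), 10),
    (date(2020, 4, 2), 20),
    (date(2020, 4, 10), 3),
]


def next_group3_dates(rows, target=3):
    """Map each row to the date of the nearest later row in the target group."""
    result = {}
    nearest_later = None  # closest target-group date seen so far (all later)
    for d, g in sorted(rows, reverse=True):  # walk from latest to earliest
        # Rows in the target group themselves get NULL, as in the first answer.
        result[(d, g)] = None if g == target else nearest_later
        if g == target:
            nearest_later = d
    return result


lookup = next_group3_dates(rows)
```

With the sample data, the two March rows in groups 1 and 2 map to 2020-03-10, the two April rows in groups 10 and 20 map to 2020-04-10, and the group-3 rows map to None.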

Need sum of a column from a filter condition for each row

For each row, I need the total of Defect over the rows of the same ID whose MAIN_DT falls between that row's MAIN_DT and 365 days (one year) before it.
The value needs to be populated on every row.
I have tried the queries below, and also tried CSUM, but neither works:
1) select sum(Defect) as "sum",Id,MAIN_DT
from check_diff
where MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by 2,3;
2)select Defect,
Type1,
Type2,
Id,
MAIN_DT,
ADD_MONTHS(TIM_MAIN_DT,-12) year_old,
CSUM(Defect,MAIN_DT)
from check_diff
where
MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by id;
The expected output is as below:
Defect Type1 Type2 Id main_dt sum
1 a a 1 3/10/2017 1
99 a a 1 4/10/2018 99
0 a b 1 7/26/2018 99
1 a b 1 11/21/2018 100
1 a c 2 12/20/2018 1

Teradata doesn't support RANGE for cumulative sums, but you can rewrite it using a correlated scalar subquery:
select Defect, Id, MAIN_DT,
( select sum(Defect) as "sum"
from check_diff as t2
where t2.Id = t1.Id
and t2.MAIN_DT > ADD_MONTHS(t1.MAIN_DT,-12)
and t2.MAIN_DT <= t1.MAIN_DT
) as dt
from check_diff as t1;
Performance might be bad depending on the overall number of rows and the number of rows per ID.
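
A quick way to check the correlated-subquery approach against the expected output is to run an equivalent query in SQLite from Python. This is only a sketch under assumptions: SQLite has no ADD_MONTHS, so date(x, '-12 months') stands in for it, MAIN_DT is stored as ISO text, and the alias rolling_sum is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table check_diff (Defect integer, Id integer, MAIN_DT text)")
conn.executemany(
    "insert into check_diff values (?, ?, ?)",
    [
        (1, 1, "2017-03-10"),
        (99, 1, "2018-04-10"),
        (0, 1, "2018-07-26"),
        (1, 1, "2018-11-21"),
        (1, 2, "2018-12-20"),
    ],
)

# For each row, sum Defect over rows of the same Id dated within the
# preceding 12 months (exclusive lower bound, inclusive upper bound).
sql = """
select t1.Defect, t1.Id, t1.MAIN_DT,
       (select sum(t2.Defect)
          from check_diff as t2
         where t2.Id = t1.Id
           and t2.MAIN_DT > date(t1.MAIN_DT, '-12 months')
           and t2.MAIN_DT <= t1.MAIN_DT) as rolling_sum
  from check_diff as t1
 order by t1.Id, t1.MAIN_DT
"""
rolling = [row[3] for row in conn.execute(sql)]
```

On the sample rows this should reproduce the sum column from the question: 1, 99, 99, 100, 1.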

SQL: How to count the sum of values without GROUP BY

I have the following table:
visitorId visitNumber DATE
1 1 20180101
1 2 20180101
1 3 20180105
2 1 20171230
2 2 20180106
What I would like to return is:
visitorId totalVisits max_visits_in_1_day
1 3 2
2 2 1
I manage to get everything working without max_visits_in_1_day using:
SELECT visitorId,
MAX(visitNumber) - MIN(visitNumber) + 1 as totalVisits,
GROUP BY visitorId
What I need to do is improve the code such that max_visits_in_1_day gets added. Something like MAX(COUNT(GROUP BY(DATE)))
I first tried adding MAX(COUNT(DATE)), but this aggregates all dates and doesn't actually look for the maximum per unique date. In a sense, I would need to do a GROUP BY on DATE and then sum the counts.
I tried adding GROUP BY visitorId, DATE but this creates extra rows.

You will have to take two steps like this:
SELECT visitorId, SUM(perDay) AS totalVisits, MAX(perDay) AS max_visits_in_1_day
FROM
(SELECT visitorId, COUNT(visitNumber) AS perDay, DATE
FROM myTable
GROUP BY visitorId, DATE) A
GROUP BY visitorId
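
The two-step aggregation can be tried out in SQLite from Python, as in the sketch below. The column is named visit_date here instead of DATE to avoid clashing with a keyword in some dialects; that rename, and storing the date as text, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table myTable (visitorId integer, visitNumber integer, visit_date text)"
)
conn.executemany(
    "insert into myTable values (?, ?, ?)",
    [
        (1, 1, "20180101"),
        (1, 2, "20180101"),
        (1, 3, "20180105"),
        (2, 1, "20171230"),
        (2, 2, "20180106"),
    ],
)

# Step 1 (inner query): count visits per visitor per day.
# Step 2 (outer query): sum and max of those daily counts per visitor.
sql = """
select visitorId,
       sum(perDay) as totalVisits,
       max(perDay) as max_visits_in_1_day
  from (select visitorId, visit_date, count(visitNumber) as perDay
          from myTable
         group by visitorId, visit_date) a
 group by visitorId
 order by visitorId
"""
result = conn.execute(sql).fetchall()
```

For the sample table, result should hold one row per visitor: (1, 3, 2) and (2, 2, 1).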

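You can try the following query. The per-day counts are computed in an inner derived table first, because MAX(COUNT(...)) cannot be returned per visitorId in a single step:
SELECT YT.visitorId
,COUNT(YT.visitNumber) totalVisits
,mv1d.cnt max_visits_in_1_day
FROM YOUR_TABLE YT
INNER JOIN (SELECT visitorId, MAX(cnt) cnt
FROM (SELECT visitorId, DATE, COUNT(*) cnt
FROM YOUR_TABLE
GROUP BY visitorId, DATE) d
GROUP BY visitorId) mv1d
ON YT.visitorId = mv1d.visitorId
GROUP BY YT.visitorId, mv1d.cnt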

SQL Query to generate an extra field from data in the table

I have a table with 3 fields like this sample table Tbl1
Person Cost FromDate
1 10 2009-1-1
1 20 2010-1-1
2 10 2009-1-1
I want to query it and get back the 3 fields and a generated field called ToDate that defaults to 2099-1-1 unless there is an actual ToDate implied from another entry for the person in the table.
select Person,Cost,FromDate,ToDate From Tbl1
Person Cost FromDate ToDate
1 10 2009-1-1 2010-1-1
1 20 2010-1-1 2099-1-1
2 10 2009-1-1 2099-1-1

You can select the minimum date from all of the same person's dates that are after the record's date. If there is none you get NULL. With COALESCE you change NULL into the default date:
select
Person,
Cost,
FromDate,
coalesce((select min(FromDate) from Tbl1 later where later.Person = Tbl1.Person and later.FromDate > Tbl1.FromDate), '2099-01-01') as ToDate
From Tbl1
order by Person, FromDate;
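
The expected output gives person 2 the default end date, which suggests the lookup should only consider later rows of the same person. The sketch below runs the query in SQLite from Python both with and without a condition on Person so the difference is visible. The person_filter placeholder and the variable names are made up for the example, and FromDate is stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Tbl1 (Person integer, Cost integer, FromDate text)")
conn.executemany(
    "insert into Tbl1 values (?, ?, ?)",
    [(1, 10, "2009-01-01"), (1, 20, "2010-01-01"), (2, 10, "2009-01-01")],
)

# {person_filter} is filled in below to compare the two variants.
template = """
select Person, Cost, FromDate,
       coalesce((select min(later.FromDate)
                   from Tbl1 as later
                  where later.FromDate > Tbl1.FromDate {person_filter}),
                '2099-01-01') as ToDate
  from Tbl1
 order by Person, FromDate
"""

# Variant 1: only later rows of the same person are considered.
per_person = conn.execute(
    template.format(person_filter="and later.Person = Tbl1.Person")
).fetchall()

# Variant 2: later rows of any person are considered.
any_person = conn.execute(template.format(person_filter="")).fetchall()
```

With the sample rows, per_person gives person 2 a ToDate of 2099-01-01, while any_person picks up person 1's 2010-01-01 for that row.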

Although Thorsten's answer is perfectly fine, it would be more efficient to use window-functions to match the derived end-dates.
;WITH nbrdTbl
AS ( SELECT Person, Cost, FromDate, row_nr = ROW_NUMBER() OVER (PARTITION BY Person ORDER BY FromDate ASC)
FROM Tbl1)
SELECT t.Person, t.Cost, t.FromDate, derived_end_date = COALESCE(nxt.FromDate, '20990101')
FROM nbrdTbl t
LEFT OUTER JOIN nbrdTbl nxt
ON nxt.Person = t.Person
AND nxt.row_nr = t.row_nr + 1
ORDER BY t.Person, t.FromDate
In a test on a 2000-row table, it is about 3 times as efficient according to the execution plan (78% vs 22%).

How do I aggregate numbers from a string column in SQL

I am dealing with a poorly designed database column which has values like this
ID cid Score
1 1 3 out of 3
2 1 1 out of 5
3 2 3 out of 6
4 3 7 out of 10
I want the aggregate sum and percentage of Score column grouped on cid like this
cid sum percentage
1 4 out of 8 50
2 3 out of 6 50
3 7 out of 10 70
How do I do this?

You can try it this way:
select
t.cid
, cast(sum(s.a) as varchar(5)) +
' out of ' +
cast(sum(s.b) as varchar(5)) as sum
, ((cast(sum(s.a) as decimal))/sum(s.b))*100 as percentage
from MyTable t
inner join
(select
id
, cast(left(score, charindex(' ', score) - 1) as int) a
, cast(substring(score,charindex('out of', score)+7,len(score)) as int) b
from MyTable
) s on s.id = t.id
group by t.cid

Redesign the table, but on-the-fly as a CTE. Here's a solution that's not as short as you could make it, but that takes advantage of the handy SQL Server function PARSENAME. You may need to tweak the percentage calculation if you want to truncate rather than round, or if you want it to be a decimal value, not an int.
In this or most any solution, you have to count on the column values for Score to be in the very specific format you show. If you have the slightest doubt, you should run some other checks so you don't miss or misinterpret anything.
with
P(ID, cid, Score2Parse) as (
select
ID,
cid,
replace(Score,space(1),'.')
from scores
),
S(ID,cid,pts,tot) as (
select
ID,
cid,
cast(parsename(Score2Parse,4) as int),
cast(parsename(Score2Parse,1) as int)
from P
)
select
cid, cast(round(100e0*sum(pts)/sum(tot),0) as int) as percentage
from S
group by cid;
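
For comparison, the parse-then-aggregate idea behind both answers can be sketched outside the database in a few lines of Python. The helper parse_score is hypothetical, and like the SQL versions it assumes every Score value has exactly the form "N out of M"; anything else raises an error instead of being silently misread.

```python
from collections import defaultdict

# Sample rows from the question as (ID, cid, Score).
scores = [
    (1, 1, "3 out of 3"),
    (2, 1, "1 out of 5"),
    (3, 2, "3 out of 6"),
    (4, 3, "7 out of 10"),
]


def parse_score(text):
    """Split 'N out of M' into (N, M); reject any other shape."""
    parts = text.split()
    if len(parts) != 4 or parts[1:3] != ["out", "of"]:
        raise ValueError(f"unexpected Score format: {text!r}")
    return int(parts[0]), int(parts[3])


# Accumulate points and totals per cid.
totals = defaultdict(lambda: [0, 0])
for _id, cid, score in scores:
    pts, tot = parse_score(score)
    totals[cid][0] += pts
    totals[cid][1] += tot

# cid -> (sum text, rounded percentage)
summary = {
    cid: (f"{pts} out of {tot}", round(100 * pts / tot))
    for cid, (pts, tot) in sorted(totals.items())
}
```

For the sample rows, summary maps cid 1 to ("4 out of 8", 50), cid 2 to ("3 out of 6", 50) and cid 3 to ("7 out of 10", 70).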
