Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]
I recently inherited a SQL query that counts certain events in 30-day time windows from a log database. It uses a CTE (Common Table Expression) to generate the 30-day ranges from '2019-01-01' to now, and then counts the cases in those 30/60/90-day intervals. I am not sure this is the best method. All I know is that it takes a long time to run and I do not fully understand how it works. So I am trying to rebuild it in an efficient way (maybe the current version is already the most efficient; I do not know).
I have several questions:
One of the things I noticed is that instead of using DATEDIFF the query simply subtracts a number of days from the date. Is that good practice at all?
Is there a better way of doing the time comparisons?
Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Note: LogDate original format is like 2019-04-01 18:30:12.000.
DECLARE @dt1 Datetime='2019-01-01';
DECLARE @dt2 Datetime=getDate();
WITH ctedaterange
AS (SELECT [Dates]=@dt1
UNION ALL
SELECT [dates] + 30
FROM ctedaterange
WHERE [dates] + 30<= @dt2)
SELECT
[dates],
lt.Activity, COUNT(*) as Total,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 90 THEN 1 ELSE 0 END) AS Activity90days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 60 THEN 1 ELSE 0 END) AS Activity60days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
ON cte.[dates] = lt.LogDate
group by [dates], lt.Activity
OPTION (maxrecursion 0)
Sample dataset (LogTable):
LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01
Expected output (it does not reflect the sample data shown above, because the sample set would need too many rows to fit in this post):
As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3
Thank you for any suggestion.

SQL Server doesn't yet support a RANGE window frame with a value offset (for example, "the preceding 90 days") in an analytic function. Since you've generated all possible dates, though, and you already have the counts by date, it's easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:
sum(count(LogDate)) over (
partition by Activity order by [dates]
rows between 89 preceding and current row
)
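For context, here is a minimal sketch of how that expression could sit in a complete query. It assumes the LogTable(LogDate, Activity) layout from the question, replaces the 30-day step with a one-row-per-day calendar CTE, and cross joins the distinct activities so that every activity has a row for every day (a ROWS frame counts rows, so missing days would otherwise stretch the window). The names days, activities and DailyTotal are made up for illustration.

```sql
-- Sketch only: assumes LogTable(LogDate datetime, Activity) as in the question.
DECLARE @dt1 date = '2019-01-01';
DECLARE @dt2 date = CAST(GETDATE() AS date);

WITH days AS (                -- one row per calendar day from @dt1 to @dt2
    SELECT @dt1 AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM days WHERE d < @dt2
),
activities AS (               -- every activity gets a row for every day
    SELECT DISTINCT Activity FROM LogTable
)
SELECT d.d AS [dates],
       a.Activity,
       COUNT(lt.LogDate) AS DailyTotal,   -- 0 on days without events
       SUM(COUNT(lt.LogDate)) OVER (PARTITION BY a.Activity ORDER BY d.d
           ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) AS Activity90days,
       SUM(COUNT(lt.LogDate)) OVER (PARTITION BY a.Activity ORDER BY d.d
           ROWS BETWEEN 59 PRECEDING AND CURRENT ROW) AS Activity60days,
       SUM(COUNT(lt.LogDate)) OVER (PARTITION BY a.Activity ORDER BY d.d
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS Activity30days
FROM days AS d
CROSS JOIN activities AS a
LEFT JOIN LogTable AS lt
       ON lt.Activity = a.Activity
      AND lt.LogDate >= d.d                    -- range test keeps LogDate untouched
      AND lt.LogDate <  DATEADD(DAY, 1, d.d)
GROUP BY d.d, a.Activity
ORDER BY a.Activity, d.d
OPTION (MAXRECURSION 0);
```

If only every 30th day is needed in the report, this result can be wrapped in an outer query that filters on [dates]; the window sums have to be computed over the daily rows first.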

Retrieve data 60 days prior to their retest date

I have a requirement where I need to retrieve the row(s) that are 60 days prior to their "Retest Date", which is a column in the table. I have also attached a screenshot in which the field "Retest Date" is highlighted.
reagentlotid, reagentlotdesc, u_retest
RL-0000004, NULL, 2021-09-30 17:00:00.00
RL-0000005, NULL, 2021-09-29 04:21:00.00
RL-0000006, NULL, 2021-09-29 04:22:00.00
RL-0000007, Y-T4, 2021-08-28 05:56:00.00
RL-0000008, NULL, 2021-09-30 05:56:00.00
RL-0000009, NULL, 2021-09-28 04:23:00.00
This is what I was trying to do in SQL Server:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt = DATEADD(DD,60,GETDATE());
But it didn't work: the above query returns 0 rows.
Could someone please help me with this query?

Use a range, if you want all data from the day 60 days hence:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt >= CAST(DATEADD(DD,60,GETDATE()) AS DATE)
and u_retestdt < CAST(DATEADD(DD,61,GETDATE()) AS DATE)
Dates are like numbers; the time is like a decimal part. 12:00:00 is halfway through a day, so it's like x.5. SQL Server even lets you manipulate datetime values by adding fractions of days (adding 0.5 adds 12 hours).
If you had a column of numbers like 1.1, 1.5, 2.4 and you wanted all the one-point-somethings, you couldn't get any of them by saying score = 1; you'd say score >= 1 and score < 2.
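A small illustration of that arithmetic, as a sketch: the variable name @d is made up, and the + operator shown here works on the legacy datetime type only (date and datetime2 values need DATEADD instead).

```sql
-- Sketch: datetime behaves like "days with a fractional part".
DECLARE @d datetime = '2021-09-30';      -- midnight, i.e. "x.0"

SELECT @d + 0.5              AS noon_same_day,      -- 2021-09-30 12:00:00.000
       @d + 1                AS midnight_next_day,  -- 2021-10-01 00:00:00.000
       DATEADD(HOUR, 12, @d) AS noon_via_dateadd;   -- same value as @d + 0.5
```

DATEADD is the more portable spelling, since it also works for date and datetime2 columns.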
Generally, you should try to avoid manipulating table data in a query's WHERE clause because it usually makes indexes unusable: if you want "all numbers between 1 and 2", use a range; don't chop the decimal off the table data in order to compare it to 1. Same with dates; don't chop the time off - use a range:
--yes
WHERE score >= 1 and score < 2
--no
WHERE CAST(score as INTEGER) = 1
--yes
WHERE birthdatetime >= '1970-01-01' and birthdatetime < '1970-01-02'
--no
WHERE CAST(birthdatetime as DATE) = '1970-01-01'
Note that I am using a CAST to cut the time off in my recommendation to you, but that's to establish a pair of constants of "midnight on the day 60 days in the future" and "midnight on 61 days in the future" that will be used in the range check.
Follow the rule of thumb of "avoid calling functions on columns in a where clause" and generally, you'll be fine :)

Try something like this. -60 days may be the current or previous year. HTH
;with doy1 as (
-- day of year and year of the date 60 days ago; DATEADD handles the year rollover
select DATEPART(dayofyear, dateadd(day,-60,GetDate())) as doy
, year(dateadd(day,-60,GetDate())) as yr
)
select r.reagentlotid
, r.reagentlotdesc
, cast(r.u_retestdt as date) as u_retestdt
from reagentlot r
inner join doy1 d on DATEPART(dayofyear, r.u_retestdt) = d.doy
and year(r.u_retestdt) = d.yr

SELECT statement optimization

I'm not an expert in SQL queries, but not a complete newbie either.
I'm exporting data from an MS SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and an hour; these are the columns I use for the WHERE clause.
In detail, I have about 200 rows for each day, each with a different hour, for many days. I need to extract the first value after 15:00 of each day, for several days.
Since the hours are different for each day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes at 15:03 and so on (I'm looking for the closest value after 15:00). To fix this I used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > '15:00'
This way I can take the first value after 15:00 for one day. The problem is that I need this for more days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data='first_day' AND hour > '15:00'
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data='second_day' AND hour > '15:00'
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data='third_day' AND hour > '15:00'
...and so on for all the days (I build the SQL string with a loop over each day in the specified interval).
Until now this worked, but now I need to expand the day interval (currently at most a week, so 5 days) to up to 60 days. I don't want to build a huge query string, but I can't think of an alternative way to write the SQL.
Any help appreciated
Ettore

A typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;
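To cover the user-specified interval from the question, the same query can take a date range in the inner WHERE clause. The following is only a sketch: the table name dbo.Readings and the variables @first_day and @last_day are hypothetical, and it assumes that [day] is a date column and [hour] is either a time column or a zero-padded 'HH:MM' string, so that comparing against '15:00' orders correctly.

```sql
-- Sketch: first row after 15:00 for each day in a chosen interval.
DECLARE @first_day date = '2024-01-01';   -- hypothetical interval bounds
DECLARE @last_day  date = '2024-02-29';

SELECT a, b, [hour], [day]
FROM (SELECT t.a, t.b, t.[hour], t.[day],
             -- number the rows of each day, earliest hour first
             ROW_NUMBER() OVER (PARTITION BY t.[day] ORDER BY t.[hour]) AS seqnum
      FROM dbo.Readings AS t
      WHERE t.[hour] > '15:00'
        AND t.[day] >= @first_day
        AND t.[day] <= @last_day
     ) AS s
WHERE seqnum = 1
ORDER BY [day];
```

The query text stays the same size whether the interval is 5 days or 60; only the two bounds change.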

Using iif to mimic CASE for days of week

I've hit a little snag with one of my queries. I'm throwing together a simple chart to plot a number of reports being submitted by day of week.
My query to start was :
SELECT Weekday(incidentdate) AS dayOfWeek
, Count(*) AS NumberOfIncidents
FROM Incident
GROUP BY Weekday(incidentdate);
This works fine and returns what I want, something like
1 200
2 323
3 32
4 322
5 272
6 282
7 190
The problem is, I want the number returned by the weekday function to read as the corresponding day of the week, like case when 1 then 'sunday' and so forth. Since Access doesn't have the SQL Server equivalent that returns the weekday as a word, I have to work around it.
Problem is, it's not coming out the way I want. So I wrote it using iif since I can't use CASE. The problem is, since each iif statement is treated like a column selection (the way I'm writing it), my data comes out unusable, like this
SELECT
iif(weekday(incidentdate) =1,'Sunday'),
iif(weekday(incidentdate) =2,'Monday')
'so forth
, Count(*) AS NumberOfIncidents
FROM tblIncident
GROUP BY Weekday(incidentdate);
Expr1000 Expr1001 count
Sunday 20
Monday 106
120
186
182
164
24
Of course, I want my weekdays to be in the same column as the original query. Halp pls

Use the WeekdayName() function.
SELECT
WeekdayName(Weekday(incidentdate)) AS dayOfWeek,
Count(*) AS NumberOfIncidents
FROM Incident
GROUP BY WeekdayName(Weekday(incidentdate));

As BWS Suggested, Switch was what I wanted. Here's what I ended up writing
SELECT
switch(
Weekday(incidentdate) = 1, 'Sunday',
Weekday(incidentdate) = 2,'Monday',
Weekday(incidentdate) = 3,'Tuesday',
Weekday(incidentdate) = 4,'Wednesday',
Weekday(incidentdate) = 5,'Thursday',
Weekday(incidentdate) = 6,'Friday',
Weekday(incidentdate) = 7,'Saturday'
) as DayOfWeek
, Count(*) AS NumberOfIncidents
FROM tblIncident
GROUP BY Weekday(incidentdate);
Posting this here so there's actual code for future readers
Edit: WeekdayName(weekday(yourdate)), as HansUp said, is probably a little easier :)

check this previous post:
What is the equivalent of Select Case in Access SQL?

Why not just create a 7 row table with day number & day name then just join to it?

Change select to a previous date

I have basic knowledge of SQL and have a question:
I am trying to select data from a time series (date and windspeed). I want to select the original wind speed value if it lies between hours 7 and 21. If the hour is outside this range I would like to assign the wind speed to the previous wind speed at hour 21. There is also a concern that there is the occasional point where hour 21 does not exist and would like to assign the windspeed as hour 20... 19 etc until it finds the next available hour.
SELECT
date,
CASE WHEN DATEPART(HH,date) < 7 OR DATEPART(HH,date) > 21
THEN <WIND SPEED AT HOUR 21> ELSE <WIND SPEED> END AS ModifiedWindspeed
,WindSpeed, winddirection
from TerrainCorrectedHourlyWind w
This might make things clearer. If the hour is in the specified range, select windspeed. If not then select the wind speed from the prior day at 21 hours.

Though you've tagged the question mysql, I'm guessing this is actually SQL Server because of the DATEPART() function used. Try the following, which uses an OUTER APPLY to get your alternate value:
SELECT Date
, CASE
WHEN DATEPART(HOUR, Date) BETWEEN 7 AND 21 THEN w.WindSpeed
ELSE m.WindSpeed
END AS ModifiedWindSpeed
, w.WindSpeed
, w.WindDirection
FROM TerrainCorrectedHourlyWind AS w
OUTER APPLY (SELECT TOP 1 WindSpeed
FROM TerrainCorrectedHourlyWind
WHERE DATEPART(HOUR, Date) BETWEEN 7 AND 21
AND Date < w.Date
ORDER BY Date DESC) AS m;
Just to explain what this is doing--the OUTER APPLY will get the single most recent record (TOP 1 and ORDER BY Date DESC) for dates prior to the record in question (Date < w.Date) as well as within the hours specified. The CASE near the top chooses whether to use the current value or this alternate one based on the hour.

Find closest date in SQL Server

I have a table dbo.X with a DateTime column Y, which may have hundreds of records.
My stored procedure has a parameter @CurrentDate. I want to find the date in column Y of table dbo.X that is less than and closest to @CurrentDate.
How can I find it?

The WHERE clause matches all rows with a date earlier than @CurrentDate and, since they are sorted in descending order, TOP 1 returns the date closest to @CurrentDate.
SELECT TOP 1 *
FROM x
WHERE x.date < @CurrentDate
ORDER BY x.date DESC
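If only the date value itself is needed rather than the whole row, an aggregate gives the same answer. This is a sketch using the table and column names from the question (dbo.X, column Y); the DECLARE merely stands in for the stored procedure parameter.

```sql
-- Sketch: latest value of Y that is strictly earlier than @CurrentDate.
DECLARE @CurrentDate datetime = GETDATE();   -- stand-in for the procedure parameter

SELECT MAX(Y) AS ClosestEarlierDate          -- NULL when no earlier row exists
FROM dbo.X
WHERE Y < @CurrentDate;
```

Both forms compare the bare column against the parameter, so an index on the date column can be used.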

Use DATEDIFF and order your result by how many days or seconds lie between that date and the input.
Something like this:
select top 1 rowId, dateCol, datediff(second, dateCol, @CurrentDate) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, dateCol, @CurrentDate)

I have a better solution for this problem, I think.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, and as a result some rates have zero values. I want to fill the missing data with the rate for the same currency that is closest in time to the missing rate. Basically, I want to get the RateId of the nearest non-zero rate, which I will then substitute. (The substitution itself is not shown in my example.)
1) To start off, let's identify the missing rates:
[Image: query showing the missing rates, i.e. those with a rate value of zero]
2) Next, let's identify the rates that are not missing.
[Image: query showing rates that are not missing]
3) This query is where the magic happens. I have made an assumption here, which can be removed but was added to improve the performance of the query. The assumption, on line 26 of the full query below, is that I expect to find a substitute transaction on the same day as the missing/zero transaction.
The magic happens on line 23: the ROW_NUMBER function assigns a number starting at 1 to the non-missing transaction with the shortest time difference from the missing one. The next closest transaction has a RowNum of 2, and so on.
Please note that on line 25 I must join on the currency so that I do not mismatch currency types. That is, I don't want to substitute an AUD rate with CHF values; I want the closest rate in the same currency.
[Image: combining the two data sets with a row number to identify the nearest transaction]
4) Finally, let's get the data where RowNum is 1.
[Image: the final query]
The full query is as follows:
; with cte_zero_rates as
(
Select *
from fxrates
where (spot_exp = 0 or spot_exp = 0)
),
cte_non_zero_rates as
(
Select *
from fxrates
where (spot_exp > 0 and spot_exp > 0)
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifferece
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate
