Render a chart for incidences based on Age - kql

I'm rather new to data analysis, so this might be extremely simple and I'm just missing the logic.
I have a table with several columns, but only 2 matter: CreateDate & IncidentId
Basically I'm trying to render a piechart that splits the incidents by age: x incidents >30d; y incidents >15d and <30d; z incidents >7d and <15d; aa incidents <7d
--
I'm starting out from a query by a former colleague, who created a baseline that shows all of this in table form. Initially I was trying to round the CreateDate value down to just DD-MM-YYYY, but I'm starting to struggle...
I managed to narrow down the CreateDate with bin() and was thinking of using TrimEnd() to narrow down the dates, but I think I'm overcomplicating it. I'm currently stuck at:
| project bin(CreateDate,1d), IncidentId
Current Results
If anyone has a couple of tips I'd really appreciate it! I'm sure it's something simple that I'm missing...
Thank you so much!

Actually, only the CreateDate column matters
// Sample data generation. Not part of the solution.
let Incidents = materialize(range i from 1 to 100 step 1 | extend CreateDate = ago(50d * rand()));
// Solution starts here.
Incidents
| summarize count() by period = case
(
now() - CreateDate >= 30d, "30d <= x"
,now() - CreateDate >= 15d, "15d <= x < 30d"
,now() - CreateDate >= 7d, "7d <= x < 15d"
,"x < 7d"
)
| render piechart
period         | count_
-----------------------
30d <= x       | 37
x < 7d         | 12
7d <= x < 15d  | 25
15d <= x < 30d | 26
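For readers who don't use KQL, here is a minimal Python sketch of the same bucketing idea. The thresholds are checked from oldest to newest, so the first matching branch wins, just like case() above. The names age_bucket and count_by_bucket and the sample data are made up for illustration; they are not part of the query.

```python
from collections import Counter
from datetime import datetime, timedelta


def age_bucket(created: datetime, now: datetime) -> str:
    """Return the age bucket label, testing the oldest threshold first."""
    age = now - created
    if age >= timedelta(days=30):
        return "30d <= x"
    if age >= timedelta(days=15):
        return "15d <= x < 30d"
    if age >= timedelta(days=7):
        return "7d <= x < 15d"
    return "x < 7d"


def count_by_bucket(create_dates, now):
    """Equivalent of `summarize count() by period`."""
    return Counter(age_bucket(c, now) for c in create_dates)


# Hypothetical sample: incidents created 1, 6, 7, ... days before `now`.
now = datetime(2024, 1, 31)
sample = [now - timedelta(days=d) for d in (1, 6, 7, 14, 15, 29, 30, 45)]
counts = count_by_bucket(sample, now)
for period, n in counts.items():
    print(period, n)
```

Because each branch is only reached when the earlier ones failed, the upper bound of every bucket is implied and never has to be written out.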


Retrieve data 60 days prior to their retest date

I have a requirement where I need to retrieve Row(s) 60 days prior to their "Retest Date" which is a column present in the table. I have also attached the screenshot and the field "Retest Date" is highlighted.
reagentlotid | reagentlotdesc | u_retest
--------------------------------------------------
RL-0000004   | NULL           | 2021-09-30 17:00:00.00
RL-0000005   | NULL           | 2021-09-29 04:21:00.00
RL-0000006   | NULL           | 2021-09-29 04:22:00.00
RL-0000007   | Y-T4           | 2021-08-28 05:56:00.00
RL-0000008   | NULL           | 2021-09-30 05:56:00.00
RL-0000009   | NULL           | 2021-09-28 04:23:00.00
This is what I was trying to do in SQL Server:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt = DATEADD(DD,60,GETDATE());
But it didn't work. The above query returns 0 rows.
Could someone please help me with this query?
Use a range, if you want all data from the day 60 days hence:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where
u_retestdt >= CAST(DATEADD(DD,60,GETDATE()) AS DATE) AND
u_retestdt < CAST(DATEADD(DD,61,GETDATE()) AS DATE)
Dates are like numbers; the time is like a decimal part. 12:00:00 is halfway through a day, so it's like x.5 - SQL Server even lets you manipulate datetime types by adding fractions of days etc. (adding 0.5 is adding 12h)
If you had a column of numbers like 1.1, 1.5, 2.4 and you want all the one-point-somethings, you can't get any of them by saying score = 1; you say score >= 1 and score < 2
Generally, you should try to avoid manipulating table data in a query's WHERE clause because it usually makes indexes unusable: if you want "all numbers between 1 and 2", use a range; don't chop the decimal off the table data in order to compare it to 1. Same with dates; don't chop the time off - use a range:
--yes
WHERE score >= 1 and score < 2
--no
WHERE CAST(score as INTEGER) = 1
--yes
WHERE birthdatetime >= '1970-01-01' and birthdatetime < '1970-01-02'
--no
WHERE CAST(birthdatetime as DATE) = '1970-01-01'
Note that I am using a CAST to cut the time off in my recommendation to you, but that's to establish a pair of constants of "midnight on the day 60 days in the future" and "midnight on 61 days in the future" that will be used in the range check.
Follow the rule of thumb of "avoid calling functions on columns in a where clause" and generally, you'll be fine :)
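To illustrate why the equality test returns nothing while the range works, here is a rough Python sketch of the same comparison. The function names and the sample timestamps are hypothetical; the point is only the half-open [start, end) window built from two constants.

```python
from datetime import datetime, timedelta


def day_window(now: datetime, days_ahead: int):
    """Midnight on the target day and midnight on the day after it."""
    start = datetime(now.year, now.month, now.day) + timedelta(days=days_ahead)
    return start, start + timedelta(days=1)


def rows_on_day(retest_times, now, days_ahead=60):
    """Keep timestamps that fall anywhere on the target day."""
    start, end = day_window(now, days_ahead)
    return [t for t in retest_times if start <= t < end]


# Hypothetical "current time" and retest timestamps.
now = datetime(2021, 8, 1, 9, 30)
retests = [
    datetime(2021, 9, 30, 17, 0),
    datetime(2021, 9, 29, 4, 21),
    datetime(2021, 10, 1, 0, 0),
]

# Exact equality compares against 2021-09-30 09:30:00 and matches nothing.
exact = [t for t in retests if t == now + timedelta(days=60)]
# The half-open range matches everything on 2021-09-30.
ranged = rows_on_day(retests, now)
print(exact, ranged)
```

Note that the stored values are never modified; only the two boundary constants are computed, which is what keeps an index on the column usable in the SQL version.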
Try something like this. -60 days may be the current or previous year. HTH
;with target as (
-- day of year and year of the date 60 days ago; DATEADD handles the year rollover
select DATEPART(dayofyear, dateadd(day,-60,GetDate())) as doy
, year(dateadd(day,-60,GetDate())) as yr
)
select r.reagentlotid
, r.reagentlotdesc
, cast(r.u_retestdt as date) as u_retestdt
from reagentlot r
inner join target d on DATEPART(dayofyear, r.u_retestdt) = d.doy
where year(r.u_retestdt) = d.yr

Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]
I recently inherited a SQL query that calculates the number of some events in time windows of 30 days from a log database. It uses a CTE (Common Table Expression) to generate the 30 days ranges since '2019-01-01' to now. And then it counts the cases in those 30/60/90 days intervals. I am not sure this is the best method. All I know is that it takes a long time to run and I do not understand 100% how exactly it works. So I am trying to rebuild it in an efficient way (maybe as it is now is the most efficient way, I do not know).
I have several questions:
One of the things I notice is that instead of using DATEDIFF the query simply subtracts a number of days from the date. Is that a good practice at all?
Is there a better way of doing the time comparisons?
Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Note: LogDate original format is like 2019-04-01 18:30:12.000.
DECLARE @dt1 Datetime='2019-01-01'
DECLARE @dt2 Datetime=getDate();
WITH ctedaterange
AS (SELECT [Dates]=@dt1
UNION ALL
SELECT [dates] + 30
FROM ctedaterange
WHERE [dates] + 30<= @dt2)
SELECT
[dates],
lt.Activity, COUNT(*) as Total,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 90 THEN 1 ELSE 0 END) AS Activity90days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 60 THEN 1 ELSE 0 END) AS Activity60days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
ON cte.[dates] = lt.LogDate
group by [dates], lt.Activity
OPTION (maxrecursion 0)
Sample dataset (LogTable):
LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01
Expected Output (the output does not reflect the data shown above for I would need too many lines in the sample set to be shown in this post)
As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3
Thank you for any suggestion.
SQL Server doesn't yet have the option to range over values of the window frame of an analytic function. Since you've generated all possible dates though and you've already got the counts by date, it's very easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:
sum(count(LogDate)) over (
partition by Activity order by [dates]
rows between 89 preceding and current row
)
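As a rough illustration of what a "rows between N-1 preceding and current row" frame computes, here is a small Python sketch. It assumes there is exactly one count per consecutive day for a single activity (which is what counting rows instead of days relies on); trailing_sums and the sample numbers are hypothetical.

```python
def trailing_sums(daily_counts, window):
    """For each day i, sum the counts of day i and the window-1 days before it."""
    totals = []
    running = 0
    for i, count in enumerate(daily_counts):
        running += count
        if i >= window:
            # Drop the day that just fell out of the window.
            running -= daily_counts[i - window]
        totals.append(running)
    return totals


# Hypothetical daily event counts for one activity, one entry per day.
daily = [1, 0, 2, 0, 0, 3]
window_3 = trailing_sums(daily, 3)
print(window_3)
```

If some days are missing from the input, looking back a fixed number of rows no longer corresponds to a fixed number of days, so the gaps would have to be filled with zero counts first.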

Calculating Starting Date Across Multiple Date Ranges

I have a SQL question which I have not been able to solve for a while now. I have looked everywhere without luck. I spent a lot of time looking through StackOverflow and thought someone here might be able to help.
I need to be able to determine a starting date x number of days back across multiple ranges. For example:
Range || Ending Date of Range (date) || Number of Days in Range (int)
1 || 2/20/2013 || 44
2 || 9/5/2014 || 75
3 || 3/25/2016 || 20
I have 3 ranges lasting various amounts of time, with the ending date for each. The date ranges never overlap. I need to count a specific number of days, let's say 100, back across the ranges. In the example above, the answer would be 2/15/2013.
20 + 75 = 95 days. 100 - 95 = 5 days. So 5 days back from 2/20/2013 is 2/15/2013.
Does anyone know how I might go about accomplishing this in a SQL query?
I believe the best way involves adding the ranges together until the sum passes the desired number (100), then taking the sum just prior (95) and subtracting it from 100, which would give me 5. From there it's just simple date math. I could easily do this with any programming language, but with SQL I am struggling.
Really what I need help with is coming up with (5) and the correct end date (2/20/2013). I can handle the date math from there.
I would appreciate any guidance on how I might go about accomplishing this. Thanks in advance.
I tried this in Sybase:
SELECT
max(CASE WHEN total_days >= chosen_days THEN end_date END)
-
min(CASE WHEN total_days < chosen_days THEN chosen_days - total_days ELSE chosen_days END)
FROM
(
SELECT a.end_date, sum(b.days_in_range) AS total_days, 100 AS chosen_days
FROM test AS a JOIN test AS b ON a.end_date <= b.end_date
GROUP BY a.end_date
) AS x
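For comparison, the procedural version the question describes (walk back from the latest range, using up days until the requested number is reached) could be sketched in Python like this. The function name start_date is hypothetical; the data is the example from the question.

```python
from datetime import date, timedelta


def start_date(ranges, days_back):
    """ranges: list of (end_date, days_in_range); walk from the latest range backwards."""
    remaining = days_back
    for end, length in sorted(ranges, reverse=True):
        if remaining <= length:
            # The requested day falls inside this range.
            return end - timedelta(days=remaining)
        remaining -= length
    raise ValueError("ranges cover fewer days than requested")


ranges = [
    (date(2013, 2, 20), 44),
    (date(2014, 9, 5), 75),
    (date(2016, 3, 25), 20),
]
result = start_date(ranges, 100)
print(result)  # 2013-02-15
```

The SQL above gets the same two ingredients (the end date of the range where the running total passes 100, and the 5 leftover days) from a self-join that builds the running total per range.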

SQL Server: get records where dates are in a specific time frame

Let's say I have the following table
EventName | eventStartDate | eventEndDate
--------------------------------------------------
Event 1 | 11/11/2015 | 01/31/2016
Event 2 | 01/24/2016 | 01/26/2016
Event 3 | 02/23/2015 | 03/20/2016
Event 4 | 02/20/2016 | 02/26/2016
I'd like to write a query that gets all the events that take place within a given calendar month. So, for example, I want to get all events that are active in January. This would leave me with Event 1, Event 2 and Event 3.
Any help is appreciated
This can be done using a bunch of conditions like:
select *
from your_table
where
(eventStartDate >= '20160101' and eventStartDate <= '20160131')
or
(eventEndDate >= '20160101' and eventEndDate <= '20160131')
or
(eventStartDate <= '20160101' and eventEndDate >= '20160131')
The first one matches periods that start in January.
The second one matches periods that end in January.
The third one matches periods that start before and end after January (inclusive).
No other kind of period can intersect this particular January.
Update:
All these conditions could be simplified to:
select *
from your_table
where eventStartDate <= '20160131' and eventEndDate >= '20160101'
This condition still covers all three kinds of periods mentioned above, but is significantly shorter.
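As a quick sanity check of the simplified condition, here is a small Python sketch that applies the same test (start <= window end and end >= window start) to the four events from the question. The overlaps helper is a made-up name for illustration.

```python
from datetime import date


def overlaps(start, end, win_start, win_end):
    """True when [start, end] and [win_start, win_end] share at least one day."""
    return start <= win_end and end >= win_start


events = {
    "Event 1": (date(2015, 11, 11), date(2016, 1, 31)),
    "Event 2": (date(2016, 1, 24), date(2016, 1, 26)),
    "Event 3": (date(2015, 2, 23), date(2016, 3, 20)),
    "Event 4": (date(2016, 2, 20), date(2016, 2, 26)),
}

january = (date(2016, 1, 1), date(2016, 1, 31))
active = [name for name, (s, e) in events.items() if overlaps(s, e, *january)]
print(active)  # ['Event 1', 'Event 2', 'Event 3']
```

An event is excluded only if it ends before the window starts or starts after the window ends, which is why two comparisons are enough.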
A different approach would be to compute a six-digit number for each month and then use the BETWEEN operator:
SELECT
[EventName]
, [eventStartDate]
, [eventEndDate]
FROM [tblEvents]
WHERE 201601 BETWEEN YEAR([eventStartDate]) * 100 + MONTH([eventStartDate]) AND YEAR([eventEndDate]) * 100 + MONTH([eventEndDate])
Update: With huge data sets, however, there will be a lot of computing going on, so performance has to be kept in mind.
Please try this:
DECLARE
@CurrentMonth int = 1 -- MONTH IN NUMBERS
SELECT
EventName
,eventStartDate
,eventEndDate
FROM EVENTs
WHERE
YEAR(eventStartDate) = YEAR(GETUTCDATE())
AND (MONTH(eventStartDate) >= @CurrentMonth AND MONTH(eventStartDate) < (@CurrentMonth+1))

Change select to a previous date

I have basic knowledge of SQL and have a question:
I am trying to select data from a time series (date and windspeed). I want to select the original wind speed value if it lies between hours 7 and 21. If the hour is outside this range I would like to assign the wind speed to the previous wind speed at hour 21. There is also a concern that there is the occasional point where hour 21 does not exist and would like to assign the windspeed as hour 20... 19 etc until it finds the next available hour.
SELECT
date,
CASE WHEN DATEPART(HH,date) < 7 OR DATEPART(HH,date) > 21
THEN <WIND SPEED AT HOUR 21> ELSE <WIND SPEED> END AS ModifiedWindspeed
,WindSpeed, winddirection
from TerrainCorrectedHourlyWind w
This might make things clearer. If the hour is in the specified range, select windspeed. If not then select the wind speed from the prior day at 21 hours.
Though you've tagged the question mysql, I'm guessing this is actually SQL Server because of the DATEPART() function used. Try the following, which uses an OUTER APPLY to get your alternate value:
SELECT Date
, CASE
WHEN DATEPART(HOUR, Date) BETWEEN 7 AND 21 THEN w.WindSpeed
ELSE m.WindSpeed
END AS ModifiedWindSpeed
, w.WindSpeed
, w.WindDirection
FROM TerrainCorrectedHourlyWind AS w
OUTER APPLY (SELECT TOP 1 WindSpeed
FROM TerrainCorrectedHourlyWind
WHERE DATEPART(HOUR, Date) BETWEEN 7 AND 21
AND Date < w.Date
ORDER BY Date DESC) AS m;
Just to explain what this is doing--the OUTER APPLY will get the single most recent record (TOP 1 and ORDER BY Date DESC) for dates prior to the record in question (Date < w.Date) as well as within the hours specified. The CASE near the top chooses whether to use the current value or this alternate one based on the hour.
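The same fallback logic can be sketched in Python as a single pass over readings sorted by time. This is only an illustration of the idea; modified_wind_speeds and the sample readings are hypothetical. An out-of-range hour gets the most recent earlier reading taken between hours 7 and 21, so a missing hour 21 falls back to hour 20, 19 and so on.

```python
from datetime import datetime


def modified_wind_speeds(readings):
    """readings: list of (timestamp, wind_speed) sorted by timestamp."""
    result = []
    last_in_range = None  # most recent earlier speed taken between hours 7 and 21
    for ts, speed in readings:
        if 7 <= ts.hour <= 21:
            result.append((ts, speed))
            last_in_range = speed
        else:
            # Stays None when no in-range reading exists yet (NULL in the SQL version).
            result.append((ts, last_in_range))
    return result


# Hypothetical readings; note there is no reading at hour 21.
readings = [
    (datetime(2020, 1, 1, 19), 5.0),
    (datetime(2020, 1, 1, 20), 6.0),
    (datetime(2020, 1, 1, 23), 9.0),
    (datetime(2020, 1, 2, 3), 8.0),
    (datetime(2020, 1, 2, 8), 4.0),
]
speeds = [s for _, s in modified_wind_speeds(readings)]
print(speeds)  # [5.0, 6.0, 6.0, 6.0, 4.0]
```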