Creating average for specific timeframe - sql

I'm setting up a time series with each row = 1 hr.
The input data has sometimes multiple values per hour. This can vary.
Right now the specific code looks like this:
select
patientunitstayid
, generate_series(ceil(min(nursingchartoffset)/60.0),
ceil(max(nursingchartoffset)/60.0)) as hr
, avg(case when nibp_systolic >= 1 and nibp_systolic <= 250 then
nibp_systolic else null end) as nibp_systolic_avg
from nc
group by patientunitstayid
order by patientunitstayid asc;
and generates this data:
It takes the average of the entire time series for each patient instead of taking it for each hour. How can I fix this?

I'm expecting something like this:
select nc.patientunitstayid, gs.hr,
avg(case when nc.nibp_systolic >= 1 and nc.nibp_systolic <= 250
then nibp_systolic
end) as nibp_systolic_avg
from (select nc.*,
min(nursingchartoffset) over (partition by patientunitstayid) as min_nursingchartoffset,
max(nursingchartoffset) over (partition by patientunitstayid) as max_nursingchartoffset
from nc
) nc cross join lateral
generate_series(ceil(min_nursingchartoffset/60.0),
ceil(max_nursingchartoffset/60.0)
) as gs(hr)
group by nc.patientunitstayid, hr
order by nc.patientunitstayid asc, hr asc;
That is, you need to be aggregating by hr. I put this into the from clause, to highlight that this generates rows. If you are using an older version of Postgres, then you might not have lateral joins. If so, just use a subquery in the from clause.
EDIT:
You can also try:
from (select nc.*,
generate_series(ceil(min(nursingchartoffset) over (partition by patientunitstayid) / 60.0),
ceil(max(nursingchartoffset) over (partition by patientunitstayid)/ 60.0)
) hr
from nc
) nc
And adjust the references to hr in the outer query.

Related

Create partitions based on column values in sql

I am very new to sql and query writing and after alot of trying, I am asking for help.
As shown in the picture, I want to create partition of data based on is_late = 1 and show its count (that is 2) but at the same time want to capture the value of last_status where is_late = 0 to be displayed in the single row.
The task is to calculate how many time the rider was late and time taken by him from first occurrence of estimated time to the last_status.
Desired output:
You can use following query
SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
last_status,
task_count,
CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
(SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id ORDER BY rider_id) AS task_count,
ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY rider_id) AS num,
MAX(last_status) OVER(PARTITION BY rider_id ORDER BY rider_id) AS last_status
FROM myTestTable) t
WHERE num = 1
db<>fiddle

Determine cluster of access time within 10min intervals per user per day in SQL Server

How to query in SQL from the sample data, it will group or cluster the access_time per user per day within 10min intervals?
This is a complete guess, based on reading between the lines, and is untested due to a lack of consumable sample data.
It, however, looks like you are after a triangular JOIN (these can perform poorly, especially as this won't be SARGable) and a DENSE_RANK:
SELECT YT.[date],
YT.User_ID,
YT2.AccessTime,
DENSE_RANK() OVER (PARTITION BY YT.[date], YT.User_ID ORDER BY YT1.AccessTime) AS Cluster
FROM dbo.YourTable YT
JOIN dbo.YourTable YT2 ON YT.[date] = YT2.[date]
AND YT.User_ID = YT2.User_ID
AND YT.AccessTime <= YT2.AccessTime --This will join the row to itself
AND DATEADD(MINUTE,10,YT.AccessTime) >= YT2.AccessTime; --That is intentional
If I have understood your problem you want to group all accesses for a user in a day when all accesses of that group are in a time interval of 10 minutes. Not counting single accesses, so an access distant more than 10 minutes from every other is not counted as a cluster.
You can identify the clusters joining the accesses table with itself to get all possible time intervals of 10 minutes and number them.
Finally simply rejoin access table to get accesses for each cluster:
; with
user_clusters as (
select a1.date, a1.user_id, a1.access_time cluster_start, a2.access_time cluster_end,
ROW_NUMBER() over (partition by a1.date, a1.user_id order by a1.access_time) user_cluster_id
from ACCESS_TIMES a1
join ACCESS_TIMES a2 on a1.date = a2.date and a1.user_id = a2.user_id
and a1.access_time < a2.access_time
and datediff(minute, a1.access_time, a2.access_time)<10
)
select *
from user_clusters c
join ACCESS_TIMES a on a.date = c.date and a.user_id = c.user_id and a.access_time between c.cluster_start and cluster_end
order by a.date, a.user_id, c.user_cluster_id, a.access_time
output:
date user_id access_time user_cluster_id
'2020-09-19', 'AA083P', '2020-09-19 18:15:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 2
'2020-09-19', 'AA083P', '2020-09-19 18:28:00', 2
'2020-09-20', 'AB162Y', '2020-09-20 19:34:00', 1
'2020-09-20', 'AB162Y', '2020-09-20 19:37:00', 1

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.completadas = 0 and comp_16.completadas > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.completadas > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completadas_15_days_before as comp_15 on comp_15.user = db.user
left join completadas_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want - difficult to test without sample data but should be a good enough starting point for you to then amend it to give you exactly what you want.
I've commented to the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 amd -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;

SQL - Grouping by Last Day of Quarter

I currently have a query running to average survey scores for agents. We use the date range of the LastDayOfTheQuarter and 180 days back to calculate these scores. I ran into an issue for this current quarter.
One of my agents hasn't received any surveys in 2020 which is causing the query to not pull the current lastdayofquarter and 180 days back of results.
The code I am using:
SELECT
Agent,
U.Position,
U.BranchDescription,
(ADDDATE(LastDayOfQuarter, -180)) AS MinDate,
(LastDayOfQuarter) AS MaxDate,
COUNT(DISTINCT Response ID) as SurveyCount,
AVG(CASE WHEN Question ID = Q1_2 THEN Answer Value END) AS EngagedScore,
AVG(CASE WHEN Question ID = Q1_3 THEN Answer Value END) AS KnowledgableScore,
AVG(CASE WHEN Question ID = Q1_6 THEN Answer Value END) AS ValuedScore
FROM qualtrics_responses
LEFT JOIN date D
ON (D.`Date`) = (DATE(`End Date`))
LEFT JOIN `users` U
ON U.`UserID` = `Agent ID`
WHERE `Agent` IS NOT NULL
AND DATE(`End Date`) <= (`LastDayOfQuarter`)
AND DATE(`End Date`) >= (ADDDATE(`LastDayOfQuarter`, -180))
GROUP BY `Agent`, (ADDDATE(`LastDayOfQuarter`, -180))
i know the issue is due to the way I am joining the dates and since he doesn't have a result in this current year, the end date to date join isn't grabbing the desired date range. I can't seem to come up with any alternatives. Any help is appreciated.
I make the assumption that table date in your query is a calendar table, that stores the starts and ends of the quarters (most likely with one row per date in the quarter).
If so, you can solve this problem by rearranging the joins: first cross join the users and the calendar table to generate all possible combinations, then bring in the surveys table with a left join:
SELECT
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter - interval 180 day AS MinDate,
D.LastDayOfQuarter AS MaxDate,
COUNT(DISTINCT Q.ResponseID) as SurveyCount,
AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.Answer Value END) AS EngagedScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_3' THEN Q.Answer Value END) AS KnowledgableScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_6' THEN Q.Answer Value END) AS ValuedScore
FROM date D
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
ON Q.EndDate >= D.Date
AND Q.EndDate < D.Date + interval 1 day
AND U.UserID = Q.AgentID
AND Q.Agent IS NOT NULL
GROUP BY
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter
Notes:
I adapted the date arithmetics - this assumes that you are using MySQL, as the syntax of the query suggests
You should really qualify all the columns in the query, by prefixing them with the alias of the table they belong to; this makes the query so much easier to understand. I gave a tried at it, you might need to review that.
All non-aggregated columns should appear in the group by clause (also see the comment from Eric); this is a a requirement in most databaseses, and good practice anywhere

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by attr column row wise and also create additional columns in to show their counts per day and percentages as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by but unable to find out how to even seperate them to multiple columns. I tried to generate day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this also is not giving me correct answer, I'm getting all zeroes for percentage and count as 1. Any help is appreciated. I'm trying to do this in Redshift which follows postgresql syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day by day if you feel the need
I am trying to enhance the query #johnHC btw if you needs for 7days then you have to those days in case when
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A counting the rows per day and counting the rows per day AND attr
B for more readability I convert the date into numbers. Here I take the difference between current date of the row and the maximum date available in the table. So I get a counter from 0 (first day) up to n - 1 (last day)
C calculating the percentage and rounding
D pivot by filter the day numbers. The COALESCE avoids the NULL values and switched them into 0. To add more days you can multiply these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.