SQL Rowwise comparison between groups - sql

Question
The following is a snippet of my data:
Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);
Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');
db<>fiddle.
I want to use T-SQL to get a count of how many persons fulfil the following criteria at least once - multiples should also count as one:
For a person:
One of the dates in 'started' (say s1) is larger than at least one of the dates in 'ended' (say e1)
s1 and e1 are in the same year, to be set manually - e.g. '2021-01-01' until '2022-01-01'
Example expected response
If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:
s1 = '2016-10-11', and e1 = '2016-10-10'.
Why can't I do this myself
The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.

Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.
select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01'
then started end) >
min(case when stopped >= '2016-01-01' and stopped < '2017-01-01'
then stopped end);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833

I would choose to use a self-outer-join with an exists correlation, it should be pretty much the most performant, all things being equal.
select Count(*)
from emps e
where exists (
select * from emps e2
where e2.person = e.person
and e2.stopped > e.started
and e.started between '20160101' and '20170101'
and e2.started between '20160101' and '20170101'
);

You said you plan to set the dates manually, so this works where we set the start date in one CTE, and the end date in another CTE. Then we calculate the min/max for each, and use that criteria in the query where statement.
with min_max_start as (
select person,
min(started) as min_start, --obsolete
max(started) as max_start
from emps
where started >= '2016-01-01'
group by person
),
min_max_end as (
select person,
min(stopped) as min_stop,
max(stopped) as max_stop --obsolete
from emps
where stopped < '2017-01-01'
group by person
)
select count(distinct e.person)
from emps e
join min_max_start mms
on e.person = mms.person
join min_max_end mme
on e.person = mme.person
where mms.max_start> mme.min_stop
Output: 1

Try the following:
With CTE as
(
Select D.person, D.started, T.stopped,
case
when Year(D.started) = Year(T.stopped) and D.started > T.stopped
then 1
else 0
end as chk
From
(Select person, started From Emps Where started >= '2016-01-01') D
Join
(Select person, stopped From Emps Where stopped <= '2017-01-01') T
On D.person = T.person
)
Select Count(Distinct person) as CNT
From CTE
Where chk = 1;
To get the employee list who met the criteria use the following on the CTE instead of the above Select Count... query:
Select person, started, stopped
From CTE
Where chk = 1;
See a demo from db<>fiddle.

Related

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.completadas = 0 and comp_16.completadas > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.completadas > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completadas_15_days_before as comp_15 on comp_15.user = db.user
left join completadas_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want - difficult to test without sample data but should be a good enough starting point for you to then amend it to give you exactly what you want.
I've commented to the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 amd -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;

how to sql query for patients between dates

I'd like query for patients having received their first diagnosis of x between 2019 - present and excluding those patients that received a diagnosis of x prior to 2019.
When I use the query below, I result in the same number of patient with or without statement: AND d.[DOS] !< '2019'
Can someone help?
Thanks!
SELECT [id]
,[DiagnosisCD]
,[DOS]
FROM [diags] d
WHERE [DiagnosisCD] IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23')
AND d.[DOS] >= '2019'
AND d.[DOS] !< '2019'
You can use aggregation:
SELECT [id]
FROM [diags] d
WHERE [DiagnosisCD] IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23')
GROUP BY id
HAVING MIN(DOS) >= 2019
If DOS is really a date then use:
HAVING MIN(DOS) >= '2019-01-01'
If you want all rows related to these diagnoses -- even if there is more than one per patient -- then you can use exists:
SELECT d.*
FROM [diags] d
WHERE d.DiagnosisCD IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23') AND
NOT EXISTS (SELECT 1
FROM diags d2
WHERE d2.id = d.id AND
d2.DiagnosisCD IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23') AND
d2.dos < 2019
);

More than 1 appointment on the same day for a patient and display both appointments

I am trying to find patients that have more than 1 appointment on the same day. I want to then display all the appointments the patient may have. Do I need to use a subquery to do this? Here is what I have so far:
Select
Appt.ID-PatNm as Patient,
ApptNum,
Sched_ApptType.Prov.Mnemonic as Type,
Appt.Provider-Name as Provider,
Appt.Dt,
Appt.Tm,
Appt.Department-Mnemonic As Dept,
Appt.SchedulerInits,
Case $EXTRACT(Appt.InternalStatus,1)
when 'P' then 'Pending'
when 'A' then 'Arrived'
when 'R' then 'Rescheduled'
End as Status
From Sched.Appointment Appt
JOIN Sched_ApptType.Prov ON
Appt.Department = Sched_ApptType.Prov.Department
and
Appt.Provider = Sched_ApptType.Prov.Provider
and
Appt.Type = Sched_ApptType.Prov.ApptType
Where (Appt.Dt) > DATEADD('DD',-120,CURRENT_DATE)
AND Appt.InternalStatus IN ('P','R','A')
AND Appt.Department-Mnemonic= 'EYE'
Group By
Appt.ID-PatNm,
Appt.Dt
You get the patients having more than one appointment in a day by grouping by patient and day:
select distinct a.id_patnm
from sched.appointment a
group by a.id_patnm, a.dt
having count(*) > 1
So yes, you need a subquery:
Where (Appt.Dt) > DATEADD('DD',-120,CURRENT_DATE)
AND Appt.InternalStatus IN ('P','R','A')
AND Appt.Department_Mnemonic= 'EYE'
AND Appt.ID_PatNm IN
(
select a.id_patnm
from sched.appointment a
group by a.id_patnm, a.dt
having count(*) > 1
)
(BTW: I used id_patnm instead of id-patnm here, for I don't know any DBMS that would allow the hyphen. When using a hyphen in a column name you have to use quotes on the name, e.g. "id-patnm".)
I suppose you could add a column for Appointment_id which would then allow you to get the desired result.

Counting concurrent records based on startdate and enddate columns

The table structure:
StaffingRecords
PersonnelId int
GroupId int
StaffingStartDateTime datetime
StaffingEndDateTime datetime
How can I get a list of staffing records, given a date and a group id that employees belong to, where the count of present employees fell below a threshold, say, 3, at any minute of the day?
The way my brain works, I would call a stored proc repeatedly with each minute of the day, but of course this would be horribly inefficient:
SELECT COUNT(PersonnelId)
FROM DailyRosters
WHERE GroupId=#GroupId
AND StaffingStartTime <= #TimeParam
AND StaffingEndTime > #TimeParam
AND COUNT(GroupId) < 3
GROUP BY GroupId
HAVING COUNT(PersonnelId) < 3
Edit: If it helps to refine the question, employees may come and go throughout the day. Personnel may have a staffing record from 0800 - 0815, and another from 1000 - 1045, for example.
Here is a solution where I find all of the distinct start and end times, and then query to see how many other people are clocked in at the time. Everytime the answer is less than 4, you know you are understaffed at that time, and presumably until the NEXT start time.
with meaningfulDtms(meaningfulTime, timeType, group_id)
as
(
select distinct StaffingStartTime , 'start' as timeType, group_id
from DailyRosters
union
select distinct StaffingEndTime , 'end' as timeType, group_id
from DailyRosters
)
select COUNT(*), meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
from DailyRosters dr
inner join meaningfulDtms on dr.group_id = meaningfulDtms.group_id
and (
(dr.StaffingStartTime < meaningfulDtms.meaningfulTime
and dr.StaffingEndTime >= meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'start')
OR
(dr.StaffingStartTime <= meaningfulDtms.meaningfulTime
and dr.StaffingEndTime > meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'end')
)
group by meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
having COUNT(*) < 4
Create a table with all minutes in the day with dt at PK
It will have 1440 rows
this will not give you count of zero - no staff
select allMiuntes.dt, worktime.grpID, count(distinct(worktime.personID))
from allMinutes
join worktime
on allMiuntes.dt > worktime.start
and allMiuntes.dt < worktime.end
group by allMiuntes.dt, worktime.grpID
having count(distinct(worktime.personID)) < 3
for times with zero I think the best way is a master of grpID
but I am not sure about this one
select allMiuntes.dt, grpMaster.grpID, count(distinct(worktime.personID))
from grpMaster
cross join allMinutes
left join worktime
on allMiuntes.dt > worktime.start
and allMiuntes.dt < worktime.end
and worktime.grpID = grpMaster.grpID
group by allMiuntes.dt, grpMaster.grpID
having count(distinct(worktime.personID)) < 3

sql db2 select records from either table

I have an order file, with order id and ship date. Orders can only be shipped monday - friday. This means there are no records selected for Saturday and Sunday.
I use the same order file to get all order dates, with date in the same format (yyyymmdd).
i want to select a count of all the records from the order file based on order date... and (i believe) full outer join (or maybe right join?) the date file... because i would like to see
20120330 293
20120331 0
20120401 0
20120402 920
20120403 430
20120404 827
etc...
however, my sql statement is still not returning a zero record for the 31st and 1st.
with DatesTable as (
select ohordt "Date" from kivalib.orhdrpf
where ohordt between 20120315 and 20120406
group by ohordt order by ohordt
)
SELECT ohscdt, count(OHTXN#) "Count"
FROM KIVALIB.ORHDRPF full outer join DatesTable dts on dts."Date" = ohordt
--/*order status = filled & order type = 1 & date between (some fill date range)*/
WHERE OHSTAT = 'F' AND OHTYP = 1 and ohscdt between 20120401 and 20120406
GROUP BY ohscdt ORDER BY ohscdt
any ideas what i'm doing wrong?
thanks!
It's because there is no data for those days, they do not show up as rows. You can use a recursive CTE to build a contiguous list of dates between two values that the query can join on:
It will look something like:
WITH dates (val) AS (
SELECT CAST('2012-04-01' AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT Val + 1 DAYS
FROM dates
WHERE Val < CAST('2012-04-06' AS DATE)
)
SELECT d.val AS "Date", o.ohscdt, COALESCE(COUNT(o.ohtxn#), 0) AS "Count"
FROM dates AS d
LEFT JOIN KIVALIB.ORDHRPF AS o
ON o.ohordt = TO_CHAR(d.val, 'YYYYMMDD')
WHERE o.ohstat = 'F'
AND o.ohtyp = 1