how to sql query for patients between dates - sql

I'd like to query for patients who received their first diagnosis of x between 2019 and the present, excluding patients who received a diagnosis of x prior to 2019.
When I use the query below, I get the same number of patients with or without the statement: AND d.[DOS] !< '2019'
Can someone help?
Thanks!
SELECT [id]
,[DiagnosisCD]
,[DOS]
FROM [diags] d
WHERE [DiagnosisCD] IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23')
AND d.[DOS] >= '2019'
AND d.[DOS] !< '2019'

You can use aggregation:
SELECT [id]
FROM [diags] d
WHERE [DiagnosisCD] IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23')
GROUP BY id
HAVING MIN(DOS) >= 2019
If DOS is really a date then use:
HAVING MIN(DOS) >= '2019-01-01'
If you want all rows related to these diagnoses -- even if there is more than one per patient -- then you can use exists:
SELECT d.*
FROM [diags] d
WHERE d.DiagnosisCD IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23') AND
NOT EXISTS (SELECT 1
FROM diags d2
WHERE d2.id = d.id AND
d2.DiagnosisCD IN ('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23') AND
d2.dos < 2019
);
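To see both approaches side by side, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows are made up for illustration, and DOS is assumed to be stored as an ISO date string so that plain comparison orders it correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diags (id INTEGER, DiagnosisCD TEXT, DOS TEXT)")
conn.executemany(
    "INSERT INTO diags VALUES (?, ?, ?)",
    [
        (1, "H91.20", "2018-05-01"),  # first diagnosis before 2019 -> excluded
        (1, "H91.21", "2019-03-10"),
        (2, "H91.22", "2019-07-04"),  # first diagnosis in 2019 -> kept
        (2, "H91.22", "2020-01-15"),
        (3, "J10.1", "2017-02-02"),   # unrelated code, ignored by the filter
        (3, "H91.23", "2021-09-09"),  # first matching diagnosis in 2021 -> kept
    ],
)

CODES = "('H91.2', 'H91.20', 'H91.21', 'H91.22', 'H91.23')"

# Aggregation: one row per patient whose earliest matching diagnosis is 2019 or later.
first_dx = conn.execute(
    f"""
    SELECT id, MIN(DOS) AS first_dos
    FROM diags
    WHERE DiagnosisCD IN {CODES}
    GROUP BY id
    HAVING MIN(DOS) >= '2019-01-01'
    ORDER BY id
    """
).fetchall()

# NOT EXISTS: every matching row for patients with no matching diagnosis before 2019.
all_rows = conn.execute(
    f"""
    SELECT d.id, d.DOS
    FROM diags d
    WHERE d.DiagnosisCD IN {CODES}
      AND NOT EXISTS (SELECT 1
                      FROM diags d2
                      WHERE d2.id = d.id
                        AND d2.DiagnosisCD IN {CODES}
                        AND d2.DOS < '2019-01-01')
    ORDER BY d.id, d.DOS
    """
).fetchall()

print(first_dx)
print(all_rows)
```

Patient 1 drops out of both results because of the 2018 row, which is the exclusion the original WHERE clause could not express on a single row.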

Related

SQL Rowwise comparison between groups

Question
The following is a snippet of my data:
Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);
Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');
I want to use T-SQL to get a count of how many persons fulfil the following criteria at least once - multiples should also count as one:
For a person:
One of the dates in 'started' (say s1) is larger than at least one of the dates in 'stopped' (say e1)
s1 and e1 are in the same year, to be set manually - e.g. '2021-01-01' until '2022-01-01'
Example expected response
If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:
s1 = '2016-10-11', and e1 = '2016-10-10'.
Why can't I do this myself
The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.
Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.
select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01'
then started end) >
min(case when stopped >= '2016-01-01' and stopped < '2017-01-01'
then stopped end);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833
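For anyone who wants to try the conditional aggregation without a SQL Server instance, here is a rough sketch in Python with the built-in sqlite3 module. It loads the sample rows from the question; the function name persons_matching and the lo/hi parameters are my own, and dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (person TEXT, started TEXT, stopped TEXT)")
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        ("p1", "2015-10-10", "2016-10-10"),
        ("p1", "2016-10-11", "2017-10-11"),
        ("p1", "2017-10-12", "2018-10-13"),
        ("p2", "2019-11-13", "2019-11-13"),
        ("p2", "2019-11-14", "2020-10-14"),
        ("p3", "2020-07-15", "2021-08-15"),
        ("p3", "2021-08-16", "2022-08-16"),
    ],
)


def persons_matching(lo, hi):
    """Persons whose latest start in [lo, hi) is after their earliest stop in [lo, hi)."""
    sql = """
    SELECT person
    FROM Emps
    GROUP BY person
    HAVING MAX(CASE WHEN started >= :lo AND started < :hi THEN started END)
         > MIN(CASE WHEN stopped >= :lo AND stopped < :hi THEN stopped END)
    ORDER BY person
    """
    return [row[0] for row in conn.execute(sql, {"lo": lo, "hi": hi})]


print(persons_matching("2016-01-01", "2017-01-01"))
print(len(persons_matching("2016-01-01", "2017-01-01")))
```

A person with no start or no stop inside the range gets NULL on one side of the comparison, so the HAVING condition is not true and the person is left out. Taking the length of the returned list gives the count the question asks for.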
I would use a correlated EXISTS subquery (a semi-join against the same table); all things being equal, it should be about as performant as any option. Count distinct persons so that several matches for one person count once.
select Count(distinct e.person)
from emps e
where exists (
select * from emps e2
where e2.person = e.person
and e.started > e2.stopped
and e.started >= '20160101' and e.started < '20170101'
and e2.stopped >= '20160101' and e2.stopped < '20170101'
);
You said you plan to set the dates manually, so this works where we set the start date in one CTE, and the end date in another CTE. Then we calculate the min/max for each, and use that criteria in the query where statement.
with min_max_start as (
select person,
min(started) as min_start, --obsolete
max(started) as max_start
from emps
where started >= '2016-01-01' and started < '2017-01-01'
group by person
),
min_max_end as (
select person,
min(stopped) as min_stop,
max(stopped) as max_stop --obsolete
from emps
where stopped >= '2016-01-01' and stopped < '2017-01-01'
group by person
)
select count(distinct e.person)
from emps e
join min_max_start mms
on e.person = mms.person
join min_max_end mme
on e.person = mme.person
where mms.max_start> mme.min_stop
Output: 1
Try the following:
With CTE as
(
Select D.person, D.started, T.stopped,
case
when Year(D.started) = Year(T.stopped) and D.started > T.stopped
then 1
else 0
end as chk
From
(Select person, started From Emps Where started >= '2016-01-01') D
Join
(Select person, stopped From Emps Where stopped < '2017-01-01') T
On D.person = T.person
)
Select Count(Distinct person) as CNT
From CTE
Where chk = 1;
To get the employee list who met the criteria use the following on the CTE instead of the above Select Count... query:
Select person, started, stopped
From CTE
Where chk = 1;
See a demo from db<>fiddle.

Fill in blank dates for rolling average - CTE in Snowflake

I have two tables – activity and purchase
Activity table:
user_id date videos_watched
1 2020-01-02 3
1 2020-01-04 5
1 2020-01-07 5
Purchase table:
user_id purchase_date
1 2020-01-01
2 2020-02-02
What I would like to do is to get a 30 day rolling average since purchase on how many videos has been watched.
The base query is like this:
SELECT
DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_VIEWED)
FROM PURCHASE P
LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
GROUP BY 1;
However, the Activity table only has records for each day a video has been logged. I would like to fill in the blanks for days a video has not been viewed.
I have started to look into using a CTE like this:
WITH cte AS (
SELECT date('2020-01-01') as fdate
UNION ALL
SELECT CAST(DATEADD(day,1,fdate) as date)
FROM cte
WHERE fdate < date('2020-04-01')
) select * from cte
cross join purchases p
left outer join activity a
on p.user id = a.user_id
and a.fdate = p.purchase_date
and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)
The end goal is to have something like this:
days_since_purchase videos_watched
1 3
2 0 --CTE coalesce inserted value
3 0
4 5
Been trying for the last couple of hours to get it right, but still can't really get the hang of it.
If you want to fill in the gaps in the result set, then I think you should be generating integers rather than dates:
WITH cte AS (
SELECT 1 as day_since_purchase
UNION ALL
SELECT 1 + day_since_purchase
FROM cte
WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase, COALESCE(avg_videos_viewed, 0)
FROM cte LEFT JOIN
(SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(a.videos_watched) as avg_videos_viewed
FROM purchases p JOIN
activity a
ON p.user_id = a.user_id AND
a.fdate = p.purchase_date AND
a.date >= p.purchase_date AND
a.date <= dateadd(day, 30, p.purchase_date)
GROUP BY 1
) pa
ON pa.day_since_purchase = cte.day_since_purchase;
You can use a recursive query to generate the 30 days following each purchase, then bring the activity table:
with cte as (
select
purchase_date,
user_id,
0 days_since_purchase,
purchase_date dt
from purchases
union all
select
purchase_date,
user_id,
days_since_purchase + 1,
dateadd(day, days_since_purchase + 1, purchase_date)
from cte
where days_since_purchase < 30
)
select
c.days_since_purchase,
avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
on a.user_id = c.user_id
and a.fdate = c.purchase_date
and a.date = c.dt
group by c.days_since_purchase
Your question is unclear on whether you have a column in the activity table that stores the purchase date each row relates to. Your query has column fdate but not your sample data. I used that column in the query (without such column, you might end up counting the same activity in different purchases).
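Here is a small runnable sketch of the recursive-query idea, written against SQLite through Python's sqlite3 module instead of Snowflake, so the date arithmetic uses SQLite's date() function rather than DATEADD. It uses the sample tables from the question, which have no fdate column, so the join is on user_id and date only; that is only safe under the assumption of one purchase per user. The max_days parameter and function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER)")
conn.execute("CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(1, "2020-01-02", 3), (1, "2020-01-04", 5), (1, "2020-01-07", 5)],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-02-02")],
)


def avg_by_day_since_purchase(max_days):
    """Average videos watched per day since purchase, counting missing days as 0."""
    sql = """
    WITH RECURSIVE days(user_id, purchase_date, days_since_purchase, dt) AS (
        SELECT user_id, purchase_date, 0, purchase_date
        FROM purchases
        UNION ALL
        SELECT user_id, purchase_date, days_since_purchase + 1,
               date(purchase_date, '+' || (days_since_purchase + 1) || ' days')
        FROM days
        WHERE days_since_purchase < :max_days
    )
    SELECT d.days_since_purchase,
           AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
    FROM days d
    LEFT JOIN activity a
      ON a.user_id = d.user_id
     AND a.date = d.dt
    GROUP BY d.days_since_purchase
    ORDER BY d.days_since_purchase
    """
    return conn.execute(sql, {"max_days": max_days}).fetchall()


for row in avg_by_day_since_purchase(7):
    print(row)
```

Because the day rows are generated per purchase before the join, user 2 (who has no activity at all) still contributes a zero to every day's average, which is what pulls day 1 down to 1.5 here.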

distinct count with group by

I have already searched SO but found no answer to my question. My question is if I use the query below I get correct count which is 90:
select count(distinct account_id)
from FactCustomerAccount f
join DimDate d on f.date_id = d.datekey
-- 90
But when I group by CalendarYear as below I am missing 12 counts. The query and output is below:
select CalendarYear,count(distinct account_id) as accountCount
from FactCustomerAccount f
join DimDate d on f.date_id = d.datekey
group by CalendarYear
output:
CalendarYear accountCount
2005 10
2006 26
2007 49
2008 63
2009 65
2010 78
I am not sure why I am missing 12 counts. To debug, I ran the following query to check whether any date_id in FactCustomerAccount is missing from DimDate, but found no missing keys:
select distinct f.date_id from FactCustomerAccount f
where f.date_id not in
(select DateKey from dimdate d)
I am using SQL Server 2008 R2.
Can anyone please suggest what could be the reason for missing 12 counts?
Thanks in advance.
EDIT ONE:
I did not quite understand reason/answer given to my question in the 2 replies so I would like to add 2 queries below using AdventureWorksDW2008R2 where no count is missing:
select count (distinct EmployeeKey)
from FactSalesQuota f
join dimdate d on f.DateKey = d.DateKey
-- out: 17
select d.CalendarYear, count (distinct EmployeeKey) as Employecount
from FactSalesQuota f
join dimdate d on f.DateKey = d.DateKey
group by d.CalendarYear
-- out:
-- CalendarYear Employecount
-- 2005 10
-- 2006 14
-- 2007 17
-- 2008 17
So please tell me what I am missing.
Your queries are very different:
The first:
select count(distinct account_id)
from FactCustomerAccount f
join DimDate d on f.date_id = d.datekey
Returns a count of distinct accounts over all years, so an account_id that is present in two years is counted once.
The second:
Is grouped by CalendarYear, so an account_id that is present in two different years is counted in two different rows.
select CalendarYear,count(distinct account_id) as accountCount
from FactCustomerAccount f
join DimDate d on f.date_id = d.datekey
group by CalendarYear
EDIT
I'll try to explain better.
Suppose this data set of ordered pairs (year, account_id):
2008 10
2009 10
2010 10
2010 12
If you run the two queries above, you get:
2
and
2008 1
2009 1
2010 2
because there are two different account_ids (10 and 12), and only in the last year (2010) do both of them have rows.
But if you have this data set:
`2008 10`
`2009 10`
`2009 12`
`2010 12`
You'll have:
First query result:
2
Second query result:
2008 1
2009 2
2010 1
You aren't missing 12. It could be that some accounts didn't have activities in the final years.
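The second data set can be checked with a quick sketch in Python's sqlite3 module. The table name fact and its two columns are stand-ins I made up for the joined FactCustomerAccount/DimDate result. The last query, counting each account only in the first year it appears, is my own addition to show a per-year breakdown that does sum to the overall distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (calendar_year INTEGER, account_id INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(2008, 10), (2009, 10), (2009, 12), (2010, 12)],
)

# Distinct accounts over all years.
overall = conn.execute("SELECT COUNT(DISTINCT account_id) FROM fact").fetchone()[0]

# Distinct accounts per year: an account active in two years appears in both rows.
per_year = conn.execute(
    """
    SELECT calendar_year, COUNT(DISTINCT account_id)
    FROM fact
    GROUP BY calendar_year
    ORDER BY calendar_year
    """
).fetchall()

# Accounts counted only in the year they first appear.
first_seen = conn.execute(
    """
    SELECT first_year, COUNT(*)
    FROM (SELECT account_id, MIN(calendar_year) AS first_year
          FROM fact
          GROUP BY account_id)
    GROUP BY first_year
    ORDER BY first_year
    """
).fetchall()

print(overall, per_year, first_seen)
```

The per-year counts add up to 4 while the overall count is 2, so neither the sum nor any single year's figure has to equal the overall figure.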
To analyze this, I would check the number of rows and the CalendarYear column: are there any rows with NULL in CalendarYear? Or try a ranking function, though I am not sure:
select *,
ROW_NUMBER()over(partition by CalendarYear,account_id order by CalendarYear)
from FactCustomerAccount f
join DimDate d on f.date_id = d.datekey

sql db2 select records from either table

I have an order file, with order id and ship date. Orders can only be shipped Monday through Friday. This means there are no records selected for Saturday and Sunday.
I use the same order file to get all order dates, with date in the same format (yyyymmdd).
I want to select a count of all the records from the order file based on order date, and (I believe) full outer join (or maybe right join?) the date file, because I would like to see
20120330 293
20120331 0
20120401 0
20120402 920
20120403 430
20120404 827
etc...
However, my SQL statement is still not returning a zero record for the 31st and 1st.
with DatesTable as (
select ohordt "Date" from kivalib.orhdrpf
where ohordt between 20120315 and 20120406
group by ohordt order by ohordt
)
SELECT ohscdt, count(OHTXN#) "Count"
FROM KIVALIB.ORHDRPF full outer join DatesTable dts on dts."Date" = ohordt
--/*order status = filled & order type = 1 & date between (some fill date range)*/
WHERE OHSTAT = 'F' AND OHTYP = 1 and ohscdt between 20120401 and 20120406
GROUP BY ohscdt ORDER BY ohscdt
Any ideas what I'm doing wrong?
Thanks!
Because there is no data for those days, they do not show up as rows. You can use a recursive CTE to build a contiguous list of dates between two values that the query can join on.
It will look something like:
WITH dates (val) AS (
SELECT CAST('2012-04-01' AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT Val + 1 DAYS
FROM dates
WHERE Val < CAST('2012-04-06' AS DATE)
)
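-- Filters on the order table go in the ON clause; in a WHERE clause they would discard the zero-count dates
SELECT d.val AS "Date", COUNT(o.ohtxn#) AS "Count"
FROM dates AS d
LEFT JOIN KIVALIB.ORHDRPF AS o
ON o.ohordt = TO_CHAR(d.val, 'YYYYMMDD')
AND o.ohstat = 'F'
AND o.ohtyp = 1
GROUP BY d.val
ORDER BY d.val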
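The generated-dates idea can be tried out with a rough sketch in Python's sqlite3 module (SQLite syntax, not DB2). The orders table, its rows, and the column name ohtxn are invented for the example, and ohordt is assumed to be an integer in yyyymmdd form. The sketch also shows why it matters where the status and type filters are placed when the goal is to keep days with a zero count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (ohtxn INTEGER, ohordt INTEGER, ohstat TEXT, ohtyp INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 20120330, "F", 1),
        (2, 20120330, "F", 1),
        (3, 20120402, "F", 1),
        (4, 20120402, "F", 1),
        (5, 20120402, "X", 1),  # not filled, must not be counted
    ],
)

# Recursive CTE producing one row per calendar day from :start to :end inclusive.
SPINE = """
WITH RECURSIVE dates(val) AS (
    SELECT date(:start)
    UNION ALL
    SELECT date(val, '+1 day') FROM dates WHERE val < date(:end)
)
SELECT CAST(strftime('%Y%m%d', d.val) AS INTEGER) AS day, COUNT(o.ohtxn) AS cnt
FROM dates d
LEFT JOIN orders o
  ON o.ohordt = CAST(strftime('%Y%m%d', d.val) AS INTEGER)
"""


def counts_filter_in_on(start, end):
    """Filters in the ON clause: days without matching orders are kept with count 0."""
    sql = SPINE + " AND o.ohstat = 'F' AND o.ohtyp = 1 GROUP BY d.val ORDER BY d.val"
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


def counts_filter_in_where(start, end):
    """Filters in the WHERE clause: rows with NULL order columns are thrown away."""
    sql = SPINE + " WHERE o.ohstat = 'F' AND o.ohtyp = 1 GROUP BY d.val ORDER BY d.val"
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


print(counts_filter_in_on("2012-03-30", "2012-04-02"))
print(counts_filter_in_where("2012-03-30", "2012-04-02"))
```

With the filters in the WHERE clause the weekend rows vanish again, because the order columns are NULL on those days and the comparison is never true.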

sql query to find customers who order too frequently?

My database isn't actually customers and orders, it's customers and prescriptions for their eye tests (just in case anyone was wondering why I'd want my customers to make orders less frequently!)
I have a database for a chain of opticians, the prescriptions table has the branch ID number, the patient ID number, and the date they had their eyes tested. Over time, patients will have more than one eye test listed in the database. How can I get a list of patients who have had a prescription entered on the system more than once in six months? In other words, where the date of one prescription is, for example, within three months of the date of the previous prescription for the same patient.
Sample data:
Branch Patient DateOfTest
1 1 2007-08-12
1 1 2008-08-30
1 1 2008-08-31
1 2 2006-04-15
1 2 2007-04-12
I don't need to know the actual dates in the result set, and it doesn't have to be exactly three months, just a list of patients who have a prescription too close to the previous prescription. In the sample data given, I want the query to return:
Branch Patient
1 1
This sort of query isn't going to be run very regularly, so I'm not overly bothered about efficiency. On our live database I have a quarter of a million records in the prescriptions table.
Something like this
select p1.branch, p1.patient
from prescription p1, prescription p2
where p1.patient=p2.patient
and p1.dateoftest > p2.dateoftest
and datediff(day, p2.dateoftest, p1.dateoftest) < 90;
should do... you might want to add
and p1.dateoftest > getdate()
to limit to future test prescriptions.
This one will efficiently use an index on (Branch, Patient, DateOfTest) which you of course should have:
SELECT Branch, Patient, DateOfTest, pDate
FROM (
SELECT p.Branch, p.Patient, p.DateOfTest, (
SELECT TOP 1 pp.DateOfTest
FROM Patients pp
WHERE pp.Branch = p.Branch
AND pp.Patient = p.Patient
AND pp.DateOfTest >= DATEADD(month, -3, p.DateOfTest)
AND pp.DateOfTest < p.DateOfTest -- strictly earlier, so a row never matches itself
ORDER BY
pp.DateOfTest DESC
) pDate
FROM Patients p
) po
WHERE pDate IS NOT NULL
One way:
select d.branch, d.patient
from data d
where exists
( select null from data d1
where d1.branch = d.branch
and d1.patient = d.patient
and "difference (d1.dateoftest ,d.dateoftest) < 6 months"
);
This part needs changing - I'm not familiar with SQL Server's date operations:
"difference (d1.dateoftest ,d.dateoftest) < 6 months"
Self-join:
select a.branch, a.patient
from prescriptions a
join prescriptions b
on a.branch = b.branch
and a.patient = b.patient
and a.dateoftest > b.dateoftest
and a.dateoftest - b.dateoftest < 180
group by a.branch, a.patient
This assumes you want patients who visit the same branch twice. If you don't, take out the branch part.
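As a quick check of the self-join approach against the sample data in the question, here is a sketch using Python's sqlite3 module. SQLite has no DATEDIFF, so the gap is computed with julianday(); the max_gap_days parameter and the function name are my own, and DISTINCT is used in place of GROUP BY to return each patient once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prescriptions (branch INTEGER, patient INTEGER, dateoftest TEXT)"
)
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?, ?)",
    [
        (1, 1, "2007-08-12"),
        (1, 1, "2008-08-30"),
        (1, 1, "2008-08-31"),
        (1, 2, "2006-04-15"),
        (1, 2, "2007-04-12"),
    ],
)


def too_frequent(max_gap_days):
    """Patients with two tests at the same branch fewer than max_gap_days apart."""
    sql = """
    SELECT DISTINCT a.branch, a.patient
    FROM prescriptions a
    JOIN prescriptions b
      ON a.branch = b.branch
     AND a.patient = b.patient
     AND a.dateoftest > b.dateoftest
     AND julianday(a.dateoftest) - julianday(b.dateoftest) < :max_gap_days
    ORDER BY a.branch, a.patient
    """
    return conn.execute(sql, {"max_gap_days": max_gap_days}).fetchall()


print(too_frequent(90))
```

The strict a.dateoftest > b.dateoftest condition keeps a row from pairing with itself and avoids reporting each pair twice.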
SELECT DISTINCT Branch
,Patient
FROM (SELECT P1.Branch
,P1.Patient
,P1.DateOfTest
,P2.DateOfTest AS DateOfOtherTest
FROM Prescriptions P1
JOIN Prescriptions P2
ON P2.Branch = P1.Branch
AND P2.Patient = P1.Patient
AND P2.DateOfTest > P1.DateOfTest
) AS SubQuery
WHERE DATEDIFF(day, SubQuery.DateOfTest, SubQuery.DateOfOtherTest) < 90