SQL: determine that an activity occurs with a given frequency

A common problem in electronic-medical-record (EMR) reporting is determining that an activity occurs with a specific frequency. In this situation, I need to determine that a note was written every 72 hours after admission.
Given:
A                                       D
|-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|
|---- 1 ----|---- 2 ----|---- 3 ----|-4-|
There would need to be at least one note during each of periods 1, 2, and 3. Because 4 isn't a full 72-hour period, it doesn't require a note. Failure to find a note in any of periods 1, 2, and 3 would be a FAIL.
Data:
(ENC):
ENC_ID   ADMITTED        DISCHARGED      PERIODS  PASS_FAIL
4114221  06/15/09 18:30  06/24/09 15:40  3        ?
PERIODS: TRUNC(CEIL((DISCHARGED - ADMITTED)/3))
The 'PASS_FAIL' column would indicate if the encounter had an adequate number and timing of notes.
(NOTE):
ENC_ID   NOTE_ID  NOTE_TIME       PERIOD
4114221  1833764  06/17/09 08:42  1
4114221  1843613  06/18/09 08:14  1
4114221  1858159  06/18/09 20:15  2
4114221  1850948  06/18/09 20:15  2
4114221  1850912  06/18/09 20:18  2
4114221  1859315  06/19/09 18:35  2
4114221  1863982  06/20/09 10:29  2
4114221  1868895  06/21/09 22:00  3
4114221  1873539  06/22/09 15:42  3
PERIOD: CEIL((NOTE_TIME - ADMITTED)/3)
Is there an efficient way to solve this problem?

SELECT  e.*,
        CASE WHEN c.cnt = TRUNC(CEIL((e.discharged - e.admitted) / 3)) THEN 'pass' ELSE 'fail' END AS pass_fail
FROM    (
        -- number of 72-hour periods that contain at least one note
        SELECT  COUNT(*) AS cnt
        FROM    enc ei
        CROSS JOIN
                (
                -- one row per period: 1, 2, ..., PERIODS
                SELECT  level AS period
                FROM    dual
                CONNECT BY
                        level <=
                        (
                        SELECT  TRUNC(CEIL((discharged - admitted) / 3))
                        FROM    enc
                        WHERE   enc_id = :enc_id
                        )
                ) p
        WHERE   ei.enc_id = :enc_id
                AND EXISTS
                (
                SELECT  NULL
                FROM    note
                WHERE   enc_id = ei.enc_id
                        AND note_time >= ei.admitted + (p.period - 1) * 3
                        AND note_time < ei.admitted + p.period * 3
                )
        ) c
JOIN    enc e
ON      e.enc_id = :enc_id
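To make the period logic easier to check by hand, here is a minimal Python sketch of the same idea, not a replacement for the query. It assumes the question's PERIODS formula (the ceiling of the stay length divided by 72 hours) and half-open windows starting at admission; the function name `pass_fail` and the shortened note list are illustrative only.

```python
from datetime import datetime, timedelta
from math import ceil

PERIOD = timedelta(hours=72)

def pass_fail(admitted, discharged, note_times):
    """Return 'pass' if every required 72-hour period contains a note."""
    # Same period count as the question's PERIODS column: CEIL((D - A) / 3 days).
    periods = ceil((discharged - admitted) / PERIOD)
    for p in range(1, periods + 1):
        start = admitted + (p - 1) * PERIOD
        end = admitted + p * PERIOD
        # Half-open window [start, end), mirroring the >= / < pair in the query.
        if not any(start <= t < end for t in note_times):
            return "fail"
    return "pass"

admitted = datetime(2009, 6, 15, 18, 30)
discharged = datetime(2009, 6, 24, 15, 40)
# Three of the notes from the question, one per period.
notes = [
    datetime(2009, 6, 17, 8, 42),
    datetime(2009, 6, 18, 20, 15),
    datetime(2009, 6, 21, 22, 0),
]
print(pass_fail(admitted, discharged, notes))
```

Dropping the last note leaves period 3 empty, so the same call returns 'fail'.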

If I'm reading your question correctly NOTE is a table with the data indicated.
All you really care about is whether the periods 1, 2 & 3 exist in the notes table for each enc_id.
If this is the case it indicates that an analytic function should be used:
select e.enc_id, e.admitted, e.discharged, e.periods
     , decode( n.ct
             , 3, 'pass'   -- all three periods have at least one note
             , 'fail' ) as pass_fail
  from enc e
  left outer join ( select distinct enc_id
                         -- count distinct periods, not rows: a period can hold several notes
                         , count(distinct period) over ( partition by enc_id ) as ct
                      from note
                     where period in (1,2,3)
                  ) n
    on e.enc_id = n.enc_id
This selects all periods per enc_id from note, which are the ones you want to examine, then counts them per enc_id. The distinct is there to ensure you only get one row per enc_id in the final result.
If you only want those enc_ids that have a value in note then turn the left outer join into an inner join.
If period is not, as indicated, a column in the note table, you have to do a distinct on the full query rather than the sub-query and work out which period each note falls in.
I'm sorry about the horrible formatting but I wanted to try to fit it on the page.
select distinct e.enc_id, e.admitted, e.discharged, e.periods
     , decode( count( distinct -- number of distinct periods
                      case when n.note_time between e.admitted
                                                and e.admitted + 3 then 1
                           when n.note_time between e.admitted
                                                and e.admitted + 6 then 2
                           when n.note_time between e.admitted
                                                and e.admitted + 9 then 3
                            end ) -- per enc_id from note
                 over ( partition by n.enc_id )
               -- if it's 3 then pass
             , 3, 'pass'
               -- else fail.
             , 'fail' ) as pass_fail
  from enc e
  left outer join note n
    on e.enc_id = n.enc_id
Whatever your data structure, the benefit of both approaches is that they are simple joins: one index unique scan (I'm assuming enc.enc_id is unique) and one index range scan (on note).

Related

Postgresql How to Calculate between 2 date depends on another table

Let's say I have two tables like this:
workday_emp
emp_id work_start work_end
1 "2021-04-06" "2021-04-14"
2 "2021-04-27" "2021-05-04"
3 "2021-04-30" "2021-05-07"
holiday_tbl
id name date
1 "holiday 1" "2021-04-07"
2 "holiday 2" "2021-04-28"
3 "holiday 3" "2021-04-29"
I want to show a table like this with a query:
emp_id work_start work_end day_holiday
1 "2021-04-06" "2021-04-14" 1
2 "2021-04-27" "2021-05-04" 2
3 "2021-04-30" "2021-05-07" 1
The question is: how do I calculate how many "day_holiday" days fall between "work_start" and "work_end", based on the "holiday_tbl" table?
Please try this. For employee 3 the holiday count will be 0, not 1, because his work period starts on April 30 but the last holiday was April 29.
-- PostgreSQL(v11)
SELECT w.emp_id, w.work_start, w.work_end
, (SELECT COUNT(id)
FROM holiday_tbl
WHERE holiday_date BETWEEN w.work_start AND w.work_end) day_holiday
FROM workday_emp w
Please check from url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=1948691b58ba841b2765d7de383f8df8
This should do the job:
SELECT emp_id, work_start, work_end, COUNT(ht.holiday) holiday_cnt
FROM workday_emp we LEFT JOIN
(
SELECT date holiday
FROM holiday_tbl
) ht ON ht.holiday BETWEEN we.work_start AND we.work_end
GROUP BY 1, 2, 3
ORDER BY 1, 2;
db<>fiddle
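As a cross-check of what both queries compute, the following Python sketch counts, for each employee, the holidays that fall inside the inclusive work range. It uses the sample rows from the question; the function name `holiday_counts` is made up for illustration.

```python
from datetime import date

# Sample rows from the question: (emp_id, work_start, work_end)
workdays = [
    (1, date(2021, 4, 6), date(2021, 4, 14)),
    (2, date(2021, 4, 27), date(2021, 5, 4)),
    (3, date(2021, 4, 30), date(2021, 5, 7)),
]
holidays = [date(2021, 4, 7), date(2021, 4, 28), date(2021, 4, 29)]

def holiday_counts(workdays, holidays):
    """Count holidays within each inclusive [work_start, work_end] range."""
    return {
        emp_id: sum(start <= h <= end for h in holidays)
        for emp_id, start, end in workdays
    }

print(holiday_counts(workdays, holidays))
```

Like the LEFT JOIN version, an employee with no holidays in range still appears, with a count of 0.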

Find duplicates within a specific period

I have a table with the following structure
ID Person LOG_TIME
-----------------------------------
1 1 2012-05-21 13:03:11.550
2 1 2012-05-22 13:09:37.050 <--- this is duplicate
3 1 2012-05-28 13:09:37.183
4 2 2012-05-20 15:09:37.230
5 2 2012-05-22 13:03:11.990 <--- this is duplicate
6 2 2012-05-24 04:04:13.222 <--- this is duplicate
7 2 2012-05-29 11:09:37.240
I have some application job that fills this table with data.
There is a business rule that each person should have only 1 record in every 7 days.
From the above example, records # 2,5 and 6 are considered duplicates while 1,3,4 and 7 are OK.
I want to have a SQL query that checks if there are records for the same person in less than 7 days.
;WITH cte AS
(
SELECT ID, Person, LOG_TIME,
DATEDIFF(d, MIN(LOG_TIME) OVER (PARTITION BY Person), LOG_TIME) AS diff_date
FROM dbo.Log_time
)
SELECT *
FROM cte
WHERE diff_date BETWEEN 1 AND 6
Demo on SQLFiddle
Please see my attempt on SQLFiddle here.
You can use a join based on DATEDIFF() to find records which are logged less than 7 days apart:
WITH TooClose
AS
(
SELECT
a.ID AS BeforeID,
b.ID AS AfterID
FROM
Log a
INNER JOIN Log b ON a.Person = b.Person
AND a.LOG_TIME < b.LOG_TIME
AND DATEDIFF(DAY, a.LOG_TIME, b.LOG_TIME) < 7
)
However, this will include records which you don't consider "duplicates" (for instance, ID 3, because it is too close to ID 2). From what you've said, I'm inferring that a record isn't a "duplicate" if the record it is too close to is itself a "duplicate".
So to apply this rule and get the final list of duplicates:
SELECT
AfterID AS ID
FROM
TooClose
WHERE
BeforeID NOT IN (SELECT AfterID FROM TooClose)
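The two steps above can be sketched in Python to check them against the sample data. This is only an illustration under the same inferred rule: the day difference is taken between calendar dates (as DATEDIFF(DAY, ...) counts day boundaries), and `find_duplicates` is a hypothetical name.

```python
from datetime import datetime

# Sample rows from the question: (ID, Person, LOG_TIME), fractions of a second dropped
rows = [
    (1, 1, datetime(2012, 5, 21, 13, 3, 11)),
    (2, 1, datetime(2012, 5, 22, 13, 9, 37)),
    (3, 1, datetime(2012, 5, 28, 13, 9, 37)),
    (4, 2, datetime(2012, 5, 20, 15, 9, 37)),
    (5, 2, datetime(2012, 5, 22, 13, 3, 11)),
    (6, 2, datetime(2012, 5, 24, 4, 4, 13)),
    (7, 2, datetime(2012, 5, 29, 11, 9, 37)),
]

def find_duplicates(rows):
    # Step 1: pairs (before, after) for the same person, less than 7 days apart.
    too_close = [
        (a_id, b_id)
        for a_id, a_person, a_time in rows
        for b_id, b_person, b_time in rows
        if a_person == b_person
        and a_time < b_time
        and (b_time.date() - a_time.date()).days < 7
    ]
    # Step 2: keep only pairs whose "before" record is not itself a later member of a pair.
    after_ids = {after for _, after in too_close}
    return sorted({after for before, after in too_close if before not in after_ids})

print(find_duplicates(rows))
```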
Please take a look at this sample.
Reference: SQLFIDDLE
Query:
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time)
from pers
group by person
;
select y.person, y.ct
from (
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time) as ct
from pers
group by person) as y
where y.ct > 1
and y.diff <= 7
;
PERSON DIFF COUNT(LOG_TIME)
1 1 3
2 8 3
PERSON CT
1 3
declare @Count int
set @Count=(
    select COUNT(*)
    from timeslot
    where (( (TimeFrom<@Timefrom and TimeTo >@Timefrom)
    or (TimeFrom<@Timeto and TimeTo >@Timeto))
    or (TimeFrom=@Timefrom or TimeTo=@Timeto)))

SQL Server : max date and inner join

I have two tables, one is a list of tasks. The other containing historical values for those tasks.
I need to generate a list of the latest event (and its description) for each check, as long as its Date_Executed is less than the current datetime minus the TimeFrame (TimeFrame being the number of hours within which the task has to be done, formatted for use in DATEADD), and only if the check has active = 1.
Table: checks
Check_id description TimeFrame active
1 Task One -24 0
2 Task Two -24 0
3 Task Forty -48 1
4 Task Somehin -128 1
Table: events
Event_id Check_id Comment Date_Executed User_Executed
1 1 NULL 2012-09-18 16:10:44.917 admin
2 1 NULL 2012-09-25 11:39:01.000 jeff
3 4 Failed 2012-09-25 13:20:09.930 steve
4 4 Half failed 2012-09-25 13:05:09.953 marsha
5 2 NULL 2012-09-25 14:02:24.000 marsha
6 3 NULL 2012-09-18 16:10:55.023 marsha
The best solutions I have so far is:
SELECT
a.[Date_Executed]
a.[Check_id],
a.[Comments],
b.[frequency],
b.[Check_id],
b.[description]
FROM
[checksdb].[dbo].events as a,
[checksdb].[dbo].checks as b
where
b.active = 1
and a.[Date_Executed] < = dateadd(HOUR,b.[frequency],GETDATE())
and a.Check_id = b.Check_id
order by Check_id, priority
and
select MAX(date_Executed), Task_id from daily_check_events group by Task_id
Neither of which gets me what I need, I could really use some help.
Since you are using SQL Server, which supports Common Table Expressions and window functions, try this:
WITH latestEvents
AS
(
SELECT Event_id, Check_id, [Comment], Date_Executed, User_Executed,
ROW_NUMBER() OVER(PARTITION BY Check_ID ORDER BY DATE_Executed DESC)
AS RowNum
FROM events
)
SELECT a.[Check_id], a.[description],
b.[Date_Executed], b.[Comment]
FROM checks a
INNER JOIN latestEvents b
on a.check_ID = b.check_ID
WHERE b.RowNum = 1 AND
a.active = 1
-- other conditions here
SQLFiddle Demo
The above query will only work on an RDBMS that supports window functions. Alternatively, use the query below, which works on most RDBMSs:
SELECT a.Check_id, a.description,
c.Date_Executed, c.Comment
FROM checks a
INNER JOIN
(
SELECT check_id, MAX(Date_Executed) maxExecuted
FROM events
GROUP BY check_ID
) b ON a.check_ID = b.check_ID
INNER JOIN events c
ON c.check_ID = b.check_ID AND
c.date_executed = b.maxExecuted
WHERE a.active = 1
SQLFiddle Demo
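For readers who want to trace the "latest row per group" step by hand, here is a small Python sketch over the sample tables. It covers only the part the queries implement (latest event per active check); the TimeFrame cut-off is left out, as in the answer, and `latest_events` is an illustrative name.

```python
from datetime import datetime

# (Check_id, description, TimeFrame, active)
checks = [
    (1, "Task One", -24, 0),
    (2, "Task Two", -24, 0),
    (3, "Task Forty", -48, 1),
    (4, "Task Somehin", -128, 1),
]
# (Event_id, Check_id, Comment, Date_Executed)
events = [
    (1, 1, None, datetime(2012, 9, 18, 16, 10, 44, 917000)),
    (2, 1, None, datetime(2012, 9, 25, 11, 39, 1)),
    (3, 4, "Failed", datetime(2012, 9, 25, 13, 20, 9, 930000)),
    (4, 4, "Half failed", datetime(2012, 9, 25, 13, 5, 9, 953000)),
    (5, 2, None, datetime(2012, 9, 25, 14, 2, 24)),
    (6, 3, None, datetime(2012, 9, 18, 16, 10, 55, 23000)),
]

def latest_events(checks, events):
    """Return (check_id, description, event_id, comment, executed) per active check."""
    latest = {}
    for event_id, check_id, comment, executed in events:
        # Keep the event with the greatest Date_Executed for each check.
        if check_id not in latest or executed > latest[check_id][2]:
            latest[check_id] = (event_id, comment, executed)
    return [
        (check_id, description) + latest[check_id]
        for check_id, description, _timeframe, active in checks
        if active == 1 and check_id in latest
    ]

for row in latest_events(checks, events):
    print(row)
```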

How to write Oracle query to find a total length of possible overlapping from-to dates

I'm struggling to find the query for the following task
I have the following data and want to find the total network day for each unique ID
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
Problem is the date range can be overlapping and I can't come up with SQL that will give me the following results
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
In case the network day calculation is not possible just get to the second table would be sufficient.
Hope my question is clear
We can use Oracle Analytics, namely the "OVER ... PARTITION BY" clause, in Oracle to do this. The PARTITION BY clause is kind of like a GROUP BY but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups. As we operate on each row we can then access the columns of the previous row above. This is the feature PARTITION BY gives us. (PARTITION BY is not related to partitioning of a table for performance.)
So then how do we output the non-overlapping dates? We first order the query based on the (ID,DFROM) fields, then we use the ID field to make our partitions (row groups). We then test the previous row's TO value and the current row's FROM value for overlap using an expression like: (in pseudo code)
max(previous.DTO, current.DFROM) as DFROM
This basic expression will return the original DFROM value if it doesn't overlap, but will return the previous TO value if there is overlap. Since our rows are ordered we only need to be concerned with the last row. In cases where a previous row completely overlaps the current row we want the row then to have a 'zero' date range. So we do the same thing for the DTO field to get:
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
Once we have generated the new results set with the adjusted DFROM and DTO values, we can aggregate them up and count the range intervals of DFROM and DTO.
Be aware that most date calculations in databases are not inclusive, whereas your data is. So something like DATEDIFF(dto,dfrom) will not include the day dto actually refers to, so we will want to adjust dto up a day first.
I don't have access to an Oracle server anymore but I know this is possible with Oracle Analytics. The query should go something like this:
(Please update my post if you get this to work.)
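SELECT id,
       GREATEST(dfrom, NVL(prev_dto, dfrom)) AS dfrom,
       GREATEST(dto,   NVL(prev_dto, dto))   AS dto
FROM (
    SELECT id, dfrom, dto,
           -- latest end date among the earlier rows of the same id
           MAX(dto) OVER (PARTITION BY id ORDER BY dfrom, dto
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_dto
    FROM (
        SELECT id, dfrom, dto + 1 AS dto  -- adjust the table so that dto becomes non-inclusive
        FROM my_sample
    )
) s;
The secret here is the MAX(dto) OVER (PARTITION BY id ORDER BY dfrom, dto ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the latest end date among the rows before the current one. GREATEST is Oracle's two-argument maximum, and NVL covers the first row of each id, which has no previous row.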
So this query should output new dfrom/dto values which don't overlap. It's then a simple matter of sub-querying this, computing (dto-dfrom) and summing the totals.
Using MySQL
I did have access to a MySQL server, so I did get it working there. MySQL doesn't have result partitioning (Analytics) like Oracle, so we have to use result set variables. This means we use @var:=xxx type expressions to remember the last date value and adjust the dfrom/dto accordingly. Same algorithm, just a little longer and with more complex syntax. We also have to forget the last date value any time the ID field changes!
So here is the sample table (same values you have):
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
On to the query, we output the un-grouped result set like above:
The variable @ldt is "last date", and the variable @lid is "last id". Any time @lid changes, we reset @ldt to null. FYI, in MySQL the := operator is where the assignment happens; an = operator is just an equality test.
This is a 3-level query, but it could be reduced to 2. I went with an extra outer query to keep things more readable. The innermost query is simple: it adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query simply drops the unused fields and calculates the interval range.
set @ldt=null, @lid=null;
select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days from (
  select if(@lid=id,@ldt,@ldt:=null) as last, dfrom, dto,
         if(@ldt>=dfrom,@ldt,dfrom) as no_dfrom,
         if(@ldt>=dto,@ldt,dto) as no_dto,
         @ldt:=if(@ldt>=dto,@ldt,dto), @lid:=id as id,
         datediff(dto, dfrom) as overlapped_days
  from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The above query gives the results (notice dfrom/dto are non-overlapping here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
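The same "remember the last end date and clip against it" algorithm can be sketched in Python, which may make the result table above easier to verify. This is an illustration only: `non_overlapping` is a hypothetical name, ties on (id, dfrom) are kept in input order, and the day counts are calendar days, not network days.

```python
from datetime import date, timedelta
from itertools import groupby

# Same values as the sample table: (id, dfrom, dto), dto inclusive
sample = [
    (1, date(2012, 9, 3), date(2012, 9, 7)),
    (1, date(2012, 9, 3), date(2012, 9, 4)),
    (1, date(2012, 9, 5), date(2012, 9, 6)),
    (1, date(2012, 9, 6), date(2012, 9, 12)),
    (1, date(2012, 8, 31), date(2012, 9, 4)),
    (2, date(2012, 9, 4), date(2012, 9, 6)),
    (2, date(2012, 9, 11), date(2012, 9, 13)),
    (2, date(2012, 9, 5), date(2012, 9, 8)),
]

def non_overlapping(rows):
    """Clip each range against the latest end date seen so far for the same id."""
    out = []
    # sorted() is stable, so rows with equal (id, dfrom) keep their input order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for rid, group in groupby(ordered, key=lambda r: r[0]):
        last = None  # reset the "last date" whenever the id changes
        for _, dfrom, dto in group:
            dto = dto + timedelta(days=1)  # make the end date non-inclusive
            new_from = dfrom if last is None else max(last, dfrom)
            new_to = dto if last is None else max(last, dto)
            out.append((rid, new_from, new_to, (new_to - new_from).days))
            last = new_to
    return out

totals = {}
for rid, dfrom, dto, days in non_overlapping(sample):
    totals[rid] = totals.get(rid, 0) + days
print(totals)
```

Summing the days column per id gives the total covered calendar days; weekends and holidays would still need to be excluded to get network days.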
How about constructing a query which merges intervals by removing holes and considering only maximal intervals? It goes like this (not tested):
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
select distinct id,
start_date
from t_data_rn
model
partition by (rn, id)
dimension by (0 as i)
measures(start_date, end_date)
rules
( start_date[for i
from 1
to end_date[0]-start_date[0]
increment 1] = start_date[0] + cv(i),
end_date[any] = start_date[cv()] + 1
)
order by 1,2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
t_data - your initial data
t_holidays - contains list of holidays
t_data_rn - just adds unique key (rownum) to each row of t_data
t_model - expands t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as working day or weekend based on day of week (Sat and Sun) and holidays list
final query - calculates number of network day per each group.
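The steps listed above (expand each range into single dates, de-duplicate, then count the working days) can be sketched in Python for comparison. This is a rough equivalent, not a translation of the MODEL clause; `network_days` is an illustrative name and weekends are assumed to be Saturday and Sunday.

```python
from datetime import date, timedelta

# Same ranges as t_data: (id, start_date, end_date), both ends inclusive
t_data = [
    (1, date(2012, 9, 3), date(2012, 9, 7)),
    (1, date(2012, 9, 3), date(2012, 9, 4)),
    (1, date(2012, 9, 5), date(2012, 9, 6)),
    (1, date(2012, 9, 6), date(2012, 9, 12)),
    (1, date(2012, 8, 31), date(2012, 9, 4)),
    (2, date(2012, 9, 4), date(2012, 9, 6)),
    (2, date(2012, 9, 11), date(2012, 9, 13)),
    (2, date(2012, 9, 5), date(2012, 9, 8)),
]
t_holidays = {date(2012, 1, 1)}

def network_days(ranges, holidays=frozenset()):
    """Count distinct working days covered by the ranges of each id."""
    days_by_id = {}
    for rid, start, end in ranges:
        day = start
        while day <= end:
            # A set removes dates covered by more than one overlapping range.
            days_by_id.setdefault(rid, set()).add(day)
            day += timedelta(days=1)
    return {
        rid: sum(1 for d in days if d.weekday() < 5 and d not in holidays)
        for rid, days in days_by_id.items()
    }

print(network_days(t_data, t_holidays))
```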

SQL gaps in dates

I am trying to find gaps in a table based on a state code. The tables look like this:
StateTable:
StateID (PK) | Code
--------------------
1 | AK
2 | AL
3 | AR
StateModel Table:
StateModelID | StateID | EfftiveDate | ExpirationDate
-------------------------------------------------------------------------
1 | 1 | 2012-06-28 00:00:00.000| 2012-08-02 23:59:59.000
2 | 1 | 2012-08-03 00:00:00.000| 2050-12-31 23:59:59.000
3 | 1 | 2055-01-01 00:00:00.000| 2075-12-31 23:59:59.000
The query I am using is the following:
Declare @gapMessage varchar(250)
SET @gapMessage = ''
select
    @gapMessage = @gapMessage +
    (Select StateTable.Code FROM StateTable where t1.StateID = StateTable.StateID)
    + ' Row ' +CAST(t1.StateModelID as varchar(6))+' has a gap with '+
    CAST(t2.StateModelID as varchar(6))+ CHAR(10)
from StateModel t1
inner join StateModel t2
    on
    t1.StateID = t2.StateID
    and DATEADD(ss, 1,t1.ExpirationDate) < t2.EffectiveDate
    and t1.EffectiveDate < t2.EffectiveDate
if(@gapMessage != '')
begin
    Print 'States with a gap problem'
    PRINT @gapMessage
end
else
begin
    PRINT 'No States with a gap problem'
end
But with the above table example I get the following output:
States with a gap problem
AK Row 1 has a gap with 3
AK Row 2 has a gap with 3
Is there any way to restructure my query so that the gap between 1 and 3 does not display, because there is not a gap between 1 and 2?
I am using MS sql server 2008
Thanks
WITH
sequenced AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EfftiveDate) AS SequenceID,
*
FROM
StateModel
)
SELECT
*
FROM
sequenced AS a
INNER JOIN
sequenced AS b
ON a.StateID = b.StateID
AND a.SequenceID = b.SequenceID - 1
WHERE
a.ExpirationDate < DATEADD(second, -1, b.EfftiveDate)
To make this as efficient as possible, also add an index on (StateID, EfftiveDate).
I wanted to just give credit to MatBailie, but don't have the points to do it yet, so I thought I would help out anyone else looking for a similar solution that may want to take it a step further like I needed to. I have changed my application of his code (which involves member enrollment) to the same language as the example here.
In my case, I needed these things:
I have two similar tables that I need to develop into one total table. In this example, let's make the tables like this: SomeStates + OtherStates = UpdatedTable. These are UNIONED in the AS clause.
I didn't want to remove any rows due to gaps, but I wanted to flag them on the StateID level. This is added as an additional column 'StateID_GapFlag'.
I also wanted to add a column to hold the oldest or MIN(EffectiveDate). This would be used in later calculations of SUM(period) to get a total duration, excluding gaps. This is the column 'MIN_EffectiveDate'.
;WITH sequenced
    ( SequenceID
     ,StateID
     ,EffectiveDate
     ,ExpirationDate)
AS
(select
     ROW_NUMBER() OVER (PARTITION BY StateID ORDER by EffectiveDate) as SequenceID,
     *
 from (select StateID, EffectiveDate, ExpirationDate from SomeStates
       UNION ALL
       select StateID, EffectiveDate, ExpirationDate from OtherStates
      ) StateModel
 where
     EffectiveDate > 'filter'
)
Select DISTINCT
     seq.StateID
    ,IJ1.[MIN_EffectiveDate]
    ,coalesce(LJ2.StateID_GapFlag,'') as [StateID_GapFlag]
    ,seq.EffectiveDate
    ,seq.ExpirationDate
into UpdatedTable
from sequenced seq
inner join
    -- oldest EffectiveDate per state
    (select StateID, min(EffectiveDate) as 'MIN_EffectiveDate'
     from sequenced
     group by StateID
    ) IJ1
    on seq.StateID = IJ1.StateID
left join
    -- states with at least one gap between consecutive rows
    (select a.StateID, 'GAP' as 'StateID_GapFlag'
     from sequenced a
     inner join
          sequenced b
          on a.StateID = b.StateID
          and a.SequenceID = (b.SequenceID - 1)
     where a.ExpirationDate < DATEADD(day, -1, b.EffectiveDate)
    ) LJ2
    on seq.StateID = LJ2.StateID
You could use ROW_NUMBER to provide an ordering of the StateModel rows for each state, then check that the difference in seconds between consecutive rows doesn't exceed 1. Something like:
;WITH Models (StateModelID, StateID, Effective, Expiration, RowOrder) AS (
    SELECT StateModelID, StateID, EffectiveDate, ExpirationDate,
           ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate)
    FROM StateModel
)
SELECT F.StateModelId, S.StateModelId
FROM Models F
CROSS APPLY (
    -- the next row for the same state, if it starts more than 1 second later
    SELECT M.StateModelId
    FROM Models M
    WHERE M.RowOrder = F.RowOrder + 1
      AND M.StateId = F.StateId
      AND DATEDIFF(SECOND, F.Expiration, M.Effective) > 1
) S
This will get you the state model IDs of the rows with gaps, which you can format how you wish.
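The consecutive-row comparison used in these answers can be sketched in Python to confirm that only rows 2 and 3 are reported for the sample data. This is an illustration under the same one-second tolerance; `find_gaps` is a hypothetical name.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (StateModelID, StateID, EffectiveDate, ExpirationDate)
state_models = [
    (1, 1, datetime(2012, 6, 28), datetime(2012, 8, 2, 23, 59, 59)),
    (2, 1, datetime(2012, 8, 3), datetime(2050, 12, 31, 23, 59, 59)),
    (3, 1, datetime(2055, 1, 1), datetime(2075, 12, 31, 23, 59, 59)),
]

def find_gaps(models):
    """Return (previous_id, next_id) for consecutive rows of a state with a gap."""
    gaps = []
    # Order by state, then by effective date, like ROW_NUMBER() ... PARTITION BY StateID.
    ordered = sorted(models, key=lambda m: (m[1], m[2]))
    for prev, cur in zip(ordered, ordered[1:]):
        same_state = prev[1] == cur[1]
        # A gap exists when the next row starts more than 1 second after the previous one ends.
        if same_state and cur[2] - prev[3] > timedelta(seconds=1):
            gaps.append((prev[0], cur[0]))
    return gaps

print(find_gaps(state_models))
```

Because only adjacent rows are compared, row 1 is never paired with row 3, which is what the question asks for.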