How to find the number of occurences within a date range? - sql

Let's say I have hospital visits in the table TestData
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
How would I code this in SQL?
I have patient_id as a TEXT
the date is date_visit is also TEXT and takes the format MM/DD/YYYY
patient_id
date_visit
A123B29133
07/12/2011
A123B29133
07/14/2011
A123B29133
07/20/2011
A123B29134
12/05/2016
In the above table patient A123B29133 fulfills the condition as they were seen on 07/14/2011 which is less that 7 days from 07/12/2011

You can use a subquery with exists:
with to_d(id, v_date) as (
select patient_id, substr(date_visit, 7, 4)||"-"||substr(date_visit, 1, 2)||"-"||substr(date_visit, 4, 2) from visits
)
select t2.id from (select t1.id, min(t1.v_date) d1 from to_d t1 group by t1.id) t2
where exists (select 1 from to_d t3 where t3.id = t2.id and t3.v_date != t2.d1 and t3.v_date <= date(t2.d1, '+7 days'))
id
A123B29133

Since your date column is not in YYYY-MM-DD which is the default value used by several sqlite date functions, the substr function was used to transform your date in this format. JulianDay was then used to convert your dates to an integer value which would ease the comparison of 7 days. The MIN window function was used to identify the first hospital visit date for that patient. The demo fiddle and samples show the query that was used to transform the data and the results before the final query which filters based on your requirements i.e. < 7 days. With this approach using window functions, you may also retrieve the visit_date and the number of days since the first visit date if desired.
You may read more about sqlite date functions here.
Query #1
SELECT
patient_id,
visit_date,
JulianDay(visit_date) -
MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id)
as num_of_days_since_first_visit
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,0,3) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v;
patient_id
visit_date
num_of_days_since_first_visit
A123B29133
2011-07-12
0
A123B29133
2011-07-14
2
A123B29133
2011-07-20
8
A123B29134
2016-12-05
0
Query #2
The below is your desired query, which uses the previous query as a CTE and applies the filter for visits less than 7 days. num_of_days <> 0 is applied to remove entries where the first date is also the date of the record.
WITH num_of_days_since_first_visit AS (
SELECT
patient_id,
visit_date,
JulianDay(visit_date) - MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id) num_of_days
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,0,3) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v
)
SELECT DISTINCT
patient_id
FROM
num_of_days_since_first_visit
WHERE
num_of_days <> 0 AND num_of_days < 7;
patient_id
A123B29133
View on DB Fiddle
Let me know if this works for you.

I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
You can use lag(). The following gets all rows where this is true:
select t.*
from (select t.*,
lag(date_visit) over (partition by patient_id order by date_visit) as prev_date_visit
from t
) t
where prev_date_visit >= date(date_visit, '-7 day');
If you just want the patient_ids, you can use select distinct patient_id.

Related

Finding id's available in previous weeks but not in current week

How to find if an id which was present in previous weeks but not available in current week on a rolling basis. For e.g
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found out that id 1 and 2 are missing in week 2 and id 2,4,7,8 are missing in week 3 from previous 2 weeks But how to do this on a rolling window for a large amount of data distributed over a period of 20+ years
Please find the sample dataset and expected output. I am expecting the output to be partitioned based on the week_end Date
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Excepted Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition1: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-10
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use below query,
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK)END,
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.*, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;

Count Data by Loop Calendar SQL/Oracle

I need to get the data that generates count of total ID by date between date_active and date_end using date ranges for each. If the dates are crossing each other the ID will adding up. here is the data I have right now,
TABLE CONTRACT:
ID DATE_ACTIVE DATE_END
1 05-FEB-13 08-NOV-13
1 21-DEC-18 06-OCT-19
2 05-FEB-13 27-JAN-14
3 05-FEB-13 07-NOV-13
4 06-FEB-13 02-NOV-13
4 25-OCT-14 13-APR-16
TABLE CALENDAR:
DT
05-FEB-13
06-FEB-13
07-FEB-13
08-FEB-13
09-FEB-13
..-DEC-19
what I want out is basically like this:
DT COUNT(ID)
05-FEB-13 3
06-FEB-13 4
07-FEB-13 4
08-FEB-13 4
09-FEB-13 4
10-FEB-13 4
....
03-NOV-13 3
....
08-NOV-13 2
09-NOV-13 1
....
28-JAN-14 0
....
25-OCT-14 1
....
13-APR-16 1
14-APR-16 0
....
21-DEC-18 1
....
06-OCT-19 1
07-OCT-19 0
....
....
And here is my query to get that result
with contract as (
select * from contract
where id in ('1','2','3','4')
)
,
cal as
(
select TRUNC (SYSDATE - ROWNUM) dt
from dual
connect by rownum < sysdate - to_date('05-FEB-13')
)
select aa.dt,count(distinct bb.id)id from cal aa
left join contract bb on aa.dt >= bb.date_active and aa.dt<= bb.date_end
group by aa.dt
order by 1
but the problem is I have 6 mio of ID and if I use this kind of query, the result maybe will take forever, and I'm having a hard times to figured out how to get the result with different query. It will be my pleasure if somebody can help me out of this. Thank you so much.
If you group your events by date_active and date_end, you will get the numbers of events which have started and ended on each separate day.
Not a lot of days have passed between 2013 and 2019 (about 2 000), so the grouped resultsets will be relatively short.
Now that you have the two groups, you can notice that the number of events on each given date is the number of events which have started on or before this date, minus the number of events which have finished on or before this date (I'm assuming the end dates are non-inclusive).
In other words, the number of events on every given day is:
The number of events on the previous date,
plus the number of events started on this date,
minus the number of events ended on this date.
This can be easily done using a window function.
This will require a join between the calendar table and the two groups, but fortunately all of them are relatively short (thousands of records) and the join would be fast.
Here's the query: http://sqlfiddle.com/#!4/b21ce/5
WITH cal AS
(
SELECT TRUNC (to_date('01-NOV-13') - ROWNUM) dt
FROM dual
CONNECT BY
rownum < to_date('01-NOV-13')- to_date('01-FEB-13')
),
started_on AS
(
SELECT date_active AS dt, COUNT(*) AS cnt_start
FROM contract
GROUP BY
date_active
),
ended_on AS
(
SELECT date_end AS dt, COUNT(*) AS cnt_end
FROM contract
GROUP BY
date_end
)
SELECT dt,
SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM cal c
LEFT JOIN
started_on s
USING (dt)
LEFT JOIN
ended_on e
USING (dt)
(I used a fixed date instead of SYSDATE to keep the resultset short, but the idea is the same)
This query requires that the calendar starts before the earliest event, otherwise every result will be off by a fixed amount, the number of events before the beginning of the calendar.
You can replace the fixed date in the calendar condition with (SELECT MIN(date_active) FROM contract) which is instant if date_active is indexed.
Update:
If your contract dates can overlap and you want to collapse multiple overlapping contracts into a one continuous contract, you can use window functions to do so.
WITH cal AS
(
SELECT TRUNC (to_date('01-NOV-13') - ROWNUM) dt
FROM dual
CONNECT BY
rownum <= to_date('01-NOV-13')- to_date('01-FEB-13')
),
collapsed_contract AS
(
SELECT *
FROM (
SELECT c.*,
COALESCE(LAG(date_end_effective) OVER (PARTITION BY id ORDER BY date_active), date_active) AS date_start_effective
FROM (
SELECT c.*,
MAX(date_end) OVER (PARTITION BY id ORDER BY date_active) AS date_end_effective
FROM contract c
) c
) c
WHERE date_start_effective < date_end_effective
),
started_on AS
(
SELECT date_start_effective AS dt, COUNT(*) AS cnt_start
FROM collapsed_contract
GROUP BY
date_start_effective
),
ended_on AS
(
SELECT date_end_effective AS dt, COUNT(*) AS cnt_end
FROM collapsed_contract
GROUP BY
date_end_effective
)
SELECT dt,
SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM cal c
LEFT JOIN
started_on s
USING (dt)
LEFT JOIN
ended_on e
USING (dt)
http://sqlfiddle.com/#!4/adeba/1
The query might seem bulky, but that's to make it more efficient, as all these window functions can be calculated in a single pass over the table.
Note however that this single pass relies on the table being sorted on (id, date_active) so an index on these two fields is crucial.
Firstly, row_number() over (order by id,date_active) analytic function is used in order to generate unique ID values those will be substituted in
connect by level <= ... and prior id = id syntax to get unpivoted hierarchical data :
with t0 as
(
select row_number() over (order by id,date_active) as id, date_active, date_end
from contract
), t1 as
(
select date_active + level - 1 as dt
from t0
connect by level <= date_end - date_active + 1
and prior id = id
and prior sys_guid() is not null
)
select dt, count(*)
from t1
group by dt
order by dt
Demo

SQL count occurrences in window

I have user logins by date. My requirement is to track the number of users that have been logged in during the past 90 days window.
I am new to both SQL in general and Teradata specifically and I can't get the window functionality to work as I need.
I need the following result, where ACTIVE is a count of the unique USER_IDs that appear in the previous 90 day window the DATE.
DATES ACTIVE_IN_WINDOW
12/06/2018 20
13/06/2018 45
14/06/2018 65
15/06/2018 73
17/06/2018 24
18/06/2018 87
19/06/2018 34
20/06/2018 51
Currently my script is as follows.
It is this line here that I cant get right
COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING)
I suspect I need a different set of functions to make this work.
SELECT b.DATES , a.ACTIVE_IN_WINDOW
FROM
(
SELECT
CAST(CALENDAR_DATE AS DATE) AS DATES FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN ADD_MONTHS(CURRENT_DATE, - 10) AND CURRENT_DATE
) b
LEFT JOIN
(
SELECT USER_ID , EVT_DT
, COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING) AS ACTIVE_IN_WINDOW
FROM ENV0.R_ONBOARDING
) a
ON a.EVT_DT = b.DATES
ORDER BY b.DATES
Thank you for any assistance.
The logic is similar to Gordon', but a non-equi-Join instead of a Correlated Scalar Subquery is usually more efficient on Teradata:
SELECT b.DATES , Count(DISTINCT USER_ID)
FROM
(
SELECT CALENDAR_DATE AS DATES
FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN Add_Months(Current_Date, - 10) AND Current_Date
) b
LEFT JOIN
( -- apply DISTINCT before aggregation to reduce intermediate spool
SELECT DISTINCT USER_ID, EVT_DT
FROM ENV0.R_ONBOARDING
) AS a
ON a.EVT_DT BETWEEN Add_Months(b.DATES,-3) AND b.DATES
GROUP BY 1
ORDER BY 1
Of course this will require a large spool and much CPU.
Edit:
Switching to weeks reduces the overhead, I'm using dates instead of week numbers (it's easier to modify for other ranges):
SELECT b.Week , Count(DISTINCT USER_ID)
FROM
( -- Return only Mondays instead of DISTINCT over all days
SELECT calendar_date AS Week
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN Add_Months(Current_Date, -9) AND Current_Date
AND day_of_week = 2 -- 2 = Monday
) b
LEFT JOIN
(
SELECT DISTINCT USER_ID,
-- td_monday returns the previous Monday, but we need the following monday
-- covers the previous Tuesday up to the current Monday
Td_Monday(EVT_DT+6) AS PERIOD_WEEK
FROM ENV0.R_ONBOARDING
-- You should add another condition to limit the actually covered date range, e.g.
-- where EVT_DT BETWEEN Add_Months(b.DATES,-13) AND b.DATES
) AS a
ON a.PERIOD_WEEK BETWEEN b.Week-(12*7) AND b.Week
GROUP BY 1
ORDER BY 1
Explain should duplicate the calendar as preparation for the product join, if not you might need to materialize the dates in a Volatile Table. Better don't use sys_calendar, there are no statistics, e.g. optimizer doesn't know about how many days per week/month/year, etc. Check your system, there should be a calendar table designed for you company needs (with stats on all columns)
If your data is not too big, a subquery might be the simplest method:
SELECT c.dte,
(SELECT COUNT(DISTINCT o.USER_ID)
FROM ENV0.R_ONBOARDING o
WHERE o.EVT_DT > ADD_MONTHS(dte, -3) AND
o.EVT_DT <= dte
) as three_month_count
FROM (SELECT CAST(CALENDAR_DATE AS DATE) AS dte
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN ADD_MONTHS(CURRENT_DATE, - 10) AND CURRENT_DATE
) c;
You might want to start on a shorter timeframe then 3 months to see how the query performs.

Same output in two different lateral joins

I'm working on a bit of PostgreSQL to grab the first 10 and last 10 invoices of every month between certain dates. I am having unexpected output in the lateral joins. Firstly the limit is not working, and each of the array_agg aggregates is returning hundreds of rows instead of limiting to 10. Secondly, the aggregates appear to be the same, even though one is ordered ASC and the other DESC.
How can I retrieve only the first 10 and last 10 invoices of each month group?
SELECT first.invoice_month,
array_agg(first.id) first_ten,
array_agg(last.id) last_ten
FROM public.invoice i
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id ASC
LIMIT 10
) first ON i.id = first.id
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id DESC
LIMIT 10
) last on i.id = last.id
WHERE i.invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
GROUP BY first.invoice_month, last.invoice_month;
This can be done with a recursive query that will generate the interval of months for who we need to find the first and last 10 invoices.
WITH RECURSIVE all_months AS (
SELECT date_trunc('month','2018-01-01'::TIMESTAMP) as c_date, date_trunc('month', '2018-05-11'::TIMESTAMP) as end_date, to_char('2018-01-01'::timestamp, 'YYYY-MM') as current_month
UNION
SELECT c_date + interval '1 month' as c_date,
end_date,
to_char(c_date + INTERVAL '1 month', 'YYYY-MM') as current_month
FROM all_months
WHERE c_date + INTERVAL '1 month' <= end_date
),
invocies_with_month as (
SELECT *, to_char(invoice_date::TIMESTAMP, 'YYYY-MM') invoice_month FROM invoice
)
SELECT current_month, array_agg(first_10.id), 'FIRST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date ASC limit 10
) first_10 ON TRUE
GROUP BY current_month
UNION
SELECT current_month, array_agg(last_10.id), 'LAST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date DESC limit 10
) last_10 ON TRUE
GROUP BY current_month;
In the code above, '2018-01-01' and '2018-05-11' represent the dates between we want to find the invoices. Based on those dates, we generate the months (2018-01, 2018-02, 2018-03, 2018-04, 2018-05) that we need to find the invoices for.
We store this data in all_months.
After we get the months, we do a lateral join in order to join the invoices for every month. We need 2 lateral joins in order to get the first and last 10 invoices.
Finally, the result is represented as:
current_month - the month
array_agg - ids of all selected invoices for that month
type - type of the selected invoices ('first 10' or 'last 10').
So in the current implementation, you will have 2 rows for each month (if there is at least 1 invoice for that month). You can easily join that in one row if you need to.
LIMIT is working fine. It's your query that's broken. JOIN is just 100% the wrong tool here; it doesn't even do anything close to what you need. By joining up to 10 rows with up to another 10 rows, you get up to 100 rows back. There's also no reason to self join just to combine filters.
Consider instead window queries. In particular, we have the dense_rank function, which can number every row in the result set according to groups:
SELECT
invoice_month,
time_of_month,
ARRAY_AGG(id) invoice_ids
FROM (
SELECT
id,
invoice_month,
-- Categorize as end or beginning of month
CASE
WHEN month_rank <= 10 THEN 'beginning'
WHEN month_reverse_rank <= 10 THEN 'end'
ELSE 'bug' -- Should never happen. Just a fall back in case of a bug.
END AS time_of_month
FROM (
SELECT
id,
invoice_month,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date) month_rank,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date DESC) month_rank_reverse
FROM (
SELECT
id,
invoice_date,
to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) AS fiscal_year_invoices
) ranked_invoices
-- Get first and last 10
WHERE month_rank <= 10 OR month_reverse_rank <= 10
) first_and_last_by_month
GROUP BY
invoice_month,
time_of_month
Don't be intimidated by the length. This query is actually very straightforward; it just needed a few subqueries.
This is what it does logically:
Fetch the rows for the fiscal year in question
Assign a "rank" to the row within its month, both counting from the beginning and from the end
Filter out everything that doesn't rank in the 10 top for its month (counting from either direction)
Adds an indicator as to whether it was at the beginning or end of the month. (Note that if there's less than 20 rows in a month, it will categorize more of them as "beginning".)
Aggregate the IDs together
This is the tool set designed for the job you're trying to do. If really needed, you can adjust this approach slightly to get them into the same row, but you have to aggregate before joining the results together and then join on the month; you can't join and then aggregate.

Counting an already counted column in SQL (db2)

I'm pretty new to SQL and have this problem:
I have a filled table with a date column and other not interesting columns.
date | name | name2
2015-03-20 | peter | pan
2015-03-20 | john | wick
2015-03-18 | harry | potter
What im doing right now is counting everything for a date
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
what i want to do now is counting the resulting lines and only returning them if there are less then 10 resulting lines.
What i tried so far is surrounding the whole query with a temp table and the counting everything which gives me the number of resulting lines (yeah)
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select count(*)
from temp_count
What is still missing the check if the number is smaller then 10.
I was searching in this Forum and came across some "having" structs to use, but that forced me to use a "group by", which i can't.
I was thinking about something like this :
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
having count(*) < 10
maybe im too tired to think of an easy solution, but i can't solve this so far
Edit: A picture for clarification since my english is horrible
http://imgur.com/1O6zwoh
I want to see the 2 columned results ONLY IF there are less then 10 rows overall
I think you just need to move your having clause to the inner query so that it is paired with the GROUP BY:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
having count(*) < 10
)
select *
from temp_count
If what you want is to know whether the total # of records (after grouping), are returned, then you could do this:
with temp_count (date, counter) as
(
select date, counter=count(*)
from testtable
where date >= current date - 10 days
group by date
)
select date, counter
from (
select date, counter, rseq=row_number() over (order by date)
from temp_count
) x
group by date, counter
having max(rseq) >= 10
This will return 0 rows if there are less than 10 total, and will deliver ALL the results if there are 10 or more (you can just get the first 10 rows if needed with this also).
In your temp_count table, you can filter results with the WHERE clause:
with temp_count (date, counter) as
(
select date, count(distinct date)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
where counter < 10
Something like:
with t(dt, rn, cnt) as (
select dt, row_number() over (order by dt) as rn
, count(1) as cnt
from testtable
where dt >= current date - 10 days
group by dt
)
select dt, cnt
from t where 10 >= (select max(rn) from t);
will do what you want (I think)