Counts based on an "as of" date - sql

My apologies if I'm not wording the question correctly; that may be why I can't find any previous questions or answers on this.
My specific situation can be generalized as:
I have a table containing records of bed assignments for patients at a system of hospitals. A patient's placement into a bed is tagged with a date, and a reason for their placement there.
Patient |Hospital |Bed |Reason |Date
--------|---------|----|-------|--------
1234 |HOSP1 |111 |A |1/1/2016
5678 |HOSP1 |222 |A |2/1/2016
9012 |HOSP2 |333 |B |3/1/2016
3456 |HOSP3 |444 |C |3/1/2016
2345 |HOSP3 |555 |A |3/1/2016
7890 |HOSP1 |111 |D |4/1/2016
Based on the very small sample set above, I need to get a count of the "Reasons", per Hospital, given an "as of" date. So given an "as of" date of 3/15/2016:
As of Date: 3/15/2016
Hospital|Reason |Count
--------|---------|-----
HOSP1 |A |2
HOSP2 |B |1
HOSP3 |A |1
HOSP3 |C |1
But when changing the "as of" date to 4/15/2016, I would hope to see the following:
As of Date: 4/15/2016
Hospital|Reason |Count
--------|---------|-----
HOSP1 |A |1
HOSP1 |D |1
HOSP2 |B |1
HOSP3 |A |1
HOSP3 |C |1
Any suggestions on the best route to accomplish this without melting my CPU or the servers? (My real record set is about 36m rows, going back 15 years.) My ultimate goal is to determine yearly averages of "reason" counts at each "hospital", but I assume the first step is to get these initial counts right (or is it?).

What you want is the most recent record for each bed on or before the "as of" date. This is pretty easy to do using window functions:
select hospital, reason, count(*)
from (select t.*,
             row_number() over (partition by hospital, bed order by date desc) as seqnum
      from t
      where date <= '2016-03-15'
     ) t
where seqnum = 1
group by hospital, reason;
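To check the window-function approach against the sample rows from the question, here is a minimal sketch that runs it on an in-memory SQLite database from Python. The table name bed_assignment, the column name assigned_on, and the helper counts_as_of are assumptions made for illustration; the dates are rewritten as ISO strings so that string comparison orders them correctly, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; dates are stored as ISO strings (an assumption)
# so that "<=" and "order by" behave chronologically in SQLite.
ROWS = [
    (1234, "HOSP1", 111, "A", "2016-01-01"),
    (5678, "HOSP1", 222, "A", "2016-02-01"),
    (9012, "HOSP2", 333, "B", "2016-03-01"),
    (3456, "HOSP3", 444, "C", "2016-03-01"),
    (2345, "HOSP3", 555, "A", "2016-03-01"),
    (7890, "HOSP1", 111, "D", "2016-04-01"),
]

# Keep only the latest assignment per (hospital, bed) on or before the
# as-of date, then count reasons per hospital.
QUERY = """
select hospital, reason, count(*) as cnt
from (select t.*,
             row_number() over (partition by hospital, bed
                                order by assigned_on desc) as seqnum
      from bed_assignment t
      where assigned_on <= ?
     ) latest
where seqnum = 1
group by hospital, reason
order by hospital, reason
"""


def counts_as_of(conn, as_of):
    """Return (hospital, reason, count) rows for the given ISO as-of date."""
    return conn.execute(QUERY, (as_of,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table bed_assignment "
    "(patient integer, hospital text, bed integer, reason text, assigned_on text)"
)
conn.executemany("insert into bed_assignment values (?, ?, ?, ?, ?)", ROWS)

march = counts_as_of(conn, "2016-03-15")
april = counts_as_of(conn, "2016-04-15")
print(march)
print(april)
```

On the real 36m-row table, an index covering hospital, bed, and the date column would likely matter for this query, but that depends on the database and is not tested here.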

Related

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify the ids that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
This would be simpler and faster with proper boolean values instead of integers.
(Answer to updated question)
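Here is easy-to-follow subquery logic that works with integers. The outcome test sits inside the aggregate rather than in a WHERE clause, so rows with outcome = 0 still count toward ever_treated:
select subq.ever_treated, sum(subq.count) as count
from (select id,
             max(treatment) as ever_treated,
             sum(case when outcome = 1 then 1 else 0 end) as count
      from t
      group by id) as subq
group by subq.ever_treated;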
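A note on filter placement, as a minimal sketch on in-memory SQLite: testing outcome inside the aggregate and filtering outcome = 1 in a WHERE clause before grouping agree on the question's data, but they can disagree when an id's only treated rows have outcome = 0. The extra id d below is hypothetical and not part of the question, and the helper run and the query constant names are illustrative.

```python
import sqlite3

# Rows from the question.
ROWS = [("a", 1, 0), ("a", 1, 1), ("b", 0, 1),
        ("c", 1, 0), ("c", 0, 1), ("c", 1, 1)]
# Hypothetical id: treated once (outcome 0), untreated once (outcome 1).
EXTRA = [("d", 1, 0), ("d", 0, 1)]

# ever_treated is computed over ALL rows of an id.
INSIDE_AGGREGATE = """
select ever_treated, sum(outcome_ct) as n
from (select id,
             max(treatment) as ever_treated,
             sum(case when outcome = 1 then 1 else 0 end) as outcome_ct
      from t
      group by id) sub
group by ever_treated
order by ever_treated
"""

# ever_treated is computed only over rows that survive the WHERE filter.
WHERE_FIRST = """
select ever_treated, sum(outcome_ct) as n
from (select id, max(treatment) as ever_treated, count(*) as outcome_ct
      from t
      where outcome = 1
      group by id) sub
group by ever_treated
order by ever_treated
"""


def run(rows, query):
    """Load rows into a fresh in-memory table t and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id text, treatment integer, outcome integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


original_inside = run(ROWS, INSIDE_AGGREGATE)
original_where = run(ROWS, WHERE_FIRST)
extended_inside = run(ROWS + EXTRA, INSIDE_AGGREGATE)
extended_where = run(ROWS + EXTRA, WHERE_FIRST)
print(original_inside, original_where)
print(extended_inside, extended_where)
```

With the hypothetical id d added, the WHERE-first form files d under "never treated", because the row that proves treatment has already been filtered out.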

Select rows from a table satisfying criteria for all rows in a child table which have at least one record in another table

We have a table "PROCESS" with process_id as its primary key.
Processes have "items", which are stored in another table "ITEM" with (process_id, item_id) as primary key.
Each item has "events", which are stored in yet another table "EVENT" with (process_id, item_id, event_id) as primary key. Each event has a type (stored in the "EVENT"."event_type" column).
Let's suppose there are events with type "A".
I want to select processes which have at least one event of type "A" for every one of their items (so if one item doesn't have such an event, I don't want that process in the result set).
I ended up with the following query:
SELECT needed_processes.process_id FROM (
    SELECT items.process_id, items.number_of_items, events.number_of_events FROM
        (SELECT process.process_id, count(*) number_of_items FROM process
            JOIN item ON process.process_id = item.process_id
            GROUP BY process.process_id
        ) items JOIN
        (SELECT needed_events.process_id, count(*) number_of_events FROM
            (SELECT process.process_id, item.item_id FROM process JOIN item
                ON process.process_id = item.process_id JOIN event ON item.process_id = event.process_id
                AND item.item_id = event.item_id
                WHERE event.event_type = 'A'
                GROUP BY process.process_id, item.item_id
            ) needed_events GROUP BY needed_events.process_id
        ) events ON items.process_id = events.process_id
    WHERE items.number_of_items = events.number_of_events) needed_processes
It counts the items for each process and checks that the number of items with an event of type "A" equals that count.
This query is hard to read, hard to understand, and doesn't look efficient.
Is there a simpler query (in terms of readability or performance) for this task?
Oracle-specific queries are fine; database-agnostic queries are also welcome.
Examples
Process
|process_id|
|1 |
|2 |
|3 |
|4 |
Item (an item always belongs to exactly one process)
|process_id|item_id|
|1 |11 |
|1 |12 |
|1 |13 |
|2 |14 |
|2 |15 |
|3 |16 |
Event (an event always belongs to exactly one item)
|process_id|item_id|event_id|event_type|
|1 |11 |21 |A |
|1 |11 |22 |A |
|1 |11 |23 |B |
|1 |13 |24 |A |
|2 |14 |25 |A |
|2 |14 |26 |A |
|2 |15 |27 |A |
|2 |15 |28 |B |
Result
|process_id|
|2 |
process_id=1 should be filtered out because it doesn't have an event of type A for item 12. It has two events of type A for item 11, but they should be treated as "item 11 has event A".
process_id=2 should be returned in the result set because it has events of type A for all its items. It has two events of type A for item 14, and this should not affect the result.
process_id=3 should not be returned because it doesn't have any events (so it doesn't have an event of type A for each of its items).
process_id=4 should not be returned because it doesn't have any items (corner case).
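This returns all processes where there's an event 'A' for every item. It starts from the item table so that items with no events at all (such as item 12) are still counted:
select i.process_id
from item i
left join event e
       on e.process_id = i.process_id
      and e.item_id = i.item_id
      and e.event_type = 'A'
group by i.process_id
having count(distinct i.item_id) -- all items
     = count(distinct e.item_id) -- only items with event 'A'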
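To check candidate queries against the example data, here is a sketch that loads the three tables into in-memory SQLite and runs two variants. The constant names ITEM_DRIVEN and EVENT_ONLY are illustrative. The item-driven variant left-joins the type-A events onto item, so an item without any events still counts as an item; the event-only variant never sees such items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table process (process_id integer primary key);
create table item (process_id integer, item_id integer,
                   primary key (process_id, item_id));
create table event (process_id integer, item_id integer, event_id integer,
                    event_type text,
                    primary key (process_id, item_id, event_id));
insert into process values (1), (2), (3), (4);
insert into item values (1, 11), (1, 12), (1, 13), (2, 14), (2, 15), (3, 16);
insert into event values
  (1, 11, 21, 'A'), (1, 11, 22, 'A'), (1, 11, 23, 'B'), (1, 13, 24, 'A'),
  (2, 14, 25, 'A'), (2, 14, 26, 'A'), (2, 15, 27, 'A'), (2, 15, 28, 'B');
""")

# Every item of the process must match at least one type-A event.
ITEM_DRIVEN = """
select i.process_id
from item i
left join event e
       on e.process_id = i.process_id
      and e.item_id = i.item_id
      and e.event_type = 'A'
group by i.process_id
having count(distinct i.item_id) = count(distinct e.item_id)
order by i.process_id
"""

# Reads only the event table, so items without any event are invisible.
EVENT_ONLY = """
select process_id
from event
group by process_id
having count(distinct item_id)
     = count(distinct case when event_type = 'A' then item_id end)
order by process_id
"""

item_driven = [row[0] for row in conn.execute(ITEM_DRIVEN)]
event_only = [row[0] for row in conn.execute(EVENT_ONLY)]
print(item_driven)
print(event_only)
```

On this data the event-only variant also returns process 1, because item 12 has no rows in the event table and therefore never enters the comparison.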

Calculating sum of jobs processed every month using two tables

I have these two tables:
TIME (this table contains the time_id which in turn gives the details like the day,month, year etc)
|time_id|hour|day|month|year|
_____________________________
|1234 |1 |6 |9 |2013|
_____________________________
|1235 |2 |7 |9 |2013|
_____________________________
|1223 |2 |4 |8 |2014|
_____________________________
|1227 |2 |8 |8 |2014|
SUM_JOBS_PROCESSED (this table contains the time_id and the no of jobs processed for this particular time_id.)
|time_id|sum_of_jobs_processed|
_______________________________
|1234 |5 |
_______________________________
|1235 |6 |
_______________________________
|1223 |4 |
_______________________________
|1227 |4 |
I am trying to write a query which should display something like this
|month|year|sum_of_jobs_processed|
__________________________________
|9 |2013| 11 |
__________________________________
|8 |2014| 8 |
__________________________________
It should display total number of jobs processed for a month.
Could anyone please help me with these? I am able to find total number of jobs processed for a day, but number of jobs processed for a month, is not happening.
Not sure I fully understood what you're trying, but I think this query should give you the desired result:
SELECT t.month,
t.year,
SUM(s.sum_of_jobs_processed)
FROM bspm_dim_time t
JOIN bspm_sum_jobs_day s
ON t.time_id = s.time_id
GROUP BY t.month,
t.year
ORDER BY t.year,
t.month
Try this:
SELECT month,
year,
sum(sum_of_jobs_processed)
FROM TIME
INNER JOIN SUM_JOBS_PROCESSED
ON TIME.time_id = SUM_JOBS_PROCESSED.time_id
GROUP BY month,
year
ORDER BY year,
month
Mark as answer if correct.
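As a quick check of the join-and-group approach on the sample data, here is a sketch on in-memory SQLite. The table names dim_time and sum_jobs are placeholders chosen for this example. It also shows why the sort order matters: ordering by year first gives chronological output, while ordering by month first does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dim_time (time_id integer primary key,
                       hour integer, day integer, month integer, year integer);
create table sum_jobs (time_id integer, sum_of_jobs_processed integer);
insert into dim_time values
  (1234, 1, 6, 9, 2013), (1235, 2, 7, 9, 2013),
  (1223, 2, 4, 8, 2014), (1227, 2, 8, 8, 2014);
insert into sum_jobs values (1234, 5), (1235, 6), (1223, 4), (1227, 4);
""")

# Join each time_id to its job count, then total per (year, month).
# The ORDER BY clause is filled in below to compare two orderings.
MONTHLY = """
select t.month, t.year, sum(s.sum_of_jobs_processed) as jobs
from dim_time t
join sum_jobs s on t.time_id = s.time_id
group by t.year, t.month
order by {order}
"""

chronological = conn.execute(MONTHLY.format(order="t.year, t.month")).fetchall()
by_month_first = conn.execute(MONTHLY.format(order="t.month, t.year")).fetchall()
print(chronological)
print(by_month_first)
```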

How to write an SQL statement to record the fact that Rabbit has one additional homework assigned

I have a table shown below, and I want to record the fact that Rabbit has one additional homework assigned.
id |name |partnerId |totalHW |lateHW |major
-----------------------------------------------
12 |Puma |17 |3 |0 |CS
14 |Rabbit |21 |7 |4 |SE
17 |Mouse |12 |5 |5 |CE
21 |Aardvark |32 |2 |0 |CS
22 |Cougar |24 |4 |2 |SE
24 |Zebra |28 |7 |3 |EE
27 |Orca |14 |2 |1 |CS
32 |Elephant |null |0 |null |S
How do I go about it? I can't see which column records that Rabbit has been assigned one additional homework. How should I write this statement?
I would do it like this:
UPDATE [tablename]
SET totalHW = totalHW + 1
WHERE name = 'Rabbit';
You'll likely come to a solution easily if you change to a normalized database structure. "totalHW" and "lateHW" are calculated fields based on other data, i.e., individual assignments. Instead, create a homework table, likely with a late field and whatever other fields you need, and calculate the total and the number of late assignments in SQL when you need them. Once you do that, adding more homework and other manipulation of the data becomes very simple.
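One way the normalized structure could look, as a sketch on in-memory SQLite: the tables student and homework, the column is_late, and the helper totals are all hypothetical names, and only Rabbit's row is seeded. Assigning one more homework becomes a plain INSERT, and the totals are computed on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table student (id integer primary key, name text);
create table homework (
    hw_id integer primary key,
    student_id integer references student(id),
    is_late integer not null default 0   -- 1 if handed in late
);
insert into student values (14, 'Rabbit');
""")

# Seed Rabbit's existing 7 assignments, 4 of them late (from the question's table).
conn.executemany(
    "insert into homework (student_id, is_late) values (14, ?)",
    [(1,)] * 4 + [(0,)] * 3,
)


def totals(conn, name):
    """Return (total homework, late homework) computed from individual rows."""
    return conn.execute(
        """
        select count(h.hw_id), coalesce(sum(h.is_late), 0)
        from student s
        left join homework h on h.student_id = s.id
        where s.name = ?
        group by s.id
        """,
        (name,),
    ).fetchone()


before = totals(conn, "Rabbit")

# "Rabbit has one additional homework assigned" is now a single new row.
conn.execute(
    "insert into homework (student_id, is_late) "
    "select id, 0 from student where name = 'Rabbit'"
)
after = totals(conn, "Rabbit")
print(before, after)
```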

Case to check if previous record matches last record

I have a query result like the one below. I wish to add a column to the query result that is set to 1 if the [EndTime] of the previous record for the same [Machine] equals the [StartTime] of the current record.
So for example, in the table below, there is a flag=1 for row 5 ([Machine]=RD103) because its start time equals the end time of its previous record (row 3).
+---+-------+---------+-------+---------+----------------------+
|OID|Machine|StartTime|EndTime|DelayName|Consecutive Delay Flag|
+---+-------+---------+-------+---------+----------------------+
|1 |RD101 |20:00 |20:20 |A |0 |
+---+-------+---------+-------+---------+----------------------+
|2 |RD102 |21:00 |22:00 |A |0 |
+---+-------+---------+-------+---------+----------------------+
|3 |RD103 |23:00 |23:20 |B |0 |
+---+-------+---------+-------+---------+----------------------+
|4 |RD101 |20:20 |20:45 |C |1 |
+---+-------+---------+-------+---------+----------------------+
|5 |RD103 |23:20 |23:25 |A |1 |
+---+-------+---------+-------+---------+----------------------+
This is a great example of what analytic functions do - they don't force you to group your results (in other words - they still produce a single result per row), but you can have values that relate to other rows.
In your case, the LAG function should do the trick:
SELECT oid, machine, starttime, endtime, delayname,
       CASE WHEN starttime =
                 LAG (endtime) OVER (PARTITION BY machine ORDER BY starttime)
            THEN 1
            ELSE 0
       END AS flag
FROM my_table
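To check the flag against the sample rows, here is a sketch on in-memory SQLite (window functions need SQLite 3.25 or later). It compares each row's start time with the previous row's end time for the same machine via LAG(endtime). Times are stored as HH:MM text, which is an assumption that only sorts correctly within a single day; real data would want full timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table my_table (oid integer primary key, machine text,
                       starttime text, endtime text, delayname text);
insert into my_table values
  (1, 'RD101', '20:00', '20:20', 'A'),
  (2, 'RD102', '21:00', '22:00', 'A'),
  (3, 'RD103', '23:00', '23:20', 'B'),
  (4, 'RD101', '20:20', '20:45', 'C'),
  (5, 'RD103', '23:20', '23:25', 'A');
""")

# For the first row of each machine LAG returns NULL, the comparison is
# not true, and the CASE falls through to 0.
flags = conn.execute("""
select oid,
       case when starttime =
                 lag(endtime) over (partition by machine order by starttime)
            then 1
            else 0
       end as flag
from my_table
order by oid
""").fetchall()
print(flags)
```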