SQL: How to get date range counts as rows instead of columns?

The generalized use case I have is to get record counts for a number of date ranges across one or more tables.
My specific use case is this:
For a patient encounter table (enc) and a pregnancy table (preg), get the counts of patients seen 9 months before the expected due date, 12 months before, 15 months before, etc.
I can get the data I need by doing an outer join on the encounter table with a where clause that boxes the time constraints. However, this seems to be inefficient, a lot of typing, and the data are not in the form I'd like (I would like each time window to be a row instead of a column).
Below is the query I currently have. How can I rewrite it to get the data row wise instead of column wise?
select
preg.org,
count(distinct nine.patient_id) `Pre-Delivery Visits (09 Months)`,
count(distinct twelve.patient_id) `Pre-Delivery Visits (12 Months)`,
count(distinct all.patient_id) `Pre-Delivery Visits (All)`,
count(distinct preg.patient_id) `All Pregnancies`
from
pregnancy preg
left outer join enc nine on preg.patient_id = nine.patient_id and nine.encounter_date < preg.est_delivery_date and nine.encounter_date > date_add(preg.est_delivery_date, (-30*9))
left outer join enc twelve on preg.patient_id = twelve.patient_id and twelve.encounter_date < preg.est_delivery_date and twelve.encounter_date > date_add(preg.est_delivery_date, (-30*12))
left outer join enc all on preg.patient_id = all.patient_id and all.encounter_date < preg.est_delivery_date
group by 1
;
Data are returned in this format:
org     (09 Months)  (12 Months)  (All)  (All Pregnancies)
org x   1            10           15     20
org y   2            22           23     24
org z   200          202          230    250
I'd like to get the data like this:
org     time_box  count
org x   09 mon    1
org y   09 mon    2
org z   09 mon    200
org x   12 mon    10
...

I'm not sure if this does what you want. This calculates non-overlapping groups, so 12 months is really 9-12 months:
select (case when e.encounter_date > date_add(p.est_delivery_date, -30*9)
             then 'nine'
             when e.encounter_date > date_add(p.est_delivery_date, -30*12)
             then 'twelve'
             when e.encounter_date is not null
             then 'all pre-delivery'
             else 'all pregnancy'
        end) as grp,
       count(distinct p.patient_id) as patients
from pregnancy p left join
     enc e
     on e.patient_id = p.patient_id and
        e.encounter_date < p.est_delivery_date
group by grp;
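If you want the overlapping (cumulative) windows from the question, one row each, a possible approach is to cross join the pregnancies to a small list of time boxes and put the window test in the join condition. Below is a minimal sketch of that idea, run through Python's sqlite3 so it is self-contained. The time_box table, its days column and the sample rows are made up for illustration, and SQLite's date() modifier stands in for date_add.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pregnancy (org TEXT, patient_id INTEGER, est_delivery_date TEXT);
CREATE TABLE enc (patient_id INTEGER, encounter_date TEXT);
-- hypothetical lookup table: one row per time box; days = NULL means "no lower bound"
CREATE TABLE time_box (label TEXT, days INTEGER);

INSERT INTO pregnancy VALUES
  ('org x', 1, '2020-12-01'),
  ('org x', 2, '2020-12-01'),
  ('org y', 3, '2020-12-01');
INSERT INTO enc VALUES
  (1, '2020-06-01'),   -- inside the 9-month window
  (2, '2019-12-15'),   -- inside the 12-month window only
  (3, '2018-01-01');   -- before delivery, outside both windows
INSERT INTO time_box VALUES ('09 mon', 270), ('12 mon', 360), ('all', NULL);
""")

rows = conn.execute("""
SELECT p.org, b.label AS time_box, COUNT(DISTINCT e.patient_id) AS cnt
FROM pregnancy p
CROSS JOIN time_box b
LEFT JOIN enc e
  ON e.patient_id = p.patient_id
 AND e.encounter_date < p.est_delivery_date
 AND (b.days IS NULL
      OR e.encounter_date > date(p.est_delivery_date, '-' || b.days || ' days'))
GROUP BY p.org, b.label
ORDER BY b.label, p.org
""").fetchall()

for row in rows:
    print(row)
```

Adding another window is then a new row in the time box list rather than another join and another column.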

Related

Oracle SQL - Count, per month, how many times a site appears in the results

I'm not sure if I will explain this correctly so apologies in advance.
I'm looking to put together a report that shows the number of times a site (central_site.site_code & central_site.site_name) appears in a report and then total this up for each month with a grand total at the end. The date to summarize into month values is job.actual_start_date
What I'm looking for is something like:
Site Code Site Name April May June July August Total
1234 HIGH STREET 2 4 3 3 2 14
3093 4TH AVENUE 10 5 8 8 7 38
The code I have got so far to produce all the information that I would like summarizing in the format above is:
select
central_site.site_code,
central_site.site_name,
job.actual_start_date
from
central_site
inner join job on job.site_code = central_site.site_code
inner join job_type on job.job_type_key = job_type.job_type_key
inner join job_status_log on job.job_number = job_status_log.job_number
where
job_type.job_type_code = 'G012' and
job_status_log.status_code = '5200'
I just don't know the syntax / formulas to be able to total each site up per month and then provide a total for the year.
I think you want conditional aggregation:
select cs.site_code, cs.site_name,
sum(case when extract(month from ?.datecol) = 1 then 1 else 0 end) as jan,
sum(case when extract(month from ?.datecol) = 2 then 1 else 0 end) as feb,
. . .,
count(*) as year_total
from central_site cs join
job j
on j.site_code = cs.site_code join
job_type jt
on j.job_type_key = jt.job_type_key join
job_status_log jsl
on j.job_number = jsl.job_number
where jt.job_type_code = 'G012' and
jsl.status_code = '5200' and
?.datecol >= date '2018-01-01' and
?.datecol < date '2019-01-01'
group by cs.site_code, cs.site_name;
This assumes that "number of times" is simply a count. Replace ?.datecol with the date column you want to summarize on; the question names job.actual_start_date, which would be j.actual_start_date here.
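To see the conditional aggregation pattern run end to end, here is a small sketch using Python's sqlite3. It is not Oracle: strftime('%m', ...) stands in for extract(month from ...), the job_type and job_status_log joins are left out to keep it short, only two month columns are shown, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE central_site (site_code TEXT, site_name TEXT);
CREATE TABLE job (job_number INTEGER, site_code TEXT, actual_start_date TEXT);
INSERT INTO central_site VALUES ('1234', 'HIGH STREET'), ('3093', '4TH AVENUE');
INSERT INTO job VALUES
  (1, '1234', '2018-04-03'),
  (2, '1234', '2018-04-20'),
  (3, '1234', '2018-05-11'),
  (4, '3093', '2018-05-02'),
  (5, '3093', '2017-05-02');  -- previous year, removed by the WHERE clause
""")

pivot = conn.execute("""
SELECT cs.site_code, cs.site_name,
       -- each SUM(CASE ...) counts only the rows that fall in one month
       SUM(CASE WHEN strftime('%m', j.actual_start_date) = '04' THEN 1 ELSE 0 END) AS apr,
       SUM(CASE WHEN strftime('%m', j.actual_start_date) = '05' THEN 1 ELSE 0 END) AS may,
       COUNT(*) AS year_total
FROM central_site cs
JOIN job j ON j.site_code = cs.site_code
WHERE j.actual_start_date >= '2018-01-01'
  AND j.actual_start_date <  '2019-01-01'
GROUP BY cs.site_code, cs.site_name
ORDER BY cs.site_code
""").fetchall()

for row in pivot:
    print(row)
```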

How do I get a 0 sum when grouping by more than one field?

I have tried looking and found using left outer joins tends to be the answer, but that usually is for a sum to a date grouping and not 2 levels of groupings. I think the additional grouping is messing me up.
My Table:
dbo._Labour
_id int pk --the id of this labour entry
_date date --the date of this entry (using this to group by year)
_activityid int --what type of activity they are doing
_typeid int --3 possible options (0,1,2) of what type of item they are performing the activity on
_hours numeric(4,0) --how many hours were spent performing that activity.
My ultimate goal is to know for each year the total hours spent working for each of the 3 types (0,1,2). My current code:
select coalesce(SUM(_hours),0) as _hours, YEAR(_date) as _year, l._typeid
from RelayTanks.dbo._Labour l
left outer join (select YEAR(_date) as _y
from RelayTanks.dbo._Labour group by YEAR(_date)) y on y._y = YEAR(l._date)
where _activityid in (3,4,9,11)
group by YEAR(_date), l._typeid
order by year(_date), l._typeid
which results in:
_typeid _year _hours
0 2015 1174
1 2015 3953
2 2015 851
0 2016 119
1 2016 541
2 2016 65
1 2017 10
What I am looking for is 2017 to show _hours 0 for _typeid 0 and a 0 for _typeid 2. ie:
_typeid _year _hours
0 2015 1174
1 2015 3953
2 2015 851
0 2016 119
1 2016 541
2 2016 65
0 2017 0
1 2017 10
2 2017 0
I have tried many things including 2 outer left joins and I just can't quite figure it out. The current out join I thought would be a good substitute to actually having a dummy year table and only showing years where at least some work has been done.
This is my first question posted here so many apologies if I have neglected to include some info or am missing some community etiquette.
Thanks for your help!
This would get you there, assuming that all of the possible years and types are already represented in your data:
SELECT year(_date), _typeid, _hours = SUM(_hours)
FROM _Labour
GROUP BY year(_date), _typeid
UNION
SELECT year(y._date), t._typeid, 0
FROM _Labour y, _Labour t
WHERE NOT EXISTS (SELECT 1 FROM _Labour WHERE year(_Labour._date) = year(y._date) AND _Labour._typeid = t._typeid)
ORDER BY 1, 2
If you have to do this with just one table, then something like this might work:
SELECT s._typeid,
       s._y,
       coalesce(SUM(lj._hours),0) as _hours
FROM
(
    -- every (type, year) combination that appears anywhere in the data
    SELECT t._typeid, y._y
    FROM (SELECT DISTINCT _typeid FROM RelayTanks.dbo._Labour) t
    CROSS JOIN
    (SELECT DISTINCT YEAR(_date) _y FROM RelayTanks.dbo._Labour) y
) s
LEFT JOIN RelayTanks.dbo._Labour lj ON lj._typeid = s._typeid AND YEAR(lj._date) = s._y
GROUP BY s._typeid, s._y
ORDER BY s._y, s._typeid
Performance might be terrible, since YEAR(lj._date) in the join condition cannot use an index on _date. You might want to consider adding a computed column for the year of _date.
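Here is a small runnable sketch of the cross join approach, using Python's sqlite3 instead of SQL Server (so strftime('%Y', ...) replaces YEAR(), and the year comes back as text). The table name labour and the rows are invented for the example. It also assumes you still want the _activityid filter from the question; that filter goes in the ON clause, because putting it in WHERE would throw the empty combinations away again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labour (_date TEXT, _activityid INTEGER, _typeid INTEGER, _hours INTEGER);
INSERT INTO labour VALUES
  ('2016-03-01', 3, 0, 119),
  ('2016-04-01', 4, 1, 541),
  ('2016-05-01', 9, 2, 65),
  ('2017-01-10', 3, 1, 10),
  ('2017-02-01', 7, 0, 99);  -- activity 7 is not in the wanted list
""")

totals = conn.execute("""
SELECT s._typeid, s._y, COALESCE(SUM(l._hours), 0) AS _hours
FROM (
    -- all type/year combinations, whether or not any hours exist for them
    SELECT t._typeid, y._y
    FROM (SELECT DISTINCT _typeid FROM labour) t
    CROSS JOIN (SELECT DISTINCT strftime('%Y', _date) AS _y FROM labour) y
) s
LEFT JOIN labour l
  ON l._typeid = s._typeid
 AND strftime('%Y', l._date) = s._y
 AND l._activityid IN (3, 4, 9, 11)   -- in ON, so unmatched combinations survive
GROUP BY s._typeid, s._y
ORDER BY s._y, s._typeid
""").fetchall()

for row in totals:
    print(row)
```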

Missing a single day

My database has two tables, a car table and a wheel table.
I'm trying to find the number of wheels that meet a certain condition over a range of days, but some days are not included in the output.
Here is the query:
USE CarDB
SELECT MONTH(c.DateTime1) 'Month',
DAY(c.DateTime1) 'Day',
COUNT(w.ID) 'Wheels'
FROM tblCar c
INNER JOIN tblWheel w
ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND w.Measurement < 18
GROUP BY MONTH(c.DateTime1), DAY(c.DateTime1)
ORDER BY [Month], [Day]
GO
The output results seem to be correct, but days with 0 wheels do not show up. For example:
Sample Current Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2 -- 2/4 is missing
2 5 9
Sample Desired Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2
2 4 0
2 5 9
I also tried a left join but it didn't seem to work.
You were on the right track with a LEFT JOIN.
Try running your query with this kind of outer join but without your WHERE clause. Notice anything?
What's happening is that the join is applied first, and then the WHERE clause removes the rows that don't match the criteria. That includes the rows for cars with no matching wheels, because their w.Measurement is NULL. All this happens before the GROUP BY, meaning those cars are excluded.
Here's one method for you:
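SELECT Year(cars.datetime1) As the_year
     , Month(cars.datetime1) As the_month
     , Day(cars.datetime1) As the_day
     , Count(wheels.id) As wheels
FROM (
        -- limit the cars first, so every car in the range survives the join
        SELECT id
             , datetime1
        FROM tblcar
        WHERE datetime1 BETWEEN '2013-01-05' AND '2013-04-06'
     ) As cars
LEFT
 JOIN tblwheel As wheels
   ON wheels.carid = cars.id
      -- wheel filter lives in the ON clause, so cars without a match still count as 0
  AND wheels.measurement < 18
GROUP
   BY Year(cars.datetime1)
    , Month(cars.datetime1)
    , Day(cars.datetime1)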
What's different this time round is that we're limiting the results of the car table before we join to the wheels table.
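To make the difference concrete, here is a small sketch that runs the same LEFT JOIN twice, once with the Measurement test in WHERE and once in ON. It uses Python's sqlite3 rather than SQL Server, groups by date() for brevity, and the three cars and four wheels are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCar (ID INTEGER, DateTime1 TEXT);
CREATE TABLE tblWheel (ID INTEGER, CarID INTEGER, Measurement INTEGER);
INSERT INTO tblCar VALUES (1, '2013-02-03'), (2, '2013-02-04'), (3, '2013-02-05');
INSERT INTO tblWheel VALUES
  (10, 1, 15), (11, 1, 17),   -- two matching wheels on 2/3
  (12, 2, 20),                -- only a non-matching wheel on 2/4
  (13, 3, 12);                -- one matching wheel on 2/5
""")

# Filter in WHERE: rows without a matching wheel are dropped before grouping.
in_where = conn.execute("""
SELECT date(c.DateTime1), COUNT(w.ID)
FROM tblCar c LEFT JOIN tblWheel w ON w.CarID = c.ID
WHERE w.Measurement < 18
GROUP BY date(c.DateTime1) ORDER BY 1
""").fetchall()

# Filter in ON: the car row is kept with NULL wheel columns, so COUNT(w.ID) is 0.
in_on = conn.execute("""
SELECT date(c.DateTime1), COUNT(w.ID)
FROM tblCar c LEFT JOIN tblWheel w ON w.CarID = c.ID AND w.Measurement < 18
GROUP BY date(c.DateTime1) ORDER BY 1
""").fetchall()

print(in_where)  # 2/4 is missing
print(in_on)     # 2/4 appears with 0
```

Note that either way a day with no cars at all still won't appear; that would need a calendar table or a generated list of dates to join from.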
You probably want to use a LEFT OUTER JOIN:
USE CarDB
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', COUNT (w.ID) 'Wheels'
FROM tblCar c LEFT OUTER JOIN tblWheel w ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND (w.Measurement IS NULL OR w.Measurement < 18)
GROUP BY MONTH (c.DateTime1), DAY (c.DateTime1)
ORDER BY [Month], [Day]
GO
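And then you need to adapt the WHERE condition, as you want to keep the rows where w.Measurement is NULL because of the OUTER join. Note that a car that has wheels, none of which measures under 18, is still filtered out by this condition; putting the Measurement test in the ON clause avoids that.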
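Remove the join and count the wheels with a correlated subquery in your select. This gives one row per car, so drop the GROUP BY or sum the result in an outer query:
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', (SELECT COUNT(*) FROM tblWheel w WHERE w.CarID = c.ID AND w.Measurement < 18) 'Wheels'
FROM tblCar c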

SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improving their water-network-administration.
There are about 150 households, and every month a person checks the meter and charges the household according to the water consumed (reading from this month minus reading from last month). Today everything is done on paper, and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter-difference of one household between two months)
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every household, for every month (or for every interval). If you are just interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could only filter current month:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
This way, if you miss a check for a particular household, it won't show.
Or you might be interested in showing the last month present in the table (which can be different from the current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(here I am taking in consideration the fact that checks might be on different days)
I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
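So the right query looks like this (the subquery has to be correlated on the household as well as the date, otherwise you pick up another household's reading):
select cur.*, tprev.meter as prevMeter
from (select t.*,
             (select top 1 t2.[date] from t t2
              where t2.HousholdID = t.HousholdID and t2.[date] < t.[date]
              order by t2.[date] desc
             ) as prevDate
      from t
     ) as cur inner join
     t as tprev
     on tprev.HousholdID = cur.HousholdID and tprev.[date] = cur.prevDate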
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
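The above query selects the records for this month, Month([date]) = Month(Date()), and compares them to the records for last month (in effect Month([date]) = Month(Date()) - 1, written with DateSerial so that January rolls back to December of the previous year).
Please do not use Date as a field name; it is a reserved word in Access.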
Returns the following result.
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
This works in SQL; I'm not sure about Access.
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
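For what it's worth, here is a minimal sketch of the LAG() approach in a dialect that has window functions. It runs on SQLite (3.25 or later) through Python's sqlite3, not on Access, and the table and column names (readings, household_id, reading_date, meter) are made up for the example. The rows are the sample readings from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (household_id INTEGER, reading_date TEXT, meter INTEGER);
INSERT INTO readings VALUES
  (0, '2013-01-01', 100), (1, '2013-01-01', 130),
  (0, '2013-02-01', 120), (1, '2013-02-01', 140);
""")

deltas = conn.execute("""
SELECT household_id, reading_date, consumption
FROM (
    SELECT household_id, reading_date,
           -- LAG(meter) is the previous reading of the same household, by date
           meter - LAG(meter) OVER (PARTITION BY household_id
                                    ORDER BY reading_date) AS consumption
    FROM readings
)
WHERE consumption IS NOT NULL   -- the first reading of a household has no predecessor
ORDER BY reading_date, household_id
""").fetchall()

for row in deltas:
    print(row)
```

Because LAG() looks at the previous row rather than the previous calendar month, a skipped or late reading still gets compared with the last reading that exists.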

MS Access Rounding Precision With Group By

Why doesn't the average of an employee's monthly average scores equal the employee's overall average score (ever)?
Average
SELECT Avg(r.score) AS rawScore
FROM (ET INNER JOIN Employee AS e ON ET.employeeId = e.id) INNER JOIN (Employee AS a INNER JOIN Review AS r ON a.id = r.employeeId) ON ET.id = r.ETId
WHERE (((e.id)=#employeeId))
Returns 80.737
Average By Month
SELECT Avg(r.score) AS rawScore, Format(submitDate, 'mmm yy') AS MonthText, month(r.submitDate) as mm, year(submitDate) as yy
FROM (ET INNER JOIN Employee AS e ON ET.employeeId = e.id) INNER JOIN (Employee AS a INNER JOIN Review AS r ON a.id = r.employeeId) ON ET.id = r.ETId
WHERE (((e.id)=#employeeId))
GROUP BY month(r.submitDate), year(submitDate), Format(submitDate, 'mmm yy')
ORDER BY year(submitDate) DESC, month(r.submitDate) DESC
Returns
Average Score : Month
81.000 : Oct 09
80.375 : Sep 09
82.700 : Aug 09
83.100 : Jul 09
75.625 : Jun 09
I know 80.737 is correct because I have tallied up the records by hand and done the average. But the average of this table (at 3 decimal places), is 80.56 which is too far off. Does group by mess with the rounding at each step?
An average of average values will not return the same result as a single average over all values, unless all the groups averaged have the same number of items.
If the number of review scores differs from month to month, each monthly average carries a different weight, and averaging them as equals will skew your result.
Consider this example: if we calculate the average of the numbers 1 through 10 the average is 5.5.
Calculating the average of the numbers from 1 through 5 the average is 3, and of 6 through 10 is 8. Both groups have 5 items so the average of 3 and 8 = 5.5.
However, if you take the first average as 1 and 2 = 1.5, and the second average as 3 through 10 = 6.5, then average 1.5 and 6.5 gives 4. This is skewed because the first group has 2 items, and the second has 8.
In addition to this will be the cumulative effects of rounding that Robert Harvey noted.
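The unequal-groups example above is easy to check for yourself. Here is a short Python sketch of it; the third calculation shows that weighting each group average by its item count recovers the overall average, which is what you would need to do with per-month record counts.

```python
from statistics import mean

# The two unequal groups from the example: 1-2 and 3-10.
groups = [[1, 2], [3, 4, 5, 6, 7, 8, 9, 10]]

# Single average over all values.
overall = mean(x for g in groups for x in g)

# Average of the two group averages (1.5 and 6.5), ignoring group size.
avg_of_avgs = mean(mean(g) for g in groups)

# Group averages weighted by how many items each group holds.
weighted = sum(mean(g) * len(g) for g in groups) / sum(len(g) for g in groups)

print(overall, avg_of_avgs, weighted)
```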
I wouldn't expect the two results to be the same, for the simple reason that, if rounding is occurring, you're rounding five times in the monthly scores, and only once for the yearly.
That said, I would check the record counts also, and see if they jibe. It is possible given the formatting on the date and such, that a record or two is slipping through the cracks on the monthly queries.