MS Access Rounding Precision With Group By - sql

Why doesn't the average of an employee's monthly average scores (ever) equal the average of all of that employee's scores?
Average
SELECT Avg(r.score) AS rawScore
FROM (ET INNER JOIN Employee AS e ON ET.employeeId = e.id) INNER JOIN (Employee AS a INNER JOIN Review AS r ON a.id = r.employeeId) ON ET.id = r.ETId
WHERE (((e.id)=#employeeId))
Returns 80.737
Average By Month
SELECT Avg(r.score) AS rawScore, Format(submitDate, 'mmm yy') AS MonthText, month(r.submitDate) as mm, year(submitDate) as yy
FROM (ET INNER JOIN Employee AS e ON ET.employeeId = e.id) INNER JOIN (Employee AS a INNER JOIN Review AS r ON a.id = r.employeeId) ON ET.id = r.ETId
WHERE (((e.id)=#employeeId))
GROUP BY month(r.submitDate), year(submitDate), Format(submitDate, 'mmm yy')
ORDER BY year(submitDate) DESC, month(r.submitDate) DESC
Returns
Average Score : Month
81.000 : Oct 09
80.375 : Sep 09
82.700 : Aug 09
83.100 : Jul 09
75.625 : Jun 09
I know 80.737 is correct because I have tallied up the records by hand and calculated the average. But the average of the five monthly values in this table is 80.56, which is too far off. Does GROUP BY mess with the rounding at each step?

An average of average values will not return the same result as a single average over all values, unless all the groups averaged have the same number of items.
If the number of scores differs from month to month, averaging the monthly averages will skew your result.
Consider this example: the average of the numbers 1 through 10 is 5.5.
The average of the numbers 1 through 5 is 3, and the average of 6 through 10 is 8. Both groups have 5 items, so the average of 3 and 8 is also 5.5.
However, if the first group is 1 and 2 (average 1.5) and the second group is 3 through 10 (average 6.5), then averaging 1.5 and 6.5 gives 4. This is skewed because the first group has 2 items and the second has 8.
On top of this come the cumulative effects of rounding that Robert Harvey noted.
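The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration; the second function shows that weighting each group average by its group size recovers the overall average.

```python
from statistics import mean

def average_of_averages(groups):
    """Unweighted mean of the per-group means (what averaging the monthly rows does)."""
    return mean(mean(g) for g in groups)

def weighted_average(groups):
    """Per-group means weighted by group size; equals the mean of all values."""
    total_count = sum(len(g) for g in groups)
    return sum(mean(g) * len(g) for g in groups) / total_count

equal_groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
unequal_groups = [[1, 2], [3, 4, 5, 6, 7, 8, 9, 10]]

print(average_of_averages(equal_groups))    # 5.5 (same as the overall average)
print(average_of_averages(unequal_groups))  # 4.0 (skewed by the small group)
print(weighted_average(unequal_groups))     # 5.5 (weighting fixes the skew)
```

Applied to the monthly table, this means each month's average would need to be weighted by that month's review count before the five values are combined.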

I wouldn't expect the two results to be the same, for the simple reason that, if rounding is occurring, you're rounding five times in the monthly scores, and only once for the yearly.
That said, I would check the record counts also, and see if they jibe. It is possible given the formatting on the date and such, that a record or two is slipping through the cracks on the monthly queries.

Related

How to add SUM columns to a SQL query that are totaled by week?

I'm writing a query to break down, by week, the total number of transactions that happen between 8pm and 3am. Here is what I'm trying to accomplish:
StoreNo Week 1 (Week 2, Week 3, ...)
000001 123
000002 123
I am also trying to pull from multiple tables, which are the following:
StoreNo: align_dim
Week No: time_day_dim
Transaction Count: tld_fact_v1
The query I have so far is:
SELECT a.restid,
COUNT(DISTINCT tld_fact_v1.dw_gc_header) as "Total Transactions"
FROM tbc.tbcdbv.tld_fact_v1
LEFT JOIN tbcdb.align_dim a ON a.dw_restid=tbc.tbcdbv.tld_fact_v1.dw_restid
LEFT JOIN tbcdbv.time_day_dim_v1 on tld_fact_v1.dw_day=time_day_dim_v1.dw_day
WHERE time_day_dim_v1.fiscalyearno = 'Y2022'
GROUP BY 1
This query works and I receive:
StoreNo Total
000001 123
000002 123
How would I be able to get it split out by week?
UNION your per-store query with a second query that returns the overall count for year 2022:
SELECT a.restid,
COUNT(DISTINCT tld_fact_v1.dw_gc_header) as "Total Transactions"
FROM tbc.tbcdbv.tld_fact_v1
LEFT JOIN tbcdb.align_dim a ON a.dw_restid=tbc.tbcdbv.tld_fact_v1.dw_restid
LEFT JOIN tbcdbv.time_day_dim_v1 on tld_fact_v1.dw_day=time_day_dim_v1.dw_day
WHERE time_day_dim_v1.fiscalyearno = 'Y2022'
GROUP BY 1
UNION
SELECT NULL,
COUNT(DISTINCT tld_fact_v1.dw_gc_header) as "Total Transactions"
FROM tbc.tbcdbv.tld_fact_v1
LEFT JOIN tbcdb.align_dim a ON a.dw_restid=tbc.tbcdbv.tld_fact_v1.dw_restid
LEFT JOIN tbcdbv.time_day_dim_v1 on tld_fact_v1.dw_day=time_day_dim_v1.dw_day
WHERE time_day_dim_v1.fiscalyearno = 'Y2022'

SQL: How to get date range counts as rows instead of columns?

The generalized use case I have is to get record counts for a number of date ranges across one or more tables.
My specific use case is this:
For a patient encounter table (enc) and a pregnancy table (preg), get the counts of patients seen 9 months before the expected due date, 12 months before, 15 months before, etc.
I can get the data I need by doing an outer join on the encounter table with a where clause that boxes the time constraints. However, this seems to be inefficient, a lot of typing, and the data are not in the form I'd like (I would like each time window to be a row instead of a column).
Below is the query I currently have. How can I rewrite it to get the data row wise instead of column wise?
select
preg.org,
count(distinct nine.patient_id) `Pre-Delivery Visits (09 Months)`,
count(distinct twelve.patient_id) `Pre-Delivery Visits (12 Months)`,
count(distinct all.patient_id) `Pre-Delivery Visits (All)`,
count(distinct preg.patient_id) `All Pregnancies`
from
pregnancy preg
left outer join enc nine on preg.patient_id = nine.patient_id and nine.encounter_date < preg.est_delivery_date and nine.encounter_date > date_add(preg.est_delivery_date, (-30*9))
left outer join enc twelve on preg.patient_id = twelve.patient_id and twelve.encounter_date < preg.est_delivery_date and twelve.encounter_date > date_add(preg.est_delivery_date, (-30*12))
left outer join enc all on preg.patient_id = all.patient_id and all.encounter_date < preg.est_delivery_date
group by 1
;
Data are returned in this format:
org (09 Months) (12 months) (All) (All Pregnancies)
org x 1 10 15 20
org y 2 22 23 24
org z 200 202 230 250
I'd like to get the data like this
org time_box count
org x 09 mon 1
org y 09 mon 2
org z 09 mon 200
org x 12 mon 10
...
etc.
I'm not sure if this does what you want. This calculates non-overlapping groups, so 12 months is really 9-12 months:
select (case when e.encounter_date > date_add(p.est_delivery_date, -30*9)
then 'nine'
when e.encounter_date > date_add(p.est_delivery_date, -30*12)
then 'twelve'
when e.encounter_date is not null
then 'all pre-delivery'
else 'all pregnancy'
end) as grp,
count(distinct p.patient_id)
from pregnancy p left join
enc e
on e.patient_id = p.patient_id and
e.encounter_date < p.est_delivery_date
group by grp;
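For anyone who wants to try the CASE bucketing idea on toy data, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite stands in for the original database, so the date arithmetic uses SQLite's date() modifier instead of date_add; the sample rows, the time_box and cnt aliases, and the extra grouping by org are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pregnancy (patient_id INTEGER, org TEXT, est_delivery_date TEXT);
    CREATE TABLE enc (patient_id INTEGER, encounter_date TEXT);
    INSERT INTO pregnancy VALUES
        (1, 'org x', '2020-12-01'),
        (2, 'org x', '2020-12-01'),
        (3, 'org x', '2020-12-01');
    INSERT INTO enc VALUES
        (1, '2020-10-01'),
        (2, '2019-12-17'),
        (2, '2019-01-01');
""")
# Patient 1: visit 61 days before delivery    -> 'nine'
# Patient 2: visits 350 and 700 days before   -> 'twelve' and 'all pre-delivery'
# Patient 3: no visits at all                 -> 'all pregnancy'

rows = conn.execute("""
    SELECT p.org,
           CASE
               WHEN e.encounter_date > date(p.est_delivery_date, '-270 days') THEN 'nine'
               WHEN e.encounter_date > date(p.est_delivery_date, '-360 days') THEN 'twelve'
               WHEN e.encounter_date IS NOT NULL THEN 'all pre-delivery'
               ELSE 'all pregnancy'
           END AS time_box,
           COUNT(DISTINCT p.patient_id) AS cnt
    FROM pregnancy p
    LEFT JOIN enc e
           ON e.patient_id = p.patient_id
          AND e.encounter_date < p.est_delivery_date
    GROUP BY p.org, time_box
    ORDER BY time_box
""").fetchall()

for row in rows:
    print(row)
```

Each time window comes back as its own row per org. As the answer notes, the buckets do not overlap, so a cumulative "12 months" figure would need the 'nine' and 'twelve' rows combined.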

Oracle SQL - Count, per month, how many times a site appears in the results

I'm not sure if I will explain this correctly so apologies in advance.
I'm looking to put together a report that shows the number of times a site (central_site.site_code & central_site.site_name) appears in a report and then total this up for each month with a grand total at the end. The date to summarize into month values is job.actual_start_date
What I'm looking for is something like:
Site Code Site Name April May June July August Total
1234 HIGH STREET 2 4 3 3 2 14
3093 4TH AVENUE 10 5 8 8 7 38
The code I have so far, which produces all the information that I would like summarized in the format above, is:
select
central_site.site_code,
central_site.site_name,
job.actual_start_date
from
central_site
inner join job on job.site_code = central_site.site_code
inner join job_type on job.job_type_key = job_type.job_type_key
inner join job_status_log on job.job_number = job_status_log.job_number
where
job_type.job_type_code = 'G012' and
job_status_log.status_code = '5200'
I just don't know the syntax / formulas to be able to total each site up per month and then provide a total for the year.
I think you want conditional aggregation:
select cs.site_code, cs.site_name,
sum(case when extract(month from ?.datecol) = 1 then 1 else 0 end) as jan,
sum(case when extract(month from ?.datecol) = 2 then 1 else 0 end) as feb,
. . .,
count(*) as year_total
from central_site cs join
job j
on j.site_code = cs.site_code join
job_type jt
on j.job_type_key = jt.job_type_key join
job_status_log jsl
on j.job_number = jsl.job_number
where jt.job_type_code = 'G012' and
jsl.status_code = '5200' and
?.datecol >= date '2018-01-01' and
?.datecol < date '2019-01-01'
group by cs.site_code, cs.site_name;
This assumes that "number of times" is simply a count. Your question doesn't specify what column is used for the date, so that element needs to be filled in.
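A runnable sketch of the same conditional-aggregation pattern, using Python's built-in sqlite3 module as a stand-in for Oracle. It assumes the date column is actual_start_date on the job table, uses SQLite's strftime('%m', ...) in place of extract(month from ...), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (site_code TEXT, actual_start_date TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?)",
    [
        ("1234", "2018-04-03"),
        ("1234", "2018-04-20"),
        ("1234", "2018-05-11"),
        ("3093", "2018-04-07"),
        ("3093", "2018-06-15"),
    ],
)

# One SUM(CASE ...) per month turns row counts into month columns;
# COUNT(*) gives the total across all months for each site.
rows = conn.execute(
    """
    SELECT site_code,
           SUM(CASE WHEN strftime('%m', actual_start_date) = '04' THEN 1 ELSE 0 END) AS apr,
           SUM(CASE WHEN strftime('%m', actual_start_date) = '05' THEN 1 ELSE 0 END) AS may,
           SUM(CASE WHEN strftime('%m', actual_start_date) = '06' THEN 1 ELSE 0 END) AS jun,
           COUNT(*) AS total
    FROM job
    GROUP BY site_code
    ORDER BY site_code
    """
).fetchall()

for row in rows:
    print(row)
```

The joins and the job_type / status filters from the question are left out here to keep the sketch small; they would sit in the FROM and WHERE clauses as in the answer's query.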

PRINT only SUM by year (group by Territory) in SQL

SELECT year(soh.OrderDate) 'year',sum(soh.TotalDue) 'Total',st.[Group] TerritoryGroup
FROM Sales.SalesOrderHeader soh
LEFT OUTER JOIN Sales.SalesTerritory st
ON soh.TerritoryID=st.TerritoryID
GROUP BY year(soh.OrderDate),(soh.TotalDue),[Group]
ORDER BY year(soh.OrderDate),(soh.TotalDue)
This is what I came up with, but the years are scattered instead of ONE year per Territory total.
(I'd like to print the Total for each year in each Territory.)
Is there a concise way to make this select statement?
If you want one row per year, then only include that in the group by:
SELECT year(soh.OrderDate) as year, sum(soh.TotalDue) as Total
FROM Sales.SalesOrderHeader soh LEFT OUTER JOIN
Sales.SalesTerritory st
ON soh.TerritoryID = st.TerritoryID
GROUP BY year(soh.OrderDate)
ORDER BY year(soh.OrderDate);
If you want one row per year and territory group, then include only those two columns:
SELECT year(soh.OrderDate) as year, sum(soh.TotalDue) as Total, st.[Group] as TerritoryGroup
FROM Sales.SalesOrderHeader soh LEFT OUTER JOIN
Sales.SalesTerritory st
ON soh.TerritoryID = st.TerritoryID
GROUP BY year(soh.OrderDate), [Group]
ORDER BY year(soh.OrderDate), Total;
Some notes:
You do not need single quotes around the column aliases. You should use single quotes only for string and date constants.
If you are summarizing just by year, then you cannot have TerritoryGroup in the output.
In neither case would you include soh.TotalDue in the group by. You are summing that column, not aggregating by it.
The order by clause should not contain soh.TotalDue; it should be the aggregated value (Total) instead.
In GROUP BY you say you want one line per combination of year, totaldue and territory group.
Let's say you have these records:
orderdate totaldue territorygroup
2014-01-01 100 1
2014-01-15 200 1
2014-01-21 100 1
2013-03-03 100 1
2014-04-04 100 2
Then you get these result records:
year totaldue territorygroup
2014 100 1
2014 200 1
2013 100 1
2014 100 2
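(BTW: because you also group by TotalDue, every row in a group has the same TotalDue, so sum(soh.TotalDue) is just that value multiplied by the number of rows in the group.)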
So the solution is to decide what you actually want to see in each result record: one record per ______. That gives you your GROUP BY clause and the results you want.
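The sample records above can be loaded into an in-memory SQLite database to compare the two groupings. This is only a sketch: the orders table name is made up, and SQLite's strftime('%Y', ...) stands in for SQL Server's year().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderdate TEXT, totaldue INTEGER, territorygroup INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2014-01-01", 100, 1),
        ("2014-01-15", 200, 1),
        ("2014-01-21", 100, 1),
        ("2013-03-03", 100, 1),
        ("2014-04-04", 100, 2),
    ],
)

# Grouping by year, totaldue and territory group: one row per distinct combination.
too_fine = conn.execute("""
    SELECT strftime('%Y', orderdate) AS yr, totaldue, territorygroup
    FROM orders
    GROUP BY strftime('%Y', orderdate), totaldue, territorygroup
    ORDER BY yr, territorygroup, totaldue
""").fetchall()

# Grouping by year and territory group only: one total per year and group.
per_year_and_group = conn.execute("""
    SELECT strftime('%Y', orderdate) AS yr, territorygroup, SUM(totaldue) AS total
    FROM orders
    GROUP BY strftime('%Y', orderdate), territorygroup
    ORDER BY yr, territorygroup
""").fetchall()

print(too_fine)
print(per_year_and_group)
```

The first query returns four rows, matching the result records listed above; the second returns three rows, one per year and territory group.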

Missing a single day

My database has two tables, a car table and a wheel table.
I'm trying to find the number of wheels that meet a certain condition over a range of days, but some days are not included in the output.
Here is the query:
USE CarDB
SELECT MONTH(c.DateTime1) 'Month',
DAY(c.DateTime1) 'Day',
COUNT(w.ID) 'Wheels'
FROM tblCar c
INNER JOIN tblWheel w
ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND w.Measurement < 18
GROUP BY MONTH(c.DateTime1), DAY(c.DateTime1)
ORDER BY [Month], [Day]
GO
The output results seem to be correct, but days with 0 wheels do not show up. For example:
Sample Current Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2 -- 2/4 is missing
2 5 9
Sample Desired Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2
2 4 0
2 5 9
I also tried a left join but it didn't seem to work.
You were on the right track with a LEFT JOIN.
Try running your query with that kind of outer join but without your WHERE clause. Notice anything?
What's happening is that the join is applied and then the WHERE clause removes the rows that don't match the criteria, including the NULL-filled rows that the outer join produced for cars with no matching wheels. All this happens before the GROUP BY, so those cars are excluded.
Here's one method for you:
SELECT Year(cars.datetime1) As the_year
, Month(cars.datetime1) As the_month
, Day(cars.datetime1) As the_day
, Count(wheels.id) As wheels
FROM (
SELECT id
, datetime1
FROM tblcar
WHERE datetime1 BETWEEN '2013-01-05' AND '2013-04-06'
) As cars
LEFT
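JOIN tblWheel As wheels
ON wheels.carid = cars.id
AND wheels.measurement < 18
GROUP BY Year(cars.datetime1), Month(cars.datetime1), Day(cars.datetime1)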
What's different this time round is that we're limiting the results of the car table before we join to the wheels table.
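To see how the position of a filter changes the result of an outer join, here is a small sketch using Python's built-in sqlite3 module in place of SQL Server. The sample rows are invented, and putting the wheel condition in the ON clause is one common way to keep the zero-count days; it is shown as an illustration rather than as the exact query from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCar (ID INTEGER, DateTime1 TEXT);
    CREATE TABLE tblWheel (ID INTEGER, CarID INTEGER, Measurement INTEGER);
    INSERT INTO tblCar VALUES (1, '2013-02-03'), (2, '2013-02-04'), (3, '2013-02-05');
    INSERT INTO tblWheel VALUES (10, 1, 15), (11, 1, 16), (12, 2, 20), (13, 3, 17);
""")

query = """
    SELECT date(c.DateTime1) AS car_day, COUNT(w.ID) AS wheels
    FROM tblCar c
    LEFT JOIN tblWheel w ON w.CarID = c.ID {on_filter}
    {where_filter}
    GROUP BY date(c.DateTime1)
    ORDER BY car_day
"""

# Filter in WHERE: applied after the join, so the car from 2013-02-04
# (whose only wheel measures 20) disappears from the result.
filter_in_where = conn.execute(
    query.format(on_filter="", where_filter="WHERE w.Measurement < 18")
).fetchall()

# Filter in ON: applied while joining, so that car keeps a NULL wheel row
# and COUNT(w.ID) reports 0 for its day.
filter_in_on = conn.execute(
    query.format(on_filter="AND w.Measurement < 18", where_filter="")
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Note that a day with no cars at all would still be missing from both results; covering that case would need a calendar table or a generated list of dates to join from.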
You probably want to use a LEFT OUTER JOIN:
USE CarDB
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', COUNT (w.ID) 'Wheels'
FROM tblCar c LEFT OUTER JOIN tblWheel w ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND (w.Measurement IS NULL OR w.Measurement < 18)
GROUP BY MONTH (c.DateTime1), DAY (c.DateTime1)
ORDER BY [Month], [Day]
GO
And then you need to adapt the WHERE condition, because you want to keep the rows where w.Measurement is NULL as a result of the OUTER join.
Remove the join and change your select to this:
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', (SELECT COUNT(*) FROM tblWheel w WHERE w.CarID = c.ID AND w.Measurement < 18) 'Wheels'