Aggregate totals and create data for weeks that no data exists - sql

I have a table like such:
Region Date Cases
France 1-1-2014 5
Spain 2-5-2014 6
France 3-5-2014 7
...
What I would like to do is run an aggregated function like so, to group the total number of cases in weeks for each region.
select region, datepart(week, date) weeknbr, sum(cases) cases
from <table>
group by region, datepart(week, date)
order by region, datepart(week, date)
Using this aggregated function, is there a way to insert a zero value for each region when data does not exist for that week?
so the final result would look like:
region weeknbr cases
France 1 5
France 2 0
France 3 0
.....
Spain 1 0
Spain 2 0
Spain 3 0
....
Spain 8 6
I have tried to create a table with week numbers, and then joining the week numbers with my data, but have been unsuccessful. This ends up creating a null or zero value for the region and cases. I can always use the isnull function to make the cases 0, but I need to account for each region for each week. That's whats killing me right now. Is this possible? If not, where should I start looking and how should I modify the underlining tables?
Any help would be greatly appreciated. Thank you

If I understand your meaning correctly, you could always generate artificial rows, cross join on grouped regions for completeness of your 0's, then left join your aggregate table on region and week. So:
select r.region, w.RowId as Weeknbr, isnull(c.Cases,0)
from (
select row_number()over(order by name) as RowID
from master..spt_values
) w
cross join (
select region
from <table>
group by region
) r
left join
select region, datepart(week, date) weeknbr, sum(cases) cases
from <table>
group by region, datepart(week, date)
order by region, datepart(week, date)
) c on (w.RowID <= 53 and w.RowID = c.Weeknbr and r.region = c.region)

You need a date_list table and a region_list table. Cross join the dimension tables to get all date-region combinations and then left join against your fact table.
SELECT
d.date,
r.region,
t.cases
FROM date_list d
CROSS JOIN region_list r
LEFT JOIN date_region t ON d.date = t.date AND r.region = t.region

Related

Trouble running a complex query in sql?

I am pretty new to SQL Server and just started playing with it. I am trying to create a table that shows attendance percentage by department.
So first i run this query:
SELECT CrewDesc, COUNT(*)
FROM database.emp
INNER JOIN database.crew on sim1 = sim2
GROUP BY CrewDesc
This gives a table like this:
Accounting 10
Marketing 5
Economics 20
Engineering 5
Machinery 5
Tech Support 10
Then i run another query:
SELECT DeptDescription, COUNT(*)
FROM database.Attendee
GROUP BY DeptDescription
This gives me the result of all the people that have attended meeting something like
Accounting 8
Marketing 5
Economics 15
Engineering 10
Tech Support 8
Then I get the current week in the year by SELECT Datepart(ww, GetDate()) as CurrentWeek To make this example easy lets assume this will be week "2".
Now the way i was going to create this was a table for each step but that seems like waste. Is there a way we can combine to tables in a query? So in the end result i would like a table like this
Total# Attd Week (Total*Week) Attd/(Total*week)%
Accounting 10 8 2 20 8/20
Marketing 5 5 2 10 5/10
Economics 20 15 2 40 15/40
Engineering 5 10 2 10 10/10
Machinery 5 NULL 2 10 0/10
Tech Support 10 8 2 20 8/20
Ok, note that my recommendation below is based on your exact existing queries - there are certainly other ways to construct this that may be more performant, but functionally this should work for your requirement. Also, it illustrates the key features of different join types that happen to be relevant for your request, as well as inline views (aka nested queries), which are a super-powerful technique in the SQL language as a whole.
select t1.CrewDesc, t1.Total, t2.Attd, t3.Week,
(t1.Total*t3.Week) as Total_x_Week,
case when isnull(t1.Total*t3.Week, 0) = 0 then 0 else isnull(t2.Attd, 0) / isnull(t1.Total*t3.Week, 0) end as PercentageAttd
from (
SELECT CrewDesc, COUNT(*) AS Total
FROM database.emp INNER JOIN database.crew on sim1 = sim2
GROUP BY CrewDesc
) t1
left outer join /* left outer to keep all rows from t1 */ (
SELECT DeptDescription, COUNT(*) AS Attd
FROM database.Attendee GROUP BY DeptDescription
) t2
on t1.CrewDesc = t2.DeptDescription
cross join /* useful when adding a scalar value to all rows */ (
SELECT Datepart(ww, GetDate()) as Week
) t3
order by t1.CrewDesc
Good luck!
Try something like this
SELECT COALESCE(a.crewdesc,b.deptdescription),
a.total,
b.attd,
Datepart(ww, Getdate()) AS week,
total * Datepart(ww, Getdate()),
b.attd/(a.total*Datepart(ww, Getdate()))
FROM (query 1) a
FULL OUTER JOIN (query 2) b
ON a.crewdesc = b.deptdescription
WITH Total AS ( SELECT CrewDesc, COUNT(*) AS [Count]
FROM database.emp
INNER JOIN database.crew on sim1 = sim2
GROUP BY CrewDesc
),
Attd AS ( SELECT DeptDescription, COUNT(*) AS [Count]
FROM database.Attendee
GROUP BY DeptDescription
)
SELECT COALESCE(CrewDesc,DeptDescription) AS [Dept],
Total.[Count] AS [Total#],Attd.[Count] AS [Attd],
Total.[Count] * Datepart(ww, GetDate()) AS [(Total*Week)],
CAST(Attd.[Count] AS VARCHAR(10))+'/'+ CAST((Total.[Count] * Datepart(ww, GetDate()))AS VARCHAR(10)) AS [Attd/(Total*week)%]
FROM Total INNER JOIN Attd ON Total.CrewDesc = Attd.DeptDescription
I'm assuming your queries are correct -- you give no real information about your model so I've no way to know. They look wrong since the same data is called CrewDesc in one table and Dept in another. Also the join sim1 = sim2 seems very strange to me. In any case given the queries you posted this will work.
With TAttend as
(
SELECT CrewDesc, COUNT(*) as TotalNum
FROM database.emp
INNER JOIN database.crew on sim1 = sim2
GROUP BY CrewDesc
), Attend as
(
SELECT DeptDescription, COUNT(*) as Attd
FROM database.Attendee
GROUP BY DeptDescription
)
SELECT CrewDesc as Dept, TotalNum, ISNULL(Attd, 0) as Attd ,Datepart(ww, GetDate()) as Week,
CASE WHEN ISNULL(Attd, 0) > 0 THEN 0
ELSE ISNULL(Attd, 0) / (TotalNum * Datepart(ww, GetDate()) ) END AS Percent
FROM TAttend
LEFT JOIN Attend on CrewDesc = DeptDescription

Missing a single day

My database has two tables, a car table and a wheel table.
I'm trying to find the number of wheels that meet a certain condition over a range of days, but some days are not included in the output.
Here is the query:
USE CarDB
SELECT MONTH(c.DateTime1) 'Month',
DAY(c.DateTime1) 'Day',
COUNT(w.ID) 'Wheels'
FROM tblCar c
INNER JOIN tblWheel w
ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND w.Measurement < 18
GROUP BY MONTH(c.DateTime1), DAY(c.DateTime1)
ORDER BY [ Month ], [ Day ]
GO
The output results seem to be correct, but days with 0 wheels do not show up. For example:
Sample Current Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2 -- 2/4 is missing
2 5 9
Sample Desired Ouput:
Month Day Wheels
2 1 7
2 2 4
2 3 2
2 4 0
2 5 9
I also tried a left join but it didn't seem to work.
You were on the right track with a LEFT JOIN
Try run your query with this kind of outer join but remove your WHERE clause. Notice anything?
What's happening is that the join is applied and then the where clause removes the values that don't match the criteria. All this happens before the group by, meaning the cars are excluded.
Here's one method for you:
SELECT Year(cars.datetime1) As the_year
, Month(cars.datetime1) As the_month
, Day(cars.datetime1) As the_day
, Count(wheels.id) As wheels
FROM (
SELECT id
, datetime1
FROM tblcar
WHERE datetime1 BETWEEN '2013-01-05' AND '2013-04-06'
) As cars
LEFT
JOIN tblwheels As wheels
ON wheels.carid = cars.id
What's different this time round is that we're limiting the results of the car table before we join to the wheels table.
You probably want to use a LEFT OUTER JOIN:
USE CarDB
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', COUNT (w.ID) 'Wheels'
FROM tblCar c LEFT OUTER JOIN tblWheel w ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND (w.Measurement IS NULL OR w.Measurement < 18)
GROUP BY MONTH (c.DateTime1), DAY (c.DateTime1)
ORDER BY [Month], [Day]
GO
Aand then, you need to adapt the WHERE condition, as you want to keep the rows with w.Measurement being NULL due to the OUTER join.
Remove the join and change your select to this:
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', isnull(select top 1 (select COUNT from tblWheel where id = tblCar.ID and Measurement < 18), 0) 'Wheels'

TERADATA: Aggregate across multiple tables

Consider the following query where aggregation happens across two tables: Sales and Promo and the aggregate values are again used in a calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value),
sum(sales.qty)
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value),
sum(promo.qty)
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both the inner queries return huge number of rows due to large number of articles. Running explain on teradata, the inner queries themselves take very less time, but the join takes a long time.
Assume primary key on article_id is present and both the tables are partitioned by year.
Left Outer Join because second table contains optional data.
So, can you suggest a better way of writing this query. Thanks for reading this far :)
Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
(SUM(sales_qty) - SUM(promo_qty))
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;

Multiple joins to single table in SQL Server query throwing off counts

I am writing a stored procedure in SQL Server Management Studio 2005 to return a list of states and policy counts for two different time periods, month to date and year to date. I have created a couple of views to gather the required data and a stored procedure for use in a Reporting Services report.
Below is my stored procedure:
SELECT DISTINCT
S.[State],
COUNT(HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(HP_YTD.PolicyID) AS PolicyCount_YTD
FROM tblStates S
LEFT OUTER JOIN vwHospitalPolicies HP_MTD ON S.[State] = HP.[State]
AND HP.CreatedDate BETWEEN DATEADD(MONTH, -1, GETDATE()) AND GETDATE()
LEFT OUTER JOIN vwHospitalPolicies HP_YTD ON S.[State] = HP.[State]
AND HP.CreatedDate BETWEEN DATEADD(YEAR, -1, GETDATE()) AND GETDATE()
GROUP BY S.[State]
ORDER BY S.[State] ASC
The problem I am running into is my counts are bloating when a second LEFT OUTER JOIN is added, even the COUNT() that isn't referencing the second join. I need a left join since not all states will have policies for the given period, but they should still appear on the report.
Any suggestions would be greatly appreciated!
It sounds like you need:
COUNT(DISTINCT HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(DISTINCT HP_YTD.PolicyID) AS PolicyCount_YTD
instead of:
COUNT(HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(HP_YTD.PolicyID) AS PolicyCount_YTD
Your original query is including the number of matching rows in the second join. Adding a DISTINCT clause inside the COUNT limits it to unique occurrences of the PolicyID.
I prefer CTEs for this sort of work - they're basically a sort of inline view.
Your actual problem is that some policies are being counted twice - once for the month-to-date, and once for the year-to-date - in both count columns. So, if you have source tables like this:
year-to-date
state policy
==================
1 1
1 2
1 3
2 4
month-to-date
state policy
==================
1 1
1 2
The result table for the JOINs (before COUNT is assesed) looks like this:
temp
state monthPolicy yearPolicy
================================
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
2 - 4
Clearly not what you want. Often, the solution is to use a CTE or other table reference to present summed-up records for the final join. Something like this:
WITH PoliciesMonthToDate (state, count) as (
SELECT state, COUNT(*)
FROM vwHospitalPolicies
WHERE createdDate >= DATEADD(MONTH, -1, GETDATE())
AND createdDate < DATEADD(DAY, 1, GETDATE())
GROUP BY state),
PoliciesYearToDate (state, count) as (
SELECT state, COUNT(*)
FROM vwHospitalPolicies
WHERE createdDate >= DATEADD(YEAR, -1, GETDATE())
AND createdDate < DATEADD(DAY, 1, GETDATE())
GROUP BY state)
SELECT a.state, COALESCE(b.count, 0) as policy_count_mtd,
COALESCE(c.count, 0) as policy_count_ytd
FROM tblStates a
LEFT JOIN PoliciesMonthToDate b
ON b.state = a.state
LEFT JOIN PoliciesYearToDate c
ON c.state = a.state
ORDER BY a.state
(minor nitpick - don't use prefixes for tables and views. It's noise, and if for some reason you switch between them, it would require a code change. That, or the name would then be misleading)

Detecting duplicates which fall outside of a date interval

I searched in SO but couldnt find a direct answer.
There are patients, hospitals, medical branches(ER,urology,orthopedics,internal disease etc), medical operation codes (examination,surgical operation, MRI, ultrasound or sth. else) and patient visiting dates.
Patient visits doctor, doctor prescribes medicine and asks to come again for control check.
If patient returns after 10 days, (s)he has to pay another examination fee to the same hospital. Hospitals may appoint a date after 10 days telling there are no available slots in following 10 days, in order to get the examination fee.
Table structure is like:
Patient id.no Hospital Medical Branch Medical Op. Code Date
1 H1 M0 P1 01/05/2011
5 H1 M1 P9 03/05/2011
3 H2 M0 P2 09/05/2011
1 H1 M0 P1 14/05/2011
3 H1 M0 P2 20/05/2011
5 H1 M2 P9 25/05/2011
1 H1 M0 P3 26/05/2011
Here, visiting patients no. 3 and 5 does not constitute a problem as patient no. 3 visits different hospitals and patient no.5 visits different medical branches. They would pay the examination fee even if they visited within 10 days.
Patient no.1, however, visits same hospital, same branch and is subject to same process (P1: examination) on 01/05 and 14/05.
26/05 doesnt count because it is not medical examination.
What I want to flag is same patient, same hospital, same branch and same medical operation code (that is specifically medical examination : P1 ), with date range more than 10 days.
The format of resulting table:
HOSPITAL TOTAL NUM. of PATIENTS NUM. of PATIENTS OUT OF DATE RANGE
H1 x a
H2 y b
H3 z c
Thanks.
Once again, it's analytic functions to the rescue.
This query uses the LAG() function to link a record in YOUR_TABLE with the previous (defined by DATE) matching record (defined by PATIENT_ID) in the table.
select hospital_id
, count(*) as total_num_of_patients
, sum (out_of_range) as num_of_patients_out_of_range
from (
select patient_id
, hospital_id
, case
when hospital_id_1 = hospital_id_0
and visit_1 > visit_0 + 10
and med_op_code_1 = med_op_code_0
then 1
else 0
end as out_of_range
from (
select patient_id
, hospital_id as hospital_id_1
, date as visit_1
, med_op_code as med_op_code_1
, lag (date) over (partition by patient_id order by date) as visit_0
, lag (hopital_id) over (partition by patient_id order by date) as hopital_id_0
, lag (med_op_code) over (partition by patient_id order by date) as med_op_code_0
from your_table
where med_op_code = 'P1'
)
)
group by hospital_id
/
Caveat: I haven't tested this code, so it may contain syntax errors. I will check it the next time I can access an Oracle database.
This is a little rough, as I haven't got an Oracle DB to hand, but the key feature is the same: the analytical function LAG(). Along with its companion function, LEAD(), they're great for helping to deal with things like periods of activity.
Here's my attempt at the code:
select n.hospital, COUNT(n.patient_id) as patients_out_of_date_range
from (
select *
from (
select d.*, lag(date, 1) over (partition by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code order by d.date) as prev_date
from datatable d inner join
(
select d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
from datatable d
where d.medical_op_code = 'P1'
group by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
having COUNT(d.date) > 1
) t on d.patient_id = t.patient_id and d.hospital = t.hospital and d.medical_branch = t.medical_branch and d.medical_op_code = t.medical_op_code
) m
where date - prev_date > 10
) n
group by n.hospital
Like I say, this isn't tested, but it should at least get you started in the right direction.
Some references:
http://www.adp-gmbh.ch/ora/sql/analytical/lag.html
http://www.oracle-base.com/articles/misc/LagLeadAnalyticFunctions.php
I think this is what you're trying for:
WITH Patient_Visits (Patient_Id, Hospital_Id, Branch_Id, Visit_Date, Visit_Order) as (
SELECT Patient_Id, Hospital_Id, BranchId, Visit_Date,
ROW_NUMBER() OVER(PARTITION BY Patient_ID, Hospital_Id, Branch_Id,
ORDER_BY Patient_Id, Hospital_Id, Branch_Id, Visit_Date)
FROM Hospital_Visits
WHERE Procedure_Id = 'P1'),
Hospital_Recent_Visits (Hospital_Id, Recent_Visitor_Count) as (
SELECT a.Hospital_Id, COUNT(DISTINCT a.Patient_Id)
FROM Patient_Visits as a
JOIN Patient_Visits as b
ON b.Hospital_Id = a.Hospital_Id
AND b.Branch_Id = a.Branch_Id
AND b.Patient_Id = a.Patient_Id
AND b.Visit_Order = a.Visit_Order - 1
AND b.Visit_Date + 10 > a.Visit_Date
GROUP BY a.Hospital_Id, a.Patient_Id),
Hospital_Patient_Count (Hospital_Id, Patient_Count) as (
SELECT Hospital_Id, COUNT(DISTINCT Patient_Id)
FROM Hospital_Visits
GROUP BY Hospital_Id, Patient_Id)
SELECT a.Hospital_Id, b.Patient_Count, c.Recent_Visitor_Count
FROM Hospitals as a
LEFT JOIN Hospital_Patient_Count as b
ON b.Hospital_Id = a.Hospital_Id
LEFT JOIN Hospital_Recent_Visits as c
ON c.Hospital_id = a.Hospital_Id
Please note that this was written and tested against a DB2 system. I think Oracle databases have the relevant functionality, so the query should still work as written. However, DB2 appears to lack some of the OLAP functions Oracle has (my version, at least), which could be useful in knocking out some of the CTEs.