Multiple joins with aggregates - sql

I have the two following tables:
Person:
EntityId FirstName LastName
----------- ------------------ -----------------
1 Ion Ionel
2 Fane Fanel
3 George Georgel
4 Mircea Mircel
SalesQuotaHistory
SalesQuotaId EntityId SalesQuota SalesOrderDate
------------ ----------- ----------- -----------------------
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
4 3 3000 2013-01-01 00:00:00.000
5 3 3000 2013-01-01 00:00:00.000
7 4 4000 2015-01-01 00:00:00.000
8 4 4000 2015-01-02 00:00:00.000
9 4 4000 2015-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
I am trying to get the SalesQuota for each user in 2014 and 2015.
Using this query i am getting an erroneous result:
SELECT p.EntityId
, p.FirstName
, SUM(sqh2014.SalesQuota) AS '2014'
, SUM(sqh2015.SalesQuota) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh2014
ON p.EntityId = sqh2014.EntityId
AND YEAR(sqh2014.SalesOrderDate) = 2014
LEFT OUTER JOIN SalesQuotaHistory sqh2015
ON p.EntityId = sqh2015.EntityId
AND YEAR(sqh2015.SalesOrderDate) = 2015
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
--------- ----------- ---------- --------------------
1 Ion 6000 6000
2 Fane NULL NULL
3 George NULL NULL
4 Mircea NULL 12000
In fact, Id 1 has a total SalesQuota of 3000 in 2014 and 2000 in 2015.
What i am asking here, is .. what is really happening behind the scenes? What is the order of operation in this specific case?
Thanks to my last post i was able to solve this using the following query:
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014 THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015 THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
----------- --------------------- ----------- -----------
1 Ion 3000 2000
2 Fane 0 0
3 George 0 0
4 Mircea 0 12000
but without understanding what's wrong with the first attempt .. i can't get over this ..
Any explanation would be greatly appreciated.

Is easy to see what is happening if you change your select to
SELECT *
and remove the group by
You first approach need something like this
Sql Fiddle Demo
SELECT p.[EntityId]
, p.FirstName
, COALESCE(s2014,0) as [2014]
, COALESCE(s2015,0) as [2015]
FROM Person p
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2014
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2014
GROUP BY EntityId
) as s1
ON p.[EntityId] = s1.EntityId
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2015
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2015
GROUP BY EntityId
) as s2
ON p.[EntityId] = s2.EntityId
Joining with the result data only if exist for that id and year.
OUTPUT
| EntityId | FirstName | 2014 | 2015 |
|----------|-----------|------|-------|
| 1 | Ion | 3000 | 2000 |
| 2 | Fane | 0 | 0 |
| 3 | George | 0 | 0 |
| 4 | Mircea | 0 | 12000 |

You have multiple rows for each year, so the first method is producing a Cartesian product.
For instance, consider EntityId 100:
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
The intermediate result from the join produces six rows, with these SalesQuotaId:
1 10
1 11
2 10
2 11
3 10
3 11
You can then do the math -- the result is off because of the multiple rows.
You seem to know how to fix the problem. The conditional aggregation approach produces the correct answer.

You could improve the speed of your query by adding a WHERE condition to filter only the years over which you're looking for data:
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014
THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015
THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
WHERE YEAR(sqh.SalesOrderDate) IN (2014, 2015)
GROUP BY p.EntityId, p.FirstName
Otherwise, the query that you found is the way to go (good job!)

Related

Select max date for each register, null if does not exists

I have these tables: Employee (id, name, number), Configuration (id, years, licence_days), Periods (id, start_date, end_date, configuration_id, employee_id, period_type):
Employee table:
id name number
---- ----- -------
1 Bob 355
2 John 467
3 Maria 568
4 Josh 871
configuration table:
id years licence_days
---- ----- ------------
1 1 8
2 3 16
3 5 24
Periods table:
id start_date end_date configuration_id employee_id period_type
---- ---------- ------- ---------------- ----------- -----------
1 2021-05-23 2021-05-31 1 1 vaccation
2 2021-05-24 2021-06-01 1 2 vaccation
3 2021-03-01 2021-03-17 2 2 vaccation
4 2021-05-05 2021-05-21 2 2 vaccation
5 2021-01-01 2021-01-17 2 4 vaccation
I want this result:
Result:
employee_id years licence_days max(end_date)
1 1 8 2021-05-31
1 3 16 null
1 5 24 null
2 1 8 2021-06-01
2 3 16 2021-05-21
2 5 24 null
3 1 8 null
3 3 16 null
3 5 24 null
4 1 8 null
4 3 16 2021-01-17
4 5 24 null
i.e., I want to select all Employees with all configuration, and for each one of that, the max end_date of the "vaccation" type (or null if it does not exists).
How can I do that
Oracle supports cross joins, right? So may be something like that?
SELECT e.employee_id, c.years, c.licence_days, max(p.end_date)
FROM Employee e
CROSS JOIN configuration c
LEFT JOIN Periods p
ON e.employee_id = p.employee_id
AND c.configuration_id = p.configuration_id
GROUP BY e.employee_id, c.years, c.licence_days
ORDER BY e.employee_id, c.years
#umberto-petrov chooses wisely with the ANSI CROSS JOIN syntax for a cartesian join. However, in the very weak probability that your requires output of configurations even where there is no employees, you can go with something like :
EDIT: Filtering the Periods join with 'vaccation' as asked in the comments.
If you have to filter for some employee ids, change ON 1 = 1 by ON Employee.id IN (id1, id2, ...). It still keeps every configurations but only takes employees that match the ids.
SELECT Employee.employee_id,
Configuration.years,
Configuration.licence_days,
MAX(Configuration.end_date) max_end_date
FROM Configuration LEFT JOIN Employee ON 1 = 1
LEFT JOIN Periods ON Periods.configuration_id = Configuration.id
AND Periods.employee_id = Employee.id
AND Periods.period_type = 'vaccation'
GROUP BY Employee.employee_id,
Configuration.years,
Configuration.licence_days
ORDER BY Employee.employee_id,
Configuration.years,
Configuration.licence_days
We start from configuration to take every records from this one at least, then made a LEFT CARTESIAN JOIN with Employee and finally a full LET JOIN on Periods for both. That way , if there is no employees, this will output configuration_id and NULL for years, licence_days and max end_date.

Selecting latest entry ordered by group, adding a user defined column for the latest one

I currently have two SQL tables:
Employees
ID Created RecordID Status
------ ----------------------- ------------ ----------
1 2020-07-29 12:38:54.070 1 1
2 2020-08-03 14:28:59.803 1 1
3 2020-08-04 13:47:49.427 2 0
and
Payments
ID EmployeeID Amount
------ -------------- ------------
1 1 1022
2 1 1090
3 1 2105
4 2 1112
5 2 1450
6 2 2923
7 3 1064
How do I add another (user defined) column called Latest to the following query, that can check if the row corresponds to the Employee RecordID number with the latest Created, and returns a string true or false value for that row? In other words, a row should be Latest: true if its Created has the latest timestamp for all rows corresponding to that RecordID number.
Here's my SQL so far:
SELECT e.ID, e.Created, e.Status, p.EmployeeID, p.Amount
FROM Employees AS e
JOIN Payments AS p
ON e.ID = p.EmployeeID
Here are the expected results:
ID Created Status EmployeeID Amount Latest
------ --------------------------- ---------- -------------- ---------- ----------
1 2020-07-29 12:38:54.070 1 1 1022 false
1 2020-07-29 12:38:54.070 1 1 1090 false
1 2020-07-29 12:38:54.070 1 1 2105 false
2 2020-08-03 14:28:59.803 1 2 1112 true
2 2020-08-03 14:28:59.803 1 2 1450 true
2 2020-08-03 14:28:59.803 1 2 2923 true
3 2020-08-04 13:47:49.427 0 3 1064 true
Thanks!
;WITH x AS
(
SELECT e.ID, e.Created, e.Status, p.EmployeeID, p.Amount,
Latest = CASE DENSE_RANK() OVER (PARTITION BY e.RecordID ORDER BY e.Created DESC)
WHEN 1 THEN 'true' ELSE 'false' END
FROM dbo.Employees AS e
JOIN dbo.Payments AS p
ON e.ID = p.EmployeeID
)
SELECT * FROM x ORDER BY ID;

simple sql over (partition by) not working as expected

Feels like it should be simple but my mind has gone blank so would appreciate any help!
Let's say I have this dataset
Date sale_id salesperson Missed_payment_this_month
01/01/2016 1001 John 1
01/01/2016 1002 Bob 0
01/01/2016 1003 Bob 0
01/01/2016 1004 John N/A
01/02/2016 1001 John 1
01/02/2016 1002 Bob 1
01/02/2016 1003 Bob 0
01/02/2016 1004 John 1
01/03/2016 1001 John 1
01/03/2016 1002 Bob 0
01/03/2016 1003 Bob 0
01/03/2016 1004 John 1
And want to add these two columns to the end. They look at the number of missed payments previously, by sales_id and salesperson.
Previous_missed_payment_by_sale_id Previous_missed_payment_by_sales person
0 0
0 0
0 0
0 0
1 1
0 0
0 0
0 1
2 3
1 1
0 1
1 3
sales_id is ok but getting it over sales persons is giving me an error (group by) or adding in extra columns. I need to keep the rows constant.
My best guess that returns extra columns:
select t1.Date, t1.sale_id, t1.salesperson
,sum(case when t2.Missed_payment_this_month = '1' then 1 else 0 end) previous_missed_sales_id
,sum(case when t2.Missed_payment_this_month = '1' then 1 else 0 end) OVER (PARTITION by t1.salesperson) previous_missed_salesperson
from [dbo].[simple_join_table2] t1
inner join [dbo].[simple_join_table2] t2 on
(t2.[Date] < t1.[Date] AND t1.[sale_id] = t2.[sale_id])
group by t1.Date, t1.sale_id, t1.salesperson
,case when t2.Missed_payment_this_month = '1' then 1 else 0 end
this is the output:
Date sale_id salesperson previous_missed_sales_id previous_missed_salesperson
01/02/2016 1002 Bob 0 1
01/02/2016 1003 Bob 0 1
01/03/2016 1002 Bob 0 1
01/03/2016 1002 Bob 1 1
01/03/2016 1003 Bob 0 1
01/02/2016 1001 John 1 3
01/02/2016 1004 John 0 3
01/03/2016 1001 John 2 3
01/03/2016 1004 John 0 3
01/03/2016 1004 John 1 3
Is this possible without another sub query? I guess another way to put it is i'm trying to mimic the sumx and earlier functions of Powerpivot.
If you are on 2012+ use windowing aggregates. Previous = sum all_previous_including_curret - sum current. Ms sql default window is exactly ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
with [simple_join_table2] as(
-- sample data
select cast(valuesDate as Date) valuesDate, sale_id, salesperson, Missed_payment_this_month
from (
values
('20160101',1001,'John', 1)
,('20160101',1002,'Bob ', 0)
,('20160101',1003,'Bob ', 0)
,('20160101',1004,'John',null)
,('20160201',1001,'John', 1)
,('20160201',1002,'Bob ', 1)
,('20160201',1003,'Bob ', 0)
,('20160201',1004,'John', 1)
,('20160301',1001,'John', 1)
,('20160301',1002,'Bob ', 0)
,('20160301',1003,'Bob ', 0)
,('20160301',1004,'John', 1)
) t(valuesDate, sale_id, salesperson, Missed_payment_this_month)
)
select valuesDate,sale_id, salesperson, Missed_payment_this_month,
byidprevmonth = sum(Missed_payment_this_month ) over(partition by sale_id order by valuesDate)
- sum(Missed_payment_this_month) over(partition by valuesDate, sale_id),
bypersonprevmonth = sum(Missed_payment_this_month) over(partition by salesperson order by valuesDate)
- sum(Missed_payment_this_month) over(partition by valuesDate, salesperson)
from [simple_join_table2]
order by salesperson, valuesDate

Oracle query stumped - derived table

It's been a long time since I've done more than the most basic sql queries. But I ran into this one today and have spent a few hours on it and am stuck with my derived table attempt (this is for an Oracle db). Looking for a few tips. Thx.
TABLE: dtree
DataID Name
-------------
10001 A.doc
10002 B.doc
10003 C.doc
10004 D.doc
TABLE: collections
CollectionID DataID
---------------------
201 10001
201 10002
202 10003
203 10004
TABLE: rimsNodeClassification
DataID RimsSubject RimsRSI Status
---------------------------------------
10001 blah IS-03 Active
10002 blah LE-01 Active
10003 blah AD-02 Active
10004 blah AD-03 Active
TABLE: rsiEventSched
RimsRSI RetStage DateToUse RetYears
--------------------------------------
IS-03 SEM-PHYS 95 1
IS-03 ACT NULL 2
LE-01 SEM-PHYS 94 1
LE-01 INA-PHYS 95 2
LE-01 ACT NULL NULL
LE-01 OFC NULL NULL
LE-02 SEM-PHYS 94 2
Trying to query on CollectionID=201
INTENDED RESULT:
DataID Name RimsRSI Status SEMPHYS_DateToUse INAPHYS_DateToUse SEMPHYS_RetYears INAPHYS_RetYears
-------------------------------------------------------------------------------------------------------
10001 A.doc IS-03 Active 95 null 1 null
10002 B.doc Le-01 Active 94 95 1 2
You don't need a Derived Table, just join the tables (the last using a Left join) and then apply a MAX(CASE) aggregation:
select c.DataID, t.Name, rnc.RimsRSI, rnc.Status,
max(case when res.RetStage = 'SEM-PHYS' then res.DateToUse end) SEMPHYS_DateToUse,
max(case when res.RetStage = 'INA-PHYS' then res.DateToUse end) INAPHYS_DateToUse,
max(case when res.RetStage = 'SEM-PHYS' then res.RetYears end) SEMPHYS_RetYears,
max(case when res.RetStage = 'INA-PHYS' then res.RetYears end) INAPHYS_RetYears
from collections c
join dtree t
on c.DataID = t.DataID
join rimsNodeClassification rnc
on c.DataID = rnc.DataID
left join rsiEventSched res
on rnc.RimsRSI = res.RimsRSI
where c.CollectionID= 201
group by c.DataID, t.Name, rnc.RimsRSI, rnc.Status

How to filter first appearance in table only

Here is the table structure:
tblApplicants:
applicantID (index) | ApplyingForYear (nvarchar)
------------------------------------------------------
1 2013/14
11 2013/14
13 2013/14
12 2013/14
15 2013/14
21 2012/13
tblApplicantSchools_shadow:
id (index) | applicantID | updated (datetime) | statusID (int) | schoolID (int)
-----------------------------------------------------------------------------------------------------
1 11 2012-09-24 00:00:00.000 3 2
1 13 2012-10-24 00:00:00.000 4 2
2 15 2012-11-24 00:00:00.000 3 4
3 13 2012-03-24 00:00:00.000 4 3
4 12 2012-09-24 00:00:00.000 4 1
5 21 2012-11-03 00:00:00.000 5 2
6 11 2012-09-04 00:00:00.000 4 4
What I need to do is:
get all applicants, that have an ApplyingForYear of '2013/14' in tblApplicants
have a statusID of 4
I only want to count them once - even if they appear twice or more in tblApplicantschools_show
group the number of distinct applicants (as per the above) - by the updated date column (grouped by week)
So based on the sample data above, there should be 3 rows that come out, (because ApplicantID 13 appears twice and I only want him once).
This is how the result should look:
Datesubmitted TotalAppsPerWeek
-------------------------------------------------------
2012-10-24 00:00:00.000 1
2012-09-24 00:00:00.000 1
2012-09-04 00:00:00.000 1
This is what I have so far - but it results in 4 rows, not 3 :(
select
DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0) AS Datesubmitted,
count(DISTINCT [tblApplicantSchools_shadow].applicantID) as TotalAppsPerWeek
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE
ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
GROUP BY
DATEADD(ww, (DATEDIFF(ww, 0, [tblApplicantSchools_shadow].updated)), 0)
And here is a Fiddle: http://sqlfiddle.com/#!3/3aa61/42
From your title, I'm assuming the one row you want from each applicant is the one with the smallest id. You can select one row per applicant ID with the ROW_NUMBER() function:
;with latestApplication AS
(
SELECT DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0)
AS Datesubmitted,
[tblApplicantSchools_shadow].applicantID,
ROW_NUMBER() OVER (PARTITION BY [tblApplicantSchools_shadow].applicantID
ORDER BY [tblApplicantSchools_shadow].id)
AS rn
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
)
select Datesubmitted, COUNT(1) AS TotalAppsPerWeek
FROM latestApplication
WHERE rn = 1
group by Datesubmitted
order by Datesubmitted DESC
http://sqlfiddle.com/#!3/3aa61/57