How to optimize SQL Server query - sql

I am copying data from one table to another table. While copying I am doing some calculation to modify one column.
SQL Server query:
INSERT INTO rat_proj_duration_map_2
SELECT
r.*,
r.hour_val / (CASE
WHEN week_val = 1 AND
(SELECT TOP 1
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE calwk = 2
AND r.uid = u.uid
AND yr = 2016)
> 0 THEN (SELECT TOP 1
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE calwk = 2
AND r.uid = u.uid
AND yr = 2016)
WHEN (SELECT
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.week_val = us.calwk
AND r.uid = u.uid
AND yr = 2016)
< 1 AND
(SELECT
MAX(hrswk)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
> 0 THEN (SELECT
MAX(hrswk)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
WHEN (SELECT
COUNT(*)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
<= 0 THEN 1
ELSE (SELECT
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.week_val = us.calwk
AND r.uid = u.uid
AND yr = 2016)
END) * 100 AS percentage_val
FROM rat_proj_duration_map r
When I run this query I getting time out issue.
TCP Provider: Timeout error [258]
SQL Server is not in my hand to increase time out value.
Is it possible to optimize my SQL query?

Are you sure this query is logically correct? You have several TOP 1s without specific ORDER BY, scalar comparison of subselect without TOP (which, I assume, may return more than one row if you are using top in other subselects with same source).
And yes - this query can be optimized. You can obtain all the values you need with a single subselect statement and avoid multiple execution of same subselects for each row of rat_proj_duration_map which you are having now:
INSERT INTO rat_proj_duration_map_2
SELECT
r.*,
r.hour_val / (CASE
WHEN week_val = 1 AND us.min_hrswk_2 > 0
THEN us.min_hrswk_2
WHEN us.min_hrswk_week_val <1
AND max_hrswk > 0
THEN max_hrswk
WHEN us.cnt <= 0
THEN 1
ELSE min_hrswk_week_val
END) * 100 as percentage_val
FROM
rat_proj_duration_map r
OUTER APPLY
(
SELECT
count(*) as cnt,
MIN(CASE WHEN calcw = 2 THEN hrswk END) as min_hrswk_2,
MIN(CASE WHEN calcw = r.week_val THEN hrswk END) as min_hrswk_week_val,
MAX(hrswk) as max_hrswk
FROM UserProfileRATinterface_view us
inner join users u on u.username=us.username
WHERE r.uid=u.uid and yr=2016
) us
But I can't be sure if original logic is correct. And the idea of that case to me looks like this:
...
r.hour_val / COALESCE(NULLIF(us.min_hrswk_2, 0),
NULLIF(us.min_hrswk_week_val, 0), NULLIF(max_hrswk, 0), 1)
...

The subqueries in your case clause seem to be essentially the same. You could simplify the whole command by defining a grouped version (... where yr=2016 group by u.uid) of this subquery (preferrably as a common table expression) and then work with that. This could potentially save a lot of redundant operations.
The following might work (have not tested it):
;WITH usrall as (
SELECT u.uid ui, hrswk hw, r.week wk, us.calwk cw
FROM UserProfileRATinterface_view us
INNER JOIN users u on u.username=us.username
WHERE r.uid=u.uid and yr=2016
), usrgrp as (
SELECT ui gui, MAX(hrswk) ghw, count(*) gcnt FROM usrall group by ui
), denom as (
SELECT gui dui, COALESCE( MAX(w2.hw), MAX(wkwc.hw), MAX(gwh) ) dnm
FROM usrgrp
LEFT JOIN usrall w2 ON w2.ui=gui AND w2.cw=2 AND w2.hw>0
LEFT JOIN usrall wkcw ON wkcw.ui=gui AND wkcw.wk=wkcw.cw AND wkwc.hw<1
GROUP BY gui
)
SELECT r.*, r.hour_val / d.dnm
FROM rat_proj_duration_map r
INNER JOIN denom d ON d.dui=u.uid
Essentially I have tried (I hope it works :-/) to replace the case construct by a COALESCE() function that checks the three possible calculated values one after the other. The first non-null value is accepted.
As I said: I have not tested it. Good luck

Related

SQL code in Access says error in FROM clause

SELECT
p.Name,
p.Age,
MAX(COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)))
FROM Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age;
I can suggest the following query:
SELECT TOP 1
p.Name,
p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID))
FROM (Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID)
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age
ORDER BY
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)) DESC;
This fixes the syntax problem with your joins. It also interprets the MAX as meaning that you want the record with the maximum ratio of counts. In this case, we can use TOP 1 along with ORDER BY to identify this max record.
MS Access requires strange parentheses when you have more than one join. In addition, MAX(COUNT(m.winTeam_ID)) doesn't make sense. I don't know what you are trying to calculate in the SELECT. Perhaps this does what you want:
SELECT p.Name, p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)))
FROM (Players AS p INNER JOIN
Teams AS t
ON t.ID = p.Team_ID
) INNER JOIN
Matches AS m
ON m.Team_ID = t.ID
GROUP BY p.Name, p.Age;
I think your Matches table shouldn't have a Team_ID And instead you have winTeam_ID and lossTeam_ID!
And also you want to query players of a team with - something like - the best win-rate.
If so, Use a query like this - tested on SQL Server only -:
select
p.Age, p.Name, ts.rate
from
Players p
join
(select top(1) -- sub-query will return just first record
t.ID
, sum(case when (t.ID = winTeam_ID) then 1 else 0 end) as wins
, sum(case when (t.ID = lossTeam_ID) then 1 else 0 end) as losses
, sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) /
(sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) + sum(case when (t.ID = lossTeam_ID) then 1.0 else 0.0 end)) as rate
from Teams as t
left join Matches as m
on t.ID = m.winTeam_ID
or t.ID = lossTeam_ID
group by t.ID
order by rate desc -- This will make max rate as first
) as ts -- Team stats calculated in this sub-query
on p.Team_ID = ts.ID;

Count (distinct) not working for multiple inner join in SQL Server 2016

I have below query with count(distinct u.uid) and when I run below query its go on execution state, if I run count(1) then I get a result of 236. I don't understand why count(distinct u.uid) is not returning a result.
Please note this query is running in other environment of SQL Server not sure why not working in SQL Server 2016 (I am not sure if it is specific to 2016 environment)
SELECT
COUNT(distinct u.uid) AS NOOFROWS
FROM
ABC u
INNER JOIN
(SELECT uemail
FROM ABC
GROUP BY uemail
HAVING COUNT(1) = 1) AS u2 ON u.uemail = u2.uemail
INNER JOIN
PQR on u.uid = PQR.uid
INNER JOIN
XYZ p ON u.uid = p.uid
I'm rewriting the query so I can read it better:
SELECT COUNT(distinct u.uid) AS NOOFROWS
FROM ABC u INNER JOIN
(SELECT uemail
FROM ABC
GROUP BY uemail
HAVING COUNT(1) = 1
) u2
ON u.uemail = u2.uemail INNER JOIN
PQR
ON u.uid = PQR.uid INNER JOIN
XYZ p
ON u.uid = p.uid ;
I interpreted the question as the query giving 0, but that is impossible that COUNT(1) returns 236 but COUNT(DISTINCT u.uid) returns 0 (unless there is a bug in SQL Server). COUNT(DISTINCT) returns 0 only if all the values of u.uid are NULL. Because u.uid is used in an INNER JOIN, it cannot ever be NULL. You could get 1, if all 236 rows had the same value, but you cannot get 0.
So, perhaps you mean that the query does not return. If that is the case, then you can use explain to see why the two execution plans are different.
Check below inner query whether it is returning rows or not ?
SELECT uemail
FROM ABC
GROUP BY uemail
HAVING COUNT(1) = 1

COUNT Multiple tables with relationships and users

I've got a group of tables in SQL Server that I need to count on with specific counting criteria for each table, the problem I'm having that I need to group this by users, which are contained in one table and are mapped to the tables I need to count on via a relationship table.
This is the relationship table
Relationship User Work Item
Analyst 1 IR1
Analyst 2 IR2
Analyst 2 IR3
Analyst 1 IR4
User 3 IR1
Analyst 1 SR2
Analyst 1 SR3
Analyst 2 SR4
This is the IR table (the SR table is identical)
ID Status
IR1 Active
IR2 Active
IR3 Closed
IR4 Active
This is the user table
User Name
1 Dave
2 Jim
3 Karl
What I need is a table like below counting only the active items
Name IR Count SR Count
Dave 2 2
Jim 1 1
All I seem to be able to do currently is count all of the users regardless of status, I think this may be due to the left joins. I basically had:
Select u.name,
count (ir),
count (sr) from user u
Inner join relationship r on r.user=u.user and r.relationship = 'Analyst'
Left Join IR on r.workitem=ir.id and ir.Status = 'Active'
Left Join SR on r.workitem=sr.id and sr.Status = 'Active'
Group by u.name
I have simplified the above as much as possible. This is the actual query:
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM MTV_System$Domain$User u
INNER JOIN RelationshipView r ON r.TargetEntityId = u.BaseManagedEntityId AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722' AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i ON r.SourceEntityId = i.BaseManagedEntityId AND (i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != '2B8830B6-59F0-F574-9C2A-F4B4682F1681' AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != 'BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr ON r.SourceEntityId = SR.BaseManagedEntityId AND (sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '72B55E17-1C7D-B34C-53AE-F61F8732E425' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '59393F48-D85F-FA6D-2EBE-DCFF395D7ED1' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr on r.SourceEntityId = cr.BaseManagedEntityId AND (cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = '6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D' or cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = 'DD6B0870-BCEA-1520-993D-9F1337E39D4D')
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA on r.SourceEntityId = ma.BaseManagedEntityId AND (ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = '11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83' OR ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = 'D544258F-24DA-1CF3-C230-B057AAA66BED')
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName
your over simplification seems to have lost your what appears to be your issue from some of your comments. Using left joins would include any of the users even if they don't have a count of one of your other tables. However if you want your result set to only include usres that have at least 1 Incident and/or 1 Request and/or 1 Change Request. Etc. you can either filter the aggretation you are doing after the fact to remove when Incidents + Requests + ... = 0. Or you can filter them out by adding a WHERE statement that says WHEN NOT all of those other tables are null which is the same as OR IS NOT NULL...
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM
MTV_System$Domain$User u
INNER JOIN RelationshipView r
ON r.TargetEntityId = u.BaseManagedEntityId
AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722'
AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i
ON r.SourceEntityId = i.BaseManagedEntityId
AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED NOT IN ('2B8830B6-59F0-F574-9C2A-F4B4682F1681','BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr
ON r.SourceEntityId = SR.BaseManagedEntityId
AND sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F IN ('72B55E17-1C7D-B34C-53AE-F61F8732E425','59393F48-D85F-FA6D-2EBE-DCFF395D7ED1','05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr
ON r.SourceEntityId = cr.BaseManagedEntityId
AND cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 IN ('6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D','DD6B0870-BCEA-1520-993D-9F1337E39D4D')
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA
ON r.SourceEntityId = ma.BaseManagedEntityId
AND ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 IN ('11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83','D544258F-24DA-1CF3-C230-B057AAA66BED')
WHERE
i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName
Also note the user of IN and NOT IN in the join conditions instead of OR all of the time.
The issue is count(). Instead, you want count(distinct):
Select u.name,
count(distinct ir),
count(distinct sr)
from user u
. . .
This generates a Cartesian product between the ir and sr values. If you have largish numbers in each group, then aggregating before the join is a better approach.
;WITH T AS
(
SELECT
U.UserID,
U.Name,
CASE WHEN R.WorkItem LIKE 'IR%' THEN R.WorkItem ELSE '' END AS IRCode,
CASE WHEN R.WorkItem LIKE 'SR%' THEN R.WorkItem ELSE '' END AS SRCode
FROM #tblUser U
INNER JOIN #tblRelationship R ON U.UserId=R.UserId
)
SELECT
Name,
SUM(CASE IRCode WHEN '' THEN 0 ELSE 1 END) AS 'IR Count',
SUM(CASE SRCode WHEN '' THEN 0 ELSE 1 END) AS 'SR Count'
FROM T
WHERE
(
(T.IRCode=''
OR
T.SRCode='')
AND
(
T.IRCode IN (SELECT ID FROM #tblIR WHERE Status='Active')
OR
T.SRCode IN (SELECT ID FROM #tblSR WHERE Status='Active')
)
)
GROUP BY T.Name
Try with the below script .
SELECT u.name
,SUM(t1.[IR Count]) [IR Count]
,SUM(t2.[SR Count]) [SR Count]
FROM #user u
INNER JOIN #relationship r on r.[user]=u.[user]and r.relationship = 'Analyst'
CROSS APPLY (SELECT COUNT(DISTINCT ID) [IR Count]
FROM #IR ir WHERE r.workitem=ir.id and ir.Status = 'Active') t1
CROSS APPLY (SELECT COUNT(DISTINCT ID) [SR Count]
FROM #SR sr WHERE r.workitem=sr.id and sr.Status = 'Active')t2
GROUP BY u.name
OUTPUT :
SR Count in your sample output is wrong ,if the SR table is same as that of IR table (Status of the SR3 is closed for the user 'dave' so that will be ignored.) .
This is what I've come up with that seems to work based off the actual table
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM MTV_System$Domain$User u
LEFT JOIN RelationshipView r ON r.TargetEntityId = u.BaseManagedEntityId AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722' AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i ON r.SourceEntityId = i.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr ON r.SourceEntityId = SR.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr on r.SourceEntityId = cr.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA on r.SourceEntityId = ma.BaseManagedEntityId
Where (i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != '2B8830B6-59F0-F574-9C2A-F4B4682F1681' AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != 'BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
OR (sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '72B55E17-1C7D-B34C-53AE-F61F8732E425' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '59393F48-D85F-FA6D-2EBE-DCFF395D7ED1' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
OR (cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = '6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D' or cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = 'DD6B0870-BCEA-1520-993D-9F1337E39D4D')
OR (ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = '11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83' OR ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = 'D544258F-24DA-1CF3-C230-B057AAA66BED')
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName

Remove grouped data set when total of count is zero with subquery

I'm generating a data set that looks like this
category user total
1 jonesa 0
2 jonesa 0
3 jonesa 0
1 smithb 0
2 smithb 0
3 smithb 5
1 brownc 2
2 brownc 3
3 brownc 4
Where a particular user has 0 records in all categories is it possible to remove their rows form the set? If a user has some activity like smithb does, I'd like to keep all of their records. Even the zeroes rows. Not sure how to go about that, I thought a CASE statement may be of some help but I'm not sure, this is pretty complicated for me. Here is my query
SELECT DISTINCT c.category,
u.user_name,
CASE WHEN (
SELECT COUNT(e.entry_id)
FROM category c1
INNER JOIN entry e1
ON c1.category_id = e1.category_id
WHERE c1.category_id = c.category_id
AND e.user_name = u.user_name
AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
THEN 'Yes'
ELSE NULL
END AS TOTAL
FROM user u
INNER JOIN role r
ON u.id = r.user_id
AND r.id IN (1,2),
category c
LEFT JOIN entry e
ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)
I realise the case statement won't work, but it was an attempt on how this might be possible. I'm really not sure if it's possible or the best direction. Appreciate any guidance.
Try this:
delete from t1
where user in (
select user
from t1
group by user
having count(distinct category) = sum(case when total=0 then 1 else 0 end) )
The sub query can get all the users fit your removal requirement.
count(distinct category) get how many category a user have.
sum(case when total=0 then 1 else 0 end) get how many rows with activities a user have.
There are a number of ways to do this, but the less verbose the SQL is, the harder it may be for you to follow along with the logic. For that reason, I think that using multiple Common Table Expressions will avoid the need to use redundant joins, while being the most readable.
-- assuming user_name and category_name are unique on [user] and [category] respectively.
WITH valid_categories (category_id, category_name) AS
(
-- get set of valid categories
SELECT c.category_id, c.category AS category_name
FROM category c
WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS
(
-- get set of users who belong to valid roles
SELECT u.[user_name]
FROM [user] u
WHERE EXISTS (
SELECT *
FROM [role] r
WHERE u.id = r.[user_id] AND r.id IN (1,2)
)
),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
-- provides a flag of 1 for easier aggregation
SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
FROM [entry] e
WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
-- determines if entry is within date range
),
user_categories ([user_name], category_id, category_name) AS
( SELECT u.[user_name], c.category_id, c.category_name
FROM valid_users u
-- get the cartesian product of users and categories
CROSS JOIN valid_categories c
-- get only users with a valid entry
WHERE EXISTS (
SELECT *
FROM valid_entries e
WHERE e.[user_name] = u.[user_name]
)
)
/*
You can use these for testing.
SELECT COUNT(*) AS valid_categories_count
FROM valid_categories
SELECT COUNT(*) AS valid_users_count
FROM valid_users
SELECT COUNT(*) AS valid_entries_count
FROM valid_entries
SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT uc.[user_name], uc.[category_name], e.[entry_count]
FROM user_categories uc
INNER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/
-- Finally, the results:
SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
Here's another method:
WITH totals AS (
SELECT
c.category,
u.user_name,
COUNT(e.entry_id) AS total,
SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
FROM
user u
INNER JOIN
role r ON u.id = r.user_id
CROSS JOIN
category c
LEFT JOIN
entry e ON c.category_id = e.category_id
AND u.user_name = e.user_name
AND e1.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
WHERE
r.id IN (1, 2)
AND c.category_id IN (19, 20)
GROUP BY
c.category,
u.user_name
)
SELECT
category,
user_name,
total
FROM
totals
WHERE
user_total > 0
;
The totals derived table calculates the totals per user and category as well as totals across all categories per user (using SUM() OVER ...). The main query returns only rows where the user total is greater than zero.

SQL LEFT JOIN combined with regular joins

I have the following query that joins a bunch of tables.
I'd like to get every record from the INDUSTRY table that has consolidated_industry_id = 1 regardless of whether or not it matches the other tables. I believe this needs to be done with a LEFT JOIN?
SELECT attr.industry_id AS option_id,
attr.industry AS option_name,
uj.ft_job_industry_id,
Avg(CASE
WHEN s.salary > 0 THEN s.salary
END) AS average,
Count(CASE
WHEN s.salary > 0 THEN attr.industry
END) AS count_non_zero,
Count(attr.industry_id) AS count_total
FROM industry attr,
user_job_ft_job uj,
salary_ft_job s,
user_job_ft_job ut,
[user] u,
user_education_mba_school mba
WHERE u.user_id = uj.user_id
AND u.user_id = ut.user_id
AND u.user_id = mba.user_id
AND uj.ft_job_industry_id = attr.industry_id
AND uj.user_job_ft_job_id = s.user_job_id
AND u.include_in_student_site_results = 1
AND u.site_instance_id IN ( 1 )
AND uj.job_type_id = 1
AND attr.consolidated_industry_id = 1
AND mba.mba_graduation_year_id NOT IN ( 8, 9 )
AND uj.admin_approved = 1
GROUP BY attr.industry_id,
attr.industry,
uj.ft_job_industry_id
This returns only one row, but there are 8 matches in the industry table where consolidated_industry_id = 1.
--- EDIT: The real question here is, how do I combine the LEFT JOIN with the regular joins?
Use left join for tables that may miss a corresponding record. Put the conditions for each table in the on clause of the join, not in the where, as that would in effect make them inner joins anyway. Something like:
select
attr.industry_id AS option_id, attr.industry AS option_name,
uj.ft_job_industry_id, AVG(CASE WHEN s.salary > 0 THEN s.salary END) AS average,
COUNT(CASE WHEN s.salary > 0 THEN attr.industry END) as count_non_zero,
COUNT(attr.industry_id) as count_total
from
industry attr
left join user_job_ft_job uj on uj.ft_job_industry_id = attr.industry_id and uj.job_type_id = 1 and uj.admin_approved = 1
left join salary_ft_job s on uj.user_job_ft_job_id = s.user_job_id
left join [user] u on u.user_id = uj.user_id and u.include_in_student_site_results = 1 and u.site_instance_id IN (1)
left join user_job_ft_job ut on u.user_id = ut.user_id
left join user_education_mba_school mba on u.user_id = mba.user_id and mba.mba_graduation_year_id not in (8, 9)
where
attr.consolidated_industry_id = 1
group by
attr.industry_id, attr.industry, uj.ft_job_industry_id
If you have any tables that you know always have a corresponding record, just use innser join for that.