Optimize Table Spool in SQL Server execution plan

I have the following SQL query and am trying to optimize it using the execution plan. The execution plan shows an estimated subtree cost of 36.89, and there are several Table Spool (Eager Spool) operators. Can anyone help me optimize this query? Thanks in advance.
SELECT
COUNT(DISTINCT bp.P_ID) AS total,
COUNT(DISTINCT CASE WHEN bc.Description != 'S' THEN bp.P_ID END) AS m_count,
COUNT(DISTINCT CASE WHEN bc.Description = 'S' THEN bp.P_ID END) AS s_count,
COUNT(DISTINCT CASE WHEN bc.Description IS NULL THEN bp.P_ID END) AS n_count
FROM
progress_tbl AS progress
INNER JOIN Person_tbl AS bp ON bp.P_ID = progress.person_id
LEFT OUTER JOIN Status_tbl AS bm ON bm.MS_ID = bp.MembershipStatusID
LEFT OUTER JOIN Membership_tbl AS m ON m.M_ID = bp.CurrentMembershipID
LEFT OUTER JOIN Category_tbl AS bc ON bc.MC_ID = m.MembershipCategoryID
WHERE
logged_when BETWEEN '2017-01-01' AND '2017-01-31'

Here's a technique you can use.
WITH T AS
(
SELECT DISTINCT CASE
WHEN bc.Description != 'S' THEN 'M'
WHEN bc.Description = 'S' THEN 'S'
WHEN bc.Description IS NULL THEN 'N'
END AS type,
bp.P_ID
FROM progress_tbl AS progress
INNER JOIN Person_tbl AS bp
ON bp.P_ID = progress.person_id
LEFT OUTER JOIN Status_tbl AS bm
ON bm.MS_ID = bp.MembershipStatusID
LEFT OUTER JOIN Membership_tbl AS m
ON m.M_ID = bp.CurrentMembershipID
LEFT OUTER JOIN Category_tbl AS bc
ON bc.MC_ID = m.MembershipCategoryID
WHERE logged_when BETWEEN '2017-01-01' AND '2017-01-31'
)
SELECT COUNT(DISTINCT P_ID) AS total,
COUNT(CASE WHEN type= 'M' THEN P_ID END) AS m_count,
COUNT(CASE WHEN type= 'S' THEN P_ID END) AS s_count,
COUNT(CASE WHEN type= 'N' THEN P_ID END) AS n_count
FROM T
I will demonstrate it on a simpler example.
Suppose your existing query is:
SELECT
COUNT(DISTINCT number) AS total,
COUNT(DISTINCT CASE WHEN name != 'S' THEN number END) AS m_count,
COUNT(DISTINCT CASE WHEN name = 'S' THEN number END) AS s_count,
COUNT(DISTINCT CASE WHEN name IS NULL THEN number END) AS n_count
FROM master..spt_values;
You can rewrite it as follows:
WITH T AS
(
SELECT DISTINCT CASE
WHEN name != 'S'
THEN 'M'
WHEN name = 'S'
THEN 'S'
ELSE 'N'
END AS type,
number
FROM master..spt_values
)
SELECT COUNT(DISTINCT number) AS total,
COUNT(CASE WHEN type= 'M' THEN number END) AS m_count,
COUNT(CASE WHEN type= 'S' THEN number END) AS s_count,
COUNT(CASE WHEN type= 'N' THEN number END) AS n_count
FROM T
Note that the optimizer costs the rewrite as considerably cheaper, and the plan is much simpler.
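The two forms can be checked for equal results on a toy table. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no master..spt_values, so the script creates a small hypothetical spt_values table. It compares results only; SQLite plan costs say nothing about SQL Server spools.

```python
import sqlite3

# Hypothetical stand-in for master..spt_values: duplicates, NULL names,
# and numbers that appear under more than one name.
ROWS = [(1, "S"), (1, "S"), (1, "X"), (2, "X"), (2, None), (3, None), (4, "S")]

# Original form: one COUNT(DISTINCT ...) per category.
ORIGINAL = """
SELECT COUNT(DISTINCT number) AS total,
       COUNT(DISTINCT CASE WHEN name != 'S' THEN number END) AS m_count,
       COUNT(DISTINCT CASE WHEN name = 'S' THEN number END) AS s_count,
       COUNT(DISTINCT CASE WHEN name IS NULL THEN number END) AS n_count
FROM spt_values
"""

# Rewrite: deduplicate (type, number) once, then plain COUNTs per type.
REWRITE = """
WITH T AS (
    SELECT DISTINCT CASE WHEN name != 'S' THEN 'M'
                         WHEN name = 'S' THEN 'S'
                         ELSE 'N' END AS type,
           number
    FROM spt_values
)
SELECT COUNT(DISTINCT number) AS total,
       COUNT(CASE WHEN type = 'M' THEN number END) AS m_count,
       COUNT(CASE WHEN type = 'S' THEN number END) AS s_count,
       COUNT(CASE WHEN type = 'N' THEN number END) AS n_count
FROM T
"""

def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spt_values (number INTEGER, name TEXT)")
    con.executemany("INSERT INTO spt_values VALUES (?, ?)", ROWS)
    result = con.execute(sql).fetchone()
    con.close()
    return result

if __name__ == "__main__":
    print(run(ORIGINAL))
    print(run(REWRITE))
```

Because T is DISTINCT on (type, number), each number appears at most once per type. That is why the outer per-type COUNTs no longer need DISTINCT.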

As already pointed out, there seem to be some typo/copy-paste issues with your query, which makes it rather difficult for us to figure out what's going on.
The table spools are probably caused by the COUNT(DISTINCT CASE WHEN bc.Description ...) constructions. SQL Server first writes the joined rows to a worktable (the spool) and then reads them back once for each COUNT(DISTINCT ...), sorting or hashing each stream separately before aggregating it. I don't think there is much you can do about that, as the work needs to be done somewhere.
Anyway, some remarks and wild guesses:
I'm guessing that logged_when is in the progress_tbl table?
If so, do you really need all the LEFT OUTER JOINs? From what I can tell, Status_tbl (bm) isn't used at all, and Membership_tbl and Category_tbl are only needed to get bc.Description.
You're trying to count the number of P_IDs that match the criteria, and you want to split that number between those whose bc.Description is 'S', something else, or NULL.
For this, you could calculate the total as the sum of m_count, s_count and n_count. This would save you one COUNT() operation. I'm not sure it helps a lot in the bigger picture, but every bit helps, I guess.
Something like this:
;WITH counts AS (
SELECT
COUNT(DISTINCT CASE WHEN bc.Description != 'S' THEN bp.P_ID END) AS m_count,
COUNT(DISTINCT CASE WHEN bc.Description = 'S' THEN bp.P_ID END) AS s_count,
COUNT(DISTINCT CASE WHEN bc.Description IS NULL THEN bp.P_ID END) AS n_count
FROM
progress_tbl AS progress
INNER JOIN Person_tbl AS bp ON bp.P_ID = progress.person_id
LEFT OUTER JOIN Status_tbl AS bm ON bm.MS_ID = bp.MembershipStatusID -- bm is never referenced: really needed?
LEFT OUTER JOIN Membership_tbl AS m ON m.M_ID = bp.CurrentMembershipID -- only needed to reach Category_tbl
LEFT OUTER JOIN Category_tbl AS bc ON bc.MC_ID = m.MembershipCategoryID -- supplies bc.Description
WHERE
logged_when BETWEEN '2017-01-01' AND '2017-01-31' -- what table does logged_when column come from????
)
SELECT total = m_count + s_count + n_count,
*
FROM counts
UPDATE
BEWARE: Using Martin Smith's answer and example code, I realized that total isn't necessarily the sum of the other fields. A given P_ID can show up with different descriptions and therefore be counted in more than one category. Depending on your data, my answer might thus be plain wrong.
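The caveat is easy to reproduce. This is a minimal sketch, assuming Python's sqlite3 module and a hypothetical table named joined that stands in for the joined result. In it, person 10 appears once with 'S' and once with another description, for example through duplicate membership rows.

```python
import sqlite3

# Hypothetical joined rows (P_ID, Description). P_ID 10 falls into two categories.
JOINED = [(10, "S"), (10, "Gold"), (11, "S"), (12, None)]

def counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE joined (P_ID INTEGER, Description TEXT)")
    con.executemany("INSERT INTO joined VALUES (?, ?)", JOINED)
    row = con.execute("""
        SELECT COUNT(DISTINCT P_ID),
               COUNT(DISTINCT CASE WHEN Description != 'S' THEN P_ID END),
               COUNT(DISTINCT CASE WHEN Description = 'S' THEN P_ID END),
               COUNT(DISTINCT CASE WHEN Description IS NULL THEN P_ID END)
        FROM joined
    """).fetchone()
    con.close()
    return row

if __name__ == "__main__":
    total, m_count, s_count, n_count = counts()
    # total is 3 distinct people, but the category counts add up to 4
    print(total, m_count + s_count + n_count)
```

The sum equals the total only when every P_ID lands in exactly one category.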

Related

How to output a new column in a SELECT query based on a condition?

Currently, I have this SQL query:
SELECT AVG(ttbe.MarkGiven) FROM tblTestsTakenByEmployee ttbe
INNER JOIN tblCoursesTakenByEmployee ctbe ON ttbe.EmployeeId = ctbe.EmployeeId
LEFT JOIN tblCourse c ON ctbe.CourseId = c.CourseId
WHERE ctbe.HasCompletedCourse = 'Y'
GROUP BY ctbe.CourseId, c.CourseName, EXTRACT(YEAR FROM ctbe.DateOfCourseCompletion), ctbe.EmployeeId
At the moment, this returns the average mark of a single employee on a course completed in a certain year, calculated across all of the tests the course contains (a course can have multiple tests).
I want to output an additional column to the SELECT query which specifies whether that employee has passed the course, based on a threshold. For example, if AVG(ttbe.MarkGiven) >= 40 then it would return 'Y' in the new column, otherwise it would return 'N'. What's the simplest and most efficient way of achieving this?
You could use a CASE expression like this:
SELECT AVG(ttbe.MarkGiven),
CASE WHEN AVG(ttbe.MarkGiven) >= 40 THEN 'Y' ELSE 'N' END as exam_passed
FROM tblTestsTakenByEmployee ttbe
INNER JOIN tblCoursesTakenByEmployee ctbe ON ttbe.EmployeeId = ctbe.EmployeeId
LEFT JOIN tblCourse c ON ctbe.CourseId = c.CourseId
WHERE ctbe.HasCompletedCourse = 'Y'
GROUP BY ctbe.CourseId, c.CourseName, EXTRACT(YEAR FROM ctbe.DateOfCourseCompletion),
ctbe.EmployeeId
https://www.oracletutorial.com/oracle-basics/oracle-case/#:~:text=Oracle%20CASE%20expression%20allows%20you,that%20accepts%20a%20valid%20expression.
You could use your current query as a CTE and then compare the result from the CTE to your threshold.
The advantage of the CTE is that you don't have to write AVG(ttbe.MarkGiven) a second time.
WITH avg_cte AS
(
SELECT AVG(ttbe.MarkGiven) AS Col1
FROM tblTestsTakenByEmployee ttbe
INNER JOIN tblCoursesTakenByEmployee ctbe
ON ttbe.EmployeeId = ctbe.EmployeeId
LEFT JOIN tblCourse c
ON ctbe.CourseId = c.CourseId
WHERE ctbe.HasCompletedCourse = 'Y'
GROUP BY ctbe.CourseId, c.CourseName, EXTRACT(YEAR FROM ctbe.DateOfCourseCompletion), ctbe.EmployeeId
)
SELECT Col1
,CASE
WHEN Col1 >= 40 THEN 'Y'
ELSE 'N'
END AS Col1_Threshold
FROM avg_cte
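Both answers can be compared on a small example. This is a minimal sketch, assuming Python's sqlite3 module and a hypothetical simplified marks table, with one row per test mark that is already restricted to completed courses. The original joins and EXTRACT(YEAR ...) are left out because SQLite has no EXTRACT. The threshold of 40 comes from the question.

```python
import sqlite3

# Hypothetical simplified data: (employee_id, course_id, mark)
MARKS = [
    (1, 100, 30), (1, 100, 60),  # average 45
    (2, 100, 20), (2, 100, 40),  # average 30
    (3, 200, 40),                # average 40, which counts as a pass (>=)
]
THRESHOLD = 40

# CASE applied directly to the aggregate.
INLINE = """
SELECT employee_id, course_id, AVG(mark) AS avg_mark,
       CASE WHEN AVG(mark) >= ? THEN 'Y' ELSE 'N' END AS exam_passed
FROM marks
GROUP BY employee_id, course_id
ORDER BY employee_id, course_id
"""

# Same logic, with the average computed once in a CTE and compared outside it.
VIA_CTE = """
WITH avg_cte AS (
    SELECT employee_id, course_id, AVG(mark) AS avg_mark
    FROM marks
    GROUP BY employee_id, course_id
)
SELECT employee_id, course_id, avg_mark,
       CASE WHEN avg_mark >= ? THEN 'Y' ELSE 'N' END AS exam_passed
FROM avg_cte
ORDER BY employee_id, course_id
"""

def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE marks (employee_id INTEGER, course_id INTEGER, mark INTEGER)")
    con.executemany("INSERT INTO marks VALUES (?, ?, ?)", MARKS)
    rows = con.execute(sql, (THRESHOLD,)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(run(INLINE))
    print(run(VIA_CTE))
```

Keeping the grouping columns in the CTE's output makes it clear which employee and course each Y/N belongs to.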

SQL join not taking all records from another table

I have a query like this:
WITH CTE AS
(
SELECT
U.Name, U.Adluserid AS 'Empid',
MIN(CASE WHEN IOType = 0 THEN Edatetime END) AS 'IN',
MAX(CASE WHEN IOType = 1 THEN Edatetime END) AS 'out',
(CASE
WHEN MAX(E.Status) = 1 THEN 'AL'
WHEN MAX(E.Status) = 2 THEN 'SL'
ELSE 'L'
END) AS leave_status
FROM
Mx_ACSEventTrn
RIGHT JOIN
Mx_UserMst U ON Mx_ACSEventTrn.UsrRefcode = U.UserID
LEFT JOIN
Tbl_Zeo_Empstatus E ON Mx_ACSEventTrn.UsrRefcode = E.Emp_Id
WHERE
CAST(Edatetime AS DATE) BETWEEN '2019-11-03' AND '2019-11-03'
GROUP BY
U.Name, U.Adluserid
)
SELECT
[Name], [Empid], [IN], [OUT],
(CASE
WHEN CAST([IN] AS TIME) IS NULL THEN CAST(leave_status AS NVARCHAR(50))
WHEN CAST([IN] AS TIME) < CAST('08:15' AS TIME) THEN 'P'
ELSE 'L'
END) AS status
FROM
CTE
My employee master table Mx_UserMst has 67 employees, but this query shows only the few employees who punched in. I want to show all employees from the employee master table.
I believe the problem is your WHERE clause:
where cast(Edatetime as date) between '2019-11-03' and '2019-11-03'
Why not cast(Edatetime as date) = '2019-11-03'?
I'm not sure in which table the column Edatetime belongs (you should qualify all the columns with the correct table name/alias).
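If Edatetime comes from the outer-joined table, it is NULL for employees with no matching punches, so the WHERE condition filters those employees out and effectively turns the outer join into an inner join. You must move the condition to an ON clause: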
WITH CTE AS
(
select U.Name,U.Adluserid as 'Empid',
min(case when IOType=0 then Edatetime end) as 'IN',
max(case when IOType=1 then Edatetime end) as 'out',
case max(E.Status) when 1 then 'AL' when 2 then 'SL' else 'L' end as leave_status
from Mx_UserMst U
left join Mx_ACSEventTrn on Mx_ACSEventTrn.UsrRefcode=U.UserID and (cast(Edatetime as date) between '2019-11-03' and '2019-11-03')
left join Tbl_Zeo_Empstatus E on Mx_ACSEventTrn.UsrRefcode=E.Emp_Id
group by U.Name,U.Adluserid
)
SELECT [Name], [Empid],[IN],[OUT],
case
when cast([IN] as time) is null then cast(leave_status as nvarchar(50))
when cast([IN] as time) < cast('08:15' as time) then 'P'
else 'L'
end as status
FROM CTE
If Edatetime belongs to Tbl_Zeo_Empstatus, move the condition to the next join's ON clause.
I also changed the RIGHT JOIN to a LEFT JOIN to make the statement more readable.
If you want to keep everything in a particular table, then that should be the first table in the FROM clause. Subsequent joins should be LEFT JOINs and conditions on subsequent tables should be in the ON clause rather than the WHERE clause.
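Here is a small demonstration of the difference. It is a minimal sketch, assuming Python's sqlite3 module and hypothetical miniature users and events tables that stand in for Mx_UserMst and Mx_ACSEventTrn. SQLite's date() plays the role of CAST(... AS DATE).

```python
import sqlite3

SETUP = """
CREATE TABLE users (UserID INTEGER, Name TEXT);
CREATE TABLE events (UsrRefcode INTEGER, Edatetime TEXT, IOType INTEGER);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO events VALUES
    (1, '2019-11-03 08:05:00', 0),  -- Ann punched in on the target day
    (2, '2019-11-02 08:30:00', 0);  -- Bob punched in on a different day
"""

# Filter in WHERE: Bob's only event is on another day, and Cy has none (NULL after
# the outer join), so both rows fail the WHERE test and disappear.
IN_WHERE = """
SELECT u.Name, MIN(CASE WHEN e.IOType = 0 THEN e.Edatetime END) AS in_dt
FROM users u
LEFT JOIN events e ON e.UsrRefcode = u.UserID
WHERE date(e.Edatetime) = '2019-11-03'
GROUP BY u.Name
ORDER BY u.Name
"""

# Filter in ON: it only limits which events match, so every user is kept.
IN_ON = """
SELECT u.Name, MIN(CASE WHEN e.IOType = 0 THEN e.Edatetime END) AS in_dt
FROM users u
LEFT JOIN events e ON e.UsrRefcode = u.UserID
                  AND date(e.Edatetime) = '2019-11-03'
GROUP BY u.Name
ORDER BY u.Name
"""

def run(sql):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(run(IN_WHERE))
    print(run(IN_ON))
```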
I would also advise you to use table aliases and to only use single quotes for string and date constants -- NOT column aliases.
The following assumes that IOType and Edatetime are in the table Mx_ACSEventTrn. I should not have to guess. You should qualify all column names in the query.
WITH CTE AS (
SELECT U.Name, U.Adluserid AS Empid,
MIN(CASE WHEN AE.IOType = 0 THEN AE.Edatetime END) AS in_dt,
MAX(CASE WHEN AE.IOType = 1 THEN AE.Edatetime END) AS out_dt,
(CASE WHEN MAX(ES.Status) = 1 THEN 'AL'
WHEN MAX(ES.Status) = 2 THEN 'SL'
ELSE 'L'
END) AS leave_status
FROM Mx_UserMst U LEFT JOIN
Mx_ACSEventTrn AE
ON AE.UsrRefcode = U.UserID AND
CAST(AE.Edatetime AS DATE) BETWEEN '2019-11-03' AND '2019-11-03' LEFT JOIN
Tbl_Zeo_Empstatus ES
ON AE.UsrRefcode = ES.Emp_Id
GROUP BY U.Name, U.Adluserid
)
SELECT Name, Empid, IN_DT, OUT_DT,
(CASE WHEN IN_DT IS NULL THEN leave_status
WHEN CAST(IN_DT AS TIME) < CAST('08:15' AS TIME) THEN 'P'
ELSE 'L'
END) AS status
FROM CTE;
Some more points:
Don't give aliases names like IN that are already keywords. That is why I named it IN_DT.
There is no reason to cast to a TIME to compare to NULL.
I don't see a reason to cast to NVARCHAR(50) in the outer CASE expression.

Join 2 tables even with null values and sum each row

I have used a LEFT JOIN and ISNULL to get all the results, except that I don't get the total: I want to sum the numbers for each customer. All table b values are integers.
Select a.CustomerId, a.FName, a.LName, b.mtg1, b.mtg2, b.mtg3, b.mtg4 From Customer a Left Join Hours b On a.CustomerID = b.CustomerID group by a.Lname
The GROUP BY as written (grouping only by a.Lname) makes it difficult to capture all the necessary data, because every selected column must either appear in the GROUP BY or be aggregated. Also, adding up different columns requires COALESCE(column, 0), so that a NULL value counts as zero. Otherwise, your total will come back as NULL.
One possible solution is:
SELECT a.CustomerID, a.FName, a.LName,
SUM(b.mtg1) AS mtg1, SUM(b.mtg2) AS mtg2, SUM(b.mtg3) AS mtg3, SUM(b.mtg4) AS mtg4,
(COALESCE(SUM(b.mtg1), 0) + COALESCE(SUM(b.mtg2), 0) + COALESCE(SUM(b.mtg3), 0) + COALESCE(SUM(b.mtg4), 0)) AS total
FROM Customer a
LEFT JOIN Hours b ON b.CustomerID = a.CustomerID
GROUP BY a.CustomerID, a.FName, a.LName
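The NULL problem shows up clearly when the naive total is placed next to the COALESCE version. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical Customer and Hours rows. Customer 2 has only NULLs in mtg1, and customer 3 has no Hours rows at all.

```python
import sqlite3

SETUP = """
CREATE TABLE Customer (CustomerID INTEGER, FName TEXT, LName TEXT);
CREATE TABLE Hours (CustomerID INTEGER, mtg1 INTEGER, mtg2 INTEGER,
                    mtg3 INTEGER, mtg4 INTEGER);
INSERT INTO Customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Kim');
INSERT INTO Hours VALUES
    (1, 2, NULL, 3, 1),
    (1, 1, 1, NULL, NULL),
    (2, NULL, 4, NULL, NULL);  -- customer 3 has no Hours rows
"""

QUERY = """
SELECT a.CustomerID,
       -- any NULL SUM makes the whole expression NULL
       SUM(b.mtg1) + SUM(b.mtg2) + SUM(b.mtg3) + SUM(b.mtg4) AS naive_total,
       -- COALESCE turns a NULL SUM into 0 before adding
       COALESCE(SUM(b.mtg1), 0) + COALESCE(SUM(b.mtg2), 0)
         + COALESCE(SUM(b.mtg3), 0) + COALESCE(SUM(b.mtg4), 0) AS total
FROM Customer a
LEFT JOIN Hours b ON b.CustomerID = a.CustomerID
GROUP BY a.CustomerID, a.FName, a.LName
ORDER BY a.CustomerID
"""

def totals():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in totals():
        print(row)
```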
with cte as
(
select lname+', '+fname as Name,
mtg1,mtg2,mtg3,mtg4,
isnull(((case when mtg1 is null then 0 else mtg1 end)+
(case when mtg2 is null then 0 else mtg2 end)+
(case when mtg3 is null then 0 else mtg3 end)+
(case when mtg4 is null then 0 else mtg4 end)),0) as total
from Customer a left join Hours b
on a.CustomerID = b.CustomerID
)
select name,
isnull(mtg1,0) as mtg1,
isnull(mtg2,0) as mtg2,
isnull(mtg3,0) as mtg3,
isnull(mtg4,0) as mtg4,
total
from cte
order by 1

Aggregate case when inside non aggregate query

I have a pretty massive query that in its simplest form looks like this:
select r.rep_id, u.user_id, u.signup_date, pi.application_date, pi.management_date, aum
from table1 r
left join table2 u on r.user_id=u.user_id
left join table3 pi on u.user_id=pi.user_id
I need to add one more column that gives me the count of users with a non-null application date per rep (for example, rep 1 has 3 users with filled application dates), and then assign each rep to a category based on that count (with 3 users, the rep falls into a certain status category). It looks something like this:
case when sum(case when application_date is not null then 1 else 0 end) >=10 then 'status1'
when sum(case when application_date is not null then 1 else 0 end) >=5 then 'status2'
when sum(case when application_date is not null then 1 else 0 end) >=1 then 'status3'
else 'no_status' end as category
However, if I simply add it to the SELECT statement as a subquery, every rep gets status1, because the SUM() is computed over all advisors with filled application dates rather than per rep:
select r.rep_id, u.user_id, u.signup_date, pi.application_date, pi.management_date, aum,
(
select case when sum(case when application_date is not null then 1 else 0 end) >=10 then 'status1'
when sum(case when application_date is not null then 1 else 0 end) >=5 then 'status2'
when sum(case when application_date is not null then 1 else 0 end) >=1 then 'status3'
else 'no_status' end as category
from table3
) as category
from table1 r
left join table2 u on r.user_id=u.user_id
left join table3 pi on u.user_id=pi.user_id
Can you help me make this addition compute per rep rather than overall? Much appreciated!
Based on your description, I think you need a window function:
select r.rep_id, u.user_id, u.signup_date, pi.application_date, pi.management_date, aum,
count(pi.application_date) over (partition by r.rep_id) as newcol
from table1 r left join
table2 u
on r.user_id = u.user_id left join
table3 pi
on u.user_id = pi.user_id;
You can use the count() in a case to get ranges, if that is what you prefer.
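Wrapping the windowed count in the question's CASE thresholds could look like the following. This is a minimal sketch, assuming Python's sqlite3 module with SQLite 3.25 or newer, which is required for window functions. The data is hypothetical, and table2 is left out to keep the example small.

```python
import sqlite3

# Hypothetical data: table1 maps reps to users, table3 holds application dates.
REPS = [(1, u) for u in range(101, 107)] + [(2, 201), (2, 202), (3, 301)]
APPS = [(u, "2021-01-0%d" % (u - 100)) for u in range(101, 106)]  # users 101-105 applied
APPS += [(106, None), (201, "2021-02-01"), (301, None)]           # user 202 has no row

SQL = """
SELECT rep_id, user_id,
       CASE WHEN app_count >= 10 THEN 'status1'
            WHEN app_count >= 5  THEN 'status2'
            WHEN app_count >= 1  THEN 'status3'
            ELSE 'no_status' END AS category
FROM (
    SELECT r.rep_id, r.user_id,
           -- COUNT(col) skips NULLs; PARTITION BY restarts the count for each rep
           COUNT(pi.application_date) OVER (PARTITION BY r.rep_id) AS app_count
    FROM table1 r
    LEFT JOIN table3 pi ON r.user_id = pi.user_id
) AS t
ORDER BY rep_id, user_id
"""

def categories():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (rep_id INTEGER, user_id INTEGER)")
    con.execute("CREATE TABLE table3 (user_id INTEGER, application_date TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", REPS)
    con.executemany("INSERT INTO table3 VALUES (?, ?)", APPS)
    rows = con.execute(SQL).fetchall()
    con.close()
    # Every row of a rep carries the same category, so one entry per rep is enough.
    return {rep_id: category for rep_id, _user_id, category in rows}

if __name__ == "__main__":
    print(categories())
```

The derived table means the window expression is written once rather than repeated in each WHEN branch. This count is per joined row, so a user with several table3 rows would be counted more than once.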

Joined SQL query MAX aggregate with condition

I am having trouble with a SQL query. Here is a representation of my schema on SQL Fiddle:
http://sqlfiddle.com/#!15/14c8e/1
The issue is that I want to return rows from the Invitations table and join them with counts of both the 'sent' and 'viewed' event_type values from the associated events, as well as the latest created_at date of the 'sent' events.
I can get all the data and counts working, but I am having trouble with last_sent_on. Is there a way I can use a condition in a MAX aggregate function?
e.g.
MAX(
SELECT events.created_at
WHERE event_type='sent'
)
If not, how would I write the proper subselect?
I am currently using Postgresql.
Thank you.
You can use a CASE expression inside MAX, just as you've done with SUM. The query below selects the maximum created_at for event_type='sent':
SELECT
i.id,
i.name,
i.email,
max(case when e.event_type='sent' then e.created_at end) AS last_sent_on,
sum(case when e.event_type='sent' then 1 else 0 end) AS sent_count,
sum(case when e.event_type='viewed' then 1 else 0 end) AS view_count
FROM
invitations i
LEFT OUTER JOIN
events e
ON e.eventable_id = i.id
AND e.eventable_type='Invitation' -- filter in ON, not WHERE, so invitations without events are kept
GROUP BY i.id, i.name, i.email
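The conditional MAX can be tried on a small data set. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical invitations and events rows, trimmed to the id column. The dates are ISO strings, so MAX compares them in chronological order.

```python
import sqlite3

SETUP = """
CREATE TABLE invitations (id INTEGER, name TEXT);
CREATE TABLE events (eventable_id INTEGER, eventable_type TEXT,
                     event_type TEXT, created_at TEXT);
INSERT INTO invitations VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO events VALUES
    (1, 'Invitation', 'sent',   '2014-01-01'),
    (1, 'Invitation', 'sent',   '2014-02-01'),
    (1, 'Invitation', 'viewed', '2014-02-02'),
    (2, 'Invitation', 'viewed', '2014-03-01'),
    (3, 'Other',      'sent',   '2014-04-01');  -- not an invitation event
"""

QUERY = """
SELECT i.id,
       -- CASE yields NULL for non-'sent' rows, and MAX ignores NULLs
       MAX(CASE WHEN e.event_type = 'sent' THEN e.created_at END) AS last_sent_on,
       SUM(CASE WHEN e.event_type = 'sent' THEN 1 ELSE 0 END) AS sent_count,
       SUM(CASE WHEN e.event_type = 'viewed' THEN 1 ELSE 0 END) AS view_count
FROM invitations i
LEFT JOIN events e
       ON e.eventable_id = i.id
      AND e.eventable_type = 'Invitation'
GROUP BY i.id
ORDER BY i.id
"""

def last_sent():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in last_sent():
        print(row)
```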
Try using a subquery to build the max value for sent.
SELECT
i.id,
i.name,
i.email,
sent.last_sent,
sum(case when e.event_type='sent' then 1 else 0 end) AS sent_count,
sum(case when e.event_type='viewed' then 1 else 0 end) AS view_count
FROM
invitations i
LEFT OUTER JOIN
events e
ON e.eventable_id = i.id
AND e.eventable_type='Invitation' -- in ON, not WHERE, so invitations without events are kept
LEFT JOIN ( SELECT eventable_id uid, MAX(created_at) AS last_sent
FROM events
WHERE event_type = 'sent'
AND eventable_type = 'Invitation'
GROUP BY eventable_id ) AS sent
ON sent.uid = i.id
GROUP BY i.id, i.name, i.email, sent.last_sent
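The derived table needs its own eventable_type filter. Without it, an event for some other eventable type that happens to share an id would supply an invitation's last_sent. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical rows, that runs the derived-table join with and without that filter.

```python
import sqlite3

SETUP = """
CREATE TABLE invitations (id INTEGER);
CREATE TABLE events (eventable_id INTEGER, eventable_type TEXT,
                     event_type TEXT, created_at TEXT);
INSERT INTO invitations VALUES (1), (2), (3);
INSERT INTO events VALUES
    (1, 'Invitation', 'sent', '2014-01-01'),
    (1, 'Invitation', 'sent', '2014-02-01'),
    (3, 'Other',      'sent', '2014-04-01');  -- same id as invitation 3, different type
"""

def last_sent(filter_type):
    # Fixed SQL fragment chosen by a boolean flag, never built from user input.
    type_filter = "AND eventable_type = 'Invitation'" if filter_type else ""
    sql = f"""
        SELECT i.id, sent.last_sent
        FROM invitations i
        LEFT JOIN (SELECT eventable_id AS uid, MAX(created_at) AS last_sent
                   FROM events
                   WHERE event_type = 'sent' {type_filter}
                   GROUP BY eventable_id) AS sent
               ON sent.uid = i.id
        ORDER BY i.id
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(last_sent(filter_type=False))  # invitation 3 wrongly picks up 2014-04-01
    print(last_sent(filter_type=True))
```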