Multiple Joins less cost way - sql

The query below uses three tables and needs two joins to return a single column. It is very slow. Is there a more efficient way to write it?
SELECT DISTINCT
st.status_c1
FROM
schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid,
lic.comData AS combusappid,
lic.ageId,
lic.licId,
lic.licid,
lic.appid,
com.nybe_bustbl_id AS busid
FROM
schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE
lic.ageId = '12'
) rt ON
st.ageId = rt.ageId
AND
st.licId = rt.licId
AND
st.licid = rt.licid
AND
st.appid = rt.appid
WHERE
status_id = 3;

Your current query will create extra rows when the JOIN condition is met for multiple entries in either table and then DISTINCT will filter these duplicates out. You could try to cut down the amount of work filtering duplicates by using EXISTS:
SELECT DISTINCT
st.status_c1
FROM schemaname.tablea st
WHERE status_id = 3
AND EXISTS (
SELECT 1
FROM schemaname.tableb lic
WHERE lic.ageId = '12'
AND st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
AND EXISTS(
SELECT 1 FROM tablec com WHERE lic.comData = com.comData
)
);
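
To see the row multiplication described above, here is a small sketch using Python's built-in sqlite3 module as a stand-in database. The column types and sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only the names are taken from the question.
    CREATE TABLE tablea (ageId TEXT, licId INTEGER, appid INTEGER, status_id INTEGER, status_c1 TEXT);
    CREATE TABLE tableb (ageId TEXT, licId INTEGER, appid INTEGER, comData INTEGER);
    CREATE TABLE tablec (comData INTEGER);

    INSERT INTO tablea VALUES ('12', 1, 10, 3, 'ACTIVE');
    -- Two matching rows in tableb, each of which matches two rows in tablec.
    INSERT INTO tableb VALUES ('12', 1, 10, 100), ('12', 1, 10, 100);
    INSERT INTO tablec VALUES (100), (100);
""")

# Plain joins without DISTINCT: every combination of matching rows is produced.
join_rows = conn.execute("""
    SELECT st.status_c1
    FROM tablea st
    INNER JOIN tableb lic ON st.ageId = lic.ageId AND st.licId = lic.licId AND st.appid = lic.appid
    INNER JOIN tablec com ON lic.comData = com.comData
    WHERE st.status_id = 3 AND lic.ageId = '12'
""").fetchall()

# EXISTS only tests whether a match exists, so each tablea row appears at most once.
exists_rows = conn.execute("""
    SELECT st.status_c1
    FROM tablea st
    WHERE st.status_id = 3
      AND EXISTS (
          SELECT 1 FROM tableb lic
          WHERE lic.ageId = '12'
            AND st.ageId = lic.ageId AND st.licId = lic.licId AND st.appid = lic.appid
            AND EXISTS (SELECT 1 FROM tablec com WHERE lic.comData = com.comData)
      )
""").fetchall()

print(len(join_rows), len(exists_rows))
```

With this data the join produces 1 x 2 x 2 = 4 rows that DISTINCT would then have to collapse, while the EXISTS form produces a single row. Whether that saves time on the real tables depends on the data and the optimizer.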

There is a bunch of redundancy in the query (licid is in the SELECT and ON twice) and you don't need to use subqueries for this. I think this will work:
SELECT DISTINCT st.status_c1
FROM tablea st
INNER JOIN tableb lic ON st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
INNER JOIN tablec com ON lic.comData = com.comData
WHERE status_id = 3
and lic.ageId = '12'

How frequently are you going to run this query, how long does it take now, and what is the expectation? Have statistics been gathered on all the tables?
There are many things we could look at, but to start with, could you please share the table structure and the explain plan of the query?
An index on status_c1 in tablea may also help. As pointed out, try removing the join condition that appears twice: AND st.licid = rt.licid.
SELECT DISTINCT st.status_c1
FROM schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid, lic.comData AS combusappid, lic.ageId, lic.licId,
lic.appid, com.nybe_bustbl_id AS busid
FROM schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE lic.ageId = '12'
) rt ON st.ageId = rt.ageId AND st.licId = rt.licId AND st.appid = rt.appid
WHERE status_id = 3;
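
As a rough illustration of what looking at the plan and adding an index can show, here is a sketch using Python's sqlite3 module. The question does not say which database is in use, so SQLite is only a stand-in. The index name idx_tablea_status and its column choice (status_id, status_c1) are assumptions for the demo, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column version of tablea with made-up data.
conn.execute("CREATE TABLE tablea (status_id INTEGER, status_c1 TEXT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?)",
    [(i % 5, f"s{i % 7}") for i in range(1000)],
)

query = "SELECT DISTINCT status_c1 FROM tablea WHERE status_id = 3"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)

# Assumed index: filter column first, then the selected column.
conn.execute("CREATE INDEX idx_tablea_status ON tablea (status_id, status_c1)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

The exact wording of the plan steps differs between SQLite versions, and other databases have their own EXPLAIN syntax, so treat this only as the shape of the before/after comparison.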

Related

SQL split repeating rows caused by UNION

I am writing a query to get two separate averages based on different WHERE conditions.
I tried two SELECT statements but ended up with lots of duplicates.
Now I have a UNION which works pretty well, although my two fields come out in alternating rows instead of separate columns.
Can anyone suggest a fix? Sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC
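
One possible way to get both averages side by side is conditional aggregation: a single pass over the joined rows, with a CASE expression inside one of the AVG calls. The sketch below uses Python's sqlite3 module with made-up data. It assumes teacherGroup lives on tblUsers (the question does not say) and leaves out the tblTestDetails join for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows; teacherGroup on tblUsers is an assumption.
    CREATE TABLE tblUsers (email TEXT, teacherGroup TEXT);
    CREATE TABLE tblSkillName (skillUID INTEGER, skillName TEXT);
    CREATE TABLE tblTestScores (email TEXT, skillUID INTEGER, testDetailsID INTEGER, percentage REAL);

    INSERT INTO tblUsers VALUES ('a@x', '9JS2/Cp'), ('b@x', '9JS2/Cp'), ('c@x', '9XY1/Cp');
    INSERT INTO tblSkillName VALUES (1, 'Loops'), (2, 'Arrays');
    INSERT INTO tblTestScores VALUES
        ('a@x', 1, 1, 40), ('b@x', 1, 1, 60), ('c@x', 1, 1, 80),
        ('a@x', 2, 1, 50), ('c@x', 2, 1, 100);
""")

rows = conn.execute("""
    SELECT sn.skillName,
           ts.skillUID,
           -- AVG ignores NULLs, so only rows of the chosen teacher group are averaged here.
           AVG(CASE WHEN u.teacherGroup = '9JS2/Cp' THEN ts.percentage END) AS cohortPercentage,
           -- Average over every row for the test, regardless of teacher group.
           AVG(ts.percentage) AS groupPercentage
    FROM tblTestScores ts
    INNER JOIN tblUsers u ON u.email = ts.email
    INNER JOIN tblSkillName sn ON sn.skillUID = ts.skillUID
    WHERE ts.testDetailsID = 1
    GROUP BY sn.skillName, ts.skillUID
    ORDER BY ts.skillUID
""").fetchall()

for row in rows:
    print(row)
```

This gives one row per skill with the two averages in separate columns. In the UNION ALL version the column names come from the first SELECT only, which is why the second average shows up as extra rows rather than as a second column.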

SELECT NOT IN with multiple columns in subquery

In the statement below, sltrxid can appear as either ardoccrid or ardocdbid. How can I include both in the NOT IN subquery?
SELECT *
FROM glsltransaction A
INNER JOIN cocustomer B ON A.acctid = B.customerid
WHERE sltrxstate = 4
AND araccttype = 1
AND sltrxid NOT IN(
SELECT ardoccrid,ardocdbid
FROM arapplyitem)
I would recommend NOT EXISTS:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid IN (a.ardoccrid, a.ardocdbid)
)
Note that I changed the table aliases to something more meaningful. I would strongly recommend prefixing each column name with the alias of the table it belongs to, so the query is unambiguous. In the absence of any indication, I represented the alias as ?? in the query.
IN sometimes optimizes poorly. There are situations where two subqueries are more efficient:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardoccrid
)
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardocdbid
)
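
For anyone who wants to check that the two NOT EXISTS forms return the same rows, here is a small sketch with Python's sqlite3 module. Since the question does not say which table each column belongs to, the sketch assumes sltrxid and sltrxstate are on glsltransaction and araccttype is on cocustomer. The data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column placement; the original question leaves it open.
    CREATE TABLE glsltransaction (sltrxid INTEGER, acctid INTEGER, sltrxstate INTEGER);
    CREATE TABLE cocustomer (customerid INTEGER, araccttype INTEGER);
    CREATE TABLE arapplyitem (ardoccrid INTEGER, ardocdbid INTEGER);

    INSERT INTO cocustomer VALUES (1, 1);
    INSERT INTO glsltransaction VALUES (101, 1, 4), (102, 1, 4), (103, 1, 4), (104, 1, 4);
    -- 101 appears as ardoccrid, 102 as ardocdbid; one row has a NULL id.
    INSERT INTO arapplyitem VALUES (101, 999), (998, 102), (NULL, 997);
""")

base = """
    SELECT t.sltrxid
    FROM glsltransaction t
    INNER JOIN cocustomer c ON c.customerid = t.acctid
    WHERE t.sltrxstate = 4 AND c.araccttype = 1
"""

# One subquery that checks both columns with IN.
single = conn.execute(base + """
      AND NOT EXISTS (SELECT 1 FROM arapplyitem a
                      WHERE t.sltrxid IN (a.ardoccrid, a.ardocdbid))
    ORDER BY t.sltrxid
""").fetchall()

# Two subqueries, one per column.
double = conn.execute(base + """
      AND NOT EXISTS (SELECT 1 FROM arapplyitem a WHERE t.sltrxid = a.ardoccrid)
      AND NOT EXISTS (SELECT 1 FROM arapplyitem a WHERE t.sltrxid = a.ardocdbid)
    ORDER BY t.sltrxid
""").fetchall()

print(single, double)
```

Both forms keep transactions 103 and 104 here, and the NULL in arapplyitem does not affect either of them. Which one runs faster is something to measure on the real database.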

Rewrite a where in / intersect query to a join

Is there a way to rewrite my query as a join?
The question I have to solve is: List the names of green items sold by no department on the first floor. Do not show duplicates.
select distinct itemname from xsale where deptname in (
select deptname from xdept where deptfloor <> 1
)
intersect (
select itemname from xitem where itemcolor= 'green'
)
I have been stuck on this exercise for a couple of days now because join statements don't make much sense to me, even after reading about them. I hope someone can help me.
I think the following query shows how to use a join in your query:
select distinct xs.itemname from xsale xs
inner join xdept xp on xs.deptname=xp.deptname
where xp.deptfloor <>1
intersect
(select xi.itemname from xitem xi where xi.itemcolor= 'green')
Check if this works for you
Select Distinct a.itemname from xsale a
INNER JOIN xdept b on a.deptName = b.deptName
INNER JOIN xitem c on a.itemName = c.itemname
Where b.deptfloor <> 1 and c.itemcolor = 'green'
I think you can express the logic as an EXISTS and NOT EXISTS query:
SELECT itemname
FROM xitem
WHERE itemcolor = 'green' -- all green items
AND EXISTS (
-- exists a sale for that item
SELECT 1
FROM xsale
WHERE xsale.itemname = xitem.itemname
AND NOT EXISTS (
-- not exists a department in those sales with floor = 1
SELECT 1
FROM xdept
WHERE xdept.deptname = xsale.deptname AND xdept.deptfloor = 1
)
)
It is often best to build your queries up step by step. So if we break the problem down:
List the names of green items sold
SELECT i.ItemName
FROM xitem AS i
WHERE i.ItemColor = 'Green';
Items sold by a department on the first floor
SELECT s.ItemName
FROM xsale AS s
INNER JOIN xdept AS d
ON d.DeptName = s.DeptName
WHERE d.DeptFloor = 1;
So now, you want all the items output by the first query, except for those that appear in the 2nd:
SELECT i.ItemName
FROM xitem AS i
WHERE i.ItemColor = 'Green'
EXCEPT
SELECT s.ItemName
FROM xsale AS s
INNER JOIN xdept AS d
ON d.DeptName = s.DeptName
WHERE d.DeptFloor = 1;
Then the final part:
Do not show duplicates:
SELECT DISTINCT i.ItemName
FROM xitem AS i
WHERE i.ItemColor = 'Green'
EXCEPT
SELECT s.ItemName
FROM xsale AS s
INNER JOIN xdept AS d
ON d.DeptName = s.DeptName
WHERE d.DeptFloor = 1;
An alternative to EXCEPT would be NOT EXISTS. They will almost always result in the same execution plan, but I find NOT EXISTS more flexible (you don't need the same columns in both queries):
SELECT i.ItemName
FROM xitem AS i
WHERE i.ItemColor = 'Green'
AND NOT EXISTS
( SELECT 1
FROM xsale AS s
INNER JOIN xdept AS d
ON d.DeptName = s.DeptName
WHERE d.DeptFloor = 1
AND s.ItemName = i.ItemName
)
GROUP BY i.ItemName;
Again, to show an alternative, I have used GROUP BY rather than DISTINCT. In most cases these are semantically equivalent, but there are scenarios where GROUP BY will perform better, namely when a scalar function is involved: GROUP BY will remove duplicates first and then execute the function on the remaining values, whereas DISTINCT will execute the function first and then remove duplicate results.
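The EXCEPT and NOT EXISTS forms can be compared on a tiny data set. The sketch below uses Python's sqlite3 module with invented rows (the item and department names are hypothetical). It also runs the plain inner-join form with deptfloor <> 1 for contrast, because that form answers a slightly different question: items sold by some department that is not on the first floor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data; only table and column names follow the question.
    CREATE TABLE xitem (itemname TEXT, itemcolor TEXT);
    CREATE TABLE xdept (deptname TEXT, deptfloor INTEGER);
    CREATE TABLE xsale (itemname TEXT, deptname TEXT);

    INSERT INTO xitem VALUES ('Tent', 'green'), ('Boots', 'green'), ('Compass', 'red');
    INSERT INTO xdept VALUES ('Clothes', 1), ('Camping', 2);
    -- Boots are sold on floors 1 and 2; the Tent only on floor 2 (twice).
    INSERT INTO xsale VALUES ('Boots', 'Clothes'), ('Boots', 'Camping'),
                             ('Tent', 'Camping'), ('Tent', 'Camping'), ('Compass', 'Camping');
""")

first_floor_sales = """
    SELECT s.itemname
    FROM xsale s
    INNER JOIN xdept d ON d.deptname = s.deptname
    WHERE d.deptfloor = 1
"""

# Green items minus everything sold on the first floor.
except_rows = conn.execute(
    "SELECT i.itemname FROM xitem i WHERE i.itemcolor = 'green' EXCEPT " + first_floor_sales
).fetchall()

# Same logic as a correlated NOT EXISTS, de-duplicated with GROUP BY.
not_exists_rows = conn.execute("""
    SELECT i.itemname
    FROM xitem i
    WHERE i.itemcolor = 'green'
      AND NOT EXISTS (
          SELECT 1
          FROM xsale s
          INNER JOIN xdept d ON d.deptname = s.deptname
          WHERE d.deptfloor = 1 AND s.itemname = i.itemname
      )
    GROUP BY i.itemname
""").fetchall()

# Inner-join form: green items with at least one sale outside the first floor.
join_rows = conn.execute("""
    SELECT DISTINCT s.itemname
    FROM xsale s
    INNER JOIN xdept d ON s.deptname = d.deptname
    INNER JOIN xitem i ON s.itemname = i.itemname
    WHERE d.deptfloor <> 1 AND i.itemcolor = 'green'
    ORDER BY s.itemname
""").fetchall()

print(except_rows, not_exists_rows, join_rows)
```

With this data, EXCEPT and NOT EXISTS both return only the Tent, while the inner-join form also returns the Boots, because the Boots have a sale on floor 2 in addition to the one on floor 1.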

SQL Server Slow query using subqueries

I have a very slooowwwww query.
It selects customer records, with combined criteria, kinda like this:
A table has Customers, another table has CustomerCars, another table has CustomerMotorcycles.
A customer can have one or more cars. If the customer is a commercial business and any one of the customer's cars is a Ford, then we want to exclude that customer from our selection.
A customer can also have one or more motorcycles, and if the customer is a retail business and any one of its motorcycles is a Harley, then we want to exclude that customer.
So I have a statement like:
SELECT *
FROM CUST
WHERE
(CUST.CUSTTYPE = 'COMM'
AND CUST.CUSTID NOT IN (SELECT CUSTCARS.CUSTID
FROM CUSTCARS
WHERE CUSTCARS.CAR = 'FORD'))
OR
(CUST.CUSTTYPE = 'RETAIL'
AND CUST.CUSTID NOT IN (SELECT CUSTCYCLES.CUSTID
FROM CUSTCYCLES
WHERE CUSTCYCLES.CYCLE = 'HARLEY'))
This runs crazy slow.
This is currently being run as a bunch of separate queries that dump data into temporary tables, then several other queries are run to delete the records we don't want, but it's quite clumsy.
Any suggestions? Thanks for any help!
Try "left excluding joins", where we left join the data that matches the conditions we do not want, and then exclude the matching rows through the where clause:
SELECT
CUST.*
FROM CUST
LEFT JOIN CUSTCARS ON CUST.CUSTID = CUSTCARS.CUSTID
AND CUSTCARS.CAR = 'FORD'
AND CUST.CUSTTYPE = 'COMM'
LEFT JOIN CUSTCYCLES ON CUST.CUSTID = CUSTCYCLES.CUSTID
AND CUSTCYCLES.CYCLE = 'HARLEY'
AND CUST.CUSTTYPE = 'RETAIL'
WHERE CUSTCARS.CUSTID IS NULL
AND CUSTCYCLES.CUSTID IS NULL
;
While I'm here: the OR in your existing query might be what causes the slowness, so combining the two subqueries into one list may help:
SELECT
*
FROM CUST
WHERE CUSTID NOT IN (
SELECT
CUSTCARS.CUSTID
FROM CUSTCARS
INNER JOIN CUST ON CUSTCARS.CUSTID = CUST.CUSTID
WHERE CUSTCARS.CAR = 'FORD'
AND CUST.CUSTTYPE = 'COMM'
UNION ALL
SELECT
CUSTCYCLES.CUSTID
FROM CUSTCYCLES
INNER JOIN CUST ON CUSTCYCLES.CUSTID = CUST.CUSTID
WHERE CUSTCYCLES.CYCLE = 'HARLEY'
AND CUST.CUSTTYPE = 'RETAIL'
)
;
Two things come to mind. First, make sure there are indexes for CUSTCARS.CAR and CUSTCYCLES.CYCLE. Second, you might try NOT EXISTS instead of NOT IN.
SELECT * FROM CUST
WHERE
(CUST.CUSTTYPE = 'COMM'
AND NOT EXISTS(SELECT 1 FROM CUSTCARS WHERE CUSTCARS.CAR = 'FORD'
AND CUSTCARS.CUSTID=CUST.CUSTID))
OR
(CUST.CUSTTYPE = 'RETAIL'
AND NOT EXISTS(SELECT 1 FROM CUSTCYCLES WHERE CUSTCYCLES.CYCLE = 'HARLEY'
AND CUSTCYCLES.CUSTID=CUST.CUSTID))
I would suggest starting with not exists:
select c.*
from cust c
where not (c.custtype = 'COMM' and
exists (select 1
from custcars cc
where cc.custid =c.custid and cc.car = 'FORD'
)
) and
not (c.custtype = 'RETAIL' and
exists (select 1
from custcycles cc
where cc.custid = c.custid and cc.cycle = 'HARLEY'
)
) ;
Then, you want to be sure you have indexes on custcars(custid, car) and custcycles(custid, cycle).
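
Before comparing timings, it is worth checking that the rewrites return the same customers. Here is a small sketch with Python's sqlite3 module and invented rows. It runs the original NOT IN form, a NOT EXISTS form, and a left excluding join whose two IS NULL tests are combined with AND. The sample data contains only COMM and RETAIL customers; other customer types are not covered by this check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data using the table and column names from the question.
    CREATE TABLE CUST (CUSTID INTEGER PRIMARY KEY, CUSTTYPE TEXT);
    CREATE TABLE CUSTCARS (CUSTID INTEGER, CAR TEXT);
    CREATE TABLE CUSTCYCLES (CUSTID INTEGER, CYCLE TEXT);

    INSERT INTO CUST VALUES (1, 'COMM'), (2, 'COMM'), (3, 'RETAIL'), (4, 'RETAIL');
    INSERT INTO CUSTCARS VALUES (1, 'FORD'), (1, 'BMW'), (2, 'BMW'), (3, 'FORD');
    INSERT INTO CUSTCYCLES VALUES (3, 'HARLEY'), (4, 'HONDA'), (2, 'HARLEY');
""")

def ids(sql):
    # Return just the customer ids produced by a query.
    return [row[0] for row in conn.execute(sql)]

not_in = ids("""
    SELECT CUSTID FROM CUST
    WHERE (CUSTTYPE = 'COMM'
           AND CUSTID NOT IN (SELECT CUSTID FROM CUSTCARS WHERE CAR = 'FORD'))
       OR (CUSTTYPE = 'RETAIL'
           AND CUSTID NOT IN (SELECT CUSTID FROM CUSTCYCLES WHERE CYCLE = 'HARLEY'))
    ORDER BY CUSTID
""")

not_exists = ids("""
    SELECT c.CUSTID FROM CUST c
    WHERE (c.CUSTTYPE = 'COMM'
           AND NOT EXISTS (SELECT 1 FROM CUSTCARS cc
                           WHERE cc.CUSTID = c.CUSTID AND cc.CAR = 'FORD'))
       OR (c.CUSTTYPE = 'RETAIL'
           AND NOT EXISTS (SELECT 1 FROM CUSTCYCLES cy
                           WHERE cy.CUSTID = c.CUSTID AND cy.CYCLE = 'HARLEY'))
    ORDER BY c.CUSTID
""")

left_excluding = ids("""
    SELECT CUST.CUSTID
    FROM CUST
    LEFT JOIN CUSTCARS ON CUST.CUSTID = CUSTCARS.CUSTID
        AND CUSTCARS.CAR = 'FORD' AND CUST.CUSTTYPE = 'COMM'
    LEFT JOIN CUSTCYCLES ON CUST.CUSTID = CUSTCYCLES.CUSTID
        AND CUSTCYCLES.CYCLE = 'HARLEY' AND CUST.CUSTTYPE = 'RETAIL'
    -- A customer is kept only when neither excluding join found a match.
    WHERE CUSTCARS.CUSTID IS NULL AND CUSTCYCLES.CUSTID IS NULL
    ORDER BY CUST.CUSTID
""")

print(not_in, not_exists, left_excluding)
```

All three return customers 2 and 4 on this data. Customer 2 owns a Harley but is a commercial customer, and customer 3 owns a Ford but is a retail customer, so the sample also checks that each exclusion applies only to its own customer type.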

SQL Server - UNION ALL

I'm new to SQL development and I need to do a UNION on two SELECT statements. Below is a sample query. The joined tables and conditions, WHERE criteria, column names, and everything else are the same in both SELECT statements, except for the primary table after the FROM clause. Is there a way to have a single static SELECT query, instead of repeating the same query twice for the UNION (without resorting to a dynamic query)?
SELECT Sum(ABC.Intakes) As TotalIntakes, Sum(ABC.ClientTarget) as TotalClientTarget
FROM(
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT ClientID) As IntakesReceived,
DATEDIFF(MONTH, L.AwardStartDate, L.AwardEndDate)*L.MonthlyClientTarget As ClientTarget,
L.AwardId, L.ProgramId
FROM IntakeCoverageLegacy As L
LEFT JOIN UserRoleEntity URE ON URE.EntityId = L.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (@Program IS NULL OR L.ProgramId IN (SELECT ProgramID FROM #ProgramIDList))
AND (ufn_IsInternalUser(@UserId) = 1
OR (ufn_IsInternalUser(@UserId) = 0 AND UR.CDPUserId = @UserId ))
GROUP BY L.AwardId, L.ProgramId) As tt
GROUP BY tt.ProgramId
UNION ALL
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT C.ClientID) As IntakesReceived,
DATEDIFF(MONTH, C.AwardStartDate, C.AwardEndDate)*C.MonthlyClientTarget As ClientTarget,
C.AwardId, C.ProgramId
FROM IntakeCoverageCDP As C
LEFT JOIN UserRoleEntity URE ON URE.EntityId = C.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (@Program IS NULL OR C.ProgramId IN (SELECT ProgramID FROM #ProgramIDList))
AND (ufn_IsInternalUser(@UserId) = 1
OR (ufn_IsInternalUser(@UserId) = 0 AND UR.CDPUserId = @UserId ))
GROUP BY C.AwardId, C.ProgramId) As tt
GROUP BY tt.ProgramId
) As ABC
GROUP BY ABC.ProgramId
OK... What I posted earlier was a sample query, and I've now updated it to my actual query to make things clearer. It's just the primary tables that are different. My requirement is that, after doing the UNION ALL, I need to sum the aggregate columns in the final result, grouping by ProgramId.
I would probably first use UNION for the Client and LegacyClient tables as a derived table and then perform the JOINs:
SELECT C.AwardId,
C.ProgramName,
COUNT(ClientId) AS Intakes
FROM ( SELECT AwardId,
ProgramName,
Id
FROM Client
WHERE Id = @ClientId
UNION
SELECT AwardId,
ProgramName,
Id
FROM LegacyClient
WHERE Id = @ClientId) C
LEFT JOIN UserRoleEntity URE
ON C.AwardId = URE.EntityId
LEFT JOIN UserRole UR
ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (testFunction(@UserId) = 0
OR (testFunction(@UserId) <> 0 AND UR.CDPUserId = @UserId))
GROUP BY C.AwardId,
C.ProgramName;
SELECT C.AwardId, C.ProgramName, Count(ClientId) as Intakes
FROM
(
SELECT Id, AwardId, ProgramName, ClientId FROM Client UNION ALL
SELECT Id, AwardId, ProgramName, ClientId FROM LegacyClient
) C
LEFT OUTER JOIN UserRoleEntity URE ON C.AwardId = URE.EntityId
LEFT OUTER JOIN UserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE
C.Id = @ClientId
AND (testFunction(@UserId) = 0 OR UR.CDPUserId = @UserId)
GROUP BY C.AwardId, C.ProgramName
Using testFunction() twice isn't really necessary (unless NULL is one of the outputs).
You might also prefer to filter on ClientId outside of the union. I'm guessing your purpose in rewriting it is to avoid the duplicated logic. You might still want to see which version the optimizer handles better.
Also, I used a UNION ALL. I'm assuming you expect only one result from one of the two tables. As you originally wrote it, that count column is going to factor into the union.
Counting on ClientId seems odd. So does having a parameter named @ClientId that doesn't seem to match up with the ClientId column.
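
To show the overall shape of "union the two source tables first, then write the joins and aggregation once", here is a sketch using Python's sqlite3 module. It uses cut-down, hypothetical versions of IntakeCoverageLegacy and IntakeCoverageCDP with invented rows, and it leaves out the user-role joins, the ClientTarget calculation and the parameters to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, hypothetical versions of the two source tables.
    CREATE TABLE IntakeCoverageLegacy (ClientID INTEGER, AwardId INTEGER, ProgramId INTEGER);
    CREATE TABLE IntakeCoverageCDP (ClientID INTEGER, AwardId INTEGER, ProgramId INTEGER);

    INSERT INTO IntakeCoverageLegacy VALUES (1, 10, 100), (2, 10, 100), (2, 10, 100);
    INSERT INTO IntakeCoverageCDP VALUES (3, 11, 100), (4, 12, 200);
""")

rows = conn.execute("""
    SELECT tt.ProgramId, SUM(tt.IntakesReceived) AS TotalIntakes
    FROM (
        -- The per-award aggregation is written once, over the combined rows.
        SELECT src.AwardId, src.ProgramId,
               COUNT(DISTINCT src.ClientID) AS IntakesReceived
        FROM (
            SELECT ClientID, AwardId, ProgramId FROM IntakeCoverageLegacy
            UNION ALL
            SELECT ClientID, AwardId, ProgramId FROM IntakeCoverageCDP
        ) AS src
        GROUP BY src.AwardId, src.ProgramId
    ) AS tt
    GROUP BY tt.ProgramId
    ORDER BY tt.ProgramId
""").fetchall()

print(rows)
```

One difference from aggregating each table separately: if the same AwardId and ClientID appear in both tables, COUNT(DISTINCT ...) over the combined rows counts that client once rather than twice. Whether that matters depends on the data.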