Need to improve the performance of a SQL query


I am new to SQL and I'm facing a problem with the performance of a SQL query.
I followed some tips I found on Google and created the required indexes, but I still can't improve the performance.
Please help me improve the performance of the following query. The tables have millions of records.
SELECT TOP 15 id,
field1,
field2
FROM (SELECT DISTINCT 0 AS ID,
tblsuites.suite Field1,
'Work Order' AS Field2
FROM tbljb_schedules
INNER JOIN tblsuites
ON tbljb_schedules.tblsuites_id = tblsuites.tblsuites_id
INNER JOIN tblsites
ON tbljb_schedules.tblsites_id = tblsites.tblsites_id
LEFT OUTER JOIN tblbldgs
ON tbljb_schedules.tblbldgs_id = tblbldgs.tblbldgs_id
WHERE tbljb_schedules.tbldomains_id = 28
AND tbljb_schedules.internalonly = 0
AND tbljb_schedules.tblsites_id IN (SELECT tblsites_id
FROM tbllogins_sites
WHERE tbllogins_id = 264
AND tblsites.active = 1)
AND ( tblsuites.suite LIKE '%1%' )
UNION
SELECT DISTINCT 0 AS ID,
tblsuites.suite Field1,
'Work Order' AS Field2
FROM arcjb_schedules
INNER JOIN tblsuites
ON arcjb_schedules.tblsuites_id = tblsuites.tblsuites_id
INNER JOIN tblsites
ON arcjb_schedules.tblsites_id = tblsites.tblsites_id
LEFT OUTER JOIN tblbldgs
ON arcjb_schedules.tblbldgs_id = tblbldgs.tblbldgs_id
WHERE arcjb_schedules.tbldomains_id = 28
AND arcjb_schedules.internalonly = 0
AND arcjb_schedules.tblsites_id IN (SELECT tblsites_id
FROM tbllogins_sites
WHERE tbllogins_id = 264
AND tblsites.active = 1)
AND ( tblsuites.suite LIKE '%1%' )) T
ORDER BY CASE
WHEN Charindex('1', field1) = 1 THEN 0
ELSE 1
END,
field1

You could try the following changes:
SELECT TOP 15
id,
field1,
field2
FROM
(
SELECT
0 AS ID,
tblsuites.suite Field1,
'Work Order' AS Field2,
SeqOne = CASE WHEN CHARINDEX('1', tblsuites.suite) = 1 -- 0 = starts with '1', so it sorts first
THEN 0
ELSE 1 END
FROM tbljb_schedules
INNER JOIN tblsuites
ON tbljb_schedules.tblsuites_id = tblsuites.tblsuites_id
INNER JOIN tblsites
ON tbljb_schedules.tblsites_id = tblsites.tblsites_id
LEFT OUTER JOIN tblbldgs
ON tbljb_schedules.tblbldgs_id = tblbldgs.tblbldgs_id
WHERE tbljb_schedules.tbldomains_id = 28
AND tbljb_schedules.internalonly = 0
AND EXISTS -- Replace IN With EXISTS
(
SELECT
1
FROM tbllogins_sites
WHERE tbllogins_id = 264
AND tblsites.active = 1
AND tblsites_id = tbljb_schedules.tblsites_id
)
AND (tblsuites.suite LIKE '%1%')
UNION -- UNION removes duplicate rows by default, so DISTINCT is not needed
SELECT
0 AS ID,
tblsuites.suite Field1,
'Work Order' AS Field2,
SeqOne = CASE WHEN CHARINDEX('1', tblsuites.suite) = 1
THEN 0
ELSE 1 END
FROM arcjb_schedules
INNER JOIN tblsuites ON arcjb_schedules.tblsuites_id = tblsuites.tblsuites_id
INNER JOIN tblsites ON arcjb_schedules.tblsites_id = tblsites.tblsites_id
LEFT OUTER JOIN tblbldgs ON arcjb_schedules.tblbldgs_id = tblbldgs.tblbldgs_id
WHERE arcjb_schedules.tbldomains_id = 28
AND arcjb_schedules.internalonly = 0
AND EXISTS
(
SELECT
1
FROM tbllogins_sites
WHERE tbllogins_id = 264
AND tblsites.active = 1
AND tblsites_id = arcjb_schedules.tblsites_id
)
AND (tblsuites.suite LIKE '%1%')
)T
ORDER BY
SeqOne,
field1;
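The effect of the SeqOne column can be checked outside SQL Server. The sketch below is a hypothetical SQLite stand-in run through Python's built-in sqlite3 module: the tables current_schedules, archived_schedules, and suites and their rows are invented, LIMIT plays the role of TOP, and instr() plays the role of CHARINDEX(). It shows UNION removing duplicates across both branches and suites that start with '1' sorting first, as the question's original ORDER BY does.

```python
import sqlite3

# Hypothetical stand-in data: SQLite instead of SQL Server, invented tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suites (suite_id INTEGER PRIMARY KEY, suite TEXT);
    CREATE TABLE current_schedules (suite_id INTEGER);
    CREATE TABLE archived_schedules (suite_id INTEGER);
    INSERT INTO suites VALUES (1, '210'), (2, '105'), (3, '300'), (4, '12A');
    INSERT INTO current_schedules VALUES (1), (2), (2);
    INSERT INTO archived_schedules VALUES (2), (4), (3);
""")


def top_suites(conn, limit=15):
    # LIMIT stands in for TOP; instr() stands in for CHARINDEX().
    # seq_one is 0 for suites that start with '1', so they sort first.
    sql = """
        SELECT id, field1, field2
        FROM (
            SELECT 0 AS id, s.suite AS field1, 'Work Order' AS field2,
                   CASE WHEN instr(s.suite, '1') = 1 THEN 0 ELSE 1 END AS seq_one
            FROM current_schedules AS c
            JOIN suites AS s ON s.suite_id = c.suite_id
            WHERE s.suite LIKE '%1%'
            UNION  -- removes duplicate rows, so no DISTINCT is needed
            SELECT 0, s.suite, 'Work Order',
                   CASE WHEN instr(s.suite, '1') = 1 THEN 0 ELSE 1 END
            FROM archived_schedules AS a
            JOIN suites AS s ON s.suite_id = a.suite_id
            WHERE s.suite LIKE '%1%'
        ) AS t
        ORDER BY seq_one, field1
        LIMIT ?
    """
    return [row[1] for row in conn.execute(sql, (limit,))]


print(top_suites(conn))  # ['105', '12A', '210']
```

Suite '105' appears three times across the two branches but comes back once, and '210' sorts last because it does not start with '1'.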

Performance tuning is largely a matter of trial and error.
Looking at your code, I can see one major performance offender, this condition in the WHERE clause:
and tblsuites.suite LIKE '%1%'
I don't know how big the tblSuites table is, but because the pattern '%1%' starts with a wildcard, the optimizer cannot do an index seek on that column and has to scan instead. Even if you have indexed the column, the index cannot be used for a seek.
Also follow what @Jayasurya Satheesh mentioned in his answer.
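One way to see the leading-wildcard problem is to compare query plans. The sketch below assumes SQLite (through Python's sqlite3 module) rather than SQL Server, with a hypothetical suites table. SQLite's LIKE is case-insensitive by default, so the column is declared COLLATE NOCASE for SQLite's LIKE optimization to apply. A prefix pattern such as '1%' can be turned into an index range search, while '%1%' cannot. SQL Server words its plans differently (Index Seek versus Index Scan or Table Scan), but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table. SQLite's LIKE is case-insensitive by default, so the column
# is declared COLLATE NOCASE; this lets SQLite's LIKE optimization use the index.
conn.execute("CREATE TABLE suites (suite_id INTEGER PRIMARY KEY, suite TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_suite ON suites (suite)")
conn.executemany("INSERT INTO suites (suite) VALUES (?)", [(f"{n}A",) for n in range(1000)])


def plan(pattern):
    # Patterns here are trusted constants. SQLite applies its LIKE optimization to a
    # string literal on the right-hand side, so the pattern is inlined into the SQL.
    query = f"EXPLAIN QUERY PLAN SELECT suite FROM suites WHERE suite LIKE '{pattern}'"
    # Each plan row is (id, parent, notused, detail); keep only the detail text.
    return " | ".join(row[3] for row in conn.execute(query))


print(plan("1%"))   # a SEARCH line using idx_suite: a range lookup on the index
print(plan("%1%"))  # a SCAN line: every row (or index entry) is examined
```

The exact plan wording varies between SQLite versions, which is why the comparison only looks at SEARCH versus SCAN.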

Related

How do you properly query the result of a complex join statement in SQL?

New to advanced SQL!
I'm trying to write a query that returns the COUNT(*) and the SUMs of the Tier columns produced by this query:
DECLARE @Id INT = 1000;
SELECT
*,
CASE
WHEN Id1 >= 6 THEN 1
ELSE 0
END AS Tier1,
CASE
WHEN Id1 >= 4 THEN 1
ELSE 0
END AS Tier2,
CASE
WHEN Id1 >= 2 THEN 1
ELSE 0
END AS Tier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
I've tried to do so by moving the @Id declaration outside the original query and adding a COUNT(*) and the SUMs to the top, like so:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1), SUM(employees.Tier2), SUM(employees.Tier3)
FROM
(SELECT *,
...
) AS employees
);
When I run the query, however, I'm getting the errors:
The multi-part identifier employees.Tier1 could not be bound
The same errors appear for the other identifiers in my SUM statements.
I'm assuming this has to do with the fact that the Tier1, Tier2, and Tier3 columns are computed by the query inside my FROM clause rather than stored in the tables I'm querying, but I can't figure out how to rewrite it properly.
Thanks in advance for the help!

This is a scope problem: employees is defined only inside the subquery, so it is not visible in the outer scope. You need to give the outer derived table an alias:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1) TotalTier1, SUM(employees.Tier2) TotalTier2, SUM(employees.Tier3) TotalTier3
FROM (
SELECT *,
...
) AS employees
) AS employees;
--^ here
Note that I added column aliases to the outer query, which is a good practice in SQL.
It might be easier to understand what is going on if you use another alias for the outer query:
SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
FROM (
SELECT *,
...
) AS employees
) AS e;
Note that you don't actually need to qualify the column names in the outer query, since they are unambiguous anyway.
And finally: you don't need the extra outer query. You can aggregate the CASE expressions directly over the grouped subquery:
DECLARE @Id INT = 1000;
SELECT
COUNT(*) AS EmployeeCount,
SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END) AS TotalTier1,
SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END) AS TotalTier2,
SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END) AS TotalTier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
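The aggregation step can be tried on a small stand-in. This sketch uses SQLite from Python, and the employee_levels table and its values are hypothetical: it replaces the big grouped subquery with one row per applicant and their highest Id1. Each CASE yields 1 or 0, so SUM counts the rows that reach each tier, and COUNT(*) counts all rows of the aliased derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the grouped subquery: one row per applicant
# with the highest career level (Id1) they reached.
conn.executescript("""
    CREATE TABLE employee_levels (app_id INTEGER PRIMARY KEY, id1 INTEGER);
    INSERT INTO employee_levels VALUES (1, 7), (2, 4), (3, 2), (4, 1), (5, 6);
""")


def tier_totals(conn):
    # Aggregate directly over the aliased derived table; each CASE maps a row
    # to 1 or 0, so SUM counts the rows that reach each tier.
    return conn.execute("""
        SELECT COUNT(*) AS employee_count,
               SUM(CASE WHEN id1 >= 6 THEN 1 ELSE 0 END) AS total_tier1,
               SUM(CASE WHEN id1 >= 4 THEN 1 ELSE 0 END) AS total_tier2,
               SUM(CASE WHEN id1 >= 2 THEN 1 ELSE 0 END) AS total_tier3
        FROM (SELECT app_id, id1 FROM employee_levels) AS employees
    """).fetchone()


print(tier_totals(conn))  # (5, 2, 3, 4)
```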

Running slow with intersect

I'm trying to optimize a query that updates the records in #TableA based on the INTERSECT of two data sets.
UPDATE #TableA
SET IsFlag = CASE WHEN ISNULL(RJobFlag, 0) > 0 THEN 0 ELSE 1 END
FROM #TableA AS ABC
OUTER APPLY (
SELECT 1 RJobFlag
WHERE EXISTS (
SELECT ABC.COLUMN1,ABC.COLUMN2,ABC.COLUMN3,ABC.COLUMN4,ABC.COLUMN5,ABC.COLUMN6,ABC.COLUMN7,ABC.COLUMN8,ABC.COLUMN9,ABC.COLUMN10,StudentID,SubjectID
INTERSECT
SELECT XYZ.COLUMN1,XYZ.COLUMN2,XYZ.COLUMN3,XYZ.COLUMN4,XYZ.COLUMN5,XYZ.COLUMN6,XYZ.COLUMN7,XYZ.COLUMN8,XYZ.COLUMN9,XYZ.COLUMN10,StudentID,SubjectID
FROM #TableB AS XYZ
WHERE XYZ.COLUMN1 = (SELECT DISTINCT ID FROM #TableC MNOP WHERE MNOP.StudentID = ABC.StudentID)
AND StudentID = ABC.StudentID
AND SubjectID = ABC.SubjectID )
) Subquery
WHERE ABC.COLUMN1= '2'
I'd appreciate any ideas on how to optimize it further.
Thanks

This might be worse; you never know until you test it.
UPDATE tA
SET IsFlag = CASE WHEN tB.RJobFlag = 1 THEN 0 ELSE 1 END -- 0 when a matching row exists, as in the original
FROM #TableA tA
LEFT JOIN ( SELECT 1 RJobFlag, *
FROM #TableB tB
WHERE EXISTS ( SELECT *
FROM #TableC tC
WHERE tC.ID = tB.COLUMN1 AND tC.StudentID = tB.StudentID)
) tB
ON tA.StudentID = tB.StudentID
AND tA.SubjectID = tB.SubjectID
AND tA.COLUMN1 = tB.COLUMN1
AND tA.COLUMN2 = tB.COLUMN2
AND tA.COLUMN3 = tB.COLUMN3
AND tA.COLUMN4 = tB.COLUMN4
AND tA.COLUMN5 = tB.COLUMN5
AND tA.COLUMN6 = tB.COLUMN6
AND tA.COLUMN7 = tB.COLUMN7
AND tA.COLUMN8 = tB.COLUMN8
AND tA.COLUMN9 = tB.COLUMN9
AND tA.COLUMN10 = tB.COLUMN10
WHERE tA.COLUMN1 = '2' -- same filter as the original query
If #TableA has a lot of rows, the update will be slow whether you use OUTER APPLY or an outer join, because every qualifying row gets updated. Maybe there's a way to update only the rows whose flag actually changes?
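One thing to check before swapping INTERSECT for a join is NULL handling: INTERSECT treats two NULLs as equal, while = does not. The sketch below uses SQLite via Python; table_a, table_b, and their rows are hypothetical, and only one of the ten COLUMNn columns is modeled (as col2). It shows that a plain = comparison flags a row differently from the INTERSECT version when both sides hold NULL, and that an explicit IS NULL check restores the INTERSECT behavior. The same OR ... IS NULL pattern is valid T-SQL.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (id INTEGER PRIMARY KEY, student_id INTEGER,
                              subject_id INTEGER, col2 TEXT, is_flag INTEGER);
        CREATE TABLE table_b (student_id INTEGER, subject_id INTEGER, col2 TEXT);
        INSERT INTO table_a (id, student_id, subject_id, col2) VALUES
            (1, 10, 100, 'x'), (2, 10, 200, NULL), (3, 20, 100, 'y');
        INSERT INTO table_b VALUES (10, 100, 'x'), (10, 200, NULL), (20, 100, 'z');
    """)
    return conn


# Flag is 0 when table_b has a matching row, 1 otherwise (same as the original CASE).
INTERSECT_SQL = """
    UPDATE table_a SET is_flag = CASE WHEN EXISTS (
        SELECT table_a.student_id, table_a.subject_id, table_a.col2
        INTERSECT
        SELECT b.student_id, b.subject_id, b.col2 FROM table_b AS b
    ) THEN 0 ELSE 1 END
"""
EQUALS_SQL = """
    UPDATE table_a SET is_flag = CASE WHEN EXISTS (
        SELECT 1 FROM table_b AS b
        WHERE b.student_id = table_a.student_id
          AND b.subject_id = table_a.subject_id
          AND b.col2 = table_a.col2  -- NULL = NULL is not true, so row 2 misses
    ) THEN 0 ELSE 1 END
"""
NULL_SAFE_SQL = """
    UPDATE table_a SET is_flag = CASE WHEN EXISTS (
        SELECT 1 FROM table_b AS b
        WHERE b.student_id = table_a.student_id
          AND b.subject_id = table_a.subject_id
          AND (b.col2 = table_a.col2 OR (b.col2 IS NULL AND table_a.col2 IS NULL))
    ) THEN 0 ELSE 1 END
"""


def flags(update_sql):
    # Run one UPDATE variant on fresh data and return {id: is_flag}.
    conn = make_db()
    conn.execute(update_sql)
    return dict(conn.execute("SELECT id, is_flag FROM table_a ORDER BY id"))


print(flags(INTERSECT_SQL))  # {1: 0, 2: 0, 3: 1}
print(flags(EQUALS_SQL))     # {1: 0, 2: 1, 3: 1}
print(flags(NULL_SAFE_SQL))  # {1: 0, 2: 0, 3: 1}
```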

Is there a way to make this query more efficient performance wise?

This query takes a long time to run on an MS SQL Server 2008 database with 70 GB of data.
If I run the two WHERE conditions separately, each takes much less time.
EDIT: I need to change the SELECT * to a DELETE afterwards, so please keep that in mind when answering. Thanks :)
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
OR
ClientId in
(
select substring(ClientID,11,100)
from computers
)

Swapping IN for EXISTS may help, although SQL Server often produces the same plan for both, so compare the execution plans.
Also, as per Gordon's answer, UNION can outperform OR.
SELECT computers.*
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
UNION
SELECT *
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
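You've since updated your question to say this needs to be a DELETE.
The good news is that instead of the OR, you can just write two DELETE statements. Keep in mind that the second statement sees the table after the first one has run, so rows removed by the first can no longer satisfy the second one's EXISTS check: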
DELETE computers
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
;
DELETE c
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
;
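Splitting the OR into two DELETE statements is not always equivalent, because the second DELETE runs against what the first one left behind. The sketch below uses SQLite via Python with a simplified, hypothetical computers table (no policyassociations) and invented rows to build a case where that matters. It contrasts the two-statement version with first collecting the keys to delete into a temporary table, so that both conditions see the same snapshot. In T-SQL the analogous approach would be something like SELECT PK INTO #to_delete FROM computers WHERE <both conditions>, followed by a DELETE restricted to those keys; that adaptation is an assumption and is not tested here.

```python
import sqlite3

# Simplified, hypothetical versions of the question's two conditions.
COND_A = """name IN (SELECT name FROM computers GROUP BY name HAVING COUNT(*) > 1)
            AND (encryptionstatus = 0 OR encryptionstatus IS NULL)"""
COND_B = """EXISTS (SELECT 1 FROM computers AS src
                    WHERE substr(src.clientid, 11, 100) = computers.clientid)"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE computers (pk INTEGER PRIMARY KEY, name TEXT,
                                clientid TEXT, encryptionstatus INTEGER);
        INSERT INTO computers VALUES
            (1, 'dup',  'PREFIX0000ABC', 0),  -- matches COND_A
            (2, 'dup',  'other1',        1),  -- matches neither
            (3, 'solo', 'ABC',           1);  -- matches COND_B only via row 1
    """)
    return conn


def remaining(conn):
    return [row[0] for row in conn.execute("SELECT pk FROM computers ORDER BY pk")]


def delete_sequential():
    # Two DELETE statements: the second sees the table after the first has run.
    conn = make_db()
    conn.execute(f"DELETE FROM computers WHERE {COND_A}")
    conn.execute(f"DELETE FROM computers WHERE {COND_B}")
    return remaining(conn)


def delete_from_snapshot():
    # Collect every key first, so both conditions are evaluated on the same data.
    conn = make_db()
    conn.execute("CREATE TEMP TABLE to_delete AS "
                 f"SELECT pk FROM computers WHERE ({COND_A}) OR ({COND_B})")
    conn.execute("DELETE FROM computers WHERE pk IN (SELECT pk FROM to_delete)")
    return remaining(conn)


print(delete_sequential())     # [2, 3]: row 3 survives because row 1 is already gone
print(delete_from_snapshot())  # [2]
```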

Some things I would look at:
1. Are the right indexes in place?
2. IN can slow your query down; try replacing it with joins.
3. Use a column name in COUNT (I guess Name in this case) instead of COUNT(*); note that COUNT(Name) does not count rows where Name is NULL.
4. Select only the data you need by listing particular columns instead of *.
Hope this helps!
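A caveat on tip 2: a join returns one row per match, so rewriting IN as a join can repeat rows when the subquery side has duplicates, whereas IN and EXISTS return each outer row at most once. A small sketch in SQLite via Python, with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE computers (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE policyassociations (entityid INTEGER, entitytype INTEGER);
    INSERT INTO computers VALUES (1, 'alpha'), (2, 'beta');
    -- computer 1 has two associations, computer 2 has one
    INSERT INTO policyassociations VALUES (1, 2), (1, 3), (2, 1);
""")

IN_SQL = """SELECT c.pk FROM computers AS c
            WHERE c.pk IN (SELECT entityid FROM policyassociations) ORDER BY c.pk"""
EXISTS_SQL = """SELECT c.pk FROM computers AS c
                WHERE EXISTS (SELECT 1 FROM policyassociations AS pa
                              WHERE pa.entityid = c.pk) ORDER BY c.pk"""
JOIN_SQL = """SELECT c.pk FROM computers AS c
              JOIN policyassociations AS pa ON pa.entityid = c.pk ORDER BY c.pk"""
DISTINCT_JOIN_SQL = """SELECT DISTINCT c.pk FROM computers AS c
                       JOIN policyassociations AS pa ON pa.entityid = c.pk ORDER BY c.pk"""


def pks(sql):
    return [row[0] for row in conn.execute(sql)]


print(pks(IN_SQL))             # [1, 2]
print(pks(EXISTS_SQL))         # [1, 2]
print(pks(JOIN_SQL))           # [1, 1, 2]: computer 1 appears once per association
print(pks(DISTINCT_JOIN_SQL))  # [1, 2]
```

Adding DISTINCT (or keeping EXISTS) removes the repeats; whether the join is actually faster still has to be confirmed against the real execution plan.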

OR can sometimes be poorly optimized. In this case, you can split the query into two queries and combine them with UNION:
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
UNION
select *
From computers
WHERE ClientId in
(
select substring(ClientID,11,100)
from computers
);
You might also be able to improve performance by replacing the subqueries with explicit joins. However, this seems like the shortest route to better performance.
EDIT:
I think the version with joins is:
select distinct c.* -- distinct: the joins below can match a computer more than once
From computers c left outer join
(select c.Name
from (select c.*, count(*) over (partition by Name) as cnt
from computers c
) c left join
policyassociations PA
on c.PK = PA.EntityId
where (c.EncryptionStatus = 0 or c.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL) and
c.cnt > 1
) cpa
on c.Name = cpa.Name left outer join
(select substring(ClientID, 11, 100) as ClientIdSub
from computers
) csub
on c.ClientId = csub.ClientIdSub
Where cpa.Name is not null or csub.ClientIdSub is not null;

How to join SELECT statements within a CASE statement to outer queries?

In the following query, I need to somehow join the fields inside the CASE expression to the outer tables so that fldNumb is selected for the current PK and CIA values. I am really stuck; can anyone help?
INSERT INTO #CTE (sMedNum)
SELECT T1.sMedNum
FROM #CTE INNER JOIN
(SELECT
(CASE WHEN (Charindex('.', CAST(rInd AS NVARCHAR(30))) > 0)
THEN CAST(( SELECT fldNumb
FROM #CTE
WHERE sCtr = FLOOR(rInd)) AS FLOAT) +
CAST(( SELECT fldNumb
FROM #CTE
WHERE Ind = FLOOR(rInd+1)) AS FLOAT) / 2
ELSE CAST(( SELECT fldNumb
FROM #CTE
WHERE sCtr = rInd) AS FLOAT)
END) AS sMedNum, fldPK, fldCIA
) T1
ON
#CTE.fldPK = T2.fldPK AND
#CTE.fldCIA = T2.fldCIA

Without access to your database I can't help much, but it looks like you need to alias your tables. If a subquery needs a value from a table in the outer query, give the inner copy of the table a different alias so the two can be told apart.
Here is an example:
select case
when a1.value = 1 then (select a2.id from tableA a2 where a1.val2 = a2.val2)
when a1.value = 2 then (select a3.id from tableA a3 where a1.val2 = a3.val2)
when a1.value = 3 then (select a4.id from tableA a4 where a1.val2 = a4.val2)
end as new_id
from tableA a1
Hope this makes sense; if it doesn't, that's what questions are for ;)
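The aliasing point can be checked with a small sketch (SQLite via Python; the tableA rows are made up). Without aliases, val2 = val2 inside the subquery refers to the inner copy of the table on both sides, so the subquery ignores the outer row entirely; with a1/a2 aliases, it is correlated to the outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, value INTEGER, val2 TEXT);
    INSERT INTO tableA VALUES (1, 1, 'x'), (2, 2, 'x'), (3, 3, 'y');
""")

# No aliases: both sides of val2 = val2 bind to the inner tableA,
# so every outer row gets the same count of all rows.
UNCORRELATED = """
    SELECT id, (SELECT COUNT(*) FROM tableA WHERE val2 = val2) AS same_val2
    FROM tableA ORDER BY id
"""
# With aliases: a1 is the outer row, a2 the inner copy being searched.
CORRELATED = """
    SELECT a1.id,
           (SELECT COUNT(*) FROM tableA AS a2 WHERE a2.val2 = a1.val2) AS same_val2
    FROM tableA AS a1 ORDER BY a1.id
"""

print(conn.execute(UNCORRELATED).fetchall())  # [(1, 3), (2, 3), (3, 3)]
print(conn.execute(CORRELATED).fetchall())    # [(1, 2), (2, 2), (3, 1)]
```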

Link tables based on column value

Is it possible to pull values from two different tables based on the value of a column? For example, I have a table with a boolean column that is either 0 or 1, depending on what the end user selects in our program. 0 means that I should pull in the default values; 1 means that I should use the user's data.
If my table Table1 looked like this:
Case ID   Boolean
=================
1         0
2         1
3         1
4         0
5         0
Then I would need to pull the data for Case IDs 1, 4, and 5 from table Default and the data for Case IDs 2 and 3 from table UserDef. Then I would have to combine these values and sort them by Case ID so the resulting table preserves the original order.
I am fairly inexperienced with SQL but I am trying to learn. Any help or suggestions are greatly appreciated. Thank you in advance for your help.

Something like this:
SELECT
t1.CaseID
,CASE WHEN t1.Boolean = 1 THEN ut.Col1 ELSE dt.Col1 END AS Col1 -- 1 = user data, 0 = defaults
,CASE WHEN t1.Boolean = 1 THEN ut.Col2 ELSE dt.Col2 END AS Col2
FROM Table1 t1
LEFT JOIN DefaultTable dt ON dt.CaseID = t1.CaseID
LEFT JOIN UserDefTable ut ON ut.CaseID = t1.CaseID
ORDER BY t1.CaseID
You join both tables and then use CASE in the SELECT to choose which table to take the data from.
Option B:
WITH CTE_Combo AS
(
SELECT 0 AS Boolean, * FROM [Default] --DEFAULT is a reserved word, so bracket it; replace * with needed columns
UNION ALL
SELECT 1 AS Boolean, * FROM UserDef --replace * with needed columns
)
SELECT * FROM Table1 t
LEFT JOIN CTE_Combo c ON t.CaseID = c.CaseID AND t.Boolean = c.Boolean
ORDER BY t.CaseID
This might be even simpler: use a CTE to UNION both tables with an artificial flag column, then join the CTE to your table on both the ID and the flag column.

SELECT t1.CaseID,
ISNULL(td.data, tu.data) userData -- pick data from table_default
-- if not null else from table_user
FROM table1 t1
LEFT JOIN table_default td ON t1.CaseID = td.CaseID -- left join with table_default
AND t1.Boolean = 0 -- when boolean = 0
LEFT JOIN table_user tu ON t1.CaseID = tu.CaseID -- left join with table_user
AND t1.Boolean = 1 -- when boolean = 1
ORDER BY t1.CaseID
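The conditional-join approach can be tried on the question's sample flags. This sketch uses SQLite via Python, hypothetical data values, and COALESCE in place of ISNULL (with two arguments, both return the first non-NULL one). Because each LEFT JOIN only matches when the flag selects its table, at most one side is non-NULL per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (case_id INTEGER PRIMARY KEY, boolean INTEGER);
    CREATE TABLE table_default (case_id INTEGER, data TEXT);
    CREATE TABLE table_user (case_id INTEGER, data TEXT);
    INSERT INTO table1 VALUES (1, 0), (2, 1), (3, 1), (4, 0), (5, 0);
    INSERT INTO table_default VALUES (1, 'd1'), (2, 'd2'), (3, 'd3'), (4, 'd4'), (5, 'd5');
    INSERT INTO table_user VALUES (2, 'u2'), (3, 'u3'), (5, 'u5');
""")


def resolved_rows(conn):
    # Each LEFT JOIN matches only when the flag selects that table, so COALESCE
    # picks whichever side is present; the flag, not row availability, decides.
    return conn.execute("""
        SELECT t1.case_id, COALESCE(td.data, tu.data) AS user_data
        FROM table1 AS t1
        LEFT JOIN table_default AS td ON td.case_id = t1.case_id AND t1.boolean = 0
        LEFT JOIN table_user AS tu ON tu.case_id = t1.case_id AND t1.boolean = 1
        ORDER BY t1.case_id
    """).fetchall()


print(resolved_rows(conn))  # [(1, 'd1'), (2, 'u2'), (3, 'u3'), (4, 'd4'), (5, 'd5')]
```

Case 5 has a user row ('u5'), but its flag is 0, so the default value is returned.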
