I have the query below, which updates certain ids on HierarchicalTable.
My first query is stable, but I have a problem with its performance:
DECLARE @targetName varchar(100)
UPDATE a
SET a.PrimaryId = b.PrimaryId
, a.SecondaryId = b.SecondaryId
FROM
(
SELECT PrimaryId
, SecondaryId
FROM Hierarchical
WHERE ParentName = @targetName
) as a
JOIN
(
SELECT PrimaryId
, SecondaryId
FROM Hierarchical
WHERE Name = @targetName
) b
ON a.ParentId = b.Id
This next query is my second option:
DECLARE @targetName varchar(100)
UPDATE a
SET a.PrimaryId = b.PrimaryId
, a.SecondaryId = b.SecondaryId
FROM Hierarchical a
JOIN Hierarchical b
ON a.ParentId = b.Id
WHERE a.ParentName = @targetName
AND b.Name = @targetName
My questions are:
Does the second query execute just like the first query?
Will the second query outperform the first query?
Note: I have a large amount of data, and we're having hardware issues when executing these queries.
I've posted here on SO to get whatever opinions I can.
Your first query will not execute as written, because the ON clause references a.ParentId and b.Id, but neither column is selected in its subquery. Let me assume the subqueries are really meant to expose those columns.
The question you are asking is about how the query is optimized. The real way to answer is to look at the query plan, which you can readily see in SQL Server Management Studio. You can start with the documentation to go down this path.
In your case, though, you are using the subqueries to say "do the filtering when you read the data". Actually, SQL Server typically pushes such filtering operations down to the table read, so the subqueries are probably superfluous.
If you want to improve performance, I would suggest that you have the following indexes on the table: hierarchical(parentname, id) and hierarchical(name, id). These should probably give a good performance boost.
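In T-SQL those would look something like this (index names are just illustrative):
CREATE INDEX ix_hierarchical_parentname_id ON Hierarchical (ParentName, Id);
CREATE INDEX ix_hierarchical_name_id ON Hierarchical (Name, Id);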
Assume my table has millions of records.
The following columns have indexes:
type
date
unique_id
Here is my query:
SELECT TOP (1000) T.TIME, T.TYPE, F.NAME,
B.NAME, T.MESSAGE
FROM MY_TABLE T
LEFT OUTER JOIN FOO F ON F.ID = T.FID
LEFT OUTER JOIN BAR B ON B.ID = T.BID
WHERE T.TYPE IN ('success', 'failure')
AND T.DATE BETWEEN 1592585183437 AND 1594232320525
AND T.UNIQUE_ID = 'my unique ID'
ORDER BY T.DATE DESC
My question is am I causing myself any trouble with this query if I have tons of records in my table? Can this be optimized further?
My question is am I causing myself any trouble with this query if I have tons of records
in my table?
Wrong question. As long as you are only asking for data you need, you NEED it. Any trouble means you STILL need it.
The query looks as good as it gets. I somewhat doubt the TOP (1000), though (that is a LOT of data unless you compress it somehow).
The question is more whether you have the proper indices; without them, this will crawl. I would also not use strings like this for TYPE, but then the question is about the query, not possible failures in table design.
Check indices.
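For what it's worth, here is a covering-index sketch for this query, assuming SQL Server from the TOP (1000) syntax (the index name and INCLUDE list are my assumptions):
-- Equality column first, then the range/sort column; INCLUDE covers the
-- selected columns so the TOP (1000) read never touches the base table.
CREATE INDEX IX_MY_TABLE_UNIQUEID_DATE
ON MY_TABLE (UNIQUE_ID, DATE DESC)
INCLUDE (TYPE, TIME, FID, BID, MESSAGE);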
I'm working on an Oracle query that does a select on a huge table; however, the joins with other tables seem to cost a lot in processing time.
I'm looking for tips on how to improve the performance of this query.
I'm attaching a version of the query and its explain plan.
Query
SELECT
l.gl_date,
l.REST_OF_TABLES,
(
SELECT
MAX(tt.task_id)
FROM
bbb.jeg_pa_tasks tt
WHERE
l.project_id = tt.project_id
AND l.task_number = tt.task_number
) task_id
FROM
aaa.jeg_labor_history l,
bbb.jeg_pa_projects_all p
WHERE
p.org_id = 2165
AND l.project_id = p.project_id
AND p.project_status_code = '1000'
A couple of things to mention:
This query takes data from Oracle to send to a SQL Server database, so I need it to be this big; I can't narrow the scope of the query.
The purpose is to set it up as a SQL Server job with SSIS so it runs periodically.
One obvious suggestion is not to use a subquery in the SELECT clause.
Instead, you can try to join the tables.
SELECT
l.gl_date,
l.REST_OF_TABLES,
t.task_id
FROM
aaa.jeg_labor_history l
Join bbb.jeg_pa_projects_all p
On (l.project_id = p.project_id)
Left join (SELECT
tt.project_id,
tt.task_number,
MAX(tt.task_id) task_id
FROM
bbb.jeg_pa_tasks tt
Group by tt.project_id, tt.task_number) t
On (l.project_id = t.project_id
AND l.task_number = t.task_number)
WHERE
p.org_id = 2165
AND p.project_status_code = '1000';
Cheers!!
As I don't know exactly how many rows this query returns or how many rows this table/view has, I can only provide a few simple tips which might help you get better query performance:
Check Indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement (see the sketch after this list).
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner join, instead of outer join if possible.
Your view contains a lot of data, so you can also break it down and pull only the information you need from it.
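To illustrate the first tip on this particular query: the correlated MAX() lookup benefits from a composite index on the columns it filters and returns (the index name is illustrative):
-- Lets MAX(task_id) per (project_id, task_number) be answered from the
-- index alone, without visiting the jeg_pa_tasks table.
CREATE INDEX bbb.jeg_pa_tasks_proj_task_ix
ON bbb.jeg_pa_tasks (project_id, task_number, task_id);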
I wrote this view when a deadline was looming.
WITH AllCategories
AS (SELECT CaseTable.CaseID,
CT.Category,
CT.CategoryType,
Q.Note AS CategoryCaseNote,
Q.CategoryID,
Q.CategoryIsDefaultValue
FROM CaseTable
INNER JOIN
((SELECT CaseID, -- Filled categories in table
CategoryCaseNote AS Note,
CategoryID,
-1 AS QuestionID,
0 AS CategoryIsDefaultValue
FROM CaseCategory)
UNION ALL
(SELECT -1 AS CaseID, -- possible categories
NULL AS Note,
CategoryID AS CategoryID,
QuestionID,
1 AS CategoryIsDefaultValue
FROM SHOW_QuestionCategory)) AS Q
ON (Q.QuestionID = -1
OR Q.QuestionID = CaseTable.QuestionID)
AND (Q.CaseID = -1
OR Q.CaseID = CaseTable.CaseTransactionID)
LEFT OUTER JOIN
CategoryTable AS CT
ON Q.CategoryID = CT.CategoryID)
SELECT A.*
FROM AllCategories AS A
INNER JOIN
(SELECT CaseID,
CategoryID,
MIN(CategoryIsDefaultValue) AS CategoryIsDefaultValue
FROM AllCategories
GROUP BY CaseID, CategoryID) AS B
ON A.CaseID = B.CaseID
AND A.CategoryID = B.CategoryID
AND A.CategoryIsDefaultValue = B.CategoryIsDefaultValue
Now it's becoming a bottleneck because of the very expensive join between CaseTable and the subquery with UNION (accounting for over 30% of the cost of a frequently used procedure; in the execution plan it's a nested loops node with ~70% of the select's cost).
I have tried to rewrite it multiple times, but these attempts resulted only in worse performance.
Table CaseCategory has a unique index on the tuple (CaseID, CategoryID).
It's probably a combination of problems with bad cardinality estimates and the use of a CTE. With what you've told us, I'll try to give some general guidance. The info you provided on the index means nothing without knowing the cardinality and distribution of the data. BTW, not sure if this qualifies as an answer, but it's too long for a comment. Feel free to downvote :)
There is a stored procedure selecting from the view, am I correct? I also presume you have some WHERE clause somewhere, right?
In that case, get rid of the view altogether and move the code into the procedure. This will allow you to get rid of the CTE (which is most likely executed twice) and to save the intermediate results from what is now the CTE into a #temp table. It could be beneficial to apply the same #temp-table strategy to the UNION ALL subquery.
Make sure to apply the WHERE predicates as soon as possible (SQL Server is usually good with pushing, but this combination of proc-view-CTE might confuse it).
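A skeleton of that restructuring, with the view body abbreviated in comments and a hypothetical @CaseID parameter standing in for whatever predicates the procedure really applies:
SELECT CaseTable.CaseID,
       CT.Category,
       CT.CategoryType,
       Q.Note AS CategoryCaseNote,
       Q.CategoryID,
       Q.CategoryIsDefaultValue
INTO #AllCategories
FROM CaseTable
INNER JOIN (/* the same UNION ALL subquery as in the view */) AS Q
    ON (/* the same join predicates as in the view */)
LEFT OUTER JOIN CategoryTable AS CT
    ON Q.CategoryID = CT.CategoryID
WHERE CaseTable.CaseID = @CaseID; -- apply the proc's predicates as early as possible

-- The former CTE is now read twice from the #temp table instead of being
-- expanded twice, and the optimizer gets real statistics from it.
SELECT A.*
FROM #AllCategories AS A
INNER JOIN (SELECT CaseID,
                   CategoryID,
                   MIN(CategoryIsDefaultValue) AS CategoryIsDefaultValue
            FROM #AllCategories
            GROUP BY CaseID, CategoryID) AS B
    ON A.CaseID = B.CaseID
   AND A.CategoryID = B.CategoryID
   AND A.CategoryIsDefaultValue = B.CategoryIsDefaultValue;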
I have the SQL query below and want an opinion on whether I can improve it using temp tables or something else, or whether it's good enough as it is. Basically I am just feeding the result set from the inner query to the outer one.
SELECT S.SolutionID
,S.SolutionName
,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
SELECT DISTINCT sf.SolutionID
FROM dbo.SolutionToFeature sf
WHERE sf.SolutionToFeatureID IN (
SELECT sfg.SolutionToFeatureID
FROM dbo.SolutionFeatureToUsergroup SFG
WHERE sfg.UsergroupID IN (
SELECT UG.UsergroupID
FROM dbo.Usergroup UG
WHERE ug.SiteID = @SiteID
)
)
)
It's going to depend largely on the indexes you have on those tables. Since you are only selecting data out of the Solution table, you can put everything else in an exists clause, do some proper joins, and it should perform better.
The exists clause will allow you to remove the DISTINCT you have on the SolutionToFeature table. DISTINCT causes a performance hit because behind the scenes it has to sort or hash the result set to compare each record against the rest for uniqueness. You take a pretty big hit as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solution S
Where Exists (
select 1
from dbo.SolutionToFeature sf
Inner Join dbo.SolutionFeatureToUsergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
Inner Join dbo.UserGroup UG on sfg.UserGroupID = UG.UserGroupID
Where S.SolutionID = sf.SolutionID
and UG.SiteID = @SiteID
)
I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90Million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized, then I guess I'm stuck with 27 min...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY statement as suggested below, but that will require a complete rewrite of the stored procedure and might take some time. (As I said before, I'm not the best at SQL, but it is starting to grow on me. ^^)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE Name = @name AND IsExcluded = 0;
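A minimal sketch of the Table1 side of that trigger maintenance (the ExcludedCodes side is analogous; the trigger name is made up):
CREATE TRIGGER dbo.trTable1_IsExcluded
ON dbo.Table1
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute the flag for any row whose Code was just inserted or changed.
    UPDATE t
    SET IsExcluded = CASE WHEN x.Code IS NULL THEN 0 ELSE 1 END
    FROM dbo.Table1 AS t
    LEFT JOIN dbo.ExcludedCodes AS x ON t.Code = x.Code
    WHERE EXISTS (SELECT 1 FROM inserted AS i WHERE i.Code = t.Code);
END;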
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure and whether your violate any of the restrictions but it is worth investigating IMHO.
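If the indexed-view route turns out to be viable, keep its restrictions in mind (SCHEMABINDING, COUNT_BIG(*), no outer joins or subqueries in the view body). A sketch of what it could look like here:
CREATE VIEW dbo.NameCodeCounts
WITH SCHEMABINDING
AS
SELECT Name, Code, COUNT_BIG(*) AS RowCnt
FROM dbo.Table1
GROUP BY Name, Code;
GO
CREATE UNIQUE CLUSTERED INDEX IX_NameCodeCounts
ON dbo.NameCodeCounts (Name, Code);
GO
-- The per-name count then aggregates a handful of pre-computed rows
-- (on non-Enterprise editions, add WITH (NOEXPAND) to use the index):
SELECT @count = SUM(RowCnt)
FROM dbo.NameCodeCounts AS v
WHERE v.Name = @name
AND NOT EXISTS (SELECT 1 FROM dbo.ExcludedCodes AS e WHERE e.Code = v.Code);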
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from Table1 t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:
select t.name, count(t.name)
from Table1 t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.Code)
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, NameCount)
select name, count(name)
from Table1
where Code not in (select Code from ExcludedCodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.