I am writing the following query:
Execution plan
It takes 30 seconds to load just 80 rows.
Is there anything we can do to reduce the time of running this query?
select
CO.ContributorsName [ContributorsName]
, D.DocumentLastPublished DocumentLastPublished
, CO.ContributorsImage [AuthorImage]
, T.NodeAliasPath
, D.DocumentID
, BD.*
from CMS_Tree T
inner join Cms_Class CC
on T.NodeClassID = CC.ClassID
and CC.ClassName = 'wv.blogdata'
inner join Cms_Document D
on T.NodeID = D.DocumentNodeID
inner join WV_BlogData BD
on D.DocumentForeignKeyValue = BD.BlogDataID
and COALESCE(BD.IsDeleted, 0) = 0
inner join WV_Contributors CO
on BD.AuthorID = CO.ContributorsID
where (
'ALL' = 'ALL'
or category = 'All'
)
and DocumentCulture = 'en-US'
Don't use * for all tables. Specify only the columns you actually need. Check your WHERE clause as well: as written, 'ALL' = 'ALL' is always true, so the OR branch on category never filters anything.
Covering indexes
(Looking at your execution plan, it looks like you've already got the appropriate covering indexes, but this is good general advice, and still worth a try)
If this is a frequently used query, make sure you've got the appropriate covering indexes on the tables involved. See this MSDN page for how to identify potential missing indexes. Note that adding indexes will improve query performance, at the cost of degrading your insert performance. You will also need to make sure you've got the appropriate maintenance plans in place to ensure your indexes don't get fragmented or unbalanced.
Query changes
I'd also recommend trying some changes to your query and comparing the execution plans.
It's difficult to make any meaningful suggestions without looking at your database and being able to try a few things.
From a cursory look at your query, the most obvious thing I can see is that you're performing an inner join on Cms_Class, but not selecting any of the data from it, or even joining it to other tables (apart from CMS_Tree). I'd suggest removing this join and using an exists statement instead, like so:
select
CO.ContributorsName [ContributorsName]
, D.DocumentLastPublished DocumentLastPublished
, CO.ContributorsImage [AuthorImage]
, T.NodeAliasPath
, D.DocumentID
, BD.*
from CMS_Tree T
inner join Cms_Document D
on T.NodeID = D.DocumentNodeID
inner join WV_BlogData BD
on D.DocumentForeignKeyValue = BD.BlogDataID
and COALESCE(BD.IsDeleted, 0) = 0
inner join WV_Contributors CO
on BD.AuthorID = CO.ContributorsID
where (
'ALL' = 'ALL'
or category = 'All'
)
and DocumentCulture = 'en-US'
and exists
(
select null
from Cms_Class CC
where T.NodeClassID = CC.ClassID
and CC.ClassName = 'wv.blogdata'
)
Give it a try, look at the execution plans, and see if it makes a difference for you.
If you create new covering indexes, re-run your queries and look at the execution plans again, because the most efficient query with missing indexes might not be the most efficient query once you've added indexes.
Document caching (SQL isn't always the best solution for accessing data)
Assuming you've done both of these, and the query performance is still too poor, you may want to ask yourself if you really need to query live data. Looking at your query, it looks like you're querying data from a CMS. The data in a CMS is only going to change when a content author actually makes a change. Most of the time, the data will stay the same from request to request. This means that doing a direct query from SQL every time you want to access content might be overkill for your needs.
A good use-case example is to look at how Umbraco CMS accesses its data. It keeps an XML document cache of all of the published documents on a given site. When a content author publishes changes, it then updates the XML document cache.
Accessing the cache is much more efficient than talking to SQL directly, and they even warn users not to use their SQL API for serving up CMS content, because it is too slow.
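The publish-time cache idea can be sketched in a few lines. This is a minimal illustration only, not how Umbraco or any particular CMS implements it; the class `PublishedContentCache`, the method `on_publish` and the function `slow_blog_query` are hypothetical names made up for the example.

```python
class PublishedContentCache:
    """Read-through cache that is only refreshed after a publish event."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable that runs the expensive query
        self._rows = None                  # None means "not loaded yet"
        self.db_hits = 0                   # counts how often the database was queried

    def get(self):
        # Only go to the database when nothing is cached.
        if self._rows is None:
            self._rows = self._load_from_db()
            self.db_hits += 1
        return self._rows

    def on_publish(self):
        # Called when a content author publishes: drop the stale copy.
        self._rows = None


def slow_blog_query():
    # Hypothetical stand-in for the 30-second SQL query from the question.
    return [{"DocumentID": 1, "ContributorsName": "Example Author"}]


cache = PublishedContentCache(slow_blog_query)
cache.get()
cache.get()                            # second read is served from memory
hits_before_publish = cache.db_hits
cache.on_publish()                     # an author publishes a change
cache.get()                            # next read reloads from the database
hits_after_publish = cache.db_hits
print(hits_before_publish, hits_after_publish)
```

The trade-off is the usual one: reads become cheap, but every code path that changes published content has to remember to invalidate the cache.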
Related
I am not sure whether the title of this question is correct or not.
I have a table, for example users, which contains different types of users, such as user type 10, 20, 30, etc.
In a query I need to join the user table, but I only want user type 20. Which of the queries below performs better?
SELECT fields
FROM consumer c
INNER JOIN user u ON u.userid = c.userid
WHERE u.type = 20
Or, alternatively:
SELECT fields
FROM consumer c
INNER JOIN (SELECT user_fields FROM user WHERE type = 20) u ON u.userid = c.userid
Please advise.
Let's start with this query:
SELECT . . .
FROM consumer c INNER JOIN
user u
ON u.userid = c.userid
WHERE u.type = 20;
Assuming that type 20 is relatively rare, you want indexes on the tables. The best indexes are probably user(type, userid) and consumer(userid). It is possible that an index on user(userid, type) would be better (and would be unnecessary if userid is a clustered primary key).
The second query . . . well, from the SQL Server perspective it is probably the same. Why? SQL Server has a good optimizer. You can check the execution plans if you like. Because of the optimizer:
There is no benefit to having a subquery select only a handful of columns. For better or worse, SQL Server pushes that information down to the node that reads the data.
The where clause is not necessarily going to be evaluated before the join. SQL Server is smart enough to re-arrange operations.
Not all optimizers are this smart. In a database such as MySQL, MS Access, or SQLite, I'm pretty sure the first version is much better than the second.
Run the two queries in SSMS as a single batch with the execution plan enabled. You will see the execution plan of each query along with its query cost (relative to the batch): 50% each.
That means the optimizer treats them as the same.
If they were different (because of some optimization), the ratio would be different.
I simulated your queries and found a query cost of 50% each, i.e. they are the same.
It really depends on a number of factors:
Is "userid" indexed on both tables?
Is "type" indexed on table "users"?
How many rows are in each table?
Usually a subquery performs worse, but depending on the conditions listed above and on how your SQL Server installation is configured, both queries can be resolved (and therefore executed) in the same way by the query optimizer.
SQL Server takes your query and tries to optimize it, so query B may well be "transformed" into query A.
Look at both queries in Query Analyzer and see whether their plans differ.
Generally speaking, inner queries are better avoided, and you'll probably get the best performance from query A.
Both your options are valid. Personally, I would code it like this:
SELECT fields
FROM consumer c
INNER JOIN user u ON u.userid = c.userid and u.type = 20
Run both queries in SQL Server Management Studio and tick 'Include Actual Execution Plan'. This will let you compare the performance of your queries against each other. The outcome will depend on your particular database.
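The "run the variants and compare" advice above can be tried without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the plans it prints are SQLite's and say nothing about what SQL Server would choose; the sample rows and the `consumerid` column are made up for the example. It only checks that the three formulations are logically equivalent and shows how to put their plans side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user"   (userid INTEGER PRIMARY KEY, type INTEGER);
    CREATE TABLE consumer (consumerid INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO "user"   VALUES (1, 10), (2, 20), (3, 20), (4, 30);
    INSERT INTO consumer VALUES (100, 1), (101, 2), (102, 3), (103, 4);
""")

queries = {
    # Filter in the WHERE clause.
    "where_filter": """
        SELECT c.consumerid, u.userid
        FROM consumer c
        INNER JOIN "user" u ON u.userid = c.userid
        WHERE u.type = 20
        ORDER BY c.consumerid""",
    # Filter inside a derived table.
    "derived_table": """
        SELECT c.consumerid, u.userid
        FROM consumer c
        INNER JOIN (SELECT userid FROM "user" WHERE type = 20) u
                ON u.userid = c.userid
        ORDER BY c.consumerid""",
    # Filter in the ON clause.
    "on_filter": """
        SELECT c.consumerid, u.userid
        FROM consumer c
        INNER JOIN "user" u ON u.userid = c.userid AND u.type = 20
        ORDER BY c.consumerid""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, sql in queries.items():
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, results[name], plan)
```

For an inner join all three forms return the same rows; whether they also get the same plan is up to the engine, which is why comparing plans on your own data is the deciding step.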
I have a query which is taking approximately 10 mins to execute and produce the results. When I try to break it into parts and run it, it seems to run fine, within seconds.
I tried to modify the subselect of the top and the bottom portions of the query and determine if that was causing the issue, but it was not. It gave out some results within 3 seconds.
I am trying to learn to read the Estimated Execution plan, but it is becoming more confusing and hard for me to trace to the issue.
Can anyone please point out the mistakes I made that are making the query take so long?
Select Distinct
PostExtended.BatchNum,
post.ControlNumStatus,
post.AccountSeg,
Post.PostDat
From
Post
Post Records
join (Select Post, MAX(Dist) as Dist, COUNT(fkglDist) as RecordCount From PostExtend WITH (NOLOCK) Group By flPost) as PostExtender on Post.PK = PostExtender.flPost
join glPostExtended WITH (NOLOCK) on glPostExtendedLimiter.Post = glPostExtended.Post and (PostExtendedLimiter.fkglDist = PostExtend.Dist or PostExtend.Dist is null)
join (select lP.fkosControlNumberStatus, lP.SourceJENumber, AccountSegment,
sum(case
............
from Post WITH (NOLOCK)
join AccountingPeriod WITH (NOLOCK) on AccountingPeriod.pk = lP.fkglAccountingPeriod
join FiscalYear WITH (NOLOCK) on FiscalYear.pk = AccountingPeriod.FiscalYear
join Account WITH (NOLOCK) on Account.pk = FiscalYear.Account
where FiscalYear.Period = @Date
and glP.fkMLSosCodeEntryType = 2202
group by glP.fkosControlNumberStatus, glP.SourceNumber, AccountSeg) post on post.ControlNumStatus = Post.fkControlNumberStatus and postdata.SourceJENumber = glPost.SourceJENumber
where post.AmountT <> 0)......
Group by
Subqueries are very often the source of problems.
I would try to:
separate the postdata subquery from the main query,
save the result in a temporary table or even in a table variable,
put clustered index on fkosControlNumberStatus and SourceJENumber fields,
join this temporary table back to the main query.
Sometimes the result of these simple steps is a pleasant surprise.
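The steps above can be sketched end to end. This is an illustration under assumptions, not the asker's real schema: the single `Post` table, its `Amount` column and the sample rows are invented, and SQLite (via Python's sqlite3) stands in for SQL Server, so an ordinary index on a TEMP table replaces the clustered index on a #temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Post (
        PK INTEGER PRIMARY KEY,
        fkosControlNumberStatus INTEGER,
        SourceJENumber TEXT,
        Amount REAL
    );
    INSERT INTO Post VALUES
        (1, 7, 'JE-1', 10.0), (2, 7, 'JE-1', -10.0),
        (3, 8, 'JE-2', 25.0), (4, 8, 'JE-2', 5.0);
""")

# Original shape: the aggregate is an inline subquery in the join.
inline_rows = conn.execute("""
    SELECT DISTINCT p.fkosControlNumberStatus, p.SourceJENumber, agg.AmountT
    FROM Post p
    JOIN (SELECT fkosControlNumberStatus, SourceJENumber, SUM(Amount) AS AmountT
          FROM Post
          GROUP BY fkosControlNumberStatus, SourceJENumber) agg
      ON agg.fkosControlNumberStatus = p.fkosControlNumberStatus
     AND agg.SourceJENumber = p.SourceJENumber
    WHERE agg.AmountT <> 0
    ORDER BY 1, 2
""").fetchall()

# Steps 1-3: materialize the aggregate once and index the join columns.
conn.executescript("""
    CREATE TEMP TABLE postdata AS
        SELECT fkosControlNumberStatus, SourceJENumber, SUM(Amount) AS AmountT
        FROM Post
        GROUP BY fkosControlNumberStatus, SourceJENumber;
    CREATE INDEX ix_postdata ON postdata (fkosControlNumberStatus, SourceJENumber);
""")

# Step 4: join the temporary table back into the main query.
staged_rows = conn.execute("""
    SELECT DISTINCT p.fkosControlNumberStatus, p.SourceJENumber, d.AmountT
    FROM Post p
    JOIN postdata d
      ON d.fkosControlNumberStatus = p.fkosControlNumberStatus
     AND d.SourceJENumber = p.SourceJENumber
    WHERE d.AmountT <> 0
    ORDER BY 1, 2
""").fetchall()

print(inline_rows, staged_rows)
```

Staging the aggregate gives the optimizer a real table with a known row count to join against, which is the effect the answer is hoping for.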
This is a fairly complex query. You are joining on Aggregate Queries (with GROUP BY).
The first thing I would do is see how long it takes to run each of the join queries. One of these may run very fast, while another may run very long. So, you may not really need to optimize the entire query--just one of the joined queries.
Another way to do it is just start eliminating joins one by one, then run the entire query and see how fast it goes. When you have a really significant decrease in time, you've found the error.
Typically, one thing that can add a lot of CPU is comparisons. The sums with case statements might be the biggest suspect.
Have you used the Database Engine Tuning Advisor? If all else fails, go with that and see what it tells you.
So, maybe try this approach:
Take away the CASE Statements inside the SUM expressions on that last join.
Remove the last JOIN with all the sums.
Remove the first join with that GROUP BY and the MAX expression
That would be my strategy.
I've got a query that gets run in certain circumstances with an 'over-simplified' execution plan that actually turns out to be quite slow (3-5 seconds). The query is:
SELECT DISTINCT Salesperson.*
FROM Salesperson
INNER JOIN SalesOrder on Salesperson.Id = SalesOrder.SalespersonId
INNER JOIN PrelimOrder on SalesOrder.Id = PrelimOrder.OrderId
INNER JOIN PrelimOrderStatus on PrelimOrder.CurrentStatusId = PrelimOrderStatus.Id
INNER JOIN PrelimOrderStatusType on PrelimOrderStatus.StatusTypeId = PrelimOrderStatusType.Id
WHERE
PrelimOrderStatusType.StatusTypeCode = 'Draft'
AND Salesperson.EndDate IS NULL
and the slow execution plan looks like:
The thing that stands out straight away is that the actual number of rows/executions is significantly higher than the respective estimates:
If I remove the Salesperson.EndDate IS NULL clause, then a faster, parallelized execution plan is run:
A similar execution plan also runs quite fast if I remove the DISTINCT keyword.
From what I can gather, it seems that the optimiser decides, based on its incorrect estimates, that the query won't be costly to run and therefore doesn't choose the parallelized plan. But I can't for the life of me figure out why it is choosing the incorrect plan. I have checked my statistics and they are all as they should be. I have tested in SQL Server 2008 through 2016 with identical results.
SELECT DISTINCT is expensive. So, it is best to avoid it. Something like this:
SELECT sp.*
FROM Salesperson sp
WHERE EXISTS (SELECT 1
FROM SalesOrder so INNER JOIN
PrelimOrder po
ON so.Id = po.OrderId INNER JOIN
PrelimOrderStatus pos
ON po.CurrentStatusId = pos.Id INNER JOIN
PrelimOrderStatusType post
ON pos.StatusTypeId = post.Id
WHERE sp.Id = so.SalespersonId AND
post.StatusTypeCode = 'Draft'
) AND
sp.EndDate IS NULL;
Note: An index on SalesPerson(EndDate, Id) would be helpful.
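To see why the EXISTS form makes DISTINCT unnecessary, here is a small sketch. It runs on SQLite through Python's sqlite3 rather than SQL Server, and it collapses the status tables into a single hypothetical `StatusTypeCode` column on `SalesOrder`; the names `Ann`, `Bob` and `Cy` are sample data, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (Id INTEGER PRIMARY KEY, Name TEXT, EndDate TEXT);
    CREATE TABLE SalesOrder  (Id INTEGER PRIMARY KEY, SalespersonId INTEGER,
                              StatusTypeCode TEXT);
    INSERT INTO Salesperson VALUES (1, 'Ann', NULL), (2, 'Bob', NULL),
                                   (3, 'Cy', '2015-01-01');
    INSERT INTO SalesOrder VALUES (10, 1, 'Draft'), (11, 1, 'Draft'),
                                  (12, 2, 'Final'), (13, 3, 'Draft');
""")

join_sql = """
    SELECT {distinct} sp.Id, sp.Name, sp.EndDate
    FROM Salesperson sp
    INNER JOIN SalesOrder so ON sp.Id = so.SalespersonId
    WHERE so.StatusTypeCode = 'Draft' AND sp.EndDate IS NULL
    ORDER BY sp.Id
"""
# The join produces one row per matching order, so Ann appears twice ...
plain_join_rows = conn.execute(join_sql.format(distinct="")).fetchall()
# ... and DISTINCT has to squeeze the duplicates out afterwards.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

# EXISTS asks a yes/no question per salesperson and never creates duplicates.
exists_rows = conn.execute("""
    SELECT sp.Id, sp.Name, sp.EndDate
    FROM Salesperson sp
    WHERE sp.EndDate IS NULL
      AND EXISTS (SELECT 1
                  FROM SalesOrder so
                  WHERE so.SalespersonId = sp.Id
                    AND so.StatusTypeCode = 'Draft')
    ORDER BY sp.Id
""").fetchall()

print(plain_join_rows, distinct_rows, exists_rows)
```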
As @Gordon Linoff already said, DISTINCT is usually bad news for performance. Often it means you're amassing way too much data and then squeezing it back together into a more compact set. Better to keep it small throughout the process, if possible.
Also, it's kind of counter-intuitive that the query plan with index scans turns out to be faster than the one with index seeks; it seems that (in this case) parallelism makes up for it. You could try playing around with the Cost Threshold For Parallelism option, but beware that this is a server-wide setting! (Then again, in my opinion the default of 5 is rather high for most use cases I've run into personally; CPUs are aplenty these days, time still isn't =).
Bit of a long reach, but I was wondering if you could 'split' the query in 2, thus eliminating (a small) part of the guesswork of the server. I'm assuming here that StatusTypeCode is unique. (verify the datatype of the variable too!)
DECLARE @StatusTypeId int
-- Look up the status type once, so the main query filters on a known value
SELECT @StatusTypeId = Id
FROM PrelimOrderStatusType
WHERE StatusTypeCode = 'Draft'
SELECT Salesperson.*
FROM Salesperson
WHERE Salesperson.EndDate IS NULL
AND EXISTS ( SELECT *
FROM SalesOrder
JOIN PrelimOrder
ON PrelimOrder.OrderId = SalesOrder.Id
JOIN PrelimOrderStatus
ON PrelimOrderStatus.Id = PrelimOrder.CurrentStatusId
AND PrelimOrderStatus.StatusTypeId = @StatusTypeId
-- Correlate the subquery with the outer Salesperson row
WHERE SalesOrder.SalespersonId = Salesperson.Id)
If it doesn't help, could you give the definition of the indexes that are being used?
Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run and it gets even worse when I run the asp.net page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
@parameter1 int = 5,
@parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WITH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob
Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 select distinct clauses in this query -- in a good design, distinct clauses are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criterion -- function calls during the join prevent any decent query optimization. My guess is that you are using data that has internal meaning and that you are parsing the internal meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and create a computed column -- or even a plain copy of the full part number field if the combined field is unusually complex.
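A rough sketch of the "separate field" idea, using the table and column names from the question. The extra column `txt_Order_Only_by_line`, the index name and the sample rows are assumptions made for illustration, and it runs on SQLite through Python's sqlite3 (with `substr`/`instr` in place of SUBSTRING/CHARINDEX), so it shows the shape of the change rather than SQL Server behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Order_Notes (txt_Order_Only TEXT);
    CREATE TABLE tbl_Order_Notes_by_line (txt_Order_Key_by_line TEXT);
    INSERT INTO tbl_Order_Notes VALUES ('1001'), ('1002');
    INSERT INTO tbl_Order_Notes_by_line VALUES
        ('1001-1'), ('1001-2'), ('1002-1'), ('1003-1');

    -- Hypothetical extra column: the order number, parsed once and stored.
    ALTER TABLE tbl_Order_Notes_by_line ADD COLUMN txt_Order_Only_by_line TEXT;
    UPDATE tbl_Order_Notes_by_line
       SET txt_Order_Only_by_line =
           substr(txt_Order_Key_by_line, 1, instr(txt_Order_Key_by_line, '-') - 1);
    CREATE INDEX ix_notes_by_line_order
        ON tbl_Order_Notes_by_line (txt_Order_Only_by_line);
""")

# Original style: the key is parsed inside the join condition for every row.
function_join_rows = conn.execute("""
    SELECT n.txt_Order_Only, l.txt_Order_Key_by_line
    FROM tbl_Order_Notes n
    JOIN tbl_Order_Notes_by_line l
      ON n.txt_Order_Only =
         substr(l.txt_Order_Key_by_line, 1, instr(l.txt_Order_Key_by_line, '-') - 1)
    ORDER BY l.txt_Order_Key_by_line
""").fetchall()

# Rewritten: a plain equality on the stored column, which an index can serve.
column_join_rows = conn.execute("""
    SELECT n.txt_Order_Only, l.txt_Order_Key_by_line
    FROM tbl_Order_Notes n
    JOIN tbl_Order_Notes_by_line l
      ON n.txt_Order_Only = l.txt_Order_Only_by_line
    ORDER BY l.txt_Order_Key_by_line
""").fetchall()

print(function_join_rows == column_join_rows, column_join_rows)
```

In a real system the stored column has to be kept in sync on insert and update, which is what a computed column would do automatically.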
This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them at 1:00 when stuff is broken?
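As a quick reminder of the rule in question: AND binds more tightly than OR, so an unparenthesized mix is grouped around the ANDs first. The tiny check below runs on SQLite through Python's sqlite3 purely for convenience (SQL Server follows the same AND-before-OR rule); the literal 0 and 1 values are stand-ins for the real conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without parentheses, AND is evaluated before OR ...
implicit = conn.execute("SELECT 0 AND 0 OR 1").fetchone()[0]
# ... so the expression above means exactly this:
and_first = conn.execute("SELECT (0 AND 0) OR 1").fetchone()[0]
# Grouping the OR first gives a different answer.
or_first = conn.execute("SELECT 0 AND (0 OR 1)").fetchone()[0]

print(implicit, and_first, or_first)
```

Writing the parentheses out in the WHERE clause costs nothing and removes the guesswork for the next reader.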
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer for CTEs -- I have had to recode CTE-based joins for performance issues on several occasions -- creating an actual view equivalent to the CTE and using that instead can be a significant speedup. It may well depend on the version of SQL Server, but if you are running an older version I would definitely wonder about CTE optimization. -- This is not as important as the first 2 things I've mentioned, so try to fix those first.
ADDED
I'm going to harp on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues, and you like the syntax, they can be useful in at least limited usage; personally I don't normally recommend them for anything more than that -- and given that they are largely syntactic sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the optimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating a #temptable and populating it. You can even add an index or two to your #temptable and the optimizer can figure out how to use them too. A table variable is similar, but at least through SQL Server 2012 they were no faster than a #temp table that I could tell -- supposedly new goodness in SQL Server 2014 helps with this.
A CTE is really just a temporary view in disguise, which is why I suggested you can replace it with a real view to get better performance (and you often can), or you can populate a temp table and sometimes get even better performance.
I've been toying around with switching from ms-access files to SQLite files for my simple database needs; for the usual reasons: smaller file size, less overhead, open source, etc.
One thing that is preventing me from making the switch is what seems to be a lack of speed in SQLite. For simple SELECT queries, SQLite seems to perform as well as, or better than MS-Access. The problem occurs with a fairly complex SELECT query with multiple INNER JOIN statements:
SELECT DISTINCT
DESCRIPTIONS.[oCode] AS OptionCode,
DESCRIPTIONS.[descShort] AS OptionDescription
FROM DESCRIPTIONS
INNER JOIN tbl_D_E ON DESCRIPTIONS.[oCode] = tbl_D_E.[D]
INNER JOIN tbl_D_F ON DESCRIPTIONS.[oCode] = tbl_D_F.[D]
INNER JOIN tbl_D_H ON DESCRIPTIONS.[oCode] = tbl_D_H.[D]
INNER JOIN tbl_D_J ON DESCRIPTIONS.[oCode] = tbl_D_J.[D]
INNER JOIN tbl_D_T ON DESCRIPTIONS.[oCode] = tbl_D_T.[D]
INNER JOIN tbl_Y_D ON DESCRIPTIONS.[oCode] = tbl_Y_D.[D]
WHERE ((tbl_D_E.[E] LIKE '%')
AND (tbl_D_H.[oType] ='STANDARD')
AND (tbl_D_J.[oType] ='STANDARD')
AND (tbl_Y_D.[Y] = '41')
AND (tbl_Y_D.[oType] ='STANDARD')
AND (DESCRIPTIONS.[oMod]='D'))
In MS-Access, this query executes in about 2.5 seconds. In SQLite, it takes a little over 8 minutes. It takes the same amount of time whether I'm running the query from VB code or from the command prompt using sqlite3.exe.
So my questions are the following:
Is SQLite just not optimized to handle multiple INNER JOIN statements?
Have I done something obviously stupid in my query (because I am new to SQLite) that makes it so slow?
And before anyone suggests a completely different technology, no I can not switch. My choices are MS-Access or SQLite. :)
UPDATE:
Assigning an INDEX to each of the columns in the SQLite database reduced the query time from over 8 minutes down to about 6 seconds. Thanks to Larry Lustig for explaining why the INDEXing was needed.
As requested, I'm reposting my previous comment as an actual answer (when I first posted the comment I was not able, for some reason, to post it as an answer):
MS Access is very aggressive about indexing columns on your behalf, whereas SQLite will require you to explicitly create the indexes you need. So, it's possible that Access has indexed either [Description] or [D] for you but that those indexes are missing in SQLite. I don't have experience with that amount of JOIN activity in SQLite. I used it in one Django project with a relatively small amount of data and did not detect any performance issues.
Do you have issues with referential integrity? I ask because I have the impression you've got unnecessary joins, so I rewrote your query as:
SELECT DISTINCT
t.[oCode] AS OptionCode,
t.[descShort] AS OptionDescription
FROM DESCRIPTIONS t
JOIN tbl_D_H h ON h.[D] = t.[oCode]
AND h.[oType] = 'STANDARD'
JOIN tbl_D_J j ON j.[D] = t.[oCode]
AND j.[oType] = 'STANDARD'
JOIN tbl_Y_D d ON d.[D] = t.[oCode]
AND d.[Y] = '41'
AND d.[oType] ='STANDARD'
WHERE t.[oMod] = 'D'
If DESCRIPTIONS and tbl_D_E have multiple row scans then oCode and D should be indexed. Look at example here to see how to index and tell how many row scans there are (http://www.siteconsortium.com/h/p1.php?id=mysql002).
This might fix it though ..
CREATE INDEX ocode_index ON DESCRIPTIONS (oCode);
CREATE INDEX d_index ON tbl_D_E (D);
etc ....
Indexing correctly is one piece of the puzzle that can easily double, triple or more the speed of the query.
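One way to confirm that SQLite actually uses a new index is to compare EXPLAIN QUERY PLAN output before and after creating it. The sketch below does that with Python's sqlite3 on a cut-down, hypothetical version of tbl_Y_D; the index name `y_index` and the helper `plan_for` are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Y_D (D TEXT, Y TEXT, oType TEXT);
    INSERT INTO tbl_Y_D VALUES ('A1', '41', 'STANDARD'), ('B2', '42', 'STANDARD');
""")

query = "SELECT D FROM tbl_Y_D WHERE Y = '41'"


def plan_for(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan_for(query)   # expected to be a full scan of tbl_Y_D
conn.execute("CREATE INDEX y_index ON tbl_Y_D (Y)")
plan_after = plan_for(query)    # expected to mention y_index

print("before:", plan_before)
print("after: ", plan_after)
```

Running the same check against each join column in the slow query shows which tables are still being scanned in full.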