SQL Server stored procedure takes 1' 18" to run... seems long

Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run, and it gets even worse when I run the ASP.NET page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
@parameter1 int = 5,
@parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WITH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
END
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob

Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 SELECT DISTINCT clauses in this query -- in a good design, DISTINCT clauses are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criterion -- function calls during the join prevent any decent query optimization. My guess is that you are using data that has internal meaning and that you are parsing out that meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and create a computed column -- or even a plain copy of the full part number field if the combined field is unusually complex.
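For example, a minimal sketch of that idea against the tables above (the new column name is hypothetical) -- a persisted computed column that parses the order prefix once, at write time, so the join becomes a plain indexable equality:
-- The CASE guards rows with no '-', where SUBSTRING(..., -1) would otherwise error.
ALTER TABLE tbl_Order_Notes_by_line ADD txt_Order_Only_Parsed AS
    CASE WHEN CHARINDEX('-', txt_Order_Key_by_line) > 1
         THEN SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line) - 1)
    END PERSISTED;
CREATE INDEX IX_Order_Notes_by_line_Parsed ON tbl_Order_Notes_by_line (txt_Order_Only_Parsed);
The join can then read: ON tbl_Order_Notes.txt_Order_Only = tbl_Order_Notes_by_line.txt_Order_Only_Parsed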
3) This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them at 1:00 AM when stuff is broken?
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer with CTEs -- I have had to recode CTE-based joins for performance reasons on several occasions -- creating an actual view equivalent to the CTE and using that instead can be a significant speedup. This may well depend on the version of SQL Server, but if you are running an older version I would definitely wonder about CTE optimization. -- This is not as important as the first 2 things I've mentioned; try to fix those first.
ADDED
I'm going to be harsh on CTEs again, as I did not really explain why they can be bad for performance, and it was bothering me. If you don't have performance issues and you like the syntax, they can be useful in limited usage; personally I don't normally recommend them for anything more than that -- and given that it is MS-specific syntactic sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the optimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating a #temp table and populating it. You can even add an index or two to your #temp table, and the optimizer can figure out how to use them too. A @table variable is similar, but at least through SQL 2012, they were no faster than #temp tables that I could tell -- supposedly new goodness in SQL Server 2014 helps this.
A CTE is really just a temporary view in disguise, which is why I suggested you can replace it with a real view to get better performance (and you often can), or you can populate a temp table and sometimes get even better performance.
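For example, a minimal sketch of materializing the question's Redate_Orders CTE into an indexed #temp table (untested; names taken from the question):
SELECT DISTINCT
    SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line) - 1) AS Redate_Order_Number
INTO #Redate_Orders
FROM tbl_Order_Notes_by_line
WHERE bin_Redate_Request_by_line = 'True';
CREATE INDEX IX_Redate_Orders ON #Redate_Orders (Redate_Order_Number);
-- ...then LEFT JOIN #Redate_Orders in the main query where the CTE was used.
Now the optimizer has an actual row count (and an index) to plan against, instead of guessing at the CTE's output.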

Related

How can this change be making my query slow (OR vs UNION) and can I fix it?

I've just been debugging a slow SQL query.
It's a join between 2 tables, with a WHERE clause conditioning on either a property of 1 table OR the other.
If I re-write it as a UNION then it's suddenly 2 orders of magnitude faster, even though those 2 queries produce identical outputs:
DECLARE @UserId UNIQUEIDENTIFIER = '0019813D-4379-400D-9423-56E1B98002CB'
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (@UserId) OR Bookings.MixedDealBroker in (@UserId))
--Execution time: ~4000ms
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (@UserId))
UNION
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (Bookings.MixedDealBroker in (@UserId))
--Execution time: ~70ms
This seems rather surprising to me! I would have expected the SQL compiler to be entirely capable of identifying that the 2nd form was equivalent and would have used that compilation approach if it were available.
Some context notes:
I've checked and IN (@UserId) vs = @UserId makes no difference.
Nor does JOIN vs LEFT JOIN.
Those tables each have 100,000s of records, and the filter cuts it down to ~100.
In the slow version it seems to be reading every row of both tables.
So:
Does anyone have any ideas for how this comes about?
What (if anything) can I do to fix the performance without just re-writing the query as a series of UNIONs (not viable for a variety of reasons)?
Execution plans: (plan screenshots not reproduced here)
This is a common limitation of SQL engines, not just SQL Server but other database systems as well. The OR complicates the predicate enough that the selected execution plan isn't always ideal. This probably relates to the fact that only one index can be seeked per instance of a table object at a time (for the most part) -- and in your specific case, the OR predicate spans two different tables -- along with other factors in how SQL engines are designed.
By using a UNION clause, you now have two instances of the Bookings table referenced, each of which can be seeked separately in the most efficient way possible. That allows the SQL engine to pick a better execution plan to serve your query.
This is pretty much just one of those things that is the way it is because that's the way it is; remember the UNION workaround for future encounters with this kind of performance issue.
Also, in response to your comment:
I don't understand how the difference can affect the EP, given that the 2 different "phrasings" of the query are identical?
Essentially, a new execution plan is generated every time one doesn't exist in the plan cache for a given query. The way the engine determines whether a plan for a query is already cached is based on an exact hash of the query statement, so even an extra space character at the end of the query can result in a new plan being generated. That plan can theoretically be different. So a differently written query (despite being logically the same) can certainly result in a different execution plan.
There are other reasons a plan can change on re-generation too, such as changes in the data (and the statistics on that data) in the tables referenced by the query between executions. But those reasons don't really apply to your question above.
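You can see this for yourself by peeking at the plan cache: each textually distinct statement gets its own entry. A quick sketch using the standard DMVs (requires VIEW SERVER STATE permission):
SELECT st.text, cp.usecounts, cp.cacheobjtype
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%Bookings%';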
As already stated, the OR condition prevents the database engine from efficiently using the indexes in a single query. Because the OR condition spans tables, I doubt that the Tuning Advisor will come up with anything useful.
If you have a case where the query you have posted is part of a larger query, or the results are complex and you do not want to repeat code, you can wrap your initial query in a Common Table Expression (CTE) or a subquery and then feed the combined results into the remainder of your query. Sometimes just selecting one or more PKs in your initial query will be sufficient.
Something like:
SELECT <complex select list>
FROM (
SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (@UserId))
UNION
SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (Bookings.MixedDealBroker in (@UserId))
) PRE
JOIN Bookings B ON B.ID = PRE.BookingsID
LEFT JOIN BookingPricings BP ON BP.ID = PRE.BookingPricingsID -- LEFT JOIN: BookingPricingsID can be NULL in rows from the second branch
<more joins>
WHERE <more conditions>
Having just the IDs in your initial select makes the UNION more efficient. The UNION can also be changed to a yet more efficient UNION ALL with careful use of additional conditions, such as AND Bookings.MixedDealBroker <> @UserId in the second part, to avoid overlapping results.
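For example (a sketch; the IS NULL branch is needed because the LEFT JOIN can produce rows with no pricing row at all):
SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE BookingPricings.[Owner] = @UserId
UNION ALL
SELECT Bookings.ID, BookingPricings.ID
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE Bookings.MixedDealBroker = @UserId
  AND (BookingPricings.[Owner] <> @UserId OR BookingPricings.[Owner] IS NULL)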

mssql execution order guarantee when conversion in where clause

I have following scalar function
CREATE FUNCTION dbo.getOM
( @mskey INT,
  @category VARCHAR(2)
)
RETURNS VARCHAR(11)
AS
BEGIN
DECLARE @om VARCHAR(11)
SELECT @om = o.aValue
FROM dbo.idmv_value_basic o WITH (NOLOCK)
WHERE o.MSKEY = @mskey and o.AttrName = 'OM'
AND EXISTS (
    SELECT NULL
    FROM sys.sequences s WITH (NOLOCK)
    WHERE CONVERT(INT, replace(o.aValue, '1690', '')) BETWEEN s.minimum_value AND s.maximum_value AND s.name = concat('om_', @category)
)
RETURN @om
END
The problem with that is that o.aValue does not only hold numeric values, so the conversion can fail if it is executed on other rows of idmv_value_basic where AttrName is not 'OM'.
For some unknown reason, this morning our SQL Server changed the evaluation order of the WHERE conditions and the conversion failed.
How can I define the query so that it is guaranteed that only the selected rows of idmv_value_basic are used for the check against sys.sequences?
I know that in SQL the evaluation order is not deterministic, but there must be a way to guarantee that the conversion will not fail.
Any ideas how I could change the function, or am I doing something fundamentally wrong?
By the way, when I execute the SELECT manually it does not fail, but when I execute the function it fails.
We could repair the function by changing something and saving, then changing it back and saving again.
I'll try to answer the question: "Any way to guarantee execution order?"
You can - to some extent. When you write an EXISTS, what SQL Server actually does behind the scenes is a join (check the execution plan).
Now, the way joins are evaluated depends on cardinalities. Say you're joining tables A and B, with a predicate on both. SQL Server will prefer to start with the table producing the fewest rows.
In your case, SQL Server probably decided that a full scan of sys.sequences produces fewer rows than dbo.idmv_value_basic (WHERE o.MSKEY = @mskey and o.AttrName = 'OM') - maybe because the number of rows in idmv_value_basic increased recently?
You could help things by making an index on dbo.idmv_value_basic (MSKEY, AttrName) INCLUDE (aValue). I assume the predicate produces exactly one row per MSKey - or at least not very many - and an index would help SQL Server choose the "right way" by giving it more accurate estimates of how many rows that part of the join produces.
CREATE INDEX IDVM_VALUE_BASIC_MSKEY_ATTRNAME_VALUE
ON dbo.idmv_value_basic (MSKey, AttrName) INCLUDE (aValue);
Rewriting an EXISTS as a JOIN can be done, but requires a bit of finesse. With a JOIN, you can specify which kind (Loop, Merge or Hash) and thus force SQL Server to acknowledge that you know better, forcing the order of evaluation, i.e.
SELECT o.aValue
FROM dbo.idmv_value_basic o
INNER LOOP JOIN (SELECT name, minimum_value, maximum_value
                 FROM sys.sequences) AS B
    ON B.name = concat('om_', @category)
   AND CONVERT(INT, replace(o.aValue, '1690', '')) BETWEEN B.minimum_value AND B.maximum_value
WHERE o.MSKey = @mskey AND o.AttrName = 'OM'
And lose the WITH (NOLOCK)
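Separately, if you are on SQL Server 2012 or later, a way to make the conversion itself safe (independent of join order) is TRY_CONVERT, which returns NULL instead of raising an error when the value isn't numeric -- sketched here as a drop-in for the CONVERT inside the original EXISTS:
AND EXISTS (
    SELECT NULL
    FROM sys.sequences s
    WHERE TRY_CONVERT(INT, replace(o.aValue, '1690', '')) BETWEEN s.minimum_value AND s.maximum_value
      AND s.name = concat('om_', @category)
)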

Left join or Select in select (SQL - Speed of query)

I have something like this:
SELECT CompanyId
FROM Company
WHERE CompanyId not in
(SELECT CompanyId
FROM Company
WHERE (IsPublic = 0) and CompanyId NOT IN
(SELECT ShoppingLike.WhichId
FROM Company
INNER JOIN
ShoppingLike ON Company.CompanyId = ShoppingLike.UserId
WHERE (ShoppingLike.IsWaiting = 0) AND
(ShoppingLike.ShoppingScoreTypeId = 2) AND
(ShoppingLike.UserId = 75)
)
)
It has 3 SELECTs. I want to know how I could write it without making 3 SELECTs, and which one has better speed for 1 million records: "select in select" or "left join"?
My experiences are from Oracle. There is never a single correct answer to optimising tricky queries; it's a collaboration between you and the optimiser. You need to check explain plans and sometimes traces, often at each stage of writing the query, to find out what the optimiser is thinking. Having said that:
You could remove the outer SELECT by putting the entire contents of its subquery's WHERE clause in a NOT(...). On the face of it, this will prevent that outer full scan of Company (or its index on CompanyId). Try it, check the output is the same and get timings, then remove it temporarily before trying the below. The NOT() may well cause the optimiser to stop considering an ANTI-JOIN against the ShoppingLike subquery due to an implicit OR being created.
Ensure that CompanyId and WhichId are defined as NOT NULL columns. Without this (or the likes of an explicit CompanyId IS NOT NULL), ANTI-JOIN options are often discarded.
The innermost subquery is not correlated (it does not reference anything from its outer query), so it can be extracted and tuned separately. As a matter of style I'd swap the table names around the INNER JOIN, as you want ShoppingLike scanned first since it has all the filters against it. It won't make any difference, but it reads more easily and makes it possible to use a hint to scan tables in the order specified. I would even question the need for the Company table in this subquery.
You've used NOT IN when sometimes the very similar NOT EXISTS gives the optimiser more/alternative options.
All the above is just trial and error unless you start examining the explain plan. Oracle can, with a following wind, convert between LEFT JOIN and IN SELECT. 1M+ rows makes it worth the time to invest.
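For example, one possible NOT EXISTS shape for the query above (untested, and with the Company table dropped from the innermost subquery, as questioned above):
SELECT c.CompanyId
FROM Company c
WHERE NOT EXISTS
    (SELECT 1
     FROM Company c2
     WHERE c2.CompanyId = c.CompanyId
       AND c2.IsPublic = 0
       AND NOT EXISTS
           (SELECT 1
            FROM ShoppingLike sl
            WHERE sl.WhichId = c2.CompanyId
              AND sl.IsWaiting = 0
              AND sl.ShoppingScoreTypeId = 2
              AND sl.UserId = 75))
Check the explain plan and compare the output against the original before trusting it.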

Why does breaking out this correlated subquery vastly improve performance?

I tried running this query against two tables of very different sizes - #temp was about 15,000 rows, and Member is about 70,000,000, about 68,000,000 of which do not have the ID 307.
SELECT COUNT(*)
FROM #temp
WHERE CAST(individual_id as varchar) NOT IN (
SELECT IndividualID
FROM Member m
INNER JOIN Person p ON p.PersonID = m.PersonID
WHERE CompanyID <> 307)
This query ran for 18 hours, before I killed it and tried something else, which was:
SELECT IndividualID
INTO #source
FROM Member m
INNER JOIN Person p ON p.PersonID = m.PersonID
WHERE CompanyID <> 307
SELECT COUNT(*)
FROM #temp
WHERE CAST(individual_id AS VARCHAR) NOT IN (
SELECT IndividualID
FROM #source)
And this ran for less than a second before giving me a result.
I was pretty surprised by this. I'm a middle-tier developer rather than a SQL expert, and my understanding of what goes on under the hood is a little murky, but I would have presumed that, since the sub-query in my first attempt is the exact same code asking for the exact same data as in the second attempt, these would be roughly equivalent.
But that's obviously wrong. I can't look at the execution plan for my original query to see what SQL Server is trying to do. So can someone kindly explain why splitting the data out into a temp table is so much faster?
EDIT: Table schemas and indexes
The #temp table has two columns, Individual_ID int and Source_Code varchar(50)
Member and Person are more complex. They have 29 and 13 columns respectively, so I don't really want to post them in full. PersonID is an int and is the PK on Person and an FK on Member. IndividualID is a column on Person - this is not clear in the query as written.
I tried using a LEFT JOIN instead of NOT IN before asking the question. The performance on the second query wasn't noticeably different - both were sub-second. On the first query I let it run for an hour before stopping it, presuming it would make no significant difference.
I also added an index on #source, just like on the original table, so the performance impact should be identical.
First, your query has two faux pas that really stick out. You are converting to varchar(), but you do not include a length argument. This should not be allowed! The default length varies by context and you need to be explicit.
Second, you are matching two keys in different tables and they seemingly have different types. Foreign key references should always have the same type. This can have a very big impact on performance. If you are dealing with tables that have millions of rows, then you need to pay some attention to the data structure.
To understand the difference in performance, you need to understand execution plans. The two queries have very different execution plans. My (educated) guess is that the first version is using a nested loop join algorithm. The second version is using a more sophisticated algorithm, thanks to the ability of SQL Server to maintain statistics on tables. So instantiating the intermediate results actually helps the optimizer produce a better query plan.
The subject of how best to write this logic has been investigated a lot. Here is a very good discussion on the subject by Aaron Bertrand.
I do agree with Aaron on the preference for not exists in this case:
SELECT COUNT(*)
FROM #temp t
WHERE NOT EXISTS (SELECT 1
                  FROM Member m JOIN
                       Person p
                       ON p.PersonID = m.PersonID
                  WHERE CompanyID <> 307 AND p.IndividualID = t.individual_id
                 );
However, I don't know if this will have better performance in this particular case.
This line is probably what kills the first query
WHERE CAST(individual_id as varchar) NOT IN
My guess would be that this forces a table scan rather than using any indexes.
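If the IndividualID values are genuinely numeric, one sketch of a more index-friendly shape is to move the conversion to the subquery side so individual_id itself stays sargable (assumes SQL Server 2012+ for TRY_CAST; the IS NOT NULL guard matters because a single NULL in a NOT IN list returns zero rows):
SELECT COUNT(*)
FROM #temp
WHERE individual_id NOT IN (
    SELECT TRY_CAST(p.IndividualID AS int)
    FROM Member m
    INNER JOIN Person p ON p.PersonID = m.PersonID
    WHERE m.CompanyID <> 307
      AND TRY_CAST(p.IndividualID AS int) IS NOT NULL)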

Attempt at database localization using table-valued functions

I'm looking for opinions on the following localization technique:
We start with 2 tables:
tblProducts : ProductID, Name,Description,SomeAttribute
tblProductsLocalization : ProductID,Language,Name,Description
and a table-valued function:
CREATE FUNCTION [dbo].[LocalizedProducts](@locale nvarchar(50))
RETURNS TABLE
AS RETURN (
    SELECT a.ProductID,
           COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description],
           a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
)
What I plan to do is use the function whenever I need localized data returned:
select * from LocalizedProducts('en-US') where ID=1
instead of
select * from tblProducts where ID=1
I'm interested in whether there are major performance concerns around this, or any showstoppers. Any reasons I shouldn't adopt this?
Edit: I've tagged this SQL2005; although I develop this using 2008, I think the deployment target only has SQL2005. I could upgrade to 2008 if the need arises, though.
Later edit:
I have created a view, with identical content, but without the parameter:
CREATE VIEW [dbo].[LocalizedProductsView]
AS
SELECT b.[Language], a.ProductID,
       COALESCE(b.Name, a.Name) AS [Name],
       COALESCE(b.Description, a.Description) AS [Description],
       a.SomeAttribute
FROM tblProducts a
LEFT OUTER JOIN tblProductsLocalization b ON a.ProductID = b.ProductID
I then proceeded to run some tests:
The estimated execution plan looks identical for both queries:
select * from LocalizedProducts('us-US') where SomeNonIndexedParameter=2
select * from LocalizedProductsView where (Language='us-US' or Language is null) and SomeNonIndexedParameter=2
The final question that arises is: should I understand that the TVF is computing the translations on ALL the products, regardless of the WHERE parameters? Is the view doing the same thing?
Short answer: As a general rule, there is nothing wrong with using a TVF for this sort of thing, but I would suggest making the ID a parameter as well:
CREATE FUNCTION [dbo].[LocalizedProducts](@ID int, @locale nvarchar(50))
RETURNS TABLE
AS RETURN (
    SELECT a.ProductID,
           COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description],
           a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
    WHERE a.ProductID = @ID
)
Used like so:
select * from LocalizedProducts(1, 'en-US')
Longer explanation:
I've never tried something like this in SQL 2008 yet, so it's possible that SQL Server can optimize this issue away.
My experience in earlier versions, though, suggests that SQL Server tends to handle user-defined functions in a more procedural than declarative fashion: it doesn't interpret what you want and then figure out the best way to get it, but actually performs, in order, the instructions you've written. So it appears to me that this method would:
1) select all English-language text, placing it into a table variable;
2) take the results of step #1 and select any records with the given ID.
This would mean a lot of wasted cycles, putting mostly-unused English text into the table variable before applying the ID filter to that result set. On the other hand, putting all of the filters into the UDF would let SQL Server determine whether it's easiest to filter by ID first (more likely, assuming a standard indexing scheme) and then apply the locale filter, or vice versa. Either way, you should have less data being moved around in the background, and thus better performance, if you put all your filters in one spot. Again, this all assumes that SQL Server is not now making giant leaps in optimization. But if so, that's even more reason to say, yes, there is no problem using the TVF.
It's a safe bet that you'll have to translate more than product names. So I'd design the translation solution to handle any kind of string.
For example, you could have a localization table like:
Id, TranslatableStringId, Language, Translation
Then each product could have a translatable string associated with it. But also the explanatory text on top of the product list.
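A sketch of what that could look like (column types are assumptions, and the parent TranslatableStrings table is implied by the FK):
CREATE TABLE TranslatableStrings (
    Id INT IDENTITY PRIMARY KEY);
CREATE TABLE Translations (
    Id INT IDENTITY PRIMARY KEY,
    TranslatableStringId INT NOT NULL REFERENCES TranslatableStrings (Id),
    [Language] NVARCHAR(10) NOT NULL,
    Translation NVARCHAR(MAX) NOT NULL,
    UNIQUE (TranslatableStringId, [Language]));
-- tblProducts would then carry, e.g., a DescriptionId column referencing TranslatableStrings(Id).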
For products, you'd query like:
SELECT *
FROM Products p
INNER JOIN Translations t
ON p.DescriptionId = t.TranslatableStringId
AND t.language = 'en-US'
For an explanatory text, you'd get a simple:
SELECT t.Translation
FROM Translations t
WHERE t.TranslatableStringId = 123 -- ID of string
AND t.language = 'en-US'
P.S. For a real program, I'd use a shorter name than TranslatableStringId, like tsid, because translations tend to pop up everywhere.
I wanted to come back with an answer to this after doing a lot more testing.
It appears to me that SQL 2008 is actually looking inside the TVF when building the query plan, and optimizing accordingly:
For instance:
select pr.* from LocalizedProducts('en-US') pr inner join LocalizedPhotos('en-US') ph on
ph.ProductId=pr.Id where pr.SomeUnindexProperty= 5
This query needs to touch 4 tables:
Products
Products_Localization
Photos
Photos_Localization
The way the query plan looks (let me see if I can format this):
Products gets a Clustered Index Seek
-->> nested loop with Photos
-->> nested loop with Products_Localization
-->> nested loop with Photos_Localization
Which is not what you would expect if the TVF were a black box. The simple fact that Products gets an index SEEK suggests to me that the query does not blindly interpret the entire TVF.
I ran a lot of performance tests, and on average the "localization" TVFs are between 50% and 100% slower than direct table queries, but that is to be expected, as twice as many tables are involved in the TVFs as in the normal queries.