Recursive SQL function and change tracking don't work together

Recursive SQL function and change tracking don't work together - sql

I have a recursive function which gives allows me to give any GUID in the heirarchy and it pulls back all the values below it. This is used for folder security.
ALTER FUNCTION dbo.ValidSiteClass
(
#GUID UNIQUEIDENTIFIER
)
RETURNS TABLE
AS
RETURN
(
-- Add the SELECT statement with parameter references here
WITH previous
AS ( SELECT
PK_SiteClass,
FK_Species_SiteClass,
CK_ParentClass,
ClassID,
ClassName,
Description,
SyncKey,
SyncState
FROM
dbo.SiteClass
WHERE
PK_SiteClass = #GUID
UNION ALL
SELECT
Cur.PK_SiteClass,
Cur.FK_Species_SiteClass,
Cur.CK_ParentClass,
Cur.ClassID,
Cur.ClassName,
Cur.Description,
Cur.SyncKey,
Cur.SyncState
FROM
dbo.SiteClass Cur,
previous
WHERE
Cur.CK_ParentClass = previous.PK_SiteClass)
SELECT DISTINCT
previous.PK_SiteClass,
previous.FK_Species_SiteClass,
previous.CK_ParentClass,
previous.ClassID,
previous.ClassName,
previous.Description,
previous.SyncKey,
previous.syncState
FROM
previous
)
I have a stored procudure which then later needs to figure out what folders have changed in the user's heirarchy which I use for change tracking. When I try to join it with my change tracking it never returns the query. For example, the following doesn't ever return any results (It just spins, I stop it after 6 minutes)
DECLARE #ChangeTrackerNumber INT = 13;
DECLARE #SelectedSchema UNIQUEIDENTIFIER = '36EC6589-8297-4A82-86C3-E6AAECCC7D95';
WITH validones AS (SELECT PK_SITECLASS FROM ValidSiteClass(#SelectedSchema))
SELECT SiteClass.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(SYS_CHANGE_OPERATION) ChangeOp
FROM dbo.SiteClass
INNER JOIN CHANGETABLE(CHANGES SiteClass, #ChangeTrackerNumber) tracking --tracking
ON tracking.PK_SiteClass = SiteClass.PK_SiteClass
INNER JOIN validones
ON SiteClass.PK_SiteClass = validones.PK_SiteClass
WHERE SyncState IN ( 0, 2, 4 );
The only way I can make this work is with a temptable such as:
DECLARE #ChangeTrackerNumber INT = 13;
DECLARE #SelectedSchema UNIQUEIDENTIFIER = '36EC6589-8297-4A82-86C3-E6AAECCC7D95';
CREATE TABLE #temptable
(
[PK_SiteClass] UNIQUEIDENTIFIER
);
INSERT INTO #temptable
(
PK_SiteClass
)
SELECT PK_SiteClass
FROM dbo.ValidSiteClass(#SelectedSchema);
SELECT SiteClass.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(SYS_CHANGE_OPERATION) ChangeOp
FROM dbo.SiteClass
INNER JOIN CHANGETABLE(CHANGES SiteClass, #ChangeTrackerNumber) tracking --tracking
ON tracking.PK_SiteClass = SiteClass.PK_SiteClass
INNER JOIN #temptable
ON SiteClass.PK_SiteClass = #temptable.PK_SiteClass
WHERE SyncState IN ( 0, 2, 4 );
DROP TABLE #temptable;
In other words, the CTE doesn't work and I need to call the temptable.
First question, isn't the CTE supposed to be the same thing (but better) than a temptable?
Second question, does anyone know why this could be so? I have tried inner joins and using a where and in clause also. Is there something different about a recursive query that might cause this odd behavior?

Generally, when you have a table-valued function, you'd just include it like it was a regular table (assuming you have a parameter to pass to it). If you want to pass a series of parameters to it, you'd use outer apply, but that doesn't seem to be the case here.
I think (maybe) this is more like you want (notice no with clause):
select
s.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(t.SYS_CHANGE_OPERATION) ChangeOp
from
dbo.ValidSiteClass(#SelectedSchema) v
inner join
SiteClass s
on
s.PK_SiteClass = v.PK_SiteClass
inner join
changetable(changes SiteClass, #ChangeTrackerNumber) c
on
c.PK_SiteClass = s.PK_SiteClass
where
SyncState in ( 0, 2, 4 )
option (force order)
...which I'll admit doesn't look that mechanically different than what you have with the with clause. However, you could be running into an issue with SQL Server just picking a horrible plan not having any other clues. Including the option (force order) makes SQL Server perform the joins according to the order you put them in...and sometimes this makes an incredible difference.
I wouldn't say this is recommended. In fact, it's a hack...just to see WTF. But, play around with the order...and get SQL Server to show you the actual execution plans to see why it might have come up with something so heinous. An inline table-valued function is visible to SQL Server's query plan engine, and it may decide to not treat the function as an isolated thing the way programmers traditionally think about functions. I suspect this is why it took so long to begin with.
Funny enough, if your function were to be a so-called multi-lined table-valued function, SQL would definitely not have the same type of visibility into it when planning this query...and it might run faster. Again, not a recommendation, just something that might hack a better plan.

Related

mssql execution order guarantee when conversion in where clause

I have following scalar function
CREATE FUNCTION dbo.getOM
( #mskey INT,
#category VARCHAR(2)
)
RETURNS VARCHAR(11)
AS
BEGIN
DECLARE #om VARCHAR(11)
SELECT #om = o.aValue
FROM dbo.idmv_value_basic o WITH (NOLOCK)
WHERE o.MSKEY = #mskey and o.AttrName = 'OM'
AND EXISTS (
SELECT NULL
FROM sys.sequences s WITH (NOLOCK)
WHERE CONVERT(INT, replace(o.aValue, '1690', '')) BETWEEN s.minimum_value AND s.maximum_value AND s.name = concat('om_', #category)
)
RETURN #om
END
Problem with that is, that o.aValue could not only have numeric values, so that the convertion can fail, if it is executet on other rows of idmv_value_basic, where attrName is not 'OM'.
For some unknown reason this morning, our MSSQL-Server changed the execution order of the where conditions and the convertion failed.
How could I define the selection, so that is guaranteed, that only the the selected lines of idmv_value_basic are used for the selection on sys.sequences?
I know, for SQL the execution order is not deterministic, but there must be a way, to guarantee, that the conversion would not fail.
Any ideas, how I could change the function or am I doing something fundamentally wrong?
By the way, when I execute the selection manualy, it does not fail, but when I execute the funtion it fails.
We could repair the function while changing something, save and then change it back and save again.

I'll try answer the question: "Any way to guarantee execution-order?"
You can - to some extent. When you write an EXISTS, what sqlserver will actually do behind the scenes is a join. (check the execution plan)
Now, the way joins are evaluated depends on cardinalities. Say you're joining tables A and B, with a predicate on both. SQL Server will prefer to start with the table producing the fewest rows.
In your case, SQL Server probably decided that a full scan of sys.sequences produces fewer rows than dbo.idmv_value_basic (WHERE o.MSKEY = #mskey and o.AttrName = 'OM') - maybe because the number of rows in idmv_value_basic increased recently?
You could help things by making an index on dbo.idmv_value_basic (MSKEY, AttrName) INCLUDE (aValue). I assume the predicate produces exactly one row pr. MSKey - or at least not very many - and an index would help SQL Server choose the "right way", by giving it more accurate estimates of how many rows that part of the join produces.
CREATE INDEX IDVM_VALUE_BASIC_MSKEY_ATTRNAME_VALUE
ON dbo.idmv_value_basic (MSKey, AttrName) INCLUDE (aValue);
Rewriting an EXISTS as a JOIN can be done, but requires a bit of finesse. With a JOIN, you can specify which kind (Loop, Merge or Hash) and thus force SQL Server to acknowledge your knowing better and forcing the order of evaluation, ie.
SELECT ...
FROM dbo.idmv_value_basic o
INNER LOOP JOIN (SELECT name
FROM sys.sequences
WHERE (..between min and max)
) AS B
ON (B.name= concat('om_',#category ))
WHERE o.MSKey=#mskey AND o.AttrName = 'OM'
And lose the WITH (NOLOCK)

Performance of In Operator with OR conditional SQL

I have common clause in Most of the Procedures like
Select * from TABLE A + Joins where <Conditions>
And
(
-- All Broker
('True' = (Select AllBrokers from SiteUser where ID = #SiteUserID))
OR
(
A.BrokerID in
(
Select BrokerID from SiteUserBroker where SiteUserID
= #SiteUserID)
)
)
So basically if the user has access to all brokers the whole filter should not be applied else if should get the list of Broker
I am bit worries about the performance as this is used in lot of procedures and data has started reaching over 100,000 records and will grow soon, so can this be better written?
ANY Ideas are highly appreciated

One of the techniques is to use built dynamic T-SQL statement and then execute it. Since, this is done in stored procedure you are OK and the idea is simple.
DECLARE #DynamicTSQLStatement NVARCHAR(MAX);
SET #DynamicTSQLStatement = 'base query';
IF 'Getting All Brokers is not allowed '
BEGIN;
SET #DynamicTSQLStatement = #DynamicTSQLStatement + 'aditional where clause'
END;
EXEC sp_executesql #DynamicTSQLStatement;
Or instead of using dynamic T-SQL statement you can have two separate queries - one for users seeing all the data and one for users seeing part of the data. This can lead to code duplication.
Another way, is to turn this OR statement in INNER JOIN. You should test the performance in order to be sure you are getting something from it. The idea is to create a temporary table (it can have primary key or indexes if needed) and store all visible broker ids there - if the users sees all, then Select BrokerID from SiteUserBroker and if the users sees a few - Select BrokerID from SiteUserBroker where SiteUserID = #SiteUserID. In the second way, you are going to simplify the whole statement, but be sure to test if performance is improved.
CREATE TABLE #SiteUserBroker
(
[BrokerID] INT PRIMARY KEY
);
INSERT INTO #SiteUserBroker ([BrokerID])
SELECT BrokerID
FROM SiteUserBroker
where SiteUserID = #SiteUserID
OR ('True' = (Select AllBrokers from SiteUser where ID = #SiteUserID));
Select *
from TABLE A
INNER JOIN #SiteUserBroker B
ON A.BrokerID = B.[BrokerID]
-- other joins
where <Conditions>
As we are using INNER JOIN you can add it at the begging. If there are LEFT JOINs after it, it will affect the performance in positive way.

Adding to #gotqn answer, you can add EXISTS instead of IN (Note - This is not complete answer) -
AND EXISTS (
Select 1/0 from SiteUserBroker X
where A.BrokerID = X.BrokerID AND
X.SiteUserID = #SiteUserID
)
I found that Exists performs better than In in some cases. Please verify your case.

SQL Server stored procedure takes 1' 18" to run... seems long

Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run and it gets even worse when I run the asp.net page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
#parameter1 int = 5,
#parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WTIH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob

Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 select distinct clauses in this query -- in a good design, distinct clauses are are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criteria -- function calls during the join that prevent any decent query optimization. My guess is that your are using data the has internal meaning and that you are parsing the internal meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and created a computer column -- or even a plan copy of the full part number field if the combined field is unusually complex.
This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them an 1:00 when stuff is broken.
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer for CTE's -- I have had to recode CTE based joins for performance issues on several occasions -- creating an actual view equivalent to the CTE and using that instead can be a significant speedup. May well depend on the version of SQL server, but if you are running an older version I would definitely wonder about CTR optimization. -- This is not as important as the first 2 things I've mentioned, try to fix those first.
ADDED
I'm going to harsh on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues, and you like the syntax, they can be useful in at least limited usage, personally I don't normally recommend them for anything more than that -- and given that it is MS specific syntactical sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the opimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating #temptable and populating it. You can even add an index or two to your #temptable and the optimizer can figure out how to use them too. A #temp table is similar, but at least through sql 2012, the were no faster than #temp that I could tell -- supposedly new goodness in server 2014 help this.
A CTE is really just a temporary view in disguise, which I why I suggested you can replace with a real view to better better performance (and you often can), or you can populate a temp table and sometime get even better performance.

Query tuning SQL Server 2008

I have a view name "vw_AllJobsWithRecruiter".
ALTER VIEW dbo.vw_AllJobsWithRecruiter
AS
SELECT TOP(SELECT COUNT(iJobID_PK) FROM dbo.tbUS_Jobs)
iJobId_PK AS JobId,
dbo.ufn_JobStatus(iJobId_PK) AS JobStatus,
dbo.ufn_RecruiterCompanyName(iJobId_PK) AS CompanyName,
sOther AS OtherCompanyName
FROM dbo.tbUS_Jobs
WHERE bDraft = 0
ORDER BY dtPostedDate DESC
This view contains only 3278 number of rows.
If I execute the below query :
SELECT * FROM vw_AllJobsWithRecruiter
WHERE OtherCompanyName LIKE '%Microsoft INC%'
It is taking less than a second to execute.
Now my problem is:
If I use the query below query:
SELECT * FROM vw_AllJobsWithRecruiter
WHERE CompanyName LIKE '%Microsoft INC%'
OR OtherCompanyName LIKE '%Microsoft INC%'
It is taking 30 seconds to execute and from the front end it is throwing timeout error.
The function is here:
CREATE Function [dbo].[ufn_RecruiterCompanyName] (#JobId bigint)
RETURNS nvarchar(200)
AS
BEGIN
DECLARE #ResultVar nvarchar(200)
DECLARE #RecruiterId bigint
select #RecruiterId = iRecruiterId_FK from dbo.tbUS_Jobs with (Nolock)
where iJobId_PK = #JobId;
Select #ResultVar = sCompanyName from dbo.tbUS_RecruiterCompanyInfo with (Nolock)
where iRecruiterId_FK = dbo.ufn_GetParentRecruiterID(#RecruiterId)
return isnull(#ResultVar,'')
END
The other function
CREATE Function [dbo].[ufn_GetParentRecruiterID](#RecruiterId bigint)
returns bigint
as
begin
declare #ParentRecruiterId bigint
SELECT #ParentRecruiterId = iParentId FROM dbo.tbUS_Recruiter with (Nolock)
WHERE iRecruiterId_PK = #RecruiterId
IF(#ParentRecruiterId = 0)
SET #ParentRecruiterId = #RecruiterId
RETURN #ParentRecruiterId
end
My questions are
Why it is taking so much time to execute?
How can I reduce the execution time?
Thanks a lot for your attention.

The first query only calls dbo.ufn_RecruiterCompanyName() only for the rows returned, it filters on a stored value. For the second Query, SQL Server needs to call the ufn for all of the rows. Depending on the function, this might cause the delay.
Check this in Query Analyzer, and try to avoid the second Query ^^
After taking a look at the custom function I suggest rewriting that View using joined tables. When doing the lookups in such functions, SQL Server calls them for every Row that it touches or delivers. Using a LEFT JOIN allows the Server to use the Indexes and Key much faster and should deliver the Data in less than a second.
Without all of the custom functions and a Definition of all the tables, I cannot give you an Example of that new View, but it should look a bit like this:
SELECT
jobs.Jobid,
jobstatus.Jobstatus,
recruiter.Company
FROM jobs
LEFT JOIN jobstatus ON jobs.Jobid = jobstatus.Jobid
LEFT JOIN recruiter ON jobs.Recruiterid = recruiter.Recruiterid

The problem is your nested function calls.
You are calling ufn_RecruiterCompanyName in your WHERE clause, albeit indirectly.
What this means is, your WHERE clause is non-Sargable, and must run that function for every row.
That function also calls ufn_GetParentRecruiterID. Since that is in your WHERE clause within the first function, and also non-Sargable, you are basically doing two table scans per row in your table.
Replace the function calls with JOINs and you will see a huge boost in performance.

SQL - Optimizing complex paged search querys

I'm currently developing a stored procedure for a complex search in a big database. Because there are many thousand entries, that could be returned I want to use paging. Although it is working, I think it's too slow. I read a lot of posts and articles regarding pagination of SQL queries and optimizing performance. But most 'optimizations' were only helpful for very basic requests like 'give items 20-30 from table x'.
Since our world is not that simple and there are more complex queries to make I would like to get some help optimizing the following query:
CREATE PROCEDURE [SearchItems]
#SAttr1 BIT = 0,
#SAttr2 BIT = 0,
#SAttr3 BIT = 0,
#Flag1 BIT = 0,
#Flag2 BIT = 0,
#Param1 VARCHAR(20),
#Param2 VARCHAR(10),
#SkipCount BIGINT,
#TakeCount BIGINT,
#SearchStrings NVARCHAR(1000)
AS
DECLARE #SearchStringsT TABLE(
Val NVARCHAR(30)
)
INSERT INTO #SearchStringsT
SELECT * FROM dbo.Split(#SearchStrings,',');
WITH ResultTable AS (
SELECT Table1.*, ROW_NUMBER() OVER(ORDER BY Table1.ID ASC) AS [!ROWNUM!]
FROM Table1
INNER JOIN Table2 ON Table1.ID = Table2.FK1
INNER JOIN Table3 ON Table2.ID = Table3.FK2
INNER JOIN Table4 ON Table3.XX = Table4.FKX
WHERE Table1.X1 = #Parameter1
AND
(#Flag1 = 0 OR Table1.X2 = 1) AND
(#Flag2 = 0 OR Table2.X4 = #Parameter2) AND
(#Flag3 = 0 OR EXISTS(SELECT * FROM Table5 WHERE Table5.ID = Table3.X1))
AND
(
(#SAttr1 = 0 OR EXISTS(SELECT * FROM #SearchStringsT WHERE Table1.X1 LIKE Val)) OR
(#SAttr2 = 0 OR EXISTS(SELECT * FROM #SearchStringsT WHERE Table2.X1 LIKE Val)) OR
(#SAttr3 = 0 OR EXISTS(SELECT * FROM #SearchStringsT WHERE Table3.X1 LIKE Val)) OR
(#SAttr4 = 0 OR EXISTS(SELECT * FROM #SearchStringsT WHERE Table4.X1 LIKE Val))
)
)
SELECT TOP(#TakeCount) * FROM ResultTable
WHERE [!ROWNUM!] BETWEEN (#SkipCount + 1) AND (#SkipCount + #TakeCount)
RETURN
The #SAttr parameters are bit parameters to specify whether to search a field or not , the #Flag parameters are turning on/off checking of some boolean expressions, #SkipCount and #TakeCount are used for paging. #SearchString is a comma separated list of search keywords, already including the wild cards.
I hope someone can help me optimizing this, because a single search in a database with 20.000 entries in the main table lasts 800ms and its increasing with entry count. The final application needs to deal with over 100.000 entries.
I thank you very much for every help.
Marks

While I agree with Tom H. that this may be a case where dynamic SQL is best (and I'm a stored proc kinds girl, so I don't say that very often), it may be that you don't have good indexing on your tables. Are all the possible search fields indexed? Are all the FKs indexed?
I mean 20,000 is a tiny, tiny table and 100,000 is too, so it really seems as if you might not have indexed yet.
Check your execution plan to see if indexes are being used.

Stored procedures are not very good at being super-generic because it prevents SQL Server from always using optimal methods. In a similar situation recently I used (gasp) dynamic SQL. My search stored procedures would build the SQL code to perform the search, using pagination just like you have it (WITH with a ROW_NUMBER(), etc.). The advantage was that if parameters indicated that one piece of information wasn't being used in the search then the generated code would omit it. In the end it allowed for better query plans.
Make sure that you use sp_executesql properly to prevent SQL injection attacks.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas