Optimizing T-SQL code [closed] - sql

Closed 4 years ago.
My job is to maintain an application that makes heavy use of SQL Server (MSSQL 2005).
Until now the middle-tier server has stored the T-SQL code in XML and sent dynamic T-SQL queries without using stored procs.
Since I am able to change those XML queries, I want to migrate most of them to stored procs.
My question is the following:
Most of my queries have the same WHERE conditions against one table.
Sample:
Select
.....
from ....
where ....
and (a.vrsta_id = @vrsta_id or @vrsta_id = 0)
and (a.podvrsta_id = @podvrsta_id or @podvrsta_id = 0)
and (a.podgrupa_2 = @podgrupa2_id or @podgrupa2_id = 0)
and (
(a.id in (select art_id from osobina_veze where podosobina_id in (select ado from dbo.fn_ado_param_int(@podosobina))
group by art_id
having count(art_id) = @podosobina_count ))
or ('0' = @podosobina)
)
They also share the same WHERE conditions on another table.
How should I organize my code?
What is the proper way?
Should I
make a table-valued function that I use in all queries,
or use #temp tables and simply inner join each query to them every time the proc executes,
or use a #temp table filled by a table-valued function,
or leave all queries with this large WHERE clause and hope the indexes do their job,
or use a CTE (WITH statement)?

I've come to realize that having such complex searches in a single query is not really a good idea.
I prefer to construct the SQL depending on the input condition values.
This makes it easier for SQL Server to construct a better execution plan for each search.
With the single-query approach, I'd bet you have a sub-optimal execution plan for your queries.
I realize that this involves dynamic SQL, so the usual warnings about it apply.

You have two different functional concerns here: the selection of values from your one table, and the choice of columns to be returned or other tables to be joined to that data. If the number of items from filtering on your one table could be large, I would be inclined to store the PK of the selected values into a middle or work table. If it is a permanent table, you can separate different searches by something like a SessionId, or you could simply separate each set of search results by a random value which you pass from the filtering routine to the selecting routine.
There is no reason you could not keep the filtering routine in dynamic SQL. However, I would not try to do dynamic SQL in T-SQL; T-SQL is awful for string manipulation. Building the queries dynamically in your middle tier affords you the ability to exclude elements from the WHERE clause that are effectively not passed. E.g., instead of having and (a.vrsta_id = @vrsta_id or @vrsta_id = 0), you could simply exclude this line altogether when @vrsta_id is in fact zero, or have a.vrsta_id = @vrsta_id when @vrsta_id is not zero. Generally, this type of query will perform better than a series of ORs.
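For instance (a sketch using the poster's column names), when @vrsta_id is 5 and the other filters mean "all" (zero), the middle tier would emit only the predicate that matters:

```sql
Select
.....
from ....
where ....
and a.vrsta_id = 5
-- the podvrsta_id, podgrupa_2 and podosobina predicates are omitted entirely,
-- so the optimizer compiles a plan for exactly this filter combination
```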
Once you have your work table, your selecting queries would look something like:
Select ..
From WorkTable As W
Join ...
Where W.SetId = 12345
And OtherTable.Col = ....
In this case, SetId would represent the set of items that were created from the filtering routine.

You can create a table-valued function that takes in the parameters and returns a table of matching a.id values. Then you can inner join that function onto the query in each of your stored procedures. Note that your WHERE clause also references @podosobina, so it must be passed in as a parameter too. For example:
create function dbo.GetMatches
(
@vrsta_id int,
@podvrsta_id int,
@podgrupa2_id int,
@podosobina varchar(max),
@podosobina_count int
)
returns table
as
return
Select
a.id
from a
where
(a.vrsta_id = @vrsta_id or @vrsta_id = 0)
and (a.podvrsta_id = @podvrsta_id or @podvrsta_id = 0)
and (a.podgrupa_2 = @podgrupa2_id or @podgrupa2_id = 0)
and (
(a.id in (select art_id from osobina_veze where podosobina_id in (select ado from dbo.fn_ado_param_int(@podosobina))
group by art_id
having count(art_id) = @podosobina_count ))
or ('0' = @podosobina)
)
Then in this example query (the '10,20' argument is an illustrative podosobina list)...
select
*
from
a
inner join dbo.GetMatches(1, 2, 3, '10,20', 2) matches
on a.id = matches.id
inner join b on a.bID = b.bID -- example other table
You could also use that function in the where clause like this...
where
a.id in (select id from dbo.GetMatches(1, 2, 3, '10,20', 2))

Related

Pass int from outer query into OPENQUERY used as subquery

I am trying to improve the performance of a very large and complex query. Below are the relevant portions. I pass the id to the where clause and get back an orderid. I need to get the order comments from another database on a linked server. I understand that I have to pass the query string to OPENQUERY; it cannot have dynamic values. In my example I've hard-coded it.
How do I get the S.OrderId value and pass it to the OPENQUERY? I've tried some of the examples, but none do this with a subquery. DECLARE and SET throw errors inside of my main SELECT.
SELECT S.ID AS Id
, S.OrderID
, (SELECT * FROM OPENQUERY(SQL2014A, 'SELECT TOP 1 SECCOMMENT FROM TMWAMS.dbo.ORDERSEC WHERE ORDERID = 1515552')) AS COMMENTS
FROM ShopPO S WHERE ID = 230
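Since OPENQUERY only accepts a string literal, one common workaround (sketched here as an assumption, not taken from this thread) is to look the id up first, build the whole statement as a string, and run it with sp_executesql:

```sql
-- Fetch the dynamic value, then splice it into the OPENQUERY text.
DECLARE @orderId INT, @sql NVARCHAR(MAX);
SELECT @orderId = OrderID FROM ShopPO WHERE ID = 230;

SET @sql = N'SELECT * FROM OPENQUERY(SQL2014A,
    ''SELECT TOP 1 SECCOMMENT FROM TMWAMS.dbo.ORDERSEC WHERE ORDERID = '
    + CAST(@orderId AS NVARCHAR(20)) + N''')';

EXEC sp_executesql @sql;  -- the linked-server query now uses the looked-up id
```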

SQL Query Performance Issues Using Subquery

I am having issues with my query's run time. I want the query to automatically pull the max id for a column, because the table is indexed on that column. If I punch in the number manually, it runs in seconds, but I want the query to be more dynamic if possible.
I've tried placing the subquery in different places with no luck.
SELECT *
FROM TABLE A
JOIN TABLE B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID
AND B.ACTV_FLG = 1
WHERE A.WK_END_THU_ID_NU >= (SELECT DISTINCT MAX (WK_END_THU_ID_NU) FROM TABLE A)
AND A.WK_END_THU_END_YR_NU = YEAR(GETDATE())
AND A.LGCY_NATL_STR_NU IN (7731)
AND B.SLD_MENU_ITM_ID = 4314
I just want this to run faster. Maybe there is a different approach I should be taking?
I would move the subquery to the FROM clause and change the WHERE clause to only refer to A:
SELECT *
FROM A JOIN
     (SELECT MAX(WK_END_THU_ID_NU) as max_wet
      FROM A
     ) am
     ON A.WK_END_THU_ID_NU = am.max_wet JOIN
     B
     ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID AND
        B.ACTV_FLG = 1
WHERE A.WK_END_THU_END_YR_NU = YEAR(GETDATE()) AND
      A.LGCY_NATL_STR_NU IN (7731) AND
      A.SLD_MENU_ITM_ID = 4314; -- is the same as B
Then you want indexes. I'm pretty sure you want indexes on:
A(SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU)
B(SLD_MENU_ITM_ID, ACTV_FLG)
I will note that moving the subquery to the FROM clause probably does not affect performance, because SQL Server is smart enough to only execute it once. However, I prefer table references in the FROM clause when reasonable. I don't think a window function would actually help in this case.

Query Variable Table without storing variables

Salesforce Marketing Cloud queries do not allow variables or temporary tables according to the "SQL Support" section of this official documentation (http://help.marketingcloud.com/en/documentation/exacttarget/interactions/activities/query_activity/)
I have a data extension called Parameters_DE with fields Name and Value that stores constant values. I need to refer to this DE in queries.
Using transact-SQL, an example is:
Declare @number INT
SET @number = (SELECT Value FROM Parameters_DE WHERE Name='LIMIT')
SELECT * FROM Items_DE
WHERE Price < @number
How can the above be done without variables or temporary tables so that I can refer to the value of the 'LIMIT' variable that is stored in Parameters_DE and so that the query will work in Marketing Cloud?
This is what I would have done anyway, even if variables are allowed:
SELECT i.*
FROM Items_DE i
INNER JOIN Parameters_DE p ON p.Name = 'LIMIT'
WHERE i.Price < p.Value
Wanting to use a variable is indicative of still thinking procedurally, instead of set-based. Note that, if you need to, you can join to the Parameters_DE table more than once (giving a different alias each time) to use the values of different parameters at different points in a query.
You can also make things more efficient for this type of query by having a parameters table with one row, and a column for each value you need. Then you can JOIN to the table one time with a 1=1 condition and look at just the columns you need. Of course, this idea has limitations, too.
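A sketch of that one-row idea, with hypothetical names (ParamRow_DE and its columns are illustrative, not from the question):

```sql
-- One row, one column per parameter value; the 1 = 1 join brings it into scope.
SELECT i.*
FROM Items_DE i
INNER JOIN ParamRow_DE p ON 1 = 1
WHERE i.Price < p.PriceLimit   -- PriceLimit stands in for the 'LIMIT' row
  AND i.Qty >= p.MinQty        -- a second parameter, read from the same row
```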
You could just use the SELECT which retrieves the number in your WHERE clause:
SELECT * FROM Items_DE
WHERE Price < (SELECT Value FROM Parameters_DE WHERE Name='LIMIT')
This can be done with a join:
SELECT i.*
FROM Items_DE i
INNER JOIN Parameters_DE p
ON p.Name = 'LIMIT'
AND i.Price < p.Value

Why would using a temp table vs a table variable improve the speed of this query?

I currently have a performance issue with a query (that is more complicated than the example below). Originally the query would run and take say 30 seconds, then when I switched out the use of a table variable to using a temp table instead, the speed is cut down to a few seconds.
Here is a trimmed down version using a table variable:
-- Store XML into tables for use in query
DECLARE @tCodes TABLE([Code] VARCHAR(100))
INSERT INTO
@tCodes
SELECT
ParamValues.ID.value('.','VARCHAR(100)') AS 'Code'
FROM
@xmlCodes.nodes('/ArrayOfString/string') AS ParamValues(ID)
SELECT
'SummedValue' = SUM(ot.[Value])
FROM
[SomeTable] st (NOLOCK)
JOIN
[OtherTable] ot (NOLOCK)
ON ot.[SomeTableID] = st.[ID]
WHERE
ot.[CodeID] IN (SELECT [Code] FROM @tCodes) AND
st.[Status] = 'ACTIVE' AND
YEAR(ot.[SomeDate]) = 2013 AND
LEFT(st.[Identifier], 11) = @sIdentifier
Here is the version with the temp table which performs MUCH faster:
SELECT
ParamValues.ID.value('.','VARCHAR(100)') AS 'Code'
INTO
#tCodes
FROM
@xmlCodes.nodes('/ArrayOfString/string') AS ParamValues(ID)
SELECT
'SummedValue' = SUM(ot.[Value])
FROM
[SomeTable] st (NOLOCK)
JOIN
[OtherTable] ot (NOLOCK)
ON ot.[SomeTableID] = st.[ID]
WHERE
ot.[CodeID] IN (SELECT [Code] FROM #tCodes) AND
st.[Status] = 'ACTIVE' AND
YEAR(ot.[SomeDate]) = 2013 AND
LEFT(st.[Identifier], 11) = @sIdentifier
The problem I have with performance is solved with the change but I just don't understand why it fixes the issue and would prefer to know why. It could be related to something else in the query but all I have changed in the stored proc (which is much more complicated) is to switch from using a table variable to using a temp table. Any thoughts?
The differences and similarities between table variables and #temp tables are looked at in depth in my answer here.
Regarding the two queries you have shown (unindexed table variable vs unindexed temp table) three possibilities spring to mind.
INSERT ... SELECT to table variables is always serial. The SELECT can be parallelised for temp tables.
Temp tables can have column statistics histograms auto created for them.
Usually the cardinality of table variables is assumed to be 0 (the plan is compiled while the table is still empty).
From the code you have shown (3) seems the most likely explanation.
This can be resolved by using OPTION (RECOMPILE) to recompile the statement after the table variable has been populated.
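A minimal sketch of where the hint goes: appended to the statement that reads the table variable, so the plan is compiled after the variable is populated and reflects its real row count (the tables are the ones from the question):

```sql
SELECT SUM(ot.[Value]) AS SummedValue
FROM [SomeTable] st
JOIN [OtherTable] ot ON ot.[SomeTableID] = st.[ID]
WHERE ot.[CodeID] IN (SELECT [Code] FROM @tCodes)
OPTION (RECOMPILE);  -- optimizer now sees the table variable's actual cardinality
```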

SQL Server inner join and subquery

Below is a query that updates certain ids on the Hierarchical table.
My first query is stable, but I have a problem with its performance:
DECLARE @targetName varchar(100)
UPDATE a
SET a.PrimaryId = b.PrimaryId
, a.SecondaryId = b.SecondaryId
FROM
(
SELECT PrimaryId
, SecondaryId
FROM Hierarchical
WHERE ParentName = @targetName
) as a
JOIN
(
SELECT PrimaryId
, SecondaryId
FROM Hierarchical
WHERE Name = @targetName
) b
ON a.ParentId = b.Id
This next query is my second option:
DECLARE @targetName varchar(100)
UPDATE a
SET a.PrimaryId = b.PrimaryId
, a.SecondaryId = b.SecondaryId
FROM Hierarchical a
JOIN Hierarchical b
ON a.ParentId = b.Id
WHERE a.ParentName = @targetName
AND b.Name = @targetName
My questions are:
Does the second query execute just like the first query?
Will the second query outperform the first query?
*Note: I have a large amount of data, and we're having hardware issues executing these queries.
I've posted here on SO so that I can gather opinions.
Your first query will not execute as written: the derived tables a and b select only PrimaryId and SecondaryId, so the ON clause cannot reference a.ParentId or b.Id. Let me assume those columns are added to the subqueries, so the join is effectively a.ParentId = b.Id.
The question you are asking is about how the query is optimized. The real way to answer is to look at the query plan, which you can readily see in SQL Server Management Studio. You can start with the documentation to go down this path.
In your case, though, you are using the subqueries to say "do the filtering when you read the data". Actually, SQL Server typically pushes such filtering operations down to the table read, so the subqueries are probably superfluous.
If you want to improve performance, I would suggest that you have the following indexes on the table: hierarchical(parentname, id) and hierarchical(name, id). These should probably give a good performance boost.
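Sketched as DDL (the index names are illustrative):

```sql
CREATE INDEX IX_Hierarchical_ParentName ON Hierarchical (ParentName, Id);
CREATE INDEX IX_Hierarchical_Name       ON Hierarchical (Name, Id);
```

Both indexes cover their respective WHERE filter and carry Id, so the self-join can be resolved from the index alone.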