SQL Server 2008 execution plan question

I have a question for the SQL gurus.
There are two tables with almost identical structure.
Based on a parameter passed into the stored procedure, I need to collect data from one table or the other.
What is the best way to do that?
Please do not suggest combining those tables into a single one; that is not an option.
I did it as follows (MS SQL Server 2008):
SELECT *
FROM String s
JOIN (
    SELECT id, TypeCode, ProdNo
    FROM Table1
    WHERE @param = 1 AND TypeCode = 'INV'
    UNION
    SELECT id, TypeCode, ProdNo
    FROM Table2
    WHERE @param = 2 AND TypeCode = 'INV'
) m ON m.Id = s.Id
WHERE s.Id = 256
But when I looked at the execution plan, I was surprised: it read data from both tables in parallel threads and only afterwards filtered by the @param value.
I expected the filtering to happen first, so that data would be read from a single table only.
Is there a way to select from only one table without splitting the query into two queries and using an IF statement?
Thanks

You really need to read Dynamic Search Conditions in T-SQL by Erland Sommarskog. You shouldn't worry about repeating code; this is not some homework assignment. Just worry about making the execution plan use an index. When making SQL code "pretty", the only things to consider are indentation and case; any other change can make the query plan slower. I've seen trivial changes turn a super fast query into a super slow one. GO FOR SPEED (index usage) and duplicate code as necessary. Also see The Curse and Blessings of Dynamic SQL.
You tagged the question sql-server-2008, so if you're running SQL 2008 SP1 CU5 (10.0.2746), SQL 2008 R2 CU1 (10.50.1702) or later, there is a new behavior of OPTION(RECOMPILE) (explained in the "Dynamic Search Conditions" article linked above) that does not exist in all versions of SQL 2008 or in 2005. This behavior evaluates the values of @local_variables at run time and compiles the query accordingly. In your case, this should cause one half of your UNION to be eliminated when the query is compiled.
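A minimal sketch of that suggestion applied to the query from the question, assuming a build with the newer OPTION(RECOMPILE) behavior (SQL 2008 SP1 CU5, 2008 R2 CU1 or later) and the String, Table1 and Table2 tables as shown. The procedure name dbo.GetInvoiceRows is hypothetical; the query is the one from the question with the hint added at the end:

```sql
-- Hypothetical procedure wrapping the query from the question.
CREATE PROCEDURE dbo.GetInvoiceRows
    @param int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM String AS s
    JOIN (
        SELECT id, TypeCode, ProdNo
        FROM Table1
        WHERE @param = 1 AND TypeCode = 'INV'
        UNION
        SELECT id, TypeCode, ProdNo
        FROM Table2
        WHERE @param = 2 AND TypeCode = 'INV'
    ) AS m ON m.id = s.Id
    WHERE s.Id = 256
    -- Compile a fresh plan on each call using the current value of @param;
    -- with the newer RECOMPILE behavior the branch whose predicate is
    -- false can be removed from the plan entirely.
    OPTION (RECOMPILE);
END;
GO
```

Running EXEC dbo.GetInvoiceRows @param = 1; with the actual execution plan enabled should show whether Table2 is still touched. The price is a compilation on every call, which is worth measuring before adopting it.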

Could you just use a simple IF statement?
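IF @Param = 1
BEGIN
    -- query against Table1 only
    SELECT id, TypeCode, ProdNo FROM Table1 WHERE TypeCode = 'INV'
END
ELSE IF @Param = 2
BEGIN
    -- query against Table2 only
    SELECT id, TypeCode, ProdNo FROM Table2 WHERE TypeCode = 'INV'
END
ELSE
    RAISERROR('Invalid Parameter', 16, 1)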
Alternatively, you could build the query dynamically and execute it with the sp_executesql system stored procedure.
DECLARE @Sql NVARCHAR(100);
SET @Sql = N'SELECT * FROM (';
IF @Param = 1
    SET @Sql = @Sql + N'SELECT 1 a, 2 b, 3 c';
ELSE IF @Param = 2
    SET @Sql = @Sql + N'SELECT 4 a, 5 b, 6 c';
ELSE
BEGIN
    -- RAISERROR with severity 16 does not stop the batch, so exit explicitly
    RAISERROR('Invalid Parameter', 16, 1);
    RETURN;
END
SET @Sql = @Sql + N') tbl';
EXEC sp_executesql @Sql;

The first thing I'd suggest is putting the ID filter inside the union as well.
I've also changed the UNION to UNION ALL, which avoids the work of eliminating duplicate rows.
SELECT *
FROM String s
JOIN (
    SELECT id, TypeCode, ProdNo
    FROM Table1
    WHERE @param = 1 AND TypeCode = 'INV' AND id = 256
    UNION ALL
    SELECT id, TypeCode, ProdNo
    FROM Table2
    WHERE @param = 2 AND TypeCode = 'INV' AND id = 256
) m ON m.Id = s.Id
WHERE s.Id = 256

SQL Server is not that clever. When writing queries, you should send only the SQL needed to get the data you want (no superfluous statements), but also provide as much information as possible via filters, to give the query optimizer as many hints as possible about the data. As you've seen, it will execute all the SQL you send it.
From what I'm reading, it sounds like you need dynamic SQL. This also lets you merge the common parts of the SQL, cutting down on duplication. For example (taking just your inner query; you can wrap the rest of your code around it):
DECLARE @sql NVARCHAR(1000)
SET @sql = N'SELECT id, TypeCode, ProdNo'
IF @param = 1
    SET @sql = @sql + N' FROM table1'
IF @param = 2
    SET @sql = @sql + N' FROM table2'
SET @sql = @sql + N' WHERE TypeCode = ''INV'''
EXECUTE sp_executesql @sql
If you're going to build this into something more complicated, beware of little Bobby Tables (SQL injection): sp_ExecuteSQL can be misused to open gaping holes, but used correctly, with parameterised dynamic SQL, it's as good as a stored procedure.
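A minimal sketch of the parameterised form, reusing the names from the question. It assumes @param (int) is already declared, for example as a procedure parameter, and adds a hypothetical @id parameter for the Id filter. Table names come only from fixed literals in the CASE, never from caller input, and the values travel as real sp_executesql parameters:

```sql
-- Assumes @param int and a hypothetical @id int are already declared.
DECLARE @sql nvarchar(1000);

-- Only fixed, known table names can end up in the statement text.
SET @sql = N'SELECT id, TypeCode, ProdNo FROM '
    + CASE @param
          WHEN 1 THEN N'Table1'
          WHEN 2 THEN N'Table2'
      END
    + N' WHERE TypeCode = @typeCode AND id = @id;';

IF @sql IS NULL
BEGIN
    -- The CASE returned NULL, and NULL propagated through the concatenation.
    RAISERROR('Invalid Parameter', 16, 1);
    RETURN;
END;

-- Values are passed as parameters, so they are never parsed as SQL text.
EXEC sp_executesql
    @sql,
    N'@typeCode varchar(10), @id int',   -- varchar(10) is an assumed type
    @typeCode = 'INV',
    @id = @id;
```

If @param is neither 1 nor 2, the CASE yields NULL and, under the default CONCAT_NULL_YIELDS_NULL setting, the whole string becomes NULL, which the IF check catches.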

If you can create a tDUMMY table (with a single dummy row), give this a shot:
Select
*
FROM
String s
JOIN (
SELECT
id, TypeCode, ProdNo
FROM
tDUMMY INNER JOIN Table1 ON TypeCode= 'INV'
WHERE
@param = 1
UNION ALL
SELECT
id, TypeCode, ProdNo
FROM
tDUMMY INNER JOIN Table2 ON TypeCode= 'INV'
WHERE
@param = 2
) m ON m.Id = s.Id
WHERE s.Id = 256
Theoretically, the query optimizer should filter the tDUMMY table first and then attempt the join. So if @param = 1, the second query should finish much faster (it will check the single row of tDUMMY again, but it shouldn't touch Table2).
Note: I also made it UNION ALL (though that shouldn't have much of an impact), because one side will always return no rows anyway.

Related

How to add a condition to the WHERE clause of a SQL query based on a variable value

CREATE PROCEDURE GetBenefitCategory(
@BenefitCategoryID INT = 0
)
AS
IF @BenefitCategoryID = 0
SELECT * FROM BenefitCategory
ELSE
SELECT * FROM BenefitCategory WHERE BenefitCategoryID = @BenefitCategoryID
I need to write an SP that adds a condition to the WHERE clause, as above, if BenefitCategoryID is supplied as an input parameter to the SP.
Is there a better way of achieving this than using an IF ELSE condition?
My actual query is very big, and I need to add only one condition to the WHERE clause based on the procedure's input parameter, so I don't want to write redundant code in the SP with an IF ELSE condition.
Please let me know if there is a better way of achieving this.
You can express the logic as:
SELECT *
FROM BenefitCategory
WHERE BenefitCategoryID = @BenefitCategoryID OR @BenefitCategoryID = 0;
However, this may not have optimal performance, because the OR is likely to prevent the use of an index. One method around this is UNION ALL:
SELECT *
FROM BenefitCategory
WHERE BenefitCategoryID = @BenefitCategoryID
UNION ALL
SELECT *
FROM BenefitCategory
WHERE @BenefitCategoryID = 0;
However, this repeats the query, which isn't desirable.
So, you have a few options. If you don't care about performance (say the table is small, so index usage is not important), then the first method is fine. If you do, you can use the first method and force a recompile. One common alternative, particularly for somewhat complex queries that take some time to run, is dynamic SQL. This avoids the OR, so the query can be readily optimized.
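Forcing a recompile on the first method might look like the sketch below, which assumes the GetBenefitCategory procedure from the question already exists (hence ALTER). With OPTION (RECOMPILE), the statement is compiled on each call using the actual parameter value, so on versions with the newer RECOMPILE behavior the OR can be simplified away and an index on BenefitCategoryID, if there is one, can be used:

```sql
ALTER PROCEDURE GetBenefitCategory
    @BenefitCategoryID int = 0
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM BenefitCategory
    -- 0 means "no filter"; otherwise match the requested category.
    WHERE BenefitCategoryID = @BenefitCategoryID
       OR @BenefitCategoryID = 0
    -- Recompile per call so the plan reflects the actual parameter value.
    OPTION (RECOMPILE);
END;
GO
```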
And then there is your own solution, which is quite reasonable. The only caveat I would add is that the stored procedure could be a user-defined function (unless you use dynamic SQL, which functions cannot execute). That is usually easier to work with in other parts of the code.
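If you go the function route, the non-dynamic version could be an inline table-valued function along these lines; the name dbo.fnBenefitCategory and the value 5 are hypothetical. The filter is still an OR, so the same indexing caveat applies, and a hint such as OPTION (RECOMPILE) would go on the calling statement rather than inside the function:

```sql
-- Hypothetical inline table-valued function; it is expanded into the
-- calling query, much like a view that takes a parameter.
CREATE FUNCTION dbo.fnBenefitCategory (@BenefitCategoryID int)
RETURNS TABLE
AS
RETURN
(
    SELECT *
    FROM BenefitCategory
    -- 0 means "return every category"
    WHERE BenefitCategoryID = @BenefitCategoryID
       OR @BenefitCategoryID = 0
);
GO

-- Usage: the function can be selected from, joined or filtered like a table.
SELECT * FROM dbo.fnBenefitCategory(0);   -- all categories
SELECT * FROM dbo.fnBenefitCategory(5);   -- one category
```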
You can use a parameterized View (with a default value for the parameter). A Parameterized View has the advantage of being indexable.
You get all the advantages of
SELECT *
FROM BenefitCategory
WHERE BenefitCategoryID = @BenefitCategoryID OR @BenefitCategoryID = 0;
and also it's compiled and (possibly) indexed. Another way would be
SELECT *
INTO #tempTable
FROM BenefitCategory
-- narrow the temp table in place instead of creating a second one in each branch
IF (@BenefitCategoryID <> 0)
    DELETE FROM #tempTable
    WHERE BenefitCategoryID <> @BenefitCategoryID
SELECT * FROM #tempTable;
and repeat the process if you have many more situations, so that your temp tables keep getting smaller and smaller. But that really depends on how small your category table is, and how elaborate your filtering (where clauses) are.
The best way to write the query for this scenario is to use the IIF function.
Use the query below if BenefitCategoryID is a NOT NULL column, which is most likely the case here.
SELECT * FROM BenefitCategory WHERE BenefitCategoryID =
IIF(@BenefitCategoryID = 0, BenefitCategoryID, @BenefitCategoryID)
See the link below to learn more about IIF (available from SQL Server 2012):
https://learn.microsoft.com/en-us/sql/t-sql/functions/logical-functions-iif-transact-sql?view=sql-server-2017
An alternative method is dynamic SQL. This is very simplified, using your example:
CREATE PROCEDURE GetBenefitCategory (@BenefitCategoryID int = 0)
AS
DECLARE @SQL nvarchar(MAX);
SET @SQL = N'SELECT *' + NCHAR(10) +
           N'FROM BenefitCategory' + NCHAR(10) +
           N'WHERE {Your Other Where Clauses}' +
           IIF(@BenefitCategoryID = 0, N';', NCHAR(10) + N' AND BenefitCategoryID = @dBenefitCategoryID;');
-- the parameter definition string must include the @ sign
EXEC sp_executesql @SQL, N'@dBenefitCategoryID int', @dBenefitCategoryID = @BenefitCategoryID;
GO

Re-use Query Parts in T-SQL

I have a very complicated stored procedure that repeats a very complicated query with different where clauses based on certain values passed in. The stored procedure takes up over 500 lines of code, with the common part of the query taking up just over 100 lines. That common part is repeated 3 times.
I originally thought of using a CTE (Common Table Expression), except that in T-SQL you can't define the common part, run your IF statement, and then apply the WHERE clause. That's essentially what I need.
As a workaround I created a view for the common code, but it's only used in one stored procedure.
Is there any way to do this without creating a full view or temp tables?
Ideally I would like to do something like this:
WITH SummaryCTE (col1, col2, col3...)
AS
(
SELECT CONCAT('Pending Attachments - ', ISNULL(countCol1, 0)) col1
-- all the rest of the stuff
FROM x AS y
LEFT JOIN attachments z ON z.attachmentId = y.attachmentId
-- and more complicated stuff
)
IF (@originatorId = @userId)
BEGIN
SELECT * FROM SummaryCTE
WHERE
-- handle this security case
END
ELSE IF (@anotherCondition = 1)
BEGIN
SELECT * FROM SummaryCTE
WHERE
-- a different where clause
END
ELSE
BEGIN
SELECT * FROM SummaryCTE
WHERE
-- the generic case
END
Hopefully the pseudo code gives you an idea of what I would like. Right now my workaround is to create a view for the contents of what I defined as SummaryCTE, and then handle the IF/ELSE IF/ELSE clause. Executing this structure throws an error at the first IF statement, because in T-SQL a CTE must be followed directly by the single statement that uses it, such as a SELECT.
Maybe this doesn't exist in any other way, but I wanted to know for sure.
Well, aside from the temp tables and views that you've identified, you could use dynamic SQL to build the code and then execute it. This keeps you from having to repeat code, but makes it harder to read and work with. Like this:
declare @sql nvarchar(max) = N'with myCTE (col1, col2) as (select * from myTable) select * from myCTE'
if (@myVar = 1)
begin
    set @sql = @sql + N' where col1 = 2'
end
else if (@myVar = 2)
begin
    set @sql = @sql + N' where col2 = 4'
end
-- ...
exec sp_executesql @sql
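For comparison, the temp-table route mentioned in that answer keeps the common part in one place: run the shared query once into a temp table, then branch only on the WHERE clause. This is a sketch under assumptions: the columns originatorId and status, the table x and the predicate in each branch are hypothetical stand-ins for the real 100-line query, and @originatorId, @userId and @anotherCondition are assumed to be procedure parameters:

```sql
-- Materialize the shared query once (placeholder body shown).
SELECT
    y.attachmentId,
    y.originatorId,   -- hypothetical column
    y.status          -- hypothetical column
INTO #Summary
FROM x AS y;          -- stands in for the real FROM/JOIN logic

IF (@originatorId = @userId)
BEGIN
    -- security case (hypothetical predicate)
    SELECT * FROM #Summary WHERE originatorId = @userId;
END
ELSE IF (@anotherCondition = 1)
BEGIN
    -- a different WHERE clause (hypothetical predicate)
    SELECT * FROM #Summary WHERE status = 'Pending';
END
ELSE
BEGIN
    -- the generic case
    SELECT * FROM #Summary;
END

DROP TABLE #Summary;
```

The trade-off is that every row of the common query is written to tempdb before any branch-specific filter is applied, so this tends to pay off when the shared result is small or is reused several times.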
Another option would be to incorporate your different where clauses into the original query.
WITH SummaryCTE (col1, col2, col3...)
AS
(
SELECT CONCAT('Pending Attachments - ', ISNULL(countCol1, 0)) col1
-- all the rest of the stuff
FROM x AS y
LEFT JOIN attachments z ON z.attachmentId = y.attachmentId
-- and more complicated stuff
)
select *
from SummaryCTE
where
(
-- this was your first if
@originatorId = @userId
and ( whatever you do for your security case )
)
or
(
-- this was your second branch
@anotherCondition = 1
and ( handle another case here )
)
or
-- etc. etc.
This eliminates the if/else chain but makes the query more complicated. It can also lead to bad cached query plans because of parameter sniffing, although that may not matter much depending on your data. Test that before making a decision. (You can also add the OPTION (RECOMPILE) hint so the plan is not reused: you won't get a bad cached plan, but you take a compilation hit on every execution. Test to find out; don't guess. Also, a solution with a view and the if/else chain will suffer from the same parameter sniffing/cached query plan problem.)

Finding number of columns returned by a query

How can I get the number of columns returned by an SQL query using SQL Server?
For example, if I have a query like following:
SELECT *
FROM A1, A2
It should return the total number of columns in table A1 + total number of columns in table A2. But the query might be more complicated.
Here is one method:
select top 0 *
into _MYLOCALTEMPTABLE
from (your query here) t
select count(*)
from Information_Schema.Columns c
where table_name = '_MYLOCALTEMPTABLE'
drop table _MYLOCALTEMPTABLE
You can do something similar by creating a view.
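A sketch of the view variant, using the example query from the question; the view name dbo.vw_ColumnCountProbe is hypothetical. As with the SELECT ... INTO version, this only works if the query's column names are unique, so a join of two tables that share a column name (such as id) needs column aliases first:

```sql
-- Hypothetical view name; the view exists only long enough to be counted.
CREATE VIEW dbo.vw_ColumnCountProbe
AS
SELECT * FROM A1, A2;   -- your query here
GO

-- sys.columns lists the columns of views as well as tables.
SELECT COUNT(*) AS column_count
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.vw_ColumnCountProbe');
GO

DROP VIEW dbo.vw_ColumnCountProbe;
GO
```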
You didn't specify your SQL Server version but I'm assuming it's not 2012. However, future readers of this question might be on 2012+ so I'm posting this answer for them.
SQL Server 2012 provides a set of procedures that expose more metadata about queries and parameters. In this case, the stored procedure sp_describe_first_result_set will provide it in a handy tabular form.
There is also a dynamic management function, sys.dm_exec_describe_first_result_set, that returns similar content, which is what you'd want to use in your example:
DECLARE
-- Your query goes here
@query nvarchar(4000) = N'SELECT * FROM mdm.tblStgBatch AS TSB';
-- Tabular results
EXECUTE sys.sp_describe_first_result_set @tsql = @query;
-- Simple column count
SELECT
COUNT(1) AS column_count
FROM
sys.dm_exec_describe_first_result_set(@query, NULL, 0);
The new metadata discovery options replace FMTONLY, which is how one would solve this problem before 2012. My T-SQL chops apparently aren't strong enough to do anything useful with it, so instead I'd have to bail out to a .NET language to work with the output of FMTONLY:
SET FMTONLY ON;
SELECT *
FROM A1, A2;
SET FMTONLY OFF;
Try this:
--Insert into a temp table (this could be any query)
SELECT *
INTO #temp
FROM [yourTable]
--Select from temp table
SELECT * FROM #temp
--List of columns
SELECT COUNT(name) NumOfColumns FROM tempdb.sys.columns WHERE object_id =
object_id('tempdb..#temp');
--drop temp table
DROP TABLE #temp
Ugly, I know:
SELECT COUNT(*) +
(
SELECT COUNT(*)
FROM information_schema.columns
WHERE table_name = 'A1'
)
FROM information_schema.columns
WHERE table_name = 'A2'

Nested if statements in SQL Server stored procedure SELECT statement

I'm new here, and relatively new to stored procedures, so please bear with me! I've checked related questions on here and can't find anything that works in this instance.
I am trying to build a stored procedure (MS SQL Server 2005) that takes a number of passed in values and in effect dynamically builds up the SQL as you would with inline SQL.
This is where I've come unstuck.
We have (somewhat simplified for clarity):
@searchf1 varchar(100), -- search filter 1
@searchr1 varchar(100), -- search result 1
@searchf2 varchar(100), -- search filter 2
@searchr2 varchar(100), -- search result 2
@direction char(1), -- direction to order results in
AS
set nocount on
set dateformat dmy
SELECT *
FROM database.dbo.table T
WHERE T.deleted = 'n'
ORDER BY CASE @direction
WHEN 'A' THEN T.id
WHEN 'D' THEN T.id DESC
END
END
set nocount off
I have also tried the lines from ORDER BY as:
IF @direction = 'N' THEN
ORDER BY
T.id
ELSE
ORDER BY
T.id DESC
Both approaches give me an error along the lines of:
"Incorrect syntax near the keyword 'DESC'." (which references the line id DESC following the final ORDER BY)
As part of this stored procedure I also want to try to feed in matched pairs of values which reference a field to look up and a field to match it to, these could either be present or ''. To do that I need to add into the SELECT section code similar to:
WHERE
deleted = 'n'
IF @searchf1 <> '' THEN
AND fieldf1 = @searchf1 AND fieldr1 = @searchr1
This however generates errors like:
Incorrect syntax near the keyword 'IF'.
I know dynamic SQL of this type isn't the most elegant. And I know that I could do it with global IF ELSE statements, but if I did, the SP would be thousands of lines long; there are going to be up to 15 pairs of these search fields, together with the direction and the field to order on.
(The current version of this SP takes a passed-in list of IDs to return, generated by some inline dynamic SQL; by doing this I'm trying to reduce it to a single hit to generate the recordset.)
Any help greatly appreciated. I've hugely simplified the code in the above example for clarity, since it's the general concept of a nested IF statement with SELECT and ORDER BY that I'm inquiring about.
For this I would go with a more formal dynamic SQL solution, something like the following, given your defined input parameters:
DECLARE @SQL NVARCHAR(MAX)
SET @SQL = N'
SELECT *
FROM
database.dbo.table T
WHERE
T.deleted = ''n'' '
--Do your conditional stuff here; pass the values as parameters instead of concatenating them
IF @searchf1 <> ''
SET @SQL = @SQL + N' AND fieldf1 = @searchf1 AND fieldr1 = @searchr1'
--Finish the query
SET @SQL = @SQL + N' ORDER BY xxx'
EXEC sp_executesql @SQL,
N'@searchf1 varchar(100), @searchr1 varchar(100)',
@searchf1, @searchr1
DISCLAIMER: Dynamic SQL is NOT something that should be taken lightly, and proper consideration should be given in ALL circumstances to ensure that you are not open to SQL injection attacks. However, for some dynamic search operations it is one of the most elegant routes.
Try it this way:
SELECT * FROM database.dbo.table T WHERE T.deleted = 'n'
ORDER BY
CASE WHEN @direction='A' THEN T.id END ASC,
CASE WHEN @direction='D' THEN T.id END DESC
Source Article:
http://blog.sqlauthority.com/2007/07/17/sql-server-case-statement-in-order-by-clause-order-by-using-variable/
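The same pattern can be extended to the "field to order that direction on" from the question. A sketch, where @sortColumn, T.name and the table dbo.SearchResults are hypothetical (only T.id appears in the question). Each CASE covers one column, because a single CASE mixing an int column and a varchar column would try to convert everything to int:

```sql
-- Assumes @direction char(1) ('A' or 'D') and a hypothetical
-- @sortColumn varchar(20) ('id' or 'name') are procedure parameters.
SELECT *
FROM dbo.SearchResults AS T          -- hypothetical table name
WHERE T.deleted = 'n'
ORDER BY
    -- For a given call at most one of these expressions is non-NULL;
    -- the others are NULL for every row and do not affect the order.
    CASE WHEN @sortColumn = 'id'   AND @direction = 'A' THEN T.id   END ASC,
    CASE WHEN @sortColumn = 'id'   AND @direction = 'D' THEN T.id   END DESC,
    CASE WHEN @sortColumn = 'name' AND @direction = 'A' THEN T.name END ASC,
    CASE WHEN @sortColumn = 'name' AND @direction = 'D' THEN T.name END DESC;
```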
Another option that you might have, depending on the data type of your field, if nulls are NOT allowed, would be to do something like this.
SELECT *
FROM database.dbo.table T
WHERE T.deleted = 'n'
AND fieldf1 = COALESCE(@searchf1, fieldf1)
AND fieldr1 = COALESCE(@searchr1, fieldr1)
--ETC
ORDER BY fieldf1
This way you are not using dynamic SQL and it is fairly readable, just have the variable be null when you are looking to omit the data.
NOTE: As I mentioned this route will NOT work if any of the COALESCE columns contain null values.

Proper way to handle 'optional' where clause filters in SQL?

Let's say you have a stored procedure, and it takes an optional parameter. You want to use this optional parameter in the SQL query. Typically this is how I've seen it done:
SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = 'test'
AND (@MyOptionalParam IS NULL OR t1.MyField = @MyOptionalParam)
This seems to work well, however it causes a high amount of logical reads if you run the query with STATISTICS IO ON. I've also tried the following variant:
SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = 'test'
AND t1.MyField = CASE WHEN @MyOptionalParam IS NULL THEN t1.MyField ELSE @MyOptionalParam END
And it yields the same number of high reads. If we convert the SQL to a string, then call sp_ExecuteSQL on it, the reads are almost nil:
DECLARE @sql nvarchar(max)
SELECT @sql = N'SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = ''test'''
IF @MyOptionalParam IS NOT NULL
BEGIN
SELECT @sql = @sql + N' AND t1.MyField = @MyOptionalParam '
END
-- the parameter definition needs a data type; varchar(50) is assumed here
EXECUTE sp_executesql @sql, N'@MyOptionalParam varchar(50)', @MyOptionalParam
Am I crazy? Why are optional where clauses so hard to get right?
Update: I'm basically asking whether there's a way to keep the standard syntax inside a stored procedure and still get low logical reads, like the sp_ExecuteSql method does. It seems completely crazy to me to build up a string, not to mention that it makes the code harder to maintain, debug and visualize.
If we convert the SQL to a string, then call sp_ExecuteSQL on it, the reads are almost nil...
Because your query is no longer evaluating an OR, which, as you can see, kills sargability.
The query plan is cached when using sp_executesql; SQL Server doesn't have to do a hard parse...
Excellent resource: The Curse & Blessing of Dynamic SQL
As long as you are using parameterized queries, you should be safe from SQL injection attacks.
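A minimal sketch of extending the parameterised version from the question to several optional filters; @OtherOptionalParam and OtherField are hypothetical, and the parameter types are assumptions. sp_executesql accepts declared parameters that the statement never references, so the same parameter list and values can be passed no matter which filters were appended. Each distinct combination of filters gets its own cached plan:

```sql
-- Stand-ins for procedure parameters (types are assumptions).
DECLARE @MyOptionalParam    varchar(50) = NULL,
        @OtherOptionalParam int         = NULL;   -- hypothetical

DECLARE @sql nvarchar(max) =
    N'SELECT * FROM dbo.MyTableName t1 WHERE t1.ThisField = ''test''';

-- Append only the predicates that are actually needed.
IF @MyOptionalParam IS NOT NULL
    SET @sql += N' AND t1.MyField = @MyOptionalParam';

IF @OtherOptionalParam IS NOT NULL
    SET @sql += N' AND t1.OtherField = @OtherOptionalParam';  -- hypothetical column

-- Always declare and pass every parameter; unreferenced ones are ignored.
EXEC sp_executesql
    @sql,
    N'@MyOptionalParam varchar(50), @OtherOptionalParam int',
    @MyOptionalParam    = @MyOptionalParam,
    @OtherOptionalParam = @OtherOptionalParam;
```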
This is another variation on the optional parameter technique:
SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = 'test'
AND t1.MyField = COALESCE(@MyOptionalParam, t1.MyField)
I'm pretty sure it will have the same performance problem though. If performance is #1 then you'll probably be stuck with forking logic and near duplicate queries or building strings which is equally painful in TSQL.
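You're using an OR in the first two SQL statements (explicitly in the first, implicitly through the CASE in the second). The last one uses only AND criteria. OR is generally more expensive than AND, because it makes it harder for the optimizer to use an index seek. No, you're not crazy; this should be expected.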
EDIT: Adding link to similar question/answer with context as to why the union / if...else approach works better than OR logic (FYI, Remus, the answerer in this link, used to work on the SQL Server team developing service broker and other technologies)
Change from using the "or" syntax to a union approach, you'll see 2 seeks that should keep your logical read count as low as possible:
SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = 'test'
AND @MyOptionalParam IS NULL
union all
SELECT * FROM dbo.MyTableName t1
WHERE t1.ThisField = 'test'
AND t1.MyField = @MyOptionalParam
If you want to de-duplicate the results, use a "union" instead of "union all".
EDIT: Demo showing that the optimizer is smart enough to rule out the scan based on the null-variable check in the UNION:
if object_id('tempdb..#data') is not null
drop table #data
go
-- Put in some data
select top 1000000
cast(a.name as varchar(100)) as thisField, cast(newid() as varchar(50)) as myField
into #data
from sys.columns a
cross join sys.columns b
cross join sys.columns c;
go
-- Show count
select count(*) from #data;
go
-- Index on thisField
create clustered index ixc__blah__temp on #data (thisField);
go
set statistics io on;
go
-- Query with a null parameter value
declare @MyOptionalParam varchar(50);
select *
from #data d
where d.thisField = 'test'
and @MyOptionalParam is null;
go
-- Union query
declare @MyOptionalParam varchar(50);
select *
from #data d
where d.thisField = 'test'
and @MyOptionalParam is null
union all
select *
from #data d
where d.thisField = 'test'
and d.myField = '5D25E9F8-EA23-47EE-A954-9D290908EE3E';
go
-- Union query with value
declare @MyOptionalParam varchar(50);
select @MyOptionalParam = '5D25E9F8-EA23-47EE-A954-9D290908EE3E'
select *
from #data d
where d.thisField = 'test'
and @MyOptionalParam is null
union all
select *
from #data d
where d.thisField = 'test'
and d.myField = '5D25E9F8-EA23-47EE-A954-9D290908EE3E';
go
if object_id('tempdb..#data') is not null
drop table #data
go
Change from using the "or" syntax to a two query approach, you'll see 2 different plans that should keep your logical read count as low as possible:
IF @MyOptionalParam is null
BEGIN
SELECT *
FROM dbo.MyTableName t1
END
ELSE
BEGIN
SELECT *
FROM dbo.MyTableName t1
WHERE t1.MyField = @MyOptionalParam
END
You need to fight your programmer's urge to reduce duplication here. Realize you are asking for two fundamentally different execution plans and require two queries to produce two plans.