Are there any existing, elegant, patterns for an optional TOP clause? - sql

Take the (simplified) stored procedure defined here:
create procedure get_some_stuffs
#max_records int = null
as
begin
set NOCOUNT on
select top (#max_records) *
from my_table
order by mothers_maiden_name
end
I want to restrict the number of records selected only if #max_records is provided.
Problems:
The real query is nasty and large; I want to avoid having it duplicated similar to this:
if(#max_records is null)
begin
select *
from {massive query}
end
else
begin
select top (#max_records)
from {massive query}
end
An arbitrary sentinel value doesn't feel right:
select top (ISNULL(#max_records, 2147483647)) *
from {massive query}
For example, if #max_records is null and {massive query} returns less than 2147483647 rows, would this be identical to:
select *
from {massive query}
or is there some kind of penalty for selecting top (2147483647) * from a table with only 50 rows?
Are there any other existing patterns that allow for an optionally count-restricted result set without duplicating queries or using sentinel values?

I was thinking about this, and although I like the explicitness of the IF statement in your Problem 1 statement, I understand the issue of duplication. As such, you could put the main query in a single CTE, and use some trickery to query from it (the bolded parts being the highlight of this solution):
CREATE PROC get_some_stuffs
(
#max_records int = NULL
)
AS
BEGIN
SET NOCOUNT ON;
WITH staged AS (
-- Only write the main query one time
SELECT * FROM {massive query}
)
-- This part below the main query never changes:
SELECT *
FROM (
-- A little switcheroo based on the value of #max_records
SELECT * FROM staged WHERE #max_records IS NULL
UNION ALL
SELECT TOP(ISNULL(#max_records, 0)) * FROM staged WHERE #max_records IS NOT NULL
) final
-- Can't use ORDER BY in combination with a UNION, so move it out here
ORDER BY mothers_maiden_name
END
I looked at the actual query plans for each and the optimizer is smart enough to completely avoid the part of the UNION ALL that doesn't need to run.
The ISNULL(#max_records, 0) is in there because TOP NULL isn't valid, and it will not compile.

You could use SET ROWCOUNT:
create procedure get_some_stuffs
#max_records int = null
as
begin
set NOCOUNT on
IF #max_records IS NOT NULL
BEGIN
SET ROWCOUNT #max_records
END
select top (#max_records) *
from my_table
order by mothers_maiden_name
SET ROWCOUNT 0
end

There are a few methods, but as you probably notice these all look ugly or are unnecessarily complicated. Furthermore, do you really need that ORDER BY?
You could use TOP (100) PERCENT and a View, but the PERCENT only works if you do not really need that expensive ORDER BY, since SQL Server will ignore your ORDER BY if you try it.
I suggest taking advantage of stored procedures, but first lets explain the difference in the type of procs:
Hard Coded Parameter Sniffing
--Note the lack of a real parametrized column. See notes below.
IF OBJECT_ID('[dbo].[USP_TopQuery]', 'U') IS NULL
EXECUTE('CREATE PROC dbo.USP_TopQuery AS ')
GO
ALTER PROC [dbo].[USP_TopQuery] #MaxRows NVARCHAR(50)
AS
BEGIN
DECLARE #SQL NVARCHAR(4000) = N'SELECT * FROM dbo.ThisFile'
, #Option NVARCHAR(50) = 'TOP (' + #MaxRows + ') *'
IF ISNUMERIC(#MaxRows) = 0
EXEC sp_executesql #SQL
ELSE
BEGIN
SET #SQL = REPLACE(#SQL, '*', #Option)
EXEC sp_executesql #SQL
END
END
Local Variable Parameter Sniffing
IF OBJECT_ID('[dbo].[USP_TopQuery2]', 'U') IS NULL
EXECUTE('CREATE PROC dbo.USP_TopQuery2 AS ')
GO
ALTER PROC [dbo].[USP_TopQuery2] #MaxRows INT NULL
AS
BEGIN
DECLARE #Rows INT;
SET #Rows = #MaxRows;
IF #MaxRows IS NULL
SELECT *
FROM dbo.THisFile
ELSE
SELECT TOP (#Rows) *
FROM dbo.THisFile
END
No Parameter Sniffing, old method
IF OBJECT_ID('[dbo].[USP_TopQuery3]', 'U') IS NULL
EXECUTE('CREATE PROC dbo.USP_TopQuery3 AS ')
GO
ALTER PROC [dbo].[USP_TopQuery3] #MaxRows INT NULL
AS
BEGIN
IF #MaxRows IS NULL
SELECT *
FROM dbo.THisFile
ELSE
SELECT TOP (#MaxRows) *
FROM dbo.THisFile
END
PLEASE NOTE ABOUT PARAMETER SNIFFING:
SQL Server initializes variables in Stored Procs at the time of compile, not when it parses.
This means that SQL Server will be unable to guess the query and will
choose the last valid execution plan for the query, regardless of
whether it is even good.
There are two methods, hard coding an local variables that allow the Optimizer to guess.
Hard Coding for Parameter Sniffing
Use sp_executesql to not only reuse the query, but prevent SQL Injection.
However, in this type of query, will not always perform substantially better since a TOP Operator is not a column or table (so the statement effectively has no variables in this version I used)
Statistics at the time of the creation of your compiled plan will dictate how affective the method is if you are not using a variable on a predicate (ON, WHERE, HAVING)
Can use options or hint to RECOMPILE to overcome this issue.
Variable Parameter Sniffing
Variable Paramter sniffing, on the other hand, is flexible enough to work witht the statistics here, and in my own testing it seemed the variable parameter had the advantage of the query using statistics (particularly after I updated the statistics).
Ultimately, the issue of performance is about which method will use the least amount of steps to traverse through the leaflets. Statistics, the rows in your table, and the rules for when SQL Server will decide to use a Scan vs Seek impact the performance.
Running different values will show performances change significantly, though typically better than USP_TopQuery3. So DO NOT ASSUME one method is necessarily better than the other.
Also note you can use a table-valued function to do the same, but as Dave Pinal would say:
If you are going to answer that ‘To avoid repeating code, you use
Function’ ‑ please think harder! Stored procedure can do the same...
if you are going to answer
with ‘Function can be used in SELECT, whereas Stored Procedure cannot
be used’ ‑ again think harder!
SQL SERVER – Question to You – When to use Function and When to use Stored Procedure

You could do it like this (using your example):
create procedure get_some_stuffs
#max_records int = null
as
begin
set NOCOUNT on
select top (ISNULL(#max_records,1000)) *
from my_table
order by mothers_maiden_name
end
I know you don't like this (according to your point 2), but that's pretty much how it's done (in my experience).

How about something like this (you're have to really look at execution plans and I didn't have time to set anything up)?
create procedure get_some_stuffs
#max_records int = null
as
begin
set NOCOUNT on
select *, ROW_NUMBER(OVER order by mothers_maiden_name) AS row_num
from {massive query}
WHERE #max_records IS NULL OR row_num < #max_records
end
Another thing you can do with {massive query} is make a view or inline table-valued function (it it's parametrized), which is generally a pretty good practice for anything big and repetitively used.

Related

Correct usage of WHERE and AND in procedure with optional parameters

Pre-question info:
I'm writing a stored-procedure that would take some parameters and depending on those parameters(if they are filled - because they don't have to be) I'm adding few where clauses. The thing is I don't know if I'm gonna even use the where clause from start because I don't know if any of my params is going to be non-empty/not-null.
The inside of procedure looks cca like:
BEGIN
DECLARE #strMySelect varchar(max)
SET #strMySelect ='SELECT myparams FROM mytable'
// add some WHERE statement(*)
IF(ISNULL(#myParamDate1,'')<>'')BEGIN
SET #strMySelect =#strMySelect +'
AND param1 >='''+CAST(#myParamDate1 as varchar(30))+''''
END
IF(ISNULL(#myParamDate2,'')<>'')BEGIN
SET #strMySelect =#strMySelect +'
AND param1 <='''+CAST(#myParamDate2 as varchar(30))+''''
END
//... bit more of these "AND"s
EXECUTE(#strExec)
QUESTION:
Is it ok(correct way of doing this) to put in my query some WHERE statement that I know that will be always true so I can use in my parameter cases AND always? OR do I have to check for each param if it's first one that is filled or is there an easy way of checking in SQL that at least one of my parameters isn't NULL/empty?
I handle optional parameters like this:
where (
(#optionalParameter is not null and someField = #optionalParameter )
or
#optionalParameter is null
)
etc
I find it simpler.
Your extra where clause is not a problem from a performance point-of-view, since the query optimizer will (likely) remove the 1 = 1 condition anyway.
However, I would recommend a solution along the lines of what Dan Bracuk suggested for two reasons:
It is easier to read, write and debug.
You avoid the possibility of SQL injection attacks.
There are cases where you have to custom-build your query-string (e.g. when given a table name as parameter), but I would avoid it whenever possible.
You don't need to use EXEC function to check for parameters. A good practice is using case to check for parameter value for example
CREATE PROCEDURE MyProc
#Param1 int = 0
AS
BEGIN
SELECT * FROM MyTable WHERE CASE #param1 WHEN 0 THEN #param1 ELSE MyField END = #Param1
END
GO
In case that #param1 has no value (default 0) then you have #param1=#param1 which gives always true, in case you have #param with value then condition is MyField=#param1.

Stored procedure date parameter filters - Ignore if Null

I am using the following SQL in my stored procedure to not filter by date parameters if they are null.
WHERE (Allocated >= ISNULL(#allocatedStartDate, '01/01/1900')
AND Allocated <= ISNULL(#allocatedEndDate,'01/01/3000'))
AND
(MatterOpened >= ISNULL(#matterOpenedStartDate, '01/01/1900')
AND MatterOpened <= ISNULL(#matterOpenedEndDate, '01/01/3000'))
Will this give any kind of performance hit when dealing with a lot of records?
Is there a better way to do this?
Number of records - around 500k
Or just let the query optimizer have it:
WHERE ( #allocatedStartDate is NULL or Allocated >= allocatedStartDate ) and
( #allocatedEndDate is NULL or Allocated <= #allocatedEndDate ) and
( #matterOpenedStartDate is NULL or MatterOpened >= #matterOpenedStartDate ) and
( #matterOpenedEndDate is NULL or MatterOpened <= #matterOpenedEndDate )
Note that this is not logically equivalent to your query. The last line uses column MatterOpened, not Allocated, as I assume that was a typographic error.
If performance is really an issue, you may want to consider adding indexes and changing the stored procedure to execute different queries based on the parameters. At least break it into: no filter, filter only on Allocated, filter only on MatterOpened, filter on both columns.
In a lot of cases, dynamic SQL can be better for you instead of trying to rely on the optimizer to cache a good plan for both NULL and non-NULL parameters.
DECLARE #sql NVARCHAR(MAX);
SET #sql = N'SELECT
...
WHERE 1 = 1';
SET #sql = #sql + CASE WHEN #allocatedStartDate IS NOT NULL THEN
' AND Allocated >= ''' + CONVERT(CHAR(8), #allocatedStartDate, 112) + '''';
-- repeat for other clauses
EXEC sp_executesql #sql;
No, it's not fun to maintain, but each variation should get its own plan in the cache. You'll want to test with different settings for "Optimize for ad hoc workloads" and database-level paramaterization settings. Oops, just noticed 2005. Keep those in mind for the future (and any readers who aren't still stuck on 2005).
Also make sure to use EXEC sp_executesql and not EXEC.
Instead of checking to see if the variable is null in your query, check them at the beginning of your stored procedure and change the value to your default
SELECT
#allocatedStartDate = ISNULL(#allocatedStartDate, '01/01/1900'),
#allocatedEndDate = ISNULL(#allocatedEndDate,'01/01/3000'),
#matterOpenedStartDate = ISNULL(#matterOpenedStartDate, '01/01/1900'),
#matterOpenedEndDate = ISNULL(#matterOpenedEndDate, '01/01/3000')
Maybe something like this:
DECLARE #allocatedStartDate DATETIME=GETDATE()
DECLARE #allocatedEndDate DATETIME=GETDATE()-2
;WITH CTE AS
(
SELECT
ISNULL(#allocatedStartDate, '01/01/1900') AS allocatedStartDate,
ISNULL(#allocatedEndDate,'01/01/3000') AS allocatedEndDate
)
SELECT
*
FROM
YourTable
CROSS JOIN CTE
WHERE (Allocated >= CTE.allocatedStartDate
AND Allocated <= CTE.allocatedEndDate)
AND
(MatterOpened >= CTE.allocatedStartDate
AND Allocated <= CTE.allocatedEndDate)

Print Dynamic Parameter Values

I've used dynamic SQL for many tasks and continuously run into the same problem: Printing values of variables used inside the Dynamic T-SQL statement.
EG:
Declare #SQL nvarchar(max), #Params nvarchar(max), #DebugMode bit, #Foobar int
select #DebugMode=1,#Foobar=364556423
set #SQL='Select #Foobar'
set #Params=N'#Foobar int'
if #DebugMode=1 print #SQL
exec sp_executeSQL #SQL,#Params
,#Foobar=#Foobar
The print results of the above code are simply "Select #Foobar". Is there any way to dynamically print the values & variable names of the sql being executed? Or when doing the print, replace parameters with their actual values so the SQL is re-runnable?
I have played with creating a function or two to accomplish something similar, but with data type conversions, pattern matching truncation issues, and non-dynamic solutions. I'm curious how other developers solve this issue without manually printing each and every variable manually.
I dont believe the evaluated statement is available, meaning your example query 'Select #FooBar' is never persisted anywhere as 'Select 364556243'
Even in a profiler trace you would see the statement hit the cache as '(#Foobar int)select #foobar'
This makes sense, since a big benefit of using sp_executesql is that it is able to cache the statement in a reliable form without variables evaluated, otherwise if it replaced the variables and executed that statement we would just see the execution plan bloat.
updated: Here's a step in right direction:
All of this could be cleaned up and wrapped in a nice function, with inputs (#Statement, #ParamDef, #ParamVal) and would return the "prepared" statement. I'll leave some of that as an exercise for you, but please post back when you improve it!
Uses split function from here link
set nocount on;
declare #Statement varchar(100), -- the raw sql statement
#ParamDef varchar(100), -- the raw param definition
#ParamVal xml -- the ParamName -to- ParamValue mapping as xml
-- the internal params:
declare #YakId int,
#Date datetime
select #YakId = 99,
#Date = getdate();
select #Statement = 'Select * from dbo.Yak where YakId = #YakId and CreatedOn > #Date;',
#ParamDef = '#YakId int, #Date datetime';
-- you need to construct this xml manually... maybe use a table var to clean this up
set #ParamVal = ( select *
from ( select '#YakId', cast(#YakId as varchar(max)) union all
select '#Date', cast(#Date as varchar(max))
) d (Name, Val)
for xml path('Parameter'), root('root')
)
-- do the work
declare #pStage table (pName varchar(100), pType varchar(25), pVal varchar(100));
;with
c_p (p)
as ( select replace(ltrim(rtrim(s)), ' ', '.')
from dbo.Split(',', #ParamDef)d
),
c_s (pName, pType)
as ( select parsename(p, 2), parsename(p, 1)
from c_p
),
c_v (pName, pVal)
as ( select p.n.value('Name[1]', 'varchar(100)'),
p.n.value('Val[1]', 'varchar(100)')
from #ParamVal.nodes('root/Parameter')p(n)
)
insert into #pStage
select s.pName, s.pType, case when s.pType = 'datetime' then quotename(v.pVal, '''') else v.pVal end -- expand this case to deal with other types
from c_s s
join c_v v on
s.pName = v.pName
-- replace pName with pValue in statement
select #Statement = replace(#Statement, pName, isnull(pVal, 'null'))
from #pStage
where charindex(pName, #Statement) > 0;
print #Statement;
On the topic of how most people do it, I will only speak to what I do:
Create a test script that will run the procedure using a wide range of valid and invalid input. If the parameter is an integer, I will send it '4' (instead of 4), but I'll only try 1 oddball string value like 'agd'.
Run the values against a data set of representative size and data value distribution for what I'm doing. Use your favorite data generation tool (there are several good ones on the market) to speed this up.
I'm generally debugging like this on a more ad hoc basis, so collecting the results from the SSMS results window is as far as I need to take it.
The best way I can think of is to capture the query as it comes across the wire using a SQL Trace. If you place something unique in your query string (as a comment), it is very easy to apply a filter for it in the trace so that you don't capture more than you need.
However, it isn't all peaches & cream.
This is only suitable for a Dev environment, maybe QA, depending on how rigid your shop is.
If the query takes a long time to run, you can mitigate that by adding "TOP 1", "WHERE 1=2", or a similar limiting clause to the query string if #DebugMode = 1. Otherwise, you could end up waiting a while for it to finish each time.
For long queries where you can't add something the query string only for debug mode, you could capture the command text in a StmtStarted event, then cancel the query as soon as you have the command.
If the query is an INSERT/UPDATE/DELETE, you will need to force a rollback if #DebugMode = 1 and you don't want the change to occur. In the event you're not currently using an explicit transaction, doing that would be extra overhead.
Should you go this route, there is some automation you can achieve to make life easier. You can create a template for the trace creation and start/stop actions. You can log the results to a file or table and process the command text from there programatically.

SQL Search using case or if

Everyone has been a super help so far. My next question is what is the best way for me to approach this... If I have 7 fields that a user can search what is the best way to conduct this search, They can have any combination of the 7 fields so that is 7! or 5040 Combinations which is impossible to code that many. So how do I account for when the User selects field 1 and field 3 or they select field 1, field 2, and field 7? Is there any easy to do this with SQL? I dont know if I should approach this using an IF statement or go towards a CASE in the select statement. Or should I go a complete different direction? Well if anyone has any helpful pointers I would greatly appreciate it.
Thank You
You'll probably want to look into using dynamic SQL for this. See: Dynamic Search Conditions in T-SQL and Catch-all queries for good articles on this topic.
Select f1,f2 from table where f1 like '%val%' or f2 like '%val%'
You could write a stored procedure that accepts each parameter as null and then write your WHERE clause like:
WHERE (field1 = #param1 or #param1 is null)
AND (field2 = #param2 or #param2 is null) etc...
But I wouldn't recommend it. It can definitely affect performance doing it this way depending on the number of parameters you have. I second Joe Stefanelli's answer with looking into dynamic SQL in this case.
Depends on:
how your data looks like,
how big they are,
how exact result is expected (all matching records or top 100 is enough),
how much resources has you database.
you can try something like:
CREATE PROC dbo.Search(
#param1 INT = NULL,
#param2 VARCHAR(3) = NULL
)
AS
BEGIN
SET NOCOUNT ON
-- create temporary table to keep keys (primary) of matching records from searched table
CREATE TABLE #results (k INT)
INSERT INTO
#results(k)
SELECT -- you can use TOP here to norrow results
key
FROM
table
-- you can use WHERE if there are some default conditions
PRINT ##ROWCOUNT
-- if #param1 is set filter #result
IF #param1 IS NOT NULL BEGIN
PRINT '#param1'
;WITH d AS (
SELECT
key
FROM
table
WHERE
param1 <> #param1
)
DELETE FROM
#results
WHERE
k = key
PRINT ##ROWCOUNT
END
-- if #param2 is set filter #result
IF #param2 IS NOT NULL BEGIN
PRINT '#param2'
;WITH d AS (
SELECT
key
FROM
table
WHERE
param2 <> #param2
)
DELETE FROM
#results
WHERE
k = key
PRINT ##ROWCOUNT
END
-- returns what left in #results table
SELECT
table.* -- or better only columns you need
FROM
#results r
JOIN
table
ON
table.key = r.k
END
I use this technique on large database (millions of records, but running on large server) to filter data from some predefined data. And it works pretty well.
However I don't need all matching records -- depends on query 10-3000 matching records is enough.
If you are using a stored procedure you can use this method:
CREATE PROCEDURE dbo.foo
#param1 VARCHAR(32) = NULL,
#param2 INT = NULL
AS
BEGIN
SET NOCOUNT ON
SELECT * FROM MyTable as t
WHERE (#param1 IS NULL OR t.Column1 = #param1)
AND (#param2 IS NULL OR t.COlumn2 = #param2)
END
GO
These are usually called optional parameters. The idea is that if you don't pass one in it gets the default value (null) and that section of your where clause always returns true.

Stored Procedure; Insert Slowness

I have an SP that takes 10 seconds to run about 10 times (about a second every time it is ran). The platform is asp .net, and the server is SQL Server 2005. I have indexed the table (not on the PK also), and that is not the issue. Some caveats:
usp_SaveKeyword is not the issue. I commented out that entire SP and it made not difference.
I set #SearchID to 1 and the time was significantly reduced, only taking about 15ms on average for the transaction.
I commented out the entire stored procedure except the insert into tblSearches and strangely it took more time to execute.
Any ideas of what could be going on?
set ANSI_NULLS ON
go
ALTER PROCEDURE [dbo].[usp_NewSearch]
#Keyword VARCHAR(50),
#SessionID UNIQUEIDENTIFIER,
#time SMALLDATETIME = NULL,
#CityID INT = NULL
AS
BEGIN
SET NOCOUNT ON;
IF #time IS NULL SET #time = GETDATE();
DECLARE #KeywordID INT;
EXEC #KeywordID = usp_SaveKeyword #Keyword;
PRINT 'KeywordID : '
PRINT #KeywordID
DECLARE #SearchID BIGINT;
SELECT TOP 1 #SearchID = SearchID
FROM tblSearches
WHERE SessionID = #SessionID
AND KeywordID = #KeywordID;
IF #SearchID IS NULL BEGIN
INSERT INTO tblSearches
(KeywordID, [time], SessionID, CityID)
VALUES
(#KeywordID, #time, #SessionID, #CityID)
SELECT Scope_Identity();
END
ELSE BEGIN
SELECT #SearchID
END
END
Why are you using top 1 #SearchID instead of max (SearchID) or where exists in this query? top requires you to run the query and retrieve the first row from the result set. If the result set is large this could consume quite a lot of resources before you get out the final result set.
SELECT TOP 1 #SearchID = SearchID
FROM tblSearches
WHERE SessionID = #SessionID
AND KeywordID = #KeywordID;
I don't see any obvious reason for this - either of aforementioned constructs should get you something semantically equivalent to this with a very cheap index lookup. Unless I'm missing something you should be able to do something like
select #SearchID = isnull (max (SearchID), -1)
from tblSearches
where SessionID = #SessionID
and KeywordID = #KeywordID
This ought to be fairly efficient and (unless I'm missing something) semantically equivalent.
Enable "Display Estimated Execution Plan" in SQL Management Studio - where does the execution plan show you spending the time? It'll guide you on the heuristics being used to optimize the query (or not in this case). Generally the "fatter" lines are the ones to focus on - they're ones generating large amounts of I/O.
Unfortunately even if you tell us the table schema, only you will be able to see actually how SQL chose to optimize the query. One last thing - have you got a clustered index on tblSearches?
Triggers!
They are insidious indeed.
What is the clustered index on tblSearches? If the clustered index is not on primary key, the database may be spending a lot of time reordering.
How many other indexes do you have?
Do you have any triggers?
Where does the execution plan indicate the time is being spent?