We have a stored procedure that is used to allow users to search in a table with 20 million records and 40 columns wide. There are about 20 different columns they can search on (any combination) from and all those columns are in the WHERE clause.
Furthermore each columns is checked for Null and needs to be able to search with just part of the data.
Here is an example
(
#FirstName IS NULL
OR (RTRIM(UPPER(FirstName)) LIKE RTRIM(UPPER(#FirstName)) + '%')
)
AND (#LastName IS NULL)
What is a best way to rewrite this stored procedure? Should I break this stored procedure into multiple small stored procedures? If so how? I will need to allow user to search
When I look at the execution plan, regardless of what columns are passed, it always does the index scan
I had exactly this situation years ago, millions of rows and numerous filter parameters and the best method is to use dynamic sql. Construct a SQL statement based on the parameters that have values, then execute the SQL statement. (EXEC sp_executesql #sql)
The select clause of the sql statement is static but the from clause and the where clause is based on the parameters.
CREATE PROCEDURE dbo.DynamicSearch
#FirstName VARCHAR(20),
#LastName VARCHAR(20),
#CompanyName VARCHAR(50)
AS
BEGIN
DECLARE #SQL NVARCHAR(MAX) = N''
DECLARE #Select NVARCHAR(MAX) = N'SELECT ColA, ColB, ColC, ColD '
DECLARE #From NVARCHAR(MAX) = N'From Person'
DECLARE #Where NVARCHAR(MAX) = N''
IF #FirstName IS NOT NULL
Begin
Set #Where = #Where + 'FirstName = ''' + #FirstName + ''''
End
IF #LastName IS NOT NULL
Begin
if len(#Where) > 0
Begin
Set #Where = #Where + ' AND '
End
Set #Where = #Where + 'LastName = ''' + #LastName + ''''
End
IF #CompanyName IS NOT NULL
Begin
if len(#Where) > 0
Begin
Set #Where = #Where + ' AND '
End
Set #From = #From + ' inner join Company on person.companyid = company.companyid '
Set #Where = #Where + 'company.CompanyName = ''' + #CompanyName + ''''
End
Set #SQL = #Select + #From + #Where
EXECUTE sp_executesql #sql
END
To go down the dynamic SQL route you would use something like:
CREATE PROCEDURE dbo.SearchSomeTable
#FirstName VARCHAR(20),
#LastName VARCHAR(20),
#AnotherCol INT
AS
BEGIN
DECLARE #SQL NVARCHAR(MAX) = N'SELECT SomeColumn FROM SomeTable WHERE 1 = 1',
#ParamDefinition NVARCHAR(MAX) = N'#FirstName VARCHAR(20),
#LastName VARCHAR(20),
#AnotherCol INT';
IF #FirstName IS NOT NULL
#SQL = #SQL + ' AND FirstName = #FirstName';
IF #LastName IS NOT NULL
#SQL = #SQL + ' AND LastName = #LastName';
IF #AnotherCol IS NOT NULL
#SQL = #SQL + ' AND AnotherCol = #AnotherCol';
EXECUTE sp_executesql #sql, #ParamDefinition, #FirstName, #LastName, #AnotherCol;
END
Otherwise you will need to use the OPTION (RECOMPILE) query hint to force the query to recompile each time it is run to get the optimal plan for the particular parameters you have passed.
Another possibility, though the above is perhaps the most logical, is to create one or more "search" criteria tables into which you insert the users selections, and then perform LEFT JOINS against the search criteria. You would only be able to do this for cases in which there is an equivalency and preferably an small datatype such as int. For string comparisons, these kind of joins would be potentially dreadful as far as performance goes. The only problem with the above suggestion (dynamic-sql) is the possibility of plan cache bloat as each execution in which there is only one difference with an existing plan would cause a new plan to be generated.
Related
How can I make the following code SQL injection safe? I know that the problem is the following line:
SET #sqlCommand = #sqlCommand + 'Event.Name LIKE ' + '''%' + #name + '%'''
But I don't know how to make it SQL injection safe. I heard something about REPLACE but this doesn't solve the problem as a whole.
CREATE PROCEDURE searchEvents #name VARCHAR(50), #location VARCHAR(20), #postcode CHAR(4), #address VARCHAR(40), #startDate DATETIME, #endDate DATETIME
AS
DECLARE
#sqlCommand NVARCHAR(MAX) = 'SELECT Event.Name, Description, Location.Name AS Location, Postcode, Address, StartDate, EndDate, Website FROM Event JOIN Location ON Event.LocationID = Location.LocationID',
#parameters NVARCHAR(MAX),
#whereIncluded BIT = 0
BEGIN
IF #name IS NOT NULL
BEGIN
IF #whereIncluded = 0
BEGIN
SET #sqlCommand = #sqlCommand + ' WHERE '
SET #whereIncluded = 1
END
ELSE
SET #sqlCommand = #sqlCommand + ' AND '
SET #sqlCommand = #sqlCommand + 'Event.Name LIKE ' + '''%' + #name + '%'''
END
-- It's the same if clause for all parameters like above
SET #parameters = '#p_name VARCHAR(50), #p_location VARCHAR(20), #p_postcode CHAR(4), #p_address VARCHAR(40), #p_startDate DATETIME, #p_endDate DATETIME'
EXEC sp_executesql
#sqlCommand,
#parameters,
#p_name = #name,
#p_location = #location,
#p_postcode = #postcode,
#p_address = #address,
#p_startDate = #startDate,
#p_endDate = #endDate
END
Parametrise your SQL. Statements like SET #sqlCommand = #sqlCommand + 'Event.Name LIKE ' + '''%' + #name + '%''' are awful.
I'm not going to go too indepth here, there are 100's of example on how to make your SQL "safe(r)". HOwever, her's a few pointers...
Parametrisation:
Concatenating strings for variables is a sure way to leave yourself open to injection. Take the simple example below:
DECLARE #SQL nvarchar(MAX);
DECLARE #name varchar(1000);
SET #SQL = N'
SELECT *
FROM MyTable
WHERE [Name] = ''' + #Name + ''';';
EXEC (#SQL);
This is an awful example of Dynamic SQL. If you set the value of #name to ''; DROP TABLE MyTable;--' then SQL statement becomes:
SELECT *
FROM MyTable
WHERE [Name] = ''; DROP TABLE MyTable;-- ';
Oh good! Your table, MyTable has been dropped. The correct way would be:
DECLARE #SQL nvarchar(MAX);
DECLARE #name varchar(1000);
SET #SQL = N'
SELECT *
FROM MyTable
WHERE [Name] = #dName;';
EXEC sp_executesql #SQL, N'#dname varchar(1000)', #dName = #Name;
Dynamic Objects:
This is another common mistake people make. They have a query along the lines of:
DECLARE #SQL nvarchar(MAX);
DECLARE #Table varchar(1000);
SET #SQL = N'SELECT * FROM ' + #Table;
EXEC (#SQL);
This suffers exactly the same problem as above. You can't pass a variable as an object name, so you need to do this a little differently. This is by preferred method:
DECLARE #SQL nvarchar(MAX);
DECLARE #Table sysname; --notice the type here as well.
SELECT #SQL = N'SELECT *' + NCHAR(10) +
N'FROM ' + (SELECT QUOTENAME(t.[name]) --QUOTENAME is very important
FROM sys.tables t
WHERE t.[name] = #Table) + N';';
PRINT #SQL;
EXEC sp_executesql #SQL;
Why am I querying sys.tables? Well, this means if you do pass a nonsense table name in, the value of #SQL will be NULL; meaning that the dynamic SQL is completely harmless.
Like I said, this is just the basics; SO isn't the right place for a full answer here. There are 100's of articles on this subject, and you'll learn far more via your own research.
You can just...
SELECT
...
WHERE
#name IS NULL
OR Event.Name LIKE '%' + #name + '%'
You don't even need dynamic SQL in this case, so you can execute the above SQL directly, instead of through sp_executesql.
Be careful about the performance though - a LIKE operand starting with % may cause a full table (or clustered index) scan.
I have this procedure for custom paging, search and sort options.
ALTER PROCEDURE [dbo].[stp_OrdersPaginated]
#Name NVARCHAR(50)=NULL,
#OrderNumber NVARCHAR(50)=NULL,
#Status NVARCHAR(50)=NULL,
#OrderBy NVARCHAR(100)=NULL,
#PageNumber INT,
#PageSize INT
AS
BEGIN
SET NOCOUNT ON;
CREATE TABLE #ORDERS
(
[OrderId] Bigint
,[Name] Varchar(100)
,[Status] Varchar(50)
,[CreatedDate] Date
,[OrderNumber] Varchar(100)
,[UserId] Bigint
,[Amount] Decimal
,RowNumber Bigint IDENTITY(1,1)
)
DECLARE #intTotal INT
SET #intTotal = #PageSize * #PageNumber
DECLARE #sSQL NVARCHAR(MAX)
DECLARE #Where NVARCHAR(MAX) = ''
DECLARE #Order NVARCHAR(MAX) = ''
SET #sSQL = 'SELECT dbo.[Order].OrderId, [User].Name, dbo.[Order].Status,
dbo.[Order].CreatedDate, [Order].OrderNumber, dbo.[User].UserId,
dbo.Order.[Amount]
FROM dbo.[Order]
INNER JOIN dbo.User
ON dbo.[User].UserId = dbo.[Order].UserId'
SET #Order =' ORDER BY ' +#OrderBy
IF #Name is not null
SET #Where = #Where + ' AND dbo.[User].Name LIKE ''%'+#Name+'%'''
IF #OrderNumber is not null
SET #Where = #Where + ' AND dbo.[Order].OrderNumber LIKE '''+#OrderNumber+'%'''
IF #Status is not null
SET #Where = #Where + ' AND dbo.[Order].[Status] LIKE '''+#Status+'%'''
IF LEN(#Where) > 0
SET #sSQL = #sSQL + ' WHERE ' + RIGHT(#Where, LEN(#Where)-4)
INSERT INTO #ORDERS
EXECUTE (#sSQL + #Order)
Select [OrderId],[Name],[Status],[CreatedDate],[OrderNumber,[UserId]
,[Amount],RowNumber
From #ORDERS
WHERE RowNumber between ((#PageNumber * #PageSize)-(#PageSize- 1)) AND (#PageNumber * #PageSize)
Declare #TotalRecords Integer
Declare #TotalPage Integer
SELECT #TotalRecords=MAX(RowNumber) from #ORDERS
if(#TotalRecords is not NULL)
begin
if(#TotalRecords%#PageSize = 0)
begin
SET #TotalPage = #TotalRecords/#PageSize
end
else
begin
SET #TotalPage = #TotalRecords/#PageSize + 1
end
end
else
begin
set #TotalPage = 1
end
Select #TotalPage [TotalPages], #TotalRecords [TotalRecords]
DROP Table #ORDERS
END
As you can see one of the Search params is Name. The Procedure works perfectly for all except for Single Quote(') for obvious reason. Example: if I pass O' Brien for name it would fail. Is there any way to handle such single quote values with custom queries on SQL Server?
Your problem stems from not constructing your dynamic SQL in a best-practice manner, which along with making it difficult to construct the correct SQL, is also exposing you to SQL injection attacks.
Essentially, you should never use concatenation when adding parameters to your SQL string. I also use char(37) to represent the % sign, as this way it isn't necessary to escape it with apostrophes.
So your SQL becomes something like
IF #Name is not null
SET #Where += 'AND Name LIKE char(37)+#Name+char(37)'
IF #OrderNumber is not null
SET #Where += ' AND OrderNumber LIKE #OrderNumber+char(37)'
IF #Status is not null
SET #Where += ' AND [Status] LIKE #Status+char(37)'
IF LEN(#Where) > 0
SET #sSQL += ' WHERE ' + RIGHT(#Where, LEN(#Where)-4)
Creating the OrderBy is harder as you cannot parameterise that. if you absolutely trust the value passed in, then your code is okay, but the safest way would be to have something like an if statement that
tests the value passed in and creates the appropriate clause. e.g.
IF #OrderBy = 'status'
SET #Ssql += ' ORDER BY Status'
--Next you need to declare the parameters being included in the dynamic SQL. i'm making up the variable types, as you didn't specify what they were.
declare #params nvarchar(1000) = '#name nvarchar(100), #ordernumber nvarchar(100), #status nvarchar(10)'
--Then you can execute your dynamic SQL, passing to it the parameters provided to your procedure
insert into #temp
ExeCUTE sp_executesql #sSQL, #params, #name, #ordernumber, #status
One other benefit of constructing dynamic SQL in this manner, rather than concatenating strings, is that SQL Server can actually cache the query plan like it does for non-dynamic SQL and you don't have the performance hit that you get when you use concatenation.
Try:
IF #Name is not null
BEGIN
SET #Name = REPLACE(#Name, '''', '''''')
SET #Where = #Where + ' AND dbo.[User].Name LIKE ''%'+#Name+'%'''
END
so i want to pass several values to a stored procedure. then bases on those values, add to a variable that would then be set as my where clause. but im stumped and google aint helping. here is what i have/my idea
CREATE PROCEDURE sp_RunReport #TransType varchar(255), #Accounts varchar(75)
AS
--declare a varchar WHERE clause variable here
IF #TransType <> ''
--add to WHERE clause variable
iF #Accounts <>''
--add to WHERE clause variable
SELECT *
FROM log
WHERE --my WHERE clause
GO
i dont see how this is possible. i can do it all in c sharp in the front end, but i feel like it should be done and can be done in the stored procedure. any help would be greatly appreciated
While dynamic SQL in previous answer is perhaps fine, I would suggest this "pure" SQL approach
WHERE TransType = ISNULL(#TransType, TransType)
AND Accounts = ISNULL(#Accounts, Accounts)
There is some room to optimize performance by getting rid of ISULL (not optimal when used in WHERE clause), but this should give you the idea.
Of course, insteead of AND your logic may reuire an OR and also you need to ensure that params are "properly" empty (NULL in my case or whatever will constitute "empty" if you re-write this to remove the ISNULL)
You can use dynamic SQL using EXEC:
CREATE PROCEDURE sp_RunReport #TransType varchar(255), #Accounts varchar(75)
AS
DECLARE #where VARCHAR(4000) = ' WHERE 1=1'
DECLARE #sql NVARCHAR(4000)
IF #TransType <> ''
SET #where = #where + ' AND TransType = ''' + #TransType + ''''
IF #Accounts <>''
SET #where = #where + ' AND Accounts = ''' + #Accounts + ''''
SET #sql = 'SELECT *
FROM log' + #where
exec (#sql)
Or using sp_executesql which is recommended:
CREATE PROCEDURE sp_RunReport #TransType varchar(255), #Accounts varchar(75)
AS
DECLARE #where NVARCHAR(4000) = N' WHERE 1=1'
DECLARE #sql NVARCHAR(4000)
DECLARE #ParmDefinition nvarchar(500) = N'#TransType varchar(255), #Accounts varchar(75)'
IF #TransType <> ''
SET #where = #where + N' AND TransType = #TransType'
IF #Accounts <>''
SET #where = #where + N' AND Accounts = #Accounts'
SET #sql = N'SELECT *
FROM log' + #where
EXECUTE sp_executesql #sql, #ParmDefinition,
#TransType = #TransType, #Accounts = #Accounts
I have a stored procedure that looks like this:
create stored procedure aaa
#columnName nvarchar(10),
#comparisonParam nvarchar(10),
#val nvarchar(100)
as
declare #date date
set #date = convert(#val, date)
exec('select * from Sheep where ' + #columnName + #comparisonParam + #date )
When actually the query is supposed to be like this:
select * from Sheep where birth_date = 12-12-2000
When I run the procedure it doesn't work with date value, but with string and int it works.
The date value must be quoted.
On a side note, I'd warn against doing this. If you need to build up dynamic sql you need to consider the risks such as: sql injection attacks, bad syntax, invalid semantics etc.
Consider using an existing component to build the query. A few examples:
.NET LINQ (to SQL/Entities) http://msdn.microsoft.com/en-us/library/bb397926.aspx
.NET SqlCommandBuilder http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommandbuilder.aspx
See Best way of constructing dynamic sql queries in C#/.NET3.5?
Your date literal needs to be surrounded in single quotes (I use CHAR(39) usually since it is easier to read and doesn't require escaping). Otherwise you are saying:
WHERE birth_date = (12) - (12) - (2000)
Which resolves to:
WHERE birth_date = -2000
Which resolves to DATEADD(DAY, -2000, '1900-01-01') or:
WHERE birth_date = '1894-07-11'
This is probably not going to yield the results you want.
With typical SQL injection warnings in place of course, and assuming that #columnName is always a string or date/time column, here is how I would re-write your stored procedure (though I would probably try to avoid the dynamic SQL altogether if I could).
ALTER PROCEDURE dbo.aaa
#columnName NVARCHAR(10),
#comparisonParam NVARCHAR(10),
#val NVARCHAR(100)
AS
BEGIN
SET NOCOUNT ON;
DECLARE #sql NVARCHAR(MAX);
SET #sql = N'SELECT * FROM dbo.Sheep WHERE '
+ QUOTENAME(#columnName) + #comparisonParam + CHAR(39)
+ REPLACE(#val, CHAR(39), CHAR(39) + CHAR(39))
+ CHAR(39);
EXEC sp_executesql #sql;
END
GO
In order to thwart potential issues you may want to add validation for columns and data types, and ensure that the operation is one you expect. e.g.
CREATE PROCEDURE dbo.bbb
#columnName NVARCHAR(10),
#comparisonParam NVARCHAR(10),
#val NVARCHAR(100)
AS
BEGIN
SET NOCOUNT ON;
DECLARE #delimiter CHAR(1);
SELECT #delimiter = CASE
WHEN [system_type_id] IN
(104,48,52,56,127,59,60,62,106,108,122) THEN '' -- numeric
WHEN [system_type_id] IN
(35,40,41,42,43,58,61,99,167,175,231,239) THEN CHAR(39) -- string
END FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.Sheep')
AND name = #columnName;
IF #delimiter IS NULL
BEGIN
RAISERROR('Column ''%s'' was not found or an unexpected data type.', 11, 1,
#columnName);
RETURN;
END
IF #comparisonParam NOT IN (N'=', N'>=', N'<=', N'<', N'>', N'LIKE')
BEGIN
RAISERROR('Comparison param ''%s'' was not valid.', 11, 1, #comparisonParam);
RETURN;
END
DECLARE #sql NVARCHAR(MAX);
SET #sql = N'SELECT * FROM dbo.Sheep WHERE '
+ QUOTENAME(#columnName) + ' ' + #comparisonParam + ' '
+ #delimiter + REPLACE(#val, CHAR(39), CHAR(39) + CHAR(39))
+ #delimiter;
EXEC sp_executesql #sql;
END
GO
Now make sure you use an unambiguous date format for your string literals. 12-12-2000 is not a good choice. 20001212 is much better.
There are possibly some ways to do this without dynamic SQL - I gave a very simplified answer here. This may be feasible depending on the data types, the number of potential columns, and the number of operations you want to support.
create stored procedure aaa
#columnName nvarchar(10),
#comparisonParam nvarchar(10),
#val nvarchar(100)
as
declare #date date
set #date = convert(#val, date)
exec('select * from Sheep where ' + #columnName + #comparisonParam + #date )
Build your dynamic SQL using a typed date parameter. Use sp_executesql which allows to pass parameter definitions and parameter values to the embedded SQL:
create procedure aaa
#columnName nvarchar(10),
#comparisonParam nvarchar(10),
#val nvarchar(100)
as
declare #date date, #sql nvarchar(max);
set #date = convert(#val, date);
-- Note how #date is a *variable* in the generated SQL:
set #sql =N'select * from Sheep where ' +
quotename(#columnName) + #comparisonParam + N'#date';
-- Use sp_executesql and define the type and value of the variable
exec sp_executesql #sql, N'#date date', #date;
You need to create table valued function for this rather than creating a stored procedure.
You can use any table valued function like
SELECT * from dbo.CallMyFunction(parameter1, parameter2
eg.
CREATE FUNCTION Sales.ufn_SalesByStore (#storeid int)
RETURNS TABLE
AS
RETURN
(
SELECT P.ProductID, P.Name, SUM(SD.LineTotal) AS 'Total'
FROM Production.Product AS P
JOIN Sales.SalesOrderDetail AS SD ON SD.ProductID = P.ProductID
JOIN Sales.SalesOrderHeader AS SH ON SH.SalesOrderID = SD.SalesOrderID
JOIN Sales.Customer AS C ON SH.CustomerID = C.CustomerID
WHERE C.StoreID = #storeid
GROUP BY P.ProductID, P.Name
);
GO
See this for reference http://msdn.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
EDIT
Instead of using dynamic sql try giving a thought on
SELECT * FROM
FROM [dbo].[Person]
WHERE ([PersonID] = #PersonID
OR #AreaID IS NULL
)
AND (([Code] BETWEEN #Code AND CHAR(255))
OR #Code IS NULL
)
AND (([Name] BETWEEN #Name AND CHAR(255))
OR #Name IS NULL
)
AND (([Notes] BETWEEN #Notes AND CHAR(255))
OR #Notes IS NULL
)
I will preface this question by saying, I do not think it is solvable. I also have a workaround, I can create a stored procedure with an OUTPUT to accomplish this, it is just easier to code the sections where I need this checksum using a function.
This code will not work because of the Exec SP_ExecuteSQL #SQL calls. Anyone know how to execute dynamic SQL in a function? (and once again, I do not think it is possible. If it is though, I'd love to know how to get around it!)
Create Function Get_Checksum
(
#DatabaseName varchar(100),
#TableName varchar(100)
)
RETURNS FLOAT
AS
BEGIN
Declare #SQL nvarchar(4000)
Declare #ColumnName varchar(100)
Declare #i int
Declare #Checksum float
Declare #intColumns table (idRecord int identity(1,1), ColumnName varchar(255))
Declare #CS table (MyCheckSum bigint)
Set #SQL =
'Insert Into #IntColumns(ColumnName)' + Char(13) +
'Select Column_Name' + Char(13) +
'From ' + #DatabaseName + '.Information_Schema.Columns (NOLOCK)' + Char(13) +
'Where Table_Name = ''' + #TableName + '''' + Char(13) +
' and Data_Type = ''int'''
-- print #SQL
exec sp_executeSql #SQL
Set #SQL =
'Insert Into #CS(MyChecksum)' + Char(13) +
'Select '
Set #i = 1
While Exists(
Select 1
From #IntColumns
Where IdRecord = #i)
begin
Select #ColumnName = ColumnName
From #IntColumns
Where IdRecord = #i
Set #SQL = #SQL + Char(13) +
CASE WHEN #i = 1 THEN
' Sum(Cast(IsNull(' + #ColumnName + ',0) as bigint))'
ELSE
' + Sum(Cast(IsNull(' + #ColumnName + ',0) as bigint))'
END
Set #i = #i + 1
end
Set #SQL = #SQL + Char(13) +
'From ' + #DatabaseName + '..' + #TableName + ' (NOLOCK)'
-- print #SQL
exec sp_executeSql #SQL
Set #Checksum = (Select Top 1 MyChecksum From #CS)
Return isnull(#Checksum,0)
END
GO
It "ordinarily" can't be done as SQL Server treats functions as deterministic, which means that for a given set of inputs, it should always return the same outputs. A stored procedure or dynamic sql can be non-deterministic because it can change external state, such as a table, which is relied on.
Given that in SQL server functions are always deterministic, it would be a bad idea from a future maintenance perspective to attempt to circumvent this as it could cause fairly major confusion for anyone who has to support the code in future.
Here is the solution
Solution 1:
Return the dynamic string from Function then
Declare #SQLStr varchar(max)
DECLARE #tmptable table (<columns>)
set #SQLStr=dbo.function(<parameters>)
insert into #tmptable
Exec (#SQLStr)
select * from #tmptable
Solution 2:
call nested functions by passing parameters.
You can get around this by calling an extended stored procedure, with all the attendant hassle and security problems.
http://decipherinfosys.wordpress.com/2008/07/16/udf-limitations-in-sql-server/
http://decipherinfosys.wordpress.com/2007/02/27/using-getdate-in-a-udf/
Because functions have to play nicely with the query optimiser there are quite a few restrictions on them. This link refers to an article that discusses the limitations of UDF's in depth.
Thank you all for the replies.
Ron: FYI, Using that will throw an error.
I agree that not doing what I originally intended is the best solution, I decided to go a different route. My two choices were to use sum(cast(BINARY_CHECKSUM(*) as float)) or an output parameter in a stored procedure. After unit testing speed of each, I decided to go with sum(cast(BINARY_CHECKSUM(*) as float)) to get a comparable checksum value for each table's data.