Optional where clause / parameter in a SQL 2008 stored proc?

I'm writing some code that updates a table. Depending on what the user wants to do, it either updates a large set of records, or a smaller one. The delineating factor is a group ID.
The user can choose whether to update the table for all records, or just those with that groupID. I'd like to use the same stored procedure for both instances, with maybe a little logic in there to differentiate between the scenarios. (I'd prefer not to write two stored procs with 90% identical code.)
I'm no expert at stored procedures and am not sure if I can pass in optional parameters, or how to dynamically generate part of a where clause, depending on whether the groupID is there or not. Any suggestions are welcome.
Thanks!

You can use ISNULL, or an "OR" construct:
... WHERE GroupID = ISNULL(@GroupID, GroupID)
... WHERE GroupID = @GroupID OR @GroupID IS NULL

create procedure MyProc (@GroupID int = null)
as
begin
    update MyTable set ....
    where @GroupID is null or GroupID = @GroupID
end
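Calling the procedure then covers both cases; the EXEC lines below are just illustrative:
-- update all records
EXEC MyProc
-- update only the records for group 42 (an example value)
EXEC MyProc @GroupID = 42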

Related

Calling stored procedure to insert multiple values

In our application we have multiline grids containing many records. For inserting or updating, we call a stored procedure.
In the current implementation the stored procedure is called for each line in the grid. For each line it checks whether the row already exists in the table: if the data is already there, it updates the row; otherwise it inserts new data into the table.
Instead of calling the procedure for each line, we thought we could create a table-valued parameter and pass all the grid values at once.
My questions are:
Is it a good approach?
How to handle the existence check (for insert or update) if I pass the values as table-valued parameter? Do I need to loop through the table and check it?
Is it better to have separate stored procedures for insert and update?
Please provide your suggestions. Thanks in advance.
1) A TVP is a good approach, and a single stored proc call is more efficient, with fewer round trips to the database.
2) You haven't made it clear whether each row in the grid has some kind of ID column that determines if the data already exists in the table; assuming there is one, make sure it is indexed, then use INSERT and UPDATE statements like this:
To add new rows:
INSERT INTO [grid_table]
SELECT * FROM [table_valued_parameter]
WHERE [id_column] NOT IN (SELECT [id_column] FROM [grid_table])
To update existing rows:
UPDATE gt
SET gt.col_A = tvp.col_A,
gt.col_B = tvp.col_B,
gt.col_C = tvp.col_C,
...
gt.col_Z = tvp.col_Z
FROM [grid_table] gt
INNER JOIN [table_valued_parameter] tvp ON gt.id_column = tvp.id_column
NB:
No need to do an IF EXISTS() or anything, as the WHERE and JOIN
clauses will run the same checks, so there is no need to do a 'pre-check'
before running each statement.
This assumes the TVP data is the same structure as the table in the
database.
YOU MUST make sure the id_column is indexed.
I've used 'INNER JOIN' instead of just 'JOIN' to make the point that it is an inner join
3) Using the approach above you need just one stored proc: simple and effective
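For reference, here is a hedged sketch of the TVP plumbing this answer assumes; the type and procedure names (dbo.GridRow, dbo.UpsertGrid) and the columns are invented for illustration:
CREATE TYPE dbo.GridRow AS TABLE
(
    id_column int PRIMARY KEY, -- the primary key provides the index the NB insists on
    col_A int,
    col_B varchar(32)
);
GO
CREATE PROCEDURE dbo.UpsertGrid
    @tvp dbo.GridRow READONLY -- table-valued parameters must be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- the INSERT and UPDATE statements above go here, reading FROM @tvp
END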
It's a good approach
Anyway, try to keep the iterating and existence-checking logic at the object (application) level, and do only the final insert/update in T-SQL. This reduces overhead for the RDBMS, since that kind of row-by-row functionality is faster at the object level than inside the RDBMS.
Don't create too many stored procedures, one for each type of operation; keep a minimal set of procedures that perform multiple operations based on the parameters you send.
Hope it helps!
Yes, it is a good approach. Calling the procedure for each row is bad for performance. TVPs make life easier.
Yes, you can do that check in the stored procedure; it should be a simple SELECT on the unique ID in most cases.
With this approach, yes, it is better to have both in the same stored procedure.
1) Using a TVP is a good approach, but send only new or updated rows in it; there is no need to send the entire data grid.
2) For INSERT/UPDATE use MERGE. Example:
MERGE [dbo].[Contact] AS [Target]
USING @Contact AS [Source] ON [Target].[Email] = [Source].[Email]
WHEN MATCHED THEN
    UPDATE SET [FirstName] = [Source].[FirstName],
               [LastName] = [Source].[LastName]
WHEN NOT MATCHED THEN
    INSERT ( [Email], [FirstName], [LastName] )
    VALUES ( [Source].[Email], [Source].[FirstName], [Source].[LastName] );
3) For your case one stored procedure is enough.

Procedure Advice

I'm looking to boost performance on one of our processes within a production database. We have two sets of SPs which are driven by configuration settings stored within a configuration table.
An example syntax would be:
Declare @SWITCH tinyint -- a bit can only hold 0 or 1, so tinyint suits a 1/2 switch
IF @SWITCH = 1
INSERT INTO DEST_TABLE_A
SELECT VALUES
FROM SOURCE_TABLE
IF @SWITCH = 2
INSERT INTO DEST_TABLE_B
SELECT VALUES
FROM SOURCE_TABLE
Would it be better practice in this instance to move the IF logic into the WHERE clause, creating a standardized statement, instead of having the conditional logic within the procedure?
E.g.
INSERT INTO DEST_TABLE_A
SELECT VALUES
FROM SOURCE_TABLE
WHERE @SWITCH = 1
INSERT INTO DEST_TABLE_B
SELECT VALUES
FROM SOURCE_TABLE
WHERE @SWITCH = 2
I appreciate this might be an opinion piece but I was curious to see if anyone else has had experience with this scenario.
The second example might lead you into the parameter sniffing problem.
This issue is caused by the query optimizer generating an execution plan optimized for one particular value of the switch parameter (the value you send the first time you call the stored procedure).
In your case, if you call the stored procedure with @SWITCH = 1 the first time, an execution plan for that parameter is generated. Subsequent calls with @SWITCH = 2 might take significantly longer to process.
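One common mitigation, sketched here using the question's pseudocode rather than prescribed as the fix, is to request a fresh plan per execution with OPTION (RECOMPILE):
INSERT INTO DEST_TABLE_A
SELECT VALUES
FROM SOURCE_TABLE
WHERE @SWITCH = 1
OPTION (RECOMPILE) -- the plan is compiled for the actual @SWITCH value on every call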

How to structure a query with a large, complex where clause?

I have an SQL query that takes these parameters:
@SearchFor nvarchar(200) = null
,@SearchInLat Decimal(18,15) = null
,@SearchInLng Decimal(18,15) = null
,@SearchActivity int = null
,@SearchOffers bit = null
,@StartRow int
,@EndRow int
The variables @SearchFor, @SearchActivity, and @SearchOffers can each be either null or not null. @SearchInLat and @SearchInLng must both be null, or both have values.
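(A guard at the top of the procedure can enforce that lat/lng pairing; a minimal sketch:)
IF (@SearchInLat IS NULL AND @SearchInLng IS NOT NULL)
   OR (@SearchInLat IS NOT NULL AND @SearchInLng IS NULL)
BEGIN
    RAISERROR(N'@SearchInLat and @SearchInLng must be supplied together.', 16, 1);
    RETURN;
END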
I'm not going to post the whole query, as it's boring and hard to read, but the WHERE clause is shaped like this:
( -- filter by activity --
(@SearchActivity IS NULL)
OR (@SearchActivity = Activities.ActivityID)
)
AND ( -- filter by Location --
(@SearchInLat is NULL AND @SearchInLng is NULL)
OR ( ... )
)
AND ( -- filter by activity --
@SearchActivity is NULL
OR ( ... )
)
AND ( -- filter by has offers --
@SearchOffers is NULL
OR ( ... )
)
AND (
... -- more stuff
)
I have read that this is a bad way to structure a query: SQL Server has trouble working out an efficient execution plan with lots of clauses like this, so I'm looking for other ways to do it.
I see two ways of doing this:
Construct the query as a string in my client application, so that the WHERE clause only contains filters for the relevant parameters. The problem with this is it means not accessing the database through stored procedures, as everything else is at the moment.
Change the stored procedure so that it examines which arguments are null, and executes child procedures depending on which arguments it is passed. The problem here is that it would mean repeating myself a lot in the definition of the procs, and thus be harder to maintain.
What should I do? Or should I just keep on as I am currently doing? I have OPTION (RECOMPILE) set for the procedures, but I've heard that this doesn't work right in Server 2005. Also, I plan to add more parameters to this proc, so I want to make sure whatever solution I have is fairly scalable.
The answer is to use dynamic SQL (be it in the client, or in an SP using sp_executesql), but the reason why is long, so here's a link...
Dynamic Search Conditions in T-SQL
A very short version is that one size does not fit all: the optimiser creates one plan for one query, so a single catch-all query is slow. The solution is to continue using parameterised queries (for execution-plan caching), but to have many queries, one for each of the different types of search that can happen.
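A compressed sketch of that sp_executesql approach; the column names (ActivityID, HasOffers) are assumptions, and the elided column list is kept from the question:
DECLARE @sql nvarchar(max);
SET @sql = N'SELECT Col1,...,Coln FROM MyTable WHERE 1 = 1';
IF @SearchActivity IS NOT NULL
    SET @sql = @sql + N' AND ActivityID = @SearchActivity';
IF @SearchOffers IS NOT NULL
    SET @sql = @sql + N' AND HasOffers = @SearchOffers';
-- parameters that the final string does not reference are simply ignored
EXEC sp_executesql @sql,
    N'@SearchActivity int, @SearchOffers bit',
    @SearchActivity, @SearchOffers;
Each distinct generated string gets its own cached, parameterised plan, which is exactly the "many queries" idea above.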
Perhaps an alternative might be to perform several separate select statements?
e.g.
-- filter by activity --
if @SearchActivity is not null
insert into tmpTable (<columns>)
select <columns>
from myTable
where ActivityID = @SearchActivity

-- filter by Location --
if @SearchInLat is not null and @SearchInLng is not null
insert into tmpTable (<columns>)
select <columns>
from myTable
where latCol = @SearchInLat AND lngCol = @SearchInLng
etc...
then select the temp table to return the final result set.
I'm not sure how this would work with respect to the optimiser and the query plans, but each individual select would be very straightforward and could utilise the indexes that you would have created on each column which should make them very quick.
Depending on your requirements it also may make sense to create a primary key on the temp table to allow you to join to it on each select (to avoid duplicates).
Look at the performance first, like others have said.
If possible, you can use IF clauses to simplify the queries based on what parameters are provided.
You could also use functions or views to encapsulate some of the code if you find you are repeating it often.
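For example, an inline table-valued function can hold the shared SELECT so each IF branch stays short; every name below is hypothetical:
CREATE FUNCTION dbo.ActivitySearch (@SearchActivity int)
RETURNS TABLE
AS
RETURN
    SELECT Col1, Col2 -- the shared column list lives in one place
    FROM MyTable
    WHERE ActivityID = @SearchActivity;
GO
-- usage: SELECT * FROM dbo.ActivitySearch(42);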

Should I write a whole procedure for each database table.column I update separately?

I have an application that uses AJAX liberally. I have several places where a single database column is being updated for the record the user is actively editing.
So far I've been creating separate stored procedures for each AJAX action... so I've got UPDATE_NAME, UPDATE_ADDRESS, UPDATE_PHONE stored procedures.
I was just wondering if there's a better way to continue utilizing stored procedures, but without creating one for each column.
I'd like to avoid reflecting upon a string parameter which specifies the column, if possible. I.e. I know I could have an UPDATE_COLUMN procedure which takes as one of its parameters the column name. This kind of gives me the willies, but if that's the only way to do it then I may give it some more consideration. But not all columns are of the same data type, so that doesn't seem like a silver bullet.
Consider writing a single update procedure that accepts several columns and uses DEFAULT NULL for all columns that are not mandatory (as suggested by others).
Using NVL in the update will then only update the columns you provided. The only problem with this approach is that you can't set a value to NULL.
PROCEDURE update_record (
in_id IN your_table.id%TYPE,
in_name IN your_table.name%TYPE DEFAULT NULL,
in_address IN your_table.address%TYPE DEFAULT NULL,
in_phone IN your_table.phone%TYPE DEFAULT NULL,
in_...
) AS
BEGIN
UPDATE your_table
SET name = NVL( in_name, name ),
address = NVL( in_address, address),
phone = NVL( in_phone, phone ),
...
WHERE id = in_id;
END update_record;
You can call it with named parameters then:
update_record( in_id => 123, in_address => 'New address' );
This allows you to update several columns at once when necessary.
I would say to stop using stored procedures for activities this simple; there is no justification for creating so many small procedures for every single column in the database. You are much better off with dynamic SQL (with parameters) for that.
Create a procedure that can update every column, but only updates columns for which you pass a non-null parameter
CREATE PROCEDURE spUpdateFoo (@fooId INT, @colA INT, @colB VARCHAR(32), @colC float)
AS
update Foo set colA = ISNULL(@colA, colA),
    colB = ISNULL(@colB, colB),
    colC = ISNULL(@colC, colC)
where fooId = @fooId
Note that this doesn't work if you want to be able to explicitly set null values through your procedure, but you could choose a different value to specify a non-change (-1, etc) with a little more complexity.
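One hedged way to keep explicit NULLs, along the lines suggested there but using a flag instead of a sentinel value (all names invented for illustration):
CREATE PROCEDURE spUpdateFooNullable (@fooId INT, @colA INT = NULL, @setColANull BIT = 0)
AS
UPDATE Foo
SET colA = CASE WHEN @setColANull = 1 THEN NULL -- caller explicitly asked for NULL
                ELSE ISNULL(@colA, colA)        -- otherwise replace-or-keep
           END
WHERE fooId = @fooId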
It doesn't hurt to do what you are doing, but it could get a little crazy if you continue that path. One thing you can do is create one stored procedure and assign NULL values as default parameters to all your fields that you are updating. So when you call the sproc from your app, if a parameter is given a value that value will be used in the update, otherwise the parameter will take a null value.
Then you can do a check in the sproc: IF @Parameter IS NOT NULL ...
If you find yourself ever only needing to update just one field and you do not want to create one central sproc and pass nulls, then use Octavia's solution right below mine and write a simple update procedure.

Performance implications of sql 'OR' conditions when one alternative is trivial?

I'm creating a stored procedure for searching some data in my database according to some criteria input by the user.
My sql code looks like this:
Create Procedure mySearchProc
(
    @IDCriteria bigint=null,
    ...
    @MaxDateCriteria datetime=null
)
as
select Col1,...,Coln from MyTable
where (@IDCriteria is null or ID=@IDCriteria)
...
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
Edit: I have around 20 possible parameters, and any combination of non-null parameters can happen.
Is it ok performance-wise to write this kind of code? (I'm using MS SQL Server 2008)
Would generating SQL code containing only the needed where clauses be notably faster?
OR clauses are notorious for causing performance issues, mainly because they tend to force table scans. If you can write the query without ORs you'll be better off.
where (@IDCriteria is null or ID=@IDCriteria)
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
If you write this criteria, then SQL server will not know whether it is better to use the index for IDs or the index for Dates.
For proper optimization, it is far better to write separate queries for each case and use IF to guide you to the correct one.
IF @IDCriteria is not null and @MaxDateCriteria is not null
    SELECT Col1,...,Coln FROM MyTable
    WHERE ID = @IDCriteria and Date < @MaxDateCriteria
ELSE IF @IDCriteria is not null
    SELECT Col1,...,Coln FROM MyTable
    WHERE ID = @IDCriteria
ELSE IF @MaxDateCriteria is not null
    SELECT Col1,...,Coln FROM MyTable
    WHERE Date < @MaxDateCriteria
ELSE
    SELECT Col1,...,Coln FROM MyTable
    WHERE 1 = 1
If you expect to need different plans out of the optimizer, you need to write different queries to get them!!
Would generating SQL code containing only the needed where clauses be notably faster?
Yes - if you expect the optimizer to choose between different plans.
Edit:
DECLARE @CustomerNumber int, @CustomerName varchar(30)
SET @CustomerNumber = 123
SET @CustomerName = '123'
SELECT * FROM Customers
WHERE (CustomerNumber = @CustomerNumber OR @CustomerNumber is null)
AND (CustomerName = @CustomerName OR @CustomerName is null)
CustomerName and CustomerNumber are indexed. The optimizer says: "Clustered Index Scan with parallelization". You can't write a worse single-table query.
Edit: I have around 20 possible parameters, and any combination of non-null parameters can happen.
We had similar "search" functionality in our database. When we looked at the actual queries issued, 99.9% of them used an AccountIdentifier. In your case, I suspect either one column is always supplied, or one of two columns is always supplied. This would lead to 2 or 3 cases respectively.
It's not important to remove OR's from the whole structure. It is important to remove OR's from the column/s that you expect the optimizer to use to access the indexes.
So, to boil down the above comments:
Create a separate sub-procedure for each of the most popular variations of specific combinations of parameters, and within a dispatcher procedure call the appropriate one from an IF ELSE structure, the penultimate ELSE clause of which builds a query dynamically to cover the remaining cases.
Perhaps only one or two cases may be specifically coded at first, but as time goes by and particular combinations of parameters are identified as being statistically significant, implementation procedures may be written and the master IF ELSE construct extended to identify those cases and call the appropriate sub-procedure.
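A skeletal version of that dispatcher; the sub-procedure names are placeholders:
CREATE PROCEDURE dbo.SearchDispatch (@IDCriteria bigint = null, @MaxDateCriteria datetime = null)
AS
IF @IDCriteria IS NOT NULL
    EXEC dbo.Search_ByID @IDCriteria -- a statistically significant case, specifically coded
ELSE IF @MaxDateCriteria IS NOT NULL
    EXEC dbo.Search_ByMaxDate @MaxDateCriteria -- another popular variation
ELSE
    EXEC dbo.Search_Dynamic @IDCriteria, @MaxDateCriteria -- dynamic SQL covers the remaining cases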
Regarding "Would generating SQL code containing only the needed where clauses be notably faster?"
I don't think so, because this way you effectively remove the positive effects of query plan caching.
You could perform selective queries, in order of the most common / most efficient (indexed, etc.) parameters, and add the PK(s) to a temporary table
That would create a (hopefully small!) subset of data
Then join that Temporary Table with the main table, using a full WHERE clause with
SELECT ...
FROM #TempTable AS T
JOIN dbo.MyTable AS M
ON M.ID = T.ID
WHERE (@IDCriteria IS NULL OR M.ID=@IDCriteria)
...
AND (@MaxDateCriteria IS NULL OR M.Date<@MaxDateCriteria)
style to refine the (small) subset.
What if constructs like these were replaced:
WHERE (@IDCriteria IS NULL OR @IDCriteria=ID)
AND (@MaxDateCriteria IS NULL OR Date<@MaxDateCriteria)
AND ...
with ones like these:
WHERE ID = ISNULL(@IDCriteria, ID)
AND Date < ISNULL(@MaxDateCriteria, DATEADD(millisecond, 1, Date))
AND ...
or is this just coating the same unoptimizable query in syntactic sugar?
Choosing the right index is hard for the optimizer. IMO, this is one of few cases where dynamic SQL is the best option.
This is one of the cases where I use code building or a sproc for each search option.
Since your search is so complex, I'd go with code building.
You can do this either in code or with dynamic SQL.
Just be careful of SQL injection.
I suggest one step further than some of the other suggestions - think about degeneralizing at a much higher abstraction level, preferably the UI structure. Usually this seems to happen when the problem is being pondered in data mode rather than user domain mode.
In practice, I've found that almost every such query has one or more non-null, fairly selective columns that would be reasonably optimizable, if one (or more) were specified. Furthermore, these are usually reasonable assumptions that users can understand.
Example: Find Orders by Customer; or Find Orders by Date Range; or Find Orders By Salesperson.
If this pattern applies, then you can decompose your hypergeneralized query into more purposeful subqueries that also make sense to users, and you can reasonably prompt for required values (or ranges), and not worry too much about crafting efficient expressions for subsidiary columns.
You may still end up with an "All Others" category. But at least then if you provide what is essentially an open-ended Query By Example form, then users will have some idea what they're getting into. Doing what you describe really puts you in the role of trying to out-think the query optimizer, which is folly IMHO.
I'm currently working with SQL 2005, so I don't know if the 2008 optimizer acts differently. That being said, I've found that you need to do a couple of things...
Make sure that you are using OPTION (RECOMPILE) for your query (or WITH RECOMPILE at the procedure level)
Use CASE statements to cause short-circuiting of the logic. At least in 2005 this is NOT done with OR statements. For example:
SELECT
...
FROM
...
WHERE
(1 =
CASE
WHEN @my_column IS NULL THEN 1
WHEN my_column = @my_column THEN 1
ELSE 0
END
)
The CASE statement will cause the SQL Server optimizer to recognize that it doesn't need to continue past the first WHEN. In this example it's not a big deal, but in my search procs a non-null parameter often meant searching in another table through a subquery for existence of a matching row, which got costly. Once I made this change the search procs started running much faster.
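To make that "costly subquery" point concrete, here is a hedged sketch of the shape being described; the table and column names are invented:
SELECT t.Col1
FROM MainTable t
WHERE (1 =
    CASE
        WHEN @TagFilter IS NULL THEN 1 -- parameter not supplied: the subquery is never evaluated
        WHEN EXISTS (SELECT 1
                     FROM TagTable x
                     WHERE x.MainID = t.ID AND x.Tag = @TagFilter) THEN 1
        ELSE 0
    END
)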
My suggestion is to build the SQL string. You will gain maximum performance from indexes and still reuse execution plans.
DECLARE @sql nvarchar(4000);
SET @sql = N''
-- note: always append to @sql, otherwise earlier clauses would be overwritten
IF @param1 IS NOT NULL
SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param1 = @param1';
IF @param2 IS NOT NULL
SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param2 = @param2';
...
IF @paramN IS NOT NULL
SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'paramN = @paramN';
IF @sql <> N''
SET @sql = N' WHERE ' + @sql;
SET @sql = N'SELECT ... FROM myTable' + @sql;
EXEC sp_executesql @sql, N'@param1 type, @param2 type, ..., @paramN type', @param1, @param2, ..., @paramN;
Each time the procedure is called with different parameters, there is a different optimal execution plan for getting the data. The problem is that SQL Server has cached an execution plan for your procedure and will use a sub-optimal (read: terrible) execution plan.
I would recommend:
Create specific SPs for frequently run execution paths (i.e. passed parameter sets) optimised for each scenario.
Keep your main generic SP for edge cases (presuming they are rarely run), but use the WITH RECOMPILE clause to cause a new execution plan to be created each time the procedure is run; see the sketch after this list.
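A minimal sketch of that second point, reusing the question's procedure shape (the procedure name is assumed):
CREATE PROCEDURE mySearchProcGeneric
(
    @IDCriteria bigint = null,
    @MaxDateCriteria datetime = null
)
WITH RECOMPILE -- a fresh execution plan is compiled on every run
AS
select Col1,...,Coln from MyTable
where (@IDCriteria is null or ID=@IDCriteria)
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)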
We use OR clauses checking against NULLs for optional parameters to great effect. It works very well without the RECOMPILE option, so long as the execution path is not drastically altered by passing different parameters.