Consolidated: SQL - Passing comma-separated values to a stored procedure for filtering

I'm here to share a consolidated analysis of the following scenario:
I have an 'Item' table and a search SP for it. I want to be able to search for multiple ItemCodes, like:
- Table structure: Item(Id INT, ItemCode NVARCHAR(20))
- Filter query format: SELECT * FROM Item WHERE ItemCode IN ('xx','yy','zz')
I want to do this dynamically using a stored procedure. I'll pass an @ItemCodes parameter containing comma (',') separated values, and the search should be performed as above.
Well, I've already visited a lot of posts/forums. First, my constraints:
Dynamic SQL might be the least complex way, but I don't want to consider it because of concerns like performance and security (SQL injection, etc.).
Other approaches like XML are also out if they make things complex.
And finally, no performance-hitting tricks like extra temp-table JOINs, please.
I have to manage both performance and complexity.
Here are two threads I've looked at:
T-SQL stored procedure that accepts multiple Id values
Passing an "in" list via stored procedure
I've reviewed the above two posts and gone through the solutions provided; here are some limitations:
http://www.sommarskog.se/arrays-in-sql-2005.html
This requires me to 'declare' the parameter type while passing it to the SP, which distorts the abstraction (I don't set a type on any of my parameters, because each of them is treated generically).
http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters
This is a structured approach, but it increases complexity, requires DB-structure-level changes, and is not as abstract as the above.
http://madprops.org/blog/splitting-text-into-words-in-sql-revisited/
Well, this one seems to match my old solution. Here's what I did in the past:
I created a SQL function, [GetTableFromValues], which returns a table populated with each item (one per row) from the comma-separated @ItemCodes.
And here's how I use it in the WHERE clause filter in the SP:
SELECT * FROM Item WHERE ItemCode IN (SELECT * FROM [dbo].[GetTableFromValues](@ItemCodes))
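For reference, here is a minimal sketch of what such a split function can look like (the original function body isn't shown above, so the column name and the simple comma-only format are assumptions):

CREATE FUNCTION [dbo].[GetTableFromValues] (@ItemCodes NVARCHAR(MAX))
RETURNS @Result TABLE (Value NVARCHAR(20))
AS
BEGIN
    DECLARE @Pos INT = CHARINDEX(',', @ItemCodes);
    WHILE @Pos > 0
    BEGIN
        -- Take everything before the next comma as one value
        INSERT INTO @Result (Value) VALUES (LTRIM(RTRIM(LEFT(@ItemCodes, @Pos - 1))));
        SET @ItemCodes = SUBSTRING(@ItemCodes, @Pos + 1, LEN(@ItemCodes));
        SET @Pos = CHARINDEX(',', @ItemCodes);
    END
    -- The remainder after the last comma
    IF LEN(@ItemCodes) > 0
        INSERT INTO @Result (Value) VALUES (LTRIM(RTRIM(@ItemCodes)));
    RETURN;
END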
This one is reusable and looks simple and short (comparatively, of course). Have I missed anything, or does any expert have a better solution (obviously 'within' the limitations of the points mentioned above)?
Thank you.

I think using dynamic T-SQL is pragmatic in this scenario. If you are careful with the design, dynamic SQL works like a charm; I have leveraged it in countless projects where it was the right fit. With that said, let me address your two main concerns: performance and SQL injection.
With regards to performance, read the T-SQL reference on parameterized dynamic SQL and sp_executesql (instead of plain EXEC). A combination of parameterized SQL and sp_executesql will get you out of the woods on performance by ensuring that query plans are reused and recompiles are avoided. I have used dynamic SQL even in real-time contexts, and it works well with these two items taken care of. For your own satisfaction you can run a loop of a million or so calls to the SP with and without the two optimizations, and use SQL Profiler to track SP:Recompile events.
Now, about SQL injection. This is an issue if you use an inappropriate user widget, such as a free-text textbox, to let the user input the item codes. In that scenario a hacker may write SELECT statements and try to extract information about your system. You can write code to prevent this, but I think going down that route is a trap. Instead, consider an appropriate user widget such as a listbox (depending on your frontend platform) that allows multiple selection. The user then just selects from a list of "presented items", and your code generates the string containing the corresponding item codes; you never pass raw user text to the dynamic SQL SP. You can even use slicker jQuery-based selection widgets, but the bottom line is that the user never gets to type arbitrary text that hits your data layer.
Next, you just need a simple stored procedure on the database that takes a parameter for the item codes (e.g. '''xyz''','''abc'''). Internally it should use sp_executesql with a parameterized dynamic query.
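One way this might look (a sketch that passes the raw CSV as a single parameter and reuses the split-function idea from the question, so the pre-quoted format isn't needed; all names are illustrative):

CREATE PROCEDURE dbo.SearchItems
    @ItemCodes NVARCHAR(MAX)  -- e.g. N'xx,yy,zz'
AS
BEGIN
    -- The whole CSV travels as ONE parameter, so the plan is reusable
    -- and no user text is ever concatenated into the SQL string.
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT * FROM dbo.Item
          WHERE ItemCode IN (SELECT Value FROM [dbo].[GetTableFromValues](@codes))';
    EXEC sp_executesql @sql, N'@codes NVARCHAR(MAX)', @codes = @ItemCodes;
END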
I hope this helps.
-Tabrez

Related

suggestions on how to build query for search with multiple options

We are building a search form for users to search our database. The form will contain multiple fields, which are all optional. The fields include:
company name
company code
business type (service or product)
Product or Service
Product or Service subtype --> this will depend on what is chosen for Product or Service above
Basically the users can fill in all or just some of the fields and submit the form. How best should we handle the SQL for this? Is it best to use dynamic SQL, building out the WHERE clause in our web page and then forwarding it to the stored procedure to use as its WHERE clause? Or is it better to pass all the values to the stored procedure and let it build the WHERE clause dynamically? Also, is dynamic SQL the only way? I wasn't sure if using EXECUTE(@SQLStatement) is good practice.
What I have done in the past is, when a search option is not being used, pass in a NULL for its value. Then in your SELECT statement you would do something like:
WHERE i.companyname = COALESCE(@CompanyName, i.companyname)
AND i.companycode = COALESCE(@CompanyCode, i.companycode)
What happens above is that if @CompanyName is NULL, i.companyname is compared to itself, resulting in a match. If @CompanyName has a value, i.companyname is compared against that value.
I have used this approach with 15 optional filters in a database with 15,000 rows, and it has performed relatively well to date.
More on the COALESCE function
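As a sketch, the whole pattern inside a search procedure might look like this (table and column names are assumed for illustration):

CREATE PROCEDURE dbo.SearchCompanies
    @CompanyName NVARCHAR(100) = NULL,
    @CompanyCode NVARCHAR(20)  = NULL
AS
BEGIN
    -- Each NULL parameter turns its line into a self-comparison, i.e. no filter
    SELECT i.*
    FROM dbo.Company AS i
    WHERE i.companyname = COALESCE(@CompanyName, i.companyname)
      AND i.companycode = COALESCE(@CompanyCode, i.companycode);
END

One caveat worth knowing: a self-comparison like i.companyname = i.companyname evaluates to UNKNOWN (not true) when the column itself is NULL, so such rows silently drop out of the results.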
Dynamic SQL isn't the only way; it'd be better if you can avoid it with methods like these: http://www.sommarskog.se/dyn-search.html
If you can't get the performance you need from the above method and do go for dynamic SQL, do not allow the web page to construct the SQL and execute it - you will end up getting SQL-injected. Also avoid passing in text strings, as sanitising them is very difficult. Ideally have the web page pass down parameters that are numbers only (IDs and such) for you to create the dynamic SQL from.
If you do decide to use dynamic SQL be sure to read all this: http://www.sommarskog.se/dynamic_sql.html

Wrap SQL CONTAINS as an expression?

I have a question. I'm working on a site in ASP.NET which uses an ORM. I need to use a couple of full-text search functions, such as CONTAINS. But when I generate the query with the ORM, it produces SQL like this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[SomeTable] AS [Extent1]
WHERE (Contains([Extent1].[Name], N'qq')) = 1
SQL Server can't parse it, because CONTAINS doesn't return a bit value - it's a predicate that can only be used directly in a WHERE clause. Unfortunately I can't modify the SQL query generation process, but I can modify the statements in it.
My question is: is it possible to wrap the call to the CONTAINS function in something else? I tried to create another function that does a SELECT with CONTAINS, but it requires specific table/column objects, and I don't want to write one function for each table.
EDIT
I can modify the result type for that function in the ORM. In the sample above the result type is bit. I can change it to int, nvarchar, etc. But as I understand it, there is no Boolean type in SQL Server, so I can't specify that.
Can't you put this in a stored procedure, and tell your ORM to call the stored procedure? Then you don't have to worry about the fact that your ORM only understands a subset of valid T-SQL.
I don't know that I believe the argument that requiring new stored procedures is a blocker. If you have to write a new CONTAINS expression in your ORM code, how much different is it to wrap that expression in a CREATE PROCEDURE statement in a different window? If you want to do this purely in the ORM, then you're going to have to put pressure on the vendor to pick up the pace and get more complete coverage of the language they should fully support.
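A minimal sketch of such a wrapper procedure (table and column names are taken from the generated SQL above; the parameter is an assumption):

CREATE PROCEDURE dbo.SearchSomeTableByName
    @SearchTerm NVARCHAR(200)
AS
BEGIN
    SELECT [ID], [Name]
    FROM [dbo].[SomeTable]
    -- CONTAINS used as a predicate, the only form T-SQL accepts
    WHERE CONTAINS([Name], @SearchTerm);
END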

SQL Max parameters

I read here that the maximum number of parameters that can be passed to a Stored Procedure is 2100.
I am just curious what kind of system would require an SP with 2100 parameters, and couldn't one split that into multiple SPs?
I thought that maybe an SP that calls multiple SPs would require a lot of params to be passed; I just can't fathom writing out that disgusting EXEC statement.
The limit on procedure parameters predates both the XML data type and table-valued parameters, so back in those days there was simply no alternative. Having 2100 parameters on a procedure does not necessarily mean a human wrote it, nor that a human will call it. It is quite common for generated code, like code created by tools and frameworks, to push such boundaries in any language, since the maintenance and refactoring of the generated code occur in the generating tool, not in the resulting code.
If you have a stored procedure using 2100 parameters you most likely have some sort of design problem.
Passing in a CSV list of values in a single parameter (and using a table-valued split function to turn it into rows), or using a table-valued parameter, would be much easier than handling all of those input parameters.
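For example, the CSV variant can collapse thousands of scalar parameters into one (a sketch; dbo.SplitInts is a hypothetical split function along the lines of the one in the first thread above):

CREATE PROCEDURE dbo.GetRowsByIds
    @IdCsv NVARCHAR(MAX)  -- e.g. N'1,2,3'
AS
BEGIN
    SELECT t.*
    FROM dbo.SomeTable AS t
    WHERE t.ID IN (SELECT Value FROM dbo.SplitInts(@IdCsv));
END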
I had a situation where I had to run something quite like the following:
SELECT
...
WHERE
ID IN (?,?,?,?...)
The list of parameters comprised all the entities the user had permission to use in the system (it was dynamically generated by an underlying framework). It turns out the DBMS had a limitation on the number of parameters that could be passed like this, and it was below 2100 (IIRC it was Oracle, and the maximum was 999 parameters in the IN list).
This is a nice example of a rather long list of parameters to something that had to be a stored procedure (we had more than 999 and fewer than 2100 parameters to pass).
I don't know if the 999 constraint applies to SQL Server, but it is definitely a situation where the long list would be useful...

SQL Server stored procedures - update column based on variable name..?

I have a data-driven site with many stored procedures. What I want to eventually be able to do is say something like:
For Each @variable In sproc inputs
    UPDATE @TableName SET @variable.toString = @variable
Next
I would like it to be able to accept any number of arguments.
It will basically loop through all of the inputs and update the column named after the variable with the value of that variable - for example, column "Name" would be updated with the value of @Name. I would like to have basically one stored procedure for updating and one for creating. However, to do this I will need to be able to convert the actual name of a variable, not its value, to a string.
Question 1: Is it possible to do this in T-SQL, and if so how?
Question 2: Are there any major drawbacks to using something like this (like performance or CPU usage)?
I know that if a value is not valid it will prevent the update involving that variable and any subsequent ones, but all the data is validated in the VB.NET code anyway, so it will always be valid when submitted to the database, and I will ensure that only variables whose columns exist can be submitted.
Many thanks in advance,
Regards,
Richard Clarke
Edit:
I know about using SQL strings and the risk of SQL injection attacks - I studied this a bit in my dissertation a few weeks ago.
Basically the website uses an object-oriented architecture. There are many classes - for example Product - which have many "Attributes". (I created my own Attribute class, which has properties such as DataField, Name and Value: DataField is used to get or update data, Name is displayed on the administration frontend when creating or updating a Product, and Value, which may be displayed on the customer frontend, is set by the administrator.) DataField is the field I will be using in the "UPDATE Blah SET @Field = @Value".
I know this is probably confusing, but it's really complicated to explain - I have a good understanding of the entire system in my head, but I can't put it into words easily.
Basically the structure is set up such that no user will be able to change the value of DataField or Name, but they can change Value. I think that if I were to use dynamic parameterised SQL strings, there would therefore be no risk of SQL injection attacks.
I mean basically looping through all the attributes so that it ends up like:
UPDATE Products SET [Name] = @Name, Description = @Description, Display = @Display
Then loop through all the attributes again and add the parameter values - this will have the same effect as using stored procedures, right?
I don't mind adding to the page load time, since this is mainly going to affect the administration frontend and will only marginally affect the customer frontend.
Question 1: you must use dynamic SQL - construct your UPDATE statement as a string and run it with the EXEC command.
Question 2: yes, there are - SQL injection attacks, the risk of malformed queries, and the added overhead of compiling a separate SQL statement.
Your example is also very inefficient - if you pass in 10 columns, you will update the same table 10 times.
The better way is to do one UPDATE using sp_executesql and build it dynamically; take a look at The Curse and Blessings of Dynamic SQL to see how to do it.
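A minimal sketch of the one-statement version (column names hard-coded for illustration; in a real generic version the SET list would be assembled from a whitelist of known column names, never from user text, while the values always travel as parameters):

CREATE PROCEDURE dbo.UpdateProduct
    @ProductID   INT,
    @Name        NVARCHAR(100),
    @Description NVARCHAR(MAX),
    @Display     BIT
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'UPDATE dbo.Products
          SET [Name] = @Name, Description = @Description, Display = @Display
          WHERE ProductID = @ProductID;';
    -- One table touch, one reusable plan, values bound as parameters
    EXEC sp_executesql @sql,
        N'@Name NVARCHAR(100), @Description NVARCHAR(MAX), @Display BIT, @ProductID INT',
        @Name = @Name, @Description = @Description,
        @Display = @Display, @ProductID = @ProductID;
END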
Is this a new system where you have the freedom to design as necessary, or are you stuck with an existing DB design?
You might consider representing the attributes not as columns, but as rows in a child table.
In the parent MyObject you'd have just header-level data, things that are common to all objects in the system (maybe just an identifier). In the child table MyObjectAttribute you'd have a primary key of (objectID, attrName) with another column, attrValue. This way you can do an UPDATE like so:
UPDATE MyObjectAttribute
SET attrValue = @myValue
WHERE objectID = @myID
AND attrName = @myAttrName

SQL Optimization: XML or Delimited String

This is hopefully just a simple question involving performance optimization of queries in SQL Server 2008.
I've worked for companies that use stored procs heavily for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic SQL that concatenates a string and executes it:
EXEC('SELECT * FROM TableX WHERE xId IN (' + @Parameter + ')')
Using a user-defined function to split a delimited string into a table:
SELECT * FROM TableY INNER JOIN SPLIT(@Parameter) ON yID = splitId
Using XML as the parameter instead of a delimited varchar value:
SELECT * FROM TableZ JOIN @Parameter.nodes(xpath) AS x (y) ON ...
While I know that creating the dynamic SQL in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last 2 examples. Is it more efficient to do the due diligence in my code to pass such lists via XML, as in snippet 3, or is it better to just delimit the values and use a UDF to take care of it?
There is now a 4th option - table-valued parameters, whereby you can actually pass a table of values into a sproc as a parameter and then use it as you would a table variable. I'd prefer this approach over the XML (or CSV-parsing) approach.
I can't quote performance figures for all the different approaches, but table-valued parameters are the one I'd try first - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values in to your sproc, you just define a SqlParameter (SqlDbType.Structured) - its value can be set to an IEnumerable of SqlDataRecord, a DataTable or a DbDataReader source. So presumably you already have the list of values in a list or array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL Server performs best at set-based activities, not looping or string manipulation.
That's not to say it will perform great with a large set of values passed in, but with smaller sets (up to ~1000) it should be fine.
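A minimal sketch of the TVP approach on the database side (type, table and column names are illustrative):

-- One-time setup: a table type describing the list shape
CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetRecordsByIds
    @Ids dbo.IdList READONLY
AS
BEGIN
    -- The TVP joins like any other table; no splitting or parsing needed
    SELECT t.*
    FROM dbo.TableX AS t
    INNER JOIN @Ids AS ids ON t.xId = ids.Id;
END

On the client side, the SqlParameter is given SqlDbType.Structured with TypeName = "dbo.IdList", and the rows come straight from the existing list.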
UDF invocation is a little more costly than splitting the XML using the built-in nodes() method.
However, this only needs to be done once per query, so the performance difference will be negligible.
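For reference, snippet 3's XML variant fleshed out a little (the element names and join column are assumptions):

DECLARE @Parameter XML = N'<ids><id>1</id><id>2</id></ids>';

SELECT t.*
FROM @Parameter.nodes('/ids/id') AS x(y)
JOIN dbo.TableZ AS t ON t.zId = y.value('.', 'INT');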