SQL: separating a delimited string being passed

I have a product table with a tag column. Each product has multiple tags stored in this format: "|technology|mobile|acer|laptop|" ...a second product's tags could look like this: "|computer|laptop|toshiba|"
I am using MS SQL Server 2008 and a stored procedure. I would like to know how I could pass a string like "|computer|laptop|" and get both records returned, as they both have the tag laptop in them, while if I passed "|computer|" only the second record would return, as it is the only one containing that tag.
What is the best way of doing this in a stored procedure without a performance penalty?
I have so far had no luck with the different code samples I have found on the internet. I really hope you guys can help me with this, thank you.

I agree with the other posters that storing data in a column like that is going to cause headaches. You really want to store those tags in a child table so you can easily and efficiently join them. If it's an inherited system or something you can't refactor right away you can write a split function.
The typical SQL split implementation uses a WHILE loop and a table variable in a multi-statement TVF, and every iteration incurs more I/O and CPU overhead. Performance testing on SQL 2005 SP1 showed that this overhead is hidden from the I/O stats and the query plan; profiling the code will reveal the true cost.
Rewriting that function as an inline TVF is much more efficient. The primary difference is that the Query Optimizer will merge an inline function into the query before processing, which eliminates the overhead of the function call. Also, since no table variable is required, the additional I/O cost is eliminated. Finally, you avoid the costly iterative processing.
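To illustrate the structural difference, here is a minimal sketch of the two shapes (the bodies are placeholders, not the real split logic):
-- Multi-statement TVF: fills a table variable; the optimizer treats it
-- as a black box and cannot merge it into the calling query.
CREATE FUNCTION dbo.SplitMulti (@s VARCHAR(8000))
RETURNS @r TABLE (Token VARCHAR(8000))
AS
BEGIN
    INSERT INTO @r (Token) SELECT @s -- placeholder body
    RETURN
END
GO
-- Inline TVF: just a parameterized SELECT; the optimizer expands it into
-- the calling query like a view, so there is no table variable I/O.
CREATE FUNCTION dbo.SplitInline (@s VARCHAR(8000))
RETURNS TABLE
AS RETURN (SELECT Token = @s) -- placeholder body
GO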
Here is the fastest, most scalable split function I could come up with, including unit tests and a summary.
This function requires a numbers table:
CREATE TABLE dbo.Numbers
(
NUM INT PRIMARY KEY CLUSTERED
)
;WITH Nbrs ( n ) AS
(
SELECT 1 UNION ALL
SELECT 1 + n FROM Nbrs WHERE n < 10000
)
INSERT INTO dbo.Numbers
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 10000 )
The source of the function is here:
IF EXISTS (
SELECT 1
FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[ParseString]')
AND xtype in (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[ParseString]
END
GO
CREATE FUNCTION dbo.ParseString (@String VARCHAR(8000), @Delimiter VARCHAR(10))
RETURNS TABLE
AS
/*******************************************************************************************************
* dbo.ParseString
*
* Creator: MagicMike
* Date: 9/12/2006
*
*
* Outline: A set-based string tokenizer
* Takes a string that is delimited by another string (of one or more characters),
* parses it out into tokens and returns the tokens in table format. Leading
* and trailing spaces in each token are removed, and empty tokens are thrown
* away.
*
*
* Usage examples/test cases:
Single-byte delimiter:
select * from dbo.ParseString('|HDI|TR|YUM|||', '|')
select * from dbo.ParseString('HDI| || TR |YUM', '|')
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', '|')
select * from dbo.ParseString('HDI|||TR|YUM', '|')
select * from dbo.ParseString('', '|')
select * from dbo.ParseString('YUM', '|')
select * from dbo.ParseString('||||', '|')
select * from dbo.ParseString('HDI TR YUM', ' ')
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by Ident
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by StringValue
Multi-byte delimiter:
select * from dbo.ParseString('HDI and TR', 'and')
select * from dbo.ParseString('Pebbles and Bamm Bamm', 'and')
select * from dbo.ParseString('Pebbles and sandbars', 'and')
select * from dbo.ParseString('Pebbles and sandbars', ' and ')
select * from dbo.ParseString('Pebbles and sand', 'and')
select * from dbo.ParseString('Pebbles and sand', ' and ')
*
*
* Notes:
1. A delimiter is optional. If a blank delimiter is given, each byte is returned in its own row (including spaces).
select * from dbo.ParseString('|HDI|TR|YUM|||', '')
2. In order to maintain compatibility with SQL 2000, Ident is not sequential but can still be used in an ORDER BY clause.
If you are running on SQL 2005 or later, replace
SELECT Ident, StringValue FROM
with
SELECT Ident = ROW_NUMBER() OVER (ORDER BY Ident), StringValue FROM
*
*
* Modifications
*
*
********************************************************************************************************/
RETURN (
SELECT Ident, StringValue FROM
(
SELECT Num AS Ident,
CASE
WHEN DATALENGTH(@Delimiter) = 0 OR @Delimiter IS NULL
THEN LTRIM(SUBSTRING(@String, Num, 1)) --replace this line with '' if you prefer it to return nothing when no delimiter is supplied. Remove LTRIM if you want to return spaces when no delimiter is supplied
ELSE
LTRIM(RTRIM(SUBSTRING(@String,
CASE
WHEN (Num = 1 AND SUBSTRING(@String, Num, DATALENGTH(@Delimiter)) <> @Delimiter) THEN 1
ELSE Num + DATALENGTH(@Delimiter)
END,
CASE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@Delimiter))
WHEN 0 THEN LEN(@String) - Num + DATALENGTH(@Delimiter)
ELSE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@Delimiter)) - Num -
CASE
WHEN Num > 1 OR (Num = 1 AND SUBSTRING(@String, Num, DATALENGTH(@Delimiter)) = @Delimiter)
THEN DATALENGTH(@Delimiter)
ELSE 0
END
END
)))
END AS StringValue
FROM dbo.Numbers
WHERE Num <= LEN(@String)
AND (
SUBSTRING(@String, Num, DATALENGTH(ISNULL(@Delimiter, ''))) = @Delimiter
OR Num = 1
OR DATALENGTH(ISNULL(@Delimiter, '')) = 0
)
) R WHERE StringValue <> ''
)
For your case, you could use it like this:
--SAMPLE DATA
CREATE TABLE #products
(
productid INT IDENTITY PRIMARY KEY CLUSTERED ,
prodname VARCHAR(200),
tags VARCHAR(200)
)
INSERT INTO #products (prodname, tags)
SELECT 'toshiba laptop', '|laptop|toshiba|notebook|'
UNION ALL
SELECT 'toshiba netbook', '|netbook|toshiba|'
UNION ALL
SELECT 'Apple macbook', '|laptop|apple|notebook|'
UNION ALL
SELECT 'Apple mouse', '|apple|mouse'
--Actual solution
DECLARE @searchTags VARCHAR(200)
SET @searchTags = '|apple|laptop|' --This would be the string passed in if it were a stored procedure
--First we convert the supplied tags into a table for use later
--My (2005) dev box raised a severe error attempting to do the search in 1 step
--hence the temp table
CREATE TABLE #tags
(
tag VARCHAR(200) PRIMARY KEY CLUSTERED
)
INSERT INTO #tags --The function splits the string up into one record for each value
SELECT stringValue
FROM dbo.parsestring(@searchTags,'|') --SQL 2005 has a real problem joining to a TVF twice, apparently
SELECT DISTINCT p.*
FROM #products P --we join the products table with the function to get a row for each tag so we can compare with the temp table
CROSS APPLY (SELECT stringValue FROM dbo.parsestring(P.tags,'|')) T
WHERE EXISTS(SELECT * FROM #tags WHERE tag = T.stringValue) --we compare the rows with our temp table and if we get matches, the products are returned
/*This will return the Apple Macbook and the Toshiba Laptop because they both contain
the 'laptop' tag and the Apple mouse because it contains the 'apple' tag. The
toshiba netbook contains neither tag so it won't be returned.*/
But with your tags in a separate table as suggested (one-to-many for a simplified example), it would look like this:
SELECT * FROM Products P
WHERE EXISTS (SELECT *
FROM tags T
INNER JOIN dbo.parsestring(@searchTags,'|') Q
ON T.tag = Q.StringValue
WHERE T.productid = P.productid)

You have a many-to-many relationship between products and tags. The best way of doing this is to redesign your database. Create a table of tags and a junction table that links products with tags.
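A minimal sketch of that redesign (table and column names are illustrative, and a dbo.Products table with a productid key is assumed to exist):
CREATE TABLE dbo.Tags
(
    tagid INT IDENTITY PRIMARY KEY,
    tag VARCHAR(100) NOT NULL UNIQUE
)
CREATE TABLE dbo.ProductTags -- junction table
(
    productid INT NOT NULL REFERENCES dbo.Products (productid),
    tagid INT NOT NULL REFERENCES dbo.Tags (tagid),
    PRIMARY KEY (productid, tagid)
)
-- Products carrying any of the requested tags:
SELECT DISTINCT p.*
FROM dbo.Products p
INNER JOIN dbo.ProductTags pt ON pt.productid = p.productid
INNER JOIN dbo.Tags t ON t.tagid = pt.tagid
WHERE t.tag IN ('computer', 'laptop')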

That's not a very good design. Combining like terms into one field and separating them with a delimiter such as a vertical bar does not scale well and it is very limiting.
I recommend you read up on how to design databases. The best book I ever bought on database design was Database Design for Mere Mortals by Michael Hernandez, ISBN 0-201-69471-9 (Amazon listing); I noticed he has a second edition.
He walks you through the entire process of designing a database, from start to finish. I recommend you start with this book.
You have to learn to look at things in groups or chunks. Database design has simple building blocks just like programming does. If you gain a thorough understanding of these simple building blocks you can tackle any database design.
In programming you have:
If Constructs
If Else Constructs
Do While Loops
Do Until Loops
Case Constructs
With databases you have:
Data Tables
Lookup Tables
One to One relationships
One to Many Relationships
Many to Many relationships
Primary keys
Foreign keys
The simpler you make things the better. A database is nothing more than a place where you put data into cubbyholes. Start by identifying what those cubbyholes are and what kind of stuff you want in them.
You are never going to create the perfect database design the first time you try. This is a fact. Your design will go through several refinements during the process. Sometimes things won't seem apparent until you start entering data, and then you have an aha moment.
The web brings its own set of challenges: bandwidth issues, statelessness, erroneous data from processes that start but never get finished.

Make a split with a CLR function that returns a table with the values, or pass the data as XML, load it into a table variable, and do a join:
create procedure search
(
@data xml
)
AS
BEGIN
--declare @data xml
declare @LoadData table
(
dataToFind varchar(max)
)
--set @data = cast(
--'<data>
-- <item>computer</item>
-- <item>television</item>
--</data>' as xml)
insert into @LoadData
SELECT T2.Loc.value('.','varchar(max)')
FROM (select @data as data )T
CROSS APPLY data.nodes('/data/item') as T2(Loc)
select * from @LoadData --use for join
END
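Assuming the procedure above, the caller passes the search terms as an XML document; an nvarchar literal converts implicitly to the xml parameter:
EXEC search @data = N'<data>
    <item>computer</item>
    <item>laptop</item>
</data>'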

I would suggest you create an extra couple of tables with a proper design, and populate those tables from the existing, not-well-designed bit. That way your search will work properly, but others using the old pipe approach won't notice until you have time to refactor.


Checking if a field contains multiple strings in SQL Server

I am working on a SQL database which will provide data for a grid. The grid will enable filtering, sorting and paging, but there is also a strict requirement that users can enter free text into a text input above the grid, for example
'Engine 1001 Requi', and the result must contain only rows which, across some columns, contain all the pieces of the text. So one column may contain Engine, another column may contain 1001, and some other column will contain Requi.
I created a technical column (let's call it myTechnicalColumn) in the table (let's call it myTable) which is updated each time someone inserts or updates a row; it contains the values of all the columns combined and separated with spaces.
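For illustration, one way such a column could be maintained without manual updates is a persisted computed column (a sketch; col1, col2, col3 are made-up names, assumed to be character columns):
ALTER TABLE myTable
ADD myTechnicalColumn AS CONCAT(col1, ' ', col2, ' ', col3) PERSISTED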
Now, to use it with Entity Framework, I decided to use a table-valued function which accepts one parameter @searchQuery and handles it like this:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS @Result TABLE
( ... here come columns )
AS
BEGIN
DECLARE @searchToken TokenType
INSERT INTO @searchToken(token) SELECT value FROM STRING_SPLIT(@searchText,' ')
DECLARE @searchTextLength INT
SET @searchTextLength = (SELECT COUNT(*) FROM @searchToken)
INSERT INTO @Result
SELECT
... here come columns
FROM myTable
WHERE (SELECT COUNT(*) FROM @searchToken WHERE CHARINDEX(token, myTechnicalColumn) > 0) = @searchTextLength
RETURN;
END
Of course the solution works fine, but it's kinda slow. Any hints on how to improve its efficiency?
You can use an inline Table Valued Function, which should be quite a lot faster.
This would be a direct translation of your current code
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT s.token
FROM STRING_SPLIT(@searchText,' ') s(token)
)
SELECT
... here come columns
FROM myTable t
WHERE (
SELECT COUNT(*)
FROM searchText s
WHERE CHARINDEX(s.token, t.myTechnicalColumn) > 0
) = (SELECT COUNT(*) FROM searchText)
);
GO
You are using a form of query called Relational Division Without Remainder and there are other ways to cut this cake:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT s.token
FROM STRING_SPLIT(@searchText,' ') s(token)
)
SELECT
... here come columns
FROM myTable t
WHERE NOT EXISTS (
SELECT 1
FROM searchText s
WHERE CHARINDEX(s.token, t.myTechnicalColumn) = 0
)
);
GO
This may be faster or slower depending on a number of factors, you need to test.
Since there is no data to test with, I am not sure if the following will solve your issue. Note that a plain JOIN matches rows containing any token (and can return duplicates), so it is not an exact replacement for the all-tokens count check:
-- Replace the last INSERT portion
INSERT INTO @Result
SELECT
... here come columns
FROM myTable T
JOIN @searchToken S ON CHARINDEX(S.token, T.myTechnicalColumn) > 0

Convert SQL results from "N rows of 1 column" to "1 row of N columns" from WITHIN the query

Doing this seemingly trivial task should be simple and obvious using PIVOT - but isn't.
What is the cleanest way to do the conversion, not necessarily using pivot, when limited to ONLY using "pure" SQL (see other factors, below)?
It shouldn't affect the answer, but note that a Python 3.X front end is being used to run SQL queries on a MS SQL Server 2012 backend.
Background :
I need to create CSV files by calling SQL code from Python 3.x. The CSV header line is created from the field (column) names of the SQL table that holds the results of the query.
The following SQL code extracts the field names and returns them as N rows of 1 column - but I need them as 1 row of N columns. (In the example below, the final result must be "A", "B", "C" .)
CREATE TABLE #MyTable -- ideally the real code uses "DECLARE @MyTable TABLE"
(
A varchar( 32 ),
B varchar( 32 ),
C varchar( 32 )
) ;
CREATE TABLE #MetaData -- ideally the real code uses "DECLARE @MetaData TABLE"
(
NameOfField varchar( 32 ) not NULL
) ;
INSERT INTO #MetaData
SELECT name
FROM tempdb.sys.columns as X
WHERE ( object_id = Object_id( 'tempdb..#MyTable' ) )
ORDER BY column_id ; -- generally redundant, ensures correct order if results returned in random order
/*
OK so far, the field names are returned as 3 rows of 1 column (entitled "NameOfField").
Pivoting them into 1 row of 3 columns should be something simple like:
*/
SELECT NameOfField
FROM #MetaData AS Source
PIVOT
(
COUNT( [ NameOfField ] ) FOR [ NameOfField ]
IN ( #MetaData ) -- I've tried "IN (SELECT NameOfField FROM #Metadata)"
) AS Destination ;
This error gets raised twice, once for the COUNT and once for the "FOR" clause of the PIVOT statement:
Msg 207, Level 16, State 1, Line 32
Invalid column name ' NameOfField'.
How do I use the contents of #Metadata to get PIVOT to work? Or is there another simple way?
Other background factors to be aware of:
ODBC (Python's pyodbc package) is being used to pass the SQL queries from - and return the results (a cursor) to - a Python 3.x front end. Consequently there is no opportunity to use any type of manual intervention before the result set is returned to Python.
The above SQL code is intended to become standard boilerplate for every query passed to SQL. The code must dynamically "adapt" itself to the structure of #MyTable (e.g. if field B is removed while D and E are added after C, the end result must be "A", "C","D", "E"). This means that the field names of a table must never appear inside PIVOT's IN clause (the #MetaData table is intended to supply those values).
"Standard" SQL must be used. ALL vendor specific (e.g. Microsoft) extensions/utilities (e.g. "bcp", sqlcmd) must be avoided unless there is a very compelling reason to use them (because "it's there" doesn't count).
For known reasons the SELECT clause (into #MetaData) doesn't work for table variables (@MyTable). Is there an equivalent SELECT that works for table variables (i.e. @MetaData)?
UPDATE: This problem is subtly different from that in SQL Server dynamic PIVOT query?. In my case I have to preserve the order of the fields, something not required by that question.
WHY I NEED TO DO THIS:
The Python code is a GUI for non-technical people. They use the GUI to pick & choose which (or even all) SQL reports to run from a HUGE number of reports.
Apps like Excel are being used to view these files: to keep our users happy each CSV file must have a header line. The header line will consist of the field names from the SQL table that holds the results of the query.
These scripts can change at any time (e.g. add/delete a column) without any advance notice. To meet our users needs the header line must automatically "adjust itself" to make the corresponding changes. The SQL code below accomplishes this.
The header line gets merged (using UNION) with the query results to form the result set (a cursor) that gets passed back to Python. Python then processes the returned data and creates the CSV file (including the header line) that gets used by our customers.
In a nutshell: We have many sites, many users, many queries. By having SQL dynamically create the header line we remove the headache of having to manually manage/coordinate/rollout the SQL changes to all affected parties.
I am unsure what "pure" SQL is. Are you referring to ANSI-92 SQL?
Anyhow, if you can use SQL variables, try this:
DECLARE @STRING VARCHAR(MAX)
SELECT @STRING = COALESCE(@STRING + ', ' + '"' + NameOfField + '"', '"' + NameOfField + '"')
FROM #MetaData
SELECT @STRING
/*
Results:
"A", "B", "C"
*/
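With the header line in @STRING, it can be stacked on top of the data rows, for example like this (a sketch; it assumes each data row is cast to a single varchar and uses an ordering key, since UNION ALL alone does not guarantee row order):
SELECT csvline
FROM (
    SELECT 0 AS ord, @STRING AS csvline -- header row first
    UNION ALL
    SELECT 1, '"' + A + '", "' + B + '", "' + C + '"'
    FROM #MyTable
) x
ORDER BY ord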
To @Tab Alleman, thanks. I was able to modify the answer to SQL Server dynamic PIVOT query?
to do the swap (see below) in a way that meets all my needs.
NOTE: For some reason the "DISTINCT" keyword places the fields in alphabetical order - something I don't want.
Commenting that word out (as done below) preserves the order of the fields. I'm a bit uneasy about doing this but in this case it should be safe because the values being selected into #MetaData are guaranteed to be unique.
The difference can be easily seen by swapping fields A & B in #MyTable and uncommenting the "DISTINCT" keyword
--drop table #MyTable
--drop table #MetaData
Create TABLE #MyTable
(
A varchar( 10 ),
B varchar( 10 ),
C varchar( 10 )
)
;
CREATE TABLE #MetaData
(
NameOfField varchar( 100 ) not NULL,
Position int
)
;
INSERT INTO #MetaData
SELECT name, column_id
FROM tempdb.sys.columns as X
WHERE ( object_id = Object_id( 'tempdb..#MyTable' ) )
--ORDER BY column_id -- normally redundant, guards against results being returned in random order
;
select * from #MetaData
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX);
SET @cols = STUFF( (SELECT
-- DISTINCT
',' + QUOTENAME( c.NameOfField )
FROM #MetaData AS c
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--print( @cols )
set @query = 'SELECT ' + @cols + ' from
(
select NameOfField
from #MetaData
) AS x
pivot
(
MAX( NameOfField )
for NameOfField in ( '+ @cols + ' )
) AS p
'
--print( @query )
execute( @query )
drop table #MyTable
drop table #MetaData

Splitting delimited values in a SQL column into multiple rows

I would really like some advice here, to give some background info I am working with inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day I am using a Bulk Insert statement to insert the data into a SQL table.
In fact I actually Bulk Insert into a temp table and then from there I MERGE the data into the live table; this is to work around parsing issues, as certain fields otherwise have quotes and such around the values.
This works well, with the exception of the fact that the recipient-address column is a delimited field separated by a ; character, and it can be incredibly long sometimes as there can be many email recipients.
I would like to take this column, and split the values into multiple rows which would then be inserted into another table. Problem is anything I am trying is either taking too long or not working the way I want.
Take this example data:
message-id                                              recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com   user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com     user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user3@domain3.com;user4@domain4.com;user5@domain5.com
I would like this to be formatted as followed in my Recipients table:
message-id                                              recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com   user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com     user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user5@domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried that first, but a foreach loop over even 28K records took forever to process. I need something that will run as quickly/efficiently as possible.
Thanks!
If you are on SQL Server 2016+
You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.
SELECT s.[message-id], f.value
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;
If you are still on a version prior to SQL Server 2016
Create a split function. This is just one of many examples out there:
CREATE FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
I've discussed a few others here, here, and a better approach than splitting in the first place here.
Now you can extrapolate simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
SQL Server 2016 includes a new table function STRING_SPLIT(), similar to the previous solution.
The only requirement is to set the compatibility level to 130 (SQL Server 2016).
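Setting the compatibility level is a one-liner (substitute your database name):
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 130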
You may use CROSS APPLY (available in SQL Server 2005 and above) and STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE @delimiter nvarchar(255) = ';';
-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));
-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');
-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, @delimiter)
-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;
-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId   Recipients
----------- ----------------------------------------------------
1           user1@domain.com; user2@domain.com; user3@domain.com
2           user@domain1.com; user@domain2.com
and
MessageId   Recipient
----------- ----------------
1           user1@domain.com
1           user2@domain.com
1           user3@domain.com
2           user@domain1.com
2           user@domain2.com
for table = "yelp_business", split the column categories values separated by ; into rows and display as category column.
SELECT unnest(string_to_array(categories, ';')) AS category
FROM yelp_business;
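Note that unnest/string_to_array is PostgreSQL syntax. On SQL Server 2016+ the equivalent would presumably be:
SELECT value AS category
FROM yelp_business
CROSS APPLY STRING_SPLIT(categories, ';');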

How hard would you try to make your SQL queries secure?

I am in a situation where I am given a comma-separated VarChar as input to a stored procedure. I want to do something like this:
SELECT * FROM tblMyTable
INNER JOIN /*Bunch of inner joins here*/
WHERE ItemID IN ($MyList);
However, you can't use a VarChar with the IN statement. There are two ways to get around this problem:
(The Wrong Way) Create the SQL query in a String, like so:
SET $SQL = '
SELECT * FROM tblMyTable
INNER JOIN /*Bunch of inner joins here*/
WHERE ItemID IN (' + $MyList + ');
EXEC($SQL);
(The Right Way) Create a temporary table that contains the values of $MyList, then join that table in the initial query.
My question is:
Option 2 has a relatively large performance hit with creating a temporary table, which is less than ideal.
While Option 1 is open to an SQL injection attack, since my SPROC is being called from an authenticated source, does it really matter? Only trusted sources will execute this SPROC, so if they choose to bugger up the database, that is their prerogative.
So, how far would you go to make your code secure?
What database are you using? In SQL Server you can create a split function that can split a long string and return a table in sub-second time. You use the table function call like a regular table in a query (no temp table necessary).
You need to create a split function, or if you have one just use it. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID = s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers table method to work, you need to do this one-time table setup, which will create a table Numbers containing rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
You can use the CSV string like this, no temp table necessary:
SELECT * FROM tblMyTable
INNER JOIN /*Bunch of inner joins here*/
WHERE ItemID IN (select ListValue from dbo.FN_ListToTable(',',$MyList));
I would personally prefer option 2. Just because a source is authenticated does not mean you should let your guard down; you would leave yourself open to potential rights escalations where an authenticated low-level user is still able to execute commands against the database that you had not intended.
The phrase you use, 'trusted sources': it might be better to assume an X-Files approach and trust no one.
If someone buggers up the database you might still be getting a call.
A good option that is similar to option two is to use a function to create a table in memory from the CSV list. It is reasonably fast and offers the protections of option two. Then that table can be joined to the Inner Join, e.g.
CREATE FUNCTION [dbo].[simple_strlist_to_tbl] (@list nvarchar(MAX))
RETURNS @tbl TABLE (str varchar(4000) NOT NULL) AS
BEGIN
DECLARE @pos int,
@nextpos int,
@valuelen int
SELECT @pos = 0, @nextpos = 1
WHILE @nextpos > 0
BEGIN
SELECT @nextpos = charindex(',', @list, @pos + 1)
SELECT @valuelen = CASE WHEN @nextpos > 0
THEN @nextpos
ELSE len(@list) + 1
END - @pos - 1
INSERT @tbl (str)
VALUES (substring(@list, @pos + 1, @valuelen))
SELECT @pos = @nextpos
END
RETURN
END
Then in the join:
tblMyTable INNER JOIN
simple_strlist_to_tbl(@MyList) list ON tblMyTable.itemId = list.str
Option 3 is to confirm each item in the list is in fact an integer before concatenating the string to your SQL statement.
Do this by parsing the input string (e.g., split into an array), loop through and convert each value to an int, and then recreate the list yourself before concatenating back to the SQL statement. This will give you reasonable assurance that SQL injection cannot occur.
It is safer to concatenate strings that have been created by your application, because you can do things like check for int, but it also means your code is written in a way that a subsequent developer may modify slightly, thus opening back up the risk of SQL injection, because they do not realize that is what your code is protecting against. Make sure you comment well what you are doing if you go this route.
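A sketch of that validation in T-SQL, reusing the dbo.FN_ListToTable split function defined earlier in this thread (the digits-only filter guarantees the CAST cannot fail):
DECLARE @MyList varchar(8000), @SafeList varchar(8000)
SET @MyList = '1,2,3,666,bad''value,4'
SELECT @SafeList = COALESCE(@SafeList + ',', '')
                 + CAST(CAST(ListValue AS int) AS varchar(12))
FROM dbo.FN_ListToTable(',', @MyList)
WHERE ListValue NOT LIKE '%[^0-9]%' -- keep digits-only tokens
-- @SafeList now holds '1,2,3,666,4' and is safe to concatenate into dynamic SQL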
A third option: pass the values to the stored procedure in an array. Then you can either assemble the comma-separated string in your code and use the dynamic SQL option, or (if your flavour of RDBMS permits it) use the array directly in the SELECT statement.
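On SQL Server 2008 and later, that array idea maps naturally onto a table-valued parameter (a sketch; the type and procedure names are illustrative):
CREATE TYPE dbo.IntList AS TABLE (ItemID int PRIMARY KEY)
GO
CREATE PROCEDURE dbo.GetItemsByIds
    @Ids dbo.IntList READONLY
AS
BEGIN
    SELECT t.*
    FROM tblMyTable t
    INNER JOIN @Ids i ON i.ItemID = t.ItemID
END
GO
The client then sends the IDs as rows (e.g. a DataTable from ADO.NET), so no string is ever concatenated and injection is impossible by construction.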
Why don't you write a CLR split function that will do the whole job nice and easy? You can write user-defined table functions which return a table, doing the string splitting with the .NET infrastructure. Hell, in SQL 2008 you can even give them hints if they return the strings sorted in any way (like ascending), which can help the optimizer.
Or maybe you can't do CLR integration; then you have to stick to T-SQL, but I personally would go for the CLR solution.

SQL Inline or Scalar Function?

So I need an SQL function that will concatenate a bunch of row values into one varchar.
I have the functions written but right now I'm focused on what is the better choice for performance.
The Scalar Function is
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS varchar(max)
AS
BEGIN
DECLARE @patients varchar(max)
SET @patients = ''
SELECT @patients = @patients + convert(varchar, Patient) + ';' FROM RecipientsPatients WHERE Recipient = @recipient
RETURN @patients
END
The Inline Function just returns a table of all the values instead of concatenating them.
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS TABLE
AS
RETURN
(
SELECT Patient FROM RecipientsPatients WHERE Recipient = @recipient
)
I would then take this table in a separate function and concatenate them together. I was thinking the second choice is best since I will be going row by row through a smaller data set. Any opinions on what I'm doing right/wrong would be appreciated.
Thanks
This problem of string concatenation in SQL Server has several solutions, and the pros and cons are discussed in Concatenating Row Values in Transact-SQL and other similar articles on the web.
My favourite solution is using the FOR XML PATH('') trick. The chained variable assignment method you use works fine, although it is not officially supported and hence may break in the future. Your method should be among the fastest possible, if not the fastest, as long as the table-valued function does not perform a full scan, i.e. you have an index on Recipient that covers Patient (use INCLUDE).
The only thing I would add is to declare both functions WITH SCHEMABINDING, this has side effects that improve performance.
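For the inline version that would look something like this (a sketch; note that WITH SCHEMABINDING requires two-part object names inside the function):
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Patient FROM dbo.RecipientsPatients WHERE Recipient = @recipient
)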
See here for an example of using the FOR XML PATH trick
set nocount on;
declare @t table (id int, name varchar(20), x char(1))
insert into @t (id, name, x)
select 1,'test1', 'a' union
select 1,'test1', 'b' union
select 1,'test1', 'c' union
select 2,'test2', 'a' union
select 2,'test2', 'c' union
select 3,'test3', 'b' union
select 3,'test3', 'c'
SELECT p1.id, p1.name,
stuff((SELECT ', ' + x
FROM @t p2
WHERE p2.id = p1.id
ORDER BY name, x
FOR XML PATH('') ), 1,2, '') AS p3
FROM @t p1
GROUP BY
id, name
it returns
1 test1 a, b, c
2 test2 a, c
3 test3 b, c
Have a look at Adam Machanic's results from his Grouped String Concatenation Contest:
http://web.archive.org/web/20150328021904/http://sqlblog.com/blogs/adam_machanic/archive/2009/05/31/grouped-string-concatenation-the-winner-is.aspx
It has the code to show you the most efficient way to do this. Peter Larsson, who won the contest, used a combination of tricks including XML PATH to accomplish the task. There was some debate later about whether it was the most efficient solution based on subsequent tests of other submissions. Make sure you check the comments to know what scripts to look at in the zip file you can download there. Generally FOR XML PATH('') is the fastest though.