Efficiently replacing many characters from a string - sql

I would like to know the most efficient way of removing any occurrence of characters like , ; / " from a varchar column.
I have a function like this but it is incredibly slow. The table has about 20 million records.
CREATE FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
DECLARE @o VARCHAR(255)
SET @o = Replace(@s, '/', '')
SET @o = Replace(@o, '-', '')
SET @o = Replace(@o, ';', '')
SET @o = Replace(@o, '"', '')
RETURN @o
END

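Whichever method you use, it is probably worth adding a
WHERE YourCol LIKE '%[-/;"]%'
unless you suspect that a very large proportion of rows will in fact contain at least one of the characters that need to be stripped. The hyphen is listed first inside the brackets so that it is treated as a literal; written as [/-;"] it would be read as a range from / to ; and would also match other characters, such as digits.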
As you are using this in an UPDATE statement, simply adding the WITH SCHEMABINDING attribute can massively improve things: it allows the UPDATE to proceed row by row rather than needing to cache the entire operation in a spool first for Halloween Protection.
Nested REPLACE calls in T-SQL are slow anyway, though, as they involve multiple passes through the strings.
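A minimal sketch of the two suggestions above used together. The table dbo.YourTable, its column YourCol, and the function name Udf_getcleanedstring_sb are hypothetical names chosen for illustration:

```sql
-- Hypothetical demo table; substitute your own table and column.
CREATE TABLE dbo.YourTable (Id INT IDENTITY PRIMARY KEY, YourCol VARCHAR(255));
INSERT INTO dbo.YourTable (YourCol) VALUES ('a/b-c'), ('clean'), ('x;"y');
GO

-- Same logic as the original function, but schema-bound so that SQL Server
-- can tell the function does not access any data.
CREATE FUNCTION dbo.Udf_getcleanedstring_sb (@s VARCHAR(255))
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS
BEGIN
    RETURN REPLACE(REPLACE(REPLACE(REPLACE(@s, '/', ''), '-', ''), ';', ''), '"', '');
END
GO

-- Only touch rows that contain at least one of the unwanted characters.
UPDATE dbo.YourTable
SET    YourCol = dbo.Udf_getcleanedstring_sb(YourCol)
WHERE  YourCol LIKE '%[-/;"]%';
```

Comparing the actual execution plan with and without WITH SCHEMABINDING should show whether the spool disappears on your version of SQL Server.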
You could knock up a CLR function as below (if you haven't worked with these before then they are very easy to deploy from an SSDT project as long as CLR execution is permitted on the server). The UPDATE plan for this too does not contain a spool.
The regular expression uses (?:) to denote a non-capturing group, with the various characters of interest separated by the alternation character | as /|-|;|\" (the " needs to be escaped in the C# string literal, so it is preceded by a backslash).
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
private static readonly Regex regexStrip =
new Regex("(?:/|-|;|\")", RegexOptions.Compiled);
[SqlFunction]
public static SqlString StripChars(SqlString Input)
{
return Input.IsNull ? null : regexStrip.Replace((string)Input, "");
}
}
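If CLR is not an option and the server runs SQL Server 2017 or later, the built-in TRANSLATE function is another way to avoid most of the nested calls. This is a different technique from the CLR approach above, shown here only as a sketch: every unwanted character is mapped to one placeholder character, which a single REPLACE then removes.

```sql
-- Requires SQL Server 2017+ (TRANSLATE).
-- The second and third arguments of TRANSLATE must have the same length.
DECLARE @s VARCHAR(255) = 'a/b-c;d"e';

SELECT REPLACE(TRANSLATE(@s, '/-;"', '////'), '/', '') AS Cleaned;  -- returns 'abcde'
```

The placeholder has to be one of the characters being removed (here /), otherwise legitimate occurrences of it would be stripped as well.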

I want to show the huge performance difference between two types of user-defined function:
a user TABLE function
a user SCALAR function
See the test example:
use AdventureWorks2012
go
-- create table for the test
create table dbo.FindString (ColA int identity(1,1) not null primary key,ColB varchar(max) );
go
-- the declare and the insert form one batch, which "go 1000000" runs 1,000,000 times
declare @text varchar(max) = 'A web server can handle a Hypertext Transfer Protocol request either by reading
a file from its file ; system based on the URL <> path or by handling the request using logic that is specific
to the type of resource. In the case that special logic is invoked the query string will be available to that logic
for use in its processing, along with the path component of the URL.';
insert into dbo.FindString(ColB)
select @text
go 1000000
-- use one of the scalar functions posted in the answers to this thread
alter function [dbo].[udf_getCleanedString]
(
@s varchar(max)
)
returns varchar(max)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
go
--
-- create a table function from the scalar function above
create function [dbo].[utf_getCleanedString]
(
@s varchar(max) -- same type as the scalar version, so ColB is not truncated and the comparison is like for like
)
returns table
as return
(
select replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','') as String
)
go
--
-- clearing the buffer cache
DBCC DROPCLEANBUFFERS ;
go
-- update process using the USER TABLE FUNCTION
update Dest with(rowlock) set
dest.ColB = D.String
from dbo.FindString dest
cross apply utf_getCleanedString(dest.ColB) as D
go
DBCC DROPCLEANBUFFERS ;
go
-- update process using USER SCALAR FUNCTION
update Dest with(rowlock) set
dest.ColB = dbo.udf_getCleanedString(dest.ColB)
from dbo.FindString dest
go
And these are the execution plans (the plan images are not reproduced here).
As you can see, the table function performs much better than the scalar function. Both do the same thing, replacing characters in a string, but one returns a scalar and the other returns a table.
Another important measure to look at is the I/O statistics (SET STATISTICS IO ON;).
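A short sketch of how the I/O and timing figures could be collected for one of the updates, assuming the dbo.FindString table and the utf_getCleanedString function from the test above already exist:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Run the statement to be measured; the figures appear on the Messages tab.
UPDATE dest
SET    dest.ColB = D.String
FROM   dbo.FindString AS dest
CROSS APPLY dbo.utf_getCleanedString(dest.ColB) AS D;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Repeating the same wrapper around the scalar-function update gives two sets of numbers to compare.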

How about nesting them together in a single call:
create function [dbo].[udf_getCleanedString]
(
@s varchar(255)
)
returns varchar(255)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
Or you may want to do an UPDATE on the table itself for the first time. Scalar functions are pretty slow.
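A sketch of what updating the table directly could look like, with the nested REPLACE written inline so that no scalar function is called per row. The table variable and its column name are stand-ins for the real table:

```sql
-- Stand-in for the real table.
DECLARE @t TABLE (YourCol VARCHAR(255));
INSERT INTO @t (YourCol) VALUES ('a/b-c'), ('clean'), ('x;"y');

-- Inline expression instead of a scalar UDF call.
UPDATE @t
SET    YourCol = REPLACE(REPLACE(REPLACE(REPLACE(YourCol, '/', ''), '-', ''), ';', ''), '"', '')
WHERE  YourCol LIKE '%[-/;"]%';   -- skip rows that are already clean

SELECT YourCol FROM @t;
```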

Here is a similar question asked previously, I like this approach mentioned here.
How to Replace Multiple Characters in SQL?
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '#'
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'

Related

How to replace all special characters in string

I have a table with the following columns:
dbo.SomeInfo
- Id
- Name
- InfoCode
Now I need to update the above table's InfoCode as
Update dbo.SomeInfo
Set InfoCode= REPLACE(Replace(RTRIM(LOWER(Name)),' ','-'),':','')
This replaces all spaces with - & lowercase the name
When I do check the InfoCode, I see there are Names with some special characters like
Cathe Friedrich''s Low Impact
coffeyfit-cardio-box-&-burn
Jillian Michaels: Cardio
Then I am manually writing the update sql against this as
Update dbo.SomeInfo
SET InfoCode= 'cathe-friedrichs-low-impact'
where Name ='Cathe Friedrich''s Low Impact '
Now, this solution is not realistic for me. I checked the following links related to Regex & others around it.
UPDATE and REPLACE part of a string
https://www.codeproject.com/Questions/456246/replace-special-characters-in-sql
But none of them is hitting the requirement.
What I need is: if there is any character other than [a-z0-9], replace it with -, and there should also be no consecutive -- in InfoCode.
The above Update sql has set some values of InfoCode as the-dancer's-workout®----starter-package
Some Names have value as
Sleek Technique™
The Dancer's-workout®
How can I write Update sql that could handle all such special characters?
Using NGrams8K you could split the string into characters and then rather than replacing every non-acceptable character, retain only certain ones:
SELECT (SELECT '' + CASE WHEN N.token COLLATE Latin1_General_BIN LIKE '[A-Za-z0-9]' THEN N.token ELSE '-' END
FROM dbo.NGrams8k(V.S,1) N
ORDER BY N.position
FOR XML PATH(''))
FROM (VALUES('Sleek Technique™'),('The Dancer''s-workout®'))V(S);
I use COLLATE here as on my default collation in my instance the '™' is ignored, therefore I use a binary collation. You may want to use COLLATE to switch the string back to its original collation outside of the subquery.
This approach is fully inlinable:
First we need a mock-up table with some test data:
DECLARE @SomeInfo TABLE (Id INT IDENTITY, InfoCode VARCHAR(100));
INSERT INTO @SomeInfo (InfoCode) VALUES
('Cathe Friedrich''s Low Impact')
,('coffeyfit-cardio-box-&-burn')
,('Jillian Michaels: Cardio')
,('Sleek Technique™')
,('The Dancer''s-workout®');
--This is the query
WITH cte AS
(
SELECT 1 AS position
,si.Id
,LOWER(si.InfoCode) AS SourceText
,SUBSTRING(LOWER(si.InfoCode),1,1) AS OneChar
FROM @SomeInfo si
UNION ALL
SELECT cte.position +1
,cte.Id
,cte.SourceText
,SUBSTRING(LOWER(cte.SourceText),cte.position+1,1) AS OneChar
FROM cte
WHERE position < DATALENGTH(SourceText)
)
,Cleaned AS
(
SELECT cte.Id
,(
SELECT CASE WHEN ASCII(cte2.OneChar) BETWEEN 65 AND 90 --A-Z
OR ASCII(cte2.OneChar) BETWEEN 97 AND 122--a-z
OR ASCII(cte2.OneChar) BETWEEN 48 AND 57 --0-9
--You can easily add more ranges
THEN cte2.OneChar ELSE '-'
--You can easily nest another CASE to deal with special characters like the single quote in your examples...
END
FROM cte AS cte2
WHERE cte2.Id=cte.Id
ORDER BY cte2.position
FOR XML PATH('')
) AS normalised
FROM cte
GROUP BY cte.Id
)
,NoDoubleHyphens AS
(
SELECT REPLACE(REPLACE(REPLACE(normalised,'-','<>'),'><',''),'<>','-') AS normalised2
FROM Cleaned
)
SELECT CASE WHEN RIGHT(normalised2,1)='-' THEN SUBSTRING(normalised2,1,LEN(normalised2)-1) ELSE normalised2 END AS FinalResult
FROM NoDoubleHyphens;
The first CTE will recursively (well, rather iteratively) traverse the string character by character and return a very slim set with one row per character.
The second CTE will then GROUP BY the Ids. This allows for a correlated sub-query, where the actual check is performed using ASCII ranges. FOR XML PATH('') is used to re-concatenate the string. With SQL Server 2017+ I'd suggest using STRING_AGG() instead.
The third CTE uses a well-known trick to get rid of multiple occurrences of a character. Take any two characters which will never occur in your string; I use < and >. A string like a--b---c will come back as a<><>b<><><>c. After replacing >< with nothing we get a<>b<>c, and replacing <> with - then gives a-b-c.
The final SELECT will cut away a trailing hyphen. If needed, you can add similar logic to get rid of a leading hyphen. With SQL Server 2017+ there is TRIM('-' FROM ...) to make this easier.
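For SQL Server 2017+, the STRING_AGG() and TRIM() variants mentioned above could look roughly like the following. This is a sketch, not the original answer's code: the recursive CTE is swapped for a small numbers list built from sys.all_objects (used only as a convenient row source), and the sample table is a shortened copy of the mock-up.

```sql
-- Requires SQL Server 2017+ (STRING_AGG, TRIM).
DECLARE @SomeInfo TABLE (Id INT IDENTITY, InfoCode VARCHAR(100));
INSERT INTO @SomeInfo (InfoCode) VALUES
 ('Cathe Friedrich''s Low Impact')
,('coffeyfit-cardio-box-&-burn')
,('Jillian Michaels: Cardio');

WITH Numbers AS
(
    -- 1..100, enough positions for a VARCHAR(100) column
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
    ORDER BY n
)
,Cleaned AS
(
    SELECT si.Id
          ,STRING_AGG(CASE WHEN ASCII(c.OneChar) BETWEEN 97 AND 122 --a-z (input is lowercased)
                             OR ASCII(c.OneChar) BETWEEN 48 AND 57  --0-9
                           THEN c.OneChar ELSE '-' END, '')
             WITHIN GROUP (ORDER BY nr.n) AS normalised
    FROM @SomeInfo si
    INNER JOIN Numbers nr ON nr.n <= LEN(si.InfoCode)
    CROSS APPLY (SELECT SUBSTRING(LOWER(si.InfoCode), nr.n, 1) AS OneChar) c
    GROUP BY si.Id
)
SELECT Id
       -- collapse runs of hyphens, then trim hyphens from both ends
      ,TRIM('-' FROM REPLACE(REPLACE(REPLACE(normalised,'-','<>'),'><',''),'<>','-')) AS FinalResult
FROM Cleaned;
```

WITHIN GROUP (ORDER BY ...) is what keeps the characters in their original order; without it the concatenation order is not guaranteed.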
The result
cathe-friedrich-s-low-impact
coffeyfit-cardio-box-burn
jillian-michaels-cardio
sleek-technique
the-dancer-s-workout
You can create a User-Defined-Function for something like that.
Then use the UDF in the update.
CREATE FUNCTION [dbo].LowerDashString (@str varchar(255))
RETURNS varchar(255)
AS
BEGIN
DECLARE @result varchar(255);
DECLARE @chr varchar(1);
DECLARE @pos int;
SET @result = '';
SET @pos = 1;
-- lowercase the input and remove the single-quotes
SET @str = REPLACE(LOWER(@str),'''','');
-- loop through the characters
-- while replacing anything that's not a letter with a dash
WHILE @pos <= LEN(@str)
BEGIN
SET @chr = SUBSTRING(@str, @pos, 1)
IF @chr LIKE '[a-z]' SET @result += @chr;
ELSE SET @result += '-';
SET @pos += 1;
END;
-- SET @result = TRIM('-' FROM @result); -- SqlServer 2017 and beyond
-- multiple dashes to one dash
WHILE @result LIKE '%--%' SET @result = REPLACE(@result,'--','-');
RETURN @result;
END;
GO
Example snippet using the function:
-- using a table variable for demonstration purposes
declare @SomeInfo table (Id int primary key identity(1,1) not null, InfoCode varchar(100) not null);
-- sample data
insert into @SomeInfo (InfoCode) values
('Cathe Friedrich''s Low Impact'),
('coffeyfit-cardio-box-&-burn'),
('Jillian Michaels: Cardio'),
('Sleek Technique™'),
('The Dancer''s-workout®');
update @SomeInfo
set InfoCode = dbo.LowerDashString(InfoCode)
where (InfoCode LIKE '%[^A-Z-]%' OR InfoCode != LOWER(InfoCode));
select *
from @SomeInfo;
Result:
Id InfoCode
-- -----------------------------
1 cathe-friedrichs-low-impact
2 coffeyfit-cardio-box-burn
3 jillian-michaels-cardio
4 sleek-technique-
5 the-dancers-workout-
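Rows 4 and 5 above still end in a dash. On versions before SQL Server 2017, where TRIM is not available, the leftover dash could be removed with something like the sketch below. It operates on a local variable for illustration; inside the function the same two statements would go just before the RETURN, after the dashes have been collapsed.

```sql
DECLARE @result VARCHAR(255) = 'sleek-technique-';

-- At this point at most one dash can remain at each end.
IF RIGHT(@result, 1) = '-' SET @result = LEFT(@result, LEN(@result) - 1);
IF LEFT(@result, 1) = '-'  SET @result = STUFF(@result, 1, 1, '');

SELECT @result AS Trimmed;  -- returns 'sleek-technique'
```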

SQL Server 2012: Remove text from end of string

I'm new to SQL so please forgive me if I use incorrect terminology and my question sounds confused.
I've been tasked with writing a stored procedure which will be sent 3 variables as strings (varchar I think). I need to take two of the variables and remove text from the end of the variable and only from the end.
The strings/text I need to remove from the end of the variables are
co
corp
corporation
company
lp
llc
ltd
limited
For example this string
Global Widgets LLC
would become
Global Widgets
However it should only apply once so
Global Widgets Corporation LLC
Should become
Global Widgets Corporation
I then need to use the altered variables to do a SQL query.
This is to be used as a backup for an integration piece we have which makes a callout to another system. The other system takes the same variables and uses Regex to remove the strings from the end of variables.
I've tried different combinations of PATINDEX, SUBSTRING, REPLACE, STUFF but cannot seem to come up with something that will do the job.
===============================================================
Edit: I want to thank everyone for the answers provided so far, but I left out some information that I didn't think was important but judging by the answers seems like it would affect the processing.
My proc will start something like
ALTER PROC [dbo].[USP_MyDatabaseTable] @variableToBeAltered nvarchar(50)
AS
I will then need to remove all , and . characters. I've already figured out how to do this. I will then need to do the processing on @variableToBeAltered (technically there will be two variables) to remove the strings I listed previously. I must then remove all spaces from @variableToBeAltered. (Again I figured that part out). Then finally I will use @variableToBeAltered in my SQL query something like
SELECT [field1] AS myField
,[field2] AS myOtherField
FROM [MyData].[dbo].[MyDatabaseTable]
WHERE [field1] = (@variableToBeAltered);
I hope this information is more useful.
I'd keep all of your suffixes in a table to make this a little easier. You can then perform code like this either within a query or against a variable.
DECLARE @company_name VARCHAR(50) = 'Global Widgets Corporation LLC'
DECLARE @Suffixes TABLE (suffix VARCHAR(20))
INSERT INTO @Suffixes (suffix) VALUES ('LLC'), ('CO'), ('CORP'), ('CORPORATION'), ('COMPANY'), ('LP'), ('LTD'), ('LIMITED')
SELECT @company_name = SUBSTRING(@company_name, 1, LEN(@company_name) - LEN(suffix))
FROM @Suffixes
WHERE @company_name LIKE '%' + suffix
SELECT @company_name
The keys here are that you are only matching with strings that end in the suffix and it uses SUBSTRING rather than REPLACE to avoid accidentally removing copies of any of the suffixes from the middle of the string.
The @Suffixes table is a table variable here, but it makes more sense for you to just create it and fill it as a permanent table.
The query will just find the one row (if any) whose suffix matches the end of your string. If a match is found, the variable is set to a substring with the length of the suffix removed from the end. There will usually be a trailing space left behind; LEN and = comparisons ignore trailing spaces, but wrap the result in RTRIM if you need it gone.
There are still a couple of potential issues to be aware of though...
First, if you have a company name like "Watco" then the "co" would be a false positive here. I'm not sure what can be done about that other than maybe making your suffixes include a leading space.
Second, if one suffix ends with one of your other suffixes then the ordering that they get applied could be a problem. You could get around this by only applying the row with the greatest length for suffix, but it gets a little more complicated, so I've left that out for now.
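Both caveats can be handled in a few extra lines. The following is a sketch under the assumption that suffixes are always separated from the name by a space: the LIKE pattern includes that space, so "Watco" no longer matches "co", and TOP (1) with ORDER BY LEN(suffix) DESC applies only the longest matching suffix.

```sql
DECLARE @company_name VARCHAR(50) = 'Global Widgets Corporation LLC';
DECLARE @Suffixes TABLE (suffix VARCHAR(20));
INSERT INTO @Suffixes (suffix)
VALUES ('LLC'), ('CO'), ('CORP'), ('CORPORATION'), ('COMPANY'), ('LP'), ('LTD'), ('LIMITED');

-- Strip only the longest suffix that matches as a whole trailing word.
-- If nothing matches, the variable is left unchanged.
SELECT TOP (1)
       @company_name = RTRIM(LEFT(@company_name, LEN(@company_name) - LEN(suffix)))
FROM   @Suffixes
WHERE  @company_name LIKE '% ' + suffix
ORDER  BY LEN(suffix) DESC;

SELECT @company_name;  -- returns 'Global Widgets Corporation'
```

The match is case-insensitive only under a case-insensitive collation, which is the usual default.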
Building on the answer given by Tom H, but applying across the entire table:
set nocount on;
declare @suffixes table(tag nvarchar(20));
insert into @suffixes values('co');
insert into @suffixes values('corp');
insert into @suffixes values('corporation');
insert into @suffixes values('company');
insert into @suffixes values('lp');
insert into @suffixes values('llc');
insert into @suffixes values('ltd');
insert into @suffixes values('limited');
declare @companynames table(entry nvarchar(100),processed bit default 0);
insert into @companynames values('somecompany llc',0);
insert into @companynames values('business2 co',0);
insert into @companynames values('business3',0);
insert into @companynames values('business4 lpx',0);
while exists(select * from @companynames where processed = 0)
begin
declare @currentcompanyname nvarchar(100) = (select top 1 entry from @companynames where processed = 0);
update @companynames set processed = 1 where entry = @currentcompanyname;
-- restrict the update to the current row; otherwise every pass strips a suffix from all rows again
update @companynames
set entry = SUBSTRING(entry, 1, LEN(entry) - LEN(tag))
from @suffixes
where entry = @currentcompanyname
and entry like '%' + tag
end
select * from @companynames
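The loop is not strictly needed. A set-based sketch of the same idea, using CROSS APPLY so that each row is joined to at most one suffix (the longest one that matches as a trailing word) and is therefore stripped only once:

```sql
DECLARE @suffixes TABLE (tag NVARCHAR(20));
INSERT INTO @suffixes
VALUES (N'co'), (N'corp'), (N'corporation'), (N'company'), (N'lp'), (N'llc'), (N'ltd'), (N'limited');

DECLARE @companynames TABLE (entry NVARCHAR(100));
INSERT INTO @companynames
VALUES (N'somecompany llc'), (N'business2 co'), (N'business3'), (N'business4 lpx'),
       (N'Global Widgets Corporation LLC');

-- Rows with no matching suffix produce no row from the APPLY and stay untouched.
UPDATE c
SET    entry = RTRIM(LEFT(c.entry, LEN(c.entry) - LEN(s.tag)))
FROM   @companynames AS c
CROSS APPLY (SELECT TOP (1) tag
             FROM   @suffixes
             WHERE  c.entry LIKE N'% ' + tag
             ORDER  BY LEN(tag) DESC) AS s;

SELECT entry FROM @companynames;
```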
You can use a query like below:
-- Assuming that you can maintain all patterns in a table or a temp table
CREATE TABLE tbl(pattern varchar(100))
INSERT INTO tbl values
('co'),('llc'),('beta')
--@a stores the string you need to manipulate, @lw & @b are variables to aid
DECLARE @a nvarchar(100), @b nvarchar(100), @lw varchar(100)
SET @a='alpha beta gamma'
SET @b=''
-- @t is a flag
DECLARE @t int
SET @t=0
-- Below is a loop: keep going while no pattern has been found and there is text left
WHILE(@t=0 AND LEN(@a)>0 )
BEGIN
-- Store the current last word in the @lw variable
SET @lw=reverse(substring(reverse(@a),1, charindex(' ', reverse(@a)) -1))
-- check if the word is in pattern dictionary. If yes, then Voila!
SELECT @t=1 FROM tbl WHERE @lw like pattern
-- remove the last word from @a
SET @a=LEFT(@a,LEN(@a)-LEN(@lw))
IF (@t<>1)
BEGIN
-- all words which were not pattern are joined back onto this stack
SET @b=CONCAT(@lw,@b)
END
END
-- get back the remaining word
SET @a=CONCAT(@a,@b)
SELECT @a
drop table tbl
Do note that this method overcomes Tom's problem of
if you have a company name like "Watco" then the "co" would be a false positive here. I'm not sure what can be done about that other than maybe making your suffixes include a leading space.
Use the REPLACE function in SQL Server 2012:
declare @var1 nvarchar(20) = 'ACME LLC'
declare @var2 nvarchar(20) = 'LLC'
SELECT CASE
WHEN ((PATINDEX('%'+@var2+'%',@var1) <= (LEN(@var1)-LEN(@var2)))
Or (SUBSTRING(@var1,PATINDEX('%'+@var2+'%',@var1)-1,1) <> SPACE(1)))
THEN @var1
ELSE
REPLACE(@var1,@var2,'')
END
Here is another way to overcome the 'Runco Co' situation.
declare @var1 nvarchar(20) = REVERSE('Runco Co')
declare @var2 nvarchar(20) = REVERSE('Co')
Select REVERSE(
CASE WHEN(CHARINDEX(' ',@var1) > LEN(@var2)) THEN
SUBSTRING(@var1,PATINDEX('%'+@var2+'%',@var1)+LEN(@var2),LEN(@var1)-LEN(@var2))
ELSE
@var1
END
)

Escape SQL function string parameter within query

I have a SQL view that calls a scalar function with a string parameter. The problem is that the string occasionally has special characters which causes the function to fail.
The view query looks like this:
SELECT TOP (100) PERCENT
Id, Name, StartDate, EndDate
,dbo.[fnGetRelatedInfo] (Name) as Information
FROM dbo.Session
The function looks like this:
ALTER FUNCTION [dbo].[fnGetRelatedInfo]( @Name varchar(50) )
RETURNS varchar(200)
AS
BEGIN
DECLARE @Result varchar(200)
SELECT @Result = ''
SELECT @Result = @Result + Info + CHAR(13)+CHAR(10)
FROM [SessionInfo]
WHERE SessionName = @Name
RETURN @Result
END
How do I escape the name value so it will work when passed to the function?
I am guessing that the problem is Unicode characters in dbo.Session.Name. Since the parameter to the function is VARCHAR, it can only hold non-Unicode characters (those in the collation's code page), so any other characters are lost when being passed to the function. The solution for this would be to change the parameter to NVARCHAR(50).
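A quick way to see the effect is to assign an NVARCHAR value to a VARCHAR variable, which is what happens implicitly when the column value is passed to the VARCHAR parameter. The sample string is made up for illustration:

```sql
DECLARE @n NVARCHAR(50) = N'Tōkyō 東京';
DECLARE @v VARCHAR(50)  = @n;   -- implicit conversion to the collation's code page

-- With a typical Latin code page, characters that do not fit come back
-- as '?' or as a best-fit substitute, so the two columns differ.
SELECT @n AS AsNvarchar, @v AS AsVarchar;
```

A lookup such as WHERE SessionName = @Name can then fail to match simply because the parameter no longer holds the original text.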
However, if you care about performance, and more importantly consistent, reliable results stop using this function immediately. Alter your view to simply be:
SELECT s.ID,
s.Name,
s.StartDate,
s.EndDate,
( SELECT si.Info + CHAR(13)+CHAR(10)
FROM SessionInfo AS si
WHERE si.SessionName = s.Name
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)') AS Information
FROM dbo.Session AS s;
Using variable concatenation can lead to unexpected results which are dependent on the internal pathways of the execution plan. So I would rule this out as a solution immediately. Not only this, the RBAR nature of a scalar UDF means that this will not scale well at all.
Various ways of doing this grouped concatenation have been benchmarked here, where CLR is actually the winner, but this is not always an option.
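The TYPE directive and the .value() call in the query above are there for a reason that is easy to miss: plain FOR XML PATH('') entitizes XML special characters. A small sketch on made-up data shows the difference:

```sql
DECLARE @t TABLE (Info NVARCHAR(50));
INSERT INTO @t (Info) VALUES (N'A & B'), (N'x < y');

SELECT
    -- Without TYPE/.value(): & and < come back as &amp; and &lt;
    (SELECT Info + ';' FROM @t FOR XML PATH('')) AS Entitized,
    -- With TYPE/.value(): the original characters are preserved
    (SELECT Info + ';' FROM @t FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') AS Plain;
```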

update table with dynamic sql query

For a project, we are using a table (named txtTable) that contains all the texts. And each column contains a different language (for example column L9 is English, column L7 is German, etc..).
TextID L9 L7 L16 L10 L12
------------------------------------------------------
26 Archiving Archivierung NULL NULL NULL
27 Logging Protokollierung NULL NULL NULL
28 Comments Kommentar NULL NULL NULL
This table is located in a database on a Microsoft SQL Server 2005. The big problem is that this database name changes each time the program is restarted. This is a behavior typically for this third-party program and cannot be changed.
Next to this database and on the same server is our own database. In this database are several tables that point to the textID for generating data for reporting (SQL Server Reporting Services) in the correct language. This database contains also a table "ProjectSettings" with some properties like the name of the texttable database, and the stored procedures to generate the reporting data.
The way we now are requesting the right texts of the right language from this table with the changing database name is by creating a dynamic SQL query and execute it in a stored procedure.
Now we were wondering if there is a cleaner way to get the texts in the right language. We were thinking about creating a function with the textID and the language as a parameter, but we cannot find a good way to do this. We thought about a function so we just can use it in the select statement, but this doesn’t work:
CREATE FUNCTION [dbo].[GetTextFromLib]
(
@TextID int,
@LanguageColumn Varchar(5)
)
RETURNS varchar(255)
AS
BEGIN
-- return variables
DECLARE @ResultVar varchar(255)
-- Local variables
DECLARE @TextLibraryDatabaseName varchar(1000)
DECLARE @nvcSqlQuery varchar(1000)
-- get the report language database name
SELECT @TextLibraryDatabaseName = TextLibraryDatabaseName FROM ProjectSettings
SET @nvcSqlQuery = 'SELECT @ResultVar =' + @LanguageColumn + ' FROM [' + @TextLibraryDatabaseName + '].dbo.TXTTable WHERE TEXTID = ' + cast(@TextID as varchar(30))
EXEC(@nvcSqlQuery)
-- Return the result of the function
RETURN @ResultVar
END
Is there any way to work around this so we don’t have to use the dynamic sql in our stored procedures so it is only ‘contained’ in 1 function?
Thanks in advance & kind regards,
Kurt
Yes, it is possible with the help of synonym mechanism introduced with SQL Server 2005. So, you can create synonym during your setting up procedure based on data from ProjectSettings table and you can use it in your function. Your code will look something like this:
UPDATE: The code of function is commented here because it still contains dynamic SQL which does not work in function as Kurt said in his comment. New version of function is below this code.
-- Creating synonym for TXTTable table
-- somewhere in code when processing current settings
-- Suppose your synonym name is 'TextLibrary'
--
-- Drop previously created synonym
IF EXISTS (SELECT * FROM sys.synonyms WHERE name = N'TextLibrary')
DROP SYNONYM TextLibrary
-- Creating synonym using dynamic SQL
-- Local variables
DECLARE @TextLibraryDatabaseName varchar(1000)
DECLARE @nvcSqlQuery varchar(1000)
-- get the report language database name
SELECT @TextLibraryDatabaseName = TextLibraryDatabaseName FROM ProjectSettings
SET @nvcSqlQuery = 'CREATE SYNONYM TextLibrary FOR [' + @TextLibraryDatabaseName + '].dbo.TXTTable'
EXEC(#nvcSqlQuery)
-- Synonym created
/* UPDATE: This code is commented but left for discussion consistency
-- Function code
CREATE FUNCTION [dbo].[GetTextFromLib]
(
@TextID int,
@LanguageColumn Varchar(5)
)
RETURNS varchar(255)
AS
BEGIN
-- return variables
DECLARE @ResultVar varchar(255)
-- Local variables
DECLARE @nvcSqlQuery varchar(1000)
SET @nvcSqlQuery = 'SELECT @ResultVar =' + @LanguageColumn + ' FROM TextLibrary WHERE TEXTID = ' + cast(@TextID as varchar(30))
EXEC(@nvcSqlQuery)
-- Return the result of the function
RETURN @ResultVar
END
*/
UPDATE This is one more attempt to solve the problem. Now it uses some XML trick:
-- Function code
CREATE FUNCTION [dbo].[GetTextFromLib]
(
@TextID int,
@LanguageColumn Varchar(5)
)
RETURNS varchar(255)
AS
BEGIN
-- return variables
DECLARE @ResultVar varchar(255)
-- Local variables
DECLARE @XmlVar XML
-- Select required record into XML variable
-- XML has each table column value in element with corresponding name
SELECT @XmlVar = ( SELECT * FROM TextLibrary
WHERE TEXTID = @TextID
FOR XML RAW, ELEMENTS )
-- Select value of required element from XML
SELECT @ResultVar = Element.value('(.)[1]', 'varchar(255)')
FROM @XmlVar.nodes('/row/*') AS T(Element)
WHERE Element.value('local-name(.)', 'varchar(50)') = @LanguageColumn
-- Return the result of the function
RETURN @ResultVar
END
Hope this helps.
Credits to answerer of this question at Stackoverflow - How to get node name and values from an xml variable in t-sql
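To see the XML trick in isolation, here is a sketch that runs the same two steps against a small table variable standing in for the TextLibrary synonym (the table variable and its two language columns are illustrative only):

```sql
-- Stand-in for the TextLibrary synonym.
DECLARE @Txt TABLE (TextID INT, L9 VARCHAR(255), L7 VARCHAR(255));
INSERT INTO @Txt VALUES (26, 'Archiving', 'Archivierung'), (27, 'Logging', 'Protokollierung');

DECLARE @LanguageColumn VARCHAR(5) = 'L7';
DECLARE @XmlVar XML;

-- Step 1: one row as XML, one element per column:
-- <row><TextID>26</TextID><L9>Archiving</L9><L7>Archivierung</L7></row>
SELECT @XmlVar = (SELECT * FROM @Txt WHERE TextID = 26 FOR XML RAW, ELEMENTS);

-- Step 2: pick the element whose name equals the requested column name.
SELECT Element.value('(.)[1]', 'varchar(255)') AS Txt   -- returns 'Archivierung'
FROM   @XmlVar.nodes('/row/*') AS T(Element)
WHERE  Element.value('local-name(.)', 'varchar(50)') = @LanguageColumn;
```

Note that columns holding NULL produce no element under FOR XML RAW, ELEMENTS, so the function returns NULL for an untranslated text.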
To me, it sounds like a total PITA. However, how large is this table of texts, and how often does it change? If it stays fairly constant, why not run one dynamic query on a regular cycle (each morning, say) that reads the database whose name changes and synchronizes its contents into a table with a fixed name in YOUR database? All your queries then run against YOUR copy, and the dynamic SQL disappears from everything else. Yes, you would need a synchronizing stored procedure, but if it can be run on a schedule you should be fine.

How do I make a function in SQL Server that accepts a column of data?

I made the following function in SQL Server 2008 earlier this week that takes two parameters and uses them to select a column of "detail" records and returns them as a single varchar list of comma separated values. Now that I get to thinking about it, I would like to take this table and application-specific function and make it more generic.
I am not well-versed in defining SQL functions, as this is my first. How can I change this function to accept a single "column" worth of data, so that I can use it in a more generic way?
Instead of calling:
SELECT ejc_concatFormDetails(formuid, categoryName)
I would like to make it work like:
SELECT concatColumnValues(SELECT someColumn FROM SomeTable)
Here is my function definition:
CREATE FUNCTION [DNet].[ejc_concatFormDetails](@formuid AS int, @category as VARCHAR(75))
RETURNS VARCHAR(1000) AS
BEGIN
DECLARE @returnData VARCHAR(1000)
DECLARE @currentData VARCHAR(75)
DECLARE dataCursor CURSOR FAST_FORWARD FOR
SELECT data FROM DNet.ejc_FormDetails WHERE formuid = @formuid AND category = @category
SET @returnData = ''
OPEN dataCursor
FETCH NEXT FROM dataCursor INTO @currentData
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @returnData = @returnData + ', ' + @currentData
FETCH NEXT FROM dataCursor INTO @currentData
END
CLOSE dataCursor
DEALLOCATE dataCursor
RETURN SUBSTRING(@returnData,3,1000)
END
As you can see, I am selecting the column data within my function and then looping over the results with a cursor to build my comma separated varchar.
How can I alter this to accept a single parameter that is a result set and then access that result set with a cursor?
Others have answered your main question - but let me point out another problem with your function - the terrible use of a CURSOR!
You can easily rewrite this function to use no cursor, no WHILE loop - nothing like that. It'll be tons faster, and a lot easier, too - much less code:
CREATE FUNCTION DNet.ejc_concatFormDetails
(@formuid AS int, @category as VARCHAR(75))
RETURNS VARCHAR(1000)
AS
BEGIN
-- a scalar function body must be wrapped in BEGIN ... END
RETURN
SUBSTRING(
(SELECT ', ' + data
FROM DNet.ejc_FormDetails
WHERE formuid = @formuid AND category = @category
FOR XML PATH('')
), 3, 1000)
END
The trick is to use FOR XML PATH(''): this returns a concatenated list of your data columns and your fixed ', ' delimiters. Add a SUBSTRING() on that and you're done! No dog-slow CURSOR, no messy concatenation and all that gooey code; just one statement and that's all there is.
You can use table-valued parameters:
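-- a table-valued parameter needs a user-defined table type
-- (dbo.MyTableType is a placeholder name) and must be declared READONLY
CREATE TYPE dbo.MyTableType AS TABLE (
Column1 int,
Column2 nvarchar(50),
Column3 datetime
)
GO
CREATE FUNCTION MyFunction(
@Data dbo.MyTableType READONLY
)
RETURNS NVARCHAR(MAX)
AS BEGIN
/* here you can do what you want */
RETURN NULL -- placeholder so that the function compiles
END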
You can use Table Valued Parameters as of SQL Server 2008, which would allow you to pass a TABLE variable in as a parameter. The limitations and examples for this are all in that linked article.
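Put together for the question's use case, a table-valued parameter version could look like the sketch below. The type name dbo.StringList and its column Value are assumptions made for illustration; concatColumnValues is the name the question asked for. The concatenation itself uses FOR XML PATH rather than a cursor.

```sql
-- Hypothetical table type holding the single "column" of values.
CREATE TYPE dbo.StringList AS TABLE (Value VARCHAR(75) NOT NULL);
GO

CREATE FUNCTION dbo.concatColumnValues (@Data dbo.StringList READONLY)
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Build ', a, b, c' and strip the leading ', ' with STUFF.
    RETURN STUFF((SELECT ', ' + Value
                  FROM @Data
                  FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '');
END
GO

-- Usage: fill a variable of the table type, then pass it to the function.
DECLARE @values dbo.StringList;
INSERT INTO @values (Value) VALUES ('red'), ('green'), ('blue');

SELECT dbo.concatColumnValues(@values) AS Csv;
```

A subquery cannot be passed directly as the argument, so the call SELECT concatColumnValues(SELECT ...) from the question is not possible as written; the values have to go through a variable of the table type first. The order of the concatenated values is not guaranteed unless the inner SELECT has an ORDER BY.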
However, I'd also point out that using a cursor could well be painful for performance.
You don't need to use a cursor, as you can do it all in 1 SELECT statement:
SELECT @MyCSVString = COALESCE(@MyCSVString + ', ', '') + data
FROM DNet.ejc_FormDetails
WHERE formuid = @formuid AND category = @category
No need for a cursor
Your question is a bit unclear. In your first SQL statement it looks like you're trying to pass columns to the function, but there is no WHERE clause. In the second SQL statement you're passing a collection of rows (results from a SELECT). Can you supply some sample data and expected outcome?
Without fully understanding your goal, you could look into changing the parameter to be a table variable. Fill a table variable local to the calling code and pass that into the function. You could do that as a stored procedure though and wouldn't need a function.