My goal is to get the total number of occurrences of "the", "when", and "wolf" in this dataset.
The query should return a total count of 14.
Edit: "the" also counts when it appears inside another word, e.g. the "the" in "they" (7 + 4 + 3 = 14 across the three rows).
CREATE TABLE #BaseTable([Text] VARCHAR (500))
INSERT INTO #BaseTable ([Text]) VALUES
('When the villagers heard the cry, they came running up the hill to drive the wolf away.'),
('But, when they arrived, they saw no wolf.'),
('The boy was amused when seeing their angry faces.')
The version of SQL Server:
Microsoft SQL Server 2014 - 12.0.2000.8 (Intel X86)
Feb 20 2014 19:20:46
Copyright (c) Microsoft Corporation
Express Edition on Windows NT 6.3 <X64> (Build 9600: ) (WOW64) (Hypervisor)
You could do this with string functions and a lateral join:
select bt.*, x.*
from #basetable bt
cross apply (
    select sum(no_matches) no_matches
    from (
        select (len(bt.text) - len(replace(bt.text, w.word, ''))) / len(w.word) no_matches
        from (values ('the'), ('when'), ('wolf')) w(word)
    ) x
) x
If you want just the total number of matches in the resultset, then:
select sum(no_matches) total_matches
from #basetable bt
cross apply (
    select (len(bt.text) - len(replace(bt.text, w.word, ''))) / len(w.word) no_matches
    from (values ('the'), ('when'), ('wolf')) w(word)
) x
Demo on DB Fiddle
You can split the strings and count (note that STRING_SPLIT requires SQL Server 2016 or later):
select count(*)
from #BaseTable t cross apply
     string_split(replace(replace(t.text, '.', ''), ',', ''), ' ') s
where s.value in ('the', 'when', 'wolf');
SQL Server is not optimal for such string operations, but it is possible.
A very simplistic approach could look like this:
DECLARE @BaseTable TABLE ([Text] VARCHAR(255))
INSERT INTO @BaseTable ([Text]) VALUES
('When the villagers heard the cry, they came running up the hill to drive the wolf away.'),
('But, when they arrived, they saw no wolf.'),
('The boy was amused when seeing their angry faces.')
DECLARE @Val1 VARCHAR(255) = 'when'
      , @Val2 VARCHAR(255) = 'but'
      , @Val3 VARCHAR(255) = 'wolf'
SELECT (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val1, ''))) / LEN(@Val1)
     , (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val2, ''))) / LEN(@Val2)
     , (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val3, ''))) / LEN(@Val3)
FROM @BaseTable AS BT
The idea is to calculate the difference in length when the target word is replaced with nothing. But you don't really want to do this on the database side, if you have a choice.
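To make the arithmetic concrete, here is the same trick applied to a single literal (a minimal sketch, assuming the default case-insensitive collation):
-- 'they came to the hill' is 21 characters; removing both occurrences of 'the'
-- (including the one inside 'they') leaves 15, and (21 - 15) / LEN('the') = 2
SELECT (LEN('they came to the hill')
      - LEN(REPLACE('they came to the hill', 'the', ''))) / LEN('the') AS occurrences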
Edit:
Summing all the values is pretty straightforward:
SELECT SUM(val1) + SUM(val2) + SUM(val3)
FROM (
SELECT (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val1, ''))) / LEN(@Val1) AS val1
     , (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val2, ''))) / LEN(@Val2) AS val2
     , (LEN(BT.Text) - LEN(REPLACE(BT.Text, @Val3, ''))) / LEN(@Val3) AS val3
FROM @BaseTable AS BT
) AS subQ
If you have a version of SQL Server older than 2016 and STRING_SPLIT is not available then an ordinal splitter function, such as dbo.DelimitedSplit8K, could be used to separate the words.
dbo.DelimitedSplit8K
CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Query
select count(*)
from #BaseTable bt
cross apply dbo.DelimitedSplit8K(bt.[Text], ' ') ds
where ds.Item in ('the', 'when', 'wolf');
Output
9
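Note that splitting on spaces matches whole words only and leaves punctuation attached (e.g. 'wolf.'), which is why this returns 9 rather than the expected 14. If that matters, the nested REPLACE from the STRING_SPLIT answer above could be applied before splitting; a sketch:
select count(*)
from #BaseTable bt
cross apply dbo.DelimitedSplit8K(replace(replace(bt.[Text], '.', ''), ',', ''), ' ') ds
where ds.Item in ('the', 'when', 'wolf');
This still counts whole words only, so "they" and "their" remain excluded.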
I need to order a column sequentially by each delimited section. For example, given the sample data:
a
----------
120.1
120.2
120.12
120
130
120.2.22
120.2.41
120.3
I need to obtain the following output:
120
120.1
120.2
120.2.22
120.2.41
120.3
120.12
130
I use this query but it doesn't work:
Query:
Select a from b rpad(REPLACE(a, '.', ''),15,0),REPLACE(a, '.', '') ASC
I agree with all the comments from Sean Lange et al. Since you can have an unlimited number of delimited sections, you need a splitter. Once the values are split, you can apply the ORDER BY logic I have shown. I think the ORDER BY really explains the algorithm you were looking for.
Here is a way with a splitter:
declare @table table (col varchar(64))
insert into @table
values
('120.1'),
('120.2'),
('120.12'),
('120'),
('130'),
('120.2.22'),
('120.2.41.55.64.12'),
('120.3')
;with cte as(
select
*
from @table t
cross apply dbo.DelimitedSplit8K(t.col,'.')
pivot(
max(Item) for ItemNumber in ([1],[2],[3],[4],[5],[6],[7],[8]) --enter the number of possibilities
) p)
select
cte.*
from cte
order by
cast(isnull(cte.[1],0) as int)
,cast(isnull(cte.[2],0) as int)
,cast(isnull(cte.[3],0) as int)
,cast(isnull(cte.[4],0) as int)
,cast(isnull(cte.[5],0) as int)
,cast(isnull(cte.[6],0) as int)
,cast(isnull(cte.[7],0) as int)
,cast(isnull(cte.[8],0) as int)
The function, if needed:
CREATE FUNCTION [dbo].[DelimitedSplit8K] (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
/* "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000... enough to cover VARCHAR(8000)*/
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
GO
First create a function to split. I stole this one from https://sqlperformance.com/2012/07/t-sql-queries/split-strings
create FUNCTION dbo.SplitStrings_XML
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
declare @t table (a varchar(64))
insert into @t
values
('120.1'),
('120.2'),
('120.12'),
('120'),
('130'),
('120.2.22'),
('120.2.41'),
('120.3')
select a
, a1 = max(cast(case when rn=1 then Item else null end as int))
, a2 = max(cast(case when rn=2 then Item else null end as int))
, a3 = max(cast(case when rn=3 then Item else null end as int))
, a4 = max(cast(case when rn=4 then Item else null end as int))
from @t t
cross apply dbo.SplitStrings_XML(t.a,'.') as a
group by t.a
order by 2,3,4,5
I have a column in Table1 with strings in it separated by commas:
Id Val
1 ,4
2 ,3,1,0
3 NULL
4 ,5,2
Is there a simple way to split and get any value from that column,
for example
SELECT Value(1) FROM Table1 should get
Id Val
1 4
2 3
3 NULL
4 5
SELECT Value(2) FROM Table1 should get
Id Val
1 NULL
2 1
3 NULL
4 2
Thank you!
Storing comma-separated values in a column is always a pain; consider changing your table structure.
To get this done, create a split-string function. Here is one of the best possible approaches to splitting a string into individual rows, referred from http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
To call the function:
SELECT *
FROM yourtable
CROSS apply (SELECT CASE WHEN LEFT(val, 1) = ',' THEN Stuff(val, 1, 1, '') ELSE val END) cs (cleanedval)
CROSS apply [dbo].[Delimitedsplit8k](cs.cleanedval, ',')
WHERE ItemNumber = 1
SELECT *
FROM yourtable
CROSS apply (SELECT CASE WHEN LEFT(val, 1) = ',' THEN Stuff(val, 1, 1, '') ELSE val END) cs (cleanedval)
CROSS apply [dbo].[Delimitedsplit8k](cs.cleanedval, ',')
WHERE ItemNumber = 2
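If you want both positions side by side in a single result set rather than running two queries, the same split can be combined with conditional aggregation (a sketch, reusing the hypothetical yourtable from above):
SELECT yt.Id
     , Val1 = MAX(CASE WHEN ds.ItemNumber = 1 THEN ds.Item END)
     , Val2 = MAX(CASE WHEN ds.ItemNumber = 2 THEN ds.Item END)
FROM yourtable yt
CROSS APPLY (SELECT CASE WHEN LEFT(yt.val, 1) = ',' THEN STUFF(yt.val, 1, 1, '') ELSE yt.val END) cs (cleanedval)
CROSS APPLY [dbo].[DelimitedSplit8K](cs.cleanedval, ',') ds
GROUP BY yt.Id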
Another option using a Parse/Split Function and an OUTER APPLY
Example
Declare @YourTable Table ([Id] int,[Val] varchar(50))
Insert Into @YourTable Values
(1,',4')
,(2,',3,1,0')
,(3,NULL)
,(4,',5,2')
Select A.ID
,Val = B.RetVal
From @YourTable A
Outer Apply (
Select * From [dbo].[tvf-Str-Parse](A.Val,',')
Where RetSeq = 2
) B
Returns
ID Val
1 4
2 3
3 NULL
4 5
The UDF, if interested:
CREATE FUNCTION [dbo].[tvf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
Here is an example of using a CTE combined with converting the CSV to XML:
DECLARE @Test TABLE (
CsvData VARCHAR(10)
);
INSERT INTO @Test (CsvData)
VALUES
('1,2,3'),
(',4,5,7'),
(NULL),
(',3,');
WITH XmlData AS (
SELECT CONVERT(XML, '<val>' + REPLACE(CsvData, ',', '</val><val>') + '</val>') [CsvXml]
FROM @Test
)
SELECT xd.CsvXml.value('val[2]', 'VARCHAR(10)')
FROM XmlData xd;
This would output:
2
4
NULL
3
The column to display is controlled by the XPath query. In this case, val[2].
The main advantage here is that no user-defined functions are required.
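As a hypothetical extension (not part of the original answer), the position can also be parameterised with sql:variable() instead of hard-coding val[2], assuming this runs in the same batch as the table variable above:
DECLARE @Pos INT = 2;
WITH XmlData AS (
    SELECT CONVERT(XML, '<val>' + REPLACE(CsvData, ',', '</val><val>') + '</val>') [CsvXml]
    FROM @Test
)
SELECT xd.CsvXml.value('(val[position() = sql:variable("@Pos")])[1]', 'VARCHAR(10)')
FROM XmlData xd;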
Try this logic using a recursive CTE:
DECLARE @Pos INT = 2
DECLARE @T TABLE
(
Id INT,
Val VARCHAR(50)
)
INSERT INTO @T
VALUES(1,',4'),(2,',3,1,0'),(3,NULL),(4,',5,2')
;WITH CTE
AS
(
SELECT
Id,
SeqNo = 0,
MyStr = SUBSTRING(Val,CHARINDEX(',',Val)+1,LEN(Val)),
Num = REPLACE(SUBSTRING(Val,1,CHARINDEX(',',Val)),',','')
FROM @T
UNION ALL
SELECT
Id,
SeqNo = SeqNo+1,
MyStr = CASE WHEN CHARINDEX(',',MyStr)>0
THEN SUBSTRING(MyStr,CHARINDEX(',',MyStr)+1,LEN(MyStr))
ELSE NULL END,
Num = CASE WHEN CHARINDEX(',',MyStr)>0
THEN REPLACE(SUBSTRING(MyStr,1,CHARINDEX(',',MyStr)),',','')
ELSE MyStr END
FROM CTE
WHERE ISNULL(REPLACE(MyStr,',',''),'')<>''
)
SELECT
T.Id,
CTE.Num
FROM @T t
LEFT JOIN CTE
ON T.Id = cte.Id
AND SeqNo = @Pos
The output for the above matches the expected result.
Test Data
Declare @t TABLE (Id INT , Val VARCHAR(100))
INSERT INTO @t VALUES
(1 , '4'),
(2 , '3,1,0'),
(3 , NULL),
(4 , '5,2')
Function Definition
CREATE FUNCTION [dbo].[fn_xml_Splitter]
(
@delimited nvarchar(max)
, @delimiter nvarchar(1)
, @Position INT = NULL
)
RETURNS TABLE
AS
RETURN
(
SELECT Item
FROM (
SELECT Split.a.value('.', 'VARCHAR(100)') Item
, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ItemNumber
FROM
(SELECT Cast ('<X>' + Replace(@delimited, @delimiter, '</X><X>')
+ '</X>' AS XML) AS Data
) AS t CROSS APPLY Data.nodes ('/X') AS Split(a)
)x
WHERE x.ItemNumber = @Position OR @Position IS NULL
);
GO
Function Call
Now you can call this function in two different ways.
1. To return an item at a specific position, specify the position in the 3rd parameter of the function:
SELECT *
FROM @t t
CROSS APPLY [dbo].[fn_xml_Splitter](t.Val , ',', 1)
2. To return all items, specify the keyword DEFAULT as the 3rd parameter of the function:
SELECT *
FROM @t t
CROSS APPLY [dbo].[fn_xml_Splitter](t.Val , ',', DEFAULT)
How can I do this using SQL Server?
Here's one of the many popular splitters.
ONLINE DEMO
declare @table table (accountnum int, [services] varchar(1000), PIN int)
insert into @table
values
(30200,'ASCF008,ASFTCTAF',111111),
(30200,'AFTCTAF',222222),
(30200,'AFTCTAF,ASCF004',555555)
Select
accountnum
,[services] = Item
,PIN
from
@table
cross apply DelimitedSplit8K([services],',')
THE SPLIT FUNCTION BY JEFF MODEN
/****** Object: UserDefinedFunction [dbo].[DelimitedSplit8K] Script Date: 09/15/2017 10:51:16 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K] (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
/* "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
enough to cover VARCHAR(8000)*/
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
In SQL Server 2016 there is a STRING_SPLIT() function.
Reference here: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql
As for earlier versions, a great and simple answer is here:
Turning a Comma Separated string into individual rows
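For example, a minimal sketch against the sample data from the splitter answer above (table and column names are assumed from there):
declare @table table (accountnum int, [services] varchar(1000), PIN int)
insert into @table
values
(30200,'ASCF008,ASFTCTAF',111111),
(30200,'AFTCTAF',222222),
(30200,'AFTCTAF,ASCF004',555555)

Select
    accountnum
    ,[services] = s.value
    ,PIN
from @table t
cross apply string_split(t.[services], ',') s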
Try this logic; it splits the values on the commas.
SELECT Accountnum,
       LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))) AS [Services],
       PTN
FROM
(
    SELECT
        Accountnum,
        PTN,
        CAST('<XMLRoot><RowData>' + REPLACE([Services],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
    FROM
    (
        SELECT Accountnum,
               [Services],
               PTN
        FROM dbo.TESTTABLE
    ) AS XMLData
) AS Result
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)
Your table design is wrong. You need an additional services table and an account services table:
tblAccount
    AccountID
    AccountNum
    ServiceID
    PTN
tblServices
    ServiceID
    Service
tblAccountService
    AccountID
    ServiceID
Your design violates First Normal Form.
I would use something like this.
Left of comma
SELECT LEFT(services,CHARINDEX(',',services)-1) FROM table
Right of comma
SELECT Right(services,CHARINDEX(',',REVERSE(services))-1) FROM table
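Note that if a row contains no comma at all, CHARINDEX returns 0, the -1 produces an invalid length, and LEFT/RIGHT raise an error; a guarded sketch (yourtable is a placeholder):
SELECT LEFT(services, ISNULL(NULLIF(CHARINDEX(',', services), 0) - 1, LEN(services)))
FROM yourtable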
I have a simple table-valued function that takes around 5 seconds to execute. The function wraps a query which returns the data in 1 second. I have read through some blogs saying this might be due to parameter sniffing, but I couldn't find a resolution yet. How can I fix the function if it is due to parameter sniffing?
CREATE FUNCTION [dbo].[fn_PurchaseRecord]
(
@ID INT = NULL,
@Name nvarchar(MAX),
@PurchaseDate DATE
)
RETURNS @result TABLE
(
[ID] [int] NULL,
[Name] [varchar](20) NULL,
[BasePrice] [FLOAT] NULL,
[Amount] [FLOAT]
)
AS BEGIN
WITH CTE_Purchase AS
(
SELECT
i.ID,
ProductName AS Name,
BasePrice AS BasePrice
FROM
data.PurchaseRecord i (NOLOCK)
WHERE
i.ID = @ID
AND
Date = @PurchaseDate
AND
BuyerName=@Name
)
INSERT INTO @result
SELECT
ID,
Name,
BasePrice,
BasePrice*10.25
FROM
CTE_Purchase
RETURN;
END
Why not a single-statement TVF?
CREATE FUNCTION [dbo].[fn_PurchaseRecordTESTFIRST]
(
@ID INT = NULL,
@Name nvarchar(MAX),
@PurchaseDate DATE
)
RETURNS TABLE
AS
Return (
SELECT ID
,Name = ProductName
,BasePrice
,Amount = BasePrice*10.25
FROM data.PurchaseRecord i
WHERE i.ID = @ID
AND Date = @PurchaseDate
AND BuyerName=@Name
)
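A hypothetical call, passing the same parameters the original function takes (the values are made up):
SELECT *
FROM dbo.fn_PurchaseRecordTESTFIRST(42, N'Some Buyer', '2019-01-01');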
If parameter sniffing is happening, it's the least of your worries. Sean hit the nail on the head when saying that multi-statement table-valued functions (mTVFs) should be avoided like the plague. By design, they're going to be much slower than an inline table-valued function (iTVF) in that you define a table, populate it, then return it. iTVFs, on the other hand, can be thought of as views that accept parameters and return data directly from the underlying tables.
Another HUGE problem with mTVFs is that they kill parallelism; this means that if you have 2 CPUs or 2,000 CPUs, only ONE will work on resolving your query. No exceptions. Let's have a look at Jeff Moden's DelimitedSplit8K:
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;
GO
Now let's build an mTVF version like so and do a performance test...
CREATE FUNCTION [dbo].[DelimitedSplit8K_MTVF]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS @table TABLE (ItemNumber int, Item varchar(100))
AS
BEGIN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
INSERT @table
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;
RETURN;
END
GO
Before continuing I want to address @John Cappelletti's statement:
I've seen claims like this before [about MAX data types], but I've yet to see any compelling stats
For some compelling stats, let's make a minor tweak to the iTVF version of DelimitedSplit8K and change the input string to varchar(max):
CREATE FUNCTION [dbo].[DelimitedSplit8K_VCMAXINPUT]
(@pString VARCHAR(max), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;
GO
Now we have three versions of the function: the original iTVF, one that accepts varchar(max) and an mTVF version. Now a performance test.
-- sample data
IF OBJECT_ID('tempdb..#string') IS NOT NULL DROP TABLE #string;
SELECT TOP (10000)
id = IDENTITY(int, 1,1),
txt = REPLICATE(newid(), ABS(checksum(newid())%5)+1)
INTO #string
FROM sys.all_columns a, sys.all_columns b;
SET NOCOUNT ON;
-- Performance tests:
PRINT 'ITVF 8K'+char(13)+char(10)+replicate('-',90);
GO
DECLARE @st datetime2 = getdate(), @x varchar(20);
SELECT @x = ds.Item
FROM #string s
CROSS APPLY dbo.DelimitedSplit8K(s.txt, '-') ds;
PRINT datediff(ms, @st, getdate());
GO 5
PRINT 'MTVF 8K'+char(13)+char(10)+replicate('-',90);
GO
DECLARE @st datetime2 = getdate(), @x varchar(20);
SELECT @x = ds.Item
FROM #string s
CROSS APPLY dbo.DelimitedSplit8K_MTVF(s.txt, '-') ds;
PRINT datediff(ms, @st, getdate());
GO 5
PRINT 'ITVF VCMAX'+char(13)+char(10)+replicate('-',90);
GO
DECLARE @st datetime2 = getdate(), @x varchar(20);
SELECT @x = ds.Item
FROM #string s
CROSS APPLY dbo.DelimitedSplit8K_VCMAXINPUT(s.txt, '-') ds;
PRINT datediff(ms, @st, getdate());
GO 5
and the results:
ITVF 8K
------------------------------------------------------------------------------------------
Beginning execution loop
280
267
284
300
280
Batch execution completed 5 times.
MTVF 8K
------------------------------------------------------------------------------------------
Beginning execution loop
1190
1190
1157
1173
1187
Batch execution completed 5 times.
ITVF VCMAX
------------------------------------------------------------------------------------------
Beginning execution loop
1204
1220
1190
1190
1203
Batch execution completed 5 times.
Both the mTVF and the iTVF version that takes varchar(max) are 4-5 times slower. Again: avoid mTVFs like the plague and avoid MAX data types whenever possible.