Extracting pipe delimted field into rows - sql

I have a tbl with a field with values that are pipe delimited, and I need them extracted as rows.
Sample data
select distinct [PROV_KEY],
[NTWK_CDS]
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'
--results
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 |GA_HMO|GA_OPN|GA_PPO|GA_BD|GA_MCPPO|GA_HDPPO|
And I would like:
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_HMO
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_OPN
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_PPO
I tried the following but I'm only getting the first set of values:
select distinct [PROV_KEY],
substring([NTWK_CDS], 1,
CHARINDEX('|',[NTWK_CDS], CHARINDEX('|',[NTWK_CDS])+1))
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'

This is a standard string splitting problem and there are many solutions out there. However most still feel like a workaround, as SQL Server does not have a split function build in.
You can start your research here: http://www.sommarskog.se/arrays-in-sql.html

The crucial operation you need to perform is a split. There are lots of solutions to this problem (see here for some), and people favor different ones depending on both situation and personal preference. Once you've done the split, though, you can JOIN or APPLY against the results to get the desired output.
I personally prefer using a SQLCLR function for this purpose since the performance is generally much better; but the number of options out there is staggering.

You can use splitting function
CREATE FUNCTION dbo.SplitStrings_CTE(#List nvarchar (1000), #Delimiter nvarchar(1 ))
RETURNS #returns TABLE(val nvarchar(100), [level] int, PRIMARY KEY CLUSTERED([level]))
AS
BEGIN
;WITH cte AS
(
SELECT SUBSTRING(#List, 0, CHARINDEX(#Delimiter , #List)) AS val ,
CAST(STUFF(#List + #Delimiter, 1, CHARINDEX(#Delimiter, #List),'') AS nvarchar (1000)) AS stval,
1 AS [level]
UNION ALL
SELECT SUBSTRING(stval, 0, CHARINDEX(#Delimiter, stval)),
CAST(STUFF(stval, 1 , CHARINDEX(#Delimiter ,stval), '') AS nvarchar(1000)),
[level] + 1
FROM cte
WHERE stval != ''
)
INSERT #returns
SELECT REPLACE(val ,' ' ,'') AS val, [level]
FROM cte
RETURN
END
Hence, your SELECT statement will be
SELECT *
FROM dbo.test82 t CROSS APPLY dbo.SplitStrings_CTE(t.NTWK_CDS, '|') o
WHERE o.val != ''
Demo on SQLFiddle

Related

String_Split on delimiter '.' SQL Server

I have an issue when parsing out a particular field of data, and I'm at a block on how to solve it, so I'm hoping I can gain some insight on how to solve it.
I have a field being brought [ItemCategory] that contains instances like...
Instance: TennisShoes.Laces
Instance: HikingBoot-Dr.Marten.Laces
(I cannot change the delimiter from '.' to '|' as I don't control the source)
the code being used to separate the instances is as follows:
SELECT
[Program] = LTRIM(RTRIM(LEFT(c.[ItemCategory], CHARINDEX('.',c.[ItemCategory] + '.') - 1)))
,[Category] = LTRIM(RTRIM(RIGHT(c.[ItemCategory],LEN(c.[ItemCategory]) - CHARINDEX('.',c.[ItemCategory]))))
So my issue when the DHikingBoot-Dr.Marten.Laces instance passes through the code it becomes.
[Program] = HikingBoot-Dr
[Category] = Marten.Laces
How would I make it to ignore the first '.' and delimit on the second '.', while still maintaining correctness for the first instance.
Thank you for your time.. any advice is helpful.
Give this one a try for grabbing the end.
RIGHT(c.[ItemCategory], CHARINDEX(REVERSE('.'), REVERSE(c.[ItemCategory])) -1)
I would suggest revisiting how you are storing this data, if you can, as it is flawed and will continue to give you challenges.
But that aside, this solution assumes "Category" will not include a period and the data will always end with .category
A few tweaks to what you had started, we'll use REVERSE() to basically determine the length of "Category" when using LEFT(). Then when we do "Program" we subtract that from the total length when using the RIGHT()
DECLARE #testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO #testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( 'HikingBoot-Dr.Marten.Laces' );
SELECT LEFT([sampledata], LEN([sampledata]) - CHARINDEX('.', REVERSE([sampledata]))) AS [Program]
,RIGHT([sampledata], CHARINDEX('.', REVERSE([sampledata])) -1) AS [Category]
FROM #testdata;
You can also use SUBTRING() along with REVERSE()
For category, reverse the data, find the first period, parse the
value and reverse it back.
For Program, reverse the data, go 1 past the first period to the end
and reverse it back.
DECLARE #testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO #testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( 'HikingBoot-Dr.Marten.Laces' );
SELECT REVERSE(SUBSTRING(REVERSE([sampledata]), CHARINDEX('.', REVERSE([sampledata])) + 1, LEN([sampledata]))) AS [Program]
, REVERSE(SUBSTRING(REVERSE([sampledata]), 1, CHARINDEX('.', REVERSE([sampledata])) - 1)) AS [Category]
FROM #testdata;
Both giving you results of:
Program Category
--------------------- ----------
TennisShoes Laces
HikingBoot-Dr.Marten Laces
If you need to select only last part after the ., then you can reverse the string, find charindex and do left and right with that position:
with s as (
select 'TennisShoes.Laces' as inst union
select 'HikingBoot-Dr.Marten.Laces' union
select 'Test'
)
, pos as (
select
s.*,
charindex('.', reverse(inst)) as pos
from s
)
select
ltrim(rtrim(left(inst, len(inst) - pos))) as program,
ltrim(rtrim(right(inst, nullif(pos - 1, -1)))) as category
from pos
program | category
:------------------- | :-------
HikingBoot-Dr.Marten | Laces
TennisShoes | Laces
Test | null
db<>fiddle

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (#str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE #len INT,
#cnt INT =1,
#str1 VARCHAR(50)='',
#output VARCHAR(50)=''
SELECT #len = Len(#str)
WHILE #cnt <= #len
BEGIN
SELECT #str1 += Substring(#str, #cnt, 1) + ','
SET #cnt+=1
END
SELECT #str1 = LEFT(#str1, Len(#str1) - 1)
SELECT #output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(#str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN #output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however when i try to apply this to top(10) i receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4million records, is this still the best solution?
Unfortunately i am not able to normalize the data into a separate table with records for each Code.
This doesn't rely on a id column to join with itself, performance is almost as fast
as the answer by #Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO #tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM #tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT calls from the orginal table, which means one-row-per-ID and use a correalted sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Considering your table having good amount of rows (~4 Million), I would suggest you to create a persisted calculated field in the table, to store these values. As calculating these values at run time in a view, will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(#str) = 0
BEGIN
SET #output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
I can suggest to split string into its characters using referred SQL function.
Then you can concatenate string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use SQL String_Agg string aggregation function to concatenate characters splitted in an ordered way as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Extract one value from a column containing multiple delimited values

How can I get the value from the sixth field in the following column? I am trying to get the 333 field:
ORGPATHTXT
2123/2322/12323/111/222/333/3822
I believe I have to use select substring, but am unsure how to format the query
Assuming SQL Server
The easiest way I can think of is create a Split function that splits based on '/' and you extract the sixth item like below
declare #text varchar(50) = '2123/2322/12323/111/222/333/3822'
select txt_value from fn_ParseText2Table(#text, '/') t where t.Position = 6
I used the function in this url. See it worked at SQLFiddle
Try this - for a string variable or wrap into a function to use with a select query (Sql-Demo)
Declare #s varchar(50)='2123/2322/12323/111/222/333/3822'
Select #s = right(#s,len(#s)- case charindex('/',#s,1) when 0 then len(#s)
else charindex('/',#s,1) end)
From ( values (1),(2),(3),(4),(5)) As t(num)
Select case when charindex('/',#s,1)>0 then left(#s,charindex('/',#s,1)-1)
else #s end
--Results
333
I'd like to offer a solution that uses CROSS APPLY to split up any delimited string in MSSQL and ROW_NUMBER() to return the 6th element. This assumes you have a table with ORGPATHTXT as a field (it can easily be converted to work without the table though):
SELECT ORGPATHTXT
FROM (
SELECT
Split.a.value('.', 'VARCHAR(100)') AS ORGPATHTXT,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) RN
FROM
(SELECT ID, CAST ('<M>' + REPLACE(ORGPATHTXT, '/', '</M><M>') + '</M>' AS XML) AS String
FROM MyTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
) t
WHERE t.RN = 6;
Here is some sample Fiddle to go along with it.
Good luck.
For sql, you can use
declare #string varchar(65) = '2123/2322/12323/111/222/333/3822'
select substring(string,25,27) from table_name
If you are using MySQL, then you can use:
select substring_index(orgpathtxt, '/', 6)
Let me just say that it is less convenient in most other databases.
Also you can use option with dynamic management function sys.dm_fts_parser
DECLARE #s nvarchar(50) = '2123/2322/12323/111/222/333/3822'
SELECT display_term
FROM sys.dm_fts_parser('"'+ #s + '"', 1033, NULL, 0)
WHERE display_term NOT LIKE 'nn%' AND occurrence = 6

Strip non-numeric characters from a string

I'm currently doing a data conversion project and need to strip all alphabetical characters from a string. Unfortunately I can't create or use a function as we don't own the source machine making the methods I've found from searching for previous posts unusable.
What would be the best way to do this in a select statement? Speed isn't too much of an issue as this will only be running over 30,000 records or so and is a once off statement.
You can do this in a single statement. You're not really creating a statement with 200+ REPLACEs are you?!
update tbl
set S = U.clean
from tbl
cross apply
(
select Substring(tbl.S,v.number,1)
-- this table will cater for strings up to length 2047
from master..spt_values v
where v.type='P' and v.number between 1 and len(tbl.S)
and Substring(tbl.S,v.number,1) like '[0-9]'
order by v.number
for xml path ('')
) U(clean)
Working SQL Fiddle showing this query with sample data
Replicated below for posterity:
create table tbl (ID int identity, S varchar(500))
insert tbl select 'asdlfj;390312hr9fasd9uhf012 3or h239ur ' + char(13) + 'asdfasf'
insert tbl select '123'
insert tbl select ''
insert tbl select null
insert tbl select '123 a 124'
Results
ID S
1 390312990123239
2 123
3 (null)
4 (null)
5 123124
CTE comes for HELP here.
;WITH CTE AS
(
SELECT
[ProductNumber] AS OrigProductNumber
,CAST([ProductNumber] AS VARCHAR(100)) AS [ProductNumber]
FROM [AdventureWorks].[Production].[Product]
UNION ALL
SELECT OrigProductNumber
,CAST(STUFF([ProductNumber], PATINDEX('%[^0-9]%', [ProductNumber]), 1, '') AS VARCHAR(100) ) AS [ProductNumber]
FROM CTE WHERE PATINDEX('%[^0-9]%', [ProductNumber]) > 0
)
SELECT * FROM CTE
WHERE PATINDEX('%[^0-9]%', [ProductNumber]) = 0
OPTION (MAXRECURSION 0)
output:
OrigProductNumber ProductNumber
WB-H098 098
VE-C304-S 304
VE-C304-M 304
VE-C304-L 304
TT-T092 092
RichardTheKiwi's script in a function for use in selects without cross apply,
also added dot because in my case I use it for double and money values within a varchar field
CREATE FUNCTION dbo.ReplaceNonNumericChars (#string VARCHAR(5000))
RETURNS VARCHAR(1000)
AS
BEGIN
SET #string = REPLACE(#string, ',', '.')
SET #string = (SELECT SUBSTRING(#string, v.number, 1)
FROM master..spt_values v
WHERE v.type = 'P'
AND v.number BETWEEN 1 AND LEN(#string)
AND (SUBSTRING(#string, v.number, 1) LIKE '[0-9]'
OR SUBSTRING(#string, v.number, 1) LIKE '[.]')
ORDER BY v.number
FOR
XML PATH('')
)
RETURN #string
END
GO
Thanks RichardTheKiwi +1
Well if you really can't use a function, I suppose you could do something like this:
SELECT REPLACE(REPLACE(REPLACE(LOWER(col),'a',''),'b',''),'c','')
FROM dbo.table...
Obviously it would be a lot uglier than that, since I only handled the first three letters, but it should give the idea.

TSQL - Querying a table column to pull out popular words for a tag cloud

Just an exploratory question to see if anyone has done this or if, in fact it is at all possible.
We all know what a tag cloud is, and usually, a tag cloud is created by someone assigning tags. Is it possible, within the current features of SQL Server to create this automatically, maybe via trigger when a table has a record added or updated, by looking at the data within a certain column and getting popular words?
It is similar to this question: How can I get the most popular words in a table via mysql?. But, that is MySQL not MSSQL.
Thanks in advance.
James
Here is a good bit on parsing delimited string into rows:
http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/
http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648
T-SQL: Opposite to string concatenation - how to split string into multiple records
If you want to parse all words, you can use the space ' ' as your delimiter, Then you get a row for each word.
Next you would simply select the result set GROUPing by the word and aggregating the COUNT
Order your results and you're there.
IMO, the design approach is what makes this difficult. Just because you allow users to assign tags does not mean the tags must be stored as a single delimited list of words. You can normalize the structure into something like:
Create Table Posts ( Id ... not null primary key )
Create Table Tags( Id ... not null primary key, Name ... not null Unique )
Create Table PostTags
( PostId ... not null References Posts( Id )
, TagId ... not null References Tags( Id ) )
Now your question becomes trivial:
Select T.Id, T.Name, Count(*) As TagCount
From PostTags As PT
Join Tags As T
On T.Id = PT.TagId
Group By T.Id, T.Name
Order By Count(*) Desc
If you insist on storing tags as delimited values, then only solution is to split the values on their delimiter by writing a custom Split function and then do your count. At the bottom is an example of a Split function. With it your query would look something like (using a comma delimiter):
Select Tag.Value, Count(*) As TagCount
From Posts As P
Cross Apply dbo.Split( P.Tags, ',' ) As Tag
Group By Tag.Value
Order By Count(*) Desc
Split Function:
Create Function [dbo].[Split]
(
#DelimitedList nvarchar(max)
, #Delimiter nvarchar(2) = ','
)
RETURNS TABLE
AS
RETURN
(
With CorrectedList As
(
Select Case When Left(#DelimitedList, DataLength(#Delimiter)/2) <> #Delimiter Then #Delimiter Else '' End
+ #DelimitedList
+ Case When Right(#DelimitedList, DataLength(#Delimiter)/2) <> #Delimiter Then #Delimiter Else '' End
As List
, DataLength(#Delimiter)/2 As DelimiterLen
)
, Numbers As
(
Select TOP (Coalesce(Len(#DelimitedList),1)) Row_Number() Over ( Order By c1.object_id ) As Value
From sys.objects As c1
Cross Join sys.columns As c2
)
Select CharIndex(#Delimiter, CL.list, N.Value) + CL.DelimiterLen As Position
, Substring (
CL.List
, CharIndex(#Delimiter, CL.list, N.Value) + CL.DelimiterLen
, Case
When CharIndex(#Delimiter, CL.list, N.Value + 1)
- CharIndex(#Delimiter, CL.list, N.Value)
- CL.DelimiterLen < 0 Then Len(CL.List)
Else CharIndex(#Delimiter, CL.list, N.Value + 1)
- CharIndex(#Delimiter, CL.list, N.Value)
- CL.DelimiterLen
End
) As Value
From CorrectedList As CL
Cross Join Numbers As N
Where N.Value < Len(CL.List)
And Substring(CL.List, N.Value, CL.DelimiterLen) = #Delimiter
)
Word or Tag clouds need two fields: a string and a value of how many times that word or string appeared in your collection. You can then pass the results into a tag cloud tool that will display the data as you require.
Not to take away from the previous answers, as they do answer the original challenge. However, I have a simpler solution using two functions (similar to #Thomas answer), one of which uses regex to "clean" the words.
The two functions are:
dbo.fnStripChars(a, b) --use regex 'b' to cleanse a string 'a'
dbo.fnMakeTableFromList(a, b) --convert a single field 'a' into a tabled list, delimited by 'b'
I then apply them into a single SQL statement, using the TOP n feature to give me the top 10 words I want to pass onto PowerBI or some other graphical tool, for actually displaying a word or tag cloud.
SELECT TOP 10 b.[words], b.[total]
FROM
(SELECT a.[words], count(*) AS [total]
FROM
(SELECT upper(l.item) AS [words]
FROM dbo.MyTableWithWords AS c
CROSS APPLY POTS.fnMakeTableFromList([POTS].fnStripChars(c.myColumnThatHasTheWords,'[^a-zA-Z ]'),' ') AS l) AS a
GROUP BY a.[words]) AS b
ORDER BY 2 DESC
As you can see, the regex is [^a-zA-Z ], which is to give me only alphabetical characters and spaces. The space is then used as a delimiter to the make table function to separate each word individually. I apply a count(*), to give me the number of times that word appears, hence then I have everything I need to give me the TOP 10 results.
Note that CROSS APPLY is important here so I get only data with actual "words" in each record found. Otherwise it will go through every record with or without words to extract from the column I want.
fnStripChars()
FUNCTION [dbo].[fnStripChars]
(
#String NVARCHAR(4000),
#MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
SET #MatchExpression = '%' + #MatchExpression + '%'
WHILE PatIndex(#MatchExpression, #String) > 0
SET #String = Stuff(#String, PatIndex(#MatchExpression, #String), 1, '')
RETURN #String
END
fnMakeTableFromList()
FUNCTION [dbo].[fnMakeTableFromList](
#List VARCHAR(MAX),
#Delimiter CHAR(1))
RETURNS TABLE
AS
RETURN (SELECT Item = CONVERT(VARCHAR, Item)
FROM (SELECT Item = x.i.value('(./text())[1]','varchar(max)')
FROM (SELECT [XML] = CONVERT(XML,'<i>' + REPLACE(#List,#Delimiter,'</i><i>') + '</i>').query('.')) AS a
CROSS APPLY [XML].nodes('i') AS x(i)) AS y
WHERE Item IS NOT NULL);
I've tested this with over 400K records and it's able to come back with my results in under 60 seconds. I think that's reasonable.