How do I expand comma separated values into separate rows using SQL Server 2005? - sql

I have a table that looks like this:
ProductId, Color
"1", "red, blue, green"
"2", null
"3", "purple, green"
And I want to expand it to this:
ProductId, Color
1, red
1, blue
1, green
2, null
3, purple
3, green
What's the easiest way to accomplish this? Is it possible without a loop in a proc?

Take a look at this function. I've done similar tricks to split and transpose data in Oracle. Loop over the data, inserting the decoded values into a temp table. The convenient thing is that MS SQL will let you do this on the fly, while Oracle requires an explicit temp table.
MS SQL Split Function
Better Split Function
Edit by author:
This worked great. Final code looked like this (after creating the split function):
select p.productid, colortable.items as color
from product p
cross apply split(p.color, ',') as colortable
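The linked split functions are not reproduced here, so the following is only a minimal sketch of what such a function could look like on SQL Server 2005. The name dbo.Split, its parameters, and the items output column are assumptions chosen to match the query above, not the code behind the links.

```sql
-- Hypothetical loop-based splitter; returns one trimmed row per delimited item.
CREATE FUNCTION dbo.Split (@String varchar(8000), @Delimiter char(1))
RETURNS @Result TABLE (items varchar(8000))
AS
BEGIN
    DECLARE @Pos int;
    -- LEN(NULL) > 0 is not true, so a NULL input returns no rows.
    WHILE LEN(@String) > 0
    BEGIN
        SET @Pos = CHARINDEX(@Delimiter, @String);
        IF @Pos = 0
        BEGIN
            -- No delimiter left: the remainder is the last item.
            INSERT INTO @Result (items) VALUES (LTRIM(RTRIM(@String)));
            SET @String = '';
        END
        ELSE
        BEGIN
            -- Emit everything before the delimiter, then drop it from the input.
            INSERT INTO @Result (items) VALUES (LTRIM(RTRIM(LEFT(@String, @Pos - 1))));
            SET @String = SUBSTRING(@String, @Pos + 1, 8000);
        END
    END
    RETURN;
END
```

Because a NULL list produces no rows, CROSS APPLY would drop product 2 from the result. Using OUTER APPLY dbo.Split(p.color, ',') instead keeps that product with a NULL color, which matches the output asked for in the question.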

based on your tables:
create table test_table
(
ProductId int
,Color varchar(100)
)
insert into test_table values (1, 'red, blue, green')
insert into test_table values (2, null)
insert into test_table values (3, 'purple, green')
create a new table like this:
CREATE TABLE Numbers
(
Number int not null primary key
)
that has rows containing values 1 to 8000 or so.
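One possible way to fill that table on SQL Server 2005, offered as a sketch rather than the only approach. It assumes that cross joining sys.all_objects with itself yields at least 8000 rows, which should hold on a normal installation.

```sql
-- Number the rows of a large cross join and keep the first 8000 values.
INSERT INTO Numbers (Number)
SELECT x.n
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY a.object_id, b.object_id) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
) x
WHERE x.n <= 8000;
```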
this will return what you want:
EDIT
here is a much better query, slightly modified from the great answer from @Christopher Klein:
I added the "LTRIM()" so that spaces in the color list would be handled properly: "red, blue, green". His solution requires no spaces: "red,blue,green". Also, I prefer to use my own Numbers table rather than master.dbo.spt_values, which allows the removal of one derived table too.
SELECT
ProductId, LEFT(PartialColor, CHARINDEX(',', PartialColor + ',')-1) as SplitColor
FROM (SELECT
t.ProductId, LTRIM(SUBSTRING(t.Color, n.Number, 200)) AS PartialColor
FROM test_table t
LEFT OUTER JOIN Numbers n ON n.Number<=LEN(t.Color) AND SUBSTRING(',' + t.Color, n.Number, 1) = ','
) t
EDIT END
SELECT
ProductId, Color --,number
FROM (SELECT
ProductId
,CASE
WHEN LEN(List2)>0 THEN LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(',', List2, number+1)-number - 1)))
ELSE NULL
END AS Color
,Number
FROM (
SELECT ProductId,',' + Color + ',' AS List2
FROM test_table
) AS dt
LEFT OUTER JOIN Numbers n ON (n.Number < LEN(dt.List2)) OR (n.Number=1 AND dt.List2 IS NULL)
WHERE SUBSTRING(List2, number, 1) = ',' OR List2 IS NULL
) dt2
ORDER BY ProductId, Number, Color
here is my result set:
ProductId Color
----------- --------------
1 red
1 blue
1 green
2 NULL
3 purple
3 green
(6 row(s) affected)
which is the same order you want...

You can try this out; it doesn't require any additional functions:
declare @t table (col1 varchar(10), col2 varchar(200))
insert @t
select '1', 'red,blue,green'
union all select '2', NULL
union all select '3', 'green,purple'
select col1, left(d, charindex(',', d + ',')-1) as e from (
select *, substring(col2, number, 200) as d from @t col1 left join
(select distinct number from master.dbo.spt_values where number between 1 and 200) col2
on substring(',' + col2, number, 1) = ',') t

I arrived at this question 10 years after the post.
SQL Server 2016 added the STRING_SPLIT function.
Using that, this can be written as below.
declare @product table
(
ProductId int,
Color varchar(max)
);
insert into @product values (1, 'red, blue, green');
insert into @product values (2, null);
insert into @product values (3, 'purple, green');
select
p.ProductId as ProductId,
ltrim(split_table.value) as Color
from @product p
outer apply string_split(p.Color, ',') as split_table;

Fix your database if at all possible. Comma delimited lists in database cells indicate a flawed schema 99% of the time or more.
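As a rough illustration of what "fixing" could mean here, a normalized layout might store one row per product and color. The table and column names below are hypothetical and only sketch the idea.

```sql
-- Hypothetical normalized schema: colors live in their own table, one row each.
CREATE TABLE Product
(
    ProductId int NOT NULL PRIMARY KEY
);
CREATE TABLE ProductColor
(
    ProductId int NOT NULL REFERENCES Product (ProductId),
    Color varchar(50) NOT NULL,
    PRIMARY KEY (ProductId, Color)
);

-- The "expanded" result then needs no string splitting at all.
SELECT p.ProductId, pc.Color
FROM Product p
LEFT JOIN ProductColor pc ON pc.ProductId = p.ProductId
ORDER BY p.ProductId;
```

With this layout a product without colors simply has no ProductColor rows, and the LEFT JOIN returns it with a NULL color.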

I would create a CLR table-valued function for this:
http://msdn.microsoft.com/en-us/library/ms254508(VS.80).aspx
The reason for this is that CLR code is going to be much better at parsing apart the strings (computational work) and can pass that information back as a set, which is what SQL Server is really good at (set management).
The CLR function would return a series of records based on the parsed values (and the input id value).
You would then use a CROSS APPLY on each element in your table.

Just convert your columns into xml and query it. Here's an example.
select
a.value('.', 'varchar(42)') c
from (select cast('<r><a>' + replace(@CSV, ',', '</a><a>') + '</a></r>' as xml) x) t1
cross apply x.nodes('//r/a') t2(a)

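Why not use dynamic SQL for this purpose, something like this (adapt to your needs):
DECLARE @dynSQL VARCHAR(max)
SET @dynSQL = 'insert into DestinationTable(field) values '
-- each list becomes ('red'),('blue'),('green'); NULL lists are skipped so they do not null out the whole string
select @dynSQL = @dynSQL + '(''' + REPLACE(Color, ',', '''),(''') + '''),' from [Table] where Color is not null
SET @dynSQL = LEFT(@dynSQL, LEN(@dynSQL) - 1) -- delete the last comma
exec (@dynSQL) -- the parentheses are required to execute a string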
One advantage is that it needs no split function. Note that the multi-row VALUES list it builds requires SQL Server 2008 or later.

Related

Normalize sql string combinations

I have the following table:
CREATE TABLE #Fruits
(
Fruits VARCHAR(100)
)
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
('apple,banana'),
('kiwi,jackfruit'),
('jackfruit, kiwi')
banana,apple
apple,banana
kiwi,jackfruit
jackfruit, kiwi
I want to add one more column where I am taking the values separated by comma in each row and rearrange them alphabetically. I am trying to normalize the values because for my purpose apple,banana & banana,apple are the same things. And kiwi,jackfruit & jackfruit,kiwi are the same things. The output should look like the following:
Fruits Normalized_Fruits
banana,apple apple,banana
apple,banana apple,banana
kiwi,jackfruit jackfruit, kiwi
jackfruit, kiwi jackfruit, kiwi
How can I accomplish the desired result?
One of my biggest complaints about string_split is it lacks the ordinal position of each value. That makes situations like this one a lot easier to work with. Here is another approach to this. I am using the splitter from Jeff Moden which can be found here. There really is no need for a cursor here.
I also took the liberty of adding a GroupID column so you know which row each value belongs to once you parse them out. If the Fruits column is unique you could skip that but hard to tell for sure.
CREATE TABLE #Fruits
(
GroupID int identity
, Fruits VARCHAR(100)
)
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
('apple,banana'),
('kiwi,jackfruit'),
('jackfruit, kiwi')
;
with SortedResults as
(
select f.GroupID
, Item = ltrim(x.Item)
, x.ItemNumber
, RowNum = ROW_NUMBER() over(partition by GroupID order by ltrim(x.Item))
from #Fruits f
cross apply dbo.DelimitedSplit8K(f.Fruits, ',') x
)
select Max(case when RowNum = 1 then Item end) + ', ' + max(case when RowNum = 2 then Item end)
from SortedResults
group by GroupID
drop table #Fruits
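On the ordinal-position complaint above: newer releases address it. The sketch below assumes SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an extra ordinal column. It will not run on older versions, and the table variable is only sample data.

```sql
-- Assumes SQL Server 2022+ or Azure SQL Database (enable_ordinal support).
DECLARE @Fruits TABLE (GroupID int IDENTITY, Fruits varchar(100));
INSERT INTO @Fruits (Fruits)
VALUES ('banana,apple'), ('apple,banana'), ('kiwi,jackfruit'), ('jackfruit, kiwi');

-- ordinal is the 1-based position of each value within its source string.
SELECT f.GroupID,
       Item = LTRIM(s.value),
       ItemNumber = s.ordinal
FROM @Fruits f
CROSS APPLY STRING_SPLIT(f.Fruits, ',', 1) AS s
ORDER BY f.GroupID, s.ordinal;
```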
Give this a shot... I'm probably going to get nailed by non-Cursor folks, but this is what I came up with.
CREATE TABLE #Fruits
(
Fruits VARCHAR(100)
)
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
('apple,banana'),
('kiwi,jackfruit'),
('jackfruit, kiwi')
Declare @tblFruit Table (Fruit1 varchar(100))
Declare @tblFruitSorted Table (FruitSorted varchar(100))
Declare fCursor Cursor For
Select Fruits From #Fruits
Declare @Fruitunsorted varchar(100), @FruitSorted Varchar(100) = ''
Open fCursor
Fetch Next From fCursor Into @Fruitunsorted
While @@FETCH_STATUS = 0
BEGIN
Set @FruitSorted = ''
Insert Into @tblFruit
Select * From string_split(@Fruitunsorted,',')
Update @tblFruit Set Fruit1 = Ltrim(Rtrim(Fruit1))
Select @FruitSorted = @FruitSorted + ',' + Ltrim(Rtrim(Fruit1)) From @tblFruit Order by Fruit1
Delete From @tblFruit
Insert Into @tblFruitSorted
Select Substring(@FruitSorted,2, LEN(@FruitSorted)-1)
Fetch Next From fCursor into @Fruitunsorted
END
Close fCursor
Deallocate fCursor
Select * From @tblFruitSorted
Drop Table #Fruits
If you're using SQL 2017 or higher:
SELECT f.Fruits
,STRING_AGG(RTRIM(LTRIM(s.[value])),',') WITHIN GROUP (ORDER BY RTRIM(LTRIM(s.[value])))
FROM #Fruits f CROSS APPLY STRING_SPLIT(f.Fruits,',') s
GROUP BY f.Fruits
;
If you're using older versions of SQL (like 2008):
IF OBJECT_ID('tempdb..#Fruits') IS NOT NULL DROP TABLE #Fruits;
CREATE TABLE #Fruits(Fruits VARCHAR(100));
INSERT INTO #Fruits (Fruits) VALUES
('banana,apple'),
('apple,banana'),
('kiwi,jackfruit'),
('jackfruit, kiwi')
;
;WITH Split AS (
SELECT DISTINCT a.Fruits,RTRIM(LTRIM(tbl.col.value ('@Value', 'nvarchar(max)'))) AS [Fruit]
FROM (SELECT f.Fruits,CONVERT(XML,'<N Value="' + REPLACE(f.Fruits,',','"></N><N Value="') + '"></N>') AS [x] FROM #Fruits f) a
CROSS APPLY a.x.nodes('//N') AS tbl (col)
)
SELECT r.Fruits,STUFF((SELECT ',' + s.Fruit FROM Split s WHERE s.Fruits = r.Fruits ORDER BY s.Fruit FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') AS [NormalizedFruits]
FROM #Fruits r
;
IF OBJECT_ID('tempdb..#Fruits') IS NOT NULL DROP TABLE #Fruits;
Since we're only talking about a comma delimited list of two purely alphabetic strings, I'll just throw out PARSENAME again for fun and conciseness. The trims are there because of the inconsistent use of spaces in the source data and the ELSE could be shorter, but I wanted the results to be consistent.
SELECT
Fruits
,CASE
WHEN LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2))) > LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1)))
THEN LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1))) + ', ' + LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2)))
ELSE LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2))) + ', ' + LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1)))
END AS Normalized_Fruits
FROM #Fruits
And - just for fun - one more solution calling XQuery to the rescue.
DECLARE @Fruits TABLE(Fruits VARCHAR(100));
INSERT INTO @Fruits (Fruits) VALUES
('banana,apple'),
('apple,banana'),
('kiwi,jackfruit'),
('jackfruit, kiwi');
--This is the query
SELECT f.*
,CAST('<x>' + REPLACE(REPLACE(f.Fruits,' ',''),',','</x><x>') + '</x>' AS XML)
.query('
for $f in /x/text()
order by $f
return <y>{concat(",",$f)}</y>
')
.value('substring(.,2,1000)','nvarchar(max)')
FROM @Fruits f;
By using for $f in distinct-values(/x/text()) instead of for $f in /x/text() we would suppress repeated words.
In short:
Your string is converted to XML, which allows for .query() and therefore XQuery. XQuery is very powerful for rather generic problems like this one. The words are sorted and each is returned with a leading comma. The final substring() is needed to cut away the first comma.

Using data within an extracted text string for an inner join

Newbie so please be easy on me.
I have a select statement which returns a text string
select [column name] from [table] where [column name] like '%dog%'
simple enough, returns a result much like below
random text dog '123' more random text
random text dog '345' more random text
random text dog '723' more random text ...
I am looking to extract the 123, 345, 723 part of the text string which I can kind of do through using
Declare @Text Varchar(100);
Set @Text = 'random text dog ''123'' more random text';
Select Left(Substring(@Text, Patindex('%''%', @Text) + 1, Len(@Text) - Patindex('%''%', @Text))
,Patindex('%''%', Substring(@Text, Patindex('%''%', @Text) + 1, Len(@Text) - Patindex('%''%', @Text)))- 1) 'Lookup Index'
I am then wanting to use the result as part of inner join to another table to return a result along the lines of
Lookup Index Colour
123 Blue
345 Green
723 Orange
I just can't seem to tie it all together, so any help is much appreciated.
Thanking you all in advance.
Here's an option for you.
Load the parsed data into a temp table, then join that back to your lookup table. It would be easier and would probably perform better.
Have a look at this. There's an example with some test data all using temp tables. You'd obviously need to adjust for your specific tables and requirements:
--This will be our table that has the data you want to parse
CREATE TABLE #TestData (
TextColumn nvarchar(1000)
)
--lookup table for colours
CREATE TABLE #LookUp(
LookUpIndex INT
,Colour NVARCHAR(100)
)
--Use a temp table to load the parsed data from #TestData
CREATE TABLE #TestDataParse (
LookUpIndex INT
,TextColumn NVARCHAR(1000)
)
--Load our test data
INSERT INTO [#TestData] ([TextColumn])
VALUES('random text dog ''123'' more random text')
,('random text dog ''345'' more random text')
,('random text dog ''723'' more random text')
--populate our lookup table
INSERT INTO [#LookUp] (
[LookUpIndex]
, [Colour]
)
VALUES(123, 'Blue')
,(345, 'Green')
,(723 , 'Orange')
--Now parse the LookUp number out and load that to our temp table
INSERT INTO [#TestDataParse] (
[LookUpIndex]
, [TextColumn]
)
SELECT
Left(Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn))
,Patindex('%''%', Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn)))- 1) AS LookUpIndex
,[TextColumn]
FROM [#TestData]
--Now we can join that back to the lookup table
SELECT *
FROM [#TestDataParse] [a]
INNER JOIN [#LookUp] [b]
ON [b].[LookUpIndex] = [a].[LookUpIndex];
--Example done
--Drop our #temp tables
DROP TABLE [#LookUp]
DROP TABLE [#TestData]
DROP TABLE [#TestDataParse]
or you could skip loading into a temp table and use a sub query to simplify the join back to your lookup table, something like:
SELECT * FROM (
SELECT
Left(Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn))
,Patindex('%''%', Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn)))- 1) AS LookUpIndex
,[TextColumn]
FROM [#TestData] a
) AS tb
INNER JOIN [#LookUp] b ON [b].[LookUpIndex] = [tb].[LookUpIndex]
I'd test each one and see which performs the best for you.
This is a pain but you can do:
select t.col, left(v.dogv, charindex('''', v.dogv) - 1)
from (values ('random text dog ''123'' more random text')) t(col) cross apply
(values (stuff(col, 1, charindex('dog', col) + 4, ''))
) v(dogv)
where col like '%dog%';

Efficient way to merge alternating values from two columns into one column in SQL Server

I have two columns in a table. I want to merge them into a single column, but the merge should be done taking alternate characters from each columns.
For example:
Column A --> value (1,2,3)
Column B --> value (A,B,C)
Required result - (1,A,2,B,3,C)
It should be done without loops.
You need to make use of the UNION and get a little creative with how you choose to alternate. My solution ended up looking like this.
SELECT ColumnA
FROM Table
WHERE ColumnA%2=1
UNION
SELECT ColumnB
FROM TABLE
WHERE ColumnA%2=0
If you have an ID/PK column, that could just as easily be used; I just didn't want to assume anything about your table.
EDIT:
If your table contains duplicates that you wish to keep, use UNION ALL instead of UNION
Try this:
SELECT [value]
FROM [Table]
UNPIVOT
(
[value] FOR [Column] IN ([Column_A], [Column_B])
) UNPVT
If you have SQL Server 2017 or higher (STRING_AGG was introduced in 2017) you can use:
SELECT QUOTENAME(STRING_AGG (cast(a as varchar(1)) + ',' + b, ','), '()')
FROM test;
In older versions, depending on how much data you have in your tables you can also try:
SELECT QUOTENAME(STUFF(
(SELECT ',' + cast(a as varchar(1)) + ',' + b
FROM test
FOR XML PATH('')), 1, 1,''), '()')
Here you can try a sample
http://sqlfiddle.com/#!18/6c9af/5
with data as (
select *, row_number() over (order by colA) as rn
from t
)
select rn,
case rn % 2 when 1 then colA else colB end as alternating
from data;
The following SQL uses an undocumented aggregate concatenation technique. This is described in Inside Microsoft SQL Server 2008 T-SQL Programming on page 33.
declare @x varchar(max) = '';
declare @t table (a varchar(10), b varchar(10));
insert into @t values (1,'A'), (2,'B'),(3,'C');
select @x = @x + a + ',' + b + ','
from @t;
select '(' + LEFT(@x, LEN(@x) - 1) + ')';

Calculate TF-IDF using Sql

I have a table in my DB containing a free-text column.
I would like to know how often each word appears across all the rows, or maybe even calculate TF-IDF for all words, where my documents are that field's values per row.
Is it possible to calculate this using a SQL query? If not, or if there's a simpler way, could you please direct me to it?
Many Thanks,
Jon
In SQL Server 2008, depending on your needs, you could apply full-text indexing to the column and then query the sys.dm_fts_index_keywords and sys.dm_fts_index_keywords_by_document table-valued functions to get the occurrence count.
Edit: Actually, even without creating a persistent full-text index you can still leverage the parser:
WITH testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)
SELECT display_term, COUNT(*) As Cnt
FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + REPLACE(txt,'"','""') + '"', 1033, 0,0)
WHERE TXT IS NOT NULL
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC
Returns
display_term Cnt
------------------------------ -----------
the 3
brown 2
lorry 2
sea 2
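Going from word counts to TF-IDF is mostly a matter of a few aggregates. The sketch below assumes the words have already been split into a hypothetical @DocTerms table (one row per word occurrence), and it uses one common variant of the formula: raw term count multiplied by the natural log of total documents over documents containing the term. Other weightings exist, so treat this as a starting point. The multi-row VALUES insert needs SQL Server 2008 or later.

```sql
-- Hypothetical input: one row per word occurrence, e.g. produced by a splitter or the parser.
DECLARE @DocTerms TABLE (DocId int NOT NULL, Term nvarchar(100) NOT NULL);
INSERT INTO @DocTerms (DocId, Term)
VALUES (1, N'how'), (1, N'now'), (1, N'brown'), (1, N'cow'),
       (3, N'red'), (3, N'lorry'), (3, N'yellow'), (3, N'lorry'),
       (4, N'quick'), (4, N'brown'), (4, N'fox');

WITH tf AS (
    -- term frequency: occurrences of each term within each document
    SELECT DocId, Term, COUNT(*) AS TermCount
    FROM @DocTerms
    GROUP BY DocId, Term
),
df AS (
    -- document frequency: number of documents containing each term
    SELECT Term, COUNT(DISTINCT DocId) AS DocCount
    FROM @DocTerms
    GROUP BY Term
),
n AS (
    SELECT COUNT(DISTINCT DocId) AS TotalDocs
    FROM @DocTerms
)
SELECT tf.DocId, tf.Term, tf.TermCount,
       tf.TermCount * LOG(CAST(n.TotalDocs AS float) / df.DocCount) AS TfIdf
FROM tf
JOIN df ON df.Term = tf.Term
CROSS JOIN n
ORDER BY tf.DocId, TfIdf DESC;
```

A term that appears in every document gets a score of 0 under this variant, because LOG(1) is 0.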
Solution for SQL Server 2008:
here is the table:
CREATE TABLE MyTable (id INT, txt VARCHAR(MAX));
here is SQL query:
SELECT sum(case when TSplitted.txt_word = 'searched' then 1 else 0 end) as cnt_searched
, count(*) as cnt_all
FROM MyTable MYT
-- CROSS APPLY is needed because the function takes columns of MYT as arguments
CROSS APPLY Fn_Split(MYT.id,' ',MYT.txt) TSplitted
here is the table-valued function Fn_Split(@id int, @separator VARCHAR(32), @string VARCHAR(MAX)) (taken from here):
CREATE FUNCTION Fn_Split (@id int, @separator VARCHAR(32), @string VARCHAR(MAX))
RETURNS @t TABLE
(
ret_id INT
,txt_word VARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
-- wrap every word in <r>...</r>; text containing <, > or & would need escaping first
SET @xml = N'<root><r>' + REPLACE(@string, @separator, '</r><r>') + '</r></root>'
INSERT INTO @t(ret_id, txt_word)
SELECT @id, r.value('.','VARCHAR(MAX)') as Item
FROM @xml.nodes('//root/r') AS RECORDS(r)
RETURN
END