Split string and display below other column data using SQL Server [duplicate]

I have a table that looks like this:
ProductId, Color
"1", "red, blue, green"
"2", null
"3", "purple, green"
And I want to expand it to this:
ProductId, Color
1, red
1, blue
1, green
2, null
3, purple
3, green
What's the easiest way to accomplish this? Is it possible without a loop in a proc?

Take a look at this function. I've done similar tricks to split and transpose data in Oracle: loop over the data, inserting the decoded values into a temp table. The convenient thing is that MS will let you do this on the fly, while Oracle requires an explicit temp table.
MS SQL Split Function
Better Split Function
Edit by author:
This worked great. Final code looked like this (after creating the split function):
select p.productid, colortable.items as color
from product p
cross apply split(p.color, ',') as colortable

based on your tables:
create table test_table
(
ProductId int
,Color varchar(100)
)
insert into test_table values (1, 'red, blue, green')
insert into test_table values (2, null)
insert into test_table values (3, 'purple, green')
create a new table like this:
CREATE TABLE Numbers
(
Number int not null primary key
)
that has rows containing values 1 to 8000 or so.
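One way to fill the Numbers table is sketched below. It assumes SQL Server 2005 or later (for ROW_NUMBER()) and uses a cross join of sys.all_objects purely as a convenient row source; neither choice comes from the original answer.

```sql
-- Sketch: fill Numbers with 1..8000.
-- Assumes the Numbers table above already exists and is empty.
-- sys.all_objects is used only as a convenient source of rows; cross joining
-- it with itself yields far more than 8000 rows on a default installation.
INSERT INTO Numbers (Number)
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
```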
this will return what you want:
EDIT
here is a much better query, slightly modified from the great answer from @Christopher Klein:
I added the "LTRIM()" so that the spaces in the color list are handled properly: "red, blue, green". His solution requires no spaces: "red,blue,green". Also, I prefer to use my own Numbers table rather than master.dbo.spt_values; this allows the removal of one derived table too.
SELECT
ProductId, LEFT(PartialColor, CHARINDEX(',', PartialColor + ',')-1) as SplitColor
FROM (SELECT
t.ProductId, LTRIM(SUBSTRING(t.Color, n.Number, 200)) AS PartialColor
FROM test_table t
LEFT OUTER JOIN Numbers n ON n.Number<=LEN(t.Color) AND SUBSTRING(',' + t.Color, n.Number, 1) = ','
) t
EDIT END
SELECT
ProductId, Color --,number
FROM (SELECT
ProductId
,CASE
WHEN LEN(List2)>0 THEN LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(',', List2, number+1)-number - 1)))
ELSE NULL
END AS Color
,Number
FROM (
SELECT ProductId,',' + Color + ',' AS List2
FROM test_table
) AS dt
LEFT OUTER JOIN Numbers n ON (n.Number < LEN(dt.List2)) OR (n.Number=1 AND dt.List2 IS NULL)
WHERE SUBSTRING(List2, number, 1) = ',' OR List2 IS NULL
) dt2
ORDER BY ProductId, Number, Color
here is my result set:
ProductId Color
----------- --------------
1 red
1 blue
1 green
2 NULL
3 purple
3 green
(6 row(s) affected)
which is the same order you want...

You can try this out; it doesn't require any additional functions:
declare @t table (col1 varchar(10), col2 varchar(200))
insert @t
select '1', 'red,blue,green'
union all select '2', NULL
union all select '3', 'green,purple'
select col1, left(d, charindex(',', d + ',')-1) as e from (
select *, substring(col2, number, 200) as d from @t col1 left join
(select distinct number from master.dbo.spt_values where number between 1 and 200) col2
on substring(',' + col2, number, 1) = ',') t

I arrived at this question 10 years after the post.
SQL Server 2016 added the STRING_SPLIT function.
Using that, this can be written as below.
declare @product table
(
ProductId int,
Color varchar(max)
);
insert into @product values (1, 'red, blue, green');
insert into @product values (2, null);
insert into @product values (3, 'purple, green');
select
p.ProductId as ProductId,
ltrim(split_table.value) as Color
from @product p
outer apply string_split(p.Color, ',') as split_table;

Fix your database if at all possible. Comma delimited lists in database cells indicate a flawed schema 99% of the time or more.

I would create a CLR table-valued function for this:
http://msdn.microsoft.com/en-us/library/ms254508(VS.80).aspx
The reason for this is that CLR code is going to be much better at parsing apart the strings (computational work) and can pass that information back as a set, which is what SQL Server is really good at (set management).
The CLR function would return a series of records based on the parsed values (and the input id value).
You would then use a CROSS APPLY on each element in your table.

Just convert your columns into xml and query it. Here's an example.
select
a.value('.', 'varchar(42)') c
from (select cast('<r><a>' + replace(@CSV, ',', '</a><a>') + '</a></r>' as xml) x) t1
cross apply x.nodes('//r/a') t2(a)

Why not use dynamic SQL for this purpose? Something like this (adapt to your needs):
DECLARE @dynSQL VARCHAR(max)
SET @dynSQL = 'insert into DestinationTable(field)'
-- build one "select 'value'" per list element, chained with union all
select @dynSQL = @dynSQL + ' select ''' + REPLACE(Color, ',', ''' union all select ''') + ''' union all' from [Table] where Color is not null
SET @dynSQL = LEFT(@dynSQL, LEN(@dynSQL) - 10) -- delete the trailing ' union all'
exec (@dynSQL)
One advantage is that you can use it on any SQL Server version.

Related

How to extract the n-th value from the text field with delimiters

In SQL table I have a text column with value 'Yellow|Green|Blue' and another column with numeric value. This numeric value defines which part of the text column to be extracted. Values in the text column are separated with '|' separator.
For example:
If numeric value is 0, 1st part of the text field should be extracted: Yellow
If numeric value is 1, 2nd part of the text field should be extracted: Green
And so on.
Is there a way to extract it dynamically?
That is, without using a CASE statement like:
case when u.UD_2 =0 then 'Yellow' when u.UD_2=1 then 'Green' when u.UD_2=2 then 'Blue' end Kategorie
UPDATE: We are using SQL Server 2016
This should work for you: the subquery extracts each category into a separate column, and a CASE expression then chooses the needed category.
select case sep when 0 then x.[0] when 1 then x.[1] when 2 then x.[2] end as Kategorie
from (
select *
,LEFT(val, CHARINDEX('|', val) - 1) AS '0'
,LEFT(STUFF(SUBSTRING(val, CHARINDEX('|', val), LEN(val)), 1, 1, ''), CHARINDEX('|', STUFF(SUBSTRING(val, CHARINDEX('|', val), LEN(val)), 1, 1, '')) - 1) AS '1'
,SUBSTRING(val, CHARINDEX('|', val, CHARINDEX('|', val) + 1) + 1, LEN(val)) AS '2'
from #test
)x
Sample data:
create table #test
(
val nvarchar(500),
sep int
)
insert into #test values
('Yellow|Green|Blue', 0),
('Yellow|Green|Blue', 1),
('Yellow|Green|Blue', 2)
Note: this only works if there are exactly 3 values separated with |
UPDATE
And this is a dynamic way to achieve it; it doesn't matter how many categories are separated:
SELECT x.Kategorie
FROM (
SELECT DISTINCT node.s.value('.', 'NVARCHAR(500)') AS Kategorie
,ROW_NUMBER() OVER (PARTITION BY sep ORDER BY (SELECT NULL)) - 1 as rn
FROM (
SELECT sep
,CAST('<M>' + REPLACE(val, '|', '</M><M>') + '</M>' AS XML) AS Kategorie
FROM #test
) AS s
CROSS APPLY Kategorie.nodes('/M') AS node(s)
)x
JOIN #test AS t ON t.sep = x.rn
One possible approach is to split your text data into substrings and get each substring position.
Starting with SQL Server 2016 you may use STRING_SPLIT() to split a string, but in your case this is not an option: the function returns a table of all substrings, and their order is not guaranteed.
Again, if you use SQL Server 2016+, you may try to transform the text data into a valid JSON array using REPLACE() ('Yellow|Green|Blue' is transformed into '["Yellow","Green","Blue"]') and after that to use OPENJSON() with default schema to retrieve this JSON array as table, which has columns key, value and type (key column contains the index of the element in the specified array).
Input:
CREATE TABLE #Data (
TextValue nvarchar(max),
IndexValue int
)
INSERT INTO #Data
(TextValue, IndexValue)
VALUES
('Yellow|Green|Blue', 0),
('Yellow|Green|Blue', 1)
T-SQL:
SELECT d.TextValue, d.IndexValue, j.[value] AS [Value]
FROM #Data d
CROSS APPLY OPENJSON(CONCAT(N'["', REPLACE(d.TextValue, N'|', N'","'), N'"]')) j
WHERE d.IndexValue = j.[key]
Output:
---------------------------------------
TextValue IndexValue Value
---------------------------------------
Yellow|Green|Blue 0 Yellow
Yellow|Green|Blue 1 Green

Using data within an extracted text string for an inner join

Newbie so please be easy on me.
I have a select statement which returns a text string
select [column name] from [table] where [column name] like '%dog%'
simple enough, returns a result much like below
random text dog '123' more random text
random text dog '345' more random text
random text dog '723' more random text ...
I am looking to extract the 123, 345, 723 part of the text string which I can kind of do through using
Declare @Text Varchar(100);
Set @Text = 'random text dog ''123'' more random text';
Select Left(Substring(@Text, Patindex('%''%', @Text) + 1, Len(@Text) - Patindex('%''%', @Text))
,Patindex('%''%', Substring(@Text, Patindex('%''%', @Text) + 1, Len(@Text) - Patindex('%''%', @Text)))- 1) 'Lookup Index'
I am then wanting to use the result as part of inner join to another table to return a result along the lines of
Lookup Index Colour
123 Blue
345 Green
723 Orange
I just can't seem to tie it all together, so any help is much appreciated.
Thanking you all in advance.
Here's an option for you.
Load the parsed data into a temp table, then join that back to your lookup table. It would be easier and would probably perform better.
Have a look at this. There's an example with some test data all using temp tables. You'd obviously need to adjust for your specific tables and requirements:
--This will be our table that has the data you want to parse
CREATE TABLE #TestData (
TextColumn nvarchar(1000)
)
--lookup table for colours
CREATE TABLE #LookUp(
LookUpIndex INT
,Colour NVARCHAR(100)
)
--Use a temp table to load the parsed data from #TestData
CREATE TABLE #TestDataParse (
LookUpIndex INT
,TextColumn NVARCHAR(1000)
)
--Load our test data
INSERT INTO [#TestData] ([TextColumn])
VALUES('random text dog ''123'' more random text')
,('random text dog ''345'' more random text')
,('random text dog ''723'' more random text')
--populate our lookup table
INSERT INTO [#LookUp] (
[LookUpIndex]
, [Colour]
)
VALUES(123, 'Blue')
,(345, 'Green')
,(723 , 'Orange')
--Now parse the LookUp number out and load that to our temp table
INSERT INTO [#TestDataParse] (
[LookUpIndex]
, [TextColumn]
)
SELECT
Left(Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn))
,Patindex('%''%', Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn)))- 1) AS LookUpIndex
,[TextColumn]
FROM [#TestData]
--Now we can join that back to the lookup table
SELECT *
FROM [#TestDataParse] [a]
INNER JOIN [#LookUp] [b]
ON [b].[LookUpIndex] = [a].[LookUpIndex];
--Example done
--Drop our #temp tables
DROP TABLE [#LookUp]
DROP TABLE [#TestData]
DROP TABLE [#TestDataParse]
or you could skip loading into a temp table and use a sub query to simplify the join back to your lookup table, something like:
SELECT * FROM (
SELECT
Left(Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn))
,Patindex('%''%', Substring(TextColumn, Patindex('%''%', TextColumn) + 1, Len(TextColumn) - Patindex('%''%', TextColumn)))- 1) AS LookUpIndex
,[TextColumn]
FROM [#TestData] a
) AS tb
INNER JOIN [#LookUp] b ON [b].[LookUpIndex] = [tb].[LookUpIndex]
I'd test each one and see which performs the best for you.
This is a pain but you can do:
select t.col, left(v.dogv, charindex('''', v.dogv) - 1)
from (values ('random text dog ''123'' more random text')) t(col) cross apply
(values (stuff(col, 1, charindex('dog', col) + 4, ''))
) v(dogv)
where col like '%dog%';
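The same CROSS APPLY idea can feed the join directly. The sketch below is only an illustration: it assumes the #TestData and #LookUp temp tables from the earlier answer still exist, and that every matching row has the number between single quotes right after 'dog '. A row that breaks that assumption would raise an error rather than be skipped.

```sql
-- Sketch: extract the quoted number after 'dog' and join it to the lookup table.
-- Assumes #TestData(TextColumn) and #LookUp(LookUpIndex, Colour) from the earlier answer.
SELECT l.LookUpIndex, l.Colour
FROM #TestData t
CROSS APPLY (VALUES (STUFF(t.TextColumn, 1, CHARINDEX('dog', t.TextColumn) + 4, ''))) v(dogv)
INNER JOIN #LookUp l
    -- the extracted text is implicitly converted to INT for the comparison
    ON l.LookUpIndex = LEFT(v.dogv, CHARINDEX('''', v.dogv) - 1)
WHERE t.TextColumn LIKE '%dog%';
```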

Efficient way to merge alternating values from two columns into one column in SQL Server

I have two columns in a table. I want to merge them into a single column, but the merge should be done taking alternate characters from each columns.
For example:
Column A --> value (1,2,3)
Column B --> value (A,B,C)
Required result - (1,A,2,B,3,C)
It should be done without loops.
You need to make use of the UNION and get a little creative with how you choose to alternate. My solution ended up looking like this.
SELECT ColumnA
FROM Table
WHERE ColumnA%2=1
UNION
SELECT ColumnB
FROM TABLE
WHERE ColumnA%2=0
If you have an ID/PK column that could just as easily be used, I just didn't want to assume anything about your table.
EDIT:
If your table contains duplicates that you wish to keep, use UNION ALL instead of UNION
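If strict row-by-row alternation is what you need, a different technique from the UNION above is to unpivot each row with CROSS APPLY (VALUES ...) and sort on a position flag. This is a sketch only: @t, colA and colB are hypothetical names, colA is assumed to define the row order, and it needs SQL Server 2008 or later.

```sql
-- Sketch: interleave two columns row by row (1, A, 2, B, 3, C).
-- @t, colA and colB are hypothetical; colA is assumed to define the row order.
DECLARE @t TABLE (colA int, colB varchar(10));
INSERT INTO @t VALUES (1, 'A'), (2, 'B'), (3, 'C');

SELECT v.val
FROM @t t
CROSS APPLY (VALUES (1, CAST(t.colA AS varchar(10))),  -- value from the first column
                    (2, t.colB)) v(ord, val)           -- then the second column
ORDER BY t.colA, v.ord;
```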
Try this:
SELECT [value]
FROM [Table]
UNPIVOT
(
[value] FOR [Column] IN ([Column_A], [Column_B])
) UNPVT
If you have SQL Server 2017 or higher (the version that introduced STRING_AGG) you can use:
SELECT QUOTENAME(STRING_AGG (cast(a as varchar(1)) + ',' + b, ','), '()')
FROM test;
In older versions, depending on how much data you have in your tables you can also try:
SELECT QUOTENAME(STUFF(
(SELECT ',' + cast(a as varchar(1)) + ',' + b
FROM test
FOR XML PATH('')), 1, 1,''), '()')
Here you can try a sample
http://sqlfiddle.com/#!18/6c9af/5
with data as (
select *, row_number() over (order by colA) as rn
from t
)
select rn,
case rn % 2 when 1 then colA else colB end as alternating
from data;
The following SQL uses an undocumented aggregate concatenation technique. This is described in Inside Microsoft SQL Server 2008 T-SQL Programming on page 33.
declare @x varchar(max) = '';
declare @t table (a varchar(10), b varchar(10));
insert into @t values (1,'A'), (2,'B'),(3,'C');
select @x = @x + a + ',' + b + ','
from @t;
select '(' + LEFT(@x, LEN(@x) - 1) + ')';

Calculate TF-IDF using Sql

I have a table in my DB containing a free-text field column.
I would like to know the frequency with which each word appears over all the rows, or maybe even calculate a TF-IDF for all words, where my documents are that field's values per row.
Is it possible to calculate this using a SQL query? If not, or if there's a simpler way, could you please direct me to it?
Many Thanks,
Jon
In SQL Server 2008 depending on your needs you could apply full text indexing to the column then query the sys.dm_fts_index_keywords and sys.dm_fts_index_keywords_by_document table valued functions to get the occurrence count.
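As a rough sketch of those two functions in use: dbo.MyTable below is a hypothetical table name, and it is assumed to already have a populated full-text index on the text column.

```sql
-- Sketch: keyword statistics from an existing full-text index.
-- dbo.MyTable is a hypothetical name; the table must already be full-text indexed.

-- Number of documents (rows) that contain each keyword.
SELECT display_term, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.MyTable'))
ORDER BY document_count DESC;

-- Number of occurrences of each keyword within each document (row).
SELECT document_id, display_term, occurrence_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID(N'dbo.MyTable'))
ORDER BY document_id, occurrence_count DESC;
```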
Edit: Actually, even without creating a persistent full-text index you can still leverage the parser:
WITH testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)
SELECT display_term, COUNT(*) As Cnt
FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + REPLACE(txt,'"','""') + '"', 1033, 0,0)
WHERE TXT IS NOT NULL
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC
Returns
display_term Cnt
------------------------------ -----------
the 3
brown 2
lorry 2
sea 2
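The same parser output can be pushed one step further towards the TF-IDF asked about in the question. The query below is a sketch, not part of the original answer. It assumes the common definitions tf = (occurrences of the term in the row) / (terms in the row) and idf = ln(number of rows / number of rows containing the term). The CTE names are made up for illustration, and every row the parser returns is counted, so filter the parser output first if that is not what you want.

```sql
-- Sketch: TF-IDF per (row, term) on top of sys.dm_fts_parser.
WITH testTable AS
(
    SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
    SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
    SELECT 3, N'red lorry yellow lorry'
),
terms AS
(   -- one row per (document, term) with the in-document count
    SELECT t.Id, p.display_term, COUNT(*) AS term_count
    FROM testTable t
    CROSS APPLY sys.dm_fts_parser('"' + REPLACE(t.txt, '"', '""') + '"', 1033, 0, 0) p
    GROUP BY t.Id, p.display_term
),
doc_len AS
(   -- total number of terms per document
    SELECT Id, SUM(term_count) AS total_terms
    FROM terms
    GROUP BY Id
),
doc_freq AS
(   -- number of documents that contain each term
    SELECT display_term, COUNT(*) AS docs_with_term
    FROM terms
    GROUP BY display_term
)
SELECT tm.Id,
       tm.display_term,
       1.0 * tm.term_count / dl.total_terms AS tf,
       LOG(1.0 * (SELECT COUNT(*) FROM testTable) / df.docs_with_term) AS idf,
       (1.0 * tm.term_count / dl.total_terms)
         * LOG(1.0 * (SELECT COUNT(*) FROM testTable) / df.docs_with_term) AS tf_idf
FROM terms tm
JOIN doc_len dl ON dl.Id = tm.Id
JOIN doc_freq df ON df.display_term = tm.display_term
ORDER BY tm.Id, tf_idf DESC;
```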
Solution for SQL Server 2008:
Here is the table:
CREATE TABLE MyTable (id INT, txt VARCHAR(MAX));
Here is the SQL query:
SELECT sum(case when TSplitted.txt_word = 'searched' then 1 else 0 end) as cnt_searched
, count(*) as cnt_all
FROM MyTable MYT
CROSS APPLY Fn_Split(MYT.id,' ',MYT.txt) TSplitted
Here is the table-valued function Fn_Split(@id int, @separator VARCHAR(32), @string VARCHAR(MAX)) (taken from here):
CREATE FUNCTION Fn_Split (@id int, @separator VARCHAR(32), @string VARCHAR(MAX))
RETURNS @t TABLE
(
ret_id INT
,txt_word VARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
-- wrap every word in <r>...</r> so the XML nodes() method can return one row per word
SET @xml = N'<root><r>' + REPLACE(@string, @separator, '</r><r>') + '</r></root>'
INSERT INTO @t(ret_id, txt_word)
SELECT @id, r.value('.','VARCHAR(MAX)') as Item
FROM @xml.nodes('//root/r') AS RECORDS(r)
RETURN
END
