Import semicolon separated list into a new table - sql

I have a table that needs refactoring to support new functionality. It is currently in the following format:
RefID (int), Data (nvarchar(255))
--------------
1, 161;162;163;164
2, 131;132;133;144
I need to transform this data and import it into a new table as follows:
ID (PK), RefID (int), Data (int)
-------------------------------------------
1,1,161
2,1,162
3,1,163
4,1,164
5,2,131
6,2,132
: :
etc.
Basically, split the semicolon delimited list (data) and create a new record for each one, converting to INTs along the way.

You can use a table-valued function that splits the string, and use it to populate your table. Here is one split function (credit to @AaronBertrand for the code):
CREATE FUNCTION [dbo].[SplitString]
(
@List NVARCHAR(MAX),
@Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
(
SELECT
[Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
) AS y
);
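To see how this tally-based function works, here is a minimal Python model of its logic (a sketch for reasoning about the output, not a replacement for the T-SQL; the name split_string is hypothetical). Positions are 1-based as in T-SQL: position n starts an item when the text just before it in @Delim + @List is the delimiter, and the item runs up to the next delimiter in @List + @Delim. Because the numbers come from ROW_NUMBER() over sys.all_objects, the T-SQL version can only scan as many characters as that view has rows; the Python model has no such limit.

```python
def split_string(lst, delim):
    """Python model of the tally-based dbo.SplitString (hypothetical name; 1-based positions as in T-SQL)."""
    padded = delim + lst      # @Delim + @List
    searched = lst + delim    # @List + @Delim
    items = []
    # WHERE Number <= LEN(@List); note that T-SQL LEN ignores trailing spaces, len() does not
    for n in range(1, len(lst) + 1):
        # SUBSTRING(@Delim + @List, n, LEN(@Delim)) = @Delim  ->  an item starts at n
        if padded[n - 1:n - 1 + len(delim)] == delim:
            # CHARINDEX(@Delim, @List + @Delim, n), as a 0-based index
            end = searched.find(delim, n - 1)
            # LTRIM(RTRIM(SUBSTRING(@List, n, CHARINDEX(...) - n))): spaces only
            items.append(lst[n - 1:end].strip(" "))
    return items


print(split_string("161;162;163;164", ";"))  # ['161', '162', '163', '164']
print(split_string("1;;2", ";"))             # ['1', '', '2']
print(split_string("a;", ";"))               # ['a']: no empty item after a trailing delimiter
```

The last case shows a side effect of `Number <= LEN(@List)`: a trailing delimiter does not produce an empty final item.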
Then you just need to do the following:
INSERT INTO dbo.ResultTable(RefID, Data)
SELECT A.RefID,
B.[Value]
FROM dbo.YourTable A
CROSS APPLY [dbo].[SplitString](A.Data,';') B
Here is an sqlfiddle with a demo of this.
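The end-to-end transformation the question asks for can be modelled in a few lines of Python, which is handy for checking the expected rows before running the INSERT. This sketch assumes ID behaves like an IDENTITY(1,1) column (the question only says it is the primary key) and that rows are processed in the order given; in SQL Server, an INSERT ... SELECT without ORDER BY does not guarantee which ID each value receives. Empty items are skipped here, whereas in T-SQL an empty string converts to 0 when inserted into an int column. The function name normalise is hypothetical.

```python
def normalise(rows):
    """Turn (RefID, '161;162;...') rows into (ID, RefID, int) rows, one per list item."""
    result = []
    next_id = 1  # stands in for a hypothetical IDENTITY(1,1) ID column
    for ref_id, data in rows:
        for item in data.split(";"):
            item = item.strip()
            if not item:
                continue  # skip empty items, e.g. from a trailing ';'
            result.append((next_id, ref_id, int(item)))  # convert to int along the way
            next_id += 1
    return result


source = [(1, "161;162;163;164"), (2, "131;132;133;144")]
for row in normalise(source):
    print(row)  # (1, 1, 161), (2, 1, 162), ... through (8, 2, 144)
```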

You can write a SQL script to handle this, but if this is a one-time job and you don't mind using other means:
select replace(convert(varchar(100),RefID) + ';' + Data,';',',') Data
from dbo.YourTable
Then save the result to a CSV file and import it. Pretty quick solution. Or you can write a script. ;)
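If you do go the script route, note that the REPLACE query produces one CSV line per RefID (for example `1,161,162,163,164`), which would still need reshaping into one row per value. A small Python script can write the normalised (RefID, Data) rows directly. This is a sketch: the input rows are hard-coded, the function name to_csv is hypothetical, and reading the rows from the database (driver, connection string) is left out.

```python
import csv
import io


def to_csv(rows):
    """Write one CSV line per list item: RefID,Data (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["RefID", "Data"])
    for ref_id, data in rows:
        for item in data.split(";"):
            if item.strip():                          # ignore empty items
                writer.writerow([ref_id, int(item)])  # convert to int along the way
    return buf.getvalue()


print(to_csv([(1, "161;162;163;164"), (2, "131;132;133;144")]), end="")
```

Save the output as a .csv file and import it as before; each line is already one row of the target table.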

To add onto @Lamak's answer, here is the pivot part:
SELECT RefID, [1],[2],[3],[4]
FROM (
SELECT a.RefID, b.value, ROW_NUMBER() OVER (PARTITION BY a.RefID ORDER BY a.RefID) ColNumber
FROM dbo.YourTable a
CROSS APPLY dbo.SplitString(a.Data,';') b) pvt
PIVOT (MIN(value) FOR ColNumber IN ([1],[2],[3],[4])) p
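A Python model of what the PIVOT produces may help: each RefID becomes one row, the first four list items land in columns 1 to 4, missing positions are NULL (None here), and any items beyond the fourth are dropped because they are not listed in `IN ([1],[2],[3],[4])`. The sketch assumes items keep their left-to-right order; in the T-SQL, ordering ROW_NUMBER() by the partitioning column does not actually guarantee that. The function name pivot is hypothetical.

```python
def pivot(rows, width=4):
    """Model of the PIVOT: one output row per RefID, list items spread over `width` columns."""
    result = []
    for ref_id, data in rows:
        items = [item for item in data.split(";") if item]
        # Pad with None (NULL) for missing positions; drop items past `width`
        cols = (items + [None] * width)[:width]
        result.append((ref_id, *cols))
    return result


print(pivot([(1, "161;162;163;164"), (2, "131;132")]))
# [(1, '161', '162', '163', '164'), (2, '131', '132', None, None)]
```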

Related

Extract a string up to the second delimiter in SQL

I want to extract everything up to the second / (forward slash) from a column in my SQL Server table. Any ideas?
website
AA.AA/AB/123
www.google.com/en/abcd/
yahoo.com/us/dev
gmail.com
output
website
AA.AA/AB
www.google.com/en
yahoo.com/us
gmail.com
Perhaps this will suit your needs:
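DECLARE @Table TABLE (Col1 NVARCHAR(100))
INSERT @Table VALUES
('website'),
('AA.AA/AB/123'),
('www.google.com/en/abcd/'),
('yahoo.com/us/dev'),
('gmail.com')
-- Take everything before the second slash. When there is no second slash,
-- NULLIF turns CHARINDEX's 0 into NULL, SUBSTRING then returns NULL,
-- and COALESCE falls back to the whole value.
SELECT
COALESCE(
SUBSTRING(Col1, 1, NULLIF(CHARINDEX('/', Col1, CHARINDEX('/', Col1) + 1), 0) - 1)
,Col1
) AS Col1
FROM @Table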
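Off-by-one errors are easy to make with CHARINDEX arithmetic. The following Python model mirrors T-SQL's 1-based CHARINDEX (0 means not found) and returns everything before the second slash, falling back to the whole value when there are fewer than two slashes. It is a sketch for checking expected output against the sample data, not SQL you can run; the helper names are hypothetical.

```python
def charindex(needle, haystack, start=1):
    """1-based position of needle in haystack at or after start; 0 if absent (like T-SQL)."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def before_second_slash(value):
    first = charindex("/", value)
    second = charindex("/", value, first + 1)
    if second == 0:
        return value            # fewer than two slashes: keep the whole value
    return value[:second - 1]   # LEFT(value, second - 1)


for v in ["website", "AA.AA/AB/123", "www.google.com/en/abcd/", "yahoo.com/us/dev", "gmail.com"]:
    print(before_second_slash(v))
```

When the value has no slash at all, `first` is 0, so the second search starts at position 1 and also returns 0; that is why a single check on `second` is enough.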
If you are using SQL Server 2017 or 2019, you can use STRING_AGG() to reassemble the output from STRING_SPLIT():
SELECT STRING_AGG(x.value, '/')
FROM dbo.table_name CROSS APPLY
(
SELECT value, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM STRING_SPLIT(Col1, '/') AS ss
) AS x(value, rn)
WHERE x.rn <= 2
GROUP BY Col1;
You might say:
"But Aaron, the output of STRING_SPLIT() isn't guaranteed to be in order; in fact the documentation warns about that."
This is true; the documentation does say that. But in current versions the output is extremely unlikely to be in anything but left-to-right order. I still suggest you be wary of relying on this, since it could break at any time (I warn about this in more detail here).
If you are on an older version, or don't trust it, you can use a table-valued function that preserves the order of the input string, for example from this answer:
CREATE FUNCTION [dbo].[SplitString]
(
@List NVARCHAR(MAX),
@Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value], idx = RANK() OVER (ORDER BY n) FROM
(
SELECT n = Number,
[Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
) AS y
);
With that function in place, you can then do the following, and now feel safer about relying on order (at the cost of a more expensive query):
;WITH src AS
(
SELECT Col1, idx, Value
FROM dbo.table_name CROSS APPLY dbo.SplitString(Col1, '/')
)
SELECT STUFF((SELECT '/' + Value
FROM src
WHERE src.idx <= 2 AND Col1 = t.Col1
ORDER BY idx
FOR XML PATH(''), TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, '')
FROM dbo.table_name AS t
GROUP BY Col1;
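The point of carrying idx is that reassembly no longer depends on the order in which rows arrive. A small Python model (hypothetical helper names) splits a value into (idx, value) pairs, deliberately reverses them to simulate an arbitrary arrival order, keeps idx <= 2, and sorts by idx before joining, much as `ORDER BY idx` does inside FOR XML PATH.

```python
def split_with_idx(value, delim):
    """Return (idx, item) pairs with a 1-based idx, like the RANK() column in dbo.SplitString."""
    return [(i, item) for i, item in enumerate(value.split(delim), start=1)]


def first_two_segments(value):
    pairs = split_with_idx(value, "/")
    pairs.reverse()  # simulate rows arriving in an arbitrary order
    # WHERE src.idx <= 2 ... ORDER BY idx
    kept = sorted((p for p in pairs if p[0] <= 2), key=lambda p: p[0])
    return "/".join(item for _, item in kept)


print(first_two_segments("www.google.com/en/abcd/"))  # www.google.com/en
print(first_two_segments("gmail.com"))                # gmail.com
```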
I find CROSS APPLY handy for these situations:
select case when i2 > 0 then left(str, i2-1) else str end as str
from t
cross apply (select charindex( '/', str ) as i1) t2 --position of first slash
cross apply (select charindex( '/', str, (i1 + 1)) as i2 ) t3 --position of second slash (0 if there is none)
Below is a simple query you can try. Replace colName with your column name and Table_1 with your table name; values with fewer than two slashes are returned unchanged.
SELECT COALESCE(LEFT([colName], NULLIF(CHARINDEX('/', [colName], CHARINDEX('/', [colName]) + 1), 0) - 1), [colName]) AS [BeforeSecondSlash]
FROM [Table_1]

Order Concatenated field

I have a field that is a concatenation of single letters. I am trying to sort the letters within each string inside a view. The values can't be hard-coded, as there are too many. Can someone provide guidance on which function to use to achieve the desired output below? I am using MS SQL Server.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (@str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE @len INT,
@cnt INT =1,
@str1 VARCHAR(50)='',
@output VARCHAR(50)=''
SELECT @len = Len(@str)
WHILE @cnt <= @len
BEGIN
SELECT @str1 += Substring(@str, @cnt, 1) + ','
SET @cnt+=1
END
SELECT @str1 = LEFT(@str1, Len(@str1) - 1)
SELECT @output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(@str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN @output
END
This works when calling it for a single row, i.e.:
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
However, when I try to apply this to TOP(10), I receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind that my source has ~4 million records, is this still the best solution?
Unfortunately, I am not able to normalize the data into a separate table with a record for each Code.
This doesn't rely on an ID column to join the table with itself, and performance is almost as fast as the answer by @Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM @tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. It creates a tally on the fly: a result set with the numbers 1 to n, where n is the length of the current string.
The second APPLY uses each number to get one character at a time with SUBSTRING().
The outer SELECT reads from the original table, which means one row per ID, and uses a correlated subquery to fetch all related characters. They are sorted and re-concatenated using FOR XML. You might add DISTINCT to avoid repeating characters.
That's it :-)
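The result table above also shows a collation effect: the sort is case-insensitive, so `v` sorts before `Y` in `CEJKKMQRUvY`, whereas a plain binary sort would put all uppercase letters first. This Python sketch (hypothetical function name) mimics a case-insensitive collation by sorting with `str.casefold` as the key. It is an approximation: real SQL Server collations have further rules (accents, tie-breaking) that it does not model.

```python
def sort_chars(s):
    """Sort the characters of s case-insensitively, then re-concatenate them."""
    return "".join(sorted(s, key=str.casefold))


for s in ["ABC", "JSKEzXO", "QKEvYUJMKRC"]:
    print(s, sort_chars(s))
# ABC ABC
# JSKEzXO EJKOSXz
# QKEvYUJMKRC CEJKKMQRUvY

print("".join(sorted("QKEvYUJMKRC")))  # CEJKKMQRUYv (binary order: uppercase before lowercase)
```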
Hint: SQL Server 2017+
SQL Server 2017 introduced the STRING_AGG() function, which makes the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Since your table has a large number of rows (~4 million), I would suggest creating a persisted computed column in the table to store these values, because calculating them at run time in a view will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes. When @str is empty, the loop never runs, @str1 stays '', and LEFT(@str1, LEN(@str1) - 1) is called with a length of -1 (LEN also returns 0 for a value made only of spaces). Guard against that:
IF LEN(@str) = 0
BEGIN
SET @output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
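To see why empty codes trigger the error, here is a Python model of the UDF's length arithmetic (all helper names are hypothetical). T-SQL LEN ignores trailing spaces, so a code made only of spaces behaves like an empty one: the loop never runs, @str1 stays empty, and the length passed to LEFT is -1. The guarded version returns '' before reaching LEFT; its sort approximates a case-insensitive collation.

```python
def tsql_len(s):
    """Model of T-SQL LEN: trailing spaces are not counted."""
    return len(s.rstrip(" "))


def left_length_in_udf(code):
    """Length the UDF passes to LEFT(@str1, LEN(@str1) - 1)."""
    str1 = "".join(ch + "," for ch in code[:tsql_len(code)])  # the WHILE loop
    return tsql_len(str1) - 1


def alpha_order(code):
    """The UDF with the empty-string guard, modelled in Python."""
    if tsql_len(code) == 0:
        return ""  # guard: skip the LEFT call entirely
    return "".join(sorted(code[:tsql_len(code)], key=str.casefold))


print(left_length_in_udf("BCA"))  # 5: fine
print(left_length_in_udf(""))     # -1: the invalid length behind the error
print(alpha_order("BCA"))         # ABC
```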
I suggest splitting the string into its characters using the referenced SQL split function.
Then you can concatenate the characters back together, this time in alphabetical order.
Are you using SQL Server 2017? If so, you can use the STRING_AGG string aggregation function to concatenate the split characters in order, as follows:
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not on SQL Server 2017, you can use the structure below, with FOR XML PATH for the concatenation:
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Get a specific string

This is my data, and every ThroughRouteSid value follows the same pattern: six numbers, each followed by a comma. I just want to take the third and fifth numbers, insert them as two records into a temp table, and give both records the same Count value.
For example, take the first record:
ThroughRouteSid = 3730,2428,2428,3935,3935,3938, and Count = 32.
I want a result like this:
2428 32
3935 32
That is, the two numbers I want become two records in the temp table, both with the same Count value.
You can use XML to get your result; please refer to the sample code below:
create table #t1( ThroughRouteSid varchar(500) , Cnt int)
insert into #t1
select '3730,2428,2428,3935,3935,3938,' , len('3730,2428,2428,3935,3935,3938,')
union all select '1111,2222,3333,4444,5555,6666,' , len('1111,2222,3333,4444,5555,6666,')
select cast( '<xml><td>' + REPLACE( SUBSTRING(ThroughRouteSid ,1 , len(ThroughRouteSid)-1),',','</td><td>') + '</td></xml>' as xml) XmlData , Cnt
into #t2 from #t1
select XmlData.value('(xml/td)[3]' ,'int' ), Cnt ,XmlData.value('(xml/td)[5]' ,'int' ), Cnt
from #t2
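Stripped of the XML, the logic is: drop the trailing comma, take the third and fifth numbers, and emit each as its own row with the shared count. A minimal Python model (hypothetical function name), assuming every value contains at least five numbers:

```python
def third_and_fifth(route_sids, count):
    """Return two (number, count) rows: the 3rd and 5th comma-separated numbers."""
    numbers = [int(x) for x in route_sids.split(",") if x]  # the trailing comma yields '', which is dropped
    return [(numbers[2], count), (numbers[4], count)]       # 0-based indexes 2 and 4


print(third_and_fifth("3730,2428,2428,3935,3935,3938,", 32))
# [(2428, 32), (3935, 32)]
```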
First create the split function from "How to Split a string by delimited char in SQL Server". Then try the following query:
select (SELECT CONVERT(varchar,splitdata) + ' '+ Convert(varchar, [Count])+' ' FROM (select splitdata, ROW_NUMBER() over (ORDER BY (SELECT 100)) row_no
from [dbo].[fnSplitString](ThroughRouteSid,',')
where splitdata != '') as temp where row_no in (2,5)
for xml path('')) as col1 from [yourtable]
If you are using SQL Server 2016 you can do something like this:
create table #temp (ThroughRouteSid varchar(1024),[Count] int)
insert into #temp values
('3730,2428,2428,3935,3935,3938,',32),
('730,428,428,335,935,938,',28)
select
spt.value,
t.[Count]
from #temp t
cross apply (
select value from STRING_SPLIT(t.ThroughRouteSid,',') where LEN(value) > 0
)spt

How to split concatenated field in SQL with SSIS or SQL

I have to split a concatenated field into separate rows.
The delimiter is a "+" sign.
So in my field I have 3%+2%+1%, and what I want is row 1 -> 3%, row 2 -> 2%, and so on.
But there is one more big problem: I don't know how many concatenated values there are, so there could be 3, 5, or maybe 10 values.
Can somebody help me solve this with SSIS or SQL?
For me, @sdrzymala is correct here. I would normalise this data first, before loading it into a database. If the client or report needed the data pivoted or denormalised again, I would do this in client code.
1) First, I would save the following split function and staging table "PercentsNormalised" into the database. I got the split function from this question here.
-- DDL Code:
Create FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
Create Table PercentsNormalised(
RowIndex Int,
-- Other fields here,
PercentValue Varchar(100)
)
GO
2) Transform the data as follows, either with some SQL (like below) or with the same logic in an SSIS Data Flow task, and insert it into the "PercentsNormalised" table created above.
With TestData As (
-- Replace with "real table" containing concatenated rows
Select '3%+2%+1%' As Percents Union All
Select '5%+1%+1%+0%' Union All
Select '10%+8%' Union All
Select '10%+5%+1%+1%+0%'
),
TestDataWithRowIndex As (
-- You might want to order the rows by another field
-- in the "real table"
Select Row_Number() Over (Order By Percents) As RowIndex,
Percents
From TestData
)
-- You could remove this insert and select and have the logic in a
-- SSIS Dataflow task
Insert PercentsNormalised
Select td.RowIndex,
ss.Item As PercentValue
From TestDataWithRowIndex As td
Cross Apply dbo.SplitStrings(td.Percents, '+') ss;
3) Write client code against the "PercentsNormalised" table, using, say, the SQL PIVOT operator.
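Steps 2 and 3 can be prototyped outside the database. This Python sketch (hypothetical function names) normalises each concatenated value into (RowIndex, PercentValue) rows, copes with any number of "+"-separated items, and then pivots back in client code. Unlike the T-SQL, which numbers rows with ORDER BY Percents, it numbers them in input order.

```python
from collections import defaultdict


def normalise(values):
    """Split each '+'-concatenated value into (row_index, percent_value) rows."""
    rows = []
    for row_index, value in enumerate(values, start=1):  # input order, not ORDER BY Percents
        for item in value.split("+"):
            rows.append((row_index, item.strip()))
    return rows


def pivot(rows):
    """Client-side pivot: row_index -> list of its values, in their original order."""
    grouped = defaultdict(list)
    for row_index, item in rows:
        grouped[row_index].append(item)
    return dict(grouped)


data = ["3%+2%+1%", "5%+1%+1%+0%", "10%+8%", "10%+5%+1%+1%+0%"]
normalised = normalise(data)
print(normalised[:3])         # [(1, '3%'), (1, '2%'), (1, '1%')]
print(pivot(normalised)[3])   # ['10%', '8%']
```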

Splitting a variable length column in SQL server safely

I have a column (varchar(400)) in the following form in an SQL table:
Info
UserID=1123456,ItemID=6685642
The column is created by our point-of-sale application, so I cannot do the normal thing of simply splitting it into two columns, as this would cause an obscene amount of work. My problem is that this column is used to store attributes of products in our database, so while I am only concerned with UserID and ItemID, there may be superfluous information stored here, for example:
Info
IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.
What I want to retrieve is simply two columns, with no error raised if these attributes do not exist in the Info column:
UserID ItemID
123124 1213124
Would it be possible to do this efficiently, with error checking, given that the IDs are all of variable length, but all of the attributes are comma-separated and follow a uniform style (i.e. "UserID=number")?
Can anyone tell me the best way of dealing with this problem?
Thanks a lot.
Try this
declare @infotable table (info varchar(4000))
insert into @infotable
select 'IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.'
union all
select 'UserID=1123456,ItemID=6685642'
-- convert info column to xml type
; with cte as
(
select cast('<info ' + REPLACE(REPLACE(REPLACE(info,',', '" '),'=','="'),'.','') + '" />' as XML) info,
ROW_NUMBER() over (order by info) id
from @infotable
)
select userId, ItemId from
(
select T.N.value('local-name(.)', 'varchar(max)') as Name,
T.N.value('.', 'varchar(max)') as Value, id
from cte cross apply info.nodes('//@*') as T(N)
) v
pivot (max(value) for Name in ([UserID], [ItemId])) p
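The same idea without XML: treat Info as comma-separated key=value pairs and look up the two keys you need, returning None (NULL) when one is missing instead of raising an error. This is a Python model with a hypothetical function name; it strips a trailing dot as a terminator, which is narrower than the REPLACE(..., '.', '') above, since that removes every dot.

```python
def parse_info(info):
    """Return (UserID, ItemID) from 'Key=Value,...' text; None for a missing key."""
    pairs = {}
    for part in info.rstrip(".").split(","):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without '='
            pairs[key.strip()] = value.strip()
    return pairs.get("UserID"), pairs.get("ItemID")


print(parse_info("IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124."))
# ('123124', '1213124')
print(parse_info("UserID=1123456,ItemID=6685642"))  # ('1123456', '6685642')
print(parse_info("IrrelevantID=666"))               # (None, None)
```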
You can try this split function: http://www.sommarskog.se/arrays-in-sql-2005.html
Assuming ItemID=1213124. is terminated with a dot.
Declare @t Table (a varchar(400))
insert into @t values ('IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.')
insert into @t values ('IrrelevantID=333,UserID=222222,AnotherIrrelevantID=0,ItemID=111.')
Select
STUFF(
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
,CharIndex
(',',
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
)
,400,'') as UserID
,
STUFF(
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1 ,'' )
,CharIndex
('.',
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1,'' )
)
,400,'') as ItemID
from @t