Splitting a variable length column in SQL server safely - sql

I have a column (varchar400) in the following form in an SQL table :
Info
UserID=1123456,ItemID=6685642
The column is created via our point of sale application, and so I cannot do the normal thing of simply splitting it into two columns as this would cause an obscene amount of work. My problem is that this column is used to store attributes of products in our database, and so while I am only concerned with UserID and ItemID, there may be superfluous information stored here, for example :
Info
IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.
What I want to retrieve is simply two columns, with no error given if neither of these attributes exists in the Info column. :
UserID ItemID
123124 1213124
Would it be possible to do this effectively, with error checking, given that the length of the IDs are all variable, but all of the attributes are comma-separated and follow a uniform style (i.e "UserID=number").
Can anyone tell me the best way of dealing with my problem ?
Thanks a lot.

Try this
declare #infotable table (info varchar(4000))
insert into #infotable
select 'IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.'
union all
select 'UserID=1123456,ItemID=6685642'
-- convert info column to xml type
; with cte as
(
select cast('<info ' + REPLACE(REPLACE(REPLACE(info,',', '" '),'=','="'),'.','') + '" />' as XML) info,
ROW_NUMBER() over (order by info) id
from #infotable
)
select userId, ItemId from
(
select T.N.value('local-name(.)', 'varchar(max)') as Name,
T.N.value('.', 'varchar(max)') as Value, id
from cte cross apply info.nodes('//#*') as T(N)
) v
pivot (max(value) for Name in ([UserID], [ItemId])) p
SQL DEMO

You can try this split function: http://www.sommarskog.se/arrays-in-sql-2005.html

Assuming ItemID=1213124. is terminated with a dot.
Declare #t Table (a varchar(400))
insert into #t values ('IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.')
insert into #t values ('IrrelevantID=333,UserID=222222,AnotherIrrelevantID=0,ItemID=111.')
Select
STUFF(
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
,CharIndex
(',',
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
)
,400,'') as UserID
,
STUFF(
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1 ,'' )
,CharIndex
('.',
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1,'' )
)
,400,'') as ItemID
from #t

Related

How to remove unwanted numbers and/or Special characters from String field in SSIS or SQL

I am new to SSIS. I am trying extract the data from SharePoint and load the data into SQL Server 2012. Most of the fields are coming fine except one. I am getting the unwanted values (random number and # character) like
117;#00.010;#120;#00.013
where I want to display
00.010;00.013
I tried to use below code in Derived column but still no luck
REPLACE([Related Procedure], SUBSTRING([Related Procedure], 1, FINDSTRING([Related Procedure], "#", 1)), "")
and this is the output I am getting if I use the above code
00.010;#120;#00.013
My desired output is
00.010;00.013
Please note this is using TSQL, it is not an SSIS expression. Below is a solution that will work in SQL Server 2017 or newer. The STRING_AGG function is SQL SERVER 2017 or newer and STRING_SPLIT is SQL SERVER 2016 or newer.
I use STRING_SPLIT to break apart the string by ;, then STRING_AGG to recombine the pieces you want to keep. I added another record to my example to demonstrate how you need to GROUP BY to keep the values in separate rows, otherwise all your values would come back in a single row.
CREATE TABLE #MyTable
(
Id INT IDENTITY(1,1)
, [Related Procedure] VARCHAR(100)
)
INSERT INTO #MyTable VALUES
('117;#00.010;#120;#00.013')
, ('118;#00.011;#121;#00.014')
SELECT
STRING_AGG(REPLACE([value], '#', ''), ';')
FROM
#MyTable
CROSS APPLY STRING_SPLIT([Related Procedure], ';')
WHERE
[value] LIKE '%.%'
GROUP BY
Id
Please try this:
IF (OBJECT_ID('tempdb..#temp_table') IS NOT NULL)
BEGIN
DROP TABLE #temp_table
END;
CREATE TABLE #temp_table
(
id int identity(1,1),
String VARCHAR(MAX)
)
INSERT #temp_table SELECT '117;#00.010;#120;#00.013'
;with tmp (id, value)as (
SELECT id, replace(value, '#','')
FROM #temp_table
CROSS APPLY STRING_SPLIT(String, ';')
where value like '%.%'
)
SELECT STUFF((SELECT '; ' + value -- or CAST(value AS VARCHAR(MAX)) [text()]
from tmp
where id= t.id
for xml path(''), TYPE) .value('.','NVARCHAR(MAX)'),1,2,' ') value
FROM tmp t
or since your sql server version is 2017, you can use STRING_AGG instead of STUFF to concatenate strings returned via CTE.
SELECT STRING_AGG(value, NVARCHAR(MAX)) AS csv FROM tmp group by id;

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (#str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE #len INT,
#cnt INT =1,
#str1 VARCHAR(50)='',
#output VARCHAR(50)=''
SELECT #len = Len(#str)
WHILE #cnt <= #len
BEGIN
SELECT #str1 += Substring(#str, #cnt, 1) + ','
SET #cnt+=1
END
SELECT #str1 = LEFT(#str1, Len(#str1) - 1)
SELECT #output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(#str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN #output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however when i try to apply this to top(10) i receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4million records, is this still the best solution?
Unfortunately i am not able to normalize the data into a separate table with records for each Code.
This doesn't rely on a id column to join with itself, performance is almost as fast
as the answer by #Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO #tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM #tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT calls from the orginal table, which means one-row-per-ID and use a correalted sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Considering your table having good amount of rows (~4 Million), I would suggest you to create a persisted calculated field in the table, to store these values. As calculating these values at run time in a view, will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(#str) = 0
BEGIN
SET #output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
I can suggest to split string into its characters using referred SQL function.
Then you can concatenate string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use SQL String_Agg string aggregation function to concatenate characters splitted in an ordered way as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

How can I compare these two strings in SQL Server?

So I need to compare a string against another string to see if any parts of the string match. This would be useful for checking if a list of salespeople IDs against the ones that are listed to a specific GM or if falls outside of that GMs list of IDs:
ID_SP ID_GM NEEDED FIELD (overlap)
136,338,342 512,338,112 338
512,112,208 512,338,112 512,112
587,641,211 512,338,112 null
I'm struggling on how to achieve this. I'm guessing some sort of UDF?
I realize this would be much easier to have done prior to using the for XML path(''), but I'm hoping for a solution that doesn't require me to unravel the data as that will blow up the overall size of the dataset.
No, that is not how you do it. You would go back to the raw data. To get the ids in common:
select tbob.id
from t tbob join
t tmary
on tbob.id = tmary.id and tbob.manager = 'Bob' and tmary.manager = 'Mary';
Since the data set isn't two raw sources, but one 'concatenated field' and a hardcoded string field that is a list of GMIDs (same value for every row) then the correct answer (from the starting point of the question) is to use something like nodes('/M') as Split(a).
Then you get something like this:
ID_SP ID_GM
136 512,338,112
338 512,338,112
342 512,338,112
and can do something like this:
case when ID_GM not like '%'+ID_SP+'%'then 1 else 0 end as 'indicator'
From here you can aggregate back and sum the indicator field and say that if > 0 then the ID_SP exists in the list of ID_GMs
Hope this helps someone else.
-- Try This
Declare #String1 as varchar(100)='512,112,208';
Declare #String2 as varchar(100)='512,338,112';
WITH FirstStringSplit(S1) AS
(
SELECT CAST('<x>' + REPLACE(#String1,',','</x><x>') + '</x>' AS XML)
)
,SecondStringSplit(S2) AS
(
SELECT CAST('<x>' + REPLACE(#String2,',','</x><x>') + '</x>' AS XML)
)
SELECT STUFF(
(
SELECT ',' + part1.value('.','nvarchar(max)')
FROM FirstStringSplit
CROSS APPLY S1.nodes('/x') AS A(part1)
WHERE part1.value('.','nvarchar(max)') IN(SELECT B.part2.value('.','nvarchar(max)')
FROM SecondStringSplit
CROSS APPLY S2.nodes('/x') AS B(part2)
)
FOR XML PATH('')
),1,1,'')
Gordon is correct, that you should not do this. This ought do be done with the raw data. the following code will "go back to the raw data" and solve this with an easy INNER JOIN.
The CTEs will create derived tables (all the many rows you want to avoid) and check them for equality (Not using indexes! One more reason to do this in advance):
DECLARE #tbl TABLE(ID INT IDENTITY,ID_SP VARCHAR(100),ID_GM VARCHAR(100));
INSERT INTO #tbl VALUES
('136,338,342','512,338,112')
,('512,112,208','512,338,112')
,('587,641,211','512,338,112');
WITH Splitted AS
(
SELECT t.*
,CAST('<x>' + REPLACE(t.ID_SP,',','</x><x>') + '</x>' AS xml) AS PartedSP
,CAST('<x>' + REPLACE(t.ID_GM,',','</x><x>') + '</x>' AS xml) AS PartedGM
FROM #tbl AS t
)
,SetSP AS
(
SELECT Splitted.ID
,Splitted.ID_SP
,x.value('text()[1]','int') AS SP_ID
FROM Splitted
CROSS APPLY PartedSP.nodes('/x') AS A(x)
)
,SetGM AS
(
SELECT Splitted.ID
,Splitted.ID_GM
,x.value('text()[1]','int') AS GM_ID
FROM Splitted
CROSS APPLY PartedGM.nodes('/x') AS A(x)
)
,BackToYourRawData AS --Here is the point you should do this in advance!
(
SELECT SetSP.ID
,SetSP.SP_ID
,SetGM.GM_ID
FROM SetSP
INNER JOIN SetGM ON SetSP.ID=SetGM.ID
AND SetSP.SP_ID=SetGM.GM_ID
)
SELECT ID
,STUFF((
SELECT ',' + CAST(rd2.SP_ID AS VARCHAR(10))
FROM BackToYourRawData AS rd2
WHERE rd.ID=rd2.ID
ORDER BY rd2.SP_ID
FOR XML PATH('')),1,1,'') AS CommonID
FROM BackToYourRawData AS rd
GROUP BY ID;
The result
ID CommonID
1 338
2 112,512

Getting a list of text concatenated in a group by

Say I have this data:
site cell value
a b "1"
a c "2"
And I want the output for the format:
site value
a "b=1,c=2"
Is it possible with SQL?
PS: I am using access. But even if access does not support this particular syntax I would like to know any database that can.
Declare #tbl table ([site] nvarchar(100),Cell nvarchar(100),Value nvarchar(100))
INSERT INTO #tbl values('A','b','1')
INSERT INTO #tbl values('A','c','2')
SELECT [Site],
SUBSTRING(
(
select ' ,'+ Cell +'=' + CAST(value AS VARCHAR)
from #tbl b
WHERE a.[Site] = b.[Site]
FOR XML PATH('')
)
,3,100)
FROM #tbl a
GROUP BY a.[Site]
It is possible to do this in MySQL with GROUP_CONCAT

SQL Server query with multiple values in one column relating to another column

Situation: This table holds the relation information between a Documents table and an Users table. Certain Users need to review or approve documents (Type). I would like to have it to where I could get all of the reviewers on one line if needed. So if three users review Document 1, then a row would have 346, 394, 519 as the value, since those are the reviewers
Table: xDocumentsUsers
DocID..UserID....Type...
1........386......approver
1........346......reviewer
1........394......reviewer..
1........519......reviewer..
4........408......reviewer..
5........408......reviewer..
6........408......reviewer..
7........386......approver..
7........111......readdone..
7........346......reviewer..
8........386......approver..
8........346......reviewer..
9........386......approver..
9........346......reviewer..
10.......386......approver..
11.......386......approver..
11......346......reviewer..
12......386......approver..
12......346......reviewer..
13......386......approver..
13......346......reviewer..
14......386......approver..
14......346......reviewer..
15......386......approver
So desired result would be...
DocID..UserID................Type...
1........386....................approver
1........346,394,519......reviewer.
4........408....................reviewer..
5........408....................reviewer..
6........408....................reviewer..
7........386....................approver..
7........111....................readdone..
7........346....................reviewer..
8........386....................approver..
8........346....................reviewer..
9........386....................approver..
9........346....................reviewer..
10......386....................approver..
11......386....................approver..
11......346....................reviewer..
12......386....................approver..
12......346....................reviewer..
13......386....................approver..
13......346....................reviewer..
14......386....................approver..
14......346....................reviewer..
15......386....................approver
The FOR XML PATH is a great solution. You need to be aware, though, that it will convert any special characters in the inner SELECTs result set into their xml equivalent - i.e., & will become & in the XML result set. You can easily revert back to the original character by using the REPLACE function around the inner result set. To borrow from astander's previous example, it would look like (note that the SELECT as the 1st argument to the REPLACE function is enclosed in ():
--Concat
SELECT t.ID,
REPLACE((SELECT tIn.Val + ','
FROM #Table tIn
WHERE tIn.ID = t.ID
FOR XML PATH('')), '&', '&'))
FROM #Table t
GROUP BY t.ID
Have a look at
Emulating MySQL’s GROUP_CONCAT() Function in SQL Server 2005
Is there a way to create a SQL Server function to “join” multiple rows from a subquery into a single delimited field?
A simple example is
DECLARE #Table TABLE(
ID INT,
Val VARCHAR(50)
)
INSERT INTO #Table (ID,Val) SELECT 1, 'A'
INSERT INTO #Table (ID,Val) SELECT 1, 'B'
INSERT INTO #Table (ID,Val) SELECT 1, 'C'
INSERT INTO #Table (ID,Val) SELECT 2, 'B'
INSERT INTO #Table (ID,Val) SELECT 2, 'C'
--Concat
SELECT t.ID,
(
SELECT tIn.Val + ','
FROM #Table tIn
WHERE tIn.ID = t.ID
FOR XML PATH('')
)
FROM #Table t
GROUP BY t.ID
Does this help?
SELECT DocID
, [Type]
, (SELECT CAST(UserID + ', ' AS VARCHAR(MAX))
FROM [xDocumentsUsers]
WHERE (UserID = x1.UserID)
FOR XML PATH ('')
) AS [UserIDs]
FROM [xDocumentsUsers] AS x1