SQL Server: Compare same string

I have a select query, say:
select details,* from employee
The details column value can look like 'very good,very good, bad'. It can have any number of comma-separated values.
I want to compare the text that falls between the commas and remove duplicates.
The result needs to be like 'very good,bad'.
How can I implement it? Please help.
Thanks in advance.

I have created a scalar-valued function, fn_RemoveDuplicate, which takes a varchar as input and returns a varchar with the duplicates removed.
You can then use it as
select dbo.fn_RemoveDuplicate(details),* from employee
CREATE FUNCTION dbo.fn_RemoveDuplicate
(
    @inputstring varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
    -- Wrap each comma-separated item in <detail><value1>...</value1></detail>.
    -- Note: input containing XML special characters such as & or < makes the cast fail.
    DECLARE @test2 varchar(max) =
        '<Details><detail><value1>'
        + REPLACE(@inputstring, ',', '</value1></detail><detail><value1>')
        + '</value1></detail></Details>';
    DECLARE @test1 xml = CAST(@test2 AS xml);

    -- Rebuild the list from the distinct, trimmed values (' bad' and 'bad' match).
    DECLARE @Details varchar(max) = NULL;
    SELECT @Details = COALESCE(@Details + ',', '') + value1
    FROM (SELECT DISTINCT LTRIM(RTRIM(t.x.value('value1[1]', 'varchar(50)'))) AS value1
          FROM @test1.nodes('/Details/detail') AS t(x)) AS p;

    RETURN @Details;
END

If you use SQL Server 2016 or later (STRING_SPLIT also needs database compatibility level 130 or higher), the following query solves your problem:
select
e.*,
x.[expected_result]
from
employee e
cross apply
(select
stuff((
select
distinct
','+ltrim(rtrim(value))
from
string_split(e.details, ',')
for xml path(''))
,1 ,1 ,'') as [expected_result]) as x
I solved it using the STRING_SPLIT() and STUFF() functions. The following links help explain how they work:
STRING_SPLIT (Transact-SQL)
STUFF (Transact-SQL)
SQL Server CROSS APPLY and OUTER APPLY
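Here is a minimal look at the two building blocks, applied to a literal string. It assumes SQL Server 2016+ at compatibility level 130 or higher, which STRING_SPLIT requires:

```sql
-- STRING_SPLIT returns one row per piece in a column named [value].
-- Pieces keep their surrounding spaces, hence LTRIM(RTRIM(...)).
-- The row order of STRING_SPLIT's output is not guaranteed.
SELECT LTRIM(RTRIM(value)) AS piece
FROM STRING_SPLIT('very good,very good, bad', ',');

-- STUFF(string, start, length, replacement) deletes `length` characters
-- starting at `start` and inserts `replacement` there. With (1, 1, '') it
-- drops the leading comma that the FOR XML PATH('') concatenation leaves.
SELECT STUFF(',bad,very good', 1, 1, '') AS joined;
```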
Storing data as comma-separated values is not good practice. I also strongly suggest changing the model if possible.
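The sketch below shows what a normalized model could look like, with one row per employee/detail pair. The table and column names are hypothetical, and a table variable stands in for a real table. The primary key rejects a duplicate detail for the same employee. STRING_AGG (SQL Server 2017+) can still produce the comma-separated form when you need it for display:

```sql
-- Hypothetical child table: one row per (employee, detail) pair.
DECLARE @EmployeeDetail TABLE
(
    EmployeeId int          NOT NULL,
    Detail     varchar(100) NOT NULL,
    PRIMARY KEY (EmployeeId, Detail)   -- duplicates per employee are impossible
);

INSERT INTO @EmployeeDetail (EmployeeId, Detail)
VALUES (1, 'very good'), (1, 'bad'), (2, 'good');

-- Rebuild the comma-separated list only for display.
SELECT EmployeeId,
       STRING_AGG(Detail, ',') WITHIN GROUP (ORDER BY Detail) AS details
FROM @EmployeeDetail
GROUP BY EmployeeId;
```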

The idea is to use a table-valued function (fn_SplitString) and then combine the resulting rows based on their distinct values.
The following query should do what you want:
SELECT
[ID],[Details],
[cleansedDetails] = (SELECT
STUFF((
SELECT
DISTINCT ','+LTRIM(RTRIM(ISNULL(ncValue,cvalue)))
FROM
fn_SplitString([Details], ',')
FOR XML PATH(''))
,1 ,1 ,''))
FROM [dbo].[tb_Employee]
In this db<>fiddle, you could find the DDL & DML for my example data and the definition for the table valued function fn_SplitString. You could check how the code works in different scenarios.
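On SQL Server 2017 or later, you can write the same split / distinct / re-join idea with the built-in STRING_SPLIT and STRING_AGG instead of fn_SplitString and FOR XML PATH. The sketch below uses a table variable with made-up rows in place of [dbo].[tb_Employee]. STRING_AGG has no DISTINCT option, so the distinct step happens in a derived table first:

```sql
DECLARE @Employee TABLE (ID int, Details varchar(200));
INSERT INTO @Employee (ID, Details)
VALUES (1, 'very good,very good, bad'), (2, 'ok,ok,ok');

SELECT e.ID, e.Details, x.cleansedDetails
FROM @Employee AS e
CROSS APPLY
(
    -- de-duplicate the trimmed pieces, then join them back in a fixed order
    SELECT STRING_AGG(d.piece, ',') WITHIN GROUP (ORDER BY d.piece) AS cleansedDetails
    FROM (SELECT DISTINCT LTRIM(RTRIM(s.value)) AS piece
          FROM STRING_SPLIT(e.Details, ',') AS s) AS d
) AS x;
```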

Related

How to remove unwanted numbers and/or Special characters from String field in SSIS or SQL

I am new to SSIS. I am trying to extract data from SharePoint and load it into SQL Server 2012. Most of the fields come through fine except one. I am getting unwanted values (a random number and the # character), like
117;#00.010;#120;#00.013
where I want to display
00.010;00.013
I tried using the code below in a Derived Column, but still no luck:
REPLACE([Related Procedure], SUBSTRING([Related Procedure], 1, FINDSTRING([Related Procedure], "#", 1)), "")
and this is the output I get with that code:
00.010;#120;#00.013
My desired output is
00.010;00.013
Please note this is T-SQL, not an SSIS expression. The solution below works on SQL Server 2017 or newer: STRING_AGG requires SQL Server 2017 or newer, and STRING_SPLIT requires SQL Server 2016 or newer.
I use STRING_SPLIT to break the string apart on ';', then STRING_AGG to recombine the pieces you want to keep. I added another record to my example to show why you need GROUP BY to keep the values in separate rows; without it, all your values would come back in a single row.
CREATE TABLE #MyTable
(
Id INT IDENTITY(1,1)
, [Related Procedure] VARCHAR(100)
)
INSERT INTO #MyTable VALUES
('117;#00.010;#120;#00.013')
, ('118;#00.011;#121;#00.014')
SELECT
STRING_AGG(REPLACE([value], '#', ''), ';')
FROM
#MyTable
CROSS APPLY STRING_SPLIT([Related Procedure], ';')
WHERE
[value] LIKE '%.%'
GROUP BY
Id
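On SQL Server 2022 or Azure SQL, STRING_SPLIT accepts a third argument (enable_ordinal). It adds an ordinal column with each piece's 1-based position. You can then keep every second piece instead of relying on the values containing a dot, and order the result by position. This sketch assumes the data always alternates lookup ID and value ('117;#00.010;...'), as in the sample, and uses a table variable in place of #MyTable:

```sql
DECLARE @MyTable TABLE (Id int IDENTITY(1,1), [Related Procedure] varchar(100));
INSERT INTO @MyTable ([Related Procedure])
VALUES ('117;#00.010;#120;#00.013'), ('118;#00.011;#121;#00.014');

SELECT t.Id,
       STRING_AGG(REPLACE(s.value, '#', ''), ';')
           WITHIN GROUP (ORDER BY s.ordinal) AS [Related Procedure]
FROM @MyTable AS t
-- third argument 1 = enable_ordinal (SQL Server 2022+ / Azure SQL only)
CROSS APPLY STRING_SPLIT(t.[Related Procedure], ';', 1) AS s
WHERE s.ordinal % 2 = 0   -- even positions hold the values, odd ones the lookup IDs
GROUP BY t.Id;
```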
Please try this:
IF (OBJECT_ID('tempdb..#temp_table') IS NOT NULL)
BEGIN
DROP TABLE #temp_table
END;
CREATE TABLE #temp_table
(
id int identity(1,1),
String VARCHAR(MAX)
)
INSERT #temp_table SELECT '117;#00.010;#120;#00.013'
;with tmp (id, value)as (
SELECT id, replace(value, '#','')
FROM #temp_table
CROSS APPLY STRING_SPLIT(String, ';')
where value like '%.%'
)
SELECT t.id,
       STUFF((SELECT '; ' + value -- or CAST(value AS VARCHAR(MAX)) [text()]
              from tmp
              where id = t.id
              for xml path(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,'') value
FROM #temp_table t
Or, since your SQL Server version is 2017, you can use STRING_AGG instead of STUFF to concatenate the strings returned by the CTE (keep the CTE and replace the final SELECT with this):
SELECT id, STRING_AGG(CAST(value AS NVARCHAR(MAX)), '; ') AS csv FROM tmp GROUP BY id;

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (@str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE @len INT,
@cnt INT =1,
@str1 VARCHAR(50)='',
@output VARCHAR(50)=''
SELECT @len = Len(@str)
WHILE @cnt <= @len
BEGIN
SELECT @str1 += Substring(@str, @cnt, 1) + ','
SET @cnt+=1
END
SELECT @str1 = LEFT(@str1, Len(@str1) - 1)
SELECT @output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(@str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN @output
END
This works when called for a single row, i.e.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
However, when I try to apply it to TOP(10), I receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4 million records, is this still the best solution?
Unfortunately, I am not able to normalize the data into a separate table with a record for each Code.
This doesn't rely on an ID column to join with itself, and its performance is almost as fast
as the answer by @Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM @tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. It creates a tally on the fly: a result set with the numbers 1 to n, where n is the length of the current string.
The second APPLY uses each number to get the characters one by one with SUBSTRING().
The outer SELECT reads from the original table, which means one row per ID, and uses a correlated subquery to fetch all related characters. They are sorted and re-concatenated using FOR XML. You might add DISTINCT to avoid repeating characters.
That's it :-)
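You can see the tally and SUBSTRING steps in isolation by running the same pair of derived tables against a single variable. This sketch swaps the answer's TOP(LEN(...)) for a WHERE filter on the row number. That filter guarantees the numbers are exactly 1 to LEN(@s). Like the answer, it uses master..spt_values only as a convenient source of a few thousand rows:

```sql
DECLARE @s varchar(100) = 'BCA';

SELECT A.Nmbr, B.Chr
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM master..spt_values) AS A(Nmbr)                -- tally: 1, 2, 3, ...
CROSS APPLY (SELECT SUBSTRING(@s, A.Nmbr, 1)) AS B(Chr)  -- one character per number
WHERE A.Nmbr <= LEN(@s)                                  -- keep only 1 .. LEN(@s)
ORDER BY A.Nmbr;
```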
Hint: SQL-Server 2017+
Version 2017 introduced the STRING_AGG() function, which makes the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Given that your table has a large number of rows (~4 million), I would suggest creating a persisted computed column in the table to store these values, because calculating them at run time in a view will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
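I think the error you are getting could be due to empty (or NULL) codes. In that case the loop never runs, @str1 stays '', and LEFT(@str1, LEN(@str1) - 1) is called with a length of -1. Guard against that at the top of the function:
IF @str IS NULL OR LEN(@str) = 0
BEGIN
SET @output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END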
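The snippet below is a minimal repro of that failure outside the function. For an empty Code, the UDF reaches its LEFT call with @str1 = ''. The TRY/CATCH only captures the error number, which should be 537, "Invalid length parameter passed to the LEFT or SUBSTRING function":

```sql
DECLARE @str1 varchar(50) = '';   -- what the UDF's loop leaves for an empty Code
DECLARE @err int = 0;

BEGIN TRY
    SELECT @str1 = LEFT(@str1, LEN(@str1) - 1);   -- LEN('') - 1 = -1
END TRY
BEGIN CATCH
    SET @err = ERROR_NUMBER();                     -- capture instead of failing
END CATCH;

SELECT @err AS ErrorNumber;
```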
I suggest splitting the string into its characters using the referred SQL function.
Then you can concatenate the string back together, this time ordered alphabetically.
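The referred split function is not included here. The following is a minimal sketch of an inline table-valued function with the shape the queries that follow expect: one row per character, in a column named strval. The name dbo.SPLIT and its body are assumptions, not the original function. It numbers rows of master..spt_values, so it only covers strings up to that table's row count (a few thousand):

```sql
-- Hypothetical stand-in for the referred dbo.SPLIT: one row per character.
CREATE FUNCTION dbo.SPLIT (@str varchar(8000))
RETURNS TABLE
AS
RETURN
    SELECT SUBSTRING(@str, n.pos, 1) AS strval
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
          FROM master..spt_values) AS n      -- any table with enough rows works
    WHERE n.pos <= LEN(@str);                -- NULL or '' input yields no rows
```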
If you are using SQL Server 2017 or later, you can use the STRING_AGG string aggregation function to concatenate the split characters in order, as follows:
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not on SQL Server 2017, you can use the following structure, which concatenates with FOR XML PATH:
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Replace Quotations in records SQL Server

I have records with quotations that I would like to replace with ''.
Example:
"ASKHELLO"SE --> ASKHELLO SE
""HELLO""1 --> HELLO 1
How can I do this in SQL Server?
I know the REPLACE function, but how do I get the pattern to match any character other than ""?
UPDATE
wordname
SET
wordname = REPLACE(deal, '"'+ '%', '')
This is incorrect. Help, please.
I am adding another answer based on your comment about double spaces on my original answer. The ID in this case is arbitrary, but I am a huge fan of always having a primary key of some kind. XML, we meet again!
--Setup the Table
DECLARE @T TABLE (wordname VARCHAR(25))
INSERT INTO @T VALUES ('"ASKHELLO"SE'),('""HELLO""1')
SELECT * FROM @T
--DECLARE AND SET XML REPLACING " with spaces
DECLARE @XML XML =
(
SELECT ROW_NUMBER() OVER (ORDER BY wordname ASC) AS "@ID",
CONVERT(XML,'<PART>' + REPLACE(CAST(CAST(REPLACE(wordname, '"',' ') AS VARCHAR(25)) AS VARCHAR(max)),' ',' </PART><PART>') + '</PART>') AS Word
FROM @T AS T
FOR XML PATH('Node'), ROOT('Nodes'), ELEMENTS, TYPE
)
SELECT @XML
--SHRED THE XML (WHICH WILL REMOVE NULLS) AND TRIM
;WITH
SHRED AS
(
SELECT ID = FieldAlias.value('(@ID)[1]','INT'),
WordName = FieldAlias.value('(Word)[1]','varchar(max)')
FROM @XML.nodes('/Nodes/Node') AS TableAlias(FieldAlias)
)
SELECT S.ID,
LTRIM(RTRIM(S.WordName)) AS WordName
FROM SHRED AS S
It should be relatively trivial for you to update from the shredded result set at this point, but let me know if you need that too. Replace @T with your original table to pull in your data set.
The REPLACE function replaces every occurrence within a string, so you can simply do:
UPDATE
wordname
SET
deal = REPLACE(deal, '"', '')
This assumes that "wordname" is your table and "deal" is the column you're updating.
This will simply remove the double quotes. If you need to replace them with a space, use ' ' instead of ''.
Does this help you? Try using LTRIM to strip off leading spaces after the replace. Here's a quick example based on your code:
DECLARE @T TABLE (wordname VARCHAR(25))
INSERT INTO @T VALUES ('"ASKHELLO"SE'),('""HELLO""1')
SELECT * FROM @T
SELECT LTRIM(REPLACE(wordname, '"',' '))
FROM @T
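A pair of double quotes becomes two spaces, which is the double-space issue raised in the comments. A common nested-REPLACE trick collapses any run of spaces into one without XML. This sketch assumes the data never contains the marker characters < and >:

```sql
DECLARE @T TABLE (wordname varchar(25));
INSERT INTO @T VALUES ('"ASKHELLO"SE'), ('""HELLO""1');

SELECT wordname,
       REPLACE(REPLACE(REPLACE(
           LTRIM(RTRIM(REPLACE(wordname, '"', ' '))),  -- quotes -> spaces, trim the ends
           ' ', '<>'),     -- mark every space: 'HELLO<><>1'
           '><', ''),      -- drop the joints between adjacent markers: 'HELLO<>1'
           '<>', ' ')      -- turn each surviving marker back into one space
       AS cleaned
FROM @T;
```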

Extracting pipe delimited field into rows

I have a table with a field whose values are pipe-delimited, and I need them extracted as rows.
Sample data
select distinct [PROV_KEY],
[NTWK_CDS]
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'
--results
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 |GA_HMO|GA_OPN|GA_PPO|GA_BD|GA_MCPPO|GA_HDPPO|
And I would like:
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_HMO
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_OPN
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_PPO
I tried the following but I'm only getting the first set of values:
select distinct [PROV_KEY],
substring([NTWK_CDS], 1,
CHARINDEX('|',[NTWK_CDS], CHARINDEX('|',[NTWK_CDS])+1))
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'
This is a standard string splitting problem and there are many solutions out there. However, most still feel like a workaround, as SQL Server did not have a built-in split function before STRING_SPLIT arrived in SQL Server 2016.
You can start your research here: http://www.sommarskog.se/arrays-in-sql.html
The crucial operation you need to perform is a split. There are lots of solutions to this problem (see here for some), and people favor different ones depending on both situation and personal preference. Once you've done the split, though, you can JOIN or APPLY against the results to get the desired output.
I personally prefer using a SQLCLR function for this purpose since the performance is generally much better; but the number of options out there is staggering.
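On SQL Server 2016 or later (compatibility level 130+), the built-in STRING_SPLIT covers this case directly. The sketch below uses a table variable in place of [WellPointExtract_ERR]. The leading and trailing pipes produce empty pieces, which the WHERE clause filters out:

```sql
DECLARE @t TABLE (PROV_KEY char(32), NTWK_CDS varchar(200));
INSERT INTO @t (PROV_KEY, NTWK_CDS)
VALUES ('447358B0A8E1C0F1B7AEB1ED07EC2F25', '|GA_HMO|GA_OPN|GA_PPO|GA_BD|GA_MCPPO|GA_HDPPO|');

SELECT t.PROV_KEY, s.value AS NTWK_CDS
FROM @t AS t
CROSS APPLY STRING_SPLIT(t.NTWK_CDS, '|') AS s   -- one row per pipe-delimited piece
WHERE s.value <> '';                             -- drop the empty edge pieces
```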
You can use a splitting function:
CREATE FUNCTION dbo.SplitStrings_CTE(@List nvarchar (1000), @Delimiter nvarchar(1 ))
RETURNS @returns TABLE(val nvarchar(100), [level] int, PRIMARY KEY CLUSTERED([level]))
AS
BEGIN
;WITH cte AS
(
SELECT SUBSTRING(@List, 0, CHARINDEX(@Delimiter , @List)) AS val ,
CAST(STUFF(@List + @Delimiter, 1, CHARINDEX(@Delimiter, @List),'') AS nvarchar (1000)) AS stval,
1 AS [level]
UNION ALL
SELECT SUBSTRING(stval, 0, CHARINDEX(@Delimiter, stval)),
CAST(STUFF(stval, 1 , CHARINDEX(@Delimiter ,stval), '') AS nvarchar(1000)),
[level] + 1
FROM cte
WHERE stval != ''
)
INSERT @returns
SELECT REPLACE(val ,' ' ,'') AS val, [level]
FROM cte
RETURN
END
Hence, your SELECT statement will be
SELECT *
FROM dbo.test82 t CROSS APPLY dbo.SplitStrings_CTE(t.NTWK_CDS, '|') o
WHERE o.val != ''
Demo on SQLFiddle

Extract one value from a column containing multiple delimited values

How can I get the value from the sixth field in the following column? I am trying to get the 333 field:
ORGPATHTXT
2123/2322/12323/111/222/333/3822
I believe I have to use select substring, but am unsure how to format the query
Assuming SQL Server
The easiest way I can think of is to create a split function that splits on '/' and then extract the sixth item, like below:
declare @text varchar(50) = '2123/2322/12323/111/222/333/3822'
select txt_value from fn_ParseText2Table(@text, '/') t where t.Position = 6
I used the function from this URL. See it working on SQLFiddle.
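If writing a split function is not an option, newer versions have built-ins that can return a piece by position. The block below shows two sketches. STRING_SPLIT with enable_ordinal needs SQL Server 2022 or Azure SQL. The OPENJSON variant works from SQL Server 2016 (compatibility level 130+). It assumes the data contains no characters that need JSON escaping, such as double quotes or backslashes:

```sql
DECLARE @text varchar(50) = '2123/2322/12323/111/222/333/3822';

-- SQL Server 2022+ / Azure SQL: ordinal is the 1-based position of each piece
SELECT value
FROM STRING_SPLIT(@text, '/', 1)
WHERE ordinal = 6;

-- SQL Server 2016+: turn the path into a JSON array; [key] is the 0-based index
SELECT value
FROM OPENJSON('["' + REPLACE(@text, '/', '","') + '"]')
WHERE [key] = '5';
```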
Try this for a string variable, or wrap it in a function to use with a SELECT query (Sql-Demo):
Declare @s varchar(50)='2123/2322/12323/111/222/333/3822'
Select @s = right(@s,len(@s)- case charindex('/',@s,1) when 0 then len(@s)
else charindex('/',@s,1) end)
From ( values (1),(2),(3),(4),(5)) As t(num)
Select case when charindex('/',@s,1)>0 then left(@s,charindex('/',@s,1)-1)
else @s end
--Results
333
I'd like to offer a solution that uses CROSS APPLY to split up any delimited string in MSSQL and ROW_NUMBER() to return the 6th element. This assumes you have a table with ORGPATHTXT as a field (it can easily be converted to work without the table though):
SELECT ORGPATHTXT
FROM (
SELECT
Split.a.value('.', 'VARCHAR(100)') AS ORGPATHTXT,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) RN
FROM
(SELECT ID, CAST ('<M>' + REPLACE(ORGPATHTXT, '/', '</M><M>') + '</M>' AS XML) AS String
FROM MyTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
) t
WHERE t.RN = 6;
Here is some sample Fiddle to go along with it.
Good luck.
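The XML approach already has the pieces as <M> elements, so an XQuery position predicate can pick the sixth one directly. This avoids relying on ROW_NUMBER() OVER (... ORDER BY (SELECT 1)) following document order, which is not guaranteed. The sketch below uses a table variable in place of MyTable:

```sql
DECLARE @MyTable TABLE (ID int, ORGPATHTXT varchar(200));
INSERT INTO @MyTable (ID, ORGPATHTXT)
VALUES (1, '2123/2322/12323/111/222/333/3822');

SELECT t.ID,
       x.doc.value('(/M)[6]', 'varchar(100)') AS SixthField   -- NULL if fewer than 6 parts
FROM @MyTable AS t
CROSS APPLY (SELECT CAST('<M>' + REPLACE(t.ORGPATHTXT, '/', '</M><M>') + '</M>' AS xml)) AS x(doc);
```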
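For SQL Server, if every field always has the same length, you can take the characters at fixed positions (the third argument of SUBSTRING is a length, not an end position):
declare @string varchar(65) = '2123/2322/12323/111/222/333/3822'
select substring(@string, 25, 3)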
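If you are using MySQL, you can nest SUBSTRING_INDEX. The inner call keeps everything before the sixth '/', and the outer call takes the last field of that:
select substring_index(substring_index(orgpathtxt, '/', 6), '/', -1)
Let me just say that it is less convenient in most other databases.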
You can also use the dynamic management function sys.dm_fts_parser (it requires membership in the sysadmin fixed server role):
DECLARE @s nvarchar(50) = '2123/2322/12323/111/222/333/3822'
SELECT display_term
FROM sys.dm_fts_parser('"'+ @s + '"', 1033, NULL, 0)
WHERE display_term NOT LIKE 'nn%' AND occurrence = 6