Remove Characters in a String in SQL

Remove Characters in a String in SQL - sql

I have a column u_manualdoc which contains the values are like this CGY DR# 7405. I want to remove the CGY DR#.
Here's the code:
select u_manualdoc, cardcode, cardname from ODLN
I want only the 7405 number. Thanks!

Try this:
--sample data you provided in comments
declare #tbl table(codes varchar(20))
insert into #tbl values
('CGY PST - 58277') , ('CGY RMC PST # 58083'), ('CGY DR # 7443'), ('CSI # 1304'), ('PO# 0568 , 0570'), ('CGY DR# 7446')
--actual query that you can apply to your table
select SUBSTRING(codes, PATINDEX('%[0-9]%', codes), len(codes)) from #tbl
The key point here is to use patindex, which searches for a pattern and returns index where such pattern occur. I specified %[0-9]% which means that we search for any digit - it will return first occurrence of a digit. Now- since this would be our starting point to substring, we pass it to such function. Third parameter of substring is length. Since we want the rest of a string, len function makes sure that we get that :)
Applying to your naming:
select SUBSTRING(u_manualdoc, PATINDEX('%[0-9]%', u_manualdoc), len(u_manualdoc)),
cardcode,
cardname
from ODLN

You should use string functions charindex,len and substring to get it.
See the code below.
select SUBSTRING(u_manualdoc,CHARINDEX('#',u_manualdoc)+1,LEN(u_manualdoc)- CHARINDEX('#',u_manualdoc))

EDIT
In addition to the other answers, you can use this simple method:
select
substring(
u_manualdoc,
len(u_manualdoc) - patindex('%[^0-9]%', reverse(u_manualdoc)) + 2,
len(u_manualdoc)
),
cardcode, cardname
from ODLN
In this example, patindex finds the first non-digit (as specified by ^[0-9]) from the right side of the string, and then uses that as the starting point of the substring.
This will work on all of your sample strings (including 'PO# 0568 , 0570 CGY DR# 7446').
Or use SQL Server Regex, which lets you use more powerful regular expressions within your queries.

TRY THIS
DECLARE #table TABLE(DirtyCol VARCHAR(100));
INSERT INTO #table
VALUES('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123 AB: ABC# 123'), ('AB#'), ('AB # 1 000 000'), ('AB # 1`234`567'), ('AB # (9)(876)(543)');
WITH tally
AS (
SELECT TOP (100) N = ROW_NUMBER() OVER(ORDER BY ##spid)
FROM sys.all_columns),
data
AS (
SELECT DirtyCol,
Col
FROM #table
CROSS APPLY
(
SELECT
(
SELECT C+''
FROM
(
SELECT N,
SUBSTRING(DirtyCol, N, 1) C
FROM tally
WHERE N <= DATALENGTH(DirtyCol)
) [1]
WHERE C BETWEEN '0' AND '9'
ORDER BY N FOR XML PATH('')
)
) p(Col)
WHERE p.Col IS NOT NULL)
SELECT DirtyCol,
CAST(Col AS INT) IntCol
FROM data;

Related

Is it possible to find the first occurrence of a string that's NOT within a set of delimiters in SQL Server 2016+?

I have a column in a SQL Server table that has strings of varying lengths. I need to find the position of the first occurrence of the string , -- that's not enclosed in single quotes or square brackets.
For example, in the following two strings, I've bolded the portion I would like to get the position of. Notice in the first string, the first time , -- appears on its own (without being between single quote or square bracket delimiters) is at position 13 and in the second string, it's at position 16.
'a, --'[, --]**, --**[, --]
[a, --b]aaaaaaa_ **, --**', --'
Also I should mention that , -- itself could appear multiple times in the string.
Here's a simple query that shows the strings and my desired output.
SELECT
t.string, t.desired_pos
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
Is there any way to accomplish this using a SELECT query (or multiple) without using a function?
Thank you in advance!
I've tried variations of SUBSTRING, CHARINDEX, and even some CROSS APPLYs but I can't seem to get the result I'm looking for.

Before i write down my solution, i must warn you: DON'T USE IT. Use a function, or do this in some other language. This code is probably buggy.
It doesn't handle stuff like escaped quotes etcetc.
The idea is to first remove the stuff inside brackets [] and quotes '' and then just do a "simple" charindex.
To remove the brackets, i'm using a recursive CTE that loops ever part of matching quotes and replaces their content with placeholder strings.
One important point is that quotes might be embedded in each other, so you have to try both variants and chose the one that is earliest.
WITH CTE AS (
SELECT *
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
)
, cte2 AS (
select x.start
, x.finish
, case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
, 1 as level
, string as orig
, desired_pos
from cte
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', string) AS start
, charindex(']', string) AS finish
UNION ALL
SELECT charindex('''', string) AS startQ
, charindex('''', string, charindex('''', string) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
UNION ALL
select x.start
, x.finish
, STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
, 1 as level
, orig
, desired_pos
from cte2
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', newString) AS start
, charindex(']', newString) AS finish
UNION ALL
SELECT charindex('''', newString) AS startQ
, charindex('''', newString, charindex('''', newString) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
AND x.start > 0
AND cte2.start > 0 -- Must have been a match
)
SELECT PATINDEX('%, --%', newString), *
from (
select *, row_number() over(partition by orig order by level desc) AS sort
from cte2
) x
where x.sort = 1

Try this approach. I'm replacing the strings you don't need for another string of the same length. Then look for the position of the interested string.
SELECT string, desired_pos,
CHARINDEX(', --', REPLACE(REPLACE(string, ''', --''', '******'), '[, --]', '******')
) start_index
FROM (VALUES (N''', --''[, --], --[, --]', 13),
(N'[, --]aaaaaaa_ , --'', --''', 16)) t(string, desired_pos)

I don't know if it makes sense with a C# solution, but this class for CVS is a nice little parcer: TextFieldParser
Then you just define Delimeters etc. and assuming the input is escaped consistently then all is good.

Im late the game here but This kind of thing is simple in SQL Server when leveraging NGrams8k. Not only do you not need REGEX, a CLR, C# required. Furthermore, NGrams8k will be the fastest by far. In 8 years nobody has produced anything remotely as fast. Furthermore, this code will be faster and far less complex than a recursive CTE solution (which are almost always slow in SQL Server)
;--==== Sample Data
DECLARE #T Table (String VARCHAR(100))
INSERT #T
VALUES (N'''a, --''[, --], --[, --]'),
(N'[a, —-b]aaaaaaa_ , --'', --''');
;--==== Solution
SELECT
t.String, ng.Position
FROM #t AS t
CROSS APPLY (VALUES(REPLACE(t.String,'[',CHAR(1)))) AS f(S)
CROSS APPLY samd.NGrams8k(f.S,4) AS ng
CROSS APPLY (VALUES(SUBSTRING(f.S,ng.Position-2,7))) AS g(String)
WHERE ng.Token = ', --'
AND g.String NOT LIKE '%''%''%'
AND g.String NOT LIKE '%'+CHAR(1)+'%]%';
Results:
String Position
----------------------------- --------------------
'a, --'[, --], --[, --] 14
[a, —-b]aaaaaaa_ , --', --' 18

How to SELECT string between second and third instance of ",,"?

I am trying to get string between second and third instance of ",," using SQL SELECT.
Apparently functions substring and charindex are useful, and I have tried them but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
Can't find working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3dogger3
Here is my closest attempt, it works if the strings are same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi

Just repeat sub-string 3 times, each time moving onto the next ",," e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.

Here is an alternative solution using charindex. The base idea is the same as in Dale K's an answer, but instead of cutting the string, we specify the start_location for the search by using the third, optional parameter, of charindex. This way, we get the location of each separator, and could slip each value off from the main string.
declare #vtest table (column1 varchar(200))
insert into #vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into #vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare #separetor char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
#vtest t
cross apply (select nullif(charindex(#separetor, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(#separetor, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(#separetor, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default the charindex returns 0 if the search string is not present, so I overwrite it with the value null, by using nullif

IMHO, SQL Server 2016 and its JSON support in the best option here.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO #tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM #tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+

It´s the same as #"Yitzhak Khabinsky" but i think it looks clearer
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A

SQL: select the last values before a space in a string

I have a set of strings like this:
CAP BCP0018 36
MFP ACZZ1BD 265
LZP FEI-12 3
I need to extract only the last values from the right and before the space, like:
36
265
3
how will the select statement look like? I tried using the below statement, but it did not work.
select CHARINDEX(myField, ' ', -1)
FROM myTable;

Perhaps the simplest method in SQL Server is:
select t.*, v.value
from t cross apply
(select top (1) value
from string_split(t.col, ' ')
where t.col like concat('% ', val)
) v;
This is perhaps not the most performant method. You probably would use:
select right(t.col, charindex(' ', reverse(t.col)) - 1)
Note: If there are no spaces, then to prevent an error:
select right(t.col, charindex(' ', reverse(t.col) + ' ') - 1)

Since you have mentioned CHARINDEX() in question, I am assuming you are using SQL Server.
Try below
declare #table table(col varchar(100))
insert into #table values('CAP BCP0018 36')
insert into #table values('MFP ACZZ1BD 265')
insert into #table values('LZP FE-12 3')
SELECT REVERSE(LEFT(REVERSE(col),CHARINDEX(' ',REVERSE(col)) - 1)) FROM #table
Functions used
CHARINDEX ( expressionToFind , expressionToSearch ) : returns position of FIRST occurence of an expression inside another expression.
LEFT ( character_expression , integer_expression ) : Returns the left part of a character string with the specified number of characters.
REVERSE ( string_expression ) : Returns the reverse order of a string value

Scalar function to calculate items count?

I have a nvarchar column which contains values like following:
item1+item2
item1+2item2
4item1+item2+2item3
I want a scalar function to calculate the item count.
As examples above, we notice:
items separated by "+"
The item may be have digit in first. This is the item count.
The required count for above examples should be as following:
item1+item2 2
item1+2item2 3
4item1+item2+2item3 7

Here is an option using try_convert() and string_split()
This one is assuming single digit leads.
Example
Declare #YourTable Table ([SomeCol] varchar(50)) Insert Into #YourTable Values
('item1+item2')
,('item1+2item2')
,('4item1+item2+2item3')
Select *
From #YourTable A
Cross Apply (
Select value = sum(isnull(try_convert(int,left(value,1)),1))
From string_split(SomeCol,'+')
) B
Returns
SomeCol value
item1+item2 2
item1+2item2 3
4item1+item2+2item3 7

You can split the values. Then extract the leading number:
select t.*, v.the_sum
from t cross apply
(select sum(coalesce(nullif(v.num, 0), 1)) as the_sum
from string_split(col, '+') s cross apply
(values (try_convert(int, left(s.value, patindex('%[^0-9]%', ltrim(s.value) + ' ') - 1
)
)
)
) v(num)
) v;
Here is a db<>fiddle.
Note: this assumes that the prefix is never explicitly 0. That can be handled, but adds a bit of complication that doesn't seem necessary.

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (#str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE #len INT,
#cnt INT =1,
#str1 VARCHAR(50)='',
#output VARCHAR(50)=''
SELECT #len = Len(#str)
WHILE #cnt <= #len
BEGIN
SELECT #str1 += Substring(#str, #cnt, 1) + ','
SET #cnt+=1
END
SELECT #str1 = LEFT(#str1, Len(#str1) - 1)
SELECT #output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(#str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN #output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however when i try to apply this to top(10) i receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4million records, is this still the best solution?
Unfortunately i am not able to normalize the data into a separate table with records for each Code.

This doesn't rely on a id column to join with itself, performance is almost as fast
as the answer by #Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t

First of all: Avoid loops...
You can try this:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO #tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM #tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT calls from the orginal table, which means one-row-per-ID and use a correalted sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;

Considering your table having good amount of rows (~4 Million), I would suggest you to create a persisted calculated field in the table, to store these values. As calculating these values at run time in a view, will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(#str) = 0
BEGIN
SET #output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END

I can suggest to split string into its characters using referred SQL function.
Then you can concatenate string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use SQL String_Agg string aggregation function to concatenate characters splitted in an ordered way as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Remove Characters in a String in SQL - sql

I have a column u_manualdoc which contains the values are like this CGY DR# 7405. I want to remove the CGY DR#. Here's the code: select u_manualdoc, cardcode, cardname from ODLN I want only the 7405 number. Thanks!

You should use string functions charindex,len and substring to get it. See the code below. select SUBSTRING(u_manualdoc,CHARINDEX('#',u_manualdoc)+1,LEN(u_manualdoc)- CHARINDEX('#',u_manualdoc))

Related

Is it possible to find the first occurrence of a string that's NOT within a set of delimiters in SQL Server 2016+?

How to SELECT string between second and third instance of ",,"?

SQL: select the last values before a space in a string

Scalar function to calculate items count?

Order Concatenated field

Categories

Resources