SQL Server : how to remove leading/trailing non-alphanumeric characters from string?

I am using SQL Server 2008 and trying to sanitize a list of URLs.
Some examples of the existing text:
www.google.com
'www.google.com'
/www.google.com
www.google.com/
Ideally, I could strip any leading/trailing non-alphanumeric characters so that all four produce the same output:
www.google.com

Well, if you know they are only at the beginning and end, you can do:
with t as (
select *
from (values ('www.google.com'), ('''www.google.com'''), ('/www.google.com')) v(text)
)
select t.text, v2.text2
from t cross apply
(values (stuff(t.text, 1, patindex('%[a-zA-Z0-9]%', t.text) - 1, ''))
) v(text1) cross apply
(values (case when v.text1 like '%[^a-zA-Z0-9]'
then stuff(v.text1, len(v.text1) + 2 - patindex('%[a-zA-Z0-9]%', reverse(v.text1)), len(v.text1), '')
else v.text1
end)
) v2(text2);
Here is a db<>fiddle.

Why not just use replace() ?:
SELECT REPLACE(REPLACE(col, '''', ''), '/', '')
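One caveat worth checking before relying on this: REPLACE() removes every occurrence of the character, not only the leading and trailing ones. A minimal sketch, using a made-up URL with a path (an assumption for illustration, not one of the question's samples):

```sql
-- Hypothetical input that contains interior slashes as well as outer ones
SELECT REPLACE(REPLACE('/www.google.com/maps/', '''', ''), '/', '') AS Result;
-- Result: www.google.commaps  (the interior slash is removed too)
```

So this approach only fits if the unwanted characters can never legitimately appear inside the value.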

You should be able to use Substring. Calculating length can be tricky:
DECLARE @temp TABLE (val varchar(100))
INSERT INTO @temp VALUES
('www.google.com'),('''www.google.com'''),('/www.google.com'),('www.google.com/'),('[www.google.com];')
SELECT SUBSTRING(val
,PATINDEX('%[a-zA-Z0-9]%', val) --start at position
,LEN(val) + 2 - PATINDEX('%[a-zA-Z0-9]%', val)
- PATINDEX('%[a-zA-Z0-9]%', REVERSE(val)) --length of substring
) AS [Result]
FROM @temp
Produces output:
Result
--------------
www.google.com
www.google.com
www.google.com
www.google.com
www.google.com
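To see where the "+ 2" in the length comes from, here is a small sketch that works through one sample value by hand. The variable names are made up for illustration:

```sql
DECLARE @val varchar(100) = '[www.google.com];';   -- LEN(@val) = 17

-- Position of the first alphanumeric character, counted from the left
DECLARE @fromLeft int = PATINDEX('%[a-zA-Z0-9]%', @val);            -- 2
-- Position of the last alphanumeric character, counted from the right
DECLARE @fromRight int = PATINDEX('%[a-zA-Z0-9]%', REVERSE(@val));  -- 3

-- Characters dropped on the left:  @fromLeft - 1
-- Characters dropped on the right: @fromRight - 1
-- Kept: LEN - (@fromLeft - 1) - (@fromRight - 1) = LEN + 2 - @fromLeft - @fromRight
SELECT SUBSTRING(@val, @fromLeft, LEN(@val) + 2 - @fromLeft - @fromRight) AS Result;
-- Result: www.google.com  (17 + 2 - 2 - 3 = 14 characters)
```

Note that a value with no alphanumeric characters at all makes both PATINDEX calls return 0, so such rows may need separate handling.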

Related

How do I combine a substring and trim right in SQL

I am trying to extract the data between two underscore characters. In some situations, the 2nd underscore may not exist.
MyFld
P_36840
U_216137
C_203134_H
C_203134_W
I tried this:
substring(i.[MyFld],
CHARINDEX ('_',i.[MyFld])+1,len(i.[MyFld])
-CHARINDEX ('_',i.[MyFld])
) [DerivedPrimaryKey]
And I get this:
DerivedPrimaryKey
36840
216137
203134_H
203134_W
https://dbfiddle.uk/uPKC6oX4
I want to remove the second underscore and data that follows it. I'm trying to combine it with a trim right, but I'm unsure where to start.
How can I do this?

We can start by simplifying what you have so far. I will also add enough to make this a complete query, so we can see it in context for later steps:
SELECT
right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [DerivedPrimaryKey]
FROM I
With this much done, we can now use it as the source for removing the trailing portion of the field:
SELECT
reverse(substring(reverse(step1)
, charindex('_', reverse(step1))+1
, len(step1)
)) [DerivedPrimaryKey]
FROM (
SELECT right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [step1]
FROM I
) T
Notice the layer of nesting. You can, of course, remove the nesting, but it means replicating the entire inner expression every time you see step1 (good thing I took the time to simplify it):
SELECT
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
FROM I
And now back to just the expression:
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
See it work here:
https://dbfiddle.uk/nFO4Vwhm
There is also this alternate expression that saves one function call:
left( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)),
coalesce(
nullif(
charindex('_',
right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld))
) -1, -1
),
len( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)) )
)
)
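If repeating the inner expression gets hard to read, the same COALESCE/NULLIF idea can be written once per step with CROSS APPLY. This is only a sketch; the inline VALUES list and the alias names (t, s, step1) are stand-ins for the real table:

```sql
SELECT t.MyFld,
       -- CHARINDEX returns 0 when there is no second underscore; 0 - 1 = -1,
       -- NULLIF turns that into NULL, and COALESCE falls back to the full length
       LEFT(s.step1,
            COALESCE(NULLIF(CHARINDEX('_', s.step1) - 1, -1), LEN(s.step1))
       ) AS DerivedPrimaryKey
FROM (VALUES ('P_36840'), ('C_203134_H')) AS t(MyFld)
-- step1 = everything after the first underscore
CROSS APPLY (VALUES (RIGHT(t.MyFld, LEN(t.MyFld) - CHARINDEX('_', t.MyFld)))) AS s(step1);
-- P_36840    -> 36840
-- C_203134_H -> 203134
```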

Just two more options. One uses parsename(), provided your data does not have more than 4 segments. The second uses a JSON array.
Example
Declare @YourTable Table ([MyFld] varchar(50)) Insert Into @YourTable Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Select *
,UsingParseName = reverse(parsename(reverse(replace(MyFld,'_','.')),2))
,UsingJSONValue = json_value('["'+replace(MyFld,'_','","')+'"]','$[1]')
From @YourTable
Results
MyFld       UsingParseName  UsingJSONValue
P_36840     36840           36840
U_216137    216137          216137
C_203134_H  203134          203134
C_203134_W  203134          203134

We can do this:
Declare @testData Table ([MyFld] varchar(50));
Insert Into @testData (MyFld)
Values ('P_36840')
, ('U_216137')
, ('C_203134_H')
, ('C_203134_W');
Select *
, second_element = substring(v.MyFld, p1.pos, p2.pos - p1.pos - 1)
From @testData As td
Cross Apply (Values (concat(td.MyFld, '__'))) As v(MyFld) -- Make sure we have at least 2 delimiters
Cross Apply (Values (charindex('_', v.MyFld, 1) + 1)) As p1(pos) -- First Position
Cross Apply (Values (charindex('_', v.MyFld, p1.pos) + 1)) As p2(pos) -- Second Position
If you actually have a fixed number of characters in the first element, then it could be simplified to:
Select *
, second_element = substring(v.MyFld, 3, charindex('_', v.MyFld, 4) - 3)
From @testData td
Cross Apply (Values (concat(td.MyFld, '_'))) As v(MyFld)

Often I try to fake out SQL if an expected character isn't always present and I don't need the resulting value:
SELECT SUBSTRING(field_Calculated, 1, CHARINDEX('_', field_Calculated) - 1)
FROM (SELECT SUBSTRING(MyFld, CHARINDEX('_', MyFld) + 1, LEN(MyFld)) + '_' As field_Calculated
FROM MyTable) T
I think this is clear, but I really like the ParseName solution @JohnCappalletti suggests.
If it's only ever one numeric value you can use string_split:
SELECT * FROM MyTable
CROSS APPLY string_split(MyFld, '_')
WHERE ISNUMERIC(value) = 1
Either way you have to be careful of the data before deciding the best approach.
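As one example of why the data matters: ISNUMERIC() accepts more than plain digits, so it can let through pieces you did not expect. A small sketch with made-up values, comparing it against a digits-only LIKE test:

```sql
SELECT v.val,
       ISNUMERIC(v.val) AS IsNumericFlag,
       -- Digits-only check: non-empty and contains no character outside 0-9
       CASE WHEN v.val <> '' AND v.val NOT LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS DigitsOnly
FROM (VALUES ('203134'), ('1e5'), ('$'), ('H')) AS v(val);
-- '203134' -> IsNumericFlag 1, DigitsOnly 1
-- '1e5'    -> IsNumericFlag 1, DigitsOnly 0
-- '$'      -> IsNumericFlag 1, DigitsOnly 0
-- 'H'      -> IsNumericFlag 0, DigitsOnly 0
```

If the key is always made of digits, the LIKE test may be the safer filter.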

Your data:
Declare @Table Table ([MyFld] varchar(100))
Insert Into @Table
([MyFld] ) Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Use SubString, Left and PatIndex:
select
Left(
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
),
PatIndex(
'%[^0-9.-]%',
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
) + 'X'
)-1
) as DerivedPrimaryKey
from
@Table

Search between the third and fourth occurrence in a string

I have a string that contains some text:
Example:
XPTOP XPTOP 4WS00632 BLACK VERNIS
I want to extract only
4WS00632
and ignore everything else.
I did try indexing the spaces, using charindex for searching all the spaces and then a substring to start at x, end at y.
But, no luck. Sometimes it returns "4WS0063" or "P 4WS00632".
The data is not consistent; only the "XPTOP XPTOP" prefix is somewhat consistent.
For instance, "4WS00632" has 8 characters, but it might have 9 or 12.
So, I really need to catch everything in between the 3rd space and 4th space.

Yet another option is with a bit of JSON
Example or dbFiddle
Select A.SomeCol
,NewVal = JSON_VALUE('["'+replace(SomeCol,' ','","')+'"]','$[2]')
From YourTable A
Results
SomeCol NewVal
XPTOP XPTOP 4WS00632 BLACK VERNIS 4WS00632

Try this technique:
DECLARE @text NVARCHAR(MAX) = 'XPTOP XPTOP 4WS00632 BLACK VERNIS';
DECLARE @text_xml XML = '<a>' + REPLACE(@text, ' ', '</a><a>')+ '</a>';
WITH DataSource (element_pos, element_text) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c)
,T.c.value('(.)[1]', 'nvarchar(128)')
FROM @text_xml.nodes('a') T(c)
)
SELECT element_text
FROM DataSource
WHERE element_pos = 3;
The idea is to split the string by space and order the strings by their position. Then simply get the 3rd one.

One method is string_split() along with some like comparisons:
select t.*, s.value
from t cross apply
(select s.value
from string_split(t.str, ' ') s
where t.str like concat('% % % ', s.value, ' %') and
t.str not like concat('% % % % ', s.value, ' %')
) s

This is one way of using a combination of charindex and stuff.
It could be done in a single statement but I broke it into a couple of logical steps to make it clearer.
create table #test (col1 varchar(100));
insert into #test values ('XPTOP XPTOP 4WS00632 BLACK VERNIS');
with s1(s) as (select Stuff(col1,1,CharIndex(' ',col1),'') from #test),
s2(s) as (select Stuff(s,1, CharIndex(' ',s),'') from s1)
select Left(s, charIndex(' ',s) )
from s2;

Here is a solution using charindex - and substring:
Declare @testTable Table (TestString varchar(100));
Insert Into @testTable (TestString)
Values ('XPTOP XPTOP 4WS00632 BLACK VERNIS')
, ('XYZ XYZ JW2B0037VK3 S17 SENAPE BLACK');
Select tt.TestString
, part1 = ltrim(substring(v.TestString, 1, p01.pos - 2))
, part2 = ltrim(substring(v.TestString, p01.pos, p02.pos - p01.pos - 1))
, part3 = ltrim(substring(v.TestString, p02.pos, p03.pos - p02.pos - 1))
From @testTable As tt
Cross Apply (Values (concat(tt.TestString, replicate(' ', 3)))) As v(TestString)
Cross Apply (Values (charindex(' ', v.TestString, 1) + 1)) As p01(pos)
Cross Apply (Values (charindex(' ', v.TestString, p01.pos) + 1)) As p02(pos)
Cross Apply (Values (charindex(' ', v.TestString, p02.pos) + 1)) As p03(pos)

Find Substring in SQL

I have to find substring as follows.
Data as below
aaaa.bbb.ccc.dddd.eee.fff.ggg
qq.eeddde.rrr.t.hh.jj.jj.hh.hh
ee.r.t.y.u.i.ii.
I want output as-
bbb
eeeddde
r
The challenge I am facing is that every part uses (.) as the separator, so substring is tough to get working.
SELECT SUBSTRING(string,CHARINDEX('.',string)+1,
(((LEN(string))-CHARINDEX('.', REVERSE(string)))-CHARINDEX('.',string))) AS Result
FROM [table]
bbb
eeeddde
r
I am looking for the substring between the first and second (.)
Later it might be between the second and third (.)

Here is one method:
select left(v.str1, charindex('.', v.str1 + '.') - 1)
from t cross apply
(values (stuff(t.string, 1, charindex('.', t.string + '.'), ''))
) v(str1)

I assume from the use of CHARINDEX that this is MS SQL Server.
CROSS APPLY is handy for intermediate calculations.
SELECT t.pos, t1.pos,
SUBSTRING(string, t.pos + 1, t1.pos - t.pos -1) AS Result
FROM [table]
CROSS APPLY ( VALUES(CHARINDEX('.',string)) ) t(pos)
CROSS APPLY ( VALUES(CHARINDEX('.',string, t.pos+1))) t1(pos)

Just another option is to use a little XML
Example
Declare @YourTable table (ID int,SomeColumn varchar(max))
Insert Into @YourTable values
(1,'aaaa.bbb.ccc.dddd.eee.fff.ggg')
,(2,'qq.eeddde.rrr.t.hh.jj.jj.hh.hh')
,(3,'ee.r.t.y.u.i.ii.')
Select ID
,SomeValue = convert(xml,'<x>' + replace(SomeColumn,'.','</x><x>')+'</x>').value('/x[2]','varchar(100)')
From @YourTable
Returns
ID SomeValue
1 bbb
2 eeddde
3 r

You can use left(), replace() and charindex() functions together :
select replace(
replace(
left(str,charindex('.',str,charindex('.',str)+1)),
left(str,charindex('.',str)),
''
),
'.'
,''
) as "Output"
from t;
Demo

Pad Zero before first hyphen and remove spaces and add BA and IN

I have data as below
98-45.3A-22
104-44.0A-23
00983-29.1-22
01757-42.5A-22
04968-37.3A2-23
I am looking for the output below in SQL Server:
00098-BA45.3A-IN-22
00104-BA44.0A-IN-23
00983-BA29.1-IN-22
01757-BA42.5A-IN-22
04968-BA37.3A2-IN-23

I split the string into parts to cope with tricky data patterns. This should work even when the tail is not a dash followed by two digits:
WITH Src AS
(
SELECT * FROM (VALUES
('98-45.3A-22'),
('104-44.0A-23'),
('00983-29.1-22'),
('01757-42.5A-22'),
('04968-37.3A2-23')
) T(X)
), Parts AS
(
SELECT *,
RIGHT('00000'+SUBSTRING(X, 1, CHARINDEX('-',X, 1)-1),5) Front,
'BA'+SUBSTRING(X, CHARINDEX('-',X, 1)+1, 2) BA,
SUBSTRING(X, PATINDEX('%.%',X), LEN(X)-CHARINDEX('-', REVERSE(X), 1)-PATINDEX('%.%',X)+1) P,
SUBSTRING(X, LEN(X)-CHARINDEX('-', REVERSE(X), 1)+1, LEN(X)) En
FROM Src
)
SELECT Front+'-'+BA+P+'-IN'+En
FROM Parts
It returns:
00098-BA45.3A-IN-22
00104-BA44.0A-IN-23
00983-BA29.1-IN-22
01757-BA42.5A-IN-22
04968-BA37.3A2-IN-23

Try this,
DECLARE @String VARCHAR(100) = '98-45.3A-22'
SELECT ISNULL(REPLICATE('0',6 - CHARINDEX('-',@String)),'') -- Add leading Zeros
+ STUFF(
STUFF(@String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN') -- Add 'IN'
What if the number before the first hyphen has more digits than the target width, and the extra leading zeros need to be removed?
DECLARE @String VARCHAR(100) = '0000098-45.3A-22'
SELECT CASE WHEN CHARINDEX('-',@String) <= 6
THEN ISNULL(REPLICATE('0',6 - CHARINDEX('-',@String)),'') -- Add leading Zeros
+ STUFF(
STUFF( @String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN') -- Add 'IN'
ELSE STUFF(
STUFF(
STUFF(@String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN'), -- Add 'IN'
1, CHARINDEX('-',@String) - 6, '' -- remove extra leading Zeros
)
END

Making assumptions that the format is consistent (e.g. always ends with "-" + 2 characters....)
DECLARE @Data TABLE (Col1 VARCHAR(100))
INSERT @Data ( Col1 )
SELECT Col1
FROM (
VALUES ('98-45.3A-22'), ('104-44.0A-23'),
('00983-29.1-22'), ('01757-42.5A-22'),
('04968-37.3A2-23')
) x (Col1)
SELECT RIGHT('0000' + LEFT(Col1, CHARINDEX('-', Col1) - 1), 5)
+ '-BA' + SUBSTRING(Col1, CHARINDEX('-', Col1) + 1, CHARINDEX('.', Col1) - CHARINDEX('-', Col1))
+ SUBSTRING(Col1, CHARINDEX('.', Col1) + 1, LEN(Col1) - CHARINDEX('.', Col1) - 3)
+ '-IN-' + RIGHT(Col1, 2)
FROM @Data
It's not ideal IMO to do this string manipulation all the time in SQL. You could shift it out to your presentation layer, or store the pre-formatted value in the db to save the cost of this every time.

Use REPLICATE and CHARINDEX:
Replicate: repeats the given string the number of times specified in the function call
CharIndex: returns the position of the first occurrence of the search string
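As a quick sketch of how the two functions combine for the padding step (the literal below is one of the sample values from the question; the column aliases are made up):

```sql
SELECT REPLICATE('0', 3)              AS Zeros,   -- '000'
       CHARINDEX('-', '98-45.3A-22')  AS DashPos, -- 3, the first hyphen
       -- The hyphen sits at position 3, so 6 - 3 = 3 zeros pad the prefix to 5 digits
       REPLICATE('0', 6 - CHARINDEX('-', '98-45.3A-22')) + '98-45.3A-22' AS Padded;
-- Padded: 00098-45.3A-22
```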
Declare @Data AS VARCHAR(50)='98-45.3A-22'
SELECT REPLICATE('0',6-CHARINDEX('-',@Data)) + @Data
SELECT
SUBSTRING
(
(REPLICATE('0',6-CHARINDEX('-',@Data)) +@Data)
,0
,6
)
+'-'+'BA'+ CAST('<x>' + REPLACE(@Data,'-','</x><x>') + '</x>' AS XML).value('/x[2]','varchar(max)')
+'-'+ 'IN'+ '-' + CAST('<x>' + REPLACE(@Data,'-','</x><x>') + '</x>' AS XML).value('/x[3]','varchar(max)')

In another way by using PARSENAME() you can use this query:
WITH t AS (
SELECT
PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 3) AS p1,
REPLACE(PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 2), '###', '.') AS p2,
PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 1) AS p3
FROM yourTable)
SELECT RIGHT('00000' + p1, 5) + '-BA' + p2 + '-IN-' + p3
FROM t;

Selecting between quotes (") in SQL Server 2012

I have a table holding IDs in one column and a string in the second column like below.
COLUMN01 COLUMN02
----------------------------------------------------------------------------------
1 abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr
2 ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg
3 xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd
Now I want to have the following result
COLUMN01 COLUMN02
-----------------------------------------------------------
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39
How can I do that?
Thanks so much

Here is one way (maybe not the best, but it seems to work). I am NOT a SQL guru...
First, create this SQL Function. It came from: Extract numbers from a text in SQL Server
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
with C as
(
select cast(substring(S.Value, S1.Pos, S2.L) as int) as Number,
stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
from (select @String+' ') as S(Value)
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
union all
select cast(substring(S.Value, S1.Pos, S2.L) as int),
stuff(S.Value, 1, S1.Pos + S2.L, '')
from C as S
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
where patindex('%[0-9]%', S.Value) > 0
)
select Number
from C
)
Then, you can do something like this to get the results you were asking for. Note that I broke the query up into 3 parts for clarity. And, obviously, you don't need to declare the table variable and insert data into it.
DECLARE @tbl
TABLE (
COLUMN01 int,
COLUMN02 varchar(max)
)
INSERT INTO @tbl VALUES (1, 'abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr')
INSERT INTO @tbl VALUES (2, 'ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg')
INSERT INTO @tbl VALUES (3, 'xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd')
SELECT COLUMN01, SUBSTRING(COLUMN02, 2, LEN(COLUMN02) - 1) as COLUMN02 FROM
(
SELECT COLUMN01, REPLACE(COLUMN02, ' ', '') as COLUMN02 FROM
(
SELECT COLUMN01, (select ',' + cast(Number as varchar(20)) as 'data()' from dbo.GetNumbersFromText(Column02) for xml path('')) as COLUMN02 FROM @tbl
) t
) tt
GO
output:
COLUMN01 COLUMN02
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39

I know you want to do this in SQL, but I once had nearly the same problem. Pulling the data into a string in PHP (or another language) and parsing it there is one way to do it. For example, you can use this kind of code once the data is in a string:
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
For more information, see the post I took this function from: Remove all special characters from a string
As I said, this is an easy way to do it. I hope it helps.
