Need help in string conversion based on multiple rules - sql

I have a func_name column and a set of rules to apply to it to create a func_short_name.
The desired logic for func_short_name is:
Use whatever is to the right of '>', preceded by whatever comes after each '+' or '#' in the FUNC_NAME field.
Each time '+' or '#' appears, append the text that follows it to func_short_name.
Example:
func_name: toolbox/matlab/cefclient/+matlab/+internal/getOpenPort.p>getOpenPort
func_short_name : matlab.internal.getOpenPort
The above example joins 'matlab' and 'internal', since each is preceded by a '+', and then 'getOpenPort', since it is to the right of '>'.
How do I account for each occurrence of '+' or '#'? Could someone help me construct a SQL query or stored procedure? Thanks!
I have tried implementing the rules separately, but I am unable to do a recursive append for each occurrence of '+' or '#'.
select substring(FUNC_NAME,charindex('#',FUNC_NAME)+1,100)
FROM table
select FUNC_NAME,
charindex('#',FUNC_NAME)+1,
charindex('/',FUNC_NAME)-1
from table
select concat(substring(FUNC_NAME,charindex('#',FUNC_NAME)+1,charindex('/',FUNC_NAME)-1),'.',substring(FUNC_NAME,charindex('>',FUNC_NAME)+1,100))
FROM table
Another example:
func name:
toolbox/symbolic/symbolic/#sym/#aem/diag.m>diag
func_short_name:
sym.aem.diag

This should do it regardless of the version of SQL Server.
DECLARE @func_name VARCHAR(200);
DECLARE @func_short_name VARCHAR(100) = '';
DECLARE @i INT = 1;
DECLARE @func_name_length INT;
DECLARE @start_position INT = 0;
DECLARE @end_position INT = 0;
DECLARE @gt_position INT = 0;
SET @func_name = 'toolbox/matlab/cefclient/+matlab/+internal/getOpenPort.p>getOpenPort';
--SET @func_name = 'toolbox/symbolic/symbolic/#sym/#aem/diag.m>diag';
SET @i = 1;
SET @func_name_length = LEN(@func_name);
-- loop through string character by character
WHILE @i <= @func_name_length
BEGIN
    IF (SUBSTRING(@func_name, @i, 1)) IN ('+', '#')
    BEGIN
        SET @start_position = @i;
    END;
    -- ending character found after starting character has been found
    IF (SUBSTRING(@func_name, @i, 1)) = '/'
       AND @start_position > 0
    BEGIN
        SET @end_position = @i;
        SET @func_short_name = @func_short_name
            + SUBSTRING(@func_name, @start_position + 1, (@end_position - 1) - @start_position)
            + '.';
        SET @start_position = 0;
    END;
    SET @i += 1;
END;
-- find greater than character
SET @gt_position = CHARINDEX('>', @func_name);
SET @func_short_name = @func_short_name + SUBSTRING(@func_name, @gt_position + 1, @func_name_length - @gt_position);
SELECT @func_name AS [FUNC NAME], @func_short_name AS [FUNC SHORT NAME];
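To cross-check the expected output outside SQL Server, here is a minimal Python sketch of the same rule. The function name `func_short_name` is made up for illustration, and the sketch assumes markers only matter at the start of a '/'-separated path segment, which is how both examples in the question look.

```python
def func_short_name(func_name: str) -> str:
    # Everything to the right of '>' is the function itself.
    path, _, tail = func_name.partition(">")
    # Keep only path segments that start with '+' or '#', minus the marker.
    packages = [seg[1:] for seg in path.split("/") if seg[:1] in ("+", "#")]
    return ".".join(packages + [tail])


print(func_short_name("toolbox/matlab/cefclient/+matlab/+internal/getOpenPort.p>getOpenPort"))
print(func_short_name("toolbox/symbolic/symbolic/#sym/#aem/diag.m>diag"))
```

This prints matlab.internal.getOpenPort and sym.aem.diag, which are the two short names given in the question.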

Only if it's SQL Server 2017+
Initialization:
DECLARE @Table TABLE (Func_Name NVARCHAR(MAX));
INSERT INTO @Table (Func_Name) VALUES
('toolbox/matlab/cefclient/+matlab/+internal/getOpenPort.p>getOpenPort')
,('toolbox/symbolic/symbolic/#sym/#aem/diag.m>diag')
;
The Code:
SELECT STRING_AGG(REPLACE(REPLACE(a.[value],'+',''),'#',''),'.')
       WITHIN GROUP (ORDER BY rn DESC) AS [Result]
FROM (
    SELECT b.ID,j.[value],ROW_NUMBER()OVER(PARTITION BY b.ID ORDER BY j.[Key] DESC) AS [rn]
    FROM (
        SELECT '["' + REPLACE(REPLACE(t.Func_Name,'/','","'),'>','","') + '"]' AS [value]
        ,ROW_NUMBER()OVER(ORDER BY (SELECT 1)) AS [ID]
        FROM @Table t
    ) b
    CROSS APPLY OPENJSON(b.[value]) j
) a
WHERE (a.[value] LIKE '[+]%' OR a.[value] LIKE '[#]%' OR a.rn = 1 /*last piece*/)
GROUP BY a.ID
;

This solution based on a recursive common table expression works with any SQL server version as of 2005 but is not really strict when it comes to the separating characters. The pattern '%[#+>]%' looks for any one of the characters: #, + or >. It does return the desired result though: matlab.internal.getOpenPort
declare @f varchar(255);
set @f='toolbox/matlab/cefclient/+matlab/+internal/getOpenPort.p>getOpenPort'+'/';
With rcte as (
    select 0 n,@f str, patindex('%[#+>]%',@f) p union all
    select p, substring(str, p+1,255),
    patindex('%[#+>]%',substring(str, p+1,255))
    from rcte where p>0
)
SELECT STUFF((SELECT '.' +LEFT(str,charindex('/',str)-1)
FROM rcte
WHERE n>0
FOR XML PATH('')), 1, 1, '') short
@f is the variable containing the long function name (with an added / at the end) that needs to be converted.
See here for a demo: https://rextester.com/MFVO10768

Related

Count numeric chars in string

Using T-SQL, I want to count the numeric characters in a string. For example, I have the string 'kick0my234ass' and I want to count how many digits it contains (4 in this example). I can't use regex, just plain T-SQL.
You COULD do this I suppose:
declare @c varchar(30)
set @c = 'kick0my234ass'
select @c, len(replace(@c,' ','')) - len(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(@c,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),' ',''))
You'll first have to split the character string in its individual characters, evaluate which are numeric, and finally count those that are. This will do the trick:
DECLARE @test TABLE (Example NVARCHAR(255))
INSERT @test
VALUES ('kick0my234ass')
SELECT COUNT(1)
FROM @test AS T
INNER JOIN master..spt_values v
    ON v.type = 'P'
    AND v.number < len(T.Example)
WHERE SUBSTRING(T.Example, v.number + 1, 1) LIKE '[0-9]'
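The split, test, and count steps described above can be checked outside the database with a few lines of Python. This is only an illustrative sketch; `count_digits` is a made-up name, and it deliberately tests for the ASCII range 0-9 to mirror the '[0-9]' pattern.

```python
def count_digits(text: str) -> int:
    # Visit every character and count those in the ASCII range '0'..'9'.
    return sum(1 for ch in text if "0" <= ch <= "9")


print(count_digits("kick0my234ass"))
```

For the sample string this prints 4, matching the count the question asks for.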
You could try this solution, which relies on PATINDEX wildcard patterns rather than true regular expressions (if you'd allow them).
It uses a recursive CTE: at every recursive step one digit is removed from the string, and the recursion stops when no digits remain. The rows are numbered with consecutive ids, so the last id is the number of digits removed from the string.
declare @str varchar(100) = 'kick0my123ass';
with cte as (
    select 1 [id], stuff(@str,PATINDEX('%[0-9]%', @str),1,'') [col]
    union all
    select [id] + 1, stuff([col],PATINDEX('%[0-9]%', [col]),1,'') from cte
    where col like '%[0-9]%'
)
--this will give you number of digits in string
select top 1 id from cte order by id desc
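The remove-one-digit-per-step idea can be sketched in Python as a loop. The name `count_digits_by_removal` is hypothetical. One difference worth noting: as far as I can tell the CTE's anchor row always produces id 1, so a string with no digits at all would still report 1, whereas this sketch returns 0 in that case.

```python
import re


def count_digits_by_removal(text: str) -> int:
    removed = 0
    while True:
        match = re.search(r"[0-9]", text)
        if match is None:
            # No digits left: the number of removals is the digit count.
            return removed
        # Drop the first digit found, like STUFF(col, PATINDEX(...), 1, '').
        text = text[:match.start()] + text[match.end():]
        removed += 1


print(count_digits_by_removal("kick0my123ass"))
```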
Use a WHILE loop to check whether each character is numeric or not.
Query
declare @text as varchar(max) = 'kick0my234ass';
declare @len as int;
select @len = len(@text);
if(@len > 0)
begin
    declare @i as int = 1;
    declare @count as int = 0;
    while(@i <= @len)
    begin
        if(substring(@text, @i, 1) like '[0-9]')
            set @count += 1;
        set @i += 1;
    end
    print 'Count of Numerics in ' + @text + ' : ' + cast(@count as varchar(100));
end
else
    print 'Empty string';
If simplicity & performance are important I suggest a purely set-based solution. Grab a copy of DigitsOnlyEE which will remove all non-numeric characters. Then use LEN against the output.
DECLARE @string varchar(100) = '123xxx45ff678';
SELECT string = @string, digitsOnly, DigitCount = LEN(digitsOnly)
FROM dbo.DigitsOnlyEE(@string);
Results
string digitsOnly DigitCount
------------------ ----------- ------------
123xxx45ff678 12345678 8
using a Tally Table created by an rCTE:
CREATE TABLE #Sample (S varchar(100));
INSERT INTO #Sample
VALUES ('kick0my234 ass');
GO
WITH Tally AS(
SELECT 1 AS N
UNION ALL
SELECT N + 1
FROM Tally
WHERE N + 1 <= 100)
SELECT S.S, SUM(CASE WHEN SUBSTRING(S,T.N, 1) LIKE '[0-9]' THEN 1 ELSE 0 END) AS Numbers
FROM #Sample S
JOIN Tally T ON LEN(S.S) >= T.N
GROUP BY S.S;
For future reference, please also post your own attempts. We aren't here (really) to do your work for you.

How to convert Hex to String in Sql server

Here is my hex input
0x3c0x3c0x5bIMG0x5d0x5bSIZE0x5dHALF0x5b0x2fSIZE0x5d0x5bID0x5d540x5b0x2fID0x5d0x5b0x2fIMG0x5d0x3e0x3e
Expected output is :
<<[IMG][SIZE]HALF[/SIZE][ID]54[/ID][/IMG]>>
Your string mixes hex and char data, so you need to parse it with code. The tricky part is converting a 0xCC substring to the char it represents: first pretend it's binary and then cast to char. This uses recursion to iterate over all 0xCC substrings:
declare @imp nvarchar(max) = '0x3c0x3c0x5bIMG0x5d0x5bSIZE0x5dHALF0x5b0x2fSIZE0x5d0x5bID0x5d540x5b0x2fID0x5d0x5b0x2fIMG0x5d0x3e0x3e';
with cte as (
select replace(col, val, cast(convert(binary(2), val, 1) as char(1))) as col
from (
-- sample table
select @imp as col
) tbl
cross apply (select patindex('%0x__%',tbl.col) pos) p
cross apply (select substring(col,pos,4) val) v
union all
select replace(col, val, cast(convert(binary(2), val, 1) as char(1))) as col
from cte
cross apply (select patindex('%0x__%',col) pos) p
cross apply (select substring(col,pos,4) val) v
where pos > 0
)
select *
from cte
where patindex('%0x__%',col) = 0;
Returns
col
<<[IMG][SIZE]HALF[/SIZE][ID]54[/ID][/IMG]>>
If it's for only a small set of ascii codes that always need replacement in a variable, then you can also replace them like this:
declare @string varchar(max) = '0x3c0x3c0x5bIMG0x5d0x5bSIZE0x5dHALF0x5b0x2fSIZE0x5d0x5bID0x5d540x5b0x2fID0x5d0x5b0x2fIMG0x5d0x3e0x3e';
select @string = replace(@string,hex,chr)
from (values
('0x3c','<'),
('0x3e','>'),
('0x5b','['),
('0x5d',']'),
('0x2f','/')
) hexes(hex,chr);
select @string as string;
Returns:
string
------
<<[IMG][SIZE]HALF[/SIZE][ID]54[/ID][/IMG]>>
If there are more characters, or hardcoding is frowned upon?
Then looping a replacement will also get that result:
declare @string varchar(max) = '0x3c0x3c0x5bIMG0x5d0x5bSIZE0x5dHALF0x5b0x2fSIZE0x5d0x5bID0x5d540x5b0x2fID0x5d0x5b0x2fIMG0x5d0x3e0x3e';
declare @loopcount int = 0;
declare @hex char(4);
while (patindex('%0x[0-9][a-f0-9]%',@string)>0
       and @loopcount < 128) -- just a safety measure to avoid an infinite loop
begin
    set @hex = substring(@string,patindex('%0x[0-9][a-f0-9]%',@string),4);
    set @string = replace(@string, @hex, convert(char(1),convert(binary(2), @hex, 1)));
    set @loopcount = @loopcount + 1;
end;
select @string as string;
If you would wrap it in a UDF then you can even use it in a query.
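For comparison, the same 0xCC-to-character replacement can be sketched in Python with a single regular-expression substitution. `decode_hex_tokens` is a made-up name, and the sketch assumes every token is exactly `0x` followed by two hex digits, as in the sample input.

```python
import re


def decode_hex_tokens(text: str) -> str:
    # Replace each '0x' + two hex digits with the character it encodes.
    return re.sub(r"0x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


sample = ("0x3c0x3c0x5bIMG0x5d0x5bSIZE0x5dHALF0x5b0x2fSIZE0x5d"
          "0x5bID0x5d540x5b0x2fID0x5d0x5b0x2fIMG0x5d0x3e0x3e")
print(decode_hex_tokens(sample))
```

This is handy for confirming what the SQL versions should return for a given input before running them against real data.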

Change characters but keep length

I am migrating sensitive data to a database, and I need to hide details of the text. We would like to keep the volume and length of the text, but change the meaning.
For example:
"James has been well received, and should be helped when ever he finds it hard to speak"
should change to:
"jhdfy dfw aslk dfe kjdfkjd, kjf kjdsf df iotryy erhr lsdj jf ytwe it kjdf tr kjsdd"
Is there a way to update all rows, setting the column text to this kind of random text? I really only want to change letters (a-z, A-Z) and keep everything else.
One option is to use a bunch of nested replaces . . . but that would probably hit on the maximum number of nested functions.
You could write a painful query using outer apply:
select
from t outer apply
(select replace(t.col, 'a', 'z') as col1) outer apply
(select replace(col1, 'b', 'y') ) outer apply
. . .
However, you might want to write your own function. In other databases, this is called translate() (after the Unix command). If you Google SQL Server translate, I think you'll find examples on the web.
One way is to split the string character by character and replace each row with a random string. And then concatenate them back to get the desired output
DECLARE @str VARCHAR(MAX) = 'James has been well received, and should be helped when ever he finds it hard to speak'
;WITH Cte(orig, random) AS(
    SELECT
        SUBSTRING(t.a, v.number + 1, 1),
        CASE
            WHEN SUBSTRING(t.a, v.number + 1, 1) LIKE '[a-z]'
                -- % 26 covers all 26 letters, CHAR(97) 'a' through CHAR(122) 'z'
                THEN CHAR(ABS(CHECKSUM(NEWID())) % 26 + 97)
            ELSE SUBSTRING(t.a, v.number + 1, 1)
        END
    FROM (SELECT @str) t(a)
    CROSS JOIN master..spt_values v
    WHERE
        v.number < LEN(t.a)
        AND v.type = 'P'
)
SELECT
    OriginalString = @str,
    RandomString = (
        SELECT '' + random
        FROM Cte FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
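The split, randomize, and re-join approach can also be sketched in Python, which may help when checking that lengths and punctuation survive. `scramble_letters` and its `rng` parameter are names invented here. Unlike the query above, this sketch keeps upper case as upper case, which is an assumption about what is wanted.

```python
import random
import string


def scramble_letters(text: str, rng: random.Random = None) -> str:
    rng = rng or random.Random()
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            # Spaces, digits and punctuation are kept as they are.
            out.append(ch)
    return "".join(out)


original = "James has been well received, and should be helped when ever he finds it hard to speak"
print(scramble_letters(original))
```

The output differs on every run, but it always has the same length as the input and the same characters in every non-letter position.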
OK this is possible using a user defined function (UDF) and a view.
SQL Server does not allow random number generation in a UDF but does allow it in a view. Ref: http://blog.sqlauthority.com/2012/11/20/sql-server-using-rand-in-user-defined-functions-udf/
So here is the solution
CREATE VIEW [dbo].[rndView]
AS
SELECT RAND() rndResult
GO
CREATE FUNCTION [dbo].[RandFn]()
RETURNS float
AS
BEGIN
    DECLARE @rndValue float
    SELECT @rndValue = rndResult
    FROM rndView
    RETURN @rndValue
END
GO
CREATE FUNCTION [dbo].[randomstring] ( @stringToParse VARCHAR(MAX))
RETURNS
varchar(max)
AS
BEGIN
/*
A = 65
Z = 90
a = 97
z = 122
declare @stringToParse VARCHAR(MAX) = 'James has been well received, and should be helped when ever he finds it hard to speak'
Select [dbo].[randomstring] ( @stringToParse )
Update SpecialTable
Set SpecialString = [dbo].[randomstring] (SpecialString)
*/
    declare @StringToreturn varchar(max) = ''
    declare @charCounter int = 1
    declare @len int = len(@stringToParse)
    declare @thisRand int
    declare @UpperA int = 65
    declare @UpperZ int = 90
    declare @LowerA int = 97
    declare @LowerZ int = 122
    declare @thisChar char(1)
    declare @Random_Number float
    declare @randomChar char(1)
    -- <= so that the last character is processed too
    WHILE @charCounter <= @len
    BEGIN
        SELECT @thisChar = SUBSTRING(@stringToParse, @charCounter, 1)
        set @randomChar = @thisChar
        SELECT @Random_Number = dbo.RandFn()
        --only swap if a-z or A-Z
        if ASCII(@thisChar) >= @UpperA and ASCII(@thisChar) <= @UpperZ begin
            --upper case; +1 so that Z itself can be produced
            set @thisRand = @UpperA + FLOOR(@Random_Number * (@UpperZ - @UpperA + 1))
            set @randomChar = CHAR(@thisRand)
        end
        if ASCII(@thisChar) >= @LowerA and ASCII(@thisChar) <= @LowerZ begin
            --lower case; +1 so that z itself can be produced
            set @thisRand = @LowerA + FLOOR(@Random_Number * (@LowerZ - @LowerA + 1))
            set @randomChar = CHAR(@thisRand)
        end
        set @StringToreturn = @StringToreturn + @randomChar
        SET @charCounter = @charCounter + 1
    END
    return @StringToreturn
END
GO

SQL replace with serial number for the specific occurrence of a character in a string

Input table has strings like this:
Col_Name
---------------
YXNYNXYYZY
YYZZY
-- and 100's of rows
I want to find specific occurrence of character Y, and create output field like this:
Col_Name
----------------
1,4,7,8,10
1,2,5
I am trying to find solution with sql functions like replace, len, charindex, etc,, but unable to arrive to the output. Please help.
This solution works, but you should consider changing your model, because this approach may be slow and is not really a job for SQL Server.
declare @search char(1) = 'Y'
; with input(string) as (
    Select * From (values('YXNYNXYYZY'), ('YYZZY')) as input(string)
), find(id, string, pos) as ( -- one row for each Y found, with its position
    select 0, string, CHARINDEX(@search, string, 0) From input
    Where CHARINDEX(@search, string, 0) > 0
    Union All
    select id+1, string, CHARINDEX(@search, string, pos+1) From find
    Where CHARINDEX(@search, string, pos+1) > 0
)
--Select * from find => 1 position per row
Select STUFF( --=> concatenate all positions per string
    (
        Select ', ' + CAST([pos] AS Varchar(10))
        From find f
        Where (string = r.string)
        Order By string, id
        For XML PATH(''),TYPE
    ).value('(./text())[1]','Varchar(100)')
    ,1,2,'') AS x
From find r
Group BY string
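To sanity-check the positions the query should return, the same lookup is easy to sketch in Python. `positions_of` is a hypothetical helper name, and positions are 1-based to match CHARINDEX.

```python
def positions_of(text: str, search: str = "Y") -> str:
    # 1-based positions of every occurrence of the single character `search`.
    return ",".join(str(i) for i, ch in enumerate(text, start=1) if ch == search)


for row in ["YXNYNXYYZY", "YYZZY"]:
    print(positions_of(row))
```

This prints 1,4,7,8,10 and 1,2,5, the two rows expected in the question.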
With the help of a function we can retrieve those positions:
-- CREATE rather than ALTER, so the script also works when the function does not exist yet
CREATE FUNCTION [dbo].[GetPosition]
(
    @txt varchar(max),
    @pat varchar(max)
)
RETURNS
@tab TABLE
(
    ID int
)
AS
BEGIN
    Declare @pos int
    Declare @oldpos int
    Select @oldpos=0
    select @pos=patindex(@pat,@txt)
    while @pos > 0 and @oldpos<>@pos
    begin
        insert into @tab Values (@pos)
        Select @oldpos=@pos
        select @pos=patindex(@pat,Substring(@txt,@pos + 1,len(@txt))) + @pos
    end
    RETURN
END
GO
Call Function
SELECT
stuff(
(
SELECT ','+ cast(ID as nvarchar(4)) FROM dbo.[GetPosition] ('YXNYNXYYZY','%Y%') FOR XML PATH('')
),1,1,'')
This would be trivial to do with a scripting language like PHP or Python. Dump the entire dataset to an array and check each value. Even hundreds of rows would not be an issue.
Load the CsV File
Search array for values.
// pseudo code (untested)
$Found = '';
// read the file into an array of rows, each row an array of columns
$Data = array_map('str_getcsv', file('c:\data\dumpeddata.csv', FILE_IGNORE_NEW_LINES));
foreach ($Data as $LineNo => $LineDetails) {
    foreach ($LineDetails as $ColNo => $ColData) {
        if (strpos($ColData, 'Y') !== false) {
            $Found .= "\nLine:$LineNo,Col:$ColNo";
        }
    }
}
echo $Found;
You'll be surprised how fast this runs.
-- assumes tablename holds a single row
declare @n int = (select charindex('Y',col_name,1) from tablename)
declare @l int = (select len(col_name) from tablename)
declare @res varchar(100) = ''
while @n <= @l and (select charindex('Y',col_name,@n) from tablename) <> 0
begin
    set @res = @res + cast((select charindex('Y',col_name,@n) from tablename) as varchar(10)) + ' '
    set @n = (select charindex('Y',col_name,@n) from tablename) + 1
end
select @res
This gives you an idea. If this has to be done repeatedly, it is better to wrap this in a function.

Query to get only numbers from a string

I have data like this:
string 1: 003Preliminary Examination Plan
string 2: Coordination005
string 3: Balance1000sheet
The output I expect is
string 1: 003
string 2: 005
string 3: 1000
And I want to implement it in SQL.
First create this UDF
CREATE FUNCTION dbo.udf_GetNumeric
(
    @strAlphaNumeric VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN
    DECLARE @intAlpha INT
    SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
    BEGIN
        WHILE @intAlpha > 0
        BEGIN
            SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
            SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
        END
    END
    RETURN ISNULL(@strAlphaNumeric,0)
END
GO
Now use the function as
SELECT dbo.udf_GetNumeric(column_name)
from table_name
I hope this solved your problem.
Try this one -
Query:
DECLARE @temp TABLE
(
    string NVARCHAR(50)
)
INSERT INTO @temp (string)
VALUES
('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')
SELECT LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)
FROM (
    SELECT subsrt = SUBSTRING(string, pos, LEN(string))
    FROM (
        SELECT string, pos = PATINDEX('%[0-9]%', string)
        FROM @temp
    ) d
) t
Output:
----------
003
005
1000
Query:
DECLARE @temp TABLE
(
    string NVARCHAR(50)
)
INSERT INTO @temp (string)
VALUES
('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
FROM @temp
Please try:
declare @var nvarchar(max)='Balance1000sheet'
SELECT LEFT(Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
    SELECT SUBSTRING(@var, PATINDEX('%[0-9]%', @var), LEN(@var)) Val
)x
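The answers so far all extract the first run of consecutive digits. A small Python sketch of that behaviour, useful for checking edge cases such as a string with two separate numbers, could look like this; `first_number` is a made-up name.

```python
import re


def first_number(text: str) -> str:
    # First run of consecutive ASCII digits, or '' when there is none.
    match = re.search(r"[0-9]+", text)
    return match.group(0) if match else ""


for s in ["003Preliminary Examination Plan", "Coordination005", "Balance1000sheet"]:
    print(first_number(s))
```

This prints 003, 005 and 1000. For a string like abc123vfg34 it returns 123 and ignores the later 34.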
Getting only numbers from a string can be done in a one-liner.
Try this :
SUBSTRING('your-string-here', PATINDEX('%[0-9]%', 'your-string-here'), LEN('your-string-here'))
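NB: This returns everything from the first digit onward, e.g. abc123vfg34 returns 123vfg34. To get only 123, trim the result at the first non-digit, as the previous answer does with LEFT and PATINDEX('%[^0-9]%', ...).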
I found this approach works about 3x faster than the top voted answer. Create the following function, dbo.GetNumbers:
CREATE FUNCTION dbo.GetNumbers(@String VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN;
    WITH
    Numbers
    AS (
        --Step 1.
        --Get a column of numbers to represent
        --every character position in the @String.
        SELECT 1 AS Number
        UNION ALL
        SELECT Number + 1
        FROM Numbers
        WHERE Number < LEN(@String)
    )
    ,Characters
    AS (
        SELECT Character
        FROM Numbers
        CROSS APPLY (
            --Step 2.
            --Use the column of numbers generated above
            --to tell substring which character to extract.
            SELECT SUBSTRING(@String, Number, 1) AS Character
        ) AS c
    )
    --Step 3.
    --Pattern match to return only numbers from the CTE
    --and use STRING_AGG to rebuild it into a single string.
    SELECT @String = STRING_AGG(Character,'')
    FROM Characters
    WHERE Character LIKE '[0-9]'
    --allows going past the default maximum of 100 loops in the CTE
    OPTION (MAXRECURSION 8000)
    RETURN @String
END
GO
Testing
Testing for purpose:
SELECT dbo.GetNumbers(InputString) AS Numbers
FROM ( VALUES
('003Preliminary Examination Plan') --output: 003
,('Coordination005') --output: 005
,('Balance1000sheet') --output: 1000
,('(111) 222-3333') --output: 1112223333
,('1.38hello#f00.b4r#\-6') --output: 1380046
) testData(InputString)
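The three steps in dbo.GetNumbers (enumerate positions, pick each character, keep the digits and re-join) collapse to a one-line filter in Python. The sketch below, with the invented name `digits_only`, can be used to confirm the expected outputs listed in the test data.

```python
def digits_only(text: str) -> str:
    # Keep every ASCII digit, in order, and drop everything else.
    return "".join(ch for ch in text if ch in "0123456789")


tests = [
    "003Preliminary Examination Plan",
    "Coordination005",
    "Balance1000sheet",
    "(111) 222-3333",
    "1.38hello#f00.b4r#\\-6",
]
for s in tests:
    print(digits_only(s))
```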
Testing for performance:
Start off setting up the test data...
--Add table to hold test data
CREATE TABLE dbo.NumTest (String VARCHAR(8000))
--Make an 8000 character string with mix of numbers and letters
DECLARE @Num VARCHAR(8000) = REPLICATE('12tf56se',800)
--Add this to the test table 500 times
DECLARE @n INT = 0
WHILE @n < 500
BEGIN
    INSERT INTO dbo.NumTest VALUES (@Num)
    SET @n = @n +1
END
Now testing the dbo.GetNumbers function:
SELECT dbo.GetNumbers(NumTest.String) AS Numbers
FROM dbo.NumTest -- Time to complete: 1 min 7s
Then testing the UDF from the top voted answer on the same data.
SELECT dbo.udf_GetNumeric(NumTest.String)
FROM dbo.NumTest -- Time to complete: 3 mins 12s
Inspiration for dbo.GetNumbers
Decimals
If you need it to handle decimals, you can use either of the following approaches, I found no noticeable performance differences between them.
change '[0-9]' to '[0-9.]'
change Character LIKE '[0-9]' to ISNUMERIC(Character) = 1 (SQL treats a single decimal point as "numeric")
Bonus
You can easily adapt this to differing requirements by swapping out WHERE Character LIKE '[0-9]' with the following options:
WHERE Character LIKE '[a-zA-Z]' --Get only letters
WHERE Character LIKE '[0-9a-zA-Z]' --Remove non-alphanumeric
WHERE Character LIKE '[^0-9a-zA-Z]' --Get only non-alphanumeric
With the previous queries I get these results:
'AAAA1234BBBB3333' >>>> Output: 1234
'-çã+0!\aº1234' >>>> Output: 0
The code below returns All numeric chars:
1st output: 12343333
2nd output: 01234
declare @StringAlphaNum varchar(255)
declare @Character varchar
declare @SizeStringAlfaNumerica int
declare @CountCharacter int
set @StringAlphaNum = 'AAAA1234BBBB3333'
set @SizeStringAlfaNumerica = len(@StringAlphaNum)
set @CountCharacter = 1
while isnumeric(@StringAlphaNum) = 0
begin
    while @CountCharacter < @SizeStringAlfaNumerica
    begin
        if substring(@StringAlphaNum,@CountCharacter,1) not like '[0-9]%'
        begin
            set @Character = substring(@StringAlphaNum,@CountCharacter,1)
            set @StringAlphaNum = replace(@StringAlphaNum, @Character, '')
        end
        set @CountCharacter = @CountCharacter + 1
    end
    set @CountCharacter = 0
end
select @StringAlphaNum
declare @puvodni nvarchar(20)
set @puvodni = N'abc1d8e8ttr987avc'
WHILE PATINDEX('%[^0-9]%', @puvodni) > 0 SET @puvodni = REPLACE(@puvodni, SUBSTRING(@puvodni, PATINDEX('%[^0-9]%', @puvodni), 1), '' )
SELECT @puvodni
A solution for SQL Server 2017 and later, using TRANSLATE:
DECLARE @T table (string varchar(50) NOT NULL);
INSERT @T
    (string)
VALUES
    ('003Preliminary Examination Plan'),
    ('Coordination005'),
    ('Balance1000sheet');
SELECT
    result =
        REPLACE(
            TRANSLATE(
                T.string COLLATE Latin1_General_CI_AI,
                'abcdefghijklmnopqrstuvwxyz',
                SPACE(26)),
            SPACE(1),
            SPACE(0))
FROM @T AS T;
Output:
result
003
005
1000
The code works by:
Replacing characters a-z (ignoring case & accents) with a space
Replacing spaces with an empty string.
The string supplied to TRANSLATE can be expanded to include additional characters.
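Python's `str.translate` does the same two steps in one pass, which makes for a quick cross-check of the TRANSLATE approach. This is a rough sketch with an invented name, `strip_letters`; unlike the accent-insensitive collation used above, it only removes unaccented ASCII letters and spaces.

```python
import string

# Translation table that deletes ASCII letters (both cases) and spaces.
DROP_LETTERS = str.maketrans("", "", string.ascii_letters + " ")


def strip_letters(text: str) -> str:
    return text.translate(DROP_LETTERS)


for s in ["003Preliminary Examination Plan", "Coordination005", "Balance1000sheet"]:
    print(strip_letters(s))
```

As with the SQL version, any character not listed in the table (punctuation, for example) is left in place, so the set of characters to drop has to be extended for messier input.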
I did not have rights to create functions but had text like
["blahblah012345679"]
And needed to extract the numbers out of the middle
Note this assumes the numbers are grouped together and not at the start and end of the string.
select substring(column_name,patindex('%[0-9]%', column_name),patindex('%[0-9][^0-9]%', column_name)-patindex('%[0-9]%', column_name)+1)
from table_name
Although this is an old thread, it's the first result in a Google search, and I came up with a different answer from the earlier ones. This lets you pass your own criteria for what to keep within a string, whatever those criteria might be. You can put it in a function to call over and over again if you want.
declare @String VARCHAR(MAX) = '-123. a 456-78(90)'
declare @MatchExpression VARCHAR(255) = '%[0-9]%'
declare @return varchar(max)
WHILE PatIndex(@MatchExpression, @String) > 0
begin
    set @return = CONCAT(@return, SUBSTRING(@string,patindex(@matchexpression, @string),1))
    SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')
end
select (@return)
This UDF will work for all types of strings:
CREATE FUNCTION udf_getNumbersFromString (@string varchar(max))
RETURNS varchar(max)
AS
BEGIN
    WHILE @String like '%[^0-9]%'
        SET @String = REPLACE(@String, SUBSTRING(@String, PATINDEX('%[^0-9]%', @String), 1), '')
    RETURN @String
END
Just a little modification to @Epsicron's answer:
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
FROM (values ('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')) as a(string)
no need for a temporary variable
First find the position of the first digit, then reverse the string and find the first digit again, which gives the distance of the last digit from the end of the string. Subtracting both offsets from the total length of the string gives the length of the number, which you can then extract with SUBSTRING.
declare @fieldName nvarchar(100)='AAAA1221.121BBBB'
declare @lenSt int=(select PATINDEX('%[0-9]%', @fieldName)-1)
declare @lenEnd int=(select PATINDEX('%[0-9]%', REVERSE(@fieldName))-1)
select SUBSTRING(@fieldName, PATINDEX('%[0-9]%', @fieldName), (LEN(@fieldName) - @lenSt -@lenEnd))
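The first-digit and last-digit arithmetic is easy to get off by one, so here is a minimal Python sketch of the same idea for comparison. `digit_span` is a hypothetical name; like the SQL, it returns everything between the first and last digit, including any non-digits in between.

```python
def digit_span(text: str) -> str:
    digits = "0123456789"
    # Index of the first digit, scanning from the left.
    first = next((i for i, ch in enumerate(text) if ch in digits), None)
    if first is None:
        return ""
    # Offset of the last digit, found by scanning the reversed string.
    from_end = next(i for i, ch in enumerate(reversed(text)) if ch in digits)
    last = len(text) - 1 - from_end
    return text[first:last + 1]


print(digit_span("AAAA1221.121BBBB"))
```

For the sample value this prints 1221.121.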
T-SQL function to read all the integers from text and return the one at the indicated index, starting from left or right, also using a starting search term (optional):
create or alter function dbo.udf_number_from_text(
    @text nvarchar(max),
    @search_term nvarchar(1000) = N'',
    @number_position tinyint = 1,
    @rtl bit = 0
) returns int
as
begin
    declare @result int = 0;
    declare @search_term_index int = 0;
    if @text is null or len(@text) = 0 goto exit_label;
    set @text = trim(@text);
    if len(@text) = len(@search_term) goto exit_label;
    if len(@search_term) > 0
    begin
        set @search_term_index = charindex(@search_term, @text);
        if @search_term_index = 0 goto exit_label;
    end;
    if @search_term_index > 0
        if @rtl = 0
            set @text = trim(right(@text, len(@text) - @search_term_index - len(@search_term) + 1));
        else
            set @text = trim(left(@text, @search_term_index - 1));
    if len(@text) = 0 goto exit_label;
    declare @patt_number nvarchar(10) = '%[0-9]%';
    declare @patt_not_number nvarchar(10) = '%[^0-9]%';
    declare @number_start int = 1;
    declare @number_end int;
    declare @found_numbers table (id int identity(1,1), val int);
    while @number_start > 0
    begin
        set @number_start = patindex(@patt_number, @text);
        if @number_start > 0
        begin
            if @number_start = len(@text)
            begin
                insert into @found_numbers(val)
                select cast(substring(@text, @number_start, 1) as int);
                break;
            end;
            else
            begin
                set @text = right(@text, len(@text) - @number_start + 1);
                set @number_end = patindex(@patt_not_number, @text);
                if @number_end = 0
                begin
                    insert into @found_numbers(val)
                    select cast(@text as int);
                    break;
                end;
                else
                begin
                    insert into @found_numbers(val)
                    select cast(left(@text, @number_end - 1) as int);
                    if @number_end = len(@text)
                        break;
                    else
                    begin
                        set @text = trim(right(@text, len(@text) - @number_end));
                        if len(@text) = 0 break;
                    end;
                end;
            end;
        end;
    end;
    if @rtl = 0
        select @result = coalesce(a.val, 0)
        from (select row_number() over (order by m.id asc) as c_row, m.val
              from @found_numbers as m) as a
        where a.c_row = @number_position;
    else
        select @result = coalesce(a.val, 0)
        from (select row_number() over (order by m.id desc) as c_row, m.val
              from @found_numbers as m) as a
        where a.c_row = @number_position;
    exit_label:
    return @result;
end;
Example:
select dbo.udf_number_from_text(N'Text text 10 text, 25 term', N'term',2,1);
returns 10;
This is one of the simplest approaches. It works on the entire string and handles multiple occurrences as well.
CREATE FUNCTION dbo.fn_GetNumbers(@strInput NVARCHAR(500))
RETURNS NVARCHAR(500)
AS
BEGIN
    DECLARE @strOut NVARCHAR(500) = '', @intCounter INT = 1
    WHILE @intCounter <= LEN(@strInput)
    BEGIN
        SELECT @strOut = @strOut + CASE WHEN SUBSTRING(@strInput, @intCounter, 1) LIKE '[0-9]' THEN SUBSTRING(@strInput, @intCounter, 1) ELSE '' END
        SET @intCounter = @intCounter + 1
    END
    RETURN @strOut
END
The following is a solution using a single common table expression (CTE).
DECLARE @s AS TABLE (id int PRIMARY KEY, value nvarchar(max));
INSERT INTO @s
VALUES
(1, N'003Preliminary Examination Plan'),
(2, N'Coordination005'),
(3, N'Balance1000sheet');
SELECT * FROM @s ORDER BY id;
WITH t AS (
    SELECT
        id,
        1 AS i,
        SUBSTRING(value, 1, 1) AS c
    FROM
        @s
    WHERE
        LEN(value) > 0
    UNION ALL
    SELECT
        t.id,
        t.i + 1 AS i,
        SUBSTRING(s.value, t.i + 1, 1) AS c
    FROM
        t
        JOIN @s AS s ON t.id = s.id
    WHERE
        t.i < LEN(s.value)
)
SELECT
    id,
    STRING_AGG(c, N'') WITHIN GROUP (ORDER BY i ASC) AS value
FROM
    t
WHERE
    c LIKE '[0-9]'
GROUP BY
    id
ORDER BY
    id;
DECLARE @index NVARCHAR(20);
SET @index = 'abd565klaf12';
WHILE PATINDEX('%[0-9]%', @index) != 0
BEGIN
    SET @index = REPLACE(@index, SUBSTRING(@index, PATINDEX('%[0-9]%', @index), 1), '');
END
SELECT @index;
As written, this strips the digits and leaves the letters. To keep only the numbers instead, replace [0-9] with [a-z], then convert the result to the desired type using the CAST function.
Using a user-defined function greatly reduces query speed. This code extracts the number from the string without one:
SELECT
Reverse(substring(Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) )))) , patindex('%[0-9]%', Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) )))) ), len(Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) ))))) )) NumberValue
FROM dbo.TableName
CREATE OR REPLACE FUNCTION count_letters_and_numbers(input_string TEXT)
RETURNS TABLE (letters INT, numbers INT) AS $$
BEGIN
    RETURN QUERY SELECT
        -- cast to int because sum() returns bigint
        sum(CASE WHEN ch ~ '[A-Za-z]' THEN 1 ELSE 0 END)::int,
        sum(CASE WHEN ch ~ '[0-9]' THEN 1 ELSE 0 END)::int
    -- a NULL delimiter splits the string into single characters
    FROM unnest(string_to_array(input_string, NULL)) AS ch;
END;
$$ LANGUAGE plpgsql;
For the hell of it...
This solution is different to all earlier solutions, viz:
There is no need to create a function
There is no need to use pattern matching
There is no need for a temporary table
This solution uses a recursive common table expression (CTE)
But first - note the question does not specify where such strings are stored. In my solution below, I create a CTE as a quick and dirty way to put these strings into some kind of "source table".
Note also that this solution uses a recursive common table expression (CTE), so don't get confused by the usage of two CTEs here. The first is simply to make the data available to the solution; only the second CTE is required to solve this problem. You can adapt the code to make this second CTE query your existing table, view, etc.
Lastly - my coding is verbose, trying to use column and CTE names that explain what is going on and you might be able to simplify this solution a little. I've added in a few pseudo phone numbers with some (expected and atypical, as the case may be) formatting for the fun of it.
with SOURCE_TABLE as (
select '003Preliminary Examination Plan' as numberString
union all select 'Coordination005' as numberString
union all select 'Balance1000sheet' as numberString
union all select '1300 456 678' as numberString
union all select '(012) 995 8322 ' as numberString
union all select '073263 6122,' as numberString
),
FIRST_CHAR_PROCESSED as (
select
len(numberString) as currentStringLength,
isNull(cast(try_cast(replace(left(numberString, 1),' ','z') as tinyint) as nvarchar),'') as firstCharAsNumeric,
cast(isNull(cast(try_cast(nullIf(left(numberString, 1),'') as tinyint) as nvarchar),'') as nvarchar(4000)) as newString,
cast(substring(numberString,2,len(numberString)) as nvarchar) as remainingString
from SOURCE_TABLE
union all
select
len(remainingString) as currentStringLength,
cast(try_cast(replace(left(remainingString, 1),' ','z') as tinyint) as nvarchar) as firstCharAsNumeric,
cast(isNull(newString,'') as nvarchar(3999)) + isNull(cast(try_cast(nullIf(left(remainingString, 1),'') as tinyint) as nvarchar(1)),'') as newString,
substring(remainingString,2,len(remainingString)) as remainingString
from FIRST_CHAR_PROCESSED fcp2
where fcp2.currentStringLength > 1
)
select
newString
,* -- comment this out when required
from FIRST_CHAR_PROCESSED
where currentStringLength = 1
So what's going on here?
Basically in our CTE we are selecting the first character and using try_cast (see docs) to cast it to a tinyint (which is a large enough data type for a single-digit numeral). Note that the type-casting rules in SQL Server say that an empty string (or a space, for that matter) will resolve to zero, so the nullif is added to force spaces and empty strings to resolve to null (see discussion) (otherwise our result would include a zero character any time a space is encountered in the source data).
The CTE also returns everything after the first character - and that becomes the input to our recursive call on the CTE; in other words: now let's process the next character.
Lastly, the field newString in the CTE is generated (in the second SELECT) via concatenation. With recursive CTEs the data type must match between the two SELECT statements for any given column - including the column size. Because we know we are adding (at most) a single character, we are casting that character to nvarchar(1) and we are casting the newString (so far) as nvarchar(3999). Concatenated, the result will be nvarchar(4000) - which matches the type casting we carry out in the first SELECT.
If you run this query and exclude the WHERE clause, you'll get a sense of what's going on - but the rows may be in a strange order. (You won't necessarily see all rows relating to a single input value grouped together - but you should still be able to follow).
Hope it's an interesting option that may help a few people wanting a strictly expression-based solution.
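The process-the-first-character-then-recurse structure of the CTE can be mirrored by a small recursive function in Python, which may make the flow easier to follow. This is only a sketch with invented names (`digits_recursive`, `remaining`, `built`); it tests membership in '0123456789' instead of using try_cast, and it is meant for short strings since Python limits recursion depth.

```python
def digits_recursive(remaining: str, built: str = "") -> str:
    # Base case: nothing left to process, so `built` is the answer.
    if not remaining:
        return built
    head, rest = remaining[0], remaining[1:]
    # Append the first character only if it is a digit, then recurse on the rest.
    return digits_recursive(rest, built + (head if head in "0123456789" else ""))


for s in ["003Preliminary Examination Plan", "1300 456 678", "(012) 995 8322 ", "073263 6122,"]:
    print(digits_recursive(s))
```

Here `remaining` plays the role of remainingString and `built` the role of newString in the query above.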
In Oracle
You can get what you want using this:
SUBSTR('ABCD1234EFGH',REGEXP_INSTR ('ABCD1234EFGH', '[[:digit:]]'),REGEXP_COUNT ('ABCD1234EFGH', '[[:digit:]]'))
Sample Query:
SELECT SUBSTR('003Preliminary Examination Plan ',REGEXP_INSTR ('003Preliminary Examination Plan ', '[[:digit:]]'),REGEXP_COUNT ('003Preliminary Examination Plan ', '[[:digit:]]')) SAMPLE1,
SUBSTR('Coordination005',REGEXP_INSTR ('Coordination005', '[[:digit:]]'),REGEXP_COUNT ('Coordination005', '[[:digit:]]')) SAMPLE2,
SUBSTR('Balance1000sheet',REGEXP_INSTR ('Balance1000sheet', '[[:digit:]]'),REGEXP_COUNT ('Balance1000sheet', '[[:digit:]]')) SAMPLE3 FROM DUAL
If you are using Postgres and you have data like '2000 - some sample text' then try substring and position combination, otherwise if in your scenario there is no delimiter, you need to write regex:
SUBSTRING(Column_name from 0 for POSITION('-' in column_name) - 1) as
number_column_name