Sybase SQL: How to extract numbers from varchar-values?

I have a bunch of values (varchars) that consist partly of letters (whereby number of letters >= 0) and partly of numbers.
For example:
abc-123
defjke12345
987654
Is there a function in sybase sql that extracts the part of the value that consists of numbers?
Continuing with the given examples,
abc-123 should become 123
defjke12345 should become 12345
987654 should stay 987654
How can I achieve this in sybase sql (procedural language)?

Try this out with the substring and pattern index:
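DECLARE @yourfield VARCHAR(100)
SELECT @yourfield = 'defjke12345'
-- Start at the first digit; the appended 'x' guarantees a non-digit after the digits,
-- so the length is also found when the digits run to the end of the value
SELECT SUBSTRING(@yourfield, PATINDEX('%[0-9]%', @yourfield),
PATINDEX('%[^0-9]%', SUBSTRING(@yourfield, PATINDEX('%[0-9]%', @yourfield), 100) + 'x') - 1)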
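The expected results can be cross-checked outside the database. This is a minimal Python sketch, assuming each value contains at most one run of digits as in the question's examples; `extract_digits` is a hypothetical helper name, not a Sybase function.

```python
import re

def extract_digits(value: str) -> str:
    """Return the first run of consecutive digits in value, or '' if there is none."""
    match = re.search(r"[0-9]+", value)
    return match.group(0) if match else ""

# The three sample values from the question
for sample in ("abc-123", "defjke12345", "987654"):
    print(sample, "->", extract_digits(sample))
```

A value without any digits yields an empty string here; the SQL version needs its own decision for that case.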

Related

remove last n characters from a varchar in SQL

I am trying to remove the last n characters from a string. I tried this:
replace( str, right(str, 3), '' )
But it fails when the last three characters occur more than once in str. For 888106106, for example, I get 888 instead of 888106.
Now I am using
left (str, length(str)-3)
Is there a more efficient way of achieving this?

If you fancy a regex based solution:
regexp_replace(str,'...$','')
It will leave strings with fewer than 3 characters unchanged.
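The behaviour of the `...$` pattern can be tried in any regex engine. The sketch below uses Python's `re` module as a stand-in for the database's regexp_replace (an assumption; engines differ in details such as newline handling), and `drop_last_three` is a hypothetical name.

```python
import re

def drop_last_three(value: str) -> str:
    # '...$' matches exactly three characters anchored at the end of the string,
    # so only the final occurrence is removed, never an earlier repeat.
    return re.sub(r"...$", "", value)

print(drop_last_three("888106106"))  # the repeated '106' is removed only once
print(drop_last_three("ab"))         # too short to match, returned unchanged
```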

So checking that LEFT and/or SUBSTR work equally (I assume LEFT is faster):
select
column1
,left(column1, length(column1) -3) as r1
,substr(column1, 0, length(column1) -3) as r2
from values
('abc123')
,('ab123')
,('a123')
,('123')
,('12')
,('1')
,('')
,(null);
gives:
COLUMN1 R1   R2
------- ---- ----
abc123  abc  abc
ab123   ab   ab
a123    a    a
123     null null
12      null null
1       null null
''      null null
null    null null
So no length checks are needed, which is nice to know.
If you do some perf testing:
create database test;
create schema test.test;
create or replace table test.test.many_string as
select seq8()::text as a
from table(generator(ROWCOUNT => 10000000));
ALTER SESSION SET USE_CACHED_RESULT = false;
select sum(length(left(a, length(a) -3))) from test.test.many_string;
select sum(length(substr(a, 0, length(a) -3))) from test.test.many_string;
After running them both a couple of times on my x-small warehouse, I get results on the order of 300 ms, so the two are equally fast.
So it seems you already have a solution that is fast and easy to read.

In SQL Server, that's the most effective way.
Note that you should do one of:
assert that all input strings will be length>2 [a bit lazy]
handle the error where 1 of the rows has a length<3 and the query terminates early [a bit shoddy]
use a case statement to handle the case where length < 3 [the preferred approach]
CASE
WHEN LENGTH(str) > 2 THEN LEFT(str, LENGTH(str) - 3)
ELSE str
END
In other flavours of SQL you may have to work without the CASE expression.
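The guarded expression can be tried end to end with SQLite from Python. This is only a sketch under assumptions: SQLite has no LEFT function, so SUBSTR(str, 1, n) is swapped in, and the table `t` and its sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (str TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("888106106",), ("ab123",), ("12",), ("",)],
)

# Strings shorter than 3 characters are returned unchanged by the ELSE branch.
rows = conn.execute(
    """
    SELECT str,
           CASE
               WHEN LENGTH(str) > 2 THEN SUBSTR(str, 1, LENGTH(str) - 3)
               ELSE str
           END AS trimmed
    FROM t
    ORDER BY rowid
    """
).fetchall()

for original, trimmed in rows:
    print(repr(original), "->", repr(trimmed))
```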

Extract phone number from noised string

I have a column in a table that contains random data along with phone numbers in different formats. The column may contain
Name
Phone
Email
HTML tags
Addresses (with numbers)
Examples:
1) Call back from +79005346546, Conversation started<br>Phone: +79005346546<br>Called twice Came from google.com<br>IP: 77.106.46.202 the web page address is xxx.com utm_medium: cpc<br>utm_campaign: 32587871<br>utm_content: 5283041 79005346546
2) John Smith
3) xxx@yyy.com
4) John Smith 8 999 888 77 77
How a phone number is written also varies. It may be like 8 927 410 00 22, 8(927)410-00-22, +7(927)410-00-22, +7 (927) 410-00-22, (927)410 00 22, 927 410 00 22, 9(2741) 0 0 0-22 and so on
The common rule here is that the phone number format contains 10-11 digits.
My best guess is to use regular expressions: first remove email addresses from the string (since they can contain phone numbers, like 79990001122@gmail.com), then extract the phone number based on knowing it is 10 or 11 digits in a row, delimited by characters like (, ), +, - and so on. (I don't think anyone would use . as a phone digit delimiter, so we don't need to worry about IP addresses like 77.106.46.202 in the first sample.)
So the question is how to get phone numbers from these values.
The final values I want to get from the four examples above are:
1) 79005346546 79005346546 79005346546
2)
3)
4) 89998887777
The server is Microsoft SQL Server 2014 - 12.0.2000.8 (X64) Standard Edition (64-bit)

UPDATED (20200226)
There were a couple comments that a CLR/regex solution could be faster than the ngram8k solution I posted. I've heard this for six years but every single time, without exception, the test harness tells a different story. I already posted in the earlier comments instructions to get the Microsoft© MDQ family of CLR Regex running in just a few minutes. They were developed, tested and tuned by Microsoft and ship with Master Data Services/Data Quality Services. I've used them for years, they're good.
RegexReplace/RegexSplit vs PatExtract8k/DigitsOnlyEE: 1,000,000 rows
Obviously you don't want functions in your WHERE clause but, since my Regex is rusty AF, I needed to. To level the playing field I did the same with DigitsOnlyEE in the N-Gram solution's WHERE clause.
SET NOCOUNT ON;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS TIME ON;
DECLARE
@newData BIT = 0,
@string VARCHAR(8000) = '1) Call back from +79005346546, Conversation started<br>Phone: +79005346546<br>Called twice Came from google.com<br>IP: 77.106.46.202 the web page address is xxx.com utm_medium: cpc<br>utm_campaign: 32587871<br>utm_content: 5283041 79005346546 ',
@pattern VARCHAR(50) = '[^0-9()+.-]',
@srchLen INT = 11;
IF @newData = 1
BEGIN
IF OBJECT_ID('tempdb..#strings','U') IS NOT NULL DROP TABLE #strings;
SELECT
StringId = IDENTITY(INT,1,1),
String = REPLICATE(#string,ABS(CHECKSUM(NEWID())%3)+1)
INTO #strings
FROM dbo.rangeAB(1,1000000,1,1) AS r;
END
PRINT CHAR(10)+'Regex/CLR version Serial'+CHAR(10)+REPLICATE('-',90);
SELECT regex.NewString
FROM #strings AS s
CROSS APPLY
(
SELECT STRING_AGG(clr.RegexReplace(f.Token,'[^0-9]','',0),' ')
FROM clr.RegexSplit(s.string,@pattern,N'[0-9()+.-]',0) AS f
WHERE f.IsValid = 1
AND LEN(clr.RegexReplace(f.Token,'[^0-9]','',0)) = @srchLen
) AS regex(NewString);
PRINT CHAR(10)+'NGrams version Serial'+CHAR(10)+REPLICATE('-',90);
SELECT ngramsStuff.NewString
FROM #strings AS s
CROSS APPLY
(
SELECT STRING_AGG(ee.digitsOnly,' ')
FROM samd.patExtract8K(@string,@pattern) AS pe
CROSS APPLY samd.digitsOnlyEE(pe.item) AS ee
WHERE LEN(ee.digitsOnly) = @srchLen
) AS ngramsStuff(NewString)
OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;
GO
Test Results
Regex/CLR version Serial
------------------------------------------------------------------------------------------
SQL Server Execution Times: CPU time = 19918 ms, elapsed time = 12355 ms.
NGrams version Serial
------------------------------------------------------------------------------------------
SQL Server Execution Times: CPU time = 844 ms, elapsed time = 971 ms.
NGrams8k is very fast, does not require you to compile a new assembly, learn a new programming language, Enable CLR functions, etc... No issues with garbage collection. Even the CLR N-GRAMs function that ships with MDS/DQS can't touch NGrams8k for performance (see the comments under my article).
END OF UPDATE
First grab a copy of ngrams8k and use it to build PatExtract8k (DDL below at the bottom of this post.) Next a quick warm-up:
DECLARE
@string VARCHAR(8000) = 'Call me later at 222-3333 or tomorrow at 312.555.2222,
(313)555-6789, or at 1+800-555-4444 before noon. Thanks!',
@pattern VARCHAR(50) = '%[^0-9()+.-]%';
SELECT pe.itemNumber, pe.itemIndex, pe.itemLength, pe.item
FROM samd.patExtract8K(@string,@pattern) AS pe
WHERE pe.itemLength > 1;
Returns:
ItemNumber ItemIndex ItemLength Item
----------- ----------- ----------- ----------------
1 18 8 222-3333
2 42 12 312.555.2222
3 91 13 (313)555-6789
4 112 14 1+800-555-4444
Note that the function returns each item's ordinal number, its position in the string, its length and the item itself. The first three attributes can be leveraged for further processing, which brings us to your post. Note my comments:
-- First for some easily consumable sample data.
DECLARE @things TABLE (StringId INT IDENTITY, String VARCHAR(8000));
INSERT @things (String)
VALUES
('Call back from +79005346546, Conversation started<br>Phone: +79005346546<br>Called twice Came from google.com<br>IP: 77.106.46.202 the web page address is xxx.com utm_medium: cpc<br>utm_campaign: 32587871<br>utm_content: 5283041 79005346546 '),
('John Smith'),
('xxx@yyy.com'),
('John Smith 8 999 888 77 77');
DECLARE @SrchLen INT = 11;
SELECT
StringId = t.StringId,
ItemIndex = pe.itemIndex,
ItemLength = @SrchLen,
Item = i2.Item
FROM @things AS t
CROSS APPLY samd.patExtract8K(t.String,'[^0-9 ]') AS pe
CROSS APPLY (VALUES(PATINDEX('%'+REPLICATE('[0-9]',@SrchLen), pe.item))) AS i(Idx)
CROSS APPLY (VALUES(SUBSTRING(pe.Item,NULLIF(i.Idx,0),11))) AS ns(NewString)
CROSS APPLY (VALUES(ISNULL(ns.NewString, REPLACE(pe.item,' ','')))) AS i2(Item)
WHERE pe.itemLength >= @SrchLen;
Returns:
StringId ItemIndex ItemLength Item
----------- -------------------- ----------- -----------
1 17 11 79005346546
1 62 11 79005346546
1 221 11 79005346546
4 11 11 89998887777
Next we can keep the rows that have no match (via the LEFT JOIN) and concatenate each string's matches into a single value like this:
WITH t AS
(
SELECT i2.Item, t.StringId
FROM @things AS t
CROSS APPLY samd.patExtract8K(t.String,'[^0-9 ]') AS pe
CROSS APPLY (VALUES(PATINDEX('%'+REPLICATE('[0-9]',@SrchLen), pe.item))) AS i(Idx)
CROSS APPLY (VALUES(SUBSTRING(pe.Item,NULLIF(i.Idx,0),11))) AS ns(NewString)
CROSS APPLY (VALUES(ISNULL(ns.NewString, REPLACE(pe.item,' ','')))) AS i2(Item)
WHERE pe.itemLength >= @SrchLen
)
SELECT
StringId = t2.StringId,
NewString = ISNULL((
SELECT t.item+' '
FROM t
WHERE t.StringId = t2.StringId
FOR XML PATH('')),'')
FROM @things AS t2
LEFT JOIN t AS t1 ON t2.StringId = t1.StringId
GROUP BY t2.StringId;
Returns:
StringId NewString
--------- --------------------------------------
1 79005346546 79005346546 79005346546
2
3
4 89998887777
I wish I had a little more time for additional details but this took a little longer than planned. Any questions welcome.
Patextract:
CREATE FUNCTION samd.patExtract8K
(
@string VARCHAR(8000),
@pattern VARCHAR(50)
)
/*****************************************************************************************
[Description]:
This can be considered a T-SQL inline table valued function (iTVF) equivalent of
Microsoft's mdq.RegexExtract except that:
1. It includes each matching substring's position in the string
2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
instead of nvarchar(4000) for the pattern
3. The mask parameter is not required and therefore does not exist.
4. You have to specify what text we're searching for as an exclusion; e.g. for numeric
characters you should search for '[^0-9]' instead of '[0-9]'.
5. There is no parameter for naming a "capture group". Using the variable below, both
the following queries will return the same result:
DECLARE @string nvarchar(4000) = N'123 Main Street';
SELECT item FROM samd.patExtract8K(@string, '[^0-9]');
SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);
Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the
rows where [matched]=0. The key benefit is that it performs substantially better
because you are only returning the number of rows required instead of returning twice as
many rows then filtering out half of them. Furthermore, because we're
The following two sets of queries return the same result:
DECLARE @string varchar(100) = 'xx123xx555xx999';
BEGIN
-- QUERY #1
-- patExtract8K
SELECT ps.itemNumber, ps.item
FROM samd.patExtract8K(@string, '[^0-9]') ps;
-- patternSplitCM
SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item
FROM dbo.patternSplitCM(@string, '[^0-9]') ps
WHERE [matched] = 0;
-- QUERY #2
SELECT ps.itemNumber, ps.item
FROM samd.patExtract8K(@string, '[0-9]') ps;
SELECT itemNumber = row_number() over (order by itemNumber), item
FROM dbo.patternSplitCM(@string, '[0-9]')
WHERE [matched] = 0;
END;
[Compatibility]:
SQL Server 2008+
[Syntax]:
--===== Autonomous
SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
FROM samd.patExtract8K(@string,@pattern) pe;
--===== Against a table using APPLY
SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
FROM samd.SomeTable t
CROSS APPLY samd.patExtract8K(t.someString, @pattern) pe;
[Parameters]:
@string = varchar(8000); the input string
@pattern = varchar(50); pattern to search for
[Returns]:
itemNumber = bigint; the instance or ordinal position of the matched substring
itemIndex = bigint; the location of the matched substring inside the input string
itemLength = int; the length of the matched substring
item = varchar(8000); the returned text
[Developer Notes]:
1. Requires NGrams8k
2. patExtract8K does not return any rows on NULL or empty strings. Consider using
OUTER APPLY or append the function with the code below to force the function to return
a row on empty or NULL inputs:
UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;
3. patExtract8K is not case sensitive; use a case sensitive collation for
case-sensitive comparisons
4. patExtract8K is deterministic. For more about deterministic functions see:
https://msdn.microsoft.com/en-us/library/ms178091.aspx
5. patExtract8K performs substantially better with a parallel execution plan, often
2-3 times faster. For queries that leverage patextract8K that are not getting a
parallel execution plan you should consider performance testing using Traceflag 8649
in Development environments and Adam Machanic's make_parallel in production.
[Examples]:
--===== (1) Basic extract all groups of numbers:
WITH temp(id, txt) as
(
SELECT * FROM (values
(1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
(2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
(3, '&**OOOOO=+ + + // ==?76543// and today !!222222\\\tester{}))22222444 done'))t(x,xx)
)
SELECT
[temp.id] = t.id,
pe.itemNumber,
pe.itemIndex,
pe.itemLength,
pe.item
FROM temp AS t
CROSS APPLY samd.patExtract8K(t.txt, '[^0-9]') AS pe;
-----------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20170801 - Initial Development - Alan Burstein
Rev 01 - 20180619 - Complete re-write - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
itemIndex = f.position,
itemLength = itemLen.l,
item = SUBSTRING(f.token, 1, itemLen.l)
FROM
(
SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
FROM samd.NGrams8k(@string, 1) AS ng
WHERE PATINDEX(@pattern, ng.token) < --<< this token does NOT match the pattern
ABS(SIGN(ng.position-1)-1) + --<< are you the first row? OR
PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
) AS f(position, token)
CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);
GO
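To make the contract of patExtract8K easier to see, here is a small Python sketch of the same idea: return every run of characters that do not match an exclusion class, together with its ordinal number, 1-based position and length. It is not the author's implementation (which is set-based and built on NGrams8k); `pat_extract` is a hypothetical name and the loop is only meant to illustrate the output shape.

```python
import re
from typing import List, Tuple

def pat_extract(text: str, exclude: str) -> List[Tuple[int, int, int, str]]:
    """Return (item_number, item_index, item_length, item) for each run of
    characters that do NOT match the single-character class `exclude`.
    item_index is 1-based, as in T-SQL."""
    items: List[Tuple[int, int, int, str]] = []
    start = None  # 1-based start of the run currently being collected
    for pos, ch in enumerate(text, start=1):
        if re.fullmatch(exclude, ch):
            # An excluded character closes the current run, if there is one.
            if start is not None:
                items.append((len(items) + 1, start, pos - start, text[start - 1:pos - 1]))
                start = None
        elif start is None:
            start = pos
    if start is not None:
        # The text ended while a run was still open.
        items.append((len(items) + 1, start, len(text) - start + 1, text[start - 1:]))
    return items

for row in pat_extract("xx123xx555xx999", "[^0-9]"):
    print(row)
```

As in the T-SQL function, the pattern names what to exclude: passing `[^0-9]` keeps the digit runs.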

The following isn't a direct answer to the question but shows how it can be done in PostgreSQL, which has a mature regular expression replace function. I would expect the solution to be adaptable to SQL Server using some kind of CLR integration library, but I'm not experienced in that...
SQL
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(phoneNumber, '((([0-9])[ ()+-]*){10,11})([^0-9]|$)', '`\1¬','g'),
'(^|¬)[^`¬]*(`|$)', ',', 'g'),
'(^,|,$|[^0-9,])', '', 'g')
FROM tbl;
Online Demo
db-fiddle.uk demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b12d9f9779b686fd0c4aa84956595f70
Explanation
The innermost REGEXP_REPLACE locates groups of either 10 or 11 digits, each of which may have any number of space, bracket, plus or minus characters after it. The group must either be followed by a non-digit character or the end of the line. For each located group, a single ` is appended before the group of digits and a single ¬ is appended after. You might need to adjust these characters to something rarer - they shouldn't appear anywhere else in the text.
The middle REGEXP_REPLACE replaces each block of text that isn't between a pair of marker characters with a single comma.
The outermost REGEXP_REPLACE removes any commas at the start or end of the string and also removes anything that isn't a digit or comma.
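The three passes can be followed step by step outside PostgreSQL. The sketch below ports the same patterns to Python's `re` module, on the assumption that the two regex flavours treat these particular patterns the same way; `extract_phones` is a hypothetical name and the sample strings are shortened from the question.

```python
import re

def extract_phones(text: str) -> str:
    # Pass 1: wrap each group of 10-11 digits (with optional separators) in ` ... ¬ markers.
    marked = re.sub(r"((([0-9])[ ()+-]*){10,11})([^0-9]|$)", r"`\1¬", text)
    # Pass 2: replace every stretch of text outside a marker pair with a single comma.
    separated = re.sub(r"(^|¬)[^`¬]*(`|$)", ",", marked)
    # Pass 3: drop leading/trailing commas and anything that is not a digit or comma.
    return re.sub(r"(^,|,$|[^0-9,])", "", separated)

print(extract_phones("John Smith 8 999 888 77 77"))
print(extract_phones("Phone: +7(927)410-00-22 or 8 927 410 00 22."))
print(repr(extract_phones("John Smith")))
```

Multiple numbers come back comma-separated, and a string without a phone number yields an empty string.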

Filter IDs with just numbers excluding letters

So I have results that begin with 2 letters followed by 3 numbers, for example:
ID_Sample
AB001
BC003
AB100
BC400
How can I do a query that ignores the letters and just looks up the numbers to do a filter? For example:
WHERE ID_Sample >= 100
I tried using a "Replace" to get rid of known letters, but I figured there might be a better way. For example:
Select
Replace(id_sample,'AB','')
Choosing the 3 numerals on the right would work too.

For your sample data, you can just start at the third character and convert to a number:
where try_convert(int, stuff(ID_Sample, 1, 2, '')) >= 100
Or, if you know that the number is 3 characters:
where try_convert(int, right(ID_Sample, 3)) >= 100

+1 for Gordon's answer. This is a fun problem that you can solve using TRANSLATE if you're using SQL Server 2017+.
First, in case you've never used it, per BOL, TRANSLATE:
Returns the string provided as a first argument after some characters
specified in the second argument are translated into a destination set
of characters specified in the third argument.
This:
SELECT TRANSLATE('123AABBCC!!!','ABC','XYZ');
Returns: 123XXYYZZ!!!
Here's the solution using TRANSLATE:
-- Sample Data
DECLARE @t TABLE (ID_Sample CHAR(6))
INSERT @t (ID_Sample) VALUES ('AB001'),('BC003'),('AB100'),('BC400'),('CC555');
-- Solution
SELECT
ID_Sample = t.ID_Sample,
ID_Sample_Int = s.NewString
FROM @t AS t
CROSS JOIN (VALUES('ABCDEFGHIJKLMNOPQRSTUVWXYZ', REPLICATE(0,26))) AS f(S1,S2)
CROSS APPLY (VALUES(TRY_CAST(TRANSLATE(t.ID_Sample,f.S1,f.S2) AS INT))) AS s(NewString)
WHERE s.NewString >= 100;
Without the WHERE clause filter you get:
ID_Sample ID_Sample_Int
--------- -------------
AB001 1
BC003 3
AB100 100
BC400 400
CC555 555
... the WHERE clause filters out the first two rows.
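The mapping trick is not specific to T-SQL. Below is a minimal Python sketch of the same idea using `str.translate`, assuming IDs contain only uppercase letters and digits; `id_to_int` is a hypothetical helper name.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Map every uppercase letter to '0', mirroring TRANSLATE(ID_Sample, S1, S2).
TO_ZERO = str.maketrans(LETTERS, "0" * len(LETTERS))

def id_to_int(id_sample: str):
    """Return the numeric value of the ID, or None if it is not numeric after
    translation (the counterpart of TRY_CAST returning NULL)."""
    try:
        return int(id_sample.translate(TO_ZERO))
    except ValueError:
        return None

samples = ["AB001", "BC003", "AB100", "BC400", "CC555"]
kept = [s for s in samples if (n := id_to_int(s)) is not None and n >= 100]
print(kept)
```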

Check these methods- Unit test also done!
Declare @Table as table(ID_Sample varchar(20))
set nocount on
Insert into @Table (ID_Sample)
Values('AB001'),('BC003'),('AB100'),('BC400')
--substring_method
select * from @Table
where try_cast(substring(ID_Sample,3,3) as int) >= 100
--right_method
select * from @Table
where try_cast(right(ID_Sample,3) as int) >= 100
--stuff_method
select * from @Table
where try_cast(stuff(ID_Sample,1,2,'') as int) >= 100
--replace_method
select * from @Table
where try_cast(replace(ID_Sample,left(ID_Sample,2),'') as int) >= 100

Determine if zip code contains numbers only

I have a field called zip, type char(5), which contains zip codes like
12345
54321
ABCDE
I'd like to check with an sql statement if a zip code contains numbers only.
The following isn't working
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE '%'
It can't work because even '12345' is an "array" of characters (it is '%', right?).
I found out that the following is working:
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE ' %'
It has a space before %. Why is this working?

If you use SQL Server 2012 or up the following script should work.
DECLARE @t TABLE (Zip VARCHAR(10))
INSERT INTO @t VALUES ('12345')
INSERT INTO @t VALUES ('54321')
INSERT INTO @t VALUES ('ABCDE')
SELECT *
FROM @t AS t
WHERE TRY_CAST(Zip AS NUMERIC) IS NOT NULL

Using the answer from here to check whether all characters are digits:
SELECT col1,col2
FROM
(
SELECT col1,col2,
CASE
WHEN LENGTH(RTRIM(TRANSLATE(ZIP , '*', ' 0123456789'))) = 0
THEN 0 ELSE 1
END as IsAllDigit
FROM S1234.PERSON
) AS Z
WHERE IsAllDigit=0
DB2 does not have a regular expression facility like MySQL's REGEXP.

Use the ISNUMERIC function.
ISNUMERIC returns 1 if the value can be converted to a numeric type and 0 if it cannot. Note that this is broader than "digits only": in SQL Server, values such as '1e5' or '$12' also return 1.
EXAMPLE:
SELECT * FROM S1234.PERSON
WHERE ISNUMERIC(ZIP) = 1
Your statement doesn't check for numbers; it returns everything that doesn't start with a space.

Let's suppose your ZIP code is a US zip code, composed of 5 digits.
db2 "with val as (
select *
from S1234.PERSON t
where xmlcast(xmlquery('fn:matches(\$ZIP,''^\d{5}$'')') as integer) = 1
)
select * from val"
For more information about xQuery:fn:matches: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrfnmat.html

MySQL does not have a native ISNUMERIC() function. This would be pretty straightforward in Excel with the ISNUMBER() function, or in T-SQL with ISNUMERIC(), but neither works in MySQL, so after a little searching around I came across this solution:
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'
Effectively we're running a regular expression over the contents of the ZIP field; the ^ and $ anchors make sure the whole value consists of digits rather than merely containing one. It may seem like using a sledgehammer to crack a nut, and I've no idea how performance would differ from a simpler approach, but it worked and I guess that's the point.
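Whether the pattern is anchored matters a great deal for this check: an unanchored pattern such as '[0-9]' matches any value that merely contains a digit. A short Python sketch of the difference, using `re` as a stand-in for MySQL's REGEXP (the function names are made up for illustration):

```python
import re

def contains_digit(zip_code: str) -> bool:
    # Unanchored search: true as soon as any single digit appears.
    return re.search(r"[0-9]", zip_code) is not None

def is_all_digits(zip_code: str) -> bool:
    # Anchored match: every character must be a digit.
    return re.fullmatch(r"[0-9]+", zip_code) is not None

for value in ("12345", "A1CDE", "ABCDE"):
    print(value, contains_digit(value), is_all_digits(value))
```

Only the anchored form rejects a mixed value like A1CDE.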

I have made a more robust version based on the solution https://stackoverflow.com/a/36211270/565525, adding the intermediate result and some examples:
select
test_str
, TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789'))
, case when length(TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789')))=5 then '5-digit-zip' else 'not 5d-zip' end is_zip
from (VALUES
(' 123 ' )
,(' abc ' )
,(' a12 ' )
,(' 12 3 ')
,(' 99435 ')
,('99323' )
) AS X(test_str)
;
The result for this example set is:
TEST_STR 2        IS_ZIP
-------- -------- -----------
123      yyy      not 5d-zip
abc      abc      not 5d-zip
a12      ayy      not 5d-zip
12 3     yyxy     not 5d-zip
99435    yyyyy    5-digit-zip
99323    yyyyy    5-digit-zip

Try checking if there's a difference between lower case and upper case. Numerics and special chars will look the same:
SELECT *
FROM S1234.PERSON
WHERE UPPER(ZIP COLLATE Latin1_General_CS_AI ) = LOWER(ZIP COLLATE Latin1_General_CS_AI)

Here's a working example for the case where you'd want to check zip codes in a range. You could use this code for inspiration to make a simple single post code check, if you want:
if local_test_environment?
  # SQLite supports GLOB which is similar to LIKE (which it only has limited support for), for matching in strings.
  where("(zip_code NOT GLOB '*[^0-9]*' AND zip_code <> '') AND (CAST(zip_code AS int) >= :range_start AND CAST(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
else
  # SQLServer supports LIKE with more advanced matching in strings than what SQLite supports.
  # SQLServer supports TRY_PARSE which is non-standard SQL, but fixes the error SQLServer gives with CAST, namely: Conversion failed when converting the nvarchar value 'US-19803' to data type int.
  where("(zip_code NOT LIKE '%[^0-9]%' AND zip_code <> '') AND (TRY_PARSE(zip_code AS int) >= :range_start AND TRY_PARSE(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
end

Use a regex, anchored so that the whole value must consist of digits:
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'

Parse column value based on delimiters

Here is a sample of my data:
ABC*12345ABC
BCD*234()
CDE*3456789(&(&
DEF*4567A*B*C
Using SQL Server 2008 or SSIS, I need to parse this data and return the following result:
12345
234
3456789
4567
As you can see, the asterisk (*) is my first delimiter. The second "delimiter" (I use this term loosely) is where the sequence of numbers STOPS.
So, basically, just grab the sequence of numbers after the asterisk...
How can I accomplish this?
EDIT:
I made a mistake in my original post. An example of another possible value would be:
XWZ*A12345%$%
In this case, I would like to return the following:
A12345
The value can START with an alpha character, but it will always END with a number. So, grab everything after the asterisk, but stop at the last number in the sequence.
Any help with this will be greatly appreciated!

You could do this with a little patindex and charindex trickery, like:
; with YourTable(col1) as
(
select 'ABC*12345ABC'
union all select 'BCD*234()'
union all select 'CDE*3456789(&(&'
union all select 'DEF*4567A*B*C'
union all select 'XWZ*A12345%$%'
)
select left(AfterStar, len(Leader) + PATINDEX('%[^0-9]%', AfterLeader) - 1)
from (
select RIGHT(AfterStar, len(AfterStar) - PATINDEX('%[0-9]%', AfterStar) + 1)
as AfterLeader
, LEFT(AfterStar, PATINDEX('%[0-9]%', AfterStar) - 1) as Leader
, AfterStar
from (
select RIGHT(col1, len(col1) - CHARINDEX('*', col1)) as AfterStar
from YourTable
) as Sub1
) as Sub2
This prints:
12345
234
3456789
4567
A12345

If you ignore that this is in SQL then the first thing that comes to mind is Regex:
^.*\*(.*[0-9])[^0-9]*$
The capture group there should get what you want. I don't know if SQL has a regex function.
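The pattern can be checked against the question's sample values in any regex engine. Here is a short Python sketch; `after_star` is a hypothetical helper name, and the values are the ones from the question.

```python
import re

# Capture everything after an asterisk up to and including the last digit.
PATTERN = re.compile(r"^.*\*(.*[0-9])[^0-9]*$")

def after_star(value: str):
    """Return the captured group, or None when the value does not match."""
    match = PATTERN.match(value)
    return match.group(1) if match else None

samples = ["ABC*12345ABC", "BCD*234()", "CDE*3456789(&(&", "DEF*4567A*B*C", "XWZ*A12345%$%"]
for value in samples:
    print(value, "->", after_star(value))
```

Because the leading `.*` is greedy, the engine backtracks to the last asterisk that is still followed by a digit, which is why DEF*4567A*B*C yields 4567.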
