Using 'LIKE' and 'REGEXP' in a SQL query

I'm trying to use a regex in a query whose WHERE clause has two conditions. The pattern I want to capture is 106, followed by any digit, followed by a digit that must be either 3 or 4, i.e. 106[0-9][3-4].
First, I tried this:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID REGEXP '106[0-9][3-4]')
This produced the following error, and I'd like to know why:
Msg 102, Level 15, State 1, Line 16
Incorrect syntax near 'REGEXP'.
Next, I tried this, which runs, but I'm not sure whether it does what I want:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID LIKE '106[0-9][3-4]')
Would this do as I described above?

You specify this:
The pattern I want to capture is 106 followed by any digit followed by a digit that must be either 3 or 4, i.e. 106[0-9][3-4]
And then you give an example using a regular expression:
WHERE ForestID REGEXP '106[0-9][3-4]'
Regular expressions match a pattern anywhere inside a string unless the pattern is anchored. So this will match '10603', but it will also match 'abc10694 def'. This is true of regular expressions in general, not merely of one database's implementation of them.
If this is the behavior you want, then the corresponding LIKE (in SQL Server) is:
WHERE ForestID LIKE '%106[0-9][3-4]%'
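If you only want 5-digit values, which is what your LIKE '106[0-9][3-4]' already matches (a LIKE pattern must match the entire value), then the corresponding regular expression is: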
WHERE ForestID REGEXP '^106[0-9][3-4]$'
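To see the difference between the pattern styles side by side, here is a small Python sketch. It uses Python's `re` module as a stand-in, not SQL Server's matcher, and the helper `tsql_like_to_regex` is hypothetical: it translates only the LIKE wildcards `%`, `_` and `[...]`, ignoring ESCAPE, `[^...]` and collation/case rules. Because LIKE must cover the whole value, the translated pattern is checked with `re.fullmatch`, while a bare regular expression is checked with `re.search`.

```python
import re

def tsql_like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a simple T-SQL LIKE pattern to a regex.

    Handles %, _ and [...] classes only; ESCAPE and collation are ignored.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            out.append(".*")              # any run of characters, possibly empty
        elif ch == "_":
            out.append(".")               # exactly one character
        elif ch == "[":
            end = pattern.index("]", i)   # copy a class such as [0-9] unchanged
            out.append(pattern[i:end + 1])
            i = end
        else:
            out.append(re.escape(ch))     # literal character
        i += 1
    return "".join(out)

def like_matches(value: str, pattern: str) -> bool:
    # LIKE succeeds only if the pattern covers the entire value.
    return re.fullmatch(tsql_like_to_regex(pattern), value, re.DOTALL) is not None

def regexp_matches(value: str, pattern: str) -> bool:
    # An unanchored regular expression may match anywhere in the value.
    return re.search(pattern, value) is not None

for v in ["10603", "10694", "abc10694 def", "10605", "106034"]:
    print(v,
          like_matches(v, "106[0-9][3-4]"),
          like_matches(v, "%106[0-9][3-4]%"),
          regexp_matches(v, "106[0-9][3-4]"),
          regexp_matches(v, "^106[0-9][3-4]$"))
```

The first and last columns agree for every value, as do the second and third, which is the correspondence described above.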

You do not need to interact with managed code, as you can use LIKE:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID LIKE '106[0-9][3-4]')
To be clear: T-SQL has no REGEXP operator (hence the syntax error), and SQL Server doesn't support regular expressions without managed code such as a CLR function. Depending on the situation, the LIKE operator can be an option, but it lacks the flexibility that regular expressions provide.
If you would like to have full regular expression functionality, try this.

Try the following:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID LIKE '%106_3%' OR ForestID LIKE '%106_4%')
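One caveat with this version: in LIKE, `_` stands for any single character, not only a digit, so it is looser than 106[0-9][3-4]. A quick Python check, using `re` as a stand-in for the LIKE matcher (`%` written as `.*`, `_` as `.`, and `re.fullmatch` because LIKE must cover the whole value), shows a value like '106x3' passing the `_` patterns but not the digit class:

```python
import re

# Regex equivalents of the LIKE patterns; LIKE must match the whole value.
underscore_patterns = [r".*106.3.*", r".*106.4.*"]   # '%106_3%' OR '%106_4%'
digit_pattern = r".*106[0-9][3-4].*"                  # '%106[0-9][3-4]%'

def matches_underscore(value: str) -> bool:
    return any(re.fullmatch(p, value) for p in underscore_patterns)

def matches_digit(value: str) -> bool:
    return re.fullmatch(digit_pattern, value) is not None

for value in ["10603", "106x3", "A10694B"]:
    print(value, matches_underscore(value), matches_digit(value))
```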

Related

Extract number from a URL in Redshift

I would like to extract an ID (a number) from a bunch of URLs in Redshift. I know I can use regexp_substr for this purpose, but my knowledge of regular expressions is weak. Here are a couple of example URLs:
/checkout?feature=ADVANCED_SEARCH&upgradeRedirect=%2Fmentions%3Ftop_ids%3D1222874068&btv=feature_ADVANCED_SEARCH
/checkout?feature=ADVANCED_SEARCH&trigger=mentioning-author-rw&upgradeRedirect=%2Fmentions%3Ftop_ids%3D160447990
After parsing the above URLs, I would like the output to be:
1222874068
160447990
Note that the parameter top_ids is always present, so it can be used to split the URL.
I also tried several versions of split_part, but variations in the URL might break them, so a regular expression may be a better idea.
Any help would be greatly appreciated.
You can use:
select regexp_substr(column,'top_ids%3D([0-9]*)', 1, 1, 'e')
The 'e' parameter returns the substring matched by the first parenthesized subexpression (the capture group) instead of the whole match.
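To try the pattern outside Redshift, this Python sketch mirrors what the `'e'` parameter does: it returns the text captured by the first group rather than the whole match. It uses `re.search(...).group(1)`; returning `None` when `top_ids` is absent is this sketch's own choice, not a claim about Redshift's return value.

```python
import re
from typing import Optional

# Same pattern as the Redshift query: group 1 is the run of digits after the
# URL-encoded "top_ids=" ("%3D" is an encoded '=').
TOP_IDS = re.compile(r"top_ids%3D([0-9]*)")

def extract_top_id(url: str) -> Optional[str]:
    m = TOP_IDS.search(url)
    # Like the 'e' parameter: return the first capture group, not the whole match.
    return m.group(1) if m else None

urls = [
    "/checkout?feature=ADVANCED_SEARCH&upgradeRedirect=%2Fmentions%3Ftop_ids%3D1222874068&btv=feature_ADVANCED_SEARCH",
    "/checkout?feature=ADVANCED_SEARCH&trigger=mentioning-author-rw&upgradeRedirect=%2Fmentions%3Ftop_ids%3D160447990",
]
for url in urls:
    print(extract_top_id(url))
```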
Try something like this:
SUBSTR(REGEXP_SUBSTR(yourcolumn, 'top_ids%3D([0-9]{2,})'), 11, 20)
This looks for 'top_ids%3D' followed by two or more digits, then drops the first 10 characters (the length of 'top_ids%3D') and keeps up to the next 20.
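The offset arithmetic can be checked quickly: 'top_ids%3D' is 10 characters long, so the ID starts at SQL position 11 (1-based), which is index 10 in Python's 0-based slicing. This is a Python sketch of the same two steps, not Redshift itself: `group(0)` plays the role of REGEXP_SUBSTR without the 'e' parameter (the whole match), and the slice plays the role of SUBSTR.

```python
import re

# 'top_ids%3D' is 10 characters, so in 1-based SQL positions the ID starts at 11.
PREFIX_LEN = len("top_ids%3D")

def extract_by_offset(url):
    # The whole match, e.g. "top_ids%3D160447990"; the group is not used here.
    m = re.search(r"top_ids%3D([0-9]{2,})", url)
    if m is None:
        return None
    whole = m.group(0)
    # SUBSTR(s, 11, 20) corresponds to s[10:30] (0-based, end-exclusive).
    return whole[PREFIX_LEN:PREFIX_LEN + 20]

url = "/checkout?feature=ADVANCED_SEARCH&trigger=mentioning-author-rw&upgradeRedirect=%2Fmentions%3Ftop_ids%3D160447990"
print(extract_by_offset(url))
```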

Regex match first number if it does not appear at the end

I'm facing a regex problem that I can't find an answer to.
My regex is embedded in a Teradata SQL query of the form:
REGEXP_SUBSTR(column, 'regex_pattern')
I want to find the first appearance of any number except if it appears at the end of the string.
For example:
"YEL2X30" -> "2"
"YEL19XYZ05" -> "19"
"YELLOW05" -> ""
I tried '[0-9]+(?!$)/', but it always returns a blank string.
Thanks in Advance!
This is a shot in the dark, since I'm unfamiliar with Teradata and the SQL functionality it supports. However, reading the docs on the REGEXP_SUBSTR() function, it seems you may want to use the optional 3rd and 4th arguments along with a slightly different regular expression:
[0-9]+(?![0-9]|$)
That is: one or more digits followed neither by another digit nor by the end of the string. The digit inside the lookahead matters: without it, the engine can backtrack and return part of a trailing number.
I believe the following syntax should then retrieve the first such number:
REGEXP_SUBSTR(column, '[0-9]+(?![0-9]|$)', 1, 1)
The 3rd parameter is the position in the source string at which to start searching, and the 4th selects which occurrence to return when there are several matches (that is how I read the docs). For example, abc123def456ghi789 would return 123.
Experimenting in an online IDE gave me this:
CREATE TABLE TBL (TST varchar(100));
INSERT INTO TBL values ('YEL2X30'), ('YEL19XYZ05'), ('YELLOW05'), ('abc123def456ghi789');
SELECT REGEXP_SUBSTR(TST, '[0-9]+(?![0-9]|$)', 1, 1) as 'RESULTS' FROM TBL;
Resulted in:
RESULTS
2
19
NULL
123
NOTE: leaving out the 3rd and 4th parameters made no difference, since both default to 1 when omitted. I tested this over here.
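Python's `re` module also supports negative lookahead, so the pattern and the failed attempt from the question can be checked locally. This is only a stand-in for Teradata's engine: `re.search` returns the leftmost match (what occurrence 1 asks for), and `None` plays the role of SQL NULL.

```python
import re

# One or more digits followed neither by another digit nor by the end of the string.
PATTERN = re.compile(r"[0-9]+(?![0-9]|$)")

def first_number_not_at_end(s):
    m = PATTERN.search(s)               # leftmost match, i.e. occurrence 1
    return m.group(0) if m else None    # None stands in for SQL NULL

for s in ["YEL2X30", "YEL19XYZ05", "YELLOW05", "abc123def456ghi789"]:
    print(s, first_number_not_at_end(s))

# The question's attempt: the trailing '/' is a literal slash, so nothing matches.
print(re.search(r"[0-9]+(?!$)/", "YEL2X30"))
# Without [0-9] in the lookahead, backtracking returns part of the final number.
print(re.search(r"[0-9]+(?!$)", "YELLOW05").group(0))
```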
Possibly the simplest way is to look for digits followed by a non-digit, and then keep only the digits:
regexp_substr(regexp_substr(column, '[0-9]+[^0-9]'), '[0-9]+')
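The two nested calls can be mirrored step by step in Python (a sketch using `re.search` in place of REGEXP_SUBSTR, with `None` standing in for NULL; the function name is made up for illustration). The inner search finds digits plus the non-digit that follows them, and the outer search strips that extra character:

```python
import re

def first_number_followed_by_non_digit(s):
    # Inner step: digits followed by one non-digit, e.g. "2X" or "19X".
    inner = re.search(r"[0-9]+[^0-9]", s)
    if inner is None:
        return None    # no digits are followed by a non-digit
    # Outer step: keep only the digits from that match.
    return re.search(r"[0-9]+", inner.group(0)).group(0)

for s in ["YEL2X30", "YEL19XYZ05", "YELLOW05", "abc123def456ghi789"]:
    print(s, first_number_followed_by_non_digit(s))
```

No lookahead is needed here, which may help if an engine's regex flavor does not support it.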

Find a specific number in a string

I'm using an IF THEN statement to determine whether a string contains a specific number. For example, the string is 1, 9, 13. I'm trying to isolate strings that contain the single number "3". However, when I use Like "3", or Like "3", I also get back results that contain 13. How do I use wildcards to do this?
If your string is just a list of numbers in the form you have shown ...
"n1, n2, n3, n4"
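... then you can use VBA's Like operator as follows, after padding the string with a leading space and a trailing comma so that every number, including the first and last, is preceded by a space and followed by a comma:
Debug.Print " 1, 7, 5, 13," Like "*[ ]3,*" 'matches " 3," but not "13,", so this prints False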
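The same padding idea, sketched in Python rather than VBA: a plain substring test stands in for `Like "*[ ]3,*"`, and the helper name `contains_number` is hypothetical. It assumes the list uses the ", " separator shown above.

```python
def contains_number(csv_list: str, number: int) -> bool:
    # Pad so every entry, including the first and last, is preceded by a
    # space and followed by a comma: "1, 9, 13" -> " 1, 9, 13,".
    padded = " " + csv_list + ","
    # Equivalent of Like "*[ ]3,*": look for " 3," anywhere in the padded text.
    return f" {number}," in padded

for s in ["1, 9, 13", "1, 3, 13", "13, 3"]:
    print(s, contains_number(s, 3))
```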
If your string is arbitrary, then this is really a regular expression question, and you'll need to decide whether that is a route you want to go down. Something like the following works on the Regex Tester Page:
/\b3\b/g
There's a useful VBA Regex Regular Expressions Guide which shows you how to use syntax like this, including how to set a reference to the additional library that you need for it to work.
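The word-boundary pattern behaves the same way in Python's `re` module, which makes it easy to try before wiring up the VBA library. This sketch drops the `g` flag, since only the existence of a match matters here:

```python
import re

STANDALONE_3 = re.compile(r"\b3\b")   # same pattern as /\b3\b/, without the g flag

def has_standalone_3(text: str) -> bool:
    return STANDALONE_3.search(text) is not None

for text in ["1, 9, 13", "1, 3, 13", "item 3 of 30", "x3y"]:
    print(text, has_standalone_3(text))
```

`\b` treats letters, digits and underscore as word characters, so the 3 in "13" or "x3y" is not a standalone match, while "3.5" or "-3" would count as one.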

How do SQL/SQLite wildcards work? LIKE operator

How do wildcards work in SQLite, i.e. how does the LIKE operator match?
For example, let's say:
1: LIKE('s%s%', 's12s12')
2: LIKE('%sk%', 'asdaska')
In the 1st example, what does the % after the first s match, and how does the matcher decide whether to keep extending the % or to match the next s?
In the 2nd example, if the s in the pattern matched the first s in the string (the one followed by d), the comparison would fail and FALSE would be returned.
Both examples return TRUE. From my programming knowledge, I guess that LIKE behaves like a recursive function: when two possibilities appear, it calls itself with both sets of parameters and ORs the results, so as soon as one call returns true, the caller returns true. If so, the LIKE operator would be quite slow on large databases.
P.S. There is also the '_' wildcard, which matches exactly one character.
I couldn't find any detailed documentation of the LIKE operator.
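SQLite's built-in `like(X, Y)` function takes the pattern as X and the string as Y (it implements `Y LIKE X`), so argument order matters. A quick check with Python's standard `sqlite3` module, writing both examples with the pattern first and adding one call with the order reversed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# like(pattern, string) is equivalent to: string LIKE pattern
row = conn.execute(
    "SELECT like('s%s%', 's12s12'), like('%sk%', 'asdaska'), like('asdaska', '%sk%')"
).fetchone()
print(row)
conn.close()
```

The last call treats 'asdaska' as the pattern and '%sk%' as the text, so it does not match.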
% matches zero or more characters, _ matches exactly one.
Your first pattern 's%s%' would match 'ss', 's1s', 's1111s', 'ss1111', and so on.
However if you wrote 's_s_' it would match 's1s1', but none of the above.
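The recursion guess in the question is close to how a naive matcher can work. Below is a minimal backtracking sketch in Python, an illustration only and not SQLite's actual implementation (case-insensitivity and ESCAPE are ignored). At each `%` it first lets `%` match nothing and, if that branch fails, lets it swallow one more character, returning True as soon as any branch succeeds. Memoizing on the two positions with `functools.lru_cache` bounds this sketch's work to about len(pattern) × len(text) states, instead of the blow-up unmemoized recursion can hit with many `%`.

```python
from functools import lru_cache

def like_backtracking(text: str, pattern: str) -> bool:
    """Case-sensitive LIKE with % and _ only (no ESCAPE); a teaching sketch."""

    @lru_cache(maxsize=None)
    def match(p: int, t: int) -> bool:
        if p == len(pattern):
            return t == len(text)           # pattern used up: text must be too
        ch = pattern[p]
        if ch == "%":
            # Branch 1: % matches nothing, so move on in the pattern.
            # Branch 2: % swallows one more character of the text.
            return match(p + 1, t) or (t < len(text) and match(p, t + 1))
        if t == len(text):
            return False                    # text exhausted, pattern not
        if ch == "_" or ch == text[t]:
            return match(p + 1, t + 1)      # _ or a literal consumes one char
        return False

    return match(0, 0)

examples = [("s12s12", "s%s%"), ("asdaska", "%sk%"),
            ("ss1111", "s%s%"), ("s1s1", "s_s_"), ("s1111s", "s_s_")]
for text, pattern in examples:
    print(repr(text), "LIKE", repr(pattern), "->", like_backtracking(text, pattern))
```

For 'asdaska' LIKE '%sk%', the attempt at the first s fails on 'd', so the leading `%` takes more characters until 'sk' lines up at the second s.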

SQL to return results for the following regex

I have the following regular expression:
WHERE A.srvc_call_id = '40750564' AND REGEXP_LIKE (A.SRVC_CALL_DN, '[^TEST]')
The row that contains 40750564 has "TEST CALL" in the column SRVC_CALL_DN, and REGEXP_LIKE doesn't seem to filter it out: whenever I run the query, it returns the row when it shouldn't.
Is my regex pattern wrong? Or does SQL not accept [^whatever]?
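Outside square brackets, the caret anchors the expression to the start of a string; inside them, as in [^TEST], it negates the list. So by enclosing the letters T, E, S and T in square brackets you're searching, as barsju points out, for a single character rather than for the string TEST: specifically, any character that is not T, E or S. Because 'TEST CALL' contains such characters (the space, C, A, L), REGEXP_LIKE returns true and the row is kept.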
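Bracket expressions like [^TEST] behave the same way in Python's `re` module, so the explanation can be checked quickly (a Python stand-in, not Oracle's REGEXP_LIKE):

```python
import re

value = "TEST CALL"
m = re.search(r"[^TEST]", value)
# [^TEST] is one character that is not T, E or S; the space at index 4 qualifies.
print(repr(m.group(0)), m.start())
# Only a value made entirely of T, E and S fails to match.
print(re.search(r"[^TEST]", "TEST") is None)
```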
You say that SRVC_CALL_DN contains the string 'TEST CALL', but you don't say where in the string. You also say that you're looking for rows where this string doesn't match, which implies that you want NOT REGEXP_LIKE(...).
Putting all this together I think you need:
AND NOT REGEXP_LIKE (A.SRVC_CALL_DN, '^TEST[[:space:]]CALL')
This excludes every row where the value starts with 'TEST CALL'. However, if the string may appear at any position in the column, you need to remove the caret (^).
This also assumes that the string is always in upper case. If it's in mixed case or lower, then you need to change it again. Something like the following:
AND NOT REGEXP_LIKE (upper(A.SRVC_CALL_DN), '^TEST[[:space:]]CALL')
Upper-casing SRVC_CALL_DN ensures that you always match regardless of case, but it also means your query may not be able to use an index on this column. I wouldn't worry about this particular point, as regular expression queries tend to be poor at using indexes anyway, and it appears that SRVC_CALL_ID is indexed.
Also, if the value may not include 'CALL', you will have to remove that part of the pattern. When using regular expressions, it is best to make the pattern as explicit as possible, so include 'CALL' if you can.
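A Python sketch of the suggested filter, run over made-up values (only 'TEST CALL' comes from the question). Python's `re` has no POSIX `[[:space:]]` class, so `\s` stands in for it, and `str.upper()` mirrors `upper()`:

```python
import re

# Hypothetical sample values; only "TEST CALL" comes from the question.
values = ["TEST CALL", "test call from lab", "CUSTOMER CALL", "CALL: TEST CALL"]

ANCHORED = re.compile(r"^TEST\sCALL")   # \s stands in for [[:space:]]

def keep(dn: str) -> bool:
    # Mirrors: NOT REGEXP_LIKE(upper(dn), '^TEST[[:space:]]CALL')
    return ANCHORED.search(dn.upper()) is None

kept = [v for v in values if keep(v)]
print(kept)
```

The last value survives because the caret only rejects 'TEST CALL' at the start of the string; dropping the caret would exclude it too.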
Try with '^TEST' or '^TEST.*'
Your regexp matches any string that contains at least one character other than T, E and S.
But your case is simple: the value starts with TEST. Why not use a plain LIKE (or NOT LIKE, to exclude those rows):
LIKE 'TEST%'