Understanding REGEX expression with groups in SQL query - sql

I am currently debugging an old script and trying to understand a regex in an SQL query.
What would be the result of this search?
[...] REGEXP '(^| |"|\\()ORDER(-| )[[:digit:]]{3}';

Here is a part by part explanation:
(^| |"|\\(): either the beginning of the string (^), or a space, or a double quote, or an opening parenthese (this character needs to be escaped because it is meaningful in regexes)
ORDER: the word "ORDER"
(-| ): either a dash or a space
[[:digit:]]{3}: a sequence of 3 consecutive digits (between 0 and 9)

Related

Remove special characters and alphabets from a string except number in sql query in db2

Hi I tried using Regex_replace and it is still not working.
select CASE WHEN sbbb <> ' ' THEN regexp_replace(sbbb,'[a-zA-Z _-#]','']
ELSE sbbb
AS ABCDF
from Table where sccc=1;
This is the query which I am using to remove alphabets and specials characters from string and have only numbers. but it doesnot work. Query returns me the complete string with numbers,characters and special characters .What is wrong in the above query
I am working on a sql query. There is a column in database which contains characters,special characters and numbers. I want to only keep the numbers and remove all the special characters and alphabets. How can I do it in query of DB2. If a use PATINDEX it is not working. please help here.
The allowed regular expression patterns are listed on this page
Regular expression control characters
Outside of a set, the following must be preceded with a backslash to be treated as a literal
* ? + [ ( ) { } ^ $ | \ . /
Inside a set, the follow must be preceded with a backslash to be treated as a literal
Characters that must be quoted to be treated as literals are [ ] \
Characters that might need to be quoted, depending on the context are - &
So for you, this should work
regexp_replace(sbbb,'[a-zA-Z _\-#]','')

Regex to find period in square brackets

I am parsing some SQL statements and have found places where the SELECT statement may be:
SELECT [tblCustomer].[FirstName], [tblCustomer.LastName], [tblOrder].[Order_No]
and as you see the second column has a . inside the square brackets. This is acceptable in Access SQL but not SQL Server. I'm trying to build a RegEx to identify when there is a . inside square brackets and replace it with ].[
I've tried: \[.+?\](?![\.]) which will get me a period inside square brackets but it doesn't stop searching when it finds the closing bracket.
I'm using ECMAScript to be compatible with VBA and I don't have concerns about nested brackets.
Example: https://regex101.com/r/Inxhdg/1/
You can use
Search for: (\[\w+)\.(?=\w+])
Replace with: $1].[
See the regex demo. Details:
(\[\w+) - Group 1 ($1): [ and then any one or more letters, digits, or underscores
\. - a dot
(?=\w+]) - a positive lookahead that requires one or more letters, digits or underscores and then a ] char immediately to the right of the current location.

sql regexp string end with ".0"

I want to judge if a positive number string is end with ".0", so I wrote the following sql:
select '12310' REGEXP '^[0-9]*\.0$'. The result is true however. I wonder why I got the result, since I use "\" before "." to escape.
So I write another one as select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
Dot (.) in regexp has special meaning (any character) and requires escaping if you want literally dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use double-slash to escape special characters in Hive. slash has special meaning and used for characters like \073 (semicolon), \n (newline), \t (tab), etc. This is why for escaping you need to use double-slash. Also for character class digit use \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also characters inside square brackets do not need double-slash escaping: [.] can be used instead of \\.
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.
As for your question . is a wildcard in regular expressions. It matches any character.

REGEXP_REPLACE explanation

Hi may i know what does the below query means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function function call is doing:
It is a function call to analyse an input string 'number' with a regex (2nd argument) and replace any parts of the string which match a specific string. As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless sql uses different syntax this would appear to be a non-functional regex. There are some red flags, e.g:
The entire regex is wrapped in square parenthesis, indicating a set of characters but seems to predominantly hold an expression
There is a range indicator between a single quote and a character (invalid range: if a dash was required in the match it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but does not match anything I can think of, it is important to know what string is being examined/what the context is for the program in order to identify what the regex might be attempting to do
It seems like it is meant to replaces all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, hence all characters match, that are not in the class.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seams like characters enclosed in ' should be escaped (The second ' is to escape the ' in the string itself.) At least this would make some sense. (But for none of the DBMS I found having a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL), I found something in the docs, that would indicate this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!)
Now if you take an ASCII table you'll see, that the ranges in the expression make up all printable characters (counting space as printable) in groups from space to /, 0 to 9, : to #, etc.. Actually it might have been shorter to express it as '' ''-~, space to ~.
Given the negation, all these don't match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:
SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)
Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.
I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:
'other stuff 1000 SF'
'other stuff 1000SF'
I am trying to use the REGEXP_INSTR function to find the starting position. Through google, I have found the pattern for the first to be
'([0-9])+ SF'
When I try the pattern for the second, I try
'([0-9])+SF'
and I get the error
SELECT Failed. [2662] SUBSTR: string subscript out of bounds
I've also found an answer to a similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.
The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.
Try this:
RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')
Find multiple digits followed by a single optional blank followed by SF and extract those digits.
I would pattern it like this:
\b(\d+)\s*[Ss][Ff]\b
\b # word boundary
(\d+) # 1 or more digits (captured)
\s* # 0 or more white-space characters
[Ss] # character class
[Ff] # character class
\b # word boundary
Demo