Understanding query with REGEX expressions - sql

So I'm looking at the documentation of something left behind by a former employee who has been gone for awhile. I understand Regex, I just don't know what's going on in this scenario.
TABLE_QUERY(server_logs_abc, 'REGEXP_MATCH(table_id, r"^def_[A-Za-z0-9]{5}_[\d]{8}") and datediff( current_timestamp(), timestamp( regexp_extract(table_id,r"(\d{8})$") ) ) < 30')
There are also daily tables that look like this
iserver_cogs_abc.def_4J389_20180221
iserver_cogs_abc.def_4J389_20180220
iserver_cogs_abc.def_4J389_20180219
iserver_cogs_abc.def_4J389_20180218
And so on chronologically.
So I understand the two regex expressions. The first for REGEXP_MATCH is the naming convention:
def_4J389_2018XXXX
(XXXX representing the month and day timecode i.e 0221)
And the second Regex in REGEXP_EXTRACT is the 8-digit time code (i.e 20180221)
But what does it all mean when put together. Also, what does that leading rmean that proceeds the two Regex portions (i.e r"^def_[A-Za-z0-9]{5}_[\d]{8}"

Second regex extracts YYYYMMDD portion of table name which then is translated into timestamp and finally gets compared with current timestamp and DATEDIFF then returns number of the days between those two timestamps
So together they return return only last 30 days tables that match pattern from first regex
As of r - when used as prefix to string - it makes it so called raw string that is used widely in regular expression. for example - You can escape "\", "_", or "%" using two backslashes. For example, "\%". If you are using raw strings, only a single backslash is required. For example, r"\%"
I recommend you to google for raw / literal string to get more details on this

Related

How to extract digits from field using regex

I am using Firebird 2.5 and I have a field (called identifier) with mixed letters, numbers and special characters. I would like to use regex to extract only the numbers in a new column. I have tried something like below, but it is not working.
Any idea how I can achieve this using regex without using stored procedures or execute block
SELECT ORDER_ID,
ORDER_DATE,
SUBSTRING(IDENTIFIER FROM 1 TO 10) SIMILAR TO '^[0-9]{10}$' --- DESIRED EXTRACTION COLUMN
FROM ORDERS
Example of data
IDENTIFIER DESIRED OUTPUT
ANDRE 02869567995 02869567995
02869567995 MARIA 02869567995
028.695.67.995 02869567995
028695679-95 02869567995
You cannot do this in Firebird 2.5, at least not without help from a UDF, or a (selectable) stored procedure. I'm not aware of third-party UDFs providing regular expressions, so you might have to write this yourself.
In Firebird 3.0, you could also use a UDR or stored function to achieve this. Unfortunately, using the regular expression functionality available in Firebird alone will not be enough to solve this.
NOTE: The rest of the answer is based on the assumption to extract digits if the first 10 characters of string are digits. With the updated question, this assumption is no longer valid.
That said, if your need is exactly as shown in your question, that is only extract the first 10 characters from a string if they are all digits, then you could use:
case
when IDENTIFIER similar to '[[:DIGIT:]]{10}%'
then substring(IDENTIFIER from 1 for 10)
end
(as an aside, the positional SUBSTRING syntax is from <start> for <length>, not from <start> to <end>)
In Firebird 3.0 and higher, you can use SUBSTRING(... SIMILAR ...) with a SQL regular expression pattern. Assuming you want to extract 10 digits from the start of a string, you can do:
substring(IDENTIFIER similar '#"[[:DIGIT:]]{10}#"%' escape '#')
The #" delimits the pattern to extract (where # is a custom escape character as specified in the ESCAPE clause). The remainder of the pattern must match the rest of the string, hence the use of % here (in other cases, you may need to specify a pattern before the first #" as well.
See this dbfiddle for an example.
It is not possible in any version of Firebird.

Regular expression metacharacter in SQL yields different results in Oracle vs. Postgres

I'm trying to convert some queries from an Oracle environment to Postgres. This is a simplified version of one of the queries:
SELECT * FROM TABLE
WHERE REGEXP_LIKE(TO_CHAR(LINK_ID),'\D')
I believe the equivalent postgreSQL should be this:
SELECT * FROM TABLE
WHERE CAST(LINK_ID AS TEXT) ~ '\D'
But when I run these queries in their respective environments on the exact same dataset, the first query outputs no records (which is correct) and the second query outputs all records in the table. I didn't write the original code, but as I understand it, it's looking for any values in the numeric field LINK_ID that are non-digit characters. Is the \D metacharacter supposed to behave differently in Oracle vs. postgres? I'm not seeing anything in documentation to say they should.
The documentation for Oracle's TO_CHAR(number) states
If you omit fmt, then n is converted to a VARCHAR2 value exactly long enough to hold its significant digits.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions181.htm
This means that the only non-numeric character which might be produced is a negative sign or a decimal point. If the number is positive and has no fractional part, it will not match the regular expression \D.
On the other hand, on PostgreSQL CAST(numeric(38,8)as TEXT) returns a value with the number of decimal places specified by the type specification, in this case 8.
E.g.:
cast( cast(12341234 as numeric(38,8)) as TEXT)
Generates 12341234.00000000 The result of such a cast will always contain a decimal point and therefore will always match the regular expression \D.
You may find that replacing it with this solves your problem:
(LINK_ID % 1) <> 0.0
Alternatively, If you need to use the regex (e.g. to simplify migration work), consider changing it to '\.0*[1-9]' i.e. to find a decimal point with any nonzero digit after it.

select month in date with substring and cast in postgresql

I've been cracking my head with this excercise, the professor want us to select only the months from a date type in postgresql, but using only substring and cast.
I've tried several ways, but none of them worked. This is the latter sentence that i have:
select substring(cast(fac.fecha as varchar(10)), '_____[0,9]___' ) from facturas fac
this sentence returns this:
sentence result
[0,9] is regex for "a single 0 or nine or colon".
You probably want [0-9]{2}, i.e. regex for "2 digits".
On the other hand, all implementations (those I know of) of substr() (different spelling, maybe you have a more powerful engine, which supports regexes...) use indexes into the string as parameters, nothing wiht regexes or even the single letter wildcards _.
So I recommend to study the documentation of your substring().

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
DB Name and table can occur anywhere in the result. Since .(dot) is joining them that's why i want to split with .(dot).
Basically I need sql which can result me surrounding values of .(dot)
I want DBName and Tablename in result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
The regex looks for and returns the words either side of and including the period, when preceded by a space, and followed by an optional comma or the end of the line.
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'

Regular expression filter

I have this regular expression in my sql query
DECLARE #RETURN_VALUE VARCHAR(MAX)
IF #value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET #RETURN_VALUE = NULL
END
I am not sure, but whenever I have this in my row 12 TEST then it gives me the value of 12, but if I have three digit number then it filters out the three digit numbers.How can I modify the regular expression to return me the three digits numbers too.
any help will be appreciated.
SQL doesn't have regular expressions: it has SQL wildcard expressions. They are much simpler than regular expressions and long predate regular expressions. For instance, there is no way to specify alternation (a|b) or repetition ( a*, a+, a?, a{m,n} ) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.
Have you tried ([0-9]*). I believe that this will capture every digit for you. However, I am not as strong at regex. When I ran this through rubular, it worked, though :) BTW, rubular is a great way to test out regular expressions
You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).
First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.