select month in date with substring and cast in postgresql - sql

I've been racking my brain over this exercise: the professor wants us to select only the month from a date column in PostgreSQL, using only substring and cast.
I've tried several ways, but none of them worked. This is the latest statement I have:
select substring(cast(fac.fecha as varchar(10)), '_____[0,9]___' ) from facturas fac
This statement returns the following (the result screenshot is not included here):

[0,9] is regex for "a single 0, 9, or comma".
You probably want [0-9]{2}, i.e. regex for "2 digits".
On the other hand, all implementations of substr() that I know of (different spelling; maybe you have a more powerful engine that supports regexes...) take positions in the string as parameters, with nothing to do with regexes or even the single-character wildcard _.
So I recommend studying the documentation of your substring().
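To make the difference between the two character classes concrete, here is a minimal sketch in Python's re module (not PostgreSQL itself). The sample value and the assumption that the date casts to text in YYYY-MM-DD form are mine, not from the question.

```python
import re

# Hypothetical text produced by cast(fac.fecha as varchar(10)),
# assuming the ISO form YYYY-MM-DD.
date_text = "2018-02-21"

# [0,9] is a character class with three members: '0', ',' and '9'.
class_members = re.findall(r"[0,9]", "0,9 5")

# [0-9]{2} means "exactly two digits"; anchoring it between the hyphens
# picks out the month rather than the first two digits of the year.
month_by_regex = re.search(r"-([0-9]{2})-", date_text).group(1)

# Position-based alternative: characters 6 and 7 (1-based) hold the month.
month_by_position = date_text[5:7]

print(class_members, month_by_regex, month_by_position)
```

The position-based slice corresponds to the index-style substring arguments described above; whether the regex form is available depends on the engine.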

Related

Understanding query with REGEX expressions

So I'm looking at the documentation of something left behind by a former employee who has been gone for awhile. I understand Regex, I just don't know what's going on in this scenario.
TABLE_QUERY(server_logs_abc, 'REGEXP_MATCH(table_id, r"^def_[A-Za-z0-9]{5}_[\d]{8}") and datediff( current_timestamp(), timestamp( regexp_extract(table_id,r"(\d{8})$") ) ) < 30')
There are also daily tables that look like this
iserver_cogs_abc.def_4J389_20180221
iserver_cogs_abc.def_4J389_20180220
iserver_cogs_abc.def_4J389_20180219
iserver_cogs_abc.def_4J389_20180218
And so on chronologically.
So I understand the two regex expressions. The first for REGEXP_MATCH is the naming convention:
def_4J389_2018XXXX
(XXXX representing the month and day timecode i.e 0221)
And the second Regex in REGEXP_EXTRACT is the 8-digit time code (i.e 20180221)
But what does it all mean when put together? Also, what does the leading r mean that precedes the two regex portions (i.e. r"^def_[A-Za-z0-9]{5}_[\d]{8}")?
The second regex extracts the YYYYMMDD portion of the table name, which is then converted into a timestamp and compared with the current timestamp; DATEDIFF returns the number of days between those two timestamps.
So together they return only the tables from the last 30 days that match the pattern from the first regex.
As for r: when used as a prefix to a string, it makes it a so-called raw string, which is widely used in regular expressions. For example: you can escape "\", "_", or "%" using two backslashes, e.g. "\\%". If you are using raw strings, only a single backslash is required, e.g. r"\%".
I recommend searching for "raw string" or "literal string" to get more details on this.
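The combined filter can be sketched in Python as follows. This only mimics the logic of the two regexes plus the date comparison; the function name recent_tables, the sample table names, and the fixed "today" are illustrative assumptions, and Python's raw strings stand in for the r"..." prefix in the query.

```python
import re
from datetime import date, datetime

# Same two patterns as in the query, written as Python raw strings.
NAME_PATTERN = re.compile(r"^def_[A-Za-z0-9]{5}_[\d]{8}")
DATE_SUFFIX = re.compile(r"(\d{8})$")

# A raw string keeps the backslash as-is, so no doubling is needed.
same_pattern = (r"(\d{8})$" == "(\\d{8})$")

def recent_tables(table_ids, today, max_age_days=30):
    """Keep table ids that match the naming pattern and are newer than max_age_days."""
    kept = []
    for table_id in table_ids:
        if not NAME_PATTERN.search(table_id):
            continue
        suffix = DATE_SUFFIX.search(table_id)
        if suffix is None:
            continue
        table_date = datetime.strptime(suffix.group(1), "%Y%m%d").date()
        if (today - table_date).days < max_age_days:
            kept.append(table_id)
    return kept

# Hypothetical table ids and reference date.
tables = ["def_4J389_20180221", "def_4J389_20180101", "other_20180221"]
result = recent_tables(tables, today=date(2018, 2, 22))
print(result)
```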

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
The DB name and table can occur anywhere in the result. Since a . (dot) joins them, I want to split on the . (dot).
Basically I need SQL that returns the values surrounding the . (dot).
I want the DB name and table name in the result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
The regex looks for and returns the words either side of and including the period, when preceded by a space, and followed by an optional comma or the end of the line.
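As a quick check of the pattern outside Teradata, here is a sketch using Python's re module, which supports the same look-behind and look-ahead syntax; the helper name qualified_name is made up for illustration. One caveat worth knowing: inside square brackets $ is a literal dollar sign, and the trailing ? makes the look-ahead optional, so in practice the match is bounded by where \w+ stops.

```python
import re

# Same pattern as in the answer above, as a Python raw string.
PATTERN = re.compile(r"(?<= )(\w+\.\w+)(?=[,$]?)")

def qualified_name(request_text):
    """Return the first word.word pair that is preceded by a space, or None."""
    match = PATTERN.search(request_text)
    return match.group(1) if match else None

# The two request texts given in the question.
alter_result = qualified_name("alter table DB_NAME.employee_table,no fallback ;")
create_result = qualified_name("create set table DB_NAME.employee_table;")
print(alter_result, create_result)
```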
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'
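The nested-token idea can be sketched in Python like this. The helper nth_token is a rough stand-in for strtok(), written for illustration only (it assumes a single delimiter character and skips empty tokens), so treat it as an approximation of the Teradata behaviour. It also shows that picking a fixed token position depends on the shape of the statement.

```python
def nth_token(text, delimiter, token_number):
    """Return the 1-based nth non-empty token of text, or None if there is none."""
    tokens = [t for t in text.split(delimiter) if t]
    if 1 <= token_number <= len(tokens):
        return tokens[token_number - 1]
    return None

def name_by_position(request_text):
    """Third space-separated token, then the first comma-separated token of that."""
    third = nth_token(request_text, " ", 3)
    return nth_token(third, ",", 1) if third is not None else None

alter_result = name_by_position("alter table DB_NAME.employee_table,no fallback ;")
# With "create set table ...", the third token is the keyword "table".
create_result = name_by_position("create set table DB_NAME.employee_table;")
print(alter_result, create_result)
```

So the positional approach fits the "alter table" text from the question, while the "create set table" variant would need a different token number.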

Redshift SQL - Extract numbers from string

In Amazon Redshift tables, I have a string column from which I need to extract only the digits. For this I currently use
translate(stringfield, '0123456789'||stringfield, '0123456789')
I was trying out the REPLACE function, but it's not going to be elegant.
Any thoughts on converting the string into ASCII first and then doing some operation to extract only the digits? Or any other alternatives?
It is hard here, as Redshift does not support functions and is missing a lot of traditional functions.
Edit:
Trying out the below, but it only returns 051-a92, whereas I need 05192 as output. I am thinking of substring etc., but I only have regexp_substr available right now. How do I get rid of any characters in between?
select REGEXP_SUBSTR('somestring-051-a92', '[0-9]+..[0-9]+', 1)
This might be late, but I was solving the same problem and finally came up with this:
select REGEXP_replace('somestring-051-a92', '[a-z/-]', '')
Alternatively, you can create a Python UDF now.
Typically your inputs will conform to some sort of pattern that can be used to do the parsing using SUBSTRING() with CHARINDEX() { aka STRPOS(), POSITION() }.
E.g. find the first hyphen and the second hyphen and take the data between them.
If not (and assuming your character range is limited to ASCII) then your best bet would be to nest 26+ REPLACE() functions to remove all of the standard alpha characters (and any punctuation as well).
If you have multibyte characters in your data though then this is a non-starter.
A better method is to remove all the non-numeric characters:
select REGEXP_replace('somestring-051-a92', '[^0-9]', '')
You can specify "any non digit" that includes non-printable, symbols, alpha, etc.
e.g., regexp_replace('brws--A*1','[\D]')
returns
"1"
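The "remove everything that is not a digit" approach behaves the same way in Python's re module, which makes it easy to try the pattern on sample strings before running it in Redshift. This is only a sketch; the function name digits_only is illustrative.

```python
import re

def digits_only(text):
    """Delete every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)

# The two sample strings used in the answers above.
first = digits_only("somestring-051-a92")
second = digits_only("brws--A*1")

# The narrower class [a-z/-] only removes lowercase letters, '/' and '-',
# so it gives the same result here but would keep other symbols.
narrow = re.sub(r"[a-z/-]", "", "somestring-051-a92")
print(first, second, narrow)
```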

Regular expression filter

I have this regular expression in my sql query
DECLARE @RETURN_VALUE VARCHAR(MAX)
IF @value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET @RETURN_VALUE = NULL
END
I am not sure why, but whenever I have 12 TEST in my row it gives me the value 12, whereas three-digit numbers are filtered out. How can I modify the regular expression to return the three-digit numbers too?
Any help will be appreciated.
SQL's LIKE doesn't use regular expressions: it uses SQL wildcard patterns, which are much simpler than regular expressions. For instance, there is no way to specify alternation (a|b) or repetition ( a*, a+, a?, a{m,n} ) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.
Have you tried ([0-9]*). I believe that this will capture every digit for you. However, I am not as strong at regex. When I ran this through rubular, it worked, though :) BTW, rubular is a great way to test out regular expressions
You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).
First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.
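One way to experiment with what the LIKE pattern accepts is to translate it by hand into an equivalent regular expression, with each % becoming .* and the bracket classes carried over unchanged. The Python sketch below does that; it is an approximation, since it is case-sensitive, whereas the behaviour of [^A-Z] in SQL Server depends on the collation.

```python
import re

# Hand translation of LIKE '%[0-9]%[^A-Z]%[0-9]%':
# a digit, later a non-A-Z character, later another digit.
LIKE_AS_REGEX = re.compile(r".*[0-9].*[^A-Z].*[0-9].*", re.DOTALL)

def like_matches(value):
    """Return True if the whole value fits the translated pattern."""
    return LIKE_AS_REGEX.fullmatch(value) is not None

# Sample values taken from the answer above.
samples = ["111", "1^1", "1?7", "1AAAAAAAAAAA?AAAAAAAAA1", "12 TEST"]
results = {value: like_matches(value) for value in samples}
print(results)
```

The result is only true or false for each value, which matches the point that LIKE is a matching test rather than a way to extract the digits.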

GREP REGEX for EnCase - Date conversion

How do I convert February 2, 2002 at 10 to GREP REGEX for EnCase Forensics?
Thanks
I don't know EnCase Forensics and given the astounding number of answers here, I am not alone...
This is probably not even a programming question, more like a superuser one, "how to use a program".
But well, if it uses traditional regexes (regular expressions), as in Perl (not pearl), just enter the string as given: it doesn't contain any character that is special in regexes, i.e. it will be a plain-text search.
It depends on the format. For Month/Day/Year formatting, I would recommend 0?2/11/(20)?02. This makes a leading zero optional as well as allowing for a 2 digit or 4 digit year.
A general month/day/year date regexp in EnCase could be [01]?#/##/(##)?##. However, there will be many dates in a different format, using abbreviations for months or formatted as year/month/day, etc.
(NB: EnCase uses '#' for [0-9] whereas most grep engines use \d.)
Jon
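For trying these date patterns outside EnCase, the sketch below rewrites the # digit placeholder as \d and tests the result with Python's re module. Only the # convention comes from the answer above; the helper name is made up, and any other differences between EnCase GREP syntax and Python regexes are not covered here.

```python
import re

def encase_digits_to_regex(pattern):
    """Replace EnCase's '#' digit placeholder with the \\d used by most engines."""
    return pattern.replace("#", r"\d")

# The general month/day/year pattern from the answer above.
general = re.compile(encase_digits_to_regex("[01]?#/##/(##)?##"))

# The specific pattern needs no translation because it contains no '#'.
specific = re.compile(r"0?2/11/(20)?02")

print(general.pattern)
print(bool(general.fullmatch("02/11/2002")), bool(specific.fullmatch("2/11/02")))
```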