Regex Extract SQL - sql

I have a string in one of my columns that looks like this:
DadC - Review Vid - Vid - Eng - TC
How can I use a regex in SQL to extract the second-to-last word, 'Eng'?
How can I extract the second word, 'Review Vid'?
Currently, in Google SQL, I have a query like this that extracts the last word:
SELECT *,
REGEX_EXTRACT(column_name, r'(\w+$)') AS lan

Your regular expression (\w+$) contains the following bits:
(...) capture the stuff inside
\w a word character: letters and numbers and underscore
+ at least one of the previous thing
$ the end of the text
So \w+ means "at least one word character"
And \w+$ means "at least one word character at the end of the text"
And (\w+$) means "capture at least one word character at the end of the text"
But you don't want to capture those word characters - you want something earlier in the text.
So \w+ - \w+ would match two words with a dash between.
And \w+ - \w+$ would match two words with a dash between at the end of the text.
And all that is left is to put parens around the part you want to capture: (\w+) - \w+$
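As a quick check of the final pattern, here is a minimal sketch that uses Python's re module as a stand-in for the SQL regex function. It assumes the SQL engine's regex dialect treats \w, + and $ the same way for this input; the variable names are illustrative only.

```python
import re

text = "DadC - Review Vid - Vid - Eng - TC"

# (\w+$) captures the last word, as in the original query.
last_word = re.search(r"(\w+$)", text).group(1)

# (\w+) - \w+$ matches "word - word" at the end of the text,
# but only the first word is inside the capture group.
second_to_last = re.search(r"(\w+) - \w+$", text).group(1)

print(last_word)       # TC
print(second_to_last)  # Eng
```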

Related

Delete specific pattern between commas in text file

I have thousands of SQL queries in Notepad++, one query per line. Every query selects a comma-separated list of columns from the database. We now want to drop from that list any column whose name follows a specific pattern (regular expression). Each query follows a specific pattern:
A trimmed column is selected with the alias 'PK'.
Every query ends with a date-based where condition.
Sometimes the pattern we want to remove also appears in the PK expression, in the where clause, or in both. We don't want to remove it from those places, only from the column selection list.
Below is an example of such a query:
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removing the columns, the query should look like this:
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
We want to remove columns matching the patterns below from every query, between the commas:
.FID.
.RID.
.CID.
.TSP.
If the pattern occurs inside the TRIM/DATE function, it should not be touched. It should only be removed from the column selection list.
Could somebody please help me with this? Thanks in advance.
You may use the following pattern and replace each match with an empty string:
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous match, but not the position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded by single whitespace characters, followed by any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - consumes any char other than a line break char, 0 or more repetitions, as few as possible, as long as that char does not start a whitespace, from, whitespace sequence
\K - a match reset operator that discards all text matched so far
,?\s* - an optional comma followed by 0+ whitespace characters
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - 1 or more uppercase ASCII letters or _, followed by _, then either F, R or C followed by ID, or TSP, then _, and again 1 or more uppercase ASCII letters or _.
See the regex demo.
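For anyone who wants to script this outside Notepad++, here is a rough sketch in Python. Python's built-in re module supports neither \G nor \K, so this swaps in a different technique: it isolates the column list between "as PK" and "from", then filters out the unwanted columns. The function name strip_columns and the assumption that every query contains exactly one " as PK" before " from " are mine, not part of the original answer.

```python
import re

# Split a query into: everything up to "as PK", the column list, the rest.
SPLIT = re.compile(r"^(.*?\sas\sPK)(.*?)(\sfrom\s.*)$")
# Column names containing _FID_, _RID_, _CID_ or _TSP_ are dropped.
UNWANTED = re.compile(r"_(?:[FRC]ID|TSP)_")

def strip_columns(query):
    """Hypothetical helper: remove unwanted columns from the select list only."""
    m = SPLIT.match(query)
    if m is None:
        return query  # line does not follow the assumed shape; leave it alone
    head, cols, tail = m.groups()
    kept = [c for c in cols.split(",") if c.strip() and not UNWANTED.search(c)]
    return head + "".join("," + c for c in kept) + tail

query = ("select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,"
         "TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,"
         "TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP "
         "where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'")
print(strip_columns(query))
```

Because the TRIM(...) expression and the where clause fall outside the middle group, they are never examined by the filter.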

regular expression using oracle REGEXP_INSTR

I want to use the REGEXP_INSTR function in a query to search for any match for user input, but I don't know how to write the regular expression. For example, it should match any value that includes the word car followed by an unspecified number of letters/numbers/spaces and then the word Paterson. Can anyone please help me write this regex?
Ok, so let's break this down.
"any value that includes the word car"
I surmise from this that the word car doesn't need to be at the start of the string, so I would start the format string with...
'^.*'
Here the '^' character means the start of the string, the '.' means any character and '*' means 0 or more of the preceding character. So zero or more of any character after the start of the string.
Then the word 'car', so...
'^.*car'
Next up...
"followed by unspecified numbers of letters/numbers/spaces"
I'm guessing that unspecified means zero or more. This is very similar to what we did to match any characters that might come before 'car', where the '.' means any character:
'^.*car.*'
However, if unspecified means one or more, then you can use '+' in place of '*'
"then the word Paterson"
I'm going to assume that as this is the end of the description, there are no more characters after 'Paterson'.
'^.*car.*Paterson$'
The '$' symbol means that the 'n' of 'Paterson' must be at the end of the string.
Code example:
select
REGEXP_INSTR('123456car1234ABCDPaterson', '^.*car.*Paterson$') as rgx
from dual
Output
RGX
----------
1
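The same pattern can be tried outside Oracle. The sketch below uses Python's re module and a hypothetical helper named regexp_instr that mimics the 1-based position returned by Oracle's function (0 when nothing matches). It is only an approximation of Oracle's behaviour, intended for this simple pattern.

```python
import re

def regexp_instr(source, pattern):
    """Hypothetical helper: 1-based start of the first match, or 0 if none."""
    m = re.search(pattern, source)
    return m.start() + 1 if m else 0

pattern = r"^.*car.*Paterson$"

print(regexp_instr("123456car1234ABCDPaterson", pattern))   # 1
print(regexp_instr("123456car1234ABCDPaterson!", pattern))  # 0, text after Paterson
print(regexp_instr("Paterson car", pattern))                # 0, wrong order
```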

SQL Oracle - Replace character in string between two vowels

I have already read all the REGEXP_REPLACE documentation, but didn't find what I'm looking for. I want to replace a specific character between two vowels with another character.
Example:
String: abcdeZebca
Output: abcdeSebca
The letter Z was replaced by S because it was between two vowels. Is that possible in Oracle SQL?
I'm guessing you didn't catch the bit about backreferences in the docs though:
SELECT
REGEXP_REPLACE(yourcolumn, '([aeiou])Z([aeiou])', '\1S\2')
FROM
yourtable
Explained:
[aeiou] means match any single vowel. Surrounding it in parentheses means "and remember what you found in a numbered slot, starting with 1". Slots are numbered from left to right throughout the entire expression, and each parenthesized expression gets its own number.
Hence the full expression means:
- find any vowel and store in slot 1
- followed by Z
- followed by any vowel and store in slot 2
The replacement string is:
- the contents of slot 1
- S
- the contents of slot 2
Hence
aZe -> aSe
eZi -> eSi
And so on..
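The backreference idea carries over to other regex engines. As a small sketch, Python's re.sub accepts the same \1 and \2 slots in its replacement string; the function name below is made up for illustration.

```python
import re

def replace_z_between_vowels(s):
    # Slot 1 = vowel before Z, slot 2 = vowel after Z; both are put back.
    return re.sub(r"([aeiou])Z([aeiou])", r"\1S\2", s)

print(replace_z_between_vowels("abcdeZebca"))  # abcdeSebca
print(replace_z_between_vowels("abcZdef"))     # abcZdef (Z is not between vowels)
```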

match line that doesn't contain certain words

I have the following string:
ignoreword1,word1, ignoreword2
I would like to match any word that is not ignoreword1 or ignoreword2.
This is what I have so far:
(?s)^((?!ignoreword1).)*$
The main goal is to use the regex in a PostgreSQL database to select rows where the column matches a substring after removing "ignoreword1", "ignoreword2" and the comma ",".
To match any word that is not ignoreword1 or ignoreword2, use
\b(?!(?:ignoreword1|ignoreword2)\b)\w+
In PostgreSQL, word boundaries are [[:<:]] and [[:>:]], so use something like:
[[:<:]](?!(?:ignoreword1|ignoreword2)[[:>:]])[a-zA-Z]+
Pattern details:
[[:<:]] - leading word boundary
(?!(?:ignoreword1|ignoreword2)[[:>:]]) - fail the match if the word starting here is exactly ignoreword1 or ignoreword2
[a-zA-Z]+ - one or more ASCII letters.
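To see the first, \b-based pattern at work on the sample string, here is a minimal sketch in Python. It uses Python's \b word boundary, so it does not exercise the PostgreSQL-specific [[:<:]] and [[:>:]] syntax.

```python
import re

text = "ignoreword1,word1, ignoreword2"

# The negative lookahead rejects a word only when the whole word
# is ignoreword1 or ignoreword2 (the inner \b marks the word's end).
pattern = r"\b(?!(?:ignoreword1|ignoreword2)\b)\w+"

matches = re.findall(pattern, text)
print(matches)  # ['word1']
```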

Argument '0' is out of range error

I have a query (sql) to pull out a street name from a string. It's looking for the last occurrence of a digit, and then pulling the proceeding text as the street name. I keep getting the Oracle
"argument '0' is out of range"
error, but I'm struggling to figure out how to fix it.
The part of the query in question is
substr(address,regexp_instr(address,'[[:digit:]]',1,regexp_count(address,'[[:digit:]]'))+2)
Any help would be amazing. (Using SQL Developer.)
The fourth parameter of regexp_instr is the occurrence:
occurrence is a positive integer indicating which occurrence of
pattern in source_string Oracle should search for. The default is 1,
meaning that Oracle searches for the first occurrence of pattern.
In this case, if an address contains no digits, regexp_count returns 0, which is not a valid occurrence.
A simpler solution, which does not require separate treatment for addresses without a house number, is this:
with t (address) as (
select '422 Hickory Str.' from dual union all
select 'One US Bank Plaza' from dual
)
select regexp_substr(address, '\s*([^0-9]*)$', 1, 1, null, 1) as street from t;
The output looks like this:
STREET
-------------------------
Hickory Str.
One US Bank Plaza
The third argument to regexp_substr is the first of the three 1's. It means start the search at the first character of address. The second 1 means find the first occurrence of the search pattern. The null means no special match modifiers (such as case insensitive - nothing like that needed here). The last 1 means "return the first SUBEXPRESSION from the match pattern". Subexpressions are parts of the match expression enclosed in parentheses.
The match pattern has a $ at the end - meaning "anchor at the end of the input string" ($ means the end of the string). Then [...] means match any of the characters in square brackets, but the ^ in [^...] changes it to match any character OTHER THAN what is in the square brackets. 0-9 means all characters between 0 and 9; so [^0-9] means match any character(s) OTHER THAN digits, and the * after that means "any number of such characters" (between 0 and everything in the input string). \s is "blank space" - if there are any blank spaces following a possible number in the address, you don't want them included right at the beginning of the street name. The subexpression is just [^0-9]* meaning the non-digits, not including any spaces before them (because the \s* is outside the left parenthesis).
My example illustrates a potential problem though - sometimes an address does, in fact, have a "number" in it, but spelled out as a word instead of using digits. What I show is in fact a real-life address in my town.
Good luck!
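The capture-the-subexpression approach in the answer above can be checked quickly outside Oracle. The sketch below uses Python's re module with the same pattern; the function name street is illustrative, and group(1) plays the role of the last argument of regexp_substr.

```python
import re

def street(address):
    # Optional blanks, then the trailing run of non-digit characters.
    # Only that run (group 1) is returned, so leading blanks are dropped.
    m = re.search(r"\s*([^0-9]*)$", address)
    return m.group(1)

print(street("422 Hickory Str."))   # Hickory Str.
print(street("One US Bank Plaza"))  # One US Bank Plaza
```

Because the pattern can match an empty string at the end of the input, the search always succeeds, so there is no special case for addresses without digits.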
looking for the last occurrence of a digit, and then pulling the proceeding text as the street name
You could simply do:
SELECT REGEXP_REPLACE( address, '^.*\d\s*(\D*)$', '\1' )
AS street_name
FROM address_table;