I am trying to use regular expression in a sql statement to remove all spaces and digits from a string 'text' which may look like
1. 000000123456 (No space)
2. 00000 123456 (spaces in the beginning and end of string)
3. 90330000 45 (2 spaces at the end)
I have been able to come up with the solutions below so far:
select regexp_replace('text','\\s(^[0-9]*)\\s','\\1')
select regexp_replace('text','[[:blank:]]+[^[0-9]*][[:blank:]]+','\\1')
The results I get are:
1. 000000123456
2. 00000 123456
3. 90330000 45
I get the text as is. If I try to just remove the digits using
regexp_replace('text','^[0-9]*','\\1'),
it works fine- all digits get removed and the value results in ''(null). But the text with spaces does not remove the digits nor the space.
What am I doing wrong here?
Try using a character class which contains both digits and whitespace:
regexp_replace('text', '[0-9\\s]+', '')
This should remove all numbers and whitespace. But, it isn't completely clear what you are trying to do, because you did not show us the numbers in context.
Related
I have a table with medication_product_amount column where there are spaces between numbers and characteres like below:
medication_product_amount
1 UN DE 50 ML
20 UN
1 UN DE 600 G
What I want is to remove the single space ONLY between numbers and characters, something like this:
new_medication_product_amount
1UN DE 50ML
20UN
1UN DE 600G
To do this, I am looking for a regular expression to use in the function REGEXP_REPLACE. I tried using the pattern below, indicating to replace the single space after the numbers, but the output remained the same as the input:
select REGEXP_REPLACE(medication_product_amount, '(^[0-9])( )', '\1') as new_medication_product_amount
from medications
Can anyone help me come up with the right way to do this? Thanks!
Your regex is a little off. First what yours does. '(^[0-9])( )', '\1')
(^[0-9]) Start Capture (field 1) at the beginning of the string for 1 digit
followed by Start Capture (field 2) for 1 space.
Replace the string by field1.
The problems and correction:
What you want to capture does not necessary the first character of the string. So eliminate the anchor ^.
What you want to capture may be more that 1 digit in length. So replace [0-9] by [0-9]+. I.E any number of digits.
Not actually a problem but a space holds no special meaning in a regexp, it is just a space so no need to capture it unless user later. Replace ( ) with just .
END of Pattern. But there may be other occurrences. Tell Postgres to continue with the above pattern until end of string. (see flag 'g').
Resulting Expression/Query: (demo here)
select regexp_replace(medication_product_posology, '([0-9]+) ', '\1','g') as new_medication_product_posology
from medications;
Match "digit space letter", capturing and the digit and letter using '([0-9]) ([A-Z])', then put them back using back references.
select REGEXP_REPLACE(medication_product_amount, '([0-9]) ([A-Z])', '\1\2') as new_medication_product_amount
from medications
I am new to regex and need to search a string field in Impala for multiple matches to this exact sequence of characters: ~FC* followed by 11 more * that could have letters/digits between (but could not, they are basically delimiters in this string field). After the 12th * (if you count #1 in ~FC*) it should be immediately followed by Y~.
since the asterisks are not letters or digits, I am unsure on how to search for these delimiters properly.
This is my SQL so far:
select
regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
data returned:
pattern_found
--------------
~FC*
(~FC\\*) in Impala SQL it returns ~FC* which is great (got it from my other question)
Been trying this (~FC\\*).*(\\*Y~) which obviously isnt counting the number of asterisks but its is also not picking the Y up.
This is a test string, it has 2 occurrences:
N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
results should be these 2, which has an overlapping ~ between them. but will settle for at least the first being found if both cannot.
~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
figured out a solution but happy to learn of a better way to accomplish this
This is what worked in Impala SQL, needed parentheses and double escape backslashes for allllll the asterisks:
(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)
Full SQL:
select
regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
and here is the RegexDemo without the additional syntax needed for Impala SQL
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
Tried this:
WHERE phone_num is not like ' %[0-9,-,' ' ]%
Still getting rows where phone has numbers.
from https://regexr.com/3c53v address you can edit regex to match your needs.
I am going to use example regex for this purpose
select * from Table1
Where NOT REGEXP_LIKE(PhoneNumberColumn, '^[+]*[(]{0,1}[0-9]{1,4}[)]{0,1}[-\s\./0-9]*$')
You can use translate()
...
WHERE translate(Phone_Number,'a1234567890-', 'a') is NOT NULL
This will strip out all valid characters leaving behind the invalid ones. If all the characters are valid, the result would be NULL. This does not validate the format, for that you'd need to use REGEXP_LIKE or something similar.
You can use regexp_like().
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
[^ 0123456789-] matches any character that is not a space nor a digit nor a hyphen. ^- matches a hyphen at the beginning and -$ on the end of the string. The pipes are "ors" i.e. a|b matches if pattern a matches of if pattern b matches.
Oracle has REGEXP_LIKE for regex compares:
WHERE REGEXP_LIKE(phone_num,'[^0-9''\-]')
If you're unfamiliar with regular expressions, there are plenty of good sites to help you build them. I like this one
I have thousand of SQL queries written over notepad++ line by line.Single line contain single SQL query.Every SQL query contain list of columns to be selected from database as comma separated values.Now we want certain columns not to be part of that list which follow a specific pattern/regular expression.The SQL query follows a specific pattern :
A trimmed column has been selected as alias 'PK'
Every query has got a 'dated'where condition at the end of it.
Sometimes the pattern which we wish to remove exist in either PK/where or both.we don't want to remove that column/pattern from those places.Just from the column selection list.
Below is the example of a SQL query :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removal of columns/patterns query should look like below :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
want to remove below patterns from each and every query between the commas :
.FID.
.RID.
.CID.
.TSP.
If the pattern exist within TRIM/DATE function it should not be touched.It should only be removed from column selection list.
Could somebody please help me regarding above.Thanks in advance
You may use
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous location, not a position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded with single whitespaces that is followed with any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - consumes any char other than a linebreak char, 0 or more repetitions, as few as possible, that does not start whitespace, from, whitespace sequence
\K - a match reset operator discarding all text matched so far
,?\s* - an optional comma followed with 0+ whitespaces
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - ASCII letters or/and _, 1 or more occurrences, followed with _, then F, R or C followed with ID or TSP, then _, and again 1 or more occurrences of ASCII letters or/and _.
See the regex demo.
I have a string like the following in a sqlite 1 column:
1a 2B 3c 354 AfS 151 31s2fef 1fs31 3F1e2s 84f64e 45fs
space separated, x amount of characters 0-9, a-z, A-Z, there might be punctuation I'm not sure, but it is definitely space separated.
I'm trying to make a regular expression so I can query the database by number of words. basically if I wanted to get the 6th "word" in the example I'd be looking for:
151
so I tried to make a regular expression that says if the Nth word = 151, return me that row.
Here's what I've got so far.
SELECT * FROM table1 WHERE column1 REGEXP ^((?:\S+\s+){1}){6}
That unfortunately gives me the first through sixth words, but I really want to pinpoint the 6th word like the example above.
Also, I was thinking so save room in the database I could get rid of the white space, I'd just need to know how to count a specific number of characters in to the string which I couldn't figure out either.
Thanks for the help, never written a regular expression before.
If all you need is a match, there is no need for the non-capturing group. Just match a non-space + space group 5 times and follow with a 151.
^(\S+\s+){5}151
Use following regex
^([^\s]+\s){6}(.*?)(\s|$)
this must return 151, you can change {6} with your number to match the string.