Use Regular Expression to extract text between two characters - sql

I'm using PostgreSQL and I have a list of names in the following format:
"Abbing, Mr. Anthony"
"Abbott, Mr. Rossmore Edward"
"Abbott, Mrs. Stanton (Rosa Hunt)"
And I want to extract the title (i.e. "Mr", "Mrs"). It is always between the comma and the dot.
I'm new to regular expressions. This is what I'm trying, but I'm not getting the correct answer:
SELECT SUBSTRING(name from ',..')
I get ", M" as an answer.
I assume this is something very simple to fix.
Thanks

substring(name from '(, (Mr)(s){0,1}\.)')
will extract ", Mr." or ", Mrs.". Note the parentheses around the whole expression: substring(... from ...) returns the part of the match that corresponds to the first parenthesized group in the regex. Because (Mr) and (s) are themselves groups, the whole pattern has to be wrapped in an outer pair of parentheses so that substring() returns the full match.
To get rid of the leading ', ' you can use trim(). Add '.' to the character list if you also want to drop the trailing dot:
trim(substring(name from '(, (Mr)(s){0,1}\.)'), ', ')

Try
SELECT SUBSTRING(name from ',\s([^\.]+)\.')
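To see why the capture group matters, here is a minimal sketch that runs the same pattern through Python's re module. Python is only a stand-in for checking the pattern (it is not PostgreSQL), and the helper name extract_title is hypothetical. The last line repeats the pattern from the question, which matches a comma plus any two characters.

```python
import re

# Capture everything between ", " and the first following dot.
TITLE_PATTERN = re.compile(r",\s([^.]+)\.")


def extract_title(name):
    """Return the title between the comma and the dot, or None if absent."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


names = [
    "Abbing, Mr. Anthony",
    "Abbott, Mr. Rossmore Edward",
    "Abbott, Mrs. Stanton (Rosa Hunt)",
]
for name in names:
    print(extract_title(name))

# The pattern from the question: a comma followed by any two characters.
print(repr(re.search(r",..", names[0]).group(0)))
```

The group in parentheses is what gets returned, which is the same reason substring(name from ...) in the answer above yields only the title and not the surrounding comma and dot.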

You could use position and substring function to do this:
SELECT samplename
     , trim(substring(samplename
                      , position(',' IN samplename) + 1
                      -- the length stops just before the dot
                      , position('.' IN samplename) - position(',' IN samplename) - 1)) AS title
FROM test

Related

how to replace dots from 2nd occurrence

I have a column with software versions. I am trying to remove every dot from the 2nd occurrence onward, like this:
select REGEXP_REPLACE('12.5.7.8', '.','');
The expected output is 12.578.
Is it possible to remove the dots from the 2nd occurrence onward?
One option is to break this into two pieces:
Get the first number.
Get the rest of the numbers as an array.
Then convert the array to a string with no separator and combine with the first:
select (split_part('12.5.7.8', '.', 1) || '.' ||
array_to_string((REGEXP_SPLIT_TO_ARRAY('12.5.7.8', '[.]'))[2:], '')
)
Another option is to replace the first '.' with a placeholder ('|'), then remove the remaining '.'s and turn the placeholder back into a '.':
select translate(regexp_replace(version, '^([^.]+)[.](.*)$', '\1|\2'), '|.', '.')
from software_version;
I'd just take the left and right:
concat(
left(str, position('.' in str)),
replace(right(str, -position('.' in str)), '.', '')
)
For a str of 12.34.56.78, left gives "12." and right with a negative position gives "34.56.78", from which replace then removes the dots.
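The same left/right split can be sketched outside the database. The following is a minimal Python illustration of the idea, not the PostgreSQL code itself, and the function name keep_first_dot is hypothetical. It splits at the first dot, keeps the head and the dot, and strips the dots from the tail.

```python
def keep_first_dot(version):
    """Keep the first '.' and remove every later one."""
    # partition() splits at the first '.' only: head, the dot, the rest.
    head, sep, tail = version.partition(".")
    return head + sep + tail.replace(".", "")


print(keep_first_dot("12.5.7.8"))
print(keep_first_dot("12.34.56.78"))
```

In this sketch a value without any dot is returned unchanged.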

Select the first word on the left of a string in snowflake

I have a column (mycolumn) in my Snowflake table (mytable) whose content follows this pattern:
JohnDoe - Client Number One
MaryJane - Client Number Two
I need to extract the first portion on the left of the string (JohnDoe, MaryJane, with no trailing whitespace).
I tried the following approach, but I got stuck: I could only remove the last two words on the right, not the dash and the surrounding whitespace.
select substring(mycolumn,1,length(mycolumn)- CHARINDEX(' ', REVERSE(mycolumn))- CHARINDEX(' ', REVERSE(mycolumn))) from mytable
You can use regexp_substr():
select regexp_substr(mycolumn, '^[^ ]+')
from mytable;
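As a quick check of what '^[^ ]+' matches, here is a small sketch using Python's re module as a stand-in (it is not Snowflake, and the helper name first_word is hypothetical). The pattern takes the leading run of non-space characters and stops at the first space.

```python
import re


def first_word(value):
    """Return the leading run of non-space characters, or None if there is none."""
    # re.match anchors at the start of the string, like '^' in the SQL pattern.
    match = re.match(r"[^ ]+", value)
    return match.group(0) if match else None


for row in ["JohnDoe - Client Number One", "MaryJane - Client Number Two"]:
    print(first_word(row))
```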

How to remove WhiteSpaces using LTRIM and RTRIM to display name?

I'm new to SQL and working with a name column where the names contain spaces.
Example: Alan Joe
I am trying to use LTRIM and RTRIM to display the name as 'AlanJoe':
select LTRIM(name)
Any help on how to remove the spaces between the names, or any links I can learn from?
Thank you
Use the replace() function:
select replace(name,' ','')
You can try
ISNULL(LTRIM(RTRIM(FirstName)) + ' ', '') + ISNULL(LTRIM(RTRIM(LastName)), '')
This produces "FirstName LastName" with a single space when FirstName is not NULL; otherwise it returns only the last name.
To remove the spaces, simply use REPLACE(Name,' ','')

What's the equivalent of Excel's `left(find(), -1)` in BigQuery?

I have names in my dataset and they include parentheses. But, I am trying to clean up the names to exclude those parentheses.
Example: ABC Company (Somewhere, WY)
What I want to turn it into is: ABC Company
I'm using standard SQL with Google BigQuery.
I've done some research and I know BigQuery has left(), but I do not know the equivalent of find(). My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.
Good plan! In BigQuery Standard SQL, the equivalent of LEFT is SUBSTR(value, position[, length]) and the equivalent of FIND is STRPOS(value1, value2).
With this in mind, your query can look like this (exactly as you planned):
#standardSQL
WITH names AS (
SELECT 'ABC Company (Somewhere, WY)' AS name
)
SELECT SUBSTR(name, 1, STRPOS(name, '(') - 1) AS clean_name
FROM names
String functions are usually less expensive than regular expression functions, so if your data follows the pattern in your example, go with the version above. Note that it keeps the space before the opening parenthesis; wrap the result in RTRIM() if you want it removed.
In more generic cases, where the pattern to clean is more dynamic, as in Graham's answer, go with the solution in Graham's answer.
Just use REGEXP_REPLACE + TRIM. This will work with all variants (just not nested parentheses):
#standardSQL
WITH
names AS (
SELECT
'ABC Company (Somewhere, WY)' AS name
UNION ALL
SELECT
'(Somewhere, WY) ABC Company' AS name
UNION ALL
SELECT
'ABC (Somewhere, WY) Company' AS name)
SELECT
TRIM(REGEXP_REPLACE(name,r'\(.*?\)',''), ' ') AS cleaned
FROM
names
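To check how the non-greedy pattern behaves on the three variants, here is a minimal sketch using Python's re.sub as a stand-in for REGEXP_REPLACE (an assumption for illustration; it is not BigQuery, and the function name strip_parenthetical is hypothetical).

```python
import re


def strip_parenthetical(name):
    """Remove each '(...)' group, then trim spaces from both ends."""
    # '.*?' is non-greedy, so each match ends at the first ')'.
    return re.sub(r"\(.*?\)", "", name).strip(" ")


for name in [
    "ABC Company (Somewhere, WY)",
    "(Somewhere, WY) ABC Company",
    "ABC (Somewhere, WY) Company",
]:
    print(repr(strip_parenthetical(name)))
```

When the parenthetical sits in the middle of the name, the spaces on both sides of it survive, so the third value comes back with two spaces between the words.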
Use REGEXP_EXTRACT:
SELECT
RTRIM(REGEXP_EXTRACT(names, r'([^(]*)')) AS new_name
FROM yourTable
The regex greedily consumes and matches everything up to the first opening parenthesis. I used RTRIM to remove the trailing whitespace picked up by the regex.
Note that this approach is robust to the edge case of a name that contains no parentheses: the query then just returns the entire original value.
I can't test this solution at the moment, but you can combine SUBSTR and INSTR, like this:
SELECT CASE WHEN INSTR(name, '(') > 0 THEN SUBSTR(name, 1, INSTR(name, '(') - 1) ELSE name END AS name FROM yourTable;

Oracle SQL - Parsing a name string and converting it to first initial & last name

Does anyone know how to turn this string: "Smith, John R"
Into this string: "jsmith" ?
I need to lowercase everything with lower()
Find where the comma is and track its integer location value
Get the first character after that comma and put it in front of the string
Then get the entire last name and stick it after the first initial.
Sidenote - instr() function is not compatible with my version
Thanks for any help!
Start by writing your own INSTR function - call it my_instr for example. It will start at char 1 and loop until it finds a ','.
Then use as you would INSTR.
The best way to do this is using Oracle Regular Expressions feature, like this:
SELECT LOWER(regexp_replace('Smith, John R',
'(.+)(, )([A-Z])(.+)',
'\3\1', 1, 1))
FROM DUAL;
That says: when you find the pattern of any set of characters, followed by ", ", followed by an uppercase character, followed by any remaining characters, take the third element (the initial of the first name) and append the first element (the last name). Then make everything lowercase.
Your side note, "instr() function is not compatible with my version", doesn't make sense to me, as that function has been around for ages. Check your version, because regular expression functions were only added to Oracle in version 10g.
Thanks for the points.
-- Stew
instr() is not compatible with your version of what? Oracle? Are you using version 4 or something?
There is no need to create your own function, and quite frankly, it seems a waste of time when this can be done fairly easily with sql functions that already exist. Care must be taken to account for sloppy data entry.
Here is another way to accomplish your stated goal:
with name_list as
(select ' Parisi, Kenneth R' name from dual)
select name
-- There may be a space after the comma. This will strip an arbitrary
-- amount of whitespace from the first name, so we can easily extract
-- the first initial.
, substr(trim(substr(name, instr(name, ',') + 1)), 1, 1) AS first_init
-- a simple substring function, from the first character until the
-- last character before the comma.
, substr(trim(name), 1, instr(trim(name), ',') - 1) AS last_name
-- put together what we have done above to create the output field
, lower(substr(trim(substr(name, instr(name, ',') + 1)), 1, 1)) ||
lower(substr(trim(name), 1, instr(trim(name), ',') - 1)) AS init_plus_last
from name_list;
HTH,
Gabe
I have a hard time believing you don’t have access to a proper instr() but if that’s the case, implement your own version.
Assuming you have that straightened out:
select
substr(
lower( 'Smith, John R' )
, instr( 'Smith, John R', ',' ) + 2
, 1
) || -- first_initial
substr(
lower( 'Smith, John R' )
, 1
, instr( 'Smith, John R', ',' ) - 1
) -- last_name
from dual;
Also, be careful about your assumption that all names will be in that format. Watch out for something other than a single space after the comma, last names having data like “Parisi, Jr.”, etc.
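The substr/instr logic in the answers above (split at the comma, trim, take one character, lowercase) can be sketched in Python to try it against messy input. This is an illustration under stated assumptions, not Oracle code: the function name initial_plus_last is hypothetical, and raising an error for values without a comma or first name is a choice made for this sketch.

```python
def initial_plus_last(name):
    """Turn 'Last, First ...' into lowercase first initial + last name."""
    # Split at the first comma: last name on the left, the rest on the right.
    last, sep, rest = name.partition(",")
    first = rest.strip()
    if not sep or not first:
        raise ValueError(f"expected 'Last, First', got {name!r}")
    return (first[0] + last.strip()).lower()


print(initial_plus_last("Smith, John R"))
print(initial_plus_last(" Parisi, Kenneth R"))
```

Stripping both halves makes the number of spaces around the comma irrelevant. A last name that itself contains a comma, such as the "Parisi, Jr." case mentioned above, would still be split in the wrong place.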