I have a column with a name value with a data type of char(64) LATIN in a Teradata table. The values look like 'SMITH JOHN J ', 'Doe Jane Anne ', etc. The spaces between the elements vary from value to value. I am able to parse out the last name out with a left, but I am having trouble parsing out the first name and middle initial/name. I have tried using the index and position functions, but I am not getting the desired result. Has anyone encountered a similar scenario?
You could use regexp_substr() and adjust the occurence argument, which specifies the number of the occurence to return:
select
regexp_substr(name, '\w+', 1, 1) last_name,
regexp_substr(name, '\w+', 1, 2) middle_name,
regexp_substr(name, '\w+', 1, 3) first_name
from mytable
In PCRE notation, which Teradata used, \w matches on word characters (alphanumeric and the underscore). You might want to make the regex a little broader with \S (anything but a space).
Related
I've got pretty dirty data of client addresses. For each client, there are 2 or more addresses in one string. Using regular expressions in Oracle I want to subtract the first one.
It would be very easy if there was the same separator as ';'. But sometimes there is a comma. And comma is also used within an address to separate city, street, and building.
I've got Russian addresses so I translated them for you.
For example, I have a string with multiple addresses:
A comma is a separator, but it also separates blocks inside addresses.
So I could match the first address by matching everything until the second '\sul\.'.
But I don't how to do it. Regexp_substr(address, '.*,\sul') will return
This is far from what I need.
So how can I subtract everything until second ,\sul\. ?
Russia, Moscow, ul. Tverskaya, d.32 should be returned.
You could address this requirement using SUBSTR and INSTR instead of regexes. The following expression should give you what you need:
SUBSTR(v, 1, INSTR(v, ', ul.', 1, 2) - 1)
INSTR() finds the position of the second occurence of string ', ul.' in the source string, and SUBSTR() selects everything from the beginning of the string until that position (minus 1).
Example:
WITH t AS (
SELECT 'Russia, Moscow, ul. Tverskaya, d.32, ul. Yakimanka, d21, ul. Kalinina, d.43' address FROM DUAL
)
SELECT SUBSTR(address, 1, INSTR(address, ', ul.', 1, 2) - 1) adress1 FROM t
| ADRESS1 |
| :---------------------------------- |
| Russia, Moscow, ul. Tverskaya, d.32 |
Demo on DB Fiddle
NB: this works as long as there are indeed at least two occurences of the given pattern in the string. If you happen to have values that do not match this spec and that you want to preserve, you would need an additional level of testing, like:
CASE INSTR(address, ', ul.', 1, 2)
WHEN 0 THEN address
ELSE SUBSTR(address, 1, INSTR(address, ', ul.', 1, 2) - 1)
END adress1
Demo on DB Fiddle
I have a field with following values, now i want to extract only those rows with "xyz" in the field value mentioned below, can you please help?
Mydata_xyz_aug21
Mydata2_zzz_aug22
Mydata3_xyz_aug33
One more requirement
I want to extract only "aIBM_MyProjectFile" from following string below, can you please help me with this?
finaldata/mydata/aIBM_MyProjectFile.exe.ld
I've tried this but it didn't work.
select
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','([^/]*)[\.]') exp
from dual;
To extract substrings between the first pair of underscores, you need to use
regexp_substr('Mydata_xyz_aug21','_([^_]+)_', 1, 1, NULL, 1)
To get the file name without the extension, you need
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','.*/([^.]+)', 1, 1, NULL, 1)
Note that each regex contains a capturing group (a pattern inside (...)) and this value is accessed with the last 1 argument to the regexp_substr function.
The _([^_]+)_ pattern finds the first _, then places 1 or more chars other than _ into Group 1 and then matches another _.
The .*/([^.]+) pattern matches the whole text up to the last /, then captures 1 or more chars other than . into Group 1 using ([^.]+).
For the first requirement, it would suffice to use LIKE, as posted in answer above:
SELECT column
FROM table
WHERE column LIKE '%xyz%';
For your second requirement (extraction) you will have to use REGEXP_SUBSTR function:
SELECT REGEXP_SUBSTR ('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld', '.*/([^.]+)', 1, 1, NULL, 1)
FROM DUAL
I hope it helped!
Another way to do this is to skip regexp completely:
WITH
aset AS
(SELECT 'with_extension.txt' txt FROM DUAL
UNION ALL
SELECT 'without_extension' FROM DUAL)
SELECT CASE
WHEN INSTR (txt, '.', -1) > 0
THEN
SUBSTR (txt, 1, INSTR (txt, '.', -1) - 1)
ELSE
txt
END
txt
FROM aset
The result of this is
with_extension
without_extension
A BIG Caveat where the regexp is better:
My method doesn't handle this case correctly:
\this\is.a\test
So after I have gone to all this effort, stay with the regexp solutions. I'll leave this here so that others may learn from it.
I am trying to separate first and last name . I have a column called 'Fullname' and it has first and last name and a comma all in one column. I've tried the below but I get an error " its not a valid number". When I remove the comma it works, so I am not sure how to incorporate a comma in the formula so it can work.
,substr(Fullname,1,',') as Lastname
,substr(Fullname,',',' ') as Firstname
Column
Fullname
Brown,John N
Green,Julie T
Desired results
Lastname FirstName
Brown John
Green Julie
You can use regexp_substr():
select regexp_substr(name, '[^,]+', 1, 1) as lastname,
regexp_substr(name, '[^, ]+', 1, 2) as firstname
The second argument to SUBSTR() is the position of the substring, the third argument is the length of the substring. It will not automatically search for a delimiter if you use strings there instead of numbers. You can use LOCATE() to find the positions that you want.
SUBSTR(Fullname, 1, LOCATE(Fullname, ',')-1) AS Lastname,
SUBSTR(Fullname, LOCATE(Fullname, ',')+1) AS Firstname
Can be performed in Classical way by using instr inside substr function as the following case :
select substr(fullname,1,instr(fullname,',')-1) Firstname,
substr(fullname,instr(fullname,',')+1,length(fullname)) Lastname
from tab;
SQL Fiddle Demo
I have a column name that represents a person's name in the following format:
firstname [middlename] lastname [, Sr.|Jr.]
For, example:
John Smith
John J. Smith
John J. Smith, Sr.
How can I order items by lastname?
A correct and faster version could look like this:
SELECT *
FROM tbl
ORDER BY substring(name, '([^[:space:]]+)(?:,|$)')
Or:
ORDER BY substring(name, E'([^\\s]+)(?:,|$)')
Or even:
ORDER BY substring(name, E'([^\\s]+)(,|$)')
Explain
[^[:space:]]+ .. first (and longest) string consisting of one or more non-whitespace characters.
(,|$) .. terminated by a comma or the end of the string.
The last two examples use escape-string syntax and the class-shorthand \s instead of the long form [[:space:]] (which loses the outer level of brackets when inside a character class).
We don't actually have to use non-capturing parenthesis (?:) after the part we want to extract, because (quoting the manual):
.. if the pattern contains any parentheses, the portion of the text that
matched the first parenthesized subexpression (the one whose left
parenthesis comes first) is returned.
Test
SELECT substring(name, '([^[:space:]]+)(?:,|$)')
FROM (VALUES
('John Smith')
,('John J. Smith')
,('John J. Smith, Sr.')
,('foo bar Smith, Jr.')
) x(name)
SELECT *
FROM t
ORDER BY substring(name, E'^.*\\s([^\\s]+)(?=,|$)') ASC
While this should provide the sorting you are looking for, it would be a lot cheaper to store the name in multiple columns and index them based on which parts of the name you need to sort by.
You should use functional index for this purpose
http://www.postgresql.org/docs/7.3/static/indexes-functional.html
In your case somehow....
CREATE INDEX test1_lastname_col1_idx ON test1 (split_part(col1, ' ', 3));
SELECT * FROM test1 ORDER BY split_part(col1, ' ', 3);
Does anyone know how to turn this string: "Smith, John R"
Into this string: "jsmith" ?
I need to lowercase everything with lower()
Find where the comma is and track it's integer location value
Get the first character after that comma and put it in front of the string
Then get the entire last name and stick it after the first initial.
Sidenote - instr() function is not compatible with my version
Thanks for any help!
Start by writing your own INSTR function - call it my_instr for example. It will start at char 1 and loop until it finds a ','.
Then use as you would INSTR.
The best way to do this is using Oracle Regular Expressions feature, like this:
SELECT LOWER(regexp_replace('Smith, John R',
'(.+)(, )([A-Z])(.+)',
'\3\1', 1, 1))
FROM DUAL;
That says, 1) when you find the pattern of any set of characters, followed by ", ", followed by an uppercase character, followed by any remaining characters, take the third element (initial of first name) and append the last name. Then make everything lowercase.
Your side note: "instr() function is not compatible with my version" doesn't make sense to me, as that function's been around for ages. Check your version, because Regular Expressions was only added to Oracle in version 9i.
Thanks for the points.
-- Stew
instr() is not compatible with your version of what? Oracle? Are you using version 4 or something?
There is no need to create your own function, and quite frankly, it seems a waste of time when this can be done fairly easily with sql functions that already exist. Care must be taken to account for sloppy data entry.
Here is another way to accomplish your stated goal:
with name_list as
(select ' Parisi, Kenneth R' name from dual)
select name
-- There may be a space after the comma. This will strip an arbitrary
-- amount of whitespace from the first name, so we can easily extract
-- the first initial.
, substr(trim(substr(name, instr(name, ',') + 1)), 1, 1) AS first_init
-- a simple substring function, from the first character until the
-- last character before the comma.
, substr(trim(name), 1, instr(trim(name), ',') - 1) AS last_name
-- put together what we have done above to create the output field
, lower(substr(trim(substr(name, instr(name, ',') + 1)), 1, 1)) ||
lower(substr(trim(name), 1, instr(trim(name), ',') - 1)) AS init_plus_last
from name_list;
HTH,
Gabe
I have a hard time believing you don’t have access to a proper instr() but if that’s the case, implement your own version.
Assuming you have that straightened out:
select
substr(
lower( 'Smith, John R' )
, instr( 'Smith, John R', ',' ) + 2
, 1
) || -- first_initial
substr(
lower( 'Smith, John R' )
, 1
, instr( 'Smith, John R', ',' ) - 1
) -- last_name
from dual;
Also, be careful about your assumption that all names will be in that format. Watch out for something other than a single space after the comma, last names having data like “Parisi, Jr.”, etc.