Postgres: order data by part of string - sql

I have a column name that represents a person's name in the following format:
firstname [middlename] lastname [, Sr.|Jr.]
For example:
John Smith
John J. Smith
John J. Smith, Sr.
How can I order items by lastname?

A correct and faster version could look like this:
SELECT *
FROM tbl
ORDER BY substring(name, '([^[:space:]]+)(?:,|$)')
Or:
ORDER BY substring(name, E'([^\\s]+)(?:,|$)')
Or even:
ORDER BY substring(name, E'([^\\s]+)(,|$)')
Explain
[^[:space:]]+ .. first (and longest) string consisting of one or more non-whitespace characters.
(,|$) .. terminated by a comma or the end of the string.
The last two examples use escape-string syntax and the class-shorthand \s instead of the long form [[:space:]] (which loses the outer level of brackets when inside a character class).
We don't actually have to use non-capturing parentheses (?:) after the part we want to extract, because (quoting the manual):
.. if the pattern contains any parentheses, the portion of the text that
matched the first parenthesized subexpression (the one whose left
parenthesis comes first) is returned.
Test
SELECT substring(name, '([^[:space:]]+)(?:,|$)')
FROM (VALUES
('John Smith')
,('John J. Smith')
,('John J. Smith, Sr.')
,('foo bar Smith, Jr.')
) x(name)
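To check the pattern outside the database, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way PostgreSQL does, which is an approximation since the two regex engines differ. The helper name last_name and the sample names are made up for illustration.

```python
import re

# Same idea as the SQL pattern: a run of non-whitespace characters
# that is followed by a comma or by the end of the string.
LAST_NAME = re.compile(r"([^\s]+)(?:,|$)")

def last_name(name):
    """Return the first captured group, or None when nothing matches."""
    match = LAST_NAME.search(name)
    return match.group(1) if match else None

names = ["John Smith", "Adam J. Brown, Sr.", "foo bar Clark, Jr."]

# Sort by the extracted last name, as the ORDER BY clause would.
for n in sorted(names, key=last_name):
    print(last_name(n), "<-", n)
```

The sort key plays the role of the ORDER BY expression: the rows come out ordered by Brown, Clark, Smith.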

SELECT *
FROM t
ORDER BY substring(name, E'^.*\\s([^\\s]+)(?=,|$)') ASC
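This provides the requested sort order for names without a suffix. Note, however, that the leading .* is greedy, so for a value such as 'John J. Smith, Sr.' the capture group picks up 'Sr.' instead of 'Smith'. In any case, it would be a lot cheaper to store the name in multiple columns and index them based on which parts of the name you need to sort by.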
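Before relying on a pattern that starts with ^.*\s, it may be worth checking how the greedy prefix treats names with a suffix. The following is a rough Python sketch, not a PostgreSQL test: Python's re is a different engine, although for this pattern both prefer the longest possible prefix. The function name captured is hypothetical.

```python
import re

# Greedy prefix, then the last whitespace-delimited token that is
# followed by a comma or by the end of the string.
GREEDY = re.compile(r"^.*\s([^\s]+)(?=,|$)")

def captured(name):
    """Return what the capture group holds, or None if there is no match."""
    match = GREEDY.search(name)
    return match.group(1) if match else None

print(captured("John Smith"))          # Smith
print(captured("John J. Smith, Sr."))  # Sr.
print(captured("Smith"))               # None (the pattern requires a whitespace character)
```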

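You should use a functional index (an index on an expression) for this purpose:
http://www.postgresql.org/docs/7.3/static/indexes-functional.html
In your case, something like this, assuming the last name is always the third space-separated field: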
CREATE INDEX test1_lastname_col1_idx ON test1 (split_part(col1, ' ', 3));
SELECT * FROM test1 ORDER BY split_part(col1, ' ', 3);

Related

Pulling a section of a string between two characters in SQL, and the section of the string around the extracted section

I have a table that includes names and allows for a "nickname" for each name in parentheses.
PersonName
John (Johnny) Hendricks
Zekeraya (Zeke) Smith
Ajamain Sterling (Aljo)
Beth ()) Jackson
I need to extract the nickname and return a column of nicknames and a column of full names (the full string without the nickname portion in parentheses). The nickname should be null if no nickname exists, and it should only contain letters. So far I have been able to figure out how to get the nickname out using SUBSTRING, but I can't figure out how to create a separate column for just the name.
Select SUBSTRING(PersonName, CHARINDEX('(', PersonName) +1,(((LEN(PersonName))-CHARINDEX(')',REVERSE(PersonName)))-CHARINDEX('(',PersonName)))
as NickName
from dbo.Person
Any help would be appreciated. I'm using MS SQL Server 2019. I'm pretty new at this, as you can tell.
Using your existing substring, one simple way is to use apply.
Assuming your last row is an example of a nickname that should be NULL, you can use an inline if to check its length - presumably a nickname must be longer than 1 character? Adjust this logic as required.
select PersonName, Iif(Len(nn)<2,null,nn) NickName, Trim(Replace(Replace(personName, Concat('(',nn,')') ,''),'  ',' ')) FullName
from Person
cross apply (values(SUBSTRING(PersonName, CHARINDEX('(', PersonName) +1,(((LEN(PersonName))-CHARINDEX(')',REVERSE(PersonName)))-CHARINDEX('(',PersonName))) ))c(nn)
The following code will deal correctly with missing parentheses or empty strings.
Note how the first CROSS APPLY feeds into the next:
SELECT
PersonName,
NULLIF(NickName, ''),
FullName = ISNULL(REPLACE(personName, ' (' + NickName + ')', ''), PersonName)
FROM t
CROSS APPLY (VALUES(
NULLIF(CHARINDEX('(', PersonName), 0))
) v1(opening)
CROSS APPLY (VALUES(
SUBSTRING(
PersonName,
v1.opening + 1,
NULLIF(CHARINDEX(')', PersonName, v1.opening), 0) - v1.opening - 1
)
)) v2(NickName);
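The chained logic above (find the opening parenthesis, find the first closing parenthesis after it, then cut the nickname out) can be sketched in Python to make the NULL handling easier to follow. This is only an illustration: split_nickname is a hypothetical helper, and None stands in for SQL NULL.

```python
def split_nickname(person_name):
    """Return (nickname, full_name); nickname is None when absent or empty."""
    opening = person_name.find("(")
    if opening == -1:
        # No opening parenthesis: no nickname, name unchanged.
        return None, person_name
    closing = person_name.find(")", opening)
    if closing == -1:
        # Opening parenthesis without a closing one: treat as no nickname.
        return None, person_name
    nickname = person_name[opening + 1:closing]
    # Remove the nickname together with the space in front of it.
    full_name = person_name.replace(" (" + nickname + ")", "")
    return (nickname or None), full_name

for name in ["John (Johnny) Hendricks", "Ajamain Sterling (Aljo)", "Beth ()) Jackson"]:
    print(split_nickname(name))
```

Like the SQL, this does not yet enforce the "letters only" requirement from the question; that would need an extra check on the extracted nickname.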

Teradata Parsing Full Name field sql

I have a column with a name value with a data type of char(64) LATIN in a Teradata table. The values look like 'SMITH JOHN J ', 'Doe Jane Anne ', etc. The spaces between the elements vary from value to value. I am able to parse out the last name with LEFT, but I am having trouble parsing out the first name and middle initial/name. I have tried using the index and position functions, but I am not getting the desired result. Has anyone encountered a similar scenario?
You could use regexp_substr() and adjust the occurrence argument, which specifies the number of the occurrence to return:
select
regexp_substr(name, '\w+', 1, 1) last_name,
regexp_substr(name, '\w+', 1, 2) first_name,
regexp_substr(name, '\w+', 1, 3) middle_name
from mytable
In PCRE notation, which Teradata uses, \w matches word characters (alphanumerics and the underscore). You might want to make the regex a little broader with \S+ (a run of anything but whitespace), so that names containing apostrophes or hyphens are not split.
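The difference between \w+ and \S+ is easy to see with a small Python sketch. It only imitates the "n-th occurrence" idea with re.findall and is not Teradata code; nth_token is a hypothetical helper, and the second sample name is invented to show the effect of punctuation.

```python
import re

def nth_token(text, pattern, occurrence):
    """Return the occurrence-th (1-based) match of pattern, or None."""
    tokens = re.findall(pattern, text)
    return tokens[occurrence - 1] if occurrence <= len(tokens) else None

padded = "SMITH   JOHN  J      "
print(nth_token(padded, r"\w+", 1))  # SMITH
print(nth_token(padded, r"\w+", 2))  # JOHN
print(nth_token(padded, r"\w+", 3))  # J

# \w stops at apostrophes and hyphens, \S does not.
tricky = "O'BRIEN MARY-ANN K"
print(nth_token(tricky, r"\w+", 1))  # O
print(nth_token(tricky, r"\S+", 1))  # O'BRIEN
```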

How to split string with inconsistent order format in SQL

I want to split the strings in the 'Scorer' column so that the scorer name is retained but not the score type (i.e. to remove the text within the brackets and the brackets to just leave the scorer name in that field).
Scorer
Ellis J.(Conversion Goal)
Ellis J.(Try)
Ellis J.(Conversion Goal)
Trueman J.(Try)
(Conversion Goal)Brough D.
(Try)McGillvary J.
(Try)McGillvary J.
(Penalty Goal)Brough D.
Ellis J.(Conversion Goal)
It should look like the below.
Scorer
Ellis J.
Ellis J.
Ellis J.
Trueman J.
Brough D.
McGillvary J.
McGillvary J.
Brough D.
Ellis J.
The correct solution would be to fix the database structure by adding another column to the table for the score type. In fact, you should probably have a table for score types and add a foreign key to it from this table.
Assuming you can't change the database structure, this is better done at the presentation layer. Any programming language should enable you to do it quite easily. String manipulation is not SQL's strong suit.
That being said, it can certainly be done using pure T-SQL - with a simple common table expression to get the brackets indexes using charindex, and a case expression with stuff in the select statement.
First, create and populate sample table (Please save us this step in your future questions):
CREATE TABLE #T
(
Scorer nvarchar(100)
);
INSERT INTO #T (Scorer) VALUES
('Ellis J.(Conversion Goal)'),
('Ellis J.(Try)'),
('Ellis J.(Conversion Goal)'),
('Trueman J.(Try)'),
('(Conversion Goal)Brough D.'),
('(Try)McGillvary J.'),
('(Try)McGillvary J.'),
('(Penalty Goal)Brough D.'),
('Ellis J.(Conversion Goal)'),
-- Note: I've added some edge cases to the sample data:
('a row with (brackets) in the middle'),
('Just an open bracket (forgot to close '),
('Just a close bracket forgot to open)'),
('no brackets at all'),
('brackets ) in reversed order (');
Then, the CTE:
WITH CTE AS
(
SELECT Scorer,
CHARINDEX('(', Scorer) As OpenBrackets,
CHARINDEX(')', Scorer) As CloseBrackets
FROM #T
)
The select statement:
SELECT CASE WHEN OpenBrackets > 0 AND CloseBrackets > OpenBrackets
THEN
STUFF(Scorer, OpenBrackets, CloseBrackets - OpenBrackets + 1, '')
ELSE
Scorer
END As Scorer
FROM CTE
Results:
Scorer
Ellis J.
Ellis J.
Ellis J.
Trueman J.
Brough D.
McGillvary J.
McGillvary J.
Brough D.
Ellis J.
a row with in the middle
Just an open bracket (forgot to close
Just a close bracket forgot to open)
no brackets at all
brackets ) in reversed order (
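As the answer notes, this kind of clean-up is often easier at the presentation layer. A minimal Python sketch of the same rule (remove the span from the first opening bracket through the first closing bracket, but only when the closing bracket comes after the opening one) might look like this; strip_bracketed is a hypothetical name.

```python
def strip_bracketed(scorer):
    """Remove the first '(...)' span; leave the text alone if it is malformed."""
    open_idx = scorer.find("(")
    close_idx = scorer.find(")")
    if open_idx != -1 and close_idx > open_idx:
        # Equivalent of STUFF(Scorer, open, close - open + 1, '').
        return scorer[:open_idx] + scorer[close_idx + 1:]
    return scorer

samples = [
    "Ellis J.(Conversion Goal)",
    "(Try)McGillvary J.",
    "Just an open bracket (forgot to close ",
    "brackets ) in reversed order (",
]
for s in samples:
    print(repr(strip_bracketed(s)))
```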
The query below should work for you:
SELECT LTRIM(RTRIM(REPLACE(Scorer, SUBSTRING(Scorer, CHARINDEX('(', Scorer), CHARINDEX(')', Scorer) - CHARINDEX('(', Scorer) + 1), '')))
FROM <TABLENAME>
These two pieces of information (the name and action) should not be in the same column. You should create a separate column for name and for action. And if the position of the action (before or after the name) is important, you might even need an additional column for that.
When you have migrated your data after that - in other words when you have cleaned up - you could still create a view or a computed column to output the scorer the way you do now, for example
ALTER TABLE my_table ADD scorer AS athlete_name + ' (' + action + ')'
You could try:
SELECT Scorer
,CASE WHEN PATINDEX('%(%)%',Scorer) > 1
THEN LEFT(Scorer, PATINDEX('%(%)%',Scorer)-1)
ELSE RIGHT (Scorer, LEN(Scorer) - CHARINDEX(')',Scorer,1) )
END AS ColumnName
FROM ScoreTable
This should work assuming you only expect one instance of the pattern per row, and it will work whether the "()" data is at the front or the back of the values.
You can use this query
with t(str) as
(
select 'Ellis J.(Conversion Goal)' union all
select '(Conversion Goal)Brough D.' union all
select ' (Try)McGillvary J.'
)
select (case when charindex('(', ltrim(str)) = 1 then
substring(str,charindex(')', str)+1,len(str))
else
left(str, charindex('(', str) - 1)
end) as "Scorers"
from t
Scorers
--------------
Ellis J.
Brough D.
McGillvary J.
This combines the substring, charindex and left functions. ltrim guards against spaces that may precede the ( character at the beginning of the string.

substr Error -- ORA-01722: invalid number separate string

I am trying to separate the first and last name. I have a column called 'Fullname' and it has the first and last name and a comma all in one column. I've tried the below, but I get an error saying it is "not a valid number". When I remove the comma it works, so I am not sure how to incorporate a comma in the formula so it can work.
,substr(Fullname,1,',') as Lastname
,substr(Fullname,',',' ') as Firstname
Column
Fullname
Brown,John N
Green,Julie T
Desired results
Lastname FirstName
Brown John
Green Julie
You can use regexp_substr():
select regexp_substr(name, '[^,]+', 1, 1) as lastname,
regexp_substr(name, '[^, ]+', 1, 2) as firstname
The second argument to SUBSTR() is the starting position of the substring, and the third argument is its length. It will not automatically search for a delimiter if you pass strings there instead of numbers; Oracle tries to convert ',' to a number, which is what raises ORA-01722. You can use INSTR() to find the positions that you want:
SUBSTR(Fullname, 1, INSTR(Fullname, ',')-1) AS Lastname,
SUBSTR(Fullname, INSTR(Fullname, ',')+1) AS Firstname
This can be done in the classical way by using instr inside the substr function, as follows:
select substr(fullname,1,instr(fullname,',')-1) Lastname,
substr(fullname,instr(fullname,',')+1,length(fullname)) Firstname
from tab;
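To try the substr/instr approach without an Oracle instance, here is a small Python sketch that runs the same expressions in an in-memory SQLite database. It assumes that SQLite's substr() and instr() behave like Oracle's for these arguments, which holds for this simple case but is not guaranteed in general. Trimming the middle initial is done in Python here and is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (fullname TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?)",
    [("Brown,John N",), ("Green,Julie T",)],
)

rows = conn.execute(
    """
    SELECT substr(fullname, 1, instr(fullname, ',') - 1) AS lastname,
           substr(fullname, instr(fullname, ',') + 1)    AS rest
    FROM tab
    ORDER BY lastname
    """
).fetchall()

# Everything after the comma still contains the middle initial,
# so keep only the first word as the first name.
result = [(last, rest.split()[0]) for last, rest in rows]
print(result)  # [('Brown', 'John'), ('Green', 'Julie')]
```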

Retrieve Second to Last Word in PostgreSQL

I am using PostgreSQL 9.5.1
I have an address field where I am trying to extract the street type (AVE, RD, ST, etc). Some of them are formatted like this: 5th AVE N or PEE DEE RD N
I have seen a few methods in PostgreSQL to count segments from the left based on spaces i.e. split_part(name, ' ', 3), but I can't seem to find any built-in functions or regular expression examples where I can count the characters from the right.
My idea for moving forward is something along these lines:
select case when regexp_replace(name, '^.* ', '') = 'N'
then *grab the second to last group of string values*
end as type;
Leaving aside the issue of robustness of this approach when applied to address data, you can extract the penultimate space-delimited substring in a string like this:
with a as (
select string_to_array('5th AVE N', ' ') as addr
)
select
addr[array_length(addr, 1)-1] as street
from
a;
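The same "count from the right" idea is straightforward outside SQL as well. A minimal Python sketch, using a hypothetical helper second_to_last and returning None where the SQL would yield NULL:

```python
def second_to_last(text, sep=" "):
    """Return the penultimate sep-delimited token, or None if there is none."""
    parts = text.split(sep)
    return parts[-2] if len(parts) >= 2 else None

print(second_to_last("5th AVE N"))     # AVE
print(second_to_last("PEE DEE RD N"))  # RD
print(second_to_last("MAIN"))          # None
```

As with string_to_array, consecutive separators produce empty tokens, so the input may need normalizing first if the spacing is inconsistent.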