Retrieve Second to Last Word in PostgreSQL - sql

I am using PostgreSQL 9.5.1
I have an address field where I am trying to extract the street type (AVE, RD, ST, etc). Some of them are formatted like this: 5th AVE N or PEE DEE RD N
I have seen a few methods in PostgreSQL to count segments from the left based on spaces, e.g. split_part(name, ' ', 3), but I can't seem to find any built-in functions or regular expression examples where I can count the segments from the right.
My idea for moving forward is something along these lines:
select case when regexp_replace(name, '^.* ', '') = 'N'
then *grab the second to last group of string values*
end as type;

Leaving aside the issue of robustness of this approach when applied to address data, you can extract the penultimate space-delimited substring in a string like this:
with a as (
select string_to_array('5th AVE N', ' ') as addr
)
select
addr[array_length(addr, 1)-1] as street
from
a;
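To sanity-check the indexing outside the database, here is a small Python sketch of the same idea: split on spaces and take the element just before the last. The function name second_to_last_word is hypothetical, and returning None when there are fewer than two words is an assumption meant to mirror PostgreSQL returning NULL for an out-of-range array subscript.

```python
def second_to_last_word(text):
    """Return the penultimate space-delimited word, or None if there is none."""
    words = text.split(" ")  # split on single spaces, like string_to_array(text, ' ')
    if len(words) < 2:
        return None  # assumption: stands in for SQL NULL
    return words[-2]


print(second_to_last_word("5th AVE N"))     # AVE
print(second_to_last_word("PEE DEE RD N"))  # RD
```

Note that an address without a trailing direction (for example "5th AVE") would yield "5th", which is why the question wraps the extraction in a CASE on the last word.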

Related

Pulling a section of a string between two characters in SQL, and the section of the string around the extracted section

I have a table that includes names and allows for a "nickname" for each name in parentheses.
PersonName
John (Johnny) Hendricks
Zekeraya (Zeke) Smith
Ajamain Sterling (Aljo)
Beth ()) Jackson
I need to extract the nickname, and return a column of nicknames and a column of full names (the full string without the nickname portion in parentheses). I also need the nickname to be null if no nickname exists, and the nickname should only contain letters. So far I have been able to figure out how to get the nickname out using SUBSTRING, but I can't figure out how to create a separate column for just the name.
Select SUBSTRING(PersonName, CHARINDEX('(', PersonName) +1,(((LEN(PersonName))-CHARINDEX(')',REVERSE(PersonName)))-CHARINDEX('(',PersonName)))
as NickName
from dbo.Person
Any help would be appreciated. I'm using MS SQL Server 2019. I'm pretty new at this, as you can tell.
Using your existing substring, one simple way is to use apply.
Assuming your last row is an example of a nickname that should be NULL, you can use an inline if to check its length - presumably a nickname must be longer than 1 character? Adjust this logic as required.
select PersonName, Iif(Len(nn)<2,null,nn) NickName, Trim(Replace(Replace(personName, Concat('(',nn,')') ,''),'  ',' ')) FullName
from Person
cross apply (values(SUBSTRING(PersonName, CHARINDEX('(', PersonName) +1,(((LEN(PersonName))-CHARINDEX(')',REVERSE(PersonName)))-CHARINDEX('(',PersonName))) ))c(nn)
The following code will deal correctly with missing parentheses or empty strings.
Note how the first CROSS APPLY feeds into the next.
SELECT
PersonName,
NickName = NULLIF(NickName, ''),
FullName = ISNULL(REPLACE(personName, ' (' + NickName + ')', ''), PersonName)
FROM t
CROSS APPLY (VALUES(
NULLIF(CHARINDEX('(', PersonName), 0))
) v1(opening)
CROSS APPLY (VALUES(
SUBSTRING(
PersonName,
v1.opening + 1,
NULLIF(CHARINDEX(')', PersonName, v1.opening), 0) - v1.opening - 1
)
)) v2(NickName);
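The chained lookups can be traced step by step in a short Python sketch: find the opening parenthesis, find the first closing parenthesis after it, slice out what lies between, and strip " (nickname)" from the name. The function name split_nickname is hypothetical, and None stands in for SQL NULL.

```python
def split_nickname(person_name):
    """Return (nickname or None, full name without the parenthesised part)."""
    opening = person_name.find("(")           # like CHARINDEX('(', PersonName)
    if opening == -1:
        return None, person_name              # no parenthesis: keep the name as-is
    closing = person_name.find(")", opening)  # search only after the opening one
    if closing == -1:
        return None, person_name              # unbalanced: keep the name as-is
    nickname = person_name[opening + 1:closing]
    full_name = person_name.replace(" (" + nickname + ")", "")
    return (nickname or None), full_name      # empty nickname becomes None


for name in ("John (Johnny) Hendricks", "Ajamain Sterling (Aljo)", "Beth ()) Jackson"):
    print(split_nickname(name))
```

For "Beth ()) Jackson" this gives a null nickname and leaves "Beth) Jackson" as the full name, since only the first " ()" pair is removed; whether the stray parenthesis should also go is a data-cleaning decision the sketch does not make.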

Extract unmatched content or values

I want to extract the un-matched values in data like in (table1)
name id subject
maria 01 Math computer english
faro 02 Computer stat english
hina 03 Chemistry physics bio
The below query
Select *
from table1
where subject like '%english%' or
subject like '%stat%'
returns first two rows that are matched with the criteria.
But I just need to extract the un-matched values from column (subject) like below output
unmatched
math computer
computer
chemistry physics bio
(Because in the first row only math computer values are not matching, in the second row two matches and in third row there are no matches).
Can I get that output?
With REPLACE you eliminate all occurrences of the values 'english' and/or 'stat':
SELECT
trim(
replace(replace(replace(subject, 'english', ''), 'stat', ''), '  ', ' ')
) unmatched
FROM tablename;
The final trim and replace will remove double spaces from the result and spaces from the start and the end.
You have a poor table design. You should be storing lists as separate rows in another table -- a so-called "junction" or "association" table. SQL has a great data type for storing lists. It is called a "table" not a "string".
That said, sometimes we are stuck with other people's really, really bad choices of data model.
If so, you can use replace() and trim() to get the list you want. I would do:
SELECT trim(replace(replace(replace(' ' || subject || ' ', ' english ', ' '
), ' stat ', ' '
), '  ', ' ')
) as unmatched
FROM tablename;
This easily generalizes to more values, without worrying about introducing adjacent spaces.
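A Python sketch of the padding trick may make the behaviour easier to check: the string is wrapped in spaces so that every word, including the first and last, is delimited by a space on both sides, and each unwanted word is then replaced together with its delimiters by a single space. The function name unmatched and the default word list are assumptions for illustration.

```python
def unmatched(subject, words=("english", "stat")):
    """Remove whole words from a space-separated list, keeping single spaces."""
    padded = " " + subject + " "  # every word now has a space on both sides
    for word in words:
        padded = padded.replace(" " + word + " ", " ")
    return padded.strip()


for row in ("Math computer english", "Computer stat english", "Chemistry physics bio"):
    print(unmatched(row))
```

Because the delimiters are part of the search string, only whole words match: "statistics english" becomes "statistics", whereas a plain replace of 'stat' would leave "istics". Like the SQL, the comparison is case-sensitive.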

How to get all the string before an empty space in postgres?

So basically I need to get a part of the postcode string, all before an empty space
my current query is
select postcode, left(postcode, length(postcode) - strpos(' ', postcode))
from postcodetable
But it just doesn't return the post code correctly, example:
1st column is NW1 1AA, 2nd column should just be NW1 but instead it just repeats the first column
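Your arguments to strpos() are in the wrong order. Note also that length(postcode) - strpos(postcode, ' ') is the number of characters after the space, which only happens to equal the length of the prefix for 'NW1 1AA'. To take everything before the first space, use the position itself:
select postcode, left(postcode, strpos(postcode, ' ') - 1)
from (values ('NW1 1AA')) v(postcode);
If a value might contain no space at all, split_part(postcode, ' ', 1) returns the whole string in that case.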
You can also do this using substring() with regular expressions:
select postcode, substring(postcode from '[^ ]+'::text)
from postcodetable;
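The difference between the possible expressions shows up as soon as the part before the space is not three characters long. The Python sketch below mirrors three of them; the function names are hypothetical, and falling back to the whole string when there is no space is a choice made for this sketch, not what left() does with a negative length.

```python
import re


def length_minus_position(postcode):
    # Mirrors left(s, length(s) - strpos(s, ' ')): keeps as many characters
    # as there are AFTER the first space.
    pos = postcode.find(" ") + 1  # 1-based like strpos(); 0 if there is no space
    return postcode[:len(postcode) - pos]


def before_first_space(postcode):
    # Mirrors left(s, strpos(s, ' ') - 1); whole string if there is no space
    pos = postcode.find(" ") + 1
    return postcode[:pos - 1] if pos > 0 else postcode


def first_run_of_non_spaces(postcode):
    # Mirrors substring(s from '[^ ]+')
    match = re.search(r"[^ ]+", postcode)
    return match.group(0) if match else None


for value in ("NW1 1AA", "SW1A 1AA"):
    print(value, "|", length_minus_position(value), "|",
          before_first_space(value), "|", first_run_of_non_spaces(value))
```

For 'NW1 1AA' all three agree on 'NW1'; for 'SW1A 1AA' the length-based version returns 'SW1' while the other two return 'SW1A'.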

Get index of two consecutive upper case characters

I am trying to separate a city/state/zip field into the city, state, and zip. Normally I would do this with charindex of ',' to get the city and state, and isnumeric and right() for the zip.
This will work fine for the zip, but most of the rows in the data I am working with now have no commas, just City ST Zip. Is there a way to identify the index of two upper case characters?
If not, does anybody have a better idea than just a case statement checking for each state individually?
EDIT: I found the PATINDEX/COLLATE option to work fairly intermittently. See my answer below.
PATINDEX should work for you:
PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as)
So your full extract would be something like:
WITH CTE AS
( SELECT i = PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as) + 1,
A
FROM (VALUES
('City ST Zip'),
('Another City ST Zip'),
('City, with comma ST Zip')
) t (A)
)
SELECT City = LEFT(A, i - 2),
State = SUBSTRING(A, i, 2),
Zip = SUBSTRING(A, i + 3, LEN(A))
FROM CTE;
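The slicing arithmetic in that query can be checked with a short Python sketch. Note the assumptions: Python's re module treats [A-Z] as strictly case-sensitive ASCII, which is not how a character range behaves under every SQL Server collation, so this only illustrates the index arithmetic, not PATINDEX itself. The function name split_city_state_zip is hypothetical.

```python
import re


def split_city_state_zip(value):
    """Split 'City ST Zip' on the first space-delimited pair of capitals."""
    match = re.search(r" [A-Z]{2} ", value)
    if match is None:
        return None                     # no state abbreviation found
    state_start = match.start() + 1     # skip the space before the state
    city = value[:match.start()]        # everything before that space
    state = value[state_start:state_start + 2]
    zip_code = value[state_start + 3:]  # skip the state and the space after it
    return city, state, zip_code


for row in ("City ST Zip", "Another City ST Zip", "City, with comma ST Zip"):
    print(split_city_state_zip(row))
```

As in the SQL version, a comma that sits directly before the state stays attached to the city and would need to be stripped separately.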
The reason why PATINDEX appears to work intermittently is that you cannot use a character range (i.e. A-Z) to accomplish a case-sensitive search, even if using a case-sensitive collation. The issue is that character ranges work like sorting, and case-sensitive sorting groups the upper-case letters with their lower-case equivalents, just like it would be ordered in a dictionary. Range sorting is really: a,A,b,B,c,C,d,D,etc. Or, depending on the collation, it might be: A,a,B,b,C,c,D,d,etc (there are 31 Collations that sort upper-case first). When doing this in a case-sensitive collation, that merely groups all A entries together, separate from the a entries, whereas in a case-insensitive sort they would be intermixed.
But if you specify each of the letters individually (hence not using a range), then it will work as expected:
PATINDEX(N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
[CityStZip] COLLATE Latin1_General_100_CS_AS)
The reason that PATINDEX and LIKE (both of which allow for a single character class of [A-Z]) work this way is that the [start-end] syntax is not a Regular Expression. Many people claim that PATINDEX and LIKE support "limited" RegEx due to supporting this syntax, but that is not true. It is merely a confusingly similar syntax to RegEx, where [A-Z] would normally not include any lower-case matches.
Of course, if you are guaranteed to only be searching on the US-English letters of A-Z, then a binary collation (i.e. one ending in _BIN2; don't use ones ending in _BIN as they have been deprecated since SQL Server 2005 was introduced, I believe) should work.
PATINDEX(N'%[A-Z][A-Z]%', [CityStZip] COLLATE Latin1_General_100_BIN2)
For more details about case-sensitive matching, especially in regards to including Unicode / NVARCHAR data, please see my related answer on DBA.StackExchange:
How to find values with multiple consecutive upper case characters
If you have zip code and state at the end of the string, then this might work:
select right(address, 5) as zip,
left(right(address, 8), 2) as state,
left(address, len(address) - 9) as city
You can start by removing the commas and double spaces from the address.
If you have a table of states (which you should) with a column of the abbreviations, you can do things like this:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
a.CityStateZip Like '% ' + s.UpperCaseAbbreviation + ' %' --space on either side of abbreviation
You can make it work for both commas and spaces:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
Replace(a.CityStateZip, ',' , ' ') Like '% ' + s.UpperCaseAbbreviation + ' %'
I found the PATINDEX/COLLATE option to work fairly intermittently. Here is what I ended up doing:
--get rid of the sparsely used commas
--get rid of the duplicate spaces
update MyTable set
CityStZip=
replace(
replace(
replace(CityStZip,'  ',' '),
'  ',' '),
',','')
select
--check if state and zip are there and then grab the city
case when isNumeric(right(CityStZip,1))=1
then left(CityStZip,len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+1)
--no zip. check for state
when left(right(CityStZip,3),1) = ' '
then left(CityStZip,len(CityStZip)-charIndex(' ',reverse(CityStZip)))
else CityStZip
end as City,
--check if zip is there and then grab the state
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+2,
2)
--no zip. check if 3rd to last char is a space and grab the last two chars
when left(right(CityStZip,3),1) = ' '
then right(CityStZip,2)
end as [State],
--grab everything after the last space if the last character is numeric
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip))+1,
charindex(' ',reverse(CityStZip)))
end as Zip
from MyTable

How to delete all non-numerical letters in db2

I have some data in DATA column (varchar) that looks like this:
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
What I want is this:
7485
2764
3003
2620
6960
2229
3798
Is there a way in IBM DB2 version 9.5 to remove/delete all those non-numeric letters by doing something like this:
SELECT replace(DATA, --somekind of regular expression--, '') FROM TABLE_A
or any other ways?
This question follows from this question.
As suggested in the other question, the TRANSLATE function might help. For example, try this:
select translate('Nowshak 7,485 m','','Nowshakm,') from sysibm.sysdummy1;
Returns:
7 485
Probably with a little tweaking you can get it to how you want it...in the third argument of the function you just need to specify the entire alphabet. Kind of ugly but it will work.
One easy way to accomplish that is to use the TRANSLATE(value, replacewith, replacelist) function. It replaces all of the characters in a list (third parameter) with the value in the second parameter.
You can leverage that to essentially erase all of the non-numeric characters from the character string, including the spaces.
Just make the list in the third parameter contain all of the possible characters you might see that you don't want. Translate those to an empty space, and you end up with just the characters you want, essentially erasing the undesired characters.
Note: I included all of the common symbols (non-alpha numeric) for the benefit of others who may have character values of a larger variety than your example.
Select
TRANSLATE(UCASE(CHAR_COLUMN),'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()-=+/\{}[];:.,<>? ')
FROM TABLE_A
More simply: For your particular set of values, since there is a much smaller set of possible characters you could trim the replace list down to this:
Select
TRANSLATE(UCASE(CHAR_COLUMN),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ(), ')
FROM TABLE_A
NOTE: The UCASE on CHAR_COLUMN is not strictly necessary, but it simplifies the expression by eliminating the need to list all of the lower-case letters. Without it, the call would look like this:
TRANSLATE(CHAR_COLUMN,'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!@#$%^&*()-=+/\{}[];:.,<>? ')
As in many of the answers above, your best approach is to use the TRANSLATE function. However, this approach is different in that you whitelist the characters you want instead of blacklisting the characters you don't want. We can do this by using the TRANSLATE function twice: the inner TRANSLATE generates the list of characters to remove, which then becomes a parameter of the outer TRANSLATE.
select TRANSLATE(dirty,'',TRANSLATE(dirty,'','1234567890',''),'') as clean
from (Values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m','Grossglockner 3,798 m'
) as temp(dirty)
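The two-pass idea is easier to follow in isolation. In this Python sketch, str.translate with a deletion table plays the role of TRANSLATE with an empty replacement: the first pass deletes the whitelisted characters, leaving exactly the characters to remove, and the second pass deletes those from the original string. The function name keep_only is hypothetical.

```python
def keep_only(dirty, keep="1234567890"):
    """Whitelist filter built from two deletion passes."""
    # Pass 1: delete the characters we want to keep; what is left is the
    # list of characters that must be removed.
    to_remove = dirty.translate(str.maketrans("", "", keep))
    # Pass 2: delete that list from the original string.
    return dirty.translate(str.maketrans("", "", to_remove))


samples = [
    "Nowshak 7,485 m",
    "Maja e Korabit (Golem Korab) 2,764 m",
    "Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)",
]
for sample in samples:
    print(keep_only(sample))
```

The removal list is derived from each input value, so no alphabet or symbol list has to be maintained by hand.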
Just taking @Darryls99's answer and turning it into a UDF:
CREATE OR REPLACE FUNCTION REMOVE_ALLBUT(in_string VARCHAR(32000), characters_to_keep VARCHAR(32000))
RETURNS VARCHAR(32000)
LANGUAGE SQL CONTAINS SQL DETERMINISTIC NO EXTERNAL ACTION
RETURN
TRANSLATE(in_string,'',TRANSLATE(in_string,'',characters_to_keep,''),'')
;
Use it like this:
select REMOVE_ALLBUT(s,'1234567890')
from (values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m'
,'Grossglockner 3,798 m'
) t(s);
which returns
1
----
7485
2764
3003
2620
6960
2229
3798
A dirty string can look like this: 'qwerty12453lala<<>777*9'
We need to clean the string and keep only the digits.
We could remove the unwanted characters with the TRANSLATE function,
but there is one problem: the third parameter becomes long and ugly.
Something like this:
VALUES
(
TRANSLATE( UPPER('qwerty12453lala<<>777*9'), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()-=+/\{}[];:.,<>? ')
)
So, this is not very convenient.
My idea is to use the TRANSLATE function twice (one call inside the other):
1. Calculate the third parameter as the list of characters to be replaced.
2. Use TRANSLATE a second time to replace the unwanted characters, using this calculated parameter.
Let me show you in code:
VALUES
(
REPLACE --Remove spaces from result
(
TRANSLATE
(
UPPER( 'qwerty12453lala<<>777*9')
, ' '
, TRANSLATE( UPPER( 'qwerty12453lala<<>777*9') , ' ' , '0123456789') -- computes the third parameter: only the non-digit characters, like 'QWERTYLALA<<>*' (with spaces where the digits were)
)
, ' '
, ''
)
)
The result is: 124537779
In a SELECT statement, it would look like this:
SELECT REPLACE
(
TRANSLATE( UPPER(T.DIRTY_FIELD), ' ', TRANSLATE(UPPER(T.DIRTY_FIELD), '', '1234567890' ) )
, ' '
, ''
)
FROM SOMETABLE T
The proper combination of all of the above:
select replace(translate(dirty,' ','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!@#$%^&*()-=+/{}[];:.,<>?' ), ' ','') as clean
Is there a way in IBM DB2 version 9.5 to remove/delete all those non-numeric letters by doing something like this:
SELECT replace(DATA, --somekind of regular expression--, '') FROM TABLE_A
or any other ways?
No. You will have to create a User Defined Function or implement it in your host application's language.
The statement below will remove non-alphanumeric characters from any 'string-value' and prevents the SQLSTATE message 42815 when a zero length string-value is passed.
SELECT REPLACE(TRANSLATE(string-value || '|',
'||||||||||||||||||||||||||||||||',
'`¬!"£$%^&*()_-+={[}]:;@~#,<>.?/'''),'|','')
FROM SYSIBM.SYSDUMMY1;