Get index of two consecutive upper case characters - sql

I am trying to separate a city/state/zip field into the city, state, and zip. Normally I would do this with charindex of ',' to get the city and state, and isnumeric and right() for the zip.
That still works for the zip, but most of the rows in the data I am working with now have no commas (City ST Zip). Is there a way to identify the index of two upper case characters?
If not, does anybody have a better idea than just a case statement checking for each state individually?
EDIT: I found the PATINDEX/COLLATE option to work fairly intermittently. See my answer below.

PATINDEX should work for you:
PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as)
So your full extract would be something like:
WITH CTE AS
(   SELECT  i = PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as) + 1,
            A
    FROM    (VALUES
                ('City ST Zip'),
                ('Another City ST Zip'),
                ('City, with comma ST Zip')
            ) t (A)
)
SELECT  City = LEFT(A, i - 2),
        State = SUBSTRING(A, i, 2),
        Zip = SUBSTRING(A, i + 3, LEN(A))
FROM    CTE;

The reason why PATINDEX appears to work intermittently is that you cannot use a character range (i.e. A-Z) to accomplish a case-sensitive search, even if using a case-sensitive collation. The issue is that character ranges work like sorting, and case-sensitive sorting groups the upper-case letters with their lower-case equivalents, just like it would be ordered in a dictionary. Range sorting is really: a,A,b,B,c,C,d,D,etc. Or, depending on the collation, it might be: A,a,B,b,C,c,D,d,etc (there are 31 Collations that sort upper-case first). When doing this in a case-sensitive collation, that merely groups all A entries together, separate from the a entries, whereas in a case-insensitive sort they would be intermixed.
But if you specify each of the letters individually (hence not using a range), then it will work as expected:
PATINDEX(N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
[CityStZip] COLLATE Latin1_General_100_CS_AS)
The reason that PATINDEX and LIKE (both of which accept a single-character class such as [A-Z]) work this way is that the [start-end] syntax is not a Regular Expression. Many people claim that PATINDEX and LIKE support "limited" RegEx because of this syntax, but that is not true. The syntax is merely (and confusingly) similar to RegEx, where [A-Z] would normally not match any lower-case letters.
Of course, if you are guaranteed to only be searching on the US-English letters of A-Z, then a binary collation (i.e. one ending in _BIN2; don't use ones ending in _BIN as they have been deprecated since SQL Server 2005 was introduced, I believe) should work.
PATINDEX(N'%[A-Z][A-Z]%', [CityStZip] COLLATE Latin1_General_100_BIN2)
For more details about case-sensitive matching, especially in regards to including Unicode / NVARCHAR data, please see my related answer on DBA.StackExchange:
How to find values with multiple consecutive upper case characters
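To see the difference between the three patterns above, it can help to run them side by side against one value. This is only a sketch: the sample string and the column aliases are made up for illustration, and the position returned by the range pattern depends on the collation's sort order.

```sql
-- Hypothetical sample value; the aliases are illustrative only.
SELECT v.CityStZip,
       -- range under a case-sensitive collation: lower-case letters can fall inside A-Z
       PATINDEX('%[A-Z][A-Z]%',
                v.CityStZip COLLATE Latin1_General_100_CS_AS) AS range_cs,
       -- explicit list under the same collation: only the listed upper-case letters match
       PATINDEX('%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
                v.CityStZip COLLATE Latin1_General_100_CS_AS) AS explicit_cs,
       -- range under a binary collation: compares code points, so A-Z is upper-case only
       PATINDEX('%[A-Z][A-Z]%',
                v.CityStZip COLLATE Latin1_General_100_BIN2) AS range_bin2
FROM (VALUES ('Dallas TX 75201')) AS v (CityStZip);
```

explicit_cs and range_bin2 should both point at the T of TX (position 8). If the sorting explanation above holds, range_cs should land earlier, inside the lower-case part of Dallas.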

If you have zip code and state at the end of the string, then this might work:
select right(address, 5) as zip,
left(right(address, 8), 2) as state,
left(address, len(address) - 9) as city
You can start by removing the commas and double spaces from the address.

If you have a table of states (which you should) with a column of the abbreviations, you can do things like this:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
a.CityStateZip Like '% ' + s.UpperCaseAbbreviation + ' %' --space on either side of abbreviation
You can make it work for both commas and spaces:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
Replace(a.CityStateZip, ',' , ' ') Like '% ' + s.UpperCaseAbbreviation + ' %'

I found the PATINDEX/COLLATE option to work fairly intermittently. Here is what I ended up doing:
--get rid of the sparsely used commas
--get rid of the duplicate spaces
update MyTable set
CityStZip=
    replace(
        replace(
            replace(CityStZip,'  ',' '),
        '  ',' '),
    ',','')
select
--check if state and zip are there and then grab the city
case when isNumeric(right(CityStZip,1))=1
then left(CityStZip,len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+1)
--no zip. check for state
when left(right(CityStZip,3),1) = ' '
then left(CityStZip,len(CityStZip)-charIndex(' ',reverse(CityStZip)))
else CityStZip
end as City,
--check if zip is there and then grab the state
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+2,
2)
--no zip. check if 3rd to last char is a space and grab the last two chars
when left(right(CityStZip,3),1) = ' '
then right(CityStZip,2)
end as [State],
--grab everything after the last space if the last character is numeric
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip))+1,
charindex(' ',reverse(CityStZip)))
end as Zip
from MyTable

Related

In SQL Server, how can I identify "double" strings and correct?

How can I find strings in a column that are doubled-up and correct them? I feel like there is an easy answer to this I just can't think of it.
Example:
I want to find instances of a repeating string, example "SolonSolon", and then update the column to "Solon".
Update:
They're always the same. No extra characters, but might have a space as part of the repeating value. Other examples would be...
"PlacePlace", "TreeTree", "OrangeOrange", "TravisMemorialHSTravisMemorialHS", "Texas HSTexas HS"
You can check if the string is equal to the first half replicated.
SELECT LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2)
FROM YourTable
WHERE YourCol = REPLICATE(LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2),2)
Spaces are replaced with x before calculating the LEN because that function ignores trailing spaces. You can also use the technique in @lptr's answer for this, but an edge case is a varchar(8000) string that is already 8000 characters long: concatenating an extra character then does nothing (LEN(SPACE(8000) + 'x') is 0).
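The trailing-space behaviour of LEN is easy to check directly. A minimal sketch, using a made-up variable holding one of the example values with two trailing spaces appended:

```sql
-- Hypothetical value: 8 visible characters plus 2 trailing spaces.
DECLARE @s varchar(20) = 'Texas HS  ';

SELECT LEN(@s)                    AS len_plain,          -- 8: trailing spaces are ignored
       LEN(REPLACE(@s, ' ', 'x')) AS len_after_replace,  -- 10: spaces became x
       LEN(@s + 'x') - 1          AS len_with_sentinel,  -- 10: sentinel protects the spaces
       DATALENGTH(@s)             AS bytes_stored;       -- 10: bytes, for a varchar value
```

Any of the last three expressions gives the full length. The REPLACE form is the one that keeps working when the value already fills a varchar(8000).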
Replace every occurrence of the first half of the value with an empty string. If nothing is left, the value consists of two equal parts:
select *, substring(c, 1, (len(c+'.')-1)/2)
from
(
values
('solosolo'), ('yoyo'), ('andand'), ('1212'),(' . .'),
('ababc'), ('onetwoone')
) as t(c)
where replace(c, substring(c, 1, (len(c+'.')-1)/2), '') = '';
Another alternative. The query removes leading/trailing spaces using TRIM, removes inner spaces using REPLACE(str_col, ' ', ''), and checks that the first half of the resulting string equals the second half.
select left(no_spaces.str_col, v.str_len/2)
from foo f
cross apply (values (replace(trim(f.str_col), ' ', ''))) no_spaces(str_col)
cross apply (values (len(no_spaces.str_col))) v(str_len)
where no_spaces.str_col=replicate(left(no_spaces.str_col, v.str_len/2), 2);

Retrieve Second to Last Word in PostgreSQL

I am using PostgreSQL 9.5.1
I have an address field where I am trying to extract the street type (AVE, RD, ST, etc). Some of them are formatted like this: 5th AVE N or PEE DEE RD N
I have seen a few methods in PostgreSQL to count segments from the left based on spaces i.e. split_part(name, ' ', 3), but I can't seem to find any built-in functions or regular expression examples where I can count the characters from the right.
My idea for moving forward is something along these lines:
select case when regexp_replace(name, '^.* ', '') = 'N'
then *grab the second to last group of string values*
end as type;
Leaving aside the issue of robustness of this approach when applied to address data, you can extract the penultimate space-delimited substring in a string like this:
with a as (
select string_to_array('5th AVE N', ' ') as addr
)
select
addr[array_length(addr, 1)-1] as street
from
a;
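The array approach can be combined with the CASE idea from the question. The following is a sketch only: the sample rows and the list of direction suffixes ('N', 'S', 'E', 'W') are assumptions made for illustration, not something the question specifies.

```sql
-- Hypothetical sample rows; the direction list is an assumption.
with addresses(name) as (
    values ('5th AVE N'), ('PEE DEE RD N'), ('MAIN ST')
),
parts as (
    select name, string_to_array(name, ' ') as addr
    from addresses
)
select name,
       case when addr[array_length(addr, 1)] in ('N', 'S', 'E', 'W')
            then addr[array_length(addr, 1) - 1]   -- trailing direction: take the word before it
            else addr[array_length(addr, 1)]       -- no direction: the last word is the type
       end as street_type
from parts;
```

For these three rows the query yields AVE, RD and ST. Subscripting an array out of range returns NULL rather than raising an error, so a single-word value ending in a direction simply produces NULL.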

SQL- Remove Trailing space and add it to the beginning of the Name

Was working on SQL-EX.ru exercises.
There is one question for DML that I could not do, but I cannot proceed to the next one, until this one is done.
The question itself: Remove all the trailing spaces in the name column of the Battles table and add them at the beginning of the name.
My code:
Update Battles
set name=concat(' ',(LTRIM(RTRIM(name))))
The system does not accept it. I understand that I am using ' ' for the concat, whereas I need to use the spaces that got trimmed, and I have no idea how.
Any help would be very much appreciated
Try something like:
set name = lpad(trim(name), length(trim(name))+4, ' ')
Here TRIM removes the spaces from both sides, and LPAD pads the left side with spaces up to the given total length (here the trimmed length plus 4).
I'm not familiar with SQL-EX.ru, but if it's Oracle compatible and you can use regular expressions (or you are at that point in the training), here's a way. Maybe it'll give you an idea at least. The first part is just setup: a WITH clause creates a Common Table Expression (CTE), which acts like a temp table in memory, called battles, with a name column and 2 rows. Each name value has a different number of spaces at the end. The query then selects from that column using a regular expression with 2 "remembered" groups surrounded by parentheses: the first captures the string up to but not including the trailing spaces, and the second captures 0 or more space characters anchored to the end of the line. The replacement emits the 2nd group (the spaces) first, followed by the 1st group (the first part of the string). The result is wrapped in square brackets just to show in the output that the same spaces were moved to the front of the string.
SQL> with battles(name) as (
select 'test2 ' from dual union
select 'test1 ' from dual
)
select '[' || regexp_replace(name, '(.*?)([ ]*)$', '\2\1') || ']' fixed
from battles;
FIXED
----------------------------------------------------------------------------
[ test1]
[ test2]
SQL>
I hope this solution can be applied to your problem or at least give you some ideas.
Try this:
set name = case when len(name) > len(rtrim(name))
then replicate(' ', len(name) - len(rtrim(name))) + rtrim(name)
else name
end
update battles
set name = case when (len(name+'a')-1) > len(rtrim(name))
then
replicate(' ',
(len(name+'a')-1) - len(rtrim(name))) + rtrim(name)
else name
end
LEN() doesn't count trailing spaces, hence the use of (len(name+'a')-1).
Simplest answer:
UPDATE Battles
SET name = SPACE(DATALENGTH(name)-DATALENGTH(RTRIM(name))) + RTRIM(name)
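But this only works because name is VARCHAR: DATALENGTH returns bytes, which equals the character count for a single-byte VARCHAR collation (NVARCHAR uses two bytes per character).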
More generic is to do:
UPDATE Battles
SET name = SPACE(len(name+'x')-1-len(RTRIM(name))) + RTRIM(name)
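To try the generic form without touching the exercise database, it can be run against a table variable. This is a sketch under assumptions: the table variable, its column size and the sample battle names are made up for illustration.

```sql
-- Hypothetical stand-in for the Battles table.
DECLARE @Battles TABLE (name varchar(30));

INSERT INTO @Battles (name)
VALUES ('Guadalcanal   '), ('North Cape '), ('Surigao Strait');

-- Move the trailing spaces to the front; LEN(name + 'x') - 1 is the full length.
UPDATE @Battles
SET name = SPACE(LEN(name + 'x') - 1 - LEN(RTRIM(name))) + RTRIM(name);

-- Brackets make the leading spaces visible; total_length should be unchanged.
SELECT '[' + name + ']'    AS moved,
       LEN(name + 'x') - 1 AS total_length
FROM @Battles;
```

Rows without trailing spaces pass through unchanged, because SPACE(0) returns an empty string.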
simple example below ... enjoy :)
update battles set name =
Space( DATALENGTH(name) - DATALENGTH(rtrim(name))) + rtrim(name)
where date in ( select date from battles)

trim the column value string

In a SQL query, I need to select the values shown below from my column.
The result has to be the text after the first space ' ' and before the first '('.
Source Column
create Table Test_Table (Column1 Varchar(50))
Insert into Test_Table Values
('0636 KAVITHI (LOC)'),
('0638 SRI KRISHNA (NAT)'),
('0639 SELVAM'),
('0643 GOOD SERVICE (LOC)'),
('0644 FINA CARE EVENT (LOC)')
I need to get the string found between the first ' ' and the '('.
Expected Result
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
Another approach without using an OUTER APPLY.
SELECT CASE WHEN Column1 LIKE '%(%'
THEN SUBSTRING(RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0,
CHARINDEX('(',RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0))
ELSE RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1))
END AS Trimmed
FROM Test_Table
OUTPUT
Trimmed
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
SQL Fiddle: http://sqlfiddle.com/#!3/69dd1/20/0
CHARINDEX() can be used to find the position of specific characters.
OUTER APPLY can be used to find the position of the space and brace characters, and store them in a place that you can re-use them.
SUBSTRING() can be used to find the text between the space and the brace.
EDIT: Added CASE to cope with values that contain no (.
SELECT
  SUBSTRING(
    test_table.column1,             -- the field we're searching
    stats.idx_space + 1,            -- starting from the character after the first space
    CASE
      WHEN stats.idx_brace > stats.idx_space
        THEN stats.idx_brace - 1    -- stop at the character before the brace
      ELSE stats.idx_eos
    END
    -
    stats.idx_space                 -- for as many characters as there are between the space and the brace
  )
FROM
test_table
OUTER APPLY
(
SELECT
CHARINDEX(' ', test_table.column1) AS idx_space, -- position of the first space
CHARINDEX('(', test_table.column1) AS idx_brace, -- position of the first brace
LEN(test_table.column1) AS idx_eos -- position of the end-of-string
)
AS stats
EDIT: A single "line", as requested.
Do note that forcing this as a single line does make this harder to read, maintain and adapt. One of APPLY's strongest use-cases is to maintain DRY (Don't Repeat Yourself) principles.
This query repeats several parts several times:
- find the first space repeated 3 times
- find the first brace repeated 2 times
SELECT
  SUBSTRING(
    test_table.column1,
    CHARINDEX(' ', test_table.column1) + 1,
    CASE
      WHEN CHARINDEX('(', test_table.column1) > CHARINDEX(' ', test_table.column1)
        THEN CHARINDEX('(', test_table.column1) - 1
      ELSE LEN(test_table.column1)
    END
    -
    CHARINDEX(' ', test_table.column1)
  )
FROM
test_table
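A different way to avoid the CASE altogether is to append a sentinel '(' before searching, so that CHARINDEX always finds a brace. This is a swapped-in technique rather than either answer's method, sketched here against a few of the question's sample rows. It assumes the first space always comes before the first '('.

```sql
-- Sample rows taken from the question; the sentinel trick is an alternative sketch.
SELECT t.Column1,
       RTRIM(SUBSTRING(t.Column1,
                       CHARINDEX(' ', t.Column1) + 1,                -- first character after the first space
                       CHARINDEX('(', t.Column1 + '(')               -- brace position, or length + 1 if none
                         - CHARINDEX(' ', t.Column1) - 1)) AS Trimmed
FROM (VALUES ('0636 KAVITHI (LOC)'),
             ('0638 SRI KRISHNA (NAT)'),
             ('0639 SELVAM')) AS t (Column1);
```

The RTRIM drops the space that sits just before the brace, giving KAVITHI, SRI KRISHNA and SELVAM.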

Oracle SQL - Parsing a name string and converting it to first initial & last name

Does anyone know how to turn this string: "Smith, John R"
Into this string: "jsmith" ?
I need to lowercase everything with lower()
Find where the comma is and track its integer position
Get the first character after that comma and put it in front of the string
Then get the entire last name and stick it after the first initial.
Sidenote - instr() function is not compatible with my version
Thanks for any help!
Start by writing your own INSTR function - call it my_instr for example. It will start at char 1 and loop until it finds a ','.
Then use as you would INSTR.
The best way to do this is using Oracle Regular Expressions feature, like this:
SELECT LOWER(regexp_replace('Smith, John R',
'(.+)(, )([A-Z])(.+)',
'\3\1', 1, 1))
FROM DUAL;
That says: when you find the pattern of any set of characters, followed by ", ", followed by an uppercase character, followed by any remaining characters, take the third group (the initial of the first name) and append the first group (the last name). Then make everything lowercase.
Your side note: "instr() function is not compatible with my version" doesn't make sense to me, as that function's been around for ages. Check your version, because Regular Expressions were only added to Oracle in version 10g.
Thanks for the points.
-- Stew
instr() is not compatible with your version of what? Oracle? Are you using version 4 or something?
There is no need to create your own function, and quite frankly, it seems a waste of time when this can be done fairly easily with sql functions that already exist. Care must be taken to account for sloppy data entry.
Here is another way to accomplish your stated goal:
with name_list as
(select ' Parisi, Kenneth R' name from dual)
select name
-- There may be a space after the comma. This will strip an arbitrary
-- amount of whitespace from the first name, so we can easily extract
-- the first initial.
, substr(trim(substr(name, instr(name, ',') + 1)), 1, 1) AS first_init
-- a simple substring function, from the first character until the
-- last character before the comma.
, substr(trim(name), 1, instr(trim(name), ',') - 1) AS last_name
-- put together what we have done above to create the output field
, lower(substr(trim(substr(name, instr(name, ',') + 1)), 1, 1)) ||
lower(substr(trim(name), 1, instr(trim(name), ',') - 1)) AS init_plus_last
from name_list;
HTH,
Gabe
I have a hard time believing you don’t have access to a proper instr() but if that’s the case, implement your own version.
Assuming you have that straightened out:
select
substr(
lower( 'Smith, John R' )
, instr( 'Smith, John R', ',' ) + 2
, 1
) || -- first_initial
substr(
lower( 'Smith, John R' )
, 1
, instr( 'Smith, John R', ',' ) - 1
) -- last_name
from dual;
Also, be careful about your assumption that all names will be in that format. Watch out for something other than a single space after the comma, last names having data like “Parisi, Jr.”, etc.
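The whitespace concerns raised above can be folded into the regular-expression approach. A sketch, assuming an Oracle release that supports the \s and \S shorthand classes (10g Release 2 or later); the name_list rows are sample data for illustration only:

```sql
-- Hypothetical sample rows covering extra, single and missing spaces after the comma.
WITH name_list AS (
  SELECT 'Smith, John R'        AS name FROM dual UNION ALL
  SELECT ' Parisi,   Kenneth R' AS name FROM dual UNION ALL
  SELECT 'Jones,Mary'           AS name FROM dual
)
SELECT name,
       -- group 1: everything before the comma (last name)
       -- group 2: first non-whitespace character after the comma (first initial)
       LOWER(REGEXP_REPLACE(TRIM(name), '^([^,]+),\s*(\S).*$', '\2\1')) AS init_plus_last
FROM name_list;
```

These rows should come back as jsmith, kparisi and mjones. A value with no comma does not match the pattern and is returned unchanged apart from being trimmed and lowercased, so such rows are worth checking separately.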