How to remove Roman letter and numeric value from column in SQL

Using SQL Server, I have a column whose values may end in a number or a Roman numeral. How do I remove just the trailing number without hard-coding its position?
Job_Title
Data Analyst 2
Manager 50
Robotics 1615
Software Engineer
DATA ENGINEER III
I tried using this query:
SELECT
CASE
WHEN PATINDEX('%[0-9 ]%', job_title) > 0
THEN RTRIM(SUBSTRING(Job_title, 1, PATINDEX('%[0-9 ]%', job_title) - 1))
ELSE JOB_TITLE
END
FROM
my_table
WHERE
PATINDEX('%[0-9]%', JOB_TITLE) <> 0
But the result I'm getting is:
Job_Title
Data
Manager
Robotics

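Use the TRANSLATE function (available from SQL Server 2017). Its second and third arguments must be the same length, so map each of the ten digits to a space:
SELECT RTRIM(TRANSLATE(Job_title, '0123456789', SPACE(10))) AS JOB_TITLE
FROM my_table
RTRIM then removes the trailing spaces that the replaced digits leave behind.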

You should remove the space character from the character class in the pattern. The new code should be:
SELECT case when patindex('%[0-9]%', job_title) > 0 then
rtrim(substring(Job_title,1, patindex('%[0-9]%', job_title) - 1))
else
JOB_TITLE
end
from my_table
WHERE PATINDEX('%[0-9]%',JOB_TITLE) <>0

I think you're trying to remove numbers from the end of a job title, not exclude rows from the result, so drop the WHERE clause. As others have mentioned, you need to remove the space from inside the brackets of the pattern and put it in front of the brackets, to say that the number is separated from what precedes it by a space. I think you also need to remove the wildcard from the right-hand side of the pattern so that the number has to be at the end of the job title, like this:
SELECT case when patindex('% [0-9]', job_title) > 0 then
rtrim(substring(Job_title,1, patindex('% [0-9]', job_title) - 1))
else
JOB_TITLE
end
from my_table
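But you also mention Roman numerals, and that's tougher if a job title can end in something like " X" where it means the letter X and not 10. If that can't happen, you should be able to use [0-9IVXivx] in place of each bracketed segment. Note that '% [0-9]' as written matches only a title ending in a space followed by exactly one character from the set, so Data Analyst 2 is trimmed but Manager 50, Robotics 1615 and DATA ENGINEER III are left unchanged.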
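To check the SQL output against the intended result, the target behaviour can be written down outside SQL Server. This is only a Python sketch, not T-SQL; the name `strip_trailing_token` is made up for illustration, and it assumes that a trailing run of I, V and X always means a Roman numeral.

```python
import re

# Trailing token: whitespace, then either digits or a run of I/V/X, at end of string.
_TRAILING = re.compile(r"\s+(?:[0-9]+|[IVXivx]+)$")


def strip_trailing_token(title: str) -> str:
    """Remove one trailing number or Roman-numeral-looking token, if present."""
    return _TRAILING.sub("", title)


if __name__ == "__main__":
    for t in ["Data Analyst 2", "Manager 50", "Robotics 1615",
              "Software Engineer", "DATA ENGINEER III"]:
        print(repr(strip_trailing_token(t)))
```

Unlike the single-character pattern, this handles multi-digit numbers, but it would also strip the X from a hypothetical title such as "Team X".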

Related

RIGHT of CHARINDEX not selecting correctly

I am trying to parse out a last name field that may have two last names that are separated by either a blank space ' ' or a hyphen '-' or it may only have one name.
Here is what I'm using to do that:
select top 1000
BENE_FIRST_NAME,
BENE_LAST_NAME,
FirstNm =
case
when BENE_FIRST_NAME like '% %' then
left(BENE_FIRST_NAME, CHARINDEX(' ', BENE_FIRST_NAME))
when BENE_FIRST_NAME like '%-%' then
left(BENE_FIRST_NAME, CHARINDEX('-', BENE_FIRST_NAME))
else BENE_FIRST_NAME
end,
LastNm =
case
when BENE_LAST_NAME like '% %' then
right(BENE_LAST_NAME, CHARINDEX(' ', BENE_LAST_NAME))
when BENE_LAST_NAME like '%-%' then
right(BENE_LAST_NAME, CHARINDEX('-', BENE_LAST_NAME))
else BENE_LAST_NAME
end,
CharIndxDash = CHARINDEX('-', BENE_LAST_NAME),
CharIndxSpace = CHARINDEX(' ', BENE_LAST_NAME)
from xMIUR_Elig_Raw_v3
Here are some results:
BENE_FIRST_NAME
BENE_LAST_NAME
FirstNm
LastNm
CharIndxDash
CharIndxSpace
JUANA
PEREZ-MARTINEZ
JUANA
RTINEZ
6
0
EMILIANO
PICENO ESPINOZA
EMILIANO
SPINOZA
0
7
JULIAN
NIETO-CARRENO
JULIAN
ARRENO
6
0
EMILY
SALMERON TERRIQUEZ
EMILY
TERRIQUEZ
0
9
The CHARINDEX seems to be selecting the correct position but it is not bringing in all of the CHARs to the right of that position. Sometimes it works like in the last record. But sometimes it is off by 1. And sometimes it is off by 2. Any ideas?
If you need the part of the last name after the space or hyphen, take the right part of the string with length = total_length - separator_position:
...
LastNm =
case
when BENE_LAST_NAME like '% %' then
right(BENE_LAST_NAME, LEN(BENE_LAST_NAME) - CHARINDEX(' ', BENE_LAST_NAME))
when BENE_LAST_NAME like '%-%' then
right(BENE_LAST_NAME, LEN(BENE_LAST_NAME) -CHARINDEX('-', BENE_LAST_NAME))
else BENE_LAST_NAME
end,
...
Your last name logic doesn't work.
RIGHT takes N characters from the right of the string.
CHARINDEX gives the position of a character counted from the left of the string.
You can't find a position from the left and then take that many characters from the right.
Here's a name:
JOHN MALKOVICH
The space is at position 5. If you take 5 characters from the right, you get OVICH. The shorter the name before the space and the longer the name after it, the fewer characters of the last name you get.
Perhaps you meant to put a LEN in there, so that you take the string length minus the index of the space. You can also use the index as the start position in a call to SUBSTRING and tell SQL Server to take 9999 characters (or any number longer than the remaining string); it will take everything up to the end of the string:
SUBSTRING(name, CHARINDEX(' ', name)+1, 9999)
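The off-by-N effect from the question can be reproduced with a small model. The following is a Python sketch, not T-SQL: `charindex` and `right` are stand-ins written here to mimic the 1-based behaviour of the SQL Server functions, and the values are assumed to have no trailing spaces (T-SQL LEN ignores those, Python `len` does not).

```python
def charindex(needle: str, haystack: str) -> int:
    """1-based position of needle, or 0 if absent (mimics T-SQL CHARINDEX)."""
    return haystack.find(needle) + 1


def right(s: str, n: int) -> str:
    """Last n characters of s, for n >= 0 (mimics T-SQL RIGHT)."""
    return s[-n:] if n > 0 else ""


def buggy_last(name: str, sep: str) -> str:
    # Takes "position from the left" characters from the right: the bug.
    return right(name, charindex(sep, name))


def fixed_last(name: str, sep: str) -> str:
    # Takes everything after the separator: length minus separator position.
    return right(name, len(name) - charindex(sep, name))


if __name__ == "__main__":
    for name, sep in [("PEREZ-MARTINEZ", "-"), ("PICENO ESPINOZA", " "),
                      ("SALMERON TERRIQUEZ", " ")]:
        print(name, "|", buggy_last(name, sep), "|", fixed_last(name, sep))
```

The last row only looks right in the question because the separator happens to sit at position 9 and the second name is also 9 characters long.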
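I think you can simplify your code a lot. Consider the following, with different but representative sample data:
with data (name) as
(select 'first-last' union select 'first last' union select 'firstlast'),
-- indx = position of the separator; assumes at most one of ' ' or '-' is present,
-- and falls back to len(name) + 1 when there is no separator
data_prepped (name, indx) as
(select name, coalesce(nullif(charindex(' ', name) + charindex('-', name), 0), len(name) + 1)
from data)
select name,
left(name, indx - 1) as part1,
substring(name, indx + 1, len(name)) as part2 -- empty when there is no separator
from data_prepped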

Remove special character from beginning and end of string in sql

sql server 2008
I have data in a column, something like:
"Brake pad kit, disc brake"
/Brake disk (sold separately).
"The belt pulley, crankshaft"
Fuel Pump
The special characters are ", space and /.
I want to remove any special character or space present at the beginning or end of the string.
Is this possible in SQL? I'm not sure.
Please share your thoughts.
Here is one way to do it using String functions
DECLARE @str VARCHAR(200)= '"The belt pulley, crankshaft"'
SELECT Reverse(CASE
WHEN LEFT(Reverse(scd_str), 1) LIKE '[A-Z]' OR LEFT(Reverse(scd_str), 1) LIKE '[a-z]' THEN Reverse(scd_str)
ELSE Substring(Reverse(scd_str), 2, Len(Reverse(scd_str)))
END)
FROM (SELECT CASE
WHEN LEFT(string, 1) LIKE '[A-Z]' OR LEFT(string, 1) LIKE '[a-z]' THEN string
ELSE Substring(string, 2, Len(string))
END AS Scd_Str
FROM (SELECT Rtrim(Ltrim(@str)) AS string) A) B
Result : The belt pulley, crankshaft

trim the column value string

Using a SELECT query, I need to extract part of the value of my column, as shown below.
The result has to be the text after the first space ' ' and before the first '('.
Source Column
create Table Test_Table (Column1 Varchar(50))
Insert into Test_Table Values
('0636 KAVITHI (LOC)'),
('0638 SRI KRISHNA (NAT)'),
('0639 SELVAM'),
('0643 GOOD SERVICE (LOC)'),
('0644 FINA CARE EVENT (LOC)')
I need to get the string found between the first ' ' and the '('.
Expected Result
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
Another approach without using an OUTER APPLY.
SELECT CASE WHEN Column1 LIKE '%(%'
THEN SUBSTRING(RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0,
CHARINDEX('(',RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0))
ELSE RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1))
END AS Trimmed
FROM Test_Table
OUTPUT
Trimmed
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
SQL Fiddle: http://sqlfiddle.com/#!3/69dd1/20/0
CHARINDEX() can be used to find the position of specific characters.
OUTER APPLY can be used to find the position of the space and brace characters, and store them in a place that you can re-use them.
SUBSTRING() can be used to find the text between the space and the brace.
EDIT: Added CASE to cope with values that contain no (.
SELECT
RTRIM(SUBSTRING(
test_table.column1, -- the field we're searching
stats.idx_space + 1, -- starting from the character after the first space
CASE
WHEN stats.idx_brace > stats.idx_space
THEN stats.idx_brace - 1 -- stop at the character before the brace
ELSE stats.idx_eos
END
-
stats.idx_space -- for as many characters as there are between the space and the brace (or end of string)
)) -- RTRIM drops the space that precedes the brace
FROM
test_table
OUTER APPLY
(
SELECT
CHARINDEX(' ', test_table.column1) AS idx_space, -- position of the first space
CHARINDEX('(', test_table.column1) AS idx_brace, -- position of the first brace
LEN(test_table.column1) AS idx_eos -- position of the end-of-string
)
AS stats
EDIT: A single "line", as requested.
Do note that forcing this as a single line does make this harder to read, maintain and adapt. One of APPLY's strongest use-cases is to maintain DRY (Don't Repeat Yourself) principles.
This query repeats several parts several times:
- find the first space repeated 3 times
- find the first brace repeated 2 times
SELECT
RTRIM(SUBSTRING(
test_table.column1,
CHARINDEX(' ', test_table.column1) + 1,
CASE
WHEN CHARINDEX('(', test_table.column1) > CHARINDEX(' ', test_table.column1)
THEN CHARINDEX('(', test_table.column1) - 1
ELSE LEN(test_table.column1)
END
-
CHARINDEX(' ', test_table.column1)
))
FROM
test_table
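The index arithmetic for this kind of extraction is easy to get wrong by one, so it can help to check it against a small model. This is a Python sketch, not T-SQL; the function name `between_space_and_paren` is made up, and the 1-based positions are computed the way CHARINDEX would return them (0 when the character is absent).

```python
def between_space_and_paren(value: str) -> str:
    """Text after the first space and before the first '(' (or end of string)."""
    idx_space = value.find(" ") + 1   # 1-based, 0 if absent, like CHARINDEX
    idx_brace = value.find("(") + 1   # 1-based, 0 if absent, like CHARINDEX
    # Last 1-based position to keep: just before the brace, else end of string.
    end = idx_brace - 1 if idx_brace > idx_space else len(value)
    # 1-based SUBSTRING(value, idx_space + 1, end - idx_space) covers positions
    # idx_space + 1 .. end, which is the 0-based slice [idx_space:end].
    return value[idx_space:end].rstrip()


if __name__ == "__main__":
    for v in ["0636 KAVITHI (LOC)", "0638 SRI KRISHNA (NAT)", "0639 SELVAM",
              "0643 GOOD SERVICE (LOC)", "0644 FINA CARE EVENT (LOC)"]:
        print(repr(between_space_and_paren(v)))
```

The length passed to SUBSTRING has to stop one position short of the brace, otherwise the '(' itself is included in the result.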

SQL Server - select substring of all characters following last hyphen

I am working with a database of products, trying to extract the product color from a combined ID/color code column where the color code is always the string following the last hyphen in the column. The issue is that the number of hyphens, product ID, and color code can all be different.
Here are four examples:
ABC123-001
BCD45678-0165
S-XYZ999-M2235
A-S-ABC123-001
The color codes in this case would be 001, 0165, M2235, and 001. What would be the best way to select these into their own column?
I think the following does what you want:
select right(col, charindex('-', reverse(col)) - 1)
In the event that you might have no hyphens in the value, then use a case:
select (case when col like '%-%'
then right(col, charindex('-', reverse(col)) - 1)
else col
end)
It is a good idea to check whether the separator exists in the string, to avoid the following error:
Invalid length parameter passed to the right function.
SELECT CASE WHEN Col like '%\%' THEN RIGHT(Col,CHARINDEX('\',REVERSE(Col))-1) ELSE '' END AS ColName
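To see why the REVERSE trick finds the last hyphen, the arithmetic can be traced with a small model. This is a Python sketch, not T-SQL: `charindex` and `right` are stand-ins written here to mimic the SQL Server functions, and `color_code` is a made-up name. The guard for values without a hyphen plays the role of the CASE expression.

```python
def charindex(needle: str, haystack: str) -> int:
    """1-based position of needle, or 0 if absent (mimics T-SQL CHARINDEX)."""
    return haystack.find(needle) + 1


def right(s: str, n: int) -> str:
    """Last n characters of s, for n >= 0 (mimics T-SQL RIGHT)."""
    return s[-n:] if n > 0 else ""


def color_code(col: str) -> str:
    if "-" not in col:
        # Without this guard the length would be 0 - 1 = -1, which is what
        # triggers "Invalid length parameter" in SQL Server.
        return col
    # The first hyphen in the reversed string is the last hyphen in the original;
    # its position minus 1 is the number of characters after that hyphen.
    return right(col, charindex("-", col[::-1]) - 1)


if __name__ == "__main__":
    for c in ["ABC123-001", "BCD45678-0165", "S-XYZ999-M2235", "A-S-ABC123-001"]:
        print(c, "->", color_code(c))
```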

Get index of two consecutive upper case characters

I am trying to separate a city/state/zip field into the city, state, and zip. Normally I would do this with charindex of ',' to get the city and state, and isnumeric and right() for the zip.
This will work fine for the zip, but most of the rows in the data I am working with now have no commas: City ST Zip. Is there a way to identify the index of two upper case characters?
If not, does anybody have a better idea than just a case statement checking for each state individually?
EDIT: I found the PATINDEX/COLLATE option to work fairly intermittently. See my answer below.
PATINDEX should work for you:
PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as)
So your full extract would be something like:
WITH CTE AS
( SELECT i = PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as) + 1,
A
FROM (VALUES
('City ST Zip'),
('Another City ST Zip'),
('City, with comma ST Zip')
) t (A)
)
SELECT City = LEFT(A, i - 2),
State = SUBSTRING(A, i, 2),
Zip = SUBSTRING(A, i + 3, LEN(A))
FROM CTE;
The reason why PATINDEX appears to work intermittently is that you cannot use a character range (i.e. A-Z) to accomplish a case-sensitive search, even if using a case-sensitive collation. The issue is that character ranges work like sorting, and case-sensitive sorting groups the upper-case letters with their lower-case equivalents, just like it would be ordered in a dictionary. Range sorting is really: a,A,b,B,c,C,d,D,etc. Or, depending on the collation, it might be: A,a,B,b,C,c,D,d,etc (there are 31 Collations that sort upper-case first). When doing this in a case-sensitive collation, that merely groups all A entries together, separate from the a entries, whereas in a case-insensitive sort they would be intermixed.
But if you specify each of the letters individually (hence not using a range), then it will work as expected:
PATINDEX(N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
[CityStZip] COLLATE Latin1_General_100_CS_AS)
The reason that PATINDEX and LIKE (both of which allow for a single character class of [A-Z]) work this way is that the [start-end] syntax is not a Regular Expression. Many people claim that PATINDEX and LIKE support "limited" RegEx due to supporting this syntax, but that is not true. It is merely a very similar (and a confusingly similar) syntax to RegEx where [A-Z] would normally not include any lower-case matches.
Of course, if you are guaranteed to only be searching on the US-English letters of A-Z, then a binary collation (i.e. one ending in _BIN2; don't use ones ending in _BIN as they have been deprecated since SQL Server 2005 was introduced, I believe) should work.
PATINDEX(N'%[A-Z][A-Z]%', [CityStZip] COLLATE Latin1_General_100_BIN2)
For more details about case-sensitive matching, especially in regards to including Unicode / NVARCHAR data, please see my related answer on DBA.StackExchange:
How to find values with multiple consecutive upper case characters
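The difference between a dictionary-order range and a binary range can be illustrated with a toy model. This is a Python sketch and a deliberate simplification: `dictionary_key` just reproduces the a,A,b,B,... order described above, whereas real SQL Server collations use more elaborate comparison rules. All function names are made up for illustration.

```python
import string


def dictionary_key(ch: str):
    """Toy case-sensitive dictionary order: a, A, b, B, ..., z, Z."""
    return (ch.lower(), ch.isupper())


def in_range_dictionary(ch: str, lo: str = "A", hi: str = "Z") -> bool:
    """Is ch inside [lo-hi] when the range follows the toy dictionary order?"""
    return dictionary_key(lo) <= dictionary_key(ch) <= dictionary_key(hi)


def in_range_binary(ch: str, lo: str = "A", hi: str = "Z") -> bool:
    """Is ch inside [lo-hi] when the range follows code point order?"""
    return ord(lo) <= ord(ch) <= ord(hi)


# Lower-case letters that each kind of range would let through.
matched_lower_dictionary = [c for c in string.ascii_lowercase if in_range_dictionary(c)]
matched_lower_binary = [c for c in string.ascii_lowercase if in_range_binary(c)]

if __name__ == "__main__":
    print("dictionary order:", "".join(matched_lower_dictionary))
    print("binary order:", "".join(matched_lower_binary) or "(none)")
```

In this model the dictionary-order range [A-Z] lets through every lower-case letter except a, while the binary range lets through none, which matches the advice to list the letters explicitly or to use a _BIN2 collation.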
If you have zip code and state at the end of the string, then this might work:
select right(address, 5) as zip,
left(right(address, 8), 2) as state,
left(address, len(address) - 9) as city
You can start by removing the commas and double spaces from the address.
If you have a table of states (which you should) with a column of abbreviations, you can do things like this:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
a.CityStateZip Like '% ' + s.UpperCaseAbbreviation + ' %' --space on either side of abbreviation
You can make it work for both commas and spaces:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
Replace(a.CityStateZip, ',' , ' ') Like '% ' + s.UpperCaseAbbreviation + ' %'
I found the PATINDEX/COLLATE option to work fairly intermittently. Here is what I ended up doing:
--get rid of the sparsely used commas
--get rid of the duplicate spaces
update MyTable set
CityStZip=
replace(
replace(
replace(CityStZip,'  ',' '),
'  ',' '),
',','')
select
--check if state and zip are there and then grab the city
case when isNumeric(right(CityStZip,1))=1
then left(CityStZip,len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+1)
--no zip. check for state
when left(right(CityStZip,3),1) = ' '
then left(CityStZip,len(CityStZip)-charIndex(' ',reverse(CityStZip)))
else CityStZip
end as City,
--check if zip is there and then grab the state
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+2,
2)
--no zip. check if 3rd to last char is a space and grab the last two chars
when left(right(CityStZip,3),1) = ' '
then right(CityStZip,2)
end as [State],
--grab everything after the last space if the last character is numeric
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip))+1,
charindex(' ',reverse(CityStZip)))
end as Zip
from MyTable