I'm trying to return a substring of the following; it's comma-delimited [only one comma]:
City-City-City, State-State-State
Sometimes it's only one city and state, sometimes it's more than one of either [or both]
Basically, I need to just return the state initials past the comma.
What's the best way to do this? I'm looking into the substring function, but that doesn't seem that smart. I found a split function but it looks like overkill and I don't like to use code I don't understand.
Ex:
Cincinnati-Middletown, OH-KY-IN
Cleveland-Elyria-Mentor, OH
Abilene, TX
Output:
OH-KY-IN
OH
TX
Thanks for the answers; I just figured it out thanks to Sonam's starting point.
Here's what I got. I haven't looked into it closely, but it seems to be returning the right stuff.
select ltrim(substring(CBSAName, charindex(',', CBSAName) + 1, LEN(CBSAName))) FROM CBSAMasterList
select substring('Abilene, TX',charindex(',','Abilene, TX')+2,2)
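For checking expected results outside the database, the same "everything after the comma, trimmed" logic can be sketched in Python. This is only an illustration of the idea, not T-SQL; the function name state_part is made up for the example.

```python
def state_part(cbsa_name):
    """Return the text after the first comma, with surrounding spaces removed."""
    # partition splits once at the first comma; the question says there is only one.
    _, sep, after = cbsa_name.partition(",")
    # If there is no comma at all, return an empty string rather than guessing.
    return after.strip() if sep else ""


if __name__ == "__main__":
    for name in ("Cincinnati-Middletown, OH-KY-IN",
                 "Cleveland-Elyria-Mentor, OH",
                 "Abilene, TX"):
        print(state_part(name))
```

Trimming after the split plays the same role as skipping the space that follows the comma in the SQL version.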
Update: there are situations where dot position might not be the best solution.
I got a column of website.
website
www.abc.google.com
www.bcd.google.com
wwww.efd.google.co.za
I want to transform it into
website
google.com
google.com
google.co.za
Does anyone know how to split based on the '.' position from the right?
Thanks.
regexp_substr() does exactly what you want:
select regexp_substr('www.abc.google.com', '[^.]*[.][^.]*$')
Split String based on the second dot from right
You can use regexp_extract(website, r'(\w+\.\w+)$')
or, for the situations you mention where dot position might not be the best solution,
net.reg_domain(website)
Applied to the sample data in your question, the last one gives the output you asked for.
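To see why dot position alone can fall short, here is a small Python sketch that applies the same "last two labels" pattern as the regexp_substr() answer. It is not BigQuery or Oracle SQL, and the function name last_two_labels is invented for the illustration.

```python
import re


def last_two_labels(host):
    """Keep the last two dot-separated labels, like '[^.]*[.][^.]*$'."""
    match = re.search(r"[^.]*[.][^.]*$", host)
    # With no dot in the input there is no match, so return the input unchanged.
    return match.group(0) if match else host


if __name__ == "__main__":
    for site in ("www.abc.google.com", "wwww.efd.google.co.za"):
        print(site, "->", last_two_labels(site))
```

The first host comes back as google.com, but the second comes back as co.za rather than google.co.za, which is the case where a function that knows about public suffixes does better than counting dots.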
My task is to validate existing data in an MSSQL database. I've got some SQL experience, but not enough, apparently. We have a zip code field that must be either 5 or 9 digits (US zip). What we are finding in the zip field are embedded spaces and other oddities that will be prevented in the future. I've searched enough to find the references for LIKE that leave me with this "novice approach":
ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9]'
AND ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is this really what I must code? Is there nothing similar to...?
ZIP NOT LIKE '[\d]{5}' AND ZIP NOT LIKE '[\d]{9}'
I will loathe validating longer fields! I suppose, ultimately, both code sequences will be equally efficient (or should be).
Thanks for your help
Unfortunately, LIKE is not regex-compatible, so there is nothing like \d. However, combining a length function with a numeric function may provide an acceptable result:
WHERE ISNUMERIC(ZIP) <> 1 OR LEN(ZIP) NOT IN(5,9)
I would not recommend it, however, because ISNUMERIC will return 1 for a +, - or valid currency symbol. The minus sign especially may be prevalent in the data set, so I'd still favor your "novice" approach.
Another approach is to use:
ZIP LIKE '%[^0-9]%' OR LEN(ZIP) NOT IN(5,9)
which will find any row where zip contains a character that is not 0-9 (i.e. only 0-9 is allowed) or where the length is not 5 or 9.
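As a cross-check of the rule itself (digits only, length exactly 5 or 9), here is a minimal Python sketch. It is not T-SQL, and the names ZIP_PATTERN and is_valid_zip are made up for the example; it could be used to spot-check exported values against whatever the SQL predicate flags.

```python
import re

# Exactly five or exactly nine ASCII digits, nothing else.
ZIP_PATTERN = re.compile(r"[0-9]{5}|[0-9]{9}")


def is_valid_zip(zip_code):
    """True only when the whole string is 5 or 9 digits."""
    # fullmatch requires the entire string to match, so embedded spaces,
    # signs and hyphens all cause a rejection.
    return ZIP_PATTERN.fullmatch(zip_code) is not None


if __name__ == "__main__":
    for value in ("46268", "462681234", "4626 ", "-1234", "46268-1234"):
        print(repr(value), is_valid_zip(value))
```

Note that a leading minus or plus sign is rejected here, which is the case where ISNUMERIC would be too permissive.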
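There are a few ways you could achieve that.
You can replace each [0-9] with _, one per expected character. Note that _ matches any single character, so this checks the length only, not that the characters are digits:
ZIP NOT LIKE '_____'
Or use LEN(), so it's like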
LEN(ZIP) NOT IN(5,9)
You are looking for LEN() (the SQL Server equivalent of LENGTH() in MySQL and Oracle):
select * from table WHERE len(ZIP)=5;
select * from table WHERE len(ZIP)=9;
To test for non-numeric values you can use ISNUMERIC():
WHERE ISNUMERIC(ZIP) <> 1
I want to update a database table field using another field in the same table. Currently I have this table called sources.
Name Code
In the name column I have values like this example :
' Deals On Wheels '
'Homesru - Abu Dhabi - Madinat Zayed Gold Centre'
And I have this update statement:
UPDATE Sources
SET Code = REPLACE((LTRIM(RTRIM(Name))),' ','-')
the result is :
Deals-On-Wheels-Al-Aweer
which is fine.
but for second one I have this :
Homesru---Abu-Dhabi---Madinat-Zayed-Gold-Centre
I want it to be like this :
Homesru-Abu-Dhabi-Madinat-Zayed-Gold-Centre
How can I achieve this? Any help is appreciated.
As suggested by @DanielE., my answer will point to a more general solution, in case you ever need to collapse doubled, tripled, quadrupled, etc. occurrences of a character in a string.
I won't write a full solution for this issue; it is a recurring question and there are really good solutions around already. Check these links:
SQL Server Central: remove spaces between specific character in a string?. This forum post will point to the next link I'm posting here, but it is good to know what they are asking and answering.
Replace multiple spaces with new one but you can slightly modify it to replace any character you want.
You can also rely on this answer Find and remove repeated strings from Aaron Bertrand.
Try
REPLACE((LTRIM(RTRIM(REPLACE((LTRIM(RTRIM(Name))),' - ','-')))),' ','-')
This will first replace ' - ' with just '-', and then replace the remaining spaces with '-'.
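The target behaviour (any run of spaces and hyphens becomes a single hyphen) can be sketched in Python to generate expected values for testing the UPDATE. This is not T-SQL, and the function name make_code is invented for the illustration.

```python
import re


def make_code(name):
    """Trim the name, then collapse each run of whitespace/hyphens to one '-'."""
    # name.strip() mirrors LTRIM(RTRIM(Name)); the character class treats
    # ' - ', '  ' and '---' alike, so no triple hyphens can survive.
    return re.sub(r"[\s-]+", "-", name.strip())


if __name__ == "__main__":
    print(make_code(" Deals On Wheels "))
    print(make_code("Homesru - Abu Dhabi - Madinat Zayed Gold Centre"))
```

Unlike the nested REPLACE, this also handles doubled spaces and hyphens with a space on only one side.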
You might want to look into using a UDF to do a regular expression search and replace. See https://launchpad.net/mysql-udf-regexp
Hi, I've tried to find an answer to this but can't find one.
I'd like to remove some characters and prepend a pound sign to the result of an SQL query, which looks as follows (it's already using a replace command; can I stack these?):
select fundraiser.Company_Name,
replace(Just_Giving_Campaign,'"label":',''),
sum(fundraising_campaigns.Total_Collected) as donations
from fundraising_campaigns,
fundraiser
where Charity_Name = 'WaterAid'
and fundraising_campaigns.Campaigners_ID = fundraiser.id
group by fundraiser.Company_Name
Can anyone confirm how I would go about adding a £ sign and removing several sets of characters in a select statement? I certainly don't appear to be able to stack replace statements, e.g.
replace(replace (string, what to match, what to replace it with), what to match, what to replace it with)
I'd appreciate any thoughts.
I am not sure I understand your question. If I am correct, you want to prepend £ and do some nested replaces. I hope the example below helps.
select '£'+replace(replace('YourText','x','s'),'You','U')
I have a data set that I import into a SQL table every night. One field is 'Address_3' and contains the City, State, Zip and Country fields. However, this data isn't standardized. How can I best parse the data that is currently going into one field into individual fields? Here are some examples of the data I might receive:
'INDIANAPOLIS, IN 46268 US'
'INDIANAPOLIS, IN 46268-1234 US'
'INDIANAPOLIS, IN 46268-1234'
'INDIANAPOLIS, IN 46268'
Thanks in advance!
David
I've done something similar (not in T-SQL) and I find it works best to start at the end of the string and work backwards.
Grab the rightmost element up to the first space or comma.
Is it a known country code? It's a country
If not, is it all numeric (including a hyphen)? It's a zip code.
Else discard it
Grab the second rightmost element up to the next space or comma
Is it a two alpha-character field? It's the state
Grab everything else preceding the last comma and call it the city.
You'll need to make some adjustments based on what your input data looks like but the basic idea is to start from the right, grab the elements you can easily classify and call everything else the city.
You can implement something like this by using the REVERSE function to make searching easier (in which case you'll be parsing the string from left to right instead of right to left like I said above), the PATINDEX or CHARINDEX functions to find spaces and commas, and the SUBSTRING function to pull the address apart based on the positions found by PATINDEX and CHARINDEX. You could use the ASCII function to determine if a character is numeric or not.
You tagged your question with the SSIS tag as well - it might be easier to implement the parsing in some VB script in SSIS rather than try to do it with T-SQL.
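The right-to-left steps above can be sketched as follows. This is a Python illustration of the approach, not T-SQL or SSIS script; the names KNOWN_COUNTRIES and parse_address are hypothetical, the country list is a placeholder you would replace with your own, and the zip test is stricter than "all numeric including a hyphen" (it accepts only 5 digits or 5+4).

```python
import re

# Assumption: a placeholder set of country codes; extend it for real data.
KNOWN_COUNTRIES = {"US"}


def parse_address(line):
    """Split 'CITY, ST ZIP [COUNTRY]' by classifying tokens from the right."""
    # Everything before the last comma is the city (empty if there is no comma).
    city, _, rest = line.rpartition(",")
    tokens = rest.split()
    country = zip_code = state = None
    # Rightmost token: a known country code?
    if tokens and tokens[-1].upper() in KNOWN_COUNTRIES:
        country = tokens.pop().upper()
    # Next token from the right: 5 digits, optionally followed by -4 digits?
    if tokens and re.fullmatch(r"[0-9]{5}(-[0-9]{4})?", tokens[-1]):
        zip_code = tokens.pop()
    # Next token from the right: two letters means a state.
    if tokens and re.fullmatch(r"[A-Za-z]{2}", tokens[-1]):
        state = tokens.pop().upper()
    return {"city": city.strip(), "state": state,
            "zip": zip_code, "country": country}


if __name__ == "__main__":
    for sample in ("INDIANAPOLIS, IN 46268 US",
                   "INDIANAPOLIS, IN 46268-1234 US",
                   "INDIANAPOLIS, IN 46268-1234",
                   "INDIANAPOLIS, IN 46268"):
        print(parse_address(sample))
```

Each field is optional, so a missing country or zip simply comes back as None instead of shifting the other fields.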
By far the best way is to not reinvent the wheel and get an address parsing and standardization engine. Ideally, you would use a CASS certified engine which is what is approved by the Postal Service. However, there are free address parsers on the net these days and any of those would be more accurate and less frustrating than trying to parse the address yourself.
That said, I will say that address parsers and the Post Office work from the bottom up (so: country, then zip code, then city, then state, then address line 2, etc.).
In SSIS you can have 4 derived columns (city,state,zip,country).
SUBSTRING(column, 1, FINDSTRING(column, ",", 1) - 1) -- city
SUBSTRING(column, FINDSTRING(column, " ", 1) + 1, FINDSTRING(column, " ", 2) - FINDSTRING(column, " ", 1) - 1) -- state
SUBSTRING(column, FINDSTRING(column, " ", 2) + 1, FINDSTRING(column, " ", 3) - FINDSTRING(column, " ", 2) - 1) -- zip
You can see the pattern above and continue accordingly. This might get a bit complicated. You can use a Script Component to better pull out the lines of text.
something like this should help:
select substring(CityStateZip, 1,
case when charindex(',',reverse(CityStateZip)) = 0 then len(CityStateZip)
else len(CityStateZip) - charindex(',',reverse(CityStateZip)) end) as City,
LEFT(LTRIM(
SUBSTRING(CityStateZip, case when charindex(',',reverse(CityStateZip)) = 0 then len(CityStateZip) else
len(CityStateZip) - charindex(',',reverse(CityStateZip))+2 end, LEN(CityStateZip)))
,2) as State,
SUBSTRING(CityStateZip, case when charindex(' ',reverse(CityStateZip)) = 0 then len(CityStateZip) else
len(CityStateZip) - charindex(' ',reverse(CityStateZip))+2 end, LEN(CityStateZip)) as Zip
from YourAddressTable