Using SQL - how do I match an exact number of characters? - sql

My task is to validate existing data in an MSSQL database. I've got some SQL experience, but not enough, apparently. We have a zip code field that must be either 5 or 9 digits (US zip). What we are finding in the zip field are embedded spaces and other oddities that will be prevented in the future. I've searched enough to find the references for LIKE that leave me with this "novice approach":
ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9]'
AND ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is this really what I must code? Is there nothing similar to...?
ZIP NOT LIKE '[\d]{5}' AND ZIP NOT LIKE '[\d]{9}'
I will loath validating longer fields! I suppose, ultimately, both code sequences will be equally efficient (or should be).
Thanks for your help

Unfortunately, LIKE is not regex-compatible so nothing of the sort \d. Although, combining a length function with a numeric function may provide an acceptable result:
WHERE ISNUMERIC(ZIP) <> 1 OR LEN(ZIP) NOT IN(5,9)
I would however not recommend it because it ISNUMERIC will return 1 for a +, - or valid currency symbol. Especially the minus sign may be prevalent in the data set, so I'd still favor your "novice" approach.
Another approach is to use:
ZIP NOT LIKE '%[^0-9]%' OR LEN(ZIP) NOT IN(5,9)
which will find any row where zip does not contain any character that is not 0-9 (i.e only 0-9 allowed) where the length is not 5 or 9.

There are few ways you could achieve that.
You can replace [0-9] with _ like
ZIP NOT LIKE '_'
USE LEN() so it's like
LEN(ZIP) NOT IN(5,9)

You are looking for LENGTH()
select * from table WHERE length(ZIP)=5;
select * from table WHERE length(ZIP)=9;

To test for non-numeric values you can use ISNUMERIC():
WHERE ISNUMERIC(ZIP) <> 1

Related

Regex match first number if it does not appear at the end

I am currently facing a Regex problem which apparently I cannot find an answer to.
My Regex is embedded in a teradata SQL of the form:
REGEXP_SUBSTR(column, 'regex_pattern')
I want to find the first appearance of any number except if it appears at the end of the string.
For Example:
"YEL2X30" -> "2"
"YEL19XYZ05" -> "19"
"YELLOW05" -> ""
I tried it with '[0-9]+(?!$)/' but this returns me a blank String always.
Thanks in Advance!
Shot in the dark here since I'm unfamiliar with teradata and the supported SQL-functionality. However, reading the docs on the REGEXP_SUBSTR() function it seems like you may want to use the 3rd and 4th possible argument along with a slightly different regular expression:
[0-9]+(?![0-9]|$)
Meaning: 1+ Digits that are not followed by either the end of the string or another digit.
I'd believe the following syntax may work now to retrieve the 1st appearance of any number from the matching results:
REGEXP_SUBSTR(column, '[0-9]+(?![0-9]|$)', 1, 1)
The 3rd parameter states from which position in the source-string we need to start searching whereas the 4th will return the 1st match from any possible multiple matches (is how I read the docs). For example: abc123def456ghi789 whould return 123.
Fiddling around in online IDE's gave me that:
CREATE TABLE TBL (TST varchar(100));
INSERT INTO TBL values ('YEL2X30'), ('YEL19XYZ05'), ('YELLOW05'), ('abc123def456ghi789');
SELECT REGEXP_SUBSTR(TST, '[0-9]+(?![0-9]|$)', 1, 1) as 'RESULTS' FROM TBL;
Resulted in:
RESULTS
2
19
NULL
123
NOTE: I also noticed that leaving out the 3rd and 4th parameter made no difference since they will default back to 1 without explicitly mentioning them. I tested this over here.
Possibly the simplest way is to look for digits followed by a non-digit. Then keep all the digits:
regexp_substr(regexp_substr(column, '[0-9]+[^0-9]'), '[0-9]+')

Using SQL like for pattern query

I have a PHP function that accepts a parameter called $letter and I want to set the default value of the parameter to a pattern which is "any number or any symbol". How can I do that?
This is my query by the way .
select ID from $wpdb->posts where post_title LIKE '".$letter."%
I tried posting at wordpress stackexchange and they told me to post it here as this is an SQL/general programming question that specific to wordpress.
Thank you! Replies much appreciated :)
In order to match just numbers or letters (I'm not sure exactly what you mean by symbols) you can use the RLIKE operator in MySQL:
SELECT ... WHERE post_title RLIKE '^[A-Za-z0-9]'
That means by default $letter would be [A-Za-z0-9] - this means all letters from a to z (both cases) and numbers from 0-9. If you need specific symbols you can add them to the list (but - has to be first or last, since otherwise it has a special meaning of range). The ^ character tells it to be at the beginning of the string. So you will need something like:
"select ID from $wpdb->posts where post_title RLIKE '^".$letter."%'"
Of course I have to warn you against SQL injection attacks if you build your query like this without sanitizing the input (making sure it doesn't have any ' (apostrophe) in it.
Edit
To match a title that starts with a number just use [0-9] - that means it will match one digit from 0 to 9

Strip first 2 characters and replace with another

This seems like quite a silly basic question but it seems something I have completely passed over in my knowledge.
Basically I have a string representing a phone number (0 being the area code): 0828889988
And would like to replace the first zero with a 27 (South African dialing code) as I am pretty sure it will always be so, my SMS api requires it in international format but I want the user to enter it in local format, so should be: 27828889988
Is there a line or two of code I can call to replace that first character with the two others?
As is - I can think around a workaround solution but as I am not sure of the direct syntax will be quite a few lines long.
Dim number as String = "0828889988"
number = "27" + number.SubString(1)
return number //returns "27828889988"

inserting number into oracle sql - using jython

I have this insert command where iam trying to insert a number to be taken from loop
i=0
for line in column:
myStmt.executeQuery("INSERT INTO REVERSE_COL
( TABLE_NAME,COL_NAME,POS) values
(,'test','"+column[i]+"','"+i+"'")
i=i+1
POS IS NUMBER DATATYPE
but it works if i hard code as 1
i=0
for line in column:
myStmt.executeQuery("INSERT INTO REVERSE_COL
( TABLE_NAME,COL_NAME,POS) values
(,'test','"+column[i]+"',1")
I have tried only i , +i+ and other method but its not working any suggestion how to solve this .
Thanks everyone .
I have no jython experience, but I will still try to offer my personal approach and advice. Take from it what you will.
The first thing that I would look into, and perhaps this is something someone else knows offhand, is the way that a number is concatenated to the string. I'm speaking from a C++ background here, but a number i may well be converted to the ASCII character representing that value, and not necessarily the character that you intend.
For example, if i is 9, it may be placing a TAB into the string and not the number 9, which would be an ASCII value 57.
Again, I'm not telling you this IS the answer...but it's the first thing that pops into my mind. Good luck!

What's the best way to parse an Address field using t-sql or SSIS?

I have a data set that I import into a SQL table every night. One field is 'Address_3' and contains the City, State, Zip and Country fields. However, this data isn't standardized. How can I best parse the data that is currently going into 1 field into individual fields. Here are some examples of the data I might receive:
'INDIANAPOLIS, IN 46268 US'
'INDIANAPOLIS, IN 46268-1234 US'
'INDIANAPOLIS, IN 46268-1234'
'INDIANAPOLIS, IN 46268'
Thanks in advance!
David
I've done something similar (not in T-SQL) and I find it works best to start at the end of the string and work backwards.
Grab the rightmost element up to the first space or comma.
Is it a known country code? It's a country
If not, is it all numeric (including a hyphen)? It's a zip code.
Else discard it
Grab the second rightmost element up to the next space or comma
Is it a two alpha-character field? It's the state
Grab everything else preceding the last comma and call it the city.
You'll need to make some adjustments based on what your input data looks like but the basic idea is to start from the right, grab the elements you can easily classify and call everything else the city.
You can implement something like this by using the REVERSE function to make searching easier (in which case you'll be parsing the string from left to right instead of right to left like I said above), the PATINDEX or CHARINDEX functions to find spaces and commas, and the SUBSTRING function to pull the address apart based on the positions found by PATINDEX and CHARINDEX. You could use the ASCII function to determine if a character is numeric or not.
You tagged your question with the SSIS tag as well - it might be easier to implement the parsing in some VB script in SSIS rather than try to do it with T-SQL.
By far the best way is to not reinvent the wheel and get an address parsing and standardization engine. Ideally, you would use a CASS certified engine which is what is approved by the Postal Service. However, there are free address parsers on the net these days and any of those would be more accurate and less frustrating than trying to parse the address yourself.
That said, I will say that address parsers and the Post Office work from bottom up (So, country, then zip code, then city, then state then address line 2 etc.).
In SSIS you can have 4 derived columns (city,state,zip,country).
substring(column,1,FINDSTRING(",",column,1)-1) --city
substring(column,FINDSTRING(" ",column,1)+1,FINDSTRING("",column,2)-1) --state
substring(column,FINDSTRING(" ",column,2)+1,FINDSTRING(" ",column,3)-1) -- zip
You can see the pattern above and continue accordingly. This might get a bit complicated. You can use a Script Component to better pull out the lines of text.
something like this should help:
select substring(CityStateZip, 1,
case when charindex(',',reverse(CityStateZip)) = 0 then len(CityStateZip)
else len(CityStateZip) - charindex(',',reverse(CityStateZip)) end) as City,
LEFT(LTRIM(
SUBSTRING(CityStateZip, case when charindex(',',reverse(CityStateZip)) = 0 then len(CityStateZip) else
len(CityStateZip) - charindex(',',reverse(CityStateZip))+2 end, LEN(CityStateZip)))
,2) as State,
SUBSTRING(CityStateZip, case when charindex(' ',reverse(CityStateZip)) = 0 then len(CityStateZip) else
len(CityStateZip) - charindex(' ',reverse(CityStateZip))+2 end, LEN(CityStateZip)) as Zip
from YourAddressTable