SQL Check constraint on substrings - sql

Forgive me, if this problem is solved in another ticket on SO... I've been searching, but can't seem to find quite the right solution...
I am creating a table. Column 'Creditor' is numeric all the way, EXCEPT that the very last char may be a dash.
This means that examples like '1234-', '92846293' and so on, are valid, and that '12354-53', '12345K' are invalid.
The string length is not fixed (except it's a varchar(50)).
I don't know how to create the check constraint.
Please help!

You did not state your DBMS. For PostgreSQL this would be:
alter table foo
add constraint check_creditor check (creditor ~ '^([0-9]+)\-?$');
For Oracle this would be:
alter table foo
add constraint check_creditor check (regexp_like(creditor, '^([0-9]+)\-?$'))
If your DBMS supports regular expressions, you will need to use the syntax for your DBMS to check this. The regular expression itself '^([0-9]+)\-$' will most probably be the same though.

Thank you for your replies.
The proposal on '%[^0-9]%' was a nice eye-opener for me, as I didn't know the ^ operator before.
I did two versions of the required constraint. One using "only" AND, OR, substrings and isnumeric. No fancy indices or exclusions. Was waaaay too long.
The other version consisted of AND, OR, substrings and insmuric, but with the inclusion of the proposed ^ operations. Looks a lot nicer.
Then, in the end, I went with a third solution :-)
Added a bool column on the DB, RequiresCreditorValidation, and implemented a Regex in my C# code.
For others hoping to benefit from the resulting checks, here they are.
Starting with the "ugly" one:
CHECK ((val NOT IN ('+','-') AND (ISNUMERIC(val) = 1) OR
(ISNUMERIC(SUBSTRING(val, 1, DATALENGTH(val) -1))) = 1) AND
((SUBSTRING(val, (DATALENGTH(val)),1) LIKE '[0-9]') OR
(SUBSTRING(val, DATALENGTH(val),1) = '-')) AND
(SUBSTRING(val, 1, 1) NOT IN ('+','-')) )
The second one:
CHECK ( (SUBSTRING(val, 1, DATALENGTH(val) - 1) NOT LIKE '%[^0-9]%') AND
(SUBSTRING(val, DATALENGTH(val),1) LIKE '%[0-9-]') AND (DATALENGTH(val) > 0)
AND SUBSTRING(val, 1,1) NOT IN ('+','-') )
And then the Regex:
var allButLast = kreditorId.Substring(0, kreditorId.Length - 1);
if (Regex.Match(allButLast, "[^0-9]").Success)
return false;
if (!kreditorId.EndsWith("-"))
if (Regex.Match(kreditorId, "[^0-9]").Success)
return false;
return true;
Thank you all for good, qualified and quick replies.

For SQL Server:
Test 1: all characters apart from the last are numeric (alternatively, there does not exist a character that is non-numeric):
CHECK ( SUBSTRING(Creditor, 1, LEN(Creditor) - 1) NOT LIKE '%[^0-9]%' )
Test 2: the last character is either numeric or a dash:
CHECK ( SUBSTRING(Creditor, LEN(Creditor), 1) LIKE '%[0-9-%]' )
The above assumes Creditor cannot be the empty string i.e.
CHECK ( LEN(Creditor) > 0 )
Just for fun:
CHECK ( REVERSE(CAST(REVERSE(Creditor) + REPLICATE(0, 50) AS CHAR(50)))
LIKE REPLICATE('[0-9]', 49) + '[0-9-]' )

Related

SQL - Version comparison

this is my first question here.
I am building an SQL query in which I need to verify that the version of the object B is always lower or equal than the version of the object A. This is a link table, here is an example :
The query is :
SELECT *
FROM TABLE
WHERE B_VERSION <= A_VERSION
As you can see, it works for the 2 first rows, but not the third, because AA0 is detected as smaller than H08 while it shouldn't (when we arrive at Z99 the next version number is AA0 so the <= operator doesn't work anymore).
So I would like to do something like to parse the version to compare on how many letters are they in the versions, and only if both versions have the same number of letters then I use the <= operator.
I don't know however how to do that in an SQL query. Didn't find anything usefull on google neither. Do you have a solution ?
Thanks in advance
The key for solving this problem is the function PATINDEX. You can find more information here.
This query takes the value of A_VERSION and finds the first occurrence of a number. Then uses this position to divide the value in two parts. The first part is padded to the right with spaces because it is alphabetic, while the second part is padded to the right with zeros ('0') because it is numeric.
The same process occurs for B_VERSION.
Noticed that in this example, each part is assumed to be of maximum 5 characters, so this will work in your case for versions ranging from A0 to ZZZZZ99999. Feel free to adjust as you need.
SELECT *
FROM TABLE
WHERE RIGHT(SPACE(5)
+ SUBSTRING(A_VERSION,
1,
PATINDEX('%[0-9]%', A_VERSION) - 1), 5)
+ RIGHT(REPLICATE('0', 5)
+ SUBSTRING(A_VERSION,
PATINDEX('%[0-9]%', A_VERSION),
LEN(A_VERSION)), 5)
<= RIGHT(SPACE(5)
+ SUBSTRING(B_VERSION,
1,
PATINDEX('%[0-9]%', B_VERSION) - 1), 5)
+ RIGHT(REPLICATE('0', 5)
+ SUBSTRING(B_VERSION,
PATINDEX('%[0-9]%', B_VERSION),
LEN(B_VERSION)) ,5)
If you are going to do this operation in many places, you might consider creating a function for this operation.
Hope this helps.
Many thanks! It helped a lot however I am using sql developer and I cannot use PATINDEX with this software, I found the equivalent which is REGEXP_INSTR, it works very similarly.
I used this alrogithm that filters out the lines where there are more letters in VERSION_B than VERSION_A and then filter out the lines where VERSION_B is bigger than VERSION_A when they have both the same quantity of letters:
WHERE
(REGEXP_INSTR(VERSION_B, '[0-9]') < REGEXP_INSTR(VERSION_A, '[0-9]')) OR
(REGEXP_INSTR(VERSION_B, '[0-9]') = REGEXP_INSTR(VERSION_A, '[0-9]') AND VERSION_B <= VERSION_A)

DB2 Get value based on 5 character of field

I am working with a pretty badly designed table. I have a field called optional fields which for some reason has been used as a catch-all for someone who didn't want to create the table correctly.
I need to make a query where I look at this optional_fields value and do a comparison on the fifth value of the string in optional_fields.
The value from this field is something like NN14YN...N
My query would be something like:
SELECT COMPANY_NUMBER
FROM table
WHERE fifth character of OPtional Fields = 'Y'
Looking at the supported string functions in DB2 (according to the documentation for DB2 for Linux UNIX and Windows 9.7.0) it would seem that substr could be used:
SELECT COMPANY_NUMBER
FROM table
WHERE substr(optional_Fields,5,1) = 'Y'
In addition to the great answer from #jpw, if you for some reason need to check multiple positions within the string (which I have unfortunately had to do at one time), you can use an IN, and invert the "normal" order, like so:
...
WHERE 'Y' in (
substr(t.flags_field, 123, 1)
,substr(t.flags_field, 19, 1)
,substr(t.flags_field, 128, 1)
,substr(t.flags_field, 1, 1)
)
Just thought I would share. It surprised me the first time I used it!

Return rows where first character is non-alpha

I'm trying to retrieve all columns that start with any non alpha characters in SQlite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in sqllite, I don't have a ready instance to test the syntax there on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally will work. Unsure what char commands there are to do a range of letters in SQLlite.
I used 'upper' to make it so you don't need to do lower case letters in the not in statement...kinda hope SQLlite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.

SQL query - LEFT 1 = char, RIGHT 3-5 = numbers in Name

I need to filter out junk data in SQL (SQL Server 2008) table. I need to identify these records, and pull them out.
Char[0] = A..Z, a..z
Char[1] = 0..9
Char[2] = 0..9
Char[3] = 0..9
Char[4] = 0..9
{No blanks allowed}
Basically, a clean record will look like this:
T1234, U2468, K123, P50054 (4 record examples)
Junk data looks like this:
T12.., .T12, MARK, TP1, SP2, BFGL, BFPL (7 record examples)
Can someone please assist with a SQL query to do a LEFT and RIGHT method and extract those characters, and do a LIKE IN or something?
A function would be great though!
The following should work in a few different systems:
SELECT *
FROM TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]%'
AND Data NOT LIKE '% %'
This approach will indeed match P2343, P23423JUNK, and other similar text but requires that the format is A0000*.
Now, if the OP implies a format of 1st position is a character and all succeeding positions are numeric, as in A0+, then use the following (in SQL Server and a good deal of other database systems):
SELECT *
FROM TheTable
WHERE SUBSTRING(Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(Data, 2, LEN(Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(Data) >= 5
To incorporate this into a SQL Server 2008 function, since this appears to be what you'd like most, you can write:
CREATE FUNCTION ufn_IsProperFormat(#data VARCHAR(50))
RETURNS BIT
AS
BEGIN
RETURN
CASE
WHEN SUBSTRING(#Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(#Data, 2, LEN(#Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(#Data) >= 5 THEN 1
ELSE 0
END
END
...and call into it like so:
SELECT *
FROM TheTable
WHERE dbo.ufn_IsProperFormat(Data) = 1
...this query needs to change for Oracle queries because Oracle doesn't appear to support bracket notation in LIKE clauses:
SELECT *
FROM TheTable
WHERE REGEXP_LIKE(Data, '^[A-za-z]\d{4,}$')
This is the expansion gbn is doing in his answer, but these versions allow for varying string lengths without the OR conditions.
EDIT: Updated to support examples in SQL Server and Oracle for ensuring the format A0+, so that A1324, A2342388, and P2342 match but A2342JUNK and A234 do not.
The Oracle REGEXP_LIKE code was borrowed from Mark's post but updated to support 4 or more numeric digits.
Added a custom SQL Server 2008 approach which implements these techniques.
Depends on your database. Many have regex functions (note examples not tested so check)
e.g. Oracle
SELECT x
FROM table
WHERE REGEXP_LIKE(x, '^[A-za-z][:digit:]{4}$')
Sybase uses LIKE
Given that you're allowing between 3 and 6 digits for the number in your examples then it's probably better to use the ISNUMERIC() function on the 2nd character onwards:
SELECT *
FROM TheTable
-- start with a letter
WHERE Data LIKE '[A-Za-z]%'
-- everything from 2nd character onwards is a number
AND ISNUMERIC( SUBSTRING( Data, 2, 50 ) ) = 1
-- number doesn't have a decimal place
AND Data NOT LIKE '%.%'
For more information look at the ISNUMERIC function on MSDN.
Also note that:
I've limited the 2nd part with the number to 50 characters maximum, change this to suit your needs.
Strictly speaking you should check for currency symbols etc, as ISNUMERIC allows them, as well as +/- and some others
A better option might be to create a function that checks that each character after the first is between 0 and 9 (or 1 and 0 if you're using ASCII codes).
You can't use Regular Expressions in SQL Server, so you have to use OR. Correcting David Andres' answer...
WHERE
(
Data LIKE '[A-Za-z][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9][0-9]'
)
David's answer allows "D1234junk" through
You also only need "[A-Z]" if you don't have case sensitivity

What is the best way to select string fields based on character ranges?

I need to add the ability for users of my software to select records by character ranges.
How can I write a query that returns all widgets from a table whose name falls in the range Ba-Bi for example?
Currently I'm using greater than and less than operators, so the above example would become:
select * from widget
where name >= 'ba' and name < 'bj'
Notice how I have "incremented" the last character of the upper bound from i to j so that "bike" would not be left out.
Is there a generic way to find the next character after a given character based on the field's collation or would it be safer to create a second condition?
select * from widget
where name >= 'ba'
and (name < 'bi' or name like 'bi%')
My application needs to support localization. How sensitive is this kind of query to different character sets?
I also need to support both MSSQL and Oracle. What are my options for ensuring that character casing is ignored no matter what language appears in the data?
Let's skip directly to localization. Would you say "aa" >= "ba" ? Probably not, but that is where it sorts in Sweden. Also, you simply can't assume that you can ignore casing in any language. Casing is explicitly language-dependent, with the most common example being Turkish: uppercase i is İ. Lowercase I is ı.
Now, your SQL DB defines the result of <, == etc by a "collation order". This is definitely language specific. So, you should explicitly control this, for every query. A Turkish collation order will put those i's where they belong (in Turkish). You can't rely on the default collation.
As for the "increment part", don't bother. Stick to >= and <=.
For MSSQL see this thread: http://bytes.com/forum/thread483570.html .
For Oracle, it depends on your Oracle version, as Oracle 10 now supports regex(p) like queries: http://www.psoug.org/reference/regexp.html (search for regexp_like ) and see this article: http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html
HTH
Frustratingly, the Oracle substring function is SUBSTR(), whilst it SQL-Server it's SUBSTRING().
You could write a simple wrapper around one or both of them so that they share the same function name + prototype.
Then you can just use
MY_SUBSTRING(name, 2) >= 'ba' AND MY_SUBSTRING(name, 2) <= 'bi'
or similar.
You could use this...
select * from widget
where name Like 'b[a-i]%'
This will match any row where the name starts with b, the second character is in the range a to i, and any other characters follow.
I think that I'd go with something simple like appending a high-sorting string to the end of the upper bound. Something like:
select * from widgetwhere name >= 'ba' and name <= 'bi'||'~'
I'm not sure that would survive EBCDIC conversion though
You could also do it like this:
select * from widget
where left(name, 2) between 'ba' and 'bi'
If your criteria length changes (as you seemed to indicate in a comment you left), the query would need to have the length as an input also:
declare #CriteriaLength int
set #CriteriaLength = 4
select * from widget
where left(name, #CriteriaLength) between 'baaa' and 'bike'