What is the best way to select string fields based on character ranges? - sql

I need to add the ability for users of my software to select records by character ranges.
How can I write a query that returns all widgets from a table whose name falls in the range Ba-Bi for example?
Currently I'm using greater than and less than operators, so the above example would become:
select * from widget
where name >= 'ba' and name < 'bj'
Notice how I have "incremented" the last character of the upper bound from i to j so that "bike" would not be left out.
Is there a generic way to find the next character after a given character based on the field's collation or would it be safer to create a second condition?
select * from widget
where name >= 'ba'
and (name < 'bi' or name like 'bi%')
My application needs to support localization. How sensitive is this kind of query to different character sets?
I also need to support both MSSQL and Oracle. What are my options for ensuring that character casing is ignored no matter what language appears in the data?

Let's skip directly to localization. Would you say "aa" >= "ba" ? Probably not, but that is where it sorts in Sweden. Also, you simply can't assume that you can ignore casing in any language. Casing is explicitly language-dependent, with the most common example being Turkish: uppercase i is İ. Lowercase I is ı.
Now, your SQL DB defines the result of <, == etc by a "collation order". This is definitely language specific. So, you should explicitly control this, for every query. A Turkish collation order will put those i's where they belong (in Turkish). You can't rely on the default collation.
As for the "increment part", don't bother. Stick to >= and <=.

For MSSQL see this thread: http://bytes.com/forum/thread483570.html .
For Oracle, it depends on your Oracle version, as Oracle 10 now supports regex(p) like queries: http://www.psoug.org/reference/regexp.html (search for regexp_like ) and see this article: http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html
HTH

Frustratingly, the Oracle substring function is SUBSTR(), whilst it SQL-Server it's SUBSTRING().
You could write a simple wrapper around one or both of them so that they share the same function name + prototype.
Then you can just use
MY_SUBSTRING(name, 2) >= 'ba' AND MY_SUBSTRING(name, 2) <= 'bi'
or similar.

You could use this...
select * from widget
where name Like 'b[a-i]%'
This will match any row where the name starts with b, the second character is in the range a to i, and any other characters follow.

I think that I'd go with something simple like appending a high-sorting string to the end of the upper bound. Something like:
select * from widgetwhere name >= 'ba' and name <= 'bi'||'~'
I'm not sure that would survive EBCDIC conversion though

You could also do it like this:
select * from widget
where left(name, 2) between 'ba' and 'bi'
If your criteria length changes (as you seemed to indicate in a comment you left), the query would need to have the length as an input also:
declare #CriteriaLength int
set #CriteriaLength = 4
select * from widget
where left(name, #CriteriaLength) between 'baaa' and 'bike'

Related

SQL computed column based on a special character on another column

I am not good with SQL at all, barely have an idea on how to do basic scripts suck as delete, drop, add.
I have this data with about 12 columns, I want to add a calculated column which will change depending if a special character shows up in another column.
lets say
A C
Money$ YES
Money NO
that is the idea, I want to create a column C where it says yes if there is a $ sign on the column A. Is this possible? I am assuming you can use something similar to an if condition but I have no experience with SQL scripting.
You would use a case expression and like:
select t.*,
(case when a like '%$%' then 'YES' else 'NO' end) as c
from t;
The following is just commentary.
This is very basic syntax for SQL. I would recommend that you spend some time to learn the basics. Learning-as-you-go is an okay approach -- assuming you have some fundamentals to build on. Otherwise, you are likely to spend a lot of time to learn a few things, and you may not learn the best way to do things.
yes, this is possible. you'll have to replace the parts in braces ({}) with the appropriate object names. I also use a bit rather than 'Yes'/'No'; as that seems better suited:
ALTER TABLE {YourTable} ADD {New Column Name} AS CONVERT(bit, CASE WHEN {Column} LIKE '%$%' THEN 1 ELSE 0 END) PERSISTED;
Note that this will return 0 if the column ({Column}) has a value of NULL, not NULL; unsure if this is the correct logic however, this should be more than enough to get the ball rolling. If not, read up on the CASE expression and NULL logic.
Regexp match can help you find out if there is a character you consider as special char in the strings:
SELECT
ColumnA
, SUBSTRING(ColumnA, PATINDEX('%[^ a-zA-Z0-9]%', ColumnA), 1) AS FirstSpecialChar
WHERE
ColumnA LIKE '%[^ a-zA-Z0-9]%'
;
The pattern [^ a-zA-Z0-9] will match on any character which is not a number, a space or an alphabetic character (note the ^ at the beginning of the character group - that mean NOT)
You can use regex to check any special character in column EX:
SQL SERVER
SELECT CASE WHEN 'ABCD$' Like '%[^a-zA-Z0-9]%' 1 THEN 'YES' ELSE 'NO' END as result
MYSQL
SELECT CASE WHEN 'ABCD$' REGEXP '[^a-zA-Z0-9]' = 1 THEN 'YES' ELSE 'NO' END as result
Regex can be changed as per the requirement
REGEXP '[^[:alnum:]]'

DB2 Get value based on 5 character of field

I am working with a pretty badly designed table. I have a field called optional fields which for some reason has been used as a catch-all for someone who didn't want to create the table correctly.
I need to make a query where I look at this optional_fields value and do a comparison on the fifth value of the string in optional_fields.
The value from this field is something like NN14YN...N
My query would be something like:
SELECT COMPANY_NUMBER
FROM table
WHERE fifth character of OPtional Fields = 'Y'
Looking at the supported string functions in DB2 (according to the documentation for DB2 for Linux UNIX and Windows 9.7.0) it would seem that substr could be used:
SELECT COMPANY_NUMBER
FROM table
WHERE substr(optional_Fields,5,1) = 'Y'
In addition to the great answer from #jpw, if you for some reason need to check multiple positions within the string (which I have unfortunately had to do at one time), you can use an IN, and invert the "normal" order, like so:
...
WHERE 'Y' in (
substr(t.flags_field, 123, 1)
,substr(t.flags_field, 19, 1)
,substr(t.flags_field, 128, 1)
,substr(t.flags_field, 1, 1)
)
Just thought I would share. It surprised me the first time I used it!

How do I sort a VARCHAR column in PostgreSQL that contains words and numbers?

I need to order a select query using a varchar column, using numerical and text order. The query will be done in a java program, using jdbc over postgresql.
If I use ORDER BY in the select clause I obtain:
1
11
2
abc
However, I need to obtain:
1
2
11
abc
The problem is that the column can also contain text.
This question is similar (but targeted for SQL Server):
How do I sort a VARCHAR column in SQL server that contains words and numbers?
However, the solution proposed did not work with PostgreSQL.
Thanks in advance, regards,
I had the same problem and the following code solves it:
SELECT ...
FROM table
order by
CASE WHEN column < 'A'
THEN lpad(column, size, '0')
ELSE column
END;
The size var is the length of the varchar column, e.g 255 for varying(255).
You can use regular expression to do this kind of thing:
select THECOL from ...
order by
case
when substring(THECOL from '^\d+$') is null then 9999
else cast(THECOL as integer)
end,
THECOL
First you use regular expression to detect whether the content of the column is a number or not. In this case I use '^\d+$' but you can modify it to suit the situation.
If the regexp doesn't match, return a big number so this row will fall to the bottom of the order.
If the regexp matches, convert the string to number and then sort on that.
After this, sort regularly with the column.
I'm not aware of any database having a "natural sort", like some know to exist in PHP. All I've found is various functions:
Natural order sort in Postgres
Comment in the PostgreSQL ORDER BY documentation

SQL query - LEFT 1 = char, RIGHT 3-5 = numbers in Name

I need to filter out junk data in SQL (SQL Server 2008) table. I need to identify these records, and pull them out.
Char[0] = A..Z, a..z
Char[1] = 0..9
Char[2] = 0..9
Char[3] = 0..9
Char[4] = 0..9
{No blanks allowed}
Basically, a clean record will look like this:
T1234, U2468, K123, P50054 (4 record examples)
Junk data looks like this:
T12.., .T12, MARK, TP1, SP2, BFGL, BFPL (7 record examples)
Can someone please assist with a SQL query to do a LEFT and RIGHT method and extract those characters, and do a LIKE IN or something?
A function would be great though!
The following should work in a few different systems:
SELECT *
FROM TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]%'
AND Data NOT LIKE '% %'
This approach will indeed match P2343, P23423JUNK, and other similar text but requires that the format is A0000*.
Now, if the OP implies a format of 1st position is a character and all succeeding positions are numeric, as in A0+, then use the following (in SQL Server and a good deal of other database systems):
SELECT *
FROM TheTable
WHERE SUBSTRING(Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(Data, 2, LEN(Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(Data) >= 5
To incorporate this into a SQL Server 2008 function, since this appears to be what you'd like most, you can write:
CREATE FUNCTION ufn_IsProperFormat(#data VARCHAR(50))
RETURNS BIT
AS
BEGIN
RETURN
CASE
WHEN SUBSTRING(#Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(#Data, 2, LEN(#Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(#Data) >= 5 THEN 1
ELSE 0
END
END
...and call into it like so:
SELECT *
FROM TheTable
WHERE dbo.ufn_IsProperFormat(Data) = 1
...this query needs to change for Oracle queries because Oracle doesn't appear to support bracket notation in LIKE clauses:
SELECT *
FROM TheTable
WHERE REGEXP_LIKE(Data, '^[A-za-z]\d{4,}$')
This is the expansion gbn is doing in his answer, but these versions allow for varying string lengths without the OR conditions.
EDIT: Updated to support examples in SQL Server and Oracle for ensuring the format A0+, so that A1324, A2342388, and P2342 match but A2342JUNK and A234 do not.
The Oracle REGEXP_LIKE code was borrowed from Mark's post but updated to support 4 or more numeric digits.
Added a custom SQL Server 2008 approach which implements these techniques.
Depends on your database. Many have regex functions (note examples not tested so check)
e.g. Oracle
SELECT x
FROM table
WHERE REGEXP_LIKE(x, '^[A-za-z][:digit:]{4}$')
Sybase uses LIKE
Given that you're allowing between 3 and 6 digits for the number in your examples then it's probably better to use the ISNUMERIC() function on the 2nd character onwards:
SELECT *
FROM TheTable
-- start with a letter
WHERE Data LIKE '[A-Za-z]%'
-- everything from 2nd character onwards is a number
AND ISNUMERIC( SUBSTRING( Data, 2, 50 ) ) = 1
-- number doesn't have a decimal place
AND Data NOT LIKE '%.%'
For more information look at the ISNUMERIC function on MSDN.
Also note that:
I've limited the 2nd part with the number to 50 characters maximum, change this to suit your needs.
Strictly speaking you should check for currency symbols etc, as ISNUMERIC allows them, as well as +/- and some others
A better option might be to create a function that checks that each character after the first is between 0 and 9 (or 1 and 0 if you're using ASCII codes).
You can't use Regular Expressions in SQL Server, so you have to use OR. Correcting David Andres' answer...
WHERE
(
Data LIKE '[A-Za-z][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9][0-9]'
)
David's answer allows "D1234junk" through
You also only need "[A-Z]" if you don't have case sensitivity

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX(column, '5') = 1
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2000+ supports regex, but the catch is you have to create the UDF function in CLR before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9][0-9]. I do this a lot for zip codes.
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'