String comparison in SQL Server 2008 - sql

Does SQL Server 2008 have a string comparison method that checks which string is supposed to come first (ex 'abc' comes before 'abd' etc)? I need to do a <= comparison.

In what context? <= works in a SELECT statement.

<= works fine. The problem you're having is that you're expecting numeric sorting out of strings. That doesn't work without special handling.
String Sorting
a1 - a10 strings sort in this order:
a1
a10
a2
a3
a4
...
This is because both a1 and a10 start with "a1".
Since they're strings the numeric values are irrelevant. Look what happens when we substitute a-z for 0-9:
ab
aba
ac
ad
ae
Can you see now why you're getting the results you are? In a dictionary, aba comes before ac, and a10 comes before a2.
To solve your problem it's best to split your column into two: one char and one number. Some unpleasant expressions can get the right sort order for you, but it's a much worse solution unless you have absolutely no choice.
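To make the difference concrete, here is a minimal Python sketch (a model of the idea, not SQL Server code) that sorts the same values once as plain strings and once after splitting each value into an alpha part and a numeric part. The helper name split_key and the sample values are assumptions made for illustration.

```python
import re

def split_key(value):
    """Split a value such as 'a10' into ('a', 10) so the number compares numerically."""
    match = re.fullmatch(r"([A-Za-z]*)(\d*)", value)
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    alpha, digits = match.groups()
    return alpha, int(digits) if digits else 0

values = ["a1", "a2", "a3", "a10", "b1"]

# Plain string sort: 'a10' lands before 'a2' because '1' < '2'.
string_order = sorted(values)

# Sort on the split parts: letters first, then the number as a number.
split_order = sorted(values, key=split_key)

print(string_order)
print(split_order)
```

Storing the two parts in separate columns gives the database the same information that split_key has to recover here on every comparison.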
Here's one way. It may not suit or there may be a more efficient way, but I don't know what all your data is like.
SELECT *
FROM Table
WHERE
Col LIKE 'a%'
-- Col + '0' guarantees PatIndex always finds a non-letter position
AND Convert(int, Substring(Col, PatIndex('%[^a-z]%', Col + '0'), 1000)) <= 10
If the alpha part is always one character you can do it more simply. If the numbers can have letters after them then more twiddling is needed.
You could also try a derived table that splits the column into its separate alpha and numeric parts, then put conditions in the outer query.
Collation
Be aware each string and char-based column has a collation setting that determines what letters are sorted together (mostly for case and accents) and this can change the results of an inequality operation.
SELECT *
FROM Table
WHERE Value <= 'abc'
SELECT CASE WHEN Value <= 'abc' COLLATE Latin1_General_CS_AS_KS_WS THEN 1 ELSE 0 END
FROM Table
The collation I used there is case sensitive and accent sensitive (the KS and WS suffixes also make it kana sensitive and width sensitive).
You can see all the collations available to you like so:
SELECT *
FROM ::fn_helpcollations()
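As a rough illustration of how the comparison rules can change the result of a <= test, the Python sketch below contrasts a code-point comparison (similar in spirit to a binary collation) with a case-folded comparison (similar in spirit to a case-insensitive collation). This is only a model: it does not reproduce Latin1_General_CS_AS_KS_WS or any other real SQL Server collation, and the sample values are made up.

```python
values = ["abc", "ABD", "abd"]
target = "abc"

# Code-point comparison: every uppercase ASCII letter sorts before
# every lowercase one, so 'ABD' counts as <= 'abc'.
binary_match = [v for v in values if v <= target]

# Case-folded comparison: 'ABD' is compared as 'abd', which is > 'abc'.
case_insensitive_match = [v for v in values if v.casefold() <= target.casefold()]

print(binary_match)
print(case_insensitive_match)
```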

Related

SQL: Using <= and >= to compare string with wildcard

Assuming I have table that looks like this:
Id | Name | Age
=====================
1 | Jose | 19
2 | Yolly | 26
20 | Abby | 3
29 | Tara | 4
And my query statement is:
1) Select * from thisTable where Name <= '*Abby';
it returns 0 row
2) Select * from thisTable where Name <= 'Abby';
returns row with Abby
3) Select * from thisTable where Name >= 'Abby';
returns all rows // row 1-4
4) Select * from thisTable where Name >= '*Abby';
returns all rows; // row 1-4
5) Select * from thisTable where Name >= '*Abby' and Name <= "*Abby";
returns 0 row.
6) Select * from thisTable where Name >= 'Abby' and Name <= 'Abby';
returns row with Abby;
My question: why did I get these results? How does the wildcard affect the result of the query? Why don't I get any result if the condition is Name <= '*Abby'?
Wildcards are only interpreted when you use the LIKE operator.
So when you compare against the string with <= or >=, it is treated literally, and the comparison uses lexicographical order.
1) There are no letters before *, so you don't have any rows returned.
2) A is the first letter of the alphabet, so the rest of the names are bigger than Abby; only Abby is equal to itself.
3) Opposite of 2)
4) See 1)
5) See 1)
6) This condition is equivalent to Name = 'Abby'.
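The six results can be reproduced with a small Python model of literal, character-by-character comparison. This is a sketch that assumes plain code-point ordering, which matches the explanation above for these particular names; a real collation may order other characters differently. The helper name where is hypothetical.

```python
names = ["Jose", "Yolly", "Abby", "Tara"]

def where(predicate):
    """Return the names for which the predicate holds, like a WHERE clause."""
    return [name for name in names if predicate(name)]

# '*' has code 42 and 'A' has code 65, so '*Abby' sorts before every name.
q1 = where(lambda n: n <= "*Abby")
q2 = where(lambda n: n <= "Abby")
q3 = where(lambda n: n >= "Abby")
q4 = where(lambda n: n >= "*Abby")
q5 = where(lambda n: "*Abby" <= n <= "*Abby")
q6 = where(lambda n: "Abby" <= n <= "Abby")

for label, rows in [("1", q1), ("2", q2), ("3", q3), ("4", q4), ("5", q5), ("6", q6)]:
    print(label, rows)
```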
When working with strings in SQL Server, ordering is done character by character, and the order those characters sort in depends on the collation. For some characters the ordering is easy to understand because it is alphabetical or numerical: for example 'a' < 'b' and '4' > '2'. Depending on the collation, this might be by letter and then case ('AaBbCc...') or by case and then letter ('ABC...Zabc...').
Let's take a string like 'Abby'. This is sorted by its letters A, b, b, y (where each one falls depends on your collation; I don't know what yours is, but I'm going to assume an 'AaBbCc...' collation, as they are more common). Any string starting with something like 'Aba' would have a value less than 'Abby', as the third character (the first that differs) has a "lower value". So would a value like 'Abbie' ('i' has a lower value than 'y'). Similarly, a string like 'Abc' would have a greater value, as 'c' has a higher value than 'b' (which is the first character that differs).
If we throw numbers into the mix, then you might be surprised. For example the string (important, I didn't say number) '123456789' has a lower value than the string '9'. This is because the first character that differs is the first character: '9' is greater than '1', and so '9' has the "higher" value. This is one reason why it's so important to store numbers in numerical datatypes, as the behaviour is unlikely to be what you expect or want otherwise.
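The same effect is easy to see in Python, whose default string comparison also works character by character (shown here only as an analogy, not as SQL Server behaviour):

```python
# As strings, the comparison is decided by the first characters: '1' < '9'.
as_strings = "123456789" < "9"

# As numbers, the comparison is decided by magnitude.
as_numbers = 123456789 < 9

print(as_strings, as_numbers)
```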
To what you are asking, however: the wildcards for SQL Server are '%' and '_' (there is also '^', used inside brackets, but I won't cover that here). A '%' represents any run of zero or more characters, while '_' represents a single character. If you want to look specifically for one of those characters, you have to quote it in brackets ([]).
Using the equals (=) operator won't parse wildcards; you need an operator that does, such as LIKE. Thus, if you want a word that starts with 'A' you would use the expression WHERE ColumnName LIKE 'A%'. If you wanted to search for one that consists of 6 characters and ends with 'ed' you would use WHERE ColumnName LIKE '____ed'.
Like I said before, if you want to search for one of those specific characters, you quote it. So, if you wanted to search for a string that contains an underscore, the syntax would be WHERE ColumnName LIKE '%[_]%'
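One way to get a feel for these rules is to translate a LIKE pattern into a regular expression. The Python sketch below does that for '%', '_' and bracketed sets; the function names like_to_regex and like are hypothetical, the matching is case sensitive, and it ignores collation and the ESCAPE clause, so treat it as a teaching aid rather than a faithful copy of SQL Server's LIKE.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern using %, _ and [...] into a regex string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # Look for a closing bracket that leaves at least one character inside.
        end = pattern.find("]", i + 2) if ch == "[" else -1
        if ch == "%":
            parts.append(".*")   # any run of zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        elif ch == "[" and end != -1:
            body = pattern[i + 1:end]
            negated = body.startswith("^")
            if negated:
                body = body[1:]
            if not body:
                raise ValueError("empty character set in pattern")
            # Keep '-' so ranges such as a-z still work; escape everything else.
            escaped = "".join(c if c == "-" else re.escape(c) for c in body)
            parts.append("[" + ("^" if negated else "") + escaped + "]")
            i = end
        else:
            parts.append(re.escape(ch))  # any other character is literal
        i += 1
    return "".join(parts)

def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None

print(like("Abby", "A%"), like("wanted", "____ed"), like("a_b", "%[_]%"))
print(like("Abby", "*Abby"))  # '*' is just a literal character
```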
Edit: it's also worth noting that operators like LIKE are affected by the collation's sensitivity, for example to case and accents. If you're using a case sensitive collation, then the statement WHERE 'Abby' LIKE 'abb%' is not true, as 'A' and 'a' are not treated as the same character. Likewise, the statement WHERE 'Covea' = 'Covéa' would be false in an accent sensitive collation ('e' and 'é' are not treated as the same character).
A wildcard character is used to substitute for other characters in a string. Wildcards are used in conjunction with the SQL LIKE operator in the WHERE clause. For example:
Select * from thisTable WHERE name LIKE '%Abby%'
This will return any values with Abby anywhere within the string.
Have a look at this link for an explanation of all wildcards https://www.w3schools.com/sql/sql_wildcards.asp
This is because >= and <= are comparison operators. They compare strings character by character on the basis of the characters' ASCII values (more precisely, on the order defined by the collation).
The ASCII value of * is 42, while the ASCII values of capital letters start at 65. So when you tried name <= '*Abby', SQL Server compared the first character of each name with * (42). Since no value in your data starts with a character whose ASCII value is less than 42, no rows were selected.
You can refer to an ASCII table for more detail:
http://www.asciitable.com/
There are a few answers, and a few comments - I'll try to summarize.
Firstly, the wildcard in SQL is %, not * (for multiple matches). So your queries including an * ask for a comparison with that literal string.
Secondly, comparing strings with greater/less than operators probably does not do what you want - it uses the collation order to see which other strings are "earlier" or "later" in the ordering sequence. Collation order is a moderately complex concept, and varies between machine installations.
The SQL operator for string pattern matching is LIKE.
I'm not sure I understand your intent with the >= or <= statements. Do you mean that you want to return rows where the name's first letter is after 'A' in the alphabet?

postgres replace calculated value in text

I have a table column numbers containing strings like:
1, 2, 2A, 14, 14A, 20
Listed in the desired ascending sort order.
How can I formulate an ORDER BY clause to achieve this order?
By default, Postgres has to resort to alphabetical order, which would be:
1, 14, 14A, 2, 20, 2A
Can this be done using only the string-manipulation features that come with Postgres (replace(), regexp_replace(), etc.)?
My first idea was:
cut the letter, if present
number * 100
add ascii of letter, if present
This would yield the desired result as the mapped values would be:
100, 200, 265, 1400, 1465, 2000
I could also index this manipulated value to speed up sorting.
Additional restrictions:
I cannot use casts to hex numbers, because eg.: 14Z is valid too.
Ideally, the result is a single expression. I'd need to use this transformation for filtering and sorting like:
SELECT * FROM table WHERE transform(numbers) < 15 ORDER BY transform(numbers)
RESULT:
1, 2, 2A, 14, 14A
I tried to implement my idea, using what I learned from @klin's answer:
Cut the letter and multiply number by 100:
substring('12A' from '(\d+).*')::int*100
Cut the numbers and get ASCII of letter:
ascii(substring('12A' from '\d+([A-Z])'))
Add the two.
This works fine with 12A, but does not work with 12, as the second expression returns NULL and not 0 (numeric zero). Any ideas?
Based on these assumptions:
Numbers consist of digits and optionally one trailing letter and nothing else.
There is always at least one leading digit.
All letters are either upper case [A-Z] or lower case [a-z], but not mixed.
I would enforce that with a CHECK constraint on the table column to be absolutely reliable.
Create a tiny IMMUTABLE SQL function:
CREATE OR REPLACE FUNCTION f_nr2sort(text)
RETURNS int AS
$func$
SELECT CASE WHEN right($1, 1) > '9' COLLATE "C" -- no collation
THEN left($1, -1)::int * 100 + ascii(right($1, 1))
ELSE $1::int * 100 END -- only digits
$func$ LANGUAGE SQL IMMUTABLE;
Optimized for performance based on above assumptions. I replaced all regular expressions with the much cheaper left() and right().
I disabled collation rules with COLLATE "C" for the CASE expression (it's cheaper, too) to assure default byte order of ASCII letters. Letters in [a-zA-Z] sort above '9' and if that's the case for the last letter, we proceed accordingly.
This way we avoid adding NULL values and don't need to fix with COALESCE.
Then your query can be:
SELECT *
FROM tbl
WHERE f_nr2sort(numbers) < f_nr2sort('15C')
ORDER BY f_nr2sort(numbers);
Since the function is IMMUTABLE, you can even create a simple functional index to support this class of queries:
CREATE INDEX tbl_foo_id ON tbl (f_nr2sort(numbers));
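For readers who want to trace the arithmetic, here is a Python model of the same mapping. It mirrors the logic of f_nr2sort under the assumptions listed above, but it is only a sketch for checking values by hand, not a replacement for the SQL function.

```python
def f_nr2sort(value):
    """Map '14A' to 14 * 100 + ord('A'); map a pure number such as '14' to 1400."""
    last = value[-1]
    if last > "9":  # in code-point order, letters sort above '9'
        return int(value[:-1]) * 100 + ord(last)
    return int(value) * 100  # only digits

numbers = ["14A", "20", "2", "1", "14", "2A"]

# Same role as ORDER BY f_nr2sort(numbers).
ordered = sorted(numbers, key=f_nr2sort)

# Same role as WHERE f_nr2sort(numbers) < f_nr2sort('15C').
below = [n for n in ordered if f_nr2sort(n) < f_nr2sort("15C")]

print(ordered)
print(below)
```

One thing the model makes easy to check: ord('Z') is 90, so uppercase suffixes always stay below the next number, but ord('d') is already 100, so with lowercase suffixes a multiplier larger than 100 would be needed to keep '14d' below '15'.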
I am new at PostgreSQL, but I found this very useful post:
Alphanumeric sorting with PostgreSQL
So what about something like this:
select val
from test
order by (substring(val, '^[0-9]+'))::int, substring(val, '[^0-9_].*$') desc
Hope it helps

How to make to_number ignore non-numerical values

Column xy of type 'nvarchar2(40)' in table ABC.
Column consists mainly of numerical Strings
how can I make a
select to_number(trim(xy)) from ABC
query, that ignores non-numerical strings?
In general in relational databases, the order of evaluation is not defined, so it is possible that the select functions are called before the where clause filters the data. I know this is the case in SQL Server. Here is a post that suggests that the same can happen in Oracle.
The case statement, however, does cascade, so it is evaluated in order. For that reason, I prefer:
select (case when NOT regexp_like(xy,'[^[:digit:]]') then to_number(xy)
end)
from ABC;
This will return NULL for values that are not numbers.
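The guard-then-convert idea behind that CASE expression can be sketched in Python as follows. The function name to_number_or_none is hypothetical, None stands in for NULL, and the empty string is treated like NULL because that is how Oracle treats it.

```python
import re

def to_number_or_none(text):
    """Convert only when the text consists purely of digits, else return None."""
    if text is None or text == "":
        return None  # Oracle treats the empty string as NULL
    if re.search(r"[^0-9]", text):
        return None  # at least one non-digit character: skip the conversion
    return int(text)

for sample in ["123", "12a", "", " 42", "-5"]:
    print(repr(sample), to_number_or_none(sample))
```

Note that the check is deliberately strict: a leading space or a minus sign counts as a non-digit, so such values come back as None unless they are trimmed or the pattern is widened first.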
You could use regexp_like to find out if it is a number (with/without plus/minus sign, decimal separator followed by at least one digit, thousand separators in the correct places if any) and use it like this:
SELECT TO_NUMBER( CASE WHEN regexp_like(xy,'.....') THEN xy ELSE NULL END )
FROM ABC;
However, as the built-in function TO_NUMBER is not able to deal with all numbers (it fails at least when a number contains thousand separators), I would suggest writing a PL/SQL function TO_NUMBER_OR_DEFAULT(numberstring, defaultnumber) to do what you want.
EDIT: You may want to read my answer on using regexp_like to determine if a string contains a number here: https://stackoverflow.com/a/21235443/2270762.
You can add WHERE
SELECT TO_NUMBER(TRIM(xy)) FROM ABC WHERE REGEXP_INSTR(xy, '[A-Za-z]') = 0
The WHERE filters out rows whose value contains letters. See the documentation

Comparing Strings in where clause

Hello, I am confused about string comparison in SQL.
select * from table where column1 = 'abc';
As I understand the string 'abc' is converted to a number let us pretend (1+2+3=6) for this example.
This means that
select * from table where column1 = 'cba';
will also have the same value 6. The Strings are not the same. Please enlighten me.
Edit: Because you think this is a joke.
"The character letter King is converted to a numeric representation. Assuming a US7ASCII database character set with AMERICAN NLS settings, the literal king is converted into a sum of its ordinal character values: K+i+n+g = (75+105+110+103=393)."
This is the exact text from a book that made me confused.
You should rather see it like this:
a= 00000100
b= 00010000
c= 01100100
abc= 000001000001000001100100
cba= 011001000001000000000100
Thus not the same
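A quick Python check shows the difference between the two readings. It uses the real ASCII codes rather than the illustrative bit patterns above, and the helper names ordinal_sum and concatenated_bits are made up for this sketch.

```python
def ordinal_sum(text):
    """The book's reading: add up the character codes."""
    return sum(ord(ch) for ch in text)

def concatenated_bits(text):
    """The reading shown above: put the 8-bit codes side by side."""
    return "".join(format(ord(ch), "08b") for ch in text)

# The sums collide for any two strings made of the same letters...
print(ordinal_sum("abc"), ordinal_sum("cba"))

# ...while the concatenated codes keep the order of the letters.
print(concatenated_bits("abc"))
print(concatenated_bits("cba"))
```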
The quote seems to be from page 31 of chapter 9 of this OCA/OCP Oracle Database 11g All-in-One Exam Guide. This appears to be incorrect (being kind), since if it worked like that then abc and cba would indeed be seen as equivalent.
The 11gR2 SQL language reference says:
In binary comparison, which is the default, Oracle compares character
strings according to the concatenated value of the numeric codes of
the characters in the database character set. One character is greater
than another if it has a greater numeric value than the other in the
character set.
The key difference is the phrase 'the concatenated value', i.e. closer to what @JoroenMoonen demonstrated, where the numeric codes from the character set are pieced together, and not the sum of the values as the book showed.
But it would be misleading to think of the numeric codes for each character being concatenated and the resulting (potentially very long!) string representing a number which is compared. Taking those values, abc = 000001000001000001100100 = 266340, and cba = 011001000001000000000100 = 6557700. Just comparing 6557700 with 266340 would indeed show that cba is 'greater than' abc. But cb is also 'greater than' abc - select greatest('abc', 'cb') from dual - and if you do the same conversion you get cb = 0110010000010000 = 25616, which as a number is clearly less than 266340.
I think it's actually better explained in the equivalent 10gR1 documentation:
Oracle compares two values character by character up to the first
character that differs. The value with the greater character in that
position is considered greater. If two values of different length are
identical up to the end of the shorter one, then the longer value is
considered greater. If two values of equal length have no differing
characters, then the values are considered equal.
So, assuming ASCII, c (99) is greater than a (97), so it doesn't need to look at any further characters in either string. This can never see abc and cba as equivalent.
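The rule quoted from the 10gR1 documentation can be written out directly. The Python sketch below assumes ASCII input and ignores Oracle's blank-padded comparison semantics for CHAR values; compare and concatenated_value are hypothetical helper names.

```python
def compare(left, right):
    """Return -1, 0 or 1 using character-by-character comparison."""
    for a, b in zip(left, right):
        if a != b:
            # The first differing position decides the result.
            return 1 if ord(a) > ord(b) else -1
    if len(left) == len(right):
        return 0
    # Identical up to the end of the shorter value: the longer one is greater.
    return 1 if len(left) > len(right) else -1

def concatenated_value(text):
    """The misleading view: all character codes glued into one big number."""
    return int.from_bytes(text.encode("ascii"), "big")

print(compare("abc", "cba"))   # decided by 'a' versus 'c'
print(compare("cb", "abc"))    # 'cb' is greater although it is shorter
print(concatenated_value("cb"), concatenated_value("abc"))
```

The last line shows why the "one big number" picture breaks down: as a number, the value for 'cb' is smaller than the value for 'abc', yet 'cb' compares as the greater string.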
Anyway, you're quite right to be confused by the book's explanation.

SQL: insert space before numbers in string

I have a nvarchar field in my table, which contains all sorts of strings.
In case there are strings which contain a number following a non-number sign, I want to insert a space before that number.
That is, if a certain entry in that field is abc123, it should be turned into abc 123, and ab12.34 should become ab 12. 34. I want this to be done throughout the entire table.
What's the best way to achieve it?
You can try something like this:
select left(col,PATINDEX('%[0-9]%',col)-1 )+space(1)+
case
when PATINDEX('%[.]%',col)<>0
then substring(col,PATINDEX('%[0-9]%',col),len(col)+1-PATINDEX('%[.]%',col))
+space(1)+
substring(col,PATINDEX('%[.]%',col)+1,len(col)+1-PATINDEX('%[.]%',col))
else substring(col,PATINDEX('%[0-9]%',col),len(col)+1-PATINDEX('%[0-9]%',col))
end
from tab
It's not simple, but I hope it will help you.
I used these functions (documented on MSDN):
LEFT, PATINDEX, SPACE, SUBSTRING, LEN
and wildcard patterns.
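To check what output the transformation is supposed to produce, the rule from the question (insert a space wherever a digit directly follows a non-digit) can be sketched with a regular expression in Python. This only models the expected results; it is not T-SQL, and the function name space_before_numbers is hypothetical.

```python
import re

def space_before_numbers(text):
    """Insert a space between a non-digit character and a digit that follows it."""
    return re.sub(r"(\D)(\d)", r"\1 \2", text)

for sample in ["abc123", "ab12.34", "123", "abc"]:
    print(repr(sample), "->", repr(space_before_numbers(sample)))
```

A space already present before a number counts as a non-digit too, so 'abc 123' would end up with two spaces; narrowing the first group to exclude whitespace avoids that if it matters for your data.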