Questions of string comparison in oracle - sql

I am reading basics of oracle and came across strange statement. I don't know how much true it is.
Statement says
" String value '2' is greater than String value '100'. Character
'1' is less than Character '10'. "
Kindly throw some light on above topic. I understand that internally comparison must be happening using ASCII values. I am seeking some good logical explanation.

It means that numbers treated as strings are not sorted in numerical order but in lexical order, the same way words are sorted in a dictionary. That is, characters are compared one at a time from the left side.
In your first example, "2" is greater than "100" because the '2' is compared to the '1' and found to be bigger. Compare this to the ordering of "C" and "BAA" in a dictionary.
In your second example, "1" is less than "10" because the "1" fully matches the "1" in the left side of "10", but the "10" has characters following the match. Therefore it is greater. Again, compare this to the ordering of "B" and "BA" in a dictionary.

You are exactly correct in assuming that they are sorted by ASCII values - this is called an alphabetic sort. The strings are sorted not as numeric values but as text.
The alphabetic sort compares the values position by position. When comparing the string '2' with the string '100' it starts by comparing '2' with '1'. '2' comes after '1' (the ASCII values of '2' is greater than the ASCII value of '1') alphabetically so the comparison stops so '100' will be listed before '2' in an alphabetic sort. This is exactly equivalent as comparing 'b' to 'azz' - since 'a' comes before 'b', 'azz' will be sorted before 'b'.
Your text is pointing this out because this behavior while understandable is non-intuitive. You would expect a sort to place '100' after '2' since 2 < 100, but that is not was the sort does.

Related

SQL: Using <= and >= to compare string with wildcard

Assuming I have table that looks like this:
Id | Name | Age
=====================
1 | Jose | 19
2 | Yolly | 26
20 | Abby | 3
29 | Tara | 4
And my query statement is:
1) Select * from thisTable where Name <= '*Abby';
it returns 0 row
2) Select * from thisTable where Name <= 'Abby';
returns row with Abby
3) Select * from thisTable where Name >= 'Abby';
returns all rows // row 1-4
4) Select * from thisTable where Name >= '*Abby';
returns all rows; // row 1-4
5) Select * from thisTable where Name >= '*Abby' and Name <= "*Abby";
returns 0 row.
6) Select * from thisTable where Name >= 'Abby' and Name <= 'Abby';
returns row with Abby;
My question: why I got these results? How does the wildcard affect the result of query? Why don't I get any result if the condition is this Name <= '*Abby' ?
Wildcards are only interpreted when you use LIKE opterator.
So when you are trying to compare against the string, it will be treated literally. So in your comparisons lexicographical order is used.
1) There are no letters before *, so you don't have any rows returned.
2) A is first letter in alphabet, so rest of names are bigger then Abby, only Abby is equal to itself.
3) Opposite of 2)
4) See 1)
5) See 1)
6) This condition is equivalent to Name = 'Abby'.
When working with strings in SQL Server, ordering is done at each letter, and the order those letters are sorted in depends on the collation. For some characters, the sorting method is much easier to understand, It's alphabetical or numerical order: For example 'a' < 'b' and '4' > '2'. Depending on the collation this might be done by letter and then case ('AaBbCc....') or might be Case then letter ('ABC...Zabc').
Let's take a string like 'Abby', this would be sorted in the order of the letters A, b, b, y (the order they would appear would be according to your collation, and i don't know what it is, but I'm going to assume a 'AaBbCc....' collation, as they are more common). Any string starting with something like 'Aba' would have a value sell than 'Abby', as the third character (the first that differs) has a "lower value". As would a value like 'Abbie' ('i' has a lower value than 'y'). Similarly, a string like 'Abc' would have a greater value, as 'c' has a higher value than 'b' (which is the first character that differs).
If we throw numbers into the mix, then you might be surpised. For example the string (important, I didn't state number) '123456789' has a lower value than the string '9'. This is because the first character than differs if the first character. '9' is greater than '1' and so '9' has the "higher" value. This is one reason why it's so important to ensure you store numbers as numerical datatypes, as the behaviour is unlikely to be what you expect/want otherwise.
To what you are asking, however, the wildcard for SQL Server is '%' and '_' (there is also '^',m but I won't cover that here). A '%' represents multiple characters, while '_' a single character. If you want to specifically look for one of those character you have to quote them in brackets ([]).
Using the equals (=) operator won't parse wildcards. you need to use a function that does, like LIKE. Thus, if you want a word that started with 'A' you would use the expression WHERE ColumnName LIKE 'A%'. If you wanted to search for one that consisted of 6 characters and ended with 'ed' you would use WHERE ColumnName LIKE '____ed'.
Like I said before, if you want to search for one of those specific character, you quote then. So, if you wanted to search for a string that contained an underscore, the syntax would be WHERE ColumnName LIKE '%[_]%'
Edit: it's also worth noting that, when using things like LIKE that they are effected by the collations sensitivity; for example, Case and Accent. If you're using a case sensitive collation, for example, then the statement WHERE 'Abby' LIKE 'abb%' is not true, and 'A' and 'a' are not the same case. Like wise, the statement WHERE 'Covea' = 'Covéa' would be false in an accent sensitive collation ('e' and 'é' are not treated as the same character).
A wildcard character is used to substitute any other characters in a string. They are used in conjunction with the SQL LIKE operator in the WHERE clause. For example.
Select * from thisTable WHERE name LIKE '%Abby%'
This will return any values with Abby anywhere within the string.
Have a look at this link for an explanation of all wildcards https://www.w3schools.com/sql/sql_wildcards.asp
It is because, >= and <= are comparison operators. They compare string on the basis of their ASCII values.
Since ASCII value of * is 42 and ASCII values of capital letters start from 65, that is why when you tried name<='*Abby', sql-server picked the ASCII value of first character in your string (that is 42), since no value in your data has first character with ASCII value less than 42, no data got selected.
You can refer ASCII table for more understanding:
http://www.asciitable.com/
There are a few answers, and a few comments - I'll try to summarize.
Firstly, the wildcard in SQL is %, not * (for multiple matches). So your queries including an * ask for a comparison with that literal string.
Secondly, comparing strings with greater/less than operators probably does not do what you want - it uses the collation order to see which other strings are "earlier" or "later" in the ordering sequence. Collation order is a moderately complex concept, and varies between machine installations.
The SQL operator for string pattern matching is LIKE.
I'm not sure I understand your intent with the >= or <= stateements - do you mean that you want to return rows where the name's first letter is after 'A' in the alphabet?

Not selecting Max Value

I am using the query
select max(entry_no) from tbl_Invmaster
but its giving me ans 9 however the max value is 10.
You probably have the numbers in a VARCHAR column. Ordering in those fields is by alphabetcal order. That way 9 is bigger than 10. Explanation from the link:
To determine which of two strings comes first in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet is the one which comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
Your best solution is not to store numbers in VARCHAR columns but instead use the appropriate type, eg INT. That way your query would return the correct result.
If that is not an option for you, you could CAST the column to an integer type. Eg in SQL Server you would write:
select max(CAST(entry_no AS INT)) from tbl_Invmaster
select max( to_number( entry_no )) from tbl_invmaster

MAX in SQLSERVER

This may be a small one but i could find any , see this is how it is..
I have a sqlserver table with two columns and two rows , one of the column's name is Number and it has two rows with values
1. c7df055e-f8b5-4fc5-9c0a-8f59624c4022
2. 1234
When i query the table with this query select max(Number) from table table_name
Its giving the result c7df055e-f8b5-4fc5-9c0a-8f59624c4022 , So how does MAX calculate the maximum value when any of the values contains characters, i have searched for this and found this
For character columns, MAX finds the highest value in the collating sequence.
But could understand better , so anyone please suggest a better explanation..
Thanks in advance
Collating sequence refers to the definition of how the numeric codes translate to characters. ASCII is a common collating sequence, for example; the byte "65" translates to the character "A", the byte "58" translates to the character "8" etc.
Most languages will compare character by character, comparing the underlying values. So "c" is 99 ASCII, and "1" is 49 ASCII, so the string starting with "c" will be the larger value. In general, lowercase letters are higher than upper case are higher than numbers, and other characters are all over the place.
Your "number" column is a text type (evidenced by presence of alpha and hyphen chars). For text types, sorting is alphabetic, and letters are "higher" than numbers, so the value starting with "c" is greater than one starting with "1".
Sorting has nothing to do with the format if the value: If the first character of the alphanumeric value was a zero, you would have got "1234" as the max.

Comparing Strings in where clause

Hello I am confused according to string comparison in sql.
select * from table where column1 = 'abc';
As I understand the string 'abc' is converted to a number let us pretend (1+2+3=6) for this example.
This means that
select * from table where column1 = 'cba';
will also have the same value 6. The Strings are not the same. Please enlighten me.
Edit: Because you think this is a joke.
"The character letter King is converted to a numeric representation. Assuming a US7ASCII database character set with AMERICAN NLS settings, the literal king is converted into a sum of its ordinal character values: K+i+n+g = (75+105+110+103=393)."
This is the exact text from a book that made me confused.
you rather see it like this
a= 00000100
b= 00010000
c= 01100100
abc= 000001000001000001100100
cba= 011001000001000000000100
Thus not the same
The quote seems to be from page 31 of chapter 9 of this OCA/OCP Oracle Database 11g All-in-One Exam Guide. This appears to be incorrect (being kind), since if it worked like then abc and cba would indeed be seen as equivalent.
The 11gR2 SQL language reference says:
In binary comparison, which is the default, Oracle compares character
strings according to the concatenated value of the numeric codes of
the characters in the database character set. One character is greater
than another if it has a greater numeric value than the other in the
character set.
The key difference is phrase 'the concatenated value', i.e. closer to what #JoroenMoonen demonstrated, where the numeric codes from the character set are pieced together; and not the sum of the values as the book showed.
But it would be misleading to think of the numeric codes for each character being concatenated and the resulting (potentially very long!) string representing a number which is compared. Taking those values, abc = 000001000001000001100100 = 266340, and cba = 011001000001000000000100 = 6557700. Just comparing 6557700 with 266340 would indeed show that cba is 'greater than' abc. But cb is also 'greater than' abc - select greatest('abc', 'cb') from dual - and if you do the same conversion you get cb = 0110010000010000 = 25616, which as a number is clearly less than 266340.
I think it's actually better explained in the equivalent 10gR1 documentation:
Oracle compares two values character by character up to the first
character that differs. The value with the greater character in that
position is considered greater. If two values of different length are
identical up to the end of the shorter one, then the longer value is
considered greater. If two values of equal length have no differing
characters, then the values are considered equal.
So, assuming ASCII, c (99) is greater than a (97), so it doesn't need to look at any further characters in either string. This can never see abc and cba as equivalent.
Anyway, you're quite right to be confused by the book's explanation.

SQL String contains ONLY

I have a table with a field that denotes whether the data in that row is valid or not. This field contains a string of undetermined length. I need a query that will only pull out rows where all the characters in this field are N. Some possible examples of this field.
NNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNN
NNNNNEEEENNNNNNNNNNNN
NNNNNOOOOOEEEENNNNNNNNNNNN
Any suggestions on a postcard please.
Many thanks
This should do the trick:
SELECT Field
FROM YourTable
WHERE Field NOT LIKE '%[^N]%' AND Field <> ''
What it's doing is a wildcard search, broken down:
The LIKE will find records where the field contains characters other than N in the field. So, we apply a NOT to that as we're only interested in records that do not contain characters other than N. Plus a condition to filter out blank values.
SELECT *
FROM mytable
WHERE field NOT LIKE '%[^N]%'
I don't know which SQL dialect you are using. For example Oracle has several functions you may use. With oracle you could use condition like :
WHERE LTRIM(field, 'N') = ''
The idea is to trim out all N's and see if the result is empty string. If you don't have LTRIM, check if you have some kind of TRANSLATE or REPLACE function to do the same thing.
Another way to do it could be to pick length of your field and then construct comparator value by padding empty string with N. Perhaps something like:
WHERE field = RPAD('', field, 'N)
Oracle pads that empty string with N's and picks number of pad characters from length of the second argument. Perhaps this works too:
WHERE field = RPAD('', LENGTH(field), 'N)
I haven't tested those, but hopefully that give you some ideas how to solve your problem. I guess that many of these solutions have bad performance if you have lot of rows and you don't have other WHERE conditions to select proper index.