This may be a small one, but I couldn't find an answer, so here is how it is.
I have a SQL Server table with two columns and two rows. One of the columns is named Number, and its two rows hold the values
1. c7df055e-f8b5-4fc5-9c0a-8f59624c4022
2. 1234
When I query the table with select max(Number) from table_name
the result is c7df055e-f8b5-4fc5-9c0a-8f59624c4022. So how does MAX calculate the maximum value when any of the values contain characters? I searched for this and found the following:
For character columns, MAX finds the highest value in the collating sequence.
But I couldn't understand it well, so could anyone please suggest a better explanation?
Thanks in advance
Collating sequence refers to the order in which characters sort, which in the simplest case follows the numeric codes behind the characters. ASCII is a common example: the byte 65 translates to the character "A", the byte 56 translates to the character "8", and so on.
Most languages compare strings character by character, using the underlying values. So "c" is 99 in ASCII and "1" is 49 in ASCII, so the string starting with "c" is the larger value. In general, lowercase letters are higher than uppercase letters, which are higher than digits, and other characters are all over the place.
Your "number" column is a text type (evidenced by presence of alpha and hyphen chars). For text types, sorting is alphabetic, and letters are "higher" than numbers, so the value starting with "c" is greater than one starting with "1".
Sorting has nothing to do with the format of the value: if the first character of the alphanumeric value were a zero, you would have got "1234" as the max.
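To make the character-by-character idea concrete, here is a minimal sketch in Python, whose string comparison goes by code point and so behaves like a plain ASCII/binary ordering. It is only an analogy: a real SQL Server collation may order characters differently, and the value starting with "0" is made up for illustration.

```python
# Code-point comparison, standing in for an ASCII-style (binary) collation.
values = ["c7df055e-f8b5-4fc5-9c0a-8f59624c4022", "1234"]

# Numeric codes of the first characters: 'c' is 99, '1' is 49.
first_codes = {v[0]: ord(v[0]) for v in values}

# The first characters already differ, so they alone decide the maximum.
largest = max(values)

# Hypothetical value starting with "0" (code 48): now "1234" wins.
largest_with_zero = max(["0c7df055", "1234"])

print(first_codes, largest, largest_with_zero)
```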
Is there a way to use pattern matching with SQL LIKE to match a leading run of letters followed by a variable number of digits?
For example, I want to select only ABC1002, ABC23, ABC569, CDE48569.
Here is one method:
where col like '[A-Z][A-Z][A-Z][0-9]%' and
col not like '[A-Z][A-Z][A-Z]%[^0-9]%'
The logic says:
The column starts with three letters and a digit.
Nothing other than a digit follows the three letters.
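The same two conditions can be written as a single regular expression. The sketch below is in Python rather than SQL and assumes case-sensitive matching (a case-insensitive collation in SQL Server may let [A-Z] match lowercase letters too); the non-matching sample values are invented for illustration.

```python
import re

# Assumed equivalent of the two LIKE conditions:
# exactly three letters first, then one or more digits and nothing else.
PATTERN = re.compile(r"[A-Z]{3}[0-9]+")

def matches(value: str) -> bool:
    # fullmatch requires the whole string to fit the pattern
    return PATTERN.fullmatch(value) is not None

samples = ["ABC1002", "ABC23", "ABC569", "CDE48569", "ABC12X", "AB123", "ABCD"]
selected = [s for s in samples if matches(s)]
print(selected)
```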
I am using the query
select max(entry_no) from tbl_Invmaster
but it gives me 9 as the answer, although the max value is 10.
You probably have the numbers in a VARCHAR column. Ordering in those columns is alphabetical, so 9 is bigger than 10. Explanation from the link:
To determine which of two strings comes first in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet is the one which comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
Your best solution is not to store numbers in VARCHAR columns but instead use the appropriate type, eg INT. That way your query would return the correct result.
If that is not an option for you, you could CAST the column to an integer type. E.g. in SQL Server you would write:
select max(CAST(entry_no AS INT)) from tbl_Invmaster
In Oracle you could use TO_NUMBER instead:
select max( to_number( entry_no )) from tbl_invmaster
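The difference is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in database (an assumption: SQLite is not SQL Server or Oracle, but its text comparison and CAST behave the same way for this case), with the table and column names taken from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Invmaster (entry_no TEXT)")
conn.executemany("INSERT INTO tbl_Invmaster VALUES (?)", [("9",), ("10",)])

# Text comparison: '9' > '1', so '9' is the maximum.
text_max = conn.execute(
    "SELECT max(entry_no) FROM tbl_Invmaster"
).fetchone()[0]

# Numeric comparison after the cast: 10 is the maximum.
int_max = conn.execute(
    "SELECT max(CAST(entry_no AS INTEGER)) FROM tbl_Invmaster"
).fetchone()[0]

conn.close()
print(text_max, int_max)
```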
I have a list of values coming from a PGSQL database that looks something like this:
198
199
1S
2
20
997
998
999
C1
C10
A
I'm looking to parse this field a bit into individual components, which I assume would take two regexp_replace function uses in my SQL. Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.
The above list would then be split into this layout as the result from PG:
I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.
All I could come up with so far, with my next to non-existent regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z, but I can't get it to work with strings before numeric values, or after them.
For extracting the characters before the digits:
regexp_replace(fieldname, '\d.*$', '')
For extracting the characters after the digits:
regexp_replace(fieldname, '^([^\d]*\d*)', '')
Note that:
If there are no digits, the first will return the original value and the second an empty string. This way you are sure that the concatenation is equal to the original value in this case also.
The concatenation of the three parts will not return the original if there are non-numerical characters surrounded by digits: those will be lost.
This also works for any non-alphanumeric characters like #, [, ! ...etc.
Final SQL
select
fieldname as original,
regexp_replace(fieldname, '\d.*$', '') as before_s,
regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;
See fiddle.
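For anyone who wants to try the three expressions outside the database, here is a rough Python port. It is a sketch, not the PostgreSQL implementation: split_value is a hypothetical helper name, and count=1 mimics regexp_replace without the 'g' flag.

```python
import re

def split_value(value: str):
    """Return (before, after, number) using the same patterns as the SQL."""
    # Drop everything from the first digit onward (first match only).
    before = re.sub(r"\d.*$", "", value, count=1)
    # Drop the leading non-digits plus the digit run that follows them.
    after = re.sub(r"^([^\d]*\d*)", "", value, count=1)
    # Keep digits only; an empty result maps to None, like nullif(..., '').
    digits = re.sub(r"[^\d]", "", value)
    number = int(digits) if digits else None
    return before, after, number

for sample in ["198", "1S", "C10", "A"]:
    print(sample, split_value(sample))
```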
This answer relies on information you delivered, which is
Essentially, any non-numeric character that appears before numeric
ones needs to be returned for one column, and the other column would
show all non-numeric characters appearing AFTER numeric ones.
Everything non-numeric before a numeric value goes into the first column
Everything non-numeric after a numeric value goes into the second column
So there is an assumption that each value has a numeric part in it.
select
val,
regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from
val;
Attached SQLFiddle for a preview.
Hello, I am confused about string comparison in SQL.
select * from table where column1 = 'abc';
As I understand it, the string 'abc' is converted to a number; let us pretend (1+2+3=6) for this example.
This means that
select * from table where column1 = 'cba';
will also have the same value 6. The Strings are not the same. Please enlighten me.
Edit: Because you think this is a joke.
"The character letter King is converted to a numeric representation. Assuming a US7ASCII database character set with AMERICAN NLS settings, the literal king is converted into a sum of its ordinal character values: K+i+n+g = (75+105+110+103=393)."
This is the exact text from a book that made me confused.
You should rather see it like this:
a= 00000100
b= 00010000
c= 01100100
abc= 000001000001000001100100
cba= 011001000001000000000100
Thus they are not the same.
The quote seems to be from page 31 of chapter 9 of this OCA/OCP Oracle Database 11g All-in-One Exam Guide. This appears to be incorrect (being kind), since if it worked like that then abc and cba would indeed be seen as equivalent.
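A quick check of the book's arithmetic shows why a sum cannot be what the database compares. This is a sketch in Python using ASCII code points; ordinal_sum is a hypothetical helper, not anything Oracle provides.

```python
def ordinal_sum(text: str) -> int:
    # Sum of the character codes, as the book describes it.
    return sum(ord(ch) for ch in text)

king_sum = ordinal_sum("King")  # 75 + 105 + 110 + 103

# Any two strings with the same letters in a different order share a sum...
same_sum = ordinal_sum("abc") == ordinal_sum("cba")
# ...yet they are clearly different strings.
strings_equal = "abc" == "cba"

print(king_sum, same_sum, strings_equal)
```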
The 11gR2 SQL language reference says:
In binary comparison, which is the default, Oracle compares character
strings according to the concatenated value of the numeric codes of
the characters in the database character set. One character is greater
than another if it has a greater numeric value than the other in the
character set.
The key difference is the phrase 'the concatenated value', i.e. closer to what @JoroenMoonen demonstrated, where the numeric codes from the character set are pieced together, and not the sum of the values as the book showed.
But it would be misleading to think of the numeric codes for each character being concatenated and the resulting (potentially very long!) string representing a number which is compared. Taking those values, abc = 000001000001000001100100 = 266340, and cba = 011001000001000000000100 = 6557700. Just comparing 6557700 with 266340 would indeed show that cba is 'greater than' abc. But cb is also 'greater than' abc - select greatest('abc', 'cb') from dual - and if you do the same conversion you get cb = 0110010000010000 = 25616, which as a number is clearly less than 266340.
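The numbers in that paragraph can be reproduced with a few lines of Python. Note the assumptions: the 8-bit codes are the illustrative ones from the earlier answer (not real ASCII), and concatenated_value is a hypothetical helper.

```python
# Illustrative 8-bit codes from the thread, not real ASCII values.
CODES = {"a": 0b00000100, "b": 0b00010000, "c": 0b01100100}

def concatenated_value(text: str) -> int:
    # Piece the per-character codes together into one big number.
    value = 0
    for ch in text:
        value = (value << 8) | CODES[ch]
    return value

abc = concatenated_value("abc")
cba = concatenated_value("cba")
cb = concatenated_value("cb")

# As numbers cb < abc, but as strings 'cb' is greater than 'abc'.
numeric_says_cb_smaller = cb < abc
string_says_cb_greater = "cb" > "abc"

print(abc, cba, cb, numeric_says_cb_smaller, string_says_cb_greater)
```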
I think it's actually better explained in the equivalent 10gR1 documentation:
Oracle compares two values character by character up to the first
character that differs. The value with the greater character in that
position is considered greater. If two values of different length are
identical up to the end of the shorter one, then the longer value is
considered greater. If two values of equal length have no differing
characters, then the values are considered equal.
So, assuming ASCII, c (99) is greater than a (97), so it doesn't need to look at any further characters in either string. This can never see abc and cba as equivalent.
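The rule quoted from the documentation can be sketched as a small function. This is only an illustration in Python, assuming comparison by code point; it does not model Oracle's NLS settings or blank-padded comparison, and compare is a made-up name.

```python
def compare(left: str, right: str) -> int:
    """Return -1, 0 or 1, comparing character by character."""
    for l_ch, r_ch in zip(left, right):
        if l_ch != r_ch:
            # The first differing position decides the result.
            return 1 if ord(l_ch) > ord(r_ch) else -1
    # Identical up to the end of the shorter value: the longer one is greater.
    return (len(left) > len(right)) - (len(left) < len(right))

print(compare("abc", "cba"), compare("cb", "abc"), compare("abc", "abc"))
```

With this rule abc and cba can never compare as equal, because the decision is made at the first position.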
Anyway, you're quite right to be confused by the book's explanation.
I am reading the basics of Oracle and came across a strange statement. I don't know how true it is.
The statement says
" String value '2' is greater than String value '100'. Character
'1' is less than Character '10'. "
Kindly throw some light on above topic. I understand that internally comparison must be happening using ASCII values. I am seeking some good logical explanation.
It means that numbers treated as strings are not sorted in numerical order but in lexical order, the same way words are sorted in a dictionary. That is, characters are compared one at a time from the left side.
In your first example, "2" is greater than "100" because the '2' is compared to the '1' and found to be bigger. Compare this to the ordering of "C" and "BAA" in a dictionary.
In your second example, "1" is less than "10" because the "1" fully matches the "1" in the left side of "10", but the "10" has characters following the match. Therefore it is greater. Again, compare this to the ordering of "B" and "BA" in a dictionary.
You are exactly correct in assuming that they are sorted by ASCII values - this is called an alphabetic sort. The strings are sorted not as numeric values but as text.
The alphabetic sort compares the values position by position. When comparing the string '2' with the string '100', it starts by comparing '2' with '1'. '2' comes after '1' alphabetically (the ASCII value of '2' is greater than the ASCII value of '1'), so the comparison stops there and '100' is listed before '2' in an alphabetic sort. This is exactly equivalent to comparing 'b' to 'azz': since 'a' comes before 'b', 'azz' is sorted before 'b'.
Your text is pointing this out because this behavior, while understandable, is non-intuitive. You would expect a sort to place '100' after '2' since 2 < 100, but that is not what the sort does.
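Both orderings are easy to see side by side. The sketch below uses Python's sorted, which compares strings by code point (an assumption that matches a plain ASCII sort, not necessarily every database collation).

```python
values = ["2", "100", "1", "10"]

# Text sort: compared position by position, like words in a dictionary.
as_text = sorted(values)

# Numeric sort: convert each value to an integer before comparing.
as_numbers = sorted(values, key=int)

# The letter analogy from above: 'azz' sorts before 'b'.
letters = sorted(["b", "azz"])

print(as_text, as_numbers, letters)
```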