Comparing Strings in where clause - sql

Hello I am confused according to string comparison in sql.
select * from table where column1 = 'abc';
As I understand the string 'abc' is converted to a number let us pretend (1+2+3=6) for this example.
This means that
select * from table where column1 = 'cba';
will also have the same value 6. The Strings are not the same. Please enlighten me.
Edit: Because you think this is a joke.
"The character letter King is converted to a numeric representation. Assuming a US7ASCII database character set with AMERICAN NLS settings, the literal king is converted into a sum of its ordinal character values: K+i+n+g = (75+105+110+103=393)."
This is the exact text from a book that made me confused.

you rather see it like this
a= 00000100
b= 00010000
c= 01100100
abc= 000001000001000001100100
cba= 011001000001000000000100
Thus not the same

The quote seems to be from page 31 of chapter 9 of this OCA/OCP Oracle Database 11g All-in-One Exam Guide. This appears to be incorrect (being kind), since if it worked like then abc and cba would indeed be seen as equivalent.
The 11gR2 SQL language reference says:
In binary comparison, which is the default, Oracle compares character
strings according to the concatenated value of the numeric codes of
the characters in the database character set. One character is greater
than another if it has a greater numeric value than the other in the
character set.
The key difference is phrase 'the concatenated value', i.e. closer to what #JoroenMoonen demonstrated, where the numeric codes from the character set are pieced together; and not the sum of the values as the book showed.
But it would be misleading to think of the numeric codes for each character being concatenated and the resulting (potentially very long!) string representing a number which is compared. Taking those values, abc = 000001000001000001100100 = 266340, and cba = 011001000001000000000100 = 6557700. Just comparing 6557700 with 266340 would indeed show that cba is 'greater than' abc. But cb is also 'greater than' abc - select greatest('abc', 'cb') from dual - and if you do the same conversion you get cb = 0110010000010000 = 25616, which as a number is clearly less than 266340.
I think it's actually better explained in the equivalent 10gR1 documentation:
Oracle compares two values character by character up to the first
character that differs. The value with the greater character in that
position is considered greater. If two values of different length are
identical up to the end of the shorter one, then the longer value is
considered greater. If two values of equal length have no differing
characters, then the values are considered equal.
So, assuming ASCII, c (99) is greater than a (97), so it doesn't need to look at any further characters in either string. This can never see abc and cba as equivalent.
Anyway, you're quite right to be confused by the book's explanation.

Related

SQL Like and Wildcard with Oracle SQL Developer

I am using SQL Developer over a backend Oracle DB. I have a record where the buyer name is Pete Hansen.
Why I try
select * from data1 where buyer = 'Pete Hansen';
I get the result, no problem. However, when I use the following, I do not get any results:
select * from data1 where buyer like 'Pete Hansen';
Also, when I try the following, I do not get any results
select * from data1 where buyer like 'Pete Hans_n';
but the following works well:
select * from data1 where buyer like 'Pete Hans_n%';
Could you please help me understand? Thanks in advance.
Your buyer column seems to be defined as char; you can see the issue reproduced in this db<>fiddle, but not when the column is varchar2.
The documentation for character comparison explains the difference between blank-padded or nonpadded comparison semantics. When you compare them with =, because both the column and the string literal are `char, blank-padded semantics are used:
With blank-padded semantics, if the two values have different lengths, then Oracle first adds blanks to the end of the shorter one so their lengths are equal. ... If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type CHAR, NCHAR, text literals, or values returned by the USER function.
When the column is `varchar2 then nonpadded semantics are used:
With nonpadded semantics, ... If two values of equal length have no differing characters, then the values are considered equal. Oracle uses nonpadded comparison semantics whenever one or both values in the comparison have the data type VARCHAR2 or NVARCHAR2.
LIKE works differently. Only your final pattern with % matches, because that is allowing for the trailing spaces in the char value, while the other two patterns do not. With the varchar2 version there aren't any trailing spaces to the other two patterns also match.
It's unusual to need or want to user char columns; varchar2 is more usual. Tom Kyte opined on this many years ago.
I suspect it may have to do with trailing white spaces, which like operator is not forgiving about but = is. To test that, try
select * from data1 where trim(buyer) like 'Pete Hansen';

Sql Oracle query - field contains special characters or alpha

need to see if a taxid field contains letters or any special characters:. *,!,?,#,#,$,&,+,(,),/
How can I write this SQL?
Oracle has regexp_like:
select * from tablename where regexp_like(columnname, '[*!?##$&+()/]');
Here is the best way to display all the taxid's that are NOT entirely composed of digits:
select taxid
from your_table
where translate(taxid, '.0123456789', '.') is not null
TRANSLATE will "translate" (replace) each period in the input with a period in the output. Since the other characters in the second argument do not have a corresponding character in the third argument, they will simply be deleted. If the result of this operation is not null, then the taxid contains at least one character that is not a digit. If taxid is all digits, the result of the operation is null. Note that the period character used here is needed, due to an oddity in Oracle's definition of TRANSLATE: if any of its arguments is null, then so is its return value. That doesn't make a lot of sense, but we must work with the functions as Oracle defined them.

SQL: Using <= and >= to compare string with wildcard

Assuming I have table that looks like this:
Id | Name | Age
=====================
1 | Jose | 19
2 | Yolly | 26
20 | Abby | 3
29 | Tara | 4
And my query statement is:
1) Select * from thisTable where Name <= '*Abby';
it returns 0 row
2) Select * from thisTable where Name <= 'Abby';
returns row with Abby
3) Select * from thisTable where Name >= 'Abby';
returns all rows // row 1-4
4) Select * from thisTable where Name >= '*Abby';
returns all rows; // row 1-4
5) Select * from thisTable where Name >= '*Abby' and Name <= "*Abby";
returns 0 row.
6) Select * from thisTable where Name >= 'Abby' and Name <= 'Abby';
returns row with Abby;
My question: why I got these results? How does the wildcard affect the result of query? Why don't I get any result if the condition is this Name <= '*Abby' ?
Wildcards are only interpreted when you use LIKE opterator.
So when you are trying to compare against the string, it will be treated literally. So in your comparisons lexicographical order is used.
1) There are no letters before *, so you don't have any rows returned.
2) A is first letter in alphabet, so rest of names are bigger then Abby, only Abby is equal to itself.
3) Opposite of 2)
4) See 1)
5) See 1)
6) This condition is equivalent to Name = 'Abby'.
When working with strings in SQL Server, ordering is done at each letter, and the order those letters are sorted in depends on the collation. For some characters, the sorting method is much easier to understand, It's alphabetical or numerical order: For example 'a' < 'b' and '4' > '2'. Depending on the collation this might be done by letter and then case ('AaBbCc....') or might be Case then letter ('ABC...Zabc').
Let's take a string like 'Abby', this would be sorted in the order of the letters A, b, b, y (the order they would appear would be according to your collation, and i don't know what it is, but I'm going to assume a 'AaBbCc....' collation, as they are more common). Any string starting with something like 'Aba' would have a value sell than 'Abby', as the third character (the first that differs) has a "lower value". As would a value like 'Abbie' ('i' has a lower value than 'y'). Similarly, a string like 'Abc' would have a greater value, as 'c' has a higher value than 'b' (which is the first character that differs).
If we throw numbers into the mix, then you might be surpised. For example the string (important, I didn't state number) '123456789' has a lower value than the string '9'. This is because the first character than differs if the first character. '9' is greater than '1' and so '9' has the "higher" value. This is one reason why it's so important to ensure you store numbers as numerical datatypes, as the behaviour is unlikely to be what you expect/want otherwise.
To what you are asking, however, the wildcard for SQL Server is '%' and '_' (there is also '^',m but I won't cover that here). A '%' represents multiple characters, while '_' a single character. If you want to specifically look for one of those character you have to quote them in brackets ([]).
Using the equals (=) operator won't parse wildcards. you need to use a function that does, like LIKE. Thus, if you want a word that started with 'A' you would use the expression WHERE ColumnName LIKE 'A%'. If you wanted to search for one that consisted of 6 characters and ended with 'ed' you would use WHERE ColumnName LIKE '____ed'.
Like I said before, if you want to search for one of those specific character, you quote then. So, if you wanted to search for a string that contained an underscore, the syntax would be WHERE ColumnName LIKE '%[_]%'
Edit: it's also worth noting that, when using things like LIKE that they are effected by the collations sensitivity; for example, Case and Accent. If you're using a case sensitive collation, for example, then the statement WHERE 'Abby' LIKE 'abb%' is not true, and 'A' and 'a' are not the same case. Like wise, the statement WHERE 'Covea' = 'Covéa' would be false in an accent sensitive collation ('e' and 'é' are not treated as the same character).
A wildcard character is used to substitute any other characters in a string. They are used in conjunction with the SQL LIKE operator in the WHERE clause. For example.
Select * from thisTable WHERE name LIKE '%Abby%'
This will return any values with Abby anywhere within the string.
Have a look at this link for an explanation of all wildcards https://www.w3schools.com/sql/sql_wildcards.asp
It is because, >= and <= are comparison operators. They compare string on the basis of their ASCII values.
Since ASCII value of * is 42 and ASCII values of capital letters start from 65, that is why when you tried name<='*Abby', sql-server picked the ASCII value of first character in your string (that is 42), since no value in your data has first character with ASCII value less than 42, no data got selected.
You can refer ASCII table for more understanding:
http://www.asciitable.com/
There are a few answers, and a few comments - I'll try to summarize.
Firstly, the wildcard in SQL is %, not * (for multiple matches). So your queries including an * ask for a comparison with that literal string.
Secondly, comparing strings with greater/less than operators probably does not do what you want - it uses the collation order to see which other strings are "earlier" or "later" in the ordering sequence. Collation order is a moderately complex concept, and varies between machine installations.
The SQL operator for string pattern matching is LIKE.
I'm not sure I understand your intent with the >= or <= stateements - do you mean that you want to return rows where the name's first letter is after 'A' in the alphabet?

MAX in SQLSERVER

This may be a small one but i could find any , see this is how it is..
I have a sqlserver table with two columns and two rows , one of the column's name is Number and it has two rows with values
1. c7df055e-f8b5-4fc5-9c0a-8f59624c4022
2. 1234
When i query the table with this query select max(Number) from table table_name
Its giving the result c7df055e-f8b5-4fc5-9c0a-8f59624c4022 , So how does MAX calculate the maximum value when any of the values contains characters, i have searched for this and found this
For character columns, MAX finds the highest value in the collating sequence.
But could understand better , so anyone please suggest a better explanation..
Thanks in advance
Collating sequence refers to the definition of how the numeric codes translate to characters. ASCII is a common collating sequence, for example; the byte "65" translates to the character "A", the byte "58" translates to the character "8" etc.
Most languages will compare character by character, comparing the underlying values. So "c" is 99 ASCII, and "1" is 49 ASCII, so the string starting with "c" will be the larger value. In general, lowercase letters are higher than upper case are higher than numbers, and other characters are all over the place.
Your "number" column is a text type (evidenced by presence of alpha and hyphen chars). For text types, sorting is alphabetic, and letters are "higher" than numbers, so the value starting with "c" is greater than one starting with "1".
Sorting has nothing to do with the format if the value: If the first character of the alphanumeric value was a zero, you would have got "1234" as the max.

sql query for alphanumeric ID in hex

I want to be able to differentiate between a string that is alphnumeric and a string that is in hex format.
My current query is:
<columnName> LIKE '?_____=' + REPLICATE('[0-9A-Fa-f]',16)
I found this method of searching for hex ID's online and I thought it was working. However after getting a significantly larger sample size I can see a high false positive rate in my results. The problem is that this gives me all the results I do want but it also gives me a bunch of results I dont care about. For example:
I want to see:
<url>.php?mains=d7ad916d1c0396ff
but i dont want to see:
<url>.php?mblID=2007012422060265
The difference between the 2 strings is that the 16 characters at the end that i want to collect are all numeric and not a hex ID. What are some ways you guys use to limit the results to hex ID only? Thanks in advnace.
UPDATE:
Juergen brought up a good point, the second number could be a hex value to. Not all hex numbers contain [a-F]. I would like to rephrase the question to state that I am looking for an ID with both letters and numbers in it, not just numbers.
The simplest way is just to add a separate clause for that restriction:
<columnName> LIKE '?_____=' + REPLICATE('[0-9A-Fa-f]',16)
AND <columnName> NOT LIKE '?_____=' + REPLICATE('[0-9]',16)
It should be fairly simple to determine if a string contains only numbers...
Setting up a test table:
CREATE TABLE #Temp (Data char(32) not null)
INSERT #Temp
values ('<url>.php?mains=d7ad916d1c0396ff')
,('<url>.php?mblID=2007012422060265 ')
Write a query:
SELECT
right(Data, 16) StringToCheck
,isnumeric(right(Data, 16)) IsNumeric
from #Temp
Get results:
StringToCheck IsNumeric
d7ad916d1c0396ff 0
2007012422060265 1
So, if the IsNumeric function returns 0, it could be a hex string.
This makes several assumptions:
The rightmost 16 characters are what you want to check
You only ever hit 16 characters. I don't know when the string would get too long to check.
A non-numeric character means hex. Any chance of "Q" or "~" being embedded in the string?