I have a table named tab1 with a column col1 of data type varchar2(10), and another table named tab2 with a single column col2 of data type char(20), with the following data:
tab1 tab2
a a
b b
c c
When I run the following query:
select tab1.*,tab2.*
from tab1 full join tab2
on tab1.col1 = tab2.col2;
I get the following output:
col1 col2
null a
null b
null c
a null
b null
c null
I know that char occupies fixed storage, but shouldn't Oracle still match the rows on a string comparison?
varchar2(10) occupies only the space the value needs.
char(20) will pad the value with blanks at the end if the text is shorter than 20 characters.
Hence, in tab1.col1 the value a is stored as 'a', but in tab2.col2 the value a is stored as 'a' followed by 19 trailing blanks, and hence there is no match.
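You can see the padding directly by checking the stored lengths (a minimal sketch against the two tables above; DUMP also shows the raw bytes, blanks being character 32):
select col1, length(col1) from tab1;   -- 1: varchar2 stores only what you insert
select col2, length(col2) from tab2;   -- 20: char(20) blank-pads to its full width
select dump(col2) from tab2 where rtrim(col2) = 'a';   -- trailing bytes are 32 (blanks)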
CHAR is blank-padded to its full width, so you are comparing
'a' followed by 19 trailing blanks with 'a',
and they are not the same.
Straight from the Oracle Documentation...
https://docs.oracle.com/database/122/SQLRF/Data-Type-Comparison-Rules.htm#SQLRF30027
Blank-Padded and Nonpadded Comparison Semantics
With blank-padded semantics, if the two values have different lengths,
then Oracle first adds blanks to the end of the shorter one so their
lengths are equal. Oracle then compares the values character by
character up to the first character that differs. The value with the
greater character in the first differing position is considered
greater. If two values have no differing characters, then they are
considered equal. This rule means that two values are equal if they
differ only in the number of trailing blanks. Oracle uses blank-padded
comparison semantics only when both values in the comparison are
either expressions of data type CHAR, NCHAR, text literals, or values
returned by the USER function.
With nonpadded semantics, Oracle compares two values character by
character up to the first character that differs. The value with the
greater character in that position is considered greater. If two
values of different length are identical up to the end of the shorter
one, then the longer value is considered greater. If two values of
equal length have no differing characters, then the values are
considered equal. Oracle uses nonpadded comparison semantics whenever
one or both values in the comparison have the data type VARCHAR2 or
NVARCHAR2.
The results of comparing two character values using different
comparison semantics may vary. The table that follows shows the
results of comparing five pairs of character values using each
comparison semantic. Usually, the results of blank-padded and
nonpadded comparisons are the same. The last comparison in the table
illustrates the differences between the blank-padded and nonpadded
comparison semantics.
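To see the two semantics side by side, here is a minimal sketch (the CAST widths are arbitrary, chosen only for illustration):
select case when cast('a' as char(10)) = cast('a   ' as char(20))
            then 'equal' else 'not equal' end as blank_padded
from dual;
-- 'equal': both operands are CHAR, so trailing blanks are ignored
select case when cast('a' as varchar2(10)) = cast('a' as char(20))
            then 'equal' else 'not equal' end as nonpadded
from dual;
-- 'not equal': a VARCHAR2 operand forces nonpadded semantics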
Edit: After reading the other answers, you know what is wrong. You can work around it by using the syntax below.
select tab1.*,tab2.*
from tab1 full join tab2
on trim(tab1.col1)=trim(tab2.col2);
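Since only the CHAR side is blank-padded, trimming just that column should also be enough (a sketch of the same join, which leaves tab1.col1 untouched and therefore still usable by an index):
select tab1.*,tab2.*
from tab1 full join tab2
on tab1.col1 = rtrim(tab2.col2);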
Related
I have some queries I'm not familiar with; maybe you can explain them, please:
Select *
From TableA
Where field1 = N'FAIL' And field2 = N'0'
N'FAIL' or N'0' => what does that mean? Can we use that condition to get a more efficient query?
Is this query the same as this one?
select *
from TableA
where field1 = 'FAIL' and field2 = '0'
I hope I can get some answers, thank you
N'FAIL' is a literal NVARCHAR. 'FAIL' is a literal VARCHAR.
Those are two different datatypes: both are variable-length strings (with a maximum length), however NVARCHAR stores Unicode data, while VARCHAR supports only the characters of its (non-Unicode) code page. The storage size of NVARCHAR is larger: 2 bytes per character (UTF-16) versus 1 byte per character for VARCHAR with a single-byte collation.
Which one should be used in your statement depends on the actual datatype of the column that the value is compared against. If the column is NVARCHAR, use NVARCHAR literals; otherwise use VARCHAR. Using the proper datatype for literal strings is more efficient and safer as well: comparing a VARCHAR column to an NVARCHAR literal forces an implicit conversion of the column, which can prevent an index seek.
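As an illustration of that last point, a sketch with a hypothetical TableA (the column types in your real table may differ):
-- hypothetical definition: both columns are plain VARCHAR here
CREATE TABLE TableA (field1 VARCHAR(10), field2 VARCHAR(10));
-- NVARCHAR literals: the VARCHAR columns are implicitly converted to NVARCHAR,
-- which can prevent an index seek
SELECT * FROM TableA WHERE field1 = N'FAIL' AND field2 = N'0';
-- matching VARCHAR literals: no conversion needed
SELECT * FROM TableA WHERE field1 = 'FAIL' AND field2 = '0';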
In SQL Server 2017 (14.0.2)
Consider the following table:
CREATE TABLE expTest
(
someNumbers [NVARCHAR](10) NULL
)
And let's say you populate the table with some values:
INSERT INTO expTest VALUES ('²'), ('2')
Why does the following SELECT return both rows?
SELECT *
FROM expTest
WHERE someNumbers = '2'
Shouldn't nvarchar recognize that '²' is a different Unicode character from '2'? How (without using the UNICODE() function) could I identify this data as being nonequivalent?
Here is a db<>fiddle. This shows the following:
Your observation is true even when the values are entered as national character set constants.
The "ASCII" versions of the characters are actually different.
The problem goes away with a case-sensitive collation.
I think the exponent is just being treated as a different "case" of the number, so they are considered the same in a case-insensitive collation.
The comparison is what you expect with a case-sensitive collation.
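A sketch of that last point, forcing the collation per query instead of changing the column (Latin1_General_CS_AS and Latin1_General_BIN2 are just example collations):
-- case-insensitive (the column's default here): '²' and '2' compare as equal
SELECT * FROM expTest WHERE someNumbers = '2';
-- case-sensitive or binary collation: the two characters are distinguished
SELECT * FROM expTest WHERE someNumbers = '2' COLLATE Latin1_General_CS_AS;
SELECT * FROM expTest WHERE someNumbers = '2' COLLATE Latin1_General_BIN2;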
I ran this query on a PostgreSQL table:
select * from table where column <> '' and column is not null
...and unexpectedly received several rows with no visible value in that column. Why is this? Is there some 'hidden' value in that column for those rows, or a corrupted table, or something else?
t=# select ascii(chr(9));
ascii
-------
9
(1 row)
thus
select ascii(column) from table where column <>'' and column is not null
should give the idea
https://www.postgresql.org/docs/current/static/functions-string.html
ASCII code of the first character of the argument. For UTF8 returns
the Unicode code point of the character. For other multibyte
encodings, the argument must be an ASCII character.
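To see what those 'invisible' characters actually are, something along these lines should work (t and c are hypothetical stand-ins for your table and column):
select c,
       length(c)   as char_len,
       ascii(c)    as first_char_code,
       c ~ '^\s+$' as only_whitespace   -- true if the value is nothing but whitespace
from t
where c <> '' and c is not null;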
I've read all about varchar versus nvarchar. But I didn't see an answer to what I think is a simple question. How do you determine the length of your nvarchar column? For varchar it's very simple: my Description, for example, can have 100 characters, so I define varchar(100). Now I'm told we need to internationalize and support any language. Does this mean I need to change my Description column to nvarchar(200), i.e. simply double the length? (And I'm ignoring all the other issues that are involved with internationalization for the moment.)
Is it that simple?
Generally it is the same as for varchar really. The number is still the maximum number of characters not the data length.
nvarchar(100) allows 100 characters (which would potentially consume 200 bytes in SQL Server).
You might want to allow for the fact that different cultures may take more characters to express the same thing though.
An exception to this, however, is if you are using an SC collation (which supports supplementary characters). In that case a single character can potentially take up to 4 bytes.
So worst case would be to double the character value declared.
From the Microsoft web site:
A common misconception is to think that with NCHAR(n) and NVARCHAR(n), the n defines the number of characters. But in NCHAR(n) and NVARCHAR(n) the n defines the string length in byte-pairs (0-4,000). n never defines numbers of characters that can be stored. This is similar to the definition of CHAR(n) and VARCHAR(n).
The misconception happens because when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. For example, in a column defined as NCHAR(10), the Database Engine can store 10 characters that use one byte-pair (Unicode range 0-65,535), but less than 10 characters when using two byte-pairs (Unicode range 65,536-1,114,111). For more information about Unicode storage and character ranges, see
https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15
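A quick way to see the byte-pair vs character distinction is to compare LEN (characters) with DATALENGTH (bytes); a sketch (the results for the supplementary string depend on whether the database collation is an _SC one):
DECLARE @bmp  NVARCHAR(10) = N'hello';  -- characters in the 0-65,535 range
DECLARE @supp NVARCHAR(10) = N'𠜎𠜎';   -- supplementary characters, two byte-pairs each
SELECT LEN(@bmp)  AS bmp_chars,  DATALENGTH(@bmp)  AS bmp_bytes,   -- 5, 10
       LEN(@supp) AS supp_chars, DATALENGTH(@supp) AS supp_bytes;  -- 2 (_SC) or 4, 8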
@Musa Calgar - exactly right. That link has the information for the answer to this question.
But to make sure the question itself is clear, we are talking about the 'length' attribute we see when we look at the column definition for a given table, right? That is the storage allocated per column. On the other hand, if we want to know the number of characters for a given string in the table at a given moment you can:
"SELECT myColumn, LEN(myColumn) FROM myTable"
But if the storage length is desired, you can drag the table name into the query window using SSMS, highlight it, and use 'Alt-F1' to see the defined lengths of each column.
So as an example, I created a table like this, specifying collations. (Latin1_General_100_CI_AS_SC allows for supplementary characters - that is, characters that take more than just 2 bytes):
CREATE TABLE [dbo].[TestTable1](
[col1] [varchar](10) COLLATE Latin1_General_100_CI_AS,
[col2] [nvarchar](10) COLLATE Latin1_General_100_CI_AS_SC,
[col3] [nvarchar](10) COLLATE Latin1_General_100_CI_AS
) ON [PRIMARY]
The lengths show up like this (Highlight in query window and Alt-F1):
Column_Name Type Length [...] Collation
col1 varchar 10 Latin1_General_100_CI_AS
col2 nvarchar 20 Latin1_General_100_CI_AS_SC
col3 nvarchar 20 Latin1_General_100_CI_AS
If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them. There will be an error if you try to put more than 10 characters into those fields:
"String or binary data would be truncated.
The statement has been terminated."
If you insert non-ASCII characters like 'ā' you can still put 10 of them into each one, but SQL Server will convert the values going into col1 to the closest known character that fits into 1-byte. In this case, 'ā' will be converted to 'a'.
However, if you insert characters that require 4 bytes to store, like for example, '𠜎', you will only be allowed to put FIVE of them into the varchar and nvarchar fields. Any more than that will result in the truncation error shown above. The varchar field will show question marks because it has no single-byte character that it can convert that input to.
So when you insert five of these '𠜎', do a select of that row using len(<colname>) and you will see this:
col1 len(col1) col2 len(col2) col3 len(col3)
?????????? 10 𠜎𠜎𠜎𠜎𠜎 5 𠜎𠜎𠜎𠜎𠜎 10
So the length of col2 shows 5 characters since supplemental characters were defined when the table was created (see above CREATE TABLE DDL statement). However, col3 did not have _SC for its collation, so it is showing length 10 for the five characters we inserted.
Note that col1 has ten question marks. If we had defined the col1 varchar using the _SC collation instead of the non-supplemental one, it would behave the same way.
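The experiment above, as a sketch you could run against the TestTable1 defined earlier:
-- five supplementary characters into each column
INSERT INTO [dbo].[TestTable1] (col1, col2, col3)
VALUES (N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎');
SELECT col1, LEN(col1) AS len1,
       col2, LEN(col2) AS len2,
       col3, LEN(col3) AS len3
FROM [dbo].[TestTable1];
-- col1: '??????????' with len 10, col2: len 5 (_SC collation), col3: len 10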
I have an Oracle table and a column (col1) of type varchar2(12 byte). It has one row, and the value of col1 is 1234.
When I say
select * from table where col1 = 1234
Oracle says invalid number. Why is that? Why can't I pass a number when the column is varchar2?
EDIT: All the responses are great. Thank you. But I am not able to understand why it does not accept 1234 when 1234 is a valid varchar2 value.
The problem is that you expect that Oracle will implicitly cast 1234 to a character type. To the contrary, Oracle is implicitly casting the column to a number. There is a non-numeric value in the column, so Oracle throws an error. The Oracle documentation warns against implicit casts just before it explains how they will be resolved. The rule which explains the behaviour you're seeing is:
When comparing a character value with a numeric value, Oracle converts the character data to a numeric value.
Oh, it is much better to convert to char rather than to numbers:
select *
from table
where col1 = to_char(1234)
If you converted to numbers instead, then whenever col1 does not look like a number, to_number would raise an error and stop the query.
Oracle says invalid number. Why is that? Why can't I pass a number when the column is varchar2?
Oracle does an implicit conversion from the character type of col1 to a number, since you're comparing it against a number.
Also, you assume that the 1234 row is the only row being fetched. In reality, Oracle has to fetch all rows from the table and then filter them as per the where clause. If a non-numeric character value in col1 is fetched before it encounters your 1234 row, that causes the error, since the value cannot be converted to a number.
This fiddle shows the behaviour. Since abc cannot be converted to a number, you get that error message.
Now, if the only record in the table is one where col1 contains a numeric string, you'll see that the statement works fine.
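A minimal sketch of the behaviour the fiddle demonstrates (table name and data are hypothetical stand-ins for the ones in the question):
create table t (col1 varchar2(12 byte));
insert into t values ('1234');
insert into t values ('abc');
select * from t where col1 = 1234;            -- ORA-01722: invalid number (implicit to_number hits 'abc')
select * from t where col1 = to_char(1234);   -- works: the comparison stays character vs character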