Hive QL Declaration - sql

What is the difference between CHAR() and VARCHAR() declarations in HQL?

VARCHAR has the storage advantage, since variable-length data produces smaller rows and thus smaller physical files.
CHAR fields require less string manipulation because of their fixed field widths, so partitioning, lookups, joins and grouping on CHAR fields can be faster than on VARCHAR fields.

Like in any other language:
CHAR is a fixed-length character data type. For example, if you define CHAR(10) and the input value has 6 characters, the remaining 4 are filled with spaces.
VARCHAR has variable length. For example, if you define VARCHAR(10) and the input value has 6 characters, only those 6 characters are stored and no additional space is reserved.
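A toy Python model of the two declarations described above may make the padding concrete. It is not Hive's implementation: the helper names are hypothetical, lengths are counted in characters, and over-length values are simply rejected instead of modeling what Hive does with them.

```python
def as_char(value: str, n: int) -> str:
    """Hypothetical model of CHAR(n): right-pad with spaces to exactly n characters."""
    if len(value) > n:
        raise ValueError("over-length values are not modeled in this sketch")
    return value.ljust(n, " ")


def as_varchar(value: str, n: int) -> str:
    """Hypothetical model of VARCHAR(n): store the value as given, up to n characters."""
    if len(value) > n:
        raise ValueError("over-length values are not modeled in this sketch")
    return value


stored_char = as_char("abcdef", 10)
stored_varchar = as_varchar("abcdef", 10)
print(repr(stored_char), len(stored_char))        # 'abcdef    ' 10
print(repr(stored_varchar), len(stored_varchar))  # 'abcdef' 6
```

The padded CHAR value always occupies the declared width, which is where both the extra storage and the fixed-width convenience come from.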
Reference: Hive documentation.

Related

SQL Developer fills remaining space in char with spaces

I have an attribute with the data type char(256). I import the values via SQL Developer from a CSV file.
When the attribute gets a value with 10 characters, the remaining space gets filled with spaces.
I know that CHAR allocates the space statically, but does that also mean that I get a string in a format like "abc "?
This makes SQL statements with equality operators difficult.
You are operating under a misconception; it has nothing to do with SQL Developer.
A CHAR data-type is a fixed-length string; if you do not provide a string of the full length then Oracle will right-pad the string with space (ASCII 32) characters until it has the correct length.
From the documentation:
CHAR Datatype
The CHAR datatype stores fixed-length character strings. When you create a table with a CHAR column, you must specify a string length (in bytes or characters) between 1 and 2000 bytes for the CHAR column width. The default is 1 byte. Oracle then guarantees that:
When you insert or update a row in the table, the value for the CHAR column has the fixed length.
If you give a shorter value, then the value is blank-padded to the fixed length.
If a value is too large, Oracle Database returns an error.
Oracle Database compares CHAR values using blank-padded comparison semantics.
To solve this, do not use CHAR for variable-length strings; use VARCHAR2 instead.
VARCHAR2 and VARCHAR Datatypes
The VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column. For each row, Oracle Database stores each value in the column as a variable-length field unless a value exceeds the column's maximum length, in which case Oracle Database returns an error. Using VARCHAR2 and VARCHAR saves on space used by the table.
You may use VARCHAR2 instead of CHAR as the data type to avoid this.
Or you can trim your data in the query by using RTRIM(columnname).
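Why the equality check "fails" can be sketched in Python. Oracle's SQL reference describes blank-padded comparison (used when both sides are CHAR values or text literals) and nonpadded comparison (used when at least one side is VARCHAR2). The functions below are a hypothetical model of those two rules for equality only, not Oracle code, and rstrip(" ") stands in for RTRIM with its default set of a single blank.

```python
def blank_padded_equal(a: str, b: str) -> bool:
    """Model of blank-padded semantics: pad the shorter value with blanks, then compare."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def nonpadded_equal(a: str, b: str) -> bool:
    """Model of nonpadded semantics: trailing blanks are significant."""
    return a == b


def rtrim(value: str) -> str:
    """Model of RTRIM(value) with its default set: strip trailing blanks only."""
    return value.rstrip(" ")


stored = "abc".ljust(256)  # what a CHAR(256) column holds for 'abc'
print(blank_padded_equal(stored, "abc"))      # True  (CHAR vs. text literal)
print(nonpadded_equal(stored, "abc"))         # False (CHAR vs. VARCHAR2 value)
print(nonpadded_equal(rtrim(stored), "abc"))  # True  (the RTRIM workaround)
```

Switching the column to VARCHAR2 removes the padding at the source, so neither comparison rule can disagree.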

Select truncated string from Postgres

I have some large varchar values in Postgres that I want to SELECT and move somewhere else. The place they are going to uses VARCHAR(4095) so I only need at most 4095 bytes (I think that's bytes) and some of these varchars are quite big, so a performance optimization would be to SELECT a truncated version of them.
How can I do that?
Something like:
SELECT TRUNCATED(my_val, 4095) ...
I don't think it's a character length though, it needs to be a byte length?
The n in varchar(n) is the number of characters, not bytes. The manual:
SQL defines two primary character types: character varying(n) and
character(n), where n is a positive integer. Both of these types can
store strings up to n characters (not bytes) in length.
The simplest way to "truncate" a string would be with left():
SELECT left(my_val, 4095)
Or just cast:
SELECT my_val::varchar(4095)
The manual once more:
If one explicitly casts a value to character varying(n) or
character(n), then an over-length value will be truncated to n
characters without raising an error. (This too is required by the SQL standard.)
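Because left() and the varchar(n) cast count characters, a 4095-character result can take more than 4095 bytes when the text contains multibyte characters. The Python sketch below, assuming UTF-8 encoding, contrasts a character-based cut with a hypothetical byte-based one that drops any partial character at the cut point; it is an illustration, not a PostgreSQL function.

```python
def left_chars(value: str, n: int) -> str:
    """Like left(value, n) or value::varchar(n): keep the first n characters."""
    return value[:n]


def left_utf8_bytes(value: str, n: int) -> str:
    """Hypothetical helper: keep at most n UTF-8 bytes without splitting a character."""
    cut = value.encode("utf-8")[:n]
    # A multibyte character split by the cut leaves an incomplete trailing
    # sequence; errors="ignore" drops it.
    return cut.decode("utf-8", errors="ignore")


text = "\u00e9" * 5  # 'é' (U+00E9) is 2 bytes in UTF-8
by_chars = left_chars(text, 3)
by_bytes = left_utf8_bytes(text, 3)
print(by_chars, len(by_chars.encode("utf-8")))  # ééé 6
print(by_bytes, len(by_bytes.encode("utf-8")))  # é 2
```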

zero padding in teradata sql

Table A
Id varchar(30)
I'm trying to re-create logic where I have to use 9-digit Ids regardless of the actual length of the value in the Id field.
So, for instance, if the Id has length 6, I need to left-pad it with 3 leading zeros. The actual length can be anything from 1 to 9.
Any ideas how to implement this in Teradata SQL?
If the actual length is 1 to 9 characters, why is the column defined as VARCHAR(30)?
If it was a numeric column it would be easy:
CAST(CAST(numeric_col AS FORMAT '9(9)') AS CHAR(9))
For strings there's no FORMAT like that, but depending on your release you might have an LPAD function:
LPAD(string_col, 9, '0')
Otherwise it's:
SUBSTRING('000000000' FROM CHAR_LENGTH(string_col)+1) || string_col
If there are more than nine characters, all of the expressions above return all of them.
If you want to truncate (or want a CHAR instead of a VARCHAR result), you have to add a final CAST AS CHAR(9).
And finally, if there are leading or trailing blanks, you might want to use TRIM(string_col).
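The SUBSTRING-based padding can be checked with a small Python model. Note the off-by-one translation: SQL SUBSTRING positions are 1-based, so FROM CHAR_LENGTH(string_col)+1 corresponds to the Python slice starting at index len(value). The helper name is hypothetical, and strip(" ") stands in for TRIM removing leading and trailing blanks.

```python
ZEROS = "000000000"  # nine zeros, as in the SQL literal


def zero_pad_9(value: str) -> str:
    """Model of SUBSTRING('000000000' FROM CHAR_LENGTH(value)+1) || value."""
    # 1-based position len+1 in SQL is 0-based index len in Python; for values
    # longer than nine characters the slice is empty and the value passes through.
    return ZEROS[len(value):] + value


print(zero_pad_9("123456"))           # 000123456
print(zero_pad_9("1234567890"))       # 1234567890 (not truncated)
print(repr(zero_pad_9(" 42 ")))       # '00000 42 ' (blanks counted as characters)
print(zero_pad_9(" 42 ".strip(" ")))  # 000000042 (after trimming blanks)
```

The third call shows why trimming matters: untrimmed blanks consume padding positions and stay in the result.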

Are there any limits on length of string in mysql?

I am using a MySQL database with Rails. I have created a field of type string. Are there any limits to its length? What about type text?
Also, since text is variable-sized, I believe there would be extra costs associated with using text objects. How significant can they get, if at all?
CHAR
A fixed-length string that is always right-padded with spaces to the specified length when stored. The range of Length is 1 to 255 characters. Trailing spaces are removed when the value is retrieved. CHAR values are sorted and compared in case-insensitive fashion according to the default character set unless the BINARY keyword is given.
VARCHAR
A variable-length string. Note: trailing spaces are removed when the value is stored (this differs from the ANSI SQL specification; it describes MySQL before 5.0.3, and later versions retain trailing spaces).
The range of Length is 1 to 255 characters. VARCHAR values are sorted and compared in case-insensitive fashion unless the BINARY keyword is given.
TINYBLOB, TINYTEXT
A TINYBLOB or TINYTEXT column with a maximum length of 255 (2^8 - 1) bytes
BLOB, TEXT
A BLOB or TEXT column with a maximum length of 65,535 (2^16 - 1) bytes = 64 KiB
MEDIUMBLOB, MEDIUMTEXT
A MEDIUMBLOB or MEDIUMTEXT column with a maximum length of 16,777,215 (2^24 - 1) bytes = 16 MiB
LONGBLOB, LONGTEXT
A LONGBLOB or LONGTEXT column with a maximum length of 4,294,967,295 (2^32 - 1) bytes = 4 GiB
For the TEXT types these limits equal the maximum number of characters only in single-byte character sets; values with multibyte characters fit fewer characters.
See the MySQL Data Types Quick Reference Table for more info.
Also see MySQL - String Type Overview.
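Because the TEXT-family limits are byte limits, the type a value needs depends on its encoded size, not its character count. A rough Python sketch, assuming the column uses UTF-8 and that these four limits are the only constraint, picks the smallest TEXT type a value fits in; the function is hypothetical, not part of MySQL or Rails.

```python
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT type whose byte limit can hold the encoded value."""
    size = len(value.encode(encoding))
    for name, max_bytes in TEXT_TYPES:
        if size <= max_bytes:
            return name
    raise ValueError(f"{size} bytes exceeds LONGTEXT")


print(smallest_text_type("a" * 255))       # TINYTEXT (255 bytes)
print(smallest_text_type("\u00e9" * 200))  # TEXT (200 characters, 400 bytes)
```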
A Rails string column, in general, should be used for short text. For example, under MySQL it is a VARCHAR(255).
A Rails text column uses the database's larger text type, such as TEXT in MySQL.
For information on how this works and the internals in MySQL and limits and such, see the other answer by Pekka.
If you are requesting, say, a paragraph, I would use text. If you are requesting a username or email, use string.
See the mySQL manual on String Types.
Varchar (String):
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
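As a rough worked example of the row-size limit, assume a table whose only column is a NOT NULL VARCHAR, a 2-byte length prefix (MySQL uses 2 bytes once a column's maximum byte length exceeds 255), and the character set's maximum bytes per character (1 for latin1, 4 for utf8mb4). The sketch ignores other columns and any other per-row overhead, so treat the result as an upper bound rather than a guarantee.

```python
ROW_SIZE_LIMIT = 65_535  # bytes, shared among all columns of a row
LENGTH_PREFIX = 2        # bytes, for VARCHAR columns whose max byte length exceeds 255


def max_varchar_chars(max_bytes_per_char: int) -> int:
    """Upper bound on n in VARCHAR(n) for a single NOT NULL column (rough sketch)."""
    return (ROW_SIZE_LIMIT - LENGTH_PREFIX) // max_bytes_per_char


print(max_varchar_chars(1))  # latin1:  65533
print(max_varchar_chars(4))  # utf8mb4: 16383
```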
Text: See storage requirements
If you want a fixed-size text field, use CHAR, which can be at most 255 characters long. VARCHAR and TEXT both have variable length.

Difference between BYTE and CHAR in column datatypes

In Oracle, what is the difference between:
CREATE TABLE CLIENT
(
NAME VARCHAR2(11 BYTE),
ID_CLIENT NUMBER
)
and
CREATE TABLE CLIENT
(
NAME VARCHAR2(11 CHAR), -- or even VARCHAR2(11)
ID_CLIENT NUMBER
)
Let us assume the database character set is UTF-8, which is the recommended setting in recent versions of Oracle. In this case, some characters take more than 1 byte to store in the database.
If you define the field as VARCHAR2(11 BYTE), Oracle can use up to 11 bytes for storage, but you may not actually be able to store 11 characters in the field, because some of them take more than one byte to store, e.g. non-English characters.
By defining the field as VARCHAR2(11 CHAR) you tell Oracle it can use enough space to store 11 characters, no matter how many bytes it takes to store each one. A single character may require up to 4 bytes.
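A quick way to see the difference is to count the same value both ways. The Python sketch below assumes a UTF-8 (AL32UTF8) database character set and uses a hypothetical fits() helper; it checks only the declared length, not any other Oracle limit.

```python
def fits(value: str, limit: int, semantics: str) -> bool:
    """Hypothetical check of VARCHAR2(limit BYTE) vs VARCHAR2(limit CHAR) under UTF-8."""
    if semantics == "BYTE":
        return len(value.encode("utf-8")) <= limit
    if semantics == "CHAR":
        return len(value) <= limit
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")


name = "Jos\u00e9 M\u00fcller"  # 'José Müller': 11 characters, 13 UTF-8 bytes
print(len(name), len(name.encode("utf-8")))  # 11 13
print(fits(name, 11, "BYTE"))  # False
print(fits(name, 11, "CHAR"))  # True
```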
One has space for exactly 11 bytes, the other for exactly 11 characters. Some character sets, such as Unicode variants, may use more than one byte per character, so the 11-byte field might have space for fewer than 11 characters depending on the encoding.
See also http://www.joelonsoftware.com/articles/Unicode.html
Depending on the system configuration, the size of a CHAR measured in BYTEs can vary. In your examples:
The first limits the field to 11 BYTEs.
The second limits the field to 11 CHARacters.
Conclusion: 1 CHAR is not equal to 1 BYTE.
I am not sure, since I am not an Oracle user, but I assume the difference matters when you use multibyte character sets such as Unicode (UTF-16/32). In that case, 11 bytes could hold fewer than 11 characters.
Also, those field types might be treated differently with regard to accented characters or case; for example, 'binaryField(ete) = "été"' will not match while 'charField(ete) = "été"' might (again, not sure about Oracle).
In simple terms, when you write NAME VARCHAR2(11 BYTE), only 11 bytes can be stored in that column, no matter which character set you are using. For example, if you are using Unicode (UTF-16), only about half as many characters (5 instead of 11) fit in NAME.
On the other hand, if you write NAME VARCHAR2(11 CHAR), NAME can hold 11 characters regardless of their character encoding.
BYTE is the default if you specify neither BYTE nor CHAR (strictly, the default comes from the NLS_LENGTH_SEMANTICS parameter, which is BYTE unless changed).
So if you write NAME VARCHAR2(4000 BYTE) and use a Unicode (UTF-16) character encoding, only 2000 characters fit in NAME.
That means the size limit on the column is applied in bytes, and how many characters fit in it depends on the character encoding.
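The arithmetic behind the 2000-character figure is the byte limit divided by the bytes each character needs. The Python sketch below computes how many copies of one character fit in a byte limit; it uses "utf-16-le" so that Python does not prepend a byte-order mark, and it illustrates the arithmetic only, not anything Oracle runs.

```python
def chars_that_fit(limit_bytes: int, ch: str, encoding: str) -> int:
    """How many copies of the single character ch fit in limit_bytes bytes."""
    return limit_bytes // len(ch.encode(encoding))


print(chars_that_fit(4000, "a", "utf-16-le"))   # 2000
print(chars_that_fit(4000, "a", "utf-8"))       # 4000
print(chars_that_fit(4000, "\u20ac", "utf-8"))  # 1333 ('€' is 3 bytes in UTF-8)
print(chars_that_fit(11, "a", "utf-16-le"))     # 5
```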