Select truncated string from Postgres - sql

I have some large varchar values in Postgres that I want to SELECT and move somewhere else. The place they are going to uses VARCHAR(4095) so I only need at most 4095 bytes (I think that's bytes) and some of these varchars are quite big, so a performance optimization would be to SELECT a truncated version of them.
How can I do that?
Something like:
SELECT TRUNCATED(my_val, 4095) ...
I don't think it's a character length though, it needs to be a byte length?

The n in varchar(n) is the number of characters, not bytes. The manual:
SQL defines two primary character types: character varying(n) and
character(n), where n is a positive integer. Both of these types can
store strings up to n characters (not bytes) in length.
The simplest way to "truncate" a string would be with left():
SELECT left(my_val, 4095)
Or just cast:
SELECT my_val::varchar(4095)
The manual once more:
If one explicitly casts a value to character varying(n) or
character(n), then an over-length value will be truncated to n
characters without raising an error. (This too is required by the SQL standard.)
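Because varchar(n) counts characters, left(my_val, 4095) and my_val::varchar(4095) already match a varchar(4095) destination. A byte limit would only differ for multibyte text. As a rough illustration, the sketch below contrasts cutting by characters with cutting to a byte budget that never splits a character. It is written in Python, not SQL, assumes the text is encoded as UTF-8, and uses hypothetical function names.

```python
def truncate_chars(s: str, n: int) -> str:
    """Keep at most n characters, like left(s, n) or s::varchar(n)."""
    return s[:n]


def truncate_utf8_bytes(s: str, max_bytes: int) -> str:
    """Keep at most max_bytes of UTF-8 without splitting a character."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s
    # Cut at the byte limit, then drop any incomplete trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    text = "héllo wörld"  # 'é' and 'ö' take 2 bytes each in UTF-8
    print(truncate_chars(text, 5))       # héllo
    print(truncate_utf8_bytes(text, 5))  # héll
```

You would only need the byte-based variant if some hypothetical destination enforced a byte limit. For a varchar(4095) target, the character-based cut is the one that matches.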

Related

SQL Developer fills remaining space in char with spaces

I have an attribute with the data type char(256). I import the values via SQL Developer from a CSV file.
When the attribute gets a value with 10 characters, the remaining space gets filled with spaces.
I know that char allocates the space statically, but does that also mean that I get a string in a format like "abc       "?
This makes SQL statements with equality operators difficult.
You are operating under a misconception; it has nothing to do with SQL Developer.
A CHAR data type is a fixed-length string; if you do not provide a string of the full length, then Oracle will right-pad the string with space (ASCII 32) characters until it has the correct length.
From the documentation:
CHAR Datatype
The CHAR datatype stores fixed-length character strings. When you create a table with a CHAR column, you must specify a string length (in bytes or characters) between 1 and 2000 bytes for the CHAR column width. The default is 1 byte. Oracle then guarantees that:
When you insert or update a row in the table, the value for the CHAR column has the fixed length.
If you give a shorter value, then the value is blank-padded to the fixed length.
If a value is too large, Oracle Database returns an error.
Oracle Database compares CHAR values using blank-padded comparison semantics.
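The small Python model below makes the padding and blank-padded comparison rules concrete. It is not Oracle code, the function names are hypothetical, and it counts lengths in characters rather than bytes. According to the Oracle SQL reference, blank-padded semantics apply only when both operands are CHAR values or text literals. When one side is VARCHAR2, for example a bind variable, Oracle uses nonpadded comparison, which is what plain == models here.

```python
def store_char(value: str, width: int) -> str:
    """Model inserting into a CHAR(width) column: blank-pad or reject."""
    if len(value) > width:
        raise ValueError(f"value too large for CHAR({width})")
    return value.ljust(width)  # right-pad with spaces (ASCII 32)


def blank_padded_equal(a: str, b: str) -> bool:
    """Blank-padded comparison: pad the shorter operand with spaces first."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


if __name__ == "__main__":
    stored = store_char("abc", 10)
    print(repr(stored))                       # 'abc       '
    print(blank_padded_equal(stored, "abc"))  # True: CHAR compared with a text literal
    print(stored == "abc")                    # False: nonpadded, e.g. a VARCHAR2 bind
    print(stored.rstrip() == "abc")           # True: what RTRIM(column) achieves
```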
To solve this, do not use CHAR for variable length strings and use VARCHAR2 instead.
VARCHAR2 and VARCHAR Datatypes
The VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column. For each row, Oracle Database stores each value in the column as a variable-length field unless a value exceeds the column's maximum length, in which case Oracle Database returns an error. Using VARCHAR2 and VARCHAR saves on space used by the table.
You may use VARCHAR2 instead of CHAR as the data type to avoid this.
Or you can trim your data in the query by using RTRIM(columnname).

Hive QL Declaration

What is the difference between CHAR() and VARCHAR() declarations from HQL?
VARCHAR holds the advantage since variable-length data would produce smaller rows and, thus, smaller physical files.
CHAR fields require less string manipulation because of fixed field widths. Partitioning, lookups, joins, and grouping on CHAR fields are faster than on VARCHAR fields.
Like in any other language:
CHAR is a fixed-length character data type. For example, if you define char(10) and the input value is 6 characters long, the remaining 4 positions are filled with spaces.
VARCHAR has variable length. For example, if you define varchar(10) and the input value is 6 characters long, only those 6 characters are stored and no additional space is reserved.
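Here is a minimal Python model of the two declarations. The helper names are hypothetical and this is not Hive code. The VARCHAR side also reflects the Hive documentation's note that strings longer than the declared length are silently truncated. Over-length CHAR input is deliberately left out of the sketch.

```python
def hive_char(value: str, n: int) -> str:
    """Model a value that fits in Hive CHAR(n): right-pad with spaces to n."""
    if len(value) > n:
        # Over-length CHAR input is outside the scope of this sketch.
        raise ValueError(f"{value!r} is longer than CHAR({n})")
    return value.ljust(n)


def hive_varchar(value: str, n: int) -> str:
    """Model Hive VARCHAR(n): never pad; silently truncate over-length strings."""
    return value[:n]


if __name__ == "__main__":
    print(repr(hive_char("abcdef", 10)))     # 'abcdef    ' (4 trailing spaces)
    print(repr(hive_varchar("abcdef", 10)))  # 'abcdef'
```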
HIVE DOC REFERENCE

try to concatenate 2 strings, result ends in a lot of spaces

select CONCAT(convert(char, 123), 'sda');
Or
select convert(char, 123) + 'sda'
Or
select ltrim(convert(char, 123) + 'sda')
The output is '123' followed by a long run of spaces, then 'sda'.
How can I get the output without those spaces?
The problem here is twofold. Firstly, you are converting to char, which is a fixed-width data type. Secondly, you aren't defining the length of your char, so the default length is used. For CAST and CONVERT, that's char(30).
So what you start with is convert(char, 123). This converts the int 123 to the fixed-width string '123' padded with 27 trailing spaces. Then you concatenate the varchar(3) value 'sda' to that, giving '123', then 27 spaces, then 'sda'. This is working exactly as written, but clearly not as you intend.
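You can check the arithmetic with a tiny Python model. The helper names are hypothetical, and the code mimics the T-SQL behavior described above rather than calling SQL Server. It also shows why the ltrim variant in the question changes nothing: the padding sits at the end of '123', which is the middle of the concatenated result.

```python
# Default n used by CAST/CONVERT when char or varchar is written without a length.
CAST_CONVERT_DEFAULT = 30


def convert_char(value: object, n: int = CAST_CONVERT_DEFAULT) -> str:
    """Model CONVERT(char(n), value) for a value whose text fits in n characters."""
    text = str(value)
    if len(text) > n:
        raise ValueError("values that do not fit are not modeled here")
    return text.ljust(n)  # fixed width: pad with trailing spaces


def convert_varchar(value: object) -> str:
    """Model CONVERT(varchar(n), value) for a value that fits: no padding."""
    return str(value)


if __name__ == "__main__":
    padded = convert_char(123) + "sda"
    print(repr(padded), len(padded))           # 27 spaces between '123' and 'sda'; length 33
    print(repr(padded.lstrip()))               # LTRIM does not help: the spaces are in the middle
    print(convert_char(123).rstrip() + "sda")  # 123sda
    print(convert_varchar(123) + "sda")        # 123sda
```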
The obvious fix would be to use a varchar and define a length, such as CONCAT(CONVERT(varchar(5),123),'sda') which would return '123sda', however, all of the CONCAT function's parameters are a string type:
string_value
A string value to concatenate to the other values. The CONCAT function requires at least two string_value arguments, and no more than 254 string_value arguments.
This means you can simply just pass the value 123 and it'll be implicitly cast to a string type: CONCAT(123,'sda').
To reiterate my comment's link too: Bad Habits to Kick : Declaring VARCHAR without (length)
You are using char while you probably want [n]varchar(...): the former pads the string with white spaces, while the latter does not:
concat(convert(varchar(10), 123), 'sda');
But simpler yet: concat() implicitly converts its arguments to string types, so this should do it:
concat(123, 'sda')
First, in SQL Server, never use char or related string definitions without a length. SQL Server requires a length, and the default depends on the context. If you depend on the default length, your code has a bug just waiting to happen.
Second, char is almost never what you want. It is a fixed length string, with shorter strings padded with spaces.
If you want an explicit conversion use varchar, variable length strings:
select convert(varchar(255), 123) + 'sda'
Or dispense with the explicit conversion and use concat():
select concat(123, 'sda')
The others have already pointed out the root cause of the issue. If you cannot change the data type, you can always use SELECT CONCAT(TRIM(CONVERT(char,123)),'sda'). TRIM requires SQL Server 2017 or later; on older versions, use RTRIM. However, it's highly recommended to either use varchar(n) or give char an explicit length, because it is rather pointless to create a fixed-length string and then shorten it with TRIM. varchar(30) would fit perfectly here: the length still cannot exceed 30 characters, but a shorter string would not use the full length.
Let's refer to the Microsoft docs:
When n isn't specified in a data definition or variable declaration statement, the default length is 1. If n isn't specified when using the CAST and CONVERT functions, the default length is 30.
Reference: https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver15#remarks
So you have Convert(char, 123), and you did not specify n for char, so your code is equivalent to Convert(char(30), 123).
Now it is clear why you have so many space characters. To resolve the problem, simply use a variable-length character data type such as varchar instead. However, I recommend always declaring character data types with a length. (Same as what @GordonLinoff posted: https://stackoverflow.com/a/63467483/1666800)
select convert(varchar(30), 123) + 'sda'
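The two defaults quoted above can also cause silent data loss. Below is a small Python model of assigning a string to a varchar declared without n. The names are hypothetical. The model assumes that SQL Server silently truncates when you assign to a variable or cast, which is what makes these defaults dangerous.

```python
# Default n when char/varchar is written without a length (per the docs quoted above).
DEFAULT_N = {"declaration": 1, "cast_convert": 30}


def assign_varchar_without_length(value: str, context: str) -> str:
    """Model e.g. DECLARE @v varchar = value, or CAST(value AS varchar)."""
    n = DEFAULT_N[context]
    return value[:n]  # silently cut to the default length; no error is raised


if __name__ == "__main__":
    print(assign_varchar_without_length("hello", "declaration"))         # h
    print(len(assign_varchar_without_length("x" * 40, "cast_convert")))  # 30
```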

What is the max value of a CHAR?

I was wondering what the max char value is in SQL. In C# I noticed the value \uFFFF, but when I use that value to compare a string, SQL renders it as an empty string, I think.
The table is in SQL_Latin1_General_CP1_CI_AS if that matters.
There is a deep misconception about what ASCII is...
ASCII is a 7-bit code (0 to 127) in which the characters are fixed.
The 8th bit offers this range a second time (128 to 255). In this area, the characters depend on code pages and collations.
Thinking of CHAR as a BYTE (8 bits in memory) is misleading...
Try this; both return a capital A:
SELECT CHAR(65) COLLATE Latin1_General_CI_AS
SELECT CHAR(65) COLLATE Arabic_CI_AS
The code 255 renders with Latin1_General_CI_AS as ÿ; with the Arabic collation there seems to be no printable character, hence the question mark.
SELECT CHAR(255) COLLATE Latin1_General_CI_AS
SELECT CHAR(255) COLLATE Arabic_CI_AS
So, in short: "SQL renders it as an empty string" is not true. This depends on your settings.
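You can reproduce the code-page dependence outside SQL Server with Python's built-in codecs. This assumes that Latin1_General collations use Windows code page 1252 and Arabic collations use code page 1256. How SQL Server and your client then display the result is a separate step that this sketch does not model.

```python
def char_from_code(code: int, code_page: str) -> str:
    """Decode a single byte value (0-255) using the given code page."""
    return bytes([code]).decode(code_page)


if __name__ == "__main__":
    for cp in ("cp1252", "cp1256"):
        # 65 is in the 7-bit ASCII range, so it is 'A' in both code pages.
        # 255 is in the upper half, so its meaning depends on the code page.
        print(cp, char_from_code(65, cp), repr(char_from_code(255, cp)))
```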
Did you check the documentation? It clearly says:
char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the
string length and must be a value from 1 through 8,000. The storage
size is n bytes. The ISO synonym for char is character.
Numerically, the answer is 255. CHAR has a potential range of 0 to 255. It is an 8-bit code unit for the character encoding configured for the field (which it might inherit from the table or database).
Whether 255 is a valid code unit and is a complete codepoint, and which character it represents, and its sort order (is that what you meant by max?), depends on the collation. (A collation specifies a character encoding and sort order.)
Oh, if you are going to compare SQL datatypes to others, NVARCHAR and C#'s char and .NET's Char all use UTF-16 as the character encoding.
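This is also why \uFFFF from C# has no counterpart in a CHAR column that uses code page 1252: it is a UTF-16 code unit, not an 8-bit code page value. Here is a Python sketch of that check (the helper name is hypothetical):

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text can be encoded in code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("\uffff".encode("utf-16-le"))        # b'\xff\xff': one UTF-16 code unit
    print(fits_code_page("\uffff", "cp1252"))  # False
    print(fits_code_page("\u00ff", "cp1252"))  # True: 'ÿ' is byte 255 in cp1252
```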

Are there any limits on length of string in mysql?

I am using a MySQL database with Rails. I have created a field of type string. Are there any limits to its length? What about type text?
Also as text is variable sized, I believe there would be extra costs associated with using text objects. How important can they get, if at all?
CHAR
A fixed-length string that is always right-padded with spaces to the specified length when stored. The range of length is 1 to 255 characters. Trailing spaces are removed when the value is retrieved. CHAR values are sorted and compared in case-insensitive fashion according to the default character set unless the BINARY keyword is given.
VARCHAR
A variable-length string. Note: In MySQL versions before 5.0.3, trailing spaces are removed when the value is stored (this differs from the ANSI SQL specification).
The range of length is 1 to 255 characters (before MySQL 5.0.3). VARCHAR values are sorted and compared in case-insensitive fashion unless the BINARY keyword is given.
TINYBLOB, TINYTEXT
A TINYBLOB or TINYTEXT column with a maximum length of 255 (2^8 - 1) characters
BLOB, TEXT
A BLOB or TEXT column with a maximum length of 65,535 (2^16 - 1) characters; in bytes, 64 KiB
MEDIUMBLOB, MEDIUMTEXT
A MEDIUMBLOB or MEDIUMTEXT column with a maximum length of 16,777,215 (2^24 - 1) characters; in bytes, 16 MiB
LONGBLOB, LONGTEXT
A LONGBLOB or LONGTEXT column with a maximum length of 4,294,967,295 (2^32 - 1) characters; in bytes, 4 GiB
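Each maximum above is 2^n - 1 for an n-bit length field. The Python sketch below works through that arithmetic. The length-prefix sizes (1 to 4 bytes) follow the MySQL storage-requirements documentation, where L + 1 through L + 4 bytes are stored for the four types.

```python
# Width of the length prefix, in bits, for each TEXT/BLOB family member.
LENGTH_PREFIX_BITS = {"TINYTEXT": 8, "TEXT": 16, "MEDIUMTEXT": 24, "LONGTEXT": 32}


def max_length(type_name: str) -> int:
    """Largest value an n-bit unsigned length prefix can hold: 2**n - 1."""
    return 2 ** LENGTH_PREFIX_BITS[type_name] - 1


if __name__ == "__main__":
    for name, bits in LENGTH_PREFIX_BITS.items():
        print(f"{name:<10} max {max_length(name):>13,}  prefix {bits // 8} byte(s)")
```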
See MySQL Data Types Quick Reference Table for more info.
You can also see MySQL - String Type Overview.
String, in general, should be used for short text. For example, it is a VARCHAR(255) under MySQL.
Text uses the larger text from the database, like, in MySQL, the type TEXT.
For information on how this works and the internals in MySQL and limits and such, see the other answer by Pekka.
If you are requesting, say, a paragraph, I would use text. If you are requesting a username or email, use string.
See the MySQL manual on String Types.
Varchar (String):
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
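Here is a back-of-the-envelope Python sketch of how the character set shrinks that limit. It assumes a table with a single NOT NULL VARCHAR column, so only the 2-byte length prefix competes for the 65,535-byte row. It also assumes the worst-case bytes per character for each character set. Other columns and NULL flags would reduce the limit further.

```python
MAX_ROW_BYTES = 65_535  # shared by all columns in a row
LENGTH_PREFIX = 2       # VARCHAR values that can exceed 255 bytes use a 2-byte prefix


def max_varchar_chars(max_bytes_per_char: int) -> int:
    """Largest n for VARCHAR(n) as the only NOT NULL column in the row."""
    return (MAX_ROW_BYTES - LENGTH_PREFIX) // max_bytes_per_char


if __name__ == "__main__":
    for charset, width in (("latin1", 1), ("utf8mb3", 3), ("utf8mb4", 4)):
        print(charset, max_varchar_chars(width))  # 65533, 21844, 16383
```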
Text: See storage requirements
If you want a fixed size text field, use CHAR which can be 255 characters in length maximum. VARCHAR and TEXT both have variable length.