What does the specified number mean in a VARCHAR() clause? - sql

Just to clarify: does specifying something like VARCHAR(45) mean the column can hold up to 45 characters? I remember hearing from someone a few years ago that the number in the parentheses doesn't refer to the number of characters. The person then tried to explain something quite complicated, which I didn't understand and have since forgotten.
And what is the difference between CHAR and VARCHAR? I searched around a bit and saw that CHAR gives you the max size of the column, and that it is better to use CHAR if your data has a fixed size and VARCHAR if your data size varies.
But if CHAR always gives you the max size of the column, isn't it better to use it when your data size varies, especially if you don't know how big your data is going to be? VARCHAR needs a size to be specified (CHAR doesn't really need one, right?), so isn't it more troublesome?

You also have to specify the size with CHAR. With CHAR, column values are padded with spaces to fill the size you specified, whereas with VARCHAR, only the actual value you specified is stored.
For example:
CREATE TABLE test (
char_value CHAR(10),
varchar_value VARCHAR(10)
);
INSERT INTO test VALUES ('a', 'b');
SELECT * FROM test;
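The above will select "a         " (the letter padded with spaces to 10 characters) for char_value and "b" for varchar_value. Note that MySQL strips the trailing spaces from CHAR values on retrieval unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
If all your values are about the same size, CHAR is possibly a better choice because it will often require less storage space than VARCHAR. This is because VARCHAR stores both the length of the value and the value itself, whereas CHAR can just store the (fixed-size) value.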

The MySQL documentation gives a good explanation of the storage requirements of the various data types.
In particular, for a string of length L, a CHAR(M) datatype will take up (M x c) bytes, where c is the number of bytes required to store a character (this depends on the character set in use).
A VARCHAR(M) will take up (L + 1) or (L + 2) bytes, depending on whether the column's values can require at most 255 bytes or more than 255 bytes.
So it really depends on how long you expect your strings to be and how much their length will vary.
NB: The documentation doesn't discuss the impact of character sets on the storage requirements of a VARCHAR type. I've tried to quote it accurately, but my guess is that you would need to multiply the string length by the character byte-width as well to get the storage requirement.
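The formulas above can be turned into a small calculator. This is only a sketch in Python, not MySQL's actual implementation: the function names are made up for illustration, CHAR is assumed to use a fixed-width character set, and the 1-byte versus 2-byte length prefix is chosen from the largest byte count the column could ever hold.

```python
def char_storage_bytes(m: int, bytes_per_char: int = 1) -> int:
    """CHAR(M): always M characters, each bytes_per_char wide (assumed fixed-width)."""
    return m * bytes_per_char


def varchar_storage_bytes(value: str, m: int, encoding: str = "latin-1",
                          max_bytes_per_char: int = 1) -> int:
    """VARCHAR(M): the value's byte length plus a 1- or 2-byte length prefix."""
    if len(value) > m:
        raise ValueError("value is longer than the declared column length")
    data_bytes = len(value.encode(encoding))
    # One prefix byte is enough only if no value in the column can exceed 255 bytes.
    prefix_bytes = 1 if m * max_bytes_per_char <= 255 else 2
    return data_bytes + prefix_bytes


print(char_storage_bytes(10))               # CHAR(10), single-byte charset
print(varchar_storage_bytes("b", 10))       # VARCHAR(10) holding 'b'
print(varchar_storage_bytes("b", 300))      # VARCHAR(300) holding 'b'
```

With a single-byte character set, CHAR(10) always costs 10 bytes, while 'b' in a VARCHAR(10) costs 2 bytes and in a VARCHAR(300) costs 3.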

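The complicated part you don't remember is probably that, depending on the database and its settings, the 45 can refer to bytes rather than characters. The two are not the same if you are using a multibyte character encoding. Recent MySQL versions count characters. In Oracle you can specify bytes or chars explicitly: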
varchar2(45 BYTE)
or
varchar2(45 CHAR)
See Difference between BYTE and CHAR in column datatypes
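To see why the distinction matters, here is a minimal Python sketch that checks one string against a byte limit and against a character limit. The helper names are hypothetical and UTF-8 is assumed as the encoding; a real database applies whatever character set the column uses.

```python
def fits_char_semantics(value: str, limit: int) -> bool:
    """Limit counted in characters, as with varchar2(n CHAR)."""
    return len(value) <= limit


def fits_byte_semantics(value: str, limit: int, encoding: str = "utf-8") -> bool:
    """Limit counted in encoded bytes, as with varchar2(n BYTE)."""
    return len(value.encode(encoding)) <= limit


city = "Z\u00fcrich"  # 6 characters, but the u-umlaut takes 2 bytes in UTF-8

print(len(city), len(city.encode("utf-8")))
print(fits_char_semantics(city, 6))
print(fits_byte_semantics(city, 6))
```

The same six-character value fits a limit of 6 characters but not a limit of 6 bytes, because it encodes to 7 bytes.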

The choice between CHAR and VARCHAR actually becomes irrelevant if you have even one variable-length field in your table, such as a VARCHAR or TEXT. MySQL will automatically change all CHAR columns to VARCHAR.
Fixed-length records can give you extra performance, but only if you use no variable-length field types at all. The reason is that it is quicker and easier for MySQL to find the next record.
For example, if you do a SELECT * FROM table LIMIT 10, MySQL has to scan the table file for the tenth record. This means finding the end of each record until you find the end of the 10th record. But if your table has fixed-length records, MySQL just needs to know the record size and then skip 10 x #bytes.
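The difference in lookup cost can be illustrated with a toy record file in Python. This is not MySQL's on-disk format: the layouts (space-padded fixed-width records, and records with a one-byte length prefix) and the function names are assumptions made purely for the sketch.

```python
def pack_fixed(values, width):
    """Store every value padded with spaces to the same width."""
    return b"".join(v.encode("ascii").ljust(width) for v in values)


def read_fixed(buf, width, n):
    """Jump straight to record n: its offset is simply n * width."""
    start = n * width
    return buf[start:start + width].rstrip(b" ").decode("ascii")


def pack_variable(values):
    """Store each value as a one-byte length followed by the data."""
    return b"".join(bytes([len(v)]) + v.encode("ascii") for v in values)


def read_variable(buf, n):
    """Walk over the first n records, since their sizes are not known up front."""
    pos = 0
    for _ in range(n):
        pos += 1 + buf[pos]  # skip the length byte plus that many data bytes
    length = buf[pos]
    return buf[pos + 1:pos + 1 + length].decode("ascii")


names = ["ann", "bob", "christine"]
print(read_fixed(pack_fixed(names, 10), 10, 2))
print(read_variable(pack_variable(names), 2))
```

Both calls return the same record, but read_fixed computes one offset while read_variable has to visit every record before the one it wants.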

If you know a column will contain a small, fixed number of chars use a CHAR, otherwise use a varchar. A CHAR column is padded to the max length.
VARCHAR has a small overhead (4-8 bytes depending on RDBMS), but only uses the overhead + the actual number of chars stored.

For values whose length you know will be constant, for example fixed-format phone numbers or zip codes, CHAR is the better choice.

Related

What is a type of my value 675763582022462206:57 in sql creating data table query?

I am creating a table with several columns in sql:
CREATE TABLE.....
and one of them is going to have values like this: 675763582022462206:57. As you see it has : in it. So what is a type of it? Is it UInt16 or String?
It must be varchar or nvarchar in this case. The database doesn't recognize ":" as a part of a number, unless you say to Windows in advanced region settings that this is your decimal point. If you can store 57 (after ":") in a different column, then you can save the number before ":" as a bigint if you wish
This value can't be stored in a numeric type due to the colon (:), so you'll have to use one of the character types - i.e., a sufficiently long char or varchar.
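If splitting the value is an option, the idea from the first answer can be sketched in Python as follows. The function name is hypothetical, and the range check assumes the target column is a signed 64-bit bigint.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit bigint can hold


def split_value(raw: str):
    """Split 'number:suffix' into two integers suitable for separate columns."""
    left, right = raw.split(":", 1)
    major, minor = int(left), int(right)
    if not 0 <= major <= INT64_MAX:
        raise ValueError("the part before ':' does not fit in a bigint")
    return major, minor


print(split_value("675763582022462206:57"))
```

The part before the colon is about 6.8 x 10^17, which is comfortably below the bigint limit of roughly 9.2 x 10^18.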

Create table with numeric column

CREATE TABLE SampleMath
(m NUMERIC (10,3),
n INTEGER,
p INTEGER);
Question: why there is a space between NUMERIC and (10,3).
At first, I thought (10,3) is a column constraint. However, it doesn't seem to be correct as common constraints are NOT NULL, UNIQUE as listed here.
Then I thought that it could be the property (precision, scale) for the datatype NUMERIC as described in the documentation. In that case, I think there should not be a space between NUMERIC and (10,3). I also tried to delete the space in between and it seems the code still works. In my understanding, if it is the property, there should not be a space, which makes me really confused.
Any help to clarify this would be really appreciated. Thanks for your help in advance.
NUMERIC (10,3) is a datatype. It can store a number with a total of 10 digits (that's called the precision), of which 3 are decimals (the scale, i.e. the count of digits to the right of the decimal point). So the biggest number that it can store is 9999999.999.
The space between NUMERIC and the definition of its scale and precision is not meaningful. Even a new line would be OK, like:
CREATE TABLE SampleMath
(m NUMERIC
(10,3),
n INTEGER,
p INTEGER);
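The largest value for a given precision and scale follows directly from that definition: 10^(precision - scale) minus 10^(-scale). A short Python sketch of the arithmetic, using a hypothetical helper name:

```python
from decimal import Decimal


def max_numeric(precision: int, scale: int) -> Decimal:
    """Largest value representable with the given total digits and decimal digits."""
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and precision")
    return Decimal(10) ** (precision - scale) - Decimal(10) ** (-scale)


print(max_numeric(10, 3))
```

For NUMERIC (10,3) this gives 9999999.999: seven digits before the decimal point and three after.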

Numeric Data Type - Storage

According to the Microsoft documentation, a value of type numeric(10,2), where 10 is the precision, should take 9 bytes.
But when I do this:
DECLARE @var AS numeric(10,0) = 2147483649
SELECT @var, DATALENGTH(@var)
DATALENGTH(@var) returns 5 bytes instead of 9. Can someone explain why?
The documentation specifies:
Maximum storage sizes vary, based on the precision.
The storage is not constant for a given precision. The actual storage depends on the value.
As a note, this has nothing to do with integerness. The following also returns 5:
declare @var numeric(11, 1) = 214483649.8
In actual fact, SQL Server seems to use the amount of storage needed for the value, not for the maximum value of the type. You can readily see this by changing the "10" to "20" and noting that the data length does not change.
EDIT:
You can see the dependence on the value if you run:
declare @a numeric(20, 1) = '123.1';
declare @b numeric(20, 1) = '1234567890123456789.0';
select datalength(@a), datalength(@b);
The two lengths are not the same.
The other answer, by @GordonLinoff, is wrong, or at least misleading.
Numeric is not stored with a variable number of bytes, but with a fixed size for a specific precision.
Trying this on SQL Server 2017 gave the same results you got.
The documentation you linked to originally, for numeric, is correct about how many bytes it takes to store a numeric of varying precisions.
This storage requirement is based only on the precision of the numeric column. In other words, that's how many bytes of storage are used. It is not a maximum that depends on the value in that row.
All rows use the same number of bytes for that column.
The key to this variation is the documentation for DATALENGTH says this function
Returns the number of bytes used to represent any expression.
It appears that DATALENGTH does not mean 'represent' as in 'represent on disk', but rather 'represent in memory'.
The other documentation regarding numeric is talking about the on-disk storage of numeric.
This is probably because DATALENGTH is intended primarily for var* types or the other BLOB types.
So although a numeric(20,1) requires 13 bytes of storage, depending on the value, SQL Server can represent it in a smaller number of bytes when in memory, which is when DATALENGTH evaluates it.
As I pointed out in my other comment, although numeric comes in different sizes, it is a fixed-size data type, because for a specific column in a specific table, every value takes up the same amount of storage.
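The precision-to-storage mapping from the SQL Server documentation can be written down as a small Python lookup. The function name is made up for this sketch; the byte counts are the documented on-disk sizes, not what DATALENGTH reports for a variable.

```python
def numeric_storage_bytes(precision: int) -> int:
    """On-disk bytes SQL Server documents for a numeric/decimal of this precision."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17


for p in (9, 10, 20, 38):
    print(p, numeric_storage_bytes(p))
```

This matches the figures discussed here: 9 bytes for numeric(10,0) and 13 bytes for numeric(20,1), whatever value is stored in the row.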
Roughly, a SQL Server row has 4 parts:
4 byte header
Fixed size data
Offsets into variable size data
Variable size data
Numerics & other fixed size types are stored in 2, var* are stored in 4, with lengths in 3.
This script displays the metadata for a table with some fixed & variable columns.
declare @a numeric(20, 1) = '123.1';
declare @b numeric(20, 1) = '1234567890123456789.0';
select datalength(@a) union select datalength(@b);
create table #numeric(num1 numeric(20,1), text1 varchar(10), char2 char(6));
insert into #numeric(num1, text1, char2) values ('123.1', 'hello', 'first'), ('1234567890123456789.0', 'there', '2nd');
select datalength(num1) from #numeric;
select
t.name as table_name,
c.name as column_name,
pc.partition_column_id,
pc.max_inrow_length,
pc.max_length,
pc.precision,
pc.scale,
pc.collation_name,
pc.leaf_offset
from tempdb.sys.tables as t
join tempdb.sys.partitions as p
on(t.object_id=p.object_id)
join tempdb.sys.system_internals_partition_columns as pc
on(pc.partition_id=p.partition_id)
join tempdb.sys.columns as c
on((c.object_id=p.object_id)and(c.column_id=pc.partition_column_id))
where (t.object_id=object_id('tempdb..#numeric'));
drop table #numeric;
Notice the leaf_offset column. This indicates the starting position of the value in the raw binary data.
The first column starts immediately after the 4 byte header.
The second fixed column starts 13 bytes later, as per the SQL documentation.
The varchar column has an offset of -1, indicating it is a variable-length column and its position in the byte array isn't fixed.
In this case it could be fixed since there's only 1 var column, but an alter table statement could add another column & shift things.
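The offsets described above can be reproduced with a few lines of Python. This is a simplified model, not SQL Server's real row format: it assumes only the 4-byte header and the -1 marker for variable-length columns mentioned in this answer, and the function name and column sizes are supplied by hand for illustration.

```python
HEADER_BYTES = 4  # row header size stated in the answer


def fixed_column_offsets(columns):
    """columns: (name, size_in_bytes) pairs; a size of None marks a variable-length column."""
    offsets = {}
    position = HEADER_BYTES
    for name, size in columns:
        if size is None:
            offsets[name] = -1  # no fixed position within the row
        else:
            offsets[name] = position
            position += size
    return offsets


# numeric(20,1) takes 13 bytes and char(6) takes 6 bytes; varchar(10) is variable.
layout = fixed_column_offsets([("num1", 13), ("text1", None), ("char2", 6)])
print(layout)
```

In this model num1 starts at offset 4, char2 starts 13 bytes later at offset 17, and text1 is reported as -1.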
If you want to research further, the best source is a book called SQL Server Internals, by Kalen Delaney. She was part of the team that wrote SQL Server.

Arithmetic operation with numeric datatype in SQL server yields different results

I get different results when using real and numeric data type.
When I use real as the datatype I get finalValue as -139.2466; when I use the numeric datatype I get finalValue as -139.246409. Which value is correct?
When I plug these numbers into Excel, the result matches -139.2466.
For example:
create table #resr ( a1 real, a2 real, a3 real)
insert #resr select 0.471163361822717, 0.0096160000 , 0.001669000000000
select a1*a2*-51.295/a3 finalValue from #resr
create table #resn ( a1 numeric(30,15), a2 numeric(30,15), a3 numeric(30,15))
insert #resn select 0.471163361822717, 0.0096160000 , 0.001669000000000
select a1*a2*-51.295/a3 finalValue from #resn
Floating point data types (of which REAL is a member) are approximate values, and can use any of a number of algorithms to encode the sequence of number, causing minute differences in how they're interpreted in SQL. This is the reason you can have a single float(10) value of 1234567890 and .1234567890
select cast(1234567890 as float(10))
select cast(.1234567890 as float(10))
Exact values (such as Decimal and Numeric) define exactly how many decimal places are allowed, and fill in zeroes out to as many places as have been defined.
Floats give you the ability to model a wider range of numbers since you can allow extremely large numbers and extremely small numbers by allowing the decimal point to "float" rather than be a fixed point in memory. They're also fine in most cases as usually the decimal precision you lose isn't a big deal. They also tend to be smaller than precise data types (not always). However, if you know the size of the values you're expecting ahead of time, it's usually best to use a decimal.
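Which value is "correct"? As a general rule, if you're comparing a floating point representation of a number against an exact representation, go with the exact representation. In this particular query, though, the real result of -139.2466 is the one that agrees with exact arithmetic (and with Excel). SQL Server caps numeric precision at 38 digits and reduces the scale of intermediate results to fit, so multiplying two numeric(30,15) values discards decimal places before the division happens.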
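As a rough check on the two results in the question, the calculation can be repeated in Python with the decimal module. This is a sketch under an assumption: that SQL Server cut the intermediate product a1*a2 down to 7 decimal places, which is what its precision and scale rules appear to give for numeric(30,15) times numeric(30,15). The variable names are chosen for illustration.

```python
from decimal import Decimal, ROUND_DOWN

a1 = Decimal("0.471163361822717")
a2 = Decimal("0.009616")
a3 = Decimal("0.001669")
factor = Decimal("-51.295")

# Keep every digit of the intermediate results.
full_precision = a1 * a2 * factor / a3

# Assumption: the first product is reduced to 7 decimal places before continuing.
reduced_product = (a1 * a2).quantize(Decimal("0.0000001"), rounding=ROUND_DOWN)
reduced_scale = reduced_product * factor / a3

print(full_precision.quantize(Decimal("0.0001")))
print(reduced_scale.quantize(Decimal("0.000001")))
```

Keeping all digits gives -139.2466, while reducing the first product to 7 decimals reproduces -139.246409, which suggests the numeric result in the question lost digits in an intermediate step rather than being the more accurate one.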

Which data type saves more space TINYTEXT or VARCHAR for variable data length in MySQL?

I need to store a data into MySQL. Its length is not fixed, it could be 255 or 2 characters long. Should I use TINYTEXT or VARCHAR in order to save space (speed is irrelevant)?
When using VARCHAR, you need to specify maximum number of characters that will be stored in that column. So, if you declare a column to be VARCHAR(255), it means that you can store text with up to 255 characters. The important thing here is that if you insert two characters, only those two characters will be stored, i.e. allocated space will be 2 not 255.
TINYTEXT is one of four TEXT types. They are very similar to VARCHAR, but there are few differences (this depends on MySQL version you are using though). But for version 5.5, there are some limitations when it comes to TEXT types. First one is that you have to specify an index prefix length for indexes on TEXT. And the other one is that TEXT columns can't have default values.
In general, TEXT should be used for (extremely) long values. If you will be using string that will have up to 255 characters, then you should use VARCHAR.
Hope that this helps.
As for data storage space, VARCHAR(255) and TINYTEXT are equivalent:
VARCHAR(M): L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes.
TINYTEXT: L + 1 bytes, where L < 2^8.
Source: MySQL Reference Manual: Data Storage Requirements.
Storage space being equal, you may want to check out the following Stack Overflow posts for further reading on when you should use one or the other:
What’s the difference between VARCHAR(255) and TINYTEXT string types in MySQL?
varchar(255) v tinyblob v tinytext