Using "power of two numbers" for length of database columns - sql

I always try to use power of two numbers to define the length of my database columns, either VARCHAR or CHAR type.
Researching by Internet and debate with partners we do not achieve clarify anything about that, if it take advantaged of full clusters usage or something like that, so the question es simple:
Is it better use power of two numbers to define the length of the database VARCHAR and CHAR columns?

Is it better use power of two numbers to define the length of the database VARCHAR and CHAR columns?
No.
CHAR fields should be used for codes that take up the same number of characters. For example, CHAR(4) for a four character code column.
Since VARCHAR fields only use the number of characters needed, plus length bytes, you may set your VARCHAR fields to the maximum length for your database. For example VARCHAR(255) uses only one byte for the length, while VARCHAR(65535) will use two bytes.
Depending on which database system we're talking about, there might be a lower limit for VARCHAR than 65,535. There is also a limit to how long a row can be in bytes.

Related

PostgreSQL char varchar datatype search speed difference

I will create index for a column in PostgreSQL.
I wonder if there is a search speed difference between char and varchar datatype?
When you store a string as a char(n), extra spaces are added to pad it out to length n.
When you perform an operation on a char, extra logic is required to ignore any trailing spaces.
Aside from that, char is identical to varchar in Postgres.
So, if you are dealing with variable-length strings, use varchar. If your values are fixed-length, then there is not much difference, though a char column might make your database schema slightly easier to understand.
Note that char cannot tell the difference between literal spaces and padding characters, which leads to some odd behaviour (as required by the SQL standard). For example, the values ''::char(1) and ' '::char(1) are considered equal, which is probably not what you'd expect. So if in doubt, use varchar.
If you have variable-length values and you don't know the maximum size, you may want text instead. Storage and performance of text is exactly the same as varchar(n) (all string values are handled identically by TOAST), but text has no maximum length (other than the 1GB limit which applies to all types).

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero leading numbers that need to be treated as Numeric values on some scenarios (i.e. sorting) and as textual values in others (i.e. addresses) is always a pain and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes) and we must pad them with zero’s so they will sort properly when used in an order operation.
Do not use a numeric data type for this because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes and so the only benefit to storing them as a number would be to ensure proper sorting of the codes and that can be done with the code stored as text by padding it with zeros where needed. If you sue a numeric data type then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it since implicit conversions should always be avoided that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type the question then is should you use VARCHAR or CHAR and while many who have posted say VARCHAR I would suggest you go with CHAR set to a length of 4. WHY?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and possibly no more than 4 for the foreseeable future. SQL Server handles the storage of the data differently between CHAR and VARCHAR. SQL Server’s BOL (Books On Line) says :
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can’t say for certain this is true for SQL Server 2008 and up but for earlier versions, the use of a VARCHAR data type carries an extra overhead of 1 byte per row of data per column in a table that has a VARCHAR data type. If the data stored is always the same size and in your scenario it sounds like it is then this extra byte is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_ODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is if it wouldn't change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) at places where you use this field values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_ODE) = 1)

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores unicode character data which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space, 16-bits per character for nvarchar and 8-bits per character for varchar.
What's the best data type to be used
when storing strings, like a first
name? I've seen varchar and nvarchar
both used. Which one is better? Does
it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length
to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is
preferred for strings?
If you data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can save unicode character inside it. there is 2GB limit for nvarchar type. if the field length is more than 4000 characters, an overflow page is used. smaller fields means one page can hold more rows which increase the query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios when your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string type of one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs to use 1 byte to store the string length. This reduces storage space by about half a percent compared a string type whose max length is just over 255.

What's the difference between VARCHAR and CHAR?

What's the difference between VARCHAR and CHAR in MySQL?
I am trying to store MD5 hashes.
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR
Used to store character string value of fixed length.
The maximum no. of characters the data type can hold is 255 characters.
It's 50% faster than VARCHAR.
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold is up to
Pre-MySQL 5.0.3: 255 characters.
Post-MySQL 5.0.3: 65,535 characters shared for the row.
It's slower than CHAR.
Uses dynamic memory allocation.
CHAR Vs VARCHAR
CHAR is used for Fixed Length Size Variable
VARCHAR is used for Variable Length Size Variable.
E.g.
Create table temp
(City CHAR(10),
Street VARCHAR(10));
Insert into temp
values('Pune','Oxford');
select length(city), length(street) from temp;
Output will be
length(City) Length(street)
10 6
Conclusion: To use storage space efficiently must use VARCHAR Instead CHAR if variable length is variable
A CHAR(x) column can only have exactly x characters.
A VARCHAR(x) column can have up to x characters.
Since your MD5 hashes will always be the same size, you should probably use a CHAR.
However, you shouldn't be using MD5 in the first place; it has known weaknesses.
Use SHA2 instead.
If you're hashing passwords, you should use bcrypt.
What's the difference between VARCHAR and CHAR in MySQL?
To already given answers I would like to add that in OLTP systems or in systems with frequent updates consider using CHAR even for variable size columns because of possible VARCHAR column fragmentation during updates.
I am trying to store MD5 hashes.
MD5 hash is not the best choice if security really matters. However, if you will use any hash function, consider BINARY type for it instead (e.g. MD5 will produce 16-byte hash, so BINARY(16) would be enough instead of CHAR(32) for 32 characters representing hex digits. This would save more space and be performance effective.
Varchar cuts off trailing spaces if the entered characters is shorter than the declared length, while char will not. Char will pad spaces and will always be the length of the declared length. In terms of efficiency, varchar is more adept as it trims characters to allow more adjustment. However, if you know the exact length of char, char will execute with a bit more speed.
CHAR is fixed length and VARCHAR is variable length. CHAR always uses the same amount of storage space per entry, while VARCHAR only uses the amount necessary to store the actual text.
CHAR is a fixed length field; VARCHAR is a variable length field. If you are storing strings with a wildly variable length such as names, then use a VARCHAR, if the length is always the same, then use a CHAR because it is slightly more size-efficient, and also slightly faster.
In most RDBMSs today, they are synonyms. However for those systems that still have a distinction, a CHAR field is stored as a fixed-width column. If you define it as CHAR(10), then 10 characters are written to the table, where "padding" (typically spaces) is used to fill in any space that the data does not use up. For example, saving "bob" would be saved as ("bob"+7 spaces). A VARCHAR (variable character) column is meant to store data without wasting the extra space that a CHAR column does.
As always, Wikipedia speaks louder.
CHAR
CHAR is a fixed length string data type, so any remaining space in the field is padded with blanks.
CHAR takes up 1 byte per character. So, a CHAR(100) field (or variable) takes up 100 bytes on disk, regardless of the string it holds.
VARCHAR
VARCHAR is a variable length string data type, so it holds only the characters you assign to it.
VARCHAR takes up 1 byte per character, + 2 bytes to hold length information (For example, if you set a VARCHAR(100) data type = ‘Dhanika’, then it would take up 7 bytes (for D, H, A, N, I, K and A) plus 2 bytes, or 9 bytes in all.)
CHAR
Uses specific allocation of memory
Time efficient
VARCHAR
Uses dynamic allocation of memory
Memory efficient
The char is a fixed-length character data type, the varchar is a variable-length character data type.
Because char is a fixed-length data type, the storage size of the char value is equal to the maximum size for this column. Because varchar is a variable-length data type, the storage size of the varchar value is the actual length of the data entered, not the maximum size for this column.
You can use char when the data entries in a column are expected to be the same size.
You can use varchar when the data entries in a column are expected to vary considerably in size.
Distinguishing between the two is also good for an integrity aspect.
If you expect to store things that have a rule about their length such as yes or no then you can use char(1) to store Y or N. Also useful for things like currency codes, you can use char(3) to store things like USD, EUR or AUD.
Then varchar is better for things were there is no general rule about their length except for the limit. It's good for things like names or descriptions where there is a lot of variation of how long the values will be.
Then the text data type comes along and puts a spanner in the works (although it's generally just varchar with no defined upper limit).
according to High Performance MySQL book:
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than
fixed-length types, because it uses only as much space as it needs
(i.e., less space is used to store shorter values). The exception is a
MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount
of space on disk for each row and can thus waste space. VARCHAR helps
performance because it saves space.
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL
removes any trailing spaces. (This was also true of VARCHAR in MySQL
4.1 and older versions—CHAR and VAR CHAR were logically identical and differed only in storage format.) Values are padded with spaces as
needed for comparisons.
Char has a fixed length (supports 2000 characters), it is stand for character is a data type
Varchar has a variable length (supports 4000 characters)
Char or varchar- it is used to enter texual data where the length can be indicated in brackets
Eg- name char (20)
CHAR :
Supports both Character & Numbers.
Supports 2000 characters.
Fixed Length.
VARCHAR :
Supports both Character & Numbers.
Supports 4000 characters.
Variable Length.
any comments......!!!!

SQL When to use Which Data Type

Hi I was wondering when I should use the different data types. As in in my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n..char data types are for Unicode data, so if you're going to need to use unicode character sets in your data you should use those types as opposed to their "non-n" analogs. the nchar and char type are fixed length, and the nvarchar and varchar type can have a variable length, which will effect the size of the column on the disk and in memory. Generally I would say to use the type that uses the least disk space but fits for your needs.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can be used for unicode characters... eg nVarchar.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type i use is varchar....
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up two times the space their "normal" pendants (VARCHAR, CHAR, TEXT) need, but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they perform not as good as the CHAR data types because they are stored outside of the record.
THE VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR pendants will (a CHAR(20) is always twenty characters long, even if if contains 5 letters only. The remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for an URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.