I have an SQL table linked in MS Access which contains a number of short text fields that are limited to 255 characters. This table is updated from an Access form.
I was informed that when data is extracted from the table, one of the fields is excessively long relative to its contents.
I investigated and ran this query:
SELECT [dbo_NCR User Input].ImpactGrade, Len([impactgrade]) AS length
FROM [dbo_NCR User Input];
...which showed that, regardless of the contents, the field length was always 255 characters.
Has anyone experienced this issue, and if so, how can it be resolved so that the additional characters are removed from the field?
If the data type of the field in the SQL Server table is CHAR(255) as opposed to VARCHAR(255), then 255 bytes are always allocated for the field value, regardless of the true length of the content.
Conversely, VARCHAR(255) only allocates the bytes required to store the content of the field (+2 bytes), up to the given maximum.
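If that is the case here, a hedged sketch of a fix (the table and column names are taken from the question; dbo_NCR User Input is how Access renders dbo.[NCR User Input]):
-- Confirm the column's current type on the SQL Server side:
SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'NCR User Input' AND COLUMN_NAME = 'ImpactGrade';
-- If it is CHAR(255), convert it, then trim the padding already stored
-- (converting the type does not remove trailing blanks from existing rows):
ALTER TABLE dbo.[NCR User Input] ALTER COLUMN ImpactGrade VARCHAR(255);
UPDATE dbo.[NCR User Input] SET ImpactGrade = RTRIM(ImpactGrade);
Alternatively, leave the column as CHAR and strip the trailing blanks at query time with RTRIM(ImpactGrade).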
I am going to create:
- a table for storing IDs and unique text values (which are expected to be large)
- a stored procedure that takes a text value as an input parameter (it will check whether the value exists in the above table and return the corresponding ID if it does, or insert a new record and return the new ID if it does not)
I want to optimize the search of text values by using a hash of the text and creating an index on it, so during the search I expect the non-clustered index to be used (not the clustered index).
I decided to use HASHBYTES with SHA2_256, and I am wondering whether there are any differences/benefits between storing the hash value as BINARY(32) or as NVARCHAR(16).
You can't reasonably store a hash value as characters, because binary data is not text. Various text-processing and comparison functions interpret those characters; for example, trailing whitespace is sometimes ignored, leading to incorrect results.
Since you've got 32 totally random, unstructured bytes to store, BINARY(32) is the most natural format, and it is also the fastest one.
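A minimal sketch of the table and procedure, assuming SQL Server (names are illustrative; note that before SQL Server 2016, HASHBYTES input is limited to 8000 bytes):
CREATE TABLE dbo.TextValues
(
    Id        INT IDENTITY(1,1) PRIMARY KEY,  -- clustered by default
    TextValue NVARCHAR(MAX) NOT NULL,
    TextHash  BINARY(32) NOT NULL             -- SHA2_256 output is 32 bytes
);
CREATE NONCLUSTERED INDEX IX_TextValues_TextHash ON dbo.TextValues (TextHash);
GO
CREATE PROCEDURE dbo.GetOrAddTextValue
    @TextValue NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Hash BINARY(32) = HASHBYTES('SHA2_256', @TextValue);
    DECLARE @Id INT;

    -- The hash narrows the search via the nonclustered index;
    -- comparing TextValue as well guards against hash collisions.
    SELECT @Id = Id
    FROM dbo.TextValues
    WHERE TextHash = @Hash
      AND TextValue = @TextValue;

    IF @Id IS NULL
    BEGIN
        INSERT INTO dbo.TextValues (TextValue, TextHash) VALUES (@TextValue, @Hash);
        SET @Id = SCOPE_IDENTITY();
    END;

    SELECT @Id AS Id;
END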
What is the size of nullable data types in Microsoft SQL Server DBMS?
For example, a non-nullable int takes 4 bytes; how much space will be dedicated to a nullable column?
Subquestions: nullable int, char(N), nvarchar(N) - I assume they might be stored differently.
What I've read:
Where to find the size of SQL Server data types - a good way to get the list of SQL types and their sizes for my version of SQL Server, but it doesn't say a word about nullable types.
http://msdn.microsoft.com/en-us/library/ms189124.aspx - there is a formula for calculating the space required by variable-size columns: "Variable_Data_Size = 2 + (Num_Variable_Cols x 2) + Max_Var_Size". It's very strange: why does it contain a x 2 multiplier (nothing is said about nvarchar, and judging from the explanation the formula is for all variable-sized types)? It must be a typo that Max_Var_Size is added rather than multiplied. And finally, it includes 2 bytes for storing the length of the value, but again nothing for storing NULL values. As I understand it, the 3 remaining bits of the 2-byte value length could be used to store a NULL flag, but is it really stored that way?
How much size "Null" value takes in SQL Server - to me, the top answers are confusing. @Mark Byers said "If the field is fixed width storing NULL takes the same space as any other value - the width of the field", but it is not possible to store the standard integer value range plus an additional NULL value in the same number of bits. Then "If the field is variable width the NULL value takes up no space" - again, storing a NULL can't take no space at all; it has to store some marker for the NULL value. There is similar confusion in the other answers there: some say it takes 2 additional bytes, others that it takes only 1 byte.
http://home.clara.net/drdsl/MSSQL/DataTypes.html - a nice table of type sizes, but again nothing dedicated to NULL values.
Nullable columns and non-nullable columns occupy exactly the same storage on a data page. Part of each data page is the null-bit-map, which has a bit for every column in the table, even non-nullable ones.
It is a common misconception that the null-bit-map section of the data page only stores bits for nullable columns. This is not true. The null-bit-map section contains nullable flags for all columns in the table. Here is a good reference explaining this myth. Here is another.
I have wondered why SQL Server (and previously Sybase) uses this structure. One possibility is that it makes changing the nullability of a column a "fastish" operation: although the bit must change on all the pages, there is no danger of page splits when introducing a new nullable field.
Another possibility is that it decouples, a bit, the on-page layout from the table metadata. Although the page does not know column names, it does know everything about the columns based on their column indexes.
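A minimal sketch to observe this (illustrative table names, SQL Server): both heaps should report the same record size, because the null bitmap is present either way.
CREATE TABLE dbo.T_NotNull (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE dbo.T_Null    (a INT NULL,     b INT NULL);
INSERT INTO dbo.T_NotNull VALUES (1, 2);
INSERT INTO dbo.T_Null    VALUES (1, 2);
-- Row header + two ints + null bitmap: the same size for both tables
SELECT OBJECT_NAME(s.object_id) AS table_name, s.max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.T_NotNull'), NULL, NULL, 'DETAILED') AS s
UNION ALL
SELECT OBJECT_NAME(s.object_id), s.max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.T_Null'), NULL, NULL, 'DETAILED') AS s;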
As per Microsoft Support
NULL value in SQL Server 2008:
-- SPARSE needs 6 additional bytes in the row size + 4 bytes per non-NULL column + the bytes of the values (the type doesn't matter)
-- A regular NULL value needs 1 bit in the NULL bitmap
Empty string in SQL Server 2008:
-- Fixed-length data needs the full space of the data even if an empty string is entered
-- Variable-length data needs 2 bytes of overhead for storing the data
-- There is no empty string for numeric values
-- A NULL value needs 1 bit in the NULL bitmap
reference: http://social.msdn.microsoft.com/Forums/sqlserver/en-US/0404de89-70dc-4026-9e2e-13ddc3e7c058/null-data-storage-sql-server-2008?forum=sqldatabaseengine
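A few of these points can be spot-checked with DATALENGTH (SQL Server):
SELECT DATALENGTH(CAST('' AS CHAR(10)))      AS fixed_empty,    -- 10: full width even when empty
       DATALENGTH(CAST('' AS VARCHAR(10)))   AS variable_empty, -- 0 bytes of data (plus row overhead)
       DATALENGTH(CAST(NULL AS VARCHAR(10))) AS null_value;     -- NULL: only the bitmap bit is used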
I have a data structure something along the lines of this:
class House
{
int id;
string street;
string city;
string review;
string status;
}
street and city are regular strings and should always be less than 32 characters. I take it they should have an SQL data type of nchar(32)? Or should they be varchar?
review is an optional string that users may input. If they do provide one, it can vary hugely, from 4-5 words up to 2000+ characters. So what data type should I use to store this?
status is a flag that can have values of 'New', 'Old', 'UnStarted'. Whatever value it has will be displayed in a datagrid in a desktop application. Should I be storing this field as a string, or a byte with different bits acting as flags?
I would suggest using VARCHAR(32) for street and city if you are designing a United States table. If you are designing an international one, make both fields larger and switch to NVARCHAR(72), for example. NVARCHAR storage takes up more space but allows non-ASCII characters to be stored.
CHAR(32) reserves 32 bytes of data, regardless of whether the field holds 1 character or 32 characters. In addition, some client programming languages will not trim the extra spaces automatically (which is proper, but might not be expected). NCHAR(32) reserves 64 bytes, since every character is represented in 2 bytes.
For the review, I agree with lanks: a TEXT or VARCHAR(max) (MS SQL specific) would be best. Can the review be longer than 2000 characters? If 2000 is the absolute limit, then I would go with VARCHAR(2000); I would only go with a TEXT field if a review can have any length. Keep in mind that if a user enters a review of more than 2000 characters, the database will raise an error when you try to insert it, so your application needs to either restrict the number of characters or handle the error when it occurs.
Status should be the smallest integer your database allows. You can create a second table to provide text descriptions for the codes.
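A minimal sketch of that design (illustrative names, MS SQL syntax), assuming the 2000-character review limit discussed above:
CREATE TABLE StatusCode
(
    StatusId    TINYINT     NOT NULL PRIMARY KEY, -- smallest integer type
    Description VARCHAR(16) NOT NULL              -- 'New', 'Old', 'UnStarted'
);
CREATE TABLE House
(
    Id       INT           NOT NULL PRIMARY KEY,
    Street   VARCHAR(32)   NOT NULL,
    City     VARCHAR(32)   NOT NULL,
    Review   VARCHAR(2000) NULL,                  -- optional free text
    StatusId TINYINT       NOT NULL REFERENCES StatusCode (StatusId)
);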
If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server? Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
Whatever the answer is, is it consistent across different database implementations?
If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server?
MySQL will store 5 bytes, plus one byte for the length. If the varchar's maximum is greater than 255, then it will store 2 bytes for the length.
Note that this is dependent on the charset of the column. If the charset is utf8, MySQL will require up to 3 bytes per character. Some storage engines (e.g. MEMORY) always require the maximum byte length per character for the character set.
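You can see the charset dependence directly: LENGTH counts bytes while CHAR_LENGTH counts characters (MySQL):
SELECT LENGTH(CONVERT('héllo' USING latin1))  AS latin1_bytes, -- 5
       LENGTH(CONVERT('héllo' USING utf8mb4)) AS utf8_bytes,   -- 6: é takes two bytes
       CHAR_LENGTH('héllo')                   AS characters;   -- 5 either way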
Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
Making a column nullable means that MySQL has to set aside an extra byte per row for every 8 nullable columns. This is called the "null mask".
Whatever the answer is, is it consistent across different database implementations?
It's not even consistent between storage engines within mysql!
It really depends on your table's charset.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
- source
UTF-8 often takes more space than an encoding made for one or a few languages. Latin letters with diacritics and characters from other alphabetic scripts typically take one byte per character in the appropriate multi-byte encoding but take two in UTF-8. East Asian scripts generally have two bytes per character in their multi-byte encodings yet take three bytes per character in UTF-8.
- source
varchar only stores what is used whereas char stores a set number of bytes.
UTF-16 sometimes takes less space than UTF-8; East Asian scripts, for example, take two bytes per character in UTF-16 but three in UTF-8.
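A quick illustration in MySQL (the character is arbitrary; any East Asian character behaves the same):
SELECT LENGTH(CONVERT('漢' USING utf8mb4)) AS utf8_bytes,  -- 3
       LENGTH(CONVERT('漢' USING utf16))   AS utf16_bytes; -- 2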
Is there an option to use COMPRESSED tables in MySQL, like in Apache? Thanks a lot.
I have 2 columns containing text: one will be at most 150 characters long and the other at most 700 characters long.
My question is: should I use varchar for both, or should I use text for the 700-character column? Why?
Thanks,
The varchar data type in MySQL < 5.0.3 cannot hold data longer than 255 characters, while in MySQL >= 5.0.3 it has a maximum of 65,535 characters.
So it depends on the platform you're targeting and your deployability requirements. If you want to be sure that it will work on MySQL versions older than 5.0.3, go with a text-type data field for your longer column.
http://dev.mysql.com/doc/refman/5.0/en/char.html
An important consideration is that with varchar the database stores the data directly in the table, and with text the database stores a pointer to a separate tablespace in which the data is stored. So, unless you run into the limit of a row length (64K in MySQL, 4-32K in DB2, 8K in SQL Server 2000) I would normally use varchar.
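A minimal sketch along those lines, assuming MySQL >= 5.0.3 (table and column names are illustrative): both columns fit comfortably in VARCHAR and stay in the row.
CREATE TABLE Posts
(
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(150) NOT NULL,  -- stored inline in the row
    body  VARCHAR(700) NOT NULL   -- still inline; TEXT might be stored off-row
);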
Not sure about MySQL specifically, but in MS SQL you should definitely use a VARCHAR for anything under 8000 characters long if you want to be able to run any sort of comparison on the value in the field. For example, this comparison would be possible with a VARCHAR:
select your_column from your_table
where your_column = 'dogs'
but not with a TEXT field, since ordinary comparison operators are not allowed on TEXT (LIKE is the exception).
More information regarding TEXT field in mysql 5.4 can be found here and more information about the VARCHAR field can be found here.
I'd pick the option based on the type of data being entered. For example, if it's a (potentially) super-long username and a biography, I'd use varchar and text respectively. Sure, you're limiting the bio to 700 chars, but what if you add HTML formatting down the road, keeping the 700-character limit but allowing HTML tags for formatting?
Twitter may use text for their tweets, since it could be quicker to add metadata to the posts (e.g. URL hrefs and #user PKs) and cache the additional data instead of calculating it on every rendering.