SQL data types for strings - sql

I have a data structure along the lines of this:
class House
{
int id;
string street;
string city;
string review;
string status;
}
street and city are regular strings and should always be less than 32 characters. I take it they should have an SQL data type of nchar(32)? Or should they be varchar?
review is an optional string that users may input. And if they do give it, it can vary hugely, from 4-5 words up to 2000+ characters. So what data type should I use to store this?
status is a flag that can have values of 'New', 'Old', 'UnStarted'. Whatever value it has will be displayed in a datagrid in a desktop application. Should I be storing this field as a string, or a byte with different bits acting as flags?

I would suggest using VARCHAR(32) for address and city if you are designing a United States table. If you are designing an international one, make both fields larger and switch to NVARCHAR(72) for example. NVARCHAR storage takes up more space, but allows non-ASCII characters to be stored....
CHAR(32) reserves 32 bytes of data, regardless of whether the field holds 1 character or 32 characters. In addition, some client programming languages will not trim the extra spaces automatically (which is proper, but might not be expected). NCHAR(32) reserves 64 bytes, since every character is represented in 2 bytes.
For the review, I agree with lanks: a TEXT or VARCHAR(max) (MS SQL specific) column would be best. Can the review be longer than 2000 characters? If 2000 is the absolute limit, then I would go with VARCHAR(2000). I would only go with a TEXT field if you can have any length review. Keep in mind, if a user enters a review of more than 2000 characters, the database will raise an error if you try to insert it, so your application needs to either restrict the number of characters or handle the error when it occurs.
Status should be the smallest integer your database allows. You can create a second table to provide text descriptions for the codes.
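A minimal sketch of the lookup-table idea, using Python's built-in sqlite3 module only so that it runs anywhere. The table names house_status and house, the column status_id and the sample row are assumptions made for illustration; on another database you would pick its smallest integer type (for example TINYINT) for the code column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: one row per status code (hypothetical name).
    CREATE TABLE house_status (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    );
    -- Main table stores only the small integer code.
    CREATE TABLE house (
        id        INTEGER PRIMARY KEY,
        street    TEXT NOT NULL,
        city      TEXT NOT NULL,
        review    TEXT,
        status_id INTEGER NOT NULL REFERENCES house_status(id)
    );
    INSERT INTO house_status (id, label)
    VALUES (1, 'New'), (2, 'Old'), (3, 'UnStarted');
    INSERT INTO house (id, street, city, review, status_id)
    VALUES (1, '12 Elm St', 'Springfield', NULL, 3);
""")

# Join back to the lookup table to get the text shown in the datagrid.
rows = conn.execute("""
    SELECT h.street, s.label
    FROM house AS h
    JOIN house_status AS s ON s.id = h.status_id
""").fetchall()
print(rows)
```

The application only ever displays the label, so renaming a status later means updating one row in the lookup table.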

Related

"out of range" error inserting mobile number from servlet to SQL

I am trying to insert a mobile number through a prepared statement, but I am getting an error stating that the type is out of range, even for the Double type.
pstmt.setDouble(6, 9677627718);
How to insert mobile number from servlet to sql?
It says that out of range even for double data.
"Telephone numbers" are not really numbers in the mathematical sense. They are just addresses (similar to email addresses) that happen to be composed of numeric digits. They do not need to be represented as numbers because typical numeric manipulations are not relevant. For example, it makes no sense to subtract one phone number from another.
Therefore you should store telephone numbers in your tables as strings, not as numbers. In particular, a common int (signed, 32-bit) could store the phone number 202-555-1212 (2025551212 with the hyphens removed) but could not store 613-555-1212 because 6135551212 is too large to fit into an int column. (The largest "number" it could hold would be 214-748-3647.)
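The arithmetic behind that limit can be checked with a short sketch. It is written in Python rather than Java purely for illustration, and the helper names fits_in_int32 and as_phone are made up for this example.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_in_int32(digits: str) -> bool:
    """Return True if the digit string, read as a number, fits a signed 32-bit int."""
    return INT32_MIN <= int(digits) <= INT32_MAX


def as_phone(digits: str) -> str:
    """Format a 10-digit string as NNN-NNN-NNNN (hypothetical helper)."""
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for number in ("2025551212", "6135551212", "9677627718"):
    print(as_phone(number), fits_in_int32(number))

# The largest "phone number" a signed 32-bit int could hold:
print(as_phone(str(INT32_MAX)))
```

Stored as a string, every one of these values is just ten characters, and leading zeros or a "+" prefix survive as well.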

What is the major difference between Varchar2 and char

Creating Table:
CREATE TABLE test (
charcol CHAR(10),
varcharcol VARCHAR2(10));
SELECT LENGTH(charcol), LENGTH(varcharcol) FROM test;
Result:
LENGTH(CHARCOL) LENGTH(VARCHARCOL)
--------------- ------------------
10 1
Please Let me know what is the difference between Varchar2 and char?
At what times we use both?
Although there are already several answers correctly describing the behaviour of char, I think it needs to be said that you should not use it except in three specific situations:
You are building a fixed-length file or report, and assigning a non-null value to a char avoids the need to code an rpad() expression. For example, if firstname and lastname are both defined as char(20), then firstname||lastname is a shorter way of writing rpad(firstname,20)||rpad(lastname,20) to create
Chuck               Norris
You need to distinguish between the explicit empty string '' and null. Normally they are the same thing in Oracle, but assigning '' to a char value will trigger its blank-padding behaviour while null will not, so if it's important to tell the difference, and I can't really think of a reason why it would be, then you have a way to do that.
Your code is ported from (or needs to be compatible with) some other system that requires blank-padding for legacy reasons. In that case you are stuck with it and you have my sympathy.
There is really no reason to use char just because some length is fixed (e.g. a Y/N flag or an ISO currency code such as 'USD'). It's not more efficient, it doesn't save space (there's no mythical length indicator for a varchar2, there's just a blank padding overhead for char), and it doesn't stop anyone entering shorter values. (If you enter 'ZZ' in your char(3) currency column, it will just get stored as 'ZZ '.) It's not even backward-compatible with some ancient version of Oracle that once relied on it, because there never was one.
And the contagion can spread, as (following best practice) you might anchor a variable declaration using something like sales.currency%type. Now your l_sale_currency variable is a stealth char which will get invisibly blank-padded for shorter values (or ''), opening the door to obscure bugs where l_sale_currency does not equal l_refund_currency even though you assigned 'ZZ' to both of them.
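A rough emulation of that trap, in Python rather than PL/SQL. It assumes one variable ended up anchored to a CHAR(3) column and the other is a plain VARCHAR2(3); the helpers as_char and as_varchar2 are invented for this sketch and only mimic the padding, not Oracle's full comparison rules.

```python
def as_char(value: str, n: int) -> str:
    """Emulate CHAR(n): reject over-long values, blank-pad shorter ones to n."""
    if len(value) > n:
        raise ValueError("value too long for CHAR(%d)" % n)
    return value.ljust(n)


def as_varchar2(value: str, n: int) -> str:
    """Emulate VARCHAR2(n): reject over-long values, store the rest unchanged."""
    if len(value) > n:
        raise ValueError("value too long for VARCHAR2(%d)" % n)
    return value


l_sale_currency = as_char("ZZ", 3)        # silently becomes 'ZZ '
l_refund_currency = as_varchar2("ZZ", 3)  # stays 'ZZ'

print(repr(l_sale_currency), repr(l_refund_currency))
print(l_sale_currency == l_refund_currency)
```

Both variables were assigned the same two letters, yet the padded one carries a trailing blank that the other does not.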
Some argue that char(n) (where n is some character length) indicates that values are expected to be n characters long, and this is a form of self-documentation. But surely if you are serious about a 3-character format (ISO-Alpha-3 country codes rather than ISO-Alpha-2, for example), wouldn't you define a constraint to enforce the rule, rather than letting developers glance at a char(3) datatype and draw their own conclusions?
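As a sketch of what such a constraint could look like, here is a CHECK on the value's length. It uses Python's sqlite3 module so it can be run as-is; the sales table and its columns are assumptions for the example, and in Oracle the equivalent would be a CHECK constraint on a varchar2(3) column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        id       INTEGER PRIMARY KEY,
        -- Enforce the 3-character rule explicitly instead of implying it.
        currency TEXT NOT NULL CHECK (length(currency) = 3)
    )
""")

conn.execute("INSERT INTO sales (currency) VALUES ('USD')")  # accepted

try:
    conn.execute("INSERT INTO sales (currency) VALUES ('ZZ')")  # too short
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("short value rejected:", rejected)
```

Unlike a bare char(3), the constraint actually refuses the two-character value instead of padding it.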
CHAR was introduced in Oracle 6 for, I'm sure, ANSI compatibility reasons. Probably there are potential customers deciding which database product to purchase and ANSI compatibility is on their checklist (or used to be back then), and CHAR with blank-padding is defined in the ANSI standard, so Oracle needs to provide it. You are not supposed to actually use it.
Simple example to show the difference:
SELECT
'"'||CAST('abc' AS VARCHAR2(10))||'"',
'"'||CAST('abc' AS CHAR(10))||'"'
FROM dual;
'"'||CAST('ABC'ASVARCHAR2(10))||'"' '"'||CAST('ABC'ASCHAR(10))||'"'
----------------------------------- -------------------------------
"abc"                               "abc       "
1 row selected.
CHAR is useful for values whose length is always fixed, e.g. the two-letter postal codes for US states, such as CA, NY, FL, TX.
To avoid confusion caused by the amount of wrong information around, here is some information about the difference, including performance.
Reference: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2668391900346844476
Since a char is nothing more than a VARCHAR2 that is blank padded out
to the maximum length - that is, the difference between the columns X
and Y below:
create table t ( x varchar2(30), y char(30) );
insert into t (x,y) values ( rpad('a',30,' '), 'a' );
IS ABSOLUTELY NOTHING, and given that the difference between columns X
and Y below:
insert into t (x,y) values ('a','a')
is that X consumes 3 bytes (null indicator, leading byte length, 1
byte for 'a') and Y consumes 32 bytes (null indicator, leading byte
length, 30 bytes for 'a ' )
Umm, varchar2 is going to be somewhat "at an advantage performance
wise". It helps us NOT AT ALL that char(30) is always 30 bytes - to
us, it is simply a varchar2 that is blank padded out to the maximum
length. It helps us in processing - ZERO, zilch, zippo.
Anytime you see anyone say "it is up to 50% faster", and that is it -
no example, no science, no facts, no story to back it up - just laugh
out loud at them and keep on moving along.
There are other "made up things" on that page as well, for example:
"Searching is faster in CHAR as all the strings are stored at a
specified position from the each other, the system doesnot have to
search for the end of string. Whereas in VARCHAR the system has to
first find the end of string and then go for searching."
FALSE: a char is just a varchar2 blank padded - we do not store
strings "at a specified position from each other". We do search for
the end of the string - we use a leading byte length to figure things
out.
CHAR
CHAR should be used for storing fixed-length character strings. String values will be space/blank padded before being stored on disk. If this type is used to store variable-length strings, it will waste a lot of disk space.
VARCHAR2
VARCHAR2 is used to store variable length character strings. The string value's length will be stored on disk with the value itself.
And
At what times we use both?
It all depends on your requirements.
CHAR type has fixed size, so if you say it is 10 bytes, then it always stores 10 bytes in the database and it doesn't matter whether you store any text or just empty 10 bytes
VARCHAR2 size depends on how many bytes you are actually going to store in the database. The number you specify is just the maximum number of bytes that can be stored (although 1 byte is minimum)
You should use CHAR when dealing with fixed-length strings (you know in advance the exact length of the strings you will be storing) - the database can then work with them better and faster since it knows the exact length
You should use VARCHAR2 when you don't know the exact length of the stored strings.
Situation you would use both may be:
name VARCHAR2(255),
zip_code CHAR(5) --if your users have only 5 place zip codes
When stored in a database, varchar2 uses only the allocated space. E.g. if you have a varchar2(1999) and put 50 bytes in the table, it will use 52 bytes.
But when stored in a database, char always uses the maximum length and is blank-padded. E.g. if you have char(1999) and put 50 bytes in the table, it will consume 2000 bytes.
CHAR is used for storing fixed-length character strings. It will waste a lot of disk space if this type is used to store variable-length strings.
VARCHAR2 is used to store variable length character strings.
At what times we use both?
This may vary and depend on your requirement.
EDIT:-
Let's understand this with an example. Suppose you have a student name column with size 10, sname CHAR(10). If the value 'RAMA' is inserted, 6 blank spaces are added to the right of the value. If this were a VARCHAR2 column, sname VARCHAR2(10), the value would take only 4 of the 10 possible characters and leave the remaining 6 free for other use.

How to choose SQL DDL type parameters?

All the SQL dialects I've seen either allow or require an integer argument for some of the data types they support when defining a table. But I haven't managed to find any comprehensive information (at least for MySQL and SQLite) about what exactly these numbers mean and how to choose them adequately...
If you mean the notation like INT(11), VARCHAR(255), then normally it's the length of the values stored (or retrieved) from the table.
http://dev.mysql.com/doc/refman/5.5/en/numeric-type-overview.html
http://dev.mysql.com/doc/refman/5.5/en/string-type-overview.html
There's an "M" in parentheses after name of every type which supports it.
Also, keep in mind that these numbers mean different things for different types. For example, if you define a column as VARCHAR(100), MySQL accepts at most 100 characters. It stores only the bytes a value actually uses, plus a length prefix of one byte (if values can need at most 255 bytes) or two bytes (if they can need more). Longer input is either rejected or truncated to 100 characters, depending on the SQL mode.
INT(5) is different. The column still stores values up to 2147483647 (max value for signed integers). The 5 is only a display width, which matters mainly together with ZEROFILL, and it does not trim the input or output to 5 digits.

How much real storage is used with a varchar(100) declaration in mysql?

If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server? Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
Whatever the answer is, is it consistent across different database implementations?
If I have a table with a field which
is declared as accepting varchar(100)
and then I actually insert the word
"hello" how much real storage will be
used on the mysql server?
MySQL will store 5 bytes plus one byte for the length. If the column can hold values longer than 255 bytes, it will store 2 bytes for the length.
Note that this is dependent on the charset of the column. If the charset is utf8, MySQL will require up to 3 bytes per character. Some storage engines (e.g. MEMORY) will always require the maximum byte length per character for the character set.
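The rule can be turned into a back-of-the-envelope estimate. This is only a sketch of the length-prefix arithmetic described here, not an exact model of any storage engine; the function name and its parameters are made up for the example, and the encoding names are Python codec names standing in for the MySQL charsets.

```python
def varchar_storage_bytes(value: str, max_chars: int,
                          encoding: str, max_bytes_per_char: int) -> int:
    """Estimate bytes used by one VARCHAR(max_chars) value: length prefix + data."""
    # The prefix size depends on how many bytes the column *could* need.
    max_bytes = max_chars * max_bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return prefix + len(value.encode(encoding))


# varchar(100) in a single-byte charset: 100 bytes at most, so a 1-byte prefix.
print(varchar_storage_bytes("hello", 100, "latin-1", 1))

# varchar(100) in 3-byte utf8: up to 300 bytes, so a 2-byte prefix.
print(varchar_storage_bytes("hello", 100, "utf-8", 3))
```

So "hello" costs 6 bytes in the single-byte case and 7 in the utf8 case, even though the data itself is 5 bytes in both.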
Also will an insert of NULL result in
no storage being used even though
varchar(100) is declared?
Making a column nullable means that mysql will have to set aside an extra byte per up to 8 nullable columns per row. This is called the "null mask".
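The "one byte per up to 8 nullable columns" rule is just a ceiling division. A tiny sketch, with a made-up function name, assuming the null mask works exactly as described above:

```python
def null_mask_bytes(nullable_columns: int) -> int:
    """Bytes set aside per row for the null mask: one bit per nullable column."""
    return (nullable_columns + 7) // 8  # round up to whole bytes


for n in (0, 1, 8, 9, 16, 17):
    print(n, "nullable columns ->", null_mask_bytes(n), "byte(s)")
```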
Whatever the answer is, is it consistent across different database implementations?
It's not even consistent between storage engines within mysql!
It really depends on your table's charset.
In contrast to CHAR, VARCHAR values
are stored as a one-byte or two-byte
length prefix plus data. The length
prefix indicates the number of bytes
in the value. A column uses one length
byte if values require no more than
255 bytes, two length bytes if values
may require more than 255 bytes.
- source
UTF-8 often takes more space than an
encoding made for one or a few
languages. Latin letters with
diacritics and characters from other
alphabetic scripts typically take one
byte per character in the appropriate
multi-byte encoding but take two in
UTF-8. East Asian scripts generally
have two bytes per character in their
multi-byte encodings yet take three
bytes per character in UTF-8.
- source
varchar only stores what is used whereas char stores a set number of bytes.
UTF-16 sometimes takes less space than UTF-8 for some languages, I don't know which ones.
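The byte counts quoted above are easy to check for individual characters. A small sketch using Python's built-in codecs; the sample characters and the choice of Latin-1 and Shift JIS as the "appropriate" legacy encodings are assumptions for illustration.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


samples = [
    ("a", ["latin-1", "utf-8", "utf-16-le"]),     # plain ASCII letter
    ("é", ["latin-1", "utf-8", "utf-16-le"]),     # Latin letter with diacritic
    ("日", ["shift_jis", "utf-8", "utf-16-le"]),  # East Asian character
]

for char, encodings in samples:
    sizes = {enc: encoded_size(char, enc) for enc in encodings}
    print(char, sizes)
```

For the East Asian sample, UTF-16 needs two bytes where UTF-8 needs three, which is the kind of case where UTF-16 comes out smaller.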
Guys, is there an option to use COMPRESSed tables in MySql? Like in Apache. Thanks a lot

SQL When to use Which Data Type

Hi, I was wondering when I should use the different data types. As in, for my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n..char data types are for Unicode data, so if you're going to need to use Unicode character sets in your data you should use those types as opposed to their "non-n" analogs. The nchar and char types are fixed length, and the nvarchar and varchar types can have a variable length, which will affect the size of the column on disk and in memory. Generally I would say to use the type that uses the least disk space but fits your needs.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can be used for unicode characters... eg nVarchar.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type I use is varchar....
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up twice the space of their "normal" counterparts (VARCHAR, CHAR, TEXT), but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they do not perform as well as the CHAR data types because they are stored outside of the record.
The VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR counterparts will (a CHAR(20) is always twenty characters long, even if it contains only 5 letters; the remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for a URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.