HSQLDB - How is the "VARCHAR" type storage handled? - hsqldb

My question must be quite stupid for some of you, but I've been unable to find the answer on the HSQLDB website, Google, or here (I may have missed something, but I don't think so, as there isn't much on the web about HSQLDB compared to other well-known databases).
To explain in more detail, my background is mostly Oracle DB. I'm starting with HSQLDB, where we can't use type declarations such as:
"mycolumn VARCHAR(25 CHAR)"
"mycolumn VARCHAR(25 BYTE)"
So I wondered how storage is managed in HSQLDB, given that I have to use "mycolumn VARCHAR(25)" instead of the declarations above. I would be glad if anyone has a good description or a link explaining how characters are stored, so that I can avoid storage issues with special characters, for example.
Thanks in advance !
Antoine

HSQLDB uses the Unicode character set with UTF-16 encoding. Therefore all possible characters can be stored in a CHAR, VARCHAR or CLOB column. The declaration size of a VARCHAR column refers to the maximum number of UTF-16 characters allowed.
The physical storage of VARCHAR data on disk is similar to UTF-8: it takes one byte for each Latin character but more than one for other characters. The user does not see this encoding, and its only significance is the amount of disk space used for long VARCHAR data.
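To make the difference between the declared length and the disk usage concrete, here is a minimal Python sketch. It assumes plain UTF-8 as a stand-in for HSQLDB's on-disk format (the answer only says the format is "similar to UTF-8"), and the helper names are hypothetical, not part of HSQLDB.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units, the unit the answer says VARCHAR(n) is measured in."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Count UTF-8 bytes, used here as an approximation of the on-disk size."""
    return len(text.encode("utf-8"))


# Print both counts for a few sample strings.
for sample in ("hello", "héllo", "日本語", "😀"):
    print(sample, utf16_units(sample), utf8_bytes(sample))
```

Under these assumptions, a character outside the Basic Multilingual Plane (such as an emoji) counts as two units toward the declared length, while accented and non-Latin characters mostly affect the byte count.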

Related

SSIS error "UTF8" has no equivalent in encoding "WIN1252"

I'm using an SSIS package to extract data from a Postgres database, but I'm getting the following error on one of the tables:
Character with byte sequence 0xef 0xbf 0xbd in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have no idea how to resolve it. I changed all the columns in the SQL table to NVARCHAR(MAX), but it made no difference. Please suggest a solution.
The full Unicode character set (as encoded in UTF8) contains more than a hundred thousand different characters. WIN1252 contains at most 256. Your data contains characters that cannot be represented in WIN1252.
You either need to export to a more useful character encoding, remove the "awkward" characters from the source database, or do some (lossy) translation with SSIS itself (I believe "character map translation" is what you want to search for).
First, though, I would recommend spending an hour or so reading up on Unicode, its UTF encodings, and its relationship to the ISO and WIN character sets. That way you will understand which of the above to choose.
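As an illustration of why the export fails, the sketch below reproduces the problem outside the database using Python's standard codecs. The byte sequence 0xef 0xbf 0xbd from the error message is the UTF-8 encoding of U+FFFD, the Unicode replacement character. The helper name `to_win1252` is hypothetical, and Python's "cp1252" codec stands in for WIN1252.

```python
raw = bytes([0xEF, 0xBF, 0xBD])  # the byte sequence from the error message
text = raw.decode("utf-8")       # decodes to U+FFFD, the replacement character


def to_win1252(value: str, errors: str = "strict") -> bytes:
    """Encode a string to Windows-1252; errors='replace' gives a lossy export."""
    return value.encode("cp1252", errors=errors)


try:
    to_win1252(text)
except UnicodeEncodeError as exc:
    # Strict conversion fails, much like the database export does.
    print("strict export fails:", exc.reason)

# Lossy translation: the unrepresentable character is replaced by '?'.
print(to_win1252(text, errors="replace"))
```

The lossy variant corresponds to the "translation" option in the answer: the export succeeds, but the original character is gone.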

what is the maximum length of varchar(n) in postgresql 9.2 and which is best to use varchar(n) or text?

Hi, I am using PostgreSQL 9.2 and I want to use varchar(n) to store some long strings, but I don't know the maximum number of characters that varchar(n) supports. Also, which one is better to use? Could you please advise? Thanks.
tl;dr: 1 GB (each character, or really each code point, may be represented by 1 or more bytes depending on where it sits in the Unicode planes, assuming a UTF-8 encoded database). You should always use the text datatype for arbitrary-length character data in PostgreSQL now.
Explanation:
varchar(n) and text use the same backend storage type (varlena): a variable-length byte array with a 32-bit length counter. For indexing behavior, text may even have some performance benefits. It is considered a best practice in Postgres to use the text type for new development; varchar(n) remains for SQL standard support reasons. NB: varchar with no length specifier accepts strings of any length, just like text.
See also:
http://www.postgresql.org/about/
According to the official documentation ( http://www.postgresql.org/docs/9.2/static/datatype-character.html ):
In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
Searching online suggests that the maximum value allowed for n varies with installation and compilation options; some users report a maximum of 10485760 characters (exactly 10 MiB, assuming a fixed 1-byte-per-character encoding).
By "installation and compilation options" I mean that you can build PostgreSQL from source yourself and, before compiling, configure how it stores text to change the maximum amount you can store. If you do this, however, you might run into trouble when using your database files with a "normal", non-customized build of PostgreSQL.
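The arithmetic behind the "10 MiB" remark can be checked with a short Python sketch. The reported limit is taken from the text above; the worst-case factor of 4 bytes per code point is a property of UTF-8, and the constant and function names are hypothetical.

```python
REPORTED_MAX_N = 10_485_760        # maximum n some users report, per the text above
BYTES_PER_MIB = 1024 * 1024
MAX_UTF8_BYTES_PER_CODE_POINT = 4  # UTF-8 never needs more than 4 bytes per code point


def worst_case_utf8_bytes(n_chars: int) -> int:
    """Upper bound on the UTF-8 byte size of a string of n_chars code points."""
    return n_chars * MAX_UTF8_BYTES_PER_CODE_POINT


print(REPORTED_MAX_N / BYTES_PER_MIB)                         # 10.0
print(worst_case_utf8_bytes(REPORTED_MAX_N) / BYTES_PER_MIB)  # 40.0
```

Even in the worst case, a string at the reported limit stays well below the roughly 1 GB ceiling quoted from the documentation.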

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores Unicode character data, which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space: 16 bits per character for nvarchar versus 8 bits per character for varchar.
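The space trade-off described above can be approximated in Python. This is only a sketch: it assumes varchar data is stored in the Windows-1252 code page (which depends on the column's collation) and nvarchar data as UTF-16, and it ignores per-value overhead. The function names are hypothetical.

```python
def varchar_bytes(text: str, code_page: str = "cp1252") -> int:
    """Approximate data bytes for a varchar value in a single-byte code page."""
    return len(text.encode(code_page))


def nvarchar_bytes(text: str) -> int:
    """Approximate data bytes for an nvarchar value (2 bytes per UTF-16 code unit)."""
    return len(text.encode("utf-16-le"))


for name in ("Antoine", "Zoë"):
    print(name, varchar_bytes(name), nvarchar_bytes(name))

# A name with a character outside the code page cannot be stored losslessly.
try:
    varchar_bytes("Łukasz")
except UnicodeEncodeError:
    print("'Łukasz' is not representable in cp1252")
```

The last case is the reason the answer recommends nvarchar for names: the failure depends on the data, not on the application being "international".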
What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is preferred for strings?
If your data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can store Unicode characters in it. There is a 2 GB limit for nvarchar(max). If the field length is more than 4000 characters, an overflow page is used. Smaller fields mean one page can hold more rows, which improves query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios when your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string to one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs to use 1 byte to store the string length. This reduces storage space by about half a percent compared to a string type whose max length is just over 255.

Data Type for storing URLs

What is the best data type to store URLs?
I need to save file system paths for pictures in a database.
URLs are strings and will be of variable lengths.
If your database system supports this, use VARCHAR.
VARCHAR is quite enough.
CHAR should be used for storing fixed-length character strings. String values are padded with spaces before being stored on disk. If this type is used to store variable-length strings, it will waste a lot of disk space.
VARCHAR2(4000) is sufficient for your needs
We tend to save them as urlencoded VARCHARs. (Since our URLs are coming to the database from a server, we encode them using PHP's urlencode and then decode them with urldecode when we retrieve them.) I don't think there's much else that needs to be done; you could probably just store them as unencoded VARCHARs.
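The encode-on-write, decode-on-read round trip described above can be sketched in Python. The answer uses PHP's urlencode and urldecode; `quote_plus` and `unquote_plus` from Python's standard library are used here as close analogues (both encode spaces as "+"), and the sample path is made up for illustration.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical value to store; any path or URL string works the same way.
original = "/images/summer 2012/photo#1.jpg"

stored = quote_plus(original)    # what would be written to the VARCHAR column
restored = unquote_plus(stored)  # what the application sees after reading it back

print(stored)
print(restored == original)
```

Note that the encoded form is longer than the original, so the column length should be sized for the encoded value.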
varchar. Choose a suitable max length based on your domain knowledge.

Best database field type for a URL

I need to store a url in a MySQL table. What's the best practice for defining a field that will hold a URL with an undetermined length?
Lowest common denominator max URL length among popular web browsers: 2,083 (Internet Explorer)
http://dev.mysql.com/doc/refman/5.0/en/char.html
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
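To see how the row-size limit and the character set interact, here is a rough Python calculation. It assumes the column is the only one in the row and reserves 2 bytes for the length prefix; it ignores NULL flags and other per-row overhead, so treat the results as approximate upper bounds. The names are hypothetical.

```python
ROW_SIZE_LIMIT = 65_535  # bytes, from the manual excerpt above
LENGTH_PREFIX = 2        # assumption: 2-byte length prefix for a long VARCHAR


def max_varchar_chars(bytes_per_char: int) -> int:
    """Approximate largest VARCHAR(n) that fits in one row for a given charset width."""
    return (ROW_SIZE_LIMIT - LENGTH_PREFIX) // bytes_per_char


for charset, width in (("latin1", 1), ("utf8 (3-byte)", 3), ("utf8mb4", 4)):
    print(charset, max_varchar_chars(width))
```

Under these assumptions a 2,083-character URL column fits comfortably in any of these character sets, as long as the other columns in the row leave enough room.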
So ...
< MySQL 5.0.3 use TEXT
or
>= MySQL 5.0.3 use VARCHAR(2083)
VARCHAR(512) (or similar) should be sufficient. However, since you don't really know the maximum length of the URLs in question, I might just go direct to TEXT. The danger with this is of course loss of efficiency due to CLOBs being far slower than a simple string datatype like VARCHAR.
This really depends on your use case (see below), but storing as TEXT has performance issues, and a huge VARCHAR sounds like overkill for most cases.
My approach: use a generous, but not unreasonably large VARCHAR length, such as VARCHAR(500) or so, and encourage the users who need a larger URL to use a URL shortener such as safe.mn.
The Twitter approach: For a really nice UX, provide an automatic URL shortener for overly long URLs and store the "display version" of the link as a snippet of the URL with ellipses at the end. (Example: http://stackoverflow.com/q/219569/1235702 would be displayed as stackoverflow.com/q/21956... and would link to a shortened URL http://ex.ampl/e1234)
Notes and Caveats
Obviously, the Twitter approach is nicer, but for my app's needs, recommending a URL shortener was sufficient.
URL shorteners have their drawbacks, such as security concerns. In my case, it's not a huge risk because the URLs are not public and not heavily used; however, this obviously won't work for everyone. safe.mn appears to block a lot of spam and phishing URLs, but I would still recommend caution.
Be sure to note that you shouldn't force your users to use a URL shortener. For most cases (at least for my app's needs), 500 characters is more than sufficient for what most users will be using it for. Only use or recommend a URL shortener for overly long links.
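The "display version" from the Twitter approach above could be produced along these lines. This is a sketch, not Twitter's actual algorithm: the function name and the 25-character cutoff are assumptions chosen to match the example given in the answer.

```python
from urllib.parse import urlsplit


def display_version(url: str, max_len: int = 25) -> str:
    """Drop the scheme and truncate the rest, appending '...' when shortened."""
    parts = urlsplit(url)
    bare = parts.netloc + parts.path
    if parts.query:
        bare += "?" + parts.query
    if len(bare) <= max_len:
        return bare
    return bare[:max_len] + "..."


print(display_version("http://stackoverflow.com/q/219569/1235702"))
# stackoverflow.com/q/21956...
```

The snippet has a fixed maximum size (here 28 characters including the ellipsis), so it can be stored in a small VARCHAR regardless of how long the original URL is.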
varchar(max) for SQLServer2005
varchar(65535) for MySQL 5.0.3 and later
This will allocate storage as needed and shouldn't affect performance.
You'll want to choose between a TEXT or VARCHAR column based on how often the URL will be used and whether you actually need the length to be unbounded.
Use VARCHAR with maxlength >= 2,083 as micahwittman suggested if:
You'll use a lot of URLs per query (unlike TEXT columns, VARCHARs are stored inline with the row)
You're pretty sure that a URL will never exceed the row-limit of 65,535 bytes.
Use TEXT if:
The URL really might break the 65,535 byte row limit
Your queries won't select or update a bunch of URLs at once (or very often). This is because TEXT columns just hold a pointer inline, and the random accesses involved in retrieving the referenced data can be painful.
You should use a VARCHAR with an ASCII character encoding. URLs are percent encoded and international domain names use punycode so ASCII is enough to store them. This will use much less space than UTF8.
VARCHAR(512) CHARACTER SET 'ascii' COLLATE 'ascii_general_ci' NOT NULL
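The claim that ASCII is enough relies on converting a URL to its ASCII form before storing it. The Python sketch below shows one way to do that with the standard library: the host is converted with the built-in "idna" codec (punycode) and the path, query, and fragment are percent-encoded. The function name is hypothetical, user-info in the URL is dropped for simplicity, and the example domain is made up.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of the URL (punycode host, percent-encoded rest)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep '%' safe so already-encoded sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(to_ascii_url("http://bücher.example/käse?q=grün"))
# http://xn--bcher-kva.example/k%C3%A4se?q=gr%C3%BCn
```

The ASCII form is longer than the original, so the declared column length should be based on the converted value.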
Most browsers will let you put very large amounts of data in a URL, so lots of things end up creating very large URLs. If you are storing anything more than the domain part of a URL, you will need to use a TEXT column, since VARCHAR and CHAR are limited in length.
I don't know about other browsers, but IE7 has a 2083 character limit for HTTP GET operations. Unless any other browsers have lower limits, I don't see why you'd need any more characters than 2083.
Here are some SQL data types according to AWS.
You'd better use varchar(max), which (in terms of size) means varchar(65535).
This will store even your longest web addresses and will save space as well.
The max specifier expands the storage capabilities of the varchar, nvarchar, and varbinary data types. varchar(max), nvarchar(max), and varbinary(max) are collectively called large-value data types. You can use the large-value data types to store up to 2^31-1 bytes of data.
See this article on TechNet about Using Large-Value Data Types
Most web servers have a URL length limit (which is why there is an error code for "URI too long"), meaning there is a practical upper size. Find the default length limit for the most popular web servers, and use the largest of them as the field's maximum size; it should be more than enough.