nvarchar(4001)? - sql

MSDN has this to say on the subject:
nvarchar [ ( n | max ) ]
Variable-length Unicode character data. ncan be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for nvarchar are national char varying and national character varying.
This leaves me confused. I can define a column as being 1 - 4000 long, or 2147483647 long but nothing inbetween? Is my understanding correct? Why can't I be explicit about values inbetween?

NVARCHAR(MAX) covers everything else (not just 2 billion characters). If you need more than 4,000 characters the data is most certainly going to be off-page, so as far as behavior is concerned it doesn't matter if you've used 4,001 characters, 10,000 characters, or 10,000,000 characters. It only occupies the space you need, so don't think that you are wasting (2 billion characters - the length of your actual string).

Max will accept values between 4001 and 1073741823 (bear in mind storage size is approx 2x the length of the actual string).
The restriction is basically that anything over 4000 characters must be a MAX.

Because 4000 characters or less has one behavior in terms of storage and MAX has another behavior in terms of storage. And you really don't want to start forcing string length calculations on things that are 1M characters long do you? My current understanding is that up to 4000 characters is stored in-table and MAX is stored out-of-table.
Also NVARCHAR(MAX) and VARCHAR(MAX) are replacements for text and ntext.

Related

database padding space if the value inserted has smaller length than column size- DB2

I was checking if DB pads spaces in a column if the inserted string has fewer characters than the designated length of the column. Example:
lets say size of <column1> is 10 but the value entered is abc - then is it abc_______ which the DB stores where _ represents spaces?
I am asking because I used LTRIM-RTRIM while INSERTing the values and on again fetching the value in the very next minute I got the result as abc_______.
You are using the CHAR or CHARACTER datatype for the column. The CHAR or CHARACTER datatype is a fixed length datatype and is padded with space at the end of the value to fill the column size.
You can use VARCHAR to avoid the padding with spaces at the end of the values.
Note: Make sure you are using CHARACTER_LENGTH on CHARACTER columns to get the correct character length (without padding spaces). The result of LENGTH also includes the padding spaces.
demo on dbfiddle.uk

WHAT is the meaning of Leading Length?

I was checking out the difference between char vs varchar2 from google. I came across a word LEADING LENGTH in this link . THERE it was written that
Suppose you store the string ‘ORATABLE’ in a CHAR(20) field and a VARCHAR2(20) field. The CHAR field will use 22 bytes (2 bytes for leading length). The VARCHAR2 field will use 10 bytes only (8 for the string, 2 bytes for leading length).
Q1:How does the char field will use 22 bytes if the string is of 8 characters if (1 byte = 1 char)?
Q2 What is the LEADING LENGTH ? why it does occupy 2 bytes?
The CHAR() datatype pads the string with characters. So, for 'ORATABLE', it looks like:
'ORATABLE '
12345678901234567890
The "leading length" are two bytes at the beginning that specify the length of the string. Two bytes are needed because one byte is not enough. Two bytes allow lengths up to 65,535 units; one byte would only allow lengths up to 255.
The important point both CHAR() and VARCHAR2() use the same internal format, so there is little reason to sue CHAR(). Personally, I would only use it for fixed-length codes, such as ISO country codes or US social security numbers.

Efficient way to store FilePath

Currently I have a table with the following format/Desc:
ColumnName ColID PK IndexPos Null DataType
ID 1 1 N VARCHAR2 (1 Byte)
FILEPATH 2 N VARCHAR2 (127 Byte)
As you can see the length of ID Column is only 1 Byte we can store only 36 different file paths. I have more than 35 different file paths that has to be stored and retrieved. I know increasing the length of ID solves the issue but I want to also know/suggestion that is there any Efficient way to handle this.
Thanks!
The assertion that you can store only 35 different values in the table is incorrect, because varchar2 characters are not limited to letters and digits (even if they were you'd have 26 letters + 10 digits + 1 empty string = 37, not 35 possibilities).
If you need to store few more paths, say, 40 or 50, you could make your keys mixed case, so 'a' and 'A' would reference different paths. This would instantly give you 26 extra possibilities.
Expanding past the limit of 63 is a little harder, because you need to bring special characters into the mix. However, the theoretical maximum for a single character is 256 plus one combination for an empty string.

SQL - Create Unique AlphaNumeric based on a 10-digit integer stored as VARCHAR

I'm trying to emulate a function in SQL that a client has produced in Excel. In effect, they have a unique, 10-digit numeric value (VARCHAR) as the primary key in one of their enterprise database systems. Within another database, they require a unique, 5-digit alphanumeric identifier. They want that 5-digit alphanumeric value to be a representation of the 10-digit number. So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
The EXCEL equation is:
=IF(VALUE(MID(A2,1,4))>0,DEC2HEX(VALUE(MID(A2,3,2)))&DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX(VALUE(MID(A2,9,2))),DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX((VALUE(MID(A2,9,2)))))
I need the SQL equivalent of this. Of course, should someone out there know a better way to accomplish their goal of "a 5-digit alphanumeric identifier" based off the 10-digit number, I'm all ears.
ADDED 8/2/2011
First of all, thank you to everyone for the replies. Nice to see folks willing to help and even enjoying it! Based on all the responses, I'm apt to tell my client they're intent is sound, only their method is off kilter. I'd also like to recommend a solution. So the challenge remains, just modified slightly:
CHALLENGE: Within SQL, take a 10 digit, unique NUMERIC string and represent it ALPHANUMERICALLY in as few characters as possible. The resulting string must also be unique.
Note that the first 3-4 characters in the 10-digit string are likely to be zeros, and that they could be stripped to shorten the resulting alphanumeric string. Not required, but perhaps helpful.
This problem is inherently impossible. You have a 10 digit numeric value that you want to convert to a 5 digit alphanumeric value. Since there are 10 numeric characters, this means that there are 10^10 = 10 000 000 000 unique values for your 10 digit number. Since there are 36 alphanumeric characters (26 letters + 10 numbers), there are 36^5 = 60 466 176 unique values for your 5 digit number. You cannot map a set of 10 billion elements into a set with around 60 million.
Now, lets take a closer look at what your client's code is doing:
So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
This isn't 100% accurate. The excel code never uses the first 2 digits, but performs this operation on the remaining 8. There are two main problems with this algorithm which may not be intuitively obvious:
Two 10 digit numbers can map to the same 5 digit number. Consider the numbers 1000000117 and 1000001701. The last four digits of 1000000117 get mapped to 1 11, where the last four digits of 1000001701 get mapped to 11 1. This causes both to map to 00111.
The 5 digit number may not even end up being 5 digits! For example, 1000001616 gets mapped to 001010.
So, what is a possible solution? Well, if you don't care if that 5 digit number is unique or not, in MySQL you can use something like:
hex(<NUMERIC VALUE> % 0xFFFFF)
The log of 10^10 base 2 is 33.219280948874
> return math.log(10 ^ 10) / math.log(2)
33.219280948874
> = 2 ^ 33.21928
9999993422.9114
So, it takes 34 bits to represent this number. In hex this will take 34/4 = 8.5 characters, much more than 5.
> return math.log(10 ^ 10) / math.log(16)
8.3048202372184
The Excel macro is ignoring the first 4 (or 6) characters of the 10 character string.
You could try encoding in base 36 instead of 16. This will get you to 7 characters or less.
> return math.log(10 ^ 10) / math.log(36)
6.4254860446923
The popular base 64 encoding will get you to 6 characters
> return math.log(10 ^ 10) / math.log(64)
5.5365468248123
Even Ascii85 encoding won't get you down to 5.
> return math.log(10 ^ 10) / math.log(85)
5.1829075929158
You need base 100 to get to 5 characters
> return math.log(10 ^ 10) / math.log(100)
5
There aren't 100 printable ASCII characters, so this is not going to work, as zkhr explained as well, unless you're willing to go beyond ASCII.
I found your question interesting (although I don't claim to know the answer) - I googled a bit for you out of interest and found this which may help you http://dpatrickcaldwell.blogspot.com/2009/05/converting-decimal-to-hexadecimal-with.html

Storing a 30KB BLOB in SQL Server 2005

My data is 30KB on disk (Serialized object) was size should the binary field in t-sql be?
Is the brackets bit bytes ?
... so is binary(30000) .... 30KB?
Thanks
You need to use the varbinary(max) data type; the maximum allowed size for binary is 8,000 bytes. Per the MSDN page on binary and varbinary:
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length.
The number after binary() is the number of bytes, see MSDN:
binary [ ( n ) ]
Fixed-length binary data of n bytes. n
must be a value from 1 through 8,000.
Storage size is n+4 bytes.
Whether 30kb is 30000 or 30720 bytes depends on which binary prefix system your file system is using.