How long should SQL email fields be? [duplicate] - sql

I recognize that an email address can basically be indefinitely long so any size I impose on my varchar email address field is going to be arbitrary. However, I was wondering what the "standard" is? How long do you guys make it? (same question for Name field...)
update: Apparently the max length for an email address is 320 (<=64 name part, <= 255 domain). Do you use this?

The theoretical limit is very long, but do you really need to worry about such long email addresses? If someone can't log in with a 100-character email address, do you really care? We actually prefer that they can't.
Some statistical data may shed light on the issue. We analyzed a database with over 10 million email addresses. The addresses are unconfirmed, so some of them are invalid. Here are some interesting facts:
The longest valid one is 89 characters.
There are hundreds of longer ones, up to the limit of our column (255), but on visual inspection they are apparently fake.
The peak of the length distribution is at 19.
There isn't a long tail. Everything falls off sharply after 38.
We cleaned up the DB by throwing away anything longer than 40. The good news is that no one has complained but the bad news is not many records got cleaned out.

I've in the past just done 255 because that's the so-ingrained standard of short but not too short input. That, and I'm a creature of habit.
However, since the max is 319, I'd do nvarchar(320) on the column. Gotta remember the @!
nvarchar won't use space that you don't need, so a 20-character email address takes up only about 40 bytes (nvarchar stores two bytes per character, plus a small length overhead) rather than the full declared width. This is in contrast to nchar, which always takes up its maximum (it right-pads the value with spaces).
I'd also use nvarchar in lieu of varchar since it's Unicode. Given the volatility of email addresses, this is definitely the way to go.

The following email address is only 94 characters:
i.have.a.really.long.name.like.seetharam.krishnapillai@AReallyLongCompanyNameOfSomeKind.com.au
Would an organisation actually give you an email that long?
If they were stupid enough to, would you actually use an email address like that?
Would anyone? Of course not. Too long to type and too hard to remember.
Even a 92-year-old technophobe would figure out how to sign up for a nice short gmail address, and just use that, rather than type this into your registration page.
Disk space probably isn't an issue, but there are at least two problems with allowing user input fields to be many times longer than they need to be:
Displaying them could mess up your UI (at best they will be cut off, at worst they push your containers and margins around)
Malicious users can do things with them you can't anticipate (like those cases where hackers used a free online API to store a bunch of data)
I like 50 chars:
123456789.123456789.123456789@1234567890123456.com
If one user in a million has to use their other email address to use my app, so be it.
(Statistics show that no-one actually enters more than about 40 chars for email address, see e.g.: ZZ Coder's answer https://stackoverflow.com/a/1297352/87861)

According to this text, based on the proper RFC documents, it's not 320 but 254:
http://www.eph.co.uk/resources/email-address-length-faq/
Edit:
Using WayBack Machine:
https://web.archive.org/web/20120222213813/http://www.eph.co.uk/resources/email-address-length-faq/
What is the maximum length of an email address?
254 characters
There appears to be some confusion over the maximum valid email
address size. Most people believe it to be 320 characters (64
characters for the username + 255 characters for the domain + 1
character for the @ symbol). Other sources suggest 129 (64 + 1 + 64)
or 384 (128+1+255, assuming the username doubles in length in the
future).
This confusion means you should heed the 'robustness principle'
("developers should carefully write software that adheres closely to
extant RFCs but accept and parse input from peers that might not be
consistent with those RFCs." - Wikipedia) when writing software that
deals with email addresses. Furthermore, some software may be crippled
by naive assumptions, e.g. thinking that 50 characters is adequate
(examples). Your 200 character email address may be technically valid
but that will not help you if most websites or applications reject it.
The actual maximum email length is currently 254 characters:
"The original version of RFC 3696 did indeed say 320 was the maximum
length, but John Klensin (ICANN) subsequently accepted this was
wrong."
"This arises from the simple arithmetic of maximum length of a domain
(255 characters) + maximum length of a mailbox (64 characters) + the @
symbol = 320 characters. Wrong. This canard is actually documented in
the original version of RFC3696. It was corrected in the errata.
There's actually a restriction from RFC5321 on the path element of an
SMTP transaction of 256 characters. But this includes angled brackets
around the email address, so the maximum length of an email address is
254 characters." - Dominic Sayers

I use varchar(64). I do not think anyone could have a longer email address.

If you're really being pedantic about it, make the username varchar(64) and the domain varchar(255). Then you can run ridiculous statistics on domain usage slightly faster than you could with a single field. If you're feeling really gung-ho about optimization, the split also lets your SMTP server send out emails with fewer connections and better batching.

RFC 5321 (the current SMTP spec, obsoletes RFC2821) states:
4.5.3.1.1. Local-part
The maximum total length of a user
name or other local-part is 64
octets.
4.5.3.1.2. Domain
The maximum total length of a
domain name or number is 255 octets.
This pertains to just localpart@domain, for a total of 320 ASCII (7-bit) characters.
If you plan to normalize your data, perhaps by splitting the localpart and domain into separate fields, additional things to keep in mind:
A technique known as VERP may result in full-length localparts for automatically generated mail (may not be relevant to your use case)
domains are case insensitive; recommend lowercasing the domain portion
localparts are case sensitive; user@domain.com and USER@domain.com are technically different addresses per the specs, although the policy at domain.com may be to treat the two addresses as equivalent. It's best to restrict localpart case folding to domains that are known to do this.
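The points above can be sketched in a few lines of Python. This is only an illustration, not a full address validator: the function name and constants are made up for the example, the limits are the RFC 5321 figures quoted above plus the 254 figure quoted earlier on this page, and only the domain is lowercased.

```python
MAX_LOCAL_OCTETS = 64    # RFC 5321, section 4.5.3.1.1
MAX_DOMAIN_OCTETS = 255  # RFC 5321, section 4.5.3.1.2
MAX_TOTAL_OCTETS = 254   # the 254 figure quoted earlier on this page


def split_and_normalize(address):
    """Return (local_part, lowercased_domain) or raise ValueError.

    Hypothetical helper: it checks lengths only, not address syntax.
    """
    # Split on the last "@": a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    if len(local.encode("utf-8")) > MAX_LOCAL_OCTETS:
        raise ValueError("local part longer than 64 octets")
    if len(domain.encode("utf-8")) > MAX_DOMAIN_OCTETS:
        raise ValueError("domain longer than 255 octets")
    if len(address.encode("utf-8")) > MAX_TOTAL_OCTETS:
        raise ValueError("address longer than 254 octets")
    # Only the domain is case-insensitive; leave the local part untouched.
    return local, domain.lower()
```

Storing the two returned parts in separate columns gives you the domain statistics for free, and the local part keeps whatever case the user typed.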

For email, regardless of the spec, I virtually always go with 512 (nvarchar). Names and surnames are similar.
Really, you need to look at how much you care about having a little extra data. For me, mostly, it's not a worry, so I'll err on the conservative side. But if you've decided, through logical and accurate means, that you need to conserve space, then do so. In general, though, be conservative with field sizes, and life shall be good.
Note that probably not all email clients support the RFC, so regardless of what it says, you may encounter different things in the wild.

Related

Bitcoin mnemonic words/keys. Can it be shorter than 12 words by increasing the set of words? Good idea or not?

A Bitcoin mnemonic private key consists of 12 words. These words represent the private key.
I was wondering whether it is possible to make it shorter than 12 words by increasing the set of words to choose from. It would be easier to remember. Let's say we have a set of 10000 words. How many words would be enough to represent one private key then? Does somebody know the exact calculation? Or any suggestion as to why this is not a good idea?
Thank you.
The seed phrase can contain any number of words that you want it to. Their function is simply to allow recreation of the private key for the Bitcoin wallet.
Regarding the length of 12 words, taken from here:
The English-language wordlist for the BIP39 standard has 2048 words,
so if the phrase contained only 12 random words, the number of
possible combinations would be 2048^12 = 2^132 and the phrase would
have 132 bits of security. However, some of the data in a BIP39 phrase
is not random, so the actual security of a 12-word BIP39 seed
phrase is only 128 bits. This is approximately the same strength as
all Bitcoin private keys, so most experts consider it to be
sufficiently secure.
So although 12 would seem the correct amount, the important thing about the words is that they must be securely stored, preferably in memory, but at least written down and locked somewhere safe, and never stored in plaintext online or on your PC or other devices.
One afterthought: by increasing the number of words used in the phrase, you could actually make it less secure, because the longer it is, the more certain it is that it must be recorded somewhere physical rather than in your own memory.
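For the calculation asked about in the question, here is a rough sketch. It assumes every word is picked uniformly at random and ignores the checksum bits that BIP39 adds, so it is an estimate rather than a drop-in scheme; the function name is made up for the example.

```python
import math


def words_needed(wordlist_size, target_bits=128):
    """Smallest number of uniformly random words giving >= target_bits."""
    bits_per_word = math.log2(wordlist_size)
    return math.ceil(target_bits / bits_per_word)


print(words_needed(2048))   # 12
print(words_needed(10000))  # 10
```

So a 10000-word list would only shorten the phrase from 12 words to 10 for the same 128 bits, at the cost of a much larger and less familiar vocabulary.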

Minimum precision SQL numbers

What is the best way to check for minimum precision with numbers in SQL with Oracle database?
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber LIKE '________________')
or
CreditCardNumber NUMBER(16) NOT NULL
CHECK (REGEXP_LIKE(CreditCardNumber, '^\d{16}'))
I understand this is presentation level checking but I heard it's still good practice to avoid illogical data on the database level.
Neither of your checks really works as expected, because you end up with an implicit number-to-string conversion, which loses leading zeros. Maybe this is not a problem in this particular case, assuming a credit card number never starts with a leading zero (and never will? According to ISO/IEC 7812, the leading digit could be a 0 in some corner cases).
However, notice that you gain nothing from the NUMBER type here, as you will never perform calculations on the credit card "number". So, for that kind of data (credit card "numbers", telephone "numbers", zip codes, ...), I would strongly suggest you use a character type (VARCHAR2, or CHAR if you prefer) instead, and at the very least check with an appropriate regexp that only digits are part of the string. It would be better to validate the checksum, as suggested by @Allan in his answer, though.
In addition, even if 16 digits is the most common case, bank card numbers are variable length -- from 12 to 19 digits (according to http://www.watersprings.org/pub/id/draft-eastlake-card-map-08.txt as I don't have access to the ISO official document).
Finally, concerning credit card numbers, you have to remember that, depending on your local regulations, you are not necessarily allowed to store them unencrypted...
NUMBER(16) will only allow integers (i.e. if you try to insert '10.1', it will round to '10').
Keep in mind too that credit card numbers aren't always 16 digits - American Express uses 15.
The only benefit you'll gain from storing credit card numbers in the NUMBER type is storage space. Oracle packs roughly two decimal digits per byte plus an exponent byte, so a 16-digit value takes about 9 bytes, against 16 bytes as text. However, every interaction with the data will require a type conversion to or from text, so you'll have to weigh the costs of storage, processing, and ease of coding.
The obvious way to validate that the value is wide enough is to check the value numerically:
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber >= 1000000000000000)
However, as @BenGrimm points out, this may not be valid for all credit card numbers.
One way to validate the card length per provider is to have a lookup table with each provider that you accept and the length of their card numbers. Again, you'd have to use a trigger to check against that, but it would allow you to verify that the length is precisely correct.
A better validation might be to implement the Luhn algorithm in a function and use it to validate the column value via a trigger.
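As a rough illustration of the Luhn check (in Python rather than PL/SQL, with a made-up function name), assuming the card number is held as a string of digits and using the 12 to 19 digit range mentioned in another answer here:

```python
def luhn_valid(card_number):
    """Return True if a string of 12-19 ASCII digits passes the Luhn check."""
    if not (card_number.isascii() and card_number.isdigit()):
        return False
    if not 12 <= len(card_number) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(card_number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

Working on a string keeps any leading zeros intact. The same loop could be ported to a stored function and called from the trigger.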
Finally, to reiterate what Sylvain Leroux pointed out, this should all be academic. You shouldn't be storing credit card numbers in plain text and may even be legally or contractually prevented from doing so.
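You could potentially use FLOOR(LOG(10, CreditCardNumber)) + 1 = 16 (in Oracle the base-10 logarithm is written LOG(10, n); a plain ceiling of the logarithm would report 15 for exactly 1000000000000000). I think for a credit card number there are better ways to check, though.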

Should I use VARCHAR(20) or VARCHAR(255) to store a name?

At university I was told to use VARCHAR(20) to store a first name. The VARCHAR type takes space depending on the string length, so is it necessary to specify a smaller length range? I'm asking because RedBean ORM creates by default a VARCHAR(255) field for strings whose length is <= 255 chars.
Is there any difference between using VARCHAR(20) and VARCHAR(255)? Except the maximum string length that can be stored :)
*I know that such questions have already been asked but all I understand from them is that using VARCHAR(255) where it isn't necessary could cause excessive memory consumption in DB applications.
What is it like in real-life programming? Should I use VARCHAR(255) for all short text inputs or try to limit the length whenever possible?
Because 255 is now just an arbitrary choice for a VARCHAR length.
Explanation: Prior to MySQL 5.0.3 (give or take a few point releases - I forget) a VARCHAR column could be at most 255 characters long, so VARCHAR(255) was often used as a default. Now, however, a VARCHAR can hold up to 65,535 bytes (subject to the maximum row size and the character set in use), so if you're still using "255" then that seems arbitrary and not well thought out (or your schema is just old).
It is better to define a limit: it reduces the default database size and also helps with validation.
As far as I know with varchar(20) you are telling that the field will contain not more than 20 characters.
First of all, determine an ideal range for the field depending on what it will hold (name, address, etc.). It's always efficient to use as small a length as is required.
The decision to use 20 characters is just as arbitrary as 255 unless you know your data.
You could use varchar(max) but that changes the way your data is stored and could impact performance; but without knowing more about your application and the size/volume of the data it's hard to give advice.
VARCHAR(255) is generally a bad choice unless you need that much space. You should always make fields the size they need to be and no more. There are several reasons for this. One is that the row size of a record has limits. Often you can create a row definition larger than the limits, but the first time you try to enter data that exceeds the limits, it will fail. This is a bad thing and should be avoided by using smaller field sizes. Larger field sizes also encourage the entry of bad information. If users know they have a lot of room in a field, they will enter notes in that field instead of the data. I have seen such gems as (and this is a genuine example from a past job) "Talk to the fat girl as the blonde is useless." in fields where the length was too long. You don't want to give room for junk to be put into a field if you can help it. Bad data in means bad data out.
Wider pages can also be slower to access, so it is in the database's best interest to limit field size.
Under no circumstances should you use nvarchar(max) or varchar(max) for any string-type fields unless you intend for some records to contain more than 4000 characters. These fields cannot be indexed (in SQL Server; know your own database's limits when doing design), and using them indiscriminately is very bad and will cause a slower-than-slow database.
Names are tricky and their size can be hard to determine, so some people go big. But 25-50 is more reasonable than 255. It may vary depending on the kind of names you are storing: for instance, if corporations are mixed in with people's names, the field will need to be wider. If you have a lot of foreign names to store, you need to know the norm for name lengths in that country. Some countries typically have longer names than others. And remember that, as far as first names are concerned, it makes a difference whether the person uses their middle names as well and whether there is anywhere to store them. This is especially true for people who have more than one middle name or who use their maiden name as a middle name but still go by their other names, such as Mary Elizabeth Annette Von Middlesworth Jamison. You can see how hard a name like that is to break up into first, middle and last, and the majority of the name might end up in the firstname column.

Paypal API USER, PWD, SIGNATURE field lengths

I've gone through a number of pages and PDFs within the Paypal guides and x.com, but I can't find any reference to the maximum field lengths for the API login/connection. I see the Transaction ID maxes out at 19 characters, but they seem to avoid saying the maximums for the access fields.
I'm setting up a database table to hold multiple Paypal API logins, as well as a section to edit them, which I'll have to validate by length. I want to use some real values instead of guessing 255 characters.
Surely someone must have been as specific as me in this regard, I'm hoping someone has found this answer.
I've always gone with 75 and that seems to be safe.
On the NVP API Method and Field Reference page at http://www.paypalobjects.com/en_US/ebook/PP_NVPAPI_DeveloperGuide/Appx_fieldreference.html they indicate EMAIL as "127 single byte characters".
As for the other items, I did not see any documented lengths, but checking 5 client accounts, it appears that
API PWD is 16 characters
API signature is 56 characters - in each account (these must be fixed length).

[My]SQL VARCHAR Size and Null-Termination

Disclaimer: I'm very new to SQL and databases in general.
I need to create a field that will store a maximum of 32 characters of text data. Does "VARCHAR(32)" mean that I have exactly 32 characters for my data? Do I need to reserve an extra character for null-termination?
I conducted a simple test and it seems that this is a WYSIWYG buffer. However, I wanted to get a concrete answer from people who actually know what they're doing.
I have a C[++] background, so this question is raising alarm bells in my head.
Yes, you have 32 characters at your disposal. SQL does not concern itself with NUL-terminated strings like some programming languages do.
Your VARCHAR size specification is the maximum size of your data, so in this case, 32 characters. However, VARCHAR columns are variable-length, so the actual physical storage used is only the size of your data, plus one or two bytes.
If you put a 10-character string into a VARCHAR(32), the physical storage will be 11 or 12 bytes (the manual will tell you the exact formula).
However, when MySQL is dealing with result sets (ie. after a SELECT), 32 bytes will be allocated in memory for that field for every record.
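To make the "plus one or two bytes" rule concrete, here is a small Python sketch of the storage estimate. It assumes the MySQL rule that a VARCHAR uses one length byte when the column can never need more than 255 bytes and two otherwise; the function name and the max_bytes_per_char parameter are made up for the example (1 for latin1, 4 for utf8mb4), and the byte count is only exact for ASCII text.

```python
def varchar_storage_bytes(value, max_chars, max_bytes_per_char=1):
    """Estimate bytes used to store `value` in a VARCHAR(max_chars) column."""
    if len(value) > max_chars:
        raise ValueError("value longer than the declared column length")
    data = value.encode("utf-8")
    # One length byte when the column can never need more than 255 bytes,
    # otherwise two. There is no NUL terminator in either case.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return len(data) + prefix
```

With these assumptions a 10-character ASCII string in a VARCHAR(32) comes to 11 bytes, and a full 32-character string fits with nothing reserved for a terminator.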