Username or Userid - sql

I've seen many discussions about whether it is better to use userid or username as the primary key for a table. userid would allow the flexibility of changing the username later if desired. It is also a way to implement security. However, username is also a unique identifier.
If I choose userid as my primary key, what is the best way to enforce username to take on a unique value?
If I choose username, what problems should I be aware of?

I would declare UserId as the PRIMARY KEY as there will be other tables referencing this user record through UserId and thereby will be useful to enforce any FOREIGN KEY constraints.
If username needs to be unique, then I would declare it as a NOT NULL column and define a UNIQUE KEY constraint on it. The NOT NULL property prevents the null value that a UNIQUE KEY constraint would otherwise allow in the column. So, this setup on UserName would be similar to that of a PRIMARY KEY.
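The setup described above can be sketched as follows. This is a minimal illustration, assuming an in-memory SQLite database driven from Python's standard sqlite3 module; the table name users, the column names, and the helper try_insert are made up for the example, and other engines may differ in how they treat NULLs under a UNIQUE constraint.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,   -- surrogate key other tables would reference
        username TEXT NOT NULL UNIQUE   -- alternate key: no NULLs, no duplicates
    )
    """
)
conn.execute("INSERT INTO users (username) VALUES ('alice')")


def try_insert(username):
    """Hypothetical helper: True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_new = try_insert("bob")          # a new name is accepted
accepted_duplicate = try_insert("alice")  # rejected by UNIQUE
accepted_null = try_insert(None)          # rejected by NOT NULL
```

With both constraints in place, the database itself refuses duplicate and missing usernames, so the application does not have to check first.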

This is from my own point of view.
I would rather choose UserID of data type int (or it could be a string) as the primary key of the table, since it never needs to change. Foreign keys that reference it cause no problems, because it is unchangeable.
The reason I didn't choose Username is that, although it is unique, it may need to change at some point. If foreign keys already reference it, that username can't be changed until those keys or records are dropped or deleted first (unless the foreign keys are declared with ON UPDATE CASCADE).
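A small sketch of why a surrogate key makes renaming painless, again assuming in-memory SQLite via Python's sqlite3 module. The posts table and all column names are hypothetical and exist only to give the user row something that references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        title   TEXT NOT NULL
    );
    INSERT INTO users (user_id, username) VALUES (1, 'john');
    INSERT INTO posts (user_id, title) VALUES (1, 'first post');
    """
)

# Renaming touches exactly one row in users; posts still points at user_id 1.
conn.execute("UPDATE users SET username = 'johnny' WHERE user_id = 1")

author = conn.execute(
    """
    SELECT u.username
    FROM posts AS p
    JOIN users AS u ON u.user_id = p.user_id
    WHERE p.title = 'first post'
    """
).fetchone()[0]
```

The referencing row is never modified, yet a join now returns the new name.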

Semantically, to my ear at least, userid sounds like an artificially created value, possibly an artificial primary key, while username sounds like a natural, user-friendly (component of) a natural primary key. To use either term in the opposite sense is likely to confuse programmers and users occasionally, and possibly create subtle bugs down the road.

what is the best way to enforce username to take on a unique value?
Create a unique index.
If I choose username, what problems should I be aware of?
Will you allow a user to change their user name (as long as it stays unique) whenever they want? If yes, then use a user id that you generate; otherwise use a user name that they choose.

For your first question, UNIQUE constraint is available in most modern RDBMS. You can implement it at application level too.
For your second question, I don't see any obvious problems. Usernames are generally known to the end users. However, if you have a large set of users and the maximum length of the username field is large, indexing may not be as efficient as on an int userID field.

Related

Confusion with primary key

I am new to database design. Maybe it's a stupid question, so please pardon me for that. I am designing a database for users. After a user fills in the registration information, he gets a unique receipt number. So my question is: since the receipt no. is unique, can I use it as the primary key in the Users table, or should I stick with the standard method of assigning a userID to each row in the table and using userID as the primary key?
A table can have more than one key. If the receipt number is intended to be unique and you want the DBMS to enforce key dependencies on that attribute as a data integrity constraint then yes you should make it a key (with uniqueness implemented by a PRIMARY KEY or UNIQUE constraint or whatever mechanisms your DBMS provides).
Designating any one key to be a "primary" one isn't especially important - or at least it is only as important as you want it to be. What really matters is the full set of key(s) you choose. The requirements for any key are Uniqueness and Irreducibility. Sensible criteria for choosing a key are also: Familiarity, Simplicity and Stability
If your receipt number contains only an integer, then maybe you can make your receipt-no field an auto-increment number.
If your receipt no. is complicated enough (built from numbers and characters, or 20 digits in length), better use the surrogate UserId.
If the receipt no. can be a simple integer and may be generated by the DB, better use UserId and assign its value to ReceiptNo.
If ReceiptNo is a simple integer and it identifies the user, and only that user, use it as the PKey.

Setting up a basic table, is integer auto-increment primary key a standard?

I've got a web app, I have a concept of users, which will probably go into a user table like:
table: user
username (varchar 32) | email (varchar 64) | fav_color | ...
I'd like username and email to be unique, meaning I can't allow users to have the same username, or the same email. I see example tables of this sort always introduce an integer auto-increment primary key.
Not sure why this is done, is it to somehow speed up queries by foreign keys later on? For example, let's say I have another table like:
table: grades
username (foreign key?) | grade
Is it inefficient to be using the username as a foreign key? I want to do queries like:
SELECT * FROM grades WHERE username = 'john'
so I guess it'd be faster to do an integer lookup for the database instead?:
SELECT * FROM grades WHERE fk_user_id = 20431
Thanks
What you are asking is somewhat a design decision based on the judgment of the individual data modeler. Personally, in this case, I would include the auto-incremented integer primary key. It is unusual to be able to guarantee that username (and even more so e-mail address) will be unchanging. However, you can design your software so that the same integer primary key always refers to the same user, regardless of what else may change about that user record.
What would help the performance of username lookups would be a UNIQUE constraint on username with an index that corresponds to it. If you really want e-mail addresses to be unique (mostly a business requirement decision), you could also put a UNIQUE constraint on e-mail address. Foreign keys are ignored by MyISAM, which was the default storage engine in MySQL before version 5.5 (unfortunately), so I won't bother going into the benefits there from a data modeling perspective.
Edit:
I guess I will go into the benefits for foreign keys if they are being enforced now. Yes, there are provisions for updating all the data that depends on a foreign key (such as ON UPDATE CASCADE). However, they are often poorly understood and viewed as difficult to maintain. It is usually a better practice to have a foreign key refer to something unchanging, hence your integer primary key.
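For reference, here is roughly what ON UPDATE CASCADE does when a natural key is used as the foreign key. This is only a sketch, assuming in-memory SQLite through Python's sqlite3 module with foreign key enforcement switched on; the table is named users rather than user, and the column layout is a guess at the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    CREATE TABLE users (
        username TEXT PRIMARY KEY
    );
    CREATE TABLE grades (
        username TEXT NOT NULL
            REFERENCES users (username) ON UPDATE CASCADE,
        grade    INTEGER NOT NULL
    );
    INSERT INTO users (username) VALUES ('john');
    INSERT INTO grades (username, grade) VALUES ('john', 90);
    """
)

# Changing the parent key rewrites every referencing row as a side effect.
conn.execute("UPDATE users SET username = 'johnny' WHERE username = 'john'")
rows = conn.execute("SELECT username, grade FROM grades").fetchall()
```

The cascade keeps the data consistent, but every rename now writes to each referencing table, which is the maintenance cost the answer warns about.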
My advice, after years of db-building
only use chars as PK when they don't represent anything in the real world.
The real world is a chaotic place, and as soon as you use PKs from it, you're on a slippery slope.
Just trust me.
(and there's a speed gain too).
regards,
//t
It may not necessarily be a "standard" per se, but it is quick, easy, convenient and generally resistant to business key changes.
See also: Pros and cons of autoincrement keys on every table
Integers as the primary key will make your life a lot easier down the road as your application evolves. Use an index on your username and/or email for the query optimization.
I like integer keys because:
make joins faster
smaller and faster indexes
never need to change (your username and email field values may need to change)
Indexes on Integer columns perform faster than on large character values. Having the primary key on the narrow identity column is the most optimal solution.
Using real-world data as foreign keys is very problematic, and "inefficient" because they violate referential integrity. You think that user names and emails are unique and will never change? You are almost certainly wrong. Read an earlier question on natural keys
Integer auto-increment primary keys will be faster, but that's not why they're used. They're used because they work. Use them.

Use email address as primary key?

Is email address a bad candidate for primary when compared to auto incrementing numbers?
Our web application needs the email address to be unique in the system. So, I thought of using email address as primary key. However my colleague suggests that string comparison will be slower than integer comparison.
Is it a valid reason to not use email as primary key?
We are using PostgreSQL.
String comparison is slower than int comparison. However, this does not matter if you simply retrieve a user from the database using the e-mail address. It does matter if you have complex queries with multiple joins.
If you store information about users in multiple tables, the foreign keys to the users table will be the e-mail address. That means that you store the e-mail address multiple times.
I will also point out that email is a bad choice to make a unique field, there are people and even small businesses that share an email address. And like phone numbers, emails can get re-used. Jsmith@somecompany.com can easily belong to John Smith one year and Julia Smith two years later.
Another problem with emails is that they change frequently. If you are joining to other tables with that as the key, then you will have to update the other tables as well which can be quite a performance hit when an entire client company changes their emails (which I have seen happen.)
the primary key should be unique and constant
email addresses change like the seasons. Useful as a secondary key for lookup, but a poor choice for the primary key.
Disadvantages of using an email address as a primary key:
1. Slower when doing joins.
2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)
3. An email address could change, which forces all records using this as a foreign key to be updated. As email addresses don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.
Advantages of using email address as a primary key:
1. You may be able to completely eliminate some joins. If all you need from the "master record" is the email address, then with an abstract integer key you would have to do a join to retrieve it. If the key is the email address, then you already have it and the join is unnecessary. Whether this helps you any depends on how often this situation comes up.
2. When you are doing ad hoc queries, it's easy for a human being to see what master record is being referenced. This can be a big help when trying to track down data problems.
3. You almost certainly will need an index on the email address anyway, so making it the primary key eliminates one index, thus improving the performance of inserts as they now have only one index to update instead of two.
In my humble opinion, it's not a slam-dunk either way. I tend to prefer to use natural keys when a practical one is available because they're just easier to work with, and the disadvantages tend to not really matter much in most cases.
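The join-elimination point in the list above can be illustrated side by side. This is a sketch only, assuming in-memory SQLite via Python's sqlite3 module; the users_a/orders_a and users_b/orders_b tables are invented for the comparison and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Variant A: surrogate key; orders store an integer.
    CREATE TABLE users_a (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders_a (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users_a (user_id)
    );
    -- Variant B: natural key; orders store the email itself.
    CREATE TABLE users_b (
        email TEXT PRIMARY KEY
    );
    CREATE TABLE orders_b (
        order_id INTEGER PRIMARY KEY,
        email    TEXT NOT NULL REFERENCES users_b (email)
    );
    INSERT INTO users_a VALUES (1, 'ann@example.com');
    INSERT INTO orders_a VALUES (100, 1);
    INSERT INTO users_b VALUES ('ann@example.com');
    INSERT INTO orders_b VALUES (100, 'ann@example.com');
    """
)

# Variant A needs a join to get from an order to the email address.
email_via_join = conn.execute(
    """
    SELECT u.email
    FROM orders_a AS o
    JOIN users_a AS u ON u.user_id = o.user_id
    WHERE o.order_id = 100
    """
).fetchone()[0]

# Variant B already carries the email in the referencing row.
email_direct = conn.execute(
    "SELECT email FROM orders_b WHERE order_id = 100"
).fetchone()[0]
```

Both queries return the same address; the trade-off is that variant B repeats the string in every referencing row.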
No one seems to have mentioned a possible problem that email addresses could be considered private. If the email address is the primary key, a profile page URL most likely will look something like ..../Users/my@email.com. What if you don't want to expose the user's email address? You'd have to find some other way of identifying the user, possibly by a unique integer value to make URLs like ..../Users/1. Then you'd end up with a unique integer value after all.
It is pretty bad. Assume some e-mail provider goes out of business. Users will then want to change their e-mail. If you have used e-mail as primary key, all foreign keys for users will duplicate that e-mail, making it pretty damn hard to change ...
... and I haven't even started talking about performance considerations.
I don't know if that might be an issue in your setup, but depending on your RDBMS the values of a column might be case sensitive. The PostgreSQL docs say: "If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive". In other words, if you accept user input for a search in a table with email as primary key, and the user provides "John@Doe.com", you won't find "john@doe.com".
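One common way around the case-sensitivity problem is a unique index on the lower-cased value. The sketch below assumes in-memory SQLite (3.9 or newer, for expression indexes) via Python's sqlite3 module; the index name and the helper find_user_id are hypothetical. PostgreSQL accepts the same kind of expression index on lower(email), though this example was not run against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL
    );
    -- Uniqueness is enforced on the lower-cased value, not the raw text.
    CREATE UNIQUE INDEX users_email_lower_idx ON users (lower(email));
    INSERT INTO users (email) VALUES ('john@doe.com');
    """
)


def find_user_id(email):
    """Hypothetical lookup that compares addresses case-insensitively."""
    row = conn.execute(
        "SELECT user_id FROM users WHERE lower(email) = lower(?)", (email,)
    ).fetchone()
    return row[0] if row else None


found = find_user_id("John@Doe.com")  # matches despite different casing

try:
    conn.execute("INSERT INTO users (email) VALUES ('JOHN@DOE.COM')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # same address in another casing is refused
```

Lookups and uniqueness checks then agree on what counts as the same address, regardless of how the user typed it.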
At the logical level, the email is the natural key.
At the physical level, given you are using a relational database, the natural key doesn't fit well as the primary key. The reason is mainly the performance issues mentioned by others.
For that reason, the design can be adapted. The natural key becomes the alternate key (UNIQUE, NOT NULL), and you use a surrogate/artificial/technical key as the primary key, which can be an auto-increment in your case.
systempuntoout asked,
What if someone wants to change his email address? Are you going to change all the foreign keys too?
That's what cascading is for.
Another reason to use a numeric surrogate key as the primary key is related to how the indexing works in your platform. In MySQL's InnoDB, for example, every secondary index in a table also stores the primary key columns, so you want the PK to be as small as possible (for the sake of speed and size). Also related to this, InnoDB is faster when rows are inserted in primary key order, and a string would not help there.
Another thing to take into consideration when using a string as an alternate key, is that using a hash of the actual string that you want might be faster, skipping things like upper and lower cases of some letters. (I actually landed here while looking for a reference to confirm what I just said; still looking...)
yes, it is better if you use an integer instead. you can also set your email column as unique constraint.
like this:
CREATE TABLE myTable(
id integer primary key,
email text UNIQUE
);
Yes, it is a bad primary key because your users will want to update their email addresses.
Another reason why an integer primary key is better shows up when you refer to the email address from a different table. If the address itself is the primary key, then the other table has to use it as its foreign key, so you store email addresses multiple times.
I am not too familiar with postgres. Primary Keys is a big topic. I've seen some excellent questions and answers on this site (stackoverflow.com).
I think you may have better performance by having a numeric primary key and use a UNIQUE INDEX on the email column. Emails tend to vary in length and may not be proper for primary key index.
some reading here and here.
Personally, I do not use any real information as a primary key when designing a database, because it is very likely that I will need to alter that information later. The sole reason I provide a primary key is that it is convenient for most SQL operations from the client side, and my choice for it has always been an auto-increment integer type.
I know this is a bit of a late entry, but I would like to add that people abandon email accounts, and service providers recover the address, allowing another person to use it.
As @HLGEM pointed out, "Jsmith@somecompany.com can easily belong to John Smith one year and Julia Smith two years later." In this case, should John Smith want your service, you either have to refuse to use his email address or delete all your records pertaining to Julia Smith.
If you have to delete records and they relate to the financial history of the business depending on local law you could find yourself in hot water.
So I would never use data like email addresses, number plates, etc. as primary keys, because no matter how unique they seem, they are out of your control and can provide some interesting challenges that you may not have time to deal with.
You may need to consider any applicable data regulation legislation. Email is personal information, and if your users are EU citizens, for instance, then under GDPR they can instruct you to delete their information from your records (remember this applies regardless of which country you are based in).
If you need to keep the record itself in the database for referential integrity or historical reasons such as audit, using a surrogate key would allow you to just NULL all the personal data fields. This obviously isn't as easy if their personal data is the primary key.
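A minimal sketch of that erasure pattern, assuming in-memory SQLite via Python's sqlite3 module. The invoices table and the name column are invented for illustration, and email is left nullable here precisely so that it can be blanked later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT UNIQUE,   -- nullable so it can be erased later
        name    TEXT
    );
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users (user_id),
        amount     INTEGER NOT NULL
    );
    INSERT INTO users (user_id, email, name) VALUES (1, 'a@example.com', 'Ann');
    INSERT INTO invoices (user_id, amount) VALUES (1, 250);
    """
)

# Erase the personal fields; the row and its key stay in place.
conn.execute("UPDATE users SET email = NULL, name = NULL WHERE user_id = 1")

user_row = conn.execute("SELECT user_id, email, name FROM users").fetchone()
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM invoices WHERE user_id = 1"
).fetchone()[0]
```

The invoice still references a valid user row, but that row no longer holds any personal data.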
Your colleague is right: Use an autoincrementing integer for your primary key.
You can implement the email-uniqueness either at the application level, or you could mark your email address column as unique, and add an index on that column.
Adding the field as unique will cost you string comparison only when inserting into that table, and not when performing joins and foreign key constraint checks.
Of course, you must note that adding any constraints to your application at the database level can cause your app to become inflexible. Always give due consideration before you make any field "unique" or "not null" just because your application needs it to be unique or non-empty.
Use a GUID as a primary key... that way you can generate it from your program when you do an INSERT and you don't need to get a response from the server to find out what the primary key is. It will also be unique across tables and databases and you don't have to worry about what happens if you truncate the table some day and the auto-increment gets reset to 1.
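Generating the key on the client might look like the sketch below. It assumes Python's standard uuid module for the GUID and in-memory SQLite for storage; storing the value as TEXT is a choice made for the example, and engines with a native UUID or uniqueidentifier type would normally use that instead.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id TEXT PRIMARY KEY,      -- GUID stored as text for this sketch
        email   TEXT NOT NULL UNIQUE
    )
    """
)

# The key is known before the INSERT is sent, so no round trip is needed to learn it.
new_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO users (user_id, email) VALUES (?, ?)",
    (new_id, "a@example.com"),
)

stored_id = conn.execute("SELECT user_id FROM users").fetchone()[0]
```

The same new_id can be used immediately for rows in other tables that reference this user.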
you can boost the performance by using integer primary key.
you should use an integer primary key. if you need the email-column to be unique, why don't you simply set an unique-index on that column?
If you have a non int value as primary key then insertions and retrievals will be very slow on large data.
The primary key should be a static attribute. Email addresses are not static and can be shared by multiple people, so they are not a good choice for a primary key. Moreover, email addresses are strings that are usually longer than the unique id we would otherwise use [len(email_address) > len(unique_id)], so they require more space and, even worse, are stored multiple times as foreign keys. Consequently, performance degrades.
It depends on the table. If the rows in your table represent email addresses, then email is the best ID. If not, then email is not a good ID.
If it's simply a matter of requiring the email to be unique then you can just create a unique index with that column.
Email is a good candidate for a unique index, but not for the primary key; if it is the primary key, you will not be able to change a contact's email address, for example.
I think your join queries will be slower too.
Do not use the email address as the primary key. Keep email unique, but use the user id or username as the primary key.

why we use an ID column in the table if we have a unique value

I want to ask a small question, but I really don't know the answer.
I have an accounts table which has
Username | Password
The username is the primary key, so it's unique.
So is it necessary to add an ID column to the table? If yes, what is the benefit of that?
Thanks
Search by a numeric key is slightly faster (varies from one DB to another). Also, if you have a lot of references to the user table, you save some database space by having the numeric ID as the foreign key, as opposed to a string name.
It will make everything else easier, mainly foreign key relationships from other tables. And it allows you to change the username if you want - primary keys are not easy to change.
Faster in indexes
Consumes less disk space (and is again faster) when used as a foreign key
And, as mentioned a number of times, you can change the username without modifying a host of other tables.

SQL: To primary key or not to primary key?

I have a table with sets of settings for users, it has the following columns:
UserID INT
Set VARCHAR(50)
Key VARCHAR(50)
Value NVARCHAR(MAX)
TimeStamp DATETIME
UserID together with Set and Key are unique. So a specific user cannot have two of the same keys in a particular set of settings. The settings are retrieved by set, so if a user requests a certain key from a certain set, the whole set is downloaded, so that the next time a key from the same set is needed, it doesn't have to go to the database.
Should I create a primary key on all three columns (userid, set, and key) or should I create an extra field that has a primary key (for example an autoincrement integer called SettingID, bad idea i guess), or not create a primary key, and just create a unique index?
----- UPDATE -----
Just to clear things up: This is an end of the line table, it is not joined in any way. UserID is a FK to the Users table. Set is not a FK. It is pretty much a helper table for my GUI.
Just as an example: the first time users visit parts of the website, they get a help balloon, which they can close if they want. Once they click it away, I will add a setting to the "GettingStarted" set stating that help balloon X has been disabled. The next time the user comes to the same page, the setting will state that help balloon X should not be shown anymore.
Having composite unique keys is mostly not a good idea.
Having any business-relevant data as the primary key can also cause you trouble, for instance if you need to change the value. Even if the application does not allow the value to be changed today, it might in the future, or the value might have to be changed in an upgrade script.
It's best to create a surrogate key, an automatic number which does not have any business meaning.
Edit after your update:
In this case, you can think of the table as conceptually having no primary key, and make these three columns either the primary key or a composite unique key (to make it changeable).
Should I create a primary key on all three columns (userid, set, and key)
Make this one.
Using surrogate primary key will result in an extra column which is not used for other purposes.
Creating a UNIQUE INDEX along with surrogate primary key is same as creating a non-clustered PRIMARY KEY, and will result in an extra KEY lookup which is worse for performance.
Creating a UNIQUE INDEX without a PRIMARY KEY will result in a HEAP-organized table which will need an extra RID lookup to access the values: also not very good.
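The composite-key option could be sketched like this, assuming in-memory SQLite via Python's sqlite3 module. The column names setting_set, setting_key and updated_at are hypothetical stand-ins for Set, Key and TimeStamp (which collide with SQL keywords), and SQLite's storage layout differs from the clustered-index behaviour described above, so only the uniqueness and retrieval logic carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_settings (
        user_id     INTEGER NOT NULL,
        setting_set TEXT NOT NULL,
        setting_key TEXT NOT NULL,
        value       TEXT,
        updated_at  TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, setting_set, setting_key)
    )
    """
)
insert_sql = (
    "INSERT INTO user_settings (user_id, setting_set, setting_key, value) "
    "VALUES (?, ?, ?, ?)"
)
conn.executemany(
    insert_sql,
    [
        (7, "GettingStarted", "HelpBalloonX", "disabled"),
        (7, "GettingStarted", "HelpBalloonY", "disabled"),
        (7, "Layout", "Theme", "dark"),
    ],
)

# A second row with the same (user, set, key) triple violates the primary key.
try:
    conn.execute(insert_sql, (7, "GettingStarted", "HelpBalloonX", "enabled"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Fetch a whole set at once, as the question describes.
getting_started = dict(
    conn.execute(
        "SELECT setting_key, value FROM user_settings "
        "WHERE user_id = ? AND setting_set = ?",
        (7, "GettingStarted"),
    ).fetchall()
)
```

Because user_id and setting_set are the leading columns of the key, fetching one set for one user can be served from the primary key index.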
How many Keys and Sets do you have? Do these need to be varchar(50) or can they point to a lookup table? If you can convert this Set and Key into SetId and KeyId then you can create your primary key on the 3 integer values which will be much faster.
I would probably try to make sure that UserID was a unique identifier, rather than having duplicates of UserID throughout the code. Composite keys tend to get confusing later on in your code's life.
I'm assuming this is a lookup field for config values of some kind, so you could probably go with the composite key if this is the case. The data is already there. You can guarantee its uniqueness using the primary key. If you change your mind and decide later that it isn't appropriate for you, you can easily add a SettingId and make the original composite key a unique index.
Create one separate primary key. No matter how the business logic changes or what new rules have to be applied to your Key VARCHAR(50) field, having one primary key will make you completely independent of the business logic.
In my experience it all depends how many tables will be using this table as FK information. Do you want 3 extra columns in your other tables just to carry over a FK?
Personally I would create another FK column and put a unique constraint over the other three columns. This makes foreign keys to this table a lot easier to swallow.
I'm not a proponent of composite keys, but in this case, as an end of the line table, it might make sense. However, if you allow nulls in any of these three fields because one or more of the values is not known at the time of the insert, there can be difficulty and a unique index might be better.
It may be better to make UserID a uniqueidentifier generated with newid() (a GUID, 32 hexadecimal digits), because a UserID of type int gives a user a hint about other probable UserID values. This will also solve your issue of composite key.