Performance of primary/foreign key versus single table with no primary key

If I have a user that can be associated with multiple keys would the proper table setup be:
One table with two columns such as:
UserName | Key
where there is no primary key and a user can have multiple rows, or:
Two tables with a matching identifier
Table 1 UserName | UserId
Table 2 Key | UserId
where UserId is the primary key of table1 and the foreign key of table 2.
Which way is more preferred if I wanted to find all the keys associated with a user?

If you wanted to find all the keys associated with a given user, you might use the following JOIN query (KEY and KEYS are reserved words in MySQL, hence the backticks):
SELECT k.`Key`
FROM `keys` k INNER JOIN users u
ON k.UserId = u.UserId
WHERE u.UserName = 'username';
The columns that would benefit most from an index in this case are the UserId columns in the two tables. If this index existed, then, for a given user, looking up keys in the keys table would be an index lookup, which is roughly logarithmic in the table size and close to constant in practice.
Without any indexes, MySQL has to do a full table scan for each user as it tries to find the keys corresponding to that user.
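To make the index suggestion above concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module purely so it can be executed anywhere; the question is about MySQL, so treat the dialect as an assumption. The table name user_keys, the column KeyValue, the index name and the sample rows are all hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        UserId   INTEGER PRIMARY KEY,
        UserName TEXT NOT NULL UNIQUE   -- UNIQUE also gives UserName an index
    );
    CREATE TABLE user_keys (            -- hypothetical name, avoids the reserved word KEYS
        UserId   INTEGER NOT NULL REFERENCES users (UserId),
        KeyValue TEXT NOT NULL
    );
    -- The index the answer recommends on the child table's UserId column.
    CREATE INDEX idx_user_keys_userid ON user_keys (UserId);
""")
with conn:
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO user_keys VALUES (?, ?)",
                     [(1, "k1"), (1, "k2"), (2, "k3")])

KEYS_QUERY = """
    SELECT k.KeyValue
    FROM user_keys k INNER JOIN users u ON k.UserId = u.UserId
    WHERE u.UserName = ?
    ORDER BY k.KeyValue
"""

def keys_for(user_name):
    """Return every key value stored for the given user name."""
    return [row[0] for row in conn.execute(KEYS_QUERY, (user_name,))]

def query_plan(user_name):
    """Return the engine's plan text, to check whether an index is used."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + KEYS_QUERY, (user_name,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)
```

Calling query_plan("alice") before and after dropping the index is a quick way to see whether the planner switches from an index search to a scan on your own data; the exact plan text varies by engine and version.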

There is nothing common or uncommon in this case; it all depends on your business requirements. If a user can have multiple usernames, you need a table that links all of those usernames to the user, identified by a userId, and this userId should be the identifier of the user throughout your database design. You therefore need two tables in this case:
UserDetails, which contains the user info such as name, age, birth date, etc., and UserNames, which contains at least one userName for each user.
Otherwise, you can use the single table UserDetails to store the userName along with the rest of the user details.
So, in your case, use a separate table to store userNames. Here is why, with an example:
Suppose you have a user with two userNames.
If you use one table to store the user info together with the userName, your data will look like this:
Name BirthDate OtherDetails UserName AnotherDetails
user 1/1/1990 blah blah.. user1 blah blah...
user 1/1/1990 blah blah.. user2 blah blah...
As you can see, the data in the above table is repeated.
If you use two tables instead, the user details are stored only once, so the data size is reduced.
This is called database normalization.

Without an understanding of the entities and attributes you are attempting to model, it's not really possible to give you an answer to the question you asked.
Entity Relationship Modeling
What entities does your data model represent? An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the system, and we can store information about.
From the description given in the question, we are thinking that a "user" is an entity. And maybe "key" is also an entity. We can't really tell from the description whether that's an entity, or whether it's a repeating (multi-valued) attribute.
What uniquely identifies a "user"?
What attributes do we need/want to store about a "user"?
The second part is understanding the relationships between the entities.
To do that, we need to ask and get answers to some questions, such as:
How many "users" can be associated with a specific "key"?
Does a "key" have to be related to a user, or can a key be related to zero users?
Can a "key" be uniquely identified, apart from a user?
And so on.
Based on those answers, we can start to put together a model, and evaluate how well that model represents the problem, and how well that model is going to work for our expected use cases.
If both "user" and "key" are entities, and there is a many-to-many relationship between the entities, the model for that is going to look different than if "key" is not an entity, but just a multi-valued attribute.
If a key must "belong" to one and only one user, and a user can "hold" zero, one or more keys, likely it's a multivalued attribute. Then we need two tables. One "parent" table for the "user" entity, and another "child" table to store the repeating attribute.
We don't know (yet) what set of attributes uniquely identifies a user, so we'll represent that with a generic "userid" attribute of some unspecified datatype.
user
-----
userid datatype NOT NULL PRIMARY KEY
name varchar(30) NOT NULL
e.g.
userid name
------ ------
psimon paul simon
agarfu art garfunkel
To store a multi-valued attribute, we use the PRIMARY KEY of the entity table as a foreign key in our second "child" table.
user_key
--------
userid datatype NOT NULL FOREIGN KEY ref user.userid
key VARCHAR(30) NOT NULL
e.g.
user_key
userid key
------- -------
psimon G major
psimon A major
psimon A major
psimon B minor
agarfu A major
If we decide that "user" will have a different column as the primary key, then we'd use that same column as the foreign key in the child table.
In this example, we've allowed "duplicate" values for "key" for a given user. If we only want distinct values of "key" for a user, we'd need to add a UNIQUE constraint on the (userid, key) tuple.
Before we get too worried about performance, we need to be concerned with getting some workable data models. From there, we can translate those into implementations and evaluate the performance characteristics of each.
If the implementation has tables that don't have a suitable primary key, we can introduce another column to stand in as a "surrogate" primary key.
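The parent/child model above, with the optional UNIQUE constraint on (userid, key), could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 so it runs as-is, picks TEXT for the unspecified datatype, and the helper add_key is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE user (
        userid TEXT NOT NULL PRIMARY KEY,  -- assumption: TEXT for the unspecified datatype
        name   TEXT NOT NULL
    );
    CREATE TABLE user_key (
        userid TEXT NOT NULL REFERENCES user (userid),
        "key"  TEXT NOT NULL,
        UNIQUE (userid, "key")             -- only distinct keys per user
    );
""")
with conn:
    conn.executemany("INSERT INTO user VALUES (?, ?)",
                     [("psimon", "paul simon"), ("agarfu", "art garfunkel")])

def add_key(userid, key):
    """Try to store a key for a user; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute('INSERT INTO user_key (userid, "key") VALUES (?, ?)',
                         (userid, key))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_key("psimon", "A major")      # accepted
duplicate = add_key("psimon", "A major")  # rejected by the UNIQUE constraint
orphan = add_key("nobody", "G major")     # rejected by the foreign key
```

Dropping the UNIQUE line gives the variant from the example data, where psimon can hold "A major" twice.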

So long as your table has a unique PK you're basically correct and somewhere on the spectrum of "perfect" to "could do better".
In your first case, you're still correct, just that the PK is both UserName and Key.
The second one is more common and probably more correct because sooner or later you'll want things against users that bear no relation to the key and logically fit on the UserName table.

Related

How to construct a Junction Table for Many-to-Many relationship without breaking Normal Form

I have these two tables, Company and Owner.
Right now they are both in Normal Form, but I need to create a Many-to-Many relationship between them, since one Company can have many Owners and one Owner can have many Companies.
I previously asked whether adding an array of CompanyOwners (with Owner UUIDs) to Companies would break Normal Form. The answer was that it would, and I gathered that a Junction Table could be used instead (see thread).
My question is as follows: will the creation of an additional Junction Table, as shown below, break Normal Form?
-- This is the junction table.
CREATE TABLE CompanyOwners(
Connection-ID UUID NOT NULL, // Just the ID (PK) of the relationship.
Company-ID UUID NOT NULL REFERENCES Company (Company-ID),
Owner-ID UUID NOT NULL REFERENCES Owner (Owner-ID),
CONSTRAINT "CompanyOwners" PRIMARY KEY ("Connection-ID")
)

Your structure allows duplicate data. For example, it allows data like this. (UUIDs abbreviated to prevent horizontal scrolling.)
Connection_id            Company_id               Owner_id
-----------------------  -----------------------  -----------------------
b56f5dc4...af5762ad2f86  4d34cd58...a4a529eefd65  3737dd70...a359346a13b3
0778038c...ad9525bd6099  4d34cd58...a4a529eefd65  3737dd70...a359346a13b3
8632c51e...1876f6d2ebd7  4d34cd58...a4a529eefd65  3737dd70...a359346a13b3
Each row in a relation should have a distinct meaning. This table allows millions of rows that mean the same thing.
Something along these lines is better. It's in 5NF.
CREATE TABLE CompanyOwners(
Company_ID UUID NOT NULL references Company (Company_ID),
Owner_ID UUID NOT NULL references Owner (Owner_ID),
PRIMARY KEY (Company_ID, Owner_ID)
);
Standard SQL doesn't allow "-" in unquoted identifiers.
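As a rough check of the composite-key junction table above, the following sketch shows the duplicate pair being rejected while the many-to-many relationship still works in both directions. It assumes Python's built-in sqlite3 with UUIDs stored as TEXT; the Name columns, the sample rows and the helper functions are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Company (Company_ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Owner   (Owner_ID   TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE CompanyOwners (
        Company_ID TEXT NOT NULL REFERENCES Company (Company_ID),
        Owner_ID   TEXT NOT NULL REFERENCES Owner (Owner_ID),
        PRIMARY KEY (Company_ID, Owner_ID)   -- one row per (company, owner) pair
    );
""")
acme, globex = str(uuid.uuid4()), str(uuid.uuid4())
ann, bo = str(uuid.uuid4()), str(uuid.uuid4())
with conn:
    conn.executemany("INSERT INTO Company VALUES (?, ?)",
                     [(acme, "Acme"), (globex, "Globex")])
    conn.executemany("INSERT INTO Owner VALUES (?, ?)", [(ann, "Ann"), (bo, "Bo")])
    conn.executemany("INSERT INTO CompanyOwners VALUES (?, ?)",
                     [(acme, ann), (acme, bo), (globex, ann)])

def link(company_id, owner_id):
    """Record an ownership; return False if the pair already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO CompanyOwners VALUES (?, ?)",
                         (company_id, owner_id))
        return True
    except sqlite3.IntegrityError:
        return False

def owners_of(company_id):
    """Names of all owners of one company."""
    rows = conn.execute(
        """SELECT o.Name FROM CompanyOwners co
           JOIN Owner o ON o.Owner_ID = co.Owner_ID
           WHERE co.Company_ID = ? ORDER BY o.Name""", (company_id,))
    return [row[0] for row in rows]

def companies_of(owner_id):
    """Names of all companies held by one owner."""
    rows = conn.execute(
        """SELECT c.Name FROM CompanyOwners co
           JOIN Company c ON c.Company_ID = co.Company_ID
           WHERE co.Owner_ID = ? ORDER BY c.Name""", (owner_id,))
    return [row[0] for row in rows]
```

With the surrogate Connection-ID design from the question, the repeated link(acme, ann) call would succeed and silently create a second row with the same meaning.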

This is fine as it is, but you could add a couple more columns, like
DateOwned Datetime --<-- when the owner bought the company
DateSold Datetime --<-- when the owner sold the company
After all, you would want to know things like whether the company is still owned by the same owner, and to keep track of the company's ownership history.

SQL Table Design: Multiple foreign key columns or general "fkName" and "fkValue" columns

Given a table (Contacts) which could apply to distinct items in a database (Employers, Churches, Hospitals, Government Groups, etc.) which are stored in different tables, when leveraging this single contacts table in the end I've found there exist two choices for relating a contact back to one particular "item"
One column for each "item" type with a Foreign Key association, this results in a table looking like:
contactID empID churchID hospID govID conFN conLN ...
One column indicating the type of "item" (fkName) and one column for the value corresponding to the item of that type (fkValue). This results in a table looking like:
contactID fkName fkValue conFN conLN ...
The first means that out of the X possible foreign keys, X-1 will be NULL, but I get the advantages of hard-associated foreign keys.
The second means that I can set fkName and fkValue as NOT NULL but I don't get the advantages of DB-supported foreign keys.
Ultimately, is there a "right" answer? Are there other advantages / disadvantages that I haven't thought about (performance, security, growth/expansion)?

The second approach is an anti-pattern.
You need to set up many-to-many relationship tables between each entity (Hospitals, Churches, Employers, Government Groups, etc.) and Contacts.
If you want to make it easier to query for all of the entities a contact is related to, consider creating a view on top of the many-to-many relationship tables.
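One way the relationship tables plus a view could look is sketched below, for two of the entity types only. It runs on Python's built-in sqlite3 as an assumption; the junction table names, the view name ContactEntities and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts  (contactID INTEGER PRIMARY KEY, conFN TEXT, conLN TEXT);
    CREATE TABLE Hospitals (hospID    INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Churches  (churchID  INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One many-to-many relationship table per entity type, each with real foreign keys.
    CREATE TABLE HospitalContacts (
        hospID    INTEGER NOT NULL REFERENCES Hospitals (hospID),
        contactID INTEGER NOT NULL REFERENCES Contacts (contactID),
        PRIMARY KEY (hospID, contactID)
    );
    CREATE TABLE ChurchContacts (
        churchID  INTEGER NOT NULL REFERENCES Churches (churchID),
        contactID INTEGER NOT NULL REFERENCES Contacts (contactID),
        PRIMARY KEY (churchID, contactID)
    );

    -- The view flattens the relationship tables into one queryable shape.
    CREATE VIEW ContactEntities AS
        SELECT hc.contactID AS contactID, 'hospital' AS entityType, h.name AS entityName
        FROM HospitalContacts hc JOIN Hospitals h ON h.hospID = hc.hospID
        UNION ALL
        SELECT cc.contactID, 'church', c.name
        FROM ChurchContacts cc JOIN Churches c ON c.churchID = cc.churchID;

    INSERT INTO Contacts  VALUES (1, 'Bob', 'Smith'), (2, 'Ann', 'Lee');
    INSERT INTO Hospitals VALUES (1, 'General Hospital');
    INSERT INTO Churches  VALUES (1, 'First Church');
    INSERT INTO HospitalContacts VALUES (1, 1);
    INSERT INTO ChurchContacts   VALUES (1, 1), (1, 2);
""")

def entities_for(contact_id):
    """List (entityType, entityName) pairs a contact is related to."""
    return conn.execute(
        """SELECT entityType, entityName FROM ContactEntities
           WHERE contactID = ? ORDER BY entityType, entityName""",
        (contact_id,)).fetchall()
```

Adding a new entity type then means one new relationship table and one more UNION ALL branch in the view, with no nullable columns on Contacts.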

I think the first option is better as it will allow you to maintain referential integrity of your database using the built-in SQL features (foreign keys), rather than relying on your code to maintain it.

This is the solution that you should be going towards:
type
----------------
typeId name
1 hospital
2 church
contact
-----------------------------------------
contactId firstName LastName typeId (fk)
1 bob is 1
2 your uncle 2
If Bob can be a contact for more than one type, then you will need a junction table.

If two tables have a "One To One" relationship, should they have the same column as primary key?

Let's say I have a table called Users which represents registered users of a website. I also have an AccountActivation table which stores the randomly generated string sent to a new user's email to verify that email.
The AccountActivation table has UserId column which also happen to be the primary key for the Users table. It also has the ActivationCode column to store the code. Either column could uniquely identify a row in the AccountActivation table.
So if I choose the activation code column as the primary key, I end up having two one-to-one tables with different primary keys. I thought in one to one relationship, the two tables must have the same primary key?

If you choose ActivationCode as PK, then why do you have two one-to-one relations?
The only relation that's there is
AccountActivation.UserId -> Users.UserId
or what else do you think you suddenly have?
If you do what you suggested, then the Users table has its PK on UserId and the AccountActivation table has its PK on ActivationCode - not a problem at all, and there's no reason not to do it this way.
Which column (UserId or ActivationCode) you pick for the PK of AccountActivation doesn't matter - that doesn't influence / disturb the FK relationship between AccountActivation and User, nor does it add an extra one-to-one relationship of any kind .....
If you do choose ActivationCode for the PK of AccountActivation, the only extra step that I would take is creating a nonclustered index on UserId so that queries that join the two tables will benefit from maximum performance.

If there is only ever one ActivationCode per user, the two tables could share the UserId. But that implies that when a user regenerates a key you should either update the old row or delete it.
But why do you need to store such data at all? You could also derive the account activation code by some sort of computation and encryption over data unique to the User.
Just to illustrate my suggestion:
The Users table has two columns: UserId and CreationDate.
Then the token might be derived from UserId + CreationDate (for example). You would be able to generate and check it without any extra data in the database. I know that this might not suit your requirements.
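A minimal sketch of that idea, with one technique swapped in explicitly: instead of "encryption" it uses a keyed hash (HMAC-SHA256) from Python's standard library, which is a common way to make such a token hard to forge. SECRET_KEY, the message layout and both function names are assumptions for illustration only.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-real-secret"

def make_token(user_id, creation_date):
    """Derive an activation token from data already stored in the Users table."""
    message = f"{user_id}|{creation_date}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def check_token(user_id, creation_date, token):
    """Recompute the token and compare in constant time; no extra table needed."""
    return hmac.compare_digest(make_token(user_id, creation_date), token)
```

The trade-off is that such a token cannot be revoked or regenerated on its own; it only changes if the underlying user data or the secret changes.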

Make the UserId column in the AccountActivation a foreign key to the Users table.
Users
=====
UserId primary key
Name
Address
etc...
AccountActivation
=================
UserId primary key (foreign key to Users.UserId)
ActivationCode (unique constraint)
Now you have a one-to-one relationship
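Here is how that shared-key layout might behave in practice, as a sketch on Python's built-in sqlite3 (an assumption made so it runs anywhere). The helpers issue_code and user_for_code are hypothetical, and regenerating a code is modeled by replacing the user's single row.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Users (
        UserId INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL
    );
    CREATE TABLE AccountActivation (
        UserId         INTEGER PRIMARY KEY REFERENCES Users (UserId),  -- at most one row per user
        ActivationCode TEXT NOT NULL UNIQUE                            -- code also identifies the row
    );
    INSERT INTO Users VALUES (1, 'alice');
""")

def issue_code(user_id):
    """Create (or regenerate) the single activation code for a user."""
    code = secrets.token_urlsafe(16)
    with conn:
        # OR REPLACE swaps out the user's existing row, keeping the relation one-to-one.
        conn.execute(
            "INSERT OR REPLACE INTO AccountActivation (UserId, ActivationCode) VALUES (?, ?)",
            (user_id, code))
    return code

def user_for_code(code):
    """Look a user up by activation code; None if the code is unknown."""
    row = conn.execute(
        "SELECT UserId FROM AccountActivation WHERE ActivationCode = ?",
        (code,)).fetchone()
    return row[0] if row else None
```

Because UserId is the primary key of AccountActivation, a second code for the same user can only replace the first, never sit beside it.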

You need not have the same column as primary key in 2 tables to have one-to-one relationship.
You can have any column as primary key in the AccountActivation table.
UserId, which is a foreign key in the AccountActivation table, is the primary key of the Users table. So you should definitely be able to uniquely identify a user's activation code in the AccountActivation table using this column, whether or not it is the primary key of that table (but it should be unique, and I hope it is).

Is the following acceptable foreign key usage

I have the following database, the first table users is a table containing my users, userid is a primary key.
The next is my results table, now for each user, there can be a result with an id and it can be against an exam. Is it ok in this scenario to use "id" as a primary key and "userid" as a foreign key? Is there a better way I could model this scenario?
These then link to the corresponding exams...

I would probably not have userid as a varchar. I would have that as an int as well.
So the user table is like this:
userId int
userName varchar
firstName varchar
lastName varchar
And then the foreign key in the results table would be an int. Like this:
userId int
result varchar
id int
examid INT
Because if you are planning on JOINing the tables together, JOINing on a varchar is generally not as fast as JOINing on an INT.
EDIT
That depends on how much data you are planning to store. Be aware that there is a minimal chance that GUIDs are not unique (see "Simple proof that GUID is not unique"). If I were designing this database I would go with an int, because it feels like overkill to use a GUID as a userid.

Provided that each user/exam will only ever produce one result, then you could create a composite key using the userid and exam columns in the results table.
Personally though, I'd go with the arbitrary id field approach as I don't like having to pass in several values to reference records. But that's just me :).
Also, the exam field in the results table should also be a foreign key.

Another way of doing this could be to abstract the Grade Levels from the Exam, and make the Exam a unique entity (and primary key) on its own table. So this would make a Grade Levels table (pkey1 = A, pkey2 = B, etc) where the grade acts as the foreign key in your second table, thus removing an entire field.
You could also normalize out another level and make a table for Subjects, which would be the foreign key for a dedicated Exam Code table. You can have ENG101, ENG102, etc. for exams, and the same for the other exam codes for the subject. The benefit of this is to maintain your exams, subjects, students and grade levels as unique entities. The primary and foreign keys of each are evident, and you keep maintenance simple with room to scale up.
You could consider using composite keys, but this is a nice and simple way to start, and you can merge tables for indexing and compacting as required.

Please make sure you first understand Normal Forms before actually normalizing your schema.

Database Design

I am making a webapp right now and I am trying to get my head around the database design.
I have a user model(username (which is primary key), password, email, website)
I have a entry model(id, title, content, comments, commentCount)
A user can only comment on an entry once. What is the best and most efficient way to go about doing this?
At the moment, I am thinking of another table that has username (from user model) and entry id (from entry model)
**username id**
Sonic 4
Sonic 5
Knuckles 2
Sonic 6
Amy 15
Sonic 20
Knuckles 5
Amy 4
So then to list comments for entry 4 it searches for id=4.
On a side note:
Instead of storing a commentCount, would it be better to calculate the comment count from the database each time when needed?

Your design is basically sound. Your third table should be named something like UsersEntriesComments, with fields UserName, EntryID and Comment. In this table, you would have a compound primary key consisting of the UserName and EntryID fields; this would enforce the rule that each user can comment on each entry only once. The table would also have foreign key constraints such that UserName must be in the Users table, and EntryID must be in the Entries table (the ID field, specifically).
You could add an ID field to the Users table, but many programmers (myself included) advocate the use of "natural" keys where possible. Since UserNames must be unique in your system, this is a perfectly valid (and easily readable) primary key.
Update: just read your question again. You don't need the Comments or the CommentsCount fields in your Entries table. Comments would properly be stored in the UsersEntriesComments table, and the counts would be calculated dynamically in your queries (saving you the trouble of updating this value yourself).
Update 2: James Black makes a good point in favor of not using UserName as the primary key, and instead adding an artificial primary key to the table (UserID or some such). If you use UserName as the primary key, allowing a user to change their user name is more difficult, as you have to change the username in all the related tables as well.

What exactly do you mean by
entry model(id, title, content, **comments**, commentCount)
(emphasis mine)? Since it looks like you have multiple comments per entity, they should be stored in a separate table:
comments(id, entry_id, content, user_id)
entry_id and user_id are foreign keys to respective tables. Now you just need to create a unique index on (entry_id, user_id) to ensure user can only add one comment per entity.
Also, you may want to create a surrogate (numeric, generated via sequence / identity) primary key for your users table instead of making user name your PK.

Here's my recommendation for your data model:
USERS table
USER_ID (pk, int)
USER_NAME
PASSWORD
EMAIL
WEBSITE
ENTRY table
ENTRY_ID (pk, int)
ENTRY_TITLE
CONTENT
ENTRY_COMMENTS table
ENTRY_ID (pk, fk)
USER_ID (pk, fk)
COMMENT
This setup allows an ENTRY to have 0+ comments. When a comment is added, the primary key being a composite key of ENTRY_ID and USER_ID means that the pair can only exist once in the table (IE: 1, 1 won't allow 1, 1 to be added again).
Do not store counts in a table - use a VIEW for that so the number can be generated based on existing data at the time of execution.
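The recommended model, including a view for the comment count, might look like the sketch below. It assumes Python's built-in sqlite3 so it can be run directly; the view name ENTRY_COMMENT_COUNTS, the sample rows and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE USERS (USER_ID INTEGER PRIMARY KEY, USER_NAME TEXT NOT NULL UNIQUE);
    CREATE TABLE ENTRY (ENTRY_ID INTEGER PRIMARY KEY, ENTRY_TITLE TEXT NOT NULL, CONTENT TEXT);
    CREATE TABLE ENTRY_COMMENTS (
        ENTRY_ID INTEGER NOT NULL REFERENCES ENTRY (ENTRY_ID),
        USER_ID  INTEGER NOT NULL REFERENCES USERS (USER_ID),
        COMMENT  TEXT NOT NULL,
        PRIMARY KEY (ENTRY_ID, USER_ID)      -- one comment per user per entry
    );
    -- Counts are derived on demand instead of being stored on ENTRY.
    CREATE VIEW ENTRY_COMMENT_COUNTS AS
        SELECT e.ENTRY_ID AS ENTRY_ID, COUNT(c.USER_ID) AS COMMENT_COUNT
        FROM ENTRY e LEFT JOIN ENTRY_COMMENTS c ON c.ENTRY_ID = e.ENTRY_ID
        GROUP BY e.ENTRY_ID;

    INSERT INTO USERS VALUES (1, 'Sonic'), (2, 'Amy');
    INSERT INTO ENTRY VALUES (4, 'First post', '...'), (5, 'Second post', '...');
""")

def add_comment(entry_id, user_id, text):
    """Store a comment; return False if a key or foreign key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO ENTRY_COMMENTS VALUES (?, ?, ?)",
                         (entry_id, user_id, text))
        return True
    except sqlite3.IntegrityError:
        return False

def comment_count(entry_id):
    """Read the derived count for one entry; None if the entry does not exist."""
    row = conn.execute(
        "SELECT COMMENT_COUNT FROM ENTRY_COMMENT_COUNTS WHERE ENTRY_ID = ?",
        (entry_id,)).fetchone()
    return row[0] if row else None
```

The LEFT JOIN in the view is what lets an entry with no comments report a count of zero rather than disappearing from the result.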

I wouldn't use the username as a primary ID. I would make a numeric id with autoincrement.
I would use that new id in the relations table, with a unique key on the 2 fields.

Even though it isn't in the question, you may want to have a userid that is the primary key, otherwise it will be difficult if the user is allowed to change their username, or make certain people know you cannot change your username.
Make the joined table have a unique constraint on the userid and entryid. That way the database forces that there is only one comment/entry/user.
It would help if you specified a database, btw.

It sounds like you want to guarantee that the set of comments is unique with respect to username X post_id. You can do this by using a unique constraint, or if your database system doesn't support that explicitly, with an index that does the same. Here's some SQL expressing that:
CREATE TABLE users (
username VARCHAR(10) PRIMARY KEY
-- any other data (add a comma to the line above first) ...
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY
-- any other data (add a comma to the line above first) ...
);
CREATE TABLE comments (
username VARCHAR(10) REFERENCES users(username),
post_id INTEGER REFERENCES posts(post_id),
-- any other data ...
UNIQUE (username, post_id) -- Here's the important bit!
);
