Database Design - sql

I am making a webapp right now and I am trying to get my head around the database design.
I have a user model(username (which is primary key), password, email, website)
I have an entry model(id, title, content, comments, commentCount)
A user can only comment on an entry once. What is the best and most efficient way to go about doing this?
At the moment, I am thinking of another table that has username (from user model) and entry id (from entry model)
**username id**
Sonic 4
Sonic 5
Knuckles 2
Sonic 6
Amy 15
Sonic 20
Knuckles 5
Amy 4
So then to list comments for entry 4 it searches for id=4.
On a side note:
Instead of storing a commentCount, would it be better to calculate the comment count from the database each time when needed?

Your design is basically sound. Your third table should be named something like UsersEntriesComments, with fields UserName, EntryID and Comment. In this table, you would have a compound primary key consisting of the UserName and EntryID fields; this would enforce the rule that each user can comment on each entry only once. The table would also have foreign key constraints such that UserName must be in the Users table, and EntryID must be in the Entries table (the ID field, specifically).
You could add an ID field to the Users table, but many programmers (myself included) advocate the use of "natural" keys where possible. Since UserNames must be unique in your system, this is a perfectly valid (and easily readable) primary key.
Update: just read your question again. You don't need the Comments or the CommentsCount fields in your Entries table. Comments would properly be stored in the UsersEntriesComments table, and the counts would be calculated dynamically in your queries (saving you the trouble of updating this value yourself).
Update 2: James Black makes a good point in favor of not using UserName as the primary key, and instead adding an artificial primary key to the table (UserID or some such). If you use UserName as the primary key, allowing a user to change their user name is more difficult, as you have to change the username in all the related tables as well.

What exactly do you mean by
entry model(id, title, content, **comments**, commentCount)
(emphasis mine)? Since it looks like you have multiple comments per entity, they should be stored in a separate table:
comments(id, entry_id, content, user_id)
entry_id and user_id are foreign keys to respective tables. Now you just need to create a unique index on (entry_id, user_id) to ensure user can only add one comment per entity.
Also, you may want to create a surrogate (numeric, generated via sequence / identity) primary key for your users table instead of making user name your PK.
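A minimal sketch of that layout, assuming table names users and entries and illustrative column types (only the comments columns come from the description above; everything else is a placeholder to adapt to your database):

```sql
-- Surrogate numeric key on users; the user name stays unique but is no longer the PK.
CREATE TABLE users (
    id        INTEGER NOT NULL PRIMARY KEY,
    username  VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE entries (
    id     INTEGER NOT NULL PRIMARY KEY,
    title  VARCHAR(200) NOT NULL
);

CREATE TABLE comments (
    id        INTEGER NOT NULL PRIMARY KEY,
    entry_id  INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    content   VARCHAR(2000) NOT NULL,
    FOREIGN KEY (entry_id) REFERENCES entries (id),
    FOREIGN KEY (user_id)  REFERENCES users (id)
);

-- The unique index is what enforces "one comment per user per entry".
CREATE UNIQUE INDEX comments_entry_user_uq ON comments (entry_id, user_id);
```

How the id values get generated (sequence, identity, auto-increment) depends on the database, so that part is left out here.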

Here's my recommendation for your data model:
USERS table
USER_ID (pk, int)
USER_NAME
PASSWORD
EMAIL
WEBSITE
ENTRY table
ENTRY_ID (pk, int)
ENTRY_TITLE
CONTENT
ENTRY_COMMENTS table
ENTRY_ID (pk, fk)
USER_ID (pk, fk)
COMMENT
This setup allows an ENTRY to have 0+ comments. When a comment is added, the primary key being a composite key of ENTRY_ID and USER_ID means that the pair can only exist once in the table (IE: 1, 1 won't allow 1, 1 to be added again).
Do not store counts in a table - use a VIEW for that so the number can be generated based on existing data at the time of execution.
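As a rough sketch of such a view, using the table and column names from the model above (the view name ENTRY_COMMENT_COUNTS and the COMMENT_COUNT alias are made up for illustration):

```sql
CREATE VIEW ENTRY_COMMENT_COUNTS AS
SELECT e.ENTRY_ID,
       COUNT(ec.USER_ID) AS COMMENT_COUNT   -- counts only matched comment rows
FROM ENTRY e
LEFT JOIN ENTRY_COMMENTS ec
       ON ec.ENTRY_ID = e.ENTRY_ID          -- LEFT JOIN keeps entries with 0 comments
GROUP BY e.ENTRY_ID;
```

The count for a single entry can then be read with something like SELECT COMMENT_COUNT FROM ENTRY_COMMENT_COUNTS WHERE ENTRY_ID = 4.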

I wouldn't use the username as a primary ID. I would make a numeric id with autoincrement.
I would use that new id in the relations table, with a unique key on the 2 fields.

Even though it isn't in the question, you may want a userid as the primary key; otherwise it will be difficult to let users change their username, and you would have to make sure people know that usernames cannot be changed.
Make the joined table have a unique constraint on the userid and entryid. That way the database forces that there is only one comment/entry/user.
It would help if you specified a database, btw.

It sounds like you want to guarantee that the set of comments is unique with respect to username X post_id. You can do this by using a unique constraint, or if your database system doesn't support that explicitly, with an index that does the same. Here's some SQL expressing that:
CREATE TABLE users (
username VARCHAR(10) PRIMARY KEY,
-- any other data ...
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
-- any other data ...
);
CREATE TABLE comments (
username VARCHAR(10) REFERENCES users(username),
post_id INTEGER REFERENCES posts(post_id),
-- any other data ...
UNIQUE (username, post_id) -- Here's the important bit!
);
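To see the constraint at work, here is a small illustration that assumes the three tables above contain only the columns shown (the sample values are taken from the question):

```sql
INSERT INTO users (username) VALUES ('Sonic');
INSERT INTO posts (post_id) VALUES (4);

-- First comment by Sonic on post 4: accepted.
INSERT INTO comments (username, post_id) VALUES ('Sonic', 4);

-- Second comment by Sonic on the same post: rejected by the database,
-- because it violates UNIQUE (username, post_id).
INSERT INTO comments (username, post_id) VALUES ('Sonic', 4);
```

The exact error text differs between database systems, but the second insert should fail in all of them.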

Related

Performance of primary/foreign key versus single table with no primary key

If I have a user that can be associated with multiple keys would the proper table setup be:
One table with two columns such as:
UserName | Key
where there is no primary key and a user can have multiple rows, or:
Two tables with a matching identifier
Table 1 UserName | UserId
Table 2 Key | UserId
where UserId is the primary key of table1 and the foreign key of table 2.
Which way is more preferred if I wanted to find all the keys associated with a user?
If you wanted to find all the keys associated with a given user you might use the following JOIN query:
-- Key and keys are reserved words in MySQL, so they are quoted with backticks
SELECT k.`Key`
FROM `keys` k INNER JOIN users u
ON k.UserId = u.UserId
WHERE u.UserName = 'username';
The place that would benefit most from an index in this case is the UserId column in each of the two tables. With such an index, looking up the keys for a given user takes roughly logarithmic time in the size of the table, instead of time proportional to the whole table.
Without any indexes, MySQL has to do a full table scan for each user as it tries to find the keys corresponding to that user.
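A possible way to create those indexes in MySQL, assuming the two-table layout from the question with tables named users and keys (the index names are made up for illustration):

```sql
-- Index the foreign key column so the keys of one user can be found without a scan.
-- `keys` is quoted because KEYS is a reserved word in MySQL.
CREATE INDEX idx_keys_userid ON `keys` (UserId);

-- users.UserId is the primary key, so it is already indexed.
-- Optional: the query above filters on UserName, so an index there helps too.
CREATE INDEX idx_users_username ON users (UserName);
```

Running EXPLAIN on the JOIN query before and after is a simple way to check whether MySQL actually uses the indexes.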
There is nothing common or uncommon in this case; it all depends on your business requirements. If a user can have multiple usernames, you need a table that links all of a user's usernames together, identified by a userId, and this userId should be the identifier of the user throughout your database design. In that case you need two tables:
UserDetails, which will contain the user info such as name, age, birth date, etc., and UserNames, which will contain at least one userName for each user.
Otherwise, you can use the same UserDetails table to store the userName along with the rest of the user details.
So, in your case, use a separate table to store userNames. Here is why, with an example:
Suppose you have a user with two userNames.
If you use one table to store the user info together with the userName, your data will look like this:
Name BirthDate OtherDetails UserName AnotherDetails
user 1/1/1990 blah blah.. user1 blah blah...
user 1/1/1990 blah blah.. user2 blah blah...
As you can see, the data in the above table is repeated.
If you use two tables instead, the user details are stored only once and the data size is reduced.
This is called database normalization.
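A minimal sketch of the two-table version, using the UserDetails and UserNames tables named above (the column types and the choice of UserName as primary key of the second table are assumptions):

```sql
-- One row per user: details are stored exactly once.
CREATE TABLE UserDetails (
    UserId    INT NOT NULL PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL,
    BirthDate DATE
);

-- One row per user name: a user with two names gets two rows here,
-- but the details above are not repeated.
CREATE TABLE UserNames (
    UserName VARCHAR(50) NOT NULL PRIMARY KEY,
    UserId   INT NOT NULL,
    FOREIGN KEY (UserId) REFERENCES UserDetails (UserId)
);
```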
Without an understanding of the entities and attributes you are attempting to model, it's not really possible to give you an answer to the question you asked.
Entity Relationship Modeling
What entities does your data model represent? An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the system, and we can store information about.
From the description given in the question, we are thinking that a "user" is an entity. And maybe "key" is also an entity. We can't really tell from the description whether that's an entity, or whether it's a repeating (multi-valued) attribute.
What uniquely identifies a "user"?
What attributes do we need/want to store about a "user"?
The second part is understanding the relationships between the entities.
To do that, we need to ask and get answers to some questions, such as:
How many "users" can be associated with a specific "key"?
Does a "key" have to be related to a user, or can a key be related to zero users?
Can a "key" be uniquely identified, apart from a user?
And so on.
Based on those answers, we can start to put together a model, and evaluate how well that model represents the problem, and how well that model is going to work for our expected use cases.
If both "user" and "key" are entities, and there is a many-to-many relationship between the entities, the model for that is going to look different than if "key" is not an entity, but just a multi-valued attribute.
If a key must "belong" to one and only one user, and a user can "hold" zero, one or more keys, likely it's a multivalued attribute. Then we need two tables. One "parent" table for the "user" entity, and another "child" table to store the repeating attribute.
We don't know (yet) what set of attributes uniquely identifies a user, so we'll represent that with a generic "userid" attribute of some unspecified datatype.
user
-----
userid datatype NOT NULL PRIMARY KEY
name varchar(30) NOT NULL
e.g.
userid name
------ ------
psimon paul simon
agarfu art garfunkel
To store a multi-valued attribute, we use the PRIMARY KEY of the entity table as a foreign key in our second "child" table.
user_key
--------
userid datatype NOT NULL FOREIGN KEY ref user.userid
key VARCHAR(30) NOT NULL
e.g.
user_key
userid key
------- -------
psimon G major
psimon A major
psimon A major
psimon B minor
agarfu A major
If we decide that "user" will have a different column as the primary key, then we'd use that same column as the foreign key in the child table.
In this example, we've allowed "duplicate" values for "key" for a given user. If we only want distinct values of "key" for a user, we'd need to add a UNIQUE constraint on the (userid, key) tuple.
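A sketch of adding that constraint in MySQL syntax (the constraint name user_key_ux is made up; key is quoted with backticks because it is a reserved word in MySQL, while other databases would typically use double quotes):

```sql
-- Any existing duplicates, such as the second (psimon, A major) row above,
-- have to be removed first or this statement will fail.
ALTER TABLE user_key
  ADD CONSTRAINT user_key_ux UNIQUE (userid, `key`);
```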
Before we get too worried about performance, we need to be concerned with getting some workable data models. From there, we can translate that into some implementations, and evaluate performance characteristics of each of those.
If the implementation has tables that don't have a suitable primary key, we can introduce another column to stand in as a "surrogate" primary key.
So long as your table has a unique PK you're basically correct and somewhere on the spectrum of "perfect" to "could do better".
In your first case, you're still correct, just that the PK is both UserName and Key.
The second one is more common and probably more correct because sooner or later you'll want things against users that bear no relation to the key and logically fit on the UserName table.

Identity column separate from composite primary key

I have a table representing soccer matches:
Date
Opponent
I feel {Date,Opponent} is the primary key because in this table there can never be more than one opponent per date. The problem is that when I create foreign key constraints in other tables, I have to include both Date and Opponent columns in the other tables:
Soccer game statistics table:
Date
Opponent
Event (Goal scored, yellow card etc)
Ideally I would like to have:
Soccer matches table:
ID
Date
Opponent
Soccer match statistics table:
SoccerMatchID
Event (Goal scored, yellow card etc)
where SoccerMatch.ID is a unique ID (but not the primary key) and {Date,Opponent} is still the primary key.
The problem is SQL Server doesn't seem to let me define ID as being a unique identity whilst {Date,Opponent} is the primary key. When I go to the properties for ID, the part signalling unique identifying is grayed-out with "No".
(I assume everyone agrees I should try to achieve the above as it's a better design?)
I think most people don't use the graphical designer to do this, as it's the graphical designer that's preventing it, not SQL Server. Try running DDL in a query window:
ALTER TABLE dbo.YourTable ADD ID INT IDENTITY(1,1);
GO
CREATE UNIQUE INDEX yt_id ON dbo.YourTable(ID);
GO
Now you can reference this column in other tables no problem:
CREATE TABLE dbo.SomeOtherTable
(
MatchID INT FOREIGN KEY REFERENCES dbo.YourTable(ID)
);
That said, I find the column name ID completely useless. If it's a MatchID, why not call it MatchID everywhere it appears in the schema? Yes it's redundant in the PK table but IMHO consistency throughout the model is more important.
For that matter, why is your table called SoccerMatch? Do you have other kinds of matches? I would think it would be Matches with a unique ID = MatchID. That way if you later have different types of matches you don't have to create a new table for each sport - just add a type column of some sort. If you only ever have soccer, then SoccerMatch is kind of redundant, no?
Also I would suggest that the key and unique index be the other way around. If you're not planning to use the multi-column key for external reference then it is more intuitive, at least to me, to make the PK the thing you do reference in other tables. So I would say:
CREATE TABLE dbo.Matches
(
MatchID INT IDENTITY(1,1),
EventDate DATE, -- Date is also a terrible name and it's reserved
Opponent <? data type ?> / FK reference?
);
ALTER TABLE dbo.Matches ADD CONSTRAINT PK_Matches
PRIMARY KEY (MatchID);
ALTER TABLE dbo.Matches ADD CONSTRAINT UQ_Date_Opponent
UNIQUE (EventDate, Opponent);
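With MatchID as the primary key, the statistics table from the question only needs the single MatchID column to point at a match. A rough sketch, where the table name MatchEvents and its column names and types are assumptions:

```sql
CREATE TABLE dbo.MatchEvents
(
    MatchEventID INT IDENTITY(1,1) NOT NULL,
    MatchID      INT NOT NULL,
    EventType    VARCHAR(50) NOT NULL,  -- e.g. 'Goal scored', 'Yellow card'
    CONSTRAINT PK_MatchEvents PRIMARY KEY (MatchEventID),
    CONSTRAINT FK_MatchEvents_Matches
        FOREIGN KEY (MatchID) REFERENCES dbo.Matches (MatchID)
);
```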

If two tables have a "One To One" relationship, should they have the same column as primary key?

Let's say I have a table called Users which represents registered users of a website. I also have an AccountActivation table which stores the randomly generated string sent to a new user's email to verify that email.
The AccountActivation table has UserId column which also happen to be the primary key for the Users table. It also has the ActivationCode column to store the code. Either column could uniquely identify a row in the AccountActivation table.
So if I choose the activation code column as the primary key, I end up having two one-to-one tables with different primary keys. I thought in one to one relationship, the two tables must have the same primary key?
If you choose ActivationCode as PK, then why do you have two one-to-one relations?
The only relation that's there is
AccountActivation.UserId -> Users.UserId
or what else do you think you suddenly have?
If you do what you suggested, then the table Users has its PK on UserId and table AccountActivation has its PK on ActivationCode - not a problem at all, and there's no reason not to do it this way.
Which column (UserId or ActivationCode) you pick for the PK of AccountActivation doesn't matter - that doesn't influence / disturb the FK relationship between AccountActivation and User, nor does it add an extra one-to-one relationship of any kind .....
If you do choose ActivationCode for the PK of AccountActivation, the only extra step that I would take is creating a nonclustered index on UserId so that queries that join the two tables will benefit from maximum performance.
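In SQL Server syntax that index could look like the following sketch (the index name is made up, and the dbo schema is an assumption):

```sql
CREATE NONCLUSTERED INDEX IX_AccountActivation_UserId
    ON dbo.AccountActivation (UserId);
```

If each user can have at most one activation row, declaring it as CREATE UNIQUE NONCLUSTERED INDEX instead would also enforce the one-to-one rule.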
If there is only ever one ActivationCode per user, the two tables could share the UserId. But that would imply that when a user regenerates a key, you should either update the old row or delete it.
But why do you need to store such data at all? You could also derive the account activation code through some sort of computation and encryption over data unique to the User.
Just to illustrate my suggestion:
Users table has two columns UserId CreationDate
Then the token might be UserId + CreationDate (example). You would be able to generate and check it without the extra data in the database. I know that this might not suit your requirements.
Make the UserId column in the AccountActivation a foreign key to the Users table.
Users
=====
UserId primary key
Name
Address
etc...
AccountActivation
=================
UserId primary key (foreign key to Users.UserId)
ActivationCode (unique constraint)
Now you have a one-to-one relationship
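Written out as DDL, the outline above might look like this sketch (the column types, including the CHAR(32) length for the code, are assumptions):

```sql
CREATE TABLE Users (
    UserId INT NOT NULL PRIMARY KEY,
    Name   VARCHAR(100) NOT NULL
);

CREATE TABLE AccountActivation (
    -- Primary key and foreign key at once: at most one activation row per user.
    UserId         INT NOT NULL PRIMARY KEY,
    ActivationCode CHAR(32) NOT NULL UNIQUE,
    FOREIGN KEY (UserId) REFERENCES Users (UserId)
);
```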
You need not have the same column as primary key in 2 tables to have one-to-one relationship.
You can have any column as primary key in the AccountActivation table.
UserId, which is the foreign key in the AccountActivation table, is the primary key in the Users table. So you should definitely be able to uniquely identify a user's activation code from the AccountActivation table using this column, whether or not it is the primary key of that table (but it should be unique there, and I hope it is).

How to enforce uniques across multiple tables

I have the following tables in MySQL server:
Companies:
- UID (unique)
- NAME
- other relevant data
Offices:
- UID (unique)
- CompanyID
- ExternalID
- other data
Employees:
- UID (unique)
- OfficeID
- ExternalID
- other data
In each one of them the UID is unique identifier, created by the database.
There are foreign keys to ensure the links between Employee -> Office -> Company on the UID.
The ExternalID fields in Offices and Employees are the IDs provided to my application by the Company (my client(s), actually). The clients do not have (and do not care about) my own IDs, and all the data my application receives from them is identified solely by their IDs (i.e. ExternalID in my tables).
I.e. a request from the client in pseudo-language is like "I'm Company X, update the data for my employee Y".
I need to enforce uniqueness on the combination of CompanyID and Employees.ExternalID, so in my database there will be no duplicate ExternalID for the employees of the same company.
I was thinking about 3 possible solutions:
(1) Change the schema for Employees to include CompanyID, and create a unique constraint on the two fields.
(2) Add a trigger which, upon update/insert in Employees, validates the uniqueness.
(3) Enforce the check at the application level (i.e. in my receiving service).
The alternative DB admin in me says that (3) is the worst solution, as it does not protect the database from inconsistency in case of an application bug or something else, and it will most probably be the slowest one.
The trigger solution may be what I want, but it may become complicated, especially if multiple inserts/updates need to be performed in a single statement, and I'm not sure about the performance vs. (1).
And (1) looks like the fastest and easiest approach, but it kind of goes against my understanding of the relational model.
What is the opinion of SO DB experts on the pros and cons of each approach, especially if there is a possibility of adding an additional level of indirection (i.e. Company -> Office -> Department -> Employee) while the same uniqueness needs to be preserved (Company/Employee)?
You're right - #1 is the best option.
Granted, I would question it at first glance (because of shortcutting) but knowing the business rule to ensure an employee is only related to one company - it makes sense.
Additionally, I'd have a foreign key relating the companyid in the employee table to the companyid in the office table. Otherwise, you allow an employee to be related to a company without an office. Unless that is acceptable...
Triggers are a last resort if the relationship can not be demonstrated in the data model, and servicing the logic from the application means the logic is centralized - there's no opportunity for bad data to occur, unless someone drops constraints (which means you have bigger problems).
Each of your company-provided tables should include CompanyID in the UNIQUE KEY over the company-provided IDs.
Company-provided referential integrity should use company-provided ids:
CREATE TABLE company (
uid INT NOT NULL PRIMARY KEY,
name TEXT
);
CREATE TABLE office (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
externalID INT NOT NULL,
UNIQUE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company (uid)
);
CREATE TABLE employee (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
officeID INT NOT NULL,
externalID INT NOT NULL,
UNIQUE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company (uid),
FOREIGN KEY (companyID, officeID) REFERENCES office (companyID, externalID)
);
etc.
Set auto_increment_increment to the number of tables you have.
SET auto_increment_increment = 3; (you might want to set this in your my.cnf)
Then manually set the starting auto_increment value of each table to different values
first table to 1, second table to 2, third table to 3
Table 1 will have values like 1,4,7,10,13,etc
Table 2 will have values like 2,5,8,11,14,etc
Table 3 will have values like 3,6,9,12,15,etc
Of course this is just ONE option, personally I'd just make it a combo value. Could be as simple as TableID, AutoincrementID, Where the TableID is constant in all rows.

implementing UNIQUE across linked tables in MySQL

a USER is a PERSON and a PERSON has a COMPANY - user -> person is one-to-one, person -> company is many-to-one.
person_id is FK in USER table.
company_id is FK in PERSON table.
A PERSON may not be a USER, but a USER is always a PERSON.
If company_id was in user table, I could create a unique key based on username and company_id, but it isn't, and would be a duplication of data if it was.
Currently, I'm implementing the unique username/company ID rule in the RoseDB manager wrapper code, but it feels wrong. I'd like to define the unique rule at the DB level if I can, but I'm not sure exactly how to approach it. I tried something like this:
alter table user add unique(used_id,person.company_id);
but that doesn't work.
By reading through the documentation, I can't find an example that does anything even remotely similar. Am I trying to add functionality that doesn't exist, or am I missing something here?
Well, there's nothing simple that does what you want. You can probably enforce the constraint you need using BEFORE INSERT and BEFORE UPDATE triggers, though. See this SO question about raising MySQL errors for how to handle making the triggers fail.
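As a rough sketch of the BEFORE INSERT half, assuming the user table has username and person_id columns and the person table has person_id and company_id (the trigger name and the error message are made up, and SIGNAL requires MySQL 5.5 or later):

```sql
DELIMITER //

CREATE TRIGGER user_username_per_company_bi
BEFORE INSERT ON user
FOR EACH ROW
BEGIN
  -- Look for an existing user with the same username whose person
  -- belongs to the same company as the person being inserted.
  IF EXISTS (
    SELECT 1
    FROM user u
    JOIN person p  ON p.person_id  = u.person_id
    JOIN person np ON np.person_id = NEW.person_id
    WHERE u.username   = NEW.username
      AND p.company_id = np.company_id
  ) THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'username already used in this company';
  END IF;
END//

DELIMITER ;
```

A matching BEFORE UPDATE trigger would be needed as well, and it would have to exclude the row being updated from the check. Note that a check like this is weaker than a real UNIQUE constraint: two concurrent transactions can both pass the test before either commits.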
Are there more attributes to your PERSON table? Reason I ask is that what you want to implement is a typical corollary table:
USERS table:
user_id (pk)
USER_COMPANY_XREF (nee PERSON) table:
user_id (pk, fk)
company_id (pk, fk)
EFFECTIVE_DATE (not null)
EXPIRY_DATE (not null)
COMPANIES table:
company_id (pk)
The primary key of the USER_COMPANY_XREF table being a composite key of USERS.user_id and COMPANIES.company_id would allow you to associate a user with more than one company while not duplicating data in the USERS table, and provide referential integrity.
You could define the UNIQUE constraint in the Person table:
CREATE TABLE Company (
company_id SERIAL PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE Person (
person_id SERIAL PRIMARY KEY,
company_id BIGINT UNSIGNED,
UNIQUE KEY (person_id, company_id),
FOREIGN KEY (company_id) REFERENCES Company (company_id)
) ENGINE=InnoDB;
CREATE TABLE User (
person_id BIGINT UNSIGNED PRIMARY KEY,
FOREIGN KEY (person_id) REFERENCES Person (person_id)
) ENGINE=InnoDB;
But actually you don't need the unique constraint even in the Person table, because person_id is already unique on its own. There's no way a given person_id could reference two companies.
So I'm not sure what problem you're trying to solve.
Re your comment:
That doesn't solve the issue of allowing the same username to exist in different companies.
So you want a given username to be unique within one company, but usable in different companies? That was not clear to me from your original question.
So if you don't have many other attributes specific to users, I'd combine User with Person and add an "is_user" column. Or just rely on it being implicitly true that a Person with a non-null cryptpass is by definition a User.
Then your problem with cross-table UNIQUE constraints goes away.
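A sketch of what the combined table might look like, reusing the Company table from above (the username column, its type, and the type of cryptpass are assumptions):

```sql
CREATE TABLE Person (
  person_id  SERIAL PRIMARY KEY,
  company_id BIGINT UNSIGNED NOT NULL,
  username   VARCHAR(50) NULL,    -- NULL for a Person who is not a User
  cryptpass  VARCHAR(255) NULL,   -- non-NULL implies the Person is a User
  -- Same username may appear in different companies, but only once per company.
  UNIQUE KEY (company_id, username),
  FOREIGN KEY (company_id) REFERENCES Company (company_id)
) ENGINE=InnoDB;
```

Because MySQL allows multiple NULLs in a UNIQUE index, any number of non-user Persons in the same company can leave username NULL without violating the constraint.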