SQL - how to keep track of "simple relations" - sql

I hope somebody can edit my title to better describe what I mean, because I don't know exactly what this would be called. However, consider this setup: I want to create a notification system, where a message is displayed to a user until he clicks "dismiss". I then need to "remember" that this user has dismissed the notification so I don't show it to him again. Here is my current solution
users table has a uid primary key and user info
notifications table has a nid primary key and notification text
notifications_seen table with two columns, uid and nid
When somebody clicks dismiss on a notification, I store their uid and the notification's nid in notifications_seen. This seems to work fine, but phpMyAdmin has giant red messages telling me that notifications_seen does not have an index. However, neither column is unique. Should I really have an extra utterly useless column in notifications_seen and call that a primary key? Is there a better way to do this?

You can use more than one column to create your primary key. In this case, you should set nid AND uid together as the primary key of your notifications_seen table. The idea here is that even though neither nid nor uid will be unique within your notifications_seen table, the nid/uid PAIR is unique. You should add a primary key constraint to these two columns. This is usually what you want for this kind of situation.
There are times when you might actually want to create an auto-increment column to simplify the primary key. For example, when your best candidate key consists of a lot of columns (I'm pulling this out of the air, but let's say 4 or more columns), or when it includes string columns, which would be slower to match when doing lookups. But for this situation, just adding the primary key constraint to the two columns should be more than fine.
Primary keys are indexed by default, which is why you should just add the primary key constraint to the two columns. This will also preserve the integrity of your data by making sure you don't accidentally insert rows with the same uid/nid pair.
You should also add a foreign key constraint from uid to the uid column in the users table, and a foreign key constraint from nid to the nid column in the notifications table. Adding the foreign key constraints will ensure you don't insert uids or nids which don't actually exist into your notifications_seen table.
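The advice above can be sketched as follows. This is only a minimal illustration, assuming SQLite driven from Python's sqlite3 module rather than the MySQL setup in the question; the name and body columns and the dismiss helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE users (
    uid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL             -- hypothetical column
);
CREATE TABLE notifications (
    nid  INTEGER PRIMARY KEY,
    body TEXT NOT NULL             -- hypothetical column
);
CREATE TABLE notifications_seen (
    uid INTEGER NOT NULL REFERENCES users(uid),
    nid INTEGER NOT NULL REFERENCES notifications(nid),
    PRIMARY KEY (uid, nid)         -- the pair is the key; no extra id column
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO notifications VALUES (10, 'Welcome!');
""")


def dismiss(uid, nid):
    """Record a dismissal; return False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notifications_seen (uid, nid) VALUES (?, ?)",
                (uid, nid),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = dismiss(1, 10)       # new pair, accepted
duplicate = dismiss(1, 10)   # same pair again, rejected by the primary key
orphan = dismiss(2, 10)      # uid 2 does not exist, rejected by the foreign key
```

With this layout the composite primary key provides both the index phpMyAdmin asks for and the protection against duplicate dismissals.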

You may be able to create a compound primary key (consisting of both uid and nid).

You could make an index on notifications_seen that contains both columns! Or create a separate column just for a primary key, or do both - having an index on uid and nid might speed up queries (but don't worry too much about that until you start to notice major performance problems - just remember it for the future). Having a primary key for these n:n relations isn't a terrible thing.

Related

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
What is the better way to design the primary key? Should I create an additional 'id' field, or create a composite primary key from both ids?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one or more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space, so it will probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key won't protect you from adding multiple instances of (f_id1, f_id2), so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
Yes, that's actually what people commonly do; that key is called a surrogate key. I'm not exactly sure about PostgreSQL, but in MySQL, using a surrogate key lets you delete/edit the records from the user interface. Besides, this allows the database to query the single key column faster than it could multiple columns. Hope it helps.
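To illustrate the point that a surrogate key alone does not prevent duplicate (f_id1, f_id2) pairs, here is a small sketch. It assumes SQLite via Python's sqlite3 module instead of PostgreSQL, and the table names link_surrogate_only and link_with_unique are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Surrogate key only: nothing stops repeated (f_id1, f_id2) pairs.
CREATE TABLE link_surrogate_only (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL
);
-- Surrogate key plus a unique constraint on the pair.
CREATE TABLE link_with_unique (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL,
    UNIQUE (f_id1, f_id2)
);
""")


def insert_pair_twice(table):
    """Insert the pair (1, 2) twice and return how many rows were stored."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} (f_id1, f_id2) VALUES (1, 2)")
        except sqlite3.IntegrityError:
            pass  # the unique constraint rejected the duplicate
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


rows_without_unique = insert_pair_twice("link_surrogate_only")  # duplicate kept
rows_with_unique = insert_pair_twice("link_with_unique")        # duplicate rejected
```

Either way the pair needs its own uniqueness guarantee, so the surrogate id is an addition to that constraint, not a replacement for it.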

Should a table have a PK when it has a unique FK?

I have this case :
UserSettings is not really a junction table since it only has one FK, which is gonna be unique, one UserSettings for one User. Should UserSettings have UserId marked as Primary Key even if UserId is a unique FK or is it unnecessary ?
If you want to ensure this "which is gonna be unique" requirement then you'll need to define UserID either as UNIQUE or as Primary Key constraint.
UserSettings should ideally not exist. Logically, all of this is one table.
If you wish to keep a separate table (which might be useful for performance or architecture) you should probably use the same primary key. In other words, UserSettings should use the FK as the PK. This is advantageous for performance, storage space and simplicity.
With few exceptions, every table should have a primary key. So yes, I would make it a primary key even if it is also a foreign key.
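As a rough sketch of "use the FK as the PK", assuming SQLite via Python's sqlite3 module (the question does not name a DBMS) and hypothetical Name and Theme columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Users (
    UserId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL            -- hypothetical column
);
CREATE TABLE UserSettings (
    -- UserId is both the primary key and a foreign key to Users,
    -- so each user can have at most one settings row.
    UserId INTEGER PRIMARY KEY REFERENCES Users(UserId),
    Theme  TEXT NOT NULL            -- hypothetical column
);
INSERT INTO Users VALUES (1, 'alice');
INSERT INTO UserSettings VALUES (1, 'dark');
""")


def add_settings(user_id, theme):
    """Try to add a settings row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO UserSettings VALUES (?, ?)", (user_id, theme))
        return True
    except sqlite3.IntegrityError:
        return False


second_row_for_same_user = add_settings(1, "light")  # violates the primary key
row_for_missing_user = add_settings(99, "dark")      # violates the foreign key
```

No separate id column or extra unique index is needed; the one-to-one rule is carried by the key itself.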

Should every table have a primary key?

I read somewhere saying that every table should have a primary key to fulfill 1NF.
I have a tbl_friendship table.
There are 2 fields in the table : Owner and Friend.
Fields of Owner and Friends are foreign keys of auto increment id field in tbl_user.
Should this tbl_friendship have a primary key?
Should I create an auto increment id field in tbl_friendship and make it as primary key?
Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, i.e. (Owner, Friend), especially since Owner and Friend are foreign keys to a users table rather than actual names. (Personally, my identity columns use the "Id" naming convention, so I would have (OwnerId, FriendId).)
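A minimal sketch of that two-column key, assuming SQLite via Python's sqlite3 module; the name column and the add_friend helper are hypothetical. It also shows that the key is ordered: (Owner, Friend) and (Friend, Owner) count as different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE tbl_user (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL                       -- hypothetical column
);
CREATE TABLE tbl_friendship (
    Owner  INTEGER NOT NULL REFERENCES tbl_user(id),
    Friend INTEGER NOT NULL REFERENCES tbl_user(id),
    PRIMARY KEY (Owner, Friend)
);
INSERT INTO tbl_user (name) VALUES ('Dave');  -- gets id 1
INSERT INTO tbl_user (name) VALUES ('Matt');  -- gets id 2
""")


def add_friend(owner, friend):
    """Insert a friendship row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tbl_friendship (Owner, Friend) VALUES (?, ?)",
            (owner, friend),
        )
        return True
    except sqlite3.IntegrityError:
        return False


dave_adds_matt = add_friend(1, 2)        # accepted
dave_adds_matt_again = add_friend(1, 2)  # rejected: same (Owner, Friend) pair
matt_adds_dave = add_friend(2, 1)        # accepted: the reversed pair is a different key
```

If a friendship should be symmetric, that rule has to be enforced separately; the composite key only rules out exact repeats.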
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/
Yes every table should have a primary key.
Yes, you should create a surrogate key, a.k.a. an auto-increment PK field.
You should also make "Friend" an FK to that auto increment field.
If you think that you are going to "rekey" in the future you might want to look into using natural keys, which are fields that naturally identify your data. The key to this is while coding always use the natural identifiers, and then you create unique indexes on those natural keys. In the future if you have to re-key you can, because your ux guarantees your data is consistent.
I would only do this if you absolutely have to, because it increases complexity, in your code and data model.
It is not clear from your description, but are owner and friend foreign keys, and can there be only one relationship between any given pair? If so, that makes the two foreign key columns a perfect candidate for a natural primary key.
Another option is to use surrogate key (extra auto-incremented column as you suggested). Take a look here for an in-depth discussion.
A primary key can be something abstract as well. In this case, each tuple (owner, friend), e.g. ("Dave","Matt") can form a unique entry and therefore be your primary key. In that case, it would be useful not to use names, but keys referencing another table. If you guarantee, that these tuples can't have duplicates, you have a valid primary key.
For processing reasons it might be useful to introduce a special primary key, like an autoincrement field (e.g. in MySQL) or using a sequence with Oracle.
To comply with 1NF (there is no complete agreement on what exactly defines 1NF), yes, you should have a primary key identified on each table. This is necessary to provide for uniqueness of each record.
http://en.wikipedia.org/wiki/First_normal_form
In general, you can create a primary key in many ways, one of which is to have an auto-increment column, another is to have a column with GUIDs, another is to have two or more columns that will identify a row uniquely when taken together.
Your table will be much easier to manage in the long term if it has a primary key. At the very least, you need to uniquely identify each record in the table. The field that is used to uniquely identify each record might as well be the primary key.
Yes every table should have (at least one) key. Duplicating rows in any table is undesirable for lots of reasons so put the constraint on those two columns.

How can I replace the existing primary key with a new primary key on my table?

I'm working with a legacy SQL Server database which has a core table with a bad primary key.
The key is of type NVARCHAR(50) and contains an application-generated string based on various things in the table. For obvious reasons, I'd like to replace this key with an auto-incrementing (identity) INT column.
This is a huge database and we're upgrading it piece-by-piece. We want to minimize the changes to tables that other components write to. I figured I could change the table without breaking anything by just:
1. Adding the new Id column to the table and making it nullable
2. Filling it with unique integers and making it NOT NULL
3. Dropping the existing primary key while ensuring there's a uniqueness constraint still on that column
4. Setting the new Id column to be the new primary key and identity
Item 3 is proving very painful. Because this is a core table, there are a lot of other tables with foreign key constraints on it. To drop the existing primary key, it seems I have to delete all these foreign key constraints and create them again afterwards.
Is there an easier way to do this or will I just have to script everything?
Afraid that is the bad news. We just got through a big project of doing the same type of thing, although our head DBA had a few tricks up his sleeve. You might look at something like this to get your scripts generated for the flipping of the switch:
I once did the same thing and basically used the process you describe. Except of course you have to first visit each other table and add new foreign key pointing to the new column in your base table
So the approach I used was
Add a new column with an auto incrementing integer in the base table, ensure it has a unique index on it (to be replaced later by the primary key)
For each foreign key relationship pointing to the base table add a new column in the child table. (note this can result in adding more than one column in the child table if more than one relationship)
For each instance of a key in the child table enter a value into the new foreign key field(s)
Replace your foreign key relationships such that the new column now serves
Make the new column in the base table the primary
Drop the old primary key in the base table and each old foreign key in the children.
It is doable and not as hard as it might sound at first. The crux is a series of update statements for the child tables, of this nature:
UPDATE child_table
SET new_column = (SELECT b.new_primary
                  FROM base b
                  WHERE b.old_primary = child_table.old_foreign)
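The backfill step can be tried out on toy data as follows. This is only a sketch, assuming SQLite via Python's sqlite3 module rather than SQL Server; the child_id column and the sample key strings are made up for illustration, while the other names follow the update statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: old string key plus the new integer key (already filled in).
CREATE TABLE base (
    old_primary TEXT PRIMARY KEY,
    new_primary INTEGER UNIQUE
);
-- Child table: still points at the old key; new_column is empty for now.
CREATE TABLE child_table (
    child_id    INTEGER PRIMARY KEY,          -- hypothetical column
    old_foreign TEXT NOT NULL REFERENCES base(old_primary),
    new_column  INTEGER
);
INSERT INTO base VALUES ('ACME-2009-X', 1);   -- made-up sample keys
INSERT INTO base VALUES ('ACME-2010-Y', 2);
INSERT INTO child_table (child_id, old_foreign) VALUES (100, 'ACME-2010-Y');
INSERT INTO child_table (child_id, old_foreign) VALUES (101, 'ACME-2009-X');
INSERT INTO child_table (child_id, old_foreign) VALUES (102, 'ACME-2010-Y');

-- Correlated subquery: each child row looks up the new key of its own parent.
UPDATE child_table
SET new_column = (
    SELECT b.new_primary
    FROM base AS b
    WHERE b.old_primary = child_table.old_foreign
);
""")

backfilled = conn.execute(
    "SELECT child_id, new_column FROM child_table ORDER BY child_id"
).fetchall()
```

Once every child row has its new_column populated, the new foreign key constraints can be created and the old ones dropped.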

SQL: To primary key or not to primary key?

I have a table with sets of settings for users, it has the following columns:
UserID INT
Set VARCHAR(50)
Key VARCHAR(50)
Value NVARCHAR(MAX)
TimeStamp DATETIME
UserID together with Set and Key are unique. So a specific user cannot have two of the same keys in a particular set of settings. The settings are retrieved by set, so if a user requests a certain key from a certain set, the whole set is downloaded, so that the next time a key from the same set is needed, it doesn't have to go to the database.
Should I create a primary key on all three columns (userid, set, and key) or should I create an extra field that has a primary key (for example an autoincrement integer called SettingID, bad idea i guess), or not create a primary key, and just create a unique index?
----- UPDATE -----
Just to clear things up: this is an end-of-the-line table; it is not joined in any way. UserID is a FK to the Users table. Set is not a FK. It is pretty much a helper table for my GUI.
Just as an example: the first time users visit certain parts of the website, they get a help balloon, which they can close if they want. Once they click it away, I will add a setting to the "GettingStarted" set stating that help balloon X has been disabled. The next time the user comes to the same page, the setting will indicate that help balloon X should not be shown anymore.
Having composite unique keys is mostly not a good idea.
Having any business-relevant data as the primary key can also cause you trouble, for instance if you need to change the value. Even if the application does not allow changing the value today, it might in the future, or the value might have to be changed in an upgrade script.
It's best to create a surrogate key, an automatically generated number which does not have any business meaning.
Edit after your update:
In this case, you can think of the table as conceptually having no primary key, and make these three columns either the primary key or a composite unique key (to keep them changeable).
Should I create a primary key on all three columns (userid, set, and key)
Make this one.
Using surrogate primary key will result in an extra column which is not used for other purposes.
Creating a UNIQUE INDEX along with surrogate primary key is same as creating a non-clustered PRIMARY KEY, and will result in an extra KEY lookup which is worse for performance.
Creating a UNIQUE INDEX without a PRIMARY KEY will result in a HEAP-organized table which will need an extra RID lookup to access the values: also not very good.
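A small sketch of the three-column primary key and the "load the whole set" access pattern from the question. It assumes SQLite via Python's sqlite3 module rather than SQL Server, so the clustered-index and heap behaviour described above is not reproduced; the table name UserSettings and both helper functions are hypothetical, and INSERT OR REPLACE is SQLite-specific syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserSettings (            -- hypothetical table name
    UserID    INTEGER NOT NULL,
    "Set"     TEXT NOT NULL,           -- quoted: SET and KEY are SQL keywords
    "Key"     TEXT NOT NULL,
    Value     TEXT,
    TimeStamp TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (UserID, "Set", "Key")
);
""")


def save_setting(user_id, set_name, key, value):
    """Insert a setting, or overwrite it if the same key already exists."""
    conn.execute(
        'INSERT OR REPLACE INTO UserSettings (UserID, "Set", "Key", Value) '
        "VALUES (?, ?, ?, ?)",
        (user_id, set_name, key, value),
    )


def load_set(user_id, set_name):
    """Fetch every key/value of one set in a single query."""
    rows = conn.execute(
        'SELECT "Key", Value FROM UserSettings WHERE UserID = ? AND "Set" = ?',
        (user_id, set_name),
    )
    return dict(rows.fetchall())


save_setting(1, "GettingStarted", "HelpBalloonX", "shown")
save_setting(1, "GettingStarted", "HelpBalloonX", "dismissed")  # same key: overwritten
save_setting(1, "GettingStarted", "HelpBalloonY", "shown")
getting_started = load_set(1, "GettingStarted")
```

Because (UserID, Set) is the leading part of the key, the per-set lookup can use the primary key index directly.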
How many Keys and Sets do you have? Do these need to be varchar(50), or can they point to a lookup table? If you can convert Set and Key into SetId and KeyId, then you can create your primary key on the 3 integer values, which will be much faster.
I would probably try to make sure that UserID was a unique identifier, rather than having duplicates of UserID throughout the code. Composite keys tend to get confusing later on in your code's life.
I'm assuming this is a lookup table for config values of some kind, so you could probably go with the composite key if this is the case. The data is already there. You can guarantee its uniqueness using the primary key. If you change your mind and decide later that it isn't appropriate for you, you can easily add a SettingId and make the original composite key a unique index.
Create one separate primary key. No matter how the business logic changes, or what new rules have to be applied to your Key VARCHAR(50) field, having one primary key will make you completely independent of business logic.
In my experience it all depends how many tables will be using this table as FK information. Do you want 3 extra columns in your other tables just to carry over a FK?
Personally I would create another FK column and put a unique constraint over the other three columns. This makes foreign keys to this table a lot easier to swallow.
I'm not a proponent of composite keys, but in this case, as an end-of-the-line table, it might make sense. However, if you allow nulls in any of these three fields because one or more of the values is not known at the time of the insert, there can be difficulty, and a unique index might be better.
Better to make UserID a uniqueidentifier generated with newid(), because an int UserID gives users a hint about other probable UserIDs. This will also solve your issue of the composite key.