I just have a quick question. Can a table have it's only primary key as a foreign key?
To clarify. When I've been creating tables I sometimes have a table with multiple keys where some of them are foreign keys. For example:
create table Pet(
Name varchar(20),
Owner char(1),
Color varchar(10),
primary key(Name, Owner),
foreign key(Owner) referecnes Person(Ssn)
);
So now I'm wondering if it's possible to do something like this:
create table WorksAs(
Worker char(1),
Work varcahr(30),
primary key(Worker),
foreign key(Worker) references Person(Ssn)
);
This would result in two tables having the exact same primary key. Is this something that should be avoided or is it an ok way to design a database? If the above is not a good standard I would simply make the Work variable a primary key as well and that would be fine, but it seems simpler to just skip if it is not needed.
Yes, it's perfectly legal to do that.
In fact, this is the basis of IS-A relations ;)
Yes. Because of the following reasons.
Making them the primary key will force uniqueness (as opposed to imply it).
The primary key will presumably be clustered (depending on the dbms) which will improve performance for some queries.
It saves the space of adding a unique constraint which in some DBMS also creates a unique index
Yes, you might do so. But you need to be careful as foreign keys can have NULL values whereas Primary can't.
Sure. You can use this approach when mapping inheritance hierarchies using the Concrete Table Inheritance or Class Table Inheritance approach, see e.g. SQL Alchemy docs
Related
I recently created a relational database model and it has a lot of primary key and foreign key relations. I want to use clickhouse for my database but it turns out that clickhouse does not support foreign key and unique primary keys. Can someone tell me if I am missing anything here.
You are right. CH does not have unique & foreign constraints.
Moreover JOINs are not the best part of ClickHouse.
ClickHouse suggests to create single wide denormalized table and avoid joins as possible.
I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, then data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one more more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space so probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key wont protect you from adding multiple instances of (f_id1, f_id2) so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
Yes that's actually what people commonly do, that key is called surrogate key.. I'm not exactly sure with PostgreSQL, but in MySQL by using surrogate key you can delete/edit the records from the user interface.. Besides, this allows the database to query the single key column faster than it could multiple columns.. Hope it helps..
I was hoping someone could explain to me the purpose of the SQL keyword REFERENCES
CREATE TABLE wizards(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
age INTEGER
, color TEXT);
CREATE TABLE powers(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name STRING,
damage INTEGER,
wizard_id INTEGER REFERENCES wizards(id)
);
I've spent a lot of time trying to look this up and I initially thought that it would constrain the type of data you can enter into the powers table (based on whether the wizard_id ) However, I am still able to insert data into both columns without any constraint that I have noticed.
So, is the keyword REFERENCES just for increasing querying speed? What is its true purpose?
Thanks
It creates a Foreign Key to the other table. This can have performance benefits, but foreign keys are mostly about data integrity. It means that (in your case) the wizard_id filed of powers must have a value that exists in the id field of the wizards table. In other words, powers must refer to a valid wizard. Many databases also use this information to propagate deletions or other changes, so the tables stay in sync.
Just noticed this. A reason that you're able to bypass the key constraint may be that foreign keys aren't enabled. See Enabling foreign keys in the SQLite3 documentation.
From what I've gathered, there are two main benefits of using REFERENCES, and an important distinction to be made between its use with and without FOREIGN KEY.
It gives the DBMS room to optimize
Without using REFERENCES, SQLite would not know that attribute id and attribute wizard_id are functionally equivalent. The more known constraints you can define for the Database Management System (SQLite in this case), the more freedom it has to optimize the way it handles your data under the hood.
It can enforce or encourage good practice
Reference declaration can also be useful for enforcement and warning provision. For example, say you have two tables, A and B, and you assume that A.name is functionally equivalent to B.name, so you attempt a join: SELECT * FROM A, B WHERE A.name = B.name. If REFERENCE was never used to indicate functional equivalency between these two attributes, the DBMS could warn you when you make the join, which would be helpful in the case that these attributes only happen to have the same name but are not actually meant to represent the same thing.
REFERENCES does not always create a "foreign key"
Contrary to what has already been suggested, references and foreign keys are not the same thing. A reference declares functional equivalency between attributes. A foreign key refers to the primary key of another table.
EDIT: #IanMcLaird has corrected me: the use of REFERENCES does always create a foreign key of some kind, although this conflicts with the popular definition of foreign key as "a set of attributes in a table that refers to the primary key of another table" (Wikipedia).
Using REFERENCES without FOREIGN KEY may create a "column-level foreign key" which operates contrary to the popular definition of "foreign key."
There is a difference between the following statements.
driver_id INT REFERENCES Drivers
driver_id INT REFERENCES Drivers(id)
driver_id INT,
FOREIGN KEY(driver_id) REFERENCES Drivers(id)
The first statement assumes that you would like to reference the primary key of Drivers since no attribute is specified. The third statement requires that id be the primary key of Drivers. Both assume you want to make a foreign key by the popular definition provided above; both create a table-level foreign key.
The second statement is tricky. If specifying an attribute which is the primary key of Drivers, the DBMS may opt to create a table-level foreign key. But the specified attribute does not have to be the primary key of Drivers, and if it isn't, the DBMS will create a column-level foreign key. This is somewhat unintuitive for those who are first approaching databases and learn the less-flexible, popular definition of "foreign key."
Some people may use these three statements as if they are the same, and they may be functionally identical in many general use cases, but they are not the same.
All that said, this is just my understanding. I am not an expert in this subject and would greatly appreciate additions, corrections, and affirmations.
For a Bridge Table I have the PK from 2 other tables. What are the pros and cons of making a PK field for the bridge table or making a composite/compound between the two fields.
I want to make sure I am following best practices.
Some Links I am reading over:
https://dba.stackexchange.com/questions/3134/in-sql-is-it-composite-or-compound-keys
http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/9d3cfd17-e596-4411-b3d8-66e0ec8bfdc7/
http://www.ben-morris.com/identity-surrogate-vs-composite-keys-in-sql-server
Composite primary keys versus unique object ID field
You have to enforce a unique constraint of some kind on the two foreign keys. The easiest way to do that is with a primary key constraint.
An additional, surrogate ID number isn't really useful. Some people use it because it makes foreign key constraints and joins to the "bridge" table easier to write. I think that, if you think it's hard to make a join using two integers, you shouldn't be working with databases in the first place.
I realise there might be similar questions but I couldn't find one that was close enough for guidance.
Given this spec,
Site
---------------------------
SiteID int identity
Name varchar(50)
Series
---------------------
SiteID int
SeriesCode varchar(6)
...
--SeriesCode will be unique for every unique SiteID
Episode
----------------------
SiteID int
SeriesCode varchar(6)
EpisodeCode varchar(10)
...
my proposed design/implementation is
Site
----------------------------
SiteID int identity
Name varchar(50)
Series
-------------------------------------------
SeriesID int identity, surrogate key
SiteID int natural key
SeriesCode varchar(6) natural key
UNIQUE(SiteID, SeriesCode)
...
Episode
-------------------------------------------
EpisodeID int identity, surrogate key
SeriesID int foreign key
EpisodeCode varchar(6) natural key
...
Anything wrong with this? Is it okay to have the SeriesID surrogate as a foreign* key here? I'm not sure if I'm missing any obvious problems that can arise. Or would it be better to use composite natural keys (SiteID+SeriesCode / SiteID+EpisodeCode)? In essence that'd decouple the Episode table from the Series table and that doesn't sit right for me.
Worth adding is that SeriesCode looks like 'ABCD-1' and EpisodeCode like 'ABCD-1NMO9' in the raw input data that will populate these tables, so that's another thing that could be changed I suppose.
*: "virtual" foreign key, since it's been previously decided by the higher-ups we should not use actual foreign keys
Yes, it all looks fine. The only (minor) point I might make is that unless you have another 4th child table hanging off of Episode, you probably don't need EpisodeId, as Episode.EpisodeCode is a single attribute natural key sufficient to identify and locate rows in Episode. It's no harm to leave it there, of course, but as a general rule I add surrogate keys to act as targets for FKs in child tables, and try to add a narural key to every table to indentify and control redundant data rows... So if a table has no other table with a FK referencing it, (and never will) I sometimes don't bother including a surrogate key in it.
What's a "virtual" foreign key? Did the higher-ups decide not to use foreign key constraints? In that case, you're not using foreign keys at all. You're just pretending to.
And is Episode the best choice for an entity? Doesn't it really mean Show or Podcast or so, and just happens to always be part of a series right now? If so, will that change in the future? Will Episode eventually be abused to encompass Show outside of a Series? In that case, tying Episode to Site via Series might come back to haunt you.
Given all that, and assuming that you as a grunt probably can't change any of it: if i was you i'd feel safer using natural keys wherever possible. In absence of foreign key constraints, it makes recognizing bad data easier, and if you have to resort to some SeriesCode='EMPTY' trickery later on that's easier with natural keys, too.
My suggestion:
Use natural/business as primary key whenever possible except in the following 3 situations:
The natural/business key is unknown at the moment of inserting
The natural/business key is not good ( it's not unique, it's liable to change frequently )
The natural/business key is a composite of more than 3 columns and the table will have child tables
In situations 1 and 2 a surrogate key is requiered.
In situation 3 a surrogate key is strongly recommended.