REFERENCES keyword in SQLite3 - sql

I was hoping someone could explain to me the purpose of the SQL keyword REFERENCES, as used here:
CREATE TABLE wizards(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
age INTEGER
, color TEXT);
CREATE TABLE powers(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name STRING,
damage INTEGER,
wizard_id INTEGER REFERENCES wizards(id)
);
I've spent a lot of time trying to look this up. I initially thought that it would constrain the data you can enter into the powers table (based on whether the wizard_id exists in the wizards table). However, I am still able to insert data into both columns without any constraint that I have noticed.
So, is the keyword REFERENCES just for increasing querying speed? What is its true purpose?
Thanks

It creates a foreign key to the other table. This can have performance benefits, but foreign keys are mostly about data integrity. It means that (in your case) the wizard_id field of powers must have a value that exists in the id field of the wizards table. In other words, powers must refer to a valid wizard. Many databases also use this information to propagate deletions or other changes, so the tables stay in sync.
Just noticed this. A reason that you're able to bypass the key constraint may be that foreign keys aren't enabled. See Enabling foreign keys in the SQLite3 documentation.
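To make the point about enabling foreign keys concrete, here is a minimal sketch using Python's built-in sqlite3 module and the tables from the question. The helper name orphan_insert_allowed and the sample row are made up for illustration. It assumes a standard SQLite build, where enforcement is switched per connection with PRAGMA foreign_keys.

```python
import sqlite3

# Tables from the question (STRING replaced by TEXT for the power name).
SCHEMA = """
CREATE TABLE wizards(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    age INTEGER,
    color TEXT
);
CREATE TABLE powers(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    damage INTEGER,
    wizard_id INTEGER REFERENCES wizards(id)
);
"""

def orphan_insert_allowed(enforce: bool) -> bool:
    """Try to insert a power whose wizard_id matches no wizard."""
    conn = sqlite3.connect(":memory:")
    # Enforcement is a per-connection setting; set it explicitly either way.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript(SCHEMA)
    try:
        conn.execute(
            "INSERT INTO powers(name, damage, wizard_id) VALUES ('fireball', 10, 999)"
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

print(orphan_insert_allowed(enforce=False))  # constraint is parsed but not checked
print(orphan_insert_allowed(enforce=True))   # constraint is checked on insert
```

With enforcement off the orphan row goes in silently, which matches what the question describes. With it on, the same insert raises an IntegrityError.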

From what I've gathered, there are two main benefits of using REFERENCES, and an important distinction to be made between its use with and without FOREIGN KEY.
It gives the DBMS room to optimize
Without using REFERENCES, SQLite would not know that attribute id and attribute wizard_id are functionally equivalent. The more known constraints you can define for the Database Management System (SQLite in this case), the more freedom it has to optimize the way it handles your data under the hood.
It can enforce or encourage good practice
Reference declaration can also be useful for enforcement and warning provision. For example, say you have two tables, A and B, and you assume that A.name is functionally equivalent to B.name, so you attempt a join: SELECT * FROM A, B WHERE A.name = B.name. If REFERENCES was never used to indicate functional equivalency between these two attributes, a DBMS could in principle warn you when you make the join, which would be helpful in the case that these attributes only happen to have the same name but are not actually meant to represent the same thing.
REFERENCES does not always create a "foreign key"
Contrary to what has already been suggested, references and foreign keys are not the same thing. A reference declares functional equivalency between attributes. A foreign key refers to the primary key of another table.
EDIT: @IanMcLaird has corrected me: the use of REFERENCES does always create a foreign key of some kind, although this conflicts with the popular definition of foreign key as "a set of attributes in a table that refers to the primary key of another table" (Wikipedia).
Using REFERENCES without FOREIGN KEY may create a "column-level foreign key" which operates contrary to the popular definition of "foreign key."
There is a difference between the following statements.
driver_id INT REFERENCES Drivers
driver_id INT REFERENCES Drivers(id)
driver_id INT,
FOREIGN KEY(driver_id) REFERENCES Drivers(id)
The first statement assumes that you would like to reference the primary key of Drivers since no attribute is specified. The third statement requires that id be the primary key of Drivers. Both assume you want to make a foreign key by the popular definition provided above; both create a table-level foreign key.
The second statement is tricky. If specifying an attribute which is the primary key of Drivers, the DBMS may opt to create a table-level foreign key. But the specified attribute does not have to be the primary key of Drivers, and if it isn't, the DBMS will create a column-level foreign key. This is somewhat unintuitive for those who are first approaching databases and learn the less-flexible, popular definition of "foreign key."
Some people may use these three statements as if they are the same, and they may be functionally identical in many general use cases, but they are not the same.
All that said, this is just my understanding. I am not an expert in this subject and would greatly appreciate additions, corrections, and affirmations.
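One way to check how the three declarations above behave in practice is to try them. The sketch below does that for SQLite only, through Python's sqlite3 module; other DBMSs may differ. The child table name Trips, the labels, and the helper orphan_rejected are hypothetical, and Drivers is assumed to have id as its primary key.

```python
import sqlite3

PARENT = "CREATE TABLE Drivers (id INTEGER PRIMARY KEY, name TEXT);"

# The three child-column declarations discussed above.
DECLARATIONS = {
    "implicit": "driver_id INT REFERENCES Drivers",
    "column": "driver_id INT REFERENCES Drivers(id)",
    "table": "driver_id INT, FOREIGN KEY(driver_id) REFERENCES Drivers(id)",
}

def orphan_rejected(declaration: str) -> bool:
    """Return True if a child row with no matching driver is refused."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PARENT + f"CREATE TABLE Trips ({declaration});")
    try:
        conn.execute("INSERT INTO Trips (driver_id) VALUES (42)")
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()

results = {label: orphan_rejected(decl) for label, decl in DECLARATIONS.items()}
print(results)
```

When id is the primary key of Drivers, SQLite enforces all three forms the same way, so any difference between them only shows up when the referenced column is not the primary key.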

Related

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
What is the better way to design the primary key? Should I create an additional 'id' field, or create a composite primary key from both ids?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one or more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space, so it will probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key won't protect you from adding multiple instances of (f_id1, f_id2), so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
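A quick sketch of that point, assuming nothing beyond the two id columns from the question. It runs against SQLite via Python's sqlite3 module as a stand-in for PostgreSQL; the table names with_surrogate and with_composite and the helper insert_twice are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE with_surrogate (
    id INTEGER PRIMARY KEY,          -- surrogate key only, no unique constraint
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL
);
CREATE TABLE with_composite (
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL,
    PRIMARY KEY (f_id1, f_id2)       -- the pair itself is the key
);
""")

def insert_twice(table: str) -> int:
    """Insert the same (f_id1, f_id2) pair twice; return the stored row count."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} (f_id1, f_id2) VALUES (1, 2)")
        except sqlite3.IntegrityError:
            pass  # duplicate pair refused by the key
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

surrogate_rows = insert_twice("with_surrogate")
composite_rows = insert_twice("with_composite")
print(surrogate_rows, composite_rows)
```

The surrogate-only table ends up holding the duplicate connection, while the composite key refuses it. If you do keep a surrogate id, a separate UNIQUE (f_id1, f_id2) constraint gives the same protection.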
Yes, that's actually what people commonly do; that key is called a surrogate key. I'm not exactly sure about PostgreSQL, but in MySQL, using a surrogate key lets you delete or edit the records from the user interface. Besides, this allows the database to query the single key column faster than it could multiple columns. Hope it helps.

Can a foreign key be the only primary key

I just have a quick question. Can a table have its only primary key as a foreign key?
To clarify. When I've been creating tables I sometimes have a table with multiple keys where some of them are foreign keys. For example:
create table Pet(
Name varchar(20),
Owner char(1),
Color varchar(10),
primary key(Name, Owner),
foreign key(Owner) references Person(Ssn)
);
So now I'm wondering if it's possible to do something like this:
create table WorksAs(
Worker char(1),
Work varchar(30),
primary key(Worker),
foreign key(Worker) references Person(Ssn)
);
This would result in two tables having the exact same primary key. Is this something that should be avoided or is it an ok way to design a database? If the above is not a good standard I would simply make the Work variable a primary key as well and that would be fine, but it seems simpler to just skip if it is not needed.
Yes, it's perfectly legal to do that.
In fact, this is the basis of IS-A relations ;)
Yes. Because of the following reasons.
Making them the primary key will force uniqueness (as opposed to imply it).
The primary key will presumably be clustered (depending on the dbms) which will improve performance for some queries.
It saves the space of adding a unique constraint which in some DBMS also creates a unique index
Yes, you can do so. But you need to be careful, as foreign keys can have NULL values whereas primary keys can't.
Sure. You can use this approach when mapping inheritance hierarchies using the Concrete Table Inheritance or Class Table Inheritance approach, see e.g. SQL Alchemy docs
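For what it's worth, here is a small sketch of the WorksAs design from the question, run against SQLite through Python's sqlite3 module (other DBMSs should behave similarly here, but that is an assumption). The Person table is reduced to its key, and the sample rows and the helper try_insert are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE Person (Ssn CHAR(1) PRIMARY KEY);
CREATE TABLE WorksAs (
    Worker CHAR(1),
    Work VARCHAR(30),
    PRIMARY KEY (Worker),
    FOREIGN KEY (Worker) REFERENCES Person(Ssn)
);
INSERT INTO Person VALUES ('A');
INSERT INTO WorksAs VALUES ('A', 'plumber');
""")

def try_insert(worker: str, work: str) -> str:
    """Return 'ok' on success, otherwise the constraint error message."""
    try:
        conn.execute("INSERT INTO WorksAs VALUES (?, ?)", (worker, work))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate = try_insert("A", "baker")  # same person twice: blocked by the primary key
orphan = try_insert("Z", "baker")     # unknown person: blocked by the foreign key
print(duplicate)
print(orphan)
```

So the single column does both jobs: as primary key it allows at most one WorksAs row per person, and as foreign key it requires that person to exist.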

Creating PostgreSQL tables + relationships - PROBLEMS with relationships - ONE TO ONE

So I am supposed to create this schema + relationships exactly the way this ERD depicts it. Here I only show the tables that I am having problems with:
So I am trying to make it one to one but for some reason, no matter what I change, I get one to many on whatever table has the foreign key.
This is my sql for these two tables.
CREATE TABLE lab4.factory(
factory_id INTEGER UNIQUE,
address VARCHAR(100) NOT NULL,
PRIMARY KEY ( factory_id )
);
CREATE TABLE lab4.employee(
employee_id INTEGER UNIQUE,
employee_name VARCHAR(100) NOT NULL,
factory_id INTEGER REFERENCES lab4.factory(factory_id),
PRIMARY KEY ( employee_id )
);
Here I get the same thing. I am not getting the one to one relationship but one to many. Invoiceline is a weak entity.
And here is my code for the second image.
CREATE TABLE lab4.product(
product_id INTEGER PRIMARY KEY,
product_name INTEGER NOT NULL
);
CREATE TABLE lab4.invoiceLine(
line_number INTEGER NOT NULL,
quantity INTEGER NOT NULL,
curr_price INTEGER NOT NULL,
inv_no INTEGER REFERENCES invoice,
product_id INTEGER REFERENCES lab4.product(product_id),
PRIMARY KEY ( inv_no, line_number )
);
I would appreciate any help. Thanks.
One-to-one isn't well represented as a first-class relationship type in standard SQL. Much like many-to-many, which is achieved using a connector table and two one-to-many relationships, there's no true "one to one" in SQL.
There are a couple of options:
Create an ordinary foreign key constraint ("one to many" style) and then add a UNIQUE constraint on the referring FK column. This means that no more than one of the referred-to values may appear in the referring column, making it one-to-one optional. This is a fairly simple and quite forgiving approach that works well.
Use a normal FK relationship that could model 1:m, and let your app ensure it's only ever 1:1 in practice. I do not recommend this: there's only a small write performance downside to adding the FK unique index, and it helps ensure data validity, find app bugs, and avoid confusing someone else who needs to modify the schema later.
Create reciprocal foreign keys - possible only if your database supports deferrable foreign key constraints. This is a bit more complex to code, but allows you to implement one-to-one mandatory relationships. Each entity has a foreign key reference to the other's PK in a unique column. One or both of the constraints must be DEFERRABLE and either INITIALLY DEFERRED or used with a SET CONSTRAINTS call, since you must defer one of the constraint checks to set up the circular dependency. This is a fairly advanced technique that is not necessary for the vast majority of applications.
Use pre-commit triggers if your database supports them, so you can verify that when entity A is inserted exactly one entity B is also inserted and vice versa, with corresponding checks for updates and deletes. This can be slow and is usually unnecessary, plus many database systems don't support pre-commit triggers.
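The first option in the list above (an ordinary foreign key plus a UNIQUE constraint on the referring column) might look roughly like this. It is a sketch run on SQLite via Python's sqlite3 module rather than PostgreSQL, so the lab4 schema prefix is dropped, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE factory (
    factory_id INTEGER PRIMARY KEY,
    address VARCHAR(100) NOT NULL
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    -- UNIQUE limits each factory to at most one referring employee
    factory_id INTEGER UNIQUE REFERENCES factory(factory_id)
);
INSERT INTO factory VALUES (1, '1 Main St');
INSERT INTO employee VALUES (10, 'Ann', 1);
""")

try:
    # A second employee pointing at the same factory violates UNIQUE.
    conn.execute("INSERT INTO employee VALUES (11, 'Bob', 1)")
    second_employee_allowed = True
except sqlite3.IntegrityError:
    second_employee_allowed = False

print(second_employee_allowed)
```

Without the UNIQUE keyword the second insert would succeed, which is the one-to-many behaviour the question is running into.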

Omitting columns of parent-table when creating Foreign Key

To create a foreign key in Oracle, sometimes I see
CONSTRAINT FK_Supplier
FOREIGN KEY (Supplier_id)
REFERENCES Supplier(Supplier_id)
But, some other times, I see this
CONSTRAINT FK_Supplier
FOREIGN KEY (Supplier_id)
REFERENCES Supplier
The difference is that the column Supplier_id comes after the table Supplier in the first statement but it is omitted in the second statement.
Thanks for helping
This is described in the documentation:
If you identify only the parent table or view and omit the column
name, then the foreign key automatically references the primary key of
the parent table or view. The corresponding column or columns of the
foreign key and the referenced key must match in order and datatype.
One of the major criticisms of SQL as regards not being faithful to the relational model is its reliance on column ordering. However, just because SQL includes non-relational features does not mean that one should use them; in fact, I feel strongly that such features should be avoided or, when avoidance is impossible, mitigated against.
Standard SQL provides some syntax to avoid column ordering reliance (NATURAL JOIN, UNION CORRESPONDING, etc). Other syntax helps mitigate against such reliance (e.g. INSERT INTO (<comma list of columns>) VALUES (<comma list of fields in same order>)). FOREIGN KEY syntax falls into this second category.
Conclusion: always use the syntax in your first example and avoid the second.

Why is this kind of foreign keys possible?

Why does the SQL Standard accept this? What are the benefits?
If I have these tables:
create table prova_a (a number, b number);
alter table prova_a add primary key (a,b);
create table prova_b (a number, b number);
alter table prova_b add foreign key (a,b) references prova_a(a,b) ;
insert into prova_a values (1,2);
You can insert this without error:
insert into prova_b values (123,null);
insert into prova_b values (null,123);
Note1: This comes from this answer.
Note2: This can be avoided by setting NOT NULL on both columns.
Remarks: I'm not asking how to avoid it; I'm interested in what the benefits are.
References:
Oracle documentation: The relational model permits the value of foreign keys to match either the referenced primary or unique key value, or be null. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.
SQL Server documentation: A FOREIGN KEY constraint can contain null values; however, if any column of a composite FOREIGN KEY constraint contains null values, verification of all values that make up the FOREIGN KEY constraint is skipped.
I know some DBMSs simply don't enforce referential integrity for foreign key constraints. SQLite comes to mind. It's talked about here.
Other DBMSs are different, I know that MS SQL Server will complain if you attempt something like that.
SQLite has its uses but it is not meant to be used in high-concurrency situations. If you are seeing this behavior in a different DBMS, check their documentation to see if they did something similar. Most should be enforcing integrity however.
At least do your DEV work with a reasonably standard RDBMS, even if you are doing your production system with something like SQLite (which is an excellent database; it runs on your iPod touch!). It will flush out all these mistakes, much like Lint. If you run your code with SQL Server Express, which you can download for free, you'll get plenty of errors such as...
Msg 8111, Level 16, State 1, Line 2
Cannot define PRIMARY KEY constraint on nullable column in table 'prova_a'.
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.
Oracle and SQL Server both allow NULL foreign keys, and it is easily understandable why this is necessary.
Think of a tree, for instance, where every row has a parent key that references the primary key of the same table. There has to be a root node in the tree that does not have a parent, and the parent key will be null.
A more tangible example: think of employees and managers. Some people in the company, even if it is only the CEO, will not have a manager. Were it not possible to set the manager id on the employee table to NULL, you would have to create a "No Manager" employee - something that is just wrong, because it has no real-life correspondence.
Now that we know this, it is obvious why your composite keys behave like they do. Logically, if part of the composite is NULL, the entire key is NULL. Similarly, in standard SQL a string concatenation returns NULL if one of the pieces is NULL. There cannot be a match, and the constraint is not enforced in these cases.
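The behaviour described above can be reproduced with the tables from the question. This sketch uses SQLite through Python's sqlite3 module as a stand-in for Oracle, with enforcement switched on and INTEGER in place of number. SQLite treats a composite foreign key the same way on this point (a NULL in any child column skips the check), but that equivalence is an assumption to verify for your own DBMS. The helper accepted is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE prova_a (a INTEGER, b INTEGER, PRIMARY KEY (a, b));
CREATE TABLE prova_b (
    a INTEGER,
    b INTEGER,
    FOREIGN KEY (a, b) REFERENCES prova_a (a, b)
);
INSERT INTO prova_a VALUES (1, 2);
""")

def accepted(a, b) -> bool:
    """Return True if (a, b) can be inserted into prova_b."""
    try:
        conn.execute("INSERT INTO prova_b VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

partial_null = accepted(123, None)  # one column NULL: the check is skipped
full_match = accepted(1, 2)         # both columns set and present in prova_a
no_match = accepted(123, 456)       # both columns set, no such parent row
print(partial_null, full_match, no_match)
```

Only the row with both columns filled in and no matching parent is rejected. Declaring both columns NOT NULL, as Note2 in the question says, closes the gap.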
The SQL standard doesn't accept this; you've found a DBMS that doesn't enforce referential integrity. Uninstall it now if you're smart. At a bare minimum, don't use it for production purposes.
Earlier SQL standards (SQL86) had no referential integrity and SQL89 level 2 fixed that.
Try adding this declaration:
alter table prova_b add primary key (a,b);
This will forbid NULLs in prova_b. It will also forbid duplicate entries. In Oracle and SQL Server, it will also create an index. This index will speed up lookups and joins, at the cost of slowing down inserts a tiny bit.
Is this what you want to do?
As to why standard SQL lets you do something you consider stupid, that's a philosophical question. Most tools allow some stupid choices. Tools that try to forbid all stupid choices generally end up forbidding some really smart choices unintentionally.