Why is this kind of foreign keys possible? - sql

Why does the SQL Standard accept this? Which are the benefits?
If have those tables:
create table prova_a (a number, b number);
alter table prova_a add primary key (a,b);
create table prova_b (a number, b number);
alter table prova_b add foreign key (a,b) references prova_a(a,b) ;
insert into prova_a values (1,2);
You can insert this without error:
insert into prova_b values (123,null);
insert into prova_b values (null,123);
Note1: This comes from this answer.
Note2: This can be avoid, setting not null on both columns.
Remarks: I'm not asking about avoid, I'm interested on which are the beneficts.
References:
Oracle documentation: The relational model permits the value of foreign keys to match either the referenced primary or unique key value, or be null. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.
SQL Server documentation: A FOREIGN KEY constraint can contain null values; however, if any column of a composite FOREIGN KEY constraint contains null values, verification of all values that make up the FOREIGN KEY constraint is skipped.

I know some DBMSs simply don't enforce referential integrity when it comes to foreign keys with foreign key constraints. SQLite comes to mind. It's talked about here.
Other DBMSs are different, I know that MS SQL Server will complain if you attempt something like that.
SQLite has its uses but it is not meant to be used in high-concurrency situations. If you are seeing this behavior in a different DBMS, check their documentation to see if they did something similar. Most should be enforcing integrity however.

at least do your DEV work with a reasonably standard RDBMS, even if you are doing your production system with something like SQLite (which is an excellent database- it runs in your Ipod touch!) It will flush out all these mistakes- like Lint really. If you run your code with SQL Server Express, which you can download for free, you'll get plenty of errors such as...
Msg 8111, Level 16, State 1, Line 2
Cannot define PRIMARY KEY constraint on nullable column in table 'prova_a'.
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.

Oracle and SQL Server both allow NULL foreign keys, and it is easily understandable why this is necessary.
Think of a tree, for instance, where every row has a parent key that references the primary key of the same table. There has to be a root node in the tree that does not have a parent, and the parent key will be null.
A more tangible example: think of employees and managers. Some people in the company, and if it is only the CEO, will not have a manager. Were it not possible to set the manager id on the employee table to NULL, you would have to create a "No Manager" employee - something that is just wrong, because it has no real-life correspondence.
Now that we know this, it is obvious why your composite keys behave like they do. Logically, if part of the composite is NULL, the entire key is null. A string concatenation returns NULL if one of the pieces is NULL. There cannot be a match, and the constraint is not enforced in these cases.

The SQL standard doesn't accept this; you've found a DBMS that doesn't enforce referential integrity. Uninstall it now if you're smart. At a bare minimum, don't use it for production purposes.
Earlier SQL standards (SQL86) had no referential integrity and SQL89 level 2 fixed that.

Try adding this declaration:
alter table prova_b add primary key (a,b);
This will forbid NULLS in prova_b. It will also forbid duplicate entries. In Oracle and SQL server, it will also create an index. This index will speed up lookups and joins, at the cost of slowing down inserts a tiny bit.
Is this what you want to do?
As to why standard SQL lets you do something you consider stupid, that's a philosophical question. Most tools allow some stupid choices. Tools that try to forbid all stupid choices generally end up forbidding some really smart choices unintentionally.

Related

Why would someone need to enable/disable constraints?

Just starting to learn basics of SQL. In some versions of SQL (Oracle, SQL server etc.) there are enable/disable constraints keywords. What is the difference between these and add/drop constraints keywords? Why do we need it?
Constraint validation has a performance penalty when performing a DML operation. It's common to disable a constraint before a bulk insert/import of data (especially if you know that data is "OK"), and then enable it after the bulk operation is done.
I use disabled constraints in a special situation. I have an application with many tables (around 1000). The records in these table have "natural keys", i.e. identifiers and relations which are given by external source. Some tables use even different natural keys as foreign key references to different tables.
But I like to use common surrogate keys as primary key and for foreign references.
Here is one example (not 100% sure about correct syntax):
CREATE TABLE T_BTS (
OBJ_ID number constraint BTS_PK (OBJ_ID) PRIMARY KEY,
BTS_ID VARCHAR2(20) CONSTRAINT BTS_UK (BTS_ID) UNIQUE,
some more columns);
CREATE TABLE T_CELL (
OBJ_ID number constraint BTS_PK (OBJ_ID) PRIMARY KEY,
OBJ_ID_PARENT number,
BTS_ID VARCHAR2(20),
CELL_ID VARCHAR2(20) CONSTRAINT CELL_UK (BTS_ID, CELL_ID) UNIQUE,
some more columns);
ALTER TABLE T_CELL ADD CONSTRAINT CELL_PARENT_FK
FOREIGN KEY (OBJ_ID_PARENT)
REFERENCES T_BTS (OBJ_ID);
ALTER TABLE T_CELL ADD CONSTRAINT CELL_PARENT
FOREIGN KEY (BTS_ID)
REFERENCES T_BTS (BTS_ID) DISABLE;
In all my tables the primary key column is always OBJ_ID and the key to parent table is always OBJ_ID_PARENT, not matter how the natural key is defined. This makes me easier to have common PL/SQL procedures and compose dynamic SQL Statements.
One example: In order to set OBJ_ID_PARENT after insert, following update would be needed
UPDATE T_CELL cell SET OBJ_ID_PARENT =
(SELECT OBJ_ID
FROM T_BTS bts
WHERE cell.BTS_ID = bts.BTS_ID)
I am too lazy to write 1000+ such individual statements. By using views USER_CONSTRAINTS and USER_CONS_COLUMNS I am able to link the natural keys and the surrogate keys and I can execute these updates via dynamic SQL.
All my keys and references are purely defined by constraints. I don't need to maintain any extra table where I track relations or column names. The only limitation in my application design is, I have to utilize a certain naming convention for the constraints. But the countervalue for this is almost no maintenance is required to keep the data consistent and have good performance.
In order to use all above, some constrains needs to be disabled - even permanently.
I [almost] never disable constraints during the normal operation of the application. The point of the constraints is to preserve data quality.
Now, during maintenance, I can disable them temporarily while adding or removing massive amounts of data. Once they data is loaded I make sure they are enabled again before restarting the application.

REFERENCES keyword in SQLite3

I was hoping someone could explain to me the purpose of the SQL keyword REFERENCES
CREATE TABLE wizards(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
age INTEGER
, color TEXT);
CREATE TABLE powers(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name STRING,
damage INTEGER,
wizard_id INTEGER REFERENCES wizards(id)
);
I've spent a lot of time trying to look this up and I initially thought that it would constrain the type of data you can enter into the powers table (based on whether the wizard_id ) However, I am still able to insert data into both columns without any constraint that I have noticed.
So, is the keyword REFERENCES just for increasing querying speed? What is its true purpose?
Thanks
It creates a Foreign Key to the other table. This can have performance benefits, but foreign keys are mostly about data integrity. It means that (in your case) the wizard_id filed of powers must have a value that exists in the id field of the wizards table. In other words, powers must refer to a valid wizard. Many databases also use this information to propagate deletions or other changes, so the tables stay in sync.
Just noticed this. A reason that you're able to bypass the key constraint may be that foreign keys aren't enabled. See Enabling foreign keys in the SQLite3 documentation.
From what I've gathered, there are two main benefits of using REFERENCES, and an important distinction to be made between its use with and without FOREIGN KEY.
It gives the DBMS room to optimize
Without using REFERENCES, SQLite would not know that attribute id and attribute wizard_id are functionally equivalent. The more known constraints you can define for the Database Management System (SQLite in this case), the more freedom it has to optimize the way it handles your data under the hood.
It can enforce or encourage good practice
Reference declaration can also be useful for enforcement and warning provision. For example, say you have two tables, A and B, and you assume that A.name is functionally equivalent to B.name, so you attempt a join: SELECT * FROM A, B WHERE A.name = B.name. If REFERENCE was never used to indicate functional equivalency between these two attributes, the DBMS could warn you when you make the join, which would be helpful in the case that these attributes only happen to have the same name but are not actually meant to represent the same thing.
REFERENCES does not always create a "foreign key"
Contrary to what has already been suggested, references and foreign keys are not the same thing. A reference declares functional equivalency between attributes. A foreign key refers to the primary key of another table.
EDIT: #IanMcLaird has corrected me: the use of REFERENCES does always create a foreign key of some kind, although this conflicts with the popular definition of foreign key as "a set of attributes in a table that refers to the primary key of another table" (Wikipedia).
Using REFERENCES without FOREIGN KEY may create a "column-level foreign key" which operates contrary to the popular definition of "foreign key."
There is a difference between the following statements.
driver_id INT REFERENCES Drivers
driver_id INT REFERENCES Drivers(id)
driver_id INT,
FOREIGN KEY(driver_id) REFERENCES Drivers(id)
The first statement assumes that you would like to reference the primary key of Drivers since no attribute is specified. The third statement requires that id be the primary key of Drivers. Both assume you want to make a foreign key by the popular definition provided above; both create a table-level foreign key.
The second statement is tricky. If specifying an attribute which is the primary key of Drivers, the DBMS may opt to create a table-level foreign key. But the specified attribute does not have to be the primary key of Drivers, and if it isn't, the DBMS will create a column-level foreign key. This is somewhat unintuitive for those who are first approaching databases and learn the less-flexible, popular definition of "foreign key."
Some people may use these three statements as if they are the same, and they may be functionally identical in many general use cases, but they are not the same.
All that said, this is just my understanding. I am not an expert in this subject and would greatly appreciate additions, corrections, and affirmations.

What's a foreign key for?

I've been using table associations with SQL (MySQL) and Rails without a problem, and I've never needed to specify a foreign key constraint.
I just add a table_id column in the belongs_to table, and everything works just fine.
So what am I missing? What's the point of using the foreign key clause in MySQL or other RDBMS?
Thanks.
A foreign key is a referential constraint between two tables
The reason foreign key constraints exist is to guarantee that the referenced rows exist.
The foreign key identifies a column or set of columns in one (referencing or child) table that refers to a column or set of columns in another (referenced or parent) table.
you can get nice "on delete cascade" behavior, automatically cleaning up tables
There are lots of reason of using foreign key listed over here: Why Should one use foreign keys
Rails (ActiveRecord more specifically) auto-guesses the foreign key for you.
... By default this is guessed to be the name of the association with an “_id” suffix.
Foreign keys enforce referential integrity.
Foreign key: A column or set of columns in a table whose values are required to match at least one PrimaryKey values of a row of another table.
See also:
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html#method-i-belongs_to
http://c2.com/cgi/wiki?ForeignKey
The basic idea of foreign keys, or any referential constraint, is that the database should not allow you to store obviously invalid data. It is a core component of data consistency, one of the ACID rules.
If your data model says that you can have multiple phone numbers associated with an account, you can define the phone table to require a valid account number. It's therefore impossible to store orphaned phone records because you cannot insert a row in the phone table without a valid account number, and you can't delete an account without first deleting the phone numbers. If the field is birthdate, you might enforce a constraint that the date be prior to tomorrow's date. If the field is height, you might enforce that the distance be between 30 and 4000 cm. This means that it is impossible for any application to store invalid data in the database.
"Well, why can'd I just write all that into my application?" you ask. For a single-application database, you can. However, any business with a non-trivial database that stores data used business operations will want to access data directly. They'll want to be able to import data from finance or HR, or export addresses to sales, or create application user accounts by importing them from Active Directory, etc. For a non-trivial application, the user's data is what's important, and that's what they will want to access. At some point, they will want to access their data without your application code getting in the way. This is the real power and strength of an RDMBS, and it's what makes system integration possible.
However, if all your rules are stored in the application, then your users will need to be extremely careful about how they manipulate their database, lest they cause the application to implode. If you specify relational constraints and referential integrity, you require that other applications modify the data in a way that makes sense to any application that's going to use it. The logic is tied to the data (where it belongs) rather than the application.
Note that MySQL is absolute balls with respect to referential integrity. It will tend to silently succeed rather than throw errors, usually by inserting obviously invalid values like a datetime of today when you try to insert a null date into a datetime field with the constraint not null default null. There's a good reason that DBAs say that MySQL is a joke.
Foreign keys enforce referential integrity. Foreign key constraint will prevent you or any other user from adding incorrect records by mistake in the table. It makes sure that the Data (ID) being entered in the foreign key does exists in the reference table. If some buggy client code tries to insert incorrect data then in case of foreign key constraint an exception will raise, otherwise if the constraint is absent then your database will end up with inconsistent data.
Some advantages of using foreign key I can think of:
Make data consistent among tables, prevent having bad data( e.g. table A has some records refer to something does not exist in table B)
Help to document our database
Some framework is based on foreign keys to generate domain model

Oracle Database Enforce CHECK on multiple tables

I am trying to enforce a CHECK Constraint in a ORACLE Database on multiple tables
CREATE TABLE RollingStocks (
Id NUMBER,
Name Varchar2(80) NOT NULL,
RollingStockCategoryId NUMBER NOT NULL,
CONSTRAINT Pk_RollingStocks Primary Key (Id),
CONSTRAINT Check_RollingStocks_CategoryId
CHECK ((RollingStockCategoryId IN (SELECT Id FROM FreightWagonTypes))
OR
(RollingStockCategoryId IN (SELECT Id FROM LocomotiveClasses)))
);
...but i get the following error:
*Cause: Subquery is not allowed here in the statement.
*Action: Remove the subquery from the statement.
Can you help me understanding what is the problem or how to achieve the same result?
Check constraints are very limited in Oracle. To do a check like you propose, you'd have to implement a PL/SQL trigger.
My advise would be to avoid triggers altogether. Implement a stored procedure that modifies the database and includes the checks. Stored procedures are easier to maintain, although they are slightly harder to implement. But changing a front end from direct table access to stored procedure access pays back many times in the long run.
What you are trying to is ensure that the values inserted in one table exist in another table i.e. enforce a foreign key. So that would be :
CREATE TABLE RollingStocks (
...
CONSTRAINT Pk_RollingStocks Primary Key (Id),
CONSTRAINT RollingStocks_CategoryId_FK (RollingStockCategoryId )
REFERENCES FreightWagonTypes (ID)
);
Except that you want to enforce a foreign key which references two tables. This cannot be done.
You have a couple of options. One would be to merge FreightWagonTypes and LocomotiveClasses into a single table. If you need separate tables for other parts of your application then you could build a materialized view for the purposes of enforcing the foreign key. Materialized Views are like tables and can be referenced by foreign keys. This option won't work if the key values for the two tables clash.
Another option is to recognise that the presence of two candidate referenced tables suggests that RollingStock maybe needs to be split into two tables - or perhaps three: a super type and two sub-type tables, that is RollingStock and FreightWagons, Locomotives.
By the way, what about PassengerCoaches, GuardsWagons and RestaurantCars?
Oracle doesn't support complex check constraints like that, unfortunately.
In this case, your best option is to change the data model a bit - add a parent table over FreightWagonTypes and LocomotiveClasses, which will hold all the ids from both of these tables. That way you can add a FK to a single table.

What is the purpose of constraint naming

What is the purpose of naming your constraints (unique, primary key, foreign key)?
Say I have a table which is using natural keys as a primary key:
CREATE TABLE Order
(
LoginName VARCHAR(50) NOT NULL,
ProductName VARCHAR(50) NOT NULL,
NumberOrdered INT NOT NULL,
OrderDateTime DATETIME NOT NULL,
PRIMARY KEY(LoginName, OrderDateTime)
);
What benefits (if any) does naming my PK bring?
Eg.
Replace:
PRIMARY KEY(LoginName, OrderDateTime)
With:
CONSTRAINT Order_PK PRIMARY KEY(LoginName, OrderDateTime)
Sorry if my data model is not the best, I'm new to this!
Here's some pretty basic reasons.
(1) If a query (insert, update, delete) violates a constraint, SQL will generate an error message that will contain the constraint name. If the constraint name is clear and descriptive, the error message will be easier to understand; if the constraint name is a random guid-based name, it's a lot less clear. Particulary for end-users, who will (ok, might) phone you up and ask what "FK__B__B_COL1__75435199" means.
(2) If a constraint needs to be modified in the future (yes, it happens), it's very hard to do if you don't know what it's named. (ALTER TABLE MyTable drop CONSTRAINT um...) And if you create more than one instance of the database "from scratch" and use system-generated default names, no two names will ever match.
(3) If the person who gets to support your code (aka a DBA) has to waste a lot of pointless time dealing with case (1) or case (2) at 3am on Sunday, they're quite probably in a position to identify where the code came from and be able to react accordingly.
To identify the constraint in the future (e.g. you want to drop it in the future), it should have a unique name. If you don't specify a name for it, the database engine will probably assign a weird name (e.g. containing random stuff to ensure uniqueness) for you.
It keeps the DBAs happy, so they let your schema definition into the production database.
When your code randomly violates some foreign key constraint, it sure as hell saves time on debugging to figure out which one it was. Naming them greatly simplifies debugging your inserts and your updates.
It helps someone to know quickly what constraints are doing without having to look at the actual constraint, as the name gives you all the info you need.
So, I know if it is a primary key, unique key or default key, as well as the table and possibly columns involved.
By correctly naming all constraints, You can quickly associate a particular constraint with our data model. This gives us two real advantages:
We can quickly identify and fix any errors.
We can reliably modify or drop constraints.
By naming the constraints you can differentiate violations of them. This is not only useful for admins and developers, but your program can also use the constraint names. This is much more robust than trying to parse the error message. By using constraint names your program can react differently depending on which constraint was violated.
Constraint names are also very useful to display appropriate error messages in the user’s language mentioning which field caused a constraint violation instead of just forwarding a cryptic error message from the database server to the user.
See my answer on how to do this with PostgreSQL and Java.
While the OP's example used a permanent table, just remember that named constraints on temp tables behave like named constraints on permanent tables (i.e. you can't have multiple sessions with the exact same code handling the temp table, without it generating an error because the constraints are named the same). Because named constraints must be unique, if you absolutely must name a constraint on a temp table try to do so with some sort of randomized GUID (like SELECT NEWID() ) on the end of it to ensure that it will uniquely-named across sessions.
Another good reason to name constraints is if you are using version control on your database schema. In this case, if you have to drop and re-create a constraint using the default database naming (in my case SQL Server) then you will see differences between your committed version and the working copy because it will have a newly generated name. Giving an explicit name to the constraint will avoid this being flagged as a change.