What's a foreign key for? - sql

I've been using table associations with SQL (MySQL) and Rails without a problem, and I've never needed to specify a foreign key constraint.
I just add a table_id column in the belongs_to table, and everything works just fine.
So what am I missing? What's the point of using the foreign key clause in MySQL or other RDBMS?
Thanks.

A foreign key is a referential constraint between two tables
The reason foreign key constraints exist is to guarantee that the referenced rows exist.
The foreign key identifies a column or set of columns in one (referencing or child) table that refers to a column or set of columns in another (referenced or parent) table.
you can get nice "on delete cascade" behavior, automatically cleaning up tables
There are lots of reason of using foreign key listed over here: Why Should one use foreign keys

Rails (ActiveRecord more specifically) auto-guesses the foreign key for you.
... By default this is guessed to be the name of the association with an “_id” suffix.
Foreign keys enforce referential integrity.
Foreign key: A column or set of columns in a table whose values are required to match at least one PrimaryKey values of a row of another table.
See also:
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html#method-i-belongs_to
http://c2.com/cgi/wiki?ForeignKey

The basic idea of foreign keys, or any referential constraint, is that the database should not allow you to store obviously invalid data. It is a core component of data consistency, one of the ACID rules.
If your data model says that you can have multiple phone numbers associated with an account, you can define the phone table to require a valid account number. It's therefore impossible to store orphaned phone records because you cannot insert a row in the phone table without a valid account number, and you can't delete an account without first deleting the phone numbers. If the field is birthdate, you might enforce a constraint that the date be prior to tomorrow's date. If the field is height, you might enforce that the distance be between 30 and 4000 cm. This means that it is impossible for any application to store invalid data in the database.
"Well, why can'd I just write all that into my application?" you ask. For a single-application database, you can. However, any business with a non-trivial database that stores data used business operations will want to access data directly. They'll want to be able to import data from finance or HR, or export addresses to sales, or create application user accounts by importing them from Active Directory, etc. For a non-trivial application, the user's data is what's important, and that's what they will want to access. At some point, they will want to access their data without your application code getting in the way. This is the real power and strength of an RDMBS, and it's what makes system integration possible.
However, if all your rules are stored in the application, then your users will need to be extremely careful about how they manipulate their database, lest they cause the application to implode. If you specify relational constraints and referential integrity, you require that other applications modify the data in a way that makes sense to any application that's going to use it. The logic is tied to the data (where it belongs) rather than the application.
Note that MySQL is absolute balls with respect to referential integrity. It will tend to silently succeed rather than throw errors, usually by inserting obviously invalid values like a datetime of today when you try to insert a null date into a datetime field with the constraint not null default null. There's a good reason that DBAs say that MySQL is a joke.

Foreign keys enforce referential integrity. Foreign key constraint will prevent you or any other user from adding incorrect records by mistake in the table. It makes sure that the Data (ID) being entered in the foreign key does exists in the reference table. If some buggy client code tries to insert incorrect data then in case of foreign key constraint an exception will raise, otherwise if the constraint is absent then your database will end up with inconsistent data.

Some advantages of using foreign key I can think of:
Make data consistent among tables, prevent having bad data( e.g. table A has some records refer to something does not exist in table B)
Help to document our database
Some framework is based on foreign keys to generate domain model

Related

What database design do you favor; one with actual physical relations or one that mimics it?

What is the best practice while designing a relational database? To have physical relationships between tables (actual line drawn from table to table(s)) or mimic the relationship only?
E.g.
TableA has columns
ID, Name
TableB has columns
ID, CNIC, TableA_ID
Now, TableA_ID doesn't have an actual foreign key constraint on it but stills stores the value that maps to the ID column of TableA.
I consider the later to be good and I believe that the former slows down and has cascade operation problems?
One of the main jobs of a database is to ensure data integrity.
A part of data integrity is referential integrity.
Relational databases ensure referential integrity with Foreign key constraints.
Remove the constraint, and you remove the database ability to guard against corrupt data.
Therefor, you should always specify foreign keys when your tables are related, even at the price of a performance penalty (which is usually negligible anyway).
You can't mimic a constraint if it doesn't exist - even if you have a front-end application that validates all the data being entered into the database - nothing is stopping a developer, DBA, or anyone that has a direct access to the database to enter corrupt data by mistake.
Relational model doesn't contains any kind of physical links. One relation (table) may have a reference to other one using pairs "key--foreign key" ("key" may not be a primary). To ensure the integrity of references the foreign key constraint is required. Foreign key constraint should be used in your design "by default".
Usually, a database designer can add an index on FK columns and don't take into consideration any other performance issues because they are well managed in production (i.e. temporary disabling FK checks, bulk load operations ignore FK etc).

Constrain table entries on sql table

I have 3 related tables as shown in the image below.
Everyone has an entry in the User table and a UserRole.
Some Users are Subscribers.
I want to constrain the Subscribers table so that it can only contain users whose role IsSusbcriber but retain the UserRole at User level. Is it possible?
Excuse the current relationships between the table, they represent whats there at the moment rather whats necessarily needed.
I think you could drop the IsSubscriber columns and add a UserSubscriberRoles table that will contain exactly those roles that had previously set the IsSubscriber column.
CREATE UserSubscriberRoles
( UserRoleId PRIMARY KEY
, FOREIGN KEY (UserRoleId)
REFERENCES UserRoles (UserRoleId)
) ;
Then change the FKs in Subscribers table to:
FOREIGN KEY (UserId, UserRoleId)
REFERENCES User (UserId, UserRoleId)
FOREIGN KEY (UserRoleId)
REFERENCES UserSubscriberRoles (UserRoleId)
Is it possible?
In theory, yes; you can use a SQL assertion. See this StackOverflow answer for an example.
In practice, however, no major DBMS supports SQL assertions, and what you describe cannot be implemented as a foreign-key constraint, so I think your only option is to write a trigger that evaluates this constraint, and raises an exception if it's not satisfied.
The only way to contrain this without RoleId in the table is via either a trigger (and triggers are usually a bad design choice), or a SQL Agent job that periodically removes people from Subscribers that don't fit the criteria.
With RoleID in Subscribers, you can add a check constraint that limits it to a specific value.
This would really be better enforced with a different design and/or in the application code.

SQL Server - add foreign key relationship

I have an already existing database with tables. I added foreign key relationships (because they were referring data from another table, just that relationship was not explicit in the way tables were created) for one of the tables.
How does this change impact the existing database? Does the database engine have to do some extra work on existing data in the database? Can this change be a "breaking change" if you already have an application that uses the current database schema?
If you added a referential constraint, then the database stores that constraint and ensures it is maintained. For example, if table A has a foreign key referring to table B, then you cannot insert a row into table A that refers to a key that does not exist in table B.
There is indeed some extra work (though very minimal, depending on your database server) to enforce referential integrity. In practice, the performance impact is almost never something you'd notice.
It can be a "breaking change" - your client code may insert data that doesn't meet the referential constraints. If the DB allowed you to create the constraints in the first place, it's not likely, but it is possible.
You can specify WITH NOCHECK when creating a foreign key constraint:
The WITH NOCHECK option is useful when the existing data already meets
the new FOREIGN KEY constraint, or when a business rule requires the
constraint to be enforced only from this point forward.
However, you should be careful when you add a constraint without
checking existing data because this bypasses the controls in the
Database Engine that enforce the data integrity of the table.

What are database constraints? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
What is a clear definition of database constraint? Why are constraints important for a database? What are the types of constraints?
Constraints are part of a database schema definition.
A constraint is usually associated with a table and is created with a CREATE CONSTRAINT or CREATE ASSERTION SQL statement.
They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. A reliable database system ensures that constraints hold at all times (except possibly inside a transaction, for so called deferred constraints).
Common kinds of constraints are:
not null - each value in a column must not be NULL
unique - value(s) in specified column(s) must be unique for each row in a table
primary key - value(s) in specified column(s) must be unique for each row in a table and not be NULL; normally each table in a database should have a primary key - it is used to identify individual records
foreign key - value(s) in specified column(s) must reference an existing record in another table (via it's primary key or some other unique constraint)
check - an expression is specified, which must evaluate to true for constraint to be satisfied
To understand why we need constraints, you must first understand the value of data integrity.
Data Integrity refers to the validity of data. Are your data valid? Are your data representing what you have designed them to?
What weird questions I ask you might think, but sadly enough all too often, databases are filled with garbage data, invalid references to rows in other tables, that are long gone... and values that doesn't mean anything to the business logic of your solution any longer.
All this garbage is not alone prone to reduce your performance, but is also a time-bomb under your application logic that eventually will retreive data that it is not designed to understand.
Constraints are rules you create at design-time that protect your data from becoming corrupt. It is essential for the long time survival of your heart child of a database solution. Without constraints your solution will definitely decay with time and heavy usage.
You have to acknowledge that designing your database design is only the birth of your solution. Here after it must live for (hopefully) a long time, and endure all kinds of (strange) behaviour by its end-users (ie. client applications). But this design-phase in development is crucial for the long-time success of your solution! Respect it, and pay it the time and attention it requires.
A wise man once said: "Data must protect itself!". And this is what constraints do. It is rules that keep the data in your database as valid as possible.
There are many ways of doing this, but basically they boil down to:
Foreign key constraints is probably the most used constraint,
and ensures that references to other
tables are only allowed if there
actually exists a target row to
reference. This also makes it
impossible to break such a
relationship by deleting the
referenced row creating a dead link.
Check constraints can ensure that only specific values are allowed in
certain column. You could create a constraint only allowing the word 'Yellow' or 'Blue' in a VARCHAR column. All other values would yield an error. Get ideas for usage of check constraints check the sys.check_constraints view in the AdventureWorks sample database
Rules in SQL Server are just reusable Check Constraints (allows
you to maintain the syntax from a
single place, and making it easier to
deploy your constraints to other
databases)
As I've hinted here, it takes some thorough considerations to construct the best and most defensive constraint approach for your database design. You first need to know the possibilities and limitations of the different constraint types above. Further reading could include:
FOREIGN KEY Constraints - Microsoft
Foreign key constraint - w3schools
CHECK Constraints
Good luck! ;)
Constraints are nothing but the rules on the data. What data is valid and what is invalid can be defined using constraints. So, that integrity of data can be maintained.
Following are the widely used constraints:
Primary Key : which uniquely identifies the data . If this constraint has been specified for certain column then we can't enter duplicate data in that column
Check : Such as NOT NULL . Here we can specify what data we can enter for that particular column and what is not expected for that column.
Foreign key : Foreign key references to the row of other table. So that data referred in one table from another table is always available for the referencing table.
Constraints can be used to enforce specific properties of data. A simple example is to limit an int column to values [0-100000]. This introduction looks good.
Constraints dictate what values are valid for data in the database. For example, you can enforce the a value is not null (a NOT NULL constraint), or that it exists as a unique constraint in another table (a FOREIGN KEY constraint), or that it's unique within this table (a UNIQUE constraint or perhaps PRIMARY KEY constraint depending on your requirements). More general constraints can be implemented using CHECK constraints.
The MSDN documentation for SQL Server 2008 constraints is probably your best starting place.
UNIQUE constraint (of which a PRIMARY KEY constraint is a variant). Checks that all values of a given field are unique across the table. This is X-axis constraint (records)
CHECK constraint (of which a NOT NULL constraint is a variant). Checks that a certain condition holds for the expression over the fields of the same record. This is Y-axis constraint (fields)
FOREIGN KEY constraint. Checks that a field's value is found among the values of a field in another table. This is Z-axis constraint (tables).
A database is the computerized logical representation of a conceptual (or business) model, consisting of a set of informal business rules. These rules are the user-understood meaning of the data. Because computers comprehend only formal representations, business rules cannot be represented directly in a database. They must be mapped to a formal representation, a logical model, which consists of a set of integrity constraints. These constraints — the database schema — are the logical representation in the database of the business rules and, therefore, are the DBMS-understood meaning of the data. It follows that if the DBMS is unaware of and/or does not enforce the full set of constraints representing the business rules, it has an incomplete understanding of what the data means and, therefore, cannot guarantee (a) its integrity by preventing corruption, (b) the integrity of inferences it makes from it (that is, query results) — this is another way of saying that the DBMS is, at best, incomplete.
Note: The DBMS-“understood” meaning — integrity constraints — is not identical to the user-understood meaning — business rules — but, the loss of some meaning notwithstanding, we gain the ability to mechanize logical inferences from the data.
"An Old Class of Errors" by Fabian Pascal
There are basically 4 types of main constraints in SQL:
Domain Constraint: if one of the attribute values provided for a new
tuple is not of the specified attribute domain
Key Constraint: if the value of a key attribute in a new tuple
already exists in another tuple in the relation
Referential Integrity: if a foreign key value in a new tuple
references a primary key value that does not exist in the referenced
relation
Entity Integrity: if the primary key value is null in a new tuple
constraints are conditions, that can validate specific condition.
Constraints related with database are Domain integrity, Entity integrity, Referential Integrity, User Defined Integrity constraints etc.

How do I rename primary key values in Oracle?

Our application uses an Oracle 10g database where several primary keys are exposed to the end user. Productcodes and such. Unfortunately it's to late to do anything with this, as there are tons of reports and custom scripts out there that we do not have control over. We can't redefine the primary keys or mess up the database structure.
Now some customer want to change some of the primary key values. What they initially wanted to call P23A1 should now be called CAT23MOD1 (not a real example, but you get my meaning.)
Is there an easy way to do this? I would prefer a script of some sort, that could be parametrized to fit other tables and keys, but external tools would be acceptable if no other way exists.
The problem is presumably with the foreign keys that reference the PK. You must define the foreign keys as "deferrable initially immediate", as described in this Tom Kyte article: http://www.oracle.com/technology/oramag/oracle/03-nov/o63asktom.html
That lets you ...
Defer the constraints
Modify the parent value
Modify the child values
Commit the change
Simple.
Oops. A little googling makes it appear that, inexplicably, Oracle does not implement ON UPDATE CASCADE, only ON DELETE CASCADE. To find workarounds google ORACLE ON UPDATE CASCADE. Here's a link on Creating A Cascade Update Set of Tables in Oracle.
Original answer:
If I understand correctly, you want to change the values of data in primary key columns, not the actual constraint names of the keys themselves.
If this is true it can most easily be accomplished redefining ALL the foreign keys that reference the affected primary key constraint as ON UPDATE CASCADE. This means that when you make a change to the primary key value, the engine will automatically update all related values in foreign key tables.
Be aware that if this results in a lot of changes it could be prohibitively expensive in a production system.
If you have to do this on a live system with no DDL changes to the tables involved, then I think your only option is to (for each value of the PK that needs to be changed):
Insert into the parent table a copy of the row with the PK value replaced
For each child table, update the FK value to the new PK value
Delete the parent table row with the old PK value
If you have a list of parent tables and the PK values to be renamed, it shouldn't be too hard to write a procedure that does this - the information in USER_CONSTRAINTS can be used to get the FK-related tables for a given parent table.