What are database constraints? [closed] - sql

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
What is a clear definition of database constraint? Why are constraints important for a database? What are the types of constraints?

Constraints are part of a database schema definition.
A constraint is usually associated with a table and is created with a CREATE CONSTRAINT or CREATE ASSERTION SQL statement.
They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. A reliable database system ensures that constraints hold at all times (except possibly inside a transaction, for so called deferred constraints).
Common kinds of constraints are:
not null - each value in a column must not be NULL
unique - value(s) in specified column(s) must be unique for each row in a table
primary key - value(s) in specified column(s) must be unique for each row in a table and not be NULL; normally each table in a database should have a primary key - it is used to identify individual records
foreign key - value(s) in specified column(s) must reference an existing record in another table (via it's primary key or some other unique constraint)
check - an expression is specified, which must evaluate to true for constraint to be satisfied

To understand why we need constraints, you must first understand the value of data integrity.
Data Integrity refers to the validity of data. Are your data valid? Are your data representing what you have designed them to?
What weird questions I ask you might think, but sadly enough all too often, databases are filled with garbage data, invalid references to rows in other tables, that are long gone... and values that doesn't mean anything to the business logic of your solution any longer.
All this garbage is not alone prone to reduce your performance, but is also a time-bomb under your application logic that eventually will retreive data that it is not designed to understand.
Constraints are rules you create at design-time that protect your data from becoming corrupt. It is essential for the long time survival of your heart child of a database solution. Without constraints your solution will definitely decay with time and heavy usage.
You have to acknowledge that designing your database design is only the birth of your solution. Here after it must live for (hopefully) a long time, and endure all kinds of (strange) behaviour by its end-users (ie. client applications). But this design-phase in development is crucial for the long-time success of your solution! Respect it, and pay it the time and attention it requires.
A wise man once said: "Data must protect itself!". And this is what constraints do. It is rules that keep the data in your database as valid as possible.
There are many ways of doing this, but basically they boil down to:
Foreign key constraints is probably the most used constraint,
and ensures that references to other
tables are only allowed if there
actually exists a target row to
reference. This also makes it
impossible to break such a
relationship by deleting the
referenced row creating a dead link.
Check constraints can ensure that only specific values are allowed in
certain column. You could create a constraint only allowing the word 'Yellow' or 'Blue' in a VARCHAR column. All other values would yield an error. Get ideas for usage of check constraints check the sys.check_constraints view in the AdventureWorks sample database
Rules in SQL Server are just reusable Check Constraints (allows
you to maintain the syntax from a
single place, and making it easier to
deploy your constraints to other
databases)
As I've hinted here, it takes some thorough considerations to construct the best and most defensive constraint approach for your database design. You first need to know the possibilities and limitations of the different constraint types above. Further reading could include:
FOREIGN KEY Constraints - Microsoft
Foreign key constraint - w3schools
CHECK Constraints
Good luck! ;)

Constraints are nothing but the rules on the data. What data is valid and what is invalid can be defined using constraints. So, that integrity of data can be maintained.
Following are the widely used constraints:
Primary Key : which uniquely identifies the data . If this constraint has been specified for certain column then we can't enter duplicate data in that column
Check : Such as NOT NULL . Here we can specify what data we can enter for that particular column and what is not expected for that column.
Foreign key : Foreign key references to the row of other table. So that data referred in one table from another table is always available for the referencing table.

Constraints can be used to enforce specific properties of data. A simple example is to limit an int column to values [0-100000]. This introduction looks good.

Constraints dictate what values are valid for data in the database. For example, you can enforce the a value is not null (a NOT NULL constraint), or that it exists as a unique constraint in another table (a FOREIGN KEY constraint), or that it's unique within this table (a UNIQUE constraint or perhaps PRIMARY KEY constraint depending on your requirements). More general constraints can be implemented using CHECK constraints.
The MSDN documentation for SQL Server 2008 constraints is probably your best starting place.

UNIQUE constraint (of which a PRIMARY KEY constraint is a variant). Checks that all values of a given field are unique across the table. This is X-axis constraint (records)
CHECK constraint (of which a NOT NULL constraint is a variant). Checks that a certain condition holds for the expression over the fields of the same record. This is Y-axis constraint (fields)
FOREIGN KEY constraint. Checks that a field's value is found among the values of a field in another table. This is Z-axis constraint (tables).

A database is the computerized logical representation of a conceptual (or business) model, consisting of a set of informal business rules. These rules are the user-understood meaning of the data. Because computers comprehend only formal representations, business rules cannot be represented directly in a database. They must be mapped to a formal representation, a logical model, which consists of a set of integrity constraints. These constraints — the database schema — are the logical representation in the database of the business rules and, therefore, are the DBMS-understood meaning of the data. It follows that if the DBMS is unaware of and/or does not enforce the full set of constraints representing the business rules, it has an incomplete understanding of what the data means and, therefore, cannot guarantee (a) its integrity by preventing corruption, (b) the integrity of inferences it makes from it (that is, query results) — this is another way of saying that the DBMS is, at best, incomplete.
Note: The DBMS-“understood” meaning — integrity constraints — is not identical to the user-understood meaning — business rules — but, the loss of some meaning notwithstanding, we gain the ability to mechanize logical inferences from the data.
"An Old Class of Errors" by Fabian Pascal

There are basically 4 types of main constraints in SQL:
Domain Constraint: if one of the attribute values provided for a new
tuple is not of the specified attribute domain
Key Constraint: if the value of a key attribute in a new tuple
already exists in another tuple in the relation
Referential Integrity: if a foreign key value in a new tuple
references a primary key value that does not exist in the referenced
relation
Entity Integrity: if the primary key value is null in a new tuple

constraints are conditions, that can validate specific condition.
Constraints related with database are Domain integrity, Entity integrity, Referential Integrity, User Defined Integrity constraints etc.

Related

How does "default" known as a "constraint" in sql?

Every constraint in SQL is trying to restrict some kind of illegal input. FOREIGN KEY, UNIQUE, CHECK, etc.
The thing is, DEFAULT, as the name suggests, does not restrict the input value, rather, "assigns" the value if it's not provided. So how could it possibly categorised as an "constraint" in SQL?
A DEFAULT constraint, constrains the insertion of an implicit NULL into a column.
That is one way it called be known as a constraint.
However, note that in the SQL-92 standard grammer the 'DEFAULT' column definition element is part of a "default clause" and is not referred to as a constraint.
Its referred to as a "constraint" in vendor specific implementations, notably Microsoft SQL Server.
Argument's exist, for and against.
In general it's all about data integrity. There are two types of integrity:
data integrity (about correct values in the columns) and
referential integrity (about relationships between tables).
So constraints (as MSDN says) - let you define the way the Database Engine automatically enforces the integrity of a database.
And finally - default constraint are a special type of constraint (i.e. special type of data integrity mechanisms) that specify what values are used in a column if you do not specify a value for the column when you insert a row.
In other words constraint is just a term that used for defining the mechanisms for ensuring the database integrity. Something like this.

What's a foreign key for?

I've been using table associations with SQL (MySQL) and Rails without a problem, and I've never needed to specify a foreign key constraint.
I just add a table_id column in the belongs_to table, and everything works just fine.
So what am I missing? What's the point of using the foreign key clause in MySQL or other RDBMS?
Thanks.
A foreign key is a referential constraint between two tables
The reason foreign key constraints exist is to guarantee that the referenced rows exist.
The foreign key identifies a column or set of columns in one (referencing or child) table that refers to a column or set of columns in another (referenced or parent) table.
you can get nice "on delete cascade" behavior, automatically cleaning up tables
There are lots of reason of using foreign key listed over here: Why Should one use foreign keys
Rails (ActiveRecord more specifically) auto-guesses the foreign key for you.
... By default this is guessed to be the name of the association with an “_id” suffix.
Foreign keys enforce referential integrity.
Foreign key: A column or set of columns in a table whose values are required to match at least one PrimaryKey values of a row of another table.
See also:
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html#method-i-belongs_to
http://c2.com/cgi/wiki?ForeignKey
The basic idea of foreign keys, or any referential constraint, is that the database should not allow you to store obviously invalid data. It is a core component of data consistency, one of the ACID rules.
If your data model says that you can have multiple phone numbers associated with an account, you can define the phone table to require a valid account number. It's therefore impossible to store orphaned phone records because you cannot insert a row in the phone table without a valid account number, and you can't delete an account without first deleting the phone numbers. If the field is birthdate, you might enforce a constraint that the date be prior to tomorrow's date. If the field is height, you might enforce that the distance be between 30 and 4000 cm. This means that it is impossible for any application to store invalid data in the database.
"Well, why can'd I just write all that into my application?" you ask. For a single-application database, you can. However, any business with a non-trivial database that stores data used business operations will want to access data directly. They'll want to be able to import data from finance or HR, or export addresses to sales, or create application user accounts by importing them from Active Directory, etc. For a non-trivial application, the user's data is what's important, and that's what they will want to access. At some point, they will want to access their data without your application code getting in the way. This is the real power and strength of an RDMBS, and it's what makes system integration possible.
However, if all your rules are stored in the application, then your users will need to be extremely careful about how they manipulate their database, lest they cause the application to implode. If you specify relational constraints and referential integrity, you require that other applications modify the data in a way that makes sense to any application that's going to use it. The logic is tied to the data (where it belongs) rather than the application.
Note that MySQL is absolute balls with respect to referential integrity. It will tend to silently succeed rather than throw errors, usually by inserting obviously invalid values like a datetime of today when you try to insert a null date into a datetime field with the constraint not null default null. There's a good reason that DBAs say that MySQL is a joke.
Foreign keys enforce referential integrity. Foreign key constraint will prevent you or any other user from adding incorrect records by mistake in the table. It makes sure that the Data (ID) being entered in the foreign key does exists in the reference table. If some buggy client code tries to insert incorrect data then in case of foreign key constraint an exception will raise, otherwise if the constraint is absent then your database will end up with inconsistent data.
Some advantages of using foreign key I can think of:
Make data consistent among tables, prevent having bad data( e.g. table A has some records refer to something does not exist in table B)
Help to document our database
Some framework is based on foreign keys to generate domain model

SQL Server - add foreign key relationship

I have an already existing database with tables. I added foreign key relationships (because they were referring data from another table, just that relationship was not explicit in the way tables were created) for one of the tables.
How does this change impact the existing database? Does the database engine have to do some extra work on existing data in the database? Can this change be a "breaking change" if you already have an application that uses the current database schema?
If you added a referential constraint, then the database stores that constraint and ensures it is maintained. For example, if table A has a foreign key referring to table B, then you cannot insert a row into table A that refers to a key that does not exist in table B.
There is indeed some extra work (though very minimal, depending on your database server) to enforce referential integrity. In practice, the performance impact is almost never something you'd notice.
It can be a "breaking change" - your client code may insert data that doesn't meet the referential constraints. If the DB allowed you to create the constraints in the first place, it's not likely, but it is possible.
You can specify WITH NOCHECK when creating a foreign key constraint:
The WITH NOCHECK option is useful when the existing data already meets
the new FOREIGN KEY constraint, or when a business rule requires the
constraint to be enforced only from this point forward.
However, you should be careful when you add a constraint without
checking existing data because this bypasses the controls in the
Database Engine that enforce the data integrity of the table.

Recommended structure for table that has FK for 3 other tables

I have a table that will contain information for 3 other tables. The design I have is that this table will have a column that will tell the objects's ID and another column will tell the objects's type (and thus the table that that row refers to).
Two questions:
a) Is that the best design or is there something else more widely accepted?
b) What is the recommend procedure to assure that IDs are valid for the given objects's type?
If I understood your question correctly, each row in your table links to exactly one of the three other tables.
Your approach (type field + one foreign key field) is a valid design, and it's useful if you want to create a general-purpose table that contains meta-information about your data (e.g. a list of records that should be retransmitted for replication).
Another approach, which might be more suitable for real application-level data, would be to have three columns, each being a foreign key to one of the three tables, and to add a constraint that requires exactly two of those fields to be null. The has the following advantages:
The three FKs do not need to have the same data type.
The JOIN syntax becomes more natural (not involving the type field).
You can add referential integrity constraints on those FK columns.
You don't need to ensure correctness of the type field -- in fact, you don't need the type field at all. The type is determined implicitly by the one FK column which is not null.
a) I'm supposing you have a relationship one to many between objects and object types. In a normal design you'd have a reference from the objecttype column in the objects table to the primary key of the object types table
b) I would enforce referential integrity in the relationship properties (this depends on the dbms you are using). It's also up to you to use cascading on updates and deletes. This way, an update or a delete of the primary key on object types table would be reflected on the objects one, updating its foreign key column (object type column) or deleting the registers that have that object type.
The basics of DB schema design are easy, but more complicated situations can be really complicated to figure out what's best. There is a lot of personal subjectivity that can come into play here, and even performance can be a factor in denormalizing a design.
Disclaimer aside, my personal recommendation is to never use a column to store more than one kind of FK, i.e. a column for FKs should store FKs that point only to a single table. If you don't do this, you have to map the cascade of that column's data into multiple sub-select queries inside your code, and it can begin to get more messy than you expected. Your given "Problem No. 2, ensuring validity between type and FK" is just the beginning of a whole world of pain that will cascade throughout your source code.
Assuming you change the design to use one field per FK reference, I would also check whether each FK field in your main "information-holding table" will be fully valid for each record. If not, I would move out the FK columns that will only be applicable some of the time to a separate table.

What are the down sides of using a composite/compound primary key?

What are the down sides of using a composite/compound primary key?
Could cause more problems for normalisation (2NF, "Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF")
More unnecessary data duplication. If your composite key consists of 3 columns, you will need to create the same 3 columns in every table, where it is used as a foreign key.
Generally avoidable with the help of surrogate keys (read about their advantages and disadvantages)
I can imagine a good scenario for composite key -- in a table representing a N:N relation, like Students - Classes, and the key in the intermediate table will be (StudentID, ClassID). But if you need to store more information about each pair (like a history of all marks of a student in a class) then you'll probably introduce a surrogate key.
There's nothing wrong with having a compound key per se, but a primary key should ideally be as small as possible (in terms of number of bytes required). If the primary key is long then this will cause non-clustered indexes to be bloated.
Bear in mind that the order of the columns in the primary key is important. The first column should be as selective as possible i.e. as 'unique' as possible. Searches on the first column will be able to seek, but searches just on the second column will have to scan, unless there is also a non-clustered index on the second column.
I think this is a specialisation of the synthetic key debate (whether to use meaningful keys or an arbitrary synthetic primary key). I come down almost completely on the synthetic key side of this debate for a number of reasons. These are a few of the more pertinent ones:
You have to keep dependent child
tables on the end of a foriegn key
up to date. If you change the the
value of one of the primary key
fields (which can happen - see
below) you have to somehow change
all of the dependent tables where
their PK value includes these
fields. This is a bit tricky
because changing key values will
invalidate FK relationships with
child tables so you may (depending
on the constraint validation options
available on your platform) have to
resort to tricks like copying the
record to a new one and deleting the
old records.
On a deep schema the keys can get
quite wide - I've seen 8 columns
once.
Changes in primary key values can be
troublesome to identify in ETL
processes loading off the system.
The example I once had occasion to
see was an MIS application
extracting from an insurance
underwriting system. On some
occasions a policy entry would be
re-used by the customer, changing
the policy identifier. This was a
part of the primary key of the
table. When this happens the
warehouse load is not aware of what
the old value was so it cannot match
the new data to it. The developer
had to go searching through audit
logs to identify the changed value.
Most of the issues with non-synthetic primary keys revolve around issues when PK values of records change. The most useful applications of non-synthetic values are where a database schema is intended to be used, such as an M.I.S. application where report writers are using the tables directly. In this case short values with fixed domains such as currency codes or dates might reasonably be placed directly on the table for convenience.
I would recommend a generated primary key in those cases with a unique not null constraint on the natural composite key.
If you use the natural key as primary then you will most likely have to reference both values in foreign key references to make sure you are identifying the correct record.
Take the example of a table with two candidate keys: one simple (single-column) and one compound (multi-column). Your question in that context seems to be, "What disadvantage may I suffer if I choose to promote one key to be 'primary' and I choose the compound key?"
First, consider whether you actually need to promote a key at all: "the very existence of the PRIMARY KEY in SQL seems to be an historical accident of some kind. According to author Chris Date the earliest incarnations of SQL didn't have any key constraints and PRIMARY KEY was only later addded to the SQL standards. The designers of the standard obviously took the term from E.F.Codd who invented it, even though Codd's original notion had been abandoned by that time! (Codd originally proposed that foreign keys must only reference one key - the primary key - but that idea was forgotten and ignored because it was widely recognised as a pointless limitation)." [source: David Portas' Blog: Down with Primary Keys?
Second, what criteria would you apply to choose which key in a table should be 'primary'?
In SQL, the choice of key PRIMARY KEY is arbitrary and product specific. In ACE/Jet (a.k.a. MS Access) the two main and often competing factors is whether you want to use PRIMARY KEY to favour clustering on disk or whether you want the columns comprising the key to appears as bold in the 'Relationships' picture in the MS Access user interface; I'm in the minority by thinking that index strategy trumps pretty picture :) In SQL Server, you can specify the clustered index independently of the PRIMARY KEY and there seems to be no product-specific advantage afforded. The only remaining advantage seems to be the fact you can omit the columns of the PRIMARY KEY when creating a foreign key in SQL DDL, being a SQL-92 Standard behaviour and anyhow doesn't seem such a big deal to me (perhaps another one of the things they added to the Standard because it was a feature already widespread in SQL products?) So, it's not a case of looking for drawbacks, rather, you should be looking to see what advantage, if any, your SQL product gives the PRIMARY KEY. Put another way, the only drawback to choosing the wrong key is that you may be missing out on a given advantage.
Third, are you rather alluding to using an artificial/synthetic/surrogate key to implement in your physical model a candidate key from your logical model because you are concerned there will be performance penalties if you use the natural key in foreign keys and table joins? That's an entirely different question and largely depends on your 'religious' stance on the issue of natural keys in SQL.
Need more specificity.
Taken too far, it can overcomplicate Inserts (Every key MUST exist) and documentation and your joined reads could be suspect if incomplete.
Sometimes it can indicate a flawed data model (is a composite key REALLY what's described by the data?)
I don't believe there is a performance cost...it just can go really wrong really easily.
when you se it on a diagram are less readable
when you use it on a query join are less
readable
when you use it on a foregein key
you have to add a check constraint
about all the attribute have to be
null or not null (if only one is
null the key is not checked)
usualy need more storage when use it
as foreign key
some tool doesn't manage composite
key
The main downside of using a compound primary key, is that you will confuse the hell out of typical ORM code generators.