Unclear constraints in SQL's CREATE TABLE - sql

In a Coursera course, there is a snippet of code:
I don't understand the parts:
CONSTRAINT AUTHOR_PK
(author_id) (after PRIMARY KEY)
Could you please explain?
Clarifications: for CONSTRAINT AUTHOR_PK, I don't understand why CONSTRAINT is there explicitly but it's not there for the other attributes of the table. I also don't know what AUTHOR_PK is used for.
For (author_id), I don't understand its presence. Since PRIMARY KEY is written on the same line as author_id, isn't it already implicit that author_id will be used as the primary key?
I'm very new to SQL. I consulted
https://www.w3schools.com/sql/sql_create_table.asp
https://www.w3schools.com/sql/sql_constraints.asp
but could not resolve these issues myself.

There are two types of constraints you can create in a CREATE TABLE statement. These are column constraints and table constraints.
Column constraints are included in the definition of a single column.
Table constraints are included as separate declarations, not part of a column definition.
This is the same table with the primary key declared as a table constraint:
CREATE TABLE Author
(author_id CHAR(2),
lastname VARCHAR(15) not null,
...,
CONSTRAINT PK_AUTHOR PRIMARY KEY (author_id)
)
What you have in your example is a constraint being declared as a column constraint. Normally, column constraints don't have to name which columns they're relevant to since they're part of the column's definition, and indeed in some dialects of SQL, the sample you've shown would be rejected, because it does name the columns explicitly.
The name after CONSTRAINT (AUTHOR_PK in your sample, PK_AUTHOR in mine) gives the constraint a specific name. This is helpful if you later need to remove the constraint. If you don't want to name the constraint, the CONSTRAINT keyword and the name may be omitted from either sample.

The CONSTRAINT keyword is necessary when you want to provide a specific name for the constraint, in this case AUTHOR_PK. If you don't do this, a name would be auto-generated, and these names are generally not very useful. All the NOT NULL constraints in this example would have auto-generated names.
In my experience, it's standard practice to name all constraints except NOT NULL ones.
I think you are right that (author_id) is unnecessary in this example, as it is implied by the fact that the constraint is declared for that column already. But the syntax allows it. (I wonder if it would allow specifying a different column in this position - I don't think so but haven't tried it.)
The syntax to specify columns is more useful when you want to declare a multiple-column key. In this case, the CONSTRAINT clause would be specified as if it were another column in the table definition:
...
country CHAR(2),
CONSTRAINT one_city_per_country UNIQUE (country,city)
);
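To make the column-constraint versus table-constraint distinction concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module purely as a convenient stand-in; the course's DBMS may differ in details. The table names author_col, author_tbl, and city, and the city column's type, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column constraint: the named PRIMARY KEY is part of the column definition.
conn.execute("""
    CREATE TABLE author_col (
        author_id CHAR(2) CONSTRAINT author_pk PRIMARY KEY,
        lastname  VARCHAR(15) NOT NULL
    )
""")

# Table constraint: declared separately, so it must list its column(s).
conn.execute("""
    CREATE TABLE author_tbl (
        author_id CHAR(2),
        lastname  VARCHAR(15) NOT NULL,
        CONSTRAINT pk_author PRIMARY KEY (author_id)
    )
""")

def pk_columns(table):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _, name, _, _, _, pk in rows if pk > 0]

# A key spanning several columns has to be written as a table constraint.
conn.execute("""
    CREATE TABLE city (
        city    VARCHAR(30),
        country CHAR(2),
        CONSTRAINT one_city_per_country UNIQUE (country, city)
    )
""")
conn.execute("INSERT INTO city VALUES ('Paris', 'FR')")
try:
    conn.execute("INSERT INTO city VALUES ('Paris', 'FR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Both author tables end up with author_id as the primary key, and the second identical (country, city) pair is rejected by the named UNIQUE constraint.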

Related

What does it mean when there is no "Id" at the end of a column name which appears to be a foreign key?

This is the database ERD for my final school project (at the bottom); I am required to build a database from it. I understand how to add tables that are set up like 'trainer', and even how to add self-joining tables, but something we have NOT learned is what it means, or what to do, when there is no Id at the end of a column name, as with 'evolvesfrom' and 'pokemonfightexppoint'.
Do you not have to add an Id at the end? From what my teacher taught us, I assumed you did. From what I see in this ERD, evolvesfrom is a self-join back to pokemonId. I only know how to do this when there is an Id at the end of evolvesfrom.
For something like trainerId, it is super easy to understand how to add the constraints and everything like so:
CREATE TABLE trainer (
trainerId INT IDENTITY(1, 1),
trainerName VARCHAR(50) NOT NULL,
CONSTRAINT pk_trainer_trainerId PRIMARY KEY (trainerId)
);
I just don't understand how to do this when there is no Id added. For the pokemonFight table, it is noted that "It is assumed that a Pokémon can play any battles at any battle locations. In other words, the battle experience points are functionally dependent on Pokémon, battle, and battle location", if that makes a difference.
If possible, could anyone show me an example on how to add a table, with constraints on either the pokemon or the pokemonFight table? (obviously you don't have to include the data types or anything).
Thank you in advance.
I am using SQL Server.
There is no required naming convention for columns in SQL Server that differentiates between a data column, a primary key column or a foreign key column.
The only constraints on column names are that they follow the rules for SQL Server identifier naming. However in a particular work environment you might well use a naming convention which does include ID at the end of the column name in order to clearly make the intention of the column obvious.
To create a self-referencing foreign key, you declare it like any other foreign key, either as part of the CREATE TABLE statement or afterwards with ALTER TABLE.
CREATE TABLE pokemon (
pokemonId INT IDENTITY(1, 1),
...
CONSTRAINT fk_pokemon_evolvesFrom FOREIGN KEY (evolvesFrom) REFERENCES pokemon (pokemonId)
);
-- OR
ALTER TABLE pokemon
ADD CONSTRAINT fk_pokemon_evolvesFrom FOREIGN KEY (evolvesFrom)
REFERENCES pokemon (pokemonId)
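As a quick check that a self-referencing foreign key works without any Id suffix on the column, here is a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so IDENTITY is replaced by a plain INTEGER PRIMARY KEY; the pokemonName column and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE pokemon (
        pokemonId   INTEGER PRIMARY KEY,
        pokemonName VARCHAR(50) NOT NULL,
        evolvesFrom INTEGER,  -- no "Id" suffix, still a foreign key column
        CONSTRAINT fk_pokemon_evolvesFrom
            FOREIGN KEY (evolvesFrom) REFERENCES pokemon (pokemonId)
    )
""")

conn.execute("INSERT INTO pokemon VALUES (1, 'Bulbasaur', NULL)")
conn.execute("INSERT INTO pokemon VALUES (2, 'Ivysaur', 1)")

# A row pointing at a pokemonId that does not exist should be refused.
try:
    conn.execute("INSERT INTO pokemon VALUES (3, 'Missing', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM pokemon").fetchone()[0]
```

The column name plays no role in enforcement; only the FOREIGN KEY ... REFERENCES declaration does.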

What are the drawbacks of foreign key constraints that reference non-primary-key columns?

I want to know if there are any drawbacks between a referential relation that uses primary key columns versus unique key columns (in SQL Server a foreign key constraint can only reference columns in a primary key or unique index).
Are there differences in how queries are parsed, in specific DB systems (e.g. Microsoft SQL Server 2005), based on whether a foreign key references a primary key versus a unique key?
Note that I'm not asking about the differences between using columns of different datatypes for referential integrity, joins, etc.
Purely as an example, imagine a DB in which there is a 'lookup table' dbo.Offices:
CREATE TABLE dbo.Offices (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Codes PRIMARY KEY,
Code varchar(50) NOT NULL CONSTRAINT UQ_Codes_Code UNIQUE
);
There is also a table dbo.Patients:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeCode varchar(50) NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeCode )
REFERENCES dbo.Offices ( Code )
);
What are the drawbacks of the table dbo.Patients and its constraint FK_Patients_Offices as in the T-SQL code above, versus the following alternate version:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeID int NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeID )
REFERENCES dbo.Offices ( ID )
);
Obviously, for the second version of dbo.Patients, the values in the column OfficeID don't need to be updated if changes are made to values in the Code column of dbo.Offices.
Also (obvious) is that using the Code column of dbo.Offices for foreign key references largely defeats the purpose of the surrogate key column ID – this is purely an artifact of the example. [Is there a better example of a table for which foreign key references might reasonably use a non-primary key?]
There is no drawback.
However..
Why do you have an ID column in the Offices table? A surrogate key is used to reduce space and improve performance over, say, a varchar column when used in other tables as a foreign key.
If you are going to use the varchar column for foreign keys, then you don't need a surrogate key.
Most benefits of having the IDENTITY are squandered by using the Code column for FKs.
Why do you think there would be any drawbacks??
Quite the contrary! It's good to see you're enforcing referential integrity as everyone should! No drawbacks - just good practice to do this!
I don't see any functional difference or any problems/issues with referencing a unique index vs. referencing a primary key.
Update: since you're not interested in performance- or datatype-related issues, this last paragraph probably doesn't add any additional value.
The only minor thing I see is that your OfficeCode is a VARCHAR, so you might run into issues with collation and/or casing (upper-/lower-case, depending on your collation), and JOINs on a fairly large (up to 50 bytes), variable-length field are probably not quite as efficient as JOIN conditions based on a small, fixed-length INT column.
A primary key is a candidate key and is not fundamentally different from any other candidate key. It is a widely observed convention that one candidate key per table is designated as a "primary" one and that this is the key used for all foreign key references.
A possible advantage of singling out one key in this way is that you make the use of the key clearer to users of the database: they know which key is the one being referenced without looking in every referencing table. This is entirely optional however. If you find it convenient to do otherwise or if requirements dictate that some other key should be referenced by a foreign key then I suggest you do that.
Assuming you add an index on the code column (which you definitely should as soon as you reference it), is there anything to be said against getting rid of the entire ID column and using the code column as the PK as well?
The most significant one I can think of is that, if they ever renumber the offices, you'll either lose integrity or need to update both tables. However likely that might be.
The performance consequences are vanishingly small unless you have irrationally large office codes, and even then less than you probably expect.
It's not considered a significant determinant of database design for most people.
Big flaw:
We were able to enter a value into dbo.Patients.OfficeID that does not exist in dbo.Offices.ID.
In that case it is meaningless to say that there is a reference.

Foreign key reference to a two-column primary key

I'm building a database that must work on MySQL, PostgreSQL, and SQLite. One of my tables has a two-column primary key:
CREATE TABLE tournament (
state CHAR(2) NOT NULL,
year INT NOT NULL,
etc...,
PRIMARY KEY(state, year)
);
I want a reference to the tournament table from another table, but I want this reference to be nullable. Here's how I might do it, imagining that a winner doesn't necessarily have a tournament:
CREATE TABLE winner (
name VARCHAR NOT NULL,
state CHAR(2) NULL,
year INT NULL
);
If state is null but year is not, or vice-versa, the table would be inconsistent. I believe the following FOREIGN KEY constraint fixes it:
ALTER TABLE winner ADD CONSTRAINT fk FOREIGN KEY (state, year) REFERENCES tournament (state, year);
Is this the proper way of enforcing consistency? Is this schema properly normalized?
Rule #1: ALWAYS SAY THE DATABASE YOU'RE USING
OK, I'm going to suggest you look at the ON DELETE clause and the MATCH clause. Because PostgreSQL is fairly SQL-compliant, I'll point you to its current docs on CREATE TABLE.
Excerpt:
These clauses specify a foreign key constraint, which requires that a group of one or more columns of the new table must only contain values that match values in the referenced column(s) of some row of the referenced table. If refcolumn is omitted, the primary key of the reftable is used. The referenced columns must be the columns of a unique or primary key constraint in the referenced table. Note that foreign key constraints cannot be defined between temporary tables and permanent tables.
A value inserted into the referencing column(s) is matched against the values of the referenced table and referenced columns using the given match type. There are three match types: MATCH FULL, MATCH PARTIAL, and MATCH SIMPLE, which is also the default. MATCH FULL will not allow one column of a multicolumn foreign key to be null unless all foreign key columns are null. MATCH SIMPLE allows some foreign key columns to be null while other parts of the foreign key are not null. MATCH PARTIAL is not yet implemented.
In addition, when the data in the referenced columns is changed, certain actions are performed on the data in this table's columns. The ON DELETE clause specifies the action to perform when a referenced row in the referenced table is being deleted. Likewise, the ON UPDATE clause specifies the action to perform when a referenced column in the referenced table is being updated to a new value. If the row is updated, but the referenced column is not actually changed, no action is done. Referential actions other than the NO ACTION check cannot be deferred, even if the constraint is declared deferrable. There are the following possible actions for each clause:
Also, there is a major exception here with MS SQL -- which doesn't permit partial matches (MATCH SIMPLE and MATCH PARTIAL) behaviors in foreign keys (defaults and enforces MATCH FULL). There are workarounds where you create a MATCH FULL index on the part of the table that IS NOT NULL for any of the composite key's constituents.
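To see what the MATCH SIMPLE behavior described in the excerpt means for the nullable (state, year) reference, here is a small sketch using SQLite (one of the three target databases) through Python's sqlite3 module. SQLite handles all foreign keys as MATCH SIMPLE. The table winner_checked and its CHECK constraint are not from the question or the docs; they are one assumed way to forbid half-null keys where MATCH FULL is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite

conn.execute("""
    CREATE TABLE tournament (
        state CHAR(2) NOT NULL,
        year  INT NOT NULL,
        PRIMARY KEY (state, year)
    )
""")
# Plain composite foreign key: SQLite treats it as MATCH SIMPLE.
conn.execute("""
    CREATE TABLE winner (
        name  VARCHAR(50) NOT NULL,
        state CHAR(2) NULL,
        year  INT NULL,
        FOREIGN KEY (state, year) REFERENCES tournament (state, year)
    )
""")
# Hypothetical variant: a CHECK that both columns are NULL or both are set.
conn.execute("""
    CREATE TABLE winner_checked (
        name  VARCHAR(50) NOT NULL,
        state CHAR(2) NULL,
        year  INT NULL,
        FOREIGN KEY (state, year) REFERENCES tournament (state, year),
        CHECK ((state IS NULL) = (year IS NULL))
    )
""")
conn.execute("INSERT INTO tournament VALUES ('CA', 2010)")

def try_insert(table, name, state, year):
    # Returns True if the row was accepted, False if a constraint refused it.
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (name, state, year))
        return True
    except sqlite3.IntegrityError:
        return False

half_null_plain = try_insert("winner", "a", "CA", None)            # FK skipped
half_null_checked = try_insert("winner_checked", "a", "CA", None)  # CHECK fails
both_null = try_insert("winner_checked", "b", None, None)
valid_ref = try_insert("winner_checked", "c", "CA", 2010)
bad_ref = try_insert("winner_checked", "d", "TX", 1999)            # FK fails
```

With the plain foreign key, the half-null row slips through; with the extra CHECK, only fully null or fully matching keys are accepted.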

Can I put constraint on column without referring to another table?

I have a text column that should only have 1 of 3 possible strings. To put a constraint on it, I would have to reference another table. Can I instead put the values of the constraint directly on the column without referring to another table?
If this is SQL Server, Oracle, or PostgreSQL, yes, you can use a check constraint.
If it's MySQL, check constraints are recognized but not enforced (this holds for versions before 8.0.16, which started enforcing them). You can use an enum, though. If you need a comma-separated list, you can use a set.
However, this is generally frowned upon, since it's definitely not easy to maintain. Just best to create a lookup table and ensure referential integrity through that.
In addition to the CHECK constraint and ENUM data type that others mention, you could also write a trigger to enforce your desired restriction.
I don't necessarily recommend a trigger as a good solution, I'm just pointing out another option that meets your criteria of not referencing a lookup table.
My habit is to define lookup tables instead of using constraints or triggers, when the rule is simply to restrict a column to a finite set of values. The performance impact of checking against a lookup table is no worse than using CHECK constraints or triggers, and it's a lot easier to manage when the set of values might change from time to time.
Also, a common task is to query the set of permitted values, for instance to populate a form field in the user interface. When the permitted values are in a lookup table, this is a lot easier than when they're defined in a list of literal values in a CHECK constraint or ENUM definition.
Re comment "how exactly to do lookup without id"
CREATE TABLE LookupStrings (
string VARCHAR(20) PRIMARY KEY
);
CREATE TABLE MainTable (
main_id INT PRIMARY KEY,
string VARCHAR(20) NOT NULL,
FOREIGN KEY (string) REFERENCES LookupStrings (string)
);
Now you can be assured that no value in MainTable.string is invalid, since the referential integrity prevents that. But you don't have to join to the LookupStrings table to get the string, when you query MainTable:
SELECT main_id, string FROM MainTable;
See? No join! But you get the string value.
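Here is a minimal sketch of the two tables above in action, run through Python's built-in sqlite3 module as a stand-in for whichever DBMS you use. The three sample strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.execute("CREATE TABLE LookupStrings (string VARCHAR(20) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE MainTable (
        main_id INT PRIMARY KEY,
        string  VARCHAR(20) NOT NULL,
        FOREIGN KEY (string) REFERENCES LookupStrings (string)
    )
""")
conn.executemany("INSERT INTO LookupStrings VALUES (?)",
                 [("first",), ("second",), ("third",)])

conn.execute("INSERT INTO MainTable VALUES (1, 'second')")
# A value missing from the lookup table is refused by the foreign key.
try:
    conn.execute("INSERT INTO MainTable VALUES (2, 'fourth')")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True

# The string is stored in MainTable itself, so no join is needed to read it.
rows = conn.execute("SELECT main_id, string FROM MainTable").fetchall()
# The permitted values can be listed with an ordinary query.
permitted = [r[0] for r in
             conn.execute("SELECT string FROM LookupStrings ORDER BY string")]
```

Adding a new permitted value is then a plain INSERT into LookupStrings rather than a schema change.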
Re comment about multiple foreign key columns:
You can have two individual foreign keys, each potentially pointing to different rows in the lookup table. The foreign key column doesn't have to be named the same as the column in the referenced table.
My common example is a bug-tracking database, where a bug was reported by one user, but assigned to be fixed by a different user. Both reported_by and assigned_to are foreign keys referencing the Accounts table.
CREATE TABLE Bugs (
bug_id INT PRIMARY KEY,
reported_by INT NOT NULL,
assigned_to INT,
FOREIGN KEY (reported_by) REFERENCES Accounts (account_id),
FOREIGN KEY (assigned_to) REFERENCES Accounts (account_id)
);
In Oracle, SQL Server and PostgreSQL, use a CHECK constraint:
CREATE TABLE mytable (myfield VARCHAR(50) CHECK (myfield IN ('first', 'second', 'third')));
In MySQL, use the ENUM datatype:
CREATE TABLE mytable (myfield ENUM ('first', 'second', 'third'));
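A quick way to watch a CHECK constraint reject a value is the following sketch. It uses SQLite through Python's sqlite3 module, which is an assumption made for convenience; SQLite is not one of the systems named above, but it enforces CHECK constraints in the same spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        myfield VARCHAR(50) CHECK (myfield IN ('first', 'second', 'third'))
    )
""")

def try_insert(value):
    # Returns True if the value was stored, False if the CHECK refused it.
    try:
        conn.execute("INSERT INTO mytable VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("first")
rejected = not try_insert("fourth")
```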

Can there be constraints with the same name in a DB?

This is a follow-on question from the one I asked here.
Can constraints in a DB have the same name?
Say I have:
CREATE TABLE Employer
(
EmployerCode VARCHAR(20) PRIMARY KEY,
Address VARCHAR(100) NULL
)
CREATE TABLE Employee
(
EmployeeID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
)
CREATE TABLE BankAccount
(
BankAccountID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
Amount MONEY NOT NULL,
CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
)
Is this allowable? Does it depend on the DBMS (I'm on SQL Server 2005)? If it is not allowable, does anyone have any suggestions on how to work around it?
No - a constraint is a database object as well, and thus its name needs to be unique.
Try adding e.g. the table name to your constraint, that way it'll be unique.
CREATE TABLE BankAccount
(
BankAccountID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
Amount MONEY NOT NULL,
CONSTRAINT FK_BankAccount_Employer
FOREIGN KEY (EmployerCode) REFERENCES Employer
)
We basically use "FK_(child table)_(parent table)" to name the constraints and are quite happy with this naming convention.
Information from MSDN
It is not explicitly documented that constraint names have to be unique within the schema (i.e., two different schemas in the same database can both contain a constraint with the same name). Rather, you need to assume that the identifiers of database objects must be unique within the containing schema unless specified otherwise. So the constraint name is defined as:
Is the name of the constraint. Constraint names must follow the rules for identifiers, except that the name cannot start with a number sign (#). If constraint_name is not supplied, a system-generated name is assigned to the constraint.
Compare this to the name of an index:
Is the name of the index. Index names must be unique within a table or view but do not have to be unique within a database. Index names must follow the rules of identifiers.
which explicitly narrows the scope of the identifier.
The other answers are all good but I thought I'd add an answer to the question in the title, i.e., "can there be constraints with the same name in a DB?"
The answer for MS SQL Server is yes – but only so long as the constraints are in different schemas. Constraint names must be unique within a schema.
I was always puzzled why constraint names must be unique in the database, since they seem like they're associated with tables.
Then I read about SQL-99's ASSERTION constraint, which is like a check constraint, but exists apart from any single table. The conditions declared in an assertion must be satisfied consistently like any other constraint, but the assertion can reference multiple tables.
AFAIK no SQL vendor implements ASSERTION constraints. But this helps explain why constraint names are database-wide in scope.
It depends on the DBMS.
For example, on PostgreSQL the answer is yes:
Because PostgreSQL does not require constraint names to be unique within a schema (but only per-table), it is possible that there is more than one match for a specified constraint name.
Source : https://www.postgresql.org/docs/current/static/sql-set-constraints.html
I've seen identical foreign key constraint names on two different tables within the same schema.
Does it depend on the DBMS (I'm on SQL Server 2005)?
Yes, apparently it does depend on the DBMS.
Other answers say it's not permitted, but I have a MS SQL CE ("Compact Edition") database in which I accidentally but successfully created two FK constraints, in two tables, with the same constraint name.
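As another data point for "it depends on the DBMS", here is a small sketch that runs the question's three tables against SQLite via Python's sqlite3 module. SQLite is not mentioned in this thread, so treat it purely as an extra illustration: it accepts the same constraint name on two tables. The MONEY type is kept from the question; SQLite accepts arbitrary type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE Employer (
        EmployerCode VARCHAR(20) PRIMARY KEY,
        Address      VARCHAR(100) NULL
    )
""")
conn.execute("""
    CREATE TABLE Employee (
        EmployeeID   INT PRIMARY KEY,
        EmployerCode VARCHAR(20) NOT NULL,
        CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
    )
""")
# Same constraint name as in Employee; SQLite does not object.
conn.execute("""
    CREATE TABLE BankAccount (
        BankAccountID INT PRIMARY KEY,
        EmployerCode  VARCHAR(20) NOT NULL,
        Amount        MONEY NOT NULL,
        CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
    )
""")

tables = sorted(name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

On SQL Server the third statement would fail with a duplicate-name error, which is why the table-qualified naming convention is still the safer habit.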
Good practice is to create index and constraint names specifying table name at the beginning.
There are two approaches: put the index/constraint type at the beginning or at the end, e.g.
UQ_TableName_FieldName
or
TableName_FieldName_UQ
Foreign key names should also contain the names of the referenced table/field(s).
One good naming convention is to give table names in the form FullName_3LetterUniqueAlias, e.g.
Employers_EMR
Employees_EMP
BankAccounts_BNA
Banks_BNK
This gives you the opportunity to use "predefined" aliases in queries, which improves readability and also makes naming foreign keys easier, like:
EMPEMR_EmployerCode_FK
BNKEMR_EmployerCode_FK