which constraint makes sure a column has some value entered? I am confused between primary key and not null constraint .
A NOT NULL constraint.
All columns that participate in a PK must also not allow NULL but the PK constraint guarantees something more, uniqueness, - i.e. no two rows in the table can have the same value for the primary key.
In SQL Server even though syntactically you can name a NOT NULL constraint in the DDL it is different from other constraints in that no metadata (including even the name) is actually stored for the constraint itself.
CREATE TABLE T
(
X INT CONSTRAINT NotNull NOT NULL
)
Another point I didn't see addressed: NULL and empty string are two very different things, but they are often deemed interchangeable by a large portion of the community.
You can declare a varchar column as NOT NULL but you can still do this:
DECLARE #x TABLE(y VARCHAR(32) NOT NULL);
INSERT #x(y) VALUES('');
So if your goal is to make sure there is a valid value that is neither NULL nor a zero-length string, you can also add a check constraint, e.g.
DECLARE #x TABLE(y VARCHAR(32) NOT NULL CHECK (DATALENGTH(LTRIM(y)) > 0));
NOT NULL
is the condition that a field has a value. You can enforce that a field always have a value entered for every record inserted or updated, making the field NOT NULL in the table definition.
A primary key must meet these three conditions:
The values of the field are NOT NULL.
The values are unique.
The values are immutable.
The database can enforce the first two conditions with a unique index (and a not null condition on the field).
The third condition is not typically enforced by the database. Databases will typically allow changes to primary key fields, so DBAs can "fix" them. So the third condition is more philosophical, in that you agree to use the key for identification, and not write an application which changes the value, unless intended for an administrator to fix the keys.
I have been using field here, but a primary key can be a compound primary key, made up of any combination of fields which meets the conditions. Any combination of fields which matches the first 2 or all 3 conditions is called a candidate key.
Only one candidate key can be used as the primary key. Which one is just an arbitrary choice.
Related
Does defining the constraint PRIMARY KEY already makes sure that the column values are unique and not null or do you have to define it seperately?
Yes. But one exception is Sqlite.
See https://sqlite.org/lang_createtable.html
Each row in a table with a primary key must have a unique combination of values in its primary key columns. For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows have identical primary key values, that is a constraint violation.
According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.
I need to add a foreign key to a table.
The column value will be added by the users eventually over time as each record is modified.
So I created a FK constraint with NOCHECK. But the column can't be left empty, so I declared it as NOT NULL. SQL Server won't let me do this on a nullable column, it must have a value or a default.
How can I create a FK that at first will have no entries but should it be modified a Key value must be provided?
It sounds like you have an optional relationship - modelling the FK as NULLable while still enforcing the constraint is a common approach here (NULLs FK's will not be checked by the RDBMS). e.g. In the below, we have a Person table, but initially during Person capture, we do not know in which City the user lives, so we make the CityId foreign key nullable:
CREATE TABLE City
(
CityId INT NOT NULL PRIMARY KEY,
Name NVARCHAR(200)
);
CREATE TABLE Person
(
PersonId INT NOT NULL PRIMARY KEY,
CityId INT NULL, -- Nullable
Name NVARCHAR(200),
CONSTRAINT FK_PersonCity FOREIGN KEY(CityID) REFERENCES City(CityID)
);
Person will now allow CityID to be NULL (meaning not known), or if non-null, the CityID must be a valid and exist in City. The only 'gotcha' here is that you will need to LEFT OUTER JOIN Person back to City to consider persons with unknown City.
Another strategy I've seen is to add a DUMMY row into the primary table (e.g. City(ID = 0, Name = 'Unknown') and then use this as a default for foreign keys - it will guarantee an INNER JOIN.
If you need to enforce the requirement that at a later step that the Foreign Key (City classification) becomes mandatory, I would suggest you do this in your business logic - either in code, or from a stored procedure.
When you add a column to an existing table, the new column is created with NULL values. So you can't add a not null column to a table unless you also specify a default value. In that case, the new field will contain the default value.
If you want to add a new column which will be a mandatory FK, you really only have two options:
Create the column as nullable, wait until all the fields have been filled, then change the definition of the column to not null.
Create the column as nullable, execute an update filling the fields with the correct value, then change the definition of the column to not null.
The advantage of the first option is that you're not holding locks on the table for very long. The disadvantage is that it could take weeks or months to finally getting around to populating every row.
The advantage of the second option is that the field is created and filled very quickly and all of it can be placed in one transaction. The disadvantage is that you are locking the entire table for the duration. It could be very difficult obtaining the lock in the first place and your DBA might have a few choice words about locking it for more than a few seconds.
Get with the aforementioned DBA and decide which way is best for your particular situation.
I have a table with only 1 primary column (nvarchar), as in the designer, it's marked as primary key and not allow nulls. But by somehow, there is a row in that table with the key value being empty, it's of course not null and doesn't duplicate with any other rows in the primary key column and there isn't any conflict or violation occurring.
However as far as I know, that kind of value (empty) is not allowed for a primary column in SQL Server. I wonder if there is any option to turn on to make it work properly. Or I have to check the value myself through CHECK constraint or right in C# code (before updating).
Your help would be highly appreciated. Thanks!
"Empty-string" is a string with length of zero. It is NOT NULL, so it doesn't violate a null check. It most certainly IS an allowed value for a character-based primary key column in SQL Server. If the empty string value is not allowed by the business, a check constraint would be the best way to implement this as a business rule. That way, clients which might not know about the rule can't violate it.
This code runs without violations in SQL Server, I just tested it just to be sure.
create table TestTable (
myKey varchar(10) primary key,
myData int
)
GO
insert TestTable
select '', 1
I have a table with this layout:
CREATE TABLE Favorites (
FavoriteId uuid NOT NULL PRIMARY KEY,
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
MenuId uuid
);
I want to create a unique constraint similar to this:
ALTER TABLE Favorites
ADD CONSTRAINT Favorites_UniqueFavorite UNIQUE(UserId, MenuId, RecipeId);
However, this will allow multiple rows with the same (UserId, RecipeId), if MenuId IS NULL. I want to allow NULL in MenuId to store a favorite that has no associated menu, but I only want at most one of these rows per user/recipe pair.
The ideas I have so far are:
Use some hard-coded UUID (such as all zeros) instead of null.
However, MenuId has a FK constraint on each user's menus, so I'd then have to create a special "null" menu for every user which is a hassle.
Check for existence of a null entry using a trigger instead.
I think this is a hassle and I like avoiding triggers wherever possible. Plus, I don't trust them to guarantee my data is never in a bad state.
Just forget about it and check for the previous existence of a null entry in the middle-ware or in a insert function, and don't have this constraint.
I'm using Postgres 9.0. Is there any method I'm overlooking?
Postgres 15 or newer
Postgres 15 adds the clause NULLS NOT DISTINCT. The release notes:
Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)
Previously NULL values were always indexed as distinct values, but
this can now be changed by creating constraints and indexes using
UNIQUE NULLS NOT DISTINCT.
With this clause null is treated like just another value, and a UNIQUE constraint does not allow more than one row with the same null value. The task is simple now:
ALTER TABLE favorites
ADD CONSTRAINT favo_uni UNIQUE NULLS NOT DISTINCT (user_id, menu_id, recipe_id);
There are examples in the manual chapter "Unique Constraints".
The clause switches behavior for all keys of the same index. You can't treat null as equal for one key, but not for another.
NULLS DISTINCT remains the default (in line with standard SQL) and does not have to be spelled out.
The same clause works for a UNIQUE index, too:
CREATE UNIQUE INDEX favo_uni_idx
ON favorites (user_id, menu_id, recipe_id) NULLS NOT DISTINCT;
Note the position of the new clause after the key fields.
Postgres 14 or older
Create two partial indexes:
CREATE UNIQUE INDEX favo_3col_uni_idx ON favorites (user_id, menu_id, recipe_id)
WHERE menu_id IS NOT NULL;
CREATE UNIQUE INDEX favo_2col_uni_idx ON favorites (user_id, recipe_id)
WHERE menu_id IS NULL;
This way, there can only be one combination of (user_id, recipe_id) where menu_id IS NULL, effectively implementing the desired constraint.
Possible drawbacks:
You cannot have a foreign key referencing (user_id, menu_id, recipe_id). (It seems unlikely you'd want a FK reference three columns wide - use the PK column instead!)
You cannot base CLUSTER on a partial index.
Queries without a matching WHERE condition cannot use the partial index.
If you need a complete index, you can alternatively drop the WHERE condition from favo_3col_uni_idx and your requirements are still enforced.
The index, now comprising the whole table, overlaps with the other one and gets bigger. Depending on typical queries and the percentage of null values, this may or may not be useful. In extreme situations it may even help to maintain all three indexes (the two partial ones and a total on top).
This is a good solution for a single nullable column, maybe for two. But it gets out of hands quickly for more as you need a separate partial index for every combination of nullable columns, so the number grows binomially. For multiple nullable columns, see instead:
Why doesn't my UNIQUE constraint trigger?
Aside: I advise not to use mixed case identifiers in PostgreSQL.
You could create a unique index with a coalesce on the MenuId:
CREATE UNIQUE INDEX
Favorites_UniqueFavorite ON Favorites
(UserId, COALESCE(MenuId, '00000000-0000-0000-0000-000000000000'), RecipeId);
You'd just need to pick a UUID for the COALESCE that will never occur in "real life". You'd probably never see a zero UUID in real life but you could add a CHECK constraint if you are paranoid (and since they really are out to get you...):
alter table Favorites
add constraint check
(MenuId <> '00000000-0000-0000-0000-000000000000')
You can store favourites with no associated menu in a separate table:
CREATE TABLE FavoriteWithoutMenu
(
FavoriteWithoutMenuId uuid NOT NULL, --Primary key
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
UNIQUE KEY (UserId, RecipeId)
)
I believe there is an option that combines the previous answers into a more optimal solution.
create table unique_with_nulls (
id serial not null,
name varchar not null,
age int2 not null,
email varchar,
email_discriminator varchar not null generated always as ( coalesce(email::varchar, 0::varchar) ) stored,
constraint uwn_pkey primary key (id)
);
create unique index uwn_name_age_email_uidx on unique_with_nulls(name, age, email_discriminator);
What happens here is that the column email_discriminator will be generated at "insert-or-update-time", as either an actual email, or "0" if the former one is null. Then, your unique index must target the discriminator column.
This way we don't have to create two partial indexes, and we don't loose the ability to use indexed scans on name and age selection only.
Also, you can keep the type of the email column and we don't have any problems with the coalesce function, because email_discriminator is not a foreign key. And you don't have to worry about this column receiving unexpected values because generated columns cannot be written to.
I can see three opinionated drawbacks in this solution, but they are all fine for my needs:
the duplication of data between the email and email_discriminator.
the fact that I must write to a column and read from another.
the need to find a value that is outside the set of acceptable values of email to be the fallback one (and sometimes this could be hard to find or even subjective).
I think there is a semantic problem here. In my view, a user can have a (but only one) favourite recipe to prepare a specific menu. (The OP has menu and recipe mixed up; if I am wrong: please interchange MenuId and RecipeId below)
That implies that {user,menu} should be a unique key in this table. And it should point to exactly one recipe. If the user has no favourite recipe for this specific menu no row should exist for this {user,menu} key pair. Also: the surrogate key (FaVouRiteId) is superfluous: composite primary keys are perfectly valid for relational-mapping tables.
That would lead to the reduced table definition:
CREATE TABLE Favorites
( UserId uuid NOT NULL REFERENCES users(id)
, MenuId uuid NOT NULL REFERENCES menus(id)
, RecipeId uuid NOT NULL REFERENCES recipes(id)
, PRIMARY KEY (UserId, MenuId)
);
I have a text column that should only have 1 of 3 possible strings. To put a constraint on it, I would have to reference another table. Can I instead put the values of the constraint directly on the column without referring to another table?
If this is SQL Server, Oracle, or PostgreSQL, yes, you can use a check constraint.
If it's MySQL, check constraints are recognized but not enforced. You can use an enum, though. If you need a comma-separated list, you can use a set.
However, this is generally frowned upon, since it's definitely not easy to maintain. Just best to create a lookup table and ensure referential integrity through that.
In addition to the CHECK constraint and ENUM data type that other mention, you could also write a trigger to enforce your desired restriction.
I don't necessarily recommend a trigger as a good solution, I'm just pointing out another option that meets your criteria of not referencing a lookup table.
My habit is to define lookup tables instead of using constraints or triggers, when the rule is simply to restrict a column to a finite set of values. The performance impact of checking against a lookup table is no worse than using CHECK constraints or triggers, and it's a lot easier to manage when the set of values might change from time to time.
Also a common task is to query the set of permitted value, for instance to populate a form field in the user interface. When the permitted values are in a lookup table, this is a lot easier than when they're defined in a list of literal values in a CHECK constraint or ENUM definition.
Re comment "how exactly to do lookup without id"
CREATE TABLE LookupStrings (
string VARCHAR(20) PRIMARY KEY
);
CREATE TABLE MainTable (
main_id INT PRIMARY KEY,
string VARCHAR(20) NOT NULL,
FOREIGN KEY (string) REFERENCES LookupStrings (string)
);
Now you can be assured that no value in MainTable.string is invalid, since the referential integrity prevents that. But you don't have to join to the LookupStrings table to get the string, when you query MainTable:
SELECT main_id, string FROM MainTable;
See? No join! But you get the string value.
Re comment about multiple foreign key columns:
You can have two individual foreign keys, each potentially pointing to different rows in the lookup table. The foreign key column doesn't have to be named the same as the column in the referenced table.
My common example is a bug-tracking database, where a bug was reported by one user, but assigned to be fixed by a different user. Both reported_by and assigned_to are foreign keys referencing the Accounts table.
CREATE TABLE Bugs (
bug_id INT PRIMARY KEY,
reported_by INT NOT NULL,
assigned_to INT,
FOREIGN KEY (reported_by) REFERENCES Accounts (account_id),
FOREIGN KEY (assigned_to) REFERENCES Accounts (account_id)
);
In Oracle, SQL Server and PostgreSQL, use CHECK constraint.
CREATE TABLE mytable (myfield INT VARCHAR(50) CHECK (myfield IN ('first', 'second', 'third'))
In MySQL, use ENUM datatype:
CREATE TABLE mytable (myfield ENUM ('first', 'second', 'third'))