column pk dual varchars or int?


Imagine a table that has a JobID and a SubJobID.
The combination of JobID + SubJobID makes the row unique.
For instance:
JobID = CAX100
SubJobID = CA00
JobID = CAX200RW
SubJobID = CA00
JobID = CAX200YYXZ
SubJobID = CA01
etc...
These are valid rows, but if I try to insert JobID=CAX100 and SubJobID=CA00 again, the insert should be rejected with an error.
Should I keep the combination of JobID and SubJobID as the primary key, or should I introduce an int as the primary key (maybe an autoincrement int)? I know ints are smaller and generally better for primary keys, but how do I then ensure the uniqueness of JobID + SubJobID?
Currently JobID and SubJobID form a composite primary key. Which is recommended: keeping both varchar columns as the primary key, or introducing a new auto-numbered int column and creating a unique constraint on JobID + SubJobID?

If you add a surrogate autoincrement integer column, you still need a unique index on the (JobID, SubJobID) columns. This pair is the natural primary key.
The surrogate column may not add much value; it costs extra disk and memory to store.
That said, if the composite (JobID, SubJobID) key is referenced from child tables as a foreign key, a surrogate key can make more sense. 10,000 rows here could have 10 million child rows, and then the varchar overhead matters.
However, there is no truth in anything you read that says "Thou shalt always have an identity/autoincrement primary key".
It is an implementation decision for performance depending on the design.
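As a minimal sketch of the surrogate-key option discussed above: the table name `jobs`, the `id` column, and the use of SQLite through Python's `sqlite3` module are assumptions made only to get something runnable; the question does not say which database is in use. The point is that the unique constraint on (JobID, SubJobID) still rejects the duplicate even though the primary key is an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (                               -- hypothetical table name
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        JobID     VARCHAR(20) NOT NULL,
        SubJobID  VARCHAR(20) NOT NULL,
        UNIQUE (JobID, SubJobID)                      -- natural key is still enforced
    )
""")

rows = [("CAX100", "CA00"), ("CAX200RW", "CA00"), ("CAX200YYXZ", "CA01")]
conn.executemany("INSERT INTO jobs (JobID, SubJobID) VALUES (?, ?)", rows)

# Re-inserting an existing (JobID, SubJobID) pair must fail.
try:
    conn.execute("INSERT INTO jobs (JobID, SubJobID) VALUES ('CAX100', 'CA00')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(duplicate_rejected, row_count)
```

Child tables would then carry only the integer `id` as their foreign key instead of the two varchar columns.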

Personally, I would define an integer Primary Key and a unique index on the 2 varchar fields. This is assuming that your database application is non-trivial and performance is an issue.
If you only have a few dozen records in this table, on the other hand, it is possible to over-engineer things.

Related

sqlite text as primary key vs autoincrement integers

I'm currently debating between two strategies for using a text column as a key.
The first one is to simply use the text column itself as a key, as such:
create table a(
key_a text primary key
);
create table b(
key_b text primary key
);
create table c(
key_a text,
key_b text,
foreign key("key_a") references a("key_a"),
foreign key("key_b") references b("key_b")
);
I'm concerned that this would result in every key being duplicated, once in a and b and another in c, since text isn't stored inline.
My second approach is to use an autoincrement id on the first two tables as a primary key, and use those ids on table c to refer to them, as such:
create table a(
id_a integer,
key_a text unique,
primary key("id_a" autoincrement)
);
create table b(
id_b integer,
key_b text unique,
primary key("id_b" autoincrement)
);
create table c(
id_a integer,
id_b integer,
foreign key("id_a") references a("id_a"),
foreign key("id_b") references b("id_b")
);
Am I right to be concerned about text duplication in the first case? Or does sqlite somehow intern these and just use an id for both, akin to what the second strategy does?

SQLite does not automatically compress or intern text, so the answer to your second question is "no": each table stores its own copy of the string.
Should you use text or an auto-incrementing id as the primary key? This can be a complex question. But happily, the answer is that it doesn't make much difference. That said, there are some considerations:
Integers are of fixed length. In general, fixed-length keys are slightly more efficient in B-tree indexes than variable-length keys.
If the strings are short (like 1 or 2 or 3 characters), then they may be shorter -- or no longer -- than integers.
If you change the string (say, if it is originally misspelled), then using an "artificial" primary key makes this easy: just change the value in one table. Using the string itself as a key can result in lots of updates to lots of tables.
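The last consideration can be sketched roughly as follows, reusing the `a` and `c` tables from the question's second approach (table `b` is left out for brevity, and the sample values are made up). With an artificial key, correcting the misspelled text touches one row in one table, and every referencing row sees the new value through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_a INTEGER PRIMARY KEY, key_a TEXT UNIQUE);
    CREATE TABLE c (id_a INTEGER REFERENCES a(id_a));
    INSERT INTO a (id_a, key_a) VALUES (1, 'mispelled');   -- made-up sample value
    INSERT INTO c (id_a) VALUES (1), (1), (1);
""")

# Fix the spelling in the parent table only; table c is not modified.
cur = conn.execute("UPDATE a SET key_a = 'misspelled' WHERE id_a = 1")
rows_touched = cur.rowcount

values = [r[0] for r in conn.execute(
    "SELECT a.key_a FROM c JOIN a ON a.id_a = c.id_a")]
print(rows_touched, values)
```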

Am I right to be concerned about text duplication in the first case?
Or does sqlite somehow intern these and just use an id for both, akin
to what the second strategy does?
Yes, you are right to be concerned. The text will be duplicated.
Also, even if you did not define an integer primary key in your 1st approach, there is one.
From Rowid Tables:
The PRIMARY KEY of a rowid table (if there is one) is usually not the
true primary key for the table, in the sense that it is not the unique
key used by the underlying B-tree storage engine. The exception to
this rule is when the rowid table declares an INTEGER PRIMARY KEY. In
the exception, the INTEGER PRIMARY KEY becomes an alias for the rowid.
The true primary key for a rowid table (the value that is used as the
key to look up rows in the underlying B-tree storage engine) is the
rowid.
In your 2nd approach actually you are not creating a new column in each of the tables a and b by defining an integer primary key.
What you are doing is aliasing the existing rowid column:
id_a becomes the alias of rowid of the table a
id_b becomes the alias of rowid of the table b.
So, defining these integer primary keys is not more expensive in terms of space in the parent tables.
Although with your 1st approach you can avoid explicit updates in the child tables when you update a value in the parent tables by defining the foreign keys with ON UPDATE CASCADE, your 2nd approach is what I would suggest.
An integer primary key whose value is assigned by the system, so that you don't even have to know or worry about it, is common practice.
All you have to do is use that primary key and its corresponding foreign keys in the queries that you create to access the parent tables when you want to fetch from them the text values.
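As a rough check of the rowid aliasing described above, the sketch below creates one table per approach (the names `a_text` and `a_int` are hypothetical, chosen so both variants can live in one database) and reads the `rowid` column from each. It uses Python's `sqlite3` module only as a convenient way to run the statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1st approach: text primary key; a hidden rowid still exists.
    CREATE TABLE a_text (key_a TEXT PRIMARY KEY);
    -- 2nd approach: the integer primary key is an alias for that rowid.
    CREATE TABLE a_int (id_a INTEGER PRIMARY KEY AUTOINCREMENT, key_a TEXT UNIQUE);
    INSERT INTO a_text (key_a) VALUES ('alpha'), ('beta');
    INSERT INTO a_int (key_a) VALUES ('alpha'), ('beta');
""")

text_rows = conn.execute(
    "SELECT rowid, key_a FROM a_text ORDER BY rowid").fetchall()
int_rows = conn.execute(
    "SELECT rowid, id_a, key_a FROM a_int ORDER BY rowid").fetchall()

print(text_rows)  # rowid is present even though it was never declared
print(int_rows)   # rowid and id_a hold the same value in every row
```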

For performance (also it is a good db practice) you should stick to numeric/int value for the Primary Key.
As for the second approach, I'm not getting the concept you are after. Could you elaborate more on this?

Do I really need PRIMARY KEY when using UNIQUE NOT NULL columns?

My knowledge in SQL is limited and I would appreciate someone who could help me to clarify the use of PRIMARY KEY in the following circumstances. I created a table to support ISO country information. I'm using MariaDB 10 but I believe that will not be relevant for the kind of questions I have(?)
CREATE TABLE IF NOT EXISTS python.country
(
iso_code INTEGER( 3) NOT NULL ,
iso_2_alpha VARCHAR( 2) NOT NULL ,
iso_3_alpha VARCHAR( 3) NOT NULL ,
short_name VARCHAR( 32) NOT NULL ,
long_name VARCHAR( 64) NOT NULL ,
flag_link VARCHAR(2000) DEFAULT(NULL),
CONSTRAINT CK_iso_code CHECK (iso_code > 0 AND iso_code <= 999) ,
CONSTRAINT CK_iso_alpha CHECK (
iso_2_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_2_alpha) = 2
AND
iso_3_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_3_alpha) = 3
) ,
CONSTRAINT CK_names CHECK (
short_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
AND
long_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
) ,
CONSTRAINT UN_short_name UNIQUE (short_name) ,
CONSTRAINT UN_long_name UNIQUE (long_name) ,
CONSTRAINT UN_iso_2_alpha UNIQUE (iso_2_alpha) ,
CONSTRAINT UN_iso_3_alpha UNIQUE (iso_3_alpha)
-- ???
-- CONSTRAINT PK_country PRIMARY KEY (iso_code,iso_2_alpha,iso_3_alpha)
); -- ENGINE = 'InnoDB';
Question 1: Since all the main columns (iso_code, iso_2_alpha, iso_3_alpha) are NOT NULL and UNIQUE, does it make sense to create a composite PRIMARY KEY? I "believe" it's a waste of space and time when inserting new elements?
Question 2: Can I safely use iso_code as the FOREIGN KEY in another table?
Many thanks.

Since all the main columns (iso_code, iso_2_alpha, iso_3_alpha) are NOT NULL and UNIQUE, does it make sense to create a composite PRIMARY KEY? I "believe" it's a waste of space and time when inserting new elements?
Your proposed PK is a superkey over existing keys. It's not necessary in and of itself. You could choose to declare one of your unique key constraints as a PK instead but it's not necessary.
Can I safely use iso_code as the FOREIGN KEY in another table?
If you also mark iso_code as a unique key in this table, that should work fine.
Some people would recommend that every table always have an autogenerated column marked as PK. That's fine so long as you also enforce the logical keys. Unfortunately, many people will just create that auto-PK and no other keys, which means your data is nonsense.
You've chosen (currently) to just have the logical keys. I think that's fine in this case, especially as several (iso_code, iso_2_alpha and iso_3_alpha) are likely to be more compact than the recommended autogenerated column.
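A small sketch of the foreign-key point, under some loud assumptions: it runs on SQLite via Python's `sqlite3` rather than MariaDB, the `country` table is cut down to three columns, and the `city` table is hypothetical. It shows a foreign key referencing `iso_code` once that column carries its own UNIQUE constraint, with no PRIMARY KEY declared anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE country (
        iso_code    INTEGER NOT NULL UNIQUE,   -- the logical key referenced below
        iso_2_alpha TEXT    NOT NULL UNIQUE,
        short_name  TEXT    NOT NULL UNIQUE
    );
    CREATE TABLE city (                        -- hypothetical child table
        name             TEXT    NOT NULL,
        country_iso_code INTEGER NOT NULL REFERENCES country(iso_code)
    );
    INSERT INTO country VALUES (620, 'PT', 'Portugal');
    INSERT INTO city VALUES ('Lisbon', 620);
""")

# A city pointing at a non-existent iso_code is rejected.
try:
    conn.execute("INSERT INTO city VALUES ('Nowhere', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

city_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
print(orphan_rejected, city_count)
```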

Can't comment on performance and efficiency but one thing with composite keys is that when you use them as a primary key, you have to repeat them in your foreign key. I.e, PK iso_code, iso_2_alpha, iso_3_alpha will be additional FK columns in all the tables related. You also have to then query by these 3 columns in your SQL queries. Bit of a PITA IMO when you can simply use a generic, unique self generating column.
If you can use iso_code and you are sure you will never need to insert a duplicate iso_code with a different iso_2_alpha or iso_3_alpha, then go ahead. But IMHO you should future-proof the table, make it more robust against the unexpected, and use a new dedicated id column unrelated to the business data.

Can a Unique constraint on multiple Columns add indexes separately on those columns

I have a table with the structure shown below:
CREATE TABLE IF NOT EXISTS tblvideolikes (
itemid SERIAL PRIMARY KEY,
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT liked_video_user UNIQUE(videoid,userid)
)
I have a lot of select queries filtering on userid and videoid. I want to know whether adding a unique constraint on both columns is sufficient, or whether I also need to index each of them. I have searched a lot about this, but nothing makes it clear.

If you have to enforce the unique combination of both columns, you have to create the unique index on both of them.
Postgres will use that index as well if your where clause only has a condition on the first column of the index (the usual "it depends" on index usage still applies here).
Postgres is able to use a column that is not the leading column of an index for a where condition; however, that is less efficient than using a leading column.
I would put first the column that is more often used alone in a where condition. The order of the columns does not matter for the uniqueness.
If the usage of (only) the second column is as frequent as using the (only) first column, then adding an additional index with only the second column could make sense, e.g.:
CREATE TABLE IF NOT EXISTS videolikes (
itemid SERIAL PRIMARY KEY,
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT liked_video_user UNIQUE(videoid,userid)
);
create index on videolikes (userid);
The unique index would then be used for conditions on only videoid and (equality) conditions using both columns. The second index would be used for conditions on only the userid.
Unrelated, but:
The itemid primary key is pretty much useless with the above setup. You needlessly increase the size of the table and add another index that needs to be maintained. You can simply leave it out and declare videoid, userid as the primary key:
CREATE TABLE IF NOT EXISTS videolikes (
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT pk_videolikes primary key (videoid,userid)
);
create index on videolikes (userid);
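To see which index serves which query, here is a hedged sketch that runs on SQLite through Python's `sqlite3`, not on Postgres, so the planner differs from the one this answer describes; it is only meant to illustrate the leading-column idea. The unique index is created with an explicit name (reusing `liked_video_user`), and `idx_videolikes_userid` is a made-up name for the second index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videolikes (
        videoid INTEGER NOT NULL,
        userid  INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX liked_video_user ON videolikes (videoid, userid);
    CREATE INDEX idx_videolikes_userid ON videolikes (userid);
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Condition on the leading column: served by the composite unique index.
by_video = plan("SELECT * FROM videolikes WHERE videoid = 1")
# Condition on the second column only: served by the extra single-column index.
by_user = plan("SELECT * FROM videolikes WHERE userid = 1")

print(by_video)
print(by_user)
```

On Postgres the equivalent check would be `EXPLAIN` on the same two queries; the exact plan text differs between engines and versions.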

Indexing both columns separately is a better idea if you are going to query frequently from both sides.

mysql: difference between primary key and unique index? [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing a new database for a new project and I have a dilemma:
In DB theory, the primary key is a fundamental element, that's OK, but in REAL projects what are the advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?

What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
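The NULL behaviour of a unique index can be tried out with a sketch like the one below. It assumes SQLite (via Python's `sqlite3`) instead of MySQL; SQLite also treats NULLs as distinct in a unique index, so it illustrates the same point, but other engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (foo INT, bar INT);
    CREATE UNIQUE INDEX ux_table1_foo ON table1 (foo);
    INSERT INTO table1 (foo, bar) VALUES (NULL, 1);  -- OK
    INSERT INTO table1 (foo, bar) VALUES (NULL, 2);  -- OK: NULLs do not collide
    INSERT INTO table1 (foo, bar) VALUES (1, 3);     -- OK
""")

# A second non-NULL 1 violates the unique index.
try:
    conn.execute("INSERT INTO table1 (foo, bar) VALUES (1, 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE foo IS NULL").fetchone()[0]
print(duplicate_rejected, null_count)
```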

You can see it like this:
A Primary Key IS Unique
A unique value doesn't have to be the representation of the element.
Meaning? Well, a primary key is used to identify the element. If you have a "Person", you would like to have a personal identification number (SSN or such) which is primary to your Person.
On the other hand, the person might have an e-mail address which is unique but doesn't identify the person.
I always have primary keys; I might have them even in relationship tables (the mid-table / connection table). Why? Well, I like to follow a standard when coding: if the "Person" has an identifier and the Car has an identifier, then the Person -> Car relationship should have an identifier as well!

Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.

The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.

There are no disadvantages of primary keys.
To add some information to the answers from #MrWiggles and #Peter Parker: when a table doesn't have a primary key, you won't be able to edit its data in some applications (they will say something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, whereas a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know, it is not possible to replicate tables without primary keys in MSSQL, at least not without problems.

If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.

In addition to what the other answers have said, some databases and systems may require a primary key to be present. One situation comes to mind: when using enterprise replication with Informix, a PK must be present for a table to participate in replication.

As long as you do not allow NULL for a value, they should be handled the same, but NULL is handled differently across databases (AFAIK MS-SQL does not allow more than one NULL value in a UNIQUE column, while MySQL and Oracle do).
So you should define this column as NOT NULL with a UNIQUE INDEX.

There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.

There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables this is fine, but with tables that hold gigabytes of data, where inserts/deletes affect the sorting, you will run into problems.

I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.

My understanding is that a primary key and a unique index with a not‑null constraint are the same (*), and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens that all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is that you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. For example, if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
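The footnote about the primary key being the default target of a foreign key can be illustrated with a short sketch. It assumes SQLite through Python's `sqlite3`, and the `person` and `car` tables are hypothetical; other engines may or may not accept a `REFERENCES` clause without a column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE person (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE        -- unique, but not the default FK target
    );
    CREATE TABLE car (
        plate    TEXT PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES person  -- no column named: uses person's primary key
    );
    INSERT INTO person VALUES (1, 'a@example.com');
    INSERT INTO car VALUES ('ABC123', 1);
""")

# owner_id 2 matches no person.id, so the implicit reference rejects it.
try:
    conn.execute("INSERT INTO car VALUES ('XYZ789', 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

car_count = conn.execute("SELECT COUNT(*) FROM car").fetchone()[0]
print(orphan_rejected, car_count)
```

To reference `email` instead, the column would have to be named explicitly, as in `REFERENCES person(email)`.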

In SQL Server, a unique index can have one NULL value. By default it is created as a NON-CLUSTERED INDEX.
A primary key cannot contain a NULL value. By default it is created as a CLUSTERED INDEX.

In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.

If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that index's key value to a primary key value, not to rowids. This allows physical sorting operations and deletes to occur without having to recreate these indices.
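The remark above about function-type indices (use the same function in the where clause) can be sketched as follows. This assumes SQLite's expression indexes, run through Python's `sqlite3`; the `customers` table and the index name are hypothetical, and other databases use different syntax for function-based indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE INDEX idx_customers_email_lower ON customers (lower(email));
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Same expression as the index definition: the index can be used.
with_function = plan(
    "SELECT id FROM customers WHERE lower(email) = 'a@example.com'")
# Plain column comparison: the expression index does not apply.
without_function = plan(
    "SELECT id FROM customers WHERE email = 'a@example.com'")

print(with_function)
print(without_function)
```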

