Transform Logical Data Model to SQL Table Design

In an LDM I recently made, I have an entity with the following structure:
Building_ID (Primary Key, Foreign Key),
Plant_ID (Foreign Key),
Build_Year (Primary Key),
Size
I need to create a table in a SQL database using this design. The question I'm running into is how to handle the primary keys here. Is it OK for a SQL table to have multiple primary keys? If so, which column should act as the unique index? Or should I create a new column to act as the unique identifier?

Any SQL table in any relational database system I know of (SQL Server, Oracle, Firebird, IBM DB2, Sybase etc.) can only ever have one primary key. After all, it's the primary key, so there can only ever be one.
However, a primary key can be made up from multiple columns (called a "compound primary key"). There are downsides such as: all foreign key constraints from other tables also must specify all columns in the compound PK, thus making joining the tables a bit of a pain (since you need to specify all equality constraints for all columns included in the key in your JOIN).
Besides a primary key, you can also have multiple alternate keys: other column(s) that could also identify the row uniquely. Those make excellent candidates for e.g. indices, if those can help you speed up access to the table (but don't over-index your tables! Less is more).
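As a rough sketch of what a compound primary key looks like for the entity in the question, here is a small example run through Python's built-in sqlite3 module. The table name building_history, the child table inspection and the column types are my own assumptions, and the parent tables that Building_ID and Plant_ID would reference are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE building_history (
    building_id INTEGER NOT NULL,
    plant_id    INTEGER,
    build_year  INTEGER NOT NULL,
    size        REAL,
    PRIMARY KEY (building_id, build_year)   -- one primary key, two columns
);
-- Hypothetical child table: its foreign key has to list both key columns.
CREATE TABLE inspection (
    inspection_id INTEGER PRIMARY KEY,
    building_id   INTEGER NOT NULL,
    build_year    INTEGER NOT NULL,
    FOREIGN KEY (building_id, build_year)
        REFERENCES building_history (building_id, build_year)
);
""")

conn.execute("INSERT INTO building_history VALUES (1, 10, 1990, 250.0)")
# Same building, different year: a different key value, so it is accepted.
conn.execute("INSERT INTO building_history VALUES (1, 10, 2005, 400.0)")

try:
    # Same (building_id, build_year) pair again: violates the primary key.
    conn.execute("INSERT INTO building_history VALUES (1, 11, 1990, 300.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM building_history").fetchone()[0]
```

The inspection table shows the downside mentioned above: anything that references the table has to carry and compare both columns.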

Related

Why Primary key is required in SQL Server, if non nullable unique key can serve the purpose

Can an expert help me understand why we need a primary key in a SQL table, if the purpose of uniquely identifying a row can be served by a non-nullable unique key?
A Primary Key isn't required per se. But it serves a different function conceptually than a unique index.
The primary key identifies a row. A unique index simply ensures there are no duplicates. The SQL engine can optimize queries based on this information. Also, by default many RDBMSes will create the clustered index based on the primary key.
You can only have one primary key, and the column(s) can't be nullable. You can have multiple unique indexes and they can include nullable columns.
If you wanted (although that would be a terrible design, so you shouldn't) you could have a table without a primary key, that had a unique index.
This is kind of a disconnect between logical database modelling and physical database design/implementation: logically the Entity (Table) should have a primary key that uniquely identifies each Instance (Row). In reality you are free to do what you want with your database system.
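To make the difference concrete, here is a minimal sketch using Python's sqlite3 module. The person table and its columns are made up for illustration. Note that engines differ on NULLs in unique indexes: SQLite accepts any number of them, while SQL Server, for example, accepts only one per unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,  -- the single primary key
    email     TEXT UNIQUE,          -- unique, but nullable
    badge_no  TEXT UNIQUE           -- a second unique constraint is allowed
);
""")

conn.execute("INSERT INTO person (email, badge_no) VALUES (NULL, 'B1')")
# A second NULL email does not count as a duplicate in SQLite.
conn.execute("INSERT INTO person (email, badge_no) VALUES (NULL, 'B2')")

try:
    # Reusing badge 'B1' violates the unique constraint on badge_no.
    conn.execute("INSERT INTO person (email, badge_no) VALUES ('a@example.com', 'B1')")
    duplicate_badge_rejected = False
except sqlite3.IntegrityError:
    duplicate_badge_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```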

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one or more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space, so it will probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key won't protect you from adding multiple instances of (f_id1, f_id2), so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
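A quick way to see this is to compare three variants of the junction table side by side. This is only a sketch run against SQLite through Python's sqlite3 module rather than PostgreSQL, and the table names and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: surrogate id only, nothing constrains the pair.
CREATE TABLE link_surrogate_only (
    id    INTEGER PRIMARY KEY,
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL
);
-- Variant B: composite primary key on the pair.
CREATE TABLE link_composite (
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL,
    PRIMARY KEY (f_id1, f_id2)
);
-- Variant C: surrogate id plus a separate unique constraint on the pair.
CREATE TABLE link_both (
    id    INTEGER PRIMARY KEY,
    f_id1 INTEGER NOT NULL,
    f_id2 INTEGER NOT NULL,
    UNIQUE (f_id1, f_id2)
);
""")

def accepts_duplicate_pair(table):
    """Insert the pair (1, 2) twice and report whether the second insert succeeded."""
    conn.execute(f"INSERT INTO {table} (f_id1, f_id2) VALUES (1, 2)")
    try:
        conn.execute(f"INSERT INTO {table} (f_id1, f_id2) VALUES (1, 2)")
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    name: accepts_duplicate_pair(name)
    for name in ("link_surrogate_only", "link_composite", "link_both")
}
```

Only variant A lets the same pair in twice, which is why the surrogate id on its own is not enough.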
Yes, that's actually what people commonly do; that key is called a surrogate key. I'm not exactly sure about PostgreSQL, but in MySQL a surrogate key lets you delete or edit records from the user interface. Besides, this allows the database to query the single key column faster than it could multiple columns. Hope it helps.

Foreign keys vs secondary keys

I used to think that foreign key and secondary key are the same thing.
After Googling, the results are even more confusing: some consider them to be the same, while others say that a secondary key is an index that doesn't have to be unique and allows faster access to data than the primary key.
Can someone explain the difference?
Or is it indeed a case of mixed terminology?
Does it maybe differ per database type?
The definition in wiki/Foreign_key states that:
In the context of relational databases, a foreign key is a field (or
collection of fields) in one table that uniquely identifies a row of
another table. In other words, a foreign key is a column or a
combination of columns that is used to establish and enforce a link
between two tables.
The table containing the foreign key is called the referencing or
child table, and the table containing the candidate key is called the
referenced or parent table.
Take the example of the case:
A customer may place 0,1 or more orders.
From the point of the business, each customer is identified by a unique id (Primary Key) and instead of repeating the customer information with each order, we place a reference, or a pointer to that unique customer id (Customer's Primary Key) in the order table. By looking at any order, we can tell who placed it using the unique customer id.
The relationship between the parent (Customer table) and the child (Order table) is established when you set the value of the FK in the Order table after the Customer row has been inserted. Also, deleting a parent row may affect its child rows, depending on the Referential Integrity settings (Cascading Rules) established when the FK was created. FKs help establish integrity in a relational database system.
As for the "Secondary Key", the term refers to a structure of 1 or more columns that together help retrieve 1 or more rows of the same table. The word 'key' is somewhat misleading to some. The Secondary Key does not have to be unique (unlike the PK). It is not the Primary Key of the table. It is used to locate rows in the same table it is defined within (unlike the FK). Its enforcement is only through an index (either unique or not) and its implementation is optional. A table could have 0, 1 or more Secondary Key(s). For example, in an Employee table, you may use an auto-generated column as a primary key. Alternatively, you may decide to use the Employee Number or SSN to retrieve employee(s) information.
Sometimes people mix the term "Secondary Key" with the term "Candidate Key" or "Alternate Key" (usually appears in Normalization context) but they are all different.
A foreign key is a key that references an index on some other table. For example, if you have a table of customers, one of the columns on that table may be a country column which would just contain an ID number, which would match the ID of that country in a separate Country table. That country column in the customer table would be a foreign key.
A secondary key, on the other hand, is just a different column in the table that you have used to create an index (which is used to speed up queries). Foreign keys have nothing to do with improving query speeds.
"Secondary key" is not a term I'm familiar with. It doesn't appear in the index of Database Design for Mere Mortals and I don't remember it in Pro SQL Server 2012 Relational Database Design and Implementation (my two "goto" books for database design). It also doesn't appear in the index for SQL for Smarties. It sounds like it's not an actual term at all.
I've always used the term "candidate key".
A candidate key is a way to uniquely identify an entity. You identify all the candidate keys during the design phase of a database system. During the implementation phase, you will decide on a primary key: either one of the candidate keys or an artificial key. The primary key will probably be implemented with a primary key constraint; the candidate keys will probably be implemented with unique constraints.
A foreign key is an instance of one entity's candidate key in another entity, representing a relationship between the two entities. It will probably be implemented with a foreign key constraint.
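The customer/country example from one of the answers above can be sketched like this, using Python's sqlite3 module. The column names and sample rows are assumptions. The foreign key is a constraint that rejects rows pointing at a missing country, while the "secondary key" here is just a non-unique index on last_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    country_id  INTEGER NOT NULL REFERENCES country (country_id)  -- foreign key
);
-- "Secondary key": a plain, non-unique index used only to find rows faster.
CREATE INDEX idx_customer_last_name ON customer (last_name);
""")

conn.execute("INSERT INTO country VALUES (1, 'Norway')")
conn.execute("INSERT INTO customer VALUES (1, 'Hansen', 1)")
conn.execute("INSERT INTO customer VALUES (2, 'Hansen', 1)")  # repeated last_name is fine

try:
    # There is no country 99, so the foreign key constraint rejects this row.
    conn.execute("INSERT INTO customer VALUES (3, 'Berg', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

hansens = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE last_name = 'Hansen'"
).fetchone()[0]
```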

Two related tables and primary key independence issue

I was wondering, if we have two tables that share one column in common and in the first table this column is a primary key but in the second, another is chosen as a primary key... then does SQL treat the common column in the second table as just another ordinary column? Hence no optimization is present if the second table is searched based on the common column info, i.e. primary keys between two related tables are completely independent?
Yes they are independent: primary keys are completely unique to a table.
They are not shared across tables, even if the type of the column is the same, but you can share the primary key of a table as foreign key in another table.
No optimization is performed, as the second column you had mentioned is not a primary-key in that table. And the database by default creates an index based on a primary key which improves looking up in the table data.
If a proper PK - FK relationship is established between the two columns in their respective tables, then any joins should be optimized.
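Whether a lookup on the shared column is fast depends on whether that column is indexed in the second table. In SQLite, and in several other engines, declaring a foreign key does not by itself create an index on the referencing column, so it is worth checking the query plan. A small sketch with Python's sqlite3 module follows; the table and index names are hypothetical, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,  -- this table has its own primary key
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    amount      REAL
);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no index on orders.customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the planner can now pick the new index
```

Printing plan_before and plan_after shows whether the search on the common column switched from a table scan to the index.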

Many-to-many link table design : two foreign keys only or an additional primary key?

This is undoubtedly a newbie question, but I haven't been able to find a satisfactory answer.
When creating a link table for many-to-many relationships, is it better to create a unique id or only use the two foreign keys of the respective tables (a compound key)?
Looking at different diagrams of the Northwind database, for example, I've come across both versions: an OrderDetails table with only fkProductID and fkOrderID, and versions with an added OrderDetailsID.
What's the difference? Does it also depend on the DB engine?
What are the SQL (or LINQ) advantages and disadvantages?
Thanks in advance for an explanation.
Tom
ORMs have been mandating the use of non-composite primary keys to simplify queries...
But it Makes Queries Easier...
At first glance, it makes deleting or updating a specific order/etc easier, until you realize that you need to know the applicable id value first. If you have to search for that id value based on an order's specifics, then you'd have been better off using the criteria directly in the first place.
But Composite keys are Complex...
In this example, a primary key constraint will ensure that the two columns--fkProductID and fkOrderID--will be unique and indexed (most DBs these days automatically index primary keys if the clustered index doesn't already exist) using the best index possible for the table.
The lone primary key approach means the OrderDetailsID is indexed with the best index for the table (SQL Server & MySQL call them clustered indexes, to Oracle they're all just indexes), and requires an additional composite unique constraint/index. Some databases might require additional indexing beyond the unique constraint... So this makes the data model more involved/complex, and for no benefit:
Some databases, like MySQL, put a limit on the amount of space you can use for indexes.
The primary key gets the most ideal index, yet its value has no relevance to the data in the table, so the index related to the primary key will seldom, if ever, be used.
Conclusion
I don't see the benefit in a single column primary key over a composite primary key. More work for additional overhead with no net benefit...
I'm used to having a primary key column, because the primary key uniquely identifies the record.
If you have cascade-update settings on table relations, the values of foreign keys can change between the "SELECT" and "UPDATE/DELETE" commands sent from the application.
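Here is a small sketch of that cascade-update situation, using Python's sqlite3 module. The product table and the sample values are assumptions; the column names follow the question. After the parent key changes, the row can no longer be found by the old foreign key pair, but its surrogate id is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire in SQLite
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY
);
CREATE TABLE OrderDetails (
    OrderDetailsID INTEGER PRIMARY KEY,  -- surrogate key
    fkOrderID      INTEGER NOT NULL,
    fkProductID    INTEGER NOT NULL
        REFERENCES product (product_id) ON UPDATE CASCADE,
    UNIQUE (fkOrderID, fkProductID)
);
""")

conn.execute("INSERT INTO product VALUES (7)")
conn.execute("INSERT INTO OrderDetails VALUES (1, 100, 7)")

# Changing the parent key cascades into fkProductID of the child row.
conn.execute("UPDATE product SET product_id = 8 WHERE product_id = 7")

row = conn.execute(
    "SELECT OrderDetailsID, fkProductID FROM OrderDetails"
).fetchone()
found_by_old_pair = conn.execute(
    "SELECT COUNT(*) FROM OrderDetails WHERE fkOrderID = 100 AND fkProductID = 7"
).fetchone()[0]
```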