Two related tables and primary key independence issue - sql

I was wondering, if we have two tables that share one column in common and in the first table this column is a primary key but in the second, another is chosen as a primary key... then does SQL treat the common column in the second table as just another ordinary column? Hence no optimization is present if the second table is searched based on the common column info, i.e. primary keys between two related tables are completely independent?

Yes they are independent: primary keys are completely unique to a table.
They are not shared across tables, even if the type of the column is the same, but you can share the primary key of a table as foreign key in another table.
No optimization is performed, as the second column you had mentioned is not a primary-key in that table. And the database by default creates an index based on a primary key which improves looking up in the table data.

If a proper PK - FK relationship is established between the two columns in their respective tables, then any joins should be optimized.

Related

How is primary key is different from unique key when both are actually serving the same purpose as per my understanding?

Primary key is actually the one which cannot be repeated for more than one entries so as the same for unique key as far as I know.
Let's say, we take employee IDs as primary keys for 100 number of employees, this means that employee ID for two employees cannot and can never be same.
But then, what is the unique key? As employee ID is a unique identifier for each employee. What I just know is that the primary key cannot cater Null values whereas unique key can cater just one Null value.
But is that actually the difference between the two? Can someone, make it more easily understandable preferably with a code example.
Also, after differentiating between two, how can we define both in a single dataset? Are there any set rules which we have to follow.
A primary key has three properties:
The key is unique across all rows of a table.
The key (or no components of a composite key) are NULL.
There is only one per table.
In general, primary keys are used for foreign key relationships. They are typically integers, because that is somewhat more efficient for indexing.
Other columns or combinations of columns can be unique and non-NULL. Those are candidate primary keys. However, a table has only one primary key.
I hope my answer will clear your doubt,
Primary key is used as for foreign key to referencing to different tables.
But when we implement large scale Database and our requirement of Referencing more then two Keys from same table to different table(s).
At that time unique key come in scenario and will help you to do the above task easily, as per your(programmer) requirements.
Primary key can't have NULL values. Unique key can allows one NULL value.
A database table can't have more that one primary key but multiple unique keys are allowed in one table.
In sql server a clustered index automatically gets created with the primary key. On the other hand one unique key generates one non-clustered index.

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, then data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one more more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space so probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key wont protect you from adding multiple instances of (f_id1, f_id2) so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
Yes that's actually what people commonly do, that key is called surrogate key.. I'm not exactly sure with PostgreSQL, but in MySQL by using surrogate key you can delete/edit the records from the user interface.. Besides, this allows the database to query the single key column faster than it could multiple columns.. Hope it helps..

Foreign keys vs secondary keys

I used to think that foreign key and secondary key are the same thing.
After Googling the result are even more confusing, some consider them to be the same, others said that a secondary key is an index that doesn't have to be unique, and allows faster access to data than with the primary key.
Can someone explain the difference?
Or is it indeed a case of mixed terminology?
Does it maybe differ per database type?
The definition in wiki/Foreign_key states that:
In the context of relational databases, a foreign key is a field (or
collection of fields) in one table that uniquely identifies a row of
another table. In other words, a foreign key is a column or a
combination of columns that is used to establish and enforce a link
between two tables.
The table containing the foreign key is called the referencing or
child table, and the table containing the candidate key is called the
referenced or parent table.
Take the example of the case:
A customer may place 0,1 or more orders.
From the point of the business, each customer is identified by a unique id (Primary Key) and instead of repeating the customer information with each order, we place a reference, or a pointer to that unique customer id (Customer's Primary Key) in the order table. By looking at any order, we can tell who placed it using the unique customer id.
The relationship established between the parent (Customer table) and the child table (Order table) is established when you set the value of the FK in the Order table after the Customer row has been inserted. Also, deleting a child row may affect the parent depending on your Referential Integrity stings (Cascading Rules) established when the FK was created. FKs help establish integrity in a relational database system.
As for the "Secondary Key", the term refers to a structure of 1 or more columns that together help retrieve 1 or more rows of the same table. The word 'key' is somewhat misleading to some. The Secondary Key does not have to be unique (unlike the PK). It is not the Primary Key of the table. It is used to locate rows in the same table it is defined within (unlike the FK). Its enforcement is only through an index (either unique or not) and it is implementation is optional. A table could have 0,1 or more Secondary Key(s). For example, in an Employee table, you may use an auto generated column as a primary key. Alternatively, you may decide to use the Employee Number or SSN to retrieve employee(s) information.
Sometimes people mix the term "Secondary Key" with the term "Candidate Key" or "Alternate Key" (usually appears in Normalization context) but they are all different.
A foreign key is a key that references an index on some other table. For example, if you have a table of customers, one of the columns on that table may be a country column which would just contain an ID number, which would match the ID of that country in a separate Country table. That country column in the customer table would be a foreign key.
A secondary key on the other hand is just a different column in the table that you have used to create an index (which is used to speed up queries). Foreign keys have nothing to do with improving query speeds.
"Secondary key" is not a term I'm familiar with. It doesn't appear in the index of Database Design for Mere Mortals and I don't remember it in Pro SQL Server 2012 Relational Database Design and Implementation (my two "goto" books for database design). It also doesn't appear in the index for SQL for Smarties. It sounds like its not an actual term at all.
I've always used the term "candidate key".
A candidate key is a way to uniquely identify an entity. You identify all the candidate keys during the design phase of a database system. During the implementation phase, you will decide on a primary key: either one of the candidate keys or an artificial key. The primary key will probably be implemented with a primary key constraint; the candidate keys will probably be implemented with unique constraints.
A foreign key is an instance of one entity's candidate key in another entity, representing a relationship between the two entities. It will probably be implemented with a foreign key constraints.

Transform Logical Data model to SQL Table design

In a LDM I recently made, I have an entity which has the following structure:
Building_ID (Primary Key, Foreign Key),
Plant_ID (Foreign Key),
Build_Year (Primary Key),
Size
I need to create a table in a SQL database using this design. The question I'm running into is how do I handle the primary keys here? Is it OK for a SQL table to have multiple primary keys? If the answer to this question is yes, then which column should act as the unique index? Should I create a new column to act as the unique index identifier?
Any SQL table for any relational database system (SQL Server, Oracle, Firebird, IBM DB2, Sybase etc.) I know can only ever have one primary key - after all, it's the primary key - there can only ever be one.
However, a primary key can be made up from multiple columns (called a "compound primary key"). There are downsides such as: all foreign key constraints from other tables also must specify all columns in the compound PK, thus making joining the tables a bit of a pain (since you need to specify all equality constraints for all columns included in the key in your JOIN).
Besides a primary key, you can also have multiple alternate keys - other column(s) that could also identify the row uniquely. Those make excellent candidates for e.g. indices, if those can help you speed up access to the table (but don't over-index your tables! Less is more)

SQL: what exactly do Primary Keys and Indexes do?

I've recently started developing my first serious application which uses a SQL database, and I'm using phpMyAdmin to set up the tables. There are a couple optional "features" I can give various columns, and I'm not entirely sure what they do:
Primary Key
Index
I know what a PK is for and how to use it, but I guess my question with regards to that is why does one need one - how is it different from merely setting a column to "Unique", other than the fact that you can only have one PK? Is it just to let the programmer know that this value uniquely identifies the record? Or does it have some special properties too?
I have no idea what "Index" does - in fact, the only times I've ever seen it in use are (1) that my primary keys seem to be indexed, and (2) I heard that indexing is somehow related to performance; that you want indexed columns, but not too many. How does one decide which columns to index, and what exactly does it do?
edit: should one index colums one is likely to want to ORDER BY?
Thanks a lot,
Mala
Primary key is usually used to create a numerical 'id' for your records, and this id column is automatically incremented.
For example, if you have a books table with an id field, where the id is the primary key and is also set to auto_increment (Under 'Extra in phpmyadmin), then when you first add a book to the table, the id for that will become 1'. The next book's id would automatically be '2', and so on. Normally, every table should have at least one primary key to help identifying and finding records easily.
Indexes are used when you need to retrieve certain information from a table regularly. For example, if you have a users table, and you will need to access the email column a lot, then you can add an index on email, and this will cause queries accessing the email to be faster.
However there are also downsides for adding unnecessary indexes, so add this only on the columns that really do need to be accessed more than the others. For example, UPDATE, DELETE and INSERT queries will be a little slower the more indexes you have, as MySQL needs to store extra information for each indexed column. More info can be found at this page.
Edit: Yes, columns that need to be used in ORDER BY a lot should have indexes, as well as those used in WHERE.
The primary key is basically a unique, indexed column that acts as the "official" ID of rows in that table. Most importantly, it is generally used for foreign key relationships, i.e. if another table refers to a row in the first, it will contain a copy of that row's primary key.
Note that it's possible to have a composite primary key, i.e. one that consists of more than one column.
Indexes improve lookup times. They're usually tree-based, so that looking up a certain row via an index takes O(log(n)) time rather than scanning through the full table.
Generally, any column in a large table that is frequently used in WHERE, ORDER BY or (especially) JOIN clauses should have an index. Since the index needs to be updated for evey INSERT, UPDATE or DELETE, it slows down those operations. If you have few writes and lots of reads, then index to your hear's content. If you have both lots of writes and lots of queries that would require indexes on many columns, then you have a big problem.
The difference between a primary key and a unique key is best explained through an example.
We have a table of users:
USER_ID number
NAME varchar(30)
EMAIL varchar(50)
In that table the USER_ID is the primary key. The NAME is not unique - there are a lot of John Smiths and Muhammed Khans in the world. The EMAIL is necessarily unique, otherwise the worldwide email system wouldn't work. So we put a unique constraint on EMAIL.
Why then do we need a separate primary key? Three reasons:
the numeric key is more efficient
when used in foreign key
relationships as it takes less space
the email can change (for example
swapping provider) but the user is
still the same; rippling a change of
a primary key value throughout a schema
is always a nightmare
it is always a bad idea to use
sensitive or private information as
a foreign key
In the relational model, any column or set of columns that is guaranteed to be both present and unique in the table can be called a candidate key to the table. "Present" means "NOT NULL". It's common practice in database design to designate one of the candidate keys as the primary key, and to use references to the primary key to refer to the entire row, or to the subject matter item that the row describes.
In SQL, a PRIMARY KEY constraint amounts to a NOT NULL constraint for each primary key column, and a UNIQUE constraint for all the primary key columns taken together. In practice many primary keys turn out to be single columns.
For most DBMS products, a PRIMARY KEY constraint will also result in an index being built on the primary key columns automatically. This speeds up the systems checking activity when new entries are made for the primary key, to make sure the new value doesn't duplicate an existing value. It also speeds up lookups based on the primary key value and joins between the primary key and a foreign key that references it. How much speed up occurs depends on how the query optimizer works.
Originally, relational database designers looked for natural keys in the data as given. In recent years, the tendency has been to always create a column called ID, an integer as the first column and the primary key of every table. The autogenerate feature of the DBMS is used to ensure that this key will be unique. This tendency is documented in the "Oslo design standards". It isn't necessarily relational design, but it serves some immediate needs of the people who follow it. I do not recommend this practice, but I recognize that it is the prevalent practice.
An index is a data structure that allows for rapid access to a few rows in a table, based on a description of the columns of the table that are indexed. The index consists of copies of certain table columns, called index keys, interspersed with pointers to the table rows. The pointers are generally hidden from the DBMS users. Indexes work in tandem with the query optimizer. The user specifies in SQL what data is being sought, and the optimizer comes up with index strategies and other strategies for translating what is being sought into a stategy for finding it. There is some kind of organizing principle, such as sorting or hashing, that enables an index to be used for fast lookups, and certain other uses. This is all internal to the DBMS, once the database builder has created the index or declared the primary key.
Indexes can be built that have nothing to do with the primary key. A primary key can exist without an index, although this is generally a very bad idea.