Difference between primary key search and fields search SQL - sql

Is there any performance difference between searching / selecting records by the primary key and by other fields of the same size?
Also, will the performance of such queries differ between GUID primary keys and Int primary keys?

A key is a logical feature of the database whereas performance is determined entirely by physical features: storage format, indexing and the methods used internally for data access. In principle therefore the answer is No. There is no fundamental reason why there needs to be a difference between accessing data by key or by any other non-key attribute(s). A more specific answer might be possible only if we knew more information about storage, indexing and what DBMS you are using.

Unique key: a column (or set of columns) in which no two rows have the same value.
Primary key: a minimal set of columns that uniquely identifies every row in a table (i.e. no two rows agree on all the columns that make up the key). A table can have more than one such candidate key, but only one of them is designated "the" primary key. If a single column is unique, it can serve as the key on its own; otherwise several column values are needed to identify a row, e.g. (first_name, last_name, father_name, mother_name) can constitute the primary key in some tables.
Index: used to optimize queries. If you are going to search or sort the results on some column many times (e.g. people mostly search for students by name and not by roll number), those queries can be optimized if the column values are all "indexed", for example with a tree structure such as a B-tree.

The difference is that a primary key has an index, so searching by primary key is usually faster than searching by another field that has no index. However, you can also put an index on a field that is not the primary key, and then there is little or no difference in search speed.
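To see this for yourself, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only because it runs without a server; the table `item`, its columns and the index name `idx_item_code` are made up for illustration, and other DBMSs word their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id    INTEGER PRIMARY KEY,  -- the key: looked up via the table's own b-tree
        code  INTEGER NOT NULL,     -- same size as id, but no index yet
        label TEXT
    );
""")
conn.executemany(
    "INSERT INTO item (id, code, label) VALUES (?, ?, ?)",
    [(i, i, f"row {i}") for i in range(1, 1001)],
)

def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

by_pk = plan("SELECT * FROM item WHERE id = 500")
by_code_before = plan("SELECT * FROM item WHERE code = 500")

# Index the non-key column and ask again.
conn.execute("CREATE INDEX idx_item_code ON item (code)")
by_code_after = plan("SELECT * FROM item WHERE code = 500")

print("by primary key:      ", by_pk)
print("by code, no index:   ", by_code_before)
print("by code, with index: ", by_code_after)
```

The plan for the unindexed column is a full scan, while the primary key lookup and the indexed column lookup are both index searches. That is the point of the answer above: the index makes the difference, not the "primary key" label.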

Yes, there is a performance difference.
Searching by primary key is usually much faster than searching by other, unindexed fields, because the primary key is unique and NOT NULL and also has an index.

Related

Why we can't have more than one primary key?

I Know there can't be more than 1 primary key in a table but what is the technical reason ?
Pulled directly from SO:
You can only have one primary key, but you can have multiple columns in your primary key.
You can also have Unique Indexes on your table, which will work a bit like a primary key in that they will enforce unique values, and will speed up querying of those values.
Primary in the context of Primary Key means that it's ranked first in importance. Therefore, there can only be one key. It's by definition.
It's also usually the key for which the index has the actual data attached to it, that is, the data is stored with the primary key index. Other indices contain only the data that's being indexed, and perhaps some Included Columns.
In fact E.F.Codd (the inventor of the Relational Database Model) [1] originated the term "primary key" to mean any number of keys of a relation - not just one. He made it clear that it was quite possible to have more than one such key. His suggestion was that the database designer could choose one key as a preferred identifier ("the primary key") - but in principle this was optional and such a choice was "arbitrary" (that was his word). Because all keys enjoy the same properties as each other there is no fundamental need to choose any one over another.
Later on [2] what Codd originally called primary keys became known as candidate keys and the one key singled out as the preferred one became known as the "primary" key. This was not really a fundamental shift however because a primary key means exactly the same as a candidate key. Since they are equivalent concepts it doesn't really mean anything important when we say there "must" only be one primary key. If you have more than one candidate key you could quite reasonably call more than one of them "primary" if you prefer because it doesn't make any logical or practical difference to the meaning and function of the database.
It has been argued (by me among others) that the idea of designating one key per table as "primary" is utterly superfluous and sometimes a positive hindrance to a good understanding of database design and data integrity issues. However, the concept is so entrenched we are probably stuck with it.
So the proper answer to your question is "convention" and "convenience". There is no good technical reason at all.
[1] A Relational Model of Data for Large Shared Data Banks (1970)
[2] E.g. in "Further Normalization of the Relational Data Base Model" (1971)
Well, it's called "primary" for a reason. As in, its the one key used to uniquely identify the record... and there "can be only one".
You could certainly mimic a second "primary" key by having an index placed on one or more other fields that are unique, but for the purposes of your database server it's generally only necessary if your key isn't unique enough to cross database servers in a merge replication situation (i.e. multi-master).
PRIMARY KEY is usually equivalent to UNIQUE INDEX NOT NULL. So you can effectively have multiple "primary keys" on a single table.
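A quick way to check that claim is the sketch below, again assuming SQLite through Python's sqlite3 module. The `employee` table and the `rejected` helper are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,   -- the designated primary key
        badge_no TEXT NOT NULL UNIQUE,  -- a second key, enforced the same way
        name     TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'B-100', 'Ann')")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_pk = rejected("INSERT INTO employee VALUES (1, 'B-200', 'Bob')")     # same emp_id
dup_badge = rejected("INSERT INTO employee VALUES (2, 'B-100', 'Bob')")  # same badge_no
null_badge = rejected("INSERT INTO employee VALUES (3, NULL, 'Cy')")     # missing badge_no

print(dup_pk, dup_badge, null_badge)
```

All three inserts are refused, so `badge_no` behaves like a second key even though only `emp_id` carries the PRIMARY KEY keyword.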
The primary key is the key which uniquely identifies that record.
I'm not sure if you're asking if a) there can be a single primary key spanning multiple columns, or b) if you can have multiple keys which uniquely identify the record.
The first is possible, known as a composite primary key.
The second is possible also, but only one is called the primary key.
Because the "primary" in "primary key" denotes its, mmm, singularity(?).
But if you need more, you can define UNIQUE keys which have quite the same behaviour.
The technical reason is that there can be only one primary. Otherwise it wouldn't be called so.
However a primary key can include several columns - see 7.5.2. Multiple-Column Indexes
The primary key is the one (of possibly many) unique identifiers of a particular row in a table. The other unique identifiers, which were not designated as the primary one, are hence often referred to as secondary unique indexes.
A primary key allows us to uniquely identify each record in the table. You can have two columns in a primary key, but that is still one key, called a composite primary key. "When you define more than one column as your primary key on a table, it is called a composite primary key."
A primary key defines record uniqueness. To have two different measures of uniqueness can be problematic. For example, if you have primary keys A and B and you insert records where A is the same and B is different, then are those records the same or different? If you consider them different, then make your primary a composite of A and B. If you consider them the same record, then just use A or B as the primary key.
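The "composite of A and B" case can be sketched as follows, assuming SQLite via Python's sqlite3 module. The `enrollment` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)  -- one key made of two columns
    )
""")

# Same student, different courses: the (A, B) pairs differ, so both rows are accepted.
conn.execute("INSERT INTO enrollment VALUES (1, 10, 'A')")
conn.execute("INSERT INTO enrollment VALUES (1, 20, 'B')")

# Repeating an existing (student_id, course_id) pair is refused.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10, 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(row_count, duplicate_rejected)
```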
With non-clustered indexes we can create several indexes on a table; they are typically made on non-primary-key columns used in JOIN, WHERE and ORDER BY clauses.
With a clustered index we can have only one per table, and it is usually on the primary key. So if we had two primary keys there would be ambiguity.
There would also be ambiguity in referential integrity when selecting one of the two primary keys.
Only one primary key is possible on a table because (in engines that cluster on it) the primary key creates a clustered index, which stores the data physically in the leaf nodes, ordered by the primary key column.
If we tried to create another primary key on that table, there would be a major problem with the data, because we cannot store the same table data in two different orders.

Example Use of MySQL Indexes?

I was reading this question about the difference between these 4: Differences between INDEX, PRIMARY, UNIQUE, FULLTEXT in MySQL?
However, it is all very abstract and vague to me after reading it. Maybe it would help me understand it more concretely if I had some examples of when I would use it.
For example, I think for the field user_id, we would use the index UNIQUE, correct?
A primary key is not an index, per se --it's a constraint.
The primary key uniquely identifies a row from all the rest - that means the values must be unique. A primary key is typically made of one column, but can be made of more than one - multiple columns are called a composite....
A unique constraint is implemented in MySQL as an index - it guarantees that the same value can not occur more than once in the column(s) it is defined for. A unique constraint/index is redundant on a primary key column, and a primary key could be considered a synonym but with bigger implications. These too support composites...
In MySQL (and SQL Server), there are two types of indexes - clustered and non-clustered. A clustered index is typically associated with the primary key, and automatically created if a primary key is defined in the CREATE TABLE statement. But it doesn't have to be - it's the most important index to a table, so if it's more optimal to associate it with different columns then the change should be reviewed. There can only be one clustered index for a table - the rest are non-clustered indexes. The maximum index key length depends on the table engine - 1,000 bytes for MyISAM and 767 bytes for InnoDB (with the older row formats). Indexes, clustered and non-clustered, are used to speed up data retrieval, and their use can be triggered by using the columns in SELECT, JOIN, WHERE and ORDER BY clauses. But they also slow down INSERT/UPDATE/DELETE statements because the index data has to be maintained.
Full Text indexes are explicitly for Full Text Search (FTS) functionality - no other functionality can make use of them. They are only for columns defined with string based data types.
Mind that indexes are not ANSI - the similarities are thankfully relatively consistent. Oracle doesn't distinguish indexes - they're all the same.
Here's a brief description of what they are and when to use them:
INDEX: To speed up searches for values in this column.
UNIQUE: When you want to constrain the column to only contain unique values. This also adds an index. Example: if you only want each email to be registered once, you can add a unique constraint on the email column.
PRIMARY KEY: Contains the identity for each row. A primary key also implies a unique index and a "not null" constraint. It is often an auto-increment id, but it could also be a natural key. It is generally a good idea to create a primary key for each table, although it is not required.
FULL TEXT: This is a special type of index that allows you to perform searches for text strings much faster than LIKE '%foo%'.
Note that I am only considering single column indexes here. It is also possible to have a multi-column index.
if you have a "Person" table with id,name,email and bio information...
The primary key is the id; maybe it's a number. It will always be unique and you could use it as a reference in other tables (foreign keys)
the email is unique on each person, so you could add a UNIQUE constraint there
you might want to search a person over his name constantly so you could add an INDEX there
and finally a full text search in the bio attribute because it might be useful on a search
primary keys are also UNIQUE
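Here is roughly what that "Person" table could look like, as a sketch assuming SQLite through Python's sqlite3 module. The full-text index on bio is left out because its syntax is engine-specific, and the index name `idx_person_name` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id    INTEGER PRIMARY KEY,   -- identity of the row, target for foreign keys
        name  TEXT NOT NULL,         -- searched often, but not unique
        email TEXT NOT NULL UNIQUE,  -- one registration per address
        bio   TEXT
    );
    CREATE INDEX idx_person_name ON person (name);  -- plain index for name lookups
""")

# PRAGMA index_list reports each index on the table and whether it is unique.
indexes = {row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list(person)")}

conn.execute("INSERT INTO person (name, email) VALUES ('Sam', 'sam@a.example')")
conn.execute("INSERT INTO person (name, email) VALUES ('Sam', 'sam@b.example')")  # same name is fine
try:
    conn.execute("INSERT INTO person (name, email) VALUES ('Pat', 'sam@a.example')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

print(indexes, duplicate_email_rejected)
```

The UNIQUE constraint shows up as an automatically created unique index, while the index on name is an ordinary, non-unique one.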

When should I use primary key or index?

When should I use a primary key or an index?
What are their differences and which is the best?
Basically, a primary key is (at the implementation level) a special kind of index. Specifically:
A table can have only one primary key, and with very few exceptions, every table should have one.
A primary key is implicitly UNIQUE - you cannot have more than one row with the same primary key, since its purpose is to uniquely identify rows.
A primary key can never be NULL, so the column(s) it consists of must be NOT NULL
A table can have multiple indexes, and indexes are not necessarily UNIQUE. Indexes exist for two reasons:
To enforce a uniqueness constraint (these can be created implicitly when you declare a column UNIQUE)
To improve performance. Comparisons for equality or "greater/smaller than" in WHERE clauses, as well as JOINs, are much faster on columns that have an index. But note that each index decreases update/insert/delete performance, so you should only have them where they're actually needed.
Differences
A table can only have one primary key, but several indexes.
A primary key is unique, whereas an index does not have to be unique. Therefore, the value of the primary key identifies a record in a table, the value of the index not necessarily.
Primary keys usually are automatically indexed - if you create a primary key, no need to create an index on the same column(s).
When to use what
Each table should have a primary key. Define a primary key that is guaranteed to uniquely identify each record.
If there are other columns you often use in joins or in where conditions, an index may speed up your queries. However, indexes have an overhead when creating and deleting records - something to keep in mind if you do huge amounts of inserts and deletes.
Which is best?
None really - each one has its purpose. And it's not that you really can choose the one or the other.
I recommend to always ask yourself first what the primary key of a table is and to define it.
Add indexes by your personal experience, or if performance is declining. Measure the difference, and if you work with SQL Server learn how to read execution plans.
This might help Back to the Basics: Difference between Primary Key and Unique Index
The differences between the two are:
Column(s) that make up the Primary Key of a table cannot be NULL, since by definition the Primary Key helps uniquely identify the record in the table. The column(s) that make up a unique index can be nullable. A note worth mentioning here is that different RDBMSs treat this differently: while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing/developing/porting applications across RDBMSs.
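Since the NULL handling varies by product, it is worth testing on whatever DBMS you target. The sketch below does that for SQLite (an assumption, chosen because Python's sqlite3 module needs no server), which happens to accept multiple NULLs in a unique column. The `contact` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, phone TEXT UNIQUE)")

# Two rows without a phone number: accepted by SQLite, since NULLs are not
# treated as equal to each other for the purpose of the UNIQUE constraint.
conn.execute("INSERT INTO contact (phone) VALUES (NULL)")
conn.execute("INSERT INTO contact (phone) VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM contact WHERE phone IS NULL"
).fetchone()[0]

# Two rows with the same non-NULL phone number: refused.
conn.execute("INSERT INTO contact (phone) VALUES ('555-0100')")
try:
    conn.execute("INSERT INTO contact (phone) VALUES ('555-0100')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)
```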
There can be only one Primary Key defined on the table where as you can have many unique indexes defined on the table (if needed).
Also, in the case of SQL Server, if you go with the default options then a Primary Key is created as a clustered index while the unique index (constraint) is created as a non-clustered index. This is just the default behavior though and can be changed at creation time, if needed.
Keys and indexes are quite different concepts that achieve different things. A key is a logical constraint which requires tuples to be unique. An index is a performance optimisation feature of a database and is therefore a physical rather than a logical feature of the database.
The distinction between the two is sometimes blurred because often a similar or identical syntax is used for specifying constraints and indexes. Many DBMSs will create an index by default when key constraints are created. The potential for confusion between key and index is unfortunate because separating logical and physical concerns is a highly important aspect of data management.
As regards "primary" keys. They are not a "special" type of key. A primary key is just any one candidate key of a table. There are at least two ways to create candidate keys in most SQL DBMSs and that is either using the PRIMARY KEY constraint or using a UNIQUE constraint on NOT NULL columns. It is a very widely observed convention that every SQL table has a PRIMARY KEY constraint on it. Using a PRIMARY KEY constraint is conventional wisdom and a perfectly reasonable thing to do but it generally makes no practical or logical difference because most DBMSs treat all keys as equal. Certainly every table ought to enforce at least one candidate key but whether those key(s) are enforced by PRIMARY KEY or UNIQUE constraints doesn't usually matter. In principle it is candidate keys that are important, not "primary" keys.
The primary key is by definition unique: it identifies each individual row. You always want a primary key on your table, since it's the only way to identify rows.
An index is basically a dictionary for a field or set of fields. When you ask the database to find the record where some field is equal to some specific value, it can look in the dictionary (index) to find the right rows. This is very fast, because just like a dictionary, the entries are sorted in the index allowing for a binary search. Without the index, the database has to read each row in the table and check the value.
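The dictionary analogy can be made concrete with a toy model in plain Python. This is only an illustration of the idea (a sorted list searched with binary search); real DBMS indexes are usually B-trees, and the names `rows`, `full_scan` and `index_lookup` are invented for the sketch.

```python
from bisect import bisect_left

# A tiny "table": (row_id, name) pairs in insertion order.
rows = [(1, "Mia"), (2, "Bob"), (3, "Zed"), (4, "Ann"), (5, "Bob")]

def full_scan(table, name):
    """Without an index: read every row and check the value."""
    return [row_id for row_id, row_name in table if row_name == name]

# The "index": (name, row_id) entries kept in sorted order.
name_index = sorted((row_name, row_id) for row_id, row_name in rows)

def index_lookup(index, name):
    """With an index: binary-search to the first match, then read neighbours."""
    pos = bisect_left(index, (name,))  # (name,) sorts before every (name, row_id)
    matches = []
    while pos < len(index) and index[pos][0] == name:
        matches.append(index[pos][1])
        pos += 1
    return matches

print(full_scan(rows, "Bob"))
print(index_lookup(name_index, "Bob"))
```

Both calls find the same rows, but the scan touches every row while the lookup needs only about log2(n) comparisons plus one step per match.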
You generally want to add an index to each column you need to filter on. If you search on a specific combination of columns, you can create a single index containing all of those columns. If you do so, the same index can be used to search for any prefix of the list of columns in your index. Put simply (if a bit inaccurately), the dictionary holds entries consisting of the concatenation of the values used in the columns, in the specified order, so the database can look for entries which start with a specific value and still use efficient binary search for this.
For example, if you have an index on the columns (A, B, C), this index can be used even if you only filter on A, because that is the first column in the index. Similarly, it can be used if you filter on both A and B. It cannot, however, be used if you only filter on B or C, because they are not a prefix in the list of columns - you need another index to accommodate that.
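The prefix rule can be checked against a real query planner. The sketch below assumes SQLite via Python's sqlite3 module and an empty, never-analyzed table; the table `t`, its extra column `d` and the index name `idx_abc` are made up. Other DBMSs, or SQLite with collected statistics, may make different choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d TEXT);
    CREATE INDEX idx_abc ON t (a, b, c);
""")

def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Ask the planner about each filter; nothing is actually executed.
filters = ["a = 1", "a = 1 AND b = 2", "b = 2", "c = 3"]
plans = {where: plan("SELECT * FROM t WHERE " + where) for where in filters}

for where, description in plans.items():
    print(f"{where:18} -> {description}")
```

Filters that start with a are reported as index searches; filters on b or c alone fall back to scanning.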
A primary key also serves as an index, so you don't need to add an index covering the same columns as your primary key.
Every table should have a PRIMARY KEY.
Many types of queries are sped up by the judicious choice of an INDEX. It may be that the best index is the primary key. My point is that the query is the main factor in whether to use the PK for its index.

SQL: what exactly do Primary Keys and Indexes do?

I've recently started developing my first serious application which uses a SQL database, and I'm using phpMyAdmin to set up the tables. There are a couple optional "features" I can give various columns, and I'm not entirely sure what they do:
Primary Key
Index
I know what a PK is for and how to use it, but I guess my question with regards to that is why does one need one - how is it different from merely setting a column to "Unique", other than the fact that you can only have one PK? Is it just to let the programmer know that this value uniquely identifies the record? Or does it have some special properties too?
I have no idea what "Index" does - in fact, the only times I've ever seen it in use are (1) that my primary keys seem to be indexed, and (2) I heard that indexing is somehow related to performance; that you want indexed columns, but not too many. How does one decide which columns to index, and what exactly does it do?
edit: should one index colums one is likely to want to ORDER BY?
Thanks a lot,
Mala
Primary key is usually used to create a numerical 'id' for your records, and this id column is automatically incremented.
For example, if you have a books table with an id field, where the id is the primary key and is also set to auto_increment (under 'Extra' in phpMyAdmin), then when you first add a book to the table, the id for that book will become '1'. The next book's id would automatically be '2', and so on. Normally, every table should have a primary key to help identify and find records easily.
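The books example looks like this as a runnable sketch, assuming SQLite through Python's sqlite3 module. SQLite spells the option AUTOINCREMENT, whereas MySQL uses AUTO_INCREMENT; the book titles are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
        title TEXT NOT NULL
    )
""")

# No id is supplied; the database hands out the next number itself.
first_id = conn.execute("INSERT INTO books (title) VALUES (?)", ("Dune",)).lastrowid
second_id = conn.execute("INSERT INTO books (title) VALUES (?)", ("Emma",)).lastrowid

print(first_id, second_id)
```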
Indexes are used when you need to retrieve certain information from a table regularly. For example, if you have a users table, and you will need to access the email column a lot, then you can add an index on email, and this will cause queries accessing the email to be faster.
However there are also downsides for adding unnecessary indexes, so add this only on the columns that really do need to be accessed more than the others. For example, UPDATE, DELETE and INSERT queries will be a little slower the more indexes you have, as MySQL needs to store extra information for each indexed column. More info can be found at this page.
Edit: Yes, columns that need to be used in ORDER BY a lot should have indexes, as well as those used in WHERE.
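To illustrate the ORDER BY point, here is a sketch assuming SQLite via Python's sqlite3 module; the `users` table and the index name `idx_users_email` are hypothetical. It compares the query plan for a sorted query before and after the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT email FROM users ORDER BY email"

before = plan(query)  # no index: rows must be sorted in a temporary structure
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # index present: rows can be read in index order

print("before:", before)
print("after: ", after)
```

Without the index the plan includes a separate sorting step; with it, the rows are read from the index already in order.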
The primary key is basically a unique, indexed column that acts as the "official" ID of rows in that table. Most importantly, it is generally used for foreign key relationships, i.e. if another table refers to a row in the first, it will contain a copy of that row's primary key.
Note that it's possible to have a composite primary key, i.e. one that consists of more than one column.
Indexes improve lookup times. They're usually tree-based, so that looking up a certain row via an index takes O(log(n)) time rather than scanning through the full table.
Generally, any column in a large table that is frequently used in WHERE, ORDER BY or (especially) JOIN clauses should have an index. Since the index needs to be updated for every INSERT, UPDATE or DELETE, it slows down those operations. If you have few writes and lots of reads, then index to your heart's content. If you have both lots of writes and lots of queries that would require indexes on many columns, then you have a big problem.
The difference between a primary key and a unique key is best explained through an example.
We have a table of users:
USER_ID number
NAME varchar(30)
EMAIL varchar(50)
In that table the USER_ID is the primary key. The NAME is not unique - there are a lot of John Smiths and Muhammed Khans in the world. The EMAIL is necessarily unique, otherwise the worldwide email system wouldn't work. So we put a unique constraint on EMAIL.
Why then do we need a separate primary key? Three reasons:
the numeric key is more efficient when used in foreign key relationships as it takes less space
the email can change (for example swapping provider) but the user is still the same; rippling a change of a primary key value throughout a schema is always a nightmare
it is always a bad idea to use sensitive or private information as a foreign key
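A small sketch of the second reason, assuming SQLite via Python's sqlite3 module: the `orders` table, the sample rows and the example.com-style addresses are invented for illustration. Because other tables reference USER_ID, changing the email touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    VARCHAR(30) NOT NULL,
        email   VARCHAR(50) NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users (user_id),
        item     TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'John Smith', 'john@old.example');
    INSERT INTO orders VALUES (100, 1, 'book');
""")

# The user switches provider: only the users row changes, orders is untouched.
conn.execute("UPDATE users SET email = 'john@new.example' WHERE user_id = 1")
joined = conn.execute("""
    SELECT u.email, o.item
    FROM orders AS o
    JOIN users AS u ON u.user_id = o.user_id
""").fetchone()

# An order pointing at a user that does not exist is refused.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 'pen')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```

Had orders stored the email as its foreign key, the same update would have had to be repeated in every referencing row.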
In the relational model, any column or set of columns that is guaranteed to be both present and unique in the table can be called a candidate key to the table. "Present" means "NOT NULL". It's common practice in database design to designate one of the candidate keys as the primary key, and to use references to the primary key to refer to the entire row, or to the subject matter item that the row describes.
In SQL, a PRIMARY KEY constraint amounts to a NOT NULL constraint for each primary key column, and a UNIQUE constraint for all the primary key columns taken together. In practice many primary keys turn out to be single columns.
For most DBMS products, a PRIMARY KEY constraint will also result in an index being built on the primary key columns automatically. This speeds up the systems checking activity when new entries are made for the primary key, to make sure the new value doesn't duplicate an existing value. It also speeds up lookups based on the primary key value and joins between the primary key and a foreign key that references it. How much speed up occurs depends on how the query optimizer works.
Originally, relational database designers looked for natural keys in the data as given. In recent years, the tendency has been to always create a column called ID, an integer as the first column and the primary key of every table. The autogenerate feature of the DBMS is used to ensure that this key will be unique. This tendency is documented in the "Oslo design standards". It isn't necessarily relational design, but it serves some immediate needs of the people who follow it. I do not recommend this practice, but I recognize that it is the prevalent practice.
An index is a data structure that allows for rapid access to a few rows in a table, based on a description of the columns of the table that are indexed. The index consists of copies of certain table columns, called index keys, interspersed with pointers to the table rows. The pointers are generally hidden from the DBMS users. Indexes work in tandem with the query optimizer. The user specifies in SQL what data is being sought, and the optimizer comes up with index strategies and other strategies for translating what is being sought into a strategy for finding it. There is some kind of organizing principle, such as sorting or hashing, that enables an index to be used for fast lookups, and certain other uses. This is all internal to the DBMS, once the database builder has created the index or declared the primary key.
Indexes can be built that have nothing to do with the primary key. A primary key can exist without an index, although this is generally a very bad idea.

How to use MySQL index columns?

When do you use each MySQL index type?
PRIMARY - Primary key columns?
UNIQUE - Foreign keys?
INDEX - ??
For really large tables, do indexed columns improve performance?
Primary
The primary key is - as the name suggests - the main key of a table and should be a column which is commonly used to select the rows of this table. The primary key is always a unique key (unique identifier). The primary key is not limited to one column, for example in reference tables (many-to-many) it often makes sense to have a primary key including two or more columns.
Unique
A unique index makes sure your DBMS doesn't accept duplicate entries for this column. You ask 'Foreign keys?' NO! That would not be useful, since foreign keys are by definition prone to contain duplicates (one-to-many, many-to-many).
Index
Additional indexes can be placed on columns which are often used for SELECTS (and JOINS) which is often the case for foreign keys. In many cases SELECT (and JOIN) queries will be faster, if the foreign keys are indexed.
Note however that - as SquareCog has clarified - Indexes get updated on any modifications to the data, so yes, adding more indexes can lead to degradation in INSERT/UPDATE performance. If indexes didn't get updated, you would get different information depending on whether the optimizer decided to run your query on an index or the raw table -- a highly undesirable situation.
This means you should carefully assess the usage of indices. One thing is certain on that basis: unused indices should be avoided or removed!
I'm not that familiar with MySQL, however I believe the following to be true across most database servers. An index is a balanced tree which is used to allow the database to scan the table for given data. For example say you have the following table.
CREATE TABLE person (
id SERIAL,
name VARCHAR(20),
dob VARCHAR(20)
);
If you created an index on the 'name' field, this would create a balanced tree for the data in the name column of the table. Balanced tree data structures allow for faster searching of results (see http://www.solutionhacker.com/tag/balanced-tree/).
You should note, however, that indexing a column only allows you to search on the data as it is stored in the database. For example, the following query would not be able to use the index and would instead do a sequential scan on the table, calling UPPER() on the name column of every row.
select *
from person
where UPPER(name) = 'BOB';
A pattern with a leading wildcard has the same effect, because the index is sorted starting with the first letter. Replacing the search term with 'B%' would, however, use the index.
select *
from person
where name like '%B';
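These cases can be compared side by side with a query planner. The sketch below assumes SQLite via Python's sqlite3 module rather than MySQL or PostgreSQL, and declares name with COLLATE NOCASE because SQLite only uses an index for its default case-insensitive LIKE when the column is indexed with that collation. The index name `idx_person_name` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(20) COLLATE NOCASE,
        dob  VARCHAR(20)
    );
    CREATE INDEX idx_person_name ON person (name);
""")

def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

plans = {
    # Stored value compared directly: the index applies.
    "equality": plan("SELECT * FROM person WHERE name = 'Bob'"),
    # Function wrapped around the column: the index on name does not apply.
    "upper": plan("SELECT * FROM person WHERE UPPER(name) = 'BOB'"),
    # Pattern anchored at the start: can be turned into an index range.
    "prefix": plan("SELECT * FROM person WHERE name LIKE 'B%'"),
    # Pattern with a leading wildcard: every row has to be checked.
    "suffix": plan("SELECT * FROM person WHERE name LIKE '%B'"),
}

for label, description in plans.items():
    print(f"{label:9} -> {description}")
```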
Indexes will improve performance on larger tables. Normally, the primary key has an index based on the key. Usually unique.
It is also useful to add indexes to fields that are searched on a lot, such as Street Name or Surname, as again it will improve performance. These don't need to be unique.
Foreign Keys and Unique Keys are more for keeping your data integrity in order. So that you cannot have duplicate primary keys and so that your child tables don't have data for a parent that has been deleted.
PRIMARY defines a primary key, yes.
UNIQUE simply defines that the specified field has to be unique, it has nothing to do with foreign keys.
INDEX creates an index for the specified column and, yes, it improves performance for large tables, sorting and finding something in this column can be much faster if you use indexing.
The bigger the table, the bigger the gain from using an index. Do note that indexes make insert (and probably update) operations slower, so make sure you don't index too many fields.