Is having a primary key that auto-increments on each new row necessary? For me this number is getting quite long and I'm not even using it for anything.
I can imagine that as user activity on my site grows, new rows will keep being added (I am only testing at the moment with just 2 alpha test users, and the number has already auto-incremented to over 100). Eventually this number could reach silly proportions (example: 10029379000577352881086) and not only slow the site down (affecting user experience) but also push my site over its storage quota.
Is this really needed?
If you have some field/column (or combination of columns) that can serve as a primary key, use that; there is no need for auto increment. There is also a school of thought that favours using a mix of both. You could search for surrogate keys, and you may find this answer interesting: Surrogate vs. natural/business keys
As for the size quota, in practice I don't think the maximum auto-increment value would push your site over its data limit. If the column is of type int it takes 4 bytes, regardless of the value inside. In SQL Server, an int can hold values ranging from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
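To illustrate the fixed-width point, here is a minimal sketch using Python's `struct` module as a stand-in for a 4-byte signed integer. It is not how SQL Server lays out rows on disk, and the helper name `packed_size` is made up for this example; it only shows that a fixed-width int costs the same number of bytes whether it holds 1 or the maximum value.

```python
import struct

# Bounds of a 4-byte signed integer, matching the range quoted above.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def packed_size(value: int) -> int:
    """Return how many bytes a 4-byte signed int encoding of value takes."""
    # "<i" is struct's format code for a little-endian 4-byte signed int.
    return len(struct.pack("<i", value))


# Small and large values alike occupy the same 4 bytes.
sizes = [packed_size(v) for v in (1, 100, INT_MAX, INT_MIN)]
print(sizes)
```

A value like 10029379000577352881086 would not fit in such a column at all; the counter would hit the type's upper bound long before that.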
Here is the link for that
You need a way to uniquely identify each record in your table.
If you have that already -- say a user-ID or email-address -- then you don't necessarily need that auto-incrementing field.
Note: If you don't already have a unique constraint on that field, you should add one so that duplicate data cannot be entered into the table.
Warning: If you decide to get rid of it, be sure that no other tables are using it.
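A rough sketch of that idea, using Python's built-in `sqlite3` as a stand-in for whatever database you are running. The `users` table and its columns are hypothetical. The natural key (`email`) is declared as the primary key, so the database itself rejects a second row with the same address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the email address is the key, no auto-increment column.
conn.execute("CREATE TABLE users (email TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES ('a@example.com', 'Ann')")

try:
    # Same email again: the key constraint should refuse this row.
    conn.execute("INSERT INTO users VALUES ('a@example.com', 'Imposter')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)
```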
Can't you use multiple columns to get a composite key instead of that?
Just a hint.
You do need a key that identifies every row. But a key doesn't have to be a number that "auto-increments" for every row. The fact that a few people seem to think incrementing numbers are always a good idea for keys is probably a consequence either of carelessness or a lack of appreciation of database fundamentals, sound design and data integrity.
A primary key is not always necessary for a table. For your question, check my answer:
when and when not primary key should use
I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The said answer is D. But, I think the more suitable answer is B. Because sequence will use less space than GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID
previously generated by this function on a specified computer since
Windows was started. After restarting Windows, the GUID can start
again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Of the choices provided, B is the correct answer, since it meets all requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables, it ensures uniqueness across all of them. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when not declared to cycle, guarantees uniqueness and, as an integer, has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However,
I would actually probably choose a different option altogether: create a base table with a single identity column and link it with a 1:1 relationship to all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table, uses scope_identity() to get the new value, and inserts it as the primary key for the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
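The base-table idea can be sketched roughly as follows. This is an approximation in Python with SQLite, not T-SQL: the table names (`product_base`, `books`, `toys`) and the helper `insert_product` are hypothetical, only two category tables are shown for brevity, an application-side function stands in for the INSTEAD OF INSERT trigger, and the cursor's `lastrowid` plays the role of scope_identity().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared id generator for every category table.
CREATE TABLE product_base (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE books (id INTEGER PRIMARY KEY REFERENCES product_base(id), title TEXT);
CREATE TABLE toys  (id INTEGER PRIMARY KEY REFERENCES product_base(id), title TEXT);
""")

CATEGORY_TABLES = ("books", "toys")


def insert_product(table: str, title: str) -> int:
    """Reserve an id in the base table, then reuse it as the category row's key."""
    if table not in CATEGORY_TABLES:
        raise ValueError(f"unknown category table: {table}")
    # Step 1: the base table hands out the next id.
    new_id = conn.execute("INSERT INTO product_base DEFAULT VALUES").lastrowid
    # Step 2: the category row takes that id as its primary key.
    conn.execute(f"INSERT INTO {table} (id, title) VALUES (?, ?)", (new_id, title))
    return new_id


ids = [
    insert_product("books", "SQL Basics"),
    insert_product("toys", "Robot"),
    insert_product("books", "Keys"),
]
print(ids)
```

Because every id comes from the same base table, no two category tables can end up with the same key value.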
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
The constraint #3 is why a SEQUENCE could run into issues as there is a higher risk of collision/lowered number of possible rows in each table.
There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that I'd vote add the key rather than not add it. You could have done it by the time you read this thread.
Short answer: no key, duplicate records possible. You're planning a single row now, but what about six months in the future when your single row multiplies? Put a primary key on the table, even for a single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
Having declared a primary key on a single row table in SQL will ensure that there will be no duplicates. Whether it is useless depends on your requirements. Usually it is a good idea to avoid duplicates.
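As a small illustration of how a key keeps a single-row table single, here is a sketch in Python with SQLite. The `config` table is hypothetical, and the `CHECK (id = 1)` constraint is an extra assumption on top of what is said above: the primary key blocks a second row with the same id, and the check blocks any other id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical one-row settings table: the only legal key value is 1.
conn.execute(
    "CREATE TABLE config (id INTEGER PRIMARY KEY CHECK (id = 1), site_name TEXT)"
)
conn.execute("INSERT INTO config (id, site_name) VALUES (1, 'My Site')")


def try_insert(row_id: int, site_name: str) -> bool:
    """Return True if the row was accepted, False if a constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO config (id, site_name) VALUES (?, ?)", (row_id, site_name)
        )
        return True
    except sqlite3.IntegrityError:
        return False


same_id_accepted = try_insert(1, "Duplicate")   # violates the primary key
other_id_accepted = try_insert(2, "Second row")  # violates the CHECK
row_count = conn.execute("SELECT COUNT(*) FROM config").fetchone()[0]
print(same_id_accepted, other_id_accepted, row_count)
```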
I wondered if I insert, let's say, 10 entries into a SQL Server table.
If I then delete one of them, will the id/index change correspondingly?
Example:
1 | Simon Cowell | 56 years
2 | Frank Lampard| 24 years
3 | Harry Bennet | 12 years
If I delete #2, will Harry Bennet's index change to 2?
Thanks :)
EDIT:
Sorry for my outrage, had a bad day. And yes, I should have researched it myself, I deserve to be downvoted.
I don't ask for anything, I just want to say that I'm sorry :|
Since you seem to be conflating "id/index", let's talk a little bit about the primary key and indexes in the context of a relational database.
The "id" or primary key assigned to a row in a SQL database is the unique identifier for that row. It can consist of one or more columns. (When more than one column is involved it is known as a "composite" or "multi-part" key.) The primary key should really do nothing more than be a unique handle for addressing a row. It should not contain any information about the entity represented by the row, especially if that information has the potential to change. An example would be a part number with a suffix that stands for the type of metal the part is made from. If that metal can possibly change from titanium to unobtainium, say, that part number would make a bad choice as a primary key; it would be better to have another column to store the type of metal than to make the metal-type suffix part of the primary key. "Meaningful" primary keys might have made some sense in legacy non-relational databases, but in a relational database they are to be avoided.
When seeking to enforce the uniqueness of a primary key, a database engine can make use of an index so it can rapidly test whether the key value exists. It could use a binary algorithm to find the value, avoiding the need to scan the actual data "brute force", row by row, looking for the value. But the index that is used behind the scenes by the engine to assist it with the primary key housekeeping is not the same as the primary key itself.
If you have a simple sequential integer as your primary key, the supply of values is large enough (over two billion positive values for a 4-byte int) that there is no need to reuse an integer freed up when the row it was assigned to is deleted. So the relational database engine won't automatically attempt to reuse it, and it won't by any means change the primary key values assigned to the other rows in the table when a deletion creates "gaps" in the number sequence. Many rows in other tables could be referencing those values, and having them be mutable would create either chaos or a huge inefficiency.
Hashing algorithms are another very efficient way a database engine can quickly test for the existence of a key value. It computes the location in the hashed-file where the key would be if it did exist, and then looks there for it. The rows are stored in no particular order, so such schemes are optimized for instant finding of records in a large table, not for culling records that have a common characteristic, such as all customers in zipcode 10023.
No. You can set up triggers or logic to do it if you want; however, it will not automatically do this.
No, it will not change automatically.
No, it won't. And hopefully, that's the answer you're hoping for. For any auto-generated identifiers (such as IDENTITY columns), you should, so far as possible, ignore the data type and treat it as an opaque "blob" of identity information.
It gets assigned during insert, and you can use it for cross-referencing purposes, but the fact that it's numeric is not something you should use or rely upon. It's just a stable identifier for the row.
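The behaviour is easy to check for yourself. The sketch below uses Python with SQLite rather than SQL Server (an assumption made so the example runs anywhere), with a hypothetical `people` table holding the three rows from the question. Deleting row 2 leaves a gap, and the next insert continues from the highest id instead of filling it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Simon Cowell", 56), ("Frank Lampard", 24), ("Harry Bennet", 12)],
)

# Delete the middle row; the other rows keep the ids they were given.
conn.execute("DELETE FROM people WHERE id = 2")
remaining = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()

# A new row gets a fresh id rather than reusing the freed value 2.
new_id = conn.execute(
    "INSERT INTO people (name, age) VALUES ('New Person', 30)"
).lastrowid

print(remaining, new_id)
```

Harry Bennet stays at id 3, which is exactly what any table referencing him by id needs.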
Hi I was wondering what is the best practice for tables in which you have a record that must be unique. I've seen the two ways of doing that: use a Primary Key or add a Unique constraint to the column.
If you use a primary key, is it bad practice to have a primary key such as "UserName" that is varchar(*)? Does that impact performance enough that it is problematic? Is it best to use an integer id with a unique constraint on the username?
I see some other factors that may impact choosing a column as PK vs Unique. Am I right about these?
PK
- Column should be one that doesn't ever need to be changed
Unique
- Column could be changed later on
Having a primary key on the UserName is not the best idea, but it isn't as bad for performance as you might think.
The best option would be to use an ID (INT) as the PRIMARY KEY and make the UserName UNIQUE.
Usernames change over time, which is why they are a bad candidate for a PK, especially since it is extremely likely you have child records associated with the username. For instance, suppose my username included some variation of my real name. If I then got divorced and returned to my maiden name, the last thing I want to do is be reminded of that SOB I was married to, and so I change my username. Do you really want to change the 2 million posts I've made in the last ten years as well? I didn't think so.
Yes, string comparisons are slower, but this may or may not be an issue depending on the overall amount of action the database will get. Small company database with fewer than 200 users, probably not a problem; Internet site with millions of users, much more likely to be a problem.
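Here is a minimal sketch of the "integer id as PK, username as UNIQUE" layout, written in Python with SQLite as a stand-in. The `users` and `posts` tables and their columns are hypothetical. Renaming the user touches exactly one row, and the posts follow along because they reference the id, not the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,      -- stable surrogate key
    username TEXT NOT NULL UNIQUE      -- still guaranteed unique, but free to change
);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body    TEXT
);
INSERT INTO users (id, username) VALUES (1, 'old_name');
INSERT INTO posts (user_id, body) VALUES (1, 'first');
INSERT INTO posts (user_id, body) VALUES (1, 'second');
""")

# The rename updates a single row in users; posts is not touched at all.
rows_touched = conn.execute(
    "UPDATE users SET username = 'new_name' WHERE id = 1"
).rowcount

post_authors = conn.execute(
    "SELECT u.username FROM posts p JOIN users u ON u.id = p.user_id ORDER BY p.id"
).fetchall()

print(rows_touched, post_authors)
```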
It may or may not be a good idea as others have already discussed. Let me just add one more detail...
I see some other factors that may impact choosing a column as PK vs Unique.
The main difference is usually related to clustering. Most DBMSes (that support clustering) automatically use the PK as the clustering index. For example, MySQL/InnoDB always clusters data and you can't even turn it off, while MS SQL Server clusters by default (you have to use special syntax to turn it off).
Should you choose to use clustering (or are forced by your DBMS), having fewer indexes is usually better (e.g. see "Disadvantages of clustering" in this article), even when leading to "fatter" foreign keys.
If I had a site where a user can flag another user post and it cannot be undone or changed, do I need to have a primary key? All my selects would be on the post_id and with a where clause to see if the user already flagged it.
It seems to me from some of your other posts that the reason you are trying to avoid adding a primary key to your table is to save space.
Stop thinking like that.
It's a bad idea to make non-standard optimizations like this without having tested them first to see if they actually work. Have you run some tests that shows that you save a significant amount of space in your database by omitting the primary key on this table? Or are you just guessing?
Using a primary key doesn't necessarily mean that you will use more space. Depending on the database, if you omit the primary key it might add a hidden field for you anyway (for example if you don't have a PK in MySQL/InnoDB it adds a hidden clustered index on a synthetic column containing 6 byte row ID values (source)). If you do use a primary key, rather than adding a new column you can just choose some existing columns that you know should be unique anyway. It won't take up any more space, it will just mean that the data will be stored in a different order to make it easier to search.
When you add an index, that index is going to take up extra space, as an index is basically just a copy of a few columns of the table, plus a link back to the row in the original table. Remember that hidden column the database uses when you don't have a PK? Well now it has to use that to find your rows, so you'll get a copy of it in your index too. If you use a primary key then you probably don't need one of your indexes that you would have added, so you're actually saving space here.
Besides all this, some useful database tools just won't work well if you don't have a primary key on your table. You will annoy everyone that has to maintain your database after you are gone.
So tell me, why do you think it's a good idea to NOT have one?
A primary key has nothing to do with whether data can be changed - it's a single point of reference for an entire row, which can make looking up and/or changing data faster.
All my selects would be on the post_id and with a where clause to see if the user already flagged it.
You need to provide more information about business rules. For example, should the system support more than one user flagging the same post?
If the answer is "no", then I would model a POST_STATUS_CODE table and have a foreign key to the table in your POSTS table.
If the answer is "yes", then I would still have a POST_STATUS_CODE table but also a table linking the POSTS and POST_STATUS_CODE tables - say POSTS_STATUS_XREF.
I have a post_flag table with post_id, user_id (who flagged it) and flag_type (ATM as a byte). I don't see how PK will make it faster in this case but I imagine it will take up 4 or 8 bytes per row. I was thinking about indexing post_id. If I do should I still create a PK?
At a minimum, I would make the primary key to be a combination of:
post_id
user_id
The reason being that a primary key ensures that there can't be duplicates.
A primary key can be made up of more than one column; this is called a compound key. It means that the pair of values is unique, i.e. you can't have more than one row with the combination 1, 1, but you could have 1,2, 1,3, etc. (and vice versa). Attempts to add duplicates will result in duplicate primary key errors.
Primary keys help speed up lookups and joins, so it's always nice to have if you can.
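A quick sketch of that compound key on the `post_flag` table described in the question, using Python with SQLite as a stand-in for the real database. The helper `flag` is made up for this example. The same user can flag different posts and different users can flag the same post, but a repeat of the same (post_id, user_id) pair is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE post_flag (
    post_id   INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    flag_type INTEGER NOT NULL,
    PRIMARY KEY (post_id, user_id)   -- the pair must be unique, no extra id column
)
""")


def flag(post_id: int, user_id: int, flag_type: int) -> bool:
    """Record a flag; return False if this user already flagged this post."""
    try:
        conn.execute(
            "INSERT INTO post_flag VALUES (?, ?, ?)", (post_id, user_id, flag_type)
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    flag(1, 1, 0),  # first flag on post 1 by user 1
    flag(1, 2, 0),  # different user, same post
    flag(2, 1, 1),  # same user, different post
    flag(1, 1, 1),  # repeat of the first pair
]
print(results)
```

This layout adds no new column, so it does not cost the extra 4 or 8 bytes per row the question was worried about.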
You don't need a primary key, not even if users are going to modify rows. A primary key optimizes the performance every time you query that table though. If you think your table will grow larger than about a thousand rows or so, then setting a primary key will give a noticeable performance boost.
The only advantage in not creating a primary key really is that it means you don't have to create one, which is fair enough I suppose :-P
You could just not bother creating one for now. You can always add one later. Not a big deal. Don't let anyone bully you into thinking you absolutely must create a primary key right now! You'll see it being horribly slow soon enough :-P and then you can just add the primary key at that point. If you don't have too many duplicates by then :-P
Best have one, if just because you may have to delete the occasional record manually (e.g. duplicates) and one should have a unique identifier for that.
The simple answer is yes. Every table should have a primary key (made of at least one column). What benefit do you get from not having one?
In such a situation, you might be able to get away without one, but I'd be inclined to throw a primary key on there anyway, simply because it's relatively simple to do and will save rework if the requirements change.
Software requirements can change rapidly, and the customer may introduce new ones. Having a primary key may be useful because it lets you avoid totally unnecessary data migrations in such situations.
Read this: "Is it OK not to use a Primary Key When I don’t Need one?"
Yes, you do need a primary key.
You may as well use text files for storage if you don't think you do because it means you don't understand them...