Why do SQL id sequences go out of sync (specifically using Postgres)?

I've seen solutions for updating a sequence when it goes out of sync with the primary key it's generating, but I don't understand how this problem occurs in the first place.
Does anyone have insight into how a primary key field, with its default defined as the nextval of a sequence, whose values aren't set explicitly anywhere, can go out of sync with that sequence? I'm using Postgres, and we see this occur now and then. It eventually results in a duplicate key constraint violation when the sequence produces an id that already exists in the table.

Your application is probably occasionally setting the primary key value for a new row explicitly. In that case PostgreSQL has no need to fetch a new value from the sequence, so the sequence never advances.
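Here is a minimal sketch of how that plays out in Postgres (the widgets table and its column names are invented for illustration):

-- A hypothetical table whose id defaults to the nextval of a sequence
CREATE TABLE widgets (
    id   serial PRIMARY KEY,   -- implicitly creates the sequence widgets_id_seq
    name text NOT NULL
);

INSERT INTO widgets (name) VALUES ('a');         -- id 1; the sequence advances to 1
INSERT INTO widgets (id, name) VALUES (2, 'b');  -- explicit id; the sequence is NOT advanced

INSERT INTO widgets (name) VALUES ('c');
-- ERROR: duplicate key value violates unique constraint "widgets_pkey"
-- nextval() handed out 2, which the explicit insert already used

-- One common way to bring the sequence back in line with the table:
SELECT setval('widgets_id_seq', (SELECT max(id) FROM widgets));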

When a sequence number is allocated, it remains allocated, even if the transaction that requested it is rolled back. So a number can be allocated that never appears in the stable database. Of course, rows can also be deleted after they are created, so the maximum number found in the table need not be the maximum number ever allocated. This applies to any auto-incrementing type.
Also, depending on the technology used, a single sequence can be shared by multiple tables, so a value might be missing from TableA but present in TableB. That could be because of a mistake in the use of sequence names, or it might be intentional.
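For instance, continuing the hypothetical widgets table above, a sequence value consumed inside a rolled-back transaction never comes back:

BEGIN;
INSERT INTO widgets (name) VALUES ('d');  -- nextval() hands out, say, 3
ROLLBACK;                                 -- the row disappears, but 3 is not returned to the sequence

INSERT INTO widgets (name) VALUES ('e');  -- gets 4; the table now has a permanent gap at 3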

Related

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The given answer is D, but I think the more suitable answer is B, because a sequence will use less space than a GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, as long as you don't restart or cycle it manually. I think it is the easiest way to meet all three requirements.
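For example, one way a SEQUENCE could satisfy all three requirements is to let a single SEQUENCE object supply the default for every table. This is only a sketch, and the object and table names are invented:

-- One SEQUENCE shared by all four category tables (T-SQL)
CREATE SEQUENCE dbo.ProductIdSeq AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Books  (Id INT PRIMARY KEY DEFAULT (NEXT VALUE FOR dbo.ProductIdSeq), Title NVARCHAR(200));
CREATE TABLE dbo.Movies (Id INT PRIMARY KEY DEFAULT (NEXT VALUE FOR dbo.ProductIdSeq), Title NVARCHAR(200));
-- ...the remaining two tables are defined the same way...

-- Because every table draws from the same sequence, a given Id can appear in
-- at most one table, as long as Ids are never set manually.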
Among the choices provided, B is the correct answer, since it meets all the requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated on that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all of its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables it ensures uniqueness for all tables. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of an int (16 bytes instead of 4).
A SEQUENCE, when it is not declared with CYCLE, guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However, I would actually probably choose a different option altogether: create a base table with a single identity column and link it with a 1:1 relationship to all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table and then uses SCOPE_IDENTITY() to get the value and insert it as the primary key for the category table, as sketched below.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
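A rough sketch of that idea, assuming single-row inserts and hypothetical table names (a production trigger would need to handle multi-row inserts):

-- Base table that owns the identity (T-SQL)
CREATE TABLE dbo.ProductBase (Id INT IDENTITY(1,1) PRIMARY KEY);

CREATE TABLE dbo.CategoryA (Id INT PRIMARY KEY REFERENCES dbo.ProductBase (Id), Title NVARCHAR(200));
GO
CREATE TRIGGER dbo.CategoryA_Insert ON dbo.CategoryA
INSTEAD OF INSERT
AS
BEGIN
    -- Allocate one id from the shared base table...
    INSERT INTO dbo.ProductBase DEFAULT VALUES;
    -- ...then insert the category row using that id as its primary key
    INSERT INTO dbo.CategoryA (Id, Title)
    SELECT SCOPE_IDENTITY(), Title FROM inserted;
END;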
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
Requirement #3 is why a SEQUENCE could run into issues, as there is a higher risk of collision and a reduced number of possible rows in each table.

SQL - Must there always be a primary key?

There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right, there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that I'd vote to add the key rather than not. You could have added it in the time it took to read this thread.
Short answer: with no key, duplicate records are possible. You're planning a single row now, but what about six months in the future when your single row multiplies? Put a primary key on the table, even for a single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
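As a sketch of that key-value idea (the table and column names here are invented):

CREATE TABLE settings (
    name  varchar(100) PRIMARY KEY,  -- the setting name itself is the key
    value text NOT NULL
);

INSERT INTO settings (name, value) VALUES ('site_title', 'My Site');

-- The primary key prevents the same setting from being stored twice
SELECT value FROM settings WHERE name = 'site_title';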
Having declared a primary key on a single-row table in SQL will ensure that there are no duplicates. Whether that is useless depends on your requirements. Usually it is a good idea to avoid duplicates.

sql primary key auto increment

Is having a primary key that auto-increments on each new row necessary? For me this number is getting quite long and I'm not even using it for anything.
I can imagine that with gradual user activity on my site new rows will be added (I am only testing at the moment with just 2 alpha test users and already the number has auto-incremented to over 100). Eventually this number could reach silly proportions (example: 10029379000577352881086) and not only slow the site down (affecting user experience) but also inevitably push my site over its quota (exceeding its allowed size, in layman's terms).
Really, is this needed?
If you have some field/column (or combination of columns) which can serve as a primary key, use that; why use an auto-increment? There are schools of thought that believe in using a mix of both. You could search for surrogate keys, and you may find this answer interesting: Surrogate vs. natural/business keys
As for the size quota problem, practically speaking I don't think the maximum auto-increment value would cause your site to go over its data limit. If the column is of int type it will take 4 bytes, regardless of the value inside. For SQL Server, the int type can hold values ranging from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
Here is the link for that
You need a way to uniquely identify each record in your table.
If you have that already -- say a user-ID or email-address -- then you don't necessarily need that auto-incrementing field.
Note: If you don't already have a unique constraint on that field, you should add one so that duplicate data cannot be entered into the table.
Warning: If you decide to get rid of it, be sure that no other tables are using it.
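For example, assuming a hypothetical users table where the email address already identifies each row, the unique constraint could look like this:

-- Make the existing natural identifier enforceably unique
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
-- Only after checking that no other tables reference it should you consider
-- dropping the auto-incrementing surrogate column.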
Can't you use multiple columns to form a composite key instead of that?
Just a hint.
You do need a key that identifies every row. But a key doesn't have to be a number that "auto-increments" for every row. The fact that a few people seem to think incrementing numbers are always a good idea for keys is probably a consequence either of carelessness or a lack of appreciation of database fundamentals, sound design and data integrity.
A primary key is not always necessary for a table. For your question, check my answer:
when and when not primary key should use

Fixing holes/gaps in numbers generated by Postgres sequence

I have a postgres database that uses sequences extensively to generate primary keys of tables.
After a lot of usage of this database, i.e. add/update/delete operations, the columns that use sequences for primary keys now have a lot of holes/gaps in them, and the sequence value itself is very high.
My question is: are there any ways in which we can fix these gaps in the primary keys, which would in turn bring down the maximum value in those columns, and then reset the sequence?
Note: a lot of these columns are also referenced by other columns as foreign keys.
If you feel the need to fill gaps in auto-generated PostgreSQL sequence numbers, I have the feeling you need another field in your table, like some kind of "number" you increment programmatically, either in your code or in a trigger.
It is possible to solve this problem, but it is expensive for the database to do (especially IO) and the gaps are guaranteed to reoccur. I would not worry about this problem. If you get close to 2^31 (about 2 billion), upgrade your primary and foreign keys to BIGSERIAL and BIGINT. If you're getting close to 2^63... well... I'd be interested in hearing more about your application. :~]
Postgres allows you to update primary keys, although a lot of people think it's bad practice. So you could lock the table and UPDATE. (You can build an oldkey/newkey mapping all sorts of ways, e.g., with a window function.) All the FK relationships have to be marked ON UPDATE CASCADE. Then you can reset the id sequence with setval().
Personally, I would just use a BIGSERIAL. If you have so many updates and deletes that you may run out even so, maybe there is some composite PK based on (say) a timestamp and id that would help you.
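A sketch of the widening route recommended above (the items table, the referencing item_id column, and the sequence name are hypothetical):

-- Widen the key columns instead of renumbering them
ALTER TABLE items ALTER COLUMN id TYPE bigint;
ALTER TABLE orders ALTER COLUMN item_id TYPE bigint;  -- repeat for every referencing FK column
ALTER SEQUENCE items_id_seq AS bigint;                -- PostgreSQL 10+, where serial sequences are typed as integer

-- If the sequence has fallen behind the table, re-sync it
SELECT setval('items_id_seq', (SELECT max(id) FROM items));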

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
-- Passing NULL for the AUTO_INCREMENT primary key lets MySQL generate the id
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
-- LAST_INSERT_ID() returns the id generated by the insert above (per connection)
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: either both inserts complete successfully or neither does. This is optional, but it is recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let it escape from the database.