Is there a way to have primary key ids without any gaps? - sql

I'm creating an application with Java Spring and Oracle DB.
In the app, I want to generate a primary key value that is unique as well as ordered and without gaps: 1,2,3,4,5 instead of 1,2,5,7,8,9.
At one point I used max(id) + 1 to get the maximum id and use it as the id of the next/current transaction. However, I know this isn't safe under concurrency, with multiple users or multiple sessions.
I've tried using sequences, but even with the ORDER option a sequence can still create gaps, for example when a transaction fails and rolls back.
CREATE SEQUENCE num_seq
START WITH 1
INCREMENT BY 1
ORDER NOCACHE NOCYCLE;
I need there to be gapless values as a requirement, however I'm unsure how it's possible in the case of multiple users/multiple sessions.

Don't do it.
The goal of primary keys is not to be displayed on the UI or to be exposed to the external world, but only to provide a unique identifier of the row.
In simple words, a primary key doesn't need to be sexy or good looking. It's an internal identifier.
If you are considering the idea of having a serial identifier, that probably means you want to display it somewhere or expose it to the external world. If that's the case, then create a secondary column (also unique) that serves this "public relations" goal. It can be automatically generated, or updated at leisure, without affecting the integrity of the database.
It can also be generated by a secondary process that runs in a deferred way (e.g. every 10 minutes), finds all the "unassigned" new rows, and gives them the next numbers. This has the advantage that it is not vulnerable to concurrency issues.
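For illustration, here is a minimal sketch of such a deferred numbering job in Oracle SQL. The names (orders, display_no, created_at) are hypothetical, and the job is assumed to be the only writer of display_no:
MERGE INTO orders o
USING (
    SELECT id,
           (SELECT NVL(MAX(display_no), 0) FROM orders)
             + ROW_NUMBER() OVER (ORDER BY created_at) AS new_no
    FROM orders
    WHERE display_no IS NULL
) u
ON (o.id = u.id)
WHEN MATCHED THEN
    UPDATE SET o.display_no = u.new_no;
Because a single scheduled job assigns the numbers, there is no race between user sessions, and a rolled-back insert simply never receives a display number.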

Related

Are PostgreSQL auto-incrementing ids scalable?

I am new to SQL; I am coming from NoSQL.
I have seen that you need to create a unique id for your rows if you want to use unique ids. They are not automatically generated by the database as they are in MongoDB. One way to do so is to use auto-incrementing ids.
Are PostgreSQL auto-incrementing ids scalable? Does the DB have to insert one row at a time? How does it work?
-----EDIT-----
What I am actually wondering is in a distributed environment is there a risk that two rows may have the same id?
In Postgres, autoincrement is atomic and scalable. If some inserts fail, some ids can be missing from the sequence, but the inserted ones are guaranteed to be unique.
Also, not all primary keys have to be generated. See my answer to your first question.
Autoincrementing columns that are defined as
id bigint PRIMARY KEY DEFAULT nextval('tab_id_seq')
or, more standard compliant, as
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
use a sequence to generate unique values.
A sequence is a special database object that can very efficiently supply unique integers to concurrent database sessions. I doubt that any identity generator, be it in MongoDB or elsewhere, can be more efficient.
Since getting a new sequence value accesses shared state, you can optimize sequences for high concurrency by defining them with a CACHE value higher than 1. Then each database session that uses the sequence keeps a cache of unique values and doesn't have to access the shared state each time it needs a value.
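For example, using the hypothetical tab_id_seq from above, a per-session cache could be defined like this:
CREATE SEQUENCE tab_id_seq CACHE 20;
or as a sequence option of an identity column:
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY (CACHE 20)
Note that cached values are discarded when a session ends, so a larger cache trades more (harmless) gaps in the generated ids for less contention.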

Advantages and disadvantages of automatic database number generation for each row vs manual numbering for each row

Imagine two tables implemented as described below:
In the first table, row numbers are created automatically by the database system.
In the second table, row numbers are created manually by the programmer, in sequential order.
The main question is what are the advantages and disadvantages of these two approaches?
One distinct advantage of having the database manage auto-numbering over creating the numbers manually is that the database implementation is thread safe - while a manual implementation usually (in 99.9% of cases) is not (it's hard to do correctly).
On the other hand, the database implementation does not guarantee sequential numbering - there can be gaps in the numbers.
Given these two facts, an auto-increment column should be used only as a surrogate key, when the values of the column do not have any business meaning - they are simply used as a row identifier.
Please note that when using a surrogate key, it's best to also enforce uniqueness of a natural key - otherwise you might get rows where all the data is duplicated except the surrogate key.
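As a sketch of that advice (the users table and its columns are hypothetical):
CREATE TABLE users (
    user_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key
    email   varchar(255) NOT NULL UNIQUE,                    -- natural key, enforced unique
    name    varchar(255) NOT NULL
);
The UNIQUE constraint on the natural key prevents the same user from being stored twice under two different surrogate ids.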
When the database creates the numbers automatically, you have less work to do.
Think about a sign-up system where you have fields like name, email, password and so on:
1.) If the number is generated by the database, you can just insert the data into the table.
2.) If not, you have to fetch the last number first, so instead of a single INSERT INTO you have a SELECT plus an INSERT INTO.
Another reason: what happens when you delete a row from your table?
In a forum, for example, you may want to delete an account but not all of its posts. You can then use the convention that a post whose user_id no longer exists belongs to a deleted or banned account - but if you give a new user the number of a deleted user, you will run into trouble.

Why do SQL id sequences go out of sync (specifically using Postgres)?

I've seen solutions for updating a sequence when it goes out of sync with the primary key it's generating, but I don't understand how this problem occurs in the first place.
Does anyone have insight into how a primary key field, with its default defined as the nextval of a sequence, whose primary keys aren't set explicitly anywhere, can go out of sync with the sequence? I'm using Postgres, and we see this occur now and then. It results eventually in a duplicate key constraint when the sequence produces an id for an existing row.
Your application is probably occasionally setting the value of the primary key for a new row explicitly. In that case PostgreSQL has no need to fetch a new value from the sequence, so the sequence doesn't advance.
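A minimal sketch of how this happens, and one common way to resynchronize, assuming a hypothetical files table whose file_id defaults to nextval('files_file_id_seq'):
-- an explicit id bypasses the default, so the sequence does not advance
INSERT INTO files (file_id, url) VALUES (42, 'a.doc');
-- later the sequence reaches 42 on its own and the insert fails with a
-- duplicate key error; resynchronize the sequence with the table:
SELECT setval('files_file_id_seq', (SELECT max(file_id) FROM files));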
When a sequence number is allocated, it remains allocated, even if the TX that requested it is rolled back. So a number can be allocated that does not appear in the stable database. Of course, rows can also be deleted after they are created, so the maximum number found in the table need not be the maximum number ever allocated. This applies to any auto-incrementing type.
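For example, in Postgres (again with a hypothetical files table):
BEGIN;
INSERT INTO files (url) VALUES ('a.doc');  -- draws, say, 5 from the sequence
ROLLBACK;                                  -- the row disappears, but 5 stays allocated
INSERT INTO files (url) VALUES ('b.doc');  -- gets 6, leaving a permanent gap at 5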
Also, depending on the technology used, separate sequences can be used with multiple tables, so a value might be missing from TableA but present in TableB. That could be because of a mistake in the use of sequence names, or it might be intentional.

Concurrent insert of keys into a table

Probably a trivial question, but I want to get the best possible solution.
Problem:
I have two or more workers that insert keys into one or more tables. The problem arises when two or more workers try to insert the same key into one of those key tables at the same time.
Typical problem.
Worker A reads the table if a key exists (SELECT). There is no key.
Worker B reads the table if a key exists (SELECT). There is no key.
Worker A inserts the key.
Worker B inserts the key.
Worker A commits.
Worker B commits. An exception is thrown as the unique constraint is violated.
The key tables are simple pairs. The first column is an auto-incrementing integer and the second is a varchar key.
What is the best solution to such a concurrency problem? I believe it is a common problem. One way for sure is to handle the exceptions thrown, but somehow I don't believe this is the best way to tackle this.
The database I use is Firebird 2.5
EDIT:
Some additional info to make things clear.
Client side synchronization is not a good approach, because the inserts come from different processes (workers). And I could have workers across different machines someday, so even mutexes are a no-go.
The primary key, the first column of such a table, is an autoincrement field. No problem there. The varchar field is the problem, as it is something that the client inserts.
Typical such table is a table of users. For instance:
1 2056
2 1044
3 1896
4 5966
...
Each worker checks whether user "xxxx" exists and, if not, inserts it.
EDIT 2:
Just for reference, in case somebody goes the same route: IB/FB return a pair of error codes (I am using the InterBase Express components). Checking for a duplicate value violation looks like this:
except
  on E: EIBInterBaseError do
  begin
    // SQLCode -803 with IBErrorCode 335544349 signals a unique key violation
    if (E.SQLCode = -803) and (E.IBErrorCode = 335544349) then
    begin
      FKeysConnection.IBT.Rollback;
      EnteredKeys := False;
    end;
  end;
end;
With Firebird you can use the following statement:
UPDATE OR INSERT INTO MY_TABLE (MY_KEY) VALUES (:MY_KEY) MATCHING (MY_KEY) RETURNING MY_ID
assuming there is a BEFORE INSERT trigger which will generate the MY_ID if a NULL value is being inserted.
Here is the documentation.
Update: The above statement avoids exceptions and causes every statement to succeed. However, in the case of many duplicate key values it also causes many unnecessary updates.
This can be avoided by another approach: just handle the unique constraint exception on the client and ignore it. The details depend on which Delphi library you're using to work with Firebird but it should be possible to examine the SQLCode returned by the server and ignore only the specific case of unique constraint violation.
I do not know whether something like this is available in Firebird, but in SQL Server you can check the key while inserting:
insert into Table1 (KeyValue)
select 'NewKey'
where not exists (select *
                  from Table1
                  where KeyValue = 'NewKey')
First option - don't do it.
Unless the workers are doing extraordinary amounts of work (we're talking about computers, so requiring 1 second per record qualifies as "extraordinary"), just use a single thread. Even better, do all the work in a stored procedure; you'd be amazed by the speedup gained by not transporting data over whatever protocol into your app.
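As a sketch of the stored-procedure idea (the KEYS table, its columns and the procedure name are hypothetical, and the ID column is assumed to be filled by a BEFORE INSERT trigger; with a single caller there is no race):
SET TERM ^ ;
CREATE PROCEDURE ADD_KEY (NEW_KEY VARCHAR(50))
RETURNS (KEY_ID INTEGER)
AS
BEGIN
  /* singleton select; KEY_ID stays NULL if the key does not exist yet */
  SELECT ID FROM KEYS WHERE KEY_VALUE = :NEW_KEY INTO :KEY_ID;
  IF (KEY_ID IS NULL) THEN
    INSERT INTO KEYS (KEY_VALUE) VALUES (:NEW_KEY) RETURNING ID INTO :KEY_ID;
END^
SET TERM ; ^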
Second option - Use a Queue
Make sure your worker threads don't all work on the same ID. Set up a Queue, push all the ID's that need processing into that queue, have each working thread Dequeue an ID from that Queue. This way you're guaranteed no two workers work on the same record at the same time. This might be difficult to implement if your workers are not all part of the same process.
Last resort
Set up a DB-based "reservation" system so a worker thread can mark a key as "work in progress" and no two workers work on the same key. I'd set up a table like this:
CREATE TABLE KEY_RESERVATIONS (
  "KEY" INTEGER NOT NULL PRIMARY KEY, /* the KEY you'd be reserving; quoted because KEY is a reserved word, and made the primary key so a second concurrent reservation fails */
  RESERVED_UNTIL TIMESTAMP NOT NULL /* we don't want to keep reservations forever in case of failure */
);
Each of your workers would use short transactions to work on that table: select a candidate key, one that's not in the KEY_RESERVATIONS table, and try to INSERT it. Failed? Try another key. Periodically delete all reservations with old RESERVED_UNTIL timestamps. Make sure the transactions that work with KEY_RESERVATIONS are as short as possible, so that two threads trying to reserve the same key at the same time fail quickly.
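A minimal sketch of those two statements in Firebird SQL (the 10-minute window is an arbitrary choice):
-- reserve a candidate key; the primary key on "KEY" makes the second
-- of two concurrent reservations fail immediately
INSERT INTO KEY_RESERVATIONS ("KEY", RESERVED_UNTIL)
VALUES (:CANDIDATE_KEY, DATEADD(MINUTE, 10, CURRENT_TIMESTAMP));
-- periodic cleanup of stale reservations
DELETE FROM KEY_RESERVATIONS WHERE RESERVED_UNTIL < CURRENT_TIMESTAMP;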
This is what you have to deal with in an optimistic (or no-) locking scheme.
One way to avoid it is to put a pessimistic lock on the table around the whole select, insert, commit sequence.
However, that means you will have to deal with not being able to access the table (handle table-locked exceptions).
If by workers you mean threads in the same application instance instead of different users (application instances), you will need thread synchronization like kubal5003 says around the select-insert-commit sequence.
A combination of the two is needed if you have multiple users/application instances each with multiple threads.
Synchronize your threads to make it impossible to insert the same value, or use a DB-side key generation method (I don't know Firebird, so I don't even know if one exists there; in MS SQL Server, for example, there is the identity column. GUIDs also solve the problem, because it's extremely unlikely to generate two identical ones).
You should not rely on the client to generate the unique key if there's a possibility of duplicates.
Use triggers and generators (maybe with help of stored procedure) to create always unique keys.
More information about proper autoinc implementation in Firebird here: http://www.firebirdfaq.org/faq29/
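The pattern from that FAQ looks roughly like this (the USERS table, its ID column and the generator name are placeholders):
CREATE GENERATOR GEN_USERS_ID;

SET TERM ^ ;
CREATE TRIGGER BI_USERS FOR USERS
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  /* only assign an id when the client didn't supply one */
  IF (NEW.ID IS NULL) THEN
    NEW.ID = GEN_ID(GEN_USERS_ID, 1);
END^
SET TERM ; ^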

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question, which arose from (a) my poor planning and (b) my complete disregard for the practice of normalizing databases. I've spent the last 8 hours reading about database normalization and the finer points of JOIN, and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: one table called "files" that held, let's say, a file id, a file url, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic question that I'm sure has an obvious answer: when I create a record in "files", it gets assigned the auto-incremented primary key (file_id) automatically. However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: either both inserts complete successfully or neither does. This is optional, but recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the mysqli->insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: which measurable properties define this particular entity? The set of those properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist, and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let it escape from the database.