Does this query guarantee me a 'race-free' PK value? - sql

I was just reading How to avoid a database race condition when manually incrementing PK of new row.
There were a lot of good suggestions, like having a separate table to get the PK values from.
So I wonder if a query like this:
INSERT INTO Party VALUES(
    (SELECT MAX(id) + 1 FROM
        (SELECT id FROM Party) AS x),
    'A-XXXXXXXX-X', 'Joseph')
could avoid race conditions?
Is the whole statement guaranteed to be atomic? Is it in MySQL? In PostgreSQL?

The best way to avoid race conditions while creating primary keys in a relational database is to allow the database to generate the primary keys.
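A minimal sketch of that approach (table and column names other than id are guesses based on the question):

-- MySQL: let AUTO_INCREMENT hand out the key atomically
CREATE TABLE Party (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    code VARCHAR(20),
    name VARCHAR(100)
);
INSERT INTO Party (code, name) VALUES ('A-XXXXXXXX-X', 'Joseph');

-- PostgreSQL: SERIAL does the same thing via a sequence
CREATE TABLE Party (
    id   SERIAL PRIMARY KEY,
    code VARCHAR(20),
    name VARCHAR(100)
);
INSERT INTO Party (code, name) VALUES ('A-XXXXXXXX-X', 'Joseph');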

It would work on tables that use table-level locking (MyISAM), but on InnoDB and the like it could deadlock or produce duplicate keys, I think, depending on the isolation level in use.
In any case, doing this is an extremely bad idea: it won't work well in the general case, but it might appear to work during low-concurrency testing. It's a recipe for trouble.
You'd be better off using another table and incrementing a value in there; that's more likely to be race-free / deadlock-free.
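For MySQL, one well-known sketch of such a counter table (names are illustrative) relies on LAST_INSERT_ID(expr), which is set and read per connection:

CREATE TABLE party_seq (id INT NOT NULL);
INSERT INTO party_seq VALUES (0);

-- Atomically bump the counter and remember the new value for this connection
UPDATE party_seq SET id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();  -- the freshly reserved key, safe under concurrency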

No, you still have a problem: if two queries try to increment at the same time, the inner SELECT of one can complete, and then the other query can be processed before the first INSERT lands, so both compute the same value.
If you want a guarantee and you don't want the database generating the key, your best bet is to put a unique constraint on the column.
If the insert then fails with a duplicate-key error, simply retry the query; once the computed key is unique, it will succeed.
In that case, it is best to first insert only the id and any other NOT NULL columns, and then run an UPDATE to set the nullable columns to the correct values.
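A sketch of that retry pattern, assuming id carries the unique (primary key) constraint and 'code' is the nullable column (names are guesses based on the question):

-- Step 1: insert only the id and the NOT NULL columns; if a concurrent
-- session won the race, this fails with a duplicate-key error and the
-- application simply retries the whole statement
INSERT INTO Party (id, name)
VALUES ((SELECT MAX(id) + 1 FROM (SELECT id FROM Party) AS x), 'Joseph');

-- Step 2: after a successful insert, set the nullable columns
UPDATE Party SET code = 'A-XXXXXXXX-X' WHERE id = 42;  -- 42 stands in for the id from step 1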

Related

Using Identity in SQL Server without race condition

So let's say you have a table of Patients with an IDENTITY(1,1) for the primary key. By using @@Identity, how do we avoid a race condition where two people may save a new patient at the same time? Obviously, duplicate IDs in the Patients table would not be created, but what if the application needed the ID for one of the inserted patients to update a record in another table elsewhere? How do we know that @@Identity won't get the ID of the other record if both are inserted at the same time?
Or is there a best practice for avoiding this?
JamesNT
Yes, there is a best practice. Don't use @@Identity.
The safest way to get the identity values assigned in an insert statement is to use the OUTPUT clause. You should start with the documentation.
This has numerous advantages:
It does not get confused by triggers and nested statements.
It can handle multiple inserts at the same time.
It can return the values of other columns, not just the identity column.
It specifically returns the rows affected by the statement, so you don't even need to think about sessions, users, or anything else.
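A sketch using the Patients table from the question (column names other than the IDENTITY key are assumptions):

DECLARE @inserted TABLE (PatientID INT);

INSERT INTO Patients (Name)
OUTPUT INSERTED.PatientID INTO @inserted
VALUES ('Jane Doe');

-- These IDs belong to exactly the rows this statement inserted,
-- regardless of what other sessions are doing concurrently
SELECT PatientID FROM @inserted;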
@@IDENTITY will not cause a race condition, but it is NOT best practice either. You should instead be using SCOPE_IDENTITY().
http://blog.sqlauthority.com/2007/03/25/sql-server-identity-vs-scope_identity-vs-ident_current-retrieve-last-inserted-identity-of-record/
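A minimal sketch of the difference, using the hypothetical Patients table again:

INSERT INTO Patients (Name) VALUES ('Jane Doe');

SELECT SCOPE_IDENTITY();  -- identity generated in this scope only
SELECT @@IDENTITY;        -- could instead return an identity generated by a
                          -- trigger that inserted into some other table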

SQL - Must there always be a primary key?

There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain, no matter what anyone else may do to your table, that you're getting the correct record.
You're right, there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that, I'd vote add the key rather than not add it. You could have added it in the time it took to read this thread.
Short answer: no key, duplicate records possible. You're planning a single row now, but what about six months in the future when your single row multiplies? Put a primary key on the table, even for a single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMSs you are not REQUIRED to have a primary key per table.
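A sketch of that key-value shape (table and column names are made up):

CREATE TABLE settings (
    name  VARCHAR(64) PRIMARY KEY,  -- the setting name is the natural key
    value VARCHAR(255)
);

INSERT INTO settings (name, value) VALUES ('smtp_host', 'mail.example.com');
SELECT value FROM settings WHERE name = 'smtp_host';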
Having declared a primary key on a single row table in SQL will ensure that there will be no duplicates. Whether it is useless depends on your requirements. Usually it is a good idea to avoid duplicates.

Update part of the key

Q:
If I have a composite key combined from 4 fields for example, can I update one of them?
I mean can I execute a statement like this:
UPDATE tb
SET firstCol = '15', secondCol = 'test2'
WHERE firstCol = '1' AND serial = '2';
Given:
my table name is: tb
my fields are: firstCol, secondCol, serial
my keys are: firstCol, serial
Any suggestions? Did I miss some concept?
thanks.
Of course you can do that. Why do you ask?
Do you have a problem doing it?
You may run into problems in updating if you try to make some row into the same values as an existing row. No matter what you do in an update, the unique constraint will still apply.
If you have related tables and have cascade update turned on, you may have locking issues if many records need to be locked. If you do not have cascade update turned on, you may have issues where a PK cannot be changed until you break those relationships and then put them back after manually changing all the related tables to the new value. This task, either way, should only be done in single user mode during non-peak hours.
Personally, I think that if you need to change the PK, the design of your database is fragile and may cause problems in the future, especially with a multicolumn key. If this is a one-time, rare change, go ahead and work through the issues. Otherwise, it might be time to decide whether a surrogate key as the PK, with a unique index on the multiple columns, is a better choice. Multicolumn PKs create much larger indexes not only on the main table but on the child tables as well, they create difficult issues when you need to update one of the columns, and they have performance implications for joins. In general I'm not a fan of them, and definitely not if some of those columns will need updating with any frequency (and by that I mean any large update more than once a year; one or two records are OK, but if you are running an update as described more often than once a year, you need to revisit the design, in my opinion).
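A sketch of that alternative, reusing the question's table (SQL Server syntax; the column types are assumptions):

CREATE TABLE tb (
    id        INT IDENTITY(1,1) PRIMARY KEY,          -- surrogate key
    firstCol  VARCHAR(10),
    secondCol VARCHAR(50),
    serial    VARCHAR(10),
    CONSTRAINT uq_tb_natural UNIQUE (firstCol, serial) -- natural key kept unique
);

-- Updating part of the former key is now an ordinary column update;
-- child tables that reference id are untouched
UPDATE tb SET firstCol = '15', secondCol = 'test2'
WHERE firstCol = '1' AND serial = '2';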
Yes you can. It may be part of the key but it's still a column.
Note: if you have FKs relying on this key, then you'll need to consider CASCADE updates. Also, a key update (assuming it's clustered) means more work than "normal" because of how non-clustered indexes refer to the clustering key.
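A sketch of a child table whose FK follows such key changes automatically (names are illustrative):

CREATE TABLE tb_child (
    firstCol VARCHAR(10),
    serial   VARCHAR(10),
    note     VARCHAR(100),
    FOREIGN KEY (firstCol, serial)
        REFERENCES tb (firstCol, serial)
        ON UPDATE CASCADE  -- key changes in tb propagate here
);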

Fixing holes/gaps in numbers generated by Postgres sequence

I have a postgres database that uses sequences extensively to generate primary keys of tables.
After a lot of use of this database, i.e. insert/update/delete operations, the columns that use sequences for primary keys now have a lot of holes/gaps in them, and the sequence value itself is very high.
My question is: are there any ways to fix these gaps in the primary keys, which should in turn bring down the maximum value in those columns, and then reset the sequence?
Note: A lot of these columns are also referenced by other columns as ForeignKeys.
If you feel the need to fill gaps in auto-generated PostgreSQL sequence numbers, I have the feeling you need another field in your table, like some kind of "number" you increment programmatically, either in your code or in a trigger.
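A sketch of that idea as a PostgreSQL trigger (all names here are hypothetical; note that even this needs inserts serialized, e.g. by locking the table, to stay gapless under concurrency):

CREATE OR REPLACE FUNCTION assign_doc_number() RETURNS trigger AS $$
BEGIN
    -- gapless by construction: always one more than the current maximum
    SELECT COALESCE(MAX(doc_number), 0) + 1 INTO NEW.doc_number FROM invoices;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_assign_doc_number
BEFORE INSERT ON invoices
FOR EACH ROW EXECUTE FUNCTION assign_doc_number();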
It is possible to solve this problem, but is expensive for the database to do (especially IO) and is guaranteed to reoccur. I would not worry about this problem. If you get close to 4B, upgrade your primary and foreign keys to BIGSERIAL and BIGINT. If you're getting close to 2^64... well... I'd be interested in hearing more about your application. :~]
Postgres allows you to update PKs, although a lot of people think it's bad practice. So you could lock the table, and UPDATE. (You can make an oldkey, newkey table all sorts of ways, e.g., window function.) All the FK relationships have to be marked to cascade. Then you can reset the currval of the id sequence.
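A sketch of that renumbering, assuming every FK was declared ON UPDATE CASCADE, the PK constraint is DEFERRABLE (in-place renumbering can otherwise hit transient duplicates), and the sequence is named party_id_seq:

BEGIN;
LOCK TABLE party IN ACCESS EXCLUSIVE MODE;
SET CONSTRAINTS ALL DEFERRED;

-- Map each old key to its gapless rank via a window function
UPDATE party AS p
SET    id = m.newkey
FROM  (SELECT id AS oldkey,
              row_number() OVER (ORDER BY id) AS newkey
       FROM party) AS m
WHERE  p.id = m.oldkey AND p.id <> m.newkey;

-- Bring the sequence back down to the new maximum
SELECT setval('party_id_seq', (SELECT max(id) FROM party));
COMMIT;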
Personally, I would just use a BIGSERIAL. If you have so many updates and deletes that you may run out even so, maybe there is some composite PK based on (say) a timestamp and id that would help you.

Should I have a primary ID? I am indexing another field

Using SQLite, I need a table holding a BLOB to store an MD5 hash and a 4-byte int. I plan to index the int, but this value will not be unique.
Do I need a primary key for this table? And is there any issue with indexing a non-unique value? (I assume there is no issue, nor any reason for one.)
Personally, I like to have a unique primary id on all tables. It makes finding unique records for updating/deleting easier.
How are you going to reference a particular row in a SELECT * FROM Table WHERE ... or an UPDATE ... WHERE ...? Without a unique key, are you sure you want every row that matches?
You already have one.
SQLite automatically creates an integer ROWID column for every row of every table. This can function as a primary key if you don't declare your own.
In general it's a good idea to declare your own primary key column. In the particular instance you mentioned, ROWID will probably be fine for you.
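A sketch in SQLite (table and column names are made up):

-- INTEGER PRIMARY KEY makes id an alias for the built-in rowid
CREATE TABLE hashes (
    id  INTEGER PRIMARY KEY,
    md5 BLOB,
    tag INTEGER           -- the non-unique 4-byte int
);

-- Indexing a non-unique column is perfectly legal; duplicates are allowed
CREATE INDEX idx_hashes_tag ON hashes(tag);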
My advice is to go with a primary key if you want referential integrity. There is no issue with indexing a non-unique value, however; the only thing is that performance will degrade a little.
What are the consequences of letting two identical rows somehow get into this table?
One consequence is, of course, wasted space. But I'm talking about something more fundamental, here. There are times when duplicate rows in data give you wrong results. For example, if you grouped by the int column (field), and listed the count of rows in each group, a duplicate row (record) might throw you off, depending on what you are really looking for.
Relational databases work better if they are based on relations. Relations are always in first normal form. The primary reason for declaring a primary key is to prevent the table from getting out of first normal form, and thus not representing a relation.