I am new to SQL, I am coming from NoSQL.
I have seen that you need to create a unique id for your rows yourself if you want to use unique ids. They are not generated automatically by the database as they were in MongoDB. One way to do so is to create auto-incrementing ids.
Are PostgreSQL auto-incrementing ids scalable? Does the DB have to insert one row at a time? How does it work?
-----EDIT-----
What I am actually wondering is whether, in a distributed environment, there is a risk that two rows may end up with the same id.
In Postgres, auto-increment is atomic and scalable. If some inserts fail, some ids may be missing from the sequence, but the ids that are inserted are guaranteed to be unique.
Also, not all primary keys have to be generated. See my answer to your first question.
Autoincrementing columns that are defined as
id bigint PRIMARY KEY DEFAULT nextval('tab_id_seq')
or, more standards-compliant, as
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
use a sequence to generate unique values.
A sequence is a special database object that can very efficiently supply unique integers to concurrent database sessions. I doubt that any identity generator, be it in MongoDB or elsewhere, can be more efficient.
Since getting a new sequence value accesses shared state, you can optimize sequences for high concurrency by defining them with a CACHE value higher than 1. Then each database session that uses the sequence keeps a cache of unique values and doesn't have to access the shared state each time it needs a value.
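To make the CACHE idea concrete, here is a toy simulation in Python. It is not how PostgreSQL implements sequences; the `Sequence` and `Session` classes and their method names are made up for illustration. It only shows why reserving a block of values per session reduces trips to the shared state while keeping values unique.

```python
import threading

class Sequence:
    """Toy stand-in for a database sequence: shared state guarded by a lock."""
    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, count):
        """Atomically reserve `count` consecutive values; return the first one."""
        with self._lock:
            first = self._next
            self._next += count
            return first

class Session:
    """Toy stand-in for a database session holding a private cache of values."""
    def __init__(self, sequence, cache=1):
        self._sequence = sequence
        self._cache = cache
        self._pending = iter(())  # values reserved but not yet handed out

    def nextval(self):
        value = next(self._pending, None)
        if value is None:
            # Cache exhausted: touch the shared state once and reserve a block.
            first = self._sequence.reserve(self._cache)
            self._pending = iter(range(first, first + self._cache))
            value = next(self._pending)
        return value

seq = Sequence()
a, b = Session(seq, cache=3), Session(seq, cache=3)
values_a = [a.nextval() for _ in range(4)]  # needs two reservations
values_b = [b.nextval() for _ in range(2)]  # needs one reservation
print(values_a, values_b)  # [1, 2, 3, 4] [7, 8]
```

Session `a` still holds 5 and 6 in its cache. If it ends without using them, they are simply lost, which is one reason cached sequences produce gaps and values that do not follow insertion order across sessions.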
I'm using SQL Server 2017 with SSMS. I have created a few tables whose primary key is an int with Is Identity enabled, Identity Increment = 1, and Identity Seed = 1. I used the same method for all the tables. But when I added one record to one table, say Lead, its ID was 2; then I added a record to another table, say Followup, and its ID was 3.
Here I'm adding screenshots for a better understanding:
[Screenshot: Lead table]
[Screenshot: Followup table]
Is there any option available to avoid this? Can we keep the identity separate for each table?
The documentation is quite specific about what identity does not guarantee:
The identity property on a column does not guarantee the following:
Uniqueness of the value . . .
Consecutive values within a transaction . . .
Consecutive values after server restart or other failures . . .
Reuse of values
In general, the "uniqueness" property is a non-issue, because identity columns are usually the primary key (or routinely declared at least unique), which does guarantee uniqueness.
The purpose of an identity column is to provide a numeric identifier that differs from the one on every other row, so it can be readily used as a primary key. There are no other guarantees, and for performance SQL Server takes many shortcuts that result in gaps.
If you want no gaps, then the simplest way is to assign a value when querying:
row_number() over (order by <identity column>)
That is not 100% satisfying, because deletions can affect the value. I also think that on parallel systems, inserts can as well (because identities might be cached on individual nodes).
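As a rough illustration of numbering at query time, here is a sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The `leads` table and its rows are invented, the gappy ids are inserted by hand to mimic an identity column that skipped values, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT)")
# Explicit ids with gaps, standing in for identity values that skipped numbers.
conn.executemany(
    "INSERT INTO leads (id, name) VALUES (?, ?)",
    [(2, "Ann"), (3, "Bob"), (1002, "Cy")],
)
# The stored key keeps its gaps; the gapless number is computed on read.
rows = conn.execute(
    "SELECT row_number() OVER (ORDER BY id) AS seq, id, name "
    "FROM leads ORDER BY id"
).fetchall()
conn.close()
print(rows)  # [(1, 2, 'Ann'), (2, 3, 'Bob'), (3, 1002, 'Cy')]
```

Note that `seq` is not stable: deleting Ann would renumber Bob to 1, which is the caveat mentioned above.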
If you do not care about performance, you can use a sequence for assigning a value. This is less performant than using an identity. Basically, it requires serializing all the inserts to guarantee the properties of the insert.
I should note that even with a sequence, a failed insert can still produce gaps, so it still might not do what you want.
I'm using a table Mail with auto-increment Id and Mail Address. The table is used in 4 other tables and it is mainly used to save storage (String is only saved once and not 4 times). I'm using INSERT OR IGNORE to just blindly add the mail addresses to the table and if it exists ignore the update. This approach is MUCH faster than checking the existence with SELECT ... and do an INSERT if needed.
For every INSERT OR IGNORE, the auto-increment id is incremented, no matter whether the insert is ignored or performed. In one run I have approx. 500k data sets to process, so after every run the last auto-increment key has increased by 500k. I know there are 2^63-1 possible keys, so it would take a long time to use them all up.
I also tried INSERT OR REPLACE, but this will increment the Id of the dataset on every run of the command, so this is not a solution at all.
Is there a way to prevent this increase of auto-increment key on every INSERT OR IGNORE?
Table Mail Example (replaced with pseudo Addresses)
mIdMail mMail
"1" ""
"7" "mail1#example.com"
"15" "mail2#example.com"
"17" "mail3#example.com"
"19" "mail4#example.com"
"23" "mail5#example.com"
...
Insert Query (Using Java Lib: org.apache.commons.dbutils)
INSERT OR IGNORE
INTO MAIL
( mMail )
VALUES ( ? );
Table Definition
CREATE TABLE IF NOT EXISTS MAIL (
mIdMail INTEGER PRIMARY KEY AUTOINCREMENT,
mMail CHAR(90) UNIQUE
);
To get autoincrementing values without gaps, drop the AUTOINCREMENT keyword. (Yes, you get autoincrementing values even without it.)
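A quick way to check this yourself is a small script with Python's built-in sqlite3. The table and column names follow the question; the addresses are made up. It runs the same mix of new and duplicate inserts against both table definitions and collects the resulting ids.

```python
import sqlite3

def ids_after_inserts(pk_clause):
    """Create the MAIL table with the given key definition and insert duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE MAIL (mIdMail {pk_clause}, mMail TEXT UNIQUE)")
    addresses = ["a@example.com", "a@example.com",
                 "b@example.com", "b@example.com", "c@example.com"]
    for addr in addresses:
        # Duplicates hit the UNIQUE constraint and are silently skipped.
        conn.execute("INSERT OR IGNORE INTO MAIL (mMail) VALUES (?)", (addr,))
    conn.commit()
    ids = [row[0] for row in conn.execute("SELECT mIdMail FROM MAIL ORDER BY mIdMail")]
    conn.close()
    return ids

plain_ids = ids_after_inserts("INTEGER PRIMARY KEY")
autoinc_ids = ids_after_inserts("INTEGER PRIMARY KEY AUTOINCREMENT")
print("without AUTOINCREMENT:", plain_ids)
print("with AUTOINCREMENT:   ", autoinc_ids)
```

Without AUTOINCREMENT the new id is the largest existing rowid plus one, so ignored inserts consume nothing. The trade-off is that the id of a deleted row with the highest key can be handed out again later.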
Auto-increment keys behave the way they do specifically because the database guarantees their behavior -- regardless of concurrent transactions and transaction failures.
Auto-increment keys have two guarantees:
They are increasing, so later inserts have larger values than earlier ones.
They are guaranteed to be unique.
The mechanism for allocating the keys does not guarantee no gaps. Why not? Because no-gaps would incur a lot more overhead on the database. Basically, each transaction on the table would need to be completely serialized (that is completed and committed) before the next one can take place. Generally, that is a really bad idea from a performance perspective.
Unfortunately, older versions of SQLite don't have the simplest solution, which is simply to call row_number() on the auto-incremented keys (window functions were added in SQLite 3.25). You could try to implement a gapless auto-increment using triggers, at the cost of significantly slowing down your application.
My real suggestion is simply to live with the gaps. Accept them. Surrender. That is how the built-in method works, and for good reason. Now design the rest of the database/application keeping this in mind.
I had the same issue, and changing "INSERT OR IGNORE" into "INSERT OR FAIL" solved the problem, so now when it fails the id value doesn't increment.
I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY.
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The stated answer is D. But I think the more suitable answer is B, because a sequence uses less space than a GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Among the choices provided, B is the correct answer, since it meets all requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables, it ensures uniqueness across all tables. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when not declared to cycle, guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However, I would probably choose a different option altogether: create a base table with a single identity column and link it in a 1:1 relationship with all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table, then uses SCOPE_IDENTITY() to get the new value and inserts it as the primary key of the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
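The base-table idea can be sketched with Python's built-in sqlite3. This is only an approximation of the T-SQL design: the two-step insert is done in application code rather than in an INSTEAD OF INSERT trigger, `lastrowid` plays the role of SCOPE_IDENTITY(), and the table names `product_base`, `books` and `toys` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_base (id INTEGER PRIMARY KEY);
    CREATE TABLE books (id INTEGER PRIMARY KEY REFERENCES product_base(id), title TEXT);
    CREATE TABLE toys  (id INTEGER PRIMARY KEY REFERENCES product_base(id), title TEXT);
""")

def add_product(table, title):
    """Allocate an id from the base table, then reuse it as the category table's key."""
    if table not in ("books", "toys"):  # table names cannot be bound as parameters
        raise ValueError(f"unknown category table: {table}")
    with conn:  # one transaction: both inserts commit or roll back together
        new_id = conn.execute("INSERT INTO product_base DEFAULT VALUES").lastrowid
        conn.execute(f"INSERT INTO {table} (id, title) VALUES (?, ?)", (new_id, title))
    return new_id

ids = [
    add_product("books", "SQL Basics"),
    add_product("toys", "Robot"),
    add_product("books", "Indexes"),
]
print(ids)  # [1, 2, 3]
```

Because every key is handed out by the single base table, no two category tables can receive the same value.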
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
The constraint #3 is why a SEQUENCE could run into issues as there is a higher risk of collision/lowered number of possible rows in each table.
Hi, I have a situation where I insert a unique value into the database along with the primary key,
generating it in Java code. This unique id contains a timestamp, e.g. 'BatchID16Jul1411111111', which extends to milliseconds. If two users hit the application at the same time, the same unique id is generated.
Is there any way to make this timestamp-based id unique even when it is generated at the same instant?
Is it possible by getting an auto-increment number from the DB?
Can anyone suggest a solution for this situation?
Thanks in advance
Mahesh
Yes, it is possible to get an auto-incremented number from the database. The exact syntax depends on the database. These typically use one of three methods:
An auto-increment declaration in the create table statement;
An identity declaration in the create table statement; or,
A sequence assigned as a default value to the primary key column.
Note, though, that the auto-incremented number will not have any meaning. So, you will need a separate column for the 'BatchId' and for the date time.
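As a minimal sketch of that layout, here is an example with Python's built-in sqlite3 standing in for whichever database is actually in use. The table name `batch`, its column names and the sample timestamp are assumptions for illustration. Two inserts carrying the same millisecond timestamp still receive different keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE batch (
        id          INTEGER PRIMARY KEY,  -- assigned by the database, no meaning of its own
        batch_label TEXT NOT NULL,        -- the 'BatchID' part
        created_at  TEXT NOT NULL         -- timestamp kept as plain data, not as a key
    )
""")

same_instant = "2014-07-16 11:11:11.111"  # hypothetical value shared by both requests
ids = []
for _ in range(2):  # two "users" arriving in the same millisecond
    cur = conn.execute(
        "INSERT INTO batch (batch_label, created_at) VALUES (?, ?)",
        ("BatchID", same_instant),
    )
    ids.append(cur.lastrowid)
conn.commit()
conn.close()
print(ids)  # [1, 2]
```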
I have a database used by several clients. I don't really want surrogate incremental key values to bleed between clients. I want the numbering to start from 1 and be client specific.
I'll use a two-part composite key of the tenant_id as well as an incremental id.
What is the best way to create an incremental key per tenant?
I am using SQL Server Azure. I'm concerned about locking tables, duplicate keys, etc. I'd typically set the primary key to IDENTITY and move on.
Thanks
Are you planning on using SQL Azure Federations in the future? If so, the current version of SQL Azure Federations does not support the use of IDENTITY as part of a clustered index. See the question "What alternatives exist to using guid as clustered index on tables in SQL Azure (Federations)" for more details.
If you haven't looked at Federations yet, you might want to check it out as it provides an interesting way to both shard the database and for tenant isolation within the database.
Depending upon your end goal, using Federations you might be able to use a GUID as the primary clustered index on the table and also use an incremental INT IDENTITY field on the table. This INT IDENTITY field could be shown to end-users. If you are federating on the TenantID each "Tenant table" effectively becomes a silo (as I understand it at least) so the use of IDENTITY on a field within that table would effectively be an ever increasing auto generated value which increments within a given Tenant.
When/if data is merged together (combining data from multiple Tenants) you would wind up with collisions on this INT IDENTITY field (hence why IDENTITY isn't supported as a primary key in federations), but as long as you aren't using this field as a unique identifier within the system at large you should be ok.
If you're looking to duplicate the convenience of having an automatically assigned unique INT key upon insert, you could add an INSTEAD OF INSERT trigger that uses MAX of the existing column +1 to determine the next value.
If the column with the identity value is the first key in an index, the MAX query will be a simple index seek, very efficient.
Transactions will ensure that unique values are assigned but this approach will have different locking semantics than the standard identity column. IIRC, SQL Server can allocate a different identity value for each transaction that requests it in parallel and if a transaction is rolled back, the value(s) allocated to it are discarded. The MAX approach would only allow one transaction to insert rows into the table at a time.
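Here is a rough sketch of the MAX + 1 idea using Python's built-in sqlite3 rather than a SQL Server trigger. The `orders` table and the `insert_order` helper are hypothetical. The key point is that reading the current maximum and inserting the row happen in one statement inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        tenant_id INTEGER NOT NULL,
        id        INTEGER NOT NULL,
        note      TEXT,
        PRIMARY KEY (tenant_id, id)  -- composite key; also backs the MAX lookup
    )
""")

def insert_order(tenant_id, note):
    """Insert a row whose id is MAX(id) + 1 within the tenant; return that id."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO orders (tenant_id, id, note)
            SELECT ?, COALESCE(MAX(id), 0) + 1, ?
            FROM orders WHERE tenant_id = ?
            """,
            (tenant_id, note, tenant_id),
        )
        return conn.execute(
            "SELECT MAX(id) FROM orders WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()[0]

ids = [insert_order(1, "a"), insert_order(1, "b"), insert_order(2, "c"), insert_order(1, "d")]
print(ids)  # [1, 2, 1, 3]
```

One side effect of this approach: if the row with the highest id for a tenant is deleted, the next insert reuses that number.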
A related approach could be to have a dedicated key value table keyed by the table name, tenant ID and current identity value. It would require the same INSTEAD OF INSERT trigger and more boilerplate to query and keep that key table updated. It wouldn't improve parallel operations though; the lock would just be on a different table's record.
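The dedicated key table could look roughly like this, again sketched with Python's built-in sqlite3 instead of a trigger. The `key_counter` table, its columns and the `next_id` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE key_counter (
        table_name TEXT    NOT NULL,
        tenant_id  INTEGER NOT NULL,
        current_id INTEGER NOT NULL,
        PRIMARY KEY (table_name, tenant_id)
    )
""")

def next_id(table_name, tenant_id):
    """Bump and return the counter for one (table, tenant) pair."""
    with conn:  # all three statements commit or roll back together
        # Create the counter row on first use; a no-op if it already exists.
        conn.execute(
            "INSERT OR IGNORE INTO key_counter (table_name, tenant_id, current_id) "
            "VALUES (?, ?, 0)",
            (table_name, tenant_id),
        )
        conn.execute(
            "UPDATE key_counter SET current_id = current_id + 1 "
            "WHERE table_name = ? AND tenant_id = ?",
            (table_name, tenant_id),
        )
        return conn.execute(
            "SELECT current_id FROM key_counter WHERE table_name = ? AND tenant_id = ?",
            (table_name, tenant_id),
        ).fetchone()[0]

ids = [next_id("orders", 1), next_id("orders", 1), next_id("orders", 2), next_id("invoices", 1)]
print(ids)  # [1, 2, 1, 1]
```

Unlike MAX + 1, the counter never moves backwards when rows are deleted, but writers for the same (table, tenant) pair still queue up on that one counter row.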
One possibility to fix the locking bottleneck would be to include the current SPID in the key's value (now the identity key is a combination of sequential int and whatever SPID happened to allocate it and not simply sequential), use the dedicated identity value table and insert records there per SPID as necessary; the identity table PK would be (table name, tenant, SPID) and have a non-key column with the current sequential value. That way, each SPID would have its own dynamically allocated identity pool and would only ever have its own SPID specific records locked.
Another downside is maintaining triggers that have to be updated whenever you change the columns in any of the special identity tables.