Sequence vs identity - sql

SQL Server 2012 introduced Sequence as a new feature, similar to the sequences in Oracle and Postgres. In which cases are sequences preferred over identity columns? And why do we need sequences?

I think you will find your answer here
Using the identity attribute for a column, you can easily generate
auto-incrementing numbers (which are often used as a primary key). With
Sequence, it will be a different object which you can attach to a
table column while inserting. Unlike identity, the next number for the
column value will be retrieved from memory rather than from the disk –
this makes Sequence significantly faster than Identity. We will see
this in coming examples.
And here:
Sequences: Sequences have been requested by the SQL Server community
for years, and it's included in this release. Sequence is a user
defined object that generates a sequence of numbers. Here is an
example using Sequence.
and here as well:
A SQL Server sequence object generates a sequence of numbers just like
an identity column in SQL tables. But the advantage of sequence
numbers is that the sequence object is not limited to a single SQL
table.
and on msdn you can also read more about usage and why we need it (here):
A sequence is a user-defined schema-bound object that generates a
sequence of numeric values according to the specification with which
the sequence was created. The sequence of numeric values is generated
in an ascending or descending order at a defined interval and may
cycle (repeat) as requested. Sequences, unlike identity columns, are
not associated with tables. An application refers to a sequence object
to receive its next value. The relationship between sequences and
tables is controlled by the application. User applications can
reference a sequence object and coordinate the values across
multiple rows and tables.
A sequence is created independently of the tables by using the CREATE
SEQUENCE statement. Options enable you to control the increment,
maximum and minimum values, starting point, automatic restarting
capability, and caching to improve performance. For information about
the options, see CREATE SEQUENCE.
Unlike identity column values, which are generated when rows are
inserted, an application can obtain the next sequence number before
inserting the row by calling the NEXT VALUE FOR function. The sequence
number is allocated when NEXT VALUE FOR is called even if the number
is never inserted into a table. The NEXT VALUE FOR function can be
used as the default value for a column in a table definition. Use
sp_sequence_get_range to get a range of multiple sequence numbers at
once.
A sequence can be defined as any integer data type. If the data type
is not specified, a sequence defaults to bigint.
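To make the quoted description concrete, here is a minimal T-SQL sketch of fetching a value before any row is inserted and of reserving a block with sp_sequence_get_range. The sequence name dbo.DemoSeq is made up for illustration, and GO is the SSMS/sqlcmd batch separator rather than part of T-SQL.

```sql
-- Hypothetical sequence; without "AS int" the type would default to bigint.
CREATE SEQUENCE dbo.DemoSeq
    AS int
    START WITH 1
    INCREMENT BY 1;
GO

-- Obtain the next value before inserting anything.
-- The number is consumed even if it is never stored in a table.
DECLARE @id int = NEXT VALUE FOR dbo.DemoSeq;
SELECT @id AS reserved_id;

-- Reserve a block of 10 values in a single call.
DECLARE @first sql_variant;
EXEC sys.sp_sequence_get_range
    @sequence_name     = N'dbo.DemoSeq',
    @range_size        = 10,
    @range_first_value = @first OUTPUT;

SELECT CONVERT(int, @first) AS first_value_in_range;
```

The application then owns the reserved range and decides which rows get which values.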

Sequence and identity are both used to generate auto numbers, but the major difference is that identity is tied to a table while a sequence is independent of any table.
If you need to maintain an auto number globally (across multiple tables), restart the numbering after a particular value, and cache values for performance, that is where you need a sequence and not identity.
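As a rough sketch of that scenario (all object names here are invented for illustration), one sequence can feed two tables, wrap around at a maximum, and cache values:

```sql
-- One shared number source: wraps back to 1 after 9999 and caches 50 values.
CREATE SEQUENCE dbo.GlobalNo
    AS int
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 9999
    CYCLE
    CACHE 50;
GO

-- Both tables draw their default from the same sequence.
-- No unique constraint on global_no: with CYCLE the values eventually repeat.
CREATE TABLE dbo.TableA (
    global_no int NOT NULL
        CONSTRAINT DF_TableA_global_no DEFAULT (NEXT VALUE FOR dbo.GlobalNo),
    note varchar(50) NOT NULL
);

CREATE TABLE dbo.TableB (
    global_no int NOT NULL
        CONSTRAINT DF_TableB_global_no DEFAULT (NEXT VALUE FOR dbo.GlobalNo),
    note varchar(50) NOT NULL
);
GO

INSERT INTO dbo.TableA (note) VALUES ('row in A');
INSERT INTO dbo.TableB (note) VALUES ('row in B');

-- The two rows receive different numbers from the shared sequence.
SELECT 'A' AS source_table, global_no, note FROM dbo.TableA
UNION ALL
SELECT 'B', global_no, note FROM dbo.TableB;
```

None of this is possible with identity alone, since an identity value belongs to exactly one table.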

Although sequences provide more flexibility than identity columns, I didn't find they had any performance benefits.
I found performance using identity was consistently 3x faster than using sequence for batch inserts.
I inserted approx 1.5M rows and performance was:
14 seconds for identity
45 seconds for sequence
I inserted the rows into a table that used the sequence object via a column default:
DEFAULT NEXT VALUE FOR <seq> FOR <col_name>
and also tried specifying the sequence value in the SELECT statement:
SELECT NEXT VALUE FOR <seq>, <other columns> FROM <table>
Both were slower than the identity method by the same factor. I used the default cache option for the sequence.
The article referenced in Arion's first link measures row-by-row inserts, where the difference between identity and sequence was 16.6 seconds to 14.3 seconds for 10,000 inserts.
The cache option has a big impact on performance, but identity is faster for higher volumes (1M+ rows).
See this link for an in-depth analysis, as per utly4life's comment.
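For anyone who wants to repeat the comparison, the two insert styles described above look roughly like this. Table and sequence names are placeholders, and CACHE 10000 is only a value to experiment with, not a recommendation; my test used the default cache.

```sql
-- Hypothetical sequence with an explicit cache size to experiment with.
CREATE SEQUENCE dbo.LoadSeq
    AS bigint
    START WITH 1
    INCREMENT BY 1
    CACHE 10000;
GO

CREATE TABLE dbo.SourceRows (payload varchar(50) NOT NULL);
CREATE TABLE dbo.TargetViaDefault (id bigint NOT NULL, payload varchar(50) NOT NULL);
CREATE TABLE dbo.TargetViaSelect  (id bigint NOT NULL, payload varchar(50) NOT NULL);

-- Method 1 setup: attach the sequence as a column default.
ALTER TABLE dbo.TargetViaDefault
    ADD CONSTRAINT DF_TargetViaDefault_id
    DEFAULT (NEXT VALUE FOR dbo.LoadSeq) FOR id;

INSERT INTO dbo.SourceRows (payload) VALUES ('a'), ('b'), ('c');
GO

-- Method 1: the default supplies the id for every inserted row.
INSERT INTO dbo.TargetViaDefault (payload)
SELECT payload FROM dbo.SourceRows;

-- Method 2: ask for the sequence value explicitly in the SELECT list.
INSERT INTO dbo.TargetViaSelect (id, payload)
SELECT NEXT VALUE FOR dbo.LoadSeq, payload
FROM dbo.SourceRows;
```

Timing both against an identity column on your own data volume is the only way to know which wins in your setup.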

I know this is a little old, but I wanted to add an observation that bit me.
I switched from identity to sequence to keep my indexes in order. I later found out that sequences don't transfer with replication. After I set up replication between two databases, I started getting key violations because the sequences were not in sync. Just something to watch out for before you make a decision.

I find the best use of sequences is not to replace an identity column but to create an "Order Number" type of field.
In other words, an Order Number is exposed to the end user and may have business rules attached to it. You want it to be unique, but just using an identity column isn't really correct either.
For example, different order types might require different sequences, so you might have one sequence for Internet orders and another for in-house orders.
In other words, don't think of a sequence as simply a replacement for identity; think of it as being useful in cases where an identity does not fit the business requirements.
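A small sketch of what I mean, with made-up names and starting values: the table keeps its identity as the internal key, and the user-facing order number comes from a sequence chosen by order type.

```sql
-- One sequence per order type; the start values are arbitrary examples.
CREATE SEQUENCE dbo.InternetOrderNo AS int START WITH 100000 INCREMENT BY 1;
CREATE SEQUENCE dbo.InHouseOrderNo  AS int START WITH 500000 INCREMENT BY 1;
GO

CREATE TABLE dbo.CustomerOrder (
    order_id     int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- internal key
    order_type   char(1) NOT NULL,                       -- 'I' = Internet, 'H' = in-house
    order_number int NOT NULL UNIQUE                     -- number shown to the user
);
GO

-- The application (or a stored procedure) picks the sequence by order type.
INSERT INTO dbo.CustomerOrder (order_type, order_number)
VALUES ('I', NEXT VALUE FOR dbo.InternetOrderNo);

INSERT INTO dbo.CustomerOrder (order_type, order_number)
VALUES ('H', NEXT VALUE FOR dbo.InHouseOrderNo);

SELECT order_id, order_type, order_number
FROM dbo.CustomerOrder
ORDER BY order_id;
```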

I was recently bitten by something worth considering for identity vs sequence. It seems Microsoft now suggests a sequence if you want to limit gaps in the generated values. We had huge gaps in our identity values, and the statement quoted below explains why: SQL Server had cached identity values, and after a reboot we lost those numbers.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-2017
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.
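For reference, this is roughly what the sequence variant looks like. Note that T-SQL spells the option NO CACHE (two words); the sequence name is hypothetical. A rolled-back transaction still leaves a gap, which is the one case the quoted text says remains.

```sql
-- Hypothetical sequence that never pre-allocates values in memory.
CREATE SEQUENCE dbo.NoCacheSeq
    AS int
    START WITH 1
    INCREMENT BY 1
    NO CACHE;
GO

-- A value taken outside any failed transaction is simply used.
SELECT NEXT VALUE FOR dbo.NoCacheSeq AS first_value;

-- A value taken inside a transaction is NOT returned on rollback.
BEGIN TRANSACTION;
    SELECT NEXT VALUE FOR dbo.NoCacheSeq AS value_lost_on_rollback;
ROLLBACK TRANSACTION;

-- The next call continues after the lost value, leaving a gap.
SELECT NEXT VALUE FOR dbo.NoCacheSeq AS value_after_rollback;
```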

Related

Using Identity or sequence in data warehouse

I'm new to data warehousing, so I try to follow best practice by mimicking some implementation details from the Microsoft demo database WideWorldImportersDW. One of the things I have noticed is that it uses a sequence, rather than identity, as the default value for primary keys.
Is it generally preferable to use a sequence over identity in a data warehouse, and which one is more convenient, especially during the ETL process?
A sequence has more guarantees than an identity column. In particular, each call to a sequence is guaranteed to produce the next value for the sequence.
However, an identity column can have gaps and other inconsistencies. This is all documented here.
Because of the additional guarantees on sequences, I suspect that they are slower. In particular, I suspect that the database cannot preallocate values in batches. That means that in multi-threaded environments, sequences would impose serialization on transactions, slowing things down.
In general, I see identity used for identifying columns in tables. And although there is probably a performance comparison, I haven't seen one. But I suspect that sequences are a wee bit slower in some circumstances.
Both Sequence and Identity are designed for OLTP tables, to enable efficient assignment of unique keys in a multi-session environment.
The important thing to realize is that in a data warehouse environment you often have a different setup, where only one job populates a specific table.
In a single-user environment you do not need the above features at all, and you can simply assign the keys manually, starting with max(id) + 1 and incrementing by one for each row.
The general rule in data warehousing is that you should not look for a silver-bullet recommendation but check the functionality and performance in your own test.
If you do some research on SQL Server Identity vs Sequence, e.g. here or here, you get mixed results, partly preferring the former and partly the latter feature.
My recommendation is therefore to run a test with manually assigned IDs (i.e. with no overhead), simply to get a baseline for your expectations.
Then repeat it with both identity and sequence, compare, and choose.
The sequence in SQL Server was added later and is based on the Oracle sequence, so I would not expect it to have any basic problem.
Experience from Oracle tells us that you need a large enough cache in the sequence to support efficient bulk inserts.
In the meantime, identity caching can also be switched on or off (the database-scoped configuration IDENTITY_CACHE = { ON | OFF }), so once again, try all three possibilities (sequence, identity, nothing) and choose the best one.
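The manual baseline could look something like the sketch below. Table names are invented, and it assumes exactly one job loads the table at a time; with concurrent writers two sessions could read the same maximum.

```sql
-- Hypothetical staging and dimension tables for the baseline test.
CREATE TABLE dbo.StageCustomer (customer_code varchar(20) NOT NULL);

CREATE TABLE dbo.DimCustomerManual (
    customer_key  bigint      NOT NULL PRIMARY KEY,
    customer_code varchar(20) NOT NULL
);
GO

INSERT INTO dbo.StageCustomer (customer_code)
VALUES ('C-001'), ('C-002'), ('C-003');

-- Read the current maximum once (0 when the table is empty).
DECLARE @base bigint =
    (SELECT ISNULL(MAX(customer_key), 0) FROM dbo.DimCustomerManual);

-- Assign max(id) + 1, + 2, ... to the incoming rows in a single statement.
INSERT INTO dbo.DimCustomerManual (customer_key, customer_code)
SELECT @base + ROW_NUMBER() OVER (ORDER BY s.customer_code),
       s.customer_code
FROM dbo.StageCustomer AS s;

SELECT customer_key, customer_code
FROM dbo.DimCustomerManual
ORDER BY customer_key;
```

Run the same load against an identity column and a sequence default, and compare the three timings.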
Identity is scoped to a single table, is part of the table definition (DDL) and is reset on a truncate. Identity is unique within the table. Each table has its own identity value when configured and cannot be shared across tables. In general usage, the "next" value is consumed by SQL Server when an Insert occurs on the table.+
Sequence is a first class object, scoped to the database. The "next" value is consumed when the Sequence is used (NEXT VALUE FOR).
Sequences are most effectively used when you need a person readable unique identifier stored across multiple tables. For example a ticketing system that stores ticket types in different tables may use a sequence to ensure no ticket receives the same number, regardless of the table in which it is stored, and that a person can reasonably refer to the number (not GUID).
In data warehousing, the dimension table needs a row identifier unique within the table. In general, the OLTP primary key is not sufficient as it may be duplicated within the dimension table depending on the type of dimension, and you don't want to risk assigning additional context to the OLTP PK as that can cause challenges when the source data changes. The dimension row identifier should only have meaning to the non-measure fact columns associated with it. Fact columns are not joined across different dimensions.++
Since the scope of the dimension table identifier is limited to the dimension table, an identity key is the ideal row identifier. It is simple to create, compact to store, and is meaningless outside the dimension. You won't use the dimension identity on a report. (Really, please don't be that developer.)
+ It's rare that you'll need to know the next value without assigning it to a row. It might be a red flag if you are trying to manipulate the identity value prior to assignment.
++ A dimension view may union different tables to feed the OLAP cube, in which case a persistent, repeatable key should be generated from the underlying data, usually by concatenating a string literal with each table key in a normalized format.
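As an illustration of the identity-keyed dimension described above (table and column names are made up), the same OLTP key can appear in several rows while the identity value stays unique within the table:

```sql
CREATE TABLE dbo.DimProduct (
    product_key  int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key, meaningless outside the dimension
    product_code varchar(20)   NOT NULL,                 -- OLTP key, may repeat across versions
    product_name nvarchar(100) NOT NULL,
    valid_from   date          NOT NULL
);
GO

-- Two versions of the same OLTP product; SQL Server assigns product_key.
INSERT INTO dbo.DimProduct (product_code, product_name, valid_from)
VALUES ('P-100', N'Widget',    '2020-01-01'),
       ('P-100', N'Widget v2', '2021-06-01');

SELECT product_key, product_code, product_name, valid_from
FROM dbo.DimProduct
ORDER BY product_key;
```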

SEQUENCE number on every INSERT in MS SQL 2012

I have a situation where multiple users insert values from an application into the database via a web service, using a stored procedure to validate and insert the records.
The requirement is to create a unique number for each entry, but strictly in sequence. I added an identity column, but it skips some numbers, e.g. 25, 26, 27, 29, 34...
Our requirement is to always generate strictly the next number, as for an invoice number, order number, or receipt number: 1, 2, 3, 4, 5...
I checked the link below about sequence numbers but am not sure whether it will resolve my issue. Can someone please assist with this?
Sequence Numbers
If you absolutely, positively cannot have gaps, then you need to use a trigger and your own logic. This puts a lot of overhead into inserts, but it is the only guarantee.
Basically, the parts of a database that protect the data get in the way of doing what you want. If a transaction uses a sequence number (or identity) and it is later rolled back, then what happens to the generated number? Well, that is one way that gaps appear.
Of course, you will have to figure out what to do in that case. I would just go for an identity column and work on educating users that gaps are possible. After all, if you don't want gaps on output, then row_number() is available to re-assign numbers.
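To give an idea of what "your own logic" can look like, here is one possible sketch. It uses a counter table updated inside the inserting stored procedure rather than a trigger, and all names are hypothetical. The counter row is locked until commit, so concurrent inserts queue up behind each other; that is the overhead mentioned above.

```sql
-- One row per counter; last_value is the last number handed out.
CREATE TABLE dbo.InvoiceCounter (
    counter_name varchar(30) NOT NULL PRIMARY KEY,
    last_value   int         NOT NULL
);
INSERT INTO dbo.InvoiceCounter (counter_name, last_value) VALUES ('invoice', 0);

CREATE TABLE dbo.Invoice (
    invoice_no int           NOT NULL PRIMARY KEY,
    customer   nvarchar(100) NOT NULL
);
GO

CREATE PROCEDURE dbo.AddInvoice
    @customer nvarchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

    DECLARE @next int;

    BEGIN TRANSACTION;
        -- Increment and read the counter in a single statement.
        UPDATE dbo.InvoiceCounter
        SET @next = last_value = last_value + 1
        WHERE counter_name = 'invoice';

        -- If this insert fails, the counter update is rolled back with it,
        -- so the number is not lost.
        INSERT INTO dbo.Invoice (invoice_no, customer)
        VALUES (@next, @customer);
    COMMIT TRANSACTION;
END;
GO

EXEC dbo.AddInvoice @customer = N'First customer';
EXEC dbo.AddInvoice @customer = N'Second customer';
```

Deleting an invoice later still leaves a hole, so the business rules have to cover that case as well.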

SQL Server: Return unique generated value for each row

The requirement is to return a unique value for each processed row from a stored procedure, to be used as a dummy primary key. One solution seems to be the ROW_NUMBER() function. Another one is given here. Perhaps there are also solutions involving a GUID. Can someone recommend a solution that is performant and reliable?
A random number would not be an option (as you specify unique).
ROW_NUMBER is a bigint taking up 8 bytes of storage per row.
uniqueidentifier is a 16 byte structure and more costly to obtain than a ROW_NUMBER which is simply incremented (SQL's unique identifiers are GUIDs, with NEWID() being slower than NEWSEQUENTIALID() because NEWSEQUENTIALID() will increment from a seed GUID).
In scenarios where you INSERT into a table variable or a temporary table, you can use IDENTITY. Storage size is that of the column data type, which can be tinyint, smallint, int, bigint, or decimal/numeric with a scale of 0 (not bit). It will increment from a configurable seed (default 1), in a configurable step (default 1).
This seems to make ROW_NUMBER your best fit, it is both fast and reliable.
I would recommend to base your design choice on more than just performance though. On any reasonably configured SQL server installation you will barely notice a difference in speed, and unless you have very constrained resources, storage should not be the bottleneck either. Some rather old benchmarks here.
Be aware that neither will help you maintain or guarantee a particular order of the returned rows. You still need an ORDER BY on your outermost SELECT if you need predictable results.
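A minimal sketch of the ROW_NUMBER approach, using a table variable with made-up data in place of the rows your procedure processes:

```sql
-- Stand-in for the processed rows.
DECLARE @Processed TABLE (name varchar(20) NOT NULL);

INSERT INTO @Processed (name)
VALUES ('pear'), ('apple'), ('fig');

-- ROW_NUMBER() numbers the rows 1, 2, 3, ... in the order given by OVER (...).
-- The outer ORDER BY is what fixes the order of the returned rows.
SELECT ROW_NUMBER() OVER (ORDER BY name) AS row_id,
       name
FROM @Processed
ORDER BY row_id;
```

The generated values are only unique within a single result set; they are recomputed on every call.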

Postgresql wrong auto-increment for serial

I have a problem with PostgreSQL: either there is a bug in PostgreSQL or I implemented something incorrectly.
There is a table including colmn1(primary key), colmn2(unique), colmn3, ...
After inserting a row, if I try another insertion with an existing colmn2 value, I get a duplicate value error, as expected. But after this unsuccessful attempt, colmn1's next value is incremented by 1 although nothing was inserted, so I end up with rows whose ids look like 1, 2, 4, 6, 9 (3, 5, 7 and 8 went to unsuccessful attempts).
I need help from the ones who can explain this weird behaviour.
This information may be useful: I used "create unique index on tableName (lower(column1)) " query to set unique constraint.
See the PostgreSQL sequence FAQ:
Sequences are intended for generating unique identifiers — not
necessarily identifiers that are strictly sequential. If two
concurrent database clients both attempt to get a value from a
sequence (using nextval()), each client will get a different sequence
value. If one of those clients subsequently aborts their transaction,
the sequence value that was generated for that client will be unused,
creating a gap in the sequence.
This can't easily be fixed without incurring a significant performance
penalty. For more information, see Elein Mustain's "Gapless Sequences for Primary Keys" in the General Bits Newsletter.
From the manual:
Important: Because sequences are non-transactional, changes made by
setval are not undone if the transaction rolls back.
In other words, it's normal to have gaps. If you don't want gaps, don't use a sequence.
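The behaviour from the question can be reproduced with a few statements (PostgreSQL; the table name demo is made up). Run them one at a time outside an explicit transaction block, since the second insert is expected to fail:

```sql
-- serial creates a sequence and uses nextval() as the column default.
CREATE TABLE demo (
    id   serial PRIMARY KEY,
    name text   UNIQUE
);

INSERT INTO demo (name) VALUES ('a');  -- succeeds and takes a sequence value
INSERT INTO demo (name) VALUES ('a');  -- fails on the unique constraint,
                                       -- but nextval() has already been called
INSERT INTO demo (name) VALUES ('b');  -- succeeds with the value after the lost one

-- The ids of the two stored rows are not consecutive.
SELECT id, name FROM demo ORDER BY id;
```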

SQLPlus Sequence - multiple tables

I am trying to use Dennis' solution here as an implementation of auto_increment in Oracle database. Say I create one sequence as follows:
CREATE SEQUENCE auto_increment
START WITH 1
INCREMENT BY 1;
If I want auto_increment behavior in multiple tables, can I just use this sequence for all tables? Or do I need a separate sequence per table? That is, will the sequence increment for one table be affected by another table using the sequence?
Yes, accesses to the sequence will affect each other if you use the same sequence. However, the tone of your question makes me think that you expect the sequence to be continuous.
Don't be fooled, sequences are NOT sequential. The only thing you are guaranteed is that the numbers retrieved are unique and in ascending order (in your case).
You can use the same sequence for many tables. It would be unconventional to do so, it would lead to more contention on the sequence, and it would make life a bit more difficult if you needed to reset the sequence value as a result of, say, an export and import between environments but it would work.
Of course, if the sequence gave a value of 1 for table A, it would never give that same value to a trigger defined on B. Since sequences do not generate gap-free sets of values (i.e. you can guarantee that there will be "missing" values in every table no matter how many sequences you create) that shouldn't be a major downside.
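A short Oracle sketch of sharing the sequence from the question across two tables (the table names are invented). Each NEXTVAL call advances the one shared counter, so neither table ends up with consecutive ids:

```sql
CREATE SEQUENCE auto_increment
START WITH 1
INCREMENT BY 1;

CREATE TABLE table_a (id NUMBER PRIMARY KEY, note VARCHAR2(50));
CREATE TABLE table_b (id NUMBER PRIMARY KEY, note VARCHAR2(50));

-- Every insert draws from the same sequence, whichever table it targets.
INSERT INTO table_a (id, note) VALUES (auto_increment.NEXTVAL, 'first row in A');
INSERT INTO table_b (id, note) VALUES (auto_increment.NEXTVAL, 'first row in B');
INSERT INTO table_a (id, note) VALUES (auto_increment.NEXTVAL, 'second row in A');

-- The ids are unique across both tables but not consecutive within either.
SELECT 'A' AS source_table, id, note FROM table_a
UNION ALL
SELECT 'B', id, note FROM table_b
ORDER BY id;
```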
Sequences are sequential. However, many things can cause gaps in the values, e.g. a rollback (the sequence generator issues numbers irrespective of commits or rollbacks) or using the same sequence for multiple tables.