When should I use CYCLE in a sequence? - sql

I'm using sequences in a PostgreSQL database to insert rows into tables.
When creating the sequences I have never used the CYCLE option on them. I mean they can generate pretty big numbers (on the order of 2^63, as far as I remember) and I don't really see why I would want a sequence to go back to zero. So my question is:
When should I use CYCLE while creating a sequence?
Do you have an example where it makes sense?

A sequence can use CYCLE for purposes other than primary key generation, namely in scenarios where uniqueness of its values is not required. Quite the opposite: the values are expected to cycle back and repeat themselves after some time.
For example:
When generating numbers that must return to the initial value and repeat themselves at some point, for any reason (e.g. implementing a "Bingo" game).
When the sequence is a temporary identifier that will last for a short period of time and will be unique during its life.
When the field is small -- or can accept a limited number of values -- and it doesn't matter if they repeat themselves.
When there is another field in the entity that will identify it, and the sequence value is used for something else.
When an entity has a composite unique key and the sequence value is only a part of it.
When using the sequence value to generate uniform distribution of values on a big set, though this is hardly a random assignation of values.
Any other cyclic number generation.
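To make the wrap-around concrete, here is a minimal Python model of what a sequence with CYCLE does. It is only a sketch: `cycling_sequence` is a hypothetical helper, not a PostgreSQL API, and it assumes a positive increment, in which case the value wraps to the minimum after passing the maximum.

```python
from itertools import islice

def cycling_sequence(minvalue, maxvalue, increment=1):
    """Model of an ascending sequence with CYCLE: wrap to minvalue after maxvalue."""
    value = minvalue
    while True:
        yield value
        value += increment
        if value > maxvalue:
            value = minvalue  # CYCLE: start over instead of raising an error

# Ask a 1..5 sequence for 12 values; it repeats after the fifth.
values = list(islice(cycling_sequence(1, 5), 12))
print(values)
```

Without CYCLE the real sequence would raise an error at the point where this model wraps, which is why cycling only makes sense when repeated values are acceptable.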

Related

Using Identity or sequence in data warehouse

I'm new to data warehousing, so I try to follow best practice by mimicking some implementation details from the Microsoft demo DB WideWorldImportersDW. One of the things I have noticed is that it uses a Sequence as the default value for the PK rather than Identity.
Is it preferable to use Sequence over Identity in a data warehouse in general, and which one is more convenient, especially during the ETL process?
A sequence has more guarantees than an identity column. In particular, each call to a sequence is guaranteed to produce the next value for the sequence.
However, an identity column can have gaps and other inconsistencies. This is all documented here.
Because of the additional guarantees on sequences, I suspect that they are slower. In particular, I suspect that the database cannot preallocate values in batches. That means that in multi-threaded environments, sequences would impose serialization on transactions, slowing things down.
In general, I see identity used for identifying columns in tables. And although there is probably a performance comparison, I haven't seen one. But I suspect that sequences are a wee bit slower in some circumstances.
Both Sequence and Identity are designed for OLTP tables to enable effective assignment of unique keys in multi-session environment.
Important thing to realize is that in data warehouse environment you often have a different setup and there is only one job that populates a specific table.
In a single-user environment you do not need the above features at all; you can simply assign the keys manually, starting with max(id) + 1 and incrementing by one for each row.
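A rough sketch of that manual assignment, using Python's built-in `sqlite3` only so the example runs anywhere. The table `dim_customer` and the helper `load_batch` are made up for illustration, and the approach is only safe under the stated assumption that a single job loads the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

def load_batch(conn, names):
    """Assign keys as max(id) + 1, + 2, ... Assumes no concurrent loader."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM dim_customer"
    ).fetchone()
    rows = [(max_id + offset, name) for offset, name in enumerate(names, start=1)]
    conn.executemany("INSERT INTO dim_customer (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return [row_id for row_id, _ in rows]

new_ids = load_batch(conn, ["Cy", "Di", "Ed"])
print(new_ids)
```

With two loaders running at once, both could read the same max(id) and collide, which is exactly the problem identity and sequences solve in OLTP.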
The general rule of data warehousing is that you should not search for a silver-bullet recommendation but check the functionality and performance in your own test.
If you do some research on SQL Server Identity vs Sequence, e.g. here or here, you get various results, partly preferring the former, partly the latter feature.
My recommendation is therefore to perform a test with manually assigned IDs (i.e. with no overhead) simply to get a baseline for the expectation.
Then repeat it with both identity and sequence, compare, and choose.
The sequence in SQL Server was added later and is based on Oracle Sequence, so I would not expect it has some basic problem.
The experience from Oracle tells us you need a large enough cache in the sequence to support effective bulk inserts.
Meanwhile, identity can also be defined as cached (IDENTITY_CACHE = { ON | OFF }), so once again, try all three possibilities (sequence, identity, nothing) and choose the best one.
Identity is scoped to a single table, is part of the table definition (DDL) and is reset on a truncate. Identity is unique within the table. Each table has its own identity value when configured and cannot be shared across tables. In general usage, the "next" value is consumed by SQL Server when an Insert occurs on the table.+
Sequence is a first class object, scoped to the database. The "next" value is consumed when the Sequence is used (NEXT VALUE FOR).
Sequences are most effectively used when you need a person readable unique identifier stored across multiple tables. For example a ticketing system that stores ticket types in different tables may use a sequence to ensure no ticket receives the same number, regardless of the table in which it is stored, and that a person can reasonably refer to the number (not GUID).
In data warehousing, the dimension table needs a row identifier unique within the table. In general, the OLTP primary key is not sufficient as it may be duplicated within the dimension table depending on the type of dimension, and you don't want to risk assigning additional context to the OLTP PK as that can cause challenges when the source data changes. The dimension row identifier should only have meaning to the non-measure fact columns associated with it. Fact columns are not joined across different dimensions.++
Since the scope of the dimension table identifier is limited to the dimension table, an identity key is the ideal row identifier. It is simple to create, compact to store, and is meaningless outside the dimension. You won't use the dimension identity on a report. (Really, please don't be that developer.)
+ It's rare that you'll need to know the next value without needing to assign it to a row. It might be a red flag if you are trying to manipulate the identity value prior to assignment.
++ A dimension view may union different tables to feed the OLAP cube, in which case a persistent, repeatable key should be generated from the underlying data, usually by concatenating a string literal with each table key in a normalized format.

DB2 v10 zos : identify free index values

My organisation has hundreds of DB2 tables that each have a randomly generated unique integer index. The random values are generated by either COBOL CICS mainframe programs or Java distributed applications. The normal approach taken is to randomly generate an integer value (only positive values are employed), then attempt to insert the data row, retrying when a duplicate index value has already been persisted. I would like to improve the performance of this approach and I'm considering trying to identify integer values that have not been generated and persisted to each table, this would mean we don't ever need to retry. We would know our insert would work. Does db2 have a function that can return unused index values?
The short answer is no.
The slightly longer answer is to point out that, if such a function existed, in your case on the first insert into one of your tables the size of the result set it would return would be 2,147,483,647 (positive) integers. At 4 bytes each, that would be 8,589,934,588 bytes.
Given the constraints of your existing system, what you're doing is probably the best that can be done. If the performance of retrying is unacceptable, I'm afraid redesigning your key scheme is the next step.
I think that's a question to ask: Is this scheme of using random numbers for unique keys causing a performance problem? As the tables fill up the key space you will see more and more retries, but you have a relatively large key space. If you're seeing large numbers of retries maybe your random numbers are less random than you'd like.
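To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes keys are drawn uniformly and independently from the positive 32-bit integers, so the number of attempts per insert follows a geometric distribution with mean 1 / (free fraction of the key space). `expected_attempts` is a name made up for this example.

```python
KEY_SPACE = 2**31 - 1  # number of positive 32-bit integers

def expected_attempts(rows_in_table, key_space=KEY_SPACE):
    """Mean number of insert attempts when keys are uniform random draws."""
    free_fraction = 1 - rows_in_table / key_space
    return 1 / free_fraction

# Size of a hypothetical "all unused keys" result set for an empty table.
print(KEY_SPACE * 4, "bytes")

for rows in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows -> {expected_attempts(rows):.4f} attempts per insert")
```

Under these assumptions retries only become noticeable once a table holds a sizeable fraction of the key space, so a high retry rate on a small table points at the random number generator rather than at the scheme itself.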
Just a thought, but you could use one sequence for a group of tables. That way, the values in any one table will still look random (because you wouldn't know which table the next insert goes to) but are based on a specific sequence, which means that most of the time you won't get a retry because the number keeps ascending. That same sequence can loop after a few hundred million inserts and start to "fill in the blanks".
As far as other key ideas are concerned, you could also try a different key, maybe one based on a timestamp or ROWID. That will still be random but not repetitive.

Postgresql wrong auto-increment for serial

I have a problem with PostgreSQL: either there is a bug in PostgreSQL or I implemented something wrongly.
There is a table including colmn1(primary key), colmn2(unique), colmn3, ...
After inserting a row, if I try another insertion with an existing colmn2 value, I get a duplicate value error as expected. But after this unsuccessful try, colmn1's next value is incremented by 1 although nothing was inserted, so I am getting rows with id sequences like 1,2,4,6,9 (3,5,6,7,8 go to unsuccessful trials).
I need help from anyone who can explain this weird behaviour.
This information may be useful: I used "create unique index on tableName (lower(column1)) " query to set unique constraint.
See the PostgreSQL sequence FAQ:
Sequences are intended for generating unique identifiers — not
necessarily identifiers that are strictly sequential. If two
concurrent database clients both attempt to get a value from a
sequence (using nextval()), each client will get a different sequence
value. If one of those clients subsequently aborts their transaction,
the sequence value that was generated for that client will be unused,
creating a gap in the sequence.
This can't easily be fixed without incurring a significant performance
penalty. For more information, see Elein Mustain's "Gapless Sequences for Primary Keys" in the General Bits Newsletter.
From the manual:
Important: Because sequences are non-transactional, changes made by
setval are not undone if the transaction rolls back.
In other words, it's normal to have gaps. If you don't want gaps, don't use a sequence.
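The behaviour from the question can be reproduced with a toy model. This is a Python sketch, not PostgreSQL code: `Sequence` and `insert` are hypothetical stand-ins, and the key assumption is that the id default is taken from the sequence before the unique constraint is checked and is never handed back when the insert fails.

```python
class Sequence:
    """Non-transactional counter: values handed out are never returned."""
    def __init__(self):
        self._last = 0

    def nextval(self):
        self._last += 1
        return self._last

def insert(table, seq, unique_value):
    new_id = seq.nextval()  # consumed even if the insert fails below
    if unique_value in table.values():
        raise ValueError(f"duplicate value: {unique_value!r}")
    table[new_id] = unique_value

seq, table = Sequence(), {}
for value in ["a", "b", "a", "c", "c", "d"]:
    try:
        insert(table, seq, value)
    except ValueError:
        pass  # the failed attempt still used up one sequence value

print(sorted(table))
```

The two failed attempts each burn one value, so the surviving ids skip those numbers, just like the 1,2,4,6,... pattern described in the question.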

SQLPlus Sequence - multiple tables

I am trying to use Dennis' solution here as an implementation of auto_increment in Oracle database. Say I create one sequence as follows:
CREATE SEQUENCE auto_increment
START WITH 1
INCREMENT BY 1;
If I want auto_increment behavior in multiple tables, can I just use this sequence for all tables? Or do I need a separate sequence per table? That is, will the sequence increment for one table be affected by another table using the sequence?
Yes, the sequence accesses will be affecting each other if you use the same sequence. However the tone of your question makes me think that you expect the sequence to be continuous.
Don't be fooled: sequences are NOT sequential. The only thing you are guaranteed is that the numbers retrieved are unique and in ascending order (in your case).
You can use the same sequence for many tables. It would be unconventional to do so, it would lead to more contention on the sequence, and it would make life a bit more difficult if you needed to reset the sequence value as a result of, say, an export and import between environments but it would work.
Of course, if the sequence gave a value of 1 for table A, it would never give that same value to a trigger defined on B. Since sequences do not generate gap-free sets of values (i.e. you can guarantee that there will be "missing" values in every table no matter how many sequences you create) that shouldn't be a major downside.
Sequences are sequential. However, there are many things that can cause gaps in the sequence, e.g. rollback, commit (because the sequence generator issues values irrespective of commits or rollbacks), and using the same sequence for multiple tables.
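A quick way to see what sharing one sequence does is to model it with a plain counter. This is only an illustrative Python sketch; the two lists stand in for tables A and B, and the order of inserts is made up.

```python
from itertools import count

shared_sequence = count(start=1)  # stands in for the single shared sequence
table_a, table_b = [], []

# Inserts arrive in an arbitrary interleaving across the two tables.
for target in ("a", "b", "b", "a", "b"):
    new_id = next(shared_sequence)
    (table_a if target == "a" else table_b).append(new_id)

print("A:", table_a)
print("B:", table_b)
```

Each table sees ascending but non-consecutive ids, and no id appears in both tables, which matches the answers above: it works, as long as nothing depends on gap-free numbering.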

Sequence vs identity

SQL Server 2012 introduced Sequence as a new feature, the same as in Oracle and Postgres. Where are sequences preferred over identities? And why do we need sequences?
I think you will find your answer here
Using the identity attribute for a column, you can easily generate
auto-incrementing numbers (which as often used as a primary key). With
Sequence, it will be a different object which you can attach to a
table column while inserting. Unlike identity, the next number for the
column value will be retrieved from memory rather than from the disk –
this makes Sequence significantly faster than Identity. We will see
this in coming examples.
And here:
Sequences: Sequences have been requested by the SQL Server community
for years, and it's included in this release. Sequence is a user
defined object that generates a sequence of a number. Here is an
example using Sequence.
and here as well:
A SQL Server sequence object generates sequence of numbers just like
an identity column in sql tables. But the advantage of sequence
numbers is the sequence number object is not limited with single sql
table.
and on msdn you can also read more about usage and why we need it (here):
A sequence is a user-defined schema-bound object that generates a
sequence of numeric values according to the specification with which
the sequence was created. The sequence of numeric values is generated
in an ascending or descending order at a defined interval and may
cycle (repeat) as requested. Sequences, unlike identity columns, are
not associated with tables. An application refers to a sequence object
to receive its next value. The relationship between sequences and
tables is controlled by the application. User applications can
reference a sequence object and coordinate the values keys across
multiple rows and tables.
A sequence is created independently of the tables by using the CREATE
SEQUENCE statement. Options enable you to control the increment,
maximum and minimum values, starting point, automatic restarting
capability, and caching to improve performance. For information about
the options, see CREATE SEQUENCE.
Unlike identity column values, which are generated when rows are
inserted, an application can obtain the next sequence number before
inserting the row by calling the NEXT VALUE FOR function. The sequence
number is allocated when NEXT VALUE FOR is called even if the number
is never inserted into a table. The NEXT VALUE FOR function can be
used as the default value for a column in a table definition. Use
sp_sequence_get_range to get a range of multiple sequence numbers at
once.
A sequence can be defined as any integer data type. If the data type
is not specified, a sequence defaults to bigint.
Sequence and identity are both used to generate auto numbers, but the major difference is that identity is table-dependent while a sequence is independent of any table.
If you have a scenario where you need to maintain an auto number globally (across multiple tables), need to restart your interval after a particular number, and need to cache it for performance, that is where you need a sequence and not identity.
Although sequences provide more flexibility than identity columns, I didn't find they had any performance benefits.
I found performance using identity was consistently 3x faster than using sequence for batch inserts.
I inserted approx 1.5M rows and performance was:
14 seconds for identity
45 seconds for sequence
I inserted the rows into a table which used sequence object via a table default:
NEXT VALUE for <seq> for <col_name>
and also tried specifying sequence value in select statement:
SELECT NEXT VALUE for <seq>, <other columns> from <table>
Both were the same factor slower than the identity method. I used the default cache option for the sequence.
The article referenced in Arion's first link shows performance for row-by-row insert and difference between identity and sequence was 16.6 seconds to 14.3 seconds for 10,000 inserts.
The Caching option has a big impact on performance, but identity is faster for higher volumes (+1M rows)
See this link for an in-depth analysis, as per utly4life's comment.
I know this is a little old, but wanted to add an observation that bit me.
I switched from identity to sequence to have my indexes in order. I later found out that sequences don't transfer with replication. I started getting key violations after I set up replication between two databases, since the sequences were not in sync. Just something to watch out for before you make a decision.
I find the best use of Sequences is not to replace an identity column but to create an "Order Number" type of field.
In other words, an Order Number is exposed to the end user and may have business rules along with it. You want it to be unique, but just using an Identity Column isn't really correct either.
For example, different order types might require a different sequence, so you might have a sequence for Internet Order, as opposed to In-house orders.
In other words, don't think of a Sequence as simply a replacement for identity; think of it as being useful in cases where an identity does not fit the business requirements.
I was recently bitten by something to consider for identity vs sequence. It seems MSFT now suggests a sequence if you want key values without gaps. We had an issue with huge gaps in the identity values, and the statement quoted below would explain it: SQL Server cached the identity values, and after a reboot we lost those numbers.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-2017
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.
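The effect of caching on gaps can be illustrated with a deliberately simplified model. This Python sketch is not how SQL Server is implemented: `CachedGenerator` is hypothetical, and it assumes only that a whole block of values is reserved durably at once and that the unused rest of the block is discarded on restart.

```python
class CachedGenerator:
    """Toy key generator that durably reserves values in blocks of cache_size."""
    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.persisted_high = 0  # the only state that survives a restart
        self.next_value = 1

    def next(self):
        if self.next_value > self.persisted_high:
            self.persisted_high += self.cache_size  # reserve a new block
        value = self.next_value
        self.next_value += 1
        return value

    def restart(self):
        # In-memory position is lost; resume after the last reserved block.
        self.next_value = self.persisted_high + 1

def values_around_restart(cache_size):
    gen = CachedGenerator(cache_size)
    before = [gen.next() for _ in range(3)]
    gen.restart()
    return before + [gen.next()]

cached = values_around_restart(cache_size=10)
uncached = values_around_restart(cache_size=1)
print("cache of 10:", cached)
print("no cache:   ", uncached)
```

In this model a cache of 10 loses the rest of the reserved block on restart, while a cache of 1 (the NOCACHE-like case) continues without a gap, at the cost of a durable write per value.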