How do I generate a unique increasing integer value in SQL Server? - sql

I have the following problem. A SQL table stores job items, and each item has a state. Items can change state and other properties, either together or separately. I want to assign each job item a unique increasing integer (64 or more bits) that is only assigned when I ask for it, not updated on every update the way a timestamp column is. So I want to be able to do two operations:
update some other fields without changing that integer on the row, and
update the state and that integer,
and when the integer changes it should be greater than all other such integers ever generated for that column (per-table or per-database will of course do as well).
This needs to be scalable so that multiple clients can work with the database without serious penalties.
How can I do that?

Look here: http://blogs.msdn.com/b/sqlazure/archive/2010/07/15/10038656.aspx (it should help).
But in a nutshell, you need a field declared as follows in your table:
Id bigint PRIMARY KEY IDENTITY (1,1)

Sounds like you need a SEQUENCE.
This is decoupled from your table (unlike an IDENTITY) and can be table or database wide: up to you.
Sequences are supported natively in the forthcoming SQL Server 2012, but until then you can emulate one as per this dba.se question: https://dba.stackexchange.com/q/3307/630

Related

Creating my Custom Unique Key

I have a table in my SQL Server. Currently I am using the identity column to uniquely identify each record, but my changing needs require a unique key generated in a certain format (as specified by my client). I have tried to generate the unique key from my application by appending a unique integer (incremented on every insert) to the specified format, but my client is not satisfied with my current solution.
It would be great if I could be directed to a better technique to solve my problem rather than my current solution.
The format is like:
PRN-YEAR-MyAppGeneratedInt
Basically, keep the current identity column. That is the best way for you to identify and manage rows in the table.
If the client needs another unique key, then add it. Presumably, it will be a string (given that it has a "format"). You can possibly create the key as a generated column. Alternatively, you may need to use a trigger to calculate it.
In general, integers are better for identity columns, even if end users never see them. Here are some advantages:
They encode the ordering of row insertion in the database. You can, for instance, get the last inserted row.
They are more efficient for foreign key references (because numbers are fixed-length and generally shorter than strings).
They make it possible to directly address a row, when data needs to be fixed.
You can create a SEQUENCE to serve your purpose; sequences were introduced in SQL Server 2012. A detailed explanation of SEQUENCE can be found here.
Hope this helps :)
Since you specified the format in the comments, let me also give you an example of how you can solve your problem using a sequence:
First create a sequence like:
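CREATE SEQUENCE SeqName
    AS int
    START WITH 1
    INCREMENT BY 1
    -- CYCLE restarts at the minimum value after the maximum is reached, so values can repeat;
    -- use NO CYCLE if the generated keys must never repeat.
    CYCLE
    -- CACHE without a size lets SQL Server choose the cache size.
    CACHE;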
Next, you can use this sequence to generate your desired unique key in your application:
Get the next value from the sequence: SELECT NEXT VALUE FOR SeqName;
Build a string from that value, for example: String key = "PRN-" + year + "-" + seqValue;
Finally, store this string as your unique key in your INSERT statement.
You can write the application code as you need :)
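To illustrate the string-building step, here is a minimal Python sketch. It assumes the sequence value has already been fetched from the database (for example with SELECT NEXT VALUE FOR SeqName); the function name make_key and its parameters are made up for illustration, and the PRN-YEAR-number layout is taken from the question.

```python
from datetime import date
from typing import Optional


def make_key(seq_value: int, year: Optional[int] = None, prefix: str = "PRN") -> str:
    """Build a PREFIX-YEAR-NUMBER key from an already-fetched sequence value."""
    if seq_value < 0:
        raise ValueError("sequence value must not be negative")
    if year is None:
        # Assumption: the key uses the current calendar year.
        year = date.today().year
    return f"{prefix}-{year}-{seq_value}"
```

For example, make_key(42, 2024) returns "PRN-2024-42". Whether the number part should be zero-padded is up to the client's format.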
You could create a Computed Column and just append the identity
('Custom_'+CONVERT(varchar(10),iden))

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The said answer is D. But, I think the more suitable answer is B. Because sequence will use less space than GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID
previously generated by this function on a specified computer since
Windows was started. After restarting Windows, the GUID can start
again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Of the choices provided, B is the correct answer, since it meets all the requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables, it ensures uniqueness across all tables. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when not declared with CYCLE, guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However,
I would actually probably choose a different option altogether: create a base table with a single identity column and link it in a 1:1 relationship with all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table, then uses SCOPE_IDENTITY() to get the value and inserts it as the primary key of the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
The constraint #3 is why a SEQUENCE could run into issues as there is a higher risk of collision/lowered number of possible rows in each table.

Storing many bits -- Should I use multiple columns or a single bitfield column?

I am designing a User table in my database. I have about 30 or so options for each user that can be either "allow" or "disallow".
My question is should I store these as 30 bit columns or should I use a single int column to store them and parse out each bit in my application?
Also, our database is SQL Server 2008 and 2005 (depending on environment)
I just tried creating two tables, one with a single int column and one with 30 bit columns then added a row to each and looked at them with SQL Server Internals Viewer
CREATE TABLE T_INT(X INT DEFAULT 1073741823);
CREATE TABLE T_BIT(
X1 BIT DEFAULT 1,
/*Other columns omitted for brevity*/
X30 BIT DEFAULT 1
);
INSERT INTO T_INT DEFAULT VALUES;
INSERT INTO T_BIT DEFAULT VALUES;
Single row for table with 30 Bit Columns
Single row for table with one int Column
From a storage point of view SQL Server combines the bit columns and the data is stored in exactly the same amount of space (yellow). You do end up losing 3 bytes a row for the NULL bitmap (purple) though as the length of this is directly proportional to the number of columns (irrespective of whether they allow nulls)
Key for fields (for the int version, colour coding is the same for the bit version)
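The numbers above can be checked with a little arithmetic. This is only a sketch in Python, assuming the usual row-format rules: bit columns are packed eight to a byte, and the NULL bitmap takes one bit per column rounded up to whole bytes. The function names are made up for illustration.

```python
import math


def bit_storage_bytes(bit_columns: int) -> int:
    """Bytes needed for packed bit columns (eight bit columns share one byte)."""
    return math.ceil(bit_columns / 8)


def null_bitmap_bytes(total_columns: int) -> int:
    """Bytes needed for the NULL bitmap: one bit per column, rounded up."""
    return math.ceil(total_columns / 8)


# The default used for T_INT is simply "all 30 low bits set".
ALL_30_FLAGS_SET = (1 << 30) - 1

# Extra NULL bitmap bytes per row for 30 columns compared with 1 column.
EXTRA_BITMAP_BYTES = null_bitmap_bytes(30) - null_bitmap_bytes(1)
```

Under these assumptions the 30 bit columns take 4 bytes, the same as one int, and the NULL bitmap grows from 1 byte to 4 bytes, which is where the 3 extra bytes per row come from.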
Neither -- unless you have a major space issue or compatibility requirement with some other system, think about how this will prevent you from optimizing your queries and clearly understanding what each bit represents.
You can have more than a thousand columns in a table, or you can have a child table for user settings. Why limit yourself to 30 bits that you need to parse in your app? Imagine what kind of changes you'll need to make to the app if several of these settings are deprecated or a couple of new ones introduced.
I think it would be easier to allow for future expansion if you have columns for each value. If you add another option in the future (which is likely for most applications like this), then it may affect all your other code since you would need to reparse your int column to account for the new bits.
If you combine into a bitflag field, it's going to be difficult to see what is set if you're looking at the raw data. I'd go with individual columns for each value, or store the options in their own table.
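For comparison, this is roughly what the "parse each bit in the application" route looks like. It is a sketch in Python with three made-up option names; the real table would have about 30 of them, and every bit position has to stay fixed forever once rows have been stored.

```python
from enum import IntFlag


class UserOption(IntFlag):
    # Hypothetical options; each one owns a fixed bit position.
    CAN_POST = 1 << 0
    CAN_EDIT = 1 << 1
    CAN_DELETE = 1 << 2


def is_allowed(stored: int, option: UserOption) -> bool:
    """Check whether the bit for `option` is set in the stored int column."""
    return bool(stored & int(option))


def allow(stored: int, option: UserOption) -> int:
    """Return the stored value with the bit for `option` set."""
    return stored | int(option)


def disallow(stored: int, option: UserOption) -> int:
    """Return the stored value with the bit for `option` cleared."""
    return stored & ~int(option)
```

A raw value such as 5 in the column then means "CAN_POST and CAN_DELETE", which is exactly the readability problem mentioned above when looking at the data directly.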
I agree that your design should be properly normalized, with three tables: User, UserSetting, and a bridge table:
User:
    Userid int
    UserName varchar(X)
UserSetting:
    Settingid int
    SettingName varchar(X)
UserUserSetting:
    Userid int
    SettingId int
    IsSet bit
There would be FKs from the bridge table UserUserSetting to the UserSetting and User tables, and a unique constraint on (UserId, SettingId) in UserUserSetting.

Database-wide unique-yet-simple identifiers in SQL Server

First, I'm aware of this question, and the suggestion (using GUID) doesn't apply in my situation.
I want simple UIDs so that my users can easily communicate this information over the phone :
Hello, I've got a problem with order
1584
as opposed to
hello, I've got a problem with order
4daz33-d4gerz384867-8234878-14
I want those to be unique (database wide) because I have a few different kind of 'objects' ... there are order IDs, and delivery IDs, and billing-IDs and since there's no one-to-one relationship between those, I have no way to guess what kind of object an ID is referring to.
With database-wide unique IDs, I can immediately tell what object my customer is referring to. My user can just input an ID in a search tool, and I save him the extra click to further refine what he is looking for.
My current idea is to use identity columns with different seeds 1, 2, 3, etc, and an increment value of 100.
This raises a few questions, though:
What if I eventually get more than 100 object types? Granted, I could use 1000 or 10000, but something that doesn't scale well "smells".
Is there a possibility that the seed is "lost" (during replication, a database problem, etc.)?
More generally, are there other issues I should be aware of?
Is it possible to use a non-integer type (I currently use bigints) for an identity column, so that I can prefix the ID with something representing the object type (for example, a varchar column)?
Would it be a good idea to use a "master table" containing only an identity column, and maybe the object type, so that I can just insert a row in it whenever I need a new ID? I feel like it might be a bit overkill, and I'm afraid it would complicate all my insertion requests. Plus, I wouldn't be able to determine an object type without looking at the database.
Are there other clever ways to address my problem?
Why not use identities on all the tables, but any time you present it to the user, simply tack on a single char for the type? e.g. O1234 is an order, D123213 is a delivery, etc.? That way you don't have to engineer some crazy scheme...
Handle it at the user interface--add a prefix letter (or letters) onto the ID number when reporting it to the users. So o472 would be an order, b531 would be a bill, and so on. People are quite comfortable mixing letters and digits when giving "numbers" over the phone, and are more accurate than with straight digits.
You could use an autoincrement column to generate the unique id. Then have a computed column which takes the value of this column and prepends a fixed identifier that reflects the entity type; for example, OR1542 and DL1542 would represent order #1542 and delivery #1542, respectively. Your prefix could be extended as much as you want, and the format could be arranged to help distinguish between items with the same autoincrement value, say OR011542 and DL021542, with the prefixes being OR01 and DL02.
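The prefix idea from these answers can also live entirely in the application or search tool. Here is a rough Python sketch, assuming letter-only prefixes such as OR and DL (so it does not cover the longer OR01 style); the PREFIXES table and function names are made up for illustration.

```python
import re

# Hypothetical mapping from display prefix to object type.
PREFIXES = {"OR": "order", "DL": "delivery"}

_ID_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def format_id(prefix: str, number: int) -> str:
    """Turn a plain identity value into the ID shown to users."""
    return f"{prefix}{number}"


def parse_id(text: str):
    """Split a user-supplied ID into (object type, identity value)."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid ID: {text!r}")
    prefix = match.group(1).upper()
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix], int(match.group(2))
```

With this, parse_id("dl1542") gives ("delivery", 1542), so the search tool knows which table to query without the tables having to share one number range.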
I would implement this by defining a generic root table. For lack of a better name, call it Entity. The Entity table should have at a minimum a single Identity column on it. You could also include other fields that are common across all your objects, or even metadata that tells you this row is an order, for example.
Each of your actual Order, Delivery... tables will have a FK reference back to the Entity table. This will give you a single unique ID column.
Using the seeds in my opinion is a bad idea, and one that could lead to problems.
Edit
Some of the problems you mentioned already. I also see this being a pain to track and to ensure you set up all new entities correctly. Imagine a developer updating the system two years from now.
After I wrote this answer I thought a bit more about why you're doing this, and I came to the same conclusion that Matt did.
MS's Intentional Programming project had a GUID-to-word system that gave pronounceable names from random IDs.
Why not a simple Base36 representation of a bigint? http://en.wikipedia.org/wiki/Base_36
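A Base36 conversion is short enough to sketch. This Python version assumes non-negative values and uses digits followed by uppercase letters; the function names are made up for illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(value: int) -> str:
    """Encode a non-negative integer (e.g. a bigint ID) as Base36 text."""
    if value < 0:
        raise ValueError("value must not be negative")
    if value == 0:
        return "0"
    chars = []
    while value:
        value, remainder = divmod(value, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))


def from_base36(text: str) -> int:
    """Decode Base36 text back to an integer (case-insensitive)."""
    return int(text, 36)
```

Even the largest signed 64-bit value fits in 13 characters this way, though letters such as O and I may be easy to confuse with 0 and 1 over the phone.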
We faced a similar problem on a project. We solved it by first creating a simple table that only has one column: a BIGINT set as an auto-increment identity.
And we created an sproc that inserts a new row in that table, using default values and inside a transaction. It then stores the SCOPE_IDENTITY in a variable, rolls back the transaction and then returns the stored SCOPE_IDENTITY.
This gives us a unique ID inside the database without filling up a table.
If you want to know what kind of object the ID is referring to, I'd lose the transaction rollback and also store the type of object alongside the ID. That way, finding out what kind of object the ID is referring to is only one select (or inner join) away.
I use a high/low algorithm for this. I can't find a description for this online though. Must blog about it.
In my database, I have an ID table with a counter field. This is the high part. In my application, I have a counter that goes from 0 to 99. This is the low part. The generated key is 100 * high + low.
To get a key, I do the following
initially high = -1
initially low = 0
method GetNewKey()
begin
    if high = -1 then
        high = GetNewHighFromDatabase
    newkey = 100 * high + low
    Inc low
    if low = 100 then
        low = 0
        high = -1
    return newkey
end
The real code is more complicated with locks etc but that is the general gist.
There are a number of ways of getting the high value from the database including auto inc keys, generators etc. The best way depends on the db you are using.
This algorithm gives simple keys while avoiding most of the db hit of looking up a new key every time. In testing, I found it had similar performance to guids and vastly better performance than retrieving an auto inc key every time.
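Here is one way the high/low idea could look with the locking included. It is a sketch in Python, not the answerer's real code: the class name is made up, and fetch_high stands in for whatever call reserves the next high value in the database (an auto inc key, a generator, etc.).

```python
import itertools
import threading
from typing import Callable


class HiLoGenerator:
    """Hands out keys of the form block_size * high + low."""

    def __init__(self, fetch_high: Callable[[], int], block_size: int = 100):
        self._fetch_high = fetch_high  # assumed to return a new, unused high value
        self._block_size = block_size
        self._high = -1                # -1 means "no block reserved yet"
        self._low = 0
        self._lock = threading.Lock()

    def next_key(self) -> int:
        with self._lock:
            if self._high == -1:
                # Only this step touches the database.
                self._high = self._fetch_high()
            key = self._block_size * self._high + self._low
            self._low += 1
            if self._low == self._block_size:
                # Block used up; reserve a new high on the next call.
                self._low = 0
                self._high = -1
            return key


# Stand-in for the database counter, for trying the class out locally.
_demo_counter = itertools.count(1)


def demo_fetch_high() -> int:
    return next(_demo_counter)
```

With block_size 100, one database round trip covers 100 keys. Keys from different clients are unique but interleaved, and a client that stops early leaves the rest of its block unused.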
You could create a master UniqueObject table with your identity and a subtype field. Subtables (Orders, Users, etc.) would have a FK to UniqueObject. INSTEAD OF INSERT triggers should keep the pain to a minimum.
Maybe an itemType-year-week-orderNumberThisWeek variant?
o2009-22-93402
Such an identifier can consist of several database column values, simply formatted into an identifier by the software.
I had a similar situation with a project.
My solution: By default, users only see the first 7 characters of the GUID.
It's sufficiently random that collisions are extremely unlikely (1 in 268 million), and it's efficient for speaking and typing.
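The 1 in 268 million figure is the chance that two particular prefixes match (16^7 possible values). The chance that any two out of many IDs share a prefix is higher. A rough Python sketch using the standard birthday approximation, assuming the first 7 hex characters are uniformly random (which would not hold for sequential GUIDs); the function name is made up for illustration.

```python
import math


def prefix_collision_probability(id_count: int, hex_chars: int = 7) -> float:
    """Approximate chance that at least two of id_count IDs share a prefix."""
    space = 16 ** hex_chars  # 268,435,456 possible prefixes for 7 hex chars
    pairs = id_count * (id_count - 1) / 2
    # Birthday approximation: 1 - exp(-pairs / space)
    return -math.expm1(-pairs / space)
```

Under these assumptions, a table of 10,000 IDs already has roughly a one-in-six chance of containing at least one shared prefix, so a lookup by prefix should be prepared to return more than one row.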
Internally, of course, I'm using the entire GUID.

identity column in Sql server

Why doesn't SQL Server allow more than one IDENTITY column in a table? Are there any specific reasons?
Why would you need it? SQL Server keeps track of a single value (current identity value) for each table with IDENTITY column so it can have just one identity column per table.
An Identity column is a column ( also known as a field ) in a database table that :-
Uniquely identifies every row in the table
Is made up of values generated by the database
This is much like an AutoNumber field in Microsoft Access or a sequence in Oracle.
An identity column differs from a primary key in that its values are managed by the server and ( except in rare cases ) can't be modified. In many cases an identity column is used as a primary key, however this is not always the case.
SQL Server uses the identity column as the key value to refer to a particular row, so only a single identity column can be created. Also, if no identity column is explicitly declared, SQL Server internally stores a separate column which contains a key value for each row. As stated, if you want more than one column to have unique values, you can make use of the UNIQUE keyword.
SQL Server stores the identity in an internal table, using the id of the table as its key. So it's impossible for SQL Server to have more than one identity column per table.
Because MS realized that better than 80% of users would only want one auto-increment column per table and the work-around to have a second (or more) is simple enough i.e. create an IDENTITY with seed = 1, increment = 1 then a calculated column multiplying the auto-generated value by a factor to change the increment and adding an offset to change the seed.
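The arithmetic behind that work-around is small. A sketch in Python, assuming the real IDENTITY column uses seed 1 and increment 1; the function name and the example seed and increment are made up for illustration.

```python
def derived_value(identity_value: int, seed: int, increment: int) -> int:
    """Value a computed column would show for a second 'auto-increment'.

    Equivalent to identity_value * increment + offset,
    where offset = seed - increment.
    """
    return seed + (identity_value - 1) * increment
```

For identity values 1, 2, 3 with seed 1000 and increment 5 this gives 1000, 1005, 1010. The derived column is only as gap-free as the underlying identity column.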
Yes, sequences allow more than one identity-like column in a table, but there are some issues here. In a typical development scenario, I have seen developers manually inserting valid values into a column that is supposed to be populated from a sequence. Later on, when the sequence supplies a value that was already inserted manually, the insert may fail with a unique key violation.
Also, in a multi-developer or multi-vendor scenario, developers might use the same sequence for more than one table (as sequences are not linked to tables). This might lead to missing values in one of the tables, i.e. tableA might get the value 1 while tableB uses the value 2 and tableA then gets 3. This means that tableA will have 1 and 3 (missing 2).
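The gap scenario is easy to reproduce outside the database. A toy Python sketch where a plain counter plays the role of the shared sequence and two lists play the roles of tableA and tableB; all names are made up for illustration.

```python
import itertools


def simulate_shared_sequence():
    """Insert into two 'tables' that draw from one shared sequence."""
    shared_sequence = itertools.count(1)  # stands in for one sequence object
    table_a, table_b = [], []
    table_a.append(next(shared_sequence))  # tableA gets 1
    table_b.append(next(shared_sequence))  # tableB gets 2
    table_a.append(next(shared_sequence))  # tableA gets 3
    return table_a, table_b
```

The values stay unique across both tables, but neither table on its own sees a gap-free series, which matters only if something depends on the numbers being consecutive.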
Apart from this, there is another scenario where you have a table which is truncated every day. Since sequences have no link to the table, the truncated table will continue from the sequence's next value (Seq.NextVal) unless you manually reset the sequence, leading to missing values or, even more dangerous, an arithmetic overflow error after some time.
For the above reasons, I feel that both Oracle sequences and SQL Server identity columns are good for their purposes. I would prefer Oracle implementing the concept of an identity column and SQL Server implementing the sequence concept, so that developers can use either of the two as per their requirements.
The whole purpose of an identity column is that it will contain a unique value for each row in the table. So why would you need more than one of them in any given table?
Perhaps you need to clarify your question, if you have a real need for more than one.
An identity column is used to uniquely identify a single row of a table. If you want other columns to be unique, you can create a UNIQUE index for each "identity" column that you may need.
I've always seen this as an arbitrary and bad limitation for SQL Server. Yes, you only want one identity column to actually identify a row, but there are valid reasons why you would want the database to auto-generate a number for more than one field in the database.
That's the nice thing about sequences in Oracle. They're not tied to a table. You can use several different sequences to populate as many fields as you like in the same table. You could also have more than one table share the same sequence, although that's probably a really bad decision. But the point is you could. It's more granular and gives you more flexibility.
The bad thing about sequences is that you have to write code to actually increment them, whether it's in your insert statement or in an on-insert trigger on the table. The nice thing about SQL Server identity is that all you have to do is change a property or add a keyword to your table creation and you're done.