Why manage your own Sql Server ID column?


I recently started a new job, and I am perplexed as to why the tables (in many databases) were designed this way. Can someone give me a logical explanation?
Each table has a primary key/Id field. Example: EmployeeId (Integer)
Then, to get the next ID, we actually need to query and update a table that manages the keys for every table:
SELECT NextId
FROM dbo.NextID
WHERE TableName = 'Employees'
This makes life difficult, as you can imagine. The person who designed this mess has left, and the others just accept that this is the way things are done.
Is there some design flaw in MS SQL identity columns? I don't get it. Any ideas?
Thanks for your input

The features and limitations of IDENTITY columns make them useful for generating surrogate keys in many scenarios, but they are not ideal for all purposes, such as creating "meaningful", managed and/or potentially updateable identifiers usable by the business, or for data integration or replication. Microsoft introduced the SEQUENCE feature as a more flexible alternative to IDENTITY in SQL Server 2012. In code written for earlier versions, where sequences weren't available, it isn't unusual to see the kind of scheme that you have described.

My guess is that the person wanted no gaps in the ID column and therefore implemented this unnecessary process of fetching the next available ID.
Maybe your application depends on sequential IDs. Either way, it is not the way to go: your application should not be dependent on sequential values, and an identity column is the right tool for this kind of requirement.
Issue with Identity Column
Yes, there was a reported bug with identity columns in SQL Server 2012: the column could take big jumps when creating new identity values. Still, it should not matter.

Related

SQL Identity Specification is out of order

I have the identity specification set on the "EventID" Column. It has been working great for quite a few years.
The other day I noticed the internal website was wrong, so I went to the DB and found that the column is out of order. It goes 21, 22, 25, 24, 26. There is little to no information in entry #25. The specification is set to "Yes", Is Identity is set to Yes, Identity Increment is 1 and Identity Seed is 1. I have attempted to delete the wrong entry, but it will not let me delete it.
How do I fix this? Remove the specification? Re-create the table? It has been a while since I have actively worked in SQL. Suggestions please, and thank you!

What could you possibly mean by "the column is out of order"? SQL tables represent unordered sets. The only ordering is provided by data in (other) columns in the row.
The identity column in SQL Server captures the insertion order of the data. By definition, it cannot be out-of-order with the insertion order (well, we might disagree on the ordering of two insertions that happen "at the same time", but I don't think that is the gist of your question).
Note: This assumes that identity has not been overridden. It is, of course, possible to allow someone to specify an identity value, allowing gaps and "out-of-order" values. Your question suggests that this is not happening either.
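A small sketch of the ordering point, using Python's built-in sqlite3 module as a stand-in for SQL Server. The Events table and its EventDate column are made up for illustration, and SQLite's AUTOINCREMENT only approximates IDENTITY(1,1). The generated IDs follow insertion order; they only look "out of order" when the rows are sorted by some other column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; INTEGER PRIMARY KEY AUTOINCREMENT plays the role of IDENTITY(1,1).
conn.execute(
    "CREATE TABLE Events ("
    "EventID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "EventDate TEXT NOT NULL)"
)
# Rows are inserted in this order, so they receive EventID 1, 2, 3.
for event_date in ("2020-03-01", "2020-01-15", "2020-02-10"):
    conn.execute("INSERT INTO Events (EventDate) VALUES (?)", (event_date,))

# The same rows, read back with two different explicit orderings.
by_id = [row[0] for row in conn.execute("SELECT EventID FROM Events ORDER BY EventID")]
by_date = [row[0] for row in conn.execute("SELECT EventID FROM Events ORDER BY EventDate")]

print(by_id)    # IDs in insertion order
print(by_date)  # the same IDs, sorted by date instead
```

Sorted by date, the IDs come back as 2, 3, 1, even though nothing is wrong with the identity column.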

What Gordon Linoff above is getting at is that there is no natural order to an SQL table. Client tools will show you the data in a particular order, but the table itself has no order (outside the physical order of storage, not relevant to this problem). If you don't specify an ORDER BY, you'll get the data in some arbitrary order. I can't remember how SQL decides on that order, because no-one ever relies on it.
Your IDENTITY column must be "out of order" relative to some other ordering: this is just the order you're viewing the data in (perhaps ORDER BY SomeDateEnteredColumn?).
The IDENTITY value being relatively out of order shouldn't matter at all. If an application relies on it to express a newer-older relation between rows, that's a bad idea.
If you can't delete the row with ID=25, it's probably because of a foreign key: a related row in another table connects to this row.
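To illustrate the foreign key explanation, here is a minimal sketch with Python's sqlite3 module standing in for SQL Server. The Attendees table is hypothetical; the question does not say which table, if any, references EventID. The delete fails while a child row points at event 25 and succeeds once that row is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
# Hypothetical parent and child tables.
conn.execute("CREATE TABLE Events (EventID INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Attendees ("
    "AttendeeID INTEGER PRIMARY KEY, "
    "EventID INTEGER NOT NULL REFERENCES Events(EventID))"
)
conn.execute("INSERT INTO Events (EventID) VALUES (25)")
conn.execute("INSERT INTO Attendees (AttendeeID, EventID) VALUES (1, 25)")


def try_delete_event(event_id):
    """Return True if the event row was deleted, False if a constraint blocked it."""
    try:
        conn.execute("DELETE FROM Events WHERE EventID = ?", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False


blocked = not try_delete_event(25)             # a child row still references event 25
conn.execute("DELETE FROM Attendees WHERE EventID = 25")
deleted_after_cleanup = try_delete_event(25)   # nothing references it any more
```

In SQL Server the error message for the failed delete names the constraint, which tells you which table holds the related row.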

Bigquery - create surrogate keys on migrated data

We are doing a migration from AWS Redshift to GCP BigQuery.
Problem statement:
We have a Redshift table that uses the IDENTITY column functionality to issue an internal EDW surrogate key (PK) for natural/business keys. These natural keys are from at least 20 different source systems for customers. We need a method to identify them in case natural keys are somehow duplicated (because we have so many source systems). In BigQuery, the functionality of the Redshift IDENTITY column does not exist. How can I replicate this in BQ?
We can't use GENERATE_UUID() because all our downstream clients have been using a BIGINT for the last 4 years. All history is based on BIGINT, and too much would need to change for a VARCHAR.
Does anyone have any ideas, recommendations or suggestions?
Some considerations I have made:
1. Load the data into Spark, keep it in memory, and use Scala or Python functions to issue the surrogate key.
2. Use a NoSQL data store (but this does not seem a likely fit for the use case).
Any ideas are welcome!

In these cases, the idea is generally to identify an injective/bijective function, which can map to some unique space.
How about trying something like SELECT UNIX_MICROS(CURRENT_TIMESTAMP()) + x AS identity, where x is a number that you manage somehow (using CASE statements or IF conditions) based on the business name or something similar?
You can also eliminate x from this formula if you intend to process things linearly in some order, like one business entity at a time.
Hope it helps.
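The timestamp-plus-offset idea can be sketched in Python as follows. The names surrogate_key, source_offset and now_micros are made up for illustration, and how x is chosen per source system is left open, just as in the answer. The sketch also checks that the result still fits in a signed 64-bit BIGINT, since that is the downstream constraint.

```python
import time

MAX_BIGINT = 2**63 - 1  # largest value a signed 64-bit BIGINT can hold


def surrogate_key(source_offset, now_micros=None):
    """Return microseconds since the Unix epoch plus a per-source offset x.

    source_offset is the hypothetical "x" from the answer; now_micros can be
    passed in explicitly, otherwise the current time is used.
    """
    if now_micros is None:
        now_micros = time.time_ns() // 1_000
    key = now_micros + source_offset
    if key > MAX_BIGINT:
        raise OverflowError("key does not fit in a BIGINT")
    return key


print(surrogate_key(source_offset=3))
```

Note that this does not guarantee uniqueness by itself: two rows with the same offset issued in the same microsecond get the same key, so the offsets (or the processing order) still have to be managed so that this cannot happen.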

How does the identity column in sql server technically work?

In my search for a method to create a unique, incrementing series of numbers, I considered the identity column. I couldn't use it for my purpose, but it led to my current question:
How does identity technically work in SQL Server?
I am not looking for an answer on how to use it; I know it increments the number on each insert, leaving gaps on a delete. But I couldn't find any documentation on how it generates its number. Is this a table in SQL Server? Does it use a row lock, or some other kind of lock? Is it locking at all? How does it prevent duplicate numbers?

There's some useful information on the internals in these questions/articles:
https://dba.stackexchange.com/questions/1635/why-are-denali-sequences-supposed-to-perform-better-than-identity-columns
http://www.sqlmag.com/article/sql-server/Sequences-Part-2-129205
They are talking about SEQUENCEs, which were introduced in SQL Server 2012, but they include some information on IDENTITY, the differences, and the implementations. I'm not sure that fully answers the question, but IMHO they are worth a read.

SQL Server and gaps in an Identity column

I just noticed that if I have an identity column in a table, when I insert new rows SQL Server 2008 automatically fills in the sequence if there is a discontinuity. I mean, if my identity column contains 1, 2, 5, 6 and I insert two more rows into the table, the system automatically puts 3 and 7 in the identity column.
Do you know how to control this behavior?
THANKS

That is the defined and documented SQL Server behavior, and there's really not much you can do about changing it. What did you want to change about it??
IDENTITY columns will guarantee unique, ever-increasing IDs (as long as you don't mess around with them) - they don't guarantee anything else.
SQL Server will not go through the trouble of spotting "gaps" in your sequence and filling them up. I don't think that would be a good idea, anyway - what if you did have a record with ID=3, and then deleted it? Do you really want a next record to suddenly "recycle" that ID?? Not a good idea, in my opinion.
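The "no recycling" behavior can be tried out with a small Python sketch. It uses SQLite's AUTOINCREMENT as a rough stand-in for a SQL Server IDENTITY column (the two are not the same implementation, and the Items table is made up), but both keep counting upward instead of refilling deleted values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; AUTOINCREMENT never reuses an ID, even after deletes.
conn.execute("CREATE TABLE Items (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
for name in ("a", "b", "c", "d"):
    conn.execute("INSERT INTO Items (Name) VALUES (?)", (name,))

# Delete a middle row and the last row, leaving gaps at 2 and 4.
conn.execute("DELETE FROM Items WHERE Id IN (2, 4)")

# The next insert continues after the highest ID ever issued.
conn.execute("INSERT INTO Items (Name) VALUES ('e')")

ids = [row[0] for row in conn.execute("SELECT Id FROM Items ORDER BY Id")]
print(ids)
```

The new row gets ID 5; the gaps at 2 and 4 stay where they are.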

What is the best way to generate an ID for a SQL Insert?

What is the best, DBMS-independent way of generating an ID number that will be used immediately in an INSERT statement, keeping the IDs roughly in sequence?

DBMS independent? That's a problem. The two most common methods are auto incrementing columns, and sequences, and most DBMSes do one or the other but not both. So the database independent way is to have another table with one column with one value that you lock, select, update, and unlock.
Usually I say "to hell with DBMS independence" and do it with sequences in PostgreSQL, or autoincrement columns in MySQL. For my purposes, supporting both is better than trying to find out one way that works everywhere.

If you can create a Globally Unique Identifier (GUID) in your chosen programming language - consider that as your id.
They are harder to work with when troubleshooting (it is much easier to type in a where condition that is an INT) but there are also some advantages. By assigning the GUID as your key locally, you can easily build parent-child record relationships without first having to save the parent to the database and retrieve the id. And since the GUID, by definition, is unique, you don't have to worry about incrementing your key on the server.
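A minimal sketch of the parent-child advantage, using Python's uuid module. The order/line record shapes are hypothetical. Both keys are assigned locally, so the children can point at the parent before anything has been saved.

```python
import uuid


def new_order_with_lines(product_names):
    """Build a parent record and its child records with locally assigned GUID keys."""
    order_id = str(uuid.uuid4())
    order = {"id": order_id}
    lines = [
        # Each child carries its own GUID plus the parent's GUID as its foreign key.
        {"id": str(uuid.uuid4()), "order_id": order_id, "product": name}
        for name in product_names
    ]
    return order, lines


order, lines = new_order_with_lines(["keyboard", "mouse"])
print(order["id"], [line["order_id"] for line in lines])
```

The whole object graph can then be written to the database in one batch, with no round trip to fetch a generated parent ID first.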

There is auto increment or sequence.
But what is the point of this? It is the least of your worries.
How will you handle SQL itself?
MySQL has Limit,
SQL Server has Top,
Oracle has ROWNUM (or Rank)
Then there are a million other things like triggers, alter table syntax etc etc

Yep, the obvious ways in raw SQL (and in my order of preference) are a) sequences b) auto-increment fields. The better, more modern, more DBMS-independent way is to not touch SQL at all, but to use a (good) ORM.

There's no universal way to do this. If there were, everyone would use it. SQL by definition abhors the idea - it's an antipattern for set-based logic (although a useful one, in many real-world cases).
The biggest problem you'd have trying to interpose an identity value from elsewhere is when a SQL statement involves several records, and several values must be generated simultaneously.
If you need it, then make it part of your selection requirements for a database to use with your application. Any serious DBMS product will provide its own mechanism to use, and it's easy enough to code around the differences in DML. The variations are pretty much all in the DDL.
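For the multi-record problem mentioned above, one common workaround when the keys come from a hand-rolled key table is to reserve a whole block of IDs in a single step. The following is only a sketch under assumptions: it uses Python's sqlite3 module, a key table shaped like the NextID table from the first question, and a made-up helper name reserve_ids.

```python
import sqlite3


def reserve_ids(conn, table_name, count):
    """Reserve `count` consecutive IDs for `table_name` and return them as a range."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the counter
    try:
        row = conn.execute(
            "SELECT NextId FROM NextID WHERE TableName = ?", (table_name,)
        ).fetchone()
        if row is None:
            raise KeyError(table_name)
        first = row[0]
        # Advance the counter past the whole block in one update.
        conn.execute(
            "UPDATE NextID SET NextId = NextId + ? WHERE TableName = ?",
            (count, table_name),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return range(first, first + count)


conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed by hand
conn.execute("CREATE TABLE NextID (TableName TEXT PRIMARY KEY, NextId INTEGER NOT NULL)")
conn.execute("INSERT INTO NextID VALUES ('Employees', 100)")

first_batch = reserve_ids(conn, "Employees", 3)
second_batch = reserve_ids(conn, "Employees", 2)
print(list(first_batch), list(second_batch))
```

The caller can then assign the reserved values to the rows of a multi-row insert. IDs from a reserved block that end up unused simply become gaps.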

I'd always go for the DB-specific solution, but if you really have to, the usual way of doing this is to implement your own sequence. Your RDBMS has to support transactions.
You create a sequence table that contains an int column and seed it with the first number; your transaction logic then looks something like this:
begin transaction
declare @myID int
-- the update takes a write lock on tblSeq that is held until the commit
update tblSeq set intID = intID + 1
select @myID = intID from tblSeq
insert into tblData (intID, ...) values (@myID, ...)
commit transaction
The transaction forces a write lock, so that the next queued insert cannot update the tblSeq value before the record has been inserted into tblData. As long as all inserts go through this transaction, your generated IDs are in sequence.
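The same transaction logic can be sketched as runnable Python against SQLite. The table names tblSeq and tblData come from the answer; the payload column and the function name insert_with_next_id are made up for illustration, and BEGIN IMMEDIATE is SQLite's way of taking the write lock up front.

```python
import sqlite3


def insert_with_next_id(conn, payload):
    """Allocate the next ID from tblSeq and insert a tblData row in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # write lock is held until COMMIT or ROLLBACK
    try:
        conn.execute("UPDATE tblSeq SET intID = intID + 1")
        (my_id,) = conn.execute("SELECT intID FROM tblSeq").fetchone()
        conn.execute(
            "INSERT INTO tblData (intID, payload) VALUES (?, ?)", (my_id, payload)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # a failed insert also undoes the increment
        raise
    return my_id


conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed by hand
conn.execute("CREATE TABLE tblSeq (intID INTEGER NOT NULL)")
conn.execute("INSERT INTO tblSeq VALUES (0)")  # seed value
conn.execute("CREATE TABLE tblData (intID INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

ids = [insert_with_next_id(conn, p) for p in ("first", "second", "third")]
print(ids)
```

Because the increment and the insert share one transaction, a failed insert rolls the counter back as well, so this variant leaves no gaps. The price is that every insert into every table serializes on the sequence row.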

Use an auto-incrementing id column.

Is there really a reason that they have to be in sequence? If you're just using it as an ID, then you should just be able to use part of a UUID or the first couple digits of md5(now()).

You could take the time and massage it. It'd be the equivalent of something like
DateTime.Now.Ticks
So it would be something like YYYYMMDDHHMMSSSS
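A rough Python equivalent of that idea, assuming the last two digits are hundredths of a second (the answer does not say what they stand for). The function name timestamp_id is made up.

```python
from datetime import datetime


def timestamp_id(moment=None):
    """Format a timestamp as a 16-digit integer: YYYYMMDDHHMMSS plus hundredths of a second."""
    if moment is None:
        moment = datetime.now()
    hundredths = moment.microsecond // 10_000
    return int(moment.strftime("%Y%m%d%H%M%S") + f"{hundredths:02d}")


print(timestamp_id(datetime(2024, 1, 2, 3, 4, 5, 120_000)))
```

A 16-digit value fits comfortably in a 64-bit integer column, but two IDs requested within the same hundredth of a second will collide, so this only works when inserts are rare or when a collision can be retried.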

It may be a bit of a lateral approach, but a good ORM-type library will probably be able to at least hide the differences. For example, in Ruby there is ActiveRecord (commonly used in, but not exclusively tied to, the Ruby on Rails web framework), which has Migrations. Within a table definition, which is declared in platform-agnostic code, implementation details such as datatypes, sequential id generation and index creation are pushed down below your vision.
I have transparently developed a schema on SQLite, then implemented it on MS SQL Server and later ported it to Oracle, without ever changing the code that generates my schema definition.
As I say, it may not be what you're looking for, but the easiest way to encapsulate what varies is to use a library that has already done the encapsulation for you.

With only SQL, the following could be one of the approaches:
1. Create a table to contain the starting id for your needs.
2. When the application is deployed for the first time, the application should read the value into its context.
3. Thereafter, increment the id (in a thread-safe fashion) as required:
3.1 Write the id to the database (in a thread-safe fashion), which always keeps the stored value up to date, or
3.2 Don't write it to the database; just keep incrementing in memory (in a thread-safe manner).
4. If for any reason the server is going down, write the current id value to the database.
5. When the server is up again, it will pick up from where it left off the last time.
