I have the identity specification set on the "EventID" Column. It has been working great for quite a few years.
The other day I noticed the internal website was wrong, so I went to the DB and found the column is out of order. It goes 21, 22, 25, 24, 26. There is little to no information in entry #25. The specification is set to "Yes", Is Identity is set to Yes, Identity Increment is 1 and Identity Seed is 1. I have attempted to delete the wrong entry, but it will not let me delete it.
How do I fix this? Remove the specification? Re-create the table? It has been a while since I have been actively working in SQL. Suggestions please and thank you!
What could you possibly mean by "the column is out of order"? SQL tables represent unordered sets. The only ordering is provided by data in (other) columns in the row.
The identity column in SQL Server captures the insertion order of the data. By definition, it cannot be out-of-order with the insertion order (well, we might disagree on the ordering of two insertions that happen "at the same time", but I don't think that is the gist of your question).
Note: This assumes that identity has not been overridden. It is, of course, possible to allow someone to specify an identity value, allowing gaps and "out-of-order" values. Your question suggests that this is not happening either.
What Gordon Linoff above is getting at is that there is no natural order to an SQL table. Client tools will show you the data in a particular order, but the table itself has no order (outside the physical order of storage, not relevant to this problem). If you don't specify an ORDER BY, you'll get the data in some arbitrary order. I can't remember how SQL decides on that order, because no-one ever relies on it.
Your IDENTITY column must be "out of order" relative to some other ordering: this is just the order you're viewing the data in (perhaps ORDER BY SomeDateEnteredColumn?).
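To illustrate, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table layout and the EventDate column are hypothetical, not taken from the question. The IDs only look "out of order" once the rows are sorted by something other than the ID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events ("
    " EventID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " EventDate TEXT NOT NULL)"  # hypothetical column used for display order
)

# Rows are entered in this order, but the event dates are not chronological.
for event_date in ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-03", "2020-01-06"]:
    conn.execute("INSERT INTO Events (EventDate) VALUES (?)", (event_date,))

ids_by_id = [r[0] for r in conn.execute("SELECT EventID FROM Events ORDER BY EventID")]
ids_by_date = [r[0] for r in conn.execute("SELECT EventID FROM Events ORDER BY EventDate")]

print(ids_by_id)    # [1, 2, 3, 4, 5]
print(ids_by_date)  # [1, 2, 4, 3, 5]
```

Nothing is wrong with the identity values here; the second query simply sorts by a different column.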
The IDENTITY value being relatively out of order shouldn't matter at all. If an application relies on it to express a newer-older relation between rows, that's a bad idea.
If you can't delete the row with ID=25, it's probably because of a foreign key: a related row in another table connects to this row.
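A rough sketch of that situation, again using Python's sqlite3 module as a stand-in for SQL Server. The Attendees table and the helper try_delete_event are hypothetical, made up for illustration; your actual child table will have a different name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Events (EventID INTEGER PRIMARY KEY, Title TEXT)")
conn.execute(
    "CREATE TABLE Attendees ("  # hypothetical child table
    " AttendeeID INTEGER PRIMARY KEY,"
    " EventID INTEGER NOT NULL REFERENCES Events (EventID))"
)
conn.execute("INSERT INTO Events (EventID, Title) VALUES (25, NULL)")
conn.execute("INSERT INTO Attendees (EventID) VALUES (25)")


def try_delete_event(event_id):
    """Return True if the row was deleted, False if a constraint blocked it."""
    try:
        conn.execute("DELETE FROM Events WHERE EventID = ?", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False


blocked = not try_delete_event(25)  # a child row still references event 25
conn.execute("DELETE FROM Attendees WHERE EventID = 25")
deleted = try_delete_event(25)      # succeeds once the child row is gone
remaining = conn.execute("SELECT COUNT(*) FROM Events").fetchone()[0]
```

The general fix is the same in any engine: find the referencing rows, then delete or re-point them before deleting the parent row.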
SQL Server databases have arbitrary behavior regarding the ordering of rows in a result set, unless order is specified using ORDER BY (see reference).
There's no easy way of determining what happens in the case where ORDER BY is not specified, but in practice it seems like it can often coincide with an existing index or primary key. Thus if there is a case where an ORDER BY on the primary key is missing, that bug can easily go undetected.
Is there a simple way of preventing this, such as forcing the database to randomize these results during testing as if ORDER BY NEWID() had been added to every data access?
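For comparison only: SQLite (not SQL Server) has a debugging switch for this, PRAGMA reverse_unordered_selects, which is documented as making many SELECT statements without ORDER BY return rows in the reverse of their usual order. I am not aware of an equivalent setting in SQL Server, so treat this as a sketch of the idea rather than an answer for that engine. The table t is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",), ("c",)])

unordered_sql = "SELECT id FROM t"            # relies on an accidental order
ordered_sql = "SELECT id FROM t ORDER BY id"  # states the order explicitly

default_rows = [r[0] for r in conn.execute(unordered_sql)]

# Ask SQLite to perturb the order of SELECTs that have no ORDER BY.
conn.execute("PRAGMA reverse_unordered_selects = ON")
perturbed_rows = [r[0] for r in conn.execute(unordered_sql)]
ordered_rows = [r[0] for r in conn.execute(ordered_sql)]

# A test suite run with the pragma on should fail wherever the code
# compares default_rows-style results position by position.
print(default_rows, perturbed_rows, ordered_rows)
```

The query with ORDER BY is unaffected by the pragma, which is exactly what makes the missing ORDER BY cases stand out during testing.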
Suppose, if following rows are inserted in chronological order into a table:
row1, row2, row3, row4, ..., row1000, row1001.
After a while, we delete/remove the latest row1001.
As in this post: How to get Top 5 records in SqLite?
If the below command is run:
SELECT * FROM <table> LIMIT 1;
Will it assuredly provide the "row1000"?
If no, then is there any efficient way to get the latest row(s) without traversing through all the rows? -- i.e. without using combination of ORDER BY and DESC.
[Note: For now I am using "SQLite", but it will be interesting for me to know about SQL in general as well.]
You're misunderstanding how SQL works. You're thinking row-by-row which is wrong. SQL does not "traverse rows" as per your concern; it operates on data as "sets".
Others have pointed out that relational database cannot be assumed to have any particular ordering, so you must use ORDER BY to explicitly specify ordering.
However, what has not been mentioned yet is that you need to create an appropriate index to ensure the query performs efficiently.
Whether you have an index or not, the correct query is:
SELECT <cols>
FROM <table>
ORDER BY <sort-cols> [DESC] LIMIT <no-rows>
Note that if you don't have an index the database will load all data and probably sort in memory to find the TOP n.
If you do have the appropriate index, the database will use the best index available to retrieve the TOP n rows as efficiently as possible.
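As a rough illustration of the difference, the sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The table log, its created column, and the index name are hypothetical, and the exact plan text varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created INTEGER NOT NULL)")
conn.executemany("INSERT INTO log (created) VALUES (?)", [(n,) for n in range(1, 1002)])
conn.execute("DELETE FROM log WHERE created = 1001")  # remove the latest row

query = "SELECT id, created FROM log ORDER BY created DESC LIMIT 1"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(query)  # full scan plus a temporary sort structure
conn.execute("CREATE INDEX idx_log_created ON log (created)")
plan_with_index = plan(query)     # walks the index instead of sorting

latest = conn.execute(query).fetchone()
print(latest)  # (1000, 1000)
```

The query text is identical in both cases; only the available index changes how much work the engine does to answer it.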
Note that the SQLite documentation is very clear on the matter. The section on ORDER BY explains that ordering is undefined. And nothing in the section on LIMIT contradicts this (it simply constrains the number of rows returned).
If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
This behaviour is also consistent with the ANSI standard and all major SQL implementations. Note that any database vendor that guaranteed any kind of ordering would have to sacrifice performance to the detriment of queries trying to retrieve data but not caring about order. (Not good for business.)
As a side note, making flawed assumptions about ordering is an easy mistake (similar to making flawed assumptions about uninitialised local variables).
RDBMS implementations are very likely to make ordering appear consistent. They follow a certain algorithm for adding data, a certain algorithm for retrieving data. And as a result, their operations are highly repeatable (it's what we love (and hate) about computers). So things repeatably look the same.
Theoretical examples:
Inserting a row results in the row being added to the next available free space. So data appears sequential. But an update would have to move the row to a new location if it no longer fits.
The DB engine might retrieve data sequentially from clustered index pages and seem to use the clustered index as the 'natural ordering' ... until one day a page-split puts one of the pages in a different location.
Or a new version of the DBMS might cache certain data for performance, and suddenly the order changes.
Real-world example:
The MS SQL Server 6.5 implementation of GROUP BY had the side-effect of also sorting by the group-by columns. When MS (in version 7 or 2000) implemented some performance improvements, GROUP BY would, by default, return data in a hashed order. Many people blamed MS for breaking their queries when in fact they had made false assumptions and failed to ORDER BY their results as needed.
This is why the only guarantee of a specific ordering is to use the ORDER BY clause.
No. Table records have no inherent order. So it is undefined which row(s) to get with a LIMIT clause without an ORDER BY.
SQLite in its current implementation may return the latest inserted row, but even if that were the case you must not rely on it.
Give a table a datetime column or some sortkey, if record order is important for you.
In SQL, data is stored in tables unordered. What comes out first one day might not be the same the next.
ORDER BY, or some other specific selection criteria is required to guarantee the correct value.
I recently started a new job and I am perplexed as to why the tables were designed this way. (in many databases) Is there someone who can give me a logical explanation?
Each table has a primary key/Id field. Example: EmployeeId (Integer)
Then to get the next id we actually need to query and update a table that manages all the keys for every table.
SELECT NextId
FROM dbo.NextID
Where TableName = 'Employees'
This makes life difficult, as you can imagine. The person who designed this mess has left, and the others just accept that this is the way you do things.
Is there some design flaw in MS SQL Identity columns? I don't get it? Any ideas?
Thanks for your input
The features/limitations of IDENTITY columns make them useful for generating surrogate keys in many scenarios but they are not ideal for all purposes - such as creating "meaningful", managed and/or potentially updateable identifiers usable by the business; or for data integration or replication. Microsoft introduced the SEQUENCE feature as a more flexible alternative to IDENTITY in SQL Server 2012. In code written for earlier versions where sequences weren't available it isn't unusual to see the kind of scheme that you have described.
My guess is the person wanted no gaps in the ID column therefore he/she implemented this unnecessary process of getting next available id.
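For reference, a scheme like the one in the question usually has to read and bump the counter inside one transaction, otherwise two callers can receive the same ID. Below is a minimal sketch using Python's sqlite3 module. The NextID table follows the question, but the helper next_id and the starting value of 1 are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NextID (TableName TEXT PRIMARY KEY, NextId INTEGER NOT NULL)")
conn.execute("INSERT INTO NextID (TableName, NextId) VALUES ('Employees', 1)")
conn.commit()


def next_id(conn, table_name):
    """Reserve and return the next key for table_name (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE NextID SET NextId = NextId + 1 WHERE TableName = ?",
            (table_name,),
        )
        if cur.rowcount != 1:
            raise KeyError(f"no counter row for table {table_name!r}")
        row = conn.execute(
            "SELECT NextId FROM NextID WHERE TableName = ?", (table_name,)
        ).fetchone()
    return row[0] - 1  # the value that was current before the increment


first = next_id(conn, "Employees")
second = next_id(conn, "Employees")
print(first, second)  # 1 2
```

Note that even this does not guarantee a gap-free sequence: if the insert that consumes a reserved ID later fails or is rolled back separately, that ID is lost just as it would be with an identity column.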
Maybe your application depends on sequential IDs. Either way, it is not the way to go: your application should not be dependent on sequential values, and an identity column is no doubt the right choice for this kind of requirement.
Issue with Identity Column
Yes, there was an active bug in identity columns in SQL Server 2012: the identity column could take big jumps when generating new identity values. Still, it should not matter.
I'm working with a client who wants to add timestamps to a bunch of tables so that they may sort the records in those tables chronologically. All of the tables also have an auto incrementing integer field as their primary key (id).
The (simple) idea - save the overhead/storage and rely on the primary key to sort fields chronologically. Sure this works, but I'm uncertain whether or not this approach is acceptable in sound database design.
Pros: less storage required per record, simpler VO classes, etc. etc.
Con: it implies a characteristic of that field, an otherwise simple identifier, whose definition does not in any way define or guarantee that it should/will function as such.
Assume for the sake of my question that the DB table definitions are set in stone. Still - is this acceptable in terms of best practices?
Thanks
You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.
A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.
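A small sketch of the idea, using Python's sqlite3 module, where DEFAULT CURRENT_TIMESTAMP plays the role that GETDATE() plays in SQL Server. The Accounts table and the index name are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    " AccountID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Name TEXT NOT NULL,"
    # Filled in by the database, so application code never has to set it.
    " CreateDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("CREATE INDEX idx_accounts_createdate ON Accounts (CreateDate)")

conn.execute("INSERT INTO Accounts (Name) VALUES ('alice')")
conn.execute("INSERT INTO Accounts (Name) VALUES ('bob')")

# Sort chronologically and filter by date; AccountID only breaks ties
# between rows that share the same timestamp.
rows = conn.execute(
    "SELECT Name, CreateDate FROM Accounts"
    " WHERE CreateDate >= date('now', '-1 day')"
    " ORDER BY CreateDate, AccountID"
).fetchall()
names = [name for name, _ in rows]
```

The date filter in the WHERE clause is the part an ID column alone cannot give you.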
These days, I always put a CreateDate column on data objects connected to real world events (such as account creation).
Edited to add:
If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.
Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.
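The effect can be simulated without real threads. In this sketch (Python's sqlite3 module), the Records table, the source_line column, and the completion order are all made up for illustration; the point is only that the auto-increment value records arrival order, not source order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Records ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " source_line INTEGER NOT NULL,"  # application-assigned position in the file
    " payload TEXT NOT NULL)"
)

# Simulated order in which parallel workers finish: line 2 completes first.
completed = [(2, "second"), (1, "first"), (3, "third")]
conn.executemany(
    "INSERT INTO Records (source_line, payload) VALUES (?, ?)", completed
)

by_id = [r[0] for r in conn.execute("SELECT payload FROM Records ORDER BY id")]
by_source = [r[0] for r in conn.execute("SELECT payload FROM Records ORDER BY source_line")]

print(by_id)      # ['second', 'first', 'third']
print(by_source)  # ['first', 'second', 'third']
```

Storing an explicit, application-specific sequence value is what keeps the original order recoverable.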
In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records hit the DATABASE; that depends on the DB implementation.
You can achieve the same goal in the short term by sorting on the ID column. This would be better than adding additional data to achieve the same result. I don't think that it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.
There are a few drawbacks or limitations that I see however.
The chronological sort can be messed up if someone re-seeds the column
Chronology for a date period cannot be ascertained without the additional data
This setup prevents you from sorting chronologically if the system ever accepts new, non-chronological data
Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.
Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.
Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.
You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.
Don't optimize without measuring. Get it working right first, and THEN optimize.
#MadBreaker
There are two separate things here: if you need to know the order, you create an order column with autoincrement; if you want to know the date and time a row was inserted, you use datetime2.
Chronological order can be guaranteed if you don't allow updates or deletes, but if you want to select by time you should use datetime2.
You didn't mention if you are running on a single db or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1,3,2,4,5 sort of list...
I just noticed that if I have an identity column in a table, when I insert new rows SQL Server 2008 automatically fills in the sequence if there is a discontinuity. I mean, if my identity column contains 1,2,5,6 and I insert two more rows into the table, the system automatically puts 3,7 in the identity column.
Do you know how to control this behavior?
THANKS
That is the defined and documented SQL Server behavior, and there's really not much you can do about changing it. What did you want to change about it??
IDENTITY columns will guarantee unique, ever-increasing IDs (as long as you don't mess around with them) - they don't guarantee anything else.
SQL Server will not go through the trouble of spotting "gaps" in your sequence and filling them up. I don't think that would be a good idea, anyway - what if you did have a record with ID=3, and then deleted it? Do you really want a next record to suddenly "recycle" that ID?? Not a good idea, in my opinion.
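The same "never reuse" behaviour can be seen with an AUTOINCREMENT column in SQLite, used here through Python's sqlite3 module as an analogy rather than as SQL Server itself. The Items table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO Items (name) VALUES (?)", [(f"item{n}",) for n in range(1, 7)]
)

# Leave a gap: the table now holds ids 1, 2, 5, 6.
conn.execute("DELETE FROM Items WHERE id IN (3, 4)")

# New rows continue after the highest value ever issued; the gap stays.
conn.executemany("INSERT INTO Items (name) VALUES (?)", [("new1",), ("new2",)])

ids = [r[0] for r in conn.execute("SELECT id FROM Items ORDER BY id")]
print(ids)  # [1, 2, 5, 6, 7, 8]
```

If a 3 really does appear in your table, something is supplying that value explicitly, since the engine itself would not go back and fill the gap.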