Get max value for identity column without a table scan - sql

I have a table with an Identity column Id.
When I execute:
select max(Id) from Table
SQL Server does a table scan and stream aggregate.
My question is, why can it not simply look up the last value assigned to Id? It's an identity, so the information must be tracked, right?
Can I look this up manually?

You can use IDENT_CURRENT to look up the last identity value to be inserted, e.g.
IDENT_CURRENT('MyTable')
However, be cautious when using this function. A failed transaction can still increment this value, and, as Quassnoi states, this row might have been deleted.
It's likely that it does a table scan because it can't guarantee that the last identity value is the MAX value. For example the identity might not be a simple incrementing integer. You could be using a decrementing integer as your identity.

What if you have deleted the latest record?
The value of IDENTITY would not correspond to the actual data anymore.
If you want fast lookups for MAX(id), you should create an index on it (or, more likely, declare it a PRIMARY KEY).
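To see why the tracked counter and MAX(Id) can disagree, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption, not what the question uses): SQLite's sqlite_sequence table plays roughly the role of IDENT_CURRENT for AUTOINCREMENT columns. The table name items is hypothetical.

```python
import sqlite3


def counter_vs_max():
    """Return (last assigned id, MAX(id)) after deleting the newest row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)]
    )
    # Delete the most recently inserted row (id 3).
    conn.execute("DELETE FROM items WHERE id = 3")
    # The counter still remembers the highest id ever handed out.
    last_assigned = conn.execute(
        "SELECT seq FROM sqlite_sequence WHERE name = 'items'"
    ).fetchone()[0]
    # MAX(id) only sees rows that still exist.
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    conn.close()
    return last_assigned, max_id


print(counter_vs_max())  # (3, 2)
```

The counter reports 3 while the largest surviving id is 2, which is the gap described above.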

Is the table clustered on that column?
Can you use Top 1:
SELECT TOP 1 [ID]
FROM [Table]
order by ID desc

Run the following statement to generate one SELECT per user table. Remove the trailing UNION ALL from the generated output, then run that output to get the current identity values.
SELECT
' SELECT '+char(39)+[name]+char(39)+' AS table_name, IDENT_CURRENT('+char(39)+[name]+char(39)+') AS currvalue UNION ALL'
AS currentIdentity
FROM sys.all_objects WHERE type = 'U'

Is the Id the primary key or indexed? Seems like it should do a seek in those cases.

I'm pretty sure you could set up an index on that field in descending order and it would use that to find the largest key. It should be fast.

Related

How to sort sql server table on open in ssms

How would I sort a table in SQL Server when it is opened? My table has an autonumber field that increments sequentially and it is the primary key field also.
I'd like the table sorted by another numeric field, smallest to largest, and then have the autonumber field assign its numbers following the order of that other numeric field.
The image shows how the table is currently sorted. I'd like to have the table sorted when it opens from ssms by the second column.
Thanks,
Jeff
The order you see depends on the clustered index. By default, a primary key creates a clustered index on its column, so the rows are stored, and usually displayed, in that order.
If you want the table ordered by another column, you need to drop the clustered index and recreate it on the second column.
How would I sort a table in SQL Server when it is opened?
You cannot. SQL tables represent unordered (multi)sets. They have no inherent ordering. To get data in order, you need an explicit order by clause:
select t.*
from t
order by col2 asc;
For performance, you want an index on (col2).
If you declared the second column as an ascending clustered index, then in practice you would probably see smaller values first. However, even with a clustered index, SQL Server does not guarantee that a SELECT with no ORDER BY will return the rows in any order. Period. There is no guarantee.

Auto Increment selection SQLite

I have a column named id in my SQLite database which is auto-increment, Primary Key, Unique.
Is the result of the following query guaranteed to be the smallest value of id in the database and does this correspond to the "oldest" (as in a FIFO) row to be inserted?
SELECT id FROM table LIMIT 1
The SQLite documentation is quite explicit:
If a SELECT statement that returns more than one row does not have an
ORDER BY clause, the order in which the rows are returned is
undefined. Or, if a SELECT statement does have an ORDER BY clause,
then the list of expressions attached to the ORDER BY determine the
order in which rows are returned to the user.
The LIMIT is applied after an ORDER BY would be, so I don't think it affects the application of this statement.
Hence, if you want the first row, use ORDER BY:
SELECT id
FROM table
ORDER BY id
LIMIT 1;
Note that if id is a primary key, this will add basically no overhead.
I should emphasize that in practice you are probably going to get the smallest id without the ORDER BY. However, it is a really, really bad idea to depend on behavior that directly contradicts the documentation.
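A small sketch of the ORDER BY version, run through Python's sqlite3 module against an in-memory database. The events table and its rows are made up for illustration.

```python
import sqlite3


def smallest_id(conn):
    """Return the smallest id in events, or None if the table is empty."""
    row = conn.execute(
        "SELECT id FROM events ORDER BY id LIMIT 1"
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
# Insert ids out of order so the result cannot depend on insertion order.
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(5, "e"), (2, "b"), (9, "z")],
)

first_id = smallest_id(conn)
print(first_id)  # 2
```

Note that an empty table yields no row at all, which the helper maps to None.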
Is the result of the following query guaranteed to be the smallest value of id in the database
Yes. However, if the table is empty or the id column is NULL, it could also return NULL.
and does this correspond to the "oldest" (as in a FIFO) row to be inserted?
No, there's no guarantee of that.

SQL get last rows in table WITHOUT primary ID

I have a table with 800,000 entries and no primary key. I am not allowed to add a primary key, and I can't use TOP 1 ... ORDER BY ... DESC because it takes hours to complete. So I tried this workaround:
DECLARE @ROWCOUNT int, @OFFSET int
SELECT @ROWCOUNT = (SELECT COUNT(field) FROM TABLE)
SET @OFFSET = @ROWCOUNT-1
select TOP 1 FROM TABLE WHERE=?????NO PRIMARY KEY??? BETWEEN @OFFSET AND @ROWCOUNT
Of course this doesn't work.
Is there any way to use this code, or better code, to retrieve the last row in the table?
If your table has no primary key, or your primary key is not ordered, you can try the code below. If you want to see more of the last records, change the number in the code.
Select top (select COUNT(*) from table) * From table
EXCEPT
Select top ((select COUNT(*) from table)-(1)) * From table
I assume that when you say 'last rows', you mean 'last created rows'.
Even if you had a primary key, it still would not be the best way to determine the order in which rows were created.
There is no guarantee that the row with the bigger primary key value was created after the row with a smaller primary key value.
Even if the primary key is on an identity column, you can always override identity values on insert by using
SET IDENTITY_INSERT MyTable ON
It is a better idea to have a timestamp column, for example CreatedDateTime with a default constraint.
With an index on this column, your query would be simple, efficient and correct:
select top 1 *
from MyTable
order by CreatedDateTime desc
If you don't have a timestamp column, you can't determine the 'last rows'.
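Here is a rough sketch of the timestamp idea, using SQLite via Python's sqlite3 module as a stand-in for SQL Server (an assumption; LIMIT 1 takes the place of TOP 1). The my_table and created_at names and the sample rows are hypothetical. The row with id 3 is inserted last on purpose, to mimic an overridden identity value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        payload TEXT,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)
# Index the timestamp so the newest row can be found without a full scan.
conn.execute("CREATE INDEX ix_my_table_created_at ON my_table (created_at)")

# Explicit timestamps keep the example deterministic.
conn.executemany(
    "INSERT INTO my_table (id, payload, created_at) VALUES (?, ?, ?)",
    [
        (10, "first", "2024-01-01 09:00:00"),
        (11, "second", "2024-01-01 10:00:00"),
        (3, "third", "2024-01-01 11:00:00"),  # newest row, smallest id
    ],
)

latest = conn.execute(
    "SELECT id, payload FROM my_table ORDER BY created_at DESC LIMIT 1"
).fetchone()
max_id = conn.execute("SELECT MAX(id) FROM my_table").fetchone()[0]

print(latest)  # (3, 'third')
print(max_id)  # 11
```

Ordering by the timestamp finds the row that was really created last, while MAX(id) points at a different row.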
If you need to select 1 column from a table of 800,000 rows where that column is the min or max possible value, and that column is not indexed, then the unassailable fact is that SQL will have to read every row in the table in order to identify that min or max value.
(An aside, on the face of it reading all the rows of an 800,000 row table shouldn't take all that long. How wide is the column? How often is the query run? Are there concurrency, locking, blocking, or deadlocking issues? These may be pain points that could be addressed. End of aside.)
There are any number of workarounds (indexes, views, indexed views, periodically indexed copies of the table, running the query once and reusing the stored result for some period T before refreshing, etc.), but virtually all of them require making permanent modifications to the database. It sounds like you are not permitted to do this, and I don't think there's much you can do here without some such permanent change to the database (call it an improvement when you discuss it with your project manager).
You need to add an index. Can you?
Even if you don't have a primary key, an index will speed up the query considerably.
You say you don't have a primary key, but for your question I assume you have some type of timestamp or something similar on the table. If you create an index on this column, you will be able to execute a query like:
SELECT *
FROM table_name
WHERE timestamp_column_name=(
SELECT max(timestamp_column_name)
FROM table_name
)
If you're not allowed to edit this table, have you considered creating a view, or replicating the data in the table and moving it into one that has a primary key?
Sounds hacky, but then, your 800k row table doesn't have a primary key, so hacky seems to be the order of the day. :)
I believe you could write it simply as
SELECT * FROM table ORDER BY rowid DESC LIMIT 1;
Hope it helps.

which one is a faster/better sql practice?

Suppose I have a 2 column table (id, flag) and id is sequential.
I expect this table to contain a lot of records.
I want to periodically select the first row not flagged and update it. Some of the records on the way may have already been flagged, so I want to skip them.
Does it make more sense if I store the last id I flagged and use it in my select statement, like
select * from mytable where id > my_last_id order by id asc limit 1
or simply get the first unflagged row, like:
select * from mytable where flagged = 'F' order by id asc limit 1
Thank you!
If you create an index on flagged, retrieving an unflagged row should be pretty much an instant operation. If you always update them sequentially, then the first method is fine though.
Option two is the only one that makes sense unless you know that you're always going to process records in sequence!
Assuming MySQL, this one:
SELECT *
FROM mytable
WHERE flagged = 'F'
ORDER BY
flagged ASC, id ASC
LIMIT 1
will be slightly less efficient in InnoDB and of same efficiency in MyISAM, if you have an index on (flagged, id).
InnoDB tables are clustered on the PRIMARY KEY, so fetching the first record in id does not require looking up the table.
In MyISAM, tables are heap-organized, so the index used to police the PRIMARY KEY is stored separately from the table.
Note the flagged in the ORDER BY clause may seem to be redundant, but it is required for MySQL to pick the correct index.
Also, the composite index should be on (flagged, id) even in InnoDB (which implicitly includes the PRIMARY KEY into each index).
You could use
Select Min(Id) as 'Id'
From dbo.myTable
Where Flagged='F'
Assuming the Flagged = 'F' means that it is not flagged.
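To compare the two options from the question, here is a minimal sketch using SQLite through Python's sqlite3 module (an assumption; the thread itself discusses MySQL and SQL Server). The helper names next_by_last_id and claim_next, the index name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, flagged TEXT NOT NULL DEFAULT 'F')"
)
# Composite index so the first unflagged id can be located directly.
conn.execute("CREATE INDEX ix_mytable_flagged_id ON mytable (flagged, id)")
conn.executemany(
    "INSERT INTO mytable (id, flagged) VALUES (?, ?)",
    [(1, "T"), (2, "F"), (3, "T"), (4, "F")],
)


def next_by_last_id(conn, last_id):
    """Option one: next id after a remembered id, ignoring the flag."""
    row = conn.execute(
        "SELECT id FROM mytable WHERE id > ? ORDER BY id LIMIT 1", (last_id,)
    ).fetchone()
    return None if row is None else row[0]


def claim_next(conn):
    """Option two: flag the lowest-id unflagged row and return its id."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM mytable WHERE flagged = 'F' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE mytable SET flagged = 'T' WHERE id = ?", (row[0],))
        return row[0]


by_last_id = next_by_last_id(conn, 0)
claimed = [claim_next(conn) for _ in range(3)]

print(by_last_id)  # 1, a row that is already flagged
print(claimed)     # [2, 4, None]
```

Option one hands back row 1 even though it is already flagged, so it only works when rows are always flagged strictly in sequence. Option two skips flagged rows on its own.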

Does 'Select' always order by primary key?

A basic, simple question for all of you DBAs.
When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
I'm using Oracle as my DB.
No, if you do not use "order by" you are not guaranteed any ordering whatsoever. In fact, you are not guaranteed that the ordering from one query to the next will be the same. Remember that SQL is dealing with data in a set based fashion. Now, one database implementation or another may happen to provide orderings in a certain way but you should never rely on that.
When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
No, it's by far not guaranteed.
SELECT *
FROM table
most probably will use TABLE SCAN which does not use primary key at all.
You can use a hint:
SELECT /*+ INDEX(t pk_index_name) */
*
FROM table t
, but even in this case the ordering is not guaranteed: if you use Enterprise Edition, the query may be parallelized.
This is a problem, since ORDER BY cannot be used in a SELECT clause subquery and you cannot write something like this:
SELECT (
SELECT column
FROM table
WHERE rownum = 1
ORDER BY
other_column
)
FROM other_table
No, ordering is never guaranteed unless you use an ORDER BY.
The order that rows are fetched is dependent on the access method (e.g. full table scan, index scan), the physical attributes of the table, the logical location of each row within the table, and other factors. These can all change even if you don't change your query, so in order to guarantee a consistent ordering in your result set, ORDER BY is necessary.
It depends on your DB and also it depends on indexed fields.
For example, in my table Users every user has unique varchar(20) field - login, and primary key - id.
And "Select * from users" returns rowset ordered by login.
If you desire specific ordering then declare it specifically using ORDER BY.
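The users example can be sketched as follows, using SQLite via Python's sqlite3 module as a stand-in (an assumption; the question is about Oracle). The sample logins are made up. The order of the plain SELECT is deliberately not relied on anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, login VARCHAR(20) NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO users (id, login) VALUES (?, ?)",
    [(1, "zoe"), (2, "adam"), (3, "mike")],
)

# No ORDER BY: the engine may return these rows in any order it likes.
unordered = [r[0] for r in conn.execute("SELECT login FROM users")]

# With ORDER BY the order is part of the query's contract.
by_id = [r[0] for r in conn.execute("SELECT login FROM users ORDER BY id")]
by_login = [r[0] for r in conn.execute("SELECT login FROM users ORDER BY login")]

print(by_id)     # ['zoe', 'adam', 'mike']
print(by_login)  # ['adam', 'mike', 'zoe']
```

Only the two ordered queries have a result you can depend on. The unordered one is guaranteed to contain the same rows, nothing more.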
What if the table doesn't have primary key?
If you want your results in a specific order, always specify an ORDER BY.