What's the most efficient query to count table rows in SQL Server 2008?
For an accurate answer, the sane way to get the exact number of rows in a table in a normalized database is to execute:
select count(*) from table;
On a table without any indexes, the database will perform a full table scan. If you have indexed a non-null column, the database may use the (potentially much smaller) index to resolve your query.
If you need a faster solution, you have to keep track of the number of rows in the table without counting the rows themselves. For example, you can do this yourself in another table (as part of your carefully designed API, or by using triggers, etc.) or by creating an *indexed view for this purpose.
However, if you only need an approximation, and it is critical that you get the answer fast, others have shown ways to do that using the data dictionary (see the answers by John and Martin).
*Disclaimer: I haven't actually used indexed views in SQL Server; I'm just reading the documentation and assuming they can do the same thing as a materialized view in Oracle.
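For what it's worth, a minimal sketch of such an indexed view (assuming a table dbo.MyTable with a non-null Status column; all names here are illustrative) might be:
-- Hypothetical names: dbo.MyTable with a Status column.
CREATE VIEW dbo.vMyTableRowCount
WITH SCHEMABINDING
AS
SELECT Status, COUNT_BIG(*) AS RowCnt   -- COUNT_BIG is required in indexed views
FROM dbo.MyTable
GROUP BY Status;
GO
-- The unique clustered index materializes the view; the engine keeps it current.
CREATE UNIQUE CLUSTERED INDEX IX_vMyTableRowCount ON dbo.vMyTableRowCount (Status);
GO
-- Reading the total now touches only the tiny materialized result.
SELECT SUM(RowCnt) AS TotalRows FROM dbo.vMyTableRowCount WITH (NOEXPAND);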
SELECT COUNT(*) FROM TableName
Does that work for you?
SELECT COUNT(*) FROM table_name
will have a trivial plan. There is no scope to optimise this.
You can use WITH (NOLOCK) to reduce locking overhead, though at the risk of some inaccuracy.
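For example (the table name is a placeholder):
SELECT COUNT(*) FROM table_name WITH (NOLOCK)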
Or use
SELECT SUM(rows)
FROM sys.partitions
WHERE object_id = OBJECT_ID('foo') AND index_id < 2
However this also will not give you a transactionally consistent answer.
Related
I would like to understand how exactly SQL COUNT works. Is it a whole table scan that happens, or is some property of the table read? I feel a table scan would be an overhead in the case of huge tables with lots of records.
In general either a table or index scan is performed. This is chiefly because in an MVCC-supporting engine, different transactions could see different rows, so there is no single "row count" which is simultaneously correct for everyone.
Likewise, if you have a WHERE clause, then the where condition could be different for different clients, so they see different numbers.
If you need to do a lot of counts of large tables, consider storing your own counters in a different table. Exactly how you do this is entirely application specific.
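As one hedged sketch of that counter-table idea in SQL Server (every object name below is made up for illustration):
-- Illustrative objects; dbo.MyTable is the table being counted.
CREATE TABLE dbo.RowCounts (TableName sysname PRIMARY KEY, Cnt bigint NOT NULL);
GO
CREATE TRIGGER trg_MyTable_RowCount ON dbo.MyTable
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- inserted/deleted are the standard trigger pseudo-tables.
    UPDATE dbo.RowCounts
    SET Cnt = Cnt + (SELECT COUNT(*) FROM inserted)
                  - (SELECT COUNT(*) FROM deleted)
    WHERE TableName = N'MyTable';
END;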
This will depend very much on which SQL implementation you are using (MS SQL Server, MySQL, Oracle, PostgreSQL etc), and how clever its optimiser is.
It may also depend on the query. For example, with something like
SELECT COUNT(primary_key) FROM table;
the optimiser may realise that there is no need to scan the table (since there is no filtering with WHERE and no possibility that any values are NULL) and just return the size of the table. With a more complicated query (where there is filtering, or the possibility of NULLs), the database may have to scan the table, or it may be able to do some optimisation with the use of an index.
This is obviously implementation dependent (i.e. different RDBMSs may employ different strategies) and usage dependent (e.g. select count(*) from mytable and select count(*) from mytable where myfield < somevalue may use different methods, even in the same DB).
If you are trying to get the count based on some partitioning that is already expressed by an index, smart DBs will try to use the index alone. Or something like the old "Rushmore" optimization used in FoxBASE.
So, "it depends", but at the end of the day, if no better methods are available, yes, the DB will perform a table scan.
It is usually some sort of index scan, unless there is no unique index on the table.
Strangely enough, most database engines can only count by scanning. They do provide alternate ways to count using table metadata - for instance, SQL Server supports SELECT rowcnt FROM sysindexes .... However, these are usually not 100% accurate.
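Spelled out, that query presumably looks something like this ('MyTable' is a placeholder):
SELECT rowcnt
FROM sysindexes
WHERE id = OBJECT_ID('MyTable')   -- 'MyTable' is a placeholder
  AND indid < 2                   -- heap (0) or clustered index (1) only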
Yes, the COUNT function does a table scan. Rather than using COUNT on the table to get the total number of rows, you can use:
SELECT Total_Rows = SUM(st.row_count)
FROM sys.dm_db_partition_stats st
WHERE object_name(st.object_id) = 'TABLENAME'
  AND st.index_id < 2   -- heap/clustered index only, to avoid double counting rows
or
SELECT sysobjects.[name], max(sysindexes.[rows]) AS TableRows
FROM sysindexes INNER JOIN sysobjects ON sysindexes.[id] = sysobjects.[id]
WHERE sysobjects.xtype = 'U' and sysobjects.[name]='tablename'
GROUP BY sysobjects.[name]
ORDER BY max(rows) DESC
Other ways to get the total count: http://www.codeproject.com/Tips/58796/Number-of-different-way-to-get-total-no-of-row-fro.aspx
It depends on the DBMS used.
If there is an index, there should be one index row for each table row. A smart DBMS will likely choose the smallest index and count the index rows.
Finally, if the table is small enough, it may count the table rows and bypass the index.
In PostgreSQL, a table scan is performed.
I think it's implementation dependent.
Edit:
See this link
It really doesn't matter!
I assume you want the row count for some sort of paging... so just make sure your paging algorithm follows best practices, and forget about how the engine works.
Let the people in the database business care about this; just follow the recommendations of those who are experts in the database you are using.
SQL Server - https://web.archive.org/web/20211020131201/https://www.4guysfromrolla.com/webtech/042606-1.shtml
Oracle - Paging with Oracle
MySQL - http://php.about.com/od/phpwithmysql/ss/php_pagination.htm
Recently, I came across a pattern (not sure - it could be an anti-pattern) for sorting data in a SELECT query. The pattern is a verbose, non-declarative way of ordering data: dump the relevant data from the actual table into a temporary table, and then apply ORDER BY to a field of the temporary table. I guess the only reason someone would do that is to improve performance (which I doubt), and there is no other benefit.
For example, let's say there is a Users table. The table might contain millions of rows. We want to retrieve all the users whose first name starts with 'G', sorted by first name. The natural and more declarative way to implement an SQL query for this scenario is:
More natural and declarative way
SELECT * FROM Users
WHERE NAME LIKE 'G%'
ORDER BY Name
Verbose way
SELECT * INTO TempTable
FROM Users
WHERE NAME LIKE 'G%'
SELECT * FROM TempTable
ORDER BY Name
With that context, I have a few questions:
1. Will there be any performance difference between the two ways if there is no index on the first-name field? If yes, which one would be better?
2. Will there be any performance difference between the two ways if there is an index on the first-name field? If yes, which one would be better?
3. Should the SQL Server optimizer not generate the same execution plan for both ways?
4. Is there any benefit to writing the verbose way from any other perspective, like locking/blocking?
Thanks in advance.
Regularly: an anti-pattern used by people with no idea what they are doing.
Sometimes: OK, because SQL Server has a problem that is not resolvable otherwise - I haven't seen that one in years, though.
It makes things slower because it forces the tempdb table to be fully populated FIRST, while otherwise the query could POSSIBLY be resolved more efficiently.
The last time I saw that was about 3 years ago. We made it 3 times as fast by not "being smart" with a tempdb table ;)
Answers:
1: No - it still needs a table scan either way, obviously.
2: Possibly - it depends on the amount of data, but an index seek would return the data in order already (as the index is ordered by its content).
3: No, obviously. Query plan optimization is statement by statement. By cutting the execution in two, the query optimizer cannot fold the second statement into the first.
4: Only if you run into a query optimizer issue or a limitation on how many tables you can join - not in that degenerate case (degenerate in the technical sense, i.e. very simplistic). But if you need to join many, many tables, it may be better to go with an interim step.
If the field you want to order by is not indexed, you could put everything into a temp table, index it, and then do the ordering; it might be faster. You would have to test to make sure.
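A sketch of that test, with illustrative names (whether the index actually avoids the sort has to be measured):
SELECT * INTO #TempUsers FROM Users WHERE Name LIKE 'G%';
CREATE INDEX IX_TempUsers_Name ON #TempUsers (Name);   -- index the sort column
SELECT * FROM #TempUsers ORDER BY Name;                -- may or may not avoid a sort; measure it
DROP TABLE #TempUsers;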
There is never any benefit of the second approach that I can think of.
It means that if the data is available pre-ordered, SQL Server can't take advantage of this, and adds an unnecessary blocking operator and an additional sort to the plan.
In the case that the data is not available pre-ordered SQL Server will sort it in a work table either in memory or tempdb anyway and adding an explicit #temp table just adds an unnecessary additional step.
Edit
I suppose one case where the second approach could give an apparent benefit might be if the presence of the ORDER BY caused SQL Server to choose a different plan that turned out to be suboptimal. In that case I would resolve it differently, by either improving statistics or using hints/query rewrites to avoid the undesired plan.
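For instance, a hedged sketch of that alternative, reusing the names from the example above:
-- Refresh statistics, then force a fresh plan for this statement.
UPDATE STATISTICS dbo.Users;
SELECT * FROM dbo.Users
WHERE Name LIKE 'G%'
ORDER BY Name
OPTION (RECOMPILE);   -- or a targeted index hint, if a specific index is known to be good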
I have an SQL query which fetches the first N rows of a table that is designed as a low-level queue:
select top N * from my_table where status = 0 order by date asc
The intention behind this query is as follows:
First, this question is intended to be database agnostic, as my implementation will support SQL Server, Oracle, DB2, and Sybase. The "TOP N" syntax above is just an example.
The table can contain millions of rows.
N is a relatively small number in comparison, e.g. 100.
status is 0 when the row is in the queue. Later it is changed to 1 to indicate that it is in processing. After processing it is deleted. So it is expected that at least 90% of the rows in the table will have status 0.
rows in the table should be fetched according to their date, hence the order by clause.
What is the optimal index to make this query run fastest?
I initially thought the index should be on (date, status), but I am not sure about it anymore. Since the status column will contain mostly zeros, is there added value in including it? Would it be sufficient to index by (date) alone?
Or maybe it should be (status, date)?
I don't think there is an efficient solution that will be RDBMS independent. For example, Oracle has bitmap indexes, SQL Server has filtered indexes, and I see no reason not to use them just because, for instance, MySQL or SQLite has nothing similar. Also, historically SQL Server implements clustered tables (or IOTs in the Oracle world) far better than Oracle does, so having a clustered index on the date column may work perfectly for SQL Server, but not for Oracle.
I'd rather change the approach a bit. If you say 90% of rows don't satisfy the status = 0 condition, why not try refactoring the schema and adding a new table (or materialized view) that holds only the records you are interested in? The number of new programmable objects required for keeping that table up to date and merging data with the original table is relatively small, even if the RDBMS doesn't support materialized views directly. Also, if it's possible to redesign the underlying logic so rows are never updated, only inserted or deleted, then it will help avoid lock contention, and as a result the whole system will have better performance.
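For the SQL Server branch of this, the filtered indexes mentioned above (available from SQL Server 2008) are one sketch of that idea, using the names from the question:
-- Index only the queue's pending rows, kept in date order.
CREATE NONCLUSTERED INDEX IX_my_table_pending
ON dbo.my_table ([date])
WHERE status = 0;
The TOP N query on status = 0 ordered by date can then read this small, date-ordered index directly.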
Have a clustered index on Date and a non-clustered index on Status.
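In DDL terms that would be something like this (assuming SQL Server syntax, illustrative index names, and no pre-existing clustered primary key):
CREATE CLUSTERED INDEX IX_my_table_date ON dbo.my_table ([date]);
CREATE NONCLUSTERED INDEX IX_my_table_status ON dbo.my_table (status);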
I found a table with 50 thousand records, and it takes one minute to fetch the data from the SQL Server table just by issuing a simple SELECT. There is a primary key, which means a clustered index already exists. I just do not understand why it takes one minute. Besides indexing, what ways are there to optimize a table to get the data faster? What do I need to do in this situation for a faster response? Also, tell me how to always write optimized SQL. Please explain all the steps for optimization in detail.
Thanks.
The fastest way to optimize indexes in a table is to use the SQL Server Database Engine Tuning Advisor. Take a look here: http://www.youtube.com/watch?v=gjT8wL92mqE
Select only the columns you need, rather than select *. If your table has some large columns e.g. OLE types or other binary data (maybe used for storing images etc) then you may be transferring vastly more data off disk and over the network than you need.
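For example (column names are assumed for illustration):
-- Only the needed columns travel off disk and over the network;
-- wide/binary columns (photo, CV, etc.) stay untouched.
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employee;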
As others have said, an index is no help to you when you are selecting all rows (no where clause). Using an index would be slower in such cases because of the index read and table lookup for each row, vs full table scan.
If you are running select * from employee (as per question comment) then no amount of indexing will help you. It's an "Every column for every row" query: there is no magic for this.
Adding a WHERE clause usually won't help for a SELECT * query either.
What you can check is index and statistics maintenance. Do you do any? Here's a Google search
Or change how you use the data...
Edit:
Why a WHERE clause usually won't help...
If you add a WHERE that is not on the PK:
you'll still need to scan the table unless you add an index on the searched column
then you'll need a key/bookmark lookup unless you make it covering
with SELECT * you need to add all columns to the index to make it covering
for many hits, the index will probably be ignored to avoid key/bookmark lookups.
Unless there is a network issue or the like, the problem is reading all the columns, not the lack of a WHERE clause.
If you did SELECT col13 FROM MyTable and had an index on col13, the index would probably be used.
For SELECT * FROM MyTable WHERE DateCol < '20090101' with an index on DateCol that matched 40% of the table, the index would probably be ignored, or you'd have expensive key/bookmark lookups.
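To illustrate the covering idea from the list above with a narrower query than SELECT * (a hypothetical sketch; INCLUDE requires SQL Server 2005 or later):
-- Every selected column must live in the index to make it covering.
CREATE NONCLUSTERED INDEX IX_MyTable_DateCol
ON dbo.MyTable (DateCol)
INCLUDE (col13);
-- This can now be answered from the index alone, with no key/bookmark lookups:
SELECT col13 FROM dbo.MyTable WHERE DateCol < '20090101';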
Irrespective of the merits of returning the whole table to your application, that does sound like an unexpectedly long time to retrieve just 50,000 rows of employee data.
Does your query have an ORDER BY or is it literally just select * from employee?
What is the definition of the employee table? Does it contain any particularly wide columns? Are you storing binary data such as their CVs or employee photo in it?
How are you issuing the SQL and retrieving the results?
What isolation level are your SELECT statements running at? (You can use SQL Profiler to check this.)
Are you encountering blocking? Does adding NOLOCK to the query speed things up dramatically?