What makes a COUNT(*) query run for 30 seconds? [duplicate] - sql

I have a MS SQL table with over 250 million rows. Whenever I execute the following query
SELECT COUNT(*) FROM table_name
it takes over 30 seconds to get the output. Why does it take so long? Does it actually count the rows every time I query? Until now I've assumed that it stores this information somewhere, probably in the table metadata (I'm not sure table metadata even exists).
Also, I would like to know whether this query is I/O, processor, or memory intensive.
Thanks

Every time you execute SELECT COUNT(*) FROM table, SQL Server actually goes through the table and counts all rows. To get an estimated row count for one or more tables, you can run the following query, which reads stored metadata and returns in under a second.
SELECT OBJECT_NAME(st.[object_id]) AS TableName, st.row_count
FROM sys.dm_db_partition_stats st
WHERE st.index_id < 2
ORDER BY st.row_count DESC
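Note that for partitioned tables the query above returns one row per partition. A sketch of a variant, using the same DMV, that sums them per table:
SELECT OBJECT_NAME([object_id]) AS TableName, SUM(row_count) AS row_count
FROM sys.dm_db_partition_stats
WHERE index_id < 2
GROUP BY [object_id]
ORDER BY SUM(row_count) DESC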
Read more about it here http://technet.microsoft.com/en-us/library/ms187737.aspx

No, SQL Server doesn't store this information; it computes the count on every query. It can cache the execution plan to improve performance, though. So if you want to get results quickly, you need at least a primary key.
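For illustration, a minimal sketch (the index and column names are hypothetical): with a narrow nonclustered index in place, the optimizer can satisfy COUNT(*) by scanning that smaller index instead of the whole table.
-- hypothetical narrow index; COUNT(*) can be answered from any index's leaf level
CREATE NONCLUSTERED INDEX IX_table_name_narrow ON dbo.table_name (id);

SELECT COUNT(*) FROM dbo.table_name; -- the plan should now scan the narrow index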

As for what SQL Server is doing and how expensive it is, you can look this up yourself. In SSMS, enable the execution plan option for the query and run a SELECT COUNT(*). You will see that the server actually does an index scan (a full table scan). (I would have expected the PK to be used for that, but in my test case it used some other non-clustered index.)
To get a feel for the cost, right-click your query editor window, select Query Options... -> Execution -> Advanced and activate the check boxes for SET STATISTICS TIME and SET STATISTICS IO. The Messages tab will contain information regarding I/O and timing after you re-execute the SELECT statement.
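The same two options can also be switched on directly in T-SQL for the session:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM table_name; -- the query from the question

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;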
Also note that a SELECT COUNT(*) is quite aggressive in terms of the shared locks it takes; to guarantee the result, the whole table is locked with a shared lock.
A very fast, lock-free alternative is to use the metadata for the table. The count you get from the metadata is almost always accurate, but there is no guarantee.
USE <database_name,,>
GO
SELECT ddps.row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND o.name = '<your_table,,>'
INNER JOIN sys.dm_db_partition_stats AS ddps
ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.index_id = 1
This is an SSMS template. Copy it into a query window and hit CTRL+SHIFT+M to get a dialog that asks you for values for database_name and your_table.

If you are looking for approximate counts on tables and your version is SQL Server 2005 or later, you can simply use:
SELECT t.NAME AS 'TableName'
,s.Name AS 'TableSchema'
,p.rows AS 'RowCounts'
FROM sys.tables t
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i
ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p
ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE
t.is_ms_shipped = 0
GROUP BY
t.NAME, s.Name, p.Rows
ORDER BY
s.Name, t.Name
Doing a count(*) would only consume a small amount of memory/processor. It isn't that big of an operation in terms of database functions.

Related

Verify SQL Server statistics

From my ASP.NET app using EF Core I get a query to the database that has quite a few joins (13). When I execute it, it works - but it takes 25 seconds to complete.
However, when I enable "Legacy Cardinality Estimation" in the database options, execution is practically instant. As I understand it, cardinality estimation is based on statistics, so I executed exec sp_updatestats. While that once helped on the same database (with a different query), this time it did not.
So the first question that comes to mind is: how do I verify that the statistics are correct? And if they are, why would the cardinality estimator make bad choices?
Or, more generally: how do I approach this problem without resorting to the above-mentioned option (turning on something "legacy" doesn't sound right)?
A good place to start is checking the indexes' rowcnt/rows values; you can usually see the issue there. Fixing it means either updating statistics or installing the latest CU (cumulative update) or SP (service pack), and even then it may not be patched yet.
I had this problem on 2014 SP2 and SP3; the service packs alone did not fix the cardinality estimator. You need to get the latest cumulative update.
declare
@table_name varchar(128) = 'TableName'
,@schema_name varchar(128) = 'dbo'
/* 2014 and after, I believe */
select max(p.rows) rows from sys.tables t
inner join sys.indexes i on t.object_id = i.object_id
inner join sys.partitions p on t.object_id = p.object_id and p.index_id = i.index_id -- current-ish
where t.name = @table_name and schema_name(t.schema_id) = @schema_name
/* 2008 to 2012, I believe */
select max(i.rows)
from sys.tables t
inner join sys.sysindexes i on i.id = t.object_id -- the old dbo.sysindexes becomes a view
where t.name = @table_name and schema_name(t.schema_id) = @schema_name
/* 2000 and possibly earlier */
select max(i.rows)
from sys.tables t
inner join dbo.sysindexes i on i.id = t.object_id -- a real live table, still selectable
where t.name = @table_name and schema_name(t.schema_id) = @schema_name
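To check how fresh the statistics themselves are, a sketch using sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 / 2012 SP1 onward; the table name is a placeholder):
select s.name as stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter -- modifications since the last stats update
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id(N'dbo.TableName');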
Unfortunately, when it comes to the CE, everything is common ground. You said it yourself: 25 seconds vs. under 1 second, with the same stats (updated or not), the same indexes, the same query, and so on.
I've worked quite a lot with CE choices, and it turns out that (after stats, indexes and all the standard recommended actions) one estimator may work very well and the other very badly. This can happen with the very same query differing by one filter, or with two different queries producing exactly equal results.
In most cases both perform well or acceptably. Sometimes, however, one of them screws up completely. Then you must either use the legacy 2012 CE or make your statement 'CE 2014 friendly'.
Or, if your statements have a clear identity in your code, for example 'mystatementA021', you can keep your own stats and select CE 2012 or 2014 per statement. But that is a longer story.
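If you want per-query control rather than flipping the database-wide option, a sketch (the query shape and names are illustrative, not from the question; USE HINT requires SQL Server 2016 SP1 or later, while trace flag 9481 is the 2014-era equivalent and needs sysadmin):
SELECT t.SomeColumn
FROM dbo.SomeTable AS t
INNER JOIN dbo.OtherTable AS o ON o.SomeTableId = t.Id
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
-- on SQL Server 2014: OPTION (QUERYTRACEON 9481);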

Sql query to get total records for specific table available in multiple databases on same server

I have a requirement to find the total number of records in a specific table, "ENTITY", across all the databases we have on our SQL Server. I want to write a single query that gets this table's record counts from multiple databases. Can someone please help me in this regard?
Thanks.
Option One
SELECT SUM(P.[Rows]) AS [Count]
FROM SYS.objects O
JOIN SYS.partitions P
ON P.[object_id] = O.[object_id]
AND P.index_id < 2
WHERE O.name = 'Table'
Option Two
SELECT COUNT(1)
FROM dbo.Table
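Both options, as written, count only within the current database. To run the count in every database on the server, a sketch using the undocumented sp_MSforeachdb, where '?' expands to each database name (the ENTITY table name comes from the question):
EXEC sp_MSforeachdb '
IF EXISTS (SELECT 1 FROM [?].sys.tables WHERE name = N''ENTITY'')
    SELECT ''?'' AS DatabaseName, SUM(p.rows) AS [Count]
    FROM [?].sys.partitions AS p
    INNER JOIN [?].sys.objects AS o ON p.[object_id] = o.[object_id]
    WHERE o.name = N''ENTITY'' AND p.index_id < 2;';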

Improving the performance of an SQL query

I have a SQL Server table containing around 50,000,000 rows and I need to run the following two queries on it:
SELECT Count(*) AS Total
FROM TableName WITH(NOLOCK)
WHERE Col1 = 'xxx' AND Col2 = 'yyy'
then
SELECT TOP 1 Col3
FROM TableName WITH(NOLOCK)
WHERE Col1 = 'xxx' AND Col2 = 'yyy'
ORDER BY TableNameId DESC
The table has the following structure:
dbo.TableName
TableNameId int PK
Col1 varchar(12)
Col2 varchar(256)
Col3 int
Timestamp datetime
As well as running queries on it, there are loads of inserts going into the table every second, hence the NOLOCK. I've tried creating the following index:
CREATE NONCLUSTERED INDEX IX_Col1_Col2 -- index name is illustrative
ON dbo.TableName (Col1, Col2)
INCLUDE (TableNameId, Col3);
I need these queries to return results as quickly as possible (1 second max). At this stage I still have the ability to restructure the table, as the data isn't live yet, and I can also get rid of the Timestamp field if I need to.
Firstly, make TableNameId a key column in your index rather than an included column; key columns can be given a sort direction, so you can specify descending.
That should speed things up for your TOP 1 ... ORDER BY TableNameId DESC.
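A sketch of that index (the name is hypothetical):
CREATE NONCLUSTERED INDEX IX_Col1_Col2_IdDesc
ON dbo.TableName (Col1, Col2, TableNameId DESC)
INCLUDE (Col3);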
Secondly, check how much of the time is I/O (SET STATISTICS IO ON) and how much is CPU (SET STATISTICS TIME ON).
If it is I/O, there's not much you can do, because you really do have to move through a lot of data.
If you'd like a very fast estimate of how many rows are in the table, try querying sys.dm_db_partition_stats:
-- Report how many rows are in each table
SELECT o.name AS [Table Name], ddps.row_count
FROM sys.indexes AS i WITH (NOLOCK)
INNER JOIN sys.objects AS o WITH (NOLOCK)
    ON i.[object_id] = o.[object_id]
INNER JOIN sys.dm_db_partition_stats AS ddps WITH (NOLOCK)
    ON i.[object_id] = ddps.[object_id]
    AND i.index_id = ddps.index_id
WHERE i.index_id < 2
AND o.is_ms_shipped = 0 -- Remove to include system objects
AND ddps.row_count > 0
ORDER BY ddps.row_count DESC
If a table has multiple partitions, this returns one row per partition, so you may need to SUM the row_count.
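A sketch of that variant, summing row_count across partitions per table:
SELECT o.name AS [Table Name], SUM(ddps.row_count) AS row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o ON i.[object_id] = o.[object_id]
INNER JOIN sys.dm_db_partition_stats AS ddps
    ON i.[object_id] = ddps.[object_id] AND i.index_id = ddps.index_id
WHERE i.index_id < 2 AND o.is_ms_shipped = 0
GROUP BY o.name
ORDER BY SUM(ddps.row_count) DESC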
However, if you need an accurate count, you will need to count the rows, and this will take a while. Also, you may get the error "Could not continue scan with NOLOCK due to data movement."
You didn't mention how long your indexed query is running. The index and your query look fine to me.

SQL Server Count is slow

Counting tables with a large amount of data can be very slow, sometimes taking minutes; it can also cause deadlocks on a busy server. I want to display the real values, so NOLOCK is not an option.
The servers I use is SQL Server 2005 or 2008 Standard or Enterprise - if it matters.
I can imagine that SQL Server maintains a count for every table, so if there is no WHERE clause I should be able to get that number pretty quickly, right?
For example:
SELECT COUNT(*) FROM myTable
should return immediately with the correct value. Do I need to rely on statistics being updated?
A very close approximation (ignoring any in-flight transactions) would be:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND p.index_id IN (0,1);
This will return much, much more quickly than COUNT(*), and if your table is changing quickly enough, it's not really any less accurate: if your table has changed between when you started your COUNT (and locks were taken) and when the result was returned (when locks were released and all the waiting write transactions were allowed to write to the table again), is the locked count that much more valuable? I don't think so.
If you have some subset of the table you want to count (say, WHERE some_column IS NULL), you could create a filtered index on that column, and structure the WHERE clause one way or the other, depending on whether it is the exception or the rule (so create the filtered index on the smaller set). So, one of these two indexes:
CREATE INDEX IAmTheException ON dbo.table(some_column)
WHERE some_column IS NULL;
CREATE INDEX IAmTheRule ON dbo.table(some_column)
WHERE some_column IS NOT NULL;
Then you could get the count in a similar way using:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.indexes AS i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND i.name = N'IAmTheException'; -- or N'IAmTheRule'
And if you want to know the opposite, you just subtract from the first query above.
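For instance, a sketch of that subtraction (object names follow the examples above):
SELECT (SELECT SUM(p.rows)
        FROM sys.partitions AS p
        INNER JOIN sys.tables AS t ON p.[object_id] = t.[object_id]
        WHERE t.name = N'myTable' AND p.index_id IN (0,1))
     - (SELECT SUM(p.rows)
        FROM sys.partitions AS p
        INNER JOIN sys.indexes AS i
            ON p.[object_id] = i.[object_id] AND p.index_id = i.index_id
        WHERE i.name = N'IAmTheException') AS [OppositeCount];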
(How large is "large amount of data"? - should have commented this first, but maybe the exec below helps you out already)
If I run a query on a static (means no one else is annoying with read/write/updates in quite a while so contention is not an issue) table with 200 million rows and COUNT(*) in 15 seconds on my dev machine (oracle).
Considering the pure amount of data, this is still quite fast (at least to me)
As you said NOLOCK is not an option, you could consider
exec sp_spaceused 'myTable'
as well.
But this boils down to nearly the same thing as NOLOCK (it ignores contention and in-flight deletes/updates, as far as I know).
I've been working with SSMS for well over a decade and only in the past year found out that it can give you this information quickly and easily, thanks to this answer.
Select the "Tables" folder from the database tree (Object Explorer)
Press F7 or select View > Object Explorer Details to open Object Explorer Details view
In this view you can right-click on the column header to select the columns you want to see, including table space used, index space used and row count.
Note that the support for this in Azure SQL databases seems a bit spotty at best - my guess is that the queries from SSMS are timing out, so it only returns a handful of tables each refresh, however the highlighted one always seems to be returned.
COUNT will do either a table scan or an index scan, so for a high number of rows it will be slow. If you do this operation frequently, the best way is to keep the count in a separate table.
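A minimal sketch of that side-table approach (all object names are hypothetical, and note the trigger adds cost to every insert and delete):
CREATE TABLE dbo.TableRowCounts (TableName sysname PRIMARY KEY, RowCnt bigint NOT NULL);

CREATE TRIGGER trg_myTable_RowCount ON dbo.myTable
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- adjust the stored count by the net change of this statement
    UPDATE dbo.TableRowCounts
    SET RowCnt = RowCnt + (SELECT COUNT(*) FROM inserted)
                        - (SELECT COUNT(*) FROM deleted)
    WHERE TableName = N'myTable';
END;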
If, however, you do not want to do that, you can create a dummy index (one that will not be used by your queries) and query its number of items, something like:
select
row_count
from sys.dm_db_partition_stats as p
inner join sys.indexes as i
on p.index_id = i.index_id
and p.object_id = i.object_id
where i.name = 'your index'
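For completeness, a sketch of creating such a dummy index (the column and index names are hypothetical):
CREATE NONCLUSTERED INDEX IX_CountOnly -- narrow, otherwise-unused index
ON dbo.myTable (SomeNarrowColumn);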
I suggest creating a new index because, if it is not used by your queries, it will not get locked during other operations.
As Aaron Bertrand said, maintaining the extra index might be more costly than using an already existing one. So the choice is yours.
If you just need a rough count of the number of rows, e.g. to make sure a table loaded properly or that data was not deleted, the following works (note: this is MySQL):
MySQL> connect information_schema;
MySQL> select table_name,table_rows from tables;

Easy way to find out how many rows in total are stored within SQL Server Database?

I'm looking for an easy way to count all rows within one SQL Server 2005/2008 database (skipping the system tables, of course). I know I could use
SELECT COUNT(COLUMN) FROM TABLE
and do it for each table and then add it all up, but I would prefer some automated way. Is there one?
SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE index_id IN (0,1)
AND OBJECTPROPERTY([object_id], 'IsMsShipped') = 0;
This will be accurate except for, potentially, any rows that are being added or removed within a transaction at the time you run the query. And it won't have the expense of hitting individual tables.
But as I mentioned in another comment, I'm not sure how this helps you determine "how much data" your database holds. How many rows, sure, but if I have 10 glasses, each half full of water, and you have 5 glasses, each completely full, which of us has more water?
This was my answer to a similar question today:
SQL Server 2005 or later gives quite a useful report showing table sizes, including row counts etc. It's in Standard Reports, and it is Disk Usage by Table.
Programmatically, there's a nice solution at: http://www.sqlservercentral.com/articles/T-SQL/67624/
Try:
SELECT
[TableName] = so.name,
[RowCount] = MAX(si.rows)
FROM
sysobjects AS so,
sysindexes AS si
WHERE
so.xtype = 'U'
AND
si.id = OBJECT_ID(so.name)
GROUP BY
so.name
ORDER BY
2 DESC
These are the indexed row counts, so this is probably only an approximation: databases change a lot and some stuff might not be indexed, but it will be fast.
EDIT: Note that so.xtype = 'U' selects user objects, on the assumption that you do not want the system stuff, only "real" data.
EDIT2: no-flames note: it's probably a bad idea to query the sysobjects table directly :).
EDIT3: to specifically address the requirement, and with no comma joins :)
SELECT sum(mycount) from
(SELECT
MAX(si.rows) AS mycount
FROM
sysobjects AS so
join sysindexes AS si on si.id = OBJECT_ID(so.name)
WHERE
so.xtype = 'U'
GROUP BY
so.name
) as mylist
We know that sp_spaceused, when passed a table name, will return a row count, so we can examine what it does - it queries sys.dm_db_partition_stats - and copy it to get this:
SELECT
SUM(ddps.row_count) TotalRows
FROM
sys.indexes i
INNER JOIN sys.objects o ON i.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.dm_db_partition_stats ddps ON
o.OBJECT_ID = ddps.OBJECT_ID
AND i.index_id = ddps.index_id
WHERE
i.index_id < 2
AND o.is_ms_shipped = 0 -- to exclude system tables
Curious requirement though, I have to say...
You could query sysindexes, look at the rowcnt value for the clustered index on each table. But I'm not sure exactly how up to date that is.
Alternatively, something like this (briefly tested on a small test db):
CREATE TABLE #TableRowCount (TableName NVARCHAR(128), RowCnt BIGINT)
INSERT #TableRowCount (TableName, RowCnt)
EXECUTE sp_msforeachtable 'SELECT ''?'', COUNT(*) FROM ?'
SELECT SUM(RowCnt) AS TotalRowCount FROM #TableRowCount
DROP TABLE #TableRowCount
Check out the undocumented stored procedure sp_MSForEachTable. Using it you can run a Count(*) on every table in the database. For your specific issue:
EXEC sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'
I'm not sure if older versions of MS SQL have the information_schema SQL-standard data dictionary. Note that the table_rows column below exists in MySQL's information_schema.tables (SQL Server's INFORMATION_SCHEMA.TABLES does not expose row counts), where you can do something like:
SELECT SUM(table_rows)
FROM information_schema.tables
WHERE table_schema = 'DATABASENAME'