SQL Table Rowcount different from Select Count in SQL Server - sql

I am using Microsoft SQL Server.
I have a table that has just had 80 rows added to it.
If I right-click and look at the table properties, the row count says 10000, but a SELECT COUNT(id) FROM TableName returns 10080.
I checked the statistics and they also show a row count of 10080.
Why is there a difference between the row count in Properties and the SELECT COUNT?
Thanks,
S

This information most probably comes from the sysindexes table (see the documentation), and the
information in sysindexes isn't guaranteed to be up to date. This is a known issue in SQL Server.
Try running DBCC UPDATEUSAGE and check the values again.
Ref: http://msdn.microsoft.com/en-us/library/ms188414.aspx
DBCC UPDATEUSAGE corrects the rows, used pages, reserved pages, leaf pages and data page counts for each partition in a table or index. If there are no inaccuracies in the system tables, DBCC UPDATEUSAGE returns no data. If inaccuracies are found and corrected and WITH NO_INFOMSGS is not used, DBCC UPDATEUSAGE returns the rows and columns being updated in the system tables.
Example:
DBCC UPDATEUSAGE (0)
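To target the table in question directly and force an actual recount of the rows, a minimal sketch (the database and table names are placeholders):
-- Correct the metadata for one table and recount its rows
DBCC UPDATEUSAGE ('YourDatabase', 'dbo.TableName') WITH COUNT_ROWS;
-- Re-check what the metadata now reports
EXEC sp_spaceused 'dbo.TableName';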

Update the statistics. That's the only way the RDBMS knows the current state of your tables and indexes. This also helps the RDBMS choose the correct execution path for optimal performance.
SQL Server 2005
UPDATE STATISTICS dbOwner.yourTableName;
Oracle (which has no UPDATE STATISTICS statement; gather statistics via DBMS_STATS instead)
EXEC DBMS_STATS.GATHER_TABLE_STATS('yourSchema', 'yourTableName');

The property info is cached in SSMS.

There are a variety of ways to check the size of a table.
http://blogs.msdn.com/b/martijnh/archive/2010/07/15/sql-server-how-to-quickly-retrieve-accurate-row-count-for-table.aspx mentions four, of varying accuracy and speed.
The ever-reliable full table scan is a bit slow...
SELECT COUNT(*) FROM Transactions
and the quick alternative depends on statistics
SELECT CONVERT(bigint, rows)
FROM sysindexes
WHERE id = OBJECT_ID('Transactions')
AND indid < 2
It also mentions that the SSMS GUI uses the query
SELECT CAST(p.rows AS float)
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and idx.index_id < 2
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int)
AND p.index_id=idx.index_id
WHERE ((tbl.name=N'Transactions'
AND SCHEMA_NAME(tbl.schema_id)='dbo'))
and that a fast and relatively accurate way to size a table is
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Transactions')
AND (index_id=0 or index_id=1);
Unfortunately this last query requires extra permissions beyond basic select.
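That extra permission is VIEW DATABASE STATE; if granting it is acceptable in your environment, a one-line sketch (report_user is a placeholder database principal):
-- sys.dm_db_partition_stats requires VIEW DATABASE STATE in the database
GRANT VIEW DATABASE STATE TO report_user;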

Related

How to SELECT COUNT from a table currently being INSERTed into?

Hi, consider an INSERT statement running on a table TABLE_A that takes a long time; I would like to see how far it has progressed.
What I tried was to open a new session (a new query window in SSMS) while the long-running statement was still in progress and run the query
SELECT COUNT(1) FROM TABLE_A WITH (nolock)
hoping that it would return right away with the current number of rows every time I ran it, but in my test, even with (nolock), it only returned after the INSERT statement had completed.
What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first use CREATE TABLE TABLE_A and then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx without first creating TABLE_A, then none of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you of course. Personally I still prefer to insert in manageable batches if you really need to track progress of something like this. So I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be a trivial thing to do.
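As a rough sketch of what such batching could look like (the table and column names here are invented for illustration):
-- Copy rows in batches of 50,000; each committed batch is immediately
-- visible to a plain COUNT(*) from another session
DECLARE @batch int = 50000, @rows int = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO TABLE_A (id, col1, col2)
    SELECT TOP (@batch) s.id, s.col1, s.col2
    FROM SourceTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM TABLE_A AS t WHERE t.id = s.id);

    SET @rows = @@ROWCOUNT;   -- 0 once every source row has been copied
END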
If you are using SQL Server 2016, the live query statistics feature can allow you to see the progress of the insert in real time.
The screenshot below was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index and that this will be followed by a sort operator to get the values into nonclustered index key order before inserting into the NCI. The sort is a blocking operator: it cannot output any rows until all input rows are consumed, so the operators to its left are 0% done.
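Live query statistics is built on a DMV, so roughly the same per-operator progress can be read with a query, assuming SQL Server 2014 or later and that query profiling is enabled for the session being watched (session_id 55 below is a placeholder):
SELECT node_id,
       physical_operator_name,
       row_count,            -- rows produced by this operator so far
       estimate_row_count    -- the optimizer's estimate
FROM sys.dm_exec_query_profiles
WHERE session_id = 55
ORDER BY node_id;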
With respect to your question on NOLOCK: it is trivial to test.
Connection 1
USE tempdb
CREATE TABLE T2
(
X INT IDENTITY PRIMARY KEY,
F CHAR(8000)
);
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
LOOP:
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR delay '00:00:10';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
DROP TABLE T2
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't mean "take no locks" and can still be blocked: they still need a SCH-S (schema stability) lock on the table (and on a heap they will also take a HOBT lock).
The only lock incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran).
For the use case of a large insert, where an approximate in-flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non-deprecated alternative DMVs are available).
When an insert has a wide update plan you can even see it inserting into the various indexes in turn that way.
If the table is created inside the inserting transaction, this sysindexes query will still block, though, as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with NOLOCK instead.
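A sketch of that workaround, using the same T2 table as above (the NOLOCK hints mean the number read is approximate and may include uncommitted rows):
SELECT p.rows
FROM sys.tables AS t WITH (NOLOCK)
JOIN sys.partitions AS p WITH (NOLOCK)
    ON p.object_id = t.object_id
   AND p.index_id IN (0, 1)   -- heap or clustered index
WHERE t.name = 'T2';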
Use the query below to get the count in seconds for any large table, locked table, or table currently being inserted into. Just replace the table name you want to check.
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'TABLENAME' AND (index_id < 2)
For those who just need to see the record count while a long-running INSERT script is executing, I found you can see the current record count through SSMS by right-clicking the destination database table, then Properties -> Storage, and viewing the "Row Count" value like so:
Close the window and repeat to see the updated record count.

select count(1) vs select rowcnt from sysindexes performance and use?

I read on the net that case 2 is faster than case 1 for checking the number of rows in a table, so I did a performance test of COUNT(1) vs rowcnt from sys.sysindexes, and I found the second one is far better.
My question: is it OK to use case 2 in production code whenever I need to count the number of rows in a table, in stored procedures or ad-hoc queries? Is there any chance that case 2 may fail?
Edited: the number of rows in the table is nearly 20000 in this case.
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--CASE 1
SELECT count(1) from Sales.Customer c -- 95%
--CASE 2
SELECT rowcnt
from sys.sysindexes s
WHERE id=object_id('Sales.Customer') AND s.indid < 2 -- 5%
That system table only has the total number of rows in the table. So you cannot use it if you need to count any subset (i.e. have a WHERE clause).
According to this http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/9c576d2b-4a4e-4274-8986-dcc644542a65/ it reflects uncommitted data.
I've tried this, and it's true.
While you've got the transaction open, your count(*) would block if you weren't using one of the snapshot isolation levels, otherwise it would give you the correct, committed value.
Apart from that, it should be fine, handles bulk load, etc.
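If you want to see the uncommitted-data behaviour yourself, here is a quick sketch (dbo.DemoTable is a made-up table just for the test):
-- Session 1: leave an insert uncommitted
CREATE TABLE dbo.DemoTable (id int IDENTITY PRIMARY KEY, name varchar(50));
BEGIN TRAN;
INSERT INTO dbo.DemoTable (name) VALUES ('test');
-- do not commit yet

-- Session 2 (separate connection):
-- the metadata count already reflects the uncommitted row...
SELECT rowcnt FROM sys.sysindexes
WHERE id = OBJECT_ID('dbo.DemoTable') AND indid < 2;
-- ...while this blocks until session 1 commits or rolls back
-- (unless a snapshot isolation level is in use)
SELECT COUNT(1) FROM dbo.DemoTable;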

Why do I get "The log file for database 'tempdb' is full"

Suppose we have a payments table with 35 columns, a primary key (auto-incrementing bigint) and 3 non-clustered, non-unique indexes (each on one int column).
Among the table's columns we have two datetime fields:
payment_date datetime NOT NULL
edit_date datetime NULL
The table has about 1,200,000 rows.
Only ~1,000 rows have edit_date = NULL.
~9,000 rows have edit_date not null and not equal to payment_date.
The rest have edit_date = payment_date.
When we run the following query 1:
select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc
the server needs a couple of seconds to do it. But if we run query 2:
select top 1 *
from payments
where edit_date is not null
order by payment_date desc
the execution ends with "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."
If we replace * with a specific column, as in query 3:
select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc
it also finishes in a couple of seconds.
Where is the magic?
EDIT
I've changed query 1 so that it operates over exactly the same number of rows as the 2nd query, and it still returns in a second, while query 2 fills tempdb.
ANSWER
I followed the advice to add an index and did this for both date fields; everything started working quickly, as expected. The question, though, was why SQL Server behaves differently on these similar queries (query 1 vs query 2) in this exact situation; I wanted to understand the logic of the server's optimization. I would agree if both queries used tempdb similarly, but they didn't...
In the end I accepted the first answer, where I saw the likely symptoms of my problem as well as the first thoughts on how to avoid it (i.e. indexes).
This is happening because certain steps in an execution plan can trigger writes to tempdb, in particular certain sorts and joins involving lots of data.
Since you are sorting a table with a boatload of columns, SQL decides it would be crazy to perform the sort alone in tempdb without the associated data. If it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.
Follow these rules:
Try to select only the data you need
Size tempdb appropriately; if you need to do crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb
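In the asker's case those two rules translate into an index that serves both the filter and the sort, so the 1.2 million wide rows never have to be sorted through tempdb at all; a sketch of what the eventually-added index could look like (the index name is invented):
-- Lets TOP 1 ... ORDER BY payment_date DESC walk the index
-- instead of sorting the whole table in tempdb
CREATE NONCLUSTERED INDEX IX_payments_payment_date
    ON payments (payment_date DESC)
    INCLUDE (edit_date);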
Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth.
Many people think that tempdb is only used for #temp tables, when in fact you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;
DBCC DBREINDEX or similar DBCC commands with the 'Sort in tempdb' option set will also potentially fill up tempdb;
large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.
USE tempdb
GO
SELECT name
FROM tempdb..sysobjects
SELECT OBJECT_NAME(id), rowcnt
FROM tempdb..sysindexes
WHERE OBJECT_NAME(id) LIKE '#%'
ORDER BY rowcnt DESC
The higher rowcnt values will likely indicate the biggest temporary tables that are consuming space.
Short-term fix
DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')
DBCC INPUTBUFFER(<number>)
KILL <number>
Long-term prevention
-- SQL Server 7.0, should show 'trunc. log on chkpt.'
-- or 'recovery=SIMPLE' as part of status column:
EXEC sp_helpdb 'tempdb'
-- SQL Server 2000, should yield 'SIMPLE':
SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
ALTER DATABASE tempdb SET RECOVERY SIMPLE
Reference : https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references : http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76

Most efficient way to store queries and counts of large SQL data

I have a SQL Server database with a large amount of data (65 million rows mostly of text, 8Gb total). The data gets changed only once per week. I have an ASP.NET web application that will run several SQL queries on this data that will count the number of rows satisfying various conditions. Since the data gets changed only once per week, what is the most efficient way to store both the SQL queries and their counts for the week? Should I store it in the database or in the application?
If the data is only modified once a week, as part of and at the end of that (ETL?) process, perform your "basic" counts and store the results in a table in the database. Thereafter, rather than lengthy queries on the big tables, you can just query those small summary tables.
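A minimal sketch of that summary-table approach (all object names are placeholders); the final step of the weekly load re-populates a tiny counts table that the web application then queries:
-- One-time setup: a small table to hold the pre-computed counts
CREATE TABLE dbo.WeeklyCounts
(
    metric_name varchar(100) NOT NULL PRIMARY KEY,
    row_count   bigint       NOT NULL,
    computed_at datetime     NOT NULL DEFAULT GETDATE()
);

-- Run once at the end of the weekly data load
TRUNCATE TABLE dbo.WeeklyCounts;
INSERT INTO dbo.WeeklyCounts (metric_name, row_count)
SELECT 'all_rows', COUNT_BIG(*) FROM dbo.BigTable;
INSERT INTO dbo.WeeklyCounts (metric_name, row_count)
SELECT 'active_rows', COUNT_BIG(*) FROM dbo.BigTable WHERE status = 'active';

-- The ASP.NET application reads the tiny table instead of scanning 65 million rows
SELECT row_count FROM dbo.WeeklyCounts WHERE metric_name = 'active_rows';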
If you do not need 100% up-to-the-minute accurate row counts, you could query SQL Server's internal info:
Select so.name as 'TableName', si.rowcnt as 'RowCount'
from sysobjects so
inner join sysindexes si on so.id = si.id
where so.type = 'u' and indid < 2
Very quick to execute and no extra tables required. Not accurate where many updates are occurring but might be accurate enough in your intended usage. [Thank you to commenters!]
Update: did a bit of digging and this does produce accurate counts (slower due to the sum, but still quick):
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
OBJECT_NAME(ps.object_id) AS ObjectName,
SUM(ps.row_count) AS row_count
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i ON i.object_id = ps.object_id
AND i.index_id = ps.index_id
WHERE i.type_desc IN ('CLUSTERED','HEAP')
AND OBJECT_SCHEMA_NAME(ps.object_id) <> 'sys'
GROUP BY ps.object_id
ORDER BY OBJECT_NAME(ps.object_id), OBJECT_SCHEMA_NAME(ps.object_id)
Ref.
Remember that the stored count information was not always 100% accurate in SQL Server 2000. For a new table created on 2005 the counts will be accurate. But for a table that existed in 2000 and now resides on 2005 through a restore or update, you need to run (only once after the move to 2005) either sp_spaceused @updateusage = N'true' or DBCC UPDATEUSAGE with the COUNT_ROWS option.
The queries should be stored as stored procedures or views, depending on complexity.
For your situation I would look into indexed views.
They let you both store a query AND the result set for things like aggregation that otherwise cannot be indexed.
As a bonus, the query optimizer "knows" it has this data as well, so if you check for a count or something else stored in the view index in another query (even one not referencing the view directly) it can still use that stored data.
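As a sketch of what an indexed view for one of those counting queries might look like (the table and column names are invented; SCHEMABINDING and COUNT_BIG(*) are required for an indexed view with GROUP BY):
CREATE VIEW dbo.vRowCountsByStatus
WITH SCHEMABINDING
AS
SELECT status, COUNT_BIG(*) AS row_count
FROM dbo.BigTable
GROUP BY status;
GO
-- Materialises the view; the counts are then maintained automatically on every change
CREATE UNIQUE CLUSTERED INDEX IX_vRowCountsByStatus ON dbo.vRowCountsByStatus (status);
On Enterprise Edition the optimizer can match other queries to this view automatically; on other editions, query the view with the NOEXPAND hint.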

Simple query for generating database metrics?

Given a database (Sybase) with many tables, I would like to write a SQL query that will calculate, for each table, the number of rows and the number of columns.
Unfortunately, my SQL is a bit rusty. I can generate the table names:
select name from sysobjects where type = 'U'
but how to bind the table names returned to T in:
select count(*) from T
is beyond me. Is it even possible to do this kind of thing?
I don't use Sybase, but the online docs indicate the row counts are in systabstats and the columns are in syscolumns.
SELECT sysobjects.name,
(SELECT COUNT(*) FROM syscolumns WHERE syscolumns.id = sysobjects.id) AS cols,
systabstats.rowcnt
FROM sysobjects
JOIN systabstats
ON (sysobjects.id = systabstats.id AND sysobjects.type = 'U' AND systabstats.indid = 0)
As fredt has given the answer, I'll just provide some extra info.
The built in procedure sp_spaceused "tablename" will give you the number of rows for a selected table, along with details of how much storage space it's using. Used without the parameter it provides storage usage for the current database as a whole.
You can look at the SQL in the various system stored procedures to see where they get their information from. sp_spaceused and sp_help would both be useful for you in this. They live in the sybsystemprocs database. Just be careful not to modify any of those procedures.
There are various versions of a stored procedure called sp_rowcount floating around the internet, that provide what you ask for (rowcount anyway), but inside they are equivalent to the select statement from fredt. The one I use provides index count and table locking scheme. I don't recall exactly where I got mine so don't want to just distribute it in case I upset someone's copyright.
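For example (payments is just an illustrative table name):
-- Row count and storage for one table
sp_spaceused 'payments'
go
-- Storage summary for the current database as a whole
sp_spaceused
go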