I read on the net that Case 2 is faster than Case 1 for checking the number of rows in a table,
so I ran a performance test of COUNT(1) vs. rowcnt from sys.sysindexes, and I found
the second one far better.
My question: is it safe to use Case 2 in production code whenever I need
to count the rows in a table, in stored procedures or ad-hoc queries? Is there any chance
that Case 2 may fail?
Edit: the table has roughly 20,000 rows in this case.
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--CASE 1
SELECT COUNT(1) FROM Sales.Customer c -- 95%
--CASE 2
SELECT rowcnt
FROM sys.sysindexes s
WHERE id = OBJECT_ID('Sales.Customer') AND s.indid < 2 -- 5%
That system table only has the total number of rows in the table, so you cannot use it if you need to count any subset (i.e., if you have a WHERE clause).
According to this thread http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/9c576d2b-4a4e-4274-8986-dcc644542a65/ it reflects uncommitted data.
I've tried this, and it's true.
While you've got the transaction open, your COUNT(*) would block unless you were using one of the snapshot isolation levels, in which case it would give you the correct, committed value.
Apart from that, it should be fine; it handles bulk loads, etc.
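A minimal sketch of verifying that behaviour yourself, assuming a hypothetical table T whose columns all have defaults (e.g. an identity column):
-- Connection 1: insert inside a transaction and leave it open
BEGIN TRAN;
INSERT INTO T DEFAULT VALUES;   -- T is a hypothetical table
-- (no COMMIT yet)
-- Connection 2: rowcnt already reflects the uncommitted row...
SELECT rowcnt FROM sys.sysindexes WHERE id = OBJECT_ID('T') AND indid < 2;
-- ...while COUNT(*) blocks here until connection 1 commits or rolls back
SELECT COUNT(*) FROM T;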
Related
Hi, consider an INSERT statement running on a table TABLE_A which takes a long time; I would like to see how it has progressed.
What I tried was to open a new session (a new query window in SSMS) while the long-running statement was still in progress, and run the query
SELECT COUNT(1) FROM TABLE_A WITH (NOLOCK)
hoping that it would return right away with the current number of rows every time I ran it; but the test result was that even with (NOLOCK), it still only returns after the INSERT statement completes.
What have I missed? Do I add (NOLOCK) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx, without first creating TABLE_A, then none of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you, of course. Personally, I still prefer to insert in manageable batches if you really need to track the progress of something like this, so I would rewrite the INSERT as multiple smaller statements, along the lines of the sketch below. Depending on your implementation, that may be a trivial thing to do.
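A rough sketch of that batching approach; the source table SourceTable, its key column id, and the column list are all hypothetical stand-ins:
DECLARE @start INT = 0,
        @batch INT = 100000,
        @max   INT = (SELECT MAX(id) FROM SourceTable);  -- hypothetical source table

WHILE @start < @max
BEGIN
    INSERT INTO TABLE_A (id, col1)       -- placeholder column list
    SELECT id, col1
    FROM SourceTable
    WHERE id > @start AND id <= @start + @batch;

    SET @start += @batch;
    RAISERROR ('Processed ids up to %d', 0, 1, @start) WITH NOWAIT;
END
In autocommit mode each batch commits on its own, so you can watch the count climb from another session.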
If you are using SQL Server 2016, the Live Query Statistics feature can allow you to see the progress of the insert in real time.
The screenshot below was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index, and that this will be followed by a sort operator to get the values into nonclustered index key order before inserting into the NCI. The sort is a blocking operator that cannot output any rows until all input rows are consumed, so the operators to the left of it are 0% done.
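Live Query Statistics reads from sys.dm_exec_query_profiles (available since SQL Server 2014), which you can also query directly; the session_id below is a placeholder, and the monitored session needs profiling enabled (e.g. it ran SET STATISTICS XML ON, or trace flag 7412 is on in SQL Server 2016):
SELECT node_id,
       physical_operator_name,
       row_count,
       estimate_row_count,
       CAST(100.0 * row_count / NULLIF(estimate_row_count, 0)
            AS DECIMAL(5, 1)) AS pct_of_estimate
FROM sys.dm_exec_query_profiles
WHERE session_id = 55   -- placeholder: spid of the long-running insert
ORDER BY node_id;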
With respect to your question on NOLOCK
It is trivial to test
Connection 1
USE tempdb
CREATE TABLE T2
(
X INT IDENTITY PRIMARY KEY,
F CHAR(8000)
);
-- Spin until the first rows inserted by connection 2 become visible to a dirty read
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
LOOP:
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR delay '00:00:10';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
DROP TABLE T2
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example results (showing the row count increasing over successive polls).
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked: they still need a Sch-S (schema stability) lock on the table (and on a heap they will also take a HOBT lock).
The only thing incompatible with a Sch-S lock is a Sch-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps you created it in the same transaction).
For the use case of a large insert, where an approximate in-flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non-deprecated alternative DMVs are available).
When an insert has a wide update plan, you can even see it inserting into the various indexes in turn that way.
If the table is created inside the inserting transaction, this sysindexes query will still block, though, as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with NOLOCK instead, as sketched below.
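A sketch of that workaround, reusing the T2 table from the example above:
SELECT s.rowcnt
FROM sys.sysindexes AS s
WHERE s.id = (SELECT t.object_id
              FROM sys.tables AS t WITH (NOLOCK)
              WHERE t.name = 'T2')
  AND s.indid < 2;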
Use the query below to get the row count for any large table, locked table, or table currently being inserted into, in seconds. Just replace the table name you want to check.
SELECT Total_Rows = SUM(st.row_count)
FROM sys.dm_db_partition_stats st
WHERE OBJECT_NAME(st.object_id) = 'TABLENAME'
  AND st.index_id < 2
For those who just need to see the record count while executing a long-running INSERT script, I found you can see the current record count through SSMS by right-clicking on the destination table, then Properties -> Storage, and viewing the "Row Count" value.
Close the window and repeat to see the updated record count.
Say we have a payments table with 35 columns, a primary key (auto-increment bigint), and 3 non-clustered, non-unique indexes (each on one int column).
Among the table's columns we have two datetime fields:
payment_date datetime NOT NULL
edit_date datetime NULL
The table has about 1,200,000 rows.
Only ~1,000 rows have edit_date = NULL.
9,000 rows have edit_date not null and not equal to payment_date.
The rest have edit_date = payment_date.
When we run the following query 1:
select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc
the server needs a couple of seconds to execute it. But if we run query 2:
select top 1 *
from payments
where edit_date is not null
order by payment_date desc
the execution ends with the error "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."
If we replace * with a specific column, as in query 3:
select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc
it also finishes in a couple of seconds.
Where is the magic?
EDIT
I've changed query 1 so that it operates over exactly the same number of rows as the 2nd query, and still it returns in a second, while query 2 fills tempdb.
ANSWER
I followed the advice to add an index and did this for both date fields; everything started working quickly, as expected. Though the question was: why does SQL Server behave differently on similar queries (query 1 vs. query 2) in this exact situation? I wanted to understand the logic of the server's optimization. I would agree if both queries used tempdb similarly, but they didn't...
In the end I mark as the answer the first one, where I saw the must-be symptoms of my problem and, as well, the first thoughts on how to avoid it (i.e., indexes).
This is happening because certain steps in an execution plan can trigger writes to tempdb, in particular certain sorts and joins involving lots of data.
Since you are sorting a table with a boatload of columns, SQL Server decides it would be crazy to perform the sort alone in tempdb without the associated data; if it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.
Follow these rules:
Try to select only the data you need
Size tempdb appropriately; if you need to run crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb
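As the asker's edit above confirms, indexing the date columns resolves this; a sketch of that fix with hypothetical index names (an index on payment_date lets TOP 1 ... ORDER BY payment_date DESC read rows in order and stop early instead of sorting the whole table):
CREATE NONCLUSTERED INDEX IX_payments_payment_date
    ON payments (payment_date DESC);

CREATE NONCLUSTERED INDEX IX_payments_edit_date
    ON payments (edit_date);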
Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth.
Many people think that tempdb is only used for #temp tables, when in fact you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;
DBCC DBREINDEX or similar DBCC commands with the 'Sort in tempdb' option set will also potentially fill up tempdb;
large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.
To see what is currently occupying space in tempdb:
USE tempdb
GO
SELECT name
FROM tempdb..sysobjects
SELECT OBJECT_NAME(id), rowcnt
FROM tempdb..sysindexes
WHERE OBJECT_NAME(id) LIKE '#%'
ORDER BY rowcnt DESC
The higher rowcnt values will likely indicate the biggest temporary tables that are consuming space.
Short-term fix
DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')
DBCC INPUTBUFFER(<number>)
KILL <number>
Long-term prevention
-- SQL Server 7.0, should show 'trunc. log on chkpt.'
-- or 'recovery=SIMPLE' as part of status column:
EXEC sp_helpdb 'tempdb'
-- SQL Server 2000, should yield 'SIMPLE':
SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
ALTER DATABASE tempdb SET RECOVERY SIMPLE
Reference : https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references : http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76
Let's say I have an update such as:
UPDATE [db1].[sc1].[tb1]
SET c1 = LEFT(c1, LEN(c1)-1)
WHERE c1 like '%:'
This update is basically going to go through millions of rows and trim the colon if there is one in the c1 column.
How can I track how far along in the table this has progressed?
Thanks
This is SQL Server 2008.
You can use the sysindexes table, which keeps track of how much an index has changed. Because the update is one atomic statement, statistics won't get recalculated mid-way, so rowmodctr will keep growing. This is sometimes not noticeable in small tables, but for millions of rows it will show.
-- create a test table
create table testtbl (id bigint identity primary key clustered, nv nvarchar(max))
-- fill it up with dummy data. 1/3 will have a trailing ':'
insert testtbl
select
convert(nvarchar(max), right(a.number*b.number+c.number,30)) +
case when a.number %3=1 then ':' else '' end
from master..spt_values a
inner join master..spt_values b on b.type='P'
inner join master..spt_values c on c.type='P'
where a.type='P' and a.number between 1 and 5
-- (20971520 row(s) affected)
update testtbl
set nv = left(nv, len(nv)-1)
where nv like '%:'
Now, in another query window, run the query below continuously and watch rowmodctr going up and up. rowmodctr vs. rows gives you an idea of where you are up to, if you know where rowmodctr needs to end up; in this example, the update touches roughly a third of the 20-million-odd rows.
select rows, rowmodctr
from sysindexes with (nolock)
where id = object_id('testtbl')
Please don't run (nolock) counting queries on the table itself while it is being updated.
Not really... you can query with the NOLOCK hint and the same WHERE clause, but this will take resources (and it isn't an optimal query with a leading wildcard, of course).
Database queries, particularly Data Manipulation Language (DML), are atomic. That means the INSERT/UPDATE/DELETE either successfully occurs or it doesn't. There's no way to see which record is being processed: to the database, they have all been changed once the COMMIT is issued after the UPDATE. Even if you were able to view the records in process, by the time you saw the value, the query would have progressed on to other records.
The only way to know where you are in the process is to script the query to run within a loop, so you can use a counter to know how many rows have been processed. It's common to do this so large data sets are periodically committed, to minimize the risk of a failure requiring the entire query to be run over again.
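A rough sketch of such a loop for the update in the question; the batch size is arbitrary, and note that (unlike the single statement) a value ending in several colons would be trimmed repeatedly:
DECLARE @total BIGINT = 0,
        @rows  INT = 1,
        @msg   NVARCHAR(100);

WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) [db1].[sc1].[tb1]
    SET c1 = LEFT(c1, LEN(c1) - 1)
    WHERE c1 LIKE '%:';

    SET @rows  = @@ROWCOUNT;
    SET @total = @total + @rows;
    SET @msg   = CAST(@total AS NVARCHAR(20)) + N' rows updated so far';
    RAISERROR (@msg, 0, 1) WITH NOWAIT;
END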
I have a process that is converting dates from GMT to Australian Eastern Standard Time. To do this, I need to select the records from the database, process them and then save them back.
To select the records, I have the following query:
SELECT id,
user_id,
event_date,
event,
resource_id,
resource_name
FROM
(SELECT rowid id,
rownum r,
user_id,
event_date,
event,
resource_id,
resource_name
FROM user_activity
ORDER BY rowid)
WHERE r BETWEEN 0 AND 50000
to select a block of 50000 rows from a total of approx. 60 million rows. I am splitting them up because a) Java (which the update process is written in) runs out of memory with too many rows (I have a bean object for each row) and b) I only have 4 GB of Oracle temp space to play with.
In the process, I use the rowid to update each record (so I have a unique value) and the rownum to select the blocks. I then call this query in iterations, selecting the next 50000 records until none remain (the Java program controls this).
The problem I'm getting is that I'm still running out of Oracle temp space with this query. My DBA has told me that more temp space cannot be granted, so another method must be found.
I've tried substituting the subquery (which I presume is using all the temp space for the sort) with a view, but the explain plan using the view is identical to that of the original query.
Is there a different/better way to achieve this without running into the memory/temp space problems? I'm assuming an UPDATE query to update the dates (as opposed to a Java program) would suffer from the same temp space problem?
Your assistance on this is greatly appreciated.
Update
I went down the path of the PL/SQL block as suggested below:
declare
cursor c is select event_date from user_activity for update;
begin
for t_row in c loop
update user_activity
set event_date = t_row.event_date + 10/24 where current of c;
commit;
end loop;
end;
However, I'm running out of undo space. I was under the impression that if a commit was made after each update, the need for undo space would be minimal. Am I incorrect in this assumption?
A single update probably would not suffer from the same issue, and would probably be orders of magnitude faster. The large amount of temp tablespace is only needed because of the sorting. Although if your DBA is that stingy with the temp tablespace, you may end up running out of UNDO space or something else. (Take a look at ALL_SEGMENTS: how large is your table?)
But if you really must use this method, maybe you can use a filter instead of an order by. Create 1200 buckets and process them one at a time:
where ora_hash(rowid, 1200) = 1
where ora_hash(rowid, 1200) = 2
...
But this will be horribly, horribly slow. And what happens if a value changes halfway through the process? A single SQL statement is almost certainly the best way to do this.
Why not just one update or merge?
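For instance, with the same +10/24 shift the question uses, the whole job could be a single statement:
UPDATE user_activity
   SET event_date = event_date + 10/24;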
Or you can write an anonymous PL/SQL block that processes the data with a cursor.
For example:
declare
cursor c is select * from aa for update;
begin
for t_row in c loop
update aa
set val = t_row.val || ' new value'
where current of c;
end loop;
commit;
end;
How about not updating it at all?
rename user_activity to user_activity_gmt
create view user_activity as
select id,
user_id,
event_date+10/24 as event_date,
event,
resource_id,
resource_name
from user_activity_gmt;
I am using Microsoft SQL Server.
I have a table which has been updated with 80 new rows.
If I right-click and look at the table properties, the row count says 10,000, but a SELECT COUNT(id) FROM TableName indicates 10,080.
I checked the statistics and they also have a row count of 10,080.
Why is there a difference between the row count in Properties and the SELECT COUNT?
Thanks,
S
This information most probably comes from the sysindexes table (see the documentation), and the information in sysindexes isn't guaranteed to be up to date. This is a known fact in SQL Server.
Try running DBCC UPDATEUSAGE and check the values again.
Ref: http://msdn.microsoft.com/en-us/library/ms188414.aspx
DBCC UPDATEUSAGE corrects the rows, used pages, reserved pages, leaf pages and data page counts for each partition in a table or index. If there are no inaccuracies in the system tables, DBCC UPDATEUSAGE returns no data. If inaccuracies are found and corrected and WITH NO_INFOMSGS is not used, DBCC UPDATEUSAGE returns the rows and columns being updated in the system tables.
Example:
DBCC UPDATEUSAGE (0)
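To target a single table and force an actual recount of the rows rather than just reconciling page counts, there is also the COUNT_ROWS option (the table name here is a placeholder):
DBCC UPDATEUSAGE (0, 'TableName') WITH COUNT_ROWS;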
Update the statistics. That's the only way the RDBMS knows the current status of your tables and indexes. It also helps the RDBMS choose the correct execution path for optimal performance.
SQL Server 2005
UPDATE STATISTICS dbOwner.yourTableName;
Oracle
EXEC DBMS_STATS.GATHER_TABLE_STATS('YOURSCHEMA', 'YOURTABLENAME');
The property info is cached in SSMS.
There are a variety of ways to check the size of a table.
http://blogs.msdn.com/b/martijnh/archive/2010/07/15/sql-server-how-to-quickly-retrieve-accurate-row-count-for-table.aspx mentions 4 of varying accuracy and speed.
The ever-reliable full table scan is a bit slow:
SELECT COUNT(*) FROM Transactions
and the quick alternative depends on statistics
SELECT CONVERT(bigint, rows)
FROM sysindexes
WHERE id = OBJECT_ID('Transactions')
AND indid < 2
It also mentions that the SSMS GUI uses the query
SELECT CAST(p.rows AS float)
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and idx.index_id < 2
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int)
AND p.index_id=idx.index_id
WHERE ((tbl.name=N'Transactions'
AND SCHEMA_NAME(tbl.schema_id)='dbo'))
and that a fast and relatively accurate way to size a table is
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Transactions')
AND (index_id=0 or index_id=1);
Unfortunately, this last query requires extra permissions (VIEW DATABASE STATE) beyond basic SELECT.
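If that is the only blocker, the permission can be granted (the principal name here is a placeholder):
GRANT VIEW DATABASE STATE TO [some_user];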