SQL Server 2000 - What really is the "Actual Number of Rows"?

I have a SQL Server 2000 query which performs a clustered index scan and displays a very high number of rows. For a table where I have 260,000 records, the actual number of rows displayed in the execution plan is ... 34,000,000.
Does this make sense? What am I misunderstanding?
Thanks.

The values displayed in the execution plan are estimates based on statistics. Normal queries like:
SELECT COUNT(*) FROM Table
are 100% accurate for your transaction*.
*Edge cases may vary depending on transaction isolation level.
More information on stats:
Updating statistics
How often and how (maintenance plans++)

If your row counts are off from the query plan, you'll need to update the statistics, or else the query optimizer may choose the wrong plan. Also, a clustered index scan is almost the same as a table scan; try to fix up the indexes so you get a clustered index seek, or at least an index seek.
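On SQL Server 2000, refreshing the statistics manually is a one-liner; a minimal sketch, with Table standing in for your table name:

UPDATE STATISTICS [Table] WITH FULLSCAN;  -- rebuild statistics for one table from all its rows
EXEC sp_updatestats;                      -- or refresh stale statistics for every table in the database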

But ... if it's the "Actual Number of Rows", why is that based on statistics?
I assumed that the Estimated Number of Rows is used when building the query plan (and collected from statistics at that time), and that the Actual Number of Rows is extra information added after the query executes, for debugging and tuning purposes.
Isn't this right?
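For reference, SQL Server 2000 can show both figures side by side: running the query with STATISTICS PROFILE enabled returns the executed plan with a Rows column (the actual per-operator count) next to EstimateRows (the optimizer's estimate):

SET STATISTICS PROFILE ON;
SELECT COUNT(*) FROM [Table];  -- profile output lists Rows (actual) and EstimateRows per operator
SET STATISTICS PROFILE OFF;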

Related

Estimate Rows vs Actual Rows, what is the impact on performance?

I have a query that performs very quickly, but in production, when server loads are high, its performance is underwhelming. I suspect the cause might be the Estimated Rows being much lower than the Actual Rows in the execution plan. I know that the server statistics are not stale.
I am now optimizing a new query and I worry that it will have the same problem in production. The number of rows returned and the CPU and reads are well within the thresholds my data admins require. As you can see in the SQL Sentry plan above, there are a few temp tables that estimate a single row but return 100 times as many rows.
My question is this: even when the number of rows is small, does such a large percentage difference in rows cause bottlenecks in the server's performance? A secondary question: if the problem isn't a bad cached plan or stale statistics, what other issues would cause a plan to show such a discrepancy?
A difference between actual and estimated rows does not cause a "bottleneck" in the server.
The impact is on algorithms and resource allocation for the query. SQL Server has multiple algorithms that it can use for things like JOINs and GROUP BYs. The (estimated) size of the data is one of the primary items of information that it uses to choose the appropriate algorithm.
Choosing the wrong algorithm is not exactly a bottleneck, but it does slow the query down. You would need to study the execution plan to see if this is happening in your case.
If you have simple queries that select from a single table, then there are many fewer options for the execution plan. The only impact I can readily think of in this case would be using a full table scan rather than an index for filtering. For your data sizes, I don't think that would make much of a difference.
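To see the algorithm choice in action, you can force each join type with a hint and compare the plans and timings; the table and column names below are only placeholders:

SELECT o.OrderID, c.Name
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
OPTION (LOOP JOIN);  -- typically chosen when one input is estimated to be small

SELECT o.OrderID, c.Name
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN);  -- typically chosen for large, unsorted inputs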
If there is a huge difference between Estimated Rows and Actual Rows, then you need to worry about that query.
There can be a number of reasons for this:
Stale statistics.
Skewed data distribution: the statistics are up to date, but the data is skewed. Creating filtered statistics for those indexes will help (see the sketch below).
An unoptimized, poorly written query, e.g. with join conditions expressed the wrong way.
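A sketch of the filtered-statistics idea (available from SQL Server 2008 onwards; the table, column, and value here are hypothetical):

-- Statistics scoped to the skewed value, so the optimizer gets an
-- accurate row estimate for that range instead of a table-wide average.
CREATE STATISTICS st_Orders_Status_Skewed
ON dbo.Orders (Status)
WHERE Status = 5
WITH FULLSCAN;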

Suboptimal execution plan is better than optimal plan when statistics are stale

Why is the cost of an execution plan generated from stale statistics cheaper than the cost of the plan based on updated statistics?
To understand my problem, please follow the scenario below:
Assumption: auto statistics update is off.
After I manually updated statistics with a full scan, I executed the following batch with the actual execution plan included:
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT *
FROM AWSales  -- table has 60,000 rows
WHERE SalesOrderID = 44133
OPTION (RECOMPILE);
-- returns 17 rows
The optimizer generated a plan that used a nonclustered index seek and a key lookup - that was definitely fine.
Then I wanted to cheat the optimizer, so I inserted 60,000 rows with SalesOrderID = 44133.
Without updating statistics, I executed the batch again, and the optimizer returned the same plan (with the index seek) but, of course, with a different cost (60,000 rows returned).
Next I manually updated statistics with a full scan for the table and executed the batch again. This time the optimizer returned a different plan, with an index scan operator. Predictable. At first glance it looked good. But when I compared the query plan costs, the new plan was more expensive than the plan that used the index seek. So after updating statistics, the query with the optimal plan was slower. NONSENSE!
Then I wanted to compare the costs of the index seek before and after the update. The updated statistics made the optimizer choose the plan with the index scan, so to force a plan with the index seek I added a hint to the query. After executing it, it turned out that the actual cost of the forced index seek was MUCH bigger than the cost when the statistics were stale. How is that possible?
For more details, please look at the sample script.
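For reference, forcing the seek plan with a hint typically looks like this; the index name IX_AWSales_SalesOrderID is assumed, since the question does not give it:

SELECT *
FROM AWSales WITH (INDEX(IX_AWSales_SalesOrderID))  -- force the nonclustered index, producing the seek plan
WHERE SalesOrderID = 44133
OPTION (RECOMPILE);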

SQL Server chooses the wrong execution plan

When this query is executed, SQL Server chooses a wrong execution plan. Why?
SELECT TOP 10 AccountNumber, AVERAGE
FROM [M].[dbo].[Account]
WHERE [Code] = 9201
GO
SELECT TOP 10 AccountNumber, AVERAGE
FROM [M].[dbo].[Account] WITH (INDEX(IX_Account))
WHERE [Code] = 9201
SQL Server chooses the clustered PK index for this query, with an elapsed time of 78254 ms, but if I force SQL Server to use the non-clustered index, the elapsed time is 2 ms. Statistics on the Account table are up to date.
It's usually down to having bad statistics on the various indexes. Even with correct stats, an index's statistics can only hold so many samples, and occasionally, when there is a massive skew in the values, the optimiser can think that it won't find a sufficiently small number of rows.
Also, you can sometimes have a massive number of [almost] empty blocks to read through, with data values only at "the end". This can sometimes mean that, of a couple of otherwise close variations, one will require drastically more IO to burn through the holes. Likewise, if you don't actually have 10 values for 9201, it will have to do an entire table scan if it chooses the PK/CI rather than a more fitting index. This is more prevalent when you've done plenty of deletes.
Try updating the stats on the various indexes and things like that, and see if it changes anything. 78 seconds is a lot of IO on a single table scan.
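A sketch of both steps, assuming you are connected to database M:

DBCC SHOW_STATISTICS ('dbo.Account', IX_Account);  -- inspect the histogram to check the skew around 9201
UPDATE STATISTICS dbo.Account WITH FULLSCAN;       -- then rebuild the statistics from every row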

Why is SQL Server not using Index for very similar datetime query?

I have a table in SQL Server with about 1 million rows.
It has an ID (PK), a status (int), and a datetime column.
I've also created an index on the datetime column.
Now I've found an effect which I don't understand.
SELECT status
FROM table
WHERE dateTime BETWEEN '2010-01-01T00:00:00' AND '2010-01-02T12:00:00'
This statement returns 3664 rows. It runs in about 150 ms, and the execution plan shows that it's doing an index seek with a key lookup.
Now, if I change it as follows (just changing the hour from 12 to 13):
SELECT status
FROM table
WHERE dateTime BETWEEN '2010-01-01T00:00:00' AND '2010-01-02T13:00:00'
This statement returns 3667 rows. It runs in about 600 ms, and the execution plan shows that it's using the primary key!
I just don't understand it. For 3667 or more rows it always uses the primary key, even though the seek is much faster.
Is there an explanation?
status is not included in the index on dateTime, so it needs to do a key lookup for each matching row to retrieve this value.
As the range grows (and hence number of lookups required) it estimates that it will be quicker just to scan the whole (covering) clustered index avoiding the lookups. Possibly incorrectly in your case. The point at which it switches from one plan to the other is known as the tipping point.
You should check if the estimated vs actual number of rows is out of whack (perhaps some rows that would have matched the range have been deleted since the statistics were last updated).
Or maybe the index scan is more expensive than the costing model assumes, due to high levels of fragmentation, or for some other reason the costing assumptions do not reflect the actual relative performance in your environment.
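If the lookups are the problem, the usual fix (on SQL Server 2005 and later) is to make the nonclustered index covering, so the lookups disappear entirely; a sketch using the names from the question, with an invented index name:

CREATE NONCLUSTERED INDEX IX_table_dateTime_status
ON dbo.[table] (dateTime)
INCLUDE (status);  -- status is carried in the leaf pages, so no key lookups are needed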

SQL Server : Estimated Execution Plan

I am using the SQL Server execution plan to analyse the performance of a stored procedure. I have two results, with and without the index. In both results the estimated cost shows the same value (0.0032831), but the cost % differs: without the index it is 7%, and with the index it is 14%.
What does it really mean?
Please help me with this.
Thanks in advance.
It means that the plan without the index is costed as twice as expensive.
For the first one, the total plan cost is 0.0032831 / 0.07 ≈ 0.0469; for the second one, the total plan cost is 0.0032831 / 0.14, which is clearly going to be half that number.