I have seen references to the cost of a SQL statement all over database documentation.
What exactly does it mean? Is it the number of statements to be executed, or something else?
Cost is a rough measure of how much CPU time and disk I/O the server must spend to execute the query.
Typically the cost is split into a few components:
- Cost to parse the query - it can be non-trivial just to understand what you are asking for.
- Cost to read data from disk, and to access indexes if that reduces the cost.
- Cost to compute mathematical operations on your data.
- Cost to group or sort data.
- Cost to deliver the data back to the client.
The cost is an arbitrary number that gets higher as the required CPU time, IO, and memory get higher. It is also vendor-specific.
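For example, in PostgreSQL you can see the optimizer's estimate with EXPLAIN; the table name and the numbers below are purely illustrative:

    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    -- Seq Scan on orders  (cost=0.00..431.00 rows=10 width=64)
    -- cost = startup cost .. total cost, in arbitrary planner units, not seconds

Other vendors expose the same idea differently, e.g. the estimated subtree cost shown in a SQL Server execution plan.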
It means how much it will "cost" you to run a specific SQL query in terms of CPU, IO, etc.
For example, Query A might cost you 1.2 sec while Query B costs you 1.8 sec.
See here:
Measuring Query Performance : "Execution Plan Query Cost" vs "Time Taken"
The cost of a query relates to how much CPU and time the query will take. This is an estimated value; your query might take less or more time depending on the data. If your tables and all the indexes on those tables have been analyzed (ANALYZE TABLE table_name COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES FOR ALL INDEXED COLUMNS), then the cost-based estimate will line up well with your actual query execution time.
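Note that ANALYZE is the older mechanism; on recent Oracle versions the recommended way to gather the same statistics is the DBMS_STATS package. A minimal sketch, with placeholder schema and table names:

    -- Gather table, column and index statistics in one call
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => 'HR',
        tabname => 'EMPLOYEES',
        cascade => TRUE);  -- cascade => TRUE also gathers index statistics
    END;
    /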
Theoretically, the above answers may satisfy you, but here is an insight from working on the floor.
Practically, you can assess the cost by the number of seeks and scans your SQL query performs.
Go to the execution plan (in SSMS, enable "Include Actual Execution Plan", Ctrl+M, before running the query) and you will be able to see, and optimize, the amount of time (roughly the cost) your query takes.
A sample execution plan looks like this:
[screenshot of an execution plan omitted]
It refers to query performance: how well optimized the query is.
I have a query that performs very quickly, but in production, when server loads are high, its performance is underwhelming. I have a suspicion that the cause might be the Estimated Rows being much lower than the Actual Rows in the execution plan. I know that the server statistics are not stale.
I am now optimizing a new query and I worry that it will have the same problem in production. The number of rows returned and the CPU and reads are well within the thresholds my data admins require. As you can see in the above SQL Sentry plan, there are a few temp tables that estimate a single row but return 100 times as many rows.
My question is this: even when the number of rows is small, does a difference in rows by such a large percentage cause bottlenecks in the server's performance? A secondary question: if the problem isn't a bad cached plan or stale stats, what other issues would cause a plan to show such a discrepancy?
A difference between actual and estimated rows does not cause a "bottleneck" in the server.
The impact is on algorithms and resource allocation for the query. SQL Server has multiple algorithms that it can use for things like JOINs and GROUP BYs. The (estimated) size of the data is one of the primary items of information that it uses to choose the appropriate algorithm.
Choosing the wrong algorithm is not exactly a bottleneck, but it does slow the query down. You would need to study the execution plan to see if this is happening in your case.
If you have simple queries that select from a single table, then there are far fewer options for the execution plan. The only impact I can readily think of in this case would be using a full table scan rather than an index for filtering. For your data sizes, I don't think that would make much of a difference.
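One quick way to compare estimated and actual row counts per operator in SQL Server is SET STATISTICS PROFILE; the query below is only a placeholder:

    SET STATISTICS PROFILE ON;
    SELECT c.CustomerId, COUNT(*)
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
    GROUP BY c.CustomerId;
    SET STATISTICS PROFILE OFF;
    -- Each operator row in the extra result set carries both
    -- Rows (actual) and EstimateRows, so discrepancies stand out.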
Estimated Rows vs Actual Rows, what is the impact on performance?
If there is a huge difference between Estimated Rows and Actual Rows, then you need to worry about that query.
There can be a number of reasons for this:
- Stale statistics.
- Skewed data distribution: the statistics are up to date, but the data is skewed. Creating filtered statistics for those indexes will help (see the sketch after this list).
- Unoptimized query: a poorly written query, e.g. join conditions specified in the wrong manner.
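For the skewed-data case, here is a minimal sketch of a filtered statistics object, assuming a hypothetical dbo.Orders table where recent rows dominate the workload:

    -- SQL Server: statistics restricted to the skewed, frequently queried range
    CREATE STATISTICS st_Orders_Recent
    ON dbo.Orders (OrderDate)
    WHERE OrderDate >= '2020-01-01'
    WITH FULLSCAN;

With filtered statistics in place, the optimizer has an accurate histogram for exactly the range those queries hit.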
The query itself remains constant, i.e. it stays the same.
E.g. a select query takes 30 minutes if it returns 10,000 rows.
Would the same query take 1 hour if it has to return 20,000 rows?
I am interested in knowing the mathematical relation between the number of rows (N) and the execution time (T), keeping the other parameters constant (K).
I.e. is it T = N*K, or
T = N*K + C, or
some other formula?
I am reading http://research.microsoft.com/pubs/76556/progress.pdf in case it helps. If anybody understands this before me, please do reply. Thanks...
Well, that is a good question :), but there is no exact formula, because it depends on the execution plan.
The SQL query optimizer could choose a different execution plan for a query that returns a different number of rows.
I guess that if the execution plan is the same for both queries and you have some "lab" conditions, then the time growth could be linear. You should research SQL execution plans and statistics further.
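A minimal way to check whether the plan actually stays the same as N grows, in SQL Server syntax (the table and predicate are hypothetical):

    SET SHOWPLAN_TEXT ON;  -- return estimated plans instead of running the queries
    GO
    SELECT * FROM dbo.T WHERE k <= 10000;
    GO
    SELECT * FROM dbo.T WHERE k <= 20000;
    GO
    SET SHOWPLAN_TEXT OFF;
    GO

If the two plans differ, extrapolating T linearly in N is unsafe.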
Take the very simple example of reading every row in a single table.
In the worst case, you will have to read every page of the table from your underlying storage, and the worst case for each page read is a random seek. The seek time will dominate all other factors, so you can estimate the total time:
time ~= seek time x number of data pages
Assuming your rows are of a fairly regular size, this is linear in the number of rows.
However, databases do a number of things to try to avoid this worst case. For example, in SQL Server table storage is often allocated in extents of 8 consecutive pages, and a hard drive has a much faster streaming IO rate than random IO rate. If you have a clustered index, reading the pages in cluster order tends to involve a lot more streaming IO than random IO.
The best-case time, ignoring memory caching, is (8KB is the SQL Server page size):
time ~= 8KB * number of data pages / streaming IO rate in KB/s
This is also linear in the number of rows. For example, at a streaming rate of 100 MB/s, a table of 1,000,000 pages (about 8 GB) would scan in roughly 80 seconds.
As long as you do a reasonable job managing fragmentation, you can reasonably extrapolate linearly in this simple case. This assumes your data is much larger than the buffer cache; if not, you also have to worry about the cliff edge where your query switches from reading from the buffer to reading from disk.
I'm also ignoring details like parallel storage paths and access.
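If you want to check how well fragmentation is being managed, SQL Server exposes it through the sys.dm_db_index_physical_stats DMV; the database and table names below are placeholders:

    SELECT index_id, avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(
           DB_ID('MyDb'), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED');

The more fragmented the index, the more of the scan degrades from streaming IO toward random IO.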
I am trying to improve the performance of one of my queries.
My query is made up of 10 different selects.
The actual production query takes 36 sec to execute.
If I display the execution plan, one of the selects has a query cost of 18%.
So I replaced an IN clause (in that select) with an XML query (http://www.codeproject.com/KB/database/InClauseAndSQLServer.aspx).
The new query now takes 28 sec to execute, but SQL Server tells me that the above select has a query cost of 100%. And this is the only change I made. And there is no parallelism in either query.
PRODUCTION:
36 sec, my select is 18% (the others are 10%).
NEW VERSION:
28 sec, my select is 100% (the others are 0%).
Do you have any idea how SQL Server computes this "query cost"? (I am starting to believe that it is random or something like that.)
Query cost is a unitless measure of a combination of CPU cycles, memory, and disk IO.
Very often you will see operators or plans with a higher cost but faster execution time.
Primarily this is due to the difference in speed of the three components above. CPU and memory are fairly quick, and are also uncommon as bottlenecks. If you can shift some pressure from the disk IO subsystem to the CPU, the query may show a higher cost but should execute substantially faster.
If you want more detailed information about the execution of your specific queries, you can use:
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
This will output detailed information about CPU time, plan creation, and page reads (both from disk and from memory) to the Messages tab.
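The Messages-tab output looks roughly like this; the table name and numbers are illustrative:

    Table 'Orders'. Scan count 1, logical reads 120, physical reads 3,
        read-ahead reads 0, ...
    SQL Server Execution Times:
        CPU time = 16 ms,  elapsed time = 45 ms.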
I am using the SQL Server execution plan to analyse the performance of a stored procedure. I have two results, with and without an index. In both results the estimated cost shows the same value (.0032831), but the cost % differs: without the index it is 7%, and with the index it is 14%.
What does that really mean?
Please help me with this.
Thanks in advance.
It means that the plan without the index is costed as being twice as expensive overall.
For the first one the total plan cost is .0032831/0.07 = 0.0469; for the second one the total plan cost is .0032831/0.14 = 0.0235, which is clearly half that number. The operator's absolute cost is the same in both plans, so the cheaper the whole plan, the larger the percentage that one operator represents.
When Oracle is estimating the 'Cost' for certain queries, does it actually look at the amount of data (rows) in a table?
For example:
If I'm doing a full table scan of employees for name='Bob', does it estimate the cost by counting the number of existing rows, or is it always a set cost?
The cost optimizer uses segment (table and index) statistics as well as system (CPU + I/O performance) statistics for its estimates. Although it depends on how your database is configured, from 10g onwards the segment statistics are usually computed once per day by a process that calls the DBMS_STATS package.
In the default configuration, Oracle will check the table statistics (which you can look at by querying the ALL_TABLES view - see the column NUM_ROWS). Normally an Oracle job is run periodically to re-gather these statistics by querying part or all of the table.
If the statistics haven't been gathered (yet), the optimizer will (depending on the optimizer_dynamic_sampling parameter) run a quick sample query on the table in order to calculate an estimate for the number of rows in that table.
(To be more accurate, the cost of scanning a table is calculated not from the number of rows but from the number of blocks in the table, which you can see in the BLOCKS column in ALL_TABLES. The optimizer takes this number and divides it by a factor related to the multi-block read count to calculate the cost of that part of the plan.)
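You can inspect the statistics the optimizer will use yourself; the table name is a placeholder:

    SELECT num_rows, blocks, last_analyzed
    FROM   all_tables
    WHERE  table_name = 'EMPLOYEES';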