I'm currently working on improving the efficiency of a few queries. After looking at the query plan, I found that SQL Server does not use parallelism when a TOP clause is in place, which increases the query time from 1-2 s to several minutes.
The query in question uses a view with various joins and unions. I'm looking for a general answer/understanding as to why this is happening - Google has thus far failed me.
Thanks
As you may be aware,
Generally, SQL Server processes queries in parallel in the following cases:
When the number of CPUs is greater than the number of active connections.
When the estimated cost for the serial execution of a query is higher than the cost threshold for parallelism (the estimated cost refers to the elapsed time in seconds required to execute the query serially).
However, certain types of statements cannot be processed in parallel unless they contain specific clauses.
For example, UPDATE, INSERT, and DELETE are not normally processed in parallel even if the related query meets the criteria.
But if the UPDATE or DELETE statement contains a WHERE clause, or an INSERT statement contains a SELECT clause, the WHERE and SELECT portions can be executed in parallel; the changes themselves are applied to the database serially.
To configure parallel processing, simply do the following:
In the Server Properties dialog box, go to the Advanced page.
By default, the Max Degree Of Parallelism setting has a value of 0, which means that the maximum number of processors used for parallel processing is controlled automatically. Essentially, SQL Server uses the actual number of available processors, depending on the workload. To limit the number of processors used for parallel processing to a set amount (up to the maximum supported by SQL Server), change the Max Degree Of Parallelism setting to a value greater than 1. A value of 1 tells SQL Server not to use parallel processing.
Large, complex queries usually can benefit from parallel execution. However, SQL Server performs parallel processing only when the estimated number of seconds required to run a serial plan for the same query is higher than the value set in the cost threshold for parallelism. Set the cost estimate threshold using the Cost Threshold For Parallelism box on the Advanced page of the Server Properties dialog box. You can use any value from 0 through 32,767. On a single CPU, the cost threshold is ignored.
Click OK. These changes are applied immediately. You do not need to restart the server.
You can use the stored procedure sp_configure to configure parallel processing. The Transact-SQL commands are:
exec sp_configure 'max degree of parallelism', <integer value>
exec sp_configure 'cost threshold for parallelism', <integer value>
Quoted from Technet article Configure Parallel Processing in SQL Server 2008
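Note that both of these are advanced options, so as a hedged, complete sketch (the values 8 and 25 are just example numbers, not recommendations), the full sequence would look like this:

exec sp_configure 'show advanced options', 1
RECONFIGURE
exec sp_configure 'max degree of parallelism', 8
exec sp_configure 'cost threshold for parallelism', 25
RECONFIGURE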
TOP automatically places the query into serial (non-parallel) mode. This is a restriction and cannot be overcome. As a possible alternative to the TOP function, try using a ranking function and filtering on rank value = 1...
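As a hedged sketch of that suggestion (the table and column names, dbo.Orders and OrderDate, are made up for illustration), a TOP (1) pattern can be rewritten with a ranking function such as ROW_NUMBER:

SELECT *
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (ORDER BY o.OrderDate DESC) AS rn
    FROM dbo.Orders AS o
) AS ranked
WHERE rn = 1

Whether this actually restores a parallel plan depends on what the optimizer picks, so check the new execution plan before settling on it.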
I have used the default degree of parallelism for performance tuning and got the best results. But I'm not sure whether it will have an impact when another job accesses the same table at the same time.
Sample code below:
select /*+ FULL(customer) PARALLEL(customer, default) */ customer_name from customer;
The number of servers available is 8. How does this default degree of parallelism work? Will it be affected if another job runs a query on the same table at the same time? Before moving this query to production, I would like to know whether this will have an impact. Thanks!
From documentation:
PARALLEL (DEFAULT):
The optimizer calculates a degree of parallelism equal to the number
of CPUs available on all participating instances times the value of
the PARALLEL_THREADS_PER_CPU initialization parameter.
The maximum degree of parallelism is limited by the number of CPUs in the system. The formula used to calculate the limit is:
PARALLEL_THREADS_PER_CPU * CPU_COUNT * number of available instances
(by default, all the open instances on the cluster, but this can be constrained using PARALLEL_INSTANCE_GROUP or a service specification). This is the default.
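As a worked example (assuming PARALLEL_THREADS_PER_CPU is at its usual default of 2): on a single instance with CPU_COUNT = 8, the calculated degree of parallelism would be 2 * 8 * 1 = 16.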
Execution Plan Download Link: https://www.dropbox.com/s/3spvo46541bf6p1/Execution%20plan.xml?dl=0
I'm using SQL Server 2008 R2
I have a pretty complex stored procedure that's requesting way too much memory upon execution. Here's a screenshot of the execution plan:
http://s15.postimg.org/58ycuhyob/image.png
The underlying query probably needs a lot of tuning, as indicated by the massive number of estimated rows, but that's beside the point. Regardless of the complexity of the query, it should not be requesting 3 gigabytes of memory upon execution.
How do I prevent this behavior? I've tried the following:
DBCC FREEPROCCACHE to clear the plan cache. This accomplished nothing.
Setting the RECOMPILE option at both the stored procedure and statement level. Again, this did nothing.
Playing with the MAXDOP option, from 0 to 8. Same issue.
The query returns about 1k rows on average, and it queries a table with more than 3 million rows, with about 4 tables being joined. Executing the query returns the result in less than 3 seconds in the majority of cases.
Edit:
One more thing: using query hints is not really viable here, since the parameters vary greatly from case to case.
Edit2:
Uploaded the execution plan upon request (Dropbox link above).
Edit3:
I've tried rebuilding/reorganizing fragmented indexes. Apparently there were a few, but nothing too serious. Anyhow, this didn't reduce the amount of memory granted and didn't reduce the number of estimated rows (if that is somehow related).
You say optimizing the query is beside the point, but actually it is exactly the point. When a query is executed, SQL Server will, after generating the execution plan, reserve the memory needed to execute the query. The more rows the intermediate results are estimated to hold, the more memory is estimated to be required.
So, rewrite your query and/or create new indexes to get a decent query plan. A quick glance at the query plan shows some nested loops without join predicates, and a number of table scans from which probably only a few records are used.
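If you want to see the grant while the procedure runs, one hedged way to do it (this DMV exists in SQL Server 2008 R2) is to compare what was requested with what is actually used:

SELECT session_id, requested_memory_kb, granted_memory_kb, used_memory_kb
FROM sys.dm_exec_query_memory_grants

A large gap between requested_memory_kb and used_memory_kb is the usual sign that the row estimates, not the real workload, are driving the grant.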
I have to benchmark a query. Currently I need to know how adding a column to the select result set (FIELD_DATE1) will affect SQL execution time. There are administrative restrictions in the DB, so I cannot use debugging tools. So I wrote a query:
SELECT COUNT(*), MIN(XXXT), MAX(XXXT)
FROM (SELECT DISTINCT ID AS XXXID,
             sys_extract_utc(systimestamp) AS XXXT,
             FIELD_DATE1 AS XXXUT
      FROM XXXTABLE
      WHERE FIELD_DATE1 > '20-AUG-06 02.23.40.010000000 PM');
Will the output of this query show the real execution time of the query?
There is a lot to learn when it comes to benchmarking in Oracle. I recommend you begin with the items below, though it worries me that the administrative restrictions in your DB might get in the way, since some of these features require extra permissions:
Explain Plan: for every SQL statement, Oracle has to create an execution plan. The execution plan defines how the information will be read/written, i.e. the indexes to use, the join method, the sorts, etc.
The explain plan will give you information about how good your query is and how it is using the indexes. Learning the concept of query cost is key here, so take a look at it.
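As a minimal sketch using the question's table (the TIMESTAMP literal is just an explicit rewrite of the string used in the original query):

EXPLAIN PLAN FOR
SELECT ID, FIELD_DATE1
FROM XXXTABLE
WHERE FIELD_DATE1 > TIMESTAMP '2006-08-20 14:23:40.010';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

The second statement prints the plan, including the cost column mentioned above.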
TKPROF: an Oracle tool that allows you to read Oracle trace files. When you enable timed statistics in Oracle you can trace your SQL statements; the results of these traces are written to files, which you can read with TKPROF.
Among the information TKPROF will let you see is:
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
See: Using SQL Trace and TKPROF
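As a hedged sketch of tracing your own session (this assumes you have the ALTER SESSION privilege, which your DB restrictions may or may not allow):

ALTER SESSION SET tracefile_identifier = 'benchmark';
ALTER SESSION SET sql_trace = TRUE;
-- run the statement being benchmarked here
ALTER SESSION SET sql_trace = FALSE;

The resulting trace file in the server's trace directory can then be formatted with something like: tkprof <tracefile>.trc report.txt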
It's possible in this query that SYSTIMESTAMP would be evaluated once and the same value associated with every row, or that it would be evaluated once for each row, or something in between. It's also possible that all the rows would be fetched from the table first and SYSTIMESTAMP evaluated for each one afterwards, so you wouldn't be getting an accurate account of the time taken by the whole query. Generally, you can't rely on the order of evaluation within SQL, or assume that a function will be evaluated once for each row where it appears.
Generally, the way I would measure execution time is to have the client tool report it. If you're executing the query in SQL*Plus, you can SET TIMING ON to have it report the execution time for every statement. Other interactive tools probably have similar features.
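For example, in SQL*Plus (the Elapsed line below is illustrative, not a real measurement):

SET TIMING ON
SELECT COUNT(*) FROM XXXTABLE WHERE FIELD_DATE1 > TIMESTAMP '2006-08-20 14:23:40.010';
-- SQL*Plus then prints a line such as: Elapsed: 00:00:01.25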
How can I configure the maximum memory that a query (a SELECT query) can use in SQL Server 2008?
I know there is a way to set the minimum value, but how about the max value? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a continuous data load, in ETL (extract, transform and load) form. While the data is being loaded, I want to run some SELECT queries. All of them are expensive queries (containing GROUP BY). The most important process for me is the data load. I get an average speed of 10000 rows/sec, and when I run the queries in parallel it drops to 4000 rows/sec or even lower. I know a few more details should be provided, but this is a complex product I work on and I cannot detail it further. One thing I can guarantee is that the load speed does not drop due to locking problems, because I monitored for those and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition, you can use Resource Governor to set a maximum amount of memory that a particular workload group can consume, which might help.
In SQL 2008 you can use Resource Governor to achieve this. There you can set request_max_memory_grant_percent to limit the memory (this is a percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific; it is session specific.
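As a hedged sketch of that setup (the pool, group, function, and login names are all hypothetical, and Resource Governor requires Enterprise Edition):

CREATE RESOURCE POOL QueryPool WITH (MAX_MEMORY_PERCENT = 25);
CREATE WORKLOAD GROUP QueryGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING QueryPool;
GO
-- The classifier function lives in master and routes sessions into groups.
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'report_user'   -- hypothetical reporting login
        RETURN N'QueryGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;

With this in place, any query run by report_user gets at most 25% of the pool's memory as a grant, while the ETL sessions, left in the default group, are unaffected.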
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I've never thought or cared about since the last millennium.
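As a hedged illustration of what "parametrised" means here (the statement, table, and parameter are invented for the example), the fix is usually to funnel the variants through one parameterised statement, e.g. via sp_executesql:

EXEC sp_executesql
    N'SELECT OrderId, Total FROM dbo.Orders WHERE CustomerId = @cust',
    N'@cust int',
    @cust = 42;

All 100 connections then share a single cached plan instead of compiling 100 one-off statements.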
I am trying to improve the performance of one of my requests.
My request is made up of 10 different SELECT statements.
The actual production query is taking 36sec to execute.
If I display the execution plan, one of the selects has a query cost of 18%.
So I replaced an IN clause (in this select) with an XML query (http://www.codeproject.com/KB/database/InClauseAndSQLServer.aspx).
The new query now takes 28 sec to execute, but SQL Server tells me that this select now has a query cost of 100%, and that is the only change I made. There is no parallelism in any of the queries.
PRODUCTION: 36 sec, my select is 18% (the others are 10%).
NEW VERSION: 28 sec, my select is 100% (the others are 0%).
Do you have any idea how SQL Server computes this "query cost"? (I'm starting to believe it's random or something like that.)
Query cost is a unitless measure of a combination of CPU cycles, memory, and disk IO.
Very often you will see operators or plans with a higher cost but faster execution time.
Primarily this is due to the difference in speed of the three components above. CPU and memory are fairly quick, and also uncommon as bottlenecks. If you can shift some pressure from the disk IO subsystem to the CPU, the query may show a higher cost but should execute substantially faster.
If you want to get more detailed information about the execution of your specific queries, you can use:
SET STATISTICS IO ON
SET STATISTICS TIME ON
This will output detailed information about CPU time, plan creation, and page reads (both from disk and from memory) to the Messages tab.
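For example (the table name and the numbers in the sample output are illustrative only):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.Orders;
-- The Messages tab then shows lines such as:
--   Table 'Orders'. Scan count 1, logical reads 120, physical reads 3, ...
--   SQL Server Execution Times: CPU time = 15 ms, elapsed time = 42 ms.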