What are the factors for deciding the DOP in the PARALLEL hint in Oracle?
For example:
SELECT /*+ PARALLEL(e 3) */ e.last_name, d.department_name
FROM employees e, departments d
WHERE e.department_id=d.department_id;
How do I decide whether 3 is the correct DOP?
Most of the systems I maintain are neither OLTP nor data warehouse, but something in between. So sometimes it is necessary to use parallelism, but you do not want to use all resources (all CPU cores) for a single report. In theory you can limit DOP with profiles, but resource limiting is a problematic component in Oracle. Also, adaptive DOP in Oracle decides by default based on the number of free CPU resources, so such a query usually gets a huge number of parallel workers, which totally kills disk IO.
Oracle session worker processes are single-threaded; none of them will use more than a single CPU core. So the DOP should always correspond to the number of CPU cores and to whether the query is CPU bound.
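A minimal sketch of what that looks like in practice; the degree of 2 is illustrative, pick a value well below your core count:

-- Cap the degree explicitly in the hint instead of relying on adaptive DOP:
SELECT /*+ PARALLEL(e 2) */ e.last_name, d.department_name
FROM   employees e, departments d
WHERE  e.department_id = d.department_id;

-- Or force a bounded degree for all parallel queries in the current session:
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 2;

This way a single report still gets intra-query parallelism without grabbing every core on the box.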
Suppose I have a query (with joins on multiple tables), and assume it is tuned and optimized. The query runs against tables holding N records, returns R records, and takes time T. Now the load gradually increases: the target tables grow to N2 records, the result is R2 records, and the time taken is T2. Assuming I have allocated enough memory to Oracle, N2/N will be close to T2/T; that is, a proportional increase in the load results in a proportional increase in execution time. For this question, say N2 = 5N, i.e. the load has increased 5 times. Then the time taken by this query would also be 5 times, or a little more, right?

So, to reduce this proportional growth in time, do we have options in Oracle, like the parallel hint? In Java we split the job across multiple threads: with 2 times the load and 2 times the worker threads we complete in almost the same time, so by increasing the worker threads along with the load we handle the scaling issue reasonably well. Is such a thing possible in Oracle, or does Oracle take care of this in the background and scale by splitting the load internally into parallel processing? I have multi-core processors here. I will experiment with it, but expert opinion, if available, will help.
No. Query algorithms do not necessarily grow linearly.
You should probably learn something about algorithms and complexity. Many algorithms used in a database are super-linear. For instance, sorting a set of rows has a complexity of O(n log n), meaning that if you double the data size, the time taken for sorting more than doubles.
This is also true of index lookups and various join algorithms.
On the other hand, if your query is looking up a few rows using a B-tree index, then the complexity is O(log n) -- this is sublinear. So index lookups grow more slowly than the size of the data.
So, in general you cannot assume that increasing the size of data by a factor of n has a linear effect on the time.
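As a rough worked example (the row counts are purely illustrative): growing a table from 1 million to 5 million rows, the expected slowdown of an O(n log n) sort is

5n log(5n) / (n log n) = 5 * log(5*10^6) / log(10^6) ≈ 5 * 1.12 ≈ 5.6

so a 5x increase in load costs roughly a 5.6x increase in sort time, while a single B-tree lookup only grows from about log2(10^6) ≈ 20 to log2(5*10^6) ≈ 22 comparisons.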
Let's say I have a pool of 4 parallel workers in my PostgreSQL configuration. I also have 2 sessions.
In session #1, a SQL statement is currently executing, and the planner happened to choose to launch 2 workers for this query.
So, from session #2, how can I know that my pool of available workers has decreased by 2?
You can count the parallel worker backends:
SELECT current_setting('max_parallel_workers')::integer AS max_workers,
       count(*) AS active_workers
FROM   pg_stat_activity
WHERE  backend_type = 'parallel worker';
Internally, Postgres keeps track of how many parallel workers are active with two counters named parallel_register_count and parallel_terminate_count. The difference between the two is the number of active parallel workers. See the comment in the source code.
Before registering a new parallel worker, this number is checked against the max_parallel_workers setting in the source code here.
Unfortunately, I don't know of any direct way to expose this information to the user.
You'll see the effects of an exhausted limit in query plans. Try EXPLAIN ANALYZE with a SELECT query on a big table that is normally parallelized: you would see fewer workers used than workers planned. The manual:
The total number of background workers that can exist at any one time
is limited by both max_worker_processes and max_parallel_workers.
Therefore, it is possible for a parallel query to run with fewer
workers than planned, or even with no workers at all.
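To watch this happen, a sketch (big_table stands in for any table large enough that the planner chooses a parallel scan):

-- Run while session #1 holds its 2 workers:
EXPLAIN (ANALYZE)
SELECT count(*) FROM big_table;

On the Gather node of the output, compare "Workers Planned" with "Workers Launched": a lower launched count means other sessions had already claimed part of the max_parallel_workers pool.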
I have used the default degree of parallelism for performance tuning and got the best results too, but I wonder whether it will have an impact when some other job accesses the same table at the same time.
Sample code below:
SELECT /*+ FULL(customer) PARALLEL(customer, DEFAULT) */ customer_name FROM customer;
The number of servers available is 8. How does this default degree of parallelism work? Will it have an effect if some other job runs a query on the same table at the same time? Before moving this query to production, I would like to know whether it will have an impact. Thanks!
From documentation:
PARALLEL (DEFAULT):
The optimizer calculates a degree of parallelism equal to the number
of CPUs available on all participating instances times the value of
the PARALLEL_THREADS_PER_CPU initialization parameter.
The maximum degree of parallelism is limited by the number of CPUs in the system. The formula used to calculate the limit is:
PARALLEL_THREADS_PER_CPU * CPU_COUNT * number of instances available
where the number of instances available is, by default, all the open instances in the cluster, but it can be constrained using PARALLEL_INSTANCE_GROUP or a service specification.
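As a hedged worked example (the parameter values are illustrative): with CPU_COUNT = 8, PARALLEL_THREADS_PER_CPU = 2 and a single instance, the default DOP comes out as 2 * 8 * 1 = 16. You can check the inputs on your system, and see how many parallel servers a running query actually got:

-- Inputs to the default DOP calculation:
SELECT name, value
FROM   v$parameter
WHERE  name IN ('cpu_count', 'parallel_threads_per_cpu');

-- Parallel servers currently attached to each query coordinator
-- (run from another session; requires access to v$px_session):
SELECT qcsid, COUNT(*) AS px_servers
FROM   v$px_session
WHERE  sid <> qcsid
GROUP  BY qcsid;

So yes, another job on the same table can be affected: two such queries can each request DOP 16 on an 8-CPU box, and they will compete for the parallel server pool (capped by PARALLEL_MAX_SERVERS) and for IO.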
How can I configure the maximum memory that a query (SELECT query) can use in SQL Server 2008?
I know there is a way to set the minimum value, but what about the max value? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a data load continuously, in ETL (extract, transform, load) form. While the data is being loaded I want to run some SELECT queries. All of them are expensive queries (containing GROUP BY). The most important process for me is the data load. I get an average speed of 10000 rows/sec, and when I run the queries in parallel it drops to 4000 rows/sec or even lower. I know more details should be provided, but this is a complex product that I work on and I cannot detail it further. One thing I can guarantee is that my load speed does not drop because of lock problems, because I monitored for those and removed them.
There isn't any way I can think of to set a maximum memory at the per-query level.
If you are on Enterprise Edition you can use Resource Governor to set a maximum amount of memory that a particular workload group can consume, which might help.
In SQL Server 2008 you can use Resource Governor to achieve this. There you can set REQUEST_MAX_MEMORY_GRANT_PERCENT to cap the memory grant (this is a percentage relative to the pool size specified by the pool's MAX_MEMORY_PERCENT value). This setting is not query specific; it is session specific.
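A minimal sketch of that setup (the pool and group names are made up, and you would still need a classifier function to route the reporting sessions into the group):

-- Cap total memory for the reporting pool and each request's grant within it:
CREATE RESOURCE POOL ReportPool WITH (MAX_MEMORY_PERCENT = 40);
CREATE WORKLOAD GROUP ReportGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING ReportPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;

With something like this, the ETL sessions stay in the default group while the expensive GROUP BY reports are squeezed into ReportPool.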
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them over broadly the same range of data:
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
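For instance, one parametrised statement that all 100 connections can share a single cached plan for (the table and parameter are made up):

-- One plan in cache, reused for every value of @id:
EXEC sp_executesql
     N'SELECT customer_name FROM customer WHERE customer_id = @id',
     N'@id int',
     @id = 42;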
Memory per query is something I've never thought about or cared about since the last millennium.
I am trying to improve the performance of one of my requests.
My request is made up of 10 different SELECT statements.
The actual production query takes 36 sec to execute.
If I display the execution plan, one SELECT has a query cost of 18%.
So I replaced an IN clause (in that SELECT) with an XML query (http://www.codeproject.com/KB/database/InClauseAndSQLServer.aspx).
The new query now takes 28 sec to execute, but SQL Server tells me that the same SELECT now has a query cost of 100%. And this is the only change I made. And there is no parallelism in any query.
PRODUCTION:
36 sec, my SELECT is 18% (the others are 10%).
NEW VERSION:
28 sec, my SELECT is 100% (the others are 0%).
Do you have any idea how SQL Server computes this "query cost"? (I am starting to believe it's random or something like that.)
Query cost is a unitless measure of a combination of CPU cycles, memory, and disk IO.
Very often you will see operators or plans with a higher cost but faster execution time.
Primarily this is due to the difference in speed of those three components: CPU and memory are fairly quick, and are also uncommon as bottlenecks. If you can shift some pressure from the disk IO subsystem to the CPU, the query may show a higher cost but should execute substantially faster.
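To see the raw numbers behind that percentage, SHOWPLAN_ALL returns the estimated plan as rows instead of executing the query (the SELECT below is a stand-in for your own):

SET SHOWPLAN_ALL ON;
GO
SELECT customer_name FROM customer;
GO
SET SHOWPLAN_ALL OFF;
GO

The EstimateCPU, EstimateIO and TotalSubtreeCost columns are the unitless components the percentages in the graphical plan are derived from.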
If you want to get more detailed information about the execution of your specific queries, you can use:
SET STATISTICS IO ON
SET STATISTICS TIME ON
This will output detailed information about CPU time, parse/compile time, and page reads (both from disk and from memory) to the Messages tab.
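A typical usage sketch (the SELECT is a placeholder for the statement you are profiling):

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT customer_name FROM customer

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

The Messages tab then shows logical and physical reads per table, plus parse/compile and execution CPU and elapsed times.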