Why does parallel processing not happen even though there is a parallel hint? - sql

I tried to use the 'PARALLEL' hint for the first time, but when I looked at the execution plan, parallel processing didn't happen.
Here is the code and execution plan.
SELECT /*+ PARALLEL(SCORE 4) */
*
FROM SCORE;
I learned that if parallel processing happens successfully, a 'PX COORDINATOR' operation has to appear in the execution plan, but as you can see in the image, there is no 'PX COORDINATOR' operation.
In this situation, why does parallel processing not happen even though I wrote the parallel hint?
And how can I make parallel processing happen successfully?
If you can give me some advice, I'd really appreciate it.
(I'm using Oracle 11g.)

What edition are you using? XE does not have access to parallelism (which is sensible, given that 11g XE can only use a single core anyway).
The latest version, 18c XE, can use two CPU cores, but the parallelism restriction remains. Luckily, your query probably won't benefit much from parallelism - the table is small, so it is quick to read, and the data transfer to the client needs to be single-threaded anyway (otherwise you need multiple client connections).
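If you are not sure which edition you are on, a quick check (a plain SELECT against a standard view, shown here as a sketch):
-- The first line of output names the edition, e.g. Express Edition vs. Enterprise Edition
SELECT banner FROM v$version;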

For 2601 (expected) rows it is not worth processing the query in parallel. Splitting the query, running the parts in parallel, and combining the results afterwards would take longer.
Apart from that, Oracle has to know where and how to split the operation. Do you have any index on this table? If not, how can Oracle know where the operation could be split for parallel processing?

Related

How do I compare two SQL queries to run on Postgres

I need to compare two queries that will run in my Postgres database.
How do I know the execution time and any other statistics of them so I can produce a reliable benchmark between them?
I can think of two interesting data points to collect and compare:
The execution time.
For that, simply execute the query using psql connected via UNIX sockets (to factor out the network) and use psql's \timing command to measure the execution time as seen on the client.
Do not use EXPLAIN (ANALYZE) for that since that would add notable overhead which affects your measurements.
Make sure to run the query several times to get a reliable number. That number will correspond to the execution time with a warm cache.
If you want to measure execution time with a cold cache, restart PostgreSQL and empty the file system cache.
The number of blocks touched by the query.
For that, run EXPLAIN (ANALYZE, BUFFERS) once for each query.
The number of blocks touched is significant for performance: the fewer blocks a query touches, the faster it will (often) be. This number is particularly significant for performance with a cold cache; the fewer blocks, the less execution time will depend on caching.
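As a minimal sketch of that second measurement (the table and column names here are placeholders, not from the question):
-- Reports the actual run time plus the shared/local/temp blocks hit, read and written
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
Compare the Buffers lines of the two plans; the query that touches fewer blocks will usually be the faster one once the cache is cold.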

PostgreSQL performance decrease after migration

After migrating a PostgreSQL database server from v9.3 to v9.6 I noticed a decrease in the performance of the entire system. The config parameters are the same as in v9.3, taking into account these parameters:
shared_buffers = 10000MB
work_mem = 64MB
maintenance_work_mem = 1024MB
Also I tried to monitor some resources, and this is the result
total used free shared buff/cache available
Mem: 31G 385M 4.5G 10G 26G 19G
Swap: 3.0G 0B 3.0G
Also when I run some queries the server internally launches queries like these ones:
select typname from pg_type where oid=1043
set search_path to public
deallocate pdo_stmt_0000000e
And then it runs my query, but I'm afraid there is some performance impact from this after the migration. I have another 9.6 server with a fresh install (no migration) and it does not present that problem (response time). It seems to be spending too much time on those queries.
Do you have any tip or advice on how to fix this?
I did the migration with pg_upgrade, but I noticed that in the process some data didn't migrate to the v9.6 server. After that I did a dump/restore and a VACUUM ANALYZE.
In our case, we neglected to:
ANALYZE the database
Postgres specifically might need it after a larger migration.
For example, when upgrading Django 2.2 to 3.2, all of the id field types are changed from AutoField to BigAutoField.
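A minimal sketch of that post-migration step (run in the affected database; VACUUM ANALYZE also reclaims dead tuples left by the restore):
-- Refresh planner statistics for every table in the current database
ANALYZE;
-- Or, combined with vacuuming:
VACUUM ANALYZE;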
You could install the pg_stat_statements extension on both the slow and the fast system and compare the performance of the top queries on each. Where there are major differences in time per execution, you can check the execution plans (using EXPLAIN ANALYZE).
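A sketch of that approach (it assumes pg_stat_statements has been added to shared_preload_libraries; the ordering and LIMIT are just one reasonable choice):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top statements by total time; compare this list between the slow and fast servers
SELECT query, calls, total_time, total_time / calls AS mean_time_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;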
Sometimes new features have a major performance impact after an upgrade. If my memory serves me well, the parallel sequential scan - https://blog.2ndquadrant.com/postgresql96-parallel-sequential-scan/ - was added in 9.6. Though this is basically a great feature, there are situations in which its use may slow queries down. This could be a reason to set parallel_setup_cost (or other planner parameters) to a different value to avoid inefficient parallel sequential scans.
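If parallel scans did turn out to be the culprit, a hedged sketch of that adjustment (the value is purely an illustration, not a recommendation):
-- Raise the startup cost the planner charges for launching parallel workers,
-- making parallel plans less attractive for borderline queries
ALTER SYSTEM SET parallel_setup_cost = 10000;
SELECT pg_reload_conf();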
Edited later: As I see in https://www.postgresql.org/docs/9.6/release-9-6.html, parallel query execution was not activated by default, so it is probably not the reason for the slowdown in your situation. Still, I think that only an analysis of the performance of the top queries and their plans can shed light on the issue.

What is the best way to spot the slowest block of a SQL query?

I am facing a problem where running a stored procedure takes too many resources, which sometimes causes a timeout on the server (especially when CPU usage is above 90%).
Can anyone suggest the best and quickest way to spot the block that takes the most resources, and also suggest a good way to solve it, please?
I am using SQL Server 2005.
You want to use the Query Profiler. Explained here. It will show you a graphical representation of your query's execution path, as well as which parts of it are taking the most time.
If you want to know which block is slowest, use the following
SET STATISTICS PROFILE ON
SET STATISTICS IO ON
SET STATISTICS TIME ON
When you run the SP this will display stats for each query.
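A sketch of how that looks wrapped around the call (the procedure name is a placeholder):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXEC dbo.usp_SlowReport;   -- placeholder for the stored procedure being tuned
-- The Messages tab now shows logical reads and CPU/elapsed time per statement
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;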
If you are using SQL Server Management Studio, you can turn on the execution plan display to see information about how the query will be executed by SQL Server, including what percentage of the entire process is taken up by each sub-process.
Often when doing this, there will be a part of the query that is obviously using most of the resources.
Using this information, you can then make an informed decision about how to tune the database (like adding an index to the offending table(s)).
You don't need to use SQL Profiler to view an execution plan - just:
SET SHOWPLAN_XML ON
If there are a bunch of statements in the sproc it can be a bit convoluted to turn on the SET STATISTICS options since you have many chunks of output to associate with input.
The graphical representation of a query plan in SSMS is pretty useful since it shows you the % cost of each statement relative to the cost of the entire batch/sproc. But this is a single value, so it can be more helpful at times just to run Profiler and turn on statement level output. Profiler will give you separate IO and CPU cost for each statement if you add event SQL:StmtCompleted and columns CPU and Reads.

How do I clear the Oracle execution plan cache for benchmarking?

On Oracle 10gR2, I have several SQL queries whose performance I am comparing. But after their first run, the v$sql view has the execution plan cached, so for one of the queries I go from 28 seconds on the first run to 0.5 seconds afterwards.
I've tried
ALTER SYSTEM FLUSH BUFFER_CACHE;
After running this, the query consistently runs at 5 seconds, which I do not believe is accurate.
I thought maybe I could delete the entry itself from the cache:
delete from v$sql where sql_text like 'select * from....
but I get an error about not being able to delete from view.
Peter gave you the answer to the question you asked.
alter system flush shared_pool;
That is the statement you would use to "delete prepared statements from the cache".
(Prepared statements aren't the only objects flushed from the shared pool, the statement does more than that.)
As I indicated in my earlier comment (on your question), v$sql is not a table. It's a dynamic performance view, a convenient table-like representation of Oracle's internal memory structures. You only have SELECT privilege on the dynamic performance views, you can't delete rows from them.
flush the shared pool and buffer cache?
The following doesn't answer your question directly. Instead, it answers a fundamentally different (and maybe more important) question:
Should we normally flush the shared pool and/or the buffer cache to measure the performance of a query?
In short, the answer is no.
I think Tom Kyte addresses this pretty well:
http://www.oracle.com/technology/oramag/oracle/03-jul/o43asktom.html
http://www.oracle.com/technetwork/issue-archive/o43asktom-094944.html
<excerpt>
Actually, it is important that a tuning tool not do that. It is important to run the test, ignore the results, and then run it two or three times and average out those results. In the real world, the buffer cache will never be devoid of results. Never. When you tune, your goal is to reduce the logical I/O (LIO), because then the physical I/O (PIO) will take care of itself.
Consider this: Flushing the shared pool and buffer cache is even more artificial than not flushing them. Most people seem skeptical of this, I suspect, because it flies in the face of conventional wisdom. I'll show you how to do this, but not so you can use it for testing. Rather, I'll use it to demonstrate why it is an exercise in futility and totally artificial (and therefore leads to wrong assumptions). I've just started my PC, and I've run this query against a big table. I "flush" the buffer cache and run it again:
</excerpt>
I think Tom Kyte is exactly right. In terms of addressing the performance issue, I don't think that "clearing the oracle execution plan cache" is normally a step for reliable benchmarking.
Let's address the concern about performance.
You tell us that you've observed that the first execution of a query takes significantly longer (~28 seconds) compared to subsequent executions (~5 seconds), even when flushing (all of the index and data blocks from) the buffer cache.
To me, that suggests that the hard parse is doing some heavy lifting. It's either a lot of work, or it's encountering a lot of waits. This can be investigated and tuned.
I'm wondering if perhaps statistics are non-existent, and the optimizer is spending a lot of time gathering statistics before it prepares a query plan. That's one of the first things I would check, that statistics are collected on all of the referenced tables, indexes and indexed columns.
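If statistics do turn out to be missing or stale, gathering them is straightforward; a sketch with placeholder schema and table names:
-- Gather table and column statistics; cascade => TRUE includes the indexes
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_OWNER', tabname => 'BIG_TABLE', cascade => TRUE);
-- Or for a whole schema:
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APP_OWNER');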
If your query joins a large number of tables, the CBO may be considering a huge number of permutations for join order.
A discussion of Oracle tracing is beyond the scope of this answer, but it's the next step.
I'm thinking you are probably going to want to trace events 10053 and 10046.
Here's a link to an "event 10053" discussion by Tom Kyte you may find useful:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:63445044804318
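For reference, turning those traces on at the session level looks roughly like this (a sketch; finding and interpreting the trace file is a separate topic):
-- 10053: optimizer decision trace; 10046 level 12: SQL trace with binds and waits
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- ... run the slow statement here ...
ALTER SESSION SET EVENTS '10053 trace name context off';
ALTER SESSION SET EVENTS '10046 trace name context off';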
tangentially related anecdotal story re: hard parse performance
A few years back, I did see one query that had elapsed times measured in minutes on first execution and in seconds on subsequent executions. What we found was that the vast majority of the time for the first execution was spent on the hard parse.
This problem query was written by a CrystalReports developer who innocently (naively?) joined two humongous reporting views.
One of the views was a join of 62 tables, the other view was a join of 42 tables.
The query used the Cost Based Optimizer. Tracing revealed that it wasn't wait time; it was all CPU time spent evaluating possible join paths.
Each of the vendor supplied "reporting" views wasn't too bad by itself, but when two of them were joined, it was agonizingly slow. I believe the problem was the vast number of join permutations that the optimizer was considering. There is an instance parameter that limits the number of permutations considered by the optimizer, but our fix was to re-write the query. The improved query only joined the dozen or so tables that were actually needed by the query.
(The initial, immediate short-term "band aid" fix was to schedule a run of the query earlier in the morning, before the report generation task ran. That made the report generation "faster", because the report generation run made use of the already-prepared statement in the shared pool, avoiding the hard parse.
The band aid fix wasn't a real solution; it just moved the problem to a preliminary execution of the query, where the long execution time wasn't noticed.)
Our next step would have probably been to implement a "stored outline" for the query, to get a stable query plan.
Of course, statement reuse (avoiding the hard parse, using bind variables) is the normative pattern in Oracle. It improves performance, scalability, yada, yada, yada.
This anecdotal incident may be entirely different than the problem you are observing.
HTH
It's been a while since I worked with Oracle, but I believe execution plans are cached in the shared pool. Try this:
alter system flush shared_pool;
The buffer cache is where Oracle stores recently used data in order to minimize disk I/O.
We've been doing a lot of work lately with performance tuning queries, and one culprit for inconsistent query performance is the file system cache that Oracle is sitting on.
It's possible that while you're flushing the Oracle cache the file system still has the data your query is asking for meaning that the query will still return fast.
Unfortunately I don't know how to clear the file system cache - I just use a very helpful script from our very helpful sysadmins.
FIND ADDRESS AND HASH_VALUE OF SQL_ID
select address,hash_value,inst_id,users_executing,sql_text from gv$sqlarea where sql_id ='7hu3x8buhhn18';
PURGE THE PLAN FROM SHARED POOL
exec sys.dbms_shared_pool.purge('0000002E052A6990,4110962728','c');
VERIFY
select address,hash_value,inst_id,users_executing,sql_text from gv$sqlarea where sql_id ='7hu3x8buhhn18';

What is the purpose for using OPTION(MAXDOP 1) in SQL Server?

I have never clearly understood the usage of MAXDOP. I do know that it makes the query faster and that it is the last item that I can use for Query Optimization.
However, my question is, when and where it is best suited to use in a query?
As Kaboing mentioned, MAXDOP(n) actually controls the number of CPU cores that are being used in the query processor.
On a completely idle system, SQL Server will attempt to pull the tables into memory as quickly as possible and join between them in memory. It could be that, in your case, it's best to do this with a single CPU. This might have the same effect as using OPTION (FORCE ORDER), which forces the query optimizer to use the order of joins that you have specified. In some cases, I have seen OPTION (FORCE ORDER) reduce a query from 26 seconds to 1 second of execution time.
Books Online goes on to say that possible values for MAXDOP are:
0 - Uses the actual number of available CPUs depending on the current system workload. This is the default value and recommended setting.
1 - Suppresses parallel plan generation. The operation will be executed serially.
2-64 - Limits the number of processors to the specified value. Fewer processors may be used depending on the current workload. If a value larger than the number of available CPUs is specified, the actual number of available CPUs is used.
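For completeness, the hint is applied per query; a minimal sketch with placeholder table names:
-- Force a serial plan for just this statement
SELECT o.OrderID, SUM(d.Quantity) AS TotalQty
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
GROUP BY o.OrderID
OPTION (MAXDOP 1);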
I'm not sure what the best usage of MAXDOP is; however, I would take a guess and say that if you have a table with 8 partitions, you would want to specify MAXDOP(8) due to I/O limitations, but I could be wrong.
Here are a few quick links I found about MAXDOP:
Books Online: Degree of Parallelism
General guidelines to use to configure the MAXDOP option
This is a general rambling on Parallelism in SQL Server, it might not answer your question directly.
From Books Online, on MAXDOP:
Sets the maximum number of processors the query processor can use to execute a single index statement. Fewer processors may be used depending on the current system workload.
See Rickie Lee's blog on parallelism and CXPACKET wait type. It's quite interesting.
Generally, in an OLTP database, my opinion is that if a query is so costly it needs to be executed on several processors, the query needs to be re-written into something more efficient.
Why do you get better results adding MAXDOP(1)? Hard to tell without the actual execution plans, but it might be as simple as the execution plan being totally different from the one without the OPTION, for instance using a different index or (more likely) JOINing differently, using MERGE or HASH joins.
As something of an aside, MAXDOP can apparently be used as a workaround to a potentially nasty bug:
Returned identity values not always correct
There are a couple of parallelization bugs in SQL Server with abnormal input. OPTION(MAXDOP 1) will sidestep them.
EDIT: Old. My testing was done largely on SQL 2005. Most of these seem to not exist anymore, but every once in a while we question the assumption when SQL 2014 does something dumb, and we go back to the old way and it works. We never managed to demonstrate that it wasn't just bad plan generation in the more recent cases, though, since SQL Server can be relied on to get the old way right in newer versions. Since all cases were IO-bound queries, MAXDOP 1 doesn't hurt.
Adding my two cents, based on a performance issue I observed.
If simple queries are getting parallelized unnecessarily, it can cause more problems than it solves. However, before adding MAXDOP to the query as a "knee-jerk" fix, there are some server settings to check.
In Jeremiah Peschka - Five SQL Server Settings to Change, MAXDOP and "COST THRESHOLD FOR PARALLELISM" (CTFP) are mentioned as important settings to check.
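Checking or adjusting those two instance-wide settings is done through sp_configure; a sketch (the numbers are purely illustrative, not recommendations):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;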
Note: Paul White mentioned max server memory also as a setting to check, in a response to Performance problem after migration from SQL Server 2005 to 2012. A good KB article to read is Using large amounts of memory can result in an inefficient plan in SQL Server.
Jonathan Kehayias - Tuning ‘cost threshold for parallelism’ from the Plan Cache helps to find out good value for CTFP.
Why is cost threshold for parallelism ignored?
Aaron Bertrand - Six reasons you should be nervous about parallelism has a discussion about some scenario where MAXDOP is the solution.
Parallelism-Inhibiting Components are mentioned in Paul White - Forcing a Parallel Query Execution Plan