What might cause Oracle to choose parallel execution on one database versus the other for the same objects? - sql-execution-plan

I have a test and a development Oracle 19c database, and one of them is running out of temp tablespace on an older, pre-existing query. On the database that runs out of space, the explain plan shows a lot of parallel execution steps (PX SEND BROADCAST, PX RECEIVE, PX BLOCK ITERATOR). That database is also buffering a lot of the scanned data, which I assume is what is eating up all the temp space.
On the dev database, the same query, against the same objects, same indexes, same everything else as far as I have checked, runs without running out of space. Its explain plan has about half as many steps and does not use parallel execution at all.
I am trying to work with one of our DBAs to find what is causing the difference. What are some things I should look at that might explain such a difference in explain plan? I have verified that the indexes are the same and the data sizes are the same, that statistics have been gathered recently, and I have also looked at these settings:
select PDML_ENABLED, PDML_STATUS, PDDL_STATUS, PQ_STATUS FROM V$session where sid = (select sid from v$mystat where rownum = 1);
Are there any global or session parameters I might compare between the two databases?

When you say the tables/indexes are the same, make sure to check their "parallel" attribute.
At the system level, check parallel_degree_policy.
Also, an explain plan should tell you why a specific degree of parallelism was chosen; that might provide a clue.
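As a concrete starting point, here is a sketch of queries to run side by side on both databases; 'YOUR_TABLE' is a placeholder for the objects in your query:
-- The PARALLEL attribute on the objects themselves (a DEGREE other than 1
-- invites parallel plans even without hints):
SELECT table_name, degree FROM all_tables  WHERE table_name = 'YOUR_TABLE';
SELECT index_name, degree FROM all_indexes WHERE table_name = 'YOUR_TABLE';
-- Instance-level parallel settings worth diffing between the two databases:
SELECT name, value
FROM   v$parameter
WHERE  name IN ('parallel_degree_policy', 'parallel_degree_limit',
                'parallel_max_servers', 'parallel_min_time_threshold');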

Related

Is there a possibility to make my query in SQL run faster?

I am trying to run a query that should produce only 2 million rows and 12 columns. However, my query has been running for 6 hours... I would like to ask if there is anything I can do to speed it up, and if there are any general tips.
I am still a beginner in SQL and your help is highly appreciated.
INSERT INTO #ORSOID values (321) --UK
INSERT INTO #ORSOID values (368) --DE
SET @startorderdate = '4/1/2019' --'1/1/2017' --EDIT THESE
SET @endorderdate = '6/30/2019' --EDIT THESE
-----------------------------------------------------------------
---step 1 for the list of opids and check the table to see if more columns that are needed are present to include them
--Create a list of relevant OpIDs for the selected time period
select
op1.oporid,
op1.opcurrentponum,
o.orcompletedate,
o.orsoid,
op1.opid,
op1.opreplacesopid,
op1.opreplacedbyopid,
op1.OpSplitFromOpID,
op1.opsuid,
op1.opprsku,
--op1.orosid,
op1.opdatenew,
OPCOMPONENTMANUFACTURERPARTID
into csn_junk.dbo.SCOpid
from csn_order..vworder o with (nolock)
inner join csn_order..vworderproduct op1 with (nolock) on oporid = orid
LEFT JOIN CSN_ORDER..TBLORDERPRODUCT v WITH (NOLOCK) on op1.opid = v.OpID
where op1.OpPrGiftCertificate = 0
and orcompletedate between @startorderdate and @endorderdate
and orsoid in (select soid from #orsoid)
Select * From csn_junk.dbo.SCOpid
First, there is no way to know why a query runs for many hours on a server we don't have access to and without any metrics (i.e. an execution plan or CPU/memory/IO metrics). Also, without any DDL it's impossible to understand what's going on with your query.
General Guidelines for troubleshooting slow data modification:
Getting the right metrics
The first thing I'd do is run task manager on that server and see if you have a server issue or a query issue. Is the CPU pegged to 100%? If so, is sqlservr.exe the cause? How often do you run this query? How fast is it normally?
There are a number of native tools for collecting good metrics: execution plans, DMVs and DMFs, Extended Events, SQL Traces, Query Store. You also have great third-party tools like Brent Ozar's suite of tools and Adam Machanic's sp_whoisactive.
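For a quick first look without installing anything, the built-in DMVs will show what is running right now and what it is waiting on. A sketch using standard system views, nothing specific to your environment assumed:
-- What is running right now, and what is it waiting on?
SELECT r.session_id, r.status, r.wait_type, r.wait_time,
       r.cpu_time, r.logical_reads, t.text AS running_sql
FROM   sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE  r.session_id <> @@SPID;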
There's a saying in the BI World: If you can't measure it, you can't manage it. If you can't measure what's causing your queries to be slow, you won't know where to start.
Big updates like this can cause locking, blocking, lock-escalation and even deadlocks.
Understand execution plans, specifically actual execution plans.
I write my code in SSMS with "Show execution plan" turned on. I always want to know what my query is doing. You can also view the execution plans after the fact by capturing them using SQL Traces (via the SQL Profiler) or Extended Events.
This is a huge topic so I'll just mention some things off the top of my head that I look for in my plans when troubleshooting slow queries: Sorts, Key Lookups, RID lookups, Scans against large tables (e.g. you scan an entire 10,000,000 row table to retrieve 12,000 rows - for this you want a seek.) Sometimes there will be warnings in the execution plan such as a "tempdb spill" - these are bad. Sometimes the plan will call out "missing indexes" - a topic unto itself. Which brings me to...
INDEXES
This is where execution plans, DMVs and other SQL monitoring tools really come in handy. The rule of thumb is: for SELECT queries it's nice to have plenty of good indexes available for the optimizer to choose from; in a normalized data mart, for example, more are better. For INSERT/UPDATE/DELETE operations you want as few indexes as possible, because each index touching the modified data must also be maintained. For a big insert like the one you are doing, fewer indexes would be better on csn_junk.dbo.SCOpid and, as mentioned in the comments below your post, you want the right indexes in place to support the tables your query reads from.
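For example, if the view vworder sits over a base table, an index covering the filter and join columns in your query would support the SELECT side. This is a sketch only; the base table name and column choice below are hypothetical, since the question does not show the underlying DDL:
-- Hypothetical supporting index for the date-range filter and join:
CREATE NONCLUSTERED INDEX IX_tblOrder_OrSoID_OrCompleteDate
    ON dbo.tblOrder (OrSoID, OrCompleteDate)
    INCLUDE (OrID);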
CONSTRAINTS
Constraints slow data modification. Any referential integrity constraints (primary/foreign keys) and UNIQUE constraints present will impact performance. CHECK constraints can as well; CHECK constraints that use a T-SQL scalar function will destroy data modification performance more than almost anything else I can think of, and scalar UDFs used as CHECK constraints that also access other tables are even worse: they can turn an insert that should take a minute into one that takes several hours.
MDF & LDF file growth
A 2,000,000+ row, 12-column insert is going to cause the associated MDF and LDF files to grow substantially. If your data files (.MDF or .NDF) or log file (.LDF) fill up, they will auto-grow to create space. This can turn queries that run in seconds into ones that take minutes, especially when your auto-growth settings are bad. See: SQL Server Database Growth and Autogrowth Settings
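To see where your files stand today, you can check current sizes and autogrowth settings with something along these lines (a sketch; run it in the target database):
-- Current size and autogrowth settings for the current database's files
SELECT name, type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS autogrowth
FROM   sys.database_files;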
Whenever I have a query that always runs in 10 seconds and now, out of nowhere, it's running for minutes, and it isn't a deadlock or a server issue, I check for MDF or LDF autogrowth, as this is often the culprit. Often you have a log file that needs to be shrunk (via log backup or manually, depending on the recovery model). This brings me to batching:
Batching
Huge inserts chew up log space and take forever to roll back if the query fails. Making things worse - cancelling a huge insert (or trying to Kill the Spid) will sometimes cause more problems. Doing data modifications in batches can circumvent this problem. See this article for more details.
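A minimal sketch of the batching pattern; the table and column names below are hypothetical, and the batch size is just an example:
-- Copy rows in 50,000-row chunks so each batch logs and commits separately.
DECLARE @BatchSize int = 50000,
        @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.TargetTable (Id, ColA, ColB)
    SELECT TOP (@BatchSize) s.Id, s.ColA, s.ColB
    FROM   dbo.SourceTable AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id)
    ORDER  BY s.Id;

    SET @Rows = @@ROWCOUNT;
END;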
Hopefully this helps get you started. I've given you plenty to google. Please forgive any typos - I spun this up fast. Feel free to ask followup questions.

MS SQL Server Query caching

One of my projects has a very large database on which I can't edit indexes etc., have to work as it is.
What I noticed when testing some queries that I will be running against their database, via a service I am writing in .NET, is that they are quite slow the first time they are run.
What they used to do before is - they have 2 main (large) tables that are used mostly. They showed me that they open SQL Server Management Studio and run a
SELECT *
FROM table1
JOIN table2
query that takes around 5 minutes to run the first time, but about 30 seconds if you run it again without closing SQL Server Management Studio. They keep SQL Server Management Studio open 24/7 so that when one of their programs executes queries involving these 2 tables (which seems to be almost all the queries their programs run), they get the 30-second run time instead of the 5 minutes.
I assume this happens because the 2 tables get cached and then there are no (or close to no) disk reads.
Is this a good idea to have a service which then runs a query to cache these 2 tables every now and then? Or is there a better solution to this, given the fact that I can't edit indexes or split the tables, etc.?
Edit:
Sorry, I was possibly unclear: the DB hopefully has indexes already, I am just not allowed to edit them or anything.
Edit 2:
Query plan
This could be a candidate for an indexed view (if you can persuade your DBA to create it!), something like:
CREATE VIEW transhead_transdata
WITH SCHEMABINDING
AS
SELECT
<columns of interest>
FROM
transhead th
JOIN transdata td
ON th.GID = td.HeadGID;
GO
CREATE UNIQUE CLUSTERED INDEX transjoined_uci ON transhead_transdata (<something unique>);
This will "precompute" the JOIN (and keep it in sync as transhead and transdata change).
You can't create indexes? This is your biggest problem regarding performance. A better solution would be to create the proper indexes and address any performance issues by checking wait stats, resource contention, etc. I'd start with Brent Ozar's blog and open source tools, and move forward from there.
Keeping SSMS open doesn't prevent the plan cache from being cleared. I would start with a few links:
Understanding the query plan cache
Check your current plan cache
Understanding why the cache would clear (memory constraints, too many plans to hold them all, an index rebuild operation, etc.). Brent talks about this in this answer
How to clear it manually
Aside from that... that query is suspect. I wouldn't expect your application to use those results. That is, I wouldn't expect you to load every row and column from two tables into your application every time it was called. Understand that a different query on those same tables, such as one selecting fewer columns or adding a predicate, could and likely would cause SQL Server to generate a new, more optimized query plan. The current query, with no predicates, selecting every column, and with no usable indexes as you stated, would simply do two table scans. Any increase in performance going forward wouldn't be because the plan was cached, but because the data was stored in memory and subsequent reads wouldn't incur physical reads, i.e. it reads from memory instead of disk.
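If you want to confirm that it's the data cache (buffer pool) rather than the plan cache doing the work, a query along these lines shows which objects currently occupy memory. This is a common DMV pattern, sketched here; nothing about it is specific to your database:
-- Which objects in the current database are occupying buffer-pool memory?
SELECT OBJECT_NAME(p.object_id) AS object_name,
       COUNT(*)                 AS cached_pages,
       COUNT(*) * 8 / 1024      AS cached_mb
FROM   sys.dm_os_buffer_descriptors AS bd
JOIN   sys.allocation_units AS au
       ON bd.allocation_unit_id = au.allocation_unit_id
JOIN   sys.partitions AS p
       ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
       OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE  bd.database_id = DB_ID()
GROUP  BY p.object_id
ORDER  BY cached_pages DESC;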
There's a lot more that could be said, but I'll stop here.
You might also consider putting this query into a stored procedure which can then be scheduled to run at a regular interval through SQL Agent, keeping the required pages cached.
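A minimal sketch of such a procedure; the table names are placeholders for the real pair of tables, and note that if covering indexes ever appear, a bare COUNT may read only the narrowest index rather than every page:
-- Touch both tables so their pages stay in the buffer pool.
CREATE PROCEDURE dbo.usp_WarmCache
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @dummy bigint;

    SELECT @dummy = COUNT_BIG(*) FROM dbo.table1;
    SELECT @dummy = COUNT_BIG(*) FROM dbo.table2;
END;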
Thanks to both @scsimon and @Branko Dimitrijevic for their answers; I think they were really useful and guided me in the right direction.
In the end it turns out that the 2 biggest issues were hardware resources (RAM, no SSD), and Auto Close feature that was set to True.
Other fixes that I have made (writing it here for anyone else that tries to improve):
A helper service tool will reorganize (defragment) indexes once a week and rebuild them once a month.
Create a view which has all the columns from the 2 tables in question - to eliminate JOIN cost.
Advised that a DBA can probably help with better tables/indexes
Advised to improve server hardware...
Will accept @Branko Dimitrijevic's answer as I can't accept both

My Postgres database wasn't using my index; I resolved it, but don't understand the fix, can anyone explain what happened?

My database schema in relevant part is there is a table called User, which had a boolean field Admin. There was an index on this field Admin.
The day before, I had restored my full production database onto my development machine and then made only very minor changes to the database, so the two should have been very similar.
When I ran the following command on my development machine, I got the expected result:
EXPLAIN SELECT * FROM user WHERE admin IS TRUE;
Index Scan using index_user_on_admin on user (cost=0.00..9.14 rows=165 width=3658)
Index Cond: (admin = true)
Filter: (admin IS TRUE)
However, when I ran the exact same command on my production machine, I got this:
Seq Scan on user (cost=0.00..620794.93 rows=4966489 width=3871)
Filter: (admin IS TRUE)
So instead of using the exact index that was a perfect match for the query, it was using a sequential scan of almost 5 million rows!
I then tried to run EXPLAIN ANALYZE SELECT * FROM user WHERE admin IS TRUE; with the hope that ANALYZE would make Postgres realize a sequential scan of 5 million rows wasn't as good as using the index, but that didn't change anything.
I also tried to run REINDEX INDEX index_user_on_admin in case the index was corrupted, without any benefit.
Finally, I called VACUUM ANALYZE user and that resolved the problem in short order.
My main understanding of vacuum is that it is used to reclaim wasted space. What could have been going on that would cause my index to misbehave so badly, and why did vacuum fix it?
It was most likely the ANALYZE that helped, by updating the data statistics used by the planner to determine what would be the best way to run a query. VACUUM ANALYZE just runs the two commands in order, VACUUM first, ANALYZE second, but ANALYZE itself would probably be enough to help.
The ANALYZE option to EXPLAIN has nothing at all to do with the ANALYZE command. It just causes Postgres to run the query and report the actual run times, so that they can be compared with the planner's predictions (EXPLAIN without ANALYZE only displays the query plan and what the planner thinks it will cost, but does not actually run the query). So EXPLAIN ANALYZE did not help because it did not update the statistics. ANALYZE and EXPLAIN ANALYZE are two completely different actions that just happen to use the same word.
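In other words (the table name is taken from the question and is quoted here because user is a reserved word in Postgres):
-- Refreshes the planner statistics for the table; this is what was needed:
ANALYZE "user";
-- By contrast, this just runs the query and reports actual timings:
EXPLAIN ANALYZE SELECT * FROM "user" WHERE admin IS TRUE;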
PostgreSQL keeps a number of advanced statistics about the table condition, index condition, data, etc... This can get out of sync sometimes. Running VACUUM will correct the problem.
It is likely that when you reloaded the table from scratch on development, it had the same effect.
Take a look at this:
http://www.postgresql.org/docs/current/static/maintenance.html#VACUUM-FOR-STATISTICS
A partial index seems like a good solution for your issue:
CREATE INDEX admin_users_ix ON users (admin)
WHERE admin IS TRUE;
It makes no sense to index a lot of tuples over an identical field.
Here is what I think is the most likely explanation.
Your index is useful only when a very small number of rows are returned (by the way, I don't like to index booleans for this reason; you might consider a partial index instead, i.e. adding WHERE admin IS TRUE to the index definition, since that will limit the index to the cases where it is likely to be usable anyway).
If more than around, iirc, 10% of the pages in the table are to be retrieved, the planner is likely to choose a lot of sequential disk I/O over a smaller amount of random disk I/O because that way you don't have to wait for platters to turn. The seek speed is a big issue there and PostgreSQL will tend to try to balance that against the amount of actual data to be retrieved from the relation.
You had statistics which indicated either that the table was smaller than it actually was or that admins made up a larger share of users than they actually did, and so the planner used bad information to make its decision.
VACUUM ANALYZE does three things. First it freezes tuples visible to all transactions so that transaction ID wraparound is not an issue. Then it marks tuples visible to no transactions as free space. Neither of these affected your issue. The third, however, is that it analyzes the tables and gathers statistics on them. Keep in mind this is a random sampling and can therefore sometimes be off. My guess is that on the previous run it sampled pages with lots of admins and thus grossly overestimated the number of admins in the system.
This is probably a good time to double check your autovacuum settings because it is also possible that the statistics are very much out of date elsewhere but that is far from certain. In particular, cost-based vacuum settings have defaults that sometimes make it so that vacuum never fully catches up.
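One quick sanity check (a sketch; the table name comes from the question) is to see when statistics were last refreshed, automatically or manually:
-- When were this table's statistics last updated?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'user';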

Postgres query optimization

On Postgres 9.0, setting both index_scan and seq_scan to off improves query performance by 2x. Why?
This may help some queries run faster, but is almost certain to make other queries slower. It's interesting information for diagnostic purposes, but a bad idea for a long-term "solution".
PostgreSQL uses a cost-based optimizer, which looks at the costs of all possible plans based on statistics gathered by scanning your tables (normally by autovacuum) and costing factors. If it's not choosing the fastest plan, it is usually because your costing factors don't accurately model actual costs for your environment, statistics are not up-to-date, or statistics are not fine-grained enough.
After turning index_scan and seq_scan back on:
I have generally found the cpu_tuple_cost default to be too low; I have often seen better plans chosen by setting that to 0.03 instead of the default 0.01; and I've never seen that override cause problems.
If the active portion of your database fits in RAM, try reducing both seq_page_cost and random_page_cost to 0.1.
Be sure to set effective_cache_size to the sum of shared_buffers and whatever your OS is showing as cached.
Never disable autovacuum. You might want to adjust parameters, but do that very carefully, with small incremental changes and subsequent monitoring.
You may need to occasionally run explicit VACUUM ANALYZE or ANALYZE commands, especially for temporary tables or tables which have just had a lot of modifications and are about to be used in queries.
You might want to increase default_statistics_target, from_collapse_limit, join_collapse_limit, or some geqo settings; but it's hard to tell whether those are appropriate without a lot more detail than you've given so far.
You can try out a query with different costing factors set on a single connection. When you confirm a configuration which works well for your whole mix (i.e., it accurately models costs in your environment), you should make the updates in your postgresql.conf file.
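For example, a session-level experiment might look like the sketch below; the values are illustrative, not recommendations:
-- Try alternative costing factors for this session only:
SET cpu_tuple_cost = 0.03;
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET effective_cache_size = '6GB';
-- ...run EXPLAIN ANALYZE on the query under test and compare the plans...
-- Settings revert when the session ends, or put them back explicitly:
RESET ALL;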
If you want more targeted help, please show the structure of the tables, the query itself, and the results of running EXPLAIN ANALYZE for the query. A description of your OS and hardware helps a lot, too, along with your PostgreSQL configuration.
Why?
The most logical answer is because of the way your database tables are configured.
Without you posting your table schema's I can only hazard a guess that your indices don't have a high cardinality.
That is to say, if your index contains too much information to be useful, it will be far less efficient, or indeed slower.
Cardinality is a measure of how unique a row in your index is. The lower the cardinality, the slower your query will be.
A perfect example is having a boolean field in your index; perhaps you have a Contacts table in your database and it has a boolean column that records true or false depending on whether the customer would like to be contacted by a third party.
On average, if you ran select * from Contacts where OptIn = true, you can imagine that you'd return a lot of contacts; say 50% of the table in our case.
Now if you add this OptIn column to an index on that same table, it stands to reason that no matter how selective the other columns are, you will always return 50% of the table, because of the value of OptIn.
This is a perfect example of low cardinality: it will be slow because any query involving that index has to select 50% of the rows in the table and then apply further WHERE filters to reduce the dataset again.
Long story short: if your indexes include bad fields or simply cover every column in the table, then the SQL engine has to resort to testing row-by-agonizing-row.
Anyway, the above is theoretical in your case, but it is a common, well-known reason why queries suddenly start taking much longer.
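If you want to see how selective a column really is, the planner's own statistics will tell you. A Postgres sketch, using the hypothetical Contacts table above (negative n_distinct values mean a fraction of the row count):
-- What the planner believes about each column's distinct and common values
SELECT attname, n_distinct, most_common_vals
FROM   pg_stats
WHERE  tablename = 'contacts';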
Please fill in the gaps regarding your data structure, index definitions and the actual query that is really slow!

SQL Plus vs Toad IDE - Running insert in SQL Plus takes significantly longer

I'm running a query like this:
INSERT INTO TableA (colA, colB)
Select ColA, ColB
from TableB
This is a huge insert, as it is querying over 2 million rows and then inserting them into the table. My question is in regard to the performance. When I run the query in Toad, it takes around 4-5 minutes to run.
When I run the query through sqlplus it takes way longer. It has already been running for 40+ minutes and it is not finished. I've even done some minor tuning by setting server output off in case that affected performance.
Is there any tuning I should be aware of in regard to running the query via sqlplus? Is there any way to find out the difference in how the query is being executed/handled by the different clients?
Note: This is the only way I can transfer my data from table A to table B. I've looked into imp/exp and impdp/expdp and it is not possible in my situation.
Toad - v. 9.6.1.1
SqlPlus - 9.2.0.1.0
Oracle DB - 10g
This sounds like there is something else involved. My wild guess would be that your SQL*Plus session is getting blocked. Can you check v$lock to see if that is the case? There are a lot of scripts / tools to check to see what your session is currently spending its time on. Figure that out and then go from there. I personally like Tanel Poder's Snapper script (http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper).
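A quick way to check whether another session is blocking yours (a sketch; these v$session columns exist in 10g):
-- Who is blocked, and by which session?
SELECT sid, serial#, blocking_session, event, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;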
It could be a thousand things. (@John Gardner: This is one reason why I'm not a huge fan of dba.stackexchange.com - you won't know if it's a programming issue or a DBA issue until you know the answer. I think it's better if we all work together on one site.)
Here are some ideas:
Different session settings - parallel DML and parallel query may be enabled, forced, or disabled. Look at your login scripts, or look at the session info with select pdml_status, pq_status, v$session.* from v$session;
A lock, as @Craig suggested. Although I think it's easier to look at select v$session.blocking_session, v$session.* from v$session; to identify locks.
Delayed block cleanout will make the second query slower. Run with set autotrace on. The db block gets and redo size are probably larger the second time (the second statement has some extra work to do, although this probably isn't nearly enough to explain the time difference).
Buffer cache may make the second query faster. Run with set autotrace on, there may be a large difference in physical reads. Although with that much data the chances are probably small that a huge chunk of it is cached.
Other sessions may be taking up a lot of resources. Look at select * from v$sessmetric order by physical_reads desc,logical_reads desc, cpu desc; Or maybe look at v$sysmetric_history.
You may want to consider parallel and append hints. You can probably make that query run 10 times faster (although there are some downsides to that approach, such as the data being unrecoverable initially).
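For example, a direct-path parallel insert might look like the sketch below; the degree of 4 is arbitrary, and the hints only take effect if the session and objects allow parallel DML:
-- APPEND requests a direct-path load; PARALLEL requests a degree of 4.
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(a, 4) */ INTO TableA a (colA, colB)
SELECT /*+ PARALLEL(b, 4) */ colA, colB
FROM   TableB b;
COMMIT;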
Also, for testing, you may want to use smaller sizes. Run the insert with something like and rownum <= 10000 added to the query. Performance tuning is very hard; it helps a lot if you can run the statements frequently. There are always some flukes, and you want to ignore the outliers, but you can't do that with only two samples.
You can look at some detailed stats for each run, but you may need to run the query with INSERT /*+ GATHER_PLAN_STATISTICS */.... Then run this to find the sql_id: select * from v$sql where sql_text like '%INSERT%GATHER_PLAN_STATISTICS%';
Then run this to look at the details of each step: select * from v$sql_plan_statistics_all where sql_id = '<sql_id from above>';
(In 11g, you can use v$sql_monitor, or even better, dbms_sqltune.report_sql_monitor.)
A really obvious point, but it's been known to trip people up... are there any indexes on TableA; if so, are any of them unique; and if so, did you commit or roll back the Toad session before running it again in SQL*Plus? Not doing so is an easy way of getting blocked, as @Craig suggests. In this scenario it won't ever finish - your 40+ minute wait is while it's blocking on the first row insert.
If there are any indexes you're likely to be better off dropping them while you do the insert and recreating them afterwards as that's usually significantly faster.
As other people have already suggested, there are a lot of things that could cause a statement that selects/inserts that much data to perform badly (and inconsistently). While I have sometimes seen Toad do things that improve performance, I've never seen it make anything this much faster, so I'm inclined to think it's more to do with the database than the tool.
I would ask the DBAs to check your session and the database while the slow statement is running. They should be able to give you some indication of what's happening - they'll be able to check for any problems such as locking or excessive log file switching. They'll also be able to trace both sessions (Toad and SQL*Plus) to see how Oracle is executing those statements and whether there are any differences, etc.
Depending what it is you're doing, they might even be able to help you run the insert faster. For example, it can be faster to disable an index, do the insert, then rebuild it; or it might be possible to disable logging temporarily. This would obviously depend on your exact scenario.