SQL Plus vs Toad IDE - Running insert in SQL Plus takes significantly longer

I'm running a query like this:
INSERT INTO TableA (colA, colB)
Select ColA, ColB
from TableB
This is a huge insert, as it is querying over 2 million rows and then inserting them into the table. My question is about performance: when I run the query in Toad it takes around 4-5 minutes to run.
When I run the query through SQL*Plus it takes much longer - it has already been running for 40+ minutes and is not finished. I've even done some minor tuning by turning server output off in case that affected performance.
Is there any tuning I should be aware of when running the query via SQL*Plus? Is there any way to find out how the query is being executed/handled differently by the two clients?
Note: This is the only way I can transfer my data from TableB to TableA. I've looked into imp/exp and impdp/expdp and it is not possible in my situation.
Toad - v. 9.6.1.1
SqlPlus - 9.2.0.1.0
Oracle DB - 10g

This sounds like there is something else involved. My wild guess would be that your SQL*Plus session is getting blocked. Can you check v$lock to see if that is the case? There are a lot of scripts / tools to check to see what your session is currently spending its time on. Figure that out and then go from there. I personally like Tanel Poder's Snapper script (http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper).

It could be a thousand things. (@John Gardner: This is one reason why I'm not a huge fan of dba.stackexchange.com - you won't know if it's a programming issue or a DBA issue until you know the answer. I think it's better if we all work together on one site.)
Here are some ideas:
Different session settings - parallel DML and parallel query may be enabled, forced, or disabled. Look at your login scripts, or look at the session info with select pdml_status, pq_status, v$session.* from v$session;
A lock, as @Craig suggested. Although I think it's easier to look at select v$session.blocking_session, v$session.* from v$session; to identify locks.
Delayed block cleanout will make the second query slower. Run with set autotrace on. The db block gets and redo size are probably larger the second time (the second statement has some extra work to do, although this probably isn't nearly enough to explain the time difference).
Buffer cache may make the second query faster. Run with set autotrace on, there may be a large difference in physical reads. Although with that much data the chances are probably small that a huge chunk of it is cached.
Other sessions may be taking up a lot of resources. Look at select * from v$sessmetric order by physical_reads desc,logical_reads desc, cpu desc; Or maybe look at v$sysmetric_history.
You may want to consider parallel and append hints. You can probably make that query run 10 times faster, although there are some downsides to that approach, such as the data being unrecoverable initially (see the sketch after this list).
Also, for testing, you may want to use smaller sizes. Run the insert with something like "and rownum <= 10000" added to the where clause. Performance tuning is very hard, and it helps a lot if you can run the statements frequently. There are always some flukes and you want to ignore the outliers, but you can't do that with only two samples.
You can look at some detailed stats for each run, but you may need to run the query with INSERT /*+ GATHER_PLAN_STATISTICS */.... Then run this to find the sql_id: select * from v$sql where sql_text like '%INSERT%GATHER_PLAN_STATISTICS%';
Then run this to look at the details of each step: select * from v$sql_plan_statistics_all where sql_id = '<sql_id from above>';
(In 11g, you can use v$sql_monitor, or even better, dbms_sqltune.report_sql_monitor.)
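For example, here is a rough, hedged sketch of the hinted approach mentioned above (the degree of 8 is an arbitrary assumption for illustration; test the rownum-limited version first, and remember that direct-path/APPEND loads leave the new data unrecoverable from the redo log until the next backup):
-- small test run first, as suggested above
INSERT /*+ APPEND */ INTO TableA (colA, colB)
SELECT ColA, ColB FROM TableB WHERE rownum <= 10000;
ROLLBACK;
-- full run with direct-path and parallel hints
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(TableA, 8) */ INTO TableA (colA, colB)
SELECT /*+ PARALLEL(TableB, 8) */ ColA, ColB
FROM TableB;
COMMIT;
You can then pick up the sql_id and per-step statistics with the v$sql and v$sql_plan_statistics_all queries shown above.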

A really obvious point, but it's been known to trip people up... are there any indexes on TableA; if so, are any of them unique; and if so, did you commit or rollback the Toad session before running it again in SQL*Plus? Not doing so is an easy way of getting blocked, as @Craig suggests. In this scenario it won't ever finish - your 40+ minute wait is the session blocked on the first row it tries to insert.
If there are any indexes you're likely to be better off dropping them while you do the insert and recreating them afterwards as that's usually significantly faster.
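If you're not sure what indexes and unique constraints exist on the target, a quick check might look like this (assuming you own the table; use the DBA_ views otherwise):
SELECT index_name, uniqueness, status
FROM user_indexes
WHERE table_name = 'TABLEA';
SELECT constraint_name, constraint_type
FROM user_constraints
WHERE table_name = 'TABLEA'
AND constraint_type IN ('P', 'U');   -- primary key / unique constraints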

As other people have already suggested, there are a lot of things that could cause a statement that selects/inserts this much data to perform badly (and inconsistently). While I have seen Toad do things to improve performance sometimes, I've never seen it make anything run that much faster, so I'm inclined to think it's more to do with the database than the tool.
I would ask the DBAs to check your session and the database while the slow statement is running. They should be able to give you some indication of what's happening - they can check for problems such as locking or excessive log file switching. They can also trace both sessions (Toad and SQL*Plus) to see how Oracle is executing the statements and whether there are any differences.
Depending what it is you're doing, they might even be able to help you run the insert faster. For example, it can be faster to disable an index, do the insert, then rebuild it; or it might be possible to disable logging temporarily. This would obviously depend on your exact scenario.
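As a rough sketch of the "disable the index, insert, rebuild" idea (the index name is a placeholder, and this is something to agree with your DBA first):
ALTER INDEX tablea_colb_ix UNUSABLE;             -- hypothetical non-unique index
ALTER SESSION SET skip_unusable_indexes = TRUE;
INSERT /*+ APPEND */ INTO TableA (colA, colB)
SELECT ColA, ColB FROM TableB;
COMMIT;
ALTER INDEX tablea_colb_ix REBUILD;
-- note: an UNUSABLE unique index still blocks inserts; a unique constraint would have
-- to be disabled, or its index dropped and recreated, instead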

Related

What might cause Oracle to choose parallel execution on one database versus the other for the same objects?

I have a test and a development Oracle 19c database, and an older pre-existing query is running out of temp tablespace on one of them. On the database that runs out of space, the explain plan uses a lot of parallel execution steps (PX SEND BROADCAST, PX RECEIVE, PX BLOCK ITERATOR). That database is also buffering a lot of the scanned data, which I assume is what is eating up all the space.
On the dev database, the same query, against the same objects, with the same indexes and same everything else as far as I have checked, runs without running out of space. Its explain plan has about half the steps and does not use parallel execution at all.
I am trying to work with one of our DBAs to find what is causing the difference. What are some things I should look at that might explain such a difference in the plans? I have checked that the indexes are the same, the data sizes are the same, and that stats have been gathered recently, and I have also looked at these settings:
select PDML_ENABLED, PDML_STATUS, PDDL_STATUS, PQ_STATUS FROM V$session where sid = (select sid from v$mystat where rownum = 1);
Are there any global or session parameters I might compare between the two databases?
When you say the tables/indexes are the same, make sure to check their "parallel" attribute.
At the system level, check parallel_degree_policy.
Also, an explain plan should tell you why a specific degree of parallelism was chosen; that might provide a clue.
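For example, a quick comparison you might run on both databases (object names are placeholders; the DBA_ views need the right privileges):
-- DEFAULT or a value greater than 1 means the object itself requests parallelism
SELECT owner, table_name, degree
FROM dba_tables
WHERE table_name = 'YOUR_TABLE';
SELECT owner, index_name, degree
FROM dba_indexes
WHERE table_name = 'YOUR_TABLE';
-- MANUAL vs AUTO/ADAPTIVE changes whether the optimizer picks a degree of parallelism on its own
SELECT name, value
FROM v$parameter
WHERE name IN ('parallel_degree_policy', 'parallel_degree_limit', 'parallel_max_servers');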

Is there a possibility to make my query in SQL run faster?

I am trying to run a query that should produce only 2 million rows and 12 columns. However, my query has been running for 6 hours... I would like to ask if there is anything I can do to speed it up, and if there are general tips.
I am still a beginner in SQL and your help is highly appreciated.
INSERT INTO #ORSOID values (321) --UK
INSERT INTO #ORSOID values (368) --DE
SET @startorderdate = '4/1/2019' --'1/1/2017' --EDIT THESE
SET @endorderdate = '6/30/2019' --EDIT THESE
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---step 1 for the list of opids and check the table to see if more columns that are needed are present to include them
--Create a list of relevant OpIDs for the selected time period
select
op1.oporid,
op1.opcurrentponum,
o.orcompletedate,
o.orsoid,
op1.opid,
op1.opreplacesopid,
op1.opreplacedbyopid,
op1.OpSplitFromOpID,
op1.opsuid,
op1.opprsku,
--op1.orosid,
op1.opdatenew,
OPCOMPONENTMANUFACTURERPARTID
into csn_junk.dbo.SCOpid
from csn_order..vworder o with (nolock)
inner join csn_order..vworderproduct op1 with (nolock) on oporid = orid
LEFT JOIN CSN_ORDER..TBLORDERPRODUCT v WITH (NOLOCK) on op1.opid = v.OpID
where op1.OpPrGiftCertificate = 0
and orcompletedate between @startorderdate and @endorderdate
and orsoid in (select soid from #orsoid)
Select * From csn_junk.dbo.SCOpid
First, there is no way to know why a query runs for many hours on a server we don't have access to, and without any metrics (e.g. an execution plan or CPU/memory/IO metrics). Also, without any DDL it's impossible to understand exactly what's going on with your query.
General Guidelines for troubleshooting slow data modification:
Getting the right metrics
The first thing I'd do is run task manager on that server and see if you have a server issue or a query issue. Is the CPU pegged to 100%? If so, is sqlservr.exe the cause? How often do you run this query? How fast is it normally?
There are a number of native and 3rd party tools for collecting good metrics. Execution plans, DMFs and DMVs, Extended Events, SQL Traces, Query Store. You also have great third party tools like Brent Ozar's suite of tools, Adam Machanic's sp_whoisactive.
There's a saying in the BI World: If you can't measure it, you can't manage it. If you can't measure what's causing your queries to be slow, you won't know where to start.
Big updates like this can cause locking, blocking, lock-escalation and even deadlocks.
Understand execution plans, specifically actual execution plans.
I write my code in SSMS with "Show execution plan" turned on. I always want to know what my query is doing. You can also view the execution plans after the fact by capturing them using SQL Traces (via the SQL Profiler) or Extended Events.
This is a huge topic so I'll just mention some things off the top of my head that I look for in my plans when troubleshooting slow queries: Sorts, Key Lookups, RID lookups, Scans against large tables (e.g. you scan an entire 10,000,000 row table to retrieve 12,000 rows - for this you want a seek.) Sometimes there will be warnings in the execution plan such as a "tempdb spill" - these are bad. Sometimes the plan will call out "missing indexes" - a topic unto itself. Which brings me to...
INDEXES
This is where execution plans, DMVs and other SQL monitoring tools really come in handy. The rule of thumb is: for SELECT queries it's nice to have plenty of good indexes available for the optimizer to choose from; in a normalized data mart, for example, more is better. For INSERT/UPDATE/DELETE operations you want as few indexes as possible, because each index associated with the modified data has to be maintained. For a big insert like the one you are doing, fewer indexes would be better on csn_junk.dbo.SCOpid and, as mentioned in the comments below your post, you want the correct indexes to support the tables used in the query.
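For example, a quick, hedged way to see how many indexes an insert has to maintain on its target table (using the table from the question; any table name works):
SELECT i.name, i.type_desc, i.is_unique
FROM csn_junk.sys.indexes AS i
WHERE i.object_id = OBJECT_ID('csn_junk.dbo.SCOpid')
AND i.index_id > 0;   -- index_id 0 is the heap itself, not an index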
CONSTRAINTS
Constraints slow data modification. Referential integrity constraints (primary/foreign keys) and UNIQUE constraints will impact performance, and CHECK constraints can as well. CHECK constraints that use a T-SQL scalar function will destroy data-modification performance more than almost anything else I can think of; scalar-UDF CHECK constraints that also access other tables can turn an insert that should take a minute into one that takes several hours.
MDF & LDF file growth
A 2,000,000+ row / 12 column insert is going to cause the associated MDF and LDF files to grow substantially. If your data files (.MDF or .NDF) or log file (.LDF) fill up, they auto-grow to create space. This can slow a query that normally runs in seconds so that it takes minutes, especially when your auto-growth settings are bad. See: SQL Server Database Growth and Autogrowth Settings
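A hedged sketch for checking current file sizes and growth settings in the database you're inserting into (sizes in sys.database_files are stored as 8 KB pages):
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(12)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(12)) + ' MB'
       END AS growth_setting
FROM sys.database_files;   -- run this in the target database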
Whenever a query that normally runs in 10 seconds suddenly, out of nowhere, runs for minutes - and assuming it's not a deadlock or server issue - I check for MDF or LDF autogrowth, as this is often the culprit. Often there is a log file that needs to be shrunk (via a log backup or manually, depending on the recovery model). This brings me to batching:
Batching
Huge inserts chew up log space and take forever to roll back if the query fails. Making things worse - cancelling a huge insert (or trying to Kill the Spid) will sometimes cause more problems. Doing data modifications in batches can circumvent this problem. See this article for more details.
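A minimal, hedged sketch of batching the insert from the question (this assumes csn_junk.dbo.SCOpid already exists and that OpID is unique so NOT EXISTS can tell which rows were already copied; the column list and WHERE clause are abbreviated and the batch size is a placeholder):
DECLARE @batch int = 50000;
WHILE 1 = 1
BEGIN
    INSERT INTO csn_junk.dbo.SCOpid (opid, oporid, opcurrentponum)   -- plus the remaining columns
    SELECT TOP (@batch) op1.opid, op1.oporid, op1.opcurrentponum
    FROM csn_order..vworderproduct op1
         INNER JOIN csn_order..vworder o ON op1.oporid = o.orid
    WHERE o.orcompletedate BETWEEN @startorderdate AND @endorderdate
          AND NOT EXISTS (SELECT 1 FROM csn_junk.dbo.SCOpid s WHERE s.opid = op1.opid);

    IF @@ROWCOUNT < @batch BREAK;   -- last, partial batch has been copied

    CHECKPOINT;   -- under SIMPLE recovery, lets log space be reused between batches
END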
Hopefully this helps get you started. I've given you plenty to google. Please forgive any typos - I spun this up fast. Feel free to ask followup questions.

MS SQL Server Query caching

One of my projects has a very large database on which I can't edit indexes etc., have to work as it is.
What I saw when testing some queries that I will be running on their database, via a service I am writing in .NET, is that they are quite slow the first time they run.
What they used to do before is this: they have 2 main (large) tables that are used the most. They showed me that they open SQL Server Management Studio and run a query like
SELECT *
FROM table1
JOIN table2
This query takes around 5 minutes to run the first time, but only about 30 seconds if you run it again without closing SQL Server Management Studio. So they keep SQL Server Management Studio open 24/7, so that when one of their programs executes queries related to these 2 tables (which seems to be almost all of the queries their program runs) they get the 30-second run time instead of the 5 minutes.
I assume this happens because the 2 tables get cached and then there are no (or close to no) disk reads.
Is this a good idea to have a service which then runs a query to cache these 2 tables every now and then? Or is there a better solution to this, given the fact that I can't edit indexes or split the tables, etc.?
Edit:
Sorry, I was possibly unclear: the DB hopefully has indexes already, I am just not allowed to edit them or anything.
Edit 2:
Query plan
This could be a candidate for an indexed view (if you can persuade your DBA to create it!), something like:
CREATE VIEW transhead_transdata
WITH SCHEMABINDING
AS
SELECT
<columns of interest>
FROM
transhead th
JOIN transdata td
ON th.GID = td.HeadGID;
GO
CREATE UNIQUE CLUSTERED INDEX transjoined_uci ON transhead_transdata (<something unique>);
This will "precompute" the JOIN (and keep it in sync as transhead and transdata change).
You can't create indexes? This is your biggest problem regarding performance. A better solution would be to create the proper indexes and address any performance issues by checking wait stats, resource contention, etc. I'd start with Brent Ozar's blog and open source tools, and move forward from there.
Keeping SSMS open doesn't prevent the plan cache from being cleared. I would start with a few links:
Understanding the query plan cache
Check your current plan cache
Understanding why the cache would clear (memory constraint, too many plans (can't hold them all), Index Rebuild operation, etc. Brent talks about this in this answer
How to clear it manually
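As a hedged illustration of the "check your current plan cache" and "clear it manually" points (standard DMVs; requires VIEW SERVER STATE, and clearing the cache is not something to do casually on production):
-- what's currently in the plan cache, most-reused plans first
SELECT TOP (20) cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
     CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
-- clear the whole plan cache manually
DBCC FREEPROCCACHE;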
Aside from that... that query is suspect. I wouldn't expect your application to use those results - that is, I wouldn't expect you to load every row and column from two tables into your application every time it was called. Understand that a different query on those same tables - selecting fewer columns, adding a predicate, etc. - could and likely would cause SQL Server to generate a new, more optimized query plan. The current query, without predicates and selecting every column (and with no indexes, as you stated), would simply do two table scans. Any increase in performance going forward wouldn't be because the plan was cached, but because the data was stored in memory and subsequent reads wouldn't incur physical reads, i.e. it is reading from memory instead of disk.
There's a lot more that could be said, but I'll stop here.
You might also consider putting this query into a stored procedure, which can then be scheduled to run at a regular interval through SQL Agent; that will keep the required pages cached.
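A hedged sketch of that suggestion, reusing the tables from the question (the join column is hypothetical, since the original query doesn't show one; the aggregate keeps result traffic minimal while still reading the pages):
CREATE PROCEDURE dbo.KeepTablesCached
AS
BEGIN
    SET NOCOUNT ON;
    -- same join as above, but aggregated so nothing large is sent back to the client
    SELECT COUNT_BIG(*)
    FROM dbo.table1 AS t1
         JOIN dbo.table2 AS t2 ON t2.table1_id = t1.id;   -- hypothetical join columns
END;
GO
-- then schedule EXEC dbo.KeepTablesCached from a SQL Agent job, e.g. every 15 minutes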
Thanks to both @scsimon and @Branko Dimitrijevic for their answers; they were really useful and guided me in the right direction.
In the end it turns out that the 2 biggest issues were hardware resources (RAM, no SSD), and Auto Close feature that was set to True.
Other fixes that I have made (writing it here for anyone else that tries to improve):
A helper service tool will reorganize (defragment) indexes once a week and rebuild them once a month.
Create a view which has all the columns from the 2 tables in question - to eliminate JOIN cost.
Advised that a DBA can probably help with better tables/indexes
Advised to improve server hardware...
Will accept @Branko Dimitrijevic's answer as I can't accept both.

Select LIMIT 1 takes long time on postgresql

I'm running a simple query on localhost PostgreSQL database and it runs too long:
SELECT * FROM features LIMIT 1;
I expect such a query to finish in a fraction of a second, as it basically says "peek anywhere in the database and pick one row". Or doesn't it?
table size is 75 GB, with an estimated row count of 1.84405e+008 (about 184 million rows)
I'm the only user of the database
the database server was just started, so I guess nothing is cached in memory
I totally agree with @larwa1n and the comment he left on your post.
The reason here, I guess, is simply that the SELECT itself is slow.
In my experience there can be other reasons as well, such as:
The table is very big, so add a WHERE clause and an index.
The performance of your server/disk drive is too slow.
Other processes are taking up most of the resources.
Another possible reason is maintenance: check whether autovacuum is running. If it isn't, check whether the table has been vacuumed at all; if it hasn't, run a VACUUM FULL on it. When you do a lot of inserts/updates/deletes on a large table without vacuuming, the table ends up stored in fragmented disk blocks, which makes queries take longer.
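A hedged sketch of how to check that (standard PostgreSQL catalog views; "features" is the table from the question):
-- is autovacuum enabled at all?
SHOW autovacuum;
-- when was the table last vacuumed, and how many dead rows does it carry?
SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'features';
-- if it has never been vacuumed and is badly bloated (note: VACUUM FULL locks the table while it runs)
VACUUM FULL VERBOSE features;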
Hopefully this answer helps you find the real reason.

Long running Jobs Performance Tips

I have been working with SQL Server for a while and have used a lot of performance techniques to fine-tune many queries. Most of those queries were expected to execute within a few seconds, or maybe minutes.
I am now working with a job which loads around 100K rows of data and runs for around 10 hours.
What are the things I need to consider while writing or tuning such query? (e.g. memory, log size, other things)
Make sure you have good indexes defined on the columns you are querying on.
Ultimately, the best thing to do is to actually measure and find the source of your bottlenecks. Figure out which queries in a stored procedure or what operations in your code take the longest, and focus on slimming those down, first.
I am actually working on a similar problem right now, on a job that performs complex business logic in Java for a large number of database records. I've found that the key is to process records in batches, and make as much of the logic as possible operate on a batch instead of operating on a single record. This minimizes roundtrips to the database, and causes certain queries to be much more efficient than when I run them for one record at a time. Limiting the batch size prevents the server from running out of memory when working on the Java side. Since I am using Hibernate, I also call session.clear() after every batch, to prevent the session from keeping copies of objects I no longer need from previous batches.
Also, an RDBMS is optimized for working with large sets of data; use normal set-based SQL operations whenever possible. Avoid things like cursors and a lot of procedural programming, and, as other people have said, make sure you have your indexes set up correctly.
It's impossible to say without looking at the query. Just because you have indexes doesn't mean they are being used. You'll have to look at the execution plan and see whether they are used; it might show that they aren't useful for this query.
You can start with looking at the estimated execution plan. If the job actually completes, you can wait for the actual execution plan. Look at parameter sniffing. Also, I had an extremely odd case on SQL Server 2005 where
SELECT * FROM l LEFT JOIN r ON r.ID = l.ID WHERE r.ID IS NULL
would not complete, yet
SELECT * FROM l WHERE l.ID NOT IN (SELECT r.ID FROM r)
worked fine - but only for particular tables. Problem was never resolved.
Make sure your statistics are up to date.
If possible, post your query here so there is something to look at. I recall a query someone built with joins to 12 different tables, dealing with around 4 million or so records, that took around a day to run. I was able to tune that to run within 30 minutes by eliminating the unnecessary joins. Where possible, try to reduce the datasets you are joining before returning your results; use plenty of temp tables, views, etc. if you need to.
For large datasets with conditions, try to pre-apply your conditions through a view (or temp table) before your joins, to reduce the number of records:
100k rows joining 100k rows is a lot bigger than 2k joining 3k.
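For example, a hedged sketch of pre-filtering before a join (table and column names are invented for illustration):
-- apply the conditions first and keep only the rows you need...
SELECT o.OrderID, o.CustomerID, o.OrderDate
INTO #recent_orders
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20190101';

CREATE CLUSTERED INDEX ix_recent_orders ON #recent_orders (OrderID);

-- ...then join the much smaller set
SELECT r.OrderID, r.OrderDate, d.ProductID, d.Quantity
FROM #recent_orders AS r
     INNER JOIN dbo.OrderDetails AS d ON d.OrderID = r.OrderID;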