Why does my query in oracle take longer when I use temporary tables? - sql

I am facing a peculiar issue when using an inner query in Oracle DB. I am fetching data from a table that has a huge number of records.
The query I am using contains an inner query.
When I provide the values directly in the inner query, it is much faster.
But when I pull exactly the same values from another (temporary) table,
via either an inner query or a JOIN, it takes far longer.
Below is the query:
Faster performance
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (82, 85, 116, .........)
Slower performance when a SELECT query is used in the inner section
Without JOIN
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (SELECT temp_id FROM global_temp_ids)
With JOIN
SELECT assembly_item_id menuItemId, location_id restId, bill_sequence_id, bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v t1
join global_temp_ids t2
on t1.location_id = t2.temp_id
WHERE t1.assembly_item_id = 8321
Note: zil_ibat_resolve_bmi_ai_max_v is a view.
What is wrong with this query? Why does it take so much longer when I query the table instead of putting the IDs directly in the inner section? Is there an alternative?
Explain Plan
[Screenshots: plan when the IDs come from a SELECT in the inner query, plan when using the JOIN, and plan when the IDs are entered directly in the inner query.]

The second and third query are slow because of the NESTED LOOP join between the view results and the temporary table. Changing it to a HASH join, perhaps through better optimizer statistics or a USE_HASH hint, should speed up the query.
Problem
This part at the top of the execution plan:
NESTED LOOPS
zil_ibat_resolve_bmi_ai_max_v
global_temp_ids
is similar to this pseudo-code:
for each row of zil_ibat_resolve_bmi_ai_max_v
search index of global_temp_ids
Based on the images, the execution plan for the view does not change between queries, so that part must be relatively fast. The look-up of the temporary table uses a unique index scan, which must also be fast. But it is only fast to do it once. And we can tell from the Cardinality of 1 that the Oracle optimizer thinks it will only execute the inner part of the join once.
NESTED LOOPs are great when joining a small number of rows. HASH JOINs work much better when joining a large number of rows.
Solutions
There are many ways to change the join method, here are the two to try first:
1. Gather statistics. Better optimizer statistics will improve the cardinality estimates, which will usually improve execution plans. There are many ways to gather stats, but usually the default settings are best. In this case they can be gathered by running a procedure like the one sketched after this list: exec dbms_stats.gather_schema_stats('SMART'); Repeat that for the schemas ZILADMIN and XCBAIRAG. If the statistics were missing or stale, it would also be a good idea to investigate why the default statistics-gathering job did not run.
2. Hint. Hints should generally be avoided in production code, but they can still be helpful for diagnosing the problem. Run the query with the hint SELECT /*+ USE_HASH(t1 t2) */ ... (as sketched below) and see if that improves things. If it works, you can either keep the hint or consider some other form of plan management. For example, a SQL Profile may solve this and other problems in a cleaner way. Check with other developers or DBAs to find out what types of plan-management features are common in your system.
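For option 1, a minimal sketch of the stats-gathering calls (the schema names are taken from the question's plans; adjust to your environment):
exec dbms_stats.gather_schema_stats('SMART');
exec dbms_stats.gather_schema_stats('ZILADMIN');
exec dbms_stats.gather_schema_stats('XCBAIRAG');
For option 2, the hinted version of the slow JOIN query could look like this (same query as above, only the hint added):
SELECT /*+ USE_HASH(t1 t2) */
       t1.assembly_item_id menuItemId,
       t1.location_id restId,
       t1.bill_sequence_id,
       t1.bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v t1
join global_temp_ids t2
on t1.location_id = t2.temp_id
WHERE t1.assembly_item_id = 8321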

Related

SQL Join with GROUP BY query optimisation

I'm trying to optimise the following query.
SELECT C.name, COUNT(DISTINCT I.id), COUNT(B.id)
FROM Categories C, Items I, Bids B
WHERE C.id = I.category
AND I.id = B.item_id
GROUP BY C.name
ORDER BY 2 DESC, 3 DESC;
Categories is a small table with 20 records.
Items is a large table with over 50,000 records.
Bids is an even larger table with over 600,000 records.
I have an index on
Categories(name, id), Items(category), and Bids(item_id, id).
The PRIMARY KEY for each table is: Items(id), Categories(id), Bids(id)
Is there any way to optimise the query? Much appreciated.
Without EXPLAIN (ANALYZE, BUFFERS) output this is guesswork.
The query is so simple that nothing can be optimized there.
Make sure that you have correct table statistics; check EXPLAIN (ANALYZE) to see if PostgreSQL's estimates are correct (a sketch follows this list).
Increase shared_buffers so that the whole database fits into RAM (if you can).
Increase work_mem so that all hashes and sorts are performed in memory.
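For reference, a minimal sketch of capturing that output for the query in question:
EXPLAIN (ANALYZE, BUFFERS)
SELECT C.name, COUNT(DISTINCT I.id), COUNT(B.id)
FROM Categories C, Items I, Bids B
WHERE C.id = I.category
AND I.id = B.item_id
GROUP BY C.name
ORDER BY 2 DESC, 3 DESC;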
Not really: you are scanning all records. How many of the Items records are hit by the data from Bids? I would imagine all tables are fully scanned and hash joined, and the indexes disregarded.
Your query seems really boilerplate, and I am sure that with the size of your tables, any server that is not really low on hardware can run this query in a heartbeat. But you can always make things better. Here's a list of optimizations you can make that should, theoretically, boost your query's performance:
Theoretically speaking, your biggest inefficiency here is that you are calculating the cross product of your tables instead of joining them. You can rewrite the query with joins like:
...
FROM Items I
INNER JOIN Bids B
ON I.id = B.item_id
INNER JOIN Categories C
ON C.id = I.category
...
If we are considering everything performance-wise, your index on the category column of the Items table is inefficient, since the index has only 20 distinct values mapped to 50K rows, and you may even get better performance without it. However, from a practical point of view, there is a lot of other stuff to consider here, so this may not actually be a big deal.
You have no explicit index on the id column of the Items table, and having an index on that column speeds up your first join. (However, PostgreSQL creates an index for primary key columns by default, so this is not a big deal either.)
Also, adding EXPLAIN ANALYZE to the beginning of your query shows you the plan the PostgreSQL query planner uses to run your query. If you know a thing or two about query plans, I suggest you take a look at those results too, to find any remaining inefficiencies.

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query it takes too long. Can you please suggest how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the distinct.
That is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are duplicates. If you analyze the query plan (programmers 101: learn to use that, fast) you will see that the join is likely not the big problem; the DISTINCT is.
And IIRC that DISTINCT is useless, because all rows are unique anyway. Not 100% sure, but the field list seems to indicate so; a quick check is sketched below.
Use DISTINCT very rarely, please ;)
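To verify that the DISTINCT really removes nothing before dropping it, one hedged check is to compare row counts with and without it (the FROM clause is copied from the view above):
-- If these two counts match, the DISTINCT is doing no work and can be dropped.
SELECT COUNT(*) FROM dbo.customReport; -- the view, with DISTINCT
SELECT COUNT(*)
FROM ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID; -- no DISTINCT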
You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed (a sketch follows this list)
2. Make sure that the query execution plan indicates that you are using the indexes you expect
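A hypothetical sketch for point 1; the table and column names come from the view above, and since the View* objects are views you would index their underlying tables instead (the index names here are made up):
CREATE INDEX IX_Payment_UserID ON Payment (UserID);
CREATE INDEX IX_PaymentProcessLog_UserID ON PaymentProcessLog (UserID);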

Subquery v/s inner join in sql server

I have following queries
First one using inner join
SELECT item_ID,item_Code,item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID=F.itemID
WHERE F.doctorID = #doctorId AND F.favType = 'I'
second one using sub query like
SELECT item_ID,item_Code,item_Name from [Pharmacy].[tblitemHdr]
WHERE item_ID IN
(SELECT itemID FROM EMR.tblFavourites
WHERE doctorID = #doctorId AND favType = 'I'
)
The item table [Pharmacy].[tblitemHdr] contains 15 columns and 2000 records, and EMR.tblFavourites contains 5 columns and around 100 records. In this scenario, which query gives me better performance?
Usually joins work faster than inner queries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it into an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same result.
In Sql Server Management Studio you can enable "Client Statistics" and also Include Actual Execution Plan. This will give you the ability to know precisely the execution time and load of each request.
Also, between each request, clear the cache to avoid caching side effects on performance:
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always better to see with your own eyes than to rely on theory!
Sub-query vs Join
Table one: 20 rows, 2 cols
Table two: 20 rows, 2 cols
Sub-query: 20 * 20
Join: 20 * 2
The scan count indicates the multiplication effect, as the system has to go through the data again and again to fetch it. For your performance measure, just look at the time.
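To see those scan counts for yourself in SQL Server, a small sketch:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run each of the two queries here, then compare the "Scan count" and
-- elapsed-time figures printed in the Messages tab of Management Studio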
A join is faster than a subquery.
A subquery makes for busy disk access: think of the hard disk's read-write head, which goes back and forth as it accesses User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
A join works by concentrating the operation on the result of the first two tables; any subsequent joins concentrate on the in-memory (or cached-to-disk) result of the previously joined tables, and so on. Less read-write head movement, thus faster.
The first query is better than the second, because in the first query we are joining both tables.
Also check the explain plan for both queries...

Oracle SQL query - unexpected query plan

I have a very simple query that's giving me unexpected results. Hints on where to troubleshoot it would be welcome.
Simplified, the query is:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE obs.hdId IN (53, 54)
This gives me a query cost of: 963. However, if I change the query to:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE ObsHead.name IN ('BP SYSTOLIC', 'BP DIASTOLIC')
Although it (should) return the same data, the estimated cost shoots up to 17688. Where is the problem here likely to lie? Thanks.
Edit: The query plan says that the index on ObsHead.Name is being used for a range scan, and the table access on ObsHead only costs 4. There's another index on Obs.hdId that's being used for a range scan costing 94: it's the Nested Loops join between the tables that jumps up to 17K.
As has already been stated, the plan's cost is not intended for comparing two different queries, only for comparing different paths for the same query.
This is only a guess, but in this case, the cardinality field of the plan might be more useful to you. If the index on OBSHEAD is not unique and the statistics were gathered using an estimate, then the optimizer may not know exactly how many rows to expect when querying that table. The cardinality will tell you whether this is true or not (ideally, you'll be seeing a cardinality of 2 for OBSHEAD).
Another suggestion is to check the statistics on OBS. It seems likely that it is a table that grows frequently, in which case statistics gathered on January 28th are not recent enough. Assuming monitoring is turned on for this table, the queries below can tell you if the statistics are stale and need to be refreshed.
select owner, table_name, last_analyzed, stale_stats
from all_tab_statistics
where owner = 'ML' and table_name = 'OBS';
select owner, index_name, last_analyzed, stale_stats
from all_ind_statistics
where owner = 'ML' and table_name = 'OBS';
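If STALE_STATS comes back as YES, refreshing them is a one-liner (a sketch; the default options are usually best):
exec dbms_stats.gather_table_stats('ML', 'OBS');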
There is probably an index on hdId (which there is if it's the primary key, as I suspect is the case) and not on name, which means that the second query will have to do a full table scan.
Costs are only useful for comparing different plans for one query; they're not so useful for comparing different queries.
You need to look at the plans and compare them in terms of the actions they perform.
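For example, in Oracle you can print a plan with DBMS_XPLAN (a sketch using the first query):
EXPLAIN PLAN FOR
SELECT Obs.obsDate, Obs.obsValue, ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE Obs.hdId IN (53, 54);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);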
I suspect the actual performance of these queries will be similar - however it would be interesting to know whether the first query uses a hash join, which might help things if the percentage of records in obs that are matched is significant.
I find the costs supplied by the optimizer to be interesting but not particularly useful. The best way I've found to compare queries is to run them and see how they perform relative to one another.
Share and enjoy.

simple sql query

which one is faster
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where p.x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2
In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good at parallelizing and pipelining result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
) t
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
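Whether your MySQL version actually materializes the derived table can be checked with EXPLAIN; a row whose select_type is DERIVED indicates the buffering described above (a sketch):
EXPLAIN
SELECT *
FROM (
SELECT *
FROM mytable
) t
LIMIT 1;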
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable I'd always opt for the more maintainable SQL code which your first example is for me.
It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will eliminate the most records, it can either start with joining the tables or with filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.
I'd think the first. I'm not sure if the optimizer would use any indexes on the derived table in the second query, or if it would copy out all the rows that match into memory before joining back to the children.
This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation, it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both of those queries through the query optimizer to see which one is faster, then start using that one. That's assuming the difference is noticeable in the first place. If the difference is minimal, go for the one that is easiest to read and maintain.
For me, the second query says: "I don't trust the optimizer to optimize this query, so I'll provide some 'hints'."
I'd say, trust the optimizer until it lets you down, and only then consider trying to do the optimizer's job for it.