How to improve the performance of multiple joins

How to improve the performance of multiple joins - sql

I have a query with multiple joins in it. When I execute the query it takes too long. Can you please suggest me how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO

Work on removing the distinct.
THat is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are double - if you analyze the query plan (programmers 101 - learn to use that fast) you will see that the join likely is not the big problem but the distinct is.
And IIRC that distinct is USELESS because all rows are unique anyway... not 100% sure, but the field list seems to indicate.
Use distincts VERY rarely please ;)

You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
and

It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed
2. Make sure that the query execution plan indicates that you are using the indexes you expect

Related

Why does my query in oracle take longer when I use temporary tables?

I am facing a peculiar issue when using an inner query in ORACLE DB. I am fetching data from a table which is having huge number of records.
The query I am using contains an inner query.
When I provide the values directly in the inner query it is much
faster.
But when I use exactly the same values from another (temporary) table
by either inner query or JOIN, it takes too longer.
Below is the query:
Faster performance
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (82, 85, 116, .........)
Low in performance when used select query in inner section
Without JOIN
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (SELECT temp_id FROM global_temp_ids)
With JOIN
SELECT assembly_item_id menuItemId, location_id restId, bill_sequence_id, bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v t1
join global_temp_ids t2
on t1.location_id = t2.temp_id
WHERE t1.assembly_item_id = 8321
Note: zil_ibat_resolve_bmi_ai_max_v is a view.
What is wrong with this query? Why is it taking so much time when I query table instead of putting the IDs directly in the inner section? Is there an alternate for this?
Explain Plan
usedSelectQueryInInnerSection.png
usedJoin
enterNumbersInInnerQuery

The second and third query are slow because of the NESTED LOOP join between the view results and the temporary table. Changing it to a HASH join, perhaps through better optimizer statistics or a USE_HASH hint, should speed up the query.
Problem
This part at the top of the execution plan:
NESTED LOOPS
zil_ibat_resolve_bmi_ai_max_v
global_temp_ids
is similar to this pseudo-code:
for each row of zil_ibat_resolve_bmi_ai_max_v
search index of global_temp_ids
Based on the images the execution plan for the view does not change between queries, that part must be relatively fast. And the look-up of the temporary table uses a unique index search, that must also be fast. But it is only fast to do it once. And we can tell from the the Cardinality 1 that the Oracle optimizer thinks it will only execute the inner part of the join once.
NESTED LOOPs are great when joining a small number of rows. HASH JOINs work much better when joining a large number of rows.
Solutions
There are many ways to change the join method, here are the two to try first:
1. Gather statistics. Better optimizer statistics will improve the cardinality estimates, which will usually improve execution plans. There are many ways to gather stats but usually the default settings are the best. In this case they can be gathered by running a procedure like this: exec dbms_stats.gather_schema_stats('SMART'); Repeat that for the schemas ZILADMIN and XCBAIRAG. If the statistics were missing or stale it would also be a good idea to investigate why the default statistics gathering job did not run.
2. Hint. Hints should generally be avoided in production code but they can still at least be helpful to diagnose the problem. Run the query with the hint SELECT /*+ USE_HASH(t1 t2) */ ... and see if that improves things. If that works you can either keep the hint or consider using some other form of plan management. For example, a SQL Profile may solve this and other problems in a cleaner way. Check with other developers or DBAs to find out what types of plan management features are common in your system.

why do some columns slow down the query

I am using SQL Server 2012.
I am trying to optimize a query which is somehting like this:
SELECT TOP 20 ta.id,
ta.name,
ta.amt,
tb.id,
tb.name,
tc.name,
tc.id,
tc.descr
FROM a ta
INNER JOIN b tb
ON ta.id = tb.id
INNER JOIN c tc
ON tb.id = tc.id
ORDER BY ta.mytime DESC
The query takes around 5 - 6 secs to run. There are indexes for all the columns used in joins. The tables have 500k records.
My question is: When I remove the columns tc.name, tc.id and tc.descr from the select, the query returns the results in less than a second. Why?

You need to post the execution plans to really know the difference.
As far as I know, SQL Server does not optimize away joins. After all, even without columns in the select list, the joins can still be used for filtering and multiplying the number of rows.
However, one step might be skipped. With the variables in the select, the engine needs to both go to the index and fetch the page with the data. Without the variables, the engine does not need to do the fetch. This may subtly tip the balance of the optimizer from one type of join to another.
A second possibility simply involves timing. If you ran the query once, then page caches might be filled on the machine. The second time you run it, the query goes much faster simply because the data is in memory. Don't ever run timings unless you either (1) clear the cache between each call or (2) be sure that the cache is filled equivalently.

Do you have clustered indexes? If not, you should create clustered indexes and run your query integer and mostly on primary key columns.
Check http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx for clustered index.

I was finally able to tune the query by adding additional index to the table. SQL server did not show/imply a missing index but I figured it out by creating a new non-clustered index on a field that is present in a select.
Thanks to you all for coming forward for help.
#Wade the link is really helpful in understanding the SQL optimizer the

Subquery v/s inner join in sql server

I have following queries
First one using inner join
SELECT item_ID,item_Code,item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID=F.itemID
WHERE F.doctorID = #doctorId AND F.favType = 'I'
second one using sub query like
SELECT item_ID,item_Code,item_Name from [Pharmacy].[tblitemHdr]
WHERE item_ID IN
(SELECT itemID FROM EMR.tblFavourites
WHERE doctorID = #doctorId AND favType = 'I'
)
In this item table [Pharmacy].[tblitemHdr] Contains 15 columns and 2000 records. And [Pharmacy].[tblitemHdr] contains 5 columns and around 100 records. in this scenario which query gives me better performance?

Usually joins will work faster than inner queries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it on an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same result.
Here and here some links to help.

In Sql Server Management Studio you can enable "Client Statistics" and also Include Actual Execution Plan. This will give you the ability to know precisely the execution time and load of each request.
Also between each request clean the cache to avoid cache side effect on performance
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always best to see with our own eyes than relying on theory !

Sub-query Vs Join
Table one 20 rows,2 cols
Table two 20 rows,2 cols
sub-query 20*20
join 20*2
logical, rectify
Detailed
The scan count indicates multiplication effect as the system will have to go through again and again to fetch data, for your performance measure, just look at the time

join is faster than subquery.
subquery makes for busy disk access, think of hard disk's read-write needle(head?) that goes back and forth when it access: User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
join works by concentrating the operation on the result of the first two tables, any subsequent joins would concentrate joining on the in-memory(or cached to disk) result of the first joined tables, and so on. less read-write needle movement, thus faster
Source: Here

First query is better than second query.. because first query we are joining both table.
and also check the explain plan for both queries...

SQL Server - joining 4 fast queries gives me one slow query

I have 4 views in my MS Sql Server Database which are all quite fast (less than 2 seconds) and return all less than 50 rows.
BUT when I create a query where I join those 4 views (left outer joins) I get a query which takes almost one minute to finish.
I think the query optimizer is doing a bad job here, is there any way to speed this up. I am tempted to copy each of the 4 views into a table and join them together but this seems like too much of a workaround to me.
(Sidenote: I can't set any indexes on any tables because the views come from a different database and I am not allowed to change anything there, so this is not an option)
EDIT: I am sorry, but I don't think posting the sql queries will help. They are quite complex and use around 50 different tables. I cannot post an execution plan either because I don't have enought access rights to generate an execution plan on some of the databases.
I guess my best solution right now is to generate temporary tables to store the results of each query.

If you can't touch indexes, to speed up, you can put results of you 4 queries in 4 temp tables and then join them.
You can do this in a stored procedure.

You can have derived table of views while joining.
EXAMPLE: Instead of having this query
SELECT V1.* FROM dbo.View1 AS V1 INNER JOIN dbo.View2 as V2
ON V1.Column1=V2.Column1;
you can have the below query
SELECT V1.* FROM (SELECT * FROM dbo.View1) AS V1 INNER JOIN (SELECT * FROM dbo.View2) AS V2
ON V1.Column1=V2.Column1;
I hope this can impove the performance.

If you have many columns, only include the columns you need. Particularly, if you have many math operations on the columns, the database has to convert all of the numbers when it returns the results.
One more point is that it is sometimes better to do 3 queries than make a huge join and do 1 query.
Without specifics, however, it is difficult to give the right advice beyond generalities.

simple sql query

which one is faster
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where p.x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2

In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good in parallelizing and pipelining the result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
)
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.

I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable I'd always opt for the more maintainable SQL code which your first example is for me.

It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will elliminate most records it can either start with joining the tables or filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.

I'd think the first. I'm not sure if the optimizer would use any indexes on the the derived table in the second query, or if it would copy out all the rows that match into memory before joining back to the children.

This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation, it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference in noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.

For me, in the second query you are saying, I don't trust the optimizer to optimize this query so I'll provide some 'hints'.
I'd say, trust the optimizer until it let's you down and only then consider trying to do the optimizer's job for it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas