SQL Server - joining 4 fast queries gives me one slow query - sql

I have 4 views in my MS Sql Server Database which are all quite fast (less than 2 seconds) and return all less than 50 rows.
BUT when I create a query where I join those 4 views (left outer joins) I get a query which takes almost one minute to finish.
I think the query optimizer is doing a bad job here, is there any way to speed this up. I am tempted to copy each of the 4 views into a table and join them together but this seems like too much of a workaround to me.
(Sidenote: I can't set any indexes on any tables because the views come from a different database and I am not allowed to change anything there, so this is not an option)
EDIT: I am sorry, but I don't think posting the sql queries will help. They are quite complex and use around 50 different tables. I cannot post an execution plan either because I don't have enought access rights to generate an execution plan on some of the databases.
I guess my best solution right now is to generate temporary tables to store the results of each query.

If you can't touch indexes, to speed up, you can put results of you 4 queries in 4 temp tables and then join them.
You can do this in a stored procedure.

You can have derived table of views while joining.
EXAMPLE: Instead of having this query
SELECT V1.* FROM dbo.View1 AS V1 INNER JOIN dbo.View2 as V2
ON V1.Column1=V2.Column1;
you can have the below query
SELECT V1.* FROM (SELECT * FROM dbo.View1) AS V1 INNER JOIN (SELECT * FROM dbo.View2) AS V2
ON V1.Column1=V2.Column1;
I hope this can impove the performance.

If you have many columns, only include the columns you need. Particularly, if you have many math operations on the columns, the database has to convert all of the numbers when it returns the results.
One more point is that it is sometimes better to do 3 queries than make a huge join and do 1 query.
Without specifics, however, it is difficult to give the right advice beyond generalities.

Related

SQL - Make Join Query Faster

I've made a query that selects 2 values from 2 tables. I need to run this query about 32 times when a visitor visits my website. This makes the page quite slow (it takes over 5 seconds to fully load).
The query looks like this:
SELECT tmdb.name, patch.sfo_title
FROM tmdb
RIGHT JOIN patch
ON tmdb.titleid = CONCAT(patch.cusa, '_00')
WHERE cusa = :titleid
LIMIT 1
is there any way to make this query faster? The query isn't the biggest operation if I look at it, so I'm not really sure why it's so slow?
I would write the query as a left join (out of preference, not performance):
SELECT t.name, p.sfo_title
FROM patch p LEFT JOIN
tmdb t
ON t.titleid = CONCAT(p.cusa, '_00')
WHERE p.cusa = :titleid
LIMIT 1;
Then for performance, I would recommend indexes on patch(cusa, sfo_title) and t(titleid).
Note that the use of LIMIT without ORDER BY is suspicious, although you might have reasons for it.
You are joining two tables based on a CALCULATED field? No wonder it is slow. I have no idea how your tables get maintained, but you need to get that concat'ed value of CUSA into the data base as a separate field and get it indexed, as Gordon Linoff suggested. You could even maintain it through On Insert and On Update triggers. Personally, I would examine why you have similar but different keys in your two tables and try to rationalize it down to one. That Concat(CUSA, '_00' ) looks suspiciously like an opportunity to simplify the application.

Inner join vs separate statements

I am trying to get some more context on this out of curiosity. So far when I run 2 separate sql statements I find in SQL Profiler that I have no CPU cycles, less reads and less duration than taking the script and using Inner join. Is this a typical case, I am looking for help to understand this better.
Simple example:
SELECT * FROM dbo.ChargeCode
SELECT * FROM dbo.ChargeCodeGroup
vs
SELECT *
FROM dbo.ChargeCode c
INNER JOIN dbo.ChargeCodeGroup cc ON c.ChargeCodeGroupID = cc.ChargeCodeGroupID
From what I guess is that inner join cost extra CPU cycles because its doing a nested loop. Am I on the right track with this?
The simple answer is that you're doing two different things here. In your 1st example you're retrieving 2 separate entities. In your second example, your asking the RDBMS to combine (join) 2 entities into a single result set.
A join is one of the most powerful capabilities of an RDBMS - and it will (usually) do it as efficiently as it possibly can - but that's not to say it's free or cheap.
SELECT * FROM sometable
must scan whole table.
If there are indexes on ChargeCodeGroupID column on either table, it will be much faster for INNER JOIN to only scan index. (By their name, I guess there are). Of course, if there is no index on either ChargeCodeGroupID column, second query is slower than the first one.

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query it takes too long. Can you please suggest me how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the distinct.
THat is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are double - if you analyze the query plan (programmers 101 - learn to use that fast) you will see that the join likely is not the big problem but the distinct is.
And IIRC that distinct is USELESS because all rows are unique anyway... not 100% sure, but the field list seems to indicate.
Use distincts VERY rarely please ;)
You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
and
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed
2. Make sure that the query execution plan indicates that you are using the indexes you expect

Subquery v/s inner join in sql server

I have following queries
First one using inner join
SELECT item_ID,item_Code,item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID=F.itemID
WHERE F.doctorID = #doctorId AND F.favType = 'I'
second one using sub query like
SELECT item_ID,item_Code,item_Name from [Pharmacy].[tblitemHdr]
WHERE item_ID IN
(SELECT itemID FROM EMR.tblFavourites
WHERE doctorID = #doctorId AND favType = 'I'
)
In this item table [Pharmacy].[tblitemHdr] Contains 15 columns and 2000 records. And [Pharmacy].[tblitemHdr] contains 5 columns and around 100 records. in this scenario which query gives me better performance?
Usually joins will work faster than inner queries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it on an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same result.
Here and here some links to help.
In Sql Server Management Studio you can enable "Client Statistics" and also Include Actual Execution Plan. This will give you the ability to know precisely the execution time and load of each request.
Also between each request clean the cache to avoid cache side effect on performance
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always best to see with our own eyes than relying on theory !
Sub-query Vs Join
Table one 20 rows,2 cols
Table two 20 rows,2 cols
sub-query 20*20
join 20*2
logical, rectify
Detailed
The scan count indicates multiplication effect as the system will have to go through again and again to fetch data, for your performance measure, just look at the time
join is faster than subquery.
subquery makes for busy disk access, think of hard disk's read-write needle(head?) that goes back and forth when it access: User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
join works by concentrating the operation on the result of the first two tables, any subsequent joins would concentrate joining on the in-memory(or cached to disk) result of the first joined tables, and so on. less read-write needle movement, thus faster
Source: Here
First query is better than second query.. because first query we are joining both table.
and also check the explain plan for both queries...

simple sql query

which one is faster
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where p.x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2
In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good in parallelizing and pipelining the result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
)
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable I'd always opt for the more maintainable SQL code which your first example is for me.
It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will elliminate most records it can either start with joining the tables or filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.
I'd think the first. I'm not sure if the optimizer would use any indexes on the the derived table in the second query, or if it would copy out all the rows that match into memory before joining back to the children.
This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation, it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference in noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.
For me, in the second query you are saying, I don't trust the optimizer to optimize this query so I'll provide some 'hints'.
I'd say, trust the optimizer until it let's you down and only then consider trying to do the optimizer's job for it.