SQL - Make Join Query Faster

I've made a query that selects 2 values from 2 tables. I need to run this query about 32 times when a visitor visits my website. This makes the page quite slow (it takes over 5 seconds to fully load).
The query looks like this:
SELECT tmdb.name, patch.sfo_title
FROM tmdb
RIGHT JOIN patch
ON tmdb.titleid = CONCAT(patch.cusa, '_00')
WHERE cusa = :titleid
LIMIT 1
Is there any way to make this query faster? The query isn't a big operation as far as I can tell, so I'm not really sure why it's so slow.

I would write the query as a left join (out of preference, not performance):
SELECT t.name, p.sfo_title
FROM patch p
LEFT JOIN tmdb t
ON t.titleid = CONCAT(p.cusa, '_00')
WHERE p.cusa = :titleid
LIMIT 1;
Then for performance, I would recommend indexes on patch(cusa, sfo_title) and tmdb(titleid).
Note that the use of LIMIT without ORDER BY is suspicious, although you might have reasons for it.
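For reference, those indexes could be created like this (assuming MySQL and that nothing equivalent exists yet):
-- Covering index: the WHERE lookup on cusa can also return sfo_title without touching the row
CREATE INDEX idx_patch_cusa_sfo ON patch (cusa, sfo_title);
-- Index for the join lookup against the concatenated value
CREATE INDEX idx_tmdb_titleid ON tmdb (titleid);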

You are joining two tables based on a CALCULATED field? No wonder it is slow. I have no idea how your tables get maintained, but you need to get that concatenated value of cusa into the database as a separate field and get it indexed, as Gordon Linoff suggested. You could even maintain it through ON INSERT and ON UPDATE triggers. Personally, I would examine why you have similar but different keys in your two tables and try to rationalize them down to one. That CONCAT(cusa, '_00') looks suspiciously like an opportunity to simplify the application.
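One way to sketch that separate-field idea without triggers (assuming MySQL 5.7+; the column name and length here are made up) is a stored generated column on patch that stays in sync automatically and can be indexed:
-- Stored generated column keeps CONCAT(cusa, '_00') materialized and indexable
ALTER TABLE patch
    ADD COLUMN tmdb_titleid VARCHAR(16) AS (CONCAT(cusa, '_00')) STORED,
    ADD INDEX idx_patch_tmdb_titleid (tmdb_titleid);
The join condition can then become ON tmdb.titleid = patch.tmdb_titleid, which lets both sides of the join use a plain index lookup.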

Related

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query, it takes too long. Can you please suggest how I can improve this query?
ALTER VIEW [dbo].[customReport]
AS
SELECT DISTINCT ViewUserInvoicerReport.Owner,
       ViewUserAll.ParentID AS Account, ViewContact.Company,
       Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
       PaymentProcessLog.InvoiceNumber
FROM ViewContact
INNER JOIN ViewUserInvoicerReport ON ViewContact.UserID = ViewUserInvoicerReport.UserID
INNER JOIN ViewUserAll ON ViewUserInvoicerReport.UserID = ViewUserAll.UserID
INNER JOIN Payment ON Payment.UserID = ViewUserAll.UserID
INNER JOIN ViewInvoice ON Payment.UserID = ViewInvoice.UserID
INNER JOIN PaymentProcessLog ON ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the DISTINCT.
That is not a join issue. The problem is that ALL rows have to go into a temp table to find out which ones are duplicates - if you analyze the query plan (programmers 101: learn to use it, fast) you will see that the join is likely not the big problem, but the DISTINCT is.
And IIRC that DISTINCT is useless, because all rows are unique anyway... not 100% sure, but the field list seems to indicate it.
Use DISTINCT very rarely, please ;)
You should examine the query execution plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See How To: Optimize SQL Queries for step-by-step instructions.
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed (see the sketch after this list)
2. Make sure that the query execution plan indicates that you are using the indexes you expect
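As a rough illustration of point 1 (assuming SQL Server and that Payment and PaymentProcessLog are base tables; the index names are made up):
CREATE NONCLUSTERED INDEX IX_Payment_UserID
    ON dbo.Payment (UserID);
CREATE NONCLUSTERED INDEX IX_PaymentProcessLog_UserID
    ON dbo.PaymentProcessLog (UserID);
The views in the query can only benefit indirectly, through indexes on their underlying tables.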

Why do some columns slow down the query?

I am using SQL Server 2012.
I am trying to optimize a query which is something like this:
SELECT TOP 20 ta.id,
ta.name,
ta.amt,
tb.id,
tb.name,
tc.name,
tc.id,
tc.descr
FROM a ta
INNER JOIN b tb
ON ta.id = tb.id
INNER JOIN c tc
ON tb.id = tc.id
ORDER BY ta.mytime DESC
The query takes around 5-6 seconds to run. There are indexes for all the columns used in the joins. The tables have 500k records.
My question is: When I remove the columns tc.name, tc.id and tc.descr from the select, the query returns the results in less than a second. Why?
You need to post the execution plans to really know the difference.
As far as I know, SQL Server does not optimize away joins. After all, even without columns in the select list, the joins can still be used for filtering and multiplying the number of rows.
However, one step might be skipped. With the extra columns in the select, the engine needs to both go to the index and fetch the page with the data. Without those columns, the engine does not need to do the fetch. This may subtly tip the balance of the optimizer from one type of join to another.
A second possibility simply involves timing. If you ran the query once, the page caches might be filled on the machine. The second time you run it, the query goes much faster simply because the data is in memory. Don't ever run timings unless you either (1) clear the cache between each call or (2) are sure that the cache is filled equivalently.
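On SQL Server (on a development box, never in production), the caches can be leveled between timing runs like this:
CHECKPOINT;            -- write dirty pages to disk so the buffer pool can be emptied
DBCC DROPCLEANBUFFERS; -- clear cached data pages
DBCC FREEPROCCACHE;    -- clear cached query plans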
Do you have clustered indexes? If not, you should create clustered indexes, ideally on integer primary key columns, and re-run your query.
Check http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx for more on clustered indexes.
I was finally able to tune the query by adding an additional index to the table. SQL Server did not show or imply a missing index, but I figured it out by creating a new non-clustered index on a field that is present in the select.
Thanks to you all for coming forward for help.
@Wade, the link is really helpful in understanding the SQL optimizer.
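For what it's worth, the kind of index the asker describes would typically be a covering non-clustered index in SQL Server; a hypothetical sketch based on the columns in the query above:
-- Key on the join column; INCLUDE carries the selected columns so the
-- engine can skip the extra data-page fetch described earlier
CREATE NONCLUSTERED INDEX IX_c_id_covering
    ON c (id)
    INCLUDE (name, descr);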

Order of tables in the FROM clause

For an SQL query like this:
Select * from TABLE_A a
JOIN TABLE_B b
ON a.propertyA = b.propertyA
JOIN TABLE_C c
ON b.propertyB = c.propertyB
Does the sequence of the tables matter? It won't change the results, but does it affect the performance?
One can assume that the data in table C is much larger than in A or B.
For each SQL statement, the engine will create a query plan. So no matter in which order you list the tables, the engine will choose a correct path to build the query.
More on query plans: http://en.wikipedia.org/wiki/Query_plan
There are ways, depending on which RDBMS you are using, to enforce the join order and plan by using hints, if you feel that the engine is not choosing the correct path.
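For example, SQL Server lets you pin the join order to the order written in the FROM clause with a query hint, and MySQL offers STRAIGHT_JOIN for the same purpose; only reach for these when the plan is demonstrably wrong (SQL Server syntax shown, using the question's tables):
SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON a.propertyA = b.propertyA
JOIN TABLE_C c ON b.propertyB = c.propertyB
OPTION (FORCE ORDER); -- join in the order the tables are written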
Sometimes the order of tables does make a difference here (when you are using different kinds of joins).
Conceptually, joins work like a cross product:
if you write a query like A JOIN B JOIN C,
it is treated like ((A * B) * C).
That means the intermediate result comes from joining tables A and B first, and that result is then joined with table C.
So if the inner join of A (100 records) and B (200 records) gives, say, 100 records,
then those 100 records are compared with the 1,000 records of C.
No.
Well, there is a very, very tiny chance of this happening, see this article by Jonathan Lewis. Basically, the number of possible join orders grows very quickly, and there's not enough time for the Optimizer to check them all. The sequence of the tables may be used as a tie-breaker in some very rare cases. But I've never seen this happen, or even heard about it happening, to anybody in real life. You don't need to worry about it.

SQL Server - joining 4 fast queries gives me one slow query

I have 4 views in my MS SQL Server database which are all quite fast (less than 2 seconds) and each return fewer than 50 rows.
BUT when I create a query that joins those 4 views (left outer joins), I get a query which takes almost one minute to finish.
I think the query optimizer is doing a bad job here. Is there any way to speed this up? I am tempted to copy each of the 4 views into a table and join those, but this seems like too much of a workaround to me.
(Side note: I can't set any indexes on any tables because the views come from a different database and I am not allowed to change anything there, so this is not an option.)
EDIT: I am sorry, but I don't think posting the SQL queries will help. They are quite complex and use around 50 different tables. I cannot post an execution plan either because I don't have enough access rights to generate an execution plan on some of the databases.
I guess my best solution right now is to generate temporary tables to store the results of each query.
If you can't touch indexes, then to speed things up you can put the results of your 4 queries into 4 temp tables and then join them.
You can do this in a stored procedure.
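A minimal sketch of that approach in T-SQL (the view, column, and key names here are placeholders, not the asker's real objects):
-- Materialize each view once into a local temp table
SELECT * INTO #v1 FROM dbo.View1;
SELECT * INTO #v2 FROM dbo.View2;
-- ...same for the remaining views...
-- Then join the small temp tables instead of the views
SELECT v1.*, v2.SomeColumn
FROM #v1 AS v1
LEFT OUTER JOIN #v2 AS v2
    ON v2.KeyColumn = v1.KeyColumn;
Since each view returns fewer than 50 rows, the temp tables stay tiny and the optimizer no longer has to fold all four view definitions into one huge plan.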
You can use derived tables of the views when joining.
EXAMPLE: Instead of having this query
SELECT V1.* FROM dbo.View1 AS V1 INNER JOIN dbo.View2 as V2
ON V1.Column1=V2.Column1;
you can use the query below:
SELECT V1.* FROM (SELECT * FROM dbo.View1) AS V1 INNER JOIN (SELECT * FROM dbo.View2) AS V2
ON V1.Column1=V2.Column1;
I hope this can improve the performance.
If you have many columns, only include the columns you need. Particularly, if you have many math operations on the columns, the database has to convert all of the numbers when it returns the results.
One more point: it is sometimes better to run 3 separate queries than to build one huge join and run a single query.
Without specifics, however, it is difficult to give the right advice beyond generalities.

Simple SQL query

Which one is faster?
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2
In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good at parallelizing and pipelining result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
) t
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable, I'd always opt for the more maintainable SQL code, which for me is your first example.
It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will eliminate the most records, it can either start by joining the tables or by filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.
I'd think the first. I'm not sure if the optimizer would use any indexes on the derived table in the second query, or if it would copy out all the matching rows into memory before joining back to the children.
This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation, it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference in noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.
For me, in the second query you are saying: "I don't trust the optimizer to optimize this query, so I'll provide some 'hints'."
I'd say: trust the optimizer until it lets you down, and only then consider trying to do the optimizer's job for it.