simple sql query - sql

Which one is faster?
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2

In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
It is faster because using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good at parallelizing and pipelining the result streams.
For example, this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
) q
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
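A minimal sketch of how to compare the two forms yourself, using Python's built-in sqlite3 module as a stand-in engine. The table layout and sample rows are assumptions made for illustration, and SQLite's plans say nothing about what MySQL, Oracle, SQL Server or PostgreSQL will do; the point is only the workflow of checking that both forms return the same rows and then reading each plan.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for the engines discussed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents  (id INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE children (id INTEGER PRIMARY KEY, pid INTEGER);
    INSERT INTO parents  VALUES (1, 2), (2, 3), (3, 2);
    INSERT INTO children VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

PLAIN_JOIN = """
    SELECT p.id, p.x, c.id, c.pid
    FROM parents p
    INNER JOIN children c ON c.pid = p.id
    WHERE p.x = 2
"""
INLINE_VIEW = """
    SELECT p.id, p.x, c.id, c.pid
    FROM (SELECT * FROM parents WHERE x = 2) p
    INNER JOIN children c ON c.pid = p.id
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_plain = sorted(conn.execute(PLAIN_JOIN).fetchall())
rows_inline = sorted(conn.execute(INLINE_VIEW).fetchall())

print(rows_plain == rows_inline)  # compares the result sets of the two forms
print(plan(PLAIN_JOIN))
print(plan(INLINE_VIEW))
```

On another engine the same idea applies with that engine's own EXPLAIN facility in place of EXPLAIN QUERY PLAN.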

I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable, I'd always opt for the more maintainable SQL code, which for me is your first example.

It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will eliminate the most records, it can either start with joining the tables or filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.

I'd think the first. I'm not sure if the optimizer would use any indexes on the derived table in the second query, or if it would copy out all the rows that match into memory before joining back to the children.

This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation; it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference is noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.
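One way to check whether the difference is noticeable is a small timing harness. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module, made-up table sizes, and a hypothetical helper named best_time. The absolute numbers mean nothing for your DBMS; only the approach of running each candidate several times and keeping the best wall-clock time carries over.

```python
import sqlite3
import time

def best_time(conn, sql, params=(), repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so the work is really done
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical data set: 1000 parents, 5000 children.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, pid INTEGER)")
conn.executemany("INSERT INTO parents VALUES (?, ?)", [(i, i % 10) for i in range(1000)])
conn.executemany("INSERT INTO children VALUES (?, ?)", [(i, i % 1000) for i in range(5000)])
conn.execute("CREATE INDEX idx_children_pid ON children (pid)")

candidates = {
    "plain join": """
        SELECT p.id, c.id FROM parents p
        INNER JOIN children c ON c.pid = p.id
        WHERE p.x = 2
    """,
    "inline view": """
        SELECT p.id, c.id FROM (SELECT * FROM parents WHERE x = 2) p
        INNER JOIN children c ON c.pid = p.id
    """,
}

timings = {name: best_time(conn, sql) for name, sql in candidates.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

Taking the minimum of several runs reduces noise from caching and other processes; it is still worth repeating the measurement against realistic data volumes.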

For me, in the second query you are saying, I don't trust the optimizer to optimize this query so I'll provide some 'hints'.
I'd say, trust the optimizer until it lets you down, and only then consider trying to do the optimizer's job for it.

Related

Inner join vs separate statements

I am trying to get some more context on this out of curiosity. So far, when I run 2 separate SQL statements, I find in SQL Profiler that I use no CPU cycles, fewer reads and a shorter duration than when I take the script and use an inner join. Is this typical? I am looking for help to understand this better.
Simple example:
SELECT * FROM dbo.ChargeCode
SELECT * FROM dbo.ChargeCodeGroup
vs
SELECT *
FROM dbo.ChargeCode c
INNER JOIN dbo.ChargeCodeGroup cc ON c.ChargeCodeGroupID = cc.ChargeCodeGroupID
My guess is that the inner join costs extra CPU cycles because it's doing a nested loop. Am I on the right track with this?
The simple answer is that you're doing two different things here. In your 1st example you're retrieving 2 separate entities. In your second example, you're asking the RDBMS to combine (join) 2 entities into a single result set.
A join is one of the most powerful capabilities of an RDBMS - and it will (usually) do it as efficiently as it possibly can - but that's not to say it's free or cheap.
SELECT * FROM sometable
must scan the whole table.
If there is an index on the ChargeCodeGroupID column of either table, the INNER JOIN can use that index for its lookups instead of scanning, which is much faster. (Judging by the names, I guess there is one.) Of course, if there is no index on either ChargeCodeGroupID column, the second query is slower than the first one.
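To see the effect of an index on the join column, you can compare the plan before and after creating one. The following is a sketch using Python's sqlite3 module rather than SQL Server; the column lists and the index name idx_group_id are assumptions, and SQLite's automatic transient indexes are switched off so that the "before" plan shows plain scans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Disable SQLite's automatic transient indexes so the unindexed plan shows plain scans.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    -- Hypothetical column lists; only ChargeCodeGroupID comes from the question.
    CREATE TABLE ChargeCodeGroup (ChargeCodeGroupID INTEGER, Name TEXT);
    CREATE TABLE ChargeCode (ChargeCodeID INTEGER, ChargeCodeGroupID INTEGER);
""")

JOIN_SQL = """
    SELECT *
    FROM ChargeCode c
    INNER JOIN ChargeCodeGroup cc ON c.ChargeCodeGroupID = cc.ChargeCodeGroupID
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(JOIN_SQL)
conn.execute("CREATE INDEX idx_group_id ON ChargeCodeGroup (ChargeCodeGroupID)")
plan_after = plan(JOIN_SQL)

print("before:", plan_before)
print("after: ", plan_after)
```

In SQL Server the equivalent check is to compare the actual execution plans in Management Studio before and after adding the index.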

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query it takes too long. Can you please suggest how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the distinct.
That is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are duplicates. If you analyze the query plan (programmers 101: learn to use that fast) you will see that the join is likely not the big problem, but the distinct is.
And IIRC that distinct is USELESS because all rows are unique anyway... not 100% sure, but the field list seems to indicate so.
Use distincts VERY rarely please ;)
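Whether a DISTINCT is actually redundant can be checked rather than guessed: count the rows the query returns without DISTINCT and compare that with the number of distinct rows. The sketch below uses Python's sqlite3 module with two made-up tables (payment, invoice) and a hypothetical helper distinct_is_redundant; it is an illustration of the check, not a statement about the customReport view above.

```python
import sqlite3

def distinct_is_redundant(conn, select_sql):
    """True if the query already returns no duplicate rows, so DISTINCT adds only cost."""
    total = conn.execute(f"SELECT COUNT(*) FROM ({select_sql})").fetchone()[0]
    unique = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM ({select_sql}))"
    ).fetchone()[0]
    return total == unique

# Hypothetical tables: one user with two payments and two invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (user_id INTEGER, posting_date TEXT);
    CREATE TABLE invoice (user_id INTEGER, charge REAL);
    INSERT INTO payment VALUES (1, '2020-01-01'), (1, '2020-02-01');
    INSERT INTO invoice VALUES (1, 10.0), (1, 20.0);
""")

wide = """
    SELECT p.user_id AS user_id, p.posting_date AS posting_date, i.charge AS charge
    FROM payment p INNER JOIN invoice i ON i.user_id = p.user_id
"""
narrow = """
    SELECT p.user_id AS user_id, i.charge AS charge
    FROM payment p INNER JOIN invoice i ON i.user_id = p.user_id
"""

wide_redundant = distinct_is_redundant(conn, wide)      # every joined row differs
narrow_redundant = distinct_is_redundant(conn, narrow)  # the join fan-out repeats rows
print(wide_redundant, narrow_redundant)
```

Note that joining several tables on the same user key multiplies rows per user, so a narrow column list can contain real duplicates even when each source table is unique.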
You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed
2. Make sure that the query execution plan indicates that you are using the indexes you expect

Subquery v/s inner join in sql server

I have following queries
First one using inner join
SELECT item_ID,item_Code,item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID=F.itemID
WHERE F.doctorID = #doctorId AND F.favType = 'I'
second one using sub query like
SELECT item_ID,item_Code,item_Name from [Pharmacy].[tblitemHdr]
WHERE item_ID IN
(SELECT itemID FROM EMR.tblFavourites
WHERE doctorID = #doctorId AND favType = 'I'
)
The item table [Pharmacy].[tblitemHdr] contains 15 columns and 2000 records, and EMR.tblFavourites contains 5 columns and around 100 records. In this scenario, which query gives me better performance?
Usually joins will work faster than inner queries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it into an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same performance from both.
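Before comparing speed, it is worth confirming that the two forms return the same rows on your data. An inner join repeats an item once per matching favourite row, while IN returns each item at most once. The sketch below shows this with Python's sqlite3 module; the simplified table and column names (item_hdr, favourites, item_name) and the sample rows are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the two tables in the question.
    CREATE TABLE item_hdr (item_id INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE favourites (item_id INTEGER, doctor_id INTEGER, fav_type TEXT);
    INSERT INTO item_hdr VALUES (1, 'aspirin'), (2, 'ibuprofen'), (3, 'codeine');
    -- Item 1 is deliberately listed twice for doctor 7.
    INSERT INTO favourites VALUES (1, 7, 'I'), (1, 7, 'I'), (2, 7, 'I'), (3, 8, 'I');
""")

doctor_id = 7

join_rows = conn.execute("""
    SELECT i.item_id, i.item_name
    FROM item_hdr i
    INNER JOIN favourites f ON i.item_id = f.item_id
    WHERE f.doctor_id = ? AND f.fav_type = 'I'
    ORDER BY i.item_id
""", (doctor_id,)).fetchall()

in_rows = conn.execute("""
    SELECT item_id, item_name
    FROM item_hdr
    WHERE item_id IN (SELECT item_id FROM favourites
                      WHERE doctor_id = ? AND fav_type = 'I')
    ORDER BY item_id
""", (doctor_id,)).fetchall()

print(join_rows)  # item 1 appears once per matching favourites row
print(in_rows)    # each item appears at most once
```

If the favourites table cannot contain the same item twice for one doctor (for example because of a unique constraint), the two forms are equivalent and the choice comes down to the plan the engine produces.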
In SQL Server Management Studio you can enable "Client Statistics" and also "Include Actual Execution Plan". This will let you see precisely the execution time and load of each request.
Also, between requests, clean the cache to avoid cache side effects on performance:
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always better to see with our own eyes than to rely on theory!
Sub-query Vs Join
Table one 20 rows,2 cols
Table two 20 rows,2 cols
sub-query 20*20
join 20*2
logical, rectify
Detailed
The scan count indicates a multiplication effect, as the system has to go through the data again and again to fetch it. As a performance measure, just look at the time.
Join is faster than subquery.
A subquery makes for busy disk access. Think of the hard disk's read-write needle (head?) that goes back and forth when it accesses: User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
A join works by concentrating the operation on the result of the first two tables; any subsequent joins concentrate on the in-memory (or cached-to-disk) result of the first joined tables, and so on. There is less read-write needle movement, thus it is faster.
Source: Here
The first query is better than the second query, because in the first query we are joining both tables.
Also check the explain plan for both queries.

order of tables in FROM clause

For an sql query like this.
Select * from TABLE_A a
JOIN TABLE_B b
ON a.propertyA = b.propertyA
JOIN TABLE_C c
ON b.propertyB = c.propertyB
Does the sequence of the tables matter? It won't matter for the results, but does it affect the performance?
One can assume that the data in table C is much larger than in a or b.
For each SQL statement, the engine will create a query plan. So no matter what order you put them in, the engine will choose a correct path to build the query.
More on query plans: http://en.wikipedia.org/wiki/Query_plan
However, depending on which RDBMS you are using, there are ways to enforce the join order and plan using hints, if you feel that the engine does not choose the correct path.
Sometimes the order of the tables makes a difference here (when you are using different join types).
Joins work on the cross-product concept.
If you are using a query like A join B join C,
it will be treated like ((A*B)*C).
That means A and B are joined first, and the result of that join is then joined with table C.
So if inner joining A (100 records) and B (200 records) gives 100 records,
those 100 records are then compared with the 1000 records of C.
No.
Well, there is a very, very tiny chance of this happening, see this article by Jonathan Lewis. Basically, the number of possible join orders grows very quickly, and there's not enough time for the Optimizer to check them all. The sequence of the tables may be used as a tie-breaker in some very rare cases. But I've never seen this happen, or even heard about it happening, to anybody in real life. You don't need to worry about it.
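That the written order of inner-joined tables does not change the result is easy to confirm, and the same script can print the plan the engine picked for each ordering. This is a sketch with Python's sqlite3 module; the lowercase table names, the column names and the sample rows are assumptions, and SQLite's planner choices are not representative of Oracle or any other engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for TABLE_A, TABLE_B and TABLE_C.
    CREATE TABLE table_a (property_a INTEGER);
    CREATE TABLE table_b (property_a INTEGER, property_b INTEGER);
    CREATE TABLE table_c (property_b INTEGER);
    INSERT INTO table_a VALUES (1), (2);
    INSERT INTO table_b VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table_c VALUES (10), (10), (20);
""")

A_B_C = """
    SELECT a.property_a, b.property_b
    FROM table_a a
    JOIN table_b b ON a.property_a = b.property_a
    JOIN table_c c ON b.property_b = c.property_b
"""
C_B_A = """
    SELECT a.property_a, b.property_b
    FROM table_c c
    JOIN table_b b ON b.property_b = c.property_b
    JOIN table_a a ON a.property_a = b.property_a
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_abc = sorted(conn.execute(A_B_C).fetchall())
rows_cba = sorted(conn.execute(C_B_A).fetchall())

print(rows_abc == rows_cba)  # compares the two result sets
print(plan(A_B_C))
print(plan(C_B_A))
```

This holds for inner joins; with outer joins the written order is part of the query's meaning and cannot be swapped freely.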

SQL IN clause slower than individual queries

I'm using Hibernate's JPA implementation with MySQL 5.0.67. MySQL is configured to use InnoDB.
In performing a JPA query (which is translated to SQL), I've discovered that using the IN clause is slower than performing individual queries. Example:
SELECT p FROM Person p WHERE p.name IN ('Joe', 'Jane', 'Bob', 'Alice')
is slower than four separate queries:
SELECT p FROM Person p WHERE p.name = 'Joe'
SELECT p FROM Person p WHERE p.name = 'Jane'
SELECT p FROM Person p WHERE p.name = 'Bob'
SELECT p FROM Person p WHERE p.name = 'Alice'
Why is this? Is this a MySQL performance limitation?
This is a known deficiency in MySQL.
It is often true that using UNION performs better than a range query like the one you show. MySQL doesn't employ indexes very intelligently for expressions using IN (...). A similar hole exists in the optimizer for boolean expressions with OR.
See http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/ for some explanation and detailed benchmarks.
The optimizer is being improved all the time. A deficiency in one version of MySQL may be improved in a subsequent version. So it's worth testing your queries on different versions.
It is also advantageous to use UNION ALL instead of simply UNION. Both queries use a temporary table to store results, but the difference is that UNION applies DISTINCT to the result set, which incurs an additional un-indexed sort.
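The rewrite described above can be generated mechanically from the list of values. The sketch below builds both the IN form and the UNION ALL form with bound parameters and checks that they return the same rows. It uses Python's sqlite3 module and a made-up person table, so it shows only the equivalence of the two forms (given that the value list has no repeats), not MySQL's relative performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; note that two different people are called Joe.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_person_name ON person (name);
    INSERT INTO person (name) VALUES ('Joe'), ('Jane'), ('Bob'), ('Alice'), ('Carol'), ('Joe');
""")

names = ["Joe", "Jane", "Bob", "Alice"]  # must not contain repeats for the forms to match

# One placeholder per value keeps the values out of the SQL text.
marks = ", ".join("?" for _ in names)
in_sql = f"SELECT id, name FROM person WHERE name IN ({marks})"

# One equality lookup per value, glued together with UNION ALL (no DISTINCT step).
union_sql = " UNION ALL ".join(
    "SELECT id, name FROM person WHERE name = ?" for _ in names
)

in_rows = sorted(conn.execute(in_sql, names).fetchall())
union_rows = sorted(conn.execute(union_sql, names).fetchall())

print(in_rows == union_rows)
```

Both statements make a single round trip to the server, which is the main practical difference from issuing four separate queries.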
If you're using the IN operator, it's not much different than saying:
(p.name = 'Joe' OR p.name = 'Jane' OR p.name = 'Bob' OR p.name = 'Alice')
Those are four conditions which must be checked for every row that the query must consider. Of course, each other query you cite has only one condition. I don't believe in most real-world scenarios doing four such queries would be faster, since you have to consider the time it takes for your client to read the result sets and do something with them. In that case, IN looks pretty nice; even better if it can use an index.
A query as simple as the IN demonstrated shouldn't have an issue with the optimizer choosing to use the index. The UNION work mentioned by Bill is only required occasionally when you have more complex queries. It could be an issue with index statistics.
Have you done an ANALYZE on the table in question?
How many rows are in the table and how many match the IN clause?
What does EXPLAIN say for the queries in question?
Are you measuring wall-clock time or query execution time? My guess is that the actual execution time for each of the four individual queries may add up to less than the time to execute the IN query, but the overall wall-clock time will be much longer for the four queries.
It will help to have an index on the name column.
For me, because the IN clause can free the database and tables up to be used by other connections, and because there are application-structure benefits to using it, the IN clause is an invaluable tool, even if there is a slight lag compared with individual queries.
The following technique is utilized in almost every PHP/MySQL application I construct.
I use the IN clause quite a bit with numerical keys.
For example, grabbing four master items and all their subitems could be:
$master_arr = array();
$master_res = mysql_query("SELECT * FROM master WHERE master_id IN (1,7,9,10)");
while ($row = mysql_fetch_assoc($master_res)) {
    $master_arr[$row['master_id']] = $row; // key each master row by its id
}
then:
$subitem_res = mysql_query("SELECT * FROM subitems WHERE par_master_id IN (1,7,9,10)");
then add the subitems to the master items:
while ($sv = mysql_fetch_assoc($subitem_res)) {
    $m_key = $sv['par_master_id'];
    $s_key = $sv['subitem_id'];
    $master_arr[$m_key]['subitem'][$s_key] = $sv; // attach each subitem under its parent
}
This does two things:
1.) the tables are not all held at once with a join
2.) only two mysql queries produce a tree of data
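The same two-query pattern can be sketched outside PHP. The version below uses Python's sqlite3 module with bound parameters for the IN list; the table layout, the sample rows and the helper name load_tree are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name
conn.executescript("""
    -- Hypothetical tables mirroring the master/subitems example.
    CREATE TABLE master   (master_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE subitems (subitem_id INTEGER PRIMARY KEY, par_master_id INTEGER, label TEXT);
    INSERT INTO master   VALUES (1, 'one'), (7, 'seven'), (8, 'eight');
    INSERT INTO subitems VALUES (100, 1, 'a'), (101, 1, 'b'), (102, 7, 'c'), (103, 8, 'd');
""")

def load_tree(conn, master_ids):
    """Build {master_id: {..., 'subitem': {subitem_id: row}}} with exactly two queries."""
    if not master_ids:
        return {}
    marks = ", ".join("?" for _ in master_ids)  # one placeholder per id
    tree = {}
    for row in conn.execute(
        f"SELECT * FROM master WHERE master_id IN ({marks})", master_ids
    ):
        tree[row["master_id"]] = {**dict(row), "subitem": {}}
    for row in conn.execute(
        f"SELECT * FROM subitems WHERE par_master_id IN ({marks})", master_ids
    ):
        parent = tree.get(row["par_master_id"])
        if parent is not None:  # skip subitems whose master row was not found
            parent["subitem"][row["subitem_id"]] = dict(row)
    return tree

tree = load_tree(conn, [1, 7, 9, 10])
print(tree)
```

Ids that match no master row (9 and 10 here) simply do not appear in the result, and the number of queries stays at two however many ids are requested.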
You can make the IN clause faster if you get the values first and then embed the values into the IN clause, instead of embedding the SQL query into the SQL statement.
here is an example of using in clause