Which query is better from a performance perspective?
A
select users.email
from (select * from purchases where id = ***) A
left join users on A.user_id = users.id;
B
select users.email
from purchases A
left join users on A.user_id = users.id where A.id = ***;
Basically, I'm thinking A is better (unless SQL Server optimizes the query significantly).
Please explain which query is better and why. Thanks.
Almost any database optimizer is going to ignore the subquery. Why? SQL queries describe the result set being produced, not the steps for processing it. The SQL optimizer produces the underlying code that is run.
And most optimizers are smart enough to ignore subqueries and to choose optimal indexes, partitions, and algorithms regardless of them. One exception is that some versions of MySQL/MariaDB tend to materialize subqueries -- and that is a performance killer. I think even that has improved in more recent versions.
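If you want to verify whether your version materializes the derived table, a minimal sketch (MySQL syntax; the literal 1 is a hypothetical stand-in for the elided id):
EXPLAIN
SELECT users.email
FROM (SELECT * FROM purchases WHERE id = 1) A  -- 1 is a placeholder id
LEFT JOIN users ON A.user_id = users.id;
-- A row with select_type = DERIVED means the subquery is materialized;
-- if it has been merged away, the plan matches the plain-join version.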
Related
What is the best way to write a query with joins?
First join the tables and then add the where conditions
First apply the where conditions in a subquery and then join
For example, which of the following queries has better performance?
select * from person persons
inner join role roles on roles.person_id_fk = persons.id_pk
where roles.deleted is null
or
select * from person persons
inner join (select * from role roles where roles.deleted is null) as roles
on roles.person_id_fk = persons.id_pk
In a decent database, there should be no difference between the two queries. Remember, SQL is a descriptive language, not a procedural language. That is, a SQL SELECT statement describes the result set that should be returned. It does not specify the steps for creating it.
Your two queries are semantically equivalent and the SQL optimizer should be able to recognize that.
Of course, SQL optimizers are not omniscient. So, sometimes how you write a query does affect the execution plan. However, the queries that you are describing are turned into execution plans that have no concept of "subquery", so it is reasonable that they would produce the same execution plan.
Note: Some databases -- such as MySQL and MS Access -- do not have very good optimizers and such queries do produce different execution plans. Alas.
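To see this for yourself, a minimal sketch (MySQL-style EXPLAIN, using the tables from the question):
EXPLAIN
select * from person persons
inner join role roles on roles.person_id_fk = persons.id_pk
where roles.deleted is null;

EXPLAIN
select * from person persons
inner join (select * from role roles where roles.deleted is null) as roles
on roles.person_id_fk = persons.id_pk;
-- Identical output for both confirms the optimizer merged the subquery.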
I am not sure whether the title of this question is correct or not.
I have a table, for example users, which contains different types of users, like user type 10, 20, 30, etc.
In a query I need to join the user table, but I want only user type 20. So which of the queries below performs better?
SELECT fields
FROM consumer c
INNER JOIN user u ON u.userid = c.userid
WHERE u.type = 20
Or, written another way:
SELECT fields
FROM consumer c
INNER JOIN (SELECT user_fields FROM user WHERE type = 20) u ON u.userid = c.userid
Please advise.
Let's start with this query:
SELECT . . .
FROM consumer c
INNER JOIN user u ON u.userid = c.userid
WHERE u.type = 20;
Assuming that the type is relatively rare, you want indexes on the tables. The best indexes are probably user(type, userid) and consumer(userid). It is possible that an index on user(userid, type) would be better (and it would be unnecessary if userid is a clustered primary key).
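A minimal sketch of those index definitions (index names are hypothetical; user is bracketed because it is a reserved word in SQL Server):
-- Covers the type filter and supplies the join key without a lookup
CREATE INDEX ix_user_type_userid ON [user] (type, userid);
-- Supports the join from the consumer side
CREATE INDEX ix_consumer_userid ON consumer (userid);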
The second query . . . well, from the SQL Server perspective it is probably the same. Why? SQL Server has a good optimizer. You can check the execution plans if you like. Because of the optimizer:
There is no benefit to having a subquery select only a handful of columns. For better or worse, SQL Server pushes that information down to the node that reads the data.
The where clause is not necessarily going to be evaluated before the join. SQL Server is smart enough to re-arrange operations.
Not all optimizers are this smart. In a database such as MySQL, MS Access, or SQLite, I'm pretty sure the first version is much better than the second.
Run the two queries in SSMS as a single batch with "Include Actual Execution Plan" enabled, and you will find that both queries produce the same execution plan, each with a query cost (relative to the batch) of 50%.
That means they are the same.
If they were different (because one was optimized further), you would see a different ratio.
I simulated your query and found the query cost = 50% for each, i.e., they are the same.
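A minimal sketch of such a batch (fields stands in for the actual column list, as in the question; [user] is bracketed because user is reserved in SQL Server):
-- Enable "Include Actual Execution Plan" in SSMS, then run both as one batch
SELECT fields
FROM consumer c
INNER JOIN [user] u ON u.userid = c.userid
WHERE u.type = 20;

SELECT fields
FROM consumer c
INNER JOIN (SELECT userid, type FROM [user] WHERE type = 20) u
ON u.userid = c.userid;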
It really depends on a number of factors:
is "userid" indexed on both tables?
is "type" on table "users" indexed?
how many rows are in each table?
Usually a subquery produces slower performance, but depending on the conditions listed above and on how your SQL Server installation is configured, both queries can be resolved (and so executed) the same way by the query optimizer.
SQL Server takes your query and tries to optimize it, so it can happen that query B is "transformed" into query A.
Look at the Query Analyzer tool for both queries, and see if they have differences.
Generally speaking, subqueries are better avoided, and you'll probably get the best performance doing query A.
Both your options are valid. Personally, I would code it like this:
SELECT fields
FROM consumer c
INNER JOIN user u ON u.userid = c.userid and u.type = 20
Run both queries in SQL Server Management Studio and tick 'Include Actual Execution Plan'. This will let you compare the performance of your queries against each other. The result will depend on your particular database.
For inner joins, is there any difference in performance between applying a filter in the JOIN ON clause and in the WHERE clause? Which is going to be more efficient, or will the optimizer render them equal?
JOIN ON
SELECT u.name
FROM users u
JOIN departments d
ON u.department_id = d.id
AND d.name = 'IT'
VS
WHERE
SELECT u.name
FROM users u
JOIN departments d
ON u.department_id = d.id
WHERE d.name = 'IT'
Oracle 11gR2
There should be no difference. The optimizer should generate the same plan in both cases and should be able to apply the predicate before, after, or during the join in either case based on what is the most efficient approach for that particular query.
Of course, the fact that the optimizer can do something, in general, is no guarantee that the optimizer will actually do something in a particular query. As queries get more complicated, it becomes impossible to exhaustively consider every possible query plan which means that even with perfect information and perfect code, the optimizer may not have time to do everything that you'd like it to do. You'd need to take a look at the actual plans generated for the two queries to see if they are actually identical.
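A minimal sketch of how to check this in Oracle (using the tables from the question):
EXPLAIN PLAN FOR
SELECT u.name
FROM users u
JOIN departments d ON u.department_id = d.id AND d.name = 'IT';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
SELECT u.name
FROM users u
JOIN departments d ON u.department_id = d.id
WHERE d.name = 'IT';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- If both DBMS_XPLAN outputs are identical, the optimizer treated the
-- predicate the same way in both cases.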
I prefer putting the filter criteria in the where clause.
With data warehouse queries, putting the filter criteria in the join seems to cause the query to last significantly longer.
For example, I have Table1 indexed by field Date, and Table2 partitioned by field Partition. Table2 is the biggest table in the query and sits on another database server. I use the driving_site hint to tell the optimizer to use Table2's partitions.
select /*+driving_site(b)*/ a.key, sum(b.money) money
from schema.table1 a
join schema2.table2@dblink b
on a.key = b.key
where b.partition = to_number(to_char(:i,'yyyymm'))
and a.date = :i
group by a.key
If I write the query this way, it takes about 30-40 seconds to return the results.
If I write it the other way, with the filter criteria in the join, it runs for about 10 minutes until I cancel the execution with no results.
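For contrast, a sketch of that slower variant, with the partition filter moved into the join condition:
select /*+driving_site(b)*/ a.key, sum(b.money) money
from schema.table1 a
join schema2.table2@dblink b
on a.key = b.key
and b.partition = to_number(to_char(:i,'yyyymm'))
where a.date = :i
group by a.key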
There's a similar question here, but my question is slightly different:
select *
from process a
inner join subprocess b
on a.id = b.id and a.field = true and b.field = true
So, when using inner join, which operation comes first: the join or the a.field=true condition?
As the two tables are very big, my goal is to filter table process first and after that join only the rows filtered with table subprocess.
Which is the best approach?
First things first:
which operation comes first: the join or the a.field=true condition?
Your INNER JOIN includes this (a.field=true) as part of the condition for the join. So it will prevent rows from being added during the JOIN process.
A part of an RDBMS is the "query optimizer" which will typically find the most efficient way to execute the query - there is no guarantee on the order of evaluation for the INNER JOIN conditions.
Lastly, I would recommend rewriting your query this way:
SELECT *
FROM process AS a
INNER JOIN subprocess AS b ON a.id = b.id
WHERE a.field = true AND b.field = true
This will effectively do the same thing as your original query, but it is widely seen as much more readable by SQL programmers. The optimizer can rearrange INNER JOIN and WHERE predicates as it sees fit to do so.
You are thinking about SQL in terms of a procedural language, which it is not. SQL is a declarative language, and the engine is free to pick the execution plan that works best for a given situation. So there is no way to predict whether a join or a where will be executed first.
A better way to think about SQL is in terms of optimizing queries: things like ensuring that your joins and where clauses are covered by indexes. Also, at least in MS SQL Server, you can preview an estimated or actual execution plan. There is nothing stopping you from doing that and seeing for yourself.
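A minimal sketch of previewing the estimated plan in SQL Server without executing anything (the bit comparisons are written as = 1, since SQL Server has no true literal):
SET SHOWPLAN_XML ON;
GO
-- The query is compiled and its estimated plan returned, but not executed
SELECT *
FROM process AS a
INNER JOIN subprocess AS b ON a.id = b.id
WHERE a.field = 1 AND b.field = 1;
GO
SET SHOWPLAN_XML OFF;
GO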
I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other two tables to that one.
Join order in SQL Server 2008 R2 does unquestionably affect query performance, particularly in queries with a large number of table joins and where clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable solution, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1 min+ execution time) come down to sub-second performance just by changing the order of the join expressions. Please note, however, that these are queries with 12 to 20 joins and where clauses on several of the tables.
The trick is to set your order so as to help the query optimiser figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to make sure that your join order starts with the tables whose where clauses will reduce the data the most.
No, the JOIN order is changed during optimization.
The only caveat is the Option FORCE ORDER which will force joins to happen in the exact order you have them specified.
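A minimal sketch of that hint (table and column names are hypothetical):
SELECT s.id, b.payload
FROM small_table s
INNER JOIN big_table b ON b.small_id = s.id
OPTION (FORCE ORDER); -- joins happen exactly in the written order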
I have a clear example of inner join order affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger, it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
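In Postgres that threshold is the join_collapse_limit setting (default 8); past that many items in the FROM list, the planner stops reordering and keeps the written order. A minimal sketch:
SHOW join_collapse_limit;     -- typically 8
SET join_collapse_limit = 1;  -- effectively forces the written join order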
JOIN order doesn't matter; the query engine will reorganize the joins based on index statistics and other factors.
To test this, do the following:
enable 'Include Actual Execution Plan' and run the first query
change the JOIN order and run the query again
compare the execution plans
They should be identical, as the query engine will reorganize the joins according to those other factors.
As commented on another answer, you could use OPTION (FORCE ORDER) to get exactly the order you want, but it may not be the most efficient one.
As a general rule of thumb, JOIN order should put the table with the fewest records first and the one with the most records last, as in some DBMS engines the order can make a difference, especially if the FORCE ORDER hint is used.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, it makes your query faster.