Which of these select statements is "better," and why? - sql

I have 2 table person and role.
I have to all the persons based on role.
select person.* from person inner join role on
person.roleid = role.id Where role.id = #Roleid
or
select person.* from person inner join role on
person.roleid = role.id AND role.id = #Roleid
Which one of the above two solutions is better and Why?

The first is better because it's logically coherent. The scoping condition isn't relevant to the join, so making it a part of the join is a kludge, and in this case an unhelpful one.

There is no difference in the relational algebra. Criteria from the where and inner joins like this are interchangeable. I use both depending on the readability and situation.
In this particular case, you could also use:
select person.* from person WHERE person.roleid = #Roleid
The only difference being that it does not require that a row exist in the role table (but I assume you have referential integrity for that) and it will not return multiple rows if roleid is not unique (which it almost certainly is in most scenarios I could foresee).

Your best bet is to try these queries out and run them through MS Sql's execution plan. I did this and the results look like this:
Execution plan showing identical performance of queries http://img223.imageshack.us/img223/6491/querycompare.png
As you can see, the performance is the same (granted, running it on your db may produce different results.) So, the best query is the one that follows the consistent convention you use for writing queries.

SQL Server should evaluate those queries identically. Personally, I would use the AND. I like to keep all of the criteria for a joined table in the join itself so that it's all together and easy to find.

Both queries are identical. During query processing, SQL server applies the WHERE filter immediately after applying the join condition filter, so you'll wind up with the same things filtered either way.

I prefer #1, I believe that it expresses the intent of the statement better. That you are joining the two tables based on the roleid & role.id and that you are filtering based on #Roleid
SELECT person.*
FROM person INNER JOIN role ON person.roleid = role.id
Where role.id = #Roleid

Sqlserver probably has an equivalent of the "explain plan" statement that Oracle, PostgreSQL and MySQL all support in one form or another. It can be very useful in telling you how the query parser and optimizer is going to treat your query.

I'd go with the first one, but remember when done testing the code you should explicitly select each table, instead of doing select *

As you are not fetching columns from role you'd better not include it in the FROM clause at all. Use this:
SELECT *
FROM person
WHERE person.roleid IN (SELECT id FROM role WHERE id = #Roleid)
This way the optimizer sees only one table in the FROM clause and can quickly figure out the cardinality of the resultset (that is the number of rows in the resultset is <= the number of rows in table person).
When you throw two tables with a JOIN the optimizer has to look in the ON clause to figure out if these tables are equi-joined and whether unique indexes exist on the joined columns. If the predicate in the ON clause is complicated one (multiple ANDs and ORs) or simply wrong (sometimes very wrong) the optimizer might choose sub-optimal join strategy.
Obviously this particular sample is very contrived, because you can filter persons by roleid = #Roleid directly (no join or sub-query) but the considerations above are valid if you had to filter on other columns in role (#Rolename for instance).

Would there be any performance hit/gain by using this query?
SELECT person.* FROM person,role WHERE person.roleid=role.id AND role.id=#RoleID

Related

Normal Join vs Join with Subqueries

What is the best way for query with joins?
First join tables and then add where conditions
First add where conditions with subquery and then join
For example which one of the following queries have a better performance?
select * from person persons
inner join role roles on roles.person_id_fk = persons.id_pk
where roles.deleted is null
or
select * from person persons
inner join (select * from role roles where roles.deleted is null) as roles
on roles.person_id_fk = persons.id_pk
In a decent database, there should be no difference between the two queries. Remember, SQL is a descriptive language, not a procedural language. That is, a SQL SELECT statement describes the result set that should be returned. It does not specify the steps for creating it.
Your two queries are semantically equivalent and the SQL optimizer should be able to recognize that.
Of course, SQL optimizers are not omniscient. So, sometimes how you write a query does affect the execution plan. However, the queries that you are describing are turned into execution plans that have no concept of "subquery", so it is reasonable that they would produce the same execution plan.
Note: Some databases -- such as MySQL and MS Access -- do not have very good optimizers and such queries do produce different execution plans. Alas.

Left join or Select in select (SQL - Speed of query)

I have something like this:
SELECT CompanyId
FROM Company
WHERE CompanyId not in
(SELECT CompanyId
FROM Company
WHERE (IsPublic = 0) and CompanyId NOT IN
(SELECT ShoppingLike.WhichId
FROM Company
INNER JOIN
ShoppingLike ON Company.CompanyId = ShoppingLike.UserId
WHERE (ShoppingLike.IsWaiting = 0) AND
(ShoppingLike.ShoppingScoreTypeId = 2) AND
(ShoppingLike.UserId = 75)
)
)
It has 3 select, I want to know how could I have it without making 3 selects, and which one has better speed for 1 million record? "select in select" or "left join"?
My experiences are from Oracle. There is never a correct answer to optimising tricky queries, it's a collaboration between you and the optimiser. You need to check explain plans and sometimes traces, often at each stage of writing the query, to find out what the optimiser in thinking. Having said that:
You could remove the outer SELECT by putting the entire contents of it's subquery WHERE clause in a NOT(...). On the face of it will prevent that outer full scan of Company (or it's index of CompanyId). Try it, check the output is the same and get timings, then remove it temporarily before trying the below. The NOT() may well cause the optimiser to stop considering an ANTI-JOIN against the ShoppingLike subquery due to an implicit OR being created.
Ensure that CompanyId and WhichId are defined as NOT NULL columns. Without this (or the likes of an explicit CompanyId IS NOT NULL) then ANTI-JOIN options are often discarded.
The inner most subquery is not correlated (does not reference anything from it's outer query) so can be extracted and tuned separately. As a matter of style I'd swap the table names round the INNER JOIN as you want ShoppingLike scanned first as it has all the filters against it. It wont make any difference but it reads easier and makes it possible to use a hint to scan tables in the order specified. I would even question the need for the Company table in this subquery.
You've used NOT IN when sometimes the very similar NOT EXISTS gives the optimiser more/alternative options.
All the above is just trial and error unless you start trying the explain plan. Oracle can, with a following wind, convert between LEFT JOIN and IN SELECT. 1M+ rows will create time to invest.

DB2 Performance CASE vs COALESCE

I'm modifying an existing statement that joins user information in one table so that the user info can come from another table. One is a permanent table and the other is temporary (records get moved from one to the other). I changed my join to a left join and then left joined the second contact info table. I need to select the permanent field if it exists and the temporary if the permanent isn't there. 154306 is the user id of all incoming records on the main table I'm selecting from. Here are my 2 options for selecting fields:
SELECT
CASE WHEN U.USRID = 154306
THEN T.TMPFNAME
ELSE U.FNAME
END AS FNAME,
COALESCE (U.LNAME, T.TMPLNAME) AS LNAME
FROM FILES.ORDERS O
LEFT JOIN FILES.USERS U ON U.USRID <> 154306 AND U.USRID = O.ORDUSR
LEFT JOIN FILES.TMPUSERS T ON O.ORDNUM = T.TMPORD
I'm thinking the case seems more "correct" as it's actually controlling the flow, but since the coalesce has less logic to follow it might perform faster. Either should accomplish the same result because the 2 left joins ensures we'll get the info for the user no matter what, but don't get the permanent user info for orders which are still assigned to the temp user. It looks like we have 10 fields to case/coalesce so I'm thinking the method with better performance is the way to go, which I think is coalesce but I'm not even sure on that. Is either way better for any reason?
The performance of case versus coalesce() just will not make a difference to a query that is joining three large tables. Such queries are dominated by the time for reading and matching the rows in the table.
By the way, the two are not exactly the same. If you have NULL values in users.Fname, then the case logic would keep them but the coalesce() logic would fill in the values from the other table.
Your criterion should be clarity of expression. Because you think the case makes more sense, I would suggest you go with that.

INNER JOIN vs multiple table names in "FROM" [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

Queries within queries: Is there a better way?

As I build bigger, more advanced web applications, I'm finding myself writing extremely long and complex queries. I tend to write queries within queries a lot because I feel making one call to the database from PHP is better than making several and correlating the data.
However, anyone who knows anything about SQL knows about JOINs. Personally, I've used a JOIN or two before, but quickly stopped when I discovered using subqueries because it felt easier and quicker for me to write and maintain.
Commonly, I'll do subqueries that may contain one or more subqueries from relative tables.
Consider this example:
SELECT
(SELECT username FROM users WHERE records.user_id = user_id) AS username,
(SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name,
in_timestamp,
out_timestamp
FROM records
ORDER BY in_timestamp
Rarely, I'll do subqueries after the WHERE clause.
Consider this example:
SELECT
user_id,
(SELECT name FROM organizations WHERE (SELECT organization FROM locations WHERE records.location = location_id) = organization_id) AS organization_name
FROM records
ORDER BY in_timestamp
In these two cases, would I see any sort of improvement if I decided to rewrite the queries using a JOIN?
As more of a blanket question, what are the advantages/disadvantages of using subqueries or a JOIN? Is one way more correct or accepted than the other?
In simple cases, the query optimiser should be able to produce identical plans for a simple join versus a simple sub-select.
But in general (and where appropriate), you should favour joins over sub-selects.
Plus, you should avoid correlated subqueries (a query in which the inner expression refer to the outer), as they are effectively a for loop within a for loop). In most cases a correlated subquery can be written as a join.
JOINs are preferable to separate [sub]queries.
If the subselect (AKA subquery) is not correlated to the outer query, it's very likely the optimizer will scan the table(s) in the subselect once because the value isn't likely to change. When you have correlation, like in the example provided, the likelihood of single pass optimization becomes very unlikely. In the past, it's been believed that correlated subqueries execute, RBAR -- Row By Agonizing Row. With a JOIN, the same result can be achieved while ensuring a single pass over the table.
This is a proper re-write of the query provided:
SELECT u.username,
u.last_name||', '|| u.first_name AS name,
r.in_timestamp,
r.out_timestamp
FROM RECORDS r
LEFT JOIN USERS u ON u.user_id = r.user_id
ORDER BY r.in_timestamp
...because the subselect can return NULL if the user_id doesn't exist in the USERS table. Otherwise, you could use an INNER JOIN:
SELECT u.username,
u.last_name ||', '|| u.first_name AS name,
r.in_timestamp,
r.out_timestamp
FROM RECORDS r
JOIN USERS u ON u.user_id = r.user_id
ORDER BY r.in_timestamp
Derived tables/inline views are also possible using JOIN syntax.
a) I'd start by pointing out that the two are not necessarily interchangable. Nesting as you have requires there to be 0 or 1 matching value otherwise you will get an error. A join puts no such requirement and may exclude the record or introduce more depending on your data and type of join.
b) In terms of performance, you will need to check the query plans but your nested examples are unlikely to be more efficient than a table join. Typically sub-queries are executed once per row but that very much depends on your database, unique constraints, foriegn keys, not null etc. Maybe the DB can rewrite more efficiently but joins can use a wider variety of techniques, drive the data from different tables etc because they do different things (though you may not observe any difference in your output depending on your data).
c) Most DB aware programmers I know would look at your nested queries and rewrite using joins, subject to the data being suitably 'clean'.
d) Regarding "correctness" - I would favour joins backed up with proper constraints on your data where necessary (e.g. a unique user ID). You as a human may make certain assumptions but the DB engine cannot unless you tell it. The more it knows, the better job it (and you) can do.
Joins in most cases will be much more faster.
Lets take this with an example.
Lets use your first query:
SELECT
(SELECT username FROM users WHERE records.user_id = user_id) AS username,
(SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name,
in_timestamp,
out_timestamp
FROM records
ORDER BY in_timestamp
Now consider we have 100 records in records and 100 records in user.(Assuming we dont have index on user_id)
So if we understand your algorithm it says:
For each record
Scan all 100 records in users to find out username
Scan all 100 records in users to find out last name and first name
So its like we scanned users table 100*100*2 time. Is it really worth. If we consider index on user_id it will make thing better, but is it still worth.
Now consider a join (nested loop will almost produce same result as above, but consider a hash join):
Its like.
Make a hash map of user.
For each record
Find a mapping record in Hashmap. Which will be certainly much more faster then looping and finding a record.
So clearly, joins should be favorable.
NOTE: Example used of 100 record may produce identical plan, but the idea is to analyze how it can effect the performance.