SQL inner join and where performance comparison

If we have two tables, tableA (with column1, column2) and tableB (with column1, column2), what's the difference between the following two queries? Which one has better performance? What if both tables are indexed?
Query #1:
select
b.column2
from
tableA a,
tableB b
where
a.column1 = b.column1
and a.column2 = ?;
Query #2:
select
b.column2
from
tableA a
inner join
tableB b on a.column1 = b.column1
where
a.column2 = ?;

The second query has better performance.
You are using a cross join in your first query and then filtering the results. Imagine having 10,000 records in both tables: the cross join produces 10,000 * 10,000 = 100,000,000 combinations.

Both will perform equally. Query #2 uses the explicit ANSI JOIN syntax; Query #1 uses the older comma-style join.
Compare the execution plans (EXPLAIN) and you will most likely find them to be the same.
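A minimal sketch of that comparison, assuming SQLite (via Python's built-in sqlite3 module) as a stand-in for the asker's database; the indexes and sample rows are hypothetical. It runs both query forms with the same parameter, returns their rows, and collects the detail text of EXPLAIN QUERY PLAN for each. SQLite moves inner-join ON terms into the WHERE clause before planning, so the two plans should normally print the same lines.

```python
import sqlite3


def compare_join_styles(value="x"):
    """Run Query #1 (comma join) and Query #2 (INNER JOIN) on a toy SQLite DB.

    Returns (rows_comma, rows_inner, plan_comma, plan_inner).
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tableA (column1 INTEGER, column2 TEXT);
        CREATE TABLE tableB (column1 INTEGER, column2 TEXT);
        -- Hypothetical indexes for the "what if both tables are indexed" case
        CREATE INDEX idx_a_col2 ON tableA (column2);
        CREATE INDEX idx_b_col1 ON tableB (column1);
        INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'x');
        INSERT INTO tableB VALUES (1, 'b1'), (2, 'b2'), (3, 'b3'), (4, 'b4');
    """)
    comma = """SELECT b.column2 FROM tableA a, tableB b
               WHERE a.column1 = b.column1 AND a.column2 = ?"""
    inner = """SELECT b.column2 FROM tableA a
               INNER JOIN tableB b ON a.column1 = b.column1
               WHERE a.column2 = ?"""
    params = (value,)
    rows_comma = sorted(con.execute(comma, params).fetchall())
    rows_inner = sorted(con.execute(inner, params).fetchall())
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail text
    plan_comma = [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + comma, params)]
    plan_inner = [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + inner, params)]
    con.close()
    return rows_comma, rows_inner, plan_comma, plan_inner
```

Printing the two plan lists side by side shows whether the optimizer treated the queries differently; on a production database, run that engine's own EXPLAIN against real data, since toy tables can produce different plans.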

Related

Right Join vs where a value exists in another table

Without realizing it, I've come to prefer the first block of code. Is it best practice, or more efficient, to use the first block of code over the second, or vice versa?
In my opinion the first is more readable and concise since all the columns are from one table.
SELECT Column2, Column3, Column4
FROM Table1
WHERE Column1 in (SELECT Column1 FROM Table2)
vs
SELECT A.Column2, A.Column3, A.Column4
FROM Table1 A
RIGHT JOIN Table2 B ON A.Column1 = B.Column1
I'm hoping for clarification on the best practices and efficiency of each statement, and on whether there's an accepted form.
Your two queries don't do the same thing.
Your first one
SELECT Column2, Column3, Column4
FROM Table1
WHERE Column1 in (SELECT Column1 FROM Table2)
is called a semi-join. It works like an inner join whose result set has no columns from the second table. The semi-join can also be written as an inner join, but you have pointed out that your way is easier for you to read and reason about. (I agree.) When Table2.Column1 is unique, modern query planners typically execute both forms the same way; if Table2.Column1 has duplicates, the inner join repeats the matching Table1 rows, while IN returns each Table1 row at most once. This is the other way of writing the semi-join:
SELECT Table1.Column2, Table1.Column3, Table1.Column4
FROM Table1
INNER JOIN Table2 ON Table1.Column1 = Table2.Column1
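A small sketch of the duplicate-key caveat, assuming SQLite via Python's sqlite3 module; the sample rows are hypothetical and deliberately put the key 1 into Table2 twice. The IN form returns the matching Table1 row once, while the INNER JOIN form returns it once per matching Table2 row.

```python
import sqlite3


def semi_join_vs_inner_join():
    """Return (rows from the IN semi-join, rows from the INNER JOIN)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT, Column3 TEXT, Column4 TEXT);
        CREATE TABLE Table2 (Column1 INTEGER);
        INSERT INTO Table1 VALUES (1, 'a', 'b', 'c'), (2, 'd', 'e', 'f');
        -- Key 1 appears twice in Table2 on purpose
        INSERT INTO Table2 VALUES (1), (1), (3);
    """)
    semi = con.execute("""
        SELECT Column2, Column3, Column4
        FROM Table1
        WHERE Column1 IN (SELECT Column1 FROM Table2)""").fetchall()
    inner = con.execute("""
        SELECT Table1.Column2, Table1.Column3, Table1.Column4
        FROM Table1
        INNER JOIN Table2 ON Table1.Column1 = Table2.Column1""").fetchall()
    con.close()
    return semi, inner
```

If the inner-join form must behave like the semi-join when duplicates are possible, it needs a SELECT DISTINCT (which also collapses genuinely duplicate Table1 rows), so the IN form is usually the more direct way to say "rows that have a match".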
Your second query is this. (By the way, RIGHT JOINs are far less common than LEFT JOINs in production code; many people have to stop and think twice when reading a RIGHT JOIN.)
SELECT A.Column2, A.Column3, A.Column4
FROM Table1 A
RIGHT JOIN Table2 B ON A.Column1 = B.Column1
This produces a result row for every row in Table2, whether or not it matches a row in Table1; for unmatched rows, the Table1 columns come back as NULL. Inner joins deliver only the rows that satisfy the ON condition in both joined tables, and that's what you want.
Left joins produce at least one row for every row in the left table (Table1 in Table1 LEFT JOIN Table2), even when it has no match. The same holds, mutatis mutandis, for right joins.
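A sketch of what the RIGHT JOIN query returns, assuming SQLite via Python's sqlite3 module and hypothetical sample rows. Because RIGHT JOIN only arrived in SQLite 3.39.0, the query is written as the equivalent LEFT JOIN with the tables swapped. Table2's unmatched key 3 still yields a row (all NULLs), and Table1's unmatched row 'd' is dropped.

```python
import sqlite3


def right_join_rows():
    """Rows of 'Table1 A RIGHT JOIN Table2 B', written as 'Table2 B LEFT JOIN Table1 A'."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT, Column3 TEXT, Column4 TEXT);
        CREATE TABLE Table2 (Column1 INTEGER);
        INSERT INTO Table1 VALUES (1, 'a', 'b', 'c'), (2, 'd', 'e', 'f');
        INSERT INTO Table2 VALUES (1), (3);
    """)
    # Swapping the tables turns the RIGHT JOIN into a LEFT JOIN with the same result
    rows = con.execute("""
        SELECT A.Column2, A.Column3, A.Column4
        FROM Table2 B
        LEFT JOIN Table1 A ON A.Column1 = B.Column1
        ORDER BY B.Column1""").fetchall()
    con.close()
    return rows
```

The row of NULLs for key 3 is the part the IN query never produces, which is why the two original queries are not interchangeable.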

Which of these SQL queries will be better for query performance?

So I have some tables with millions of rows of data, and the current query I have is like the following:
WITH first_table AS
(
SELECT
A.id, A.column1, A.column2, B.column1 AS column3, C.column1 AS column4
FROM
tableA AS A
LEFT JOIN
tableB AS B ON A.id = B.id
LEFT JOIN
tableC AS C ON A.id = C.id
UNION ALL
SELECT
D.id, D.column1, D.column2, NULL AS column3, D.column4
FROM
tableD AS D
UNION ALL
...
)
SELECT
first_table.column1, first_table.column2, first_table.column3, first_table.column4, A.col5, A.col6... until A.col20
FROM
first_table
LEFT JOIN
tableA AS A ON first_table.id = A.id
I'm basically appending at least two tables to tableA and then joining tableA again in the final SELECT statement. I do this because I need about 30 columns from tableA, and I don't want to pad the appended SELECTs with NULL values, since I only need 4 or 5 columns from the tables appended to the main one (tableA).
Would it be better to avoid the final join and select all the columns I need inside the WITH clause, or should I keep my code as it is? The goal is better query performance and shorter execution time.
Thanks.
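A sketch for comparing the two shapes, assuming SQLite via Python's sqlite3 module; the schemas, the id column on tableD, and all rows are hypothetical, and only col5 and col6 stand in for tableA's ~30 extra columns. One shape appends narrow rows and joins tableA again; the other selects tableA's wide columns in the first branch and pads the other branches with NULL. Note the two are only equivalent when tableD rows do not share an id with tableA rows: otherwise the final LEFT JOIN also attaches tableA's columns to those tableD rows. The function returns each query's rows and plan so both correctness and the plan can be checked before timing anything.

```python
import sqlite3

SETUP = """
CREATE TABLE tableA (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, col5 INTEGER, col6 INTEGER);
CREATE TABLE tableB (id INTEGER, column1 TEXT);
CREATE TABLE tableC (id INTEGER, column1 TEXT);
CREATE TABLE tableD (id INTEGER, column1 TEXT, column2 TEXT, column4 TEXT);
INSERT INTO tableA VALUES (1, 'a1', 'a2', 5, 6), (2, 'b1', 'b2', 7, 8);
INSERT INTO tableB VALUES (1, 'B1');
INSERT INTO tableC VALUES (2, 'C2');
INSERT INTO tableD VALUES (3, 'd1', 'd2', 'd4');
"""

# Current shape: append narrow rows in the CTE, then join tableA again for the wide columns
CTE_THEN_JOIN = """
WITH first_table AS (
    SELECT A.id, A.column1, A.column2, B.column1 AS column3, C.column1 AS column4
    FROM tableA AS A
    LEFT JOIN tableB AS B ON A.id = B.id
    LEFT JOIN tableC AS C ON A.id = C.id
    UNION ALL
    SELECT D.id, D.column1, D.column2, NULL, D.column4
    FROM tableD AS D
)
SELECT f.column1, f.column2, f.column3, f.column4, A.col5, A.col6
FROM first_table AS f
LEFT JOIN tableA AS A ON f.id = A.id
ORDER BY 1
"""

# Alternative shape: wide columns in the first branch, NULL padding in the others
NULL_PADDED = """
SELECT A.column1, A.column2, B.column1, C.column1, A.col5, A.col6
FROM tableA AS A
LEFT JOIN tableB AS B ON A.id = B.id
LEFT JOIN tableC AS C ON A.id = C.id
UNION ALL
SELECT D.column1, D.column2, NULL, D.column4, NULL, NULL
FROM tableD AS D
ORDER BY 1
"""


def run_both():
    """Return {name: (rows, plan_detail_lines)} for both query shapes."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    results = {}
    for name, sql in (("cte_then_join", CTE_THEN_JOIN), ("null_padded", NULL_PADDED)):
        rows = con.execute(sql).fetchall()
        plan = [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql)]
        results[name] = (rows, plan)
    con.close()
    return results
```

The padded version avoids the second pass over tableA, but which shape is faster depends on the engine, indexes, and data volume, so timing both (e.g. with EXPLAIN ANALYZE on PostgreSQL) against the real tables is the only reliable answer.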

postgresql delete rows based on inner join of complex subqueries

I am trying to delete the rows matched by two complex subqueries, using PostgreSQL. Here is sample code:
DELETE FROM complex_subquery1 as a
USING complex_subquery2 as b
WHERE a.column1 = b.column2
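I read in "PostgreSQL: delete rows returned by subquery" that this is not really possible. Is there a shortcut for deleting with an inner join?
The normal way to do that is to delete from the actual table (atable below) and list both subqueries, each in parentheses with an alias, in the USING clause, tying one of them back to the target table in the WHERE clause: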
DELETE FROM atable
USING complex_subquery1 as a,
complex_subquery2 as b
WHERE a.column1 = b.column2
AND a.column3 = atable.column4;
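SQLite's DELETE has no USING clause, so this sketch swaps in a different technique with the same effect: an IN subquery over the joined derived tables. It assumes Python's sqlite3 module; src1 and src2 are hypothetical tables whose SELECTs stand in for complex_subquery1 and complex_subquery2, and the sample rows are made up. Only the atable row whose column4 matches a joined a.column3 value is deleted.

```python
import sqlite3


def delete_matching_rows():
    """Delete atable rows matched by the join of two subqueries; return what remains."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE atable (column4 INTEGER, payload TEXT);
        CREATE TABLE src1 (column1 INTEGER, column3 INTEGER);  -- hypothetical
        CREATE TABLE src2 (column2 INTEGER);                   -- hypothetical
        INSERT INTO atable VALUES (10, 'keep'), (20, 'delete'), (30, 'keep');
        INSERT INTO src1 VALUES (1, 20), (2, 30);
        INSERT INTO src2 VALUES (1);
    """)
    # The inner join of the two derived tables decides which atable keys to delete
    con.execute("""
        DELETE FROM atable
        WHERE column4 IN (
            SELECT a.column3
            FROM (SELECT column1, column3 FROM src1) AS a
            JOIN (SELECT column2 FROM src2) AS b
              ON a.column1 = b.column2
        )""")
    remaining = con.execute(
        "SELECT column4, payload FROM atable ORDER BY column4").fetchall()
    con.close()
    return remaining
```

The same IN form also works in PostgreSQL; the DELETE ... USING form above is the PostgreSQL-specific way to express the join directly.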

HQL: Why would one ever use WHERE instead of ON...AND in a JOIN statement?

Suppose I have the following query:
SELECT a.column1,
a.column2,
b.column3
FROM table1 a
JOIN table2 b
ON a.column1 = b.column2
AND a.column2 = "value"
AND b.column3 = "other value"
Why would one ever use a WHERE when filtering the values rather than another AND, i.e.
SELECT a.column1,
a.column2,
b.column3
FROM table1 a
JOIN table2 b
ON a.column1 = b.column2
AND a.column2 = "value"
WHERE b.column3 = "other value"
Wouldn't AND always make the query faster, as it will filter out the data before the join?
As far as I know, there won't be any measurable performance difference between the two queries.
Personally, I prefer to keep the join conditions in the ON clause and the filtering conditions in the WHERE clause.
Keeping the filtering conditions in the WHERE clause makes the query more readable.
Modern RDBMS query optimizers do a great job of building efficient execution plans. If you compare the execution plans created for your two queries, you will find they are identical, so there will be no performance difference.
You may find older folk who suggest there's a performance increase when adding filtering criteria to a JOIN because FROM is evaluated before WHERE, thus filtering records earlier in the process and saving time. This is just an artifact from old databases.
I agree with NoDisplayName: I usually put filtering criteria that reference only one side of the JOIN in the WHERE clause, unless they need to be in the ON clause, as is sometimes the case with outer joins.
Older versions of Hive (before 2.2.0) only support equi-joins,
so in the ON clause you can only do equality comparisons:
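A sketch of the outer-join case, assuming SQLite via Python's sqlite3 module and hypothetical rows that reuse the question's table and column names. With a LEFT JOIN, a filter on the right table in ON only decides which table2 rows get attached, so every table1 row survives; the same filter in WHERE runs after the join and discards the rows where b.column3 is NULL or different. (Single quotes are used for string literals, as standard SQL expects.)

```python
import sqlite3


def filter_placement():
    """Return (rows with the b filter in ON, rows with the b filter in WHERE)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
        CREATE TABLE table2 (column2 INTEGER, column3 TEXT);
        INSERT INTO table1 VALUES (1, 'value'), (2, 'value');
        INSERT INTO table2 VALUES (1, 'other value'), (2, 'something else');
    """)
    in_on = con.execute("""
        SELECT a.column1, b.column3
        FROM table1 a
        LEFT JOIN table2 b ON a.column1 = b.column2 AND b.column3 = 'other value'
        ORDER BY a.column1""").fetchall()
    in_where = con.execute("""
        SELECT a.column1, b.column3
        FROM table1 a
        LEFT JOIN table2 b ON a.column1 = b.column2
        WHERE b.column3 = 'other value'
        ORDER BY a.column1""").fetchall()
    con.close()
    return in_on, in_where
```

With a plain inner JOIN, as in the question, both placements return the same rows, which is why the choice there is mostly about readability.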
SELECT
...
FROM
... a
JOIN
... b
ON
a.column1 = b.column2
AND a.column2 = "value"
But not:
ON
a.column1 = b.column2
AND a.column2 LIKE "value"
But you can do:
ON
a.column1 = b.column2
WHERE
a.column2 LIKE "value"

SQL Selecting from 3 tables returns syntax error

I have 3 tables that I need to select data from, using an MS Access database.
I tried this SQL:
SELECT a.column1, a.column2, a.column3, a.columnID, b.column1
From TableA a INNER JOIN TableB b
ON a.columnID = b.columnID INNER JOIN TableC c
ON c.columnID = a.columnRelativeID
WHERE a.columnID=16
However, when I try to execute the query, I receive a syntax error.
When I remove the second join (with the third table), the query works fine, so this is where the error is.
This example of joining 3 tables didn't help me understand where my problem is.
Is it OK to select from just two tables and fill in the third table's data with LINQ in C#? I already have the third table's data in a data source in my code.
Thanks in advance,
Oz.
You can absolutely select from three (or more) tables in MS Access. However, you have to use Access's craptastic parenthesis system, which pairs tables together in the FROM clause.
Select A.Column1, A.Column2, A.Column3, A.ColumnID, B.Column1
From (TableA AS A
Inner Join TableB AS B
On A.ColumnID = B.ColumnId)
Inner Join TableC AS C
ON A.ColumnRelativeId = C.ColumnId
Where A.ColumnId = 16
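The same nesting extends to any number of joins: after each join except the last, everything joined so far is wrapped in another pair of parentheses, so every INNER JOIN combines exactly two operands. A small hypothetical Python helper that builds that FROM-clause body as a string, to make the pattern explicit (it only assembles text and does not talk to Access):

```python
def access_from_clause(base, joins):
    """Build an MS Access-style FROM clause body (without the FROM keyword).

    base:  the first table, e.g. "TableA AS A"
    joins: list of (table, on_condition) pairs, joined with INNER JOIN in order.
    After every join except the last, the expression built so far is wrapped
    in parentheses, giving ((A JOIN B) JOIN C) JOIN D, the nesting Access expects.
    """
    expr = base
    for i, (table, on) in enumerate(joins):
        expr = f"{expr} INNER JOIN {table} ON {on}"
        if i < len(joins) - 1:
            expr = f"({expr})"
    return expr
```

For the three tables in the question, calling it with "TableA AS A" and the two (table, condition) pairs reproduces the parenthesized FROM clause of the query above.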
Alternatively, the older comma-style join syntax needs no parentheses:
SELECT a.column1, a.column2, a.column3, a.columnID, b.column1
From TableA a, TableB b, TableC c
WHERE a.columnID = b.columnID
AND c.columnID = a.columnRelativeID
AND a.columnID=16