Is there any difference (in performance or correctness) between these two SQL statements? (The specific queries are not important.)
SQL no. 1
SELECT a.test1
FROM table_1 a
JOIN table_2 b ON a.id = b.id
AND (a.test1 = 'test')
AND (b.test2 = 'test')
JOIN table_3 c ON c.id2 = b.id2
SQL no. 2
SELECT a.test1
FROM table_3 c
JOIN table_2 b ON c.id2 = b.id2
JOIN table_1 a ON a.id = b.id
AND (a.test1 = 'test')
AND (b.test2 = 'test')
Also, table a has 500,000 records, table b has 1,000,000 and table c has 1,500,000 records.

Honestly, I don't think reversing the join statements will make much of a difference. Either way, the database has to combine the data of one table with the data of the two other tables, and the work that has to be performed doesn't seem to be affected by the order in which the joins are written. If the first join changed the amount of data that had to be examined to perform the second join, I could see why the execution time would be affected, as would the results. But aren't the statements independent of each other?
You can measure query time in SQL Server Management Studio like this:
SET STATISTICS TIME ON
-- run the query you want to measure here
SET STATISTICS TIME OFF
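A quick way to check the correctness half of the question is to run both join orders against the same small data set. The sketch below uses SQLite through Python's sqlite3 module as a stand-in (the question does not name the engine); the sample rows and the extra a.id column in the SELECT list are made up for illustration. It only shows that the two inner-join orders return the same rows. The plan each engine picks, and therefore the timing, has to be checked on the real database.

```python
import sqlite3

# In-memory stand-in database; table and column names follow the question,
# the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, test1 TEXT);
    CREATE TABLE table_2 (id INTEGER, id2 INTEGER, test2 TEXT);
    CREATE TABLE table_3 (id2 INTEGER PRIMARY KEY);
    INSERT INTO table_1 VALUES (1, 'test'), (2, 'other'), (3, 'test');
    INSERT INTO table_2 VALUES (1, 10, 'test'), (2, 10, 'test'),
                               (3, 20, 'nope'), (3, 30, 'test');
    INSERT INTO table_3 VALUES (10), (30);
""")

query_1 = """
    SELECT a.id, a.test1
    FROM table_1 a
    JOIN table_2 b ON a.id = b.id AND a.test1 = 'test' AND b.test2 = 'test'
    JOIN table_3 c ON c.id2 = b.id2
"""
query_2 = """
    SELECT a.id, a.test1
    FROM table_3 c
    JOIN table_2 b ON c.id2 = b.id2
    JOIN table_1 a ON a.id = b.id AND a.test1 = 'test' AND b.test2 = 'test'
"""

# Sort in Python because neither query has an ORDER BY.
rows_1 = sorted(conn.execute(query_1).fetchall())
rows_2 = sorted(conn.execute(query_2).fetchall())
print(rows_1 == rows_2)

# Print the plan SQLite chose for each form; other engines plan differently.
for query in (query_1, query_2):
    for step in conn.execute("EXPLAIN QUERY PLAN " + query):
        print(step)
```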


Postgresql why is INNER JOIN so much slower than WHERE

I have 2 tables, and I copy the file name from one table to the other in an update operation. Using INNER JOIN makes the query run in 22 seconds when there are just ~4000 rows. Using a WHERE clause lets it run in about 200 milliseconds. How and why is this happening? Does the INNER JOIN result in additional looping?
Example 1 using INNER JOIN - Takes 22 seconds when table a has about 4k records.
UPDATE table_a SET file_name = tmp.file_name FROM (
SELECT b.customer_id, b.file_name, b.file_id FROM table_b AS b WHERE b.status = 'A'
) tmp
INNER JOIN table_a AS a
ON tmp.customer_id=a.customer_id AND tmp.file_id=a.file_id;
Example 2 using WHERE runs in about 200 ms.
UPDATE table_a AS a SET file_name = tmp.file_name FROM (
SELECT b.customer_id, b.file_name, b.file_id FROM table_b AS b WHERE b.status = 'A'
) tmp
WHERE tmp.customer_id=a.customer_id AND tmp.file_id=a.file_id;
The queries are doing totally different things. The first updates every row in table_a with the expression. I am guessing that some rows are even updated multiple times.
The two references to table_a in the first version are separate references to the same table: one is the update target, the other is the joined alias a. The effect is a cross join, because no condition ties the two together.
The second method is the correct syntax for what you want to do in Postgres.
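The cross-join effect can be made visible by counting the rows each FROM shape produces. This is a sketch in SQLite via Python's sqlite3 module, not PostgreSQL itself, and it replaces the UPDATE with an equivalent SELECT: the alias `target` is a hypothetical name standing in for the UPDATE's own reference to table_a, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (customer_id INTEGER, file_id INTEGER, file_name TEXT);
    CREATE TABLE table_b (customer_id INTEGER, file_id INTEGER,
                          file_name TEXT, status TEXT);
    INSERT INTO table_a VALUES (1, 1, NULL), (2, 2, NULL), (3, 3, NULL);
    INSERT INTO table_b VALUES (1, 1, 'one.txt', 'A'),
                               (2, 2, 'two.txt', 'A'),
                               (3, 3, 'three.txt', 'X');
""")

# Shape of example 1: "target" (the row being updated) is never tied to
# tmp or to the second reference a, so every target row pairs with every match.
unconstrained = conn.execute("""
    SELECT COUNT(*)
    FROM table_a AS target
    CROSS JOIN (SELECT b.customer_id, b.file_name, b.file_id
                FROM table_b AS b WHERE b.status = 'A') tmp
    INNER JOIN table_a AS a
        ON tmp.customer_id = a.customer_id AND tmp.file_id = a.file_id
""").fetchone()[0]

# Shape of example 2: the WHERE clause ties the updated row itself to tmp.
constrained = conn.execute("""
    SELECT COUNT(*)
    FROM table_a AS a
    CROSS JOIN (SELECT b.customer_id, b.file_name, b.file_id
                FROM table_b AS b WHERE b.status = 'A') tmp
    WHERE tmp.customer_id = a.customer_id AND tmp.file_id = a.file_id
""").fetchone()[0]

print(unconstrained, constrained)
```

With 3 rows in table_a and 2 matching rows in tmp, the first shape yields 3 x 2 candidate pairs while the second yields only the 2 real matches. At ~4000 rows that multiplication is consistent with the slowdown described in the question.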

What is the efficient way to query subset of a joined table

My query is somewhat like this
SELECT TableA.Column1
FROM TableA
LEFT JOIN TableB ON TableA.ForeignKey = TableB.PrimaryKey
LEFT JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn = 'XXX'
In the above case Table A and Table B are large tables (may contain more than 1 million rows), but Table C is small, with just 25 rows.
I have applied indexes on primary keys of all the tables.
In our application scenario, I need to search in TableC for just two conditions, TableC.SomeColumn = 'XXX' or TableC.SomeColumn = 'YYY'.
My question is what is the most efficient way to do this. A straight join does work, but I am concerned about joining with all the rows in TableB, just to pick a small subset of it, when joined in Table C.
Is it a good approach to have an indexed view?
For example,
JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn IN ('XXX', 'YYY'))?
Your WHERE clause undoes the outer join, so you might as well write the query as:
SELECT a.Column1
FROM TableA a JOIN
TableB b
ON a.ForeignKey = b.PrimaryKey JOIN
TableC c
ON c.PrimaryKey = b.ForeignKey
WHERE c.SomeColumn = 'XXX';
For this query, you want these indexes:
TableC(SomeColumn, PrimaryKey)
TableB(ForeignKey, PrimaryKey)
TableA(ForeignKey, Column1)
You can create an indexed view. That would generally be the fastest for querying. However, it can incur a lot more overhead for updates and inserts into any of the base tables.
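The claim that the WHERE clause undoes the outer join can be checked on a toy data set. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for the real engine; the rows and the index names (ix_c, ix_b, ix_a) are hypothetical, while the index column lists follow the ones suggested above. Whether the optimizer actually uses those indexes depends on the engine and the data, so the printed plan is only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableC (PrimaryKey INTEGER PRIMARY KEY, SomeColumn TEXT);
    CREATE TABLE TableB (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER);
    CREATE TABLE TableA (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER,
                         Column1 TEXT);
    INSERT INTO TableC VALUES (1, 'XXX'), (2, 'YYY'), (3, 'ZZZ');
    INSERT INTO TableB VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
    INSERT INTO TableA VALUES (100, 10, 'a-xxx'), (101, 11, 'a-yyy'),
                              (102, 12, 'a-zzz'), (103, 13, 'a-none'),
                              (104, NULL, 'a-orphan');
    -- Index names are made up; the column lists are the ones suggested above.
    CREATE INDEX ix_c ON TableC (SomeColumn, PrimaryKey);
    CREATE INDEX ix_b ON TableB (ForeignKey, PrimaryKey);
    CREATE INDEX ix_a ON TableA (ForeignKey, Column1);
""")

left_join_sql = """
    SELECT a.Column1
    FROM TableA a
    LEFT JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    LEFT JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    WHERE c.SomeColumn = 'XXX'
    ORDER BY a.Column1
"""
inner_join_sql = """
    SELECT a.Column1
    FROM TableA a
    JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    WHERE c.SomeColumn = 'XXX'
    ORDER BY a.Column1
"""

left_rows = conn.execute(left_join_sql).fetchall()
inner_rows = conn.execute(inner_join_sql).fetchall()
print(left_rows == inner_rows)

# Indicative only: shows which access paths SQLite picks for the inner join.
for step in conn.execute("EXPLAIN QUERY PLAN " + inner_join_sql):
    print(step)
```

Rows of TableA without a match in TableB or TableC get NULL for c.SomeColumn after the LEFT JOINs, and the comparison with 'XXX' then rejects them, which is why both forms return the same rows.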
I typically only use a JOIN when I need to SELECT or GROUP on the joined data, not when I am using it as a predicate. That said, I would be very curious to see whether Gordon's answer or this one performs better.
I would also suggest getting in the habit of using aliases when referencing your tables. It's less typing, and it makes your code easier to read.
I would test and compare execution times:
SELECT A.Column1
FROM TableA A
WHERE EXISTS (SELECT 1 FROM TableB B
WHERE A.ForeignKey = B.PrimaryKey
AND EXISTS (SELECT 1 FROM TableC C
WHERE C.PrimaryKey = B.ForeignKey
AND C.SomeColumn = 'XXX'))
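One way to run that comparison on a small scale, assuming the alternative to the JOIN is a nested EXISTS predicate and using the two-value filter from the question. This is a sketch in SQLite via Python's sqlite3 module with invented rows; it only confirms that both forms return the same rows when the joined keys are unique. Timing has to be measured on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableC (PrimaryKey INTEGER PRIMARY KEY, SomeColumn TEXT);
    CREATE TABLE TableB (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER);
    CREATE TABLE TableA (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER,
                         Column1 TEXT);
    INSERT INTO TableC VALUES (1, 'XXX'), (2, 'YYY'), (3, 'ZZZ');
    INSERT INTO TableB VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO TableA VALUES (100, 10, 'first'), (101, 11, 'second'),
                              (102, 12, 'third'), (103, 10, 'fourth');
""")

# Predicate form: TableB and TableC are only used to filter TableA.
exists_rows = conn.execute("""
    SELECT A.Column1
    FROM TableA A
    WHERE EXISTS (SELECT 1 FROM TableB B
                  WHERE A.ForeignKey = B.PrimaryKey
                    AND EXISTS (SELECT 1 FROM TableC C
                                WHERE C.PrimaryKey = B.ForeignKey
                                  AND C.SomeColumn IN ('XXX', 'YYY')))
    ORDER BY A.Column1
""").fetchall()

# Join form of the same filter.
join_rows = conn.execute("""
    SELECT A.Column1
    FROM TableA A
    JOIN TableB B ON A.ForeignKey = B.PrimaryKey
    JOIN TableC C ON C.PrimaryKey = B.ForeignKey
    WHERE C.SomeColumn IN ('XXX', 'YYY')
    ORDER BY A.Column1
""").fetchall()

print(exists_rows == join_rows)
```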

SQL query on large tables fast at first then slow

Below query returns the initial result fast and then becomes extremely slow.
, B.Date1
ON A.Id = B.Id AND A.Flag = 'Y'
AND (B.Date1 IS NOT NULL AND A.Date >= B.Date2 AND A.Date < B.Date1)
Table A has 24 million records and Table B has 500 thousand records.
Index for Table A is on columns: Id and Date
Index for Table B is on columns: Id, Date2, Date1 - Date1 is nullable - index is unique
The first 11 million records are returned quite fast, and then it suddenly becomes extremely slow. The execution plan shows that the indexes are used.
However, when I remove condition A.Date < B.Date1, query becomes fast again.
Do you know what should be done to improve the performance? Thanks
I updated the query to show that I need fields of Table B in the result. You might think why I used left join when I have condition "B.Date1 is not null". That's because I have posted the simplified query. My performance issue is even with this simplified version.
You could try using EXISTS. It can be faster because it stops looking for further rows in b once a match is found, whereas a JOIN has to fetch and join every matching row.
select id
from a
where flag = 'Y'
and exists (
select 1
from b
where a.id = b.id
and a.date >= b.date2
and a.date < b.date1
and b.date1 is not null
);
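One caveat with the EXISTS rewrite is that it is a semi-join: it returns each row of a at most once and cannot return columns from b. The sketch below, in SQLite via Python's sqlite3 module with made-up rows and ISO date strings, shows the difference when two ranges in b cover the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, flag TEXT, date TEXT);
    CREATE TABLE b (id INTEGER, date2 TEXT, date1 TEXT);
    INSERT INTO a VALUES (1, 'Y', '2020-01-15'),
                         (2, 'Y', '2020-03-01'),
                         (3, 'N', '2020-01-15');
    -- id 1 has two overlapping ranges; id 2 has an open-ended range (date1 NULL).
    INSERT INTO b VALUES (1, '2020-01-01', '2020-02-01'),
                         (1, '2020-01-10', '2020-01-20'),
                         (2, '2020-01-01', NULL),
                         (3, '2020-01-01', '2020-02-01');
""")

exists_rows = conn.execute("""
    SELECT a.id
    FROM a
    WHERE a.flag = 'Y'
      AND EXISTS (SELECT 1
                  FROM b
                  WHERE a.id = b.id
                    AND a.date >= b.date2
                    AND a.date < b.date1
                    AND b.date1 IS NOT NULL)
    ORDER BY a.id
""").fetchall()

join_rows = conn.execute("""
    SELECT a.id
    FROM a
    JOIN b ON a.id = b.id
          AND a.flag = 'Y'
          AND b.date1 IS NOT NULL
          AND a.date >= b.date2
          AND a.date < b.date1
    ORDER BY a.id
""").fetchall()

print(exists_rows)  # one row per matching a
print(join_rows)    # one row per matching (a, b) pair
```

If the result needs fields from b, as the question's update says, EXISTS alone is not a drop-in replacement for the join.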
In my experience, query performance depends heavily on the shape of the data you are joining. For instance, one-to-one relationships are much faster to join than one-to-many relationships.
I've seen a one-to-many join from a table of 3,000 rows to a table of 30,000 rows take 11-15 seconds, even with a LIMIT. The same query, redesigned to use only one-to-one relationships, took less than 1 second.
So here are my suggestions to speed up your query.
According to Left Outer Join (desc), "LEFT JOIN and LEFT OUTER JOIN are the same", so it doesn't matter which one you use.
Ideally, though, you should use INNER JOIN, because your question states that B.Date1 IS NOT NULL.
Based on this parent columns in join selection (desc), you can use a parent column in a SELECT within a JOIN.
INNER JOIN (SELECT b.Id AS `Id`, COUNT(1) AS `TotalLinks` FROM B b WHERE ((b.Date1 IS NOT NULL) AND ((a.Date >= b.Date2) AND (a.Date < b.Date1))) GROUP BY b.Id) AS `ab` ON (a.Id = ab.Id) AND (a.Flag = 'Y')
WHERE a.Flag = 'Y' AND ab.TotalLinks > 0
LIMIT 0, 500
Also try to LIMIT the data you fetch; this reduces the amount of filtering the database has to do.

SQL Understanding Outer Joins with Criteria and Why the Order of Criteria can alter Results

So I know how some things work in SQL but I don't know why and I'm unable to find a nice layman description of this online. For reference I'm using Oracle 11g and TOAD.
Question 1 - Outer Joins with Criteria
I know that if you place criteria on an outer joined table, you turn the query into an inner join regardless of your Syntax. So this query acts as an inner join:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
WHERE b.field2 = 'someCriteria'
The way to get around this is to include an "OR IS NULL" in the second table's criteria. I know this to be true but I've never been able to wrap my head around why this is. Can someone explain why criteria on an outer table turns an outer join into an inner join?
Question 2 - Adding Criteria to Different Clauses Changes Results
So the above being true, I've been struggling with how the order of my criteria can alter the results of the following two queries. I have two tables - tableA and tableB - and need to do a left join compare of a subset of tableA to a subset of tableB.
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
WHERE a.field1 = 'othercriteria'
Results: SQL1 gives me the correct left joined results.
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
WHERE a.field1 = 'othercriteria'
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
Results: SQL2 pulls back only an inner joined result of the two subsets (excluding those rows in tableA where tableB does not have a match).
Understanding The Results
The reason for this has something to do with the order in which the join and the where runs. I could understand this changing performance but I'm not following why it would change the results as the syntax is almost identical. To wrap my head around it, I ran the execution plans for both queries and got the following results (from TOAD):
Execution Plans:
SQL 1:
5) Select Statement (Rows were returned by the Select statement)
  4) Sort Unique (The rows from step 3 were sorted to eliminate duplicate rows)
    3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
      1) Table Access Full TABLE tableB (Every row in the table tableB is read)
      2) Table Access Full TABLE tableA (Every row in the table tableA is read)
SQL 2:
11) Select Statement (Rows were returned by the Select statement)
  10) Sort Unique (The rows from step 9 were sorted to eliminate duplicate rows)
    9) All distinct rows from steps 4, 8 were returned
      4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
        3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
          1) Table Access Full TABLE tableB (Every row in the table tableB is read)
          2) Table Access Full TABLE tableA (Every row in the table tableA is read)
      8) Filter (For the rows returned by step 7, filter out rows depending on filter criteria)
        7) Hash Join Right Outer (The result sets from steps 5, 6 were joined (hash))
          5) Table Access Full TABLE tableB (Every row in the table tableB is read)
          6) Table Access Full TABLE tableA (Every row in the table tableA is read)
So I have no doubt that the above execution plans explain perfectly why SQL 2 gives me different results than SQL 1, but I'm having a hard time reading these plans. Can someone help me translate these execution plans and explain why SQL 2 is treated as an inner join when the tableB criteria are listed in the WHERE clause instead of the JOIN?
Thanks in advance!!
For question 1, consider table a containing two rows:
key  field1
1    a
2    b
And table b containing 3 rows:
key  field1  field2
1    c       someCriteria
1    d       notSomeCriteria
1    e       NULL
Your FROM clause (with its JOINs) effectively generates a result set that looks like this:
(a)key  (a)field1  (b)key  (b)field1  (b)field2
1       a          1       c          someCriteria
1       a          1       d          notSomeCriteria
1       a          1       e          NULL
2       b          NULL    NULL       NULL
By the time the WHERE clause is considered, it no longer really "knows" whether a particular JOIN was successful or not - it doesn't selectively apply criteria based on whether or not the join succeeded. So if you've specified the b.field2 should equal someCriteria, you're saying that it should only return the first row (1,a,1,c,someCriteria).
If you want to make particular assertions about NULLable columns, you do really want the WHERE clause to act this way and force you to explicitly consider NULLs (whether those be generated from a NULL column or by a JOIN failing)
The cure I'd usually adopt is the one shown in HLGEM's answer, rather than adding OR b.field2 IS NULL since you usually want to exclude the (1,a,1,e,NULL) row.
I would write the first as:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
AND b.field2 = 'someCriteria'
This would return all records from table a, and b.field1 would only have data where b.field2 = 'someCriteria'.
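The difference between filtering in the ON clause and in the WHERE clause can be reproduced with the two sample tables from the answer above. The sketch uses SQLite through Python's sqlite3 module rather than Oracle 11g, so it illustrates the logical behaviour only; the column name key is quoted because it is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA ("key" INTEGER, field1 TEXT);
    CREATE TABLE tableB ("key" INTEGER, field1 TEXT, field2 TEXT);
    INSERT INTO tableA VALUES (1, 'a'), (2, 'b');
    INSERT INTO tableB VALUES
        (1, 'c', 'someCriteria'),
        (1, 'd', 'notSomeCriteria'),
        (1, 'e', NULL);
""")

# Criterion in WHERE: applied after the join, so the NULL-extended row
# for key 2 is filtered out and the result looks like an inner join.
where_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a."key" = b."key"
    WHERE b.field2 = 'someCriteria'
    ORDER BY a.field1, b.field1
""").fetchall()

# Criterion in ON: decides which b rows count as a match, and every
# row of tableA is still kept.
on_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a."key" = b."key"
                      AND b.field2 = 'someCriteria'
    ORDER BY a.field1, b.field1
""").fetchall()

# WHERE with "OR IS NULL": keeps the unmatched row for key 2, but also
# lets through the matched row whose field2 really is NULL.
or_null_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a."key" = b."key"
    WHERE b.field2 = 'someCriteria' OR b.field2 IS NULL
    ORDER BY a.field1, b.field1
""").fetchall()

print(where_rows)
print(on_rows)
print(or_null_rows)
```

The third query shows why the "OR IS NULL" workaround is usually not what you want: it cannot tell a failed join apart from a matched row that stores NULL in field2.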

Should I use a temp table?

I have a report query that takes 4 minutes, well over the 30-second maximum imposed on us.
I notice that it has a LOT of INNER JOINs. One of them, I see, joins to a Person table, which has millions of rows. I'm wondering if it would be more efficient to break up the query. Would it be more efficient to do something like:
Assume all keys are indexed.
Table C has 8 million records, Table B has 6 Million records, Table A has 400,000 records.
ON b.key = a.key
ON C.key = b.CKey
WHERE A.id = AnInput
INTO TempTableC
WHERE id = AnInput
-- TempTableC now has 1000 records
INNER JOIN TableB B --Maybe put this into a filtered temp table?
ON b.key = a.key
ON c.AField = b.aField
WHERE a.id = AnInput
Basically, bring the result sets into temp tables, then join.
If your Person table is indexed correctly, then the INNER JOIN should not be causing such a problem. Check that you have an index created on column(s) that are joined to in all your tables. Using temp tables for what appears to be a relatively simple query seems to be papering over the cracks of an inadequate database design.
As others have said, the only way to be sure is to post your query plan.
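To compare the two approaches before committing to one, both can be run side by side and their plans inspected. The sketch below uses SQLite via Python's sqlite3 module with a simplified, hypothetical schema (the column names akey, ckey, val and the temp table tmp_a are not from the question) and a handful of invented rows. It shows only that staging the filtered rows in a temp table returns the same result as the direct join; which one is faster depends on the real engine, indexes and data volumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema for illustration only.
    CREATE TABLE TableA (id INTEGER, akey INTEGER);
    CREATE TABLE TableB (akey INTEGER, ckey INTEGER);
    CREATE TABLE TableC (ckey INTEGER, val TEXT);
    CREATE INDEX ix_a_id ON TableA (id);
    CREATE INDEX ix_b_akey ON TableB (akey);
    CREATE INDEX ix_c_ckey ON TableC (ckey);
    INSERT INTO TableA VALUES (1, 100), (1, 101), (2, 102);
    INSERT INTO TableB VALUES (100, 7), (101, 8), (102, 9);
    INSERT INTO TableC VALUES (7, 'x'), (8, 'y'), (9, 'z');
""")

an_input = 1  # stands in for the AnInput parameter in the question

direct_sql = """
    SELECT c.val
    FROM TableA a
    INNER JOIN TableB b ON b.akey = a.akey
    INNER JOIN TableC c ON c.ckey = b.ckey
    WHERE a.id = ?
    ORDER BY c.val
"""
direct_rows = conn.execute(direct_sql, (an_input,)).fetchall()

# Temp-table variant: filter TableA first, then join the small result.
conn.execute("CREATE TEMP TABLE tmp_a (akey INTEGER)")
conn.execute("INSERT INTO tmp_a SELECT akey FROM TableA WHERE id = ?",
             (an_input,))
temp_rows = conn.execute("""
    SELECT c.val
    FROM tmp_a t
    INNER JOIN TableB b ON b.akey = t.akey
    INNER JOIN TableC c ON c.ckey = b.ckey
    ORDER BY c.val
""").fetchall()

print(direct_rows == temp_rows)

# The query plan is what the answers above ask for; print SQLite's version.
for step in conn.execute("EXPLAIN QUERY PLAN " + direct_sql, (an_input,)):
    print(step)
```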