Should I use a temp table? - sql

I have a report query that takes 4 minutes, well over the 30-second maximum imposed on us.
I notice that it has a LOT of INNER JOINs. One of them joins to a Person table, which has millions of rows. I'm wondering if it would be more efficient to break up the query. Would it be more efficient to do something like:
Assume all keys are indexed.
Table C has 8 million records, Table B has 6 Million records, Table A has 400,000 records.
SELECT Fields
FROM TableA A
INNER JOIN TableB B
ON b.key = a.key
INNER JOIN TableC C
ON C.key = b.CKey
WHERE A.id = AnInput
Or
SELECT *
INTO TempTableC
FROM TableC
WHERE id = AnInput
-- TempTableC now has 1000 records
Then
SELECT Fields
FROM TableA A
INNER JOIN TableB B --Maybe put this into a filtered temp table?
ON b.key = a.key
INNER JOIN TempTableC c
ON c.AField = b.aField
WHERE a.id = AnInput
Basically, bring the result sets into temp tables, then join.

If your Person table is indexed correctly, then the INNER JOIN should not be causing such a problem. Check that you have an index created on column(s) that are joined to in all your tables. Using temp tables for what appears to be a relatively simple query seems to be papering over the cracks of an inadequate database design.
As others have said, the only way to be sure is to post your query plan.
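To make the comparison concrete, here is a minimal sketch of both variants on toy data. It runs on SQLite through Python's sqlite3 module rather than on the asker's database, so the temp table is filled with INSERT ... SELECT instead of SELECT ... INTO. The column names (b_key, c_key, label), the index name and the sample rows are hypothetical. It only shows that the two forms return the same rows and how to look at a plan; it says nothing about which is faster on millions of rows.

```python
import sqlite3

# Toy schema; names and rows are hypothetical stand-ins for the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, b_key INTEGER);
    CREATE TABLE TableB (b_key INTEGER PRIMARY KEY, c_key INTEGER);
    CREATE TABLE TableC (c_key INTEGER PRIMARY KEY, label TEXT);
    CREATE INDEX ix_a_id ON TableA (id);
    INSERT INTO TableA VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO TableB VALUES (10, 100), (11, 101), (12, 102);
    INSERT INTO TableC VALUES (100, 'x'), (101, 'y'), (102, 'z');
""")
an_input = 1

# Variant 1: one query with all joins.
single_query = """
    SELECT a.id, c.label
    FROM TableA a
    INNER JOIN TableB b ON b.b_key = a.b_key
    INNER JOIN TableC c ON c.c_key = b.c_key
    WHERE a.id = ?
    ORDER BY c.label
"""
direct_rows = conn.execute(single_query, (an_input,)).fetchall()

# Variant 2: materialise the filtered rows in a temp table, then join.
conn.execute("CREATE TEMP TABLE FilteredA (id INTEGER, b_key INTEGER)")
conn.execute(
    "INSERT INTO FilteredA SELECT id, b_key FROM TableA WHERE id = ?",
    (an_input,),
)
staged_rows = conn.execute("""
    SELECT a.id, c.label
    FROM FilteredA a
    INNER JOIN TableB b ON b.b_key = a.b_key
    INNER JOIN TableC c ON c.c_key = b.c_key
    ORDER BY c.label
""").fetchall()

# SQLite's way of showing the plan; other engines have their own commands.
plan = conn.execute("EXPLAIN QUERY PLAN " + single_query, (an_input,)).fetchall()

print(direct_rows == staged_rows)
for step in plan:
    print(step)
```

If the plan for the single query already uses the indexes on the join keys, the temp table mostly adds an extra write.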

Related

SQL: placement of inner joins and impact of performance and correctness

Is there any difference (in performance and correctness) between these two SQL statements? (The statements themselves are not important.)
SQL no. 1
SELECT a.test1,
a.test2,
a.test3,
b.test1,
b.test2,
b.test3,
c.test1,
c.test2,
c.test3
FROM table_1 a
JOIN table_2 b ON a.id = b.id
AND (a.test1 ="test")
AND (b.test2 = "test")
JOIN table_3 c ON c.id2 = b.id2
SQL no. 2
SELECT a.test1,
a.test2,
a.test3,
b.test1,
b.test2,
b.test3,
c.test1,
c.test2,
c.test3
FROM table_3 c
JOIN table_2 b ON c.id2 = b.id2
JOIN table_1 a ON a.id = b.id
AND (a.test1 ="test")
AND (b.test2 = "test")
Also, table a has 500,000 records, table b has 1,000,000 and table c has 1,500,000 records.
Honestly, I don't think reversing the join statements will make much of a difference. Either way, the engine has to combine the data of one table with the data of the other two. The work that has to be performed doesn't seem to be affected by the order of the joins. If the first join changed the amount of data that had to be analysed to perform the second join, I could see why the query execution time would be affected, as would the results. But aren't the statements independent of each other? You can measure query times in SQL Server Management Studio like this:
set statistics time on
$query_to_be_measured
set statistics time off
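The same check can be sketched outside Management Studio. The example below is only an illustration: it uses SQLite via Python's sqlite3 module, a cut-down column list, invented sample rows, and time.perf_counter as a rough stand-in for SET STATISTICS TIME. It runs both join orders and compares the result sets.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, test1 TEXT);
    CREATE TABLE table_2 (id INTEGER, id2 INTEGER, test2 TEXT);
    CREATE TABLE table_3 (id2 INTEGER, test3 TEXT);
    INSERT INTO table_1 VALUES (1, 'test'), (2, 'other'), (3, 'test');
    INSERT INTO table_2 VALUES (1, 10, 'test'), (2, 20, 'test'), (3, 30, 'nope');
    INSERT INTO table_3 VALUES (10, 'c10'), (20, 'c20'), (30, 'c30');
""")

# Join order a -> b -> c, as in "SQL no. 1".
query_1 = """
    SELECT a.id, b.id2, c.test3
    FROM table_1 a
    JOIN table_2 b ON a.id = b.id AND a.test1 = 'test' AND b.test2 = 'test'
    JOIN table_3 c ON c.id2 = b.id2
"""
# Join order c -> b -> a, as in "SQL no. 2".
query_2 = """
    SELECT a.id, b.id2, c.test3
    FROM table_3 c
    JOIN table_2 b ON c.id2 = b.id2
    JOIN table_1 a ON a.id = b.id AND a.test1 = 'test' AND b.test2 = 'test'
"""

def timed(sql):
    """Run a query; return its rows (sorted for comparison) and elapsed seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return sorted(rows), time.perf_counter() - start

rows_1, seconds_1 = timed(query_1)
rows_2, seconds_2 = timed(query_2)
print(rows_1 == rows_2, seconds_1, seconds_2)
```

On three rows the timings are meaningless; the point is that with inner joins only, both orders return the same set of rows, and any timing difference on real data comes from the plan the optimizer picks.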

What is the efficient way to query subset of a joined table

My query is somewhat like this
SELECT TableA.Column1
FROM TableA
LEFT JOIN TableB ON TableA.ForeignKey = TableB.PrimaryKey
LEFT JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn = 'XXX'
In the above case Table A and Table B are large tables (may contain more than 1 million rows), but Table C is small, with just 25 rows.
I have applied indexes on primary keys of all the tables.
In our application scenario, I need to search in TableC for just two conditions, TableC.SomeColumn = 'XXX' or TableC.SomeColumn = 'YYY'.
My question is what is the most efficient way to do this. A straight join does work, but I am concerned about joining with all the rows in TableB, just to pick a small subset of it, when joined in Table C.
Is it a good approach to have an indexed view?
For example,
CREATE INDEXED VIEW FOR TableB
JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn IN ('XXX', 'YYY')?
Your WHERE clause undoes the outer join, so you might as well write the query as:
SELECT a.Column1
FROM TableA a JOIN
TableB b
ON a.ForeignKey = b.PrimaryKey JOIN
TableC c
ON c.PrimaryKey = b.ForeignKey
WHERE c.SomeColumn = 'XXX';
For this query, you want these indexes:
TableC(SomeColumn, PrimaryKey)
TableB(ForeignKey, PrimaryKey)
TableA(ForeignKey, Column1)
You can create an indexed view. That would generally be the fastest for querying. However, it can incur a lot more overhead for updates and inserts into any of the base tables.
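A small sketch of why the WHERE clause undoes the outer join, using SQLite via Python's sqlite3 module and invented sample rows (the index names are hypothetical too). A row of TableA with no match gets NULL in every TableC column, and NULL = 'XXX' is never true, so the filter removes exactly the rows the LEFT JOIN had preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Column1 TEXT, ForeignKey INTEGER);
    CREATE TABLE TableB (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER);
    CREATE TABLE TableC (PrimaryKey INTEGER PRIMARY KEY, SomeColumn TEXT);
    -- The indexes suggested above (index names are made up).
    CREATE INDEX ix_c ON TableC (SomeColumn, PrimaryKey);
    CREATE INDEX ix_b ON TableB (ForeignKey, PrimaryKey);
    CREATE INDEX ix_a ON TableA (ForeignKey, Column1);
    INSERT INTO TableA VALUES ('a1', 1), ('a2', 2), ('a3', 99);
    INSERT INTO TableB VALUES (1, 100), (2, 200);
    INSERT INTO TableC VALUES (100, 'XXX'), (200, 'YYY');
""")

# LEFT JOINs with no filter: every TableA row survives, including 'a3'.
left_no_filter = conn.execute("""
    SELECT a.Column1
    FROM TableA a
    LEFT JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    LEFT JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    ORDER BY a.Column1
""").fetchall()

# Same joins plus the WHERE from the question.
left_with_where = conn.execute("""
    SELECT a.Column1
    FROM TableA a
    LEFT JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    LEFT JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    WHERE c.SomeColumn = 'XXX'
    ORDER BY a.Column1
""").fetchall()

# The inner-join rewrite.
inner = conn.execute("""
    SELECT a.Column1
    FROM TableA a
    JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    WHERE c.SomeColumn = 'XXX'
    ORDER BY a.Column1
""").fetchall()

print(left_no_filter, left_with_where, inner)
```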
I typically only use a JOIN when I need to SELECT or GROUP on the data, not when using it as a predicate. That said, I would be very curious to see whether Gordon's answer or this one performs better.
I would also suggest getting in the habit of using aliases when referencing your tables; it's less typing and makes your code easier to read.
I would test and compare execution times:
SELECT A.Column1
FROM TableA A
WHERE EXISTS (SELECT 1
FROM TableB B
WHERE A.ForeignKey = B.PrimaryKey
AND EXISTS (SELECT 1
FROM TableC C
WHERE C.PrimaryKey = B.ForeignKey
AND C.SomeColumn = 'XXX'))
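Before comparing timings it is worth confirming that the two forms return the same rows. This is a rough check on invented data, run on SQLite through Python's sqlite3 module. It assumes, as the column names suggest, that PrimaryKey is unique in TableB and TableC; if a join column were not unique, the JOIN form could return a TableA row several times while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Column1 TEXT, ForeignKey INTEGER);
    CREATE TABLE TableB (PrimaryKey INTEGER PRIMARY KEY, ForeignKey INTEGER);
    CREATE TABLE TableC (PrimaryKey INTEGER PRIMARY KEY, SomeColumn TEXT);
    INSERT INTO TableA VALUES ('a1', 1), ('a2', 2), ('a3', 99);
    INSERT INTO TableB VALUES (1, 100), (2, 200);
    INSERT INTO TableC VALUES (100, 'XXX'), (200, 'YYY');
""")

# Predicate form: TableB and TableC are only used to decide which A rows qualify.
exists_rows = conn.execute("""
    SELECT A.Column1
    FROM TableA A
    WHERE EXISTS (SELECT 1
                  FROM TableB B
                  WHERE A.ForeignKey = B.PrimaryKey
                    AND EXISTS (SELECT 1
                                FROM TableC C
                                WHERE C.PrimaryKey = B.ForeignKey
                                  AND C.SomeColumn = 'XXX'))
    ORDER BY A.Column1
""").fetchall()

# Join form from the other answer.
join_rows = conn.execute("""
    SELECT a.Column1
    FROM TableA a
    JOIN TableB b ON a.ForeignKey = b.PrimaryKey
    JOIN TableC c ON c.PrimaryKey = b.ForeignKey
    WHERE c.SomeColumn = 'XXX'
    ORDER BY a.Column1
""").fetchall()

print(exists_rows == join_rows)
```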

Can the order of Inner Joins Change the results of a query

I have the following scenario on a SQL Server 2008 R2:
The following queries return:
select * from TableA where ID = '123'; -- 1 row
select * from TableB where ID = '123'; -- 5 rows
select * from TableC where ID = '123'; -- 0 rows
When joining these tables the following way, it returns 1 row
SELECT A.ID
FROM TableA A
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
INNER JOIN TableC C
ON A.ID = C.ID
ORDER BY A.ID
But when switching the order of the inner joins, it does not return any rows
SELECT A.ID
FROM TableA A
INNER JOIN TableC C
ON A.ID = C.ID
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
ORDER BY A.ID
Can this be possible?
For inner joins, the order of the join operations does not affect the query (it can affect the ordering of the rows and columns, but the same data is returned).
In this case, the result set is a subset of the Cartesian product of all the tables. The ordering doesn't matter.
The order can and does matter for outer joins.
In your case, one of the tables is empty. So, the Cartesian product is empty and the result set is empty. It is that simple.
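This is easy to reproduce. The sketch below rebuilds the row counts from the question (1, 5 and 0 rows for ID '123') in SQLite via Python's sqlite3 module; the single-column tables are an assumption, since the real schemas are not shown. Both join orders come back empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID TEXT);
    CREATE TABLE TableB (ID TEXT);
    CREATE TABLE TableC (ID TEXT);  -- deliberately left empty
    INSERT INTO TableA VALUES ('123');
    INSERT INTO TableB VALUES ('123'), ('123'), ('123'), ('123'), ('123');
""")

# TableB subquery joined first, TableC second.
b_then_c = conn.execute("""
    SELECT A.ID
    FROM TableA A
    INNER JOIN (SELECT DISTINCT ID FROM TableB) AS D ON D.ID = A.ID
    INNER JOIN TableC C ON A.ID = C.ID
    ORDER BY A.ID
""").fetchall()

# TableC joined first, TableB subquery second.
c_then_b = conn.execute("""
    SELECT A.ID
    FROM TableA A
    INNER JOIN TableC C ON A.ID = C.ID
    INNER JOIN (SELECT DISTINCT ID FROM TableB) AS D ON D.ID = A.ID
    ORDER BY A.ID
""").fetchall()

print(b_then_c, c_then_b)
```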
As Gordon mentioned, for inner joins the order of joins doesn't matter, whereas it does matter when there's at least one outer join involved; however, in your case, none of this is pertinent as you are inner joining 3 tables, one of which will return zero rows - hence all combinations will result in zero rows.
You cannot reproduce the erratic behavior with the queries as they are shown in this question since they will always return zero records. You can try it again on your end to see what you come up with, and if you do find a difference, please share it with us then.
For the future, whenever you have something like this, create some dummy data, either in the form of insert statements or in rextester or the like; it makes it that much easier for someone to help you.
Best of luck.

SQL filter LEFT TABLE before left join

I have read a number of posts from SO and I understand the differences between filtering in the where clause and on clause. But most of those examples are filtering on the RIGHT table (when using left join). If I have a query such as below:
select * from tableA A left join tableB B on A.ID = B.ID and A.ID = 20
The return values are not what I expected. I would have thought it first filters the left table and fetches only rows with ID = 20 and then do a left join with tableB.
Of course, this should be technically the same as doing:
select * from tableA A left join tableB B on A.ID = B.ID where A.ID = 20
But I thought the performance would be better if you could filter the table before doing a join. Can someone enlighten me on how this SQL is processed and help me understand this thoroughly.
A left join follows a simple rule. It keeps all the rows in the first table. The values of columns depend on the on clause. If there is no match, then the corresponding table's columns are NULL -- whether the first or second table.
So, for this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID and A.ID = 20;
All the rows in A are in the result set, regardless of whether or not there is a match. When the id is not 20, then the rows and columns are still taken from A. However, the condition is false so the columns in B are NULL. This is a simple rule. It does not depend on whether the conditions are on the first table or the second table.
For this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID
where A.ID = 20;
The from clause keeps all the rows in A. But then the where clause has its effect, and it filters the rows so that only those with id 20 are in the result set.
When using a left join:
Filter conditions on the first table go in the where clause.
Filter conditions on subsequent tables go in the on clause.
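The two queries can be compared side by side on a few rows. This is a sketch on SQLite through Python's sqlite3 module; the val column and the sample IDs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER);
    CREATE TABLE tableB (ID INTEGER, val TEXT);
    INSERT INTO tableA VALUES (10), (20), (30);
    INSERT INTO tableB VALUES (20, 'b20'), (30, 'b30');
""")

# Condition on the first table placed in the ON clause:
# every row of A is kept, but only ID 20 is allowed to match a row of B.
filter_in_on = conn.execute("""
    SELECT A.ID, B.val
    FROM tableA A
    LEFT JOIN tableB B ON A.ID = B.ID AND A.ID = 20
    ORDER BY A.ID
""").fetchall()

# Condition on the first table placed in the WHERE clause:
# the join keeps all rows of A, then WHERE discards everything except ID 20.
filter_in_where = conn.execute("""
    SELECT A.ID, B.val
    FROM tableA A
    LEFT JOIN tableB B ON A.ID = B.ID
    WHERE A.ID = 20
    ORDER BY A.ID
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

Note that ID 30 has a matching row in tableB, yet its B columns come back NULL in the first query because the ON condition as a whole is false for that row.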
Where you have from tablea, you could put a subquery like from (select x.* from tablea X where x.value=20) TA
Then refer to TA like you did tablea previously.
Likely the query optimizer would do this for you.
Oracle has a way to show the query plan. Put EXPLAIN PLAN FOR before the SQL statement, then read the plan with SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY). Look at the plan both ways and see what it does.
In your first SQL statement, A.ID = 20 is technically not joining anything. Joins connect two separate tables, with the ON clause saying which columns to match as keys.
WHERE clauses filter the data, reducing the rows returned to those where the condition on that column holds.

SQL Understanding Outer Joins with Criteria and Why the Order of Criteria can alter Results

So I know how some things work in SQL but I don't know why and I'm unable to find a nice layman description of this online. For reference I'm using Oracle 11g and TOAD.
Question 1 - Outer Joins with Criteria
I know that if you place criteria on an outer joined table, you turn the query into an inner join regardless of your syntax. So this query acts as an inner join:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
WHERE b.field2 = 'someCriteria'
The way to get around this is to include an "OR IS NULL" in the second table's criteria. I know this to be true but I've never been able to wrap my head around why this is. Can someone explain why criteria on an outer table turns an outer join into an inner join?
Question 2 - Adding Criteria to Different Clauses Changes Results
So the above being true, I've been struggling with how the order of my criteria can alter the results of the following two queries. I have two tables - tableA and tableB - and need to do a left join compare of a subset of tableA to a subset of tableB.
SQL1
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
WHERE a.field1 = 'othercriteria'
Results: SQL1 gives me the correct left joined results.
SQL2
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
WHERE a.field1 = 'othercriteria'
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
Results: SQL2 pulls back only an inner joined result of the two subsets (excluding those rows in tableA where tableB does not have a match).
Understanding The Results
The reason for this has something to do with the order in which the join and the where runs. I could understand this changing performance but I'm not following why it would change the results as the syntax is almost identical. To wrap my head around it, I ran the execution plans for both queries and got the following results (from TOAD):
Execution Plans:
SQL 1:
5) Select Statement (Rows were returned by the Select statement)
  4) Sort Unique (The rows from step 3 were sorted to eliminate duplicate rows)
    3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
      1) Table Access Full TABLE tableB (Every row in the table tableB is read)
      2) Table Access Full TABLE tableA (Every row in the table tableA is read)
SQL 2:
11) Select Statement (Rows were returned by the Select statement)
  10) Sort Unique (The rows from step 9 were sorted to eliminate duplicate rows)
    9) All distinct rows from steps 4, 8 were returned
      4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
        3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
          1) Table Access Full TABLE tableB (Every row in the table tableB is read)
          2) Table Access Full TABLE tableA (Every row in the table tableA is read)
      8) Filter (For the rows returned by step 7, filter out rows depending on filter criteria)
        7) Hash Join Right Outer (The result sets from steps 5, 6 were joined (hash))
          5) Table Access Full TABLE tableB (Every row in the table tableB is read)
          6) Table Access Full TABLE tableA (Every row in the table tableA is read)
So I have no doubt that the above execution plans explain perfectly why SQL 2 gives me different results than SQL 1, but I'm having a hard time reading these plans. Can someone help me translate these execution plans and explain why SQL2 is treated as an Inner Join when the tableB criteria is listed in the WHERE clause instead of the JOIN?
Thanks in advance!!
For question 1, consider table a containing two rows:
key  field1
1    a
2    b
And table b containing 3 rows:
key  field1  field2
1    c       someCriteria
1    d       notSomeCriteria
1    e       NULL
Your FROM clause (with its JOINs) effectively generates a result set that looks like this:
(a)key  (a)field1  (b)key  (b)field1  (b)field2
1       a          1       c          someCriteria
1       a          1       d          notSomeCriteria
1       a          1       e          NULL
2       b          NULL    NULL       NULL
By the time the WHERE clause is considered, it no longer really "knows" whether a particular JOIN was successful or not - it doesn't selectively apply criteria based on whether or not the join succeeded. So if you've specified the b.field2 should equal someCriteria, you're saying that it should only return the first row (1,a,1,c,someCriteria).
If you want to make particular assertions about NULLable columns, you do really want the WHERE clause to act this way and force you to explicitly consider NULLs (whether those are generated from a NULL column or by a JOIN failing).
The cure I'd usually adopt is the one shown in HLGEM's answer, rather than adding OR b.field2 IS NULL since you usually want to exclude the (1,a,1,e,NULL) row.
I would write the first as:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
AND b.field2 = 'someCriteria'
This would return all records from table a, and b.field1 would only have data if b.field2 = 'someCriteria'.
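The three placements of the criteria can be tried against the sample rows from the answer above. This sketch uses SQLite via Python's sqlite3 module rather than Oracle; the table contents are the two-row and three-row examples already given, and only the Python variable names are new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (key INTEGER, field1 TEXT);
    CREATE TABLE tableB (key INTEGER, field1 TEXT, field2 TEXT);
    INSERT INTO tableA VALUES (1, 'a'), (2, 'b');
    INSERT INTO tableB VALUES
        (1, 'c', 'someCriteria'),
        (1, 'd', 'notSomeCriteria'),
        (1, 'e', NULL);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Criteria in WHERE: the unmatched row (2, b) has NULL in b.field2 and is dropped.
criteria_in_where = run("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a.key = b.key
    WHERE b.field2 = 'someCriteria'
    ORDER BY a.field1, b.field1
""")

# Criteria in ON: every row of tableA survives; b columns are NULL where nothing matched.
criteria_in_on = run("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a.key = b.key AND b.field2 = 'someCriteria'
    ORDER BY a.field1, b.field1
""")

# WHERE ... OR IS NULL: keeps (2, b) but also lets the genuine NULL row (1, e) through.
where_or_null = run("""
    SELECT a.field1, b.field1
    FROM tableA a
    LEFT JOIN tableB b ON a.key = b.key
    WHERE b.field2 = 'someCriteria' OR b.field2 IS NULL
    ORDER BY a.field1, b.field1
""")

print(criteria_in_where)
print(criteria_in_on)
print(where_or_null)
```

The third query is the reason the ON placement is usually preferred: OR IS NULL cannot tell a failed join apart from a stored NULL.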