SELECT *
FROM t1, t2, t3
WHERE t1.row_id = t2.invoice_id(+)
  AND t2.voi_id = t3.row_id(+)
  AND type = 'Dec'
ORDER BY 1
I have 3 indexes, one for each join column, but the explain plan shows full table scans on the tables instead of using the indexes:
Plan
1 Every row in the table t1 is read.
2 The rows were sorted to support the join at step 5.
3 Every row in the table t2 is read.
4 The rows were sorted to support the join at step 5.
5 Join the sorted results sets provided from steps 2, 4.
6 Rows were returned by the SELECT statement.
It depends on the row counts and sizes of the tables. In your query, all rows of t1 will be fetched (because an outer join is used, so every row from t1 with type = 'Dec' will be shown). That's why a TABLE ACCESS FULL on t1 is normal.
If the row count of t1 is more than 20-30% of the row count of t2 (the exact percentage depends on t2's size), a TABLE ACCESS FULL on t2 plus a hash join is also a normal scenario.
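If you want to check whether index access would actually be cheaper, you can compare costs with an index hint. A minimal sketch, assuming Oracle and a hypothetical index name t2_invoice_id_ix:
EXPLAIN PLAN FOR
SELECT /*+ INDEX(t2 t2_invoice_id_ix) */ *  -- hypothetical index name
FROM t1, t2, t3
WHERE t1.row_id = t2.invoice_id(+)
  AND t2.voi_id = t3.row_id(+)
  AND type = 'Dec'
ORDER BY 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
If the hinted plan shows a higher cost, the full scans and sort-merge join the optimizer chose were likely the right call.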
Currently we are using SQL Server 2019 and from time to time we tend to use TOP (max INT) to prioritize the execution of a sub-query.
The main reason to do this is to make the starting result set as small as possible and thus avoid excessive reads when joining with other tables.
The most common scenario in which it helps:
t1: the main table we are querying; it has about 200k rows
t2, t3: just some other tables with at most 5k rows each
pres: a view with basically all the fields we use for presentation (e.g. of a product), built from about 30 JOINs and also containing table t1 plus a LanguageID
SELECT t1.Id, "+30 Fields from tables t1,t2,t3, pres"
FROM t1
INNER JOIN pres ON pres.LanguageId=1 AND t1.Id=pres.Id
INNER JOIN t2 ON t1.vtype=t2.Id
LEFT JOIN t3 ON t1.color=t3.Id
WHERE 1=1
AND t1.f1=0
AND t1.f2<>76
AND t1.f3=2
We only expect about 300 rows, but it takes about 12 seconds to run.
SELECT t.Id, "10 Fields from tables t1,t2,t3 + 20 fields from pres"
FROM (
SELECT TOP 9223372036854775807 t1.Id, "about 10 fields from table t1,t2,t3"
FROM t1
INNER JOIN t2 ON t1.vtype=t2.Id
LEFT JOIN t3 ON t1.color=t3.Id
WHERE 1=1
AND t1.f1=0
AND t1.f2<>76
AND t1.f3=2
) t
INNER JOIN pres ON pres.LanguageId=1 AND t.Id=pres.Id
We expect the same ~300 rows, but this version takes only about 2 seconds to run.
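A side note on why this helps: TOP in a derived table discourages the optimizer from flattening the sub-query into the outer join tree, so the small filtered row set is produced first. A more explicit alternative sketch (assuming the same tables; not guaranteed to be faster in your case) is to materialize that row set into a temp table:
SELECT t1.Id, t1.vtype, t1.color  -- plus the ~10 fields needed from t1, t2, t3
INTO #filtered
FROM t1
INNER JOIN t2 ON t1.vtype = t2.Id
LEFT JOIN t3 ON t1.color = t3.Id
WHERE t1.f1 = 0 AND t1.f2 <> 76 AND t1.f3 = 2;

SELECT t.Id  -- plus the 20 fields from pres
FROM #filtered AS t
INNER JOIN pres ON pres.LanguageId = 1 AND t.Id = pres.Id;
The temp table guarantees the small result set is built once, at the cost of an extra write.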
Let's say I have three tables: t1 (a fact table with about 1 billion rows), t2 (an empty table, 0 rows), and t0 (a dimension table), all of them with properly collected statistics. In addition, there is a view v0:
REPLACE VIEW v0 AS
SELECT * FROM t1
UNION
SELECT * FROM t2;
Let's look at these three queries:
1) SELECT * FROM t1 INNER JOIN t0 ON t1.id = t0.id; -- Optimizer correctly estimates 1 bln rows
2) SELECT * FROM t2 INNER JOIN t0 ON t2.id = t0.id; -- Optimizer correctly estimates 0 rows
3) SELECT * FROM v0 INNER JOIN t0 ON v0.id = t0.id; -- Optimizer locks t1 and t2 for read; it correctly estimates that it will get 1 bln rows from t1, but for no clear reason estimates the same 1 bln from table t2.
What is going on here? Is it a bug or a feature?
PS: The original query, which is too big to show here, didn't finish in 35 minutes. After leaving just t1, it finished successfully in 15 minutes.
TD Release: 15.10.03.07
TD Version: 15.10.03.09
It's not the same number for the 2nd SELECT; it's the overall number of rows in the spool after the 2nd SELECT, which is 1 billion plus 0.
And your query was running slowly because you used UNION, which defaults to DISTINCT; running that on a billion rows is really expensive.
Better switch to UNION ALL instead.
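Applied to the view from the question, the fix is just:
REPLACE VIEW v0 AS
SELECT * FROM t1
UNION ALL
SELECT * FROM t2;
UNION ALL skips the duplicate-elimination step, so the billion rows from t1 pass through without the expensive sort.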
I have two tables with 3.5 million rows of data. I am creating a left join between the two to create a new view.
Code 1:
SELECT t1.c1,t1.c2,t2.c3,t2.c4
from table1 as t1
left join table2 as t2
on t1.Location=t2.Location and t1.OrderNumber=t2.OrderNumber and t1.Customer=t2.Customer
Code 2:
SELECT t1.c1,t1.c2,t2.c3,t2.c4
from table1 as t1
left join table2 as t2
on t1.OrderNumber=t2.OrderNumber
Both snippets give the same desired result, as the OrderNumber field in table 2 has only unique values.
Is it better to give more fields to JOIN compared to only one?
SELECT t1.c1,t1.c2,t2.c3,t2.c4
from table1 as t1
left join table2 as t2
on t1.Location = t2.Location
and t1.OrderNumber = t2.OrderNumber
and t1.Customer = t2.Customer
If OrderNumber is the PK of either table, then adding additional fields will not change the results, and it will not improve performance unless an index is missing on the other side.
If the OrderNumber field in table 2 has only unique values, it would not change the results. If it is a PK or has a unique constraint/index, then additional fields would not help unless the column that Table2.OrderNumber is joined to is not indexed.
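A minimal sketch of what "indexed on the other side" means, assuming SQL Server-style syntax; both index names are hypothetical:
-- Enforces the uniqueness the single-column join relies on, and covers the
-- selected columns so the join avoids key lookups:
CREATE UNIQUE INDEX IX_table2_OrderNumber
    ON table2 (OrderNumber)
    INCLUDE (c3, c4);

-- The "other side" of the join:
CREATE INDEX IX_table1_OrderNumber
    ON table1 (OrderNumber);
With both in place, the single-column join in Code 2 should be as cheap as, or cheaper than, the three-column version.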
I have two queries that are supposed to return equivalent results. However, the second query gives only partial results (less than 10% of the total).
The first query gives more than 4 million rows:
SELECT id, amount
FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id;
The second gives only 18 thousand records:
CREATE VOLATILE TABLE vt AS
(
  SELECT id, amount
  FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
)
WITH DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
SELECT *
FROM vt;
Why does the second query give fewer records?
When you do a SHOW TABLE vt; you'll notice that it's created as a SET table, which doesn't store duplicate rows. There are only 18 thousand distinct (id,amount) combinations.
Either add DISTINCT to your first Select or use CREATE MULTISET VOLATILE TABLE.
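The corrected definition, per that advice (dropping the SET version first):
DROP TABLE vt;

CREATE MULTISET VOLATILE TABLE vt AS
(
  SELECT id, amount
  FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
)
WITH DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
MULTISET preserves duplicate (id, amount) rows, so SELECT * FROM vt; now matches the 4-million-row result of the first query.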
I have the following SQL statement in a legacy system I'm refactoring. It is an abbreviated view for the purposes of this question, just returning count(*) for the time being.
SELECT COUNT(*)
FROM Table1
INNER JOIN Table2
INNER JOIN Table3 ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2
ON Table1.DifferentKey = Table3.DifferentKey
It is generating a very large number of records and killing the system, but could someone please explain the syntax? And can this be expressed in any other way?
Table1 contains 419 rows
Table2 contains 3374 rows
Table3 contains 28182 rows
EDIT:
Suggested reformat
SELECT COUNT(*)
FROM Table1
INNER JOIN Table3
ON Table1.DifferentKey = Table3.DifferentKey
INNER JOIN Table2
ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2
For readability, I restructured the query, starting with the apparent top-most level, Table1, which ties to Table3, which in turn ties to Table2. It's much easier to follow if you follow the chain of relationships.
Now, to answer your question: you are getting a large count as the result of a Cartesian-style multiplication. For each record in Table1 that matches in Table3 you will have X * Y rows. Then each match between Table3 and Table2 has the same multiplying effect, Y * Z... so the result for just one possible ID in Table1 can have X * Y * Z records.
This is based on not knowing the normalization or content of your tables, e.g. whether the key is a PRIMARY KEY or not.
Ex:
Table 1
  DiffKey  Other Val
  1        X
  1        Y
  1        Z
Table 3
  DiffKey  Key  Key2  Tbl3 Other
  1        2    6     V
  1        2    6     X
  1        2    6     Y
  1        2    6     Z
Table 2
  Key  Key2  Other Val
  2    6     a
  2    6     b
  2    6     c
  2    6     d
  2    6     e
So, Table 1 joining to Table 3 will result (in this scenario) in 12 records (each row in Table 1 joined with each row in Table 3). Then all of that again times each matched record in Table 2 (5 records)... so a count of 60 (3 tbl1 * 4 tbl3 * 5 tbl2) would be returned.
Now take that and scale it up to your thousands of records, and you can see how a messed-up structure can choke a cow (so to speak) and kill performance.
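To make the multiplication concrete, here is a minimal, self-contained sketch that reproduces the 60-row count from the example data above (the _demo table names are hypothetical, and the Key column is renamed Key1 because KEY is a reserved word in some dialects):
CREATE TABLE t1_demo (DiffKey INT, OtherVal CHAR(1));
CREATE TABLE t3_demo (DiffKey INT, Key1 INT, Key2 INT, OtherVal CHAR(1));
CREATE TABLE t2_demo (Key1 INT, Key2 INT, OtherVal CHAR(1));

INSERT INTO t1_demo VALUES (1,'X'),(1,'Y'),(1,'Z');
INSERT INTO t3_demo VALUES (1,2,6,'V'),(1,2,6,'X'),(1,2,6,'Y'),(1,2,6,'Z');
INSERT INTO t2_demo VALUES (2,6,'a'),(2,6,'b'),(2,6,'c'),(2,6,'d'),(2,6,'e');

-- 3 rows * 4 matching rows * 5 matching rows = 60
SELECT COUNT(*)
FROM t1_demo
INNER JOIN t3_demo
    ON t1_demo.DiffKey = t3_demo.DiffKey
INNER JOIN t2_demo
    ON t3_demo.Key1 = t2_demo.Key1
    AND t3_demo.Key2 = t2_demo.Key2;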
SELECT COUNT(*)
FROM Table1
INNER JOIN Table3
    ON Table1.DifferentKey = Table3.DifferentKey
INNER JOIN Table2
    ON Table3.Key = Table2.Key
    AND Table3.Key2 = Table2.Key2
Since you've already received help on the query, I'll take a poke at your syntax question:
The first query employs some lesser-known ANSI SQL syntax which allows you to nest joins between the join and on clauses. This allows you to scope/tier your joins and probably opens up a host of other evil, arcane things.
Now, while a nested join cannot refer to anything higher in the join hierarchy than its immediate parent, joins above it or outside of its branch can refer to it... which is precisely what this ugly little guy is doing:
select
count(*)
from Table1 as t1
join Table2 as t2
join Table3 as t3
on t2.Key = t3.Key -- join #1
and t2.Key2 = t3.Key2
on t1.DifferentKey = t3.DifferentKey -- join #2
This looks a little confusing because join #2 joins t1 to t2 without specifically referencing t2... however, it references t2 indirectly via t3, as t3 is joined to t2 in join #1. While that may work, you may find the following a bit more (visually) linear and appealing:
select
count(*)
from Table1 as t1
join Table3 as t3
join Table2 as t2
on t2.Key = t3.Key -- join #1
and t2.Key2 = t3.Key2
on t1.DifferentKey = t3.DifferentKey -- join #2
Personally, I've found that nesting in this fashion keeps my statements tidy by outlining each tier of the relationship hierarchy. As a side note, you don't need to specify INNER; JOIN is implicitly inner unless explicitly marked otherwise.