Oracle version: 19c
I am working on a legacy select query which returns around 60k rows. It is formed of 9 joins and 2 unions. I want to exclude a small set of audience records when they fall inside the case I specified.
I wrote a select query using four joins and then used a NOT IN clause to exclude these audience records.
The query was executing in around 15 seconds before, but after I added this NOT IN clause it did not finish even in 20 minutes, and I aborted it.
It is coded like this:
A.ID NOT IN (SELECT A.ID
FROM A
INNER JOIN B
ON A.X = B.X
INNER JOIN C
ON B.Y = C.Y
INNER JOIN D
ON C.Z = D.Z)
However, if I execute this subquery beforehand, insert its results into a table, and then use the NOT IN clause against that table, it finishes in about 15 seconds, just as before.
It is coded like this:
A.ID NOT IN (SELECT GT.ID FROM GENERATED_TABLE GT)
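For completeness, GENERATED_TABLE is populated beforehand with something like this (a sketch; I am assuming a CREATE TABLE AS statement, since the actual DDL is not shown):
CREATE TABLE GENERATED_TABLE AS
SELECT A.ID
FROM A
INNER JOIN B ON A.X = B.X
INNER JOIN C ON B.Y = C.Y
INNER JOIN D ON C.Z = D.Z;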
Do you know why it takes so much time when the subquery is not materialized into a table first?
And is there any way to make the first version run faster?
I expected it to take much less time.
Try using NOT EXISTS instead of the NOT IN clause. EXISTS/NOT EXISTS is often much faster than IN/NOT IN when the subquery result set is very large.
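For example, a minimal sketch of the rewrite for the query in the question (the inner alias A2 is mine; note that NOT EXISTS also avoids the NULL trap of NOT IN, which returns no rows if the subquery produces a NULL):
NOT EXISTS (SELECT 1
FROM A A2
INNER JOIN B ON A2.X = B.X
INNER JOIN C ON B.Y = C.Y
INNER JOIN D ON C.Z = D.Z
WHERE A2.ID = A.ID)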
And again - check the EXPLAIN PLAN output and look for full table scans. That will likely be the main cause.
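For instance, a sketch using a simplified stand-in for the full query:
EXPLAIN PLAN FOR
SELECT A.ID
FROM A
WHERE A.ID NOT IN (SELECT A2.ID
FROM A A2
INNER JOIN B ON A2.X = B.X
INNER JOIN C ON B.Y = C.Y
INNER JOIN D ON C.Z = D.Z);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
In the output, look for TABLE ACCESS FULL steps, especially one re-executed under a FILTER operation - that pattern is what makes an un-unnested NOT IN so slow.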
What will happen in an Oracle SQL join if I don't use all the tables in the WHERE clause that were mentioned in the FROM clause?
Example:
SELECT A.*
FROM A, B, C, D
WHERE A.col1 = B.col1;
Here I didn't use the C and D tables in the WHERE clause, even though I mentioned them in FROM. Is this OK? Are there any adverse performance issues?
It is poor practice to use that syntax at all. The FROM A,B,C,D syntax has been obsolete since 1992... more than 30 YEARS now. There's no excuse anymore. Instead, every join should always use the JOIN keyword, and specify any join conditions in the ON clause. The better way to write the query looks like this:
SELECT A.*
FROM A
INNER JOIN B ON A.col1 = B.col1
CROSS JOIN C
CROSS JOIN D;
Now we can also see what happens in the question. The query will still run if you fail to specify any conditions for certain tables, but it has the effect of using a CROSS JOIN: the results will include every possible combination of rows from every included relation (where the "A,B" part counts as one relation). If each of the three parts of those joins (A&B, C, D) has just 100 rows, the result set will have 1,000,000 rows (100 * 100 * 100). This is rarely going to give the results you expect or intend, and it's especially suspect when the SELECT clause isn't looking at any of the fields from the uncorrelated tables.
Any table lacking a join predicate will produce a Cartesian product: every row in the intermediate rowset before the join will match every row in the target table. So if you have 10,000 rows and you join, without any join predicate, to a table of 10,000 rows, you will get 100,000,000 rows as a result. There are only a few rare circumstances where this is what you want. At very large volumes it can cause havoc for the database, and DBAs are likely to lock your account.
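You can see the explosion with a tiny self-contained sketch (the same CONNECT BY row generator used below for test data):
WITH data AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 100)
SELECT COUNT(*) row_count -- 100 rows x 100 rows = 10,000 rows
FROM data a,
data b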
If you don't want to use a table, exclude it entirely from your SQL. If you can't, due to some constraint we don't know about, then include proper join predicates for every table in your WHERE clause and simply don't list any of their columns in your SELECT clause. If there's a cost to the join, you don't need anything from it, and for some very strange reason you can't leave the table out of your SQL completely (this does occasionally happen in reusable code), then you can disable the joins by making the predicates always false. Remember to use outer joins if you do this.
Native Oracle method:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10) -- test data
SELECT A.*
FROM data a,
data b,
data c,
data d
WHERE a.col = b.col
AND DECODE('Y','Y',NULL,a.col) = c.col(+)
AND DECODE('Y','Y',NULL,a.col) = d.col(+)
ANSI style:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10)
SELECT A.*
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c ON DECODE('Y','Y',NULL,a.col) = c.col
LEFT OUTER JOIN data d ON DECODE('Y','Y',NULL,a.col) = d.col
You can plug in a variable for the first 'Y' that you set to 'Y' or 'N' (e.g. var_disable_join); see the sketch below. This will bypass the join and avoid both the associated performance penalty and the Cartesian product effect. But again, I want to reiterate: this is an advanced hack and is probably NOT what you need. Simply leaving out the unwanted tables is the right approach 95% of the time.
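For example, the ANSI version with a bind variable (the name var_disable_join comes from the text above; 'Y' skips the joins, 'N' performs them):
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10)
SELECT a.*
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c ON DECODE(:var_disable_join,'Y',NULL,a.col) = c.col
LEFT OUTER JOIN data d ON DECODE(:var_disable_join,'Y',NULL,a.col) = d.col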
I have 2 equivalent queries in Snowflake - one with left join and the other with inner join:
SELECT *
FROM A
INNER JOIN B ON a.id=b.id;
SELECT *
FROM A
LEFT JOIN B ON a.id=b.id
WHERE b.id IS NOT NULL;
The inner join does not finish after an hour while the left join takes only a few seconds. Why would that happen?
EDIT:
There are two possibilities.
One is that there is really a difference in performance between the queries. That would occur because Snowflake chooses very different execution plans for the two. Of course the queries are not the same, but I might expect the execution plans to be similar. You can check EXPLAIN to investigate this.
The second is that the queries actually have quite similar performance, but you start seeing results from the left join query quickly. Why? Because every row in the first table comes out of that join, even when there is no match (the IS NOT NULL filter is applied afterwards). By contrast, if there is no match at all between the two tables, the inner join has to process all the data before it knows there is no match.
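To check the first possibility, run EXPLAIN on both variants and compare the plans (a sketch for the inner join version):
EXPLAIN
SELECT *
FROM A
INNER JOIN B ON a.id=b.id;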
I have recently noticed that SQL Server 2016 appears to ignore a left join if none of its columns are used in the select or where clause. The joined table is also absent from the actual execution plan.
This seems good: if anyone adds an extra join, it still doesn't affect performance.
I have a query that takes 9 seconds if I add columns from the left-joined tables to the select clause, but only 1 second without them.
Can anyone please check and confirm whether this is true?
Query with actual execution plan. You can see that no table from the left join appears in the execution plan.
I'm not 100% sure what the question is asking, but a SQL optimizer can ignore a left join. Consider this type of query:
select a.*
from a
left join b on a.b_id = b.id;
If b.id is declared as unique (or equivalently a primary key) then the above query returns exactly the same result set as:
select a.*
from a;
I am not aware, specifically, that SQL Server added this optimization in 2016. But the optimization is perfectly valid and (I believe) other optimizers implement it as well.
Remember, SQL is a declarative language, not a procedural language. The SQL query describes the result set, not how it is produced.
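To make the precondition concrete, here is a minimal sketch (illustrative names; b.id must be unique for the elimination to be valid):
-- b.id is a primary key, so the left join can neither add nor remove rows of a
create table b (id int primary key);
create table a (b_id int);

-- no column of b is referenced, so the optimizer may drop b from the plan
select a.*
from a
left join b on a.b_id = b.id;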
If you have a left join and the select statement does not contain any columns from the right-hand table, the optimizer can skip the join and the query returns data as if the join weren't there. This is true not only in SQL Server 2016 but in most databases.
A left join can reduce query performance when there is a large amount of data in the joined tables.
For inner joins, is there any difference in performance to apply a filter in the JOIN ON clause or the WHERE clause? Which is going to be more efficient, or will the optimizer render them equal?
JOIN ON
SELECT u.name
FROM users u
JOIN departments d
ON u.department_id = d.id
AND d.name = 'IT'
VS
WHERE
SELECT u.name
FROM users u
JOIN departments d
ON u.department_id = d.id
WHERE d.name = 'IT'
Oracle 11gR2
There should be no difference. The optimizer should generate the same plan in both cases and should be able to apply the predicate before, after, or during the join in either case based on what is the most efficient approach for that particular query.
Of course, the fact that the optimizer can do something, in general, is no guarantee that the optimizer will actually do something in a particular query. As queries get more complicated, it becomes impossible to exhaustively consider every possible query plan which means that even with perfect information and perfect code, the optimizer may not have time to do everything that you'd like it to do. You'd need to take a look at the actual plans generated for the two queries to see if they are actually identical.
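For example, a sketch of comparing the actual plans in Oracle using standard DBMS_XPLAN calls (run each variant, then display its cursor):
SELECT /*+ GATHER_PLAN_STATISTICS */ u.name
FROM users u
JOIN departments d
ON u.department_id = d.id
AND d.name = 'IT';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
Repeat with the WHERE variant; if the reported plan hash value is the same, the optimizer treated the two queries identically.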
I prefer putting the filter criteria in the where clause.
With data warehouse queries, putting the filter criteria in the join seems to make the query run significantly longer.
For example, I have Table1 indexed by the field Date, and Table2 partitioned by the field Partition. Table2 is the biggest table in the query and lives on another database server. I use the driving_site hint to tell the optimizer to use Table2's partitions.
select /*+ driving_site(b) */ a.key, sum(b.money) money
from schema.table1 a
join schema2.table2@dblink b
on a.key = b.key
where b.partition = to_number(to_char(:i,'yyyymm'))
and a.date = :i
group by a.key
If I do the query this way, it takes about 30 - 40 seconds to return the results.
If I don't do the query this way, it takes about 10 minutes until I cancel the execution with no results.
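For contrast, a sketch of the slow shape I mean - the partition filter moved into the join clause, everything else unchanged:
select /*+ driving_site(b) */ a.key, sum(b.money) money
from schema.table1 a
join schema2.table2@dblink b
on a.key = b.key
and b.partition = to_number(to_char(:i,'yyyymm'))
where a.date = :i
group by a.key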
I'm thinking about the best way (considering execution time) of doing a join between 2 or more tables with some conditions. I came up with these three ways:
FIRST WAY:
select * from
A inner join B on A.KEY = B.KEY
where
B.PARAM = VALUE
SECOND WAY:
select * from
A inner join B on A.KEY = B.KEY
and B.PARAM = VALUE
THIRD WAY:
select * from
A inner join (select * from B where B.PARAM = VALUE) J on A.KEY = J.KEY
Consider that the tables have more than 1 million rows.
What's your opinion? Which is the right way, if one exists?
Usually putting the condition in the where clause or in the join condition makes no noticeable difference for inner joins.
If you are using outer joins, putting the condition in the where clause improves query time, because rows which don't meet the condition are deleted from the result set, and the result set becomes smaller.
But if you put the condition in the join clause of a left outer join, no rows are deleted, and the result set is bigger than when the condition is in the where clause.
For clarification, follow the example.
create table A
(
ano NUMBER,
aname VARCHAR2(10),
rdate DATE
);
-- A data
insert into A
select 1,'Amand',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 2,'Alex',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 3,'Angel',to_date('20130201','yyyymmdd') from dual;
commit;
create table B
(
bno NUMBER,
bname VARCHAR2(10),
rdate DATE
);
insert into B
select 3,'BOB',to_date('20130201','yyyymmdd') from dual;
commit;
insert into B
select 2,'Br',to_date('20130101','yyyymmdd') from dual;
commit;
insert into B
select 1,'Bn',to_date('20130101','yyyymmdd') from dual;
commit;
First of all, we have a normal query which joins the 2 tables with each other:
select * from a inner join b on a.ano=b.bno
The result set has 3 records.
Now please run the queries below:
select * from a inner join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
select * from a inner join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
As you see, the results above have identical row counts, and in my experience there is no noticeable performance difference even for large data volumes.
Now please run the query below:
select * from a left outer join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
In this case, the count of output records will be equal to the number of records in table A.
select * from a left outer join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
In this case, records of A which didn't meet the condition are deleted from the result set and, as I said, the result set has fewer records (in this case, 2 records).
From the above examples we can draw the following conclusions:
1- In case of inner joins, there is no real difference between putting the condition in the where clause or the join clause, but try to order the tables in the from clause so that intermediate results have minimal row counts:
(http://www.dba-oracle.com/art_dbazine_oracle10g_dynamic_sampling_hint.htm)
2- In case of outer joins, whenever you don't care about the exact result row count (that is, you don't mind losing records of table A which have no paired records in table B; table B's fields would be null for those records in the result set), put the condition in the where clause. It deletes the rows which don't meet the condition and clearly improves query time by decreasing the result row count.
But in special cases you HAVE TO put the condition in the join part. For example, if you want the result row count to equal table A's row count (a common requirement in ETL processes), you HAVE TO put the condition in the join clause.
3- Avoiding subqueries is recommended by many reliable resources and expert programmers. They usually increase query time; use a subquery only when its result set is small.
I hope this will be useful:)
1M rows really isn't that much - especially if you have sensible indexes. I'd start off with making your queries as readable and maintainable as possible, and only start optimizing if you notice a performance problem with the query (and as Gordon Linoff said in his comment - it's doubtful there would even be a difference between the three).
It may be a matter of taste, but to me, the third way seems clumsy, so I'd cross it out. Personally, I prefer using JOIN syntax for the joining logic (i.e., how A and B's rows are matched) and WHERE for filtering (i.e., once matched, which rows interest me), so I'd go for the first way. But again, it really boils down to personal taste and preferences.
You need to look at the execution plans for the queries to judge which is the most computationally efficient. As pointed out in the comments, you may find they are equivalent. Here is some information on Oracle execution plans. Depending on what editor / IDE you use, there may be a shortcut for this, e.g. F5 in PL/SQL Developer.
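For example, in SQL*Plus you can view the estimated plan without executing the statement (a sketch using the first way from the question; VALUE is the question's placeholder):
SET AUTOTRACE TRACEONLY EXPLAIN
select * from
A inner join B on A.KEY = B.KEY
where
B.PARAM = VALUE;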