ORACLE VERSION : 19C
I am working on a legacy select query which returns around 60k rows. It İS formed of 9 joins and 2 unions. I want to exclude a small number of audience if they are inside the case i specified.
I wrote a select query using four joins and then used not in clause to exclude these audience.
The query was executing in aroung 15seconds before but after i wrote this not in clause it did not finish even in 20 minutes and i aborted it.
It is coded like this;
A.ID NOT IN (SELECT A.ID
FROM A
INNER JOIN B
ON A.X = BX
INNER JOIN C
ON B.Y = C.Y
INNER JOIN D
ON C.Z = D.Z)
However if i execute this subquery before the select and insert it into a table and then use not in clause for the table it almost finishes in 15 seconds just as normal
It is coded like this;
A.ID NOT IN (SELECT GT.ID FROM GENERATED_TABLE GT)
Do you know why it takes too much time when it is not populated into a table?
And are there any way to make the first one run faster?
Expecting it to take much less time
Try to use EXISTS instead of IN statement. The EXISTS clause is much faster than IN when the subquery results is very large.
And again - check EXPLAIN PLAN and search for FULL SCAN keywords. That will be the main cause.
I have recently noticed that SQL Server 2016 appears to be ignoring left joins if any column is not used in the select or where clause. Also not found in Actual execution plan.
This is good for if anyone added extra join but still not affecting performance.
I have query that took 9 sec, if I add column in select clause for Left join tables but without that only 1 sec.
Can anyone please check and suggest, Is that true or not?
Query with Actual execution plan. You can see there is no any table from left join in execution plan.
I'm not 100% sure what the question is asking, but a SQL optimizer can ignore left join. Consider this type of query:
select a.*
from a left join
b
on a.b_id = b.id;
If b.id is declared as unique (or equivalently a primary key) then the above query returns exactly the same result set as:
select a.*
from a;
I am not per se aware that SQL Server started putting this optimization in 2016. But the optimization is perfectly valid and (I believe) other optimizers do implement it.
Remember, SQL is a declarative language, not a procedural language. The SQL query describes the result set, not how it is produced.
If you have a left join and your matching condition don't return any data from the joined table it will return data as inner join return, when select statement does not contains columns from right tables. Not only in ms server 2016 but most of the DB's.
Left join reduces the performance of the query if there are large amount of data available in join tables.
I have 3 tables:
1 - tblMembers_Info
2 - a junction table
3 - tblCourses
I need to query the members who haven't done a specific course.
After trying to do it manually I gave MS Access "Query Wizard" a try. I ended up with :
A saved query as Query1:
// That one query who did the course
SELECT tblMembers_Info.*, tblCourses.CourseName
FROM tblMembers_Info
INNER JOIN
(tblCourses INNER JOIN tblMembers_Courses
ON tblCourses.IDCourses = tblMembers_Courses.IDCourses)
ON tblMembers_Info.Members_ID = tblMembers_Courses.Members_ID
WHERE (tblCourses.CourseName) In ('NameOftheCourse');
2nd query using the saved Query1:
SELECT tblMembers_Info.Members_ID, tblMembers_Info.FirstName, tblMembers_Info.LastName
FROM tblMembers_Info
LEFT JOIN [Query1]
ON tblMembers_Info.[Members_ID] = Query1.[Members_ID]
WHERE (((Query1.Members_ID) Is Null));
How can I replace the Query1 in the second query with the full query instead of using a QueryDef (the saved query "Query1")?
Also, there's a better way for sure to write that query, I would really appreciate any help.
You can simply replace LEFT JOIN [Query1] with LEFT JOIN (...) AS [Query1] where ... should be the SQL of the first query, without the ending ;.
But I think in your specific case the use of NOT IN might give a better performance to get the same results:
SELECT tblMembers_Info.Members_ID, tblMembers_Info.FirstName, tblMembers_Info.LastName
FROM tblMembers_Info
WHERE tblMembers_Info.[Members_ID] NOT IN (
SELECT tblMembers_Info.[Members_ID]
FROM ((tblMembers_Info
INNER JOIN tblMembers_Courses
ON tblMembers_Info.Members_ID = tblMembers_Courses.Members_ID)
INNER JOIN tblCourses
ON tblCourses.IDCourses = tblMembers_Courses.IDCourses)
WHERE tblCourses.CourseName = 'NameOftheCourse'
);
I'm thinking about which should be the best way (considering the execution time) of doing a join between 2 or more tables with some conditions. I got these three ways:
FIRST WAY:
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
where
B.PARAM=VALUE
SECOND WAY
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
and B.PARAM=VALUE
THIRD WAY
select * from
TABLE A inner join (Select * from TABLE B where B.PARAM=VALUE) J ON A.KEY=J.KEY
Consider that tables have more than 1 milion of rows.
What your opinion? Which should be the right way, if exists?
Usually putting the condition in where clause or join condition has no noticeable differences in inner joins.
If you are using outer joins ,putting the condition in the where clause improves query time because when you use condition in the where clause of
left outer joins, rows which aren't met the condition will be deleted from the result set and the result set becomes smaller.
But if you use the condition in join clause of left outer joins ,no rows deletes and result set is bigger in comparison to using condition in the where clause.
for more clarification,follow the example.
create table A
(
ano NUMBER,
aname VARCHAR2(10),
rdate DATE
)
----A data
insert into A
select 1,'Amand',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 2,'Alex',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 3,'Angel',to_date('20130201','yyyymmdd') from dual;
commit;
create table B
(
bno NUMBER,
bname VARCHAR2(10),
rdate DATE
)
insert into B
select 3,'BOB',to_date('20130201','yyyymmdd') from dual;
commit;
insert into B
select 2,'Br',to_date('20130101','yyyymmdd') from dual;
commit;
insert into B
select 1,'Bn',to_date('20130101','yyyymmdd') from dual;
commit;
first of all we have normal query which joins 2 tables with each other:
select * from a inner join b on a.ano=b.bno
the result set has 3 records.
now please run below queries:
select * from a inner join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
select * from a inner join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
as you see above results row counts have no differences,and According to my experience there is no noticeable performance differences for data in large volume.
please run below queries:
select * from a left outer join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
in this case,the count of output records will be equal to table A records.
select * from a left outer join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
in this case , records of A which didn't met the condition deleted from the result set and as I said the result set will have less records(in this case 2 records).
According to above examples we can have following conclusions:
1-in case of using inner joins,
there is no special differences between putting condition in where clause or join clause ,but please try to put tables in from clause in order to have minimum intermediate result row counts:
(http://www.dba-oracle.com/art_dbazine_oracle10g_dynamic_sampling_hint.htm)
2-In case of using outer joins,whenever you don't care of exact result row counts (don't care of missing records of table A which have no paired records in table B and fields of table B will be null for these records in the result set),put the condition in the where clause to delete a set of rows which aren't met the condition and obviously improve query time by decreasing the result row counts.
but in special cases you HAVE TO put the condition in the join part.for example if you want that your result row count will be equal to table 'A' row counts(this case is common in ETL processes) you HAVE TO put the condition in the join clause.
3-avoiding subquery is recommended by lots of reliable resources and expert programmers.It usually increase the query time and you can use subquery just when its result data set is small.
I hope this will be useful:)
1M rows really isn't that much - especially if you have sensible indexes. I'd start off with making your queries as readable and maintainable as possible, and only start optimizing if you notice a perforamnce problem with the query (and as Gordon Linoff said in his comment - it's doubtful there would even be a difference between the three).
It may be a matter of taste, but to me, the third way seems clumsy, so I'd cross it out. Personally, I prefer using JOIN syntax for the joining logic (i.e., how A and B's rows are matched) and WHERE for filtering (i.e., once matched, which rows interest me), so I'd go for the first way. But again, it really boils down to personal taste and preferences.
You need to look at the execution plans for the queries to judge which is the most computationally efficient. As pointed out in the comments you may find they are equivalent. Here is some information on Oracle execution plans. Depending on what editor / IDE you use the may be a shortcut for this e.g. F5 in PL/SQL Developer.
I've got the following SQL:
SELECT customfieldvalue.ISSUE
FROM customfieldvalue
WHERE customfieldvalue.STRINGVALUE
IN (SELECT customfieldvalue.STRINGVALUE
FROM customfieldvalue
WHERE customfieldvalue.CUSTOMFIELD = "10670"
GROUP BY customfieldvalue.STRINGVALUE
HAVING COUNT(*) > 1);
The inner nested select returns 3265 rows in 1.5secs on MySQL 5.0.77 when run on its own.
The customfieldvalue table contains 2286831 rows.
I want to return all values of the ISSUE column where the STRINGISSUE column value is not exclusive to that row and the CUSTOMFIELD column contains "10670".
When I try and run the query above, MySQL seems to be stuck. I've left it run for up to a minute, but I'm pretty sure the problem is my query.
Try something along these lines:
SELECT cfv1.ISSUE
COUNT(cfv2.STRINGVALUE) as indicator
FROM customfieldvalue cfv1
INNER JOIN customfieldvalue cfv2
ON cfv1.STRINGVALUE = cfv2.STRINGVALUE AND cfv2.CUSTOMFIELD = "10670"
GROUP BY cfv1.ISSUE
HAVING indicator > 1
This probably doesn't work on copy&paste as I haven't verified it, but in MySQL JOINs are often much much faster than subqueries, even orders of magnitude.