Speeding up a query with INNER JOIN - sql

I have a query that takes a long time to execute. I've waited for about 10 mins and it's still not finished executing.
The query looks something like this:
SELECT
one.ID,
two.NAME,
two.STATUS,
four.KEY,
four.VALUE,
count(one.ID) as num
FROM TABLE_ONE one, TABLE_TWO two, TABLE_THREE three, TABLE_FOUR four
WHERE one.STATE='RED'
AND (two.STATUS='ON' OR two.STATUS='OFF')
AND (
four.KEY='FINAL'
OR four.KEY='LIMIT'
OR (
four.KEY='MODE'
AND (
four.VALUE='T'
OR four.VALUE='R')))
GROUP BY one.ID, two.NAME, two.STATUS, four.KEY, four.VALUE
ORDER BY group_name ASC;
I have another query which is equivalent but executes very fast (about 1 second to execute).
Here is that query:
SELECT
one.ID,
two.NAME,
two.STATUS,
four.KEY,
four.VALUE,
count(one.ID) as num
FROM TABLE_ONE one
INNER JOIN TABLE_TWO two
ON one.ID=two.ID
INNER JOIN TABLE_THREE three
ON two.ID=three.GROUP_ID
INNER JOIN TABLE_FOUR four
ON three.ID=four.ID
WHERE one.STATE='RED'
AND (two.STATUS='ON' OR two.STATUS='OFF')
AND (
four.KEY='FINAL'
OR four.KEY='LIMIT'
OR (
four.KEY='MODE'
AND (
four.VALUE='T'
OR four.VALUE='R')))
GROUP BY one.ID, two.NAME, two.STATUS, four.KEY, four.VALUE
ORDER BY group_name ASC;
I'm confused about why the query with INNER JOIN executes really fast (about 1 second) while the one without takes a long time (I waited about 10 minutes and it still hadn't finished executing).
Is there anything I can do to the query without the INNER JOIN to speed up the execution time?
I am using ORACLE.

In the first query, the tables are not really joined on any columns. The result is called a cross join. A cross join between two tables returns a number of rows equal to the number of rows in the first table multiplied by the number of rows in the second.
An inner join matches rows based on a given set of columns.

Your long running query has no join conditions to relate one table to the other. Therefore it is creating a cartesian product of all the records in each table. So if each table has 10 rows, it would generate 10*10*10*10=10,000 result rows before performing the aggregate functions. Larger tables just get worse. If each table had 1,000 rows you'd end up generating 1,000,000,000,000 rows.
Your faster query has join criteria which significantly reduces the number of rows in the result set, which is why it is more performant.

Let's say you have N values for ID. In the first query you will create N * N * N * N (or N ^ 4) rows.
In the second you will create N rows.
In big O notation:
O(N^4)
vs
O(N)
Now you have a real world example of the impact.
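To see this concretely, here is a minimal sketch using Python's sqlite3 (table and column names are invented, and sqlite stands in for Oracle) showing the row-count blow-up:

```python
import sqlite3

# In-memory database with four tiny tables of 10 rows each
# (all names here are illustrative, not from the original question).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("t1", "t2", "t3", "t4"):
    cur.execute(f"CREATE TABLE {t} (id INTEGER)")
    cur.executemany(f"INSERT INTO {t} VALUES (?)", [(i,) for i in range(10)])

# Comma-separated FROM with no join conditions: a cartesian product.
cur.execute("SELECT COUNT(*) FROM t1, t2, t3, t4")
print(cur.fetchone()[0])  # 10 * 10 * 10 * 10 = 10000

# With join conditions relating the tables, only matching rows survive.
cur.execute("""SELECT COUNT(*) FROM t1
               JOIN t2 ON t1.id = t2.id
               JOIN t3 ON t2.id = t3.id
               JOIN t4 ON t3.id = t4.id""")
print(cur.fetchone()[0])  # 10
```

At 10 rows per table the difference is 10,000 vs 10; at 1,000 rows per table it is a trillion vs a thousand.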

Related

A small table join large table on like, using impala

The following Impala query runs very slowly.
SELECT pattern, MAX(time)
FROM (SELECT t.time, p.pattern
FROM t
JOIN p
ON (t.name LIKE p.pattern)) AS tmp
GROUP BY pattern
p is large, with about 1 billion records, and t is small, with only 1 record.
How can I optimize this? This is effectively a nested loop join, but why did it take about half an hour to complete?
What's more, when I use the following query
SELECT time
FROM p
WHERE name LIKE 'one_pattern'
ORDER BY time DESC LIMIT 1
It only takes 3s. I am really confused.

SELECT FROM inner query slowdown

We have two very similar queries, one takes 22 seconds the other takes 6 seconds. Both use an inner select, have the exact same outer columns and outer joins. The only difference is the inner select that the outer query is using to join in on.
The inner query when run alone executes in 100ms or less in both cases and returns the EXACT SAME data.
Both queries as a whole have a lot of room for improvement, but this particular oddity is really puzzling to us and we just want to understand why. To me it would seem the inner query should be executed once in 100ms then the outer stuff happens. I have a feeling the inner select may be executed multiple times.
Query that takes 6 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
ORDER BY projectItemsID ASC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
Query that takes 22 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
For every row in your projectItems table, the second query filters on two columns instead of one. If projectItemsID isn't the primary key, or if it isn't indexed, it takes longer to evaluate the extra column.
If you look at the sizes of the tables and the number of rows each query returns, you can calculate how many comparisons need to be made for each of the queries.
I believe that you're right that the inner query is being run for every single row that is being left joined with categories.
I can't find a proper source on it right now, but you can easily test this by doing something like the following and comparing the run times. Here, we can at least be sure that the inner query only runs one time (apologies if any syntax is incorrect, but you'll get the general idea):
DECLARE @innerQuery TABLE ( [all inner query columns here] );
INSERT INTO @innerQuery
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539;
SELECT {whole bunch of field names}
FROM @innerQuery AS IQ
LEFT JOIN categories
ON IQ.fk_category = categories.categoryID
...{more joins}
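For illustration, the same materialize-once idea can be sketched with Python's sqlite3 (all table names and data are invented for the demo; sqlite uses CREATE TEMP TABLE rather than a T-SQL table variable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE projectItems (projectItemsID INTEGER, isActive INTEGER, fk_category INTEGER)")
cur.execute("CREATE TABLE categories (categoryID INTEGER, name TEXT)")
cur.executemany("INSERT INTO projectItems VALUES (?,?,?)",
                [(6539, 1, 1), (6540, 1, 2), (6541, 0, 1)])
cur.executemany("INSERT INTO categories VALUES (?,?)", [(1, "A"), (2, "B")])

# Materialize the inner query once into a temp table, then join against it;
# this guarantees the filter runs a single time regardless of the outer joins.
cur.execute("""CREATE TEMP TABLE innerQuery AS
               SELECT * FROM projectItems
               WHERE isActive = 1 AND projectItemsID = 6539""")
cur.execute("""SELECT IQ.projectItemsID, c.name
               FROM innerQuery AS IQ
               LEFT JOIN categories c ON IQ.fk_category = c.categoryID""")
print(cur.fetchall())  # [(6539, 'A')]
```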

does left join order affect execute result and duration in sql server?

View a has 1 million+ rows. Table b has 50,000+ rows.
a LEFT JOIN b LEFT JOIN c takes 1 minute.
b LEFT JOIN a LEFT JOIN c takes 10 seconds.
They return the same results.
Does the LEFT JOIN order affect the result and the execution duration?
They shouldn't return the same results, unless you have a where clause that does additional filtering.
The first should keep all 1,000,000 rows in A. The second should keep all 50,000 rows in B. If they match, then every row in A matches at least one row in B and every row in B matches at least one row in A.
As for your question, the join order does affect the results (as I just described). It doesn't really affect the execution, because the optimizer determines the execution plan independently of how the joins are ordered in the FROM clause. (Of course, semantically different queries will result in different execution plans.)
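A minimal sqlite3 sketch (invented data) confirms that swapping the sides of a LEFT JOIN changes which table's rows are all preserved, and hence the row count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (id INTEGER)")
cur.execute("CREATE TABLE b (id INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])  # 3 rows
cur.executemany("INSERT INTO b VALUES (?)", [(1,)])              # 1 row

# a LEFT JOIN b keeps every row of a ...
cur.execute("SELECT COUNT(*) FROM a LEFT JOIN b ON a.id = b.id")
print(cur.fetchone()[0])  # 3

# ... while b LEFT JOIN a keeps every row of b.
cur.execute("SELECT COUNT(*) FROM b LEFT JOIN a ON b.id = a.id")
print(cur.fetchone()[0])  # 1
```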

SQL Join between tables with conditions

I'm wondering which is the best way (considering execution time) of doing a join between two or more tables with some conditions. I came up with these three ways:
FIRST WAY:
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
where
B.PARAM=VALUE
SECOND WAY
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
and B.PARAM=VALUE
THIRD WAY
select * from
TABLE A inner join (Select * from TABLE B where B.PARAM=VALUE) J ON A.KEY=J.KEY
Consider that the tables have more than 1 million rows.
What is your opinion? Which is the right way, if one exists?
Usually, putting the condition in the WHERE clause or in the join condition makes no noticeable difference for inner joins.
If you are using outer joins, putting the condition in the WHERE clause can improve query time: with a condition in the WHERE clause of a left outer join, rows that don't meet the condition are removed from the result set, so the result set becomes smaller.
If you put the condition in the join clause of a left outer join instead, no rows are removed, and the result set is bigger than with the condition in the WHERE clause.
For clarification, follow the example.
create table A
(
ano NUMBER,
aname VARCHAR2(10),
rdate DATE
);
----A data
insert into A
select 1,'Amand',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 2,'Alex',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 3,'Angel',to_date('20130201','yyyymmdd') from dual;
commit;
create table B
(
bno NUMBER,
bname VARCHAR2(10),
rdate DATE
);
insert into B
select 3,'BOB',to_date('20130201','yyyymmdd') from dual;
commit;
insert into B
select 2,'Br',to_date('20130101','yyyymmdd') from dual;
commit;
insert into B
select 1,'Bn',to_date('20130101','yyyymmdd') from dual;
commit;
First of all, we have a normal query that joins the two tables:
select * from a inner join b on a.ano=b.bno
The result set has 3 records.
Now please run the queries below:
select * from a inner join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
select * from a inner join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
As you can see, the row counts of the results above do not differ, and in my experience there is no noticeable performance difference even for large data volumes.
Now please run the queries below:
select * from a left outer join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
In this case, the count of output records will be equal to the number of records in table A.
select * from a left outer join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
In this case, records of A that didn't meet the condition are deleted from the result set, and as I said, the result set has fewer records (in this case, 2 records).
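The same experiment can be reproduced with Python's sqlite3 (dates simplified to strings, sqlite standing in for Oracle) to check the two row counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (ano INTEGER, aname TEXT, rdate TEXT)")
cur.execute("CREATE TABLE b (bno INTEGER, bname TEXT, rdate TEXT)")
cur.executemany("INSERT INTO a VALUES (?,?,?)",
                [(1, "Amand", "20130101"), (2, "Alex", "20130101"), (3, "Angel", "20130201")])
cur.executemany("INSERT INTO b VALUES (?,?,?)",
                [(3, "BOB", "20130201"), (2, "Br", "20130101"), (1, "Bn", "20130101")])

# Condition in the ON clause: all 3 rows of a survive
# (b's columns are just NULL where the condition fails).
cur.execute("""SELECT COUNT(*) FROM a LEFT JOIN b
               ON a.ano = b.bno AND a.rdate = '20130101'""")
print(cur.fetchone()[0])  # 3

# Condition in the WHERE clause: rows of a failing the filter are removed.
cur.execute("""SELECT COUNT(*) FROM a LEFT JOIN b
               ON a.ano = b.bno WHERE a.rdate = '20130101'""")
print(cur.fetchone()[0])  # 2
```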
From the examples above we can draw the following conclusions:
1- When using inner joins, there is no real difference between putting the condition in the WHERE clause or in the join clause, but try to order the tables in the FROM clause so that the intermediate results have the minimum row counts:
(http://www.dba-oracle.com/art_dbazine_oracle10g_dynamic_sampling_hint.htm)
2- When using outer joins, whenever you don't care about the exact result row count (i.e., you don't mind losing records of table A that have no paired records in table B, for which table B's fields would be null in the result set), put the condition in the WHERE clause to remove the rows that don't meet it, which improves query time by decreasing the result row count.
But in special cases you HAVE TO put the condition in the join part. For example, if you want the result row count to equal table A's row count (a common case in ETL processes), you HAVE TO put the condition in the join clause.
3- Avoiding subqueries is recommended by lots of reliable resources and expert programmers. They usually increase query time, so use a subquery only when its result set is small.
I hope this is useful. :)
1M rows really isn't that much - especially if you have sensible indexes. I'd start off by making your queries as readable and maintainable as possible, and only start optimizing if you notice a performance problem with the query (and as Gordon Linoff said in his comment - it's doubtful there would even be a difference between the three).
It may be a matter of taste, but to me, the third way seems clumsy, so I'd cross it out. Personally, I prefer using JOIN syntax for the joining logic (i.e., how A and B's rows are matched) and WHERE for filtering (i.e., once matched, which rows interest me), so I'd go for the first way. But again, it really boils down to personal taste and preferences.
You need to look at the execution plans for the queries to judge which is the most computationally efficient. As pointed out in the comments, you may find they are equivalent. Here is some information on Oracle execution plans. Depending on what editor/IDE you use, there may be a shortcut for this, e.g. F5 in PL/SQL Developer.
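As a rough analogue (the question concerns Oracle, so this is only a sketch of the idea), sqlite exposes its plan by prefixing a query with EXPLAIN QUERY PLAN, which can be driven from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (akey INTEGER, param TEXT)")
cur.execute("CREATE TABLE b (bkey INTEGER, param TEXT)")
cur.execute("CREATE INDEX ix_b_key ON b (bkey)")

# Prefix EXPLAIN QUERY PLAN to a query to see how the engine will run it,
# e.g. a full scan of one table and an index search on the other.
cur.execute("""EXPLAIN QUERY PLAN
               SELECT * FROM a JOIN b ON a.akey = b.bkey
               WHERE b.param = 'VALUE'""")
for row in cur.fetchall():
    print(row[3])  # the human-readable plan step, one per table accessed
```

The exact wording of the plan steps varies by sqlite version, but there is one step per table accessed.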

Optimizing a SQL join statement

I am joining two tables. The first contains work orders and their associated part numbers. The second contains the BOM for all of the part numbers. They are both large tables. Individually, I can query the two tables in seconds if not less. When I perform the join, it takes minutes. Is it possible that the WHERE at the end of this statement is being performed after the join? If the join is performed first, I could see this taking a long time. But if the first table is reduced by the WHERE first, I would think this should go fast. Is there some way to write a more optimized query?
SELECT Table2.ItemNum As ItemNum
FROM Table1
INNER Join Table2
ON Table1.PartNum = Table2.PartNum
WHERE Table1.WorkOrder = 10100314
This will do a better job:
SELECT Table2.ItemNum AS ItemNum
FROM Table2
INNER JOIN
(
SELECT *
FROM Table1
WHERE Table1.WorkOrder = 10100314
) AS Table1
ON Table1.PartNum = Table2.PartNum
Indexes on the PartNum fields are required too.
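A sqlite3 sketch with invented data, showing that the two forms return the same rows, along with the suggested indexes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (WorkOrder INTEGER, PartNum INTEGER)")
cur.execute("CREATE TABLE Table2 (PartNum INTEGER, ItemNum INTEGER)")
cur.executemany("INSERT INTO Table1 VALUES (?,?)",
                [(10100314, 100), (10100314, 101), (99999999, 102)])
cur.executemany("INSERT INTO Table2 VALUES (?,?)",
                [(100, 1), (101, 2), (102, 3)])

# The indexes suggested in the answer: one on each PartNum column.
cur.execute("CREATE INDEX ix_t1_partnum ON Table1 (PartNum)")
cur.execute("CREATE INDEX ix_t2_partnum ON Table2 (PartNum)")

# Plain join with the filter in WHERE ...
cur.execute("""SELECT Table2.ItemNum FROM Table1
               JOIN Table2 ON Table1.PartNum = Table2.PartNum
               WHERE Table1.WorkOrder = 10100314""")
plain = sorted(cur.fetchall())

# ... versus filtering Table1 in a derived table before joining.
cur.execute("""SELECT Table2.ItemNum FROM Table2
               JOIN (SELECT * FROM Table1 WHERE WorkOrder = 10100314) AS t1
               ON t1.PartNum = Table2.PartNum""")
prefiltered = sorted(cur.fetchall())

print(plain == prefiltered, plain)  # True [(1,), (2,)]
```

Whether the derived-table form is actually faster depends on the optimizer; on many engines the two produce identical plans, so checking the execution plan is the way to be sure.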