Oracle 11g Performance of Join

I was doing some random testing on Oracle 11g and noticed a strange performance difference between different JOINs, using SQL Developer.
I inserted 200,000 records into some_random_table and about 300 into unrelated_table, then ran each of the queries below 10 times; the times stated are averages.
The two tables are totally unrelated, so all four queries should give the same result, and indeed the row counts are the same.
1. SELECT * FROM some_random_table t1 LEFT JOIN unrelated_table t2 ON 1=1;
~0.005 seconds to fetch the first 50 rows.
2. SELECT * FROM some_random_table t1 RIGHT JOIN unrelated_table t2 ON 1=1;
>0.05 seconds to fetch the first 50 rows.
3. SELECT * FROM some_random_table t1 FULL JOIN unrelated_table t2 ON 1=1;
~0.005 seconds to fetch the first 50 rows.
4. SELECT * FROM some_random_table t1 CROSS JOIN unrelated_table t2;
>0.05 seconds to fetch the first 50 rows.
Can anyone explain the difference between these queries? Why are some faster and some slower by an order of magnitude?
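One way to start investigating is to compare the execution plans; this is a minimal sketch using standard Oracle syntax, assuming the tables from the question:

-- Ask the optimizer to record its plan for one of the queries
EXPLAIN PLAN FOR
SELECT * FROM some_random_table t1 LEFT JOIN unrelated_table t2 ON 1=1;

-- Display the recorded plan: join method and which table drives the join
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Repeating this for each of the four queries shows which table the optimizer picks as the driving row source; a difference there would plausibly explain the difference in time-to-first-rows.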

Related

Is there a better way to prioritize a subquery instead of using TOP?

Currently we are using SQL Server 2019, and from time to time we tend to use TOP (max INT) to prioritize the execution of a subquery.
The main reason to do this is to make the starting result set as small as possible and thus avoid excessive reads when joining with other tables.
The most common scenario in which it helps:
t1: the main table we are querying; it has about 200k rows
t2, t3: just some other tables with at most 5k rows each
pres: a view with basically all the fields we use for presentation of e.g. a product, with about 30 JOINs, also containing table t1 + LanguageID
SELECT t1.Id, "+30 Fields from tables t1,t2,t3, pres"
FROM t1
INNER JOIN pres ON pres.LanguageId=1 AND t1.Id=pres.Id
INNER JOIN t2 ON t1.vtype=t2.Id
LEFT JOIN t3 ON t1.color=t3.Id
WHERE 1=1
AND t1.f1=0
AND t1.f2<>76
AND t1.f3=2
we only expect about 300 rows, but it takes about 12 seconds to run
SELECT t.Id, "10 Fields from tables t1,t2,t3 + 20 fields from pres"
FROM (
SELECT TOP 9223372036854775807 t1.Id, "about 10 fields from table t1,t2,t3"
FROM t1
INNER JOIN t2 ON t1.vtype=t2.Id
LEFT JOIN t3 ON t1.color=t3.Id
WHERE 1=1
AND t1.f1=0
AND t1.f2<>76
AND t1.f3=2
) t
INNER JOIN pres ON pres.LanguageId=1 AND t.Id=pres.Id
we only expect about 300 rows, but it takes about 2 seconds to run
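An alternative that avoids the TOP trick entirely, as a sketch (assuming SQL Server 2019 and the tables from the query above; #filtered and its column list are illustrative): materialize the small pre-filtered set into a temp table, which the optimizer cannot inline back into the join against the wide view.

-- Step 1: materialize the ~300 pre-filtered rows
SELECT t1.Id, t1.vtype, t1.color
INTO #filtered
FROM t1
INNER JOIN t2 ON t1.vtype=t2.Id
LEFT JOIN t3 ON t1.color=t3.Id
WHERE t1.f1=0
AND t1.f2<>76
AND t1.f3=2;

-- Step 2: join only those rows to the wide presentation view
SELECT f.Id, pres.*
FROM #filtered f
INNER JOIN pres ON pres.LanguageId=1 AND f.Id=pres.Id;

DROP TABLE #filtered;

A temp table also gets its own statistics, which can help the optimizer size the join against pres correctly.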

Teradata optimizer wrongly estimates row count when accessing a table through a view with UNION

Let's say I have three tables: t1 (a fact table with about 1 billion rows), t2 (an empty table, 0 rows), and t0 (a dimension table); all of them have properly collected statistics. In addition, there is a view v0:
REPLACE VIEW v0
AS SELECT * from t1
union
SELECT * from t2;
Let's look at these three queries:
1) Select * from t1 inner join t0 on t1.id = t0.id; -- Optimizer correctly estimates 1 bln rows
2) Select * from t2 inner join t0 on t2.id = t0.id; -- Optimizer correctly estimates 0 rows
3) Select * from v0 inner join t0 on v0.id = t0.id; -- Optimizer locks t1 and t2 for read, then correctly estimates that it will get 1 bln rows from t1, but for no clear reason estimates the same 1 bln from table t2.
What is going on here? Is it a bug or a feature?
PS: The original query, which is too big to show here, hadn't finished after 35 minutes. After leaving just t1, it finished successfully in 15 minutes.
TD Release: 15.10.03.07
TD Version: 15.10.03.09
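For reference, the estimates described above can be read from Teradata's EXPLAIN output (standard Teradata syntax, using the view and tables from the question):

-- EXPLAIN prints the step-by-step plan, including the estimated number
-- of rows placed into spool at each step
explain select * from v0 inner join t0 on v0.id = t0.id;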
It's not the same number for the 2nd SELECT; it's the overall number of rows in spool after the 2nd SELECT, which is 1 billion plus 0.
And your query was running slowly because you used UNION, which defaults to DISTINCT; running that on a billion rows is really expensive.
Better to switch to UNION ALL instead.
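Applied to the view from the question, that fix is just:

REPLACE VIEW v0
AS SELECT * from t1
union all
SELECT * from t2;

UNION ALL skips the duplicate-elimination (DISTINCT) step, so the billion rows from t1 flow through without an expensive sort.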

SQL SUM and COUNT functions with INNER JOIN

select *
from Table1 t1
inner join Table2 t2 on t1.id=t2.tid
returns 102 rows
select sum(t1.val), count(t1.val)
from Table1 t1
inner join Table2 t2 on t1.id=t2.tid
returns 29000 103
That means the second query doesn't work correctly. What's the problem?
Looks like one of your 103 rows has NULL in its val column.
select sum(t1.val), count(*)
from Table1 t1
inner join Table2 t2 on t1.id=t2.tid
This should return 103 for the count, at least in MS SQL Server. But I think it's part of ANSI SQL, so it should work in all ANSI-compliant database engines.
As you haven't specified a DBMS, I'm answering based on standard SQL, as you tagged it. Anyway, this should apply to any DBMS.
You have two different queries that share the same join. The join will generate the same number of results in both cases. It is clear from the first one that there are 102 results after the join.
If you then want to count those rows, there is no way that you can get more rows than there really are. What can happen is that you get fewer, because the count(field) aggregate function counts only non-null values of field.
However, you stated that you got more, and that is simply not possible.
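A minimal, self-contained illustration of that last point (the demo table is hypothetical, not from the question):

create table demo (val int);
insert into demo values (1), (null), (3);

-- count(*) counts rows; count(val) counts only non-null values
select count(*) as all_rows, count(val) as non_null_vals from demo;
-- returns: all_rows = 3, non_null_vals = 2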

LIMIT and JOIN order of actions

I have a query which includes a LIMIT on the main table and a JOIN.
My question is: which happens first? Does the query find the x rows of the LIMIT and then JOIN to those rows, or does it do the JOIN on all the rows first and apply the LIMIT only after that?
LIMIT applies to the query level at which it appears. It is applied AFTER the JOINs in that query, but if the derived table is then JOINed to other tables, that/those JOIN(s) come after.
e.g.
SELECT ..
FROM (SELECT ..
FROM TABLE1 T1
JOIN TABLE2 T2 ON ..
LIMIT 10) X
JOIN OTHERTABLE Y ON ..
LIMIT 20;
The JOIN between T1 and T2 occurs first
LIMIT 10 is applied to result from the previous step, so only 10 records from this derived table will be used in the outer query
LIMIT 20 is applied to the result of the JOIN between X and Y
Although LIMIT is a keyword specific to PostgreSQL, MySQL, and SQLite, the TOP keyword in SQL Server is processed the same way.
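A sketch of the same nesting with SQL Server's TOP (the tables orders, order_lines, and customers are hypothetical):

SELECT TOP 20 x.order_id, y.region
FROM (SELECT TOP 10 o.order_id, o.customer_id
FROM orders o
JOIN order_lines l ON l.order_id = o.order_id) x
JOIN customers y ON y.customer_id = x.customer_id;

As with LIMIT, the inner TOP 10 is applied after the orders/order_lines JOIN, and the outer TOP 20 after the join to customers. Without an ORDER BY, which rows you get is not guaranteed.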
So, to answer directly: the JOIN is done first on all the rows, and the LIMIT is applied only after that.

SQL query to limit number of rows having distinct values

Is there a way in SQL to use a query that is equivalent to the following:
select * from table1, table2 where some_join_condition
and some_other_condition and count(distinct(table1.id)) < some_number;
Let us say table1 is an employee table. Then a join will cause data about a single employee to be spread across multiple rows. I want to limit the number of distinct employees returned to some number. A condition on row number or something similar will not be sufficient in this case.
So what is the best way to get the same output as intended by the above query?
select *
from (select * from employee where rownum < some_number and some_id_filter), table2
where some_join_condition and some_other_condition;
This will work for nearly all DBs
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON some_join_condition
AND some_other_condition
INNER JOIN (
SELECT t1.id
FROM table1 t1
GROUP BY t1.id
HAVING count(t1.id) > someNumber
) ids ON t1.id = ids.id
Some DBs have special syntax to make this a little bit easier.
I may not have a full understanding of what you're trying to accomplish, but let's say you're trying to get it down to one row per employee. If each join is causing multiple rows per employee, and grouping by employee name and other fields is still not unique enough to get down to a single row, then you can try ranking and partitioning, and then select the rank you prefer within each employee partition (see the sketch after the link below).
See example : http://msdn.microsoft.com/en-us/library/ms176102.aspx
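A minimal sketch of that ranking approach (the table and column names beyond table1 and id are hypothetical):

SELECT *
FROM (
SELECT t1.id, t2.detail,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.detail) AS rn
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.emp_id
) ranked
WHERE rn = 1;

PARTITION BY restarts the numbering for each employee id, so keeping only rn = 1 yields exactly one row per employee.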