Joining large tables in Oracle SQL

I want to join three large tables in Oracle. TableA has 370 million rows, TableB has 370 million rows, and TableM has 600,000 rows. TableM is the master table of the other two tables, TableA and TableB.
My query looks like this:
SELECT A.MasterId, B.Date1
FROM TableA A
INNER JOIN TableB B ON B.MasterId = A.MasterId
INNER JOIN TableM M ON M.MasterId = A.MasterId
When I execute the above query, it takes a very long time. I wanted to split the execution with a WHERE clause, taking five years of data at a time. We have 25 years of data in total, so I can execute the query below five times and insert the results into a temp table.
My approaches were:
Approach 1:
Using the UNION ALL operator, I can combine the result sets and insert the values into a temp table. It took too long.
SELECT A.MasterId, B.Date1
FROM TableA A
INNER JOIN TableB B ON B.MasterId = A.MasterId
INNER JOIN TableM M ON M.MasterId = A.MasterId
WHERE M.Date >= DATE '1985-01-01' AND M.Date < DATE '1990-01-01'
UNION ALL
SELECT A.MasterId, B.Date1
FROM TableA A
INNER JOIN TableB B ON B.MasterId = A.MasterId
INNER JOIN TableM M ON M.MasterId = A.MasterId
WHERE M.Date >= DATE '1990-01-01' AND M.Date < DATE '1995-01-01'
.....
Approach 2:
I tried to insert the five years of data into a temp table using BULK COLLECT, but it failed.
Is there any other way to handle this problem?

A full join over these 3 tables would result in up to 8.214E+22 records (370 million × 370 million × 600,000), which is an unwieldy dataset, and that is also why it takes so long.
What would be the use of such a select?
For the insert, use a simple INSERT INTO ... SELECT ... FROM ...
Performance should be much better than using PL/SQL with BULK COLLECT.
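A minimal sketch of that set-based approach, assuming a pre-created target table named Temp_Result (that name, and the APPEND direct-path hint, are illustrative, not from the question):
-- One set-based, direct-path insert instead of PL/SQL loops with BULK COLLECT.
INSERT /*+ APPEND */ INTO Temp_Result (MasterId, Date1)
SELECT A.MasterId, B.Date1
FROM TableA A
INNER JOIN TableB B ON B.MasterId = A.MasterId
INNER JOIN TableM M ON M.MasterId = A.MasterId;
COMMIT;
Note that after a direct-path insert, Oracle requires a commit before the same session can query the table again.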

Related

I need help joining these two datasets to produce the results on this exercise

I'm trying to join table 1 and table 2 to create table 3 using SQL. How would I be able to do this?
You need a join, a count, and a group by:
SELECT a.region, b.partnerType, COUNT(*)
FROM table1 a
INNER JOIN table2 b ON a.DistrectID = b.DistrectID
GROUP BY a.region, b.partnerType

PostgreSQL: why is INNER JOIN so much slower than WHERE?

I have 2 tables where I copy a file name from one table to another in an update operation. Using an INNER JOIN makes the query run in 22 seconds when there are just ~4000 rows. Using a WHERE clause lets it run in about 200 milliseconds. How and why is this happening? Does the INNER JOIN result in additional looping?
Example 1, using INNER JOIN - takes 22 seconds when table_a has about 4k records.
UPDATE table_a SET file_name = tmp.file_name
FROM (
    SELECT b.customer_id, b.file_name, b.file_id
    FROM table_b AS b
    WHERE b.status = 'A'
) tmp
INNER JOIN table_a AS a
    ON tmp.customer_id = a.customer_id AND tmp.file_id = a.file_id;
Example 2, using WHERE, runs in about 200 ms.
UPDATE table_a AS a SET file_name = tmp.file_name
FROM (
    SELECT b.customer_id, b.file_name, b.file_id
    FROM table_b AS b
    WHERE b.status = 'A'
) tmp
WHERE tmp.customer_id = a.customer_id AND tmp.file_id = a.file_id;
The queries are doing totally different things. The first is updating every row in table_a with the expression; there may even be multiple updates of the same row.
The two references to table_a in the first version are two different instances of the table. The effect is a cross join, because you have no condition combining them.
The second method is the correct syntax for what you want to do in Postgres.
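For comparison, a minimal sketch of the same update without the derived table (using the same table_a/table_b columns as above):
-- table_b is joined directly in the FROM list; the WHERE clause ties it
-- to the table_a row being updated, so no cross join occurs.
UPDATE table_a AS a
SET file_name = b.file_name
FROM table_b AS b
WHERE b.status = 'A'
  AND b.customer_id = a.customer_id
  AND b.file_id = a.file_id;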

Refactor SQL statement with lots of common INNER JOINS and columns

I need suggestions on how to refactor the following SQL expression. As you can see, all the selected columns except col_N are the same, and all the inner joins except the last one in the 2 sub-queries are the same. This is just a snippet of my code, so I am not including the WHERE clause I have in my query. FYI, this is part of a stored procedure used by an SSRS report, and performance is critical for me because there are thousands of records:
SELECT col_A
, col_B, col_C,...
, '' AS [col_N]
FROM table_A
INNER JOIN table_B
INNER JOIN table_C
INNER JOIN table_D1
UNION
SELECT col_A
, col_B, col_C,...
, (select E.field_2 from table_E AS E where D2.field_1 = E.field_1 AND A.field_1 = E.field_2) AS [col_N]
FROM table_A as A
INNER JOIN table_B
INNER JOIN table_C
INNER JOIN table_D2 as D2
Jean's first suggestion of creating a view by joining A, B and C worked. I created a temp table by joining ABC and then used it to achieve a significant performance improvement (query time was cut in half for a couple of thousand records)!
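A minimal sketch of that temp-table refactor in T-SQL (the #abc name and all join keys are placeholders, since the original snippet elides the ON clauses):
-- Materialize the shared A-B-C join once; the join keys below are
-- placeholders because the question omits them.
SELECT A.col_A, A.col_B, A.col_C, A.field_1
INTO #abc
FROM table_A AS A
INNER JOIN table_B AS B ON B.a_id = A.a_id
INNER JOIN table_C AS C ON C.b_id = B.b_id;

-- Reuse the temp table in both halves of the UNION.
SELECT x.col_A, x.col_B, x.col_C, '' AS [col_N]
FROM #abc AS x
INNER JOIN table_D1 AS D1 ON D1.key_1 = x.field_1
UNION
SELECT x.col_A, x.col_B, x.col_C,
       (SELECT E.field_2
        FROM table_E AS E
        WHERE D2.field_1 = E.field_1
          AND x.field_1 = E.field_2) AS [col_N]
FROM #abc AS x
INNER JOIN table_D2 AS D2 ON D2.key_1 = x.field_1;
Indexing #abc on the join key can help further if the temp table is large.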

left join on MS SQL 2008 R2

I'm trying to left join two tables. Table A contains 100 unique records with field_a_1, field_a_2, field_a_3. The combination of field_a_1 and field_a_2 is unique.
Table B has multi-million records with multiple fields. field_b_1 is the same as field_a_1, and field_b_2 is the same as field_a_2.
I join the two tables together like this:
select a.*, b.*
from a
left join b
on field_a_1 = field_b_1
and field_a_2 = field_b_2
Instead of getting 100 records, I get millions of records. Why is this?
Because table B has multiple rows for each table A entry.
For example:
TableA (ID)
1
2
3
TableB (ID, data)
1 hello
1 world
1 foo
1 bar
2 data
2 words
2 more
3 words
3 boring
If you left join from TableA to TableB, you will get a row for every TableB record that matches a TableA record - i.e. all of them.
Can you explain what results you are looking for?
Because a left join returns all of the rows from the first table + all of the matching rows from the second table. Which of the millions of matching rows did you expect to get?
Left join or inner join doesn't really make a difference here. A JOIN will return all rows that match the join condition, so if table b has millions of rows that match the JOIN criteria, then all of those rows will be returned.
Depending on what you wish to accomplish you should consider using the DISTINCT keyword or GROUP BY to perform aggregate functions.
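A minimal sketch of the GROUP BY route, assuming you only need one aggregated value per Table A row (the COUNT here is just an example aggregate):
-- One output row per table A record; matching table B rows are
-- collapsed into an aggregate instead of multiplying the result.
SELECT a.field_a_1, a.field_a_2, a.field_a_3,
       COUNT(b.field_b_1) AS b_match_count
FROM a
LEFT JOIN b
    ON a.field_a_1 = b.field_b_1
   AND a.field_a_2 = b.field_b_2
GROUP BY a.field_a_1, a.field_a_2, a.field_a_3;
COUNT(b.field_b_1) counts only non-NULL matches, so unmatched Table A rows still appear with a count of 0.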

Should I use a temp table?

I have a report query that is taking 4 minutes, well over the maximum 30 seconds allowed limit applied to us.
I notice that it has a LOT of INNER JOINs. One, I see, joins to a Person table, which has millions of rows. I'm wondering if it would be more efficient to break up the query. Would it be more efficient to do something like the following?
Assume all keys are indexed.
TableC has 8 million records, TableB has 6 million records, and TableA has 400,000 records.
SELECT Fields
FROM TableA A
INNER JOIN TableB B
ON b.key = a.key
INNER JOIN TableC C
ON C.key = b.CKey
WHERE A.id = AnInput
Or
SELECT *
INTO TempTableC
FROM TableC
WHERE id = AnInput
-- TempTableC now has 1000 records
Then
SELECT Fields
FROM TableA A
INNER JOIN TableB B --Maybe put this into a filtered temp table?
ON b.key = a.key
INNER JOIN TempTableC c
ON c.AField = b.aField
WHERE a.id = AnInput
Basically, bring the result sets into temp tables, then join.
If your Person table is indexed correctly, then the INNER JOIN should not be causing such a problem. Check that you have an index on the column(s) being joined in all your tables. Using temp tables for what appears to be a relatively simple query seems to be papering over the cracks of an inadequate database design.
As others have said, the only way to be sure is to post your query plan.
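For instance, a sketch of the kind of supporting indexes that answer has in mind (the table and column names mirror the question's placeholders; [key] is bracketed because KEY is a reserved word in T-SQL):
-- Hypothetical indexes covering the filter and join columns above.
CREATE INDEX IX_TableA_id   ON TableA (id);     -- WHERE a.id = AnInput
CREATE INDEX IX_TableB_key  ON TableB ([key]);  -- ON b.key = a.key
CREATE INDEX IX_TableB_CKey ON TableB (CKey);   -- join column to TableC
CREATE INDEX IX_TableC_key  ON TableC ([key]);  -- ON C.key = b.CKey
To see the plan itself, run the query with the actual execution plan enabled (or SET STATISTICS IO ON) and look for scans on the large tables.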