Joining two tables (with a many to one relationship) taking long - sql

I have two tables. Table 1 has a column X, and Table 2 has a column that relates to column X.
In Table 1 the entries of column X are repeated (i.e. the entries can be {1,1,2,3,5,4,4,4,9}).
In Table 2 the entries of that column are not repeated, but all the distinct entries from Table 1 appear.
So there is a one-to-many relationship.
Now I want to join the two tables (as seen in the code below) and the performance is extremely slow!
Any ideas?
CTE_DE_Normalised AS
(
SELECT *
FROM Table1 a
JOIN Table2 b
ON a.Id = b.Table1Id
)

Related

Almost equal table with different running time

I’m using Oracle. I have two tables, A and B.
Table A has 8000 rows and 5 columns.
Table B has 5500 rows and the same 5 columns.
All 5500 rows in table B are also contained in table A, and they are identical.
I have a query like:
With t1 as (select distinct(id) from table A/B)
,T2 as (select a.id, c.value, d.value from Table A/B a
Join table c c on “conditions”
Join table d d on “conditions”
) select * from t2
So the query with table A works excellently, but with table B it hangs indefinitely.
Data types and other properties are the same in table A and table B.
Where should I look for the problem?
I compared the explain plans, but the only difference is the row “PX PARTITION HASH JOIN-FILTER” for table A versus “PX BLOCK ITERATOR ADAPTIVE” for table B.

Optimising multiple join in hive

I have four Hive tables:
A - 1.2 billion records and 250 GB
B - 4 billion records and 1 TB
C - 30 billion records and 2 TB
D - 2 billion records and 100 GB
None of the tables are partitioned.
A is the parent of B (one-to-many foreign key relation), B is the parent of C (one-to-many foreign key relation), and C is the parent of D (one-to-many foreign key relation).
Now I have to join these tables. What would be the best approach?
I need to create a table E with columns from A, B, C and D; duplicate values in the columns of A, B and C are OK.
The tables are rather big, and a map join is not an option in this case.
If there is one A to many B, one B to many C, and one C to many D, and you join them all at once, then such a join obviously multiplies rows hugely.
This is quite normal join behavior. Say A has 10 keys and B has 100 rows for each key in A; then after joining them there will be 10 x 100 = 1000 rows (if the join key in A is unique), and even more if the join key in A is not unique. This results in a huge dataset on the join reducer.
I suppose your final goal is to aggregate rows. In that case the best approach would be to pre-aggregate rows to the required grain and join the aggregated datasets:
select A.*, B.* --aggregate here if necessary
from
(select <some aggregation here > from A group by <key> ) A
join
(select <some aggregation here > from B group by <key> ) B
on A.key=B.key
and so on...
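As a concrete instance of the template above, here is a minimal sketch for the A-to-B step only. The column names (a_id, name, amount) are made up for illustration and do not come from the question. The point is that B is collapsed to one row per parent key before it meets A, so the join no longer multiplies rows.

```sql
-- Hypothetical columns: A(a_id, name), B(b_id, a_id, amount).
SELECT a.a_id,
       a.name,
       b_agg.b_rows,
       b_agg.total_amount
FROM A a
JOIN (
    -- Collapse B to one row per parent key before joining.
    SELECT a_id,
           COUNT(*)    AS b_rows,
           SUM(amount) AS total_amount
    FROM B
    GROUP BY a_id
) b_agg
  ON a.a_id = b_agg.a_id;
```

If the goal really is a fully denormalised table E with every detail row kept, pre-aggregation does not apply and the row multiplication is unavoidable.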
Not sure if it is the best approach, but
I created intermediate partitioned tables for all the tables, partitioned on a common column.
Then I ran the join query incrementally, one partition at a time.
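The partition-by-partition run described above could look roughly like this in Hive. The table names a_part, b_part and e, the partition column part_key, and the remaining columns are all hypothetical. Each statement handles a single partition value and would be repeated, for example from a driver script, for every value.

```sql
-- Hypothetical layout: a_part, b_part and e are all partitioned by part_key.
-- One run per partition value; only rows with part_key = 0 are read and written here.
INSERT OVERWRITE TABLE e PARTITION (part_key = 0)
SELECT a.a_id,
       a.a_col,
       b.b_id,
       b.b_col
FROM a_part a
JOIN b_part b
  ON a.a_id = b.a_id
WHERE a.part_key = 0
  AND b.part_key = 0;
```

This only gives complete results if matching parent and child rows always fall into the same partition value, for instance when part_key is derived from the join key.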

Comparing value in a table vs count of rows in other table

I have two tables, A (primary key - unit_id) and B (primary key - unit_id).
Table A holds a value (e.g. 4) against a unit_id.
Table B has 4 rows with the same unit_id.
I have to write a SQL query to check whether the value in table A matches the count of rows in table B with the same unit_id.
You can just use an inner join and you will see how many values from table A are in table B:
Select a.unit_id from Table1 a inner join Table2 b on a.unit_id = b.unit_id
I'm assuming that is what you need, because as @StanislavL pointed out, you can't have the same unit_id more than once in either table.
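If unit_id is in fact not unique in table B (otherwise the count could never exceed 1), the comparison the question asks for can be sketched as below. The column name expected_count for the value stored in table A is an assumption, since the question does not name it.

```sql
-- Assumes A(unit_id, expected_count) and that B may hold several rows per unit_id.
SELECT a.unit_id,
       a.expected_count,
       COUNT(b.unit_id) AS actual_count,
       CASE WHEN a.expected_count = COUNT(b.unit_id)
            THEN 'match'
            ELSE 'mismatch'
       END AS status
FROM A a
LEFT JOIN B b
  ON b.unit_id = a.unit_id
GROUP BY a.unit_id, a.expected_count;
```

The LEFT JOIN keeps units that have no rows in B at all; COUNT(b.unit_id) ignores the NULLs it produces, so those units report an actual_count of 0.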

left join on MS SQL 2008 R2

I'm trying to left join two tables. Table A contains 100 unique records with field_a_1, field_a_2, field_a_3. The combination of field_a_1 and field_a_2 is unique.
Table B has millions of records with multiple fields. field_b_1 corresponds to field_a_1 and field_b_2 corresponds to field_a_2.
I join the two tables together like this:
select a.*, b.*
from a
left join b
on field_a_1 = field_b_1
and field_a_2 = field_b_2
Instead of getting 100 records, I get millions of records. Why is this?
Because table B has multiple rows for each table A entry.
For example:
TableA (ID)
1
2
3
TableB (ID, data)
1 hello
1 world
1 foo
1 bar
2 data
2 words
2 more
3 words
3 boring
If you left join from TableA to TableB, you will get a row for every TableB record that matches a TableA record, i.e. all of them.
Can you explain what results you are looking for?
Because a left join returns all of the rows from the first table + all of the matching rows from the second table. Which of the millions of matching rows did you expect to get?
Left join or inner join don't really make a difference. A JOIN will return all rows that match the join condition. So if table b has millions of rows that match the JOIN criteria, then all the rows will be returned.
Depending on what you wish to accomplish, you should consider using the DISTINCT keyword, or GROUP BY together with aggregate functions.
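Using the small TableA / TableB example from the earlier answer, a GROUP BY version that returns exactly one row per TableA entry might look like this. It is only a sketch of the aggregation idea; which aggregate is appropriate depends on what the asker actually wants from table B.

```sql
-- One output row per TableA.ID, with the number of matching TableB rows.
SELECT a.ID,
       COUNT(b.ID) AS matching_rows
FROM TableA a
LEFT JOIN TableB b
  ON b.ID = a.ID
GROUP BY a.ID;
-- With the sample data above: ID 1 has 4 matches, ID 2 has 3, ID 3 has 2,
-- whereas the plain join returns all 9 rows.
```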

Sql Query To read from 4 tables

I have 3 tables:
table A, table B and a relational table C.
Table A has ta_id as primary key along with other columns.
Table B has tb_id as primary key along with other columns.
Table C has ta_id and tb_id as foreign keys along with other columns.
I wanted to find all the rows of table B which have a common ta_id in table C.
My SQL query for that is:
SELECT B.ta_id,
B.type,
B.language,
B.user_id
FROM B
INNER JOIN C
ON B.tb_id=C.tb_id
where C.ta_id = 1
ORDER BY B.user_id
The above query seems to be working.
But now I have another table, D, with D.tb_id as a foreign key (referencing the primary key of table B).
Each row of table B has 0 or more associated rows in table D; put differently,
each row in table D has exactly one corresponding row in table B.
Now I want to list each row of table B together with all the associated rows of table D,
so it should look like this:
first row of table B
first corresponding row of table D
second corresponding row of table D
...
..
second row of table B
first corresponding row of table D
second corresponding row of table D
...
..
So in a way I am mixing the contents of the 2 tables in the display.
Please tell me how to achieve this using a SQL query.
Waiting for reply..!
Thanks
Big O
Just add another inner join like this:
SELECT B.ta_id, B.type, B.language, B.user_id
FROM B
INNER JOIN C
ON B.tb_id=C.tb_id
INNER JOIN D
ON B.tb_id=D.tb_id
WHERE C.ta_id = 1
ORDER BY B.user_id
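Note that the question allows a row of B to have 0 rows in D, and that the query above returns no D columns and drops B rows without a match in D. A hedged variation follows, assuming D has columns named td_id and detail (both hypothetical, since the question does not list D's columns):

```sql
-- LEFT JOIN keeps B rows that have no rows in D; their D columns come back as NULL.
SELECT B.tb_id,
       B.type,
       B.language,
       B.user_id,
       D.td_id,
       D.detail
FROM B
INNER JOIN C
  ON B.tb_id = C.tb_id
LEFT JOIN D
  ON B.tb_id = D.tb_id
WHERE C.ta_id = 1
ORDER BY B.user_id, B.tb_id, D.td_id;
```

Each B row is repeated once per associated D row, so the grouped display the question describes can be built in the application by starting a new group whenever tb_id changes.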
I believe you can easily use SQL views to query data from a lot of tables.
You cannot do this in one simple query; you need a loop. Think about what you are trying to do...
TABLE B ROW 1
TABLE D ROW 1 (Matching Row 1 Table B)
TABLE D ROW 2 (Matching Row 1 Table B)
TABLE D ROW 3 (Matching Row 1 Table B)
TABLE B ROW 2
TABLE D ROW 1 (Matching Row 2 Table B)
TABLE D ROW 2 (Matching Row 2 Table B)
TABLE D ROW 3 (Matching Row 2 Table B)
ETC...
ETC...
The only way you can do this is inside a stored procedure using temp tables and looping.