Is there a way to print all of the rows from two tables using full outer join?

Here there are two tables, Table A and Table B. I tried joining them with a full outer join to get all of the rows from both tables (the resultant table), but it isn't working; the screenshot at the end shows the error I get when I run the query. I want the output shown in the resultant table.
Here is the script that I used:
SELECT table_b.date,
table_b.student,
table_b.location,
table_b.sub_division,
table_a.part_time_pay,
table_b.days_worked
FROM table_a
FULL OUTER JOIN table_b
ON table_a.date = table_b.date
AND table_a.student = table_b.student;

It is doing exactly what you specify. Use coalesce() to combine values from the two tables:
SELECT COALESCE(a.date, b.date) as date,
COALESCE(a.student, b.student) as student,
b.location, b.sub_division,
a.part_time_pay, b.days_worked
FROM table_a a FULL JOIN
table_b b
ON a.date = b.date AND
a.student = b.student;
I'm not sure how you want to handle location and sub_division. What if they have different values in the two tables? You might want to put them in the JOIN conditions as well:
SELECT COALESCE(a.date, b.date) as date,
COALESCE(a.student, b.student) as student,
COALESCE(a.location, b.location) as location,
COALESCE(a.sub_division, b.sub_division) as sub_division,
a.part_time_pay, b.days_worked
FROM table_a a FULL JOIN
table_b b
ON a.date = b.date AND
a.student = b.student AND
a.location = b.location AND
a.sub_division = b.sub_division;
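To see why COALESCE() matters here, a minimal sketch using Python's built-in sqlite3 module may help. The rows are made up for illustration, and because older SQLite versions have no FULL JOIN, the sketch emulates one with a LEFT JOIN plus the unmatched rows of table_b; on a database with native FULL JOIN the queries above apply directly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (date TEXT, student TEXT, part_time_pay INTEGER);
    CREATE TABLE table_b (date TEXT, student TEXT, days_worked INTEGER);
    INSERT INTO table_a VALUES ('2021-01-01', 'ann', 100), ('2021-01-02', 'bob', 50);
    INSERT INTO table_b VALUES ('2021-01-01', 'ann', 3), ('2021-01-03', 'cat', 2);
""")

# Portable stand-in for FULL JOIN (an assumption of this sketch):
# every table_a row, matched or not, plus the table_b rows without a partner.
FULL_JOIN = """
    SELECT a.date AS a_date, a.student AS a_student, a.part_time_pay AS part_time_pay,
           b.date AS b_date, b.student AS b_student, b.days_worked AS days_worked
    FROM table_a a LEFT JOIN table_b b
      ON a.date = b.date AND a.student = b.student
    UNION ALL
    SELECT NULL, NULL, NULL, b.date, b.student, b.days_worked
    FROM table_b b
    WHERE NOT EXISTS (SELECT 1 FROM table_a a
                      WHERE a.date = b.date AND a.student = b.student)
"""

# Key columns taken from table_b only: rows that exist only in table_a lose their key.
one_sided = con.execute(f"""
    SELECT b_date, b_student, part_time_pay, days_worked
    FROM ({FULL_JOIN}) AS j
    ORDER BY COALESCE(a_date, b_date)
""").fetchall()

# Key columns taken from whichever side is present.
coalesced = con.execute(f"""
    SELECT COALESCE(a_date, b_date), COALESCE(a_student, b_student),
           part_time_pay, days_worked
    FROM ({FULL_JOIN}) AS j
    ORDER BY 1
""").fetchall()

print(one_sided)
print(coalesced)
```

In the first result the row for 'bob' comes back with NULL date and student, which is the symptom in the question; in the second every row carries its key.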

Related

Full outer join acts like inner join with multiple conditions on the two tables

I am trying to have a full outer join between two tables Table1 and Table2 on ID with a query like the following in Teradata. The problem is it acts like inner join.
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
WHERE a.country in ('US','FR')
AND a.create_date = '2021-01-01'
AND b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01';
What I want is something like this:
SELECT * FROM
(
SELECT * FROM Table1 as a
WHERE a.country in ('US','FR')
AND a.create_date = '2021-01-01'
) as ax
FULL OUTER JOIN
(
SELECT * FROM Table2 as b
WHERE b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01'
) as bx
ON ax.ID=bx.ID;
I feel like the second query is not best practice, maybe inefficient and/or hard to read in complicated cases. How can I modify the first query to get the desired output?
I know that this is a fundamental problem and probably there are many other ways to do it (e.g. with USING, HAVING etc) but could not find a basic explanation. Would appreciate a comprehensive answer on alternative solutions as a guide for future reference.
EDIT
The difference in my question to Left Join With Where Clause is that I require a condition in both tables. I cannot figure out where to put the second WHERE condition.
The short answer: Both sets of predicates belong in the ON clause.
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
AND a.country in ('US','FR')
AND a.create_date = '2021-01-01'
AND b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01';
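The ON clause both limits the rows that are eligible to be matched (pre-join filtering) and specifies how to match rows (join criteria). The WHERE clause filters results (after the join). Note that in an outer join a row that fails the ON predicates is not dropped: it is still returned, null-extended on the other side. So this is not strictly identical to filtering each table in a derived table first, as in your second query.
A generally less-desirable alternative would be to modify the predicates so as not to filter out the non-matching rows, e.g. assuming ID is NOT NULL in both tables: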
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
WHERE (a.country in ('US','FR')
AND a.create_date = '2021-01-01'
OR a.ID IS NULL)
AND (b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01'
OR b.ID IS NULL);
Logically, ON and WHERE are evaluated in the same order for an INNER JOIN, but in that case the net result is the same wherever the predicate sits (and many databases, including Teradata, will generate the same query plan for an INNER JOIN regardless of where you put the filter predicates).
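The ON-versus-WHERE difference can be reproduced on a tiny case. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than Teradata, a LEFT JOIN rather than a FULL OUTER JOIN (older SQLite versions lack the latter), and hypothetical tables t1 and t2 with made-up rows. The same reasoning applies to each side of a full outer join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER, country TEXT);
    CREATE TABLE t2 (id INTEGER, country TEXT);
    INSERT INTO t1 VALUES (1, 'US'), (2, 'FR'), (3, 'US');
    INSERT INTO t2 VALUES (1, 'US'), (2, 'JP');
""")

# Predicate on the outer-joined table in WHERE: applied after the join,
# so the null-extended rows fail it and the join behaves like an inner join.
in_where = con.execute("""
    SELECT a.id, b.id
    FROM t1 a LEFT JOIN t2 b ON a.id = b.id
    WHERE b.country IN ('US', 'DE')
    ORDER BY a.id
""").fetchall()

# Same predicate in ON: it only decides which t2 rows may match,
# so every t1 row is still returned.
in_on = con.execute("""
    SELECT a.id, b.id
    FROM t1 a LEFT JOIN t2 b
      ON a.id = b.id AND b.country IN ('US', 'DE')
    ORDER BY a.id
""").fetchall()

print(in_where)
print(in_on)
```

With the predicate in WHERE only the matched pair for id 1 survives; with it in ON, ids 2 and 3 are kept with a NULL on the t2 side.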

How to join/merge date column from different tables in SQL

I have 3 different CTEs (a, b, c), each with the columns date, id, and value. Only the value column differs between them; date and id hold the same values.
select a.date, a.id, a.value, b.date, b.id, b.value, c.date, c.id, c.value
from table_a a
full outer join table_b b on b.id = a.id
full outer join table_c c on c.id = b.id
The above code returns a separate date column for each table.
I want output where the date column is merged into a single column that acts as an index.
How do I get that output?
Use coalesce, as is typical when you work with a full join:
select
coalesce(a.date, b.date, c.date) as ddate,
a.id, a.value,
b.id, b.value,
c.id, c.value
from table_a a
full outer join table_b b on b.id = a.id
full outer join table_c c on c.id = b.id;

SQL function to create a one-to-one match between two tables?

I am trying to join 2 tables. Table_A has ~145k rows whereas Table_B has ~205k rows.
They have two columns in common (i.e. ISIN and date). However, when I execute this query:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.date = B.date
WHERE A.isin = B.isin
I get a table with more than 147k rows. How is it possible? Shouldn't it return a table with at most ~145k rows?
What you are seeing indicates that, for some of the records in Table_A, there are several records in Table_B that satisfy the join conditions (equality on the (date, isin) tuple).
To exhibit these records, you can do:
select B.date, B.isin
from Table_A A
join Table_B B on A.date = B.date and A.isin = B.isin
group by B.date, B.isin
having count(*) > 1
It's up to you to define how to handle those duplicates. For example:
if the duplicates have different values in column column_name, then you can decide to pull out the maximum or minimum value
or use another column to filter on the top or lower record within the duplicates
if the duplicates are true duplicates, then you can use select distinct in a subquery to dedup them before joining
... other solutions are possible ...
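How duplicate keys inflate the row count can be sketched on a toy case. This uses Python's built-in sqlite3 module with made-up rows; the table and column names follow the question, but the data is purely illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table_A (date TEXT, isin TEXT);
    CREATE TABLE Table_B (date TEXT, isin TEXT, column_name TEXT);
    INSERT INTO Table_A VALUES ('2021-01-01', 'X1'), ('2021-01-02', 'X2');
    -- ('2021-01-01', 'X1') appears twice in Table_B
    INSERT INTO Table_B VALUES ('2021-01-01', 'X1', 'p'),
                               ('2021-01-01', 'X1', 'q'),
                               ('2021-01-02', 'X2', 'r');
""")

# Each Table_A row is paired with every matching Table_B row,
# so 2 rows in Table_A produce 3 rows in the join.
joined_count = con.execute("""
    SELECT COUNT(*)
    FROM Table_A A
    JOIN Table_B B ON A.date = B.date AND A.isin = B.isin
""").fetchone()[0]

# The keys responsible for the extra rows.
duplicate_keys = con.execute("""
    SELECT B.date, B.isin, COUNT(*)
    FROM Table_B B
    GROUP BY B.date, B.isin
    HAVING COUNT(*) > 1
""").fetchall()

print(joined_count)
print(duplicate_keys)
```

The join returns more rows than Table_A holds as soon as a single (date, isin) pair is repeated in Table_B.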
If you want one row per table A, then use outer apply:
SELECT A.*,
B.column_name
FROM Table_A a OUTER APPLY
(SELECT TOP (1) b.*
FROM Table_B b
WHERE A.date = B.date AND A.isin = B.isin
ORDER BY ? -- you can specify *which* row you want when there are duplicates
) b;
OUTER APPLY implements a lateral join. The TOP (1) ensures that at most one row is returned. The OUTER (as opposed to CROSS) ensures that nothing is filtered out. In this case, you could also phrase it as a correlated subquery.
All that said, your data does not seem to be what you really expect. You should figure out where the duplicates are coming from. The place to start is:
select b.date, b.isin, count(*)
from Table_B b
group by b.date, b.isin
having count(*) >= 2;
This will show you the duplicates, so you can figure out what to do about them.
The possibility of duplicates has already been discussed.
When millions of records are joined, a poor cardinality estimate often leads to a slow query plan. Note that this affects performance, not which rows are returned.
For this, put both predicates in the join condition:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.isin = B.isin
and
A.date = B.date
Also create a nonclustered index on both tables:
CREATE NONCLUSTERED INDEX isin_date_table_A ON Table_A (isin, date) INCLUDE (*Table_A);
*Table_A = comma-separated list of the Table_A columns required in the result set
CREATE NONCLUSTERED INDEX isin_date_table_B ON Table_B (isin, date) INCLUDE (column_name);
UPDATE STATISTICS Table_A;
UPDATE STATISTICS Table_B;
If you keep the DATE columns of both tables in the same format in the JOIN condition, you should get the expected result:
Select A.*, B.column_name
from Table_A A
join Table_B B on to_date(A.date,'DD-MON-YY') = to_date(B.date,'DD-MON-YY')
where A.isin = B.isin

Using date condition in left outer join when date value is coming from different table

Table Date_User has one column RUNDATE1, which store date (example - 2016-01-01). This date can be changed based on user request.
I am writing below query and it is throwing error:
Select A.NAME, B.DEPT_NAME, B.START_DATE, B.END_DATE
FROM TABLE A LEFT OUTER JOIN
TABLE B
ON A.DEPT_ID = B.DEPT_ID AND
DATE_USER.RUNDATE1 BETWEEN B.START_DATE AND B.END_DATE;
Above query is throwing error, because table Date_User is nowhere used in left outer join.
Could anyone please suggest how to modify the query.
Note: This is only sample query; Original query has 10 left outer join with similar type of date condition needed.
If DATE_USER is neither TABLE A nor TABLE B, then you need an additional JOIN to it. It has to be joined before TABLE B so that B's ON clause can reference it.
Assuming DATE_USER holds a single row, you can try:
Select A.NAME, B.DEPT_NAME, B.START_DATE, B.END_DATE
FROM TABLE A CROSS JOIN
DATE_USER LEFT OUTER JOIN
TABLE B
ON A.DEPT_ID = B.DEPT_ID AND
DATE_USER.RUNDATE1 BETWEEN B.START_DATE AND B.END_DATE;

Join Tables on Date Range in Hive

I need to join tableA to tableB on employee_id, and the cal_date from tableA needs to be between date_start and date_end from tableB. I ran the query below and received the following error message. Would you please help me correct the query? Thank you for your help!
Both left and right aliases encountered in JOIN 'date_start'.
select a.*, b.skill_group
from tableA a
left join tableB b
on a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
Quoting the Hive LanguageManual on Joins:
Hive does not support join conditions that are not equality conditions
as it is very difficult to express such conditions as a map/reduce
job.
You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.
If your situation allows, do it in two queries.
First an inner join, which can carry the range condition in its WHERE clause; then an outer join back to that result, matching on all the key columns, with a WHERE clause that keeps only the rows where one of the joined fields is null.
Ex:
create table tableC as
select a.*, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end;
with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a
left join c
on (a.employee_id= c.employee_id
and a.cal_date = c.cal_date)
where c.employee_id is null ;
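The two-step idea above can be sketched as a single statement to check the logic on a small case. This is an illustration under stated assumptions: it runs on Python's built-in sqlite3 module rather than Hive, the rows are made up, and the CTE name matched is hypothetical. The range condition lives in WHERE after an equality join, and the unmatched tableA rows are added back with an anti-join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (employee_id INTEGER, cal_date TEXT);
    CREATE TABLE tableB (employee_id INTEGER, date_start TEXT, date_end TEXT,
                         skill_group TEXT);
    INSERT INTO tableA VALUES (1, '2021-01-05'), (1, '2021-03-01'), (2, '2021-01-05');
    INSERT INTO tableB VALUES (1, '2021-01-01', '2021-01-31', 'support');
""")

rows = con.execute("""
    WITH matched AS (
        -- Step 1: equality join, range check moved to WHERE
        SELECT a.employee_id, a.cal_date, b.skill_group
        FROM tableA a
        JOIN tableB b ON a.employee_id = b.employee_id
        WHERE a.cal_date >= b.date_start
          AND a.cal_date <= b.date_end
    )
    SELECT employee_id, cal_date, skill_group FROM matched
    UNION ALL
    -- Step 2: tableA rows that found no range, padded with NULL
    SELECT a.employee_id, a.cal_date, NULL
    FROM tableA a
    LEFT JOIN matched m
      ON a.employee_id = m.employee_id AND a.cal_date = m.cal_date
    WHERE m.employee_id IS NULL
    ORDER BY 1, 2
""").fetchall()

print(rows)
```

Every tableA row appears once: the one inside the range carries its skill_group, and the other two come back with NULL, which is what the left join with a range condition was meant to produce.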
MarkWusinich had a great solution, but with one major issue. If table A has the same employee_id twice within the date range, table C will also have that employee_id twice (more if B is not unique), creating 4 records after the join. So if A is not unique on employee_id, a GROUP BY is necessary. Corrected below:
with C as
(select a.employee_id, a.cal_date, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
-- cal_date is kept so the join below can match on it
group by a.employee_id, a.cal_date, b.skill_group
)
select a.*, c.skill_group
from tableA a
left join c
on a.employee_id = c.employee_id
and a.cal_date = c.cal_date;
Please note: if B was somehow intentionally not distinct on (employee_id, skill_group), then my query above would also have to be modified to reflect that.