How to drop one join key when joining two tables - sql

I have two tables, both with a lot of columns. They share a common column called ID, on which I want to join.
Since ID is present in both tables, if I simply do this
select a.*,b.*
from table_a as a
left join table_b as b on a.id=b.id
This gives an error because id is duplicated (it is present in both tables and gets included for each).
I don't want to write down separately each column of b in the select statement. I have lots of columns and that is a pain. Can I rename the ID column of b in the join statement itself similar to SAS data merge statements?
I am using Postgres.

Postgres would not give you an error for duplicate output column names, but some clients do. (Duplicate names are also not very useful.)
Either way, use the USING clause as join condition to fold the two join columns into one:
SELECT *
FROM tbl_a a
LEFT JOIN tbl_b b USING (id);
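To see the folding in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example runs without a server). The columns a_val and b_val and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical two-table setup; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE tbl_b (id INTEGER PRIMARY KEY, b_val TEXT);
    INSERT INTO tbl_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO tbl_b VALUES (1, 'b1');
""")


def column_names(sql):
    """Return the output column names of a query."""
    return [d[0] for d in conn.execute(sql).description]


# Joining with ON keeps both id columns in the output.
on_cols = column_names(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b ON a.id = b.id")

# Joining with USING folds the two id columns into one.
using_cols = column_names(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b USING (id)")

rows = conn.execute(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b USING (id) ORDER BY a.id"
).fetchall()

print(on_cols)
print(using_cols)
print(rows)
```

The ON variant reports id twice, while the USING variant reports it once, followed by the remaining columns of both tables.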
If you join a table to itself (self-join), every other column name is duplicated as well, so SELECT * makes hardly any sense there. It starts to make sense for different tables, as you stated in your question: "I have two tables ..."
To avoid all duplicate column names, you have to list them in the SELECT clause explicitly - possibly dealing out column aliases to get both instances with different names.
Or you can use a NATURAL join - if that fits your unexplained use case:
SELECT *
FROM tbl_a a
NATURAL LEFT JOIN tbl_b b;
This joins on all columns that share the same name and folds those automatically - exactly the same as listing all common column names in a USING clause. You need to be aware of rules for possible NULL values ...
Details in the manual.
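The point that NATURAL joins on every shared column name is easy to overlook. Here is a small sketch, again using Python's sqlite3 as a stand-in for Postgres; the extra shared column name and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Both tables share id AND name, so NATURAL joins on both.
    CREATE TABLE tbl_a (id INTEGER, name TEXT, a_val TEXT);
    CREATE TABLE tbl_b (id INTEGER, name TEXT, b_val TEXT);
    INSERT INTO tbl_a VALUES (1, 'x', 'a1'), (2, 'y', 'a2');
    INSERT INTO tbl_b VALUES (1, 'x', 'b1'), (2, 'other', 'b2');
""")

natural = conn.execute(
    "SELECT * FROM tbl_a a NATURAL LEFT JOIN tbl_b b ORDER BY a.id")
natural_cols = [d[0] for d in natural.description]
natural_rows = natural.fetchall()

# Same join written out with an explicit USING list.
using_rows = conn.execute(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b USING (id, name) ORDER BY a.id"
).fetchall()

print(natural_cols)
print(natural_rows)
print(using_rows)
```

Row 2 finds no match even though id 2 exists in both tables, because name differs. That is the kind of surprise to watch for before choosing NATURAL over an explicit USING list.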

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables so that duplicate records (matched on email) are removed, with the values from "Table 2" taking priority over any duplicates. I have considered FULL OUTER JOIN and UNION ALL, but UNION ALL would be unwieldy because each table has more than 1000 columns. I want to save the combined result as a view and use it as my full reference table, so that I don't have to add a union or something similar to my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
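-- Rows of table_1 whose email does not appear in table_2
WITH only_in_table_1 AS (
    SELECT *
    FROM table_1 A
    WHERE NOT EXISTS
        (SELECT * FROM table_2 B WHERE B.email_field = A.email_field)
)
-- All of table_2, plus the table_1 rows that table_2 does not replace
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1;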
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2
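As a quick check of the NOT EXISTS plus UNION ALL pattern, here is a sketch with two tiny tables of identical layout, run through Python's sqlite3 module. The info column and the sample emails are hypothetical; table_1, table_2 and email_field follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email_field TEXT, info TEXT);
    CREATE TABLE table_2 (email_field TEXT, info TEXT);
    INSERT INTO table_1 VALUES
        ('ann@example.com', 'old'),
        ('bob@example.com', 'only in 1');
    INSERT INTO table_2 VALUES
        ('ann@example.com', 'new'),
        ('cat@example.com', 'only in 2');
""")

combined = conn.execute("""
    WITH only_in_table_1 AS (
        SELECT *
        FROM table_1 a
        WHERE NOT EXISTS
            (SELECT 1 FROM table_2 b WHERE b.email_field = a.email_field)
    )
    SELECT * FROM table_2
    UNION ALL
    SELECT * FROM only_in_table_1
    ORDER BY email_field
""").fetchall()

for row in combined:
    print(row)
```

The shared email keeps only its table_2 version, and the rows unique to either table survive.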
Try a FULL OUTER JOIN between the two tables, then apply COALESCE to each result column to decide which table's column populates it.

Why does FULL JOIN order make a difference in these queries?

I'm using PostgreSQL. Everything I read here suggests that in a query using nothing but full joins on a single column, the order of tables joined basically doesn't matter.
My intuition says this should also go for multiple columns, so long as every common column is listed in the query where possible (that is, wherever both joined tables have the column in common). But this is not the case, and I'm trying to figure out why.
Simplified to three tables a, b, and c.
Columns in table a: id, name_a
Columns in table b: id, id_x
Columns in table c: id, id_x
This query:
SELECT *
FROM a
FULL JOIN b USING(id)
FULL JOIN c USING(id, id_x);
returns a different number of rows than this one:
SELECT *
FROM a
FULL JOIN c USING(id)
FULL JOIN b USING(id, id_x);
What I want/expect is hard to articulate, but basically, I'd like a "complete" full merger. I want no null fields anywhere unless that is unavoidable.
For example, whenever there is a not-null id, I want the corresponding name column to always have the name_a and not be null. Instead, one of those example queries returns semi-redundant results, with one row having a name_a but no id, and another having an id but no name_a, rather than a single merged row.
When the joins are listed in the other order, I do get that desired result (but I'm not sure what other problems might occur, because future data is unknown).
Your queries are different.
In the first, you are doing a full join to b using a single column, id.
In the second, you are doing a full join to b using two columns.
Although the two queries could return the same results under some circumstances, there is no reason to expect their results to be comparable.
Argument order matters in OUTER JOINs, except that NATURAL FULL JOIN is symmetric. They return what an INNER JOIN (ON, USING or NATURAL) does, plus the unmatched rows from the left (LEFT JOIN), right (RIGHT JOIN) or both (FULL JOIN) tables, extended by NULLs.
USING returns the single shared value for each specified column in INNER JOIN rows; in NULL-extended rows another common column can have NULL in one table's version and a value in the other's.
Join order matters too. Even NATURAL FULL JOIN is not associative, since with multiple tables each pair of tables (either operand being an original or a join result) can have a unique set of common columns, i.e. in general (A ⟗ B) ⟗ C ≠ A ⟗ (B ⟗ C).
There are a lot of special cases where certain additional identities hold. Eg FULL JOIN USING all common column names and OUTER JOIN ON equality of same-named columns are symmetric. Some cases involve CKs (candidate keys), FKs (foreign keys) and other constraints on arguments.
Your question doesn't make clear exactly what input conditions you are assuming or what output conditions you are seeking.

What does it mean to INNER JOIN before an INSERT?

I have the following case where I'm doing an insert into a table; however, before I can do that, I need to grab a foreign key ID that's associated with another table. That foreign key ID is not a simple lookup, but rather requires an INNER JOIN of two other tables to get it.
So, what I'm currently doing is the following:
1. Inner join A and B and grab the ID that I need.
2. Once I have resolved that value, insert into table C with the foreign key that I got from step 1.
Now, I was wondering if there is a better way for doing this. Could I do the join of table A and B and insert into table C all in one statement? This is where I was getting confused on what it means to INNER JOIN across tables and then INSERT. Are you potentially inserting into multiple tables?
You can use the insert-select syntax to insert the results of a query (which may or may not involve a join) to another table. E.g.:
INSERT INTO C
SELECT col_from_a, col_from_b
FROM a
JOIN b ON a.id = b.id
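To make the single-statement version concrete, here is a sketch run through Python's sqlite3 module. The schema is hypothetical: it assumes b references a through an a_id column and that c stores the looked-up b.id in a b_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(id),
                    label TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY,
                    b_id INTEGER REFERENCES b(id),
                    note TEXT);
    INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO b VALUES (10, 1, 'red'), (20, 2, 'red'), (30, 2, 'blue');
""")

# Look up the foreign key via the join and insert it in one statement.
conn.execute("""
    INSERT INTO c (b_id, note)
    SELECT b.id, ?
    FROM a
    JOIN b ON b.a_id = a.id
    WHERE a.name = ? AND b.label = ?
""", ("hello", "beta", "blue"))

c_rows = conn.execute("SELECT b_id, note FROM c").fetchall()
a_count = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
b_count = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]

print(c_rows)
print(a_count, b_count)
```

Only c receives a row. The join merely feeds the SELECT, so a and b are read but not modified.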

How does JOIN work exactly in SQL

I know that joins work by combining two or more tables by their attributes, so if you have two tables that both have three columns and both have column INDEX, if you use table1 JOIN table2 you will get a new table with 5 columns, but what if you do not have a column that is shared by both table1 and table2? Can you still use JOIN or do you have to use TIMES?
Join is not a method for combining tables. It is a method to select records (and selected fields) from 2 or more tables where every table in the query must carry a field that can be matched to a field in another table in the query. The matched fields need not have the same name, but must carry the same type of data. Lacking this would be like trying to create meaning from joining a list of license plates of cars in NYC, with height data from lumberjacks in Washington state -- not meaningful.
Example:
Select h.name, h.home_address, h.home_phone, w.work_address,
w.department
from home h, work w
where h.employee_id = w.emp_id
As long as both columns, employee_id and emp_id, carry the same information, this query will work.
In Microsoft Access, to get five columns from a three-column table joined to a two-column table, you'd use:
SELECT Table1.*, Table2.* FROM Table1 INNER JOIN Table2 ON Table1.Field1 = Table2.Field1;
You can query whatever you want, and join whatever you want, though.
If your one table is a list of people, and your other is a list of cars, and you want to see what people have names that are also models of cars, you can do:
SELECT Table1.Name, Table1.Age, Table2.Make, Table2.Year
FROM Table1 INNER JOIN Table2 ON Table1.Name = Table2.Model;
Only when Name is the same as Model will it show a record.
This is the same idea for joining tables in any relational DBMS I've used.
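The people-and-cars idea can be tried out with Python's sqlite3 module. The table names people and cars and the sample rows are hypothetical. The last query is a CROSS JOIN, which pairs every row with every row when there is no column to match on (presumably what the question calls TIMES).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, age INTEGER);
    CREATE TABLE cars (make TEXT, model TEXT, year INTEGER);
    INSERT INTO people VALUES ('Sierra', 30), ('John', 41);
    INSERT INTO cars VALUES ('GMC', 'Sierra', 2015), ('Renault', 'Clio', 2018);
""")

# The joined columns have different names; only the values must be comparable.
matched = conn.execute("""
    SELECT p.name, p.age, c.make, c.year
    FROM people p
    INNER JOIN cars c ON p.name = c.model
""").fetchall()

# With nothing to match on, a CROSS JOIN returns every combination of rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM people CROSS JOIN cars").fetchone()[0]

print(matched)
print(cross_count)
```

The inner join keeps only the person whose name equals a car model, while the cross join returns all 2 x 2 combinations.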
You are right: you can join two tables even if they do not have a shared column.
Joins commonly use primary keys and the foreign keys that reference them; those constraints prevent mistakes on insert or delete, such as a user inserting a record that has no parent record.
There are many types of join; you can view them here:
http://dev.mysql.com/doc/refman/5.7/en/join.html
LEFT JOIN: selects all records from the first table, plus the records from the second table that fulfil the condition after the ON clause.
You can't join the tables if they do not share a common column. If you can find a third table that has common columns with table1 and table2, you can join them that way: join table2 and table3 on a common column, then join table3 back to table1 on a common column.

Column in field list is ambiguous error

I've recently been working in MySQL, and in one of my queries I wrote:
SELECT SIGLE_EEP, ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a, mef_edi.envoi e, mef_edi.sous_module s
WHERE a.ID_EEP = e.ID_EEP
AND a.ID_SOUS_MODULE = s.ID_SOUS_MODULE;
and got this error:
Column ID_SOUS_MODULE in field list is ambiguous
What should I do?
More than one table has a column named ID_SOUS_MODULE.
So you need to name the table every time you mention the column to specify which table you mean.
Change
SELECT ID_SOUS_MODULE
for instance to
SELECT a.ID_SOUS_MODULE
I agree with the answer above: ID_SOUS_MODULE exists in more than one of your three tables, and prefixing it with the table alias (a, e, s) avoids the issue in the SELECT. In addition to what @juergen said, you may want to replace the comma-separated FROM list with explicit INNER or LEFT joins (INNER seems to be what you're going for). Written with commas, the query is logically a Cartesian product of all row combinations that is then filtered by the WHERE clause. Explicit joins keep each join condition next to its table and make it harder to leave one out as the query grows. Here is an example with explicit joins:
-- ID_SOUS_MODULE is qualified with its alias to resolve the ambiguity
SELECT SIGLE_EEP, a.ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a
INNER JOIN mef_edi.envoi e ON (a.ID_EEP = e.ID_EEP)
INNER JOIN mef_edi.sous_module s ON (a.ID_SOUS_MODULE = s.ID_SOUS_MODULE);
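The error and its fix can be reproduced on a cut-down schema. This sketch uses Python's sqlite3 module rather than MySQL, so the error message wording differs. It keeps only two of the three tables and assumes that SIGLE_EEP lives in eep and LIBELLE in sous_module, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column placement; the original schema is not shown.
    CREATE TABLE eep (ID_EEP INTEGER, ID_SOUS_MODULE INTEGER, SIGLE_EEP TEXT);
    CREATE TABLE sous_module (ID_SOUS_MODULE INTEGER, LIBELLE TEXT);
    INSERT INTO eep VALUES (1, 7, 'E1');
    INSERT INTO sous_module VALUES (7, 'Module seven');
""")

# Unqualified: the name exists in both tables, so the database rejects it.
try:
    conn.execute("""
        SELECT SIGLE_EEP, ID_SOUS_MODULE, LIBELLE
        FROM eep a
        INNER JOIN sous_module s ON a.ID_SOUS_MODULE = s.ID_SOUS_MODULE
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Qualified with table aliases: the query runs.
fixed_rows = conn.execute("""
    SELECT a.SIGLE_EEP, a.ID_SOUS_MODULE, s.LIBELLE
    FROM eep a
    INNER JOIN sous_module s ON a.ID_SOUS_MODULE = s.ID_SOUS_MODULE
""").fetchall()

print(error)
print(fixed_rows)
```

Qualifying every selected column, not just the ambiguous one, also protects the query if a later schema change adds another same-named column.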