Does equi join return duplicate columns? - sql

Consider following points:
1. As we know, an inner join returns a duplicate of the column on which we defined the joining condition.
2. And we know a natural join removes the duplicate.
3. I know we can write an equi join as:
select * from table1, table2 where id1=id2
or
select * from table1 join table2 on id1 = id2
I am confused about this third point. What does an equi join return? Does it return the duplicate column on which we defined the joining condition, like an inner join does?

To answer titled question: Yes, if your query uses SELECT *. While your focus is on the JOIN clause, you are forgetting fundamentals about the SELECT clause in SQL.
By the way, your #3 (your definition of equi join) is equivalent to #1 (inner join). In #3, the former query using WHERE is the older, de-emphasized (ANSI-89) implicit join style and the latter using JOIN is the current, standard (ANSI-92) explicit join. Both return the same results and should perform the same, but readability arguably differs. See Explicit vs implicit SQL joins.
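To see that the two forms in #3 really are interchangeable, here is a small sketch using Python's built-in sqlite3 module with an in-memory SQLite database (not SQL Server). The tables and rows are invented for illustration; only the id1/id2 naming comes from the question.

```python
import sqlite3

# Hypothetical tables table1(id1, name) and table2(id2, city) with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER, name TEXT);
    CREATE TABLE table2 (id2 INTEGER, city TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y'), (4, 'z');
""")

# ANSI-89 implicit join: comma-separated tables, condition in WHERE.
implicit = conn.execute(
    "SELECT * FROM table1, table2 WHERE id1 = id2 ORDER BY id1"
).fetchall()

# ANSI-92 explicit join: the same condition in the ON clause.
explicit = conn.execute(
    "SELECT * FROM table1 JOIN table2 ON id1 = id2 ORDER BY id1"
).fetchall()

print(implicit == explicit)  # True
print(explicit)              # [(1, 'a', 1, 'x'), (2, 'b', 2, 'y')]
```

Both queries return the matched rows with the key value appearing twice, once as id1 and once as id2.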
The reason you receive duplicate columns in these equi-join cases is that the SELECT clause requests all columns of all joined tables, since you do not qualify the asterisk (*):
select * from table1 join table2 on id1 = id2
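One way to check which columns an unqualified * actually produces is to inspect the result-set column names. The sketch below assumes Python's sqlite3 module and SQLite semantics, and uses hypothetical tables that share a column named id (a NATURAL JOIN needs a common name, unlike the question's id1/id2). In SQLite, NATURAL JOIN and JOIN ... USING omit the right-hand copy of the shared column, while an ON condition keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, city TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
""")

def column_names(sql):
    """Return the result-set column names SQLite reports for a query."""
    return [d[0] for d in conn.execute(sql).description]

# Unqualified * over an ON equi join: the join column appears once per table.
on_cols = column_names("SELECT * FROM table1 JOIN table2 ON table1.id = table2.id")
# NATURAL JOIN and USING merge the shared column into a single output column.
natural_cols = column_names("SELECT * FROM table1 NATURAL JOIN table2")
using_cols = column_names("SELECT * FROM table1 JOIN table2 USING (id)")

print(on_cols)       # ['id', 'name', 'id', 'city']
print(natural_cols)  # ['id', 'name', 'city']
print(using_cols)    # ['id', 'name', 'city']
```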
However, the bare asterisk is only an abbreviated convenience; the recommended way is to control your result set explicitly by selecting specific columns. This is crucial for application builds that rely on column order and existence, and it helps code readability and maintainability. See Why is SELECT * considered harmful?
Below are examples that can be used to avoid duplicate columns: qualifying the asterisks, subsetting/omitting columns, and even renaming/re-ordering columns with aliases.
-- ONLY table1 COLUMNS
select table1.* from table1 join table2 on id1 = id2
-- ONLY table2 COLUMNS
select table2.* from table1 join table2 on id1 = id2
-- SUBSET OF table1 AND table2 COLUMNS
select table1.Col1, table1.Col2, table1.Col3
, table2.Col4, table2.Col5, table2.Col6
from table1
join table2 on id1 = id2
-- RENAMED COLUMNS
select table1.Col1 as t1_col1, table1.Col2 as t1_col2, table1.Col3 as t1_col3
, table2.Col3 as t2_col3, table2.Col2 as t2_col2, table2.Col1 as t2_col1
from table1
join table2 on id1 = id2
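As a quick check of the qualified-asterisk and alias techniques, this sketch runs two of them against hypothetical tables in an in-memory SQLite database via Python's sqlite3 module; the Col1/Col2 names mirror the examples, but the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER, Col1 TEXT, Col2 TEXT);
    CREATE TABLE table2 (id2 INTEGER, Col1 TEXT, Col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a1', 'a2');
    INSERT INTO table2 VALUES (1, 'b1', 'b2');
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# Qualified asterisk: only table1's columns come back.
t1_only = run("SELECT table1.* FROM table1 JOIN table2 ON id1 = id2")

# Aliases give the same-named columns from both tables distinct names.
renamed = run("""
    SELECT table1.Col1 AS t1_col1, table2.Col1 AS t2_col1
    FROM table1 JOIN table2 ON id1 = id2
""")

print(t1_only)  # (['id1', 'Col1', 'Col2'], [(1, 'a1', 'a2')])
print(renamed)  # (['t1_col1', 't2_col1'], [('a1', 'b1')])
```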

Related

Inner Join on Concat

I used this code in SQL Server to join two tables on unique values I made with concat.
My intention was to create unique values with the concat function in both tables so I can join them on matching values.
The problem is that the query written this way never finishes executing (like some sort of infinite loop).
[Table inputs and result]
select t1.AAA, t1.BBB, t1.XXX, t2.YYY
from Table1 t1
inner join Table2 t2
on concat(t1.AAA, t1.BBB)= CONCAT(t2.AAA, t2.BBB)
Why concat? This way you lose the benefit of the optimizer and indexes.
You can join on two conditions instead:
select t1.AAA, t1.BBB, t1.XXX, t2.YYY
from Table1 t1
inner join Table2 t2
on t1.AAA= t2.AAA
and t1.BBB = t2.BBB
The query that you want is:
select t1.AAA, t1.BBB, t1.XXX, t2.YYY
from Table1 t1 inner join
Table2 t2
on t1.AAA = t2.AAA and t1.BBB = t2.BBB;
Then, if performance is a concern, you want an index on Table1(AAA, BBB) or Table2(AAA, BBB) or both. The columns can also be reversed in the indexes.
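Besides the indexing point, joining on two separate conditions also avoids a correctness trap: concatenating without a separator can make different key pairs collide ('ab' + 'c' equals 'a' + 'bc'). The sketch below assumes Python's sqlite3 module and SQLite, where the || operator stands in for SQL Server's CONCAT; the rows and the index name ix_table2_aaa_bbb are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (AAA TEXT, BBB TEXT, XXX INTEGER);
    CREATE TABLE Table2 (AAA TEXT, BBB TEXT, YYY INTEGER);
    -- Hypothetical composite index on the join columns.
    CREATE INDEX ix_table2_aaa_bbb ON Table2 (AAA, BBB);
    INSERT INTO Table1 VALUES ('ab', 'c', 1), ('x', 'y', 2);
    INSERT INTO Table2 VALUES ('a', 'bc', 10), ('x', 'y', 20);
""")

# Two equality conditions: each column must match on its own.
composite = conn.execute("""
    SELECT t1.AAA, t1.BBB, t1.XXX, t2.YYY
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.AAA = t2.AAA AND t1.BBB = t2.BBB
    ORDER BY t1.XXX
""").fetchall()

# Concatenated key: 'ab' || 'c' = 'a' || 'bc', so a spurious match appears.
concatenated = conn.execute("""
    SELECT t1.AAA, t1.BBB, t1.XXX, t2.YYY
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.AAA || t1.BBB = t2.AAA || t2.BBB
    ORDER BY t1.XXX
""").fetchall()

print(composite)     # [('x', 'y', 2, 20)]
print(concatenated)  # [('ab', 'c', 1, 10), ('x', 'y', 2, 20)]
```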

SQL join to return a table with multiple columns from other tables replacing its own

I am trying to write an SQL query that will return Table1, which has 10 columns. This table consists of a primary key id, 4 foreign key Id columns, and 5 other columns that I want to return but not change. The goal is to do a join to replace the foreign key Ids with their descriptions that are held in other tables.
Here is one attempt with the first FK Id:
Select * from Table1 t1
left join Table2 t2
on t1.BranchId = t2.BranchId;
This left join returns the description from table2, but does not replace it.
Here is another with the first FK Id:
Select t2.BranchName from Table1 t1
left join Table2 t2
on t1.BranchId = t2.BranchId;
This returns the name I want, but does not return table1 fully.
For the sake of an example you could pretend that OtherName3, OtherName4, OtherName5 are in tables Table3, Table4, Table5, respectively.
This may seem trivial for experienced SQL devs, but I am having a hard time figuring out the syntax.
Thanks!
I'm not sure what you mean by replace it.
I think you just need to list out all the columns you want:
Select t1.col1, t1.col2, t1.col3, . . .,
t2.name
from Table1 t1 left join
Table2 t2
on t1.BranchId = t2.BranchId;
I don't know what you mean by 'replace', but you just need to specify which columns you want from which table. That goes for every table you join to, especially if the same column name exists in multiple tables. I used placeholder columns since I don't know your tables, but you should get the general idea.
Select t2.BranchName, t1.BranchId, t1.Name, t1.Amount, t2.BranchLocation from Table1 t1
left join Table2 t2
on t1.BranchId = t2.BranchId;
I think this is what you are looking for:
select t1.*, t2.BranchName from Table1 t1
left join Table2 t2
on t1.BranchId = t2.BranchId;
Return Table1 fully (all columns) and only the description (BranchName) from Table2.
If using SQL Server, see all syntax options for the SELECT clause here:
https://msdn.microsoft.com/en-us/library/ms176104.aspx
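Extending the idea to more than one foreign key means one LEFT JOIN per lookup table and listing the description columns in place of the Id columns. This is a sketch with Python's sqlite3 module and SQLite; Table2/BranchName and Table3/OtherName3 follow the question, while the OtherId3 key column, the Amount column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (BranchId INTEGER PRIMARY KEY, BranchName TEXT);
    CREATE TABLE Table3 (OtherId3 INTEGER PRIMARY KEY, OtherName3 TEXT);
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, BranchId INTEGER,
                         OtherId3 INTEGER, Amount INTEGER);
    INSERT INTO Table2 VALUES (1, 'North');
    INSERT INTO Table3 VALUES (7, 'Retail');
    INSERT INTO Table1 VALUES (100, 1, 7, 50), (101, 2, NULL, 75);
""")

# Descriptions are selected in place of the foreign key Ids; LEFT JOIN keeps
# Table1 rows whose Id has no match (or is NULL), showing NULL instead.
rows = conn.execute("""
    SELECT t1.id, t2.BranchName, t3.OtherName3, t1.Amount
    FROM Table1 t1
    LEFT JOIN Table2 t2 ON t1.BranchId = t2.BranchId
    LEFT JOIN Table3 t3 ON t1.OtherId3 = t3.OtherId3
    ORDER BY t1.id
""").fetchall()

print(rows)  # [(100, 'North', 'Retail', 50), (101, None, None, 75)]
```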

SQL aggregate function returning inflated values on joined table

I'm racking my brain trying to work out where I'm going wrong.
The following query:
SELECT SUM(table1.col1) FROM table1
returns value x.
And the following query:
SELECT SUM(table1.col1) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
returns value y. (I need the Join for the other data of table2). Why is the 2nd example returning a different value than in the first?
Make life easier on yourself, your colleagues that will support your code, and your clients by temporarily ignoring the existence of RIGHT OUTER JOIN. Use Table1 as the "from table" instead of table2.
Then, if aggregating, you will often find it necessary to do this BEFORE joining, so that the numbers are accurate, e.g.
SELECT T1.SUMCOL1
FROM (
SELECT id, SUM(col1) as SUMCOL1 FROM Table1 GROUP BY id
) T1
LEFT OUTER JOIN table2 T2 on T1.id = T2.ID
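Here is a small reproduction of the inflated SUM and of aggregating before the join. It assumes Python's sqlite3 module and SQLite; table1 is the "from table" with a LEFT OUTER JOIN (equivalent to the question's RIGHT OUTER JOIN), and the rows and the note_count column are hypothetical. In this sketch it is the many side (table2) that is reduced to one row per ID before joining, so each table1 row is summed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, col1 INTEGER);
    CREATE TABLE table2 (ID INTEGER, note TEXT);
    INSERT INTO table1 VALUES (1, 10), (2, 20);
    -- ID 1 has two table2 rows, ID 2 has none.
    INSERT INTO table2 VALUES (1, 'p'), (1, 'q');
""")

plain = conn.execute("SELECT SUM(col1) FROM table1").fetchone()[0]

# Joining first repeats table1's ID 1 row once per matching table2 row.
joined = conn.execute("""
    SELECT SUM(table1.col1)
    FROM table1 LEFT OUTER JOIN table2 ON table2.ID = table1.ID
""").fetchone()[0]

# Aggregate table2 per ID first, so each table1 row meets at most one row.
pre_aggregated = conn.execute("""
    SELECT SUM(T1.col1)
    FROM table1 T1
    LEFT OUTER JOIN (
        SELECT ID, COUNT(*) AS note_count FROM table2 GROUP BY ID
    ) T2 ON T1.ID = T2.ID
""").fetchone()[0]

print(plain, joined, pre_aggregated)  # 30 40 30
```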
The obvious answer is that table2 is the many side to table1's one: there are multiple rows in table2 for one id in table1. (With an inner join you could also be eliminating rows from table1 whose id isn't present in table2; the RIGHT OUTER JOIN here keeps every table1 row.)
Compare:
SELECT COUNT(*) FROM table1
To:
SELECT COUNT(*) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
If you get different results, you're aggregating duplicates (or, with an inner join, eliminating rows from table1).
If you want to avoid this, you'll need to use a subquery.
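The COUNT comparison can be paired with a query that shows which IDs fan out. This is a sketch assuming Python's sqlite3 module and SQLite, with hypothetical rows and the outer join written as table1 LEFT OUTER JOIN table2 (same rows as the RIGHT OUTER JOIN form).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, col1 INTEGER);
    CREATE TABLE table2 (ID INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20);
    INSERT INTO table2 VALUES (1), (1);
""")

base = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
joined = conn.execute(
    "SELECT COUNT(*) FROM table1 LEFT OUTER JOIN table2 ON table2.ID = table1.ID"
).fetchone()[0]

# IDs with more than one table2 row are the ones that duplicate table1 rows.
fan_out = conn.execute(
    "SELECT ID, COUNT(*) FROM table2 GROUP BY ID HAVING COUNT(*) > 1"
).fetchall()

print(base, joined, fan_out)  # 2 3 [(1, 2)]
```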

Correct way to select from two tables in SQL Server with no common field to join on

Back in the old days, I used to write select statements like this:
SELECT
table1.columnA, table2.columnA
FROM
table1, table2
WHERE
table1.columnA = 'Some value'
However I was told that having comma separated table names in the "FROM" clause is not ANSI92 compatible. There should always be a JOIN statement.
This leads to my problem.... I want to do a comparison of data between two tables but there is no common field in both tables with which to create a join. If I use the 'legacy' method of comma separated table names in the FROM clause (see code example), then it works perfectly fine. I feel uncomfortable using this method if it is considered wrong or bad practice.
Anyone know what to do in this situation?
Extra Info:
Table1 contains a list of locations in 'geography' data type
Table2 contains a different list of 'geography' locations
I am writing a select statement to compare the distances between the locations. As far as I know you can't do a JOIN on a geography column?
You can (should) use CROSS JOIN. The following query is equivalent to yours:
SELECT
table1.columnA
, table2.columnA
FROM table1
CROSS JOIN table2
WHERE table1.columnA = 'Some value'
or you can even use INNER JOIN with an always-true condition:
FROM table1
INNER JOIN table2 ON 1=1
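The three forms (comma list, CROSS JOIN, INNER JOIN ON 1=1) can be compared directly. This sketch uses Python's sqlite3 module and SQLite with a hypothetical text columnA instead of geography data; it also shows that without the WHERE filter the row count is the full Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnA TEXT);
    CREATE TABLE table2 (columnA TEXT);
    INSERT INTO table1 VALUES ('Some value'), ('other');
    INSERT INTO table2 VALUES ('p'), ('q'), ('r');
""")

select = "SELECT table1.columnA, table2.columnA FROM "
where = " WHERE table1.columnA = 'Some value' ORDER BY 2"

comma = conn.execute(select + "table1, table2" + where).fetchall()
cross = conn.execute(select + "table1 CROSS JOIN table2" + where).fetchall()
on_true = conn.execute(select + "table1 INNER JOIN table2 ON 1=1" + where).fetchall()

print(comma == cross == on_true)  # True
print(cross)  # [('Some value', 'p'), ('Some value', 'q'), ('Some value', 'r')]

# Without a filter, every table1 row pairs with every table2 row: 2 * 3 rows.
total = conn.execute("SELECT COUNT(*) FROM table1 CROSS JOIN table2").fetchone()[0]
print(total)  # 6
```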
A cross join lets you join tables that have no common fields. But be careful: this join returns the Cartesian product of the two tables.
QUERY:
SELECT
table1.columnA
, table2.columnA
FROM table1
CROSS JOIN table2
Alternative way to join on some condition that is always true like
SELECT
table1.columnA
, table2.columnA
FROM table1
INNER JOIN table2 ON 1=1
But this type of query should be avoided for performance as well as coding standards.
A suggestion: when using a cross join, please take care of duplicate scenarios. For example, in your case:
Table 1 may have more than one column in its primary key (say table1_id, id2, id3, table2_id).
Table 2 may have more than one column in its primary key (say table2_id, id3, id4).
Since these two tables share key columns (i.e. foreign keys in one or the other), we will end up with duplicate results. Hence using the following form, with DISTINCT, is a good idea:
WITH data_mined_table (col1, col2, col3) AS (  -- list further columns as needed
    SELECT DISTINCT col1, col2, col3
    FROM table_1 WITH (NOLOCK)
    CROSS JOIN table_2 WITH (NOLOCK)
)
SELECT * FROM data_mined_table WHERE data_mined_table.col1 = :my_param_value
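To see what DISTINCT buys here, this sketch runs the same CTE shape with and without DISTINCT. It assumes Python's sqlite3 module and SQLite (which has no NOLOCK hint, so it is dropped), hypothetical two-column tables with invented rows, and sqlite3's named-parameter style for :my_param_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (table1_id INTEGER, col1 TEXT);
    CREATE TABLE table_2 (table2_id INTEGER, col2 TEXT);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'a');
    INSERT INTO table_2 VALUES (10, 'x'), (11, 'x');
""")

query = """
    WITH data_mined_table (col1, col2) AS (
        SELECT {distinct} col1, col2
        FROM table_1 CROSS JOIN table_2
    )
    SELECT * FROM data_mined_table WHERE data_mined_table.col1 = :my_param_value
"""
params = {"my_param_value": "a"}

# Projecting only col1, col2 from the 2 x 2 cross join repeats the same pair.
raw = conn.execute(query.format(distinct=""), params).fetchall()
deduped = conn.execute(query.format(distinct="DISTINCT"), params).fetchall()

print(len(raw), deduped)  # 4 [('a', 'x')]
```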

Is there a logical difference between putting a condition in the ON clause of an inner join versus the where clause of the main query?

Consider these two similar SQL queries:
(condition in ON clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id and t1.boolfield = 1
(condition in WHERE clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id
where t1.boolfield = 1
I have tested this out a bit and I can see the difference between putting a condition in the two different places for an outer join.
But in the case of an inner join can the result sets ever be different?
For INNER JOIN, there is no effective difference, although I think the second option is cleaner.
For LEFT JOIN, there is a huge difference. The ON clause decides which tbl2 rows are matched to each tbl1 row (tbl1 rows with no match are still kept, with NULLs for the tbl2 columns), while the WHERE clause filters the joined result afterwards.
Example 1: returns all the rows from tbl1 and matches them up with the rows from tbl2 that have boolfield=1; tbl1 rows without such a match get NULLs for the tbl2 columns.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id and tbl2.boolfield=1
Example 2: will only include rows from tbl1 that have a matching row in tbl2 with boolfield=1. It joins the tables, and then filters out the rows that don't meet the condition.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id
WHERE tbl2.boolfield=1
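The difference between the two LEFT JOIN examples is easy to reproduce. This sketch assumes Python's sqlite3 module and SQLite, with hypothetical rows: id 1 matches with boolfield=1, id 2 matches with boolfield=0, and id 3 has no tbl2 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER);
    CREATE TABLE tbl2 (id INTEGER, boolfield INTEGER);
    INSERT INTO tbl1 VALUES (1), (2), (3);
    INSERT INTO tbl2 VALUES (1, 1), (2, 0);
""")

# Condition in ON: every tbl1 row survives; non-matching ones get NULL.
in_on = conn.execute("""
    SELECT tbl1.id, tbl2.boolfield FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id AND tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

# Condition in WHERE: applied after the join, so NULL-extended rows are dropped.
in_where = conn.execute("""
    SELECT tbl1.id, tbl2.boolfield FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id
    WHERE tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

print(in_on)     # [(1, 1), (2, None), (3, None)]
print(in_where)  # [(1, 1)]
```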
In your specific case, the t1.boolfield specifies an additional selection condition, not a condition for matching records between the two tables, so the second example is more correct.
If you're speaking about the cases when a condition for matching records is put in the ON clause vs. in the WHERE clause, see this question.
Both versions return the same data.
Although this is true for an inner join, it is not true for outer joins.
Stylistically, there is a third possibility. In addition to your two, there is also:
select t1.field1, t2.field1
from (select t1.*
from table1 t1
where t1.boolfield = 1
) t1 inner join
table2 t2
on t1.id = t2.id
Which is preferable all depends on what you want to highlight, so you (or someone else) can later understand and modify the query. I often prefer the third version, because it emphasizes that the query is only using certain rows from the table -- the boolean condition is very close to where the table is specified.
In the other two cases, if you have a long query, it can be problematic to figure out what "t1" really means. I think this is why some people prefer to put the condition in the ON clause. Others prefer the WHERE clause.
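For an inner join, all three styles discussed here (condition in ON, in WHERE, or in a derived table) can be checked against each other. This is a sketch with Python's sqlite3 module and SQLite on hypothetical rows; it demonstrates equal results on this data, which is consistent with, but of course does not prove, the general equivalence for inner joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field1 TEXT, boolfield INTEGER);
    CREATE TABLE table2 (id INTEGER, field1 TEXT);
    INSERT INTO table1 VALUES (1, 'a', 1), (2, 'b', 0), (3, 'c', 1);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
""")

queries = {
    "on": """SELECT t1.field1, t2.field1 FROM table1 t1
             INNER JOIN table2 t2 ON t1.id = t2.id AND t1.boolfield = 1""",
    "where": """SELECT t1.field1, t2.field1 FROM table1 t1
                INNER JOIN table2 t2 ON t1.id = t2.id
                WHERE t1.boolfield = 1""",
    "derived": """SELECT t1.field1, t2.field1
                  FROM (SELECT t1.* FROM table1 t1 WHERE t1.boolfield = 1) t1
                  INNER JOIN table2 t2 ON t1.id = t2.id""",
}

# Sort each result so row order does not affect the comparison.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
print(results)  # every entry is [('a', 'x')]
```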