What is the difference between `FROM _ , _` and `FROM _ INNER JOIN _ ON _` - sql

What is the difference between the following two queries, regardless of the field names?
SELECT [EmployeeList].[Emp_ID], [EmployeeLevel].[LevelPosition]
FROM [EmployeeList], [EmployeeLevel]
and
SELECT [EmployeeList].[Emp_ID], [EmployeeLevel].[LevelPosition]
FROM EmployeeList INNER JOIN EmployeeLevel ON
[EmployeeList].[LevelID] = [EmployeeLevel].[LevelID]

The first one is not correlated in any way and will return a cross join (Cartesian product) with every combination of rows from the two tables. To correlate them you would need to add a WHERE clause:
SELECT [EmployeeList].[Emp_ID], [EmployeeLevel].[LevelPosition]
FROM [EmployeeList], [EmployeeLevel]
WHERE [EmployeeList].[LevelID] = [EmployeeLevel].[LevelID]
Then the two queries would be semantically the same, but the above is the old-style ANSI syntax and is largely discouraged by everyone except Joe Celko, because it is less clear, makes inadvertent Cartesian joins possible, and takes more work to change if you want to convert to an outer join.
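A minimal sketch of the three forms discussed above, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the aliases e and l are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeList  (Emp_ID INTEGER, LevelID INTEGER);
    CREATE TABLE EmployeeLevel (LevelID INTEGER, LevelPosition TEXT);
    INSERT INTO EmployeeList  VALUES (1, 10), (2, 20), (3, 10);
    INSERT INTO EmployeeLevel VALUES (10, 'Junior'), (20, 'Senior');
""")

# Comma join with no WHERE clause: every employee paired with every level.
comma_only = conn.execute(
    "SELECT e.Emp_ID, l.LevelPosition "
    "FROM EmployeeList e, EmployeeLevel l"
).fetchall()

# Comma join correlated by a WHERE clause.
comma_where = conn.execute(
    "SELECT e.Emp_ID, l.LevelPosition "
    "FROM EmployeeList e, EmployeeLevel l "
    "WHERE e.LevelID = l.LevelID ORDER BY e.Emp_ID"
).fetchall()

# Explicit INNER JOIN with the same condition in the ON clause.
inner_join = conn.execute(
    "SELECT e.Emp_ID, l.LevelPosition "
    "FROM EmployeeList e INNER JOIN EmployeeLevel l "
    "ON e.LevelID = l.LevelID ORDER BY e.Emp_ID"
).fetchall()

print(len(comma_only))            # 3 employees x 2 levels = 6 rows
print(comma_where == inner_join)  # True: same rows either way
```

With this data the uncorrelated comma join returns six rows, while the other two queries return the same three rows.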

Implicit Cross Join vs. Inner Join
http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
http://en.wikipedia.org/wiki/Join_(SQL)

In the first one you're selecting a list of employee IDs and a list of level positions, with no necessary relationship between them.
In the second one you're doing a join: each employee is related to the level row that has the same LevelID (see the ON condition).

The first is a Cartesian product join, which does not care about matching rows between the two tables. It pairs every row from the first table with every row from the second. Unless there is a specific case requiring it, this kind of query produces way too much data of little meaning.
The second matches only the rows whose LevelID values are equal.

The first query returns every row of EmployeeList paired with every row of EmployeeLevel.
In the second query there is a relation between EmployeeList and EmployeeLevel through LevelID, and INNER JOIN means that EmployeeList.LevelID must match EmployeeLevel.LevelID.
So the second query returns a row only where both LevelID values match.

Related

How to avoid duplicated rows when joining multiple tables?

I am trying to SELECT the data from the education and experience tables. Both have two entries for the given candidate_id. When I try using GROUP BY and json_agg, I get four rows in the aggregated JSON values. What am I doing wrong? I want two education objects and two experience objects in their respective arrays.
SQL:
SELECT
json_agg(education) as education,
json_agg(experience) as experience
FROM application
LEFT JOIN candidate ON application.candidate_id = candidate.id
LEFT JOIN education ON candidate.id = education.candidate_id
LEFT JOIN experience ON candidate.id = experience.candidate_id
WHERE application.candidate_id = 2
GROUP BY education.candidate_id, experience.candidate_id;
Result:
education
[{"id":3,"candidate_id":2,"school":"school1 candidate2","qualification":"qualification1 candidate2","dates":"dates1 candidate2","note":null},
{"id":3,"candidate_id":2,"school":"school1 candidate2","qualification":"qualification1 candidate2","dates":"dates1 candidate2","note":null},
{"id":4,"candidate_id":2,"school":"school2 candidate2","qualification":"qualification2 candidate2","dates":"dates2 candidate2","note":null},
{"id":4,"candidate_id":2,"school":"school2 candidate2","qualification":"qualification2 candidate2","dates":"dates2 candidate2","note":null}]
experience
[{"id":3,"candidate_id":2,"employer":"emploer1 candidate2","title":"title1 candidate2","dates":"dates1 candidate2","job_duties":"duties1 candidate2"},
{"id":4,"candidate_id":2,"employer":"emploer2 candidate2","title":"title2 candidate2","dates":"dates2 candidate2","job_duties":"duties2 candidate2"},
{"id":3,"candidate_id":2,"employer":"emploer1 candidate2","title":"title1 candidate2","dates":"dates1 candidate2","job_duties":"duties1 candidate2"},
{"id":4,"candidate_id":2,"employer":"emploer2 candidate2","title":"title2 candidate2","dates":"dates2 candidate2","job_duties":"duties2 candidate2"}]
I tried multiple variants of this query ...
Multiple joins that do not (also) associate rows among the joined table rows effectively act like CROSS JOIN by proxy, multiplying rows. See:
Two SQL LEFT JOINS produce incorrect result
Aggregate before joining (so that only a single row per parent row remains, hence no duplication). Or use lowly correlated subqueries for this simple case. Well, not even correlated for just your single candidate_id, rather plain subquery expressions in the SELECT list:
SELECT (SELECT json_agg(e.*)
FROM education e
WHERE e.candidate_id = 2) AS education
, (SELECT json_agg(e.*)
FROM experience e
WHERE e.candidate_id = 2) AS experience
WHERE EXISTS (SELECT FROM application a WHERE a.candidate_id = 2);
I removed the table candidate from your query, which was dead freight (unless you must verify that a related row exists in that table), but might additionally multiply rows in the same way.
And the table application only needs to be checked for the existence of any qualifying rows.
You might alternatively use (LATERAL) subqueries for more complex cases. (I suspect you over-simplified.) See:
How to SUM numbers from a plain jsonb array?
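The row multiplication described above can be reproduced with a small sketch using Python's sqlite3 module. The sample rows are assumptions, and COUNT(*) stands in for json_agg so that the example does not depend on any JSON functions; the point is only the number of rows each approach sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application (id INTEGER, candidate_id INTEGER);
    CREATE TABLE education   (id INTEGER, candidate_id INTEGER, school TEXT);
    CREATE TABLE experience  (id INTEGER, candidate_id INTEGER, employer TEXT);
    INSERT INTO application VALUES (1, 2);
    INSERT INTO education   VALUES (3, 2, 'school1'), (4, 2, 'school2');
    INSERT INTO experience  VALUES (3, 2, 'employer1'), (4, 2, 'employer2');
""")

# Two joins to unrelated child tables: each education row is paired
# with each experience row, so 2 x 2 = 4 rows reach the aggregate.
joined = conn.execute("""
    SELECT ed.school, ex.employer
    FROM application a
    LEFT JOIN education  ed ON ed.candidate_id = a.candidate_id
    LEFT JOIN experience ex ON ex.candidate_id = a.candidate_id
    WHERE a.candidate_id = 2
""").fetchall()

# One scalar subquery per child table: each aggregate sees only its own rows.
separate = conn.execute("""
    SELECT (SELECT COUNT(*) FROM education  e WHERE e.candidate_id = 2),
           (SELECT COUNT(*) FROM experience e WHERE e.candidate_id = 2)
    WHERE EXISTS (SELECT 1 FROM application a WHERE a.candidate_id = 2)
""").fetchone()

print(len(joined))  # 4
print(separate)     # (2, 2)
```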

Semi-join vs Subqueries

What is the difference between semi-joins and a subquery? I am currently taking a course on this on DataCamp and I'm having a hard time making a distinction between the two.
Thanks in advance.
A join or a semi-join is needed whenever you want to combine records from two or more entities based on some common attribute.
A subquery, by contrast, is needed whenever you want a lookup or reference against the same table or other tables.
In short: when you need additional columns from another table added to the existing table's attributes, use a join; when you only need to look up records in the same or another table while keeping the existing columns as output, use a subquery.
A semi-join can also be written as a subquery, because most of the time we don't actually join the right table; instead we check against it in a subquery to limit the rows of the existing table. Hence the name semi-join, although a semi-join is not a subquery by itself.
I don't really think of a subquery and a semi-join as anything similar. A subquery is nothing more interesting than a query that is used inside another query:
select * -- this is often called the "outer" query
from (
select columnA -- this is the subquery inside the parentheses
from mytable
where columnB = 'Y'
) sub -- many databases require an alias for a subquery in FROM
A semi-join is a concept based on join. Of course, joining tables will combine both tables and return the combined rows based on the join criteria. From there you select the columns you want from either table based on further where criteria (and of course whatever else you want to do). The concept of a semi-join is when you want to return rows from the first table only, but you need the 2nd table to decide which rows to return. Example: you want to return the people in a class:
select p.FirstName, p.LastName, p.DOB
from people p
inner join class c on c.pID = p.pID
where c.ClassName = 'SQL 101'
group by p.pID, p.FirstName, p.LastName, p.DOB
This accomplishes the concept of a semi-join. We are only returning columns from the first table (people). The use of the group by is necessary for the concept of a semi-join because a true join can return duplicate rows from the first table (depending on the join criteria). The above example is not often referred to as a semi-join, and is not the most typical way to accomplish it. The following query is a more common method of accomplishing a semi-join:
select FirstName, LastName, DOB
from people
where pID in (select pID
from class
where ClassName = 'SQL 101'
)
There is no formal join here. But we're using the 2nd table to determine which rows from the first table to return. It's a lot like saying if we did join the 2nd table to the first table, what rows from the first table would match?
For performance, exists is often preferred, although many optimizers produce the same plan for both forms:
select FirstName, LastName, DOB
from people p
where exists (select pID
from class c
where c.pID = p.pID
and c.ClassName = 'SQL 101'
)
In my opinion, this is the most direct way to understand the semi-join. There is still no formal join, but you can see the idea of a join hinted at by the usage of directly matching the first table's pID column to the 2nd table's pID column.
Final note. The last 2 queries above each use a subquery to accomplish the concept of a semi-join.
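To see why a plain join is not yet a semi-join, here is a small sketch using Python's sqlite3 module. The sample rows are assumptions: Ann appears twice in class for 'SQL 101', so a plain inner join returns her twice, while the IN and EXISTS forms return each person at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (pID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE class  (pID INTEGER, ClassName TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO class  VALUES (1, 'SQL 101'), (1, 'SQL 101'),
                              (2, 'SQL 101'), (3, 'Algebra');
""")

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Plain inner join: one output row per matching class row.
plain_join = names("""
    SELECT p.FirstName FROM people p
    INNER JOIN class c ON c.pID = p.pID
    WHERE c.ClassName = 'SQL 101' ORDER BY p.pID
""")

# Semi-join via IN: class only decides which people rows qualify.
semi_in = names("""
    SELECT FirstName FROM people
    WHERE pID IN (SELECT pID FROM class WHERE ClassName = 'SQL 101')
    ORDER BY pID
""")

# Semi-join via EXISTS: same result, written as a correlated check.
semi_exists = names("""
    SELECT p.FirstName FROM people p
    WHERE EXISTS (SELECT 1 FROM class c
                  WHERE c.pID = p.pID AND c.ClassName = 'SQL 101')
    ORDER BY p.pID
""")

print(plain_join)   # ['Ann', 'Ann', 'Bob']
print(semi_in)      # ['Ann', 'Bob']
print(semi_exists)  # ['Ann', 'Bob']
```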

When to use SQL natural join instead of join .. on?

I'm studying SQL for a database exam, and the way I've seen SQL written is the way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
That is, joins are written as JOIN <table name> ON <table attribute>, followed by the join condition for the selection. My course book and the exercises given to me by my academic institution, however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON?
Thanks for any answer or comment
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins are a leftover of some old standards, and I try to avoid them like the plague.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is at least one match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return all rows from both tables, matched where possible and with NULLs on the side that has no match
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp
A natural join is said to be an abomination because it does not allow qualifying the key columns, which makes it confusing: you never know which "common" columns are being used to join two tables simply by looking at the SQL statement.
A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins are often interchangeable and usually produce the same results. However, there are some important considerations to make:
* If a NATURAL JOIN finds no matching columns, it returns the cross product. This could produce disastrous results if the schema is modified. An INNER JOIN, on the other hand, will fail with a 'column does not exist' error, which is a much safer way to fail.
* An INNER JOIN self-documents with its ON clause, resulting in a clearer query that describes the table schema to the reader.
* An INNER JOIN results in a maintainable and reusable query in which the column names can be swapped in and out with changes in the use case or table schema.
* The programmer can notice column name mismatches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.
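The schema-change risk mentioned above can be sketched with Python's sqlite3 module. The tables items, prices and prices2 and their rows are hypothetical; prices2 simulates a schema change that renamed the key column from item_id to itemID. In SQLite, a NATURAL JOIN with no shared column names silently becomes a cross product, while the explicit ON condition raises an error (SQLite words it differently from the 'column does not exist' message quoted above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items   (item_id INTEGER, name TEXT);
    CREATE TABLE prices  (item_id INTEGER, price REAL);
    CREATE TABLE prices2 (itemID  INTEGER, price REAL);  -- renamed key column
    INSERT INTO items   VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO prices  VALUES (1, 1.5), (2, 4.0);
    INSERT INTO prices2 VALUES (1, 1.5), (2, 4.0);
""")

# Shared column item_id: NATURAL JOIN matches on it, one row per item.
matched = conn.execute(
    "SELECT name, price FROM items NATURAL JOIN prices"
).fetchall()

# No shared column name: NATURAL JOIN silently returns every pairing.
silent = conn.execute(
    "SELECT name, price FROM items NATURAL JOIN prices2"
).fetchall()

# The explicit ON clause names a column that no longer exists, so it fails.
try:
    conn.execute(
        "SELECT name, price FROM items i "
        "INNER JOIN prices2 p ON i.item_id = p.item_id"
    )
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)

print(len(matched))    # 2
print(len(silent))     # 4
print(explicit_error)  # an error message naming the missing column
```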

joining more tables makes me get less data from the query

I have a problem with a stored procedure (SP): when I join some specific tables, I get less data from the SP than I get when they are not included.
I am not selecting any data from them yet; I am just joining them, and that alone makes the SP return less data.
Any idea what the problem can be? Thanks
It sounds like there are no matching rows in the tables you're joining to.
If you change the join to a LEFT OUTER JOIN, you should get the rows you are expecting (but, obviously, check the output to make sure you do!)
Joins usually have a join condition in the "ON" clause. That condition says how to match rows between the tables being joined. If, for a particular row on one side, there is no matching row on the other side, then we need to consider what type of join we're dealing with:
For an INNER JOIN, the row will be discarded.
For a LEFT JOIN, and if the row comes from the "LEFT" table source, then this row will appear, but with NULLs present for all columns from the "RIGHT" table source.
RIGHT JOIN is similar to left join, with the directions swapped over.
Regardless of whether you are asking for data from those tables or not, they have been included as part of the join, and the result set is going to return only rows that meet the criteria you have specified.
You may want to post 2 versions of the SP, one with few tables, then a second with one or more of the joins so that it can be better explained what is happening behind the scenes.
Sounds like you're doing an INNER JOIN. This will only return a record if whatever property you're joining on is in both tables. As an example:
Customers
ID Name
1 Mike
2 Steve
3 Amy
Address
ID Address
1 123 Main
3 456 Oak
If you run
SELECT C.Name, A.Address FROM Customers C
JOIN Address A ON C.ID = A.ID
only the records for Mike and Amy will be returned, because Steve doesn't have an Address record.
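The Customers and Address example above can be run as a small sketch with Python's sqlite3 module. The data is the answer's own; the aliases C and A are assumptions. It also shows the LEFT JOIN behaviour suggested in the earlier answers, where Steve is kept with a NULL address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER, Name TEXT);
    CREATE TABLE Address   (ID INTEGER, Address TEXT);
    INSERT INTO Customers VALUES (1, 'Mike'), (2, 'Steve'), (3, 'Amy');
    INSERT INTO Address   VALUES (1, '123 Main'), (3, '456 Oak');
""")

# INNER JOIN: customers without an address row are discarded.
inner = conn.execute("""
    SELECT C.Name, A.Address FROM Customers C
    JOIN Address A ON C.ID = A.ID ORDER BY C.ID
""").fetchall()

# LEFT JOIN: every customer is kept; a missing address comes back as NULL.
left = conn.execute("""
    SELECT C.Name, A.Address FROM Customers C
    LEFT JOIN Address A ON C.ID = A.ID ORDER BY C.ID
""").fetchall()

print(inner)  # [('Mike', '123 Main'), ('Amy', '456 Oak')]
print(left)   # [('Mike', '123 Main'), ('Steve', None), ('Amy', '456 Oak')]
```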
I don't know what kind of join you are doing, but there are three kinds of join:
INNER JOIN: retrieves rows that match on both sides
LEFT (OUTER) JOIN: retrieves all rows from the left side, even if there is no match on the right (the right-side columns are then NULL)
RIGHT (OUTER) JOIN: retrieves all rows from the right side, even if there is no match on the left (the left-side columns are then NULL)
Depending on which one you are using, rows may or may not be retrieved.
But posting your query will let us tell you what might be the real problem.
Hope I could help,
See this blog post by Jeff Atwood. It explains SQL Joins very well. I think it will answer your question about why certain sets of data may or may not be missing.

What are the uses of the different join operations?

What are the uses of the different join operations in SQL? Like I want to know why do we need the different inner and outer joins?
The only type of join you really need is LEFT OUTER JOIN. Every other type of join can be rewritten in terms of one or more left outer joins, and possibly some filtering. So why do we need all the others? Is it just to confuse people? Wouldn't it be simpler if there were only one type of join?
You could also ask: Why have both a <= b and b >= a? Don't these just do the same thing? Can't we just get rid of one of them? It would simplify things!
Sometimes it's easier to swap <= to >= instead of swapping the arguments round. Similarly, a left join and a right join are the same thing just with the operands swapped. But again it's practical to have both options instead of requiring people to write their queries in a specific order.
Another thing you could ask is: In logic why do we have AND, OR, NOT, XOR, NAND, NOR, etc? All these can be rewritten in terms of NANDs! Why not just have NAND? Well it's awkward to write an OR in terms of NANDs, and it's not as obvious what the intention is - if you write OR, people know immediately what you mean. If you write a bunch of NANDs, it is not obvious what you are trying to achieve.
Similarly, if you want to do a FULL OUTER JOIN b, you could make a left join and a right join, remove the duplicated results, and then union all. But that's a pain, and so there's a shorthand for it.
When do you use each one? Here's a simplified rule:
If you always want a result row for each row in the LEFT table, use a LEFT OUTER JOIN.
If you always want a result row for each row in the RIGHT table, use a RIGHT OUTER JOIN.
If you always want a result row for each row in either table, use a FULL OUTER JOIN.
If you only want a result row when there's a row in both tables, use an INNER JOIN.
If you want all possible pairs of rows, one row from each table, use a CROSS JOIN.
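The FULL OUTER JOIN shorthand mentioned above can be spelled out by hand, as in this sketch using Python's sqlite3 module. The tables a and b and their rows are assumptions. Since a right join is a left join with the operands swapped, the sketch uses two left joins; the second keeps only the b rows that have no match in a, so matched rows are not counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, av TEXT);
    CREATE TABLE b (k INTEGER, bv TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Emulated FULL OUTER JOIN: all rows of a (matched or not),
# plus the rows of b that found no partner in a.
full_outer = conn.execute("""
    SELECT a.k, a.av, b.k, b.bv
    FROM a LEFT JOIN b ON a.k = b.k
    UNION ALL
    SELECT a.k, a.av, b.k, b.bv
    FROM b LEFT JOIN a ON a.k = b.k
    WHERE a.k IS NULL
""").fetchall()

# For comparison: INNER JOIN keeps only matched pairs,
# CROSS JOIN produces every possible pair.
inner = conn.execute("SELECT a.k FROM a INNER JOIN b ON a.k = b.k").fetchall()
cross = conn.execute("SELECT a.k, b.k FROM a CROSS JOIN b").fetchall()

print(len(full_outer))  # 3: unmatched a row, matched pair, unmatched b row
print(len(inner))       # 1
print(len(cross))       # 4
```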
inner join - returns only the rows from both sets that match based on the specified criteria.
outer join - selects all of one set, along with matching or empty (if not matched) elements from the other set. Outer joins can be left or right, to specify which set is returned in its entirety.
To make the other answers clearer: you get different results depending on the join you choose, for example when the columns you're joining on contain null values.
So for each real-life scenario there is a join that suits it (in the null values example, whether or not you want the rows that have no matching data).
My answer assumes 2 tables joined on a single key:
INNER JOIN - get the results that are in both joined tables (according to the join rule)
FULL OUTER JOIN - get all results from both tables, matched where possible (this is not a Cartesian product; that is a CROSS JOIN)
LEFT OUTER JOIN - get all the results from the left table and the matching results from the right
You can add WHERE clauses in order to further constrain the results.
Use these in order to only get what you want to get.