What are the uses of the different join operations? - sql

What are the uses of the different join operations in SQL? Like I want to know why do we need the different inner and outer joins?

The only type of join you really need is LEFT OUTER JOIN. Every other type of join can be rewritten in terms of one or more left outer joins, and possibly some filtering. So why do we need all the others? Is it just to confuse people? Wouldn't it be simpler if there were only one type of join?
You could also ask: Why have both a <= b and b >= a? Don't these just do the same thing? Can't we just get rid of one of them? It would simplify things!
Sometimes it's easier to swap <= to >= instead of swapping the arguments round. Similarly, a left join and a right join are the same thing just with the operands swapped. But again it's practical to have both options instead of requiring people to write their queries in a specific order.
Another thing you could ask is: In logic why do we have AND, OR, NOT, XOR, NAND, NOR, etc? All these can be rewritten in terms of NANDs! Why not just have NAND? Well it's awkward to write an OR in terms of NANDs, and it's not as obvious what the intention is - if you write OR, people know immediately what you mean. If you write a bunch of NANDs, it is not obvious what you are trying to achieve.
Similarly, if you want a FULL OUTER JOIN b, you could do a left join and a right join, union the two results, and remove the duplicated matched rows. But that's a pain, so there's a shorthand for it.
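As a rough sketch of that rewrite, the snippet below emulates a full outer join using only left joins in an in-memory SQLite database. The tables a and b and their columns are made up for illustration. It uses a slight variant of the recipe above: instead of removing duplicates afterwards, the second half keeps only the rows of b that have no partner in a, so UNION ALL has nothing to duplicate. A native FULL OUTER JOIN is avoided here only because SQLite versions before 3.39 do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_val TEXT);  -- hypothetical tables
    CREATE TABLE b (id INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# First half: every row of a, with its match from b or NULLs.
# Second half: only the rows of b that found no partner in a.
full_outer = conn.execute("""
    SELECT a.id, a.a_val, b.id, b.b_val
    FROM a LEFT JOIN b ON a.id = b.id
    UNION ALL
    SELECT a.id, a.a_val, b.id, b.b_val
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
""").fetchall()

for row in full_outer:
    print(row)
```

Three statements' worth of logic collapse into the single keyword FULL OUTER JOIN on databases that support it, which is the point of having the shorthand.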
When do you use each one? Here's a simplified rule:
If you always want a result row for each row in the LEFT table, use a LEFT OUTER JOIN.
If you always want a result row for each row in the RIGHT table, use a RIGHT OUTER JOIN.
If you always want a result row for each row in either table, use a FULL OUTER JOIN.
If you only want a result row when there's a row in both tables, use an INNER JOIN.
If you want all possible pairs of rows, one row from each table, use a CROSS JOIN.
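The rules above can be checked on a toy data set. This is a minimal sketch with two hypothetical single-column tables, l and r, that share the ids 2 and 3. RIGHT and FULL OUTER JOIN are left out only because older SQLite builds (before 3.39) do not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER);  -- hypothetical "left" table
    CREATE TABLE r (id INTEGER);  -- hypothetical "right" table
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (2), (3), (4);
""")

queries = {
    # Only ids present in both tables.
    "inner": "SELECT l.id, r.id FROM l INNER JOIN r ON l.id = r.id",
    # One result row for every row of l, NULL where r has no match.
    "left": "SELECT l.id, r.id FROM l LEFT OUTER JOIN r ON l.id = r.id",
    # Every pairing of a row from l with a row from r.
    "cross": "SELECT l.id, r.id FROM l CROSS JOIN r",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, len(rows), rows)
```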

inner join - joins the rows from both sets that match based on the specified criteria.
outer join - selects all of one set, along with matching or empty (if not matched) elements from the other set. Outer joins can be left or right, to specify which set is returned in its entirety.

To make the other answers clearer: you get different results depending on the join you choose, for example when the columns you're joining on contain NULL values.
So for each real-life scenario there is a join that suits it (in the NULL example, it depends on whether or not you want the rows that have no matching data).
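To illustrate the NULL case, here is a small sketch with made-up emp and dept tables in an in-memory SQLite database. A NULL join key never compares equal to anything, not even to another NULL, so the inner join drops that row while the left join keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER);       -- hypothetical
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT); -- hypothetical
    INSERT INTO emp VALUES ('ann', 1), ('bob', NULL);
    INSERT INTO dept VALUES (1, 'ops'), (NULL, 'unassigned');
""")

# NULL = NULL is not true, so bob matches nothing, not even the NULL dept row.
inner_rows = conn.execute("""
    SELECT emp.name, dept.dept_name
    FROM emp INNER JOIN dept ON emp.dept_id = dept.dept_id
""").fetchall()

# The left join still returns bob, with NULL for the dept columns.
left_rows = conn.execute("""
    SELECT emp.name, dept.dept_name
    FROM emp LEFT JOIN dept ON emp.dept_id = dept.dept_id
""").fetchall()

print(inner_rows)
print(left_rows)
```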

My answer assumes 2 tables joined on a single key:
INNER JOIN - get the results that are in both join tables (according to the join rule)
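FULL OUTER JOIN - get all rows from both tables, matched where the join rule holds and padded with NULLs where it doesn't (this is not a Cartesian product; that is what CROSS JOIN returns)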
LEFT OUTER JOIN - get all the results from left table and the matching results from the right
You can add WHERE clauses in order to further constrain the results.
Use these in order to only get what you want to get.

Related

How do I remove duplicate rows on returned related data in SQL?

I have a query that joins many related tables together and as a result, it returns duplicate rows for those items with multiple data against them.
I've searched for answers to this on stack and via google but all the results show things like using 'DISTINCT' or creating a subquery. I can't get any solution to work and I think the confusion I face is because of the many joins I have.
Can someone guide me on how to stop my results showing duplicates? Here is my query so far.
SELECT dbo.Vessel.VesselId,
dbo.Vessel.Name,
dbo.Capacity.DeckAreaM2,
dbo.Vessel.DPClassId,
dbo.Subsea.Accomodation,
dbo.Subsea.RovHangar,
dbo.Crane.SWL
FROM dbo.Vessel
INNER JOIN dbo.Capacity ON dbo.Vessel.VesselId = dbo.Capacity.VesselId
LEFT OUTER JOIN dbo.DeckEquipment ON dbo.Vessel.VesselId = dbo.DeckEquipment.VesselId
LEFT OUTER JOIN dbo.Crane ON dbo.DeckEquipment.DeckEquipmentId = dbo.Crane.DeckEquipmentId
LEFT OUTER JOIN dbo.Subsea ON dbo.Vessel.VesselId = dbo.Subsea.VesselId
First of all, does your query even work? There's no such thing as LEFT INNER JOIN ; you may have an INNER JOIN, a LEFT JOIN, or a LEFT OUTER JOIN, with the latter two being the same.
Second, I can understand your non-willingness to make an additional subquery, but why are you against the DISTINCT operator?
Third, if you use a GROUP BY and put there ONLY the fields you want, it will be equivalent to a DISTINCT operator and will return the results you need.
Last but not least, you need to show us what you are getting and what you want instead, if we are to be able to help you more.
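To make the duplicate-row effect concrete, here is a minimal sketch with a cut-down, hypothetical version of the schema (just vessel and crane, with invented columns and data). A vessel with two cranes comes back twice from the join; DISTINCT or GROUP BY collapses it again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for dbo.Vessel and dbo.Crane (assumed shape).
    CREATE TABLE vessel (vessel_id INTEGER, name TEXT);
    CREATE TABLE crane (crane_id INTEGER, vessel_id INTEGER, swl INTEGER);
    INSERT INTO vessel VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO crane VALUES (10, 1, 50), (11, 1, 100);
""")

join_sql = "FROM vessel v LEFT JOIN crane c ON c.vessel_id = v.vessel_id"

# One output row per (vessel, crane) pair: Alpha appears twice.
plain = conn.execute(f"SELECT v.vessel_id, v.name {join_sql}").fetchall()

# DISTINCT removes rows that are identical in every selected column.
distinct = conn.execute(
    f"SELECT DISTINCT v.vessel_id, v.name {join_sql}"
).fetchall()

# GROUP BY gives one row per vessel and lets you summarise the cranes.
grouped = conn.execute(
    f"SELECT v.vessel_id, v.name, MAX(c.swl) {join_sql} "
    "GROUP BY v.vessel_id, v.name"
).fetchall()

print(plain)
print(distinct)
print(grouped)
```

Note that DISTINCT only helps when the selected columns are identical across the repeated rows. If the select list includes a column from the "many" side, such as the crane's SWL, the rows genuinely differ and an aggregate like MAX is needed to get one row per vessel.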

Purpose of joins in structured query language

What is the purpose of joins if we can collect data from multiple tables through
SELECT table1.a, table2.b FROM table1, table2 ...
The syntax you've shown is in fact a join. It's called an implicit join. The join syntax is called an explicit join, and has the same effect, with a few advantages:
It's the standard, ANSI, way of doing things
As the name says, it's explicit - it's easier to understand where the join is, and to separate the join conditions (in the on clause) from the logical conditions.
It's easier to specify different types of joins (inner/outer/cross), which some databases may not allow in the implicit form, at least not with a standard syntax.
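A quick way to convince yourself that the two spellings are the same join is to run both. The sketch below assumes table1 and table2 each have an id column to join on (the question does not show one) and uses an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The id columns and the data are assumptions for this example.
    CREATE TABLE table1 (id INTEGER, a TEXT);
    CREATE TABLE table2 (id INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1, 'p'), (3, 'q');
""")

# Implicit join: comma in FROM, join condition mixed into WHERE.
implicit = conn.execute("""
    SELECT table1.a, table2.b
    FROM table1, table2
    WHERE table1.id = table2.id
""").fetchall()

# Explicit join: the join condition lives in ON.
explicit = conn.execute("""
    SELECT table1.a, table2.b
    FROM table1 INNER JOIN table2 ON table1.id = table2.id
""").fetchall()

# Forgetting the WHERE in the implicit form silently yields every pairing.
no_condition = conn.execute(
    "SELECT table1.a, table2.b FROM table1, table2"
).fetchall()

print(implicit, explicit, len(no_condition))
```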
A JOIN allows you to return all or selected data from multiple tables as a single result set. Running a separate SELECT per table would leave you with multiple datasets rather than a single source.
Joins are useful for bringing data together from different tables based on their database relations.
This article covers the topic in more detail: http://www.codeproject.com/Articles/435694/Understanding-Table-Joins-using-SQL
Comma is from before explicit JOIN syntax. It is a cross join of two tables: all possible combinations of a row from each. WHERE keeps only rows that meet its condition. Given comma, WHERE and nested SELECTs, the alternative spellings CROSS JOIN and JOIN with another condition in ON are not needed.
However, an OUTER JOIN ON a condition returns the rows from comma, restricted WHERE-style to those satisfying the condition, plus the unmatched rows from the left table (LEFT JOIN), the right table (RIGHT JOIN) or both (FULL JOIN), each extended with NULLs for the columns from the other table. This requires specifying a match condition (which determines the unmatched rows that get NULL-extended) separately from later restriction via WHERE.
But as long as an ON condition is added for OUTER JOINs, one might as well allow it for inner joins. It gives the same result as an outer join ON that condition, less any unmatched rows. That is the same as doing a cross join and then (like WHERE) restricting per the condition. So (INNER) JOIN is a comma with its own condition from ON, and CROSS JOIN is comma with no ON. (CROSS JOIN is like (INNER) JOIN ON 1=1.) (Also, comma has lower precedence than the explicit non-comma joins.)
TL;DR Comma is low-precedence JOIN; we don't need explicit INNER JOIN or ON; OUTER JOINs need ON distinct from WHERE; we might as well then add explicit inner JOIN ON syntax.
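The reason outer joins need ON to be distinct from WHERE shows up as soon as a filter is moved between the two. This is a small sketch on invented customers and orders tables: the same predicate gives different results depending on where it is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);  -- hypothetical
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 2, 'open');
""")

# Predicate in ON: it is part of the match condition, so bob has no match
# and is kept with a NULL order id.
on_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
""").fetchall()

# Predicate in WHERE: it filters after the join, so bob's row is removed.
where_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'paid'
""").fetchall()

print(on_rows)
print(where_rows)
```

For an inner join the two placements return the same rows, which is why the comma-plus-WHERE form was enough before outer joins were added.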

Is the order of joining tables indifferent as long as we chose proper join types?

Can we achieve desired results of joining tables by executing joins in whatever order? Suppose we want to left join two tables A and B (order AB). We can get the same results with right join of B and A (BA).
What about 3 tables ABC. Can we get whatever results by only changing order and joins types? For example A left join B inner join C. Can we get it with BAC order? What about if we have 4 or more tables?
Update.
The question "Does the join order matter in SQL?" is about the inner join type. Agreed that in that case the order of the joins doesn't matter. The answer given there does not answer my question: is it possible to get the desired result of joining tables with whatever original join types (join types here) by choosing whatever table order we like, achieving this only by changing the join types?
In an inner join, the ordering of the tables in the join doesn't matter - the same rows will make up the result set regardless of the order they are in the join statement.
In either a left or right outer join, the order DOES matter. In A left join B, your result set will contain one row for every record in table A, irrespective of whether there is a matching row in table B. If there are non matching rows, this is likely to be a different result set to B left join A.
In a full outer join, the order again doesn't matter - rows will be produced for each row in each joined table no matter what their order.
Regarding A left join B vs B right join A - these will produce the same results. In simple cases with 2 tables, swapping the tables and changing the direction of the outer join will result in the same result set.
This will also apply to 3 or more tables if all of the outer joins are in the same direction - A left join B left join C will give the same set of results as C right join B right join A.
If you start mixing left and right joins, then you will need to start being more careful. There will almost always be a way to make an equivalent query with re-ordered tables, but at that point sub-queries or bracketing off expressions might be the best way to clarify what you are doing.
As another commenter states, using whatever makes your purpose most clear is usually the best option. The ordering of the tables in your query should make little or no difference performance wise, as the query optimiser should work this out (although the only way to be sure of this would be to check the execution plans for each option with your own queries and data).
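The two-table claims above are easy to verify. The sketch below uses two hypothetical tables a and b in an in-memory SQLite database and compares result sets, ignoring row order. RIGHT JOIN is not written out because SQLite versions before 3.39 reject it, but b LEFT JOIN a is the same thing as a RIGHT JOIN b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);  -- hypothetical tables
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

def rows(sql):
    """Run a query and return its rows as a set, so order is ignored."""
    return set(conn.execute(sql).fetchall())

# Inner join: swapping the operands changes nothing.
a_inner_b = rows("SELECT a.id, b.id FROM a INNER JOIN b ON a.id = b.id")
b_inner_a = rows("SELECT a.id, b.id FROM b INNER JOIN a ON a.id = b.id")

# Left join: swapping the operands changes which side is preserved.
a_left_b = rows("SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id")
b_left_a = rows("SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id")

print(a_inner_b == b_inner_a)
print(a_left_b, b_left_a)
```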

Inner join or something else?

I'm trying to get 2 results into 1 html table, the results of a user being kicked/banned. A UUID is a unique code for every user.
The UUID is stored in BAT_players.
The player name is also stored in BAT_players
There are 3 tables: BAT_players, BAT_ban and BAT_kick
I'm trying to get the history of a user in a html table, this includes kicks and bans. Right now there are only bans in this history, I'm trying to add kicks too. This query is working fine, it shows only bans though.
SELECT BAT_ban.ban_staff, BAT_ban.ban_state, BAT_ban.ban_server, BAT_ban.ban_begin, BAT_ban.ban_end, BAT_ban.ban_id, BAT_ban.ban_reason, BAT_players.BAT_player, ban_soort
FROM BAT_players
INNER JOIN BAT_ban
ON BAT_ban.UUID=BAT_players.UUID
Unfortunately it is not working with this query, it's giving me an empty history. What am I doing wrong with the second inner join?
SELECT BAT_ban.ban_staff, BAT_ban.ban_state, BAT_ban.ban_server, BAT_ban.ban_begin, BAT_ban.ban_end, BAT_ban.ban_id, BAT_kick.kick_id, BAT_ban.ban_reason, BAT_players.BAT_player, ban_soort
FROM BAT_players
INNER JOIN BAT_ban
ON BAT_ban.UUID=BAT_players.UUID
INNER JOIN BAT_kick
ON BAT_kick.UUID=BAT_players.UUID ORDER BY ban_id DESC
;
Thanks!
The problem is that an INNER JOIN drops any row that has no match in the joined table, so a player who is missing from either BAT_ban or BAT_kick is eliminated from the result set.
One solution (perhaps the best solution) is to use left joins.
Another is to take the UNION of two inner joins.
Here's a great link to help visualize INNER, OUTER, LEFT and RIGHT joins:
http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
Try using LEFT JOIN:
SELECT BAT_ban.ban_staff, BAT_ban.ban_state, BAT_ban.ban_server, BAT_ban.ban_begin, BAT_ban.ban_end, BAT_ban.ban_id, BAT_kick.kick_id, BAT_ban.ban_reason, BAT_players.BAT_player, ban_soort
FROM BAT_players
LEFT JOIN BAT_ban ON BAT_ban.UUID=BAT_players.UUID
LEFT JOIN BAT_kick ON BAT_kick.UUID=BAT_players.UUID
ORDER BY ban_id DESC
To replace NULL values with empty strings (for SQL Server):
SELECT ISNULL(BAT_ban.ban_staff,''), ISNULL(BAT_ban.ban_state,''), ISNULL(BAT_ban.ban_server,''), ISNULL(BAT_ban.ban_begin,''), ISNULL(BAT_ban.ban_end,''), ISNULL(BAT_ban.ban_id,''), ISNULL(BAT_kick.kick_id,''), ISNULL(BAT_ban.ban_reason,''), BAT_players.BAT_player, ban_soort
FROM BAT_players
LEFT JOIN BAT_ban ON BAT_ban.UUID=BAT_players.UUID
LEFT JOIN BAT_kick ON BAT_kick.UUID=BAT_players.UUID
ORDER BY ban_id DESC
For MySQL, replace ISNULL with IFNULL.
An INNER JOIN requires a value in both tables, so if you use it for both kicks and bans, you will only see users who have been both kicked and banned.
To show users who have been either kicked or banned, you need to change both joins to LEFT JOIN, which essentially means "join if you can, but don't discard rows if you can't". That will include users who have been neither kicked nor banned, so you will also want an extra condition in the WHERE clause saying BAT_ban.ban_id IS NOT NULL OR BAT_kick.kick_id IS NOT NULL.
Note that where a user has multiple bans and multiple kicks, this will produce a row for every combination of ban and kick, since there is no rule to determine which ban should line up with which kick.
An alternative is to write two queries, each using INNER JOIN, and then combine the results. If you give them the same number and type of output columns (leaving NULL for those which aren't applicable) you can use UNION to run both and return the complete result set in one go.
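Here is a sketch of that UNION approach, next to the double INNER JOIN from the question for comparison. The table names come from the question, but the column set is cut down and partly assumed (kick_reason in particular is a guess), and the sample data is invented. It runs against an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, partly assumed versions of the question's tables.
    CREATE TABLE BAT_players (UUID TEXT, BAT_player TEXT);
    CREATE TABLE BAT_ban (ban_id INTEGER, UUID TEXT, ban_reason TEXT);
    CREATE TABLE BAT_kick (kick_id INTEGER, UUID TEXT, kick_reason TEXT);
    INSERT INTO BAT_players VALUES ('u1', 'alice'), ('u2', 'bob'), ('u3', 'carol');
    INSERT INTO BAT_ban VALUES (1, 'u1', 'spam'), (2, 'u1', 'abuse');
    INSERT INTO BAT_kick VALUES (1, 'u1', 'afk'), (2, 'u1', 'flood'), (3, 'u2', 'afk');
""")

# Two inner joins: only players with both bans and kicks survive, and each
# ban is paired with each kick (2 bans x 2 kicks = 4 rows for alice).
both_inner = conn.execute("""
    SELECT p.BAT_player, b.ban_id, k.kick_id
    FROM BAT_players p
    INNER JOIN BAT_ban b ON b.UUID = p.UUID
    INNER JOIN BAT_kick k ON k.UUID = p.UUID
""").fetchall()

# UNION of two inner joins with matching column lists: one row per event.
history = conn.execute("""
    SELECT p.BAT_player, 'ban' AS kind, b.ban_id AS event_id, b.ban_reason AS reason
    FROM BAT_players p INNER JOIN BAT_ban b ON b.UUID = p.UUID
    UNION
    SELECT p.BAT_player, 'kick', k.kick_id, k.kick_reason
    FROM BAT_players p INNER JOIN BAT_kick k ON k.UUID = p.UUID
""").fetchall()

print(both_inner)
print(history)
```

The extra kind column is a design choice for this sketch: it tells the page rendering the html table whether a row is a ban or a kick, since both halves share the same columns.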

When to use SQL natural join instead of join .. on?

I'm studying SQL for a database exam and the way I've seen SQL written is the way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
I.e. joins written as Join <table name> On <table attribute>, followed by the join condition for the selection. My course book and the exercises given to me by the academic institution, however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON?
Thanks for any answer or comment
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins are a leftover of some old standards, and I try to avoid them like the plague.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is at least one match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return all rows from both tables, matching them up where possible and filling in NULLs where there is no match
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp
A natural join is said to be an abomination because it does not allow qualifying key columns, which makes it confusing: you never know which "common" columns are being used to join two tables simply by looking at the SQL statement.
A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins are often interchangeable and usually produce the same results. However, there are some important considerations to make:
If a NATURAL JOIN finds no columns with matching names, it returns the cross product. This could produce disastrous results if the schema is modified. On the other hand, an INNER JOIN whose ON clause names a column that no longer exists will return a 'column does not exist' error. Failing loudly like this is much safer.
An INNER JOIN self-documents with its ON clause, resulting in a clearer query that describes the table schema to the reader.
An INNER JOIN results in a maintainable and reusable query in which the column names can be swapped in and out with changes in the use case or table schema.
The programmer can notice column name mis-matches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.
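The schema-change risk can be reproduced in a few lines. This is a sketch using invented items and prices tables in an in-memory SQLite database; prices_renamed stands in for the same table after someone renames item_id to itemID. The exact error text is SQLite's and will differ on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; prices_renamed simulates a later column rename.
    CREATE TABLE items (item_id INTEGER, name TEXT);
    CREATE TABLE prices (item_id INTEGER, price_cents INTEGER);
    CREATE TABLE prices_renamed (itemID INTEGER, price_cents INTEGER);
    INSERT INTO items VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO prices VALUES (1, 150), (2, 400);
    INSERT INTO prices_renamed VALUES (1, 150), (2, 400);
""")

# Shared column name item_id: NATURAL JOIN matches on it and emits it once.
cur = conn.execute("SELECT * FROM items NATURAL JOIN prices")
natural_cols = [d[0] for d in cur.description]
natural_rows = cur.fetchall()

# No shared column names any more: the same query shape silently becomes
# a cross product (2 x 2 = 4 rows) instead of failing.
cross_rows = conn.execute(
    "SELECT * FROM items NATURAL JOIN prices_renamed"
).fetchall()

# An explicit ON that still names the old column fails loudly instead.
try:
    conn.execute(
        "SELECT * FROM items INNER JOIN prices_renamed "
        "ON items.item_id = prices_renamed.item_id"
    )
    stale_on_error = None
except sqlite3.OperationalError as exc:
    stale_on_error = str(exc)

print(natural_cols, natural_rows)
print(len(cross_rows))
print(stale_on_error)
```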