Need help understanding JOINs in SQL

I was asked the below SQL question in an interview. Kindly explain how it works and what join it is.
Q: There are two tables: table emp contains 10 rows and table department contains 12 rows.
Select * from emp,department;
What is the result and what join it is?

It would return the Cartesian Product of the two tables, meaning that every combination of emp and department would be included in the result.
I believe that the next question would be:
How do you show the correct department for each employee?
That is, show only the combination of emp and department where the employee belongs to the department.
This can be done by:
SELECT * FROM emp LEFT JOIN department ON emp.department_id=department.id;
This assumes that emp has a field called department_id and that department has a matching id field (quite standard in this type of question).
The LEFT JOIN means that all rows from the left side (emp) will be included, and each employee will be matched with the corresponding department. If no matching department is found, the fields coming from department will be NULL. Note that exactly 10 rows will be returned, provided department.id is unique.
To show only the employees with valid department IDs, use JOIN (an inner join) instead of LEFT JOIN. This query will return 0 to 10 rows, depending on how many employees have a matching department id.
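To make the row counts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample data is hypothetical: eight employees point at a real department, one has no department_id, and one points at a department id that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)")

# 12 departments, as in the interview question.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(i, f"dept{i}") for i in range(1, 13)])

# 10 employees (hypothetical data): 8 valid ids, 1 NULL, 1 dangling id.
emps = [(i, f"emp{i}", i) for i in range(1, 9)]
emps += [(9, "emp9", None), (10, "emp10", 99)]
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", emps)

left_rows = conn.execute(
    "SELECT emp.name, department.name FROM emp "
    "LEFT JOIN department ON emp.department_id = department.id"
).fetchall()

inner_rows = conn.execute(
    "SELECT emp.name, department.name FROM emp "
    "JOIN department ON emp.department_id = department.id"
).fetchall()

# Employees kept by the LEFT JOIN whose department columns came back NULL.
unmatched = [row for row in left_rows if row[1] is None]

print(len(left_rows), len(inner_rows), len(unmatched))
```

With this data the LEFT JOIN keeps every employee, while the inner join drops the two employees that have no matching department.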

The join you specified is a cross join. It will produce one row for each combination of records in the tables being joined.
I'll let you do the math from there.

This will do a cross join I believe, returning 120 rows. One row for each pair-wise combination of rows from each of the two tables.
All-in-all a fairly useless join most of the time.

You will get all rows from both tables with each row joined together.
This is known as a Cartesian join and is very bad.
You will get a total of 120 rows.
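A quick way to check the arithmetic is to build two throwaway tables and count. This is a sketch in Python with sqlite3; the column names and the generated row values are made up for illustration. It also runs the explicit CROSS JOIN form to show that it returns the same rows as the comma syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")

# 10 employees and 12 departments, matching the question.
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(i, f"dept{i}") for i in range(1, 13)])

# Old comma syntax: no join condition, so every pairing is produced.
implicit_rows = conn.execute("SELECT * FROM emp, department").fetchall()

# Explicit syntax that states the intent.
explicit_rows = conn.execute("SELECT * FROM emp CROSS JOIN department").fetchall()

print(len(implicit_rows), len(explicit_rows))  # 120 120
```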

This is also the old implied-join syntax (18 years out of date), and accidental cross joins are a common problem with it. One should never use it; explicit joins are a better choice. I would have mentioned this in the interview and explained why. I also would not have taken the job if they actually used crappy syntax like this, because its very use tells me the database is very likely to be poorly designed.

Related

How To Select data from multiple tables with grouping for duplicates

I have two tables, one with employee details and another with the vacations taken by them in different years. Please check this image for the tables
As you'll see in the vacation table, the same employee (same EmployeeId) can have several vacation entries in the same year. For example, John Smith has two entries for 2011, one with 10 vacation days and one with 3. I want my query to return a single row for him with the vacation shown as 13.
I tried the following query but no luck
SELECT Employee_Details.EmployeeId, Employee_Details.EmployeeName, Employees_Vacation.Year, Employees_Vacation.Vacation, Employee_Details.Department
FROM Employees_Vacation INNER JOIN Employee_Details ON Employees_Vacation.EmployeeId=Employee_Details.EmployeeId group by Employee_Details.EmployeeId ORDER BY Employee_Details.EmployeeName, Employees_Vacation.Year ;
If I understood you correctly, I think this may help:
select sum(ev.Vacation) as TotalVacation, ev.Year, ed.EmployeeName
from Employee_Details as ed inner join Employees_Vacation as ev
on ed.EmployeeId = ev.EmployeeId
group by ev.Year, ed.EmployeeName
A lot here will depend on the SQL engine you are using; however, there are some things to consider that apply regardless of the engine:
Your current GROUP BY clause groups only by EmployeeId. From the question text it seems you are instead looking for results grouped by employee AND vacation year.
Your projection (the SELECT list) currently isn't aggregating anything; it's just projecting a bunch of fields. On some DB engines this isn't even allowed (SQL Server, for example, will only allow grouped or aggregated columns in the projection). Again, from the question text it seems you are looking for the SUM of vacation days per employee and year.
Taking these into account and assuming the assumptions made are accurate, something like the following should work in most/all modern RDBMS's:
SELECT Employee_Details.EmployeeId,
Employee_Details.EmployeeName,
Employees_Vacation.Year,
SUM(Employees_Vacation.Vacation) AS TotalVacationDays,
Employee_Details.Department
FROM Employees_Vacation
INNER JOIN Employee_Details
ON Employees_Vacation.EmployeeId = Employee_Details.EmployeeId
GROUP BY
Employee_Details.EmployeeId, Employee_Details.EmployeeName,
Employees_Vacation.Year, Employee_Details.Department
ORDER BY
Employee_Details.EmployeeName,
Employee_Details.EmployeeId,
Employees_Vacation.Year;
You may be able to get away with fewer grouping columns in some engines (MySQL, for example). Additionally, I added EmployeeId to the ORDER BY clause to ensure records for the same employee stay together in the results (for employees who share a name, for example).
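As a sanity check, the grouped query above can be run against a small in-memory SQLite database from Python. Only the John Smith 2011 entries (10 and 3) come from the question; the other rows, the departments, and the column types are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee_Details (
    EmployeeId INTEGER PRIMARY KEY, EmployeeName TEXT, Department TEXT);
CREATE TABLE Employees_Vacation (
    EmployeeId INTEGER, Year INTEGER, Vacation INTEGER);

-- Hypothetical sample rows; John Smith has two 2011 entries (10 and 3).
INSERT INTO Employee_Details VALUES (1, 'John Smith', 'IT'), (2, 'Jane Doe', 'HR');
INSERT INTO Employees_Vacation VALUES
    (1, 2011, 10), (1, 2011, 3), (1, 2012, 5), (2, 2011, 7);
""")

vacation_totals = conn.execute("""
SELECT Employee_Details.EmployeeId,
       Employee_Details.EmployeeName,
       Employees_Vacation.Year,
       SUM(Employees_Vacation.Vacation) AS TotalVacationDays,
       Employee_Details.Department
FROM Employees_Vacation
INNER JOIN Employee_Details
        ON Employees_Vacation.EmployeeId = Employee_Details.EmployeeId
GROUP BY Employee_Details.EmployeeId, Employee_Details.EmployeeName,
         Employees_Vacation.Year, Employee_Details.Department
ORDER BY Employee_Details.EmployeeName,
         Employee_Details.EmployeeId,
         Employees_Vacation.Year
""").fetchall()

for row in vacation_totals:
    print(row)
```

The two 2011 entries for John Smith collapse into one row with a total of 13, and his 2012 entry stays on its own row.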

SQL Remove Duplicate Rows of Data from Query Result

I am still learning the ropes of SQL so I have run into my first obstacle. I am to create an SQL query that retrieves employee.firstname, employee.lastname, dependents.depname, and dependents.birthday from the two tables employees and dependents.
I am only supposed to show an employee if he or she has a dependent.
My primary table (employee; only the first 43 rows): employee table
My secondary table (dependents): dependents table
This is what I have so far:
SELECT
employee.firstname, employee.lastname,
dependents.depname, dependents.birthday
FROM
employee
INNER JOIN
dependents ON employee.id = dependents.empid
This works fine however I run into many duplicate rows of data:
Original Query
This is not the full query result but I think it provides sufficient evidence of my problem.
I used the DISTINCT keyword with my SELECT statement, but it only retrieved a small number of my dependents.
Adding DISTINCT
Do you already have duplicate rows in one of the tables, employee or dependents? The second result looks correct. With SELECT DISTINCT the database removes all duplicate rows from the result set.
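Here is a small sketch of that situation in Python with sqlite3. The names and dates are invented, and the assumption is that one dependent was entered twice in the dependents table: the inner join then repeats that row, and DISTINCT collapses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE dependents (empid INTEGER, depname TEXT, birthday TEXT);

-- Hypothetical data: Cy has no dependents, and Max was entered twice.
INSERT INTO employee VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Ray');
INSERT INTO dependents VALUES
    (1, 'Max', '2010-05-01'),
    (1, 'Max', '2010-05-01'),
    (2, 'Ida', '2012-09-15');
""")

join_sql = """
SELECT {distinct} employee.firstname, employee.lastname,
       dependents.depname, dependents.birthday
FROM employee
INNER JOIN dependents ON employee.id = dependents.empid
"""

join_rows = conn.execute(join_sql.format(distinct="")).fetchall()
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(len(join_rows), len(distinct_rows))
```

If DISTINCT shrinks the result a lot, that usually points at duplicated source rows rather than at a broken join.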

Ms-Access: counting from 2 tables

I have two tables in a Database
and
I need to retrieve the number of staff per manager in the following format
I've been trying to adapt an answer to another question
SELECT bankNo AS "Bank Number",
COUNT (*) AS "Total Branches"
FROM BankBranch
GROUP BY bankNo
As
SELECT COUNT (*) AS StaffCount ,
Employee.Name AS Name
FROM Employee, Stafflink
GROUP BY Name
As I look at the Group BY I'm thinking I should be grouping by The ManID in the Stafflink Table.
My output with this query looks like this
So it is counting correctly but as you can see it's far off the output I need to get.
Any advice would be appreciated.
You need to join the Employee and Stafflink tables. It appears that your FROM clause should look like this:
FROM Employee INNER JOIN StaffLink ON Employee.ID = StaffLink.ManID
You have to join the Employee table twice to get the number of employees under each manager:
select count(*) as StaffCount,Manager.Name
from Employee join Stafflink on employee.Id = StaffLink.EmpId
join Employee as Manager on StaffLink.ManId = Manager.Id
Group by Manager.Name
The answers that advise you on how to join are correct, assuming that you want to learn how to use SQL in MS Access. But there is a way to accomplish the same thing using the ACCESS GUI for designing queries, and this involves a shorter learning curve than learning SQL.
The key to using the GUI when more than one table is involved is to realize that you have to define the relationships between tables in the relationship manager. Once you do that, designing the query you are after is a piece of cake, just point and click.
The tricky thing in your case is that there are two relationships between the two tables. One relationship links EmpId to ID and the other links ManId to ID.
If, however, you want to learn SQL, then this shortcut will be a digression.
If you don't specify a join between the tables, a so-called Cartesian product will be built, i.e., each record from one table will be paired with every record from the other table. If you have 7 records in one table and 10 in the other, you will get 70 pairs (i.e. rows) before grouping. This explains why you are getting a count of 7 per manager name.
Besides joining the tables, I would suggest grouping on the manager id instead of the manager name. The manager id is known to be unique per manager, but the name is not. This then requires you either to group on the name as well, because the name is in the select list, or to apply an aggregate function to the name. Each additional grouping column slows down the query; therefore I prefer the aggregate function.
SELECT
COUNT(*) AS StaffCount,
FIRST(Manager.Name) AS ManagerName
FROM
Stafflink
INNER JOIN Employee AS Manager
ON StaffLink.ManId = Manager.Id
GROUP BY
StaffLink.ManId
I don't know if it makes a performance difference, but I prefer to group on StaffLink.ManId rather than on Employee.Id, since StaffLink is the main table here and Employee is just used as a lookup table in this query.
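The difference between the unjoined and the joined query can be reproduced outside Access. The sketch below uses Python with sqlite3; the table contents are invented, and because SQLite has no FIRST() aggregate, MIN(Manager.Name) stands in for it (an assumption that is safe here only because every row in a ManId group carries the same manager name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE StaffLink (EmpId INTEGER, ManId INTEGER);

-- Hypothetical data: Alice manages Carol and Dave, Bob manages Eve.
INSERT INTO Employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave'), (5, 'Eve');
INSERT INTO StaffLink VALUES (3, 1), (4, 1), (5, 2);
""")

# No join condition: every employee is paired with every StaffLink row,
# so each name gets a count equal to the number of StaffLink rows.
cross_counts = conn.execute("""
SELECT COUNT(*) AS StaffCount, Employee.Name AS Name
FROM Employee, StaffLink
GROUP BY Employee.Name
ORDER BY Employee.Name
""").fetchall()

# Joined on the manager id and grouped by it.
manager_counts = conn.execute("""
SELECT COUNT(*) AS StaffCount, MIN(Manager.Name) AS ManagerName
FROM StaffLink
INNER JOIN Employee AS Manager ON StaffLink.ManId = Manager.ID
GROUP BY StaffLink.ManId
ORDER BY StaffLink.ManId
""").fetchall()

print(cross_counts)
print(manager_counts)
```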

Inner join sql statement

I have two tables, Invoices and members, connected by PK/FK relationship through the field InvoiceNum. I have created the following sql and it works fine, and pulls 44 records as expected.
SELECT
INVOICES.InvoiceNum,
INVOICES.GroupNum,
INVOICES.DivisionNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
WHERE MEMBERS.MemberNum = '20032526000'
Now, I want to replace INVOICES.GroupNum and INVOICES.DivisionNum in the above query with GroupName and DivisionName. These values are present in the Groups and Divisions tables which also have the corresponding Group_num and Division_num fields. I have created the following sql. The problem is that it now pulls 528 records instead of 44!
SELECT
INVOICES.InvoiceNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo,
DIVISIONS.DIVISION_NAME,
GROUPS.GROUP_NAME
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
INNER JOIN GROUPS ON INVOICES.GroupNum = GROUPS.Group_Num
INNER JOIN DIVISIONS ON INVOICES.DivisionNum = DIVISIONS.Division_Num
WHERE MEMBERS.MemberNum = '20032526000'
Any help is greatly appreciated.
You have at least one relation between your tables which is missing in your query. It gives you extra records. Find all common fields. Say, are divisions related to groups?
The statement is fine, as far as the SQL syntax goes.
But the question you have to ask yourself (and answer it):
How many rows in Groups do you get for that given GroupNum?
Ditto for Divisions - how many rows exist for that DivisionNum?
It would appear that those numbers aren't unique - multiple rows exist for each number - therefore you get multiple rows returned
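One way to test that theory is to count rows per key in each lookup table. The sketch below (Python with sqlite3) uses invented data and leaves out the MEMBERS table for brevity. It assumes, purely for illustration, that each group number appears 3 times and each division number 4 times, which gives the same factor of 12 as 528 / 44; the real split may differ. The de-duplicating subqueries at the end are only a possible workaround if the repeated rows are exact copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE INVOICES (InvoiceNum INTEGER, GroupNum TEXT, DivisionNum TEXT);
CREATE TABLE "GROUPS" (Group_Num TEXT, GROUP_NAME TEXT);
CREATE TABLE DIVISIONS (Division_Num TEXT, DIVISION_NAME TEXT);

-- Hypothetical data: lookup keys are repeated instead of unique.
INSERT INTO INVOICES VALUES (1, 'G1', 'D1'), (2, 'G1', 'D1');
INSERT INTO "GROUPS" VALUES ('G1', 'Acme'), ('G1', 'Acme'), ('G1', 'Acme');
INSERT INTO DIVISIONS VALUES ('D1', 'East'), ('D1', 'East'), ('D1', 'East'), ('D1', 'East');
''')

joined_count = conn.execute('''
SELECT COUNT(*)
FROM INVOICES
INNER JOIN "GROUPS" ON INVOICES.GroupNum = "GROUPS".Group_Num
INNER JOIN DIVISIONS ON INVOICES.DivisionNum = DIVISIONS.Division_Num
''').fetchone()[0]

# Diagnostic: which keys occur more than once in each lookup table?
dup_groups = conn.execute('''
SELECT Group_Num, COUNT(*) FROM "GROUPS"
GROUP BY Group_Num HAVING COUNT(*) > 1
''').fetchall()
dup_divisions = conn.execute('''
SELECT Division_Num, COUNT(*) FROM DIVISIONS
GROUP BY Division_Num HAVING COUNT(*) > 1
''').fetchall()

# Possible workaround when the repeats are exact copies: join to distinct rows.
deduped_count = conn.execute('''
SELECT COUNT(*)
FROM INVOICES
INNER JOIN (SELECT DISTINCT Group_Num, GROUP_NAME FROM "GROUPS") AS g
        ON INVOICES.GroupNum = g.Group_Num
INNER JOIN (SELECT DISTINCT Division_Num, DIVISION_NAME FROM DIVISIONS) AS d
        ON INVOICES.DivisionNum = d.Division_Num
''').fetchone()[0]

print(joined_count, dup_groups, dup_divisions, deduped_count)
```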

How do I remove "duplicate" rows from a view?

I have a view which was working fine when I was joining my main table:
LEFT OUTER JOIN OFFICE ON CLIENT.CASE_OFFICE = OFFICE.TABLE_CODE
However I needed to add the following join:
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
Although I added DISTINCT, I still get a "duplicate" row. I say "duplicate" because the second row has a different value.
However, if I change the LEFT OUTER to an INNER JOIN, I lose all the rows for the clients who have these "duplicate" rows.
What am I doing wrong? How can I remove these "duplicate" rows from my view?
Note:
This question is not applicable in this instance:
How can I remove duplicate rows?
DISTINCT won't help you if the rows have any columns that are different. Obviously, one of the tables you are joining to has multiple rows for a single row in another table. To get one row back, you have to eliminate the other multiple rows in the table you are joining to.
The easiest way to do this is to enhance your where clause or JOIN restriction to only join to the single record you would like. Usually this requires determining a rule which will always select the 'correct' entry from the other table.
Let us assume you have a simple problem such as this:
Person: Jane
Pets: Cat, Dog
If you create a simple join here, you would receive two records for Jane:
Jane|Cat
Jane|Dog
This is completely correct if the point of your view is to list all of the combinations of people and pets. However, if your view was instead supposed to list people with pets, or list people and display one of their pets, you hit the problem you have now. For this, you need a rule.
SELECT Person.Name, pets1.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID)
FROM Pets pets2
WHERE pets2.PersonID = pets1.PersonID
AND pets2.ID < pets1.ID);
What this does is apply a rule that restricts the Pets record in the join to the pet with the lowest ID (first in the Pets table). The WHERE clause essentially says "where there are no pets belonging to the same person with a lower ID value".
This would yield a one record result:
Jane|Cat
The rule you'll need to apply to your view will depend on the data in the columns you have, and which of the 'multiple' records should be displayed in the column. However, that will wind up hiding some data, which may not be what you want. For example, the above rule hides the fact that Jane has a Dog. It makes it appear as if Jane only has a Cat, when this is not correct.
You may need to rethink the contents of your view, and what you are trying to accomplish with your view, if you are starting to filter out valid data.
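The lowest-ID rule can be tried out with a few rows. This is a sketch in Python with sqlite3; the table layout (ID, PersonID, Name) follows the query above, and Sam, a person without pets, is an invented addition to show that the LEFT JOIN still keeps such rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Pets (ID INTEGER PRIMARY KEY, PersonID INTEGER, Name TEXT);

-- Jane owns a Cat and a Dog; Sam (hypothetical) owns nothing.
INSERT INTO Person VALUES (1, 'Jane'), (2, 'Sam');
INSERT INTO Pets VALUES (1, 1, 'Cat'), (2, 1, 'Dog');
""")

# Plain join: one row per person/pet combination.
all_pairs = conn.execute("""
SELECT Person.Name, pets1.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
ORDER BY Person.Name, pets1.ID
""").fetchall()

# Rule: keep only the pet that has no sibling with a lower ID.
first_pet = conn.execute("""
SELECT Person.Name, pets1.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID)
           FROM Pets pets2
           WHERE pets2.PersonID = pets1.PersonID
             AND pets2.ID < pets1.ID)
ORDER BY Person.Name
""").fetchall()

print(all_pairs)
print(first_pet)
```

Jane's Dog row disappears from the second result, which is exactly the data-hiding trade-off described above.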
So you added a left outer join that is matching two rows? I presume OFFICE_MIS.TABLE_CODE is not unique in that table. You need to restrict that join so it only grabs one row. It depends on which row you are looking for, but you can do something like this...
LEFT OUTER JOIN OFFICE_MIS ON
OFFICE_MIS.ID = /* whatever the primary key is? */
(select top 1 om2.ID
from OFFICE_MIS om2
where CLIENT.REFERRAL_OFFICE = om2.TABLE_CODE
order by om2.ID /* change the order to fit your needs */)
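The same idea can be tested in SQLite from Python, with LIMIT 1 standing in for TOP 1 (TOP is SQL Server / Access syntax). The CLIENT and OFFICE_MIS columns other than REFERRAL_OFFICE and TABLE_CODE, and all of the rows, are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLIENT (ID INTEGER PRIMARY KEY, NAME TEXT, REFERRAL_OFFICE TEXT);
CREATE TABLE OFFICE_MIS (ID INTEGER PRIMARY KEY, TABLE_CODE TEXT, DESCRIPTION TEXT);

-- Hypothetical data: code N1 appears twice in OFFICE_MIS.
INSERT INTO CLIENT VALUES (1, 'Acme', 'N1'), (2, 'Birch', 'S1'), (3, 'Cedar', NULL);
INSERT INTO OFFICE_MIS VALUES
    (1, 'N1', 'North main'), (2, 'N1', 'North annex'), (3, 'S1', 'South');
""")

# Joining on the non-unique code repeats the client.
plain_rows = conn.execute("""
SELECT CLIENT.NAME, OFFICE_MIS.DESCRIPTION
FROM CLIENT
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
ORDER BY CLIENT.ID, OFFICE_MIS.ID
""").fetchall()

# Joining on the key of a single chosen row keeps one row per client.
restricted_rows = conn.execute("""
SELECT CLIENT.NAME, OFFICE_MIS.DESCRIPTION
FROM CLIENT
LEFT OUTER JOIN OFFICE_MIS ON OFFICE_MIS.ID =
    (SELECT om2.ID
     FROM OFFICE_MIS om2
     WHERE om2.TABLE_CODE = CLIENT.REFERRAL_OFFICE
     ORDER BY om2.ID      -- change the order to pick a different row
     LIMIT 1)
ORDER BY CLIENT.ID
""").fetchall()

print(plain_rows)
print(restricted_rows)
```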
If the second row has even one different value, then it is not really a duplicate and should be included.
Instead of using DISTINCT, you could use a GROUP BY.
Group by all the fields that you want to be returned as unique values.
Use MIN/MAX/AVG or any other function to give you one result for fields that could return multiple values.
Example:
SELECT Office.Field1, Client.Field1, MIN(Office.Field2), MIN(Client.Field2)
FROM YourQuery
GROUP BY Office.Field1, Client.Field1
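A runnable variation of the GROUP BY idea, sketched in Python with sqlite3. The CLIENT and OFFICE_MIS rows and the NAME and DESCRIPTION columns are hypothetical. DISTINCT leaves both rows for the client whose office code matches twice, while grouping by the client and taking MIN of the varying column returns one row per client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLIENT (ID INTEGER PRIMARY KEY, NAME TEXT, REFERRAL_OFFICE TEXT);
CREATE TABLE OFFICE_MIS (ID INTEGER PRIMARY KEY, TABLE_CODE TEXT, DESCRIPTION TEXT);

-- Hypothetical data: code N1 appears twice with different descriptions.
INSERT INTO CLIENT VALUES (1, 'Acme', 'N1'), (2, 'Birch', 'S1'), (3, 'Cedar', NULL);
INSERT INTO OFFICE_MIS VALUES
    (1, 'N1', 'North main'), (2, 'N1', 'North annex'), (3, 'S1', 'South');
""")

# DISTINCT cannot merge rows that differ in any column.
distinct_rows = conn.execute("""
SELECT DISTINCT CLIENT.NAME, OFFICE_MIS.DESCRIPTION
FROM CLIENT
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
""").fetchall()

# GROUP BY the columns that must be unique; aggregate the one that varies.
grouped_rows = conn.execute("""
SELECT CLIENT.NAME, MIN(OFFICE_MIS.DESCRIPTION) AS OFFICE_DESCRIPTION
FROM CLIENT
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
GROUP BY CLIENT.ID, CLIENT.NAME
ORDER BY CLIENT.ID
""").fetchall()

print(len(distinct_rows))
print(grouped_rows)
```

Note that MIN picks the alphabetically smallest description, which may or may not be the row you actually want.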
You could try using DISTINCT TOP 1, but as Hunter pointed out, if even one column is different then the row should either be included or, if you don't care about or need that column, you should probably remove it from the query. Any other suggestions would probably require more specific info.
EDIT: When using DISTINCT TOP 1 you need to have an appropriate GROUP BY clause. You would really be relying on the TOP 1 part. The DISTINCT is in there because if there is a tie for TOP 1 you'll get an error unless you have some way to avoid the tie. The two most common ways I've seen are adding DISTINCT to TOP 1, or adding a unique column to the query so that SQL has a way to choose which record to pick in what would otherwise be a tie.