SQL Remove Duplicate Rows of Data from Query Result


I am still learning the ropes of SQL, and I have run into my first obstacle. I need to write an SQL query that retrieves employee.firstname, employee.lastname, dependents.depname, and dependents.birthday from the two tables employee and dependents.
I am only supposed to show an employee if he or she has a dependent.
My primary table (employee; only the first 43 rows): [screenshot: employee table]
My secondary table (dependents): [screenshot: dependents table]
This is what I have so far:
SELECT
employee.firstname, employee.lastname,
dependents.depname, dependents.birthday
FROM
employee
INNER JOIN
dependents ON employee.id = dependents.empid
This works fine; however, I get many duplicate rows of data:
[screenshot: Original Query result]
This is not the full query result, but I think it provides sufficient evidence of my problem.
I added the DISTINCT keyword to my SELECT statement, but then it retrieved only a small number of my dependents.
[screenshot: result after Adding DISTINCT]

Do you already have duplicates in either the employee table or the dependents table? The second result looks correct: with SELECT DISTINCT, the database removes all duplicate rows from the result set, so getting fewer rows back is expected.
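To check the suggestion above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The rows are hypothetical (the real tables were only shown as screenshots), and one dependent row is deliberately inserted twice. The sketch first lists duplicated dependent rows with GROUP BY ... HAVING COUNT(*) > 1, then shows that the INNER JOIN repeats the duplicate while SELECT DISTINCT collapses it.

```python
import sqlite3

# Hypothetical sample data standing in for the employee/dependents screenshots.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE dependents (empid INTEGER, depname TEXT, birthday TEXT);
INSERT INTO employee VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cy', 'Ray');
-- 'Max' is inserted twice by mistake, and employee 3 has no dependents
INSERT INTO dependents VALUES
    (1, 'Max', '2010-05-01'),
    (1, 'Max', '2010-05-01'),
    (2, 'Zoe', '2012-08-15');
""")

# 1) Find dependent rows that appear more than once in the base table.
dupes = conn.execute("""
    SELECT empid, depname, birthday, COUNT(*) AS copies
    FROM dependents
    GROUP BY empid, depname, birthday
    HAVING COUNT(*) > 1
""").fetchall()

# 2) The original INNER JOIN repeats each duplicated dependent row.
joined = conn.execute("""
    SELECT e.firstname, e.lastname, d.depname, d.birthday
    FROM employee AS e
    INNER JOIN dependents AS d ON e.id = d.empid
""").fetchall()

# 3) DISTINCT collapses identical result rows, so fewer rows come back.
distinct = conn.execute("""
    SELECT DISTINCT e.firstname, e.lastname, d.depname, d.birthday
    FROM employee AS e
    INNER JOIN dependents AS d ON e.id = d.empid
""").fetchall()

print(dupes)          # [(1, 'Max', '2010-05-01', 2)]
print(len(joined))    # 3
print(len(distinct))  # 2
```

If the duplicate check returns rows, the fix belongs in the data (delete the extra copies) rather than in the query.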

Related

MS ACCESS Query with junction table, for all items in one table, but not in another

To create a many-to-many relationship, I have three tables:
tblEmployee, contains employees
tlkpPermission, contains 11 different possible permission groups an employee can be part of
tblEmployeeXPermission, combines the EmployeeID with one or more PermissionID
What I’m trying to create is a query that shows what permission groups an employee is NOT part of.
So, if EmployeeID 12345 is associated with PermissionID 1,2,3,4,5, but NOT 6,7,8,9,10,11 (in the EmployeeXPermission table) then I want the query to show EmployeeID 12345 is not part of PermissionID 6,7,8,9,10,11.
Of all the JOINs and query options I have tried, I can only get a query to show which PermissionIDs an employee is associated with, not the PermissionIDs the employee is not associated with.
Any help would be appreciated.
Thanks

You need to start with all combinations of employees and permissions. That kind of join is a CROSS JOIN, but Access SQL does not support the CROSS JOIN keyword. Instead, use the older syntax: list your tables in the FROM clause, separated by commas, and put the join condition, if any, in the WHERE clause:
SELECT
    E.EmployeeID,
    P.PermissionID
FROM
    tblEmployee AS E,
    tlkpPermission AS P
WHERE NOT EXISTS (
    SELECT 1
    FROM tblEmployeeXPermission AS X
    WHERE X.EmployeeID = E.EmployeeID
      AND X.PermissionID = P.PermissionID
)
Here, the part up to the WHERE clause gives you every employee-permission combination, and the WHERE clause removes the combinations that already occur in tblEmployeeXPermission, leaving you with the ones you want.
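A minimal sketch of the same anti-join, run through Python's sqlite3 module rather than Access (SQLite also accepts a comma-separated FROM list, so the query carries over unchanged). The employee IDs and the four permission groups are hypothetical sample data; the question has 11 groups.

```python
import sqlite3

# Hypothetical data: 2 employees, 4 permission groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEmployee (EmployeeID INTEGER PRIMARY KEY);
CREATE TABLE tlkpPermission (PermissionID INTEGER PRIMARY KEY);
CREATE TABLE tblEmployeeXPermission (EmployeeID INTEGER, PermissionID INTEGER);
INSERT INTO tblEmployee VALUES (12345), (67890);
INSERT INTO tlkpPermission VALUES (1), (2), (3), (4);
INSERT INTO tblEmployeeXPermission VALUES (12345, 1), (12345, 2), (67890, 4);
""")

# The comma-separated FROM list pairs every employee with every permission;
# NOT EXISTS then drops the pairs already present in the junction table.
missing = conn.execute("""
    SELECT E.EmployeeID, P.PermissionID
    FROM tblEmployee AS E, tlkpPermission AS P
    WHERE NOT EXISTS (
        SELECT 1
        FROM tblEmployeeXPermission AS X
        WHERE X.EmployeeID = E.EmployeeID
          AND X.PermissionID = P.PermissionID
    )
    ORDER BY E.EmployeeID, P.PermissionID
""").fetchall()

print(missing)
# [(12345, 3), (12345, 4), (67890, 1), (67890, 2), (67890, 3)]
```

The result size is always (employees × permissions) minus the rows in the junction table, here 2 × 4 − 3 = 5.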

duplicated rows in select query

I am trying to run a query to get all the customers from my database. Here are my tables in a diagram: [diagram not available]
When I run the query joining the Companies_Customers table to the Customers table on customerId (present in both tables, although it doesn't show in the join table in the picture), I get duplicate rows, which is not the desired outcome.
This is normal from a database standpoint, since a customer can be related to different companies (companies can share a single customer).
My question is: how do I get rid of the duplication in SQL?

There are two approaches to your problem.
Either select data only from the Customers table:
SELECT * FROM Customers
Or select from both tables joined together, but without CompanyName and with GROUP BY CompanyCustomerId. I strongly suggest the first approach, though.
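A minimal sketch of both approaches with Python's sqlite3 module. The schema (Customers with CustomerId and Name, Companies_Customers with CompanyId and CustomerId) is an assumption based on the question's description, and the GROUP BY uses the customer key, on the assumption that this is what CompanyCustomerId refers to.

```python
import sqlite3

# Hypothetical schema and data based on the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Companies_Customers (CompanyId INTEGER, CustomerId INTEGER);
INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
-- Customer 1 is shared by companies 10 and 20.
INSERT INTO Companies_Customers VALUES (10, 1), (20, 1), (20, 2);
""")

# The join returns one row per company/customer link, so customer 1 appears twice.
joined = conn.execute("""
    SELECT c.CustomerId, c.Name
    FROM Customers AS c
    JOIN Companies_Customers AS cc ON cc.CustomerId = c.CustomerId
""").fetchall()

# First approach: read the Customers table directly.
direct = conn.execute("SELECT * FROM Customers ORDER BY CustomerId").fetchall()

# Second approach: keep the join but collapse it to one row per customer.
grouped = conn.execute("""
    SELECT c.CustomerId, c.Name
    FROM Customers AS c
    JOIN Companies_Customers AS cc ON cc.CustomerId = c.CustomerId
    GROUP BY c.CustomerId, c.Name
    ORDER BY c.CustomerId
""").fetchall()

print(len(joined))  # 3
print(direct)       # [(1, 'Acme'), (2, 'Globex')]
print(grouped)      # [(1, 'Acme'), (2, 'Globex')]
```

One difference to keep in mind: SELECT * FROM Customers also returns customers that are not linked to any company, while the joined version drops them.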

Which Oracle query is faster

I am trying to display employee properties in a C# WPF view.
I have the data in two different Oracle tables in my database.
At a high level, the table structures are:
Employee table (EMP) - columns:
ID, Name, Organisation
Employee properties table (EMPPR) - columns
ID, PropertyName, PropertyValue
The user will input a list of employee names, and I need to display the employee properties using the data in those two tables.
Each employee has 40-80 properties, i.e. 40-80 rows per employee in the EMPPR table. In this case, which approach is more efficient?
Approach #1 - single query data retrieval:
SELECT Pr.PropertyName, Pr.PropertyValue
FROM EMP Emp, EMPPR Pr
WHERE Emp.ID = Pr.ID
AND Emp.Name IN (<List of Names entered>)
Approach #2 - get IDs list using one query and Get properties using that ID in the second query
Query #1:
SELECT ID
FROM EMP
WHERE Name IN (<List of Names entered>)
Query #2:
SELECT PropertyName, PropertyValue
FROM EMPPR
WHERE ID IN (<List of IDs got from Query#1>)
I need to retrieve details for ~10K employees at once, where each employee has 40-80 properties.
Which approach is better?

Which query is faster?
The first one, which uses a single query to fetch your results.
Why? Much of the elapsed time spent handling queries, especially ones with modestly sized rows like yours, goes to round trips between the client and the database server.
Plus, the construct WHERE something IN (val, val, val, val ... val) can throw an error when you have too many values (Oracle raises ORA-01795 when a list contains more than 1,000 expressions). The two-query approach has to send a second, potentially long list of IDs, so the first query is more robust.
Pro tip: come into the 21st century and use explicit JOIN syntax.
SELECT Pr.PropertyName, Pr.PropertyValue
FROM EMP Emp
JOIN EMPPR Pr ON Emp.ID = Pr.ID
WHERE Emp.Name IN (<List of Names Inputted>)
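A minimal sketch comparing the two approaches with Python's sqlite3 module. SQLite stands in for Oracle and the tables are tiny hypothetical copies of EMP and EMPPR, so this says nothing about real timings; it only shows that both approaches return the same rows while approach 2 needs a second round trip. The names are passed as bind variables (one ? placeholder per name) rather than pasted into the SQL text.

```python
import sqlite3

# Hypothetical miniature of the EMP / EMPPR tables (SQLite stand-in for Oracle).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ID INTEGER PRIMARY KEY, Name TEXT, Organisation TEXT);
CREATE TABLE EMPPR (ID INTEGER, PropertyName TEXT, PropertyValue TEXT);
INSERT INTO EMP VALUES (1, 'Ann', 'R&D'), (2, 'Bob', 'Ops'), (3, 'Cy', 'HR');
INSERT INTO EMPPR VALUES
    (1, 'Grade', 'A'), (1, 'Site', 'Oslo'),
    (2, 'Grade', 'B'), (3, 'Grade', 'C');
""")

names = ["Ann", "Bob"]
placeholders = ", ".join("?" for _ in names)  # one bind variable per name

# Approach 1: one round trip using an explicit JOIN.
single = conn.execute(f"""
    SELECT Pr.PropertyName, Pr.PropertyValue
    FROM EMP Emp
    JOIN EMPPR Pr ON Emp.ID = Pr.ID
    WHERE Emp.Name IN ({placeholders})
""", names).fetchall()

# Approach 2: two round trips; the IDs come back to the client and are sent again.
ids = [row[0] for row in conn.execute(
    f"SELECT ID FROM EMP WHERE Name IN ({placeholders})", names)]
id_marks = ", ".join("?" for _ in ids)
two_step = conn.execute(
    f"SELECT PropertyName, PropertyValue FROM EMPPR WHERE ID IN ({id_marks})",
    ids).fetchall()

# Same rows either way; only the number of round trips differs.
print(sorted(single) == sorted(two_step))  # True
```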

Use the first approach, a join between the two tables; it is far better than running two queries, each with its own WHERE clause.

Generate "scatter plot" result of members against sets from SQL query

I have a staff database table containing staff members, with user_no and user_name columns. I have another table, department, containing the departments that staff can be members of, with dept_no and dept_name as columns.
Because staff can be members of multiple departments, I have a third table, staff_dept, with a user_no column and a dept_no column, which correspond to the primary keys of the other two tables. This table shows which departments each member of staff belongs to and contains one row for each user/department intersection.
I would like to have an output in the form of a spreadsheet (CSV file, whatever; I'll be fine mangling the results into a usable form after I've got them) with one column for each department, and one row for each user, with an X appearing at each intersection, as defined in staff_dept.
Can I write a single SQL query that will achieve this result? Or will I have to do some "real" programming (because it's not a "real" program until you've nested three or four for loops, obviously) to collect and format this data?

This can be done with a PIVOT query (in SQL Server):
SELECT user_name, [dept1name], [dept2name], [dept3name], ...
FROM
(
    SELECT s.user_name, d.dept_name,
        CASE WHEN sd.user_no IS NOT NULL THEN 'X' ELSE '' END AS matches
    FROM staff s
    CROSS JOIN department d
    LEFT JOIN staff_dept sd ON s.user_no = sd.user_no AND d.dept_no = sd.dept_no
) AS s
PIVOT
(
    MIN(matches)
    FOR dept_name IN ([dept1name], [dept2name], [dept3name], ...)
) AS pvt
ORDER BY user_name
Demo: http://www.sqlfiddle.com/#!3/c136d/5
Edit: To generate the PIVOT query dynamically from the list of departments in the table, you would use dynamic SQL, i.e., build the query text in a variable and execute it with the sp_executesql system stored procedure. Here's an example: http://www.sqlfiddle.com/#!3/c136d/14
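SQLite has no PIVOT operator, so this sketch swaps in conditional aggregation (one MAX(CASE ...) column per department) and, like the dynamic-SQL variant in the edit above, generates those columns from the department table at run time. It uses Python's sqlite3 module with hypothetical sample data; quote_ident is a hypothetical helper that escapes department names used as column aliases.

```python
import sqlite3

# Hypothetical data; SQLite has no PIVOT, so conditional aggregation is used instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (user_no INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE staff_dept (user_no INTEGER, dept_no INTEGER);
INSERT INTO staff VALUES (1, 'alice'), (2, 'bob');
INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
INSERT INTO staff_dept VALUES (1, 10), (1, 20), (2, 20);
""")

def quote_ident(name):
    """Quote a name for use as a column alias, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

# Build one output column per department, read from the table at run time.
depts = conn.execute(
    "SELECT dept_no, dept_name FROM department ORDER BY dept_name").fetchall()
columns = ",\n".join(
    f"MAX(CASE WHEN sd.dept_no = {int(no)} THEN 'X' ELSE '' END) AS {quote_ident(name)}"
    for no, name in depts
)
sql = f"""
    SELECT s.user_name AS user_name,
    {columns}
    FROM staff AS s
    LEFT JOIN staff_dept AS sd ON sd.user_no = s.user_no
    GROUP BY s.user_no, s.user_name
    ORDER BY s.user_name
"""
cur = conn.execute(sql)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)  # ['user_name', 'IT', 'Sales']
print(rows)    # [('alice', 'X', 'X'), ('bob', 'X', '')]
```

The LEFT JOIN from staff keeps staff members who belong to no department; they come back as a row of empty strings. Writing header and rows out with Python's csv module would give the spreadsheet the question asks for.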

In SQL Server (if you're using SQL Server), I would start with a full outer join (to include all staff and departments, not just those involved in the relation), feed that into a PIVOT statement to turn all departments into columns, and then write a short script to generate and dynamically execute that SELECT statement (because the columns created by a PIVOT statement must be hard-coded; they can't be generated dynamically at run time).
Here's a sample; it's an UNPIVOT statement, but the concept is much the same.

Need help in understanding JOINS in SQL

I was asked the SQL question below in an interview. Please explain how it works and what kind of join it is.
Q: There are two tables: table emp contains 10 rows and table department contains 12 rows.
Select * from emp,department;
What is the result, and what kind of join is it?

It would return the Cartesian Product of the two tables, meaning that every combination of emp and department would be included in the result.
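A quick check of the arithmetic (10 × 12 = 120) with Python's sqlite3 module; the tables contain only hypothetical id columns. The comma-separated FROM list and an explicit CROSS JOIN return the same number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO emp VALUES (?)", [(i,) for i in range(1, 11)])         # 10 rows
conn.executemany("INSERT INTO department VALUES (?)", [(i,) for i in range(1, 13)])  # 12 rows

# With no WHERE clause, the comma-separated FROM list pairs every emp row with every department row.
implicit = conn.execute("SELECT COUNT(*) FROM emp, department").fetchone()[0]
explicit = conn.execute("SELECT COUNT(*) FROM emp CROSS JOIN department").fetchone()[0]
print(implicit, explicit)  # 120 120
```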
I believe that the next question would be:
How do you show the correct department for each employee?
That is, show only the combination of emp and department where the employee belongs to the department.
This can be done by:
SELECT * FROM emp LEFT JOIN department ON emp.department_id=department.id;
This assumes that emp has a column called department_id and department has a matching id column (this is quite standard in these types of questions).
The LEFT JOIN means that every row from the left side (emp) is included, and each employee is matched with the corresponding department. If no matching department is found, the columns from department are NULL. Note that, assuming department.id is unique, exactly 10 rows will be returned.
To show only the employees with valid department IDs, use JOIN instead of LEFT JOIN. This query will return 0 to 10 rows, depending on the number of matching department ids.
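A minimal sketch of the LEFT JOIN versus JOIN difference, again with Python's sqlite3 module and hypothetical data: 12 departments and 10 employees, eight of them in existing departments, one pointing at a nonexistent department ID, and one with a NULL department_id.

```python
import sqlite3

# Hypothetical schema matching the answer's assumption: emp.department_id -> department.id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, department_id INTEGER)")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(d, f"dept{d}") for d in range(1, 13)])  # 12 departments
# 10 employees: eight in real departments, one with an unknown ID, one with NULL.
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(e, e) for e in range(1, 9)] + [(9, 99), (10, None)])

left = conn.execute(
    "SELECT * FROM emp LEFT JOIN department ON emp.department_id = department.id"
).fetchall()
inner = conn.execute(
    "SELECT * FROM emp JOIN department ON emp.department_id = department.id"
).fetchall()

# Every employee, with NULL department columns when there is no match.
print(len(left))   # 10
# Only employees whose department_id matches an existing department.
print(len(inner))  # 8
```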

The join you specified is a cross join. It will produce one row for each combination of records in the tables being joined.
I'll let you do the math from there.

This will do a cross join, I believe, returning 120 rows: one row for each pairwise combination of rows from the two tables.
All in all, it is a fairly useless join most of the time.

You will get every row from one table joined to every row from the other.
This is known as a Cartesian join and is usually a bad sign when it happens by accident.
You will get a total of 120 rows.

This is also the old implied join syntax (superseded by explicit JOIN syntax in SQL-92), and accidental cross joins are a common problem with it. One should never use it; explicit joins are a better choice. I would also have mentioned this in an interview and explained why. And I would not have taken the job if they actually used crappy syntax like this, because its very use tells me the database is likely to be poorly designed.
