How to select only distinct rows when there might be duplicates using SQL?

I have a tricky SQL query I could use some help with.
I have a phone directory table that was not designed very well. It has name, phone number, job description and primary_job_indicator. However, the primary_job_indicator isn't doing its job: not everyone has a primary job.
Here's some sample data:
fname lname phone email job primary_job_ind
Tim Burton 222-2222 tburton#ok.com manager Y
Jim Classy 222-3333 tclassy#ok.com instructor Y
Jim Classy 222-3333 tclassy#ok.com dept head N
Jane Dill 222-4444 jdill#ok.com janitor N
I would like to select only the following, one row, with one job per person :
Tim Burton 222-2222 tburton#ok.com manager
Jim Classy 222-3333 tclassy#ok.com instructor
Jane Dill 222-4444 jdill#ok.com janitor
I want to select from the table and avoid duplicate name+phone number+email.
If the person has only one row in the table, I want to select that row.
If the person has more than one row in the table, I want to select only one row - the one with primary_job_ind = 'Y' if it exists
I can't figure out how to do it :
SELECT fname, lname, phone, email, job
FROM phonedirectory
WHERE (( primary_job_ind = 'Y' ) OR ??????? )

Assuming there is no typo (which there probably is), this should do the job
select fname, lname, phone, email, job
from (
SELECT fname, lname, phone, email, job,
row_number() over (
partition by fname, lname, phone, email
order by primary_job_ind desc
) r
FROM phonedirectory
) where r = 1
It numbers the rows belonging to the same person, with the primary job first, then takes only the first row for each person.
See the documentation of ROW_NUMBER for details.
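To try the ROW_NUMBER approach without an Oracle instance, here is a minimal sketch that runs the same query shape against the question's sample rows. It assumes Python's bundled sqlite3 module with SQLite 3.25 or newer (needed for window functions). The subquery alias `numbered` and the final ORDER BY are additions for illustration only.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phonedirectory "
    "(fname TEXT, lname TEXT, phone TEXT, email TEXT, job TEXT, primary_job_ind TEXT)"
)
conn.executemany(
    "INSERT INTO phonedirectory VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Tim", "Burton", "222-2222", "tburton#ok.com", "manager", "Y"),
        ("Jim", "Classy", "222-3333", "tclassy#ok.com", "instructor", "Y"),
        ("Jim", "Classy", "222-3333", "tclassy#ok.com", "dept head", "N"),
        ("Jane", "Dill", "222-4444", "jdill#ok.com", "janitor", "N"),
    ],
)

query = """
SELECT fname, lname, phone, email, job
FROM (
    SELECT fname, lname, phone, email, job,
           ROW_NUMBER() OVER (
               PARTITION BY fname, lname, phone, email
               ORDER BY primary_job_ind DESC  -- 'Y' sorts after 'N', so DESC puts 'Y' first
           ) AS r
    FROM phonedirectory
) AS numbered
WHERE r = 1
ORDER BY lname  -- only to make the output order predictable
"""
one_job_per_person = conn.execute(query).fetchall()
for row in one_job_per_person:
    print(row)
```

Note that when a person has several rows and none of them is marked 'Y', the row that gets number 1 is arbitrary unless you add a tie-breaker (for example `job`) to the ORDER BY inside the window.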

Related

matching pair of employees in a same department

I'm trying to make a list of employees working in the same department, like:
employeeName department employeeName
Tim 2 Kim
Tim 2 Jim
Kim 2 Tim
Kim 2 Jim
Jim 2 Kim
Jim 2 Tim
Aim 3 Sim
Sim 3 Aim
But the only thing I can do for now is:
SELECT emp_name, dept_code
FROM employee
WHERE dept_code IN (SELECT dept_code FROM employee);
employeeName department
Tim 2
Kim 2
Jim 2
Aim 3
Sim 3
How can I make a list pairing each employee with the others working in the same department? Thanks, gurus.
First, a caveat: I don't like the idea of a result that lists every pair twice, and I would prefer a simpler query whose results are easier to read. I will come back to this later in this answer.
That said, if you really want to produce the outcome you have shown, we can do this with CROSS JOIN, which builds all combinations of employees.
In the WHERE clause, we require that both employees work in the same department but have different names:
SELECT
e1.emp_name AS employeeName,
e1.dept_code AS department,
e2.emp_name AS employeeName
FROM
employee e1
CROSS JOIN employee e2
WHERE
e1.dept_code = e2.dept_code
AND e1.emp_name <> e2.emp_name
ORDER BY e1.dept_code, e1.emp_name, e2.emp_name;
Coming back to the idea of making this easier to read: we can use LISTAGG with GROUP BY to produce a comma-separated list of employees per department. I highly recommend this approach for its better performance and readability.
This query will work on newer Oracle databases:
SELECT dept_code,
LISTAGG (emp_name,',') AS employees
FROM employee
GROUP BY dept_code;
On older Oracle databases, we need to add a WITHIN GROUP clause:
SELECT dept_code,
LISTAGG (emp_name,',')
WITHIN GROUP (ORDER BY emp_name) AS employees
FROM employee
GROUP BY dept_code;
This will produce the following result for your sample data:
DEPT_CODE EMPLOYEES
2 Jim,Kim,Tim
3 Aim,Sim
Here we can try out these things: db<>fiddle
You will get all the pairs (A,B) and (B,A) of employees in the same department at the exclusion of all (A,A) with:
SELECT e1.emp_name AS first_emp_name, e1.dept_code, e2.emp_name AS second_emp_name
FROM employee e1
JOIN employee e2 ON e1.dept_code = e2.dept_code AND e1.emp_name <> e2.emp_name ;
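As a quick check of the self-join, here is a small sketch that runs it on the question's five employees. It assumes Python's sqlite3 module as a stand-in for whatever database you actually use; the ORDER BY is only there to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept_code INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Tim", 2), ("Kim", 2), ("Jim", 2), ("Aim", 3), ("Sim", 3)],
)

# Join the table to itself: same department, different employee.
pairs = conn.execute(
    """
    SELECT e1.emp_name, e1.dept_code, e2.emp_name
    FROM employee e1
    JOIN employee e2
      ON e1.dept_code = e2.dept_code
     AND e1.emp_name <> e2.emp_name
    ORDER BY e1.dept_code, e1.emp_name, e2.emp_name
    """
).fetchall()

for first, dept, second in pairs:
    print(first, dept, second)
```

Three employees in department 2 give 3 x 2 = 6 ordered pairs, and two employees in department 3 give 2, so 8 rows in total. If you would rather list each pair only once, replacing `<>` with `<` in the join condition keeps just one ordering of every pair.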

How to return all names that appear multiple times in table [duplicate]

Suppose I have the following schema:
student(name, siblings)
The table holds names and siblings. Note that each name appears once per sibling that individual has. For instance, a table could be as follows:
Jack, Lucy
Jack, Tim
Meaning that Jack has Lucy and Tim as his siblings.
I want to identify an SQL query that reports the names of all students who have 2 or more siblings. My attempt is the following:
select name
from student
where count(name) >= 1;
I'm not sure I'm using count correctly in this SQL query. Can someone please help with identifying the correct SQL query for this?
You're almost there:
select name
from student
group by name
having count(*) > 1;
HAVING is a WHERE clause that runs after grouping is done. In it you can use things that grouping makes available (like counts and other aggregates). By grouping on the name and filtering on the count (> 1 if you want two or more, not >= 1, because that would include names that appear only once), you get the names you want.
This will just deliver "Jack" as a single result (in the example data from the question). If you then want all the detail, like who Jack's siblings are, you can join your grouped, filtered list of names back to the table:
select *
from
student
INNER JOIN
(
select name
from student
group by name
having count(*) > 1
) morethanone ON morethanone.name = student.name
You can't avoid doing this "joining back" because the grouping has thrown the detail away in order to create the group. The only way to get the detail back is to take the name list the group gave you and use it to filter the original detail data again.
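Here is a rough sketch of both steps (group and filter, then join back) that you can run locally. It assumes Python's sqlite3 module rather than SQL Server, and the row for "Ann" is a made-up student with a single sibling, added only so the filter has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [
        ("Jack", "Lucy"),
        ("Jack", "Tim"),
        ("Ann", "Bob"),  # hypothetical row: only one sibling, should be filtered out
    ],
)

# Step 1: names that occur more than once, i.e. students with 2+ siblings.
names = conn.execute(
    "SELECT name FROM student GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Step 2: join that name list back to the table to recover the detail rows.
detail = conn.execute(
    """
    SELECT s.name, s.siblings
    FROM student s
    INNER JOIN (
        SELECT name FROM student GROUP BY name HAVING COUNT(*) > 1
    ) morethanone ON morethanone.name = s.name
    ORDER BY s.name, s.siblings
    """
).fetchall()

print(names)
print(detail)
```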
Full disclosure; it's a bit of a lie to say "can't avoid doing this": SQL Server supports something called a window function, which will effectively perform a grouping in the background and join it back to the detail. Such a query would look like:
select student.*, count(*) over(partition by name) n
from student
And for a table like this:
jack, lucy
jack, tim
jane, bill
jane, fred
jane, tom
john, dave
It would produce:
jack, lucy, 2
jack, tim, 2
jane, bill, 3
jane, fred, 3
jane, tom, 3
john, dave, 1
The jack rows have n = 2 because there are two jack rows; likewise there are three jane rows and one john row. You could then wrap all that in a subquery and filter for n > 1, which would remove john:
select *
from
(
select student.*, count(*) over(partition by name) n
from student
) x
where x.n > 1
If SQL Server didn't have window functions, it would look more like:
select *
from
student
INNER JOIN
(
select name, count(*) as n
from student
group by name
) x ON x.name = student.name
The COUNT(*) OVER(PARTITION BY name) is like a mini "group by name and return the count, then auto join back to the main detail using the name as key", i.e. a short form of the latter query.
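If you want to see the windowed count and the filter on it in action, here is a minimal sketch using the six example rows from this answer. It assumes Python's sqlite3 module with SQLite 3.25 or newer (for window functions) in place of SQL Server; the query shape is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [
        ("jack", "lucy"), ("jack", "tim"),
        ("jane", "bill"), ("jane", "fred"), ("jane", "tom"),
        ("john", "dave"),
    ],
)

# Every detail row keeps its columns and gains n = number of rows sharing its name.
rows = conn.execute(
    """
    SELECT name, siblings, n
    FROM (
        SELECT student.*, COUNT(*) OVER (PARTITION BY name) AS n
        FROM student
    ) x
    WHERE x.n > 1
    ORDER BY name, siblings
    """
).fetchall()

for row in rows:
    print(row)
```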
You can do:
select name
from student as s1
where exists (
select 1
from student as s2
where s1.name = s2.name and s1.siblings != s2.siblings
)
I think the best approach is the one 'Caius Jard' described. However, here is an additional way, in case you also want to see how many siblings each name has:
SELECT name, COUNT(*) AS Occurrences
FROM student
GROUP BY name
HAVING (COUNT(*) > 1)
I wanted to share another solution I came up with:
select distinct s1.name
from student s1, student s2
where s1.name = s2.name and s1.siblings != s2.siblings;

How to find people in a database who live in the same cities?

I'm new to SQL, and I'm asking for help in an apparently easy question, but it gets cumbersome in my mind.
I have the following table:
ID NAME CITY
---------------------
1 John new york
2 Sam new york
3 Tom boston
4 Bob boston
5 Jan chicago
6 Ted san francisco
7 Kat boston
I want a query that returns all the people who live in a city that another person registered in the database also lives in.
The answer, for the table I showed above, would be:
ID NAME CITY
---------------------
1 John new york
2 Sam new york
3 Tom boston
4 Bob boston
7 Kat boston
This is really a two part question:
What cities have more than one user located in them?
What users live in that subset of cities?
Let's answer it in two parts. Let's also make the simplifying assumption (not stated in your question) that the Users table has only one entry per user per city.
To find cities with more than one user:
SELECT City FROM Users GROUP BY City HAVING COUNT(*) > 1
Now, let's find all the users for those cities:
SELECT ID, Name, City FROM Users
WHERE City IN (SELECT City FROM Users GROUP BY CITY HAVING COUNT(*) > 1)
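The two steps can be tried out on the question's table with a short sketch like the one below. It assumes Python's sqlite3 module, a table named Users as in this answer, and the column names ID, Name and City from the question; the ORDER BY clauses are only there to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "John", "new york"), (2, "Sam", "new york"), (3, "Tom", "boston"),
        (4, "Bob", "boston"), (5, "Jan", "chicago"), (6, "Ted", "san francisco"),
        (7, "Kat", "boston"),
    ],
)

# Part 1: cities with more than one user.
shared_cities = [
    city for (city,) in conn.execute(
        "SELECT City FROM Users GROUP BY City HAVING COUNT(*) > 1 ORDER BY City"
    )
]

# Part 2: users who live in one of those cities.
users_in_shared_cities = conn.execute(
    """
    SELECT ID, Name, City FROM Users
    WHERE City IN (SELECT City FROM Users GROUP BY City HAVING COUNT(*) > 1)
    ORDER BY ID
    """
).fetchall()

print(shared_cities)
print(users_in_shared_cities)
```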
I would use EXISTS :
SELECT t.*
FROM table t
WHERE EXISTS (SELECT 1 FROM table t1 WHERE t1.city = t.city AND t1.name <> t.name);
To avoid a correlated subquery, which may be executed as a nested loop, you could join against a grouped subquery instead:
SELECT id, name, city
FROM persons
JOIN (SELECT city
FROM persons
GROUP BY city HAVING count(*) > 1) AS cities
USING (city);
This might be the most performant solution.
This will give you the rows that have the same city more than 1 time:
SELECT persons.*
FROM persons
WHERE (SELECT COUNT(*) FROM persons AS p GROUP BY CITY HAVING p.CITY = persons.CITY) > 1
This is just a different flavor from the others that have posted.
SELECT ID,
name,
city
FROM (SELECT DISTINCT
ID,
name,
city,
COUNT(1) OVER (PARTITION BY city) AS cityCount
FROM table) t
WHERE cityCount > 1
This can be expressed many ways. Here is one possible way:
select * from persons p
where exists (
select 1 from persons p2
where p2.city = p.city and p2.name <> p.name
)
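One caveat with the EXISTS variants that compare on name: two different people who share both a name and a city will not see each other. The sketch below, assuming Python's sqlite3 module, compares the name-based check with one based on the primary key. The two "Lee" rows are hypothetical and exist only to show the difference; the helper `shared_city_ids` is likewise made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [
        (1, "John", "new york"), (2, "Sam", "new york"), (3, "Tom", "boston"),
        (4, "Bob", "boston"), (5, "Jan", "chicago"), (6, "Ted", "san francisco"),
        (7, "Kat", "boston"),
        # Hypothetical extra rows: two different people with the same name and city.
        (8, "Lee", "denver"), (9, "Lee", "denver"),
    ],
)

def shared_city_ids(compare_column):
    """Return ids of people for whom another row with the same city exists."""
    # The column name is interpolated from a fixed whitelist, never from user input.
    if compare_column not in ("name", "id"):
        raise ValueError(compare_column)
    sql = f"""
        SELECT p.id FROM persons p
        WHERE EXISTS (
            SELECT 1 FROM persons p2
            WHERE p2.city = p.city
              AND p2.{compare_column} <> p.{compare_column}
        )
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql)]

by_name = shared_city_ids("name")  # misses the two Lees
by_id = shared_city_ids("id")      # finds them

print(by_name)
print(by_id)
```

So if names are not guaranteed to be unique, comparing on the id column is the safer choice.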

SQL Subquery or Group BY to return Distinct values of a certain column?

I am running into an issue where I know I am close to a solution but I cannot get the right data to come back or am getting errors on the queries.
Basically what I want to do is to select values from a single table based on the following: I want them based on an employee name I supply and I want them to return DISTINCT Account Numbers for that employee(sometimes the same account number is listed more than one time...I only want one row of data from each duplicate account number).
Here is what I have tried so far:
SELECT DISTINCT * FROM Employees WHERE EmpName = 'Mary Johnson' GROUP BY AccountID
-This returns an error saying I don't have an aggregate for the column EmployeeID(which isn't a calculated column, so why is it asking for an aggregate??)
Then I tried a subquery:
SELECT EmployeeID, EmpName, Department, Position, (SELECT DISTINCT AccountID From Employees) AS AcctID FROM Employees WHERE EmpName = 'Mary Johnson'
-This gives me an error saying AT Most one record can be returned(I'm in Access).
I know there has to be a solution to what I am looking for---Currently If I do a full query just on EmpName it returns 30 records, however, 23 of them have a duplicate AccountID, so there should only be 7 records that return with a DISTINCT AccountID.
How can I achieve this via SQL?
Here is an Example:
ID(PK) EmpName AcctID Department Position EmpId
------ -------- --------- -------- -------- ------
1 Mary Johnson 1234 IT Tech 226663
2 Mary Johnson 1234 IT Tech 226663
3 Mary Johnson 1234 IT Tech 226663
4 Mary Johnson 2345 IT Tech 226663
5 John James 23442 Banking Teller 445645
6 Jame Tabor 1234 HR Manager 234555
In the example above I want to do an SQL Query to get the rows back with the AccountId's for Mary Johnson that are DISTINCT. So We would get back 2 rows Mary Johnson 1234 and Mary Johnson 2345, ignoring the other 2 rows that Mary Johnson has with duplicate AccountIDs of 1234 and the other rows with employees not named Mary Johnson
This is not really easy to do in MS Access -- it lacks row_number(). One method is to use another column that uniquely identifies each row that has the same AccountID:
select e.*
from Employees as e
where e.?? = (select max(e2.??)
from Employees as e2
where e2.AccountId = e.AccountId and e2.empName = e.empName
);
The ?? is for the column that uniquely identifies the rows.
If you only care about two columns, then use select distinct:
select distinct e.empName, e.AccountId
from employees as e;
You can add a where clause to either query to restrict to a single employee. However, it doesn't make sense to me that a table called Employees would have multiple rows for a single employee name (except in the rare case of people with the same names).
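To make the first query concrete, here is a sketch that fills in the ?? with the ID primary key from the question's example table and restricts the result to one employee. It is run through Python's sqlite3 module, not Access, so treat it as an illustration of the query shape only; the column name AcctID is taken from the example table (the queries in the question call it AccountID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, EmpName TEXT, AcctID INTEGER, "
    "Department TEXT, Position TEXT, EmpId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (2, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (3, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (4, "Mary Johnson", 2345, "IT", "Tech", 226663),
        (5, "John James", 23442, "Banking", "Teller", 445645),
        (6, "Jame Tabor", 1234, "HR", "Manager", 234555),
    ],
)

# Keep, for each (employee, account) combination, only the row with the highest ID.
one_row_per_account = conn.execute(
    """
    SELECT e.ID, e.EmpName, e.AcctID
    FROM Employees AS e
    WHERE e.EmpName = ?
      AND e.ID = (SELECT MAX(e2.ID)
                  FROM Employees AS e2
                  WHERE e2.AcctID = e.AcctID AND e2.EmpName = e.EmpName)
    ORDER BY e.ID
    """,
    ("Mary Johnson",),
).fetchall()

print(one_row_per_account)
```

Using MIN instead of MAX would keep the first row of each group (ID 1 for account 1234) instead of the last.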
The reason you are getting the "at most one record" error is that you are using a subquery to supply a single column value in each row. The subquery returns every distinct AcctID, and there is more than one, so the query fails. If you used MIN(AcctID) that query would work fine. Therefore, this will work:
SELECT EmployeeID, EmpName, Department, Position, MIN(AccountID )
FROM Employees
WHERE EmpName = 'Mary Johnson'
GROUP BY EmployeeID, EmpName, Department, Position
The next question you need to ask yourself is WHICH AccountID do you want to see? You can switch to MAX, but if you have three values and you want the middle one, that will be a lot harder to solve. But, the above query will at least give you a working result.
EDIT:
That should give you only one row assuming you have identical information for Mary Johnson. If you are getting 24 it means you have different entries For empname, department, position, or employeeid. If you want to see all AccountIDs associated with Mary Johnson, use this:
SELECT EmployeeID, MIN(EmpName), AccountID, MIN(Department) AS Department, MIN(Position) AS Position
FROM Employees
WHERE EmpName = 'Mary Johnson'
GROUP BY EmployeeID, AccountID

How can I get a distinct list of elements in a hierarchical query?

I have a database table, with people identified by a name, a job and a city. I have a second table that contains a hierarchical representation of every job in the company in every city.
Suppose I have 3 people in the people table:
[name(PK),title,city]
Jim, Salesman, Houston
Jane, Associate Marketer, Chicago
Bill, Cashier, New York
And I have thousands of job type/location combinations in the job table, a sample of which follow. You can see the hierarchical relationship since parent_title is a foreign key to title:
[title,city,pay,parent_title]
Salesman, Houston, $50000, CEO
Cashier, Houston, $25000
CEO, USA, $1000000
Associate Marketer, Chicago, $75000
Senior Marketer, Chicago, $125000
.....
The problem I'm having is that my Person table is a composite key, so I don't know how to structure the start with part of my query so that it starts with each of the three jobs in the cities I specified.
I can execute three separate queries to get what I want, but this doesn't scale well. e.g.:
select * from jobs
start with city = (select city from people where name = 'Bill') and title = (select title from people where name = 'Bill')
connect by prior parent_title = title
UNION
select * from jobs
start with city = (select city from people where name = 'Jim') and title = (select title from people where name = 'Jim')
connect by prior parent_title = title
UNION
select * from jobs
start with city = (select city from people where name = 'Jane') and title = (select title from people where name = 'Jane')
connect by prior parent_title = title
How else can I get a distinct list (or I could wrap it with a distinct if not possible) of all the jobs which are above the three people I specified?
Please try this. I haven't tested this.
SELECT distinct *
FROM jobs
START WITH ( city, title ) IN
( SELECT city, title
FROM people
WHERE name IN ( 'Bill', 'Jim', 'Jane' )
)
CONNECT BY PRIOR parent_title = title;
This should work:
SELECT *
FROM jobs
START WITH (title, city) IN (SELECT title, city FROM people)
CONNECT BY PRIOR parent_title = title;
TITLE CITY PAY PARENT_TITLE
------------------ ------- ---------- ------------
Associate Marketer Chicago 7500
Salesman Houston 5000 CEO
CEO USA 100000
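CONNECT BY is Oracle-specific, so if you want to experiment without an Oracle instance, the same walk up the hierarchy can be approximated with a recursive common table expression. To be clear, this is a different technique from the answers above, sketched here with Python's sqlite3 module. The CTE name `chain` is made up, pay is stored as a plain integer, and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, title TEXT, city TEXT)")
conn.execute("CREATE TABLE jobs (title TEXT, city TEXT, pay INTEGER, parent_title TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Jim", "Salesman", "Houston"),
        ("Jane", "Associate Marketer", "Chicago"),
        ("Bill", "Cashier", "New York"),
    ],
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("Salesman", "Houston", 50000, "CEO"),
        ("Cashier", "Houston", 25000, None),
        ("CEO", "USA", 1000000, None),
        ("Associate Marketer", "Chicago", 75000, None),
        ("Senior Marketer", "Chicago", 125000, None),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE chain(title, city, pay, parent_title) AS (
        -- Anchor: the jobs held by the people, matched on (title, city).
        SELECT j.title, j.city, j.pay, j.parent_title
        FROM jobs j
        JOIN people p ON p.title = j.title AND p.city = j.city
        UNION  -- UNION (not UNION ALL) removes duplicates, giving a distinct list
        -- Recursive step: move from each job to its parent job.
        SELECT j.title, j.city, j.pay, j.parent_title
        FROM jobs j
        JOIN chain ON j.title = chain.parent_title
    )
    SELECT title, city, pay, parent_title FROM chain ORDER BY title
    """
).fetchall()

for row in rows:
    print(row)
```

Bill contributes nothing here because the sample jobs table has no Cashier row for New York, which matches the output shown above.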