SQL Subquery or Group BY to return Distinct values of a certain column? - sql

I am running into an issue where I know I am close to a solution but I cannot get the right data to come back or am getting errors on the queries.
Basically what I want to do is to select values from a single table based on the following: I want them based on an employee name I supply and I want them to return DISTINCT Account Numbers for that employee(sometimes the same account number is listed more than one time...I only want one row of data from each duplicate account number).
Here is what I have tried so far:
SELECT DISTINCT * FROM Employees WHERE EmpName = 'Mary Johnson' GROUP BY AccountID
-This returns an error saying I don't have an aggregate for the column EmployeeID(which isn't a calculated column, so why is it asking for an aggregate??)
Then I tried a subquery:
SELECT EmployeeID, EmpName, Department, Position, (SELECT DISTINCT AccountID From Employees) AS AcctID FROM Employees WHERE EmpName = 'Mary Johnson'
-This gives me an error saying AT Most one record can be returned(I'm in Access).
I know there has to be a solution to what I am looking for---Currently If I do a full query just on EmpName it returns 30 records, however, 23 of them have a duplicate AccountID, so there should only be 7 records that return with a DISTINCT AccountID.
How can I achieve this via SQL?
Here is an Example:
ID(PK)  EmpName       AcctID  Department  Position  EmpId
------  ------------  ------  ----------  --------  ------
1       Mary Johnson  1234    IT          Tech      226663
2       Mary Johnson  1234    IT          Tech      226663
3       Mary Johnson  1234    IT          Tech      226663
4       Mary Johnson  2345    IT          Tech      226663
5       John James    23442   Banking     Teller    445645
6       Jame Tabor    1234    HR          Manager   234555
In the example above I want to do an SQL Query to get the rows back with the AccountId's for Mary Johnson that are DISTINCT. So We would get back 2 rows Mary Johnson 1234 and Mary Johnson 2345, ignoring the other 2 rows that Mary Johnson has with duplicate AccountIDs of 1234 and the other rows with employees not named Mary Johnson

This is not really easy to do in MS Access -- it lacks row_number(). One method is to use another column that uniquely identifies each row that has the same AccountID:
select e.*
from Employees as e
where e.?? = (select max(e2.??)
from Employees as e2
where e2.AccountId = e.AccountId and e2.empName = e.empName
);
The ?? is for the column that uniquely identifies the rows.
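To see the max-per-group filter run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Access (an assumption, since Access cannot be scripted this way), the ID primary key from the question's sample plays the role of the ?? column, and the column names AccountID and EmployeeID follow the question's queries rather than its sample header.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Access (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, EmpName TEXT, "
    "AccountID INTEGER, Department TEXT, Position TEXT, EmployeeID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (2, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (3, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (4, "Mary Johnson", 2345, "IT", "Tech", 226663),
        (5, "John James", 23442, "Banking", "Teller", 445645),
        (6, "Jame Tabor", 1234, "HR", "Manager", 234555),
    ],
)

# Keep only the row with the highest ID in each (EmpName, AccountID) group,
# restricted to the employee name supplied as a parameter.
query = """
    SELECT e.ID, e.EmpName, e.AccountID
    FROM Employees AS e
    WHERE e.EmpName = ?
      AND e.ID = (SELECT MAX(e2.ID)
                  FROM Employees AS e2
                  WHERE e2.AccountID = e.AccountID
                    AND e2.EmpName = e.EmpName)
    ORDER BY e.AccountID
"""
rows = conn.execute(query, ("Mary Johnson",)).fetchall()
print(rows)  # [(3, 'Mary Johnson', 1234), (4, 'Mary Johnson', 2345)]
```

Using MIN instead of MAX would keep the first row of each group (ID 1 and ID 4) instead of the last.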
If you only care about two columns, then use select distinct:
select distinct e.empName, e.AccountId
from employees as e;
You can add a where clause to either query to restrict to a single employee. However, it doesn't make sense to me that a table called Employees would have multiple rows for a single employee name (except in the rare case of people with the same names).

The reason you are getting the error about "at most one record" is that a subquery used as a column in the SELECT list must return a single value. Yours returns every distinct AccountID, and Mary Johnson alone has two distinct ones, so the query fails. If you used MIN(AccountID), that query would work fine. Therefore, this will work:
SELECT EmployeeID, EmpName, Department, Position, MIN(AccountID )
FROM Employees
WHERE EmpName = 'Mary Johnson'
GROUP BY EmployeeID, EmpName, Department, Position
The next question you need to ask yourself is WHICH AccountID do you want to see? You can switch to MAX, but if you have three values and you want the middle one, that will be a lot harder to solve. But, the above query will at least give you a working result.
EDIT:
That should give you only one row, assuming you have identical information for Mary Johnson. If you are getting 24, it means you have different entries for EmpName, Department, Position, or EmployeeID. If you want to see all AccountIDs associated with Mary Johnson, use this:
SELECT EmployeeID, MIN(EmpName), AccountID, MIN(Department) AS Department, MIN(Position) AS Position
FROM Employees
WHERE EmpName = 'Mary Johnson'
GROUP BY EmployeeID, AccountID
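As a rough check of the grouped query, here is a sketch in Python's sqlite3 module with the sample rows from the question. SQLite is only a stand-in for Access, and the aliases Name, Dept, Pos and the extra COUNT(*) AS Copies column are my own additions for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Access (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, EmpName TEXT, "
    "AccountID INTEGER, Department TEXT, Position TEXT, EmployeeID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (2, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (3, "Mary Johnson", 1234, "IT", "Tech", 226663),
        (4, "Mary Johnson", 2345, "IT", "Tech", 226663),
        (5, "John James", 23442, "Banking", "Teller", 445645),
        (6, "Jame Tabor", 1234, "HR", "Manager", 234555),
    ],
)

# One output row per (EmployeeID, AccountID); every other column is wrapped
# in an aggregate. Copies (hypothetical extra column) shows how many source
# rows were collapsed into each group.
grouped = conn.execute(
    """
    SELECT EmployeeID, MIN(EmpName) AS Name, AccountID,
           MIN(Department) AS Dept, MIN(Position) AS Pos,
           COUNT(*) AS Copies
    FROM Employees
    WHERE EmpName = 'Mary Johnson'
    GROUP BY EmployeeID, AccountID
    ORDER BY AccountID
    """
).fetchall()
for row in grouped:
    print(row)
```

With the sample data this yields two rows for Mary Johnson, one per account, with Copies equal to 3 for account 1234 and 1 for account 2345.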

Related

matching pair of employees in a same department

I'm trying to make a list of employees working in the same department, like:
employeeName  department  employeeName
------------  ----------  ------------
Tim           2           Kim
Tim           2           Jim
Kim           2           Tim
Kim           2           Jim
Jim           2           Kim
Jim           2           Tim
Aim           3           Sim
Sim           3           Aim
But the only thing I can do for now is:
SELECT emp_name, dept_code
FROM employee
WHERE dept_code IN (SELECT dept_code FROM employee);
employeeName  department
------------  ----------
Tim           2
Kim           2
Jim           2
Aim           3
Sim           3
How can I make a list pairing with the employee working in a same department? thanks gurus...
First, a caveat: I dislike the idea of a result that lists each pair twice and would prefer another, simpler query whose results are easier to read. I will come back to this later in this answer.
But anyway, if you really want to produce the outcome you have shown, we can do this with CROSS JOIN. This builds all combinations of employees.
In the WHERE clause, we will set the conditions that they must work in the same department, but have different names:
SELECT
e1.emp_name AS employeeName,
e1.dept_code AS department,
e2.emp_name AS employeeName
FROM
employee e1
CROSS JOIN employee e2
WHERE
e1.dept_code = e2.dept_code
AND e1.emp_name <> e2.emp_name
ORDER BY e1.dept_code, e1.emp_name, e2.emp_name;
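A quick way to confirm that this produces the requested listing is to run it against the sample data. The sketch below uses Python's sqlite3 module; SQLite is an assumption here (the answer targets Oracle), but the CROSS JOIN and WHERE conditions are plain SQL.

```python
import sqlite3

# In-memory SQLite database with the five employees from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept_code INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Tim", 2), ("Kim", 2), ("Jim", 2), ("Aim", 3), ("Sim", 3)],
)

# All combinations of two employees, filtered to the same department and
# different names, so every pair shows up in both directions.
pairs = conn.execute(
    """
    SELECT e1.emp_name, e1.dept_code, e2.emp_name
    FROM employee e1
    CROSS JOIN employee e2
    WHERE e1.dept_code = e2.dept_code
      AND e1.emp_name <> e2.emp_name
    ORDER BY e1.dept_code, e1.emp_name, e2.emp_name
    """
).fetchall()
for row in pairs:
    print(row)
```

Department 2 has three employees, so it contributes 3 x 2 = 6 rows, and department 3 contributes 2, for 8 rows in total.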
To come back to the idea of making this easier to read: we can use LISTAGG with GROUP BY to produce a comma-separated list of employees per department. I highly recommend this approach because it performs better and is easier to read.
This query works on recent Oracle versions (19c and later), where the WITHIN GROUP clause is optional. Without it, the order of the names in each list is not guaranteed:
SELECT dept_code,
LISTAGG (emp_name,',') AS employees
FROM employee
GROUP BY dept_code;
On older Oracle versions, the WITHIN GROUP clause is required:
SELECT dept_code,
LISTAGG (emp_name,',')
WITHIN GROUP (ORDER BY emp_name) AS employees
FROM employee
GROUP BY dept_code;
This will produce the following result for your sample data:
DEPT_CODE  EMPLOYEES
---------  -----------
2          Jim,Kim,Tim
3          Aim,Sim
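For readers without an Oracle instance, the same per-department list can be sketched with SQLite's group_concat, which I am swapping in for LISTAGG here (it is a different function, not an Oracle feature). SQLite does not guarantee the order of concatenated values, so this sketch sorts the names in Python.

```python
import sqlite3

# In-memory SQLite database with the five employees from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept_code INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Tim", 2), ("Kim", 2), ("Jim", 2), ("Aim", 3), ("Sim", 3)],
)

# group_concat is SQLite's aggregate for building a delimited string;
# it stands in for Oracle's LISTAGG in this sketch.
cursor = conn.execute(
    "SELECT dept_code, group_concat(emp_name, ',') "
    "FROM employee GROUP BY dept_code"
)

# Split and sort in Python because the concatenation order is unspecified.
dept_lists = {dept: sorted(names.split(",")) for dept, names in cursor}
for dept, names in sorted(dept_lists.items()):
    print(dept, ",".join(names))
```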
You will get all the pairs (A,B) and (B,A) of employees in the same department at the exclusion of all (A,A) with:
SELECT e1.emp_name AS first_emp_name, e1.dept_code, e2.emp_name AS second_emp_name
FROM employee e1
JOIN employee e2 ON e1.dept_code = e2.dept_code AND e1.emp_name <> e2.emp_name ;
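If each pair should appear only once, a common variation (not part of the answers above) is to replace <> with <, so that only the alphabetically ordered direction of each pair survives. A minimal sketch in Python's sqlite3 module, with SQLite assumed as the engine:

```python
import sqlite3

# In-memory SQLite database with the five employees from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept_code INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Tim", 2), ("Kim", 2), ("Jim", 2), ("Aim", 3), ("Sim", 3)],
)

# e1.emp_name < e2.emp_name keeps (Jim, Kim) but drops (Kim, Jim),
# so each unordered pair is listed exactly once.
unique_pairs = conn.execute(
    """
    SELECT e1.emp_name, e1.dept_code, e2.emp_name
    FROM employee e1
    JOIN employee e2
      ON e1.dept_code = e2.dept_code
     AND e1.emp_name < e2.emp_name
    ORDER BY e1.dept_code, e1.emp_name, e2.emp_name
    """
).fetchall()
print(unique_pairs)
```

This halves the output (4 rows instead of 8 for the sample data). It relies on names being unique within a department; with duplicate names a key column would be a safer comparison.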

Snowflake Return records once based on combination of distinct column values

I have a data table like the following:
User ID  Co_UserID  Name   Total Tickets
1514677  1377535    Jose   273013
1514677  1377535    Jose   273013
1514677  1377535    Jose   273013
1514677  1377535    Jose   273013
1514677  1377535    Jose   273013
212121   31313      Rob    21212
312312   234134     James  33
As you can see, I have duplicates based on user ID and Co_userID for Jose. I am trying to return distinct records where a combination of user_ID and co_userID appear once.
Desired Output:
User ID  Co User ID  Name   Total Tickets
1514677  1377535     Jose   273013
212121   31313       Rob    21212
312312   234134      James  33
I tried running a query like the following, but selecting multiple distincts is not possible. Can someone advise?
SELECT distinct d.User_ID, distinct d.Co_userID, d.Name, d.Total_Tickets
From DATA d
You only need one DISTINCT:
SELECT distinct d.User_ID, d.Co_userID, d.Name, d.Total_Tickets
From DATA d;
SELECT DISTINCT is a single keyword in SQL (well, a modification of the SELECT clause). There is no need for additional distincts.
This works for the data you have provided. If you wanted one row per combination of the first two columns, and there are multiple values for the others, then use QUALIFY:
SELECT d.User_ID, d.Co_userID, d.Name, d.Total_Tickets
FROM DATA d
QUALIFY ROW_NUMBER() OVER (PARTITION BY d.User_ID, d.Co_userID ORDER BY d.User_id) = 1
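QUALIFY is not available in every engine. Where it is missing, the usual equivalent is to compute ROW_NUMBER() in a subquery and filter on it in the outer query. The sketch below shows that form in Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions) and checks that, for this sample data, it returns the same rows as SELECT DISTINCT.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA (User_ID INTEGER, Co_userID INTEGER, "
    "Name TEXT, Total_Tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO DATA VALUES (?, ?, ?, ?)",
    [(1514677, 1377535, "Jose", 273013)] * 5
    + [(212121, 31313, "Rob", 21212), (312312, 234134, "James", 33)],
)

# Subquery form of the QUALIFY filter: number the rows inside each
# (User_ID, Co_userID) group, then keep only row number 1.
first_rows = conn.execute(
    """
    SELECT User_ID, Co_userID, Name, Total_Tickets
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY d.User_ID, d.Co_userID
                                    ORDER BY d.User_ID) AS rn
          FROM DATA d)
    WHERE rn = 1
    ORDER BY User_ID
    """
).fetchall()

# Plain DISTINCT over all four columns, for comparison.
distinct_rows = conn.execute(
    "SELECT DISTINCT User_ID, Co_userID, Name, Total_Tickets "
    "FROM DATA ORDER BY User_ID"
).fetchall()

print(first_rows)
print(first_rows == distinct_rows)
```

The two results only coincide because the duplicate rows are identical in every column. If Name or Total_Tickets differed within a group, DISTINCT would keep all variants while the ROW_NUMBER filter would still keep one.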

Postgresql : Removing Duplicates after performing UNION ALL

I have a requirement where I need to remove some rows after combining two tables using UNION ALL.
Here are the Tables
Accounts1
id  username  department  salary
--  --------  ----------  ------
1   Sam       IT          2000
2   Frodo     Accounts    1000
3   Natan     Service     800
4   Kenworth  Admin       900
Accounts2
id  username  department  salary
--  --------  ----------  ------
5   Sam       IT          1600
6   Frodo     Accounts    800
Expected Result of the UNION should be
id  username  department  salary
--  --------  ----------  ------
5   Sam       IT          1600
6   Frodo     Accounts    800
3   Natan     Service     800
4   Kenworth  Admin       900
As seen, the expected result should contain the lower-salary records from the accounts2 table, replacing the matching records from accounts1. I have tried DISTINCT, but that does not meet the requirement. Any help is greatly appreciated.
You can use union all with filtering:
select a2.*
from accounts2 a2
union all
select a1.*
from accounts1 a1
where not exists (select 1
from accounts2 a2
where a2.username = a1.username and a2.department = a1.department
);
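Here is a small runnable sketch of this filter using Python's sqlite3 module with the tables from the question. SQLite is assumed instead of PostgreSQL, which does not matter for this query since it only uses UNION ALL and NOT EXISTS. The rows are sorted in Python to keep the sketch independent of how each engine orders a compound SELECT.

```python
import sqlite3

# In-memory SQLite database with both tables from the question.
conn = sqlite3.connect(":memory:")
for table in ("accounts1", "accounts2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER, username TEXT, "
        "department TEXT, salary INTEGER)"
    )
conn.executemany(
    "INSERT INTO accounts1 VALUES (?, ?, ?, ?)",
    [(1, "Sam", "IT", 2000), (2, "Frodo", "Accounts", 1000),
     (3, "Natan", "Service", 800), (4, "Kenworth", "Admin", 900)],
)
conn.executemany(
    "INSERT INTO accounts2 VALUES (?, ?, ?, ?)",
    [(5, "Sam", "IT", 1600), (6, "Frodo", "Accounts", 800)],
)

# Take every accounts2 row, plus the accounts1 rows that have no
# counterpart (same username and department) in accounts2.
merged = sorted(conn.execute(
    """
    SELECT a2.id, a2.username, a2.department, a2.salary
    FROM accounts2 a2
    UNION ALL
    SELECT a1.id, a1.username, a1.department, a1.salary
    FROM accounts1 a1
    WHERE NOT EXISTS (SELECT 1
                      FROM accounts2 a2
                      WHERE a2.username = a1.username
                        AND a2.department = a1.department)
    """
).fetchall())
for row in merged:
    print(row)
```

Note that this version always prefers accounts2, whatever the salaries are. It matches the expected result only because accounts2 happens to hold the lower salaries.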
EDIT:
If you want one row per username or username/department from either table with the minimum salary, then I would suggest union all with distinct on:
select distinct on (username, department) a.*
from ((select a1.*
       from accounts1 a1
      ) union all
      (select a2.*
       from accounts2 a2
      )
     ) a
order by username, department, salary;
Remove department accordingly if you want one row per employee.
After UNIONing the two sets, I would calculate ROW_NUMBER() OVER (PARTITION BY department, username ORDER BY salary, id). Then I would wrap that in one more SELECT to filter and retain only row_number = 1.
A little more code, but very explicit as to what is being performed and it has the advantage that if either data set happens to contain multiple values for a user you still get the one with the lowest salary.
This is a problem that comes up often where there are multiple records within a group domain and you want to choose "the best" one even if you can't say exactly which one that is. The row_number() window function allows your Order By to make the best choice float to the top where it will assign the row_number of 1. You can then filter and retain only the row_numbers=1 as the "best" choice within each domain. This always means at least two Select statements because window functions are evaluated after Where and Having clauses.
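The row_number() approach described above could look roughly like the following sketch, written with Python's sqlite3 module. SQLite 3.25 or later is assumed for window functions; the aliases u and rn are illustrative names of my own.

```python
import sqlite3

# In-memory SQLite database with both tables from the question.
conn = sqlite3.connect(":memory:")
for table in ("accounts1", "accounts2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER, username TEXT, "
        "department TEXT, salary INTEGER)"
    )
conn.executemany(
    "INSERT INTO accounts1 VALUES (?, ?, ?, ?)",
    [(1, "Sam", "IT", 2000), (2, "Frodo", "Accounts", 1000),
     (3, "Natan", "Service", 800), (4, "Kenworth", "Admin", 900)],
)
conn.executemany(
    "INSERT INTO accounts2 VALUES (?, ?, ?, ?)",
    [(5, "Sam", "IT", 1600), (6, "Frodo", "Accounts", 800)],
)

# Inner query: union both tables and rank rows within each
# (department, username) group by ascending salary, then id.
# Outer query: keep the top-ranked row of each group.
lowest = conn.execute(
    """
    SELECT id, username, department, salary
    FROM (SELECT u.*,
                 ROW_NUMBER() OVER (PARTITION BY department, username
                                    ORDER BY salary, id) AS rn
          FROM (SELECT * FROM accounts1
                UNION ALL
                SELECT * FROM accounts2) AS u)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()
for row in lowest:
    print(row)
```

The filter on rn has to live in the outer SELECT because the window function is computed after the inner WHERE clause would have run.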

How to mark employees which are also manager

I need to determine whether an employee is any other employee's manager.
Given this table:
Employee Employee's Manager
---------- ------------------
Bob CN=Lisa
Amanda CN=Lisa
James CN=Art
Frank CN=Amanda
Amy CN=Art
I need this:
Employee Employee's Manager Employee IS Manager
---------- ------------------ -------------------
Bob CN=Lisa N
Amanda <-- CN=Lisa Y <--
James CN=Art N
Frank CN=Amanda <-- N
Amy CN=Art N
Because Amanda appears in the "Employee's Manager" column in another employee's row, I need to derive this, adding the additional "Employee IS Manager" field.
I've gotten as far as this (wrong!) subquery for the additional "IS Manager" field, but I do not know how to add it as a column in a subquery:
select
a.* ,
(select 'Y' as IsManager
where exists (select * from Employees b where b.Manager like '%' + @x + '%' )
)
from Employees a
But I do not know how to make @x refer to Amanda in the Employee column in the other row.
EDIT: I should note that I am not necessarily looking for a "subquery" solution. A JOIN solution, or any other kind of solution is fine for my purposes. Thanks.
You are close but you need a case expression:
select e.*,
       (case when exists (select 1
                          from Employees m
                          where m.Manager like '%=' + e.Employee
                         )
             then 'Y' else 'N'
        end) as isManager
from Employees e;
Note that I tweaked the logic for matching so "Anne" and "Roseanne" do not get confused. If the manager always starts with 'CN=', then compare against 'CN=' + e.Employee instead.
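To try the EXISTS approach without a SQL Server instance, here is a sketch using Python's sqlite3 module. SQLite is an assumption, so string concatenation is written with || instead of +, and the column names Employee and Manager are taken from the question's own query.

```python
import sqlite3

# In-memory SQLite database with the five rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee TEXT, Manager TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Bob", "CN=Lisa"), ("Amanda", "CN=Lisa"), ("James", "CN=Art"),
     ("Frank", "CN=Amanda"), ("Amy", "CN=Art")],
)

# For each employee e, look for any row m whose Manager value ends in
# '=' followed by e's name. The leading '=' in the pattern is what stops
# a name like Anne from matching a manager called Roseanne.
flags = conn.execute(
    """
    SELECT e.Employee, e.Manager,
           CASE WHEN EXISTS (SELECT 1
                             FROM Employees m
                             WHERE m.Manager LIKE '%=' || e.Employee)
                THEN 'Y' ELSE 'N'
           END AS IsManager
    FROM Employees e
    ORDER BY e.rowid
    """
).fetchall()
for row in flags:
    print(row)
```

Only Amanda is flagged, because she is the only employee whose name appears in another row's Manager column. Lisa and Art are managers too, but they have no rows of their own in the sample.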
You can also use outer apply to get your desired result. Here, outer apply returns 'Y' when an employee is also a manager; otherwise it returns null. Coalesce() is used to convert null to 'N'.
Schema and insert statements:
create table Employees (Employee varchar(50),employee_Manager varchar(50));
insert into Employees values('Bob', 'CN=Lisa');
insert into Employees values('Amanda', 'CN=Lisa');
insert into Employees values('James', 'CN=Art' );
insert into Employees values('Frank', 'CN=Amanda');
insert into Employees values('Amy', 'CN=Art' );
Query:
select a.*, coalesce(isManager, 'N') [Employee IS Manager]
from Employees a
outer apply (select top 1 'Y'   -- top 1 avoids duplicate rows when someone manages several employees
             from Employees b
             where b.employee_manager = 'CN=' + a.Employee
            ) manager (isManager)
Output:
Employee  employee_Manager  Employee IS Manager
--------  ----------------  -------------------
Bob       CN=Lisa           N
Amanda    CN=Lisa           Y
James     CN=Art            N
Frank     CN=Amanda         N
Amy       CN=Art            N

How to select only distinct row when there might be duplicates using SQL?

I have a tricky SQL query I could use some help with.
I have a phone directory table that was not designed very well. It has name, phone number, job description and primary_job_indicator. However, the primary_job_indicator isn't doing its job. Not everyone has a primary_job.
Here's some sample data:
fname  lname   phone     email           job         primary_job_ind
Tim    Burton  222-2222  tburton@ok.com  manager     Y
Jim    Classy  222-3333  tclassy@ok.com  instructor  Y
Jim    Classy  222-3333  tclassy@ok.com  dept head   N
Jane   Dill    222-4444  jdill@ok.com    janitor     N
I would like to select only the following, one row, with one job per person :
Tim    Burton  222-2222  tburton@ok.com  manager
Jim    classy  222-3333  jclassy@ok.com  instructor
Jane   Dill    222-4444  jdill@ok.com    janitor
I want to select from the table and avoid duplicate name+phone number+email.
If the person has only one row in the table, I want to select that row.
If the person has more than one row in the table, I want to select only one row - the one with primary_job_ind = 'Y' if it exists
I can't figure out how to do it :
SELECT fname, lname, phone, email, job
FROM phonedirectory
WHERE (( primary_job_ind = 'Y' ) OR ??????? )
Assuming there is no typo (which there probably is), this should do the job
select fname, lname, phone, email, job
from (
    select fname, lname, phone, email, job,
           row_number() over (
               partition by fname, lname, phone, email
               order by primary_job_ind desc
           ) r
    from phonedirectory
) t
where r = 1
It numbers the rows, belonging to the same person, with the primary job first, then takes only the first row for each person.
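The ordering trick works because 'Y' sorts after 'N', so a descending sort puts the primary job first when there is one. A minimal sketch of that behaviour with the question's sample data, using Python's sqlite3 module (SQLite 3.25 or later assumed for window functions; the alias t is my own):

```python
import sqlite3

# In-memory SQLite database with the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phonedirectory (fname TEXT, lname TEXT, phone TEXT, "
    "email TEXT, job TEXT, primary_job_ind TEXT)"
)
conn.executemany(
    "INSERT INTO phonedirectory VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Tim", "Burton", "222-2222", "tburton@ok.com", "manager", "Y"),
        ("Jim", "Classy", "222-3333", "tclassy@ok.com", "instructor", "Y"),
        ("Jim", "Classy", "222-3333", "tclassy@ok.com", "dept head", "N"),
        ("Jane", "Dill", "222-4444", "jdill@ok.com", "janitor", "N"),
    ],
)

# Number each person's rows with the 'Y' row first (descending order),
# then keep row 1: the primary job if present, else whatever row exists.
people = conn.execute(
    """
    SELECT fname, lname, phone, email, job
    FROM (SELECT fname, lname, phone, email, job,
                 ROW_NUMBER() OVER (
                     PARTITION BY fname, lname, phone, email
                     ORDER BY primary_job_ind DESC
                 ) AS r
          FROM phonedirectory) AS t
    WHERE r = 1
    ORDER BY lname
    """
).fetchall()
for row in people:
    print(row)
```

Jane Dill has no 'Y' row, so her only row is kept. If a person had several 'N' rows and no 'Y' row, which one is kept would be arbitrary unless a tie-breaker is added to the ORDER BY.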
See the documentation of ROW_NUMBER for details.