Get list of unique records - sql

I have the following table which lists the employees and their corresponding managers:
id | employeeid | managerid
1 | 34256 | 12789
2 | 21222 | 34256
3 | 12435 | 34256
.....
.....
What is the recommended way to list all distinct employees (by ID) in a single list?
Note that not every manager is necessarily listed in the employeeid column (a manager may not have a manager of their own).

If I understand this correctly, this will combine all distinct employee IDs from the two columns, with UNION removing the duplicates between them:
SELECT employeeid AS Employee
FROM tableA
UNION
SELECT managerid AS Employee
FROM tableA
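A quick way to check the UNION answer is to run it against the three sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in database (an assumption, since the question doesn't name an engine) and keeps the answer's placeholder table name tableA. Note that 12789 appears only in managerid, which is exactly the case the question's note is about.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, employeeid INTEGER, managerid INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(1, 34256, 12789), (2, 21222, 34256), (3, 12435, 34256)],
)

# UNION (not UNION ALL) removes duplicates across both SELECTs, so a
# manager who is also someone's employee is listed only once.
all_ids = [row[0] for row in conn.execute(
    """
    SELECT employeeid AS Employee FROM tableA
    UNION
    SELECT managerid FROM tableA
    ORDER BY 1
    """
)]
print(all_ids)  # [12435, 12789, 21222, 34256]
```

With UNION ALL instead, 34256 would appear three times (once as an employee, twice as a manager).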

This should do it:
SELECT DISTINCT employeeid FROM yourtablename
But seriously, you could easily have found this yourself by searching for the keyword "distinct"! Or did I miss something?

SELECT id, employeeid, managerid
FROM
(SELECT yourtablename.*,
ROW_NUMBER() OVER (PARTITION BY managerid ORDER BY employeeid DESC) AS RN
FROM yourtablename) AS t
WHERE RN = 1
ORDER BY ID
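To see what the ROW_NUMBER() query actually returns, here is a sketch running it on the sample rows with Python's sqlite3 module (it assumes SQLite 3.25 or newer, which added window functions). It keeps one row per distinct managerid, namely the one with the highest employeeid, which is a different result from a plain list of every distinct person.

```python
import sqlite3

# Sample rows from the question; yourtablename is the answer's placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtablename (id INTEGER, employeeid INTEGER, managerid INTEGER)")
conn.executemany(
    "INSERT INTO yourtablename VALUES (?, ?, ?)",
    [(1, 34256, 12789), (2, 21222, 34256), (3, 12435, 34256)],
)

# ROW_NUMBER() restarts at 1 for every managerid; keeping RN = 1 keeps
# only the row with the highest employeeid under each manager.
rows = conn.execute(
    """
    SELECT id, employeeid, managerid
    FROM (SELECT yourtablename.*,
                 ROW_NUMBER() OVER (PARTITION BY managerid
                                    ORDER BY employeeid DESC) AS RN
          FROM yourtablename) AS t
    WHERE RN = 1
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 34256, 12789), (2, 21222, 34256)]
```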

Related

Why are ids not repeated in the result set when using a correlated subquery in the WHERE clause?

I have a table with columns id, forename, surname and created (a date), such as the following:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Smith | 2008-01-01
1 | Tom | Windsor | 2008-02-01
2 | Anne | Thorn | 2008-01-05
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
Basically, I want this to return the most recent name for each ID, so it would return:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Windsor | 2008-02-01
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
I get the desired result with this query.
SELECT id, forename, surname, created
FROM name n
WHERE created = (SELECT MAX(created)
FROM name
GROUP BY id
HAVING id = n.id);
I am getting the result I want, but I fail to understand WHY THE IDS ARE NOT BEING REPEATED in the result set. My understanding of a correlated subquery is that it takes one row from the outer query's table and runs the inner subquery for it. Shouldn't the result repeat an "id" when that id repeats in the outer query? Can someone explain what exactly is happening behind the scenes?
First, your subquery does not need a GROUP BY. It is more commonly written as:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
WHERE n.created = (SELECT MAX(n2.created)
FROM name n2
WHERE n2.id = n.id
);
You should get in the habit of qualifying all column references, especially when your query has multiple table references.
I think you are asking why this works. Each row in the outer query is tested against the condition: "is my created equal to the maximum created among all rows in the name table with the same id?" Every row is still evaluated, but in your data only one row per id satisfies that condition, so ids are not repeated. (If two rows for the same id shared the maximum created date, both would be returned.)
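To make the row-by-row evaluation visible, the sketch below (Python's sqlite3 as a stand-in engine, with dates stored as ISO-8601 text so that MAX compares them correctly, both assumptions) moves the correlated subquery into the SELECT list, prints each outer row next to the maximum date computed for its own id, and then runs the WHERE version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id INTEGER, forename TEXT, surname TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO name VALUES (?, ?, ?, ?)",
    [
        (1, "Tom", "Smith", "2008-01-01"),
        (1, "Tom", "Windsor", "2008-02-01"),
        (2, "Anne", "Thorn", "2008-01-05"),
        (2, "Anne", "Baker", "2008-03-01"),
        (3, "Bill", "Sykes", "2008-01-20"),
    ],
)

# Every outer row gets the MAX(created) of its own id; the WHERE clause
# would keep only the rows where the two dates are equal.
for row in conn.execute(
    """
    SELECT n.id, n.surname, n.created,
           (SELECT MAX(n2.created) FROM name n2 WHERE n2.id = n.id) AS max_for_id
    FROM name n
    ORDER BY n.id, n.created
    """
):
    print(row, "kept" if row[2] == row[3] else "filtered out")

# The actual query: one surviving row per id in this data.
latest = conn.execute(
    """
    SELECT n.id, n.forename, n.surname, n.created
    FROM name n
    WHERE n.created = (SELECT MAX(n2.created) FROM name n2 WHERE n2.id = n.id)
    ORDER BY n.id
    """
).fetchall()
print(latest)
```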
You can also join the table to a derived table holding MAX(created) per id, matching on both id and created (matching on created alone could pair a row with another id's maximum date):
SELECT n.id, n.forename, n.surname, n.created
FROM name n
INNER JOIN ( SELECT id, MAX(created) AS created FROM name GROUP BY id ) t
ON n.id = t.id AND n.created = t.created;
or using the IN operator with an (id, created) row value (not supported by SQL Server):
SELECT id, forename, surname, created
FROM name n
WHERE ( id, created ) IN (SELECT id, MAX(created)
FROM name
GROUP BY id );
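or using EXISTS with a HAVING clause in the subquery, correlated on id so that each row is only compared with its own id's maximum:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
WHERE EXISTS (SELECT 1
              FROM name n2
              WHERE n2.id = n.id
              GROUP BY n2.id
              HAVING MAX(n2.created) = n.created
             );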
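The sketch below compares the join-based, IN-based and EXISTS-based alternatives on the sample data plus one hypothetical extra row, (1, Tom, Jones, 2008-01-20), whose date happens to equal Bill's maximum. It runs on Python's sqlite3 (row-value IN needs SQLite 3.15 or newer). A join that matches on created alone picks up the Jones row; the versions that also match on id do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id INTEGER, forename TEXT, surname TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO name VALUES (?, ?, ?, ?)",
    [
        (1, "Tom", "Smith", "2008-01-01"),
        (1, "Tom", "Windsor", "2008-02-01"),
        (1, "Tom", "Jones", "2008-01-20"),  # hypothetical extra row
        (2, "Anne", "Thorn", "2008-01-05"),
        (2, "Anne", "Baker", "2008-03-01"),
        (3, "Bill", "Sykes", "2008-01-20"),
    ],
)

queries = {
    # Matches on created only: Tom Jones pairs with Bill's maximum date.
    "join_created_only": """
        SELECT n.id, n.surname FROM name n
        JOIN (SELECT id, MAX(created) AS created FROM name GROUP BY id) t
          ON n.created = t.created""",
    # Matches on both id and created.
    "join_id_and_created": """
        SELECT n.id, n.surname FROM name n
        JOIN (SELECT id, MAX(created) AS created FROM name GROUP BY id) t
          ON n.id = t.id AND n.created = t.created""",
    # Row-value IN.
    "in_row_value": """
        SELECT id, surname FROM name
        WHERE (id, created) IN (SELECT id, MAX(created) FROM name GROUP BY id)""",
    # EXISTS correlated on id as well as on the date.
    "exists_correlated": """
        SELECT n.id, n.surname FROM name n
        WHERE EXISTS (SELECT 1 FROM name n2
                      WHERE n2.id = n.id
                      GROUP BY n2.id
                      HAVING MAX(n2.created) = n.created)""",
}

results = {key: sorted(conn.execute(sql).fetchall()) for key, sql in queries.items()}
for key, rows in results.items():
    print(key, rows)
```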

SQL Server : finding where the chain of hierarchy broke

I have more than 70k employee records in a table. It looks like this:
+----------------+----------------------+----------+
| EmployeeId | name | ManagerID|
+----------------+----------------------+----------+
| 1 | Iron Man | 2 |
| 2 | Batman | 4 |
| 3 | Superman | 2000 |
| 4 | Captain America | 3 |
+----------------+----------------------+----------+
Here, Superman has an invalid ManagerID because ManagerID = 2000 doesn't exist in the EmployeeId column. In order to assign a new ManagerID to Superman, I need to find out at what level of the hierarchy he is located. I know it should be some kind of recursive query, but I am having a lot of difficulty. Could anybody help? Thank you so much!
This might help you get started.
CREATE TABLE Employees ( EmployeeID INT, Name VARCHAR(200), ManagerID INT );
INSERT INTO Employees ( EmployeeID, Name, ManagerID ) VALUES ( 1, 'Iron Man', 2 ), ( 2, 'Batman', 4 ), ( 3, 'Superman', 2000 ), ( 4, 'Captain America', 3 );
-- In SQL Server, the statement before WITH must be terminated with a semicolon
WITH Relationships ( ManagerID, Name, EmployeeID ) AS
(
    -- Anchor: every employee whose ManagerID points at an existing employee
    SELECT
        ManagerID, Name, EmployeeID
    FROM
        Employees
    WHERE
        ManagerID IN ( SELECT EmployeeID FROM Employees )
    UNION ALL
    -- Recursive step: move one level up, to the current manager's own manager
    SELECT
        Employees.ManagerID, Relationships.Name, Relationships.EmployeeID
    FROM
        Employees
        INNER JOIN Relationships
            ON Employees.EmployeeID = Relationships.ManagerID
)
SELECT
    EmployeeID, Name, ManagerID
FROM
    Relationships
WHERE
    EmployeeID = 1 -- Iron Man
OPTION ( MAXRECURSION 25000 );
Replace "EmployeeID = 1" with the ID of the employee you want to target. The number of rows returned is that employee's level. To get the level as a column, you could add ROW_NUMBER() to the outermost query or carry a level counter through the CTE.
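As an alternative sketch (not the answer's exact query), the walk can start from one employee, follow ManagerID upward one row at a time, and carry an explicit level counter. It uses SQLite's WITH RECURSIVE through Python's sqlite3 module rather than SQL Server; the cap of 100 levels is an arbitrary assumption to stop runaway recursion if the data ever contains a cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, ManagerID INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Iron Man", 2), (2, "Batman", 4), (3, "Superman", 2000), (4, "Captain America", 3)],
)

# Start at one employee and repeatedly join to that row's manager,
# counting levels; the level < 100 guard is an assumed safety limit.
CHAIN_SQL = """
WITH RECURSIVE chain(EmployeeID, Name, ManagerID, level) AS (
    SELECT EmployeeID, Name, ManagerID, 1
    FROM Employees
    WHERE EmployeeID = ?
    UNION ALL
    SELECT e.EmployeeID, e.Name, e.ManagerID, c.level + 1
    FROM Employees e
    JOIN chain c ON e.EmployeeID = c.ManagerID
    WHERE c.level < 100
)
SELECT EmployeeID, Name, ManagerID, level FROM chain ORDER BY level
"""

chain = conn.execute(CHAIN_SQL, (1,)).fetchall()  # 1 = Iron Man
for row in chain:
    print(row)

# The walk stops where ManagerID matches no EmployeeID: the broken link.
broken_link = chain[-1]
print(broken_link)  # (3, 'Superman', 2000, 4)
```

If the walk instead ended at a real top-level employee, the last row would have a NULL ManagerID rather than a dangling one.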
To find the broken records, you can use a NOT EXISTS subquery:
select *
from Employees t
where not exists (select 1 from Employees e where e.EmployeeID = t.ManagerID);
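Here is the NOT EXISTS check on the question's four rows plus one hypothetical top-level employee (Nick Fury, with a NULL ManagerID), run through Python's sqlite3. The extra IS NOT NULL filter is an addition of this sketch: without it, a NULL ManagerID also counts as "not found", because EmployeeID = NULL never evaluates to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, ManagerID INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Iron Man", 2), (2, "Batman", 4), (3, "Superman", 2000),
     (4, "Captain America", 3), (5, "Nick Fury", None)],  # row 5 is hypothetical
)

# Rows whose ManagerID points at no existing EmployeeID, excluding
# genuine top-level employees that have no manager at all.
broken = conn.execute(
    """
    SELECT t.EmployeeID, t.Name, t.ManagerID
    FROM Employees t
    WHERE t.ManagerID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Employees e WHERE e.EmployeeID = t.ManagerID)
    """
).fetchall()
print(broken)  # [(3, 'Superman', 2000)]
```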

postgresql find duplicates in column with ID

For instance, say I have a table "Name" with duplicate records in it:
Id | Firstname
--------------------
1 | John
2 | John
3 | Marc
4 | Jammie
5 | John
6 | Marc
How can I fetch the duplicate records and display them with their respective primary key IDs?
Use the COUNT() OVER() window aggregate function:
Select * from
(
select Id, Firstname, count(1) over (partition by Firstname) as Cnt
from yourtable
) a
Where Cnt > 1
SELECT t.*
FROM t
INNER JOIN
(SELECT firstname
FROM t
GROUP BY firstname
HAVING COUNT(*) > 1) sub
ON t.firstname = sub.firstname
A subquery does the trick: select the first names that are found more than once in your table, t, then join these names back to the main table to pull in the primary key.
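Both answers can be checked against the sample data with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table is called t here, standing in for the question's "Name" table; the window version and the join version should return the same five rows.

```python
import sqlite3

# "t" stands in for the question's "Name" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY, Firstname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "John"), (2, "John"), (3, "Marc"), (4, "Jammie"), (5, "John"), (6, "Marc")],
)

# Window-function version: every row carries the size of its name group.
window_rows = conn.execute(
    """
    SELECT Id, Firstname FROM (
        SELECT Id, Firstname, COUNT(*) OVER (PARTITION BY Firstname) AS Cnt
        FROM t
    ) a
    WHERE Cnt > 1
    ORDER BY Id
    """
).fetchall()

# Join version: find the duplicated names first, then join back for the ids.
join_rows = conn.execute(
    """
    SELECT t.Id, t.Firstname FROM t
    JOIN (SELECT Firstname FROM t GROUP BY Firstname HAVING COUNT(*) > 1) sub
      ON t.Firstname = sub.Firstname
    ORDER BY t.Id
    """
).fetchall()

print(window_rows)  # [(1, 'John'), (2, 'John'), (3, 'Marc'), (5, 'John'), (6, 'Marc')]
print(window_rows == join_rows)  # True
```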

Alternative way to write this query?

Let's say we have a table named Employee with a column 'Name':
+---------+
| Name |
+---------+
| Jack |
+---------+
| Paul |
+---------+
| Jack |
+---------+
To get the distinct names, we can run this query:
Select DISTINCT Name
from Employee
Is there any other way to retrieve the distinct value?
GROUP BY is silly, but UNION is even worse:
select name from Employee
union
select name from Employee
You can also do INTERSECT...
Select Name
from Employee
Group by Name
This also gives the same result.
I don't know why you don't want to use DISTINCT, but you can use GROUP BY:
Select Name
from Employee
group by Name;
If you just want to have some fun:
select top 1 with ties Name
from Employee
order by ROW_NUMBER() over(partition by Name order by Name)
;WITH cte AS (
    SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name DESC) AS seq
    FROM Employee
)
SELECT Name FROM cte WHERE seq = 1
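The alternatives can be compared side by side with a small sketch using Python's sqlite3 module. TOP 1 WITH TIES is SQL Server syntax and has no SQLite equivalent, so it is left out; the ROW_NUMBER() variant uses a CTE named cte (a hypothetical name) and needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?)", [("Jack",), ("Paul",), ("Jack",)])

# Each query should return every name exactly once.
queries = {
    "distinct": "SELECT DISTINCT Name FROM Employee",
    "group_by": "SELECT Name FROM Employee GROUP BY Name",
    "union": "SELECT Name FROM Employee UNION SELECT Name FROM Employee",
    "intersect": "SELECT Name FROM Employee INTERSECT SELECT Name FROM Employee",
    "row_number": """
        WITH cte AS (
            SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS seq
            FROM Employee
        )
        SELECT Name FROM cte WHERE seq = 1""",
}

# Sort in Python because none of the queries has an ORDER BY.
results = {key: sorted(r[0] for r in conn.execute(sql)) for key, sql in queries.items()}
print(results)
```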

SQL group by with a count

I have a table (simplified below)
|company|name |age|
| 1 | a | 3 |
| 1 | a | 3 |
| 1 | a | 2 |
| 2 | b | 8 |
| 3 | c | 1 |
| 3 | c | 1 |
For various reasons, the age column should be the same for each company. I have another process that updates this table, and sometimes it puts an incorrect age in. For company 1, the age should always be 3.
I want to find out which companies have a mismatch of age.
I've done this:
select company, name, age from table group by company, name, age
but I don't know how to get the rows where the age is different. This table is a lot wider and has loads of columns, so I cannot really eyeball it.
Can anyone help?
Thanks
You should not be including age in the group by clause.
SELECT company
FROM tableName
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
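A sketch of the HAVING COUNT(DISTINCT age) check on the sample rows, using Python's sqlite3 module and the answer's placeholder name tableName:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?)",
    [(1, "a", 3), (1, "a", 3), (1, "a", 2), (2, "b", 8), (3, "c", 1), (3, "c", 1)],
)

# A consistent group has exactly one distinct age; any other count
# means the group contains conflicting ages.
mismatched = conn.execute(
    """
    SELECT company
    FROM tableName
    GROUP BY company, name
    HAVING COUNT(DISTINCT age) <> 1
    """
).fetchall()
print(mismatched)  # [(1,)]
```

Because COUNT(DISTINCT age) ignores NULLs, a group whose ages are all NULL yields 0 and is also reported by <> 1, whereas > 1 would skip it.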
If you want to find the row(s) with a different age than the max-count age of each company/name group:
WITH CTE AS
(
    select t1.company, t1.name, t1.age,
           maxAge = (select top 1 t2.age
                     from dbo.table1 t2
                     where t2.company = t1.company and t2.name = t1.name
                     group by t2.age
                     order by count(*) desc)
    from dbo.table1 t1
)
select * from CTE
where age <> maxAge
If you want to replace the incorrect ages with the correct ones, you just need to replace the final SELECT with an UPDATE:
WITH CTE AS
(
    select t1.company, t1.name, t1.age,
           maxAge = (select top 1 t2.age
                     from dbo.table1 t2
                     where t2.company = t1.company and t2.name = t1.name
                     group by t2.age
                     order by count(*) desc)
    from dbo.table1 t1
)
UPDATE CTE SET age = maxAge
WHERE age <> maxAge
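SQL Server lets you UPDATE through the CTE, but SQLite does not, so the sketch below (Python's sqlite3, with a hypothetical id primary key) repeats the most-frequent-age subquery in a plain SELECT and a plain UPDATE. LIMIT 1 stands in for TOP 1, and breaking ties by the smaller age is an assumption; TOP 1 without a tie-breaker picks an arbitrary age when two ages are equally frequent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO table1 (company, name, age) VALUES (?, ?, ?)",
    [(1, "a", 3), (1, "a", 3), (1, "a", 2), (2, "b", 8), (3, "c", 1), (3, "c", 1)],
)

# Most frequent age in the outer row's company/name group. The outer
# table is referenced by its plain name (table1), the inner copy as t2.
MODE_AGE = """(SELECT t2.age FROM table1 t2
               WHERE t2.company = table1.company AND t2.name = table1.name
               GROUP BY t2.age
               ORDER BY COUNT(*) DESC, t2.age
               LIMIT 1)"""

# Rows whose age differs from their group's most frequent age.
wrong_rows = conn.execute(
    f"SELECT id, company, name, age, {MODE_AGE} AS maxAge "
    f"FROM table1 WHERE age <> {MODE_AGE} ORDER BY id"
).fetchall()
print(wrong_rows)  # [(3, 1, 'a', 2, 3)]

# Same subquery in a plain UPDATE, since SQLite cannot update through a CTE.
conn.execute(f"UPDATE table1 SET age = {MODE_AGE} WHERE age <> {MODE_AGE}")
ages = [r[0] for r in conn.execute("SELECT age FROM table1 ORDER BY id")]
print(ages)  # [3, 3, 3, 8, 1, 1]
```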
Since you asked "how to get the rows where the age is different" and not just the companies:
Add a unique row id (a primary key) if there isn't already one. Let's call it id.
Then, do
select id from tableName
where company in
(select company from tableName
group by company
having count(distinct age)>1)
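With the id column in place, the query can be tried on the sample rows through Python's sqlite3 module. Here tableName replaces the reserved word table, and the ids are assigned automatically by an INTEGER PRIMARY KEY column (both assumptions of this sketch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is the surrogate key the answer suggests adding.
conn.execute("CREATE TABLE tableName (id INTEGER PRIMARY KEY, company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO tableName (company, name, age) VALUES (?, ?, ?)",
    [(1, "a", 3), (1, "a", 3), (1, "a", 2), (2, "b", 8), (3, "c", 1), (3, "c", 1)],
)

# Every row belonging to a company that has more than one distinct age.
ids = [r[0] for r in conn.execute(
    """
    SELECT id FROM tableName
    WHERE company IN (SELECT company FROM tableName
                      GROUP BY company
                      HAVING COUNT(DISTINCT age) > 1)
    ORDER BY id
    """
)]
print(ids)  # [1, 2, 3]
```

This returns all rows of the affected company, not only the row with the odd age; the most-frequent-age query is what singles out the wrong row.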