Why doesn't DISTINCT work in this case? (SQL) - sql

SELECT DISTINCT
employees.departmentname,
employees.firstname,
employees.salary,
employees.departmentid
FROM employees
JOIN (
SELECT MAX(salary) AS Highest, departmentID
FROM employees
GROUP BY departmentID
) departments ON employees.departmentid = departments.departmentid
AND employees.salary = departments.highest;
Why doesn't the DISTINCT work here?
I'm trying to have each department to show only once because the question is asking the highest salary in each department.

Use the ROW_NUMBER() function, as in:
select departmentname, firstname, salary, departmentid
from (
select e.*,
row_number() over(partition by departmentid, order by salary desc) as rn
from employees e
) x
where rn = 1

I'm trying to have each department to show only once because the question is asking the highest salary in each department.
Use window functions:
SELECT e.*
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY departmentID ORDER BY salary DESC) as seqnum
FROM employees e
) e
WHERE seqnum = 1;
This is guaranteed to return one row per department, even when there are ties. If you want all rows when there are ties, use RANK() instead.

Why doesn't the DISTINCT work here?
DISTINCT is not a function; it is a keyword that will eliminate duplicate rows when ALL the column values are duplicates. It does NOT apply to a single column.
The DISTINCT keyword has "worked" (i.e. done what it is intended to do) because there are no rows where all the column values are a duplicate of another row's values.
However, it hasn't solved your problem because DISTINCT is not the correct solution to your problem. For that, you want to "fetch the row which has the max value for a column [within each group]" (as per this question).

Gwen, Elena and Paula all have the same salary
and they are in the same department

Related

Return NULL when no records found Postgres

I am trying to solve Leetcode's second highest salary (https://leetcode.com/problems/second-highest-salary/). Here's what I implemented on postgres:
select foo.salary as "SecondHighestSalary"
from
(select salary,
dense_rank() over (order by salary desc) as rank
from Employee) foo
where rank = 2;
But the issue is, I need to return NULL if there are no records with rank = 2.
I then tried
select (case
when count(1) = 0 then NULL
else salary
end
)
from
(select salary,
dense_rank() over (order by salary desc) as rank
from Employee) foo
where rank = 2
group by salary;
But it still returns no records.
How do I output NULL when no records are returned?
The solution doesn't work because there's no row for the case to act on.
You can use coalesce and a sub-select.
Rewriting it as a CTE makes the sub-select more compact.
with ranked_salaries as (
select
salary,
dense_rank() over (order by salary desc) as "rank"
from Employee
)
select
coalesce(
(select salary from ranked_salaries where "rank" = 2),
null
);
Note that this is a simpler and faster approach for this particular problem.
select max(salary)
from Employee
where salary < (select max(salary) from Employee)
If salary is indexed, this should be very fast.
You don't actually need COALESCE, just an outer SELECT:
SELECT (
SELECT salary FROM (
SELECT salary, dense_rank() OVER (ORDER BY salary DESC NULLS LAST) AS rank
FROM employee
) sub
WHERE rank = 2
LIMIT 1
) AS second_highest_salary;
See:
Return a value if no record is found
Be sure to add NULLS LAST if salary can be NULL, or you are in for a surprise. (You'd get the highest salary.) See:
Sort NULL values to the end of a table
And there can be multiple rows with rank = 2, so add LIMIT 1.
With an index on salary, Schwern's 2nd query will be substantially faster, though - while dodging the NULL issue because max() excludes NULL values, and dodging the "no row" issue because aggregate functions always return a row, defaulting to NULL in absence of a value.
select max(salary) as "SecondHighestSalary"
from
(select salary,
dense_rank() over (order by salary desc) as rnk
from Employee) foo
where rnk = 2;
Using a dummy aggregate for the output will guarantee a row is returned. It also deals with the potential for tying rows.

Row_number function using directly

Is there any direct way of using row_number() function? I want to find 2 nd highest salary
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
WHERE rownum = 2;
It gives an error, However the below code works fine.
SELECT *
FROM (
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
) AS t
WHERE t.rownum = 2;
Is any way of directly using the row_number() function as in the first option which is giving the error?
You can not use the alias name of the same query as the condition for the where clause. You also can not use windowed queries as a passing condition in the where clause.
Here is a detailed explanation Why no windowed functions in where clauses?. It is so you need another query outside the inner query and needs to write sub-query.
You can get the Nth highest salary in SQL Server from the below query.
SELECT TOP 1 salary
FROM (
SELECT DISTINCT TOP N salary
FROM <YourTableNameHere>
ORDER BY salary DESC
) AS TEMP
ORDER BY salary
This query will give you the second highest salary ? No
SELECT id
,salary
,depid
from emp
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
Well actually, it will give you the salary that is on the second position when you order the salary's from highest to lowest... So if the highest is 100 and the second highest is 100 then you will get 100 as a result. To conclude this will return a row on the second place depending on the order by clause...
This next query will give you the second highest salary :
SELECT max(id)
, salary
, max(depid)
from emp
group by salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
But be aware, in case you have two employees from two different departments with the same salary then it will return you the one with the higher id and it will return the higher department id which can be incorrect.
And finally this will give you one employee that has a second largest salary with correct data:
SELECT id
, salary
, depid
from emp
where id = (SELECT max(id)
from emp
group by depid, salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY);
First, you want dense_rank(), not row_number() if you want the second highest value -- ties might get in the way otherwise.
You can use an arithmetic trick:
SELECT TOP (1) WITH TIES id, salary, depid
FROM emp
ORDER BY ABS(DENSE_RANK() over (PARTITION BY depid ORDER BY salary DESC) - 2)
The "-2" is an arithmetic trick to put the "second" values highest.
That said, I would stick with the subquery because the intent in clearer.
You could use a variation on the trick that uses a TOP 1 WITH TIES in combination with an ORDER BY ROW_NUMBER
SELECT TOP 1 WITH TIES
id,
salary,
depid
FROM emp
ORDER BY IIF(2 = ROW_NUMBER() OVER (PARTITION BY depid ORDER BY salary DESC), 1, 2)
But this trick does have the disadvantage that you can't sort it by something else.
Well, not unless you wrap it in a sub-query and sort the outer query.
A test on rextester here
I prefer to use dense_rank() instead of row_number() function with CTE (common table expression) for the scenario you have mentioned. CTE is modern, easy to use and have many cool features like it is memory resident, it can be used for DUI operations, it make code easy to understand etc.
To find Nth highest salary, the CTE look like
;with findnthsalary
as
(
select empid, deptid, salary,
dense_rank() over(partition by deptid order by salary desc) salrank
from
Employee
)
select distinct id, deptid, salary
from findnthsalary
where salrank = N
I used dense_rank() because if you use row_number() it will produce the wrong result in case multiple employees have the same salary in the same department.

SQL Query with two WHERE conditions

My task is:
To make a query which will get Employees, who earn the biggest salary for their working experience. In other words, the Employee who earns the biggest salary with the biggest experience.
As I consider, I need to make a query with two conditions:
select * from employee where salary in (select max(salary) from employee) and
hire_date in (select min(hire_date) from employee)
I think this was what you're trying to do:
select * from (
select *,
datediff(day,hire_date,getdate()) [Days_Worked],
dense_Rank() over(Partition by datediff(day,hire_date,getdate()) order by salary desc) [RN]
from employee
)a
where a.RN = 1
order by Days_Worked DESC
So that'll give you a list of employees with the highest salary against the employees that have worked the same number of days.
Just note that with dense rank if for example there are 2 employees that have worked 88 days and both earn $50000 (higher than anyone else) it will list both employees, use ROW_NUMBER() instead of DENSE_RANK() if you want to restrict examples like that to 1 employee.
If I get it correctly, this query solves your problem.
SELECT TOP 1 WITH TIES * FROM
employee
ORDER BY
hire_date ASC,
salary DESC

To display the name of the department that has the least student count in SQL

Tables:
Department (dept_id,dept_name)
Students(student_id,student_name,dept_id)
I am using Oracle. I have to print the name of that department that has the minimum no. of students. Since I am new to SQL, I am stuck on this problem. So far, I have done this:
select d.department_id,d.department_name,
from Department d
join Student s on s.department_id=d.department_id
where rownum between 1 and 3
group by d.department_id,d.department_name
order by count(s.student_id) asc;
The output is incorrect. It is coming as IT,SE,CSE whereas the output should be IT,CSE,SE! Is my query right? Or is there something missing in my query?
What am I doing wrong?
One of the possibilities:
select dept_id, dept_name
from (
select dept_id, dept_name,
rank() over (order by cnt nulls first) rn
from department
left join (select dept_id, count(1) cnt
from students
group by dept_id) using (dept_id) )
where rn = 1
Group data from table students at first, join table department, rank numbers, take first row(s).
left join are used is used to guarantee that we will check departments without students.
rank() is used in case that there are two or more departments with minimal number of students.
To find the department(s) with the minimum number of students, you'll have to count per department ID and then take the ID(s) with the minimum count.
As of Oracle 12c this is simply:
select department_id
from student
group by department_id
order by count(*)
fetch first row with ties
You then select the departments with an ID in the found set.
select * from department where id in (<above query>);
In older versions you could use RANK instead to rank the departments by count:
select department_id, rank() over (order by count(*)) as rnk
from student
group by department_id
The rows with rnk = 1 would be the department IDs with the lowest count. So you could select the departments with:
select * from department where (id, 1) in (<above query>);

Select records with maximum value

I have a table that is called: customers.
I'm trying to get the name and the salary of the people who have the maximum salary.
So I have tried this:
SELECT name, salary AS MaxSalary
FROM CUSTOMERS
GROUP BY salary
HAVING salary = max(salary)
Unfortunately, I got this error:
Column 'CUSTOMERS.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I know I should add the name column to the group by clause, but I get all the records of the table.
I know that I can do it by:
SELECT name, salary
FROM CUSTOMERS
WHERE salary = (SELECT MAX(salary) FROM CUSTOMERS)
But I want to achieve it by group by and having clauses.
This requirement isn't really suited for a group by and having solution. The easiest way to do so, assuming you're using a modern-insh version of MS SQL Server, is to use the rank window function:
SELECT name, salary
FROM (SELECT name, salary, RANK() OVER (ORDER BY salary DESC) rk
FROM customers) c
WHERE rk = 1
Mureinik's answer is good with rank, but if you didn't want a windowed function for whatever reason, you can just use a CTE or a subquery.
with mxs as (
select
max(salary) max_salary
from
customers
)
select
name
,salary
from
customers cst
join mxs on mxs.max_salary = cst.salary
There was no need to use group by and having clause there, you know. But if you want to use them then query should be
SELECT name, salary
FROM CUSTOMERS
GROUP BY salary
having salary = (select max(salary) from CUSTOMERS)