Retrieve 3rd MAX salary in Hive - hive

I'm a novice. I have the following Employee table.
ID Name Country Salary ManagerID
I retrieved the 3rd max salary using the following.
select name , salary From (
select name, salary from
employee sort by salary desc limit 3)
result sort by salary limit 1;
How to do the same to display 3rd max salary for each country? can we use OVER (PARTITION BY country)? I tried looking in the languageManual Windowing and Analytics but I'm finding it difficult to understand. Please help!

You're definitely on the right track with windowing functions. row_number() is a good function to use here.
select name, salary
from (
select name
, salary
, row_number() over (partition by country order by salary desc) idx
from employee ) x
where idx = 3
when you order by salary, make sure that it is a numerical type, or it will not be sorted correctly.

Related

Retrieve attribute with max function

How retrieve other attribute with max function like max salary with name.
to retrieve max salary along with their name please help anyone.
If my understanding of the question is correct, you can use something like this:
select name, salary from table where salary = (select max(salary) from table)
With SalaryOrder AS (
Select name, salary, Row_Number() Over(Order By salary Desc) RN
From table
)
Select /*the top clause is needed as described below*/ Top (1) *
From SalaryOrder Where RN = 1
this query may return more than a record when existing people with the same salary.
To solve this you can add more orders in CTE or use Top one to pick a random record.
RANK() should be an efficient way to get name where salary is MAX. Using sub-query will also get us the same result but it will take more time than RANK() in some cases.
Query:
SELECT name, salary
FROM
(
SELECT name, salary, RANK() over(ORDER BY salary DESC) AS rnk
FROM your_table
) AS a
WHERE rnk=1
Look at the db<>fiddle with time consumption. (The time may vary for different runs)

Row_number function using directly

Is there any direct way of using row_number() function? I want to find 2 nd highest salary
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
WHERE rownum = 2;
It gives an error, However the below code works fine.
SELECT *
FROM (
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
) AS t
WHERE t.rownum = 2;
Is any way of directly using the row_number() function as in the first option which is giving the error?
You can not use the alias name of the same query as the condition for the where clause. You also can not use windowed queries as a passing condition in the where clause.
Here is a detailed explanation Why no windowed functions in where clauses?. It is so you need another query outside the inner query and needs to write sub-query.
You can get the Nth highest salary in SQL Server from the below query.
SELECT TOP 1 salary
FROM (
SELECT DISTINCT TOP N salary
FROM <YourTableNameHere>
ORDER BY salary DESC
) AS TEMP
ORDER BY salary
This query will give you the second highest salary ? No
SELECT id
,salary
,depid
from emp
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
Well actually, it will give you the salary that is on the second position when you order the salary's from highest to lowest... So if the highest is 100 and the second highest is 100 then you will get 100 as a result. To conclude this will return a row on the second place depending on the order by clause...
This next query will give you the second highest salary :
SELECT max(id)
, salary
, max(depid)
from emp
group by salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
But be aware, in case you have two employees from two different departments with the same salary then it will return you the one with the higher id and it will return the higher department id which can be incorrect.
And finally this will give you one employee that has a second largest salary with correct data:
SELECT id
, salary
, depid
from emp
where id = (SELECT max(id)
from emp
group by depid, salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY);
First, you want dense_rank(), not row_number() if you want the second highest value -- ties might get in the way otherwise.
You can use an arithmetic trick:
SELECT TOP (1) WITH TIES id, salary, depid
FROM emp
ORDER BY ABS(DENSE_RANK() over (PARTITION BY depid ORDER BY salary DESC) - 2)
The "-2" is an arithmetic trick to put the "second" values highest.
That said, I would stick with the subquery because the intent in clearer.
You could use a variation on the trick that uses a TOP 1 WITH TIES in combination with an ORDER BY ROW_NUMBER
SELECT TOP 1 WITH TIES
id,
salary,
depid
FROM emp
ORDER BY IIF(2 = ROW_NUMBER() OVER (PARTITION BY depid ORDER BY salary DESC), 1, 2)
But this trick does have the disadvantage that you can't sort it by something else.
Well, not unless you wrap it in a sub-query and sort the outer query.
A test on rextester here
I prefer to use dense_rank() instead of row_number() function with CTE (common table expression) for the scenario you have mentioned. CTE is modern, easy to use and have many cool features like it is memory resident, it can be used for DUI operations, it make code easy to understand etc.
To find Nth highest salary, the CTE look like
;with findnthsalary
as
(
select empid, deptid, salary,
dense_rank() over(partition by deptid order by salary desc) salrank
from
Employee
)
select distinct id, deptid, salary
from findnthsalary
where salrank = N
I used dense_rank() because if you use row_number() it will produce the wrong result in case multiple employees have the same salary in the same department.

SQL Query with two WHERE conditions

My task is:
To make a query which will get Employees, who earn the biggest salary for their working experience. In other words, the Employee who earns the biggest salary with the biggest experience.
As I consider, I need to make a query with two conditions:
select * from employee where salary in (select max(salary) from employee) and
hire_date in (select min(hire_date) from employee)
I think this was what you're trying to do:
select * from (
select *,
datediff(day,hire_date,getdate()) [Days_Worked],
dense_Rank() over(Partition by datediff(day,hire_date,getdate()) order by salary desc) [RN]
from employee
)a
where a.RN = 1
order by Days_Worked DESC
So that'll give you a list of employees with the highest salary against the employees that have worked the same number of days.
Just note that with dense rank if for example there are 2 employees that have worked 88 days and both earn $50000 (higher than anyone else) it will list both employees, use ROW_NUMBER() instead of DENSE_RANK() if you want to restrict examples like that to 1 employee.
If I get it correctly, this query solves your problem.
SELECT TOP 1 WITH TIES * FROM
employee
ORDER BY
hire_date ASC,
salary DESC

Query to find No. of employees earning maximum salary

I am trying to find the number of employees in a table that earn exactly the maximum salary of all the employees in the table called tblPerson.
Select Max(x.[No of Employees]) as Number, x.Salary as Salary
from
(
Select Count(Id) as [No of Employees], Salary
from tblPerson
Group by Salary
Having Salary = MAX(Salary)
)x
where x.[No of Employees]=3
Now I know this is a kind of long and complex way of doing it, but I was trying to do it using a derived table. But I am getting the error:
"Column 'x.Salary' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
My question is, why am I getting this particular error since the main query is a simple Select statement with a where clause. Isn't it??
Mainly, aggregate functions work only with other aggregate functions or grouped by columns.
Why? Because an aggregate function needs to know the set of values to do calculation with.
In you case, the max() will want to use all the data available for the calculation and display a single result (single row) and the other column will want to be displayed row by row. So there's a conflict.
Thank you all. Every answer helped me. However, I think I found a pretty simple way to do it:
Select top 1 Count(Id) as [No of Employees], salary
from tblPerson
Group by Salary
Order by [No of Employees] DESC
select count(*) from tblPerson where salary=(select max(salary) from tblPerson)
You get the error because 'max' is an aggregation, while you have nothing to aggregate the number by.
Select Max(x.[No of Employees]) as Number, x.Salary as Salary
from
(
Select Count(Id) as [No of Employees], Salary
from tblPerson
Group by Salary
Having Salary = MAX(Salary)
)x
---------
Group by Salary -- all other items in your select statement
---------
where x.[No of Employees]=3
however, you can also use a temporary table or variable to find the persons.
To solve this via a variable, you could do the following
declare #maxSalary Decimal
set #maxSalary = (Select max(salary) from tblperson) --insert the max value into a variable
Then either aggregate the persons (or do some other logic):
Select ID from tblperson where salary = #maxSalary
The reason for not using a group by is that using a variable is more efficient, as you search the table instead of aggregating over it.
Create a CTE (RESULT), and using DENSE_RANK function, get the highest salary, together with the EmployeeID's.
The first row of the RESULT table will give the highest salary.
Using the aggregate function COUNT, get the number of Employees with the highest Salary.
with RESULT (EmployeeID, Salary, DenseRank) as
(select EmployeeID, Salary,
DENSE_RANK() over (ORDER BY SALARY DESC) AS DenseRank
from Employee)
select TOP 1 Salary,
(select COUNT(EmployeeID)
from Employee
where Salary = (select TOP 1 Salary)
from RESULT
where DenseRank = 1)
)
from RESULT
where DenseRank = 1;

How to find out 2nd highest salary of employees?

Created table named geosalary with columns name, id, and salary:
name id salary
patrik 2 1000
frank 2 2000
chinmon 3 1300
paddy 3 1700
I tried this below code to find 2nd highest salary:
SELECT salary
FROM (SELECT salary, DENSE_RANK() OVER(ORDER BY SALARY) AS DENSE_RANK FROM geosalary)
WHERE DENSE_RANK = 2;
However, getting this error message:
ERROR: subquery in FROM must have an alias
SQL state: 42601
Hint: For example, FROM (SELECT ...) [AS] foo.
Character: 24
What's wrong with my code?
I think the error message is pretty clear: your sub-select needs an alias.
SELECT t.salary
FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY SALARY DESC) AS DENSE_RANK
FROM geosalary
) as t --- this alias is missing
WHERE t.dense_rank = 2
The error message is pretty obvious: You need to supply an alias for the subquery.
Here is a simpler / faster alternative:
SELECT DISTINCT salary
FROM geosalary
ORDER BY salary DESC NULLS LAST
OFFSET 1
LIMIT 1;
This finds the "2nd highest salary" (1 row), as opposed to other queries that find all employees with the 2nd highest salary (1-n rows).
I added NULLS LAST, as NULL values typically shouldn't rank first for this purpose. See:
PostgreSQL sort by datetime asc, null first?
SELECT department_id, salary, RANK1 FROM (
SELECT department_id,
salary,
DENSE_RANK ()
OVER (PARTITION BY department_id ORDER BY SALARY DESC)
AS rank1
FROM employees) result
WHERE rank1 = 3
This above query will get you the 3rd highest salary in the individual department. If you want regardless of the department, then just remove PARTITION BY department_id
Your SQL engine doesn't know the "salary" column of which table you are using, that's why you need to use an alias to differentiate the two columns.
Try this:
SELECT salary
FROM (SELECT G.salary ,DENSE_RANK() OVER(ORDER BY G.SALARY) AS DENSE_RANK FROM geosalary G)
WHERE DENSE_RANK=2;
WITH salaries AS (SELECT salary, DENSE_RANK() OVER(ORDER BY SALARY) AS DENSE_RANK FROM geosalary)
SELECT * FROM salaries WHERE DENSE_RANK=2;
select level, max(salary)
from geosalary
where level=2
connect by
prior salary>salary
group by level;
In case of duplicates in salary column below query will give the right result:
WITH tmp_tbl AS
(SELECT salary,
DENSE_RANK() OVER (ORDER BY SALARY) AS DENSE_RANK
FROM geosalary
)
SELECT salary
FROM tmp_tbl
WHERE dense_rank =
(SELECT MAX(dense_rank)-1 FROM tmp_tbl
)
AND rownum=1;
Here is SQL standard
SELECT name, salary
FROM geosalary
ORDER BY salary desc
OFFSET 1 ROW
FETCH FIRST 1 ROW ONLY
To calculate nth highest salary change offset value
SELECT MAX(salary) FROM geosalary WHERE salary < ( SELECT MAX(salary) FROM geosalary )