Field value counts with aggregate conditions - sql

Suppose I have the following applicant data for jobs in a company:
id position salary
——————————————————————
0 senior 20000
1 senior 15000
2 associate 10000
The budget is 40000 and the preference is to hire senior managers. What PostgreSQL constructs can I use to get the following result for the number of hires?
seniors associates
———————————————————
2 0
Any directions would be appreciated.
Here is a starting sqlfiddle: http://sqlfiddle.com/#!17/2cef4/1

Using PostgreSQL filters and window functions, I was able to come up with a query that produced the result.
select
count(*) filter(where s.position = 'senior') as seniors,
count(*) filter(where s.position = 'associate') as associates
from (
select
position,
sum(salary) over(order by position desc rows between unbounded preceding and current row) as salary
from
candidates
) as s
where s.salary <= 40000;
Example: http://sqlfiddle.com/#!17/2cef4/10
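To see why the windowed running sum gives the expected answer, here is a minimal Python sketch of the same logic: sort by position descending, accumulate salaries, keep rows whose running total fits the budget, then count per position. The function name count_hires is made up for illustration; the data is the sample from the question.

```python
from itertools import accumulate

# Sample applicants from the question: (id, position, salary)
candidates = [
    (0, "senior", 20000),
    (1, "senior", 15000),
    (2, "associate", 10000),
]

def count_hires(rows, budget):
    # ORDER BY position DESC: "senior" sorts after "associate", so seniors come first.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    # SUM(salary) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    running = accumulate(r[2] for r in ordered)
    # WHERE running_total <= budget
    hired = [r for r, total in zip(ordered, running) if total <= budget]
    # COUNT(*) FILTER (WHERE position = ...)
    return {
        "seniors": sum(1 for r in hired if r[1] == "senior"),
        "associates": sum(1 for r in hired if r[1] == "associate"),
    }

result = count_hires(candidates, 40000)
print(result)
```

The running totals are 20000, 35000 and 45000, so only the two seniors fit within 40000.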

Related

Nth salary in SQL

I'm trying to understand how the query below works.
SELECT *
FROM Employee Emp1
WHERE (N-1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary
)
Let's say I have 5 distinct salaries and want to get the 3rd largest salary. Does the inner query run first and then the outer query?
I'm confused about how the SQL engine evaluates this. If I want the 3rd largest, then 3-1 = 2, so that 2 has to match the inner count. How is the inner count computed?
Can anyone explain how it works?
The subquery is a correlated subquery, so it conceptually executes once for each row in the outer query (database optimizations aside). It counts how many distinct salaries are greater than the salary on the current row of the outer query: if there are 2 distinct higher salaries, then the employee on the current row has the third-highest salary.
Another way to phrase this is to use row_number():
select *
from (
select
e.*,
row_number() over(order by salary desc) rn
from employee e
) t
where rn = 3
Depending on how you want to handle duplicates, dense_rank() might also be an option.
SELECT * FROM (SELECT EMP.ID, RANK() OVER (ORDER BY SALARY DESC) AS NOS FROM EMPLOYEE EMP) T WHERE T.NOS=3
Then select the row with whatever rank you need from this.
It is easier to understand when you run this query:
select e1.*,
(select count(distinct e2.salary)
from employee e2
where e2.salary > e1.salary) as n
from employee e1
This is my sample table:
create table employee(salary) as (
select * from table(sys.odcinumberlist(1500, 1200, 1400, 1500, 1100)));
so my output is:
SALARY N
---------- ----------
1500 0
1200 2
1400 1
1500 0
1100 3
As you can see, the subquery counts, for each row, the salaries that are greater than the salary in the current row. For instance, for 1400 there is one DISTINCT greater salary (1500). 1500 appears twice in my table, but distinct ensures it is counted once. So 1400 is second in order.
Your query moves this count into the where clause and compares it with the required value. We have to subtract one because for the highest salary there is no higher value, for the second-highest there is one, and so on.
This is one of the methods used to find such values. Newer Oracle versions introduced analytic functions (rank, row_number, dense_rank), which eliminate the need for subqueries for such purposes and are usually faster and more efficient. For your query dense_rank() would be useful.
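The correlated count can be tried end to end with a small Python script. This is only a sketch: it runs the query in an in-memory SQLite database rather than Oracle, reuses the five sample salaries above, and the helper name nth_highest is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?)",
    [(1500,), (1200,), (1400,), (1500,), (1100,)],
)

def nth_highest(conn, n):
    # For each outer row, count the distinct salaries that are strictly greater;
    # the row qualifies when that count equals n - 1.
    rows = conn.execute(
        """
        SELECT DISTINCT e1.salary
        FROM employee e1
        WHERE ? - 1 = (SELECT COUNT(DISTINCT e2.salary)
                       FROM employee e2
                       WHERE e2.salary > e1.salary)
        """,
        (n,),
    ).fetchall()
    return [r[0] for r in rows]

print(nth_highest(conn, 3))  # 1200 has two distinct higher salaries (1500, 1400)
```

Asking for a rank beyond the number of distinct salaries (here 5 or more) returns an empty list, because no row has that many higher salaries.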

Return the first n records per group Oracle SQL [duplicate]

My problem
I want to return the top n rows per group ordered by a date in Oracle 10g
My table
EMPLOYEE|START_DATE|DEPARTMENT
Amy |01-02-1901|Sales
Edwina |01-02-1902|Mergers
Tawnee |01-02-1904|Legal
Trudy |01-02-1998|Sales
Tanner |01-02-1967|Sales
Kelly |01-02-1954|Mergers
Jenny |01-02-1991|Sales
Jacinta |01-02-1924|Legal
Suzanne |01-02-1976|Legal
Jacqui |01-02-1989|Legal
Jill |01-02-1989|Mergers
Kate |01-02-1998|Mergers
Jane |01-02-1900|Sales
Louise |01-02-1912|Mergers
Kim |01-02-1976|Sales
Cara |01-02-1955|Sales
Kirsten |01-02-1933|Legal
Sarah |01-02-1998|Legal
Desired outcome
EMPLOYEE|START_DATE|DEPARTMENT
Jane |01-02-1900|Sales
Amy |01-02-1901|Sales
Tawnee |01-02-1904|Legal
Jacinta |01-02-1924|Legal
Sarah |01-02-1998|Legal
Edwina |01-02-1902|Mergers
Louise |01-02-1912|Mergers
What I've tried
(select * from
employees where
DEPARTMENT = 'Sales' and
rownum <3)
UNION
(select * from
employees where
DEPARTMENT = 'Legal' and
rownum <3)
UNION
(select * from
employees where
DEPARTMENT = 'Mergers' and
rownum <3);
REALLY ugly query
I'm wondering whether there is a way to use an
OVER (PARTITION BY DEPARTMENT)
but from what I read, this needs to be preceded by an analytic function (count, sum whatever). Is there a more elegant, inexpensive solution?
Consider this non-window-function approach using a correlated count subquery. The idea is to compute a per-department rank in a subquery, wrap it in a derived table, and filter the outer query on that rank. Please note that your desired results are not ordered by START_DATE; they simply follow the query's row number.
SELECT main.EMPLOYEE, main.START_DATE, main.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
(SELECT Count(*) FROM Employees sub
WHERE sub.START_DATE <= t.START_DATE
AND sub.Department = t.Department) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;
-- EMPLOYEE START_DATE DEPARTMENT
-- Tawnee 1/2/1904 Legal
-- Jacinta 1/2/1924 Legal
-- Kirsten 1/2/1933 Legal
-- Edwina 1/2/1902 Mergers
-- Louise 1/2/1912 Mergers
-- Kelly 1/2/1954 Mergers
-- Jane 1/2/1900 Sales
-- Amy 1/2/1901 Sales
-- Cara 1/2/1955 Sales
For the window function counterpart:
SELECT main.EMPLOYEE, main.START_DATE, main.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
RANK() OVER (PARTITION BY Department
ORDER BY START_DATE) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;
And as @Matt comments, you may want to handle ties (i.e., employees who started on the same day). Both solutions above will output all such employees, depending on the rank filter. To keep only one of the tied rows in the correlated subquery, use the employee name as a tiebreaker (or, better yet, a unique ID if available):
SELECT main.EMPLOYEE, main.START_DATE, main.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
(SELECT Count(*) FROM Employees sub
WHERE sub.Department = t.Department
AND (sub.START_DATE < t.START_DATE
OR (sub.START_DATE = t.START_DATE
AND sub.EMPLOYEE <= t.EMPLOYEE))) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;
And for window-function query use ROW_NUMBER() in place of RANK().
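As a rough check of the tiebreaker logic, the sketch below runs the correlated-count query in an in-memory SQLite database from Python. The rows are hypothetical (not the table from the question) and were chosen so that two Sales employees share a start date; dates are stored as ISO strings so that string comparison orders them correctly. The helper name top_n_per_department is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee TEXT, start_date TEXT, department TEXT)"
)
# Hypothetical rows: Bob and Cy tie on start date within Sales.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "1900-02-01", "Sales"),
        ("Bob", "1901-02-01", "Sales"),
        ("Cy", "1901-02-01", "Sales"),
        ("Dee", "1950-02-01", "Sales"),
        ("Eve", "1904-02-01", "Legal"),
        ("Fay", "1924-02-01", "Legal"),
    ],
)

TOP_N_SQL = """
SELECT main.employee, main.department
FROM (SELECT t.employee, t.start_date, t.department,
             (SELECT COUNT(*) FROM employees sub
              WHERE sub.department = t.department
                AND (sub.start_date < t.start_date
                     OR (sub.start_date = t.start_date
                         AND sub.employee <= t.employee))) AS dept_rank
      FROM employees t) main
WHERE main.dept_rank <= ?
ORDER BY main.department, main.start_date, main.employee
"""

def top_n_per_department(conn, n):
    # Each row's rank is the number of same-department rows that sort at or
    # before it when ordered by (start_date, employee).
    return conn.execute(TOP_N_SQL, (n,)).fetchall()

print(top_n_per_department(conn, 2))
```

With the name as tiebreaker, Bob gets rank 2 and Cy rank 3, so exactly two Sales rows come back even though their start dates are equal.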

Find multiple maximum values from a table

I have read answers to similar questions but I cannot find a solution to my particular problem.
I will use a simple example to demonstrate my question.
I have a table called 'Prizes' with two columns: Employee and Award.
The Employee column lists the employee's ID and the Award column shows a single award won by that employee. If an employee has won multiple awards, their ID is listed in multiple rows of the table, once for each award.
The table would look as follows:
Employee AWARD
1 Best dressed
1 Most attractive
2 Biggest time waster
1 Most talkative
3 Hardest worker
4 Most shady
3 Most positive
3 Heaviest drinker
2 Most facebook friends
Using this table, how would I select the IDs of the employees who won the most awards?
The output should be:
Employee
1
3
In this example, both of these employees won 3 awards.
Currently, the query below outputs the employee ID along with the number of awards they have won in descending order:
SELECT employee,COUNT(*) AS num_awards
FROM prizes
GROUP BY employee
ORDER BY num_awards DESC;
Would output:
employee num_awards
1 3
3 3
2 2
4 1
How could I change my query to select the employee(s) with the most awards?
A simple way to express this is using rank() or dense_rank():
SELECT p.*
FROM (SELECT employee, COUNT(*) AS num_awards,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM prizes
GROUP BY employee
) p
WHERE seqnum = 1;
Being able to combine aggregation functions and analytic functions can make these queries much more concise.
You can use dense_rank to get all the rows with highest counts.
with cnts as (
SELECT employee, count(*) cnt
FROM prizes
GROUP BY employee)
, ranks as (select employee, cnt, dense_rank() over(order by cnt desc) rnk
from cnts)
select employee, cnt
from ranks where rnk = 1
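The effect of keeping only rank 1 over the per-employee counts can be sketched in plain Python: count awards per employee, then keep everyone whose count equals the maximum. This mirrors the result of the queries above rather than how a database executes them; the helper name top_winners is made up for illustration, and the rows are the sample table from the question.

```python
from collections import Counter

# (employee, award) rows from the question
prizes = [
    (1, "Best dressed"),
    (1, "Most attractive"),
    (2, "Biggest time waster"),
    (1, "Most talkative"),
    (3, "Hardest worker"),
    (4, "Most shady"),
    (3, "Most positive"),
    (3, "Heaviest drinker"),
    (2, "Most facebook friends"),
]

def top_winners(rows):
    # GROUP BY employee with COUNT(*)
    counts = Counter(employee for employee, _ in rows)
    if not counts:
        return []
    # Rank 1 over "ORDER BY count DESC" is everyone sharing the maximum count.
    best = max(counts.values())
    return sorted(e for e, c in counts.items() if c == best)

print(top_winners(prizes))  # [1, 3]
```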

Find the second highest salary [duplicate]

Well it is a well known question. Consider the below
EmployeeID EmployeeName Department Salary
----------- --------------- --------------- ---------
1 T Cook Finance 40000.00
2 D Michael Finance 25000.00
3 A Smith Finance 25000.00
4 D Adams Finance 15000.00
5 M Williams IT 80000.00
6 D Jones IT 40000.00
7 J Miller IT 50000.00
8 L Lewis IT 50000.00
9 A Anderson Back-Office 25000.00
10 S Martin Back-Office 15000.00
11 J Garcia Back-Office 15000.00
12 T Clerk Back-Office 10000.00
We need to find out the second highest salary
With Cte As
(
Select
level
,Department
,Max(Salary)
From plc2_employees
Where level = 2
Connect By Prior (Salary) > Salary)
Group By level,Department
)
Select
Employeeid
,EmployeeName
,Department
,Salary
From plc2_employees e1
Inner Join Cte e2 On e1.Department = e2.Department
Order By
e1.Department
, e1.Salary desc
,e1.EmployeeID
is somehow not working... I am not getting the correct result. Could anyone please help me out.
Something like
select * from
(
select EmployeeID, EmployeeName, Department, Salary,
rank () over (partition by Department order by Salary desc) r
from PLC2_Employees
)
where r = 2
Edit - tested it and it gives the answer you expected.
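One thing to watch with rank(): when the top salary is tied, rank() skips 2 entirely, so filtering on r = 2 returns nothing for that department, while dense_rank() would not skip. The Python sketch below imitates both numbering schemes on a plain list to make the difference visible. The function name ranks is made up for illustration, and the tied list is a hypothetical case, not a department from the table above.

```python
def ranks(values, dense=False):
    """Pair each value with its RANK() (or DENSE_RANK() if dense) in descending order."""
    ordered = sorted(values, reverse=True)
    out = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            out.append(out[-1])            # ties share the same rank
        elif dense:
            out.append(out[-1] + 1 if out else 1)  # no gaps after ties
        else:
            out.append(i + 1)              # gaps after ties
    return list(zip(ordered, out))

# IT department from the question: rank 2 exists in both schemes.
it_salaries = [80000, 40000, 50000, 50000]
print(ranks(it_salaries))
print(ranks(it_salaries, dense=True))

# Hypothetical department with a tie at the top: RANK() has no row with rank 2.
tied_top = [50000, 50000, 40000]
print(ranks(tied_top))
print(ranks(tied_top, dense=True))
```

For the data in the question no department has a tie at the top, so rank() and dense_rank() return the same rows for rank 2.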
If you're going to teach yourself how to deal with CONNECT BY, you should first find a problem that is suited to the construct. CONNECT BY is meant for processing data that's in a hierarchical form, which your example is not. Salaries are not related to each other in a hierarchical fashion. Trying to force-fit a construct on the wrong problem is frustrating and doesn't really teach you anything.
Take a look at the classic employee-manager relationship in the demo HR schema you can install with Oracle. All employees report to a manager, including managers (except the top guy). You can then use this schema to create a query to show, for example, the Organization Chart for the company.
START WITH … CONNECT BY is designed to explore data that forms a graph, by exploring all possible descending paths. You specify the root nodes in the START WITH clause and the node connections in the CONNECT BY clause (not in the WHERE clause).
The WHERE clause filters are processed after the hierarchical conditions; the same goes for GROUP BY and HAVING (since GROUP BY is computed after WHERE).
Therefore you must use, for example, CONNECT BY PRIOR department = department here. You must also prevent a connection between two salaries when there is an intermediate salary between them.
Therefore the final query would resemble this:
SELECT level
, Department
, Salary
FROM plc2_employees pe1
START WITH pe1.salary = (select max(salary) from plc2_employees pe2 WHERE pe2.Department = pe1.Department)
CONNECT BY PRIOR pe1.Department = pe1.Department
AND PRIOR pe1.Salary > pe1.Salary
AND PRIOR pe1.Salary = ( SELECT MIN(Salary) FROM plc2_employees pe3
WHERE pe3.Department = pe1.Department
AND pe3.Salary > pe1.Salary
)
The recursion condition states that there is no intermediate salary between the child row and the parent row.
Note that this will be really inefficient…
Try this, it gives second highest salary
select MAX(Salary) as Salary
from Employee_salary
where Salary not in (select MAX(Salary) from Employee_salary)
You can use this query:
select * from
employee e1
where 2 = (select count (distinct (salary))
from employee e2
where e2.salary >=e1.salary);
Find the second highest salary from an employee table that has a salary column:
Database: DB2
with t as
(
select distinct salary from employee
),
tr as
(
select salary, row_number() over(order by salary desc) r from t
)
select salary from tr where r = 2
Try this,
It gives second highest salary...
select MAX(Salary) as Salary
from Employee_salary
where Salary not in (select MAX(Salary) from Employee_salary )
If you want to find the nth highest salary, you can use the following query. You only need to make one change: replace N with the rank you want.
SELECT * FROM Employee_salary Emp1
WHERE (N-1) = (SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee_salary Emp2
WHERE Emp2.Salary > Emp1.Salary)
This will work -
SELECT MIN(Salary)
FROM employee
WHERE salary IN (SELECT DISTINCT TOP 2 salary FROM employee ORDER BY salary DESC)
First, select the top 2 distinct salaries in descending order (from greatest to least), then put those 2 in ascending order (placing number 2 on top) and select the top 1:
select top 1 s.Salary
from
(select distinct top 2 Salary
from PLC2_Employees
order by Salary desc) s
order by s.Salary asc

SQL query to find Nth highest salary from a salary table

How can I find the Nth highest salary in a table containing salaries in SQL Server?
You can use row_number() in a derived table (a Common Table Expression, CTE, works the same way) to derive the answer.
Let's say you have the following salaries in the table Salaries:
EmployeeID Salary
--------------------
10101 50,000
90140 35,000
90151 72,000
18010 39,000
92389 80,000
We will use:
DECLARE @N int
SET @N = 3 -- Change the value here to pick a different salary rank
SELECT Salary
FROM (
SELECT row_number() OVER (ORDER BY Salary DESC) as SalaryRank, Salary
FROM Salaries
) as SalaryCTE
WHERE SalaryRank = @N
This will create a row number for each row after it has been sorted by the Salary in descending order, then retrieve the third row (which contains the third-highest record).
For those of you who don't want a CTE (or are stuck in SQL 2000):
[Note: this performs noticeably worse than the above example; running them side by side with execution plans shows a query cost of 36% for the CTE and 64% for the subquery]:
SELECT TOP 1 Salary
FROM
(
SELECT TOP N Salary
FROM Salaries
ORDER BY Salary DESC
) SalarySubquery
ORDER BY Salary ASC
where N is defined by you.
SalarySubquery is the alias I have given to the subquery, or the query that is in parentheses.
What the subquery does is it selects the top N salaries (we'll say 3 in this case), and orders them by the greatest salary.
If we want to see the third-highest salary, the subquery would return:
Salary
-----------
80,000
72,000
50,000
The outer query then selects the first salary from the subquery, except we're sorting it ascending this time, which sorts from smallest to largest, so 50,000 would be the first record sorted ascending.
As you can see, 50,000 is indeed the third-highest salary in the example.
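The two-step TOP logic can be mirrored in a few lines of Python, using the five sample salaries from the table above. This is only a sketch of the idea; the function name nth_highest_top_n is made up for illustration.

```python
salaries = [50000, 35000, 72000, 39000, 80000]

def nth_highest_top_n(values, n):
    # Inner query: SELECT TOP N Salary ... ORDER BY Salary DESC
    top_n = sorted(values, reverse=True)[:n]
    # Outer query: SELECT TOP 1 Salary ... ORDER BY Salary ASC
    # Like the SQL, this returns the lowest salary when there are fewer than N rows.
    return min(top_n) if top_n else None

print(nth_highest_top_n(salaries, 3))  # 50000
```

Note that, as in the SQL without DISTINCT, duplicate salaries each occupy their own position.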
You could use row_number to pick a specific row. For example, the 42nd highest salary:
select *
from (
select row_number() over (order by Salary desc) as rn
, *
from YourTable
) as Subquery
where rn = 42
Windowed functions like row_number can only appear in select or order by clauses. The workaround is placing the row_number in a subquery.
select MIN(salary) from (
select top 5 salary from employees order by salary desc) x
EmpID Name Salary
1 A 100
2 B 800
3 C 300
4 D 400
5 E 500
6 F 200
7 G 600
SELECT * FROM Employee E1
WHERE (N-1) = (
SELECT COUNT(DISTINCT(E2.Salary))
FROM Employee E2
WHERE E2.Salary > E1.Salary
)
Suppose you want to find the 5th highest salary (N = 5). That means exactly 4 distinct salaries are greater than it. So for each row of the outer query, count the distinct salaries that are greater than the current salary. The outer query starts with 100: 6 salaries are greater than 100, and 4 = 6 is false, so the row is rejected. For 800 no salary is greater, and 4 = 0 is false. For 300 there are 4 greater salaries (800, 600, 500 and 400), so 4 = 4 satisfies the WHERE clause and the query returns
3 C 300.
try it...
use table_name
select MAX(salary)
from emp_salary
WHERE salary NOT IN (select MAX(salary)
from emp_salary )
Simple way WITHOUT using any special feature specific to Oracle, MySQL etc.
Suppose in EMPLOYEE table Salaries can be repeated.
Use query to find out rank of each ID.
select *
from (
select tout.sal, id,
(select count(*) + 1
from (select distinct sal as distsal from EMPLOYEE) d
where d.distsal > tout.sal) as rank
from EMPLOYEE tout
) result
order by rank
First we find out distinct salaries. Then we find out count of distinct salaries greater than each row. This is nothing but the rank of that id. For highest salary, this count will be zero. So '+1' is done to start rank from 1.
Now we can get IDs at Nth rank by adding where clause to above query.
select *
from (
select tout.sal, id,
(select count(*) + 1
from (select distinct sal as distsal from EMPLOYEE) d
where d.distsal > tout.sal) as rank
from EMPLOYEE tout
) result
where rank = N;
The easiest way to get the 2nd highest salary from a table in SQL:
sql> select max(sal) from emp where sal not in (select max(sal) from emp);
Don't forget to use the distinct keyword:
SELECT TOP 1 Salary
FROM
(
SELECT Distinct TOP N Salary
FROM Salaries
ORDER BY Salary DESC
) SalarySubquery
ORDER BY Salary ASC
Solution 1: This SQL to find the Nth highest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS: (note: low performance because of subquery)
SELECT * /*This is the outer query part */
FROM Employee Emp1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the Emp1 value as well.
In order to find the Nth highest salary, we just find the salary that has exactly N-1 salaries greater than itself.
Solution 2: Find the nth highest salary using the TOP keyword in SQL Server
SELECT TOP 1 Salary
FROM (
SELECT DISTINCT TOP N Salary
FROM Employee
ORDER BY Salary DESC
) AS Emp
ORDER BY Salary
Solution 3: Find the nth highest salary in SQL Server without using TOP
SELECT Salary FROM Employee
ORDER BY Salary DESC OFFSET N-1 ROWS
FETCH NEXT 1 ROW ONLY
Note that I haven’t personally tested the SQL above, and I believe that it will only work in SQL Server 2012 and up.
SELECT * FROM
(select distinct postalcode from Customers order by postalcode DESC) AS t
limit 4,1;
4 here means leave first 4 and show the next 1.
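Try this, it works for me.
A very simple query to find the nth highest salary (the LIMIT offset is zero-based, so replace n-1 with the computed number, e.g. 2 for the 3rd highest):
SELECT DISTINCT Salary FROM emp ORDER BY Salary DESC LIMIT n-1, 1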
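The offset arithmetic is easy to get wrong by one, so here is a small Python sketch that runs the LIMIT/OFFSET form in an in-memory SQLite database. It reuses the seven sample salaries (100 to 800) from the EmpID/Name/Salary table earlier in this thread; the table layout and the helper name nth_highest are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "A", 100), (2, "B", 800), (3, "C", 300), (4, "D", 400),
     (5, "E", 500), (6, "F", 200), (7, "G", 600)],
)

def nth_highest(conn, n):
    # Skip the n - 1 highest distinct salaries, then take the next one.
    row = conn.execute(
        "SELECT DISTINCT salary FROM employee "
        "ORDER BY salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    return row[0] if row else None

print(nth_highest(conn, 5))  # 300
```

An offset of n (rather than n - 1) would return the (n+1)th highest salary instead.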