I'm using Oracle and SQL Developer. I have downloaded the HR schema and need to write some queries against it. Right now I'm working with the Employees table. As a user, I need to see the employees with the largest gap between their salary and the average salary of all colleagues hired later in the same department. It seems quite interesting and really complicated. I have read some documentation and tried, for example, LEAD(), which gives access to more than one row of a table at the same time:
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
hire_date,
LEAD(hire_date)
OVER(PARTITION BY department_id
ORDER BY
hire_date DESC
) AS Prev_hiredate
FROM
employees
ORDER BY
department_id,
hire_date;
Because the window is ordered by hire_date DESC, this shows, for every person in a department, the hire date of the colleague hired immediately before them. I have also tried a windowing clause to understand the concept:
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
hire_date,
salary,
AVG(salary)
OVER(PARTITION BY department_id
ORDER BY
hire_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) AS avg_sal
FROM
employees
ORDER BY
department_id,
hire_date;
However, this is not exactly what I need. I need to add a gap column (salary - avg_sal) and reduce the result to one employee per department: the one with the highest gap. Here is how the result should look. Take department 60, which has 5 employees ordered by hire_date. The first has a salary of 4800, the second 9000, the third 4800, the fourth 4200, and the fifth 6000. The calculations are: 4800 - ((9000+4800+4200+6000)/4) = -1200, 9000 - ((4800+4200+6000)/3) = 4000, 4800 - ((4200+6000)/2) = -300, 4200 - 6000 = -1800, and the last person in the department, who has no later-hired colleagues, gets the highest gap: 6000 - 0 = 6000. Now take department 20, which has two people: the first has a salary of 13000, the second 6000. Calculations: 13000 - 6000 = 7000, 6000 - 0 = 6000. The highest gap belongs to the first person. So for department 20 the result should be the person with salary 13000, for department 60 the person with salary 6000, and so on.
How should my query look to get this result? I also want to see the column with the highest gap. Different solutions with analytic functions are fine, but they must include a windowing clause.
You can get the average salary of employees that were hired prior to the current employee by just adapting the rows clause of your avg:
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
The 1 PRECEDING clause tells the database not to include the current row in the window.
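To see how the frame bounds behave, here is a minimal sketch that runs both frames side by side on the department 60 salaries from the question. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle (an assumption, chosen so the snippet runs on its own; it needs SQLite 3.25 or later for window functions), and the table name emp is hypothetical.

```python
import sqlite3

# Department 60 salaries from the question, in hire order.
# "emp" is a hypothetical stand-in for the HR employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (601, "1982-01-23", 4800),
        (602, "1982-02-23", 9000),
        (603, "1982-03-23", 4800),
        (604, "1982-04-23", 4200),
        (605, "1982-05-23", 6000),
    ],
)

# avg_earlier: everyone hired before the current row (current row excluded).
# avg_later:   everyone hired after the current row (current row excluded).
frames = conn.execute("""
    SELECT id,
           AVG(salary) OVER (ORDER BY hire_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS avg_earlier,
           AVG(salary) OVER (ORDER BY hire_date
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS avg_later
    FROM emp
    ORDER BY hire_date
""").fetchall()

for row in frames:
    print(row)
```

An empty frame yields NULL (None in Python): the first hire has no earlier colleagues and the last hire has no later ones, which is why the other answer in this thread wraps the average in Nvl(..., 0).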
If you are looking for the employees with the greatest gap to that average, we can just order the result set by that gap:
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
ORDER BY ABS(salary - avg_salary) DESC;
Finally, if you want the top "outlier salary" per department, then we need at least one more level. The shortest way to express this probably is to use ROW_NUMBER() to rank employees in each department by their salary gap to the average, and then to fetch all top rows per group using WITH TIES:
SELECT *
FROM (
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
) e
ORDER BY ROW_NUMBER() OVER(
PARTITION BY department_id
ORDER BY ABS(salary - avg_salary) DESC
)
FETCH FIRST ROW WITH TIES
Maybe this is what you are looking for.
Sample data:
WITH
emp (ID, EMP_NAME, HIRE_DATE, SALARY, DEPT) AS
(
Select 601, 'HILLER', To_Date('23-JAN-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 602, 'MILLER', To_Date('23-FEB-82', 'dd-MON-yy'), 9000, 60 From Dual Union All
Select 603, 'SMITH', To_Date('23-MAR-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 604, 'FORD', To_Date('23-APR-82', 'dd-MON-yy'), 4200, 60 From Dual Union All
Select 605, 'KING', To_Date('23-MAY-82', 'dd-MON-yy'), 6000, 60 From Dual Union All
Select 201, 'SCOT', To_Date('23-MAR-82', 'dd-MON-yy'), 13000, 20 From Dual Union All
Select 202, 'JONES', To_Date('23-AUG-82', 'dd-MON-yy'), 6000, 20 From Dual
),
Create CTE named grid with several analytic functions and windowing clauses. They are not all needed but the resulting dataset below shows the logic with all components included.
grid AS
(
Select
g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From
(
Select
ROWNUM "RN",
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN Unbounded Preceding And Current Row) "RN_DEPT",
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "SUM_SAL_LATER",
Nvl(Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "COUNT_EMP_LATER",
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "AVG_LATER",
--
SALARY -
Nvl((
Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
from
emp
Order By
DEPT, HIRE_DATE, ID
) g
Order By
RN
)
CTE grid resulting dataset:
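RN | RN_DEPT | ID  | EMP_NAME | HIRE_DATE | DEPT | SALARY | SUM_SAL_LATER | COUNT_EMP_LATER | AVG_LATER | GAP   | DEPT_MAX_GAP
------------------------------------------------------------------------------------------------------------------------------
1  | 1       | 601 | HILLER   | 23-JAN-82 | 60   | 4800   | 24000         | 4               | 6000      | -1200 | 6000
2  | 2       | 602 | MILLER   | 23-FEB-82 | 60   | 9000   | 15000         | 3               | 5000      | 4000  | 6000
3  | 3       | 603 | SMITH    | 23-MAR-82 | 60   | 4800   | 10200         | 2               | 5100      | -300  | 6000
4  | 4       | 604 | FORD     | 23-APR-82 | 60   | 4200   | 6000          | 1               | 6000      | -1800 | 6000
5  | 5       | 605 | KING     | 23-MAY-82 | 60   | 6000   | 0             | 0               | 0         | 6000  | 6000
6  | 1       | 201 | SCOT     | 23-MAR-82 | 20   | 13000  | 6000          | 1               | 6000      | 7000  | 7000
7  | 2       | 202 | JONES    | 23-AUG-82 | 20   | 6000   | 0             | 0               | 0         | 6000  | 7000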
Main SQL
SELECT
g.ID, g.EMP_NAME, g.HIRE_DATE, g.DEPT, g.SALARY, g.GAP
FROM
grid g
WHERE
g.GAP = g.DEPT_MAX_GAP
Order By
RN
Resulting as:
ID  | EMP_NAME | HIRE_DATE | DEPT | SALARY | GAP
-------------------------------------------------
605 | KING     | 23-MAY-82 | 60   | 6000   | 6000
201 | SCOT     | 23-MAR-82 | 20   | 13000  | 7000
Without the grid CTE and with all unnecessary columns excluded, it looks like this:
SELECT ID, EMP_NAME, HIRE_DATE, DEPT, SALARY, GAP
FROM
(
( Select g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From( Select
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
SALARY -
Nvl(( Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
From emp
Order By DEPT, HIRE_DATE, ID
) g
)
)
WHERE GAP = DEPT_MAX_GAP
Order By DEPT, HIRE_DATE, ID
It seems like this is all you need.
Regards...
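As a cross-check of the per-department logic discussed in this thread, here is a compact sketch that computes the gap to the average salary of later-hired colleagues and keeps the row with the largest gap in each department. It runs against the same seven sample rows, but in SQLite through Python's sqlite3 module rather than Oracle (an assumption made so the snippet runs on its own), so COALESCE replaces Nvl and dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER, emp_name TEXT, hire_date TEXT, "
    "salary INTEGER, dept INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (601, "HILLER", "1982-01-23", 4800, 60),
        (602, "MILLER", "1982-02-23", 9000, 60),
        (603, "SMITH", "1982-03-23", 4800, 60),
        (604, "FORD", "1982-04-23", 4200, 60),
        (605, "KING", "1982-05-23", 6000, 60),
        (201, "SCOT", "1982-03-23", 13000, 20),
        (202, "JONES", "1982-08-23", 6000, 20),
    ],
)

top_gaps = conn.execute("""
    WITH gaps AS (
        -- Gap to the average salary of colleagues hired later;
        -- an empty frame (last hire) counts as an average of 0.
        SELECT id, emp_name, dept, salary,
               salary - COALESCE(AVG(salary) OVER (
                   PARTITION BY dept
                   ORDER BY hire_date, id
                   ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0.0) AS gap
        FROM emp
    ),
    ranked AS (
        SELECT gaps.*, MAX(gap) OVER (PARTITION BY dept) AS dept_max_gap
        FROM gaps
    )
    SELECT id, emp_name, dept, salary, gap
    FROM ranked
    WHERE gap = dept_max_gap   -- ties within a department are all returned
    ORDER BY dept
""").fetchall()

for row in top_gaps:
    print(row)
```

On the sample data this returns SCOT for department 20 (gap 7000) and KING for department 60 (gap 6000), matching the worked numbers in the question.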
I want to get best sellers from march 2019, while excluding the top 3 sellers of january. I tried using except where first SELECT gives best sellers of march (all of them) and the second SELECT gives top 3 of january.
SELECT * FROM (SELECT fullname, SUM(sale) sales
FROM mytable
WHERE oredrdate BETWEEN '2019-03-01' AND '2019-03-31'
GROUP BY fullname
ORDER BY sales DESC) X
EXCEPT
SELECT * FROM (SELECT fullname, SUM(sale) sales
FROM mytable
WHERE oredrdate BETWEEN '2019-01-01' AND '2019-01-31'
GROUP BY fullname
ORDER BY sales DESC
LIMIT 3) Y;
The problem is that EXCEPT does not remove the rows I expected it to. Here is what each SELECT returns, along with my desired output:
First SELECT returns:
fullname sales
Tommy Williams 8320
Ryan Atkinson 7310
Petey Cruiser 6200
Anna Mull 5840
Gail Forcewind 4120
Paige Turner 3300
Bob Frapples 2100
... ...
Second SELECT returns:
fullname sales
Tommy Williams 9220
Anna Mull 8100
Greta Life 7891
Desired OUTPUT:
fullname sales
Ryan Atkinson 7310
Petey Cruiser 6200
Gail Forcewind 4120
Paige Turner 3300
Bob Frapples 2100
... ...
How should I change my code to achieve this?
This can be done with a LEFT JOIN where you exclude the matching rows:
SELECT X.*
FROM (
SELECT fullname, SUM(sale) sales
FROM mytable
WHERE oredrdate BETWEEN '2019-03-01' AND '2019-03-31'
GROUP BY fullname
) X LEFT JOIN (
SELECT fullname, SUM(sale) sales
FROM mytable
WHERE oredrdate BETWEEN '2019-01-01' AND '2019-01-31'
GROUP BY fullname
ORDER BY sales DESC
LIMIT 3
) Y ON Y.fullname = X.fullname
WHERE Y.fullname IS NULL
ORDER BY X.sales DESC
You could use:
SELECT fullname, SUM(sale) AS total
FROM mytable
WHERE oredrdate BETWEEN '2019-03-01' AND '2019-03-31'
  AND fullname NOT IN (SELECT fullname   -- NOT IN needs a single-column subquery
                       FROM mytable
                       WHERE oredrdate BETWEEN '2019-01-01' AND '2019-01-31'
                         AND fullname IS NOT NULL
                       GROUP BY fullname
                       ORDER BY SUM(sale) DESC LIMIT 3)
GROUP BY fullname
ORDER BY total DESC;
I would group by some kind of unique column such as employee_id, since two people could have the same name.
The problem is that EXCEPT is considering both the name and the amount columns. It is unlikely that the second would match.
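A small sketch of that whole-row comparison, using a few of the figures from the question. It runs in SQLite through Python's sqlite3 module (an assumption, for a self-contained demo), and the two pre-aggregated tables march and january_top3 are hypothetical stand-ins for the two subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables holding the already-aggregated results of each SELECT.
conn.execute("CREATE TABLE march (fullname TEXT, sales INTEGER)")
conn.execute("CREATE TABLE january_top3 (fullname TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO march VALUES (?, ?)",
    [
        ("Tommy Williams", 8320),
        ("Ryan Atkinson", 7310),
        ("Petey Cruiser", 6200),
        ("Anna Mull", 5840),
    ],
)
conn.executemany(
    "INSERT INTO january_top3 VALUES (?, ?)",
    [("Tommy Williams", 9220), ("Anna Mull", 8100), ("Greta Life", 7891)],
)

# EXCEPT compares (fullname, sales) pairs, so nothing is removed:
# ("Tommy Williams", 8320) is not the same row as ("Tommy Williams", 9220).
whole_row = conn.execute("""
    SELECT fullname, sales FROM march
    EXCEPT
    SELECT fullname, sales FROM january_top3
    ORDER BY sales DESC
""").fetchall()

# Comparing on the name alone removes the January top sellers.
name_only = conn.execute("""
    SELECT fullname, sales FROM march
    WHERE fullname NOT IN (SELECT fullname FROM january_top3)
    ORDER BY sales DESC
""").fetchall()

print(whole_row)
print(name_only)
```

The first query still lists Tommy Williams and Anna Mull, while the second keeps only Ryan Atkinson and Petey Cruiser.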
One way to write this is:
WITH jan3 as (
      SELECT fullname, SUM(sale) as sales
      FROM mytable
      WHERE orderdate >= '2019-01-01' AND
            orderdate < '2019-02-01'
      GROUP BY fullname
      ORDER BY sales DESC
      LIMIT 3  -- same row-limiting syntax as in the question
     )
SELECT m.fullname, SUM(m.sale) as sales
FROM mytable m
WHERE m.orderdate >= '2019-03-01' AND
m.orderdate < '2019-04-01' AND
NOT EXISTS (SELECT 1
FROM jan3
WHERE jan3.fullname = m.fullname
)
GROUP BY fullname
ORDER BY sales DESC;
Note that this changes the date comparisons to use >= and <. This is considered a best practice, because it works for dates and datetime (timestamp) values.
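The difference shows up as soon as the column carries a time of day. The sketch below is a minimal illustration with three made-up rows, run in SQLite through Python's sqlite3 module (an assumption, for a self-contained demo). SQLite stores these values as text and compares them as strings, but a real timestamp column behaves analogously, because '2019-03-31' is treated as midnight at the start of that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (fullname TEXT, sale INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Ryan Atkinson", 10, "2019-03-01 00:00:00"),
        ("Ryan Atkinson", 20, "2019-03-31 15:30:00"),  # afternoon of the last day
        ("Ryan Atkinson", 40, "2019-04-01 00:00:00"),  # already April
    ],
)

# BETWEEN stops at the start of March 31, so the afternoon sale is lost.
between_total = conn.execute(
    "SELECT SUM(sale) FROM mytable "
    "WHERE orderdate BETWEEN '2019-03-01' AND '2019-03-31'"
).fetchone()[0]

# The half-open range covers all of March and nothing from April.
half_open_total = conn.execute(
    "SELECT SUM(sale) FROM mytable "
    "WHERE orderdate >= '2019-03-01' AND orderdate < '2019-04-01'"
).fetchone()[0]

print(between_total, half_open_total)
```

Here the BETWEEN version sums to 10 and the half-open version to 30.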
There are other ways of writing this using only a single aggregation. For instance:
WITH s as (
      SELECT m.fullname,
             SUM(CASE WHEN m.orderdate < '2019-02-01' THEN m.sale END) as sales_jan,
             SUM(CASE WHEN m.orderdate >= '2019-03-01' THEN m.sale END) as sales_mar
      FROM mytable m
      WHERE m.orderdate >= '2019-01-01' AND
            m.orderdate < '2019-04-01'
      GROUP BY m.fullname  -- one row per seller; required by the SUMs
     )
SELECT s.*
FROM (SELECT s.*,
             -- sellers with no January sales rank last, not first
             ROW_NUMBER() OVER (ORDER BY COALESCE(sales_jan, 0) DESC) as seqnum_jan
      FROM s
     ) s
WHERE seqnum_jan > 3
ORDER BY s.sales_mar DESC;
Name, Salary, DateChanged
John 100 '10-Jan-2017'
John 200 '20-Jan-2017'
John 50 '20-Jan-2018'
Tom 100 '10-Jan-2017'
Tom 200 '20-Jan-2017'
Alice 100 '10-Jan-2017'
Alice 200 '20-Jan-2017'
How can I get the people whose salary was greater than 100 on April 1, 2018?
Thanks
One method is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by name order by datechanged desc) as seqnum
from t
where datechanged <= date '2018-04-01'
) t
where seqnum = 1 and salary > 100;
This selects all rows on or before the cutoff date. It then enumerates them per person, newest first, keeps the most recent row, and checks its salary.
This assumes that each person's initial salary is recorded in the table.
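The approach above can be tried on the sample rows from the question. This is a sketch in SQLite through Python's sqlite3 module (an assumption, for a self-contained demo), with dates stored as ISO text instead of Oracle date literals; the helper name salaries_above is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, salary INTEGER, datechanged TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("John", 100, "2017-01-10"),
        ("John", 200, "2017-01-20"),
        ("John", 50, "2018-01-20"),
        ("Tom", 100, "2017-01-10"),
        ("Tom", 200, "2017-01-20"),
        ("Alice", 100, "2017-01-10"),
        ("Alice", 200, "2017-01-20"),
    ],
)


def salaries_above(threshold, as_of):
    """Return (name, salary) for people whose salary in effect on as_of exceeds threshold."""
    return conn.execute(
        """
        SELECT name, salary
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY name
                                        ORDER BY datechanged DESC) AS seqnum
              FROM t
              WHERE datechanged <= ?) latest
        WHERE seqnum = 1 AND salary > ?
        ORDER BY name
        """,
        (as_of, threshold),
    ).fetchall()


print(salaries_above(100, "2018-04-01"))
```

John drops out because his most recent change before April 1, 2018 lowered his salary to 50, so only Alice and Tom remain.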
I'm trying to write an SQL query to count the number of employees hired/fired each year.
I can get each employee's dates with this select:
SELECT HiredDate, FiredDate FROM Employees;
I can list each year with this select:
SELECT to_char(e1.HiredDate, 'YYYY') Year FROM Employees e1
UNION
SELECT to_char(e2.FiredDate, 'YYYY') Year FROM Employees e2;
But I can't work out how to count the number hired/fired each year.
EDIT
Employees sample data:
Name | HiredDate | FiredDate
--------------------------------
John | 01/02/2003 | 03/04/2013
Jack | 05/06/2006 | 07/08/2013
Jean | 03/04/2006 | null
James | 01/02/2013 | null
Expected results:
Year | HiredNumber | FiredNumber
---------------------------------
2003 | 1 | 0
2006 | 2 | 0
2013 | 1 | 2
There may be years with no hiring and years with no firings. So the easiest way to solve this problem is with two sub-queries, one for each count and join them with a full outer join.
with e1 as (
select extract(year from hireddate) as emp_year
, count(hireddate) as hired_count
from employees
where hireddate is not null
group by extract(year from hireddate)
)
, e2 as (
select extract(year from fireddate) as emp_year
, count(fireddate) as fired_count
from employees
where fireddate is not null
group by extract(year from fireddate)
)
select coalesce (e1.emp_year, e2.emp_year) as emp_year
, nvl(e1.hired_count, 0) as hired_count
, nvl(e2.fired_count, 0) as fired_count
from e1
full outer join e2
on e1.emp_year = e2.emp_year
order by 1
Notes
This will exclude any years with neither hirings nor firings. If you need them, it's easy enough to generate the missing years.
Presumably hireddate is mandatory but the not null check is retained for symmetry :)
"It works well in SQL Developer but can't be set as a Visual datasource"
Here is a variant without the FULL OUTER JOIN:
select emp_year
, sum(hired_count) as hired_count
, sum(fired_count) as fired_count
from (
select extract(year from hireddate) as emp_year
, count(hireddate) as hired_count
, 0 as fired_count
from employees
where hireddate is not null
group by extract(year from hireddate)
union all
select extract(year from fireddate) as emp_year
, 0 as hired_count
, count(fireddate) as fired_count
from employees
where fireddate is not null
group by extract(year from fireddate)
)
group by emp_year
order by 1
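The UNION ALL idea above can be checked against the sample data from the question. This is a sketch in SQLite through Python's sqlite3 module (an assumption, for a self-contained demo), so strftime replaces Oracle's extract, and the sample dates are read as day/month/year, which is also an assumption (only the year matters here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hireddate TEXT, fireddate TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John", "2003-02-01", "2013-04-03"),
        ("Jack", "2006-06-05", "2013-08-07"),
        ("Jean", "2006-04-03", None),
        ("James", "2013-02-01", None),
    ],
)

# Each employee contributes one "hired" row and, if fired, one "fired" row;
# summing the 0/1 flags per year gives both counts side by side.
yearly_counts = conn.execute("""
    SELECT emp_year,
           SUM(hired) AS hired_count,
           SUM(fired) AS fired_count
    FROM (
        SELECT CAST(strftime('%Y', hireddate) AS INTEGER) AS emp_year,
               1 AS hired, 0 AS fired
        FROM employees
        WHERE hireddate IS NOT NULL
        UNION ALL
        SELECT CAST(strftime('%Y', fireddate) AS INTEGER),
               0, 1
        FROM employees
        WHERE fireddate IS NOT NULL
    )
    GROUP BY emp_year
    ORDER BY emp_year
""").fetchall()

for row in yearly_counts:
    print(row)
```

This reproduces the expected results from the question: one hire in 2003, two in 2006, and one hire plus two firings in 2013.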
SELECT 'Hired' What, to_char(e1.HiredDate, 'YYYY') Year, COUNT(*) TheCount
FROM Employees e1
WHERE e1.HiredDate IS NOT NULL
GROUP BY to_char(e1.HiredDate, 'YYYY')
UNION ALL
SELECT 'Fired' What, to_char(e2.FiredDate, 'YYYY') Year, COUNT(*) TheCount
FROM Employees e2
WHERE e2.FiredDate IS NOT NULL  -- skip employees who were never fired
GROUP BY to_char(e2.FiredDate, 'YYYY');
What one query can produce table_c?
I have three columns: day, person, and revenue_per_person. Right now I have to use two queries since I lose 'person' when producing table_b.
table_a uses all three columns:
SELECT day, person, revenue_per_person
FROM purchase_table
GROUP BY day, person
table_b uses only two columns due to AVG() and GROUP BY:
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
table_c created from table_a and table_b:
SELECT
CASE
WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM ????
Maybe this helps. Try this one:
SELECT a.day,
CASE
WHEN a.revenue_per_person > b.avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM
(
SELECT day, person, AVG(revenue) revenue_per_person
FROM purchase_table
GROUP BY day, person
) a INNER JOIN
(
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
) b ON a.day = b.day
You might want to use analytic functions.
An Oracle example showing whether a person's salary is greater than the average salary in their department:
select
    department_id
   ,employee_id
   ,salary
   ,avg_salary
   ,case when salary > avg_salary then 1 else 0 end case_is_greater
from (
    select
        department_id
       ,employee_id
       ,salary
       ,round(avg(salary) over(partition by department_id),2) avg_salary
    from employees
)
where department_id = 30
DEPARTMENT_ID EMPLOYEE_ID     SALARY AVG_SALARY CASE_IS_GREATER
------------- ----------- ---------- ---------- ---------------
           30         114      11000       4150               1
           30         115       3100       4150               0
           30         116       2900       4150               0
           30         117       2800       4150               0
           30         118       2600       4150               0
           30         119       2500       4150               0
6 rows selected.
Elapsed: 00:00:00.01
If you are using a database that supports window functions, you can do this as:
SELECT (CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END) as spending_bucket
FROM (select pt.*,
avg(revenue) over (partition by day, person) as revenue_per_person,
avg(revenue) over (partition by day) as avg_revenue,
row_number() over (partition by day, person order by day) as seqnum
from purchase_table pt
) t
where seqnum = 1
The purpose of seqnum is to just get one row per person/day combination.
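To make the window-function version concrete, here is a sketch that runs it on a handful of made-up purchases. It uses SQLite through Python's sqlite3 module (an assumption, for a self-contained demo); the rows are invented for illustration, and the day and person columns are added to the output so each bucket can be traced back to its row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_table (day INTEGER, person TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO purchase_table VALUES (?, ?, ?)",
    [
        (1, "alice", 10.0),
        (1, "alice", 30.0),  # alice averages 20 on day 1
        (1, "bob", 5.0),     # day 1 average over all purchases is 15
        (2, "carol", 8.0),
        (2, "dave", 8.0),    # day 2 average is 8, so nobody exceeds it
    ],
)

buckets = conn.execute("""
    SELECT day, person,
           CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
                ELSE 'small spender'
           END AS spending_bucket
    FROM (SELECT pt.*,
                 AVG(revenue) OVER (PARTITION BY day, person) AS revenue_per_person,
                 AVG(revenue) OVER (PARTITION BY day) AS avg_revenue,
                 ROW_NUMBER() OVER (PARTITION BY day, person ORDER BY day) AS seqnum
          FROM purchase_table pt
         ) t
    WHERE seqnum = 1   -- keep one row per person/day combination
    ORDER BY day, person
""").fetchall()

for row in buckets:
    print(row)
```

Note that avg_revenue here is the average over individual purchases on a day, exactly as in the query above, not the average of the per-person averages.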