Oracle: how to select timestamped value for particular date - sql

Name, Salary, DateChanged
John 100 '10-Jan-2017'
John 200 '20-Jan-2017'
John 50 '20-Jan-2018'
Tom 100 '10-Jan-2017'
Tom 200 '20-Jan-2017'
Alice 100 '10-Jan-2017'
Alice 200 '20-Jan-2017'
How do I get the persons whose salary was > 100 on Apr 1, 2018?
Thanks

One method is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by name order by datechanged desc) as seqnum
from t
where datechanged <= date '2018-04-01'
) t
where seqnum = 1 and salary > 100;
This selects all rows dated on or before the cutoff date. It then numbers each person's rows from the most recent date backwards, keeps the most recent one, and checks its salary.
This assumes that each person's first salary is in the table.
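As a quick check of the query above, here is a minimal sketch that runs it against the sample data. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and stores the dates as ISO strings because SQLite has no `date '...'` literal; the table and column names follow the answer.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (assumption); dates are ISO-8601
# strings so that plain string comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, salary INTEGER, datechanged TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("John", 100, "2017-01-10"),
        ("John", 200, "2017-01-20"),
        ("John", 50, "2018-01-20"),
        ("Tom", 100, "2017-01-10"),
        ("Tom", 200, "2017-01-20"),
        ("Alice", 100, "2017-01-10"),
        ("Alice", 200, "2017-01-20"),
    ],
)

rows = conn.execute(
    """
    SELECT name, salary
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY name
                                    ORDER BY datechanged DESC) AS seqnum
          FROM t
          WHERE datechanged <= '2018-04-01') AS latest
    WHERE seqnum = 1 AND salary > 100
    ORDER BY name
    """
).fetchall()

print(rows)  # [('Alice', 200), ('Tom', 200)]
```

John is excluded because his most recent salary before the cutoff is 50.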

Related

Analytic functions and means of window clause

I'm using Oracle and SQL Developer. I have downloaded the HR schema and need to write some queries against it. Right now I'm working with the Employees table. As a user, I need to see the employees with the highest gap between their salary and the average salary of all later-hired colleagues in the same department. It seems quite interesting and really complicated. I have read some documentation and tried, for example, LEAD(), which provides access to more than one row of a table at the same time:
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
hire_date,
LEAD(hire_date)
OVER(PARTITION BY department_id
ORDER BY
hire_date DESC
) AS Prev_hiredate
FROM
employees
ORDER BY
department_id,
hire_date;
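That shows, for every person in a department, the hire date of the person hired immediately before them (the window is ordered by hire_date DESC, so LEAD looks at the earlier hire). I have also tried the window clause to understand how it works: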
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
hire_date,
salary,
AVG(salary)
OVER(PARTITION BY department_id
ORDER BY
hire_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) AS avg_sal
FROM
employees
ORDER BY
department_id,
hire_date;
However, it is not exactly what I need. I need to reduce the result to one employee per department: the one with the highest gap (salary - avg_sal), with the gap shown as an additional column. For example, department 60 has 5 employees; ordered by hire_date, their salaries are 4800, 9000, 4800, 4200 and 6000. The gaps are:
4800 - ((9000 + 4800 + 4200 + 6000) / 4) = -1200
9000 - ((4800 + 4200 + 6000) / 3) = 4000
4800 - ((4200 + 6000) / 2) = -300
4200 - 6000 = -1800
6000 - 0 = 6000 (the last person hired has no later colleagues)
So the last person in the department has the highest gap. Department 20 has two people: the first has salary 13000, the second 6000. The gaps are 13000 - 6000 = 7000 and 6000 - 0 = 6000, so the highest gap belongs to the first person. The result should therefore be the person with salary 13000 for department 20, the person with salary 6000 for department 60, and so on.
How should my query look to get this result? I also want to see the column with the highest gap. Different solutions with analytic functions are fine, but they must include a window clause.
You can get the average salary of employees that were hired prior to the current employee by just adapting the rows clause of your avg:
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
The 1 PRECEDING clause tells the database not to include the current row in the window.
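To see what that frame does, here is a minimal sketch, assuming Python's sqlite3 module in place of Oracle. The salaries are the department 60 figures from the question; the employee ids and hire dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, "
    "hire_date TEXT, salary INTEGER)"
)
# Salaries come from the question; ids and hire dates are hypothetical.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, 60, "2005-01-01", 4800),
        (2, 60, "2006-01-01", 9000),
        (3, 60, "2007-01-01", 4800),
        (4, 60, "2008-01-01", 4200),
        (5, 60, "2009-01-01", 6000),
    ],
)

rows = conn.execute(
    """
    SELECT employee_id, salary,
           AVG(salary) OVER (PARTITION BY department_id
                             ORDER BY hire_date
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND 1 PRECEDING) AS avg_salary
    FROM employees
    ORDER BY hire_date
    """
).fetchall()

# The first hire has an empty frame, so the average is NULL (None in Python).
print(rows)
# [(1, 4800, None), (2, 9000, 4800.0), (3, 4800, 6900.0),
#  (4, 4200, 6200.0), (5, 6000, 5700.0)]
```

Each average covers only the earlier hires, never the current row.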
If you are looking for the employees with the greatest gap to that average, we can just order the result set by that gap:
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
ORDER BY ABS(salary - avg_salary) DESC;
Finally, if you want the top "outlier salary" per department, then we need at least one more level. The shortest way to express this probably is to use ROW_NUMBER() to rank employees in each department by their salary gap to the average, and then to fetch all top rows per group using WITH TIES:
SELECT *
FROM (
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
) e
ORDER BY ROW_NUMBER() OVER(
PARTITION BY department_id
ORDER BY ABS(salary - avg_salary) DESC
)
FETCH FIRST ROW WITH TIES
Maybe this is what you are looking for.
Sample data:
WITH
emp (ID, EMP_NAME, HIRE_DATE, SALARY, DEPT) AS
(
Select 601, 'HILLER', To_Date('23-JAN-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 602, 'MILLER', To_Date('23-FEB-82', 'dd-MON-yy'), 9000, 60 From Dual Union All
Select 603, 'SMITH', To_Date('23-MAR-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 604, 'FORD', To_Date('23-APR-82', 'dd-MON-yy'), 4200, 60 From Dual Union All
Select 605, 'KING', To_Date('23-MAY-82', 'dd-MON-yy'), 6000, 60 From Dual Union All
Select 201, 'SCOT', To_Date('23-MAR-82', 'dd-MON-yy'), 13000, 20 From Dual Union All
Select 202, 'JONES', To_Date('23-AUG-82', 'dd-MON-yy'), 6000, 20 From Dual
),
Create CTE named grid with several analytic functions and windowing clauses. They are not all needed but the resulting dataset below shows the logic with all components included.
grid AS
(
Select
g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From
(
Select
ROWNUM "RN",
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN Unbounded Preceding And Current Row) "RN_DEPT",
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "SUM_SAL_LATER",
Nvl(Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "COUNT_EMP_LATER",
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "AVG_LATER",
--
SALARY -
Nvl((
Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
from
emp
Order By
DEPT, HIRE_DATE, ID
) g
Order By
RN
)
CTE grid resulting dataset:
RN RN_DEPT ID  EMP_NAME HIRE_DATE DEPT SALARY SUM_SAL_LATER COUNT_EMP_LATER AVG_LATER GAP   DEPT_MAX_GAP
1  1       601 HILLER   23-JAN-82 60   4800   24000         4               6000      -1200 6000
2  2       602 MILLER   23-FEB-82 60   9000   15000         3               5000      4000  6000
3  3       603 SMITH    23-MAR-82 60   4800   10200         2               5100      -300  6000
4  4       604 FORD     23-APR-82 60   4200   6000          1               6000      -1800 6000
5  5       605 KING     23-MAY-82 60   6000   0             0               0         6000  6000
6  1       201 SCOT     23-MAR-82 20   13000  6000          1               6000      7000  7000
7  2       202 JONES    23-AUG-82 20   6000   0             0               0         6000  7000
Main SQL
SELECT
g.ID, g.EMP_NAME, g.HIRE_DATE, g.DEPT, g.SALARY, g.GAP
FROM
grid g
WHERE
g.GAP = g.DEPT_MAX_GAP
Order By
RN
Resulting as:
ID  EMP_NAME HIRE_DATE DEPT SALARY GAP
605 KING     23-MAY-82 60   6000   6000
201 SCOT     23-MAR-82 20   13000  7000
Without the CTE and with all unnecessary columns excluded, it looks like this:
SELECT ID, EMP_NAME, HIRE_DATE, DEPT, SALARY, GAP
FROM
(
( Select g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From( Select
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
SALARY -
Nvl(( Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
From emp
Order By DEPT, HIRE_DATE, ID
) g
)
)
WHERE GAP = DEPT_MAX_GAP
Order By DEPT, HIRE_DATE, ID
It seems like this is all you need.
Regards...
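For readers without an Oracle instance, the same idea can be sketched with Python's sqlite3 module (an assumption; SQLite stands in for Oracle). This version uses AVG over the "later hires" frame instead of SUM divided by a count, which is a simplification of the query above rather than a copy of it. The data is the sample from this answer, with the dates written as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER, emp_name TEXT, hire_date TEXT, "
    "salary INTEGER, dept INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (601, "HILLER", "1982-01-23", 4800, 60),
        (602, "MILLER", "1982-02-23", 9000, 60),
        (603, "SMITH", "1982-03-23", 4800, 60),
        (604, "FORD", "1982-04-23", 4200, 60),
        (605, "KING", "1982-05-23", 6000, 60),
        (201, "SCOT", "1982-03-23", 13000, 20),
        (202, "JONES", "1982-08-23", 6000, 20),
    ],
)

rows = conn.execute(
    """
    WITH gaps AS (
        -- Salary minus the average salary of everyone hired later in the
        -- same department; the last hire has no later rows, so use 0.
        SELECT id, emp_name, dept, salary,
               salary - COALESCE(
                   AVG(salary) OVER (PARTITION BY dept
                                     ORDER BY hire_date, id
                                     ROWS BETWEEN 1 FOLLOWING
                                              AND UNBOUNDED FOLLOWING),
                   0.0) AS gap
        FROM emp
    ),
    ranked AS (
        SELECT gaps.*, MAX(gap) OVER (PARTITION BY dept) AS dept_max_gap
        FROM gaps
    )
    SELECT id, emp_name, dept, salary, gap
    FROM ranked
    WHERE gap = dept_max_gap
    ORDER BY dept
    """
).fetchall()

print(rows)
# [(201, 'SCOT', 20, 13000, 7000.0), (605, 'KING', 60, 6000, 6000.0)]
```

The two rows match the result shown above: SCOT for department 20 and KING for department 60.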

How to select only rows with 2 consecutive "Yes" values ordered by year in SQL?

I have a query to return sample values for each employee per calendar year, and a column that checks (yes/no) if the sample value is >= 60,000.
My initial data:
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
I would like to get this result, because this employee is the only one with "yes" for 2 consecutive calendar years.
Employee_ID Calendar_Year Sample_Value Sample_Check
5678 2022 90,000 Yes
I have looked up similar questions but could not find something that solves my issue. I have also looked into LAG and LEAD but need help in understanding if they can give me the result I want.
I would tend towards using a correlated query to find qualifying rows, followed by a row_number window to select the greatest/least of each group you require:
with v as (
select *,
case when exists (
select * from t t2
where t2.Employee_ID = t.Employee_ID
and t.Sample_Check = 'Yes'
and t2.Sample_Check = 'Yes'
and t2.Calendar_Year = t.Calendar_Year - 1
) then 1 else 0 end valid
from t
), s as (
select *,
Row_Number() over(partition by Employee_ID, valid order by Calendar_Year desc) rn
from v
)
select Employee_Id, Calendar_Year, Sample_Value, Sample_Check
from s
where valid = 1 and rn = 1;
I'm not sure how bulletproof this is. I used the LAG function in a window partition to get the prior Sample_Check, then matched it in the outer query to keep the records where the current and prior values agree (basically yes = yes). If you had 3 "Yes" in a row, it would return 2 rows. You might be able to use some conditional logic to offset the rows if you run into that scenario.
SELECT
*
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = 'Yes' AND LagSampleCheck1 = 'Yes'
ORDER BY Employee_ID ASC, Calendar_Year ASC
I also created this one, which adds a ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY Calendar_Year DESC) so that it picks up the latest year when more than one row meets the criteria. I added another record to your original data set (Employee ID 5678, Calendar Year 2023, Sample Value 90,000, Sample Check Yes) to create two matching records.
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
5678 2023 90,000 Yes
SELECT
*
FROM
(
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CALENDAR_YEAR DESC) AS ROWCOUNTER
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = 'Yes' AND LagSampleCheck1 = 'Yes'
) Z
WHERE ROWCOUNTER = 1
ORDER BY Employee_ID ASC, Calendar_Year ASC
This is the most straightforward solution: just join the table with itself (assuming calendar year is numeric).
SELECT t1.*, t2.sample_check
FROM data AS t1, data AS t2
WHERE t1.emp_id = t2.emp_id
AND t1.calendar_year = t2.calendar_year + 1
AND t1.sample_check = t2.sample_check
AND t1.sample_check = 'Yes'
You can also get the same result with the LAG function:
WITH temp AS (SELECT emp_id
, calendar_year
, sample_value
, sample_check
, lag( CASE WHEN sample_check = 'Yes' THEN 1 ELSE 0 END, 1 )
OVER (PARTITION BY emp_id ORDER BY calendar_year) AS prevcheck
FROM data)
SELECT *
FROM temp
WHERE prevcheck = 1
AND sample_check = 'Yes'
Both give the same result:
emp_id calendar_year sample_value sample_check prevcheck
5678 2022 90 Yes 1
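The LAG approach can be tried on the question's sample data with a minimal sketch like the one below. It assumes Python's sqlite3 module as the engine; the table name `t` and the lower-case column names are choices made for this example, and the sample values are stored as plain integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (employee_id INTEGER, calendar_year INTEGER, "
    "sample_value INTEGER, sample_check TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1234, 2020, 55000, "No"),
        (1234, 2021, 70000, "Yes"),
        (1234, 2022, 50000, "No"),
        (3456, 2020, 80000, "Yes"),
        (3456, 2021, 40000, "No"),
        (3456, 2022, 65000, "Yes"),
        (5678, 2020, 30000, "No"),
        (5678, 2021, 70000, "Yes"),
        (5678, 2022, 90000, "Yes"),
    ],
)

rows = conn.execute(
    """
    SELECT employee_id, calendar_year, sample_value, sample_check
    FROM (SELECT t.*,
                 LAG(sample_check) OVER (PARTITION BY employee_id
                                         ORDER BY calendar_year) AS prev_check
          FROM t) AS x
    -- Require 'Yes' on both rows so that two consecutive 'No' rows
    -- are not reported as a match.
    WHERE sample_check = 'Yes' AND prev_check = 'Yes'
    ORDER BY employee_id, calendar_year
    """
).fetchall()

print(rows)  # [(5678, 2022, 90000, 'Yes')]
```

Like the queries above, this compares each row with the previous row for the same employee, so it assumes one row per employee per year with no missing years.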

Find repeating values of a certain value

I have a table similar to:
Date       Person Distance
2022/01/01 John   15
2022/01/02 John   0
2022/01/03 John   0
2022/01/04 John   0
2022/01/05 John   19
2022/01/01 Pete   25
2022/01/02 Pete   12
2022/01/03 Pete   0
2022/01/04 Pete   0
2022/01/05 Pete   1
I want to find all persons who have a distance of 0 for 3 or more consecutive days.
So in the above, it must return John and the count of the days with a zero distance.
I.e.
Person Consecutive Days with Zero
John   3
I'm looking at something like this, but I think this might be way off:
Select Person, count(*),
(row_number() over (partition by Person, Date order by Person, Date))
from mytable
Provided I understand your requirement you could, for your sample data, just calculate the difference in days of a windowed min/max date:
select distinct Person, Consecutive from (
select *, DateDiff(day,
Min(date) over(partition by person),
Max(date) over(partition by person)
) + 1 Consecutive
from t
where distance = 0
)t
where Consecutive >= 3;
If you can have gaps in the dates you could try the following that only considers rows with 1 day between each date (and could probably be simplified):
with c as (
select *, Row_Number() over (partition by person order by date) rn,
DateDiff(day, Lag(date) over(partition by person order by date), date) c
from t
where distance = 0
), g as (
select Person, rn - Row_Number() over(partition by person, c order by date) grp
from c
)
select person, Count(*) + 1 consecutive
from g
group by person, grp
having Count(*) >= 2;
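A common alternative for the case with gaps is the "gaps and islands" trick: among the zero-distance rows, the date minus the row number stays constant within a run of consecutive days. The sketch below is a swapped-in variant, not the query above. It assumes Python's sqlite3 module, ISO-formatted dates (needed by SQLite's julianday), and a hypothetical extra person, Ann, whose zero days have a gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, person TEXT, distance INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2022-01-01", "John", 15), ("2022-01-02", "John", 0),
        ("2022-01-03", "John", 0), ("2022-01-04", "John", 0),
        ("2022-01-05", "John", 19),
        ("2022-01-01", "Pete", 25), ("2022-01-02", "Pete", 12),
        ("2022-01-03", "Pete", 0), ("2022-01-04", "Pete", 0),
        ("2022-01-05", "Pete", 1),
        # Hypothetical rows: three zero days, but 2022-01-03 is missing.
        ("2022-01-01", "Ann", 0), ("2022-01-02", "Ann", 0),
        ("2022-01-04", "Ann", 0),
    ],
)

rows = conn.execute(
    """
    WITH zero_days AS (
        -- Day number minus row number is constant inside each run of
        -- consecutive zero-distance days, so it identifies the run.
        SELECT person,
               julianday(date)
                 - ROW_NUMBER() OVER (PARTITION BY person
                                      ORDER BY date) AS grp
        FROM t
        WHERE distance = 0
    )
    SELECT person, COUNT(*) AS consecutive
    FROM zero_days
    GROUP BY person, grp
    HAVING COUNT(*) >= 3
    ORDER BY person
    """
).fetchall()

print(rows)  # [('John', 3)]
```

Ann is not returned because her three zero days are split into runs of two and one.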
One option is to:
- transform your "Distance" values into a flag, where a distance of 0 becomes 1 and any other value becomes 0
- compute a running sum over the flags in a window of three rows, using a frame specification clause
- keep every "Person" that has at least one window sum of 3.
WITH cte AS (
SELECT *, SUM(CASE WHEN Distance = 0 THEN 1 ELSE 0 END) OVER(
PARTITION BY Person
ORDER BY Date_
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS window_of_3
FROM tab
)
SELECT DISTINCT Person
FROM cte
WHERE window_of_3 = 3
Note: This solution requires the table to have no missing dates. If dates can be missing, rows for the missing dates must first be added for each "Person" for this solution to work.
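The three-row window sum can be checked on the sample data with a short sketch, assuming Python's sqlite3 module as the engine. The table name `tab` and the column name `date_` follow the query above, and the dates are written as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (date_ TEXT, person TEXT, distance INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        ("2022-01-01", "John", 15), ("2022-01-02", "John", 0),
        ("2022-01-03", "John", 0), ("2022-01-04", "John", 0),
        ("2022-01-05", "John", 19),
        ("2022-01-01", "Pete", 25), ("2022-01-02", "Pete", 12),
        ("2022-01-03", "Pete", 0), ("2022-01-04", "Pete", 0),
        ("2022-01-05", "Pete", 1),
    ],
)

rows = conn.execute(
    """
    WITH cte AS (
        -- Count the zero-distance days among the current row and the
        -- two rows before it.
        SELECT person,
               SUM(CASE WHEN distance = 0 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY person
                   ORDER BY date_
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS window_of_3
        FROM tab
    )
    SELECT DISTINCT person
    FROM cte
    WHERE window_of_3 = 3
    """
).fetchall()

print(rows)  # [('John',)]
```

For John the window sums are 0, 1, 2, 3, 2, so one window reaches 3; for Pete they never exceed 2.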

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add the price for previous row. Row 1 is 0 because it has no previous row value.
Find ratio ie 10/30=0.33
You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Please also check another method:
with cte
as (select *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
SUM(price) OVER (PARTITION BY id ORDER BY date) ss from yourTable)
select id, status_count, isnull(ss,0)-price price
from cte
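The "previous rows only" frame can be verified against the question's table with a minimal sketch, assuming Python's sqlite3 module as the engine. Here both the count and the sum use the frame that ends one row before the current row, so no "- 1" or "- price" correction is needed; the ratio column from the question is left out because its definition is not clear from the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, status TEXT, price INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (2, "complete", 10, "2020-01-01 10:10:10"),
        (2, "complete", 20, "2020-02-02 10:10:10"),
        (2, "complete", 10, "2020-03-03 10:10:10"),
        (3, "complete", 10, "2020-04-04 10:10:10"),
        (4, "complete", 10, "2020-05-05 10:10:10"),
    ],
)

rows = conn.execute(
    """
    SELECT id,
           COUNT(*) OVER (PARTITION BY id ORDER BY date
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND 1 PRECEDING) AS status_count,
           -- SUM over an empty frame is NULL, hence the COALESCE.
           COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND 1 PRECEDING), 0) AS price
    FROM yourTable
    ORDER BY id, date
    """
).fetchall()

print(rows)
# [(2, 0, 0), (2, 1, 10), (2, 2, 30), (3, 0, 0), (4, 0, 0)]
```

The three rows for id 2 match the first three columns of the required output.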

MySQL AVG() in sub-query

What one query can produce table_c?
I have three columns: day, person, and revenue_per_person. Right now I have to use two queries since I lose 'person' when producing table_b.
table_a uses all three columns:
SELECT day, person, revenue_per_person
FROM purchase_table
GROUP BY day, person
table_b uses only two columns due to AVG() and GROUP BY:
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
table_c created from table_a and table_b:
SELECT
CASE
WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM ????
Maybe this could help, try this one
SELECT a.day,
CASE
WHEN a.revenue_per_person > b.avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM
(
SELECT day, person, AVG(revenue) revenue_per_person
FROM purchase_table
GROUP BY day, person
) a INNER JOIN
(
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
) b ON a.day = b.day
You might want to use analytic functions.
An Oracle example showing if a person's salary is greater than average salary in his department.
select
    department_id
   ,employee_id
   ,salary
   ,avg_salary
   ,case when salary > avg_salary then 1 else 0 end case_is_greater
from (
    select
        department_id
       ,employee_id
       ,salary
       ,round(avg(salary) over(partition by department_id),2) avg_salary
    from employees
)
where department_id = 30;
DEPARTMENT_ID EMPLOYEE_ID SALARY AVG_SALARY CASE_IS_GREATER
------------- ----------- ---------- ---------- ---------------
30 114 11000 4150 1
30 115 3100 4150 0
30 116 2900 4150 0
30 117 2800 4150 0
30 118 2600 4150 0
30 119 2500 4150 0
6 rows selected.
Elapsed: 00:00:00.01
If you are using a database that supports window functions, you can do this as:
SELECT (CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END) as spending_bucket
FROM (select pt.*,
avg(revenue) over (partition by day, person) as revenue_per_person,
avg(revenue) over (partition by day) as avg_revenue,
row_number() over (partition by day, person order by day) as seqnum
from purchase_table pt
) t
where seqnum = 1
The purpose of seqnum is to just get one row per person/day combination.
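To see the window-function version produce one bucket per person and day, here is a minimal sketch, assuming Python's sqlite3 module as the engine. The three purchase rows are hypothetical, and `day` and `person` are added to the select list so the buckets can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_table (day TEXT, person TEXT, revenue REAL)")
# Hypothetical purchases: alice averages 200, bob 50, the day averages 150.
conn.executemany(
    "INSERT INTO purchase_table VALUES (?, ?, ?)",
    [
        ("2024-01-01", "alice", 100),
        ("2024-01-01", "alice", 300),
        ("2024-01-01", "bob", 50),
    ],
)

rows = conn.execute(
    """
    SELECT day, person,
           CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
                ELSE 'small spender'
           END AS spending_bucket
    FROM (SELECT pt.*,
                 AVG(revenue) OVER (PARTITION BY day, person) AS revenue_per_person,
                 AVG(revenue) OVER (PARTITION BY day) AS avg_revenue,
                 ROW_NUMBER() OVER (PARTITION BY day, person
                                    ORDER BY day) AS seqnum
          FROM purchase_table pt) AS t
    WHERE seqnum = 1
    ORDER BY day, person
    """
).fetchall()

print(rows)
# [('2024-01-01', 'alice', 'big spender'), ('2024-01-01', 'bob', 'small spender')]
```

Note that avg_revenue here is the average over all purchase rows of the day, not the average of the per-person averages.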