MySQL AVG() in sub-query - sql

What one query can produce table_c?
I have three columns: day, person, and revenue_per_person. Right now I have to use two queries since I lose 'person' when producing table_b.
table_a uses all three columns:
SELECT day, person, AVG(revenue) AS revenue_per_person
FROM purchase_table
GROUP BY day, person
table_b uses only two columns due to AVG() and GROUP BY:
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
table_c created from table_a and table_b:
SELECT
CASE
WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM ????

Maybe this helps: compute the per-person averages and the per-day averages in two derived tables, then join them on day.
SELECT a.day, a.person,
CASE
WHEN a.revenue_per_person > b.avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM
(
SELECT day, person, AVG(revenue) revenue_per_person
FROM purchase_table
GROUP BY day, person
) a INNER JOIN
(
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
) b ON a.day = b.day
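To see the join approach run end to end, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The rows in ROWS are made up for illustration (the question shows no data), and the SQL is the answer's query in SQLite syntax with a.person kept in the select list.

```python
import sqlite3

# Hypothetical purchase rows (day, person, revenue); not from the question.
ROWS = [
    ("2024-01-01", "ann", 100.0),
    ("2024-01-01", "ann", 50.0),
    ("2024-01-01", "bob", 20.0),
    ("2024-01-02", "bob", 80.0),
    ("2024-01-02", "cat", 40.0),
]

# Per-person averages (a) joined to per-day averages (b) on day.
JOIN_QUERY = """
SELECT a.day, a.person,
       CASE WHEN a.revenue_per_person > b.avg_revenue
            THEN 'big spender' ELSE 'small spender' END AS spending_bucket
FROM (SELECT day, person, AVG(revenue) AS revenue_per_person
      FROM purchase_table
      GROUP BY day, person) AS a
INNER JOIN (SELECT day, AVG(revenue) AS avg_revenue
            FROM purchase_table
            GROUP BY day) AS b
  ON a.day = b.day
ORDER BY a.day, a.person
"""


def spending_buckets(rows):
    """Load rows into an in-memory purchase_table and run JOIN_QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchase_table (day TEXT, person TEXT, revenue REAL)"
        )
        conn.executemany("INSERT INTO purchase_table VALUES (?, ?, ?)", rows)
        return conn.execute(JOIN_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in spending_buckets(ROWS):
        print(row)
```

Each person's daily average is compared with that day's average over individual purchases, which is the same baseline table_b computes.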

You might want to use analytic functions.
Here is an Oracle example that shows whether each employee's salary is greater than the average salary in their department:
select
  department_id
  ,employee_id
  ,salary
  ,avg_salary
  ,case when salary > avg_salary then 1 else 0 end case_is_greater
from (
  select
    department_id
    ,employee_id
    ,salary
    ,round(avg(salary) over(partition by department_id),2) avg_salary
  from employees
)
where department_id = 30
DEPARTMENT_ID EMPLOYEE_ID SALARY AVG_SALARY CASE_IS_GREATER
------------- ----------- ---------- ---------- ---------------
30 114 11000 4150 1
30 115 3100 4150 0
30 116 2900 4150 0
30 117 2800 4150 0
30 118 2600 4150 0
30 119 2500 4150 0
6 rows selected.
Elapsed: 00:00:00.01

If you are using a database that supports window functions (MySQL 8.0 and later does), you can do this as:
SELECT day, person,
(CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END) as spending_bucket
FROM (select pt.*,
avg(revenue) over (partition by day, person) as revenue_per_person,
avg(revenue) over (partition by day) as avg_revenue,
row_number() over (partition by day, person order by day) as seqnum
from purchase_table pt
) t
where seqnum = 1
seqnum only serves to keep one row per day/person combination, because the window averages are repeated on every row of their partition.
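A minimal sketch of the window-function version, again using Python's sqlite3 module (window functions need SQLite 3.25 or later) and made-up purchase rows. It is the query above with day and person in the select list.

```python
import sqlite3

# Hypothetical purchase rows (day, person, revenue); not from the question.
ROWS = [
    ("2024-01-01", "ann", 100.0),
    ("2024-01-01", "ann", 50.0),
    ("2024-01-01", "bob", 20.0),
    ("2024-01-02", "bob", 80.0),
    ("2024-01-02", "cat", 40.0),
]

WINDOW_QUERY = """
SELECT day, person,
       CASE WHEN revenue_per_person > avg_revenue
            THEN 'big spender' ELSE 'small spender' END AS spending_bucket
FROM (SELECT pt.*,
             AVG(revenue) OVER (PARTITION BY day, person) AS revenue_per_person,
             AVG(revenue) OVER (PARTITION BY day) AS avg_revenue,
             ROW_NUMBER() OVER (PARTITION BY day, person ORDER BY day) AS seqnum
      FROM purchase_table AS pt) AS t
WHERE seqnum = 1  -- keep one row per day/person
ORDER BY day, person
"""


def window_buckets(rows):
    """Load rows into an in-memory purchase_table and run WINDOW_QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchase_table (day TEXT, person TEXT, revenue REAL)"
        )
        conn.executemany("INSERT INTO purchase_table VALUES (?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_buckets(ROWS):
        print(row)
```

The table is read once and no join is needed; the price is that every purchase row gets both averages attached before the seqnum filter throws the duplicates away.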

Related

Recursive snowflake query of monthly snapshots for time series analysis dropping records prior to change for users who changed department

This is a follow-up to an earlier question, which explains the objective of this query and provides a sample of the source data.
With help, I now have this recursive query running. It is more efficient than my non-recursive query, which was repeated 36 times and unioned together.
The purpose of this query is to know which department an employee was in at the end of each month. The problem with this code is that for employees who changed departments, it is returning only the month end department value for months subsequent to the most recent department change, and no prior records. For employees who changed departments, the output should contain this data:
Month - Department Code
0 - 100
1 - 100
2 - 200
3 - 200
And it is currently returning:
Month - Department Code
0 - 100
1 - 100
Here is the query:
WITH Q AS (
select
row_number() over(order by null) as q_level,
last_day(dateadd(month, -q_level, CURRENT_DATE), month) as last_day_month
from table(generator(ROWCOUNT=>36))
), Q1 AS (
select
q.q_level
,q.last_day_month
,v_dept_history_adj.associate_id
,v_dept_history_adj.home_department_code
,v_dept_history_adj.position_effective_date
,max(position_effective_date) OVER(PARTITION BY v_dept_history_adj.associate_id) AS most_recent_record
from datawarehouse.srctable
,Q
where v_dept_history_adj.position_effective_date <= q.last_day_month
)
select
associate_id
,position_effective_date
,home_department_code
,most_recent_record
,last_day_month AS month
FROM Q1
where position_effective_date = most_recent_record
order by month desc, position_effective_date desc
Now that the larger picture of your question makes sense, to get the most recent department per month for each employee, I would write the query like this:
with emp_data(emp_id, dep_id, date) as (
select * from values
(1, 10, '2022-01-01'::date),
(1, 20, '2022-07-10'::date),
(2, 10, '2022-07-14'::date)
), last_36_months as (
select
row_number() over(order by null) as q_level,
last_day(dateadd(month, -q_level, CURRENT_DATE), month) as last_day_month
--from table(generator(ROWCOUNT=>36))
from table(generator(ROWCOUNT=>12))
), month_end_data as (
select
e.emp_id
,e.dep_id
,l.last_day_month as month
from last_36_months as l
join emp_data as e
on e.date <= l.last_day_month
qualify row_number() over(partition by e.emp_id, l.last_day_month order by e.date desc) = 1
)
select *
from month_end_data
order by 1,3 desc;
I reduced 36 months to 12 and moved the sample data to 2022 to keep the output short. It gives:
EMP_ID  DEP_ID  MONTH
1       20      2022-10-31
1       20      2022-09-30
1       20      2022-08-31
1       20      2022-07-31
1       10      2022-06-30
1       10      2022-05-31
1       10      2022-04-30
1       10      2022-03-31
1       10      2022-02-28
1       10      2022-01-31
2       10      2022-10-31
2       10      2022-09-30
2       10      2022-08-31
2       10      2022-07-31
This seems closer to what you want, and it is simpler to read.
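QUALIFY is Snowflake-specific. As a rough, runnable sketch of the same month-end logic, the snippet below uses Python's sqlite3 module and filters ROW_NUMBER() in an outer query instead. Assumptions: the month ends come from a recursive CTE counting back 12 months from a fixed date (2022-11-01) rather than from CURRENT_DATE, so the output is reproducible, and emp_data holds the answer's three sample rows.

```python
import sqlite3

# The answer's sample rows: (emp_id, dep_id, date).
EMP_DATA = [
    (1, 10, "2022-01-01"),
    (1, 20, "2022-07-10"),
    (2, 10, "2022-07-14"),
]

MONTH_END_QUERY = """
WITH RECURSIVE months(n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM months WHERE n < 11
), month_ends AS (
    -- first day of a month minus one day = last day of the previous month
    SELECT date('2022-11-01', (-n) || ' months', '-1 day') AS last_day_month
    FROM months
), ranked AS (
    SELECT e.emp_id, e.dep_id, m.last_day_month AS month,
           ROW_NUMBER() OVER (PARTITION BY e.emp_id, m.last_day_month
                              ORDER BY e.date DESC) AS rn
    FROM month_ends AS m
    JOIN emp_data AS e ON e.date <= m.last_day_month
)
SELECT emp_id, dep_id, month
FROM ranked
WHERE rn = 1  -- latest department record on or before each month end
ORDER BY emp_id, month DESC
"""


def month_end_departments(rows):
    """Return (emp_id, dep_id, month_end) for every month each employee existed."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp_data (emp_id INTEGER, dep_id INTEGER, date TEXT)")
        conn.executemany("INSERT INTO emp_data VALUES (?, ?, ?)", rows)
        return conn.execute(MONTH_END_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in month_end_departments(EMP_DATA):
        print(row)
```

The join keeps every history row dated on or before a month end, so employees who changed departments still get their earlier months; ROW_NUMBER() then picks the newest of those rows per month.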

PIVOT using JOIN in SQL

I was asked to pivot this data using basic SQL and wasn't sure how to answer. I googled some answers and realized you can use MAX or SUM with CASE expressions, but at the end of the interview, when I asked how they would solve the question, the interviewer said by using joins. Can anyone show me how it's done using joins?
BEGINNING TABLE
emp_id  col_id  col_desc      attvalue  month
1       1       salary        2000      2010-05-09
1       2       bonus         0         2010-05-09
1       3       compensation  2000      2010-05-09
1       1       salary        2000      2010-05-10
1       2       bonus         500       2010-05-10
1       3       compensation  2500      2010-05-10
2       1       salary        1000      2010-05-09
2       2       bonus         500       2010-05-09
2       3       compensation  1500      2010-05-09
Code to create the beginning table
CREATE TABLE Employees(emp_id INT, col_id INT, col_desc NVARCHAR(MAX), attvalue INT, month DATE);
INSERT INTO Employees
VALUES
(1,1,'salary',2000,'2010-05-09'),
(1,2,'bonus',0,'2010-05-09'),
(1,3,'compensation',2000,'2010-05-09'),
(1,1,'salary',2000,'2010-05-10'),
(1,2,'bonus',500,'2010-05-10'),
(1,3,'compensation',2500,'2010-05-10'),
(2,1,'salary',1000,'2010-05-09'),
(2,2,'bonus',500,'2010-05-09'),
(2,3,'compensation',1500,'2010-05-09');
RESULTING TABLE
emp_id  month       salary  bonus  compensation
1       2010-05-09  2000    0      2000
1       2010-05-10  2000    500    2500
2       2010-05-09  1000    500    1500
Below are the self-join, CASE expression, and PIVOT approaches:
-- Self Join way
select s.emp_id, s.month,
s.attvalue as salary,
b.attvalue as bonus,
c.attvalue as compensation
from Employees s
inner join Employees b on s.emp_id = b.emp_id
and s.month = b.month
inner join Employees c on s.emp_id = c.emp_id
and s.month = c.month
where s.col_desc = 'salary'
and b.col_desc = 'bonus'
and c.col_desc = 'compensation'
order by s.emp_id, s.month
-- case expression way
select emp_id, month,
max(case when col_desc = 'salary' then attvalue else 0 end) as salary,
max(case when col_desc = 'bonus' then attvalue else 0 end) as bonus,
max(case when col_desc = 'compensation' then attvalue else 0 end) as compensation
from Employees
group by emp_id, month
order by emp_id, month
-- Pivot way
select *
from (
select emp_id, month, col_desc, attvalue
from Employees
) d
pivot
(
max(attvalue)
for col_desc in ([salary], [bonus], [compensation])
) p
order by emp_id, month
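To check that the self-join and CASE-expression pivots agree, here is a sketch using Python's sqlite3 module and the question's nine rows. SQLite has no PIVOT operator (that syntax is SQL Server's), so only the first two approaches are shown, and month is stored as ISO date text.

```python
import sqlite3

# The question's Employees rows: (emp_id, col_id, col_desc, attvalue, month).
EMPLOYEES = [
    (1, 1, "salary", 2000, "2010-05-09"),
    (1, 2, "bonus", 0, "2010-05-09"),
    (1, 3, "compensation", 2000, "2010-05-09"),
    (1, 1, "salary", 2000, "2010-05-10"),
    (1, 2, "bonus", 500, "2010-05-10"),
    (1, 3, "compensation", 2500, "2010-05-10"),
    (2, 1, "salary", 1000, "2010-05-09"),
    (2, 2, "bonus", 500, "2010-05-09"),
    (2, 3, "compensation", 1500, "2010-05-09"),
]

# One join per output column; each alias is filtered to a single attribute.
SELF_JOIN = """
SELECT s.emp_id, s.month,
       s.attvalue AS salary, b.attvalue AS bonus, c.attvalue AS compensation
FROM Employees s
JOIN Employees b ON s.emp_id = b.emp_id AND s.month = b.month
JOIN Employees c ON s.emp_id = c.emp_id AND s.month = c.month
WHERE s.col_desc = 'salary'
  AND b.col_desc = 'bonus'
  AND c.col_desc = 'compensation'
ORDER BY s.emp_id, s.month
"""

# Conditional aggregation: one group per emp_id/month.
CASE_PIVOT = """
SELECT emp_id, month,
       MAX(CASE WHEN col_desc = 'salary' THEN attvalue ELSE 0 END) AS salary,
       MAX(CASE WHEN col_desc = 'bonus' THEN attvalue ELSE 0 END) AS bonus,
       MAX(CASE WHEN col_desc = 'compensation' THEN attvalue ELSE 0 END) AS compensation
FROM Employees
GROUP BY emp_id, month
ORDER BY emp_id, month
"""


def run(query, rows=EMPLOYEES):
    """Load rows into an in-memory Employees table and run query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Employees (emp_id INTEGER, col_id INTEGER, "
            "col_desc TEXT, attvalue INTEGER, month TEXT)"
        )
        conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(SELF_JOIN))
    print(run(CASE_PIVOT))
```

One design difference: because the self-join uses inner joins, an emp_id/month that is missing any one attribute row disappears from its result, while the CASE version still returns it (with 0 in the missing column).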

View and complex query count distinct locations employee stayed in SQL

I have a view, view_1, that looks like this:
id Office Begin_dt Last_dt Days
1 Office1 2019-09-02 2019-09-08 6
1 Office2 2019-09-09 2019-09-30 21
1 Office1 2019-10-01 2019-10-31 30
5 Office3 2017-10-01 2017-10-16 15
5 Office2 2017-10-17 2017-10-30 13
5 Office2 2017-11-01 2017-11-31 30
I want to find the office where employee stayed for max time and also the number of Distinct Office locations he stayed in.
Expected output
id Max_time_in_Office Days Distinct_office_locations
1 Office1 36 2
5 Office2 43 2
So id 1 spends 6 + 30 = 36 days in Office1, more than in any other office, and stayed in 2 distinct locations.
id 5 spends 13 + 30 = 43 days in Office2, more than in any other office, and stayed in 2 distinct locations.
Code tried
select v.*
from (select v.id, v.office, sum(days) as Max_time_in_Office, count(Office) as Distinct_office_locations,
rank() over (partition by id order by sum(days) desc) as seqnum
from view_1 v
group by id, office
) v
where seqnum = 1;
Output obtained
id Max_time_in_Office Days Distinct_office_locations
1 Office1 36 1
5 Office2 43 1
So I am getting the wrong output. Can someone please help?
Close. You want a window function:
select v.*
from (select v.id, v.office, sum(days) as Max_time_in_Office,
count(*) over (partition by id) as Distinct_office_locations,
rank() over (partition by id order by sum(days) desc) as seqnum
from view_1 v
group by id, office
) v
where seqnum = 1;
The window function counts the rows per id that remain after the aggregation, and since there is one row per office, that count is the number of distinct offices.
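A runnable sketch of the same idea using Python's sqlite3 module and the question's view_1 rows, loaded into a table named view_1. For portability (an assumption, not a requirement of the answer), the per-office SUM is computed in a CTE first and the window functions are applied to its result, which is equivalent to using them directly in the aggregate query.

```python
import sqlite3

# The question's view_1 rows: (id, office, begin_dt, last_dt, days).
VIEW_1 = [
    (1, "Office1", "2019-09-02", "2019-09-08", 6),
    (1, "Office2", "2019-09-09", "2019-09-30", 21),
    (1, "Office1", "2019-10-01", "2019-10-31", 30),
    (5, "Office3", "2017-10-01", "2017-10-16", 15),
    (5, "Office2", "2017-10-17", "2017-10-30", 13),
    (5, "Office2", "2017-11-01", "2017-11-31", 30),
]

QUERY = """
WITH per_office AS (
    SELECT id, office, SUM(days) AS days
    FROM view_1
    GROUP BY id, office            -- one row per id/office
), ranked AS (
    SELECT id, office, days,
           COUNT(*) OVER (PARTITION BY id) AS distinct_office_locations,
           RANK() OVER (PARTITION BY id ORDER BY days DESC) AS seqnum
    FROM per_office
)
SELECT id, office AS max_time_in_office, days, distinct_office_locations
FROM ranked
WHERE seqnum = 1
ORDER BY id
"""


def office_summary(rows):
    """Load rows into an in-memory view_1 table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE view_1 (id INTEGER, office TEXT, begin_dt TEXT, "
            "last_dt TEXT, days INTEGER)"
        )
        conn.executemany("INSERT INTO view_1 VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(office_summary(VIEW_1))
```

Counting in the window rather than with COUNT(Office) inside the GROUP BY is what fixes the asker's result: grouped by id and office, COUNT(Office) only ever sees a single office per group and returns 1.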
In SQL Server, you could also use the APPLY operator:
select V.Id,
T.Max_Time_Office,
T.Days,
Distinct_office_locations = count(distinct V.Office)
from view_1 V
Cross apply
(
Select top 1 Id,
Max_Time_Office = Office,
Days = sum(Days)
From view_1 VG
where V.Id = VG.Id
group by VG.Id, VG.Office
order by sum(Days) desc
) T
group by V.Id, T.Max_Time_Office, T.Days
The CROSS APPLY picks, for each id, the office with the most days (TOP 1 with ORDER BY sum(Days) DESC), and the outer query reuses that row. count(distinct V.Office) in the outer query then gives the number of distinct offices.

T-SQL calculate the percent increase or decrease between the earliest and latest for each project

I have a table like the one below. I am trying to write a T-SQL query that gets the earliest and latest cost for each project_id according to the date column, calculates the percent increase or decrease in cost, and returns the data set shown in the second table (I have simplified the table for this question).
project_id date cost
-------------------------------
123 7/1/17 5000
123 8/1/17 6000
123 9/1/17 7000
123 10/1/17 8000
123 11/1/17 9000
456 7/1/17 10000
456 8/1/17 9000
456 9/1/17 8000
876 1/1/17 8000
876 6/1/17 5000
876 8/1/17 10000
876 11/1/17 8000
Result:
(Edit: Fixed the result)
project_id "cost incr/decr pct"
------------------------------------------------
123 80% which is (9000-5000)/5000
456 -20%
876 0%
Whatever query I run, I get duplicates.
This is what I tried:
select distinct
p1.Proj_ID, p1.date, p2.[cost], p3.cost,
(nullif(p2.cost, 0) / nullif(p1.cost, 0)) * 100 as 'OVER UNDER'
from
[PROJECT] p1
inner join
(select
[Proj_ID], [cost], min([date]) min_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p2 on p1.Proj_ID = p2.Proj_ID
inner join
(select
[Proj_ID], [cost], max([date]) max_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p3 on p1.Proj_ID = p3.Proj_ID
where
p1.date in (p2.min_date, p3.max_date)
Unfortunately, SQL Server does not have a first_value() aggregation function. It does have an analytic function, though. So, you can do:
select distinct project_id,
first_value(cost) over (partition by project_id order by date asc) as first_cost,
first_value(cost) over (partition by project_id order by date desc) as last_cost,
(first_value(cost) over (partition by project_id order by date desc) /
first_value(cost) over (partition by project_id order by date asc)
) - 1 as ratio
from project;
If cost is an integer, you may need to convert to a representation with decimal places.
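Here is a sketch of the FIRST_VALUE() approach with the percent calculation done in decimal, using Python's sqlite3 module and the question's rows (dates rewritten as ISO text so they sort correctly). Multiplying by 100.0 forces non-integer arithmetic, which is the kind of conversion the note above refers to; in SQL Server you could instead CAST cost to a decimal type.

```python
import sqlite3

# The question's rows: (project_id, date, cost), dates converted to ISO format.
PROJECT = [
    (123, "2017-07-01", 5000), (123, "2017-08-01", 6000),
    (123, "2017-09-01", 7000), (123, "2017-10-01", 8000),
    (123, "2017-11-01", 9000),
    (456, "2017-07-01", 10000), (456, "2017-08-01", 9000),
    (456, "2017-09-01", 8000),
    (876, "2017-01-01", 8000), (876, "2017-06-01", 5000),
    (876, "2017-08-01", 10000), (876, "2017-11-01", 8000),
]

QUERY = """
SELECT DISTINCT project_id,
       FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY date ASC) AS first_cost,
       FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY date DESC) AS last_cost,
       100.0 * (FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY date DESC)
                - FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY date ASC))
             / FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY date ASC)
         AS pct_change
FROM project
ORDER BY project_id
"""


def pct_changes(rows):
    """Return (project_id, first_cost, last_cost, pct_change) per project."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE project (project_id INTEGER, date TEXT, cost INTEGER)")
        conn.executemany("INSERT INTO project VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in pct_changes(PROJECT):
        print(row)
```

The window values are repeated on every row of a project, so DISTINCT collapses them to one row per project, matching the expected 80%, -20%, and 0%.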
Prior to SQL Server 2012, which introduced FIRST_VALUE(), you can use ROW_NUMBER() together with OUTER APPLY over a TOP 1 subquery:
select
min_.projectid,
(latest_.cost - min_.cost) * 100.0 / nullif(min_.cost, 0) [Calculation] -- percent change
from
(select
row_number() over (partition by projectid order by date) rn
,projectid
,cost
from projectable) min_ -- get the first dates per project
outer apply (
select
top 1
cost
from projectable
where
projectid = min_.projectid -- get the latest cost for each project
order by date desc
) latest_
where min_.rn = 1
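This variant computes both ROW_NUMBER() values in one CTE and picks the first and last cost with conditional aggregation; it might perform a little better: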
;with costs as (
select *,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date) mincost,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date desc) maxcost
from table1
)
select project_id,
min(case when mincost = 1 then cost end) as cost1,
max(case when maxcost = 1 then cost end) as cost2,
(max(case when maxcost = 1 then cost end) - min(case when mincost = 1 then cost end)) * 100 / min(case when mincost = 1 then cost end) as [OVER UNDER]
from costs a
group by project_id

Writing a request for the number of hired/fired each year

I'm trying to write an SQL query that counts the number of employees hired and fired each year.
I can get each employee's dates with this SELECT:
SELECT HiredDate, FiredDate FROM Employees;
I can list each year with this select:
SELECT to_char(e1.HiredDate, 'YYYY') Year FROM Employees e1
UNION
SELECT to_char(e2.FiredDate, 'YYYY') Year FROM Employees e2;
But I can't figure out how to count the number hired and fired each year.
EDIT
Employees sample data:
Name | HiredDate | FiredDate
--------------------------------
John | 01/02/2003 | 03/04/2013
Jack | 05/06/2006 | 07/08/2013
Jean | 03/04/2006 | null
James | 01/02/2013 | null
Expected results:
Year | HiredNumber | FiredNumber
---------------------------------
2003 | 1 | 0
2006 | 2 | 0
2013 | 1 | 2
There may be years with no hirings and years with no firings, so the easiest way to solve this problem is with two subqueries, one for each count, joined with a full outer join.
with e1 as (
select extract(year from hireddate) as emp_year
, count(hireddate) as hired_count
from employees
where hireddate is not null
group by extract(year from hireddate)
)
, e2 as (
select extract(year from fireddate) as emp_year
, count(fireddate) as fired_count
from employees
where fireddate is not null
group by extract(year from fireddate)
)
select coalesce (e1.emp_year, e2.emp_year) as emp_year
, nvl(e1.hired_count, 0) as hired_count
, nvl(e2.fired_count, 0) as fired_count
from e1
full outer join e2
on e1.emp_year = e2.emp_year
order by 1
Notes
This will exclude any years with neither hirings nor firings. It's easy enough to generate those years if you need them.
Presumably hireddate is mandatory but the not null check is retained for symmetry :)
Regarding your comment "It works well in SQL Developer but can't be set as a Visual datasource", here is a variant without the FULL OUTER JOIN:
select emp_year
, sum(hired_count) as hired_count
, sum(fired_count) as fired_count
from (
select extract(year from hireddate) as emp_year
, count(hireddate) as hired_count
, 0 as fired_count
from employees
where hireddate is not null
group by extract(year from hireddate)
union all
select extract(year from fireddate) as emp_year
, 0 as hired_count
, count(fireddate) as fired_count
from employees
where fireddate is not null
group by extract(year from fireddate)
)
group by emp_year
order by 1
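A sketch of the UNION ALL variant in SQLite via Python's sqlite3 module. SQLite has no EXTRACT(), so strftime('%Y', ...) stands in for it. The question's dates are assumed to be DD/MM/YYYY and are rewritten as ISO text; only the year matters here.

```python
import sqlite3

# The question's sample employees (name, hireddate, fireddate), ISO dates assumed.
EMPLOYEES = [
    ("John", "2003-02-01", "2013-04-03"),
    ("Jack", "2006-06-05", "2013-08-07"),
    ("Jean", "2006-04-03", None),
    ("James", "2013-02-01", None),
]

QUERY = """
SELECT emp_year,
       SUM(hired_count) AS hired_count,
       SUM(fired_count) AS fired_count
FROM (
    SELECT strftime('%Y', hireddate) AS emp_year,
           COUNT(hireddate) AS hired_count,
           0 AS fired_count
    FROM employees
    WHERE hireddate IS NOT NULL
    GROUP BY strftime('%Y', hireddate)
    UNION ALL
    SELECT strftime('%Y', fireddate),
           0,
           COUNT(fireddate)
    FROM employees
    WHERE fireddate IS NOT NULL
    GROUP BY strftime('%Y', fireddate)
)
GROUP BY emp_year   -- merge the hired and fired rows of the same year
ORDER BY emp_year
"""


def hired_fired_by_year(rows):
    """Return (year, hired_count, fired_count) tuples ordered by year."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, hireddate TEXT, fireddate TEXT)")
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in hired_fired_by_year(EMPLOYEES):
        print(row)
```

The zero placeholder in each branch lets the outer SUM combine a year's hires and fires into one row, which is what the FULL OUTER JOIN did in the first query.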
Alternatively, return one row per year and event type:
SELECT 'Hired' What, to_char(e1.HiredDate, 'YYYY') Year, COUNT(*) TheCount
FROM Employees e1
GROUP BY to_char(e1.HiredDate, 'YYYY')
UNION ALL
SELECT 'Fired' What, to_char(e2.FiredDate, 'YYYY') Year, COUNT(*) TheCount
FROM Employees e2
WHERE e2.FiredDate IS NOT NULL -- skip employees who have not been fired
GROUP BY to_char(e2.FiredDate, 'YYYY');