Computing the median of salaries under each manager in BigQuery SQL - sql

I have a BigQuery table with columns: employee, salary, gender, manager. I would like to compute the median, within each team (so, for each manager), of female employees' salaries.
I have tried using the PERCENTILE_CONT(..., 0.5) navigation function, but it seems it does not support GROUP BY.
This is my query:
SELECT
  manager,
  PERCENTILE_CONT(salary, 0.5) OVER() AS median_of_women_salaries
FROM
  employees_table
WHERE
  gender = 'woman'
GROUP BY
  manager
What I get is the error message:
"SELECT list expression references column salary which is neither grouped nor aggregated at [.:.]"
As a result, I would like to get a table with columns manager and median_of_women_salaries that shows the median of women's salaries under each manager.
Thank you very much for your help!

You could use an existing shared UDF:
SELECT
manager,
fhoffa.x.median(ARRAY_AGG(salary)) AS median_of_women_salaries
FROM employees_table
WHERE gender = 'woman'
GROUP BY manager
https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83
https://console.cloud.google.com/bigquery?p=fhoffa&d=x&r=median&page=routine
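To sanity-check whichever query you end up with, it can help to compute the expected medians by hand on a small sample. The sketch below does that in plain Python. The sample rows are made up for illustration, and only the column names come from the question. For a group with an even number of values, statistics.median averages the two middle values, which is also what a continuous percentile at 0.5 does.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (employee, salary, gender, manager).
rows = [
    ("ann", 50000, "woman", "m1"),
    ("bea", 70000, "woman", "m1"),
    ("cat", 90000, "woman", "m1"),
    ("dan", 65000, "man", "m1"),
    ("eve", 40000, "woman", "m2"),
    ("fay", 60000, "woman", "m2"),
]


def median_of_women_salaries(rows):
    """Group women's salaries by manager and take the median of each group."""
    by_manager = defaultdict(list)
    for _employee, salary, gender, manager in rows:
        if gender == "woman":  # same filter as the WHERE clause in the query
            by_manager[manager].append(salary)
    return {manager: median(salaries) for manager, salaries in by_manager.items()}


result = median_of_women_salaries(rows)
print(result)
```

The dictionary has one entry per manager, which corresponds to one row per manager in the table the question asks for.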

Related

Access SQL Can't Create Two Grouped Averages

Apologies for my simple problem, I am an absolute novice. I have the following code in separate queries.
I am attempting to display 3 columns: the average male salary for a set job, the average female salary for a set job, and the JobID. Separately these queries work, however I cannot work out how to combine them.
I have tried multiple solutions from this site for example trying to put multiple select statements inside
and also by using a 'union' solution, however I cannot get either to work. The union simply compiles them into a single column and sorts by salary, not JobID.
SELECT Round(Avg(Salary)) AS AverageMaleSalary, JobID
FROM Employee WHERE Gender = "M"
GROUP BY JobID;
SELECT Round(Avg(Salary)) AS AverageFemaleSalary, JobID
FROM Employee WHERE Gender = "F"
GROUP BY JobID;
You could use conditional aggregation:
SELECT JobId,ROUND(AVG(IIF(Gender='F', Salary, NULL))) AS AverageFemaleSalary
,ROUND(AVG(IIF(Gender='M', Salary, NULL))) AS AverageMaleSalary
FROM Employee
GROUP BY JobId;
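The conditional-aggregation idea can be tried outside Access as well. The following is a minimal sketch using Python's built-in sqlite3 module with made-up rows; it swaps IIF for the more portable CASE WHEN ... END, which yields NULL when the condition fails, so AVG skips those rows in the same way.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (JobID INTEGER, Gender TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 100), (1, "M", 200), (1, "F", 300), (2, "F", 400), (2, "F", 500)],
)

# CASE without ELSE returns NULL, and AVG ignores NULLs,
# so each average only sees the rows of one gender.
query = """
SELECT JobID,
       ROUND(AVG(CASE WHEN Gender = 'F' THEN Salary END)) AS AverageFemaleSalary,
       ROUND(AVG(CASE WHEN Gender = 'M' THEN Salary END)) AS AverageMaleSalary
FROM Employee
GROUP BY JobID
ORDER BY JobID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A job with no employees of one gender gets NULL (None in Python) in that column instead of being dropped from the result.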

Using ROUND, AVG and COUNT in the same SQL query

I need to write a query that first counts the people working in each department, then calculates the average number of people working in a department, and finally rounds it to one decimal place. I tried many different variations.
This is what I have so far. It is not the first version I tried, but I always get the same error message (ORA-00979 - not a group by expression).
SELECT department_id,
       ROUND(AVG(c.cnumber),1)
FROM employees c
WHERE c.cnumber =
      (SELECT COUNT(c.employee_id)
       FROM employees c)
GROUP BY department_id;
I really don't know what to do at this point and would appreciate any help.
Try this (Oracle syntax) example from your description:
WITH department_count AS (
  SELECT department_id, COUNT(c.employee_id) AS employee_count
  FROM employees c
  GROUP BY department_id
)
SELECT department_id,
       ROUND(AVG(c.employee_count),1)
FROM department_count c
GROUP BY department_id;
But this query does not make much sense. COUNT returns an integer, and it returns one number per department, so in this case AVG returns the same value as the count.
Maybe you meant to calculate the number of employees and the average salary per department?
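One other possible reading of the question is "the average headcount across all departments". Under that assumption, the outer query drops the GROUP BY so that AVG runs over the per-department counts. The sketch below runs this in SQLite through Python's sqlite3 module with made-up rows; it is an illustration of that interpretation, not necessarily what the asker needs.

```python
import sqlite3

# In-memory database with hypothetical employees in three departments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 30), (7, 30)],
)

# Step 1 (CTE): one headcount per department.
# Step 2 (outer query): average those headcounts, rounded to one decimal.
query = """
WITH department_count AS (
    SELECT department_id, COUNT(employee_id) AS employee_count
    FROM employees
    GROUP BY department_id
)
SELECT ROUND(AVG(employee_count), 1) AS avg_headcount
FROM department_count
"""
avg_headcount = conn.execute(query).fetchone()[0]
print(avg_headcount)
```

With headcounts of 3, 2 and 2, the average is 7/3, which rounds to 2.3.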

SQL query from Lynda.com

I have a question about two queries. Will these two queries give the same result? I am trying to find the average salary by department:
Select s1.department, avg(s1.salary)
From
(Select department, salary
From staff
Where salary > 100000) s1
Group by s1.department
vs
select department, avg(salary) as avg_salary
from staff
where salary > 100000
group by department
Yes, both queries return the same amounts.
The top query gets its data from a subselect, which in turn gets its data from the table, whereas the bottom query gets it straight from the table itself.
There are no additional filters in the outer query, so the result will be the same.
You can test it out, however; don't take my word for it.
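One way to test it is to run both forms against the same data and compare the results. Here is a minimal sketch using Python's sqlite3 module; the staff rows are made up, and an ORDER BY is added to both queries only so the comparison does not depend on row order.

```python
import sqlite3

# In-memory database with hypothetical staff rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [
        ("eng", 120000), ("eng", 150000), ("eng", 90000),
        ("ops", 110000), ("ops", 80000),
        ("hr", 70000),
    ],
)

# Form 1: filter inside a subquery, then aggregate.
nested = conn.execute("""
    SELECT s1.department, AVG(s1.salary)
    FROM (SELECT department, salary FROM staff WHERE salary > 100000) s1
    GROUP BY s1.department
    ORDER BY s1.department
""").fetchall()

# Form 2: filter and aggregate directly on the table.
direct = conn.execute("""
    SELECT department, AVG(salary) AS avg_salary
    FROM staff
    WHERE salary > 100000
    GROUP BY department
    ORDER BY department
""").fetchall()

print(nested == direct, direct)
```

Note that both forms average only the salaries above 100000, so a department with no such salary does not appear at all.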

Using aggregate function for separate rows

There is a table named jobs where a salary is given for each department number. The same department can have more than one job, so the salary may vary. Now I want to solve this query:
"Find the average salaries for each department without displaying the respective department numbers."
Here I should use AVG, but how do I use it so that I get a separate result for each department number?
I think what you're looking for is the GROUP BY clause. If I understand your question correctly then something like this should do the trick.
SELECT AVG(salary) FROM table_name GROUP BY department
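The point that a column can appear in GROUP BY without appearing in the SELECT list can be checked quickly. The sketch below uses Python's sqlite3 module with a made-up jobs table; the ORDER BY is an addition here so that the output order is predictable.

```python
import sqlite3

# In-memory database with hypothetical jobs: two in department 10, one in 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (department INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [(10, 1000), (10, 3000), (20, 5000)],
)

# department drives the grouping but is not selected,
# so the result has one unlabeled average per department.
averages = conn.execute(
    "SELECT AVG(salary) FROM jobs GROUP BY department ORDER BY department"
).fetchall()
print(averages)
```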

How to calculate Standard Deviation in Oracle SQL Developer?

I have a table employees,
CREATE TABLE employees (
employeeid NUMERIC(9) NOT NULL,
firstname VARCHAR(10),
lastname VARCHAR(20),
deptcode CHAR(5),
salary NUMERIC(9, 2),
PRIMARY KEY(employeeid)
);
and I want to calculate Standard Deviation for salary.
This is the code I am using:
select avg(salary) as mean
,sqrt(sum((salary-avg(salary))*(salary-avg(salary)))/count(employeeid)) as SD
from employees
group by employeeid;
I am getting this error:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 260 Column: 12
Line 260 Column 12 is avg(salary)
How can I sort this out?
Oracle has a built-in function to calculate standard deviation: STDDEV.
The usage is as you'd expect for any aggregate function.
select stddev(salary)
from employees;
I'd just use the stddev function
SELECT avg(salary) as mean,
stddev(salary) as sd
FROM employees
It doesn't make sense to group by employeeid since that is, presumably, unique. It doesn't make sense to talk about the average salary by employee; you want the average salary across all employees (or all departments, or some other aggregatable unit).
The salary-avg(salary) can't be evaluated; avg(salary) is not available during execution of the query but only after all records are retrieved.
I would suggest adding the AVG calculation in a subquery and JOINing it to the main one (employeeid has to be qualified because both sides of the join expose it):
select avg(salary) as mean,
sqrt(sum((salary-avg_res.avg)*(salary-avg_res.avg))/count(employees.employeeid)) as SD
from employees JOIN
(select employeeid,avg(salary) as avg
from employees
group by employeeid) avg_res ON employees.employeeid=avg_res.employeeid
group by employees.employeeid;
I thought you had to include the column in the GROUP BY in the SELECT:
select employeeid
,avg(salary) as mean
,sqrt(sum((salary-avg(salary))*(salary-avg(salary)))/count(employeeid)) as SD
from employees
group by employeeid;
But on further reflection the query doesn't make much sense unless it's historical data. An employee id ought to be unique to a single employee. Unless this is an average over time there should be only one salary per employee. Your mean will be the salary and the standard deviation will be zero.
A better query might be average of all salaries. In that case, remove the GROUP BY.
One more nitpick: the formula you're using is more properly called the population standard deviation. The sample standard deviation divides by (n-1).
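The difference between the two divisors is easy to see on a small example. The sketch below uses made-up values and Python's statistics module, computing both versions by hand and comparing them with the library functions. In Oracle, the corresponding aggregates are STDDEV_POP and STDDEV_SAMP, and plain STDDEV returns the sample version.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Made-up salary values chosen so the arithmetic is easy to follow.
salaries = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(salaries)
mu = mean(salaries)
# Sum of squared deviations from the mean, shared by both formulas.
sum_sq = sum((s - mu) ** 2 for s in salaries)

population_sd = sqrt(sum_sq / n)    # divides by n, as in the question's formula
sample_sd = sqrt(sum_sq / (n - 1))  # divides by n - 1

print(population_sd, pstdev(salaries))
print(sample_sd, stdev(salaries))
```

The sample value is always a little larger than the population value for the same data, and the gap shrinks as n grows.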