SQL for calculating salary contribution by each department

I am writing a simple query in an Oracle database that finds the salary contribution of each department.
Here are my tables:
CREATE TABLE employee (
empid NUMBER,
fname VARCHAR2(20),
deptid NUMBER,
salary NUMBER
);
CREATE TABLE department (
deptid NUMBER,
deptname VARCHAR2(20)
);
Inserting data into these tables:
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO department VALUES (2, 'Accounting');
INSERT INTO employee VALUES (1,' John', 1,100);
INSERT INTO employee VALUES (2,' Lisa', 2,200);
INSERT INTO employee VALUES (3,' Jerry', 1,300);
INSERT INTO employee VALUES (4,' Sara', 1,400);
Now, to find the salary contribution of each department as a percentage, I am using the query below:
select dept.deptname,
sum(emp.salary) / (select sum(emp.salary) from employee emp) * 100 as percentage
from employee emp, department dept
where dept.deptid = emp.deptid
group by dept.deptname;
Is this an efficient way of calculating my output, or is there an alternative?

Please try:
select distinct a.*,
(sum(Salary) over(partition by a.DeptID))/(sum(Salary) over())*100 "Percent"
from department a join employee b on a.deptid=b.deptid

You don't need a subquery for this. You can use analytic functions:
select dept.deptname,
100*sum(emp.salary)/(sum(sum(emp.salary)) over ()) as percentage
from employee emp join
department dept
on dept.deptid = emp.deptid
group by dept.deptname;
I also changed the join syntax to use ANSI standard joins.
EDIT:
There is not a particular "issue" with using subqueries for this. A subquery does work. In general, though, subqueries are harder to optimize than the built-in features in Oracle (and in this case in ANSI SQL). In this simple case, I don't know if there is a performance difference.
As for analytic functions, they are a very powerful component of SQL and you should learn about them.
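As a rough check of the analytic-function query above, here is a minimal sketch that runs it against the question's sample data. It assumes an in-memory SQLite database (via Python's sqlite3, SQLite 3.25 or newer for window functions) as a stand-in for Oracle; the column types and the `100.0` literal (to force non-integer division in SQLite) are adaptations, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptid INTEGER, deptname TEXT);
CREATE TABLE employee (empid INTEGER, fname TEXT, deptid INTEGER, salary INTEGER);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Accounting');
INSERT INTO employee VALUES
    (1, 'John', 1, 100), (2, 'Lisa', 2, 200),
    (3, 'Jerry', 1, 300), (4, 'Sara', 1, 400);
""")

# SUM(emp.salary) is the per-department total produced by GROUP BY;
# SUM(SUM(emp.salary)) OVER () then adds those totals across all groups.
query = """
SELECT dept.deptname,
       100.0 * SUM(emp.salary) / SUM(SUM(emp.salary)) OVER () AS percentage
FROM employee emp
JOIN department dept ON dept.deptid = emp.deptid
GROUP BY dept.deptname
ORDER BY dept.deptname
"""
percentages = dict(conn.execute(query).fetchall())
print(percentages)
```

With the sample rows, Sales totals 800 and Accounting 200 out of 1000, so the two shares should come out as 80 and 20 percent.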

With a WITH clause you can calculate the total salary across all departments once and then reuse it. In your query the scalar subquery may be re-evaluated for each output row, depending on the optimizer, which can cost performance.
with t as
(select sum(salary) as sum_salary from employee)
select dept.deptname, sum(emp.salary) / sum_salary * 100 as percentage
from employee emp, department dept, t
where dept.deptid = emp.deptid
group by dept.deptname, sum_salary;

Related

Find average salary of employees for each department and order employees within a department by age

CREATE TABLE EMPLOYEE (
NAME VARCHAR(500) UNIQUE,
AGE INT,
DEPT VARCHAR(500),
SALARY INT
);
INSERT INTO EMPLOYEE VALUES('RAMESH',20,'FINANCE',50000);
INSERT INTO EMPLOYEE VALUES('DEEP',25,'SALES',30000);
INSERT INTO EMPLOYEE VALUES('SURESH',22,'FINANCE',50000);
INSERT INTO EMPLOYEE VALUES('RAM',28,'FINANCE',20000);
INSERT INTO EMPLOYEE VALUES('PRADEEP',22,'SALES',20000);
Could someone explain the error in this query?
SELECT NAME, AGE, DEPT, AVG(SALARY)
FROM EMPLOYEE
GROUP BY DEPT
ORDER BY AGE
Selecting NAME, AGE, etc. gives the error: column "employee.name" must appear in the GROUP BY clause or be used in an aggregate function.
Why is there an error? Please consider both the logic and the syntax in the explanation.
Can anybody provide the answer using a subquery?

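For the error itself, see: must appear in the GROUP BY clause or be used in an aggregate function
In PostgreSQL:
SELECT NAME, AGE, DEPT, AVG(SALARY)
FROM EMPLOYEE
GROUP BY NAME, DEPT, AGE
ORDER BY AGE
Because NAME is unique, each group holds a single row, so AVG(SALARY) here is just that employee's own salary rather than the department average.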

When you use GROUP BY, the SELECT list should in general contain only the columns named in GROUP BY, plus aggregate functions over the other columns. So you can use a query like this:
select DEPT, AVG(SALARY) FROM EMPLOYEE GROUP BY DEPT
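Since the question also asks for a subquery version, here is one possible sketch: a correlated scalar subquery computes the department average for each employee row, so NAME and AGE can stay in the SELECT list without a GROUP BY. It is run on an in-memory SQLite database through Python's sqlite3 as an assumption for the sake of a runnable example; the alias `dept_avg` is made up for illustration.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT UNIQUE, age INTEGER, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('RAMESH', 20, 'FINANCE', 50000),
    ('DEEP', 25, 'SALES', 30000),
    ('SURESH', 22, 'FINANCE', 50000),
    ('RAM', 28, 'FINANCE', 20000),
    ('PRADEEP', 22, 'SALES', 20000);
""")

# The inner query is evaluated per outer row and averages that row's department.
# No GROUP BY is needed in the outer query, so every employee row is kept.
query = """
SELECT e.name, e.age, e.dept,
       (SELECT AVG(e2.salary) FROM employee e2 WHERE e2.dept = e.dept) AS dept_avg
FROM employee e
ORDER BY e.dept, e.age
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each FINANCE row should carry an average of 40000 and each SALES row 25000, with employees listed by age inside their department.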

insert into a table with select

I have two tables with these schemas:
student(id,name,dept_name, tot_cred)
instructor(id,name,dept_name,salary)
and the question is:
Insert every student whose tot_cred attribute is greater than 100 as
an instructor in the same department, with a salary of $10,000.
I tried this query, but I don't know how to set the salary to $10,000 for each student:
insert into instructor (id,name,dept_name,salary)
select id,name,dept_name
from student
where tot_cred > 100

Just select the constant value. Personally, I'd add an alias in the select to document that the constant is intended to be the salary although it's not necessary. Different databases may have different rules for how to create that alias and you haven't told us what database you're using so I'm guessing on the syntax.
insert into instructor (id,name,dept_name,salary)
select id,name,dept_name,10000 as salary
from student
where tot_cred > 100

INSERT INTO instructor
(id,
NAME,
dept_name,
salary)
SELECT id,
NAME,
dept_name,
10000 AS salary
FROM student
WHERE tot_cred > 100;
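To see the constant-in-SELECT idea end to end, here is a small sketch on an in-memory SQLite database (an assumption, since the question does not name a database). The three student rows are invented for illustration; one of them sits exactly at 100 credits to show that a strict "greater than 100" filter leaves it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER, name TEXT, dept_name TEXT, tot_cred INTEGER);
CREATE TABLE instructor (id INTEGER, name TEXT, dept_name TEXT, salary INTEGER);
-- Hypothetical sample students, not from the original question.
INSERT INTO student VALUES
    (1, 'Ann', 'Physics', 120),
    (2, 'Bob', 'History', 100),
    (3, 'Cy', 'Physics', 80);
""")

# The literal 10000 is selected alongside the student columns, so every
# inserted row gets the same salary.
conn.execute("""
INSERT INTO instructor (id, name, dept_name, salary)
SELECT id, name, dept_name, 10000 AS salary
FROM student
WHERE tot_cred > 100
""")

instructors = conn.execute(
    "SELECT id, name, dept_name, salary FROM instructor"
).fetchall()
print(instructors)
```

Only the student with 120 credits should end up in instructor, in the same department and with a salary of 10000.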

Max Salary with a single GroupBy without Joins

Schema for EMPLOYEE
(ID, EMPLOYEENAME, SALARY, ORGANIZATIONID)
Query to Solve: Find employee Names in each organization with Maximum Salary without a Join.
SELECT E.*
FROM EMPLOYEE E,
(SELECT EMP.ORGANIZATIONID, MAX(EMP.SALARY) AS SALARY
FROM EMPLOYEE EMP
GROUP BY EMP.ORGANIZATIONID) MAXSALARY
WHERE MAXSALARY.SALARY = E.SALARY
AND E.ORGANIZATIONID = MAXSALARY.ORGANIZATIONID;
Is there a way to avoid the join? I am using Spark SQL API and joins cause an extra shuffle operation which is expensive. Is there a way to get the employee name while getting the max salary?
Assume there is a single employee in each organization with the maximum salary.

You can use PARTITION BY with Spark SQL as shown below (although it still requires a subquery):
SELECT E.*
FROM
(SELECT EMP.EMPLOYEENAME, EMP.ORGANIZATIONID, EMP.SALARY,
row_number() OVER (PARTITION BY ORGANIZATIONID ORDER BY SALARY DESC) as rank
FROM EMPLOYEE EMP
) AS E
WHERE E.rank=1

Try this:
SELECT P.ORGANIZATIONID, P.EMPLOYEENAME
FROM EMPLOYEE P
WHERE P.SALARY = (SELECT MAX(E.SALARY) FROM EMPLOYEE E WHERE P.ORGANIZATIONID = E.ORGANIZATIONID)
GROUP BY P.ORGANIZATIONID, P.EMPLOYEENAME

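Try this:
SELECT EMPLOYEENAME FROM EMPLOYEE
WHERE SALARY IN (SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY ORGANIZATIONID)
Note that the subquery is not correlated with the outer row's organization, so this also returns any employee whose salary happens to equal the maximum of a different organization.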
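The answers above can be compared on a tiny data set. The sketch below uses an in-memory SQLite database through Python's sqlite3 as a stand-in for Spark SQL (an assumption; shuffle behaviour is not modelled here, only the result sets). The four employee rows and the alias `rn` are hypothetical and chosen so that one employee's salary equals the maximum of another organization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, employeename TEXT, salary INTEGER, organizationid INTEGER);
-- Hypothetical rows: Cal (org 20) earns the same as Ben, the top earner of org 10.
INSERT INTO employee VALUES
    (1, 'Ava', 500, 10), (2, 'Ben', 700, 10),
    (3, 'Cal', 700, 20), (4, 'Dee', 900, 20);
""")

# Window-function form: number rows per organization by descending salary,
# then keep the first row of each partition.
window_query = """
SELECT employeename, organizationid, salary
FROM (
    SELECT employeename, organizationid, salary,
           ROW_NUMBER() OVER (PARTITION BY organizationid ORDER BY salary DESC) AS rn
    FROM employee
) ranked
WHERE rn = 1
ORDER BY organizationid
"""
top_paid = conn.execute(window_query).fetchall()

# Uncorrelated IN form: compares each salary with the maxima of all organizations.
in_query = """
SELECT employeename FROM employee
WHERE salary IN (SELECT MAX(salary) FROM employee GROUP BY organizationid)
ORDER BY employeename
"""
in_names = [name for (name,) in conn.execute(in_query)]

print(top_paid)
print(in_names)
```

On this data the window form returns one row per organization (Ben and Dee), while the uncorrelated IN form additionally returns Cal, whose 700 matches the maximum of organization 10 rather than of his own organization.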

Employee that has a higher salary than the AVERAGE of his department - optimized

We have only a table named EMPLOYEESALARY in our database with the 3 following columns:
Employee_ID, Employee_Salary, Department_ID
Now I have to SELECT every employee that has a higher salary than the AVERAGE of his department. How do I do that?
I know this is a repeat question, but the best solution I found everywhere was:
SELECT * from employee join (SELECT AVG(employee_salary) as sal, department_ID
FROM employee GROUP BY Department_ID) as t1
ON employee.department_ID = t1.department_ID
where employee.employee_salary > t1.sal
Can we optimize it further and do it without a subquery?
Reference:
SELECT every employee that has a higher salary than the AVERAGE of his department
Employees with higher salary than their department average?
Find Schema here, to test: SQL Fiddle

Can we do it without a subquery?
Not that I can think of. Had the condition been >= then the following would have worked
SELECT TOP 1 WITH TIES *
FROM employee
ORDER BY CASE
WHEN employee_salary >= AVG(employee_salary)
OVER (
PARTITION BY Department_ID) THEN 0
ELSE 1
END
But this is not an optimisation, and it won't work correctly for the > condition if no employee has a salary greater than the average (i.e. all employees in a department have the same salary).
Can we optimize it further?
You could shorten the syntax a bit with
WITH T AS
(
SELECT *,
AVG(employee_salary) OVER (PARTITION BY Department_ID) AS sal
FROM employee
)
SELECT *
FROM T
WHERE employee_salary > sal
but it still has to do much the same work.
Assuming suitable indexes on the base table already exist, the only way of avoiding some more of that work at SELECT time would be to pre-calculate the grouped SUM and COUNT_BIG in an indexed view grouped by Department_ID (to allow the average to be cheaply derived).

A more optimal form is likely to be:
select e.*
from (select e.*, avg(employee_salary) over (partition by department_id) as avgs
from employee e
) e
where employee_salary > avgs;
This (as well as other versions) can use an index on employee(department_id, employee_salary). The final where probably should not use an index, because it is selecting lots of rows.
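Here is a minimal sketch of the windowed form on invented data, run on an in-memory SQLite database via Python's sqlite3 (an assumption; the answers above target other engines, and nothing here measures performance or index use). Department 2 is deliberately given two equal salaries to show that nobody qualifies when everyone sits exactly at the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER, employee_salary INTEGER, department_id INTEGER);
-- Hypothetical rows: department 1 averages 200, department 2 averages 500.
INSERT INTO employee VALUES
    (1, 100, 1), (2, 200, 1), (3, 300, 1),
    (4, 500, 2), (5, 500, 2);
""")

# The inner query attaches each department's average to every one of its rows;
# the outer query keeps rows strictly above that average.
query = """
SELECT employee_id, employee_salary, department_id
FROM (
    SELECT e.*, AVG(employee_salary) OVER (PARTITION BY department_id) AS avgs
    FROM employee e
) t
WHERE employee_salary > avgs
ORDER BY employee_id
"""
above_avg = conn.execute(query).fetchall()
print(above_avg)
```

Only employee 3 (salary 300 against a department average of 200) should be returned.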

ID value is lost after performing MIN then GROUP BY. Why?

To simplify, if I had a table called 'employee':
id INT NOT NULL PRIMARY KEY,
salary FLOAT,
department VARCHAR(255)
I want to perform a query that retrieves the minimum salary in each department.
So my query is something like this:
SELECT employee.ID, MIN(employee.salary), employee.department
FROM employee
GROUP BY employee.department
But regardless of which records are found, the ID values in the result set are renumbered 1, 2, 3, ... up to however many records (departments) exist in the result set.
How can I keep the actual IDs of the employees after performing the aggregate function and GROUP BY?

You can't. Think about it: if a department has 20 employees, and three of them share the same minimum salary, which employee ID do you want the query output to display? If it were guaranteed that only one employee in each department had that lowest salary, it could be done by selecting the employee records whose salary is the minimum for their department:
SELECT ID
FROM employee e
WHERE salary =
(SELECT MIN(salary) FROM employee
WHERE department = e.department)
but this will return multiple records per department when more than one employee has that minimum salary.

I would guess you're using MySQL or SQLite, because your query is ambiguous and isn't allowed by standard SQL or most brands of RDBMS. MySQL and SQLite are more permissive, so it's your responsibility to resolve the ambiguity.
Here's my usual fix:
SELECT e1.ID, e1.salary, e1.department
FROM employee e1
LEFT OUTER JOIN employee e2 ON (e1.department = e2.department
AND e1.salary > e2.salary)
WHERE e2.department IS NULL;
Here's another solution that gives the same result:
SELECT e1.ID, e1.salary, e1.department
FROM employee e1
JOIN (SELECT e2.department, MIN(e2.salary) AS min_salary
FROM employee e2 GROUP BY e2.department) d
ON (e1.department = d.department AND e1.salary = d.min_salary);
Both of these give multiple rows per department if there are multiple employees in the department with identical minimal salaries. You need to decide how to resolve that case, because it's not clear from your problem description.

Your script is invalid:
SELECT employee.ID, MIN(employee.salary), employee.department
FROM employee
GROUP BY employee.department
Instead, look at this:
SELECT MIN(employee.salary), employee.department
FROM employee
GROUP BY employee.department
If you need the employee id as well, then you need to use a subquery.

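This runs in MySQL when ONLY_FULL_GROUP_BY is disabled, but the ID is then taken from an arbitrary row in each group, not necessarily the row with the minimum salary:
SELECT employee.department, MIN(employee.salary), employee.ID
FROM employee
GROUP BY 1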

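In modern SQL Server releases (and other reasonably powerful and modern SQL engines), SQL window functions are probably the best alternative (preferable to subqueries that re-read the table and to self-joins) for what you want. A window function cannot appear directly in a WHERE clause, so compute it in a derived table and filter outside:
SELECT ID, salary, department
FROM (SELECT ID, salary, department,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary ASC) AS rn
FROM employee) ranked
WHERE rn = 1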
This works when, if multiple employees have the same (department-minimal) salary, you want just a "random-ish" one of them (you can add criteria to the ORDER BY if you want one picked by some specific criteria); look into RANK, instead of ROW_NUMBER, if you want all.
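The difference between ROW_NUMBER and RANK for tied minimum salaries can be sketched as follows. This runs on an in-memory SQLite database via Python's sqlite3 (an assumption, used only to make the example executable); the rows, the helper `lowest_paid`, and the alias `rn` are hypothetical. The window function is computed in a derived table and filtered in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL, department TEXT);
-- Hypothetical rows: employees 12 and 13 tie for the lowest salary in HR.
INSERT INTO employee VALUES
    (11, 3000, 'HR'), (12, 2500, 'HR'), (13, 2500, 'HR'),
    (21, 4000, 'IT'), (22, 4500, 'IT');
""")

def lowest_paid(ranking_function, order_by):
    """Return the rows ranked first per department.

    ranking_function and order_by are fixed strings chosen below, not user
    input, so formatting them into the SQL text is acceptable for this sketch.
    """
    query = f"""
    SELECT id, salary, department
    FROM (
        SELECT id, salary, department,
               {ranking_function}() OVER (PARTITION BY department ORDER BY {order_by}) AS rn
        FROM employee
    ) ranked
    WHERE rn = 1
    ORDER BY id
    """
    return conn.execute(query).fetchall()

# ROW_NUMBER with an id tie-breaker: exactly one row per department.
one_per_dept = lowest_paid("ROW_NUMBER", "salary ASC, id ASC")
# RANK ordered by salary only: every employee tied for the minimum.
all_ties = lowest_paid("RANK", "salary ASC")

print(one_per_dept)
print(all_ties)
```

Adding `id` to the ORDER BY is one way to make the ROW_NUMBER pick deterministic; without a tie-breaker, which of the tied employees is returned is up to the engine.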
