SQL: group by table

Suppose we use PostgreSQL and have two tables, department and employee, where each employee belongs to a department through a foreign key.
We now want an aggregate select that returns all the columns from department plus some aggregate values from employee:
SELECT d.id, d.name, d.budget, count(*), avg(e.salary), max(e.age), sum(e.children)
FROM department d LEFT JOIN employee e ON e.dept = d.id
GROUP BY d.id, d.name, d.budget
I don't like having to list every column from department in the GROUP BY. Is there a way to "group by the whole table"?
And a more philosophical question: suppose I use GROUP BY d.id only. Assuming d.id is the primary key of department, why do I need to group by all the other columns as well?

If employee is pre-aggregated, there is no need to list the department columns in a GROUP BY:
select *
from
    department d
    left join (
        select
            dept as id,
            count(*) as count_employee,
            avg(salary) as avg_salary,
            max(age) as max_age,
            sum(children) as sum_children
        from employee
        group by dept
    ) e using (id)
The USING clause avoids duplicating the join column in the output.
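The pre-aggregation approach can be tried end to end with a small sketch. This one is an illustration only: it assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, and the sample rows are made up. Table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT, budget INTEGER);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        dept INTEGER REFERENCES department(id),
        salary REAL, age INTEGER, children INTEGER
    );
    INSERT INTO department VALUES (1, 'Sales', 1000), (2, 'Research', 2000);
    INSERT INTO employee VALUES (1, 1, 100.0, 30, 1), (2, 1, 300.0, 50, 2);
""")

# Aggregate employee once per department, then join the result.
# No department column has to appear in a GROUP BY.
rows = conn.execute("""
    select *
    from department d
    left join (
        select
            dept as id,
            count(*) as count_employee,
            avg(salary) as avg_salary,
            max(age) as max_age,
            sum(children) as sum_children
        from employee
        group by dept
    ) e using (id)
    order by d.id
""").fetchall()

for row in rows:
    print(row)
```

Note that a department without employees gets NULL (None in Python) in every aggregate column, including count_employee; wrap it in COALESCE(..., 0) if a zero is wanted. As for the second question: PostgreSQL 9.1 and later accept GROUP BY d.id alone when d.id is the primary key, because the other department columns are functionally dependent on it. The sketch above does not demonstrate that, since SQLite is permissive about ungrouped columns anyway.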

Related

The best way to find the Department with the maximum total salary Postgresql

Let's say we have two standard tables, employees and departments:
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
department_id INTEGER,
name VARCHAR,
salary NUMERIC(13,2)
);
What is the best way to find the name of the department with the maximum total employee salary?
I've found two solutions, and both look too complicated for such a simple task.
Using rank():
SELECT name FROM (
SELECT name, rank() OVER ( ORDER BY salary DESC ) AS rank
FROM (
SELECT
departments.name,
sum(salary) AS salary
FROM employees
JOIN departments ON department_id = departments.id
GROUP BY departments.name
) AS t1
) AS t2
WHERE rank = 1;
Using a subquery:
WITH t1 AS (SELECT
departments.name,
sum(salary) AS salary
FROM employees
JOIN departments ON departments.id = employees.department_id
GROUP BY departments.name
)
SELECT name FROM t1
WHERE t1.salary = (SELECT max(salary) FROM t1);
At first glance, rank() should be less efficient because it performs an unnecessary sort, yet EXPLAIN shows that the first option is more efficient.
Maybe someone can suggest another solution.
So, what is the best way to find the department with the maximum total salary in Postgres?
I would write the rank() as:
SELECT *
FROM (SELECT d.name, SUM(e.salary) AS salary,
             RANK() OVER (ORDER BY SUM(e.salary) DESC) AS rnk
      FROM employees e JOIN
           departments d
           ON e.department_id = d.id
      GROUP BY d.name
     ) d
WHERE rnk = 1;
(The additional subquery should not affect performance, but it adds nothing to clarify the query either.)
Because window functions are built into the database, the database has methods for making them more efficient. There is also overhead for computing the MAX(). To be honest, though, I would expect both methods to have similar performance.
I should note that if you want only one department returned -- even when there are ties -- then the simplest method is:
SELECT d.name, SUM(e.salary) AS salary
FROM employees e JOIN
     departments d
     ON e.department_id = d.id
GROUP BY d.name
ORDER BY SUM(e.salary) DESC
FETCH FIRST 1 ROW ONLY
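The difference in how ties are handled can be seen in a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, uses made-up rows in which two departments tie, and writes LIMIT 1 where PostgreSQL would also accept FETCH FIRST 1 ROW ONLY.

```python
import sqlite3

# Hypothetical data: Sales and Research both total 500.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, department_id INTEGER,
        name TEXT, salary NUMERIC
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Research'), (3, 'Support');
    INSERT INTO employees VALUES
        (1, 1, 'Ann', 300), (2, 1, 'Bob', 200),
        (3, 2, 'Cid', 500),
        (4, 3, 'Dee', 100);
""")

# Total salary per department, shared by both variants below.
totals = """
    SELECT d.name AS name, SUM(e.salary) AS salary
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    GROUP BY d.name
"""

# Compare against the maximum: every tied department is returned.
with_ties = conn.execute(f"""
    WITH t1 AS ({totals})
    SELECT name FROM t1
    WHERE salary = (SELECT max(salary) FROM t1)
    ORDER BY name
""").fetchall()

# Sort and keep one row: exactly one department, picked arbitrarily among ties.
single = conn.execute(f"""
    SELECT name FROM ({totals}) AS t
    ORDER BY salary DESC
    LIMIT 1
""").fetchall()

print(with_ties)
print(single)
```

Which variant is "best" therefore depends on whether tied departments should all be reported; adding a second sort key (for example the name) would make the single-row result deterministic.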

SQL Set Operators - Selecting rows from tables with different columns

I'm using Oracle 10g, and I'm trying to select rows from one table that do not appear in the other table in the query using a set operator.
I want to select the id, last_name and first_name columns of those rows in the employees table whose id does not appear in the job_history table.
The only common column in these 2 tables is the id column. But I want to display the names as well.
I have tried:
SELECT
id, last_name, first_name
FROM
employees
MINUS
SELECT
id, TO_CHAR(null), TO_CHAR(null)
FROM
job_history;
This doesn't produce the desired result.
However, if I didn't want to display the names from the employees table, I could use:
SELECT id FROM employees
MINUS
SELECT id FROM job_history;
This gives me half of the result, but I also want the names from the employees table.
Any advice?
Why not just use NOT IN:
SELECT id, last_name, first_name FROM employees
WHERE ID NOT IN (SELECT id FROM job_history);
You can also try a LEFT JOIN and keep only the rows without a match:
SELECT e.id, e.last_name, e.first_name
FROM employees e LEFT JOIN job_history jh
ON e.id = jh.id
WHERE jh.id IS NULL;
You can use an inner join on the result of the set operation:
select a.id, a.last_name, a.first_name
from employees a
inner join (
SELECT id FROM employees
MINUS
SELECT id FROM job_history ) x on x.id = a.id
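One caveat with the NOT IN suggestion is worth checking before relying on it: if the subquery returns a NULL, NOT IN yields no rows at all, while NOT EXISTS is unaffected. The sketch below is an illustration under assumptions: it uses SQLite (through Python's sqlite3 module) rather than Oracle, the job column and the sample rows are made up, and job_history.id is assumed to be nullable.

```python
import sqlite3

# Hypothetical data; one job_history row has a NULL id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
    CREATE TABLE job_history (id INTEGER, job TEXT);
    INSERT INTO employees VALUES
        (1, 'King', 'Steven'), (2, 'Chen', 'John'), (3, 'Abel', 'Ellen');
    INSERT INTO job_history VALUES (2, 'clerk'), (NULL, 'unknown');
""")

# NOT IN: the NULL in the subquery makes the predicate unknown for every
# employee that has no match, so nothing is returned.
not_in = conn.execute("""
    SELECT id, last_name, first_name FROM employees
    WHERE id NOT IN (SELECT id FROM job_history)
""").fetchall()

# NOT EXISTS: a correlated anti-join that ignores the NULL row.
not_exists = conn.execute("""
    SELECT e.id, e.last_name, e.first_name
    FROM employees e
    WHERE NOT EXISTS (SELECT 1 FROM job_history jh WHERE jh.id = e.id)
    ORDER BY e.id
""").fetchall()

print(not_in)
print(not_exists)
```

If job_history.id is declared NOT NULL, both forms return the same rows and the choice is mostly a matter of taste.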

Get MAX element based on two different tables

I have a problem with an SQL query on an Oracle DB. I have the following tables:
DEPARTMENT(`ID` NUMBER(11), `NAME` VARCHAR(25))
EMPLOYEE(`ID` INT(11), `LASTNAME` VARCHAR(25), `DEP_ID` INT(11));
SALARIES(`ID` INT(11), `EMPLOYEE_ID` INT(11), `SALARY` INT(11));
Now I want to get the name of the department with the highest average sum of salaries. DEPARTMENT isn't directly related to SALARIES, so I probably need to use the EMPLOYEE table as well.
I've created a query:
SELECT NAME, (SELECT SUM(SALARIES.SALARY) FROM SALARIES JOIN EMPLOYEE ON EMPLOYEE.EMPLOYEE_ID = EMPLOYEE.ID WHERE EMPLOYEE.DEP_ID = DEPARTMENT.ID GROUP BY EMPLOYEE.ID) AS AVG_OF_SUM FROM DEPARTMENT;
It returns a list of department names and the average sum. But now I need to get only the one department name with the highest average.
Is my query actually OK, or can it be improved? And how can I get only one record?
Thanks for any help.
Regards,
D
Make use of the analytic function SUM ... OVER.
In the subquery, apply the analytic function, and then select only the rows you want.
For example,
SELECT DISTINCT DEPT, SUM(SAL) OVER (PARTITION BY DEPT ORDER BY DEPT) SUM_SAL
FROM EMPLOYEE
ORDER BY DEPT;
SELECT NAME,
       (SELECT MAX(SUM(SALARIES.SALARY))
        FROM SALARIES
        JOIN EMPLOYEE ON SALARIES.EMPLOYEE_ID = EMPLOYEE.ID
        WHERE EMPLOYEE.DEP_ID = DEPARTMENT.ID
        GROUP BY EMPLOYEE.ID) AS AVG_OF_SUM
FROM DEPARTMENT;
SELECT NAME, avg_sal FROM
(SELECT d.NAME, avg(s.SALARY) avg_sal
FROM SALARIES s
JOIN EMPLOYEE e ON s.EMPLOYEE_ID = e.ID
JOIN DEPARTMENT d ON e.DEP_ID = d.ID
GROUP BY d.NAME
ORDER BY 2 DESC)
WHERE rownum = 1;
(This query returns the department with the highest average salary. If you need the sum instead, replace AVG with SUM.)

Can I use more than one column in subquery?

I want to show the names of all employees from the EMPLOYEES table who are working on more than three projects from the PROJECTS table.
PROJECTS.PersonID is a foreign key referencing EMPLOYEES.ID:
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID, COUNT(*)
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
Can I have both PersonID and COUNT(*) in that subquery, or must there be only one column?
Not in an IN clause, or at least not the way you are trying to use it. (Some RDBMSs allow tuples with more than one column in the IN clause, but that wouldn't help your case here.)
You just need to remove the COUNT(*) from the SELECT list to achieve your desired result.
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
If you wanted to also return the count you could join onto a derived table or common table expression with more than one column though.
SELECT E.NAME,
E.ID,
P.Cnt
FROM EMPLOYEES E
JOIN (SELECT PersonID,
Count(*) AS Cnt
FROM PROJECTS
GROUP BY PersonID
HAVING Count(*) > 3) P
ON E.ID = P.PersonID
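The three variants discussed here can be compared in a small sketch. It is an illustration under assumptions: SQLite (through Python's sqlite3 module) is used because the question does not name an RDBMS, the ProjectID column and the sample rows are made up, and other systems may word the error differently.

```python
import sqlite3

# Hypothetical data: Ann works on 4 projects, Bob on 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE PROJECTS (
        ProjectID INTEGER PRIMARY KEY,  -- assumed column, not in the question
        PersonID INTEGER REFERENCES EMPLOYEES(ID)
    );
    INSERT INTO EMPLOYEES VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PROJECTS (PersonID) VALUES (1), (1), (1), (1), (2), (2);
""")

# Two columns on the right of a scalar IN: SQLite refuses to prepare this.
try:
    conn.execute("""
        SELECT NAME, ID FROM EMPLOYEES
        WHERE ID IN (SELECT PersonID, COUNT(*) FROM PROJECTS
                     GROUP BY PersonID HAVING COUNT(*) > 3)
    """)
    two_columns_rejected = False
except sqlite3.Error:
    two_columns_rejected = True

# One column in the subquery; the count is only used in HAVING.
in_rows = conn.execute("""
    SELECT NAME, ID FROM EMPLOYEES
    WHERE ID IN (SELECT PersonID FROM PROJECTS
                 GROUP BY PersonID HAVING COUNT(*) > 3)
""").fetchall()

# Join onto a derived table when the count itself should be returned.
join_rows = conn.execute("""
    SELECT E.NAME, E.ID, P.Cnt
    FROM EMPLOYEES E
    JOIN (SELECT PersonID, COUNT(*) AS Cnt FROM PROJECTS
          GROUP BY PersonID HAVING COUNT(*) > 3) P
      ON E.ID = P.PersonID
""").fetchall()

print(two_columns_rejected, in_rows, join_rows)
```

The IN form is enough when only the employees are needed; the join form costs one derived table but exposes the count as a regular column.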
To answer your question, you can only have 1 column for the IN subquery. You could get your results using the query below:
SELECT e.ID
,e.Name
FROM dbo.Projects p
LEFT OUTER JOIN dbo.Employees e
ON p.PersonID = e.ID
GROUP BY e.ID
,e.Name
HAVING COUNT(*) > 3

SQL COUNT(*) returning the wrong answer

The following script should return the department names and the number of employees in each department. The Marketing, Executive and Sales departments have 0 employees, but the returned value is 1 instead of 0. How can I correct it?
select Department, Departments.DepartmentID, count(*) as 'NumOfEmps'
from Departments
left join Employees
on Employees.DepartmentID = Departments.DepartmentID
group by Departments.DepartmentID,Department
One option is a sub-query that computes the employee counts first and then joins the aggregated results to the department information (name, etc.). COALESCE turns the NULL for departments without employees into 0:
SELECT Department, Departments.DepartmentID, COALESCE(t.NumOfEmps, 0) AS NumOfEmps
FROM Departments
LEFT JOIN (SELECT DepartmentID, count(*) as 'NumOfEmps'
           FROM Employees
           GROUP BY DepartmentID) t
ON t.DepartmentID = Departments.DepartmentID
I'm making some assumptions about your schema since it's not listed. Column names may be off a bit, but this is the general idea. Hope it helps.
Don't use COUNT(*); count the thing you want to count, namely the employees.
COUNT(*) counts whole rows. Because the LEFT JOIN always produces at least one row for each department in Departments, COUNT(*) always returns at least 1, whereas COUNT(e.EmployeeID) skips the NULLs produced for departments without employees.
SELECT d.Department, d.DepartmentID, count(e.EmployeeID)
FROM Departments d
LEFT JOIN employees e
ON d.DepartmentID = e.DepartmentID
GROUP BY
d.Department, d.DepartmentID
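The difference between the two counts shows up directly when both are put side by side. The sketch below is an illustration only: it assumes SQLite (through Python's sqlite3 module) in place of the asker's database, an EmployeeID column as in the answer above, and made-up rows in which Marketing has no employees.

```python
import sqlite3

# Hypothetical data: Marketing is empty, Engineering has two employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, Department TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (1, 'Marketing'), (2, 'Engineering');
    INSERT INTO Employees VALUES (1, 2), (2, 2);
""")

# count(*) counts joined rows, including the all-NULL row that the LEFT JOIN
# produces for an empty department; count(e.EmployeeID) skips NULLs.
rows = conn.execute("""
    SELECT d.Department,
           count(*)            AS CountStar,
           count(e.EmployeeID) AS NumOfEmps
    FROM Departments d
    LEFT JOIN Employees e ON d.DepartmentID = e.DepartmentID
    GROUP BY d.Department, d.DepartmentID
    ORDER BY d.DepartmentID
""").fetchall()

for row in rows:
    print(row)
```

Any column of Employees that can never be NULL in a real row (typically its primary key) works as the argument to COUNT.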