SQL query to find SUM - sql

I'm new to SQL and I've been struggling to write this query. I want to find the SUM of all salaries for employees in a given department, let's say 'M', and a given hire year, let's say '2002'. Any ideas? I'm thinking I have to JOIN the tables somehow, but I'm having trouble. I've set up a schema like this.
jobs table and columns
JOBS
------------
job_id
salary
hire_date
employees table and columns
EMPLOYEES
------------
employee_id
name
job_id
department_id
department table and columns
DEPARTMENTS
------------
department_id
department_name
This is very similar to the way the HR schema does it in Oracle, so I think the schema should be OK; I just need help with the query now.

You need a statement like this:
SELECT e.name,
d.department_name,
SUM(j.salary)
FROM employees e,
departments d,
jobs j
WHERE d.department_name = 'M'
AND TO_CHAR(j.hire_date, 'YYYY') = '2002'
AND d.department_id = e.department_id
AND e.job_id = j.job_id
GROUP BY e.name,
d.department_name;

FWIW, you shouldn't use the old ANSI-89 implicit join notation (using ,). It has been considered outdated since the ANSI-92 standard introduced explicit JOIN syntax (more than 20 years ago!), and some vendors have started dropping support for their legacy outer-join operators (MS SQL Server; I don't know if there is a deprecation warning for this "feature" in Oracle).
So, as a newcomer, you shouldn't learn bad habits from the start.
With the "modern" syntax, your query should be written:
SELECT e.name,
d.department_name,
SUM(j.salary)
FROM employees e
JOIN departments d USING(department_id)
JOIN jobs j USING(job_id)
WHERE d.department_name = 'M'
AND TO_CHAR(j.hire_date, 'YYYY') = '2002'
GROUP BY e.name, d.department_name;
With that syntax, there is a clear distinction between the JOIN condition (USING or ON) and the filter clause (WHERE). This will make things easier later, when you encounter "advanced" joins such as OUTER JOIN.

Yes, you just need a simple inner JOIN between all three tables.
SELECT SUM(j.salary)
FROM JOBS j
JOIN EMPLOYEES e ON j.job_id = e.job_id
JOIN DEPARTMENTS d ON e.department_id = d.department_id
WHERE d.department_name = 'M'
AND EXTRACT(YEAR FROM j.hire_date) = 2002;
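To check the join logic without an Oracle instance, here is a minimal sketch that runs the same three-table join in an in-memory SQLite database via Python's sqlite3 module. The sample rows are made up for illustration, and SQLite's strftime stands in for Oracle's TO_CHAR / EXTRACT, so treat it as an assumption-laden test harness rather than Oracle code.

```python
import sqlite3

# In-memory SQLite stand-in for the three tables described in the question.
# All rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, salary INTEGER, hire_date TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT,
                        job_id INTEGER, department_id INTEGER);
INSERT INTO departments VALUES (1, 'M'), (2, 'N');
INSERT INTO jobs VALUES (10, 1000, '2002-03-01'), (11, 2000, '2002-09-15'),
                        (12, 4000, '2003-01-10'), (13, 8000, '2002-05-05');
INSERT INTO employees VALUES (100, 'Ann', 10, 1), (101, 'Bob', 11, 1),
                             (102, 'Cy', 12, 1), (103, 'Di', 13, 2);
""")

# One total for the department and year: no GROUP BY is needed because
# the WHERE clause already narrows the rows to a single department.
total = conn.execute("""
    SELECT SUM(j.salary)
    FROM employees e
    JOIN departments d ON d.department_id = e.department_id
    JOIN jobs j ON j.job_id = e.job_id
    WHERE d.department_name = ?
      AND strftime('%Y', j.hire_date) = ?
""", ("M", "2002")).fetchone()[0]

print(total)  # 3000 (Ann 1000 + Bob 2000; Cy was hired in 2003, Di is in 'N')
```

Note that selecting e.name and grouping by it would instead return one sum per employee rather than a single total.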

Related

What are Oracle's old-syntax join equivalents of these queries?

What are the equivalents of these queries written in Oracle's old join syntax?
SELECT first_name, last_name, department_name, job_title
FROM employees e RIGHT JOIN departments d
ON(e.department_id = d.department_id)
RIGHT JOIN jobs j USING(job_id);
-->106 rows returned
SELECT first_name, last_name, department_name, job_title
FROM employees e RIGHT JOIN jobs j
ON(e.job_id = j.job_id)
RIGHT JOIN departments d
USING(department_id);
--> 122 rows returned
I would do something like this (for the first query) - making explicit the fact that a multiple join is, by definition, an iteration of joins of two tables (or more generally "rowsets") at a time. Think of it as "using parentheses explicitly".
select first_name, last_name, department_name, job_title
from (
select first_name, last_name, job_id, department_name
from employees e, departments d
where e.department_id (+) = d.department_id
) sq
, jobs j
where sq.job_id (+) = j.job_id
;
This can be rewritten (perhaps) using a single SELECT statement with more WHERE conditions, but the query will be less readable; it won't be quite as clear what it is doing.
Respectively:
SELECT first_name,
last_name,
department_name,
job_title
FROM employees e,
jobs j,
departments d
WHERE e.job_id (+) = j.job_id
AND e.department_id = d.department_id (+);
and:
SELECT first_name,
last_name,
department_name,
job_title
FROM employees e,
departments d,
jobs j
WHERE e.department_id (+) = d.department_id
AND e.job_id = j.job_id (+);
However, please just use the ANSI join syntax. The legacy join syntax is confusing to read, and you will get errors from putting the (+) on the wrong side of the join condition. You should be teaching people the less confusing "new" syntax (it's hard to call it new when it's been around since Oracle 9i in 2001) rather than reverting to old methods.
Just to add to Mathguy's answer, this is interesting because those innocent-looking right joins are not what they seem. My first (incorrect) attempt was this:
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from jobs j
, departments d
, employees e
where e.job_id(+) = j.job_id
and e.department_id(+) = d.department_id;
but as Mathguy points out it gives different results because of the departments with no employees and the cross join between departments and jobs, and a subtle join precedence effect that appears as a result of the right joins not being in one chain.
I'm not sure what the intention of the original query is. Using the Oracle HR demo schema, the results are the same as an inner join, but only because every job has at least one employee. This illustrates a pitfall in testing outer join queries, as you might run a test, get the same results, and think your rewrite was logically the same thing when it is not.
If you rewrite the original right joins as left joins, it would have to become something like this:
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from jobs j
left join (
departments d
left join employees e on e.department_id = d.department_id
)
on e.job_id = j.job_id;
(You could also expand the departments > employees join into an inline view or with clause, or use an outer apply construction to include the job_id join.)
This is because the two right joins in the original query are driven from jobs and departments, so even though the outer join from departments to employees includes the 16 departments with no employees, once we outer join from jobs to that, we implicitly exclude rows with no job_id, because we are driving it from jobs. So the outer join to departments is filtered to become in effect an inner join, and so long as all jobs have corresponding employees then that gives the same results as an inner join too. To see the difference you would have to insert another job, which adds a row in the results with the job title but no employee details.
Therefore the old-style version needs to be either this:
select de.first_name, de.last_name, de.department_name, j.job_title
from jobs j
, lateral (
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from departments d
, employees e
where e.department_id(+) = d.department_id
) de
where de.job_id(+) = j.job_id;
or without lateral:
select first_name, last_name, department_name, job_title
from jobs j
, ( select e.first_name, e.last_name, e.job_id, d.department_name
from departments d, employees e
where e.department_id (+) = d.department_id ) de
where de.job_id(+) = j.job_id
The second query just switches jobs and departments:
select first_name, last_name, department_name, job_title
from departments d
, ( select e.first_name, e.last_name, e.department_id, e.job_id, j.job_title
from jobs j, employees e
where e.job_id(+) = j.job_id ) je
where je.department_id(+) = d.department_id
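The testing pitfall described above can be reproduced on a toy data set. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module rather than Oracle, ANSI LEFT JOIN with an inline view instead of the (+) operator, and invented tables and rows that only mimic the shape of the HR schema.

```python
import sqlite3

# Toy data: every job has an employee, one department has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job_title TEXT);
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT,
                        job_id TEXT, department_id INTEGER);
INSERT INTO jobs VALUES ('J1', 'Clerk'), ('J2', 'Manager');
INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT'), (30, 'Empty');
INSERT INTO employees VALUES (1, 'Ann', 'J1', 10), (2, 'Bob', 'J2', 20);
""")

INNER = """
    SELECT e.first_name, d.department_name, j.job_title
    FROM employees e
    JOIN departments d ON d.department_id = e.department_id
    JOIN jobs j ON j.job_id = e.job_id
"""
# Outer join driven from jobs to an inline view of departments/employees.
OUTER = """
    SELECT de.first_name, de.department_name, j.job_title
    FROM jobs j
    LEFT JOIN (SELECT e.first_name, e.job_id, d.department_name
               FROM departments d
               LEFT JOIN employees e ON e.department_id = d.department_id) de
           ON de.job_id = j.job_id
"""

def rows(sql):
    # Sort by repr so rows containing NULL (None) can still be ordered.
    return sorted(conn.execute(sql).fetchall(), key=repr)

# While every job has an employee, both queries return the same rows.
same_before = rows(INNER) == rows(OUTER)

# Adding a job nobody holds exposes the difference.
conn.execute("INSERT INTO jobs VALUES ('J3', 'Analyst')")
inner_after, outer_after = rows(INNER), rows(OUTER)

print(same_before, len(inner_after), len(outer_after))
```

With this data the two queries agree until the extra job is inserted; afterwards only the outer-join version gains a row carrying the job title and no employee details. The empty department never appears in either result, which matches the explanation that the outer join to departments is effectively filtered back to an inner join.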

SQL in Oracle HR Schema

I have made a query in Oracle HR schema to see the following information:
The city where the department is located
The total number of employees in the department
However, the query cannot be executed correctly and said this is "not a GROUP BY expression".
Does anyone know what the problem is? Thanks in advance.
SELECT department_name, city, COUNT(employees.department_id)
FROM departments
JOIN employees on (departments.department_id=employees.department_id)
JOIN locations USING (location_id)
GROUP BY department_name;
You are grouping by department and want to show the department's city. You expect this to work, because each department is in exactly one city. (SQL people call this functional dependency.)
For this to work, two things would be needed:
there would have to be a unique constraint on the department name, or you'd have to group by department_id instead;
the DBMS would have to detect and support functional dependency in aggregation queries.
Unfortunately, Oracle doesn't support functional dependency in aggregation queries. It forces us to put every such column in the GROUP BY clause or into an aggregation function.
So either extend the GROUP BY clause:
SELECT d.department_name, l.city, COUNT(e.department_id)
FROM departments d
JOIN employees e ON e.department_id = d.department_id
JOIN locations l USING (location_id)
GROUP BY d.department_name, l.city
ORDER BY d.department_name;
or use an aggregation function such as MIN or MAX on that single value:
SELECT d.department_name, MAX(l.city) AS city, COUNT(e.department_id)
FROM departments d
JOIN employees e ON e.department_id = d.department_id
JOIN locations l USING (location_id)
GROUP BY d.department_name
ORDER BY d.department_name;
What I prefer though, is to aggregate first and only then join. You want to join the departments with their employee count, so do just that:
SELECT d.department_name, l.city, COALESCE(e.cnt, 0) AS employee_count
FROM departments d
JOIN locations l USING (location_id)
LEFT JOIN
(
SELECT department_id, COUNT(*) as cnt
FROM employees
GROUP BY department_id
) e ON e.department_id = d.department_id
ORDER BY d.department_name;
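One practical difference between the options above is which departments show up. The following sketch, using Python's sqlite3 module with invented rows (an assumption, not the real HR data), compares the extended GROUP BY with the aggregate-first query. SQLite is more lenient than Oracle about GROUP BY, so it cannot reproduce the original error; it only illustrates the result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT,
                          location_id INTEGER);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER);
INSERT INTO locations VALUES (1, 'Seattle'), (2, 'London');
INSERT INTO departments VALUES (10, 'Admin', 1), (20, 'Sales', 2), (30, 'Payroll', 1);
INSERT INTO employees VALUES (1, 10), (2, 20), (3, 20);
""")

# Join first, then group: the inner join drops departments with no employees.
grouped = conn.execute("""
    SELECT d.department_name, l.city, COUNT(e.department_id)
    FROM departments d
    JOIN employees e ON e.department_id = d.department_id
    JOIN locations l USING (location_id)
    GROUP BY d.department_name, l.city
    ORDER BY d.department_name
""").fetchall()

# Aggregate first, then LEFT JOIN: empty departments are kept with a count of 0.
aggregated_first = conn.execute("""
    SELECT d.department_name, l.city, COALESCE(e.cnt, 0) AS employee_count
    FROM departments d
    JOIN locations l USING (location_id)
    LEFT JOIN (SELECT department_id, COUNT(*) AS cnt
               FROM employees
               GROUP BY department_id) e ON e.department_id = d.department_id
    ORDER BY d.department_name
""").fetchall()

print(grouped)           # [('Admin', 'Seattle', 1), ('Sales', 'London', 2)]
print(aggregated_first)  # [('Admin', 'Seattle', 1), ('Payroll', 'Seattle', 0), ('Sales', 'London', 2)]
```

So the aggregate-first form also reports the department without employees, which the inner-join versions silently omit.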
The problem is that you have both aggregated and non-aggregated columns (in your case, city) in the select list.
I don't know the structure of the locations table, but assuming a department has only one location defined, you can use max(city):
SELECT department_name, max(city) city, COUNT(employees.department_id) no_of_employees
FROM departments
JOIN employees on (departments.department_id=employees.department_id)
JOIN locations USING (location_id)
GROUP BY department_name;
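As excellently explained by Thorsten, you could also compute the count with an analytic function using OVER and PARTITION BY, which avoids the GROUP BY clause. Because an analytic function returns one row per employee rather than one per department, add DISTINCT to collapse the duplicates.
SELECT DISTINCT d.department_name, l.city, COUNT(e.department_id) OVER (PARTITION BY e.department_id) as emp_count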
FROM departments d
JOIN employees e ON e.department_id = d.department_id
JOIN locations l USING (location_id)
ORDER BY d.department_name;

Sql subquery with a group by and a join on two tables

So I have two tables:
EMPLOYEE- Contains columns including EMPLOYEE_NAME, DEPARTMENT_ID and SALARY
DEPARTMENTS - Contains columns including DEPARTMENT_NAME, and DEPARTMENT_ID
I need to display the department name and the average salary for each department, ordered by the average salary.
I am new to DBs and am having trouble.
I tried a subquery in the FROM clause. The subquery returns exactly what I need except the department name, so I then have to join the DEPARTMENTS table to its results. All the data in the subquery comes from one table, EMPLOYEE, while the department name is in the DEPARTMENTS table.
Here is what I tried.
SELECT D.DEPARTMENT_NAME, T.PERDEPT
FROM
(
SELECT DEPARTMENT_ID, AVG(SALARY) AS PERDEPT
FROM EMPLOYEE
GROUP BY DEPARTMENT_ID
ORDER BY PERDEPT
) AS TEST T
JOIN DEPARTMENTS
ON D.DEPARTMENT_ID=T.DEPARTMENT_ID;
This returns a
SQL command not properly terminated
on the line with AS TEST T.
Any and all help is greatly appreciated.
Many thanks.
This query should do what you ask:
select d.department_name, avg(e.salary) as avg_salary
from departments d
left join employee e on e.department_id = d.department_id
group by d.department_name
order by avg(e.salary)
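Simply correct your table aliases: you seem to have two aliases for the subquery (TEST and T) and none for D. Use one alias for each table or query reference. Also note that if this is Oracle (the error message suggests it is), the AS keyword is not accepted before a table alias, so drop it:
...
(
SELECT ...
) T
JOIN DEPARTMENTS D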
With that said, you do not even need the subquery: an aggregate query with a JOIN should suffice, assuming DEPARTMENT_ID is unique in the DEPARTMENTS table so that the aggregate is not double counted.
SELECT D.DEPARTMENT_NAME,
AVG(E.SALARY) AS PERDEPT
FROM EMPLOYEE E
JOIN DEPARTMENTS D
ON E.DEPARTMENT_ID = D.DEPARTMENT_ID
GROUP BY E.DEPARTMENT_ID,
D.DEPARTMENT_NAME
ORDER BY AVG(E.SALARY);
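As a quick sanity check that the subquery form and the plain join form agree, here is a small sketch using Python's sqlite3 module. The rows are invented, SQLite is only a stand-in for the asker's database, and the ORDER BY has been moved to the outer query because an ORDER BY inside a derived table does not guarantee the order of the final result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employee (employee_name TEXT, department_id INTEGER, salary INTEGER);
INSERT INTO departments VALUES (1, 'HR'), (2, 'Dev');
INSERT INTO employee VALUES ('a', 1, 3000), ('b', 1, 5000),
                            ('c', 2, 2000), ('d', 2, 4000);
""")

# Subquery form: aggregate per department first, then join for the name.
via_subquery = conn.execute("""
    SELECT d.department_name, t.perdept
    FROM (SELECT department_id, AVG(salary) AS perdept
          FROM employee
          GROUP BY department_id) t
    JOIN departments d ON d.department_id = t.department_id
    ORDER BY t.perdept
""").fetchall()

# Join form: join first, then aggregate.
via_join = conn.execute("""
    SELECT d.department_name, AVG(e.salary) AS perdept
    FROM employee e
    JOIN departments d ON e.department_id = d.department_id
    GROUP BY e.department_id, d.department_name
    ORDER BY AVG(e.salary)
""").fetchall()

print(via_subquery)  # [('Dev', 3000.0), ('HR', 4000.0)]
print(via_subquery == via_join)  # True
```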

SQL query errors and mistakes

I have an issue with a SQL query.
The question: show all the departments in which the max salary is bigger than 10000.
I am getting an output with this but it doesn't seem right.
My code:
SELECT
Department_Name, Max_Salary
FROM
Departments
INNER JOIN
Job_History ON Departments.department_id = Job_History.department_id
INNER JOIN
Jobs ON Job_History.job_id = jobs.job_id
WHERE
Max_Salary > 10000
Output:
DEPT_NAME | MAX_SALARY
------------------------
Accounting | 16,000
Sales | 12,080
Sales | 20,080
There is only one Sales department in the database.
Any help on why this is happening would be appreciated.
Likely, there are multiple rows in job_history that are related to 'Sales' Department.
The join operation is returning all matching rows.
To get a distinct list of department names, you could add GROUP BY Department_Name to the end of the query. You'll also want to wrap the Max_Salary column in the select list in an aggregate function, e.g. MAX(Max_Salary).
Best practice is to qualify all column references in the query. For a reader not familiar with the database schema, it's not clear whether Max_Salary comes from the Job_History table or the Jobs table. Also, the keyword INNER has no effect on the join operation, so it can be omitted.
--This works
SELECT d.department_name
, MAX(j.max_salary) AS max_salary
FROM Departments d
JOIN Job_History h
ON h.department_id = d.department_id
JOIN Jobs j
ON j.job_id = h.job_id
WHERE j.max_salary > 10000
GROUP BY d.department_name
I would prefer to comment, but I don't have a high enough reputation.
Can you try this and see which ids it returns?
Select Department_Name, Max_Salary,D.department_id,J.job_id
From Departments D
INNER JOIN Job_History J_H
ON D.department_id=J_H.department_id
INNER JOIN Jobs J
ON J_H.job_id=J.job_id
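WHERE Max_Salary > 10000
To get one row per department, group by the department name and filter on the aggregate with HAVING: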
Select d.department_name
, MAX(j.max_salary) AS max_salary
From Departments D
INNER JOIN Job_History J_H
ON D.department_id=J_H.department_id
INNER JOIN Jobs J
ON J_H.job_id=J.job_id
GROUP BY d.department_name
having max(j.max_salary)>10000
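The duplicate 'Sales' row and the grouped fix can be reproduced with a few made-up rows. This is a sketch using Python's sqlite3 module; the table contents are assumptions chosen to mirror the output shown in the question, not the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, max_salary INTEGER);
CREATE TABLE job_history (employee_id INTEGER, department_id INTEGER, job_id TEXT);
INSERT INTO departments VALUES (1, 'Accounting'), (2, 'Sales'), (3, 'IT');
INSERT INTO jobs VALUES ('AC_MGR', 16000), ('SA_REP', 12080),
                        ('SA_MAN', 20080), ('IT_PROG', 9000);
-- Two history rows point at Sales, each with a different job.
INSERT INTO job_history VALUES (101, 1, 'AC_MGR'), (102, 2, 'SA_REP'),
                               (103, 2, 'SA_MAN'), (104, 3, 'IT_PROG');
""")

# Without grouping, every matching job_history row yields an output row.
flat = conn.execute("""
    SELECT d.department_name, j.max_salary
    FROM departments d
    JOIN job_history h ON h.department_id = d.department_id
    JOIN jobs j ON j.job_id = h.job_id
    WHERE j.max_salary > 10000
    ORDER BY d.department_name, j.max_salary
""").fetchall()

# Grouping collapses the rows to one per department; HAVING filters the groups.
grouped = conn.execute("""
    SELECT d.department_name, MAX(j.max_salary) AS max_salary
    FROM departments d
    JOIN job_history h ON h.department_id = d.department_id
    JOIN jobs j ON j.job_id = h.job_id
    GROUP BY d.department_name
    HAVING MAX(j.max_salary) > 10000
    ORDER BY d.department_name
""").fetchall()

print(flat)     # [('Accounting', 16000), ('Sales', 12080), ('Sales', 20080)]
print(grouped)  # [('Accounting', 16000), ('Sales', 20080)]
```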

subquery exercise

I need to write a query that contains a subquery where it would list the name of departments and the number of employees per department having the word 'Representative' in their job_title and the list must be ordered by department_id.
I've written this query
SELECT d.department_name, emp.employee_id
FROM departments d, employees emp, jobs j
WHERE emp.department_id=d.department_id
AND j.job_title LIKE '%Representative%';
If there is a requirement to use a subquery, then the following will achieve what you're looking for:
select d.department_id,
d.department_name,
(select count(*)
from employees emp
join jobs j
on j.job_id = emp.employee_job -- I've made some assumptions, here!
where emp.department_id = d.department_id
and j.job_title like '%Representative%') reps
from departments d
order by d.department_id;
Personally, however, I would use a query like this:
select d.department_id,
d.department_name,
count(emp.employee_id) reps
from departments d
join employees emp
on emp.department_id = d.department_id
join jobs j
on j.job_id = emp.employee_job -- Same assumption as before!
where j.job_title like '%Representative%'
group by d.department_id,
d.department_name
order by d.department_id;
I find it easier to read/interpret, but that's ultimately up to you.
You don't need a subquery for this. Use a simple JOIN.
SELECT d.department_name, COUNT(*) AS cnt
FROM employees e JOIN departments d
ON e.department_id = d.department_id
JOIN jobs j ON e.jobid = j.jobid
WHERE j.job_title LIKE '%Representative%'
GROUP BY d.department_name
Or, if it is just an exercise for you, I would suggest using the following query:
SELECT d.department_name
, (SELECT COUNT(*)
FROM employees e JOIN jobs j ON e.jobid = j.jobid
WHERE e.department_id = d.department_id
AND j.job_title LIKE '%Representative%') AS cnt
FROM departments d
In the real world, however, you code for convenience, not just for exercise. Your code should be convenient to read, understand and maintain for all those who are involved in your software development process. If it is just an exercise then you can use the second query. But if you have to use the query in a live application, the approach in the first query is better for everyone around you.
There is some join logic missing in your example, but something like this may work for you:
select d.department_name, count(emp.employee_id)
from departments d, employees emp, jobs j
where j.job_title in (select job_title from jobs where job_title like '%Representative%')
group by d.department_name
The SQL may not be 100% correct, but you can see the point; it's hard to complete it without all of the join logic.
Once the join conditions are added, this should return all of the department names and the employee count where the employee's job title contains 'Representative'.
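The subquery and join approaches in the answers above are not quite interchangeable: the correlated subquery lists every department, including those with zero representatives, while the inner-join version only lists departments that have at least one. Here is a sketch of both using Python's sqlite3 module; the column name job_id on employees and all sample rows are assumptions, since the question does not show the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job_title TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, job_id TEXT,
                        department_id INTEGER);
INSERT INTO departments VALUES (10, 'Marketing'), (20, 'Sales'), (30, 'IT');
INSERT INTO jobs VALUES ('MK_REP', 'Marketing Representative'),
                        ('SA_REP', 'Sales Representative'),
                        ('IT_PROG', 'Programmer');
INSERT INTO employees VALUES (1, 'MK_REP', 10), (2, 'SA_REP', 20),
                             (3, 'SA_REP', 20), (4, 'IT_PROG', 30);
""")

# Correlated subquery: one row per department, counting matching employees.
with_subquery = conn.execute("""
    SELECT d.department_id, d.department_name,
           (SELECT COUNT(*)
            FROM employees emp
            JOIN jobs j ON j.job_id = emp.job_id
            WHERE emp.department_id = d.department_id
              AND j.job_title LIKE '%Representative%') AS reps
    FROM departments d
    ORDER BY d.department_id
""").fetchall()

# Plain joins with GROUP BY: departments without a match disappear.
with_join = conn.execute("""
    SELECT d.department_id, d.department_name, COUNT(emp.employee_id) AS reps
    FROM departments d
    JOIN employees emp ON emp.department_id = d.department_id
    JOIN jobs j ON j.job_id = emp.job_id
    WHERE j.job_title LIKE '%Representative%'
    GROUP BY d.department_id, d.department_name
    ORDER BY d.department_id
""").fetchall()

print(with_subquery)  # [(10, 'Marketing', 1), (20, 'Sales', 2), (30, 'IT', 0)]
print(with_join)      # [(10, 'Marketing', 1), (20, 'Sales', 2)]
```

Which behaviour is wanted depends on whether the exercise expects departments with no representatives to be listed.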