sql query group by statement suggestion - sql

I am learning to write database queries and want to build the following query against the HR schema in an Oracle database.
Find the number of employees in each salary group. The salary groups are defined as follows:
0k to <5k, 5k to <10k, 10k to <15k, and so on.
What would a possible query look like?
There are seven tables in total:
REGIONS: REGION_ID, REGION_NAME
COUNTRIES: COUNTRY_ID, COUNTRY_NAME, REGION_ID
LOCATIONS: LOCATION_ID, STREET_ADDRESS, POSTAL_CODE, CITY, STATE_PROVINCE, COUNTRY_ID
DEPARTMENTS: DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID
EMPLOYEES: EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, DEPARTMENT_ID
JOB_HISTORY: EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID
JOBS: JOB_ID, JOB_TITLE, MIN_SALARY, MAX_SALARY

I first use a common table expression (CTE) to calculate the different groups using a case statement...
WITH CTE AS (
SELECT employee_id,
case when salary >= 0 and salary < 5000 then '<5k'
when salary >= 5000 and salary < 10000 then '<10k'
when salary >= 10000 and salary < 15000 then '<15k'
else 'UDF' end as SalaryGroup
FROM Employees)
SELECT count(employee_id) as Cnt, SalaryGroup
FROM CTE
GROUP BY SalaryGroup;
I then select from that table to give you the counts by the salary group calculated.

select trunc(salary/5000) as salary_group,
count(*) as employee_count
from EMPLOYEES
group by trunc(salary/5000)
order by salary_group asc;
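If readable range bounds are preferred over a bare bucket number, the bucket index can be multiplied back by the bucket width. This is a minimal sketch, assuming the HR EMPLOYEES table from the question; the aliases range_from, range_to and employee_count are made up for illustration.

```sql
-- Bucket width is 5000; TRUNC(salary / 5000) is the zero-based bucket index.
SELECT TRUNC(salary / 5000) * 5000       AS range_from,     -- inclusive lower bound
       (TRUNC(salary / 5000) + 1) * 5000 AS range_to,       -- exclusive upper bound
       COUNT(*)                          AS employee_count
FROM   employees
GROUP  BY TRUNC(salary / 5000)
ORDER  BY range_from;
```

Unlike a CASE expression, this needs no new branch when salaries above 15k appear. Buckets that contain no employees are simply not listed.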

Related

Analytic functions and plain SQL equivalent

I'm using Oracle and SQL Developer. I have downloaded the HR schema and need to write some queries against it. I'm currently working with the EMPLOYEES table. As a user, I need to see the list of employees with the lowest salary in each department. I need to provide different solutions: some in plain SQL and one using analytic functions. For the analytic version, I have used RANK():
SELECT *
FROM
(SELECT
employee_id,
first_name,
department_id,
salary,
RANK() OVER (PARTITION BY department_id
ORDER BY salary) result
FROM
employees)
WHERE
result = 1
AND department_id IS NOT NULL;
The result seems correct, but when I try to use plain SQL I actually get all employees with their salaries.
Here is my attempt with GROUP BY:
SELECT
department_id, MIN(salary) AS "Lowest salary"
FROM
employees
GROUP BY
department_id;
This code seems good, but I need to also get columns first_name and employee_id.
I tried to do something like this:
SELECT
employee_id,
first_name,
department_id,
MIN(salary) result
FROM
employees
GROUP BY
employee_id,
first_name,
department_id;
and this:
SELECT
employee_id,
first_name,
salary,
departments.department_id
FROM
employees
LEFT OUTER JOIN
departments ON (employees.department_id = departments.department_id)
WHERE
employees.salary = (SELECT MIN(salary)
FROM departments
WHERE department_id = employees.department_id)
These seem wrong. How can I change or modify my queries to get the same result as when I'm using RANK() by means of plain SQL (two solutions at least)?
One option could look like this (using the old EMP table):
SELECT EMPNO, ENAME, DEPTNO, SAL
FROM EMP e
WHERE SAL = (Select MIN_SAL From (SELECT DEPTNO, Min(SAL) "MIN_SAL"
FROM EMP
GROUP BY DEPTNO)
Where DEPTNO = e.DEPTNO)
ORDER BY DEPTNO, SAL;
A second option could be:
SELECT EMPNO, ENAME, DEPTNO, SAL
FROM (SELECT e.EMPNO, e.ENAME, e.DEPTNO, e.SAL,
(Select Min(SAL) From EMP Where DEPTNO = e.DEPTNO) "MIN_SAL"
From EMP e)
WHERE SAL = MIN_SAL
ORDER BY DEPTNO, SAL;
You can use a subquery to find the lowest salary per department and have the main query show only the employees whose salary matches it:
SELECT
employee_id,
first_name,
department_id,
salary
FROM employees e1
WHERE salary =
(SELECT MIN(e2.salary)
FROM employees e2
WHERE e1.department_id = e2.department_id);
This returns the same employees as your query with RANK. Employees without a department are excluded here as well, because the comparison on department_id never matches a NULL.
I think it would make sense to apply some sorting, which is missing in your query. I don't know how you want to sort, but here is an example that sorts by the employee's first name:
SELECT
employee_id,
first_name,
department_id,
salary
FROM employees e1
WHERE salary =
(SELECT MIN(e2.salary)
FROM employees e2
WHERE e1.department_id = e2.department_id)
ORDER BY first_name;
Since you asked for at least two solutions, let's have a look at another option:
SELECT
e1.employee_id,
e1.first_name,
e1.department_id,
e1.salary
FROM employees e1
JOIN (
SELECT department_id, MIN(salary) salary
FROM employees
GROUP BY department_id ) e2
ON e1.department_id = e2.department_id AND e1.salary = e2.salary
ORDER BY first_name;
As you can see, this differs in that the subquery uses a GROUP BY clause and can be run successfully as a standalone query, which is not possible for the correlated subquery in the previous query.
The JOIN to the main query then produces the desired result again.
Here are some options to get the employees with the minimum salary in their department:
With MIN (salary) OVER (...)
select employee_id, first_name, department_id, salary
from
(
select e.*, min(salary) over (partition by department_id) as min_sal
from employees e
)
where salary = min_sal;
With RANK and FETCH FIRST
select *
from employees
order by rank() over (partition by department_id order by salary)
fetch first row with ties;
With IN
select *
from employees
where (department_id, salary) in
(
select department_id, min(salary)
from employees
group by department_id
);
With NOT EXISTS
select *
from employees e
where not exists
(
select null
from employees other
where other.department_id = e.department_id
and other.salary < e.salary
);
If you will only ever have one person with the minimum salary per department then you can use KEEP:
SELECT department_id,
MIN(employee_id) KEEP (DENSE_RANK FIRST ORDER BY salary) AS employee_id,
MIN(first_name) KEEP (DENSE_RANK FIRST ORDER BY salary, employee_id) AS first_name,
MIN(salary) AS min_salary
FROM employees
GROUP BY department_id
Which, for the sample data:
CREATE TABLE employees (employee_id, department_id, first_name, salary) AS
SELECT 1, 1, 'Alice', 1000 FROM DUAL UNION ALL
SELECT 2, 1, 'Betty', 2000 FROM DUAL UNION ALL
SELECT 3, 2, 'Carol', 3000 FROM DUAL UNION ALL
SELECT 4, 2, 'Debra', 3000 FROM DUAL UNION ALL
SELECT 5, 2, 'Emily', 4000 FROM DUAL;
Outputs:
DEPARTMENT_ID  EMPLOYEE_ID  FIRST_NAME  MIN_SALARY
1              1            Alice       1000
2              3            Carol       3000
Note: this will not match Debra, even though she also has the lowest salary in department 2, as it will only find a single employee with the minimum salary and the minimum employee id.
If you can have multiple employees with the same minimum-per-department then you can use a correlated sub-query:
SELECT department_id,
employee_id,
first_name,
salary
FROM employees e
WHERE EXISTS(
SELECT 1
FROM employees x
WHERE e.department_id = x.department_id
HAVING MIN(x.salary) = e.salary
);
Which, for the sample data, outputs:
DEPARTMENT_ID  EMPLOYEE_ID  FIRST_NAME  SALARY
1              1            Alice       1000
2              3            Carol       3000
2              4            Debra       3000
Which does return Debra.

Oracle SQL sub query

I have an exercise in which I should find the employees who earn more than the average salary and work in departments that have employees whose last name contains the letter u.
The SELECT statement I used was:
SELECT employee_id,
last_name,
salary
FROM employees
WHERE salary > (SELECT AVG(salary)
FROM employees )
AND department_id IN(SELECT department_id
FROM employees
WHERE LOWER(last_name) LIKE '%u%')
Could anyone check whether this statement is suitable?
Thank you.
That looks fine to me, assuming you mean the average salary across all departments in the database, and all employees (active or not) across all of time.
I would think you might be more interested in all active employees in this current financial year, for example.
You haven't provided the schema, so be careful to check for conditions like:
inactive departments
inactive / terminated employees
period you are interested in for comparing the salary
Your query looks like it will work. You can rewrite it to remove all the sub-queries (which require additional table/index scans) and use only analytic functions:
SELECT employee_id,
last_name,
salary
FROM (
SELECT employee_id,
last_name,
salary,
AVG( salary ) OVER () AS avg_salary,
COUNT( CASE WHEN LOWER( last_name ) LIKE '%u%' THEN 1 END )
OVER ( PARTITION BY department_id ) AS num_last_name_with_u
FROM employees
)
WHERE salary > avg_salary
AND num_last_name_with_u > 0;
My first question: are you getting the expected result?
Let me break down your query.
SELECT department_id FROM employees WHERE LOWER(last_name) LIKE '%u%'
This selects department_id, so it returns every department that has at least one such employee. If what you actually need is only the employees whose own last name contains u, select employee_id instead of department_id.
select avg(salary) over (partition by department_id)
Using PARTITION BY gives you the average salary per department rather than across the whole company.
SELECT employee_id,last_name,salary
FROM
(SELECT employee_id,last_name,salary,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM
employees )
WHERE salary>dept_avg_salary
AND employee_id IN
( SELECT employee_id
FROM
employees
WHERE LOWER(last_name) LIKE '%u%')
Let me know if you have any issues running it; any corrections to the query are appreciated.

Calculate salary difference between two rows in HIVE

I have a table with the following columns:
last_name, first_name, department, salary
I want to list the employees whose salary is less than 100 below that of the next-higher-paid employee in the same department. I followed the answer in "Compute differences between succesive records in Hadoop with Hive Queries" and tried it, but I think I am doing something wrong as I am new to Hive.
Below is the query I am running:
select last_name,first_name, salary from emp where
100 = LEAD(salary,1) OVER(PARTITION BY department ORDER BY salary)-salary;
Please help me with the solution.
Use a case expression.
SELECT last_name,
first_name,
salary
FROM (SELECT last_name,
first_name,
salary,
CASE
WHEN 100 > LEAD(salary, 1)
OVER(
PARTITION BY department
ORDER BY salary) - salary THEN 1
ELSE 0
END sal_flag
FROM emp)
WHERE sal_flag = 1;
Hive requires every subquery in the FROM clause to be given an alias. I have just added the alias to Kaushik's query. Try this, it should work.
SELECT last_name,
first_name,
salary
FROM (SELECT last_name,
first_name,
salary,
CASE
WHEN 100 > LEAD(salary, 1)
OVER(
PARTITION BY department
ORDER BY salary) - salary THEN 1
ELSE 0
END sal_flag
FROM emp) v
WHERE sal_flag = 1;
I personally prefer a WITH clause over a subquery, as shown below. WITH clauses make the query more readable. Depending on the engine they may also lead to a better execution plan, though that is not guaranteed.
WITH sal_view
AS (SELECT last_name,
first_name,
salary,
CASE
WHEN 100 > LEAD(salary, 1)
OVER(
PARTITION BY department
ORDER BY salary) - salary THEN 1
ELSE 0
END sal_flag
FROM emp)
SELECT last_name,
first_name,
salary
FROM sal_view
WHERE sal_flag = 1;
Try
with temp as(
select last_name,
first_name,
department,
salary,
LEAD(salary, 1)
OVER( PARTITION BY department
ORDER BY salary) - salary as diff
FROM emp
)
select last_name,
first_name,
department,
salary
from temp
where diff < 100
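To see what LEAD actually returns before filtering on it, the next salary and the difference can be selected as ordinary columns. This is a sketch against the emp table from the question; next_salary, diff and the subquery alias t are illustrative names, not part of the original schema.

```sql
-- Show each employee next to the next-higher salary in the same department.
SELECT last_name,
       first_name,
       department,
       salary,
       next_salary,
       next_salary - salary AS diff   -- gap to the next-higher-paid colleague
FROM  (SELECT last_name,
              first_name,
              department,
              salary,
              LEAD(salary, 1) OVER (PARTITION BY department
                                    ORDER BY salary) AS next_salary
       FROM   emp) t                   -- Hive needs an alias on the subquery
ORDER BY department, salary;
```

The highest-paid employee in each department has no following row, so next_salary and diff are NULL there, and any comparison on diff filters that row out.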

Display salary, Avg(salary), Name for those who earns more than company Avg(salary)

How can I display the name, salary and avg(salary) for all employees whose salary is greater than the company avg(salary)?
I have tried the following query:
Select last_name, salary
From employees
Where salary >(select avg(salary) from employees);
This gives the names of the employees who earn more than the company avg(salary), but I also want to display the avg(salary) in the select list.
Join the employee table to a query that produces the average:
select last_name, salary, avg_salary
from employees
join (select avg(salary) avg_salary from employees) x
on salary > avg_salary
This query will work on all databases.
Assuming SQL Server, here is one of many ways to accomplish this:
DECLARE @AverageSalary MONEY
SELECT @AverageSalary=AVG(SALARY) FROM EMPLOYEES
Select last_name, salary, @AverageSalary AS avg_salary From employees Where salary > @AverageSalary
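In databases that support window functions (Oracle and SQL Server both do), the average can also be attached to every row with AVG(...) OVER () and filtered in an outer query. This is a sketch, assuming the same employees table; the alias avg_salary is only illustrative.

```sql
-- Compute the company-wide average once per row, then filter on it.
SELECT last_name,
       salary,
       avg_salary
FROM  (SELECT last_name,
              salary,
              AVG(salary) OVER () AS avg_salary   -- empty OVER () = all rows
       FROM   employees) e
WHERE  salary > avg_salary;
```

The inline view is needed because a window function cannot be referenced directly in the WHERE clause of the query that computes it.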

ORA-00934: Group function not allowed here || Selecting MIN(Salary) of highest paid dept

O community, do you know how I could select the department_ID and lowest salary of the department with the highest average salary? Or how to eliminate the 'ORA-00934: group function not allowed here' issue? Would I need to use two subqueries?
So far, this is what I've come up with, trying to get the department_ID of the highest paid department:
SELECT department_ID, MIN(salary)
FROM employees
WHERE department_ID = (SELECT department_ID
FROM employees WHERE salary = MAX(salary));
Thank you, your assistance is greatly appreciated.
I can't test this, but it should work:
WITH DepartmentsSalary AS
(
SELECT department_ID, AVG(Salary) AvgSalary, MIN(Salary) MinSalary
FROM employees
GROUP BY department_ID
)
SELECT department_ID, MinSalary
FROM ( SELECT department_ID, AvgSalary, MAX(AvgSalary) OVER() MaxSalary, MinSalary
FROM DepartmentsSalary) D
WHERE MaxSalary = AvgSalary
You can use a join against a nested sub-query:
select e1.department_ID, min(e1.salary)
from employees e1
join (
select avg_query.department_ID
from (
select department_ID, avg(salary) as avg_value
from employees
group by department_ID
order by avg_value desc
) avg_query
where rownum = 1
) e2 on e2.department_ID = e1.department_ID
group by e1.department_ID
;
The first sub-query returns the average salary for each department.
The next sub-query, built on the first, returns the department_ID with the highest average salary.
The main query returns the minimum salary for that department_ID.
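On Oracle 12c or later, the same result can be sketched without any sub-query by grouping once, ordering the groups by their average, and keeping the first row. This assumes the employees table from the question; the alias min_salary is illustrative.

```sql
-- One row per department, sorted so the highest average salary comes first.
SELECT   department_id,
         MIN(salary) AS min_salary
FROM     employees
GROUP BY department_id
ORDER BY AVG(salary) DESC NULLS LAST   -- keep all-NULL salary groups at the end
FETCH FIRST 1 ROW ONLY;
```

If two departments share the highest average, FETCH FIRST 1 ROW ONLY returns just one of them; FETCH FIRST 1 ROW WITH TIES would return both. Employees with no department form a group of their own, so add WHERE department_id IS NOT NULL if that group should not be considered.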