Calculate salary difference between two rows in Hive

I have a table with the following columns:
last_name, first_name, department, salary
I want to list the employees whose salary is less than 100 lower than that of the next-higher-paid employee in the same department. I looked at the answer to "Compute differences between successive records in Hadoop with Hive Queries" and tried it, but I think I am doing something wrong as I am new to Hive.
This is the query I am running:
select last_name,first_name, salary from emp where
100 = LEAD(salary,1) OVER(PARTITION BY department ORDER BY salary)-salary;
Please help me with the solution.

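Use a CASE expression. A window function such as LEAD cannot appear in a WHERE clause, so compute a flag in a subquery and filter on it in the outer query: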
SELECT last_name,
       first_name,
       salary
FROM (SELECT last_name,
             first_name,
             salary,
             CASE
                 WHEN 100 > LEAD(salary, 1)
                            OVER (PARTITION BY department
                                  ORDER BY salary) - salary THEN 1
                 ELSE 0
             END sal_flag
      FROM emp)
WHERE sal_flag = 1;

Hive requires every subquery in the FROM clause to have an alias. I have just added an alias (v) to Kaushik's query. Try this; it should work.
SELECT last_name,
       first_name,
       salary
FROM (SELECT last_name,
             first_name,
             salary,
             CASE
                 WHEN 100 > LEAD(salary, 1)
                            OVER (PARTITION BY department
                                  ORDER BY salary) - salary THEN 1
                 ELSE 0
             END sal_flag
      FROM emp) v
WHERE sal_flag = 1;
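I personally prefer a WITH clause over a subquery, as below, because it makes the query more readable. It may also produce a better execution plan, although in Hive a CTE referenced only once is normally inlined, so the plan is often identical to the subquery version.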
WITH sal_view AS (
    SELECT last_name,
           first_name,
           salary,
           CASE
               WHEN 100 > LEAD(salary, 1)
                          OVER (PARTITION BY department
                                ORDER BY salary) - salary THEN 1
               ELSE 0
           END sal_flag
    FROM emp
)
SELECT last_name,
       first_name,
       salary
FROM sal_view
WHERE sal_flag = 1;
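The LEAD() logic can be checked on a few rows. This is a minimal sketch, assuming SQLite 3.25 or later (reached through Python's standard sqlite3 module) as a stand-in for Hive, since both support LEAD() with PARTITION BY and ORDER BY. The sample rows are hypothetical, and the trailing ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Hypothetical sample rows: (last_name, first_name, department, salary).
SAMPLE_EMP = [
    ("Smith", "Ann", "IT", 1000),  # next-higher IT salary is 1050 -> gap 50
    ("Jones", "Bob", "IT", 1050),  # next-higher IT salary is 1300 -> gap 250
    ("Brown", "Cal", "IT", 1300),  # highest in IT -> LEAD() is NULL
    ("Green", "Dee", "HR", 900),   # next-higher HR salary is 950 -> gap 50
    ("White", "Eve", "HR", 950),   # highest in HR -> LEAD() is NULL
]

# Same shape as the WITH-clause answer; ORDER BY added only for stable output.
QUERY = """
WITH sal_view AS (
    SELECT last_name,
           first_name,
           salary,
           CASE
               WHEN 100 > LEAD(salary, 1)
                          OVER (PARTITION BY department ORDER BY salary) - salary
               THEN 1
               ELSE 0
           END AS sal_flag
    FROM emp
)
SELECT last_name, first_name, salary
FROM sal_view
WHERE sal_flag = 1
ORDER BY last_name
"""


def flagged_employees(rows):
    """Load rows into an in-memory SQLite table named emp and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE emp (last_name TEXT, first_name TEXT,"
            " department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(flagged_employees(SAMPLE_EMP))
```

The highest earner in each department gets NULL from LEAD(), the comparison is then unknown, and the CASE falls through to 0, so those rows are never flagged.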

Try
with temp as (
    select last_name,
           first_name,
           department,
           salary,
           -- gap to the next-higher salary in the same department
           LEAD(salary, 1) OVER (PARTITION BY department
                                 ORDER BY salary) - salary as diff
    from emp
)
select last_name,
       first_name,
       department,
       salary
from temp
where diff < 100;

Related

Analytic query trying to solve

I'm solving the following task with analytic functions and I'm stuck.
Task: Write a query that shows the latest hired employee per department. In case of ties, use the lowest employee ID.
select a.EMPLOYEE_ID,
a.DEPARTMENT_ID,
a.FIRST_NAME,
a.LAST_NAME,
a.HIRE_DATE,
a.JOB_ID
from (select ROW_NUMBER() over (PARTITION by department_id order by hire_date desc)
from hr.EMPLOYEES a) A
where A = 1 ;
You need to include the columns you want to select in the outer query in the SELECT clause of the inner query and need to give an alias to the ROW_NUMBER computed value:
select EMPLOYEE_ID,
DEPARTMENT_ID,
FIRST_NAME,
LAST_NAME,
HIRE_DATE,
JOB_ID
from (
select EMPLOYEE_ID,
DEPARTMENT_ID,
FIRST_NAME,
LAST_NAME,
HIRE_DATE,
JOB_ID,
ROW_NUMBER() over (PARTITION by department_id order by hire_date desc) AS rn
from hr.EMPLOYEES
)
where rn = 1 ;
You still need to address the second part of the question:
In case of ties, use the lowest employee ID.
However, since this appears to be a homework question, I'll leave that for you to solve.
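As a rough check of the corrected query, the sketch below runs it against SQLite (3.25+, via Python's sqlite3 module) instead of Oracle, on a hypothetical four-row employees table without the hr. schema prefix. Hire dates are stored as ISO-8601 text so that ORDER BY hire_date DESC sorts chronologically, and the sample has no hire-date ties, so the tie-breaking part is still left open.

```python
import sqlite3

# Hypothetical rows:
# (employee_id, department_id, first_name, last_name, hire_date, job_id)
SAMPLE_EMPLOYEES = [
    (100, 10, "Ann", "Smith", "2019-03-01", "IT_PROG"),
    (101, 10, "Bob", "Jones", "2021-07-15", "IT_PROG"),  # latest hire in dept 10
    (102, 20, "Cal", "Brown", "2020-01-10", "SA_REP"),   # latest hire in dept 20
    (103, 20, "Dee", "Green", "2018-05-20", "SA_REP"),
]

# The corrected query from the answer; ORDER BY added for stable output.
QUERY = """
SELECT employee_id, department_id, first_name, last_name, hire_date, job_id
FROM (
    SELECT employee_id, department_id, first_name, last_name, hire_date, job_id,
           ROW_NUMBER() OVER (PARTITION BY department_id
                              ORDER BY hire_date DESC) AS rn
    FROM employees
)
WHERE rn = 1
ORDER BY department_id
"""


def latest_hire_per_department(rows):
    """Return the rn = 1 row of each department for the given sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER,"
            " first_name TEXT, last_name TEXT, hire_date TEXT, job_id TEXT)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_hire_per_department(SAMPLE_EMPLOYEES):
        print(row)
```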

Analytic functions and means of window clause for calculating sum

I'm using Oracle and SQL Developer. I have downloaded the HR schema and need to write some queries against it. Right now I'm working with the EMPLOYEES table. As a user, I need the sum of the salaries of the 3 highest-paid employees in each department. I have written a query that finds the 3 highest-paid employees in each department:
SELECT
*
FROM
(
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
ROW_NUMBER()
OVER(PARTITION BY department_id
ORDER BY
salary DESC
--ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) result
FROM
employees
)
WHERE
result <= 3;
I need to do this by means of a window clause. I have tried something like this:
SELECT
department_id,
SUM(salary)
OVER (PARTITION BY department_id ORDER BY salary
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) total_sal
FROM
(
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
ROW_NUMBER()
OVER(PARTITION BY department_id
ORDER BY
salary DESC
--ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) result
FROM
employees
)
WHERE
result <= 3;
The result contains the sum I need for the 3 people in each department, but also unnecessary partial sums for 2 people and so on. I need a single row per department with that total.
How can I modify my query to get the appropriate result (I need to use a window clause and analytic functions)?
You want aggregation rather than windowing in the outer query:
SELECT
department_id,
SUM(salary) total_sal
FROM
(
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
ROW_NUMBER()
OVER(PARTITION BY department_id
ORDER BY
salary DESC
--ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) result
FROM
employees
) e
WHERE
result <= 3
GROUP BY department_id
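A minimal sketch of the aggregation approach, assuming SQLite 3.25+ (through Python's sqlite3 module) as a stand-in for Oracle and hypothetical sample rows. The concatenated name gets an alias here only for readability, and ORDER BY department_id just fixes the output order.

```python
import sqlite3

# Hypothetical rows: (employee_id, first_name, last_name, department_id, salary)
SAMPLE_EMPLOYEES = [
    (1, "Ann", "Smith", 10, 5000),
    (2, "Bob", "Jones", 10, 4000),
    (3, "Cal", "Brown", 10, 3000),
    (4, "Dee", "Green", 10, 2000),  # 4th highest in dept 10 -> excluded
    (5, "Eve", "White", 20, 7000),
    (6, "Fay", "Black", 20, 6000),  # dept 20 has only two employees
]

QUERY = """
SELECT department_id, SUM(salary) AS total_sal
FROM (
    SELECT employee_id,
           first_name || ' ' || last_name AS full_name,
           department_id,
           salary,
           ROW_NUMBER() OVER (PARTITION BY department_id
                              ORDER BY salary DESC) AS result
    FROM employees
) e
WHERE result <= 3
GROUP BY department_id
ORDER BY department_id
"""


def top3_totals(rows):
    """Sum the three highest salaries per department with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER, first_name TEXT,"
            " last_name TEXT, department_id INTEGER, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top3_totals(SAMPLE_EMPLOYEES))
```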
If we were to do the same task with window functions only, then, starting from the existing query, we can either add another level of nesting or use WITH TIES. Both achieve the same effect: limiting the results to one row per group.
The latter (FETCH FIRST ... WITH TIES requires Oracle 12c or later) would look like:
SELECT
department_id,
SUM(salary) OVER(PARTITION BY department_id) total_sal
FROM (
SELECT e.*,
ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) result
FROM employees e
) e
WHERE result <= 3
ORDER BY result FETCH FIRST ROW WITH TIES
The former would be phrased as:
SELECT department_id, total_sal
FROM (
SELECT e.*,
SUM(salary) OVER(PARTITION BY department_id) total_sal
FROM (
SELECT e.*,
ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) result
FROM employees e
) e
WHERE result <= 3
) e
where result = 1
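The nested window-only variant can be checked the same way. SQLite has no FETCH FIRST ... WITH TIES, so this sketch (SQLite 3.25+ via Python's sqlite3, hypothetical rows) covers only the extra-nesting form; on this data it returns the same totals as the GROUP BY query.

```python
import sqlite3

# Hypothetical rows: (employee_id, first_name, last_name, department_id, salary)
SAMPLE_EMPLOYEES = [
    (1, "Ann", "Smith", 10, 5000),
    (2, "Bob", "Jones", 10, 4000),
    (3, "Cal", "Brown", 10, 3000),
    (4, "Dee", "Green", 10, 2000),  # 4th highest in dept 10 -> excluded
    (5, "Eve", "White", 20, 7000),
    (6, "Fay", "Black", 20, 6000),
]

QUERY = """
SELECT department_id, total_sal
FROM (
    SELECT e.*,
           -- evaluated after the WHERE below, so only top-3 rows are summed
           SUM(salary) OVER (PARTITION BY department_id) AS total_sal
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY department_id
                                  ORDER BY salary DESC) AS result
        FROM employees e
    ) e
    WHERE result <= 3
) e
WHERE result = 1
ORDER BY department_id
"""


def top3_totals_windowed(rows):
    """Same totals as the GROUP BY version, using only window functions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER, first_name TEXT,"
            " last_name TEXT, department_id INTEGER, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top3_totals_windowed(SAMPLE_EMPLOYEES))
```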

Oracle SQL sub query

I have a practice exercise: find the employees who earn more than the average salary and who work in a department that has an employee whose last name contains the letter u.
The SELECT statement I used was:
SELECT employee_id,
last_name,
salary
FROM employees
WHERE salary > (SELECT AVG(salary)
FROM employees )
AND department_id IN(SELECT department_id
FROM employees
WHERE LOWER(last_name) LIKE '%u%')
Could anyone check whether this statement is suitable?
Thank you.
That looks fine to me, assuming you mean the average salary across all departments in the database, and all employees (active or not) across all of time.
I would think you might be more interested in all active employees in this current financial year, for example.
You haven't provided the schema, so be careful to check for conditions like:
inactive departments
inactive / terminated employees
period you are interested in for comparing the salary
Your query looks like it will work. You can rewrite it to replace the subqueries (each of which requires an additional table or index scan) with analytic functions, so the table is read only once:
SELECT employee_id,
last_name,
salary
FROM (
SELECT employee_id,
last_name,
salary,
AVG( salary ) OVER () AS avg_salary,
COUNT( CASE WHEN LOWER( last_name ) LIKE '%u%' THEN 1 END )
OVER ( PARTITION BY department_id ) AS num_last_name_with_u
FROM employees
)
WHERE salary > avg_salary
AND num_last_name_with_u > 0;
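To compare the two forms, here is a sketch that runs the original subquery version and the analytic rewrite side by side. It assumes SQLite 3.25+ (via Python's sqlite3) in place of Oracle and uses hypothetical rows chosen so that a department qualifies through one employee's last name while a higher-paid colleague without a u in the name is still returned.

```python
import sqlite3

# Hypothetical rows: (employee_id, last_name, salary, department_id)
SAMPLE_EMPLOYEES = [
    (1, "Hunold", 12000, 10),     # 'u' in name -> dept 10 qualifies
    (2, "Ernst", 6000, 10),
    (7, "Greenberg", 12500, 10),  # no 'u', but works in dept 10
    (3, "King", 24000, 20),       # above average, but dept 20 has no 'u'
    (4, "Kochhar", 17000, 20),
    (5, "Austin", 4800, 30),      # 'u' in name -> dept 30 qualifies
    (6, "Lorentz", 4200, 30),
]

SUBQUERY_VERSION = """
SELECT employee_id, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
  AND department_id IN (SELECT department_id
                        FROM employees
                        WHERE LOWER(last_name) LIKE '%u%')
ORDER BY employee_id
"""

ANALYTIC_VERSION = """
SELECT employee_id, last_name, salary
FROM (
    SELECT employee_id, last_name, salary,
           AVG(salary) OVER () AS avg_salary,
           COUNT(CASE WHEN LOWER(last_name) LIKE '%u%' THEN 1 END)
               OVER (PARTITION BY department_id) AS num_last_name_with_u
    FROM employees
)
WHERE salary > avg_salary
  AND num_last_name_with_u > 0
ORDER BY employee_id
"""


def run_both(rows):
    """Return (subquery result, analytic result) for the same data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER, last_name TEXT,"
            " salary INTEGER, department_id INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
        return (
            conn.execute(SUBQUERY_VERSION).fetchall(),
            conn.execute(ANALYTIC_VERSION).fetchall(),
        )
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_both(SAMPLE_EMPLOYEES))
```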
My first question: are you getting the expected result?
Let me break down your query:
SELECT department_id FROM employees WHERE LOWER(last_name) LIKE '%u%'
Here you are selecting department_id, so it retrieves department IDs. Why select the department ID when all you need is the employee_id of employees whose last name contains u? Change it to employee_id instead of department_id.
select avg(salary) over (partition by department_id)
Using PARTITION BY, you get the average salary per department.
SELECT employee_id, last_name, salary
FROM (SELECT employee_id,
             last_name,
             salary,
             -- per-department average, attached to every row
             AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
      FROM employees)
WHERE salary > dept_avg_salary
AND employee_id IN
    ( SELECT employee_id
      FROM employees
      WHERE LOWER(last_name) LIKE '%u%')
Let me know if you have any issues running it; any corrections to the query are appreciated.
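One detail worth checking with AVG(...) OVER: once an ORDER BY appears inside OVER, the default window frame runs from the start of the partition to the current row, so AVG becomes a running average, while PARTITION BY alone gives the per-department average. A small sketch, assuming SQLite 3.25+ via Python's sqlite3 and three hypothetical rows:

```python
import sqlite3

# Hypothetical rows: (employee_id, department_id, salary)
SAMPLE_EMPLOYEES = [
    (1, 10, 1000),
    (2, 10, 3000),
    (3, 20, 5000),
]

QUERY = """
SELECT employee_id,
       -- whole partition: one average per department
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
       -- ORDER BY adds a frame ending at the current row: running average
       AVG(salary) OVER (PARTITION BY department_id
                         ORDER BY employee_id) AS running_avg
FROM employees
ORDER BY employee_id
"""


def compare_averages(rows):
    """Return (employee_id, dept_avg, running_avg) for each row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER,"
            " department_id INTEGER, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in compare_averages(SAMPLE_EMPLOYEES):
        print(row)
```

For employee 1 the running average only covers that employee's own salary, which is why the two columns differ on the first row of department 10.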

MAX and GROUP BY - SQL

I'm working with Oracle SQL and the HR database. I'm trying to select the maximum salary in each department like this:
SELECT MAX(salary), department_id
FROM employees GROUP BY department_id;
It works fine, but I want to know who is earning the most, so I simply changed the query this way:
SELECT first_name, last_name, MAX(salary), department_id
FROM employees GROUP BY department_id;
And it's wrong. Could you help me, please?
The most efficient way to do this sort of analysis is generally to use analytic functions (window functions). Something like
SELECT first_name,
last_name,
salary,
department_id
FROM (SELECT e.*,
rank() over (partition by department_id
order by salary desc) rnk
FROM employees e)
WHERE rnk = 1
Depending on how you want to handle ties (if two people in the department are tied for the maximum salary, for example, do you want both people returned or do you want to return one of the two arbitrarily), you may want to use the row_number or dense_rank functions rather than rank. Note that for rnk = 1 specifically, dense_rank returns the same rows as rank; the two only differ further down the ranking.
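A sketch of how the choice of ranking function changes the result when two people share the top salary. It assumes SQLite 3.25+ (via Python's sqlite3) rather than Oracle and hypothetical rows; the ranking function name is interpolated into the SQL, so it is checked against a fixed allow-list first.

```python
import sqlite3

# Hypothetical rows: (employee_id, first_name, last_name, department_id, salary)
SAMPLE_EMPLOYEES = [
    (1, "Ann", "Smith", 10, 5000),
    (2, "Bob", "Jones", 10, 5000),  # tied with Ann for the top salary in dept 10
    (3, "Cal", "Brown", 10, 4000),
    (4, "Dee", "Green", 20, 7000),
]

RANKING_FUNCTIONS = {"RANK", "DENSE_RANK", "ROW_NUMBER"}


def make_connection(rows):
    """Create an in-memory employees table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER, first_name TEXT,"
        " last_name TEXT, department_id INTEGER, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def top_earners(conn, ranking_fn):
    """Return the rnk = 1 rows per department for the chosen ranking function."""
    if ranking_fn not in RANKING_FUNCTIONS:
        raise ValueError(f"unsupported ranking function: {ranking_fn}")
    query = f"""
        SELECT first_name, last_name, salary, department_id
        FROM (SELECT e.*,
                     {ranking_fn}() OVER (PARTITION BY department_id
                                          ORDER BY salary DESC) AS rnk
              FROM employees e)
        WHERE rnk = 1
        ORDER BY department_id, last_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = make_connection(SAMPLE_EMPLOYEES)
    for fn in ("RANK", "DENSE_RANK", "ROW_NUMBER"):
        print(fn, top_earners(conn, fn))
    conn.close()
```

With ROW_NUMBER() only one of Ann and Bob survives, and which one is not guaranteed unless a tie-breaker is added to the ORDER BY inside OVER.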
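Create a view (the MAX(salary) expression needs an alias so the join can refer to it):
create view max_salary as select max(salary) as salary, department_id from employees group by department_id;
Then query it, qualifying the columns that exist in both the table and the view:
select a.first_name, a.last_name, a.salary, a.department_id
from employees a, max_salary b
where a.department_id = b.department_id
and a.salary = b.salary;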
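The view-based approach can be tried out as well. This sketch assumes Python's standard sqlite3 module in place of Oracle and hypothetical rows; it aliases MAX(salary) as salary so that b.salary resolves (Oracle rejects an unaliased expression in a view definition) and qualifies the selected columns, which would otherwise be ambiguous between the table and the view. Like RANK(), it returns both employees who tie for the maximum.

```python
import sqlite3

# Hypothetical rows: (employee_id, first_name, last_name, department_id, salary)
SAMPLE_EMPLOYEES = [
    (1, "Ann", "Smith", 10, 5000),
    (2, "Bob", "Jones", 10, 5000),  # tied with Ann for the top salary in dept 10
    (3, "Cal", "Brown", 10, 4000),
    (4, "Dee", "Green", 20, 7000),
]

QUERY = """
SELECT a.first_name, a.last_name, a.salary, a.department_id
FROM employees a, max_salary b
WHERE a.department_id = b.department_id
  AND a.salary = b.salary
ORDER BY a.department_id, a.last_name
"""


def top_earners_via_view(rows):
    """Join employees to a per-department MAX(salary) view."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (employee_id INTEGER, first_name TEXT,"
            " last_name TEXT, department_id INTEGER, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
        # MAX(salary) is aliased as salary so the join can refer to b.salary.
        conn.execute(
            "CREATE VIEW max_salary AS"
            " SELECT MAX(salary) AS salary, department_id"
            " FROM employees GROUP BY department_id"
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_earners_via_view(SAMPLE_EMPLOYEES):
        print(row)
```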

sql query group by statement suggestion

I am learning database queries and want to write the following SQL query against the HR schema in an Oracle database.
Find the number of employees in each salary group. Salary groups are defined as follows:
Group 1: 0k to <5k, 5k to <10k, 10k to <15k, and so on.
What would a possible query look like?
There are seven tables in total:
REGIONS: REGION_ID, REGION_NAME
COUNTRIES: COUNTRY_ID, COUNTRY_NAME, REGION_ID
LOCATIONS: LOCATION_ID, STREET_ADDRESS, POSTAL_CODE, CITY, STATE_PROVINCE, COUNTRY_ID
DEPARTMENTS: DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID
EMPLOYEES: EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, DEPARTMENT_ID
JOB_HISTORY: EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID
JOBS: JOB_ID, JOB_TITLE, MIN_SALARY, MAX_SALARY
I first use a common table expression (CTE) to calculate the different groups with a CASE expression:
WITH CTE AS (
    SELECT employee_id,
           CASE WHEN salary >= 0     AND salary < 5000  THEN '<5k'
                WHEN salary >= 5000  AND salary < 10000 THEN '<10k'
                WHEN salary >= 10000 AND salary < 15000 THEN '<15k'
                ELSE 'UDF'
           END AS SalaryGroup
    FROM employees)
SELECT COUNT(employee_id) AS Cnt, SalaryGroup
FROM CTE
GROUP BY SalaryGroup;
I then select from that CTE to get the count for each calculated salary group.
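A sketch of the CTE approach on hypothetical salaries, assuming Python's standard sqlite3 module in place of Oracle. String labels use single quotes, since double quotes denote identifiers in standard SQL. Only the three groups from the answer are spelled out; anything from 15k up lands in 'UDF'.

```python
import sqlite3

# Hypothetical rows: (employee_id, salary)
SAMPLE_EMPLOYEES = [
    (1, 2500), (2, 4999), (3, 5000), (4, 9000), (5, 12000), (6, 20000),
]

QUERY = """
WITH cte AS (
    SELECT employee_id,
           CASE WHEN salary >= 0     AND salary < 5000  THEN '<5k'
                WHEN salary >= 5000  AND salary < 10000 THEN '<10k'
                WHEN salary >= 10000 AND salary < 15000 THEN '<15k'
                ELSE 'UDF'
           END AS salary_group
    FROM employees
)
SELECT salary_group, COUNT(employee_id) AS cnt
FROM cte
GROUP BY salary_group
"""


def count_by_group(rows):
    """Return {salary_group: employee count} for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_by_group(SAMPLE_EMPLOYEES))
```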
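Alternatively, compute the group number directly, where e = 0 means 0 to <5k, e = 1 means 5k to <10k, and so on:
select trunc(salary / 5000, 0) e,
       count(*)
from employees
group by trunc(salary / 5000, 0)
order by e asc;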
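The TRUNC variant computes a bucket number instead of a label. SQLite has no guaranteed TRUNC function, so this sketch (Python's sqlite3, hypothetical salaries) uses CAST(salary / 5000 AS INTEGER), which truncates toward zero and therefore matches TRUNC for non-negative salaries. Buckets with no employees, such as 15k to <20k here, simply do not appear.

```python
import sqlite3

# Hypothetical rows: (employee_id, salary)
SAMPLE_EMPLOYEES = [
    (1, 2500), (2, 4999), (3, 5000), (4, 9000), (5, 12000), (6, 20000),
]

# CAST(... AS INTEGER) stands in for Oracle's TRUNC(salary / 5000, 0).
QUERY = """
SELECT CAST(salary / 5000 AS INTEGER) AS e,
       COUNT(*) AS cnt
FROM employees
GROUP BY CAST(salary / 5000 AS INTEGER)
ORDER BY e
"""


def salary_buckets(rows):
    """Return (bucket number, employee count) pairs, 5000 per bucket."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def describe(buckets):
    """Turn (bucket, count) pairs into readable 5k-wide labels."""
    return [f"{e * 5}k to <{(e + 1) * 5}k: {cnt}" for e, cnt in buckets]


if __name__ == "__main__":
    for line in describe(salary_buckets(SAMPLE_EMPLOYEES)):
        print(line)
```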