HIVE Group By in Multiple Tables - sql

Two tables - 'salaries' and 'master1'
The salaries is by year, and I can group them to get the sums for each player using
SELECT playerID, sum(salary) as sal
FROM salaries
GROUP BY playerID ORDER BY sal DESC LIMIT 10;
This returns the playerID and sum of salary, but I need the names of the players from the 'master1' table under column 'nameFirst' and 'nameLast'. They have the common column of 'playerID' in both 'master1' and 'salaries' but when I try to run
SELECT master1.nameFirst, master1.nameLast, sum(salary) as sal
FROM salaries, master1
GROUP BY salaries.playerID ORDER BY sal DESC LIMIT 10;
I get the error
Expression not in GROUP BY key 'nameFirst'
I have tried tinkering with it to continue getting errors.
Thanks!

Need to include nameFirst and nameLast in the group by:
SELECT
master1.nameFirst,
master1.nameLast,
sum(salary) as sal
FROM salaries JOIN master1 ON salaries.playID = master1.playerID
GROUP BY master1.nameFirst, master1.nameLast, salaries.playerID
ORDER BY sal DESC LIMIT 10;

First, you would need to use proper explicit JOIN syntax
SELECT
MAX(m.nameFirst) FirstName,
MAX(m.nameLast) LastName,
SUM(s.salary) Salary
FROM master1 m
INNER JOIN salaries s ON m.playerID = s.playerID
GROUP BY m.playerID
Use, master1 table to get the FirstName, LastName and do the JOIN with
salaries table to get the total salary of each player.
For, your current query exception when you are using GROUP BY clause make ensure that the columns/expressions in SELECT statement needs to be aggregate.

Related

basic sql employee

The employee table (EMP) specifies groups by department code, the sum of salaries by group, the average (constant treatment), and the number of people in each group's salary, and is presented below, listed in department code order. I would like to modify the following SQL syntax to look up departments whose average salary exceeds 2800000.
SELECT
DEPT
, SUM(SALARY) 합계
, FLOOR(AVG(SALARY)) 평균
, COUNT(*) 인원수
FROM
EMP
GROUP BY
DEPT
ORDER BY DEPT ASC;
question 1. Conditions that need to be modified
question 2. What should I add to the presented code?
I can't read your aliases so I'll just presume what they mean.
If query - you posted in the question - works OK, then use it as a CTE and select desired data from it:
with data as
(select dept,
sum(salary) sumsal,
floor(avg(salary)) avgsal,
count(*) cnt
from emp
group by dept
)
select *
from data
where avgsal > 2800000;
you can use below sql:
SELECT
DEPT
, SUM(SALARY) 합계
, FLOOR(AVG(SALARY)) 평균
, COUNT(*) 인원수
FROM
EMP
GROUP BY
DEPT having FLOOR(AVG(SALARY) > 2800000
ORDER BY DEPT ASC;
You can filter aggregated result using having
SELECT
DEPT
, SUM(SALARY) 합계
, FLOOR(AVG(SALARY)) 평균
, COUNT(*) 인원수
FROM
EMP
GROUP BY DEPT
HAVING AVG(SALARY) >2800000
ORDER BY DEPT ASC;

Max Salary with a single GroupBy without Joins

Schema for EMPLOYEE
(ID, EMPLOYEENAME, SALARY, ORGANIZATIONID)
Query to Solve: Find employee Names in each organization with Maximum Salary without a Join.
SELECT E.*
FROM EMPLOYEE E,
(SELECT EMP.ORGANIZATIONID, MAX(EMP.SALARY)
FROM EMPLOYEE EMP
GROUP BY EMP.ORGANIZATIONID) MAXSALARY
WHERE MAXSALARY.SALARY =E.SALARY
AND E.ORGANIZATIONID=EMP.ORGANIZATIONID ;
Is there a way to avoid the join? I am using Spark SQL API and joins cause an extra shuffle operation which is expensive. Is there a way to get the employee name while getting the max salary?
Assume you have a single employee in each organization having the max salary
You can use PARTITION BY with Spark SQL as shown below (Although it will require a subquery)
SELECT E.*
FROM
(SELECT EMP.EMPLOYEENAME, EMP.ORGANIZATIONID, EMP.SALARY,
row_number() OVER (PARTITION BY ORGANIZATIONID ORDER BY SALARY DESC) as rank
FROM EMPLOYEE EMP
) AS E
WHERE E.rank=1
Try this:
SELECT P.ORGANIZATIONID, P.EMPLOYEENAME
FROM EMPLOYEE P
WHERE P.SALARY = (SELECT MAX(E.SALARY) FROM EMPLOYEE E WHERE P.ORGANIZATIONID = E.ORGANIZATIONID)
GROUP BY P.ORGANIZATIONID, P.EMPLOYEENAME
Try this:
SELECT EMPLOYEENAME FROM EMPLOYEE
WHERE SALARY IN (SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY ORGANIZATIONID)

To display the name of the department that has the least student count in SQL

Tables:
Department (dept_id,dept_name)
Students(student_id,student_name,dept_id)
I am using Oracle. I have to print the name of that department that has the minimum no. of students. Since I am new to SQL, I am stuck on this problem. So far, I have done this:
select d.department_id,d.department_name,
from Department d
join Student s on s.department_id=d.department_id
where rownum between 1 and 3
group by d.department_id,d.department_name
order by count(s.student_id) asc;
The output is incorrect. It is coming as IT,SE,CSE whereas the output should be IT,CSE,SE! Is my query right? Or is there something missing in my query?
What am I doing wrong?
One of the possibilities:
select dept_id, dept_name
from (
select dept_id, dept_name,
rank() over (order by cnt nulls first) rn
from department
left join (select dept_id, count(1) cnt
from students
group by dept_id) using (dept_id) )
where rn = 1
Group data from table students at first, join table department, rank numbers, take first row(s).
left join are used is used to guarantee that we will check departments without students.
rank() is used in case that there are two or more departments with minimal number of students.
To find the department(s) with the minimum number of students, you'll have to count per department ID and then take the ID(s) with the minimum count.
As of Oracle 12c this is simply:
select department_id
from student
group by department_id
order by count(*)
fetch first row with ties
You then select the departments with an ID in the found set.
select * from department where id in (<above query>);
In older versions you could use RANK instead to rank the departments by count:
select department_id, rank() over (order by count(*)) as rnk
from student
group by department_id
The rows with rnk = 1 would be the department IDs with the lowest count. So you could select the departments with:
select * from department where (id, 1) in (<above query>);

Select records with maximum value

I have a table that is called: customers.
I'm trying to get the name and the salary of the people who have the maximum salary.
So I have tried this:
SELECT name, salary AS MaxSalary
FROM CUSTOMERS
GROUP BY salary
HAVING salary = max(salary)
Unfortunately, I got this error:
Column 'CUSTOMERS.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I know I should add the name column to the group by clause, but I get all the records of the table.
I know that I can do it by:
SELECT name, salary
FROM CUSTOMERS
WHERE salary = (SELECT MAX(salary) FROM CUSTOMERS)
But I want to achieve it by group by and having clauses.
This requirement isn't really suited for a group by and having solution. The easiest way to do so, assuming you're using a modern-insh version of MS SQL Server, is to use the rank window function:
SELECT name, salary
FROM (SELECT name, salary, RANK() OVER (ORDER BY salary DESC) rk
FROM customers) c
WHERE rk = 1
Mureinik's answer is good with rank, but if you didn't want a windowed function for whatever reason, you can just use a CTE or a subquery.
with mxs as (
select
max(salary) max_salary
from
customers
)
select
name
,salary
from
customers cst
join mxs on mxs.max_salary = cst.salary
There was no need to use group by and having clause there, you know. But if you want to use them then query should be
SELECT name, salary
FROM CUSTOMERS
GROUP BY salary
having salary = (select max(salary) from CUSTOMERS)

SQL Server more columns than group by

I have the following example tables
Employee (EmpID, DepID)
Order (OrderID, EMpID, description)
What I'm trying to achieve is to select employees with most orders by department. I'm on it for like 4 hours already and can't find resolution to this perhaps easy problem.
All I get is either number of order by employee or max number of orders by one employee in one department but I'm struggling to get result as:
DepID, EmpID, Number of orders
Here's my solution for you :
WITH Temp AS (
SELECT
emp.EMpID
,emp.DepID
,COUNT(OrderId) nb_order
,ROW_NUMBER() OVER(PARTITION BY emp.DepID ORDER BY COUNT(OrderId) DESC) Ordre
FROM
Order ord
INNER JOIN
Employee emp
ON emp.EmpID = ord.EmpID
GROUP BY
emp.EMpID
,emp.DepID)
SELECT *
FROM Temp
WHERE Ordre = 1
I hope this will help you :)