SQL Server query multiple aggregate columns - sql

I need to write a query in SQL Server to get data like this.
Essentially, it is a GROUP BY on dept, race, and gender, and then
SUM(employees_of_race_by_gender), SUM(employees_of_dept).
I can get the first four columns, but getting the number of employees in each department is proving difficult.
Could you please help me write the query?
All of these details are in the same table, Emp. The columns of Emp are Emp_Number, Race_Name, Gender, Dept.

Your "num_of_emp_in_race" is actually by Gender too
SELECT DISTINCT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
You should probably have this:
COUNT(*) OVER (PARTITION BY Dept, Gender) AS PerDeptGender,
COUNT(*) OVER (PARTITION BY Dept, Race_name) AS PerDeptRace,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS PerDeptRaceGender,
COUNT(*) OVER (PARTITION BY Dept) AS PerDept
Edit: the DISTINCT appears to be applied before the COUNT (which would be odd based on this), so try this instead:
SELECT DISTINCT
*
FROM
(
SELECT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
) foo
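A small runnable sketch of the windowed-count approach above. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; window functions need SQLite 3.25 or newer), and the sample rows are invented.

```python
import sqlite3


def dept_race_gender_counts(rows):
    """Return one row per (Dept, Race_Name, Gender) with group and department head counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emp (Emp_Number INTEGER, Race_Name TEXT, Gender TEXT, Dept TEXT)"
    )
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT DISTINCT
            Dept, Race_Name, Gender,
            COUNT(*) OVER (PARTITION BY Dept, Race_Name, Gender) AS num_of_emp_in_race,
            COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
        FROM Emp
        ORDER BY Dept, Race_Name, Gender
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample data: three employees in HR, one in IT.
sample = [
    (1, "A", "F", "HR"),
    (2, "A", "F", "HR"),
    (3, "B", "M", "HR"),
    (4, "A", "M", "IT"),
]

for row in dept_race_gender_counts(sample):
    print(row)
```

Each output row carries both the count for its own (Dept, Race_Name, Gender) group and the count for the whole department.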

Since the two totals you're looking for are based on different aggregations, you need to calculate them separately and join the results. In such cases I first build the individual selects to see their results, which makes it easy to catch errors early:
SELECT Dept, Gender, Race_Name, COUNT(*) AS num_of_emp_in_race
FROM Emp
GROUP BY Dept, Gender, Race_Name
SELECT Dept, COUNT(*) AS num_of_emp_in_dept
FROM Emp
GROUP BY Dept
Afterwards, joining those two is pretty straightforward:
SELECT *
FROM ( first statement here ) as by_race
JOIN ( second statement here ) as by_dept ON (by_race.Dept = by_dept.Dept)
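For illustration, here is the join written out in full and run against invented sample data. This is only a sketch: it uses Python's built-in sqlite3 module in place of SQL Server, and the explicit column list in the outer SELECT is my own choice rather than part of the answer.

```python
import sqlite3


def counts_via_join(rows):
    """Join per-(Dept, Race_Name, Gender) counts with per-Dept counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emp (Emp_Number INTEGER, Race_Name TEXT, Gender TEXT, Dept TEXT)"
    )
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT by_race.Dept, by_race.Race_Name, by_race.Gender,
               by_race.num_of_emp_in_race, by_dept.num_of_emp_in_dept
        FROM (SELECT Dept, Race_Name, Gender, COUNT(*) AS num_of_emp_in_race
              FROM Emp
              GROUP BY Dept, Race_Name, Gender) AS by_race
        JOIN (SELECT Dept, COUNT(*) AS num_of_emp_in_dept
              FROM Emp
              GROUP BY Dept) AS by_dept
          ON by_race.Dept = by_dept.Dept
        ORDER BY by_race.Dept, by_race.Race_Name, by_race.Gender
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample data: three employees in HR, one in IT.
sample = [
    (1, "A", "F", "HR"),
    (2, "A", "F", "HR"),
    (3, "B", "M", "HR"),
    (4, "A", "M", "IT"),
]

for row in counts_via_join(sample):
    print(row)
```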

Related

Analytic query trying to solve

I'm solving the following task with analytic functions and I'm stuck.
Task: Write a query that shows the latest hired employee per department. In case of ties, use the lowest employee ID.
select a.EMPLOYEE_ID,
a.DEPARTMENT_ID,
a.FIRST_NAME,
a.LAST_NAME,
a.HIRE_DATE,
a.JOB_ID
from (select ROW_NUMBER() over (PARTITION by department_id order by hire_date desc)
from hr.EMPLOYEES a) A
where A = 1 ;
You need to include the columns you want to select in the outer query in the SELECT clause of the inner query and need to give an alias to the ROW_NUMBER computed value:
select EMPLOYEE_ID,
DEPARTMENT_ID,
FIRST_NAME,
LAST_NAME,
HIRE_DATE,
JOB_ID
from (
select EMPLOYEE_ID,
DEPARTMENT_ID,
FIRST_NAME,
LAST_NAME,
HIRE_DATE,
JOB_ID,
ROW_NUMBER() over (PARTITION by department_id order by hire_date desc) AS rn
from hr.EMPLOYEES
)
where rn = 1 ;
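As a runnable illustration of the ROW_NUMBER pattern above, the sketch below uses Python's built-in sqlite3 module instead of Oracle (an assumption for portability; SQLite 3.25 or newer is needed for window functions). The table is reduced to three columns and the rows are invented. The sample data has no tied hire dates, so the tie-breaking rule is not exercised here.

```python
import sqlite3


def latest_hire_per_department(rows):
    """Return the most recently hired employee of each department."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, hire_date TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    query = """
        SELECT employee_id, department_id, hire_date
        FROM (
            SELECT employee_id, department_id, hire_date,
                   ROW_NUMBER() OVER (PARTITION BY department_id
                                      ORDER BY hire_date DESC) AS rn
            FROM employees
        ) AS ranked
        WHERE rn = 1
        ORDER BY department_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented rows; ISO-formatted date strings sort chronologically as text.
sample = [
    (100, 10, "2001-01-13"),
    (101, 10, "2005-09-21"),
    (102, 20, "2003-06-17"),
    (103, 20, "1999-02-01"),
]

for row in latest_hire_per_department(sample):
    print(row)
```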
You still need to address the second part of the question:
In case of ties, use the lowest employee ID.
However, since this appears to be a homework question, I'll leave that for you to solve.

Need help on sub-query of Spark-SQL Databricks

I have the SQL below and get the dataset below as its result, but I want to display only the one Open status record that has the MIN date.
SELECT distinct o.svc_ord_nbr AS SVC_ORD_NBR,
o.svc_ord_stat_nm AS SVC_ORD_STAT_NM,
min(t.start_date_est) AS STRT_DT_EST, t.status_text
FROM A o inner join B t on t.ticket=o.notif_nbr
and o.svc_ord_nbr in ('021519_574819','110714_246149')
Group by o.svc_ord_nbr, o.svc_ord_stat_nm, t.status_text
The Result dataset looks like this:
I want only the first row, which has the MIN of STRT_DT_EST.
Thanks in advance.
Have you tried window functions for this use case?
spark.sql(
"""
|SELECT a.*,
|ROW_NUMBER() OVER(PARTITION BY dept ORDER BY salary) as rn,
|RANK() OVER(PARTITION BY dept ORDER BY salary) as rank,
|DENSE_RANK() OVER(PARTITION BY dept ORDER BY salary) as dense_rank,
|PERCENT_RANK() OVER(PARTITION BY dept ORDER BY salary) as percent_rank,
|NTILE(3) OVER(PARTITION BY dept ORDER BY salary) as ntile
|FROM employee a
|""".stripMargin).show(false)

Column must appear in group by or aggregate function in nested query

I have the following table.
Fights (fight_year, fight_round, winner, fid, city, league)
I am trying to query the following:
For each year that appears in the Fights table, find the city that held the most fights. For example, if in year 1992, Jersey held more fights than any other city did, you should print out (1992, Jersey)
Here's what I have so far but I keep getting the following error. I am not sure how I should construct my group by functions.
ERROR: column, 'ans.fight_round' must appear in the GROUP BY clause or be used in an aggregate function. Line 3 from (select *
select fight_year, city, max(*)
from (select *
from (select *
from fights as ans
group by (fight_year)) as l2
group by (ans.city)) as l1;
In Postgres, I would recommend aggregation and distinct on:
select distinct on (fight_year) fight_year, city, count(*) cnt
from fights
group by fight_year, city
order by fight_year, count(*) desc
This counts how many fights each city had each year, and retains the city with the most fights per year.
If you want to allow ties, then use window functions:
select fight_year, city, cnt
from (
select fight_year, city, count(*) cnt,
rank() over(partition by fight_year order by count(*) desc) rn
from fights
group by fight_year, city
) f
where rn = 1
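The tie-aware variant can be tried out with the sketch below. It runs on Python's built-in sqlite3 module rather than Postgres (an assumption for portability; SQLite 3.25 or newer is required), computes the per-city counts in a CTE before ranking them (a slight restructuring of the query above), and uses an invented, reduced fights table.

```python
import sqlite3


def top_cities_per_year(rows):
    """Return every (fight_year, city, cnt) whose count is the highest for that year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fights (fid INTEGER, fight_year INTEGER, city TEXT)")
    conn.executemany("INSERT INTO fights VALUES (?, ?, ?)", rows)
    query = """
        WITH counts AS (
            SELECT fight_year, city, COUNT(*) AS cnt
            FROM fights
            GROUP BY fight_year, city
        )
        SELECT fight_year, city, cnt
        FROM (
            SELECT fight_year, city, cnt,
                   RANK() OVER (PARTITION BY fight_year ORDER BY cnt DESC) AS rn
            FROM counts
        ) AS ranked
        WHERE rn = 1
        ORDER BY fight_year, city
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented rows: Jersey leads 1992 outright; Reno and Vegas tie in 1993.
sample = [
    (1, 1992, "Jersey"),
    (2, 1992, "Jersey"),
    (3, 1992, "Vegas"),
    (4, 1993, "Vegas"),
    (5, 1993, "Reno"),
]

for row in top_cities_per_year(sample):
    print(row)
```

Because RANK assigns the same rank to equal counts, both tied cities are returned for 1993.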
Although the window function approach by @GMB is the easiest way, you can try this alternative as well:
select city, fight_year
from fights
group by city, fight_year
having count(*) = sum(case when fid is not null then 1 end)

Running Order by on column alias

I am trying to run following SQL query on northwind database :
SELECT * FROM (
SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY Joinning DESC) rownum,
LastName, Country, HireDate AS Joinning
FROM Employees
WHERE Region IS NOT NULL
) r
It's giving me the error :
Invalid column name 'Joinning'.
The row number is required for pagination.
Can anybody please suggest how I can sort on the Joinning alias with the generated row number?
--A possible workaround
I just figured out a workaround; please tell me if anything is wrong or needs changing:
SELECT ROW_NUMBER() OVER (ORDER BY Joinning DESC) rownum,* FROM (
SELECT
LastName, Country, HireDate AS Joinning
FROM Employees
WHERE Region IS NOT NULL
) r
--To put a further WHERE clause on the row number (what I wanted to do for pagination):
With myres as(
SELECT ROW_NUMBER() OVER (ORDER BY Joinning DESC) rownum,* FROM (
SELECT
LastName, Country, HireDate AS Joinning
FROM Employees
WHERE Region IS NOT NULL
) a
) Select * from myres where myres.rownum > 0 and myres.rownum <= 0+20
Try
SELECT * FROM (
SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY HireDate DESC) rownum,
LastName, Country, HireDate AS Joinning
FROM Employees
WHERE Region IS NOT NULL
) r
I hope you have a Joinning column in your table.
The ORDER BY clause is usually given at the end of the query, like this:
SELECT * FROM (
SELECT DISTINCT ROW_NUMBER() rownum,
LastName, Country, HireDate AS Joinning)
FROM Employees
WHERE Region IS NOT NULL ORDER BY Joinning DESC)
Hope this helps you!
Use the original name of the field, HireDate; that will work just fine:
SELECT * FROM (
SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY HireDate DESC) rownum,
LastName, Country, HireDate AS Joinning
FROM Employees
WHERE Region IS NOT NULL
) r
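The pagination pattern discussed in this thread can be sketched as follows. This is a hedged illustration only: it runs on Python's built-in sqlite3 module instead of SQL Server (SQLite 3.25 or newer is required), the Employees table is cut down to four columns with invented rows, and the helper name page_of_employees is hypothetical. The window function orders by the original column HireDate, since the alias is not visible at that point.

```python
import sqlite3


def page_of_employees(rows, offset, page_size):
    """Return rows numbered offset+1 .. offset+page_size, newest hire first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employees (LastName TEXT, Country TEXT, Region TEXT, HireDate TEXT)"
    )
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH myres AS (
            SELECT ROW_NUMBER() OVER (ORDER BY HireDate DESC) AS rownum,
                   LastName, Country, HireDate AS Joinning
            FROM Employees
            WHERE Region IS NOT NULL
        )
        SELECT rownum, LastName, Country, Joinning
        FROM myres
        WHERE rownum > ? AND rownum <= ?
        ORDER BY rownum
    """
    result = conn.execute(query, (offset, offset + page_size)).fetchall()
    conn.close()
    return result


# Invented rows; the employee with a NULL region is filtered out.
sample = [
    ("Davolio", "USA", "WA", "1992-05-01"),
    ("Fuller", "USA", "WA", "1992-08-14"),
    ("Buchanan", "UK", None, "1993-10-17"),
    ("Callahan", "USA", "WA", "1994-03-05"),
]

print(page_of_employees(sample, 0, 2))  # first page
print(page_of_employees(sample, 2, 2))  # second page
```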

How to include column in resultset without including it in group by clause or performing any aggregation on it?

Given a list containing City, EmpName and Salary, sorted by city and EmpName, how to output each EmpName and Salary with the total Salary per City?
Here is what I have got:
select EmpName, sum(Salary) from table group by province;
But it gives me an error because I have not included EmpName in the GROUP BY clause and am not performing any aggregation on it. How can I achieve the desired result?
If what you want is the sum of the salaries in the city for each employee, then you have two options. The first should work in almost any database:
select EmpName, tcity.CitySalary
from t join
(select City, sum(Salary) as CitySalary
from t
group by city
) tcity
on tcity.city = t.city
The second way is to use a window function. Notably, this doesn't work on MySQL before version 8.0:
select EmpName, sum(salary) over (partition by city)
from t
from t
SELECT t.City, t.EmpName, t.Salary, x.city_salary
FROM table t,
(SELECT City, SUM(Salary) as city_salary
FROM table
GROUP BY City) x
WHERE x.city = t.city;
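To compare the join and window-function approaches side by side, here is a small sketch using Python's built-in sqlite3 module (an assumption for portability; SQLite 3.25 or newer is needed for the window function). The table name t follows the answer above, the rows are invented, and Salary is added to the select list so the per-employee value is visible next to the city total.

```python
import sqlite3


def salaries_with_city_total(rows):
    """Return (join_result, window_result); each row is (EmpName, Salary, CitySalary)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (City TEXT, EmpName TEXT, Salary INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    join_query = """
        SELECT t.EmpName, t.Salary, tcity.CitySalary
        FROM t
        JOIN (SELECT City, SUM(Salary) AS CitySalary
              FROM t
              GROUP BY City) AS tcity
          ON tcity.City = t.City
        ORDER BY t.City, t.EmpName
    """
    window_query = """
        SELECT EmpName, Salary, SUM(Salary) OVER (PARTITION BY City) AS CitySalary
        FROM t
        ORDER BY City, EmpName
    """
    join_result = conn.execute(join_query).fetchall()
    window_result = conn.execute(window_query).fetchall()
    conn.close()
    return join_result, window_result


# Invented rows: two employees in Oslo, one in Rome.
sample = [
    ("Oslo", "Ann", 100),
    ("Oslo", "Bob", 200),
    ("Rome", "Cy", 50),
]

join_rows, window_rows = salaries_with_city_total(sample)
print(join_rows)
print(window_rows)
```

Both queries keep one row per employee, which is why neither needs EmpName in a GROUP BY clause.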