How to get the MAX result even if the MAX is different - sql

I have a query that pulls language knowledge. Each employee has a plan year, but not every employee completes a plan each year, so I use MAX on the plan year to get the most recent one. One of the criteria is whether the employee is willing to move overseas. The problem is that the query returns both the most recent YES and the most recent NO, when I just need the most recent plan year, whether the answer is yes or no. I am having difficulty troubleshooting this. The code is as follows:
SELECT Employee_ID, Accept_International_Assignment, MAX(Plan_Year) AS Expr1
FROM dbo.v_sc08_CD_Employee_Availabilities
GROUP BY Employee_ID, Accept_International_Assignment

I suspect this will be more efficient than the accepted answer, at scale...
;WITH x AS
(
SELECT Employee_ID, Accept_International_Assignment, Plan_Year,
rn = ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY Plan_Year DESC)
FROM dbo.v_sc08_CD_Employee_Availabilities -- who comes up with these names?
)
SELECT Employee_ID, Accept_International_Assignment, Plan_Year
FROM x WHERE rn = 1;
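To see the ROW_NUMBER() approach at work, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The table name `availabilities` and the sample rows are made up for illustration, and it assumes an SQLite build with window functions (3.25 or later); SQL Server behavior should be equivalent for this query, but that is not verified here.

```python
import sqlite3

# Hypothetical sample data: table name and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE availabilities ("
    "Employee_ID INTEGER, Accept_International_Assignment TEXT, Plan_Year INTEGER)"
)
conn.executemany(
    "INSERT INTO availabilities VALUES (?, ?, ?)",
    [
        (1, "Yes", 2010),
        (1, "No", 2012),   # employee 1's most recent answer is No
        (2, "No", 2009),
        (2, "Yes", 2011),  # employee 2's most recent answer is Yes
        (3, "Yes", 2012),
    ],
)

# Number each employee's rows from newest to oldest plan year, then keep row 1.
latest = conn.execute(
    """
    WITH x AS (
        SELECT Employee_ID, Accept_International_Assignment, Plan_Year,
               ROW_NUMBER() OVER (
                   PARTITION BY Employee_ID ORDER BY Plan_Year DESC
               ) AS rn
        FROM availabilities
    )
    SELECT Employee_ID, Accept_International_Assignment, Plan_Year
    FROM x
    WHERE rn = 1
    ORDER BY Employee_ID
    """
).fetchall()

print(latest)
```

Each employee comes back exactly once, with whichever answer belongs to their latest plan year.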

SELECT a.Employee_ID, a.Accept_International_Assignment, a.Plan_Year
FROM dbo.v_sc08_CD_Employee_Availabilities a
INNER JOIN (SELECT Employee_ID, MAX(Plan_Year) maxPlanYear
from dbo.v_sc08_CD_Employee_Availabilities
GROUP BY Employee_ID) m
ON a.Plan_Year = m.maxPlanYear AND a.Employee_ID = m.Employee_ID

I'm not sure whether you want only the most recent decision and year, as Raphael posted, or whether you want both the yeses and the nos, each shown with the max plan year for that employee.
Here is a query for the latter: it returns the yeses and nos, but the plan year is always the max for the employee.
select main.Employee_ID, Accept_International_Assignment, Expr1
from (
SELECT Employee_ID, Accept_International_Assignment
FROM #v_sc08_CD_Employee_Availabilities
GROUP BY Employee_ID, Accept_International_Assignment
) main
inner join
(
select Employee_ID, MAX(Plan_Year) as Expr1
from #v_sc08_CD_Employee_Availabilities
group by Employee_ID) empPlanYear
on main.Employee_ID = empPlanYear.Employee_ID

You need a subquery on the max() and join against that.
SQL Group by & Max

Related

BIGQUERY - LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join

I am trying to find the 5th highest salary in BigQuery using this query, but it gives me this error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I believe this is the right SQL for this question, but something is not working in BigQuery. Can anybody help me with this? :)
select concat(first_name, ' ', last_name) as Name, salary
from `table` w1
where 4 = (select count(distinct(salary))
from `table` w2
where w2.salary > w1.salary)
Your query appears to be returning rows that have four larger salaries. That would be the fifth largest salary. So, just use dense_rank():
select w.*
from (select w.*,
dense_rank() over (order by salary desc) as seqnum
from `table` w
) w
where seqnum = 5;
Below is for BigQuery Standard SQL
Using functions like DENSE_RANK() and ROW_NUMBER() on large volumes of data usually ends with an error such as Resource Limits Exceeded.
Depending on your real use case, you can consider the alternatives below:
#standardSQL
SELECT *
FROM `project.dataset.table`
ORDER BY salary DESC
LIMIT 1 OFFSET 4
OR
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY salary DESC LIMIT 5)[SAFE_OFFSET(4)]
FROM `project.dataset.table` t
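Both versions return the row in fifth position when sorted by salary descending. Note that, unlike DENSE_RANK(), they count tied salaries as separate positions, so the results differ when salaries repeat.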
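The difference between ranking and offsetting only shows up with tied salaries. The sketch below illustrates it with SQLite from Python rather than BigQuery; the `staff` table and its rows are hypothetical, and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical data with one tie at the top (two people earn 900).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("a", 900), ("b", 900), ("c", 800), ("d", 700),
     ("e", 600), ("f", 500), ("g", 400)],
)

# DENSE_RANK: ties share a rank, so the distinct salaries
# 900, 800, 700, 600, 500 make 500 the fifth highest.
by_rank = conn.execute(
    """
    SELECT name, salary
    FROM (SELECT name, salary,
                 DENSE_RANK() OVER (ORDER BY salary DESC) AS seqnum
          FROM staff)
    WHERE seqnum = 5
    """
).fetchall()

# LIMIT/OFFSET: every row takes a position, so the rows
# 900, 900, 800, 700, 600 make 600 the fifth row.
by_offset = conn.execute(
    "SELECT name, salary FROM staff ORDER BY salary DESC LIMIT 1 OFFSET 4"
).fetchall()

print(by_rank, by_offset)
```

Which answer is "right" depends on whether the fifth highest salary means the fifth distinct value or the fifth person in the ordering.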

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM [sales_detail] [s]
GROUP BY
[timeframe]
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- take the count from the subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the count in the first subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen, then total those per-timeframe counts with a window sum:
select timeframe, sum(saleqty) as total_qty,
count(distinct employee) as employee_count1,
-- inner sum: first-seen employees in this timeframe; outer sum ... over (): total across all timeframes
sum(sum( (seqnum = 1)::int )) over () as employee_count2,
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.
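One way to get the overall distinct count on every grouped row is to flag each employee exactly once (the row numbered 1 within that employee) and then sum the per-group flag counts with a window function. The sketch below tries this on the question's sample data, trimmed to the three columns the query needs, using SQLite from Python. It assumes SQLite 3.25 or later and uses a CASE expression in place of the PostgreSQL-style `::int` cast.

```python
import sqlite3

# The question's sample rows, reduced to employee, timeframe, saleqty.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_detail (employee TEXT, timeframe TEXT, saleqty REAL)"
)
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("Wendy", "Afternoon", 1), ("Wendy", "Morning", 5), ("Wendy", "Morning", 6),
        ("Dexter", "Mid", 2.5), ("Jennifer", "Morning", 2.75), ("Lila", "Morning", 3.75),
        ("Rita", "Mid", 1), ("Tony", "Mid", 2), ("Tony", "Morning", 6),
        ("Mike", "Mid", 1.5), ("Logan", "Morning", 6.25), ("Blake", "Afternoon", 0.5),
    ],
)

result = conn.execute(
    """
    SELECT timeframe,
           SUM(saleqty) AS total_qty,
           COUNT(DISTINCT employee) AS employee_count1,
           -- inner SUM counts first-seen rows per timeframe,
           -- outer SUM ... OVER () adds those counts across all groups
           SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)) OVER ()
               AS employee_count2
    FROM (SELECT sd.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY employee ORDER BY timeframe
                 ) AS seqnum
          FROM sales_detail sd) sd
    GROUP BY timeframe
    ORDER BY timeframe
    """
).fetchall()

for row in result:
    print(row)
```

The per-timeframe distinct counts are 2, 4 and 5, which add up to 11 because Wendy and Tony appear in two timeframes each; the window sum over the first-seen flags gives 9 on every row.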

Effective Date Employee Entered Pay Group

I need to find the effective date when the employee entered the pay group. It either occurred at the hire date, the rehire date, or Transfer date, whichever is latest. I think what I want to do is create a temp table of most recent effective dates where C1.ACTION=('XFR') AND C1.PAYGROUP=A.PAYGROUP, when the associate is not in that table, give me most recent hire date.
A is Top of Stack Employee Data
B is Top of Stack Personal Data
C is entire employee record
Most Recent Hire Date is
CASE WHEN A.HIRE_DT<=A.REHIRE_DT THEN A.REHIRE_DT
ELSE A.HIRE_DT END MOST_REC_HIREDT
FYI I know this query is really messed up, that's why I'm asking for help.
SELECT DISTINCT
A.EMPLID
A.FIRST_NAME||' '||A.LAST_NAME WORKERNAME,
CASE
WHEN(Select Max(C1.EFFDT) FROM JOB C1
WHERE (C.EMPLID=C1.EMPLID
AND C1.ACTION=('TAF')
AND C1.PAYGROUP=A.PAYGROUP
AND C1.EFFDT>=(CASE WHEN A.HIRE_DT<A.REHIRE_DT THEN =A.REHIRE_DT
ELSE A.HIRE_DT END MR_HIRE_DT)))
WHEN A.EMPLID NOT IN JOB C1
THEN (CASE WHEN A.HIRE_DT<=A.REHIRE_DT
THEN A.REHIRE_DT
ELSE A.HIRE_DT END MR_HIRE_DT2)
ELSE 'Null' END EFFDT,
A.PAYGROUP
FROM EMPLOYEES A, PERSONAL_DATA B, JOB C
WHERE
A.EMPLID=B.EMPLID
AND
B.EMPLID=C.EMPLID
AND
A.PAYGROUP=C.PAYGROUP
AND
C.EMPL_STATUS in ('A','L','P','S')
It really is important to use ANSI join syntax as it aids (a lot) in working through the logic of how the tables relate. Here we only have 2 tables but in the example query there are 4 table aliases in use (A, B, C and C1). Additionally it helps to use table aliases that relate to the table's name such as E for Employee, J for Job.
What you are seeking is "the latest" date from table JOB, and an extremely useful function row_number() can be used for this. It is used in conjunction with an over() clause which contains a partition by (which is a little similar to group by) and an order by. When ordered by date descending then the row number is 1 for the most recent date (per employee due to the partition used). So, if we filter the subquery below by is_latest = 1 we get one row per employee with the latest effective date. Note this also removes the need to use select distinct now.
SELECT
E.EMPLID
, (E.FIRST_NAME || ' ' || E.LAST_NAME) WORKERNAME
, J.EFFDT PAYGROUP_EFFDT
, E.PAYGROUP
FROM EMPLOYEES E
INNER JOIN (
SELECT
JOB.*
, ROW_NUMBER() OVER (PARTITION BY EMPLID
ORDER BY EFFDT DESC) AS is_latest
FROM JOB
WHERE EMPL_STATUS IN ('A','L','P','S')
) J ON E.EMPLID=J.EMPLID AND J.is_latest = 1
I may be over-simplifying the task here, as I don't fully understand how we get to the dates in question. But here is what I am doing:
get the greater of hire_dt and rehire_dt from the employee record
get the job dates for the employee
from these intermediate results, get the latest date per employee
The query:
select emplid, max(dt)
from
(
select emplid, greatest(nvl(hire_dt,rehire_dt),nvl(rehire_dt,hire_dt)) as dt from employees
union all
select emplid, effdt as dt from job where action = 'TAF' and empl_status in ('A','L','P','S')
)
group by emplid
order by emplid;
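The UNION ALL plus MAX idea can be tried out on a few rows. This is only a sketch in SQLite driven from Python, not Oracle: the sample employees and job rows are invented, dates are stored as ISO text so that string comparison matches date order, and SQLite's two-argument MAX() with COALESCE() stands in for Oracle's GREATEST() and NVL().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (emplid TEXT, hire_dt TEXT, rehire_dt TEXT);
    CREATE TABLE job (emplid TEXT, effdt TEXT, action TEXT, empl_status TEXT);

    -- Hypothetical rows for illustration only.
    INSERT INTO employees VALUES
        ('E1', '2015-03-01', NULL),          -- never rehired, never transferred
        ('E2', '2012-06-15', '2018-01-10'),  -- rehired, later transferred
        ('E3', '2016-09-01', NULL);          -- hired, later transferred

    INSERT INTO job VALUES
        ('E2', '2019-04-01', 'TAF', 'A'),
        ('E3', '2017-02-01', 'TAF', 'A'),
        ('E3', '2020-05-01', 'PAY', 'A');    -- not a transfer, so ignored
    """
)

result = conn.execute(
    """
    SELECT emplid, MAX(dt) AS effdt
    FROM (
        -- later of hire and rehire date, tolerating a NULL in either column
        SELECT emplid,
               MAX(COALESCE(hire_dt, rehire_dt),
                   COALESCE(rehire_dt, hire_dt)) AS dt
        FROM employees
        UNION ALL
        -- transfer dates from the job history
        SELECT emplid, effdt AS dt
        FROM job
        WHERE action = 'TAF' AND empl_status IN ('A', 'L', 'P', 'S')
    )
    GROUP BY emplid
    ORDER BY emplid
    """
).fetchall()

print(result)
```

An employee with no transfer rows falls back to the hire or rehire date, and a transfer wins only when it is later than both.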

ORA-00979: not a GROUP BY expression While trying to use MAX function [duplicate]

When I tried to add emp_fname and emp_lname to my SELECT statement, I kept getting the NOT A GROUP BY EXPRESSION error.
The thing is, when I add those two columns to the GROUP BY clause, I get unwanted (quite redundant) query results.
Any suggestions?
select lgemployee.dept_num, emp_fname,emp_lname, max(sal_amount) as
HighestSalary
from lgsalary_history inner join lgemployee on lgsalary_history.EMP_NUM = lgemployee.EMP_NUM
group by lgemployee.dept_num;
Add the non-aggregated columns to the GROUP BY clause. It may also require a left or right outer join, depending on your data.
select lgemployee.dept_num, emp_fname,emp_lname, max(sal_amount) as
HighestSalary
from lgsalary_history inner join lgemployee on lgsalary_history.EMP_NUM = lgemployee.EMP_NUM
group by lgemployee.dept_num,emp_fname,emp_lname;
But I wanted to display the highest sal_amount among the employees in each department. If I remove emp_fname and emp_lname, I get 8 rows (one per department), but if I add them as you suggested, I get 363 rows, which is not what I wanted!
It's hard to tell what your schema is like. Is sal_amount in lgsalary_history or lgemployee? (I'm guessing the former, but it's not clear as you don't use table aliases.)
With this in mind, you probably want something like the following:
SELECT dept_num, emp_fname, emp_lname, sal_amount FROM (
SELECT lge.dept_num, lge.emp_fname, lge.emp_lname
, lgs.sal_amount
, RANK() OVER ( PARTITION BY lge.dept_num ORDER BY lgs.sal_amount DESC) rn
FROM lgemployee lge INNER JOIN lgsalary_history lgs
ON lge.emp_num = lgs.emp_num
) WHERE rn = 1;
Now there is an added difficulty here, namely that if lgsalary_history is what I think it is, it will store past salaries in addition to current salaries. And it's just possible, even if unlikely, that a particular employee's salary could be cut. And the above query doesn't take that into account. So let's assume your salary history is dated, and there is a column dt in lgsalary_history that will allow us to determine the most recent salary for a given employee.
SELECT dept_num, emp_fname, emp_lname, sal_amount FROM (
SELECT dept_num, emp_fname, emp_lname, sal_amount
, RANK() OVER ( PARTITION BY dept_num ORDER BY sal_amount DESC ) AS rn
FROM (
SELECT e.dept_num, e.emp_fname, e.emp_lname, s.sal_amount, ROW_NUMBER() OVER ( PARTITION BY e.emp_num ORDER BY s.dt DESC ) AS rn
FROM lgemployee e INNER JOIN lgsalary_history s
ON e.emp_num = s.emp_num
) WHERE rn = 1
) WHERE rn = 1;
This will return all of the employees in a given department with the highest salary (so if there is more than one, it will return two, or three, etc.).
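To make the pay-cut scenario concrete, here is a small sketch in SQLite driven from Python. The sample rows are invented, the `dt` column is the assumed date column mentioned above, and the inner and outer aliases are renamed to `recency` and `sal_rank` (both hypothetical names) so the two filters are easy to tell apart. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lgemployee (emp_num INTEGER, emp_fname TEXT,
                             emp_lname TEXT, dept_num INTEGER);
    CREATE TABLE lgsalary_history (emp_num INTEGER, sal_amount INTEGER, dt TEXT);

    -- Hypothetical rows for illustration only.
    INSERT INTO lgemployee VALUES
        (1, 'Ann', 'Lee', 10), (2, 'Bob', 'Ray', 10), (3, 'Cy', 'Fox', 20);
    INSERT INTO lgsalary_history VALUES
        (1, 90000, '2019-01-01'),
        (1, 70000, '2020-01-01'),  -- pay cut: Ann's current salary is 70000
        (2, 80000, '2020-01-01'),
        (3, 50000, '2018-01-01'),
        (3, 60000, '2020-01-01');
    """
)

top_current = conn.execute(
    """
    SELECT dept_num, emp_fname, emp_lname, sal_amount
    FROM (
        SELECT dept_num, emp_fname, emp_lname, sal_amount,
               RANK() OVER (
                   PARTITION BY dept_num ORDER BY sal_amount DESC
               ) AS sal_rank
        FROM (
            -- recency = 1 marks each employee's most recent salary row
            SELECT e.dept_num, e.emp_fname, e.emp_lname, s.sal_amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY e.emp_num ORDER BY s.dt DESC
                   ) AS recency
            FROM lgemployee e
            INNER JOIN lgsalary_history s ON e.emp_num = s.emp_num
        )
        WHERE recency = 1
    )
    WHERE sal_rank = 1
    ORDER BY dept_num
    """
).fetchall()

print(top_current)
```

Without the recency filter, Ann's old 90000 row would win department 10 even though Bob currently earns more.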

T-SQL: Update first row of recordset

I have a query (A) that can return multiple rows in date order:
SELECT encounter_id, department_id, effective_time
FROM adt
WHERE event_type IN (1,3,7)
ORDER BY effective_time
I have another query (B) that returns a single row:
SELECT encounter_id, department_id, arrival_time
FROM ed
WHERE event_type = 50
I would like to join the query B to query A, in such a way that query B's single row will be associated with query A's first record.
I realize that I could do this with a CURSOR, but I was hoping to use T-SQL row_number() function.
Not sure if I got the question right.
Let me know if the solution below is different from what you were expecting.
SELECT *
FROM
(
SELECT TOP 1
encounter_id, department_id, effective_time
FROM adt
WHERE event_type IN (1,3,7)
ORDER BY effective_time
)adt1,
(
SELECT encounter_id, department_id, arrival_time
FROM ed
WHERE event_type = 50
) ed1
then you can join both the tables as per your need, using WHERE clause
Regards,
Niyaz
I found my answer:
row_number() OVER (PARTITION BY encounter_id ORDER BY encounter_id, effective_time) row.
Unfortunately, the database has data-quality issues that prevent me from approaching the solution this way.
Thanks for your assistance.
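For readers whose data is clean enough, the row_number() idea could look roughly like the sketch below, run in SQLite from Python rather than SQL Server. It assumes the two queries are matched on encounter_id, which the question does not state, and the sample rows are invented. SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE adt (encounter_id INTEGER, department_id INTEGER,
                      effective_time TEXT, event_type INTEGER);
    CREATE TABLE ed (encounter_id INTEGER, department_id INTEGER,
                     arrival_time TEXT, event_type INTEGER);

    -- Hypothetical rows for illustration only.
    INSERT INTO adt VALUES
        (100, 5, '2012-01-01 08:30', 1),
        (100, 7, '2012-01-01 12:00', 3),
        (100, 9, '2012-01-02 09:00', 7);
    INSERT INTO ed VALUES (100, 5, '2012-01-01 08:10', 50);
    """
)

rows = conn.execute(
    """
    SELECT a.encounter_id, a.department_id, a.effective_time, e.arrival_time
    FROM (
        -- query A, with its rows numbered in date order per encounter
        SELECT encounter_id, department_id, effective_time,
               ROW_NUMBER() OVER (
                   PARTITION BY encounter_id ORDER BY effective_time
               ) AS rn
        FROM adt
        WHERE event_type IN (1, 3, 7)
    ) a
    LEFT JOIN (
        -- query B
        SELECT encounter_id, arrival_time FROM ed WHERE event_type = 50
    ) e
      ON e.encounter_id = a.encounter_id  -- assumed join key
     AND a.rn = 1                         -- attach B only to A's first row
    ORDER BY a.effective_time
    """
).fetchall()

for row in rows:
    print(row)
```

Only the earliest row of query A carries the arrival time; the later rows keep NULL in that column.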