Oracle 12c - Efficient way to join max date record - sql

I have the following query, which joins a table to the most recent record for a given EMPLOYEE_ID. Is there a more efficient/faster way of retrieving the most recent record, and what would be the best way?
SELECT * FROM EMPLOYEE
WHERE NOT EXISTS (
SELECT 1
FROM EMPLOYEE D
JOIN EMPLOYEE_HISTORY E
ON E.EMPLOYEE_ID = D.EMPLOYEE_ID
AND E.CREATE_DATE IN (SELECT MAX(CREATE_DATE)
FROM EMPLOYEE_HISTORY
WHERE EMPLOYEE_ID = D.EMPLOYEE_ID)
)
When I compared the explain plan to the following query it seems the below way is MORE costly.
SELECT *
FROM EMPLOYEE
WHERE NOT EXISTS
(SELECT 1
FROM EMPLOYEE D
JOIN (
SELECT E.*
FROM EMPLOYEE_HISTORY E
INNER JOIN (
SELECT EMPLOYEE_ID
, MAX(CREATE_DATE) max_date
FROM EMPLOYEE_HISTORY E2
GROUP BY EMPLOYEE_ID
) EE
ON EE.EMPLOYEE_ID = E.EMPLOYEE_ID
AND EE.max_date = E.CREATE_DATE
) A
ON A.EMPLOYEE_ID = D.EMPLOYEE_ID
AND ROWNUM = 1)
So does that mean it is indeed better?
There is no index on CREATE_DATE, however the PK is on EMPLOYEE_ID, CREATE_DATE

Use the RANK (or DENSE_RANK or ROW_NUMBER) analytic function:
SELECT 1
FROM EMPLOYEE E
JOIN (
SELECT *
FROM (
SELECT H.*,
RANK() OVER ( PARTITION BY EMPLOYEE_ID ORDER BY CREATE_DATE DESC ) AS rnk
FROM EMPLOYEE_HISTORY H
)
WHERE rnk = 1
) H
ON H.EMPLOYEE_ID = E.EMPLOYEE_ID
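
The difference between RANK and ROW_NUMBER only matters when two history rows share the same latest CREATE_DATE. With the primary key on (EMPLOYEE_ID, CREATE_DATE) described in the question that cannot happen, so the following is only a sketch of the general behaviour. It assumes SQLite (3.25 or later, for window functions) via Python's sqlite3 module instead of Oracle, a table without that primary key, and a hypothetical `note` column:

```python
import sqlite3

# Hypothetical sample data; the table has no primary key so that ties are possible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_history (
        employee_id INTEGER,
        create_date TEXT,
        note        TEXT
    );
    INSERT INTO employee_history VALUES
        (1, '2020-01-01', 'hired'),
        (1, '2021-06-01', 'promoted'),
        (2, '2019-03-15', 'hired'),
        (2, '2019-03-15', 'badge issued');
""")

def latest_rows(func):
    """Return the rows ranked first per employee by the given analytic function."""
    sql = f"""
        SELECT employee_id, create_date, note
        FROM (
            SELECT h.*,
                   {func}() OVER (PARTITION BY employee_id
                                  ORDER BY create_date DESC) AS rnk
            FROM employee_history h
        )
        WHERE rnk = 1
        ORDER BY employee_id, note
    """
    return conn.execute(sql).fetchall()

rank_rows = latest_rows("RANK")              # keeps both tied rows for employee 2
row_number_rows = latest_rows("ROW_NUMBER")  # keeps exactly one row per employee
print(rank_rows)
print(row_number_rows)
```

Which of the tied rows ROW_NUMBER keeps is arbitrary unless the ORDER BY includes a tie-breaker column.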

I would write the query using = rather than IN:
SELECT 1
FROM EMPLOYEE E JOIN
EMPLOYEE_HISTORY EH
ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID AND
EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE)
FROM EMPLOYEE_HISTORY EH2
WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID
);
IN is more general than = for the comparison.
Your primary key index should be used for the subquery, which should make it pretty fast.
Assuming that you actually do want to return actual columns, then I'm not sure if there is a way to make this faster.
If you really are selecting only 1, then forget the most recent record and just use EXISTS:
SELECT 1
FROM EMPLOYEE E
WHERE EXISTS (SELECT 1
FROM EMPLOYEE_HISTORY EH2
WHERE EH2.EMPLOYEE_ID = E.EMPLOYEE_ID
);
The only additional condition your query checks for is that CREATE_DATE is not NULL, but I'm guessing that is always true anyway.
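
As a rough check of that claim, the sketch below runs both forms against a small in-memory SQLite database (an assumption; the question is about Oracle) and compares which employees come back. The `name` column and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_history (
        employee_id INTEGER,
        create_date TEXT NOT NULL,
        PRIMARY KEY (employee_id, create_date)
    );
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO employee_history VALUES
        (1, '2020-01-01'), (1, '2021-06-01'), (2, '2019-03-15');
""")

# Join each employee to its most recent history row using "=" on the correlated MAX.
join_max = conn.execute("""
    SELECT e.employee_id, eh.create_date
    FROM employee e
    JOIN employee_history eh
      ON eh.employee_id = e.employee_id
     AND eh.create_date = (SELECT MAX(eh2.create_date)
                           FROM employee_history eh2
                           WHERE eh2.employee_id = eh.employee_id)
    ORDER BY e.employee_id
""").fetchall()

# If no history columns are needed, EXISTS identifies the same employees.
exists_only = conn.execute("""
    SELECT e.employee_id
    FROM employee e
    WHERE EXISTS (SELECT 1
                  FROM employee_history eh
                  WHERE eh.employee_id = e.employee_id)
    ORDER BY e.employee_id
""").fetchall()

print(join_max)     # [(1, '2021-06-01'), (2, '2019-03-15')]
print(exists_only)  # [(1,), (2,)]
```

Employee 3 has no history rows and is dropped by both queries.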

If the requirement is that the CREATE_DATE of the EMPLOYEE row must be later than the maximum CREATE_DATE for that EMPLOYEE_ID in EMPLOYEE_HISTORY, then for that EMPLOYEE_ID no row with an equal or later CREATE_DATE exists in EMPLOYEE_HISTORY:
SELECT *
FROM EMPLOYEE Emp
WHERE NOT EXISTS (
SELECT 1
FROM EMPLOYEE_HISTORY Hist
WHERE Hist.EMPLOYEE_ID = Emp.EMPLOYEE_ID
AND Hist.CREATE_DATE >= Emp.CREATE_DATE
)

Related

How to get result if exactly one match inner join

How can I write a query to join two tables and return result if exactly one match in there. I have to discard results if zero match and more than one match.
All I am looking for is to extend the INNER JOIN. Let me just get to the point. I have two tables Dept & Emp. One Dept can have multiple Emp's & not the other way around.
Table Dept
Table Emp
I need to JOIN it on Dept_id
Expected Results
You can join with a not exists condition:
select d.*, e.emp_id, e.emp_name
from dept d
inner join emp e
on d.dept_id = e.dept_id
and not exists (
select 1
from emp e1
where e1.dept_id = d.dept_id and e1.emp_id != e.emp_id
)
One alternative to the existing solutions uses analytics (window functions)
instead of joining twice:
select dept_id, dept_name, emp_id, emp_name
from
(
SELECT
d.Dept_id, d.Dept_name, e.Emp_id, e.Emp_Name,
count(*) over (partition by d.dept_id) cnt
FROM dept d
INNER JOIN emp e
ON d.Dept_id = e.Dept_id
) where cnt = 1;
You could use a subquery that groups by dept_id and keeps only the groups having count(*) = 1:
select t.dept_id, dept.dept_name, emp.Emp_name
from (
select dept_id
from emp
group by dept_id
having count(*) = 1
) t
INNER JOIN dept on t.dept_id = dept.dept_id
INNER JOIN emp ON t.dept_id = emp.dept_id
You can phrase this as an aggregation query in Oracle:
select d.dept_id, d.dept_name,
max(e.emp_id) as emp_id,
max(e.emp_name) as emp_name
from dept d inner join
emp e
using (dept_id)
group by d.dept_id, d.dept_name
having count(*) = 1;
This works because if there is only one match, then max() returns the value from the one row.
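
A minimal sketch of that aggregation on made-up data, run through SQLite from Python rather than Oracle (the sample rows are assumptions; the table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 10), (3, 'Cy', 20);
""")

# Sales has two matches (discarded by HAVING), HR has none (dropped by the
# inner join), and IT has exactly one, so MAX() just returns that row's values.
single_match = conn.execute("""
    SELECT d.dept_id, d.dept_name,
           MAX(e.emp_id)   AS emp_id,
           MAX(e.emp_name) AS emp_name
    FROM dept d
    JOIN emp e ON e.dept_id = d.dept_id
    GROUP BY d.dept_id, d.dept_name
    HAVING COUNT(*) = 1
""").fetchall()

print(single_match)  # [(20, 'IT', 3, 'Cy')]
```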
You could also try the query below:
SELECT a.depid dept_id,dept_name,emp_id,emp_name
FROM
(SELECT case WHEN count(*)=1 THEN dept_id END depid FROM emp GROUP BY dept_id) a INNER JOIN emp ON depid=dept_id
INNER JOIN dept b ON a.depid = b.dept_id
WHERE depid IS NOT NULL
Another way would be
select d.dept_id, d.dept_name, e.emp_name
from emp e
join dept d on d.dept_id = e.dept_id
where e.dept_id in
( select dept_id from emp group by dept_id having count(*) = 1 )

SQL employees working for contractors

Hello, I have a database and I need to find out whether any employee works for multiple contractors. I believe all of this information is in one table.
contract (contractor_no FK ,emp_no FK, job_no FK, is_active)
other tables that might be involved
cont_employee (emp_no PK, emp_fname, emp_lname, birth_date)
contractor (contractor_no PK, contractor_name)
Group by the employee and select only those having more than one record
select e.emp_no, e.emp_fname, e.emp_lname
from cont_employee e
join contract c on e.emp_no = c.emp_no
where c.is_active = 1
group by e.emp_no, e.emp_fname, e.emp_lname
having count(distinct c.contractor_no) > 1
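The query above groups by varchar columns and also uses DISTINCT. As an alternative, you can try ROW_NUMBER, like this (note that it returns one row for every active contract after an employee's first, so the same employee can appear more than once, including for repeat contracts with the same contractor):
SELECT * FROM
(
select e.emp_no, e.emp_fname, e.emp_lname
,ROW_NUMBER() OVER (PARTITION BY e.emp_no
ORDER BY c.contractor_no) rn
from cont_employee e
join contract c on e.emp_no = c.emp_no
where c.is_active = 1
) T4
WHERE rn > 1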
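
To see why the GROUP BY answer counts distinct contractors rather than rows, here is a small sketch on hypothetical data, using SQLite from Python as a stand-in for whatever database the question targets. An employee with two jobs for the same contractor should not be reported:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cont_employee (emp_no INTEGER PRIMARY KEY, emp_fname TEXT, emp_lname TEXT);
    CREATE TABLE contract (contractor_no INTEGER, emp_no INTEGER, job_no INTEGER, is_active INTEGER);
    INSERT INTO cont_employee VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO contract VALUES
        (100, 1, 1, 1),  -- Ann: two different contractors
        (200, 1, 2, 1),
        (100, 2, 3, 1),  -- Bob: two jobs, same contractor
        (100, 2, 4, 1),
        (100, 3, 5, 1),  -- Cy: the second contractor is inactive
        (200, 3, 6, 0);
""")

def employees_having(condition):
    """Run the GROUP BY query with the given HAVING condition."""
    sql = f"""
        SELECT e.emp_no, e.emp_fname, e.emp_lname
        FROM cont_employee e
        JOIN contract c ON c.emp_no = e.emp_no
        WHERE c.is_active = 1
        GROUP BY e.emp_no, e.emp_fname, e.emp_lname
        HAVING {condition}
        ORDER BY e.emp_no
    """
    return conn.execute(sql).fetchall()

multi = employees_having("COUNT(DISTINCT c.contractor_no) > 1")
naive = employees_having("COUNT(*) > 1")

print(multi)  # [(1, 'Ann', 'Lee')]
print(naive)  # [(1, 'Ann', 'Lee'), (2, 'Bob', 'Ray')]
```

Counting rows instead of distinct contractors wrongly includes Bob, which is the same over-reporting a plain ROW_NUMBER filter is prone to.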

SQL group function nested too deeply

I'm trying to create an sql query that will return the smallest occurrence of an id appearing between two tables however I keep getting the error with the line HAVING MIN(COUNT(E.C_SE_ID)). Oracle is saying that the group by function is nested too deeply.
I cannot think of another way of returning C_SE_ID
SELECT CS.C_SE_ID, MIN(COUNT(E.C_SE_ID))
FROM COURSE_SECTION CS, ENROLLMENT E, LOCATION L
WHERE CS.C_SE_ID=E.C_SE_ID
AND CS.LOC_ID=L.LOC_ID
AND L.BLDG_CODE='DBW'
GROUP BY CS.C_SE_ID
HAVING MIN(COUNT(E.C_SE_ID));
In the enrollment table, s_id and c_se_id are linked; I'm trying to get all the s_id values related to that c_se_id. With the updated query, Oracle doesn't like the select * (for obvious reasons), but when I change it to e.c_se_id I get nothing.
SELECT E.S_ID
FROM COURSE_SECTION CS, ENROLLMENT E
WHERE CS.C_SE_ID=E.C_SE_ID
AND E.C_SE_ID =(
select *
from (select CS.C_SE_ID, count(*) as cnt,
max(count(*)) over (partition by cs.c_se_id) as maxcnt
from COURSE_SECTION CS join
ENROLLMENT E
on CS.C_SE_ID=E.C_SE_ID join
LOCATION L
on CS.LOC_ID=L.LOC_ID
where L.BLDG_CODE='DBW'
GROUP BY CS.C_SE_ID
order by count(*) desc
) t
where cnt = maxcnt);
One way to do this is by nesting your query and then choosing the first row in the output:
select C_SE_ID, cnt
from (select CS.C_SE_ID, count(*) as cnt
from COURSE_SECTION CS join
ENROLLMENT E
on CS.C_SE_ID=E.C_SE_ID join
LOCATION L
on CS.LOC_ID=L.LOC_ID
where L.BLDG_CODE='DBW'
GROUP BY CS.C_SE_ID
order by count(*)
) t
where rownum = 1
Note I updated the join syntax to the more modern version using on instead of where.
If you want all minimum values (and there is more than one), then I would use analytic functions. It is a very similar idea to your original query:
select *
from (select CS.C_SE_ID, count(*) as cnt,
min(count(*)) over () as mincnt
from COURSE_SECTION CS join
ENROLLMENT E
on CS.C_SE_ID=E.C_SE_ID join
LOCATION L
on CS.LOC_ID=L.LOC_ID
where L.BLDG_CODE='DBW'
GROUP BY CS.C_SE_ID
) t
where cnt = mincnt;
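
The "aggregate of an aggregate" can also be written as two explicit steps, which sidesteps the nesting error altogether. The sketch below swaps the analytic function for a common table expression and runs on SQLite from Python rather than Oracle. The sample rows are invented, and the table layout is inferred from the joins in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (loc_id INTEGER PRIMARY KEY, bldg_code TEXT);
    CREATE TABLE course_section (c_se_id INTEGER PRIMARY KEY, loc_id INTEGER);
    CREATE TABLE enrollment (s_id INTEGER, c_se_id INTEGER);
    INSERT INTO location VALUES (1, 'DBW'), (2, 'XYZ');
    INSERT INTO course_section VALUES (11, 1), (12, 1), (13, 1), (14, 2);
    INSERT INTO enrollment VALUES
        (1, 11), (2, 11), (3, 11),
        (1, 12),
        (2, 13),
        (3, 14);
""")

# Step 1 (the CTE) counts enrollments per section in building DBW.
# Step 2 keeps every section whose count equals the smallest count.
smallest = conn.execute("""
    WITH counts AS (
        SELECT cs.c_se_id, COUNT(*) AS cnt
        FROM course_section cs
        JOIN enrollment e ON e.c_se_id = cs.c_se_id
        JOIN location l   ON l.loc_id = cs.loc_id
        WHERE l.bldg_code = 'DBW'
        GROUP BY cs.c_se_id
    )
    SELECT c_se_id, cnt
    FROM counts
    WHERE cnt = (SELECT MIN(cnt) FROM counts)
    ORDER BY c_se_id
""").fetchall()

print(smallest)  # [(12, 1), (13, 1)]
```

Section 14 also has a single enrollment but is excluded because it is not in building DBW.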
Try this instead of your original query:
SELECT E.S_ID
FROM ENROLLMENT E
where E.C_SE_ID in (select C_SE_ID
from (select CS.C_SE_ID, count(*) as cnt,
min(count(*)) over () as mincnt
from ENROLLMENT E2 join
COURSE_SECTION CS
on CS.C_SE_ID=E2.C_SE_ID join
LOCATION L
on CS.LOC_ID=L.LOC_ID
where L.BLDG_CODE='DBW'
GROUP BY CS.C_SE_ID
) t
where cnt = mincnt
);
In addition to fixing the joins, I removed course_section from the outer query, where it only repeated the join on C_SE_ID. It stays in the subquery because LOC_ID comes from it. This simplifies the query.

SQL Server Query using GROUP BY

I am having trouble writing a query that will select all Skills, joining the Employee and Competency records, but only return one skill per employee, their newest Skill. Using this sample dataset
Skills
======
id  employee_id  competency_id  created
1   1            1              Jan 1
2   2            2              Jan 1
3   1            2              Jan 3
Employees
===========
id  first_name  last_name
1   Mike        Jones
2   Steve       Smith
Competencies
============
id  title
1   Problem Solving
2   Compassion
I would like to retrieve the following data
Skill.id  Skill.employee_id  Skill.competency_id  Skill.created  Employee.id  Employee.first_name  Employee.last_name  Competency.id  Competency.title
2         2                  2                    Jan 1          2            Steve                Smith               2              Compassion
3         1                  2                    Jan 3          1            Mike                 Jones               2              Compassion
I was able to select the employee_id and max created using
SELECT MAX(created) as created, employee_id FROM skills GROUP BY employee_id
But when I start to add more fields in the select statement or add in a join I get the 'Column 'xyz' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' error.
Any help is appreciated and I don't have to use GROUP BY, it's just what I'm familiar with.
The error that you were getting is because SQL Server requires every non-aggregated item in the SELECT list to be included in the GROUP BY when an aggregate function is used.
The problem with that is you might have unique values in some columns which can throw off the result. So you will want to rewrite the query to use one of the following:
You can use a subquery to get this result. This gets the max(created) in a subquery and then you use that result to get the correct employee record:
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title
from Employees e
left join Skills s
on e.id = s.employee_id
inner join
(
SELECT MAX(created) as created, employee_id
FROM skills
GROUP BY employee_id
) s1
on s.employee_id = s1.employee_id
and s.created = s1.created
left join Competencies c
on s.competency_id = c.id
Or another way to do this is to use row_number():
select *
from
(
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title,
row_number() over(partition by e.id
order by s.created desc) rn
from Employees e
left join Skills s
on e.id = s.employee_id
left join Competencies c
on s.competency_id = c.id
) src
where rn = 1
For every non-aggregated column you add to your SELECT statement you need to update your GROUP BY to include it.
This article may help you understand why.
;WITH
MAX_SKILL_created AS
(
SELECT
MAX(skills.created) as created,
skills.employee_id
FROM
skills
GROUP BY
skills.employee_id
),
MAX_SKILL_id AS
(
SELECT
MAX(skills.id) as id,
skills.employee_id
FROM
skills
INNER JOIN MAX_SKILL_created
ON MAX_SKILL_created.employee_id = skills.employee_id
AND MAX_SKILL_created.created = skills.created
GROUP BY
skills.employee_id
)
SELECT
* -- type all your columns here
FROM
employees
INNER JOIN MAX_SKILL_id
ON MAX_SKILL_id.employee_id = employees.id
INNER JOIN skills
ON skills.id = MAX_SKILL_id.id
INNER JOIN competencies
ON competencies.id = skills.competency_id
If you are using SQL Server, then you can use OUTER APPLY:
SELECT *
FROM employees E
OUTER APPLY (
SELECT TOP 1 *
FROM skills
WHERE employee_id = E.id
ORDER BY created DESC
) S
INNER JOIN competencies C
ON C.id = S.competency_id
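
For reference, the "join back to MAX(created) per employee" idea can be checked against the sample data from the question. This is only a sketch: it uses SQLite from Python instead of SQL Server, and the year in the dates is an assumption, since the question only gives "Jan 1" and "Jan 3":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE competencies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, employee_id INTEGER,
                         competency_id INTEGER, created TEXT);
    INSERT INTO employees VALUES (1, 'Mike', 'Jones'), (2, 'Steve', 'Smith');
    INSERT INTO competencies VALUES (1, 'Problem Solving'), (2, 'Compassion');
    -- The year is assumed; ISO dates keep text comparison in date order.
    INSERT INTO skills VALUES
        (1, 1, 1, '2013-01-01'),
        (2, 2, 2, '2013-01-01'),
        (3, 1, 2, '2013-01-03');
""")

# The derived table s1 holds the newest "created" per employee; joining it
# back to skills recovers the full row without widening the GROUP BY.
newest = conn.execute("""
    SELECT s.id, s.employee_id, s.competency_id, s.created,
           e.first_name, e.last_name, c.title
    FROM skills s
    JOIN (SELECT employee_id, MAX(created) AS created
          FROM skills
          GROUP BY employee_id) s1
      ON s1.employee_id = s.employee_id
     AND s1.created = s.created
    JOIN employees e    ON e.id = s.employee_id
    JOIN competencies c ON c.id = s.competency_id
    ORDER BY s.id
""").fetchall()

for row in newest:
    print(row)
```

The two rows returned are skills 2 and 3, matching the expected output in the question. If an employee had two skills with the same newest date, this form would return both.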

Extra Fields with SQL MIN() & GROUP BY

When using the SQL MIN() function, along with GROUP BY, will any additional columns (not the MIN column, or one of the GROUP BY columns) match the data in the matching MIN row?
For example, given a table with department names, employee names, and salary:
SELECT MIN(e.salary), e.* FROM employee e GROUP BY department
Obviously I'll get two good columns, the minimum salary and the department. Will the employee name (and any other employee fields) be from the same row? Namely the row with the MIN(salary)?
I know there could very possibly be two employees with the same (and lowest) salary, but all I'm concerned with (now) is getting all the information on the (or a single) cheapest employee.
Would this select the cheapest salesman?
SELECT min(salary), e.* FROM employee e WHERE department = 'sales'
Essentially, can I be sure that the data returned along with the MIN() function will match the (or a single) record with that minimum value?
If the database matters, I'm working with MySql.
If you wanted to get the "cheapest" employee in each department you would have two choices off the top of my head:
SELECT
E.* -- Don't actually use *, list out all of your columns
FROM
Employees E
INNER JOIN
(
SELECT
department,
MIN(salary) AS min_salary
FROM
Employees
GROUP BY
department
) AS SQ ON
SQ.department = E.department AND
SQ.min_salary = E.salary
Or you can use:
SELECT
E1.*
FROM
Employees E1
LEFT OUTER JOIN Employees E2 ON
E2.department = E1.department AND
E2.salary < E1.salary
WHERE
E2.employee_id IS NULL -- You can use any NOT NULL column here
The second statement works by effectively saying, show me all employees where you can't find another employee in the same department with a lower salary.
In both cases, if two or more employees have equal salaries that are the minimum you will get them both (all).
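
The tie behaviour of the self-join form can be seen on a few made-up rows. This sketch assumes SQLite via Python's sqlite3 module rather than MySQL, and the `name` column and sample values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT,
                            department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 'sales', 30000),
        (2, 'Bob', 'sales', 30000),
        (3, 'Cy',  'sales', 45000),
        (4, 'Dee', 'it',    50000);
""")

# Anti-join: keep e1 only when no colleague e2 in the same department earns less.
cheapest = conn.execute("""
    SELECT e1.employee_id, e1.name, e1.department, e1.salary
    FROM employees e1
    LEFT OUTER JOIN employees e2
      ON e2.department = e1.department
     AND e2.salary < e1.salary
    WHERE e2.employee_id IS NULL
    ORDER BY e1.employee_id
""").fetchall()

for row in cheapest:
    print(row)
```

Ann and Bob tie for the lowest sales salary, so both are returned, along with Dee as the only IT employee.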
SELECT e.*
FROM employee e
WHERE e.id =
(
SELECT id
FROM employee ei
WHERE ei.department = 'sales'
ORDER BY
ei.salary
LIMIT 1
)
To get values for each department, use:
SELECT e.*
FROM department d
LEFT JOIN
employee e
ON e.id =
(
SELECT id
FROM employee ei
WHERE ei.department = d.id
ORDER BY
ei.salary
LIMIT 1
)
To get values only for those departments that have employees, use:
SELECT e.*
FROM (
SELECT DISTINCT eo.department
FROM employee eo
) d
JOIN
employee e
ON e.id =
(
SELECT id
FROM employee ei
WHERE ei.department = d.department
ORDER BY
ei.salary
LIMIT 1
)
Of course, having an index on (department, salary) will greatly improve all three queries.
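A potentially fast solution uses a MySQL user variable (note that MySQL does not guarantee the evaluation order of user-variable assignments within a statement, so treat this as version-dependent):
SET @dep := '';
SELECT * FROM (
SELECT * FROM `employee` ORDER BY `department`, `salary`
) AS t WHERE IF ( @dep = t.`department`, FALSE, ( @dep := t.`department` ) OR TRUE );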
Another approach is to use analytic functions. Here is a query using ROW_NUMBER:
select first_name, salary
from (select first_name, salary,
row_number() over (partition by department_id order by salary asc) as row_count
from employees) t
where row_count = 1;