Hello, I have a database and I need to answer whether any employee works for multiple contractors. I believe all of this information is in one table:
contract (contractor_no FK, emp_no FK, job_no FK, is_active)
Other tables that might be involved:
cont_employee (emp_no PK, emp_fname, emp_lname, birth_date)
contractor (contractor_no PK, contractor_name)
Group by the employee and keep only those with more than one distinct active contractor:
select e.emp_no, e.emp_fname, e.emp_lname
from cont_employee e
join contract c on e.emp_no = c.emp_no
where c.is_active = 1
group by e.emp_no, e.emp_fname, e.emp_lname
having count(distinct c.contractor_no) > 1
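As a quick check of the GROUP BY / HAVING query above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and names (Ann, Bob, Cy) are made up for illustration, the birth_date column is omitted, and SQLite stands in for whatever database the question actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cont_employee (emp_no INTEGER PRIMARY KEY, emp_fname TEXT, emp_lname TEXT);
CREATE TABLE contract (contractor_no INTEGER, emp_no INTEGER, job_no INTEGER, is_active INTEGER);
-- Hypothetical sample data:
INSERT INTO cont_employee VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe');
-- Ann: two contractors; Bob: two jobs with one contractor; Cy: second contract is inactive
INSERT INTO contract VALUES
  (10, 1, 100, 1), (20, 1, 101, 1),
  (10, 2, 102, 1), (10, 2, 103, 1),
  (10, 3, 104, 1), (20, 3, 105, 0);
""")

# Same query as above: count distinct active contractors per employee.
multi = conn.execute("""
SELECT e.emp_no, e.emp_fname, e.emp_lname
FROM cont_employee e
JOIN contract c ON e.emp_no = c.emp_no
WHERE c.is_active = 1
GROUP BY e.emp_no, e.emp_fname, e.emp_lname
HAVING COUNT(DISTINCT c.contractor_no) > 1
""").fetchall()
print(multi)
```

Only Ann is returned in this sample. Bob has two active contracts but with the same contractor, which is why the DISTINCT inside COUNT matters.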
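The query above groups by varchar columns and also uses DISTINCT, which can be expensive.
You can try ROW_NUMBER instead. Note that it numbers contracts, not contractors, so it matches the query above only if an employee has at most one active contract per contractor.
Like this:
SELECT * FROM
(
select e.emp_no, e.emp_fname, e.emp_lname
,ROW_NUMBER()OVER(PARTITION BY e.emp_no
ORDER BY c.contractor_no) rn
from cont_employee e
join contract c on e.emp_no = c.emp_no
where c.is_active = 1
)T4
WHERE rn = 2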
What I need:
I am looking for a solution that gives me all the Employee Ids that share the same EmailAddress (the filter needs to be by EmailAddress).
I want to find the Ids that correspond to the duplicated email addresses and retrieve that information.
Table Employee:
Id | PlNumber | EmailAddress | EmployeeBeginingDate | EmployedEndDate | Name | UserId(FK) | CreatedBy | CreatedOn
SELECT a.Id,a.EmailAddress
FROM Employee a
INNER JOIN (SELECT
Employee.Id as EmployeeId,
Employee.EmailAddress as EmailAddress,
FROM Employee
GROUP BY Employee.Id,Employee.EmailAddress
HAVING count(Employee.EmailAddress) > 1
) b
ON a.Id= b.EmployeeId
ORDER BY a.Id
I am always getting an error:
the multi-part identifier could not be bound.
I know why the error is happening, but I couldn't solve it.
UPDATE: After a few changes the query returns 0 rows, but I know it should return at least the 3 rows that have duplicate values.
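Try the query below. Because the subquery grouped by Employee.Id as well, every group contained a single row, so the HAVING count > 1 filter returned nothing. Group by EmailAddress only, and join back to the aliased table a on EmailAddress to get the Ids: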
SELECT a.Id, a.EmailAddress
FROM Employee a
INNER JOIN (SELECT
Employee.EmailAddress as EmailAddress
FROM Employee
GROUP BY Employee.EmailAddress
HAVING count(Employee.EmailAddress) > 1
) b
ON a.EmailAddress = b.EmailAddress
ORDER BY a.Id
Live db<>fiddle demo.
Assuming the ids are different on each row, I would go for exists:
SELECT e.Id, e.EmailAddress
FROM Employee e
WHERE EXISTS (SELECT 1
FROM Employee e2
WHERE e2.EmailAddress = e.EmailAddress AND
e2.Id <> e.Id
)
ORDER BY e.EmailAddress;
Or, if you want to know the number of matches, use window functions:
SELECT e.Id, e.EmailAddress, cnt
FROM (SELECT e.*, COUNT(*) OVER (PARTITION BY e.EmailAddress) as cnt
FROM Employee e
) e
WHERE cnt >= 2;
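To see why the original query returned 0 rows, and that the join and EXISTS forms agree, here is a small sketch using Python's sqlite3 module. The table is cut down to the two relevant columns and the email addresses are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, EmailAddress TEXT);
-- Hypothetical sample data: one address shared by three employees
INSERT INTO Employee VALUES
  (1, 'a@example.com'), (2, 'b@example.com'),
  (3, 'a@example.com'), (4, 'c@example.com'), (5, 'a@example.com');
""")

# Grouping by Id as well: every group holds one row, so nothing passes HAVING.
by_id = conn.execute("""
SELECT Id, EmailAddress FROM Employee
GROUP BY Id, EmailAddress
HAVING COUNT(EmailAddress) > 1
""").fetchall()

# Grouping by EmailAddress only, then joining back to recover the Ids.
dupes = conn.execute("""
SELECT a.Id, a.EmailAddress
FROM Employee a
JOIN (SELECT EmailAddress FROM Employee
      GROUP BY EmailAddress
      HAVING COUNT(*) > 1) b
  ON a.EmailAddress = b.EmailAddress
ORDER BY a.Id
""").fetchall()

# EXISTS form: another row with the same address but a different Id.
via_exists = conn.execute("""
SELECT e.Id, e.EmailAddress
FROM Employee e
WHERE EXISTS (SELECT 1 FROM Employee e2
              WHERE e2.EmailAddress = e.EmailAddress AND e2.Id <> e.Id)
ORDER BY e.Id
""").fetchall()

print(by_id, dupes, via_exists)
```

On this sample the first query is empty, while the other two both return Ids 1, 3 and 5.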
I have the following query, which joins a table to the most recent record for a given EMPLOYEE_ID, and I am wondering whether there is a more efficient/faster way of retrieving the most recent record. What would be the best way?
SELECT * FROM EMPLOYEE
WHERE NOT EXISTS (
SELECT 1
FROM EMPLOYEE D
JOIN EMPLOYEE_HISTORY E
ON E.EMPLOYEE_ID = D.EMPLOYEE_ID
AND E.CREATE_DATE IN (SELECT MAX(CREATE_DATE)
FROM EMPLOYEE_HISTORY
WHERE EMPLOYEE_ID = D.EMPLOYEE_ID)
)
When I compared the explain plan to the following query it seems the below way is MORE costly.
SELECT *
FROM EMPLOYEE
WHERE NOT EXISTS
(SELECT 1
FROM EMPLOYEE D
JOIN (
SELECT E.*
FROM EMPLOYEE_HISTORY E
INNER JOIN (
SELECT EMPLOYEE_ID
, MAX(CREATE_DATE) max_date
FROM EMPLOYEE_HISTORY E2
GROUP BY EMPLOYEE_ID
) EE
ON EE.EMPLOYEE_ID = E.EMPLOYEE_ID
AND EE.max_date = E.CREATE_DATE
) A
ON A.EMPLOYEE_ID = D.EMPLOYEE_ID
AND ROWNUM = 1)
So does that mean it is indeed better?
There is no index on CREATE_DATE alone; however, the PK is on (EMPLOYEE_ID, CREATE_DATE).
Use the RANK (or DENSE_RANK or ROW_NUMBER) analytic function:
SELECT 1
FROM EMPLOYEE E
JOIN (
SELECT *
FROM (
SELECT H.*,
RANK() OVER ( PARTITION BY EMPLOYEE_ID ORDER BY CREATE_DATE DESC ) AS rnk
FROM EMPLOYEE_HISTORY H
)
WHERE rnk = 1
) H
ON H.EMPLOYEE_ID = E.EMPLOYEE_ID
I would write the query using = rather than IN:
SELECT 1
FROM EMPLOYEE E JOIN
EMPLOYEE_HISTORY EH
ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID AND
EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE)
FROM EMPLOYEE_HISTORY EH2
WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID
);
IN is more general than = for the comparison.
Your primary key index should be used for the subquery, which should make it pretty fast.
Assuming that you actually do want to return actual columns, then I'm not sure if there is a way to make this faster.
If you really are selecting only 1, then forget the most recent record and just use EXISTS:
SELECT 1
FROM EMPLOYEE E
WHERE EXISTS (SELECT 1
FROM EMPLOYEE_HISTORY EH2
WHERE EH2.EMPLOYEE_ID = E.EMPLOYEE_ID
);
The only additional condition your query checks for is that CREATE_DATE is not NULL, but I'm guessing that is always true anyway.
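The point above can be checked on a toy dataset: joining to the latest history row and a plain EXISTS pick out the same employees when CREATE_DATE is never NULL. This is only a sketch in SQLite via Python; the tables are reduced to the key columns, the dates are invented, and it says nothing about the Oracle execution plans discussed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE_HISTORY (
  EMPLOYEE_ID INTEGER,
  CREATE_DATE TEXT NOT NULL,
  PRIMARY KEY (EMPLOYEE_ID, CREATE_DATE));
-- Hypothetical sample data: employee 3 has no history rows
INSERT INTO EMPLOYEE VALUES (1), (2), (3);
INSERT INTO EMPLOYEE_HISTORY VALUES
  (1, '2020-01-01'), (1, '2021-06-01'), (2, '2019-03-15');
""")

# Join to the most recent history row using = with a correlated MAX.
latest = conn.execute("""
SELECT E.EMPLOYEE_ID, EH.CREATE_DATE
FROM EMPLOYEE E
JOIN EMPLOYEE_HISTORY EH
  ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID
 AND EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE)
                       FROM EMPLOYEE_HISTORY EH2
                       WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID)
ORDER BY E.EMPLOYEE_ID
""").fetchall()

# Plain existence check: does the employee have any history at all?
has_history = conn.execute("""
SELECT E.EMPLOYEE_ID
FROM EMPLOYEE E
WHERE EXISTS (SELECT 1 FROM EMPLOYEE_HISTORY EH2
              WHERE EH2.EMPLOYEE_ID = E.EMPLOYEE_ID)
ORDER BY E.EMPLOYEE_ID
""").fetchall()

print(latest, has_history)
```

Both queries cover employees 1 and 2 and skip employee 3; the first one additionally carries the latest CREATE_DATE, which is the only reason to prefer it.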
If the requirement is that the CREATE_DATE of the EMPLOYEE must be after the maximum CREATE_DATE for that EMPLOYEE_ID in EMPLOYEE_HISTORY,
then for that EMPLOYEE_ID there must not exist an equal or higher CREATE_DATE in EMPLOYEE_HISTORY:
SELECT *
FROM EMPLOYEE Emp
WHERE NOT EXISTS (
SELECT 1
FROM EMPLOYEE_HISTORY Hist
WHERE Hist.EMPLOYEE_ID = Emp.EMPLOYEE_ID
AND Hist.CREATE_DATE >= Emp.CREATE_DATE
)
Test here
The database schema is like this:
employee(emp_id#,person_name,dob,street,city)
company(company_id#, company_name,city)
works(emp_id,company_id,salary)
manages(emp_id,manager_id)
insert into employee values('e-1','dipankar pal','15-jul-1997','h.m raod','kolkata');
insert into employee values('e-2','subhadip roy','15-jan-1997','garia','kolkata');
The manages table may be a little bit confusing, so here is some data I've inserted to make my point clear:
insert into manages values('e-3','e-1');
insert into manages values('e-4','e-1');
insert into manages values('e-5','e-1');
insert into manages values('e-6','e-2');
insert into manages values('e-7','e-2');
Try
SELECT Z.COMPANY_NAME, E.PERSON_NAME, Z.HIGHEST_SALARY
FROM
(SELECT C.COMPANY_ID, C.COMPANY_NAME, MAX(W.SALARY) AS HIGHEST_SALARY
FROM
WORKS W INNER JOIN COMPANY C
ON W.COMPANY_ID = C.COMPANY_ID
GROUP BY C.COMPANY_ID, C.COMPANY_NAME ) Z
INNER JOIN WORKS W
ON Z.COMPANY_ID = W.COMPANY_ID
AND Z.HIGHEST_SALARY = W.SALARY
INNER JOIN EMPLOYEE E
ON W.EMP_ID = E.EMP_ID
ORDER BY Z.HIGHEST_SALARY DESC;
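The join above matches on both COMPANY_ID and the maximum salary. A small sketch in Python's sqlite3 shows why both conditions are needed. The two employee names come from the inserts above; the third employee, the company names and all salaries are invented for illustration, and the unused columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id TEXT PRIMARY KEY, person_name TEXT);
CREATE TABLE company (company_id TEXT PRIMARY KEY, company_name TEXT);
CREATE TABLE works (emp_id TEXT, company_id TEXT, salary INTEGER);
INSERT INTO employee VALUES
  ('e-1', 'dipankar pal'), ('e-2', 'subhadip roy'), ('e-3', 'third person');
-- Hypothetical companies and salaries
INSERT INTO company VALUES ('c-1', 'Acme'), ('c-2', 'Globex');
-- e-3 earns 5000 at Globex, which equals the top salary at Acme
INSERT INTO works VALUES ('e-1', 'c-1', 5000), ('e-2', 'c-2', 7000), ('e-3', 'c-2', 5000);
""")

# Match on company AND salary: one top earner per company.
top_paid = conn.execute("""
SELECT c.company_name, e.person_name, w.salary
FROM works w
JOIN (SELECT company_id, MAX(salary) AS high
      FROM works GROUP BY company_id) x
  ON x.company_id = w.company_id AND x.high = w.salary
JOIN employee e ON e.emp_id = w.emp_id
JOIN company c ON c.company_id = w.company_id
ORDER BY w.salary DESC
""").fetchall()

# Match on salary alone: e-3 slips in because 5000 is another company's maximum.
salary_only = conn.execute("""
SELECT w.emp_id
FROM works w,
     (SELECT company_id, MAX(salary) AS high
      FROM works GROUP BY company_id) x
WHERE x.high = w.salary
ORDER BY w.emp_id
""").fetchall()

print(top_paid, salary_only)
```

If two employees tie for the top salary within one company, both are returned by this approach.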
select company.company_name, employee.person_name as employee_name, y.salary as salary
from employee, company,
(select works.emp_id, works.company_id, works.salary from works,
(select company_id, max(salary) as high from works group by company_id
) x where x.high = works.salary and x.company_id = works.company_id) y
where employee.emp_id = y.emp_id and company.company_id = y.company_id
order by company_name;
I am having trouble solving this problem: I would like to add a salary to the sum only if the employee's id is distinct. I thought I could do this using the decode() function, but I am having trouble defining a suitable expression. I was aiming for something like
SUM(DECODE(S.ID,IS DISTINCT,S.SALARY))
But this isn't going to work!
So the full query looks like
SELECT B.ID, SUM(S.SALARY), COUNT(DISTINCT S.ID), COUNT(DISTINCT RM.MEMBER_ID)
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
GROUP BY B.ID;
But the problem is with SUM(S.SALARY): it's adding up salaries from duplicate IDs.
I don't know about DECODE, but this should work:
SELECT
SUM(S.SALARY)
FROM <table> S
WHERE NOT EXISTS (
SELECT ID FROM <table> WHERE ID=S.ID GROUP BY ID HAVING COUNT(*)>1
)
Perhaps something like this...
SELECT E.ID, SUM(E.Salary)
FROM Employers E
WHERE E.ID IN (SELECT DISTINCT E2.ID FROM Employers E2)
GROUP BY E.ID
If not, perhaps you could post some sample data so that I can understand better
The joins are introducing duplicate rows. One way to fix this is by adding a row number to sequentially identify different ids. The real way would be to fix the joins so this doesn't happen, but here is the first way:
SELECT t.BRANCH_ID, SUM(CASE WHEN t.SEQNUM = 1 THEN t.SALARY END),
COUNT(DISTINCT t.STAFF_ID), COUNT(DISTINCT t.MEMBER_ID)
FROM (SELECT B.ID AS BRANCH_ID, S.ID AS STAFF_ID, S.SALARY, RM.MEMBER_ID,
ROW_NUMBER() OVER (PARTITION BY S.ID ORDER BY S.ID) AS SEQNUM
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
) t
GROUP BY t.BRANCH_ID
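For the other route mentioned above, fixing the joins so the duplicates never appear, one possible sketch is to aggregate STAFF and RECRUIT_MEMBER separately before joining them to BRANCH. The example runs in SQLite through Python; the rows are invented, and it assumes STAFF.ID and RECRUIT_MEMBER.MEMBER_ID are unique within their own tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BRANCH (ID INTEGER PRIMARY KEY);
CREATE TABLE STAFF (ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER, SALARY INTEGER);
CREATE TABLE RECRUIT_MEMBER (MEMBER_ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER);
-- Hypothetical sample data: one branch, two staff, three members
INSERT INTO BRANCH VALUES (1);
INSERT INTO STAFF VALUES (1, 1, 100), (2, 1, 200);
INSERT INTO RECRUIT_MEMBER VALUES (10, 1), (11, 1), (12, 1);
""")

# Original query: each staff row is repeated once per member of the branch.
naive = conn.execute("""
SELECT B.ID, SUM(S.SALARY), COUNT(DISTINCT S.ID), COUNT(DISTINCT RM.MEMBER_ID)
FROM BRANCH B
JOIN STAFF S ON S.BRANCH_ID = B.ID
JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
GROUP BY B.ID
""").fetchall()

# Pre-aggregate each child table, so the join is one row per branch on each side.
fixed = conn.execute("""
SELECT B.ID, S.TOTAL_SALARY, S.STAFF_COUNT, RM.MEMBER_COUNT
FROM BRANCH B
JOIN (SELECT BRANCH_ID, SUM(SALARY) AS TOTAL_SALARY, COUNT(*) AS STAFF_COUNT
      FROM STAFF GROUP BY BRANCH_ID) S
  ON S.BRANCH_ID = B.ID
JOIN (SELECT BRANCH_ID, COUNT(*) AS MEMBER_COUNT
      FROM RECRUIT_MEMBER GROUP BY BRANCH_ID) RM
  ON RM.BRANCH_ID = B.ID
""").fetchall()

print(naive, fixed)
```

With two staff and three members, the original query sums each salary three times (900), while the pre-aggregated version gives 300. Because both joins are inner joins, a branch with no staff or no members drops out; LEFT JOIN would keep it.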
You can create a virtual table with only one salary per ID like this...
SELECT
...whatever fields you've already got...
s.Salary
FROM
...whatever tables and joins you've already got...
LEFT JOIN (SELECT ID, MAX(SALARY) as "Salary" FROM SALARY_TABLE GROUP BY ID) s
ON whatevertable.ID = s.ID
I am having trouble writing a query that will select all Skills, joining the Employee and Competency records, but only return one skill per employee, their newest Skill. Using this sample dataset
Skills
======
id employee_id competency_id created
1 1 1 Jan 1
2 2 2 Jan 1
3 1 2 Jan 3
Employees
===========
id first_name last_name
1 Mike Jones
2 Steve Smith
Competencies
============
id title
1 Problem Solving
2 Compassion
I would like to retrieve the following data
Skill.id Skill.employee_id Skill.competency_id Skill.created Employee.id Employee.first_name Employee.last_name Competency.id Competency.title
2 2 2 Jan 1 2 Steve Smith 2 Compassion
3 1 2 Jan 3 1 Mike Jones 2 Compassion
I was able to select the employee_id and max created using
SELECT MAX(created) as created, employee_id FROM skills GROUP BY employee_id
But when I start to add more fields in the select statement or add in a join I get the 'Column 'xyz' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' error.
Any help is appreciated and I don't have to use GROUP BY, it's just what I'm familiar with.
The error occurs because SQL Server requires every non-aggregated item in the SELECT list to appear in the GROUP BY clause when an aggregate function is used.
The problem with simply adding the columns to GROUP BY is that columns with unique values (such as id) split each employee into separate groups, which throws off the result. So you will want to rewrite the query using one of the following approaches:
You can use a subquery to get this result. This gets the max(created) in a subquery and then you use that result to get the correct employee record:
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title
from Employees e
left join Skills s
on e.id = s.employee_id
inner join
(
SELECT MAX(created) as created, employee_id
FROM skills
GROUP BY employee_id
) s1
on s.employee_id = s1.employee_id
and s.created = s1.created
left join Competencies c
on s.competency_id = c.id
See SQL Fiddle with Demo
Or another way to do this is to use row_number():
select *
from
(
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title,
row_number() over(partition by s.employee_id
order by s.created desc) rn
from Employees e
left join Skills s
on e.id = s.employee_id
left join Competencies c
on s.competency_id = c.id
) src
where rn = 1
See SQL Fiddle with Demo
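The subquery approach can be tried against the sample data from the question with a short sketch in Python's sqlite3 (SQLite rather than SQL Server, so treat it as an illustration only). The question gives the dates as "Jan 1" and "Jan 3" without a year, so the year 2013 used here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE Competencies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, employee_id INTEGER,
                     competency_id INTEGER, created TEXT);
INSERT INTO Employees VALUES (1, 'Mike', 'Jones'), (2, 'Steve', 'Smith');
INSERT INTO Competencies VALUES (1, 'Problem Solving'), (2, 'Compassion');
-- Dates from the question; the year is assumed
INSERT INTO Skills VALUES
  (1, 1, 1, '2013-01-01'), (2, 2, 2, '2013-01-01'), (3, 1, 2, '2013-01-03');
""")

# Find each employee's latest created date, then join back to get the full row.
newest = conn.execute("""
SELECT s.id, s.employee_id, s.competency_id, s.created,
       e.first_name, e.last_name, c.title
FROM Skills s
JOIN (SELECT employee_id, MAX(created) AS created
      FROM Skills GROUP BY employee_id) s1
  ON s.employee_id = s1.employee_id AND s.created = s1.created
JOIN Employees e ON e.id = s.employee_id
JOIN Competencies c ON c.id = s.competency_id
ORDER BY s.id
""").fetchall()

for row in newest:
    print(row)
```

This returns skills 2 and 3, matching the desired output in the question. If one employee had two skills with the same created value, this approach would return both, whereas the row_number() version keeps only one of them.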
For every non-aggregated column you add to your SELECT statement you need to update your GROUP BY to include it.
This article may help you understand why.
;WITH
MAX_SKILL_created AS
(
SELECT
MAX(skills.created) as created,
skills.employee_id
FROM
skills
GROUP BY
skills.employee_id
),
MAX_SKILL_id AS
(
SELECT
MAX(skills.id) as id,
skills.employee_id
FROM
skills
INNER JOIN MAX_SKILL_created
ON MAX_SKILL_created.employee_id = skills.employee_id
AND MAX_SKILL_created.created = skills.created
GROUP BY
skills.employee_id
)
SELECT
* -- type all your columns here
FROM
employees
INNER JOIN MAX_SKILL_id
ON MAX_SKILL_id.employee_id = employees.id
INNER JOIN skills
ON skills.id = MAX_SKILL_id.id
INNER JOIN competencies
ON competencies.id = skills.competency_id
If you are using SQL Server, then you can use OUTER APPLY:
SELECT *
FROM employees E
OUTER APPLY (
SELECT TOP 1 *
FROM skills
WHERE employee_id = E.id
ORDER BY created DESC
) S
INNER JOIN competencies C
ON C.id = S.competency_id