Identify duplicate rows based on specific columns

I need a query that identifies duplicate rows by adding an IsDuplicate column containing 'Yes' or 'No'. The query should check for duplicates only within certain columns.
What I have so far is almost correct, except that I need 'Yes' to appear on all of the duplicate rows, not just the second and later ones. This is a simplified example; there will be other columns that need to be selected but not included in the duplicate check.
select Emp_Name
      ,Company
      ,Join_Date
      ,Resigned_Date
      ,case
         when ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date, Resigned_Date
                                 order by Emp_Name, Company, Join_Date, Resigned_Date) > 1 then 'Yes'
         else 'No'
       end as IsDuplicate
      ,ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date, Resigned_Date
                          order by Emp_Name, Company, Join_Date, Resigned_Date) as RowNumber
      ,Hours
from Emp_Details
https://sqliteonline.com/#fiddle=dbdab61529544220bd3319407dbafd4beba1671d14ef00bf1635011c6f233dea

What I have so far is almost correct, except that I need the yes to appear all of the duplicate rows.
You want count(*) rather than row_number():
(case when count(*) over (partition by Emp_Name, Company, Join_Date, Resigned_Date) > 1
then 'Yes' else 'No'
end) as IsDuplicate
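A minimal sketch of the count(*) approach, run through Python's sqlite3 module so it can be tried locally. The sample rows are invented for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp_Details "
    "(Emp_Name TEXT, Company TEXT, Join_Date TEXT, Resigned_Date TEXT, Hours INTEGER)"
)
# Hypothetical sample data: the two Ann rows differ only in Hours.
conn.executemany(
    "INSERT INTO Emp_Details VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Acme", "2020-01-01", "2021-01-01", 10),
        ("Ann", "Acme", "2020-01-01", "2021-01-01", 20),
        ("Bob", "Acme", "2019-05-01", "2020-05-01", 30),
    ],
)

# COUNT(*) OVER gives every row the size of its partition,
# so every member of a duplicate group is flagged, not only the later ones.
rows = conn.execute(
    """
    SELECT Emp_Name, Hours,
           CASE WHEN COUNT(*) OVER (PARTITION BY Emp_Name, Company, Join_Date, Resigned_Date) > 1
                THEN 'Yes' ELSE 'No' END AS IsDuplicate
    FROM Emp_Details
    ORDER BY Emp_Name, Hours
    """
).fetchall()
print(rows)
```

Hours is selected but left out of the PARTITION BY, which is how extra columns can be returned without taking part in the duplicate check.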

Try this and let me know if the problem is not resolved:
select E1.Emp_Name
      ,E1.Company
      ,E1.Join_Date
      ,E1.Resigned_Date
      ,CASE WHEN E2.cnt IS NOT NULL THEN 'YES' ELSE 'NO' END AS Duplicate
FROM Emp_Details E1
LEFT OUTER JOIN (
    select Emp_Name, Company, Join_Date, Resigned_Date, count(*) AS cnt
    FROM Emp_Details
    GROUP BY Emp_Name, Company, Join_Date, Resigned_Date
    HAVING COUNT(*) > 1
) E2 ON E1.Emp_Name = E2.Emp_Name
    AND E1.Company = E2.Company
    AND E1.Join_Date = E2.Join_Date
    AND E1.Resigned_Date = E2.Resigned_Date
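One caveat with a join-based check: an equality comparison never matches NULL, whereas GROUP BY and PARTITION BY treat NULLs as one group. The sketch below (sqlite3 again, with an invented employee whose Resigned_Date is NULL) suggests the two approaches can disagree on such rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp_Details (Emp_Name TEXT, Company TEXT, Join_Date TEXT, Resigned_Date TEXT)"
)
# Hypothetical data: two identical rows for someone who has not resigned.
conn.executemany(
    "INSERT INTO Emp_Details VALUES (?, ?, ?, ?)",
    [("Cal", "Acme", "2022-03-01", None), ("Cal", "Acme", "2022-03-01", None)],
)

# Join-based flag: the NULL = NULL comparison is not true, so nothing joins.
join_flags = conn.execute(
    """
    SELECT CASE WHEN E2.cnt IS NOT NULL THEN 'YES' ELSE 'NO' END
    FROM Emp_Details E1
    LEFT JOIN (SELECT Emp_Name, Company, Join_Date, Resigned_Date, COUNT(*) AS cnt
               FROM Emp_Details
               GROUP BY Emp_Name, Company, Join_Date, Resigned_Date
               HAVING COUNT(*) > 1) E2
      ON  E1.Emp_Name = E2.Emp_Name
      AND E1.Company = E2.Company
      AND E1.Join_Date = E2.Join_Date
      AND E1.Resigned_Date = E2.Resigned_Date
    """
).fetchall()

# Window-based flag: both NULL rows land in the same partition.
window_flags = conn.execute(
    """
    SELECT CASE WHEN COUNT(*) OVER (PARTITION BY Emp_Name, Company, Join_Date, Resigned_Date) > 1
                THEN 'Yes' ELSE 'No' END
    FROM Emp_Details
    """
).fetchall()
print(join_flags, window_flags)
```

If the compared columns can be NULL, the window version is probably the safer choice, or the join conditions need a NULL-aware comparison suited to the database in use.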

I'm not sure exactly what you're asking, since your query appears to give the results you describe.
However, if I were to ignore your example query and write one based purely on your specification, I would suggest something like the following:
with d as (
select * ,
row_number() over(partition by emp_name, company, join_date, resigned_date order by join_date) as rn
from Emp_Details E1
)
select emp_name, company, join_date, resigned_date,
case when rn=1 then 'No' else 'Yes' end IsDuplicate
from d
Edit...
select emp_name, company, join_date, resigned_date,
case when count(*) over(partition by Emp_Name, Company, Join_Date, Resigned_Date) = 1 then 'No' else 'Yes' end IsDuplicate
from Emp_Details
order by emp_name, company, join_date

Related

SQL Server: Group Records Based on Another Column

I am working on a table that contains employee data. The table has historical employee records based on department and year as follows:
Now I want to consolidate records based on EmployeeId, Department and get the Min FromYear and Max ToYear like this:
I tried to use a query :
Select EmployeeId, Department, MIN(FromYear), MAX(ToYear)
from Employee
GROUP BY EmployeeId, Department
But this query fails for the employee with ID 3, because it returns only 2 rows:
I have added a similar structure and query here: http://sqlfiddle.com/#!9/6f1e53/5
Any help would be highly appreciated!

This is a gaps-and-islands problem. Identify the islands using lag() and a cumulative sum. Then aggregate:
select employeeid, department, min(fromyear), max(toyear)
from (select e.*,
sum(case when prev_toyear >= fromyear - 1 then 0 else 1 end) over (partition by employeeid order by fromyear) as grp
from (select e.*,
lag(toyear) over (partition by employeeid, department order by fromyear) as prev_toyear
from employee e
) e
) e
group by employeeid, department, grp
order by employeeid, min(fromyear);
Here is a db<>fiddle.
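The original sample table did not survive on this page, so the sketch below uses invented rows to show how the lag() plus cumulative-sum query behaves: employee 3 moves from department A to B and back to A, while employee 1 has two adjacent stints in A. It assumes SQLite 3.25+ via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employeeid INTEGER, department TEXT, fromyear INTEGER, toyear INTEGER)"
)
# Hypothetical history rows (not the asker's data).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "A", 2010, 2011),
        (1, "A", 2012, 2013),  # adjacent to the previous stint, same island
        (3, "A", 2010, 2012),
        (3, "B", 2013, 2014),
        (3, "A", 2015, 2016),  # back in A after a gap, new island
    ],
)

islands = conn.execute(
    """
    SELECT employeeid, department, MIN(fromyear), MAX(toyear)
    FROM (SELECT e.*,
                 -- a row starts a new island unless it continues the previous stint
                 SUM(CASE WHEN prev_toyear >= fromyear - 1 THEN 0 ELSE 1 END)
                     OVER (PARTITION BY employeeid ORDER BY fromyear) AS grp
          FROM (SELECT e.*,
                       LAG(toyear) OVER (PARTITION BY employeeid, department
                                         ORDER BY fromyear) AS prev_toyear
                FROM employee e
               ) e
         ) e
    GROUP BY employeeid, department, grp
    ORDER BY employeeid, MIN(fromyear)
    """
).fetchall()
print(islands)
```

Employee 3 keeps three separate rows because the second stint in A does not touch the first, which a plain GROUP BY on EmployeeId and Department would have merged.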

You can use a self join as well:
select a.employeeid, min(a.fromyear), max(b.toyear) from emp a
inner join emp b on a.employeeid=b.employeeid
group by a.employeeid

How to eliminate duplicate records based on only few columns in table and keep one and indicate that there were duplicate records corresponding to it?

The resultset I have is like shown below:
And expected output is like shown below:
Any idea how can we achieve this with SQL in Oracle?

You can use window functions:
select city, name, salary,
(case when cnt > 1 then 'Multiple' else 'Single' end) as Indicator
from (select t.*,
count(*) over (partition by city, name) as cnt,
row_number() over (partition by city, name order by salary) as seqnum
from t
) t
where seqnum = 1;
EDIT:
Actually, if you want the minimum salary:
select city, name, min(salary),
(case when count(*) = 1 then 'Single' else 'Multiple' end) as indicator
from t
group by city, name;
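As a rough check that the window-function form and the GROUP BY form agree, here is a sketch against an invented table t, run through Python's sqlite3 (assuming SQLite 3.25+). The city, name and salary values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, name TEXT, salary INTEGER)")
# Hypothetical rows: Ann in Paris appears twice with different salaries.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Paris", "Ann", 100), ("Paris", "Ann", 200), ("Rome", "Bob", 150)],
)

# Window version: keep the lowest-salary row per (city, name) and label the group size.
windowed = conn.execute(
    """
    SELECT city, name, salary,
           CASE WHEN cnt > 1 THEN 'Multiple' ELSE 'Single' END AS indicator
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY city, name) AS cnt,
                 ROW_NUMBER() OVER (PARTITION BY city, name ORDER BY salary) AS seqnum
          FROM t) x
    WHERE seqnum = 1
    ORDER BY city, name
    """
).fetchall()

# Aggregate version: one row per (city, name) with the minimum salary.
grouped = conn.execute(
    """
    SELECT city, name, MIN(salary),
           CASE WHEN COUNT(*) = 1 THEN 'Single' ELSE 'Multiple' END AS indicator
    FROM t
    GROUP BY city, name
    ORDER BY city, name
    """
).fetchall()
print(windowed, grouped)
```

The window version is the one to reach for when other columns from the surviving row are needed, since GROUP BY can only return grouped columns and aggregates.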

Try this, which keeps one row per duplicate group and deletes the rest:
DELETE FROM tablename WHERE rowid NOT IN
(SELECT MIN(rowid)
FROM tablename
GROUP BY city, name, salary);
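A sketch of rowid-based deduplication, using SQLite's implicit rowid as a stand-in for Oracle's ROWID (an assumption made for illustration; the two are not the same thing internally). The table name t and its rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, name TEXT, salary INTEGER)")
# Hypothetical rows: the Paris/Ann row is stored twice.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Paris", "Ann", 100), ("Paris", "Ann", 100), ("Rome", "Bob", 150)],
)

# Keep the row with the smallest rowid in each group and delete every other row.
conn.execute(
    """
    DELETE FROM t
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY city, name, salary)
    """
)
remaining = conn.execute("SELECT city, name, salary FROM t ORDER BY city").fetchall()
print(remaining)
```

Note that this changes the stored data, whereas the question only asked for a result set with an indicator column.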

Need help to get correct sql query

I have a table Employee, and one of its attributes is 'Gender'. The Gender column holds two kinds of values: 'Male' or 'Female'.
I am supposed to write a query whose output alternates by gender: the 1st record should be 'Male', the 2nd 'Female', the 3rd 'Male', the 4th 'Female', and so on.
I used the query below to try to fetch the records in that order:
select name, empid, salary, gender, rownum rn, case gender when 'Male' then
rn = (select * from (select rownum rn from Employee) where mod (rn, 2) <> 0)
else rn = (select * from (select rownum rn from Employee) where mod (rn, 2) = 0) end as Org_Gender from employee;
but this query does not produce the required output.
Can someone give me the correct syntax, please?

This should do it:
select empid, name, gender
from (
select name, empid, gender,
row_number() over (partition by gender order by name) as rn
from employee
) t
order by rn, gender
row_number() over (partition by gender ..) will number all females from 1 to x and all males from 1 to x. By ordering the outer query using that value the final output will have the first female, then the first male, the second female, the second male and so on.
SQLFiddle example: http://sqlfiddle.com/#!4/23f656/2
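For anyone who wants to try the interleaving idea locally, here is a sketch using Python's sqlite3 (assuming SQLite 3.25+) and four invented employees. It sorts gender descending so that 'Male' comes first within each pair, as the question asked; that tweak is an addition to the answer's query, not part of it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid INTEGER, name TEXT, gender TEXT)")
# Hypothetical employees.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", "Female"), (2, "Bob", "Male"), (3, "Carol", "Female"), (4, "Dave", "Male")],
)

# Number the rows within each gender, then sort by that number so the genders interleave.
alternating = conn.execute(
    """
    SELECT empid, name, gender
    FROM (SELECT name, empid, gender,
                 ROW_NUMBER() OVER (PARTITION BY gender ORDER BY name) AS rn
          FROM employee) t
    ORDER BY rn, gender DESC
    """
).fetchall()
print(alternating)
```

If one gender has more rows than the other, the surplus rows simply appear at the end without a partner.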

SQL Syntax on DISTINCT Query

I have an Employee table with a DeptCode column. I want a list of distinct DeptCode values and the first created date for each one in the Employee table. This will also tell me which employee was entered first for a given department.
I used:
SELECT DISTINCT DEPTCODE,
CREATEDDATE
FROM EMPLOYEE
The date returned is incorrect.
Is there any specific syntax to handle this?

Try:
SELECT DEPTCODE,
Min(CREATEDDATE)
FROM EMPLOYEE
GROUP BY DEPTCODE

If you want the department codes, earliest creation date, and the name of the employee, then I would recommend window functions:
select deptcode, name, createddate
from (select e.*,
row_number() over (partition by deptcode order by createddate) as seqnum
from employee e
) e
where seqnum = 1;
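A small sketch of the row_number() approach, run through Python's sqlite3 (assuming SQLite 3.25+). The employee names, department codes and dates are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, deptcode TEXT, createddate TEXT)")
# Hypothetical rows; ISO date strings sort correctly as text.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "D1", "2020-01-05"),
        ("Bob", "D1", "2019-03-01"),
        ("Cy", "D2", "2021-07-10"),
    ],
)

# seqnum = 1 marks the earliest-created employee in each department.
first_per_dept = conn.execute(
    """
    SELECT deptcode, name, createddate
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY deptcode ORDER BY createddate) AS seqnum
          FROM employee e) e
    WHERE seqnum = 1
    ORDER BY deptcode
    """
).fetchall()
print(first_per_dept)
```

When two employees share the earliest date in a department, row_number() picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY makes the choice repeatable.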

You can use GROUP BY and MIN to achieve this.
SELECT DEPTCODE, MIN(CREATEDDATE)
from EMPLOYEE
GROUP BY DEPTCODE

Something like this:
SELECT employee.deptcode,
employee_name,
mindate
FROM employee
JOIN (SELECT deptcode,
Min(createddate) mindate
FROM employee
GROUP BY deptcode) temp
ON employee.deptcode = temp.deptcode
AND createddate = mindate

conditional order by clause in sql

I have a query that should order the result ascending or descending depending on a column value.
e.g.
if an employee of type manager exists THEN order by joining_date, birth_date ASC
else if the employee is a developer THEN order by joining_date, birth_date DESC.
I would like to achieve something like the following, but can't get it to work.
ORDER BY CASE WHEN employee_type = 'm'
THEN joining_date, birth_date ASC;
WHEN employee_type = 'd'
THEN joining_date, birth_date DESC;

Well, I got the answer after some research.
We can add multiple columns to the ORDER BY clause conditionally as follows:
ORDER BY DECODE(employee_type, 'm', joining_date, birth_date, salary) ASC,
DECODE(employee_type, 'd', joining_date, birth_date, salary) DESC
This will order the result on the basis of employee_type.

I suspect you want something like this:
ORDER BY
employee_type DESC -- first all the managers, then the developers
-- and in every one of these two groups
, joining_date -- first order by joining date
, CASE WHEN employee_type = 'm' -- and then either by
THEN birth_date -- birth date ascending for managers
ELSE NULL
END -- or
, birth_date DESC ; -- birth date descending for the rest (devs)
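To see how this ORDER BY plays out, here is a sketch with Python's sqlite3 and four invented employees who all share one joining date, so that only the birth-date rules decide the order within each type. The table and column names follow the question; the rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (name TEXT, employee_type TEXT, joining_date TEXT, birth_date TEXT)"
)
# Hypothetical rows: 'm' = manager, 'd' = developer.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("D_old", "d", "2020-01-01", "1975-01-01"),
        ("M_young", "m", "2020-01-01", "1980-01-01"),
        ("D_young", "d", "2020-01-01", "1985-01-01"),
        ("M_old", "m", "2020-01-01", "1970-01-01"),
    ],
)

ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM emp
        ORDER BY employee_type DESC,                      -- managers ('m') before developers ('d')
                 joining_date,
                 CASE WHEN employee_type = 'm'
                      THEN birth_date ELSE NULL END,      -- managers: birth date ascending
                 birth_date DESC                          -- developers: birth date descending
        """
    )
]
print(ordered)
```

The CASE key is NULL for every developer, so it ties for them and the final birth_date DESC key takes over; for managers the CASE key already fixes the order.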

The question is a bit poorly specified.
order the result in asc or desc depending upon a column value.
A column takes many values (as there are multiple rows).
Now, the order by clause uses an expression and orders the rows by it.
That expression should be morphotropic (;))
So, assuming the standard Oracle employee schema, managers are:
select *
from emp e
where exists (select 1 from emp m where m.mgr_id = e.id)
A workaround query may be:
select e.id, e.name, e.birth_date,
case
when (select count(*)
from emp m
where exists (select 1 from emp s where s.mgr_id = m.id)
) -- existence of a manager
> 0 then e.birth_date - to_date('1-Jan-1000','dd-mon-yyyy')
else to_date('1-Jan-1000','dd-mon-yyyy') - e.birth_date
end as tricky_expression
from emp e
order by 4;
That expression is the case: using a constant (the subquery that decides whether there are any managers), it flips the values from positive to negative, which reverses the sort direction.
UPDATE: with the details in the comments:
select id, name, birth_date, emp_type
from (
select id, name, birth_date, emp_type,
case when mgr_count > 0 then birth_date - to_date('1-Jan-1000','dd-mon-yyyy')
else to_date('1-Jan-1000','dd-mon-yyyy') - birth_date
end as tricky_expression
from (
select e.id, e.name, e.birth_date, e.emp_type,
count(case when e.emp_type = 'M' then 1 end) over () as mgr_count
from emp e
where your_conditions
)
order by tricky_expression
)
where rownum = 1;

If there is a manager in the company, this query returns the oldest manager; otherwise it returns the youngest developer.
select
id, name, birth_date, emp_type
from emp
where
id = (select
max(id) keep (dense_rank first order by
decode(emp_type, 'M', 1, 'D', 2),
joining_date,
decode(emp_type, 'M', 1, 'D', -1) * (birth_date - to_date('3000','yyyy')))
from emp)
