I need a query to identify duplicate rows by adding an IsDuplicate column with the text yes/no. The query needs to check for duplicates only within certain columns.
What I have so far is almost correct, except that I need the 'Yes' to appear on all of the duplicate rows. This is just a simplified example; there will be other columns that need to be selected but not included in the duplicate check.
select Emp_Name
      ,Company
      ,Join_Date
      ,Resigned_Date
      ,case
         when row_number() over (partition by Emp_Name, Company, Join_Date, Resigned_Date
                                 order by Emp_Name, Company, Join_Date, Resigned_Date) > 1 then 'Yes'
         else 'No'
       end as IsDuplicate
      ,row_number() over (partition by Emp_Name, Company, Join_Date, Resigned_Date
                          order by Emp_Name, Company, Join_Date, Resigned_Date) as RowNumber
      ,Hours
from Emp_Details
https://sqliteonline.com/#fiddle=dbdab61529544220bd3319407dbafd4beba1671d14ef00bf1635011c6f233dea
What I have so far is almost correct, except that I need the 'Yes' to appear on all of the duplicate rows.
You want count(*) rather than row_number():
(case when count(*) over (partition by Emp_Name, Company, Join_Date, Resigned_Date) > 1
then 'Yes' else 'No'
end) as IsDuplicate
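To see why count(*) over the partition marks every row of a duplicate group, here is a minimal sketch run through Python's built-in sqlite3 module (this assumes a SQLite build with window-function support, 3.25 or later). The sample rows are invented for illustration and are not taken from the question's fiddle.

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp_Details ("
    "Emp_Name TEXT, Company TEXT, Join_Date TEXT, Resigned_Date TEXT, Hours INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp_Details VALUES (?, ?, ?, ?, ?)",
    [
        ("John", "Abc", "2019-01-01", "2019-06-01", 8),
        ("John", "Abc", "2019-01-01", "2019-06-01", 6),  # same key columns as the row above
        ("Mary", "Xyz", "2020-02-01", "2020-09-01", 7),
    ],
)

# COUNT(*) OVER (...) returns the size of the whole partition on every row,
# so both 'John' rows get 'Yes', not only the second one.
query = """
SELECT Emp_Name, Hours,
       CASE WHEN COUNT(*) OVER (PARTITION BY Emp_Name, Company, Join_Date, Resigned_Date) > 1
            THEN 'Yes' ELSE 'No' END AS IsDuplicate
FROM Emp_Details
ORDER BY Emp_Name, Hours
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Hours is selected but left out of the partition, which matches the requirement that some columns are displayed without taking part in the duplicate check.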
Try this and let me know if the problem is not resolved:
select E1.Emp_Name
,E1.Company
,E1.Join_Date
,E1.Resigned_Date
,case when E2.cnt is not null then 'YES' else 'NO' end as Duplicate
from Emp_Details E1
left outer join (
  -- one row per combination that occurs more than once
  select Emp_Name, Company, Join_Date, Resigned_Date, count(*) as cnt
  from Emp_Details
  group by Emp_Name, Company, Join_Date, Resigned_Date
  having count(*) > 1
) E2 on E1.Emp_Name = E2.Emp_Name
    and E1.Company = E2.Company
    and E1.Join_Date = E2.Join_Date
    and E1.Resigned_Date = E2.Resigned_Date
-- note: rows with a NULL in any of these columns will not match on "="
I'm not sure exactly what you're asking, since your query appears to give the results you describe.
However, if I were to ignore your example query and just write a query based on your specification, I would suggest something like the following:
with d as (
select * ,
row_number() over(partition by emp_name, company, join_date, resigned_date order by join_date) as rn
from Emp_Details E1
)
select emp_name, company, join_date, resigned_date,
case when rn=1 then 'No' else 'Yes' end IsDuplicate
from d
Edit...
select emp_name, company, join_date, resigned_date,
case when count(*) over(partition by Emp_Name, Company, Join_Date, Resigned_Date) = 1 then 'No' else 'Yes' end IsDuplicate
from Emp_Details
order by emp_name, company, join_date
I have a requirement to find every emplid whose data differs between rows of the same table. The table consists of 50-60 columns. I need to check whether any column has changed from the previous row; if so, the emplid should be picked up. Any newly added employee should be picked up as well.
I have created a basic query and it works, but I need another way to achieve the same purpose because I do not want to write out every column name.
My query:
select
t.emplid
from
ps_custom_tbl t, ps_custom_tbl prev_t
where
prev_t.emplid = t.emplid
and t.effdt = (select max(effdt) from ps_custom_tbl t2
where t2.emplid = t.emplid)
and prev_t.effdt = (select max(prev_t2.effdt) from ps_custom_tbl prev_t2
where prev_t2.emplid = prev_t.emplid and prev_t2.effdt < t.effdt)
and (t.first_name <> prev_t.first_name or t.last_name <> prev_t.last_name …. 50 columns);
Can you please suggest another way to achieve same thing?
You can use MINUS.
If the query returns no rows, both sets are the same; if it returns some rows, there is a difference between them.
create table emp as select * from hr.employees;
insert into emp select employee_id+1000, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id,
decode(department_id ,30,70, department_id)
from hr.employees;
select first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
from emp where employee_id <= 1000
minus
select first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
from emp where employee_id > 1000;
But you have to list the columns you want compared, because otherwise columns that legitimately differ (e.g. dates or ids) will be compared too. Still, it is easier to list columns in the SELECT clause than to write a WHERE condition for every one.
Maybe this will help.
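As a small runnable illustration of the same idea, the sketch below uses Python's sqlite3 module. SQLite has no MINUS, so it uses EXCEPT, which is the standard SQL spelling of the same set operator. The tables emp_old and emp_new and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_old (id INTEGER, first_name TEXT, department_id INTEGER);
CREATE TABLE emp_new (id INTEGER, first_name TEXT, department_id INTEGER);
INSERT INTO emp_old VALUES (1, 'Ann', 30), (2, 'Bob', 50);
INSERT INTO emp_new VALUES (1, 'Ann', 70), (2, 'Bob', 50);
""")

# Rows of emp_old that have no identical row in emp_new.
# An empty result would mean the listed columns agree for every row.
diff = conn.execute("""
SELECT id, first_name, department_id FROM emp_old
EXCEPT
SELECT id, first_name, department_id FROM emp_new
""").fetchall()
print(diff)
```

Only the row for id 1 comes back, because its department_id differs between the two tables. The set operators also treat two NULLs in the same column as equal, so no extra null handling is needed here.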
-- or if different tables and want to compare all cols simply do
drop table emp;
create table emp as select * from hr.employees;
create table emp2 as
select employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id,
decode(department_id ,30,70, department_id) department_id
from hr.employees;
select * from emp
minus
select * from emp2;
---- ADD DATE CRITERIA
-- Yes, you can add date criteria and use analytic functions to check which
-- record is newer and which is older, then compare one to the other, as below:
drop table emp;
create table emp as select * from hr.employees;
insert into emp
select
employee_id,
first_name,
last_name,
email,
phone_number,
hire_date+1,
job_id,
salary,
commission_pct,
manager_id,
decode(department_id ,30,70, department_id)
from hr.employees;
with data as -- thanks to WITH, the data is retrieved only once
(select employee_id, first_name, last_name, email, phone_number,
hire_date,
row_number() over(partition by employee_id order by hire_date desc) rn, -- 1 = newer record, 2 = older record
job_id, salary, commission_pct, manager_id, department_id
from emp)
select employee_id, first_name, last_name, email, phone_number, department_id from data where rn = 1
minus -- find the differences
select employee_id, first_name, last_name, email, phone_number, department_id from data where rn = 2;
You will have to write out all the columns in some form no matter what you do.
For comparing the current and previous rows, you might find this easier:
select
col1,
col2,
...
lag(col1) over ( partition by empid order by effdt ) as prev_col1,
lag(col2) over ( partition by empid order by effdt ) as prev_col2
...
and then your comparison will be along the lines of
select *
from ( <query above> )
where
decode(col1,prev_col1,0,1) = 1 or
decode(col2,prev_col2,0,1) = 1 or
...
DECODE treats two nulls as equal, so using it this way handles the null cases that a plain <> comparison would miss.
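The LAG-based comparison can be tried out with Python's sqlite3 module, as in the sketch below. Two things are swapped in and are assumptions of this example rather than part of the answer: SQLite has no DECODE, so the null-safe IS NOT operator plays the same role, and the first row of each employee is skipped with rn > 1 because it has no previous row to compare against. The table emp_hist and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_hist (emplid INTEGER, effdt TEXT, first_name TEXT, phone TEXT);
INSERT INTO emp_hist VALUES
  (1, '2020-01-01', 'Ann', NULL),
  (1, '2021-01-01', 'Ann', NULL),
  (2, '2020-01-01', 'Bob', NULL),
  (2, '2021-01-01', 'Bob', '555-0100');
""")

# LAG() fetches the previous row's value per employee; IS NOT compares
# null-safely (NULL IS NOT NULL is false), like DECODE(a, b, 0, 1) = 1.
changed = conn.execute("""
SELECT emplid, effdt
FROM (
  SELECT emplid, effdt, first_name, phone,
         LAG(first_name) OVER (PARTITION BY emplid ORDER BY effdt) AS prev_first_name,
         LAG(phone)      OVER (PARTITION BY emplid ORDER BY effdt) AS prev_phone,
         ROW_NUMBER()    OVER (PARTITION BY emplid ORDER BY effdt) AS rn
  FROM emp_hist
)
WHERE rn > 1
  AND (first_name IS NOT prev_first_name OR phone IS NOT prev_phone)
ORDER BY emplid
""").fetchall()
print(changed)
```

Employee 1 is not reported even though phone is NULL in both rows, while employee 2 is reported because phone went from NULL to a value. A plain <> would have missed that second case.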
My requirement is to send out data to managers, they change any/all/none of the data in the columns, and send back to me. I then have to identify each column that has a difference from what I sent, and mark those columns as changed for a central office reviewer to visually scan and approve/deny the changes for integration back into the central data set.
This solution may not fit your needs, of course, but it offers a template structure that you can extend no matter the number of columns. In the case of your question, 50-60 columns will make this SQL query huge, but I've written heinously long queries in the past with great success. Following this template, add columns a few at a time rather than all at once, and check that the query still works along the way.
You could easily write PL/SQL to generate this query for the tables in question.
This would get very cumbersome if you had to compare columns from 3 or more tables, or changes in both directions. I only care about changes in a single direction: did the person change the columns of my original row or not? If so, which columns did they change, what was my before value, and what is their after value? Show me nothing else, please.
In other words, only show me rows with columns that have changes with their before values and nothing else.
create table thing1 (id number, firstname varchar2(10), lastname varchar2(10));
create table thing2 (id number, firstname varchar2(10), lastname varchar2(10));
insert into thing1 values (1,'Buddy', 'Slacker');
insert into thing2 values (1,'Buddy', 'Slacker');
insert into thing1 values (2,'Mary', 'Slacker');
insert into thing2 values (2,'Mary', 'Slacke');
insert into thing1 values (3,'Timmy', 'Slacker');
insert into thing2 values (3,'Timm', 'Slacker');
insert into thing1 values (4,'Missy', 'Slacker');
insert into thing2 values (4,'Missy', 'Slacker');
commit;
Un-comment commented select * queries one at a time after each data set to understand what is in each data set at each stage of the refinement process.
with rowdifferences as
(
select
id
,firstname
,lastname
from thing2
minus
select
id
,firstname
,lastname
from thing1
)
--select * from rowdifferences
,thing1matches as
(
select
t1.id
,t1.firstname
,t1.lastname
from thing1 t1
join rowdifferences rd on t1.id = rd.id
)
--select * from thing1matches
, col1differences as
(
select
id
,firstname
from rowdifferences
minus
select
id
,firstname
from thing1matches
)
--select * from col1differences
, col2differences as
(
select
id
,lastname
from rowdifferences
minus
select
id
,lastname
from thing1matches
)
--select * from col2differences
,truedifferences as
(
select
case when c1.id is not null then c1.id
when c2.id is not null then c2.id
end id
,c1.firstname
,c2.lastname
from col1differences c1
full join col2differences c2 on c1.id = c2.id
)
--select * from truedifferences
select
t1m.id
,case when td.firstname is not null then t1m.firstname end beforefirstname
,td.firstname afterfirstname
,case when td.lastname is not null then t1m.lastname end beforelastname
,td.lastname afterlastname
from thing1matches t1m
join truedifferences td on t1m.id = td.id
;
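The answer above mentions that the query could be generated rather than typed by hand. It suggests PL/SQL for that; the sketch below uses Python instead, purely to illustrate the idea. The function name build_column_diff_ctes is made up, the CTE names are derived from the column names (the template itself uses col1differences and col2differences), and the column list is hard-coded here. In Oracle it could presumably be read from the data dictionary instead.

```python
def build_column_diff_ctes(columns, key="id"):
    """Return one '<col>differences' CTE per column, following the template's pattern.

    Hypothetical helper: it only builds SQL text, it does not run anything.
    """
    ctes = []
    for col in columns:
        ctes.append(
            f", {col}differences as\n"
            f"(\n"
            f"select {key}, {col} from rowdifferences\n"
            f"minus\n"
            f"select {key}, {col} from thing1matches\n"
            f")"
        )
    return "\n".join(ctes)


# Generate the per-column CTEs for the two columns of the example tables.
generated = build_column_diff_ctes(["firstname", "lastname"])
print(generated)
```

The same loop could be extended to emit the final select list (the before/after pairs), so that adding a column only means adding one name to the list.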
I have a table called employee with the following columns:
emp_id number
emp_name varchar(30)
salary float
dept_id number
I want the output to show the name of any one employee from each department, along with that department's employee count. I tried the query below, but it didn't work:
SELECT emp_name, count(*) FROM emp
GROUP BY dept_id, emp_name;
Expected output:
emp_name, count(*)
abc, 4
def, 2
xyz, 10
Can anyone suggest?
You can try this if you want just a basic "random employee" shown for each department.
select emp_name, emp_count
from (
select emp_name, dept_id,
count(*) over (partition by dept_id) emp_count,
row_number() over (partition by dept_id
order by dbms_random.value ) rnum
from employee
)
where rnum = 1
/
This uses analytic functions to calculate the counts and then picks one random row per department to display.
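The same query shape can be tried in Python's sqlite3 module, as sketched here. dbms_random.value is Oracle-specific, so this version uses SQLite's random() instead; the sample rows are invented, and dept_id is added to the output only to make the result easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, salary REAL, dept_id INTEGER);
INSERT INTO employee VALUES
  (1, 'abc', 100.0, 10), (2, 'abd', 110.0, 10), (3, 'abe', 120.0, 10),
  (4, 'def', 200.0, 20), (5, 'deg', 210.0, 20);
""")

# COUNT(*) OVER gives each row its department's head count;
# ROW_NUMBER() over a random order lets us keep one arbitrary row per department.
rows = conn.execute("""
SELECT dept_id, emp_name, emp_count
FROM (
  SELECT emp_name, dept_id,
         COUNT(*) OVER (PARTITION BY dept_id) AS emp_count,
         ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY random()) AS rnum
  FROM employee
)
WHERE rnum = 1
ORDER BY dept_id
""").fetchall()
for row in rows:
    print(row)
```

The counts are stable, but the name shown for each department can change from run to run. If any name will do and randomness is not needed, selecting min(emp_name) and count(*) with group by dept_id is a simpler option.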
I have an EMPLOYEE table with EMP_ID, NAME and DEPARTMENT_ID.
I want to order the records so that odd DEPARTMENT_ID values are sorted ascending and even DEPARTMENT_ID values descending.
Can it be done?
Thank you
You can use CASE in the ORDER BY, change the sign accordingly:
...
ORDER BY CASE WHEN DEPARTMENT_ID % 2 = 0
THEN -DEPARTMENT_ID
ELSE DEPARTMENT_ID END ASC;
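A quick way to see what the sign flip does is to run it on a handful of ids. The sketch below does that with Python's sqlite3 module, where % works as the modulo operator; the six sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INTEGER, NAME TEXT, DEPARTMENT_ID INTEGER);
INSERT INTO EMPLOYEE VALUES
  (1, 'a', 1), (2, 'b', 2), (3, 'c', 3), (4, 'd', 4), (5, 'e', 5), (6, 'f', 6);
""")

# Even ids are negated, so sorting ascending on the key lists them first,
# in descending order of the real id; odd ids follow in ascending order.
order = [
    row[0]
    for row in conn.execute("""
SELECT DEPARTMENT_ID
FROM EMPLOYEE
ORDER BY CASE WHEN DEPARTMENT_ID % 2 = 0
              THEN -DEPARTMENT_ID
              ELSE DEPARTMENT_ID END ASC
""")
]
print(order)  # [6, 4, 2, 1, 3, 5]
```

The even group comes first here because its sort keys are negative. This relies on all DEPARTMENT_ID values being positive, which is an assumption about the data.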
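Try this:
SELECT *
FROM EMPLOYEE
ORDER BY CASE WHEN DEPARTMENT_ID % 2 = 1 THEN 1 ELSE 2 END,  -- odd ids first, then even ids
         CASE WHEN DEPARTMENT_ID % 2 = 1 THEN DEPARTMENT_ID ELSE -DEPARTMENT_ID END  -- odd ascending, even descending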
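Make two queries: the first filters the odd ids, the second the even ones. Order each as you wish and then UNION ALL the queries. Note that without an ORDER BY on the outermost query, the database is not strictly required to preserve the order of the subqueries.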
SELECT e.* FROM (SELECT *
FROM EMPLOYEE
WHERE MOD(DEPARTMENT_ID, 2) = 1
ORDER BY DEPARTMENT_ID ASC) e
UNION ALL
SELECT e1.* FROM (SELECT *
FROM EMPLOYEE
WHERE MOD(DEPARTMENT_ID, 2) = 0
ORDER BY DEPARTMENT_ID DESC) e1
I have a table Employee, and one of its attributes is 'Gender'. The Gender column holds two kinds of values, 'Male' or 'Female'.
Now I am supposed to write a query whose output alternates: the 1st record should be 'Male', the 2nd 'Female', the 3rd 'Male', the 4th 'Female', and so on.
I used the query below to fetch the records as described above:
select name, empid, salary, gender, rownum rn, case gender when 'Male' then
rn = (select * from (select rownum rn from Employee) where mod (rn, 2) <> 0)
else rn = (select * from (select rownum rn from Employee) where mod (rn, 2) = 0) end as Org_Gender from employee;
but this query does not fetch the required output.
Can someone give me the correct syntax, please?
This should do it:
select empid, name, gender
from (
select name, empid, gender,
row_number() over (partition by gender order by name) as rn
from employee
) t
order by rn, gender
row_number() over (partition by gender ..) will number all females from 1 to x and all males from 1 to x. By ordering the outer query using that value the final output will have the first female, then the first male, the second female, the second male and so on.
SQLFiddle example: http://sqlfiddle.com/#!4/23f656/2
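For a self-contained check of the interleaving, here is a minimal sketch using Python's sqlite3 module with four invented rows. One deliberate change from the query above: the outer ORDER BY uses gender DESC so that 'Male' sorts before 'Female' within each rn, which is the order the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INTEGER, name TEXT, gender TEXT);
INSERT INTO employee VALUES
  (1, 'Adam', 'Male'), (2, 'Beth', 'Female'),
  (3, 'Carl', 'Male'), (4, 'Dina', 'Female');
""")

# Number the rows within each gender, then sort by that number so the
# two groups are interleaved; gender DESC puts 'Male' ahead of 'Female'.
rows = conn.execute("""
SELECT empid, name, gender
FROM (
  SELECT name, empid, gender,
         ROW_NUMBER() OVER (PARTITION BY gender ORDER BY name) AS rn
  FROM employee
) t
ORDER BY rn, gender DESC
""").fetchall()
for row in rows:
    print(row)
```

If one gender has more rows than the other, the surplus rows simply appear at the end without a partner.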