I am attempting to delete duplicate rows from my table. For example, if my customer table contained the following:
first_name last_name email
fred wilford wilford#xchange.co.uk
fred wilford wilford#xchange.co.uk
Damian Jones jones#xchange.co.uk
The ideal result would be the following:
first_name last_name email
fred wilford wilford#xchange.co.uk
Damian Jones jones#xchange.co.uk
This should be fairly straightforward with an intermediary table: create it containing the duplicate rows, delete the duplicates from the master table, and finally insert all rows from the intermediary table back into the master table. However, I would prefer to avoid the intermediary table and just use something like a WITH statement.
Consider the following example:
with dups as
(
select name,last_name,email from customer group by 1,2,3 having
count(*) > 1
)
delete from customer
using
(
select name,last_name,email from customer group by 1,2,3 having
count(*) > 1
)b
where b.name = customer.name;
insert into customer
(
select name,last_name,email from dups
)
The trouble is that the final insert statement fails because "dups" is not recognised. Is there a way to fix this? Thanks in advance.
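You can chain the CTEs if you want to:
WITH dups AS
(
select name,last_name,email from customer group by 1,2,3 having
count(*) > 1
),
del AS(
-- match on every column so that only exact duplicates are deleted
DELETE FROM customer USING dups WHERE dups.name = customer.name AND dups.last_name = customer.last_name AND dups.email = customer.email RETURNING dups.*
),
ins AS(
-- del returns one row per deleted row, so DISTINCT re-inserts a single copy
INSERT INTO customer(name,last_name,email) SELECT DISTINCT name,last_name,email FROM del RETURNING *
)
SELECT * FROM ins;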
You could also do it in this way:
Schema:
create table tbl (first_name varchar(50),
last_name varchar(50),
email varchar(50));
insert into tbl values
('fred','wilford','wilford#xchange.co.uk'),
('fred','wilford','wilford#xchange.co.uk'),
('Damian','Jones','jones#xchange.co.uk');
Do this:
CREATE TABLE temp (first_name varchar(50),
last_name varchar(50),
email varchar(50));
INSERT INTO temp SELECT DISTINCT * FROM tbl;
DROP TABLE tbl;
ALTER TABLE temp RENAME TO tbl;
check:
select * from tbl;
result:
first_name last_name email
fred wilford wilford#xchange.co.uk
Damian Jones jones#xchange.co.uk
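The copy, drop and rename steps above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for the real server, and the name tbl_dedup is made up for the illustration. Note that a table created with CREATE TABLE ... AS SELECT does not carry over indexes or constraints from the original, so those would have to be recreated.

```python
import sqlite3


def dedupe_by_copy():
    """Remove exact duplicate rows by copying DISTINCT rows into a new table."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
    cur = conn.cursor()
    cur.execute("CREATE TABLE tbl (first_name TEXT, last_name TEXT, email TEXT)")
    cur.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?)",
        [
            ("fred", "wilford", "wilford#xchange.co.uk"),
            ("fred", "wilford", "wilford#xchange.co.uk"),
            ("Damian", "Jones", "jones#xchange.co.uk"),
        ],
    )
    # Copy the distinct rows, drop the original, then rename the copy.
    # tbl_dedup is a hypothetical name chosen for this sketch.
    cur.execute("CREATE TABLE tbl_dedup AS SELECT DISTINCT * FROM tbl")
    cur.execute("DROP TABLE tbl")
    cur.execute("ALTER TABLE tbl_dedup RENAME TO tbl")
    conn.commit()
    rows = cur.execute(
        "SELECT first_name, last_name, email FROM tbl ORDER BY first_name"
    ).fetchall()
    conn.close()
    return rows


print(dedupe_by_copy())
```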
Instead of a WITH clause, you could create dups as a temporary table:
CREATE TEMP TABLE dups (name, last_name, email ) AS
(
select name,last_name,email from customer group by 1,2,3 having
count(*) > 1
);
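For completeness, here is a sketch of the whole temporary-table workflow: capture the duplicates, delete every copy, then insert one copy back. It assumes Python's sqlite3 module with an in-memory database rather than PostgreSQL, so the DELETE is written with EXISTS instead of USING. The comparison uses plain equality, so rows containing NULLs would not be matched.

```python
import sqlite3


def dedupe_with_temp_table():
    """Delete duplicated rows and re-insert a single copy via a temp table."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of PostgreSQL
    cur = conn.cursor()
    cur.execute("CREATE TABLE customer (name TEXT, last_name TEXT, email TEXT)")
    cur.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            ("fred", "wilford", "wilford#xchange.co.uk"),
            ("fred", "wilford", "wilford#xchange.co.uk"),
            ("Damian", "Jones", "jones#xchange.co.uk"),
        ],
    )
    # 1. Capture one copy of every row that occurs more than once.
    cur.execute(
        """
        CREATE TEMP TABLE dups AS
        SELECT name, last_name, email
        FROM customer
        GROUP BY name, last_name, email
        HAVING COUNT(*) > 1
        """
    )
    # 2. Delete every copy of those rows, matching on all three columns.
    cur.execute(
        """
        DELETE FROM customer
        WHERE EXISTS (
            SELECT 1 FROM dups d
            WHERE d.name = customer.name
              AND d.last_name = customer.last_name
              AND d.email = customer.email
        )
        """
    )
    # 3. Put a single copy of each duplicated row back.
    cur.execute("INSERT INTO customer SELECT name, last_name, email FROM dups")
    conn.commit()
    rows = cur.execute(
        "SELECT name, last_name, email FROM customer ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


print(dedupe_with_temp_table())
```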
I have these test tables, and I would like to select from them and combine the results ordered by timestamp:
create table employees
(
id bigint primary key,
account_id integer,
first_name varchar(150),
last_name varchar(150),
timestamp timestamp
);
create table accounts
(
id bigint primary key,
account_name varchar(150) not null,
timestamp timestamp
);
create table short_name
(
account_id bigint primary key,
full_name varchar(150) not null
);
INSERT INTO short_name(account_id, full_name)
VALUES(1, 'city 1');
INSERT INTO short_name(account_id, full_name)
VALUES(2, 'city 2');
INSERT INTO employees(id, account_id, first_name, last_name, timestamp)
VALUES(1, 1, 'Donkey', 'Kong', '10-10-10');
INSERT INTO employees(id, account_id, first_name, last_name, timestamp)
VALUES(2, 2, 'Ray', 'Kurzweil', '11-10-10');
INSERT INTO employees(id, account_id, first_name, last_name, timestamp)
VALUES(32, 2, 'Ray2', 'Kurzweil2', '1-10-10');
INSERT INTO employees(id, account_id, first_name, last_name, timestamp)
VALUES(33, 2, 'Ray3', 'Kurzweil3', '2-10-10');
INSERT INTO employees(id, account_id, first_name, last_name, timestamp)
VALUES(3432, 3, 'Percy', 'Fawcett', '6-10-10');
INSERT INTO accounts(id, account_name, timestamp)
VALUES(1, 'DK Banana Account', '5-10-10');
INSERT INTO accounts(id, account_name, timestamp)
VALUES(2, 'Kurzweil''s invetions moneyz baby!', '10-10-10');
INSERT INTO accounts(id, account_name, timestamp)
VALUES(3, 'Amazonian Emergency Fund', '10-10-10');
select *, e.timestamp, sn.full_name from employees e
INNER JOIN short_name as sn on sn.account_id = e.id
union all
select *, a.timestamp from accounts a
where timestamp >= '2022-03-25T13:00:00'
and timestamp < '2022-04-04T13:00:00'
AND timestamp IS NOT NULL
order by timestamp;
https://www.db-fiddle.com/f/pwzwQTsHuP27UDF17eAQy4/36
How can I select from the tables and display the combined rows ordered by timestamp?
The problem is that the tables have different numbers of columns; I would like to display those columns too, and sort all rows globally by timestamp.
Is it also possible to display the table name as the first column of the result?
Example result with the table name:
table_name | timestamp
employees | 2010-10-10T00:00:00.000Z
accounts | 2010-11-10T00:00:00.000Z
As others have mentioned, you haven't given a clear example of what you want the output to be; however, here's my attempt assuming you want one record per employee and one additional record per account.
Each row of the result set contains every possible column. These can be removed/reordered in the final select.
Query
with accounts_and_employees as (
select
'accounts' as table_name,
accounts.id,
accounts.id as account_id,
accounts.timestamp,
account_name,
null as first_name,
null as last_name
from accounts
union
select
'employees' as table_name,
employees.id,
account_id,
employees.timestamp,
account_name,
first_name,
last_name
from employees
join accounts
on employees.account_id = accounts.id
)
select accounts_and_employees.*, full_name
from accounts_and_employees
left join short_name
on short_name.account_id = accounts_and_employees.account_id
where timestamp between '2010-01-10' and '2010-10-30'
order by timestamp;
table_name | id | account_id | timestamp | account_name | first_name | last_name | full_name
employees | 32 | 2 | 2010-01-10T00:00:00.000Z | Kurzweil's invetions moneyz baby! | Ray2 | Kurzweil2 | city 2
employees | 33 | 2 | 2010-02-10T00:00:00.000Z | Kurzweil's invetions moneyz baby! | Ray3 | Kurzweil3 | city 2
accounts | 1 | 1 | 2010-05-10T00:00:00.000Z | DK Banana Account | | | city 1
employees | 3432 | 3 | 2010-06-10T00:00:00.000Z | Amazonian Emergency Fund | Percy | Fawcett |
accounts | 3 | 3 | 2010-10-09T00:00:00.000Z | Amazonian Emergency Fund | | |
employees | 1 | 1 | 2010-10-10T00:00:00.000Z | DK Banana Account | Donkey | Kong | city 1
accounts | 2 | 2 | 2010-10-10T00:00:00.000Z | Kurzweil's invetions moneyz baby! | | | city 2
If your output is just table name and timestamp, then you don't need any JOIN.
Just UNION employees and accounts.
select tablename, timestamp from
(select 'accounts' tablename, timestamp from accounts
union
select 'employees' tablename, timestamp from employees) a
order by timestamp
Otherwise, since the tables don't have the same column names, you'll need to give them matching names using column aliases.
select tablename, name, timestamp from
(select 'accounts' tablename, account_name as name, timestamp from accounts
union
select 'employees' tablename, concat(first_name,last_name) as name, timestamp from employees
) a
order by timestamp
I am unsure what you are trying to achieve, but you have to "pad" the missing columns. Also, the second query doesn't return any rows, so you don't see them in your fiddle.
select *, e.timestamp, sn.full_name from employees e
INNER JOIN short_name as sn on sn.account_id = e.id
Union all
select null,null,null,null,null,NULL,NULL, a.timestamp,''
from accounts a
where timestamp >= '2022-03-25T13:00:00'
and timestamp < '2022-04-04T13:00:00'
AND timestamp IS NOT NULL
order by 8;
SELECT 'employee' as type, e.id, e.timestamp, sn.full_name
FROM employees e
INNER JOIN
short_name as sn on sn.account_id = e.id
UNION ALL
SELECT 'account' as type, a.id, a.timestamp, '' as short_name
FROM
accounts a
WHERE timestamp IS NOT NULL
ORDER BY timestamp;
You can use dummy columns to make the column counts match.
To show the concept, I dropped the timestamp filter. You can of course re-add it and also select additional columns (but then you will also need more dummy columns).
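The dummy-column idea can be sketched as a runnable script. It assumes Python's sqlite3 module with an in-memory database, a cut-down version of the employees and accounts tables, and hypothetical sample rows with ISO-8601 dates (stored as text, so sorting by the column is chronological). Each branch of the UNION ALL supplies a literal table name and NULL for the columns the other table lacks.

```python
import sqlite3


def combined_timeline():
    """Union two differently shaped tables and sort everything by timestamp."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the fiddle's database
    conn.executescript(
        """
        CREATE TABLE employees (id INTEGER PRIMARY KEY, account_id INTEGER,
                                first_name TEXT, last_name TEXT, timestamp TEXT);
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_name TEXT NOT NULL,
                               timestamp TEXT);
        -- Hypothetical sample rows; ISO dates keep text ordering chronological.
        INSERT INTO employees VALUES (1, 1, 'Donkey', 'Kong', '2010-10-10');
        INSERT INTO employees VALUES (32, 2, 'Ray2', 'Kurzweil2', '2010-01-10');
        INSERT INTO accounts VALUES (1, 'DK Banana Account', '2010-05-10');
        """
    )
    rows = conn.execute(
        """
        SELECT 'employees' AS table_name, id, timestamp,
               first_name, last_name, NULL AS account_name
        FROM employees
        UNION ALL
        -- Pad the columns that accounts does not have with NULL.
        SELECT 'accounts' AS table_name, id, timestamp,
               NULL, NULL, account_name
        FROM accounts
        ORDER BY timestamp
        """
    ).fetchall()
    conn.close()
    return rows


for row in combined_timeline():
    print(row)
```

Both branches must list the same number of columns in the same order; the column names of the result come from the first branch.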
I have a requirement to find each emplid whose data differs between rows of the same table. The table consists of 50-60 columns. I need to check whether any column has changed from the previous row, in which case the emplid should be picked up; any newly added employee also needs to be picked up.
I have created a basic query and it works, but I need another way to achieve the same purpose, as I do not want to write out every column name.
My query:
select
t.emplid
from
ps_custom_tbl t, ps_custom_tbl prev_t
where
prev_t.emplid = t.emplid
and t.effdt = (select max(effdt) from ps_custom_tbl t2
where t2.emplid = t.emplid)
and prev_t.effdt = (select max(effdt) from ps_custom_tbl prev_t2
where emplid = prev_t.emplid and effdt < t.effdt)
and (t.first_name <> prev_t.first_name Or t.last_name <> prev_t.last_name …. 50 columns);
Can you please suggest another way to achieve same thing?
You can use MINUS.
If the query returns no rows, both sets are the same; if it returns some records, there is a difference between them.
create table emp as select * from hr.employees;
insert into emp select employee_id+1000, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id,
decode(department_id ,30,70, department_id)
from hr.employees;
select first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
from emp where employee_id <= 1000
minus
select first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
from emp where employee_id > 1000;
But you have to list the columns explicitly, because if you have, e.g., different dates or ids, they will be compared too. It is still easier to list the columns in the SELECT clause than to write a WHERE condition for each one.
Maybe it will help.
-- or if different tables and want to compare all cols simply do
drop table emp;
create table emp as select * from hr.employees;
create table emp2 as
select employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id,
decode(department_id ,30,70, department_id) department_id
from hr.employees;
select * from emp
minus
select * from emp2;
---- ADD DATE CRITERIA
-- yes, you can add date criteria and, using analytical functions, check which
-- is newer and which is older, and then compare one to another, like below:
drop table emp;
create table emp as select * from hr.employees;
insert into emp
select
employee_id,
first_name,
last_name,
email,
phone_number,
hire_date+1,
job_id,
salary,
commission_pct,
manager_id,
decode(department_id ,30,70, department_id)
from hr.employees;
with data as --- thanks to WITH you retrieve data only once
(select employee_id, first_name, last_name, email, phone_number,
hire_date,
row_number() over(partition by employee_id order by hire_date desc) rn, -- distinguish newer and older record
job_id, salary, commission_pct, manager_id, department_id
from emp)
select employee_id, first_name, last_name, email, phone_number, department_id from data where rn = 1
MINUS --- find the differences
select employee_id, first_name, last_name, email, phone_number, department_id from data where rn = 2;
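The set-difference idea can be tried without an Oracle instance. The sketch below assumes Python's sqlite3 module, where the operator is spelled EXCEPT rather than MINUS, and uses two small hypothetical tables in place of the hr.employees copies. As with MINUS, NULLs are treated as equal when rows are compared, so the row whose department_id is NULL in both tables is not reported.

```python
import sqlite3


def rows_only_in_first():
    """Return rows of emp that have no identical row in emp2."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite, so EXCEPT replaces MINUS
    conn.executescript(
        """
        CREATE TABLE emp  (employee_id INTEGER, first_name TEXT, department_id INTEGER);
        CREATE TABLE emp2 (employee_id INTEGER, first_name TEXT, department_id INTEGER);
        -- Hypothetical data: only employee 1 differs (department 30 became 70).
        INSERT INTO emp  VALUES (1, 'Ann', 30), (2, 'Bob', 50), (3, 'Cy', NULL);
        INSERT INTO emp2 VALUES (1, 'Ann', 70), (2, 'Bob', 50), (3, 'Cy', NULL);
        """
    )
    rows = conn.execute(
        """
        SELECT * FROM emp
        EXCEPT
        SELECT * FROM emp2
        ORDER BY employee_id
        """
    ).fetchall()
    conn.close()
    return rows


print(rows_only_in_first())
```

An empty result means every row of the first table has an identical counterpart in the second; it says nothing about rows that exist only in the second table, which would need the same query with the operands swapped.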
You will have to write all columns in some sense no matter what you do.
In terms of comparing current and previous, you might find this easier
select
col1,
col2,
...
lag(col1) over ( partition by empid order by effdt ) as prev_col1,
lag(col2) over ( partition by empid order by effdt ) as prev_col2
...
and then your comparison will be along the lines of
select *
from ( <query above> )
where
decode(col1,prev_col1,0,1) = 1 or
decode(col2,prev_col2,0,1) = 1 or
...
The use of DECODE in this way handles the issues of nulls.
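A runnable sketch of the LAG approach, under some assumptions: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), a two-column stand-in for ps_custom_tbl with hypothetical rows, and SQLite's IS NOT operator in place of DECODE, since IS NOT is also null-safe. It looks only at each employee's latest row and also picks up employees that have a single row, which is one reading of the "new employee" requirement.

```python
import sqlite3


def changed_or_new_emplids():
    """Find employees whose latest row differs from the previous one, or who are new."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of Oracle
    conn.execute(
        "CREATE TABLE ps_custom_tbl (emplid INTEGER, effdt TEXT, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO ps_custom_tbl VALUES (?, ?, ?, ?)",
        [
            (1, "2020-01-01", "Ann", "Lee"),
            (1, "2021-01-01", "Ann", "Lee"),   # unchanged
            (2, "2020-01-01", "Bob", None),
            (2, "2021-01-01", "Bob", "Ray"),   # NULL became 'Ray'
            (3, "2021-06-01", "Cy", "Poe"),    # only one row: new employee
            (4, "2020-01-01", "Di", None),
            (4, "2021-01-01", "Di", None),     # NULL stayed NULL: unchanged
        ],
    )
    rows = conn.execute(
        """
        SELECT emplid FROM (
            SELECT emplid, first_name, last_name,
                   LAG(first_name) OVER (PARTITION BY emplid ORDER BY effdt) AS prev_first_name,
                   LAG(last_name)  OVER (PARTITION BY emplid ORDER BY effdt) AS prev_last_name,
                   ROW_NUMBER() OVER (PARTITION BY emplid ORDER BY effdt DESC) AS rn,
                   COUNT(*) OVER (PARTITION BY emplid) AS n_rows
            FROM ps_custom_tbl
        )
        WHERE rn = 1
          AND (n_rows = 1
               OR first_name IS NOT prev_first_name
               OR last_name  IS NOT prev_last_name)
        ORDER BY emplid
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(changed_or_new_emplids())
```

A plain <> comparison would miss employee 2, because comparing NULL with 'Ray' yields NULL rather than true; that is the gap the null-safe comparison closes.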
My requirement is to send out data to managers, they change any/all/none of the data in the columns, and send back to me. I then have to identify each column that has a difference from what I sent, and mark those columns as changed for a central office reviewer to visually scan and approve/deny the changes for integration back into the central data set.
This solution may not fit your needs of course, but a template structure is offered here that you can augment to meet your needs no matter the number of columns. In the case of your question, 50-60 columns will make this SQL query huge, but I've written heinously long queries in the past with great success. Add columns a few at a time rather than all wholesale according to this template and see if they work along the way.
You could easily write PL/SQL to generate this query for the tables in question.
This would get very cumbersome if you had to compare columns from 3 or more tables, or bi-directional changes. I only care about single-direction changes: did the person change my original row's columns or not? If so, which columns did they change, what was my before value, and what is their after value? Show me nothing else, please.
In other words, only show me rows with columns that have changes with their before values and nothing else.
create table thing1 (id number, firstname varchar2(10), lastname varchar2(10));
create table thing2 (id number, firstname varchar2(10), lastname varchar2(10));
insert into thing1 values (1,'Buddy', 'Slacker');
insert into thing2 values (1,'Buddy', 'Slacker');
insert into thing1 values (2,'Mary', 'Slacker');
insert into thing2 values (2,'Mary', 'Slacke');
insert into thing1 values (3,'Timmy', 'Slacker');
insert into thing2 values (3,'Timm', 'Slacker');
insert into thing1 values (4,'Missy', 'Slacker');
insert into thing2 values (4,'Missy', 'Slacker');
commit;
Un-comment commented select * queries one at a time after each data set to understand what is in each data set at each stage of the refinement process.
with rowdifferences as
(
select
id
,firstname
,lastname
from thing2
minus
select
id
,firstname
,lastname
from thing1
)
--select * from rowdifferences
,thing1matches as
(
select
t1.id
,t1.firstname
,t1.lastname
from thing1 t1
join rowdifferences rd on t1.id = rd.id
)
--select * from thing1matches
, col1differences as
(
select
id
,firstname
from rowdifferences
minus
select
id
,firstname
from thing1matches
)
--select * from col1differences
, col2differences as
(
select
id
,lastname
from rowdifferences
minus
select
id
,lastname
from thing1matches
)
--select * from col2differences
,truedifferences as
(
select
case when c1.id is not null then c1.id
when c2.id is not null then c2.id
end id
,c1.firstname
,c2.lastname
from col1differences c1
full join col2differences c2 on c1.id = c2.id
)
--select * from truedifferences
select
t1m.id
,case when td.firstname is not null then t1m.firstname end beforefirstname
,td.firstname afterfirstname
,case when td.lastname is not null then t1m.lastname end beforelastname
,td.lastname afterlastname
from thing1matches t1m
join truedifferences td on t1m.id = td.id
;
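The answer above mentions generating the comparison query instead of typing every column. A minimal sketch of that idea follows, written in Python against SQLite rather than PL/SQL against Oracle: the column names are read from PRAGMA table_info and joined into one null-safe predicate. The function name changed_ids and its parameters are made up for this example; in Oracle the column list could come from a dictionary view such as USER_TAB_COLUMNS instead.

```python
import sqlite3


def changed_ids(conn, before, after, key="id"):
    """Return key values whose non-key columns differ between two tables."""
    # Read the column names from the catalog instead of typing them out.
    # Table and column names are interpolated directly, so they must be trusted.
    cols = [
        row[1]
        for row in conn.execute(f"PRAGMA table_info({before})")
        if row[1] != key
    ]
    # IS NOT is SQLite's null-safe inequality (NULL IS NOT NULL is false).
    predicate = " OR ".join(f"b.{c} IS NOT a.{c}" for c in cols)
    sql = (
        f"SELECT b.{key} FROM {before} b "
        f"JOIN {after} a ON a.{key} = b.{key} "
        f"WHERE {predicate} ORDER BY b.{key}"
    )
    return [r[0] for r in conn.execute(sql)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE thing1 (id INTEGER, firstname TEXT, lastname TEXT);
        CREATE TABLE thing2 (id INTEGER, firstname TEXT, lastname TEXT);
        INSERT INTO thing1 VALUES (1,'Buddy','Slacker'), (2,'Mary','Slacker'),
                                  (3,'Timmy','Slacker'), (4,'Missy','Slacker');
        INSERT INTO thing2 VALUES (1,'Buddy','Slacker'), (2,'Mary','Slacke'),
                                  (3,'Timm','Slacker'), (4,'Missy','Slacker');
        """
    )
    result = changed_ids(conn, "thing1", "thing2")
    conn.close()
    return result


print(demo())
```

This only reports which rows changed, not which columns; the per-column before and after values still need a query shaped like the one above.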
SELECT *
FROM employee
GROUP BY first_name
HAVING count(first_name) >= 1;
How can I retrieve all rows and columns with only a single occurrence of each duplicate? I want to retrieve all of the table contents, with repeated data appearing only once. In the table, first_name and last_name are repeated twice but the other information differs.
Please help.
Try this SQL query:
SELECT * FROM EMPLOYEE WHERE FIRST_NAME NOT IN
(
SELECT FIRST_NAME FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY FIRST_NAME ORDER BY FIRST_NAME) RNK,FIRST_NAME FROM EMPLOYEE
)A WHERE A.RNK=2
)
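If the goal is one row per name, with all columns, a ROW_NUMBER filter on rn = 1 is one possible reading of the question. The sketch below assumes that reading, a hypothetical employee table with an id column to decide which row wins, and Python's sqlite3 module (SQLite 3.25 or later for window functions). The query above filters differently: it keeps only names that never repeat.

```python
import sqlite3


def one_row_per_name():
    """Keep the lowest-id row for every (first_name, last_name) pair."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
    conn.executescript(
        """
        -- Hypothetical table and rows: the two 'Ann Lee' rows differ only in city.
        CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT, city TEXT);
        INSERT INTO employee VALUES (1, 'Ann', 'Lee', 'Oslo'),
                                    (2, 'Ann', 'Lee', 'Rome'),
                                    (3, 'Bob', 'Ray', 'Bonn');
        """
    )
    rows = conn.execute(
        """
        SELECT id, first_name, last_name, city
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY first_name, last_name ORDER BY id
                   ) AS rn
            FROM employee
        )
        WHERE rn = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


print(one_row_per_name())
```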
Hi sorry for what seems such a simple question in advance...
I have a table with some millions of rows of laboratory data and the following fields (amongst others)
Laboratory Reference Number
Forename
Surname
DOB
I need a query that will give me all of the distinct laboratory reference numbers, forenames, surnames and DOBs where the laboratory reference number has more than one associated forename, surname and DOB,
i.e. a query to highlight where a laboratory reference number has duplicate candidates associated with it,
e.g.
12345, Bob, Smith, 30/03/1981
12345, Fred, Smith, 31/03/1981
Any help would be much appreciated.
SELECT * FROM TABLE WHERE REF IN
(SELECT REF FROM TABLE GROUP BY REF HAVING COUNT(*) > 1)
You could also use SELECT DISTINCT * if necessary
select RefNr
, Forename
, Surname
, DOB
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.RefNr = yt2.RefNr
and
(
yt1.Forename <> yt2.Forename
or yt1.Surname <> yt2.Surname
or yt1.DOB <> yt2.DOB
)
)
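Another way to phrase this, sketched with Python's sqlite3 module, is to reduce the table to distinct (reference, person) combinations first and then keep the references that still have more than one row. The table name lab, the column names and the sample rows are assumptions for the illustration. Doing the DISTINCT first means a reference whose rows are exact copies of one person is not flagged, whereas a plain GROUP BY ... HAVING COUNT(*) > 1 on the raw table would flag it.

```python
import sqlite3


def refs_with_conflicting_people():
    """List distinct people for every reference number that has more than one person."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
    conn.executescript(
        """
        CREATE TABLE lab (ref INTEGER, forename TEXT, surname TEXT, dob TEXT);
        -- Hypothetical rows: 12345 has two different people, 67890 only exact copies.
        INSERT INTO lab VALUES (12345, 'Bob',  'Smith', '1981-03-30'),
                               (12345, 'Fred', 'Smith', '1981-03-31'),
                               (12345, 'Bob',  'Smith', '1981-03-30'),
                               (67890, 'Amy',  'Jones', '1975-01-01'),
                               (67890, 'Amy',  'Jones', '1975-01-01');
        """
    )
    rows = conn.execute(
        """
        WITH people AS (
            SELECT DISTINCT ref, forename, surname, dob FROM lab
        )
        SELECT ref, forename, surname, dob
        FROM people
        WHERE ref IN (
            SELECT ref FROM people GROUP BY ref HAVING COUNT(*) > 1
        )
        ORDER BY ref, forename
        """
    ).fetchall()
    conn.close()
    return rows


print(refs_with_conflicting_people())
```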
How can I remove multiple records for the same zipcode while keeping at least one record for that zipcode in the database table?
id zipcode
1 38000
2 38000
3 38000
4 38005
5 38005
I want a table with two columns, id and zipcode.
My final result would be the following:
id zipcode
1 38000
4 38005
How about
delete from myTable
where id not in (
select Min( id )
from myTable
group by zipcode )
That lets you keep your lowest IDs, which is what you seemed to want.
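The delete above can be checked against the sample data from the question. This sketch assumes Python's sqlite3 module and an in-memory table named myTable, as in the query.

```python
import sqlite3


def keep_lowest_id_per_zipcode():
    """Delete every row except the one with the lowest id for each zipcode."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, zipcode INTEGER)")
    conn.executemany(
        "INSERT INTO myTable VALUES (?, ?)",
        [(1, 38000), (2, 38000), (3, 38000), (4, 38005), (5, 38005)],
    )
    # The subquery yields one id per zipcode; everything else is removed.
    conn.execute(
        """
        DELETE FROM myTable
        WHERE id NOT IN (SELECT MIN(id) FROM myTable GROUP BY zipcode)
        """
    )
    conn.commit()
    rows = conn.execute("SELECT id, zipcode FROM myTable ORDER BY id").fetchall()
    conn.close()
    return rows


print(keep_lowest_id_per_zipcode())
```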
To just select that result set, you would group by zipcode:
SELECT MIN(id) AS id, zipcode
FROM table
GROUP BY zipcode
To delete the other records and keep only one per zipcode, use a subquery like so:
DELETE FROM table
WHERE id NOT IN
(SELECT MIN(id)
FROM table
GROUP BY zipcode
)
You can also accomplish this using a common table expression if you prefer.
with cte as (
select row_number() over (partition by zipcode order by id desc) as rn
from table)
delete from cte
where rn > 1;
This has the advantage of correctly handling duplicates and offers tight control over what gets deleted and what gets kept.
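Deleting through a CTE like this is not available everywhere; SQLite, for example, does not allow it. A sketch of the same ROW_NUMBER idea in a form SQLite accepts is shown below, using Python's sqlite3 module (SQLite 3.25 or later) and a hypothetical table named zips. Because the window is ordered by id descending, the row that survives is the one with the highest id per zipcode; ordering ascending would keep the lowest instead.

```python
import sqlite3


def keep_highest_id_per_zipcode():
    """Delete all but the highest-id row per zipcode using ROW_NUMBER."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite, which cannot delete from a CTE
    conn.execute("CREATE TABLE zips (id INTEGER PRIMARY KEY, zipcode INTEGER)")
    conn.executemany(
        "INSERT INTO zips VALUES (?, ?)",
        [(1, 38000), (2, 38000), (3, 38000), (4, 38005), (5, 38005)],
    )
    # Number the rows within each zipcode, newest id first, and delete
    # every row that is not numbered 1.
    conn.execute(
        """
        DELETE FROM zips
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY zipcode ORDER BY id DESC
                       ) AS rn
                FROM zips
            )
            WHERE rn > 1
        )
        """
    )
    conn.commit()
    rows = conn.execute("SELECT id, zipcode FROM zips ORDER BY id").fetchall()
    conn.close()
    return rows


print(keep_highest_id_per_zipcode())
```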
Create temporary table with desired result:
select min(id) as id, zipcode
into tmp_sometable
from sometable
group by zipcode
Remove the original table:
drop table sometable
Rename temporary table:
sp_rename 'tmp_sometable', 'sometable';
or something like:
delete from sometable
where id not in
(
select min(id)
from sometable
group by zipcode
)
delete from table where id not in (select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode);
select distinct zipcode from table - gives the distinct zipcodes in the table
select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode - gives the min ID for each zipcode
delete from table where id not in (select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode) - deletes all the records in the table that are not in the result of query 2
There's an easier way if you want the lowest ID number. I just tested this:
SELECT
min(ID),
zipcode
FROM #zip
GROUP BY zipcode