I have a SQL Server 2000 database. How do I remove duplicate rows from a table without a key field? - sql-server-2000

I also do not have the ability to create a temp table to move data to, most of the suggestions I have seen either require a unique ID or the ability to create a table to move unique rows to. I also cant add a key to the table.
My table structure (relevant columns) is:
Customer_code, carrier, rack, bin
Thanks.

;WITH x AS
(
SELECT id, gid, url, rn = ROW_NUMBER() OVER
(PARTITION BY gid, url ORDER BY id)
FROM dbo.table
)
SELECT id,gid,url FROM x WHERE rn = 1

Related

Deleting multiple duplicate rows

This code I have finds duplicate rows in a table. H
SELECT position, name, count(*) as cnt
FROM team
GROUP BY position, name,
HAVING COUNT(*) > 1
How do I delete the duplicate rows that I have found in Hiveql?
Apart from distinct, you can use row_number for this in Hive. Explicit delete and update can only be performed on tables that support ACID. So insert overwrite is more universal.
insert overwrite table team
select position, name, other1, other2...
from (
select
*,
row_number() over(partition by position, name order by rand()) as rn
from team
) tmp
where rn = 1
;
Please try this.assuming id is primary key column
delete from team where id in (
select t1.id from team t1,
(SELECT position, name, count(*) as cnt ,max(id) as id1
FROM team
GROUP BY position, name,
HAVING COUNT(*) > 1) t2
where t1.position=t2.position
and t1.name=t2.name
and t1.id<>t2.id1)
This is an alternative way, since deletes are expensive in Hive
Create table Team_new
As
Select distinct <col1>, <col2>,...
from Team;
Drop table Team purge;
Alter table Team_new rename to Team;
This is assuming you don’t have an id column. If you have an id column then the 1st query would change slightly as
Create table Team_new
As
Select <col1>,<col2>,...,max(id) as id from Team
Group by <col1>,<col2>,... ;
Other queries (drop & alter post this) would remain the same as above.

How to delete duplicate rows that are exactly the same in SQL Server [duplicate]

This question already has answers here:
Delete duplicate records from a SQL table without a primary key
(20 answers)
Delete duplicate entries keeping one entry of each if id column not available
(2 answers)
Closed 2 years ago.
I loaded some data into a SQL Server table from a .CSV file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.
Helow is an example of the situation:
select *
from people
where name in (select name
from people
group by name
having count(name) > 1)
When I run this query, I get these results:
The goal is to keep one row and remove other duplicate rows.
Is there any way other than save the content somewhere else, delete all duplicate rows and insert a new one?
Thanks for helping!
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
How are you defining "duplicate"? Based on your code example, it appears to be by name.
For the deletion, you can use an updatable CTE with row_number():
with todelete as (
select p.*,
row_number() over (partition by name order by (select null)) as seqnum
from people p
)
delete from todelete
where seqnum > 1;
If more columns define the duplicate, then adjust the partition by clause.

Change value of duplicated rows

There is a table with tow columns(ID, Data) and there are 3 rows with same value.
ID Data
4 192.168.0.22
4 192.168.0.22
4 192.168.0.22
Now I want to change third row DATA column. In update SQL Server Generate an error that I ca not change the value.
I can delete all 3 rows. But I can not delete third row separately.
This table is for a software that I bought and I changed the third Server IP.
You can try the following query
create table #tblSimilarValues(id int, ipaddress varchar(20))
insert into #tblSimilarValues values (4, '192.168.0.22'),
(4, '192.168.0.22'),(4, '192.168.0.22')
Use Below query if you want to change all rows
with oldData as (
select *,
count(*) over (partition by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_1'
where cnt > 1;
select * from #tblSimilarValues
Use Below query if you want to skip firs row
;with oldData as (
select *,
ROW_NUMBER () over (partition by id, ipaddress order by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_2'
where cnt > 1;
select * from #tblSimilarValues
drop table #tblSimilarValues
You can find the live demo live demo here
Since there is no column that allows us to distinguish these rows from each other, there's no "third row" (nor a first or second one for that matter).
We can use a ROW_NUMBER function to apply arbitrary row numbers to these rows, however, and if we place that in a CTE, we can apply DELETE/UPDATE actions via the CTE and use the arbitrary row numbers:
declare #t table (ID int not null, Data varchar(15))
insert into #t(ID,Data) values
(4,'192.168.0.22'),
(4,'192.168.0.22'),
(4,'192.168.0.22')
;With ArbitraryAssignments as (
select *,ROW_NUMBER() OVER (PARTITION BY ID, Data ORDER BY Data) as rn
from #t
)
delete from ArbitraryAssignments where rn > 2
select * from #t
This produces two rows of output - one row was deleted.
Note that I say that the ROW_NUMBER is arbitrary. One of the expressions in both the PARTITION BY and ORDER BY clauses is the same. By definition, then, we know that no real ORDER is defined by this (because all rows within the same partition, by definition, have the same value for that expression).
In this case ID columns allows duplicate value which is wrong, ID should be unique.
Now what you can do is create a new column make that unique or Primary Key or change the duplicate values of ID column and make it Unique/Primary key.
Now as per your Unique key/Primary key you can update DATA column value by query as below:
UPDATE <Table Name>
SET DATA = 'new data'
WHERE ID = 3;

Remove duplicates from table in bigquery

I found duplicates in my table by doing below query.
SELECT name, id, count(1) as count
FROM [myproject:dev.sample]
group by name, id
having count(1) > 1
Now i would like to remove these duplicates based on id and name by using DML statement but its showing '0 rows affected' message.
Am i missing something?
DELETE FROM PRD.GPBP WHERE
id not in(select id from [myproject:dev.sample] GROUP BY id) and
name not in (select name from [myproject:dev.sample] GROUP BY name)
I suggest, you create a new table without the duplicates. Drop your original table and rename the new table to original table.
You can find duplicates like below:
Create table new_table as
Select name, id, ...... , put our remaining 10 cols here
FROM(
SELECT *,
ROW_NUMBER() OVER(Partition by name , id Order by id) as rnk
FROM [myproject:dev.sample]
)a
WHERE rnk = 1;
Then drop the older table and rename new_table with old table name.
Below query (BigQuery Standard SQL) should be more optimal for de-duping like in your case
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `myproject.dev.sample` AS t
GROUP BY name, id
If you run it from within UI - you can just set Write Preference to Overwrite Table and you are done
Or if you want you can use DML's INSERT to new table and then copy over original one
Meantime, the easiest way is as below (using DDL)
#standardSQL
CREATE OR REPLACE TABLE `myproject.dev.sample` AS
SELECT * FROM (
SELECT AS VALUE ANY_VALUE(t)
FROM `myproject.dev.sample` AS t
GROUP BY name, id
)

Deleting rows where the Primary key is duplicated - SQL

My issue is how do we delete a primary key row in case it is duplicated. The other fields may/may not be duplicates. I am interested only in the primary key being duplicated and would like to retain the first instance while deleting the other duplicate entries.
For example,
I have 2 tables with the following data:
Table1:- Portfolio
Columns:- PortfolioID(PK), PortfolioName
Sample data :-
1, North America
2, Europe
3, Asia
Table2:- Account
Columns:- AccountID(PK), PortfolioID(FK), AccountName
Sample data :-
1,1,Quake
1,1,Wind
2,1,Fire
3,1,Quake
4,2,Flood
5,2,Wind
Lets say for PortfolioID = 1,
I am trying to delete row number 2 from the Account table where the AccountID 1 is repeated for PortfolioID =1. I have tried using the CTE expression where I use the ROW_NUMBER statement and try to delete ROWNUMBER <> 1. But this query doesn't work as it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.
Firstly, if you define the AccountID column as the primary key in your database, this going forward will help solve having these kinds of problems.
Secondly, are you using Sql Server? Which version?
Assuming you are using Sql Server and a recent version which allows you to use windowing, you can try something like this to delete any duplicates that you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1
Primary key column never ever support duplicate entries.
Try with the below query for the desired result based on the given data/inputs.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1