I need to delete from row number 475 to 948 due to them being duplicates of rows 1-474. It would be something close to this, I presume, or is there more to it?
DELETE FROM dbo.industry WHERE row_number between 475 and 948
Maybe it is too late, but I usually do this:
; with cte(rownum)as(
select row_number () over(partition by [Col1], [Col2] order by Col3) from [table]
)
delete from cte where rownum > 1
DELETE FROM dbo.industry
WHERE COLUMN_NAME IN -- Choose a column name
(SELECT COLUMN_NAME
FROM (SELECT COLUMN_NAME, -- Choose a column name
ROW_NUMBER() OVER (ORDER BY COLUMN_NAME ASC) AS Row_Number
FROM dbo.industry) AS numbered
WHERE Row_Number BETWEEN 475 AND 948)
COLUMN_NAME can be any column of your table, but its values must be unique; otherwise rows outside the range that share a value are deleted as well.
If you are trying to delete using the Row_Number function, and you get an error of
Windowed functions can only appear in the SELECT or ORDER BY clauses
you can revise the SQL to have it in the select clause as in the example below:
Delete T
From (Select Row_Number() Over(Partition By [IndustryType], [IndustryDescription] order By [ID]) As RowNumber,*
From dbo.industry) T
Where T.RowNumber > 1
ALTER TABLE dbo.industry ADD tmpRowNumber INT IDENTITY(1,1)
DELETE FROM dbo.industry WHERE tmpRowNumber BETWEEN 475 AND 948
ALTER TABLE dbo.industry DROP COLUMN tmpRowNumber
Careful, though: your DBMS may not treat row #1 as row #1, because tables are unordered.
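To see why that warning matters, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption made purely so the example runs anywhere; the thread itself is about SQL Server, and the table contents below are made up). The same stored rows get different row numbers depending on the ORDER BY inside ROW_NUMBER(), so "rows 2 to 3" only means something once an ordering is chosen.

```python
import sqlite3

# Hypothetical stand-in data; SQLite 3.25+ is assumed for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE industry (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO industry VALUES (?, ?)",
    [(1, "Zinc"), (2, "Coal"), (3, "Banking"), (4, "Steel")],
)


def rows_in_range(order_col, first, last):
    """Return the ids whose ROW_NUMBER() falls in [first, last] for one ordering."""
    # order_col is interpolated into the SQL text, so only accept known columns.
    if order_col not in ("id", "name"):
        raise ValueError("unknown column")
    sql = f"""
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY {order_col}) AS rn
            FROM industry
        ) WHERE rn BETWEEN ? AND ?
        ORDER BY rn
    """
    return [row[0] for row in conn.execute(sql, (first, last))]


by_id = rows_in_range("id", 2, 3)      # numbered in id order
by_name = rows_in_range("name", 2, 3)  # numbered in alphabetical order
print(by_id, by_name)
```

The two calls pick different rows for the same numeric range, which is why a range delete should always name the ordering it relies on.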
SELECT DISTINCT *
INTO #Temp
FROM dbo.industry
DELETE FROM dbo.industry
INSERT INTO dbo.industry
SELECT *
FROM #Temp
-- Replace PK_COLUMN with the name of the primary key column.
-- Assumes the key values are contiguous, so rows 475 to 948 are the 474 rows from key 475 onward.
DELETE FROM dbo.industry
WHERE dbo.industry.PK_COLUMN IN (SELECT TOP 474 dbo.industry.PK_COLUMN
FROM dbo.industry
WHERE dbo.industry.PK_COLUMN >= 475
ORDER BY dbo.industry.PK_COLUMN)
This is not really an answer. There were a few issues with the data that made the answers above (while excellent) unrelated. I simply deleted the table and then re-imported it from fixed width. This time, I was more careful and did not have the duplication.
Related
I want to delete duplicate records without using ROW_NUMBER() function (SQL Server)
Example: Table with the following data:
name salary
-----------------
Husain 20000.00
Husain 20000.00
Husain 20000.00
Munavvar 50000.00
Munavvar 50000.00
After deleting the duplicate records,
the table should contain data like this:
name salary
-----------------
Husain 20000.00
Munavvar 50000.00
As the motivation for this question seems to be academic interest rather than practical use...
The table has no primary key but the undocumented pseudo column %%physloc%% can provide a substitute.
DELETE T1
FROM YourTable T1 WITH(TABLOCKX)
WHERE CAST(T1.%%physloc%% AS BIGINT)
NOT IN (SELECT MAX(CAST(%%physloc%% AS BIGINT))
FROM YourTable
GROUP BY Name, Salary)
In reality you should never use the above and just use row_number as it is more efficient and documented.
Another (academic) option, depending on what version of SQL server you're using:
;with CTE as (select lag(name) over (order by name, salary) as name1
,lag(salary) over (order by name, salary) as salary1
, *
from #table)
delete from cte where name = name1 and salary = salary1
You can use Common Table Expression combined with ROW_NUMBER() like this (This is the best way to delete duplicates):
WITH CTE AS(
SELECT t.name,t.salary,
ROW_NUMBER() OVER(PARTITION BY t.name,t.salary ORDER BY (SELECT 1)) as rn
FROM YourTable t
)
DELETE FROM CTE WHERE RN > 1
Within each group of identical (name, salary) rows, ROW_NUMBER() numbers the rows in arbitrary order; exactly one row gets rank 1, and everything else is deleted.
EDIT: I can suggest something else without the use of ROW_NUMBER():
SELECT distinct t.name,t.salary
INTO TEMP_FOR_UPDATE
FROM YourTable t;
TRUNCATE TABLE YourTable ;
INSERT INTO YourTable
SELECT * FROM TEMP_FOR_UPDATE;
DROP TABLE TEMP_FOR_UPDATE;
This creates a temp table containing the distinct rows from your table, truncates your table, and re-inserts the distinct rows into it.
1. Select data from your table using GROUP BY name, salary (or DISTINCT).
2. Insert it into a temp table.
3. Delete the data in the original table.
4. Copy the data from the temp table back to your original table.
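The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, the helper name dedupe_via_temp_table and the temp table name dedupe_tmp are made up, and the sample rows are the ones from the question.

```python
import sqlite3


def dedupe_via_temp_table(conn, table):
    """Rebuild `table` from its distinct rows (hypothetical helper)."""
    # `table` is interpolated into SQL, so it must be a trusted identifier.
    with conn:  # commits on success, rolls back the DELETE/INSERT on error
        # Steps 1 and 2: copy the distinct rows into a temp table.
        conn.execute(
            f"CREATE TEMP TABLE dedupe_tmp AS SELECT DISTINCT * FROM {table}"
        )
        # Step 3: empty the original table.
        conn.execute(f"DELETE FROM {table}")
        # Step 4: copy the distinct rows back.
        conn.execute(f"INSERT INTO {table} SELECT * FROM dedupe_tmp")
        conn.execute("DROP TABLE dedupe_tmp")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Husain", 20000.0),
        ("Husain", 20000.0),
        ("Husain", 20000.0),
        ("Munavvar", 50000.0),
        ("Munavvar", 50000.0),
    ],
)
conn.commit()

dedupe_via_temp_table(conn, "employees")
remaining = sorted(conn.execute("SELECT name, salary FROM employees").fetchall())
print(remaining)
```

Wrapping the delete and the re-insert in one transaction means a failure halfway through does not leave the original table empty.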
In Oracle you can use the following:
Delete from test
where rowid not in (select max(rowid) from test group by name, salary);
Suppose Table A has a column B like:
tbl A
------------------------
column B
10
10
20
20
50
50
40
How do I delete the duplicate records? E.g. in the above column there are two "10"s, so only one of them should be deleted. That is the condition; how can I implement it?
If you use SQL Server (or another RDBMS that supports window functions), you can use ROW_NUMBER.
This works in SQL Server 2005 and later:
WITH CTE AS -- a common table expression, similar to a subquery but more readable
(
SELECT ColumnB, RN = ROW_NUMBER() OVER (PARTITION BY ColumnB ORDER BY ColumnB)
FROM dbo.TblA
)
DELETE FROM CTE WHERE RN > 1
I like this approach because it's easy to use a SELECT instead to see what I'm going to delete.
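A minimal sketch of that preview-then-delete habit, assuming SQLite (through Python's built-in sqlite3 module) instead of SQL Server. SQLite cannot delete through a CTE, so the sketch filters on its implicit rowid instead; the table and values mirror the question, while the DUPLICATES name is made up.

```python
import sqlite3

# SQLite 3.25+ is assumed for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblA (ColumnB INTEGER)")
conn.executemany(
    "INSERT INTO TblA VALUES (?)",
    [(v,) for v in (10, 10, 20, 20, 50, 50, 40)],
)

# Rows are numbered within each ColumnB group; rn > 1 marks the surplus copies.
DUPLICATES = """
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY ColumnB ORDER BY rowid) AS rn
        FROM TblA
    ) WHERE rn > 1
"""

# Step 1: preview exactly what would be deleted.
preview = conn.execute(
    f"SELECT ColumnB FROM TblA WHERE rowid IN ({DUPLICATES}) ORDER BY ColumnB"
).fetchall()

# Step 2: run the delete with the very same subquery.
deleted = conn.execute(
    f"DELETE FROM TblA WHERE rowid IN ({DUPLICATES})"
).rowcount
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT ColumnB FROM TblA ORDER BY ColumnB")]
print(preview, deleted, remaining)
```

Because the SELECT and the DELETE share one subquery, the preview cannot drift away from what the delete actually removes.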
This works with Oracle RDBMS
DELETE FROM table_A
WHERE ROWID IN (
SELECT rid
FROM (SELECT ROWID rid,
ROW_NUMBER () OVER (PARTITION BY B ORDER BY ROWID) rn
FROM table_A)
WHERE rn <> 1);
I have a table that has a lot of duplicated rows and no primary key.
I want to remove just the duplicated records, but when I try to do this it would remove all peers.
How can I find the ROWID from a table in Postgres?
Simplify this by one query level:
DELETE FROM table_name
WHERE ctid NOT IN (
SELECT min(ctid)
FROM table_name
GROUP BY $other_columns);
.. where duplicates are defined by equality in $other_columns.
There is no need to include columns from the GROUP BY clause in the SELECT list, so you don't need another subquery.
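The same single-level query can be tried out without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module, where the implicit rowid plays the role that ctid plays above, and the table name pairs and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b INTEGER)")  # no primary key
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 20), (2, 20), (2, 20)],
)

# Keep the first physical row of every (a, b) group and delete the rest.
conn.execute(
    """
    DELETE FROM pairs
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM pairs
        GROUP BY a, b
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT a, b FROM pairs ORDER BY a, b").fetchall()
print(remaining)
```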
ctid in the current manual.
On PostgreSQL the physical location of the row is called CTID.
So if you want to view it use a QUERY like this:
SELECT CTID FROM table_name
To use it on a DELETE statement to remove the duplicated records use it like this:
DELETE FROM table_name WHERE CTID NOT IN (
SELECT RECID FROM
(SELECT MIN(CTID) AS RECID, other_columns
FROM table_name GROUP BY other_columns)
a);
Remember that table_name is the desired table and other_columns are the columns that you want to use to filter that.
For example:
DELETE FROM user_department WHERE CTID NOT IN (
SELECT RECID FROM
(SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
FROM user_department ud GROUP BY ud.user_id, ud.department_id)
a);
You should consider using row_number() if you want to delete based on a unique id column (or a timestamp), since ctid alone is not always reliable when you want to keep only the most recent records, etc.
WITH d
AS (SELECT ctid c,
row_number()
OVER (
partition BY s
ORDER BY id) rn
FROM t)
DELETE FROM t
WHERE ctid IN (SELECT c
FROM d
WHERE rn > 1) ;
I want to know the way we can remove duplicate records where PK is uniqueidentifier.
I have to delete records on the basis of duplicate values in a set of fields. One option is to fill a temp table using Row_Number() and delete every record except row number one.
But I wanted to build a one-line query. Any suggestions?
You can use a CTE to do this. Without seeing your table structure, here is the basic SQL:
;with cte as
(
select *, row_number() over(partition by yourfields order by yourfields) rn
from yourTable
)
delete
from cte
where rn > 1
-- PostgreSQL syntax; keeps the row with the lowest pk in each group of duplicates
delete from yourTable t using yourTable ta where ta.dup_field = t.dup_field and t.pk > ta.pk;
I need, if possible, a T-SQL query that, returning the values from an arbitrary table, also returns an incremental integer column with value = 1 for the first row, 2 for the second, and so on.
This column does not actually reside in any table, and must be strictly incremental, because the ORDER BY clause could sort the rows of the table and I want the incremental column in perfect shape always.
The solution must run on SQL Server 2000
For SQL 2005 and up
SELECT ROW_NUMBER() OVER( ORDER BY SomeColumn ) AS 'rownumber',*
FROM YourTable
For SQL Server 2000 you need to do something like this:
SELECT IDENTITY(INT, 1,1) AS [Rank], SomeColumn
INTO #Ranks FROM YourTable WHERE 1=0
INSERT INTO #Ranks (SomeColumn)
SELECT SomeColumn FROM YourTable
ORDER BY SomeColumn
SELECT * FROM #Ranks
ORDER BY [Rank]
see also here Row Number
You can start with a custom number and increment from there. For example, to add a cheque number to each payment you can do:
DECLARE @StartChequeNumber INT;
SELECT @StartChequeNumber = 3446;
SELECT
((ROW_NUMBER() OVER(ORDER BY AnyColumn)) + @StartChequeNumber ) AS 'ChequeNumber'
,* FROM YourTable
This gives each row its own cheque number, counting up from @StartChequeNumber + 1 (ROW_NUMBER() starts at 1).
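To check the arithmetic of the offset, the same expression can be run against a throwaway SQLite database. This is a sketch under assumptions: Python's sqlite3 module stands in for SQL Server, and the payments table and its rows are made up. Since ROW_NUMBER() starts at 1, the first generated number is the start value plus one.

```python
import sqlite3

# SQLite 3.25+ is assumed for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payee TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("Ana", 120.0), ("Bo", 75.5), ("Cy", 300.0)],
)

start_cheque_number = 3446

# Add the starting value to the 1-based row number of each payment.
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY payee) + ? AS cheque_number, payee
    FROM payments
    ORDER BY cheque_number
    """,
    (start_cheque_number,),
).fetchall()
print(rows)
```

If the first cheque should carry the start value itself, subtract 1 from the row number before adding the offset.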
Try ROW_NUMBER()
http://msdn.microsoft.com/en-us/library/ms186734.aspx
Example:
SELECT
col1,
col2,
ROW_NUMBER() OVER (ORDER BY col1) AS rownum
FROM tbl
It is ugly and performs badly, but technically this works on any table with at least one unique field AND works in SQL 2000.
SELECT (SELECT COUNT(*) FROM myTable T1 WHERE T1.UniqueField<=T2.UniqueField) as RowNum, T2.OtherField
FROM myTable T2
ORDER By T2.UniqueField
Note: If you use this approach and add a WHERE clause to the outer SELECT, you have to add it to the inner SELECT as well if you want the numbers to be continuous.
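The effect described in that note is easy to reproduce. The sketch below assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server 2000 (the correlated COUNT(*) needs no window functions), with made-up rows in myTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (UniqueField INTEGER PRIMARY KEY, OtherField TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(10, "a"), (20, "b"), (30, "c"), (40, "d")],
)

# Filter on the outer query only: the filtered-out row is still counted,
# so the numbering has a gap.
outer_only = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM myTable T1
            WHERE T1.UniqueField <= T2.UniqueField) AS RowNum,
           T2.OtherField
    FROM myTable T2
    WHERE T2.OtherField <> 'b'
    ORDER BY T2.UniqueField
    """
).fetchall()

# Same filter repeated in the inner query: the numbering is continuous again.
both = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM myTable T1
            WHERE T1.UniqueField <= T2.UniqueField
              AND T1.OtherField <> 'b') AS RowNum,
           T2.OtherField
    FROM myTable T2
    WHERE T2.OtherField <> 'b'
    ORDER BY T2.UniqueField
    """
).fetchall()
print(outer_only, both)
```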