Fixing duplicate rows in a table


I have a table like the one below:
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
which has the following values:
1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'
I want to update this table so that it has the following values
1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'
Could someone help me out with this?

Use a cursor to walk the table and try to insert every row into a second temporary table. If you hit a collision (in practice, detected with a SELECT against the second table), run a second query to get the highest number, if any, already appended to that value.
Once you know the highest number in use (wrap it in ISNULL to cover the first duplicate, where no suffix exists yet), run an UPDATE on the original table and continue the scan.

Are you looking to remove the duplicates, or just to change the values so they are no longer duplicates?
To change the values, use:
update producttotals
set value = 'abc_1'
where id =2;
update producttotals
set value = 'abc_2'
where id =3;
To find duplicate rows, run:
select id, value
from producttotals
group by id, value
having count(*) > 1;
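
The duplicate check above can be tried without a SQL Server instance. The following is a minimal sketch that assumes SQLite through Python's standard `sqlite3` module instead of SQL Server; the table name and sample rows are taken from the question, and the `copies` column is an illustrative addition.

```python
import sqlite3

# In-memory SQLite database standing in for the table from the question (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductTotals (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO ProductTotals (id, value) VALUES (?, ?)",
    [(1, "abc"), (2, "abc"), (1, "abc"), (3, "abc")],
)

# A row counts as a duplicate when its (id, value) pair occurs more than once.
duplicates = conn.execute(
    """
    SELECT id, value, COUNT(*) AS copies
    FROM ProductTotals
    GROUP BY id, value
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

With the sample data only the pair (1, 'abc') is reported, because it is the only id/value combination that appears twice.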

Assuming SQL Server 2005 or greater (the multi-row VALUES insert in the sample setup needs 2008 or later)
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
INSERT INTO @ProductTotals
VALUES (1, 'abc'),
(2, 'abc'),
(1, 'abc'),
(3, 'abc')
-- CTE: number the rows within each value, ordered by id
;WITH CTE as
(SELECT
ROW_NUMBER() OVER (Partition by value order by id) rn,
id,
value
FROM
@ProductTotals),
-- new_values: build the suffixed value for every id except the lowest one
new_values as (
SELECT
pt.id,
pt.value,
pt.value + '_' + CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value
FROM
@ProductTotals pt
INNER JOIN CTE
ON pt.id = CTE.id
and pt.value = CTE.value
WHERE
pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value
-- write the suffixed values back to the table variable through its alias
UPDATE
pt
SET
pt.value = nv.new_value
FROM
@ProductTotals pt
inner join new_values nv
ON pt.id = nv.id and pt.value = nv.value
SELECT * FROM @ProductTotals
Will produce the following
id value
----------- --------------------------------------------------
1 abc
2 abc_1
1 abc
3 abc_2
Explanation of the SQL
The first CTE numbers the rows within each distinct value (PARTITION BY value), ordered by id, so the numbering restarts whenever a new value begins.
rn id value
-------------------- ----------- --------
1 1 abc
2 1 abc
3 2 abc
4 3 abc
The second CTE, new_values, ignores any IDs that are associated with an rn of 1. So rn 1 and rn 2 are both removed, because they share the same ID. It also uses ROW_NUMBER() again to determine the number for new_value.
id value new_value
----------- ------ -------------
2 abc abc_1
3 abc abc_2
The final statement just updates the old value with the new value.
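
The same idea can be sketched in a form that runs anywhere Python is installed. This is not the answer's exact statement: it assumes SQLite (3.25 or later, for window functions) via the standard `sqlite3` module, uses SQLite's implicit `rowid` to tell the two identical (1, 'abc') rows apart, and swaps `ROW_NUMBER()` for `DENSE_RANK()` so that repeated copies of a non-lowest id would share one suffix. The names `marked`, `first_id` and `rid` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductTotals (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO ProductTotals (id, value) VALUES (?, ?)",
    [(1, "abc"), (2, "abc"), (1, "abc"), (3, "abc")],
)

# Step 1: find the lowest id per value, then rank the remaining ids.
# The WHERE filter runs before DENSE_RANK, so the suffixes start at 1.
new_values = conn.execute(
    """
    WITH marked AS (
        SELECT rowid AS rid, id, value,
               MIN(id) OVER (PARTITION BY value) AS first_id
        FROM ProductTotals
    )
    SELECT rid,
           value || '_' || CAST(
               DENSE_RANK() OVER (PARTITION BY value ORDER BY id) AS TEXT
           ) AS new_value
    FROM marked
    WHERE id <> first_id
    """
).fetchall()

# Step 2: write the computed values back, one row at a time, keyed by rowid.
conn.executemany(
    "UPDATE ProductTotals SET value = ? WHERE rowid = ?",
    [(new_value, rid) for rid, new_value in new_values],
)

final = conn.execute(
    "SELECT id, value FROM ProductTotals ORDER BY rowid"
).fetchall()
print(final)
```

Computing the new values first and updating afterwards keeps the update from reading rows it has already changed.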

Related

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!

First of all, we need a way to define how the rows are ordered. For example, in your sample data there is no way to be sure that the 'first' row (1, 1) will always be returned before the 'second' row (1, 0).
That's why I have added an identity column to my sample data. In your real case the rows could be ordered by a row ID, a date column or something else, but you need to ensure they can be sorted by a unique criterion.
So, the task is pretty simple:
calculate the switch - a flag that is 1 whenever the value changes from the previous row
calculate the groups - a running sum of the switch within each ID
calculate the rows - a row number within each ID and group
That's it. I have used common table expressions and kept all columns so that the logic is easy to follow. You are free to break this into separate statements and remove some of the columns.
DECLARE @DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID] INT
,[value] INT
);
INSERT INTO @DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
-- LAG and IIF require SQL Server 2012 or later
WITH DataSourceWithSwitch AS
(
SELECT *
-- [Switch] is 1 on the first row of each ID and whenever [value] differs from the previous row
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM @DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
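
The three steps above (switch, group, row number) can be sketched in a runnable form. This version assumes SQLite 3.25 or later through Python's `sqlite3` module instead of SQL Server, replaces `IIF` with a plain `CASE`, and uses the illustrative names `row_id`, `switch`, `grp` and `group_row_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id plays the role of the identity column that fixes the row order.
conn.execute(
    "CREATE TABLE DataSource (row_id INTEGER PRIMARY KEY, ID INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO DataSource (ID, value) VALUES (?, ?)",
    [(1, 1), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1),
     (2, 0), (2, 1), (2, 0), (2, 0)],
)

rows = conn.execute(
    """
    WITH with_switch AS (
        -- 1 on the first row of each ID and whenever value changes
        SELECT row_id, ID, value,
               CASE WHEN LAG(value) OVER (PARTITION BY ID ORDER BY row_id) = value
                    THEN 0 ELSE 1 END AS switch
        FROM DataSource
    ),
    with_group AS (
        -- running sum of switches gives each run its own group number
        SELECT row_id, ID, value,
               SUM(switch) OVER (PARTITION BY ID ORDER BY row_id) AS grp
        FROM with_switch
    )
    SELECT row_id, ID, value,
           ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY row_id) AS group_row_id
    FROM with_group
    ORDER BY row_id
    """
).fetchall()

positions = [row[3] for row in rows]
print(positions)
```

On the first row of each ID, `LAG` returns NULL, the comparison is not true, and the `CASE` falls through to 1, which is what starts a new group.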

You want results that depend on the physical order of the rows in the data source. SQL operates on relations, whose rows have no inherent order, and only sometimes on explicitly ordered sets of rows. Your desired result is therefore not well defined in SQL unless you add a column to the source table that fixes the order (e.g. an auto-increment or timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.

One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced source table (e.g. a table with an identity column). In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table; in my case this is the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
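
For comparison, here is a minimal sketch of the same recursion outside SQL Server. It assumes SQLite through Python's `sqlite3` module, where the CTE is introduced with `WITH RECURSIVE`; the `MAXRECURSION` hint is specific to SQL Server and does not appear here. The table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column i is the gap-free sequence the recursion walks along.
conn.execute("CREATE TABLE tmp (i INTEGER PRIMARY KEY, id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tmp (id, value) VALUES (?, ?)",
    [(1, 1), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1),
     (2, 0), (2, 1), (2, 0), (2, 0)],
)

rows = conn.execute(
    """
    WITH RECURSIVE numbered(i, id, value, seq) AS (
        -- anchor: the first row of the sequence
        SELECT i, id, value, 1 FROM tmp WHERE i = 1
        UNION ALL
        -- step: extend the run if id and value repeat, otherwise restart at 1
        SELECT a.i, a.id, a.value,
               CASE WHEN a.id = b.id AND a.value = b.value
                    THEN b.seq + 1 ELSE 1 END
        FROM tmp a
        JOIN numbered b ON a.i = b.i + 1
    )
    SELECT i, id, value, seq FROM numbered ORDER BY i
    """
).fetchall()

seq = [row[3] for row in rows]
print(seq)
```

The recursion only advances while `i` increases by exactly 1, so a gap in the sequence column would end the walk early.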

IMHO, this is easier to do with a cursor and a loop.
Maybe there is a way to do the job with a self-join:
declare @t table (id int, val int)
insert into @t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
-- ORDER BY (SELECT 1) does not guarantee any particular row order; use a real ordering column if one exists
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from @t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

Add Min Value on Query Output in Separate Column

I have the following table:
No Item Value
----------------------------
1 A 5
2 B 8
3 C 9
If I use Min function on Value field, then I'll get 5.
My question is, how can I put the MIN value into a new column? Like the following result:
No Item Value newCol
----------------------------
1 A 5 5
2 B 8 5
3 C 9 5
Is it possible to do that?
Thank you.

Something like:
select No, Item, Value, (select min(value) from table) as newCol
from table
should do it.

I'd prefer to put the subquery in a join; you'll have to name the field. Something like this:
Sample Data
CREATE TABLE #TestData (No int, item nvarchar(1), value int)
INSERT INTO #TestData (No, item, value)
VALUES
(1,'A',5)
,(2,'B',8)
,(3,'C',9)
Query
SELECT
td.No
,td.item
,td.value
,a.Min_Value
FROM #TestData td
CROSS JOIN
(
SELECT
MIN(Value) Min_Value
FROM #TestData
) a
Result
No item value Min_Value
1 A 5 5
2 B 8 5
3 C 9 5

You could do that even simpler by using an appropriate OVER() clause.
SELECT *
, MIN(Value) OVER () AS [newCol]
FROM Table
This is simpler, and usually less resource-consuming, than a (SELECT MIN(Value) FROM TABLE) in the top-level SELECT.
Sample code:
DECLARE @tbl TABLE (No int, Item char(1), Value int)
INSERT @tbl VALUES (1, 'A', 5), (2, 'B', 8), (3, 'C', 9)
SELECT *
, MIN(Value) OVER () AS [newCol]
FROM @tbl
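
An empty `OVER ()` makes the whole result set one window, so every row sees the same minimum. A minimal sketch, assuming SQLite 3.25 or later via Python's `sqlite3` module instead of SQL Server; the column `No` is quoted only as a precaution, because `NO` is an SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("No" INTEGER, Item TEXT, Value INTEGER)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "A", 5), (2, "B", 8), (3, "C", 9)],
)

# MIN(Value) OVER () is evaluated over all rows and repeated on each of them.
rows = conn.execute(
    'SELECT "No", Item, Value, MIN(Value) OVER () AS newCol FROM tbl ORDER BY "No"'
).fetchall()
print(rows)
```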

Using a cross join with the min value from the table:
SELECT * FROM #Tbl1 CROSS JOIN (SELECT MIN(Value1) Value1 FROM #Tbl1) A

SQL ORDER BY CSV input parameter

I have sample table and query with the issue described here,
CREATE TABLE test
(
ID INT IDENTITY(1, 1),
NAME VARCHAR(250),
VALUE float
)
INSERT INTO test(NAME,[VALUE])VALUES('A',100)
INSERT INTO test(NAME,[VALUE])VALUES('B',200)
INSERT INTO test(NAME,[VALUE])VALUES('C',200)
SELECT * FROM test WHERE ID IN (2,1,3)
ID NAME VALUE
----------- --------- ----------------
1 A 100
2 B 200
3 C 200
Question: when I pass (2,1,3) in the WHERE clause, the result should come back in that same order, as below:
ID NAME VALUE
----------- --------- ----------------
2 B 200
1 A 100
3 C 200

I have no idea why you would expect the results to be in the order of the in list. SQL works with unordered sets; so there is no intrinsic ordering -- unless it is explicitly done with an order by clause.
It looks like you are using SQL Server. You can do what you want with a join:
SELECT t.*
FROM test t JOIN
(VALUES (2, 1), (1, 2), (3, 3)
) v(id, ordering)
ON t.id = v.id
ORDER BY ordering;

If you grab a copy of DelimitedSplit8K you could do this:
-- assuming these values come in as a parameter:
DECLARE @searchString varchar(100) = '2,1,3';
-- solution using delimitedSplit8K
SELECT t.ID, t.Name, t.VALUE
FROM dbo.test t
JOIN dbo.delimitedSplit8K(@searchString,',') s ON s.item = t.id
ORDER BY s.itemNumber;
Results:
ID Name VALUE
----------- ----- ------
2 B 200
1 A 100
3 C 200
What makes this technique particularly wonderful is how, if you examine the query execution plan, there is no Sort Operator.
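
The underlying idea in both answers is to pair each requested ID with its position in the list and sort by that position. The sketch below shows that idea only; it does not reimplement DelimitedSplit8K. It assumes the CSV string is split on the client in Python and loaded into a temporary table called `wanted` (a hypothetical name), with SQLite standing in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, NAME TEXT, VALUE REAL)")
conn.executemany(
    "INSERT INTO test (NAME, VALUE) VALUES (?, ?)",
    [("A", 100), ("B", 200), ("C", 200)],
)

# Assumed to arrive as a parameter, as in the answer above.
search_string = "2,1,3"

# Pair every ID with its 1-based position in the CSV list.
wanted = [
    (int(token), position)
    for position, token in enumerate(search_string.split(","), start=1)
]
conn.execute("CREATE TEMP TABLE wanted (id INTEGER, ordering INTEGER)")
conn.executemany("INSERT INTO wanted (id, ordering) VALUES (?, ?)", wanted)

# The join filters the rows and the position column supplies the order.
ordered = conn.execute(
    """
    SELECT t.ID, t.NAME, t.VALUE
    FROM test t
    JOIN wanted w ON w.id = t.ID
    ORDER BY w.ordering
    """
).fetchall()
print(ordered)
```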

Interchange value of 2 row in sql table

I have a table with a column named course. In the 2nd row the course is "C++" and in the 4th row the course is "ASP.net".
I want to swap those two values with an UPDATE query. How can I achieve this?

You can change the values with update, like:
update YourTable set Course = 'ASP.NET' where id = 2
update YourTable set Course = 'C++' where id = 4
or:
update YourTable
set Course =
case id
when 2 then 'ASP.NET'
when 4 then 'C++'
end
where id in (2,4)

Test table and data
create table YourTable(id int primary key, course varchar(10))
insert into YourTable values (1, 'Delphi')
insert into YourTable values (2, 'C++')
insert into YourTable values (3, 'Clipper')
insert into YourTable values (4, 'ASP.net')
Update to switch 2 and 4
update YourTable set
course = case id
when 4 then (select course from YourTable where id = 2)
when 2 then (select course from YourTable where id = 4)
else T.course
end
from
YourTable as T
where T.id in (2, 4)
Result
id course
1 Delphi
2 ASP.net
3 Clipper
4 C++

change "many-to-many" to "one-to-many"

I have following table and data:
create table Foo
(
id int not null,
hid int not null,
value int not null
)
insert into Foo(id, hid, value) values(1,1,1) -- use this as 1 < 3
insert into Foo(id, hid, value) values(1,2,3)
insert into Foo(id, hid, value) values(2,3,3) -- use this as 3 < 5
insert into Foo(id, hid, value) values(2,4,5)
insert into Foo(id, hid, value) values(3,2,2) -- use this or next one as value are the same
insert into Foo(id, hid, value) values(3,3,2)
Currently "id" and "hid" have a many-to-many association. What I want is to make "hid" the "one" side instead of "many": for each "id", keep the row with the minimum "value" in the table (see the comments in the SQL code above).
Is it possible to achieve this with a query instead of a cursor?
Thanks!

SQL 2005:
WITH X AS ( SELECT id, min(value) as minval from Foo group by id )
SELECT * FROM
(
SELECT Foo.*, RANK() OVER ( PARTITION by Foo.id order by Foo.hid, Foo.value ) as Rank
FROM Foo JOIN X on Foo.id = X.id and Foo.value = X.minval
) tmp
WHERE Rank = 1
id hid value Rank
----------- ----------- ----------- --------------------
1 1 1 1
2 3 3 1
3 2 2 1
The first line (WITH clause) gets a set of ids with the min value (my arbitrary choice).
The RANK is used to eliminate duplicates - there may be a better way.
With MySql or SQL 2000 I guess you could do this with a complicated set of subqueries.
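
As a possible simplification, the minimum lookup and the tie-break can be folded into one window function. This is a sketch rather than the answer's query: it assumes SQLite 3.25 or later via Python's `sqlite3` module, uses `ROW_NUMBER()` in place of `RANK()`, and orders by value first and hid second so that the lowest hid wins a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foo (id INTEGER NOT NULL, hid INTEGER NOT NULL, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Foo (id, hid, value) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 3), (2, 3, 3), (2, 4, 5), (3, 2, 2), (3, 3, 2)],
)

# Within each id, the row with the smallest value (then smallest hid) gets rn = 1.
picked = conn.execute(
    """
    SELECT id, hid, value
    FROM (
        SELECT id, hid, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY value, hid) AS rn
        FROM Foo
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()
print(picked)
```

Unlike taking `min(hid)` and `min(value)` independently, this always returns a hid/value pair that exists together in one source row.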

Not sure if you are looking for a query, or instructions on how to modify your schema, but here is a query:
select id, min(hid) as hid, min(value) as value
from Foo
group by id

