Remove rows with duplicate values [duplicate] - sql

This question already has answers here:
Finding duplicate rows in SQL Server
(18 answers)
Closed 9 years ago.
I have to clean a table with duplicate rows:
id: serial id
gid: group id
url: string <- this is the column that I have to cleanup
One gid may have multiple url values:
id gid url
---- ---- ------------
1 12 www.gmail.com
2 12 www.some.com
3 12 www.some.com <-- duplicate
4 13 www.other.com
5 13 www.milfsome.com <-- not a duplicate
I want to execute one query against the entire table and delete all rows where the gid and url are duplicate. In the above sample, after the delete, I want to have only 1, 2, 4 and 5 remaining.

;WITH x AS
(
SELECT id, gid, url, rn = ROW_NUMBER() OVER
(PARTITION BY gid, url ORDER BY id)
FROM dbo.table
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete
Once you're happy with the first select, which indicates the rows you'll keep, remove it and un-comment the second select. Once you're happy with that, which indicates the rows you'll delete, remove it and un-comment the delete.
And if you don't want to delete data, just ignore the commented lines under the SELECT...

SELECT
MIN(id) AS id,
gid,
url
FROM yourTable
GROUP BY gid, url

Related

How to update table by of the records in the table

I have a table in PostgerSQL and I need to make N entries in the table twice and for the first half I need to fill in the partner_id field with the value 1 and the second half with the value partner_id = 2.
i try to `
update USERS_TABLE set user_rule_id = 1;
update USERS_TABLE set user_rule_id = 2 where USERS_TABLE.id > count(*)/2;
`
I depends a lot how precise the number of users have to be that are updated with 1 or 2.
The following would be quite unprecise,a s it doesn't take the exact number of user that already exist8after deleting some rows the numbers doesn't fit anymore.
SELECT * FROM USERS_TABLE
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
If you have a lot of deleted rows and want still the half of the users, you can choose following approach, which does rely on the id, but at teh actual row number
UPDATE USERS_TABLE1
set user_rule_id = CASE WHEN rn <= (SELECT count(*) FROM USERS_TABLE1)/ 2 then 1
ELSE 2 END
FROM (SELECT id, ROW_NUMBER() OVER( ORDER BY id) rn FROM USERS_TABLE1) t
WHERE USERS_TABLE1.id = t.id;
UPDATE 5
SELECT * FROM USERS_TABLE1
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
fiddle
In the sample case it it the same result, but when you have a lot of rows and a bunch of the deleted users, the senind will give you quite a good result

Snowflake: Repeating rows based on column value

How to repeat rows based on column value in snowflake using sql.
I tried a few methods but not working such as dual and connect by.
I have two columns: Id and Quantity.
For each ID, there are different values of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID
COUNT
RN
1
2
1
1
2
2
2
4
1
2
4
2
2
4
3
2
4
4
Ok let's start by generating some data. We will create 10 rows, with a QTY. The QTY will be randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave the QTY =1 as they are.
Obviously you can change all parameters above to suit your needs - this solution works super fast and in my opinion way better than table generation.
Simply stack SPLIT_TO_TABLE(), REPEAT() with a LATERAL() join and voila.
WITH TEN_ROWS AS (SELECT ROW_NUMBER()OVER(ORDER BY NULL)SOME_ID,UNIFORM(1,2,RANDOM())QTY FROM TABLE(GENERATOR(ROWCOUNT=>10)))
SELECT
TEN_ROWS.*
FROM
TEN_ROWS,LATERAL SPLIT_TO_TABLE(REPEAT('hire me $10/hour',QTY-1),'hire me $10/hour')ALTERNATIVE_APPROACH;

SQL, select value from row with distinct attribute but max value of another attribute [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 1 year ago.
I have, for example, this table:
id
name
campagna
valore
1
u1
1
22
2
u1
2
23
4
u2
1
223
5
u3
2
223
I would select rows that have name value unique(distinct) but the row that has max(campagna) between the rows with same name .
For example I would have :
name
campagna
valore
u1
2
23
u2
1
223
u3
2
223
Can I do it?
Thanks.
You want one result row per name. The row to pick per name is the one with the maximum capagna for the name.
This is usually done with an analytic function, e.g.:
select name, campagna, valore
from
(
select t.*, max(campagna) over (partition by name) as max_campagna
from mytable t
)
where campagna = max_campagna
order by name;
But it can be achieved with other methods, too. E.g.:
select *
from mytable
where (name, campagna) in
(
select name, max(campagna)
from mytable
group by name
)
order by name;
or
select *
from mytable t
where not exists
(
select null
from mytable t2
where t2.name = t.name
and t2.campagna > t.campagna
)
order by name;

Keyset pagination with composite key

I am using oracle 12c database and I have a table with the following structure:
Id NUMBER
SeqNo NUMBER
Val NUMBER
Valid VARCHAR2
A composite primary key is created with the field Id and SeqNo.
I would like to fetch the data with Valid = 'Y' and apply ketset pagination with a page size of 3. Assume I have the following data:
Id SeqNo Val Valid
1 1 10 Y
1 2 20 N
1 3 30 Y
1 4 40 Y
1 5 50 Y
2 1 100 Y
2 2 200 Y
Expected result:
----------------------------
Page 1
----------------------------
Id SeqNo Val Valid
1 1 10 Y
1 3 30 Y
1 4 40 Y
----------------------------
Page 2
----------------------------
Id SeqNo Val Valid
1 5 50 Y
2 1 100 Y
2 2 200 Y
Offset pagination can be done like this:
SELECT * FROM table ORDER BY Id, SeqNo OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY;
However, in the actual db it has more than 5 millions of records and using OFFSET is going to slow down the query a lot. Therefore, I am looking for a ketset pagination approach (skip records using some unique fields instead of OFFSET)
Since a composite primary key is used, I need to offset the page with information from more than 1 field.
This is a sample SQL that should work in PostgreSQL (fetch 2nd page):
SELECT * FROM table WHERE (Id, SeqNo) > (1, 4) AND Valid = 'Y' ORDER BY Id, SeqNo LIMIT 3;
How do I achieve the same in oracle?
Use row_number() analytic function with ceil arithmetic fuction. Arithmetic functions don't have a negative impact on performance, and row_number() over (order by ...) expression automatically orders the data without considering the insertion order, and without adding an extra order by clause for the main query. So, consider :
select Id,SeqNo,
ceil(row_number() over (order by Id,SeqNo)/3) as page
from tab
where Valid = 'Y';
P.S. It also works for Oracle 11g, while OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY works only for Oracle 12c.
Demo
You can use order by and then fetch rows using fetch and offset like following:
Select ID, SEQ, VAL, VALID FROM TABLE
WHERE VALID = 'Y'
ORDER BY ID, SEQ
--FETCH FIRST 3 ROWS ONLY -- first page
--OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY -- second pages
--OFFSET 6 ROWS FETCH NEXT 3 ROWS ONLY -- third page
--Update--
You can use row_number analytical function as following.
Select id, seqNo, Val, valid from
(Select t.*,
Row_number(order by id, seq) as rn from table t
Where valid = 'Y')
Where ceil(rn/3) = 2 -- for page no. 2
Cheers!!

How to delete all duplicate records from SQL Table?

Hello I have table name FriendsData that contains duplicate records as shown below
fID UserID FriendsID IsSpecial CreatedBy
-----------------------------------------------------------------
1 10 11 FALSE 1
2 11 5 FALSE 1
3 10 11 FALSE 1
4 5 25 FALSE 1
5 10 11 FALSE 1
6 12 11 FALSE 1
7 11 5 FALSE 1
8 10 11 FALSE 1
9 12 11 FALSE 1
I want to remove duplicate combinations rows using MS SQL?
Remove latest duplicate records from MS SQL FriendsData table.
here I attached image which highlights duplicate column combinations.
How I can removed all duplicate combinations from SQL table?
Try this
DELETE
FROM FriendsData
WHERE fID NOT IN
(
SELECT MIN(fID)
FROM FriendsData
GROUP BY UserID, FriendsID)
See here
Or here is more ways to do what you want
Hope this helps
It seems counter-intuitive, but you can delete from a common table expression (under certain circumstances). So, I'd do it like so:
with cte as (
select *,
row_number() over (partition by userid, friendsid order by fid) as [rn]
from FriendsData
)
delete cte where [rn] <> 1
This will keep the record with the lowest fid. If you want something else, change the order by clause in the over clause.
If it's an option, put a uniqueness constraint on the table so you don't have to keep doing this. It doesn't help to bail out a boat if you still have a leak!
I don't know if the syntax is correct for MS-SQL, but in MySQL, the query would look like:
DELETE FROM FriendsData WHERE fID
NOT IN ( SELECT fID FROM FriendsData
GROUP BY UserID, FriendsUserID, IsSpecial, CreatedBy)
In the GROUP BY clause you put the columns you need to be identical in order to consider two records duplicate
Try this query,
select * from FriendsData f1, FriendsData f2
Where f1.fID=f2.fID and f1.UserID =f2.UserID and f1.FriendsID =f2.FriendsID
If it returns you the duplicate rows, then replace Select * by "Delete"
that will solve your problem
Works in Postgres:
DELETE from "FriendsData" where "fID" in
(SELECT "fID" from
(SELECT *, ROW_NUMBER() OVER(PARTITION BY "UserID", "FriendsID" ORDER BY "fID") as rn
FROM "FriendsData") as inner1
WHERE rn > 1);