I got duplicates by using the below query
select candidateid
from table1 table with(nolock)
where status in (1,0)
group by candidateid, bgvtype, DepartmentId, BUId, CustomerId, ProjectId
having
COUNT(candidateid)>1 and COUNT(bgvtype)>1 and
COUNT(DepartmentId)>1 and COUNT(BUId)>1 and
COUNT(CustomerId)>1 and COUNT(ProjectId)>1
And I got the below result when I exec
select * from table1 where candidateid=?
I should ignore 1st record since its project id is different and I need all other records for Ref. I have given one candidate id in the image but in table we have a lot of duplicates. I need to get the record "id" only when all the columns are matched
If you want the original rows that are duplicated on the specified columns, then you can use window functions:
select t.*
from (select t.*,
count(*) over (partition by candidateid, bgvtype, DepartmentId, BUId, CustomerId, ProjectId) as cnt
from table1 t
where status in (1,0)
) t
where cnt >= 2;
I have columns app_id,http_host composed of the below rows: a-aa, a-bb, a-cc, b-hh. I want to calculate how much distinct values i have in column app_id for each distinct http_host. I want to get a-3, b-1
select app_id, http_host,
ROW_NUMBER() over (PARTITION by app_id order by app_id desc) as rowNum
FROM "public"."bus_request"
group by app_id, http_host
I was thinking it will partition by app_id, it will have 1 for a and 2 for b but this is incorrect.
use distinct count
select app_id, count(distinct http_host)
from tablename
group by app_id
use distinct count
select http_host,count(distinct app_id)
FROM "public"."bus_request"
group by http_host
I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1
I have a PostgreSQL database table called "user_links" which currently allows the following duplicate fields:
year, user_id, sid, cid
The unique constraint is currently the first field called "id", however I am now looking to add a constraint to make sure the year, user_id, sid and cid are all unique but I cannot apply the constraint because duplicate values already exist which violate this constraint.
Is there a way to find all duplicates?
The basic idea will be using a nested query with count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
There is another good solution for that mentioned in the comments, (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
From "Find duplicate rows with PostgreSQL" here's smart solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
In order to make it easier I assume that you wish to apply a unique constraint only for column year and the primary key is a column named id.
In order to find duplicate values you should run,
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
Using the sql statement above you get a table which contains all the duplicate years in your table. In order to delete all the duplicates except of the the latest duplicate entry you should use the above sql statement.
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE_AGAIN B
WHERE A.year=B.year AND A.id<B.id;
You can join to the same table on the fields that would be duplicated and then anti-join on the id field. Select the id field from the first table alias (tn1) and then use the array_agg function on the id field of the second table alias. Finally, for the array_agg function to work properly, you will group the results by the tn1.id field. This will produce a result set that contains the the id of a record and an array of all the id's that fit the join conditions.
select tn1.id,
array_agg(tn2.id) as duplicate_entries,
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
Obviously, id's that will be in the duplicate_entries array for one id, will also have their own entries in the result set. You will have to use this result set to decide which id you want to become the source of 'truth.' The one record that shouldn't get deleted. Maybe you could do something like this:
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries,
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
Selects the lowest number ID's that have duplicates (assuming the ID is increasing int PK). These would be the ID's that you would keep around.
Inspired by Sandro Wiggers, I did something similiar to
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_link.id = to_delete.id;
If you want to test it, change it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
This will give an overview of what is going to be deleted (there is no problem to keep year,user_id,sid,cid in the to_delete query when running the deletion, but then they are not needed)
In your case, because of the constraint you need to delete the duplicated records.
Find the duplicated rows
Organize them by created_at date - in this case I'm keeping the oldest
Delete the records with USING to filter the right rows
WITH duplicated AS (
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products o
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
WHERE rnk = 2
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
Following SQL syntax provides better performance while checking for duplicate rows.
SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
set operation with distinct and except.
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
except all also works. Since id serial make all rows unique.
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
So far works nulls and non-nulls.
delete:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;