How to display duplicates in SQL only if another column is different? - sql

So say I have this table:
Name
Role
First
Science
First
Math
First
Science
First
Math
Second
Science
Third
Math
Third
Math
I want to display a column of duplicates for Name/Role ONLY if role is different in each group. So the final result should be like this:
Name
Role
First
Science
First
Math
This is the only person that has a different role for the same name (no matter how many times that specific combination is duplicated). That's why even though Third/Math is also duplicated, it doesn't matter because it's the same combination.
I tried doing a CTE as follows:
;with cte as (
Select Name, Role, ROW_NUMBER() over (partition by name order by name) as 'rownum1'
from U.Users
group by u.name, u.role)
so then select * from cte where rownum > 1 gets me my names of people that have this issue but it doesn't display the duplicate roles for that user. Not sure how I should approach it differently?
If I join the CTE table to the original Users table, I also get the single entries.

You can take advantage of the fact that window functions are applied after aggregation:
select name, role
from (
select name, role, count(1) over (partition by name) c
from user_role
group by name, role
) r
where c > 1
https://www.db-fiddle.com/f/vzRDgBXwYp3VpgNyfn9qzL/0

You can try something like this:
WITH cte1 as (
SELECT distinct *
FROM
table1
),
cte2 as
(
Select Name, Role, ROW_NUMBER() over (partition by name order by name) as rnk
from cte1 u
group by u.name, u.role
)
SELECT * FROM cte2
where name in
(select name
from cte2
WHERE rnk > 1
group by name
)
I used a distinct function to remove any duplicates, then use the ROW_NUMBER() like you to find Names with multiple rows.
db fiddle link

So after I posted question I tried this which isn't as elegant as Kurt's answer but did also work:
;with cte as (select name, role, row_number() over (partition by name order by name) rownum
from user_role
group by name, role)
select distinct user_role.name, user_role.role from user_role
join cte on cte.name=user_role.name and cte.role=user_role.role
where user_role.name in (select name from cte where rownum =2)
https://www.db-fiddle.com/f/vzRDgBXwYp3VpgNyfn9qzL/2

Related

SQL query to select up to N records per criteria

I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria.
To illustrate:
Lets say I have a table with 1 million records, containing full names and phone numbers.
There's no limits on the amount of phone numbers that can be assigned to X person, but I only want to select up to 10 numbers per person, even if the person has more than 10.
Notice I don't want to select just 10 records, I want the query to return every name in the table, I only want to ignore extra phone numbers when the person already has 10 of them.
Can this be done?
You can use window functions to solve this greatest-n-per-group problem:
select t.*
from (
select
t.*,
row_number() over(partition by name order by phone_number) rn
from mytable t
) t
where rn <= 10
Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number, but you can change that to whatever suits your use case best.
Better yet: as commented by waldente, snowflake has the qualify syntax, which removes the need for a subquery:
select t.*
from mytable t
qualify row_number() over(partition by name order by phone_number) <= 10
This query will help your requirement:
select
full_name,
phonenumber
from
(select
full_name,
phonenumber,
ROW_NUMBER() over (partition by phonenumber order by full_name desc) as ROW_NUMBER from sample_tab) a
where
a.row_number between 1 and 10
order by
full_name asc,
phonenumber desc;
using Snowflake Qualify function:
select
full_name,
phonenumber
from
sample_tab qualify row_number() over (partition by phonenumber order by full_name) between 1 and 10
order by
full_name asc ,
phonenumber desc;

function that allows grouping of rows

I'm using SQL Server Management Studio 2012. I have a similar looking output from a query shown below. I want to eliminate someone from the query who has 2 contracts.
Select
Row_Number() over (partition by ID ORDER BY ContractypeDescription DESC) as [Row_Number],
Name,
ContractDescription,
Role
From table
Output
Row_Number ID Name Contract Description Role
1 1234 Mike FullTime Admin
2 1234 Mike Temp Manager
1 5678 Dave FullTime Admin
1 9785 Liz FullTime Admin
What I would like to see
Row_Number ID Name Contract Description Role
1 5678 Dave FullTime Admin
1 9785 Liz FullTime Admin
Is there a function rather than Row_Number that allows you to group rows together so I can then use something like 'where Row_Number not like 1 and 2'?
You can use HAVING as
SELECT ID,
MAX(Name) Name,
MAX(ContractDescription) ContractDescription,
MAX(Role) Role
FROM t
GROUP BY ID
HAVING COUNT(*) = 1;
Demo
Try this:
select * from (
Select
Count(*) over (partition by ID ) as [Row_Number],
Name,
ContractDescription,
Role
From table
)t where [Row_Number] = 1
You can check this option-
SELECT *
FROM table
WHERE ID IN
(
SELECT ID
FROM table
GROUP BY ID
HAVING COUNT(*) = 1
)
You can use a CTE to get all the ids of people who got only one contract and then just join the result of the CTE with your table.
;with cte as (
select
id
,COUNT(id) as no
from #tbl
group by id
having COUNT(id) = 1
)
select
t.id
,t.name
,t.ContractDescription
,t.role
from #tbl t
inner join cte
on t.id = cte.id
Basically you need those record who have exactly one contract.
Just extend your script, (My script is not tested)
;with CTE as
(
Select
Row_Number() over (partition by ID ORDER BY ContractypeDescription DESC) as [Row_Number],
Name,
ContractDescription,
Role
From table
)
select * from CTE c where [Row_Number]=1
and not exists(select 1 from CTE c1 where c.id=c1.id and c1.[Row_Number]>1 )
Is there a function rather than Row_Number that allows you to group
rows together so I can then use something like 'where Row_Number not
like 1 and 2'?
You can use a windowed COUNT(). The key is the OVER() clause.
;WITH WindowedCount AS
(
SELECT
T.*,
WindowCount = COUNT(1) OVER (PARTITION BY T.ID)
FROM
YourTable AS T
)
DELETE W FROM
WindowedCount AS W
WHERE
W.WindowCount > 1
The COUNT() will count the amount of rows for each different ID, so if the same ID appears in 2 or more rows, those rows will be deleted.

Distinct rows in a table in sql

I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1

DB ORACLE QUERY

I have a table where storing details
ID NAME
1 A
2 A
1 A
I need the output like
ID Name Count
1,2 A 3
Please help to get the output like that in oracle select query
In Oracle, you can use listagg(), but it has no distinct option. So, use a subquery and two levels of aggregation:
select listagg(id, ',') within group (order by id) as id, name, sum(cnt)
from (select id, name, count(*) as cnt
from t
group by id, name
) x
group by name;

Getting the smaller index for each duplicate in SQL

Let's say I have a table with two columns, one column for the ID and another for a Name. All the names in this table appear more than once.
How can I get all the IDs in the table excluding the smallest IDs for each Name?
In SQL Server 2005+ you could go like this:
SELECT ID FROM atable
EXCEPT
SELECT MIN(ID) FROM atable GROUP BY Name
I would use a CTE (Common Table Expression) using the ROW_NUMBER() ranking function for that:
;WITH GroupedNames AS
(
SELECT ID, Name,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY ID) AS 'RowNum'
FROM
dbo.YourTable
)
SELECT *
FROM GroupedNames
This will "partition" your data by means, e.g. create groups by name, and each group will get consecutive numbers starting at 1. This way, you can easily select everything except the entry (ID, Name) with the smallest ID with this:
.....
SELECT *
FROM GroupedNames
WHERE RowNum > 1
and if you need to, you can even use this construct to actually delete all those names with a row number bigger than 1 (all the "duplicates"):
;WITH GroupedNames AS
(
SELECT ID, Name,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY ID) AS 'RowNum'
FROM
dbo.YourTable
)
DELETE FROM GroupedNames
WHERE RowNum > 1
Maybe this would work?
SELECT id FROM table WHERE id NOT IN (SELECT MIN(id) FROM table GROUP BY name)
SELECT DISTINCT b.id
FROM yourTable a
JOIN yourTable b
ON a.name = b.name
AND a.id < b.id