SQL - How to remove repeating values

My requirement is to remove the repeating values.
id  name     surname  value
1   Vinduja  Vijayan  5
3   Vinduja  Vijayan  6
4   Vinduja  Vijayan  7
Required output:
id  name     surname  value
1   Vinduja  Vijayan  5
3   NULL     NULL     6
4   NULL     NULL     7

This transformation is usually better done in the application layer. It is possible in SQL, though not recommended, using row_number() and case:
select id,
(case when row_number() over (partition by name, surname order by id) = 1
then name
end) as name,
(case when row_number() over (partition by name, surname order by id) = 1
then surname
end) as surname,
value
from t
order by id;
Note that the final order by is very, very important. SQL result sets (like tables) are unordered by default. Without an explicit order by, the results could be in any order, and that would mess up your interpretation of the results.
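For anyone who wants to try the row_number() idea without a SQL Server instance, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. Using SQLite instead of SQL Server is my assumption; the table name t and the sample rows come from the question, and the row numbering is moved into a subquery so it is written only once.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, surname TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "Vinduja", "Vijayan", 5),
     (3, "Vinduja", "Vijayan", 6),
     (4, "Vinduja", "Vijayan", 7)],
)

# Keep name/surname only on the first row of each (name, surname) group;
# CASE without ELSE yields NULL for every later row of the group.
query = """
SELECT id,
       CASE WHEN rn = 1 THEN name END    AS name,
       CASE WHEN rn = 1 THEN surname END AS surname,
       value
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY name, surname ORDER BY id) AS rn
      FROM t)
ORDER BY id
"""
dedup_rows = conn.execute(query).fetchall()
for row in dedup_rows:
    print(row)
conn.close()
```

SQL NULLs come back as Python None, so the second and third rows should show None for name and surname.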

DECLARE @table TABLE (
Id INT
,Name VARCHAR(20)
,Surname VARCHAR(20)
,value INT
);
INSERT into @table(ID,Name,Surname,value)
Select 1,'Vinduja','Vijayan',5
Union
Select 3,'Vinduja','Vijayan',6
Union
Select 4,'Vinduja','Vijayan',7;
-- Number the rows per (Name, Surname) by Id so the first row is well defined
Select S.Id ,T.Name,T.Surname,S.value from (
Select * ,ROW_NUMBER() Over(Partition by Name, Surname Order by Id) [Row]
From @table)S
Left join @table T On T.Id =S.Id and S.[Row]=1
Order by S.Id

select
id,
case when rnk=1 then name end as name,
case when rnk=1 then surname end as surname ,
value
from
(
select
id,name,surname,value,
row_number()over(partition by name,surname order by id) as rnk
from table_name)repeatname
order by id

I'm not sure I understand your requirements. If you only want to display the data as described, this is not what you need. But if you are trying to change the data in the table itself, this will do it.
CREATE TABLE #Dupes
(
id INT
,name VARCHAR(30)
,surname VARCHAR(30)
,value INT
);
INSERT #Dupes
(
id
,name
,surname
,value
)
VALUES
(1, 'Vinduja', 'Vijayan', 5),
(3, 'Vinduja', 'Vijayan', 6),
(4, 'Vinduja', 'Vijayan', 7);
WITH cte AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY [name], surname ORDER BY id) AS RowNum
,id
,name
,surname
,value
FROM #Dupes
)
UPDATE cte
SET cte.name = NULL
,cte.surname = NULL
WHERE
cte.RowNum > 1;
SELECT *
FROM #Dupes;
--Results
+----+---------+---------+-------+
| id | name | surname | value |
+----+---------+---------+-------+
| 1 | Vinduja | Vijayan | 5 |
| 3 | NULL | NULL | 6 |
| 4 | NULL | NULL | 7 |
+----+---------+---------+-------+

And just for interest, using the LAG function. I assumed SQL Server.
select id,
iif(name = previous_name, null, name) name,
iif(surname = previous_surname, null, surname) surname
from (
select name, surname, id,
lag(name, 1, null) over (order by name, surname, id) previous_name,
lag(surname, 1, null) over (order by name, surname, id) previous_surname
from table_name ) a
order by a.name, a.surname, a.id

Related

Get top 5 records for each group and Concate them in a Row per group

I have a table Contacts that basically looks like following:
Id | Name | ContactId | Contact | Amount
---------------------------------------------
1 | A | 1 | 12323432 | 555
---------------------------------------------
1 | A | 2 | 23432434 | 349
---------------------------------------------
2 | B | 3 | 98867665 | 297
--------------------------------------------
2 | B | 4 | 88867662 | 142
--------------------------------------------
2 | B | 5 | null | 698
--------------------------------------------
Here, ContactId is unique throughout the table. Contact can be NULL & I would like to exclude those.
Now, I want to select the top 5 contacts for each Id based on their Amount. I accomplished that with the following query:
WITH cte AS (
SELECT id, Contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select * from cte where RowNo <= 5
This works fine up to this point. Now I want to concatenate these (at most 5) records for each group and show them in a single row.
Expected Result :
Id | Name | Contact
-------------------------------
1 | A | 12323432;23432434
-------------------------------
2 | B | 98867665;88867662
I am using the following query to achieve this, but it still returns all records in separate rows and includes NULL values too:
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select co.id, co.name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id and co.contactid= cte.contactid
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from contacts co inner join cte on cte.id = co.id and co.contactid= cte.contactid
The above query still gives me all top 5 contacts in different rows, and includes NULLs too.
Is it a good idea to use a CTE and STUFF together? Please suggest if there is a better approach.

I found the problem with my final query:
I don't need the original Contacts table in my final SELECT, since the CTE already has everything I need. Also, inside STUFF() I was joining on contactid, which belongs to the very value I am trying to concatenate; because of that join condition, each contact ended up in its own row. I removed these two conditions and it worked.
WITH cte AS (
SELECT id, name, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select distinct co.id, co.name,
STUFF ((
SELECT '; ' + isnull(c.contact,'') FROM cte c
WHERE c.id = co.id
and c.RowNo <= 5
ORDER BY c.RowNo
FOR XML PATH('')),1, 2, '')as contact
from cte co where co.RowNo <= 5

You can use conditional aggregation:
select id, name,
concat(max(case when seqnum = 1 then contact + ';' end),
max(case when seqnum = 2 then contact + ';' end),
max(case when seqnum = 3 then contact + ';' end),
max(case when seqnum = 4 then contact + ';' end),
max(case when seqnum = 5 then contact + ';' end)
) as contacts
from (select c.*,
row_number() over (partition by id order by amount desc) as seqnum
from contacts c
where contact is not null
) c
group by id, name;

If you are running SQL Server 2017 or higher, you can use string_agg(): like most other aggregate functions, it ignores null values by design.
select id, name, string_agg(contact, ',') within group (order by rn) all_contacts
from (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
) t
where rn <= 5
group by id, name
Note that you don't strictly need a CTE here; you can return the columns you need from the subquery, and use them directly in the outer query.
In earlier versions, one approach using stuff() and for xml path is:
with cte as (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
)
select id, name,
stuff(
(
select ', ' + c1.contact
from cte c1
where c1.id = c.id and c1.rn <= 5
order by c1.rn
for xml path (''), type
).value('.', 'varchar(max)'), 1, 2, ''
) all_contacts
from cte c
group by c.id, c.name
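To experiment with the rank-then-concatenate idea without SQL Server, here is a rough sketch using Python's sqlite3 module. This is a swapped-in dialect, not the queries above: SQLite has no string_agg() ... within group or FOR XML PATH, so I use group_concat() as a window function over the whole partition, ordered by the row number, and collapse the repeats with DISTINCT. The sample rows are the ones from the question; storing Contact as text is my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts "
    "(id INTEGER, name TEXT, contactid INTEGER, contact TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [(1, "A", 1, "12323432", 555),
     (1, "A", 2, "23432434", 349),
     (2, "B", 3, "98867665", 297),
     (2, "B", 4, "88867662", 142),
     (2, "B", 5, None, 698)],
)

# Inner query: rank the non-NULL contacts per id by amount, highest first.
# Outer query: keep the top 5 and glue them together in rank order.
query = """
SELECT DISTINCT id, name,
       group_concat(contact, ';') OVER (
           PARTITION BY id ORDER BY rn
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS contacts
FROM (SELECT id, name, contact,
             row_number() OVER (PARTITION BY id ORDER BY amount DESC) AS rn
      FROM contacts
      WHERE contact IS NOT NULL)
WHERE rn <= 5
ORDER BY id
"""
contact_rows = conn.execute(query).fetchall()
for row in contact_rows:
    print(row)
conn.close()
```

The NULL contact is filtered out before ranking, so it never takes one of the five slots.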

I agree with @GMB. STRING_AGG() is what you need ...
WITH
contacts(Id,nm,ContactId,Contact,Amount) AS (
SELECT 1,'A',1,12323432,555
UNION ALL SELECT 1,'A',2,23432434,349
UNION ALL SELECT 2,'B',3,98867665,297
UNION ALL SELECT 2,'B',4,88867662,142
UNION ALL SELECT 2,'B',5,NULL ,698
)
,
with_filter_val AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY amount DESC) AS rn
FROM contacts
WHERE contact IS NOT NULL -- NULL contacts must not take one of the top 5 slots
)
SELECT
id
, nm
, STRING_AGG(CAST(contact AS CHAR(8)),',') AS contact_list
FROM with_filter_val
WHERE rn <=5
GROUP BY
id
, nm
-- out id | nm | contact_list
-- out ----+----+-------------------
-- out 1 | A | 12323432,23432434
-- out 2 | B | 98867665,88867662

Query to get the list of Addresses that have AddressType one else AddressType two?

Consider the following table
Id  PersonId  Address  AddressTypeId
------------------------------------
1   1         AI1P1T1  1
2   1         AI2P1T2  2
3   2         AI3P2T2  2
I want to write a query that lists the addresses of persons who have AddressTypeId = 1 or AddressTypeId = 2, such that:
when a person has an address with AddressTypeId = 1, that address is selected;
otherwise the person's address with AddressTypeId = 2 is selected.
Expected result:
Address
--------------
AI1P1T1
AI3P2T2

Good day,
Please check if this solves your needs:
/***************************** DDL+DML */
drop table if exists T;
create table T(Id int,PersonId int, [Address] nvarchar(10), AddressTypeId int)
INSERT T(Id,PersonId, [Address], AddressTypeId)
values
(1,1,'AI1P1T1',1),
(2,1,'AI2P1T2',2),
(3,2,'AI3P2T2',2)
GO
select * from T
GO
/***************************** Solution */
With MyCTE as (
select *, ROW_NUMBER() OVER (partition by PersonId order by AddressTypeId) as RN
from T
)
select [Address]
from MyCTE
where
AddressTypeId in (1,2) -- if there can be only positive numbers then you can use "< 3"
and RN = 1
GO
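If you want to check the row-numbering approach without SQL Server, the following is a small sketch using Python's sqlite3 module (SQLite is my stand-in here, so there is no GO or drop table if exists batch syntax). The table T and its rows are taken from the DDL above; the filter on AddressTypeId is moved inside the subquery so that only types 1 and 2 are ranked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (Id INTEGER, PersonId INTEGER, Address TEXT, AddressTypeId INTEGER)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [(1, 1, "AI1P1T1", 1),
     (2, 1, "AI2P1T2", 2),
     (3, 2, "AI3P2T2", 2)],
)

# Rank each person's addresses by AddressTypeId, so type 1 wins over type 2,
# then keep only the best-ranked address per person.
query = """
SELECT Address
FROM (SELECT Address, PersonId,
             row_number() OVER (PARTITION BY PersonId ORDER BY AddressTypeId) AS rn
      FROM T
      WHERE AddressTypeId IN (1, 2))
WHERE rn = 1
ORDER BY PersonId
"""
addresses = [row[0] for row in conn.execute(query)]
print(addresses)
conn.close()
```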

You can try this also using joins:
select t1.PersonId,t1.Address from #T t1
inner join (select personid,min(AddressTypeId)atype from #T
group by PersonId )x
on x.atype=t1.AddressTypeId and x.PersonId=t1.PersonId

I would write a subquery to make ROW_NUMBER by window function, then use MAX in the main query.
SELECT
PersonId, MAX(Address) Address
FROM
(SELECT
PersonId,
(CASE
WHEN ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY AddressTypeId) = 1
THEN Address
END) Address
FROM
T
WHERE
AddressTypeId IN (1,2)
) t1
GROUP BY
PersonId
[Results]:
| PersonId | Address |
+----------+---------+
| 1 | AI1P1T1 |
| 2 | AI3P2T2 |

Here's the top 1 with ties trick:
select top 1 with ties *
from yourtable
order by row_number() over (partition by PersonId order by AddressTypeId)
This also works on versions older than SQL Server 2012, and it can return every column.

You could use a union of three results: persons who have both types (take type 1), persons who have only type 2, and persons who have only type 1:
select m.Address
from my_table m
Inner join (
select PersonId , count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 2
) t on t.PersonId = m.PersonId and m.AddressTypeId = 1
UNION
select m.Address
from my_table m
Inner join (
select PersonId , count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 1 and max(AddressTypeId) = 2
) t on t.PersonId = m.PersonId and m.AddressTypeId = 2
UNION
select m.Address
from my_table m
Inner join (
select PersonId , count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 1 and max(AddressTypeId) = 1
) t on t.PersonId = m.PersonId and m.AddressTypeId = 1

Try this one
select distinct personId, first_value(Address) over(partition by personId order by AddressTypeId) as Address
from T
--use the where statement optionally
--where AddressTypeId in (1,2);

Find top consecutive rows where a column value is equal between them

I need to get all the consecutive top rows in which a column value is the same.
My table is:
CREATE TABLE [dbo].[Items](
[Id] [int] NOT NULL,
[IdUser] [int] NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[SomeData] nvarchar(50) NOT NULL);
I want the top rows (ordered by Id desc) that share the same IdUser.
If the table data is:
Id IdUser CreatedDate SomeData
--- ------- ------------------------ --------
1 1 2017-09-21T09:42:01.407Z sdafsasfa
2 1 2017-09-21T09:42:01.407Z sdafsasfa
4 2 2017-09-21T09:42:01.41Z sdafsasfa
5 3 2017-09-21T09:42:01.41Z sdafsasfa
7 3 2017-09-21T09:42:01.413Z sdafsasfa
8 3 2017-09-21T09:42:01.413Z sdafsasfa
9 10 2017-09-21T09:42:01.417Z sdafsasfa
11 11 2017-09-21T09:42:01.417Z sdafsasfa
12 2 2017-09-21T09:42:01.42Z sdafsasfa
15 2 2017-09-21T09:42:01.42Z sdafsasfa
I want :
Id IdUser CreatedDate SomeData
--- ------- ------------------------ --------
12 2 2017-09-21T09:42:01.42Z sdafsasfa
15 2 2017-09-21T09:42:01.42Z sdafsasfa
if table data is:
Id IdUser CreatedDate SomeData
--- ------- ------------------------ --------
1 1 2017-09-21T09:42:01.407Z sdafsasfa
2 1 2017-09-21T09:42:01.407Z sdafsasfa
4 2 2017-09-21T09:42:01.41Z sdafsasfa
I want :
Id IdUser CreatedDate SomeData
--- ------- ------------------------ --------
4 2 2017-09-21T09:42:01.41Z sdafsasfa

you can try this query:
select I.*
from
[dbo].[Items] I
JOIN
(select top 1 Id, IdUser from [dbo].[Items] order by Id desc)I2
on I.Iduser=I2.Iduser
order by Id desc;-- this can be removed to remove ordering by Id Desc

You could use LAG and SUM() OVER() like this
CREATE TABLE #Items
(
[Id] [int] NOT NULL,
[IdUser] [int] NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[SomeData] nvarchar(50) NOT NULL
);
INSERT INTO #Items
(
Id,
IdUser,
CreatedDate,
SomeData
)
VALUES
( 1 , 1 ,getdate(),'sdafsasfa'),
( 2 , 1 ,getdate(),'sdafsasfa'),
( 4 , 2 ,getdate(),'sdafsasfa'),
( 5 , 3 ,getdate(),'sdafsasfa'),
( 7 , 3 ,getdate(),'sdafsasfa'),
( 8 , 3 ,getdate(),'sdafsasfa'),
( 9 , 10,getdate(),'sdafsasfa'),
( 11, 11,getdate(),'sdafsasfa'),
( 12, 2 ,getdate(),'sdafsasfa'),
( 15, 2 ,getdate(),'sdafsasfa')
;WITH temp AS
(
SELECT *,
CASE
WHEN lag(i.IdUser) over(ORDER BY i.Id) = i.IdUser THEN 0
ELSE 1
END as ChangingPoint
FROM #Items i
),
temp1 AS
(
SELECT
*,
sum(t.ChangingPoint) OVER(ORDER BY t.Id) as GroupId
FROM temp t
)
SELECT TOP 1 WITH TIES
t.Id,
t.IdUser,
t.CreatedDate,
t.SomeData
FROM temp1 t
ORDER BY GroupId DESC
See demo here: http://rextester.com/PHWWU96232
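The LAG plus running SUM technique can also be tried locally. Below is a minimal sketch with Python's sqlite3 module (my substitution for SQL Server). SQLite has no TOP 1 WITH TIES, so the last step filters on the highest GroupId instead; the helper name top_run is made up for this example, and only the Id and IdUser columns are kept to stay short.

```python
import sqlite3

QUERY = """
WITH marked AS (
    -- 1 when IdUser differs from the previous row (by Id), otherwise 0
    SELECT Id, IdUser,
           CASE WHEN lag(IdUser) OVER (ORDER BY Id) = IdUser THEN 0 ELSE 1 END
               AS ChangingPoint
    FROM Items
),
grouped AS (
    -- the running total of change points gives every consecutive run its own number
    SELECT Id, IdUser, sum(ChangingPoint) OVER (ORDER BY Id) AS GroupId
    FROM marked
)
SELECT Id, IdUser
FROM grouped
WHERE GroupId = (SELECT max(GroupId) FROM grouped)
ORDER BY Id
"""

def top_run(rows):
    """Return the rows of the last consecutive run of equal IdUser values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Items (Id INTEGER, IdUser INTEGER)")
    conn.executemany("INSERT INTO Items VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

first_case = top_run([(1, 1), (2, 1), (4, 2), (5, 3), (7, 3), (8, 3),
                      (9, 10), (11, 11), (12, 2), (15, 2)])
second_case = top_run([(1, 1), (2, 1), (4, 2)])
print(first_case)
print(second_case)
```

The two calls use the two data sets from the question, so the output should be Ids 12 and 15 for the first and Id 4 for the second.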

Assuming you want the last rows with the highest CreatedDate and the same IdUser, DENSE_RANK will help
SELECT id, iduser, CreatedDate, somedata
FROM (
SELECT id, iduser, CreatedDate, somedata,
DENSE_RANK() OVER (ORDER BY CreatedDate desc, IdUser) ord
FROM [dbo].[Items]) t
WHERE t.ord = 1
The equivalent SQL query is
SELECT *
FROM Items t1
WHERE NOT EXISTS (
SELECT *
FROM Items t2
WHERE t2.createddate > t1.createddate or
(t2.createddate = t1.createddate and t2.iduser < t1.iduser)
)

Although TriV's solution works fine, I ended up using a modified version of Radim Bača's solution (his original does not do what I need), because I believe it is faster.
SELECT id, iduser, createddate, somedata
FROM Items t1
WHERE NOT EXISTS (
SELECT 1
FROM Items t2
WHERE t2.id > t1.id and t2.iduser <> t1.iduser );

select I.*
from
[dbo].[Items1] I
JOIN
(select top 1 Id, IdUser,CreatedDate from [dbo].[Items1] order by Id desc)I2
on I.CreatedDate=I2.CreatedDate
order by Id desc;-- this can be removed to remove ordering by Id Desc

Two rows in one in SQL server

The answer is very close, thanks.
But the problem appears when there are many records:
  | id | name  | age  | Tel
------------------------------------------
1 | 1  | Frank | 40   | null
2 | 1  | null  | 50   | 7834xx
3 | 1  | Alex  | null | null
4 | 1  | null  | 20   | null
5 | 2  | James | null | 4121xx
The query returns the maximum value of each column, like:
  | id | name  | age  | Tel
------------------------------------------
1 | 1  | Frank | 50   | 7834xx
I need a SELECT query that returns this instead:
  | id | name  | age  | Tel
------------------------------------------
1 | 1  | Alex  | 20   | 7834xx
What should I do? Please help.

Here's a roundabout way to combine the last non-empty value of 3 columns:
-- Using a temporary table for test data
create table #Test (tableId int identity(1,1), id int, name varchar(100), age int, tel varchar(30));
insert into #Test (id, name, age, tel) values
(1,'Frank',40,null),
(1,null,50,'7834xx'),
(1,'Alex',null,null),
(1,null,20,null),
(2,'James',null,'4121xx');
select n.id, n.name, a.age, t.tel
from (
select top(1) with ties id, name
from #Test
where name is not null
order by row_number() over (partition by id order by tableId desc)
) n
inner join (
select top(1) with ties id, age
from #Test
where age is not null
order by row_number() over (partition by id order by tableId desc)
) a on (n.id = a.id)
inner join (
select top(1) with ties id, tel
from #Test
where tel is not null
order by row_number() over (partition by id order by tableId desc)
) t on (n.id = t.id);
or by re-using a CTE
;with CTE AS (
select * ,
row_number() over (partition by id, iif(name is not null,1,0) order by tableId desc) as rn_name,
row_number() over (partition by id, iif(age is not null,1,0) order by tableId desc) as rn_age,
row_number() over (partition by id, iif(tel is not null,1,0) order by tableId desc) as rn_tel
from #Test
)
select n.id, n.name, a.age, t.tel
from CTE n
join CTE a on (a.id = n.id and a.age is not null and a.rn_age = 1)
join CTE t on (t.id = n.id and t.tel is not null and t.rn_tel = 1)
where (n.name is not null and n.rn_name = 1);
Result :
╔════╦══════╦═════╦════════╗
║ id ║ name ║ age ║ tel ║
╠════╬══════╬═════╬════════╣
║ 1 ║ Alex ║ 20 ║ 7834xx ║
╚════╩══════╩═════╩════════╝
Looking at this answer again more than a year later: you could also use the window function first_value for this, without any join.
select Id, name, age, tel
from
(
select Id
, row_number() over (partition by id order by tableId desc) as rn
, first_value(name) over (partition by id order by iif(name is null,1,0), tableId desc) as name
, first_value(age) over (partition by id order by iif(age is null,1,0), tableId desc) as age
, first_value(tel) over (partition by id order by iif(tel is null,1,0), tableId desc) as tel
from #Test
) q
where rn = 1
and name is not null and age is not null and tel is not null;
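Here is a small sketch of the first_value idea that can be run with Python's sqlite3 module (again SQLite rather than SQL Server, which is my assumption). It sorts NULLs last with "column IS NULL" instead of iif(), uses DISTINCT instead of the rn = 1 filter, and, unlike the query above, keeps ids for which some column has no non-NULL value at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Test (tableId INTEGER PRIMARY KEY AUTOINCREMENT, "
    "id INTEGER, name TEXT, age INTEGER, tel TEXT)"
)
conn.executemany(
    "INSERT INTO Test (id, name, age, tel) VALUES (?, ?, ?, ?)",
    [(1, "Frank", 40, None),
     (1, None, 50, "7834xx"),
     (1, "Alex", None, None),
     (1, None, 20, None),
     (2, "James", None, "4121xx")],
)

# For each column: non-NULL values first, newest row (highest tableId) first,
# then take the first value of that ordering within each id.
query = """
SELECT DISTINCT id,
       first_value(name) OVER (PARTITION BY id ORDER BY name IS NULL, tableId DESC) AS name,
       first_value(age)  OVER (PARTITION BY id ORDER BY age  IS NULL, tableId DESC) AS age,
       first_value(tel)  OVER (PARTITION BY id ORDER BY tel  IS NULL, tableId DESC) AS tel
FROM Test
ORDER BY id
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
conn.close()
```

For id 2 there is no age at all, so that column stays NULL (None in Python); add the "is not null" filters from the answer if such ids should be dropped.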

One simple way is to get max as below:
Select Id, Max(name) as [Name], Max(age) as Age, Max(Tel) as Tel
from yourtable
Group by Id

Thanks for answering and helping.
I found this query:
SELECT TOP 10
(SELECT top(1) name FROM test1 where id=1 and name is not null order by autoID desc) as name
,(SELECT top(1) age FROM test1 where id=1 and age is not null order by autoID desc) as Age
,(SELECT top(1) Tel FROM test1 where id=1 and Tel is not null order by autoID desc) as Telephon
FROM [dbo].[test1]
group by id
It worked!
But I think there may be an easier way. Is there something like this (NotNull is a made-up function here)?
Select Id, NotNull(name) as [Name], NotNull(age) as Age, NotNull(Tel) as Tel
from yourtable Group by Id

How to find max value from each group and display their information when using "group by"

For example, I created a table of people who contributed to 2 campaigns
+-------------------------------------+
| ID Name Campaign Amount (USD) |
+-------------------------------------+
| 1 A 1 10 |
| 2 B 1 5 |
| 3 C 2 7 |
| 4 D 2 9 |
+-------------------------------------+
Task: For each campaign, find the person (Name, ID) who contributed the most.
Expected result is
+-----------------------------------------+
| Campaign Name ID |
+-----------------------------------------+
| 1 A 1 |
| 2 D 4 |
+-----------------------------------------+
I used "group by Campaign", but the result has only the 2 columns "Campaign" and "max value", whereas I need "Name" and "ID".
Thanks for your help.
Edited: I fixed some values, really sorry.

You can use analytic functions for this:
select name, id, amount
from (select t.*, max(amount) over (partition by campaign) as max_amount
from t
) t
where amount = max_amount;
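A quick way to try the analytic-function approach is the sketch below, which runs an equivalent query through Python's sqlite3 module (SQLite standing in for the asker's database is my assumption, and the table name contributions is made up). It returns Campaign, Name and ID to match the expected result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions (ID INTEGER, Name TEXT, Campaign INTEGER, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 10), (2, "B", 1, 5), (3, "C", 2, 7), (4, "D", 2, 9)],
)

# Attach each campaign's maximum amount to every row, then keep the rows
# whose own amount equals that maximum (ties would all be returned).
query = """
SELECT Campaign, Name, ID
FROM (SELECT t.*, max(Amount) OVER (PARTITION BY Campaign) AS max_amount
      FROM contributions t)
WHERE Amount = max_amount
ORDER BY Campaign
"""
top_contributors = conn.execute(query).fetchall()
for row in top_contributors:
    print(row)
conn.close()
```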

You can also do it by assigning a rank/row_number partitioned by campaign and ordered by amount in descending order.
Query
;with cte as(
select [num] = dense_rank() over(
partition by [Campaign]
order by [Amount] desc
), *
from [your_table_name]
)
select [Campaign], [Name], [ID]
from cte
where [num] = 1;

Try the next query:-
SELECT t.Campaign , t.Name , t.ID
FROM MyTable t
JOIN (
SELECT Campaign , MAX (Amount) AS MaxAmount
FROM MyTable
GROUP BY Campaign
) temp
ON temp.Campaign = t.Campaign AND temp.MaxAmount = t.Amount;

Simply use a WHERE clause with the max amount grouped by Campaign.
Generic code:-
select a, b , c
from tablename
where d in
(
select max(d)
from tablename
group by a
)
Demo:-
Create table #MyTable (ID int , Name char(1), Campaign int , Amount int)
go
insert into #MyTable values (1,'A',1,10)
insert into #MyTable values (2,'B',1,5)
insert into #MyTable values (3,'C',2,7)
insert into #MyTable values (4,'D',2,9)
go
select Campaign, Name , ID
from #MyTable
where Amount in
(
select max(Amount)
from #MyTable
group by Campaign
)
drop table #MyTable

Please find the code for this below:
SELECT *
FROM #MyTable T
OUTER APPLY (
SELECT COUNT(1) record
FROM #MyTable T1
where t.Campaign = t1.Campaign
and t.amount < t1.amount
)E
where E.record = 0
