Select the duplicate rows with specific values - sql

How can I only get the data with the same ID, but not the same Name?
The following is the example to explain my thought. Thanks.
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022
456 Billy 08/03/2022
456 Cat 09/03/2022
789 Peter 10/03/2022
Expected Output:
ID Name Date
456 Billy 08/03/2022
456 Cat 09/03/2022
Here is what I have tried:
select ID, Name, count(*)
from table
group by ID, Name
having count(*) > 1
But the result includes the following rows, which I do not want:
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022

One approach would be to use a subquery to identify IDs that have multiple names.
SELECT *
FROM YourTable
WHERE ID IN (SELECT ID FROM YourTable GROUP BY ID HAVING COUNT(DISTINCT Name) > 1)
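To see why grouping by ID alone matters, here is a minimal sketch that runs both the query from the question and the subquery version against the sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable); the table name YourTable is the placeholder from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (123, "Amy", "08/03/2022"),
        (123, "Amy", "12/03/2022"),
        (456, "Billy", "08/03/2022"),
        (456, "Cat", "09/03/2022"),
        (789, "Peter", "10/03/2022"),
    ],
)

# Grouping by (ID, Name) finds repeated (ID, Name) pairs,
# which is the opposite of what the question asks for.
pairs = conn.execute(
    "SELECT ID, Name, COUNT(*) FROM YourTable "
    "GROUP BY ID, Name HAVING COUNT(*) > 1"
).fetchall()

# Grouping by ID alone and counting distinct names finds
# the IDs that are shared by different names.
rows = conn.execute(
    "SELECT ID, Name, Date FROM YourTable "
    "WHERE ID IN (SELECT ID FROM YourTable GROUP BY ID "
    "             HAVING COUNT(DISTINCT Name) > 1) "
    "ORDER BY ID, Name"
).fetchall()

print(pairs)
print(rows)
```

The first query only ever sees one name per group, so it can report how often a pair repeats but never whether an ID has more than one name.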

I'd join the table to itself like this:
SELECT DISTINCT
a.Id as ID_A,
b.Id as ID_B,
a.[Name] as Name_A
FROM
Test as a
INNER JOIN Test as b
ON A.Id = B.Id
WHERE
A.[Name] <> B.[Name]

Do you want
SELECT * FROM table_name
WHERE ID = 456;
or
SELECT * FROM table_name
WHERE ID IN
(SELECT
ID
FROM table_name
GROUP BY ID
HAVING COUNT(DISTINCT name) > 1
);
?

Window functions are likely to be the most efficient here. They do not require self-joining of the source table.
Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function, but we can simulate it using DENSE_RANK and MAX:
WITH DistinctRanks AS (
SELECT *,
rnk = DENSE_RANK() OVER (PARTITION BY ID ORDER BY Name)
FROM YourTable
),
MaxRanks AS (
SELECT *,
mr = MAX(rnk) OVER (PARTITION BY ID)
FROM DistinctRanks
)
SELECT
ID,
Name,
Date
FROM MaxRanks t
WHERE t.mr > 1;
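As a rough check of the DENSE_RANK/MAX idea, the sketch below runs the same two-step CTE on the sample data. It assumes SQLite (3.25 or later, via Python's sqlite3 module) instead of SQL Server, so the T-SQL `rnk = ...` alias form is written as `... AS rnk`; the table and column names are the ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (123, "Amy", "08/03/2022"),
        (123, "Amy", "12/03/2022"),
        (456, "Billy", "08/03/2022"),
        (456, "Cat", "09/03/2022"),
        (789, "Peter", "10/03/2022"),
    ],
)

query = """
WITH DistinctRanks AS (
    -- Equal names within an ID share a rank, so the highest rank
    -- per ID equals the number of distinct names for that ID.
    SELECT ID, Name, Date,
           DENSE_RANK() OVER (PARTITION BY ID ORDER BY Name) AS rnk
    FROM YourTable
),
MaxRanks AS (
    SELECT ID, Name, Date,
           MAX(rnk) OVER (PARTITION BY ID) AS mr
    FROM DistinctRanks
)
SELECT ID, Name, Date
FROM MaxRanks
WHERE mr > 1
ORDER BY ID, Name
"""

flagged = conn.execute(query).fetchall()
print(flagged)
```

Both Amy rows get rank 1, so their maximum rank stays at 1 and they are filtered out; Billy and Cat get ranks 1 and 2, so ID 456 is kept.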

Related

Get top 5 records for each group and Concate them in a Row per group

I have a table Contacts that basically looks like following:
Id | Name | ContactId | Contact | Amount
---------------------------------------------
1 | A | 1 | 12323432 | 555
---------------------------------------------
1 | A | 2 | 23432434 | 349
---------------------------------------------
2 | B | 3 | 98867665 | 297
--------------------------------------------
2 | B | 4 | 88867662 | 142
--------------------------------------------
2 | B | 5 | null | 698
--------------------------------------------
Here, ContactId is unique throughout the table. Contact can be NULL & I would like to exclude those.
Now, I want to select the top 5 contacts for each Id based on their Amount. I accomplished that with the following query:
WITH cte AS (
SELECT id, Contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select * from cte where RowNo <= 5
It works fine up to this point. Now I want to concatenate these (<= 5) records for each group and show them in a single row.
Expected Result :
Id | Name | Contact
-------------------------------
1 | A | 12323432;23432434
-------------------------------
2 | B | 98867665;88867662
I am using the following query to achieve this, but it still returns all records in separate rows and includes NULL values too:
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from id, name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id and co.contactid= cte.contactid
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from contacts co inner join cte where cte.id = co.id and co.contactid= cte.contactid
The above query still gives me all top 5 contacts in different rows, including the NULL ones.
Is it a good idea to use a CTE and STUFF together? Please suggest if there is a better approach than this.
I found the problem with my final query:
I don't need the original Contacts table in my final SELECT, since I already have everything I need in the CTE. Also, inside STUFF() I was joining on contactid, which identifies the very rows I'm trying to concatenate. Because of that join condition, I was getting the records in different rows. I removed these two conditions and it worked.
WITH cte AS (
SELECT id, name, contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select distinct c1.id, c1.name,
STUFF ((
SELECT ';' + c2.contact FROM cte c2
WHERE c2.id = c1.id
and c2.RowNo <= 5
ORDER BY c2.RowNo
FOR XML PATH('')), 1, 1, '') as contact
from cte c1 where c1.RowNo <= 5
You can use conditional aggregation:
select id, name,
concat(max(case when seqnum = 1 then contact end),
max(case when seqnum = 2 then ';' + contact end),
max(case when seqnum = 3 then ';' + contact end),
max(case when seqnum = 4 then ';' + contact end),
max(case when seqnum = 5 then ';' + contact end)
) as contacts
from (select c.*,
row_number() over (partition by id order by amount desc) as seqnum
from contacts c
where contact is not null
) c
group by id, name;
If you are running SQL Server 2017 or higher, you can use string_agg(). Like most other aggregate functions, it ignores null values by design.
select id, name, string_agg(contact, ',') within group (order by rn) all_contacts
from (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
) t
where rn <= 5
group by id, name
Note that you don't strictly need a CTE here; you can return the columns you need from the subquery, and use them directly in the outer query.
In earlier versions, one approach using stuff() and for xml path is:
with cte as (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
)
select id, name,
stuff(
(
select ', ' + c1.contact
from cte c1
where c1.id = c.id and c1.rn <= 5
order by c1.rn
for xml path (''), type
).value('.', 'varchar(max)'), 1, 2, ''
) all_contacts
from cte c
group by id, name
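For readers who want to check the intended result outside the database, the same "drop NULLs, rank by amount, keep at most 5, join with a separator" logic can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the SQL above; the function name top_contacts and its parameters are made up for this example, and None stands in for NULL.

```python
from collections import defaultdict

# (Id, Name, ContactId, Contact, Amount) rows from the question.
contacts = [
    (1, "A", 1, "12323432", 555),
    (1, "A", 2, "23432434", 349),
    (2, "B", 3, "98867665", 297),
    (2, "B", 4, "88867662", 142),
    (2, "B", 5, None, 698),
]


def top_contacts(rows, limit=5, sep=";"):
    """Return {(id, name): joined contacts}, highest amount first."""
    groups = defaultdict(list)
    # Exclude NULL contacts before ranking, so they cannot use up a slot.
    non_null = (r for r in rows if r[3] is not None)
    # Visit rows from highest to lowest amount and keep the first `limit` per group.
    for id_, name, _contact_id, contact, _amount in sorted(
        non_null, key=lambda r: r[4], reverse=True
    ):
        if len(groups[(id_, name)]) < limit:
            groups[(id_, name)].append(contact)
    return {key: sep.join(values) for key, values in groups.items()}


result = top_contacts(contacts)
print(result)
```

Filtering before ranking matters: the NULL contact for Id 2 has the highest amount, so ranking first and filtering afterwards could leave a group with fewer than 5 contacts even when 5 non-NULL ones exist.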
I agree with @GMB. STRING_AGG() is what you need ...
WITH
contacts(Id,nm,ContactId,Contact,Amount) AS (
SELECT 1,'A',1,12323432,555
UNION ALL SELECT 1,'A',2,23432434,349
UNION ALL SELECT 2,'B',3,98867665,297
UNION ALL SELECT 2,'B',4,88867662,142
UNION ALL SELECT 2,'B',5,NULL ,698
)
,
with_filter_val AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY amount DESC) AS rn
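FROM contacts
WHERE contact IS NOT NULL -- exclude NULL contacts before numbering, so they do not use up top-5 slots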
)
SELECT
id
, nm
, STRING_AGG(CAST(contact AS CHAR(8)),',') AS contact_list
FROM with_filter_val
WHERE rn <=5
GROUP BY
id
, nm
-- out id | nm | contact_list
-- out ----+----+-------------------
-- out 1 | A | 12323432,23432434
-- out 2 | B | 98867665,88867662

Procedure to copy data from a table to another table in SQL Server

I have a table A, with 4 columns:
first_name, invoice, value, date.
And a table B (first_name, max_invoice_name, max_invoice_value, last_date)
I want to create a procedure in order to move data from A, to B, but:
first_name should be one time in B,
max_invoice_name is the name of the max invoice value
max_invoice_value is the max value
last_date is the latest date from invoices from the same first_name.
For example:
TABLE A:
Smith | Invoice1 | 100 | 23.06.2016
John | Invoice13 | 23 | 18.07.2016
Smith | Invoice3 | 200 | 01.01.2015
Table B should be:
Smith |Invoice3 | 200 | 23.06.2016
John |Invoice13| 23 | 18.07.2016
Something like this should work:
select *, (select max(date) from #Table1 T1 where T1.first_name = X.first_name)
from (
select
*,
row_number() over (partition by first_name order by invoice_Value desc) as RN
from
#Table1
) X
where RN = 1
ROW_NUMBER() takes care of selecting the row with the biggest value, and the MAX subquery gets the latest date. You'll need to list the columns in the correct order instead of using *.
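Here is a small runnable sketch of that idea, including the INSERT into the second table. It assumes SQLite (3.25+, through Python's sqlite3 module) rather than SQL Server, swaps the correlated MAX subquery for a MAX() OVER window (a variation, not the exact query above), and uses the hypothetical column names invoice_value and invoice_date with ISO-formatted dates so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (first_name TEXT, invoice TEXT, "
    "invoice_value INTEGER, invoice_date TEXT)"
)
conn.execute(
    "CREATE TABLE TableB (first_name TEXT, max_invoice_name TEXT, "
    "max_invoice_value INTEGER, last_date TEXT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Invoice1", 100, "2016-06-23"),
        ("John", "Invoice13", 23, "2016-07-18"),
        ("Smith", "Invoice3", 200, "2015-01-01"),
    ],
)

conn.execute(
    """
    INSERT INTO TableB (first_name, max_invoice_name, max_invoice_value, last_date)
    SELECT first_name, invoice, invoice_value, last_date
    FROM (
        SELECT first_name, invoice, invoice_value,
               -- latest date across all of this person's invoices
               MAX(invoice_date) OVER (PARTITION BY first_name) AS last_date,
               -- rn = 1 marks the invoice with the highest value
               ROW_NUMBER() OVER (PARTITION BY first_name
                                  ORDER BY invoice_value DESC) AS rn
        FROM TableA
    ) AS x
    WHERE rn = 1
    """
)

table_b = conn.execute("SELECT * FROM TableB ORDER BY first_name").fetchall()
print(table_b)
```

Note that the date and the invoice name come from different source rows for Smith, which is why a single "top row" query is not enough on its own.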
You will need to create two scalar functions, getMaxNameForMaxValue and getLastDateByFirstName, to get the values you want.
INSERT INTO TableB (first_name, max_invoice_name, max_invoice_value, last_date)
SELECT first_name,
dbo.getMaxNameForMaxValue(MAX(value)) AS max_invoice_name,
MAX(value) AS max_invoice_value,
dbo.getLastDateByFirstName(first_name) AS last_date
FROM TableA
GROUP BY first_name
You can use something like this:
--INSERT INTO TableB
SELECT first_name,
invoice_name,
invoice_value,
last_date
FROM (
SELECT a.first_name,
a.invoice_name,
a.invoice_value,
p.last_date,
ROW_NUMBER() OVER (PARTITION BY a.first_name ORDER BY a.invoice_value DESC) as rn
FROM TableA a
OUTER APPLY (SELECT MAX(last_date) as last_date FROM TableA WHERE first_name = a.first_name) as p
) as res
WHERE rn = 1
As output:
first_name invoice_name invoice_value last_date
John Invoice13 23 2016-07-18
Smith Invoice3 200 2016-06-23
Try this:
Insert into TableB(first_name, max_invoice_name, max_invoice_value, last_date)
select t1.first_name, t1.invoice, t1.value, t2.date from TableA as t1 inner join
(
select first_name, max(value) as value, max(date) as date
from TableA group by first_name
) as t2 on t1.first_name=t2.first_name and t1.value=t2.value

how to use same column twice with different criteria with one common column in sql

I have a table
ID P_ID Cost
1 101 1000
2 101 1050
3 101 1100
4 102 5000
5 102 2000
6 102 6000
7 103 3000
8 103 5000
9 103 4000
I want to use 'Cost' column twice to fetch first and last inserted value in cost corresponding to each P_ID
I want output as:
P_ID First_Cost Last_Cost
101 1000 1100
102 5000 6000
103 3000 4000
;WITH t AS
(
SELECT P_ID, Cost,
f = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID),
l = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename
)
SELECT t.P_ID, t.Cost, t2.Cost
FROM t INNER JOIN t AS t2
ON t.P_ID = t2.P_ID
WHERE t.f = 1 AND t2.l = 1;
In SQL Server 2012 and later, you can use FIRST_VALUE():
SELECT DISTINCT
P_ID,
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename;
You get a slightly more favorable plan if you remove the DISTINCT and instead use ROW_NUMBER() with the same partitioning to eliminate multiple rows with the same P_ID:
;WITH t AS
(
SELECT
P_ID,
f = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
l = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC),
r = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID)
FROM dbo.tablename
)
SELECT P_ID, f, l FROM t WHERE r = 1;
Why not LAST_VALUE(), you ask? Well, it doesn't work like you might expect. For more details, see the comments under the documentation.
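The usual surprise with LAST_VALUE() is the default window frame: when ORDER BY is present and no frame is given, the frame ends at the current row, so the "last" value is just the current row's value. The sketch below shows the difference on the sample data. It assumes SQLite (3.25+, via Python's sqlite3 module) as a stand-in; the standard default frame it relies on is the same one SQL Server documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER, P_ID INTEGER, Cost INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 101, 1000), (2, 101, 1050), (3, 101, 1100),
        (4, 102, 5000), (5, 102, 2000), (6, 102, 6000),
        (7, 103, 3000), (8, 103, 5000), (9, 103, 4000),
    ],
)

rows = conn.execute(
    """
    SELECT ID, P_ID,
           -- default frame: from the partition start up to the current row
           LAST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID) AS default_frame,
           -- explicit frame: the whole partition
           LAST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS full_frame
    FROM tablename
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame the third column simply repeats each row's own Cost; only the explicit full-partition frame returns the last inserted Cost per P_ID. Using FIRST_VALUE() with ORDER BY ID DESC, as above, avoids the issue because the first row is always inside the default frame.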
SELECT t.P_ID,
SUM(CASE WHEN ID = t.minID THEN Cost ELSE 0 END) as FirstCost,
SUM(CASE WHEN ID = t.maxID THEN Cost ELSE 0 END) as LastCost
FROM myTable
JOIN (
SELECT P_ID, MIN(ID) as minID, MAX(ID) as maxID
FROM myTable
GROUP BY P_ID) t ON myTable.ID IN (t.minID, t.maxID)
GROUP BY t.P_ID
Admittedly, @AaronBertrand's approach is cleaner here. However, this solution will work on older versions of SQL Server (that don't support CTEs or window functions), and on pretty much any other DBMS.
Do you want first and last in terms of Min and Max, or do you want which one was entered first and which one was entered last? If you want Min and max you can group by.
SELECT P_ID, MIN(Cost), MAX(Cost) FROM table_name GROUP BY P_ID
If "first" and "last" mean the lowest and highest cost, this also works, without self joins or subqueries:
SELECT DISTINCT
P_ID
,MIN(Cost) OVER (PARTITION BY P_ID) as FirstCost
,MAX(Cost) OVER (PARTITION BY P_ID) as LastCost
FROM table_name

Sql server combine multiple data sets without duplicate data

Given three tables Ta, Tb, Tc:
Ta(ID, Field1)
Tb(ID, Field2)
Tc(ID, Field3)
Given data example:
Ta
ID Field1
---------
1 A
1 B
Tb
ID Field2
---------
1 C
1 D
2 E
Tc
ID Field3
---------
1 F
2 G
2 H
Question:
How can I join this data to return:
ID Field1 Field2 Field3
-----------------------
1 A C F
1 B D NULL
2 NULL E G
2 NULL NULL H
I thought I could achieve this with outer joins but that doesn't seem to be the case. The order of the groupings doesn't really matter, as long as I bring back all information without duplicate rows.
Just to clarify. I don't really mind which combination I get as long as the result set returns all data in the minimum number of rows. Here's a more realistic example of what I am trying to do:
Given a person, call him John. He has two phone numbers and three email addresses:
PID Email
---------
John john@test.com
John john@mail.com
John john@john.com
PID Tel
--------
John 011
John 022
I want to return:
PID Email Tel
----------------------
John john@test.com 011
John john@mail.com 022
John john@john.com NULL
You can come close with the following:
select coalesce(ta.id, tb.id, tc.id) as id, ta.field1, tb.field2, tc.field3
from (select ta.*, row_number() over (partition by id order by (select NULL)) as seqnum
from ta
) ta full outer join
(select tb.*, row_number() over (partition by id order by (select NULL)) as seqnum
from tb
) tb
on ta.id = tb.id and
ta.seqnum = tb.seqnum full outer join
(select tc.*, row_number() over (partition by id order by (select NULL)) as seqnum
from tc
) tc
on coalesce(ta.id, tb.id) = tc.id and
coalesce(ta.seqnum, tb.seqnum) = tc.seqnum
order by coalesce(ta.id, tb.id, tc.id),
coalesce(ta.seqnum, tb.seqnum, tc.seqnum)
As I said, though, in my comment, the ordering of rows in a table is not guaranteed, so these may not come out in the order you expect. With your sample data, you could use:
over (partition by id order by field<n>)
If the fields define the ordering
Here's an alternative, using CTEs and a UNION ALL, with MIN to exclude the nulls. It doesn't guarantee the ordering, but you say you don't care as long as the IDs are all present.
WITH TaRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field1) as Rnk, ID, Field1
FROM Ta
),
TbRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field2) as Rnk, ID, Field2
FROM Tb
),
TcRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field3) as Rnk, ID, Field3
FROM Tc
),
TUnion AS
(
SELECT Rnk, ID, Field1, NULL AS Field2, NULL AS Field3
FROM TaRanked
UNION ALL
SELECT Rnk, ID, NULL, Field2, NULL
FROM TbRanked
UNION ALL
SELECT Rnk, ID, NULL, NULL, Field3
FROM TcRanked
)
SELECT ID, MIN(Field1), MIN(Field2), MIN(Field3)
FROM TUnion
GROUP BY ID, Rnk
ORDER BY ID, Rnk
The result is
1 A C F
1 B D (null)
2 (null) E G
2 (null) (null) H
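Both answers boil down to the same idea: number the rows within each ID in every table, then line up rows that share the same ID and row number, padding with NULL where a table runs out. A minimal Python sketch of that pairing, using itertools.zip_longest, is shown below. The helper names by_id and combine are made up for this illustration, None stands in for NULL, and values are sorted within each ID only to make the output deterministic.

```python
from collections import defaultdict
from itertools import zip_longest

ta = [(1, "A"), (1, "B")]
tb = [(1, "C"), (1, "D"), (2, "E")]
tc = [(1, "F"), (2, "G"), (2, "H")]


def by_id(rows):
    """Group (id, value) rows into {id: sorted list of values}."""
    groups = defaultdict(list)
    for id_, value in rows:
        groups[id_].append(value)
    for values in groups.values():
        values.sort()
    return groups


def combine(*tables):
    """Pair up the n-th value of each table per ID, padding with None."""
    grouped = [by_id(t) for t in tables]
    ids = sorted(set().union(*grouped))
    out = []
    for id_ in ids:
        columns = [g.get(id_, []) for g in grouped]
        # zip_longest plays the role of the full outer join on the row number.
        for fields in zip_longest(*columns):
            out.append((id_, *fields))
    return out


combined = combine(ta, tb, tc)
for row in combined:
    print(row)
```

Each ID produces as many rows as its longest column, which is the minimum number of rows that can hold all the data.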

How to perform SQL Query to get last entry

I am working on creating a SQL query where the result will return a student test score
from the last test that they took. I think that this should be fairly simple but I am just not seeing it.
Here is my test data
Name Date Score
John 2/3/2012 94
John 2/14/2012 82
John 2/28/2012 72
Mary 2/3/2012 80
Mary 2/28/2012 71
Ken 2/14/2012 68
Ken 2/14/2012 66
I want the returned result to be
John 2/28/2012 72
Mary 2/28/2012 80
Ken 2/14/2012 66
I appreciate any assistance.
select date, name, score
from temp t1
where date = (select max(date) from temp where t1.name = temp.name)
OR
SELECT a.*
FROM temp a
INNER JOIN
(
SELECT name,MAX(date) as max_date
FROM temp a
GROUP BY name
)b ON (b.name = a.name AND a.date=b.max_date)
or even this if you have more than one record for each person on a date like you show in your sample data.
SELECT c.name,c.date, MAX(c.score) as max_score
FROM
(
SELECT a.*
FROM temp a
INNER JOIN
(
SELECT name,MAX(date) as max_date
FROM temp a
GROUP BY name
)b ON (b.name = a.name AND a.date=b.max_date)
)c
group by c.name,c.date
SELECT Name, Date, Score
FROM tablename t1
WHERE Date = (SELECT MAX(Date)
FROM tablename
WHERE Name = t1.Name
GROUP BY Name)
Which database are you using? Most support row_number() which is the right way to answer this:
select *
from
(
select t.*, row_number() over (partition by name order by date desc) as seqnum
from temp t
) s
where seqnum = 1