SQL Apply Distinct One Column [duplicate] - sql

I have a SQL query as below. I want to apply DISTINCT to Name column in this query. Can you help me?
SELECT Id,
ConflictCheckRequestIndividualId,
Name,
Surname,
it.IndividualType AS IndividualType,
JobTitle,
RegistrationNumber,
Title,
District,
Status,
CreatedBy,
Created,
ModifiedBy,
Modified
FROM ConflictCheckItoIndividual
LEFT JOIN #IndividualTypes it
ON it.IndividualId = ConflictCheckRequestIndividualId
WHERE ConflictCheckRequestIndividualId IN
(SELECT Id
FROM ConflictCheckRequestIndividual
WHERE ConflictCheckRequestId = @ConflictId
AND SubStatus = 2)

There are two ways, with subtly different results. "GROUP BY X" is another way of saying "give me one row per X". You then have to apply an aggregate function to every other column so the database knows how to squash each group's rows into one:
SELECT MAX(Id),
MAX(ConflictCheckRequestIndividualId),
Name,
MAX(Surname),
MAX(it.IndividualType) AS IndividualType,
MAX(JobTitle),
MAX(RegistrationNumber),
MAX(Title),
MAX(District),
MAX(Status),
MAX(CreatedBy),
MAX(Created),
MAX(ModifiedBy),
MAX(Modified)
FROM ConflictCheckItoIndividual
LEFT JOIN #IndividualTypes it
ON it.IndividualId = ConflictCheckRequestIndividualId
WHERE ConflictCheckRequestIndividualId IN
(SELECT Id
FROM ConflictCheckRequestIndividual
WHERE ConflictCheckRequestId = @ConflictId
AND SubStatus = 2)
GROUP BY Name
This might end up with the data for each field coming from different rows. If you want all the data to come from one row, you could do this:
;WITH cte AS
(
SELECT Id,
ConflictCheckRequestIndividualId,
Name,
Surname,
it.IndividualType AS IndividualType,
JobTitle,
RegistrationNumber,
Title,
District,
Status,
CreatedBy,
Created,
ModifiedBy,
Modified,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Modified DESC) as rownum
FROM ConflictCheckItoIndividual
LEFT JOIN #IndividualTypes it
ON it.IndividualId = ConflictCheckRequestIndividualId
WHERE ConflictCheckRequestIndividualId IN
(SELECT Id
FROM ConflictCheckRequestIndividual
WHERE ConflictCheckRequestId = @ConflictId
AND SubStatus = 2)
)
SELECT * FROM cte WHERE rownum = 1
This is "partitioning" the data into one bucket per Name. Within each bucket, it's ordering the rows by Modified in descending order. We then pick out only one row from each bucket: the most recently Modified one.
As an aside, given that there is both a Name and a Surname field, I would expect you to partition (or group) by Surname as well.
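To make the difference between the two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The people table and its three rows are hypothetical stand-ins for the real tables, not data from the question.

```python
import sqlite3

# Hypothetical sample data: two rows share the Name 'Ann'.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (Id INTEGER, Name TEXT, Surname TEXT, Modified TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Zed", "2020-01-01"),
        (2, "Ann", "Abel", "2021-06-01"),
        (3, "Bob", "Cole", "2019-03-01"),
    ],
)

# Approach 1: GROUP BY with MAX. Every column is aggregated on its own,
# so the values in one output row can come from different source rows.
grouped = conn.execute(
    "SELECT MAX(Id), Name, MAX(Surname), MAX(Modified) "
    "FROM people GROUP BY Name ORDER BY Name"
).fetchall()

# Approach 2: ROW_NUMBER. Keeps one whole row per Name,
# here the most recently Modified one.
latest = conn.execute(
    """
    WITH cte AS (
        SELECT Id, Name, Surname, Modified,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Modified DESC) AS rownum
        FROM people
    )
    SELECT Id, Name, Surname, Modified FROM cte WHERE rownum = 1 ORDER BY Name
    """
).fetchall()

print(grouped)
print(latest)
```

For 'Ann', the GROUP BY version returns Id 2 together with the Surname 'Zed', a combination that exists in no source row, while the ROW_NUMBER version returns the intact row with Id 2 and Surname 'Abel'.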

Just realize what you're asking for: you get an arbitrary row for each particular Name. If that's good enough for you at this moment, this code should do:
select * from (
SELECT
ROW_NUMBER() over (partition by Name order by Id) [row],
Id,
ConflictCheckRequestIndividualId,
Name,
Surname,
it.IndividualType AS IndividualType,
JobTitle,
RegistrationNumber,
Title,
District,
Status,
CreatedBy,
Created,
ModifiedBy,
Modified
FROM ConflictCheckItoIndividual
LEFT JOIN #IndividualTypes it
ON it.IndividualId = ConflictCheckRequestIndividualId
WHERE ConflictCheckRequestIndividualId IN
(SELECT Id
FROM ConflictCheckRequestIndividual
WHERE ConflictCheckRequestId = @ConflictId
AND SubStatus = 2)) data
where [row] = 1
Will you let me know if this works for you?

Related

How to display duplicates in SQL only if another column is different?

So say I have this table:
Name    Role
First   Science
First   Math
First   Science
First   Math
Second  Science
Third   Math
Third   Math
I want to display a column of duplicates for Name/Role ONLY if role is different in each group. So the final result should be like this:
Name    Role
First   Science
First   Math
This is the only person that has a different role for the same name (no matter how many times that specific combination is duplicated). That's why even though Third/Math is also duplicated, it doesn't matter because it's the same combination.
I tried doing a CTE as follows:
;with cte as (
Select Name, Role, ROW_NUMBER() over (partition by name order by name) as 'rownum1'
from U.Users
group by u.name, u.role)
so then select * from cte where rownum > 1 gets me my names of people that have this issue but it doesn't display the duplicate roles for that user. Not sure how I should approach it differently?
If I join the CTE table to the original Users table, I also get the single entries.
You can take advantage of the fact that window functions are applied after aggregation:
select name, role
from (
select name, role, count(1) over (partition by name) c
from user_role
group by name, role
) r
where c > 1
https://www.db-fiddle.com/f/vzRDgBXwYp3VpgNyfn9qzL/0
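As a quick check of the "window functions run after aggregation" idea, here is a small sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). It loads the sample rows from the question into a user_role table, the name used in the answer above, and runs the same query with an added ORDER BY so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_role (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO user_role VALUES (?, ?)",
    [
        ("First", "Science"),
        ("First", "Math"),
        ("First", "Science"),
        ("First", "Math"),
        ("Second", "Science"),
        ("Third", "Math"),
        ("Third", "Math"),
    ],
)

# GROUP BY collapses duplicates to one row per (name, role) first;
# the window COUNT then counts those collapsed rows per name,
# i.e. the number of distinct roles each name has.
result = conn.execute(
    """
    SELECT name, role
    FROM (
        SELECT name, role, COUNT(*) OVER (PARTITION BY name) AS c
        FROM user_role
        GROUP BY name, role
    ) r
    WHERE c > 1
    ORDER BY name, role
    """
).fetchall()

print(result)
```

Only the two rows for First come back: Third has duplicates, but they all collapse into a single (Third, Math) group, so its count is 1.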
You can try something like this:
WITH cte1 as (
SELECT distinct *
FROM
table1
),
cte2 as
(
Select Name, Role, ROW_NUMBER() over (partition by name order by name) as rnk
from cte1 u
group by u.name, u.role
)
SELECT * FROM cte2
where name in
(select name
from cte2
WHERE rnk > 1
group by name
)
I used DISTINCT to remove any duplicates, then used ROW_NUMBER(), as you did, to find the names with multiple rows.
So after I posted the question I tried this, which isn't as elegant as Kurt's answer but also worked:
;with cte as (select name, role, row_number() over (partition by name order by name) rownum
from user_role
group by name, role)
select distinct user_role.name, user_role.role from user_role
join cte on cte.name=user_role.name and cte.role=user_role.role
where user_role.name in (select name from cte where rownum =2)
https://www.db-fiddle.com/f/vzRDgBXwYp3VpgNyfn9qzL/2

Grouping while maintaining next record

I have a table (NerdsTable) with some of this data:
-------------+-----------+----------------
id name school
-------------+-----------+----------------
1 Joe ODU
2 Mike VCU
3 Ane ODU
4 Trevor VT
5 Cools VCU
When I run the following query
SELECT id, name, LEAD(id) OVER (ORDER BY id) as next_id
FROM dbo.NerdsTable where school = 'ODU';
I get these results:
[id=1,name=Joe,nextid=3]
[id=3,name=Ane,nextid=NULL]
I want to write a query that does not need the static check for
where school = 'odu'
but gives back the same results as above. In other words, I want to select all results in the database and have them grouped correctly, as if I had gone through individually and run these queries:
SELECT id, name, LEAD(id) OVER (ORDER BY id) as next_id FROM dbo.NerdsTable where school = 'ODU';
SELECT id, name, LEAD(id) OVER (ORDER BY id) as next_id FROM dbo.NerdsTable where school = 'VCU';
SELECT id, name, LEAD(id) OVER (ORDER BY id) as next_id FROM dbo.NerdsTable where school = 'VT';
Here is the output I am hoping to see:
[id=1,name=Joe,nextid=3]
[id=3,name=Ane,nextid=NULL]
[id=2,name=Mike,nextid=5]
[id=5,name=Cools,nextid=NULL]
[id=4,name=Trevor,nextid=NULL]
Here is what I have tried, but am failing miserably:
SELECT id, name,
LEAD(id) OVER (ORDER BY id) as next_id
FROM dbo.NerdsTable
ORDER BY school;
-- Problem, as this does not sort by the id. I need the lowest id first for the group
SELECT id, name,
LEAD(id) OVER (ORDER BY id) as next_id
FROM dbo.NerdsTable
ORDER BY id, school;
-- Sorts by id, but the grouping is not correct, thus next_id is wrong
I then looked on the Microsoft doc site for aggregate functions, but do not see how I can use any of them to group my results correctly. I tried to use GROUPING_ID, as follows:
SELECT id, GROUPING_ID(name),
LEAD(id) OVER (ORDER BY id) as next_id
FROM dbo.NerdsTable
group by school;
But I get an error:
is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
Any idea as to what I am missing here?
From your desired output it looks like you are just trying to order the records by school. You can do that like this:
SELECT id, name
FROM dbo.NerdsTable
ORDER BY school ASC, id ASC
I don't know what next ID is supposed to mean.
create table schools (id int, name varchar(50), school varchar(3))
insert into schools values (1, 'Joe', 'ODU'), (2, 'Mike', 'VCU'), (3, 'Ane',
'ODU'), (4, 'Trevor', 'VT'), (5, 'Cools', 'VCU'), (6, 'Sarah', 'VCU')
select n.id, n.name, min(g.id) nextid
from schools n
left join
(
select id, school
from schools
) g on g.school = n.school and g.id > n.id
group by n.id, n.name
drop table schools
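Here is a minimal sketch of the self-join idea above, run against the five rows from the question with Python's built-in sqlite3 module. For comparison it also runs LEAD with PARTITION BY school; that variant is my own suggestion rather than something from the answers above, and it needs SQLite 3.25 or later. The ORDER BY school, id happens to reproduce the order in the desired output only because the school names sort alphabetically in that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NerdsTable (id INTEGER, name TEXT, school TEXT)")
conn.executemany(
    "INSERT INTO NerdsTable VALUES (?, ?, ?)",
    [
        (1, "Joe", "ODU"),
        (2, "Mike", "VCU"),
        (3, "Ane", "ODU"),
        (4, "Trevor", "VT"),
        (5, "Cools", "VCU"),
    ],
)

# Self-join: for each row, the smallest higher id within the same school.
self_join_rows = conn.execute(
    """
    SELECT n.id, n.name, MIN(g.id) AS next_id
    FROM NerdsTable n
    LEFT JOIN NerdsTable g
           ON g.school = n.school AND g.id > n.id
    GROUP BY n.id, n.name, n.school
    ORDER BY n.school, n.id
    """
).fetchall()

# Alternative (an assumption, not from the answers): restart LEAD per school.
lead_rows = conn.execute(
    """
    SELECT id, name,
           LEAD(id) OVER (PARTITION BY school ORDER BY id) AS next_id
    FROM NerdsTable
    ORDER BY school, id
    """
).fetchall()

print(self_join_rows)
print(lead_rows)
```

Both queries return the same rows on this data: Joe points to 3, Mike points to 5, and the last row of each school has a NULL next_id.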

How to select the latter row in SQL

I have a result set that looks like this:
As you can see some of the contactID are repeated with same QuestionResponse. And there is one with a different QuestionResponse (the one with red lines).
I want to group this by ContactID, but select the latter row. Eg: In case of ContactID = 78100299, I want to select the row with CreateDate = 17:00:44.907 (or rowNum = 2).
I have tried this:
select
ContactID,
max(QuestionResponse) as QuestionResponse,
max(CreateDate) as CreateDate
from
theResultSet
group by
ContactID
This will NOT work because there could be QuestionResponse 2 and then 1 for the same contactID. In that case the latter one will be the one with response 1 not 2.
Thank you for your help.
I would use ROW_NUMBER() this way:
WITH Query AS
(
SELECT rowNum, ContactID, QuestionResponse, CreateDate,
ROW_NUMBER() OVER (PARTITION BY ContactID ORDER BY CreateDate DESC) Ordered
FROM theResultSet
)
SELECT * FROM Query WHERE Ordered=1
1. Number the rows within each ContactID group by CreateDate, descending.
2. Keep only the rows numbered 1, filtering out the rest.
This might work if your SQL Engine can handle it...
SELECT trs1.*
FROM theResultSet trs1
INNER JOIN
(SELECT ContactID, max(CreateDate) as CreateDate
FROM theResultSet
GROUP BY ContactID) trs2
ON trs1.ContactID = trs2.ContactID
AND trs1.CreateDate = trs2.CreateDate
The end result will be all rows from theResultSet whose CreateDate is the maximum CreateDate for their ContactID. If two rows for a ContactID share that maximum, both are returned.
This should work too:
SELECT
ContactID, QuestionResponse,CreateDate
FROM (
select rowNum, ContactID, QuestionResponse,CreateDate,
max(rowNum) over(partition by ContactID) as maxrow
from theResultSet
) x
WHERE rowNum=maxrow
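To see why MAX on each column fails where ROW_NUMBER works, here is a small sketch using Python's built-in sqlite3 module (SQLite 3.25 or later). The original result set was only shown as an image, so the rows below are hypothetical; they reproduce the case described in the question, where a contact answers 2 first and 1 later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theResultSet "
    "(rowNum INTEGER, ContactID INTEGER, QuestionResponse INTEGER, CreateDate TEXT)"
)
# Hypothetical rows: contact 78100299 responds 2 first, then 1.
conn.executemany(
    "INSERT INTO theResultSet VALUES (?, ?, ?, ?)",
    [
        (1, 78100299, 2, "2015-01-01 16:59:10.000"),
        (2, 78100299, 1, "2015-01-01 17:00:44.907"),
        (1, 78100300, 1, "2015-01-01 09:00:00.000"),
    ],
)

# The attempt from the question: MAX per column mixes values across rows.
max_rows = conn.execute(
    """
    SELECT ContactID, MAX(QuestionResponse), MAX(CreateDate)
    FROM theResultSet
    GROUP BY ContactID
    ORDER BY ContactID
    """
).fetchall()

# ROW_NUMBER keeps the whole latest row per ContactID.
latest_rows = conn.execute(
    """
    WITH Query AS (
        SELECT ContactID, QuestionResponse, CreateDate,
               ROW_NUMBER() OVER (PARTITION BY ContactID
                                  ORDER BY CreateDate DESC) AS Ordered
        FROM theResultSet
    )
    SELECT ContactID, QuestionResponse, CreateDate
    FROM Query
    WHERE Ordered = 1
    ORDER BY ContactID
    """
).fetchall()

print(max_rows)
print(latest_rows)
```

For contact 78100299 the MAX version reports response 2 with the later timestamp, which matches no actual row, while the ROW_NUMBER version reports response 1, the latter row.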

Query to return values from table B when newer than record in Table A

I have a table with a member's name, address, etc. and a time stamp of the last time the record was updated. I have a second table that holds updates to the member record, a holding table, until changes are approved by staff.
I have a query that returns data from the member table. I now need to check the updates table, and if the member's record in the updates table has a more recent time stamp, return that record instead of the record in the member table.
I tried a few things such as a UNION with Top 1 but it's not quite right. I could make a complex CASE statement but is that going to perform well?
It sounds simple, get the most recent record from table A, and the most recent from table B and return the one record that is the newest.
SELECT name, address, city, state, zipcode, time_stamp
FROM Member
WHERE ID = 123
SELECT name, address, city, state, zipcode, time_stamp
FROM MemberUpdates
WHERE ID = 123
EDIT:
OK, with the help so far, I was able to get the results I expected. Then, I went to add the extra where clauses and I broke it. Tried several different ways including using a CTE and could not quite get it right. Here is a query that works and returns the expected results, however notice I have to pass name_last/birth_year/memNum twice. Is there a better way?
SELECT TOP 1 m.abn,
m.aliases,
m.birth_year,
m.user_stamp,
q.updatePending,
q.name_first,
q.name_last,
q.company,
q.address1,
q.mailing_address,
q.city,
q.state,
q.zipcode,
q.email_address
FROM (
SELECT TOP 1
1 AS updatePending,
a.entity_number,
a.name_first,
a.name_last,
NULLIF(LTRIM(RTRIM(
LTRIM(RTRIM(ISNULL(a.company, ''))) +
LTRIM(RTRIM(ISNULL(a.firm_name, ''))))),'') AS company,
a.address1,
a.mailing_address,
a.city,
a.state,
a.zip_code AS zipcode,
a.internet_address AS email_address,
a.time_stamp
FROM statebar.dbo.STAGING_Address_Change_Request a
INNER JOIN Member m ON m.entity_number = a.entity_number
WHERE a.entity_number = (
SELECT m.entity_number
FROM Member m
INNER JOIN Named_Entity ne ON (ne.entity_number = m.entity_number)
WHERE ne.name_last = 'jones'
AND m.birth_year = '1975'
AND m.memNum = '12345'
)
AND a.time_stamp > m.time_stamp
UNION ALL
SELECT TOP 1
0 AS updatePending,
ne.entity_number,
ne.name_first,
ne.name_last,
NULLIF(LTRIM(RTRIM(
LTRIM(RTRIM(ISNULL(ne.company, ''))) +
LTRIM(RTRIM(ISNULL(ne.firm_name, ''))))),'') AS company,
ne.address1,
ne.mailing_address,
ne.city,
ne.state,
ne.zip_code,
ne.internet_address AS email_address,
m.time_stamp
FROM Member m
INNER JOIN Named_Entity ne ON (ne.entity_number = m.entity_number)
LEFT JOIN statebar.dbo.STAGING_Address_Change_Request a ON a.entity_number = m.entity_number
WHERE ne.entity_number = (
SELECT m.entity_number
FROM Member m
INNER JOIN Named_Entity ne ON (ne.entity_number = m.entity_number)
WHERE ne.name_last = 'jones'
AND m.birth_year = '1975'
AND m.memNum = '12345'
)
AND m.time_stamp > a.time_stamp
ORDER BY updatePending DESC, a.time_stamp DESC) q
INNER JOIN Member m on m.entity_number = q.entity_number
ORDER BY q.time_stamp DESC
Here is a simple query that will help you return the most recent record:
--Only selects the top row with the most recent record
SELECT TOP 1 * FROM
(
--Select rows with the same ID
SELECT name, address, city, state, zipcode, time_stamp
FROM Member
WHERE ID = 123
UNION ALL
SELECT name, address, city, state, zipcode, time_stamp
FROM MemberUpdates
WHERE ID = 123
) t
ORDER BY t.time_stamp DESC --Order the table by time_stamp to get the most recent record
-- DESC is used because datetime is ordered by oldest first in ascending order.
The union approach is a good idea, but you'd want to use the row_number() window function and not just top. Also, union all can be used instead of union. You don't care about duplicates between A and B, and union all will just perform better:
SELECT name, address, city, state, zipcode, time_stamp
FROM (SELECT name, address, city, state, zipcode, time_stamp,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY time_stamp DESC) rn
FROM (SELECT name, address, city, state, zipcode, time_stamp
FROM Member
UNION ALL
SELECT name, address, city, state, zipcode, time_stamp
FROM MemberUpdates) t
) q
WHERE rn = 1
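A minimal sketch of the UNION ALL plus ROW_NUMBER pattern, using Python's built-in sqlite3 module (SQLite 3.25 or later). The column list is shortened and the rows are hypothetical. It also partitions by id rather than name, which is my assumption about the key, since the question filters on ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("Member", "MemberUpdates"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER, name TEXT, address TEXT, time_stamp TEXT)"
    )

# Hypothetical data: member 123 has a newer pending update, member 124 does not.
conn.executemany(
    "INSERT INTO Member VALUES (?, ?, ?, ?)",
    [(123, "Pat", "1 Old Rd", "2020-01-01"), (124, "Lee", "5 Elm St", "2021-05-01")],
)
conn.executemany(
    "INSERT INTO MemberUpdates VALUES (?, ?, ?, ?)",
    [(123, "Pat", "9 New Ave", "2021-03-01"), (124, "Lee", "7 Oak St", "2021-01-01")],
)

# Stack both tables, number each member's rows newest first, keep row 1.
newest = conn.execute(
    """
    SELECT id, name, address, time_stamp
    FROM (
        SELECT id, name, address, time_stamp,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_stamp DESC) AS rn
        FROM (
            SELECT id, name, address, time_stamp FROM Member
            UNION ALL
            SELECT id, name, address, time_stamp FROM MemberUpdates
        ) t
    ) q
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(newest)
```

Member 123 comes back with the address from MemberUpdates because that row is newer, and member 124 comes back with the address from Member.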
Consider:
SELECT
m.id,
MAX(CASE WHEN u.mx_ts IS NULL THEN m.mx_ts ELSE u.mx_ts end)
FROM
(SELECT
id,
MAX(time_stamp) AS mx_ts
FROM
MEMBER
GROUP BY
id) m
LEFT OUTER JOIN
(SELECT
id,
MAX(time_stamp) AS mx_ts
FROM
MemberUpdates
GROUP BY
id) u ON
m.id = u.id AND
u.mx_ts > m.mx_ts
GROUP BY
m.id
This joins the latest timestamp per id from MemberUpdates only when it is later than the one in Member. Otherwise, the query falls back to the latest timestamp per id from the Member table. The id column is qualified as m.id because both derived tables expose an id column, so an unqualified id would be ambiguous.

SQL for counting rows and categorize

Is it possible to do the following for count >=3,4,5,6,7,8 etc.
rather than repeating the entire code for each count category
Insert into OnePlus (SELECT DISTINCT Id, Name, COUNT(DISTINCT StartDate) AS OnePlusDays
FROM DataTable
GROUP BY Id, Name
HAVING OnePlusDays >= 1)
Insert into TwoPlus (SELECT DISTINCT Id, Name, COUNT(DISTINCT StartDate) AS TwoPlusDays
FROM DataTable
GROUP BY Id, Name
HAVING TwoPlusDays >= 2)
Finally
SELECT Id, Name, "1+" AS Categories
FROM OnePlus
UNION
SELECT Id, Name, "2+" AS Categories
FROM TwoPlus
You mention only sql in the tags. Depending on whether you use MySQL or SQL Server, you may need to change the cast/convert and the concatenation, but this query may help. You really don't need to put DISTINCT on top of a GROUP BY: grouping already means that only distinct combinations and their counts are fetched.
Of course, the table OnePlus is really what you call Categories.
Insert into OnePlus
SELECT Id, Name, convert(varchar(10), COUNT(DISTINCT StartDate) ) + '+' AS Categories
FROM DataTable
GROUP BY Id, Name
In T-SQL you can write as:
SELECT Id,
Name,
-- list the CASE branches in descending order so the highest threshold matches first
CASE WHEN PlusDays >= 2 THEN '2+'
WHEN PlusDays >= 1 THEN '1+' END AS Categories
FROM
(
SELECT DISTINCT Id, Name, COUNT(DISTINCT StartDate) PlusDays
FROM #DataTable
GROUP BY Id, Name
) AS T
ORDER BY Id asc
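Here is a small sketch of the CASE approach, run with Python's built-in sqlite3 module. The DataTable rows are hypothetical. Note that, unlike the UNION of OnePlus and TwoPlus in the question, this yields one row per Id carrying only its highest category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable (Id INTEGER, Name TEXT, StartDate TEXT)")
# Hypothetical rows: Id 1 has two distinct start dates, Id 2 has one.
conn.executemany(
    "INSERT INTO DataTable VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (1, "A", "2020-01-02"),
        (1, "A", "2020-01-02"),
        (2, "B", "2020-01-05"),
    ],
)

# Count distinct days once, then bucket the count with CASE.
# Branches go from the highest threshold down, because CASE
# returns the first branch whose condition is true.
categories = conn.execute(
    """
    SELECT Id, Name,
           CASE WHEN PlusDays >= 2 THEN '2+'
                WHEN PlusDays >= 1 THEN '1+' END AS Categories
    FROM (
        SELECT Id, Name, COUNT(DISTINCT StartDate) AS PlusDays
        FROM DataTable
        GROUP BY Id, Name
    ) AS T
    ORDER BY Id
    """
).fetchall()

print(categories)
```

Adding a category such as 3+ is then a matter of adding one more WHEN branch above the others, rather than another table and another INSERT.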