DISTINCT for only one column - sql

Let's say I have the following query.
SELECT ID, Email, ProductName, ProductModel FROM Products
How can I modify it so that it returns no duplicate Emails?
In other words, when several rows contain the same email, I want the results to include only one of those rows (preferably the last one). Duplicates in other columns should be allowed.
Clauses like DISTINCT and GROUP BY appear to work on entire rows. So I'm not sure how to approach this.

If you are using SQL Server 2005 or above, use this:
SELECT *
FROM (
SELECT ID,
Email,
ProductName,
ProductModel,
ROW_NUMBER() OVER(PARTITION BY Email ORDER BY ID DESC) rn
FROM Products
) a
WHERE rn = 1
EDIT:
Example using a where clause:
SELECT *
FROM (
SELECT ID,
Email,
ProductName,
ProductModel,
ROW_NUMBER() OVER(PARTITION BY Email ORDER BY ID DESC) rn
FROM Products
WHERE ProductModel = 2
AND ProductName LIKE 'CYBER%'
) a
WHERE rn = 1
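To see the ROW_NUMBER() approach run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module in place of SQL Server (an assumption made only so the example is runnable; it needs SQLite 3.25 or newer for window functions). The sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ID INTEGER PRIMARY KEY, Email TEXT, ProductName TEXT, ProductModel TEXT)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "Widget", "M1"),
        (2, "b@example.com", "Gadget", "M2"),
        (3, "a@example.com", "Gizmo", "M3"),
    ],
)

# Number the rows of each email from highest ID to lowest, then keep row 1,
# i.e. the row with the highest ID for that email.
latest_per_email = conn.execute(
    """
    SELECT ID, Email, ProductName, ProductModel
    FROM (
        SELECT ID, Email, ProductName, ProductModel,
               ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ID DESC) AS rn
        FROM Products
    ) AS a
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

for row in latest_per_email:
    print(row)
```

With this data, a@example.com appears twice and only its row with ID 3 survives; b@example.com keeps its single row.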

This assumes SQL Server 2005+ and that your definition of "last" is the highest primary key for a given email:
WITH CTE AS
(
SELECT ID,
Email,
ProductName,
ProductModel,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ID DESC) AS RowNumber
FROM Products
)
SELECT ID,
Email,
ProductName,
ProductModel
FROM CTE
WHERE RowNumber = 1

When you use DISTINCT, think of it as applying to the whole row, not to a single column. It removes a row only when every selected column matches another row exactly.
SELECT DISTINCT ID, Email, ProductName, ProductModel
FROM Products
----------------------
1 | something@something.com | ProductName1 | ProductModel1
2 | something@something.com | ProductName1 | ProductModel1
The query would return both rows because the ID column is different. I'm assuming that the ID column is an incrementing IDENTITY column. If you want to return the last row, I recommend something like this:
SELECT DISTINCT TOP 1 ID, Email, ProductName, ProductModel
FROM Products
ORDER BY ID DESC
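TOP 1 returns only the first row of the result, and ordering by ID descending puts the last row first, so this gives you the last record in the table. Note that it returns a single row overall, not one row per email; to get the last record for one particular email, filter on Email in a WHERE clause.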
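The point about DISTINCT comparing whole rows can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption for runnability) and the two sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ID INTEGER PRIMARY KEY, Email TEXT, ProductName TEXT, ProductModel TEXT)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        (1, "something@something.com", "ProductName1", "ProductModel1"),
        (2, "something@something.com", "ProductName1", "ProductModel1"),
    ],
)

# ID differs between the rows, so DISTINCT sees two different rows.
with_id = conn.execute(
    "SELECT DISTINCT ID, Email, ProductName, ProductModel FROM Products"
).fetchall()

# Without ID, every selected column matches, so the rows collapse into one.
without_id = conn.execute(
    "SELECT DISTINCT Email, ProductName, ProductModel FROM Products"
).fetchall()

print(len(with_id), len(without_id))
```

The first query keeps both rows and the second keeps one, which is why DISTINCT alone cannot deduplicate on Email while still returning ID.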

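You can overcome that by using GROUP BY like this. Note that this only runs on MySQL with ONLY_FULL_GROUP_BY disabled, where the values of the non-grouped columns are picked arbitrarily; SQL Server rejects it because ID, ProductName and ProductModel are neither grouped nor aggregated: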
SELECT ID, Email, ProductName, ProductModel
FROM Products
GROUP BY Email

For Access, you can use the SQL Select query I present here:
For example you have this table:
CLIENTE || NOMBRES || MAIL
888 || T800 ARNOLD || t800.arnold@cyberdyne.com
123 || JOHN CONNOR || s.connor@skynet.com
125 || SARAH CONNOR || s.connor@skynet.com
And you need to select only distinct mails.
You can do it with this:
SQL SELECT:
SELECT MAX(p.CLIENTE) AS ID_CLIENTE
, (SELECT TOP 1 x.NOMBRES
FROM Rep_Pre_Ene_MUESTRA AS x
WHERE x.MAIL=p.MAIL
AND x.CLIENTE=(SELECT MAX(l.CLIENTE) FROM Rep_Pre_Ene_MUESTRA AS l WHERE x.MAIL=l.MAIL)) AS NOMBRE,
p.MAIL
FROM Rep_Pre_Ene_MUESTRA AS p
GROUP BY p.MAIL;
You can use this to select the maximum ID and the name that corresponds to that maximum ID; you can add any other attribute the same way. Put the distinct column last in the select list and group only by that column.
This returns the maximum ID with its corresponding data. You can use MIN or any other aggregate function instead, as long as you repeat that function in the subqueries.
This select will return:
CLIENTE || NOMBRES || MAIL
888 || T800 ARNOLD || t800.arnold@cyberdyne.com
125 || SARAH CONNOR || s.connor@skynet.com
Remember to index the columns you select. The distinct column must hold non-numeric data in a consistent case (all upper case or all lower case), or else it won't work.
This will work with only one registered mail as well.
Happy coding!!!

The reason DISTINCT and GROUP BY work on entire rows is that your query returns entire rows.
To help you understand: Try to write out by hand what the query should return and you will see that it is ambiguous what to put in the non-duplicated columns.
If you literally don't care what is in the other columns, don't return them. Returning a random row for each e-mail address seems a little useless to me.

Try This
;With Tab AS (SELECT DISTINCT Email FROM Products)
SELECT Email,ROW_NUMBER() OVER(ORDER BY Email ASC) AS Id FROM Tab
ORDER BY Email ASC

Try this:
SELECT ID, Email, ProductName, ProductModel FROM Products WHERE ID IN (SELECT MAX(ID) FROM Products GROUP BY Email)

Related

I need the first 2 occurrences of each duplicate id in the output (image attached) using SQL

I want the first 2 occurrences of duplicate IDs in the output; the desired output is in the attached image.
Please help.
I'm not sure what you mean by "1st occurrence", because the rows returned by a SQL SELECT query are unordered unless you specify an order. So I'm assuming you want alphabetical ordering on the emails.
SELECT t.id, t.email, t.state_code FROM (
SELECT id, email, state_code,
ROW_NUMBER() OVER(partition by id ORDER BY email desc) as cnt
FROM testdb
group by id, email, state_code
) as t
WHERE t.cnt <= 2

using group by operators in sql

I have two columns, email id and customer id, where an email id can be associated with multiple customer ids. Now I need to list only those email ids (along with their corresponding customer ids) that have more than 1 customer id. I tried using the GROUPING SETS, ROLLUP and CUBE operators, but I am not getting the desired result.
Any help or pointers would be appreciated.
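SELECT emailid
FROM
( SELECT emailid, count(custid) AS custcount
FROM table
Group by emailid
Having count(custid) > 1
) AS t -- SQL Server requires an alias on a derived table and a name for every column in it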
I think this will get you what you want, if I am understanding your question correctly:
select emailid, customerid from tablename where emailid in
(
select emailid from tablename group by emailid having count(emailid) > 1
)
Sounds like you need to use HAVING,
e.g.
SELECT email_id, COUNT(customer_id)
From sometable
GROUP BY email_id
HAVING COUNT(customer_id) > 1
HAVING lets you filter groups after GROUP BY has been applied, whereas WHERE filters individual rows before grouping.
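The difference between filtering before and after grouping can be illustrated with a short sketch. It runs the SQL through Python's sqlite3 module (an assumption for runnability; the asker is on MS SQL), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (email_id TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [("a@example.com", 1), ("a@example.com", 2), ("b@example.com", 3)],
)

# WHERE is applied to individual rows before they are grouped.
where_first = conn.execute(
    """
    SELECT email_id, COUNT(customer_id)
    FROM sometable
    WHERE customer_id > 1
    GROUP BY email_id
    ORDER BY email_id
    """
).fetchall()

# HAVING is applied to each group after the counts have been computed.
having_after = conn.execute(
    """
    SELECT email_id, COUNT(customer_id)
    FROM sometable
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    """
).fetchall()

print(where_first)
print(having_after)
```

Only the HAVING version answers the question "which email ids have more than one customer id", because that condition does not exist until the rows have been grouped and counted.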
WITH email_ids AS (
SELECT email_id, COUNT(customer_id) customer_count
FROM Table
GROUP BY email_id
HAVING count(customer_id) > 1
)
SELECT t.email_id, t.customer_id
FROM Table t
INNER JOIN email_ids ei
ON ei.email_id = t.email_id
If you need a comma separated list of all of their customer id's returned with the single email id, you could use GROUP_CONCAT for that.
This would find all email_ids with more than 1 customer_id, and give you a comma-separated list of all customer_ids for each such email_id:
SELECT email_id, GROUP_CONCAT(customer_id)
FROM your_table
GROUP BY email_id
HAVING count(customer_id) > 1;
Assuming email_id #1 was assigned to customer_ids 1, 2, & 3, your output would look like:
email_id | customer_id
1 | 1,2,3
I didn't realize you were using MS SQL, there's a thread here about simulating GROUP_CONCAT in MS SQL: Simulating group_concat MySQL function in Microsoft SQL Server 2005?
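For a quick way to try the idea, SQLite also ships a group_concat aggregate, so the query can be run through Python's sqlite3 module. This is a sketch under that assumption (SQLite standing in for MySQL), with made-up rows matching the example above. SQLite does not promise the order of the concatenated values, so the code sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (email_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4)],
)

# One row per email_id that has more than one customer_id,
# with the customer ids joined into a comma-separated string.
grouped = conn.execute(
    """
    SELECT email_id, GROUP_CONCAT(customer_id)
    FROM your_table
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    """
).fetchall()

for email_id, customer_ids in grouped:
    # Concatenation order is not guaranteed, so sort for a stable display.
    print(email_id, sorted(customer_ids.split(",")))
```

email_id 2 has a single customer and is filtered out by HAVING; email_id 1 comes back as one row carrying all three customer ids.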
SELECT t1.email, t1.customer
FROM table t1
INNER JOIN (
SELECT email, COUNT(customer)
FROM table
GROUP BY email
HAVING COUNT(customer)>1
) t2 on t1.email = t2.email
This should get you what you're looking for.
Basically, as other people have stated, you can filter GROUP BY results with HAVING. But since you want the customer ids afterwards, join the grouped select back to your original table to get your results. It could probably be done more elegantly, but this is easy to understand.
SELECT
email_id,
STUFF((SELECT ',' + CONVERT(VARCHAR,customer_id) FROM cust_email_table T1 WHERE T1.email_id = T2.email_id
FOR
XML PATH('')
),1,1,'') AS customer_ids
FROM
cust_email_table T2
GROUP BY email_id
HAVING COUNT(*) > 1
This would give you a single row per email id with a comma-separated list of customer ids.

SQL Separating Distinct Values using single column

Does anyone happen to know a way of taking the DISTINCT keyword but applying it to only a single column? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still return the other columns' information. I would use DISTINCT on the full query, but the rows are all different because of the data in certain columns. I need to output only the top-most term of each pair of duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
-- min(Term) keeps the top-most term (A before B) for each ID
select t.ID, t.Name, min(t.Term) as Term
from Table t
group by t.ID, t.Name
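You can use row number for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID order by Term) as rn from Table
) as tbl
Where rn = 1
The ORDER BY inside OVER determines which row of each partition is numbered first and therefore picked. The rn filter has to sit in the outer query, because rn is not visible in the WHERE clause of the query that defines it.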

How can I SELECT additional columns with a TSQL query using GROUP BY

I have a view (that is a union of several tables) and I need to filter out duplicates. The table looks like this:
id first last logo email entered
1 joe smith i.jpg e@m.c 2014-01-27
2 jim smith b.jpg e@j.c 2014-01-27
3 bob smith z.jpg b@b.c 2014-01-27
9 joeseph smith q.gif e@m.c 2014-01-20
I want to do something like this, but I can't seem to get a valid syntax for it:
SELECT
email, MAX(entered), first, last -- such that first and last come from the same row as the MAX(entered)
FROM
my_view
GROUP BY
email
Since your names are not the same on the duplicate email rows, you must use the row_number() function instead:
select email, entered, first, last
from (
select *, row_number() over (partition by email order by entered desc) rn
from my_view
) x
where rn = 1
You need a subquery because row_number() is not allowed in the where clause.
You want to use row_number():
SELECT email, entered, first, last
FROM (select v.*, row_number() over (partition by email order by entered desc) as seqnum
from my_view v
) v
WHERE seqnum = 1;
row_number() is a window function that assigns sequential numbers to the rows within each group. The groups are defined by the partition by clause; in this case, all rows with the same email are in the same group. The ordering within each group is based on the order by clause, and the first row is given the value 1.
The outer query selects that first row, which has the latest entered date.
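To make the numbering visible, the sketch below prints seqnum for every row instead of filtering on it. It uses Python's sqlite3 module rather than SQL Server (an assumption for runnability; SQLite 3.25 or newer is needed for window functions), stores the dates as ISO text, and loads the four sample rows from the question, trimmed to the columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "first" is quoted because it can be treated as a keyword.
conn.execute(
    'CREATE TABLE my_view (id INTEGER, "first" TEXT, email TEXT, entered TEXT)'
)
conn.executemany(
    "INSERT INTO my_view VALUES (?, ?, ?, ?)",
    [
        (1, "joe", "e@m.c", "2014-01-27"),
        (2, "jim", "e@j.c", "2014-01-27"),
        (3, "bob", "b@b.c", "2014-01-27"),
        (9, "joeseph", "e@m.c", "2014-01-20"),
    ],
)

# Numbering restarts at 1 for each email; within an email the newest
# entered date gets 1, the next newest gets 2, and so on.
numbered = conn.execute(
    """
    SELECT email, "first", entered,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY entered DESC) AS seqnum
    FROM my_view
    ORDER BY email, seqnum
    """
).fetchall()

for row in numbered:
    print(row)
```

Only e@m.c has two rows, so it is the only email that gets a seqnum of 2 (for joeseph, the older entry). Filtering on seqnum = 1 in an outer query then drops that row.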

Last Item In MySQL COUNT(*) Result?

Quick question... I have a query that checks for duplicates that looks like this:
SELECT COUNT( * ) AS repetitions, Name, Phone, ID, categoryid, State
FROM users
GROUP BY Name, Phone, State
HAVING repetitions > 1
ORDER BY ID DESC
This works, but MySQL returns the first ID in a set of duplicates. For example, let's say I have 2 rows: the ID for row one is 1 and the ID for row two is 2, and Name, Phone and State have identical data. How can I get the above query to return the count but with the ID "2" instead of "1"?
Thanks! ;)
Use the max() aggregate function, and give it an alias so that you can sort on it:
SELECT COUNT(*) AS repetitions, MAX(ID) AS last_id FROM users GROUP BY Name, Phone, State HAVING repetitions > 1 ORDER BY last_id DESC
not perfect but works
SELECT COUNT( * ) AS repetitions, Name, Phone, MAX(ID), categoryid, State
FROM users
GROUP BY Name, Phone, State
HAVING repetitions > 1
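Here is a small runnable sketch of the MAX(ID) idea. It uses Python's sqlite3 module instead of MySQL (an assumption for runnability), selects only the grouped columns plus aggregates so the result is well defined, and uses invented sample rows and the alias last_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "555", "NY"),
        (2, "Ann", "555", "NY"),
        (3, "Bob", "777", "CA"),
    ],
)

# For each set of duplicates, report how many rows it has and its highest ID.
duplicates = conn.execute(
    """
    SELECT COUNT(*) AS repetitions, Name, Phone, State, MAX(ID) AS last_id
    FROM users
    GROUP BY Name, Phone, State
    HAVING COUNT(*) > 1
    ORDER BY last_id DESC
    """
).fetchall()

print(duplicates)
```

Ann's two rows form one group with repetitions 2 and last_id 2; Bob has no duplicate and is filtered out by HAVING.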