Using GROUP BY operators in SQL

I have two columns, email id and customer id, where an email id can be associated with multiple customer ids. I need to list only those email ids (along with their corresponding customer ids) that have more than one customer id. I tried the GROUPING SETS, ROLLUP and CUBE operators, but I am not getting the desired result.
Any help or pointers would be appreciated.

SELECT emailid
FROM
( SELECT emailid, COUNT(custid) AS cust_count
FROM tablename
GROUP BY emailid
HAVING COUNT(custid) > 1
) AS dup

I think this will get you what you want, if I am understanding your question correctly:
select emailid, customerid from tablename where emailid in
(
select emailid from tablename group by emailid having count(emailid) > 1
)

Sounds like you need to use HAVING, e.g.
SELECT email_id, COUNT(customer_id)
FROM sometable
GROUP BY email_id
HAVING COUNT(customer_id) > 1
HAVING lets you filter on aggregate values after the rows have been grouped.
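As a minimal sketch of that filtering step, the snippet below runs the same kind of query against an in-memory SQLite database from Python. The table name `cust_email` and the sample rows are assumptions made up for illustration; only the GROUP BY / HAVING shape comes from the answer above.

```python
import sqlite3

# Hypothetical sample data: table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_email (email_id TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO cust_email VALUES (?, ?)",
    [("a@x.com", 1), ("a@x.com", 2), ("a@x.com", 3), ("b@x.com", 4)],
)

# WHERE would filter individual rows before grouping;
# HAVING filters whole groups after the counts are known.
grouped = conn.execute(
    """
    SELECT email_id, COUNT(customer_id) AS customer_count
    FROM cust_email
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    """
).fetchall()

print(grouped)
```

Only the e-mail with three customers survives the HAVING filter; the e-mail with a single customer is dropped.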

WITH email_ids AS (
SELECT email_id, COUNT(customer_id) customer_count
FROM Table
GROUP BY email_id
HAVING count(customer_id) > 1
)
SELECT t.email_id, t.customer_id
FROM Table t
INNER JOIN email_ids ei
ON ei.email_id = t.email_id

If you need a comma-separated list of all of their customer ids returned with the single email id, you could use GROUP_CONCAT for that.
This would find all email_ids with more than 1 customer_id, and give you a comma-separated list of all customer_ids for each email_id:
SELECT email_id, GROUP_CONCAT(customer_id)
FROM your_table
GROUP BY email_id
HAVING count(customer_id) > 1;
Assuming email_id #1 was assigned to customer_ids 1, 2, & 3, your output would look like:
email_id | customer_id
1 | 1,2,3
I didn't realize you were using MS SQL. There's a thread about simulating GROUP_CONCAT in MS SQL: "Simulating group_concat MySQL function in Microsoft SQL Server 2005?"
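For a quick way to see what the GROUP_CONCAT query returns, here is a small sketch using SQLite from Python. SQLite happens to provide a `GROUP_CONCAT` aggregate similar to MySQL's, so this is not an MS SQL solution. The table name `your_table` matches the answer; the rows are invented, and SQLite does not guarantee the order of the concatenated values, which is why the check sorts them.

```python
import sqlite3

# Hypothetical rows: email_id 1 has three customers, email_id 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (email_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4)],
)

rows = conn.execute(
    """
    SELECT email_id, GROUP_CONCAT(customer_id) AS customer_ids
    FROM your_table
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    """
).fetchall()

# The concatenation order is not guaranteed, so sort before comparing.
for email_id, customer_ids in rows:
    print(email_id, sorted(customer_ids.split(",")))
```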

SELECT t1.email, t1.customer
FROM tablename t1
INNER JOIN (
SELECT email, COUNT(customer) AS customer_count
FROM tablename
GROUP BY email
HAVING COUNT(customer) > 1
) t2 ON t1.email = t2.email
This should get you what you're looking for.
Basically, as other people have stated, you can filter GROUP BY results with HAVING. But since you want the customer ids afterwards, join the grouped select back to your original table to get your results. It could probably be done more elegantly, but this is easy to understand.
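The join-back idea can be tried out with a short sketch like the one below, run against an in-memory SQLite database from Python. The table name `cust_email` and the rows are assumptions for illustration; the query has the same shape as the one in this answer.

```python
import sqlite3

# Hypothetical data: one e-mail shared by two customers, one used once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_email (email TEXT, customer INTEGER)")
conn.executemany(
    "INSERT INTO cust_email VALUES (?, ?)",
    [("a@x.com", 1), ("a@x.com", 2), ("b@x.com", 3)],
)

# The derived table t2 keeps only e-mails with more than one customer;
# joining it back to the base table recovers the individual customer rows.
pairs = conn.execute(
    """
    SELECT t1.email, t1.customer
    FROM cust_email t1
    INNER JOIN (
        SELECT email, COUNT(customer) AS customer_count
        FROM cust_email
        GROUP BY email
        HAVING COUNT(customer) > 1
    ) t2 ON t1.email = t2.email
    ORDER BY t1.email, t1.customer
    """
).fetchall()

print(pairs)
```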

SELECT
email_id,
STUFF((SELECT ',' + CONVERT(VARCHAR,customer_id) FROM cust_email_table T1 WHERE T1.email_id = T2.email_id
FOR
XML PATH('')
),1,1,'') AS customer_ids
FROM
cust_email_table T2
GROUP BY email_id
HAVING COUNT(*) > 1
This would give you a single row per email id with a comma-separated list of customer ids.

Related

SQL Oracle Find Max of count

I have this table called item:
| PERSON_id | ITEM_id |
|-----------|---------|
| CP2       | A03     |
| CP2       | A02     |
| HB3       | A02     |
| BW4       | A01     |
I need an SQL statement that would output the person with the most Items. Not really sure where to start either.
I advise you to use an inner query for this. The inner query contains the GROUP BY and ORDER BY clauses, and the outer query selects the first row, which is the person with the most items.
SELECT * FROM
(
SELECT PERSON_ID, COUNT(*) FROM TABLE1
GROUP BY PERSON_ID
ORDER BY 2 DESC
)
WHERE ROWNUM = 1
Here is the SQL Fiddle link: http://sqlfiddle.com/#!4/4c4228/5
Locating the maximum of an aggregated column requires more than a single calculation, so here you can use a "common table expression" (cte) to hold the result and then re-use that result in a where clause:
with cte as (
select
person_id
, count(item_id) count_items
from mytable
group by
person_id
)
select
*
from cte
where count_items = (select max(count_items) from cte)
Note: if more than one person shares the same maximum count, more than one row will be returned by this query.
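The tie behaviour is easy to check with a small sketch. The snippet below uses SQLite from Python rather than Oracle, with the four rows from the question; the extra fifth row is an assumption added only to create a tie.

```python
import sqlite3

QUERY = """
WITH cte AS (
    SELECT person_id, COUNT(item_id) AS count_items
    FROM item
    GROUP BY person_id
)
SELECT person_id, count_items
FROM cte
WHERE count_items = (SELECT MAX(count_items) FROM cte)
ORDER BY person_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (person_id TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("CP2", "A03"), ("CP2", "A02"), ("HB3", "A02"), ("BW4", "A01")],
)

# With the question's data, CP2 is the only person with two items.
single_winner = conn.execute(QUERY).fetchall()

# Hypothetical extra row: HB3 now also has two items, so both are returned.
conn.execute("INSERT INTO item VALUES ('HB3', 'A01')")
tied_winners = conn.execute(QUERY).fetchall()

print(single_winner)
print(tied_winners)
```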

How to check if a person has duplicate date records?

I am looking to query my Access database from Excel (DAO) to determine if any name in the table has more than one record per date. E.g. If Bob has two records on 05/05/17 then I want to return both records as part of a recordset.
Seems like you are looking for something like:
SELECT *
FROM yourtable
INNER JOIN
(
SELECT count(*), name, date
FROM yourtable
GROUP BY name, date
HAVING COUNT(*) > 1
) multi
ON multi.name = yourtable.name
AND multi.date = yourtable.date
The inner select returns rows with more than 1 entry for the same name and date.
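Here is a rough sketch of that join, using SQLite from Python instead of Access/DAO. The table name `visits`, the column `visit_date` (chosen to avoid the reserved words `name` and `date` clashing in some engines) and the rows are all assumptions for illustration.

```python
import sqlite3

# Hypothetical data: Bob has two records on 2017-05-05.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, name TEXT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "Bob", "2017-05-05"),
        (2, "Bob", "2017-05-05"),
        (3, "Bob", "2017-05-06"),
        (4, "Alice", "2017-05-05"),
    ],
)

# The inner select finds (name, date) pairs that occur more than once;
# the join returns every full record belonging to such a pair.
duplicates = conn.execute(
    """
    SELECT v.id, v.name, v.visit_date
    FROM visits v
    INNER JOIN (
        SELECT name, visit_date, COUNT(*) AS n
        FROM visits
        GROUP BY name, visit_date
        HAVING COUNT(*) > 1
    ) multi
      ON multi.name = v.name
     AND multi.visit_date = v.visit_date
    ORDER BY v.id
    """
).fetchall()

print(duplicates)
```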
In Access you can do
select name, date
from your_table
group by name, date
having count(*) > 1

Select a NON-DISTINCT column in a query that returns distinct rows

The following query returns the results that I need, but I have to add the ID of the row in order to update it. If I add the ID directly to the SELECT statement, it returns more results than I need: each ID is unique, so DISTINCT sees every row as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID in the Select statement while keeping DISTINCT values for MemberID,ProductID and UserID.
Any Ideas ?
Thank you
According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, so this is why I use DISTINCT to obtain a unique combination of a given employee with a given product and a given client.
You can use MIN() or MAX() and GROUP BY instead of DISTINCT
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
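To see the effect of replacing DISTINCT with MAX() and GROUP BY, here is a simplified sketch in SQLite from Python. The table name `exceptions`, the lower-case column names and the rows are assumptions; the sketch also folds the EXISTS test into a plain HAVING, which should select the same groups for this data but is not the exact query above.

```python
import sqlite3

# Hypothetical data: ids 1-3 and 5-6 are duplicated combinations, id 4 is unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exceptions ("
    "id INTEGER PRIMARY KEY, member_id INTEGER, product_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO exceptions VALUES (?, ?, ?, ?)",
    [
        (1, 10, 20, 30),
        (2, 10, 20, 30),
        (3, 10, 20, 30),
        (4, 11, 20, 30),
        (5, 12, 21, 31),
        (6, 12, 21, 31),
    ],
)

# One row per duplicated combination, carrying a single representative id.
representatives = conn.execute(
    """
    SELECT MAX(id) AS keep_id, member_id, product_id, user_id
    FROM exceptions
    GROUP BY member_id, product_id, user_id
    HAVING COUNT(*) >= 2
    ORDER BY keep_id
    """
).fetchall()

print(representatives)
```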
UPDATE:
From your comments I think the query below is what you need:
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
)
If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
ID's not necessary - try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId
From the sound of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. It sounds as if each duplicate is important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

Need to select ALL columns while using COUNT/Group By

Ok so I have a table in which ONE of the columns have a FEW REPEATING records.
My task is to select the REPEATING records with all attributes.
CustID FN LN DOB City State
the DOB has some repeating values which I need to select from the whole table and list all columns of all records that are same within the DOB field..
My try...
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
This only returns two columns and one row: the first column is the DOB value that occurs more than once, and the second column gives the count of how many times it occurs.
I need to figure out a way to list all attributes not just these two...
Please guide me in the right direction.
I think a more general solution is to use window functions:
select *
from (select *, count(*) over (partition by dob) as NumDOB
from Table1
) t
where NumDOB > 1
This is more general because it is easy to extend to duplicates across two or more columns.
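A minimal sketch of the window-function approach, run in SQLite from Python. It assumes an SQLite build of version 3.25 or later (where window functions are available); the table name `customers`, the reduced column set and the rows are made up for illustration.

```python
import sqlite3

# Hypothetical data: two customers share a date of birth, one does not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, fn TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "1990-01-01"), (2, "Ben", "1990-01-01"), (3, "Cy", "1985-07-04")],
)

# COUNT(*) OVER (PARTITION BY dob) attaches the group size to every row,
# so all columns stay available and no join back is needed.
same_dob = conn.execute(
    """
    SELECT cust_id, fn, dob, num_dob
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY dob) AS num_dob
        FROM customers
    ) t
    WHERE num_dob > 1
    ORDER BY cust_id
    """
).fetchall()

print(same_dob)
```

To look for duplicates across several columns, the PARTITION BY list would simply name all of them.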
Select *
FROM Table1 T
WHERE T.DOB IN( Select I.DOB
FROM Table1 I
GROUP BY I.DOB
HAVING COUNT(I.DOB) > 1)
Try joining with a subquery, which will also allow you to see the count
select t.*, a.SameDOB from Table1 t
join (
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
) a on a.dob = t.dob
select *
from table1, (select count(*) from table1) as cnt

SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?
For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).
Sample data:
1 01/01/2008 stuff@stuff.com
2 02/01/2008 stuff@stuff.com
3 01/12/2008 noone@stuff.com
Something like this:
select email ,max(receiveddate) as MaxDate
from YourTable
group by email
having count(email) > 1
Try something like:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ReceivedDate, Email ORDER BY ReceivedDate, Email DESC) AS RowNumber
FROM EmailTable
) a
WHERE RowNumber = 1
See http://www.technicaloverload.com/working-with-duplicates-in-sql-server/
Couldn't you join the list on the e-mail field and then see what nulls you get in your result?
Or better yet, count the instances of each e-mail address? And only return the ones with count > 1
Or even take the email and id fields. And return the entries where the e-mail is the same, and the IDs are different. (To avoid duplicates don't use != but rather either < or >.)
SELECT [id], [receiveddate], [email]
FROM [mytable]
WHERE [email] IN ( SELECT [email]
FROM [myTable]
GROUP BY [email]
HAVING COUNT([email]) > 1 )
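The third suggestion in this answer (same e-mail, different IDs, compared with `<` rather than `!=`) can be sketched as a self-join. The snippet below uses SQLite from Python; the table name `emails`, the column names and the text storage of the dates are assumptions, and the rows follow the question's sample data.

```python
import sqlite3

# Rows follow the question's sample; dates are kept as plain text here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, receiveddate TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    [
        (1, "01/01/2008", "stuff@stuff.com"),
        (2, "02/01/2008", "stuff@stuff.com"),
        (3, "01/12/2008", "noone@stuff.com"),
    ],
)

# a.id < b.id drops self-matches and reports each duplicate pair only once;
# a.id != b.id would return every pair twice, as (1, 2) and (2, 1).
duplicate_pairs = conn.execute(
    """
    SELECT a.id, b.id, a.email
    FROM emails a
    JOIN emails b
      ON a.email = b.email
     AND a.id < b.id
    ORDER BY a.id, b.id
    """
).fetchall()

print(duplicate_pairs)
```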
Do you want a list of the last items? If so you could use:
SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE tCheck.date > t.date)
If you want a list of all duplicate email address use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:
SELECT [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1
If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":
SELECT TOP 1 [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1 ORDER BY MAX([date]) DESC
If you have a surrogate key, it is relatively easy to use the GROUP BY syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.
Example to delete duplicate records (the column types are assumed for illustration):
CREATE TABLE people (ID INT PRIMARY KEY, Name VARCHAR(100), Address VARCHAR(200), DOB DATE);
DELETE FROM people WHERE ID NOT IN (
SELECT MIN(ID) FROM people GROUP BY Name, Address, DOB
);
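As a rough check of that delete, the sketch below runs it in SQLite from Python. The column types and the three rows are assumptions; the DELETE itself keeps the lowest ID in each group of identical name, address and DOB.

```python
import sqlite3

# Hypothetical data: ids 1 and 2 describe the same person, id 3 is unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "1 Main St", "1990-01-01"),
        (2, "Ann", "1 Main St", "1990-01-01"),
        (3, "Bob", "2 Oak Ave", "1988-02-02"),
    ],
)

# Keep the smallest id per (name, address, dob) group and delete the rest.
# Groups of size one are untouched because their only id is the minimum.
conn.execute(
    """
    DELETE FROM people
    WHERE id NOT IN (
        SELECT MIN(id) FROM people GROUP BY name, address, dob
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
print(remaining_ids)
```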
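Try this:
-- assumes id is the primary key; a.id < b.id excludes self-matches and mirrored pairs
select * from tablename a, tablename b
where a.email = b.email
and a.id < b.id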