GroupBy Query Which Shows Related Data In Addition to Grouping Clause - sql

Ugh, I know it's a terrible title, but I can't think of a way to summarize my question in a simple statement. It's a fairly basic T-SQL query question but I haven't used T-SQL much in the last year or so and my brain simply doesn't want to work today.
Basically I have a table with usernames (email address) and a client id. There can't be multiple emails per client, but there can be multiple emails for different clients. I'm trying to do a group on email addresses to get a count of how many emails are associated with 1 or more clients - that's the easy part. Where I'm struggling is trying to also list which client ids the email address is associated to.
For example, I have this query which gives me 1/2 of what I'm looking for:
select UserName, COUNT(*)
from UserTable
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
But I would also like to have either a row-per client, or even just multiple new columns showing each client id the email address is associated with such as:
user1#test.com 3
user1#test.com 34
user1#test.com 9
OR
user1#test.com 3 34 9
Any assistance is appreciated.

If you're using SQL-Server, you can use the COUNT window function:
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
Then to pick out only those with a count greater than 1:
SELECT * FROM (
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
) rows
WHERE rows.Counts > 1
To get them into the second format, you'd need to use some row concatenation strategy - FOR XML PATH is a popular one.
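As a rough, runnable illustration of both output formats, here is a sketch that runs the same idea on SQLite through Python's sqlite3 module. The ClientId column name and the extra sample row are assumptions, window functions need SQLite 3.25 or later, and GROUP_CONCAT is SQLite's own aggregate standing in for the FOR XML PATH trick, so treat it as an analogy rather than T-SQL.

```python
import sqlite3

# Hypothetical sample data: (UserName, ClientId) pairs modelled on the question.
rows = [
    ("user1#test.com", 3),
    ("user1#test.com", 34),
    ("user1#test.com", 9),
    ("user2#test.com", 5),  # appears once, so it should be filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserTable (UserName TEXT, ClientId INTEGER)")
conn.executemany("INSERT INTO UserTable VALUES (?, ?)", rows)

# Row-per-client format: a windowed COUNT attaches the group size to every row.
per_client = conn.execute(
    """
    SELECT UserName, ClientId, Counts
    FROM (
        SELECT UserName, ClientId,
               COUNT(ClientId) OVER (PARTITION BY UserName) AS Counts
        FROM UserTable
    ) AS t
    WHERE Counts > 1
    ORDER BY UserName, ClientId
    """
).fetchall()

# Single-row format: GROUP_CONCAT (SQLite-specific) joins the ids per user.
# SQLite does not guarantee the order of the concatenated values.
combined = conn.execute(
    """
    SELECT UserName, COUNT(*) AS Counts, GROUP_CONCAT(ClientId, ' ') AS ClientIds
    FROM UserTable
    GROUP BY UserName
    HAVING COUNT(*) > 1
    """
).fetchall()

for row in per_client:
    print(row)
print(combined)
```

On SQL Server 2017 and later, STRING_AGG plays the role that GROUP_CONCAT plays here; on older versions FOR XML PATH remains the usual workaround.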

You can use FOR XML PATH:
Select UserName, COUNT(*),
substring(
(
Select ',' + CAST(UTI.clientID AS varchar(20)) AS [text()] -- CAST avoids a conversion error if clientID is numeric
From UserTable UTI
Where UTI.UserName = UTO.UserName
ORDER BY UTI.clientID
For XML PATH ('')
), 2, 1000)
From UserTable UTO
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc

Related

Write a SQL Query to find different users linked to same User ID

So I'm currently on a project where I'm working with multiple sources, and one of them is SAP data.
I need to return "duplicates" in essence and find all the different users, that are linked to the same SAP User ID. There are entries that are valid however, as the data describes access roles to the different SAP systems. So it is normal if the same user occurs more than once. But I need to find where there is a different name assigned to the same User ID.
This is what I currently have:
select *
from (
select *,
row_number() over (partition by FULL_NAME order by USER_ID) as row_number
from SAP_TABLE
) as rows order by USER_ID desc
Any help would be appreciated. Thanks!
You would partition by the user_id
select *
from (
select *,
count(distinct (full_name)) over (partition by user_id) as rnk
from SAP_TABLE
) as rows
where rnk>1
order by USER_ID desc
Are you looking for this?
select count(distinct FULL_NAME),
USER_ID
from SAP_TABLE
group by USER_ID
having count(distinct FULL_NAME) > 1
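Not every engine accepts DISTINCT inside a windowed aggregate (SQL Server, for one, rejects COUNT(DISTINCT ...) OVER), so if the full rows are needed, one option is to feed the grouped query into an IN filter. The sketch below tries that on SQLite via Python's sqlite3; the ROLE column and all sample values are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (USER_ID, FULL_NAME, ROLE). ROLE is an invented column.
sample = [
    ("U001", "Alice Smith", "SAP_ERP"),
    ("U001", "Alice Smith", "SAP_BW"),    # same name twice: a valid repeat
    ("U002", "Bob Jones", "SAP_ERP"),
    ("U002", "Robert Jones", "SAP_BW"),   # different name on the same id
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SAP_TABLE (USER_ID TEXT, FULL_NAME TEXT, ROLE TEXT)")
conn.executemany("INSERT INTO SAP_TABLE VALUES (?, ?, ?)", sample)

# Keep every row whose USER_ID is linked to more than one distinct FULL_NAME.
conflicts = conn.execute(
    """
    SELECT USER_ID, FULL_NAME, ROLE
    FROM SAP_TABLE
    WHERE USER_ID IN (
        SELECT USER_ID
        FROM SAP_TABLE
        GROUP BY USER_ID
        HAVING COUNT(DISTINCT FULL_NAME) > 1
    )
    ORDER BY USER_ID, FULL_NAME
    """
).fetchall()

for row in conflicts:
    print(row)
```

U001 is left out because its repeated rows all carry the same name, which is the "valid duplicate" case described in the question.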
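You can use this piece of code to find all the user IDs that occur more than once. Grouping by the column name rather than by position (GROUP BY 1) keeps it portable, since not every database accepts ordinals in GROUP BY.
SELECT USER_ID, COUNT(*)
FROM SAP_TABLE
GROUP BY USER_ID
HAVING COUNT(*) > 1;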

SQL SELECT Full Row with Duplicated Data in One Column

I am using Microsoft SQL Server 2014.
I am able to list emails which are duplicated.
But I am unable to list the entire row, which contain other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
May I know how can I list all field in the rows that contains Email appearing more than once in the table?
Thank you.
Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1
select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we're then selecting only those records with noDuplicates greater than 1; i.e. where there are multiple records with the same email address.
Finally, I've sorted by Email and EmployeeId so that duplicates are listed alongside one another and appear in the sequence in which they were (presumably) created, which makes life easier for whoever deals with these results.
If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);
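To check that the EXISTS form and the IN (GROUP BY ... HAVING) form pick out the same rows, here is a small sketch on SQLite via Python's sqlite3. The table is cut down to two columns and the sample addresses are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of EmployeeProfile with a unique EmployeeId.
conn.execute("CREATE TABLE EmployeeProfile (EmployeeId INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO EmployeeProfile VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

# EXISTS: keep a row if some other employee shares its email.
exists_rows = conn.execute(
    """
    SELECT ep.EmployeeId, ep.Email
    FROM EmployeeProfile ep
    WHERE EXISTS (SELECT 1
                  FROM EmployeeProfile ep1
                  WHERE ep1.Email = ep.Email
                    AND ep1.EmployeeId <> ep.EmployeeId)
    ORDER BY ep.EmployeeId
    """
).fetchall()

# IN: keep a row if its email belongs to a group with more than one member.
in_rows = conn.execute(
    """
    SELECT EmployeeId, Email
    FROM EmployeeProfile
    WHERE Email IN (SELECT Email
                    FROM EmployeeProfile
                    GROUP BY Email
                    HAVING COUNT(*) > 1)
    ORDER BY EmployeeId
    """
).fetchall()

print(exists_rows)
print(in_rows)
```

The EXISTS version relies on EmployeeId being unique; without that, a row could never be told apart from its exact copy.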

Sql select the winner

having this database, bold = PK
CERTIFICATE(USERID, CERTIFICATENAME)
I need to find the userid with the maximum number of certificates with a SQL query.
sample data:
USERID, CERTIFICATENAME
1,cert1
1,cert2
1,cert3
2,cert4
2,cert5
3,cert2
4,cert1
With this sample data I need a query that finds that user 1 has 3 certificates, which is the maximum number of certificates.
request result:
USERID, COUNT
1,3
In this case my DBMS is Oracle, but I'm looking for a generic SQL solution to my problem.
Using plain old GROUP BY (TOP 1 is SQL Server syntax; on Oracle 12c or later, drop TOP 1 and append FETCH FIRST 1 ROWS ONLY after the ORDER BY):
select top 1 userid, count(certificatename) total
from certificate
group by userid -- but not certificatename
order by 2 desc -- you can use total or count(certificatename) here
Common Table Expressions (CTEs) don't add any performance benefit here because you need the GROUP BY in any case.
As a subquery:
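SELECT UserId, COUNT(CertificateName) AS Total -- count per user
FROM YourTable
GROUP BY UserId -- group by user only, not by certificate
HAVING COUNT(CertificateName) = ( -- keep the user(s) holding the max count
SELECT MAX(cnt)
FROM ( -- create the counts per user
SELECT COUNT(CertificateName) AS cnt
FROM YourTable
GROUP BY UserId
) t
)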
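For a version that avoids TOP and other vendor-specific row limiting, one option is to compare each user's count against the maximum count in a HAVING clause. Below is a minimal sketch of that idea against the sample data from the question, run on SQLite via Python's sqlite3; the inner alias names (cnt, t) are arbitrary.

```python
import sqlite3

# Sample data copied from the question: (USERID, CERTIFICATENAME).
certificates = [
    (1, "cert1"), (1, "cert2"), (1, "cert3"),
    (2, "cert4"), (2, "cert5"),
    (3, "cert2"),
    (4, "cert1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CERTIFICATE (USERID INTEGER, CERTIFICATENAME TEXT)")
conn.executemany("INSERT INTO CERTIFICATE VALUES (?, ?)", certificates)

# Count per user, then keep only the user(s) whose count equals the maximum.
winners = conn.execute(
    """
    SELECT USERID, COUNT(CERTIFICATENAME) AS TOTAL
    FROM CERTIFICATE
    GROUP BY USERID
    HAVING COUNT(CERTIFICATENAME) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(CERTIFICATENAME) AS cnt
              FROM CERTIFICATE
              GROUP BY USERID) AS t
    )
    """
).fetchall()

print(winners)  # [(1, 3)]
```

Unlike a "first row only" query, this returns every user tied for first place, which may or may not be what is wanted.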

MS Access SQL error

This code is supposed to select the TOP 1, but it's not working properly. Instead of showing only the TOP 1 record, it is showing tons of records. It may be because I have 2 tables referenced. In another code I only had 1 and it worked. I need to reference table attendance though so I'm not sure how to work around that. Thanks!
SELECT TOP 1 userID
FROM attendance, CFRRR
WHERE [attendance.Programs] LIKE CFRRR.program
AND [attendance.Language] LIKE CFRRR.language
AND [attendance.Status] = 'Available'
ORDER BY TS ASC
Here are the table fields for attendance: userID, username, Supervisor, Category, AttendanceDay, AttendanceTime, Programs, Language, Status, TS.
Here are the table fields for CFRRR: CFRRRID, WorkerID, Workeremail, Workername, Dateassigned, assignedby, RRRmonth, Scheduledate, scheduledtime, type, ScheduledType, caseid, language, lastname, firstname, Checkedin, Qid, status, CompletedType, comments, actiondate, verifduedate, program.
Clearly the last table has a lot of records.
SELECT TOP in MS Access differs from SELECT TOP in SQL Server and similar functionality in other databases. It returns the top rows based on the ORDER BY, then continues to return any rows that tie with the last value. This is sometimes convenient, which is why SQL Server offers the same behavior as SELECT TOP ... WITH TIES.
To fix this, add one or more columns to the ORDER BY so that the sort key is unique for each generated row:
SELECT TOP 1 userID
FROM attendance as a,
CFRRR
WHERE a.Programs LIKE CFRRR.program AND
a.Language LIKE CFRRR.language AND
a.Status = 'Available'
ORDER BY TS ASC, userID, CFRRRID
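The tie behavior is easier to see outside the database. The following is a plain-Python model of "top 1, plus anything that ties with it", not Access itself; the helper name top1_with_ties and the sample rows are invented for illustration.

```python
# Hypothetical joined rows: (userID, TS, CFRRRID). All values are made up.
joined = [
    (7, "2020-01-01 08:00", 101),
    (7, "2020-01-01 08:00", 102),  # same TS as the row above: a tie
    (9, "2020-01-01 09:30", 103),
]

def top1_with_ties(rows, key):
    """Return the first row in key order plus every row whose key ties with it."""
    ordered = sorted(rows, key=key)
    if not ordered:
        return []
    first_key = key(ordered[0])
    return [row for row in ordered if key(row) == first_key]

# Ordering by TS alone: both 08:00 rows tie, so both come back.
by_ts = top1_with_ties(joined, key=lambda r: r[1])

# Ordering by TS, userID, CFRRRID: the key is unique, so exactly one row comes back.
by_unique_key = top1_with_ties(joined, key=lambda r: (r[1], r[0], r[2]))

print(len(by_ts))       # 2
print(by_unique_key)    # [(7, '2020-01-01 08:00', 101)]
```

With a large second table in the cross join, many generated rows share the same TS, which would explain the "tons of records" in the question.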

How to display all columns associated with duplicate emails in SQL server 2008

I've done some research on looking for a way to filter duplicate emails so all columns display the data associated with these duplicate emails, but can't find an answer to help me with this.
I currently have data pulled using the following code:
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
HAVING
(COUNT(Email) > 1)
Order by Email
It then gives me xxxxxx amount of rows. I then want to be able to pull any data (columns) that are associated with these duplicate emails -and just the duplicates.
SELECT * FROM [marks_party_MasterInvite] .[dbo].[InviteList]
WHERE
Email in(Select Email FROM [marks_party_MasterInvite].[dbo].[InviteList] GROUP BY Email HAVING COUNT(Email)>1)
I know I am doing something wrong, because the row count doesn't match.
So any help would be greatly appreciated!
Thanks guys,
You want to use window functions. The following adds the count to each row. Then you can use a where filter to get all the columns:
SELECT il.*
FROM (select t.*, count(*) over (partition by email) as cnt
from [cem_farmers_masterinvitelist].[dbo].InviteList t
) il
where cnt > 1
Order by Email
The counts don't match because the first query returns one row per distinct duplicated email, whereas fetching every row returns each of those emails as many times as it occurs.
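Here is a small sketch of why the two numbers differ and how they relate, run on SQLite via Python's sqlite3. The Id column and the sample addresses are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced InviteList: three copies of one email, two of another, one unique.
conn.execute("CREATE TABLE InviteList (Id INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO InviteList (Email) VALUES (?)",
    [("a@example.com",), ("a@example.com",), ("a@example.com",),
     ("b@example.com",), ("b@example.com",), ("c@example.com",)],
)

dupes_sql = """
    SELECT Email, COUNT(Email) AS dup_count
    FROM InviteList
    GROUP BY Email
    HAVING COUNT(Email) > 1
"""

# First query: one row per duplicated email.
distinct_dupes = conn.execute(f"SELECT COUNT(*) FROM ({dupes_sql}) AS d").fetchone()[0]

# Second query: every row whose email is duplicated.
all_dupe_rows = conn.execute(
    f"SELECT COUNT(*) FROM InviteList WHERE Email IN (SELECT Email FROM ({dupes_sql}) AS d)"
).fetchone()[0]

# Summing dup_count over the first query reproduces the second query's row count.
summed = conn.execute(f"SELECT SUM(dup_count) FROM ({dupes_sql}) AS d").fetchone()[0]

print(distinct_dupes, all_dupe_rows, summed)  # 2 5 5
```

So the second query's row count should equal the sum of dup_count from the first query, not its number of rows.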
Join the source table, to the knowledge you just generated about your source:
SELECT *
FROM [cem_farmers_masterinvitelist].[dbo].InviteList src
INNER JOIN
(
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
) qDupes
ON qDupes.email = src.email AND qDupes.dup_count > 1
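The row counts will differ. If you want the total number of rows involved, you can add WITH ROLLUP, which appends a grand-total row (the one where Email is NULL). Note that the rollup total is computed before HAVING filters the groups, so it counts every row with a non-NULL email, not only the duplicated ones; to match the row count of the second query, sum dup_count over the duplicate groups instead.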
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email WITH ROLLUP
HAVING
(COUNT(Email) > 1)
Order by Email