Using EXCEPT and DISTINCT in SQL Server

I'm trying to figure out if any users in this table have multiple email addresses.
When I run the following two queries, the non-DISTINCT query has more results than the one using DISTINCT.
SELECT UserName, EmailAddress
FROM Users;
SELECT DISTINCT UserName, EmailAddress
FROM Users;
However, this query does not return any results (presumably because every row from the first query, duplicates included, matches a row in the DISTINCT result and is therefore removed).
SELECT UserName, EmailAddress
FROM Users
EXCEPT
SELECT DISTINCT UserName, EmailAddress
FROM Users;
How can I get the users with multiple email addresses?

Just use aggregation. If you want the list of (UserName, EmailAddress) pairs that appear more than once:
SELECT UserName, EmailAddress
FROM Users
GROUP BY UserName, EmailAddress
HAVING COUNT(*) > 1;
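The query above finds exact duplicate rows. If the goal is instead users who have more than one different address, grouping by UserName alone and counting distinct addresses is one option. Below is a minimal sketch that runs both queries against an in-memory SQLite database from Python so it is self-contained. The sample rows are made up, and it is an assumption that the same SELECT statements behave identically in SQL Server.

```python
import sqlite3

# In-memory stand-in for the Users table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [
        ("alice", "alice@example.com"),
        ("alice", "alice@example.com"),   # exact duplicate row
        ("bob", "bob@example.com"),
        ("bob", "bob@work.example.com"),  # second, different address
        ("carol", "carol@example.com"),
    ],
)

# (UserName, EmailAddress) pairs that occur more than once.
duplicate_rows = conn.execute(
    """
    SELECT UserName, EmailAddress
    FROM Users
    GROUP BY UserName, EmailAddress
    HAVING COUNT(*) > 1
    ORDER BY UserName
    """
).fetchall()

# Users with more than one distinct address.
multi_email_users = conn.execute(
    """
    SELECT UserName, COUNT(DISTINCT EmailAddress) AS EmailCount
    FROM Users
    GROUP BY UserName
    HAVING COUNT(DISTINCT EmailAddress) > 1
    ORDER BY UserName
    """
).fetchall()

print(duplicate_rows)
print(multi_email_users)
```

With this sample data, alice shows up only in the first result (same address stored twice) and bob only in the second (two different addresses).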
If you just want a count of duplicates, you can count the difference. Unfortunately, SQL Server doesn't allow multiple columns for COUNT(DISTINCT), but you can concatenate them. Assuming neither value is ever NULL and the separator never occurs inside either value:
SELECT COUNT(*) - COUNT(DISTINCT UserName + ' ' + EmailAddress) as numDuplicates
FROM Users;

Related

Oracle: Selecting Duplicates with where clause?

I have a question: Why can't I just use the following SQL query to get a list of unique eMail addresses from the PERSON table?
SELECT NOT DISTINCT Email FROM PERSON
I think the easiest and most common way to achieve this is to group by the Email column and then keep the groups having a count greater than 1.
SELECT Email, COUNT(Email)
FROM PERSON
GROUP BY Email
HAVING COUNT(Email) > 1;
NOT DISTINCT does not work because it is not valid syntax.
DISTINCT is a keyword that removes duplicate rows from the result, not a boolean expression, so putting NOT in front of it does not do what you expect.
You asked to get a list of unique eMail addresses, which is done by:
SELECT DISTINCT Email FROM PERSON;
Standard SQL has no NOT DISTINCT select modifier or anything like it.
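To make the difference between the three readings of "unique" concrete, here is a small sketch using in-memory SQLite from Python. The Name column and the sample rows are hypothetical; only the PERSON table and its Email column come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Name is a hypothetical extra column; the question only mentions Email.
conn.execute("CREATE TABLE PERSON (Name TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO PERSON VALUES (?, ?)",
    [
        ("Ann", "a@example.com"),
        ("Andy", "a@example.com"),
        ("Ben", "b@example.com"),
        ("Cy", "c@example.com"),
        ("Cyd", "c@example.com"),
        ("Cyrus", "c@example.com"),
    ],
)


def emails(sql):
    """Run a single-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql)]


# Every address once, regardless of how often it is stored.
distinct_emails = emails("SELECT DISTINCT Email FROM PERSON ORDER BY Email")

# Addresses stored more than once (what NOT DISTINCT was meant to express).
duplicated_emails = emails(
    "SELECT Email FROM PERSON GROUP BY Email HAVING COUNT(*) > 1 ORDER BY Email"
)

# Addresses stored exactly once.
used_once_emails = emails(
    "SELECT Email FROM PERSON GROUP BY Email HAVING COUNT(*) = 1 ORDER BY Email"
)

print(distinct_emails, duplicated_emails, used_once_emails)
```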

How to write an SQL query that returns the number of users with a given first name who registered on each day

An SQL table called users contains data about users. It has columns id, firstname, lastname, and registerdate (which has type DATE). Write a query that returns the number of users having the first name 'Sally' who have registered on each day.
SELECT
TRUNC(REGISTERDATE), COUNT(*)
FROM
USERS
WHERE
firstname = 'Sally'
GROUP BY
TRUNC(REGISTERDATE)
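The query above relies on Oracle's TRUNC to drop the time portion before grouping. As a rough illustration of the same grouping idea, here is a sketch in Python with in-memory SQLite, where date() plays the role of TRUNC. That substitution and the sample rows are assumptions made only so the example can run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, registerdate TEXT)"
)
conn.executemany(
    "INSERT INTO users (firstname, lastname, registerdate) VALUES (?, ?, ?)",
    [
        ("Sally", "Smith", "2024-01-05 09:00:00"),
        ("Sally", "Jones", "2024-01-05 17:30:00"),
        ("Sally", "Brown", "2024-01-06 08:00:00"),
        ("Bob", "Smith", "2024-01-05 10:00:00"),  # filtered out by WHERE
    ],
)

# date() strips the time part here, standing in for Oracle's TRUNC.
sally_per_day = conn.execute(
    """
    SELECT date(registerdate) AS register_day, COUNT(*) AS sally_count
    FROM users
    WHERE firstname = 'Sally'
    GROUP BY date(registerdate)
    ORDER BY register_day
    """
).fetchall()

print(sally_per_day)
```

Days on which no Sally registered produce no row at all, which is also true of the original query.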

Unique sorted list of column values from 2 tables

I need to get a list of unique email addresses across 2 tables. For example I have the selects:
select distinct
email
from
contacts
order by
email
select distinct
email
from
customers
order by
email
If I only needed one of those, piece of cake. If I wanted them as 2 columns side by side, also piece of cake.
But how do I get them as a single column, no duplicates, sorted? This will be running on Azure SQL Database if that is useful.
What about something like:
select distinct email from contacts -- your first query
union
select distinct email from customers -- your second query
The UNION produces a single email column with data from both queries and already eliminates duplicates, unlike UNION ALL.
For the ordering, add a single ORDER BY email at the very end. ORDER BY is only valid after the last SELECT of a UNION, where it sorts the combined result.
select distinct
email
from (
select email from contacts
union all
select email from customers
) as emails
order by email
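Both approaches should give the same result. The sketch below checks that on made-up data, using in-memory SQLite from Python rather than Azure SQL Database, so treat it as an illustration of the UNION semantics and not as a test of the target platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT)")
conn.execute("CREATE TABLE customers (email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?)",
    [("b@example.com",), ("a@example.com",), ("a@example.com",)],
)
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("c@example.com",), ("b@example.com",)],
)

# UNION removes duplicates across both branches; one ORDER BY at the end.
union_emails = [
    row[0]
    for row in conn.execute(
        """
        SELECT email FROM contacts
        UNION
        SELECT email FROM customers
        ORDER BY email
        """
    )
]

# UNION ALL keeps duplicates, so the outer DISTINCT has to remove them.
derived_table_emails = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT email
        FROM (
            SELECT email FROM contacts
            UNION ALL
            SELECT email FROM customers
        ) AS emails
        ORDER BY email
        """
    )
]

print(union_emails)
print(derived_table_emails)
```

The plain UNION form is shorter and does the deduplication in one step, so the derived-table form is mainly useful when further filtering on the combined set is needed.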

Using multiple columns with counts in access database designer

I am trying to display several columns with different counts in a Microsoft Access query. Because the query has to work in the SQL design view, I can't do certain things a normal query can.
I'd like to display columns such as multiple, single, etc., each with its count.
Note: the table names and attributes have been changed.
select
(select count(*) as multiple from (select userId from dbo.Purchases
where userId is not null group by userId having count(*) > 1) x),
(select count(*) as single from (select userId from dbo.Purchases
where userId is not null group by userId having count(*) = 1) x);
If I run these separately I can display each one, but I'd like to combine them into one query and one row. Is this possible?
select count(*) as multiple from (select userId from dbo.Purchases
where userId is not null group by userId having count(*) > 1) x
It's very easy with 2 queries:
First one, saved as "Purchases Summary"
Select UserID, count(UserID) as Count from Purchases Group By UserID
With a 2nd built on it:
SELECT Sum(IIf([count]=1,1)) AS [Single], Sum(IIf([count]>1,1)) AS Multiple FROM [Purchases Summary]
I cannot find a clever way to combine this into a single query.
I don't know what my problem last night was, but the single query is
SELECT Sum(IIf([count]=1,1)) AS [Single], Sum(IIf([count]>1,1)) AS Multiple
FROM (Select UserID, count(UserID) as Count from Purchases Group By UserID)
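The single-query form is conditional aggregation over a per-user count. Here is a sketch of the same idea in Python with in-memory SQLite, using CASE in place of Access's IIf. The table contents are invented, and the column aliases single_buyers and multiple_buyers are hypothetical names chosen to avoid reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (purchaseId INTEGER PRIMARY KEY, userId INTEGER)"
)
conn.executemany(
    "INSERT INTO Purchases (userId) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,), (4,), (5,), (None,)],
)

# Inner query: one row per user with that user's purchase count.
# Outer query: count how many users fall into each bucket.
single_buyers, multiple_buyers = conn.execute(
    """
    SELECT
        SUM(CASE WHEN purchase_count = 1 THEN 1 ELSE 0 END) AS single_buyers,
        SUM(CASE WHEN purchase_count > 1 THEN 1 ELSE 0 END) AS multiple_buyers
    FROM (
        SELECT userId, COUNT(*) AS purchase_count
        FROM Purchases
        WHERE userId IS NOT NULL
        GROUP BY userId
    ) AS per_user
    """
).fetchone()

print(single_buyers, multiple_buyers)
```

Users 2, 4 and 5 bought once and users 1 and 3 bought more than once; the row with a NULL userId is excluded by the WHERE clause, as in the original question.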
Don George's solution would also work.
I ended up using a form and VBA for each column. One issue was that I needed DISTINCT to get unique IDs, and that is not really supported in the SQL design view. DISTINCTROW is supported, but it did not work for my query. I ended up writing the query as before so that it did not need DISTINCT.
This is the VBA I used to set each field in the Access form. CurrentDb must be connected to the database properly for this to work.
selectStatement = "SELECT Count(* ) FROM (SELECT userID FROM dbo_Purchases WHERE userID is not null GROUP BY userID HAVING count(*)>1) AS x;"
rs = CurrentDb.OpenRecordset(selectStatement).Fields(0).Value
[Text30].Value = rs

FIRST ORDER BY ... THEN GROUP BY

I have two tables, one stores the users, the other stores the users' email addresses.
table users: (userId, username, etc)
table userEmail: (emailId, userId, email)
I would like to do a query that allows me to fetch the latest email address along with the user record.
I'm basically looking for a query that says
FIRST ORDER BY userEmail.emailId DESC
THEN GROUP BY userEmail.userId
This can be done with:
SELECT
users.userId
, users.username
, (
SELECT
userEmail.email
FROM userEmail
WHERE userEmail.userId = users.userId
ORDER BY userEmail.emailId DESC
LIMIT 1
) AS email
FROM users
ORDER BY users.username;
But this does a subquery for every row and is very inefficient. (It is faster to do 2 separate queries and 'join' them together in my program logic).
The intuitive query to write for what I want would be:
SELECT
users.userId
, users.username
, userEmail.email
FROM users
LEFT JOIN userEmail USING(userId)
GROUP BY users.userId
ORDER BY
userEmail.emailId
, users.username;
But, this does not function as I would like. (The GROUP BY is performed before the sorting, so the ORDER BY userEmail.emailId has nothing to do).
So my question is:
Is it possible to write the first query without making use of the subqueries?
I've searched and read the other questions on stackoverflow, but none seems to answer the question about this query pattern.
"But this does a subquery for every row and is very inefficient"
Firstly, do you have a query plan or timings that demonstrate this? The way you've done it (with the subselect) is pretty much the 'intuitive' way to do it. Many DBMSs (though I'm not sure about MySQL) have optimisations for this case and can avoid re-running the subquery for every row.
Alternatively, you should be able to create a derived table with ONLY (user id, latest email id) tuples and JOIN onto that:
SELECT
users.userId
, users.username
, userEmail.email
FROM users
INNER JOIN
(SELECT userId, MAX(emailId) AS latestEmailId
FROM userEmail GROUP BY userId)
AS latestEmails
ON (users.userId = latestEmails.userId)
INNER JOIN userEmail ON
(latestEmails.latestEmailId = userEmail.emailId)
ORDER BY users.username;
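One detail worth checking with the derived-table approach is what happens to users who have no email rows. The sketch below runs the query with INNER JOIN and with LEFT JOIN on made-up data, using in-memory SQLite from Python instead of MySQL. The LEFT JOIN variant is an assumption about what the asker wants, based on the LEFT JOIN in the question's own attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE userEmail (emailId INTEGER PRIMARY KEY, userId INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],  # carol has no email rows
)
conn.executemany(
    "INSERT INTO userEmail VALUES (?, ?, ?)",
    [
        (1, 1, "alice@old.example.com"),
        (2, 1, "alice@new.example.com"),  # highest emailId for alice
        (3, 2, "bob@example.com"),
    ],
)

# {join} is filled in with INNER or LEFT to compare the two behaviours.
LATEST_EMAIL_SQL = """
    SELECT users.userId, users.username, userEmail.email
    FROM users
    {join} JOIN (
        SELECT userId, MAX(emailId) AS latestEmailId
        FROM userEmail
        GROUP BY userId
    ) AS latestEmails ON users.userId = latestEmails.userId
    {join} JOIN userEmail ON latestEmails.latestEmailId = userEmail.emailId
    ORDER BY users.username
"""

inner_rows = conn.execute(LATEST_EMAIL_SQL.format(join="INNER")).fetchall()
left_rows = conn.execute(LATEST_EMAIL_SQL.format(join="LEFT")).fetchall()

print(inner_rows)
print(left_rows)
```

With INNER JOIN, carol disappears from the result; with LEFT JOIN she is kept with a NULL email, which matches what the correlated-subquery version in the question returns.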
If this is a query you do often, I recommend optimizing your tables to handle this.
I suggest adding an emailId column to the users table. When a user changes their email address, or sets an older email address as the primary email address, update the user's row in the users table to point to the current emailId.
Once you modify your code to do this update, you can go back and update your older data to set emailId for all users.
Alternatively, you can add an email column to the users table, so you don't have to do a join to get a user's current email address.