How to count distinct values in SQL union? - sql

I can select distinct values from two different columns, but I do not know how to count them.
My guess is that I should use an alias, but I can't figure out how to write the statement correctly.
$sql = "SELECT DISTINCT author FROM comics WHERE author NOT IN
( SELECT email FROM bans ) UNION
SELECT DISTINCT email FROM users WHERE email NOT IN
( SELECT email FROM bans ) ";
Edit 1: I know that I can use mysql_num_rows() in PHP, but I think that takes too much processing.

You could wrap the query in a subquery:
select count(distinct author)
from (
SELECT author
FROM comics
WHERE author NOT IN ( SELECT email FROM bans )
UNION ALL
SELECT email
FROM users
WHERE email NOT IN ( SELECT email FROM bans )
) as SubQueryAlias
There were two distincts in your query, and union filters out duplicates. I removed all three (the non-distinct union is union all) and moved the distinctness to the outer query with count(distinct author).
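A minimal sketch of the wrapped query, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the choice of SQLite are assumptions made for illustration; the question itself targets MySQL from PHP.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comics (author TEXT);
    CREATE TABLE users  (email  TEXT);
    CREATE TABLE bans   (email  TEXT);
    INSERT INTO comics VALUES ('a@x.org'), ('a@x.org'), ('b@x.org'), ('spam@x.org');
    INSERT INTO users  VALUES ('b@x.org'), ('c@x.org'), ('spam@x.org');
    INSERT INTO bans   VALUES ('spam@x.org');
""")

# UNION ALL keeps duplicates; COUNT(DISTINCT ...) in the outer query removes them once.
count_sql = """
    SELECT COUNT(DISTINCT author)
    FROM (
        SELECT author FROM comics
        WHERE author NOT IN (SELECT email FROM bans)
        UNION ALL
        SELECT email FROM users
        WHERE email NOT IN (SELECT email FROM bans)
    ) AS SubQueryAlias
"""
distinct_count = conn.execute(count_sql).fetchone()[0]
print(distinct_count)  # a@x.org, b@x.org and c@x.org remain after the ban filter
```

The database does the counting and returns a single row, so nothing has to be fetched and counted on the client side.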

You can always do SELECT COUNT(*) FROM (SELECT DISTINCT...) x and just copy that UNION into the inner SELECT (more precisely, the parenthesized query is a derived table, sometimes called an inline or anonymous view).

Related

Is it possible to UNION distinct rows but disregard one column to determine uniqueness?

select d.id, d.registration_number
from DOCUMENTS d
union
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
Would it be possible to union those results based solely on the uniqueness of the registration_number, disregarding the id of the documents?
Or, is it possible to achieve the same result in a different way?
Just to add: actually I'm unioning 5 queries, each ~20 lines long, with 4 columns that should be disregarded in determining uniqueness.
You basically need to wrap the unioned data in an outer query to get only the rows you want.
SELECT min(id), registration_number
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
GROUP BY registration_number
Union will check the combination of all the columns for uniqueness. You could, however, use union all (that does not remove duplicates) and then apply the logic yourself using the row_number window function:
SELECT id, registration_number
FROM (SELECT id, registration_number,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id) AS rn
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
) r
WHERE rn = 1
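The ROW_NUMBER approach can be tried out with a small sketch like the one below. It assumes an SQLite build with window-function support (3.25 or later) and uses made-up rows; the question does not say which database is in use.

```python
import sqlite3

# Hypothetical rows: registration number R-200 appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER, registration_number TEXT);
    CREATE TABLE different_documents (id INTEGER, registration_number TEXT);
    INSERT INTO documents VALUES (1, 'R-100'), (2, 'R-200');
    INSERT INTO different_documents VALUES (7, 'R-200'), (8, 'R-300');
""")

# Number the rows within each registration_number and keep the first (lowest id).
window_sql = """
    SELECT id, registration_number
    FROM (SELECT id, registration_number,
                 ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id) AS rn
          FROM (SELECT id, registration_number FROM documents
                UNION ALL
                SELECT id, registration_number FROM different_documents) u
         ) r
    WHERE rn = 1
    ORDER BY registration_number
"""
# Cross-check against the GROUP BY / MIN(id) formulation.
group_sql = """
    SELECT MIN(id), registration_number
    FROM (SELECT id, registration_number FROM documents
          UNION ALL
          SELECT id, registration_number FROM different_documents) u
    GROUP BY registration_number
    ORDER BY registration_number
"""
window_rows = conn.execute(window_sql).fetchall()
group_rows = conn.execute(group_sql).fetchall()
print(window_rows)
print(group_rows)
```

With only one disregarded column the two forms agree. The window version is the one that scales to several disregarded columns, because every column of the chosen row comes from the same source row, whereas separate MIN() calls could mix values from different rows.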
Since the other answers are already correct, may I ask why you need to retrieve other columns in that query, given that its primary purpose appears to be gathering unique registration numbers?
Wouldn't it be simpler to first gather the unique registration numbers and then retrieve the other info?
Or, in your actual query, first gather the info without the columns that should be disregarded, and then fetch those columns if need be?
For example, by making a view with
SELECT d.registration_number
FROM DOCUMENTS d
UNION
SELECT dd.registration_number
FROM DIFFERENT_DOCUMENTS dd
and then gathering the remaining information by joining against that view?
Assuming registration_number is unique in each table, you can use not exists:
select d.id, d.registration_number
from DOCUMENTS d
union all
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
where not exists (select 1
from DOCUMENTS d
where dd.registration_number = d.registration_number
);
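A small sketch of the NOT EXISTS variant on made-up rows, again using SQLite through Python as a stand-in for whatever database is actually in use. The inner alias d2 is introduced here only to keep it distinct from the outer d.

```python
import sqlite3

# Hypothetical rows; registration_number is unique within each table, as the answer assumes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER, registration_number TEXT);
    CREATE TABLE different_documents (id INTEGER, registration_number TEXT);
    INSERT INTO documents VALUES (1, 'R-100'), (2, 'R-200');
    INSERT INTO different_documents VALUES (7, 'R-200'), (8, 'R-300');
""")

# Take every row from documents, then only those rows from different_documents
# whose registration_number is not already present in documents.
not_exists_sql = """
    SELECT d.id, d.registration_number
    FROM documents d
    UNION ALL
    SELECT dd.id, dd.registration_number
    FROM different_documents dd
    WHERE NOT EXISTS (SELECT 1
                      FROM documents d2
                      WHERE d2.registration_number = dd.registration_number)
    ORDER BY 2
"""
merged_rows = conn.execute(not_exists_sql).fetchall()
print(merged_rows)
```

Row (7, 'R-200') is dropped because documents already holds R-200. With more than two source tables, each later branch would need a NOT EXISTS check against every earlier table, which is where the ROW_NUMBER form becomes easier to maintain.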

Perform Simple Group By in Google Big Query

I have the simplest query on Google BigQuery that keeps returning an error:
Grouping by expressions of type STRUCT is not allowed
I am simply trying to select a list of emails from two locations, union them in one CTE, and count frequency in the CTE to identify duplicates.
This should be very easy. What am I missing?
with a as (select properties.email as email, 'loc1' as tag from `loc1.contacts`),
b as (select properties.email as email, 'loc2' as tag from `loc2.contacts`),
c as (
select * from a
union all
select * from b
)
select email, count(email) from c group by 1
sample data:
email/tag
bob@email.com/loc1
bob@email.com/loc2
expected results:
email/count
bob@email.com/2
It looks like I needed to add .value to actually get the value of the email field (properties.email is itself a STRUCT, which cannot be grouped on). The following query worked as desired:
with a as (select properties.email.value as email, 'loc1' as tag from `loc1.contacts`),
b as (select properties.email.value as email, 'loc2' as tag from `loc2.contacts`),
c as (
select * from a
union all
select * from b
)
select email, count(email) from c group by 1

Count two columns in one

I have two columns (user_from and user_to) and I need to know how many different users appear in my database. What is a good, fast way to do that?
I'm using PostgreSQL, btw.
select distinct tmp.UserName from
(
select distinct user_from as UserName from YourTable
union
select distinct user_To as UserName from YourTable
) as tmp;
This query is quite sufficient to get the list of users:
select user_from as UserName
from t
union -- intentional to remove duplicates
select user_To as UserName
from t;
If you want the count, then:
select count(*)
from (select user_from as UserName
from t
union
select user_To as UserName
from t
) t;
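To see the count query at work, here is a short sketch on invented rows. It runs on SQLite via Python purely for convenience; the same SQL should behave the same way on PostgreSQL.

```python
import sqlite3

# Hypothetical message log: each row is one (sender, recipient) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (user_from TEXT, user_to TEXT);
    INSERT INTO t VALUES ('ann', 'bob'), ('bob', 'cat'), ('ann', 'cat'), ('dan', 'ann');
""")

# UNION (without ALL) already removes duplicates, so COUNT(*) counts distinct users.
count_users_sql = """
    SELECT COUNT(*)
    FROM (SELECT user_from AS UserName FROM t
          UNION
          SELECT user_to AS UserName FROM t) AS u
"""
user_count = conn.execute(count_users_sql).fetchone()[0]
print(user_count)  # ann, bob, cat, dan
```

Because UNION deduplicates across both branches, the extra DISTINCT keywords in the first answer are not needed.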

SQL SELECT Full Row with Duplicated Data in One Column

I am using Microsoft SQL Server 2014.
I am able to list emails which are duplicated.
But I am unable to list the entire row, which contain other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
May I know how I can list all fields in the rows whose Email appears more than once in the table?
Thank you.
Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1
select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )
SQL Fiddle
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we then select only those records with noDuplicates greater than 1, i.e. where there are multiple records with the same email address.
Finally, I've sorted by Email and EmployeeId so that duplicates are listed alongside one another and are presented in the sequence in which they were (presumably) created, just to make life easier for whoever has to deal with these results.
If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);
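The three approaches in this thread (window count, IN with GROUP BY, and EXISTS) can be compared side by side with a sketch like this one. The rows are invented, and SQLite (3.25+ for the window function) stands in for SQL Server 2014, so treat it as an illustration rather than a tested T-SQL script.

```python
import sqlite3

# Hypothetical employees; two of them share the same email address.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeProfile (
        EmployeeId INTEGER PRIMARY KEY,
        Username   TEXT,
        Email      TEXT
    );
    INSERT INTO EmployeeProfile VALUES
        (1, 'asmith',  'a@corp.test'),
        (2, 'bjones',  'b@corp.test'),
        (3, 'asmith2', 'a@corp.test'),
        (4, 'clee',    'c@corp.test');
""")

queries = {
    # Count rows per email with a window function, then filter.
    "window": """
        WITH DataSource AS (
            SELECT *, COUNT(*) OVER (PARTITION BY Email) AS count_calc
            FROM EmployeeProfile
        )
        SELECT EmployeeId FROM DataSource
        WHERE count_calc > 1
        ORDER BY EmployeeId
    """,
    # Find duplicated emails first, then pull the matching rows.
    "in_group_by": """
        SELECT EmployeeId FROM EmployeeProfile
        WHERE Email IN (SELECT Email FROM EmployeeProfile
                        GROUP BY Email HAVING COUNT(*) > 1)
        ORDER BY EmployeeId
    """,
    # Keep a row if some other row (different EmployeeId) has the same email.
    "exists": """
        SELECT ep.EmployeeId FROM EmployeeProfile ep
        WHERE EXISTS (SELECT 1 FROM EmployeeProfile ep1
                      WHERE ep1.Email = ep.Email
                        AND ep1.EmployeeId <> ep.EmployeeId)
        ORDER BY ep.EmployeeId
    """,
}
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

The sketch selects only EmployeeId to keep the comparison short; swapping in SELECT * (or ep.*) returns the full rows, which is what the question asks for.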

Recursive CTE including unions inside anchor and recursive expression

So I have three tables with the following schema:
Users(id, name)
Colleagues(id1, id2)
Friends(id1, id2)
And I need to write a query that returns every pair of ids such that id_2 can be reached from id_1 using an arbitrary number of connections between colleagues and friends.
I worked out a query that gives me every connection using either Colleagues or Friends, but not both.
This is what I came up with when trying to use both tables in the same CTE:
WITH RECURSIVE Reachable (id_1, id_2)
AS (
SELECT
*
FROM (
SELECT
id,
FRIENDS.id2
FROM
USERS,
FRIENDS
WHERE
FRIENDS.id1 = USERS.id
UNION
SELECT
id,
COLLEAGUES.id2
FROM
USERS,
COLLEAGUES
WHERE
COLLEAGUES.id1 = USERS.id)
UNION
SELECT
*
FROM (
SELECT
REACHABLE.id_1,
FRIENDS.id2
FROM
REACHABLE,
FRIENDS
WHERE
REACHABLE.id_2 = FRIENDS.id1
UNION
SELECT
REACHABLE.id_1,
COLLEAGUES.id2
FROM
REACHABLE,
COLLEAGUES
WHERE
REACHABLE.id_2 = COLLEAGUES.id1));
But I'm getting this error:
Error: near line 1: recursive reference in a subquery: Reachable
Does that mean I can't/shouldn't use subqueries in a recursive call in general? Is it even possible to perform this query inside the same CTE? If so, how could I do it?
Thanks in advance!
The reference to the recursive CTE must not be in a subquery, and the two parts separated by UNION (ALL) must be at the top level of the WITH.
If there is no difference between friends and colleagues for this query, just merge the two tables before doing the recursive CTE:
WITH RECURSIVE
Connections AS (
SELECT id1 AS id_1, id2 AS id_2 FROM Colleagues
UNION ALL
SELECT id1, id2 FROM Friends
),
Reachable(id_1, id_2) AS (
SELECT ...
FROM Users, Connections
...
UNION
...
)
SELECT * FROM Reachable;
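One possible way to complete that outline is sketched below, run on SQLite through Python. The anchor and recursive SELECTs shown here are an assumption about what the elided parts could look like, not the answerer's exact query. The sketch also skips the join against Users, on the assumption that every id in Colleagues and Friends refers to an existing user, and it follows edges only in the stored direction.

```python
import sqlite3

# Hypothetical edges: 1 -> 2 is a colleague link, 2 -> 3 is a friend link.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Colleagues (id1 INTEGER, id2 INTEGER);
    CREATE TABLE Friends    (id1 INTEGER, id2 INTEGER);
    INSERT INTO Colleagues VALUES (1, 2);
    INSERT INTO Friends    VALUES (2, 3);
""")

reachable_sql = """
    WITH RECURSIVE
      -- Merge both edge tables first, outside the recursive CTE.
      Connections(id_1, id_2) AS (
        SELECT id1, id2 FROM Colleagues
        UNION ALL
        SELECT id1, id2 FROM Friends
      ),
      Reachable(id_1, id_2) AS (
        -- Anchor: every direct connection.
        SELECT id_1, id_2 FROM Connections
        UNION
        -- Recursive step: extend a known path by one more connection.
        -- Reachable is referenced once, at the top level, not in a subquery.
        SELECT r.id_1, c.id_2
        FROM Reachable AS r
        JOIN Connections AS c ON c.id_1 = r.id_2
      )
    SELECT id_1, id_2 FROM Reachable ORDER BY id_1, id_2
"""
pairs = conn.execute(reachable_sql).fetchall()
print(pairs)
```

The pair (1, 3) is only reachable by mixing a colleague link with a friend link, which is the case the original query could not express. Using UNION rather than UNION ALL between the anchor and the recursive step discards pairs that were already found, so the recursion also stops on cyclic data.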