How can I create a Running Total Sumifs-like function in SQL?

I'm pretty new to SQL, but Excel has become far too slow to continue working with, so I'm trying SQLiteStudio. I'm looking to create a column in a query showing running total over time (characterized as Schedule Points, marking each percent through a project's run time). Complete marks whether a Location has completed the install (Y/NULL), and is used simply to filter out incomplete locations from further calculations.
I currently have
With cte as(
Select [Location]
,[HW/NonHW]
,[Obligation/Actual]
,[Schedule Point]
,[CY20$]
,[Vendor Name]
,[Vendor Zip Code]
,[Complete]
,[System Rollup (Import)]
,IIf([Complete] = "Y", [CY20$], 0) As [Completed Costs]
FROM data)
Select [Location]
,[HW/NonHW]
,[Obligation/Actual]
,[Schedule Point]
,[CY20$]
,[Vendor Name]
,[Vendor Zip Code]
,[Complete]
,[System Rollup (Import)]
,[Completed Costs]
,SUM([Completed Costs]) OVER (PARTITION BY [Obligation/Actual], [Normalized Schedule Location 1%],[System Rollup (Import)], [HW/NonHW]) As [CY20$ Summed]
FROM cte
At this point, what I'm looking to do is a sum not for each Schedule Point, but all prior Schedule Points (i.e. the <= operator in an Excel sumifs statement)
For reference, here is the sumifs I am trying to replicate:
=SUMIFS($N$2:$N$541790,$AU$2:$AU$541790,"Y",$AQ$2:$AQ$541790,AQ2,$AI$2:$AI$541790,AI2,$AH$2:$AH$541790,AH2,$AJ$2:$AJ$541790, "<=" & AJ2)
N is CY20$, AU is Complete, AQ is System, AI is Obligation/Actual, AH is HW/NonHW, AJ is Schedule Point.
Any help would be appreciated!

The equivalent to SUMIFS is a combination of SUM and CASE-WHEN in SQL.
Abstract example:
SELECT
SUM(
CASE
WHEN <condition1> AND <condition2> AND <condition3> THEN 1
ELSE 0
END
)
FROM yourtable
In the above, condition1, condition2 and condition3 are logical expressions; the < and > merely mark placeholders and are not part of the syntax. You do not need exactly three conditions; you can have as many as you like. Nor do you have to combine them with AND; you can build whatever logical expression you need. I used AND because SUMIFS requires all of its criteria to hold at once, i.e. a conjunction. Note that THEN 1 counts the matching rows; to add up a column the way SUMIFS does, put that column after THEN instead of 1.
A more concrete example:
CREATE TABLE person(
number int,
name text,
age int
);
INSERT INTO person(number, name, age)
VALUES(1, 'Joe', 12);
INSERT INTO person(number, name, age)
VALUES(2, 'Jane', 12);
INSERT INTO person(number, name, age)
VALUES(3, 'Robert', 16);
INSERT INTO person(number, name, age)
VALUES(4, 'Roberta', 15);
INSERT INTO person(number, name, age)
VALUES(5, 'Blian', 18);
INSERT INTO person(number, name, age)
VALUES(6, 'Bigusdqs', 19);
SELECT
SUM(
CASE
WHEN age <= 16 AND name <> 'Joe' THEN 1
ELSE 0
END
) AS MySUMIFS
FROM person;
EDIT
If we want to know, for each person, how many other people are the same age or younger, we can use a self-join:
SELECT
SUM(
CASE
WHEN p2.age <= p1.age THEN 1
ELSE 0
END
) AS MySUMIFS, p1.name
FROM person p1
JOIN person p2
ON p1.name <> p2.name
GROUP BY p1.name;
EDIT2
Created a Fiddle based on the ideas described above, you can reach it at https://dbfiddle.uk/?rdbms=sqlite_3.27&fiddle=3cb0232e5d669071a3aa5bb1df68dbca
The code in the fiddle:
CREATE TABLE person(
number int,
name text,
age int
);
INSERT INTO person(number, name, age)
VALUES(1, 'Joe', 12);
INSERT INTO person(number, name, age)
VALUES(2, 'Jane', 12);
INSERT INTO person(number, name, age)
VALUES(3, 'Robert', 16);
INSERT INTO person(number, name, age)
VALUES(4, 'Roberta', 15);
INSERT INTO person(number, name, age)
VALUES(5, 'Blian', 18);
INSERT INTO person(number, name, age)
VALUES(6, 'Bigusdqs', 19);
SELECT
SUM(
CASE
WHEN p2.age <= p1.age THEN 1
ELSE 0
END
) AS MySUMIFS, p1.name
FROM person p1
JOIN person p2
ON p1.name <> p2.name
GROUP BY p1.name;
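For the running total asked about in the question, a windowed SUM with an ORDER BY may be a more direct fit than a self-join. The sketch below is a minimal illustration, not the asker's actual query: the table `data` and its columns `system`, `schedule_point`, `complete` and `cost` are simplified stand-ins for the real columns, and it assumes SQLite 3.25 or later (the first release with window functions), run here through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the asker's table.
conn.executescript("""
CREATE TABLE data (system TEXT, schedule_point INTEGER, complete TEXT, cost REAL);
INSERT INTO data VALUES
  ('A', 1, 'Y', 10), ('A', 2, 'Y', 20), ('A', 2, NULL, 99), ('A', 3, 'Y', 5),
  ('B', 1, 'Y', 7),  ('B', 2, 'Y', 3);
""")

# PARTITION BY plays the role of the SUMIFS equality criteria,
# ORDER BY plays the role of the "<=" criterion on the schedule point,
# and the CASE expression keeps only completed rows in the sum.
rows = conn.execute("""
SELECT system, schedule_point, complete, cost,
       SUM(CASE WHEN complete = 'Y' THEN cost ELSE 0 END)
         OVER (PARTITION BY system ORDER BY schedule_point) AS running_total
FROM data
ORDER BY system, schedule_point, cost
""").fetchall()

for row in rows:
    print(row)
```

With an ORDER BY and no explicit frame, the default window frame runs from the start of the partition through the current row and its peers, so rows sharing a schedule point all receive the same total, which is how the "<=" criterion in SUMIFS behaves. Further SUMIFS equality criteria would go into PARTITION BY.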

Related

A Better Way for Conditional Counting?

Imagine I have a CTE that creates a result set containing all of the information I need and need to do a bunch of conditional counting on the result set. Is there a better way to do it than a bunch of subqueries?
I can't use count() over () either as I need to sometimes do a distinct count on values and using a case when val=true then 1 else null end to conditionally count doesn't let me distinctly count, not to mention that it is basically the same as doing a bunch of subqueries.
Any recommendations, or is creating a bunch of subqueries the way to go?
(example SQL Fiddle)
Table Definitions
create table person (id int, name varchar2(20), age int, cityID int);
create table city (id int, name varchar2(20), stateID int);
create table state (id int, name varchar2(20));
insert into person values(1, 'Bob', 45, 1);
insert into person values(2, 'Joe', 33, 1);
insert into person values(3, 'Craig', 20, 1);
insert into person values(4, 'Alex', 45, 2);
insert into person values(5, 'Kevin', 33, 3);
insert into city values(1, 'Chicago', 1);
insert into city values(2, 'New York', 2);
insert into city values(3, 'Los Angeles', 3);
insert into state values(1, 'Illinois');
insert into state values(2, 'New York');
insert into state values(3, 'California');
SQL Query Example
with cte as (
select p.name pName
, p.age pAge
, c.name cName
, s.name sName
from person p
inner join city c
on p.cityID = c.ID
inner join state s
on c.stateID = s.ID
)
select distinct
(select count(*) from cte) totalRows
, (select count(*) from cte where pAge = 45) total45YO
, (select count(*) from cte where cName like 'Chicago') totalChicago
, (select count(distinct cName) from cte) totalCities
from cte
An example output I would hope for
TOTALROWS TOTAL45YO TOTALCHICAGO TOTALCITIES
------------------------------------------------------
5 2 3 3
The easiest way, as @jarlh mentions, is to use CASE/SUM combinations, as follows.
select count(*) totalRows
, sum(case when p.age=45 then 1 else 0 end) total45YO
, sum(case when c.name like 'Chicago' then 1 else 0 end) totalChicago
, count(distinct c.name) totalCities
from person p
inner join city c
on p.cityID = c.ID
inner join state s
on c.stateID = s.ID;
TOTALROWS TOTAL45YO TOTALCHICAGO TOTALCITIES
____________ ____________ _______________ ______________
5 2 3 3
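The distinct conditional count raised in the question can be folded into the same single pass, because COUNT(DISTINCT ...) ignores NULLs and a CASE without ELSE yields NULL for non-matching rows. A minimal sketch using the question's sample data, run through Python's sqlite3 module (the choice of SQLite and the extra `cities45YO` column are assumptions added for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INT, name TEXT, age INT, cityID INT);
CREATE TABLE city (id INT, name TEXT, stateID INT);
INSERT INTO person VALUES (1,'Bob',45,1),(2,'Joe',33,1),(3,'Craig',20,1),
                          (4,'Alex',45,2),(5,'Kevin',33,3);
INSERT INTO city VALUES (1,'Chicago',1),(2,'New York',2),(3,'Los Angeles',3);
""")

# One scan of the joined rows; every count is a plain or conditional aggregate.
counts = conn.execute("""
SELECT COUNT(*) AS totalRows,
       SUM(CASE WHEN p.age = 45 THEN 1 ELSE 0 END) AS total45YO,
       SUM(CASE WHEN c.name = 'Chicago' THEN 1 ELSE 0 END) AS totalChicago,
       COUNT(DISTINCT c.name) AS totalCities,
       -- hypothetical extra column: distinct cities among the 45-year-olds
       COUNT(DISTINCT CASE WHEN p.age = 45 THEN c.name END) AS cities45YO
FROM person p
JOIN city c ON p.cityID = c.id
""").fetchone()

print(counts)
```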

SQL query for querying counts from a table

The task is to write a SQL query that finds the name and ID of every student who attends all lectures with more than 4 ECTS.
The tables are
CREATE TABLE CLASS (
STUDENT_ID INT NOT NULL,
LECTURE_ID INT NOT NULL
);
CREATE TABLE STUDENT (
STUDENT_ID INT NOT NULL,
STUDENT_NAME VARCHAR(255),
PRIMARY KEY (STUDENT_ID)
);
CREATE TABLE LECTURE (
LECTURE_ID INT NOT NULL,
LECTURE_NAME VARCHAR(255),
ECTS INT,
PRIMARY KEY (LECTURE_ID)
);
I came up with this query but this didn't seem to work on SQLFIDDLE. I'm new to SQL and this query has been a little troublesome for me. How would you query this?
SELECT STUD.STUDENT_NAME FROM STUDENT STUD
INNER JOIN CLASS CLS AND LECTURE LEC ON
CLS.STUDENT_ID = STUD.STUDENT_ID
WHERE LEC.CTS > 4
How do I fix this query?
UPDATE
insert into STUDENT values(1, 'wick', 20);
insert into STUDENT values(2, 'Drake', 25);
insert into STUDENT values(3, 'Bake', 42);
insert into STUDENT values(4, 'Man', 5);
insert into LECTURE values(1, 'Math', 6);
insert into LECTURE values(2, 'Prog', 6);
insert into LECTURE values(3, 'Physics', 1);
insert into LECTURE values(4, '4ects', 4);
insert into LECTURE values(5, 'subj', 4);
insert into SCLASS values(1, 3);
insert into SCLASS values(1, 2);
insert into SCLASS values(2, 3);
insert into SCLASS values(3, 1);
insert into SCLASS values(3, 2);
insert into SCLASS values(3, 3);
insert into SCLASS values(4, 4);
insert into SCLASS values(4, 5);
The following approach might get the job done.
It works by generating two subqueries:
one that counts, for each student, how many lectures with ects greater than 4 that student attends
another that counts the total number of lectures with ects greater than 4
Then the outer query keeps only the students whose count equals the total:
SELECT x.student_id, x.student_name
FROM
(
SELECT s.student_id, s.student_name, COUNT(DISTINCT l.lecture_id) cnt
FROM
student s
INNER JOIN class c ON c.student_id = s.student_id
INNER JOIN lecture l ON l.lecture_id = c.lecture_id
WHERE l.ects > 4
GROUP BY s.student_id, s.student_name
) x
CROSS JOIN (SELECT COUNT(*) cnt FROM lecture WHERE ects > 4 ) y
WHERE x.cnt = y.cnt ;
As GMB already said in their answer: count the required lectures and compare that with the number taken per student. Here is another way to write such a query. We outer join classes to all lectures with ECTS > 4. Analytic window functions allow us to aggregate by two different groups at the same time (here: all rows and each student's rows).
select *
from student
where (student_id, 0) in -- 0 means no gap between required and taken lectures
(
select
student_id,
count(distinct lecture_id) over () -
count(distinct lecture_id) over (partition by c.student_id) as gap
from lecture l
left join class c using (lecture_id)
where l.ects > 4
);
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=74371314913565243863c225847eb044
You can try the following query.
SELECT distinct
STUD.STUDENT_NAME,
STUD.STUDENT_ID
FROM STUDENT STUD
INNER JOIN CLASS CLS ON CLS.STUDENT_ID = STUD.STUDENT_ID
INNER JOIN LECTURE LEC ON LEC.LECTURE_ID=CLS.LECTURE_ID
where LEC.ECTS > 4 group by STUD.STUDENT_ID,STUD.STUDENT_NAME
having COUNT(STUD.STUDENT_ID) =(SELECT COUNT(*) FROM LECTURE WHERE ECTS > 4)
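Another common way to express "attends all lectures" is a double NOT EXISTS: keep a student when no qualifying lecture exists that the student does not attend. The sketch below is an illustration rather than any answerer's code. It runs on SQLite through Python's sqlite3 module and adapts the question's sample rows; naming the table class throughout and dropping the third value from the student inserts are both assumptions made so the data matches the posted table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INT PRIMARY KEY, student_name TEXT);
CREATE TABLE lecture (lecture_id INT PRIMARY KEY, lecture_name TEXT, ects INT);
CREATE TABLE class (student_id INT NOT NULL, lecture_id INT NOT NULL);
INSERT INTO student VALUES (1,'wick'),(2,'Drake'),(3,'Bake'),(4,'Man');
INSERT INTO lecture VALUES (1,'Math',6),(2,'Prog',6),(3,'Physics',1),
                           (4,'4ects',4),(5,'subj',4);
INSERT INTO class VALUES (1,3),(1,2),(2,3),(3,1),(3,2),(3,3),(4,4),(4,5);
""")

# "No lecture with ects > 4 exists that this student is not enrolled in."
result = conn.execute("""
SELECT s.student_id, s.student_name
FROM student s
WHERE NOT EXISTS (
    SELECT 1
    FROM lecture l
    WHERE l.ects > 4
      AND NOT EXISTS (
          SELECT 1
          FROM class c
          WHERE c.student_id = s.student_id
            AND c.lecture_id = l.lecture_id
      )
)
ORDER BY s.student_id
""").fetchall()

print(result)
```

Unlike the counting approaches, this form returns every student when no lecture has more than 4 ECTS, since the condition is then vacuously true.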

SQL Query: Display the Staff Id, fname, last name, Malpractice Date, Malpractice Desc for all doctors that have had more than 2 cases in the last year

I have tried to solve this for a few hours and I am close but haven't been able to get it to work. Here is what I have now:
SELECT
Doctor_ID, s.firstname,s.lastname
FROM
MALPRACTICE M
INNER JOIN
STAFF S ON S.STAFF_ID = M.DOCTOR_ID
WHERE
Malpractice_date >= '01/01/2017'
GROUP BY
doctor_id, s.firstname, s.lastname
When I add malpractice_date and malpractice_desc to the select statement, it yields no result. What am I missing?
Your SQL is close, but the point of group by is to use aggregate functions such as sum, avg, min, max and so on. You then can use having (which is a kind of where after group by) on sum. This works for at least SQLite:
create table staff (staff_id, firstname, lastname);
create table malpractice (doctor_id, malpractice_date);
insert into staff values (1, 'Dr', 'Phil');
insert into staff values (2, 'Doc', 'Hollywood');
insert into staff values (3, 'Doctor', 'Dolittle');
insert into malpractice values (1, '2017-02-28');
insert into malpractice values (1, '2018-01-20');
insert into malpractice values (1, '2018-02-12');
insert into malpractice values (1, '2018-03-01');
insert into malpractice values (1, '2018-04-22');
insert into malpractice values (2, '2016-03-01');
insert into malpractice values (2, '2017-12-01');
insert into malpractice values (2, '2018-04-22');
select m.doctor_id, s.firstname, s.lastname, sum(1) malpractices_last_year
from malpractice m
join staff s on s.staff_id=m.doctor_id
where strftime('%J','now') - strftime('%J',m.malpractice_date) <= 365
group by m.doctor_id, s.firstname, s.lastname
having sum(1) > 2;
Test this SQL yourself, or change it, at: http://sqlfiddle.com/#!7/600da/11
(I interpreted in the last year as being within the last 365 days)
One thing: If your database stores dates as strings (not recommended, unless you're using SQLite, where it's necessary), which Malpractice_date >= '01/01/2017' suggests, then you should at least make the dates sortable by putting year before month and month before day: 22nd April 2018 is then 2018-04-22.
You didn't mention which database system you use, and they are notorious for treating dates differently. In SQLite strftime is often used for "doing math" on date strings, on other databases other functions will be needed.
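Both points about date strings can be checked quickly. A small sketch with SQLite through Python's sqlite3 module (the literal dates are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Month-first strings compare character by character, so a 2016 date
# can appear to be "on or after" a 2017 date.
us_style = conn.execute("SELECT '12/31/2016' >= '01/01/2017'").fetchone()[0]

# Year-first (ISO 8601) strings sort in true chronological order.
iso_style = conn.execute("SELECT '2016-12-31' >= '2017-01-01'").fetchone()[0]

# julianday() converts an ISO date string to a day number,
# so subtracting two results gives the distance in days.
days = conn.execute(
    "SELECT julianday('2018-04-22') - julianday('2017-04-22')"
).fetchone()[0]

print(us_style, iso_style, days)
```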
Try prefixing all of your columns with the table aliases s and m.

oracle correlated subquery using distinct listagg

I have an interesting query I'm trying to figure out. I have a view which is getting a column added to it. This column is pivoted data coming from other tables, to form into a single row. Now, I need to wipe out duplicate entries in this pivoted data. Listagg is great for getting the data to a single row, but I need to make it unique. While I know how to make it unique, I'm tripping up on the fact that correlated sub-queries only go 1 level deep. So... not really sure how to get a distinct list of values. I can get it to work if I don't do the distinct just fine. Anyone out there able to work some SQL magic?
Sample data:
drop table test;
drop table test_widget;
create table test (id number, description Varchar2(20));
create table test_widget (widget_id number, test_fk number, widget_type varchar2(20));
insert into test values(1, 'cog');
insert into test values(2, 'wheel');
insert into test values(3, 'spring');
insert into test_widget values(1, 1, 'A');
insert into test_widget values(2, 1, 'A');
insert into test_widget values(3, 1, 'B');
insert into test_widget values(4, 1, 'A');
insert into test_widget values(5, 2, 'C');
insert into test_widget values(6, 2, 'C');
insert into test_widget values(7, 2, 'B');
insert into test_widget values(8, 3, 'A');
insert into test_widget values(9, 3, 'C');
insert into test_widget values(10, 3, 'B');
insert into test_widget values(11, 3, 'B');
insert into test_widget values(12, 3, 'A');
commit;
Here is an example of the query that works, but shows duplicate data:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM TEST_WIDGET
WHERE TEST_FK = A.ID) widget_types
FROM TEST A
Here is an example of what does NOT work due to the depth of where I try to reference the ID:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM (SELECT DISTINCT WIDGET_TYPE
FROM TEST_WIDGET
WHERE TEST_FK = A.ID))
WIDGET_TYPES
FROM TEST A
Here is what I want displayed:
1 cog A, B
2 wheel B, C
3 spring A, B, C
If anyone knows off the top of their head, that would be fantastic! Otherwise, I can post up some sample create statements to help you with dummy data to figure out the query.
You can apply the distinct in a subquery, which also has the join - avoiding the level issue:
SELECT ID
, DESCRIPTION
, LISTAGG (WIDGET_TYPE, ', ')
WITHIN GROUP (ORDER BY WIDGET_TYPE) AS widget_types
FROM (
SELECT DISTINCT A.ID, A.DESCRIPTION, B.WIDGET_TYPE
FROM TEST A
JOIN TEST_WIDGET B
ON B.TEST_FK = A.ID
)
GROUP BY ID, DESCRIPTION
ORDER BY ID;
ID DESCRIPTION WIDGET_TYPES
---------- -------------------- --------------------
1 cog A, B
2 wheel B, C
3 spring A, B, C
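The same shape carries over to engines without LISTAGG. A rough sketch for SQLite, run through Python's sqlite3 module, where group_concat stands in for LISTAGG (this translation is an assumption added for illustration and is not part of the Oracle answer). SQLite does not promise the order in which a plain group_concat joins its values, so the sketch sorts each list in Python before showing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER, description TEXT);
CREATE TABLE test_widget (widget_id INTEGER, test_fk INTEGER, widget_type TEXT);
INSERT INTO test VALUES (1,'cog'),(2,'wheel'),(3,'spring');
INSERT INTO test_widget VALUES
  (1,1,'A'),(2,1,'A'),(3,1,'B'),(4,1,'A'),
  (5,2,'C'),(6,2,'C'),(7,2,'B'),
  (8,3,'A'),(9,3,'C'),(10,3,'B'),(11,3,'B'),(12,3,'A');
""")

# De-duplicate in the inner query (which also does the join),
# then aggregate the already-distinct rows in the outer query.
rows = conn.execute("""
SELECT id, description, group_concat(widget_type, ', ') AS widget_types
FROM (
    SELECT DISTINCT a.id, a.description, b.widget_type
    FROM test a
    JOIN test_widget b ON b.test_fk = a.id
)
GROUP BY id, description
ORDER BY id
""").fetchall()

# Sort the pieces in Python, since group_concat's order is not guaranteed.
widget_lists = [(i, d, sorted(w.split(", "))) for i, d, w in rows]
print(widget_lists)
```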
I was in a unique situation using the Pentaho reports writer and some inconsistent data. The Pentaho writer uses Oracle to query data, but has limitations. The data pieces were unique but not classified in a consistent manner, so I created a nested listagg inside of a left join to present the data the way I wanted to:
left join
(
select staff_id, listagg(thisThing, ' --- '||chr(10) ) within group (order by this) as SCHED_1 from
(
SELECT
staff_id, RPT_STAFF_SHIFTS.ORGANIZATION||': '||listagg(
RPT_STAFF_SHIFTS.DAYS_OF_WEEK
, ',' ) within group (order by BEGIN_DATE desc)
as thisThing
FROM "RPT_STAFF_SHIFTS" where "RPT_STAFF_SHIFTS"."END_DATE" is null
group by staff_id, organization)
group by staff_id
) schedule_1 on schedule_1.staff_id = "RPT_STAFF"."STAFF_ID"
where "RPT_STAFF"."STAFF_ID" ='555555'
This is a different approach from the nested query, but in some situations it might work better: it takes the level issue into account when developing the query and takes an extra step to fully concatenate the results.

How to detect duplicate records with sub table records

Let's say I'm creating an address book in which the main table contains the basic contact information and a phone number sub table -
Contact
===============
Id [PK]
Name
PhoneNumber
===============
Id [PK]
Contact_Id [FK]
Number
So, a Contact record may have zero or more related records in the PhoneNumber table. There is no constraint on uniqueness of any column other than the primary keys. In fact, this must be true because:
Two contacts having different names may share a phone number, and
Two contacts may have the same name but different phone numbers.
I want to import a large dataset which may contain duplicate records into my database and then filter out the duplicates using SQL. The rules for identifying duplicate records are simple ... they must share the same name and the same number of phone records having the same content.
Of course, this works quite effectively for selecting duplicates from the Contact table but doesn't help me to detect actual duplicates given my rules:
SELECT * FROM Contact
WHERE EXISTS
(SELECT 'x' FROM Contact t2
WHERE t2.Name = Contact.Name AND
t2.Id > Contact.Id);
It seems as if what I want is a logical extension to what I already have, but I must be overlooking it. Any help?
Thanks!
In my question, I created a greatly simplified schema that reflects the real-world problem I'm solving. Przemyslaw's answer is indeed a correct one and did what I was asking both with the sample schema and, when extended, with the real one.
But, after doing some experiments with the real schema and a larger (~10k records) dataset, I found that performance was an issue. I don't claim to be an index guru, but I wasn't able to find a better combination of indices than what was already in the schema.
So, I came up with an alternate solution which fills the same requirements but executes in a small fraction (< 10%) of the time, at least using SQLite3 - my production engine. In hopes that it may assist someone else, I'll offer it as an alternative answer to my question.
DROP TABLE IF EXISTS Contact;
DROP TABLE IF EXISTS PhoneNumber;
CREATE TABLE Contact (
Id INTEGER PRIMARY KEY,
Name TEXT
);
CREATE TABLE PhoneNumber (
Id INTEGER PRIMARY KEY,
Contact_Id INTEGER REFERENCES Contact (Id) ON UPDATE CASCADE ON DELETE CASCADE,
Number TEXT
);
INSERT INTO Contact (Id, Name) VALUES
(1, 'John Smith'),
(2, 'John Smith'),
(3, 'John Smith'),
(4, 'Jane Smith'),
(5, 'Bob Smith'),
(6, 'Bob Smith');
INSERT INTO PhoneNumber (Id, Contact_Id, Number) VALUES
(1, 1, '555-1212'),
(2, 1, '222-1515'),
(3, 2, '222-1515'),
(4, 2, '555-1212'),
(5, 3, '111-2525'),
(6, 4, '111-2525');
COMMIT;
SELECT *
FROM Contact c1
WHERE EXISTS (
SELECT 1
FROM Contact c2
WHERE c2.Id > c1.Id
AND c2.Name = c1.Name
AND (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c2.Id) = (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c1.Id)
AND (
SELECT COUNT(*)
FROM PhoneNumber p1
WHERE p1.Contact_Id = c2.Id
AND EXISTS (
SELECT 1
FROM PhoneNumber p2
WHERE p2.Contact_Id = c1.Id
AND p2.Number = p1.Number
)
) = (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c1.Id)
)
;
The results are as expected:
Id Name
====== =============
1 John Smith
5 Bob Smith
Other engines are bound to have differing performance which may be quite acceptable. This solution seems to work quite well with SQLite for this schema.
The author stated the requirement of "two people being the same person" as:
Having the same name and
Having the same number of phone numbers and all of which are the same.
So the problem is a bit more complex than it seems (or maybe I just overthought it).
Below are sample data and a sample query (an ugly one, I know, but the general idea is there). I tested it on the data below and it seems to work correctly (I'm using Oracle 11g R2):
CREATE TABLE contact (
id NUMBER PRIMARY KEY,
name VARCHAR2(40))
;
CREATE TABLE phone_number (
id NUMBER PRIMARY KEY,
contact_id REFERENCES contact (id),
phone VARCHAR2(10)
);
INSERT INTO contact (id, name) VALUES (1, 'John');
INSERT INTO contact (id, name) VALUES (2, 'John');
INSERT INTO contact (id, name) VALUES (3, 'Peter');
INSERT INTO contact (id, name) VALUES (4, 'Peter');
INSERT INTO contact (id, name) VALUES (5, 'Mike');
INSERT INTO contact (id, name) VALUES (6, 'Mike');
INSERT INTO contact (id, name) VALUES (7, 'Mike');
INSERT INTO phone_number (id, contact_id, phone) VALUES (1, 1, '123'); -- John having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (2, 1, '456'); -- John having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (3, 2, '123'); -- John the second having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (4, 2, '456'); -- John the second having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (5, 3, '123'); -- Peter having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (6, 3, '456'); -- Peter having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (7, 3, '789'); -- Peter having number 789
INSERT INTO phone_number (id, contact_id, phone) VALUES (8, 4, '456'); -- Peter the second having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (9, 5, '123'); -- Mike having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (10, 5, '456'); -- Mike having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (11, 6, '123'); -- Mike the second having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (12, 6, '789'); -- Mike the second having number 789
-- Mike the third having no number
COMMIT;
-- does not meet the requirements described in the question - will return Peter when it should not
SELECT DISTINCT c.name
FROM contact c JOIN phone_number pn ON (pn.contact_id = c.id)
GROUP BY c.name, pn.phone
HAVING COUNT(c.id) > 1
;
-- returns correct results for provided test data
-- take all people that have a namesake in contact table and
-- take all this person's phone numbers that this person's namesake also has
-- finally (outer query) check that the number of both persons' phone numbers is the same and
-- the number of the same phone numbers is equal to the number of (either) person's phone numbers
SELECT c1_id, name
FROM (
SELECT c1.id AS c1_id, c1.name, c2.id AS c2_id, COUNT(1) AS cnt
FROM contact c1
JOIN contact c2 ON (c2.id != c1.id AND c2.name = c1.name)
JOIN phone_number pn ON (pn.contact_id = c1.id)
WHERE
EXISTS (SELECT 1
FROM phone_number
WHERE contact_id = c2.id
AND phone = pn.phone)
GROUP BY c1.id, c1.name, c2.id
)
WHERE cnt = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id)
AND (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c2_id)
;
-- cleanup
DROP TABLE phone_number;
DROP TABLE contact;
Check at SQL Fiddle: http://www.sqlfiddle.com/#!4/36cdf/1
Edited
Answer to author's comment: Of course I didn't take that into account... here's a revised solution:
-- new test data
INSERT INTO contact (id, name) VALUES (8, 'Jane');
INSERT INTO contact (id, name) VALUES (9, 'Jane');
SELECT c1_id, name
FROM (
SELECT c1.id AS c1_id, c1.name, c2.id AS c2_id, COUNT(1) AS cnt
FROM contact c1
JOIN contact c2 ON (c2.id != c1.id AND c2.name = c1.name)
LEFT JOIN phone_number pn ON (pn.contact_id = c1.id)
WHERE pn.contact_id IS NULL
OR EXISTS (SELECT 1
FROM phone_number
WHERE contact_id = c2.id
AND phone = pn.phone)
GROUP BY c1.id, c1.name, c2.id
)
WHERE (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) IN (0, cnt)
AND (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c2_id)
;
We allow a situation when there are no phone numbers (LEFT JOIN) and in outer query we now compare the number of person's phone numbers - it must either be equal to 0, or the number returned from the inner query.
The keyword "having" is your friend. The generic use is:
select field1, field2, count(*) records
from whereever
where whatever
group by field1, field2
having records > 1
Whether or not you can use the alias in the having clause depends on the database engine. You should be able to apply this basic principle to your situation.
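Applied to the Contact table from the question, the pattern might look like the sketch below (SQLite through Python's sqlite3 module). It only flags repeated names and does not compare phone numbers, so it covers just the first of the question's two rules. Repeating the aggregate in HAVING instead of using the alias sidesteps the engine dependence mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Contact (Id, Name) VALUES
  (1, 'John Smith'), (2, 'John Smith'), (3, 'John Smith'),
  (4, 'Jane Smith'), (5, 'Bob Smith'), (6, 'Bob Smith');
""")

# Group by the candidate key and keep only the groups with more than one row.
repeated_names = conn.execute("""
SELECT Name, COUNT(*) AS records
FROM Contact
GROUP BY Name
HAVING COUNT(*) > 1
ORDER BY Name
""").fetchall()

print(repeated_names)
```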