SQL Query where the result has duplicate rows - sql

So I have to write a query in SQL where I list the celebrities that have been in relationships with the same celeb. I basically list out celeb1, celeb2, and celeb3 where celeb3 has been in a relationship with both celeb1 and celeb2. Here's the query I'm using:
SELECT S1.Celeb1, S2.Celeb2, S3.name AS Celeb3
FROM Relationships S1, Relationships S2, Celebs S3
WHERE S3.name = S1.Celeb2
AND S3.name = S2.Celeb1
AND S1.Celeb1 <> S2.Celeb2;
It's difficult to know if this query is correct since it gives me 200 rows in the result but I looked at a few of the row and it looks like it's giving me the correct result where celeb3 has been in a relationship with both celeb1 and 2. The problem is that there are duplicate rows in the result. This could be due to the fact that in the relationships table, it lists out the celeb1, celeb2 relationship but it also lists the inverse celeb2, celeb1 as well. So how can I prevent the result from listing out duplicates?
Here are the two tables I'm using to do this (Relationships and Celebs).
CREATE TABLE Celebs(
name VARCHAR(30)
);
CREATE TABLE Relationships (
Celeb1 VARCHAR(30),
Celeb2 VARCHAR(30)
);

Let's look at a sample:
celeb1 celeb2
A B
B C
C D
Expected result:
A and C were both with B.
B and D were both with C.
In order to find these matches I suggest to duplicate tuples such that each couple is twice in the table (if that is not already the case).
celeb1 celeb2
A B
B A
B C
C B
C D
D C
We can already see that B and C each had two partners. Join this data set with itself so as to connect the records.
with rel as
(
select celeb1 as cel1, celeb2 as cel2 from relationships
union
select celeb2 as cel1, celeb1 as cel2 from relationships
)
select rel1.cel2 as celeb1, rel2.cel2 as celeb2, rel1.cel1 as partner
from rel rel1
join rel rel2 on rel2.cel1 = rel1.cel1 and rel2.cel2 > rel1.cel2
order by 1, 2, 3;

If Celeb3 was in a relationship with A and B, you will also get B, A as a result. To avoid that, just make a constraint that A > B:
SELECT DISTINCT S1.Celeb1, S2.Celeb2, S3.name AS Celeb3
FROM Relationships S1, Relationships S2, Celebs S3
WHERE S3.name = S1.Celeb2
AND S3.name = S2.Celeb1
AND S1.Celeb1 > S2.Celeb2

Oracle Setup:
CREATE TABLE celebs ( name ) AS
SELECT 'A' FROM DUAL UNION ALL
SELECT 'B' FROM DUAL UNION ALL
SELECT 'C' FROM DUAL UNION ALL
SELECT 'D' FROM DUAL;
CREATE TABLE relationships ( celeb1, celeb2 ) AS
SELECT 'A', 'B' FROM DUAL UNION ALL
SELECT 'B', 'C' FROM DUAL UNION ALL
SELECT 'C', 'D' FROM DUAL;
Query:
SELECT DISTINCT
c.name,
CASE c.name WHEN r.celeb1 THEN r.celeb2 ELSE r.celeb1 END AS has_relationship_with
FROM celebs c
LEFT OUTER JOIN
relationships r
ON ( c.name = r.celeb1 OR c.name = r.celeb2 );
Output:
NAME HAS_RELATIONSHIP_WITH
---- ---------------------
A B
B A
B C
C B
C D
D C
If you want A,B and do not want the inverse B,A then change the ON clause for the join to:
ON ( ( c.name = r.celeb1 AND c.name < r.celeb2 )
OR ( c.name = r.celeb2 AND c.name < r.celeb1 ) )
Query 2:
You could then group this using LISTAGG to just get one row per person:
SELECT name,
LISTAGG( rel, ',' ) WITHIN GROUP ( ORDER BY rel ) AS has_relationship_with
FROM (
SELECT DISTINCT
c.name,
CASE c.name WHEN r.celeb1 THEN r.celeb2 ELSE r.celeb1 END AS rel
FROM celebs c
LEFT OUTER JOIN
relationships r
ON ( c.name = r.celeb1 OR c.name = r.celeb2 )
)
GROUP BY name;
Output:
NAME HAS_RELATIONSHIP_WITH
---- ---------------------
A B
B A,C
C B,D
D C

Related

SQL: Joining tables with WHERE clauses

I am trying to join two tables that share the same individual ID (key). The first table (a) is a 'wide' table with many variables including key and age, and the second table (b) is a 'long' table, including only the variables key, diagnosis_number, and diagnosis, where each individual can have multiple values of diagnosis.
I want to select columns key, age, and the primary diagnosis for individuals where:
diagnosis = "a", "b", or "c" when diagnosis_number = 1 [the 'primary diagnosis']
AND diagnosis = "y" for any of diagnosis_number = 2:20
I've tried:
SELECT main.key, main.age, diag.diagnosis
FROM a as main
INNER JOIN
(
SELECT prim.key, prim.diagnosis
FROM
(SELECT DISTICT key, diagnosis
FROM b
WHERE diagnosis IN ('a', 'b', 'c')
AND diagnosis_number = 1) as prim
INNER JOIN
(SELECT DISTICT key, diagnosis
FROM b
WHERE diagnosis = 'y'
AND diagnosis_number BETWEEN 2 AND 20) as sec
ON prim.key = sec.key) as diag
ON main.key = diag.key
Think this could be solved without subqueries/SELECTs inside of joins:
SQL Server
SELECT a.key, a.age, b.diagnosis
FROM a
INNER JOIN b
ON b.key = a.key
WHERE b.diagnosis IN ('a', 'b', 'c')
AND b.diagnosis_code = 1
AND a.key IN (SELECT b1.key
FROM b AS b1
WHERE b1.diagnosis = 'y'
AND b1.diagnosis_code BETWEEN 2 AND 20)

How to use array in a sql join and check if all of the array elements satisfy a condition?

I have two tables, activity and contacts in postgresql.
An activity can have multiple contacts in array form, like this
contact Id = {23,54,34}.
I want to delete a activity only if all the contact Ids of that activity are deleted in contacts table and keep the activity if at least one one contact id is still not deleted.
Deleted At is column in contacts table to check for deleted contacts. I don't want to use NOT IN.
Activity table
id contact Id
-------------------
16 {60,61,23}
15 {}
5 {59}
6 {}
You can use simple EXISTS predicate testing contacts table with activity.contacts array:
create table activity (
id int primary key,
contacts int[]
)
create table contacts (
id int primary key,
name varchar(10),
deleted boolean
)
insert into activity
select 16 as id, '{1,2,3}'::int[] as contacts union all
select 15, null union all
select 5, '{4}' union all
select 6, '{6, 5}'
insert into contacts
select 1 as id, 'q' as name, false as deleted union all
select 2, 'w', false union all
select 3, 'e', true union all
select 4, 'r', false union all
select 5, 't', true union all
select 6, 'y', true
delete
from activity a
where not exists (
select null
from contacts c
where not(c.deleted)
and c.id = any(a.contacts)
)
2 rows affected
db<>fiddle here
Use UNNEST to convert the nested-array to rows, then do an anti-join to look for broken references:
(An anti-join is when you perform a x LEFT OUTER JOIN y with a WHERE y.KeyColumn IS NULL - this gives you the opposite of an INNER JOIN with the same join criteria).
WITH unnested AS (
SELECT
Id AS ActivityId,
UNNEST( ContactId ) AS ContactIdFromArray
FROM
crmActivity
)
SELECT
u.ActivityId,
u.ContactIdFromArray AS Missing_ContactId_in_Activity
FROM
unnested AS u
LEFT OUTER JOIN contacts AS c ON
c.ContactId = u.ContactIdFromArray
WHERE
c.ContactId IS NULL
ORDER BY
u.ActivityId
I want to delete a activity only if all the contact Ids of that activity are deleted in contacts table and keep the activity if at least one one contact id is still not deleted.
This can be done with a DELETE FROM using WHERE crmActivity.Id IN with a CTE that generates the correct set of bad crmActivity.Id values, via a GROUP BY with the above query:
WITH unnested AS (
SELECT
Id AS ActivityId,
UNNEST( ContactId ) AS ContactIdFK
FROM
crmActivity
)
WITH brokenContacts AS (
SELECT
u.ActivityId,
u.ContactIdFK,
c.ContactId AS ContactIdPK
FROM
unnested AS u
LEFT OUTER JOIN contacts AS c ON
c.ContactId = u.ContactIdFromArray
)
WITH counts AS (
SELECT
ActivityId,
COUNT(*) AS ContactIdCount,
COUNT( CASE WHEN ContactIdPK IS NULL THEN 1 END ) AS BadContactIdCount
FROM
brokenContacts
GROUP BY
ActivityId
)
WITH badActivityIds AS (
SELECT
ActivityId
FROM
counts
WHERE
BadContactIdCount = ContactIdCount
AND
ContactIdCount > 0
)
DELETE FROM
crmActivity
WHERE
ActivityId IN ( SELECT ActivityId FROM badActivityIds );

SELECT JOIN Table Operations

In my query below, I joined tables b, c, d to a by the base_id column.
However, I need to do some operations in my SELECT statement.
Since b, c, and d are not joined to each other,
is my (a.qty - (b.qty - (c.qty + d.qty))) formula computing only those tables that have the same base_id column?
SELECT (a.qty - (b.qty - (c.qty + d.qty))) AS qc_in
FROM receiving a
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM quality_control bb
WHERE location_to = 6
AND is_canceled = 0
GROUP BY base_id
) b
ON b.base_id = a.base_id
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM quality_control ba
WHERE location_from = 6
AND is_canceled = 0
GROUP BY base_id
) c
ON c.base_id = a.base_id
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM issuance
WHERE location_from = 6
AND is_canceled = 0
GROUP BY base_id
) d
ON d.base_id = a.base_id
WHERE a.is_canceled = 0
I think you're confused by how joining works (if I'm reading the question correctly). If you have:
select *
from table1 a
join table2 b
on a.Id = b.Id
join table3c
on a.Id = c.Id
Then yes a is joined to b, and b is joined to c, but that also means that a is joined to c. One way to think of it is as one giant in-memory table that has all of the a columns then all of the b columns and then all of the c columns in the one result.
If a.Id is 1, and you select the row from b where Id is the same (1) and then you join to c where the Id is the same as a (1), then a, b and c all have the same id.
So yes, (a.qty - (b.qty - (c.qty + d.qty))) will only be doing that calculation for rows where a, b, c and d all have the same base_id .
Surely. In the nested statements you join all the tables on base_id. That means, as the result of those joins you will get a huge table containing columns from all of the joined tables with a common column you join on (base_id in your case).

How to differentiate between “no child rows exist” and “no parent row exists” in one SELECT query?

Say I have a table C that references rows from tables A and B:
id, a_id, b_id, ...
and a simple query:
SELECT * FROM C WHERE a_id=X AND b_id=Y
I would like to differentiate between the following cases:
No row exists in A where id = X
No row exists in B where id = Y
Both such rows in A and B exist, but no rows in C exist where a_id = X and b_id = Y
The above query will return empty result in all those cases.
In case of one parent table I could do a LEFT JOIN like:
SELECT * FROM A LEFT JOIN C ON a.id = c.a_id WHERE c.a_id = X
and then check if the result is empty (no row in A exists), has one row with NULL c.id (row in A exists, but no rows in C exist) or 1+ rows with non-NULL c.id (row in A exists and at least one row in C exists). A bit messy but it works, but I was wondering if there is a better way of doing this, especially if there is more than one parent table?
For example:
C is "things owned by people", A is "people", B is "types of things". When someone asks "give me a list of games owned by Bill", and there are no such records in C, I would like to return an empty list only if both "Bill" and "games" exist in their corresponding tables, but an error code if either of them doesn't.
So if there are no records matching "Bill" and "games" in table C, I would like to say "I don't know who Bill is" instead of "Bill has no games" if I don't have a record about Bill in table A.
create table a(a_id integer not null primary key);
create table b(b_id integer not null primary key);
create table c(a_id integer not null references a(a_id)
, b_id integer not null references b(b_id)
, primary key (a_id,b_id)
);
insert into a(a_id) values(0),(2),(4),(6);
insert into b(b_id) values(0),(3),(6);
insert into c(a_id,b_id) values(6,6);
PREPARE omg(integer,integer) AS
SELECT EXISTS(SELECT * FROM a where a.a_id = $1) AS a_exists
, EXISTS(SELECT * FROM b where b.b_id = $2) AS b_exists
, EXISTS(SELECT * FROM c where c.a_id = $1 and c.b_id = $2) AS c_exists
;
EXECUTE omg(1,1);
EXECUTE omg(2,1);
EXECUTE omg(1,3);
EXECUTE omg(6,6);
-- with optional payload:
PREPARE omg2(integer,integer) AS
SELECT val.a_id AS va_id
, val.b_id AS vb_id
, EXISTS(SELECT * FROM a WHERE a.a_id = $1) AS a_exists
, EXISTS(SELECT * FROM b WHERE b.b_id = $2) AS b_exists
, EXISTS(select * FROM c WHERE c.ca_id = val.a_id AND c.cb_id = val.b_id ) AS c_exists
, a.*
, b.*
, c.*
FROM (values ($1,$2)) val(a_id,b_id)
LEFT JOIN a ON a.a_id = val.a_id
LEFT JOIN b ON b.b_id = val.b_id
LEFT JOIN c ON c.ca_id = val.a_id AND c.cb_id = val.b_id
;
EXECUTE omg2(1,1);
EXECUTE omg2(2,1);
EXECUTE omg2(1,3);
EXECUTE omg2(6,6);
I think I managed to get a satisfactory solution using the following two features:
Subselect bound to a column, which allows me to check if a row exists and (importantly) get a NULL value otherwise (e.g. SELECT (SELECT id FROM a WHERE id = 1) as a_id))
Common Table Expressions
Initial data:
CREATE TABLE people
(
id integer not null primary key,
name text not null
);
CREATE TABLE thing_types
(
id integer not null primary key,
name text not null
);
CREATE TABLE things
(
id integer not null primary key,
person_id integer not null references people(id),
thing_type_id integer not null references thing_types(id),
name text not null
);
INSERT INTO people VALUES (1, 'Bill');
INSERT INTO thing_types VALUES (1, 'game');
INSERT INTO things VALUES (1, 1, 1, 'Duke Nukem');
INSERT INTO things VALUES (2, 1, 1, 'Warcraft 2');
And the query:
WITH v AS (
SELECT (SELECT id FROM people WHERE id=<person_id_param>) AS person_id,
(SELECT id FROM thing_types WHERE id=<thing_type_param>) AS thing_type_id
)
SELECT v.person_id, v.thing_type_id, things.name
FROM
v LEFT JOIN things
ON v.person_id = things.person_id AND v.thing_type_id = things.thing_type_id
This query will always return at least one row, and I just need to check which, if any, of the three columns of the first row are NULLs.
In case if both parent table ids are valid and there are some records, none of them will be NULL:
person_id thing_type_id name
-------------------------------------
1 1 Duke Nukem
1 1 Warcraft 2
If either person_id or thing_type_id are invalid, I get one row where name is NULL and either person_id or thing_type_id is NULL:
person_id thing_type_id name
-------------------------------------
NULL 1 NULL
If both person_id and thing_type_id are valid but there are no records in things, I get one row where both person_id and thing_type_id are not NULL, but the name is NULL:
person_id thing_type_id name
-------------------------------------
1 1 NULL
Since I have a NOT NULL constraint on things.name, I know that this case can only mean that there are no matching records in things. If NULLs were allowed in things.name, I could include things.id instead and check that for NULLness.
You have 3 cases, the third one is a bit more complex but can be achieved by using cross join between a and b, all three cases in a union could be like this
select a_id, b_id , 'case 1' from c
where not exists (select 1 from a where a.a_id=c.a_id)
union all
select a_id, b_id ,'case 2' from c
where not exists (select 1 from b where b.b_id=c.b_id)
union all
select a_id, b_id, 'case 3' from a cross join b
where exists (select 1 from c where c.a_id=a.a_id)
and exists (select 1 from c where c.b_id=b.b_id)
and not exists (select 1 from c where c.b_id=b.b_id and c.a_id=a.a_id)

Recursive query in SQL Server

I have a table with following structure
Table name: matches
That basically stores which product is matching which product. I need to process this table
And store in a groups table like below.
Table Name: groups
group_ID stores the MIN Product_ID of the Product_IDS that form a group. To give an example let's say
If A is matching B and B is Matching C then three rows should go to group table in format (A, A), (A, B), (A, C)
I have tried looking into co-related subqueries and CTE, but not getting this to implement.
I need to do this all in SQL.
Thanks for the help .
Try this:
;WITH CTE
AS
(
SELECT DISTINCT
M1.Product_ID Group_ID,
M1.Product_ID
FROM matches M1
LEFT JOIN matches M2
ON M1.Product_Id = M2.matching_Product_Id
WHERE M2.matching_Product_Id IS NULL
UNION ALL
SELECT
C.Group_ID,
M.matching_Product_Id
FROM CTE C
JOIN matches M
ON C.Product_ID = M.Product_ID
)
SELECT * FROM CTE ORDER BY Group_ID
You can use OPTION(MAXRECURSION n) to control recursion depth.
SQL FIDDLE DEMO
Something like this (not tested)
with match_groups as (
select product_id,
matching_product_id,
product_id as group_id
from matches
where product_id not in (select matching_product_id from matches)
union all
select m.product_id, m.matching_product_id, p.group_id
from matches m
join match_groups p on m.product_id = p.matching_product_id
)
select group_id, product_id
from match_groups
order by group_id;
Sample of the Recursive Level:
DECLARE #VALUE_CODE AS VARCHAR(5);
--SET #VALUE_CODE = 'A' -- Specify a level
WITH ViewValue AS
(
SELECT ValueCode
, ValueDesc
, PrecedingValueCode
FROM ValuesTable
WHERE PrecedingValueCode IS NULL
UNION ALL
SELECT A.ValueCode
, A.ValueDesc
, A.PrecedingValueCode
FROM ValuesTable A
INNER JOIN ViewValue V ON
V.ValueCode = A.PrecedingValueCode
)
SELECT ValueCode, ValueDesc, PrecedingValueCode
FROM ViewValue
--WHERE PrecedingValueCode = #VALUE_CODE -- Specific level
--WHERE PrecedingValueCode IS NULL -- Root