Query to find mutual likes? - sql

I have a table which has the id of a particular person and the id of the person he likes.
Likes
(p1,p2)
id1,id2
id2,id1
id3,id4
id3,id5
Expected output:
id1,id2
I also have to remove duplicates, meaning id1,id2 should be returned only once.
It is an exercise question.
select hh.id, hh.name, hh.grade as gr
, hh.id2, kk.name, kk.grade as gr1
from ( select id, id2, grade, name
from highschooler ab
, Likes cd
where ab.id = cd.id1 ) hh
, highschooler kk
where hh.id2 = kk.id
This query returns something like this:
student id,student name,student grade,friend student likes,friend name,friend grade

This should do it by joining the table to itself:
SELECT p.p1, p.p2
FROM Likes p
INNER JOIN Likes p2 ON
p.p1=p2.p2 AND
p.p2=p2.p1 AND
p.p1<p2.p1
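To try the self-join above without a database server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. The helper name mutual_likes is hypothetical, and the rows are the sample rows from the question. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

def mutual_likes(rows):
    """Return each mutual (p1, p2) pair once, smaller id first, via a self-join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Likes (p1 TEXT, p2 TEXT)")
    con.executemany("INSERT INTO Likes VALUES (?, ?)", rows)
    query = """
        SELECT p.p1, p.p2
        FROM Likes p
        INNER JOIN Likes p2
            ON p.p1 = p2.p2      -- the other row points back at p.p1
           AND p.p2 = p2.p1      -- ... and comes from p.p2
           AND p.p1 < p2.p1      -- keep only one orientation of the pair
        ORDER BY p.p1, p.p2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

rows = [("id1", "id2"), ("id2", "id1"), ("id3", "id4"), ("id3", "id5")]
print(mutual_likes(rows))  # [('id1', 'id2')]
```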

I think the nicest way to do this is with a group by. In SQL Server, this requires using a case expression to put each pair into a canonical (smaller, larger) order:
with l as (
select (case when p1 < p2 then p1 else p2 end) as pfirst,
(case when p1 < p2 then p2 else p1 end) as psecond
from likes
)
select pfirst, psecond
from l
group by pfirst, psecond
having count(*) = 2
If you have duplicates in the original data, add p1 to the select list of the CTE and change the having clause to:
having count(distinct p1) = 2
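Here is a minimal sketch of the group-by approach in SQLite (via Python's sqlite3), assuming the same Likes(p1, p2) layout. The CTE also carries p1 through, so COUNT(DISTINCT p1) has a column to count. A duplicated (id1, id2) row is included on purpose. The helper name mutual_pairs is hypothetical.

```python
import sqlite3

QUERY = """
WITH l AS (
    SELECT p1,
           CASE WHEN p1 < p2 THEN p1 ELSE p2 END AS pfirst,   -- smaller id
           CASE WHEN p1 < p2 THEN p2 ELSE p1 END AS psecond   -- larger id
    FROM Likes
)
SELECT pfirst, psecond
FROM l
GROUP BY pfirst, psecond
HAVING COUNT(DISTINCT p1) = 2   -- both directions present, duplicates ignored
ORDER BY pfirst, psecond
"""

def mutual_pairs(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Likes (p1 TEXT, p2 TEXT)")
    con.executemany("INSERT INTO Likes VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

# ("id1", "id2") appears twice: COUNT(*) would be 3, but COUNT(DISTINCT p1) is still 2.
rows = [("id1", "id2"), ("id1", "id2"), ("id2", "id1"), ("id3", "id4"), ("id3", "id5")]
print(mutual_pairs(rows))  # [('id1', 'id2')]
```

Sorting each pair into (smaller, larger) order puts both directions of a like into the same group. The distinct count then checks that both people appear as p1.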

Related

SQL | List all tuples (a, b, c) if there exists another tuple with equal (b, c)

I have three tables where the bold attribute(s) is the primary key
Restaurants(restaurant_ID, name, ...)
restaurant_ID, name, ...
1, Macdonalds
2, Hubert
3, Dorsia
... ...
Identifier(restaurant_ID, food_ID)
restaurant_ID, food_ID, ...
1, 1
1, 4
2, 1
2, 7
... ...
Food(food_ID, name, ...)
food_ID food_name
1 Chips
2 Burgers
3 Salmon
... ...
Using Postgres, I want to list all restaurants (restaurant_id and name, one row per restaurant) that share the exact same set of foods with at least one other restaurant.
For example, let's say:
Restaurant with ID "1" is associated only with food_ids 1 and 4 in Identifier
Restaurant with ID "3" is associated only with food_ids 4 and 1 in Identifier
Restaurant with ID "7" is associated only with food_id 6 in Identifier
Restaurant with ID "9" is associated only with food_id 6 in Identifier
Then the output should be:
Restaurant_id name
1 name1
3 name3
7 ...
9 ...
Any help would be greatly appreciated!
Thank you
Use the aggregate function string_agg() to get the full list of foods for each restaurant:
with cte as (
select restaurant_ID,
string_agg(food_ID::varchar(10),',' order by food_ID) foods
from identifier
group by restaurant_ID
)
select r.*
from Restaurants r inner join cte c
on c.restaurant_ID = r.restaurant_ID
where exists (select 1 from cte where restaurant_ID <> c.restaurant_ID and foods = c.foods)
But I would prefer to group restaurants based on matching foods:
with cte as (
select restaurant_ID,
string_agg(food_ID::varchar(10),',' order by food_ID) foods
from identifier
group by restaurant_ID
)
select string_agg(r.name, ',') restaurants
from Restaurants r inner join cte c
on c.restaurant_ID = r.restaurant_ID
group by foods
having count(*) > 1
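The string_agg() key works because the sorted, comma-separated list is identical for two restaurants exactly when their food sets are equal. The sketch below illustrates the same idea outside the database; it is not either answer's SQL. It is a minimal Python version that uses a frozenset as the grouping key instead. The function name is hypothetical, and the sample rows follow the question's example.

```python
from collections import defaultdict

def restaurants_sharing_menu(identifier_rows):
    """Return restaurant ids whose exact food set is shared with at least one other restaurant."""
    foods_by_restaurant = defaultdict(set)
    for restaurant_id, food_id in identifier_rows:
        foods_by_restaurant[restaurant_id].add(food_id)

    # A frozenset is hashable and order-independent, so it can play the role
    # of the sorted string_agg(...) key.
    restaurants_by_menu = defaultdict(list)
    for restaurant_id, foods in foods_by_restaurant.items():
        restaurants_by_menu[frozenset(foods)].append(restaurant_id)

    # Like HAVING count(*) > 1: keep only menus that occur more than once.
    return sorted(
        rid for ids in restaurants_by_menu.values() if len(ids) > 1 for rid in ids
    )

# 1 and 3 share {1, 4}; 7 and 9 share {6}; 2 has {1, 7} on its own.
identifier = [(1, 1), (1, 4), (3, 4), (3, 1), (7, 6), (9, 6), (2, 1), (2, 7)]
print(restaurants_sharing_menu(identifier))  # [1, 3, 7, 9]
```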
Here is a way to get the unique set of restaurants that have exactly the same food items. This uses the array_agg() and array_to_string() functions:
With cte as
(select restaurant_id,
array_to_string(array_agg(food_id order by food_id), ',') as food_list -- order inside the aggregate so equal food sets give equal strings
from Identifier
group by restaurant_id)
select
concat(r1.name,',',r2.name) as restaurant_names,
t1.restaurant_id as restaurant_id1,
r1.name as restaurant_1,
t2.restaurant_id as restaurant_id2,
r2.name as restaurant_2,
t1.food_list as common_food_ids
from cte t1
join cte t2
on t1.restaurant_id < t2.restaurant_id
and t1.food_list = t2.food_list
left join Restaurants r1
on t1.restaurant_id = r1.restaurant_id
left join Restaurants r2
on t2.restaurant_id = r2.restaurant_id;
EDIT: Here is a db<>fiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=e2de05edfbe036cc0d81c64d60f0b599
Also, for reference, here is a solution to the same problem in Oracle using the listagg function: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=12785c3d5abbca97be5d44dd45a6da4a
Update: The query below addresses the updated output format in the question.
With cte as
(select restaurant_id,
array_to_string(array_agg(food_id order by food_id), ',') as food_list
from Identifier
group by restaurant_id)
select distinct -- a restaurant may match several others; list it once
t1.restaurant_id as restaurant_id,
r1.name as restaurant
from cte t1
join cte t2
on t1.restaurant_id <> t2.restaurant_id -- a different restaurant ...
and t1.food_list = t2.food_list -- ... with the identical food list
left join Restaurants r1
on t1.restaurant_id = r1.restaurant_id;
As I understand your question, you want all restaurants that have the same list of foods as restaurant 1.
If so, that's a relational division problem. Here is an approach using joins and aggregation:
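select r.name
from identifier i1
inner join identifier i2 on i2.food_id = i1.food_id
inner join restaurants r on r.restaurant_id = i2.restaurant_id
where i1.restaurant_id = 1
group by r.restaurant_id, r.name
having count(*) = (select count(*) from identifier i3 where i3.restaurant_id = 1) -- r has every food of restaurant 1
and count(*) = (select count(*) from identifier i4 where i4.restaurant_id = r.restaurant_id) -- and no other foods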
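A count-based division needs care here. Matching every food of restaurant 1 is not enough on its own, because a restaurant with extra foods also has all of them. One way to state the exact-match requirement in both directions is a double NOT EXISTS, which is a different technique from the join-and-count above.

The following is a minimal sketch in SQLite through Python's sqlite3, assuming the Restaurants/Identifier layout from the question. The helper name same_menu_as and the sample rows are hypothetical. Restaurant 4 has an extra food and restaurant 5 is missing one.

```python
import sqlite3

EXACT_MATCH = """
SELECT r.restaurant_id
FROM Restaurants r
WHERE NOT EXISTS (              -- no food of the target is missing from r
        SELECT 1 FROM Identifier t
        WHERE t.restaurant_id = :target
          AND NOT EXISTS (SELECT 1 FROM Identifier x
                          WHERE x.restaurant_id = r.restaurant_id
                            AND x.food_id = t.food_id))
  AND NOT EXISTS (              -- and r has no food that the target lacks
        SELECT 1 FROM Identifier x
        WHERE x.restaurant_id = r.restaurant_id
          AND NOT EXISTS (SELECT 1 FROM Identifier t
                          WHERE t.restaurant_id = :target
                            AND t.food_id = x.food_id))
ORDER BY r.restaurant_id
"""

def same_menu_as(target, restaurant_ids, identifier_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Restaurants (restaurant_id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE Identifier (restaurant_id INTEGER, food_id INTEGER)")
    con.executemany("INSERT INTO Restaurants VALUES (?)", [(r,) for r in restaurant_ids])
    con.executemany("INSERT INTO Identifier VALUES (?, ?)", identifier_rows)
    result = [row[0] for row in con.execute(EXACT_MATCH, {"target": target})]
    con.close()
    return result

restaurants = [1, 2, 3, 4, 5]
identifier = [(1, 1), (1, 4), (3, 4), (3, 1),   # 1 and 3: {1, 4}
              (4, 1), (4, 4), (4, 5),           # 4: superset {1, 4, 5}
              (5, 1),                           # 5: subset {1}
              (2, 1), (2, 7)]                   # 2: {1, 7}
print(same_menu_as(1, restaurants, identifier))  # [1, 3]
```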

Conditional probability in SQL

I think I have ended up at a bit of a dead end.
Let's say I have a fairly simple dataset with two columns,
person_id and book_id. It is essentially a fact table that says person X bought books A, B and C.
I know how to find out how many people have bought Book X and Book Y together.
This is the query:
select a.book_id as B1, b.book_id as B2, count(b.person_id) as
Bought_Together
from dbo.data a
cross join dbo.data b
where a.book_id != b.book_id and a.person_id = b.person_id
group by a.book_id, b.book_id
Yet again, this is where my brain decided to shut down. I know I would probably need something like
count(b.person_id) / (all the people that bought book A) * 100
but I'm not entirely sure.
I hope I was clear enough.
EDIT1: I'm currently using SQL Server 2017, so I think the answer needs to be T-SQL.
In the end, the format should be something similar to this. Also, there are no cases where person A could have bought three copies of book X.
Book1 Book2 HowManyPeopleBoughtBook2
1 2 50%
1 3 7%
2 3 15%
2 1 40%
3 1 60%
3 2 20%
EDIT2: There are hundreds of thousands of rows in the database. Yes, this is somewhat related to a data science course I am taking, hence the large amount of data.
You can extend your logic to do this:
select a.book_id as B1, b.book_id as B2,
count(b.book_id) as bought_second_book,
count(b.book_id) * 1.0 / book_cnt as ratio_Bought_Together
from (select a.*, count(*) over (partition by a.book_id) as book_cnt
from dbo.data a
) a left join
dbo.data b
on a.person_id = b.person_id and a.book_id <> b.book_id
group by a.book_id, b.book_id, a.book_cnt;
This assumes that people buy a book only once. If there are duplicates, then count(distinct) would adjust for that.
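The ratio the query computes is P(bought B | bought A): the number of people who bought both A and B, divided by the number of people who bought A. A pure-Python sketch of the same arithmetic can help sanity-check the SQL on a small extract of the data. The function name and sample purchases are hypothetical. Like the matched rows of the join, the sketch omits pairs that were never bought together.

```python
from collections import defaultdict
from itertools import permutations

def bought_together_ratios(purchases):
    """For each ordered pair (A, B): share of book-A buyers who also bought book B."""
    buyers = defaultdict(set)             # book_id -> set of person_ids
    for person_id, book_id in purchases:
        buyers[book_id].add(person_id)    # a set also absorbs duplicate purchases
    ratios = {}
    for a, b in permutations(sorted(buyers), 2):
        both = len(buyers[a] & buyers[b])
        if both:                          # pairs never bought together are left out
            ratios[(a, b)] = both / len(buyers[a])
    return ratios

# (person_id, book_id): 300 bought 1 and 2; 301 bought 1, 2 and 3; 302 bought only 1.
purchases = [(300, 1), (300, 2), (301, 1), (301, 2), (301, 3), (302, 1)]
ratios = bought_together_ratios(purchases)
print(ratios[(2, 3)])  # 0.5
```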
If you would like to generate all possible pairs of books bought together, along with the percentage of all persons who bought each pair, the following can help:
create table data1(book_id int, person_id int)
insert into data1
select *
from (values(1,300)
,(2,300)
,(2,301)
,(1,301)
,(3,301)
)t(book_id,person_id); -- T-SQL requires the statement before a WITH clause to end with a semicolon
with books
as (select distinct book_id
from data1 a
)
,tot_persons
as (select count(distinct person_id) as tot_cnt
from data1
)
,pairs
as (
select a.book_id as col1 /* This block generates all possible pair combinations of books*/
,b.book_id as col2
from books a
join books b
on a.book_id<b.book_id
)
select a.col1,a.col2
,count(b.person_id)*100/(select tot_cnt from tot_persons) as percent_of_persons_buying_both
from pairs a
join data1 b
on a.col1=b.book_id
where exists(select 1
from data1 b1
where b.person_id=b1.person_id
and a.col2=b1.book_id)
group by a.col1,a.col2
On my phone, apologies for typos.
SELECT
SUM(bought_b) * 100.0 / COUNT(*)
FROM
(
SELECT
person_id,
MAX(CASE WHEN book_id = 'A' THEN 1 END) AS bought_a,
MAX(CASE WHEN book_id = 'B' THEN 1 END) AS bought_b
FROM
data
WHERE
book_id IN ('A', 'B')
GROUP BY
person_id
)
person_stats
WHERE
bought_a = 1
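Here is a minimal sketch that runs this pivot-style query in SQLite via Python's sqlite3. The sample data is hypothetical: four people bought A, and two of them also bought B. Each person collapses to one row with bought_a and bought_b set to 1 or NULL, so SUM(bought_b) counts the A buyers who also bought B.

```python
import sqlite3

QUERY = """
SELECT SUM(bought_b) * 100.0 / COUNT(*) AS pct_of_a_buyers_who_bought_b
FROM (
    SELECT person_id,
           MAX(CASE WHEN book_id = 'A' THEN 1 END) AS bought_a,  -- 1 or NULL
           MAX(CASE WHEN book_id = 'B' THEN 1 END) AS bought_b   -- 1 or NULL
    FROM data
    WHERE book_id IN ('A', 'B')
    GROUP BY person_id
) AS person_stats
WHERE bought_a = 1
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (person_id INTEGER, book_id TEXT)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"), (4, "A"), (5, "B")])
pct = con.execute(QUERY).fetchone()[0]
con.close()
print(pct)  # 50.0
```

If none of the A buyers bought B, SUM returns NULL. Wrapping it in COALESCE(SUM(bought_b), 0) avoids that.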
EDIT: I just saw that you want all combinations, not just one specific pair.
WITH
book AS
(
SELECT DISTINCT book_id FROM data
)
SELECT
book_a_id,
book_b_id,
bought_b * 100.0 / bought_a AS pct_bought_b -- % of book A buyers who also bought book B
FROM
(
SELECT
book_a.book_id AS book_a_id,
book_b.book_id AS book_b_id,
COUNT(DISTINCT data_a.person_id) AS bought_a,
COUNT(DISTINCT data_b.person_id) AS bought_b
FROM
book AS book_a
CROSS JOIN
book AS book_b
INNER JOIN
data AS data_a
ON data_a.book_id = book_a.book_id
LEFT JOIN
data AS data_b
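ON data_b.book_id = book_b.book_id
AND data_b.person_id = data_a.person_id -- the same person bought both books
WHERE
book_a.book_id <> book_b.book_id
GROUP BY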
book_a.book_id,
book_b.book_id
)
stats

Database multiple selection

I have the following two tables:
Players
id name
p1 name1
p2 name2
p3 name3
p4 name4
Matches
id winner loser
m1 p1 p2
m2 p3 p4
What I want is to write a SELECT statement that returns the following:
id name matches wins
p1 name1 1 1
p3 name3 1 1
p2 name2 1 0
p4 name4 1 0
This is basically to return the tournament standings, ordered by number of wins.
One method uses correlated subqueries:
select p.*,
(select count(*) from matches m where p.id in (m.winner, m.loser)
) as matches,
(select count(*) from matches m where p.id = m.winner
) as wins
from players p
order by wins desc;
Another way is to use two outer joins:
select p.id,
p.name,
count(distinct m.id) as matches,
count(distinct w.id) as wins
from players p
left join matches m on p.id in (m.winner, m.loser)
left join matches w on w.winner = p.id
group by p.id, p.name
order by wins desc, p.name
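Here is a minimal sketch of the two-outer-join version in SQLite via Python's sqlite3, using the tables from the question. COUNT(DISTINCT ...) matters here because the two LEFT JOINs multiply rows. A player with three matches and two wins gets six joined rows, and counting distinct match ids undoes that.

```python
import sqlite3

STANDINGS = """
SELECT p.id, p.name,
       COUNT(DISTINCT m.id) AS matches,   -- matches the player took part in
       COUNT(DISTINCT w.id) AS wins       -- matches the player won
FROM players p
LEFT JOIN matches m ON p.id IN (m.winner, m.loser)
LEFT JOIN matches w ON w.winner = p.id
GROUP BY p.id, p.name
ORDER BY wins DESC, p.name
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE players (id TEXT PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE matches (id TEXT PRIMARY KEY, winner TEXT, loser TEXT)")
con.executemany("INSERT INTO players VALUES (?, ?)",
                [("p1", "name1"), ("p2", "name2"), ("p3", "name3"), ("p4", "name4")])
con.executemany("INSERT INTO matches VALUES (?, ?, ?)",
                [("m1", "p1", "p2"), ("m2", "p3", "p4")])
rows = con.execute(STANDINGS).fetchall()
con.close()
for row in rows:
    print(row)
```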
You can add two columns to the Players table, or create a new table named Tournament. You can register tournament players manually in your admin panel.
If you add two columns to the Players table, you can query each player's wins, then use MySQL's number-of-rows function on the result to get how many wins the player has, and then write that to the tournament page. (Note: sorry for my English, I'm still learning it.)

SQL All Possible Round Robin Combinations between two Tables

Given the table:
create table Person( Name varchar(100) )
where Name is unique for all Persons
What SQL query can generate all possible n!/((n-2)!2!) round robin combinations?
It is assumed that the cardinality of Person is ALWAYS equal to 4
Example: Person = {'Anna','Jerome','Patrick','Michael'}
Output:
Anna, Jerome
Anna, Patrick
Anna, Michael
Jerome, Patrick
Jerome, Michael
Patrick, Michael
Any help would be appreciated. Thanks!
Here's my answer (I used Oracle SQL):
select P1.NAME PERSON1, P2.NAME PERSON2
from (select rownum RNUM, NAME
from PERSON) P1,
(select rownum RNUM, NAME
from PERSON) P2
where P1.RNUM < P2.RNUM
Here are two solutions to the problem:
SELECT t1.Name + ',' + t2.Name AS NamesCombination
FROM Person t1
INNER JOIN Person t2
ON t1.Name < t2.Name
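OR, using a recursive CTE (as written, with + concatenation and VARCHAR(MAX), this is SQL Server syntax; Oracle 11g R2 and later also support recursive WITH clauses, but with || and VARCHAR2):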
WITH NamesCombination AS
(
SELECT 1 AS Cntr
,Name
, CAST(Name AS VARCHAR(MAX))AS NamesCombinations
FROM Person
UNION ALL
SELECT
nc.Cntr+1
,p.Name
,nc.NamesCombinations + ',' + CAST(p.Name AS VARCHAR(MAX))
FROM Person AS p JOIN NamesCombination nc ON p.Name < nc.Name
WHERE nc.Cntr < 2
)
SELECT NamesCombinations
FROM NamesCombination
WHERE Cntr = 2
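For n people, the self-join on t1.Name < t2.Name yields exactly the n!/((n-2)!2!) unordered pairs. That is the same set that Python's itertools.combinations produces. Here is a minimal sketch that checks this in SQLite through Python's sqlite3. SQLite concatenates with ||, not +, so the two names are returned as separate columns here. The helper name is hypothetical.

```python
import sqlite3
from itertools import combinations

def round_robin_sql(names):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Person (Name VARCHAR(100) PRIMARY KEY)")
    con.executemany("INSERT INTO Person VALUES (?)", [(n,) for n in names])
    # '<' keeps each unordered pair exactly once and never pairs a name with itself.
    pairs = con.execute(
        "SELECT t1.Name, t2.Name FROM Person t1 "
        "INNER JOIN Person t2 ON t1.Name < t2.Name "
        "ORDER BY t1.Name, t2.Name"
    ).fetchall()
    con.close()
    return pairs

names = ["Anna", "Jerome", "Patrick", "Michael"]
sql_pairs = round_robin_sql(names)
# itertools.combinations over the sorted names yields the same pairs in the same order.
assert sql_pairs == list(combinations(sorted(names), 2))
print(len(sql_pairs))  # 6
```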
Note that this is T-SQL (SQL Server) syntax. I know that Oracle supports window functions, particularly row_number(), which is necessary for this solution.
It shouldn't be too hard to get this to work in Oracle with some trial and error
select p1.name, p2.name
from
(
select name, row_number() over(order by name) as rownumber
from person
) p1
inner join
(
select name, row_number() over(order by name) as rownumber
from person
) p2
on p1.name <> p2.name
and p1.rownumber > p2.rownumber
order by 1
row_number() assigns a number to each row. You then join, as suggested before, with the additional join condition p1.rownumber > p2.rownumber.

SQL: select sets containing exactly given members

I'm sure there is a proper word for this which I fail to remember, but the problem is easy to describe:
I have a table groupmembers, which is a simple relationship between groups and members:
id | groupid | memberid
1 | g1 | m1
2 | g1 | m2
3 | g2 | m1
4 | g2 | m2
5 | g2 | m3
The above describes two groups: one with m1 and m2, and one with m1, m2 and m3.
If I want to select the groupids that have members m1 and m2 but no other members, how do I do it? The approaches I have tried would also return g2, since m1 and m2 are a subset of its members.
UPDATE: Wow, some great answers! Let me first clarify my question a little: I want to select the group that exactly matches the given members m1 and m2. So it should NOT match if the group contains more members than m1 and m2, and it should NOT match if the group contains fewer members than m1 and m2.
From your phrase:
I want to select groupids which has members m1,m2 but no other members
try this one. The idea is to count the rows in each group that match the WHERE clause and check that this count equals the total number of rows in the group.
SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) =
(
SELECT COUNT(*)
FROM table1 b
WHERE b.groupid = a.groupid
GROUP BY b.groupID
)
You are looking for the intersection between those groups that have m1 and m2 and those groups that have exactly two members. SQL has an operator for that:
select groupid
from group_table
where memberid in ('m1','m2')
group by groupid
having count(distinct memberid) = 2
intersect
select groupid
from group_table
group by groupid
having count(distinct memberid) = 2
(Oracle supports INTERSECT as well; it is EXCEPT that Oracle calls MINUS.)
Here is a SQLFiddle demo: http://sqlfiddle.com/#!12/df94d/1
Although I think John Woo's solution could be more efficient in terms of performance.
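Here is a minimal sketch of the INTERSECT version in SQLite via Python's sqlite3, assuming a group_table(groupid, memberid) layout. The sample data is hypothetical and adds g3 (only m1) and g4 (m1 and m3). This way each half of the INTERSECT filters out something different.

```python
import sqlite3

QUERY = """
SELECT groupid FROM group_table
WHERE memberid IN ('m1', 'm2')
GROUP BY groupid
HAVING COUNT(DISTINCT memberid) = 2      -- contains both m1 and m2
INTERSECT
SELECT groupid FROM group_table
GROUP BY groupid
HAVING COUNT(DISTINCT memberid) = 2      -- has exactly two distinct members
ORDER BY 1
"""

def exact_groups(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE group_table (groupid TEXT, memberid TEXT)")
    con.executemany("INSERT INTO group_table VALUES (?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY)]
    con.close()
    return result

rows = [("g1", "m1"), ("g1", "m2"),
        ("g2", "m1"), ("g2", "m2"), ("g2", "m3"),
        ("g3", "m1"),
        ("g4", "m1"), ("g4", "m3")]
print(exact_groups(rows))  # ['g1']
```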
There is an issue with this query:
SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) =
(
SELECT COUNT(*)
FROM table1 b
WHERE b.groupid = a.groupid
GROUP BY b.groupID
)
It will match groups with m1 only or m2 only.
To handle that, we can add another count check:
SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) = 2 --since we already know we should have exactly two rows
AND COUNT(*) =
(
SELECT COUNT(*)
FROM table1 b
WHERE b.groupid = a.groupid
GROUP BY b.groupID
)
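Here is a minimal sketch in SQLite via Python's sqlite3 that runs the query with and without the extra COUNT(*) = 2 check. The test data includes a hypothetical group g3 that contains only m1. The inner GROUP BY b.groupID is left out, since the correlated WHERE already restricts the subquery to one group. The sketch assumes (groupid, memberid) pairs are unique.

```python
import sqlite3

# First answer's query: a group holding only m1 slips through.
ORIGINAL = """
SELECT groupid FROM table1 a
WHERE memberid IN ('m1', 'm2')
GROUP BY groupid
HAVING COUNT(*) = (SELECT COUNT(*) FROM table1 b WHERE b.groupid = a.groupid)
ORDER BY groupid
"""

# With the extra check: both m1 and m2 must be present, and nothing else.
FIXED = """
SELECT groupid FROM table1 a
WHERE memberid IN ('m1', 'm2')
GROUP BY groupid
HAVING COUNT(*) = 2
   AND COUNT(*) = (SELECT COUNT(*) FROM table1 b WHERE b.groupid = a.groupid)
ORDER BY groupid
"""

def run(query, rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (groupid TEXT, memberid TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result

rows = [("g1", "m1"), ("g1", "m2"),
        ("g2", "m1"), ("g2", "m2"), ("g2", "m3"),
        ("g3", "m1")]
print(run(ORIGINAL, rows))  # ['g1', 'g3']
print(run(FIXED, rows))     # ['g1']
```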
SELECT DISTINCT -- if (groupid, memberid) is unique
-- no need for the DISTINCT
a.groupid
FROM
tableX AS a
JOIN
tableX AS b
ON b.groupid = a.groupid
WHERE a.memberid = 'm1'
AND b.memberid = 'm2'
AND NOT EXISTS
( SELECT *
FROM tableX AS t
WHERE t.groupid = a.groupid
AND t.memberid NOT IN ('m1', 'm2')
) ;
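Here is a minimal sketch of the join plus NOT EXISTS query in SQLite via Python's sqlite3. The hypothetical rows cover an exact match, a superset, a subset, and an exact match inserted in a different order. One caveat applies: if memberid can be NULL, NOT IN does not catch a NULL member, so such a group would still pass the NOT EXISTS check.

```python
import sqlite3

QUERY = """
SELECT DISTINCT a.groupid
FROM tableX AS a
JOIN tableX AS b ON b.groupid = a.groupid
WHERE a.memberid = 'm1'          -- the group contains m1 ...
  AND b.memberid = 'm2'          -- ... and m2 ...
  AND NOT EXISTS (               -- ... and no one else
        SELECT * FROM tableX AS t
        WHERE t.groupid = a.groupid
          AND t.memberid NOT IN ('m1', 'm2'))
ORDER BY a.groupid
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableX (groupid TEXT, memberid TEXT)")
con.executemany("INSERT INTO tableX VALUES (?, ?)", [
    ("g1", "m1"), ("g1", "m2"),                  # exact match
    ("g2", "m1"), ("g2", "m2"), ("g2", "m3"),    # superset
    ("g3", "m1"),                                # subset
    ("g4", "m2"), ("g4", "m1"),                  # exact match, rows in another order
])
matches = [row[0] for row in con.execute(QUERY)]
con.close()
print(matches)  # ['g1', 'g4']
```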
-- sample table for discussion
CREATE TABLE tbl
(id int, groupid varchar(2), memberid varchar(2));
INSERT INTO tbl
(id, groupid, memberid)
VALUES
(6, 'g4', 'm1'),
(7, 'g4', 'm2'),
(8, 'g6', 'm1'),
(9, 'g6', 'm3'),
(1, 'g1', 'm1'),
(2, 'g1', 'm2'),
(3, 'g2', 'm1'),
(4, 'g2', 'm2'),
(5, 'g2', 'm3')
;
-- the query
select a.groupid, b.groupid peer
from (select groupid, count(*) member_count, min(memberid) x, max(memberid) y
from tbl
group by groupid) A
join
(select groupid, count(*) member_count, min(memberid) x, max(memberid) y
from tbl
group by groupid) B
on a.groupid<b.groupid and a.member_count=b.member_count and a.x=b.x and a.y=b.y
join tbl A1
on A1.groupid = A.groupid
join tbl B1
on B1.groupid = B.groupid and A1.memberid = B1.memberid
group by A.groupid, b.groupid, A.member_count
having count(1) = A.member_count;
-- the result
GROUPID PEER
g1 g4
The above lists each group together with its peers (groups with exactly the same members), and is designed to be efficient. It scales to large tables by first summarizing each group as its member count plus its minimum and maximum member. A direct join on those summaries quickly pares down the candidate pairs. Only for the remaining candidates is the full table consulted, joining back on group ids A and B to determine whether the two groups really are equivalent.
If you had three similar groups (101, 103, 104), they would appear as three separate rows, (101,103), (101,104) and (103,104), because each pair forms a peering. So this query is best used when you already know one of the groups you want to find peers for; that filter would fit into the first subquery.
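The same peer pairing can be sketched outside SQL by grouping on the whole member set rather than on count/min/max summaries. This is a different technique (hashing each set), not the query above. Here is a minimal Python sketch with a hypothetical function name, using the sample rows from this answer.

```python
from collections import defaultdict
from itertools import combinations

def peer_groups(rows):
    """Return sorted (group, peer) pairs of groups with identical member sets."""
    members = defaultdict(set)                  # groupid -> set of memberids
    for _id, groupid, memberid in rows:
        members[groupid].add(memberid)

    groups_by_members = defaultdict(list)       # frozenset of members -> groupids
    for groupid, member_set in members.items():
        groups_by_members[frozenset(member_set)].append(groupid)

    pairs = []
    for groupids in groups_by_members.values():
        # Every pairing within a bucket, smaller id first (like a.groupid < b.groupid).
        pairs.extend(combinations(sorted(groupids), 2))
    return sorted(pairs)

tbl = [(6, "g4", "m1"), (7, "g4", "m2"), (8, "g6", "m1"), (9, "g6", "m3"),
       (1, "g1", "m1"), (2, "g1", "m2"), (3, "g2", "m1"), (4, "g2", "m2"),
       (5, "g2", "m3")]
print(peer_groups(tbl))  # [('g1', 'g4')]
```

As with the SQL version, three groups with identical members produce three pairs. For example, groups 101, 103 and 104 yield (101, 103), (101, 104) and (103, 104).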
id | groupid | memberid
1 | g1 | m1
2 | g1 | m2
3 | g2 | m1
4 | g2 | m2
5 | g2 | m3
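select GRPID from arcv
where GRPID in (select GRPID from arcv group by GRPID having count(1)=2) -- groups with exactly two members
and memberid in ('m1','m2')
group by GRPID
having count(1)=2 -- both members are m1/m2, each group listed once (assumes (GRPID, memberid) is unique)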