POSTGRESQL - Find a row with a specific set of join table data

I have Users in Groups. I am trying to find which Group contains ONLY a specific set of Users. For instance, Bob is in the group [Bob + John] and also in the group [Bob + John + Steve], and I would like to match only the first one.
I use a join table groups_users to link Users to Groups.
I am having a hard time coming up with a query that uses the join table both to match users to groups and to exclude groups that do not have exactly the set of users searched for.
Here is a fiddle with some data.
Schema (PostgreSQL v13 (Beta))
CREATE TABLE users (
id SERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
username VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE groups_users (
group_id INT NOT NULL,
user_id INT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT fk_group
FOREIGN KEY(group_id)
REFERENCES groups(id),
CONSTRAINT fk_user
FOREIGN KEY(user_id)
REFERENCES users(id)
);
INSERT INTO users (username)
VALUES ('bob'), ('john'), ('steve');
INSERT INTO groups DEFAULT VALUES;
INSERT INTO groups DEFAULT VALUES;
INSERT INTO groups DEFAULT VALUES;
INSERT INTO groups_users (group_id, user_id)
VALUES (1, 1),
(1, 2),
(2, 2),
(2, 3),
(3, 1),
(3, 2),
(3, 3);
Query #1
SELECT * FROM groups_users
WHERE groups_users.user_id IN (1, 2);
| group_id | user_id | created_at |
| -------- | ------- | ------------------------ |
| 1 | 1 | 2020-11-22T16:12:35.796Z |
| 1 | 2 | 2020-11-22T16:12:35.796Z |
| 2 | 2 | 2020-11-22T16:12:35.796Z |
| 3 | 1 | 2020-11-22T16:12:35.796Z |
| 3 | 2 | 2020-11-22T16:12:35.796Z |
We see that we match groups 1, 2 and 3, but we only want to match 1, and I don't know how to go about querying this.
Thank you for your help.

You can group the rows by group_id and aggregate the user_ids into arrays. Then you can compare these aggregates with an array of the user_ids you are looking for:
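SELECT
group_id
FROM
groups_users
GROUP BY group_id
-- ORDER BY inside the aggregate makes the comparison independent of row order;
-- the literal array must be written in the same (ascending) order
HAVING ARRAY_AGG(user_id ORDER BY user_id) = ARRAY[1,2]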
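SQLite has no array type, so the block below is only a rough stand-in for the ARRAY_AGG comparison: a sketch, assuming the question's sample rows and no duplicate (group_id, user_id) pairs, that expresses the same exact-set test with counts. A group matches when it has exactly as many rows as there are target users and every one of those rows belongs to a target user. The helper names `sample_connection` and `exact_groups` are hypothetical.

```python
import sqlite3


def sample_connection():
    """In-memory SQLite copy of the question's groups_users rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE groups_users (group_id INT NOT NULL, user_id INT NOT NULL)")
    conn.executemany(
        "INSERT INTO groups_users (group_id, user_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)],
    )
    return conn


def exact_groups(conn, user_ids):
    """Return the group_ids whose members are exactly `user_ids` (order-independent).

    Assumes no duplicate (group_id, user_id) rows in groups_users.
    """
    ids = sorted(set(user_ids))
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?" for two ids
    sql = f"""
        SELECT group_id
        FROM groups_users
        GROUP BY group_id
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN user_id IN ({placeholders}) THEN 1 ELSE 0 END) = ?
        ORDER BY group_id
    """
    # First ? = required group size, then the ids for IN (...), then required matching rows.
    params = [len(ids), *ids, len(ids)]
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    print(exact_groups(sample_connection(), [2, 1]))  # [1]
```

Because the ids are passed as a set and only counted, the order in which they are listed does not matter, which is the same property the ORDER BY inside ARRAY_AGG provides in PostgreSQL.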

This is not pretty, but here is what the query does: for each group, it counts the rows that match your filter and compares that with the group's total number of members, so only groups made up entirely of the listed users are returned.
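SELECT t1.group_id FROM
(
SELECT group_id, COUNT(group_id) AS instances FROM groups_users
WHERE groups_users.user_id IN (1, 2)
GROUP BY group_id
) t1
INNER JOIN
(
SELECT group_id, COUNT(group_id) AS instances FROM groups_users
GROUP BY group_id
) t2
ON t1.group_id = t2.group_id AND t1.instances = t2.instances
-- also require every listed user to be present (2 = number of ids in the IN list);
-- without this, a group containing only Bob would match as well
AND t1.instances = 2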
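To see why comparing the filtered count with the total count is not enough on its own, here is a sketch in SQLite (standing in for PostgreSQL), assuming the question's rows plus a hypothetical group 4 whose only member is Bob. The count comparison alone also returns group 4, because all of its (one) members are in the list; additionally requiring the filtered count to equal the size of the list removes it. `BASE_QUERY` and `run` are illustrative names.

```python
import sqlite3

# Count-comparison query from the answer, with lowercase aliases.
BASE_QUERY = """
SELECT t1.group_id
FROM (SELECT group_id, COUNT(group_id) AS instances
      FROM groups_users
      WHERE user_id IN (1, 2)
      GROUP BY group_id) t1
JOIN (SELECT group_id, COUNT(group_id) AS instances
      FROM groups_users
      GROUP BY group_id) t2
  ON t1.group_id = t2.group_id AND t1.instances = t2.instances
"""


def run(conn, extra=""):
    """Run BASE_QUERY with optional extra ON-clause conditions appended."""
    sql = BASE_QUERY + extra + " ORDER BY t1.group_id"
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groups_users (group_id INT NOT NULL, user_id INT NOT NULL)")
conn.executemany(
    "INSERT INTO groups_users VALUES (?, ?)",
    # question's data plus a hypothetical group 4 containing only Bob (user 1)
    [(1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 1)],
)

subset_only = run(conn)                           # members all in (1, 2)
exact_match = run(conn, " AND t1.instances = 2")  # ...and both users present
print(subset_only, exact_match)  # [1, 4] [1]
```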

Related

Snowflake SQL aggregate based on multiple columns

I've got 2 tables of user IDs and emails.
A user can change their email but keep the same user ID (row 2 and row 5 of USER_PLAYS table).
A user can also create a new user ID with an existing email (row 3 of USER_PLAYS table).
I want to be able to sum up the total plays for this user into a single row.
There is also another table with sales value that I would like to get the total sales.
I'm thinking of somehow creating a unique ID that is shared across all of these rows, but I'm not sure how to implement it.
Note that I've only shown 1 actual person but there are multiple more unique people in these tables.
I am using Snowflake as that is where the data is.
USER_PLAYS table:
|ROW|USER_ID | EMAIL |VIDEO_PLAYS|
|---|-----------|--------------------|-----------|
|1 | 1 | ab#gmail.com | 2 |
|2 | 1 | cd#gmail.com | 3 |
|3 | 3 | cd#gmail.com | 4 |
|4 | 4 | cd#gmail.com | 2 |
|5 | 4 | ef#gmail.com | 3 |
Sales Table:
|NET_SALE | EMAIL |
|-----------|-------------|
|5 | cd#gmail.com|
|10 | ef#gmail.com|
Desired Output:
|UNIQUE_ID | PLAYS |NET_SALE|
|-----------|-------|--------|
| 1 | 14 | 15 |
This could probably be made more efficient, but I think this process gets you a unique identifier across your user_id/email combinations.
For this process I added a column called COMMON_ID to the user_plays table. Joining that table with the sales table on email lets you aggregate the sales by COMMON_ID (see results below):
-- Create the test case
create
or replace table user_plays (
user_id varchar not null,
email varchar not null,
video_plays integer not null,
common_id integer default NULL
);
insert into
user_plays
values
(1, 'ab#gmail.com', 2, null),
(1, 'cd#gmail.com', 3, null),
(3, 'cd#gmail.com', 4, null),
(4, 'cd#gmail.com', 2, null),
(4, 'ef#gmail.com', 3, null),
(5, 'jd#gmail.com', 10, null),
(6, 'lk#gmail.com', 1, null),
(6, 'zz#gmail.com', 2, null),
(7, 'zz#gmail.com', 3, null);
create
or replace table sales (net_sale integer, email varchar);
insert into
sales
values
(5, 'cd#gmail.com'),(10, 'ef#gmail.com');
-- Test run
-- Create view for User IDs with multiple emails
create
or replace view grp1 as (
select
user_id,
count(*) as mult
from
user_plays
group by
user_id
having
count(*) > 1
);
-- Create view for Emails with multiple user IDs
create
or replace view grp2 as (
select
email,
count(*) as mult
from
user_plays x
group by
email
having
count(*) > 1
);
EXECUTE IMMEDIATE $$
declare new_common_id integer;
counter integer;
Begin
counter := 0;
new_common_id := 0;
-- Reset common_id to NULL as a baseline
update
user_plays
set
common_id = NULL;
-- Mark all unique entries with a common_id = user_id
update
user_plays
set
common_id = user_id
where
email not in (
select
distinct email
from
grp2
)
and user_id not in (
select
distinct user_id
from
grp1
);
-- Set a common_id to the lowest user_id value for each user_id with multiple emails
LOOP
select count(*) into :counter
from
user_plays
where
common_id is null;
if (counter = 0) then BREAK;
end if;
select
min(user_id) into :new_common_id
from
user_plays
where
common_id is null;
-- first pass
update
user_plays
set
common_id = :new_common_id
where
common_id is null and
(user_id = :new_common_id
or email in (
select
email
from
user_plays
where
user_id = :new_common_id
));
END LOOP;
-- Update the chain where an account using a changed email created a new user_id to match up with prior group.
UPDATE user_plays vp
set vp.common_id = vp2.common_id
from (select user_id, min(common_id) as common_id from user_plays group by user_id) vp2
where vp.user_id = vp2.user_id;
END;
$$;
-- See results
select
*
from
user_plays;
select
x.common_id,
vps.video_plays,
sum(x.net_sale) as net_sale
from
(
select
common_id,
sum(video_plays) as video_plays
from
user_plays
group by
common_id
) vps,
(
select
s.email,
s.net_sale,
max(up.common_id) as common_id
from
sales s,
user_plays up
where
up.email = s.email
group by
s.email,
s.net_sale
) x
where
vps.common_id = x.common_id
group by
x.common_id,
vps.video_plays;
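Common ID assignment results:
|USER_ID| EMAIL        |VIDEO_PLAYS|COMMON_ID|
|-------|--------------|-----------|---------|
|1      | ab#gmail.com | 2         | 1       |
|1      | cd#gmail.com | 3         | 1       |
|3      | cd#gmail.com | 4         | 1       |
|4      | cd#gmail.com | 2         | 1       |
|4      | ef#gmail.com | 3         | 1       |
|5      | jd#gmail.com | 10        | 5       |
|6      | lk#gmail.com | 1         | 6       |
|6      | zz#gmail.com | 2         | 6       |
|7      | zz#gmail.com | 3         | 6       |
Final results:
|COMMON_ID|VIDEO_PLAYS|NET_SALE|
|---------|-----------|--------|
| 1       | 14        | 15     |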
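The COMMON_ID grouping is, in effect, the set of connected components of a graph whose nodes are user_ids and emails, with an edge for every USER_PLAYS row. As an alternative technique (union-find, not the answer's method), here is a minimal Python sketch, assuming the rows fit in memory; `assign_common_ids` and `totals` are hypothetical helper names. Like the answer, it uses the smallest user_id in each component as the COMMON_ID, and it follows chains of shared emails of any length. Unlike the final SQL query, which inner-joins on sales, it also reports components that have no sales.

```python
from collections import defaultdict


def assign_common_ids(rows):
    """Map each user_id to the smallest user_id reachable through shared emails.

    rows: list of (user_id, email, video_plays) tuples.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Tag nodes so a user_id can never collide with an email string.
    for user_id, email, _ in rows:
        union(("user", user_id), ("email", email))

    components = defaultdict(set)
    for user_id, _, _ in rows:
        components[find(("user", user_id))].add(user_id)
    return {uid: min(members) for members in components.values() for uid in members}


def totals(rows, sales):
    """Return {common_id: (total_plays, total_net_sale)}; sales are (net_sale, email)."""
    common = assign_common_ids(rows)
    email_to_common = {email: common[uid] for uid, email, _ in rows}
    plays = defaultdict(int)
    for uid, _, video_plays in rows:
        plays[common[uid]] += video_plays
    net = defaultdict(int)
    for net_sale, email in sales:
        if email in email_to_common:  # sales for unknown emails are skipped
            net[email_to_common[email]] += net_sale
    return {cid: (plays[cid], net[cid]) for cid in plays}


user_plays = [
    (1, "ab#gmail.com", 2), (1, "cd#gmail.com", 3), (3, "cd#gmail.com", 4),
    (4, "cd#gmail.com", 2), (4, "ef#gmail.com", 3), (5, "jd#gmail.com", 10),
    (6, "lk#gmail.com", 1), (6, "zz#gmail.com", 2), (7, "zz#gmail.com", 3),
]
sales = [(5, "cd#gmail.com"), (10, "ef#gmail.com")]
print(totals(user_plays, sales))  # {1: (14, 15), 5: (10, 0), 6: (6, 0)}
```

For COMMON_ID 1 this reproduces the 14 plays and 15 in net sales shown above.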

SQLite query - filter name where each associated id is contained within a set of ids

I'm trying to work out a query that will find me all of the distinct Names whose LocationIDs are in a given set of ids. The catch is that if any of the LocationIDs associated with a distinct Name are not in the set, then the Name should not be in the results.
Say I have the following table:
ID | LocationID | ... | Name
-----------------------------
1 | 1 | ... | A
2 | 1 | ... | B
3 | 2 | ... | B
I need a query similar to
SELECT DISTINCT Name FROM table WHERE LocationID IN (1, 2);
The problem is that this only checks whether the LocationID is 1 OR 2, so it returns the following:
A
B
But what I need it to return is
B
This is because B is the only Name with both of its LocationIDs in the set (1, 2).
You can write two subqueries: one that counts the distinct LocationIDs matching your condition across the whole table, and one that counts them for each Name.
Then join the two on the count, so that only Names matching every LocationID in your condition are returned.
Schema (SQLite v3.17)
CREATE TABLE T(
ID int,
LocationID int,
Name varchar(5)
);
INSERT INTO T VALUES (1, 1,'A');
INSERT INTO T VALUES (2, 1,'B');
INSERT INTO T VALUES (3, 2,'B');
Query #1
SELECT t2.Name
FROM
(
SELECT COUNT(DISTINCT LocationID) cnt
FROM T
WHERE LocationID IN (1, 2)
) t1
JOIN
(
SELECT COUNT(DISTINCT LocationID) cnt,Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
) t2 on t1.cnt = t2.cnt;
| Name |
| ---- |
| B |
You can just use aggregation. Assuming no duplicates in your table:
SELECT Name
FROM table
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2;
If Name/LocationID pairs can be duplicated, use HAVING COUNT(DISTINCT LocationID) = 2.
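Both answers return Names that have every LocationID in the set, but a Name that also has a LocationID outside the set is still returned, while the question's "catch" says such a Name should be excluded. Here is a sketch in SQLite, assuming a hypothetical Name C with LocationIDs 1, 2 and 3: dropping the WHERE filter and requiring that no row falls outside the set gives an exact match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INT, LocationID INT, Name VARCHAR(5))")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    (1, 1, "A"), (2, 1, "B"), (3, 2, "B"),
    (4, 1, "C"), (5, 2, "C"), (6, 3, "C"),  # hypothetical: C also has LocationID 3
])

# Aggregation answer: Names that have every LocationID in the set (C is included).
covers_set = [row[0] for row in conn.execute("""
    SELECT Name FROM T
    WHERE LocationID IN (1, 2)
    GROUP BY Name
    HAVING COUNT(DISTINCT LocationID) = 2
    ORDER BY Name""")]

# Exact match: the whole set is covered AND no LocationID lies outside it.
exact_set = [row[0] for row in conn.execute("""
    SELECT Name FROM T
    GROUP BY Name
    HAVING COUNT(DISTINCT LocationID) = 2
       AND SUM(CASE WHEN LocationID IN (1, 2) THEN 0 ELSE 1 END) = 0
    ORDER BY Name""")]

print(covers_set, exact_set)  # ['B', 'C'] ['B']
```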

insert multiple rows into sql using in statement

I am trying to write an SQL script to do a bulk insert. I need it to add the users that are managers to the Manager group. I tried writing it like this:
INSERT INTO group_member (group_id, user_id) VALUES ((SELECT group_id FROM user_group WHERE group_name = 'Manager') , (SELECT user_id
FROM user WHERE manager=1 and user_status = 1));
but I am getting this error
Subquery returns more than 1 row
I understand the error but am not sure how to work around it so that I do not miss any users.
When it runs, there can be zero or more managers; I'm not sure whether that makes a difference.
MySQL version: 5.6.27
CREATE TABLE user_group(
group_id INT(11) NOT NULL AUTO_INCREMENT,
group_name VARCHAR(128) NULL DEFAULT NULL,
PRIMARY KEY (group_id)
);
CREATE TABLE user (
user_id INT(11) NOT NULL AUTO_INCREMENT,
user_name VARCHAR(128) NULL DEFAULT NULL,
manager INT(11) NOT NULL,
user_status INT(11) NOT NULL,
PRIMARY KEY (user_id)
);
CREATE TABLE group_member (
group_id INT(11) NOT NULL,
user_id INT(11) NOT NULL,
PRIMARY KEY (group_id, user_id)
);
You want INSERT ... SELECT:
INSERT INTO group_member (group_id, user_id)
SELECT g.group_id, u.user_id
FROM (SELECT group_id FROM user_group WHERE group_name = 'Manager') g CROSS JOIN
(SELECT user_id FROM user WHERE manager = 1 and user_status = 1) u;
If the group already has members, you might want to filter them out.
You can also write this as:
INSERT INTO group_member (group_id, user_id)
SELECT g.group_id, u.user_id
FROM user u JOIN
user_group g
ON g.group_name = 'Manager' AND
(u.manager = 1 and u.user_status = 1);
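Picking up the note about groups that already have members: because group_member has a primary key on (group_id, user_id), re-inserting an existing member fails. Here is a sketch in SQLite, assuming hypothetical sample users, that adds a NOT EXISTS filter to the INSERT ... SELECT so that only missing managers are inserted; the same WHERE clause should work in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group (group_id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT,
                   manager INT NOT NULL, user_status INT NOT NULL);
CREATE TABLE group_member (group_id INT NOT NULL, user_id INT NOT NULL,
                           PRIMARY KEY (group_id, user_id));
-- hypothetical sample data
INSERT INTO user_group VALUES (17, 'Manager');
INSERT INTO user VALUES (1, 'ann', 1, 1), (2, 'ben', 1, 1), (3, 'cy', 0, 1);
INSERT INTO group_member VALUES (17, 1);  -- ann is already a member
""")

conn.execute("""
INSERT INTO group_member (group_id, user_id)
SELECT g.group_id, u.user_id
FROM user_group g
CROSS JOIN user u
WHERE g.group_name = 'Manager'
  AND u.manager = 1 AND u.user_status = 1
  -- skip (group, user) pairs that are already present
  AND NOT EXISTS (SELECT 1 FROM group_member m
                  WHERE m.group_id = g.group_id AND m.user_id = u.user_id)
""")

members = conn.execute("SELECT group_id, user_id FROM group_member ORDER BY user_id").fetchall()
print(members)  # [(17, 1), (17, 2)]
```

Without the NOT EXISTS line, this insert would raise an integrity error on the existing (17, 1) row.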
I need it to add the users that are managers into the manager's group
INSERT INTO group_member (group_id, user_id)
SELECT (SELECT group_id FROM `group` WHERE group_name = 'Manager'),
user_id
FROM user WHERE manager=1 and user_status = 1;
This assumes there is exactly one group named Manager. (The query and test here use a table named `group`; with the question's schema, use `user_group` instead.)
Test
mysql> SELECT * FROM `user`;
+---------+-------------+---------+
| manager | user_status | user_id |
+---------+-------------+---------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
+---------+-------------+---------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM `group`;
+----------+------------+
| group_id | group_name |
+----------+------------+
| 17 | Manager |
+----------+------------+
1 row in set (0.00 sec)
mysql> INSERT INTO group_member (group_id, user_id)
-> SELECT (SELECT group_id FROM `group` WHERE group_name = 'Manager'),
-> user_id
-> FROM user WHERE manager=1 and user_status = 1;
Query OK, 2 rows affected (0.08 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * FROM group_member;
+----------+---------+
| group_id | user_id |
+----------+---------+
| 17 | 1 |
| 17 | 2 |
+----------+---------+
2 rows in set (0.00 sec)

Create an SQL query from two tables in postgresql

I have two tables as shown in the image. I want to write a SQL query in PostgreSQL that returns the pkey and the minimum count for each unique 'pkey' in table 1 where 'name1' is not present in the 'name' array column of table 2.
'name' is an array.
You can use ANY to check whether an element exists in the name array.
create table t1 (pkey int, cnt int);
create table t2 (pkey int, name text[]);
insert into t1 values (1, 11),(1, 9),(2, 14),(2, 15),(3, 21),(3,16);
insert into t2 values
(1, array['name1','name2']),
(1, array['name3','name2']),
(2, array['name4','name1']),
(2, array['name5','name2']),
(3, array['name2','name3']),
(3, array['name4','name5']);
select pkey
from t2
where 'name1' = any(name);
| pkey |
| ---: |
| 1 |
| 2 |
select t1.pkey, min(cnt) count
from t1
where not exists (select 1
from t2
where t2.pkey = t1.pkey
and 'name1' = any(name))
group by t1.pkey;
pkey | count
---: | ----:
3 | 16
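SQLite has no array columns, so this sketch assumes a hypothetical normalized table `t2_names` with one row per element of the original `name` arrays. The `'name1' = any(name)` test then becomes a plain equality inside the same NOT EXISTS, and the result matches the PostgreSQL output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (pkey INT, cnt INT);
-- Hypothetical normalized t2: one row per element of the original name arrays.
CREATE TABLE t2_names (pkey INT, name TEXT);
INSERT INTO t1 VALUES (1, 11), (1, 9), (2, 14), (2, 15), (3, 21), (3, 16);
INSERT INTO t2_names VALUES
  (1, 'name1'), (1, 'name2'), (1, 'name3'), (1, 'name2'),
  (2, 'name4'), (2, 'name1'), (2, 'name5'), (2, 'name2'),
  (3, 'name2'), (3, 'name3'), (3, 'name4'), (3, 'name5');
""")

excluded = "name1"
result = conn.execute("""
SELECT t1.pkey, MIN(t1.cnt) AS min_cnt
FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2_names n
                  WHERE n.pkey = t1.pkey AND n.name = ?)  -- replaces 'name1' = ANY(name)
GROUP BY t1.pkey
ORDER BY t1.pkey
""", (excluded,)).fetchall()
print(result)  # [(3, 16)]
```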

Insert data into two columns only if not duplicate

I have a table user_interests with id(AUTO_INC), user_id, user_interest columns.
I want an easy way to insert data into user_id and user_interest without creating duplicate entries.
For example, suppose the table currently looks like this:
+------------------------------+
| ID | user_id | user_interest |
+------------------------------+
| 1 | 2 | Music |
| 2 | 2 | Swimming |
+------------------------------+
If I now run INSERT INTO table (user_id, user_interest) VALUES (2, 'Dance'), (2, 'Swimming'), I only want the (2, Dance) row to be inserted, not (2, Swimming), since (2, Swimming) already exists in the table.
I have looked at upsert commands and also tried writing the command below, but it doesn't work:
INSERT INTO `user_interests`( `user_id`,`interest` )
VALUES ("2","Music")
WHERE (SELECT COUNT(`interest`) FROM `user_interests`
WHERE `interest` = "Music" AND `user_id` = "2"
Having COUNT(`interest`) <=0 )
Use the NOT EXISTS method:
INSERT INTO your_table (user_id, user_interest)
SELECT @userId, @userInterest
FROM DUAL -- older MySQL versions need a FROM clause before WHERE
WHERE NOT EXISTS (SELECT 1 FROM your_table
                  WHERE user_id = @userId AND user_interest = @userInterest);
Or create a unique constraint on the two columns in your table:
ALTER TABLE your_table
ADD CONSTRAINT Constraint_Name UNIQUE (Column_Name1,Column_Name2)
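With the unique constraint in place, duplicates can simply be skipped at insert time. Here is a sketch in SQLite using the question's table and rows: SQLite spells this `INSERT OR IGNORE`, while MySQL's counterpart is `INSERT IGNORE` (or `INSERT ... ON DUPLICATE KEY UPDATE` when you want an upsert).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_interests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INT NOT NULL,
    user_interest TEXT NOT NULL,
    UNIQUE (user_id, user_interest)  -- each (user, interest) pair may appear only once
)""")
conn.executemany(
    "INSERT INTO user_interests (user_id, user_interest) VALUES (?, ?)",
    [(2, "Music"), (2, "Swimming")],
)

# INSERT OR IGNORE skips any row that would violate the UNIQUE constraint.
conn.executemany(
    "INSERT OR IGNORE INTO user_interests (user_id, user_interest) VALUES (?, ?)",
    [(2, "Dance"), (2, "Swimming")],
)

rows = conn.execute("SELECT id, user_id, user_interest FROM user_interests ORDER BY id").fetchall()
print(rows)  # only (2, 'Dance') was added
```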