PostgreSQL: Select the group with specific members

Given the tables below:
CREATE TABLE users (
id bigserial PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE groups (
id bigserial PRIMARY KEY
);
CREATE TABLE group_members (
group_id bigint REFERENCES groups ON DELETE CASCADE,
user_id bigint REFERENCES users ON DELETE CASCADE,
PRIMARY KEY (group_id, user_id)
);
How do we select a group with a specific set of users?
We want an SQL function that takes an array of user IDs and returns the group ID (from the group_members table) with the exact same set of user IDs.
Also, please add indexes if they will make your solution faster.

First, find the "candidate" groups in group_members that contain every requested user, then make an additional pass to ensure that the group size equals the size of the user_ids array (this uses a CTE, https://www.postgresql.org/docs/current/static/queries-with.html):
with target(id) as (
select distinct * from unnest(array[2, 3]) -- here is your input
), candidates as (
select group_id
from group_members
where user_id in (select id from target)
group by group_id
having count(*) = (select count(*) from target) -- keep groups that include the whole input
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = (select count(*) from target) -- filter out all "bigger" groups
;
Demonstration with some sample data: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=a98c09f20e837dc430ac66e01c7f0dd0
This query can use the indexes you already have. It is probably still worth adding a separate index on group_members (user_id): the primary key index starts with group_id, so it is a poor fit for finding rows by user_id alone in the first stage of the CTE query.
The SQL function is straightforward:
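create or replace function find_groups(int8[]) returns int8 as $$
with target as (
select distinct id from unnest($1) as t(id) -- ignore repeated input ids
), candidates as (
select group_id
from group_members
where user_id in (select id from target)
group by group_id
having count(*) = (select count(*) from target) -- group includes the whole input
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = (select count(*) from target) -- group has no extra members
;
$$ language sql;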
See the same DBfiddle for demonstration.
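The exact-match requirement can also be cross-checked outside PostgreSQL. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python; the helper names make_demo_db and find_groups and the sample rows are made up for illustration. A group qualifies only when its total member count equals the number of requested users and every one of its members is among them.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for group_members with a few sample groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE group_members ("
        " group_id INTEGER NOT NULL,"
        " user_id INTEGER NOT NULL,"
        " PRIMARY KEY (group_id, user_id))"
    )
    conn.executemany(
        "INSERT INTO group_members (group_id, user_id) VALUES (?, ?)",
        # group 1 = {2, 3}, group 2 = {2, 5}, group 3 = {2, 3, 4}, group 4 = {3}
        [(1, 2), (1, 3), (2, 2), (2, 5), (3, 2), (3, 3), (3, 4), (4, 3)],
    )
    return conn


def find_groups(conn, user_ids):
    """Return the ids of groups whose member set equals user_ids exactly."""
    wanted = sorted(set(user_ids))  # repeated input ids count once
    if not wanted:
        return []
    marks = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT group_id
        FROM group_members
        GROUP BY group_id
        HAVING COUNT(*) = ?                   -- group has as many members as the input
           AND SUM(user_id IN ({marks})) = ?  -- and all of them are requested users
        ORDER BY group_id
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]


conn = make_demo_db()
print(find_groups(conn, [2, 3]))  # only the group made of exactly users 2 and 3
```

Group 2 ({2, 5}) is the interesting case here: it has the right size and shares one user with the input, so a check on size alone would accept it.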

Related

Insert many foreign-key related records in one query

I am trying to come up with a single query that helps me atomically normalise a table (populate a new table with initial values from another table, and simultaneously add the foreign key reference to it)
Obviously populating one table from another is a standard INSERT INTO ... SELECT ..., but I also want to update the foreign key reference in the 'source' table to reference the new record in the 'new' table.
Let's say I am migrating a schema from:
CREATE TABLE companies (
id INTEGER PRIMARY KEY,
address_line_1 TEXT,
address_line_2 TEXT,
address_line_3 TEXT
);
to:
CREATE TABLE addresses (
id INTEGER PRIMARY KEY,
line_1 TEXT,
line_2 TEXT,
line_3 TEXT
);
CREATE TABLE companies (
id INTEGER PRIMARY KEY,
address_id INTEGER REFERENCES addresses(id)
);
... I thought perhaps a CTE might help of the form
WITH new_addresses AS (
INSERT INTO addresses (line_1, line_2, line_3)
SELECT address_line_1, address_line_2, address_line_3
FROM companies
RETURNING id, companies.id AS company_id -- DOESN'T WORK
)
UPDATE companies
SET address_id = new_addresses.id
FROM new_addresses
WHERE new_addresses.company_id = companies.id
It seems, however, that RETURNING can only return data from the inserted record, so this will not work.
I assume at this point that the answer will be to either use PL/pgSQL or incorporate domain knowledge of the data to do this in a multi-step process. My current solution is pretty much:
-- FIRST QUERY
-- ensure address record IDs will be in the same sequential order as their relating companies
INSERT INTO addresses (line_1, line_2, line_3)
SELECT address_line_1, address_line_2, address_line_3
FROM companies
ORDER BY id;
-- SECOND QUERY
-- join, making the assumption that the table IDs are in the same order
WITH address_ids AS (
SELECT id AS address_id, ROW_NUMBER() OVER(ORDER BY id) AS idx
FROM addresses
), company_ids AS (
SELECT id AS company_id, ROW_NUMBER() OVER(ORDER BY id) AS idx
FROM companies
), company_address_ids AS (
SELECT company_id, address_id
FROM address_ids
JOIN company_ids USING (idx)
)
UPDATE companies
SET address_id = company_address_ids.address_id
FROM company_address_ids
WHERE id = company_address_ids.company_id
This is obviously problematic in that it relies on the addresses table containing exactly as many records as the company table, but such a query would be a one-off when the table is first created.

SELECT after specific row with non-sequential (uuid) primary key

With the following table:
CREATE TABLE users (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
inserted_at timestamptz NOT NULL DEFAULT now()
-- other fields
);
How could I retrieve n rows after a specific id, ordered by inserted_at?
I am expecting something like this:
select u.*
from users u
where u.inserted_at > (select u2.inserted_at from users u2 where u2.id = 'f4ae4105-1afb-4ba6-a2ad-4474c9bae483')
order by u.inserted_at
limit 10;
For this, you want one additional index on users(inserted_at).
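One caveat with comparing inserted_at alone: rows that share a timestamp with the reference row are skipped. The following is a minimal sketch of a tie-breaking variant, assuming an in-memory SQLite database driven from Python; the function name rows_after, the one-letter ids, and the sample timestamps are made up for illustration. It compares the pair (inserted_at, id) instead of inserted_at only. PostgreSQL supports the same row comparison, and the supporting index would then cover (inserted_at, id).

```python
import sqlite3


def rows_after(conn, after_id, n):
    """Return up to n ids that sort after the row with id after_id."""
    sql = """
        SELECT u.id
        FROM users AS u
        JOIN users AS cur ON cur.id = :after_id
        WHERE (u.inserted_at, u.id) > (cur.inserted_at, cur.id)  -- id breaks timestamp ties
        ORDER BY u.inserted_at, u.id
        LIMIT :n
    """
    return [row[0] for row in conn.execute(sql, {"after_id": after_id, "n": n})]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, inserted_at TEXT NOT NULL)")
conn.execute("CREATE INDEX users_inserted_at_id ON users (inserted_at, id)")
conn.executemany(
    "INSERT INTO users (id, inserted_at) VALUES (?, ?)",
    [
        ("a", "2024-01-01T00:00:00Z"),
        ("b", "2024-01-02T00:00:00Z"),
        ("c", "2024-01-02T00:00:00Z"),  # same timestamp as "b"
        ("d", "2024-01-03T00:00:00Z"),
    ],
)
print(rows_after(conn, "b", 10))  # "c" is kept even though it ties with "b"
```

The sort order must be the same on every page, which is why id appears in both the WHERE comparison and the ORDER BY.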

INSERT into table with WITH clause not working in postgres

Sorry for the follow-up question (following "INSERT into table if doesn't exists and return id in both cases"), but I couldn't find a solution to my problem.
I have a feedback table whose columns are foreign keys to other tables. For example, scopeid is a foreign key to the id column of the scope table, userid is a foreign key to the id column of the users table, and so on.
So, I am trying to insert following data in the table:
scope: home_page,
username: abc
status: fixed
app: demoapp
To insert the data above, I am trying to write subqueries that look up the id for each value. If a value doesn't exist yet, it should be inserted and the new id used in the feedback table.
So basically I am trying to insert into multiple tables (when something doesn't exist) and use those ids to insert into the final table, feedback.
Hope things are much clearer now.
Here is my feedback table:
id scopeid comment rating userid statusid appid
3 1 test 5 2 1 2
All the id columns are foreign keys to other tables, so in the query below I try to look up each id by name and add the row if it doesn't exist.
Here is my final query:
INSERT INTO feedbacks (scopeid, comment, rating, userid, statusid, appid)
VALUES
(
-- GET SCOPE ID
(
WITH rows_exists AS (
SELECT id FROM scope
WHERE appid=2 AND NAME = 'application'),
row_new AS (INSERT INTO scope (appid, NAME) SELECT 2, 'application' WHERE NOT EXISTS (SELECT id FROM scope WHERE appid=2 AND name='application') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
),
-- Comment
'GOD IS HERE TO COMMENT',
-- rating
5,
-- userid
(
WITH rows_exists AS (
SELECT id FROM users
WHERE username='abc'),
row_new AS (INSERT INTO users (username) SELECT 'abc' WHERE NOT EXISTS (SELECT id FROM users WHERE username='abc') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
),
-- statusid
(SELECT id FROM status WHERE NAME='received'),
-- appid
(
WITH rows_exists AS (
SELECT id FROM apps
WHERE name='google'),
row_new AS (INSERT INTO apps (name) SELECT 'google' WHERE NOT EXISTS (SELECT id FROM apps WHERE NAME='google') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
)
)
But I get the following error:
with clause containing a data-modifying statement must be at the top level
Is what I am trying to achieve even possible, this way or with another method?
The following inserts ids that don't exist and then inserts the resulting id:
with s as (
select id
from scope
where appid = 2 AND NAME = 'application'
),
si as (
insert into scope (appid, name)
select v.appid, v.name
from (values (2, 'application')) v(appid, name)
where not exists (select 1 from scope s where s.appid = v.appid and s.name = v.name)
returning id
),
. . . similar logic for other tables
insert into feedback (scopeid, comment, . . . )
select (select id from s union all select id from si) as scopeid,
'test' as comment,
. . .;
Make sure each of the tables has a unique constraint on the values you are looking up. Otherwise there is a race condition, and in a multithreaded environment you could end up inserting the same row multiple times.
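The role of the unique constraint can be illustrated with a small get-or-create helper. This is a minimal sketch, assuming an in-memory SQLite database driven from Python; the helper name get_or_create_user and the simplified users table are made up for illustration. SQLite's INSERT OR IGNORE relies on the UNIQUE constraint to skip an existing row, and PostgreSQL offers INSERT ... ON CONFLICT DO NOTHING for the same purpose.

```python
import sqlite3


def get_or_create_user(conn, username):
    """Insert username if it is missing and return its id either way."""
    # The UNIQUE constraint on username is what lets the insert be skipped safely.
    conn.execute("INSERT OR IGNORE INTO users (username) VALUES (?)", (username,))
    row = conn.execute("SELECT id FROM users WHERE username = ?", (username,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)"
)
first_id = get_or_create_user(conn, "abc")
second_id = get_or_create_user(conn, "abc")  # second call finds the existing row
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(first_id == second_id, user_count)
```

Without the constraint, the second insert would succeed and the lookup could return either of two rows.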

SQLite: How would you rephrase the query?

I created a basic movie database and for that I'm working with SQLite.
I have a table, which looks like this:
CREATE TABLE movie_collection (
user_id INTEGER NOT NULL,
movie_id INTEGER NOT NULL,
PRIMARY KEY (user_id, movie_id),
FOREIGN KEY (user_id) REFERENCES user (id),
FOREIGN KEY (movie_id) REFERENCES movie (id)
)
As a simple task, I want to show one user (say user_id = 1) all movie collections; this user may or may not have a collection of their own. I also have to avoid duplicate rows when more than one user has the same movie in their collection. If the current user (user_id = 1) is among them, that user has priority. For example, given the following three records:
user_id movie_id
-------- ---------
1 17
5 17
8 17
Then the result set must have the record (1, 17) and not other two.
For this task I wrote a sql query like this:
SELECT movie_collect.user_id, movie_collect.movie_id
FROM (
SELECT user_id, movie_id FROM movie_collection WHERE user_id = 1
UNION
SELECT user_id, movie_id FROM movie_collection WHERE user_id != 1 AND movie_id NOT IN (SELECT movie_id FROM movie_collection WHERE user_id = 1)
) AS movie_collect
Although this query delivers pretty much what I need, out of curiosity I wanted to ask whether someone has another idea for solving this problem.
Thank you.
The outer query is superfluous:
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
UNION
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id != 1
AND movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1)
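The NOT IN condition already excludes every row that belongs to user 1 (any such row has a movie_id that is in user 1's collection), so you do not need to check user_id in the second subquery: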
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
UNION
SELECT user_id, movie_id
FROM movie_collection
WHERE movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1)
And the only difference between the two subqueries is the WHERE clause, so you can combine them:
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
OR movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1);
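The equivalence of the first and last forms can be checked on a small data set. This is a minimal sketch using Python's sqlite3 module with an in-memory database; the sample rows are made up, and the foreign keys are left out to keep it short. An ORDER BY is added to both queries only so that the results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_collection ("
    " user_id INTEGER NOT NULL,"
    " movie_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, movie_id))"
)
conn.executemany(
    "INSERT INTO movie_collection (user_id, movie_id) VALUES (?, ?)",
    [(1, 17), (5, 17), (8, 17), (1, 3), (5, 20), (8, 20)],
)

# The UNION form, with the explicit user_id != 1 check.
union_sql = """
    SELECT user_id, movie_id FROM movie_collection WHERE user_id = 1
    UNION
    SELECT user_id, movie_id FROM movie_collection
    WHERE user_id != 1
      AND movie_id NOT IN (SELECT movie_id FROM movie_collection WHERE user_id = 1)
    ORDER BY movie_id, user_id
"""

# The combined form with a single WHERE clause.
combined_sql = """
    SELECT user_id, movie_id
    FROM movie_collection
    WHERE user_id = 1
       OR movie_id NOT IN (SELECT movie_id FROM movie_collection WHERE user_id = 1)
    ORDER BY movie_id, user_id
"""

union_rows = conn.execute(union_sql).fetchall()
combined_rows = conn.execute(combined_sql).fetchall()
print(combined_rows)
```

On this data both queries return (1, 3), (1, 17), (5, 20) and (8, 20). Movie 17 appears only for user 1, while movie 20, which user 1 does not own, still appears once per owning user in both forms.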

Optimizing query when trying to find latest record in multiple tables for specific column

Problem: Find the most recent record based on (created) column for each (linked_id) column in multiple tables, the results should include (user_id, MAX(created), linked_id). The query must also be able to be used with a WHERE clause to find a single record based on the (linked_id).
There are actually several tables in question, but here are three so you can get an idea of the structure (several other columns in each table have been omitted since they are not to be returned).
CREATE TABLE em._logs_adjustments
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_adjustments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE em._logs_assets
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_assets_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE em._logs_condition_assessments
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_condition_assessments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
The query I'm currently using relies on a small hack to get around the need for user_id in the GROUP BY clause; if possible, array_agg should be removed.
SELECT MAX(MaxDate), linked_id, (array_agg(user_id ORDER BY MaxDate DESC))[1] AS user_id FROM (
SELECT user_id, MAX(created) as MaxDate, asset_id AS linked_id FROM _logs_assets
GROUP BY asset_id, user_id
UNION ALL
SELECT user_id, MAX(created) as MaxDate, linked_id FROM _logs_adjustments
GROUP BY linked_id, user_id
UNION ALL
SELECT user_id, MAX(created) as MaxDate, linked_id FROM _logs_condition_assessments
GROUP BY linked_id, user_id
) as subQuery
GROUP BY linked_id
ORDER BY linked_id DESC
I receive the desired results but don't believe this is the right way to do it, especially since array_agg is being used when it shouldn't be. Some tables can have upwards of 1.5 million records, which makes the query take 10-15 seconds or more to run. Any help or steering in the right direction is much appreciated.
Use DISTINCT ON. From the PostgreSQL documentation:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
select distinct on (linked_id) created, linked_id, user_id
from (
select user_id, created, asset_id as linked_id
from _logs_assets
union all
select user_id, created, linked_id
from _logs_adjustments
union all
select user_id, created, linked_id
from _logs_condition_assessments
) s
order by linked_id desc, created desc
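DISTINCT ON is specific to PostgreSQL. The same "latest row per linked_id" result can be sketched with a ROW_NUMBER() window function, which is a different technique from the one in the answer. The example below is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer); the function name latest_per_link, the two simplified tables, and the sample rows are made up for illustration. The optional linked_id argument covers the requirement of looking up a single record.

```python
import sqlite3


def latest_per_link(conn, linked_id=None):
    """Return (created, linked_id, user_id) for the newest log row per linked_id."""
    sql = """
        SELECT created, linked_id, user_id
        FROM (
            SELECT s.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY linked_id ORDER BY created DESC
                   ) AS rn  -- 1 marks the most recent row of each linked_id
            FROM (
                SELECT user_id, created, linked_id FROM logs_assets
                UNION ALL
                SELECT user_id, created, linked_id FROM logs_adjustments
            ) AS s
        )
        WHERE rn = 1
          AND (:linked_id IS NULL OR linked_id = :linked_id)
        ORDER BY linked_id DESC
    """
    return conn.execute(sql, {"linked_id": linked_id}).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("logs_assets", "logs_adjustments"):
    conn.execute(
        f"CREATE TABLE {table} ("
        " id INTEGER PRIMARY KEY, user_id INTEGER,"
        " created TEXT NOT NULL, linked_id INTEGER)"
    )
conn.executemany(
    "INSERT INTO logs_assets (user_id, created, linked_id) VALUES (?, ?, ?)",
    [(10, "2020-01-01", 1), (11, "2020-03-01", 1), (12, "2020-02-01", 2)],
)
conn.executemany(
    "INSERT INTO logs_adjustments (user_id, created, linked_id) VALUES (?, ?, ?)",
    [(20, "2020-04-01", 1), (21, "2020-01-15", 2)],
)
print(latest_per_link(conn))
```

No array_agg or per-user grouping is involved; user_id simply comes from whichever row ranks first within its linked_id.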