Optimise many-to-many join - sql

I have three tables: groups, people, and groups_people, which forms a many-to-many relationship between groups and people.
Schema:
CREATE TABLE groups (
    id SERIAL PRIMARY KEY,
    name TEXT
);
CREATE TABLE people (
    id SERIAL PRIMARY KEY,
    name TEXT,
    join_date TIMESTAMP
);
CREATE TABLE groups_people (
    group_id INT REFERENCES groups(id),
    person_id INT REFERENCES people(id)
);
I want to query for the 10 people who most recently joined the group with id = 1:
WITH person_ids AS (SELECT person_id FROM groups_people WHERE group_id = 1)
SELECT * FROM people WHERE id = ANY(SELECT person_id FROM person_ids)
ORDER BY join_date DESC LIMIT 10;
The query has to scan all of the group's members and sort them before selecting, which would be slow if the group contains too many people.
Is there any way to work around that?

Schema (re-)design to allow the same person to join multiple groups
Since you mentioned that the relationship between groups and people
is many-to-many, I think you may want to move join_date to groups_people
(from people), because the same person can join different groups and each
such event has its own join_date.
So I would change the schema to:
CREATE TABLE people (
    id SERIAL PRIMARY KEY,
    name TEXT --, -- change
    -- join_date TIMESTAMP -- delete
);
CREATE TABLE groups_people (
    group_id INT REFERENCES groups(id),
    person_id INT REFERENCES people(id), -- change
    join_date TIMESTAMP -- add
);
Query
select
    p.id
    , p.name
    , gp.join_date
from
    people as p
    , groups_people as gp
where
    p.id = gp.person_id
    and gp.group_id = 1
order by gp.join_date desc
limit 10
Disclaimer: The above query is in MySQL syntax (the question was originally tagged with MySQL)
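If the group can be large, it is also worth pairing the moved column with an index that matches both the filter and the sort, so that LIMIT 10 can stop early instead of sorting every member. A sketch, assuming Postgres or MySQL 8+ (the index name is made up):
-- Hypothetical index: rows for one group come back already
-- ordered by join_date, so LIMIT 10 stops after the first 10
-- instead of sorting the whole membership.
CREATE INDEX groups_people_group_join_idx
    ON groups_people (group_id, join_date DESC);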

This seems much easier to write as a simple join with order by and limit:
select p.*
from people p join
     groups_people gp
     on p.id = gp.person_id
where gp.group_id = 1
order by p.join_date desc
limit 10; -- or fetch first 10 rows only

Try rewriting it using EXISTS:
SELECT *
FROM people p
WHERE EXISTS (SELECT 1
              FROM groups_people ps
              WHERE ps.person_id = p.id AND ps.group_id = 1)
ORDER BY join_date DESC
LIMIT 10;
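With the original schema, the usual workaround for big groups is indexing rather than rewriting. A sketch, assuming Postgres (the index names are made up):
-- Lets the planner walk people newest-first and stop once
-- 10 members of the group have been found, instead of
-- sorting every member.
CREATE INDEX people_join_date_idx ON people (join_date DESC);
-- Makes each membership probe an index-only lookup.
CREATE INDEX groups_people_person_group_idx
    ON groups_people (person_id, group_id);
The planner may still prefer another plan when group 1 is tiny relative to people, so check with EXPLAIN ANALYZE.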

Related

Get user from table based on id

I have these Postgres tables:
create table deals_new
(
    id bigserial primary key,
    slip_id text,
    deal_type integer,
    timestamp timestamp,
    employee_id bigint
        constraint employee_id_fk
            references common.employees
);
create table twap
(
    id bigserial primary key,
    deal_id varchar not null,
    employee_id bigint
        constraint fk_twap__employee_id
            references common.employees,
    status integer
);
create table employees
(
    id bigint primary key,
    account_id integer,
    first_name varchar(150),
    last_name varchar(150)
);
New table to query:
create table accounts
(
    id bigint primary key,
    account_name varchar(150) not null
);
I use this SQL query:
select d.*, t.id as twap_id
from common.deals_new d
left outer join common.twap t on
    t.deal_id = d.slip_id and
    d.timestamp between '11-11-2021' and '11-11-2021' and
    d.deal_type in (1, 2) and
    d.quote_id is null
where d.employee_id is not null
order by d.timestamp desc, d.id
offset 10
limit 10;
How can I extend this SQL query to also search in the employees table by account_id and map the result to the accounts table by id? I would also like to print accounts.account_name based on employees.account_id.
You need two joins to make this work for you: one join to get to the employees table, and one more join to get to the accounts table.
select d.*, t.id as twap_id, a.account_name
from common.deals_new d
left outer join common.twap t on
    t.deal_id = d.slip_id and
    d.timestamp between '11-11-2021' and '11-11-2021' and
    d.deal_type in (1, 2) and
    d.quote_id is null
join employees as e on d.employee_id = e.id
join accounts as a on a.id = e.account_id
where d.employee_id is not null
order by d.timestamp desc, d.id
offset 10
limit 10;
Note: I did not fiddle this one, so it could have a typo, but I think you get the idea here.
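One caveat: the two inner joins will also drop any deal whose employee has no matching row in employees or accounts. If such rows can exist and you still want those deals listed, left joins preserve them; a sketch, assuming the same tables (account_name comes back NULL when there is no match):
select d.*, t.id as twap_id, a.account_name
from common.deals_new d
left outer join common.twap t on
    t.deal_id = d.slip_id and
    d.timestamp between '11-11-2021' and '11-11-2021' and
    d.deal_type in (1, 2) and
    d.quote_id is null
-- left joins keep deals even without an employees/accounts match
left join employees as e on d.employee_id = e.id
left join accounts as a on a.id = e.account_id
where d.employee_id is not null
order by d.timestamp desc, d.id
offset 10
limit 10;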

optimize delete based on "count" of elements in table with foreign key

I have two tables: "tracks" (the track headers) and "track_points" (the points in each track).
Schema:
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY ASC,
    start_time TEXT NOT NULL
);
CREATE TABLE track_points (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data BLOB,
    track_id INTEGER NOT NULL,
    FOREIGN KEY(track_id) REFERENCES tracks(id) ON DELETE CASCADE
);
CREATE INDEX track_id_idx ON track_points (track_id);
CREATE INDEX start_time_idx ON tracks (start_time);
And I want to delete all "tracks" that have 0 or 1 points.
Note that if a track has 0 points, it has no rows in "track_points".
I wrote this query:
DELETE FROM tracks WHERE tracks.id IN
    (SELECT track_id FROM
        (SELECT tracks.id AS track_id, COUNT(track_points.id) AS track_len
         FROM tracks
         LEFT JOIN track_points ON tracks.id = track_points.track_id
         GROUP BY tracks.id)
     WHERE track_len <= 1)
It seems to work, but I wonder whether it is possible to optimize this query.
I mean its run time (currently 10 seconds on a big table on my machine).
Or maybe this SQL can be simplified (while preserving the run time, of course)?
You can simplify your code by removing one level of subqueries, because you can achieve the same with a HAVING clause instead of an outer WHERE clause:
DELETE FROM tracks
WHERE id IN (
    SELECT t.id
    FROM tracks t
    LEFT JOIN track_points p ON t.id = p.track_id
    GROUP BY t.id
    HAVING COUNT(p.id) <= 1
);
The above may not make any difference in run time, but it's simpler.
The same logic could also be applied by using EXCEPT:
DELETE FROM tracks
WHERE id IN (
    SELECT id FROM tracks
    EXCEPT
    SELECT track_id
    FROM track_points
    GROUP BY track_id
    HAVING COUNT(*) > 1
);
What you can try is a query that does not involve a join of the two tables at all.
Aggregate only over track_points to get the track_ids with 2 or more points, then delete all the rows from tracks whose ids are not included in that result:
DELETE FROM tracks
WHERE id NOT IN (
    SELECT track_id
    FROM track_points
    GROUP BY track_id
    HAVING COUNT(*) > 1
);
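To verify that the join-free version really uses track_id_idx instead of scanning track_points repeatedly, you can ask SQLite for its plan; a sketch (the exact output wording varies by SQLite version):
EXPLAIN QUERY PLAN
DELETE FROM tracks
WHERE id NOT IN (
    SELECT track_id
    FROM track_points
    GROUP BY track_id
    HAVING COUNT(*) > 1
);
-- The subquery line should mention something like
-- "USING COVERING INDEX track_id_idx".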

Postgres many-to-one relationship: join multiple tables and select all rows, provided that at least one row matches some criteria

Suppose I have a schema something like
create table if not exists user (
    id serial primary key,
    name text not null
);
create table if not exists post (
    id serial primary key,
    user_id integer not null references user (id),
    score integer not null
);
I want to run a query that selects a row from the user table by ID, and all the rows that reference it from the post table, provided that at least one row in the post table has a score of greater than some number n (e.g. 50). I'm not exactly sure how to do this though.
You can use window functions. Let me assume that post has a user_id column so the tables can be tied together:
select u.*
from user u join
     (select p.*, max(score) over (partition by user_id) as max_score
      from post p
     ) p
     on p.user_id = u.id
where p.max_score > 50;
If you just wanted all scores, then aggregation with filtering might be sufficient:
select u.*, array_agg(p.score order by p.score desc)
from user u join
     post p
     on p.user_id = u.id
group by u.id
having max(p.score) > 50;
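If what you want is the user's row plus every one of their post rows (not an aggregated array), an EXISTS filter is another option. A sketch, assuming the same two tables; the id 1 and the threshold 50 are example values:
-- One row per post of user 1, returned only if at least one
-- of that user's posts has score > 50.
select u.*, p.*
from user u
join post p on p.user_id = u.id
where u.id = 1
  and exists (select 1
              from post p2
              where p2.user_id = u.id and p2.score > 50);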

How to do a massive update?

I have three tables:
group:
    id - primary key
    name - varchar
profile:
    id - primary key
    name - varchar
    surname - varchar
    [...etc...]
profile_group:
    profile_id - integer, foreign key to table profile
    group_id - integer, foreign key to table group
Profiles may be in many groups. I have a group named "Users" with id=1, and I want to assign all profiles to this group, but only if there is no such entry for a given profile already.
How to do it?
If I understood you correctly, you want to add entries like (profile_id, 1) into the profile_group table for all profiles that were not in this table before. If so, try this:
INSERT INTO profile_group (profile_id, group_id)
SELECT p.id, 1 FROM profile p
LEFT JOIN profile_group pg ON (p.id = pg.profile_id)
WHERE pg.group_id IS NULL;
What you want to do is use a left join to the profile_group table and then exclude any matching records (this is done in the where clause of the SQL statement below).
This is faster than using not in (select xxx), since the query planner seems to handle it better (in my experience):
insert into profile_group (profile_id, group_id)
select p.id, 1
from profile p
left join profile_group pg on p.id = pg.profile_id
    and pg.group_id = 1
where pg.profile_id is null
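A NOT EXISTS anti-join is a third spelling of the same idea, and it reads closest to the stated requirement; a sketch, assuming the same tables:
-- Insert (profile_id, 1) for every profile not yet in group 1.
INSERT INTO profile_group (profile_id, group_id)
SELECT p.id, 1
FROM profile p
WHERE NOT EXISTS (
    SELECT 1
    FROM profile_group pg
    WHERE pg.profile_id = p.id AND pg.group_id = 1
);
Most modern planners turn this into the same anti-join plan as the left-join version, so pick whichever reads better to you.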

Join tables in SQLite with many-to-many

I have the following database schema:
create table people (
    id integer primary key autoincrement
);
create table groups (
    id integer primary key autoincrement
);
and I already have which people are members of which groups in a separate file (let's say in tuples of (person id, group id)). How can I structure my database schema so that it's easy to access a person's groups, and also easy to access the members of a group? It is difficult and slow to read the tuples that I currently have, so I want this in database form. I can't have things like member1, member2, etc. as columns, because the number of people in a group is unlimited.
Move your text file into a database table:
CREATE TABLE groups_people (
    groups_id integer,
    people_id integer,
    PRIMARY KEY(groups_id, people_id)
);
And select all people that are members of group 7:
SELECT * FROM people p
JOIN groups_people gp ON gp.people_id = p.id
WHERE gp.groups_id = 7;
And select all the groups that person 5 is in:
SELECT * FROM groups g
JOIN groups_people gp ON gp.groups_id = g.id
WHERE gp.people_id = 5;
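Two notes on top of this (my additions, not part of the answer above): the existing tuples can be loaded with the sqlite3 CLI (.mode csv, then .import yourfile.csv groups_people), and the second query benefits from a reverse index, since the primary key orders by groups_id first. A sketch (the index name is made up):
-- The PK (groups_id, people_id) serves "members of a group";
-- this serves "groups of a person" without a full table scan.
CREATE INDEX groups_people_person_idx
    ON groups_people (people_id, groups_id);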