Optimize query with IN clause for one-to-many association - sql

I currently have two tables:
Users -> 1 million records
Requests -> 10 million records
A User has many Requests. I'm fetching all users along with their most recently created Request, using something like the following query:
SELECT *
FROM users AS u
INNER JOIN requests AS r
ON u.id = r.user_id
WHERE r.id IN (
SELECT MAX(r.id)
FROM users u
INNER JOIN requests r ON r.user_id = u.id
GROUP BY u.id
);
This works, but performance is very poor (> 7 sec). I understand why, and I'm trying to find a solution, even if I have to modify the schema.
Note: the Requests table consists of Boolean columns, and I'm not sure whether indexing will help here.

In PostgreSQL, distinct on () is one way to do this:
select distinct on (u.id) *
from users u
join requests r on u.id = r.user_id
order by u.id, r.id desc;
Another option is a lateral join:
select *
from users u
join lateral (
select *
from requests r1
where u.id = r1.user_id
order by id desc
limit 1
) r on true
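The subquery in the question joins users again only to group by u.id, which requests.user_id already provides. Below is a minimal sketch using Python's sqlite3 with invented rows. SQLite has neither DISTINCT ON nor LATERAL, so this shows a plain GROUP BY variant of the original query without the redundant join. The composite index on (user_id, id) is an assumption about what might help, not something measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE requests (id INTEGER PRIMARY KEY, user_id INTEGER, approved INTEGER);
    -- Assumed index: MAX(id) per user_id can be answered from it.
    CREATE INDEX requests_user_id_id ON requests (user_id, id);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO requests VALUES (1, 1, 0), (2, 1, 1), (3, 2, 0), (4, 1, 0), (5, 2, 1);
""")

# Latest request per user; the subquery no longer touches users.
latest = conn.execute("""
    SELECT u.id, r.id
    FROM users AS u
    INNER JOIN requests AS r ON r.user_id = u.id
    WHERE r.id IN (SELECT MAX(id) FROM requests GROUP BY user_id)
    ORDER BY u.id
""").fetchall()
print(latest)
```

As with the original inner join, users without any request (user 3 here) are left out.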

Related

How to paginate a complex SQL join query with duplicate data?

I have several tables in the database: users, profiles and user roles.
Profiles and users have a one-to-one relationship.
Roles and users have a many-to-many relationship.
To select all users, I run the following query:
SELECT A.role_id, A.role_name, A.user_id,B.user_username, B.user_password, B.profile_color_text, B.profile_color_menu, B.profile_color_bg FROM
(SELECT Roles.role_id, Roles.role_name, UserRoles.user_id
FROM Roles INNER JOIN UserRoles ON Roles.role_id = UserRoles.role_id) AS A
LEFT JOIN
(SELECT Users.user_username, Users.user_password, Profiles.profile_color_text, Profiles.profile_color_menu, Profiles.profile_color_bg, Profiles.profile_id
FROM Users INNER JOIN Profiles ON Users.user_id = Profiles.profile_id) AS B
ON A.user_id = B.profile_id;
The question is: how do I paginate this query?
I would get the 10 users first, then perform the joins, for two reasons:
1. You don't want 10 result rows but the results for 10 users, which could span any number of rows. If you fetch all the data and then limit it, you could get 10 rows containing data for only 5 users.
2. Even if point 1 were irrelevant because the relationship was always 1-1, and especially when the page is small like 10, it's generally faster to get those users first and then join on that smaller "table" than to do all the joins on all the data and then limit it.
SELECT
u.user_id,
u.user_username,
u.user_password,
r.role_id,
r.role_name,
p.profile_id,
p.profile_color_text,
p.profile_color_menu,
p.profile_color_bg
FROM (
SELECT user_id, user_username, user_password
FROM users
ORDER BY ???
OFFSET 10
LIMIT 10
) AS u
LEFT JOIN profiles AS p
ON u.user_id = p.profile_id
LEFT JOIN userroles AS ur
ON u.user_id = ur.user_id
LEFT JOIN roles AS r
ON ur.role_id = r.role_id
I assume you'll want some order, so I've put an ORDER BY in there - to be completed.
OFFSET added to get the second page of results; first page wouldn't require it, or would be OFFSET 0. Then a LIMIT of course to limit the page size.
I've also restructured the joins in a way that made more sense to me.
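The "page the users first, then join" idea can be sketched with Python's sqlite3 and invented rows. The sort key (user_id) and the helper name page_of_users are assumptions for illustration only. SQLite wants LIMIT before OFFSET, so the clause order differs from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_username TEXT);
    CREATE TABLE roles (role_id INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE userroles (user_id INTEGER, role_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid'), (4, 'dee');
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor'), (3, 'viewer');
    INSERT INTO userroles VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 1), (3, 3), (4, 3);
""")

def page_of_users(page, page_size):
    """Return (user_id, role_name) rows for one page of users (hypothetical helper)."""
    return conn.execute("""
        SELECT u.user_id, r.role_name
        FROM (
            SELECT user_id, user_username
            FROM users
            ORDER BY user_id          -- assumed sort key
            LIMIT ? OFFSET ?
        ) AS u
        LEFT JOIN userroles AS ur ON ur.user_id = u.user_id
        LEFT JOIN roles AS r ON r.role_id = ur.role_id
        ORDER BY u.user_id, r.role_id
    """, (page_size, page * page_size)).fetchall()

first_page = page_of_users(0, 2)
second_page = page_of_users(1, 2)
print(first_page)
print(second_page)
```

Each page holds exactly two users, however many role rows those users bring along.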

Querying for many relations of type "hasMany" in one query - lot of rows returned

Let's say I have a User model and the models Profile, Comment, Tag and Friend.
The following relations exist between the models:
User.hasMany(Profile)
User.hasMany(Comment)
User.hasMany(Tag)
User.hasMany(Friend)
By a hasMany relation I mean that, for example, the Profiles table has an FK column referencing Users.id.
I want to fetch a User by id along with all of its associations. I use an ORM for this, and it generates the following query:
SELECT *
FROM "Users" users
LEFT JOIN "Profiles" profiles ON profiles."UserId" = users.id
LEFT JOIN "Comments" comments ON comments."UserId" = users.id
LEFT JOIN "Tags" tags ON tags."UserId" = users.id
LEFT JOIN "Friends" friends ON friends."UserId" = users.id
WHERE users.id = 1
The problem is that the rows returned by the query are multiplied. If a user has 10 profiles, 10 comments, 10 tags and 10 friends, the query returns 10*10*10*10 = 10 000 rows. That is a lot: it takes time to transfer from the DB to the app, and time (and memory!) to parse. How can I avoid this? Should I make separate queries to the DB? Or is there some trick that stops the rows from multiplying each other?
For now, the query I am using returns 73k rows (!) and consumes ~400MB (!!!) before being parsed into the desired model structure.
This especially "escalates" if I have many relations of type hasManyThrough (which adds 1 additional join table per relation/association).
Would a union work?
select p.id from "Users" u join "Profiles" p on p."UserId" = u.id where u.id = 1
union all
select c.id from "Users" u join "Comments" c on c."UserId" = u.id where u.id = 1
union all
select t.id from "Users" u join "Tags" t on t."UserId" = u.id where u.id = 1
union all
select f.id from "Users" u join "Friends" f on f."UserId" = u.id where u.id = 1
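To see the difference in row counts, here is a small sketch with Python's sqlite3 and invented data (2 profiles, 3 comments, 4 tags for one user). The lowercase table names and the extra kind column, which records which table each id came from, are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO profiles (user_id) VALUES (1), (1);
    INSERT INTO comments (user_id) VALUES (1), (1), (1);
    INSERT INTO tags (user_id) VALUES (1), (1), (1), (1);
""")

# Joining every association at once multiplies the rows: 2 * 3 * 4.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    LEFT JOIN comments c ON c.user_id = u.id
    LEFT JOIN tags t ON t.user_id = u.id
    WHERE u.id = 1
""").fetchone()[0]

# UNION ALL stacks the associations instead: 2 + 3 + 4.
unioned = conn.execute("""
    SELECT 'profile' AS kind, id FROM profiles WHERE user_id = 1
    UNION ALL
    SELECT 'comment', id FROM comments WHERE user_id = 1
    UNION ALL
    SELECT 'tag', id FROM tags WHERE user_id = 1
""").fetchall()

print(joined, len(unioned))
```

Running one query per association would return the same 9 rows in total, at the cost of extra round trips.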

How can I get records from one table which do not exist in a related table?

I have this users table:
and this relationships table:
So each user is paired with another one in the relationships table.
Now I want to get a list of users which are not in the relationships table, in either of the two columns (user_id or pair_id).
How could I write that query?
First try:
SELECT users.id
FROM users
LEFT OUTER JOIN relationships
ON users.id = relationships.user_id
WHERE relationships.user_id IS NULL;
Output:
This should display only 2 results: 5 and 6. The result 8 is not correct, as it already exists in relationships. Of course I'm aware that the query is not correct; how can I fix it?
I'm using PostgreSQL.
You need to compare to both values in the ON clause:
SELECT u.id
FROM users u LEFT OUTER JOIN
relationships r
ON u.id = r.user_id or u.id = r.pair_id
WHERE r.user_id IS NULL;
In general, OR in an ON clause can be inefficient. I would recommend replacing this with two NOT EXISTS conditions:
SELECT u.id
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id) AND
NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id);
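The table contents from the question are not reproduced here, so the sketch below uses hypothetical rows chosen to give the same symptom (user 8 appears only in pair_id). It runs the first-try query and the NOT EXISTS version side by side with Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4), (5), (6), (7), (8);
    -- Hypothetical data: user 8 appears only in pair_id.
    INSERT INTO relationships VALUES (1, 2), (2, 1), (3, 4), (4, 3), (7, 8);
""")

# The question's first attempt only checks user_id.
first_try = [row[0] for row in conn.execute("""
    SELECT users.id
    FROM users
    LEFT OUTER JOIN relationships ON users.id = relationships.user_id
    WHERE relationships.user_id IS NULL
    ORDER BY users.id
""")]

# Checking both columns with NOT EXISTS.
unpaired = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id)
      AND NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id)
    ORDER BY u.id
""")]

print(first_try, unpaired)
```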
I like the set operators
select id from users
except
select user_id from relationships
except
select pair_id from relationships
or
select id from users
except
(select user_id from relationships
union
select pair_id from relationships
)
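Both set-operator forms should give the same result, since removing B and then C is the same as removing the union of B and C. A quick check with Python's sqlite3 and hypothetical rows follows. SQLite does not accept a parenthesized compound query as an operand, so the second form is wrapped in a derived table here, which is an adaptation and not the syntax shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4), (5), (6), (7), (8);
    -- Hypothetical data.
    INSERT INTO relationships VALUES (1, 2), (2, 1), (3, 4), (4, 3), (7, 8);
""")

# Chained EXCEPT is evaluated left to right.
chained = sorted(row[0] for row in conn.execute("""
    SELECT id FROM users
    EXCEPT
    SELECT user_id FROM relationships
    EXCEPT
    SELECT pair_id FROM relationships
"""))

# EXCEPT against the union of both columns (derived table for SQLite).
against_union = sorted(row[0] for row in conn.execute("""
    SELECT id FROM users
    EXCEPT
    SELECT * FROM (
        SELECT user_id FROM relationships
        UNION
        SELECT pair_id FROM relationships
    )
"""))

print(chained, against_union)
```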
This is a special case of:
Select rows which are not present in other table
I suppose this will be simplest and fastest:
SELECT u.id
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM relationships r
WHERE u.id IN (r.user_id, r.pair_id)
);
In Postgres, u.id IN (r.user_id, r.pair_id) is just short for: (u.id = r.user_id OR u.id = r.pair_id).
The expression is transformed that way internally, which can be observed from EXPLAIN ANALYZE.
To clear up speculation in the comments: modern versions of Postgres can use matching indexes on user_id and/or pair_id with this sort of query.
Something like:
select u.id
from users u
where u.id not in (select r.user_id from relationships r)
and u.id not in (select r.pair_id from relationships r)
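One thing worth checking before using NOT IN, sketched with Python's sqlite3 and invented rows: if pair_id is nullable (an assumption, the question does not say) and any row holds NULL, the NOT IN predicate can never be true, so the query returns nothing. NOT EXISTS does not have this problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4), (5), (6);
    -- Hypothetical data: user 5 has no partner yet, so pair_id is NULL.
    INSERT INTO relationships VALUES (1, 2), (3, 4), (5, NULL);
""")

# "6 NOT IN (2, 4, NULL)" evaluates to NULL, so user 6 is filtered out.
with_not_in = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    WHERE u.id NOT IN (SELECT r.user_id FROM relationships r)
      AND u.id NOT IN (SELECT r.pair_id FROM relationships r)
""")]

# NOT EXISTS only asks whether a matching row exists.
with_not_exists = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM relationships r WHERE u.id IN (r.user_id, r.pair_id)
    )
""")]

print(with_not_in, with_not_exists)
```

If both columns are declared NOT NULL, the two queries agree.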

Proper pagination in a JOIN select

I have a SQL statement
select *
from users u left join files f
on u.id = f.user_id
where f.mime_type = 'jpg'
order by u.join_date desc
limit 10 offset 10
The relationship is 1-N: user may have many files.
This effectively selects the second 10-element page.
The problem is this query limits/offsets a joined table, but I want to limit/offset distinct rows from the first (users) table.
How can I do this? I'm targeting PostgreSQL and HSQLDB.
You need to limit the select on the outer table first and then join the dependent table to the result.
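select *
from (select * from users order by join_date desc limit 10 offset 10) as u
left join files f
-- f is only in scope here, so the mime type filter belongs to the join
on u.id = f.user_id and f.mime_type = 'jpg'
order by u.join_date desc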
You can also use GROUP_CONCAT() and GROUP BY to paginate and reduce the number of rows returned (GROUP_CONCAT() is MySQL syntax; the PostgreSQL counterpart is string_agg()).
select u.id, u.name, GROUP_CONCAT(f.id) as file_ids
from users u left join files f
on u.id = f.user_id
where f.mime_type = 'jpg'
group by u.id
order by u.join_date desc
limit 10 offset 10
To combine multiple columns use this
select u.id, u.name, GROUP_CONCAT(f.id, '|', f.name) as file_ids
from users u left join files f
on u.id = f.user_id
where f.mime_type = 'jpg'
group by u.id
order by u.join_date desc
limit 10 offset 10
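A small runnable sketch of the aggregate-then-paginate approach, using Python's sqlite3 with invented rows. SQLite's group_concat() stands in for MySQL's GROUP_CONCAT() here, and the helper name page is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, join_date TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, user_id INTEGER, mime_type TEXT);
    INSERT INTO users VALUES
        (1, 'ann', '2020-01-01'), (2, 'bob', '2020-02-01'), (3, 'cid', '2020-03-01');
    INSERT INTO files VALUES
        (10, 1, 'jpg'), (11, 1, 'jpg'), (12, 2, 'jpg'), (13, 3, 'png'), (14, 3, 'jpg');
""")

def page(limit, offset):
    """One row per user, with that user's jpg file ids collapsed into a string."""
    return conn.execute("""
        SELECT u.id, u.name, group_concat(f.id) AS file_ids
        FROM users u
        LEFT JOIN files f ON u.id = f.user_id
        WHERE f.mime_type = 'jpg'
        GROUP BY u.id
        ORDER BY u.join_date DESC
        LIMIT ? OFFSET ?
    """, (limit, offset)).fetchall()

first = page(2, 0)
second = page(2, 2)
print(first)
print(second)
```

Because each user is collapsed to a single row before LIMIT/OFFSET apply, the page size now counts users. Note that the WHERE filter on f.mime_type still drops users who have no jpg files at all.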

sql using count & group by without using distinct keyword?

I want to optimize this query
SELECT *
FROM Users
WHERE Active = 1
AND UserId IN (
SELECT UserId FROM Users_Roles
WHERE RoleId IN (SELECT RoleId FROM Roles WHERE PermissionLevel >= 100)
)
ORDER BY LastName
Execution time went down when I replaced the above query with joins, as below:
SELECT u.*
FROM Users u
INNER JOIN Users_Roles ur ON (u.UserId = ur.UserId)
INNER JOIN Roles r ON (r.RoleId = ur.RoleId)
WHERE u.Active = 1 AND r.PermissionLevel > 100
GROUP BY u.UserId
ORDER BY u.LastName
But the above query gives duplicate records, since my roles table has more than one entry for every user.
I can't use DISTINCT, because there is a function that derives the count for pagination by replacing SELECT * FROM with SELECT COUNT(*) FROM, and then executes both the count query and the result query.
As we know, using COUNT together with GROUP BY gives the wrong output (one count per group instead of a total).
Now I want to optimize the query and also find the number of rows (the count) for it. Please suggest a better way to get the result.
It is difficult to optimise other people's queries without fully knowing the schema, what is indexed and what isn't, how much data there is, what your DBMS is, etc. Even with this we can't see execution plans, IO statistics, etc. With this in mind, the below may not be better than what you already have, but it is how I would write the query in your situation.
SELECT u.*
FROM Users u
INNER JOIN
( SELECT ur.UserID
FROM Users_Roles ur
INNER JOIN Roles r
ON r.RoleID = ur.RoleID
WHERE r.PermissionLevel > 100
GROUP BY ur.UserID
) ur
ON u.UserId = ur.UserId
WHERE u.Active = 1
ORDER BY u.LastName
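Because the GROUP BY sits inside the derived table, the outer query has one row per user, so swapping the select list for COUNT(*) should give a usable total for pagination. Here is a sketch with Python's sqlite3 and invented rows. The shared FROM_AND_WHERE string is a hypothetical stand-in for the asker's "replace SELECT with SELECT COUNT(*)" function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, LastName TEXT, Active INTEGER);
    CREATE TABLE Roles (RoleId INTEGER PRIMARY KEY, PermissionLevel INTEGER);
    CREATE TABLE Users_Roles (UserId INTEGER, RoleId INTEGER);
    INSERT INTO Users VALUES (1, 'Young', 1), (2, 'Adams', 1), (3, 'Baker', 0), (4, 'Clark', 1);
    INSERT INTO Roles VALUES (1, 300), (2, 200), (3, 50);
    INSERT INTO Users_Roles VALUES (1, 1), (1, 2), (2, 2), (3, 1), (4, 3);
""")

# Shared tail of the query; only the select list changes.
FROM_AND_WHERE = """
    FROM Users u
    INNER JOIN (
        SELECT ur.UserId
        FROM Users_Roles ur
        INNER JOIN Roles r ON r.RoleId = ur.RoleId
        WHERE r.PermissionLevel > 100
        GROUP BY ur.UserId
    ) ur ON u.UserId = ur.UserId
    WHERE u.Active = 1
"""

total = conn.execute("SELECT COUNT(*)" + FROM_AND_WHERE).fetchone()[0]
names = [row[0] for row in conn.execute(
    "SELECT u.LastName" + FROM_AND_WHERE + " ORDER BY u.LastName")]

# For contrast: the plain three-way join counts user 1 twice.
plain_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Users u
    INNER JOIN Users_Roles ur ON u.UserId = ur.UserId
    INNER JOIN Roles r ON r.RoleId = ur.RoleId
    WHERE u.Active = 1 AND r.PermissionLevel > 100
""").fetchone()[0]

print(total, names, plain_join_rows)
```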