How to paginate a complex SQL join query that returns duplicate data? - sql

I have several tables in the database: users, profiles, and user roles.
Profiles and users have a one-to-one relationship.
Roles and users have a many-to-many relationship.
To select all users, I run the following query:
SELECT A.role_id, A.role_name, A.user_id,B.user_username, B.user_password, B.profile_color_text, B.profile_color_menu, B.profile_color_bg FROM
(SELECT Roles.role_id, Roles.role_name, UserRoles.user_id
FROM Roles INNER JOIN UserRoles ON Roles.role_id = UserRoles.role_id) AS A
LEFT JOIN
(SELECT Users.user_username, Users.user_password, Profiles.profile_color_text, Profiles.profile_color_menu, Profiles.profile_color_bg, Profiles.profile_id
FROM Users INNER JOIN Profiles ON Users.user_id = Profiles.profile_id) AS B
ON A.user_id = B.profile_id;
My question is: how do I paginate this query?

I would get the 10 users first, then perform the joins. Two reasons for this:
1. You don't want exactly 10 rows; you want the rows for 10 users, which could be any number of rows. If you fetch all the data and then limit it, you could end up with 10 rows containing data for only 5 users.
2. Even if point 1 were irrelevant because the relationship was always one-to-one, and especially when the page is small (like 10), it is usually faster to fetch those users first and then join on that smaller "table" than to do all the joins on all the data and then limit it.
SELECT
u.user_id,
u.user_username,
u.user_password,
r.role_id,
r.role_name,
p.profile_id,
p.profile_color_text,
p.profile_color_menu,
p.profile_color_bg
FROM (
SELECT user_id, user_username, user_password
FROM users
ORDER BY ???
OFFSET 10
LIMIT 10
) AS u
LEFT JOIN profiles AS p
ON u.user_id = p.profile_id
LEFT JOIN userroles AS ur
ON u.user_id = ur.user_id
LEFT JOIN roles AS r
ON ur.role_id = r.role_id
I assume you'll want some order, so I've put an ORDER BY in there - to be completed.
OFFSET added to get the second page of results; first page wouldn't require it, or would be OFFSET 0. Then a LIMIT of course to limit the page size.
I've also restructured the joins in a way that made more sense to me.
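To make the "limit the users first, then join" idea concrete, here is a minimal sketch that runs against an in-memory SQLite database. The sample rows, the page size of 2, the `fetch_page` helper, and the choice of `ORDER BY user_id` are all assumptions made for illustration; the profiles join is left out to keep it short.

```python
import sqlite3

# Hypothetical sample data: four users, some with several roles, one with none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_username TEXT);
    CREATE TABLE roles (role_id INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE userroles (user_id INTEGER, role_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid'), (4, 'dee');
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor'), (3, 'viewer');
    INSERT INTO userroles VALUES (1, 1), (1, 2), (2, 3), (3, 1), (3, 2), (3, 3);
""")

# The inner query picks one page of users; the joins run only on that page.
PAGE_SQL = """
    SELECT u.user_id, u.user_username, r.role_name
    FROM (
        SELECT user_id, user_username
        FROM users
        ORDER BY user_id              -- assumed ordering; any stable order works
        LIMIT :page_size OFFSET :offset
    ) AS u
    LEFT JOIN userroles AS ur ON u.user_id = ur.user_id
    LEFT JOIN roles AS r ON ur.role_id = r.role_id
    ORDER BY u.user_id, r.role_id
"""


def fetch_page(page, page_size=2):
    """Return all joined rows for the users on the given 1-based page."""
    offset = (page - 1) * page_size
    return conn.execute(
        PAGE_SQL, {"page_size": page_size, "offset": offset}
    ).fetchall()


page1 = fetch_page(1)
page2 = fetch_page(2)
print(page1)
print(page2)
```

Each page holds exactly two users even though the row count differs from page to page, which is the behaviour the two points above argue for.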

Related

Optimize query with IN clause for one-to-many association

Currently having 2 tables
Users -> 1 million records
Requests -> 10 millions records
A User has many Requests. I'm fetching all the users along with their most recently created Request using something like the following query:
SELECT *
FROM users AS u
INNER JOIN requests AS r
ON u.id = r.user_id
WHERE r.id IN (
SELECT MAX(r.id)
FROM users u
INNER JOIN requests r ON r.user_id = u.id
GROUP BY u.id
);
This works, but performance is very poor (> 7 sec). I understand why, and I'm trying to find a solution, even if I have to modify the schema.
Note: Requests table consists of Boolean columns, and I'm not quite sure if indexing will help here.
distinct on () is one way to do this:
select distinct on (u.id) *
from users u
join requests r on u.id = r.user_id
order by u.id, r.id desc;
Another option is a lateral join:
select *
from users u
join lateral (
select *
from requests r1
where u.id = r1.user_id
order by id desc
limit 1
) r on true
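DISTINCT ON is a PostgreSQL feature and LATERAL is not available in every engine. For engines without them, one portable alternative (a different technique from the two queries above, named here only for comparison) is to join against a grouped derived table built from requests alone. The sketch below uses SQLite with made-up rows; the `payload` column and the composite index on `(user_id, id)` are assumptions, and whether the index helps on the real 10-million-row table would need to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE requests (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT);
    -- Assumed index: lets the engine find MAX(id) per user_id from the index.
    CREATE INDEX requests_user_id_id ON requests (user_id, id);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO requests VALUES
        (1, 1, 'a1'), (2, 2, 'b1'), (3, 1, 'a2'), (4, 2, 'b2'), (5, 1, 'a3');
""")

# Compute the newest request id per user once, then join back to both tables.
LATEST_SQL = """
    SELECT u.id, u.name, r.id, r.payload
    FROM users AS u
    JOIN (
        SELECT user_id, MAX(id) AS last_id
        FROM requests
        GROUP BY user_id
    ) AS last ON last.user_id = u.id
    JOIN requests AS r ON r.id = last.last_id
    ORDER BY u.id
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

As with the inner joins in the question, users without any request (user 3 here) do not appear in the result.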

Trying to count the number of occurrences that 3 columns from 2 tables have on my organizations table? I need the occurrences joined in one table

-- 2. In one table, show how many private topics, admins, and standard users each organization has.
SELECT organizations.name, COUNT(topics.privacy) AS private_topic, COUNT(users.type) AS user_admin, COUNT(users.type) AS user_standard
FROM organizations
LEFT JOIN topics
ON organizations.id=topics.org_id
AND topics.privacy='private'
LEFT JOIN users
ON users.org_id=organizations.id
AND users.type='admin'
LEFT JOIN users
ON users.org_id=organizations.id
AND users.type='standard'
GROUP BY organizations.name
;
org_id is the foreign key that relates both the users table and the topics table to organizations. The query keeps giving me the wrong result: it counts only the admins or only the standard users and puts that number in every row of each column. Any help is really appreciated, as I have been stuck on this for a while now!
So, I am getting an error when I do as you said: the users table cannot be specified more than once. I updated the code the way you said to write it, but still nothing. I wasn't given any sample data either, but I ran some queries and saw, for example, how many private topics there are (from the privacy column of the topics table). When I don't get this error, the joins seem to overwrite each other, so every column in a row shows the same value as the last join.
It appears to me that topics and users have no relationship. You're just trying to get the result together in a single query. There are other and possibly better ways to accomplish that but I think this will fix what you've got already (assuming you have id columns for each table.)
SELECT
organizations.name,
COUNT(DISTINCT topics.id) AS private_topic,
COUNT(DISTINCT users.id) FILTER (WHERE users.type = 'admin') AS user_admin,
COUNT(DISTINCT users.id) FILTER (WHERE users.type = 'standard') AS user_standard
FROM organizations
LEFT JOIN topics
ON organizations.id = topics.org_id AND topics.privacy = 'private'
LEFT JOIN users
ON users.org_id = organizations.id
GROUP BY organizations.name;
I propose this as a more straightforward way:
SELECT
min(o.name) as "name",
(
select count(*) from topics t
where t.org_id = o.id AND t.privacy = 'private'
) as private_topics,
(
select count(*) from users u
where u.org_id = o.id and u.type = 'admin'
) AS user_admin,
(
select count(*) from users u
where u.org_id = o.id and u.type = 'standard'
) AS user_standard
FROM organizations o
GROUP BY o.id;
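The reason the plain joined counts come out wrong is that topics and users are unrelated, so joining both to organizations multiplies their rows. The following sketch, using SQLite and invented sample rows, shows the inflated counts next to the correlated-subquery version; the data and the names `naive` and `correct` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, org_id INTEGER, privacy TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER, type TEXT);
    INSERT INTO organizations VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO topics VALUES
        (1, 1, 'private'), (2, 1, 'private'), (3, 1, 'public'), (4, 2, 'public');
    INSERT INTO users VALUES
        (1, 1, 'admin'), (2, 1, 'standard'), (3, 1, 'standard'),
        (4, 1, 'standard'), (5, 2, 'admin');
""")

# Joining two unrelated child tables pairs every topic with every user,
# so plain COUNT() counts the combinations rather than the children.
NAIVE_SQL = """
    SELECT o.name, COUNT(t.id), COUNT(u.id)
    FROM organizations o
    LEFT JOIN topics t ON t.org_id = o.id AND t.privacy = 'private'
    LEFT JOIN users u ON u.org_id = o.id
    GROUP BY o.id
    ORDER BY o.id
"""

# Correlated subqueries count each child table independently.
SUBQUERY_SQL = """
    SELECT
        o.name,
        (SELECT COUNT(*) FROM topics t
          WHERE t.org_id = o.id AND t.privacy = 'private') AS private_topics,
        (SELECT COUNT(*) FROM users u
          WHERE u.org_id = o.id AND u.type = 'admin') AS user_admin,
        (SELECT COUNT(*) FROM users u
          WHERE u.org_id = o.id AND u.type = 'standard') AS user_standard
    FROM organizations o
    ORDER BY o.id
"""

naive = conn.execute(NAIVE_SQL).fetchall()
correct = conn.execute(SUBQUERY_SQL).fetchall()
print(naive)
print(correct)
```

With this data Acme has 2 private topics and 4 users, so the naive join produces 8 rows for it and both plain counts report 8. COUNT(DISTINCT ...) or separate subqueries avoid that inflation.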

Is it true that JOINS can be used everywhere to replace Subqueries in SQL

I have heard people say that table joins can replace sub-queries everywhere. I tested this on my own query, but I only got the right data set when I used sub-queries; I was not able to get the same data set using joins. I am not sure whether what I found is right, because I am new to RDBMSs and not very experienced. I will try to describe (in words) the schema of the database I was experimenting with:
The database has two tables:
Users (ID, Name, City) and Friendship (ID, Friend_ID)
Goal: Users table is designed to store simple user data and Friendship table represents Friendship between users. Friendship table has both the columns as foreign keys, referencing to Users.ID. Tables have many-to-many relationship between them.
Question: I have to retrieve Users.ID and Users.Name of all the Users, which are not friends with a particular user x, but are from same city (much like fb's friend suggestion system).
By using subquery, I am able to achieve this. Query looks like:
SELECT ID, NAME
FROM USERS AS U
WHERE U.ID NOT IN (SELECT FRIENDS_ID
FROM FRIENDSHIP,
USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
Example entries:
Users
Id = 1 Name = Jon City = Mumbai
Id=2 Name=Doe City=Mumbai
Id=3 Name=Arun City=Mumbai
Id=4 Name=Prakash City=Delhi
Friendship
Id= 1 Friends_Id = 2
Id = 2 Friends_Id=1
Id = 2 Friends_Id = 3
Id = 3 Friends_Id = 2
Can I get the same data set in a single query by performing joins? If so, how? Please let me know if my question is not clear. Thanks.
Note: I used an inner join in the sub-query by specifying both tables, Friendship and Users. Omitting the Users table and using the U from the outer query gives an error. (If I don't use an alias for the Users table, the query becomes syntactically valid, but its result includes the IDs and names of users who have more than one friend, including the user with ID x. Interesting, but not the topic of this question.)
For not in you can use left join and check for is null:
select u.id, u.name
from Users u
left join Friends f on u.id = f.id and f.friend_id = #person
where u.city like '%city%' and f.friend_id is null and u.id <> #person;
There are some cases where you can't work out your way with just inner/left/right joins, but your case is not one of them.
Please check sql fiddle: http://sqlfiddle.com/#!9/1c5b1/14
Also about your note: What you tried to do can be achieved with lateral join or cross apply depending on the engine you are using.
You can rewrite your query using only joins. The trick is to join to the User tables once with an inner join to identify users within the same city and reference the Friendship table with a left join and a null check to identify non-friends.
SELECT
U1.ID,
U1.Name
FROM
USERS U1
INNER JOIN
USERS U2
ON
U1.CITY = U2.CITY
LEFT JOIN
FRIENDSHIP F
ON
U2.ID = F.ID AND
U1.ID = F.FRIEND_ID
WHERE
U2.id = X AND
U1.ID <> U2.id AND
F.id IS NULL
The above query doesn't handle the situation where USER x's primary key is in the FRIEND_ID column of the FRIENDSHIP table. I assume because your subquery version doesn't handle that situation, perhaps you create 2 rows for each friendship, or friendships are not bi-directional.
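To check that the NOT IN form and the LEFT JOIN / IS NULL form agree, here is a small sketch that loads the example entries from the question into an in-memory SQLite database and runs both. It assumes the Friendship columns are named `id` and `friend_id`, compares cities with `=` instead of LIKE, and uses a hypothetical `suggestions` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE friendship (id INTEGER, friend_id INTEGER);
    INSERT INTO users VALUES
        (1, 'Jon', 'Mumbai'), (2, 'Doe', 'Mumbai'),
        (3, 'Arun', 'Mumbai'), (4, 'Prakash', 'Delhi');
    INSERT INTO friendship VALUES (1, 2), (2, 1), (2, 3), (3, 2);
""")

# Subquery form: exclude everyone listed as a friend of user :x.
NOT_IN_SQL = """
    SELECT u.id, u.name
    FROM users u
    WHERE u.id NOT IN (SELECT f.friend_id FROM friendship f WHERE f.id = :x)
      AND u.id <> :x
      AND u.city = (SELECT city FROM users WHERE id = :x)
    ORDER BY u.id
"""

# Join form: the self-join finds users in the same city as :x; the LEFT JOIN
# plus IS NULL keeps only those with no matching friendship row.
LEFT_JOIN_SQL = """
    SELECT u1.id, u1.name
    FROM users u1
    JOIN users u2 ON u1.city = u2.city
    LEFT JOIN friendship f ON f.id = u2.id AND f.friend_id = u1.id
    WHERE u2.id = :x AND u1.id <> u2.id AND f.id IS NULL
    ORDER BY u1.id
"""


def suggestions(sql, x):
    """Run one of the two queries for user id x."""
    return conn.execute(sql, {"x": x}).fetchall()


for x in (1, 2, 4):
    print(x, suggestions(NOT_IN_SQL, x), suggestions(LEFT_JOIN_SQL, x))
```

One difference worth keeping in mind: if the subquery can return a NULL, NOT IN yields no rows at all, whereas the LEFT JOIN form is unaffected.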
Joins and subqueries can be used to achieve similar results in some cases, but certainly not all. As an example, this query with a subquery could not be achieved with a join:
SELECT ID, COLUMN1, COUNT(*) FROM MYTABLE
WHERE ID IN (
SELECT DISTINCT ID FROM MYTABLE
WHERE COLUMN2 NOT IN (VALUES1, VALUES2)
)
GROUP BY ID;
This is only one example, but there are many.
Conversely, you cannot get information from another table by using a subquery without joining it.
As to your example
SELECT ID, NAME FROM USERS AS U
WHERE U.ID NOT IN (
SELECT FRIENDS_ID FROM FRIENDSHIP, USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
This could be constructed as:
select ID, NAME from users u
join FRIENDSHIP f on f.ID = u.ID
where u.ID = x
and u.ID != y
and CITY like '%A_CITY';
I changed your second x to a y assumptively, so it wouldn't cause confusion.
Of course, you may also want to LEFT JOIN aka LEFT OUTER JOIN if there is a chance that there may be multiple results in the FRIENDSHIP table.

Querying for many relations of type "hasMany" in one query - lot of rows returned

Let's say I have a User model and the models Profile, Comment, Tag, and Friend.
The following relations exist between the models:
User.hasMany(Profile)
User.hasMany(Comment)
User.hasMany(Tag)
User.hasMany(Friend)
By a hasMany relation I mean that, for example, the Profiles table has an FK column referencing Users.id.
I want to fetch a User by user id and also fetch all of its associations. I use an ORM to do this, and it generates the following query:
SELECT *
FROM "Users" users
LEFT JOIN "Profiles" profiles ON profiles."UserId" = users.id
LEFT JOIN "Comments" comments ON comments."UserId" = users.id
LEFT JOIN "Tags" tags ON tags."UserId" = users.id
LEFT JOIN "Friends" friends ON friends."UserId" = users.id
WHERE users.id = 1
The problem is that the rows returned by the query are multiplied. If a user has 10 profiles, 10 comments, 10 tags and 10 friends, the query returns 10*10*10*10 = 10,000 rows. That is a lot: it takes time to transfer from the DB to the app, and time (and memory!) to parse. How can I avoid this situation? Should I make separate queries to the DB? Or is there some special trick that stops the rows from multiplying each other?
For now, the query I am using returns 73k rows (!) and consumes ~400MB (!!!) before it is parsed into the desired model structure.
This "escalates" especially if I have many relations of type hasManyThrough (which adds 1 additional join table per relation/association).
Would a union work?
select p.id from "Users" u join "Profiles" p on p."UserId" = u.id where u.id = 1
union all
select c.id from "Users" u join "Comments" c on c."UserId" = u.id where u.id = 1
union all
select t.id from "Users" u join "Tags" t on t."UserId" = u.id where u.id = 1
union all
select f.id from "Users" u join "Friends" f on f."UserId" = u.id where u.id = 1
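A quick way to see the difference in result size is to count rows both ways on a toy data set. The sketch below uses SQLite with one user who has 3 rows in each child table; the lower-case table names, the `user_id` column, and the extra `source` label column (added so each row says which table it came from) are assumptions, not part of the ORM's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO users VALUES (1)")
# Four child tables, each with three rows for user 1.
for table in ("profiles", "comments", "tags", "friends"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.executemany(f"INSERT INTO {table} (user_id) VALUES (?)", [(1,)] * 3)

# One big join: row counts multiply (3 * 3 * 3 * 3).
JOINED_SQL = """
    SELECT COUNT(*)
    FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    LEFT JOIN comments c ON c.user_id = u.id
    LEFT JOIN tags t ON t.user_id = u.id
    LEFT JOIN friends f ON f.user_id = u.id
    WHERE u.id = :uid
"""

# UNION ALL: row counts add up (3 + 3 + 3 + 3).
UNION_SQL = """
    SELECT 'profiles' AS source, id FROM profiles WHERE user_id = :uid
    UNION ALL
    SELECT 'comments', id FROM comments WHERE user_id = :uid
    UNION ALL
    SELECT 'tags', id FROM tags WHERE user_id = :uid
    UNION ALL
    SELECT 'friends', id FROM friends WHERE user_id = :uid
"""

joined_rows = conn.execute(JOINED_SQL, {"uid": 1}).fetchone()[0]
union_rows = conn.execute(UNION_SQL, {"uid": 1}).fetchall()

# Regroup the flat union result by association on the application side.
by_source = {}
for source, row_id in union_rows:
    by_source.setdefault(source, []).append(row_id)

print(joined_rows, len(union_rows), by_source)
```

Running one query per association gives the same additive row count as the union and avoids having to force differently shaped tables into a common column list.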

sql using count & group by without using distinct keyword?

I want to optimize this query
SELECT * FROM Users WHERE Active = 1 AND UserId IN (SELECT UserId FROM Users_Roles WHERE RoleId IN (SELECT RoleId FROM Roles WHERE PermissionLevel >= 100)) ORDER BY LastName
Execution time went down when I replaced the above query with joins, as below:
SELECT u.* FROM Users u INNER JOIN Users_Roles ur ON (u.UserId = ur.UserId) INNER JOIN Roles r ON (r.RoleId = ur.RoleId) WHERE u.Active = 1 AND r.PermissionLevel > 100 GROUP BY u.UserId ORDER BY u.LastName
But the above query gives duplicate records, since my roles table has more than one entry for every user.
I can't use DISTINCT, because a function derives the pagination count by replacing SELECT * FROM with SELECT COUNT(*) FROM; it then executes both the count query and the result query.
As we already know, combining COUNT with GROUP BY gives the wrong result here (one count per group rather than a single total).
Now I want to optimize the query and also find the number of rows, i.e. the count, for the query. Please suggest a better way to get this result.
It is difficult to optimise other people's queries without fully knowing the schema, what is indexed and what isn't, how much data there is, what your DBMS is, etc. Even with this we can't see execution plans, IO statistics, etc. With this in mind, the below may not be better than what you already have, but it is how I would write the query in your situation.
SELECT u.*
FROM Users u
INNER JOIN
( SELECT ur.UserID
FROM Users_Roles ur
INNER JOIN Roles r
ON r.RoleID = ur.RoleID
WHERE r.PermissionLevel > 100
GROUP BY ur.UserID
) ur
ON u.UserId = ur.UserId
WHERE u.Active = 1
ORDER BY u.LastName
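Because the GROUP BY now lives inside the derived table, the outer query returns one row per user, so the total for pagination can be taken by wrapping the same statement in SELECT COUNT(*). The sketch below tries this on SQLite with invented rows; the sample data and the names `FILTERED_USERS_SQL`, `rows` and `total` are assumptions, and the ORDER BY is added only to the page query since the count does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, LastName TEXT, Active INTEGER);
    CREATE TABLE Roles (RoleId INTEGER PRIMARY KEY, PermissionLevel INTEGER);
    CREATE TABLE Users_Roles (UserId INTEGER, RoleId INTEGER);
    INSERT INTO Users VALUES
        (1, 'Adams', 1), (2, 'Baker', 1), (3, 'Clark', 0), (4, 'Davis', 1);
    INSERT INTO Roles VALUES (1, 200), (2, 150), (3, 50);
    -- Users 1 and 4 each hold two qualifying roles; user 3 is inactive.
    INSERT INTO Users_Roles VALUES (1, 1), (1, 2), (2, 3), (3, 1), (4, 1), (4, 2);
""")

# Deduplicate inside the derived table so the outer query has one row per user.
FILTERED_USERS_SQL = """
    SELECT u.UserId, u.LastName
    FROM Users u
    INNER JOIN (
        SELECT ur.UserId
        FROM Users_Roles ur
        INNER JOIN Roles r ON r.RoleId = ur.RoleId
        WHERE r.PermissionLevel > 100
        GROUP BY ur.UserId
    ) ur ON u.UserId = ur.UserId
    WHERE u.Active = 1
"""

# Result query: same statement plus the ordering.
rows = conn.execute(FILTERED_USERS_SQL + " ORDER BY u.LastName").fetchall()

# Count query: wrap the unordered statement so COUNT(*) yields a single total.
total = conn.execute(
    f"SELECT COUNT(*) FROM ({FILTERED_USERS_SQL}) AS filtered"
).fetchone()[0]

print(rows, total)
```

Wrapping the statement keeps the count and the result query built from one source, so they cannot drift apart when the filter changes.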