Designing A Database For User Friendship - sql

I would like to store friendships in a database. My idea is that when user1 becomes friends with user2, I store that friendship so that I can get all of either user's friends if I ever need to. At first I thought I would just store their IDs in a table with a single insert, but then I thought of some complications when querying the db.
If I have two users with user IDs 10 and 20, should I make two inserts into the db when they become friends?
ID USER1 USER2
1 10 20
2 20 10
Or is there a way to query the db for only a particular user's friends if I do a single insert, like so?
ID USER1 USER2
1 10 20
I know the first way will definitely give me what I am looking for, but I would like to know whether it is good practice and whether there is a better alternative. I'd also like to know whether the second layout can be queried to get the result I need, for example all of user 10's friends.

Brad Christie's suggestion of querying the table in both directions is good.
However, given that MySQL isn't very good at optimizing OR queries, using UNION ALL might be more efficient:
( SELECT u.id, u.name
FROM friendship f, user u
WHERE f.user1 = 1 AND f.user2 = u.id )
UNION ALL
( SELECT u.id, u.name
FROM friendship f, user u
WHERE f.user2 = 1 AND f.user1 = u.id )
Here's a SQLFiddle of it, based on Brad's example. I modified the friendship table to add indexes in both column orders, (user1, user2) and (user2, user1), for efficient access, and to remove the meaningless id column. Of course, with such a tiny example you can't really test real-world performance, but comparing the execution plans of the two versions may be instructive.
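Why the two indexes matter for the UNION ALL version can be sketched in memory. This is a minimal Ruby illustration with hypothetical data, not MySQL's actual behaviour: two hashes stand in for the (user1, user2) and (user2, user1) indexes, so each half of the UNION ALL is a single keyed lookup instead of a scan, and the results are simply concatenated.

```ruby
# Hypothetical in-memory model of a friendship table storing each pair once.
Friendship = Struct.new(:user1, :user2)

rows = [Friendship.new(1, 2), Friendship.new(3, 1), Friendship.new(2, 3)]

# Stand-in for an index on (user1, user2): user1 => [user2, ...]
by_user1 = Hash.new { |h, k| h[k] = [] }
# Stand-in for an index on (user2, user1): user2 => [user1, ...]
by_user2 = Hash.new { |h, k| h[k] = [] }
rows.each do |f|
  by_user1[f.user1] << f.user2
  by_user2[f.user2] << f.user1
end

# Each half of the UNION ALL is one keyed lookup; concatenating the two
# results mirrors UNION ALL (no de-duplication step).
def friends_of(id, by_user1, by_user2)
  by_user1.fetch(id, []) + by_user2.fetch(id, [])
end

p friends_of(1, by_user1, by_user2) # => [2, 3]
```

UNION ALL rather than UNION is fine here because, with each pair stored once, the two halves cannot return the same friend twice.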

A friendship is a two-way bond (for all intents and purposes). Unlike a one-way link (such as a message), a friendship should have only one entry. What you're seeing is correct, though: you need to query against both columns to get a user's friends, but that's simple enough:
-- Each `1` below is where you'd put the ID of
-- the person whose friends you're looking up
SELECT u.id, u.name
FROM friendship f
LEFT JOIN user u
ON (u.id = f.user1 OR u.id = f.user2)
AND u.id <> 1
WHERE (f.user1 = 1 OR f.user2 = 1)
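What the OR query does with one row per friendship can be sketched in Ruby over a hypothetical in-memory list of rows (no database involved). It scans every row, keeps those that mention the user in either column, and returns the other column, which is the job `u.id <> 1` does in the join condition.

```ruby
# Hypothetical rows of the friendship table: one row per friendship.
FRIENDSHIPS = [[10, 20], [30, 10], [20, 30]].freeze

# Mirrors WHERE (f.user1 = id OR f.user2 = id): keep rows that mention the
# user in either column and return the column that is not the user.
def friends_of(id)
  FRIENDSHIPS.filter_map do |user1, user2|
    if user1 == id
      user2
    elsif user2 == id
      user1
    end
  end
end

p friends_of(10) # => [20, 30]
```

So the single-insert layout can answer "all of user 10's friends" without a second row per pair.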

Related

Relationships query PostgreSQL, Follow/Unfollow functionality with PostgreSQL

I have two tables, Users and Relationships. The Users table has the following columns:
id, name, password, username, email, avatar, followersCount, followingCount, tweetCount.
And the Relationships table has the following columns:
id, followingId, followerId
How should I write a SQL query that takes a user with a specific id and finds the ids in Relationships of the users that user is following? In other words, how do I find the people a user follows?
This is what I have so far:
SELECT *
FROM public."Users" JOIN
public."Relationships"
ON (public."Users".id = public."Relationships".id)
If I understand correctly, you want:
SELECT u.*
FROM public."Relationships" r JOIN
public."Users" u
ON u.id = r.followerId
WHERE r.followingId = ?;
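Here ? is a parameter placeholder for the ID of the user you care about. This returns all the followers of that user; to get the people that user follows instead, swap followerId and followingId in the ON and WHERE clauses.
Do you mean this query?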
SELECT public."Users".*
FROM public."Users"
JOIN public."Relationships"
ON public."Users".id = public."Relationships".followingId
AND public."Relationships".followerId = ? -- ? is the ID of the user whose follows you want
I am not entirely sure what followerId and followingId mean in your schema, so swap them in the query if this is not what you want.
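Which column you filter on decides whether you get followers or followees. Here is a minimal Ruby sketch over hypothetical in-memory rows. It assumes `follower_id` follows `following_id`, which is one plausible reading of followerId/followingId, not something the question confirms.

```ruby
# Hypothetical Relationships rows; assumption: follower_id follows following_id.
Relationship = Struct.new(:follower_id, :following_id)

RELATIONSHIPS = [
  Relationship.new(1, 2), # user 1 follows user 2
  Relationship.new(1, 3), # user 1 follows user 3
  Relationship.new(3, 1)  # user 3 follows user 1
].freeze

# People a user follows: filter on follower_id, return following_id.
def following(user_id)
  RELATIONSHIPS.select { |r| r.follower_id == user_id }.map(&:following_id)
end

# Followers of a user: filter on following_id, return follower_id.
def followers(user_id)
  RELATIONSHIPS.select { |r| r.following_id == user_id }.map(&:follower_id)
end

p following(1) # => [2, 3]
p followers(1) # => [3]
```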

SQL report: turn duplicated rows into a count

First of all, I have no experience in SQL whatsoever, but I changed positions recently and, given the situation, it would be easier for me to run a script than to check each record in the application individually. Here's the scenario:
I have two tables:
Users, with UserID, username, email, etc., and
Documents with DocumentID and UserID, document name and again some other columns.
I want to create a report that will help me check if users have documents attached to their profile.
Here is what I am doing now:
SELECT UserTable.UserID,
DocumentTable.DocumentID,
DocumentTable.UserID
FROM UserTable
LEFT JOIN DocumentTable ON UserTable.UserID = DocumentTable.UserID
The problem is that some users have two or more documents attached to their profile, which causes duplicate user rows.
For example, the report contains rows like these:
User1 DocumentA
User2 DocumentA
User2 DocumentB
User2 DocumentC
User3 DocumentA
etc.
Is there a way to convert those documents into a count per UserID? Instead, I'd like to see:
User1 1
User2 3
User3 1
You are looking for GROUP BY. I would recommend writing the query as:
SELECT ut.UserID, COUNT(dt.UserID)
FROM UserTable ut LEFT JOIN
DocumentTable dt
ON ut.UserID = dt.UserID
GROUP BY ut.UserID
ORDER BY ut.UserID;
Notes:
The use of table aliases makes the query easier to write and to read.
The ORDER BY guarantees that the results are ordered by the user id.
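COUNT() counts a column from the second table because a user might have no matching document: COUNT(dt.UserID) ignores the NULL produced by the LEFT JOIN and returns 0 for that user, whereas COUNT(*) would return 1.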
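The difference between COUNT(dt.UserID) and COUNT(*) after a LEFT JOIN can be emulated in Ruby. This is a sketch with hypothetical data in which user 4 has no documents, and `nil` plays the role of SQL NULL.

```ruby
# Hypothetical data: [document_id, user_id] pairs; user 4 has no documents.
users = [1, 2, 3, 4]
documents = [[:a, 1], [:a, 2], [:b, 2], [:c, 2], [:a, 3]]

# LEFT JOIN: every user appears at least once; a user without a match gets
# a single row whose document-side column is nil (SQL NULL).
joined = users.flat_map do |uid|
  matches = documents.select { |_, doc_uid| doc_uid == uid }
  matches.empty? ? [[uid, nil]] : matches.map { |_, doc_uid| [uid, doc_uid] }
end

# GROUP BY user. Each value is [COUNT(dt.UserID), COUNT(*)]:
# the first skips nils, the second counts every row.
counts = joined.group_by(&:first).transform_values do |rows|
  [rows.count { |_, doc_uid| !doc_uid.nil? }, rows.size]
end

p counts[2] # => [3, 3]
p counts[4] # => [0, 1]
```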

Select a user by their username and then select data from another table using their UID

Sorry if that title is a bit convoluted... I'm usually spoiled by an ORM, and my raw SQL skills are really poor, apparently.
I'm writing an application that links to a vBulletin forum. Users authenticate with their forum username, and the query for that is simple (selecting by username from the users table). The next half of it is more complex. There's also a subscriptions table that has a timestamp in it, but the primary key for these is a user id, not a username.
This is what I've worked out so far:
SELECT
forum.user.userid,
forum.user.usergroupid,
forum.user.password,
forum.user.salt,
forum.user.pmunread,
forum.subscriptionlog.expirydate
FROM
forum.user
JOIN forum.subscriptionlog
WHERE
forum.user.username LIKE 'SomeUSER'
Unfortunately this returns the entirety of the subscriptionlog table, which makes sense because there's no username field in it. Is it possible to grab the subscriptionlog row using the userid I get from forum.user.userid, or does this need to be split into two queries?
Thanks!
The issue is that you are blindly joining the two tables. You need to specify what column they are related by.
I think you want something like:
SELECT * FROM user u
INNER JOIN subscriptionlog sl ON u.userid = sl.userid
WHERE u.username LIKE 'SomeUSER'
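Why the unconditioned JOIN returns the whole subscriptionlog table, and what the ON clause changes, can be sketched in Ruby with hypothetical rows that use the question's column names. An unconditioned join is a cross product, and the ON clause keeps only pairs whose userid values match.

```ruby
# Hypothetical rows using the column names from the question.
users = [
  { userid: 1, username: "SomeUSER" },
  { userid: 2, username: "Other" }
]
subscriptionlog = [
  { userid: 1, expirydate: 1_700_000_000 },
  { userid: 2, expirydate: 1_800_000_000 }
]

# The WHERE clause on username.
target = users.select { |u| u[:username] == "SomeUSER" }

# JOIN without a condition: the matching user pairs with every log row.
unconditioned = target.product(subscriptionlog)

# JOIN ... ON sl.userid = u.userid: keep only pairs whose keys match.
keyed = unconditioned.select { |u, sl| u[:userid] == sl[:userid] }

p unconditioned.size                    # => 2 (the whole subscriptionlog table)
p keyed.map { |_, sl| sl[:expirydate] } # => [1700000000]
```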
select * from user u JOIN subscriptionlog s ON u.userid = s.userid where u.username = 'someuser'
The JOIN ... ON part is what you want to add; it combines the two tables into one result set that you return rows from.
Try this:
SELECT
forum.user.userid,
forum.user.usergroupid,
forum.user.password,
forum.user.salt,
forum.user.pmunread,
forum.subscriptionlog.expirydate
FROM
forum.user
INNER JOIN forum.subscriptionlog
ON forum.subscriptionlog.userid = forum.user.userid
WHERE
forum.user.username LIKE 'SomeUSER'

Performance of search query bottlenecked 98% by mutual friends despite caching

On my social networking website, which is similar to Facebook, about 98% of search time is spent in this one part. I want to rank the search results (we can assume they are users) by the number of mutual friends each one has with the searching user.
My friends table has three columns:
user_id (person who sends the request)
friend_id (person who receives the request)
pending (boolean to indicate if the request was accepted or not)
user_id and friend_id are both foreign keys that reference users.id
Finding the friend_ids of a user is simple; it looks like this:
def friends
Friend.where(
'(user_id = :id OR friend_id = :id) AND pending = false',
id: self.id
).pluck(:user_id, :friend_id)
.flatten
.uniq
.reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking them by mutual friends requires the following steps:
Get user_ids of all the searching user's friends - Set(A). Above mentioned friends method does this
Loop over each of the ids in Set(A) -
Get user_ids of all the friends of |id| - Set (B). Again, done by friends method
Find length of intersection of set A and set B
Order in descending order of length of intersections for all results
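The ranking steps above can be sketched in Ruby. The sketch assumes the friend sets are already in memory as a hash of `user_id => Set`, and `rank_by_mutual_friends` is a hypothetical helper. It loops over the search results, intersects each result's friends with the searcher's, and sorts by the intersection size, largest first.

```ruby
require "set"

# Hypothetical, symmetric friend data: user id => Set of accepted friend ids.
FRIENDS = {
  1 => Set[2, 3, 4],
  2 => Set[1, 3],
  3 => Set[1, 2, 4, 5],
  4 => Set[1, 3, 5],
  5 => Set[3, 4]
}.freeze

# Size of (searcher's friends) & (result's friends) for each result,
# sorted by that size descending (ties broken by id).
def rank_by_mutual_friends(searcher_id, result_ids)
  mine = FRIENDS.fetch(searcher_id, Set.new)
  result_ids
    .map { |rid| [rid, (mine & FRIENDS.fetch(rid, Set.new)).size] }
    .sort_by { |rid, shared| [-shared, rid] }
end

p rank_by_mutual_friends(1, [2, 3, 5]) # => [[3, 2], [5, 2], [2, 1]]
```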
The most expensive operation here is obviously getting the friend_ids of hundreds of users, so I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious whether it can be improved further.
I'm wondering whether there is an efficient way to get the friend_ids of all the desired users in a single query, something like:
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the same ranking operation described above. Since I won't be hitting the cache for thousands of users and their friend_ids, I think it will speed up the process significantly.
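Turning one query's worth of rows into that hash takes a single pass in Ruby. This sketch assumes the query returns raw `[user_id, friend_id, pending]` rows (hypothetical data below). Each accepted row makes both users friends of each other, so it is recorded in both directions.

```ruby
# Hypothetical rows as one query might return them: [user_id, friend_id, pending].
rows = [
  [1, 2, false],
  [3, 1, false],
  [2, 3, true],  # pending, so ignored
  [4, 1, false]
]

# Build user_id => [friend ids] in one pass; an accepted row counts
# for both users because friendship is treated as symmetric.
friends = Hash.new { |h, k| h[k] = [] }
rows.each do |user_id, friend_id, pending|
  next if pending
  friends[user_id] << friend_id
  friends[friend_id] << user_id
end

p friends[1] # => [2, 3, 4]
```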
Caching your friends table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check constraint on friends avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
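Because the check constraint rejects a row like (20, 10) rather than reordering it, the application would have to put the smaller ID first before inserting. Here is a minimal Ruby sketch of that normalization, with a `Set` standing in for the primary key. `FriendsTable` is a hypothetical class, not part of the answer's schema.

```ruby
require "set"

# Hypothetical in-memory stand-in for the friends table. The Set plays the
# role of the primary key; storing [min, max] satisfies user_id < friend_id.
class FriendsTable
  def initialize
    @pairs = Set.new
  end

  # Returns true if a new row was stored, false if the pair already existed.
  def add(a, b)
    raise ArgumentError, "a user cannot befriend themselves" if a == b
    @pairs.add?([a, b].minmax) ? true : false
  end

  def size
    @pairs.size
  end
end

table = FriendsTable.new
p table.add(20, 10) # => true  (stored as [10, 20])
p table.add(10, 20) # => false (same pair, other order)
p table.size        # => 1
```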
Since I assume the 'is a friend of' relation is logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)
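What the friends_symmetric view produces can be mirrored in Ruby over hypothetical rows. Each accepted row is emitted once as stored and once reversed, which is the UNION ALL, so "friends of X" becomes a filter on a single column.

```ruby
# Hypothetical rows of friends: [user_id, friend_id, pending], smaller id first.
FRIENDS_ROWS = [[1, 2, false], [1, 3, true], [2, 3, false]].freeze

# Mirrors the view: drop pending rows, then emit each remaining row as
# stored and reversed (UNION ALL keeps both copies).
def friends_symmetric
  accepted = FRIENDS_ROWS.reject { |_, _, pending| pending }
  accepted.map { |u, f, _| [u, f] } + accepted.map { |u, f, _| [f, u] }
end

p friends_symmetric # => [[1, 2], [2, 3], [2, 1], [3, 2]]
p friends_symmetric.select { |u, _| u == 2 }.map(&:last) # => [3, 1]
```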
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
(update: modified this query to filter out duplicate results)
select *
from (
select
u.*,
count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
row_number() over (partition by u.user_id) as copy_num
from
users u
left join (
select
f1.friend_id as shared_friend_id,
f2.friend_id as friend_id
from friends_symmetric f1
join friends_symmetric f2
on f1.friend_id = f2.user_id
where f1.user_id = ?
and f2.friend_id != f1.user_id
) mutual
on u.user_id = mutual.friend_id
where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ? is a placeholder for a parameter containing the ID of User1.
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.
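The logic of the "mutual" inline view, followed by the left join onto the matching users, can be sketched in Ruby with hypothetical data. It counts with a hash instead of window functions, which corresponds to the aggregate variant mentioned above: for each friend of the searcher, every friend of that friend other than the searcher is credited with one shared friend. Users with no shared friends still appear, with 0.

```ruby
# Hypothetical data: symmetric friend pairs (as the view would present them)
# and users with nicknames.
PAIRS = [[1, 2], [2, 1], [1, 3], [3, 1], [2, 4], [4, 2],
         [3, 4], [4, 3], [2, 5], [5, 2]].freeze
NICKS = { 1 => "Alice", 2 => "Bob", 3 => "Carol",
          4 => "Sato", 5 => "Satoshi", 6 => "Sati" }.freeze

# The f1/f2 self-join: searcher -> shared friend -> candidate,
# skipping paths that lead back to the searcher.
def shared_friend_counts(searcher)
  counts = Hash.new(0)
  PAIRS.each do |u1, shared|
    next unless u1 == searcher
    PAIRS.each do |u2, candidate|
      counts[candidate] += 1 if u2 == shared && candidate != searcher
    end
  end
  counts
end

# Left-join the counts onto users matching the prefix (missing => 0),
# then sort by shared count descending, like order by num_shared desc.
def ranked(searcher, prefix)
  counts = shared_friend_counts(searcher)
  NICKS.select { |_, nick| nick.start_with?(prefix) }
       .map { |id, nick| [nick, counts[id]] }
       .sort_by { |nick, n| [-n, nick] }
end

p ranked(1, "Sat") # => [["Sato", 2], ["Satoshi", 1], ["Sati", 0]]
```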
Assuming that you want a relation of the User model whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
def other_users_ordered_by_mutual_friends
self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
UNION ALL
SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
) all_friends INNER JOIN (
SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
UNION ALL
SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select scope as written will also give you access to the field friends_count on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>
John had a great idea with the friends_symmetric view. With two filtered indexes (one on (friend_id, user_id) and the other on (user_id, friend_id)), it should work well.
However, the query can be a bit simpler:
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.*
,array_agg(f.friend_id) AS shared_friends -- aggregated ids of shared friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN users AS u
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id
ORDER BY shared_count DESC

SQL join on one-to-many relation where none of the many match a given value

Say I have two tables
User
-----
id
first_name
last_name
User_Prefs
-----
user_id
pref
Sample data in User_Prefs might be:
user_id | pref
2 | SMS_NOTIFICATION
2 | EMAIL_OPT_OUT
2 | PINK_BACKGROUND_ON_FRIDAYS
And some users might have no corresponding rows in User_Prefs.
I need to query for the first name and last name of any user who does NOT have EMAIL_OPT_OUT as one of their (possibly many, possibly none) User_Pref rows.
SELECT DISTINCT u.* from User u
LEFT JOIN User_Prefs up ON (u.id=up.user_id)
WHERE up.pref<>'EMAIL_OPT_OUT'
gets me everyone who has at least one row that isn't "EMAIL_OPT_OUT", which of course is not what I want. I want everyone with no rows that match "EMAIL_OPT_OUT".
Is there a way to have the join type and the join conditions filter out the rows I want to leave out here? Or do I need a sub-query?
I personally think a "where not exists" type of clause might be easier to read, but here's a query with a join that does the same thing.
select distinct u.* from User u
left join User_Prefs up ON u.id = up.user_id and up.pref = 'EMAIL_OPT_OUT'
where up.user_id is null
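The difference between the asker's filter and the anti-join can be sketched in Ruby over hypothetical rows. The asker's version keeps users with at least one non-opt-out row, and it also drops users with no rows at all, because NULL <> 'EMAIL_OPT_OUT' is not true in SQL. The anti-join keeps every user for whom no EMAIL_OPT_OUT row exists.

```ruby
# Hypothetical data mirroring User and User_Prefs: [user_id, pref] rows.
users = { 1 => "Ann", 2 => "Ben", 3 => "Cid" }
prefs = [[2, "SMS_NOTIFICATION"], [2, "EMAIL_OPT_OUT"], [3, "SMS_NOTIFICATION"]]

# Asker's query: at least one pref row that is not EMAIL_OPT_OUT
# (user 1 has no rows, so the NULL comparison filters it out too).
wrong = users.keys.select do |id|
  prefs.any? { |uid, pref| uid == id && pref != "EMAIL_OPT_OUT" }
end

# Anti-join: keep users with no EMAIL_OPT_OUT row at all
# (LEFT JOIN ... AND up.pref = 'EMAIL_OPT_OUT' WHERE up.user_id IS NULL).
without_opt_out = users.keys.reject do |id|
  prefs.any? { |uid, pref| uid == id && pref == "EMAIL_OPT_OUT" }
end

p wrong           # => [2, 3]
p without_opt_out # => [1, 3]
```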
Why not have your user preferences stored in the user table as boolean fields? This would simplify your queries significantly.
SELECT * FROM User WHERE EMAIL_OPT_OUT = false