Best way to store followed users - sql

I know the title isn't very descriptive, but it's hard to find something generic that describes my situation. If someone wants to edit it, feel free.
So, I have a Postgres database with a users table. I would like to store the users followed by a given user, and I don't see how to do this. I would like to run something like SELECT followed_users FROM users WHERE username='username' and get back the usernames, ids, or some other identifier of each followed user. But I don't see a clean way to do this.
Maybe an example would be clearer: user1 is following user2 and user3.
How do I store who user1 is following?
EDIT: I don't know how many users a user will follow.
Thank you for your help.

Expanding on my comment above, since it got wordy:
Create a new table called something like user_follows with columns like
user_id1 | user_id2
or
follower_id | follows_id
Then you can query:
SELECT t1.username AS follower_username, t3.username AS following_username
FROM users t1
INNER JOIN user_follows t2 ON t1.user_id = t2.follower_id
INNER JOIN users t3 ON t2.follows_id = t3.user_id
WHERE t1.user_id = <your user>
In the end, think of your tables as "Objects". Then, when you are presented with a problem like "How do I add users that are following other users", you can determine whether this relationship is a new object or an attribute of an existing object. Since a user might follow more than one other user, the relationship is not a good attribute for "Users", so it gets its own table, user_follows.
Since user_follows is just one type of relationship that two users may have to one another, it might make sense to increase the scope of that object to relationships and store the relationship type as an attribute of the table:
user_id1 | user_id2 | relationship_type
where relationships.relationship_type might have values like follows, student of, sister of etc...
So the new query would be something like:
SELECT t1.username as follower_username, t3.username as following_username
FROM users t1
INNER JOIN relationships t2 ON t1.user_id = t2.user_id1
INNER JOIN users t3 ON t2.user_id2 = t3.user_id
WHERE t1.user_id = <your user> AND t2.relationship_type = 'follows';
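To make the relationships idea concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Postgres here, and the sample rows, the three-column primary key, and the followed_usernames helper are assumptions added for illustration.

```python
import sqlite3

# Hypothetical in-memory schema; column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE relationships (
    user_id1 INTEGER NOT NULL REFERENCES users(user_id),
    user_id2 INTEGER NOT NULL REFERENCES users(user_id),
    relationship_type TEXT NOT NULL,
    -- assumption: one row per (pair, type), so a pair can hold several types
    PRIMARY KEY (user_id1, user_id2, relationship_type)
);
INSERT INTO users VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
INSERT INTO relationships VALUES
    (1, 2, 'follows'), (1, 3, 'follows'), (1, 3, 'student of'), (2, 1, 'follows');
""")

def followed_usernames(conn, user_id):
    """Return the usernames that the given user follows, sorted by name."""
    rows = conn.execute(
        """
        SELECT t3.username
        FROM users t1
        JOIN relationships t2 ON t1.user_id = t2.user_id1
        JOIN users t3 ON t2.user_id2 = t3.user_id
        WHERE t1.user_id = ? AND t2.relationship_type = 'follows'
        ORDER BY t3.username
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

followed = followed_usernames(conn, 1)
print(followed)
```

The 'student of' row for the same pair does not affect the result, because the query filters on relationship_type.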

I'd add another table, let's call it following for argument's sake, which saves pairs of users and users they are following:
CREATE TABLE following (
user_id INT NOT NULL REFERENCES users(id),
following_id INT NOT NULL REFERENCES users(id),
PRIMARY KEY (user_id, following_id)
)
Then you could query all the users a specific user is following by joining with the users table (twice). E.g., to get the names of all the users that I (username "mureinik") am following:
SELECT fu.username
FROM following f
JOIN users u ON f.user_id = u.id
JOIN users fu ON f.following_id = fu.id
WHERE u.username = 'mureinik'
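As a quick check of the junction-table design, the sketch below builds the following table in an in-memory SQLite database (a stand-in for Postgres) and shows two things: the composite primary key rejects a duplicate follow, and the double join resolves the followed users' names. The sample users other than "mureinik" are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE following (
    user_id INTEGER NOT NULL REFERENCES users(id),
    following_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, following_id)
);
INSERT INTO users VALUES (1, 'mureinik'), (2, 'user2'), (3, 'user3');
INSERT INTO following VALUES (1, 2), (1, 3);
""")

# The composite primary key means the same pair cannot be stored twice.
try:
    conn.execute("INSERT INTO following VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# u is the follower, fu is the followed user; both are rows of users.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT fu.username
        FROM following f
        JOIN users u ON f.user_id = u.id
        JOIN users fu ON f.following_id = fu.id
        WHERE u.username = ?
        ORDER BY fu.username
        """,
        ("mureinik",),
    )
]
print(duplicate_rejected, names)
```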


How to select records from a table that has two user ids (created_by_user, given_to_user) and replace the user ids with usernames?

This is task table:
This is user table:
I want to select a user's tasks.
The backend would supply the given_to_user id.
But I want the selected data to contain usernames instead of the ids stored in created_by_user and given_to_user.
SELECTED table would look like this.
Example:
How to achieve what I want?
Or maybe I designed my tables so poorly that it is difficult to select the data I need? :)
The task table has two id values that are foreign keys to the user table.
I tried many things but couldn't get the desired result.
You did not design the tables poorly.
In fact, it is common practice to store ids that reference rows in other tables. You just need to learn how to use joins:
SELECT
task.id, task.title, task.information, user.username AS created_by, user2.username AS given_to
FROM
(task INNER JOIN user ON task.created_by_user = user.id)
INNER JOIN user AS user2 ON task.given_to_user = user2.id;
Do you just want two joins?
select t.*, uc.username as created_by_username,
ug.username as given_to_username
from task t left join
users uc
on t.created_by_user = uc.id left join
users ug
on t.given_to_user = ug.id;
This uses left join in case one of the user ids is missing.
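To see the effect of the two left joins, here is a small sketch on an in-memory SQLite database (used only as a stand-in; the question does not say which engine it runs on). The sample rows are invented, and one task deliberately has no given_to_user so the difference from an inner join is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    title TEXT,
    created_by_user INTEGER REFERENCES users(id),
    given_to_user INTEGER REFERENCES users(id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO task VALUES (10, 'write report', 1, 2), (11, 'unassigned', 1, NULL);
""")

# users is joined twice under different aliases: uc = creator, ug = assignee.
rows = conn.execute(
    """
    SELECT t.id, t.title,
           uc.username AS created_by_username,
           ug.username AS given_to_username
    FROM task t
    LEFT JOIN users uc ON t.created_by_user = uc.id
    LEFT JOIN users ug ON t.given_to_user = ug.id
    ORDER BY t.id
    """
).fetchall()
print(rows)
```

With inner joins, task 11 would drop out of the result; with left joins it stays and its given_to_username comes back as NULL (None in Python).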

Join Three Tables and Use Count on One of Them

I'm trying to improve my querying abilities and I have been having trouble wrapping my head around joins of moderate complexity. To be as clear and concise as possible, I am trying to join 3 tables. The first join selects posts from all users on users.User_ID=posts.FK_User_ID
**User table**
User_ID pk
username int
email
etc...
**Post Table**
post_ID PK
User_ID FK
Post
etc...
**Like Table**
FK_User_ID references user.user_ID
FK_Post_ID references post.Post_ID (*This is what I want to count*)
After this I want to reference a third table. This table contains a foreign key to User_ID of the user table and a foreign key FK_Post_ID referencing the primary key Post_ID in the post table. This third table is a linking table of users who have liked the post. I want to count all occurrences of a post ID in this table and append the count to each post in the initial user and post join, so an output row would look like this:
User_id Username Post_ID Post Number_of_Likes
1 bob 4 'foo' 18
my first join between the two tables works and looks like this (simplified with * for example)
select * from users
join post
on post.User_ID=users.User_ID
Now I need a way to reference the third table to count the total number of times that a post id appears in the like table and append it to each row. This is where I am lost; I have tried a lot of things with no luck. I believe I need either an inner join clause for my second join or a nested select statement. Could someone correct me if I am wrong and perhaps point me in the right direction? Appreciate it!
A common way to do this is to build a subquery that computes the like count per post key and then join to it, like this:
select *
from users
join post on post.User_ID=users.User_ID
left join (
select FK_Post_ID, count(*) as count_of_likes_on_a_post
from likestable
group by FK_Post_ID
) likes on post.Post_ID = likes.FK_Post_ID
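One detail of the subquery approach is that posts with no likes get NULL from the left join rather than 0. The sketch below, run on in-memory SQLite as a stand-in, wraps the count in COALESCE to turn that into 0. The table name likestable comes from the answer; the sample rows and the column alias n are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (User_ID INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE post (
    Post_ID INTEGER PRIMARY KEY,
    User_ID INTEGER REFERENCES users(User_ID),
    Post TEXT
);
CREATE TABLE likestable (
    FK_User_ID INTEGER REFERENCES users(User_ID),
    FK_Post_ID INTEGER REFERENCES post(Post_ID)
);
INSERT INTO users VALUES (1, 'bob'), (2, 'ann'), (3, 'cat');
INSERT INTO post VALUES (4, 1, 'foo'), (5, 1, 'bar');
INSERT INTO likestable VALUES (2, 4), (3, 4);
""")

# Aggregate likes per post first, then attach the count to each user/post row.
rows = conn.execute(
    """
    SELECT users.User_ID, users.username, post.Post_ID, post.Post,
           COALESCE(likes.n, 0) AS Number_of_Likes
    FROM users
    JOIN post ON post.User_ID = users.User_ID
    LEFT JOIN (
        SELECT FK_Post_ID, COUNT(*) AS n
        FROM likestable
        GROUP BY FK_Post_ID
    ) likes ON post.Post_ID = likes.FK_Post_ID
    ORDER BY post.Post_ID
    """
).fetchall()
print(rows)
```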

Performance of search query bottlenecked 98% by mutual friends despite caching

So on my social networking website, similar to Facebook, about 98% of my search time is spent in this one part. I want to rank the results (we can assume they are all users) by the number of mutual friends each one has with the searching user.
My friends table has 3 columns -
user_id (person who sends the request)
friend_id (person who receives the request)
pending (boolean to indicate if the request was accepted or not)
user_id and friend_id are both foreign keys that reference users.id
Finding friend_ids of a user is simple, it looks like this
def friends
Friend.where(
'(user_id = :id OR friend_id = :id) AND pending = false',
id: self.id
).pluck(:user_id, :friend_id)
.flatten
.uniq
.reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking them by mutual friends requires the following steps:
1. Get the user_ids of all the searching user's friends, Set(A). The friends method above does this.
2. Loop over each of the ids in Set(A):
   a. Get the user_ids of all the friends of |id|, Set(B). Again, done by the friends method.
   b. Find the length of the intersection of Set(A) and Set(B).
3. Order the results in descending order of intersection length.
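The steps above can be sketched in a few lines. This is written in Python rather than Ruby, and it assumes the loop runs over the search results and that friend lists are already available as a dictionary of sets (friends_of, a hypothetical stand-in for the cached friends method).

```python
def rank_by_mutual_friends(searcher_id, result_ids, friends_of):
    """Order result_ids by how many friends each shares with searcher_id.

    friends_of maps a user id to the set of that user's friend ids
    (hypothetical structure standing in for the cached friends lookup).
    """
    mine = friends_of.get(searcher_id, set())
    # Size of the intersection of Set(A) and Set(B) for every search result.
    shared = {rid: len(mine & friends_of.get(rid, set())) for rid in result_ids}
    # sorted() is stable, so ties keep their original search order.
    return sorted(result_ids, key=lambda rid: shared[rid], reverse=True)

friends_of = {1: {2, 3, 4}, 5: {2, 3}, 6: {4}, 7: set()}
ranked = rank_by_mutual_friends(1, [7, 6, 5], friends_of)
print(ranked)
```

Each result costs one set intersection, so the expensive part is filling friends_of, which matches the bottleneck described in the question.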
The most expensive operation here is obviously getting the friend_ids of hundreds of users. So I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious whether it can be improved further.
I'm wondering if there is a way that I can get friend_ids of all the desired users in a single query, that is efficient. Something like -
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the same operation of ranking (that I mentioned above). Since I won't be hitting the cache for thousands of users and their friend_ids, I think it'll speed up the process significantly
Caching your friends table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check constraint on friends avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
Since I suppose the 'is a friend of' relation is supposed to be logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)
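Here is a small sketch of how the check constraint and the symmetric view behave, using in-memory SQLite as a stand-in for the real database. The sample rows are invented, pending is stored as 0/1, and the view is written with column aliases instead of a column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, nick TEXT);
CREATE TABLE friends (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    friend_id INTEGER NOT NULL REFERENCES users(user_id),
    pending INTEGER NOT NULL DEFAULT 1,  -- 0 = accepted, 1 = pending
    PRIMARY KEY (user_id, friend_id),
    CHECK (user_id < friend_id)
);
CREATE VIEW friends_symmetric AS
    SELECT user_id, friend_id FROM friends WHERE pending = 0
    UNION ALL
    SELECT friend_id AS user_id, user_id AS friend_id FROM friends WHERE pending = 0;
INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO friends VALUES (1, 2, 0), (1, 3, 0), (2, 3, 1);
""")

def friends_of(user_id):
    """Accepted friends of user_id, regardless of who sent the request."""
    sql = "SELECT friend_id FROM friends_symmetric WHERE user_id = ? ORDER BY friend_id"
    return [row[0] for row in conn.execute(sql, (user_id,))]

# The CHECK constraint refuses a pair stored in the "wrong" order.
try:
    conn.execute("INSERT INTO friends VALUES (3, 1, 0)")
    reversed_pair_rejected = False
except sqlite3.IntegrityError:
    reversed_pair_rejected = True

print(friends_of(1), friends_of(3), reversed_pair_rejected)
```

Note that the application has to order the two ids before inserting, since the constraint rejects the reversed pair instead of normalizing it.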
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
(update: modified this query to filter out duplicate results)
select *
from (
select
u.*,
count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
row_number() over (partition by u.user_id) as copy_num
from
users u
left join (
select
f1.friend_id as shared_friend_id,
f2.friend_id as friend_id
from friends_symmetric f1
join friends_symmetric f2
on f1.friend_id = f2.user_id
where f1.user_id = ?
and f2.friend_id != f1.user_id
) mutual
on u.user_id = mutual.friend_id
where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ? is a placeholder for a parameter containing the ID of the User1.
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
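The aggregate variant mentioned above might look roughly like the following. This is an untuned sketch on in-memory SQLite, not a benchmarked Postgres query; the sample data, the :me parameter name, and the inner alias shared are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, nick TEXT);
CREATE TABLE friends (
    user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL,
    pending INTEGER NOT NULL,
    PRIMARY KEY (user_id, friend_id), CHECK (user_id < friend_id)
);
CREATE VIEW friends_symmetric AS
    SELECT user_id, friend_id FROM friends WHERE pending = 0
    UNION ALL
    SELECT friend_id AS user_id, user_id AS friend_id FROM friends WHERE pending = 0;
INSERT INTO users VALUES (1,'Me'),(2,'Ann'),(3,'Bob'),(4,'Saturn'),(5,'Satya'),(6,'Sata');
INSERT INTO friends VALUES (1,2,0),(1,3,0),(2,4,0),(3,4,0),(2,5,0);
""")

# "mutual" is now an aggregate: one row per user who shares at least one
# friend with :me, carrying the number of shared friends.
ranked = conn.execute(
    """
    SELECT u.user_id, u.nick, COALESCE(mutual.shared, 0) AS num_shared
    FROM users u
    LEFT JOIN (
        SELECT f2.friend_id AS user_id, COUNT(*) AS shared
        FROM friends_symmetric f1
        JOIN friends_symmetric f2 ON f1.friend_id = f2.user_id
        WHERE f1.user_id = :me AND f2.friend_id != f1.user_id
        GROUP BY f2.friend_id
    ) mutual ON mutual.user_id = u.user_id
    WHERE u.nick LIKE 'Sat%'
    ORDER BY num_shared DESC, u.user_id
    """,
    {"me": 1},
).fetchall()
print(ranked)
```

COUNT(*) is safe here only because the primary key and check constraint keep each pair unique in friends, so the view never yields the same (user_id, friend_id) row twice.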
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.
Assuming that you want a relation of the User model whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
def other_users_ordered_by_mutual_friends
self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
UNION ALL
SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
) all_friends INNER JOIN (
SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
UNION ALL
SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select scope as written will also give you access to the field friends_count on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>
John had a great idea with the friends_symmetric view. With two filtered indexes, one on (friend_id, user_id) and the other on (user_id, friend_id), it's going to work great.
However the query can be a bit simpler
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.*
,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN users AS u
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id

Query for orphaned relations

I have to cleanup orphaned associations in a Rails app which uses OmniAuth. For the sake of simplicity, here's a stripped down scenario.
Given two tables:
users:
password_id: INTEGER
<more columns>
passwords:
id: INTEGER NOT NULL
password_digest: VARCHAR
In other words: There's an optional "user belongs_to password" relation. (There are good reasons why the relation is not the other way around.)
Normally, every user relates to one password. But sometimes a user is deleted and the corresponding password gets orphaned.
Is there an efficient way to find all orphaned passwords (in other words: all passwords which are not related to by any user) with just one SQL query on Postgres?
Thanks for your hints!
This type of query is called an anti-join. The simplest method is:
SELECT p.*
FROM passwords p
LEFT JOIN users u
ON u.password_id = p.id
WHERE u.<primary key field> IS NULL;
Another alternative is the NOT EXISTS method @Politank-Z gave. They should have basically identical query plans.
SELECT p.id FROM PASSWORDS p
WHERE NOT EXISTS ( SELECT 1 FROM users u WHERE p.id = u.password_id );
...is a straightforward enough solution. You could build it around a LEFT JOIN or an EXCEPT (MINUS in Oracle) if you prefer. You could also prevent the scenario entirely by adding a foreign key from users to passwords.
Have you tried:
SELECT *
FROM passwords
WHERE id NOT IN (SELECT password_id FROM users WHERE password_id IS NOT NULL)
Or
SELECT p.*
FROM passwords p LEFT JOIN
users u ON p.id=u.password_id
WHERE u.password_id IS NULL
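The anti-join variants can be compared side by side. The sketch below uses in-memory SQLite as a stand-in for Postgres and made-up rows, including a user whose password_id is NULL, which is what trips up a plain NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE passwords (id INTEGER PRIMARY KEY, password_digest TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, password_id INTEGER);
INSERT INTO passwords VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO users VALUES (1, 1), (2, NULL);  -- user 2 has no password
""")

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

not_exists = ids("""
    SELECT p.id FROM passwords p
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.password_id = p.id)
    ORDER BY p.id""")

left_join = ids("""
    SELECT p.id FROM passwords p
    LEFT JOIN users u ON u.password_id = p.id
    WHERE u.id IS NULL
    ORDER BY p.id""")

# A NULL in the subquery makes "id NOT IN (...)" evaluate to NULL, not TRUE.
naive_not_in = ids("""
    SELECT id FROM passwords
    WHERE id NOT IN (SELECT password_id FROM users)""")

safe_not_in = ids("""
    SELECT id FROM passwords
    WHERE id NOT IN (SELECT password_id FROM users WHERE password_id IS NOT NULL)
    ORDER BY id""")

print(not_exists, left_join, naive_not_in, safe_not_in)
```

Because password_id is nullable in the question's schema, NOT EXISTS or the left join is the less fragile choice; NOT IN needs the extra IS NOT NULL filter.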

How do I establish a SQL query for this many-to-many table?

I'm building this bartering-type function in this site using PHP/MySQL and I'm trying to create a query that responds with the following fields:
owner's username, title, offerer's username, offerer's item
Basically I have three tables here with the following fields:
users
user_id, username, email
items_available
item_number, owner_id (foreign key that links to user_id), title
offers_open
trade_id, offerers_id (fk of user_id), offerers_item(fk with item_number), receivers_id (fk from user_id)
How do I do this? I've seen some references to many-to-many SQL queries but they don't seem to particularly fit what I'm looking for (for example, the offerers_id and the owner_ids refer to different users_id in the Users table, but how do I make them distinguishable in the sql query?)
If I understand correctly, this is what you are looking for:
SELECT owner.username, offerers.username, ia.title
FROM offers_open o
INNER JOIN users AS offerers
ON o.offerers_id = offerers.user_id
INNER JOIN items_available AS ia
ON o.offerers_item= ia.item_number
INNER JOIN users AS owner
ON ia.owner_id = owner.user_id
I don't see a title on the users table, so didn't include one.
I'm not sure exactly what output you want, but since your users table will appear twice in the query, they need to be aliased like so:
SELECT offerer.username AS offerer_username, title, receiver.username AS receiver_username
FROM users AS offerer
JOIN items_available ON owner_id = offerer.user_id
JOIN offers_open ON offerers_item = item_number
JOIN users AS receiver ON receivers_id = receiver.user_id
Again, don't know if that's what you want, but hope you get the idea.
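To show the aliasing idea end to end, here is a minimal sketch on in-memory SQLite (the question uses MySQL; the join logic is the same). The sample rows are invented, and the choice to report the offerer, the offered item's title, and the receiver is one reading of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, email TEXT);
CREATE TABLE items_available (
    item_number INTEGER PRIMARY KEY,
    owner_id INTEGER REFERENCES users(user_id),
    title TEXT
);
CREATE TABLE offers_open (
    trade_id INTEGER PRIMARY KEY,
    offerers_id INTEGER REFERENCES users(user_id),
    offerers_item INTEGER REFERENCES items_available(item_number),
    receivers_id INTEGER REFERENCES users(user_id)
);
INSERT INTO users VALUES (1, 'alice', 'a@example.com'), (2, 'bob', 'b@example.com');
INSERT INTO items_available VALUES (100, 1, 'bike');
INSERT INTO offers_open VALUES (1, 1, 100, 2);
""")

# users appears twice; the aliases offerer and receiver keep the roles apart.
offers = conn.execute(
    """
    SELECT offerer.username AS offerer_username,
           ia.title         AS offered_item,
           receiver.username AS receiver_username
    FROM offers_open o
    JOIN users AS offerer ON o.offerers_id = offerer.user_id
    JOIN items_available AS ia ON o.offerers_item = ia.item_number
    JOIN users AS receiver ON o.receivers_id = receiver.user_id
    ORDER BY o.trade_id
    """
).fetchall()
print(offers)
```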
It sounds as if you need an alias of the users table.
Something like?
select
u.*, ia.*, oo.*,u2.*
from
users as u,
items_available as ia,
offers_open as oo,
users as u2
where
u.user_id = ia.owner_id and
oo.receivers_id = u2.user_id and
oo.offerers_item = ia.item_number