Check if inverse relationship exists in table - sql

I'm using a table for signifying friends/friend requests. Basically my idea was to have a table structure like this:
CREATE TABLE friends(
user_id NUMERIC NOT NULL,
friend_user_id NUMERIC NOT NULL,
UNIQUE (user_id,friend_user_id)
);
When a user wants to send a friend request, you add a row:
INSERT INTO friends(user_id,friend_user_id) VALUES($1,$2)
Then, when the other user accepts the friend request, you would simply add the inverse of the previous row (i.e. the recipient would technically be sending a friend request back to the sender, thus completing the friend relationship):
INSERT INTO friends(user_id,friend_user_id) VALUES($2,$1)
My question:
If I want to get all the friends of a user, I have to get all the rows with that user's id and inner join them with another table that contains the users' information. How would I check within the query whether each row's inverse relationship exists?
P.S. I think I could do it pretty easily with multiple queries, but I would rather use only one query if possible.

Solution:
SELECT users.id, users.username, users.avatar
FROM friends f
INNER JOIN users ON users.id = f.friend_user_id
WHERE f.user_id = $1
-- r is the inverse row: the friend must also have added $1
AND EXISTS (SELECT 1 FROM friends r WHERE r.user_id = f.friend_user_id AND r.friend_user_id = $1)
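To make what the solution query is meant to compute concrete, here is a minimal in-memory sketch in Ruby (the other language used in this collection); it is not a substitute for the SQL. A (user_id, friend_user_id) row counts as a confirmed friendship only when its inverse row also exists. Representing the table as an array of pairs and the name confirmed_friends are illustrative assumptions.

```ruby
require 'set'

# Each element mirrors one row of the friends table: [user_id, friend_user_id].
# A friend is confirmed only if the inverse row [friend, user] is also present,
# which is the check the EXISTS subquery performs.
def confirmed_friends(rows, user_id)
  pairs = rows.to_set
  rows.select { |u, f| u == user_id && pairs.include?([f, u]) }
      .map { |_, f| f }
end

rows = [
  [1, 2], [2, 1], # 1 and 2 added each other
  [1, 3],         # 1 sent a request to 3, not yet accepted
  [4, 1]          # 4 sent a request to 1, not yet accepted
]
p confirmed_friends(rows, 1) # => [2]
```

Pending requests in either direction are excluded, which is the behavior the question asks for.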

Related

Join Three Tables and Use Count on One of Them

I'm trying to improve my querying abilities and I have been having trouble wrapping my head around joins of moderate complexity. To be as clear and concise as possible, I am trying to join 3 tables. The first join selects posts from all users on users.User_ID=posts.FK_User_ID
**User table**
User_ID pk
username int
email
etc...
**Post Table**
post_ID PK
User_ID FK
Post
etc...
**Like Table**
FK_User_ID references user.user_ID
FK_Post_ID references post.Post_ID (*This is what I want to count*)
After this I want to reference a third table. This table contains a foreign key to User_ID in the user table and a foreign key, FK_Post_ID, referencing the primary key Post_ID in the post table. It is a linking table of users who have liked a post. I want to count all occurrences of a post ID in this table and append that count to each row of the initial user/post join, so an output row would look like this:
User_id Username Post_ID Post Number_of_Likes
1 bob 4 'foo' 18
My first join between the two tables works and looks like this (simplified with * for example):
select * from users
join post
on post.User_ID=users.User_ID
Now I need a way to reference the third table, count the total number of times each post ID appears in the like table, and append that count to each row. This is where I am lost; I have tried a lot of things with no luck. I believe I need either an inner join for my second join or a nested select statement. Could someone correct me if I am wrong and perhaps guide me in the right direction? Appreciate it!
A common way to do this is to write a subquery that returns each key with its count, then join to that. Like this:
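select users.User_ID, users.username, post.Post_ID, post.Post,
coalesce(likes.count_of_likes_on_a_post, 0) as Number_of_Likes -- posts with no likes get 0 instead of NULL
from users
join post on post.User_ID=users.User_ID
left join (
select FK_Post_ID, count(*) as count_of_likes_on_a_post
from likestable
group by FK_Post_ID
) likes on post.Post_ID = likes.FK_Post_ID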
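The subquery-plus-LEFT JOIN pattern can be mimicked in plain Ruby to show why the join must be a LEFT JOIN and why a default of 0 is useful. This is a sketch with invented sample rows; the Post struct and variable names are hypothetical and only mirror the columns from the question.

```ruby
# Rows of the Post and Like tables, reduced to the columns the query uses.
Post = Struct.new(:post_id, :user_id, :post)
posts = [Post.new(4, 1, 'foo'), Post.new(5, 1, 'bar')]
likes = [[7, 4], [8, 4], [9, 4]] # [fk_user_id, fk_post_id]

# Step 1: the derived table, i.e. one count per FK_Post_ID (GROUP BY + COUNT(*)).
like_counts = likes.group_by { |_, post_id| post_id }
                   .transform_values(&:size)

# Step 2: the LEFT JOIN. A post without likes has no entry in like_counts
# (SQL would give NULL); defaulting to 0 plays the role of COALESCE.
report = posts.map { |post| [post.post_id, post.post, like_counts.fetch(post.post_id, 0)] }
p report # => [[4, "foo", 3], [5, "bar", 0]]
```

An inner join in step 2 would drop post 5 entirely instead of reporting zero likes.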

Best way to store followed users

I know the title isn't very descriptive, but it's really hard to find something generic to describe my situation. If someone wants to edit it, feel free...
So, I have a Postgres database with a users table. I would like to store the users followed by each user, and I really don't see how I could do this. I would like to do something like SELECT followed_users FROM users WHERE username='username' and have it return the username, id, or whatever identifies each followed user. But I don't see any clean way to do this.
Maybe an example would be more descriptive: user1 is following user2 and user3.
How to store who user1 is following?
EDIT: I don't know how many users the user will follow.
Thank you for your help.
Expanding on my comment above, since it got wordy:
Create a new table called something like user_follows with columns like
user_id1 | user_id2
or
follower_id | follows_id
Then you can query:
SELECT t1.username as follower_username, t3.username as following_username
FROM users t1
INNER JOIN user_follows t2 ON t1.user_id = t2.follower_id
INNER JOIN users t3 ON t2.follows_id = t3.user_id
WHERE t1.user_id = <your user>
In the end, think of your tables as "objects". Then, when you are presented with a problem like "How do I add users that are following other users?", you can determine whether this relationship is a new object or an attribute of an existing object. Since a user might follow more than one other user, the relationship is not a good attribute for "Users", so it gets its own table, user_follows.
Since user_follows is just one type of relationship that two users may have to one another, it might make sense to increase the scope of that object to relationships and store the relationship type as an attribute of the table:
user_id1 | user_id2 | relationship_type
where relationships.relationship_type might have values like follows, student of, sister of, etc.
So the new query would be something like:
SELECT t1.username as follower_username, t3.username as following_username
FROM users t1
INNER JOIN relationships t2 ON t1.user_id = t2.user_id1
INNER JOIN users t3 ON t2.user_id2 = t3.user_id
WHERE t1.user_id = <your user> AND t2.relationship_type = 'follows';
I'd add another table, let's call it following for argument's sake, which saves pairs of users and users they are following:
CREATE TABLE following (
user_id INT NOT NULL REFERENCES users(id),
following_id INT NOT NULL REFERENCES users(id),
PRIMARY KEY (user_id, following_id)
)
Then you could query all the users a specific user is following by joining with the users table (twice). E.g., to get the names of all the users that I (username "mureinik") am following:
SELECT fu.username
FROM following f
JOIN users u ON f.user_id = u.id
JOIN users fu ON f.following_id = fu.id
WHERE u.username = 'mureinik'
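The double join against users is easy to get wrong, so here is a small in-memory Ruby illustration (hypothetical sample data; only "mureinik" comes from the answer). The first lookup resolves the follower; the second must go through following_id. Resolving user_id a second time just returns the follower's own name once per row.

```ruby
# users: id => username; following: [user_id, following_id] rows.
users = { 1 => 'mureinik', 2 => 'alice', 3 => 'bob' }
following = [[1, 2], [1, 3], [2, 3]]

# First "join" (alias u): resolve the username to the follower's id.
me = users.key('mureinik')
my_rows = following.select { |user_id, _| user_id == me }

# Second "join" (alias fu): map each following_id back to a username.
followed_names = my_rows.map { |_, following_id| users[following_id] }
p followed_names # => ["alice", "bob"]

# Joining fu on user_id instead only repeats the follower's own name.
wrong_names = my_rows.map { |user_id, _| users[user_id] }
p wrong_names # => ["mureinik", "mureinik"]
```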

Performance of search query bottlenecked 98% by mutual friends despite caching

So on my social networking website, which is similar to Facebook, about 98% of my search time is spent in this one part. I want to rank the search results (we can assume they are all users) by the number of mutual friends each of them has with the searching user.
My friends table has 3 columns -
user_id (person who sends the request)
friend_id (person who receives the request)
pending (boolean to indicate if the request was accepted or not)
user_id and friend_id are both foreign keys that reference users.id
Finding the friend_ids of a user is simple; it looks like this:
def friends
Friend.where(
'(user_id = :id OR friend_id = :id) AND pending = false',
id: self.id
).pluck(:user_id, :friend_id)
.flatten
.uniq
.reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking them by mutual friends requires the following steps:
Get the user_ids of all the searching user's friends: Set(A). The friends method above does this.
Loop over each of the ids in Set(A):
Get the user_ids of all the friends of |id|: Set(B). Again, this is done by the friends method.
Find the length of the intersection of Set(A) and Set(B).
Order all results in descending order of intersection length.
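The ranking steps above can be sketched in Ruby with Set intersections. This is a minimal sketch assuming friend lists are already available as a hash of user_id => Set (the friends_of name and the sample data are hypothetical). It computes Set(B) for each search result, since the results are what get ordered, and breaks ties by id so the order is deterministic.

```ruby
require 'set'

# friends_of: user_id => Set of friend ids.
def rank_by_mutual_friends(friends_of, searcher_id, result_ids)
  mine = friends_of.fetch(searcher_id, Set.new) # Set(A)
  result_ids.sort_by do |id|
    theirs = friends_of.fetch(id, Set.new)      # Set(B) for this result
    # Sort key: most mutual friends first, ties broken by id.
    [-(mine & theirs).size, id]
  end
end

friends_of = { 1 => Set[2, 3, 4], 5 => Set[2, 3], 6 => Set[4], 7 => Set[8] }
p rank_by_mutual_friends(friends_of, 1, [7, 6, 5]) # => [5, 6, 7]
```

Every result still needs its own friend set, which is why the per-user lookups dominate the cost described below.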
The most expensive operation here is obviously getting the friend_ids of hundreds of users, so I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious whether it can be improved further.
I'm wondering if there is a way to get the friend_ids of all the desired users in a single, efficient query. Something like:
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then rank them the same way as described above. Since I won't be hitting the cache for thousands of users and their friend_ids, I think it'll speed up the process significantly.
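If a single query returned every row of the friends table (user_id, friend_id, pending), the hash described above could be built in one pass. A minimal Ruby sketch, assuming rows arrive as plain arrays (the friend_index name is hypothetical). Each accepted row is added in both directions, because a request stores the pair only once.

```ruby
require 'set'

# rows: [user_id, friend_id, pending] arrays, as one query might return them.
def friend_index(rows)
  index = Hash.new { |h, k| h[k] = Set.new }
  rows.each do |user_id, friend_id, pending|
    # Like "pending = false" in SQL: only explicitly accepted rows count.
    next unless pending == false
    index[user_id] << friend_id
    index[friend_id] << user_id
  end
  index
end

rows = [[1, 2, false], [3, 1, false], [1, 4, true]]
idx = friend_index(rows)
p idx[1].sort # => [2, 3]
```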
Caching your friends table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get as much work as you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check constraint on friends avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
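The effect of the check constraint can be illustrated outside the database: normalizing every pair so the smaller id comes first gives each friendship exactly one key. A hypothetical Ruby sketch, with a Set standing in for the primary key:

```ruby
require 'set'

# Mirror CHECK (user_id < friend_id): smaller id first; self-pairs are rejected,
# just as the constraint rejects a row where user_id = friend_id.
def canonical_pair(a, b)
  raise ArgumentError, 'a user cannot befriend themselves' if a == b
  a < b ? [a, b] : [b, a]
end

stored = Set.new # stands in for the primary key on (user_id, friend_id)
stored << canonical_pair(7, 3)
stored << canonical_pair(3, 7) # same friendship, same key: no duplicate
p stored.to_a # => [[3, 7]]
```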
Since I suppose the 'is a friend of' relation is supposed to be logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)
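In Ruby terms, the view is just the accepted rows followed by the same rows reversed. This sketch (hypothetical helper name, rows as [user_id, friend_id, pending] arrays) keeps only rows whose pending is exactly false, matching the fact that WHERE NOT pending also discards NULLs:

```ruby
# Equivalent of friends_symmetric: accepted pairs as stored, UNION ALL reversed.
def friends_symmetric(rows)
  accepted = rows.select { |_, _, pending| pending == false }
  accepted.map { |u, f, _| [u, f] } + accepted.map { |u, f, _| [f, u] }
end

rows = [[1, 2, false], [2, 3, true]]
p friends_symmetric(rows) # => [[1, 2], [2, 1]]
```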
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
(update: modified this query to filter out duplicate results)
select *
from (
select
u.*,
count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
row_number() over (partition by u.user_id) as copy_num
from
users u
left join (
select
f1.friend_id as shared_friend_id,
f2.friend_id as friend_id
from friends_symmetric f1
join friends_symmetric f2
on f1.friend_id = f2.user_id
where f1.user_id = ?
and f2.friend_id != f1.user_id
) mutual
on u.user_id = mutual.friend_id
where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ? is a placeholder for a parameter containing the ID of the User1.
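The core of the query is a two-hop join: User1's friends (f1), then their friends (f2), counted once per shared friend, with a LEFT JOIN so that non-matching users get 0. A minimal in-memory Ruby sketch of that logic, assuming the symmetric pairs are already materialized (helper names and data are hypothetical):

```ruby
# sym: symmetric [user_id, friend_id] pairs, as produced by friends_symmetric.
def mutual_counts(sym, searcher)
  my_friends = sym.select { |u, _| u == searcher }.map { |_, f| f } # f1
  counts = Hash.new(0)
  my_friends.each do |shared|
    sym.each do |u, candidate|                                      # f2
      # f2.user_id = f1.friend_id, and f2.friend_id != f1.user_id
      next unless u == shared && candidate != searcher
      counts[candidate] += 1 # one row per shared friend
    end
  end
  counts
end

pairs = [[1, 2], [1, 3], [2, 5], [3, 5], [2, 6]]
sym = pairs + pairs.map(&:reverse)
counts = mutual_counts(sym, 1)
# Search hits ordered by shared-friend count; missing users count as 0 (LEFT JOIN).
ranked = [6, 5, 8].sort_by { |id| [-counts.fetch(id, 0), id] }
p ranked # => [5, 6, 8]
```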
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.
Assuming that you want a relation of the User model whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
def other_users_ordered_by_mutual_friends
self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
UNION ALL
SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
) all_friends INNER JOIN (
SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
UNION ALL
SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select scope as written will also give you access to the field friends_count on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>
John had a great idea with the friends_symmetric view. With two filtered indexes (one on (friend_id, user_id) and the other on (user_id, friend_id)), it's gonna work great.
However, the query can be a bit simpler:
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.*
,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN users AS u
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id
ORDER BY shared_count DESC

How to tightly contain an SQL query result

I'm writing an application that implements a message system through a 'memos' table in a database. The table has several fields that look like this:
id, date_sent, subject, senderid, recipients,message, status
When someone sends a new memo, it is entered into the memos table. A memo can be sent to multiple people at the same time, and the recipients' user IDs are inserted into the 'recipients' field as comma-separated values.
It would seem that an SQL query like this would work to see if a specific userid is included in a memo:
SELECT * FROM memos WHERE recipients LIKE '%15%'
But I'm not sure this is the right solution. If I use the SQL statement above, won't that return everything that "contains" 15? For example, using the above statement, user 15, 1550, 1564, 2015, would all be included in the result set (and those users might not actually be on the recipient list).
What is the best way to resolve this so that ONLY the user 15 is pulled in if they are in the recipient field instead of everything containing a 15? Am I misunderstanding the LIKE statement?
I think you would be better off having your recipients in a child table of the memos table. Your memos table has a memo ID, which is referenced by the child table like this:
CREATE TABLE MemoRecipients (
MemoRecipientId INT IDENTITY PRIMARY KEY,
MemoId INT NOT NULL REFERENCES memos(id),
UserId INT NOT NULL
);
To query the memos sent to a specific user, you would do something like:
SELECT m.*
FROM memos m
INNER JOIN MemoRecipients mr ON m.id = mr.MemoId
WHERE mr.UserId = 15
No, you aren't misunderstanding; that's how LIKE works. But to achieve what you want, it would be better not to combine the recipients into one field. Instead, try creating a separate table that stores the recipient list for each memo.
I would use the schema below for your needs:
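A quick Ruby illustration of the difference, with two hypothetical memos: a substring test behaves like LIKE '%15%', while comparing individual recipient values (which a separate recipient table gives you directly) only matches user 15.

```ruby
# Two hypothetical memos; user 15 is only a recipient of memo 2.
memos = { 1 => '1550,2015', 2 => '3,15' }

# What LIKE '%15%' does: a plain substring test on the whole field.
like_hits = memos.select { |_, field| field.include?('15') }.keys
# What one row per recipient allows: an exact comparison on each value.
exact_hits = memos.select { |_, field| field.split(',').map(&:to_i).include?(15) }.keys

p like_hits  # => [1, 2]
p exact_hits # => [2]
```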
Table_Memo
id, date_sent, subject, senderid, message, status
Table_Recipient
id_memo FK Table_Memo(id), recipient
By doing so, if you want to check whether a specific user is a recipient of a memo, you can use a query like this (where :memo_id is the memo you are interested in):
SELECT a.* FROM Table_Memo a
JOIN Table_Recipient b ON a.id = b.id_memo
WHERE a.id = :memo_id AND b.recipient = 15
I am not sure exactly how your application pulls these messages, but I imagine a better way would be to create a table message_recepient, which represents the many-to-many relationship between recipients and memos:
id, memoId, recepientId
Then your application could pull messages like this:
SELECT m.*
FROM memos m inner join message_recepient mr on m.id = mr.memoId
WHERE recepientId = 15
This way you will get the messages for a specific user. Again, I don't know what your status field is for, but if it is for new/read/unread, you could add this to your WHERE clause and sort by date:
and m.status = 'new'
order by m.date_sent desc
This way you could collect just the new messages, newest first.
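The same query logic (join on the recipient table, filter by recipient and status, newest first) in a small Ruby sketch with invented sample memos. The Memo struct and recipient_rows name are hypothetical, and ISO date strings are used because they sort chronologically:

```ruby
Memo = Struct.new(:id, :date_sent, :subject, :status)
memos = [
  Memo.new(1, '2015-03-01', 'Budget', 'new'),
  Memo.new(2, '2015-03-05', 'Lunch', 'read'),
  Memo.new(3, '2015-03-07', 'Review', 'new')
]
# Rows of the message_recepient table: [memoId, recepientId].
recipient_rows = [[1, 15], [2, 15], [3, 15], [3, 22]]

memo_ids = recipient_rows.select { |_, rid| rid == 15 }.map(&:first)  # WHERE recepientId = 15
inbox = memos.select { |m| memo_ids.include?(m.id) && m.status == 'new' } # join + status filter
             .sort_by(&:date_sent).reverse                              # ORDER BY date_sent DESC
p inbox.map(&:id) # => [3, 1]
```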

Login DAO sql statement involving multiple joins

I am trying to create a query that will tell the DAO whether the entered username and password are correct. I'm using Java for the DAO implementation, as well as JSF.
I have the following tables:
LOGIN: username (pk)
BUSINESS: username (fk), password
CUSTOMER: username (fk), password
What I'm trying to do is create multiple joins so that when a user logs in, their stored username determines what type of account they have. The username is looked up in both the BUSINESS and CUSTOMER tables and, when found, the password is compared. I tried the following statement:
SELECT l.USERNAME
FROM ITKSTU.BUSINESS b
JOIN ITKSTU.LOGIN l
ON l.USERNAME=b.USERNAME
JOIN ITKSTU.CUSTOMER c
ON c.USERNAME=l.USERNAME
WHERE l.USERNAME='user111' AND (b.PASSWORD='aaa' OR c.PASSWORD='aaa');
Yet it returns nothing. Any possible suggestions?
I have replicated the same here and it looks like it is working. Could you check?
http://sqlfiddle.com/#!2/f253d/2
Thanks
If I understood correctly, what you need is to determine a user's type, i.e. whether he/she is in the business table or the customer table, and then check that the password is correct.
Again, if I am not wrong, you should have an entry for every user in the login table, and each of them should appear EITHER in the business OR the customer table.
Let's assume we have records such as:
INSERT INTO login VALUES ('TEST');
INSERT INTO login VALUES ('TEST2');
INSERT INTO business VALUES ('TEST','PASSWORD123');
INSERT INTO customer VALUES ('TEST2','PASSWORD1234');
I think you may solve the problem with the following query. Let's test with the user named "TEST2":
SELECT b.username AS business_user, c.username AS customer_user
FROM login l
LEFT JOIN business b ON b.username = l.username
LEFT JOIN customer c ON c.username = l.username
WHERE l.username = 'TEST2' AND (b.password = 'PASSWORD1234' OR c.password = 'PASSWORD1234');
As you can see, this query returns 2 columns: the first one is null because the user is not in the business table, and the second one gives you the username, labeled "customer_user". So if you check which column is null, you will know where the user actually belongs (the business or the customer table).
The trick here is to begin with the login table ("FROM login") and use LEFT JOIN instead of JOIN. Here is a quick tip about joins and their differences, if you need it: http://www.firebirdfaq.org/faq93/
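To see why the original inner joins return nothing, here is an in-memory Ruby sketch using the sample rows above. An inner join on both tables only keeps usernames present in business AND customer, which never happens, whereas starting from login and treating a missing match as nil works. The account_type helper is hypothetical and returns a symbol instead of the two columns.

```ruby
login    = ['TEST', 'TEST2']
business = { 'TEST' => 'PASSWORD123' }   # username => password
customer = { 'TEST2' => 'PASSWORD1234' }

# Start from login; a missing business/customer match is treated like a NULL
# from a LEFT JOIN instead of discarding the user.
def account_type(login, business, customer, username, password)
  return nil unless login.include?(username)
  return :business if business.key?(username) && business[username] == password
  return :customer if customer.key?(username) && customer[username] == password
  nil # unknown user or wrong password
end

# What the inner joins require: the user must be in both tables.
inner_join_rows = login.select { |u| business.key?(u) && customer.key?(u) }
p inner_join_rows # => []
p account_type(login, business, customer, 'TEST2', 'PASSWORD1234') # => :customer
```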