Query for orphaned relations - sql

I have to clean up orphaned associations in a Rails app which uses OmniAuth. For the sake of simplicity, here's a stripped-down scenario.
Given two tables:
users
password_id: INTEGER
<more columns>
passwords
password_digest: VARCHAR
In other words: There's an optional "user belongs_to password" relation. (There are good reasons why the relation is not the other way around.)
Normally, every user relates to one password. But sometimes a user is deleted and the corresponding password gets orphaned.
Is there an efficient way to find all orphaned passwords (in other words: all passwords which are not referenced by any user) with just one SQL query on Postgres?
Thanks for your hints!

This type of query is called an anti-join. The simplest method is:
SELECT p.*
FROM passwords p
LEFT JOIN users u
ON u.password_id = p.id
WHERE u.<primary key field> IS NULL;
Another alternative is the NOT EXISTS method @Politank-Z gave. They should have basically identical query plans.
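To see the anti-join in action, here is a minimal sketch that runs both the LEFT JOIN and the NOT EXISTS form against an in-memory SQLite database from Python. SQLite only stands in for Postgres here, and the id columns and sample rows are assumptions made up for the illustration.

```python
import sqlite3

# SQLite stands in for Postgres; the id columns and the sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passwords (id INTEGER PRIMARY KEY, password_digest VARCHAR);
    CREATE TABLE users (id INTEGER PRIMARY KEY, password_id INTEGER);
    INSERT INTO passwords (id, password_digest) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- user 10 uses password 1, user 11 has no password; passwords 2 and 3 are orphaned
    INSERT INTO users (id, password_id) VALUES (10, 1), (11, NULL);
""")

left_join_sql = """
    SELECT p.id
    FROM passwords p
    LEFT JOIN users u ON u.password_id = p.id
    WHERE u.id IS NULL
    ORDER BY p.id
"""
not_exists_sql = """
    SELECT p.id
    FROM passwords p
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.password_id = p.id)
    ORDER BY p.id
"""

orphans_left_join = [row[0] for row in conn.execute(left_join_sql)]
orphans_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
print(orphans_left_join)   # [2, 3]
print(orphans_not_exists)  # [2, 3]
```

Both forms return the same rows; which one is faster on a real Postgres table is something to check with EXPLAIN ANALYZE rather than assume.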

SELECT *
FROM passwords p
WHERE NOT EXISTS ( SELECT 1 FROM users u WHERE p.id = u.password_id );
...is a straightforward enough solution. You could build it around a LEFT JOIN or an EXCEPT (the Postgres counterpart of Oracle's MINUS) if you prefer. You could also prevent the scenario entirely by adding a foreign key from users to passwords.

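Have you tried:
SELECT *
FROM passwords
WHERE id NOT IN (SELECT password_id FROM users WHERE password_id IS NOT NULL)
The subquery has to exclude NULLs: since password_id is optional, a single NULL in the list would make NOT IN return no rows at all.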
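One thing worth checking with NOT IN is how it behaves when users.password_id can be NULL, which it can here because the relation is optional. The sketch below, again using in-memory SQLite from Python as a stand-in for Postgres with made-up rows, compares the plain NOT IN with a variant whose subquery filters out NULLs.

```python
import sqlite3

# SQLite stands in for Postgres; the id columns and the sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passwords (id INTEGER PRIMARY KEY, password_digest VARCHAR);
    CREATE TABLE users (id INTEGER PRIMARY KEY, password_id INTEGER);
    INSERT INTO passwords (id, password_digest) VALUES (1, 'a'), (2, 'b');
    -- user 11 has no password, so the subquery below yields a NULL
    INSERT INTO users (id, password_id) VALUES (10, 1), (11, NULL);
""")

naive = conn.execute(
    "SELECT id FROM passwords WHERE id NOT IN (SELECT password_id FROM users)"
).fetchall()
guarded = conn.execute(
    "SELECT id FROM passwords WHERE id NOT IN "
    "(SELECT password_id FROM users WHERE password_id IS NOT NULL)"
).fetchall()

print(naive)    # []      -> the NULL turns NOT IN into NULL for password 2
print(guarded)  # [(2,)]  -> the orphaned password
```

The NOT EXISTS and LEFT JOIN forms do not have this problem, which is one reason they are often preferred for anti-joins.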

SELECT p.*
FROM passwords p LEFT JOIN
users u ON p.id = u.password_id
WHERE u.password_id IS NULL


How to select records from a database table which has two user ids (created_by_user, given_to_user) and replace the user ids with usernames?

This is task table:
This is user table:
I want to select a user's tasks.
I would pass the given_to_user id from the backend.
But the thing is, I want the selected data to have usernames instead of the ids in created_by_user and given_to_user.
The selected table would look like this.
How can I achieve what I want?
Or maybe I designed my tables so poorly that it is difficult to select the data I need? :)
The task table has two id values that are foreign keys to the user table.
I tried many things but couldn't get the desired result.
You did not design the tables poorly.
In fact, it is common practice to store ids that reference columns in other tables. You just need to learn to use joins:
SELECT task.id, task.title, task.information, user.username AS created_by, user2.username AS given_to
FROM (task INNER JOIN user ON task.created_by_user = user.id)
INNER JOIN user AS user2 ON task.given_to_user = user2.id;
Do you just want two joins?
select t.*, uc.username as created_by_username,
ug.username as given_to_username
from task t left join
users uc
on t.created_by_user = uc.id left join
users ug
on t.given_to_user = ug.id;
This uses left join in case one of the user ids is missing.
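Here is a small runnable sketch of the two-join idea, using in-memory SQLite from Python. The column list and the sample rows are assumptions for illustration; the second task has no given_to_user, which shows why LEFT JOIN matters.

```python
import sqlite3

# Assumed minimal schema and sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        title TEXT,
        created_by_user INTEGER REFERENCES users(id),
        given_to_user INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO task VALUES (100, 'write report', 1, 2), (101, 'unassigned', 1, NULL);
""")

# The users table is joined twice, once per foreign key, under two aliases.
rows = conn.execute("""
    SELECT t.id, t.title,
           uc.username AS created_by_username,
           ug.username AS given_to_username
    FROM task t
    LEFT JOIN users uc ON t.created_by_user = uc.id
    LEFT JOIN users ug ON t.given_to_user = ug.id
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
# (100, 'write report', 'alice', 'bob')
# (101, 'unassigned', 'alice', None)
```

With INNER JOIN instead, task 101 would drop out of the result because it has no matching given_to_user.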

Best way to store followed users

I know the title isn't very descriptive, but it's really hard to find something generic to describe my situation. If someone wants to edit, feel free...
So, I have a Postgres database with a users table. I would like to store the users followed by a given user, and I really don't see how I could do this. I would like to run something like SELECT followed_users FROM users WHERE username='username' and have it return the usernames, or ids, or whatever, of every followed user. But I don't see any clean way to do this.
Maybe an example would make it clearer: user1 is following user2 and user3.
How to store who user1 is following?
EDIT: I don't know how many users the user will follow.
Thank you for your help.
Expanding on my comment above, since it got wordy:
Create a new table called something like user_follows with columns like
user_id1 | user_id2
or
follower_id | following_id
Then you can query:
SELECT t1.username as follower_username, t3.username as following_username
FROM users t1
INNER JOIN user_follows t2 ON t1.user_id = t2.follower_id
INNER JOIN users t3 ON t2.following_id = t3.user_id
WHERE t1.user_id = <your user>
In the end, think of your tables as "Objects". Then when you are presented with a problem like "How do I add users that are following other users" you can determine if this relationship is a new object, or an attribute of an existing object. Since a user might follow more than one other user, the relationship is not a good attribute for "Users", so it gets its own table user_follows.
Since user_follows is just one type of relationship that two users may have to one another, it might make sense to increase the scope of that object to relationships and store the relationship type as an attribute of the table:
user_id1 | user_id2 | relationship_type
where relationships.relationship_type might have values like follows, student of, sister of etc...
So the new query would be something like:
SELECT t1.username as follower_username, t3.username as following_username
FROM users t1
INNER JOIN relationships t2 ON t1.user_id = t2.user_id1
INNER JOIN users t3 ON t2.user_id2 = t3.user_id
WHERE t1.user_id = <your user> AND t2.relationship_type = 'Follows';
I'd add another table, let's call it following for argument's sake, which saves pairs of users and users they are following:
CREATE TABLE following (
user_id INT NOT NULL REFERENCES users(id),
following_id INT NOT NULL REFERENCES users(id),
PRIMARY KEY (user_id, following_id)
);
Then you could query all the users a specific user is following by joining with the users table (twice). E.g., to get the names of all the users that I (username "mureinik") am following:
SELECT fu.username
FROM following f
JOIN users u ON f.user_id = u.id
JOIN users fu ON f.following_id = fu.id
WHERE u.username = 'mureinik'
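For anyone who wants to try the join-table approach, here is a minimal sketch in Python with in-memory SQLite standing in for Postgres. The sample users and the helper name followed_usernames are made up for the example; note that the second users alias is joined on following_id, not on user_id.

```python
import sqlite3

# Assumed sample data: user1 follows user2 and user3, user2 follows user1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
    CREATE TABLE following (
        user_id INT NOT NULL REFERENCES users(id),
        following_id INT NOT NULL REFERENCES users(id),
        PRIMARY KEY (user_id, following_id)
    );
    INSERT INTO users VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
    INSERT INTO following VALUES (1, 2), (1, 3), (2, 1);
""")

def followed_usernames(conn, username):
    """Return the usernames that the given user follows (hypothetical helper)."""
    rows = conn.execute("""
        SELECT fu.username
        FROM following f
        JOIN users u ON f.user_id = u.id
        JOIN users fu ON f.following_id = fu.id
        WHERE u.username = ?
        ORDER BY fu.username
    """, (username,))
    return [row[0] for row in rows]

print(followed_usernames(conn, "user1"))  # ['user2', 'user3']
print(followed_usernames(conn, "user3"))  # []
```

Because each follow is one row, the number of users someone follows does not need to be known in advance.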

Select a user by their username and then select data from another table using their UID

Sorry if that title is a bit convoluted... I'm spoiled by an ORM usually and my raw SQL skills are really poor, apparently.
I'm writing an application that links to a vBulletin forum. Users authenticate with their forum username, and the query for that is simple (selecting by username from the users table). The next half of it is more complex. There's also a subscriptions table that has a timestamp in it, but the primary key for these is a user id, not a username.
This is what I've worked out so far:
SELECT *
FROM forum.user
JOIN forum.subscriptionlog
WHERE forum.user.username LIKE 'SomeUSER'
Unfortunately this returns the entirety of the subscriptionlog table, which makes sense because there's no username field in it. Is it possible to grab the subscriptionlog row using the userid I get from forum.user.userid, or does this need to be split into two queries?
The issue is that you are blindly joining the two tables. You need to specify what column they are related by.
I think you want something like:
SELECT * FROM user u
INNER JOIN subscriptionlog sl ON u.id = sl.userid
WHERE u.username LIKE 'SomeUSER'
select * from user u JOIN subscriptionlog s ON u.userid = s.userid where u.username = 'someuser'
The ON clause is the bit you want to add; it combines the 2 tables into one that you return results from.
Try this:
SELECT *
FROM forum.user
INNER JOIN forum.subscriptionlog
ON forum.subscriptionlog.userid = forum.user.userid
WHERE forum.user.username LIKE 'SomeUSER'

Performance of search query bottlenecked 98% by mutual friends despite caching

So on my social networking website, similar to Facebook, my search speed is bottlenecked about 98% by this one part. I want to rank the results (we can assume they are users) by the number of mutual friends the searching user has with each of them.
My friends table has 3 columns -
user_id (person who sends the request)
friend_id (person who receives the request)
pending (boolean to indicate if the request was accepted or not)
user_id and friend_id are both foreign keys that reference users.id
Finding friend_ids of a user is simple, it looks like this
def friends
  # Friend is assumed to be the model backing the friends table
  Friend.where(
    '(user_id = :id OR friend_id = :id) AND pending = false',
    id: self.id
  ).pluck(:user_id, :friend_id)
   .flatten.uniq
   .reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking them by mutual friends requires the following steps:
1. Get the user_ids of all the searching user's friends - Set(A). The friends method above does this.
2. Loop over each of the ids in Set(A):
   a. Get the user_ids of all the friends of |id| - Set(B). Again, done by the friends method.
   b. Find the length of the intersection of Set(A) and Set(B).
3. Order the results in descending order of intersection length.
The most expensive operation here is obviously getting the friend_ids of hundreds of users. So I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious if it can be further improved.
I'm wondering if there is a way that I can get friend_ids of all the desired users in a single query, that is efficient. Something like -
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the same operation of ranking (that I mentioned above). Since I won't be hitting the cache for thousands of users and their friend_ids, I think it'll speed up the process significantly
Caching your friends table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check constraint on friends avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
Since I suppose the 'is a friend of' relation is supposed to be logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
(update: modified this query to filter out duplicate results)
select *
from (
  select
    u.*,
    count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
    row_number() over (partition by u.user_id) as copy_num
  from
    users u
    left join (
      select
        f1.friend_id as shared_friend_id,
        f2.friend_id as friend_id
      from friends_symmetric f1
        join friends_symmetric f2
          on f1.friend_id = f2.user_id
      where f1.user_id = ?
        and f2.friend_id != f1.user_id
    ) mutual
      on u.user_id = mutual.friend_id
  where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ? is a placeholder for a parameter containing the ID of User1.
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.
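For experimenting with the ranking idea on a toy data set, here is a sketch in Python with in-memory SQLite. It is a grouped variant (GROUP BY with COUNT) rather than the window-function query, SQLite stands in for the real database, and the sample users and friendships are invented for the example.

```python
import sqlite3

# Toy data (assumed): user 1 is the searcher and is friends with 2 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INT NOT NULL PRIMARY KEY, nick VARCHAR(32));
    CREATE TABLE friends (
        user_id INT NOT NULL,
        friend_id INT NOT NULL,
        pending BOOL,
        PRIMARY KEY (user_id, friend_id),
        CHECK (user_id < friend_id)
    );
    CREATE VIEW friends_symmetric AS
        SELECT user_id, friend_id FROM friends WHERE NOT pending
        UNION ALL
        SELECT friend_id AS user_id, user_id AS friend_id FROM friends WHERE NOT pending;
    INSERT INTO users VALUES
        (1, 'me'), (2, 'Ann'), (3, 'Ben'), (4, 'Satya'), (5, 'Saturn'), (6, 'Satoshi');
    -- the last friendship is still pending, so it must not count
    INSERT INTO friends VALUES
        (1, 2, 0), (1, 3, 0), (2, 4, 0), (3, 4, 0), (2, 5, 0), (3, 6, 1);
""")

ranked = conn.execute("""
    SELECT u.user_id, u.nick, COUNT(m.shared_friend_id) AS num_shared
    FROM users u
    LEFT JOIN (
        -- friends of the searcher's friends, excluding the searcher
        SELECT f1.friend_id AS shared_friend_id, f2.friend_id AS friend_id
        FROM friends_symmetric f1
        JOIN friends_symmetric f2 ON f1.friend_id = f2.user_id
        WHERE f1.user_id = ?
          AND f2.friend_id != f1.user_id
    ) m ON u.user_id = m.friend_id
    WHERE u.nick LIKE 'Sat%'
    GROUP BY u.user_id, u.nick
    ORDER BY num_shared DESC, u.user_id
""", (1,)).fetchall()

print(ranked)  # [(4, 'Satya', 2), (5, 'Saturn', 1), (6, 'Satoshi', 0)]
```

The whole ranking comes back from a single query, so nothing has to be looped over or intersected on the application side. Whether the grouped or the windowed form plans better on a real database is something to measure, not assume.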
Assuming that you want a relation of the User model whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
  def other_users_ordered_by_mutual_friends
    self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
      SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
        SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
        UNION
        SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
      ) all_friends INNER JOIN (
        SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
        UNION
        SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
      ) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
    ) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
  end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select scope as written will also give you access to the field friends_count on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>
John had a great idea with the friends_symmetric view. With two filtered indexes (one on (friend_id, user_id) and the other on (user_id, friend_id)) it's going to work great.
However the query can be a bit simpler
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.user_id
,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN users u
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id

Login DAO sql statement involving multiple joins

I am trying to create a query that will return through a DAO whether the entered username and password are correct. I'm using Java for the DAO implementation, as well as JSF.
I have the following tables:
LOGIN: username (pk)
BUSINESS: username (fk), password
CUSTOMER: username (fk), password
What I'm trying to do is create multiple joins so that when a user goes to log in, their stored username defines what type of account they have. By pulling the username, the username is looked for in both the BUSINESS and CUSTOMER and when found, the password is then compared. I tried the following statement:
WHERE l.USERNAME='user111' AND (b.PASSWORD='aaa' OR c.PASSWORD='aaa');
Yet it returns nothing. Any possible suggestions?
I have replicated the same here and it looks like it is working. Could you check?
If I understood correctly, what you need is to distinguish a user's type, whether he/she is in business table or customer table. Then, check the password correctness.
Then, again if I am not wrong, you should have an entry for every user in the login table, and each of them should appear EITHER in the business OR in the customer table.
Let's assume we have records such as:
I think you may solve the problem with the following query. Let's test with the user named "TEST2":
SELECT b.username AS business_user, c.username AS customer_user
FROM login l
LEFT JOIN business b ON b.username = l.username
LEFT JOIN customer c ON c.username = l.username
WHERE l.username = 'TEST2' AND (b.password = 'PASSWORD1234' OR c.password = 'PASSWORD1234');
This query returns 2 columns, as you will notice: the first is null because the user is not in the business table. The second gives you the username, labeled "customer_user". Therefore, if you check which column is null, you will know which table the user actually belongs to (business or customer).
The trick here is to begin with login table ("FROM login") and use LEFT JOIN, instead of JOIN. Here is a quick tip about joins and their differences, if you need it: http://www.firebirdfaq.org/faq93/
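If it helps, here is a small sketch of the LEFT JOIN lookup that can be run as-is in Python with in-memory SQLite. The sample rows and the account_type helper are assumptions for the example (the thread uses Java and does not name its database), and the plain-text passwords simply mirror the tables described in the question.

```python
import sqlite3

# Assumed sample data: TEST1 is a business account, TEST2 a customer account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login (username TEXT PRIMARY KEY);
    CREATE TABLE business (username TEXT REFERENCES login(username), password TEXT);
    CREATE TABLE customer (username TEXT REFERENCES login(username), password TEXT);
    INSERT INTO login VALUES ('TEST1'), ('TEST2');
    INSERT INTO business VALUES ('TEST1', 'PASSWORD1234');
    INSERT INTO customer VALUES ('TEST2', 'PASSWORD1234');
""")

def account_type(conn, username, password):
    """Return 'business', 'customer', or None if the credentials do not match.

    Hypothetical helper: it inspects which of the two joined columns is non-null.
    """
    row = conn.execute("""
        SELECT b.username AS business_user, c.username AS customer_user
        FROM login l
        LEFT JOIN business b ON b.username = l.username
        LEFT JOIN customer c ON c.username = l.username
        WHERE l.username = ? AND (b.password = ? OR c.password = ?)
    """, (username, password, password)).fetchone()
    if row is None:
        return None
    business_user, _customer_user = row
    return "business" if business_user is not None else "customer"

print(account_type(conn, "TEST2", "PASSWORD1234"))  # customer
print(account_type(conn, "TEST1", "PASSWORD1234"))  # business
print(account_type(conn, "TEST2", "wrong"))         # None
```

With plain JOIN instead of LEFT JOIN, a user would have to exist in both business and customer to produce a row, which would explain a query that returns nothing.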