Convert some SQL query to active record


So, I have this "advanced" query (not much, really) and I would like to translate it into Ruby Active Record's syntax.
SELECT microposts.*
FROM microposts
WHERE user_id IN
( SELECT r.followed_id as uid
FROM relationships r
WHERE follower_id = 1
UNION
SELECT u.id as uid
FROM users as u
WHERE id = 1
)
ORDER BY microposts.created_at DESC
The idea was to retrieve all microposts for user 1 AND user 1 followed users in desc creation order, but I really don't know how to translate this easily using Active Record's syntax.
Any thought ?
PS : As asked here is some rails context :
I have 3 models : Microposts, Users, Relationships.
Relationships is a join table handling all users relationships (follower/followed stuff).
Users have many followed_users/followers through relationships.
Users have many microposts, and each micropost belongs to one user.
Thanks.

No idea about Ruby but the SQL can be simplified to:
SELECT microposts.*
FROM microposts
WHERE user_id IN
( SELECT r.followed_id as uid
FROM relationships r
WHERE follower_id = 1
)
OR user_id = 1
ORDER BY microposts.created_at DESC
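To make the equivalence of the two forms concrete, here is a minimal plain-Ruby sketch over in-memory rows. The hash keys mirror the column names in the question, but the helper names and the data shape are assumptions made for illustration only; this is not Active Record code.

```ruby
require "set"

# Ids of the users that user_id follows, from rows shaped like the
# relationships table (:follower_id, :followed_id).
def followed_ids(relationships, user_id)
  relationships.select { |r| r[:follower_id] == user_id }
               .map { |r| r[:followed_id] }
end

# Original form: user_id IN (followed ids UNION the user's own id).
def feed_with_union(microposts, relationships, user_id)
  ids = Set.new(followed_ids(relationships, user_id)) << user_id
  microposts.select { |m| ids.include?(m[:user_id]) }
            .sort_by { |m| -m[:created_at] }
end

# Simplified form: user_id IN (followed ids) OR user_id = the user's own id.
def feed_with_or(microposts, relationships, user_id)
  ids = followed_ids(relationships, user_id)
  microposts.select { |m| ids.include?(m[:user_id]) || m[:user_id] == user_id }
            .sort_by { |m| -m[:created_at] }
end
```

Both methods return the same posts in the same order for any input, which is why the UNION branch can be replaced by a plain OR condition.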

My answer will assume (since you've provided no ruby/rails-context outside of your raw SQL query) you have a User model, a Micropost model through relation :microposts, and a Relationship model through relation :following. User has many Micropost and Relationship instances related. You could do
user = User.find(1)
user.microposts + user.following.microposts
or you could move this into a method within Micropost
def self.own_and_following(user)
user.microposts + user.following.microposts
end
And call Micropost.own_and_following(User.find(1)).
This may not be exactly what you're looking for, but given the likely relations mentioned above in your Rails application, something similar to this should work.

Your query is very specific, therefore your best bet would be to write a good portion of it using SQL, or try a gem like squeel that can help out generating very customized SQL from ActiveRecord.
Nevertheless, this should do the work with no additional gems :
user_id = ... #Get the user_id you want to test
Micropost.where("user_id IN
( SELECT r.followed_id as uid
FROM relationships r
WHERE follower_id = ? )
OR user_id = ?
", user_id, user_id).order("created_at desc")

I managed to do it using only where, seems a lot like a find_by_sql to me, and I don't know which one would be better :
Micropost.order('created_at DESC').
where('user_id in (select r.followed_id as uid from relationships as r where follower_id = ?) or user_id = ?', user.id, user.id)
I don't know how good this is, but it seems to be working.

Related

Relationships query PostgreSQL, Follow/Unfollow functionality with PostgreSQL

I have two tables, Users and Relationships tables. Users table has following columns:
id, name,password,username,email,avatar,followersCount,followingCount,tweetCount.
And the Relationships table has the following columns:
id, followingId, followerId
How should I go about writing a SQL query that takes a user with a specific id and finds the ids in Relationships that the user is following? In other words, find the people that user follows.
This is as far as I've come:
SELECT *
FROM public."Users" JOIN
public."Relationships"
ON (public."Users".id = public."Relationships".id)

If I understand correctly, you want:
SELECT u.*
FROM public."Relationships" r JOIN
public."Users" u
ON u.id = r.followerId
WHERE r.followingId = ?;
? is a parameter placeholder for the user you care about. This returns all the followers of that user.

Do you mean this query
SELECT public."Users".*
FROM public."Users"
JOIN public."Relationships"
ON public."Users".id = public."Relationships".followingId
AND public."Relationships".followerId = ?
Here ? is a placeholder for the user's ID. I am not really clear about what followerId and followingId mean, but you can swap them in the query if this is not what you want.

Performance of search query bottlenecked 98% by mutual friends despite caching

So on my social networking website, similar to facebook, my search speed is bottlenecked like 98% by this one part. I want to rank the results based on the number of mutual friends the searching user has, with all of the results (we can assume they are users)
My friends table has 3 columns -
user_id (person who sends the request)
friend_id (person who receives the request)
pending (boolean to indicate if the request was accepted or not)
user_id and friend_id are both foreign keys that reference users.id
Finding friend_ids of a user is simple, it looks like this
def friends
Friend.where(
'(user_id = :id OR friend_id = :id) AND pending = false',
id: self.id
).pluck(:user_id, :friend_id)
.flatten
.uniq
.reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking the results by mutual friends requires the following steps:
1. Get user_ids of all the searching user's friends - Set(A). The friends method mentioned above does this.
2. Loop over each of the ids in Set(A):
   a. Get user_ids of all the friends of |id| - Set(B). Again, done by the friends method.
   b. Find the length of the intersection of Set(A) and Set(B).
3. Order in descending order of length of intersections for all results.
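The steps above can be sketched in plain Ruby as follows. This is a minimal illustration, assuming the loop runs over the ids being ranked and that a hash of friend sets is already available; the method name `rank_by_mutual_friends` and the `friends_of` structure are hypothetical and not part of the original code.

```ruby
require "set"

# friends_of: Hash mapping a user id to a Set of that user's friend ids
# (hypothetical structure; in the question this data comes from the
# friends method or from the cache).
def rank_by_mutual_friends(searcher_id, result_ids, friends_of)
  mine = friends_of.fetch(searcher_id, Set.new)
  counts = result_ids.map do |id|
    # Size of the intersection of Set(A) and Set(B) for this result.
    [id, (mine & friends_of.fetch(id, Set.new)).size]
  end
  # Highest mutual-friend count first; ties broken by id for a stable order.
  counts.sort_by { |id, shared| [-shared, id] }
end
```

The method returns `[id, mutual_count]` pairs, so the caller can both order the results and display the count.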
The most expensive operation here is obviously getting the friend_ids of hundreds of users. So I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious if it can be further improved.
I'm wondering if there is a way that I can get friend_ids of all the desired users in a single query, that is efficient. Something like -
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the same operation of ranking (that I mentioned above). Since I won't be hitting the cache for thousands of users and their friend_ids, I think it'll speed up the process significantly
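As a rough sketch of the "hash or some other fast data structure" idea, the following plain Ruby turns the rows of one query into a lookup table. It assumes the rows arrive as `[user_id, friend_id]` pairs for accepted friendships (for example from a single pluck over the friends table); the method name `build_friend_index` is hypothetical.

```ruby
require "set"

# rows: [user_id, friend_id] pairs for accepted (non-pending) friendships,
# as one query over the friends table might return them.
def build_friend_index(rows)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each do |user_id, friend_id|
    # Friendship is treated as symmetric, so record both directions.
    index[user_id] << friend_id
    index[friend_id] << user_id
  end
  index
end
```

With such an index in memory, each intersection needed for ranking becomes a set operation instead of a cache or database round trip.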

Caching your friends table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check constraint on friends avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
Since I suppose the 'is a friend of' relation is supposed to be logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
(update: modified this query to filter out duplicate results)
select *
from (
select
u.*,
count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
row_number() over (partition by u.user_id) as copy_num
from
users u
left join (
select
f1.friend_id as shared_friend_id,
f2.friend_id as friend_id
from friends_symmetric f1
join friends_symmetric f2
on f1.friend_id = f2.user_id
where f1.user_id = ?
and f2.friend_id != f1.user_id
) mutual
on u.user_id = mutual.friend_id
where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ? is a placeholder for a parameter containing the ID of the User1.
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.

Assuming that you want a relation of the User model whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
def other_users_ordered_by_mutual_friends
self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
UNION ALL
SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
) all_friends INNER JOIN (
SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
UNION ALL
SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select scope as written will also give you access to the field friends_count on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>

John had a great idea with the friends_symmetric view. With two filtered indexes (one on (friend_id, user_id) and the other on (user_id, friend_id)) it's going to work great.
However the query can be a bit simpler
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.*
,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN users AS u
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id

Rails ActiveRecord query where relationship does not exist based on third attribute

I have an Adventure model, which is a join table between a Destination and a User (and has additional attributes such as zipcode and time_limit). I want to create a query that will return me all the Destinations where an Adventure between that Destination and the User currently trying to create an Adventure does not exist.
The way the app works when a User clicks to start a new Adventure it will create that Adventure with the user_id being that User's id and then runs a method to provide a random Destination, ex:
Adventure.create(user_id: current_user.id) (it is actually doing current_user.adventures.new ) but same thing
I have tried a few things from writing raw SQL queries to using .joins. Here are a few examples:
Destination.joins(:adventures).where.not('adventures.user_id != ?'), user.id)
Destination.joins('LEFT OUTER JOIN adventure ON destination.id = adventure.destination_id').where('adventure.user_id != ?', user.id)

The below should return all destinations that user has not yet visited in any of his adventures:
destinations = Destination.where('id NOT IN (SELECT destination_id FROM adventures WHERE user_id = ?)', user.id)
To select a random one append one of:
.all.sample
# or
.pluck(:id).sample
Depending on whether you want a full record or just id.
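To illustrate why a NOT IN subquery behaves differently from an inner join filtered with `!=`, here is a small plain-Ruby sketch over in-memory rows. The data shape and both method names are assumptions made for illustration; they only mimic the result sets of the two query styles.

```ruby
# adventures: rows with :user_id and :destination_id (hypothetical shape).

# Mirrors "id NOT IN (SELECT destination_id FROM adventures WHERE user_id = ?)".
def unvisited_destination_ids(destination_ids, adventures, user_id)
  visited = adventures.select { |a| a[:user_id] == user_id }
                      .map { |a| a[:destination_id] }
  destination_ids - visited
end

# Mirrors an inner join filtered with "adventures.user_id != ?":
# destinations that have at least one adventure by someone else.
def joined_with_other_users(destination_ids, adventures, user_id)
  adventures.reject { |a| a[:user_id] == user_id }
            .map { |a| a[:destination_id] }
            .uniq & destination_ids
end
```

With destinations 1, 2, 3 and adventures (user 1, destination 1), (user 2, destination 1), (user 2, destination 2), the first method yields 2 and 3 for user 1, while the join-style method yields 1 and 2: it keeps a destination the user already visited and drops one nobody has visited. Calling `.sample` on the first result picks a random remaining destination.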

No need for joins, this should do:
Destination.where(['id not in (?)', user.adventures.pluck(:destination_id)])

In your first attempt, I see the problem to be in the usage of equality operator with where.not. In your first attempt:
Destination.joins(:adventures).where.not('adventures.user_id != ?'), user.id)
you're doing where.not('adventures.user_id != ?'), user.id). I understand this is just the opposite of what you want, isn't it? Shouldn't you be calling it as where.not('adventures.user_id = ?', user.id), i.e. with an equals =?
I think the following query would work for the requirement:
Destination.joins(:adventures).where.not(adventures: { user_id: user.id })
The only problem I see in your second method is the use of the singular table names destination and adventure in both the join and where conditions. The table names should be plural. The query should have been:
Destination
.joins('LEFT OUTER JOIN adventures on destinations.id = adventures.destination_id')
.where('adventures.user_id != ?', user.id)

ActiveRecord doesn't do join conditions, but you can use your User destinations relation (e.g. a has_many :destinations, through: :adventures) as a subselect, which results in a WHERE NOT IN (SELECT ...).
The query is pretty simple to express and doesn't require using sql string shenanigans, multiple queries or pulling back temporary sets of ids:
Destination.where.not(id: user.destinations)
If you want, you can also chain the above relation with additional where terms, ordering and grouping clauses.

I solved this problem with a mix of this answer and this other answer and came out with:
destination = Destination.where
.not(id: Adventure.where(user: user)
.pluck(:destination_id)
)
.sample
The .not(id: Adventure.where(user: user).pluck(:destination_id)) part excludes destinations present in previous adventures of the user.
The .sample part will pick a random destination from the results.

Advanced SQL in Rails

I have 2 models
class User < AR
has_many :friends
end
class Friend < AR
# has a name column
end
I need to find all Users who are Friends with both 'Joe' and 'Jack'
Any idea how I can do this in Rails?

One option is to put each of the names as arguments for individual INNER JOINS. In SQL it would be something like this:
SELECT users.* FROM users
INNER JOIN friends AS f1
ON users.id = f1.user_id
AND f1.name = 'Joe'
INNER JOIN friends AS f2
ON users.id = f2.user_id
AND f2.name = 'Jack'
Since these are INNER JOINs, the query will only return rows where the users table can be joined with both f1 and f2.
And to use it in Rails, maybe do it something like this:
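class User < AR
has_many :friends
def self.who_knows(*friend_names)
# Build one aliased INNER JOIN per name; sanitize_sql_array quotes each
# name safely, since joins does not take bind values for a SQL string.
join_sql = friend_names.each_with_index.map { |name, i|
sanitize_sql_array(["INNER JOIN friends AS f#{i + 1} ON users.id = f#{i + 1}.user_id AND f#{i + 1}.name = ?", name])
}.join(" ")
joins(join_sql)
end
end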
Which you then can call like this:
@users = User.who_knows("Joe", "Jack")

Possible way: User.all(:joins => :friends, :conditions => ["friends.name IN (?,?)", "Joe", "Jack"], :group => "users.id") and then iterate over the array to find users with 2 friends.
This is the best solution I got when I tried to solve a similar problem myself. If you find a way to do it in pure SQL or ActiveRecord, please let me know!
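The "iterate over the array" step could look roughly like the plain-Ruby sketch below. It assumes the query results have been reduced to `[user_id, friend_name]` pairs; that shape and the method name `users_knowing_all` are hypothetical.

```ruby
# rows: [user_id, friend_name] pairs, one per matching friend record.
def users_knowing_all(rows, required_names)
  names_by_user = Hash.new { |hash, key| hash[key] = [] }
  rows.each { |user_id, name| names_by_user[user_id] << name }
  # Keep only users whose friend names cover every required name.
  names_by_user.select { |_id, names| (required_names - names).empty? }.keys
end
```

For example, `users_knowing_all(rows, ["Joe", "Jack"])` keeps only the ids of users that appear with both names.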

Although using hard-coded SQL as suggested by DanneManne will most often work, and is probably the way you'd want to go, it is not necessarily composable. As soon as you have hard-coded a table name, you can run into problems combining that into other queries where ActiveRecord may decide to alias the table.
So, at the cost of some extra complexity, we can solve this using some ARel as follows:
f = Friend.arel_table
User.
where(:id=>f.project(:user_id).where(f[:name].eq('Joe'))).
where(:id=>f.project(:user_id).where(f[:name].eq('Jack')))
This will use a pair of subqueries to do the job.
I'm fairly certain there's an ARel solution using joins as well. I can figure out how to compose that query in ARel, just not how to then use that query as the basis for an ActiveRecord query to get back User model instances.

Finding unique records, ordered by field in association, with PostgreSQL and Rails 3?

UPDATE: So thanks to @Erwin Brandstetter, I now have this:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
users_columns = User.column_names.map { |col| users[col.to_sym] }
cards_condition = cards[:company_id].eq(company.id).
and(cards[:user_id].eq(users[:id]))
User.joins(:cards).where(cards_condition).group(users_columns).
order('min(cards.created_at)')
end
... which seems to do exactly what I want. There are two shortcomings that I would still like to have addressed, however:
The order() clause is using straight SQL instead of Arel (couldn't figure it out).
Calling .count on the query above gives me this error:
NoMethodError: undefined method 'to_sym' for
#<Arel::Attributes::Attribute:0x007f870dc42c50> from
/Users/neezer/.rvm/gems/ruby-1.9.3-p0/gems/activerecord-3.1.1/lib/active_record/relation/calculations.rb:227:in
'execute_grouped_calculation'
... which I believe is probably related to how I'm mapping out the users_columns, so I don't have to manually type in all of them in the group clause.
How can I fix those two issues?
ORIGINAL QUESTION:
Here's what I have so far that solves the first part of my question:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
cards_condition = cards[:company_id].eq(company.id)
.and(cards[:user_id].eq(users[:id]))
User.where(Card.where(cards_condition).exists)
end
This gives me 84 unique records, which is correct.
The problem is that I need those User records ordered by cards[:created_at] (whichever is earliest for that particular user). Appending .order(cards[:created_at]) to the scope at the end of the method above does absolutely nothing.
I tried adding in a .joins(:cards), but that returns 587 records, which is incorrect (duplicate Users). group_by as I understand it is practically useless here as well, because of how PostgreSQL handles it.
I need my result to be an ActiveRecord::Relation (so it's chainable) that returns a list of unique users who have cards that belong to a given company, ordered by the creation date of their first card... with a query that's written in Ruby and is database-agnostic. How can I do this?
class Company
has_many :cards
end
class Card
belongs_to :user
belongs_to :company
end
class User
has_many :cards
end
Please let me know if you need any other information, or if I wasn't clear in my question.

The query you are looking for should look like this one:
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
ORDER BY min(created_at)
You can join in the table user if you need columns of that table in the result, else you don't even need it for the query.
If you don't need min_created_at in the SELECT list, you can just leave it away.
Should be easy to translate to Ruby (which I am no good at).
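For readers who think in Ruby, the semantics of the grouped query above can be sketched over in-memory rows. This is only an illustration of what the SQL computes, not an Active Record translation; the hash keys mirror the cards columns and the method name `user_ids_by_first_card` is hypothetical.

```ruby
# cards: rows with :user_id, :company_id and :created_at (hypothetical shape).
def user_ids_by_first_card(cards, company_id)
  cards.select { |c| c[:company_id] == company_id }      # WHERE company_id = ?
       .group_by { |c| c[:user_id] }                     # GROUP BY user_id
       .map { |user_id, user_cards| [user_id, user_cards.map { |c| c[:created_at] }.min] }
       .sort_by { |_user_id, first_created_at| first_created_at } # ORDER BY min(created_at)
       .map(&:first)
end
```

Each user appears once, positioned by the creation time of their earliest card for the given company.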
To get the whole user record (as I derive from your comment):
SELECT u.*
FROM users u
JOIN (
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
) c ON u.id = c.user_id
ORDER BY min_created_at
Or:
SELECT u.*
FROM users u
JOIN cards c ON u.id = c.user_id
WHERE c.company_id = 1
GROUP BY u.id, u.col1, u.col2, .. -- You have to spell out all columns!
ORDER BY min(c.created_at)
With PostgreSQL 9.1+ you can simply write:
GROUP BY u.id
(like in MySQL) .. provided id is the primary key.
I quote the release notes:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.

The fact that you need it to be chainable complicates things, otherwise you can either drop down into SQL yourself or only select the column(s) you need via select("users.id") to get around the Postgres issue. Because at the heart of it your query is something like
SELECT users.id
FROM users
INNER JOIN cards ON users.id = cards.user_id
WHERE cards.company_id = 1
GROUP BY users.id, DATE(cards.created_at)
ORDER BY DATE(cards.created_at) DESC
Which in ActiveRecord syntax is more or less:
User.select("users.id").joins(:cards).where(:"cards.company_id" => company.id).group("users.id, DATE(cards.created_at)").order("DATE(cards.created_at) DESC")
