Rails subquery without SQL?

I have a User model that has many Posts.
I want to get, in a single query, a list of user IDs, ordered by name, including the ID of each user's last post.
Is there a way to do this using the ActiveRecord API instead of a SQL query like the following?
SELECT users.id,
(SELECT id FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT 1) AS last_post_id
FROM users
ORDER BY id ASC;

You should be able to do this with the query generator:
User.joins(:posts).group('users.id').order('users.id').pluck(:id, 'MAX(posts.id)')
There are a lot of options on the relationship you can use to get data out of it. pluck is handy for getting values independent of models.
Update: To get models instead:
User.joins(:posts).group('users.id').order('users.id').select('users.*', 'MAX(posts.id) AS max_post_id')
That will create a field called max_post_id which works as any other attribute.
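One caveat worth checking against your data: joins(:posts) produces an INNER JOIN, so users without any posts drop out of the result, whereas the correlated subquery in the question returns them with a NULL last_post_id. Here is a plain-Ruby sketch of that difference. The USERS and POSTS rows and both method names are made up for illustration; they only stand in for the two tables.

```ruby
# Hypothetical rows standing in for the users and posts tables.
USERS = [{ id: 1 }, { id: 2 }, { id: 3 }]
POSTS = [
  { id: 10, user_id: 1 },
  { id: 11, user_id: 2 },
  { id: 12, user_id: 1 }
]

# Mimics joins(:posts).group('users.id') with MAX(posts.id):
# only users that have at least one post appear.
def last_post_ids_inner(users, posts)
  posts.group_by { |p| p[:user_id] }
       .select { |user_id, _rows| users.any? { |u| u[:id] == user_id } }
       .map { |user_id, rows| [user_id, rows.map { |r| r[:id] }.max] }
       .sort
end

# Mimics the correlated subquery: every user appears, with nil
# in place of NULL when the user has no posts.
def last_post_ids_subquery(users, posts)
  users.sort_by { |u| u[:id] }.map do |u|
    ids = posts.select { |p| p[:user_id] == u[:id] }.map { |p| p[:id] }
    [u[:id], ids.max]
  end
end
```

If users without posts should be kept, left_outer_joins(:posts) (Rails 5+) in place of joins(:posts) is one option to try.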

Related

How to get post with maximum likes or post with likes counts in rails

I have two models, Post and Like, with a relationship between them: Post has_many likes. I want an optimal way to find which post has the most likes. One way of doing this is:
count = {}
Post.includes(:likes).each do |post|
count[post.id] = post.likes.count
end
Initially I used an array, which is not a good data structure for this, so I switched to a hash, but I am still not satisfied with this approach. What would be the best way to get posts with their like counts?
Also, I have tried the following query, but it is not working as expected. Could anyone suggest a better, more optimal approach?
Post.joins("LEFT OUTER JOIN Likes ON likes.post_id =posts.id").group("posts.id").order("COUNT(likes.id) DESC")
Use counter_cache so that you always have a count of likes on the Post objects, then you can call Post.order('likes_count DESC').first to retrieve the one post that has the most likes (Post.maximum(:likes_count) returns only the highest count, not the post). Likewise, any Post query will include a post's like count.
You don't need joining. Group likes by post_id and count them. The resulting post_id with max count will be id of your most liked post. Then you can join or just select the post you're looking for. In pure SQL it would look like:
SELECT l.post_id, count(*) as cnt
FROM likes l
GROUP BY l.post_id
ORDER BY cnt DESC
LIMIT 1;
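To make the grouping step concrete, here is a plain-Ruby sketch of what that query computes. The LIKES rows and the most_liked name are hypothetical stand-ins for the likes table, not part of anyone's actual app.

```ruby
# Hypothetical rows standing in for the likes table.
LIKES = [
  { id: 1, post_id: 7 },
  { id: 2, post_id: 9 },
  { id: 3, post_id: 7 },
  { id: 4, post_id: 7 },
  { id: 5, post_id: 9 }
]

# GROUP BY post_id, COUNT(*), ORDER BY cnt DESC, LIMIT 1.
# Returns nil when there are no likes at all.
def most_liked(likes)
  likes.group_by { |l| l[:post_id] }
       .map { |post_id, rows| { post_id: post_id, cnt: rows.size } }
       .max_by { |row| row[:cnt] }
end
```

Note that a post with zero likes never shows up in this grouping, since it has no rows in likes. In ActiveRecord, Like.group(:post_id).count returns a comparable hash of post_id => count.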

Order with a has_many relationship

I have Articles that have_many Metrics. I wish to order the Articles by a specific Metric.value when Metric.name = "score". (Metric records various article stats as 'name' and 'value' pairs. An Article can have multiple metrics, and even multiple 'scores', although I'm only interested in ordering by the most recent.)
class Article
has_many :metrics
end
class Metric
# name :string(255)
# value :decimal(, )
belongs_to :article
end
I'm struggling to write a scope to do this - any ideas? Something like this?
scope :highest_score, joins(:metrics).order('metrics.value DESC')
.where('metrics.name = "score"')
UPDATE:
An article may have many "scores" stored in the metrics table (as they are calculated weekly/monthly/yearly etc.) but I'm only interested in using the first-found (most recent) "score" for any one article. The Metric model has a default_scope that ensures DESCending ordering.
Fixed typo on quote location for 'metrics.value DESC'.
Talking to my phone-a-friend uber rails hacker, it looks likely I need a raw SQL query for this. Now I'm in way over my head... (I'm using Postgres if that helps.)
Thanks!
UPDATE 2:
Thanks to Erwin's great SQL query suggestion I have a raw SQL query that works:
SELECT a.*
FROM articles a
LEFT JOIN (
SELECT DISTINCT ON (article_id)
article_id, value
FROM metrics m
WHERE name = 'score'
ORDER BY article_id, date_created DESC
) m ON m.article_id = a.id
ORDER BY m.value DESC;
article_list_by_desc_score = ActiveRecord::Base.connection.execute(sql)
Which gives an array of hashes representing article data (but not article objects??).
Follow-up question:
Any way of translating this back into an activerecord query for Rails? (so I can then use it in a scope)
SOLUTION UPDATE:
In case anyone is looking for the final ActiveRecord query - many thanks to Mattherick who helped me in this question. The final working query is:
scope :highest_score, joins(:metrics).where("metrics.name"
=> "score").order("metrics.value desc").group("metrics.article_id",
"articles.id", "metrics.value", "metrics.date_created")
.order("metrics.date_created desc")
Thanks everyone!
The query could work like this:
SELECT a.*
FROM article a
LEFT JOIN (
SELECT DISTINCT ON (article_id)
article_id, value
FROM metrics m
WHERE name = 'score'
ORDER BY article_id, date_created DESC
) m ON m.article_id = a.article_id
ORDER BY m.value DESC NULLS LAST;
First, retrieve the "most recent" value for name = 'score' per article in the subquery m. More explanation for the used technique in this related answer:
Select first row in each GROUP BY group?
You seem to fall victim to a very basic misconception though:
but I'm only interested in using the first-found (most recent) "score"
for any one article. The Metric model has a default_scope that ensures DESCending ordering.
There is no "natural order" in a table. In a SELECT, you need to ORDER BY well defined criteria. For the purpose of this query I am assuming a column metrics.date_created. If you have nothing of the sort, you have no way to define "most recent" and are forced to fall back to an arbitrary pick from multiple qualifying rows:
ORDER BY article_id
This is not reliable. Postgres will pick a row as it chooses. The pick may change with any update to the table or any change in the query plan.
Next, LEFT JOIN to the table article and ORDER BY value. Because Postgres sorts NULL first in descending order by default, ORDER BY value DESC needs NULLS LAST for articles without a qualifying value to go last.
Note: some not-so-smart ORMs (and I am afraid Ruby's ActiveRecord is one of them) use the non-descriptive and non-distinctive id as name for the primary key. You'll have to adapt to your actual column names, which you didn't provide.
Performance
Should be decent. This is a "simple" query as far as Postgres is concerned. A partial multicolumn index on table metrics would make it faster:
CREATE INDEX metrics_some_name_idx ON metrics(article_id, date_created)
WHERE name = 'score';
Columns in this order. In PostgreSQL 9.2+ you could add the column value to make index-only scans possible:
CREATE INDEX metrics_some_name_idx ON metrics(article_id, date_created, value)
WHERE name = 'score';
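For anyone who finds DISTINCT ON hard to picture, here is a plain-Ruby sketch of the two steps: pick the most recent 'score' row per article, then sort articles by that value with score-less articles last. The ARTICLES and METRICS rows, their dates, and the method names are all made up for illustration, and date_created is the assumed column mentioned above.

```ruby
require 'date'

# Hypothetical rows standing in for the article and metrics tables.
ARTICLES = [{ id: 1 }, { id: 2 }, { id: 3 }]
METRICS = [
  { article_id: 1, name: 'score', value: 5.0,  date_created: Date.new(2013, 1, 1) },
  { article_id: 1, name: 'score', value: 2.0,  date_created: Date.new(2013, 2, 1) },
  { article_id: 2, name: 'score', value: 9.0,  date_created: Date.new(2013, 1, 15) },
  { article_id: 2, name: 'views', value: 99.0, date_created: Date.new(2013, 3, 1) }
]

# Step 1, like DISTINCT ON (article_id) ... ORDER BY article_id, date_created DESC:
# keep only the newest 'score' value per article.
def latest_scores(metrics)
  metrics.select { |m| m[:name] == 'score' }
         .group_by { |m| m[:article_id] }
         .transform_values { |rows| rows.max_by { |r| r[:date_created] }[:value] }
end

# Step 2, like the LEFT JOIN plus a descending sort with missing values last.
def articles_by_latest_score(articles, metrics)
  scores = latest_scores(metrics)
  with_score, without_score = articles.partition { |a| scores.key?(a[:id]) }
  with_score.sort_by { |a| -scores[a[:id]] } + without_score
end
```

Article 1 is ranked by 2.0 (its newest score), not by 5.0, which is the whole point of fixing the ordering inside the subquery.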

Eliminate subquery in the FROM clause

The tagging table has 3 columns: id (the primary key), tag, and resource.
I want to select the tags that are associated with at least 3 resources. A resource can be associated several times with the same tag, so a single GROUP BY is not enough.
My current SQL query is the following:
SELECT tag FROM
(SELECT resource, tag FROM tagging GROUP BY resource, tag) AS tagging
GROUP BY tag HAVING count(*) > 2;
I need to convert this query to HQL, and HQL does not accept subqueries inside the FROM clause.
Is there a (fast) way to do the same thing without using a subquery, or with a subquery in the WHERE clause?
Thank you
To find tags that are associated with more than 2 different resources you can use
SELECT tag
FROM tagging
GROUP BY tag
HAVING count(DISTINCT resource) > 2;
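A plain-Ruby sketch of why counting distinct resources matters here. The TAGGING rows and the method name are hypothetical; the 'sql' tag has three rows but only two different resources, so a plain count(*) would wrongly let it through.

```ruby
# Hypothetical rows standing in for the tagging table.
TAGGING = [
  { id: 1, tag: 'ruby', resource: 'a' },
  { id: 2, tag: 'ruby', resource: 'a' }, # same pair stored twice
  { id: 3, tag: 'ruby', resource: 'b' },
  { id: 4, tag: 'ruby', resource: 'c' },
  { id: 5, tag: 'sql',  resource: 'a' },
  { id: 6, tag: 'sql',  resource: 'a' },
  { id: 7, tag: 'sql',  resource: 'b' }
]

# GROUP BY tag HAVING count(DISTINCT resource) >= min
def tags_with_min_resources(rows, min)
  rows.group_by { |r| r[:tag] }
      .select { |_tag, group| group.map { |r| r[:resource] }.uniq.size >= min }
      .keys
end
```

With min set to 3 this matches the HAVING ... > 2 condition of the query above.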

Finding unique records, ordered by field in association, with PostgreSQL and Rails 3?

UPDATE: So thanks to @Erwin Brandstetter, I now have this:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
users_columns = User.column_names.map { |col| users[col.to_sym] }
cards_condition = cards[:company_id].eq(company.id).
and(cards[:user_id].eq(users[:id]))
User.joins(:cards).where(cards_condition).group(users_columns).
order('min(cards.created_at)')
end
... which seems to do exactly what I want. There are two shortcomings that I would still like to have addressed, however:
The order() clause is using straight SQL instead of Arel (couldn't figure it out).
Calling .count on the query above gives me this error:
NoMethodError: undefined method 'to_sym' for
#<Arel::Attributes::Attribute:0x007f870dc42c50> from
/Users/neezer/.rvm/gems/ruby-1.9.3-p0/gems/activerecord-3.1.1/lib/active_record/relation/calculations.rb:227:in
'execute_grouped_calculation'
... which I believe is probably related to how I'm mapping out the users_columns, so I don't have to manually type in all of them in the group clause.
How can I fix those two issues?
ORIGINAL QUESTION:
Here's what I have so far that solves the first part of my question:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
cards_condition = cards[:company_id].eq(company.id)
.and(cards[:user_id].eq(users[:id]))
User.where(Card.where(cards_condition).exists)
end
This gives me 84 unique records, which is correct.
The problem is that I need those User records ordered by cards[:created_at] (whichever is earliest for that particular user). Appending .order(cards[:created_at]) to the scope at the end of the method above does absolutely nothing.
I tried adding in a .joins(:cards), but that returns 587 records, which is incorrect (duplicate Users). group_by as I understand it is practically useless here as well, because of how PostgreSQL handles it.
I need my result to be an ActiveRecord::Relation (so it's chainable) that returns a list of unique users who have cards that belong to a given company, ordered by the creation date of their first card... with a query that's written in Ruby and is database-agnostic. How can I do this?
class Company
has_many :cards
end
class Card
belongs_to :user
belongs_to :company
end
class User
has_many :cards
end
Please let me know if you need any other information, or if I wasn't clear in my question.
The query you are looking for should look like this one:
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
ORDER BY min(created_at)
You can join in the users table if you need columns of that table in the result, else you don't even need it for the query.
If you don't need min_created_at in the SELECT list, you can just leave it away.
Should be easy to translate to Ruby (which I am no good at).
To get the whole user record (as I derive from your comment):
SELECT u.*
FROM users u
JOIN (
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
) c ON u.id = c.user_id
ORDER BY min_created_at
Or:
SELECT u.*
FROM users u
JOIN cards c ON u.id = c.user_id
WHERE c.company_id = 1
GROUP BY u.id, u.col1, u.col2, .. -- You have to spell out all columns!
ORDER BY min(c.created_at)
With PostgreSQL 9.1+ you can simply write:
GROUP BY u.id
(like in MySQL) .. provided id is the primary key.
I quote the release notes:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.
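Since the translation to Ruby was left open, here is a plain-Ruby sketch of what the first query computes, just to pin down the intended result. The CARDS rows, their timestamps, and the method name are hypothetical; this works on in-memory hashes, not on ActiveRecord.

```ruby
# Hypothetical rows standing in for the cards table.
CARDS = [
  { user_id: 1, company_id: 1, created_at: Time.utc(2012, 1, 5) },
  { user_id: 2, company_id: 1, created_at: Time.utc(2012, 1, 2) },
  { user_id: 1, company_id: 1, created_at: Time.utc(2012, 1, 1) },
  { user_id: 3, company_id: 2, created_at: Time.utc(2011, 12, 1) }
]

# WHERE company_id = ? GROUP BY user_id ORDER BY min(created_at)
def user_ids_by_first_card(cards, company_id)
  cards.select { |c| c[:company_id] == company_id }
       .group_by { |c| c[:user_id] }
       .map { |user_id, rows| [user_id, rows.map { |r| r[:created_at] }.min] }
       .sort_by { |_user_id, first_at| first_at }
       .map(&:first)
end
```

Each user appears once, and user 1 sorts ahead of user 2 because of the earliest card, not the latest one.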
The fact that you need it to be chainable complicates things, otherwise you can either drop down into SQL yourself or only select the column(s) you need via select("users.id") to get around the Postgres issue. Because at the heart of it your query is something like
SELECT users.id
FROM users
INNER JOIN cards ON users.id = cards.user_id
WHERE cards.company_id = 1
GROUP BY users.id, DATE(cards.created_at)
ORDER BY DATE(cards.created_at) DESC
Which in ActiveRecord is more or less:
User.select("users.id").joins(:cards).where("cards.company_id" => company.id).group("users.id, DATE(cards.created_at)").order("DATE(cards.created_at) DESC")

"subquery returns more than 1 row" error.

I am new to web programming and I'm trying to make a twitter-clone. At this point, I have 3 tables:
users (id, name)
id is the auto-generated id
name of the user
tweets (id, content, user_id)
id is the auto-generated id
content is the text of the tweet
user_id is the id of the user that made the post
followers (id, user_id, following_id)
id is the auto-generated id
user_id is the user who is doing the following
following_id is the user that is being followed
So, being new to SQL as well, I am trying to build an SQL statement that returns the tweets of the currently logged-in user and of everyone he follows.
I tried to use this statement, which works sometimes, but other times, I get an error that says "Subquery returns more than 1 row". Here is the statement:
SELECT * FROM tweets
WHERE user_id IN
((SELECT following_id FROM followers
WHERE user_id = 1),1) ORDER BY date DESC
I put 1 as an example here, which would be the id of the currently logged-in user.
I haven't had any luck with this statement; any help would be greatly appreciated! Thank you.
In one comment you ask whether it is generally better to use a subquery or a union. Unfortunately, there is no simple answer, just some information.
Some varieties of SQL have problems optimising the IN clause if the list is large, and may perform better in any of the following ways...
SELECT * FROM tweets
INNER JOIN followers ON tweets.user_id = followers.following_id
WHERE followers.user_id = 1
UNION ALL
SELECT * FROM tweets
WHERE user_id = 1
Or...
SELECT
*
FROM
tweets
INNER JOIN
(SELECT following_id FROM followers WHERE user_id = 1 UNION SELECT 1) AS followers
ON tweets.user_id = followers.following_id
Or...
SELECT
*
FROM
tweets
WHERE
EXISTS (SELECT * FROM followers WHERE following_id = tweets.user_id and user_id = 1)
OR user_id = 1
There are many, many alternatives...
Some varieties of SQL struggle to optimise the OR condition, and end up checking every record in the tweets table instead of utilising an INDEX. This would make the UNION option preferable, because each half of the query will then benefit from an index on the user_id field.
But you CAN actually refactor this corner case out of your code altogether: make every user a follower of themselves. This would then mean that getting tweets for followers would naturally include the user themselves. Whether this would make sense in all cases is dependent on your design and other functional requirements.
In short, your best bet is to create some representative data and test the options. But I wouldn't really worry about it for now. As long as you encapsulate this code in one place, you can just pick one that you are most comfortable with. Then, when you have the rest of the system hashed out, and you're much more confident that things won't change, you can go back and optimise if necessary.
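To illustrate the self-follow idea, here is a plain-Ruby sketch comparing the "followed OR me" condition with a version where the user simply follows themselves. The FOLLOWERS and TWEETS rows and both method names are hypothetical stand-ins for the tables in the question.

```ruby
# Hypothetical rows standing in for the followers and tweets tables.
FOLLOWERS = [
  { user_id: 1, following_id: 2 },
  { user_id: 1, following_id: 3 },
  { user_id: 2, following_id: 1 }
]
TWEETS = [
  { id: 1, user_id: 1, content: 'mine' },
  { id: 2, user_id: 2, content: 'followed' },
  { id: 3, user_id: 4, content: 'stranger' },
  { id: 4, user_id: 3, content: 'also followed' }
]

# WHERE user_id IN (SELECT following_id ...) OR user_id = me
def feed(tweets, followers, me)
  followed = followers.select { |f| f[:user_id] == me }.map { |f| f[:following_id] }
  tweets.select { |t| followed.include?(t[:user_id]) || t[:user_id] == me }
end

# Self-follow variant: once a (me, me) row exists in followers,
# the extra OR branch is no longer needed.
def feed_with_self_follow(tweets, followers, me)
  followed = followers.select { |f| f[:user_id] == me }.map { |f| f[:following_id] }
  tweets.select { |t| followed.include?(t[:user_id]) }
end
```

Both versions return the same tweets as long as the self-follow row is present, which is the trade-off to weigh against having that row show up in follower counts.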
SELECT *
FROM tweets
WHERE
user_id IN (SELECT following_id FROM followers WHERE user_id = 1)
OR user_id = 1
ORDER BY date DESC
try this
SELECT * FROM tweets WHERE user_id = [YourUser]
UNION
SELECT * FROM tweets WHERE user_id in (SELECT following_id FROM followers WHERE user_id = [YourUser])
shall work even if you've got no followers for your user
There's also a solution with joins, but actually I'm in a hurry. Will try to write the query as soon as I have the time to. Some other will probably answer by then. Sorry.