I have Articles that have_many Metrics. I wish to order the Articles by a specific Metric.value when Metric.name = "score". (Metric records various article stats as 'name' and 'value' pairs. An Article can have multiple metrics, and even multiple 'scores', although I'm only interested in ordering by the most recent.)
class Article
has_many :metrics
class Metric
# name :string(255)
# value :decimal(, )
belongs_to :article
I'm struggling to write a scope to do this - any ideas? Something like this?
scope :highest_score, joins(:metrics).order('metrics.value DESC')
.where('metrics.name = "score"')
UPDATE:
An article may have many "scores" stored in the metrics table (as they are calculated weekly/monthly/yearly etc.) but I'm only interested in using the first-found (most recent) "score" for any one article. The Metric model has a default_scope that ensures DESCending ordering.
Fixed typo on quote location for 'metrics.value DESC'.
Talking to my phone-a-friend uber rails hacker, it looks likely I need a raw SQL query for this. Now I'm in way over my head... (I'm using Postgres if that helps.)
Thanks!
UPDATE 2:
Thanks to Erwin's great SQL query suggestion I have a raw SQL query that works:
SELECT a.*
FROM articles a
LEFT JOIN (
SELECT DISTINCT ON (article_id)
article_id, value
FROM metrics m
WHERE name = 'score'
ORDER BY article_id, date_created DESC
) m ON m.article_id = a.id
ORDER BY m.value DESC;
article_list_by_desc_score = ActiveRecord::Base.connection.execute(sql)
Which gives an array of hashes representing article data (but not article objects??).
Follow-up question:
Any way of translating this back into an activerecord query for Rails? (so I can then use it in a scope)
SOLUTION UPDATE:
In case anyone is looking for the final ActiveRecord query - many thanks to Mattherick who helped me in this question. The final working query is:
scope :highest_score, joins(:metrics)
.where("metrics.name" => "score")
.order("metrics.value desc")
.group("metrics.article_id", "articles.id", "metrics.value", "metrics.date_created")
.order("metrics.date_created desc")
Thanks everyone!
The query could work like this:
SELECT a.*
FROM article a
LEFT JOIN (
SELECT DISTINCT ON (article_id)
article_id, value
FROM metrics m
WHERE name = 'score'
ORDER BY article_id, date_created DESC
) m ON m.article_id = a.article_id
ORDER BY m.value DESC NULLS LAST;
First, retrieve the "most recent" value for name = 'score' per article in the subquery m. More explanation for the used technique in this related answer:
Select first row in each GROUP BY group?
You seem to fall victim to a very basic misconception though:
but I'm only interested in using the first-found (most recent) "score"
for any one article. The Metric model has a default_scope that ensures DESCending ordering.
There is no "natural order" in a table. In a SELECT, you need to ORDER BY well defined criteria. For the purpose of this query I am assuming a column metrics.date_created. If you have nothing of the sort, you have no way to define "most recent" and are forced to fall back to an arbitrary pick from multiple qualifying rows:
ORDER BY article_id
This is not reliable. Postgres will pick a row as it chooses, and the pick may change with any update to the table or any change in the query plan.
Next, LEFT JOIN to the table article and ORDER BY value. Note that in Postgres NULL sorts first in descending order by default; ORDER BY m.value DESC NULLS LAST puts articles without a qualifying value last.
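To make the logic of the query concrete, here is a minimal plain-Ruby sketch that mimics it on in-memory data. The sample rows and the method names latest_scores and articles_by_latest_score are hypothetical, and it assumes a date_created column as above. It only illustrates the steps; the real work should stay in SQL.

```ruby
require "date"

# Hypothetical sample data standing in for the articles and metrics tables.
ARTICLES = [
  { id: 1, title: "A" },
  { id: 2, title: "B" },
  { id: 3, title: "C" } # has no score at all
].freeze

METRICS = [
  { article_id: 1, name: "score", value: 10,  date_created: Date.new(2013, 1, 1) },
  { article_id: 1, name: "score", value: 40,  date_created: Date.new(2013, 2, 1) },
  { article_id: 2, name: "score", value: 70,  date_created: Date.new(2013, 1, 15) },
  { article_id: 2, name: "views", value: 999, date_created: Date.new(2013, 3, 1) }
].freeze

# Mimics the subquery: keep only "score" rows, then take the value of the
# most recent row per article_id (the job DISTINCT ON does in the SQL).
def latest_scores(metrics)
  metrics
    .select { |m| m[:name] == "score" }
    .group_by { |m| m[:article_id] }
    .transform_values { |rows| rows.max_by { |m| m[:date_created] }[:value] }
end

# Mimics the LEFT JOIN plus ORDER BY: every article is kept, sorted by its
# latest score in descending order, with unscored articles placed last.
def articles_by_latest_score(articles, metrics)
  scores = latest_scores(metrics)
  scored, unscored = articles.partition { |a| scores.key?(a[:id]) }
  scored.sort_by { |a| -scores[a[:id]] } + unscored
end

ordered_ids = articles_by_latest_score(ARTICLES, METRICS).map { |a| a[:id] }
```

Article 1 is ranked by its newer score (40), not the older one (10), and article 3 stays in the list even though it has no score.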
Note: some not-so-smart ORMs (and I am afraid Ruby's ActiveRecord is one of them) use the non-descriptive and non-distinctive id as name for the primary key. You'll have to adapt to your actual column names, which you didn't provide.
Performance
Should be decent. This is a "simple" query as far as Postgres is concerned. A partial multicolumn index on table metrics would make it faster:
CREATE INDEX metrics_some_name_idx ON metrics(article_id, date_created)
WHERE name = 'score';
Columns in this order. In PostgreSQL 9.2+ you could add the column value to make index-only scans possible:
CREATE INDEX metrics_some_name_idx ON metrics(article_id, date_created, value)
WHERE name = 'score';
Related
I'm using Rails 4.2 and PostgreSQL 9.4.
I have a basic users, reservations and events schema.
I'd like to return a list of users and the most recent event they attended, along with what date/time this was at.
I've created a query that returns the user and the time of the most recent event. However I need to return the events.id as well.
My application does not allow a user to reserve two events with the same start time, however I appreciate SQL does not know anything about this and thinks there can be multiple events in the result. Hence I am happy for the query to return an appropriate event ID at random in the case of a hypothetical 'tie' for events.starts_at.
User.all.joins(reservations: :event)
.select('users.*, max(events.starts_at)')
.where('reservations.state = ?', "attended")
.where('events.company_id = ?', 1)
.group('users.id')
The corresponding SQL query is:
SELECT users.*, max(events.starts_at)
FROM "users"
INNER JOIN "reservations" ON "reservations"."user_id" = "users"."id"
INNER JOIN "events" ON "events"."id" = "reservations"."event_id"
WHERE (reservations.state = 'attended') AND (events.company_id = 1)
GROUP BY users.id
The reservations table is very large so loading the entire set into Rails and processing it via Ruby code is undesirable. I'd like to perform the entire query in SQL if it is possible to do so.
My basic model:
User
has_many :reservations
Reservation
belongs_to :user
belongs_to :event
Event
belongs_to :company
has_many :reservations
The generic sql that returns data for the most recent event looks like this:
select yourfields
from yourtables
join
(select someField
, max(datetimefield) maxDateTime
from table1
where whatever
group by someField ) temp on table1.someField = temp.somefield
and table1.dateTimeField = maxDateTime
where whatever
The two "where whatever" things should be the same. All you have to do is adapt this construct into your app. You might consider putting the query into a stored procedure which you then call from your app.
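As a rough illustration of that construct, here is a plain-Ruby sketch on in-memory rows. The data, the integer starts_at values and the method names are all hypothetical. It shows one property worth knowing: joining back on the maximum returns every row that ties for it, so a tie needs an extra step if you want exactly one row per user.

```ruby
# Hypothetical attended reservations, already joined to their events.
# starts_at is a plain integer here to keep the sketch short.
RESERVATIONS = [
  { user_id: 1, event_id: 10, starts_at: 100 },
  { user_id: 1, event_id: 11, starts_at: 200 },
  { user_id: 2, event_id: 12, starts_at: 150 },
  { user_id: 2, event_id: 13, starts_at: 150 } # deliberate tie
].freeze

# Step 1 (the derived table "temp"): the latest starts_at per user.
def max_per_user(rows)
  rows.group_by { |r| r[:user_id] }
      .transform_values { |rs| rs.map { |r| r[:starts_at] }.max }
end

# Step 2 (the join back): keep the rows whose starts_at equals that maximum.
# Tied rows all survive this step.
def latest_rows(rows)
  maxima = max_per_user(rows)
  rows.select { |r| r[:starts_at] == maxima[r[:user_id]] }
end

# Optional step 3: collapse ties to the first matching row per user.
def one_row_per_user(rows)
  latest_rows(rows).uniq { |r| r[:user_id] }
end

tied_event_ids   = latest_rows(RESERVATIONS).map { |r| r[:event_id] }
unique_event_ids = one_row_per_user(RESERVATIONS).map { |r| r[:event_id] }
```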
I think your query should first focus on retrieving the most recent reservation.
SELECT MAX(events.starts_at), events.id, user_id FROM reservations WHERE (reservations.state = 'attended')
Then JOIN the Users and Events.
Assuming the results will include every User and Event, it may be more efficient to retrieve all users and events and store them in two arrays keyed by id.
The logic behind that is rather than a separate lookup into the user and events table for each resulting reservation by the db engine, it is more efficient to get them all in a single query.
SELECT * FROM `Users` WHERE 1 ORDER BY `user_id`
SELECT * FROM `Events` WHERE 1 ORDER BY `event_id`
I am not familiar with Rails syntax so cannot give exact code but can show using it in PHP code, the results are put into the array with a single line of code.
while ($row = mysql_fetch_array($results, MYSQL_ASSOC)) { $users[$row['user_id']] = $row; }
Then when processing the Reservations you get the user and event data from the arrays.
The Index for reservations is critical and may be worth profiling.
Possible choices to profile include an index with and without the 'attended' state. events.starts_at should be the first column in the index, followed by user_id, although the column order is itself worth profiling.
You may want to use a unique index to enforce the rule against duplicate reservation times.
I have a User model that has many Post.
I want to get, on a single query, a list of users IDs, ordered by name, and include the ID of their last post.
Is there a way to do this using the ActiveRecord API instead of a SQL query like the following?
SELECT users.id,
(SELECT id FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT 1) AS last_post_id
FROM users
ORDER BY id ASC;
You should be able to do this with the query generator:
User.joins(:posts).group('users.id').order('users.id').pluck(:id, 'MAX(posts.id)')
There are a lot of options on the relationship you can use to get data out of it. pluck is handy for getting values independent of models.
Update: To get models instead:
User.joins(:posts).group('users.id').order('users.id').select('users.*', 'MAX(posts.id) AS max_post_id')
That will create a field called max_post_id which works as any other attribute.
UPDATE: So thanks to @Erwin Brandstetter, I now have this:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
users_columns = User.column_names.map { |col| users[col.to_sym] }
cards_condition = cards[:company_id].eq(company.id).
and(cards[:user_id].eq(users[:id]))
User.joins(:cards).where(cards_condition).group(users_columns).
order('min(cards.created_at)')
end
... which seems to do exactly what I want. There are two shortcomings that I would still like to have addressed, however:
The order() clause is using straight SQL instead of Arel (couldn't figure it out).
Calling .count on the query above gives me this error:
NoMethodError: undefined method 'to_sym' for
#<Arel::Attributes::Attribute:0x007f870dc42c50> from
/Users/neezer/.rvm/gems/ruby-1.9.3-p0/gems/activerecord-3.1.1/lib/active_record/relation/calculations.rb:227:in
'execute_grouped_calculation'
... which I believe is probably related to how I'm mapping out the users_columns, so I don't have to manually type in all of them in the group clause.
How can I fix those two issues?
ORIGINAL QUESTION:
Here's what I have so far that solves the first part of my question:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
cards_condition = cards[:company_id].eq(company.id)
.and(cards[:user_id].eq(users[:id]))
User.where(Card.where(cards_condition).exists)
end
This gives me 84 unique records, which is correct.
The problem is that I need those User records ordered by cards[:created_at] (whichever is earliest for that particular user). Appending .order(cards[:created_at]) to the scope at the end of the method above does absolutely nothing.
I tried adding in a .joins(:cards), but that returns 587 records, which is incorrect (duplicate Users). group_by as I understand it is practically useless here as well, because of how PostgreSQL handles it.
I need my result to be an ActiveRecord::Relation (so it's chainable) that returns a list of unique users who have cards that belong to a given company, ordered by the creation date of their first card... with a query that's written in Ruby and is database-agnostic. How can I do this?
class Company
has_many :cards
end
class Card
belongs_to :user
belongs_to :company
end
class User
has_many :cards
end
Please let me know if you need any other information, or if I wasn't clear in my question.
The query you are looking for should look like this one:
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
ORDER BY min(created_at)
You can join in the table user if you need columns of that table in the result, else you don't even need it for the query.
If you don't need min_created_at in the SELECT list, you can just leave it away.
Should be easy to translate to Ruby (which I am no good at).
To get the whole user record (as I derive from your comment):
SELECT u.*
FROM users u
JOIN (
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
) c ON u.id = c.user_id
ORDER BY min_created_at
Or:
SELECT u.*
FROM users u
JOIN cards c ON u.id = c.user_id
WHERE c.company_id = 1
GROUP BY u.id, u.col1, u.col2, .. -- You have to spell out all columns!
ORDER BY min(c.created_at)
With PostgreSQL 9.1+ you can simply write:
GROUP BY u.id
(like in MySQL) .. provided id is the primary key.
I quote the release notes:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.
The fact that you need it to be chainable complicates things, otherwise you can either drop down into SQL yourself or only select the column(s) you need via select("users.id") to get around the Postgres issue. Because at the heart of it your query is something like
SELECT users.id
FROM users
INNER JOIN cards ON users.id = cards.user_id
WHERE cards.company_id = 1
GROUP BY users.id, DATE(cards.created_at)
ORDER BY DATE(cards.created_at) DESC
Which in ActiveRecord syntax is more or less:
User.select("users.id").joins(:cards).where(:"cards.company_id" => company.id).group("users.id, DATE(cards.created_at)").order("DATE(cards.created_at) DESC")
Say I've employee table where any employee can be related to any other employee (many to many). Each employee has many characteristics that are stored separately.
emp: id,name
related: name,emp1role,emp1id,emp2role,emp2id
chars: empid,name,value
I want to get all the characteristics of employees who are related via 'xxx' along with the relation. I am currently using this query:
SELECT c.empid, c.Name, c.Value
FROM chars as c, related as r
WHERE r.name='xxx' AND (r.emp1id=c.empid OR r.emp2id=c.empid)
This works and it gives related employees one after another i.e. if emp22 & emp43 are related via 'xxx' then I am getting chars of emp43 followed by emp22 and so on. This way I am able to know which two employees are related (which is needed). However, I want to know if this order is mere luck or is it well-defined. This is in SQLite.
If it is not well-defined, how else can I do it? Also, I need to know their respective roles. I would prefer to do it in one query. Can you think of some other query?
Thanks in advance,
Manish
PS: These are not actual tables. They are here for simplicity of asking question.
In SQL, ordering of the result is undefined unless you have an explicit ORDER BY clause in your query. So I believe you want:
SELECT c.empid, c.Name, c.Value
FROM chars as c, related as r
WHERE r.name='xxx' AND (r.emp1id=c.empid OR r.emp2id=c.empid)
ORDER BY c.empid ASC
I tend to add a unique field (such as a primary key) at the end of the ORDER BY list. When several records match on all the other ordering fields, this may not give an obviously meaningful order, but it at least makes the ordering deterministic. That is largely a matter of style and choice; it's by no means required.
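The tie-breaker idea can be sketched in plain Ruby. The rows below are hypothetical, and they assume the chars table also has a unique id column, which the question does not show. Sorting on [empid, id] instead of empid alone gives the same result whatever order the rows arrive in.

```ruby
# Hypothetical rows from the chars table, with an assumed unique id column.
CHARS = [
  { id: 3, empid: 43, name: "height", value: "180" },
  { id: 1, empid: 22, name: "height", value: "175" },
  { id: 2, empid: 43, name: "age",    value: "30" },
  { id: 4, empid: 22, name: "age",    value: "41" }
].freeze

# Sorting by empid alone would leave the order of rows sharing an empid
# unspecified. Adding the unique id as the last sort key removes all ties,
# so the result no longer depends on the input order.
def ordered(rows)
  rows.sort_by { |r| [r[:empid], r[:id]] }
end

ids_forward  = ordered(CHARS).map { |r| r[:id] }
ids_reversed = ordered(CHARS.reverse).map { |r| r[:id] }
```

In SQL the equivalent would be ORDER BY c.empid, followed by the unique column.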
If I GROUP BY on a unique key, and apply a LIMIT clause to the query, will all the groups be calculated before the limit is applied?
If I have a hundred records in the table (each with a unique key), will I have 100 records in the temporary table created (for the GROUP BY) before the LIMIT is applied?
A case study why I need this:
Take Stack Overflow for example.
Each query you run to show a list of questions also shows the user who asked each question and the number of badges they have.
So, while user<->question is one-to-one, user<->badges is one-to-many.
The only way to do it in one query (rather than one query on questions, another on users, and then combining the results) is to group the query by the primary key (question_id) and join + GROUP_CONCAT against the user_badges table.
The same goes for the questions TAGS.
Code example:
Table Questions:
question_id (int)(pk)| question_body(varchar)
Table tag_question:
question_id (int) | tag_id (int)
SELECT:
SELECT questions.question_id,
questions.question_body,
GROUP_CONCAT(tag_id,' ') AS 'tags_ids'
FROM
questions
JOIN
tag_question
ON
questions.question_id=tag_question.question_id
GROUP BY
questions.question_id
LIMIT 15
Yes, the logical order in which the query is evaluated is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT
LIMIT is the last thing calculated, so your grouping will be just fine.
Now, looking at your rephrased question: you don't have just one row per group, but many. In the case of Stack Overflow, you'll have just one user per row, but many badges, i.e.
(uid, badge_id, etc.)
(1, 2, ...)
(1, 3, ...)
(1, 12, ...)
all those would be grouped together.
To avoid a full table scan, all you need are indexes. Beyond that, if you need to SUM, for example, you cannot avoid a full scan.
EDIT:
You'll need something like this (look at the WHERE clause):
SELECT
q1.question_id,
q1.question_body,
GROUP_CONCAT(tq.tag_id, ' ') AS 'tags_ids'
FROM
questions q1
JOIN tag_question tq
ON q1.question_id = tq.question_id
WHERE
q1.question_id IN (
SELECT
tq2.question_id
FROM
tag_question tq2
JOIN tag t
ON tq2.tag_id = t.tag_id
WHERE
t.name = 'the-misterious-tag'
)
GROUP BY
q1.question_id
LIMIT 15
LIMIT does get applied after GROUP BY.
Whether the temporary table is created depends on how your indexes are built.
If you have an index on the grouping field and don't order by the aggregate results, then an INDEX SCAN FOR GROUP BY is applied, and each aggregate is counted on the fly.
That means that if you don't select an aggregate due to the LIMIT, it won't ever be calculated.
But if you order by an aggregate, then, of course, all of them need to be calculated before they can be sorted.
That's why they are calculated first and then the filesort is applied.
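The difference between the two cases can be sketched in plain Ruby. This is only an analogy for what the database does, not a description of MySQL internals. The data and method names are hypothetical, and the rows are assumed to arrive already sorted by question_id, as they would from an index scan.

```ruby
# Hypothetical (question_id, tag_id) pairs, already in question_id order.
PAIRS = [[1, 5], [1, 7], [2, 5], [3, 9], [3, 2], [4, 1]].freeze

# Case 1: grouping on the sorted key with no ordering by an aggregate.
# Groups are built on the fly and reading stops once `limit` groups exist.
# Returns the aggregated groups and the number of rows actually consumed.
def first_groups(pairs, limit)
  groups = []
  rows_used = 0
  pairs.each do |question_id, tag_id|
    if groups.empty? || groups.last[0] != question_id
      break if groups.size == limit # enough groups, stop scanning
      groups << [question_id, []]
    end
    groups.last[1] << tag_id
    rows_used += 1
  end
  [groups.map { |qid, tags| [qid, tags.join(",")] }, rows_used]
end

# Case 2: ordering by an aggregate (here the tag count). Every group has
# to be aggregated before sorting, and only then is the limit applied.
def top_by_tag_count(pairs, limit)
  pairs.group_by(&:first)
       .map { |qid, rows| [qid, rows.size] }
       .sort_by { |qid, count| [-count, qid] }
       .first(limit)
end

limited, rows_used = first_groups(PAIRS, 2)
top_two = top_by_tag_count(PAIRS, 2)
```

In the first case only three of the six rows are consumed before the limit is reached; in the second, all six rows are grouped before the top two are picked.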
Update:
As for your query, see what EXPLAIN EXTENDED says for it.
Most probably, question_id is a PRIMARY KEY for your table, and most probably, it will be used in a scan.
That means no filesort will be applied, and the join itself will never happen after the 15th row.
To make sure, rewrite your query as follows:
SELECT question_id,
question_body,
(
SELECT GROUP_CONCAT(tag_id, ' ')
FROM tag_question t
WHERE t.question_id = q.question_id
)
FROM questions q
ORDER BY
question_id
LIMIT 15
First, it is more readable,
Second, it is more efficient, and
Third, it will return even untagged questions (which your current query doesn't).
If the field you're grouping on is indexed, it shouldn't do a full table scan.