How to use postgres AVG() function with order clause from Rails - sql

I have a Company model, and Company has many DailyData.
DailyData has the columns volume and date.
To calculate the average volume over the 10 most recent business days, I wrote:
class Array
  def sum
    inject(0) { |result, el| result + el }
  end
  def mean
    sum.to_d / size
  end
end
company = Company.first
company.daily_data.order(date: :desc).limit(10).pluck(:volume).mean
This code works fine, but I want to use the Postgres AVG() function instead:
company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)
This code fails with the error:
PG::GroupingError: ERROR: column "daily_data.date" must appear in the GROUP BY clause or be used in an aggregate function
But if I add .group(:date) to the method chain, the SQL returns multiple results.
How can I get the average of the 10 most recent volumes using the PostgreSQL AVG() function?

An ActiveRecord query like this:
company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)
doesn't really make much sense. avg is an aggregate function in SQL, so it operates on a group of rows. But you're not telling the database how to group the rows; without a GROUP BY, you're telling it to compute the average volume over all of the matching rows and then order that single value by a column (date) that doesn't exist in the final result set.
Throwing a limit in:
company.daily_data
  .select('AVG(volume) as average_volume')
  .order(date: :desc)
  .limit(10)
won't help either because the limit is applied after the order and by then you've already confused the database with your attempted avg(volume).
I'd probably use a derived table if I was doing this in SQL, something like:
select avg(volume) as average_volume
from (
  select volume
  from where_ever...
  where what_ever...
  order by date desc
  limit 10
) dt
The derived table in the FROM clause finds the volumes that you want and then the overall query averages those 10 volumes.
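To see why the order of operations matters, here is a plain-Ruby sketch (no database involved; the sample rows and volumes are made up for illustration) that mimics what the derived table does: pick the 10 most recent rows first, then average only those.

```ruby
require 'date'

# Made-up sample data: twelve consecutive days with volumes 100..1200.
rows = (1..12).map { |i| { date: Date.new(2020, 1, i), volume: i * 100 } }

# Inner query: ORDER BY date DESC LIMIT 10
ten_most_recent = rows.sort_by { |r| r[:date] }.reverse.first(10)

# Outer query: AVG(volume) over just those ten rows
recent_average = ten_most_recent.sum { |r| r[:volume] }.fdiv(ten_most_recent.size)
# => 750.0

# Aggregating first collapses all twelve rows into a single value, so there
# is nothing left for a later ORDER BY / LIMIT to choose from.
overall_average = rows.sum { |r| r[:volume] }.fdiv(rows.size)
# => 650.0
```

The two averages differ because the limit has to happen before the aggregate, which is exactly what nesting the limited query inside the FROM clause achieves.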
Alternatively, you could use a subquery to grab the rows of interest:
select avg(volume) as average_volume
from where_ever...
where id in (
  select id
  from where_ever...
  where what_ever...
  order by date desc
  limit 10
)
The subquery approach is fairly straightforward to implement with ActiveRecord, something like this:
ten_most_recent = company.daily_data.select(:id).order(:date => :desc).limit(10)
company.daily_data.where(:id => ten_most_recent).average(:volume)
If you throw a to_sql call on the end of the second line you should see something that looks like the subquery SQL.
You can also make the derived table approach work with ActiveRecord but it is a little less natural. There is a from method in ActiveRecord that will take an ActiveRecord query to build the from (select ...) derived table construction but you'll want to be sure to manually name the derived table:
ten_most_recent = company.daily_data.select(:volume).order(:date => :desc).limit(10)
AnyModelAtAll.from(ten_most_recent, 'dt').average('dt.volume')
You have to use a string argument to average and include the dt. prefix to keep ActiveRecord from trying to add its own table name.
Of course, you'd wrap all of this in a method somewhere to hide the details, perhaps an extension method on the daily_data association:
has_many :daily_data, ... do
  def average_volume(days)
    #...
  end
end
so that you could say things like:
company.daily_data.average_volume(11)

Related

Access SQL GROUP BY problem (eg. tbl_Produktion.ID not part of the aggregation-function)

I want to group by two columns, however MS Access won't let me do it.
Here is the code I wrote:
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
tbl_Produktion.ProduktionsID, tbl_Produktion.Linie,
tbl_Produktion.Schicht, tbl_Produktion.Anzahl_Schichten_P,
tbl_Produktion.Schichtteam, tbl_Produktion.Von, tbl_Produktion.Bis,
tbl_Produktion.Pause, tbl_Produktion.Kunde, tbl_Produktion.TeileNr,
tbl_Produktion.FormNr, tbl_Produktion.LabyNr,
SUM(tbl_Produktion.Stueckzahl_Prod),
tbl_Produktion.Stueckzahl_Ausschuss, tbl_Produktion.Ausschussgrund,
tbl_Produktion.Kommentar, tbl_Produktion.StvSchichtleiter,
tbl_Produktion.Von2, tbl_Produktion.Bis2, tbl_Produktion.Pause2,
tbl_Produktion.Arbeiter3, tbl_Produktion.Von3, tbl_Produktion.Bis3,
tbl_Produktion.Pause3, tbl_Produktion.Arbeiter4,
tbl_Produktion.Von4, tbl_Produktion.Bis4, tbl_Produktion.Pause4,
tbl_Produktion.Leiharbeiter5, tbl_Produktion.Von5,
tbl_Produktion.Bis5, tbl_Produktion.Pause5,
tbl_Produktion.Leiharbeiter6, tbl_Produktion.Von6,
tbl_Produktion.Bis6, tbl_Produktion.Pause6, tbl_Produktion.Muster
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
It works when I group by all the columns, but not like this.
The error message says that the rest of the columns aren't part of an aggregate function (translated from German to English as best I could).
P.S.: I also need the sum of tbl_Produktion.Stueckzahl_Prod, so I tried using the SUM function (I haven't been able to test it yet).
Have you tried something along these lines?
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
MAX(tbl_Produktion.ProduktionsID), MAX(tbl_Produktion.Linie),
MAX(tbl_Produktion.Schicht), MAX(tbl_Produktion.Anzahl_Schichten_P),
MAX(tbl_Produktion.Schichtteam), MAX(tbl_Produktion.Von), MAX(tbl_Produktion.Bis),
SUM(tbl_Produktion.Stueckzahl_Prod)
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
I have used the MAX function for every column except the two you specify in the GROUP BY and the one where you want the SUM. I took the liberty of leaving out much of your data just to get started.
Using the MAX function turns out to be a convenient workaround when the data item is known to be unique within each group. We cannot know your data or your intent, so we cannot tell you whether MAX will yield the results you need.
If you use an aggregate function in the SELECT clause, you must group by every selected column that is not itself an aggregate. If you don't want to do that for some reason (perhaps it changes the output of the aggregate in a way that you don't intend), you must either pick an aggregate for each remaining column (average? max? min?) or run two selects, one for the aggregate and one for the non-aggregated columns. In the latter case, you then have to decide which non-aggregated values make sense alongside the aggregate (or show them all in a table, I suppose).
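The rule is easier to see outside SQL. The following plain-Ruby sketch (made-up rows that borrow a few column names from the question, no database) groups by Datum and Schichtleiter: the sum is well defined per group, while a column such as Linie has several candidate values per group, so a single value has to be chosen explicitly.

```ruby
# Made-up production rows; only the column names come from the question.
rows = [
  { datum: '2020-01-01', schichtleiter: 7, linie: 'A', stueckzahl: 100 },
  { datum: '2020-01-01', schichtleiter: 7, linie: 'B', stueckzahl: 50 },
  { datum: '2020-01-02', schichtleiter: 7, linie: 'A', stueckzahl: 80 }
]

# GROUP BY datum, schichtleiter
groups = rows.group_by { |r| [r[:datum], r[:schichtleiter]] }

summary = groups.map do |(datum, schichtleiter), members|
  {
    datum: datum,
    schichtleiter: schichtleiter,
    # SUM(stueckzahl): exactly one number per group.
    stueckzahl: members.sum { |r| r[:stueckzahl] },
    # linie has two candidates ('A' and 'B') in the first group, so an
    # explicit choice (here MAX) is needed to end up with one value.
    linie: members.map { |r| r[:linie] }.max
  }
end
```

In the first group MAX picks 'B' and silently drops 'A', which is why the workaround is only safe when the column is known to be constant within each group.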

Why can't I query from an activerecord relation using joins?

I am trying to figure out why this produces an N + 1 query. Specifically, I want the count of upvotes for each entry, and I thought that using joins would load it. My goal is to order the results by the count of upvotes, but when I also report the count of upvotes for each entry, the database is queried again for the upvotes. Why? Thanks :)
self
.joins(:upvotes)
.group("#{self.table_name}.id")
.order("count(upvotes.id) DESC")
.includes(:creator, :categories)
If this is going to be a common count, Rails has a feature called counter cache that stores the count on the parent model for easier querying.
I don't know your class name, so I'll call it X.
In Upvote: belongs_to :x, counter_cache: true
Then you simply need an upvotes_count column on X.
Because the count is stored on X, you can query X alone, sort by upvotes_count, and use that column to display it.

Rails: Need to scope by max version

I have a database table that looks like this:
"63";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:15:59";37;8
"64";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:22:51";37;9
The values that matter are the last two.
As you can see, the second-to-last values (the abstract category numbers) are the same, but the last values (the version numbers) differ.
Here is the problem:
When I make a scope, it returns all of the records, but I need only the one with the maximum version number.
In SQL I would do something like this:
SELECT * FROM Category c WHERE
NOT EXISTS (SELECT * FROM Category c1
WHERE c.version_number < c1.version_number
AND c.abstract_category_id = c1.abstract_category_id)
But I'm totally lost in Ruby, specifically on how to do this kind of select in a scope (I understand it should return a relation).
Thanks
We can create a scope to select the category with max version_number like this:
scope :with_max_version_number, -> {
  joins("JOIN ( SELECT abstract_category_id, max(version_number) AS max_version
                FROM categories
                GROUP BY abstract_category_id
              ) AS temp
         ON temp.abstract_category_id = categories.abstract_category_id
         AND temp.max_version = categories.version_number"
  )
}
The subquery (aliased as temp) finds the highest version_number for each abstract_category_id, and the join keeps only the categories rows that match it.
By the way, I'm assuming the table name is categories; correct it if yours differs. The final query is then:
Category.with_max_version_number
Scopes are supposed to return a collection (an ActiveRecord relation), even if there is only one record.
If you want to ALWAYS return a single value, use a class method instead.
def self.max_abstract_category
  <your_scope>.max_by{ |obj| obj.version_number }
end
If I understand your question: you have a database table with a version_number column, which Rails represents using an Active Record model (I'll call it Category because I don't know what you've called it), and you want to find the single Category record with the largest version_number?
Category.order(version_number: :desc).limit(1).first
This query asks for Category records ordered by version_number from highest to lowest and limits the request to one record (the first record, i.e. the highest). Because the result of this request is a relation containing one record, we call .first on it to return the record itself.
As far as I'm aware, a scope is simply a named query (I don't actually use scopes). I think you can save this query as a scope by adding the following to your Category model; the .first call is left out because a scope should return a relation, so call .first on the scope's result to get the record. This rails guide explains more about Scopes.
scope :highest_version, -> { order(version_number: :desc).limit(1) }
I tried the join implementation with baby_squeel, but for some reason it was very slow on MySQL. So I ended up with something like:
scope :only_latest, -> do
  where(%{
    NOT EXISTS (SELECT * FROM categories c
                WHERE categories.version_number < c.version_number
                AND categories.abstract_category_id = c.abstract_category_id)
  })
end
I filed a BabySqueel bug, as I spent a long time trying to do this in a proper, code-based way to no avail.
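For readers less used to NOT EXISTS, the same anti-join logic can be sketched in plain Ruby (no database; rows 63 and 64 mirror the question's sample data, row 70 is made up): a row is kept only if no other row in the same abstract category has a higher version number.

```ruby
# Rows 63 and 64 follow the question's sample; row 70 is a made-up extra.
categories = [
  { id: 63, abstract_category_id: 37, version_number: 8 },
  { id: 64, abstract_category_id: 37, version_number: 9 },
  { id: 70, abstract_category_id: 40, version_number: 1 }
]

# NOT EXISTS (... same abstract category AND a strictly higher version ...)
only_latest = categories.reject do |c|
  categories.any? do |c1|
    c1[:abstract_category_id] == c[:abstract_category_id] &&
      c[:version_number] < c1[:version_number]
  end
end

latest_ids = only_latest.map { |c| c[:id] }
# => [64, 70]
```

Row 63 is dropped because row 64 shares its abstract category and has a higher version; rows 64 and 70 have no such competitor.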

How do I execute a subselect (or large group by) of SQL in Rails 4 with ActiveModel?

I would like to execute this type of SQL and get an array of Model1:
select model1.*,
(select sum(prop) from model2 where model2.model1_id = model1.id) as total
from model1
where model1.parent_id is null
order by total desc
I tried to use
@parent.model1.select("model2.prop").where(parent_id: nil).group(:id).order(:prop => :desc)
but it gave an error about ' missing FROM-clause entry for table'. I don't know how to use sub-selects in Rails, and it's not in the guide. The only thing I can think to do is loop through each model1 and find the sum of the property I'm looking for, but that would generate N+1 queries and be really slow compared to 1 straight SQL. I suppose using a group by and grouping by every single property of model1 could also work, but that sounds messy.
Holy cow I did it!
@post.comments.select(Comment.columns.map{|c| c.name}).select("(select coalesce(sum(vote),0) as total from votes where votes.comment_id = comments.id) as total").order("total desc")

Group by SQL statement

So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But it gives me an error saying:
"Column 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?
Use:
SELECT b.medication_name,
       b.patient_history_date_bio AS med_date,
       b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
             MAX(y.patient_history_date_bio) AS max_date
      FROM BIOLOGICAL y
      -- restrict to the same patient so max_date is that patient's latest date
      WHERE y.patient_id = ?
      GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
                                   AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?
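As an illustration of what "latest row per medication" means, here is a plain-Ruby sketch (made-up rows, no database) that filters to one patient, groups by medication, and keeps the whole row with the latest date, so the dose comes from that same row.

```ruby
require 'date'

# Made-up rows; patient 99 has a later aspirin entry than patient 12.
rows = [
  { patient_id: 12, medication_name: 'aspirin',   date: Date.new(2020, 1, 1), dose: 100 },
  { patient_id: 12, medication_name: 'aspirin',   date: Date.new(2020, 3, 1), dose: 200 },
  { patient_id: 12, medication_name: 'ibuprofen', date: Date.new(2020, 2, 1), dose: 400 },
  { patient_id: 99, medication_name: 'aspirin',   date: Date.new(2020, 6, 1), dose: 50 }
]

latest = rows
  .select { |r| r[:patient_id] == 12 }           # WHERE patient_id = 12
  .group_by { |r| r[:medication_name] }          # GROUP BY medication_name
  .map { |_name, members| members.max_by { |r| r[:date] } } # latest row per group

doses = latest.map { |r| [r[:medication_name], r[:dose]] }
# => [["aspirin", 200], ["ibuprofen", 400]]
```

Note that the patient filter runs before the grouping. Computing the maximum date across all patients and filtering afterwards would compare patient 12's aspirin rows against patient 99's later date and drop aspirin from the result.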
If you really have to, as a quick workaround you can apply an aggregate function to medication_dose, such as MAX(medication_dose).
Note, however, that this is normally an indication that you are either building the query incorrectly or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should be the one suggested by OMG Ponies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?
You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)