Rails: Need to scope by max version - sql

I have a database table that looks like this:
"63";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:15:59";37;8
"64";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:22:51";37;9
The values that matter are the last two columns.
As you can see, the second to last (abstract_category_id) is the same in both rows, but the last (version_number) differs.
Here is the problem:
When I make a scope, it returns all of the records, but I need only the one with the maximum version number.
In SQL I would do something like this:
SELECT * FROM Category c WHERE
NOT EXISTS (SELECT * FROM Category c1
  WHERE c.version_number < c1.version_number
  AND c.abstract_category_id = c1.abstract_category_id)
But I'm totally lost in Ruby, more specifically on how to do this kind of select in a scope (I understand it should return a relation).
Thanks

We can create a scope to select the category with max version_number like this:
scope :with_max_version_number, -> {
joins("JOIN ( SELECT abstract_category_id, max(version_number) AS max_version
FROM categories
GROUP BY abstract_category_id
) AS temp
ON temp.abstract_category_id = categories.abstract_category_id
AND temp.max_version = categories.version_number"
)
}
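Basically, the subquery (aliased as temp) computes the maximum version_number for each abstract_category_id, and the join keeps only the categories whose version_number equals that maximum.
Btw, I assume the table name is categories; correct it if yours differs. The final query will then be: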
Category.with_max_version_number
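To make it clearer which rows the join keeps, here is a minimal plain-Ruby sketch of the same logic over in-memory rows. It does not touch the database; CategoryRow, the third sample row and the method name are hypothetical and only there for illustration.

```ruby
# In-memory stand-in for rows of the categories table (hypothetical struct).
CategoryRow = Struct.new(:id, :abstract_category_id, :version_number)

ROWS = [
  CategoryRow.new(63, 37, 8),
  CategoryRow.new(64, 37, 9),
  CategoryRow.new(70, 38, 1) # made-up row for a second abstract category
]

# Mirrors the scope: compute max(version_number) per abstract_category_id
# (the "temp" subquery), then keep the rows that match that maximum (the join).
def rows_with_max_version(rows)
  max_by_category = rows.group_by(&:abstract_category_id)
                        .transform_values { |group| group.map(&:version_number).max }
  rows.select { |row| row.version_number == max_by_category[row.abstract_category_id] }
end

puts rows_with_max_version(ROWS).map(&:id).inspect # prints [64, 70]
```

Note that the result is still a collection with one row per abstract category, which is why this fits a scope.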

Scopes are supposed to return a collection (an ActiveRecord::Relation), even if there is only 1 record.
If you want to ALWAYS return 1 value, use a class method instead.
def self.max_abstract_category
<your_scope>.max_by{ |obj| obj.version_number }
end

If I understand your question: you have a database table with a version_number column, which Rails represents using an Active Record model (I'll call it Category because I don't know what you've called it), and you want to find the single Category record with the largest version_number?
Category.order(version_number: :desc).limit(1).first
This query asks for Category records ordered by version_number from highest to lowest and limits the request to one record (the first record, a.k.a. the highest). Because the result of this request is a relation containing one record, we call .first on it to simply return the record.
As far as I'm aware, a scope is simply a named query (I don't actually use scopes). I think you can save this query as a scope by adding the following to your Category model; since a scope should return a relation, leave .first out of the scope and call it on the result instead. This rails guide explains more about Scopes.
scope :highest_version, -> { order(version_number: :desc).limit(1) }

I tried the join implementation with baby_squeel, but for some reason it was very slow on MySQL. So I ended up with something like:
scope :only_latest, -> do
  where(%{
    NOT EXISTS (SELECT 1 FROM categories c
      WHERE categories.version_number < c.version_number
      AND categories.abstract_category_id = c.abstract_category_id)
  })
end
I filed a BabySqueel bug, as I spent a long time trying to do this the proper way in code, to no avail.

Related

Using Rails, what's wrong with this query? It does not return a valid id

store_id=Store.select(:id).where(user_id:current_user.id).to_a.first
It returns an id like this: Store:0x00007f8717546c30
Store.select(:id).where(user_id:current_user.id).to_a.first
select does not return an array of strings or integers for the given column(s), but rather an active record relation containing objects with just the given field:
https://apidock.com/rails/ActiveRecord/QueryMethods/select
Your code is then converting that relation to an array, and taking the first object in that array, which is an instance of the Store class. If you want the ID, then try:
Store.select(:id).where(user_id:current_user.id).to_a.first.id
However, I think you're misunderstanding how to structure the queries. Put the where part first, and then find the ID of the first result:
Store.where(user_id: current_user.id).first.id
And if there is only 1 store, then:
Store.find_by(user_id: current_user.id).id
Or...
Store.find_by(user: current_user).id
or.....
current_user.store.id
(or current_user.stores.first.id if there are many)

How to use postgres AVG() function with order clause from Rails

I have a model Company and Company has many DailyData.
And DailyData has columns volume and date
To calculate average volume of recent 10 business days I wrote like:
class Array
def sum
inject(0) { |result, el| result + el }
end
def mean
sum.to_d / size
end
end
company = Company.first
company.daily_data.order(date: :desc).limit(10).pluck(:volume).mean
This code works fine, but I want to use postgres AVG() function.
company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)
This code ends up with error:
PG::GroupingError: ERROR: column "daily_data.date" must appear in the GROUP BY clause or be used in an aggregate function
But if I put .group(:date) in the method chain, the SQL returns multiple results.
How can I get recent 10 volumes average value by using postgresql AVG() function?
An ActiveRecord query like this:
company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)
doesn't really make much sense. avg is an aggregate function in SQL, so it needs to operate on a group of rows. But you're not telling the database how to group the rows; you're telling the database to compute the average volume over the entire table and then order that single value by something that doesn't exist in the final result set.
Throwing a limit in:
company.daily_data
.select('AVG(volume) as average_volume')
.order(date: :desc)
.limit(10)
won't help either because the limit is applied after the order and by then you've already confused the database with your attempted avg(volume).
I'd probably use a derived table if I was doing this in SQL, something like:
select avg(volume) as average_volume
from (
select volume
from where_ever...
where what_ever...
order by date desc
limit 10
) dt
The derived table in the FROM clause finds the volumes that you want and then the overall query averages those 10 volumes.
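If it helps to see the two steps separately, here is a minimal plain-Ruby sketch of the derived-table logic over in-memory rows. It runs no SQL; DailyDatum, the sample data and the method name are hypothetical.

```ruby
require "date"

# In-memory stand-in for daily_data rows (hypothetical struct).
DailyDatum = Struct.new(:date, :volume)

# Inner step: order by date desc and keep the N most recent rows.
# Outer step: average the volumes of just those rows.
def average_recent_volume(rows, days)
  recent = rows.sort_by(&:date).reverse.first(days)
  return nil if recent.empty? # AVG over zero rows is NULL in SQL
  recent.sum(&:volume).fdiv(recent.size)
end

# Twelve made-up days with volumes 1..12; the ten most recent are 3..12.
rows = (1..12).map { |day| DailyDatum.new(Date.new(2014, 9, day), day) }
puts average_recent_volume(rows, 10) # prints 7.5
```

Averaging all twelve rows would give 6.5 instead, which is the difference between limiting before and after the aggregate.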
Alternatively, you could use a subquery to grab the rows of interest:
select avg(volume) as average_volume
from where_ever...
where id in (
select id
from where_ever...
where what_ever...
order by date desc
limit 10
)
The subquery approach is fairly straightforward to implement with ActiveRecord, something like this:
ten_most_recent = company.daily_data.select(:id).order(:date => :desc).limit(10)
company.daily_data.where(:id => ten_most_recent).average(:volume)
If you throw a to_sql call on the end of the second line you should see something that looks like the subquery SQL.
You can also make the derived table approach work with ActiveRecord but it is a little less natural. There is a from method in ActiveRecord that will take an ActiveRecord query to build the from (select ...) derived table construction but you'll want to be sure to manually name the derived table:
ten_most_recent = company.daily_data.select(:volume).order(:date => :desc).limit(10)
AnyModelAtAll.from(ten_most_recent, 'dt').average('dt.volume')
You have to use a string argument to average and include the dt. prefix to keep ActiveRecord from trying to add its own table name.
Of course you'd hide all this stuff in a method somewhere so that you could hide the details. Perhaps an extension method on the daily_data association:
has_many :daily_data, ... do
def average_volume(days)
#...
end
end
so that you could say things like:
company.daily_data.average_volume(11)

Returning the first X records in a postgresql query with a unique field

Ok, so I'm having a bit of a learning moment here. After figuring out a way to get this to work, I'm curious if anyone with a bit more Postgres experience could help me find a way to do this without a whole lot of behind-the-scenes Rails work (or a single query for each item I'm trying to get). Now for an explanation:
Say I have 1000 records, we'll call them "Instances", in the database that have these fields:
id
user_id
other_id
I want to create a method that I can call that pulls in 10 instances that all have a unique other_id field, in plain english (I realize this won't work :) ):
Select * from instances where user_id = 3 and other_id is unique limit 10
So instead of pulling in an array of 10 instances where user_id is 3 and several of them may share the same other_id (say 5), I want to be able to map other_id over those 10 instances and get back something like [1,2,3,4,5,6,7,8,9,10].
In theory, I can probably do one of two things currently, though I'm trying to avoid them:
Store an array of id's and do individual calls making sure the next call says "not in this array". The problem here is I'm doing 10 individual db queries.
Pull in a large chunk of say, 50 instances and sorting through them in ruby-land to find 10 unique ones. This wouldn't allow me to take advantage of any optimizations already done in the database and I'd also run the risk of doing a query for 50 items that don't have 10 unique other_id's and I'd be stuck with those unless I did another query.
Anyways, hoping someone may be able to tell me I'm overlooking an easy option :) I know this is kind of optimizing before it's really needed but this function is going to be run over and over and over again so I figure it's not a waste of time right now.
For the record, I'm using Ruby 1.9.3, Rails 3.2.13, and Postgresql (Heroku)
Thanks!
EDIT: Just wanted to give an example of a function that technically DOES work (and is number 1 above)
def getInstances(limit, user)
  out_of_instances = false
  available = []
  other_ids = [-1] # added -1 to avoid submitting a NULL query
  # Loop on the number of instances found; other_ids also holds the -1 placeholder.
  until available.length == limit || out_of_instances
    instance = Instance.where("user_id = ? AND other_id <> ALL (ARRAY[?])", user.id, other_ids).limit(1)
    if instance != []
      available << instance.first
      other_ids << instance.first.other_id
    else
      out_of_instances = true
    end
  end
  available # return the collected instances
end
And you would run:
getInstances(10, current_user)
While this works, it's not ideal because it's leading to 10 separate queries every time it's called :(
In a single SQL query, it can be achieved easily with SELECT DISTINCT ON... which is a PostgreSQL-specific feature.
See http://www.postgresql.org/docs/current/static/sql-select.html
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first
With your example:
SELECT DISTINCT ON (other_id) *
FROM instances
WHERE user_id = 3
ORDER BY other_id LIMIT 10
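As a rough illustration of what DISTINCT ON does here, this is a plain-Ruby sketch over in-memory rows (no database involved). InstanceRow, the sample data and the method name are hypothetical, and the tie-break on id is an assumption: the SQL above orders by other_id only, so which row is "first" within each other_id is not guaranteed there.

```ruby
# In-memory stand-in for rows of the instances table (hypothetical struct).
InstanceRow = Struct.new(:id, :user_id, :other_id)

# Filter by user, order by other_id, keep the first row per other_id, then limit.
def distinct_on_other_id(rows, user_id, limit)
  rows.select { |row| row.user_id == user_id }     # WHERE user_id = ?
      .sort_by { |row| [row.other_id, row.id] }    # ORDER BY other_id (id as assumed tie-break)
      .uniq(&:other_id)                            # DISTINCT ON (other_id)
      .first(limit)                                # LIMIT ?
end

rows = [
  InstanceRow.new(1, 3, 5),
  InstanceRow.new(2, 3, 5), # same other_id as row 1, dropped
  InstanceRow.new(3, 3, 7),
  InstanceRow.new(4, 4, 5), # different user, filtered out
  InstanceRow.new(5, 3, 6)
]

puts distinct_on_other_id(rows, 3, 10).map(&:other_id).inspect # prints [5, 6, 7]
```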

Sql statement with multi ANDs querying the same column

I don't know if the title of the post is the appropriate. I have the following table
and an Array in php with some items, parsed_array. What I want to do is to find all the SupermarketIDs which have all the items of the parsed_array.
For example, if parsed_array contains [111,121,131] I want the result to be 21 which is the ID of the Supermarket that contains all these items.
I tried to do it like that:
$this->db->select('SupermarketID');
$this->db->from('productinsupermarket');
for ($i=0; $i<sizeof($parsed_array); $i++)
{
$this->db->where('ItemID', $parsed_array[$i]);
}
$query = $this->db->get();
return $query->result_array();
If there is only one item in the parsed_array the result is correct because the above is equal to
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0];
but if there is more than one item, let's say two, it is equal to
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0]
AND ItemID=parsed_array[1];
which of course returns an empty result. Any idea how this can be solved?
There are at least two ways of generating the result you want, either a self join (no fun to generate with a dynamic number of items) or using IN, GROUP BY and HAVING.
I can't really tell you how to generate it using CodeIgniter, I assume you're better at that than I am :)
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID IN (111,121,131) -- The 3 item id's you're looking for
GROUP BY SupermarketID
HAVING COUNT(ItemId) = 3; -- All 3 must match
An SQLfiddle to test with.
EDIT: As @ypercube mentions below, if the ItemId can show up more than once for a SupermarketID, you'll want to use COUNT(DISTINCT ItemId) to count only unique rows instead of counting every occurrence.
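For anyone who wants to see the IN / GROUP BY / HAVING logic step by step, here is a small plain-Ruby sketch over in-memory rows. Nothing here is CodeIgniter or SQL; StockRow, the method name and every row other than supermarket 21's three items are made up for illustration.

```ruby
# In-memory stand-in for rows of productinsupermarket (hypothetical struct).
StockRow = Struct.new(:supermarket_id, :item_id)

# Keep only the wanted items, group by supermarket, and keep the groups
# that contain every wanted item (counting distinct items, as in the EDIT).
def supermarkets_with_all_items(rows, wanted)
  wanted = wanted.uniq
  rows.select { |row| wanted.include?(row.item_id) }                        # WHERE ItemID IN (...)
      .group_by(&:supermarket_id)                                           # GROUP BY SupermarketID
      .select { |_, group| group.map(&:item_id).uniq.size == wanted.size }  # HAVING COUNT(DISTINCT ItemID) = n
      .keys
end

rows = [
  StockRow.new(21, 111), StockRow.new(21, 121), StockRow.new(21, 131),
  StockRow.new(21, 111), # duplicate row, must not be counted twice
  StockRow.new(22, 111), StockRow.new(22, 121),
  StockRow.new(23, 131)
]

puts supermarkets_with_all_items(rows, [111, 121, 131]).inspect # prints [21]
```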
You can use where_in in codeigniter as below,
if(count($parsed_array) > 0)
{
$this->db->where_in('ItemID', $parsed_array);
}
Active record class in codeigniter
Try an IN clause or multiple ORs:
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0]
OR ItemID=parsed_array[1];

Select records with highest values for each subset

I have a set of records of which some, but not all, have a 'path' field, and all have a 'value' field. I wish to select only those which either do not have a path, or have the largest value of all the records with a particular path.
That is, given these records:
Name: Path: Value:
A foo 5
B foo 6
C NULL 2
D bar 2
E NULL 4
I want to return B, C, D, and E, but not A (because A has a path and its path is the same as B's, but A has a lower value).
How can I accomplish this, using ActiveRecord, ARel and Postgres? Ideally, I would like a solution which functions as a scope.
You could use something like this by using 2 subqueries (will do only one SQL query which has subqueries). Did not test, but should get you in the right direction. This is for Postgres.
scope :null_ids, -> { where(path: nil).select('id') }
scope :non_null_ids, -> { where('path IS NOT NULL').select('DISTINCT ON (path) id').order('path, value desc, id') }
scope :stuff, -> {
subquery = [null_ids, non_null_ids].map{|q| "(#{q.to_sql})"}.join(' UNION ')
where("#{table_name}.id IN (#{subquery})")
}
If you are using a different DB you might need to use group/order instead of distinct on for the non_nulls scope. If the query is running slow put an index on path and value.
You get only 1 query and it's a chainable scope.
A straightforward transliteration of your description to SQL would look like this:
select name, path, value
from (
select name, path, value,
row_number() over (partition by path order by value desc) as r
from your_table
where path is not null
) as dt
where r = 1
union all
select name, path, value
from your_table
where path is null
You could wrap that in a find_by_sql and get your objects out the other side.
That query works like this:
The row_number window function allows us to group the rows by path, order each group by value, and then number the rows in each group. Play around with the SQL a bit inside psql and you'll see how this works, there are other window functions available that will allow you to do all sorts of wonderful things.
You're treating NULL path values separately from non-NULL paths, hence the path is not null in the inner query.
We can peel off the first row in each of the path groups by selecting those rows from the derived table that have a row number of one (i.e. where r = 1).
The treatment of path is null rows is easily handled by the second query.
The UNION is used to join the result sets of the queries together.
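The steps above can also be sketched in plain Ruby over the sample records from the question. This is only an in-memory illustration of the logic, not an ActiveRecord solution; PathRecord and the method name are hypothetical.

```ruby
# In-memory stand-in for the records in the question (hypothetical struct).
PathRecord = Struct.new(:name, :path, :value)

RECORDS = [
  PathRecord.new("A", "foo", 5),
  PathRecord.new("B", "foo", 6),
  PathRecord.new("C", nil, 2),
  PathRecord.new("D", "bar", 2),
  PathRecord.new("E", nil, 4)
]

# First query: partition the non-NULL paths and keep the top row per path
# (what row_number() ... where r = 1 selects).
# Second query: all rows with a NULL path. The + plays the role of UNION ALL.
def top_per_path_plus_pathless(records)
  with_path, without_path = records.partition { |record| record.path }
  top_rows = with_path.group_by(&:path).map { |_, group| group.max_by(&:value) }
  top_rows + without_path
end

puts top_per_path_plus_pathless(RECORDS).map(&:name).sort.inspect # prints ["B", "C", "D", "E"]
```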
I can't think of any way to construct such a query using ActiveRecord nor can I think of any way to integrate such a query with ActiveRecord's scope mechanism. If you could easily access just the WHERE component of an ActiveRecord::Relation then you could augment the where path is not null and where path is null components of that query with the WHERE components of a scope chain. I don't know how to do that though.
In truth, I tend to abandon ActiveRecord at the drop of a hat. I find ActiveRecord to be rather cumbersome for most of the complicated things I do and not nearly as expressive as SQL. This applies to every ORM I've ever used so the problem isn't specific to ActiveRecord.
I have no experience with ActiveRecord, but here's a sample with SQLAlchemy to silence the just-use-SQL crowd ;)
q1 = Session.query(Record).filter(Record.path != None)
q1 = q1.distinct(Record.path).order_by(Record.path, Record.value.desc())
q2 = Session.query(Record).filter(Record.path == None)
query = q1.from_self().union(q2)
# Further chaining, e.g. query = query.filter(Record.value > 3) to return B, E
for record in query:
    print(record.name)