activerecord equivalent to SQL 'minus' - sql

What's the rails way to subtract a query result from another? A database specific SQL example would be:
SELECT Date FROM Store_Information
MINUS
SELECT Date FROM Internet_Sales

I'll throw this into the mix - not a solution, but might help with the progress:
Best I can think of is to use NOT IN:
StoreInformation.where('date NOT IN (?)', InternetSale.all.map(&:date))
That's Rails 3 - Rails 2 would be:
StoreInformation.all(:conditions => ['date NOT IN (?)', InternetSale.all.map(&:date)])
But both of these will first select everything from internet_sales; what you really want is a nested query to do the whole thing in the database engine. For this, I think you'll have to break into find_by_sql and just give a nested query.
Obviously this assumes you're using MySQL! HTH.
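A minimal sketch of the nested-query route mentioned above. It assumes the tables are named store_informations and internet_sales (Rails' default pluralization; the original post does not give them) and that both have a date column. The helper name store_only_dates and the constant are made up for illustration; the only ActiveRecord call relied on is find_by_sql.

```ruby
# SQL that leaves the whole subtraction to the database engine.
# Table and column names are assumptions; adjust them to your schema.
STORE_ONLY_DATES_SQL = <<~SQL
  SELECT DISTINCT date
  FROM store_informations
  WHERE date NOT IN (SELECT date FROM internet_sales)
SQL

# Hypothetical helper: `model` is any class that responds to find_by_sql,
# for example an ActiveRecord model such as StoreInformation.
def store_only_dates(model)
  model.find_by_sql(STORE_ONLY_DATES_SQL).map(&:date)
end
```

In a Rails app this would be called as store_only_dates(StoreInformation), and only the dates missing from internet_sales travel over the wire.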

Late answer but I think you meant :
activities = Activity.all
searches = activities.where(key: 'search')
(activities - searches).each do |anything_but_search|
  p anything_but_search
end
You can subtract one ActiveRecord::Relation from another and get the same result a SQL MINUS would give. Note that the subtraction runs in Ruby on the loaded records, not in the database.
I am using Rails 4.2 so anything beyond that version should do the trick.
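To make the behaviour of `-` concrete without a database, here is a plain-Ruby sketch that uses arrays of Date objects as stand-ins for the two result sets from the question (the sample dates are invented). It suggests one caveat: Array#- removes every element that also appears in the right-hand array, but unlike SQL MINUS it keeps duplicates from the left-hand side, so a uniq is needed when duplicate values are possible.

```ruby
require "date"

# Stand-ins for the two result sets; with ActiveRecord these would be
# arrays loaded from the database before the subtraction runs in Ruby.
store_dates    = [Date.new(2024, 1, 5), Date.new(2024, 1, 5),
                  Date.new(2024, 1, 7), Date.new(2024, 1, 8)]
internet_dates = [Date.new(2024, 1, 7), Date.new(2024, 1, 10)]

# Array#- drops every element that also occurs in the right-hand array.
difference = store_dates - internet_dates

# SQL MINUS also removes duplicate rows, so add uniq to get the same set.
minus_like = difference.uniq

puts difference.map(&:to_s).inspect
puts minus_like.map(&:to_s).inspect
```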

Related

How to write an SQL NOT EXISTS query/scope in the Rails way?

I have a database scope that selects only the latest ProxyConfig version for a particular Proxy and environment.
This is the raw SQL that works very well with MySQL, PostgreSQL and Oracle:
class ProxyConfig < ApplicationRecord
  ...
  scope :current_versions, -> do
    where %(NOT EXISTS (
      SELECT 1 FROM proxy_configs pc
      WHERE proxy_configs.environment = environment
      AND proxy_configs.proxy_id = proxy_id
      AND proxy_configs.version < version
    ))
  end
  ...
end
You can find a simple test case in my baby_squeel issue.
But I find it nicer not to use SQL directly. I have spent a lot of time trying out different approaches to write it in the Rails way to no avail. I found generic Rails and baby_squeel examples but they always involved different tables.
PS: The previous version used joins, but it was very slow and it broke some queries; for example, #count produced an SQL syntax error. So I'm not very open to other approaches and would rather know how to implement exactly this query, although I'm still curious to see other simple solutions.
PPS: On whether direct SQL is fine here: in this case, mostly yes, and probably every RDBMS can understand this quoting. But comparing text fields requires special functions on Oracle, and on Postgres the case-insensitive LIKE is ILIKE. Arel can handle such differences automatically, whereas raw SQL would require a different string for each RDBMS.
This isn't actually a query that you can build with the ActiveRecord Query Interface alone. It can be done with a light sprinkling of Arel though:
class ProxyConfig < ApplicationRecord
  def self.current_versions
    pc = arel_table.alias("pc")
    where(
      unscoped.select(1)
        .where(pc[:environment].eq(arel_table[:environment]))
        .where(pc[:proxy_id].eq(arel_table[:proxy_id]))
        .where(pc[:version].gt(arel_table[:version]))
        .from(pc)
        .arel.exists.not
    )
  end
end
The generated SQL isn't identical, but I think it should be functionally equivalent:
SELECT "proxy_configs".* FROM "proxy_configs"
WHERE NOT (
EXISTS (
SELECT 1 FROM "proxy_configs" "pc"
WHERE "pc"."environment" = "proxy_configs"."environment"
AND "pc"."proxy_id" = "proxy_configs"."proxy_id"
AND "pc"."version" > "proxy_configs"."version"
)
)

ActiveRecord: can't use `pluck` after `where` clause with eager-loaded associations

I have an app that has a number of Post models, each of which belongs_to a User model. When these posts are published, a PublishedPost model is created that belongs_to the relevant Post model.
I'm trying to build an ActiveRecord query to find published posts that match a user name, then get the ids of those published posts, but I'm getting an error when I try to use the pluck method after eager-loading my associations and searching them with the where method.
Here's (part of) my controller:
class PublishedPostsController < ApplicationController
  def index
    ar_query = PublishedPost.order("published_posts.created_at DESC")
    if params[:searchQuery].present?
      search_query = params[:searchQuery]
      ar_query = ar_query.includes(:post => :user)
                         .where("users.name like ?", "%#{search_query}%")
    end
    @found_ids = ar_query.pluck(:id)
    ...
  end
end
When the pluck method is called, I get this:
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'users.name' in 'where clause': SELECT id FROM `published_posts` WHERE (users.name like '%Andrew%') ORDER BY published_posts.created_at DESC
I can get the results I'm looking for with
@found_ids = ar_query.select(:id).map{|r| r.id}
but I'd rather use pluck as it seems like the cleaner way to go. I can't figure out why it's not working, though. Any ideas?
You should use joins instead of includes here.
The two methods are similar, except that with joins the data from the joined tables is not returned in the query result, whereas with includes it is.
In that respect, includes and pluck are kind of antithetical: one says to return all the data you possibly can, whereas the other says to return only this one little bit.
Since you only want a small amount of the data, you want joins. (Strangely, select, which also seems somewhat antithetical, still works, but you would need to remove the ambiguity over id in this case.)
Try it out in the console and you'll see that includes causes a query that looks kind of like this: SELECT "posts"."id" as t0_r0, "posts"."text" as t0_r1, "users"."id" as t1_r0, "users"."name" as t1_r1 ... When you tack on a pluck call, all those crazy tx_ry columns go away and are replaced by whatever you specified.
I hope that helps, but if not maybe this RailsCast can. It is explained around the 5 minute mark.
http://railscasts.com/episodes/181-include-vs-joins
If you got here by searching "rails pluck ambiguous column", you may want to know you can just replace query.pluck(:id) with:
query.pluck("table_name.id")
Your query wouldn't work as it is written, even without the pluck call.
The reason is that your WHERE clause contains literal SQL referencing the users table. Rails doesn't notice this, so it decides to run multiple queries and join in memory ( .preload() ) instead of joining at the database level ( .eager_load() ):
SELECT * from published_posts WHERE users.name like "pattern" ORDER BY published_posts.created_at DESC
SELECT * from posts WHERE id IN ( a_list_of_all_post_ids_in_published_posts_returned_above )
SELECT * from users WHERE id IN ( a_list_of_all_user_ids_in_posts_returned_above )
The first of the 3 queries fails, and that is the error you get.
To force Rails to use a JOIN here, you should either use the explicit .eager_load() instead of .includes(), or add a .references() clause.
Other than that, what @Geoff answered stands: you don't really need .includes() here, but rather .joins().

How to find the record with the maximum price?

This returns the maximum value, not the complete record:
self.prices.maximum(:price_field)
And currently, I find the record like this:
def maximum_price
  self.prices.find(:first, :conditions => ["price_field = ?", self.prices.maximum(:price_field)])
end
Is this the correct way? The above needs two SQL statements to work, and it somehow does not feel right.
Ps. Additionally, I want that if more than one record has the same "maximum" value, then it should get the one with the latest updated_at value. So that would mean another SQL statement??
Pps. Does anyone know of a good or detailed reference for AREL and non-AREL things in Rails? The Rails Guide for ActiveRecord query is just not enough!
(I'm using Rails 3)
===UPDATE===
Using AREL I do the following:
self.prices.order("updated_at DESC").maximum(:price_field)
But this only gives the maximum value, not the complete record :(
Also, is the use of maximum() really AREL?
How about something like this?
self.prices.order("price DESC").first
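The tie-break asked for in the question (the latest updated_at among records sharing the maximum price) can probably be folded into the same single query by sorting on two columns, for example order("price_field DESC, updated_at DESC").first. That call is my assumption, not part of the answer above. The plain-Ruby sketch below mirrors that ordering on in-memory stand-ins; the Price struct and the sample values are invented for illustration.

```ruby
# Hypothetical stand-in for a price record; a real app would use the model.
Price = Struct.new(:price_field, :updated_at)

prices = [
  Price.new(10, Time.utc(2024, 1, 1)),
  Price.new(25, Time.utc(2024, 1, 2)),
  Price.new(25, Time.utc(2024, 1, 9)), # same maximum, updated later
  Price.new(7,  Time.utc(2024, 1, 5))
]

# Compare by price first, then by updated_at, and take the largest pair.
# This mirrors ORDER BY price_field DESC, updated_at DESC LIMIT 1.
maximum_price = prices.max_by { |price| [price.price_field, price.updated_at] }

puts maximum_price.updated_at
```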

Best way to join unique month and year from db in rails 3 ( or otherwise )

I am trying to figure out a nice way of doing this and thought maybe there is a nicer way in the newer Rails 3.0 ActiveRecord query.
I have a bunch of Posts that have a published_at field.
Now I want to present an Archive in the sidebar listing every unique month and year that contains posts. What's the best way to do this while avoiding heavy hits on the DB on every page load? Suggestions?
You need a query along the lines of select distinct date_format(published_at, '%m %y'), count(id) from posts group by 1. It's a trivial matter to convert this to AR syntax.
RE: pageload
Run the query for the archive and cache the result using either query caching or fragment caching.
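As a rough illustration of the grouping that query performs, here is a plain-Ruby sketch that counts posts per year and month. The dates are invented stand-ins for published_at values; in ActiveRecord the grouped count could likely be expressed as Post.group("date_format(published_at, '%m %y')").count on MySQL, though I have not run that against the asker's schema.

```ruby
require "date"

# Invented publication dates standing in for Post#published_at values.
published_at = [
  Date.new(2010, 9, 3), Date.new(2010, 9, 21),
  Date.new(2010, 11, 2), Date.new(2011, 1, 15)
]

# Count posts per [year, month], the same grouping the SQL GROUP BY does.
archive = published_at.each_with_object(Hash.new(0)) do |date, counts|
  counts[[date.year, date.month]] += 1
end

# Newest month first, formatted for a sidebar.
archive.sort.reverse.each do |(year, month), count|
  puts "#{Date::MONTHNAMES[month]} #{year} (#{count})"
end
```

Whatever produces the counts, the resulting hash is small and changes only when a post is published, which is what makes it a good candidate for caching.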

In which order Rails does the DB queries

In Select n objects randomly with condition in Rails Anurag kindly proposed this answer to randomly select n posts with votes >= x
Post.all(:conditions => ["votes >= ?", x], :order => "rand()", :limit => n)
My concern is that the number of posts that have at least x votes is very big.
In what order does the DB apply these criteria to the query?
Does it
(a) select n posts with votes >= x and then randomise them? or
(b) select all posts with votes >= x, then randomise them, and then select the first n posts?
(c) other?
The recommendation to check the development log is very useful.
However, in this case, the randomisation is happening on the MySQL end, not inside Active Record. In order to see how the query is being run inside MySQL, you can copy the query from the log and paste it into your MySQL tool of choice (console, GUI, whatever) and add "EXPLAIN" to the front of it.
You should end up with something like:
EXPLAIN SELECT * FROM posts WHERE votes >= 'x' ORDER BY rand() LIMIT n
When I try a similar query in MySQL, I am told:
Select Type: SIMPLE
Using where; Using temporary; Using filesort
Then you should do a search for some of the excellent advice on SO on how to optimise MySQL queries. If there is an issue, adding an index on the votes column may improve the situation.
As Toby already pointed out, this is purely up to the SQL server, since everything is done in the query itself.
However, I am afraid that you can't get truly randomized output unless the database gets the whole resultset first, and then randomises it. Although, you should check the EXPLAIN result anyway.
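The logical order described here corresponds to option (b) in the question. A plain-Ruby sketch of those three steps, using an invented Post struct and made-up values for x and n, might look like this; it models the logical order only, not how MySQL actually executes the plan.

```ruby
# Hypothetical stand-in for a post row.
Post = Struct.new(:id, :votes)

posts = (1..20).map { |i| Post.new(i, i * 3) }
x = 30 # vote threshold
n = 4  # how many posts to return

# 1. WHERE votes >= x: keep every qualifying row.
candidates = posts.select { |post| post.votes >= x }

# 2. ORDER BY rand(): shuffle the whole candidate set.
# 3. LIMIT n: only now cut it down to n rows.
sample = candidates.shuffle.first(n)

puts "#{candidates.size} candidates, #{sample.size} returned"
```

The cost therefore grows with the number of qualifying posts, not with n, which is why a very large candidate set is worth checking with EXPLAIN.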
Look in development.log for the generated query; it should give you a clue.