I am receiving a list of IDs. Most of these already exist in a table, and I need to find which IDs are NOT in the table. This question has nothing to do with joins.
My API will receive a list of IDs, such as: [1, 2, 3, 4, 5]
Let's say there are three records in the table: [2, 3, 4]
The result I'm looking for is the array: [1, 5]
Our SQL brains jump quickly to something like the following, but clearly that's not what we need:
select * from widgets where id not in (1, 2, 3, 4, 5)
We don't need the records not in the list, we need the part of the list not in the records!
My fallback is to retrieve all records in the list and subtract from the list, something like this:
existing_ids = Widget.where(id: id_list).pluck(:id)
new_ids = id_list - existing_ids
That will work, but it feels heavy-handed, particularly if id_list has 100,000 entries and the table already contains 99,999 of them.
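For a sense of scale, the subtraction step itself is cheap: Ruby's Array#- compares elements by hash and eql?, so it runs in roughly linear time. A minimal sketch using the numbers from the question, where a plain array stands in for the pluck(:id) result (no database involved; the missing ID 50_000 is made up for illustration):

```ruby
# Stand-in data: 100,000 incoming IDs, of which the "table" holds 99,999.
id_list = (1..100_000).to_a
existing_ids = id_list.reject { |id| id == 50_000 } # pretend pluck(:id) result

# Array#- keeps the order of id_list and drops every element found in existing_ids.
new_ids = id_list - existing_ids

p new_ids # prints [50000]
```

So most of the cost in this approach is likely in moving ~100,000 IDs into the query and ~99,999 back out, not in the Ruby subtraction.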
I've searched around, and the only similar question I found is "ID from list that is not in a table", which did not offer a viable solution.
Is there any way to do this in a single SQL query? (Bonus points for an ActiveRecord solution!)
To compare the lists, either the input list needs to go into the database or the list of existing IDs needs to come out of it. You already tried the latter and didn't like it, so here's an alternative (PostgreSQL):
SELECT "id" FROM unnest('{1,2,3,4,5}'::integer[]) AS "id" WHERE "id" NOT IN (SELECT "id" FROM "widgets");
Not sure about performance.
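To run that query with an arbitrary incoming list, the array literal has to be built from the IDs. A minimal sketch, assuming PostgreSQL and integer IDs; missing_ids_sql is a hypothetical helper, and it guards against SQL injection by passing every element through Kernel#Integer, which raises for strings that are not integer literals:

```ruby
# Hypothetical helper: builds the unnest / NOT IN query from the answer above
# for any list of integer IDs (PostgreSQL syntax).
def missing_ids_sql(ids)
  # Integer() converts each element and raises (ArgumentError or TypeError)
  # for anything that cannot be read as an integer, so only numbers are
  # interpolated into the SQL string.
  literal = ids.map { |id| Integer(id) }.uniq.join(",")

  %(SELECT "id" FROM unnest('{#{literal}}'::integer[]) AS "id" ) +
    %(WHERE "id" NOT IN (SELECT "id" FROM "widgets"))
end

puts missing_ids_sql([1, 2, 3, 4, 5])
```

In Rails, the resulting string could be passed to ActiveRecord::Base.connection.select_values, which returns the values of the first result column as an array.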
Depending on how many records are in your database, the simplest thing might be to select all of the IDs and then subtract them from the incoming list in Ruby.
from_api = [1,2,3,4,5]
existing = Widget.pluck(:id) # => [2,3,4]
from_api.difference(existing) # => [1,5]
Obviously, if you have a substantial dataset, this will be less than optimal.
This should work.
from_api = [1,2,3,4,5]
existing = Widget.order(:id).ids # => [2,3,4]
new_ids = []
from_api.each{ |n| new_ids << n unless existing.include? n }
new_ids # => [1,5]
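Array#include? scans existing on every iteration, so the loop above is O(n × m). If existing is large, a Set gives hash-based lookups instead. A minimal sketch, with a plain array standing in for Widget.order(:id).ids:

```ruby
require "set"

from_api = [1, 2, 3, 4, 5]
existing = [2, 3, 4] # stand-in for Widget.order(:id).ids

# Build the lookup set once; Set#include? is a hash lookup, not a linear scan.
existing_set = existing.to_set
new_ids = from_api.reject { |n| existing_set.include?(n) }

p new_ids # prints [1, 5]
```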
or
from_api = [1,2,3,4,5]
existing = Widget.where(id: from_api).order(:id).ids # => [2,3,4]
# Walks both lists in step; assumes from_api is sorted ascending
from_api.map{ |n| n == existing.first ? (nil if existing = existing.drop(1)) : n }.compact # => [1,5]
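The second version relies on both arrays being sorted, and existing.drop(1) allocates a new array on every match. The same sorted-merge idea can be written with an index so each array is walked once. A sketch under the same assumption that both inputs are sorted ascending; sorted_difference is a hypothetical name:

```ruby
# Returns the elements of sorted array `wanted` that are absent from sorted
# array `present`, walking both arrays once (two-pointer merge).
def sorted_difference(wanted, present)
  result = []
  j = 0
  wanted.each do |n|
    # Skip present IDs smaller than n; they cannot match any later n either.
    j += 1 while j < present.size && present[j] < n
    result << n unless j < present.size && present[j] == n
  end
  result
end

p sorted_difference([1, 2, 3, 4, 5], [2, 3, 4])    # prints [1, 5]
p sorted_difference([1, 2, 3, 4, 5], [0, 2, 4, 6]) # prints [1, 3, 5]
```

Because smaller present IDs are skipped rather than compared only against the head, this also copes with IDs in present that never appear in wanted.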
Weighing the complexity of the unnest approach (for current and future developers), I decided that the simpler approach was warranted for my project. I didn't profile it, but I believe any performance gain would be minimal, if there is one at all.
Here is the solution I ended up with:
class Widget < ApplicationRecord
  # Returns the names from the given list that have no matching widget.
  def self.absent(names)
    uniq_names = names.uniq
    uniq_names - where(name: uniq_names).pluck(:name)
  end
end
And tests:
describe '.absent' do
  subject { described_class.absent(names) }

  let!(:widget1) { create(:widget, name: 'old-1') }
  let!(:widget2) { create(:widget, name: 'old-2') }
  let(:names) { %w[new-2 old-2 new-1 old-1 new-1 old-1] }

  it { is_expected.to eq %w[new-2 new-1] }
end
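The uniq call in absent matters because Array#- keeps duplicates from the receiver. A plain-Ruby illustration using the names from the spec, with a literal array standing in for the pluck(:name) result:

```ruby
names = %w[new-2 old-2 new-1 old-1 new-1 old-1]
existing = %w[old-1 old-2] # stand-in for where(name: ...).pluck(:name)

without_uniq = names - existing   # the duplicate "new-1" survives
with_uniq = names.uniq - existing # what Widget.absent computes

p without_uniq # prints ["new-2", "new-1", "new-1"]
p with_uniq    # prints ["new-2", "new-1"]
```

Deduplicating first also shortens the IN list sent to the database.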
I am building a Rails application, and I need to create some charts.
I am running this query to retrieve the answers from the user:
quiz = Quiz.select("answer1").where(completed: true).pluck(:answer1)
The query returns: [1, 2, 1, 1, 1]
I want to count the values and group them like this: { 1 => 4, 2 => 1 }
I have tried to use GROUP BY and COUNT, but it is not working. I could do this manually, but I would like to do it with SQL alone.
I remember how to use GROUP BY and COUNT in SQL, but I am not sure how to do this in Rails.
You can call group('answer1') before count:
Quiz.where(completed: true).group('answer1').count
Hope it helps!
Try this one:
Quiz.where(completed: true).group(:answer1).count(:answer1)
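Both answers push the counting into SQL (GROUP BY with COUNT), and ActiveRecord returns a hash keyed by the answer1 values. For comparison, the manual fallback the question mentions is a one-liner in Ruby with Array#tally (Ruby 2.7+), using the array from the question:

```ruby
answers = [1, 2, 1, 1, 1] # result of pluck(:answer1) from the question

# tally counts the occurrences of each distinct value.
counts = answers.tally # { 1 => 4, 2 => 1 }
p counts
```

The SQL version avoids loading every answer into Ruby, which matters once the quizzes table grows.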
BACKGROUND: Posts have many Communities through CommunityPosts. I understand the following query returns posts associated with ANY ONE of these community_ids.
Post.joins(:communities).where(communities: { id: [1,2,3] })
OBJECTIVE: I'd like to query for posts associated with ALL THREE community_ids in the array, that is, posts that have communities 1, 2, and 3 as associations.
EDIT: Please assume that the length of the array is unknown; I used this array for illustration only.
Try this:
ids = [...]
Post.joins(:communities).
  where(communities: { id: ids }).
  group('posts.id').
  having('COUNT(DISTINCT communities.id) = ?', ids.uniq.size)
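Grouping by post and comparing a count in HAVING is a form of relational division: keep only the posts whose number of distinct matching communities equals the number of requested IDs. The same counting logic in plain Ruby, with a hypothetical in-memory list of [post_id, community_id] rows standing in for the community_posts join table:

```ruby
# Hypothetical join-table rows: [post_id, community_id]
community_posts = [
  [10, 1], [10, 2], [10, 3], # post 10 is in all three communities
  [11, 1], [11, 2],          # post 11 is missing community 3
  [12, 3], [12, 3]           # duplicate rows for post 12
]
ids = [1, 2, 3]

# WHERE communities.id IN (ids)
matching = community_posts.select { |_post_id, community_id| ids.include?(community_id) }

# GROUP BY posts.id HAVING COUNT(DISTINCT communities.id) = ids.uniq.size
post_ids = matching
  .group_by { |post_id, _community_id| post_id }
  .select { |_post_id, rows| rows.map(&:last).uniq.size == ids.uniq.size }
  .keys

p post_ids # prints [10]
```

Counting distinct community IDs is what keeps duplicate join rows (post 12) from being mistaken for extra matches.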
ids = [1, 2, 3] # etc.
Post.joins(:communities).where("communities.id IN (?)", ids)
Hope it helps.
I have tables for users and roles, connected with a has_many :through association. I am trying to create a query that will find users that have all of the roles in an array.
For example:
role_ids = [2, 4, 6]
User.filter(role_ids) would return all users that have the roles with IDs 2, 4, and 6.
This is what I have so far.
def self.filter(role_ids)
  results = User.joins(:roles).where(roles: { id: role_ids })
end
The problem with this statement is it returns all users who have at least one of the roles in role_ids.
How do I make this statement give me an intersection, not a union?
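In set terms, the question asks for users whose role IDs are a superset of role_ids, rather than users whose roles merely overlap it. A small in-memory illustration of the difference, with a hypothetical hash of user ID to role IDs standing in for the users, roles, and join tables:

```ruby
require "set"

# Hypothetical data: user_id => role IDs assigned through the join table
user_roles = {
  1 => [2, 4, 6, 8],
  2 => [2, 4],
  3 => [5]
}
role_ids = [2, 4, 6]
wanted = role_ids.to_set

# Union-style match (what the joins/where query returns): any shared role
any_role = user_roles.select { |_id, roles| roles.any? { |r| wanted.include?(r) } }.keys

# Intersection-style match (what the question asks for): every requested role
all_roles = user_roles.select { |_id, roles| roles.to_set.superset?(wanted) }.keys

p any_role  # prints [1, 2]
p all_roles # prints [1]
```

In SQL, the "every requested role" version is usually expressed by grouping on users.id and comparing COUNT(DISTINCT roles.id) with the number of requested role IDs in a HAVING clause.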
I think you are asking for only unique instances of users who meet the role filter criteria. If so, then this should work.
def self.filter(role_ids)
  # distinct replaces Relation#uniq, which was removed in Rails 5.1
  results = User.joins(:roles).where(roles: { id: role_ids }).distinct
end
The setup I have has publications, drafts, and live versions. Publication has a polymorphic belongs_to since many different types of objects can be drafted.
# Publication.all
#<Publication id: 1, publishable_id: 2, publishable_type: "Foo", original_id: 1, original_type: "Foo">
# published scope on Foo
select('*, MAX(publications.created_at)').
joins(:publications).
group('publications.original_id')
# Foo.published.all
[#<Foo id: 1, ...>]
Here is the published scope's to_sql:
SELECT *, MAX(publications.created_at)
FROM "foos"
INNER JOIN "publications"
ON "publications"."publishable_id" = "foos"."id"
AND "publications"."publishable_type" = 'Foo'
GROUP BY publications.original_id
Because there is only one publication, and its publishable_id is 2, I expect this query to return the second Foo. But when I call the published scope on Foo, I get the first one instead. How is this possible? I thought an INNER JOIN would limit the results to rows where the join condition is satisfied. How am I getting the exact opposite of what I'm looking for?
Something interesting: just performing the joins returns the correct result:
self.class.unscoped.joins(:publications)
However, the published scope (shown above) returns the incorrect result. Is something happening with the SELECT or GROUP BY parts of the query that is causing this?