This is the code from the Ruby on Rails Tutorial by MH:
def feed
following_ids = "SELECT followed_id FROM relationships
WHERE follower_id = :user_id"
Micropost.where("user_id IN (#{following_ids})
OR user_id = :user_id", user_id: id)
end
Is this SQL safe? Many people have told me never to use interpolation and always to use escaped parameters (with ? in this case). So is this code safe?
Yes, this is safe.
No outside data is being interpolated here; in fact, the whole query could be written as
Micropost.where("user_id IN (
SELECT followed_id FROM relationships
WHERE follower_id = :user_id)
OR user_id = :user_id", user_id: id)
but for the sake of clarity, the first query was extracted into its own variable.
Interpolation must be avoided when the interpolated string comes from the outside. This string is constructed by you, right here, hence there is no risk of SQL injection or the like.
Examples
safe, id is determined:
id = 42
"SELECT * FROM users WHERE users.id = #{id}"
unsafe, params[:id] comes from the outside and might be dangerous:
"SELECT * FROM users WHERE users.id = #{params[:id]}"
This is safe because the string interpolation itself is not the issue. It only leads to a security vulnerability if you do not control the text that gets interpolated into the query.
In your example, the string following_ids that is inserted is not an unknown user input but a fixed SQL subquery. This cannot lead to a security issue.
But I agree that this is still not a good example and should probably be refactored to use scopes and Rails query syntax.
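To make the distinction concrete, here is a minimal plain-Ruby sketch (no Rails required). The helper names interpolated_query and guarded_query are hypothetical and exist only for this illustration. It shows that interpolating a value you chose yourself yields exactly the query you expect, while interpolating outside text changes the shape of the WHERE clause.

```ruby
# Hypothetical helper: builds the query by plain interpolation, trusting its input.
def interpolated_query(id)
  "SELECT * FROM users WHERE users.id = #{id}"
end

# Hypothetical helper: Integer() raises ArgumentError unless the whole value
# is a number, so arbitrary SQL text never reaches the query string.
def guarded_query(id)
  "SELECT * FROM users WHERE users.id = #{Integer(id)}"
end

puts interpolated_query(42)           # value chosen by the developer
puts interpolated_query("42 OR 1=1")  # outside input rewrites the WHERE clause

begin
  guarded_query("42 OR 1=1")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

In a Rails app, the named placeholder used in the question (:user_id) is the usual way to pass outside values; the Integer() guard here is only a stand-in to keep the sketch self-contained.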
I am building a Rails app with the following models:
# vote.rb
class Vote < ApplicationRecord
  belongs_to :person
  belongs_to :show
  scope :fulfilled, -> { where(fulfilled: true) }
  scope :unfulfilled, -> { where(fulfilled: false) }
end

# person.rb
class Person < ApplicationRecord
  has_many :votes, dependent: :destroy

  def self.order_by_votes(show = nil)
    count = 'nullif(votes.fulfilled, true)'
    count = "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end" if show
    people = left_joins(:votes).group(:id).uniq!(:group)
    people = people.select("people.*, COUNT(#{count}) AS people.vote_count")
    people.order('people.vote_count DESC')
  end
end
The idea behind order_by_votes is to sort People by the number of unfulfilled votes, either counting all votes, or counting only votes associated with a given Show.
This seems to work fine when I test against SQLite. But when I switch to Postgres I get this error:
Error:
PeopleControllerIndexTest#test_should_get_previously_on_show:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column people.vote_count does not exist
LINE 1: ...s"."show_id" = $1 GROUP BY "people"."id" ORDER BY people.vot...
^
If I dump the SQL using #people.to_sql, this is what I get:
SELECT people.*, COUNT(nullif(votes.fulfilled, true)) AS people.vote_count FROM "people" LEFT OUTER JOIN "votes" ON "votes"."person_id" = "people"."id" GROUP BY "people"."id" ORDER BY people.vote_count DESC
Why is this failing on Postgres but working on SQLite? And what should I be doing instead to make it work on Postgres?
(PS: I named the field people.vote_count, with a dot, so I can access it in my view without having to do another SQL query to actually view the vote count for each person in the view (not sure if this works) but I get the same error even if I name the field simply vote_count.)
(PS2: I recently added the .uniq!(:group) because of some deprecation warning for Rails 6.2, but I couldn't find any documentation for it so I am not sure I am doing it right, still the error is there without that part.)
Are you sure you're not getting a syntax error from PostgreSQL somewhere? If I run something like this:
select count(*) as t.vote_count from t ... order by t.vote_count
I get a syntax error before PostgreSQL gets to complain about there being no t.vote_count column.
No matter, the solution is to not try to put your vote_count in the people table:
people = people.select("people.*, COUNT(#{count}) AS vote_count")
...
people.order(vote_count: :desc)
You don't need it there; you'll still be able to reference vote_count just like any "normal" column in people. Anything in the select list will appear as an accessor on the resulting model instances, whether it is a column or not. Such values won't show up in the #inspect output (since that's generated from the table's columns), but you can call the accessor methods nonetheless.
Historically there have been quite a few AR problems (and bugs) in getting the right count by just using count on a scope, and I am not sure they are actually all gone.
That depends on the scope (AR version, relations, group, sort, uniq, etc.). A default count call that a gem has to use generically on a scope is not a one-size-fits-all solution. For that known reason, Pagy allows you to pass the right count to its pagy method, as explained in the Pagy documentation.
Your scope might become complex and the default pagy collection.count(:all) may not get the actual count. In that case you can get the right count with some custom statement, and pass it to pagy.
@pagy, @records = pagy(collection, count: your_count)
Notice: pagy will efficiently skip its internal count query and will just use the passed :count variable.
So... just get your own calculated count and pass it to pagy, and it will not even try to use the default.
EDIT: I forgot to mention: you may want to try the pagy arel extra that:
adds specialized pagination for collections from sql databases with GROUP BY clauses, by computing the total number of results with COUNT(*) OVER ().
Thanks to all the comments and answers I have finally found a solution which I think is the best way to solve this.
First off, the issue occurred when I called pagy, which tried to count my scope by appending .count(:all). This is what caused the errors. The solution was to stop referencing the alias created in select() from .order(), and to repeat the COUNT expression there instead.
So here is the proper code:
def self.order_by_votes(show = nil)
  count = if show
            "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end"
          else
            'nullif(votes.fulfilled, true)'
          end
  left_joins(:votes).group(:id)
    .uniq!(:group)
    .select("people.*, COUNT(#{count}) as vote_count")
    .order(Arel.sql("COUNT(#{count}) DESC"))
end
This sorts people by their number of unfulfilled votes, with the ability to count only votes for a given show. It works with pagy() and with pagy_arel(), which in my case is a much better fit, so the results can be properly paginated.
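For readers who want to check what the two COUNT expressions compute, here is a plain-Ruby sketch of the same logic over in-memory rows. The Vote struct, the VOTES sample data, and this standalone order_by_votes function are hypothetical stand-ins for the tables and are not part of the app. COUNT skips NULLs, so each SQL expression amounts to counting the rows for which a condition holds.

```ruby
# Hypothetical stand-in for one row of the votes table.
Vote = Struct.new(:person_id, :show_id, :fulfilled)

# Hypothetical sample data.
VOTES = [
  Vote.new(1, 11, false), Vote.new(1, 11, false), Vote.new(1, 10, false),
  Vote.new(2, 10, false), Vote.new(2, 10, false), Vote.new(2, 10, true),
  Vote.new(3, 11, true)
].freeze

# Returns [person_id, vote_count] pairs, highest count first.
# Passing show_id mimics the CASE WHEN branch; leaving it nil mimics
# nullif(votes.fulfilled, true), where every unfulfilled vote counts.
def order_by_votes(person_ids, votes, show_id = nil)
  counts = person_ids.map do |pid|
    n = votes.count do |v|
      v.person_id == pid && !v.fulfilled && (show_id.nil? || v.show_id == show_id)
    end
    [pid, n] # people without matching votes keep a count of 0, as with the LEFT JOIN
  end
  # Ties are broken by id only to keep the output deterministic.
  counts.sort_by { |pid, n| [-n, pid] }
end

p order_by_votes([1, 2, 3], VOTES)
p order_by_votes([1, 2, 3], VOTES, 10)
```

With this sample data, person 1 leads when all shows are counted, while person 2 leads when only show 10 is counted.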
So here are my models:
class Event(models.Model):
    user = models.ForeignKey(User, blank=True, null=True, db_index=True)
    name = models.CharField(max_length=200, db_index=True)
    platform = models.CharField(choices=(("ios", "ios"), ("android", "android")), max_length=50)

class User(AbstractUser):
    email = models.CharField(max_length=50, null=False, blank=False, unique=True)
Event is like an analytics event, so it's very possible that I could have multiple events for one user, some with platform=ios and some with platform=android, if a user has logged in on multiple devices. I want to query to see how many users have both ios and android devices. So I wrote a query like this:
User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()
Which returns 0 results. I know this isn't correct. I then thought I would try to just query for iOS users:
User.objects.filter(Q(event__platform="ios")).count()
Which returned 6,717,622 results, which is unexpected because I only have 39,294 users. I'm guessing it's not counting the Users, but counting the Event instances, which seems like incorrect behavior to me. Does anyone have any insights into this problem?
You can use annotations instead:
from django.db.models import Count
User.objects.all().annotate(events_count=Count('event')).filter(events_count=2)
This keeps only the users that have exactly two events.
You can also use chained filters:
User.objects.filter(event__platform='android').filter(event__platform='ios')
The first filter gets all users with an Android event, and the second one narrows those down to users that also have an iOS event.
This is generally an answer for a queryset with two or more conditions related to children objects.
Solution: A simple solution with two subqueries is possible, even without any join:
base_subq = Event.objects.values('user_id').order_by().distinct()
user_qs = User.objects.filter(
Q(pk__in=base_subq.filter(platform="android")) &
Q(pk__in=base_subq.filter(platform="ios"))
)
The method .order_by() is important if the model Event has a default ordering (see it in the docs about distinct() method).
Notes:
Verify the only SQL request that will be executed: (Simplified by removing "app_" prefix.)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user WHERE (
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'android')
AND
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'ios')
)
The function Q() is used because the same condition parameter (pk__in) cannot be repeated in the same filter() call; chained filters could be used instead: .filter(...).filter(...). (The order of the filter conditions is not important: the SQL server's optimizer decides the execution order based on its own estimates.)
The temporary variable base_subq is an "alias" queryset, used only to avoid repeating the shared part of the expression; it is never evaluated on its own.
One join between User (parent) and Event (child) wouldn't be a problem, and a solution with one subquery is also possible, but a join of Event with Event (a join with a repeated child object or with two child objects) should be avoided by using a subquery in any case. Two subqueries are nice for readability because they demonstrate the symmetry of the two filter conditions.
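The reasoning behind the two subqueries can be sketched outside the ORM. The following plain-Ruby illustration uses a hypothetical Event struct and EVENTS sample data (it is not Django code): each subquery is a set of distinct user ids, and the final filter is their intersection. It also shows why a single condition on one joined row matches nothing, and why counting joined rows overstates the number of users.

```ruby
require "set"

# Hypothetical stand-in for one row of the event table.
Event = Struct.new(:user_id, :platform)

# Hypothetical sample data: user 1 has both platforms.
EVENTS = [
  Event.new(1, "ios"), Event.new(1, "ios"), Event.new(1, "android"),
  Event.new(2, "ios"),
  Event.new(3, "android")
].freeze

# Like SELECT DISTINCT user_id FROM event WHERE platform = ...
def user_ids_on(events, platform)
  events.select { |e| e.platform == platform }.map(&:user_id).to_set
end

# Like "pk IN (android subquery) AND pk IN (ios subquery)".
def users_on_both(events)
  user_ids_on(events, "android") & user_ids_on(events, "ios")
end

# A single event row never has both platforms at once, so requiring
# both conditions on the same joined row matches nothing.
def rows_matching_both(events)
  events.count { |e| e.platform == "ios" && e.platform == "android" }
end

puts users_on_both(EVENTS).to_a.inspect
puts rows_matching_both(EVENTS)
puts EVENTS.count { |e| e.platform == "ios" } # joined rows, not distinct users
puts user_ids_on(EVENTS, "ios").size          # distinct users
```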
Another solution with two nested subqueries
This asymmetric solution can be faster if we know that one subquery (the one we put innermost) has a much more restrictive filter than the other subquery, which would return a huge set of results (for example, if the number of Android users were huge).
ios_user_ids = (Event.objects.filter(platform="ios")
.values('user_id').order_by().distinct())
user_ids = (Event.objects.filter(platform="android", user_id__in=ios_user_ids)
.values('user_id').order_by().distinct())
user_qs = User.objects.filter(pk__in=user_ids)
Verify how it is compiled to SQL: (simplified again by removing app_ prefix and ".)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user
WHERE user.id IN (
SELECT DISTINCT V0.user_id FROM event V0
WHERE V0.platform = 'android' AND V0.user_id IN (
SELECT DISTINCT U0.user_id FROM event U0
WHERE U0.platform = 'ios'
)
)
(These solutions work also in an old Django e.g. 1.8. A special subquery function Subquery() exists since Django 1.11 for more complicated cases, but we didn't need it for this simple question.)
I have updated this question
I have the following SQL scope in a Rails 4 app. It works, but it has a couple of issues:
1) It's really raw SQL and not the Rails way
2) The string interpolation opens up the risk of SQL injection
here is what I have:
scope :not_complete, ->(user_id) { joins("WHERE id NOT IN
(SELECT modyule_id FROM completions WHERE user_id = #{user_id})")}
The relationship is many to many, using a join table called completions for matching id(s) on relationships between users and modyules.
Any help with making this more Rails-y, and with setting it up to take the user_id argument without the risk, would be appreciated, so that I can call it like:
Modyule.not_complete("1")
Thanks!
You should have added some info about the models and their associations. Anyway, here's my attempt; it might have some errors because I don't know whether the association is one-to-many or many-to-many.
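# Braces are used because a do...end block would bind to scope, not to lambda.
scope :not_complete, lambda { |user_id|
  joins(:completion).where.not( # or :completions ?
    id: Completion.where(user_id: user_id).pluck(:modyule_id) # column name as a symbol
  )
}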
PS: I split it across multiple lines just for readability; you can change it back to a one-liner if you like.
I have this use case where I get the symbolized deep associations from a certain model, and I have to perform certain queries that involve using outer joins. How can one do it WITHOUT resorting to write the full SQL by hand?
Answers I don't want:
- using includes (doesn't solve deep associations very well ( .includes(:cars => [:windows, :engine => [:ignition]..... works unexpectedly ) and I don't want its side-effects)
- writing the SQL myself (sorry, it's 2013, cross-db support, etc etc..., and the objects I fetch are read_only, more side-effects)
I'd like to have an Arel solution. I know that using the arel_table's from the models I can construct SQL expressions, and there's also a DSL for the joins, but somehow I cannot use it in the joins method from the model:
car = Car.arel_table
engine = Engine.arel_table
eng_exp = car.join(engine).on(car[:engine_id].eq(engine[:id]))
eng_exp.to_sql #=> GOOD! very nice!
Car.joins(eng_exp) #=> Breaks!!
Why this doesn't work is beyond me. I don't know exactly what is missing, but it's the closest thing to a solution I have now. If somebody could help me complete my example, provide a nice workaround, or tell me when Rails will include such an obviously necessary feature, they will have my everlasting gratitude.
This is an old question, but for the benefit of anyone finding it through search engines:
If you want something you can pass into .joins, you can either use .create_join and .create_on:
join_on = car.create_on(car[:engine_id].eq(engine[:id]))
eng_join = car.create_join(engine, join_on, Arel::Nodes::OuterJoin)
Car.joins(eng_join)
OR
use the .join_sources from your constructed join object:
eng_exp = car.join(engine, Arel::Nodes::OuterJoin).on(car[:engine_id].eq(engine[:id]))
Car.joins(eng_exp.join_sources)
I found a blog post that purports to address this problem: http://blog.donwilson.net/2011/11/constructing-a-less-than-simple-query-with-rails-and-arel/
Based on this (and my own testing), the following should work for your situation:
car = Car.arel_table
engine = Engine.arel_table
sql = car.project(car[Arel.star])
.join(engine, Arel::Nodes::OuterJoin).on(car[:engine_id].eq(engine[:id]))
Car.find_by_sql(sql)
If you don't mind adding a dependency and skipping AREL altogether, you could use Ernie Miller's excellent Squeel gem. It would be something like
Car.joins{engine.outer}.where(...)
This would require that the Car model be associated with Engine like so:
belongs_to :engine
SQL:
select * from user where room_id not in (select id from rooms);
what is the same equivalent query in rails console with this sql?
ex:
User.all.each { |u| user.room }
(sorry, but this example is not correct.)
You can translate it almost literally:
User.where('room_id not in (select id from rooms)').all
The where clause is quite flexible in what it accepts.
User.where("room_id not in (select id from rooms)")
but you probably want this instead, since it is likely to be faster:
User.where("not exists (select id from rooms where rooms.id = users.room_id)")
That's the closest you can get. At the time of writing there appears to be no way to build an Active Record query that translates to SQL NOT() (where.not was only added in Rails 4). A search on the subject returns a bunch of SO questions with much the same answer.
You could do something like
User.all.select { |u| !Room.find_by_id(u.room_id) }
But that could be less efficient again.
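The cost of the in-memory variant comes from issuing one lookup per user. As a rough plain-Ruby sketch of the alternative (the User struct, the sample data, and the users_without_room helper are hypothetical and stand in for the real tables), the room ids can be loaded once into a Set and every user checked against it:

```ruby
require "set"

# Hypothetical stand-in for one row of the users table.
User = Struct.new(:id, :room_id)

# Hypothetical sample data: user 2 points at a room that does not exist.
USERS = [User.new(1, 10), User.new(2, 99), User.new(3, 11)].freeze
ROOM_IDS = [10, 11, 12].freeze

# Keeps the users whose room_id is not among the known room ids.
def users_without_room(users, room_ids)
  known = room_ids.to_set # built once, then reused for every user
  users.reject { |u| known.include?(u.room_id) }
end

puts users_without_room(USERS, ROOM_IDS).map(&:id).inspect
```

This still pulls every user into memory, so for large tables the SQL subquery remains the better fit.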
I don't know if you are familiar with the squeel gem. It allows you to build very complex SQL queries in pure Ruby code. In your case it should be as simple as the following code (after adding the gem 'squeel' line to your Gemfile and running bundle install):
room_ids = Room.select{id}; User.where{room_id.not_in room_ids}
Worth trying, isn't it?
Here's the squeel's page.