How to send table names as parameters to a function which performs a join on them? - sql

Currently I use the following code for joining tables:
Booking.joins(:table1, :table2, :table3, :table4).other_queries
However, the number of tables to be joined depends on certain conditions, and other_queries also forms a very long chain. So I am duplicating a lot of code just because I need to perform the joins differently.
So I want to implement something like this:
def method(params)
  Booking.joins(params).other_queries
end
How can this be done?

Maybe just Booking.joins(*params).other_queries is what you need?
The splat operator * expands an array into a list of arguments, for example:
arr = [1, 2, 3]
any_method(*arr) # equivalent to any_method(1, 2, 3)
However, if params is something that came from the user, I recommend you not trust it; it could be a security issue. But if you trust it, or filter it against a whitelist first, this works fine.

SAFE_JOINS = [:table1, :table2, :table3]

def method(params)
  booking = Booking.scoped # Rails 3; use Booking.all on Rails 4+
  (params[:joins] & SAFE_JOINS.map(&:to_s)).each do |j|
    booking = booking.joins(j.intern)
  end
  booking # return the relation so other_queries can still be chained onto it
end
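Usage would look something like this (a hypothetical call; the intersection with SAFE_JOINS silently drops anything that is not whitelisted):
bookings = method(joins: %w[table1 evil_table])
# only "table1" survives the whitelist, so only that join is applied
bookings.other_queries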

Related

Django Q Queries & on the same field?

So here are my models:
class Event(models.Model):
    user = models.ForeignKey(User, blank=True, null=True, db_index=True)
    name = models.CharField(max_length=200, db_index=True)
    platform = models.CharField(choices=(("ios", "ios"), ("android", "android")), max_length=50)

class User(AbstractUser):
    email = models.CharField(max_length=50, null=False, blank=False, unique=True)
Event is like an analytics event, so it's very possible that I could have multiple events for one user, some with platform=ios and some with platform=android, if a user has logged in on multiple devices. I want to query to see how many users have both ios and android devices. So I wrote a query like this:
User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()
Which returns 0 results. I know this isn't correct. I then thought I would try to just query for iOS users:
User.objects.filter(Q(event__platform="ios")).count()
Which returned 6,717,622 results, which is unexpected because I only have 39,294 users. I'm guessing it's not counting the Users, but counting the Event instances, which seems like incorrect behavior to me. Does anyone have any insights into this problem?
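A quick way to verify that guess (a sketch: the JOIN emits one row per matching Event, and distinct() collapses the duplicates back to unique users):
User.objects.filter(Q(event__platform="ios")).distinct().count()
# should now be close to the real number of iOS users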
You can use annotations instead:
from django.db.models import Count
User.objects.all().annotate(events_count=Count('event')).filter(events_count=2)
This keeps any user that has exactly two events. Note that it counts all of a user's events, so it only identifies both-platform users if each user logs at most one event per platform.
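If users can log many events per platform, a variant that counts distinct platforms instead of raw events is closer to the original question (a sketch, assuming the models above):
from django.db.models import Count

# users whose events span both platforms
User.objects.annotate(
    platform_count=Count('event__platform', distinct=True)
).filter(platform_count=2)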
You can also use chained filters:
User.objects.filter(event__platform='android').filter(event__platform='ios')
The first filter gets all the users with an Android event, and the second narrows them down to the users that also have an iOS event.
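Note that each chained filter on a multi-valued relation adds its own join, so the row multiplication from the question can reappear when counting; adding distinct() guards against that (a sketch):
User.objects.filter(event__platform='android').filter(event__platform='ios').distinct().count()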
This is generally an answer for a queryset with two or more conditions related to child objects.
Solution: A simple solution with two subqueries is possible, even without any join:
base_subq = Event.objects.values('user_id').order_by().distinct()
user_qs = User.objects.filter(
    Q(pk__in=base_subq.filter(platform="android")) &
    Q(pk__in=base_subq.filter(platform="ios"))
)
The .order_by() call is important if the model Event has a default ordering (see the docs on the distinct() method).
Notes:
Verify the single SQL query that will be executed (simplified by removing the "app_" table prefix):
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user WHERE (
    user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'android')
    AND
    user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'ios')
)
The function Q() is used because the same keyword argument (pk__in) cannot be repeated within one filter() call; chained filters (.filter(...).filter(...)) could be used instead. (The order of the filter conditions is not important; the execution plan is decided by the SQL server's query optimizer anyway.)
The temporary variable base_subq is an "alias" queryset that exists only to avoid repeating the same part of the expression; it is never evaluated on its own.
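For reference, the chained-filter form of the same solution would be (a sketch; pk__in does not traverse a relation, so no extra join is created and the SQL is the same pair of IN subqueries):
user_qs = (User.objects
           .filter(pk__in=base_subq.filter(platform="android"))
           .filter(pk__in=base_subq.filter(platform="ios")))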
One join between User (parent) and Event (child) wouldn't be a problem, and a solution with one subquery is also possible, but a join of Event with Event (a join with a repeated child object, or with two child objects) should be avoided by a subquery in any case. Two subqueries are nice for readability, demonstrating the symmetry of the two filter conditions.
Another solution with two nested subqueries: this non-symmetric solution can be faster if we know that one subquery (the one we put innermost) has a much more restrictive filter than the other necessary subquery with its huge result set (for example, if the number of Android users were huge).
ios_user_ids = (Event.objects.filter(platform="ios")
                .values('user_id').order_by().distinct())
user_ids = (Event.objects.filter(platform="android", user_id__in=ios_user_ids)
            .values('user_id').order_by().distinct())
user_qs = User.objects.filter(pk__in=user_ids)
Verify how it is compiled to SQL (simplified again by removing the app_ prefix and the quoting):
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user
WHERE user.id IN (
    SELECT DISTINCT V0.user_id FROM event V0
    WHERE V0.platform = 'ios' AND V0.user_id IN (
        SELECT DISTINCT U0.user_id FROM event U0
        WHERE U0.platform = 'android'
    )
)
(These solutions also work in old Django versions, e.g. 1.8. The special subquery expression Subquery() has existed since Django 1.11 for more complicated cases, but we didn't need it for this simple question.)

Why does the where() method run SQL queries after all nested relations are eager-loaded?

In my controller method for the index view I have the following line.
@students_instance = Student.includes(:memo_tests => { :memo_target => :memo_level })
So for each Student I eager-load all necessary info.
Later on in a .map block, I call the .where() method on one of the relations as shown below.
@all_students = @students_instance.map do |student|
...
last_pass = student.memo_tests.where(:result => true).last.created_at.utc
difference_in_weeks = ((last_pass.to_i - current_date.to_i) / 1.week).round
...
end
This leads to a separate SQL query for each student, and since I have 300+ students, that means very slow load times and 300+ SQL queries.
Am I right in thinking that this is caused by the .where() method? I think this because I have checked everything else, and these are the two lines that cause all of the queries.
More importantly, is there a better way to do this that reduces these queries to a single query?
The moment you call where, the statement is translated into a SQL query; the eager-loaded records are not used. Normally the result should be SQL-cached...
Anyway, to be sure, you can put that logic in Ruby instead of in the statement. That way you are not requesting a new SQL statement:
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.sort.last
EDIT
I see the OP's question does not require sorting, so leaving the sorting out:
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.last
compact is required to remove nil results from the array.
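If the association's load order is not guaranteed, a hedged variant picks the latest pass explicitly with max instead of relying on last (same in-memory approach; no extra SQL, since memo_tests is already eager-loaded):
# to_a works on the records already loaded by includes
last_pass = student.memo_tests.to_a
                   .select(&:result)
                   .map(&:created_at)
                   .max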

Django ORM Cross Product

I have three models:
class Customer(models.Model):
    pass

class IssueType(models.Model):
    pass

class IssueTypeConfigPerCustomer(models.Model):
    customer = models.ForeignKey(Customer)
    issue_type = models.ForeignKey(IssueType)

    class Meta:
        unique_together = [('customer', 'issue_type')]
How can I find all tuples of (customer, issue_type) for which there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
    for issue_type in IssueType.objects.all():
        # customers that have no config row for this particular issue type
        for customer in Customer.objects.exclude(
            issuetypeconfigpercustomer__issue_type=issue_type
        ):
            yield customer, issue_type

missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
"I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred."
In Django, every queryset yields either model instances or dicts (values() querysets), so it is impossible to return the format you want (a list of tuples of model instances) without some Python (and possibly multiple trips to the database).
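That said, the Python part can stay tiny: pull the existing pairs once and take the set difference in memory. A sketch (three queries total; an alternative to the extra()-based approach below):
customers = list(Customer.objects.all())
issue_types = list(IssueType.objects.all())
existing = set(
    IssueTypeConfigPerCustomer.objects.values_list('customer_id', 'issue_type_id')
)
missing_pairs = [
    (c, i) for c in customers for i in issue_types
    if (c.id, i.id) not in existing
]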
The closest thing to a cross product is the extra() method without a where parameter, but it involves raw SQL and knowing the underlying table name of the other model:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, issue_type_id, containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issue_type_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can get something close to what you want; this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType, just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
    (customer, IssueType.objects.get(pk=customer.issue_type_id))
    for customer in Customer.objects.extra(
        select={"issue_type_id": 'appname_issuetype.id'},
        tables=['appname_issuetype'],
        where=["""
            NOT EXISTS (
                SELECT 1
                FROM appname_issuetypeconfigpercustomer
                WHERE issue_type_id=appname_issuetype.id
                AND customer_id=appname_customer.id
            )
        """]
    )
]
Not only does this require the same number of trips to the database as the generator-based version, but IMHO it is also less portable, less readable, and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype'],
    where=["""
        NOT EXISTS (
            SELECT 1
            FROM appname_issuetypeconfigpercustomer
            WHERE issue_type_id=appname_issuetype.id
            AND customer_id=appname_customer.id
        )
    """]
)
issue_list = dict(
    (issue.id, issue)
    for issue in IssueType.objects.filter(
        pk__in=set(m.issue_type_id for m in missing)
    )
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.

Rails 3 Applying limit and offset to subquery

I have a query that goes something like this (in song.rb):
def self.new_songs
  Song.where(id: Song.grouped_order_published).select_important_stuff
end
Later on in my app, the limit and offset are then applied, let's say in the controller:
@songs = Song.new_songs.limit(10).offset(10)
The way my app is structured, I'd like to keep this method of setting things, but unfortunately it is really slow as it is limiting the outer query rather than the subquery.
Is there a way I can expose the subquery such that it receives the limit and offset rather than the outer query?
Edit: I should add I am using postgres 9.2.
Edit 2: The reason why I want to do it in this fashion is that I am doing pagination, and I need the count of the total number of rows. So I do something like this:
@songs = Song.new_songs
...
@pages = @songs.count / 10
...
render @songs.limit(params[:page]).offset(0)
If I were to change it somehow, I'd have to redo this entirely (which is in a ton of places). By not limiting it until it's actually called, I can do the count in between and then get just the page at the end. I guess I'm looking more for advice on how this can be done with the inner query, without becoming horribly slow as the database grows.
I could not try this solution and I am not a Ruby expert either, but as far as I understand the problem, you would need an object that passes all method calls except limit and offset on to the full query, while storing the limited sub_query in the meantime.
It could probably look like this:
class LimitedSubquery
  # sub_query has to be stored so we can limit/offset it
  def initialize(sub_query)
    @sub_query = sub_query
  end

  # Make sure everybody knows we can be used like a query
  def respond_to?(symbol, include_private = false)
    super || full_query.respond_to?(symbol, include_private)
  end

  # Missing methods are probably meant to be called on the whole query
  def method_missing(method_sym, *arguments, &block)
    if full_query.respond_to?(method_sym)
      full_query.send(method_sym, *arguments, &block)
    else
      super
    end
  end

  # Generate the full query for execution
  def full_query
    Song.where(id: @sub_query).select_important_stuff
  end

  # Apply limit to the sub_query, not the outer query
  def limit(*number)
    LimitedSubquery.new(@sub_query.limit(*number))
  end

  # Apply offset to the sub_query, not the outer query
  def offset(*number)
    LimitedSubquery.new(@sub_query.offset(*number))
  end
end
And then call it like:
def self.new_songs
  LimitedSubquery.new(Song.grouped_order_published)
end
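With that in place, the controller code from the question should work unchanged (a sketch; count and any other relation methods fall through to full_query via method_missing, while limit/offset wrap the subquery):
@songs = Song.new_songs             # a LimitedSubquery, not a bare relation
@pages = @songs.count / 10          # count is forwarded to full_query
render @songs.limit(10).offset(10)  # limit/offset are applied to the subquery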
Please edit me if I got something wrong!
You should consider using the will_paginate gem. It keeps you away from the hassle of calculating all this by hand ;-)
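A minimal sketch of that (paginate and total_pages are will_paginate's documented API; the gem issues the count query for you):
@songs = Song.new_songs.paginate(page: params[:page], per_page: 10)
@pages = @songs.total_pages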

Rails 3 - Is there a way to do NOT LIKE?

I previously asked a question regarding pulling specific items out of a database if they contained a specific word in their string, and someone kindly offered the following, which did just the job:
class SomeModel < ActiveRecord::Base
  scope :contains_city,
    lambda { |city| where("some_models.address LIKE ?", "%" + city + "%") }
end
However, I have some instances where I would like to do the opposite, i.e. pull out all the items which do not have the specified word in their string. Is there a way to do a NOT LIKE condition? I have previously seen people use != for NOT EQUALS, but have had no success along those lines with LIKE. Is there an equivalent, or is it best to iterate through the database, putting items into two separate collections based on whether they satisfy the LIKE condition?
You could try NOT LIKE in your query; MySQL supports this.
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html
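Applied to the scope from the question, that would look something like this (a sketch; note that NOT LIKE also excludes rows where address is NULL, which may or may not be what you want):
class SomeModel < ActiveRecord::Base
  scope :not_contains_city,
    lambda { |city| where("some_models.address NOT LIKE ?", "%#{city}%") }
end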