Django Q Queries & on the same field? - sql

So here are my models:
class Event(models.Model):
user = models.ForeignKey(User, blank=True, null=True, db_index=True)
name = models.CharField(max_length = 200, db_index=True)
platform = models.CharField(choices = (("ios", "ios"), ("android", "android")), max_length=50)
class User(AbstractUser):
email = models.CharField(max_length=50, null=False, blank=False, unique=True)
Event is like an analytics event, so it's very possible that I could have multiple events for one user, some with platform=ios and some with platform=android, if a user has logged in on multiple devices. I want to query to see how many users have both ios and android devices. So I wrote a query like this:
User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()
Which returns 0 results. I know this isn't correct. I then thought I would try to just query for iOS users:
User.objects.filter(Q(event__platform="ios")).count()
Which returned 6,717,622 results, which is unexpected because I only have 39,294 users. I'm guessing it's not counting the Users, but counting the Event instances, which seems like incorrect behavior to me. Does anyone have any insights into this problem?

You can use annotations instead:
django.db.models import Count
User.objects.all().annotate(events_count=Count('event')).filter(events_count=2)
So it will filter out any user that has two events.
You can also use chained filters:
User.objects.filter(event__platform='android').filter(event__platform='ios')
Which first filter will get all users with android platform and the second one will get the users that also have iOS platform.

This is generally an answer for a queryset with two or more conditions related to children objects.
Solution: A simple solution with two subqueries is possible, even without any join:
base_subq = Event.objects.values('user_id').order_by().distinct()
user_qs = User.objects.filter(
Q(pk__in=base_subq.filter(platform="android")) &
Q(pk__in=base_subq.filter(platform="ios"))
)
The method .order_by() is important if the model Event has a default ordering (see it in the docs about distinct() method).
Notes:
Verify the only SQL request that will be executed: (Simplified by removing "app_" prefix.)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user WHERE (
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'android')
AND
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'ios')
)
The function Q() is used because the same condition parameter (pk__in) can not be repeated in the same filter(), but also chained filters could be used instead: .filter(...).filter(...). (The order of filter conditions is not important and it is outweighed by preferences estimated by SQL server optimizer.)
The temporary variable base_subq is an "alias" queryset only to don't repeat the same part of expression that is never evaluated individually.
One join between User (parent) and Event (child) wouldn't be a problem and a solution with one subquery is also possible, but a join with Event and Event (a join with a repeated children object or with two children objects) should by avoided by a subquery in any case. Two subqueries are nice for readability to demonstrate the symmetry of the two filter conditions.
Another solution with two nested subqueries This non symmetric solution can be faster if we know that one subquery (that we put innermost) has a much more restrictive filter than another necessary subquery with a huge set of results. (example if a number of Android users would be huge)
ios_user_ids = (Event.objects.filter(platform="ios")
.values('user_id').order_by().distinct())
user_ids = (Event.objects.filter(platform="android", user_id__in=ios_user_ids)
.values('user_id').order_by().distinct())
user_qs = User.objects.filter(pk__in=user_ids)
Verify how it is compiled to SQL: (simplified again by removing app_ prefix and ".)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user
WHERE user.id IN (
SELECT DISTINCT V0.user_id FROM event V0
WHERE V0.platform = 'ios' AND V0.user_id IN (
SELECT DISTINCT U0.user_id FROM event U0
WHERE U0.platform = 'android'
)
)
(These solutions work also in an old Django e.g. 1.8. A special subquery function Subquery() exists since Django 1.11 for more complicated cases, but we didn't need it for this simple question.)

Related

PyPika control order of with clauses

I am using PyPika (version 0.37.6) to create queries to be used in BigQuery. I am building up a query that has two WITH clauses, and one clause is dependent on the other. Due to the dynamic nature of my application, I do not have control over the order in which those WITH clauses are added to the query.
Here is example working code:
a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")
q = Query.with_(a_subq, "a").from_(a_alias).select(a_alias.y)
q = q.with_(b_subq, "b").from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)
That generates a working query:
WITH a AS (SELECT '1' z,'2' y) ,b AS (SELECT a.z FROM a) SELECT a.y,b.z FROM a,b
However, if I add the b WITH clause first, then since a is not yet defined, the resulting query:
WITH b AS (SELECT a.z FROM a), a AS (SELECT '1' z,'2' y) SELECT a.y,b.z FROM a,b
does not work. Since BigQuery does not support WITH RECURSIVE, that is not an option for me.
Is there any way to control the order of the WITH clauses? I see the _with list in the QueryBuilder (the type of variable q), but since that's a private variable, I don't want to rely on that, especially as new versions of PyPika may not operate the same way.
One way I tried to do this is to always insert the first WITH clause at the beginning of the _with list, like this:
q._with.insert(0, q._with.pop())
Although this works, I'd like to use a PyPika supported way to do that.
In a related question, is there a supported way within PyPika to see what has already been added to the select list or other parts of the query? I noticed the q.selects member variable, but selects is not part of the public documentation. Using q.selects did not actually work for me when using our project's Python version (3.6) even though it did work in Python 3.7. The code I was trying to use is:
if any(field.name == "date" for field in q.selects if isinstance(field, Field))
The error I got was as follows:
def __getitem__(self, item: slice) -> "BetweenCriterion":
if not isinstance(item, slice):
> raise TypeError("Field' object is not subscriptable")
Thank you in advance for your help.
I could not figure out how to control the order of the WITH clauses after calling query.with_() (except for the hack already noted). As a result, I restructured my application to get around this problem. I am now calling query.with_() before building up the rest of the query.
This also made my related question moot, because I no longer need to see what I've already added to the query.

How to send a table names as parameters to a function which performs a join on them?

Currently I used the following code for joining tables.
Booking.joins(:table1, :table2, :table3, :table4).other_queries
However, the number of tables to be joined with depends on certain conditions. The other_queries also form a very large chain. So, I am duplicating a lot of code just because I need to perform joins differently.
So, I want to implement something like this
def method(params)
Booking.joins(params).other_queries
end
How can this be done?
Maybe just Booking.joins(*params).other_queries is what you need?
Operator * transforms array into list of params, for example:
arr = [1,2,3]
any_method(*arr) # is equal to any_method(1,2,3)
However, if params is smth came from user I recommend you not to trust it, it probably could be security issue. But if you trust it or filter it - why not.
SAFE_JOINS = [:table1, :table2, :table3]
def method(params)
booking = Booking.scoped # or Booking.all if you are rails 5
(params[:joins] & SAFE_JOINS.map(&:to_s)).each do |j|
booking = booking.joins(j.intern)
end
end

Trying to refactor Rails 4.2 scope

I have updated this question
I have the following SQL scope in a RAILS 4 app, it works, but has a couple of issues.
1) Its really RAW SQL and not the rails way
2) The string interpolation opens up risks with SQL injection
here is what I have:
scope :not_complete -> (user_id) { joins("WHERE id NOT IN
(SELECT modyule_id FROM completions WHERE user_id = #{user_id})")}
The relationship is many to many, using a join table called completions for matching id(s) on relationships between users and modyules.
any help with making this Rails(y) and how to set this up to take the arg of user_id with out the risk, so I can call it like:
Modyule.not_complete("1")
Thanks!
You should have added few info about the models and their assocciation, anyways here's my trial, might have some errors because I don't know if the assocciation is one to many or many to many.
scope :not_complete, lambda do |user_id|
joins(:completion).where.not( # or :completions ?
id: Completion.where(user_id: user_id).pluck(modyule_id)
)
end
PS: I turned it into multi line just for readability, you can change it back to a oneline if you like.

Django ORM Cross Product

I have three models:
class Customer(models.Model):
pass
class IssueType(models.Model):
pass
class IssueTypeConfigPerCustomer(models.Model):
customer=models.ForeignKey(Customer)
issue_type=models.ForeignKey(IssueType)
class Meta:
unique_together=[('customer', 'issue_type')]
How can I find all tuples of (custmer, issue_type) where there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
for issue_type in IssueType.objects.all():
for customer in Customer.objects.filter(
issuetypeconfigpercustomer__issue_type=None
):
yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every Queryset is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want - this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
(customer, IssueType.objects.get(pk=customer.issue_type_id))
for customer in
Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
]
Not only this requires the same number of trips to the database as the "generator" based version but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
issue_list = dict(
(issue.id, issue)
for issue in
IssueType.objects.filter(
pk__in=set(m.issue_type_id for m in missing)
)
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.

ActiveRecord .select(): Possible to clear old selects?

Is there a way to clear old selects in a .select("table.col1, ...") statement?
Background:
I have a scope that requests accessible items for a given user id (simplified)
scope :accessible, lambda { |user_id|
joins(:users).select("items.*")
.where("items_users.user_id = ?) OR items.created_by = ?", user_id, user_id)
}
Then for example in the index action i only need the item id and title, so i would do this:
#items = Item.accessible(#auth.id).select("polls.id, polls.title")
However, this will select the columns "items., items.id, items.title". I'd like to avoid removing the select from the scope, since then i'd have to add a select("items.") everywhere else. Am I right to assume that there is no way to do this, and i either live with fetching too many fields or have to use multiple scopes?
Fortunately you're wrong :D, you can use the #except method to remove some parts of the query made by the relation, so if you want to remove the SELECT part just do :
#items = Item.accessible(#auth.id).except(:select).select("polls.id, polls.title")
reselect (Rails 6+)
Rails 6 introduced a new method called reselect, which does exactly what you need, it replaces previously set select statement.
So now, your query can be written even shorter:
#items = Item.accessible(#auth.id).reselect("polls.id, polls.title")