Django ORM filter multiple fields using 'IN' statement - sql

So I have the following model in Django:
class MemberLoyalty(models.Model):
date_time = models.DateField(primary_key=True)
member = models.ForeignKey(Member, models.DO_NOTHING)
loyalty_value = models.IntegerField()
My goal is to have all the tuples grouped by the member with the most recent date. There are many ways to do it, one of them is using a subquery that groups by the member with max date_time and filtering member_loyalty with its results. The working sql for this solution is as follows:
SELECT
*
FROM
member_loyalty
WHERE
(date_time , member_id) IN (SELECT
max(date_time), member_id
FROM
member_loyalty
GROUP BY member_id);
Another way to do this would be by joining with the subquery.
How could i translate this on a django query? I could not find a way to filter with two fields using IN, nor a way to join with a subquery using a specific ON statement.
I've tried:
cls.objects.values('member_id', 'loyalty_value').annotate(latest_date=Max('date_time'))
But it starts grouping by the loyalty_value.
Also tried building the subquery, but cant find how to join it or use it on a filter:
subquery = cls.objects.values('member_id').annotate(max_date=Max('date_time'))
Also, I am using Mysql so I can not make use of the .distinct('param') method.

This is a typical greatest-per-group query. Stack-overflow even has a tag for it.
I believe the most efficient way to do it with the recent versions of Django is via a window query. Something along the lines should do the trick.
MemberLoyalty.objects.all().annotate(my_max=Window(
expression=Max('date_time'),
partition_by=F('member')
)).filter(my_max=F('date_time'))
Update: This actually won't work, because Window annotations are not filterable. I think in order to filter on window annotation you need to wrap it inside a Subquery, but with Subquery you are actually not obligated to use a Window function, there is another way to do it, which is my next example.
If either MySQL or Django does not support window queries, then a Subquery comes into play.
MemberLoyalty.objects.filter(
date_time=Subquery(
(MemberLoyalty.objects
.filter(member=OuterRef('member'))
.values('member')
.annotate(max_date=Max('date_time'))
.values('max_date')[:1]
)
)
)
If event Subqueries are not available (pre Django 1.11) then this should also work:
MemberLoyalty.objects.annotate(
max_date=Max('member__memberloyalty_set__date_time')
).filter(max_date=F('date_time'))

Related

How to write Window functions using Druid?

For example, i wanted to write Window functions like sum over (window)
Since over clause is not supported by Druid, how do i achieve the same using Druid Native query API or SQL API?
You should use a GroupBy Query. As Druid is a time series database, you have to specify your interval (window) where you want to query data from. You can use aggregation methods over this data, for example a SUM() aggregation.
If you want, you can also do extra filtering within your aggregation, like "only sum records where city=paris"). You could also apply the SUM aggregation only to records which exists in a certain time window within your selected interval.
If you are a PHP user then maybe this package is handy for you: https://github.com/level23/druid-client#sum
We have tried to implement an easy way to query such data.

Path references in QueryDSL join to subquery

I'd like to know the best way to implement this query in SQL-style QueryDSL which joins to a subquery. I struggled a bit, but got it to generate the necessary SQL. I'm wondering if there are any simplifications/improvements, however, particularly related to the three "paths" I had to create? For example, would be great to define latestCaseId in terms of latestSubQuery.
In the simplified form of my actual query below, I am finding the set of records (fields spread across ucm and pcm) which have the latest timestamp per case group. The subquery identifies the latest timestamp per group so that we can filter the outer query by it.
final SimplePath<ListSubQuery> latestSubQueryPath = Expressions.path(ListSubQuery.class, "latest");
final SimplePath<Timestamp> caseLatestMentioned = Expressions.path(Timestamp.class, "caseLatestMentioned");
final SimplePath<Integer> latestCaseId = Expressions.path(Integer.class, "latest.caseId");
final ListSubQuery<Tuple> latest = new SQLSubQuery()
.from(ucm2)
.innerJoin(pcm2).on(ucm2.id.eq(pcm2.id))
.groupBy(pcm2.caseId)
.list(pcm2.caseId.as(latestCaseId), ucm2.lastExtracted.max().as(caseLatestMentioned));
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestCaseId))
.where(ucm.lastExtracted.eq(caseLatestMentioned));
I believe you could use the .get(<various Path impls>) method of PathBuilder. The way I like to think of it is that creating final PathBuilder<Tuple> latestSubQueryPath = new PathBuilder<>(Tuple.class, "latest") and joining to it .innerJoin(latest, latestSubQueryPath) is creating an alias for the subquery. Then you can use .get(<various Path impls>) to access the fields as follows:
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestSubQueryPath.get(pcm2.caseId)))
.where(ucm.lastExtracted.eq(latestSubQueryPath.get(maxLastExtractedDate)));
I've not run the code but hopefully this is in the right direction. If not, I'll have a look tomorrow when I have the relevant codebase to hand.
Update: As mentioned in the comments, ucm2.lastExtracted.max() requires an alias. I've called it maxLastExtractedDate and assume it's used to alias ucm2.lastExtracted.max() when creating the subquery.

How to simulate ActiveRecord Model.count.to_sql

I want to display the SQL used in a count. However, Model.count.to_sql will not work because count returns a FixNum that doesn't have a to_sql method. I think the simplest solution is to do this:
Model.where(nil).to_sql.sub(/SELECT.*FROM/, "SELECT COUNT(*) FROM")
This creates the same SQL as is used in Model.count, but is it going to cause a problem further down the line? For example, if I add a complicated where clause and some joins.
Is there a better way of doing this?
You can try
Model.select("count(*) as model_count").to_sql
You may want to dip into Arel:
Model.select(Arel.star.count).to_sql
ASIDE:
I find I often want to find sub counts, so I embed the count(*) into another query:
child_counts = ChildModel.select(Arel.star.count)
.where(Model.arel_attribute(:id).eq(
ChildModel.arel_attribute(:model_id)))
Model.select(Arel.star).select(child_counts.as("child_count"))
.order(:id).limit(10).to_sql
which then gives you all the child counts for each of the models:
SELECT *,
(
SELECT COUNT(*)
FROM "child_models"
WHERE "models"."id" = "child_models"."model_id"
) child_count
FROM "models"
ORDER BY "models"."id" ASC
LIMIT 10
Best of luck
UPDATE:
Not sure if you are trying to solve this in a generic way or not. Also not sure what kind of scopes you are using on your Model.
We do have a method that automatically calls a count for a query that is put into the ui layer. I found using count(:all) is more stable than the simple count, but sounds like that does not overlap your use case. Maybe you can improve your solution using the except clause that we use:
scope.except(:select, :includes, :references, :offset, :limit, :order)
.count(:all)
The where clause and the joins necessary for the where clause work just fine for us. We tend to want to keep the joins and where clause since that needs to be part of the count. While you definitely want to remove the includes (which should be removed by rails automatically in my opinion), but the references (much trickier especially in the case where it references a has_many and requires a distinct) that starts to throw a wrench in there. If you need to use references, you may be able to convert these over to a left_join.
You may want to double check the parameters that these "join" methods take. Some of them take table names and others take relation names. Later rails version have gotten better and take relation names - be sure you are looking at the docs for the right version of rails.
Also, in our case, we spend more time trying to get sub selects with more complicated relationships, we have to do some munging. Looks like we are not dealing with where clauses as much.
ref2

Active Record embed Table.where('x').count inside of select statement

I'm setting up an AR query that is basically meant to find an average of a few values that span three different tables. I'm getting hung up on how to embed the result of a particular Count query inside of the Active Record select statement.
Just by itself, this query returns "3":
Order.where(user_id: 319).count => 3
My question is, can I embed this into a select statement as a SQL alias similar to below:
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
It seems to be throwing an error and generally not recognizing what I'm trying to do when I declare that first count alias. Any ideas on the syntax?
Well, after examining a bit, I cleared my mind into the ActiveRecord select() syntax.
It's a method that can take a variable length of parameters. So, your failing :
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
After replacing proper SQL for your misplaced ActiveRecord statement, should be more of like this [be careful, you can't use as count in most cases, count is reserved]:
Table.xxxxx.select("(SELECT count(id) from orders where user_id=319) as usercount", "user_id","SUM(quantity*current_price) AS revenue").xxxx
But I guess you should need more a per-user_id-table.
So, I'd skip Models and go to direct SQL, always being careful to avoid injections:
ActiveRecord::Base.connection.execute('SELECT COUNT(orders.id) as usercount, users.id from users, orders where users.id=orders.user_id group by users.id')
This is simplified of course, you can apply the rest of the data (which I currently do not know) accordingly. The above simplified, not full solution, could be written also as:
Order.joins(:user).select("count(orders.id) as usercount, users.id").group(:user_id)

Grouping by month in database and not with ruby

I'm trying to group calls by month but I need to do it in the database and not with ruby. Here is the current code:
Call.limit(1000).group_by { |t| t.created_at.month }
Which returns:
SELECT `calls`.* FROM `calls` ORDER BY created_at desc LIMIT 1000
Then ruby does the grouping. What should I do to make the database do the work ?
Thank you.
The short answer, is that you cannot achieve the same result at SQL level.
Here's the full explanation.
First of all, what should be the result of that call? You can use the PG/SQL Group BY statement, however it's likely the result is not what you expect.
The Group By syntax is designed to group rows with a pattern, and compute and aggregate function. In your case, even assuming you create a query that uses date_trunc to group by a part of the timestamp, the aggregate function does not permit you to return a dataset structured like the Ruby group_by method.
Why do you want to compute such grouping at database level?
If you have specific requirements or computation limits, then work on a custom method.
Use Call.limit(1000).group("month(created_at)")
Please checkout mysql date-time methods appropriate in your case. But .group() will do the mysql grouping.
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_month