Performing INNER JOIN, GROUP BY, and COUNT on Django models - sql

I am having problems understanding how to make complex queries (or even simple ones) using Django models. I am looking to do an inner join, group by, and count in one statement using django models.
Example:
Select ab.userid, count(ab.userid) as bids, u.username
from auctionbids ab
inner join users u on ab.userid=u.id
group by ab.userid
order by numbids desc;
This type of query is very common and straight forward so I have to imagine it can be done with django models but it is not apparent from the documentation.
edit: added models
class Users(models.Model):
id = models.IntegerField(primary_key=True)
username = models.CharField(max_length=150)
class Auctionbids(models.Model):
id = models.IntegerField(primary_key=True)
user = models.ForeignKey(Users)

If you post your models.py file I can probably give you a more precise answer, but I think what you want is the Django Aggregration API
You would use it something like this:
from django.db.models import Count
User.objects.all().annotate(bids=Count('auctionbids')).order_by('bids')

Related

Django - join between different models without foreign key

Imagine I have two simple models (it's not really what I have but this will do):
Class Person(models.Model):
person_id = models.TextField()
name = models.TextField()
#...some other fields
Class Pet(models.Model):
person_id = models.TextField()
pet_name = models.TextField()
species = models.TextField()
#...even more fields
Here's the key difference between this example and some other questions I read about: my models don't enforce a foreign key, so I can't use select_related()
I need to create a view that shows a join between two querysets in each one. So, let's imagine I want a view with all owners named John with a dog.
# a first filter
person_query = Person.objects.filter(name__startswith="John")
# a second filter
pet_query = Pet.objects.filter(species="Dog")
# the sum of the two
magic_join_that_i_cant_find_and_possibly_doesnt_exist = join(person_query.person_id, pet_query.person_id)
Now, can I join those two very very simple querysets with any function?
Or should I use raw?
SELECT p.person_id, p.name, a.pet_name, a.species
FROM person p
LEFT JOIN pet a ON
p.person_id = a.person_id AND
a.species = 'Dog' AND
p.name LIKE 'John%'
Is this query ok? Damn, I'm not sure anymore... that's my issue with queries. Everything is all at once. But consecutive queries seem so simple...
If I reference in my model class a "foreign key" (for select_related() use), will it be enforced in the database after the migration? (I need that it DOESN'T happen)
Make a models.ForeignKey but use db_constraint=False.
See https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ForeignKey.db_constraint
Also, if this model is managed=False, ie it is a legacy db table and you're not using Django migrations, the constraint won't ever be made in the first place and it's fine.
If you create a FK in the model, Django will create a constraint on migration, so you want to avoid that in your case.
I don't think there is a way to join in the database in Django if you don't declare the field to join as a foreign key. The only thing you can do is to do the join in Python, which might or might not be OK. Think that prefetch_related does precisely this.
The code would be something like:
person_query = Person.objects.filter(name__startswith="John")
person_ids = [person.id for person in person_query]
pet_query = Pet.objects.filter(species="Dog", person_id__in=person_ids).order_by('person_id')
pets_by_person_id = {person_id: pet_group for person_id, pet_group in itertools.groupby(pet_query, lambda pet: pet.person_id)}
# Now everytime you need the pets for a certain person
pets_by_person_id(person.id)
# You can also set it in all objects for easy retrieval
for person in person_query:
person.pets = pets_by_person_id(person.id)
The code might not be 100% accurate, but you get the idea I hope.

Querying between two tables that share an association

New to seqeul and sql in general. I have two tables, groups and resources, that are associated many_to_many and therefore have a groups_resources join table. I also have a task table that has a foreign_key :group_id, :groups and is associated many_to_one with groups.
I'm trying to figure out what query to use that will allow my to get the resources that are able to do a task, based on a task's group. Do I have to do a complicated query via the `groups_resources' join table, or is there a more straightforward query/ way of setting up my associations?
Thanks!
I would structure the SQL statement as below. Which would provide you the resources objects that are associated with a specific task id through the join table.
SELECT r.*
FROM resources r
JOIN groups_resources gr ON gr.resources_id = r.id
JOIN groups g ON gr.group_id = g.id
JOIN task t ON t.id = g.id
WHERE t.id = ?
I think following is enough:
select res.* from resources res, task tk, groups_resources gr
where res.resource_id = gr.resource_id and
gr.group_id = tk.group_id and
tk.group_id=<>;
The other two answers are helpful for how to structure a SQL query, but thought I would answer my own question specifically as it relates to Sequel. Turns out there is a many_through_many plugin that makes this sort of querying simple, if you make both tables many_to_many :
Task.plugin :many_through_many
Task.many_through_many :resources,
:through =>[
[:groups_tasks, :task_id, :group_id],
[:groups, :id, :id],
[:groups_resources, :group_id, :resource_id]
]
Now you can just call something like task.resources on a Task instance, even though your tables don't explicitly associate tasks and resources.

Can I sort records by child record count with DataMapper (without using raw SQL)?

What I want to do feels pretty basic to me, but I'm not finding a way to do it using DataMapper without resorting to raw SQL. That would look something like:
select u.id, u.name, count(p.id) as post_count
from posts p
inner join users u on p.user_id = u.id
group by p.user_id
order by post_count desc;
The intention of the above query is to show me all users sorted by how many posts each user has. The closest I've found using DataMapper is aggregate, which doesn't give me back resource objects. What I'd like is some way to generate one query and get back standard DM objects back.
Assuming you have relationships
has_n, :posts
you should be able to do
User.get(id).posts.count
or
User.first(:some_id => id).posts.count
or
u = User.get(1)
u.posts.count
you can also chain conditions
User.get(1).posts.all(:date.gt => '2012-10-01')
see scopes and chaining here http://datamapper.org/docs/find.html
finally add the ordering
User.get(1).posts.all(:order => [:date.desc])

Finding unique records, ordered by field in association, with PostgreSQL and Rails 3?

UPDATE: So thanks to #Erwin Brandstetter, I now have this:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
users_columns = User.column_names.map { |col| users[col.to_sym] }
cards_condition = cards[:company_id].eq(company.id).
and(cards[:user_id].eq(users[:id]))
User.joins(:cards).where(cards_condition).group(users_columns).
order('min(cards.created_at)')
end
... which seems to do exactly what I want. There are two shortcomings that I would still like to have addressed, however:
The order() clause is using straight SQL instead of Arel (couldn't figure it out).
Calling .count on the query above gives me this error:
NoMethodError: undefined method 'to_sym' for
#<Arel::Attributes::Attribute:0x007f870dc42c50> from
/Users/neezer/.rvm/gems/ruby-1.9.3-p0/gems/activerecord-3.1.1/lib/active_record/relation/calculations.rb:227:in
'execute_grouped_calculation'
... which I believe is probably related to how I'm mapping out the users_columns, so I don't have to manually type in all of them in the group clause.
How can I fix those two issues?
ORIGINAL QUESTION:
Here's what I have so far that solves the first part of my question:
def self.unique_users_by_company(company)
users = User.arel_table
cards = Card.arel_table
cards_condition = cards[:company_id].eq(company.id)
.and(cards[:user_id].eq(users[:id]))
User.where(Card.where(cards_condition).exists)
end
This gives me 84 unique records, which is correct.
The problem is that I need those User records ordered by cards[:created_at] (whichever is earliest for that particular user). Appending .order(cards[:created_at]) to the scope at the end of the method above does absolutely nothing.
I tried adding in a .joins(:cards), but that give returns 587 records, which is incorrect (duplicate Users). group_by as I understand it is practically useless here as well, because of how PostgreSQL handles it.
I need my result to be an ActiveRecord::Relation (so it's chainable) that returns a list of unique users who have cards that belong to a given company, ordered by the creation date of their first card... with a query that's written in Ruby and is database-agnostic. How can I do this?
class Company
has_many :cards
end
class Card
belongs_to :user
belongs_to :company
end
class User
has_many :cards
end
Please let me know if you need any other information, or if I wasn't clear in my question.
The query you are looking for should look like this one:
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
ORDER BY min(created_at)
You can join in the table user if you need columns of that table in the result, else you don't even need it for the query.
If you don't need min_created_at in the SELECT list, you can just leave it away.
Should be easy to translate to Ruby (which I am no good at).
To get the whole user record (as I derive from your comment):
SELECT u.*,
FROM user u
JOIN (
SELECT user_id, min(created_at) AS min_created_at
FROM cards
WHERE company_id = 1
GROUP BY user_id
) c ON u.id = c.user_id
ORDER BY min_created_at
Or:
SELECT u.*
FROM user u
JOIN cards c ON u.id = c.user_id
WHERE c.company_id = 1
GROUP BY u.id, u.col1, u.col2, .. -- You have to spell out all columns!
ORDER BY min(c.created_at)
With PostgreSQL 9.1+ you can simply write:
GROUP BY u.id
(like in MySQL) .. provided id is the primary key.
I quote the release notes:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.
The fact that you need it to be chainable complicates things, otherwise you can either drop down into SQL yourself or only select the column(s) you need via select("users.id") to get around the Postgres issue. Because at the heart of it your query is something like
SELECT users.id
FROM users
INNER JOIN cards ON users.id = cards.user_id
WHERE cards.company_id = 1
GROUP BY users.id, DATE(cards.created_at)
ORDER BY DATE(cards.created_at) DESC
Which in Arel syntax is more or less:
User.select("id").joins(:cards).where(:"cards.company_id" => company.id).group_by("users.id, DATE(cards.created_at)").order("DATE(cards.created_at) DESC")

Django: Order a model by a many-to-many field

I am writing a Django application that has a model for People, and I have hit a snag. I am assigning Role objects to people using a Many-To-Many relationship - where Roles have a name and a weight. I wish to order my list of people by their heaviest role's weight. If I do People.objects.order_by('-roles__weight'), then I get duplicates when people have multiple roles assigned to them.
My initial idea was to add a denormalized field called heaviest-role-weight - and sort by that. This could then be updated every time a new role was added or removed from a user. However, it turns out that there is no way to perform a custom action every time a ManyToManyField is updated in Django (yet, anyway).
So, I thought I could then go completely overboard and write a custom field, descriptor and manager to handle this - but that seems extremely difficult when the ManyRelatedManager is created dynamically for a ManyToManyField.
I have been trying to come up with some clever SQL that could do this for me - I'm sure it's possible with a subquery (or a few), but I'd be worried about it not being compatible will all the database backends Django supports.
Has anyone done this before - or have any ideas how it could be achieved?
Django 1.1 (currently beta) adds aggregation support. Your query can be done with something like:
from django.db.models import Max
People.objects.annotate(max_weight=Max('roles__weight')).order_by('-max_weight')
This sorts people by their heaviest roles, without returning duplicates.
The generated query is:
SELECT people.id, people.name, MAX(role.weight) AS max_weight
FROM people LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
GROUP BY people.id, people.name
ORDER BY max_weight DESC
Here's a way to do it without an annotation:
class Role(models.Model):
pass
class PersonRole(models.Model):
weight = models.IntegerField()
person = models.ForeignKey('Person')
role = models.ForeignKey(Role)
class Meta:
# if you have an inline configured in the admin, this will
# make the roles order properly
ordering = ['weight']
class Person(models.Model):
roles = models.ManyToManyField('Role', through='PersonRole')
def ordered_roles(self):
"Return a properly ordered set of roles"
return self.roles.all().order_by('personrole__weight')
This lets you say something like:
>>> person = Person.objects.get(id=1)
>>> roles = person.ordered_roles()
Something like this in SQL:
select p.*, max (r.Weight) as HeaviestWeight
from persons p
inner join RolePersons rp on p.id = rp.PersonID
innerjoin Roles r on rp.RoleID = r.id
group by p.*
order by HeaviestWeight desc
Note: group by p.* may be disallowed by your dialect of SQL. If so, just list all the columns in table p that you intend to use in the select clause.
Note: if you just group by p.ID, you won't be able to call for the other columns in p in your select clause.
I don't know how this interacts with Django.