Django: Order a model by a many-to-many field - sql

I am writing a Django application that has a model for People, and I have hit a snag. I am assigning Role objects to people using a Many-To-Many relationship - where Roles have a name and a weight. I wish to order my list of people by their heaviest role's weight. If I do People.objects.order_by('-roles__weight'), then I get duplicates when people have multiple roles assigned to them.
My initial idea was to add a denormalized field called heaviest-role-weight - and sort by that. This could then be updated every time a new role was added or removed from a user. However, it turns out that there is no way to perform a custom action every time a ManyToManyField is updated in Django (yet, anyway).
So, I thought I could then go completely overboard and write a custom field, descriptor and manager to handle this - but that seems extremely difficult when the ManyRelatedManager is created dynamically for a ManyToManyField.
I have been trying to come up with some clever SQL that could do this for me - I'm sure it's possible with a subquery (or a few), but I'd be worried about it not being compatible with all the database backends Django supports.
Has anyone done this before - or have any ideas how it could be achieved?

Django 1.1 (currently beta) adds aggregation support. Your query can be done with something like:
from django.db.models import Max
People.objects.annotate(max_weight=Max('roles__weight')).order_by('-max_weight')
This sorts people by their heaviest roles, without returning duplicates.
The generated query is:
SELECT people.id, people.name, MAX(role.weight) AS max_weight
FROM people LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
GROUP BY people.id, people.name
ORDER BY max_weight DESC
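The effect of the GROUP BY can be checked outside Django. What follows is a minimal sketch using Python's built-in sqlite3 module, not the exact query Django emits on every backend. The table and column names mirror the generated query above; the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database; schema mirrors the generated query, sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER);
    CREATE TABLE people_roles (people_id INTEGER, role_id INTEGER);
    INSERT INTO people VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO role VALUES (1, 'admin', 30), (2, 'editor', 20), (3, 'viewer', 10);
    -- alice has two roles, bob has one, carol has none
    INSERT INTO people_roles VALUES (1, 2), (1, 3), (2, 1);
""")

# Plain join + ORDER BY: one row per (person, role), so alice appears twice.
joined = conn.execute("""
    SELECT people.name
    FROM people
    LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
    LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
    ORDER BY role.weight DESC
""").fetchall()

# Aggregated version: one row per person, ordered by the heaviest role.
aggregated = conn.execute("""
    SELECT people.id, people.name, MAX(role.weight) AS max_weight
    FROM people
    LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
    LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
    GROUP BY people.id, people.name
    ORDER BY max_weight DESC
""").fetchall()

print([name for (name,) in joined])
print(aggregated)
```

People without any role get a NULL max_weight. SQLite sorts those last in descending order, but other backends (PostgreSQL, for one) put NULLs first by default, so the position of role-less people may differ between databases.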

Here's a way to do it without an annotation:
from django.db import models
class Role(models.Model):
    pass
class PersonRole(models.Model):
    weight = models.IntegerField()
    person = models.ForeignKey('Person')
    role = models.ForeignKey(Role)
    class Meta:
        # if you have an inline configured in the admin, this will
        # make the roles order properly
        ordering = ['weight']
class Person(models.Model):
    roles = models.ManyToManyField('Role', through='PersonRole')
    def ordered_roles(self):
        "Return a properly ordered set of roles"
        return self.roles.all().order_by('personrole__weight')
This lets you say something like:
>>> person = Person.objects.get(id=1)
>>> roles = person.ordered_roles()

Something like this in SQL:
select p.*, max (r.Weight) as HeaviestWeight
from persons p
inner join RolePersons rp on p.id = rp.PersonID
inner join Roles r on rp.RoleID = r.id
group by p.*
order by HeaviestWeight desc
Note: group by p.* may be disallowed by your dialect of SQL. If so, just list all the columns in table p that you intend to use in the select clause.
Note: if you just group by p.ID, you won't be able to call for the other columns in p in your select clause.
I don't know how this interacts with Django.

Related

SQL query to exclude records that are part of a group

I can't believe this hasn't been answered elsewhere, but I don't seem to know the right words to convey what I'm trying to do. I'm using Ruby/Rails and PostgreSQL.
I have a bunch of Users in the DB that I'm trying to add to a Group based on a name search. I need to return Users that do not belong to a particular Group, but there is a join table as well (UserGroups, with the appropriate FKs).
Is there a simple way to use this configuration to perform this query without having to resort to grabbing all the Users who belong to the group and doing something like .where.not(id: users_in_group.pluck(:id))? These groups can be pretty huge, so I don't want to send that query to the DB on a text search as the user types.
I need to return Users that do not belong to a particular Group
SELECT *
FROM users u
WHERE username ~ 'some pattern' -- ?
AND NOT EXISTS (
SELECT FROM user_groups ug
WHERE ug.group_id = 123 -- your group_id to exclude here
AND ug.user_id = u.id
);
See:
Select rows which are not present in other table
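The NOT EXISTS anti-join above can be tried out with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, so two things are assumptions of the sketch: LIKE stands in for the ~ regex operator, and the subquery selects a constant 1 because SQLite does not accept an empty select list. The helper name users_not_in_group and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO users VALUES (1, 'anna'), (2, 'andy'), (3, 'bert'), (4, 'ann');
    -- anna is in group 123; andy is only in another group
    INSERT INTO user_groups VALUES (1, 123), (2, 456);
""")

def users_not_in_group(conn, pattern, group_id):
    """Return (id, username) rows matching pattern that are not members of group_id."""
    return conn.execute("""
        SELECT u.id, u.username
        FROM users u
        WHERE u.username LIKE ?
        AND NOT EXISTS (
            SELECT 1 FROM user_groups ug
            WHERE ug.group_id = ?
            AND ug.user_id = u.id
        )
        ORDER BY u.id
    """, (pattern, group_id)).fetchall()

result = users_not_in_group(conn, "an%", 123)
print(result)
```

The membership check happens inside the database, so the list of group members never has to be pulled into the application first.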

SQL query join different tables based on value in a given column?

I am designing a domain model and data mapper system for my site. The system has different kinds of user domain objects: a base class for users, with child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes have an additional table to store admin/mod/banned information. The user type is determined by a column in the 'users' table called 'userlevel'; its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
A problem comes up when I work on a members list feature for my site, since the members list is supposed to load all users from the database (with pagination, but let's not worry about this now). The issue is that I want to load the data from both the base user table and the additional admin/mod/banned tables. As you can see, registered users do not have an additional table to store extra data, while for admin/mod/banned users the table is different. Moreover, the columns in these tables are also different.
So how am I supposed to handle this situation using SQL queries? I know I can simply select from the base user table and then use multiple queries to load additional data if the user level is found to be a given value, but this is a bad idea since it will result in n+1 queries for n admins/mods/banned users, a very expensive trip to the database. What else am I supposed to do? Please help.
If you want to query all usertypes with one query you will have to have the columns from all tables in your result-table, several of them filled with null-values.
To get them filled with data use a left-join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program-code will have the job to pick the correct columns based on the usertype.
P.S.: Note that select * is only for the sake of simplicity; in real code, better list all of the column names...
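The last step, picking the right columns per user type in program code, could look roughly like the sketch below. It uses Python's built-in sqlite3 module. The table names follow the query above, while the extra columns (admin_title, forum, reason), the EXTRA_COLUMNS mapping and the to_user helper are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
    CREATE TABLE userdata (userid INTEGER PRIMARY KEY, username TEXT, usertype INTEGER);
    CREATE TABLE admindata (userid INTEGER, admin_title TEXT);
    CREATE TABLE moddata (userid INTEGER, forum TEXT);
    CREATE TABLE banneddata (userid INTEGER, reason TEXT);
    INSERT INTO userdata VALUES (1, 'root', 3), (2, 'mia', 2), (3, 'joe', 1), (4, 'troll', -1);
    INSERT INTO admindata VALUES (1, 'owner');
    INSERT INTO moddata VALUES (2, 'general');
    INSERT INTO banneddata VALUES (4, 'spam');
""")

# One query for all user types; columns of non-matching tables come back as NULL.
rows = conn.execute("""
    SELECT u.userid AS userid, u.username AS username, u.usertype AS usertype,
           a.admin_title AS admin_title, m.forum AS forum, b.reason AS reason
    FROM userdata u
    LEFT OUTER JOIN admindata a ON (u.userid = a.userid AND u.usertype = 3)
    LEFT OUTER JOIN moddata m ON (u.userid = m.userid AND u.usertype = 2)
    LEFT OUTER JOIN banneddata b ON (u.userid = b.userid AND u.usertype = -1)
    ORDER BY u.userid
""").fetchall()

# Which extra columns are meaningful for each usertype (hypothetical mapping).
EXTRA_COLUMNS = {3: ["admin_title"], 2: ["forum"], -1: ["reason"]}

def to_user(row):
    """Build a dict with the base columns plus the extras for this row's usertype."""
    user = {"userid": row["userid"], "username": row["username"], "usertype": row["usertype"]}
    for column in EXTRA_COLUMNS.get(row["usertype"], []):
        user[column] = row[column]
    return user

users = [to_user(row) for row in rows]
for user in users:
    print(user)
```

Listing the columns with explicit aliases, instead of select *, also keeps the column names predictable when several joined tables share a name such as userid.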
While it is totally fine to have this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You can have just another table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe it or not, this approach will save you a lot of headaches in the future when your tables start to grow or you decide to add another type of user.

Can I sort records by child record count with DataMapper (without using raw SQL)?

What I want to do feels pretty basic to me, but I'm not finding a way to do it using DataMapper without resorting to raw SQL. That would look something like:
select u.id, u.name, count(p.id) as post_count
from posts p
inner join users u on p.user_id = u.id
group by p.user_id
order by post_count desc;
The intention of the above query is to show me all users sorted by how many posts each user has. The closest I've found using DataMapper is aggregate, which doesn't give me back resource objects. What I'd like is some way to generate one query and get standard DM objects back.
Assuming you have relationships
has n, :posts
you should be able to do
User.get(id).posts.count
or
User.first(:some_id => id).posts.count
or
u = User.get(1)
u.posts.count
you can also chain conditions
User.get(1).posts.all(:date.gt => '2012-10-01')
see scopes and chaining here http://datamapper.org/docs/find.html
finally add the ordering
User.get(1).posts.all(:order => [:date.desc])

Performing INNER JOIN, GROUP BY, and COUNT on Django models

I am having problems understanding how to make complex queries (or even simple ones) using Django models. I am looking to do an inner join, group by, and count in one statement using django models.
Example:
Select ab.userid, count(ab.userid) as bids, u.username
from auctionbids ab
inner join users u on ab.userid=u.id
group by ab.userid
order by bids desc;
This type of query is very common and straightforward, so I have to imagine it can be done with Django models, but it is not apparent from the documentation.
edit: added models
class Users(models.Model):
    id = models.IntegerField(primary_key=True)
    username = models.CharField(max_length=150)
class Auctionbids(models.Model):
    id = models.IntegerField(primary_key=True)
    user = models.ForeignKey(Users)
If you post your models.py file I can probably give you a more precise answer, but I think what you want is the Django Aggregation API.
You would use it something like this:
from django.db.models import Count
Users.objects.all().annotate(bids=Count('auctionbids')).order_by('-bids')
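To see what such a count-and-order query returns, here is a minimal sketch in plain SQL run through Python's built-in sqlite3 module; it is not the exact SQL Django generates. Table and column names follow the question, and the sample rows are made up. One assumption worth stating: Django's aggregation generally joins with LEFT OUTER JOIN (as in the generated query quoted earlier on this page), so users without bids would be expected to appear with a count of 0, whereas the INNER JOIN from the question drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE auctionbids (id INTEGER PRIMARY KEY, userid INTEGER REFERENCES users(id));
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
    -- ann has 2 bids, ben has 3, cy has none
    INSERT INTO auctionbids (userid) VALUES (1), (2), (2), (2), (1);
""")

# The question's query: INNER JOIN, so users without bids are missing.
inner = conn.execute("""
    SELECT ab.userid, COUNT(ab.userid) AS bids, u.username
    FROM auctionbids ab
    INNER JOIN users u ON ab.userid = u.id
    GROUP BY ab.userid, u.username
    ORDER BY bids DESC
""").fetchall()

# LEFT OUTER JOIN variant: every user is listed, with 0 for users without bids.
outer = conn.execute("""
    SELECT u.id, COUNT(ab.id) AS bids, u.username
    FROM users u
    LEFT OUTER JOIN auctionbids ab ON ab.userid = u.id
    GROUP BY u.id, u.username
    ORDER BY bids DESC
""").fetchall()

print(inner)
print(outer)
```

COUNT(ab.id) counts only non-NULL values, which is why the user without bids ends up with 0 rather than 1 in the outer-join version.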

MS Access Distinct Records in Recordset

So, I once again seem to have an issue with MS Access being finicky, although it seems to also be an issue when trying similar queries in SSMS (SQL Server Management Studio).
I have a collection of tables, loosely defined as follows:
table widget_mfg { id (int), name (nvarchar) }
table widget { id (int), name (nvarchar), mfg_id (int) }
table widget_component { id (int), name (nvarchar), widget_id (int), component_id }
table component { id (int), name (nvarchar), ... } -- There are ~25 columns in this table
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses. I've tried some of these queries:
SELECT c.*, wc.widget_id, w.mfg_id
FROM ((widget_component wc INNER JOIN widget w ON wc.widget_id = w.id)
INNER JOIN widget_manufacturer wm on w.mfg_id = wm.id)
INNER JOIN component c on c.id = wc.component_id
WHERE wm.id = 1
The previous example displays duplicates of any part that is contained in multiple widget_component lists for different widgets.
I've also tried doing:
SELECT DISTINCT c.id, c.name, wc.widget_id, w.mfg_id
FROM component c, widget_component wc, widget w, widget_manufacturer wm
WHERE wm.id=w.mfg_id AND wm.id = 1
This doesn't display anything at all. I was reading about sub-queries, but I do not understand how they work or how they would apply to my current application.
Any assistance in this would be beneficial.
As an aside, I am not very good with either MS Access or SQL in general. I know the basics, but not a lot beyond that.
Edit:
I just tried this code, and it works to get all the component.id's while limiting them to a single entry each. How do I go about using the results of this to get a list of all the rest of the component data (component.*) where the id's from the first part are used to select this data?
SELECT DISTINCT c.part_no
FROM component c, widget w, widget_component wc, widget_manufacturer wm
WHERE(((c.id=wc.component_id AND wc.widget_id=w.id AND w.mfg_id=wm.id AND wm.id=1)))
(P.S. this is probably not the best way to do this, but I am still learning SQL.)
What I'd like to do is query the database and get a list of all
components that a specific manufacturer uses
There are several ways to do this. IN is probably the easiest to write
SELECT c.*
FROM component c
WHERE c.id IN (SELECT wc.component_id
FROM widget w
INNER JOIN widget_component wc
ON w.id = wc.widget_id
WHERE w.mfg_id = 123)
The IN subquery finds all the component ids that a specific manufacturer uses. The outer query then selects any component whose id is in that result. It doesn't matter if it's in there once or 1000 times; it will only get the component record once.
The other ways of doing this are using an EXISTS sub query or using a join to the query (but then you do need to de-dup it)
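The difference between the three approaches can be seen in a small sketch. It runs on SQLite through Python's built-in sqlite3 module, not on MS Access or SQL Server, so treat it as an illustration of the SQL semantics only. The schema is a trimmed-down version of the tables in the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT, mfg_id INTEGER);
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE widget_component (id INTEGER PRIMARY KEY, widget_id INTEGER, component_id INTEGER);
    INSERT INTO widget VALUES (1, 'w1', 123), (2, 'w2', 123), (3, 'w3', 999);
    INSERT INTO component VALUES (1, 'bolt'), (2, 'gear'), (3, 'spring');
    -- bolt is used by both widgets of manufacturer 123; spring only by manufacturer 999
    INSERT INTO widget_component (widget_id, component_id) VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")

# Plain join: one row per (component, widget) pair, so bolt shows up twice.
join_rows = conn.execute("""
    SELECT c.id, c.name FROM component c
    INNER JOIN widget_component wc ON wc.component_id = c.id
    INNER JOIN widget w ON w.id = wc.widget_id
    WHERE w.mfg_id = 123
    ORDER BY c.id
""").fetchall()

# IN subquery: each component row is returned at most once.
in_rows = conn.execute("""
    SELECT c.id, c.name FROM component c
    WHERE c.id IN (SELECT wc.component_id
                   FROM widget w
                   INNER JOIN widget_component wc ON w.id = wc.widget_id
                   WHERE w.mfg_id = 123)
    ORDER BY c.id
""").fetchall()

# EXISTS subquery: same result, written as a correlated check per component.
exists_rows = conn.execute("""
    SELECT c.id, c.name FROM component c
    WHERE EXISTS (SELECT 1
                  FROM widget w
                  INNER JOIN widget_component wc ON w.id = wc.widget_id
                  WHERE w.mfg_id = 123 AND wc.component_id = c.id)
    ORDER BY c.id
""").fetchall()

print(join_rows)
print(in_rows)
print(exists_rows)
```

Because IN and EXISTS only filter rows of component, the outer query can select c.* without any DISTINCT or GROUP BY.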
It sounds like your component -to- widget relationship is one-to-many. Hence the duplicates. (i.e., the same component is used by more than one widget).
Your Select is almost OK --
SELECT c.*, wc.widget_id, w.mfg_id
but the wc.widget_id is causing the duplicates (per the assumption above).
So remove wc.widget_id from the SELECT, or else aggregate it (min, max, count, etc.). Removing is easier. If you aggregate, remember to add a GROUP BY clause.
Try this:
SELECT DISTINCT c.*, w.mfg_id
Also -- FWIW, it's generally a better practice to use field names, instead of the *