Django determine "most popular" - sql

Given this somewhat simplified representation of my application's models, how do I globally find the most popular MyModel? That is, the MyModels that are favorited the most by MyUsers.
I've come across similar blog posts about how to find favorite tags, but I don't think those apply to this particular situation.
class MyModel(models.Model):
    name = models.CharField(...)
    ...
class MyUser(models.Model):
    # MyModel is defined above so it can be referenced directly here
    favorite_models = models.ManyToManyField(MyModel)
    ...
Can this be done in a single query? Or do I need to loop over every MyUser and MyModel to determine the most popular? Thanks in advance!

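I'm too lazy to create a Django project from scratch to test it, but this should do the job:
from django.db.models import Count
MyModel.objects.annotate(num_fans=Count('myuser')).order_by('-num_fans')
Note that the reverse lookup name used in queries is the lowercased model name ('myuser'), not 'myuser_set'. The _set suffix only applies to the attribute on an instance (my_model.myuser_set). If you would rather name the reverse relation explicitly, set related_name:
class MyUser(models.Model):
    favorite_models = models.ManyToManyField(MyModel, related_name='myuser')
and then the same query applies:
MyModel.objects.annotate(num_fans=Count('myuser')).order_by('-num_fans')
(Let me know if it works. In any case, this page should contain what you need: https://docs.djangoproject.com/en/dev/topics/db/aggregation/)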
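To see what a Count annotation over a many-to-many relation boils down to, here is a minimal sketch using only Python's built-in sqlite3 module. The table and column names (mymodel, myuser, myuser_favorite_models) and the sample rows are assumptions made up for illustration; they are not necessarily what Django would generate for the models in the question.

```python
import sqlite3

# In-memory stand-ins for the two model tables plus the many-to-many
# join table. All names and rows here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myuser (id INTEGER PRIMARY KEY);
    CREATE TABLE myuser_favorite_models (
        myuser_id INTEGER REFERENCES myuser(id),
        mymodel_id INTEGER REFERENCES mymodel(id)
    );
    INSERT INTO mymodel VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO myuser VALUES (1), (2), (3);
    INSERT INTO myuser_favorite_models VALUES
        (1, 1), (1, 2), (2, 2), (3, 2), (3, 1);
""")


def most_popular(connection):
    """Return (name, favorite_count) rows, most favorited first."""
    return connection.execute("""
        SELECT m.name, COUNT(f.myuser_id) AS num_fans
        FROM mymodel AS m
        LEFT JOIN myuser_favorite_models AS f ON f.mymodel_id = m.id
        GROUP BY m.id, m.name
        ORDER BY num_fans DESC, m.name
    """).fetchall()


popular = most_popular(conn)
print(popular)
```

The LEFT JOIN keeps models that nobody has favorited (with a count of 0), so the whole ranking comes back from a single grouped query and no loop over users is needed.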

Related

Where to write predefined queries in django?

I am working with a team of engineers, and this is my first Django project.
Since I have done SQL before, I chose to write the predefined queries that the front-end developers are supposed to use to build this page (result set paging, simple find etc.).
I just learned Django QuerySet, and I am ready to use it, but I do not know on which file/class to write them.
Should I write them as methods inside each class in models.py? Django documentation simply writes them in the shell, and I haven't read it say where to put them.
Generally, the Django pattern is to write your queries in your views, in the views.py file. There, each predefined query for a given URL feeds a response that either renders a template (which presumably your front-end team will build with you) or returns JSON (for example through Django REST Framework for an SPA front end).
The tutorial is strong on this, so it may be a better guide to where things go than the reference docs themselves.
Queries can be run anywhere, but Django is built to receive requests through the URL schema and return a response. This is typically done in views.py, and each view is generally wired up by a line in the urls.py file.
If you're particularly interested in following the fat models approach and putting the queries there, then look at Manager objects, which define the querysets you get through, for example, MyModel.objects.all().
My example view (a class-based view, which provides information about a list of matches):
class MatchList(generics.ListCreateAPIView):
    """
    List all matches, or create a new Match.
    """
    queryset = Match.objects.all()
    serializer_class = MatchSerialiser
That queryset could be anything, though.
A function based view with a different queryset would be:
def event(request, event_slug):
    from .models import Event, Comment, Profile
    event = Event.objects.get(event_url=event_slug)
    future_events = Event.objects.filter(date__gt=event.date)
    comments = Comment.objects.select_related('user').filter(event=event)
    final_comments = []
    return render(request, 'core/event.html', {"event": event, "future_events": future_events})
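edit: That second example is quite old, and the query would be better refactored to fetch the comments along with the events. Comments point at events through a foreign key, so from the Event side this is a reverse relation and needs prefetch_related rather than select_related. The name 'comments' assumes related_name='comments' on that foreign key; without it, the default is 'comment_set':
future_events = Event.objects.filter(date__gt=event.date).prefetch_related('comments')
Edit edit: It's worth pointing out that QuerySet isn't a language in the way you're using the term. It's Django's API for the Object Relational Mapper that sits on top of the database, in the same way that SQLAlchemy does. In fact, you can use SQLAlchemy instead of the Django ORM if you really want to. Mostly you'll hear people talking about the Django ORM. :)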
If you have some model SomeModel and you wanted to access its objects via a raw SQL query you would do: SomeModel.objects.raw(raw_query).
For example: SomeModel.objects.raw('SELECT * FROM myapp_somemodel')
https://docs.djangoproject.com/en/1.11/topics/db/sql/#performing-raw-queries
Django file structure:
app/
    models.py
    views.py
    urls.py
    templates/
        app/
            my_template.html
In models.py:
class MyModel(models.Model):
    # field definitions and relations
    ...
In views.py:
from django.shortcuts import render
from .models import MyModel
def my_view(request):
    my_model = MyModel.objects.all()  # here you use the querysets
    # pass the queryset to the template under the name 'my_model'
    return render(request, 'app/my_template.html', {'my_model': my_model})
In urls.py:
from django.conf.urls import url  # Django < 4.0; newer versions use re_path from django.urls
from .views import my_view
urlpatterns = [
    url(r'^myurl/$', my_view, name='my_view'),  # the URL that points to your view
]
And finally in my_template.html:
{# display the data using the Django template language #}
{% for obj in my_model %}
<p>{{ obj }}</p>
{% endfor %}

Django Making 1000 Duplicate Queries

Model:
class Comment(MPTTModel):
    submitter = models.ForeignKey(User, blank=True, null=True)
    post = models.ForeignKey(Post, related_name="post_comments")
    parent = TreeForeignKey('self', blank=True, null=True, related_name="children")
    text = models.CharField("Text", max_length=1000)
    rank = models.FloatField(default=0.0)
    pub_date = models.DateTimeField(auto_now_add=True)
Iterating through nodes has the same effect (>1000 queries).
I had a similar issue with MPTT models. It was solved with select_related (including the parent's foreign keys).
So, depending on your needs, a proper queryset can look like:
Comment.objects.select_related('post', 'submitter', 'parent', 'parent__submitter', 'parent__post')
Also, if you need each comment's children in your loop as well, it can be optimized like this:
queryset.prefetch_related('children')
Or even like this:
from django.db.models import Prefetch
queryset.prefetch_related(
    Prefetch(
        'children',
        # add any other relations you need to select_related here
        queryset=Comment.objects.select_related('post'),
        to_attr='children_with_posts'
    )
)
... and depending on tree depth, you can use this:
queryset.select_related('parent', 'parent__parent', 'parent__parent__parent')
# you get the idea :)
Duplicated queries happen because each object in the iteration hits the database when you access one of its related objects.
Try using select_related in your view method.
Using prefetch_related or select_related will probably resolve this; if it doesn't, you will need a raw query.
Have you read about optimizing Django queries? Here is a guide that explains a lot: https://docs.djangoproject.com/en/3.1/topics/db/optimization/
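The effect described in these answers can be reproduced without Django. The following is a minimal sketch using Python's built-in sqlite3 module, with a hypothetical two-table schema (post, comment) and a small helper, run, that counts the queries issued. It compares fetching each related row lazily with fetching everything in one JOIN, which is roughly the idea behind select_related.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        text TEXT
    );
    INSERT INTO post VALUES (1, 'first'), (2, 'second');
    INSERT INTO comment VALUES
        (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute one query, count it, and return all rows."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_lazy():
    """One query for the comments, then one more per comment for its post."""
    result = []
    for _id, post_id, text in run("SELECT id, post_id, text FROM comment ORDER BY id"):
        title = run("SELECT title FROM post WHERE id = ?", (post_id,))[0][0]
        result.append((text, title))
    return result


def titles_joined():
    """A single JOIN fetches each comment together with its post."""
    return run("""
        SELECT c.text, p.title
        FROM comment AS c JOIN post AS p ON p.id = c.post_id
        ORDER BY c.id
    """)


counter["queries"] = 0
lazy = titles_lazy()
lazy_queries = counter["queries"]

counter["queries"] = 0
joined = titles_joined()
joined_queries = counter["queries"]

print(lazy_queries, joined_queries)
```

Both functions return the same rows; the lazy version issues one query plus one per comment, while the joined version issues exactly one regardless of how many comments there are.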

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add=True)
    enabled = models.BooleanField(default=True)
class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add=True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related.
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This will result in only one (larger) query in which all the page objects are fetched along with their posts.
In the reverse situation (like the one you have), where you want to get all pages and their associated posts, select_related won't work. See this, this, and this question for more information about what you can do.
Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.
You can create a database view that contains all Page columns along with the necessary columns of the latest Post:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_post`
ON testapp_page.id = testapp_post.page_id
AND testapp_post.post_time =
( SELECT MAX(p.post_time)
FROM testapp_post AS p WHERE p.page_id = testapp_page.id );
Then you need to create a model with the flag managed = False (so that manage.py syncdb won't break). You can also inherit from an abstract model to avoid duplicating column definitions:
class PageWithRecentPost(models.Model):  # Or extend abstract BasePost ?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False  # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, that is, fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name)  # Page column
    print(page.post_time)  # Post column
However, this approach is not free of drawbacks:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing multiple queries is definitely a simpler solution.
Changes in Page/Post will require re-creating the view.
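The join behind such a view can be tried out in isolation. Below is a minimal sketch using Python's built-in sqlite3 module; the table names (page, post) and sample rows are simplified assumptions, not the tables Django would create, and the SQL is SQLite's dialect rather than that of the database in the answer. It pairs every page with its most recent post and keeps pages that have no posts.

```python
import sqlite3

# Hypothetical, simplified tables and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        page_id INTEGER REFERENCES page(id),
        post_time TEXT
    );
    INSERT INTO page VALUES (1, 'news'), (2, 'empty');
    INSERT INTO post VALUES
        (1, 1, '2020-01-01 10:00:00'),
        (2, 1, '2020-02-01 10:00:00');
""")

# LEFT JOIN keeps every page; the correlated subquery restricts the
# joined post to the one with the latest post_time for that page.
rows = conn.execute("""
    SELECT page.name, post.id, post.post_time
    FROM page
    LEFT JOIN post
      ON post.page_id = page.id
     AND post.post_time = (
         SELECT MAX(p.post_time) FROM post AS p WHERE p.page_id = page.id
     )
    ORDER BY page.id
""").fetchall()

print(rows)
```

The page without posts still appears, with NULL (None in Python) in the post columns. If two posts on the same page share the same latest post_time, both rows are returned, so a tie-breaker would be needed in that case.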

rails 3 default_scope(:where) and find

find no longer bypasses the default_scope. What should I do now? I need to find entries outside the default scope in many places, and I also need the scoped arrays of entries for many lists in my application.
Why did they change it? :(
Take a look at this article on what has been deprecated in Rails 3.
If you want to use the model without its default_scope, you can use the following snippet (extracted from the article I mentioned):
with_scope and with_exclusive_scope
with_scope and with_exclusive_scope are now implemented on top of Relation as well. Making it possible to use any relation with them :
with_scope(where(:name => 'lifo')) do
  ...
end
Or even use a named scope :
with_exclusive_scope(Item.red) do
  ...
end

Find all records of a certain type in Polymorphic table using ActiveRecord in Rails 3

I have a table Category that is a polymorphic model for a bunch of other models. For instance:
model Address has shipping, billing, home, work categories
model Phone has home, mobile, work, fax categories
model Product has medical, IT equipment, automotive, aerospace, etc. categories
What I want to be able to do is something like
Product.all_categories and get an array of all categories that are specific to this model.
Of course I can do something like this for each model in question:
Category.select("name").where("categorizable_type = ?","address")
Also pace_car - which is rails 3 ready, allows me to do something like this:
Category.for_category_type(Address)
But I was wondering if there is a more straightforward / elegant solution to this problem using Active Record itself, without relying on a gem?
Thank you
I'm not aware of anything built into ActiveRecord that gives you this, but you could set up that for_category_type method in one line of code in your Category model:
scope :for_category_type, lambda { |class_name| where("categorizable_type = ?", class_name) }