Django aggregate query - sql

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add=True)
    enabled = models.BooleanField(default=True)
class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add=True)

Depending on the relationship between the two, you should be able to follow the relationships quite easily, and improve performance by using select_related.
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This results in only one (larger) query, in which the related page objects are fetched together with the posts.
In the reverse situation (like you have), where you want to get all pages and their associated posts, select_related won't work. See this, this and this question for more information about what you can do.

Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.

You can create a database view that contains all Page columns alongside the necessary columns of the latest Post:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_post`
    ON testapp_page.id = testapp_post.page_id
    AND testapp_post.post_time =
        ( SELECT MAX(testapp_post.post_time)
          FROM testapp_post WHERE testapp_page.id = testapp_post.page_id );
Then you need to create a model with the flag managed = False (so that manage.py sync won't break). You can also inherit from an abstract model to avoid duplicating column definitions:
class PageWithRecentPost(models.Model): # Or extend abstract BasePost ?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, that is, fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name) # Page column
    print(page.post_time) # Post column
However, this approach is not free of drawbacks:
It's very DB engine-specific.
You'll need to add the VIEW creation SQL to your project.
If your models are complex, it's very likely that you'll need to resolve table column name clashes.
A model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is definitely a simpler solution.
Changes in Page/Post will require re-creating the view.
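The query behind the view can be tried outside Django. Here is a minimal sketch using Python's built-in sqlite3 module. The table names (page, post) are simplified stand-ins for the testapp_* tables above, and the sample rows are made up for illustration. It only shows that the LEFT JOIN with a correlated MAX() subquery keeps pages that have no posts.

```python
import sqlite3


def pages_with_latest_post(conn):
    """Return (page name, latest post_time or None) for every page."""
    sql = """
        SELECT page.name, post.post_time
        FROM page
        LEFT JOIN post
          ON post.page_id = page.id
         AND post.post_time = (
               SELECT MAX(p2.post_time)
               FROM post AS p2
               WHERE p2.page_id = page.id)
        ORDER BY page.id
    """
    return conn.execute(sql).fetchall()


# Hypothetical sample data: one page with two posts, one page with none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, page_id INTEGER, post_time TEXT);
    INSERT INTO page VALUES (1, 'news'), (2, 'empty');
    INSERT INTO post VALUES (1, 1, '2020-01-01'), (2, 1, '2020-02-01');
""")

rows = pages_with_latest_post(conn)
for name, post_time in rows:
    print(name, post_time)
```

The page without posts comes back with None in the post column, which is why the Post columns on the unmanaged model need null=True.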

Related

What is the best way to get all linked instances of a models in Django?

I am trying to create a messaging system in Django, and I came across an issue: How could I efficiently find all messages linked in a thread?
Let's imagine I have two models:
class Conversation(models.Model):
    sender = models.ForeignKey(User)
    receiver = models.ForeignKey(User)
    first_message = models.OneToOneField(Message)
    last_message = models.OneToOneField(Message)
class Message(models.Model):
    previous = models.OneToOneField(Message)
    content = models.TextField()
(code not tested, I'm sure it wouldn't work as is)
Since it is designed as a simple linked list, is it the only way to traverse it recursively?
Should I try to just get the previous of the previous until I find the first, or is there a way to query all of them more efficiently?
I use a Rest Framework serializer with depth. If you set depth to 3 on the serializer, it will fetch the full related object for every foreign key, up to three levels of relationships deep.
https://www.django-rest-framework.org/api-guide/serializers/#specifying-nested-serialization
class AppliedSerializer(serializers.ModelSerializer):
    class Meta:
        model = Applied
        fields = "__all__"
        depth = 3
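On the original question of avoiding one query per message: this is not something the answer above covers, but many SQL databases can walk a chain of "previous" links in a single statement with a recursive common table expression. Below is a rough sketch using Python's built-in sqlite3 module. The message table, its column names and the sample rows are assumptions made for illustration, not the models from the question.

```python
import sqlite3


def thread_messages(conn, last_message_id):
    """Follow previous_id links from the last message back to the first,
    returning the contents in chronological order."""
    sql = """
        WITH RECURSIVE chain(id, previous_id, content, depth) AS (
            SELECT id, previous_id, content, 0
            FROM message
            WHERE id = ?
            UNION ALL
            SELECT m.id, m.previous_id, m.content, chain.depth + 1
            FROM message AS m
            JOIN chain ON m.id = chain.previous_id
        )
        SELECT content FROM chain ORDER BY depth DESC
    """
    return [row[0] for row in conn.execute(sql, (last_message_id,))]


# Hypothetical data: messages 1 -> 2 -> 3 form one thread, 4 is unrelated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, previous_id INTEGER, content TEXT);
    INSERT INTO message VALUES
        (1, NULL, 'hi'), (2, 1, 'hello'), (3, 2, 'bye'), (4, NULL, 'other thread');
""")

history = thread_messages(conn, 3)
print(history)
```

From Django, a statement like this would have to go through a raw query. A simpler alternative is to give Message a foreign key to Conversation and filter on that.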

Implementing a "soft delete" system using sqlalchemy

We are creating a service for an app using tornado and sqlalchemy. The application is written in django and uses a "soft delete" mechanism, which means rows are never actually deleted from the underlying mysql tables. To mark a row as deleted, we simply set its "deleted" attribute to True. However, in the service we are using sqlalchemy. Initially, we added a check for deleted rows to each query made through sqlalchemy, like:
customers = db.query(Customer).filter(not_(Customer.deleted)).all()
However, this leads to a lot of potential bugs, because developers tend to miss the check for deleted in their queries. Hence we decided to override the default querying with our own query class that does a "pre-filter":
class SafeDeleteMixin(Query):
    def __iter__(self):
        return Query.__iter__(self.deleted_filter())
    def from_self(self, *ent):
        # override from_self() to automatically apply
        # the criterion too. this works with count() and
        # others.
        return Query.from_self(self.deleted_filter(), *ent)
    def deleted_filter(self):
        mzero = self._mapper_zero()
        if mzero is not None:
            crit = mzero.class_.deleted == False
            return self.enable_assertions(False).filter(crit)
        else:
            return self
This was inspired by a solution in the sqlalchemy docs here:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery
However, we are still facing issues. For example, when we filter and update in a single call using the query class defined above, the update does not apply the deleted == False criterion:
db = CustomSession(with_deleted=False)()
result = db.query(Customer).filter(Customer.id == customer_id).update({Customer.last_active_time: last_active_time })
How can I implement the "soft-delete" feature in sqlalchemy?
I've done something similar here. We did it a bit differently: we made a service layer that all database access goes through, kind of like a controller, but only for db access. We called it a ResourceManager, and it's heavily inspired by "Domain Driven Design" (great book, invaluable for using SQLAlchemy well). A derived ResourceManager exists for each aggregate root, i.e. each resource class you want to get at things through (though for really simple ResourceManagers, the derived manager class itself is sometimes generated dynamically). It has a method that gives out your base query, and that base query gets filtered for your soft delete before it's handed out. From then on, you can add to that query generatively for filtering, and finally call it with query.one() or first() or all() or count().
Note that there is one gotcha I encountered with this kind of generative query handling: you can hang yourself if you join a table too many times. In some cases we had to keep track of which tables had already been joined for filtering. If your delete filter is on the primary table, just filter that first, and you can join willy-nilly after that.
So something like this:
class ResourceManager(object):
    # these will get filled in by the derived class
    # you could use ABC tools if you want, we don't bother
    model_class = None
    serializer_class = None
    # the resource manager gets instantiated once per request
    # and passed the current request's SQLAlchemy session
    def __init__(self, dbsession):
        self.dbs = dbsession
    # hand out base query, assumes we have a boolean 'deleted' column
    @property
    def query(self):
        return self.dbs.query(self.model_class).filter(
            getattr(self.model_class, 'deleted') == False)
class UserManager(ResourceManager):
    model_class = User
# some client code might look like this
dbs = SomeSessionFactoryIHave()
user_manager = UserManager(dbs)
users = user_manager.query.filter_by(name_last="Duncan").first()
Now as long as I always start off by going through a ResourceManager, which has other benefits too (see aforementioned book), I know my query is pre-filtered. This has worked very well for us on a current project that has soft-delete and quite an extensive and thorny db schema.
hth!
I would create a function:
def customer_query():
    return db.session.query(Customer).filter(Customer.deleted == False)
I use query functions so that I don't forget default flags, and to set flags based on user permissions, filter using joins, etc., so that these things won't be copy-pasted and forgotten in various places.
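Both answers boil down to having one place that adds the soft-delete criterion, and that place should cover UPDATE as well as SELECT. The sketch below illustrates the idea with Python's built-in sqlite3 module rather than SQLAlchemy, to keep it self-contained. CustomerStore, its method names and the sample rows are all hypothetical.

```python
import sqlite3


class CustomerStore:
    """Hypothetical data-access helper: every statement it issues
    carries the 'deleted = 0' criterion."""

    def __init__(self, conn):
        self.conn = conn

    def all(self):
        return self.conn.execute(
            "SELECT id, name FROM customer WHERE deleted = 0 ORDER BY id"
        ).fetchall()

    def update_name(self, customer_id, name):
        # The soft-delete criterion is applied to UPDATE too, which is
        # the case the pre-filtered query class in the question missed.
        cur = self.conn.execute(
            "UPDATE customer SET name = ? WHERE deleted = 0 AND id = ?",
            (name, customer_id),
        )
        return cur.rowcount  # number of rows actually changed

    def soft_delete(self, customer_id):
        self.conn.execute(
            "UPDATE customer SET deleted = 1 WHERE id = ?", (customer_id,)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)

store = CustomerStore(conn)
store.soft_delete(2)
visible = store.all()
changed = store.update_name(2, "Robert")  # row 2 is soft-deleted, so no row matches
print(visible, changed)
```

The same shape carries over to SQLAlchemy: if updates start from the pre-filtered base query instead of a fresh db.query(Customer), the criterion travels with them.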

Ruby on rails - Sort data with SQL/ActiveRecord instead of ruby

I'm working with two tables, Video and Picture, and I would like to combine them using SQL instead of Ruby. This is how I do it now:
@medias = (Video.all + Picture.all).sort_by { |model| model.created_at }
Is there a way to do the same thing with only SQL/ActiveRecord?
Since you don’t have the same columns in each model you could create a polymorphic relationship with a new model called media. Your Videos and Pictures would be associated with this new model and when you need to work on only your media you don’t need to worry about whether it is a video or a picture. I’m not sure if this fits into your schema and design since there is not much info to go on from your post but this might work if you wanted to take the time to restructure your schema. This would allow you to use the query interface to access media. See the Rails Guide here:
http://guides.rubyonrails.org/association_basics.html#polymorphic-associations
You can create a media model with all the fields needed to satisfy a Video or Picture object. The media model will also have a type field to keep track of what kind of media it is: Video or Picture.
Yes, using ActiveRecord's #order:
@video = Video.order(:created_at)
@pictures = Picture.order(:created_at)
@medias = @video.all + @pictures.all # Really bad idea!
Also calling all on the models like that will unnecessarily load them to memory. If you don't absolutely need all records at that time, then don't use all.
To run sql queries in Rails you could do this:
sql_statement = "Select * from ..."
@data = ActiveRecord::Base.connection.execute(sql_statement)
Then in your view you could simply reference the @data object.
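The raw-SQL answer leaves the statement itself open. One statement that merges and sorts both tables in the database is a UNION ALL over the columns they share, with a single ORDER BY at the end. Here is a small sketch that runs such a statement through Python's built-in sqlite3 module (Python is used only to execute the SQL). The table names videos and pictures follow the usual Rails naming convention, and the columns and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, caption TEXT, created_at TEXT);
    INSERT INTO videos VALUES (1, 'intro', '2021-01-03'), (2, 'demo', '2021-01-01');
    INSERT INTO pictures VALUES (1, 'logo', '2021-01-02');
""")

# Only the columns both tables have in common can be selected; the
# literal 'kind' column records which table each row came from.
sql = """
    SELECT 'Video' AS kind, id, created_at FROM videos
    UNION ALL
    SELECT 'Picture' AS kind, id, created_at FROM pictures
    ORDER BY created_at
"""
medias = conn.execute(sql).fetchall()
for kind, row_id, created_at in medias:
    print(kind, row_id, created_at)
```

The result is a list of plain rows rather than model instances, so each record still has to be loaded by kind and id if the full object is needed.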

Django Making 1000 Duplicate Queries

Model:
class Comment(MPTTModel):
    submitter = models.ForeignKey(User, blank=True, null=True)
    post = models.ForeignKey(Post, related_name="post_comments")
    parent = TreeForeignKey('self', blank=True, null=True, related_name="children")
    text = models.CharField("Text", max_length=1000)
    rank = models.FloatField(default=0.0)
    pub_date = models.DateTimeField(auto_now_add=True)
Iterating through nodes has the same effect (>1000 queries).
I had a similar issue with MPTT models. It was solved with select_related (including for the parent's foreign keys).
So, depending on your needs, a proper queryset can look like:
Comment.objects.select_related('post', 'submitter', 'parent', 'parent__submitter', 'parent__post')
Also, if you need comment's children in your loop as well, it can be optimized like that:
queryset.prefetch_related('children')
Or even like that:
queryset.prefetch_related(
    Prefetch(
        'children',
        queryset=Comment.objects.select_related('post', 'etc.'),
        to_attr='children_with_posts'
    )
)
... and depending on tree depth, you can use that:
queryset.select_related('parent', 'parent__parent', 'parent__parent__parent')
# you got the idea:)
Duplicated queries happen because each object in the iteration hits the database when you access one of its related objects.
Try using select_related in your view method.
Using Django's prefetch_related or select_related will probably resolve that; if it doesn't, you will need a raw query.
Have you read about optimizing Django queries? Here is a simple guide that explains a lot: https://docs.djangoproject.com/en/3.1/topics/db/optimization/
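To see why the query count explodes, it helps to count statements for both access patterns. The sketch below uses Python's built-in sqlite3 module instead of the Django ORM. The tables, rows and the run() helper are made up for illustration. The first loop mimics lazy access to a related object, the second mimics what select_related does with a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, submitter_id INTEGER, text TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO comments VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Lazy access: one query for the comments, then one more per comment
# to look up its submitter.
counter["queries"] = 0
lazy = []
for submitter_id, text in run("SELECT submitter_id, text FROM comments ORDER BY id"):
    name = run("SELECT name FROM users WHERE id = ?", (submitter_id,))[0][0]
    lazy.append((text, name))
lazy_queries = counter["queries"]

# Joined access: the related rows arrive in the same single query.
counter["queries"] = 0
joined = run("""
    SELECT c.text, u.name
    FROM comments AS c JOIN users AS u ON u.id = c.submitter_id
    ORDER BY c.id
""")
joined_queries = counter["queries"]

print(lazy_queries, joined_queries)
```

With N comments the lazy version issues N + 1 statements, so a thousand comments with a couple of related fields each easily explains the numbers in the question.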

Editing saved Multiple Checkbox selections in Django

What's the best way of performing the save? At the moment, when it comes to editing, the saved responses don't populate the form. Other fields such as drop-downs are fine. Is there something I should do in the view to make this work? Here is my view:
def populateaboutme(request):
    extractlinkedindata(request)
    if request.method == "POST":
        form = AboutMeForm(request.POST)
        if form.is_valid():
            today = datetime.date.today()
            currentYYMMDD = today.strftime('%Y-%m-%d')
            model_instance = form.save(commit=False)
            model_instance.save()
            request.session["AboutMe_id"] = model_instance.pk
            StoreImage(settings.STATIC_ROOT, str(request.session["fotoloc"]), '.jpg', str(request.session["AboutMe_id"]))
            return redirect('/dashboard/')
    else:
        myid = request.session["AboutMe_id"]
        if not myid:
            form = AboutMeForm()
        else:
            aboutme = AboutMe.objects.get(pk=int(myid))
            form = AboutMeForm(instance=aboutme)
    return render(request, "aboutme.html", {'form': form})
Here are the models:
class AboutMe(models.Model):
    MyRelationshipIntent = models.CharField(max_length=50)
and the forms:
class AboutMeForm(ModelForm):
    class Meta:
        model = AboutMe
        exclude = ()
    MyRelationshipIntent = forms.MultipleChoiceField(choices=RELATIONSHIPINTENT_CHOICES, widget=forms.CheckboxSelectMultiple())
RELATIONSHIPINTENT_CHOICES = (
    ('JL', 'Just Looking'),
    ('FL', 'Looking for friendship'),
    ('FN', 'Looking for fun'),
    ('FL', 'Looking for a relationship'),
)
You want to use the initial option on the form:
form = AboutMeForm(initial={'name': aboutme.name})
The instance= you are using is what you need when saving, to tell Django this isn't a new object:
if request.method == 'POST':
    form = AboutMeForm(request.POST, instance=aboutme)
Using instance can supply the initial values as well, but only with a ModelForm, and you still need it for the saving part.
Edit
It took me a while to notice it because I was focusing on the form, but the problem you are having stems, essentially, from the fact that you are using a CharField where you should be using a ManyToManyField. I mean - how would four checked boxes be translated into one CharField and vice-versa? Django can't just guess it. It makes no sense.
You can use a CharField if you somehow add a method to translate it to the checkboxes. But it's also a wrong approach so don't. Instead, I'll give you two solutions, and you'll choose the one you see fit.
The most natural thing to do would be to use a ManyToMany field here, and then tell the django form to use the checkbox field for it (the default would be a multiselect, and if you want you can use a client side plugin to make that look nice as well). Your models would look something like this:
class Intent(models.Model):
relationship = models.CharField(max_length=50)
class AboutMe(models.Model):
intents = models.ManyToManyField(Intent)
Then you just create one Intent instance for each of the four values in your RELATIONSHIPINTENT_CHOICES:
rels = ('Just Looking',
        'Looking for friendship',
        'Looking for fun',
        'Looking for a relationship')
for i in rels:
    new = Intent(relationship=i)
    new.save()
This is especially good if you think that you might want to add more options later on (and you can register the model on the admin site to ease that process instead of using the script I wrote up there). If you don't like that solution and you're sure your options will remain the same, another good solution that might suit you is creating a boolean field for each option. Like this:
class AboutMe(models.Model):
    jl = models.BooleanField(verbose_name='Just Looking')
    fl = models.BooleanField(verbose_name='Looking for friendship')
    fn = models.BooleanField(verbose_name='Looking for fun')
    lr = models.BooleanField(verbose_name='Looking for a relationship')  # not 'fl' again: a repeated name would overwrite the field above
Then you don't even need the widget, because checkbox is the default for boolean fields. After doing this, using form(instance=aboutme) and form(initial={'jl': aboutme.jl}) would both work. I know those might look a little scary and more complex than your simple CharField, but this is the right way to go.
p.s.
Other Python tips to keep in mind:
Don't name your class "AboutMe". That should be the view, not the model. It makes more sense (to me at least) to make it an extension of the built-in User, name it User or give it a similar fitting name (Profile or Account or the sort)
Field names should not look like class names (check out PEP8 for more conventions). So it should be my_relationship_intent. However, that's also a long and wearying name. relationship_intent or simply intents is a lot better.
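To make the CharField problem from the edit above concrete: a multiple-choice form field produces a list of codes, while a CharField holds a single string. The snippet below is plain Python, not Django. It assumes the list gets coerced with str() on its way into the CharField and that a non-list initial value is treated as one single value. Under those assumptions, the stored text no longer matches any individual choice code.

```python
CHOICES = ("JL", "FL", "FN")   # codes taken from RELATIONSHIPINTENT_CHOICES

selected = ["JL", "FN"]        # what two ticked checkboxes would submit
stored = str(selected)         # assumption: the list is stringified for the CharField

# When the form is rendered again, a checkbox is ticked only if its code
# equals one of the initial values. A lone string counts as one value.
initial_from_string = [stored]
ticked_from_string = [code for code in CHOICES if code in initial_from_string]
ticked_from_list = [code for code in CHOICES if code in selected]

print(repr(stored))
print(ticked_from_string, ticked_from_list)
```

With a ManyToManyField or separate boolean fields, the value that comes back from the database already has the shape the widget expects, so no such translation step is needed.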