How do I construct complex django query statements?

I am not very familiar with SQL, so making more complex calls through the Django ORM is stumping me. I have a Printer model that spawns Jobs, and each job receives statuses via a State model with a foreign key back to the job. A job's status is determined by the most recent State object associated with it, which lets me track the history of a job's states throughout its life cycle. I want to determine which Printers have successful jobs associated with them.
from django.db import models

class Printer(models.Model):
    label = models.CharField(max_length=120)

class Job(models.Model):
    label = models.CharField(max_length=120)
    printer = models.ForeignKey(
        Printer,
        on_delete=models.CASCADE,  # required since Django 2.0
        related_name='jobs',
        related_query_name='job'
    )

    def set_state(self, state):
        State.objects.create(state=state, job=self)

    @property
    def current_state(self):
        return self.states.latest('created_at').state

class State(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    state = models.SmallIntegerField()
    job = models.ForeignKey(
        Job,
        on_delete=models.CASCADE,  # required since Django 2.0
        related_name='states',
        related_query_name='state'
    )
I need a QuerySet of Printer objects that have at least one related job whose most recent (latest) State object has state == 200. Is there a way to construct a compound call that achieves this in the database, without pulling in all Job objects and iterating over them in Python? Perhaps a custom manager? I've been reading posts about Subquery, annotation, and OuterRef, but these ideas are just not sinking in in a way that shows me a path. I need them explained like I'm 5. They are very unpythonic statements.
The naive Python way to describe what I want:
printers = []
for printer in Printer.objects.all():
    for job in printer.jobs.all():
        # state is a SmallIntegerField, so compare against an int
        if job.states.latest('created_at').state == 200:
            printers.append(printer)
printers = list(set(printers))
But with the least number of DB round trips possible. Help!
Edit: a further question. What's the best way to filter Jobs by their current state? Since Job.current_state is a computed property, it cannot be used in a QuerySet filter. But, again, I don't want to pull in all Job objects.

It took about two days to sink in, but I think I have an answer using annotations and subqueries:
from django.db.models import Exists, OuterRef, Subquery

# States of the job in the enclosing query, newest first
state_sq = State.objects.filter(job=OuterRef('pk')).order_by('-created_at')
# Jobs of the printer in the enclosing query whose latest state is 200
successful_jobs = Job.objects.annotate(
    latest_state=Subquery(state_sq.values('state')[:1])
).filter(printer=OuterRef('pk'), latest_state=200)
printers_with_successful_jobs = Printer.objects.annotate(
    has_success_jobs=Exists(successful_jobs)
).filter(has_success_jobs=True)
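For readers who, like the asker, find Subquery and OuterRef opaque, it can help to see the kind of SQL they stand for. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names (printer, job, state, printer_id, job_id) and the function name are assumptions chosen for illustration, not the names Django would generate, and the SQL is hand-written rather than the ORM's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE printer (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, label TEXT, printer_id INTEGER);
    CREATE TABLE state (
        id INTEGER PRIMARY KEY,
        created_at INTEGER,  -- a simple counter stands in for a timestamp
        state INTEGER,
        job_id INTEGER
    );
    INSERT INTO printer VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO job VALUES (10, 'a-job', 1), (20, 'b-job', 2);
    -- Job 10 ends in 200. Job 20 was 200 once, but its latest state is 500.
    INSERT INTO state (created_at, state, job_id) VALUES
        (1, 100, 10), (2, 200, 10),
        (1, 200, 20), (2, 500, 20);
""")


def printers_with_successful_jobs(connection, wanted_state=200):
    """Labels of printers having a job whose newest state equals wanted_state."""
    query = """
        SELECT p.label
        FROM printer AS p
        WHERE EXISTS (                      -- plays the role of Exists(...)
            SELECT 1
            FROM job AS j
            WHERE j.printer_id = p.id       -- like OuterRef on the printer pk
              AND (
                  SELECT s.state            -- like the Subquery annotation
                  FROM state AS s
                  WHERE s.job_id = j.id     -- like OuterRef on the job pk
                  ORDER BY s.created_at DESC
                  LIMIT 1
              ) = ?
        )
        ORDER BY p.label
    """
    return [row[0] for row in connection.execute(query, (wanted_state,))]


print(printers_with_successful_jobs(conn))
```

Each OuterRef corresponds to a reference from an inner SELECT to a row of the query that encloses it, and the [:1] slice corresponds to LIMIT 1 on the newest-first ordering.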
And further, I constructed a custom manager that annotates latest_state by default.
class JobManager(models.Manager):
    def get_queryset(self):
        state_sq = State.objects.filter(
            job=OuterRef('pk')  # State's foreign key to Job, as in the models above
        ).order_by('-created_at')
        return super().get_queryset().annotate(
            latest_state=Subquery(state_sq.values('state')[:1])
        )

class Job(models.Model):
    objects = JobManager()
    ...

Related

What is the best way to get all linked instances of a model in Django?

I am trying to create a messaging system in Django, and I came across an issue: How could I efficiently find all messages linked in a thread?
Let's imagine I have two models:
class Conversation(models.Model):
    sender = models.ForeignKey(User)
    receiver = models.ForeignKey(User)
    first_message = models.OneToOneField(Message)
    last_message = models.OneToOneField(Message)

class Message(models.Model):
    previous = models.OneToOneField(Message)
    content = models.TextField()
(code not tested, I'm sure it wouldn't work as is)
Since it is designed as a simple linked list, is traversing it recursively the only way?
Should I try to just get the previous of the previous until I find the first, or is there a way to query all of them more efficiently?

I use a Django REST Framework serializer with depth. If you set the serializer's depth to 3, it will fetch the full model for each foreign key it finds, up to three levels of parents.
https://www.django-rest-framework.org/api-guide/serializers/#specifying-nested-serialization
from rest_framework import serializers

class AppliedSerializer(serializers.ModelSerializer):
    class Meta:
        model = Applied
        fields = "__all__"
        depth = 3
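The serializer approach still follows one foreign key at a time. As a different technique from the one in this answer, many databases can walk a linked list in a single statement with a recursive common table expression. Below is a minimal sketch with Python's built-in sqlite3 module. The message table, its previous_id column, and the helper name thread_messages are assumptions for illustration. In Django, a statement like this would likely need raw SQL or a third-party package, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        previous_id INTEGER REFERENCES message(id),
        content TEXT
    );
    INSERT INTO message VALUES
        (1, NULL, 'hello'),
        (2, 1, 'hi there'),
        (3, 2, 'how are you?'),
        (4, NULL, 'unrelated thread');
""")


def thread_messages(connection, last_message_id):
    """Return a thread's contents, oldest first, given its last message id."""
    query = """
        WITH RECURSIVE thread(id, previous_id, content, depth) AS (
            -- anchor row: the last message of the conversation
            SELECT id, previous_id, content, 0
            FROM message
            WHERE id = ?
            UNION ALL
            -- recursive step: follow the previous pointer one hop back
            SELECT m.id, m.previous_id, m.content, thread.depth + 1
            FROM message AS m
            JOIN thread ON m.id = thread.previous_id
        )
        SELECT content FROM thread ORDER BY depth DESC
    """
    return [row[0] for row in connection.execute(query, (last_message_id,))]


print(thread_messages(conn, 3))
```

The recursion stops on its own once a message with no previous message is reached, so the whole chain comes back from one round trip.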

Why would I not get a printed list for some ORM Methods in SQLAlchemy?

So I'm just trying to make sense of the output of the SQLAlchemy ORM methods after creating a model, committing some entries and running queries. Most queries are fine...I'm getting back a list but for some it just returns an object (see below). I know this sounds obvious but is this normal behavior? I'm specifically referring to the filter_by query as you can see below...
# sample_app.py
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = '...'
db = SQLAlchemy(app)

class Person(db.Model):
    __tablename__ = 'persons'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(), nullable=False)

    def __repr__(self):
        return f'<Person Id: {self.id}, name: {self.name}>'

db.create_all()

# Run some basic commands in interactive mode (python3) with model already populated
from sample_app import db, Person
# add a bunch of persons
person1 = Person(name='Amy')
person2 = ...
db.session.add(person1)
db.session.commit()
...
# Run queries
Person.query.all()    # returns all persons as a list
Person.query.first()  # returns first item in the list
Person.query.filter_by(name='Amy')
# returns <flask_sqlalchemy.BaseQuery object at 0xsadfjasdfsd>
So why am I not getting the same type of output for the third query, the one for 'Amy'? Is that normal behavior for the filter_by method?
Thanks

You didn't execute the query in the last example. The all method returns all objects selected by the query; first returns the first one. In the last example you specified a filter, but you didn't call a method that executes the query and returns a result [set].
If there is more than one Amy, you get all the matches with all() or the first one with first(). If your filter should yield a unique record, you could also use .one().
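The distinction between building a query and executing it can be seen in a toy model. The ToyQuery class below is a hypothetical stand-in written for this illustration, not Flask-SQLAlchemy's BaseQuery. It only mimics the pattern in which filter_by returns another query object, while all, first, and one actually produce results.

```python
class ToyQuery:
    """Hypothetical stand-in for a lazy query object (not SQLAlchemy code)."""

    def __init__(self, rows, criteria=None):
        self._rows = rows
        self._criteria = dict(criteria or {})

    def filter_by(self, **kwargs):
        # Building step: nothing is evaluated, a new query object is returned.
        return ToyQuery(self._rows, {**self._criteria, **kwargs})

    def _run(self):
        # Execution step: only here are the criteria applied to the rows.
        return [
            row for row in self._rows
            if all(row.get(key) == value for key, value in self._criteria.items())
        ]

    def all(self):
        return self._run()

    def first(self):
        results = self._run()
        return results[0] if results else None

    def one(self):
        # LookupError is a placeholder chosen for this toy class.
        results = self._run()
        if len(results) != 1:
            raise LookupError(f"expected exactly one row, got {len(results)}")
        return results[0]


people = [{"id": 1, "name": "Amy"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Amy"}]
query = ToyQuery(people).filter_by(name="Amy")

print(type(query).__name__)  # a query object, not a list
print(query.all())           # the matching rows
print(query.first())         # the first matching row
```

Printing the result of filter_by alone shows the query object itself, which matches what the question observed; rows only appear once one of the executing methods is called.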

Implementing a "soft delete" system using sqlalchemy

We are creating a service for an app using tornado and sqlalchemy. The application is written in django and uses a "soft delete" mechanism, meaning that rows are never deleted from the underlying mysql tables. To mark a row as deleted we simply set its "deleted" attribute to True. In the service, however, we are using sqlalchemy. Initially, we added the check for deleted rows to each query made through sqlalchemy, like:
customers = db.query(Customer).filter(not_(Customer.deleted)).all()
However, this leads to a lot of potential bugs because developers tend to miss the check for deleted in their queries. Hence we decided to override the default querying with our own query class that does a "pre-filter":
class SafeDeleteMixin(Query):
    def __iter__(self):
        return Query.__iter__(self.deleted_filter())

    def from_self(self, *ent):
        # override from_self() to automatically apply
        # the criterion too. this works with count() and
        # others.
        return Query.from_self(self.deleted_filter(), *ent)

    def deleted_filter(self):
        mzero = self._mapper_zero()
        if mzero is not None:
            crit = mzero.class_.deleted == False
            return self.enable_assertions(False).filter(crit)
        else:
            return self
This was inspired by a solution in the sqlalchemy docs:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery
However, we are still facing issues. For example, when we filter and update together using the query class defined above, the update does not respect the deleted=False criterion:
db = CustomSession(with_deleted=False)()
result = db.query(Customer).filter(Customer.id == customer_id).update({Customer.last_active_time: last_active_time })
How can I implement the "soft-delete" feature in sqlalchemy?

I've done something similar here. We did it a bit differently: we made a service layer that all database access goes through, kind of like a controller but only for db access. We called it a ResourceManager, and it's heavily inspired by "Domain Driven Design" (great book, invaluable for using SQLAlchemy well). A derived ResourceManager exists for each aggregate root, i.e. each resource class you want to get at things through. (Though sometimes, for really simple ResourceManagers, the derived manager class itself is generated dynamically.) It has a method that gives out your base query, and that base query gets filtered for your soft delete before it's handed out. From then on, you can add to that query generatively for filtering, and finally call it with query.one() or first() or all() or count().
Note that there is one gotcha I encountered with this kind of generative query handling: you can hang yourself if you join a table too many times. In some cases, for filtering, we had to keep track of which tables had already been joined. If your delete filter is on the primary table, just filter that first, and you can join willy-nilly after that.
So something like this:
class ResourceManager(object):
    # these will get filled in by the derived class
    # you could use ABC tools if you want, we don't bother
    model_class = None
    serializer_class = None

    # the resource manager gets instantiated once per request
    # and passed the current request's SQLAlchemy session
    def __init__(self, dbsession):
        self.dbs = dbsession

    # hand out base query, assumes we have a boolean 'deleted' column
    @property
    def query(self):
        return self.dbs.query(self.model_class).filter(
            getattr(self.model_class, 'deleted') == False)

class UserManager(ResourceManager):
    model_class = User

# some client code might look like this
dbs = SomeSessionFactoryIHave()
user_manager = UserManager(dbs)
users = user_manager.query.filter_by(name_last="Duncan").first()
Now as long as I always start off by going through a ResourceManager, which has other benefits too (see aforementioned book), I know my query is pre-filtered. This has worked very well for us on a current project that has soft-delete and quite an extensive and thorny db schema.
hth!

I would create a function
def customer_query():
    return db.session.query(Customer).filter(Customer.deleted == False)
I have used query functions like this so that I don't forget default flags, to set flags based on user permissions, to filter using joins, and so on, so that these things won't be copy-pasted and forgotten in various places.
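The same idea, one place that always adds the deleted check, can also cover the UPDATE case raised in the question. Here is a minimal sketch with Python's built-in sqlite3 module rather than SQLAlchemy. The customer table layout and the helper names live_customers and touch_customer are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT,
        last_active_time INTEGER,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO customer (id, name, last_active_time, deleted) VALUES
        (1, 'Ann', 0, 0),
        (2, 'Ben', 0, 1);
""")

# The soft-delete condition lives in exactly one place.
NOT_DELETED = "deleted = 0"


def live_customers(connection):
    """Select only rows that are not soft-deleted."""
    sql = f"SELECT id, name FROM customer WHERE {NOT_DELETED} ORDER BY id"
    return connection.execute(sql).fetchall()


def touch_customer(connection, customer_id, last_active_time):
    """Update a customer unless it is soft-deleted. Returns the rows changed."""
    sql = f"UPDATE customer SET last_active_time = ? WHERE id = ? AND {NOT_DELETED}"
    cursor = connection.execute(sql, (last_active_time, customer_id))
    return cursor.rowcount


print(live_customers(conn))
print(touch_customer(conn, 1, 100))  # live row
print(touch_customer(conn, 2, 100))  # soft-deleted row
```

Because both the read and the write helper go through the same condition, an update aimed at a soft-deleted row changes nothing instead of silently succeeding.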

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add=True)
    enabled = models.BooleanField(default=True)

class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add=True)

Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related.
Taking this:
class Page(models.Model):
    ...

class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This results in a single (larger) query that uses a join to load each post's page along with the post.
In the reverse situation (like you have), where you want to get all pages and their associated posts, select_related won't work. See this, this and this question for more information about what you can do.

Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.

You can create a database view that contains all Page columns along with the necessary columns of the latest Post:
CREATE VIEW `testapp_pagewithrecentpost` AS
    SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
    FROM `testapp_page` LEFT JOIN `testapp_post`
        ON testapp_page.id = testapp_post.page_id
        AND testapp_post.post_time =
            ( SELECT MAX(testapp_post.post_time)
              FROM testapp_post WHERE testapp_page.id = testapp_post.page_id );
Then you need to create a model with the flag managed = False (so that manage.py syncdb won't break). You can also inherit from an abstract Model to avoid column duplication:
class PageWithRecentPost(models.Model):  # Or extend abstract BasePost ?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False  # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, i.e. fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name)       # Page column
    print(page.post_time)  # Post column
However this approach is not drawback free:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing multiple queries is definitely a simpler solution.
Changes in Page/Post will require re-creating the view.
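The join that such a view is built on can be tried out in isolation. The sketch below uses Python's built-in sqlite3 module with hypothetical table names page and post (not the testapp_ names above) and integer post times standing in for timestamps. It illustrates the LEFT JOIN with a correlated MAX subquery and is not the exact view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        page_id INTEGER REFERENCES page(id),
        post_time INTEGER
    );
    INSERT INTO page VALUES (1, 'news'), (2, 'empty');
    INSERT INTO post VALUES (1, 1, 10), (2, 1, 20);
""")


def pages_with_latest_post(connection):
    """One row per page, with its newest post or NULLs if it has none."""
    query = """
        SELECT page.name, post.id, post.post_time
        FROM page
        LEFT JOIN post
          ON post.page_id = page.id
         AND post.post_time = (
             SELECT MAX(p2.post_time)
             FROM post AS p2
             WHERE p2.page_id = page.id
         )
        ORDER BY page.id
    """
    return connection.execute(query).fetchall()


for row in pages_with_latest_post(conn):
    print(row)
```

The page without posts is still returned, with None in the post columns, which is the LEFT JOIN behavior the question asks for. If two posts on one page share the same newest post_time, this form returns both of them.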

NHibernate Partial Update

Is there a way in NHibernate to start with an unproxied model:
var m = new Model() { ID = 1 };
m.Name = "test";
//Model also has .LastName and .Age
and then save this model, updating only Name, without first selecting the model from the session?

If the model has properties other than Name, you need to initialize them with the original values from the database; otherwise they will be set to null.
You can use HQL update operations; I never tried it myself.
You could also use a native SQL statement ("Update model set name ...").
Usually, this optimization is not needed. Cases where you really need to avoid selecting the data are rare, so writing these SQL statements is mostly a waste of time. You are using an ORM, which means: write your software in an object-oriented way! Otherwise you won't get much advantage from it.

What Stefan says looks like what you need. Please be aware that this is really an edge case and you should be happy with fully loading your entity unless you have some ultra-high-performance issues.
If you simply don't want to hit the database - try using caching - entity cache is very simple and efficient.
If your entity is a huge one - i.e. it contains a blob or something - think about splitting it in two (with many-to-one so that you can utilize lazy loading).

http://www.hibernate.org/hib_docs/nhibernate/html/mapping.html
dynamic-update (optional, defaults to false): Specifies that UPDATE SQL should be generated at runtime and contain only those columns whose values have changed.
Place dynamic-update on the class in the HBM.
var m = new Model() { ID = 1 };
session.Update(m); //attach m to the session (Update returns void)
m.Name = "test";
session.Save(m);
