Why would I not get a printed list for some ORM Methods in SQLAlchemy? - flask-sqlalchemy

So I'm just trying to make sense of the output of the SQLAlchemy ORM methods after creating a model, committing some entries and running queries. Most queries are fine...I'm getting back a list but for some it just returns an object (see below). I know this sounds obvious but is this normal behavior? I'm specifically referring to the filter_by query as you can see below...
#sample_app.py
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app=Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI']='...'
db = SQLAlchemy(app)
class Person(db.Model):
    __tablename__='persons'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(), nullable=False)
    def __repr__(self):
        return f'<Person Id: {self.id}, name: {self.name}>'
db.create_all()
#Run some basic commands in interactive mode with model already populated
python3
from sample_app import db,Person
#add a bunch of persons
person1=Person(name='Amy')
person2=...
db.session.add(person1)
db.session.commit()
...
#Run queries
Person.query.all() #returns all persons as a list
Person.query.first() #returns first item in the list
Person.query.filter_by(name='Amy')
#returns <flask_sqlalchemy.BaseQuery object at 0xsadfjasdfsd>
So why am I not getting the same type of output for the third query, the one for 'Amy'? Is that normal behavior for the filter_by method?
Thanks

You didn't execute the query in the last example. The all method brings back all objects selected by the query; first brings back only the first one. In the last example you specified a filter, but filter_by() only adds a condition and returns a new query object. You never called a method that actually runs the query and returns a result [set].
If there is more than one Amy, you get all the matches with all() or the first one with first(). If your filter should yield exactly one record, you could also use .one().
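To make the build-versus-execute distinction concrete, here is a toy model of a lazy query in plain Python. It is not how SQLAlchemy or Flask-SQLAlchemy is implemented; ToyQuery and the sample rows are hypothetical names invented for this sketch. It only mirrors the behaviour described above: filter_by() hands back another query object, and nothing is looked up until all(), first() or one() is called.

```python
class ToyQuery:
    """Toy stand-in for a lazy ORM query (hypothetical, not SQLAlchemy code)."""

    def __init__(self, rows, criteria=None):
        self._rows = rows
        self._criteria = dict(criteria or {})

    def filter_by(self, **kwargs):
        # Building step: return a NEW query object, touch no rows yet.
        merged = dict(self._criteria, **kwargs)
        return ToyQuery(self._rows, merged)

    def _run(self):
        # Executing step: only now are the rows actually examined.
        return [
            row for row in self._rows
            if all(row.get(k) == v for k, v in self._criteria.items())
        ]

    def all(self):
        return self._run()

    def first(self):
        matches = self._run()
        return matches[0] if matches else None

    def one(self):
        matches = self._run()
        if len(matches) != 1:
            raise LookupError(f"expected exactly one row, got {len(matches)}")
        return matches[0]


people = [
    {"id": 1, "name": "Amy"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Amy"},
]
query = ToyQuery(people).filter_by(name="Amy")  # still just a query object
amys = query.all()                              # executes: every match
first_amy = query.first()                       # executes: first match only
bob = ToyQuery(people).filter_by(name="Bob").one()
```

Printing query in this sketch shows an object repr, much like the BaseQuery output in the question, while amys is an ordinary list.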

Related

How do I construct complex django query statements?

I am not very familiar with SQL, so trying to make more complex calls via the Django ORM is stumping me. I have a Printer model that spawns Jobs, and the jobs receive statuses via a State model with a foreign key relationship to Job. A job's status is determined by the most recent State object associated with it. This lets me track the history of a job's states throughout its life cycle. I want to be able to determine which Printers have successful jobs associated with them.
from django.db import models
class Printer(models.Model):
    label = models.CharField(max_length=120)
class Job(models.Model):
    label = models.CharField(max_length=120)
    printer = models.ForeignKey(
        Printer,
        related_name='jobs',
        related_query_name='job'
    )
    def set_state(self, state):
        State.objects.create(state=state, job=self)
    @property
    def current_state(self):
        return self.states.latest('created_at').state
class State(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    state = models.SmallIntegerField()
    job = models.ForeignKey(
        Job,
        related_name='states',
        related_query_name='state'
    )
I need a QuerySet of Printer objects that have at least one related job whose most recent (latest) State object has state == 200. Is there a way to construct a compound call that achieves this in the database, without pulling in all Job objects and iterating over them in Python? Perhaps a custom manager? I've been reading posts about Subquery, annotation and OuterRef, but these ideas are just not sinking in in a way that shows me a path. I need them explained like I'm 5. They are very unpythonic statements.
The naive python way to describe what I want:
printers = []
for printer in Printer.objects.all():
    for job in printer.jobs.all():
        if job.states.latest('created_at').state == 200:
            printers.append(printer)
printers = list(set(printers))
But with the least number of DB round trips possible. Help!
edit: further question, what's the best way to filter Jobs based on the current state. Since Job.current_state is a calculated property it cannot be used in a QuerySet filter. But, again, I don't want to have to pull in all Job objects.
Took about two days to sink in, but I think I have an answer using annotation and Subqueries:
from django.db.models import Exists, OuterRef, Subquery
state_sq = State.objects.filter(job=OuterRef('pk')).order_by('-created_at')
successful_jobs = Job.objects.annotate(
    latest_state=Subquery(state_sq.values('state')[:1])
).filter(printer=OuterRef('pk'), latest_state=200)
printers_with_successful_jobs = Printer.objects.annotate(
    has_success_jobs=Exists(successful_jobs)
).filter(has_success_jobs=True)
And further, I constructed a custom manager to return latest_state by default.
class JobManager(models.Manager):
    def get_queryset(self):
        # job=... matches the State.job foreign key defined in the question
        state_sq = State.objects.filter(
            job=OuterRef('pk')
        ).order_by('-created_at')
        return super().get_queryset().annotate(
            latest_state=Subquery(state_sq.values('state')[:1])
        )
class Job(models.Model):
    objects = JobManager()
    ...
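For anyone who finds OuterRef and Subquery opaque, it may help to see roughly the SQL shape they stand for. The sketch below uses Python's built-in sqlite3 with hypothetical table names (printer, job, state) and made-up sample rows; the SQL Django actually generates will differ in detail. The innermost SELECT plays the role of the Subquery (the latest state of one job, with j.id acting as the OuterRef), and EXISTS plays the role of Exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE printer (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, label TEXT, printer_id INTEGER);
    CREATE TABLE state (id INTEGER PRIMARY KEY, created_at TEXT,
                        state INTEGER, job_id INTEGER);
    INSERT INTO printer VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO job VALUES (10, 'a-job', 1), (20, 'b-job', 2);
    -- job 10: started (100), then succeeded (200)
    -- job 20: succeeded (200), then failed (500) later on
    INSERT INTO state VALUES
        (1, '2020-01-01 10:00', 100, 10),
        (2, '2020-01-01 11:00', 200, 10),
        (3, '2020-01-01 10:00', 200, 20),
        (4, '2020-01-01 12:00', 500, 20);
""")

# One round trip: printers having at least one job whose LATEST state is 200.
rows = conn.execute("""
    SELECT p.id, p.label
    FROM printer AS p
    WHERE EXISTS (
        SELECT 1
        FROM job AS j
        WHERE j.printer_id = p.id
          AND (SELECT s.state
               FROM state AS s
               WHERE s.job_id = j.id
               ORDER BY s.created_at DESC
               LIMIT 1) = 200
    )
    ORDER BY p.id
""").fetchall()
printer_labels = [label for _id, label in rows]
```

Only 'alpha' qualifies here: job 20 did reach state 200 once, but its most recent state is 500, which is exactly the case a plain filter on related states would get wrong.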

Where to write predefined queries in django?

I am working with a team of engineers, and this is my first Django project.
Since I have done SQL before, I chose to write the predefined queries that the front-end developers are supposed to use to build this page (result set paging, simple find etc.).
I just learned Django QuerySet, and I am ready to use it, but I do not know on which file/class to write them.
Should I write them as methods inside each class in models.py? Django documentation simply writes them in the shell, and I haven't read it say where to put them.
Generally, the Django pattern is that you will write your queries in your views in the views.py file. Here you will take each of your predefined queries for a given URL and return a response that renders a template (that presumably your front end team will build with you.) or returns a JSON response (for example through Django Rest Framework for an SPA front-end).
The tutorial is strong on this, so it may be a better guide to where things go than the reference docs themselves.
Queries can be run anywhere, but django is built to receive Requests through the URL schema, and return a response. This is typically done in the views.py, and each view is generally called by a line in the urls.py file.
If you're particularly interested in following the fat models approach and putting them there, then you might be interested in the Manager objects, which are what define querysets that you get through, for example MyModel.objects.all()
My example view (a class-based view that provides information about a list of matches):
from rest_framework import generics
class MatchList(generics.ListCreateAPIView):
    """
    List all matches, or create a new Match.
    """
    queryset = Match.objects.all()
    serializer_class = MatchSerialiser
That queryset could be anything, though.
A function based view with a different queryset would be:
def event(request, event_slug):
    from .models import Event, Comment, Profile
    event = Event.objects.get(event_url=event_slug)
    future_events = Event.objects.filter(date__gt=event.date)
    comments = Comment.objects.select_related('user').filter(event=event)
    final_comments = []
    return render(request, 'core/event.html', {"event": event, "future_events": future_events})
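edit: That second example is quite old, and the query would be better refactored to the following. Note that it uses prefetch_related, because select_related cannot follow a reverse foreign key, and it assumes Comment's foreign key to Event has related_name='comments':
future_events = Event.objects.filter(date__gt=event.date).prefetch_related('comments')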
Edit edit: It's worth pointing out, QuerySet isn't a language, in the way that you're using it. It's django's API for the Object Relational Mapper that sits on top of the database, in the same way that SQLAlchemy also does - in fact, you can swap out or use SQLAlchemy instead of using the Django ORM, if you really wanted. Mostly you'll hear people talking about the Django ORM. :)
If you have some model SomeModel and you wanted to access its objects via a raw SQL query you would do: SomeModel.objects.raw(raw_query).
For example: SomeModel.objects.raw('SELECT * FROM myapp_somemodel')
https://docs.djangoproject.com/en/1.11/topics/db/sql/#performing-raw-queries
Django file structure:
app/
    models.py
    views.py
    urls.py
    templates/
        app/
            my_template.html
In models.py
class MyModel(models.Model):
    #field definition and relations
    ...
In views.py:
from django.shortcuts import render
from .models import MyModel
def my_view(request):
    my_model = MyModel.objects.all() #here you use the querysets
    return render(request, 'app/my_template.html', {'my_model': my_model}) #pass the objects to the template
In the urls.py
from .views import my_view
url(r'^myurl/$', my_view, name='my_view'), # here you write the url that points to your view
And finally in my_template.html
{# display the data using django template #}
{% for obj in my_model %}
    <p>{{ obj }}</p>
{% endfor %}

Implementing a "soft delete" system using sqlalchemy

We are creating a service for an app using Tornado and SQLAlchemy. The application is written in Django and uses a "soft delete" mechanism, meaning that nothing is ever deleted from the underlying MySQL tables. To mark a row as deleted we simply set its "deleted" attribute to True. In the service, however, we are using SQLAlchemy. Initially, we added the check for deleted rows to each query made through SQLAlchemy, like:
customers = db.query(Customer).filter(not_(Customer.deleted)).all()
However, this leads to a lot of potential bugs, because developers tend to forget the check for deleted in their queries. Hence we decided to override the default querying with our own query class that does a "pre-filter":
class SafeDeleteMixin(Query):
    def __iter__(self):
        return Query.__iter__(self.deleted_filter())
    def from_self(self, *ent):
        # override from_self() to automatically apply
        # the criterion too. this works with count() and
        # others.
        return Query.from_self(self.deleted_filter(), *ent)
    def deleted_filter(self):
        mzero = self._mapper_zero()
        if mzero is not None:
            crit = mzero.class_.deleted == False
            return self.enable_assertions(False).filter(crit)
        else:
            return self
This was inspired by a recipe on the SQLAlchemy wiki:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery
However, we are still facing issues. For example, when we filter and update in a single call using the query class defined above, the update does not apply the deleted == False criterion:
db = CustomSession(with_deleted=False)()
result = db.query(Customer).filter(Customer.id == customer_id).update({Customer.last_active_time: last_active_time })
How can I implement the "soft-delete" feature in SQLAlchemy?
I've done something similar here, though we did it a bit differently. We made a service layer that all database access goes through, kind of like a controller but only for db access. We called it a ResourceManager, and it's heavily inspired by "Domain Driven Design" (great book, invaluable for using SQLAlchemy well). A derived ResourceManager exists for each aggregate root, i.e. each resource class you want to get at things through. (For really simple ResourceManagers, the derived manager class itself is sometimes generated dynamically.) It has a method that gives out your base query, and that base query gets filtered for your soft delete before it's handed out. From then on, you can add to that query generatively for filtering, and finally execute it with query.one() or first() or all() or count().
Note, there is one gotcha I encountered with this kind of generative query handling: you can hang yourself if you join a table too many times. In some cases we had to keep track of which tables had already been joined for filtering. If your delete filter is on the primary table, just filter that first, and you can join willy-nilly after that.
So something like this:
class ResourceManager(object):
    # these will get filled in by the derived class
    # you could use ABC tools if you want, we don't bother
    model_class = None
    serializer_class = None
    # the resource manager gets instantiated once per request
    # and passed the current request's SQLAlchemy session
    def __init__(self, dbsession):
        self.dbs = dbsession
    # hand out base query, assumes we have a boolean 'deleted' column
    @property
    def query(self):
        return self.dbs.query(self.model_class).filter(
            getattr(self.model_class, 'deleted')==False)
class UserManager(ResourceManager):
    model_class = User
# some client code might look like this
dbs = SomeSessionFactoryIHave()
user_manager = UserManager(dbs)
users = user_manager.query.filter_by(name_last="Duncan").first()
Now as long as I always start off by going through a ResourceManager, which has other benefits too (see aforementioned book), I know my query is pre-filtered. This has worked very well for us on a current project that has soft-delete and quite an extensive and thorny db schema.
hth!
I would create a function
def customer_query():
    return db.session.query(Customer).filter(Customer.deleted == False)
I use query functions like this so that I don't forget default flags, and to set flags based on user permissions, filter using joins, etc., so that these things won't be copy-pasted and then forgotten in various places.
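Both answers boil down to one choke point that every statement passes through, and the same idea can cover the UPDATE case from the question. The following is a minimal sketch using Python's built-in sqlite3 rather than SQLAlchemy; CustomerStore, NOT_DELETED and the customer table layout are hypothetical names chosen for illustration, and the extra argument is assumed to be a trusted SQL fragment written by the developer, never user input.

```python
import sqlite3

NOT_DELETED = "deleted = 0"  # the single place where the soft-delete rule lives


class CustomerStore:
    """Hypothetical access layer; every statement goes through _where()."""

    def __init__(self, conn):
        self.conn = conn

    @staticmethod
    def _where(extra=None):
        # Always include the soft-delete criterion, then any extra condition.
        return f"{NOT_DELETED} AND ({extra})" if extra else NOT_DELETED

    def select(self, extra=None, params=()):
        sql = f"SELECT id, name FROM customer WHERE {self._where(extra)} ORDER BY id"
        return self.conn.execute(sql, params).fetchall()

    def update_name(self, name, extra=None, params=()):
        # Updates get the same pre-filter as reads.
        sql = f"UPDATE customer SET name = ? WHERE {self._where(extra)}"
        return self.conn.execute(sql, (name, *params)).rowcount

    def soft_delete(self, customer_id):
        sql = f"UPDATE customer SET deleted = 1 WHERE {self._where('id = ?')}"
        return self.conn.execute(sql, (customer_id,)).rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)", [(1, "Ann"), (2, "Ben")]
)
store = CustomerStore(conn)
store.soft_delete(2)
visible = store.select()
touched = store.update_name("Renamed", "id = ?", (2,))  # row 2 is soft-deleted
```

Here the update aimed at the soft-deleted row changes nothing, because the criterion is attached where the statement is built rather than in each caller.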

working of openerp-create & write orm methods

Can anyone explain how the create and write ORM methods work in OpenERP? I'm stuck on these methods: I don't understand how they work internally or how I can use them in a simple program.
class dumval(osv.osv):
    _name = 'dum_val'
    _columns={
        'state':fields.selection([('done','confirm'),('cancel','cancelled')],'position',readonly=True),
        'name':fields.char('Name',size=40,required=True,states={'done':[('required','False')]}),
        'lname':fields.char('Last name',size=40,required=True),
        'fname':fields.char('Full name',size=80,readonly=True),
        'addr':fields.char('Address',size=40,required=True,help='enter address'),
    }
    _defaults = {
        'state':'done',
    }
It would be nice if u could explain using this example..
A couple of comments plus a bit more detail.
As Lukasz answered, convention is to use periods in your model names dum.val. Usually something like my_module.my_model to ensure there are no name collisions (e.g. account.invoice, sale.order)
I am not sure if your conditional "required" in the model will work; this kind of thing is usually done in the view but it would be worth seeing how the field is defined in the SQL schema.
The create method creates new records (SQL INSERT). It takes a dict of values, applies any defaults you have specified, inserts the record and returns the new ID. Note that you can do compound creates, i.e. if you are creating an invoice, you can add the invoice lines to the dictionary and do it all in one create, and OpenERP will take care of the related fields for you (see the write method in https://doc.openerp.com/trunk/server/api_models/).
The write method updates existing records (SQL UPDATE). It takes a dict of values and applies it to all of the ids you pass. This is an important point: if you pass a list of ids, the same values will be written to every one of them. If you want to update a single record, pass a list with one entry; if you want to apply different updates to different records, you have to make multiple write calls. You can also manage related fields with a write.
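The semantics described above can be mimicked with a small in-memory toy. This is not OpenERP code: ToyModel is a hypothetical class, it leaves out the cr, uid and context arguments of the real methods, and it stores records in a dict instead of a database. It only illustrates that create applies defaults and returns the new id, while write applies one dict of values to every id in the list.

```python
import itertools


class ToyModel:
    """In-memory stand-in for a model (hypothetical, not OpenERP code)."""

    _defaults = {"state": "done"}

    def __init__(self):
        self._records = {}
        self._next_id = itertools.count(1)

    def create(self, vals):
        # Start from the defaults, let the caller's values override them,
        # store the new record and return its id (like an SQL INSERT).
        record = dict(self._defaults, **vals)
        new_id = next(self._next_id)
        self._records[new_id] = record
        return new_id

    def write(self, ids, vals):
        # Apply the SAME values to every id in the list (like an SQL UPDATE).
        for record_id in ids:
            self._records[record_id].update(vals)
        return True

    def read(self, record_id):
        return dict(self._records[record_id])


model = ToyModel()
first_id = model.create({"name": "Ann", "lname": "Lee"})
second_id = model.create({"name": "Bob", "lname": "Ray", "state": "cancel"})
model.write([first_id, second_id], {"addr": "Main St"})  # both records change
model.write([first_id], {"lname": "Poe"})                # only the first one
```

After these calls, both records carry the same addr, while only the first has the new lname, which is why different updates need separate write calls.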
It's convention to give _name like dum.val instead of dum_val.
In dumval class you can write a method:
def abc(self, cr, uid, ids, context=None):
    create_dict = {'name':'xxx','lname':'xxx','fname':'xxx','addr':'xyz'}
    # create new object and get id
    new_id = self.create(cr, uid, create_dict, context=context)
    # write on new object (write expects a list of ids)
    self.write(cr, uid, [new_id], {'lname':'yyy'}, context=context)
For more details look: https://www.openerp.com/files/memento/older_versions/OpenERP_Technical_Memento_v0.6.1.pdf

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add = True)
    enabled = models.BooleanField(default = True)
class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add = True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This will result in only one (larger) query where all the page objects are prefetched.
In the reverse situation (like you have), where you want to get all pages and their associated posts, select_related won't work. See this, this and this question for more information about what you can do.
Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.
You can create a database view that contains all Page columns alongside the necessary latest Post columns:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_post`
    ON testapp_page.id = testapp_post.page_id
    AND testapp_post.post_time =
        ( SELECT MAX(latest.post_time)
          FROM testapp_post AS latest WHERE latest.page_id = testapp_page.id );
Then you need to create a model with flag managed = False (so that manage.py sync won't break). You can also use inheritance from abstract Model to avoid column duplication:
class PageWithRecentPost(models.Model): # Or extend abstract BasePost ?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, so fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name) # Page column
    print(page.post_time) # Post column
However this approach is not drawback free:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is a definitely simpler solution.
Changes in Page/Post will require re-creating the view.
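The join that such a view relies on can be tried out in isolation before committing to it. Below is a small sketch with Python's built-in sqlite3; the page and post tables and their rows are simplified, hypothetical stand-ins for the real testapp tables, and other database engines may need slightly different syntax. It shows the LEFT JOIN keeping a page that has no posts while attaching only the latest post to a page that has several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, page_id INTEGER, post_time TEXT);
    INSERT INTO page VALUES (1, 'news'), (2, 'empty');
    INSERT INTO post VALUES
        (1, 1, '2020-01-01 09:00'),
        (2, 1, '2020-01-02 09:00');
""")

# Every page, joined to its most recent post only (NULLs when it has none).
rows = conn.execute("""
    SELECT page.name, post.id, post.post_time
    FROM page
    LEFT JOIN post
        ON post.page_id = page.id
       AND post.post_time = (
            SELECT MAX(p2.post_time) FROM post AS p2
            WHERE p2.page_id = page.id
       )
    ORDER BY page.id
""").fetchall()
```

One caveat of matching on MAX(post_time): if two posts on the same page share the exact same timestamp, both rows come back for that page.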