Where to write predefined queries in django? - sql

I am working with a team of engineers, and this is my first Django project.
Since I have done SQL before, I chose to write the predefined queries that the front-end developers are supposed to use to build this page (result set paging, simple find etc.).
I just learned Django QuerySet, and I am ready to use it, but I do not know on which file/class to write them.
Should I write them as methods inside each class in models.py? Django documentation simply writes them in the shell, and I haven't read it say where to put them.

Generally, the Django pattern is that you will write your queries in your views in the views.py file. Here you will take each of your predefined queries for a given URL and return a response that renders a template (that presumably your front end team will build with you.) or returns a JSON response (for example through Django Rest Framework for an SPA front-end).
The tutorial is strong on this, so that may be a better bet for where to put things than the docs itself.
Queries can be run anywhere, but django is built to receive Requests through the URL schema, and return a response. This is typically done in the views.py, and each view is generally called by a line in the urls.py file.
If you're particularly interested in following the fat models approach and putting them there, then you might be interested in the Manager objects, which are what define querysets that you get through, for example MyModel.objects.all()
My example view (for a class based view, which provides information about a list of matches:
class MatchList(generics.ListCreateAPIView):
"""
Retrieve, update or delete a Match.
"""
queryset = Match.objects.all()
serializer_class = MatchSerialiser
That queryset could be anything, though.
A function based view with a different queryset would be:
def event(request, event_slug):
from .models import Event, Comment, Profile
event = Event.objects.get(event_url=event_slug)
future_events = Event.objects.filter(date__gt=event.date)
comments = Comment.objects.select_related('user').filter(event=event)
final_comments = []
return render(request, 'core/event.html', {"event": event, "future_events": future_events})
edit: That second example is quite old, and the query would be better refactored to:
future_events=Event.objects.filter(date__gt=event.date).select_related('comments')
Edit edit: It's worth pointing out, QuerySet isn't a language, in the way that you're using it. It's django's API for the Object Relational Mapper that sits on top of the database, in the same way that SQLAlchemy also does - in fact, you can swap out or use SQLAlchemy instead of using the Django ORM, if you really wanted. Mostly you'll hear people talking about the Django ORM. :)

If you have some model SomeModel and you wanted to access its objects via a raw SQL query you would do: SomeModel.objects.raw(raw_query).
For example: SomeModel.objects.raw('SELECT * FROM myapp_somemodel')
https://docs.djangoproject.com/en/1.11/topics/db/sql/#performing-raw-queries

Django file structure:
app/
models.py
views.py
urls.py
templates/
app/
my_template.html
In models.py
class MyModel(models.Model):
#field definition and relations
In views.py:
from .models import MyModel
def my_view():
my_model = MyModel.objects.all() #here you use the querysets
return render('my_template.html', {'my_model': my_model}) #pass the object to the template
In the urls.py
from .views import my_view
url(r'^myurl/$', my_view, name='my_view'), # here you write the url that points to your view
And finally in my_template.html
# display the data using django template
{% for obj in object_list %}
<p>{{ obj }}</p>
{% endfor %}

Related

Implementing a "soft delete" system using sqlalchemy

We are creating a service for an app using tornado and sqlalchemy. The application is written in django and uses a "soft delete mechanism". What that means is that there was no deletion in the underlying mysql tables. To mark a row as deleted we simply set the attributed "delete" as True. However, in the service we are using sqlalchemy. Initially, we started to add check for delete in the queries made through sqlalchemy itself like:
customers = db.query(Customer).filter(not_(Customer.deleted)).all()
However this leads to a lot of potential bugs because developers tend to miss the check for deleted in there queries. Hence we decided to override the default querying with our query class that does a "pre-filter":
class SafeDeleteMixin(Query):
def __iter__(self):
return Query.__iter__(self.deleted_filter())
def from_self(self, *ent):
# override from_self() to automatically apply
# the criterion too. this works with count() and
# others.
return Query.from_self(self.deleted_filter(), *ent)
def deleted_filter(self):
mzero = self._mapper_zero()
if mzero is not None:
crit = mzero.class_.deleted == False
return self.enable_assertions(False).filter(crit)
else:
return self
This inspired from a solution on sqlalchemy docs here:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery
However, we are still facing issues, like in cases where we are doing filter and update together and using this query class as defined above the update does not respect the criterion of delete=False when applying the filter for update.
db = CustomSession(with_deleted=False)()
result = db.query(Customer).filter(Customer.id == customer_id).update({Customer.last_active_time: last_active_time })
How can I implement the "soft-delete" feature in sqlalchemy
I've done something similar here. We did it a bit differently, we made a service layer that all database access goes through, kind of like a controller, but only for db access, we called it a ResourceManager, and it's heavily inspired by "Domain Driven Design" (great book, invaluable for using SQLAlchemy well). A derived ResourceManager exists for each aggregate root, ie. each resource class you want to get at things through. (Though sometimes for really simple ResourceManagers, the derived manager class itself is generated dynamically) It has a method that gives out your base query, and that base query gets filtered for your soft delete before it's handed out. From then on, you can add to that query generatively for filtering, and finally call it with query.one() or first() or all() or count(). Note, there is one gotcha I encountered for this kind of generative query handling, you can hang yourself if you join a table too many times. In some cases for filtering we had to keep track of which tables had already been joined. If your delete filter is off the primary table, just filter that first, and you can join willy nilly after that.
so something like this:
class ResourceManager(object):
# these will get filled in by the derived class
# you could use ABC tools if you want, we don't bother
model_class = None
serializer_class = None
# the resource manager gets instantiated once per request
# and passed the current requests SQAlchemy session
def __init__(self, dbsession):
self.dbs = dbsession
# hand out base query, assumes we have a boolean 'deleted' column
#property
def query(self):
return self.dbs(self.model_class).filter(
getattr(self.model_class, 'deleted')==False)
class UserManager(ResourceManager):
model_class = User
# some client code might look this
dbs = SomeSessionFactoryIHave()
user_manager = UserManager(dbs)
users = user_manager.query.filter_by(name_last="Duncan").first()
Now as long as I always start off by going through a ResourceManager, which has other benefits too (see aforementioned book), I know my query is pre-filtered. This has worked very well for us on a current project that has soft-delete and quite an extensive and thorny db schema.
hth!
I would create a function
def customer_query():
return db.session.query(Customer).filter(Customer.deleted == False)
I used query functions to not forget default flags, to set flags based on user permission, filter using joins etc, so that these things wont be copy-pasted and forgotten at various places.

Ruby on rails - Sort data with SQL/ActiveRecord instead of ruby

I'm working with two tables Video and Picture and I would like to regroup them using SQL instead of ruby. This is how I do it now :
#medias = (Video.all + Picture.all).sort_by { |model| model.created_at }
Is their a way to do the same thing only with SQL/ActiveRecord?
Since you don’t have the same columns in each model you could create a polymorphic relationship with a new model called media. Your Videos and Pictures would be associated with this new model and when you need to work on only your media you don’t need to worry about whether it is a video or a picture. I’m not sure if this fits into your schema and design since there is not much info to go on from your post but this might work if you wanted to take the time to restructure your schema. This would allow you to use the query interface to access media. See the Rails Guide here:
http://guides.rubyonrails.org/association_basics.html#polymorphic-associations
You can create a media model with all the fields need to satisfy a Video or Picture object. The media model will also have a type field to keep track of what kind of media it is: Video or Picture.
Yes, using ActiveRecord's #order:
#video = Video.order(:created_at)
#pictures = Picture.order(:created_at)
#medias = #video.all + #pictures.all # Really bad idea!
Also calling all on the models like that will unnecessarily load them to memory. If you don't absolutely need all records at that time, then don't use all.
To run sql queries in Rails you could do this:
sql_statement = "Select * from ..."
#data = ActiveRecord::Base.connection.execute(sql_statement)
Then in your view you could simply reference the #data object

Django REST framework flat, read-write serializer

In Django REST framework, what is involved in creating a flat, read-write serializer representation? The docs refer to a 'flat representation' (end of the section http://django-rest-framework.org/api-guide/serializers.html#dealing-with-nested-objects) but don't offer examples or anything beyond a suggestion to use a RelatedField subclass.
For instance, how to provide a flat representation of the User and UserProfile relationship, below?
# Model
class UserProfile(models.Model):
user = models.OneToOneField(User)
favourite_number = models.IntegerField()
# Serializer
class UserProfileSerializer(serializers.ModelSerializer):
email = serialisers.EmailField(source='user.email')
class Meta:
model = UserProfile
fields = ['id', 'favourite_number', 'email',]
The above UserProfileSerializer doesn't allow writing to the email field, but I hope it expresses the intention sufficiently well. So, how should a 'flat' read-write serializer be constructed to allow a writable email attribute on the UserProfileSerializer? Is it at all possible to do this when subclassing ModelSerializer?
Thanks.
Looking at the Django REST framework (DRF) source I settled on the view that a DRF serializer is strongly tied to an accompanying Model for unserializing purposes. Field's source param make this less so for serializing purposes.
With that in mind, and viewing serializers as encapsulating validation and save behaviour (in addition to their (un)serializing behaviour) I used two serializers: one for each of the User and UserProfile models:
class UserSerializer(serializer.ModelSerializer):
class Meta:
model = User
fields = ['email',]
class UserProfileSerializer(serializer.ModelSerializer):
email = serializers.EmailField(source='user.email')
class Meta:
model = UserProfile
fields = ['id', 'favourite_number', 'email',]
The source param on the EmailField handles the serialization case adequately (e.g. when servicing GET requests). For unserializing (e.g. when serivicing PUT requests) it is necessary to do a little work in the view, combining the validation and save behaviour of the two serializers:
class UserProfileRetrieveUpdate(generics.GenericAPIView):
def get(self, request, *args, **kwargs):
# Only UserProfileSerializer is required to serialize data since
# email is populated by the 'source' param on EmailField.
serializer = UserProfileSerializer(
instance=request.user.get_profile())
return Response(serializer.data)
def put(self, request, *args, **kwargs):
# Both UserSerializer and UserProfileSerializer are required
# in order to validate and save data on their associated models.
user_profile_serializer = UserProfileSerializer(
instance=request.user.get_profile(),
data=request.DATA)
user_serializer = UserSerializer(
instance=request.user,
data=request.DATA)
if user_profile_serializer.is_valid() and user_serializer.is_valid():
user_profile_serializer.save()
user_serializer.save()
return Response(
user_profile_serializer.data, status=status.HTTP_200_OK)
# Combine errors from both serializers.
errors = dict()
errors.update(user_profile_serializer.errors)
errors.update(user_serializer.errors)
return Response(errors, status=status.HTTP_400_BAD_REQUEST)
First: better handling of nested writes is on it's way.
Second: The Serializer Relations docs say of both PrimaryKeyRelatedField and SlugRelatedField that "By default this field is read-write..." — so if your email field was unique (is it?) it might be you could use the SlugRelatedField and it would just work — I've not tried this yet (however).
Third: Instead I've used a plain Field subclass that uses the source="*" technique to accept the whole object. From there I manually pull the related field in to_native and return that — this is read-only. In order to write I've checked request.DATA in post_save and updated the related object there — This isn't automatic but it works.
So, Fourth: Looking at what you've already got, my approach (above) amounts to marking your email field as read-only and then implementing post_save to check for an email value and perform the update accordingly.
Although this does not strictly answer the question - I think it will solve your need. The issue may be more in the split of two models to represent one entity than an issue with DRF.
Since Django 1.5, you can make a custom user, if all you want is some method and extra fields but apart from that you are happy with the Django user, then all you need to do is:
class MyUser(AbstractBaseUser):
favourite_number = models.IntegerField()
and in settings: AUTH_USER_MODEL = 'myapp.myuser'
(And of course a db-migration, which could be made quite simple by using db_table option to point to your existing user table and just add the new columns there).
After that, you have the common case which DRF excels at.

Django determine "most popular"

Given this somewhat simplified representation of my application's models, my question is how do I globally find the most popular MyModel? I.e., those MyModels are favorited the most by MyUsers.
I've come across similar blog posts about how to find favorite tags, but I don't think those apply to this particular situation.
class MyUser(models.Model):
favorite_models = models.ManyToManyField(MyModel)
...
class MyModel(models.Model):
name = models.CharField(...)
...
Can this be done in a single query? Or do I need to loop over every MyUser and MyModel to determine the most popular? Thanks in advance!
I'm too lazy to create a django project from scratch, but this one should do the job:
from django.db.models import Count
MyModel.objects.annotate(Count('myuser'))
(or this)
MyModel.objects.annotate(Count('myuser_set'))
if not, try this:
class MyUser(models.Model):
favorite_models = models.ManyToManyField(MyModel, related_name='myuser')
and then
MyModel.objects.annotate(Count('myuser_set'))
(let me know if it works, in any case this page should contain what you need to do that: https://docs.djangoproject.com/en/dev/topics/db/aggregation/)

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
name = models.CharField(max_length=50)
created = models.DateTimeField(auto_now_add = True)
enabled = models.BooleanField(default = True)
class Post(models.Model):
user = models.ForeignKey(User)
page = models.ForeignKey(Page)
post_time = models.DateTimeField(auto_now_add = True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related
Taking this:
class Page(models.Model):
...
class Post(models.Model):
page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This will result in only one (larger) query where all the page objects are prefetched.
In the reverse situation (like you have) where you want to get all pages and their associated posts, select_related won't work. See this,this and this question for more information about what you can do.
Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.
You can create a database view that will contain all Page columns alongside with with necessary latest Post columns:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_page`
ON test_page.id = test_post.page_id
AND test_post.post_time =
( SELECT MAX(test_post.post_time)
FROM test_post WHERE test_page.id = test_post.page_id );
Then you need to create a model with flag managed = False (so that manage.py sync won't break). You can also use inheritance from abstract Model to avoid column duplication:
class PageWithRecentPost(models.Model): # Or extend abstract BasePost ?
# Page columns goes here
# Post columns goes here
# We use LEFT JOIN, so all columns from the
# 'post' model will need blank=True, null=True
class Meta:
managed = False # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, so fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
print page.name # Page column
print page.post_time # Post column
However this approach is not drawback free:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is a definitely simpler solution.
Changes in Page/Post will require re-creating the view.