sqlalchemy join with sum and count of grouped rows - sql

Hi, I am working on a little prediction game in Flask with Flask-SQLAlchemy. I have a User model:
class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    nick = db.Column(db.String(255), unique=True)
    bets = relationship('Bet', backref=backref("user"))
and my Bet model:
class Bet(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.Integer, db.ForeignKey('user.id'))
    matchid = db.Column(db.Integer, db.ForeignKey('match.id'))
    points = db.Column(db.Integer)
Neither class is complete, but this should be enough for the question. A user gathers points for predicting match outcomes and gets a different number of points for predicting the exact outcome, the winner, or the difference.
I now want a list of the top users, so I sum up the points like this:
toplist = db.session.query(User.nick, func.sum(Bet.points)).\
    join(User.bets).group_by(Bet.uid).order_by(func.sum(Bet.points).desc()).all()
This works quite well, but two players may end up with the same sum of points. In that case the number of correct predictions (rewarded with 3 points) should decide the winner. I can get that list with:
tophits = db.session.query(User.nick, func.count(Bet.points)).\
    join(User.bets).filter_by(points=3).all()
Both work well, but I think there has to be a way to combine the two queries and get one table with username, points and "hitcount". I've done that before in SQL, but I am not that familiar with SQLAlchemy and have tied my brain in knots. How can I get both queries in one?

In the query for tophits, replace the COUNT/filter_by construct with an equivalent SUM(CASE(...)) without a filter, so that both aggregates share the same WHERE clause. The code below should do it:
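from sqlalchemy import case, func

total_points = func.sum(Bet.points).label("total_points")
# Counts 1 for every bet worth exactly 3 points, 0 otherwise.
# On SQLAlchemy 1.4+ pass the mapping positionally instead:
#   case({3: 1}, value=Bet.points, else_=0)
total_hits = func.sum(case(value=Bet.points, whens={3: 1}, else_=0)).label("total_hits")
q = (session.query(
        User.nick,
        total_points,
        total_hits,
    )
    .join(User.bets)
    .group_by(User.nick)
    .order_by(total_points.desc())
    .order_by(total_hits.desc())
)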
Note that I changed the group_by clause to use the column that appears in the SELECT, as some database engines might complain otherwise. You do not strictly need to do this.
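To see what the combined query computes, here is a minimal sketch in plain SQL, run through Python's standard sqlite3 module rather than SQLAlchemy. The table layout mirrors the models in the question; the sample rows and the in-memory database are assumptions made only for illustration.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, nick TEXT UNIQUE);
    CREATE TABLE bet (
        id INTEGER PRIMARY KEY,
        uid INTEGER REFERENCES user(id),
        points INTEGER
    );
    INSERT INTO user (id, nick) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO bet (uid, points) VALUES
        (1, 3), (1, 3), (1, 0),
        (2, 3), (2, 2), (2, 1),
        (3, 1);
""")

# One pass over the joined rows: SUM gives the points, and
# SUM(CASE ...) counts only the bets worth exactly 3 points.
toplist = conn.execute("""
    SELECT user.nick,
           SUM(bet.points) AS total_points,
           SUM(CASE WHEN bet.points = 3 THEN 1 ELSE 0 END) AS total_hits
    FROM user
    JOIN bet ON bet.uid = user.id
    GROUP BY user.nick
    ORDER BY total_points DESC, total_hits DESC
""").fetchall()

for nick, total_points, total_hits in toplist:
    print(nick, total_points, total_hits)
```

In this sample alice and bob both have 6 points, and alice ranks first because she has two exact hits against bob's one.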

Related

Query on many-to-many relationship in sqlalchemy

I have two db classes linked by a relationship table, and I can retrieve the questions assigned to each paper. However, I want to retrieve all questions not currently assigned to the current paper, regardless of any other papers they are assigned to. I've tried a lot of different things, but nothing did quite what I wanted.
I believe some kind of left outer join should do it, but I cannot figure out the correct syntax.
Thanks in advance for your suggestions.
Here is the DB structure so far:
class Question(db.Model):
    __tablename__ = 'question'
    id = db.Column(db.Integer, primary_key=True)
    papers = db.relationship(
        'Paper', secondary='question_in', backref='has_question', lazy='dynamic')
class Paper(db.Model):
    __tablename__ = 'paper'
    id = db.Column(db.Integer, primary_key=True)
    # returns the questions in the paper
    def all_questions(self):
        questions = Question.query.filter(Question.papers.any(id=self.id)).all()
        return questions
Question_in = db.Table('question_in',
    db.Column('question_id', db.Integer, db.ForeignKey('question.id'), primary_key=True),
    db.Column('paper_id', db.Integer, db.ForeignKey('paper.id'), primary_key=True),
)
You should be able to follow the same logic you used in your all_questions function, and filter it out using a subquery():
def not_assigned(self):
    assigned_questions = db.session.query(Question.id)\
        .filter(Question.papers.any(id=self.id)).subquery()
    not_assigned = db.session.query(Question)\
        .filter(Question.id.notin_(assigned_questions)).all()
    return not_assigned
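The same idea can be sketched as a plain NOT IN subquery using Python's standard sqlite3 module. The tables follow the models above; the sample rows and the standalone not_assigned(paper_id) helper are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY);
    CREATE TABLE paper (id INTEGER PRIMARY KEY);
    CREATE TABLE question_in (
        question_id INTEGER REFERENCES question(id),
        paper_id INTEGER REFERENCES paper(id),
        PRIMARY KEY (question_id, paper_id)
    );
    INSERT INTO question (id) VALUES (1), (2), (3), (4);
    INSERT INTO paper (id) VALUES (10), (20);
    INSERT INTO question_in VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")


def not_assigned(paper_id):
    """Return ids of questions that are not linked to the given paper."""
    rows = conn.execute("""
        SELECT question.id
        FROM question
        WHERE question.id NOT IN (
            SELECT question_id FROM question_in WHERE paper_id = ?
        )
        ORDER BY question.id
    """, (paper_id,)).fetchall()
    return [row[0] for row in rows]


print(not_assigned(10))
print(not_assigned(20))
```

Question 2 is assigned to both papers, so it is excluded for each of them, while question 4 is assigned to neither and shows up for both.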

Understanding odd results when paginating double outerjoins

I'm working with Flask and SQLAlchemy and have stumbled on behavior I do not understand. Building a query with .outerjoin() and .paginate() worked well until today, when I created a query with two outer joins chained one after the other, as in the simplified example code below (db is a reference to SQLAlchemy). Class First has a one-to-many relation with Second, and Second is one-to-one with Third.
For testing purposes I have prepared three search queries. The first two, search_1 and search_2, work well all the time. But search_3 works only until there are two Second records related to the same First record. When more than one Second is related to a First, the query mostly returns fewer records (but not as few as when using .join instead of .outerjoin), and in some cases even more than the number of records in the First table. Strangely, the number of records changes even with a different sort order (always by columns of the First model).
class First(db.Model):
    __tablename__ = 'first'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
    date_update = db.Column(db.DateTime)
class Second(db.Model):
    __tablename__ = 'second'
    id = db.Column(db.Integer, primary_key=True)
    first_id = db.Column(db.Integer, db.ForeignKey('first.id'))
    third_id = db.Column(db.Integer, db.ForeignKey('third.id'))
class Third(db.Model):
    __tablename__ = 'third'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
# prepare base query to reuse later
outer_base = First.query \
    .outerjoin(Second, Second.first_id == First.id) \
    .outerjoin(Third, Third.id == Second.third_id) \
    .order_by(First.id.asc())
# works well
search_1 = First.query.order_by(First.id.asc()).paginate(1, 10, False)
# works well
search_2 = outer_base.all()
# odd as hell...
search_3 = outer_base.paginate(1, 10, False)
I just want to be able to filter First records by a value from Third whenever a relation exists through the Second table. Can anyone please explain what I am missing? Maybe the double outer join can be built differently so that it works with pagination?
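One plausible explanation, offered as an assumption rather than a confirmed diagnosis for this thread: a one-to-many outer join produces one row per (First, Second) pair, so LIMIT/OFFSET and row counts apply to joined rows, not to distinct First records. The sketch below reproduces that effect in plain SQL with Python's standard sqlite3 module; the sample rows are made up.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first (id INTEGER PRIMARY KEY);
    CREATE TABLE third (id INTEGER PRIMARY KEY);
    CREATE TABLE second (
        id INTEGER PRIMARY KEY,
        first_id INTEGER REFERENCES first(id),
        third_id INTEGER REFERENCES third(id)
    );
    INSERT INTO first (id) VALUES (1), (2), (3), (4);
    INSERT INTO third (id) VALUES (100), (101), (102);
    -- First #1 has three Second rows; the other First rows have none.
    INSERT INTO second (first_id, third_id) VALUES (1, 100), (1, 101), (1, 102);
""")

joined = """
    SELECT first.id
    FROM first
    LEFT OUTER JOIN second ON second.first_id = first.id
    LEFT OUTER JOIN third ON third.id = second.third_id
    ORDER BY first.id
"""

# The join yields more rows than there are First records.
all_rows = [row[0] for row in conn.execute(joined)]

# A "page" of 3 joined rows contains only one distinct First record.
page = [row[0] for row in conn.execute(joined + " LIMIT 3")]

# Paginating over distinct ids gives three different First records.
distinct_page = [
    row[0]
    for row in conn.execute(
        joined.replace("SELECT first.id", "SELECT DISTINCT first.id") + " LIMIT 3"
    )
]

print(all_rows, page, distinct_page)
```

If this is indeed the cause, filtering with an EXISTS-style condition (for example relationship .any() / .has() in SQLAlchemy) instead of joining would keep one row per First record and make pagination behave predictably.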

SQLAlchemy query.filter returned no FROM clauses due to auto-correlation

I'm a beginner SQLAlchemy user frustrated with the extensive documentation.
I have a newsfeed that is updated when a new revision is made to some content object. content always has at least one revision. content is related to 1 or more topics through an association table. I'm given a set of topic.ids T, and would like to show the N most recent "approved" revisions belonging to a row in content that has at least one topic in T. ("approved" is just an enum attribute on revision)
Here are the models and the relevant attributes:
class Revision(Model):
    __tablename__ = 'revision'
    class Statuses(object):  # enum
        APPROVED = 'approved'
        PROPOSED = 'proposed'
        REJECTED = 'rejected'
        values = [APPROVED, PROPOSED, REJECTED]
    id = Column(Integer, primary_key=True)
    data = Column(JSONB, default=[], nullable=False)
    content_id = db.Column(db.Integer, db.ForeignKey('content.id'), nullable=False)
class Content(Model):
    __tablename__ = 'content'
    id = Column(Integer, primary_key=True)
    topic_edges = relationship(
        'TopicContentAssociation',
        primaryjoin='Content.id == TopicContentAssociation.content_id',
        backref='content',
        lazy='dynamic',
        cascade='all, delete-orphan'
    )
    revisions = relationship(
        'Revision',
        lazy='dynamic',
        backref='content',
        cascade='all, delete-orphan'
    )
class TopicContentAssociation(Model):
    __tablename__ = 'topic_content_association'
    topic_id = Column(Integer, ForeignKey('topic.id'), primary_key=True)
    content_id = Column(Integer, ForeignKey('content.id'), primary_key=True)
class Topic(Model):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
Here's what I've got so far:
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            Revision.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
and this error is happening:
Select statement returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.:
SELECT *
FROM topic, revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
The interesting part is that if I remove the and_ operator and the second expression within it (lines 3, 5, and 6), the error seems to go away.
BONUS: :) I would also like to show only one revision per row of content. If somebody hits save a bunch of times, I don't want the feed to be cluttered.
Again, I'm very new to SQLAlchemy (and actually, relational databases), so an answer targeted to a beginner would be much appreciated!
EDIT: adding .correlate(Revision) after the .where clause fixes things, but I'm still working to figure out exactly what is going on here.
This is a very late response, but I am replying in case someone finds this topic.
Your second expression within the exists statement (Revision.status == Revision.Statuses.APPROVED) references the Revision table a second time in your query. The first reference comes from writing "session.query(Revision)".
If we "translate" your exists statement into PostgreSQL, it would be:
EXISTS (SELECT 1
    FROM Revision
    WHERE topic.id IN (:id_1, :id_2, :id_3...)
    AND revision.status = :status_1
)
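SQLAlchemy auto-correlates a subquery against any table that already appears in the enclosing query, so referencing Revision again inside exists() leaves the subquery with no FROM clause of its own. One way around this is to refer to the table through an alias created with the aliased() function (the .correlate() call mentioned in the question's edit is another). Your code should then look like this: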
from sqlalchemy.orm import aliased
aliasRev = aliased(Revision)
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            aliasRev.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
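For readers new to correlated subqueries, the sketch below shows in plain SQL what a correlated EXISTS looks like: the inner query refers to a column of the outer row (revision.content_id). It uses Python's standard sqlite3 module, and it assumes the asker's goal is "the N most recent approved revisions whose content has at least one topic in T". The status and ts_created columns are taken from the question's query; the sample rows and the recent_approved helper are hypothetical.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE topic (id INTEGER PRIMARY KEY);
    CREATE TABLE topic_content_association (
        topic_id INTEGER REFERENCES topic(id),
        content_id INTEGER REFERENCES content(id),
        PRIMARY KEY (topic_id, content_id)
    );
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        content_id INTEGER NOT NULL REFERENCES content(id),
        status TEXT,
        ts_created INTEGER
    );
    INSERT INTO content (id) VALUES (1), (2), (3);
    INSERT INTO topic (id) VALUES (7), (8), (9);
    INSERT INTO topic_content_association VALUES (7, 1), (8, 2), (9, 3);
    INSERT INTO revision (id, content_id, status, ts_created) VALUES
        (1, 1, 'approved', 10),
        (2, 1, 'proposed', 20),
        (3, 2, 'approved', 30),
        (4, 3, 'approved', 40),
        (5, 1, 'approved', 50);
""")


def recent_approved(topic_ids, n):
    """Ids of the n newest approved revisions whose content has a topic in topic_ids."""
    placeholders = ", ".join("?" for _ in topic_ids)
    sql = f"""
        SELECT revision.id
        FROM revision
        WHERE revision.status = 'approved'
          AND EXISTS (
              -- Correlated: tca.content_id is compared with the outer row.
              SELECT 1
              FROM topic_content_association AS tca
              WHERE tca.content_id = revision.content_id
                AND tca.topic_id IN ({placeholders})
          )
        ORDER BY revision.ts_created DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (*topic_ids, n))]


print(recent_approved([7, 8], 10))
print(recent_approved([7, 8], 2))
```

Here the subquery has its own FROM clause (the association table) and only borrows one column from the outer query, which is the shape SQLAlchemy's auto-correlation is designed for.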

SQLAlchemy: select all posts that has tags in [..] (many-to-many)

I have Users, Interests and Events.
A User has (many-to-many) interests. An Event has (many-to-many) interests. That's why I have two "intermediate" tables: user_to_interest and event_to_interest.
I want to select all events that have interests from the user's interest list (in other words, all events that have tags IN [1, 144, 4324]).
In SQL I'd do it roughly like this:
SELECT DISTINCT event.name FROM event JOIN event_to_interest ON event.id = event_to_interest.event_id WHERE event_to_interest.interest_id IN (10, 144, 432)
How should I do that through SQLAlchemy? (I'm using Flask-SQLAlchemy if necessary)
Assuming you have a (simplified) model like below:
user_to_interest = Table('user_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('user.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)
event_to_interest = Table('event_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('event_id', Integer, ForeignKey('event.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
class Event(Base):
    __tablename__ = 'event'
    id = Column(Integer, primary_key=True)
    name = Column(String)
class Interest(Base):
    __tablename__ = 'interest'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    users = relationship(User, secondary=user_to_interest, backref="interests")
    events = relationship(Event, secondary=event_to_interest, backref="interests")
Version-1: you should be able to do a simple query on the list of interest_ids, which generates basically the SQL statement you want:
interest_ids = [10, 144, 432]
query = session.query(Event.name)
query = query.join(event_to_interest, event_to_interest.c.event_id == Event.id)
query = query.filter(event_to_interest.c.interest_id.in_(interest_ids))
However, if an event has two or more of the interests from the list, the query returns the same Event.name multiple times. You can work around this with distinct: query = session.query(Event.name.distinct())
Version-2: Alternatively, you could do this using just relationships. This generates different SQL, using a sub-query with an EXISTS clause, but semantically it should be the same:
query = session.query(Event.name)
query = query.filter(Event.interests.any(Interest.id.in_(interest_ids)))
This version does not have a problem with duplicates.
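The difference between the two versions can be checked with a small plain-SQL sketch using Python's standard sqlite3 module. The tables follow the simplified model above; the event names and link rows are made-up sample data.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_to_interest (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event(id),
        interest_id INTEGER
    );
    INSERT INTO event (id, name) VALUES
        (1, 'concert'), (2, 'hackathon'), (3, 'marathon');
    -- 'concert' matches two interests from the list below.
    INSERT INTO event_to_interest (event_id, interest_id) VALUES
        (1, 10), (1, 144), (2, 432), (3, 5);
""")

interest_ids = (10, 144, 432)

# Version-1 style: JOIN + IN returns one row per matching link row.
join_sql = """
    SELECT event.name
    FROM event
    JOIN event_to_interest ON event_to_interest.event_id = event.id
    WHERE event_to_interest.interest_id IN (?, ?, ?)
    ORDER BY event.name
"""

# Version-2 style: EXISTS tests each event once, so no duplicates appear.
exists_sql = """
    SELECT event.name
    FROM event
    WHERE EXISTS (
        SELECT 1
        FROM event_to_interest
        WHERE event_to_interest.event_id = event.id
          AND event_to_interest.interest_id IN (?, ?, ?)
    )
    ORDER BY event.name
"""

with_join = [row[0] for row in conn.execute(join_sql, interest_ids)]
with_distinct = [
    row[0]
    for row in conn.execute(
        join_sql.replace("SELECT", "SELECT DISTINCT", 1), interest_ids
    )
]
with_exists = [row[0] for row in conn.execute(exists_sql, interest_ids)]

print(with_join, with_distinct, with_exists)
```

The plain join lists 'concert' twice, while both the DISTINCT and the EXISTS variants list each matching event once.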
However, I would take one step back and assume that you get the interest_ids for a particular user, and so would create a query that works for a user_id (or User.id).
Final Version: using any twice:
def get_events_for_user(user_id):
    #query = session.query(Event.name)
    query = session.query(Event)  # note: I assume name is not enough
    query = query.filter(Event.interests.any(Interest.users.any(User.id == user_id)))
    return query.all()
One can argue that this does not create the most beautiful SQL statement, but that is exactly the beauty of SQLAlchemy: it hides the implementation details.
Bonus: you might actually want to give higher priority to events that share more interests with the user. In that case the query below could help:
query = session.query(Event, func.count('*').label("num_interests"))
query = query.join(Interest, Event.interests)
query = query.join(User, Interest.users)
query = query.filter(User.id == user_id)
query = query.group_by(Event)
# first order by overlapping interests, then also by event.date
query = query.order_by(func.count('*').label("num_interests").desc())
#query = query.order_by(Event.date)
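As a rough plain-SQL counterpart of the bonus query, the sketch below ranks events by how many interests they share with a user. It uses Python's standard sqlite3 module; the link tables are simplified (no surrogate id column), and the sample rows and the ranked_events helper are hypothetical.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_to_interest (user_id INTEGER, interest_id INTEGER);
    CREATE TABLE event_to_interest (event_id INTEGER, interest_id INTEGER);
    INSERT INTO event (id, name) VALUES
        (1, 'concert'), (2, 'hackathon'), (3, 'marathon');
    INSERT INTO user_to_interest VALUES (1, 10), (1, 144), (1, 432), (2, 5);
    INSERT INTO event_to_interest VALUES
        (1, 10), (2, 10), (2, 144), (2, 432), (3, 5);
""")


def ranked_events(user_id):
    """Events sharing at least one interest with the user, most overlap first."""
    return conn.execute("""
        SELECT event.name, COUNT(*) AS num_interests
        FROM event
        JOIN event_to_interest AS ei ON ei.event_id = event.id
        JOIN user_to_interest AS ui ON ui.interest_id = ei.interest_id
        WHERE ui.user_id = ?
        GROUP BY event.id, event.name
        ORDER BY num_interests DESC, event.name
    """, (user_id,)).fetchall()


print(ranked_events(1))
print(ranked_events(2))
```

Each joined row represents one shared interest, so COUNT(*) per event is the size of the overlap.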

Django sql order by

I'm really struggling on this one.
I need to be able to sort my users by the number of positive votes received on their comments.
I have a table userprofile, a table comment and a table likeComment.
The table comment has a foreign key to the user who created it, and the table likeComment has a foreign key to the liked comment.
To get the number of positive votes a user received I do:
LikeComment.objects.filter(Q(type = 1), Q(comment__user=user)).count()
Now I want to get all the users, sorted so that those with the most positive votes come first. How do I do that? I tried to use extra and JOIN, but this didn't go anywhere.
Thank you
It sounds like you want to perform a filter on an annotation:
# Note: Django 2.0+ also requires an on_delete argument on each ForeignKey.
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
# Parentheses let the chained calls span several lines.
users = (
    User.objects
    .all()
    .extra(select={
        "positive_likes": """
            SELECT COUNT(*) FROM app_like
            JOIN app_comment ON app_like.comment_id = app_comment.id
            WHERE app_comment.user_id = app_user.id AND app_like.type = 1 """})
    .order_by("positive_likes")
)
models.py
class UserProfile(models.Model):
    .........
    def like_count(self):
        # return the count so it can be used as a sort key
        return LikeComment.objects.filter(comment__user=self.user, type=1).count()
views.py
def getRanking(anObject):
    return anObject.like_count()
def myview(request):
    users = list(UserProfile.objects.filter())
    users.sort(key=getRanking, reverse=True)
    return render(request, 'page.html', {'users': users})
Timmy's suggestion to use a subquery is probably the simplest way to solve this kind of problem, but subqueries almost never perform as well as joins, so if you have a lot of users you may find that you need better performance.
So, re-using Timmy's models:
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
the query you want looks like this in SQL:
SELECT app_user.id, COUNT(app_like.id) AS total_likes
FROM app_user
LEFT OUTER JOIN app_comment
    ON app_user.id = app_comment.user_id
LEFT OUTER JOIN app_like
    ON app_comment.id = app_like.comment_id AND app_like.type = 1
GROUP BY app_user.id
ORDER BY total_likes DESC
(If your actual User model has more fields than just id, then you'll need to include them all in the SELECT and GROUP BY clauses.)
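The behavior of this double LEFT OUTER JOIN, in particular that users without any likes still appear with a count of 0, can be checked with a minimal sketch using Python's standard sqlite3 module. The table names follow the SQL above; the sample rows and the extra tie-breaker on app_user.id are assumptions for illustration.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY);
    CREATE TABLE app_comment (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id)
    );
    CREATE TABLE app_like (
        id INTEGER PRIMARY KEY,
        comment_id INTEGER REFERENCES app_comment(id),
        type INTEGER
    );
    INSERT INTO app_user (id) VALUES (1), (2), (3);
    INSERT INTO app_comment (id, user_id) VALUES (1, 1), (2, 1), (3, 2);
    -- type = 1 is a positive vote; user 3 has no comments at all.
    INSERT INTO app_like (comment_id, type) VALUES
        (1, 1), (2, 1), (2, 0), (3, 1), (3, 0);
""")

# COUNT(app_like.id) ignores the NULLs produced by the outer joins,
# so users without positive likes get 0 instead of disappearing.
ranking = conn.execute("""
    SELECT app_user.id, COUNT(app_like.id) AS total_likes
    FROM app_user
    LEFT OUTER JOIN app_comment
        ON app_user.id = app_comment.user_id
    LEFT OUTER JOIN app_like
        ON app_comment.id = app_like.comment_id AND app_like.type = 1
    GROUP BY app_user.id
    ORDER BY total_likes DESC, app_user.id
""").fetchall()

for user_id, total_likes in ranking:
    print(user_id, total_likes)
```

Putting the type = 1 condition in the ON clause rather than in WHERE is what keeps the zero-like users in the result.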
Django's object-relational mapping system doesn't provide a way to express this query. (As far as I know—and I'd be very happy to be told otherwise!—it only supports aggregation across one join, not across two joins as here.) But when the ORM isn't quite up to the job, you can always run a raw SQL query, like this:
sql = '''
    SELECT app_user.id, COUNT(app_like.id) AS total_likes
    -- etc. (as above)
'''
for user in User.objects.raw(sql):
    print(user.id, user.total_likes)
I believe this can be achieved with Django's queryset API:
from django.db.models import Count

User.objects.filter(comments__likes__type=1)\
    .annotate(lks=Count('comments__likes'))\
    .order_by('-lks')
The only problem here is that this query will miss users with 0 likes. The code from @gareth-rees, @timmy-omahony and @Catherine also includes 0-ranked users.