Understanding odd results when paginating double outerjoins - flask-sqlalchemy

I'm working with Flask and SQLAlchemy, and I stumbled on behavior I do not understand. When I built queries with .outerjoin() and .paginate(), everything worked well until today, when I created a query with two chained outer joins (the second joined through the first), as in the simplified example code below (db is a reference to SQLAlchemy). Class First has a one-to-many relation with Second, and Second is one-to-one with Third.
For testing purposes I have prepared three search queries. The first two, search_1 and search_2, always work well. But search_3 works only as long as no First record has more than one related Second record. When there is more than one Second related to a First, the query mostly returns fewer records (though not as few as when using .join instead of .outerjoin), and in some cases even more than the number of records in the First table. Strangely, the number of records changes even when I use a different sort order (always by columns of the First model).
from datetime import datetime

class First(db.Model):
    __tablename__ = 'first'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
    date_update = db.Column(db.DateTime)

class Second(db.Model):
    __tablename__ = 'second'
    id = db.Column(db.Integer, primary_key=True)
    first_id = db.Column(db.Integer, db.ForeignKey('first.id'))
    third_id = db.Column(db.Integer, db.ForeignKey('third.id'))

class Third(db.Model):
    __tablename__ = 'third'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
# prepare base query to reuse later
outer_base = First.query \
    .outerjoin(Second, Second.first_id == First.id) \
    .outerjoin(Third, Third.id == Second.third_id) \
    .order_by(First.id.asc())
# works well
search_1 = First.query.order_by(First.id.asc()).paginate(page=1, per_page=10, error_out=False)
# works well
search_2 = outer_base.all()
# odd as hell...
search_3 = outer_base.paginate(page=1, per_page=10, error_out=False)
I just want to be able to filter First records by a value from Third, whenever such a relation exists through the Second table. Can anyone please explain what I am missing? Maybe the double outer join can be written differently so that it works with pagination?
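A hedged illustration of what is probably happening: paginate() puts LIMIT/OFFSET on the SQL rows, and with the outer joins there is one row per First/Second pair, so a First with two Second rows takes up two slots of the page. The legacy ORM Query then collapses duplicate First objects (the same mechanism described in the paginate answer further down), and the total is counted over the joined rows as well, which can exceed the size of the First table. Which First rows land on the first page depends on the sort order, and each carries a different number of Second rows, so the item count shifts with the ordering. The sketch reproduces this with Python's built-in sqlite3 module and plain SQL rather than Flask-SQLAlchemy; the trimmed schema and the data are made up. In SQLAlchemy terms the two remedies shown correspond roughly to adding .distinct() or filtering through exists() instead of joining.

```python
import sqlite3

# Stand-in for the question's schema (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first  (id INTEGER PRIMARY KEY);
    CREATE TABLE third  (id INTEGER PRIMARY KEY, date_create TEXT);
    CREATE TABLE second (id INTEGER PRIMARY KEY,
                         first_id INTEGER REFERENCES first(id),
                         third_id INTEGER REFERENCES third(id));
""")
# Hypothetical data: five First rows; First 1 and First 2 each own two Second rows.
conn.executemany("INSERT INTO first (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO third (id, date_create) VALUES (?, ?)",
                 [(1, "2020-01-01"), (2, "2020-02-01"), (3, "2020-03-01"), (4, "2020-04-01")])
conn.executemany("INSERT INTO second (first_id, third_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4)])

JOINS = """
    FROM first
    LEFT OUTER JOIN second ON second.first_id = first.id
    LEFT OUTER JOIN third ON third.id = second.third_id
"""
PER_PAGE = 4

# LIMIT counts joined rows, so First 1 and First 2 each use up two slots.
rows = conn.execute("SELECT first.id" + JOINS + "ORDER BY first.id LIMIT ?",
                    (PER_PAGE,)).fetchall()
# Collapsing duplicate First identities, as the ORM does, leaves only two objects.
page_ids = list(dict.fromkeys(r[0] for r in rows))

# A COUNT over the joined rows overstates the number of First records (7 vs 5).
total = conn.execute("SELECT COUNT(*) FROM (SELECT first.id" + JOINS + ")").fetchone()[0]

# DISTINCT removes the duplicates before LIMIT is applied.
distinct_ids = [r[0] for r in conn.execute(
    "SELECT DISTINCT first.id" + JOINS + "ORDER BY first.id LIMIT ?", (PER_PAGE,))]

# Filtering through EXISTS never multiplies First rows: First 2 has two
# matching Third rows but is returned once.
exists_ids = [r[0] for r in conn.execute("""
    SELECT first.id FROM first
    WHERE EXISTS (SELECT 1 FROM second JOIN third ON third.id = second.third_id
                  WHERE second.first_id = first.id AND third.date_create >= ?)
    ORDER BY first.id""", ("2020-03-01",))]

print(page_ids, total, distinct_ids, exists_ids)  # [1, 2] 7 [1, 2, 3, 4] [2]
```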

Related

Query on many-to-many relationship in sqlalchemy

I have two db classes linked by a relationship table, and I can retrieve the questions assigned to each paper. However, I want to retrieve all questions not currently assigned to the current paper, regardless of any other papers they are assigned to. I've tried many different things, but nothing quite did what I wanted.
I believe some kind of left outer join should do it, but I cannot figure out the correct syntax.
Thanks in advance for your suggestions.
Here is the DB structure so far:
class Question(db.Model):
    __tablename__ = 'question'
    id = db.Column(db.Integer, primary_key=True)
    papers = db.relationship(
        'Paper', secondary='question_in', backref='has_question', lazy='dynamic')

class Paper(db.Model):
    __tablename__ = 'paper'
    id = db.Column(db.Integer, primary_key=True)

    # returns the questions in the paper
    def all_questions(self):
        questions = Question.query.filter(Question.papers.any(id=self.id)).all()
        return questions

Question_in = db.Table('question_in',
    db.Column('question_id', db.Integer, db.ForeignKey('question.id'), primary_key=True),
    db.Column('paper_id', db.Integer, db.ForeignKey('paper.id'), primary_key=True),
)
You should be able to follow the same logic you used in your all_questions function, and filter it out using a subquery():
def not_assigned(self):
    assigned_questions = db.session.query(Question.id)\
        .filter(Question.papers.any(id=self.id)).subquery()
    not_assigned = db.session.query(Question)\
        .filter(Question.id.notin_(assigned_questions)).all()
    return not_assigned
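The same NOT IN idea written as plain SQL, run with the standard-library sqlite3 module so it works without Flask. The data are hypothetical, and the subquery reads the question_in association table directly, a simplification of the EXISTS that Question.papers.any() generates. Question 3 belongs only to paper 2, so it still shows up as unassigned for paper 1, which is the "regardless of other papers" behavior asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY);
    CREATE TABLE paper    (id INTEGER PRIMARY KEY);
    CREATE TABLE question_in (
        question_id INTEGER REFERENCES question(id),
        paper_id    INTEGER REFERENCES paper(id),
        PRIMARY KEY (question_id, paper_id));
""")
# Hypothetical data: questions 1-5 and papers 1-2.
conn.executemany("INSERT INTO question (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO paper (id) VALUES (?)", [(1,), (2,)])
# Paper 1 holds questions 1 and 2; paper 2 holds questions 2 and 3.
conn.executemany("INSERT INTO question_in VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])

def not_assigned(paper_id):
    """Questions not in this paper, whatever other papers they belong to."""
    sql = """
        SELECT question.id FROM question
        WHERE question.id NOT IN (
            SELECT question_in.question_id FROM question_in
            WHERE question_in.paper_id = ?)
        ORDER BY question.id"""
    return [row[0] for row in conn.execute(sql, (paper_id,))]

print(not_assigned(1))  # [3, 4, 5]
print(not_assigned(2))  # [1, 4, 5]
```

In SQLAlchemy, negating the relationship comparison directly, e.g. Question.query.filter(~Question.papers.any(id=self.id)), should yield an equivalent NOT EXISTS without a separate subquery; treat that as an untested alternative.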

Merging together a lot of small queries

I am making an app that analyses a player's in-game items and displays their prices. The items are retrieved from an external source in JSON format, as an array of items with fields like id and several other attributes that have to be processed. Each item is fully defined only after all of its attributes have been processed. Looking up each item's price in SQLAlchemy takes microseconds (1-2), but there are several hundred items, so there is a load time of a couple of seconds, which is not an acceptable delay for a web app. What ways are there to consolidate the queries or otherwise speed up access to this information?
The model is really simple:
class Item(db.Model):
    __tablename__ = "prices"
    defindex = db.Column(db.Integer, primary_key=True)
    quality = db.Column(db.Integer, primary_key=True)
    craftable = db.Column(db.Boolean, primary_key=True)
    tradeable = db.Column(db.Boolean, primary_key=True)
    item_metadata = db.Column(db.String(70), primary_key=True)
    name = db.Column(db.String(70), primary_key=True)
    currency = db.Column(db.String(70))
    price = db.Column(db.Float)
    price_high = db.Column(db.Float)
It is a set of attributes that together form the primary key, plus the price; no relationships, nothing special.
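One common way to cut the per-item round trips, offered as a sketch rather than a definitive fix: collect the keys first, fetch all candidate rows with a single IN query, and match each item against a dictionary keyed by the full primary key. With Flask-SQLAlchemy the single query would be something like Item.query.filter(Item.defindex.in_(defindexes)).all(); the sketch uses plain SQL through sqlite3 so it runs on its own. The item dictionaries, their field names and the sample prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE prices (
        defindex INTEGER, quality INTEGER, craftable BOOLEAN, tradeable BOOLEAN,
        item_metadata VARCHAR(70), name VARCHAR(70),
        currency VARCHAR(70), price FLOAT, price_high FLOAT,
        PRIMARY KEY (defindex, quality, craftable, tradeable, item_metadata, name))""")
# Hypothetical price rows.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (5021, 6, 1, 1, "", "Key", "metal", 50.0, 51.0),
    (5002, 6, 1, 1, "", "Refined", "metal", 1.0, 1.0),
    (30000, 11, 1, 1, "", "Hat", "keys", 3.5, 4.0),
])

KEY_COLUMNS = ("defindex", "quality", "craftable", "tradeable", "item_metadata", "name")

def lookup_prices(items):
    """Fetch prices for many items with one query instead of one query per item."""
    defindexes = sorted({item["defindex"] for item in items})
    placeholders = ", ".join("?" * len(defindexes))
    rows = conn.execute(
        f"SELECT {', '.join(KEY_COLUMNS)}, price FROM prices "
        f"WHERE defindex IN ({placeholders})", defindexes).fetchall()
    # Index rows by their full primary key, then match each item in Python.
    # True == 1 in Python, so boolean flags match the stored integers.
    by_key = {tuple(row[:-1]): row[-1] for row in rows}
    return [by_key.get(tuple(item[c] for c in KEY_COLUMNS)) for item in items]

# Hypothetical processed items, already reduced to their key attributes.
items = [
    {"defindex": 5021, "quality": 6, "craftable": True, "tradeable": True,
     "item_metadata": "", "name": "Key"},
    {"defindex": 30000, "quality": 11, "craftable": True, "tradeable": True,
     "item_metadata": "", "name": "Hat"},
    {"defindex": 999, "quality": 6, "craftable": True, "tradeable": True,
     "item_metadata": "", "name": "Unknown"},
]
print(lookup_prices(items))  # [50.0, 3.5, None]
```

Filtering only on defindex may fetch a few extra rows, but that is usually far cheaper than hundreds of separate queries. If the table is small, loading it once and caching the dictionary is another option. SQLite caps the number of bound parameters per statement (999 before version 3.32.0), so very long key lists may need to be chunked.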

SQLAlchemy query.filter returned no FROM clauses due to auto-correlation

I'm a beginner SQLAlchemy user frustrated with the extensive documentation.
I have a newsfeed that is updated when a new revision is made to some content object. content always has at least one revision. content is related to 1 or more topics through an association table. I'm given a set of topic.ids T, and would like to show the N most recent "approved" revisions belonging to a row in content that has at least one topic in T. ("approved" is just an enum attribute on revision)
Here are the models and the relevant attributes:
class Revision(Model):
    __tablename__ = 'revision'

    class Statuses(object):  # enum
        APPROVED = 'approved'
        PROPOSED = 'proposed'
        REJECTED = 'rejected'
        values = [APPROVED, PROPOSED, REJECTED]

    id = Column(Integer, primary_key=True)
    data = Column(JSONB, default=[], nullable=False)
    content_id = db.Column(db.Integer, db.ForeignKey('content.id'), nullable=False)

class Content(Model):
    __tablename__ = 'content'
    id = Column(Integer, primary_key=True)
    topic_edges = relationship(
        'TopicContentAssociation',
        primaryjoin='Content.id == TopicContentAssociation.content_id',
        backref='content',
        lazy='dynamic',
        cascade='all, delete-orphan'
    )
    revisions = relationship(
        'Revision',
        lazy='dynamic',
        backref='content',
        cascade='all, delete-orphan'
    )

class TopicContentAssociation(Model):
    __tablename__ = 'topic_content_association'
    topic_id = Column(Integer, ForeignKey('topic.id'), primary_key=True)
    content_id = Column(Integer, ForeignKey('content.id'), primary_key=True)

class Topic(Model):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
Here's what I've got so far:
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            Revision.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
and this error is happening:
Select statement returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.:
SELECT *
FROM topic, revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
The interesting part is that if I remove the and_ operator and the second expression within it (lines 3, 5, and 6), the error seems to go away.
BONUS: :) I would also like to show only one revision per row of content. If somebody hits save a bunch of times, I don't want the feed to be cluttered.
Again, I'm very new to SQLAlchemy (and actually, relational databases), so an answer targeted to a beginner would be much appreciated!
EDIT: adding .correlate(Revision) after the .where clause fixes things, but I'm still working to figure out exactly what is going on here.
This is a very late response, but I am replying in case someone finds this topic.
Your second expression within the exists statement (Revision.status == Revision.Statuses.APPROVED) references the Revision table a second time in your query; the first reference is "session.query(Revision)".
If we "translate" your exists statement into PostgreSQL, it would be:
EXISTS (SELECT 1
FROM Revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
)
Referencing the same table twice (FROM Revision) in the same query is not allowed unless you use an alias. So you can create an alias of the table with the aliased() function to solve your problem; your code should then look like this:
from sqlalchemy import and_, exists
from sqlalchemy.orm import aliased

aliasRev = aliased(Revision)
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            aliasRev.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
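Some background on why .correlate() changes things: exists() builds a SELECT without an explicit FROM, and SQLAlchemy auto-correlates such a subquery, meaning that tables it mentions which also appear in the enclosing query's FROM are left out of the subquery's own FROM. Topic and Revision both appear in the outer query, so nothing is left, hence the error. As far as I can tell, auto-correlation only applies when the subquery has more than one FROM entry, which is likely why the error disappears when only Topic remains. Note also that the stated goal (revisions whose content has at least one topic in T) is a positive EXISTS, not the ~exists() used above. The sqlite3 sketch below shows one possible SQL shape for the goal plus the BONUS: the EXISTS is correlated on the revision's content_id, so a content with two matching topics is not duplicated, and the second reference to revision, used to keep only the latest approved revision per content, gets its own alias r2. In SQLAlchemy the corresponding tools would be aliased(Revision) and explicit correlation; the schema is trimmed, the data are made up, ts_created is stored as an integer, and the sketch assumes ts_created is unique per content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE topic   (id INTEGER PRIMARY KEY);
    CREATE TABLE topic_content_association (
        topic_id   INTEGER REFERENCES topic(id),
        content_id INTEGER REFERENCES content(id),
        PRIMARY KEY (topic_id, content_id));
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        content_id INTEGER NOT NULL REFERENCES content(id),
        status TEXT NOT NULL,
        ts_created INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO content VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO topic VALUES (?)", [(10,), (20,), (30,)])
# Content 1 has topics 10 and 20, content 2 has topic 20, content 3 has topic 30.
conn.executemany("INSERT INTO topic_content_association VALUES (?, ?)",
                 [(10, 1), (20, 1), (20, 2), (30, 3)])
conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)", [
    (1, 1, "approved", 1), (2, 1, "approved", 5), (3, 1, "proposed", 6),
    (4, 2, "approved", 3), (5, 3, "approved", 7), (6, 2, "rejected", 8),
])

def latest_approved(topic_ids, n):
    """Newest approved revision per content, for content tagged with any topic."""
    marks = ", ".join("?" * len(topic_ids))
    sql = f"""
        SELECT r.id FROM revision AS r
        WHERE r.status = 'approved'
          -- correlated on r.content_id: one test per revision, no join fan-out
          AND EXISTS (SELECT 1 FROM topic_content_association AS tca
                      WHERE tca.content_id = r.content_id
                        AND tca.topic_id IN ({marks}))
          -- the second reference to revision needs its own alias (r2)
          AND r.ts_created = (SELECT MAX(r2.ts_created) FROM revision AS r2
                              WHERE r2.content_id = r.content_id
                                AND r2.status = 'approved')
        ORDER BY r.ts_created DESC
        LIMIT ?"""
    return [row[0] for row in conn.execute(sql, (*topic_ids, n))]

print(latest_approved([10, 20], 10))  # [2, 4]
```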

sqlalchemy: paginate does not return the expected number of elements

I am using Flask-SQLAlchemy together with an SQLite database. I am trying to get all votes before date1:
sub_query = models.VoteList.query.filter(models.VoteList.vote_datetime < date1)
sub_query = sub_query.filter(models.VoteList.group_id == selected_group.id)
sub_query = sub_query.filter(models.VoteList.user_id == g.user.id)
sub_query = sub_query.subquery()
old_votes = models.Papers.query.join(sub_query, sub_query.c.arxiv_id == models.Papers.arxiv_id).paginate(page=1, per_page=4, error_out=False)
where the database model for VoteList looks like this
class VoteList(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
    group_id = db.Column(db.Integer, db.ForeignKey('groups.id'))
    arxiv_id = db.Column(db.String(1000), db.ForeignKey('papers.arxiv_id'))
    vote_datetime = db.Column(db.DateTime)
    group = db.relationship("Groups", backref=db.backref('vote_list', lazy='dynamic'))
    user = db.relationship("User", backref=db.backref('votes', lazy='dynamic'), foreign_keys=[user_id])

    def __repr__(self):
        return '<VoteList %r>' % (self.id)
I made sure that the 'old_votes' selection above has 20 elements: if I use .all() instead of .paginate(), I get the expected 20 results.
Since I used a max results value of 4 in the example above, I would expect old_votes.items to have 4 elements, but it has only 2. If I increase the max results value, the number of elements also increases, but it always stays below the max results value. Is paginate messing something up here?
Any ideas?
thanks
carl
EDIT
I noticed that it works fine if I call paginate() after add_columns(). So if I add a column (for no good reason) with
old_votes = models.Papers.query.join(sub_query, sub_query.c.arxiv_id == models.Papers.arxiv_id)
old_votes = old_votes.add_columns(sub_query.c.vote_datetime).paginate(page=page, per_page=VOTES_PER_PAGE, error_out=False)
it works fine. But since I don't need that column, it would still be interesting to know what goes wrong in my example above.
It looks to me like the 4 rows returned (and filtered) by the query are 4 different rows of the VoteList table, but they refer to only 2 different Papers. When the model instances are created, duplicates are filtered out, so you get fewer rows. When you add a column from the subquery, the results are (Papers, vote_datetime) tuples, and in that case no duplicates are removed.
I encountered the same issue and applied van's answer, but it did not work. However, I agree with van's explanation, so I added .distinct() to the query like this:
old_votes = models.Papers.query.distinct().join(sub_query, sub_query.c.arxiv_id == models.Papers.arxiv_id).paginate(page=1, per_page=4, error_out=False)
It worked as I expected.
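The sketch below replays that explanation with the standard-library sqlite3 module and made-up votes. With a plain join, LIMIT counts votes rather than papers, so once duplicates are collapsed page 1 comes back short while page 2 is full. Filtering with an IN subquery (a semi-join) instead of a join keeps one row per paper, much like .distinct() does. The page/offset arithmetic is roughly what paginate() does; the table and column names are simplified from the question, and the duplicate collapsing only imitates the legacy ORM Query's behavior for single-entity results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (arxiv_id TEXT PRIMARY KEY);
    CREATE TABLE vote_list (
        id INTEGER PRIMARY KEY,
        user_id INTEGER, group_id INTEGER,
        arxiv_id TEXT REFERENCES papers(arxiv_id),
        vote_datetime TEXT);
""")
# Hypothetical data: six papers; p1 and p2 received two votes each.
conn.executemany("INSERT INTO papers VALUES (?)", [(f"p{i}",) for i in range(1, 7)])
votes = ["p1", "p1", "p2", "p2", "p3", "p4", "p5", "p6"]
conn.executemany(
    "INSERT INTO vote_list (user_id, group_id, arxiv_id, vote_datetime) "
    "VALUES (1, 1, ?, '2016-01-01')", [(p,) for p in votes])

def page_of_papers(page, per_page, dedupe_in_sql):
    offset = (page - 1) * per_page  # page number -> OFFSET, as paginate() does
    if dedupe_in_sql:
        # Semi-join: each paper appears at most once, however many votes match.
        sql = """SELECT arxiv_id FROM papers
                 WHERE arxiv_id IN (SELECT arxiv_id FROM vote_list
                                    WHERE vote_datetime < '2017-01-01')
                 ORDER BY arxiv_id LIMIT ? OFFSET ?"""
    else:
        # Plain join: one row per matching vote, so LIMIT counts votes.
        sql = """SELECT papers.arxiv_id FROM papers
                 JOIN vote_list ON vote_list.arxiv_id = papers.arxiv_id
                 WHERE vote_datetime < '2017-01-01'
                 ORDER BY papers.arxiv_id LIMIT ? OFFSET ?"""
    rows = [r[0] for r in conn.execute(sql, (per_page, offset))]
    return list(dict.fromkeys(rows))  # collapse duplicate papers, keeping order

print(page_of_papers(1, 4, False))  # ['p1', 'p2']
print(page_of_papers(1, 4, True))   # ['p1', 'p2', 'p3', 'p4']
```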

sqlalchemy join with sum and count of grouped rows

Hi, I am working on a little prediction game in Flask with Flask-SQLAlchemy. I have a User model:
from sqlalchemy.orm import backref, relationship

class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    nick = db.Column(db.String(255), unique=True)
    bets = relationship('Bet', backref=backref("user"))
and my Bet model
class Bet(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.Integer, db.ForeignKey('user.id'))
    matchid = db.Column(db.Integer, db.ForeignKey('match.id'))
    points = db.Column(db.Integer)
Neither is the full class, but this should be enough for the question. A user gathers points for predicting match outcomes and gets a different number of points for predicting the exact outcome, the winner, or the difference.
I now want a list of the top users, for which I sum up the points like this:
toplist = db.session.query(User.nick, func.sum(Bet.points)).\
join(User.bets).group_by(Bet.uid).order_by(func.sum(Bet.points).desc()).all()
This works quite well, but two players may end up with the same sum of points. In that case, the number of correct predictions (rewarded with 3 points) should decide the winner. I can get this list with
tophits = db.session.query(User.nick, func.count(Bet.points)).\
join(User.bets).filter_by(points=3).all()
Both work well, but I think there has to be a way to combine the two queries and get a table with username, points and "hitcount". I've done that before in plain SQL, but I'm not that familiar with SQLAlchemy and I'm tying my brain in knots. How can I combine both queries into one?
In the query for tophits, just replace the COUNT/filter_by construct with an equivalent SUM(CASE(...)) without a filter, so that both aggregates share the same WHERE clause. The code below should do it:
from sqlalchemy import case, func

total_points = func.sum(Bet.points).label("total_points")
# CASE Bet.points WHEN 3 THEN 1 ELSE 0 END: counts only the exact hits
total_hits = func.sum(case({3: 1}, value=Bet.points, else_=0)).label("total_hits")
q = (db.session.query(
        User.nick,
        total_points,
        total_hits,
    )
    .join(User.bets)
    .group_by(User.nick)
    .order_by(total_points.desc())
    .order_by(total_hits.desc())
)
toplist = q.all()
Note that I changed the group_by clause to use the column that is in the SELECT list, as some database engines might complain otherwise, but you do not have to.
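To check what the combined query computes, here is the equivalent SQL run against sqlite3 with hypothetical users and bets. The simple-CASE form CASE bet.points WHEN 3 THEN 1 ELSE 0 END corresponds to case({3: 1}, value=Bet.points, else_=0). In this data alice and bob tie on points, and the hit count breaks the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, nick TEXT UNIQUE);
    CREATE TABLE bet  (id INTEGER PRIMARY KEY,
                       uid INTEGER REFERENCES user(id),
                       matchid INTEGER,
                       points INTEGER);
""")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
# Hypothetical points per user id; a 3 is an exact hit.
points = {1: [3, 1, 1], 2: [1, 1, 1, 1, 1], 3: [3, 3]}
conn.executemany(
    "INSERT INTO bet (uid, matchid, points) VALUES (?, ?, ?)",
    [(uid, match, p) for uid, ps in points.items() for match, p in enumerate(ps, start=1)],
)

# One pass over the joined rows computes both aggregates.
toplist = conn.execute("""
    SELECT user.nick,
           SUM(bet.points) AS total_points,
           SUM(CASE bet.points WHEN 3 THEN 1 ELSE 0 END) AS total_hits
    FROM user JOIN bet ON bet.uid = user.id
    GROUP BY user.nick
    ORDER BY total_points DESC, total_hits DESC
""").fetchall()

for nick, total, hits in toplist:
    print(nick, total, hits)
# carol 6 2
# alice 5 1
# bob 5 0
```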