Query on many-to-many relationship in sqlalchemy - sql

I have two db classes linked by a relationship table, and I can retrieve the questions assigned to each paper. However I want to retrieve all questions not currently assigned to the current paper, regardless of any other papers they are assigned to. I've tried a lot of different things, nothing quite did what I wanted.
I believe some kind of left outer join should do it, but I cannot figure out the correct syntax.
Thanks in advance for your suggestions.
Here is the DB structure so far:
class Question(db.Model):
    __tablename__ = 'question'
    id = db.Column(db.Integer, primary_key=True)
    papers = db.relationship(
        'Paper', secondary='question_in', backref='has_question', lazy='dynamic')
class Paper(db.Model):
    __tablename__ = 'paper'
    id = db.Column(db.Integer, primary_key=True)
    # returns the questions in the paper
    def all_questions(self):
        questions = Question.query.filter(Question.papers.any(id=self.id)).all()
        return questions
Question_in = db.Table('question_in',
    db.Column('question_id', db.Integer, db.ForeignKey('question.id'), primary_key=True),
    db.Column('paper_id', db.Integer, db.ForeignKey('paper.id'), primary_key=True),
)

You should be able to follow the same logic you used in your all_questions function, and filter it out using a subquery():
def not_assigned(self):
    assigned_questions = db.session.query(Question.id)\
        .filter(Question.papers.any(id=self.id)).subquery()
    not_assigned = db.session.query(Question)\
        .filter(Question.id.notin_(assigned_questions)).all()
    return not_assigned
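For reference, the SQL this kind of query is expected to boil down to can be tried directly with the standard-library sqlite3 module. This is only a minimal sketch, not the asker's application: the table names follow the models above, the sample rows are invented, and unassigned_questions is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY);
    CREATE TABLE paper (id INTEGER PRIMARY KEY);
    CREATE TABLE question_in (
        question_id INTEGER REFERENCES question(id),
        paper_id INTEGER REFERENCES paper(id),
        PRIMARY KEY (question_id, paper_id)
    );
    -- invented sample data: question 1 is on papers 1 and 2, question 2 only on paper 2
    INSERT INTO question (id) VALUES (1), (2), (3);
    INSERT INTO paper (id) VALUES (1), (2);
    INSERT INTO question_in VALUES (1, 1), (1, 2), (2, 2);
""")


def unassigned_questions(conn, paper_id):
    """Return ids of questions that have no question_in row for this paper."""
    rows = conn.execute(
        """
        SELECT q.id FROM question AS q
        WHERE NOT EXISTS (
            SELECT 1 FROM question_in AS qi
            WHERE qi.question_id = q.id AND qi.paper_id = ?
        )
        ORDER BY q.id
        """,
        (paper_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Question 1 is excluded for paper 1 even though it is also on paper 2, and question 3, which is on no paper at all, is always included. In SQLAlchemy the same NOT EXISTS condition can probably be written without a separate subquery, as Question.query.filter(~Question.papers.any(id=self.id)).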

Related

Understanding odd results when paginating double outerjoins

I'm working with Flask and SQLAlchemy and have stumbled on behavior I do not understand. Building a query with .outerjoin() and .paginate() always worked well, until today, when I created a query with two chained outer joins, as in the simplified example code below (db is a reference to SQLAlchemy). Class First has a one-to-many relation with Second, and Second is one-to-one with Third.
For testing purposes I have prepared three search queries. The first two, search_1 and search_2, work well all the time. But search_3 works only until there are two Second records related to the same First record. When there is more than one Second related to a First, the query mostly returns fewer records (but not as few as when using .join instead of .outerjoin), and in some cases even more than the number of records in the First table. What's strange is that the number of records changes even with a different sort order (always by columns of the First model).
class First(db.Model):
    __tablename__ = 'first'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
    date_update = db.Column(db.DateTime)
class Second(db.Model):
    __tablename__ = 'second'
    id = db.Column(db.Integer, primary_key=True)
    first_id = db.Column(db.Integer, db.ForeignKey('first.id'))
    third_id = db.Column(db.Integer, db.ForeignKey('third.id'))
class Third(db.Model):
    __tablename__ = 'third'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
# prepare base query to reuse later
outer_base = First.query \
    .outerjoin(Second, Second.first_id == First.id) \
    .outerjoin(Third, Third.id == Second.third_id) \
    .order_by(First.id.asc())
# works well
search_1 = First.query.order_by(First.id.asc()).paginate(1, 10, False)
# works well
search_2 = outer_base.all()
# odd as hell...
search_3 = outer_base.paginate(1, 10, False)
I just want to be able to filter First records by a value from Third when a relation has been created through the Second table. Can anyone please explain what I am missing? Maybe the double outer join can be achieved differently so that it works with pagination?
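One likely explanation, offered as an assumption since no answer is recorded here: the LIMIT that pagination adds is applied to the joined rows, not to distinct First records. A First with several Second rows then occupies several slots of a page, and a count over the joined rows can exceed the number of First records. The sketch below reproduces that effect with plain sqlite3 and invented data; it does not use Flask-SQLAlchemy itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first (id INTEGER PRIMARY KEY);
    CREATE TABLE third (id INTEGER PRIMARY KEY);
    CREATE TABLE second (
        id INTEGER PRIMARY KEY,
        first_id INTEGER REFERENCES first(id),
        third_id INTEGER REFERENCES third(id)
    );
    INSERT INTO first (id) VALUES (1), (2), (3), (4);
    INSERT INTO third (id) VALUES (1);
    -- first 1 has three related second rows, the others have none
    INSERT INTO second (id, first_id, third_id) VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
""")

JOINED = """
    FROM first
    LEFT OUTER JOIN second ON second.first_id = first.id
    LEFT OUTER JOIN third ON third.id = second.third_id
"""

# LIMIT counts joined rows: first 1 fills all three slots of this "page".
page_rows = [r[0] for r in conn.execute(
    "SELECT first.id" + JOINED + "ORDER BY first.id LIMIT 3")]

# Counting joined rows gives more rows than there are first records (four).
joined_total = conn.execute("SELECT COUNT(*)" + JOINED).fetchone()[0]

# DISTINCT over the selected first columns restores one row per first record.
distinct_page = [r[0] for r in conn.execute(
    "SELECT DISTINCT first.id" + JOINED + "ORDER BY first.id LIMIT 3")]
```

If this is indeed the cause, adding .distinct() to the base query before paginating, or filtering with an EXISTS-style condition instead of joining, should keep one row per First record.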

Merging together a lot of small queries

I am making an app that analyses a player's in-game items and displays their prices. Items are retrieved from an external source as a JSON array; each item has fields like id and several other attributes that have to be processed. Each item is fully defined only after all of its attributes have been processed. Accessing each item's price in SQLAlchemy takes microseconds (1-2), but there are several hundred of them, so there is a load time of a couple of seconds, which is not an acceptable delay for a web app. What ways are there to consolidate the queries or otherwise speed up access to the information?
Model is really simple:
class Item(db.Model):
    __tablename__ = "prices"
    defindex = db.Column(db.Integer, primary_key=True)
    quality = db.Column(db.Integer, primary_key=True)
    craftable = db.Column(db.Boolean, primary_key=True)
    tradeable = db.Column(db.Boolean, primary_key=True)
    item_metadata = db.Column(db.String(70), primary_key=True)
    name = db.Column(db.String(70), primary_key=True)
    currency = db.Column(db.String(70))
    price = db.Column(db.Float)
    price_high = db.Column(db.Float)
It is a set of attributes that constitute a primary key and price, no relationships, nothing special.
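One common way to avoid hundreds of round trips is to read the price table once and look items up in memory. The following is a rough sketch under stated assumptions: it uses sqlite3 instead of SQLAlchemy, a reduced three-column key instead of the full primary key, invented rows, and the hypothetical helpers load_price_index and price_items. It also assumes the table is small enough to hold in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- reduced version of the prices table; the real key has more columns
    CREATE TABLE prices (
        defindex INTEGER,
        quality INTEGER,
        craftable INTEGER,
        price REAL,
        PRIMARY KEY (defindex, quality, craftable)
    );
    INSERT INTO prices VALUES (5021, 6, 1, 2.5), (5002, 6, 1, 9.0), (200, 11, 0, 30.0);
""")


def load_price_index(conn):
    """Read the whole table once and key each price by its primary-key tuple."""
    rows = conn.execute("SELECT defindex, quality, craftable, price FROM prices")
    return {(d, q, bool(c)): price for d, q, c, price in rows}


def price_items(items, index):
    """Look up every processed item in memory; None marks an unknown item."""
    return [index.get((i["defindex"], i["quality"], i["craftable"])) for i in items]


index = load_price_index(conn)
inventory = [
    {"defindex": 5021, "quality": 6, "craftable": True},
    {"defindex": 200, "quality": 11, "craftable": False},
    {"defindex": 1, "quality": 1, "craftable": True},  # not in the price table
]
prices = price_items(inventory, index)
```

With the model above, the index could presumably be built the same way from a single Item.query.all() call, keyed by all six primary-key columns.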

SQLAlchemy query.filter returned no FROM clauses due to auto-correlation

I'm a beginner SQLAlchemy user frustrated with the extensive documentation.
I have a newsfeed that is updated when a new revision is made to some content object. content always has at least one revision. content is related to 1 or more topics through an association table. I'm given a set of topic.ids T, and would like to show the N most recent "approved" revisions belonging to a row in content that has at least one topic in T. ("approved" is just an enum attribute on revision)
Here are the models and the relevant attributes:
class Revision(Model):
    __tablename__ = 'revision'
    class Statuses(object):  # enum
        APPROVED = 'approved'
        PROPOSED = 'proposed'
        REJECTED = 'rejected'
        values = [APPROVED, PROPOSED, REJECTED]
    id = Column(Integer, primary_key=True)
    data = Column(JSONB, default=[], nullable=False)
    content_id = db.Column(db.Integer, db.ForeignKey('content.id'), nullable=False)
class Content(Model):
    __tablename__ = 'content'
    id = Column(Integer, primary_key=True)
    topic_edges = relationship(
        'TopicContentAssociation',
        primaryjoin='Content.id == TopicContentAssociation.content_id',
        backref='content',
        lazy='dynamic',
        cascade='all, delete-orphan'
    )
    revisions = relationship(
        'Revision',
        lazy='dynamic',
        backref='content',
        cascade='all, delete-orphan'
    )
class TopicContentAssociation(Model):
    __tablename__ = 'topic_content_association'
    topic_id = Column(Integer, ForeignKey('topic.id'), primary_key=True)
    content_id = Column(Integer, ForeignKey('content.id'), primary_key=True)
class Topic(Model):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
Here's what I've got so far:
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            Revision.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
and this error is happening:
Select statement returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.:
SELECT *
FROM topic, revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
The interesting part is that if I remove the and_ operator and the second expression within it (lines 3, 5, and 6), the error seems to go away.
BONUS: :) I would also like to show only one revision per row of content. If somebody hits save a bunch of times, I don't want the feed to be cluttered.
Again, I'm very new to SQLAlchemy (and actually, relational databases), so an answer targeted to a beginner would be much appreciated!
EDIT: adding .correlate(Revision) after the .where clause fixes things, but I'm still working to figure out exactly what is going on here.
This is a very late response, but I am replying in case someone finds this topic.
The second expression within your exists statement (Revision.status == Revision.Statuses.APPROVED) references the Revision table a second time in your query. The first reference is "session.query(Revision)".
If we "translate" your exists statement in PostgreSQL it would be:
EXISTS (SELECT 1
        FROM Revision
        WHERE topic.id IN (:id_1, :id_2, :id_3...)
        AND revision.status = :status_1
)
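The error message points at the cause: SQLAlchemy auto-correlates any table in a subquery that already appears in the enclosing query and leaves it out of the subquery's FROM clause. Here both topic and revision are already part of the enclosing query, so the EXISTS subquery ends up with no FROM clause at all. One way around this is to give the subquery its own reference to the table through an alias created with the aliased() function. Your code should be fine like this: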
from sqlalchemy.orm import aliased
aliasRev = aliased(Revision)
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            aliasRev.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
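For comparison, here is how the stated goal (the N most recent approved revisions whose content has at least one topic in T) might look as SQL with a correlated EXISTS. It is a sketch run through sqlite3, not SQLAlchemy output. The status and ts_created columns are assumed from the query in the question, since the posted models omit them; the data is invented and recent_approved is a hypothetical helper. It uses a plain EXISTS, on the assumption that the negation in the posted query is not intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE topic (id INTEGER PRIMARY KEY);
    CREATE TABLE topic_content_association (
        topic_id INTEGER REFERENCES topic(id),
        content_id INTEGER REFERENCES content(id),
        PRIMARY KEY (topic_id, content_id)
    );
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        content_id INTEGER NOT NULL REFERENCES content(id),
        status TEXT NOT NULL,          -- assumed column
        ts_created INTEGER NOT NULL    -- assumed column
    );
    INSERT INTO content (id) VALUES (1), (2);
    INSERT INTO topic (id) VALUES (10), (20);
    INSERT INTO topic_content_association VALUES (10, 1), (20, 2);
    INSERT INTO revision VALUES
        (1, 1, 'approved', 100),
        (2, 1, 'proposed', 200),
        (3, 1, 'approved', 300),
        (4, 2, 'approved', 400);
""")


def recent_approved(conn, topic_ids, limit):
    """Newest approved revisions whose content has at least one topic in topic_ids."""
    marks = ", ".join("?" for _ in topic_ids)
    sql = f"""
        SELECT r.id FROM revision AS r
        WHERE r.status = 'approved'
          AND EXISTS (
              SELECT 1 FROM topic_content_association AS tca
              WHERE tca.content_id = r.content_id   -- correlation to the outer row
                AND tca.topic_id IN ({marks})
          )
        ORDER BY r.ts_created DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (*topic_ids, limit))]
```

The subquery has its own table in FROM (the association table) and refers to the outer revision row only through r.content_id, which is the kind of correlation the error message asks to be made explicit.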

SQLAlchemy: select all posts that has tags in [..] (many-to-many)

I have Users, Interests and Events.
User has (many-to-many) interests. Event has (many-to-many) interests. That's why I have two "intermediate" tables: user_to_interest and event_to_interest.
I want to somehow select all events that have interests from the user's interests list (in other words, all events that have tags IN [1, 144, 4324]).
In SQL I'd do that roughly like this:
SELECT DISTINCT event.name FROM event JOIN event_to_interest ON event.id = event_to_interest.event_id WHERE event_to_interest.interest_id IN (10, 144, 432)
How should I do that through SQLAlchemy? (I'm using Flask-SQLAlchemy if necessary)
Assuming you have a (simplified) model like below:
user_to_interest = Table('user_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('user.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)
event_to_interest = Table('event_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('event_id', Integer, ForeignKey('event.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
class Event(Base):
    __tablename__ = 'event'
    id = Column(Integer, primary_key=True)
    name = Column(String)
class Interest(Base):
    __tablename__ = 'interest'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    users = relationship(User, secondary=user_to_interest, backref="interests")
    events = relationship(Event, secondary=event_to_interest, backref="interests")
Version-1: you should be able to do simple query on list of interest_ids, which will generate basically the SQL statement you desire:
interest_ids = [10, 144, 432]
query = session.query(Event.name)
query = query.join(event_to_interest, event_to_interest.c.event_id == Event.id)
query = query.filter(event_to_interest.c.interest_id.in_(interest_ids))
However, if there are events which have two or more of the interests from the list, the query will return the same Event.name multiple times. You can work around this with distinct, though: query = session.query(Event.name.distinct())
Version-2: Alternatively, you could do this using just relationships, which will generate different SQL using sub-query with EXISTS clause, but semantically it should be the same:
query = session.query(Event.name)
query = query.filter(Event.interests.any(Interest.id.in_(interest_ids)))
This version does not have a problem with duplicates.
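The difference between the two versions can be seen at the SQL level. The sketch below uses sqlite3 with invented rows and a simplified schema (the interest table itself is left out); it illustrates the two query shapes and is not the exact SQL that SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_to_interest (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event(id),
        interest_id INTEGER
    );
    INSERT INTO event VALUES (1, 'hackathon'), (2, 'concert'), (3, 'lecture');
    -- 'hackathon' matches two of the requested interests, 'lecture' none
    INSERT INTO event_to_interest (event_id, interest_id)
        VALUES (1, 10), (1, 144), (2, 432), (3, 7);
""")

WANTED = (10, 144, 432)

# Version-1 shape: a join returns one row per matching interest.
join_names = [r[0] for r in conn.execute("""
    SELECT e.name FROM event AS e
    JOIN event_to_interest AS ei ON ei.event_id = e.id
    WHERE ei.interest_id IN (?, ?, ?)
    ORDER BY e.id
""", WANTED)]

# Version-2 shape: EXISTS only tests for a match, so each event appears once.
exists_names = [r[0] for r in conn.execute("""
    SELECT e.name FROM event AS e
    WHERE EXISTS (
        SELECT 1 FROM event_to_interest AS ei
        WHERE ei.event_id = e.id AND ei.interest_id IN (?, ?, ?)
    )
    ORDER BY e.id
""", WANTED)]
```

Here 'hackathon' appears twice in the join result because it matches two of the interests, which is the duplication that distinct removes in Version-1.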
However, I would go one step back, and assume that you do get interest_ids for particular user, and would create a query that works for a user_id (or User.id)
Final Version: using any twice:
def get_events_for_user(user_id):
    # query = session.query(Event.name)
    query = session.query(Event)  # note: I assume name is not enough
    query = query.filter(Event.interests.any(Interest.users.any(User.id == user_id)))
    return query.all()
One can argue that this does not produce a particularly beautiful SQL statement, but that is exactly the beauty of using SQLAlchemy: it hides the implementation details.
Bonus: you might actually want to give higher priority to the events which have more overlapping interests. In this case the below could help:
query = session.query(Event, func.count('*').label("num_interests"))
query = query.join(Interest, Event.interests)
query = query.join(User, Interest.users)
query = query.filter(User.id == user_id)
query = query.group_by(Event)
# first order by overlapping interests, then also by event.date
query = query.order_by(func.count('*').label("num_interests").desc())
#query = query.order_by(Event.date)
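As a rough illustration of what this ranking computes, the equivalent grouping can be written directly in SQL. The sketch uses sqlite3, association tables reduced to their two foreign-key columns, invented rows, and a hypothetical ranked_events helper; it is not the SQL that SQLAlchemy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_to_interest (user_id INTEGER, interest_id INTEGER);
    CREATE TABLE event_to_interest (event_id INTEGER, interest_id INTEGER);
    INSERT INTO event VALUES (1, 'hackathon'), (2, 'concert'), (3, 'lecture');
    -- user 1 follows interests 10 and 144, user 2 follows 432
    INSERT INTO user_to_interest VALUES (1, 10), (1, 144), (2, 432);
    INSERT INTO event_to_interest VALUES (1, 10), (1, 144), (2, 144), (2, 432), (3, 432);
""")


def ranked_events(conn, user_id):
    """Events sharing interests with the user, most shared interests first."""
    return conn.execute("""
        SELECT e.name, COUNT(*) AS num_interests
        FROM event AS e
        JOIN event_to_interest AS ei ON ei.event_id = e.id
        JOIN user_to_interest AS ui ON ui.interest_id = ei.interest_id
        WHERE ui.user_id = ?
        GROUP BY e.id, e.name
        ORDER BY num_interests DESC, e.id
    """, (user_id,)).fetchall()
```

Each joined row stands for one interest shared between the event and the user, so COUNT(*) per event is the size of the overlap. Events with no shared interest do not appear at all.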

sqlalchemy join with sum and count of grouped rows

Hi, I am working on a little prediction game in Flask with Flask-SQLAlchemy. I have a User model:
class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    nick = db.Column(db.String(255), unique=True)
    bets = relationship('Bet', backref=backref("user"))
and my Bet model:
class Bet(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.Integer, db.ForeignKey('user.id'))
    matchid = db.Column(db.Integer, db.ForeignKey('match.id'))
    points = db.Column(db.Integer)
Both are not the full classes but it should do it for the question. A user can gather points for predicting the match outcome and gets different amount of points for predicting the exact outcome, the winner or the difference.
I now want a list of the top users, for which I have to sum up the points. I'm doing that via
toplist = db.session.query(User.nick, func.sum(Bet.points)).\
    join(User.bets).group_by(Bet.uid).order_by(func.sum(Bet.points).desc()).all()
This works quite well. Now there may be the case that two players have the same sum of points. In this case the number of correct predictions (rewarded with 3 points) would define the winner. I can get this list with
tophits = db.session.query(User.nick, func.count(Bet.points)).\
    join(User.bets).filter_by(points=3).all()
They both work well, but I think there has to be a way to combine both queries and get a table with username, points and "hitcount". I've done that before in SQL, but I am not that familiar with SQLAlchemy and have tied my brain in knots. How can I get both queries in one?
In the query for tophits just replace the COUNT/filter_by construct with equivalent SUM(CASE(..)) without filter so that the WHERE clause for both is the same. The code below should do it:
total_points = func.sum(Bet.points).label("total_points")
total_hits = func.sum(case(value=Bet.points, whens={3: 1}, else_=0)).label("total_hits")
q = (session.query(
        User.nick,
        total_points,
        total_hits,
    )
    .join(User.bets)
    .group_by(User.nick)
    .order_by(total_points.desc())
    .order_by(total_hits.desc())
)
Note that I changed the group_by clause to use the column that is in the SELECT, as some database engines might complain otherwise. But you do not need to do that.
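To see the SUM(CASE ...) idea in isolation, here is a small sqlite3 sketch of the SQL shape such a query is expected to take. The nicknames and bets are invented, and the schema is cut down to the columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, nick TEXT UNIQUE);
    CREATE TABLE bet (
        id INTEGER PRIMARY KEY,
        uid INTEGER REFERENCES user(id),
        points INTEGER
    );
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob');
    -- both users reach 6 points; ann has two exact hits (3 points), bob has one
    INSERT INTO bet (uid, points) VALUES (1, 3), (1, 3), (2, 3), (2, 2), (2, 1);
""")

# One pass over the joined rows: SUM adds the points, while the CASE turns
# every 3-point bet into a 1 so that summing it counts the exact hits.
toplist = conn.execute("""
    SELECT u.nick,
           SUM(b.points) AS total_points,
           SUM(CASE b.points WHEN 3 THEN 1 ELSE 0 END) AS total_hits
    FROM user AS u
    JOIN bet AS b ON b.uid = u.id
    GROUP BY u.nick
    ORDER BY total_points DESC, total_hits DESC
""").fetchall()
```

Because the hit count no longer depends on a WHERE filter, both aggregates can share one query, and the tie on total points is broken by the number of hits.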