Merging together a lot of small queries - optimization

I am making an app that analyses a player's in-game items and displays their prices. Items arrive from an external source as a JSON array; each item has fields such as id plus several other attributes that must be processed, and an item is fully defined only after all of its attributes have been processed. Looking up each item's price through SQLAlchemy takes microseconds (1-2), but there are several hundred items, so the page takes a couple of seconds to load, which is not an acceptable delay for a web app. What ways are there to consolidate the queries or otherwise speed up access to this information?
The model is really simple:
class Item(db.Model):
    __tablename__ = "prices"
    defindex = db.Column(db.Integer, primary_key=True)
    quality = db.Column(db.Integer, primary_key=True)
    craftable = db.Column(db.Boolean, primary_key=True)
    tradeable = db.Column(db.Boolean, primary_key=True)
    item_metadata = db.Column(db.String(70), primary_key=True)
    name = db.Column(db.String(70), primary_key=True)
    currency = db.Column(db.String(70))
    price = db.Column(db.Float)
    price_high = db.Column(db.Float)
It is a set of attributes that constitute a primary key and price, no relationships, nothing special.
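One common way to avoid hundreds of round trips, offered here as a sketch and not as a measured fix for this app, is to read the price table once and keep it in a dictionary keyed by the composite primary key, so each item becomes an in-memory lookup. The example uses the standard-library sqlite3 module with a cut-down, hypothetical version of the prices table (three key columns only) and made-up item values; with the Item model above, the same dictionary could presumably be built from a single Item.query.all() call.

```python
import sqlite3

# Hypothetical stand-in for the "prices" table, reduced to three key columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " defindex INTEGER, quality INTEGER, craftable INTEGER, price REAL,"
    " PRIMARY KEY (defindex, quality, craftable))"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [(5021, 6, 1, 2.5), (5002, 6, 1, 0.5), (200, 11, 1, 12.0)],
)


def load_price_index(connection):
    """Fetch every price row once and key it by the composite primary key."""
    rows = connection.execute(
        "SELECT defindex, quality, craftable, price FROM prices"
    )
    return {(d, q, c): price for d, q, c, price in rows}


price_index = load_price_index(conn)  # one query instead of one per item

# Items as they might arrive from the external JSON feed (made-up values).
backpack = [
    {"defindex": 5021, "quality": 6, "craftable": 1},
    {"defindex": 200, "quality": 11, "craftable": 1},
    {"defindex": 9999, "quality": 6, "craftable": 1},  # no known price
]
prices = [
    price_index.get((item["defindex"], item["quality"], item["craftable"]))
    for item in backpack
]
print(prices)
```

Whether loading the whole table is acceptable depends on its size; if it is large, fetching only the keys present in the backpack in one query is the usual alternative.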

Related

Query on many-to-many relationship in sqlalchemy

I have two db classes linked by a relationship table, and I can retrieve the questions assigned to each paper. However I want to retrieve all questions not currently assigned to the current paper, regardless of any other papers they are assigned to. I've tried a lot of different things, nothing quite did what I wanted.
I believe some kind of left outer join should do it, but I cannot figure out the correct syntax.
Thanks in advance for your suggestions.
Here is the DB structure so far:
class Question(db.Model):
    __tablename__ = 'question'
    id = db.Column(db.Integer, primary_key=True)
    papers = db.relationship(
        'Paper', secondary='question_in', backref='has_question', lazy='dynamic')

class Paper(db.Model):
    __tablename__ = 'paper'
    id = db.Column(db.Integer, primary_key=True)

    # returns the questions in the paper
    def all_questions(self):
        questions = Question.query.filter(Question.papers.any(id=self.id)).all()
        return questions

Question_in = db.Table('question_in',
    db.Column('question_id', db.Integer, db.ForeignKey('question.id'), primary_key=True),
    db.Column('paper_id', db.Integer, db.ForeignKey('paper.id'), primary_key=True),
)
You should be able to follow the same logic you used in your all_questions function, and filter it out using a subquery():
def not_assigned(self):
    assigned_questions = db.session.query(Question.id)\
        .filter(Question.papers.any(id=self.id)).subquery()
    not_assigned = db.session.query(Question)\
        .filter(Question.id.notin_(assigned_questions)).all()
    return not_assigned
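To make the "everything except what this paper already has" logic concrete, here is the same idea in plain SQL, run through the standard-library sqlite3 module. It is a sketch under assumptions: the table layout mirrors the models in the question, the sample rows are invented, and the SQL is written by hand, so it is not necessarily what the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question (id INTEGER PRIMARY KEY);
    CREATE TABLE paper (id INTEGER PRIMARY KEY);
    CREATE TABLE question_in (
        question_id INTEGER REFERENCES question(id),
        paper_id INTEGER REFERENCES paper(id),
        PRIMARY KEY (question_id, paper_id)
    );
    INSERT INTO question (id) VALUES (1), (2), (3), (4);
    INSERT INTO paper (id) VALUES (10), (20);
    -- paper 10 holds questions 1 and 2; paper 20 holds questions 2 and 3
    INSERT INTO question_in VALUES (1, 10), (2, 10), (2, 20), (3, 20);
    """
)


def not_assigned(connection, paper_id):
    """Return ids of questions that are not linked to the given paper."""
    rows = connection.execute(
        """
        SELECT q.id
        FROM question AS q
        WHERE q.id NOT IN (
            SELECT qi.question_id FROM question_in AS qi WHERE qi.paper_id = ?
        )
        ORDER BY q.id
        """,
        (paper_id,),
    )
    return [row[0] for row in rows]


print(not_assigned(conn, 10))
print(not_assigned(conn, 20))
```

Question 3 is returned for paper 10 even though it belongs to paper 20, and question 4, which belongs to no paper at all, is returned for both.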

How to model tournaments database into a SQL in django

I want to model a tournament database to store data of online games.
My question is: how do I create a model in a relational database that can store all these types of tournaments (for example, a League of Legends tournament or a Dota 2 tournament)?
For example, a tournament can have 8 teams or 5 teams.
This is the sketch I created in my mind. What do you suggest? (I especially need help with the relationships between tables.)
Also, how do I keep team 1 and team 2 in the match table (for example, scores, winner, loser)?
I thought:
Game database
game_id,name
Player database
player_id,name,surname,country,Game(FK).. ( and some other fields)
Team database
team_id,name,country,game,Player(ManyToMany).. ( and some other fields)
Match database
match_id,name,match_game,match_map,team1,team2,winner,loser,date,duration,score1,score2.. ( and some other fields)
Tournament database
tournament_id,tournament_name,tournament_game,Match(ManyToMany).. ( and some other fields)
You can create something like this in [app_name]/models.py
from django.db import models

class Tournament(models.Model):
    name = models.CharField(max_length=255)

class Team(models.Model):
    name = models.CharField(max_length=255)

class Player(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    country = models.CharField(max_length=255)
    team = models.ForeignKey(Team, on_delete=models.CASCADE)

class Match(models.Model):
    name = models.CharField(max_length=255)
    match_game = models.CharField(max_length=255)
    match_map = models.CharField(max_length=255)
    # Three relations point at Team, so each needs its own related_name;
    # otherwise their default reverse accessors on Team clash.
    match_teams = models.ManyToManyField(Team, related_name='matches')
    winner = models.ForeignKey(Team, on_delete=models.CASCADE, related_name='matches_won')
    loser = models.ForeignKey(Team, on_delete=models.CASCADE, related_name='matches_lost')
    duration = models.DurationField()
    winning_score = models.PositiveIntegerField()
    losing_score = models.PositiveIntegerField()
    tournament = models.ForeignKey(Tournament, on_delete=models.CASCADE)

class Game(models.Model):
    name = models.CharField(max_length=255)
    match = models.ForeignKey(Match, on_delete=models.CASCADE)
Some things to note:
You do not need to create ID fields, Django does this for you automatically.
A Many-to-Many field can often be replaced with a ForeignKey (many-to-one) field on the other model, for instance instead of many matches having many games, each game is part of one match. This may or may not work in your particular use case.
I have changed some field names (such as score_1 being replaced with winning_score) because I feel they are more clear, assuming I have correctly understood their purpose.
There are some fields (Match.match_game, Player.country) for which I used CharField but which would be better served by a ForeignKey field to a separate model.
This also assumes that you do not need different fields for different types of tournament (League of Legends, DOTA). If you do need this you could achieve it with different models that inherit from an abstract base class:
class Game(models.Model):
    name = models.CharField(max_length=255)
    match = models.ForeignKey(Match, on_delete=models.CASCADE)

    class Meta:
        abstract = True

class DOTA2Game(Game):
    dota_field = models.CharField(max_length=255)

class LeagueOfLegendsGame(Game):
    lol_field = models.CharField(max_length=255)
In this example DOTA2Game and LeagueOfLegendsGame both inherit from Game and therefore have both a name and a match field as well as their custom fields. Setting abstract = True in Game's inner Meta class prevents Game from existing as a separate table within the database.

Understanding odd results when paginating double outerjoins

I'm working with Flask and SQLAlchemy and have run into behavior I do not understand. Building queries with .outerjoin() and .paginate() always worked well, until today, when I created a query with two chained outer joins, as in the simplified example code below (db is a reference to SQLAlchemy). Class First has a one-to-many relation with Second, and Second is one-to-one with Third.
For testing purposes I have prepared three search queries. The first two, search_1 and search_2, work well all the time. But search_3 works only until there are two Second records related to the same First record. When more than one Second is related to a First, the query mostly returns a lower number of records (but not as low as when using .join instead of .outerjoin), and in some cases even a higher number than there are records in the First table. What's strange is that the number of records changes even when a different sort order is used (always by columns of the First model).
class First(db.Model):
    __tablename__ = 'first'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)
    date_update = db.Column(db.DateTime)

class Second(db.Model):
    __tablename__ = 'second'
    id = db.Column(db.Integer, primary_key=True)
    first_id = db.Column(db.Integer, db.ForeignKey('first.id'))
    third_id = db.Column(db.Integer, db.ForeignKey('third.id'))

class Third(db.Model):
    __tablename__ = 'third'
    id = db.Column(db.Integer, primary_key=True)
    date_create = db.Column(db.DateTime, default=datetime.utcnow)

# prepare base query to reuse later
outer_base = First.query \
    .outerjoin(Second, Second.first_id == First.id) \
    .outerjoin(Third, Third.id == Second.third_id) \
    .order_by(First.id.asc())

# works well
search_1 = First.query.order_by(First.id.asc()).paginate(1, 10, False)

# works well
search_2 = outer_base.all()

# odd as hell...
search_3 = outer_base.paginate(1, 10, False)
I just want to be able to filter First records by a value from Third when a relation has been created through the Second table. Can anyone explain what I am missing? Maybe the double outerjoin can be written differently so that it works with pagination?
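One likely explanation, stated as an assumption and not a confirmed diagnosis, is that LIMIT and COUNT operate on joined rows while the ORM hands back de-duplicated First objects. A First with several Second rows takes up several slots on a page, so the page holds fewer distinct records, and the row total can exceed the size of the first table. The sketch below reproduces that effect in plain SQL with the standard-library sqlite3 module; the tables mirror the models above and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "first" (id INTEGER PRIMARY KEY);
    CREATE TABLE third (id INTEGER PRIMARY KEY);
    CREATE TABLE second (
        id INTEGER PRIMARY KEY,
        first_id INTEGER REFERENCES "first"(id),
        third_id INTEGER REFERENCES third(id)
    );
    INSERT INTO "first" (id) VALUES (1), (2), (3);
    INSERT INTO third (id) VALUES (100);
    -- first 1 has three Second rows, first 2 has one, first 3 has none
    INSERT INTO second (first_id, third_id)
        VALUES (1, 100), (1, 100), (1, 100), (2, 100);
    """
)

JOINED = """
    FROM "first" AS f
    LEFT OUTER JOIN second AS s ON s.first_id = f.id
    LEFT OUTER JOIN third AS t ON t.id = s.third_id
"""

# Counting joined rows gives more rows than the first table holds.
joined_row_count = conn.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]

# LIMIT cuts joined rows, so a "page" of 3 holds first.id = 1 three times.
page_rows = [
    r[0]
    for r in conn.execute("SELECT f.id " + JOINED + " ORDER BY f.id LIMIT 3")
]
entities_on_page = sorted(set(page_rows))

# Collapsing duplicates before LIMIT restores one row per First record.
distinct_page = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT f.id " + JOINED + " ORDER BY f.id LIMIT 3"
    )
]

print(joined_row_count, page_rows, entities_on_page, distinct_page)
```

If this is the cause, making the paginated query return one row per First (for example with DISTINCT, or by filtering through an EXISTS subquery instead of joining) should bring the page contents and totals back in line.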

SQLAlchemy query.filter returned no FROM clauses due to auto-correlation

I'm a beginner SQLAlchemy user frustrated with the extensive documentation.
I have a newsfeed that is updated when a new revision is made to some content object. content always has at least one revision. content is related to 1 or more topics through an association table. I'm given a set of topic.ids T, and would like to show the N most recent "approved" revisions belonging to a row in content that has at least one topic in T. ("approved" is just an enum attribute on revision)
Here are the models and the relevant attributes:
class Revision(Model):
    __tablename__ = 'revision'

    class Statuses(object):  # enum
        APPROVED = 'approved'
        PROPOSED = 'proposed'
        REJECTED = 'rejected'
        values = [APPROVED, PROPOSED, REJECTED]

    id = Column(Integer, primary_key=True)
    data = Column(JSONB, default=[], nullable=False)
    content_id = db.Column(db.Integer, db.ForeignKey('content.id'), nullable=False)

class Content(Model):
    __tablename__ = 'content'
    id = Column(Integer, primary_key=True)
    topic_edges = relationship(
        'TopicContentAssociation',
        primaryjoin='Content.id == TopicContentAssociation.content_id',
        backref='content',
        lazy='dynamic',
        cascade='all, delete-orphan'
    )
    revisions = relationship(
        'Revision',
        lazy='dynamic',
        backref='content',
        cascade='all, delete-orphan'
    )

class TopicContentAssociation(Model):
    __tablename__ = 'topic_content_association'
    topic_id = Column(Integer, ForeignKey('topic.id'), primary_key=True)
    content_id = Column(Integer, ForeignKey('content.id'), primary_key=True)

class Topic(Model):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
Here's what I've got so far:
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            Revision.status == Revision.Statuses.APPROVED
    ) )
).order_by(Revision.ts_created.desc()).limit(N)
and this error is happening:
Select statement returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.:
SELECT *
FROM topic, revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
The interesting part is that if I remove the and_ operator and the second expression within it (lines 3, 5, and 6), the error seems to go away.
BONUS: :) I would also like to show only one revision per row of content. If somebody hits save a bunch of times, I don't want the feed to be cluttered.
Again, I'm very new to SQLAlchemy (and actually, relational databases), so an answer targeted to a beginner would be much appreciated!
EDIT: adding .correlate(Revision) after the .where clause fixes things, but I'm still working to figure out exactly what is going on here.
This is a very late response, but I am replying in case someone finds this topic.
Your second expression within the exists statement (Revision.status == Revision.Statuses.APPROVED) references the Revision table a second time in your query. The first reference is "session.query(Revision)".
If we "translate" your exists statement into SQL it would be:
EXISTS (SELECT 1
    FROM Revision
    WHERE topic.id IN (:id_1, :id_2, :id_3...)
    AND revision.status = :status_1
)
Both tables referenced inside the exists() (topic and revision) already appear in the enclosing query, so SQLAlchemy auto-correlates them to the outer query and the subquery is left with nothing in its FROM clause. Referencing Revision through an alias gives the subquery a FROM entry of its own, so you can create an alias of the table with the aliased() function and solve your problem. Your code should be fine like this:
from sqlalchemy.orm import aliased

aliasRev = aliased(Revision)
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            aliasRev.status == Revision.Statuses.APPROVED
    ) )
).order_by(Revision.ts_created.desc()).limit(N)
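For reference, the result the question describes (the N most recent approved revisions whose content has at least one topic in T) can be written as a correlated EXISTS in plain SQL. The following is a sketch with the standard-library sqlite3 module, not a drop-in replacement for the ORM query: the table layout follows the models in the question, status and ts_created are assumed columns (the question uses them but the model listing omits them), ts_created is simplified to an integer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE topic_content_association (
        topic_id INTEGER, content_id INTEGER,
        PRIMARY KEY (topic_id, content_id)
    );
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        content_id INTEGER NOT NULL REFERENCES content(id),
        status TEXT NOT NULL,
        ts_created INTEGER NOT NULL  -- assumed column, simplified to an int
    );
    INSERT INTO content (id) VALUES (1), (2);
    -- content 1 is tagged with topic 7, content 2 with topic 8
    INSERT INTO topic_content_association VALUES (7, 1), (8, 2);
    INSERT INTO revision (id, content_id, status, ts_created) VALUES
        (1, 1, 'approved', 10),
        (2, 1, 'proposed', 20),
        (3, 2, 'approved', 30),
        (4, 1, 'approved', 40);
    """
)


def recent_approved(connection, topic_ids, limit):
    """Ids of the newest approved revisions whose content has a topic in topic_ids.

    Assumes topic_ids is a non-empty list of integers.
    """
    placeholders = ", ".join("?" for _ in topic_ids)
    sql = f"""
        SELECT r.id
        FROM revision AS r
        WHERE r.status = 'approved'
          AND EXISTS (
              SELECT 1
              FROM topic_content_association AS a
              WHERE a.content_id = r.content_id  -- ties the subquery to the outer row
                AND a.topic_id IN ({placeholders})
          )
        ORDER BY r.ts_created DESC
        LIMIT ?
    """
    return [row[0] for row in connection.execute(sql, (*topic_ids, limit))]


print(recent_approved(conn, [7], 5))
print(recent_approved(conn, [7, 8], 2))
```

The subquery keeps its own table in FROM and refers to the outer revision only through the join condition, which is the kind of correlation the error message asks to be made explicit.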

sqlalchemy join with sum and count of grouped rows

Hi, I am working on a little prediction game in Flask with Flask-SQLAlchemy. I have a User model:
class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    nick = db.Column(db.String(255), unique=True)
    bets = relationship('Bet', backref=backref("user"))
and my Bet model
class Bet(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.Integer, db.ForeignKey('user.id'))
    matchid = db.Column(db.Integer, db.ForeignKey('match.id'))
    points = db.Column(db.Integer)
Neither is the full class, but it should do for the question. A user gathers points for predicting match outcomes and gets a different number of points for predicting the exact outcome, the winner, or the difference.
I now want a list of the top users, for which I have to sum up the points. I'm doing that via
toplist = db.session.query(User.nick, func.sum(Bet.points)).\
    join(User.bets).group_by(Bet.uid).order_by(func.sum(Bet.points).desc()).all()
This works quite well. Now there may be a case where two players have the same sum of points. In that case the number of correct predictions (rewarded with 3 points) would decide the winner. I can get this list by
tophits = db.session.query(User.nick, func.count(Bet.points)).\
    join(User.bets).filter_by(points=3).all()
They both work well, but I think there has to be a way to combine both queries and get a table with username, points and "hitcount". I've done that before in SQL, but I am not that familiar with SQLAlchemy and my brain is tied in knots. How can I get both queries in one?
In the query for tophits just replace the COUNT/filter_by construct with equivalent SUM(CASE(..)) without filter so that the WHERE clause for both is the same. The code below should do it:
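from sqlalchemy import case, func

total_points = func.sum(Bet.points).label("total_points")
# Count a bet as a hit (1) when it scored exactly 3 points, otherwise 0.
# Passing the mapping positionally keeps this working on newer SQLAlchemy
# releases, where the whens= keyword is no longer accepted.
total_hits = func.sum(case({3: 1}, value=Bet.points, else_=0)).label("total_hits")
q = (db.session.query(
        User.nick,
        total_points,
        total_hits,
    )
    .join(User.bets)
    .group_by(User.nick)
    .order_by(total_points.desc())
    .order_by(total_hits.desc())
)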
Note that I changed the group_by clause to use a column that is in the SELECT list, as some database engines might complain otherwise. But you do not need to do it.
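To show what the combined query computes, here is the same SUM plus SUM(CASE ...) aggregation in plain SQL, run through the standard-library sqlite3 module. It is a sketch under assumptions: the tables are cut down to the columns the query needs, the users and bets are invented, and the SQL is hand-written, so it is not necessarily what SQLAlchemy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, nick TEXT UNIQUE);
    CREATE TABLE bet (
        id INTEGER PRIMARY KEY,
        uid INTEGER REFERENCES user(id),
        points INTEGER
    );
    INSERT INTO user (id, nick) VALUES (1, 'ann'), (2, 'bob');
    -- both users reach 4 points, but only ann has an exact (3-point) hit
    INSERT INTO bet (uid, points) VALUES (1, 3), (1, 1), (2, 2), (2, 2);
    """
)

toplist = conn.execute(
    """
    SELECT u.nick,
           SUM(b.points) AS total_points,
           SUM(CASE b.points WHEN 3 THEN 1 ELSE 0 END) AS total_hits
    FROM user AS u
    JOIN bet AS b ON b.uid = u.id
    GROUP BY u.nick
    ORDER BY total_points DESC, total_hits DESC
    """
).fetchall()

print(toplist)
```

Both users total 4 points, and the hit count breaks the tie in favour of the user with the exact prediction.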