Grouping different join paths for similar objects - orm

Okay, so let's say I have a table of users
class User(ModelBase):
    name = Column(String, nullable=False)
And some tables of transactions
class TransactionTypeA(ModelBase):
    amount = Column(Integer)
    timestamp = Column(DateTime, nullable=False)
    user_id = Column(Integer, ForeignKey(User.id))
    user = relationship(...)
class TransactionTypeB(ModelBase):
    amount = Column(Integer)
    timestamp = Column(DateTime, nullable=False)
    property_unique_to_b = ...
    user = relationship(to user via property_unique_to_b)
Is there a good way to create some parent Transaction table that retains the relationship to User, so I can just query session.query(func.sum(AllTransactions.amount)).filter(User.id==some_id)? The only way to create a table similar to this would be with non-traditional mappings by mapping it to some UNION ALL select. If I use joined-table inheritance I remove the duplication of timestamp and amount, but I still can't get any global relationship of the transactions to the User table.
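For reference, here is a rough sketch of what the UNION ALL idea might look like with SQLAlchemy Core (1.4+ style select()). Everything about the second join path is an assumption: the accounts table, its columns, and all table names are hypothetical stand-ins for property_unique_to_b, and the timestamp column is left out for brevity. It only shows that both paths can be normalised to (user_id, amount) rows and summed per user; it is not a full ORM mapping.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, func, select, union_all)

metadata = MetaData()

users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
)
# Type A points at the user directly.
transactions_a = Table(
    "transactions_a", metadata,
    Column("id", Integer, primary_key=True),
    Column("amount", Integer),
    Column("user_id", Integer, ForeignKey("users.id")),
)
# Hypothetical: type B reaches the user through an intermediate table.
accounts = Table(
    "accounts", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
)
transactions_b = Table(
    "transactions_b", metadata,
    Column("id", Integer, primary_key=True),
    Column("amount", Integer),
    Column("account_id", Integer, ForeignKey("accounts.id")),
)

# Each branch resolves its own join path down to (user_id, amount).
all_transactions = union_all(
    select(transactions_a.c.user_id, transactions_a.c.amount),
    select(accounts.c.user_id, transactions_b.c.amount)
    .select_from(transactions_b.join(accounts)),
).subquery("all_transactions")


def total_for_user(conn, user_id):
    """Sum the amounts of both transaction types for one user."""
    stmt = select(func.sum(all_transactions.c.amount)).where(
        all_transactions.c.user_id == user_id
    )
    return conn.execute(stmt).scalar()


engine = create_engine("sqlite://")  # in-memory database for the demo
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(users.insert(), [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}])
    conn.execute(accounts.insert(), [{"id": 10, "user_id": 1}])
    conn.execute(transactions_a.insert(),
                 [{"amount": 5, "user_id": 1}, {"amount": 7, "user_id": 2}])
    conn.execute(transactions_b.insert(), [{"amount": 20, "account_id": 10}])
    ann_total = total_for_user(conn, 1)
    bob_total = total_for_user(conn, 2)
```

The same subquery could presumably be used as the selectable behind a read-only mapped class, but that part is not shown here.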

Related

How do I add a row to an association table in SQLAlchemy? [duplicate]

I don't think I fully understand association tables. I know how to work with normal tables, i.e. add rows and whatnot, but I don't understand how to work with an association table.
Why would I use the below
student_identifier = db.Table('student_identifier',
    db.Column('class_id', db.Integer, db.ForeignKey('classes.class_id')),
    db.Column('user_id', db.Integer, db.ForeignKey('students.user_id'))
)
Vs
class studentIdent(db.Model):
    db.Column(db.Integer, db.ForeignKey('classes.class_id')),
    db.Column(db.Integer, db.ForeignKey('students.user_id'))
As mentioned in a comment to the question, you would not bother creating a class for the association table if it only contains the foreign keys linking the two tables in the many-to-many relationship. In that case your first example – an association table – would be sufficient.
However, if you want to store additional information about the nature of the link between the two tables then you will want to create an association object so you can manipulate those additional attributes:
class StudentIdent(db.Model):
    __tablename__ = "student_identifier"
    course_id = db.Column(
        db.Integer,
        db.ForeignKey('courses.course_id'),  # positional arguments must come before keyword arguments
        primary_key=True,
        autoincrement=False
    )
    user_id = db.Column(
        db.Integer,
        db.ForeignKey('students.user_id'),
        primary_key=True,
        autoincrement=False
    )
    enrolment_type = db.Column(db.String(20))
    # reason for student taking this course
    # e.g., "core course", "elective", "audit"
and then you could create the link between a given student and a particular course by creating a new instance of the association object:
thing = StudentIdent(course_id=3, user_id=6, enrolment_type="elective")
Note: This is just a basic linkage. You can get more sophisticated by explicitly declaring a relationship between the ORM objects.
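As a minimal sketch of that more sophisticated version, here is the same association object with relationships declared on both sides. It uses plain SQLAlchemy (1.4+) with an in-memory SQLite database instead of Flask-SQLAlchemy so that it runs on its own; the Student and Course classes, their name/title columns, and the enrolments attribute are assumptions made for the illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Student(Base):
    __tablename__ = "students"
    user_id = Column(Integer, primary_key=True)
    name = Column(String(50))
    # One association object per course this student is linked to.
    enrolments = relationship("StudentIdent", back_populates="student")


class Course(Base):
    __tablename__ = "courses"
    course_id = Column(Integer, primary_key=True)
    title = Column(String(50))
    enrolments = relationship("StudentIdent", back_populates="course")


class StudentIdent(Base):
    __tablename__ = "student_identifier"
    course_id = Column(Integer, ForeignKey("courses.course_id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("students.user_id"), primary_key=True)
    enrolment_type = Column(String(20))
    student = relationship("Student", back_populates="enrolments")
    course = relationship("Course", back_populates="enrolments")


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice = Student(name="Alice")
    algebra = Course(title="Algebra")
    # Link the objects themselves; the foreign key values are filled in at flush time.
    link = StudentIdent(student=alice, course=algebra, enrolment_type="elective")
    session.add(link)  # the student and course are added along with the link
    session.commit()
    enrolment_summary = [
        (e.course.title, e.enrolment_type) for e in alice.enrolments
    ]

print(enrolment_summary)
```

With the relationships in place you work with objects rather than raw ids, and the extra attribute (enrolment_type) travels with the link.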

SQLAlchemy with multiple Many to Many Relationships

I would like to create a database for recipes with SQLAlchemy, however I am not sure if my approach is correct. How can I insert data into the recipe_ingredient table?
My approach:
A recipe has a name and can have multiple ingredients. One ingredient consists of an amount, a unit and a name (ingredients table), for example 500 ml water.
Table recipe
- id, primary key
- name
(one recipe can have multiple ingredients and one ingredient can be in multiple recipes)
table recipe_ingredient
foreignkey(recipe.id)
foreignkey(amounts.id)
foreignkey(units.id)
foreignkey(ingredients.id)
table amounts
id, primary key
amount (e.g. 500)
table units
id, primary key
unit (e.g. ml)
table ingredients
id, primary key
ingredient (e.g. water)
Code:
recipe_ingredient = db.Table('recipe_ingredient',
    db.Column('idrecipe', db.Integer, db.ForeignKey('recipe.id')),
    db.Column('idingredient', db.Integer, db.ForeignKey('ingredients.id')),
    db.Column('idunit', db.Integer, db.ForeignKey('units.id')),
    db.Column('idamount', db.Integer, db.ForeignKey('amounts.id'))
)
class Recipe(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.VARCHAR(70), index=True)
    ingredient = db.relationship("Ingredients", secondary=recipe_ingredient, backref='recipe')
    amounts = db.relationship("Amounts", secondary=recipe_ingredient, backref='recipe')
    units = db.relationship("Units", secondary=recipe_ingredient, backref='recipe')
class Ingredients(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    ingredient = db.Column(db.VARCHAR(200))
class Units(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    unit = db.Column(db.VARCHAR(45), nullable=False)
class Amounts(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    amount = db.Column(db.VARCHAR(45), nullable=False)
UPDATE:
class RecipeIngredient(db.Model):
    __tablename__ = 'recipe_ingredient'
    recipe_id = db.Column(db.Integer, db.ForeignKey('recipe.id'), primary_key=True)
    ingredient_id = db.Column(db.Integer, db.ForeignKey('ingredient.id'), primary_key=True)
    amount = db.Column(db.Integer, db.ForeignKey('amount.id'))
    unit = db.Column(db.Integer, db.ForeignKey('unit.id'))
    recipes = relationship("Recipe", back_populates="ingredients")
    ingredients = relationship("Ingredient", back_populates="recipes")
class Recipe(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.VARCHAR(70), index=True)
    ingredients = relationship("RecipeIngredient", back_populates="recipes")
class Ingredient(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.VARCHAR(200))
    recipes = relationship("RecipeIngredient", back_populates="ingredients")
class Unit(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.VARCHAR(45), nullable=False)
class Amount(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    number = db.Column(db.VARCHAR(45), nullable=False)
I added the following objects
pizza = Recipe(name='Pizza')
flour = Ingredient(name='Flour')
water = Ingredient(name='Water')
g = Unit(name='g')
ml = Unit(name='ml')
a500 = Amount(number='500')
a100 = Amount(number='100')
r1= RecipeIngredient(recipe_id=pizza.id, ingredient_id=flour.id, amount=a500.id, unit=g.id)
r2= RecipeIngredient(recipe_id=pizza.id, ingredient_id=water.id, amount=a100.id, unit=ml.id)
Result of pizza.ingredients:
[<RecipeIngredient 1, 1>, <RecipeIngredient 1, 3>]
What do I have to add to that model to get the name of an ingredient with:
pizza.ingredients[0].name
So, your relationship definitions and your many-to-many recipe_ingredient table are fine. You can do what you want to do with the code you have. You have a few stylistic issues that make your code harder to read than it should be.
How it works
Let's take a look at the functionality you have first:
Recipe objects will have an ingredient attribute that acts like a list. You can append Ingredients objects to it, and when you call it you'll have a regular Python list of Ingredients:
# Make a cake,
cake = Recipe(name='Cake')
flour = Ingredients(ingredient='Flour')
eggs = Ingredients(ingredient='Eggs')
cake.ingredient.append(flour)
cake.ingredient.append(eggs)
for i in cake.ingredient:
    print(i.ingredient)
Because you already defined Recipe.ingredient as a relationship with secondary=recipe_ingredient, SQLAlchemy has all the information it needs to manage the many-to-many relationship for you.
Ingredients objects will have a recipe attribute that acts like a list, and references the same relationships:
# Find recipes using flour
for r in flour.recipe:
    print(r.name)
You can even add recipes to an ingredient, rather than adding ingredients to a recipe, and it will work just the same:
# Make cookies
cookies = Recipe(name='Cookies')
eggs.recipe.append(cookies)
for i in cookies.ingredient:
    print(i.ingredient)
How it reads
You may have noticed that the way you're naming things makes it read a little clunkily. When any attribute is referencing a one-to-many relationship, it's a lot clearer when a plural is used. For instance, the ingredients relationship in Recipe would read a lot more nicely if it was actually called ingredients rather than ingredient. That would let us iterate through cake.ingredients. The same goes in the reverse direction: calling the backref recipes instead of recipe will make it a lot clearer that flour.recipes refers to multiple linked recipes, where flour.recipe might be a little misleading.
There's also inconsistency about whether your objects are plural or singular. Recipe is singular, but Ingredients is plural. Honestly, opinions on which is the correct style aren't universal - I prefer using the singular for all my models, Recipe, Ingredient, Amount, Unit - but that's just me. Pick a single style and stick to it, rather than switching between them.
Lastly, your attribute names are a little redundant. Ingredients.ingredient is a bit much. Ingredient.name makes a lot more sense, and it's clearer too.
One more thing
There's one additional thing here - it looks to me like you want to store additional information about your recipes/ingredients relationship, which is the amount and unit of an ingredient. You might want 2 eggs or 500 grams of flour, and I'm guessing that's what your Amounts and Units tables are for. In this case, this additional information is a key part of your relationship, and instead of trying to match it up in separate tables you can store it directly in your association table. This requires a bit more work - the SQLAlchemy docs go deeper into how to use an Association object to manage this additional data. Suffice it to say that using this pattern will make your code a lot cleaner and easier to manage long term.
Update
@user7055220 In my opinion you don't need the separate Amount or Unit tables, because these values only have meaning as part of the RecipeIngredient relationship. I would remove those tables, and change the amount and unit attributes in RecipeIngredient to be plain Integer and String columns that store the amount/unit directly:
class RecipeIngredient(db.Model):
    __tablename__ = 'recipe_ingredient'
    recipe_id = db.Column(db.Integer, db.ForeignKey('recipe.id'), primary_key=True)
    ingredient_id = db.Column(db.Integer, db.ForeignKey('ingredient.id'), primary_key=True)
    amount = db.Column(db.Integer)
    unit = db.Column(db.VARCHAR(45), nullable=False)
    recipes = relationship("Recipe", back_populates="ingredients")
    ingredients = relationship("Ingredient", back_populates="recipes")
In either case, to answer your question of how to access the ingredient values - in your example pizza.ingredients now contains an array of association objects. The ingredient is a child of that association object, and can be accessed via its ingredients attribute. You could access the unit and amount values directly if you make the change I suggested above. Accessing that data would look like this:
for i in pizza.ingredients:
    print(i.amount) # This only works if you're storing the amount directly in the association table as per my example above
    print(i.unit) # Same goes for this
    print(i.ingredients.name) # here, the ingredient is accessed via the "ingredients" attribute of the association object
Or, if you just want to use the syntax you were asking about:
print(pizza.ingredients[0].ingredients.name)
One more thing on naming: notice that the relationship attribute in your association object is called ingredients even though it only ever maps to a single ingredient, so it should be singular. Then the above example would be pizza.ingredients[0].ingredient.name, which sounds a little better.
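Putting the pieces together, a minimal self-contained sketch of the association-object version with singular relationship names might look like the following. It uses plain SQLAlchemy (1.4+) and an in-memory SQLite database rather than Flask-SQLAlchemy so it runs standalone; the explicit __tablename__ values are assumptions. Note that the rows are linked through objects rather than ids, because pizza.id is still None until the session has flushed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Recipe(Base):
    __tablename__ = "recipe"
    id = Column(Integer, primary_key=True)
    name = Column(String(70), index=True)
    # A list of association objects, one per ingredient line.
    ingredients = relationship("RecipeIngredient", back_populates="recipe")


class Ingredient(Base):
    __tablename__ = "ingredient"
    id = Column(Integer, primary_key=True)
    name = Column(String(200))
    recipes = relationship("RecipeIngredient", back_populates="ingredient")


class RecipeIngredient(Base):
    __tablename__ = "recipe_ingredient"
    recipe_id = Column(Integer, ForeignKey("recipe.id"), primary_key=True)
    ingredient_id = Column(Integer, ForeignKey("ingredient.id"), primary_key=True)
    amount = Column(Integer)
    unit = Column(String(45), nullable=False)
    # Singular names: each association row has exactly one recipe and one ingredient.
    recipe = relationship("Recipe", back_populates="ingredients")
    ingredient = relationship("Ingredient", back_populates="recipes")


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    pizza = Recipe(name="Pizza")
    pizza.ingredients.append(
        RecipeIngredient(ingredient=Ingredient(name="Flour"), amount=500, unit="g")
    )
    pizza.ingredients.append(
        RecipeIngredient(ingredient=Ingredient(name="Water"), amount=100, unit="ml")
    )
    session.add(pizza)  # the association rows and ingredients are added with it
    session.commit()

    first_name = pizza.ingredients[0].ingredient.name
    lines = sorted(
        f"{ri.amount} {ri.unit} {ri.ingredient.name}" for ri in pizza.ingredients
    )

print(lines)
```

If you want pizza.ingredients[0].name to work literally, SQLAlchemy's association proxy extension is probably the thing to look at, but the explicit hop through .ingredient is the simplest form.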

Optimising a specific type of DB query

I have a many-to-many relation between two objects: Quotes and Books. A quote can belong to multiple books, but usually only one or two. On the other hand a book usually has multiple quotes attributed to it. I have a SQL query for the quotes which I'd like to turn into a single query for all the books that have at least one of the quotes.
I have done the following to change a query for Quotes into the corresponding one for all the Books:
def get_books(session, quotes):
    quote_id_query = quotes.from_self(Quote.quote_id)
    book_query = (session.query(Book)
                  .join(Book.quotes)
                  .filter(Book.book_id.in_(quote_id_query))
                  .distinct())
    return book_query
This works, but it is way too slow for certain quotes queries. If quotes is empty, the corresponding book query is quick, but if the quote query is non-empty it may take upwards of 10 seconds (about 1000x what the quotes query takes, and even slower than N+1 queries). I am using a recent version of Postgres. I have indices on my secondary table, and my attempts to EXPLAIN ANALYZE the problem have produced a plan almost a dozen levels deep. Can anyone help me reduce these queries to sane times?
EDIT: Here are the current model definitions:
class Quote(BaseModel):
    quote_id = Column(Integer, primary_key=True, nullable=False)
    full_text = Column(String, nullable=False, unique=True)
    uses = Column(Integer, nullable=False)
    popularity = Column(Integer, nullable=False)
    books = relationship('Book', secondary='quotebook', back_populates='quotes')
class Book(BaseModel):
    book_id = Column(Integer, primary_key=True, nullable=False)
    author = Column(String, nullable=False, index=True)
    title = Column(String, nullable=False, index=True)
    genre = Column(String, nullable=False, index=True)
    cost = Column(Integer, nullable=False)
    quotes = relationship('Quote', secondary='quotebook', back_populates='books', lazy='joined')
class QuoteBook(BaseModel):
    __tablename__ = 'quotebook'
    id = Column(Integer, primary_key=True)
    book_id = Column(Integer, ForeignKey('book.book_id'), index=True)
    quote_id = Column(Integer, ForeignKey('quote.quote_id'), index=True)
You should log, inspect and post the generated SQL statement for review (set the sqlalchemy.engine logger to INFO), but your join(Book.quotes) should already apply an inner join condition, so your filter(Book.book_id.in_(quote_id_query)) is extraneous and should be removed.
Depending on the size and definitions of your tables, if the join is expensive, you might also consider testing the performance of an EXISTS clause using filter(Book.quotes.has(quoteid>0)), see http://docs.sqlalchemy.org/en/latest/orm/internals.html?highlight=has#sqlalchemy.orm.properties.RelationshipProperty.Comparator.has. Note that has() is meant for scalar (many-to-one) relationships; for a collection such as Book.quotes the equivalent comparator is any().
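To make the EXISTS suggestion concrete, here is a minimal sketch using the any() comparator on the collection relationship. It is a cut-down version of the models (only the columns needed, a plain association Table, no lazy='joined') on in-memory SQLite, and it assumes the quote criteria are available as a filter expression rather than as a ready-made Query; the quote_criterion parameter is a name made up for this example. Whether it is actually faster on your Postgres data is something only EXPLAIN ANALYZE can tell you.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

quotebook = Table(
    "quotebook", Base.metadata,
    Column("book_id", Integer, ForeignKey("book.book_id"), primary_key=True),
    Column("quote_id", Integer, ForeignKey("quote.quote_id"), primary_key=True),
)


class Quote(Base):
    __tablename__ = "quote"
    quote_id = Column(Integer, primary_key=True)
    full_text = Column(String, nullable=False, unique=True)
    books = relationship("Book", secondary=quotebook, back_populates="quotes")


class Book(Base):
    __tablename__ = "book"
    book_id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    quotes = relationship("Quote", secondary=quotebook, back_populates="books")


def get_books(session, quote_criterion):
    """Books with at least one quote matching the criterion.

    any() renders a correlated EXISTS subquery, so each book appears at
    most once and no DISTINCT is needed.
    """
    return session.query(Book).filter(Book.quotes.any(quote_criterion))


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    ishmael = Quote(full_text="Call me Ishmael.")
    times = Quote(full_text="It was the best of times.")
    session.add_all([
        Book(title="Moby-Dick", quotes=[ishmael]),
        Book(title="Anthology", quotes=[ishmael, times]),
        Book(title="Empty", quotes=[]),
    ])
    session.commit()
    matches = get_books(session, Quote.full_text.like("Call me%"))
    titles = sorted(book.title for book in matches)

print(titles)
```

Printing str(matches) (or turning on the sqlalchemy.engine logger) shows the generated SQL, which is handy for comparing it against the join-based version.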

Define backref parameter

class Country(db.Model):
    __tablename__ = 'countries'
    id = db.Column(db.Integer, primary_key=True)
    code = db.Column(db.Integer)
    name_en = db.Column(db.String(100))
    user_country = db.relationship('User', backref='country')
    company_country = db.relationship('Company', backref='country')
In this example I am overriding the table name. So, what should be the backref? The tablename or the class name? Or can it be anything else?
From the docs:
backref is a simple way to also declare a new property on the xxxxx class
So, the name of backref is only a general description?
And if I have two tables using the country table, do I need to create two relationships, like in my example? Is the procedure one relationship for each reference in another table?
You can refer to this example.
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
To store Child in Parent
Get parent object.
p = Parent.query.get(1)
Store in child object using backref
c = Child(parent=p)
db.session.add(c)
db.session.commit()
To access the parent through a child object, do the following.
Make the object.
child = Child()
Access the parent through the backref.
child.parent
Like this? This will make user.country into something like the model
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    country = db.relationship("Country", backref="users")
class Country(db.Model):
    __tablename__ = 'countries'
    id = db.Column(db.Integer, primary_key=True)
    user_country_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    user_country = db.relationship("User")
It is confusing. Also, SQLAlchemy's documentation is long and tedious, which makes it hard to understand.
So, what should be the backref? The tablename or the class name? Or can it be anything else?
The backref name can be anything you like.
So, the name of backref is only a general description?
Yes, it dynamically adds an attribute to the related class (in other words, table).
Because SQLAlchemy uses metaclasses, the code that creates the back reference on the other class won't run until you have created at least one instance of the Client class.
See backref-class-attribute for more details, such as access backref by Subject.client .
And if I have two tables using the country table, do I need to create two relationships, like in my example? Is the procedure one relationship for each reference in another table?
Sounds like you want a many to many relationship. Both the association-table and flask-doc can help.
hope it helps :)
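A small sketch may make the naming rules clearer. It assumes plain SQLAlchemy (1.4+) on in-memory SQLite instead of Flask-SQLAlchemy, and the country_id columns plus the users/companies attribute names are choices made for this example. The first argument to relationship() is the class name, ForeignKey() takes the table name, and backref is just the attribute name that gets added to the other class.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Country(Base):
    __tablename__ = "countries"
    id = Column(Integer, primary_key=True)
    name_en = Column(String(100))
    # "User" / "Company" are class names; backref="country" adds a
    # .country attribute to each of those classes.
    users = relationship("User", backref="country")
    companies = relationship("Company", backref="country")


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    # ForeignKey refers to the table name ("countries"), not the class name.
    country_id = Column(Integer, ForeignKey("countries.id"))


class Company(Base):
    __tablename__ = "companies"
    id = Column(Integer, primary_key=True)
    country_id = Column(Integer, ForeignKey("countries.id"))


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    brazil = Country(name_en="Brazil")
    user = User(country=brazil)        # attribute created by the backref
    company = Company(country=brazil)  # same backref name, different class
    session.add_all([user, company])
    session.commit()
    user_country = user.country.name_en
    counts = (len(brazil.users), len(brazil.companies))

print(user_country, counts)
```

So one relationship per referencing table is enough here; these are two one-to-many relationships, each needing a foreign key column on the "many" side.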

sqlalchemy materialized relationships

I have a data model which is analogous to the following:
Location 1-----*<> Vacation <>*------1 TravelAgency
                      <>
                      |*
                      |
                      |1
                   Airline
It is implemented in sqlalchemy in the normal way:
class Vacation(Base):
    __tablename__ = 'vacation'
    id = Column(Integer, primary_key=True)
    location_id = Column(Integer, ForeignKey('location.id'))
    location = relationship("Location")
    travel_agency_id = Column(Integer, ForeignKey('travel_agency.id'))
    travel_agency = relationship("TravelAgency")
    airline_id = Column(Integer, ForeignKey('airline.id'))
    airline = relationship("Airline")
class Location(Base):
    __tablename__ = 'location'
    id = Column(Integer, primary_key=True)
    data = Column(Integer)
class TravelAgency(Base):
    __tablename__ = 'travel_agency'
    id = Column(Integer, primary_key=True)
    data = Column(Integer)
class Airline(Base):
    __tablename__ = 'airline'
    id = Column(Integer, primary_key=True)
    data = Column(Integer)
Analysis of vacations in a database of hundreds of millions of objects is too slow due to the multiple joins required. After exhausting my options for speeding up the join operations with database configuration options, I am now trying to use database triggers to maintain a materialized view of vacations joined with its aggregates.
SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'vacation_materialized';
column_name
--------------
id
location_id
location$data
travel_agency_id
travel_agency$data
airline_id
airline$data
Now I am weighing options for how to reconstruct vacation, travel_agency and airline objects from this view. One option is to use the sqlalchemy core to query the vacation_materialized table then parse the rows and construct the objects "by hand". Are there any ORM features I should be looking into that might result in a more "elegant" solution?
You should be able to map a class against the materialized view, then provide read-only relationship properties:
# autoload=True needs a MetaData bound to an engine and was removed in SQLAlchemy 2.0;
# newer versions use autoload_with=engine instead
materialized_vacations = Table('materialized_vacations', metadata, autoload=True)
locations = Table('locations', metadata, autoload=True)
mvac_2_location = materialized_vacations.c.location_id==locations.c.location_id
class Location(Base):
    __table__ = locations
class MaterializedVacation(Base):
    __table__ = materialized_vacations
    location = relationship("Location", primaryjoin=mvac_2_location, viewonly=True)
    ...
I'm assuming here that you don't want to put any Foreign Keys into your materialized view. Instead, I'm specifying the join conditions explicitly using the primaryjoin keyword argument to relationship().
Here mvac_2_location creates an sqlalchemy.sql.expression.BinaryExpression; I like to declare those separately before using because they tend to take up most of a line on their own and make argument sequences unreadable if they're declared where they're used. It also makes them reusable and importable into submodules, which can be handy.
To construct mvac_2_location, I need the actual table objects, and I need them before finishing declaration of class MaterializedVacation, so I'm declaring them the old-fashioned pre-declarative way and then binding the classes to the tables using the declarative argument __table__ in place of the more common __tablename__. It's possible that there's a better way to do that, but I'm not sure.
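Here is a runnable sketch of the same idea, under a few assumptions: the tables are declared explicitly instead of reflected, an ordinary SQLite table stands in for the materialized view, and the denormalized columns are omitted. One detail worth knowing: as far as I know, when the view has no ForeignKey at all, relationship() cannot tell which side of the join is the "foreign" one, so the join condition is annotated with foreign() here (passing foreign_keys= to relationship() should be an alternative).

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, foreign, relationship

metadata = MetaData()
Base = declarative_base(metadata=metadata)

locations = Table(
    "locations", metadata,
    Column("location_id", Integer, primary_key=True),
    Column("data", Integer),
)
# Stand-in for the materialized view: a plain table, deliberately
# without any ForeignKey on location_id.
materialized_vacations = Table(
    "materialized_vacations", metadata,
    Column("id", Integer, primary_key=True),
    Column("location_id", Integer),
)

# foreign() marks the referencing column, since no ForeignKey exists.
mvac_2_location = (
    foreign(materialized_vacations.c.location_id) == locations.c.location_id
)


class Location(Base):
    __table__ = locations


class MaterializedVacation(Base):
    __table__ = materialized_vacations
    # viewonly=True: used for loading only, never for persisting changes.
    location = relationship(Location, primaryjoin=mvac_2_location, viewonly=True)


engine = create_engine("sqlite://")  # in-memory database for the demo
metadata.create_all(engine)

with Session(engine) as session:
    session.add(Location(location_id=7, data=42))
    session.add(MaterializedVacation(id=1, location_id=7))
    session.commit()
    vacation = session.get(MaterializedVacation, 1)
    location_data = vacation.location.data

print(location_data)
```

The travel agency and airline relationships would follow the same pattern, each with its own join condition.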