How to remove SQLAlchemy Many-To-Many Orphans from database?

Context
I have a simple MySQL database written with SQLAlchemy. The following are my two models, Subreddit and Keyword, that have a many-to-many relationship, along with their association table:
subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id', ondelete='CASCADE')),
)
class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'
    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)
    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete', passive_deletes=True, lazy='dynamic')
    # ...
class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'
    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)
    # ...
As test data, I've created the following data set:
Subreddit:
test_subreddit
Keywords:
test_keyword1
test_keyword2
test_keyword3
In other words, test_subreddit.keywords should return [test_keyword1, test_keyword2, test_keyword3].
Problem
When I remove test_subreddit, test_keyword1, test_keyword2, test_keyword3 still persist in the database.
I understand that a many-to-many relationship technically has no parent, so cascades will not work, according to this post:
https://stackoverflow.com/a/803584/10426919.
What I've Tried
I followed this link: https://github.com/sqlalchemy/sqlalchemy/wiki/ManyToManyOrphan.
This link provides a library function that should fix my exact problem.
However, the function does not work when integrated into my Model file in the following ways:
Method #1:
from app.extensions import db
from werkzeug.security import generate_password_hash, check_password_hash
from sqlalchemy.inspection import inspect
from sqlalchemy_utils import auto_delete_orphans  # library
subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id', ondelete='CASCADE')),
)
class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'
    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)
    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete', passive_deletes=True, lazy='dynamic')
    # ...
class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'
    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)
    # ...
auto_delete_orphans(Subreddit.keywords)  # library function
However, this function does not seem to do anything, and no error is raised to point me in the right direction. When I check the database in MySQL Workbench, the subreddit test_subreddit has been deleted, but the keywords [test_keyword1, test_keyword2, test_keyword3] are still in the keywords table.
Method #2:
I also tried integrating the recipe that the library function is based on directly into my code:
from app.extensions import db
from werkzeug.security import generate_password_hash, check_password_hash
from sqlalchemy.inspection import inspect
from sqlalchemy_utils import auto_delete_orphans
# for deleting many-to-many "orphans".
from sqlalchemy import event, create_engine
from sqlalchemy.orm import attributes, sessionmaker
subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id', ondelete='CASCADE')),
)
class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'
    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)
    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete', passive_deletes=True, lazy='dynamic')
    # ...
class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'
    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)
    # ...
engine = create_engine("mysql://", echo=True)
Session = sessionmaker(bind=engine)
@event.listens_for(Session, 'after_flush')
def delete_tag_orphans(session, ctx):
    # optional: look through Session state to see if we want
    # to emit a DELETE for orphan Tags
    flag = False
    for instance in session.dirty:
        if isinstance(instance, Subreddit) and \
                attributes.get_history(instance, 'keywords').deleted:
            flag = True
            break
    for instance in session.deleted:
        if isinstance(instance, Subreddit):
            flag = True
            break
    # emit a DELETE for all orphan Tags. This is safe to emit
    # regardless of "flag", if a less verbose approach is
    # desired.
    if flag:
        session.query(Keyword).\
            filter(~Keyword.subreddits.any()).\
            delete(synchronize_session=False)
Again, the keywords persisted despite being attached to no parent.
What I'm trying to accomplish
When children in the database no longer have a parent, I would like them to be removed from the database. What am I doing wrong?

Rather than using auto_delete_orphans, I wrote a function that I call whenever I want to delete children. It checks whether the child in question has any parents: if it does, the child is left alone; if it does not, the child is deleted.
Here is how I implemented this method, given that a Subreddit is a parent and a Keyword is a child of Subreddit.
def check_for_keyword_orphans(keyword):
    # check whether the keyword has an associated subreddit
    if len(keyword.subreddits) == 0:
        db.session.delete(keyword)
        return True  # keyword deleted
    else:
        return False  # keyword has an associated subreddit
And here is how I used the method in my API route:
keywords = subreddit.keywords
for keyword in keywords:
    check_for_keyword_orphans(keyword)
db.session.commit()
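The orphan check used in both the event recipe and the helper above comes down to one statement: delete every keyword that has no row left in the association table. The following is only a sketch of that statement, using the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table and column names follow the question; `other_subreddit` and the helper name `delete_orphan_keywords` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces ON DELETE CASCADE when foreign keys are switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE subreddits (id INTEGER PRIMARY KEY, subreddit_name TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE subreddits_keywords (
        subreddit_id INTEGER REFERENCES subreddits(id) ON DELETE CASCADE,
        keyword_id INTEGER REFERENCES keywords(id) ON DELETE CASCADE
    );
    INSERT INTO subreddits VALUES (1, 'test_subreddit'), (2, 'other_subreddit');
    INSERT INTO keywords VALUES
        (1, 'test_keyword1'), (2, 'test_keyword2'), (3, 'test_keyword3');
    -- test_keyword3 is shared with other_subreddit (hypothetical extra row).
    INSERT INTO subreddits_keywords VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")


def delete_orphan_keywords(conn):
    """Delete keywords that no association row points at; return the count."""
    cur = conn.execute(
        "DELETE FROM keywords WHERE NOT EXISTS ("
        " SELECT 1 FROM subreddits_keywords sk"
        " WHERE sk.keyword_id = keywords.id)"
    )
    return cur.rowcount


# Deleting the subreddit cascades to its association rows only.
conn.execute("DELETE FROM subreddits WHERE id = 1")
deleted = delete_orphan_keywords(conn)
remaining = [row[0] for row in conn.execute("SELECT keyword FROM keywords ORDER BY id")]
print(deleted, remaining)
```

The shared keyword survives because `other_subreddit` still references it, which is the behaviour the per-keyword check above aims for, done in one round trip.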

Related

How to perform a data migration with Alembic and two versions of my Table?

I'm trying to refactor a database model by moving one column out of a table into a new one. I'd like to do this using existing SQLAlchemy Core models and Alembic. I'd also like to use a server-side INSERT ... FROM SELECT ...-style query to migrate the data (docs). By not copying the gazillions of rows into Python-land, I hope to get maximum scalability, maximum performance and minimum downtime.
My problem is the programmatic use of SQLAlchemy against two versions of the same table name in a single MetaData context. Should I resort to textual SQL instead? 😕
schema.py before:
class User(Base):
    __tablename__ = "users"
    id = Column(BigInteger, primary_key=True, autoincrement=False, nullable=False)
    [...]
    profile_picture_url = Column(String, nullable=True)
schema.py after:
class User(Base):
    __tablename__ = "users"
    id = Column(BigInteger, primary_key=True, autoincrement=False, nullable=False)
    [...]
class UserProfileExtras(Base):
    __tablename__ = "user_profile_extras"
    user_id = Column(BigInteger, ForeignKey("users.id"), index=True, nullable=False)
    profile_picture_url = Column(String, nullable=False)
So here's my attempt to create an Alembic upgrade script:
# Import the new/current-in-code models.
from ... import User, UserProfileExtras
# Define the previous User model in order to operate on the current/old schema.
class UserBeforeUpgrade(Base):
    __tablename__ = "users"
    id = Column(BigInteger, primary_key=True, autoincrement=False, nullable=False)
    [...]
    profile_picture_url = Column(String, nullable=True)
table_before_upgrade: Table = UserBeforeUpgrade.__table__
new_target_table = UserProfileExtras.__table__
[...]
def upgrade() -> None:
    op.create_table(
        "user_profile_extras",
        sa.Column("user_id", sa.BigInteger(), autoincrement=False, nullable=False),
        sa.Column("profile_picture_url", sa.VARCHAR(), nullable=False),
        [...]
    )
    from_user_table = (select([table_before_upgrade.c.id, table_before_upgrade.c.profile_picture_url])
                       .where(table_before_upgrade.c.profile_picture_url != None))
    insert_from = (
        new_target_table.insert().from_select(
            [new_target_table.c.user_id, new_target_table.c.profile_picture_url],
            from_user_table)
    )
    op.execute(insert_from)
    [...]
[...]
Error:
sqlalchemy.exc.InvalidRequestError: Table 'users' is already defined for this MetaData instance.
Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
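The textual-SQL route the question asks about avoids the name clash altogether, because no second Table object called "users" is ever registered. Below is a minimal sketch of the server-side copy, run against an in-memory SQLite database as a stand-in for the real server. The sample rows and the constant name `MIGRATE_SQL` are assumptions; only the table and column names come from the question.

```python
import sqlite3

# Server-side copy: rows move from users to user_profile_extras without
# ever being loaded into Python.
MIGRATE_SQL = """
INSERT INTO user_profile_extras (user_id, profile_picture_url)
SELECT id, profile_picture_url
FROM users
WHERE profile_picture_url IS NOT NULL
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, profile_picture_url TEXT);
    CREATE TABLE user_profile_extras (
        user_id INTEGER NOT NULL REFERENCES users(id),
        profile_picture_url TEXT NOT NULL
    );
    -- Hypothetical sample data; user 2 has no picture and must be skipped.
    INSERT INTO users VALUES
        (1, 'https://example.com/a.png'),
        (2, NULL),
        (3, 'https://example.com/c.png');
""")

conn.execute(MIGRATE_SQL)
migrated = conn.execute(
    "SELECT user_id, profile_picture_url FROM user_profile_extras ORDER BY user_id"
).fetchall()
print(migrated)
```

In an Alembic upgrade(), the same statement could be passed as a string to op.execute() after op.create_table(). If the Core expression style is preferred, declaring the old table on its own separate MetaData() instead of the shared Base is another way to sidestep the "already defined" error.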

marshmallow - include_fk fail if foreign_key is not int

While serializing a related database entity with SQLAlchemy and marshmallow, I ran into the following issue:
When dumping this schema, a ValueError is raised with the following traceback:
self._serialize(d, many=False)
value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
return self._serialize(value, attr, obj, **kwargs)
ret = self._format_num(value) # type: _T
return self.num_type(value)
ValueError: invalid literal for int() with base 10: 'sub'
It seems the library tries to cast the key to an Integer. For readability reasons the key is a String in this case, so the cast obviously fails.
Is there a flag to avoid casting the foreign key?
Here the models for reference:
Parent Class
class Operation(db.Model):
    __tablename__ = "operation"
    key = db.Column(db.String(64), primary_key=True)
    label = db.Column(db.String(128), nullable=False)
    rules = db.relationship('app.models.rule.Rule', backref="operation")
Child Class
class Rule(db.Model):
    __tablename__ = 'rule'
    id = db.Column(db.Integer, primary_key=True, autoincrement="auto")
    operation_key = db.Column(db.Integer, ForeignKey('operation.key'))
class RuleSchema(ma.SQLAlchemyAutoSchema):
    class Meta:
        model = Rule
        include_fk = True
Found a workaround: add a nested field to the schema, restricted with the only option. In the API response, the serialised JSON then nests the foreign key inside an object: 'operation': {'key': 'sub'}
class RuleSchema(ma.SQLAlchemyAutoSchema):
    class Meta:
        model = Rule
        # include_fk = True
    operation = ma.Nested(OperationSchema, only=('key',))

flask: Get all items from parent in self referencing table: sqlalchemy.exc.ArgumentError: Expected mapped entity or selectable/table as join target

I am trying to select all the child posts of the post id selected by the user. Below are my model definition for Post and the query that gives the error in the title. I have tried a few things, including aliasing, which looked promising, but that and every other option I tried gave errors.
# Post model
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(100), nullable=False)
    date_posted = db.Column(db.DateTime(100), nullable=False, default=datetime.utcnow)
    content = db.Column(db.Text, nullable=False)
    topic = db.Column(db.String(100), nullable=False)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
    # thread structure using self-referencing see
    # https://docs.sqlalchemy.org/en/14/orm/self_referential.html
    parent_post = db.Column(db.Integer, db.ForeignKey('post.id'), nullable=True)
    child_posts = db.relationship('Post', backref='parent', lazy="joined", remote_side='Post.id') # background query
# display selected post thread route
@posts.route("/post_thread/<int:post_id>")
def post_thread(post_id):
    posts = Post.query.filter(Post.id==post_id).join(Post.id==Post.parent_post).all()
    # This works but to get only the selected post: posts = Post.query.where(Post.id==post_id)
    return render_template('post_thread.html', posts=posts, topic="Thread")
I have written a recursive function that works, but it does not make use of the backref relationship, so I suspect it is not the most efficient way to do it.
# Recursive function to iterate down tree and return union of parent and children found
def union_children(post_id, posts):
    print("Looking for child posts")
    child_posts = Post.query.where(Post.parent_post==post_id)
    if child_posts:
        print("Child found")
        posts = posts.union(child_posts)
        for post in child_posts:
            posts = union_children(post.id, posts)
    return posts
@posts.route("/post_thread/<int:post_id>")
def post_thread(post_id):
    posts = Post.query.where(Post.id==post_id)
    posts = union_children(post_id, posts)
    return render_template('post_thread.html', posts=posts, topic="Thread")
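The recursive function above issues one query per post in the thread. A recursive common table expression lets the database walk the tree in a single statement instead. This is only a sketch, run with the standard-library sqlite3 module. The `post` table is reduced to the columns that matter here, and the sample rows, `THREAD_SQL` and `thread_ids` are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        parent_post INTEGER REFERENCES post(id)
    );
    INSERT INTO post VALUES
        (1, 'root', NULL),
        (2, 'reply to root', 1),
        (3, 'reply to reply', 2),
        (4, 'unrelated thread', NULL),
        (5, 'second reply to root', 1);
""")

# Anchor row: the selected post. Recursive step: every post whose
# parent_post is already part of the thread.
THREAD_SQL = """
WITH RECURSIVE thread(id) AS (
    SELECT id FROM post WHERE id = :post_id
    UNION ALL
    SELECT post.id FROM post JOIN thread ON post.parent_post = thread.id
)
SELECT post.id, post.title
FROM post JOIN thread ON post.id = thread.id
ORDER BY post.id
"""


def thread_ids(conn, post_id):
    """Return the ids of the selected post and all of its descendants."""
    return [row[0] for row in conn.execute(THREAD_SQL, {"post_id": post_id})]


print(thread_ids(conn, 1))
print(thread_ids(conn, 2))
```

SQLAlchemy can build the same kind of statement through the `cte(recursive=True)` construct, so the idea carries over to the Flask-SQLAlchemy models without raw SQL.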

Edit many-to-many relationship in Flask-SQLAlchemy with WTForms

I am struggling to handle a many-to-many relationship (here: users and groups) in a Flask form. My database structure (simplified) looks as follows:
association_user_group = db.Table(
    'association_user_group',
    db.Column('user_id', db.Integer, db.ForeignKey('user.id')),
    db.Column('group_id', db.Integer, db.ForeignKey('group.id'))
)
class Group(db.Model):
    __tablename__ = 'group'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(32))
    users = db.relationship(
        'User',
        secondary=association_user_group,
        backref=db.backref('groups', lazy='dynamic'))
class User(UserMixin, db.Model):
    __tablename__ = 'user'
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), index=True, unique=True)
    @property
    def group_ids(self):
        return [u.id for u in self.groups]
To handle a form that allows an admin to edit the users, I have:
class MultiCheckboxField(SelectMultipleField):
    widget = widgets.ListWidget(prefix_label=False)
    option_widget = widgets.CheckboxInput()
class UserEditForm(FlaskForm):
    username = StringField('Username', validators=[DataRequired()])
    password = PasswordField('Password')
    groups = MultiCheckboxField('Groups', coerce=int)
    submit = SubmitField('Apply changes')
and the following route:
@bp.route('/user_edit/<int:id>', methods=['GET', 'POST'])
@login_required
@admin_required
def user_edit(id):
    user = User.query.get(id)
    if request.method == 'GET':
        form = UserEditForm(obj=user)
        form.groups.data = [grp.id for grp in user.groups]
    else:
        form = UserEditForm(request.form)
    form.groups.choices = [(grp.id, grp.name) for grp in Group.query.all()]
    if form.validate_on_submit():
        # form.populate_obj(user)  <-- does not work
        user.username = form.username.data
        if form.password.data != '':
            user.set_password(form.password.data)
        # how do I update the 'groups'?
        db.session.add(user)
        db.session.commit()
    return "Data={}".format(form.data)
The database and form work, but I am unable to copy the groups from the form back to the database. Ideally, I'd be able to 'populate back' the form content to the User object, but this fails because of the groups (and I think it might also fail with the password field, which is empty if no change is requested).
I then tried to delete all groups from the user and re-add the ones I want, but did not figure out a reasonable way to achieve this. I am able to add a group membership with user.groups.append(grp), where grp is the corresponding database object. I am also able to remove a group membership in the same way, but given the group ids this would mean looping through all group ids, retrieving each group object, and then using this object to invoke the remove method, which seems overly complicated.
Overall, I suspect that I am attempting to implement all this in a far too awkward way...
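One way to avoid removing and re-adding every membership is to compare the ids the user currently has with the ids ticked in the form, and touch only the difference. The sketch below is plain Python with no database involved, and `plan_group_changes` is a hypothetical helper name, not part of Flask-SQLAlchemy or WTForms.

```python
def plan_group_changes(current_ids, selected_ids):
    """Return (ids to add, ids to remove) to get from current to selected."""
    current, selected = set(current_ids), set(selected_ids)
    to_add = sorted(selected - current)      # ticked in the form, not yet a member
    to_remove = sorted(current - selected)   # member today, unticked in the form
    return to_add, to_remove


# Example: the user is in groups 1, 2, 3 and the form submits 2, 3, 5.
to_add, to_remove = plan_group_changes([1, 2, 3], [2, 3, 5])
print(to_add, to_remove)
```

In the route, `current_ids` would come from user.group_ids and `selected_ids` from form.groups.data. Each id in `to_add` would then be loaded and passed to user.groups.append(grp), and each id in `to_remove` to the remove method mentioned above, so unchanged memberships are never touched.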

SQLAlchemy Find a user with a single query (probably join necessary)

I have three models which have relationships.
Firstly, the participant model.
class Participant(UserMixin, db.Model):
    __tablename__ = 'participants'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), index=True)
    team_id = db.Column(db.Integer, db.ForeignKey('teams.id'))
    # Relationships
    team = db.relationship("Team", back_populates="members")
Secondly, the event model.
class Event(db.Model):
    __tablename__ = 'events'
    id = db.Column(db.Integer, primary_key=True)
Thirdly, the team model.
class Team(db.Model):
    __tablename__ = 'teams'
    id = db.Column(db.Integer, primary_key=True)
    event_id = db.Column(db.Integer, db.ForeignKey('events.id'))
    # Relationships
    members = db.relationship('Participant', back_populates="team")
Several participants are allowed to have the same email address if their team is not connected to the same event.
I am looking for a query which checks if there is a participant with a given email address who is connected to a team, which is connected to the same event. I know the event.id and the email address in advance.
Pseudo code
def check(EVENTID, EMAIL):
    if db.session.query(Team, Event, Participant). \
            filter(Team.event_id == EVENTID). \
            filter(Participant.team_id == Team.id). \
            filter(Participant.email == EMAIL).first():
        return True
    return False
I think it can be done with one single query using joins, but I couldn't figure it out. Please help!
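Since teams.event_id already carries the event id, the check only needs participants joined to teams; the events table does not have to appear in the query at all. Here is a sketch of that single query, with the standard-library sqlite3 module standing in for the real database. The sample rows and the function name `email_taken` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY);
    CREATE TABLE teams (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES events(id)
    );
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY,
        email TEXT,
        team_id INTEGER REFERENCES teams(id)
    );
    INSERT INTO events VALUES (1), (2);
    INSERT INTO teams VALUES (10, 1), (20, 2);
    INSERT INTO participants VALUES
        (100, 'a@example.com', 10),
        (101, 'b@example.com', 20);
""")


def email_taken(conn, event_id, email):
    """True if some participant with this email is on a team of this event."""
    row = conn.execute(
        "SELECT 1 FROM participants p "
        "JOIN teams t ON p.team_id = t.id "
        "WHERE t.event_id = ? AND p.email = ? "
        "LIMIT 1",
        (event_id, email),
    ).fetchone()
    return row is not None


print(email_taken(conn, 1, "a@example.com"))
print(email_taken(conn, 2, "a@example.com"))
```

With the models from the question, the equivalent ORM query would join Participant to Team through the existing team relationship and filter on Team.event_id and Participant.email.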