Explain Keyed Tuple output from SQLAlchemy ORM Query using Aliased - orm

Please help me improve/understand queries using an aliased class. Consider an example with movement between two locations described as follows.
class Location(Base):
    __tablename__ = 'location'
    id = Column(Integer, primary_key=True)

class Movement(Base):
    __tablename__ = 'movement'
    id = Column(Integer, primary_key=True)
    # Column type is None so it is inferred from the referenced column
    from_id = Column(None, ForeignKey('location.id'))
    to_id = Column(None, ForeignKey('location.id'))
    from_location = relationship('Location', foreign_keys=from_id)
    to_location = relationship('Location', foreign_keys=to_id)
To join three tables in a query, I'm using the aliased() function from sqlalchemy.orm:
FromLocation = aliased(Location)
ToLocation = aliased(Location)
r = session.query(Movement, FromLocation, ToLocation).\
    join(FromLocation, Movement.from_id == FromLocation.id).\
    join(ToLocation, Movement.to_id == ToLocation.id).first()
First question is "What's the intelligent way to work with r?" The query returns a keyed tuple, but the only key is 'Movement', there's no 'FromLocation' as I would expect. I can get it with r[1], but that's easily broken.
Second question is "Did I put in the relationship right?" I didn't think I would have to specify the join target so explicitly. But without the targets specified, I get an error:
r = session.query(Movement, FromLocation, ToLocation).\
    join(FromLocation).\
    join(ToLocation)
InvalidRequestError: Could not find a FROM clause to join from. Tried joining to <AliasedClass at 0x10cfa16d8; Location>, but got: Can't determine join between 'movement' and '%(4512717680 location)s'; tables have more than one foreign key constraint relationship between them. Please specify the 'onclause' of this join explicitly.
Yes, I see the two foreign keys, but how to map them correctly?

Option-1: To have names in the KeyedTuple, just add names to the aliases:
FromLocation = aliased(Location, name="From")
ToLocation = aliased(Location, name="To")
# ...
print(r.keys())
# >>>> ['Movement', 'From', 'To']
Option-2: Create a query that returns only Movement instance(s), but preloads both locations. Note also the alternative join syntax, which specifies the relationship instead of the key pairs.
r = (session.query(Movement)
     .join(FromLocation, Movement.from_location)
     .join(ToLocation, Movement.to_location)
     .options(contains_eager(Movement.from_location, alias=FromLocation))
     .options(contains_eager(Movement.to_location, alias=ToLocation))
     ).first()
print(r)
print(r.from_location)
print(r.to_location)
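For reference, the two aliased joins above amount to joining the location table twice under two different SQL aliases. The following is only a sketch of that underlying SQL, run through Python's built-in sqlite3 module rather than SQLAlchemy. The name column on location and the sample rows are assumptions added for illustration; they are not part of the models in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name

# Hypothetical schema: "name" is an illustrative column, not in the question.
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movement (
        id INTEGER PRIMARY KEY,
        from_id INTEGER REFERENCES location(id),
        to_id INTEGER REFERENCES location(id)
    );
    INSERT INTO location VALUES (1, 'warehouse'), (2, 'shop');
    INSERT INTO movement VALUES (1, 1, 2);
""")

# The same table is joined twice; the aliases f and t keep the two roles apart.
row = conn.execute("""
    SELECT m.id AS movement_id, f.name AS from_name, t.name AS to_name
    FROM movement AS m
    JOIN location AS f ON m.from_id = f.id
    JOIN location AS t ON m.to_id = t.id
""").fetchone()

print(row["from_name"], "->", row["to_name"])  # warehouse -> shop
```

Reading the columns by alias here plays the same role as naming the aliased classes in Option-1: positional access such as row[1] keeps working, but named access does not break when the select list changes.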

Related

SQLAlchemy query.filter returned no FROM clauses due to auto-correlation

I'm a beginner SQLAlchemy user frustrated with the extensive documentation.
I have a newsfeed that is updated when a new revision is made to some content object. content always has at least one revision. content is related to 1 or more topics through an association table. I'm given a set of topic.ids T, and would like to show the N most recent "approved" revisions belonging to a row in content that has at least one topic in T. ("approved" is just an enum attribute on revision)
Here are the models and the relevant attributes:
class Revision(Model):
    __tablename__ = 'revision'

    class Statuses(object):  # enum
        APPROVED = 'approved'
        PROPOSED = 'proposed'
        REJECTED = 'rejected'
        values = [APPROVED, PROPOSED, REJECTED]

    id = Column(Integer, primary_key=True)
    data = Column(JSONB, default=[], nullable=False)
    content_id = db.Column(db.Integer, db.ForeignKey('content.id'), nullable=False)

class Content(Model):
    __tablename__ = 'content'
    id = Column(Integer, primary_key=True)
    topic_edges = relationship(
        'TopicContentAssociation',
        primaryjoin='Content.id == TopicContentAssociation.content_id',
        backref='content',
        lazy='dynamic',
        cascade='all, delete-orphan'
    )
    revisions = relationship(
        'Revision',
        lazy='dynamic',
        backref='content',
        cascade='all, delete-orphan'
    )

class TopicContentAssociation(Model):
    __tablename__ = 'topic_content_association'
    topic_id = Column(Integer, ForeignKey('topic.id'), primary_key=True)
    content_id = Column(Integer, ForeignKey('content.id'), primary_key=True)

class Topic(Model):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
Here's what I've got so far:
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            Revision.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
and this error is happening:
Select statement returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.:
SELECT *
FROM topic, revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
The interesting part is that if I remove the and_ operator and the second expression within it (lines 3, 5, and 6), the error seems to go away.
BONUS: :) I would also like to show only one revision per row of content. If somebody hits save a bunch of times, I don't want the feed to be cluttered.
Again, I'm very new to SQLAlchemy (and actually, relational databases), so an answer targeted to a beginner would be much appreciated!
EDIT: adding .correlate(Revision) after the .where clause fixes things, but I'm still working to figure out exactly what is going on here.
This is a very late response, but I am replying in case someone finds this topic.
The second expression inside your exists() statement (Revision.status == Revision.Statuses.APPROVED) refers to the Revision table a second time in your query. The first reference comes from session.query(Revision).
If we "translate" your exists statement in PostgreSQL it would be:
EXISTS (SELECT 1
FROM Revision
WHERE topic.id IN (:id_1, :id_2, :id_3...)
AND revision.status = :status_1
)
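Every table this subquery mentions (topic and revision) already appears in the enclosing query, so SQLAlchemy auto-correlates all of them to the outer query and the subquery is left with no FROM clause of its own, which is the error you see. To reference the same table independently inside the subquery, you need an alias, which you can create with the aliased() function. Your code should work like this: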
from sqlalchemy.orm import aliased

aliasRev = aliased(Revision)
revisions = session.query(Revision).outerjoin(Content).outerjoin(Topic).filter(
    ~exists().where(
        and_(
            Topic.id.in_(T),
            aliasRev.status == Revision.Statuses.APPROVED
        ) )
).order_by(Revision.ts_created.desc()).limit(N)
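To see what an alias does inside a subquery at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module instead of SQLAlchemy. It is not the query from the question: the table is reduced to four columns, ts_created is assumed to be an integer, and the rows are made up. The inner query reads revision under its own alias (r2) while still referring to the outer row (r1). As a side effect it keeps only the newest approved revision per content, which is the behaviour asked for in the BONUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical version of the revision table with sample rows.
conn.executescript("""
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        content_id INTEGER,
        status TEXT,
        ts_created INTEGER
    );
    INSERT INTO revision VALUES
        (1, 10, 'approved', 100),
        (2, 10, 'approved', 200),
        (3, 20, 'approved', 150),
        (4, 20, 'proposed', 300);
""")

latest_approved = [row[0] for row in conn.execute("""
    SELECT r1.id
    FROM revision AS r1
    WHERE r1.status = 'approved'
      AND NOT EXISTS (
          SELECT 1
          FROM revision AS r2                  -- same table, separate alias
          WHERE r2.content_id = r1.content_id  -- correlated to the outer row
            AND r2.status = 'approved'
            AND r2.ts_created > r1.ts_created
      )
    ORDER BY r1.ts_created DESC
""")]

print(latest_approved)  # [2, 3]
```

The condition r2.content_id = r1.content_id is what ties the subquery to the outer row. Without such a link, an aliased subquery is evaluated independently of the row being filtered.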

SQLAlchemy: select all posts that has tags in [..] (many-to-many)

I have Users, Interests and Events.
User has (many-to-many) interests. Event has (many-to-many) interests. That's why I have two "intermediate" tables: user_to_interest and event_to_interest.
I want to select all events that have interests from the user's interests list (in other words, all events that have tags IN [10, 144, 432]).
In SQL I'd do it roughly like this:
SELECT DISTINCT event.name FROM event JOIN event_to_interest ON event.id = event_to_interest.event_id WHERE event_to_interest.interest_id IN (10, 144, 432)
How should I do that through SQLAlchemy? (I'm using Flask-SQLAlchemy if necessary)
Assuming you have a (simplified) model like below:
user_to_interest = Table('user_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('user.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)

event_to_interest = Table('event_to_interest', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('event_id', Integer, ForeignKey('event.id')),
    Column('interest_id', Integer, ForeignKey('interest.id'))
)

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Event(Base):
    __tablename__ = 'event'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Interest(Base):
    __tablename__ = 'interest'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    users = relationship(User, secondary=user_to_interest, backref="interests")
    events = relationship(Event, secondary=event_to_interest, backref="interests")
Version-1: you should be able to do simple query on list of interest_ids, which will generate basically the SQL statement you desire:
interest_ids = [10, 144, 432]
query = session.query(Event.name)
query = query.join(event_to_interest, event_to_interest.c.event_id == Event.id)
query = query.filter(event_to_interest.c.interest_id.in_(interest_ids))
However, if an event has two or more of the interests from the list, the query will return the same Event.name multiple times. You can work around this by using distinct: query = session.query(Event.name.distinct())
Version-2: Alternatively, you could do this using just relationships, which will generate different SQL using sub-query with EXISTS clause, but semantically it should be the same:
query = session.query(Event.name)
query = query.filter(Event.interests.any(Interest.id.in_(interest_ids)))
This version does not have a problem with duplicates.
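The difference between the two versions can be seen directly in SQL. The sketch below uses Python's built-in sqlite3 module with a cut-down schema and made-up rows (the event names and interest assignments are assumptions for illustration). It runs a plain JOIN, as in Version-1, and an EXISTS subquery, roughly what Version-2 produces, over the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: 'concert' matches two of the requested interests.
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_to_interest (
        id INTEGER PRIMARY KEY, event_id INTEGER, interest_id INTEGER);
    INSERT INTO event VALUES (1, 'concert'), (2, 'meetup'), (3, 'lecture');
    INSERT INTO event_to_interest (event_id, interest_id) VALUES
        (1, 10), (1, 144), (2, 432), (3, 7);
""")

interest_ids = [10, 144, 432]
placeholders = ", ".join("?" for _ in interest_ids)

# Version-1 style: one output row per matching (event, interest) pair.
joined = [r[0] for r in conn.execute(
    "SELECT e.name FROM event e "
    "JOIN event_to_interest ei ON ei.event_id = e.id "
    f"WHERE ei.interest_id IN ({placeholders})", interest_ids)]

# Version-2 style: one output row per event that has at least one match.
via_exists = [r[0] for r in conn.execute(
    "SELECT e.name FROM event e WHERE EXISTS ("
    "SELECT 1 FROM event_to_interest ei "
    f"WHERE ei.event_id = e.id AND ei.interest_id IN ({placeholders}))",
    interest_ids)]

print(sorted(joined))      # ['concert', 'concert', 'meetup']
print(sorted(via_exists))  # ['concert', 'meetup']
```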
However, I would go one step back, and assume that you do get interest_ids for particular user, and would create a query that works for a user_id (or User.id)
Final Version: using any twice:
def get_events_for_user(user_id):
    #query = session.query(Event.name)
    query = session.query(Event)  # note: I assume name is not enough
    query = query.filter(Event.interests.any(Interest.users.any(User.id == user_id)))
    return query.all()
One can argue that this creates a not-so-beautiful SQL statement, but this is exactly the beauty of using SQLAlchemy: it hides the implementation details.
Bonus: you might actually want to give higher priority to the events which have more overlapping interests. In this case the below could help:
query = session.query(Event, func.count('*').label("num_interests"))
query = query.join(Interest, Event.interests)
query = query.join(User, Interest.users)
query = query.filter(User.id == user_id)
query = query.group_by(Event)
# first order by overlapping interests, then also by event.date
query = query.order_by(func.count('*').label("num_interests").desc())
#query = query.order_by(Event.date)
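In plain SQL, the ranking idea above comes down to joining events to the user's interests, grouping by event, and ordering by the number of matched rows. The following is only a sketch using Python's built-in sqlite3 module. The association tables are simplified (no surrogate id column) and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: user 1 has interests 10 and 144.
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_to_interest (user_id INTEGER, interest_id INTEGER);
    CREATE TABLE event_to_interest (event_id INTEGER, interest_id INTEGER);
    INSERT INTO event VALUES (1, 'concert'), (2, 'meetup'), (3, 'lecture');
    INSERT INTO user_to_interest VALUES (1, 10), (1, 144);
    INSERT INTO event_to_interest VALUES (1, 10), (1, 144), (2, 144), (3, 7);
""")

user_id = 1
ranked = conn.execute("""
    SELECT e.name, COUNT(*) AS num_interests
    FROM event AS e
    JOIN event_to_interest AS ei ON ei.event_id = e.id
    JOIN user_to_interest AS ui ON ui.interest_id = ei.interest_id
    WHERE ui.user_id = ?
    GROUP BY e.id, e.name
    ORDER BY num_interests DESC
""", (user_id,)).fetchall()

print(ranked)  # [('concert', 2), ('meetup', 1)]
```

Events that share no interest with the user ('lecture' here) drop out because of the inner joins, so no extra filter is needed.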

Django - Making a SQL query on a many to many relationship with PostgreSQL Inner Join

I am looking for a particular raw SQL query using an INNER JOIN.
I have those models:
class EzMap(models.Model):
    layers = models.ManyToManyField(Shapefile, verbose_name='Layers to display', null=True, blank=True)

class Shapefile(models.Model):
    filename = models.CharField(max_length=255)

class Feature(models.Model):
    shapefile = models.ForeignKey(Shapefile)
I would like to write a SQL query, valid in PostgreSQL, along these lines:
select id from "table_feature" where shapefile_ezmap_id = 1;
but I don't know how to use an INNER JOIN to select the features whose shapefile is related to a particular EzMap object.
Something like this:
try:
    id = Feature.objects.get(shapefile__ezmap__id=1).id
except Feature.DoesNotExist:
    id = 0  # or some other action when no result is found
You will need to use filter (instead of get) if you want to deal with multiple Feature results.
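For the raw SQL side of the question, the lookup shapefile__ezmap__id=1 corresponds to an inner join through the many-to-many table. The sketch below runs that join with Python's built-in sqlite3 module. The table names are assumptions: they follow Django's default naming for an app labelled app (app_feature, app_ezmap_layers), which may not match the real schema, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables named after Django's defaults for an app called "app".
conn.executescript("""
    CREATE TABLE app_shapefile (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE app_ezmap (id INTEGER PRIMARY KEY);
    CREATE TABLE app_ezmap_layers (
        id INTEGER PRIMARY KEY, ezmap_id INTEGER, shapefile_id INTEGER);
    CREATE TABLE app_feature (id INTEGER PRIMARY KEY, shapefile_id INTEGER);
    INSERT INTO app_shapefile VALUES (1, 'roads.shp'), (2, 'rivers.shp');
    INSERT INTO app_ezmap VALUES (1), (2);
    INSERT INTO app_ezmap_layers VALUES (1, 1, 1), (2, 2, 2);
    INSERT INTO app_feature VALUES (100, 1), (101, 1), (200, 2);
""")

ezmap_id = 1
# Features whose shapefile is one of the layers of the given EzMap.
feature_ids = [r[0] for r in conn.execute("""
    SELECT f.id
    FROM app_feature AS f
    INNER JOIN app_ezmap_layers AS l ON l.shapefile_id = f.shapefile_id
    WHERE l.ezmap_id = ?
    ORDER BY f.id
""", (ezmap_id,))]

print(feature_ids)  # [100, 101]
```

The shapefile table itself does not need to be joined, because both the feature table and the many-to-many table carry shapefile_id.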

django self join query using aliases

I am trying to use a queryset to perform the following query without raw SQL. Any idea how I can do that?
select * from category_main a, category_list b, category_main c where b.main_id=c.id and a.id=c.parent_id
UPDATED
below are my models
class Main(models.Model):
    slug = models.SlugField()
    is_active = models.BooleanField(default=True)
    site = models.ForeignKey(Site)
    parent = models.ForeignKey('self', blank=True, null=True, limit_choices_to={'parent' : None})

    class Meta:
        unique_together = (("slug", "parent"))

    def __unicode__(self):
        return self.slug

class List(models.Model):
    main = models.ForeignKey(Main)
    slug = models.SlugField(unique=True)
    is_active = models.BooleanField(default=True)
    parent = models.ForeignKey('self', blank=True, null=True)

    def __unicode__(self):
        return self.slug
UPDATE
Hi, I just managed to find a query that does this for me. I used the advice below to join main with main's parent, and from there I joined list with main using the following:
Main.objects.select_related('main', 'parent').filter(list__is_active=True, maini18n__language='en', list__listi18n__language='en').query.__str__()
'SELECT `category_main`.`id`, `category_main`.`slug`, `category_main`.`is_active`, `category_main`.`site_id`, `category_main`.`parent_id`, T5.`id`, T5.`slug`, T5.`is_active`, T5.`site_id`, T5.`parent_id` FROM `category_main` INNER JOIN `category_maini18n` ON (`category_main`.`id` = `category_maini18n`.`main_id`) INNER JOIN `category_list` ON (`category_main`.`id` = `category_list`.`main_id`) INNER JOIN `category_listi18n` ON (`category_list`.`id` = `category_listi18n`.`list_id`) LEFT OUTER JOIN `category_main` T5 ON (`category_main`.`parent_id` = T5.`id`) WHERE (`category_maini18n`.`language` = en AND `category_list`.`is_active` = True AND `category_listi18n`.`language` = en )'
The returned query joins everything I need, except that the category_list columns are not added to the SELECT statement. Is there a way to force it to select category_list.* as well?
This does basically what you want:
lists = List.objects.select_related('main', 'parent')
Note you have to explicitly state the relationships to follow in select_related here, because your parent relationship has null=True which isn't followed by default.
This will give you a set of List objects, but pre-fetch the related Main and List objects which you can reference as normal without hitting the db again.
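For comparison, the self join from the question can be run as plain SQL. This is a sketch using Python's built-in sqlite3 module, with the tables reduced to the columns the join needs and made-up rows. The aliases a and c both refer to category_main, with a being the parent of c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical versions of the two tables with sample rows.
conn.executescript("""
    CREATE TABLE category_main (
        id INTEGER PRIMARY KEY, slug TEXT, parent_id INTEGER);
    CREATE TABLE category_list (
        id INTEGER PRIMARY KEY, slug TEXT, main_id INTEGER);
    INSERT INTO category_main VALUES (1, 'root', NULL), (2, 'child', 1);
    INSERT INTO category_list VALUES (5, 'item', 2);
""")

# a = parent main, c = main the list belongs to, b = the list itself.
rows = conn.execute("""
    SELECT a.slug, c.slug, b.slug
    FROM category_main a, category_list b, category_main c
    WHERE b.main_id = c.id AND a.id = c.parent_id
""").fetchall()

print(rows)  # [('root', 'child', 'item')]
```

Because the join on a.id = c.parent_id is an inner join, lists whose main has no parent are left out of this result.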

Django LEFT OUTER JOIN on TWO columns where one isn't a foreign key

I have two models like so:
class ObjectLock(models.Model):
    partner = models.ForeignKey(Partner)
    object_id = models.CharField(max_length=100)

    class Meta:
        unique_together = (('partner', 'object_id'),)

class ObjectImportQueue(models.Model):
    partner = models.ForeignKey(Partner)
    object_id = models.CharField(max_length=100)
    ... # other fields
    created = models.DateTimeField(auto_now_add = True)
    modified = models.DateTimeField(auto_now = True, db_index=True)

    class Meta:
        ordering = ('modified', 'created')
There is nothing notable about the third model mentioned above (Partner).
I'd like to get something like:
SELECT * FROM ObjectImportQueue q LEFT OUTER JOIN ObjectLock l ON
q.partner_id=l.partner_id AND q.object_id=l.object_id WHERE l.object_id
IS NULL and l.partner_id IS NULL;
I came across this page that tells how to do custom joins, and I tried passing in a tuple of the column names to join in place of the column name to join, and that didn't work. The Partner table shouldn't need to be included in the resulting sql query but I will accept an answer that does include it as long as it effectively does what I'm trying to do with one query.
If you're using Django 1.2+ and know the SQL you want, you could always fall back to a Raw Query.
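As a rough illustration of the SQL such a raw query would carry, the sketch below runs the two-column LEFT OUTER JOIN from the question with Python's built-in sqlite3 module. The table names are lowercased stand-ins rather than the real Django table names, only the join columns are included, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced tables: one lock exists for (partner 1, object 'a').
conn.executescript("""
    CREATE TABLE objectimportqueue (
        id INTEGER PRIMARY KEY, partner_id INTEGER, object_id TEXT);
    CREATE TABLE objectlock (
        id INTEGER PRIMARY KEY, partner_id INTEGER, object_id TEXT,
        UNIQUE (partner_id, object_id));
    INSERT INTO objectimportqueue VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'a');
    INSERT INTO objectlock VALUES (1, 1, 'a');
""")

# Queue rows with no lock row matching on BOTH partner_id and object_id.
unlocked = [r[0] for r in conn.execute("""
    SELECT q.id
    FROM objectimportqueue AS q
    LEFT OUTER JOIN objectlock AS l
      ON q.partner_id = l.partner_id AND q.object_id = l.object_id
    WHERE l.object_id IS NULL AND l.partner_id IS NULL
    ORDER BY q.id
""")]

print(unlocked)  # [2, 3]
```

Row 3 stays in the result even though its object_id matches a lock, because the partner differs. That is why the join has to cover both columns. In Django, a statement like this could be passed to the manager's raw() method, with the table names adjusted to the real ones.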
I ran into a similar problem, but eventually realized I was asking the wrong question.
In the Django ORM, the join conditions in the generated SQL are based on the relationship fields defined on the models.
These are many-to-one relationships (ForeignKey), many-to-many relationships (ManyToManyField), and one-to-one relationships (OneToOneField).
In your situation, the ObjectLock and ObjectImportQueue models share the same fields, partner and object_id, so you should use a one-to-one relationship.
you can change your Model like this:
class ObjectImportQueue(models.Model):
    partner = models.ForeignKey(Partner)
    object_id = models.CharField(max_length=100)
    created = models.DateTimeField(auto_now_add = True)
    modified = models.DateTimeField(auto_now = True, db_index=True)

    def __unicode__(self):
        return u"%s:%s" % (self.partner, self.object_id)

    class Meta:
        ordering = ('modified', 'created')

class ObjectLock(models.Model):
    # related_name='lock' makes the reverse lookup lock__isnull available
    # on ObjectImportQueue; the default reverse name would be 'objectlock'
    lock = models.OneToOneField(ObjectImportQueue, null=True, related_name='lock')

    class Meta:
        unique_together = (('lock',),)
The order of the models is important: the model passed to OneToOneField must be defined first.
>>> p1 = Partner.objects.get(pk=1)
>>> p2 = Partner.objects.get(pk=2)
>>> Q1 = ObjectImportQueue.objects.create(partner=p1,object_id='id_Q1')
>>> Q2 = ObjectImportQueue.objects.create(partner=p2,object_id='id_Q2')
>>> ObjectImportQueue.objects.filter(lock__isnull=True)
[<ObjectImportQueue: Partner object:id_Q1>, <ObjectImportQueue: Partner object:id_Q2>]
>>> L1 = ObjectLock.objects.create(lock=Q1)
>>> ObjectImportQueue.objects.filter(lock__isnull=True)
[<ObjectImportQueue: Partner object:id_Q2>]
ObjectLock.objects.create(lock=...) locks an object.
ObjectImportQueue.objects.filter(lock__isnull=True) selects the objects that are not locked.
If you use the appropriate relationships, generating the ORM query is easy. In Django, defining the relationships when you build the models is better than using query statements to relate the tables afterwards.
I just found a solution to this problem.
You have to create a view that does the join for you
CREATE VIEW ImportQueueLock AS (
    SELECT q.id AS q, l.id AS l
    FROM ObjectImportQueue q
    LEFT OUTER JOIN ObjectLock l
    ON q.partner_id=l.partner_id AND q.object_id=l.object_id
)
Then make a django model for that view
class ImportQueueLock(models.Model):
    queue = models.ForeignKey(ObjectImportQueue, db_column='q')
    lock = models.ForeignKey(ObjectLock, db_column='l')
Then make a ManyToMany on your Django model from ObjectLock to ObjectImportQueue through ImportQueueLock
class ObjectLock(models.Model):
    partner = models.ForeignKey(Partner)
    object_id = models.CharField(max_length=100)
    queue = models.ManyToManyField(ObjectImportQueue, through = ImportQueueLock)
and you will be able to do
ObjectLock.objects.filter(importqueuelock__objectimportqueue__ .....)