SQL join both ways to one result

I have two tables, "TestItem" and "Connector", where Connector is used to relate two items in "TestItem".
I have two questions, in order of priority. But first, feel free to suggest alternative approaches; I'm open to suggestions that completely rethink what I'm trying to achieve here.
Question 1) How can I get relations in both directions returned in the same result?
Question 2) What is the most efficient way to filter for specific items?
Q1)
Two tables
Table: "TestItem"
ID, ITEM
1, "John Doe"
2, "Peggy Sue"
3, "Papa Sue"
Table: "Connector"
MOTHER, CHILD
1,2
The connector table will be used for several purposes (see below), but this is a distilled scenario for a symmetric ("equal type") connection, such as marriage. If "John Doe" is married to "Peggy Sue", that information should also be sufficient to return "Peggy Sue" as married to "John Doe".
I can do this in two queries, but for efficiency (especially regarding question 2) I'd like it done in one query, so that an implementation does not depend on which direction the connection is stored in.
What is the most efficient way to do this?
The two queries below illustrate how the data can be fetched, and how each of them misses the connection in one direction.
-- Connector through the "mother" part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT MOTHER, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.CHILD = TestItem.ID
) AS SUB ON TestItem.ID = SUB.MOTHER
/* WHERE ITEM = "John Doe" returns "Peggy Sue" => Correct
WHERE ITEM = "Peggy Sue" returns nothing => Wrong
*/
-- Connector through the "child" part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT CHILD, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.MOTHER = TestItem.ID
) AS SUB ON TestItem.ID = SUB.CHILD
/* WHERE ITEM = "John Doe" returns nothing => Wrong
WHERE ITEM = "Peggy Sue" returns "John Doe" => Correct
*/
Q2) Returning both directions in one result may increase the amount of data involved, and hence reduce performance. If my focus is Peggy Sue, I assume that filtering down to only the relevant data as early as possible will improve performance. Is there a neat way of doing this at the top level, or will every subquery require its own WHERE?
PS: Some more information on the bigger picture.
I'm planning to use the connector table for several purposes: symmetric connections like the one mentioned above (colleagues, family, friends, etc.), but also hierarchical connection types like mother/child, leader/employee, country/city.
Thus, solutions that eliminate the mother/child-type connection may not suit my bigger purpose.
Basically, I'm asking how to handle symmetric connections without losing the ability to use the same architecture and data for hierarchical connections.
Peggy Sue may, in the same dataset, be defined as the daughter of Papa Sue through the relation:
Mother, Child, Mother_type, Child_type
3, 2, Father, Daughter
1, 2, Married to, Married to
(But, as mentioned, this is beside the point of what I'm asking here.)

UNION ALL might be what you are looking for:
select mother.id as connectedToId,
mother.item as connectedToItem,
'Mother' as role
from TestItem ti
join Connector c on c.child = ti.id
join TestItem mother on c.mother = mother.id
where ti.item = 'John Doe'
union all
select child.id as connectedToId,
child.item as connectedToItem,
'Child' as role
from TestItem ti
join Connector c on c.mother = ti.id
join TestItem child on c.child = child.id
where ti.item = 'John Doe'
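To check the UNION ALL answer against the question's sample rows, here is a small runnable sketch using Python's built-in sqlite3 (SQLite is only a stand-in for whatever engine you use; the function names are hypothetical). It also shows one possible answer to Q2: normalising Connector into both directions inside a derived table so the item filter is written once, at the top level. Whether the engine pushes that outer filter down into both halves of the UNION ALL depends on its optimizer, so check the query plan on your own database.

```python
import sqlite3


def connect_sample() -> sqlite3.Connection:
    """In-memory copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TestItem (ID INTEGER PRIMARY KEY, ITEM TEXT NOT NULL);
        CREATE TABLE Connector (MOTHER INTEGER NOT NULL, CHILD INTEGER NOT NULL);
        INSERT INTO TestItem VALUES (1, 'John Doe'), (2, 'Peggy Sue'), (3, 'Papa Sue');
        INSERT INTO Connector VALUES (1, 2);
    """)
    return conn


# Same shape as the UNION ALL answer; the item name is a named parameter
# so both branches reuse one bound value.
BOTH_WAYS = """
SELECT mother.ID AS connectedToId, mother.ITEM AS connectedToItem, 'Mother' AS role
FROM TestItem ti
JOIN Connector c ON c.CHILD = ti.ID
JOIN TestItem mother ON c.MOTHER = mother.ID
WHERE ti.ITEM = :item
UNION ALL
SELECT child.ID, child.ITEM, 'Child'
FROM TestItem ti
JOIN Connector c ON c.MOTHER = ti.ID
JOIN TestItem child ON c.CHILD = child.ID
WHERE ti.ITEM = :item
"""

# Alternative: emit every connection in both directions, then filter once.
SYMMETRIC = """
SELECT other.ID, other.ITEM
FROM TestItem ti
JOIN (
    SELECT MOTHER AS a, CHILD AS b FROM Connector
    UNION ALL
    SELECT CHILD, MOTHER FROM Connector
) AS pair ON pair.a = ti.ID
JOIN TestItem other ON other.ID = pair.b
WHERE ti.ITEM = :item
ORDER BY other.ID
"""


def connections(conn, item):
    return conn.execute(BOTH_WAYS, {"item": item}).fetchall()


def symmetric_connections(conn, item):
    return conn.execute(SYMMETRIC, {"item": item}).fetchall()


conn = connect_sample()
print(connections(conn, "John Doe"))
print(connections(conn, "Peggy Sue"))
print(symmetric_connections(conn, "Peggy Sue"))
```

The first form keeps a role label, which matters for hierarchical connections; the second drops direction entirely, which only makes sense for symmetric types such as marriage.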

Related

Cypher - Add multiple connections

I have 2 node labels:
Students and Subjects.
I want to be able to add multiple student names to multiple subjects at the same time using a Cypher query.
So far I have done it by iterating through the lists of student names and subjects and executing the query for each pair, but is there a way to do the same within the query itself?
This is the query I use for adding 1 student to 1 subject:
MATCH
(s:Student)-[:STUDENT_BELONGS_TO]->(c:Classroom),
(u:Subjects)-[:SUBJECTS_TAUGHT_IN]->(c:Classroom)
WHERE
s.id = $studentId
AND c.id = $classroomId
AND u.name = $subjectNames
AND NOT (s)-[:IN_SUBJECT]->(u)
CREATE (s)-[:IN_SUBJECT]->(u)
So I want to be able to pass multiple subjectNames and studentIds at once to create these connections. Any guidance on creating multiple relationships in Cypher?
I think what you are looking for is UNWIND. If you pass an array as a parameter to your query:
studentList :
[
{studentId: "sid1", classroomId: "cid1", subjectNames: ['s1','s2']},
{studentId: "sid2", classroomId: "cid2", subjectNames: ['s1','s3']},
...
]
You can UNWIND that parameter in the beginning of your query:
UNWIND $studentList as student
MATCH
(s:Student)-[:STUDENT_BELONGS_TO]->(c:Classroom),
(u:Subjects)-[:SUBJECTS_TAUGHT_IN]->(c:Classroom)
WHERE
s.id = student.studentId
AND c.id = student.classroomId
AND u.name IN student.subjectNames
AND NOT (s)-[:IN_SUBJECT]->(u)
CREATE (s)-[:IN_SUBJECT]->(u)
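To see what the UNWIND query does row by row, here is a rough in-memory Python analogy. It is not Neo4j driver code: the graph is flattened into hypothetical Python sets and dicts (`memberships`, `taught_in`, `existing`), and it assumes each (student, subject) pair appears at most once in the input list.

```python
from typing import Dict, List, Set, Tuple


def link_students(
    student_list: List[dict],
    memberships: Set[Tuple[str, str]],  # (student id, classroom id): STUDENT_BELONGS_TO
    taught_in: Dict[str, List[str]],    # classroom id -> subject names: SUBJECTS_TAUGHT_IN
    existing: Set[Tuple[str, str]],     # (student id, subject name): existing IN_SUBJECT
) -> List[Tuple[str, str]]:
    """Return the (student, subject) links the UNWIND query would create."""
    to_create = []
    for student in student_list:                    # UNWIND $studentList AS student
        sid, cid = student["studentId"], student["classroomId"]
        if (sid, cid) not in memberships:           # s.id and c.id must match a BELONGS_TO edge
            continue
        for name in taught_in.get(cid, []):         # every subject u taught in classroom c
            # u.name IN student.subjectNames AND NOT (s)-[:IN_SUBJECT]->(u)
            if name in student["subjectNames"] and (sid, name) not in existing:
                to_create.append((sid, name))       # CREATE (s)-[:IN_SUBJECT]->(u)
    return to_create


student_list = [
    {"studentId": "sid1", "classroomId": "cid1", "subjectNames": ["s1", "s2"]},
    {"studentId": "sid2", "classroomId": "cid2", "subjectNames": ["s1", "s3"]},
]
memberships = {("sid1", "cid1"), ("sid2", "cid2")}
taught_in = {"cid1": ["s1", "s2", "s4"], "cid2": ["s1"]}
existing = {("sid1", "s1")}
print(link_students(student_list, memberships, taught_in, existing))
```

One parameter list replaces the client-side loop: each list entry becomes one row, and the IN check fans it out over that student's subject names.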
You probably need to use UNWIND.
I haven't tested the code, but something like this might work:
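// Untested sketch: without a WHERE clause this links every student to every
// subject taught in the same classroom, and CREATE does not skip existing links.
MATCH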
(s:Student)-[:STUDENT_BELONGS_TO]->(c:Classroom),
(u:Subjects)-[:SUBJECTS_TAUGHT_IN]->(c:Classroom)
WITH
s AS student, COLLECT(u) AS subjects
UNWIND subjects AS subject
CREATE (student)-[:IN_SUBJECT]->(subject)

How to choose how to fetch relationships from a relational database?

I know of at least 4 ways of fetching relationships from a relational database.
I've tried to keep the examples language-agnostic.
I'd like to know of some systematic way of choosing one over another, other than manually testing each query.
Method A: Has the problem of having to loop twice to build the result. It also can't process rows one at a time without building arrays first.
for contact in query("SELECT id, name FROM contact") {
contacts[contact.id]["name"] = contact.name
push(ids, contact.id)
}
for email in query("SELECT contact_id, address FROM email WHERE contact_id IN ?", ids) {
push(contacts[email.contact_id]["emails"], email.address)
}
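Method A can be made concrete with Python's built-in sqlite3. This is a sketch with hypothetical sample data (one contact has no email); note that sqlite3 cannot bind a whole list to a single `?`, so one placeholder is generated per id.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical sample data: Cy has no email address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE email (contact_id INTEGER NOT NULL REFERENCES contact(id),
                            address TEXT NOT NULL);
        INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO email VALUES (1, 'ann@example.com'), (1, 'ann@work.example'),
                                 (2, 'bob@example.com');
    """)
    return conn


def method_a(conn: sqlite3.Connection) -> dict:
    # Query 1: parent rows, keyed by id.
    contacts = {}
    for cid, name in conn.execute("SELECT id, name FROM contact ORDER BY id"):
        contacts[cid] = {"name": name, "emails": []}
    if not contacts:
        return contacts
    # Query 2: all child rows in one round trip, one "?" per collected id.
    placeholders = ",".join("?" * len(contacts))
    sql = ("SELECT contact_id, address FROM email "
           f"WHERE contact_id IN ({placeholders}) ORDER BY rowid")
    for contact_id, address in conn.execute(sql, list(contacts)):
        contacts[contact_id]["emails"].append(address)
    return contacts


print(method_a(make_db()))
```

The query count stays at two regardless of the number of contacts, but very large id lists may exceed the engine's limit on bound parameters and would then need to be sent in chunks.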
Method B: Has the problem of a Cartesian product (rows multiplying) once more than one child table is joined
for contact in query("SELECT c.id, c.name, e.address FROM contact c JOIN email e ON e.contact_id = c.id") {
contacts[contact.id]["name"] = contact.name
push(contacts[contact.id]["emails"], contact.address)
}
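The Cartesian problem of Method B shows up as soon as a second child table is joined. In this sqlite3 sketch a hypothetical `phone` table is added next to `email`: one contact with 3 emails and 2 phones comes back as 3 × 2 rows, and the client has to de-duplicate. With inner joins, a contact lacking either an email or a phone would also vanish entirely; LEFT JOIN keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE email (contact_id INTEGER NOT NULL, address TEXT NOT NULL);
    CREATE TABLE phone (contact_id INTEGER NOT NULL, number TEXT NOT NULL);
    INSERT INTO contact VALUES (1, 'Ann');
    INSERT INTO email VALUES (1, 'a1@example.com'), (1, 'a2@example.com'), (1, 'a3@example.com');
    INSERT INTO phone VALUES (1, '555-0001'), (1, '555-0002');
""")

# Joining two independent child tables multiplies their rows per contact.
rows = conn.execute("""
    SELECT c.id, c.name, e.address, p.number
    FROM contact c
    JOIN email e ON e.contact_id = c.id
    JOIN phone p ON p.contact_id = c.id
""").fetchall()
print(len(rows), "rows for 3 emails and 2 phones")

# Rebuilding still works, but each email repeats once per phone (and vice
# versa), so duplicates have to be filtered out on the client.
contacts = {}
for cid, name, address, number in rows:
    entry = contacts.setdefault(cid, {"name": name, "emails": [], "phones": []})
    if address not in entry["emails"]:
        entry["emails"].append(address)
    if number not in entry["phones"]:
        entry["phones"].append(number)
```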
Method C: Has the problem that GROUP_CONCAT is limited to a certain number of bytes, so the result might be cut off
for contact in query("SELECT c.id, c.name, GROUP_CONCAT(e.address) AS addresses FROM contact c JOIN email e ON e.contact_id = c.id GROUP BY c.id") {
contacts[contact.id]["name"] = contact.name
contacts[contact.id]["emails"] = split(",", contact.addresses)
}
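For Method C, the truncation limit is engine-specific: in MySQL it is the `group_concat_max_len` system variable (1024 bytes by default), which can be raised per session with `SET SESSION group_concat_max_len = ...`. SQLite, used below only so the sketch runs, has no such 1024-byte default. The sketch also addresses two smaller issues in the pseudocode: splitting on "," breaks if a value contains a comma, so an ASCII control character is assumed never to appear in addresses, and LEFT JOIN keeps contacts that have no email at all.

```python
import sqlite3

SEP = chr(31)  # ASCII "unit separator"; assumed never to occur inside an address

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE email (contact_id INTEGER NOT NULL, address TEXT NOT NULL);
    INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO email VALUES (1, 'ann@example.com'), (1, 'ann@work.example'),
                             (2, 'bob@example.com');
""")

rows = conn.execute("""
    SELECT c.id, c.name, group_concat(e.address, char(31)) AS addresses
    FROM contact c
    LEFT JOIN email e ON e.contact_id = c.id   -- keeps contacts with no email
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

contacts = {}
for cid, name, addresses in rows:
    # group_concat over only NULL values yields NULL (None in Python).
    contacts[cid] = {"name": name, "emails": addresses.split(SEP) if addresses else []}
print(contacts)
```

The order of values inside each concatenated string is not guaranteed here, so sort on the client if order matters.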
Method D: Has the N+1 query problem: it runs one extra query for each contact
contactStatement = prepare("SELECT id, name FROM contact")
emailStatement = prepare("SELECT address FROM email WHERE contact_id = ?")
for contact in contactStatement.query() {
contacts[contact.id]["name"] = contact.name
// this uses a prepared query
for email in emailStatement.query(contact.id) {
push(contacts[contact.id]["emails"], email.address)
}
}
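The N+1 pattern of Method D can be observed directly with sqlite3's `set_trace_callback`, which is called with the text of each statement SQLite executes. In this sketch (hypothetical data again) Python's sqlite3 statement cache reuses the prepared email query, yet it still runs once per contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE email (contact_id INTEGER NOT NULL, address TEXT NOT NULL);
    INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO email VALUES (1, 'ann@example.com'), (1, 'ann@work.example'),
                             (2, 'bob@example.com');
""")

executed = []
conn.set_trace_callback(executed.append)  # record every statement from here on

contacts = {}
# fetchall() so the outer result is fully read before the inner queries run.
for cid, name in conn.execute("SELECT id, name FROM contact ORDER BY id").fetchall():
    contacts[cid] = {"name": name, "emails": []}
    # Identical SQL text each time, so the cached prepared statement is reused.
    for (address,) in conn.execute(
        "SELECT address FROM email WHERE contact_id = ? ORDER BY rowid", (cid,)
    ):
        contacts[cid]["emails"].append(address)

conn.set_trace_callback(None)
email_queries = [sql for sql in executed if "FROM email" in sql]
print(f"{len(executed)} statements, {len(email_queries)} of them for emails")
```

Preparing once saves parse time, but each contact still costs a round trip, which is what makes this pattern expensive over a network connection.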
Or is there perhaps a way that is ideal in every case?
I have many years of experience using various ORMs and I'd like to avoid using one.
Method A is what most ORMs default to.
Method D seems to be the solution that SQL designers expect us to use.
I'm asking for help finding a logical way to pick one method over another for a specific SQL query.

Return results from more than one database table in Django

Suppose I have 3 hypothetical models:
class State(models.Model):
    name = models.CharField(max_length=20)
class Company(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
class Person(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
I want to be able to return results in a Django app, where the results, if using SQL directly, would be based on a query such as this:
SELECT a.name as 'personName',b.name as 'companyName', b.state as 'State'
FROM Person a, Company b
WHERE a.state=b.state
I have tried using the select_related() method as suggested here, but I don't think it is quite what I am after, since I am trying to join two tables that share a common foreign key but have no key relationship between themselves.
Any suggestions?
Since there can be multiple Companys in the same state as a Person, it is not a good idea to do the JOIN at the database level: the database would (likely) return the same Company multiple times, making the output quite large.
We can prefetch the related companies with:
qs = Person.objects.select_related('state').prefetch_related('state__company_set')
Then we can query the Companys in the same state with:
for person in qs:
    print(person.state.company_set.all())
You can use a Prefetch object [Django-doc] to prefetch the list of related companies into an attribute, for example:
from django.db.models import Prefetch
qs = Person.objects.prefetch_related(
    Prefetch('state__company_set', Company.objects.all(), to_attr='same_state_companies')
)
Since the lookup goes through state, the list is stored on each Person's State object, so you can print the companies with:
for person in qs:
    print(person.state.same_state_companies)
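Conceptually, prefetching fetches the related rows with one extra query and matches them up in Python, instead of multiplying rows in SQL. The sqlite3 sketch below mimics that idea with simplified, hypothetical table names (`person`, `company`, `state`) and sample rows; it illustrates the technique, not Django's actual implementation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL, state_id INTEGER NOT NULL);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, state_id INTEGER NOT NULL);
    INSERT INTO state VALUES (1, 'Ohio'), (2, 'Utah');
    INSERT INTO company VALUES (1, 'Acme', 1), (2, 'Globex', 1), (3, 'Initech', 2);
    INSERT INTO person VALUES (1, 'Alice', 1), (2, 'Bob', 1), (3, 'Carol', 2);
""")

# Query 1: the people (in Django, select_related('state') folds State into this query).
people = conn.execute("SELECT id, name, state_id FROM person ORDER BY id").fetchall()

# Query 2: every company in those states, each company row transferred once.
state_ids = sorted({state_id for _, _, state_id in people})
placeholders = ",".join("?" * len(state_ids))
companies_by_state = defaultdict(list)
for name, state_id in conn.execute(
    f"SELECT name, state_id FROM company WHERE state_id IN ({placeholders}) ORDER BY id",
    state_ids,
):
    companies_by_state[state_id].append(name)

# Stitch in Python: people sharing a state share the same list.
same_state = {name: companies_by_state[state_id] for _, name, state_id in people}
print(same_state)
```

A plain JOIN would send each company once per person in its state; here each company crosses the wire once, no matter how many people share the state.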

Selecting related model: Left join, prefetch_related or select_related?

Considering I have the following relationships:
class House(Model):
    name = ...
class User(Model):
    """The standard auth model"""
    pass
class Alert(Model):
    user = ForeignKey(User)
    house = ForeignKey(House)
    somevalue = IntegerField()
    class Meta:
        unique_together = (('user', 'house'),)
In one query, I would like to get the list of houses, and whether the current user has an alert for each of them.
In SQL I would do it like this:
SELECT *
FROM house h
LEFT JOIN alert a
ON h.id = a.house_id
WHERE a.user_id = ?
OR a.user_id IS NULL
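One caveat worth checking with this SQL: because `a.user_id = ?` is tested in the WHERE clause, a house whose only alerts belong to *other* users is dropped, since its joined rows carry a non-NULL, non-matching user_id. Putting the user condition into the ON clause keeps every house. A small sqlite3 sketch with made-up rows shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE house (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE alert (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                        house_id INTEGER NOT NULL, somevalue INTEGER,
                        UNIQUE (user_id, house_id));
    INSERT INTO house VALUES (1, 'Elm'), (2, 'Oak'), (3, 'Pine');
    -- user 7 watches Elm, user 8 watches Oak, nobody watches Pine
    INSERT INTO alert VALUES (1, 7, 1, 10), (2, 8, 2, 20);
""")

# Filter in WHERE: applied after the join, so Oak (only user 8's alert) is lost.
WHERE_FILTER = """
    SELECT h.name, a.somevalue FROM house h
    LEFT JOIN alert a ON h.id = a.house_id
    WHERE a.user_id = ? OR a.user_id IS NULL
    ORDER BY h.name
"""
# Filter in ON: only the current user's alert is joined; every house survives.
ON_FILTER = """
    SELECT h.name, a.somevalue FROM house h
    LEFT JOIN alert a ON h.id = a.house_id AND a.user_id = ?
    ORDER BY h.name
"""
print(conn.execute(WHERE_FILTER, (7,)).fetchall())
print(conn.execute(ON_FILTER, (7,)).fetchall())
```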
And I've found that I could use prefetch_related to achieve something like this:
p = Prefetch('alert_set', queryset=Alert.objects.filter(user=self.request.user), to_attr='user_alert')
houses = House.objects.order_by('name').prefetch_related(p)
The above example works, but house.user_alert is a list, not an Alert object. I only have one alert per user per house, so what is the best way to get this information?
select_related didn't seem to work. And yes, I know I can manage this with multiple queries, but I'd really like to have it done in one, the 'Django way'.
Thanks in advance!
The solution is clearer if you start with the multiple query approach, and then try to optimise it. To get the user_alerts for every house, you could do the following:
houses = House.objects.order_by('name')
for house in houses:
    user_alerts = house.alert_set.filter(user=self.request.user)
The user_alerts queryset will cause an extra query for every house in the queryset. You can avoid this with prefetch_related.
alerts_queryset = Alert.objects.filter(user=self.request.user)
houses = House.objects.order_by('name').prefetch_related(
    Prefetch('alert_set', queryset=alerts_queryset, to_attr='user_alerts'),
)
for house in houses:
    user_alerts = house.user_alerts
This will take two queries: one for the houses and one for the alerts. I don't think you need select_related here to fetch the user, since you already have access to the user via self.request.user. If you want, you could add select_related to the alerts_queryset:
alerts_queryset = Alert.objects.filter(user=self.request.user).select_related('user')
In your case, user_alerts will be an empty list or a list with one item, because of your unique_together constraint. If you can't work with the list, you could loop through the queryset once and set house.user_alert:
for house in houses:
    house.user_alert = house.user_alerts[0] if house.user_alerts else None

nHibernate collections and alias criteria

I have a simple test object model in which there are schools, and a school has a collection of students.
I would like to retrieve a school and all its students who are above a certain age.
I run the following query, which obtains a given school and the children who are above a certain age:
public School GetSchoolAndStudentsWithDOBAbove(int schoolid, DateTime dob)
{
var school = this.Session.CreateCriteria(typeof(School))
.CreateAlias("Students", "students")
.Add(Expression.And(Expression.Eq("SchoolId", schoolid), Expression.Gt("students.DOB", dob)))
.UniqueResult<School>();
return school;
}
This all works fine and I can see the query going to the database and returning the expected number of rows.
However, when I do either of the following, it runs another query and gives me the total number of students in the given school, regardless of the preceding query:
foreach (Student st in s.Students)
{
Console.WriteLine(st.FirstName);
}
Assert.AreEqual(s.Students.Count, 3);
Can anyone explain why?
You made your query on the School class and you restricted your results on it, not on the mapped related objects.
Now there are many ways to do this.
You can make a static filter as IanL said; however, it's not really flexible.
You can just iterate the collection as mxmissile suggested, but that is ugly and slow (especially given lazy loading).
I would suggest 2 different solutions:
In the first, you keep the query you have and apply a dynamic filter to the collection (keeping it a lazy-loaded collection), at the cost of one extra round trip to the database:
var school = GetSchoolAndStudentsWithDOBAbove(5, dob);
IQuery qDob = nhSession.CreateFilter(school.Students, "where DOB > :dob").SetDateTime("dob", dob);
IList<Student> dobedSchoolStudents = qDob.List<Student>();
In the second solution, fetch both the school and the students in one shot:
IList results = nhSession.CreateQuery(
    @"select ss, st from School ss, Student st
      where ss.Id = st.School.Id and ss.Id = :schId and st.DOB > :dob")
    .SetInt32("schId", 5).SetDateTime("dob", dob).List();
Each row in results is an object[] whose first element is the School (ss) and whose second element is one matching Student (st).
And this can definitely be done using the criteria query you use now (using Projections)
Unfortunately s.Students will not contain your "queried" results. You will have to create a separate query for Students to reach your goal.
foreach(var st in s.Students.Where(x => x.DOB > dob))
Console.WriteLine(st.FirstName);
Warning: depending on your mapping, that will still make a second trip to the DB, and it will still retrieve all students.
I'm not sure, but you could possibly use Projections to do all this in one query; I am by no means an expert on that.
You do have the option of filtering data. If there is only a single instance of this query, mxmissile's option would be the better choice.
NHibernate Filter Documentation
Filters do have their uses, but depending on the version you are using, there can be issues where filtered collections are not cached correctly.