I know of at least 4 ways of fetching relationships from a relational database.
I tried to make the examples generic to any language.
I'd like to know some algorithm of choosing one over the other besides manually testing the query.
Method A: has the problem of having to loop twice to build the result. also can't process one row at a time one by one without building arrays.
for contact in query("SELECT id, name FROM contact") {
contacts[contact.id]["name"] = contact.name
push(ids, contact.id)
}
for email in query("SELECT contact_id, address FROM email WHERE contact_id IN ?", ids) {
push(contacts[email.contact_id]["emails"], email.address)
}
Method B: Has the problem of a cartesian join in case of more joins
for contact in query("SELECT c.id, c.name, e.address FROM contact c JOIN email e ON e.contact_id = c.id") {
contacts[contact.id]["name"] = contact.name
push(contacts[email.contact_id]["emails"], email.address)
}
Method C: Has a problem in that GROUP_CONCAT is limited to a certain number of bytes and might be cut off
for contact in query("SELECT c.id, c.name, GROUP_CONCAT(e.address) AS addresses FROM contact c JOIN email e ON e.contact_id = c.id GROUP BY c.id") {
contacts[contact.id]["name"] = contact.name
contacts[contact.id]["emails"] = split(",", contact.addresses)
}
Method D: Has the N+1 query problem, runs a query for each contact
contactStatement = prepare("SELECT id, name FROM contact")
emailStatement = prepare("SELECT address FROM email WHERE contact_id = ?")
for contact in contactStatement.query() {
contacts[contact.id]["name"] = contact.name
// this uses a prepared query
for email in emailStatement.query(contact.id) {
push(contacts[contact.id]["emails"], email.address)
}
}
Or maybe there is a way that is ideal in every case?
I have many years of experience using various ORM's and I'd like to avoid using one.
Method A is what most ORM's default to.
Method D is what seems like the solution that SQL designers expect us to use.
I'm asking for some help in how to find a logical way of picking one method over another for a specific SQL query.
Related
Imagine we have the following models:
type Company struct {
gorm.Model
Name string
Addresses []Address
}
type Address struct {
gorm.Model
CompanyID uint64
Street string
City string
Country string
}
I want to take all the companies(and their addresses) which have address in a specific location. Something like this:
SELECT company.*, address.* FROM company
INNER JOIN address ON address.company_id = company.id AND address.country = 'Bulgaria'
So if a company does not have address at the specific location, I will not get it as a result at all. I was trying something like that:
db.Joins("Addresses", "addresses.country = ?", "Bulgaria").Find(&companies)
However, it doesn't work, because GORM doesn't take the second argument of Joins(when preloading join used), so I should check the generated query and make something like that:
db.Where(`"Address".country = ?`, "Bulgaria").Joins("Addresses").Find(&companies)
Is there a better way/not hacky way? Have in mind all of the above code is mock of the real problem, I didn't want to expose the original models/queries.
You can use Preload to load Addresses into the Company object.
Based on your described conditions, where you don't want to load companies that don't match your filter, you should use an INNER JOIN
Two options:
First, if your table is named company, then your query should look like this:
db.Table("company").
Preload("Addresses").
Joins("INNER JOIN addresses a ON a.company_id = company.id").
Where("a.country = ?", "Bulgaria").
Find(&companies)
Second, if your table is named companies, then you should try this:
db.Preload("Addresses").
Joins("INNER JOIN addresses a ON a.company_id = companies.id").
Where("a.country = ?", "Bulgaria").
Find(&companies)
If you are using Gorm v2 you perform CRUD operations on has-one and one-to-many via associations:
var companies []Company
var addresses []Address
countries := []string{"Bulgaria"}
db.Model(&Address).Where("country IN (?)", countries).Find(&addresses)
// The key is using the previously fetched variable as the model for the association query
db.Model(&addresses).Association("CompanyID").Find(&companies)
This also works for many-to-many relations.
When comparing Gorm VS raw SQL queries (db.Raw(SQLquery)) you will typically see a performance hit. On the upside, you will not have to deal with the added complexity of raw sql string building in Go, which is pretty nice. I personally use a combination of both raw and gorm-built queries in my projects :)
Suppose I have 3 hypothetical models;
class State(models.Model):
name = models.CharField(max_length=20)
class Company(models.Model):
name = models.CharField(max_length=60)
state = models.ForeignField(State)
class Person(models.Model):
name = models.CharField(max_length=60)
state = models.ForeignField(State)
I want to be able to return results in a Django app, where the results, if using SQL directly, would be based on a query such as this:
SELECT a.name as 'personName',b.name as 'companyName', b.state as 'State'
FROM Person a, Company b
WHERE a.state=b.state
I have tried using the select_related() method as suggested here, but I don't think this is quite what I am after, since I am trying to join two tables that have a common foreign-key, but have no key-relationships amongst themselves.
Any suggestions?
Since a Person can have multiple Companys in the same state. It is not a good idea to do the JOIN at the database level. That would mean that the database will (likely) return the same Company multiple times, making the output quite large.
We can prefetch the related companies, with:
qs = Person.objects.select_related('state').prefetch_related('state__company')
Then we can query the Companys in the same state with:
for person in qs:
print(person.state.company_set.all())
You can use a Prefetch-object [Django-doc] to prefetch the list of related companies in an attribute of the Person, for example:
from django.db.models import Prefetch
qs = Person.objects.prefetch_related(
Prefetch('state__company', Company.objects.all(), to_attr='same_state_companies')
)
Then you can print the companies with:
for person in qs:
print(person.same_state_companies)
Considering I have the following relationships:
class House(Model):
name = ...
class User(Model):
"""The standard auth model"""
pass
class Alert(Model):
user = ForeignKey(User)
house = ForeignKey(House)
somevalue = IntegerField()
Meta:
unique_together = (('user', 'property'),)
In one query, I would like to get the list of houses, and whether the current user has any alert for any of them.
In SQL I would do it like this:
SELECT *
FROM house h
LEFT JOIN alert a
ON h.id = a.house_id
WHERE a.user_id = ?
OR a.user_id IS NULL
And I've found that I could use prefetch_related to achieve something like this:
p = Prefetch('alert_set', queryset=Alert.objects.filter(user=self.request.user), to_attr='user_alert')
houses = House.objects.order_by('name').prefetch_related(p)
The above example works, but houses.user_alert is a list, not an Alert object. I only have one alert per user per house, so what is the best way for me to get this information?
select_related didn't seem to work. Oh, and surely I know I can manage this in multiple queries, but I'd really want to have it done in one, and the 'Django way'.
Thanks in advance!
The solution is clearer if you start with the multiple query approach, and then try to optimise it. To get the user_alerts for every house, you could do the following:
houses = House.objects.order_by('name')
for house in houses:
user_alerts = house.alert_set.filter(user=self.request.user)
The user_alerts queryset will cause an extra query for every house in the queryset. You can avoid this with prefetch_related.
alerts_queryset = Alert.objects.filter(user=self.request.user)
houses = House.objects.order_by('name').prefetch_related(
Prefetch('alert_set', queryset=alerts_queryset, to_attrs='user_alerts'),
)
for house in houses:
user_alerts = house.user_alerts
This will take two queries, one for houses and one for the alerts. I don't think you require select related here to fetch the user, since you already have access to the user with self.request.user. If you want you could add select_related to the alerts_queryset:
alerts_queryset = Alert.objects.filter(user=self.request.user).select_related('user')
In your case, user_alerts will be an empty list or a list with one item, because of your unique_together constraint. If you can't handle the list, you could loop through the queryset once, and set house.user_alert:
for house in houses:
house.user_alert = house.user_alerts[0] if house.user_alerts else None
I have a problem doing a query. That I want to do is access with a query to the data in my "String code" in MyDomainA through a query from MyDomainB. The relationship between the tables is unidirectional one to one.
Is possible use a gorm way to do this??
Domain A:
class LicenceType {
String code
String description
Double price
static constraints = {
}
}
TABLE DOMAIN A
code description
A this is A
B this is B
C this is C
Domain B: (have the unidirectional relationship)
class VoiceUser {
LicenceType licenceType
String username
String email
String nameID
}
TABLE DOMAIN B
User
1
2
3
4
That I want to do is know how many users have the same code(code is a column of DomainA and both tables have a unidirectional relationship as I indicated before).
This is that I'm trying to do that is wrong...
Controller:
def resulta = VoiceUser.executeQuery('SELECT a.code, COUNT(b.nameID) FROM VoiceUser AS b INNER JOIN b.licenceType AS a GROUP BY a.code')
def resultCount = resulta[0]
This is some example result that I hope...
Users with code A = 2
Users with code B = 2
Users with code C = o
The trick is to do a group by on the code and a count() on the user. You can do this using either HQL or a criteria query.
HQL
Here's an example in HQL:
VoiceUser.executeQuery('SELECT licence.code, COUNT(user) FROM VoiceUser AS user INNER JOIN user.licenceType AS licence GROUP BY licence.code')
If you're familiar with SQL, most of this should make sense right away. An important difference is the syntax for joining domain classes. HQL deals with domain classes, not tables.
Criteria query
And here's the equivalent criteria query.
VoiceUser.withCriteria {
projections {
licenceType {
groupProperty('code')
}
count('id')
}
}
Alternative queries
The queries shown above return a List<List> like this:
[
['A', 2],
['B', 2],
['C', 0]
]
If you provide a LicenceType (or its code) as input to the query, then you can get the count for just that LicenceType. For instance, here are examples which retrieve the user count for licence code 'A'.
HQL
def result = VoiceUser.executeQuery('SELECT COUNT(user) FROM VoiceUser AS user INNER JOIN user.licenceType AS licence WHERE licence.code = :code', [code: 'A'])[0]
Criteria query
def result = VoiceUser.createCriteria().get {
licenceType {
eq('code', 'A')
}
projections {
count('id')
}
}
Additional resources
I've got a series of articles which explain HQL, criteria, and where queries in detail; such as how to use projections and joins. Feel free to check them out.
I have an Article with a Set of Category.
How can I query, using the criteria interface, for all Articles that contain all Categories with a certain Id?
This is not an "in", I need exclusively those who have all necessary categories - and others. Partial matches should not come in there.
Currently my code is failing with this desperate attempt:
var c = session.CreateCriteria<Article>("a");
if (categoryKeys.HasItems())
{
c.CreateAlias("a.Categories", "c");
foreach (var key in categoryKeys)
c.Add(Restrictions.Eq("c", key)); //bogus, I know!
}
Use the "IN" restriction, but supplement to ensure that the number of category matches is equal to the count of all the categories you're looking for to make sure that all the categories are matched and not just a subset.
For an example of what I mean, you might want to take a look at this page, especially the "Intersection" query under the "Toxi solution" heading. Replace "bookmarks" with "articles" and "tags" with "categories" to map that back to your specific problem. Here's the SQL that they show there:
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN ('bookmark', 'webservice', 'semweb'))
AND b.id = bt.bookmark_id
GROUP BY b.id
HAVING COUNT( b.id )=3
I believe you can also represent this using a subquery that may be easier to represent with the Criteria API
SELECT Article.Id
FROM Article
INNER JOIN (
SELECT ArticleId, count(*) AS MatchingCategories
FROM ArticleCategoryMap
WHERE CategoryId IN (<list of category ids>)
GROUP BY ArticleId
) subquery ON subquery.ArticleId = EntityTable.Id
WHERE subquery.MatchingCategories = <number of category ids in list>
I'm not 100% sure, but I think query by example may be what you want.
Assuming that Article to Category is a one-to-many relationship and that the Category has a many-to-one property called Article here is a VERY dirty way of doing this (I am really not proud of this but it works)
List<long> catkeys = new List<long>() { 4, 5, 6, 7 };
if (catkeys.Count == 0)
return;
var cr = Session.CreateCriteria<Article>("article")
.CreateCriteria("Categories", "cat0")
.Add(Restrictions.Eq("cat0.Id", catkeys[0]));
if (catkeys.Count > 1)
{
for (int i = 1; i < catkeys.Count; i++)
{
cr = cr.CreateCriteria("Article", "a" + i)
.CreateCriteria("Categories", "cat" + i)
.Add(Restrictions.Eq("cat" + i + ".Id", catkeys[i]));
}
}
var results = cr.List<Article>();
What it does is to re-join the relationship over and over again guaranteeing you the AND between category Ids. It should be very slow query especially if the list of Ids gets big.
I am offering this solution as NOT a recommended way but at least you can have something working while looking for a proper one.