Sequelize select with aggregate includes bad attributes => error - orm

The models are Person and Team with a M:1 relationship.
The query that fails:
db.Person.findAll({
attributes: [ [sequelize.fn('COUNT', sequelize.col('*')), 'count']],
include: [{model: db.Team, required: true}], // to force inner join
group: ['team.team_id']
}).complete(function(err, data) {
...
});
The generated SQL is:
SELECT "person"."person_id",
COUNT(*) AS "count",
"team"."team_id" AS "team.team_id",
"team"."team_name" AS "team.team_name",
"team"."team_email" AS "team.team_email",
"team"."team_lead" AS "team.team_lead"
FROM "person" AS "person" INNER JOIN "team" AS "team"
ON "person"."team_id" = "team"."team_id"
GROUP BY "team"."team_id";
The person.person_id column included in the SELECT clause breaks the query, and Postgres correctly complains:
ERROR: column "person.person_id" must appear in the `GROUP BY` clause or be used in an aggregate function
It seems that the attributes option is taken into account since the COUNT appears correctly, but all the rest of the columns in the SELECT clause are added by default.
Is there another way (besides attributes) to explicitly define which columns appear in the SELECT clause or is this a bug?
I'm using Sequelize v2.0.3.

Sequelize will always add the primary key to the selected fields. Currently there is no way to disable that.
Perhaps adding DISTINCT ON as suggested here https://stackoverflow.com/a/19723716/800016 to person_id could fix the issue?
Otherwise, feel free to open an issue on the Sequelize bug tracker.
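For comparison, here is the shape of query that the GROUP BY accepts: only the grouped column and the aggregate are selected, with no bare person.person_id. This is a minimal sketch using Python's built-in sqlite3 module, not Sequelize or Postgres; the table layout and sample rows are assumptions based on the generated SQL above. Note that SQLite is more lenient than Postgres about bare columns, so this only illustrates the intended result, not the error.

```python
import sqlite3

# Hypothetical schema and data, modelled on the generated SQL in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        team_id INTEGER REFERENCES team(team_id)
    );
    INSERT INTO team VALUES (1, 'red'), (2, 'blue');
    INSERT INTO person VALUES (10, 1), (11, 1), (12, 2);
""")

# Only the grouped column and the aggregate appear in the SELECT list,
# so the primary key of person is not needed.
counts_per_team = conn.execute("""
    SELECT team.team_id, COUNT(*) AS person_count
    FROM person
    INNER JOIN team ON person.team_id = team.team_id
    GROUP BY team.team_id
    ORDER BY team.team_id
""").fetchall()

print(counts_per_team)
```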

What's the best method of making arrays of joined tables in Big Query?

I'm trying to do some data transformation inside of Big Query, with SQL.
Let's say I have three tables:
Customer - data about the customer, like age, etc
Subscriptions - data about what subscriptions the user have
Engagements - data about how the customers have interacted with digital products.
I'd like to collect this inside of one table, using nested fields.
I can join all these tables, but I'd like to aggregate them to arrays.
So, instead of three tables I get this:
id:123,
name:David,
age:30,
subscriptions: [{
name:sub1
price:10
},
{
name:sub2
price:20
}],
engagement: [{
event:visited_product_x
time:2020-06-10
},
{
event:visited_product_y
time:2020-06-10
}]
Of course I've used array_agg in SELECT. And that works great, when adding only one table. However, when adding another one, I get duplicate rows, which I don't want. So, I guess I shouldn't use array_agg in SELECT, but rather somewhere else.
What's the best way of solving this?
You can use subqueries to construct the fields. Something like this:
select c.*,
(select array_agg(s)
from subscriptions s
where s.user_id = c.user_id
) as subscriptions,
(select array_agg(e)
from engagements e
where e.user_id = c.user_id
) as engagements
from customers c
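To see why the subqueries avoid the duplicates, here is a minimal sketch using Python's built-in sqlite3 module instead of BigQuery. SQLite has no array_agg, so group_concat stands in for it; the column names and sample rows are assumptions based on the question. Joining both child tables multiplies the rows, while each correlated subquery aggregates its own table independently.

```python
import sqlite3

# Hypothetical tables and rows, modelled on the example in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscriptions (user_id INTEGER, name TEXT);
    CREATE TABLE engagements (user_id INTEGER, event TEXT);
    INSERT INTO customers VALUES (123, 'David');
    INSERT INTO subscriptions VALUES (123, 'sub1'), (123, 'sub2');
    INSERT INTO engagements VALUES
        (123, 'visited_product_x'), (123, 'visited_product_y');
""")

# Joining both child tables multiplies rows: 2 subscriptions x 2 engagements.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    JOIN subscriptions s ON s.user_id = c.user_id
    JOIN engagements e ON e.user_id = c.user_id
""").fetchone()[0]

# Correlated subqueries aggregate each child table on its own.
row = conn.execute("""
    SELECT c.user_id,
           (SELECT group_concat(s.name) FROM subscriptions s
             WHERE s.user_id = c.user_id) AS subscriptions,
           (SELECT group_concat(e.event) FROM engagements e
             WHERE e.user_id = c.user_id) AS engagements
    FROM customers c
""").fetchone()

# group_concat does not guarantee an order, so sort before comparing.
subscriptions = sorted(row[1].split(","))
engagements = sorted(row[2].split(","))
print(joined_rows, subscriptions, engagements)
```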

How to count different rows and put them all in same table in SQL?

I want to count how many people have applied for each category in my database and put all of those counts in one table. I'm not sure exactly how to do that. I've already done it like this for one category, but I want to get everything in one query rather than repeating this for each category.
select
count(cc.fk_id_candidates) as 'category A'
from candidate_category cc, candidate c, category cat
where c.id=cc.fk_id_candidates and cc.fk_id_category=cat.id and cat.name='A';
From that code I get number of people applied for category A as an output, which is correct, but I just need the same info for other categories too. I tried with case but it's not working right.
Thank you.
You would typically add a group by clause on the category name and/or id:
select cat.name, count(*) cnt
from candidate_category cc
inner join candidate c on c.id = cc.fk_id_candidates
inner join category cat on cc.fk_id_category = cat.id
group by cat.id, cat.name;
Note that I changed your query to use standard joins (with the on keyword) rather than implicit joins (with commas in the from clause and conditions in the where clause) - this old syntax should not be used in new code.
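As a quick check of the grouped query, here is a minimal sketch using Python's built-in sqlite3 module. The column types and sample rows are assumptions; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical data: candidates 1 and 2 applied for A; 1, 2 and 3 applied for B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate (id INTEGER PRIMARY KEY);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE candidate_category (
        fk_id_candidates INTEGER REFERENCES candidate(id),
        fk_id_category INTEGER REFERENCES category(id)
    );
    INSERT INTO candidate VALUES (1), (2), (3);
    INSERT INTO category VALUES (1, 'A'), (2, 'B');
    INSERT INTO candidate_category VALUES
        (1, 1), (2, 1), (3, 2), (1, 2), (2, 2);
""")

# One output row per category instead of one query per category.
counts = conn.execute("""
    SELECT cat.name, COUNT(*) AS cnt
    FROM candidate_category cc
    INNER JOIN candidate c ON c.id = cc.fk_id_candidates
    INNER JOIN category cat ON cc.fk_id_category = cat.id
    GROUP BY cat.id, cat.name
    ORDER BY cat.name
""").fetchall()

print(counts)
```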
As I understand it, you have three tables:
- candidate_category cc,
- candidate c,
- category cat
You have also related them, using REFERENCES, through these conditions:
where c.id=cc.fk_id_candidates
and cc.fk_id_category=cat.id
and cat.name='A';
Joining the three tables already gives you one combined table, which you can view with the query below. This is a very important step for finding a common column that you can filter on to get the data you want:
Select *
from candidate_category cc,
candidate c,
category cat
where c.id=cc.fk_id_candidates
and cc.fk_id_category=cat.id

Matching Many To Many relation

I have the following entities:
@Entity()
class User {
  @ManyToMany(type => Group)
  @JoinTable()
  groups: Group[];
}
@Entity()
class MediaObject {
  @ManyToMany(type => Group)
  @JoinTable()
  groups: Group[];
}
@Entity()
class Group {
  // [...]
}
Now I want to select every MediaObject which has at least one group in common with the one specific User.
Example:
User 1          MediaObject 1
-----------------------------
Group 1    |--- Group 2
Group 2 ---|    Group 3
User 1 has at least one same group as MediaObject
How can I build a where sql query for this? I use typeorm for building my queries, yet every sql query would help. Also, I want to understand how.
Typeorm joins the tables like this
LEFT JOIN "group" "groups" ON "groups"."id" = "media_groups"."groupId"
Using a simple JOIN you can retrieve the IDs of the MediaObjects that share at least one group with the user. Then use IN to retrieve the MediaObjects themselves:
select *
from MediaObject mo
where mo.id in
(
select moJoin.mediaObjectId
from media_object_groups_group moJoin
join user_groups_group uJoin on moJoin.groupId = uJoin.groupId
where uJoin.userId = 1
)
If there can be multiple overlapping groups between the same MediaObject and the same User, an EXISTS semi-join might be faster than using IN:
SELECT m.*
FROM "MediaObject" m
WHERE EXISTS (
SELECT -- select list can be empty here
FROM user_groups_group gu
JOIN media_object_groups_group gm USING ("groupId")
WHERE gu."userId" = 1
AND gm."mediaObjectId" = m.id
);
Else, Radim's query should serve just fine after adding some double-quotes.
This assumes that referential integrity is enforced with foreign key constraints, so it's safe to rely on user_groups_group."userId" without checking the corresponding user even exists.
It's unwise to use reserved words like "user" or "group" or CaMeL-case strings as identifiers. Either requires double-quoting. ORMs regularly serve poorly in this respect. See:
Are PostgreSQL column names case-sensitive?
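The difference between a plain join and the IN / EXISTS forms can be seen in a small sketch using Python's built-in sqlite3 module rather than Postgres. The table names follow the answers above, the sample rows are assumptions, and the table is named media_object here to avoid quoting. SQLite needs a non-empty select list, so the EXISTS subquery uses SELECT 1.

```python
import sqlite3

# Hypothetical data: user 1 is in groups 1 and 2.
# Media 1 shares one group, media 2 shares two, media 3 shares none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media_object (id INTEGER PRIMARY KEY);
    CREATE TABLE user_groups_group (userId INTEGER, groupId INTEGER);
    CREATE TABLE media_object_groups_group (mediaObjectId INTEGER, groupId INTEGER);
    INSERT INTO media_object VALUES (1), (2), (3);
    INSERT INTO user_groups_group VALUES (1, 1), (1, 2);
    INSERT INTO media_object_groups_group VALUES
        (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [r[0] for r in conn.execute(sql)]

# A plain join returns media 2 twice, once per shared group.
plain_join = ids("""
    SELECT m.id
    FROM media_object m
    JOIN media_object_groups_group gm ON gm.mediaObjectId = m.id
    JOIN user_groups_group gu ON gu.groupId = gm.groupId
    WHERE gu.userId = 1
    ORDER BY m.id
""")

# IN and EXISTS both return each matching media object once.
with_in = ids("""
    SELECT m.id FROM media_object m
    WHERE m.id IN (
        SELECT gm.mediaObjectId
        FROM media_object_groups_group gm
        JOIN user_groups_group gu ON gu.groupId = gm.groupId
        WHERE gu.userId = 1
    )
    ORDER BY m.id
""")
with_exists = ids("""
    SELECT m.id FROM media_object m
    WHERE EXISTS (
        SELECT 1
        FROM user_groups_group gu
        JOIN media_object_groups_group gm ON gm.groupId = gu.groupId
        WHERE gu.userId = 1 AND gm.mediaObjectId = m.id
    )
    ORDER BY m.id
""")

print(plain_join, with_in, with_exists)
```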

ActiveRecord Join Query and select in Rails

In my rails 4 application, a client (clients table) can have many projects (projects table). I have a column called name in each table. I am trying to write a join and then select which uses projects as the base table and clients as the lookup table. client_id is the foreign_key in the projects table:
I am writing my query as follows:
Project.joins(:client).select('projects.id,projects.name,clients.name')
I get the following response:
Project Load (0.6ms) SELECT projects.id,projects.name,clients.name FROM "projects" INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
=> #<ActiveRecord::Relation [#<Project id: 1, name: "Fantastico Client">]>
If I try to alias it like so:
Project.joins(:client).select('projects.id,projects.name,clients.name as client_name')
Then I get the following response:
Project Load (0.8ms) SELECT projects.id,projects.name,clients.name as client_name FROM "projects" INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
=> #<ActiveRecord::Relation [#<Project id: 1, name: "The Dream Project">]>
In either case, ActiveRecord loses one of the names, as you can see from the above response. How should I be writing this query?
If a column in select is not one of the attributes of the model on which select is called, it is not displayed when the record is inspected. Such columns are still contained in the objects within the AR::Relation and are accessible like any other public instance attribute.
You could verify this by calling first.client_name:
Project.joins(:client)
.select('projects.id,projects.name,clients.name as client_name')
.first.client_name
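The same effect can be seen at the SQL level, outside Rails. This is a rough analogue using Python's built-in sqlite3 module, not ActiveRecord; the schema and rows are assumptions taken from the question. With the alias in place, the result row carries both names under distinct keys.

```python
import sqlite3

# Hypothetical schema and rows, modelled on the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT,
        client_id INTEGER REFERENCES clients(id)
    );
    INSERT INTO clients VALUES (1, 'Fantastico Client');
    INSERT INTO projects VALUES (1, 'The Dream Project', 1);
""")

# The alias gives the client's name its own key in the result row.
row = conn.execute("""
    SELECT projects.id, projects.name, clients.name AS client_name
    FROM projects
    INNER JOIN clients ON clients.id = projects.client_id
""").fetchone()

print(row["name"], "/", row["client_name"])
```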
You can use :'clients.name' as one of your symbols. For instance:
Project.select(:id, :name, :'clients.name').joins(:client)
I like it better because it seems like Rails understands it, since it quotes all parameters:
SELECT "projects"."id", "projects"."name", "clients"."name"
FROM "projects"
INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
(I'm not 100% sure that's the exact SQL query, but I'm fairly certain and I promise it will use "clients"."name")
To get both the project name and the client name, you can use a query like this:
Project.joins(:client).pluck(:name,:'clients.name')
Your query doesn't lose anything. Because you wrote Project.joins(:client), the result is a relation of Project objects, which is why it looks the way it does when inspected.
This means it holds the Project data as is, and the associated data is held under the alias you gave in your query, client_name.
If you use
Project.joins(:client)
.select('projects.id project_id, projects.name projects_name,clients.name as client_name')
then the result looks like
[#, #]
but each object still holds all the attributes that you selected.
Try This:
sql = Project.joins(:client).select(:id, :name, :"clients.name AS client_name").to_sql
data = ActiveRecord::Base.connection.exec_query(sql)
OUTPUT
[
{"id"=>1, "name"=>"ProjectName1", "client_name"=>"ClientName1"},
{"id"=>2, "name"=>"ProjectName2", "client_name"=>"ClientName2"}
]

Django: Order a model by a many-to-many field

I am writing a Django application that has a model for People, and I have hit a snag. I am assigning Role objects to people using a Many-To-Many relationship - where Roles have a name and a weight. I wish to order my list of people by their heaviest role's weight. If I do People.objects.order_by('-roles__weight'), then I get duplicates when people have multiple roles assigned to them.
My initial idea was to add a denormalized field called heaviest-role-weight - and sort by that. This could then be updated every time a new role was added or removed from a user. However, it turns out that there is no way to perform a custom action every time a ManyToManyField is updated in Django (yet, anyway).
So, I thought I could then go completely overboard and write a custom field, descriptor and manager to handle this - but that seems extremely difficult when the ManyRelatedManager is created dynamically for a ManyToManyField.
I have been trying to come up with some clever SQL that could do this for me - I'm sure it's possible with a subquery (or a few), but I'd be worried about it not being compatible with all the database backends Django supports.
Has anyone done this before - or have any ideas how it could be achieved?
Django 1.1 (currently beta) adds aggregation support. Your query can be done with something like:
from django.db.models import Max
People.objects.annotate(max_weight=Max('roles__weight')).order_by('-max_weight')
This sorts people by their heaviest roles, without returning duplicates.
The generated query is:
SELECT people.id, people.name, MAX(role.weight) AS max_weight
FROM people LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
GROUP BY people.id, people.name
ORDER BY max_weight DESC
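To check that the grouped query really removes the duplicates, here is a minimal sketch that runs equivalent SQL through Python's built-in sqlite3 module rather than through Django. The table names follow the generated query above; the sample rows are assumptions.

```python
import sqlite3

# Hypothetical data: Alice has roles of weight 1 and 5, Bob has weight 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, weight INTEGER);
    CREATE TABLE people_roles (people_id INTEGER, role_id INTEGER);
    INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO role VALUES (1, 1), (2, 5), (3, 3);
    INSERT INTO people_roles VALUES (1, 1), (1, 2), (2, 3);
""")

# Ordering by the joined weight alone repeats Alice once per role.
duplicated = [r[0] for r in conn.execute("""
    SELECT people.name
    FROM people
    JOIN people_roles ON people.id = people_roles.people_id
    JOIN role ON people_roles.role_id = role.id
    ORDER BY role.weight DESC
""")]

# Grouping per person and ordering by the maximum weight gives one row each.
by_heaviest = conn.execute("""
    SELECT people.name, MAX(role.weight) AS max_weight
    FROM people
    LEFT OUTER JOIN people_roles ON people.id = people_roles.people_id
    LEFT OUTER JOIN role ON people_roles.role_id = role.id
    GROUP BY people.id, people.name
    ORDER BY max_weight DESC
""").fetchall()

print(duplicated, by_heaviest)
```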
Here's a way to do it without an annotation:
class Role(models.Model):
    pass

class PersonRole(models.Model):
    weight = models.IntegerField()
    person = models.ForeignKey('Person')
    role = models.ForeignKey(Role)

    class Meta:
        # if you have an inline configured in the admin, this will
        # make the roles order properly
        ordering = ['weight']

class Person(models.Model):
    roles = models.ManyToManyField('Role', through='PersonRole')

    def ordered_roles(self):
        "Return a properly ordered set of roles"
        return self.roles.all().order_by('personrole__weight')
This lets you say something like:
>>> person = Person.objects.get(id=1)
>>> roles = person.ordered_roles()
Something like this in SQL:
select p.*, max (r.Weight) as HeaviestWeight
from persons p
inner join RolePersons rp on p.id = rp.PersonID
inner join Roles r on rp.RoleID = r.id
group by p.*
order by HeaviestWeight desc
Note: group by p.* may be disallowed by your dialect of SQL. If so, just list all the columns in table p that you intend to use in the select clause.
Note: if you just group by p.ID, you won't be able to call for the other columns in p in your select clause.
I don't know how this interacts with Django.