Matching Many To Many relation - sql

I have the following entities:
#Entity
class User {
#ManyToMany(type => Group)
#JoinTable()
groups: Group[];
}
#Entity
class MediaObject {
#ManyToMany(type => Group)
#JoinTable()
groups: Group[];
}
#Entity
class Group {
// [...]
}
Now I want to select every MediaObject which has at least one group in common with the one specific User.
Example:
User 1 MediaObject 1
-----------------------------
Group 1 |--- Group 2
Group 2 ---| Group 3
User 1 has at least one same group as MediaObject
How can I build a where sql query for this? I use typeorm for building my queries, yet every sql query would help. Also, I want to understand how.
Typeorm joins the tables like this
LEFT JOIN "group" "groups" ON "groups"."id" = "media_groups"."groupId"

Using a simple JOIN you may retrieve MediaObject Id's that share at least one group with the user. Than use IN to retrieve the MediaObject's
select *
from MediaObject mo
where mo.id in
(
select moJoin.mediaObjectId
from media_object_groups_group moJoin
join user_groups_group uJoin on moJoin.groupId = uJoin.groupId
where uJoin.userId = 1
)

If there can be multiple overlapping groups between same MediaObject and the same User, an EXISTS semi-join might be faster than using IN:
SELECT m.*
FROM "MediaObject" m
WHERE EXISTS (
SELECT -- select list can be empty here
FROM user_groups_group gu
JOIN media_object_groups_group gm USING ("groupId")
WHERE gu."userId" = 1
AND gm."mediaObjectId" = m.id
);
Else, Radim's query should serve just fine after adding some double-quotes.
This assumes that referential integrity is enforced with foreign key constraints, so it's safe to rely on user_groups_group."userId" without checking the corresponding user even exists.
It's unwise to use reserved words like "user" or "group" or CaMeL-case strings as identifiers. Either requires double-quoting. ORMs regularly serve poorly in this respect. See:
Are PostgreSQL column names case-sensitive?

Related

SQL query to exclude records that are part of a group

I can't believe this hasn't been answered elsewhere, but I don't seem to know the right words to convey what I'm trying to do. I'm using Ruby/Rails and PostgreSQL.
I have a bunch of Users in the DB that I'm trying to add to a Group based on a name search. I need to return Users that do not belong to a particular Group, but there is a join table as well (UserGroups, with the appropriate FKs).
Is there a simple way to use this configuration to perform this query without having to result to grabbing all the Users from which belong to the group and doing something like .where.not(id: users_in_group.pluck(:id)) (these groups can be pretty huge, so I don't want to send that query to the DB on a text search as the user types).
I need to return Users that do not belong to a particular Group
SELECT *
FROM users u
WHERE username ~ 'some pattern' -- ?
AND NOT EXISTS (
SELECT FROM user_groups ug
WHERE ug.group_id = 123 -- your group_id to exclude here
AND ug.user_id = u.id
);
See:
Select rows which are not present in other table

How can I remove the repetitive joins from this SELECT query?

I have a postgres database with two tables: services and meta.
The first table stores the "core" information the application needs, and the app also has a "custom field" feature implemented similar to how Wordpress's wp_post_meta table works.
Users can add on meta rows with arbitrary keys and values, in a one-to-many relationship with the service.
The schema of the meta table is:
id
key (string)
value (string)
service_id (foreign key)
That works great for the app, so I'm not interested in changing the schema, but for some infrequently used admin dashboards I need to get back a list of services with several of the meta rows joined on as columns.
Here's what I have so far:
SELECT
services.*,
meta1.value AS funding,
meta2.value AS ownership
FROM services
JOIN meta meta1
ON services.id = meta.service_id
AND meta.key = 'Funding'
JOIN meta meta2
ON services.id = meta2.service_id
AND meta2.key = 'Ownership'
Now, this works great, but I have to do another join every time I want to add another meta value.
That seems like it will slow down the query and make it less readable.
Is there a good way to refactor this to keep it easy to read and fast to run?
here's an attempted refactor using OR, which doesn't work:
SELECT
*,
meta.value AS funding,
meta.value AS ownership
FROM services
JOIN meta
ON services.id = meta.service_id
AND meta.key = 'Funding' OR meta.key = 'Ownership'
One way would be to aggregate the key/value pairs into a JSON value with a derived table:
select srv.*,
mv.vals ->> 'Funding' as funding,
mv.vals ->> 'Ownership' as ownership
from services srv
cross join lateral (
select jsonb_object_agg(m.key, m.value) as vals
from meta m
where m.key in ('Funding', 'Ownership')
and m.service_id = srv.id
) as mv
If your application can handle the JSON, then maybe the conversion into two separate columns isn't actually necessary which would avoid the repetition of the keys.
You can use conditional aggregation:
SELECT s.*, m.funding, m.ownership
FROM services s JOIN
(SELECT m.service_id,
MAX(value) FILTER (WHERE key = 'Funding') as Funding,
MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
FROM meta m
GROUP BY m.service_id
) m
ON m.service_id = s.id

MS Access Distinct Records in Recordset

So, I once again seem to have an issue with MS Access being finicky, although it seems to also be an issue when trying similar queries in SSMS (SQL Server Management Studio).
I have a collection of tables, loosely defined as follows:
table widget_mfg { id (int), name (nvarchar) }
table widget { id (int), name (nvarchar), mfg_id (int) }
table widget_component { id (int), name (nvarchar), widget_id (int), component_id }
table component { id (int), name (nvarchar), ... } -- There are ~25 columns in this table
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses. I've tried some of these queries:
SELECT c.*, wc.widget_id, w.mfg_id
FROM ((widget_component wc INNER JOIN widget w ON wc.widget_id = w.id)
INNER JOIN widget_manufacturer wm on w.mfg_id = wm.id)
INNER JOIN component c on c.id = wc.component_id
WHERE wm.id = 1
The previous example displays duplicates of any part that is contained in multiple widget_component lists for different widgets.
I've also tried doing:
SELECT DISTINCT c.id, c.name, wc.widget_id, w.mfg_id
FROM component c, widget_component wc, widget w, widget_manufacturer wm
WHERE wm.id=w.mfg_id AND wm.id = 1
This doesn't display anything at all. I was reading about sub-queries, but I do not understand how they work or how they would apply to my current application.
Any assistance in this would be beneficial.
As an aside, I am not very good with either MS Access or SQL in general. I know the basics, but not a lot beyond that.
Edit:
I just tried this code, and it works to get all the component.id's while limiting them to a single entry each. How do I go about using the results of this to get a list of all the rest of the component data (component.*) where the id's from the first part are used to select this data?
SELECT DISTINCT c.part_no
FROM component c, widget w, widget_component wc, widget_manufacturer wm
WHERE(((c.id=wc.component_id AND wc.widget_id=w.id AND w.mfg_id=wm.id AND wm.id=1)))
(P.S. this is probably not the best way to do this, but I am still learning SQL.)
What I'd like to do is query the database and get a list of all
components that a specific manufacturer uses
There are several ways to do this. IN is probably the easiest to write
SELECT c.*
FROM component c
WHERE c.id IN (SELECT c.component_id
FROM widget w
INNER JOIN widget_component c
ON w.id = c.widget_id
WHERE w.mfg_id = 123)
The IN sub query finds all the component ids that a specific manufacturer uses. The outer query then selects any component.id that is that result. It doesn't matter if its in there once or 1000 times it will only get the component record once.
The other ways of doing this are using an EXISTS sub query or using a join to the query (but then you do need to de-dup it)
It sounds like your component -to- widget relationship is one-to-many. Hence the duplicates. (i.e., the same component is used by more than one widget).
Your Select is almost OK --
SELECT c.*, wc.widget_id, w.mfg_id
but the wc.widget_id is causing the duplicates (per the assumption above).
So remove wc.widget_id from the SELECT, or else aggregate it (min, max, count, etc.). Removing is easier. If you agregate, remember to add a group by clause.
Try this:
SELECT DISTINCT c.*, w.mfg_id
Also -- FWIW, it's generally a better practice to use field names, instead of the *

SQL multiple join on many to many tables + comma separation

I have these tables:
media
id (int primary key)
uri (varchar).
media_to_people
media_id (int primary key)
people_id (int primary key)
people
id (int primary key)
name (varchar)
role (int) -- role specifies whether the person is an artist, publisher, writer, actor,
etc relative to the media and has range(1-10)
This is a many to many relation
I want to fetch a media and all its associated people in a select. So if a media has 10 people associated with it, all 10 must come.
Further more, if multiple people with the same role exist for a given media, they must come as comma separated values under a column for that role.
Result headings must look like: media.id, media.uri, people.name(actor), people.name(artist), people.name(publisher) and so on.
I'm using sqlite.
SQLite doesn't have the "pivot" functionality you'd need for starters, and the "comma separated values" part is definitely a presentation issue that it would be absurd (and possibly unfeasible) to try to push into any database layer, whatever dialect of SQL may be involved -- it's definitely a part of the job you'd do in the client, e.g. a reporting facility or programming language.
Use SQL for data access, and leave presentation to other layers.
How you get your data is
SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON (media.id = media_to_people.media_id)
JOIN people ON (media_to_people.people_id = people.id)
WHERE media.id = ?
ORDER BY people.role, people.name
(the ? is one way to indicate a parameter in SQLite, to be bound to the specific media id you're looking for in ways that depend on your client); the data will come from the DB to your client code in several rows, and your client code can easily put them into the single column form that you want.
It's hard for us to say how to code the client-side part w/o knowing anything about the environment or language you're using as the client. But in Python for example:
def showit(dataset):
by_role = collections.defaultdict(list)
for mediaid, mediauri, name, role in dataset:
by_role[role].append(name)
headers = ['mediaid', 'mediauri']
result = [mediaid, mediauri]
for role in sorted(by_role):
headers.append('people(%s)' % role)
result.append(','.join(by_role[role]))
return ' '.join(headers) + '\n' + ' '.join(result)
even this won't quite match your spec -- you ask for headers such as 'people(artist)' while you also specify that the role's encoded as an int, and mention no way to go from the int to the string 'artist', so it's obviously impossible to match your spec exactly... but it's as close as my ingenuity can get;-).
I agree with Alex Martelli's answer, that you should get the data in multiple rows and do some processing in your application.
If you try to do this with just joins, you need to join to the people table for each role type, and if there are multiple people in each role, your query will have Cartesian products between these roles.
So you need to do this with GROUP_CONCAT() and produce a scalar subquery in your select-list for each role:
SELECT m.id, m.uri,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 1) AS Actors,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 2) AS Artists,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 3) AS Publishers
FROM media m;
This is truly ugly! Don't try this at home!
Take our advice, and don't try to format the pivot table using only SQL.

Django: Order a model by a many-to-many field

I am writing a Django application that has a model for People, and I have hit a snag. I am assigning Role objects to people using a Many-To-Many relationship - where Roles have a name and a weight. I wish to order my list of people by their heaviest role's weight. If I do People.objects.order_by('-roles__weight'), then I get duplicates when people have multiple roles assigned to them.
My initial idea was to add a denormalized field called heaviest-role-weight - and sort by that. This could then be updated every time a new role was added or removed from a user. However, it turns out that there is no way to perform a custom action every time a ManyToManyField is updated in Django (yet, anyway).
So, I thought I could then go completely overboard and write a custom field, descriptor and manager to handle this - but that seems extremely difficult when the ManyRelatedManager is created dynamically for a ManyToManyField.
I have been trying to come up with some clever SQL that could do this for me - I'm sure it's possible with a subquery (or a few), but I'd be worried about it not being compatible will all the database backends Django supports.
Has anyone done this before - or have any ideas how it could be achieved?
Django 1.1 (currently beta) adds aggregation support. Your query can be done with something like:
from django.db.models import Max
People.objects.annotate(max_weight=Max('roles__weight')).order_by('-max_weight')
This sorts people by their heaviest roles, without returning duplicates.
The generated query is:
SELECT people.id, people.name, MAX(role.weight) AS max_weight
FROM people LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
GROUP BY people.id, people.name
ORDER BY max_weight DESC
Here's a way to do it without an annotation:
class Role(models.Model):
pass
class PersonRole(models.Model):
weight = models.IntegerField()
person = models.ForeignKey('Person')
role = models.ForeignKey(Role)
class Meta:
# if you have an inline configured in the admin, this will
# make the roles order properly
ordering = ['weight']
class Person(models.Model):
roles = models.ManyToManyField('Role', through='PersonRole')
def ordered_roles(self):
"Return a properly ordered set of roles"
return self.roles.all().order_by('personrole__weight')
This lets you say something like:
>>> person = Person.objects.get(id=1)
>>> roles = person.ordered_roles()
Something like this in SQL:
select p.*, max (r.Weight) as HeaviestWeight
from persons p
inner join RolePersons rp on p.id = rp.PersonID
innerjoin Roles r on rp.RoleID = r.id
group by p.*
order by HeaviestWeight desc
Note: group by p.* may be disallowed by your dialect of SQL. If so, just list all the columns in table p that you intend to use in the select clause.
Note: if you just group by p.ID, you won't be able to call for the other columns in p in your select clause.
I don't know how this interacts with Django.