GROUP BY many to many with itself - sql

I have a User table that has a many-to-many relation with itself, and I would like to get all the pairs of users with this specific relation. The problem is that the relation table stores users like this:
User (id, name, ...)
relation (left_user_id, right_user_id, ...)
So when I do a basic
SELECT count(*)
FROM relation LEFT OUTER JOIN user AS user_1 ON user_1.id = relation.left_user_id
LEFT OUTER JOIN user AS user_2 ON user_2.id = relation.right_user_id
GROUP BY left_user_id, right_user_id;
I sometimes get two results for the same pair (for example sometimes (Adam, Eva) and (Eva, Adam) which are the same pair). What I would like to achieve is just one pair: (Adam, Eva).
How can this be achieved?

You can use the functions least() and greatest():
SELECT count(*)
FROM relation r
LEFT OUTER JOIN user AS user_1 ON user_1.id = r.left_user_id
LEFT OUTER JOIN user AS user_2 ON user_2.id = r.right_user_id
GROUP BY LEAST(r.left_user_id, r.right_user_id), GREATEST(r.left_user_id, r.right_user_id);
Or in this case where you don't need the joins:
SELECT count(*)
FROM relation
GROUP BY LEAST(left_user_id, right_user_id), GREATEST(left_user_id, right_user_id);

The left joins should not be necessary. The key is simply using least() and greatest(). That would be:
SELECT LEAST(r.left_user_id, r.right_user_id) as user_id_1,
GREATEST(r.left_user_id, r.right_user_id) as user_id_2,
COUNT(*)
FROM relation r
GROUP BY user_id_1, user_id_2;
The one caveat with this approach is that the pair in the result set may not be in the original data -- in that order. So, if you had "Eve"/"Adam" once in the data, then it would return: "Adam"/"Eve"/1. That can be addressed, if necessary.
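As a rough illustration of the normalisation described above, here is a minimal sketch run against an in-memory SQLite database from Python. SQLite has no LEAST()/GREATEST(), so its two-argument MIN()/MAX() scalar functions stand in for them; the sample rows and the extra join back to user for the names are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relation (left_user_id INTEGER, right_user_id INTEGER);
    INSERT INTO user VALUES (1, 'Adam'), (2, 'Eva'), (3, 'Noah');
    -- (1, 2) and (2, 1) describe the same pair
    INSERT INTO relation VALUES (1, 2), (2, 1), (3, 1);
""")

# Two-argument MIN/MAX are SQLite's equivalent of LEAST/GREATEST.
# The smaller id always lands in user_id_1, so both orderings share a group.
pairs = conn.execute("""
    SELECT u1.name, u2.name, p.n
    FROM (SELECT MIN(left_user_id, right_user_id) AS user_id_1,
                 MAX(left_user_id, right_user_id) AS user_id_2,
                 COUNT(*) AS n
          FROM relation
          GROUP BY user_id_1, user_id_2) AS p
    JOIN user AS u1 ON u1.id = p.user_id_1
    JOIN user AS u2 ON u2.id = p.user_id_2
    ORDER BY p.user_id_1, p.user_id_2
""").fetchall()

print(pairs)
```

Note that the row stored as (3, 1) comes back as Adam/Noah, which is the ordering caveat mentioned above.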

Related

In Oracle SQL is there a way to join on a value twice?

Let's say I have two tables, with the following columns:
cars
motorcycle_id | fuel_id | secondary_fuel_id ...
fuel_types
fuel_id | fuel_label | ...
In this case fuel_id and secondary_fuel_id both refer to the fuel_types table.
Is it possible to include both labels in an inner join? I want to join on the fuel_id but I want to be able to have the fuel label twice as a new column. So the join would be something like:
motorcycle_id | fuel_id | fuel_label | secondary_fuel_id | secondary_fuel_label | ...
In this case I would have created the secondary_fuel_label column.
Is this possible to do in SQL with joins? Is there another way to accomplish this?
You would just join twice:
select c.*, f1.fuel_label, f2.fuel_label as secondary_fuel_label
from cars c left join
fuel_types f1
on c.fuel_id = f1.fuel_id left join
fuel_types f2
on c.secondary_fuel_id = f2.fuel_id;
The key point here is to use table aliases, so you can distinguish between the two table references to fuel_types.
Note that this uses left join to be sure that rows are returned even if one of the ids is missing.
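A small sketch of the double join, run on SQLite from Python rather than Oracle (the join syntax itself is the same). The sample fuel labels and rows are invented for the example; the second car has no secondary fuel, which shows why the left join matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fuel_types (fuel_id INTEGER PRIMARY KEY, fuel_label TEXT);
    CREATE TABLE cars (motorcycle_id INTEGER PRIMARY KEY,
                       fuel_id INTEGER,
                       secondary_fuel_id INTEGER);
    INSERT INTO fuel_types VALUES (1, 'petrol'), (2, 'electric');
    -- car 11 has no secondary fuel
    INSERT INTO cars VALUES (10, 1, 2), (11, 1, NULL);
""")

# fuel_types is referenced twice; the aliases f1 and f2 keep the two
# lookups apart, one per foreign-key column.
rows = conn.execute("""
    SELECT c.motorcycle_id,
           f1.fuel_label,
           f2.fuel_label AS secondary_fuel_label
    FROM cars c
    LEFT JOIN fuel_types f1 ON c.fuel_id = f1.fuel_id
    LEFT JOIN fuel_types f2 ON c.secondary_fuel_id = f2.fuel_id
    ORDER BY c.motorcycle_id
""").fetchall()

print(rows)
```

With an inner join on f2, car 11 would drop out of the result entirely.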

With SQL, how can I find non-matches in a single table many-to-many relation?

I have a database that, among other things, records the results of reactions between ingredients. It's currently structured with the following three tables:
| Material |
|----------------|
| id : Integer |
| name : Varchar |
| Reaction |
|-----------------|
| id : Integer |
| <other details> |
| Ingredient |
|-----------------------|
| material_id : Integer |
| reaction_id : Integer |
| quantity : Real |
This maps the many-to-many relationship between materials and reactions.
I would like to run a query that returns every pair of materials that do not form a reaction. (i.e. every pair (x, y) such that there is no reaction that uses exactly x and y and no other materials.) In other circumstances, I would do this with a LEFT JOIN onto the intermediate table and then look for NULL reaction_ids. In this case, I'm getting the pairs by doing a CROSS JOIN on the materials table and itself, but I'm not sure how (or whether) doing two LEFT JOINs onto the two materials aliases can work.
How can this be done?
I'm most interested in a generic SQL approach, but I'm currently using SQLite3 with SQLAlchemy. I have the option of moving the database to PostgreSQL, but SQLite is strongly preferred.
Use a cross join to generate the list and then remove the pairs that are in the same reaction.
select m.id, m2.id as id2
from materials m cross join
materials m2
where not exists (select 1
from ingredient i join
ingredient i2
on i.reaction_id = i2.reaction_id and
i.material_id = m.id and
i2.material_id = m2.id
);
Although this query looks complicated, it is essentially a direct translation of your question. The where clause is saying that there are not two ingredients for the same reaction that have each of the materials.
For performance, you want an index on ingredient(reaction_id, material_id).
EDIT:
If you like, you can do this without an exists, using left joins with group by and having:
select m.id, m2.id as id2
from materials m cross join
materials m2 left join
ingredient i
on i.material_id = m.id left join
ingredient i2
on i2.material_id = m2.id and
i2.reaction_id = i.reaction_id
group by m.id, m2.id
having count(i2.reaction_id) = 0;
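Since the question mentions SQLite, here is a minimal sketch of the NOT EXISTS approach run on an in-memory SQLite database from Python. The sample materials and the single reaction are invented, and the extra m.id < m2.id filter is my own assumption: it lists each unordered pair once and skips pairing a material with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE materials (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ingredient (material_id INTEGER, reaction_id INTEGER);
    INSERT INTO materials VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- reaction 100 uses A and B; C takes part in no reaction
    INSERT INTO ingredient VALUES (1, 100), (2, 100);
    CREATE INDEX ingredient_reaction_material
        ON ingredient (reaction_id, material_id);
""")

# Keep a pair only if no reaction contains both of its materials.
non_reacting = conn.execute("""
    SELECT m.id, m2.id AS id2
    FROM materials m CROSS JOIN materials m2
    WHERE m.id < m2.id
      AND NOT EXISTS (SELECT 1
                      FROM ingredient i
                      JOIN ingredient i2
                        ON i.reaction_id = i2.reaction_id
                      WHERE i.material_id = m.id
                        AND i2.material_id = m2.id)
    ORDER BY m.id, m2.id
""").fetchall()

print(non_reacting)
```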

'Implicit' JOIN based on schema's foreign keys?

Hello all :) I'm wondering if there is a way to tell the database to look at the schema and infer the JOIN predicate:
+--------------+ +---------------+
| prices | | products |
+--------------+ +---------------+
| price_id (PK)| |-1| product_id(PK)|
| prod_id |*-| | weight |
| shop | +---------------+
| unit_price |
| qty |
+--------------+
Is there a way (preferably in Oracle 10g) to go from:
SELECT * FROM prices JOIN products ON prices.prod_id = products.product_id
to:
SELECT * FROM prices IMPLICIT JOIN products
The closest you can get to not writing the actual join condition is a natural join.
select * from t1 natural join t2
Oracle will look for columns with identical names and join on them (your tables have no such pair, since the columns are named prod_id and product_id). See the documentation on the SELECT statement:
A natural join is based on all columns in the two tables that have the same name. It selects rows from the two tables that have equal values in the relevant columns. If two columns with the same name do not have compatible data types, then an error is raised
This is very poor practice and I strongly recommend not using it in any environment.
You shouldn't do that. Some db systems allow you to, but what happens if you modify the FKs (e.g. add foreign keys)? You should always state what to join on to avoid problems. Most db systems won't even allow you to do an implicit join though (good!).
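To illustrate why relying on a natural join is fragile, here is a small sketch on SQLite from Python (not Oracle). It assumes the foreign key in prices has been renamed to product_id so that a natural join applies at all; the sample rows and the later-added shop column on products are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, weight REAL);
    -- assumption: the FK column is named product_id, not prod_id
    CREATE TABLE prices (price_id INTEGER PRIMARY KEY, product_id INTEGER,
                         shop TEXT, unit_price REAL);
    INSERT INTO products VALUES (1, 0.5), (2, 1.2);
    INSERT INTO prices VALUES (1, 1, 'north', 3.0), (2, 2, 'south', 4.5);
""")

natural = "SELECT COUNT(*) FROM prices NATURAL JOIN products"
before = conn.execute(natural).fetchone()[0]

# A later schema change gives both tables a column called shop.
conn.executescript("""
    ALTER TABLE products ADD COLUMN shop TEXT;
    UPDATE products SET shop = 'warehouse';
""")

# The natural join now silently matches on product_id AND shop.
after = conn.execute(natural).fetchone()[0]

# The explicit join condition is unaffected by the new column.
explicit = conn.execute("""
    SELECT COUNT(*)
    FROM prices JOIN products ON prices.product_id = products.product_id
""").fetchone()[0]

print(before, after, explicit)
```

The query text never changed, yet the natural join stopped returning rows once an unrelated same-named column appeared.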

What kind of SQL join do I need to compress a One to Many relationship into the same view row?

Edit: this isn't to be a dynamic output, the output view structure is fixed.
I am trying to create a SQL Server view that shows a single fixed column row for each user, and flattens out an associated one to many table into that row.
Although the associated table has a one-to-many relationship, the output table structure is limited to 4 elements from that table.
My table structure is like so:
User (Id, FirstName, LastName)
Assessment (Id, Date, Location, User_Id)
Topics (Id, Topic, Assessment_Id)
Where the Assessment is joined to the User by the User_Id (one-to-one), and the Topics are joined to the Assessment by the Assessment_Id.
So, if I have three topics for an assessment, I'd want the view to look something like:
User_Id | FirstName | LastName | Date | Location | Topic1 | Topic2 | Topic3 | Topic4 |
1 | dave | toby | 2/2/11 | In situ | apples | pears | lemons | NULL |
My current SQL looks like this:
SELECT User.Id, User.FirstName, User.LastName, Assessment.Date, Assessment.Location, Topic.Topic
FROM User LEFT OUTER JOIN
Assessment INNER JOIN
Topic ON Assessment.Id = Topic.Assessment_Id ON
User.Id = Assessment.User_Id
But this returns a row for each concern - it doesn't compress them to one line. I've played with a few different joins, but haven't been able to get the behaviour I want.
Is it possible to do this in a view?
What do I need to do to make it happen??
Thanks!
There is no such JOIN. SQL has a fixed column output, so you can't add an arbitrary number of columns. It doesn't matter whether it's a view, a direct query, or a stored procedure.
There are 2 main options:
1. Concatenate the many rows into one column, which is a popular question here on SO. One solution uses XML PATH.
2. Use dynamic SQL to add a column per row in a stored procedure.
Note: PIVOT is fixed column output too.
Edit: for a maximum of 4 child rows
SELECT
P.col1, P.col2,
C1.col1 AS Topic1,
C2.col1 AS Topic2,
C3.col1 AS Topic3,
C4.col1 AS Topic4
FROM
Parent P
LEFT JOIN
Child C1 ON P.Key = C1.FKey AND C1.ID = 1
LEFT JOIN
Child C2 ON P.Key = C2.FKey AND C2.ID = 2
LEFT JOIN
Child C3 ON P.Key = C3.FKey AND C3.ID = 3
LEFT JOIN
Child C4 ON P.Key = C4.FKey AND C4.ID = 4
You can use PIVOT too but I prefer the simpler self joins.
Take a look at PIVOT table functionality - e.g. http://www.help-sql.info/27/9/610208.html and http://blog.sqlauthority.com/2008/05/22/sql-server-pivot-table-example/
Although you will need to know the AssessmentId's before you can write the PIVOT
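The self-join sketch above assumes the child rows are already numbered 1 to 4 within each parent. When they are not, one possible variation is to number them with ROW_NUMBER() and fold them into columns with conditional aggregation. The sketch below is run on SQLite from Python (window functions need SQLite 3.25 or later); SQL Server also has ROW_NUMBER() and CASE. The sample topics are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, topic TEXT,
                         assessment_id INTEGER);
    INSERT INTO topics VALUES
        (7, 'apples', 1), (8, 'pears', 1), (9, 'lemons', 1),
        (12, 'plums', 2);
""")

# Number the topics within each assessment, then pick slot 1..4 per column.
# MAX() ignores NULLs, so an empty slot stays NULL.
rows = conn.execute("""
    SELECT assessment_id,
           MAX(CASE WHEN rn = 1 THEN topic END) AS topic1,
           MAX(CASE WHEN rn = 2 THEN topic END) AS topic2,
           MAX(CASE WHEN rn = 3 THEN topic END) AS topic3,
           MAX(CASE WHEN rn = 4 THEN topic END) AS topic4
    FROM (SELECT assessment_id, topic,
                 ROW_NUMBER() OVER (PARTITION BY assessment_id
                                    ORDER BY id) AS rn
          FROM topics) AS numbered
    GROUP BY assessment_id
    ORDER BY assessment_id
""").fetchall()

print(rows)
```

Any topic beyond the fourth is simply ignored, which matches the fixed four-column output asked for.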

Select all items in a table that do not appear in a foreign key of another table

Take for example an application which has users, each of which can be in exactly one group. If we want to SELECT the list of groups which have no members, what would be the correct SQL? I keep feeling like I'm just about to grasp the query, and then it disappears again.
Bonus points - given the alternative scenario, where it's a many to many pairing, what is the SQL to identify unused groups?
(if you want concrete field names:)
One-To-Many:
Table 'users': | user_id | group_id |
Table 'groups': | group_id |
Many-To-Many:
Table 'users': | user_id |
Table 'groups': | group_id |
Table 'user-group': | user_id | group_id |
Groups that have no members (for the many-many pairing):
SELECT *
FROM groups g
WHERE NOT EXISTS
(
SELECT 1
FROM users_groups ug
WHERE g.group_id = ug.group_id
);
This SQL will also work in your "first" example, as you can substitute "users" for "users_groups" in the subquery =)
As far as performance is concerned, I know that this query can perform well on SQL Server, but I'm not sure how well MySQL handles it.
For the first one, try this:
SELECT * FROM groups
LEFT JOIN users ON (groups.group_id=users.group_id)
WHERE users.user_id IS NULL;
For the second one, try this (the hyphenated table name has to be quoted):
SELECT * FROM groups
LEFT JOIN "user-group" ug ON (groups.group_id=ug.group_id)
WHERE ug.user_id IS NULL;
SELECT *
FROM groups
WHERE groups.group_id NOT IN (
SELECT users.group_id
FROM users
)
It will return every group whose group_id is not present in users.
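For comparison, here is a minimal sketch of the three variants for the one-to-many case, run on an in-memory SQLite database from Python. The sample rows are invented, including one user without a group, which is there on purpose: NOT IN returns nothing once the subquery yields a NULL, so that variant needs an IS NOT NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "groups" is quoted because GROUPS is a keyword in some dialects
    CREATE TABLE "groups" (group_id INTEGER PRIMARY KEY);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, group_id INTEGER);
    INSERT INTO "groups" VALUES (1), (2), (3);
    -- user 12 belongs to no group
    INSERT INTO users VALUES (10, 1), (11, 1), (12, NULL);
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

not_exists = ids("""
    SELECT g.group_id FROM "groups" g
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.group_id = g.group_id)
    ORDER BY g.group_id
""")

left_join = ids("""
    SELECT g.group_id FROM "groups" g
    LEFT JOIN users u ON u.group_id = g.group_id
    WHERE u.user_id IS NULL
    ORDER BY g.group_id
""")

# The NULL group_id makes every NOT IN comparison unknown: no rows.
not_in = ids("""
    SELECT group_id FROM "groups"
    WHERE group_id NOT IN (SELECT group_id FROM users)
""")

# Filtering the NULLs out of the subquery restores the expected result.
not_in_filtered = ids("""
    SELECT group_id FROM "groups"
    WHERE group_id NOT IN (SELECT group_id FROM users
                           WHERE group_id IS NOT NULL)
    ORDER BY group_id
""")

print(not_exists, left_join, not_in, not_in_filtered)
```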