SQL problem - select across multiple tables (user groups)

I have a db schema which looks something like this:
create table user (id int, name varchar(32));
create table `group` (id int, name varchar(32));
create table group_member (group_id int, user_id int, flag int);
I want to write a query that allows me to do the following:
Given a valid user id (UID), fetch the ids of all users that are in the same group as the specified user id (UID) AND have group_member.flag=3.
Rather than just having the SQL, I want to learn how to think like a DB programmer. As a coder, SQL is my weakest link (since I am far more comfortable with imperative languages than declarative ones), but I want to change that.
Anyway, here are the steps I have identified as necessary to break down the task. I would be grateful if an SQL guru could demonstrate the simple SQL statements, i.e. atomic SQL statements, one for each of the subtasks identified below, and then show how I can combine those statements into the ONE statement that implements the required functionality.
Here goes (assume specified user_id [UID] = 1):
//Subtask #1.
Fetch list of all groups of which I am a member
Select group.id from user inner join group_member where user.id=group_member.user_id and user.id=1
//Subtask #2
Fetch a list of all members who are members of the groups I am a member of (i.e. groups in subtask #1)
Not sure about this ...
select user.id from user, group_member gm1, group_member gm2, ... [Stuck]
//Subtask #3
Get list of users that satisfy criteria group_member.flag=3
Select user.id from user inner join group_member where user.id=group_member.user_id and user.id=1 and group_member.flag=3
Once I have the SQL for subtask #2, I'd then like to see how the complete SQL statement is built from these subtasks (you don't have to use the SQL in the subtasks; it's just a way of explaining the steps involved. Also, my SQL may be incorrect or inefficient; if so, please feel free to correct it and point out what was wrong with it).
Thanks

Query 1: Select all groups I am a member of.
You don't need a join here unless you also want the groups' names. Just check the group_member table.
SELECT group_id
FROM group_member
WHERE user_id = 1
Result:
1
3
Query 2: Select all users in one of the same groups as me.
You can self-join the group_member table to pair up users who share a group, and then add a WHERE clause to keep only the pairs where one side is you. Add DISTINCT so that people who share more than one group with you aren't listed twice.
SELECT DISTINCT T2.user_id
FROM group_member AS T1
JOIN group_member AS T2
ON T1.group_id = T2.group_id
WHERE T1.user_id = 1
AND T2.user_id <> 1 -- Remove myself
Result:
2
3
5
Query 3: Users who have flag 3 in any group.
You just need to check the group_member table. Again, add DISTINCT if you only want to see each user once.
SELECT DISTINCT user_id
FROM group_member
WHERE group_member.flag=3
Result:
2
3
4
Final query: Users in the same group as me who have flag 3.
This is almost the same as Query 2; just add an extra WHERE condition.
SELECT DISTINCT T2.user_id
FROM group_member AS T1
JOIN group_member AS T2
ON T1.group_id = T2.group_id
WHERE T1.user_id = 1
AND T2.user_id <> 1 -- Remove myself
AND T2.flag = 3
Result:
2
3
Test data:
create table user (id int, name varchar(32));
create table `group` (id int, name varchar(32));
create table group_member (group_id int, user_id int, flag int);
insert into user (id, name) VALUES (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4'), (5, 'user5');
insert into `group` (id, name) VALUES (1, 'group1'),(2, 'group2'), (3, 'group3');
insert into group_member (group_id, user_id, flag) VALUES (1, 1, 0), (1, 2, 3), (1, 3, 3), (2, 3, 3), (2, 4, 3), (2, 5, 0), (3, 1, 0), (3, 5, 0);
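To check the four results above without a MySQL server, here is a minimal sketch that loads the same test data into an in-memory SQLite database through Python's standard sqlite3 module. SQLite standing in for MySQL is an assumption of the sketch: "group" is quoted with double quotes instead of backticks, and the hard-coded user id 1 becomes the named parameter :uid. The helper first_column is made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. SQLite accepts double-quoted
# identifiers, so "group" is quoted that way instead of with backticks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INT, name VARCHAR(32));
CREATE TABLE "group" (id INT, name VARCHAR(32));
CREATE TABLE group_member (group_id INT, user_id INT, flag INT);
INSERT INTO user (id, name) VALUES
  (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4'), (5, 'user5');
INSERT INTO "group" (id, name) VALUES (1, 'group1'), (2, 'group2'), (3, 'group3');
INSERT INTO group_member (group_id, user_id, flag) VALUES
  (1, 1, 0), (1, 2, 3), (1, 3, 3), (2, 3, 3),
  (2, 4, 3), (2, 5, 0), (3, 1, 0), (3, 5, 0);
""")


def first_column(sql, **params):
    """Hypothetical helper: run a query with named parameters, return its first column sorted."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Query 1: groups the user belongs to.
my_groups = first_column(
    "SELECT group_id FROM group_member WHERE user_id = :uid", uid=1)

# Query 2: other users who share at least one group with the user (self-join).
same_group = first_column("""
    SELECT DISTINCT T2.user_id
    FROM group_member AS T1
    JOIN group_member AS T2 ON T1.group_id = T2.group_id
    WHERE T1.user_id = :uid AND T2.user_id <> :uid""", uid=1)

# Query 3: users with flag 3 in any group.
flag_three = first_column(
    "SELECT DISTINCT user_id FROM group_member WHERE flag = 3")

# Final query: Query 2 plus the flag condition on the other side of the join.
final = first_column("""
    SELECT DISTINCT T2.user_id
    FROM group_member AS T1
    JOIN group_member AS T2 ON T1.group_id = T2.group_id
    WHERE T1.user_id = :uid AND T2.user_id <> :uid AND T2.flag = 3""", uid=1)

print(my_groups, same_group, flag_three, final)
```

Running it prints [1, 3] [2, 3, 5] [2, 3, 4] [2, 3], matching the results listed above.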

This should find all people in the same group as a certain user, with the specified flag.
SELECT DISTINCT g2.user_id
FROM group_member AS g INNER JOIN group_member AS g2
ON g.group_id = g2.group_id
WHERE g.user_id = <the userid you want to find>
AND g2.flag = 3
To approach the problem, I did the following:
We need to compare two group_members, as we want to know which ones are in the same group, so we'll need to join group_member with itself. (On group_id, as we want the ones in the same group.)
I want to make sure that one of them is the user I want to compare it to, and then the other one must have the correct flag.
Once this is done, I simply need to pull out the user_id from the other side of the join (g2). Since a user might share more than one group with the original user, it is also wise to add a DISTINCT so that each user_id is returned only once.
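If, like the asker, you think more naturally in imperative code, it may help to read the self-join as two nested loops over the same table. The sketch below (Python; the data is the first answer's test data hard-coded as tuples, and both function names are hypothetical) computes the result both ways. It is only a mental model: the SQL describes the result, and the database is free to choose a much better plan than these loops.

```python
import sqlite3

# (group_id, user_id, flag) rows, copied from the test data in the first answer.
GROUP_MEMBER = [
    (1, 1, 0), (1, 2, 3), (1, 3, 3), (2, 3, 3),
    (2, 4, 3), (2, 5, 0), (3, 1, 0), (3, 5, 0),
]


def same_group_with_flag_loops(uid, flag=3):
    """Hypothetical helper: imperative reading of the self-join."""
    result = set()  # the set does the job of DISTINCT
    for g_group, g_user, _ in GROUP_MEMBER:              # FROM group_member AS g
        if g_user != uid:                                 # WHERE g.user_id = uid
            continue
        for g2_group, g2_user, g2_flag in GROUP_MEMBER:   # JOIN group_member AS g2
            if g2_group == g_group and g2_flag == flag:   # ON same group AND g2.flag = 3
                result.add(g2_user)                       # SELECT g2.user_id
    return sorted(result)


def same_group_with_flag_sql(uid, flag=3):
    """Hypothetical helper: the answer's query, run against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_member (group_id INT, user_id INT, flag INT)")
    conn.executemany("INSERT INTO group_member VALUES (?, ?, ?)", GROUP_MEMBER)
    rows = conn.execute("""
        SELECT DISTINCT g2.user_id
        FROM group_member AS g INNER JOIN group_member AS g2
          ON g.group_id = g2.group_id
        WHERE g.user_id = ? AND g2.flag = ?""", (uid, flag)).fetchall()
    conn.close()
    return sorted(r[0] for r in rows)


print(same_group_with_flag_loops(1), same_group_with_flag_sql(1))
```

Unlike the first answer's final query, this version does not exclude the user himself: for user 3, who has flag 3, both functions return [2, 3, 4]. Adding AND g2.user_id <> g.user_id removes him if that is not wanted.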

The beauty of SQL is that you can join these three selects together with one efficient join. In this case I used the WHERE version of a join because I find it easier to understand. But you might also look at the syntax for LEFT JOIN and INNER JOIN because they give you a lot of expressivity.
SELECT * FROM user, group_member, `group`
WHERE user.id = group_member.user_id
AND group_member.group_id = `group`.id
AND group_member.flag = 3
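The comma-separated FROM list with the join conditions in WHERE and the explicit INNER JOIN ... ON syntax describe the same join. A minimal sketch checking that on the first answer's test data, again using SQLite through Python's sqlite3 as a stand-in for MySQL (an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INT, name VARCHAR(32));
CREATE TABLE "group" (id INT, name VARCHAR(32));
CREATE TABLE group_member (group_id INT, user_id INT, flag INT);
INSERT INTO user VALUES (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4'), (5, 'user5');
INSERT INTO "group" VALUES (1, 'group1'), (2, 'group2'), (3, 'group3');
INSERT INTO group_member VALUES
  (1, 1, 0), (1, 2, 3), (1, 3, 3), (2, 3, 3),
  (2, 4, 3), (2, 5, 0), (3, 1, 0), (3, 5, 0);
""")

# The "WHERE version": tables listed with commas, join conditions in WHERE.
comma_join = conn.execute("""
    SELECT user.id, "group".id
    FROM user, group_member, "group"
    WHERE user.id = group_member.user_id
      AND group_member.group_id = "group".id
      AND group_member.flag = 3
    ORDER BY 1, 2""").fetchall()

# The same join written with explicit INNER JOIN ... ON clauses.
explicit_join = conn.execute("""
    SELECT user.id, "group".id
    FROM user
    INNER JOIN group_member ON user.id = group_member.user_id
    INNER JOIN "group" ON group_member.group_id = "group".id
    WHERE group_member.flag = 3
    ORDER BY 1, 2""").fetchall()

print(comma_join == explicit_join, comma_join)
```

Both forms return the same (user id, group id) pairs: users 2 and 3 in group 1, users 3 and 4 in group 2. As written, neither restricts the result to the groups of a particular user; that still needs the self-join on group_member shown in the other answers.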

Related

HAVING clause with subquery -- Checking if group has at least one row matching conditions

Suppose I have the following table
DROP TABLE IF EXISTS #toy_example
CREATE TABLE #toy_example
(
Id int,
Pet varchar(10)
);
INSERT INTO #toy_example
VALUES (1, 'dog'),
(1, 'cat'),
(1, 'emu'),
(2, 'cat'),
(2, 'turtle'),
(2, 'lizard'),
(3, 'dog'),
(4, 'elephant'),
(5, 'cat'),
(5, 'emu')
and I want to fetch all Ids that have certain pets (for example either cat or emu, so Ids 1, 2 and 5).
DROP TABLE IF EXISTS #Pets
CREATE TABLE #Pets
(
Animal varchar(10)
);
INSERT INTO #Pets
VALUES ('cat'),
('emu')
SELECT Id
FROM #toy_example
GROUP BY Id
HAVING COUNT(
CASE
WHEN Pet IN (SELECT Animal FROM #Pets)
THEN 1
END
) > 0
The above gives me the error "Cannot perform an aggregate function on an expression containing an aggregate or a subquery." I have two questions:
Why is this an error? If I instead hard-code the list in the HAVING clause, i.e. WHEN Pet IN ('cat','emu'), then it works. Is there a reason why SQL Server (I've checked with SQL Server 2017 and 2008) does not allow this?
What would be a nice way to do this? Note that the above is just a toy example. The real problem has many possible "Pets", which I do not want to hard code. It would be nice if the suggested method could check for multiple other similar conditions too in a single query.
If I followed you correctly, you can just join and aggregate:
select t.id, count(*) nb_of_matches
from #toy_example t
inner join #pets p on p.animal = t.pet
group by t.id
The inner join eliminates records from #toy_example that have no match in #pets. Then we aggregate by id and count how many records remain in each group.
If you want to retain records that have no match in #pets and display them with a count of 0, then you can left join instead:
select t.id, count(*) nb_of_records, count(p.animal) nb_of_matches
from #toy_example t
left join #pets p on p.animal = t.pet
group by t.id
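A minimal sketch of both queries, assuming SQLite (through Python's sqlite3) as a stand-in for SQL Server. SQLite has no #temp tables, so the sketch uses plain tables named toy_example and pets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: plain tables replace the #temp tables, which SQLite does not have.
conn.executescript("""
CREATE TABLE toy_example (Id INT, Pet VARCHAR(10));
INSERT INTO toy_example VALUES
  (1, 'dog'), (1, 'cat'), (1, 'emu'), (2, 'cat'), (2, 'turtle'),
  (2, 'lizard'), (3, 'dog'), (4, 'elephant'), (5, 'cat'), (5, 'emu');
CREATE TABLE pets (Animal VARCHAR(10));
INSERT INTO pets VALUES ('cat'), ('emu');
""")

# INNER JOIN: Ids without any matching pet disappear before GROUP BY.
inner = conn.execute("""
    SELECT t.Id, COUNT(*) AS nb_of_matches
    FROM toy_example t
    INNER JOIN pets p ON p.Animal = t.Pet
    GROUP BY t.Id
    ORDER BY t.Id""").fetchall()

# LEFT JOIN: every Id survives; COUNT(p.Animal) skips the NULLs of unmatched rows.
left = conn.execute("""
    SELECT t.Id, COUNT(*) AS nb_of_records, COUNT(p.Animal) AS nb_of_matches
    FROM toy_example t
    LEFT JOIN pets p ON p.Animal = t.Pet
    GROUP BY t.Id
    ORDER BY t.Id""").fetchall()

print(inner)
print(left)
```

The inner join keeps Ids 1, 2 and 5 with 2, 1 and 2 matches; the left join also keeps Ids 3 and 4 with 0 matches, because COUNT(p.Animal) ignores the NULLs produced for unmatched rows while COUNT(*) does not.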
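If you instead want the Ids that have all of the pets listed in #pets (here both cat and emu, so Ids 1 and 5), compare the number of distinct matching pets with the size of #pets: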
SELECT e.Id
FROM #toy_example e JOIN
#pets p
ON e.pet = p.animal
GROUP BY e.Id
HAVING COUNT(DISTINCT e.pet) = (SELECT COUNT(*) FROM #pets);
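The two answers solve different problems: the join-and-aggregate query finds Ids with any of the pets, while the comparison with (SELECT COUNT(*) FROM #pets) finds Ids with all of them. The sketch below contrasts the two on the toy data, again in SQLite through Python's sqlite3 (an assumption). For the "any" case it swaps in an EXISTS subquery, which is not from either answer and needs neither GROUP BY nor HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: plain tables replace the #temp tables, which SQLite does not have.
conn.executescript("""
CREATE TABLE toy_example (Id INT, Pet VARCHAR(10));
INSERT INTO toy_example VALUES
  (1, 'dog'), (1, 'cat'), (1, 'emu'), (2, 'cat'), (2, 'turtle'),
  (2, 'lizard'), (3, 'dog'), (4, 'elephant'), (5, 'cat'), (5, 'emu');
CREATE TABLE pets (Animal VARCHAR(10));
INSERT INTO pets VALUES ('cat'), ('emu');
""")

# Ids with ANY of the listed pets (swapped-in EXISTS variant; DISTINCT
# removes the repeat for Id 1, which has two matching pets).
any_pet = [r[0] for r in conn.execute("""
    SELECT DISTINCT t.Id
    FROM toy_example t
    WHERE EXISTS (SELECT 1 FROM pets p WHERE p.Animal = t.Pet)
    ORDER BY t.Id""")]

# Ids with ALL of the listed pets, as in the answer above (relational division).
all_pets = [r[0] for r in conn.execute("""
    SELECT e.Id
    FROM toy_example e
    JOIN pets p ON e.Pet = p.Animal
    GROUP BY e.Id
    HAVING COUNT(DISTINCT e.Pet) = (SELECT COUNT(*) FROM pets)
    ORDER BY e.Id""")]

print(any_pet, all_pets)
```

Id 2 has a cat but no emu, so it appears in the "any" list but not in the "all" list.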

What is the best way to join tables

This is more of a general question.
I am looking for the best way to join 4, maybe 5, different tables. I am trying to create a Power BI report pulling live information from an IBM AS400, where customer service can type in one of our part numbers,
see how many parts we have in inventory and, if there are none, see the lead time and whether any orders have already been entered for that part number.
SERI is our inventory table with 37180 records.
(active inventory that is available)
METHDM is our kit table with 37459 records.
(this table contains the bill of materials for custom kits; kit A123, for example, contains several part numbers, which are in SERI as well.)
STKA is our part lead time table with 76796 records.
(lead time means how long will it take for parts to come in)
OCRI is our sales order table with 6497 records.
(This table contains all customer orders)
I have some knowledge of writing queries, but this one is more challenging than anything I have created in the past. Should I start with the table that has the most records and left join the rest to it?
From STKA 76796 records
Left join METHDM 37459 records on STKA
left join SERI 37180 records on STKA
left join OCRI 6497 records on STKA
Select
STKA.v6part as part,
STKA.v6plnt as plant,
STKA.v6tdys as pur_leadtime,
STKA.v6prpt as Pur_PrepLeadtime,
STKA.v6lead as Mfg_leadtime,
STKA.v6prpt as Mfg_PrepLeadTime,
METHDM.AQMTLP AS COMPONENT,
METHDM.AQQPPC AS QTYNEEDED,
SERI.HTLOTN AS BATCH,
SERI.HTUNIT AS UOM,
(HTQTY - HTQTYC) as ONHAND,
OCRI.DDORD# AS SALESORDER,
OCRI.DDRDAT AS PROMISED
from stka
left join METHDM on STKA.V6PART = METHDM.AQPART
left join SERI on STKA.V6PART = SERI.HTPART
left join OCRI on STKA.V6PART = OCRI.DDPART
Is this the best way to join the tables?
I think you already have your answer, but conceptually there are a few issues to deal with here, so I figured I would give you a few examples, using data a little like yours but massively simplified.
CREATE TABLE #STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #METHDM (AQPART INT, KIT_ID INT, SOME_DATE DATETIME, OTHER_DATA VARCHAR(50));
CREATE TABLE #SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO #STKA SELECT 1, NULL UNION ALL SELECT 2, NULL UNION ALL SELECT 3, NULL; --1, 2, 3 Ids
INSERT INTO #METHDM SELECT 1, 1, '20200108 10:00', NULL UNION ALL SELECT 1, 2, '20200108 11:00', NULL UNION ALL SELECT 2, 1, '20200108 13:00', NULL; --1 Id appears twice, 2 Id once, no 3 Id
INSERT INTO #SERI SELECT 1, NULL UNION ALL SELECT 3, NULL; --1 and 3 Ids
INSERT INTO #OCRI SELECT 1, NULL UNION ALL SELECT 4, NULL; --1 and 4 Ids
So fundamentally we have a few issues here:
o the first problem is that the IDs in the tables differ, one table has an ID #4 but this isn't in any of the others;
o the second issue is that we have multiple rows for the same ID in one table;
o the third issue is that some tables are "missing" IDs that are in other tables, which you already covered by using LEFT JOINs, so I will ignore this.
--This will select ID 1 twice, 2 once, 3 once, and miss 4 completely
SELECT
*
FROM
#STKA
LEFT JOIN #METHDM ON #METHDM.AQPART = #STKA.V6PART
LEFT JOIN #SERI ON #SERI.HTPART = #STKA.V6PART
LEFT JOIN #OCRI ON #OCRI.DDPART = #STKA.V6PART;
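To see those row counts concretely, here is a minimal sketch that replays the simplified tables in SQLite through Python's sqlite3. Assumptions: SQLite has no #temp tables, so the # prefix is dropped, and SOME_DATE is stored as ISO-formatted text. It counts how often each part id appears in the joined result.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE METHDM (AQPART INT, KIT_ID INT, SOME_DATE TEXT, OTHER_DATA VARCHAR(50));
CREATE TABLE SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO STKA VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO METHDM VALUES (1, 1, '2020-01-08 10:00', NULL),
                          (1, 2, '2020-01-08 11:00', NULL),
                          (2, 1, '2020-01-08 13:00', NULL);
INSERT INTO SERI VALUES (1, NULL), (3, NULL);
INSERT INTO OCRI VALUES (1, NULL), (4, NULL);
""")

# STKA as the anchor, exactly as in the query above.
rows = conn.execute("""
    SELECT STKA.V6PART
    FROM STKA
    LEFT JOIN METHDM ON METHDM.AQPART = STKA.V6PART
    LEFT JOIN SERI ON SERI.HTPART = STKA.V6PART
    LEFT JOIN OCRI ON OCRI.DDPART = STKA.V6PART""").fetchall()

# How many times each part id appears in the joined result.
per_id = Counter(r[0] for r in rows)
print(dict(sorted(per_id.items())))
```

It reports {1: 2, 2: 1, 3: 1}: part 1 is doubled by its two METHDM rows, and part 4, which exists only in OCRI, never appears.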
So the problem here is that we don't have every ID in our "anchor" table STKA, and in fact there's no single table that has every ID in it. Now your data might be fine here, but if it isn't then you can simply add a step to find every ID, and use this as the anchor.
--This will select each ID, but still doubles up on ID 1
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN #METHDM ON #METHDM.AQPART = I.Id
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
That's using a common-table expression, but a subquery would also do the job. However, this still leaves us with an issue where ID 1 appears twice in the list, because it has multiple rows in one of the sub-tables.
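The same check for the union-of-ids anchor, under the same assumptions (SQLite through Python's sqlite3, # prefixes dropped, dates stored as text):

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE METHDM (AQPART INT, KIT_ID INT, SOME_DATE TEXT, OTHER_DATA VARCHAR(50));
CREATE TABLE SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO STKA VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO METHDM VALUES (1, 1, '2020-01-08 10:00', NULL),
                          (1, 2, '2020-01-08 11:00', NULL),
                          (2, 1, '2020-01-08 13:00', NULL);
INSERT INTO SERI VALUES (1, NULL), (3, NULL);
INSERT INTO OCRI VALUES (1, NULL), (4, NULL);
""")

# UNION (not UNION ALL) removes duplicates, so Ids holds each part id once.
rows = conn.execute("""
    WITH Ids AS (
        SELECT V6PART AS ID FROM STKA
        UNION SELECT AQPART FROM METHDM
        UNION SELECT HTPART FROM SERI
        UNION SELECT DDPART FROM OCRI)
    SELECT I.ID
    FROM Ids I
    LEFT JOIN STKA ON STKA.V6PART = I.ID
    LEFT JOIN METHDM ON METHDM.AQPART = I.ID
    LEFT JOIN SERI ON SERI.HTPART = I.ID
    LEFT JOIN OCRI ON OCRI.DDPART = I.ID""").fetchall()

per_id = Counter(r[0] for r in rows)
print(dict(sorted(per_id.items())))
```

Every id from 1 to 4 now appears, but part 1 still appears twice.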
One way to fix this is to pick the row with the latest date, or any other ORDER you can apply to the data:
--Pick the best row for the table where it has multiple rows, now we get one row per ID
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI),
BestMETHDM AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
FROM
#METHDM)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.Id AND BestMETHDM.ORDER_ID = 1
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
Of course you could also add some aggregation (SUM, MAX, MIN, AVG, etc.) to fix this problem (if it is indeed an issue). Also, I used a common-table expression, but this would work just as well with a subquery.
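A sketch of the ROW_NUMBER() version under the same assumptions (SQLite through Python's sqlite3, # prefixes dropped). Window functions need SQLite 3.25 or later, which recent Python builds normally ship; KIT_ID is selected only to show which METHDM row wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE METHDM (AQPART INT, KIT_ID INT, SOME_DATE TEXT, OTHER_DATA VARCHAR(50));
CREATE TABLE SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO STKA VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO METHDM VALUES (1, 1, '2020-01-08 10:00', NULL),
                          (1, 2, '2020-01-08 11:00', NULL),
                          (2, 1, '2020-01-08 13:00', NULL);
INSERT INTO SERI VALUES (1, NULL), (3, NULL);
INSERT INTO OCRI VALUES (1, NULL), (4, NULL);
""")

rows = conn.execute("""
    WITH Ids AS (
        SELECT V6PART AS ID FROM STKA
        UNION SELECT AQPART FROM METHDM
        UNION SELECT HTPART FROM SERI
        UNION SELECT DDPART FROM OCRI),
    BestMETHDM AS (
        -- number each part's METHDM rows, latest SOME_DATE first
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
        FROM METHDM)
    SELECT I.ID, BestMETHDM.KIT_ID
    FROM Ids I
    LEFT JOIN STKA ON STKA.V6PART = I.ID
    LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.ID AND BestMETHDM.ORDER_ID = 1
    LEFT JOIN SERI ON SERI.HTPART = I.ID
    LEFT JOIN OCRI ON OCRI.DDPART = I.ID
    ORDER BY I.ID""").fetchall()

print(rows)
```

Each id now appears once, and part 1 keeps KIT_ID 2, its row with the latest SOME_DATE.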
Expanding on a comment made on the question:
I would say I will start with SERI as that table contains the entire inventory for our facility and should cover the other tables
However the question said
SERI is our inventory table with 37180 records. (active inventory that is available)
In my experience, active inventory isn't the same as all parts.
Normally, in a query like this, I'd expect the first table to be a Parts Master table of some sort that contains every possible part ID.

Reuse result from subquery when creating a view

Is it possible to reuse the result of a subquery in a subsequent subquery when creating a view in PostgreSQL?
For example I have the following two tables:
CREATE TABLE application
(
id INT PRIMARY KEY,
name CHARACTER VARYING(255)
);
CREATE TABLE application_user
(
id INT PRIMARY KEY,
application_id INT REFERENCES application (id) ON DELETE CASCADE,
active BOOLEAN
);
-- some sample data
INSERT INTO application (id, name) VALUES
(10, 'application1'),
(20, 'application2'),
(30, 'application3');
INSERT INTO application_user (id, application_id, active) VALUES
(1, 10, true),
(2, 10, false),
(3, 20, false),
(4, 20, false),
(5, 20, false);
The view that I need looks (right now) as follows:
CREATE VIEW application_stats AS
SELECT a.name,
(SELECT COUNT(1) FROM application_user u
WHERE a.id = u.application_id) AS users,
(SELECT COUNT(1) FROM application_user u
WHERE a.id = u.application_id AND u.active = true) AS active_users
FROM application a;
This does give me the correct result:
name users active_users
application1 2 1
application2 3 0
application3 0 0
However, it is also pretty inefficient, since I'm running almost the same query twice; ideally I would like to reuse the result of the first one. Is there an efficient way to do this?
This would normally be expressed as a join/group by:
SELECT a.name, COUNT(au.application_id) AS users,
COALESCE(SUM((au.active = true)::int), 0) AS active_users
FROM application a LEFT JOIN
application_user au
ON a.id = au.application_id
GROUP BY a.id, a.name;
I'm rather surprised that application doesn't have a serial primary key. If you only need the application_id rather than the name, the join is not needed at all:
SELECT au.application_id, COUNT(*) as users,
SUM( (au.active = true)::int) as active_users
FROM application_user au
GROUP BY au.application_id;
This will only return applications that have at least one user.
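A runnable sketch of both queries, assuming SQLite (through Python's sqlite3) in place of PostgreSQL. SQLite has no ::int cast syntax, so the sketch counts active users with a portable CASE expression, and booleans are stored as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (id INT PRIMARY KEY, name VARCHAR(255));
CREATE TABLE application_user (
  id INT PRIMARY KEY,
  application_id INT REFERENCES application (id) ON DELETE CASCADE,
  active BOOLEAN);
INSERT INTO application VALUES (10, 'application1'), (20, 'application2'), (30, 'application3');
INSERT INTO application_user VALUES
  (1, 10, 1), (2, 10, 0), (3, 20, 0), (4, 20, 0), (5, 20, 0);
""")

# Join/group by: one pass instead of two correlated subqueries.
# CASE stands in for PostgreSQL's (au.active = true)::int.
with_join = conn.execute("""
    SELECT a.name,
           COUNT(au.application_id) AS users,
           SUM(CASE WHEN au.active THEN 1 ELSE 0 END) AS active_users
    FROM application a
    LEFT JOIN application_user au ON a.id = au.application_id
    GROUP BY a.id, a.name
    ORDER BY a.name""").fetchall()

# Without the join: only applications with at least one user show up.
without_join = conn.execute("""
    SELECT au.application_id,
           COUNT(*) AS users,
           SUM(CASE WHEN au.active THEN 1 ELSE 0 END) AS active_users
    FROM application_user au
    GROUP BY au.application_id
    ORDER BY au.application_id""").fetchall()

print(with_join)
print(without_join)
```

The join version reproduces the expected table, including 0 and 0 for application3, because CASE ... ELSE 0 turns the single NULL-extended row into 0. In PostgreSQL, SUM((au.active = true)::int) over that row yields NULL, so wrapping it in COALESCE(..., 0) may be needed. The version without the join only returns application ids 10 and 20.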
You should join the two tables, group by application_id and use count with a FILTER (WHERE ...) clause to count only the rows you want:
CREATE VIEW application_stats AS
SELECT a.name,
count(u.id) AS users,
count(*) FILTER (WHERE u.active) AS active_users
FROM application a
LEFT JOIN application_user u ON a.id = u.application_id
GROUP BY a.id;
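One detail is worth checking in queries like this: with a LEFT JOIN, an application without users still produces one joined row (with NULLs on the application_user side), so count(*) returns 1 for it while count(u.id) returns 0. A minimal sketch, again using SQLite through Python's sqlite3 as a stand-in for PostgreSQL; it avoids FILTER because older SQLite versions lack it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (id INT PRIMARY KEY, name VARCHAR(255));
CREATE TABLE application_user (
  id INT PRIMARY KEY,
  application_id INT REFERENCES application (id) ON DELETE CASCADE,
  active BOOLEAN);
INSERT INTO application VALUES (10, 'application1'), (20, 'application2'), (30, 'application3');
INSERT INTO application_user VALUES
  (1, 10, 1), (2, 10, 0), (3, 20, 0), (4, 20, 0), (5, 20, 0);
""")

# count(*) counts joined rows, including the all-NULL row a LEFT JOIN
# produces for application3; count(u.id) counts only real matches.
rows = conn.execute("""
    SELECT a.name, count(*) AS star, count(u.id) AS matched
    FROM application a
    LEFT JOIN application_user u ON a.id = u.application_id
    GROUP BY a.id, a.name
    ORDER BY a.name""").fetchall()

print(rows)
```

application3 gets 1 from count(*) but 0 from count(u.id), and the expected result in the question shows 0.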

Join on resultant table of another join without using subquery,CTE or temp tables

My question is: can we join table A to the result of an inner join of tables A and B without using a subquery, CTE or temp table?
I am using SQL Server.
I will explain the situation with an example
There are two tables, GoalScorers and GoalScoredDetails.
GoalScorers
gid Name
-----------
1 A
2 B
3 A
GoalScoredDetails
DetailId gid stadium goals Cards
---------------------------------------------
1 1 X 2 1
2 2 Y 5 2
3 3 Y 2 1
The result I expect: if I select stadium 'X' (or 'Y'),
I should get the names of all scorers, whether or not they scored there, along with the aggregated total goals and total cards.
NULL totals are acceptable for names with no goals or no cards there.
I can get the result I expect with the query below:
SELECT
gs.name,
SUM(goals) as TotalGoals,
SUM(cards) as TotalCards
FROM
(SELECT
gid, stadium, goals, cards
FROM
GoalScoredDetails
WHERE
stadium = 'Y') AS vtable
RIGHT OUTER JOIN
GoalScorers AS gs ON vtable.gid = gs.gid
GROUP BY
gs.name
My question is can we get the above result without using a subquery or CTE or temp table ?
Basically, what we need to do is OUTER JOIN GoalScorers to the virtual table that results from an INNER JOIN of GoalScorers and GoalScoredDetails.
But I always get an "ambiguous column name" error, because the gid column is present both in GoalScorers and in the resulting table. The error persists even if I use aliases for the column names.
I have created an SQL Fiddle for this here: http://sqlfiddle.com/#!3/40162/8
SELECT gs.name, SUM(gsd.goals) AS totalGoals, SUM(gsd.cards) AS totalCards
FROM GoalScorers gs
LEFT JOIN GoalScoredDetails gsd ON gsd.gid = gs.gid AND
gsd.Stadium = 'Y'
GROUP BY gs.name;
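In other words, you can push your WHERE criterion into the join condition. It has to go in the ON clause: filtering on gsd.Stadium in the WHERE clause would discard the scorers with no matching row, turning the LEFT JOIN back into an inner join.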
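The difference between filtering in ON and in WHERE is easy to see on the question's data. A minimal sketch in SQLite through Python's sqlite3, as an assumption standing in for SQL Server; SQLite compares strings case-sensitively by default, so the stadium values use the question's upper-case 'X' and 'Y'. The totals helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GoalScorers (gid INT, Name VARCHAR(1));
CREATE TABLE GoalScoredDetails (DetailId INT, gid INT, stadium VARCHAR(1), goals INT, Cards INT);
INSERT INTO GoalScorers VALUES (1, 'A'), (2, 'B'), (3, 'A');
INSERT INTO GoalScoredDetails VALUES (1, 1, 'X', 2, 1), (2, 2, 'Y', 5, 2), (3, 3, 'Y', 2, 1);
""")


def totals(stadium, filter_in_on=True):
    """Hypothetical helper: per-name totals, filtering the stadium in ON or in WHERE."""
    on_extra = "AND gsd.stadium = :s" if filter_in_on else ""
    where = "" if filter_in_on else "WHERE gsd.stadium = :s"
    # Only fixed SQL fragments are formatted in; the stadium value is bound as a parameter.
    sql = f"""
        SELECT gs.Name, SUM(gsd.goals), SUM(gsd.Cards)
        FROM GoalScorers gs
        LEFT JOIN GoalScoredDetails gsd ON gsd.gid = gs.gid {on_extra}
        {where}
        GROUP BY gs.Name
        ORDER BY gs.Name"""
    return conn.execute(sql, {"s": stadium}).fetchall()


print(totals("X"))
print(totals("X", filter_in_on=False))
```

For stadium 'X', the ON version returns A with 2 goals and 1 card plus B with NULL totals; the WHERE version drops B entirely.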
The error "Ambiguous column name 'ColumnName'" occurs when SQL Server encounters two or more columns with the same name and hasn't been told which one to use. You can avoid the error by prefixing your column names with either the full table name or an alias, if one is provided. The examples below use the following data:
Sample Data
CREATE TABLE #GoalScorers
(
gid INT,
Name VARCHAR(1)
)
;
CREATE TABLE #GoalScoredDetails
(
DetailId INT,
gid INT,
stadium VARCHAR(1),
goals INT,
Cards INT
)
;
INSERT INTO #GoalScorers
(
gid,
Name
)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'A')
;
INSERT INTO #GoalScoredDetails
(
DetailId,
gid,
stadium,
goals,
Cards
)
VALUES
(1, 1, 'x', 2, 1),
(2, 2, 'y', 5, 2),
(3, 3, 'y', 2, 1)
;
In this first example we receive the error. Why? Because there is more than one column called gid, SQL Server cannot tell which one to use.
Failed Example
SELECT
gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
This example works because we explicitly tell SQL which gid to return:
Working Example
SELECT
gs.gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
You can, of course, return both:
Example
SELECT
gs.gid,
gsd.gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
In multi-table queries I would always recommend prefixing every column name with a table name or alias. This makes the query easier to follow and reduces the likelihood of this sort of error.
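A minimal sketch reproducing the error and the fix in SQLite through Python's sqlite3 (an assumption; SQLite's message reads "ambiguous column name: gid" rather than SQL Server's wording). Because RIGHT JOIN only arrived in SQLite 3.39, the sketch swaps the two tables and uses a LEFT JOIN, which keeps every scorer just as the RIGHT OUTER JOIN above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GoalScorers (gid INT, Name VARCHAR(1));
CREATE TABLE GoalScoredDetails (DetailId INT, gid INT, stadium VARCHAR(1), goals INT, Cards INT);
INSERT INTO GoalScorers VALUES (1, 'A'), (2, 'B'), (3, 'A');
INSERT INTO GoalScoredDetails VALUES (1, 1, 'x', 2, 1), (2, 2, 'y', 5, 2), (3, 3, 'y', 2, 1);
""")

# Unqualified gid: both joined tables have a column with that name.
try:
    conn.execute("""SELECT gid
                    FROM GoalScorers AS gs
                    LEFT JOIN GoalScoredDetails AS gsd ON gs.gid = gsd.gid""")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Qualified with the aliases: unambiguous, and both columns can be returned.
qualified = conn.execute("""SELECT gs.gid, gsd.gid
                            FROM GoalScorers AS gs
                            LEFT JOIN GoalScoredDetails AS gsd ON gs.gid = gsd.gid
                            ORDER BY gs.gid""").fetchall()

print(error)
print(qualified)
```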

Hibernate (or SQL or HQL): finding objects that have a collection which is a subset of another collection

Story:
I have this User -> Role -> Privilege mechanism. Every user has some roles. Every role has some privileges.
CREATE TABLE user (id int);
INSERT INTO user VALUES (1);
INSERT INTO user VALUES (2);
INSERT INTO user VALUES (3);
CREATE TABLE role (id int);
INSERT INTO role VALUES (100);
INSERT INTO role VALUES (200);
CREATE TABLE user__role (user_id int, role_id int);
INSERT INTO user__role VALUES (1, 100);
INSERT INTO user__role VALUES (2, 200);
CREATE TABLE privilege (id int);
INSERT INTO privilege VALUES (1000);
INSERT INTO privilege VALUES (2000);
INSERT INTO privilege VALUES (3000);
INSERT INTO privilege VALUES (4000);
CREATE TABLE role__privilege (role_id int, privilege_id int);
INSERT INTO role__privilege VALUES (100, 1000);
INSERT INTO role__privilege VALUES (100, 3000);
INSERT INTO role__privilege VALUES (200, 2000);
(Users, roles and privileges all have names and some other fields, but I omitted them to keep the example simple.)
Then I have some rooms. You need certain privileges to enter a room.
CREATE TABLE room (id int);
INSERT INTO room VALUES (11);
INSERT INTO room VALUES (22);
INSERT INTO room VALUES (33);
INSERT INTO room VALUES (44);
INSERT INTO room VALUES (55);
CREATE TABLE room__privilege (room_id int, privilege_id int);
INSERT INTO room__privilege VALUES (11, 1000);
INSERT INTO room__privilege VALUES (11, 3000);
INSERT INTO room__privilege VALUES (22, 2000);
INSERT INTO room__privilege VALUES (33, 3000);
INSERT INTO room__privilege VALUES (55, 1000);
INSERT INTO room__privilege VALUES (55, 2000);
INSERT INTO room__privilege VALUES (55, 3000);
Here is the deal: if a user has all the privileges required by a room, then the user can enter the room. If a room requires no privileges, then anyone can enter.
In terms of object, I have something like
class User {
int id;
Set<Role> roles;
}
class Role {
int id;
Set<Privilege> privileges;
}
class Room {
int id;
Set<Privilege> requirements;
}
Now say I have a User whose id = 1. I want to know which rooms this user can enter. How do I achieve this with Hibernate Criteria or SQL?
I guess I can use some queries to find all the privileges that a user owns (and store them in a set), and then find the rooms whose requirements are a subset of this set. But I can't find the right criteria/restriction for this. Also, after reading some posts on Stack Overflow, I got the feeling that the whole thing can be done with a single SQL/HQL query.
Can anyone give me some help, please? Thanks in advance!
UPDATE 1
I have been working on this the whole night and managed to get some results out:
SELECT requirements.room_id
FROM (
SELECT room.id AS room_id, room__privilege.privilege_id FROM room
JOIN room__privilege ON room__privilege.room_id = room.id
) requirements
INNER JOIN (
SELECT room__privilege.room_id, COUNT(*) as count FROM room__privilege
GROUP BY room__privilege.room_id
) hits ON requirements.room_id = hits.room_id
INNER JOIN (
SELECT user.id AS user_id, rp.privilege_id FROM user
JOIN user__role ur ON user.id = ur.user_id
JOIN role__privilege rp ON ur.role_id = rp.role_id
) up ON requirements.privilege_id = up.privilege_id
WHERE up.user_id = 1
GROUP BY requirements.room_id, up.user_id, hits.count HAVING COUNT(*) = hits.count
UNION
SELECT room.id FROM room
WHERE room.id NOT IN (
SELECT room_id FROM room__privilege
);
which seems to give me what I want. It seems fairly complicated, though, and I am not sure whether I can wrap it into Criteria or HQL.
I checked the answers from @Rajesh and @Erik Hart. Their queries seem to work with the example too. I am going to analyze which one performs better.
Thank you so much for your help; I really appreciate it. If anyone knows how this can be achieved with Criteria or HQL, please don't hesitate to reply. Cheers!
SQL would be:
select id from room where id not in (
select room_id from room_privilege where privilege_id not in (
select id from privilege where id in ( -- can omit
select privilege_id from role_privilege where role_id in (
select id from rol where id in ( -- can omit
select role_id from user_role where user_id in ( -- if user table omitted: user_id=@userid
select id from usr where id=1 -- can omit
)))))) -- remove closing brackets when omitting subselects!
Checked this in SQL Server (user #1: 11, 33, 44; #2: 22, 44; #3: 44 only); table names are slightly changed due to reserved keywords.
Line 2 selects the rooms that require a privilege the user does not have, which would block him from entering. Line 1 then selects the rooms that are not blocked in this way.
The selects on the main object tables can usually be omitted (except the first), but they can also be kept as a safeguard against orphaned cross-references (if these are not prevented by foreign keys or cascading deletes).
This should return distinct room ids without needing a DISTINCT or GROUP BY clause.
The IN subselects are usually translated by the database into semi-joins, and NOT IN into anti-semi-joins. This means no values from the joined tables are returned, and results are not multiplied when there are multiple join matches.
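A minimal sketch running this query for each of the three users, in SQLite through Python's sqlite3 (an assumption). The renamed tables usr, rol, user_role, role_privilege and room_privilege follow this answer, and the user id becomes a ? parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usr (id INT);
CREATE TABLE rol (id INT);
CREATE TABLE user_role (user_id INT, role_id INT);
CREATE TABLE privilege (id INT);
CREATE TABLE role_privilege (role_id INT, privilege_id INT);
CREATE TABLE room (id INT);
CREATE TABLE room_privilege (room_id INT, privilege_id INT);
INSERT INTO usr VALUES (1), (2), (3);
INSERT INTO rol VALUES (100), (200);
INSERT INTO user_role VALUES (1, 100), (2, 200);
INSERT INTO privilege VALUES (1000), (2000), (3000), (4000);
INSERT INTO role_privilege VALUES (100, 1000), (100, 3000), (200, 2000);
INSERT INTO room VALUES (11), (22), (33), (44), (55);
INSERT INTO room_privilege VALUES
  (11, 1000), (11, 3000), (22, 2000), (33, 3000),
  (55, 1000), (55, 2000), (55, 3000);
""")

ROOMS_FOR_USER = """
select id from room where id not in (             -- rooms that are not blocked
  select room_id from room_privilege where privilege_id not in (  -- rooms needing a missing privilege
    select id from privilege where id in (        -- the user's privileges
      select privilege_id from role_privilege where role_id in (
        select id from rol where id in (
          select role_id from user_role where user_id in (
            select id from usr where id = ?
))))))
order by id"""


def rooms_for_user(user_id):
    """Hypothetical helper: room ids the given user may enter."""
    return [r[0] for r in conn.execute(ROOMS_FOR_USER, (user_id,))]


for uid in (1, 2, 3):
    print(uid, rooms_for_user(uid))
```

It returns [11, 33, 44] for user 1, [22, 44] for user 2 and [44] for user 3, matching the results reported above.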
The first query fetches all the rooms the user has access to.
The second query fetches all rooms that don't require any privileges.
UNION combines them into the desired result:
select A.room_id
FROM(
SELECT room_id,
count(privilege_id) as count1
FROM room__privilege
GROUP BY room_id) A
INNER JOIN
(
SELECT room_id,
count(RP.privilege_id) as count2
FROM room__privilege RP
INNER join
(select RLP.privilege_id as privilege_id
FROM role__privilege RLP
inner join user__role UR
on UR.role_id = RLP.role_id
and UR.user_id = 1 ) T
on T.privilege_id = RP.privilege_id
group by room_id) B
ON A.count1 = B.count2
AND A.room_id = B.room_id
union
select R.id from room R
where R.id not in ( select room_id from room__privilege )
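The same check for this counting approach, using the question's original table names (SQLite through Python's sqlite3 is again an assumption; the user id becomes the named parameter :uid, and the helper name is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INT);
CREATE TABLE role (id INT);
CREATE TABLE user__role (user_id INT, role_id INT);
CREATE TABLE privilege (id INT);
CREATE TABLE role__privilege (role_id INT, privilege_id INT);
CREATE TABLE room (id INT);
CREATE TABLE room__privilege (room_id INT, privilege_id INT);
INSERT INTO user VALUES (1), (2), (3);
INSERT INTO role VALUES (100), (200);
INSERT INTO user__role VALUES (1, 100), (2, 200);
INSERT INTO privilege VALUES (1000), (2000), (3000), (4000);
INSERT INTO role__privilege VALUES (100, 1000), (100, 3000), (200, 2000);
INSERT INTO room VALUES (11), (22), (33), (44), (55);
INSERT INTO room__privilege VALUES
  (11, 1000), (11, 3000), (22, 2000), (33, 3000),
  (55, 1000), (55, 2000), (55, 3000);
""")

ROOMS_BY_COUNT = """
select A.room_id
from (select room_id, count(privilege_id) as count1      -- privileges each room requires
      from room__privilege group by room_id) A
inner join (
      select room_id, count(RP.privilege_id) as count2   -- required privileges the user has
      from room__privilege RP
      inner join (select RLP.privilege_id as privilege_id
                  from role__privilege RLP
                  inner join user__role UR
                    on UR.role_id = RLP.role_id and UR.user_id = :uid) T
        on T.privilege_id = RP.privilege_id
      group by room_id) B
  on A.count1 = B.count2 and A.room_id = B.room_id
union
select R.id from room R                                   -- rooms with no requirements
where R.id not in (select room_id from room__privilege)
"""


def rooms_for_user_by_count(user_id):
    """Hypothetical helper: room ids the user may enter, sorted."""
    return sorted(r[0] for r in conn.execute(ROOMS_BY_COUNT, {"uid": user_id}))


for uid in (1, 2, 3):
    print(uid, rooms_for_user_by_count(uid))
```

It gives the same rooms as the NOT IN version. One caveat not covered by the sample data: if a user held the same privilege through two roles, count2 would count it twice and could match count1 for a room the user cannot actually enter; counting DISTINCT RP.privilege_id would avoid that.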