Find rows with a group values from other table - sql

Table1:
id  role  Group_ID
--  ----  --------
1   A     1
2   B     1
3   A     2
4   D     2
5   A     3
6   B     3
7   C     3
8   C     4
...
Table2:
User_id  role
-------  ----
user1    A
user1    B
user2    C
user2    D
user3    A
user3    D
user4    C
user5    A
user5    B
user5    C
user5    D
...
I have two tables, Table1 and Table2, as shown above.
My requirement is to get the User_ID values from Table2 that have all the roles of a group. Additionally, only groups with at least 2 roles need to be checked. If a Group_ID has only 1 role, it should not be considered.
For example, this is what the result should look like for the two tables above:
user1 has both the roles from group 1 (A,B) -> therefore it is in the results.
user3 has both the roles from group 2 (A,D) -> therefore it is in the results.
user5 has all the roles from groups 1 (A,B), 2 (A,D) and 3 (A,B,C) -> therefore it is in the results.
user2 has roles C and D, which do not form a group, hence it is not shown in the result.
user4 has role C, which is a group (Group_ID = 4), but a group must have at least 2 roles, hence it is not shown in the result.
User_id  Group_ID
-------  --------
user1    1
user3    2
user5    1
user5    2
user5    3
...
Select Table2.USER_ID,Table1.GROUP_ID
from Table2,
Table1
Where Table2.ROLE = Table1.ROLE
group by Table1.GROUP_ID,Table2.USER_ID
With the above query, I am able to get the records where the user_id is assigned any of the roles of a group. However, I want to get the User_ID values from Table2 that have all the roles of a group.
Any help is much appreciated. I will also make sure to accept the answer leading me to the solution.

A direct translation of the requirement into SQL (Oracle style)
with g as (
select group_id, count(*) role_cnt from table1 group by group_id having count(*) > 1),
u as (
select u.user_id, g.group_id, count(*) usergroup_role_cnt from table1 g, table2 u
where u.role=g.role
group by u.user_id, g.group_id)
select u.user_id, g.group_id from g, u
where u.group_id=g.group_id
and u.usergroup_role_cnt=g.role_cnt;
Edit:
There has been concern about query performance using "group by" when tables are "very large". I don't know how "very large" is defined. Anyway, I did a test on an Oracle Cloud Free Tier ATP 19c database, using tables with about 290 thousand and 1.4 million rows. Here is the result.
select count(*) from table1;
COUNT(*)
--------
289327
Elapsed: 00:00:00.007
1 rows selected.
select count(*) from table2;
COUNT(*)
--------
1445328
Elapsed: 00:00:00.024
1 rows selected.
with g as (
select group_id, count(*) role_cnt from table1 group by group_id having count(*) > 1),
u as (
select u.user_id, g.group_id, count(*) usergroup_role_cnt from table1 g, table2 u
where u.role=g.role
group by u.user_id, g.group_id)
select u.user_id, g.group_id from g, u
where u.group_id=g.group_id
and u.usergroup_role_cnt=g.role_cnt;
USER_ID GROUP_ID
------- --------
user99 994
user97 462
user97 199
user87 35
user99 462
user87 179
user99 199
user98 199
user96 199
user87 813
user87 941
user96 994
user97 994
user96 462
user98 462
user98 994
Elapsed: 00:00:04.770
16 rows selected.
select a2.user_id, a1.group_id
from (
select user_id, cast(collect(role) as role_list) as user_roles
from table2
group by user_id
) a2
inner join
(
select group_id, cast(collect(role) as role_list) as group_roles
from table1
group by group_id
) a1
on a1.group_roles submultiset of a2.user_roles
order by user_id, group_id
;
USER_ID GROUP_ID
------- --------
user87 35
user87 179
user87 813
user87 941
user96 199
user96 462
user96 994
user97 199
user97 462
user97 994
user98 199
user98 462
user98 994
user99 199
user99 462
user99 994
Elapsed: 00:01:35.395
16 rows selected.
select q2.user_id, q1.group_id
from (select distinct user_id from table2) q2
cross join
(select distinct group_id from table1) q1
where not exists
(
select role
from table1
where group_id = q1.group_id
and role not in
(
select role
from table2
where user_id = q2.user_id
)
)
order by user_id, group_id
;
(Manually cancelled after 10 min)
Data is generated quite randomly and is somewhat skewed. But in reality, skewed data is not avoidable. Statistics should minimize the impact. (Both tables were analyzed and have no indexes)

Here is a solution that will work in Oracle Database; adapt it for SQL Server (if that is possible; I don't know that dialect).
Test data (others may use this too):
create table table1(id number, role varchar2(10), group_id number);
insert into table1 (id, role, group_id)
select 1, 'A', 1 from dual union all
select 2, 'B', 1 from dual union all
select 3, 'A', 2 from dual union all
select 4, 'D', 2 from dual union all
select 5, 'A', 3 from dual union all
select 6, 'B', 3 from dual union all
select 7, 'C', 3 from dual
;
create table table2 (user_id varchar2(20), role varchar2(10));
insert into table2 (user_id, role)
select 'user1', 'A' from dual union all
select 'user1', 'B' from dual union all
select 'user2', 'C' from dual union all
select 'user2', 'D' from dual union all
select 'user3', 'A' from dual union all
select 'user3', 'D' from dual union all
select 'user4', 'C' from dual union all
select 'user5', 'A' from dual union all
select 'user5', 'B' from dual union all
select 'user5', 'C' from dual union all
select 'user5', 'D' from dual
;
commit;
Create user-defined data type (collection of strings representing roles):
create or replace type role_list as table of varchar2(10);
/
Query and output:
select a2.user_id, a1.group_id
from (
select user_id, cast(collect(role) as role_list) as user_roles
from table2
group by user_id
) a2
inner join
(
select group_id, cast(collect(role) as role_list) as group_roles
from table1
group by group_id
) a1
on a1.group_roles submultiset of a2.user_roles
order by user_id, group_id
;
USER_ID GROUP_ID
------------ ----------
user1 1
user3 2
user5 1
user5 2
user5 3
The strategy is pretty obvious, and should be easy to read directly from the code. Group the roles by group_id in the first table and by user_id in the second. Identify all the pairs (user, group) where all roles for the group are found in the role list of the user - that is exactly what the submultiset comparison operator does.
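To make the comparison concrete outside the database, here is a minimal sketch in Python that models the same idea with sets. It is an illustration only, not how Oracle evaluates the query. The function name `covered_groups` and the variable names are made up for this example, and plain sets ignore duplicate roles, whereas submultiset takes multiplicity into account.

```python
from collections import defaultdict

# Sample rows from the test data above: (role, group_id) and (user_id, role).
table1 = [("A", 1), ("B", 1), ("A", 2), ("D", 2), ("A", 3), ("B", 3), ("C", 3)]
table2 = [("user1", "A"), ("user1", "B"), ("user2", "C"), ("user2", "D"),
          ("user3", "A"), ("user3", "D"), ("user4", "C"),
          ("user5", "A"), ("user5", "B"), ("user5", "C"), ("user5", "D")]


def covered_groups(table1, table2):
    """Return sorted (user_id, group_id) pairs where the user holds every role of the group."""
    group_roles = defaultdict(set)   # plays the part of collect(role) ... group by group_id
    for role, group_id in table1:
        group_roles[group_id].add(role)

    user_roles = defaultdict(set)    # plays the part of collect(role) ... group by user_id
    for user_id, role in table2:
        user_roles[user_id].add(role)

    # "<=" on sets is the subset test, standing in for the submultiset condition.
    return sorted(
        (user_id, group_id)
        for user_id, u_roles in user_roles.items()
        for group_id, g_roles in group_roles.items()
        if g_roles <= u_roles
    )


result = covered_groups(table1, table2)
print(result)
# [('user1', 1), ('user3', 2), ('user5', 1), ('user5', 2), ('user5', 3)]
```

The pairs match the query output shown above for the same test data.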
A more rudimentary query (harder to follow and maintain, and likely slower) may still be helpful, as it is likely to work with very few changes, if any, in pretty much all SQL dialects. Assuming role can't be null in table2 (to make the query slightly simpler), it might look like this:
select q2.user_id, q1.group_id
from (select distinct user_id from table2) q2
cross join
(select distinct group_id from table1) q1
where not exists
(
select role
from table1
where group_id = q1.group_id
and role not in
(
select role
from table2
where user_id = q2.user_id
)
)
order by user_id, group_id
;
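As a rough check that this double-negation pattern ports to another dialect, here is a sketch that runs a variant of it in SQLite through Python's standard `sqlite3` module. Two things differ from the query above and are my own assumptions: the inner `NOT IN` is written as a second `NOT EXISTS` (which also sidesteps the null caveat), and a `HAVING COUNT(*) >= 2` filter is added so that single-role groups such as group 4 from the question are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, role TEXT, group_id INTEGER);
    CREATE TABLE table2 (user_id TEXT, role TEXT);
    INSERT INTO table1 VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'A', 2), (4, 'D', 2),
        (5, 'A', 3), (6, 'B', 3), (7, 'C', 3), (8, 'C', 4);
    INSERT INTO table2 VALUES
        ('user1', 'A'), ('user1', 'B'), ('user2', 'C'), ('user2', 'D'),
        ('user3', 'A'), ('user3', 'D'), ('user4', 'C'),
        ('user5', 'A'), ('user5', 'B'), ('user5', 'C'), ('user5', 'D');
""")

query = """
    SELECT q2.user_id, q1.group_id
    FROM (SELECT DISTINCT user_id FROM table2) AS q2
    CROSS JOIN (
        -- only groups with at least 2 roles are candidates
        SELECT group_id FROM table1 GROUP BY group_id HAVING COUNT(*) >= 2
    ) AS q1
    WHERE NOT EXISTS (
        -- a role of the group ...
        SELECT 1 FROM table1 AS t1
        WHERE t1.group_id = q1.group_id
          AND NOT EXISTS (
              -- ... that the user does not have
              SELECT 1 FROM table2 AS t2
              WHERE t2.user_id = q2.user_id AND t2.role = t1.role
          )
    )
    ORDER BY q2.user_id, q1.group_id
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('user1', 1), ('user3', 2), ('user5', 1), ('user5', 2), ('user5', 3)]
```

Without the `HAVING` filter, user2, user4 and user5 would also be paired with group 4, since each of them has role C.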

Related

Oracle perform a count for each row on a table

I have a question regarding a query in Oracle SQL.
I have a groups table, that I can query as:
SELECT *
FROM groups g
WHERE g.owner = 123;
This would give something like:
groupId  owner
-------  -----
1        123
2        123
3        123
4        123
I also have another table of administrators, that I can query as:
SELECT *
FROM admins a
ORDER BY groupId;
This would get administrators as:
adminId  userName  groupId
-------  --------  -------
1        myadmin1  1
2        myAdmin2  1
3        myAdmin3  1
4        myAdmin4  2
5        myAdmin5  3
6        myAdmin6  3
That basically means that a group can have multiple administrators. I would like to count the number of administrators for each group. A result such as:
groupId  owner  adminCount
-------  -----  ----------
1        123    3
2        123    1
3        123    2
4        123    0
However, I cannot make a count of each administrator in the table and then make a join, as it is a table with a lot of rows.
I would like to perform the count query
SELECT count(*)
FROM admins a
WHERE groupId = 1;
for each row of the groups query, such that I get the desired result without performing a count of each administrator in the table, just the ones that belong to the groups from a specific owner.
Does someone know how I can count it without counting all the rows in the administrators table?
Thanks
The easiest and most readable variant is to use outer apply (or lateral(+)):
select *
from groups g
outer apply (
select count(*) as adminCount
from admins a
where a.groupId=g.groupId
);
Or you can get the same results using subqueries (moreover, in fact Oracle optimizer can decide to transform outer-apply/lateral to this variant, since it has "lateral view decorrelation" transformation):
select g.groupId,g.owner, nvl(a.adminCount,0) as adminCount
from groups g
left join (
select x.groupId, count(*) as adminCount
from admins x
group by x.groupId
) a
on a.groupId=g.groupId;
or even group-by with join:
select g.groupId,g.owner, count(a.groupId) as adminCount
from groups g
left join admins a
on g.groupId=a.groupId
group by g.groupId,g.owner
https://dbfiddle.uk/a-Q_abg8
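For readers who want to try the per-row count without an Oracle instance, here is a small sketch using Python's standard `sqlite3` module. SQLite has no outer apply, so the sketch swaps in a correlated scalar subquery, which expresses the same per-group count. The owner filter comes from the question. The extra group 5 (owner 999) and its admin are hypothetical rows added only to show that other owners' groups are left out, and the table name is quoted because `groups` is a keyword in recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (groupId INTEGER, owner INTEGER);
    CREATE TABLE admins (adminId INTEGER, userName TEXT, groupId INTEGER);
    INSERT INTO "groups" VALUES (1, 123), (2, 123), (3, 123), (4, 123), (5, 999);
    INSERT INTO admins VALUES
        (1, 'myadmin1', 1), (2, 'myAdmin2', 1), (3, 'myAdmin3', 1),
        (4, 'myAdmin4', 2), (5, 'myAdmin5', 3), (6, 'myAdmin6', 3),
        (7, 'otherAdmin', 5);
""")

# The count is evaluated once per selected group row, so groups of other
# owners are never counted; a group without admins yields 0.
rows = conn.execute("""
    SELECT g.groupId, g.owner,
           (SELECT COUNT(*) FROM admins a WHERE a.groupId = g.groupId) AS adminCount
    FROM "groups" g
    WHERE g.owner = ?
    ORDER BY g.groupId
""", (123,)).fetchall()
print(rows)
# [(1, 123, 3), (2, 123, 1), (3, 123, 2), (4, 123, 0)]
```

Whether the database can avoid scanning the whole admins table in practice depends on an index on the group id column, which this sketch does not create.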
You could use analytic function COUNT() OVER() ...
Select Distinct
g.GROUP_ID,
g.OWNER,
Count(a.ADMIN_ID) OVER(Partition By g.GROUP_ID) "COUNT_ADMINS"
From groups g
Left Join admins a ON(a.GROUP_ID = g.GROUP_ID)
Where g.OWNER = 123
Order By g.GROUP_ID
... this requires the Distinct keyword which could be performance costly with big datasets. I don't expect that user groups and admins are that big.
With your sample data:
WITH
groups (GROUP_ID, OWNER) AS
(
Select 1, 123 From Dual Union ALL
Select 2, 123 From Dual Union ALL
Select 3, 123 From Dual Union ALL
Select 4, 123 From Dual
),
admins (ADMIN_ID, ADMIN_USER_NAME, GROUP_ID) AS
(
Select 1, 'myadmin1', 1 From Dual Union All
Select 2, 'myadmin2', 1 From Dual Union All
Select 3, 'myadmin3', 1 From Dual Union All
Select 4, 'myadmin4', 2 From Dual Union All
Select 5, 'myadmin5', 3 From Dual Union All
Select 6, 'myadmin6', 3 From Dual
)
... the result is
GROUP_ID  OWNER  COUNT_ADMINS
--------  -----  ------------
1         123    3
2         123    1
3         123    2
4         123    0

Query to display one to many result in a single table

I've been trying to use the GROUP function and also PIVOT, but I cannot wrap my head around how to merge these tables and combine duplicate rows. Currently my SELECT statement returns results with duplicate UserID rows, but I want to consolidate them into columns.
How would I join TABLE1 and TABLE2 into a new table which would look something like this:
NEW TABLE:
UserID Username ParentID 1 ParentID 2
--------- -------- -------- ----------
1 Dave 1 2
2 Sally 3 4
TABLE1:
UserID Username ParentID
--------- -------- --------
1 Dave 1
1 Dave 2
2 Sally 3
2 Sally 4
Table 2:
ParentID Username
--------- --------
1 Sarah
2 Joe
3 Tom
4 Mark
Oracle
The with clause is here just to generate some sample data and, as such, it is not a part of the answer.
After joining the tables you can use the LAST_VALUE analytic function with a windowing clause to get the next PARENT_ID of the user. That column (PARENT_ID_2) contains a value only in the first row of a particular USER_ID (ROW_NUMBER analytic function). Afterwards, just filter out rows where PARENT_ID_2 is empty...
Sample data:
WITH
tbl_1 AS
(
Select 1 "USER_ID", 'Dave' "USER_NAME", 1 "PARENT_ID" From Dual Union All
Select 1 "USER_ID", 'Dave' "USER_NAME", 2 "PARENT_ID" From Dual Union All
Select 2 "USER_ID", 'Sally' "USER_NAME", 3 "PARENT_ID" From Dual Union All
Select 2 "USER_ID", 'Sally' "USER_NAME", 4 "PARENT_ID" From Dual
),
tbl_2 AS
(
Select 1 "PARENT_ID", 'Sarah' "USER_NAME" From Dual Union All
Select 2 "PARENT_ID", 'Joe' "USER_NAME" From Dual Union All
Select 3 "PARENT_ID", 'Tom' "USER_NAME" From Dual Union All
Select 4 "PARENT_ID", 'Mark' "USER_NAME" From Dual
)
Main SQL:
SELECT
*
FROM (
SELECT
t1.USER_ID "USER_ID",
t1.USER_NAME "USER_NAME",
t1.PARENT_ID "PARENT_ID_1",
CASE
WHEN ROW_NUMBER() OVER(PARTITION BY t1.USER_ID ORDER BY t1.USER_ID) = 1
THEN LAST_VALUE(t1.PARENT_ID) OVER(PARTITION BY t1.USER_ID ORDER BY t1.USER_ID ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
END "PARENT_ID_2"
FROM
tbl_1 t1
INNER JOIN
tbl_2 t2 ON(t1.PARENT_ID = t2.PARENT_ID)
)
WHERE PARENT_ID_2 Is Not Null
... and the Result ...
-- USER_ID USER_NAME PARENT_ID_1 PARENT_ID_2
-- ---------- --------- ----------- -----------
-- 1 Dave 1 2
-- 2 Sally 3 4
The windowing clause in this answer
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
takes the current and the next row and returns the value defined by the analytic function (LAST_VALUE), taking care of grouping (PARTITION BY) and ordering of the rows. Regards...
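The frame described above can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not of how Oracle executes the query. The function name `pivot_two_parents` is made up, and the rows are assumed to arrive already ordered within each user, which the SQL does not strictly guarantee because it orders the partition by USER_ID alone.

```python
from itertools import groupby

# (USER_ID, USER_NAME, PARENT_ID) rows from the sample data, sorted by USER_ID.
rows = [(1, "Dave", 1), (1, "Dave", 2), (2, "Sally", 3), (2, "Sally", 4)]


def pivot_two_parents(rows):
    out = []
    # groupby on sorted input plays the part of PARTITION BY USER_ID.
    for _, partition in groupby(rows, key=lambda r: r[0]):
        partition = list(partition)
        for i, (user_id, user_name, parent_id) in enumerate(partition):
            # ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING:
            # the current row plus at most one row after it.
            frame = partition[i:i + 2]
            last_value = frame[-1][2]          # LAST_VALUE(PARENT_ID) over the frame
            if i == 0:                         # ROW_NUMBER() = 1
                out.append((user_id, user_name, parent_id, last_value))
    return out


result = pivot_two_parents(rows)
print(result)
# [(1, 'Dave', 1, 2), (2, 'Sally', 3, 4)]
```

Note that a user with a single row would get the same value in both parent columns, because the frame then contains only the current row.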
This is for MySQL version 5.6. Create a concatenated ParentID using GROUP_CONCAT, then separate the concatenated ParentID values (1,2) and (3,4) into ParentID 1 and ParentID 2.
SELECT t1.UserID,
t1.Username,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(t1.ParentID), ',', 1), ',', -1) AS `ParentID 1`,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(t1.ParentID), ',', 2), ',', -1) as `ParentID 2`
FROM TABLE1 t1
INNER JOIN TABLE2 t2 on t1.ParentID = t2.ParentID
GROUP BY t1.UserID
ORDER BY t1.UserID;
Result:
UserID Username ParentID 1 ParentID 2
1 Dave 1 2
2 Sally 3 4

Oracle Finding a string match from multiple database tables

This is somewhat a complex problem to describe, but I'll try to explain it with an example. I thought I would have been able to use the Oracle Instr function to accomplish this, but it does not accept queries as parameters.
Here is a simplification of my data:
Table1
Person Qualities
Joe 5,6,7,8,9
Mary 7,8,10,15,20
Bob 7,8,9,10,11,12
Table2
Id Desc
5 Nice
6 Tall
7 Short
Table3
Id Desc
8 Angry
9 Sad
10 Fun
Table4
Id Desc
11 Boring
12 Happy
15 Cool
20 Mad
Here is somewhat of a query to give an idea of what I'm trying to accomplish:
select * from table1
where instr (Qualities, select Id from table2, 1,1) <> 0
and instr (Qualities, select Id from table3, 1,1) <> 0
and instr (Qualities, select Id from table4, 1,1) <> 0
I'm trying to figure out which people have at least 1 quality from each of the 3 groups of qualities (tables 2,3, and 4)
So Joe would not be returned in the results because he does not have a quality from each of the 3 groups, but Mary and Bob would, since they have at least 1 quality from each group.
We are running Oracle 12, thanks!
Here's one option:
SQL> with
2 table1 (person, qualities) as
3 (select 'Joe', '5,6,7,8,9' from dual union all
4 select 'Mary', '7,8,10,15,20' from dual union all
5 select 'Bob', '7,8,9,10,11,12' from dual
6 ),
7 table2 (id, descr) as
8 (select 5, 'Nice' from dual union all
9 select 6, 'Tall' from dual union all
10 select 7, 'Short' from dual
11 ),
12 table3 (id, descr) as
13 (select 8, 'Angry' from dual union all
14 select 9, 'Sad' from dual union all
15 select 10, 'Fun' from dual
16 ),
17 table4 (id, descr) as
18 (select 11, 'Boring' from dual union all
19 select 12, 'Happy' from dual union all
20 select 15, 'Cool' from dual union all
21 select 20, 'Mad' from dual
22 ),
23 t1new (person, id) as
24 (select person, regexp_substr(qualities, '[^,]+', 1, column_value) id
25 from table1 cross join table(cast(multiset(select level from dual
26 connect by level <= regexp_count(qualities, ',') + 1
27 ) as sys.odcinumberlist))
28 )
29 select a.person,
30 count(b.id) bid,
31 count(c.id) cid,
32 count(d.id) did
33 from t1new a left join table2 b on a.id = b.id
34 left join table3 c on a.id = c.id
35 left join table4 d on a.id = d.id
36 group by a.person
37 having ( count(b.id) > 0
38 and count(c.id) > 0
39 and count(d.id) > 0
40 );
PERS BID CID DID
---- ---------- ---------- ----------
Bob 1 3 2
Mary 1 2 2
SQL>
What does it do?
lines #1 - 22 represent your sample data
T1NEW CTE (lines #23 - 28) splits comma-separated qualities into rows, per every person
final select (lines #29 - 40) are outer joining t1new with each of "description" tables (table2/3/4) and counting how many qualities are contained in there for each of person's qualities (represented by rows from t1new)
having clause is here to return only desired persons; each of those counts have to be a positive number
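The same split-then-count logic can be sketched in plain Python, which may help when checking the expected counts by hand. The names `counts_per_group` and `matches` are made up for this illustration, and the three sets simply hold the ids from table2, table3 and table4.

```python
people = {"Joe": "5,6,7,8,9", "Mary": "7,8,10,15,20", "Bob": "7,8,9,10,11,12"}
table2 = {5, 6, 7}
table3 = {8, 9, 10}
table4 = {11, 12, 15, 20}


def counts_per_group(qualities):
    """Split the comma-separated ids and count how many fall into each table."""
    ids = [int(part) for part in qualities.split(",")]
    return tuple(sum(1 for i in ids if i in group) for group in (table2, table3, table4))


# Keep only people whose three counts are all positive (the HAVING clause).
matches = {
    person: counts_per_group(qualities)
    for person, qualities in people.items()
    if all(counts_per_group(qualities))
}
print(matches)
# {'Mary': (1, 2, 2), 'Bob': (1, 3, 2)}
```

Joe's counts are (3, 2, 0), so he is filtered out, which agrees with the BID, CID and DID columns in the output above.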
Maybe this will help:
{1} Create a view that categorises all qualities and allows you to SELECT quality IDs and categories . {2} JOIN the view to TABLE1 and use a join condition that "splits" the CSV value stored in TABLE1.
{1} View
create or replace view allqualities
as
select 1 as category, id as qid, descr from table2
union
select 2, id, descr from table3
union
select 3, id, descr from table4
;
select * from allqualities order by category, qid ;
CATEGORY QID DESCR
---------- ---------- ------
1 5 Nice
1 6 Tall
1 7 Short
2 8 Angry
2 9 Sad
2 10 Fun
3 11 Boring
3 12 Happy
3 15 Cool
3 20 Mad
{2} Query
-- JOIN CONDITION:
-- {1} add a comma at the start and at the end of T1.qualities
-- {2} remove all blanks (spaces) from T1.qualities
-- {3} use LIKE and the qid (of allqualities), wrapped in commas
--
-- inline view: use UNIQUE, otherwise we may get counts > 3
--
select person
from (
select unique person, category
from table1 T1
join allqualities A
on ',' || replace( T1.qualities, ' ', '' ) || ',' like '%,' || A.qid || ',%'
)
group by person
having count(*) = ( select count( distinct category ) from allqualities )
;
-- result
PERSON
Bob
Mary
Tested w/ Oracle 18c and 11g. DBfiddle here.
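The comma-wrapping join condition is easy to get wrong, so here is a short Python sketch of why both the value and the id are wrapped in commas. The helper name `has_quality` and the `categories` dictionary are made up for this illustration.

```python
def has_quality(qualities, qid):
    """Mimic: ',' || replace(qualities, ' ', '') || ',' like '%,' || qid || ',%'"""
    wrapped = "," + qualities.replace(" ", "") + ","
    return f",{qid}," in wrapped


# Without the commas, id 1 would wrongly match inside "10" or "15".
print("1" in "7,8,10,15,20")             # True  (naive substring test)
print(has_quality("7,8,10,15,20", 1))    # False (wrapped test)
print(has_quality("7, 8, 10", 8))        # True  (blanks are removed first)

people = {"Joe": "5,6,7,8,9", "Mary": "7,8,10,15,20", "Bob": "7,8,9,10,11,12"}
categories = {1: [5, 6, 7], 2: [8, 9, 10], 3: [11, 12, 15, 20]}

# A person qualifies when every category contributes at least one matching id.
result = sorted(
    person
    for person, qualities in people.items()
    if all(any(has_quality(qualities, qid) for qid in qids) for qids in categories.values())
)
print(result)
# ['Bob', 'Mary']
```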

How to find all possible connections between different rows?

Our company asked the user to enter various pieces of information including address and license plate in order to get car insurance quotes
We store this information in BigQuery.
Some users have entered more than one license plate (they might own more than one car) and more than one address at different times.
Basically the structure could look like this
row info_1 info_2
----- -------- --------
1 a y
2 a x
3 b y
4 b z
5 c z
6 a z
We want to utilize all links between these two pieces of information to have all information from one user in one row.
The above table all have connections and should thus be in
Is this possible and what is best practice?
We have experimented with both STRING_AGG and ARRAY_AGG but have not found the solution yet.
Assuming that you have an extra column user_id and making the input table:
row user_id info_1 info_2
----- -------- -------- --------
1 u1 a y
2 u2 a x
3 u1 b y
4 u1 b z
5 u2 c z
6 u2 a z
The following query gives you 2 lists of distinct elements, for info_1 and info_2 respectively:
select
user_id,
array_agg(distinct info_1) as element_in_info_1,
array_agg(distinct info_2) as element_in_info_2
from table
group by 1
Test example with input data
with table as (
select 'u1' as user_id, 'a' as info_1, 'y' as info_2 union all
select 'u2' as user_id, 'a' as info_1, 'x' as info_2 union all
select 'u1' as user_id, 'b' as info_1, 'y' as info_2 union all
select 'u1' as user_id, 'b' as info_1, 'z' as info_2 union all
select 'u2' as user_id, 'c' as info_1, 'z' as info_2 union all
select 'u2' as user_id, 'a' as info_1, 'z' as info_2
)
select
user_id,
array_agg(distinct info_1) as element_in_info_1,
array_agg(distinct info_2) as element_in_info_2
from table
group by 1
Result

Oracle Hierarchical Query

Using Oracle 10g. I have two tables:
User Parent
-------------
1 (null)
2 1
3 1
4 3
Permission User_ID
-------------------
A 1
B 3
The values in the permissions table get inherited down to the children. I would like to write a single query that could return me something like this:
User Permission
------------------
1 A
2 A
3 A
3 A
3 B
4 A
4 B
Is it possible to formulate such a query using 10g connect .. by syntax to pull in rows from previous levels?
You can achieve the desired result with a connect by (and the function CONNECT_BY_ROOT, which returns the column value of the root node):
WITH users AS (
   SELECT 1 user_id, (null) PARENT FROM dual
   UNION ALL SELECT 2, 1 FROM dual
   UNION ALL SELECT 3, 1 FROM dual
   UNION ALL SELECT 4, 3 FROM dual
), permissions AS (
   SELECT 'A' permission, 1 user_id FROM dual
   UNION ALL SELECT 'B', 3 FROM dual
)
SELECT lpad('*', 2 * (LEVEL-1), '*')||u.user_id u,
       u.user_id, connect_by_root(permission) permission
  FROM users u
  LEFT JOIN permissions p ON u.user_id = p.user_id
CONNECT BY u.PARENT = PRIOR u.user_id
 START WITH p.permission IS NOT NULL
 ORDER SIBLINGS BY user_id;
U USER_ID PERMISSION
--------- ------- ----------
3 3 B
**4 4 B
1 1 A
**2 2 A
**3 3 A
****4 4 A
You could take a look at http://www.adp-gmbh.ch/ora/sql/connect_by.html
Kind of black magic, but you can use table-cast-multiset to reference one table from another in WHERE clause:
create table t1(
usr number,
parent number
);
create table t2(
usr number,
perm char(1)
);
insert into t1 values (1,null);
insert into t1 values (2,1);
insert into t1 values (3,1);
insert into t1 values (4,3);
insert into t2 values (1,'A');
insert into t2 values (3,'B');
select t1.usr
, t2.perm
from t1
, table(cast(multiset(
select t.usr
from t1 t
connect by t.usr = prior t.parent
start with t.usr = t1.usr
) as sys.odcinumberlist)) x
, t2
where t2.usr = x.column_value
;
In the subquery x I construct a table of all parents for the given user from t1 (including itself), then join it with permissions for these parents.
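The idea of that subquery, walking from each user up through its parents and collecting the permissions found along the way, can be sketched in plain Python as follows. The dictionaries mirror the rows inserted into t1 and t2, and the function names are made up for this illustration.

```python
# t1: usr -> parent, t2: usr -> permissions granted directly to that user.
parent = {1: None, 2: 1, 3: 1, 4: 3}
perms = {1: ["A"], 3: ["B"]}


def ancestors_and_self(usr):
    """Follow parent links upwards, like 'connect by t.usr = prior t.parent'."""
    chain = []
    while usr is not None:
        chain.append(usr)
        usr = parent[usr]
    return chain


def inherited_permissions():
    """Join every user's ancestor chain (including the user) with the permissions."""
    return sorted(
        (usr, perm)
        for usr in parent
        for ancestor in ancestors_and_self(usr)
        for perm in perms.get(ancestor, [])
    )


result = inherited_permissions()
print(result)
# [(1, 'A'), (2, 'A'), (3, 'A'), (3, 'B'), (4, 'A'), (4, 'B')]
```

The sketch assumes the parent links contain no cycles; with a cycle the loop would not terminate.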
Here is an example for just one user id. You can use a procedure to loop over all of them.
CREATE TABLE a_lnk
(user_id VARCHAR2(5),
parent_id VARCHAR2(5));
CREATE TABLE b_perm
(perm VARCHAR2(5),
user_id VARCHAR2(5));
INSERT INTO a_lnk
SELECT 1, NULL
FROM DUAL;
INSERT INTO a_lnk
SELECT 2, 1
FROM DUAL;
INSERT INTO a_lnk
SELECT 3, 1
FROM DUAL;
INSERT INTO a_lnk
SELECT 4, 3
FROM DUAL;
INSERT INTO b_perm
SELECT 'A', 1
FROM DUAL;
INSERT INTO b_perm
SELECT 'B', 3
FROM DUAL;
-- example for just user id = 1
--
SELECT c.user_id, c.perm
FROM b_perm c,
(SELECT parent_id, user_id
FROM a_lnk
START WITH parent_id = 1
CONNECT BY PRIOR user_id = parent_id
UNION
SELECT parent_id, user_id
FROM a_lnk
START WITH parent_id IS NULL
CONNECT BY PRIOR user_id = parent_id) d
WHERE c.user_id = d.user_id
UNION
SELECT d.user_id, c.perm
FROM b_perm c,
(SELECT parent_id, user_id
FROM a_lnk
START WITH parent_id = 1
CONNECT BY PRIOR user_id = parent_id
UNION
SELECT parent_id, user_id
FROM a_lnk
START WITH parent_id IS NULL
CONNECT BY PRIOR user_id = parent_id) d
WHERE c.user_id = d.parent_id;