Oracle - Finding missing/non-joined records - sql

I have an issue in Oracle 12 that is easiest to explain with the traditional database design scenario of students, classes, and registrations (students taking classes). I understand this model well. I need to get a COMPLETE list of all students against ALL classes, showing whether or not each student is taking each class.
Let's use this table design:
CREATE TABLE CLASSES
(CLASSID VARCHAR2(10) PRIMARY KEY,
CLASSNAME VARCHAR2(25),
INSTRUCTOR VARCHAR2(25) );
CREATE TABLE STUDENTS
(STUDENTID VARCHAR2(10) PRIMARY KEY,
STUDENTNAMENAME VARCHAR2(25),
STUDY_MAJOR VARCHAR2(25) );
CREATE TABLE REGISTRATION
(
CLASSID VARCHAR2(10 BYTE),
STUDENTID VARCHAR2(10 BYTE),
GRADE NUMBER(4,0),
CONSTRAINT "PK1" PRIMARY KEY ("CLASSID", "STUDENTID"),
CONSTRAINT "FK1" FOREIGN KEY ("CLASSID") REFERENCES "CLASSES" ("CLASSID") ENABLE,
CONSTRAINT "FK2" FOREIGN KEY ("STUDENTID") REFERENCES "EGR_MM"."STUDENTS" ("STUDENTID") ENABLE
) ;
So assume the following: 300 students and 15 different classes. The REGISTRATION table shows which students are taking which classes. What I need is that info PLUS all the NON-TAKEN combinations, i.e. a report (SQL statement) that shows ALL possible combinations (300 x 15) and whether each row exists in the registration table. For example, the output should look like this:
STUDENTID  Class1_Grade  Class2_Grade  Class3_Grade  Class4_Grade
101        A             B             Not Taking    A
102        C             Not Taking    Not Taking    Not Taking
103        Not Taking    Not Taking    Not Taking    Not Taking
(Student 103 is not taking any classes, so is NOT in the REGISTRATION table.)
This would work as well, and I can probably do a PIVOT to get the above listing.
STUDENTID CLASSID GRADE
101 Class1 A
101 Class2 B
101 Class3 Not Taking
101 Class4 A
...
102 Class1 C
102 Class2 Not Taking
102 Class3 Not Taking
102 Class4 Not Taking
...
103 Class1 Not Taking // THIS STUDENT NOT TAKING ANY CLASSES
103 Class2 Not Taking
103 Class3 Not Taking
103 Class4 Not Taking
How do I fill in the missing data, i.e. the combination of students and classes NOT taken...?

CROSS JOIN the students and classes and then LEFT OUTER JOIN the registrations and then use COALESCE to get the Not taken value:
SELECT s.studentid,
c.classid,
COALESCE( TO_CHAR( r.grade ), 'Not taken' ) AS grade
FROM students s
CROSS JOIN classes c
LEFT OUTER JOIN registration r
ON ( s.studentid = r.studentid AND c.classid = r.classid )
Which, if you have the data:
INSERT INTO Classes
SELECT LEVEL,
'Class' || LEVEL,
'Instructor' || LEVEL
FROM DUAL
CONNECT BY LEVEL <= 3;
INSERT INTO Students
SELECT TO_CHAR( LEVEL, 'FM000' ),
'Student' || LEVEL,
'Major'
FROM DUAL
CONNECT BY LEVEL <= 5;
INSERT INTO Registration
SELECT 1, '001', 4 FROM DUAL UNION ALL
SELECT 1, '002', 2 FROM DUAL UNION ALL
SELECT 1, '003', 5 FROM DUAL UNION ALL
SELECT 2, '001', 3 FROM DUAL UNION ALL
SELECT 3, '001', 1 FROM DUAL;
Then it outputs:
STUDENTID | CLASSID | GRADE
:-------- | :------ | :--------
001 | 1 | 4
002 | 1 | 2
003 | 1 | 5
001 | 2 | 3
001 | 3 | 1
005 | 1 | Not taken
004 | 2 | Not taken
003 | 3 | Not taken
005 | 3 | Not taken
005 | 2 | Not taken
002 | 2 | Not taken
003 | 2 | Not taken
004 | 1 | Not taken
002 | 3 | Not taken
004 | 3 | Not taken
If you want to pivot it then:
SELECT *
FROM (
SELECT s.studentid,
c.classid,
COALESCE( TO_CHAR( r.grade ), 'Not taken' ) AS grade
FROM students s
CROSS JOIN classes c
LEFT OUTER JOIN registration r
ON ( s.studentid = r.studentid AND c.classid = r.classid )
)
PIVOT ( MAX( grade ) FOR classid IN (
1 AS Class1,
2 AS Class2,
3 AS Class3
) )
ORDER BY StudentID
Which outputs:
STUDENTID | CLASS1 | CLASS2 | CLASS3
:-------- | :-------- | :-------- | :--------
001 | 4 | 3 | 1
002 | 2 | Not taken | Not taken
003 | 5 | Not taken | Not taken
004 | Not taken | Not taken | Not taken
005 | Not taken | Not taken | Not taken
db<>fiddle here
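For anyone who wants to try the CROSS JOIN plus LEFT OUTER JOIN idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not Oracle: the trimmed-down tables and the three students are assumptions made for illustration, and SQLite's CAST(... AS TEXT) stands in for TO_CHAR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified versions of the question's tables (illustrative only).
    CREATE TABLE classes (classid TEXT PRIMARY KEY, classname TEXT);
    CREATE TABLE students (studentid TEXT PRIMARY KEY, studentname TEXT);
    CREATE TABLE registration (
        classid   TEXT REFERENCES classes (classid),
        studentid TEXT REFERENCES students (studentid),
        grade     INTEGER,
        PRIMARY KEY (classid, studentid)
    );
    INSERT INTO classes VALUES ('1', 'Class1'), ('2', 'Class2'), ('3', 'Class3');
    INSERT INTO students VALUES ('001', 'Student1'), ('002', 'Student2'), ('003', 'Student3');
    INSERT INTO registration VALUES ('1', '001', 4), ('1', '002', 2), ('2', '001', 3);
""")

# CROSS JOIN builds every student/class pair; the LEFT OUTER JOIN attaches a
# registration row where one exists; COALESCE fills in the gaps.
rows = conn.execute("""
    SELECT s.studentid,
           c.classid,
           COALESCE(CAST(r.grade AS TEXT), 'Not taken') AS grade
    FROM students s
    CROSS JOIN classes c
    LEFT OUTER JOIN registration r
           ON s.studentid = r.studentid AND c.classid = r.classid
    ORDER BY s.studentid, c.classid
""").fetchall()

for row in rows:
    print(row)
```

With 3 students and 3 classes this yields 9 rows, one per combination, regardless of how many registrations exist.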

This is just conditional aggregation:
select s.studentid,
max(case when r.classid = 1 then r.grade end) as class1_grade,
max(case when r.classid = 2 then r.grade end) as class2_grade,
. . .
from students s left join
registration r
on r.studentid = s.studentid
group by s.studentid;
You do have to list the columns explicitly. To avoid that, you need dynamic SQL (execute immediate).
Getting the results with one grade per row is simpler. Use a cross join to generate the rows and a left join to bring in the values:
select s.studentid, c.classid, r.grade
from students s cross join
classes c left join
registration r
on r.studentid = s.studentid and r.classid = c.classid;
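As a rough illustration of the "list the columns explicitly or generate them" point, the sketch below builds the conditional-aggregation column list in Python and runs it against SQLite. This is a stand-in for Oracle dynamic SQL, not the same thing; the table contents and the classN_grade aliases are made up for the example. Note the GROUP BY, which the aggregate needs in order to return one row per student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (classid TEXT PRIMARY KEY);
    CREATE TABLE students (studentid TEXT PRIMARY KEY);
    CREATE TABLE registration (classid TEXT, studentid TEXT, grade INTEGER);
    INSERT INTO classes VALUES ('1'), ('2'), ('3');
    INSERT INTO students VALUES ('001'), ('002'), ('003');
    INSERT INTO registration VALUES
        ('1', '001', 4), ('2', '001', 3), ('3', '001', 1), ('1', '002', 2);
""")

# Read the class list first, then generate one MAX(CASE ...) column per class.
# The class IDs are passed as bind parameters; only the aliases are generated.
class_ids = [row[0] for row in conn.execute("SELECT classid FROM classes ORDER BY classid")]
select_list = ", ".join(
    f"MAX(CASE WHEN r.classid = ? THEN r.grade END) AS class{i}_grade"
    for i, _ in enumerate(class_ids, start=1)
)
sql = f"""
    SELECT s.studentid, {select_list}
    FROM students s
    LEFT JOIN registration r ON r.studentid = s.studentid
    GROUP BY s.studentid
    ORDER BY s.studentid
"""
rows = conn.execute(sql, class_ids).fetchall()

for row in rows:
    print(row)
```

Classes a student is not taking come back as NULL (None in Python); wrap each column in COALESCE if a 'Not taken' label is wanted.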

Related

PLSQL - How to retrieve the maximum value except if another specific value exists

I'm looking to join data between two tables and want to retrieve the maximum semester except if a specific value also exists from a table where a person can have multiple semesters. Our semester system is coded as 'YEAR-numerical month value' ex: 12 = December. My test script looks for students from August 2021 and I want to see what their maximum semester is except if the value is '2022-8'. I want that number to be prioritized and pulled in, even if there is a higher value like '2023-1'.
This is the simplified script, which currently just looks at the maximum semester. I've tried IF and OR statements but keep getting generic errors, so I don't know what I'm doing wrong. Thank you for any help.
SELECT a.person,
i.field_of_study,
i.semester
FROM application_data a
LEFT JOIN information_table i ON i.person = a.person
WHERE a.semester = '2022-8'
AND i.semester = (SELECT MAX(i2.semester)
FROM information_table i2
WHERE i2.person = a.person
AND i.semester <= i2.semester)
The data table might look like this:
| Person A | Biology | 2022-5 |
| Person A | Biology | 2023-1 |
| Person B | Chemistry | 2022-1 |
| Person B | Psychology | 2022-8 |
| Person C | Mathematics | 2022-8 |
| Person C | Statistics | 2023-1 |
I would want the output to look like this:
| Person A | Biology | 2023-1 |
| Person B | Psychology | 2022-8 |
| Person C | Mathematics | 2022-8 |
For Person A, the highest semester was chosen because the prioritized semester does not exist. For Person C, the prioritized semester was chosen even though a higher value exists.
One option is to rank rows per person by semester in descending order, where 2022-8 ranks as the highest even though there are other "higher" values; finally extract rows whose rank = 1.
Sample data:
with test (person, field, semester) as
  (select 'A', 'biology'    , '2022-5' from dual union all
   select 'A', 'biology'    , '2023-1' from dual union all
   select 'B', 'chemistry'  , '2022-1' from dual union all
   select 'B', 'psychology' , '2022-8' from dual union all
   select 'C', 'mathematics', '2022-8' from dual union all
   select 'C', 'statistics' , '2023-1' from dual
  ),
-- Query begins here:
temp as
  (select person, field, semester,
          row_number() over
            (partition by person
             order by case when semester = '2022-8' then 1E9
                           else to_number(regexp_substr(semester, '^\d+') ||
                                          lpad(regexp_substr(semester, '\d+$'), 2, '0'))
                      end desc) rn
   from test
  )
select person, field, semester
from temp
where rn = 1;

PERSON     FIELD       SEMESTER
---------- ----------- ----------
A          biology     2023-1
B          psychology  2022-8
C          mathematics 2022-8
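The same ranking idea can be tried outside Oracle. Below is a hedged sketch with Python's sqlite3 module, assuming a SQLite build with window-function support (3.25 or later). SQLite has no REGEXP_SUBSTR, so the sort key is built with substr, which assumes the year is always four digits; the table name simply reuses information_table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE information_table (person TEXT, field TEXT, semester TEXT);
    INSERT INTO information_table VALUES
        ('A', 'biology',     '2022-5'),
        ('A', 'biology',     '2023-1'),
        ('B', 'chemistry',   '2022-1'),
        ('B', 'psychology',  '2022-8'),
        ('C', 'mathematics', '2022-8'),
        ('C', 'statistics',  '2023-1');
""")

# Rank rows per person: the prioritized semester sorts first, then the rest
# by a numeric YYYYMM key so that e.g. '2022-12' sorts after '2022-8'.
rows = conn.execute("""
    SELECT person, field, semester
    FROM (
        SELECT person, field, semester,
               ROW_NUMBER() OVER (
                   PARTITION BY person
                   ORDER BY CASE WHEN semester = '2022-8' THEN 1 ELSE 0 END DESC,
                            CAST(substr(semester, 1, 4) AS INTEGER) * 100
                              + CAST(substr(semester, 6) AS INTEGER) DESC
               ) AS rn
        FROM information_table
    )
    WHERE rn = 1
    ORDER BY person
""").fetchall()

for row in rows:
    print(row)
```

Using a separate priority flag as the first sort key avoids the magic 1E9 constant, at the cost of a second ORDER BY expression.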

Update MyTable with values from AnotherTable (with self join)

I'm relatively new to SQL and am working through some practical tasks to gain experience. I'm struggling to update my custom overview table with values from another table, where retrieving those values requires a join.
I have an overview table MyTable with column EmployeeID. AnotherTable contains data of employees with EmployeeID and their ManagerID.
I am able to retrieve ManagerName using different join methods, including:
SELECT m.first_name
FROM AnotherTable.employees e LEFT JOIN
AnotherTable.employees m
on m.EmployeeID = e.ManagerID
But I get stuck updating MyTable, as I usually receive errors such as "single row query returns more than one row" or "SQL command not properly ended". I've read that Oracle doesn't support joins when updating tables. How can I overcome this issue? Sample data would be:
MyTable
------------------------------
EmployeeID | SomeOtherColumns| ..
1 | SomeData |
2 | SomeData |
3 | SomeData |
4 | SomeData |
5 | SomeData |
------------------------------
OtherTable
-------------------------------------
EmployeeID | Name | ManagerID |
1 | Steve | - |
2 | John | 1 |
3 | Peter | 1 |
4 | Bob | 2 |
5 | Patrick | 3 |
6 | Connor | 1 |
-------------------------------------
And the result would be then:
MyTable
-------------------------------------------
EmployeeID | SomeOtherColumns |ManagerName|
1 | SomeData | - |
2 | SomeData | Steve |
3 | SomeData | Steve |
4 | SomeData | John |
5 | SomeData | Peter |
6 | SomeData | Steve |
-------------------------------------------
One of the options I tried is:
update MyTable
set MyTable.ManagerName = (
SELECT
(m.name) ManagerName
FROM
OtherTable.employees e
LEFT JOIN OtherTable.employees m ON
m.EmployeeID = e.ManagerID
)
But there I get "single row query returns more than one row" error. How is it possible to solve this?
You can use a hierarchical query:
UPDATE mytable m
SET managername = (SELECT name
FROM othertable
WHERE LEVEL = 2
START WITH employeeid = m.employeeid
CONNECT BY PRIOR managerid = employeeid);
or a self-join:
UPDATE mytable m
SET managername = (SELECT om.name
FROM othertable o
INNER JOIN othertable om
ON (o.managerid = om.employeeid)
WHERE o.employeeid = m.employeeid);
Which, for the sample data:
CREATE TABLE MyTable (EmployeeID, SomeOtherColumns, ManagerName) AS
SELECT LEVEL, 'SomeData', CAST(NULL AS VARCHAR2(20))
FROM DUAL
CONNECT BY LEVEL <= 5;
CREATE TABLE OtherTable(EmployeeID, Name, ManagerID) AS
SELECT 1, 'Alice', NULL FROM DUAL UNION ALL
SELECT 2, 'Beryl', 1 FROM DUAL UNION ALL
SELECT 3, 'Carol', 1 FROM DUAL UNION ALL
SELECT 4, 'Debra', 2 FROM DUAL UNION ALL
SELECT 5, 'Emily', 3 FROM DUAL UNION ALL
SELECT 6, 'Fiona', 1 FROM DUAL;
Then after either update, MyTable contains:
EMPLOYEEID | SOMEOTHERCOLUMNS | MANAGERNAME
---------: | :--------------- | :----------
1 | SomeData | null
2 | SomeData | Alice
3 | SomeData | Alice
4 | SomeData | Beryl
5 | SomeData | Carol
Note: Keeping this data violates third-normal form; instead, you should keep the employee name in the table with the other employee data and then when you want to display the manager's name use SELECT ... FROM ... LEFT OUTER JOIN with a hierarchical query to include the result. What you do not want to do is duplicate the data as then it has the potential to become out-of-sync when something changes.
db<>fiddle here
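The self-join variant is portable enough to try elsewhere. Here is a small sketch with Python's sqlite3 module using the same sample names as above; it is only an approximation of the Oracle statement (the tables are created with plain CREATE TABLE, and the outer table is referenced by name rather than by alias). The key point is that the subquery is correlated on the row being updated, so it returns at most one value per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (employeeid INTEGER PRIMARY KEY, managername TEXT);
    CREATE TABLE othertable (employeeid INTEGER PRIMARY KEY, name TEXT, managerid INTEGER);
    INSERT INTO mytable (employeeid) VALUES (1), (2), (3), (4), (5);
    INSERT INTO othertable VALUES
        (1, 'Alice', NULL), (2, 'Beryl', 1), (3, 'Carol', 1),
        (4, 'Debra', 2),    (5, 'Emily', 3), (6, 'Fiona', 1);
""")

# The WHERE clause ties the subquery to the current mytable row; without it
# the subquery returns every employee's manager and the update fails or is wrong.
conn.execute("""
    UPDATE mytable
    SET managername = (
        SELECT om.name
        FROM othertable o
        INNER JOIN othertable om ON o.managerid = om.employeeid
        WHERE o.employeeid = mytable.employeeid
    )
""")

rows = conn.execute(
    "SELECT employeeid, managername FROM mytable ORDER BY employeeid"
).fetchall()
print(rows)
```

Employee 1 has no manager, so the subquery finds no row and the column stays NULL.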

How to join tables selecting duplicates with condition

I have 2 tables
TicketHeaders TH
ticketID | Amount
-------: | -----:
1 | 600
2 | 900
3 | 400
TicketBody TB
ticketID | SellerName | SellerType
-------: | :--------- | :---------
1 | Karen | Manager
1 | James | Trainee
2 | John | Manager
3 | James | Trainee
What I need is to get a table with
TicketID - Amount - SellerName, but if I have a ticket with 2 sellers, I need to select only the manager for that particular ticket.
The output table should be:
ticketID | Amount | SellerName
-------: | -----: | :---------
1 | 600 | Karen
2 | 900 | John
3 | 400 | James
If I use left join, I get duplicate amounts for ticket 1
SELECT TH.ticketID, TH.Amount, TB.SellerName
FROM TH
LEFT JOIN TB ON TH.ticketID = TB.ticketID
SELECT TH.ticketID, TH.Amount, COALESCE(TB_M.SellerName, TB_T.SellerName)
FROM TH
LEFT JOIN TB TB_M ON TH.ticketID = TB_M.ticketID AND TB_M.SellerType = 'Manager'
LEFT JOIN TB TB_T ON TH.ticketID = TB_T.ticketID AND TB_T.SellerType <> 'Manager'
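To check the double LEFT JOIN with COALESCE against the question's sample data, here is a quick sketch using Python's sqlite3 module rather than Firebird. The lower-case table and column names are just this example's spelling of the question's TH and TB tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE th (ticketid INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE tb (ticketid INTEGER, sellername TEXT, sellertype TEXT);
    INSERT INTO th VALUES (1, 600), (2, 900), (3, 400);
    INSERT INTO tb VALUES
        (1, 'Karen', 'Manager'), (1, 'James', 'Trainee'),
        (2, 'John',  'Manager'), (3, 'James', 'Trainee');
""")

# One join picks up the manager (if any), the other the non-manager (if any);
# COALESCE prefers the manager's name when both exist.
rows = conn.execute("""
    SELECT th.ticketid, th.amount,
           COALESCE(tb_m.sellername, tb_t.sellername) AS sellername
    FROM th
    LEFT JOIN tb AS tb_m ON th.ticketid = tb_m.ticketid AND tb_m.sellertype = 'Manager'
    LEFT JOIN tb AS tb_t ON th.ticketid = tb_t.ticketid AND tb_t.sellertype <> 'Manager'
    ORDER BY th.ticketid
""").fetchall()

for row in rows:
    print(row)
```

This only stays at one row per ticket while each ticket has at most one manager and at most one non-manager; more sellers of either kind would multiply the rows again.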
Based on the stated version 2.5 it does appear that row_number() solutions would not be available to you. You could approach this with a single inner join. I don't know if there's possibly any benefit to avoiding the extra join in Firebird.
select th.ticketID, min(Amount) as Amount,
case min(case SellerType when 'Manager' then 1 else 2 end) when 1
then min(case when SellerType = 'Manager' then SellerName end)
else min(case when SellerType <> 'Manager' then SellerName end)
end SellerName
from TH th inner join TB tb on tb.ticketID = th.ticketID
group by th.ticketID
Another benefit is that this would work for a larger hierarchy of different sellers (by adding new cases.) It wouldn't work if there were multiple sellers at a single level though.
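The single-join aggregation can be sanity-checked the same way. The sketch below runs it in SQLite through Python's sqlite3 module (not Firebird 2.5, so treat it as an approximation); table and column names are lower-cased versions of the question's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE th (ticketid INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE tb (ticketid INTEGER, sellername TEXT, sellertype TEXT);
    INSERT INTO th VALUES (1, 600), (2, 900), (3, 400);
    INSERT INTO tb VALUES
        (1, 'Karen', 'Manager'), (1, 'James', 'Trainee'),
        (2, 'John',  'Manager'), (3, 'James', 'Trainee');
""")

# The inner MIN(CASE ...) finds the best (lowest) seller level on the ticket;
# the outer CASE then returns the name of a seller at that level.
rows = conn.execute("""
    SELECT th.ticketid,
           MIN(th.amount) AS amount,
           CASE MIN(CASE tb.sellertype WHEN 'Manager' THEN 1 ELSE 2 END)
               WHEN 1 THEN MIN(CASE WHEN tb.sellertype = 'Manager' THEN tb.sellername END)
               ELSE MIN(CASE WHEN tb.sellertype <> 'Manager' THEN tb.sellername END)
           END AS sellername
    FROM th
    INNER JOIN tb ON tb.ticketid = th.ticketid
    GROUP BY th.ticketid
    ORDER BY th.ticketid
""").fetchall()

for row in rows:
    print(row)
```

Because of the INNER JOIN, a ticket with no sellers at all would drop out of the result.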
I would use a row_number() in this scenario:
with _cte as (
SELECT TH.ticketID,
TH.Amount,
TB.SellerName,
row_number() over (partition by TH.ticketID order by case when SellerType = 'Manager' then 0 else 1 end) as rn
FROM TH
LEFT JOIN TB ON TH.ticketID = TB.ticketID
)
select ticketID, Amount, SellerName
from _cte
where rn = 1
A.S. I think your problem is not well defined yet:
I need to select only the manager for that particular ticket
What if a ticket has many rows with two or more managers? Or zero managers? Unless you think this through, for example by adding guarantees that the SQL server never allows such data to be inserted, it is an error waiting to happen.
I also think SellerType would be better as an integer field, a foreign key to some Seller_types dictionary table, both because integer fields are easier to index and compare when joining tables, and because that would allow you to later rename "functional roles" as you please (or as your bosses tell you to) without changing anything else. Your Seller_types table can have extra columns like an integer role_priority or even something like max_persons_of_type_in_one_ticket (your bosses might, just for example, decide there can be two managers on one ticket, or a manager and a vice-manager, and then no more than four trainees).
But back to the question: there is one more approach. It would de facto run a correlated sub-query for every row in the TicketHeaders table, so it would probably be slower if you do "long" selects with thousands of rows, especially if you keep Firebird's small default memory caches (see the articles on configuring Firebird and the relaxed configs on ib-aid.com).
On the other hand, it is a "do once and forget" change that makes your queries simpler, so there is less chance for you to err in the future. The speed penalty would probably be unnoticeable on short queries (100 rows or fewer). And there would be no penalty at all if you do not query that new column (you do NOT use select * queries in production, do you?).
So, the code, finally: db<>fiddle here
create table Ticket_Headers (
ticket_id integer primary key,
amount integer not null
)
create table Ticket_Body (
ticket_id integer
REFERENCES Ticket_Headers(ticket_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
Seller_Name varchar(20) not null,
Seller_Type varchar(20) not null,
CONSTRAINT TicketBody_PK PRIMARY KEY (ticket_id, Seller_Name)
)
create index idx_TicketBody_Type on Ticket_Body(Seller_Type)
insert into Ticket_Headers
select 1, 600 from rdb$database union all
select 2, 900 from rdb$database union all
select 3, 400 from rdb$database
3 rows affected
insert into Ticket_body
select 1, 'Karen', 'Manager' from rdb$database union all
select 1, 'James', 'Trainee' from rdb$database union all
select 2, 'John', 'Manager' from rdb$database union all
select 3, 'James', 'Trainee' from rdb$database
4 rows affected
select * from Ticket_Headers
TICKET_ID | AMOUNT
--------: | -----:
1 | 600
2 | 900
3 | 400
select * from Ticket_Body
TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | :---------- | :----------
1 | Karen | Manager
1 | James | Trainee
2 | John | Manager
3 | James | Trainee
select * from Ticket_Headers TH, Ticket_Body TB where TH.ticket_id = TB.ticket_ID
TICKET_ID | AMOUNT | TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | -----: | --------: | :---------- | :----------
1 | 600 | 1 | Karen | Manager
1 | 600 | 1 | James | Trainee
2 | 900 | 2 | John | Manager
3 | 400 | 3 | James | Trainee
alter table Ticket_Headers
add Seller_Top computed by
( -- this parenthesis is required by COMPUTED BY SQL syntax
( -- this parenthesis is required to coerce SELECT from query to expression
select First(1) TB.Seller_Name
from Ticket_Body TB
where TB.ticket_id = Ticket_Headers.ticket_id
order by TB.Seller_type /* Descending - if other order to be needed */
)
)
select * from Ticket_Headers
TICKET_ID | AMOUNT | SELLER_TOP
--------: | -----: | :---------
1 | 600 | Karen
2 | 900 | John
3 | 400 | James
The aforementioned Seller_types.role_priority would be a much more flexible, and thus more future-proof, basis for such an order-by.

SQL: Multiple select statements in one query

I want to select information from three SQL tables within one query.
An example could be the following setup.
tblFriends
id | idmother | dayBirth
--------------------------
1 | 1 | 09/09/21
2 | 2 | 09/09/21
3 | 3 | 11/09/21
4 | 3 | 11/09/21
5 | 4 | 07/09/21
... | ... | ...
tblMothers
id | name
---------------
1 | Alice
2 | Samantha
3 | Veronica
4 | Maria
... | ...
tblIsAssignedParty
idMother | codeParty | price
------------------------------
1 | 231 | 15
2 | 645 | 28
3 | 164 | 33
... | ... | ...
I want to have a query that gives me the following:
dayBirth | weekDay | totalFriendsForParty | totalFriendsForPartyPercent | totalFriendsNoParty | totalFriendsNoPartyPercent
-----------------------------------------------------------------------------------------------------------------------------
07/09/21 | Tuesday | 0 | 0 | 1 | 0.??
09/09/21 | Thursday | 2 | 0.?? | 0 | 0
11/09/21 | Saturday | 2 | 0.?? | 0 | 0
Note:
dayBirth = simply the day of birth; I need the friends grouped by this date
weekDay = dayBirth name
totalFriendsForParty = friends who will be attending the party; we know if the mother has a party assigned
totalFriendsForPartyPercent = Percentage of friends, of the total number of friends who will attend the parties
totalFriendsNoParty = friends who will not attend the party; we know if the mother does not have a party assigned
totalFriendsNoPartyPercent = Percentage of friends, of the total number of friends who will not attend the parties
I need the number of friends based on whether their mothers are at a party or not. I tried to use multiple select statements in a single query, but the following code didn't work:
SELECT
(SELECT distinct dayBirth, TO_CHAR(dayBirth, 'DAY') from tblFriends) as firstSecondColumn,
(SELECT dayBirth, count(*) from tblFriends
where idMother IN (
SELECT f.idMother
from tblFriends f
left join tblIsAssignedParty iap
on f.idMother = iap.idMother
where iap.codeParty is not null)
group by dayBirth) as thirdColumn,
(SELECT TRUNC(count(*) / count(thirdColumn.id) , 2) from tblFriends) as quarterColumn,
(SELECT dayBirth, count(*) from tblFriends
where idMother IN (
SELECT f.idMother
from tblFriends f
left join tblIsAssignedParty iap
on f.idMother = iap.idMother
where iap.codeParty is not null)
group by dayBirth) as fifthColumn,
(SELECT TRUNC(count(*) / count(fifthColumn.id) , 2) from tblFriends) as sixthColumn,
order by dayBirth
Any advice on this one? I try to learn, I do what I can :-(
Edit: I can't add inserts because it's a file upload, but I can add an approximation of table creation.
Create tables:
CREATE TABLE tblFriends
(
id NUMBER(*,0),
idMother CHAR(10 CHAR),
CONSTRAINT PK_FRIEND PRIMARY KEY (id, idMother),
CONSTRAINT FK_IDMOTHER FOREIGN KEY (idMother)
REFERENCES tblMothers (id),
dayBirth DATE CONSTRAINT NN_DAY NOT NULL
)
CREATE TABLE tblMothers
(
id CHAR(10 CHAR) CONSTRAINT PK_MOTHER PRIMARY KEY,
name VARCHAR2(20 CHAR) CONSTRAINT NN_MNAME NOT NULL
)
CREATE TABLE tblIsAssignedParty
(
idMother CHAR(10 CHAR),
codeParty CHAR(10 CHAR),
CONSTRAINT PK_ASSIGNED PRIMARY KEY (idMother, codeParty),
CONSTRAINT FK_ASSIGNEDMOTHER FOREIGN KEY (idMother)
REFERENCES tblMothers (id),
CONSTRAINT FK_ASSIGNEDPARTY FOREIGN KEY (codeParty)
REFERENCES tblParties (codeParty),
price DECIMAL(10,2)
)
You appear to want to LEFT JOIN the friends and party tables and then use conditional aggregation:
SELECT dayBirth,
TO_CHAR(dayBirth, 'FMDAY', 'NLS_DATE_LANGUAGE=English') AS day,
COUNT(p.idmother)
AS totalFriendsForParty,
COUNT(p.idmother) / COUNT(*) * 100
AS totalFriendsForPartyPercent,
COUNT(CASE WHEN p.idmother IS NULL THEN 1 END) AS totalFriendsNoParty,
COUNT(CASE WHEN p.idmother IS NULL THEN 1 END) / COUNT(*) * 100
AS totalFriendsNoPartyPercent
FROM tblFriends f
LEFT OUTER JOIN tblIsAssignedParty p
ON (f.idmother = p.idmother)
GROUP BY dayBirth
Which, for the sample data:
CREATE TABLE tblFriends (id, idmother, dayBirth) AS
SELECT 1, 1, DATE '2021-09-09' FROM DUAL UNION ALL
SELECT 2, 2, DATE '2021-09-09' FROM DUAL UNION ALL
SELECT 3, 3, DATE '2021-09-11' FROM DUAL UNION ALL
SELECT 4, 3, DATE '2021-09-11' FROM DUAL UNION ALL
SELECT 5, 4, DATE '2021-09-07' FROM DUAL;
CREATE TABLE tblIsAssignedParty (idMother, codeParty, price) AS
SELECT 1, 231, 15 FROM DUAL UNION ALL
SELECT 2, 645, 28 FROM DUAL UNION ALL
SELECT 3, 164, 33 FROM DUAL;
Outputs:
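DAYBIRTH | DAY | TOTALFRIENDSFORPARTY | TOTALFRIENDSFORPARTYPERCENT | TOTALFRIENDSNOPARTY | TOTALFRIENDSNOPARTYPERCENT
:-------- | :------- | -------------------: | --------------------------: | ------------------: | -------------------------:
09-SEP-21 | THURSDAY | 2 | 100 | 0 | 0
11-SEP-21 | SATURDAY | 2 | 100 | 0 | 0
07-SEP-21 | TUESDAY | 0 | 0 | 1 | 100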
db<>fiddle here
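Here is a rough equivalent of the LEFT JOIN plus conditional aggregation that can be run with Python's sqlite3 module. Several things differ from the Oracle version and are assumptions of this sketch: dates are stored as ISO text, the weekday name is computed in Python instead of with TO_CHAR, and the counts are multiplied by 100.0 so that SQLite does not perform integer division.

```python
import sqlite3
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFriends (id INTEGER, idMother INTEGER, dayBirth TEXT);
    CREATE TABLE tblIsAssignedParty (idMother INTEGER, codeParty INTEGER, price INTEGER);
    INSERT INTO tblFriends VALUES
        (1, 1, '2021-09-09'), (2, 2, '2021-09-09'),
        (3, 3, '2021-09-11'), (4, 3, '2021-09-11'), (5, 4, '2021-09-07');
    INSERT INTO tblIsAssignedParty VALUES (1, 231, 15), (2, 645, 28), (3, 164, 33);
""")

# COUNT(p.idMother) counts only friends whose mother matched a party row;
# the CASE ... IS NULL variant counts the friends left unmatched by the join.
rows = conn.execute("""
    SELECT f.dayBirth,
           COUNT(p.idMother) AS totalFriendsForParty,
           COUNT(p.idMother) * 100.0 / COUNT(*) AS totalFriendsForPartyPercent,
           COUNT(CASE WHEN p.idMother IS NULL THEN 1 END) AS totalFriendsNoParty,
           COUNT(CASE WHEN p.idMother IS NULL THEN 1 END) * 100.0 / COUNT(*)
               AS totalFriendsNoPartyPercent
    FROM tblFriends f
    LEFT OUTER JOIN tblIsAssignedParty p ON f.idMother = p.idMother
    GROUP BY f.dayBirth
    ORDER BY f.dayBirth
""").fetchall()

# Add the weekday name in Python (date.weekday() returns 0 for Monday).
report = [
    (day, WEEKDAYS[date.fromisoformat(day).weekday()], *counts)
    for day, *counts in rows
]
for line in report:
    print(line)
```

As in the Oracle answer, the percentages here are per birth date; a mother with more than one assigned party would be counted once per party, so that case would need a DISTINCT or a pre-aggregated party table.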

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so no LEFT JOIN is necessary. They can also be linked to the same business more than once (leaving and coming back). Multiple records for the same business would be aggregated.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it runs on a large dataset and takes a long time. I'm wondering if anyone has ideas on how to improve it so that there isn't so much wasted processing (i.e. needing to do a DISTINCT on the final result, or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.
You could try and use a correlated subquery. This removes the need for outer distinct:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i
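A quick way to try the correlated-subquery shape is with Python's sqlite3 module. SQLite has no DECODE, so this sketch swaps in a CASE expression, and it returns the business ENTERPRISE_ID (as in the expected output) rather than the numeric bus_id. The tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
    INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
    INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
    INSERT INTO ind_to_business VALUES (5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4);
""")

# One scalar subquery per individual: exactly one distinct business gives its
# enterprise_id, anything else gives the default label.
rows = conn.execute("""
    SELECT i.enterprise_id AS ind_id,
           i.name,
           (SELECT CASE WHEN COUNT(DISTINCT b.enterprise_id) = 1
                        THEN MIN(b.enterprise_id)
                        ELSE 'Multiple'
                   END
            FROM ind_to_business i2b
            INNER JOIN business b ON i2b.bus_id = b.id
            WHERE i2b.ind_id = i.id) AS linked_bus
    FROM individual i
    ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

Whether this beats the windowed version on a large dataset depends on the indexes available on ind_to_business(ind_id); that would need measuring rather than assuming.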
You can join with the aggregated ind_to_business per individual. One way to do this:
select i.id, i.name, coalesce(b.enterprise_id, 'Multiple')
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
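The MIN = MAX trick can be checked on the question's sample data with a short sketch like the one below, using Python's sqlite3 module instead of Oracle. It selects the individual's ENTERPRISE_ID instead of i.id so that the rows line up with the expected output; that substitution is this example's choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
    INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
    INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
    INSERT INTO ind_to_business VALUES (5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4);
""")

# If the smallest and largest bus_id for a person are equal, the person is
# linked to exactly one business; otherwise bus_id is NULL, the LEFT JOIN
# finds no business, and COALESCE supplies 'Multiple'.
rows = conn.execute("""
    SELECT i.enterprise_id, i.name, COALESCE(b.enterprise_id, 'Multiple') AS linked_bus
    FROM individual i
    JOIN (
        SELECT ind_id,
               CASE WHEN MIN(bus_id) = MAX(bus_id) THEN MIN(bus_id) END AS bus_id
        FROM ind_to_business
        GROUP BY ind_id
    ) ib ON ib.ind_id = i.id
    LEFT JOIN business b ON b.id = ib.bus_id
    ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

This version never needs DISTINCT or COUNT(DISTINCT): the link table is aggregated once per person before the business table is touched.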
First, use a sub-query to get all the needed dimensions, and then do the final aggregation with a CASE expression.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else min(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id
There is no need to use DISTINCT twice. You can use subquery factoring: put the in-line view in a WITH clause and make the data set DISTINCT in the subquery itself.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple