How to join tables selecting duplicates with condition - sql

I have 2 tables:
TicketHeaders TH
ticketID | Amount
-------: | -----:
1 | 600
2 | 900
3 | 400
TicketBody TB
ticketID | SellerName | SellerType
-------: | :--------- | :---------
1 | Karen | Manager
1 | James | Trainee
2 | John | Manager
3 | James | Trainee
What I need is to get a table with TicketID - Amount - SellerName, but if I have a ticket with 2 sellers, I need to select only the manager for that particular ticket.
The output table should be:
ticketID | Amount | SellerName
-------: | -----: | :---------
1 | 600 | Karen
2 | 900 | John
3 | 400 | James
If I use left join, I get duplicate amounts for ticket 1
SELECT TH.ticketID, TH.Amount, TB.SellerName
FROM TH
LEFT JOIN TB ON TH.ticketID = TB.ticketID

SELECT TH.ticketID, TH.Amount, COALESCE(TB_M.SellerName, TB_T.SellerName)
FROM TH
LEFT JOIN TB TB_M ON TH.ticketID = TB_M.ticketID AND TB_M.SellerType = 'Manager'
LEFT JOIN TB TB_T ON TH.ticketID = TB_T.ticketID AND TB_T.SellerType <> 'Manager'

Based on the stated version 2.5 it does appear that row_number() solutions would not be available to you. You could approach this with a single inner join. I don't know if there's possibly any benefit to avoiding the extra join in Firebird.
select th.ticketID, min(th.Amount) as Amount,
case min(case tb.SellerType when 'Manager' then 1 else 2 end) when 1
then min(case when tb.SellerType = 'Manager' then tb.SellerName end)
else min(case when tb.SellerType <> 'Manager' then tb.SellerName end)
end as SellerName
from TH th inner join TB tb on tb.ticketID = th.ticketID
group by th.ticketID
Another benefit is that this would work for a larger hierarchy of different sellers (by adding new cases.) It wouldn't work if there were multiple sellers at a single level though.

I would use a row_number() in this scenario:
with _cte as (
SELECT TH.ticketID,
TH.Amount,
TB.SellerName,
row_number() over (partition by TH.ticketID order by case when SellerType = 'Manager' then 0 else 1 end) as rn
FROM TH
LEFT JOIN TB ON TH.ticketID = TB.ticketID
)
select ticketID, Amount, SellerName
from _cte
where rn = 1

A.S. I think your problem is not well defined yet:
I need to select only the manager for that particular ticket
What if there are many rows with two or more managers? Or zero managers? Unless you have thought it through, for example by guaranteeing that the SQL server would never allow such data to be inserted, it is an error waiting to happen.
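One way to check for those edge cases before relying on any "pick the manager" query is to count managers per ticket. This is a minimal sketch that uses Python's built-in sqlite3 module rather than Firebird. The table and column names follow the question, and tickets 4 and 5 are invented here purely to exercise the edge cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TicketBody (ticketID INTEGER, SellerName TEXT, SellerType TEXT);
    INSERT INTO TicketBody VALUES
        (1, 'Karen', 'Manager'), (1, 'James', 'Trainee'),
        (2, 'John',  'Manager'),
        (3, 'James', 'Trainee'),
        -- hypothetical rows, not in the question: two managers on ticket 4,
        -- two trainees and no manager on ticket 5
        (4, 'Ann', 'Manager'), (4, 'Bob', 'Manager'),
        (5, 'Cid', 'Trainee'), (5, 'Dee', 'Trainee');
""")

# Tickets where "the manager, else the only seller" is ambiguous:
# more than one manager, or several sellers with no manager at all.
ambiguous = conn.execute("""
    SELECT ticketID,
           SUM(CASE WHEN SellerType = 'Manager' THEN 1 ELSE 0 END) AS managers,
           COUNT(*) AS sellers
    FROM TicketBody
    GROUP BY ticketID
    HAVING SUM(CASE WHEN SellerType = 'Manager' THEN 1 ELSE 0 END) > 1
        OR (SUM(CASE WHEN SellerType = 'Manager' THEN 1 ELSE 0 END) = 0
            AND COUNT(*) > 1)
    ORDER BY ticketID
""").fetchall()

print(ambiguous)
```

An empty result would mean every ticket has an unambiguous seller to report. With the invented rows, tickets 4 and 5 are flagged.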
I also think SellerType would be better as an integer field, a foreign key to some Seller_types dictionary table, both because integer fields are easier to index and compare when joining tables, and because that would allow you to rename "functional roles" later as you please (or as you are told to), without changing anything else. Your Seller_types table can then have extra columns like an integer role_priority or even something like max_persons_of_type_in_one_ticket (your bosses might, just for example, decide there can be two managers on one ticket, or a manager and a vice-manager, and then no more than four trainees).
But back to the question: there is one more approach. It would de facto run a correlated sub-query for every row in the TicketHeaders table, so it would probably be slower if you do "long" selects with thousands of rows, especially if you keep Firebird's small default memory caches (see the articles on configuring Firebird and the relaxed configs on ib-aid.com).
On the other hand, it is a "do once and forget" change that makes your queries simpler, so there is less chance for you to err in the future. The speed penalty would probably be unnoticeable on short queries (100 rows or fewer). And there would be no penalty at all if you do not query that new column (you do NOT use select * queries in production, do you?).
So, the code, finally: db<>fiddle here
create table Ticket_Headers (
ticket_id integer primary key,
amount integer not null
)
create table Ticket_Body (
ticket_id integer
REFERENCES Ticket_Headers(ticket_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
Seller_Name varchar(20) not null,
Seller_Type varchar(20) not null,
CONSTRAINT TicketBody_PK PRIMARY KEY (ticket_id, Seller_Name)
)
create index idx_TicketBody_Type on Ticket_Body(Seller_Type)
insert into Ticket_Headers
select 1, 600 from rdb$database union all
select 2, 900 from rdb$database union all
select 3, 400 from rdb$database
3 rows affected
insert into Ticket_body
select 1, 'Karen', 'Manager' from rdb$database union all
select 1, 'James', 'Trainee' from rdb$database union all
select 2, 'John', 'Manager' from rdb$database union all
select 3, 'James', 'Trainee' from rdb$database
4 rows affected
select * from Ticket_Headers
TICKET_ID | AMOUNT
--------: | -----:
1 | 600
2 | 900
3 | 400
select * from Ticket_Body
TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | :---------- | :----------
1 | Karen | Manager
1 | James | Trainee
2 | John | Manager
3 | James | Trainee
select * from Ticket_Headers TH, Ticket_Body TB where TH.ticket_id = TB.ticket_ID
TICKET_ID | AMOUNT | TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | -----: | --------: | :---------- | :----------
1 | 600 | 1 | Karen | Manager
1 | 600 | 1 | James | Trainee
2 | 900 | 2 | John | Manager
3 | 400 | 3 | James | Trainee
alter table Ticket_Headers
add Seller_Top computed by
( -- this parenthesis is required by COMPUTED BY SQL syntax
( -- this parenthesis is required to coerce SELECT from query to expression
select First(1) TB.Seller_Name
from Ticket_Body TB
where TB.ticket_id = Ticket_Headers.ticket_id
order by TB.Seller_type /* Descending - if other order to be needed */
)
)
select * from Ticket_Headers
TICKET_ID | AMOUNT | SELLER_TOP
--------: | -----: | :---------
1 | 600 | Karen
2 | 900 | John
3 | 400 | James
The aforementioned Seller_types.role_priority would be a much more flexible, and thus more future-proof, basis for such an ORDER BY.
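As a rough illustration of that role_priority idea, here is a minimal sketch using Python's built-in sqlite3 module, not Firebird. The Seller_Types layout, the type_id column and the priority values are assumptions made for this example, and SQLite's LIMIT 1 stands in for Firebird's FIRST 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Seller_Types (
        type_id       INTEGER PRIMARY KEY,
        type_name     TEXT NOT NULL,
        role_priority INTEGER NOT NULL   -- assumption: lower number wins
    );
    CREATE TABLE Ticket_Headers (ticket_id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE Ticket_Body (
        ticket_id   INTEGER REFERENCES Ticket_Headers(ticket_id),
        seller_name TEXT NOT NULL,
        type_id     INTEGER NOT NULL REFERENCES Seller_Types(type_id)
    );
    INSERT INTO Seller_Types VALUES (1, 'Trainee', 20), (2, 'Manager', 10);
    INSERT INTO Ticket_Headers VALUES (1, 600), (2, 900), (3, 400);
    INSERT INTO Ticket_Body VALUES
        (1, 'Karen', 2), (1, 'James', 1), (2, 'John', 2), (3, 'James', 1);
""")

# Correlated subquery: for each ticket, take the seller whose type has the
# best (lowest) priority; seller_name breaks ties deterministically.
top_sellers = conn.execute("""
    SELECT th.ticket_id, th.amount,
           (SELECT tb.seller_name
            FROM Ticket_Body tb
            JOIN Seller_Types st ON st.type_id = tb.type_id
            WHERE tb.ticket_id = th.ticket_id
            ORDER BY st.role_priority, tb.seller_name
            LIMIT 1) AS seller_top
    FROM Ticket_Headers th
    ORDER BY th.ticket_id
""").fetchall()

print(top_sellers)
```

Ordering by a numeric priority means the result no longer depends on how the role names happen to sort alphabetically, so roles can be renamed or added without touching the query.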

Related

Update MyTable with values from AnotherTable (with self join)

I'm relatively new to SQL and am working through some practical tasks to gain experience. I'm struggling to update my custom overview table with values from another table, where the lookup requires a join.
I have an overview table MyTable with column EmployeeID. AnotherTable contains data of employees with EmployeeID and their ManagerID.
I am able to retrieve ManagerName using different join methods, including:
SELECT m.first_name
FROM AnotherTable.employees e LEFT JOIN
AnotherTable.employees m
on m.EmployeeID = e.ManagerID
But I get stuck when updating MyTable: I usually receive errors such as "single row query returns more than one row" or "SQL command not properly ended". I've read that Oracle doesn't support joins for updating tables. How can I overcome this issue? Sample data would be:
MyTable
------------------------------
EmployeeID | SomeOtherColumns| ..
1 | SomeData |
2 | SomeData |
3 | SomeData |
4 | SomeData |
5 | SomeData |
------------------------------
OtherTable
-------------------------------------
EmployeeID | Name | ManagerID |
1 | Steve | - |
2 | John | 1 |
3 | Peter | 1 |
4 | Bob | 2 |
5 | Patrick | 3 |
6 | Connor | 1 |
-------------------------------------
And the result would be then:
MyTable
-------------------------------------------
EmployeeID | SomeOtherColumns |ManagerName|
1 | SomeData | - |
2 | SomeData | Steve |
3 | SomeData | Steve |
4 | SomeData | John |
5 | SomeData | Peter |
6 | SomeData | Steve |
-------------------------------------------
One of the options I tried is:
update MyTable
set MyTable.ManagerName = (
SELECT
(m.name) ManagerName
FROM
OtherTable.employees e
LEFT JOIN OtherTable.employees m ON
m.EmployeeID = e.ManagerID
)
But there I get "single row query returns more than one row" error. How is it possible to solve this?
You can use a hierarchical query:
UPDATE mytable m
SET managername = (SELECT name
FROM othertable
WHERE LEVEL = 2
START WITH employeeid = m.employeeid
CONNECT BY PRIOR managerid = employeeid);
or a self-join:
UPDATE mytable m
SET managername = (SELECT om.name
FROM othertable o
INNER JOIN othertable om
ON (o.managerid = om.employeeid)
WHERE o.employeeid = m.employeeid);
Which, for the sample data:
CREATE TABLE MyTable (EmployeeID, SomeOtherColumns, ManagerName) AS
SELECT LEVEL, 'SomeData', CAST(NULL AS VARCHAR2(20))
FROM DUAL
CONNECT BY LEVEL <= 5;
CREATE TABLE OtherTable(EmployeeID, Name, ManagerID) AS
SELECT 1, 'Alice', NULL FROM DUAL UNION ALL
SELECT 2, 'Beryl', 1 FROM DUAL UNION ALL
SELECT 3, 'Carol', 1 FROM DUAL UNION ALL
SELECT 4, 'Debra', 2 FROM DUAL UNION ALL
SELECT 5, 'Emily', 3 FROM DUAL UNION ALL
SELECT 6, 'Fiona', 1 FROM DUAL;
Then after either update, MyTable contains:
EMPLOYEEID | SOMEOTHERCOLUMNS | MANAGERNAME
---------: | :--------------- | :----------
1 | SomeData | null
2 | SomeData | Alice
3 | SomeData | Alice
4 | SomeData | Beryl
5 | SomeData | Carol
Note: Keeping this data violates third-normal form; instead, you should keep the employee name in the table with the other employee data and then when you want to display the manager's name use SELECT ... FROM ... LEFT OUTER JOIN with a hierarchical query to include the result. What you do not want to do is duplicate the data as then it has the potential to become out-of-sync when something changes.
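To make the note concrete, the manager's name can be looked up at query time instead of being stored. The sketch below uses Python's built-in sqlite3 module rather than Oracle, and because SQLite has no CONNECT BY it swaps in a plain self-join for the single manager level. The data is the sample from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OtherTable (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER);
    INSERT INTO OtherTable VALUES
        (1, 'Alice', NULL), (2, 'Beryl', 1), (3, 'Carol', 1),
        (4, 'Debra', 2),    (5, 'Emily', 3), (6, 'Fiona', 1);
""")

# Resolve the manager's name when reading, so there is no stored copy
# that could drift out of sync with OtherTable.
manager_names = conn.execute("""
    SELECT e.EmployeeID, e.Name, m.Name AS ManagerName
    FROM OtherTable e
    LEFT OUTER JOIN OtherTable m ON m.EmployeeID = e.ManagerID
    ORDER BY e.EmployeeID
""").fetchall()

for row in manager_names:
    print(row)
```

The LEFT OUTER JOIN keeps employee 1, who has no manager, with a NULL ManagerName.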
db<>fiddle here

SQL - Filter common records from a subquery

I've a table storing data about employees & their respective departments.
One employee can belong to > 1 department.
Without going into much detail, below is the result of one of my queries, which joins multiple tables.
Query:
Select
e.EmpID,
d.Department,
ed.Date,
ed.Action
from Employee e
inner join Emp_Dept ed on ....
inner join Department d on ....
where .....
Data fetched:
EmpID | Department | Date | Action
----: | :--------- | :--------- | :------
1 | Food | 01-01-2021 | ADDED
2 | Food | 01-01-2021 | ADDED
2 | Food | 04-01-2021 | REMOVED
2 | Auto | 01-01-2021 | ADDED
3 | Electric | 02-01-2021 | ADDED
3 | Electric | 04-01-2021 | REMOVED
3 | Auto | 04-01-2021 | REMOVED
From this data I want to remove those employees who have been both added to and removed from the same department.
That is from above data EmpId 2 -> Food & EmpId 3 -> Electric should be excluded.
Please suggest how to filter this out?
If I understand correctly, you can use not exists:
select t.*
from t
where t.action = 'ADDED' and
not exists (select 1
from t t2
where t2.empid = t.empid and
t2.department = t.department and
t2.action = 'REMOVED'
);
EDIT:
If you want employees that have more added than removed for a given department:
select empid, deptid
from t
group by empid, deptid
having sum(case when action = 'ADDED' then 1
when action = 'REMOVED' then -1
end) > 0;
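The net-count idea can be checked against the question's sample rows. This is a minimal sketch using Python's built-in sqlite3 module, not the asker's database. The table name t comes from this answer, the column is called department as in the question, and the dates are stored as ISO text on the assumption that the question's values are DD-MM-YYYY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (empid INTEGER, department TEXT, dt TEXT, action TEXT);
    INSERT INTO t VALUES
        (1, 'Food',     '2021-01-01', 'ADDED'),
        (2, 'Food',     '2021-01-01', 'ADDED'),
        (2, 'Food',     '2021-01-04', 'REMOVED'),
        (2, 'Auto',     '2021-01-01', 'ADDED'),
        (3, 'Electric', '2021-01-02', 'ADDED'),
        (3, 'Electric', '2021-01-04', 'REMOVED'),
        (3, 'Auto',     '2021-01-04', 'REMOVED');
""")

# +1 for every ADDED, -1 for every REMOVED; a positive total means the
# employee has been added more often than removed for that department.
still_member = conn.execute("""
    SELECT empid, department
    FROM t
    GROUP BY empid, department
    HAVING SUM(CASE WHEN action = 'ADDED'   THEN 1
                    WHEN action = 'REMOVED' THEN -1
                    ELSE 0 END) > 0
    ORDER BY empid, department
""").fetchall()

print(still_member)
```

Unlike the NOT EXISTS version, this also drops employee 3 in Auto, whose only row is a REMOVED.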
Use the analytic LISTAGG function to get the complete history of the actions per employee and department:
listagg(ACTION,',') within group (order by DATE_D, ACTION) over (partition by EMPID, DEPARTMENT) ACTION_LST
Note that the order is defined on DATE_D and ACTION: if you are added and removed on the same day, you probably want to see ADDED,REMOVED.
Basic query (using your rowset as tab)
with dt as (
select EMPID, DEPARTMENT, DATE_D, ACTION,
listagg(ACTION,',') within group (order by DATE_D, ACTION) over (partition by EMPID, DEPARTMENT) ACTION_LST
from tab)
select * from dt
EMPID, DEPARTMENT, DATE_D, ACTION, ACTION_LST
1 Food 01.01.2021 00:00:00 ADDED ADDED
2 Auto 01.01.2021 00:00:00 ADDED ADDED
2 Food 01.01.2021 00:00:00 ADDED ADDED,REMOVED
2 Food 04.01.2021 00:00:00 REMOVED ADDED,REMOVED
3 Auto 04.01.2021 00:00:00 REMOVED REMOVED
3 Electric 02.01.2021 00:00:00 ADDED ADDED,REMOVED
3 Electric 04.01.2021 00:00:00 REMOVED ADDED,REMOVED
This approach is interesting in any case, because it shows every distinct combination of actions.
Now you simply add a filtering query that eliminates the combinations you do not need:
with dt as (
select EMPID, DEPARTMENT, DATE_D, ACTION,
listagg(ACTION,',') within group (order by DATE_D, ACTION) over (partition by EMPID, DEPARTMENT) ACTION_LST
from tab)
select EMPID, DEPARTMENT, DATE_D, ACTION
from dt
where ACTION_LST not in ('ADDED,REMOVED')
Use aggregation and set the condition in the HAVING clause:
select e.EmpID, d.Department
from Employee e
inner join Emp_Dept ed on ....
inner join Department d on ....
where .....
group by e.EmpID, d.Department
having count(distinct case when ed.Action in ('ADDED', 'REMOVED') THEN ed.Action END) < 2
You may use the pattern recognition feature (MATCH_RECOGNIZE) of Oracle 12c and above (described in more detail in the Data Warehousing Guide), which can also handle sequences of additions and removals and looks quite natural here:
select *
from t
match_recognize(
partition by empid, department
order by dt
/*Find matched rows*/
measures
classifier() as cls
/*To include all the rows and then filter out classified rows*/
all rows per match with unmatched rows
/*Addition followed by removal*/
pattern (add0 rem0)
define
add0 as action = 'ADDED',
rem0 as action = 'REMOVED'
)
/*Exclude matched (added -> removed)*/
where cls is null
EMPID | DEPARTMENT | DT | CLS | ACTION
----: | :--------- | :-------- | :--- | :------
1 | Food | 01-JAN-21 | null | ADDED
2 | Auto | 01-JAN-21 | null | ADDED
3 | Auto | 01-APR-21 | null | REMOVED
/*Create a sequence of add -> remove -> add*/
insert into t
select 2, 'Food', date '2021-05-01', 'ADDED' from dual union all
select 2, 'Food', date '2021-06-01', 'REMOVED' from dual union all
select 2, 'Food', date '2021-07-01', 'ADDED' from dual
select *
from t
match_recognize(
partition by empid, department
order by dt
measures
classifier() as cls
all rows per match with unmatched rows
pattern (add0 rem0)
define
add0 as action = 'ADDED',
rem0 as action = 'REMOVED'
)
where cls is null
EMPID | DEPARTMENT | DT | CLS | ACTION
----: | :--------- | :-------- | :--- | :------
1 | Food | 01-JAN-21 | null | ADDED
2 | Auto | 01-JAN-21 | null | ADDED
2 | Food | 01-JUL-21 | null | ADDED
3 | Auto | 01-APR-21 | null | REMOVED
db<>fiddle here

Pulling multiple entries based on ROW_NUMBER

I got the row_num column from a partition. I want each Type to match with at least one Sent and one Resent. For example, Jon's row is removed below because there is no Resent. Kim's Sheet row is also removed because again, there is no Resent. I tried using a CTE to take all columns for a Code if row_num = 2 but Kim's Sheet row obviously shows up because they're all under one Code. If anyone could help, that'd be great!
Edit: I'm using SSMS 2018. There are multiple Statuses other than Sent and Resent.
What my table looks like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 123 | Jon | Sheet | Sent | 1 |
| 221 | Kim | Sheet | Sent | 1 |
| 221 | Kim | Book | Resent | 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
What I want it to look like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 221 | Kim | Book | Resent| 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
Here is my CTE code:
WITH CTE AS
(
SELECT *
FROM #MyTable
)
SELECT *
FROM #MyTable
WHERE Code IN (SELECT Code FROM CTE WHERE row_num = 2)
If sent and resent are the only values for status, then you can use:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and
t2.type = t.type and
t2.status <> t.status
);
You can also phrase this with window functions:
select t.*
from (select t.*,
min(status) over (partition by name, type) as min_status,
max(status) over (partition by name, type) as max_status
from t
) t
where min_status <> max_status;
Both of these can be tweaked if other status values are possible. However, based on your question and sample data, that does not seem necessary.
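Since the question's edit says other statuses do exist, one possible tweak is to test for Sent and Resent explicitly instead of comparing with <>. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The Failed row is a hypothetical extra status added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (code INTEGER, name TEXT, type TEXT, status TEXT, row_num INTEGER);
    INSERT INTO t VALUES
        (123, 'Jon', 'Sheet', 'Sent',   1),
        (221, 'Kim', 'Sheet', 'Sent',   1),
        (221, 'Kim', 'Sheet', 'Failed', 1),   -- hypothetical extra status
        (221, 'Kim', 'Book',  'Resent', 1),
        (221, 'Kim', 'Book',  'Sent',   2),
        (221, 'Kim', 'Book',  'Sent',   3);
""")

# Keep a row only if its (name, type) group contains at least one Sent
# and at least one Resent, whatever other statuses are present.
kept = conn.execute("""
    SELECT t.*
    FROM t
    WHERE EXISTS (SELECT 1 FROM t s
                  WHERE s.name = t.name AND s.type = t.type AND s.status = 'Sent')
      AND EXISTS (SELECT 1 FROM t r
                  WHERE r.name = t.name AND r.type = t.type AND r.status = 'Resent')
    ORDER BY t.row_num
""").fetchall()

for row in kept:
    print(row)
```

With the extra Failed row, a plain `status <> t.status` test would wrongly keep Kim's Sheet rows. The two explicit EXISTS checks keep only the Book rows.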
FIDDLE
CREATE TABLE Table1(ID integer,Name VARCHAR(10),Type VARCHAR(10),Status VARCHAR(10),row_num integer);
INSERT INTO Table1 VALUES
('123','Jon','Sheet','Sent','1'),
('221','Kim','Sheet','Sent','1'),
('221','Kim','Book','Resent','1'),
('221','Kim','Book','Sent','2'),
('221','Kim','Book','Sent','3');
SELECT t1.*
FROM Table1 t1
WHERE EXISTS (
select 1
from Table1 t2
where t2.Name=t1.Name
and t2.Type=t1.TYpe
and t2.Status = case when t1.Status='Sent'
then 'Resent'
else 'Sent' end)
It would be easier if you provided scripts to create the table and load this test data, but try something like:
with a1 as (
select
name, type,
row_number() over (partition by code, Name, type, status order by (select null)) as rn
from #MyTable
), a2 as (
select * from a1 where rn > 1
)
select t.*
from #MyTable as t
inner join a2 on t.name = a2.name and t.type = a2.type;
Here you
calculate another row number, partitioned by code, name, type and status,
then fetch the rows where this new row number is > 1,
and finally join that back to the original table to get the rows you are interested in.
Syntax may vary on MSSQL, but you should give it a try. And please use better names than me ;-)
This solution is quite generic because it doesn't rely on used statuses. They're not hardcoded. And you can easily control what matters by changing partitions.
Fiddle

Oracle - Finding missing /non-joined records

I have an issue in Oracle 12 that is easiest explained with the traditional database design scenario of students, classes, and students taking classes called registrations. I understand this model well. I have a scenario where I need to get a COMPLETE list, of all students against ALL classes, and whether or not they are taking that class or not...
Let's use this table design here...
CREATE TABLE CLASSES
(CLASSID VARCHAR2(10) PRIMARY KEY,
CLASSNAME VARCHAR2(25),
INSTRUCTOR VARCHAR2(25) );
CREATE TABLE STUDENTS
(STUDENTID VARCHAR2(10) PRIMARY KEY,
STUDENTNAMENAME VARCHAR2(25),
STUDY_MAJOR VARCHAR2(25) );
CREATE TABLE REGISTRATION
(
CLASSID VARCHAR2(10 BYTE),
STUDENTID VARCHAR2(10 BYTE),
GRADE NUMBER(4,0),
CONSTRAINT "PK1" PRIMARY KEY ("CLASSID", "STUDENTID"),
CONSTRAINT "FK1" FOREIGN KEY ("CLASSID") REFERENCES "CLASSES" ("CLASSID") ENABLE,
CONSTRAINT "FK2" FOREIGN KEY ("STUDENTID") REFERENCES "EGR_MM"."STUDENTS" ("STUDENTID") ENABLE
) ;
So assume the following... 300 students, and 15 different classes... and the REGISTRATION table will show how many students taking how many classes... What I need is that info PLUS all the NON-TAKEN combinations... i.e. I need a report (SQL statement) that shows ALL possible combinations... i.e. 300 x 15, and then whether that row exists in the registration table...so for example, the output should look like this...
STUDENTID Class1_GRADE Class2_Grade Class3_Grade Class4_Grade
101 A B Not Taking A
102 C Not Taking Not Taking Not Taking
****** THIS STUDENT NOT TAKING ANY CLASSES So NOT in the Registrations Table
103 Not Taking Not Taking Not Taking Not Taking
This would work as well, and I can probably do a PIVOT to get the above listing.
STUDENTID CLASSID GRADE
101 Class1 A
101 Class2 B
101 Class3 Not Taking
101 Class4 A
...
102 Class1 C
102 Class2 Not Taking
102 Class3 Not Taking
102 Class4 Not Taking
...
103 Class1 Not Taking // THIS STUDENT NOT TAKING ANY CLASSES
103 Class2 Not Taking
103 Class3 Not Taking
103 Class4 Not Taking
How do I fill in the missing data, i.e. the combination of students and classes NOT taken...?
CROSS JOIN the students and classes and then LEFT OUTER JOIN the registrations and then use COALESCE to get the Not taken value:
SELECT s.studentid,
c.classid,
COALESCE( TO_CHAR( r.grade ), 'Not taken' ) AS grade
FROM students s
CROSS JOIN classes c
LEFT OUTER JOIN registration r
ON ( s.studentid = r.studentid AND c.classid = r.classid )
Which, if you have the data:
INSERT INTO Classes
SELECT LEVEL,
'Class' || LEVEL,
'Instructor' || LEVEL
FROM DUAL
CONNECT BY LEVEL <= 3;
INSERT INTO Students
SELECT TO_CHAR( LEVEL, 'FM000' ),
'Student' || LEVEL,
'Major'
FROM DUAL
CONNECT BY LEVEL <= 5;
INSERT INTO Registration
SELECT 1, '001', 4 FROM DUAL UNION ALL
SELECT 1, '002', 2 FROM DUAL UNION ALL
SELECT 1, '003', 5 FROM DUAL UNION ALL
SELECT 2, '001', 3 FROM DUAL UNION ALL
SELECT 3, '001', 1 FROM DUAL;
Then it outputs:
STUDENTID | CLASSID | GRADE
:-------- | :------ | :--------
001 | 1 | 4
002 | 1 | 2
003 | 1 | 5
001 | 2 | 3
001 | 3 | 1
005 | 1 | Not taken
004 | 2 | Not taken
003 | 3 | Not taken
005 | 3 | Not taken
005 | 2 | Not taken
002 | 2 | Not taken
003 | 2 | Not taken
004 | 1 | Not taken
002 | 3 | Not taken
004 | 3 | Not taken
If you want to pivot it then:
SELECT *
FROM (
SELECT s.studentid,
c.classid,
COALESCE( TO_CHAR( r.grade ), 'Not taken' ) AS grade
FROM students s
CROSS JOIN classes c
LEFT OUTER JOIN registration r
ON ( s.studentid = r.studentid AND c.classid = r.classid )
)
PIVOT ( MAX( grade ) FOR classid IN (
1 AS Class1,
2 AS Class2,
3 AS Class3
) )
ORDER BY StudentID
Which outputs:
STUDENTID | CLASS1 | CLASS2 | CLASS3
:-------- | :-------- | :-------- | :--------
001 | 4 | 3 | 1
002 | 2 | Not taken | Not taken
003 | 5 | Not taken | Not taken
004 | Not taken | Not taken | Not taken
005 | Not taken | Not taken | Not taken
db<>fiddle here
This is just conditional aggregation:
select s.studentid,
max(case when r.classid = 1 then r.grade end) as class1_grade,
max(case when r.classid = 2 then r.grade end) as class2_grade,
. . .
from students s left join
registration r
on r.studentid = s.studentid
group by s.studentid;
You do have to list the columns explicitly. To avoid that, you need dynamic SQL (execute immediate).
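As a sketch of what "dynamic SQL" means here, the column list can be generated from the classes table and the resulting statement executed. The example below does this from Python with the built-in sqlite3 module instead of Oracle's EXECUTE IMMEDIATE. The integer class ids and the sample rows mirror the other answer's data, and the class{n}_grade column names are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (classid INTEGER PRIMARY KEY, classname TEXT);
    CREATE TABLE students (studentid TEXT PRIMARY KEY);
    CREATE TABLE registration (classid INTEGER, studentid TEXT, grade INTEGER);
    INSERT INTO classes VALUES (1, 'Class1'), (2, 'Class2'), (3, 'Class3');
    INSERT INTO students VALUES ('001'), ('002'), ('003'), ('004'), ('005');
    INSERT INTO registration VALUES
        (1, '001', 4), (1, '002', 2), (1, '003', 5), (2, '001', 3), (3, '001', 1);
""")

# Read the class ids first, then generate one MAX(CASE ...) column per class.
class_ids = [row[0] for row in conn.execute("SELECT classid FROM classes ORDER BY classid")]
columns = ",\n           ".join(
    # int() guards the interpolation: only integer ids ever reach the SQL text.
    f"COALESCE(MAX(CASE WHEN r.classid = {int(cid)} THEN CAST(r.grade AS TEXT) END), "
    f"'Not taken') AS class{int(cid)}_grade"
    for cid in class_ids
)
pivot_sql = f"""
    SELECT s.studentid,
           {columns}
    FROM students s
    LEFT JOIN registration r ON r.studentid = s.studentid
    GROUP BY s.studentid
    ORDER BY s.studentid
"""
pivot = conn.execute(pivot_sql).fetchall()

for row in pivot:
    print(row)
```

Adding a class to the classes table adds a column to the report without editing the query text, which is the point of generating the statement.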
Getting the results with one grade per row is simpler. Use a cross join to generate the rows and a left join to bring in the values:
select s.studentid, c.classid, r.grade
from students s cross join
classes c left join
registration r
on r.studentid = s.studentid and r.classid = c.classid;

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so there is no LEFT JOIN necessary. They can also be linked to the same business more than once (leaving and coming back). Multiple records for the same business would be aggregated.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it is running on a large dataset and taking a long time. I'm wondering if anyone has any ideas on how to improve this so that there isn't so much wasted processing (i.e. needing to do a DISTINCT on the final result, or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.
You could try and use a correlated subquery. This removes the need for outer distinct:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i
You can join with the aggregated ind_to_business per individual. One way to do this:
select i.id, i.name, coalesce(b.enterprise_id, 'Multiple')
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
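The MIN = MAX test above can be tried on the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The date columns are left out because they do not affect the result, and the select list shows the individual's enterprise_id to match the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
    CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
    INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
    INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
    INSERT INTO ind_to_business VALUES (5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4);
""")

# If the smallest and largest bus_id of a person are equal, all links point
# to one business; otherwise bus_id is NULL and COALESCE yields 'Multiple'.
linked = conn.execute("""
    SELECT i.enterprise_id, i.name,
           COALESCE(b.enterprise_id, 'Multiple') AS linked_bus
    FROM individual i
    JOIN (SELECT ind_id,
                 CASE WHEN MIN(bus_id) = MAX(bus_id) THEN MIN(bus_id) END AS bus_id
          FROM ind_to_business
          GROUP BY ind_id) ib ON ib.ind_id = i.id
    LEFT JOIN business b ON b.id = ib.bus_id
    ORDER BY i.id
""").fetchall()

for row in linked:
    print(row)
```

Because the aggregation runs on ind_to_business alone, neither DISTINCT nor COUNT(DISTINCT) is needed, and business is joined only once per person.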
First you should sub-query to get all needed dimensions and then do all your final aggregation using CASE statement.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else min(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id
There is no need to use DISTINCT twice. You could use subquery factoring: put the in-line view in a WITH clause and make the data set DISTINCT in the subquery itself.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple