Left outer join on aggregate queries - sql

So I have two payment tables that I want to compare in an Oracle SQL database. I want to compare the total payments for each order, keyed by location and invoice. It's more complex than this, but basically it is:
select
tbl1.location,
tbl1.invoice,
Sum(tbl1.payments),
Sum(tbl2.payments)
From
tbl1
left outer join tbl2 on
tbl1.location = tbl2.location
and tbl1.invoice = tbl2.invoice
group by
(tbl1.location,tbl1.invoice)
I want the left outer join because, in addition to comparing payment amounts, I want to see all orders in tbl1 that may not exist in tbl2.
The issue is that there are multiple records for each order (location & invoice) in both tables, not necessarily the same number in each (i.e. 2 in tbl1 to 1 in tbl2, or vice versa), but the total payments for each order (location & invoice) should match. So just doing a direct join gives me a Cartesian product.
So I am thinking I could do two queries, first aggregating the total payments by store & invoice for each table, and then join those results, because in the aggregate results I would only have one record for each order (store & invoice). But I don't know how to do this. I've tried several subqueries but can't seem to shake the Cartesian product. I'd like to be able to do this in one query, as opposed to creating tables and joining on those, as this will be ongoing.
Thanks in advance for any help.
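To make the fan-out described above concrete, here is a minimal sketch that runs the question's join against an in-memory SQLite database from Python. SQLite stands in for Oracle here (an assumption; the dialects differ), the rows are the sample data from the SQL Fiddle setup further down this thread, and the GROUP BY list is written without parentheses. Each side's total gets multiplied by the number of matching rows on the other side.

```python
import sqlite3

# Sample rows from the SQL Fiddle setup in this thread: (location, invoice, payments).
TBL1 = [("a", 1, 1)] * 5 + [("a", 2, 1)] * 3 + [("b", 1, 1), ("b", 2, 1)]
TBL2 = [("a", 1, 1)] * 3 + [("a", 2, 1)] * 2 + [("b", 1, 1)] * 5


def make_db():
    """Build an in-memory SQLite copy of tbl1/tbl2 (SQLite stands in for Oracle)."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("tbl1", TBL1), ("tbl2", TBL2)):
        conn.execute(f"CREATE TABLE {name} (location TEXT, invoice INTEGER, payments INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    return conn


# The query from the question: join the raw rows first, aggregate afterwards.
NAIVE_SQL = """
SELECT tbl1.location, tbl1.invoice, SUM(tbl1.payments), SUM(tbl2.payments)
FROM tbl1
LEFT OUTER JOIN tbl2
  ON tbl1.location = tbl2.location AND tbl1.invoice = tbl2.invoice
GROUP BY tbl1.location, tbl1.invoice
ORDER BY tbl1.location, tbl1.invoice
"""

naive = make_db().execute(NAIVE_SQL).fetchall()
for row in naive:
    print(row)
```

For location a, invoice 1, tbl1 has 5 rows and tbl2 has 3, so the join produces 15 rows and both sums come out as 15 instead of 5 and 3.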

You can use a WITH clause to create the two queries and join them, as you said. I will give just the syntax; if you need more help, just ask. You didn't provide full details of your tables, so I am guessing at some of the specifics in my answer.
WITH tmpTableA as (
select
tbl1.location,
tbl1.invoice,
Sum(tbl1.payments) totalTblA
From
tbl1
group by
tbl1.location,
tbl1.invoice
),
tmpTableB as (
select
tbl2.location,
tbl2.invoice,
Sum(tbl2.payments) totalTblB
From
tbl2
group by
tbl2.location,
tbl2.invoice
)
Select tmpTableA.location, tmpTableA.invoice, tmpTableA.totalTblA,
tmpTableB.location, tmpTableB.invoice, tmpTableB.totalTblB
from tmpTableA, tmpTableB
where tmpTableA.location = tmpTableB.location (+)
and tmpTableA.invoice = tmpTableB.invoice (+)
The (+) operator is Oracle's proprietary outer-join syntax; putting it on the tmpTableB columns makes this a left outer join that keeps every row of tmpTableA. (Of course, you can use ANSI LEFT JOIN syntax if you prefer.)
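A minimal sketch of the same aggregate-first idea using ANSI LEFT JOIN syntax instead of (+), run against SQLite from Python so it can be tried without an Oracle instance (SQLite is a stand-in, and the sample data is borrowed from the SQL Fiddle setup in the next answer). Because each CTE has exactly one row per (location, invoice), the join cannot multiply rows; an order missing from tbl2 comes back with None (NULL) for its total.

```python
import sqlite3

# Sample data from the SQL Fiddle setup: (location, invoice, payments).
TBL1 = [("a", 1, 1)] * 5 + [("a", 2, 1)] * 3 + [("b", 1, 1), ("b", 2, 1)]
TBL2 = [("a", 1, 1)] * 3 + [("a", 2, 1)] * 2 + [("b", 1, 1)] * 5

conn = sqlite3.connect(":memory:")
for name, data in (("tbl1", TBL1), ("tbl2", TBL2)):
    conn.execute(f"CREATE TABLE {name} (location TEXT, invoice INTEGER, payments INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# Aggregate each table down to one row per (location, invoice) first,
# then LEFT JOIN the two summaries so no row can be multiplied.
SQL = """
WITH tmpTableA AS (
  SELECT location, invoice, SUM(payments) AS totalTblA
  FROM tbl1 GROUP BY location, invoice
), tmpTableB AS (
  SELECT location, invoice, SUM(payments) AS totalTblB
  FROM tbl2 GROUP BY location, invoice
)
SELECT a.location, a.invoice, a.totalTblA, b.totalTblB
FROM tmpTableA a
LEFT JOIN tmpTableB b
  ON a.location = b.location AND a.invoice = b.invoice
ORDER BY a.location, a.invoice
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```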

Two other options:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl1 ( id, location, invoice, payments ) AS
SELECT 1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 4, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 5, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 6, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 7, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 8, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 2, 1 FROM DUAL;
CREATE TABLE tbl2 ( id, location, invoice, payments ) AS
SELECT 1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 4, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 5, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 6, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 7, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 8, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 1, 1 FROM DUAL;
Query 1:
This one uses a correlated sub-query to calculate the total for the second table:
SELECT location,
invoice,
SUM( payments ) AS total_payments_1,
COALESCE( (SELECT SUM( payments )
FROM tbl2 i
WHERE o.location = i.location
AND o.invoice = i.invoice),
0 ) AS total_payments_2
FROM tbl1 o
GROUP BY
location,
invoice
ORDER BY
location,
invoice
Results:
| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
| a | 1 | 5 | 3 |
| a | 2 | 3 | 2 |
| b | 1 | 1 | 5 |
| b | 2 | 1 | 0 |
Query 2:
This one uses a named sub-query to pre-calculate the totals for table 1, then performs a LEFT OUTER JOIN with the second table and includes the table 1 total in the GROUP BY.
Without any indexes, the explain plans suggest that Query 1 is much more efficient, but with your indexes the optimizer might find a better plan.
WITH tbl1_sums AS (
SELECT location,
invoice,
SUM( payments ) AS total_payments_1
FROM tbl1
GROUP BY
location,
invoice
)
SELECT t1.location,
t1.invoice,
t1.total_payments_1,
COALESCE( SUM( t2.payments ), 0 ) AS total_payments_2
FROM tbl1_sums t1
LEFT OUTER JOIN
tbl2 t2
ON ( t1.location = t2.location
AND t1.invoice = t2.invoice)
GROUP BY
t1.location,
t1.invoice,
t1.total_payments_1
ORDER BY
t1.location,
t1.invoice
Results:
| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
| a | 1 | 5 | 3 |
| a | 2 | 3 | 2 |
| b | 1 | 1 | 5 |
| b | 2 | 1 | 0 |
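Both queries can be checked against each other with a small sketch that loads the fiddle data into an in-memory SQLite database from Python (SQLite standing in for Oracle; the queries are copied with only formatting changes). The comparison at the end shows that the correlated sub-query and the pre-aggregated LEFT OUTER JOIN return the same totals.

```python
import sqlite3

# Fiddle data as (location, invoice, payments); every payment is 1.
TBL1 = [("a", 1, 1)] * 5 + [("a", 2, 1)] * 3 + [("b", 1, 1), ("b", 2, 1)]
TBL2 = [("a", 1, 1)] * 3 + [("a", 2, 1)] * 2 + [("b", 1, 1)] * 5

conn = sqlite3.connect(":memory:")
for name, data in (("tbl1", TBL1), ("tbl2", TBL2)):
    conn.execute(f"CREATE TABLE {name} (location TEXT, invoice INTEGER, payments INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# Query 1: a correlated scalar sub-query computes the tbl2 total per group.
QUERY_1 = """
SELECT location, invoice, SUM(payments) AS total_payments_1,
       COALESCE((SELECT SUM(payments) FROM tbl2 i
                 WHERE o.location = i.location AND o.invoice = i.invoice), 0)
         AS total_payments_2
FROM tbl1 o
GROUP BY location, invoice
ORDER BY location, invoice
"""

# Query 2: pre-aggregate tbl1, then LEFT OUTER JOIN the raw tbl2 rows.
QUERY_2 = """
WITH tbl1_sums AS (
  SELECT location, invoice, SUM(payments) AS total_payments_1
  FROM tbl1 GROUP BY location, invoice
)
SELECT t1.location, t1.invoice, t1.total_payments_1,
       COALESCE(SUM(t2.payments), 0) AS total_payments_2
FROM tbl1_sums t1
LEFT OUTER JOIN tbl2 t2
  ON t1.location = t2.location AND t1.invoice = t2.invoice
GROUP BY t1.location, t1.invoice, t1.total_payments_1
ORDER BY t1.location, t1.invoice
"""

result_1 = conn.execute(QUERY_1).fetchall()
result_2 = conn.execute(QUERY_2).fetchall()
print(result_1)
print(result_1 == result_2)
```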

Sorry, my first answer was wrong. Thank you for providing the SQL Fiddle, MT0.
The point I missed is that you need to sum the payments in each table first, so there is only one row per order left in each, and then join them. This is what MT0 does in his queries.
If you want a solution that looks more "symmetric", try:
select A.location, A.invoice, B.total sum1, C.total sum2
from (select distinct location, invoice from tbl1) A
left outer join (select location, invoice, sum(payments) as total from tbl1 group by location, invoice) B on A.location=B.location and A.invoice=B.invoice
left outer join (select location, invoice, sum(payments) as total from tbl2 group by location, invoice) C on A.location=C.location and A.invoice=C.invoice
which results in
| LOCATION | INVOICE | SUM1 | SUM2 |
|----------|---------|------|--------|
| a | 2 | 3 | 2 |
| a | 1 | 5 | 3 |
| b | 1 | 1 | 5 |
| b | 2 | 1 | (null) |


Subqueries with Group By

I have looked around on the internet for a while for a way to make this query work, but I haven't been able to work it out so far. I am trying to return the item descriptions and the quantity of items sold. This is what I have at the moment:
SELECT itemdesc, quantity, (SELECT COUNT(quantity) FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
GROUP BY COUNT(invoiceitem.quantity)) Quantity
FROM item;
I am very lost at the moment and not sure I am even linking the right tables together. I can provide an ER diagram if it helps. Any help would be greatly appreciated, thank you.
ANSWER:
SELECT item.itemdesc, (SELECT SUM(invoiceitem.quantity) FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
GROUP BY item.itemdesc) Quantity
FROM item
ORDER BY quantity;
Thank you all!
Your outer query:
SELECT itemdesc,
quantity
/* ignoring the subquery */
FROM item;
will not work, as the item table does not have a quantity column.
If you intended to use the itemprice column then your query would be:
SELECT itemdesc,
itemprice,
( SELECT COUNT(quantity)
FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
) AS Quantity
FROM item
Which, for the sample data:
CREATE TABLE item (
itemno PRIMARY KEY,
itemdesc,
itemprice
) AS
SELECT 1, 'ItemA', 1 FROM DUAL UNION ALL
SELECT 2, 'ItemB', 2 FROM DUAL UNION ALL
SELECT 3, 'ItemC', 3 FROM DUAL UNION ALL
SELECT 4, 'ItemD', 4 FROM DUAL UNION ALL
SELECT 5, 'ItemE', 5 FROM DUAL;
CREATE TABLE invoiceitem( itemno, quantity ) AS
SELECT 1, 10 FROM DUAL UNION ALL
SELECT 1, 20 FROM DUAL UNION ALL
SELECT 1, 30 FROM DUAL UNION ALL
SELECT 3, 40 FROM DUAL UNION ALL
SELECT 4, 50 FROM DUAL UNION ALL
SELECT 5, NULL FROM DUAL;
Outputs:
ITEMDESC | ITEMPRICE | QUANTITY
:------- | --------: | -------:
ItemA | 1 | 3
ItemB | 2 | 0
ItemC | 3 | 1
ItemD | 4 | 1
ItemE | 5 | 0
An equivalent query using a join would be (assuming that item.itemno is a primary key):
SELECT MAX( i.itemdesc ) AS itemdesc,
MAX( i.itemprice ) AS itemprice,
COUNT(ii.quantity) AS Quantity
FROM item i
LEFT OUTER JOIN invoiceitem ii
ON ( ii.itemno = i.itemno )
GROUP BY i.itemno
You need to use LEFT OUTER JOIN rather than INNER JOIN so that items with no corresponding rows in the invoiceitem table are still returned (with a count of zero).
db<>fiddle here
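To see why LEFT OUTER JOIN matters here, a minimal sketch that loads the sample item/invoiceitem data into SQLite (from Python; SQLite is a stand-in for Oracle) and runs the grouped join both ways. It also returns COALESCE(SUM(quantity), 0), since the question asks for the quantity sold rather than the number of invoice lines; the function name summarize is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemdesc TEXT, itemprice INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "ItemA", 1), (2, "ItemB", 2), (3, "ItemC", 3), (4, "ItemD", 4), (5, "ItemE", 5)],
)
conn.execute("CREATE TABLE invoiceitem (itemno INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO invoiceitem VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (3, 40), (4, 50), (5, None)],
)


def summarize(join_kind):
    """Per item: count of non-NULL quantities and total quantity, using the given join."""
    sql = f"""
        SELECT i.itemdesc, COUNT(ii.quantity), COALESCE(SUM(ii.quantity), 0)
        FROM item i
        {join_kind} invoiceitem ii ON ii.itemno = i.itemno
        GROUP BY i.itemno, i.itemdesc
        ORDER BY i.itemno
    """
    return conn.execute(sql).fetchall()


left_rows = summarize("LEFT OUTER JOIN")
inner_rows = summarize("INNER JOIN")
print(left_rows)   # ItemB is present with 0
print(inner_rows)  # ItemB is missing entirely
```

ItemE survives the INNER JOIN only because invoiceitem holds a row for it whose quantity is NULL; COUNT(quantity) still reports 0 for it.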
A GROUP BY like the one below should work. Please check.
select
item.itemdesc, count(invoiceitem.quantity) Quantity
from
item item
join invoiceitem invoiceitem on item.itemno = invoiceitem.itemno
group by
item.itemdesc
Thanks

SQL report - Matching percent value with a number

I have a small issue with my report and I need to know if it's even possible to do this.
I'm using Oracle 12c and the OBIEE tool. I'm trying to create a custom column with number values (1 and 2) that are based on my "Percent" column, as described below.
Here are my results in a table:
I will give you an example of how it should work:
Emilian owns a few customers. Each customer's annual revenue is listed, and the column next to it is that customer's percentage of Emilian's total revenue. In my custom column I need to show "1" for the customers that together make up at least 80% of his total, and "2" for the rest. So in Emilian's case, his first two customers will be "1", since 78% + 14% is already above 80%, and the rest will be "2". Owners with only one customer would all get "1", since that customer contributes 100%.
I hope I made this clear. I'd be very grateful for help with coding it :)
Alex
There's probably a much more efficient way to do this. I built up what you need with a series of sub-selects. This still doesn't handle customers with equal percentages; you said that isn't an expected problem, but I'd still watch out for it.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE t1 ( ownerId int, customerId int, revenue int ) ;
INSERT INTO t1 ( ownerid, customerid, revenue )
SELECT 1, 1, 99 FROM dual UNION ALL
SELECT 1, 2, 200 FROM dual UNION ALL
SELECT 1, 3, 300 FROM dual UNION ALL
SELECT 1, 4, 400 FROM dual UNION ALL
SELECT 2, 5, 100 FROM dual UNION ALL
SELECT 2, 6, 100 FROM dual UNION ALL
SELECT 2, 7, 200 FROM dual UNION ALL
SELECT 2, 8, 600 FROM dual UNION ALL
SELECT 3, 9, 100 FROM dual UNION ALL
SELECT 3, 10, 900 FROM dual UNION ALL
SELECT 4, 11, 1000 FROM dual UNION ALL
SELECT 5, 12, 1000 FROM dual UNION ALL
SELECT 6, 13, 200 FROM dual UNION ALL
SELECT 6, 14, 200 FROM dual UNION ALL
SELECT 6, 15, 200 FROM dual UNION ALL
SELECT 6, 16, 200 FROM dual UNION ALL
SELECT 6, 17, 200 FROM dual UNION ALL
SELECT 42, 736784, 1480000 FROM dual UNION ALL
SELECT 42, 736580, 280160 FROM dual UNION ALL
SELECT 42, 1040137, 112486 FROM dual UNION ALL
SELECT 42, 738685, 22903 FROM dual UNION ALL
SELECT 42, 736781, 56 FROM dual
;
Query 1:
SELECT s3.ownerID, s3.customerID, s3.revenue, s3.OwnerRevenue
, CAST(s3.customerRevPct AS decimal(5,2)) AS customerRevPct
, CASE WHEN s3.PctRT < 80 OR s3.custCount = 1 THEN 1 ELSE 2 END AS customCol
/* Do the running pcts add up to 80+? 1 customer = 100% == 1. What if all pcts are equal? */
FROM (
SELECT s2.*
, 100-SUM(nvl(s2.customerRevPct,0)) OVER (PARTITION BY s2.ownerID ORDER BY s2.customerRevPct, s2.customerID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pctRT
, COUNT(*) OVER (PARTITION BY s2.ownerID ORDER BY (s2.ownerID) ) AS custCount /* Is there only 1 customer? */
FROM (
SELECT s1.*
, ( ( ( s1.revenue * 1.0 ) / s1.ownerRevenue ) * 100 ) AS customerRevPct
FROM (
SELECT t1.ownerID, t1.customerID, t1.revenue
, SUM(t1.revenue) OVER ( PARTITION BY t1.ownerID ) AS ownerRevenue
FROM t1
) s1
) s2
) s3
WHERE ownerID = 42 /* REMOVE THIS LINE - TESTING ONLY */
ORDER BY s3.ownerID, s3.customerRevPct DESC
Results:
| OWNERID | CUSTOMERID | REVENUE | OWNERREVENUE | CUSTOMERREVPCT | CUSTOMCOL |
|---------|------------|---------|--------------|----------------|-----------|
| 42 | 736784 | 1480000 | 1895605 | 78.08 | 1 |
| 42 | 736580 | 280160 | 1895605 | 14.78 | 1 |
| 42 | 1040137 | 112486 | 1895605 | 5.93 | 2 |
| 42 | 738685 | 22903 | 1895605 | 1.21 | 2 |
| 42 | 736781 | 56 | 1895605 | 0 | 2 |
EDIT: I changed the Fiddle to illustrate your data example.
create table custrev(owner varchar2(100), cust_id number, revenue number);
insert into custrev values('Emilian',1,1480000);
insert into custrev values('Emilian',2,280160);
insert into custrev values('Emilian',3,112486);
insert into custrev values('Emilian',4,22903);
insert into custrev values('Emilian',5,56);
insert into custrev values('Andy',6,1378);
insert into custrev values('Sandy',7,560000);
commit;
Below is the SQL for your requirement.
select owner, cust_id, revenue, pct,
case when pct = 100 then 1
when flg is null or flg < 80 then 1
else 2 end flag_col
from (select owner, cust_id, revenue, pct,--cumulative_sum,
lag(cumulative_sum) over(partition by owner
order by revenue desc) flg
from (select owner, cust_id, revenue, pct,
sum(pct) over(partition by owner
order by revenue desc
rows between unbounded preceding
and current row) cumulative_sum
from (select owner, cust_id, revenue,
round(ratio_to_report(revenue) over(partition by owner)*100,2) pct
from custrev)
)
)
order by owner, revenue desc;
Output:
OWNER CUST_ID REVENUE PCT FLAG_COL
Andy 6 1378 100 1
Emilian 1 1480000 78.08 1
Emilian 2 280160 14.78 1
Emilian 3 112486 5.93 2
Emilian 4 22903 1.21 2
Emilian 5 56 0 2
Sandy 7 560000 100 1
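The running-total-then-LAG approach can be tried without Oracle; below is a minimal sketch in Python using SQLite's window functions (SQLite 3.25 or later assumed). SQLite has no RATIO_TO_REPORT, so the percentage is computed as revenue divided by SUM(revenue) OVER (PARTITION BY owner), and rounding is applied only for display, a small departure from the answer above.

```python
import sqlite3

ROWS = [
    ("Emilian", 1, 1480000), ("Emilian", 2, 280160), ("Emilian", 3, 112486),
    ("Emilian", 4, 22903), ("Emilian", 5, 56), ("Andy", 6, 1378), ("Sandy", 7, 560000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custrev (owner TEXT, cust_id INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO custrev VALUES (?, ?, ?)", ROWS)

SQL = """
WITH pcts AS (
  -- each customer's share of its owner's total revenue, in percent
  SELECT owner, cust_id, revenue,
         100.0 * revenue / SUM(revenue) OVER (PARTITION BY owner) AS pct
  FROM custrev
), running AS (
  -- running total of the shares, largest customer first
  SELECT owner, cust_id, revenue, pct,
         SUM(pct) OVER (PARTITION BY owner ORDER BY revenue DESC
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum
  FROM pcts
), flagged AS (
  -- share already covered by the larger customers before this one
  SELECT owner, cust_id, revenue, pct,
         LAG(cumulative_sum) OVER (PARTITION BY owner ORDER BY revenue DESC) AS flg
  FROM running
)
SELECT owner, cust_id, ROUND(pct, 2) AS pct,
       CASE WHEN flg IS NULL OR flg < 80 THEN 1 ELSE 2 END AS flag_col
FROM flagged
ORDER BY owner, revenue DESC
"""

result = conn.execute(SQL).fetchall()
for row in result:
    print(row)
```

The separate `pct = 100` branch is not needed in this sketch: an owner with a single customer has no previous row, so LAG returns NULL and the customer is flagged 1 anyway.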
Alex,
OBIEE is based on models, not on SQL.
So, sorry to say this, but the SQL code will help you exactly zero...

Multiple repeated structures in Bigquery

Following up on this: Bigquery multiple unnest in a single select
We are using BigQuery as our warehousing solution and are trying to push its limits by consolidating data. A simple example is client tracking: a client generates revenue, has several touch points on our site, and independently maintains several accounts with us. A business user doing behavior analysis on clients wants to track visits, revenue generated, and how their accounts impact retention, so we are trying to evaluate whether a nested structure would work for us.
Below is an example. I have 3 tables.
Clients (C)
C_Key| C_Name
-----|------
1 | ABC
2 | DEF
Accounts (A)
A_Key | C_Key
------|------
11 | 1
12 | 1
21 | 2
22 | 2
23 | 2
Revenue (R)
R_Key | C_Key | Revenue
-------|---------|----------
11 | 1 | $10
12 | 1 | $20
21 | 2 | $10
I used array_agg to combine these three into a single nested table that looks like this:
{Client,
Accounts:
[{
}],
Revenue:
[{
}]
}
I want to be able to use multiple UNNESTs in a single query, like below:
Select client, Count Distinct(Accounts) and SUM(Revenue) from <single nested table>, unnest accounts, unnest revenue
The expected output is 2 rows:
1,2,$30
2,3,$10
However, having multiple unnests in the same query results in a cross join.
The actual output is:
1,2,$60
2,3,$30
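A plain-Python model of why the numbers double and triple (a sketch, not BigQuery itself): unnesting two repeated fields in the same FROM clause pairs every account with every revenue entry, so each revenue value is counted once per account. The nested rows below mirror the question's example, and cross_unnest_totals is a hypothetical helper.

```python
from itertools import product

# Nested rows mirroring the question's example:
# (client key, account keys, revenue amounts).
clients = [
    (1, [11, 12], [10, 20]),
    (2, [21, 22, 23], [10]),
]


def cross_unnest_totals(rows):
    """Mimic `FROM t, UNNEST(accounts), UNNEST(revenue)`: every account pairs with every revenue."""
    out = []
    for c_key, accounts, revenue in rows:
        pairs = list(product(accounts, revenue))  # the implicit cross join
        distinct_accounts = len({a for a, _ in pairs})
        total_revenue = sum(r for _, r in pairs)
        out.append((c_key, distinct_accounts, total_revenue))
    return out


print(cross_unnest_totals(clients))
```

COUNT(DISTINCT ...) hides the duplication for accounts, but SUM does not: client 1's revenue of 10 + 20 is counted once for each of its 2 accounts.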
Below is for BigQuery Standard SQL
First, let's clarify how the single nested table was created.
I assume you did something like:
#standardSQL
WITH clients AS (
SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
SELECT 2, 'def'
), accounts AS (
SELECT 11 AS a_key, 1 AS c_key UNION ALL
SELECT 12, 1 UNION ALL
SELECT 21, 2 UNION ALL
SELECT 22, 2 UNION ALL
SELECT 23, 2
), revenue AS (
SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
SELECT 12, 1, 20 UNION ALL
SELECT 21, 2, 10
), single_nested_table AS (
SELECT x.c_key, x.c_name, accounts, revenue
FROM (
SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue
FROM clients AS c
LEFT JOIN accounts AS a ON a.c_key = c.c_key
GROUP BY c.c_key, c_name
) x
JOIN (
SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue
FROM clients AS c
LEFT JOIN revenue AS r ON r.c_key = c.c_key
GROUP BY c.c_key, c_name
) y
ON x.c_key = y.c_key
)
SELECT *
FROM single_nested_table
which creates a table like this:
Row c_key c_name accounts.a_key accounts.c_key revenue.r_key revenue.c_key revenue.revenue
1 1 abc 11 1 11 1 10
12 1 12 1 20
2 2 def 21 2 21 2 10
22 2
23 2
It's not that important exactly which query you used to create that table, but it is important to be clear about its structure/schema!
So now, back to your question
#standardSQL
WITH clients AS (
SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
SELECT 2, 'def'
), accounts AS (
SELECT 11 AS a_key, 1 AS c_key UNION ALL
SELECT 12, 1 UNION ALL
SELECT 21, 2 UNION ALL
SELECT 22, 2 UNION ALL
SELECT 23, 2
), revenue AS (
SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
SELECT 12, 1, 20 UNION ALL
SELECT 21, 2, 10
), single_nested_table AS (
SELECT x.c_key, x.c_name, accounts, revenue
FROM (
SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue
FROM clients AS c
LEFT JOIN accounts AS a ON a.c_key = c.c_key
GROUP BY c.c_key, c_name
) x
JOIN (
SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue
FROM clients AS c
LEFT JOIN revenue AS r ON r.c_key = c.c_key
GROUP BY c.c_key, c_name
) y
ON x.c_key = y.c_key
)
SELECT
c_key, c_name,
ARRAY_LENGTH(accounts) AS distinct_accounts,
(SELECT SUM(revenue) FROM UNNEST(revenue)) AS revenue
FROM single_nested_table
This gives what you asked for:
Row c_key c_name distinct_accounts revenue
1 1 abc 2 30
2 2 def 3 10
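The same plain-Python model shows why the answer's query works: each repeated field is aggregated on its own, the way ARRAY_LENGTH(accounts) and the (SELECT SUM(revenue) FROM UNNEST(revenue)) subquery do, so the two arrays are never paired up. per_array_totals is a hypothetical helper, and like ARRAY_LENGTH it counts array elements rather than distinct values.

```python
# Nested rows mirroring the question's example:
# (client key, account keys, revenue amounts).
clients = [
    (1, [11, 12], [10, 20]),
    (2, [21, 22, 23], [10]),
]


def per_array_totals(rows):
    """Aggregate each repeated field independently; no pairing between the arrays."""
    return [(c_key, len(accounts), sum(revenue)) for c_key, accounts, revenue in rows]


print(per_array_totals(clients))
```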

Selecting groups of rows where at least one row of each group meets a criteria

I'm trying to SELECT groups of rows that contain at least one row meeting a certain criterion.
I've tried it with CASE WHEN statements without any success. Keep in mind this table has hundreds of records.
What I'm trying to accomplish is this:
One row of the group must have a subcategory equal to "GAMECONSOLE".
Rows having the same category, description and section form one group.
The IDs differ, so MIN and MAX do not work either.
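| ID | SECTION | DESCRIPTION | CATEGORY | SUBCATEGORY |
|-------|----------|-------------|----------|-------------|
| 21349 | 14010014 | TODDLER | TOY | GAMECONSOLE |
| 21278 | 14010014 | TODDLER | TOY | BICYCLE |
| 21431 | 15020021 | TODDLER | TOY | CHESS |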
In this example, the first two rows should be selected because they form one group and one row of the group is a "GAMECONSOLE".
CASE WHEN is used when you have to make a decision within a column expression. Filtering at the row level must be done in a WHERE clause:
SELECT T.id, T.section, T.description, T.category, T.subcategory
FROM
myTable T
INNER JOIN myTable S
ON T.section = S.section AND
T.description = S.description AND
T.category = S.category
WHERE
S.subcategory = 'GAMECONSOLE'
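You can join the table with itself on the columns that have to be equal. The table with alias S selects the right subcategory; T selects all the corresponding rows of the groups. (If a group can contain more than one GAMECONSOLE row, add DISTINCT, otherwise each T row is returned once per matching S row.)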
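A minimal sketch of the self-join, run from Python against SQLite (a stand-in for whatever database the question uses), with the rows from the SQL Fiddle test data later in this thread and an ORDER BY added for a stable result. Only groups that share section, description and category with a GAMECONSOLE row come back.

```python
import sqlite3

# Test rows from the SQL Fiddle setup: (id, section, description, category, subcategory).
ROWS = [
    (1, 1, "TODDLER", "TOY", "GAMECONSOLE"),
    (2, 1, "TODDLER", "TOY", "BICYCLE"),
    (3, 2, "TODDLER", "TOY", "CHESS"),
    (4, 3, "COMPUTERS", "SOFTWARE", "BOOK"),
    (5, 4, "COMPUTERS", "SOFTWARE", "SOFTWARE"),
    (6, 5, "COMPUTERS", "HARDWARE", "MONITOR"),
    (7, 6, "COMPUTERS", "HARDWARE", "GAMECONSOLE"),
    (8, 7, "COMPUTERS", "HARDWARE", "KEYBOARD"),
    (9, 8, "TODDLER", "BEDDING", "BED"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER, section INTEGER, description TEXT, "
    "category TEXT, subcategory TEXT)"
)
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?)", ROWS)

# S finds the GAMECONSOLE rows; T returns every row in the same group.
SQL = """
SELECT T.id, T.section, T.subcategory
FROM myTable T
INNER JOIN myTable S
  ON T.section = S.section
 AND T.description = S.description
 AND T.category = S.category
WHERE S.subcategory = 'GAMECONSOLE'
ORDER BY T.id
"""

selected = conn.execute(SQL).fetchall()
print(selected)
```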
SELECT a1.ID
, a1.SECTION
, a1.DESCRIPTION
, a1.CATEGORY
, a1.SUBCATEGORY
FROM MyTable a1
INNER JOIN MyTable a2 ON a2.DESCRIPTION = a1.DESCRIPTION
AND a2.CATEGORY = a1.CATEGORY
AND a2.SECTION = a1.SECTION
WHERE a2.SUBCATEGORY = 'GAMECONSOLE'
-- you may want to further filter the Where clause and apply a group by or distinct to get the actual results you are wanting
Your description sounds like:
select
...
from
(
select
...
,sum(case when subcategory = 'GAMECONSOLE' then 1 else 0 end)
over (partition by category, description, section) as cnt
from tab
) dt
where cnt > 0
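Filled in with concrete columns, the analytic approach above can be sketched as follows (Python driving SQLite 3.25+ as a stand-in database; the table name tab comes from the answer, the rows are the test data from the SQL Fiddle setup further down, and the ORDER BY is added for a stable result). Because the partition includes section, the CHESS row in section 2 is not selected.

```python
import sqlite3

# Test rows from the SQL Fiddle setup: (id, section, description, category, subcategory).
ROWS = [
    (1, 1, "TODDLER", "TOY", "GAMECONSOLE"),
    (2, 1, "TODDLER", "TOY", "BICYCLE"),
    (3, 2, "TODDLER", "TOY", "CHESS"),
    (4, 3, "COMPUTERS", "SOFTWARE", "BOOK"),
    (5, 4, "COMPUTERS", "SOFTWARE", "SOFTWARE"),
    (6, 5, "COMPUTERS", "HARDWARE", "MONITOR"),
    (7, 6, "COMPUTERS", "HARDWARE", "GAMECONSOLE"),
    (8, 7, "COMPUTERS", "HARDWARE", "KEYBOARD"),
    (9, 8, "TODDLER", "BEDDING", "BED"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, section INTEGER, description TEXT, "
    "category TEXT, subcategory TEXT)"
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", ROWS)

# Count GAMECONSOLE rows per group without collapsing the rows, then filter.
SQL = """
SELECT id, section, description, category, subcategory
FROM (
  SELECT t.*,
         SUM(CASE WHEN subcategory = 'GAMECONSOLE' THEN 1 ELSE 0 END)
           OVER (PARTITION BY category, description, section) AS cnt
  FROM tab t
) dt
WHERE cnt > 0
ORDER BY id
"""

selected = conn.execute(SQL).fetchall()
print(selected)
```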
SELECT *
FROM myTable T
WHERE Section IN (SELECT Section
FROM myTable Q
WHERE Q.subcategory = 'GAMECONSOLE')
Using an analytic function you can get the answer without using a self join.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST( ID, SECTION, DESCRIPTION, CATEGORY, SUBCATEGORY ) AS
SELECT 1, 1, 'TODDLER', 'TOY', 'GAMECONSOLE' FROM DUAL
UNION ALL SELECT 2, 1, 'TODDLER', 'TOY', 'BICYCLE' FROM DUAL
UNION ALL SELECT 3, 2, 'TODDLER', 'TOY', 'CHESS' FROM DUAL
UNION ALL SELECT 4, 3, 'COMPUTERS', 'SOFTWARE', 'BOOK' FROM DUAL
UNION ALL SELECT 5, 4, 'COMPUTERS', 'SOFTWARE', 'SOFTWARE' FROM DUAL
UNION ALL SELECT 6, 5, 'COMPUTERS', 'HARDWARE', 'MONITOR' FROM DUAL
UNION ALL SELECT 7, 6, 'COMPUTERS', 'HARDWARE', 'GAMECONSOLE' FROM DUAL
UNION ALL SELECT 8, 7, 'COMPUTERS', 'HARDWARE', 'KEYBOARD' FROM DUAL
UNION ALL SELECT 9, 8, 'TODDLER', 'BEDDING', 'BED' FROM DUAL
Query 1:
SELECT ID, SECTION, DESCRIPTION, CATEGORY, SUBCATEGORY
FROM (
SELECT t.*,
COUNT( CASE SUBCATEGORY WHEN 'GAMECONSOLE' THEN 1 END ) OVER ( PARTITION BY SECTION, DESCRIPTION, CATEGORY ) AS HAS_SUBCATEGORY
FROM TEST t
)
WHERE HAS_SUBCATEGORY > 0
ORDER BY ID
Results:
| ID | SECTION | DESCRIPTION | CATEGORY | SUBCATEGORY |
|----|---------|-------------|----------|-------------|
| 1 | 1 | TODDLER | TOY | GAMECONSOLE |
| 2 | 1 | TODDLER | TOY | BICYCLE |
| 7 | 6 | COMPUTERS | HARDWARE | GAMECONSOLE |
Try
SELECT * FROM <TABLE_NAME> WHERE SUBCATEGORY LIKE 'GAMECONSOLE';
or
SELECT * FROM <TABLE_NAME> WHERE SUBCATEGORY = 'GAMECONSOLE';
Replace <TABLE_NAME> with the actual table name.
Further reading:
https://dev.mysql.com/doc/refman/5.0/en/select.html

SQL - Select what is not in second table from associative

I have a table "person", an associative table "person_vaccination" and a table "vaccination".
I want to get the people who are missing vaccinations, but so far I have only got it to work for a single person's ID:
SELECT vac.VACCINATION_Name
FROM VACCINATION vac
WHERE vac.VACCINATION_NUMBER NOT IN
(SELECT v.VACCINATION_NUMBER
FROM PERSON per
Join PERSON_VACCINATION pv ON per.PERSON_NUMBER = pv.PERSON_NUMBER
JOIN VACCINATION v ON pv.VACCINATION_NUMBER = v.VACCINATION_NUMBER
WHERE per.PERSON_NUMBER = 6)
It works fine, but how do I get all the people who are missing vaccinations? (e.g.:
555 , Vaccination 1
555 , Vaccination 2
666 , Vaccination 1)
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE VACCINATION ( VACCINATION_NUMBER, VACCINATION_NAME ) AS
SELECT 1, 'Vac 1' FROM DUAL
UNION ALL SELECT 2, 'Vac 2' FROM DUAL
UNION ALL SELECT 3, 'Vac 3' FROM DUAL
UNION ALL SELECT 4, 'Vac 4' FROM DUAL;
CREATE TABLE PERSON_VACCINATION ( VACCINATION_NUMBER, PERSON_NUMBER ) AS
SELECT 1, 1 FROM DUAL
UNION ALL SELECT 2, 1 FROM DUAL
UNION ALL SELECT 3, 1 FROM DUAL
UNION ALL SELECT 4, 1 FROM DUAL
UNION ALL SELECT 1, 2 FROM DUAL
UNION ALL SELECT 2, 2 FROM DUAL
UNION ALL SELECT 3, 2 FROM DUAL;
CREATE TABLE PERSON ( PERSON_NUMBER, PERSON_NAME ) AS
SELECT 1, 'P1' FROM DUAL
UNION ALL SELECT 2, 'P2' FROM DUAL
UNION ALL SELECT 3, 'P3' FROM DUAL;
Query 1:
SELECT p.PERSON_NAME,
v.VACCINATION_NAME
FROM VACCINATION v
CROSS JOIN
PERSON p
WHERE NOT EXISTS ( SELECT 1
FROM PERSON_VACCINATION pv
WHERE pv.VACCINATION_NUMBER = v.VACCINATION_NUMBER
AND pv.PERSON_NUMBER = p.PERSON_NUMBER )
ORDER BY p.PERSON_NAME,
p.PERSON_NUMBER,
v.VACCINATION_NAME,
v.VACCINATION_NUMBER
Results:
| PERSON_NAME | VACCINATION_NAME |
|-------------|------------------|
| P2 | Vac 4 |
| P3 | Vac 1 |
| P3 | Vac 2 |
| P3 | Vac 3 |
| P3 | Vac 4 |
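The cross join plus NOT EXISTS pattern can be reproduced in a minimal sketch with Python and SQLite (a stand-in for Oracle), using the same rows as the schema setup above. Every (person, vaccination) pair is generated first, and the pairs already recorded in PERSON_VACCINATION are then discarded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VACCINATION (VACCINATION_NUMBER INTEGER, VACCINATION_NAME TEXT)")
conn.executemany("INSERT INTO VACCINATION VALUES (?, ?)",
                 [(1, "Vac 1"), (2, "Vac 2"), (3, "Vac 3"), (4, "Vac 4")])
conn.execute("CREATE TABLE PERSON (PERSON_NUMBER INTEGER, PERSON_NAME TEXT)")
conn.executemany("INSERT INTO PERSON VALUES (?, ?)", [(1, "P1"), (2, "P2"), (3, "P3")])
conn.execute("CREATE TABLE PERSON_VACCINATION (VACCINATION_NUMBER INTEGER, PERSON_NUMBER INTEGER)")
conn.executemany("INSERT INTO PERSON_VACCINATION VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2)])

# All (person, vaccination) pairs, minus the pairs recorded in PERSON_VACCINATION.
SQL = """
SELECT p.PERSON_NAME, v.VACCINATION_NAME
FROM VACCINATION v
CROSS JOIN PERSON p
WHERE NOT EXISTS (SELECT 1 FROM PERSON_VACCINATION pv
                  WHERE pv.VACCINATION_NUMBER = v.VACCINATION_NUMBER
                    AND pv.PERSON_NUMBER = p.PERSON_NUMBER)
ORDER BY p.PERSON_NAME, v.VACCINATION_NAME
"""

missing = conn.execute(SQL).fetchall()
print(missing)
```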
Instead of an INNER JOIN, you should use LEFT JOIN.
Take a look at this link: http://www.w3schools.com/sql/sql_join_left.asp
If you are after people with no vaccinations at all, then you can use a LEFT OUTER JOIN between PERSON and PERSON_VACCINATION, then find all entries where a PERSON_VACCINATION column is NULL.
SELECT P.PERSON_NUMBER
FROM PERSON P
LEFT OUTER JOIN
PERSON_VACCINATION PV
ON P.PERSON_NUMBER = PV.PERSON_NUMBER
WHERE PV.PERSON_NUMBER IS NULL
If you are unfamiliar with LEFT OUTER JOIN, it tries to find matching rows in PERSON_VACCINATION for each row in PERSON. If there are no matching rows, it leaves the PERSON row in the result set, and shows NULL values for all columns in the PERSON_VACCINATION table.
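A minimal sketch of this LEFT OUTER JOIN anti-join, run from Python against SQLite (a stand-in for Oracle) with the PERSON and PERSON_VACCINATION rows from the schema setup above. The selected column is qualified as P.PERSON_NUMBER because both tables have a PERSON_NUMBER column, so an unqualified name would be ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON (PERSON_NUMBER INTEGER, PERSON_NAME TEXT)")
conn.executemany("INSERT INTO PERSON VALUES (?, ?)", [(1, "P1"), (2, "P2"), (3, "P3")])
conn.execute("CREATE TABLE PERSON_VACCINATION (VACCINATION_NUMBER INTEGER, PERSON_NUMBER INTEGER)")
conn.executemany("INSERT INTO PERSON_VACCINATION VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2)])

# Unmatched PERSON rows get NULLs for every PERSON_VACCINATION column.
SQL = """
SELECT P.PERSON_NUMBER
FROM PERSON P
LEFT OUTER JOIN PERSON_VACCINATION PV
  ON P.PERSON_NUMBER = PV.PERSON_NUMBER
WHERE PV.PERSON_NUMBER IS NULL
"""

unvaccinated = conn.execute(SQL).fetchall()
print(unvaccinated)
```

Note that P2 is not returned: this query finds people with no vaccinations at all, not people missing some of them.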
If you are looking for a list of people and the vaccinations they do not have, then @MT0's answer is correct. You need to create a result set containing all possible combinations of PERSON and VACCINATION (a cross join), then check which of those combinations actually exist in PERSON_VACCINATION. Any combination that does not exist is a missing vaccination.