Following up on this: BigQuery multiple UNNEST in a single SELECT
We are using BigQuery as our warehousing solution and are trying to push its limits by consolidating data. A simple example is client tracking: a client generates revenue, has several touch points on our site, and independently maintains several accounts with us. A business user doing behavior analysis on clients wants to track visits, revenue generated, and how their accounts impact retention, so we are trying to evaluate whether a nested structure would work for us.
Below is an example. I have 3 tables.
Clients (C)
C_Key| C_Name
-----|------
1 | ABC
2 | DEF
Accounts (A)
A_Key | C_Key
------|------
11 | 1
12 | 1
21 | 2
22 | 2
23 | 2
Revenue (R)
R_Key | C_Key | Revenue
-------|---------|----------
11 | 1 | $10
12 | 1 | $20
21 | 2 | $10
I used ARRAY_AGG to combine these three tables into a single nested table that looks like this:
{Client,
Accounts:
[{
}],
Revenue:
[{
}]
}
I want to be able to use multiple UNNESTs in a single query, like this:
SELECT c_key, COUNT(DISTINCT a.a_key) AS accounts, SUM(r.revenue) AS revenue
FROM single_nested_table, UNNEST(accounts) AS a, UNNEST(revenue) AS r
GROUP BY c_key
The expected output is two rows:
1,2,$30
2,3,$10
However, having multiple UNNESTs in the same query results in a cross join.
The actual output is:
1,2,$60
2,3,$30
The following is for BigQuery Standard SQL.
First, let's clarify how the single nested table is created.
I hope you did something like:
#standardSQL
WITH clients AS (
SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
SELECT 2, 'def'
), accounts AS (
SELECT 11 AS a_key, 1 AS c_key UNION ALL
SELECT 12, 1 UNION ALL
SELECT 21, 2 UNION ALL
SELECT 22, 2 UNION ALL
SELECT 23, 2
), revenue AS (
SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
SELECT 12, 1, 20 UNION ALL
SELECT 21, 2, 10
), single_nested_table AS (
SELECT x.c_key, x.c_name, accounts, revenue
FROM (
SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue
FROM clients AS c
LEFT JOIN accounts AS a ON a.c_key = c.c_key
GROUP BY c.c_key, c_name
) x
JOIN (
SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue
FROM clients AS c
LEFT JOIN revenue AS r ON r.c_key = c.c_key
GROUP BY c.c_key, c_name
) y
ON x.c_key = y.c_key
)
SELECT *
FROM single_nested_table
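which creates a table like this:
Row | c_key | c_name | accounts.a_key | accounts.c_key | revenue.r_key | revenue.c_key | revenue.revenue
----|-------|--------|----------------|----------------|---------------|---------------|----------------
1   | 1     | abc    | 11             | 1              | 11            | 1             | 10
    |       |        | 12             | 1              | 12            | 1             | 20
2   | 2     | def    | 21             | 2              | 21            | 2             | 10
    |       |        | 22             | 2              |               |               |
    |       |        | 23             | 2              |               |               |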
It is not that important exactly which query you used to create that table, but it is important to be clear about its structure/schema!
So now, back to your question:
#standardSQL
WITH clients AS (
SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
SELECT 2, 'def'
), accounts AS (
SELECT 11 AS a_key, 1 AS c_key UNION ALL
SELECT 12, 1 UNION ALL
SELECT 21, 2 UNION ALL
SELECT 22, 2 UNION ALL
SELECT 23, 2
), revenue AS (
SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
SELECT 12, 1, 20 UNION ALL
SELECT 21, 2, 10
), single_nested_table AS (
SELECT x.c_key, x.c_name, accounts, revenue
FROM (
SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue
FROM clients AS c
LEFT JOIN accounts AS a ON a.c_key = c.c_key
GROUP BY c.c_key, c_name
) x
JOIN (
SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue
FROM clients AS c
LEFT JOIN revenue AS r ON r.c_key = c.c_key
GROUP BY c.c_key, c_name
) y
ON x.c_key = y.c_key
)
SELECT
c_key, c_name,
ARRAY_LENGTH(accounts) AS distinct_accounts,
(SELECT SUM(revenue) FROM UNNEST(revenue)) AS revenue
FROM single_nested_table
This gives what you asked for:
Row c_key c_name distinct_accounts revenue
1 1 abc 2 30
2 2 def 3 10
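To see why the two UNNESTs in the question multiply the revenue, here is a small plain-Python model of the same nested rows. It is only an illustration: the dicts and lists are an assumed stand-in for the BigQuery STRUCT/ARRAY schema above, and the function names are hypothetical. Pairing every account with every revenue entry reproduces the 60/30 totals, while aggregating each array on its own gives 30/10.

```python
from itertools import product

# Hypothetical in-memory stand-in for single_nested_table: one dict per client,
# with the repeated fields stored as lists (same sample data as above).
single_nested_table = [
    {"c_key": 1, "c_name": "abc",
     "accounts": [{"a_key": 11}, {"a_key": 12}],
     "revenue": [{"r_key": 11, "revenue": 10}, {"r_key": 12, "revenue": 20}]},
    {"c_key": 2, "c_name": "def",
     "accounts": [{"a_key": 21}, {"a_key": 22}, {"a_key": 23}],
     "revenue": [{"r_key": 21, "revenue": 10}]},
]

def cross_join_unnest(rows):
    """Mimics FROM t, UNNEST(accounts) a, UNNEST(revenue) r: every account is
    paired with every revenue entry, so each revenue value is repeated once
    per account before it is summed."""
    out = []
    for row in rows:
        pairs = list(product(row["accounts"], row["revenue"]))
        distinct_accounts = len({a["a_key"] for a, _ in pairs})
        total = sum(r["revenue"] for _, r in pairs)
        out.append((row["c_key"], distinct_accounts, total))
    return out

def per_array_aggregates(rows):
    """Aggregates each array independently, like ARRAY_LENGTH(accounts) and
    (SELECT SUM(revenue) FROM UNNEST(revenue)). Counting distinct a_key values
    equals ARRAY_LENGTH here because the keys are unique."""
    return [
        (row["c_key"],
         len({a["a_key"] for a in row["accounts"]}),
         sum(r["revenue"] for r in row["revenue"]))
        for row in rows
    ]

print(cross_join_unnest(single_nested_table))     # [(1, 2, 60), (2, 3, 30)]
print(per_array_aggregates(single_nested_table))  # [(1, 2, 30), (2, 3, 10)]
```

The design point is the same as in the SQL answer: each repeated field gets its own scalar subquery (or function) so that one array never fans out the other.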
Related
I have a table like this one:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
I would like to get the IDs that have at least two A rows and at least two B rows. So in my example, the query should return only ID 1.
Thanks!
In MySQL:
SELECT id
FROM test
GROUP BY id
HAVING GROUP_CONCAT(cust ORDER BY cust SEPARATOR '') LIKE '%AA%BB%'
In Oracle:
WITH cte AS ( SELECT id, LISTAGG(cust, '') WITHIN GROUP (ORDER BY cust) custs
FROM test
GROUP BY id )
SELECT id
FROM cte
WHERE custs LIKE '%AA%BB%'
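Both answers rely on the same trick: sort each id's customers, concatenate them into one string, and test it with LIKE '%AA%BB%'. A minimal Python sketch of that idea follows; the function name is hypothetical, and `re.search` with `AA.*BB` plays the role of the LIKE pattern. It also shows why the pattern's case must match the data when the comparison is case-sensitive, as Oracle's LIKE is under default settings.

```python
import re
from collections import defaultdict

# Sample data from the question: (id, cust) pairs.
ROWS = [(1, "A"), (1, "A"), (1, "B"), (1, "B"),
        (2, "A"), (2, "A"), (2, "A"), (2, "B"),
        (3, "A"), (3, "B"), (3, "B")]

def ids_matching(rows, pattern="AA.*BB"):
    """Group custs per id, sort and concatenate them (like GROUP_CONCAT or
    LISTAGG ... ORDER BY cust), then keep the ids whose string contains the
    pattern. 'AA.*BB' is the regex equivalent of LIKE '%AA%BB%'."""
    custs_by_id = defaultdict(list)
    for id_, cust in rows:
        custs_by_id[id_].append(cust)
    return sorted(
        id_ for id_, custs in custs_by_id.items()
        if re.search(pattern, "".join(sorted(custs)))
    )

print(ids_matching(ROWS))            # [1]
print(ids_matching(ROWS, "aa.*bb"))  # [] because the match is case-sensitive
```

Note that this string trick assumes single-character customer codes: with a multi-character value such as 'AB', the concatenated string could produce a false match.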
I would just use two levels of aggregation:
select id
from (select id, cust, count(*) as cnt
from t
where cust in ('A', 'B')
group by id, cust
) ic
group by id
having count(*) = 2 and -- both customers are in the result set
min(cnt) >= 2 -- and there are at least two instances
This is one option; lines #1 - 13 represent sample data. The query you might be interested in begins at line #14.
SQL> with test (id, cust) as
2 (select 1, 'a' from dual union all
3 select 1, 'a' from dual union all
4 select 1, 'b' from dual union all
5 select 1, 'b' from dual union all
6 select 2, 'a' from dual union all
7 select 2, 'a' from dual union all
8 select 2, 'a' from dual union all
9 select 2, 'b' from dual union all
10 select 3, 'a' from dual union all
11 select 3, 'b' from dual union all
12 select 3, 'b' from dual
13 )
14 select id
15 from (select
16 id,
17 sum(case when cust = 'a' then 1 else 0 end) suma,
18 sum(case when cust = 'b' then 1 else 0 end) sumb
19 from test
20 group by id
21 )
22 where suma >= 2
23 and sumb >= 2;
ID
----------
1
SQL>
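A quick way to sanity-check the two-level aggregation and the conditional-aggregation (SUM(CASE ...)) approaches above is to run them on the sample data. This sketch uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; both queries use only standard SQL that SQLite also accepts), and `run` is a hypothetical helper. The extra row in the last call gives id 2 three A's and two B's, which an exact `= 2` test would miss.

```python
import sqlite3

# Sample data from the question: (id, cust) rows.
ROWS = [(1, "A"), (1, "A"), (1, "B"), (1, "B"),
        (2, "A"), (2, "A"), (2, "A"), (2, "B"),
        (3, "A"), (3, "B"), (3, "B")]

TWO_LEVEL = """
    SELECT id
    FROM (SELECT id, cust, COUNT(*) AS cnt
          FROM t
          WHERE cust IN ('A', 'B')
          GROUP BY id, cust) ic
    GROUP BY id
    HAVING COUNT(*) = 2      -- both A and B are present
       AND MIN(cnt) >= 2     -- and each appears at least twice
    ORDER BY id
"""

CONDITIONAL = """
    SELECT id
    FROM t
    GROUP BY id
    HAVING SUM(CASE WHEN cust = 'A' THEN 1 ELSE 0 END) >= 2
       AND SUM(CASE WHEN cust = 'B' THEN 1 ELSE 0 END) >= 2
    ORDER BY id
"""

def run(rows, query):
    """Load rows into an in-memory table t(id, cust) and return the matching ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, cust TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids

print(run(ROWS, TWO_LEVEL))                 # [1]
print(run(ROWS, CONDITIONAL))               # [1]
print(run(ROWS + [(2, "B")], TWO_LEVEL))    # [1, 2]
```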
You can use GROUP BY and HAVING on the relevant Cust values ('A', 'B'), and then query the result twice (I chose to use WITH so that the subquery is written only once):
with more_than_2 as
(
select Id, Cust, count(*) c
from tab
where Cust in ('A', 'B')
group by Id, Cust
having count(*) >= 2
)
select distinct tab.Id
from tab
where exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'A')
and exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'B')
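What you want is a perfect candidate for MATCH_RECOGNIZE (available from Oracle 12c). Partitioning by id keeps a match from spanning rows of different ids. Here you go:
select id
from t
match_recognize
(
partition by id -- search for the pattern within each id only
order by cust
measures count(*) as match_rows
pattern (A{2,} B{2,})
define A as cust = 'A',
B as cust = 'B'
)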
Regards,
Ranagal
I have a small issue with my report, and I need to know if it's even possible to do this.
I'm using Oracle 12c and OBIEE. I'm trying to create a custom column with numeric values (1 and 2) based on my "Percent" column, in the way described below.
Here are my results in a table:
I will give you an example of how it should work:
Emilian is the owner of a few customers. Each customer's annual revenue is listed, and the column next to it is that customer's percentage of Emilian's total customer revenue. In my custom column I need to show "1" for the largest customers that are needed to reach 80% (or more) of his total, and "2" for the rest. So in Emilian's case, his first two customers will be "1", since 78% + 14% is already above 80%, and the rest will be "2". Owners who have only one customer would logically all get "1", since that customer's contribution is 100%.
I hope I made this clear; I will be very grateful for help with coding it :)
Alex
There's probably a much more efficient way to do this. I built up what you need with a series of sub-selects. This still doesn't handle equal percentages, but you said that isn't expected to be a problem. I'd still watch out for it.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE t1 ( ownerId int, customerId int, revenue int ) ;
INSERT INTO t1 ( ownerid, customerid, revenue )
SELECT 1, 1, 99 FROM dual UNION ALL
SELECT 1, 2, 200 FROM dual UNION ALL
SELECT 1, 3, 300 FROM dual UNION ALL
SELECT 1, 4, 400 FROM dual UNION ALL
SELECT 2, 5, 100 FROM dual UNION ALL
SELECT 2, 6, 100 FROM dual UNION ALL
SELECT 2, 7, 200 FROM dual UNION ALL
SELECT 2, 8, 600 FROM dual UNION ALL
SELECT 3, 9, 100 FROM dual UNION ALL
SELECT 3, 10, 900 FROM dual UNION ALL
SELECT 4, 11, 1000 FROM dual UNION ALL
SELECT 5, 12, 1000 FROM dual UNION ALL
SELECT 6, 13, 200 FROM dual UNION ALL
SELECT 6, 14, 200 FROM dual UNION ALL
SELECT 6, 15, 200 FROM dual UNION ALL
SELECT 6, 16, 200 FROM dual UNION ALL
SELECT 6, 17, 200 FROM dual UNION ALL
SELECT 42, 736784, 1480000 FROM dual UNION ALL
SELECT 42, 736580, 280160 FROM dual UNION ALL
SELECT 42, 1040137, 112486 FROM dual UNION ALL
SELECT 42, 738685, 22903 FROM dual UNION ALL
SELECT 42, 736781, 56 FROM dual
;
Query 1:
SELECT s3.ownerID, s3.customerID, s3.revenue, s3.OwnerRevenue
, CAST(s3.customerRevPct AS decimal(5,2)) AS customerRevPct
, CASE WHEN s3.PctRT < 80 OR s3.custCount = 1 THEN 1 ELSE 2 END AS customCol
/* Do the running pcts add up to 80+? 1 customer = 100% == 1. What if all pcts are equal? */
FROM (
SELECT s2.*
, 100-SUM(nvl(s2.customerRevPct,0)) OVER (PARTITION BY s2.ownerID ORDER BY s2.customerRevPct, s2.customerID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pctRT
, COUNT(*) OVER (PARTITION BY s2.ownerID ORDER BY (s2.ownerID) ) AS custCount /* Is there only 1 customer? */
FROM (
SELECT s1.*
, ( ( ( s1.revenue * 1.0 ) / s1.ownerRevenue ) * 100 ) AS customerRevPct
FROM (
SELECT t1.ownerID, t1.customerID, t1.revenue
, SUM(t1.revenue) OVER ( PARTITION BY t1.ownerID ) AS ownerRevenue
FROM t1
) s1
) s2
) s3
WHERE ownerID = 42 /* REMOVE THIS LINE - TESTING ONLY */
ORDER BY s3.ownerID, s3.customerRevPct DESC
Results:
| OWNERID | CUSTOMERID | REVENUE | OWNERREVENUE | CUSTOMERREVPCT | CUSTOMCOL |
|---------|------------|---------|--------------|----------------|-----------|
| 42 | 736784 | 1480000 | 1895605 | 78.08 | 1 |
| 42 | 736580 | 280160 | 1895605 | 14.78 | 1 |
| 42 | 1040137 | 112486 | 1895605 | 5.93 | 2 |
| 42 | 738685 | 22903 | 1895605 | 1.21 | 2 |
| 42 | 736781 | 56 | 1895605 | 0 | 2 |
EDIT: I changed the Fiddle to illustrate your data example.
create table custrev(owner varchar2(100), cust_id number, revenue number);
insert into custrev values('Emilian',1,1480000);
insert into custrev values('Emilian',2,280160);
insert into custrev values('Emilian',3,112486);
insert into custrev values('Emilian',4,22903);
insert into custrev values('Emilian',5,56);
insert into custrev values('Andy',6,1378);
insert into custrev values('Sandy',7,560000);
commit;
Below is the SQL for your requirement.
select owner, cust_id, revenue, pct,
case when pct = 100 then 1
when flg is null or flg < 80 then 1
else 2 end flag_col
from (select owner, cust_id, revenue, pct,--cumulative_sum,
lag(cumulative_sum) over(partition by owner
order by revenue desc) flg
from (select owner, cust_id, revenue, pct,
sum(pct) over(partition by owner
order by revenue desc
rows between unbounded preceding
and current row) cumulative_sum
from (select owner, cust_id, revenue,
round(ratio_to_report(revenue) over(partition by owner)*100,2) pct
from custrev)
)
)
order by owner, revenue desc;
Output:
OWNER CUST_ID REVENUE PCT FLAG_COL
Andy 6 1378 100 1
Emilian 1 1480000 78.08 1
Emilian 2 280160 14.78 1
Emilian 3 112486 5.93 2
Emilian 4 22903 1.21 2
Emilian 5 56 0 2
Sandy 7 560000 100 1
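The rule used in the query above (flag 1 while the running share of the larger customers before this one, i.e. the LAG of the cumulative sum, is still below 80%) can be sketched in plain Python to check it against the custrev sample data. This is an illustration only; `flag_top_customers` is a hypothetical name, and ties in revenue keep their input order rather than being resolved in any particular way.

```python
from itertools import groupby

# Sample data from the custrev answer: (owner, cust_id, revenue).
CUSTREV = [("Emilian", 1, 1480000), ("Emilian", 2, 280160), ("Emilian", 3, 112486),
           ("Emilian", 4, 22903), ("Emilian", 5, 56),
           ("Andy", 6, 1378), ("Sandy", 7, 560000)]

def flag_top_customers(rows, threshold=80.0):
    """Return (owner, cust_id, pct, flag) tuples.

    Within each owner, customers are visited from largest to smallest revenue.
    A customer gets flag 1 while the percentage already covered by larger
    customers is below the threshold, otherwise flag 2.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))  # owner, then revenue desc
    for owner, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        total = sum(revenue for _, _, revenue in group)
        covered = 0.0  # share of the owner's total from larger customers
        for _, cust_id, revenue in group:
            pct = revenue * 100.0 / total
            flag = 1 if covered < threshold else 2
            result.append((owner, cust_id, round(pct, 2), flag))
            covered += pct
    return result

for row in flag_top_customers(CUSTREV):
    print(row)
```

Because the first customer of every owner starts with nothing covered, single-customer owners such as Andy and Sandy always get flag 1, which matches the output above without a separate `pct = 100` check.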
Alex,
OBIEE is based on models, not on SQL.
So, sorry to say this, but the SQL code will help you exactly zero...
For example my table contains the following data:
ID price
-------------
1 10
1 10
1 20
2 20
2 20
3 30
3 30
4 5
4 5
4 15
Given the example above, the result should be:
ID price
-------------
1 30
2 20
3 30
4 20
-----------
ID 100
How do I write this query in Oracle? First SUM(DISTINCT price) grouped by id, then the sum of all of those prices.
I would be very careful with a data structure like this. First, check that all ids have exactly one price:
select id
from t
group by id
having count(distinct price) > 1;
I think the safest method is to extract a particular price for each id (say the maximum) and then do the aggregation:
select sum(price)
from (select id, max(price) as price
from t
group by id
) t;
Then, go fix your data so you don't have a repeated additive dimension. There should be a table with one row per id and price (or perhaps with duplicates but controlled by effective and end dates).
The data is messed up; you should not assume that the price is the same on all rows for a given id. You need to check that every time you use the fields, until you fix the data.
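To make the warning concrete, here is a minimal sketch that runs the check and both totals on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; the queries have the same shape), and `check_prices` is a hypothetical helper. Ids 1 and 4 fail the one-price-per-id check, so the maximum-price total (85) and the sum-of-distinct-prices total (100) disagree.

```python
import sqlite3

# Sample data from the question: (id, price) rows with repeated prices.
ROWS = [(1, 10), (1, 10), (1, 20), (2, 20), (2, 20),
        (3, 30), (3, 30), (4, 5), (4, 5), (4, 15)]

def check_prices(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, price INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # ids that violate the "exactly one price per id" assumption
    bad_ids = [r[0] for r in con.execute(
        "SELECT id FROM t GROUP BY id HAVING COUNT(DISTINCT price) > 1 ORDER BY id")]
    # pick one price per id (the maximum), then total them
    max_total = con.execute(
        "SELECT SUM(price) FROM (SELECT id, MAX(price) AS price FROM t GROUP BY id)"
    ).fetchone()[0]
    # SUM(DISTINCT price) per id, then total them (what the question asks for)
    distinct_total = con.execute(
        "SELECT SUM(price) FROM (SELECT id, SUM(DISTINCT price) AS price FROM t GROUP BY id)"
    ).fetchone()[0]
    con.close()
    return bad_ids, max_total, distinct_total

print(check_prices(ROWS))  # ([1, 4], 85, 100)
```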
first sum(distinct price) group by id then sum(all price)
Looking at your desired output, it seems you also need the final sum (similar to ROLLUP); however, ROLLUP won't directly work in your case.
If you want to format your output in exactly the way you have posted your desired output, i.e. with a header for the last row of total sum, then you could set the PAGESIZE in SQL*Plus.
Using UNION ALL
For example,
SQL> set pagesize 7
SQL> WITH DATA AS(
SELECT ID, SUM(DISTINCT price) AS price
FROM t
GROUP BY id
)
SELECT to_char(ID) id, price FROM DATA
UNION ALL
SELECT 'ID' id, sum(price) FROM DATA
ORDER BY ID
/
ID PRICE
--- ----------
1 30
2 20
3 30
4 20
ID PRICE
--- ----------
ID 100
SQL>
So you have an additional row at the end with the total SUM of price.
Using ROLLUP
Alternatively, you could use ROLLUP to get the total sum as follows:
SQL> set pagesize 7
SQL> WITH DATA AS
( SELECT ID, SUM(DISTINCT price) AS price FROM t GROUP BY id
)
SELECT ID, SUM(price) price
FROM DATA
GROUP BY ROLLUP(id);
ID PRICE
---------- ----------
1 30
2 20
3 30
4 20
ID PRICE
---------- ----------
100
SQL>
First do the DISTINCT and then a ROLLUP
SELECT ID, SUM(price) -- sum of the distinct prices
FROM
(
SELECT DISTINCT ID, price -- distinct prices per ID
FROM tab
) dt
GROUP BY ROLLUP(ID) -- two levels of aggregation, per ID and total sum
SELECT ID,SUM(price) as price
FROM
(SELECT ID,price
FROM TableName
GROUP BY ID,price) T
GROUP BY ID
Explanation:
The inner query selects the distinct prices for each id, i.e.:
ID price
-------------
1 10
1 20
2 20
3 30
4 5
4 15
Then the outer query will select SUM of those prices for each id.
Final Result :
ID price
----------
1 30
2 20
3 30
4 20
Result in SQL Fiddle.
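The same two steps (distinct prices per id, then a per-id sum plus a grand total) can be tried locally. This sketch assumes Python's built-in sqlite3 module in place of Oracle; SQLite has no GROUP BY ROLLUP, so the grand-total row is appended with UNION ALL, as in the UNION ALL variant shown earlier. `totals_with_grand_total` is a hypothetical helper name.

```python
import sqlite3

# Sample data from the question: (id, price) rows with repeated prices.
ROWS = [(1, 10), (1, 10), (1, 20), (2, 20), (2, 20),
        (3, 30), (3, 30), (4, 5), (4, 5), (4, 15)]

QUERY = """
    WITH data AS (
        SELECT id, SUM(price) AS price
        FROM (SELECT DISTINCT id, price FROM t)  -- distinct prices per id
        GROUP BY id
    )
    SELECT CAST(id AS TEXT) AS id, price FROM data
    UNION ALL
    SELECT 'ID', SUM(price) FROM data           -- grand-total row labelled 'ID'
    ORDER BY 1
"""

def totals_with_grand_total(rows):
    """Return the per-id sums of distinct prices followed by the overall total."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, price INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(totals_with_grand_total(ROWS))
# [('1', 30), ('2', 20), ('3', 30), ('4', 20), ('ID', 100)]
```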
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE MYTABLE ( ID, price ) AS
SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 15 FROM DUAL;
Query 1:
SELECT COALESCE( TO_CHAR(ID), 'ID' ) AS ID,
SUM( PRICE ) AS PRICE
FROM ( SELECT DISTINCT ID, PRICE FROM MYTABLE )
GROUP BY ROLLUP ( ID )
ORDER BY ID
Results:
| ID | PRICE |
|----|-------|
| 1 | 30 |
| 2 | 20 |
| 3 | 30 |
| 4 | 20 |
| ID | 100 |
So I have two payment tables that I want to compare in an Oracle SQL database. I want to compare the total payments by location and invoice. It's more complex than this, but basically it is:
select
tbl1.location,
tbl1.invoice,
Sum(tbl1.payments),
Sum(tbl2.payments)
From
tbl1
left outer join tbl2 on
tbl1.location = tbl2.location
and tbl1.invoice = tbl2.invoice
group by
(tbl1.location,tbl1.invoice)
I want the left outer join because, in addition to comparing payment amounts, I want to check for orders in tbl1 that may not exist in tbl2.
The issue is that there are multiple records for each order (location and invoice) in both tables (not necessarily the same number of records, e.g. 2 in tbl1 and 1 in tbl2, or vice versa), but the total payments for each order (location and invoice) should match. So just doing a direct join gives me a Cartesian product.
So I am thinking I could do two queries, first aggregating the total payments by store and invoice for each table, and then join on those results, because in the aggregated results I would only have one record for each order (store and invoice). But I don't know how to do this. I've tried several subqueries but can't seem to shake the Cartesian product. I'd like to do this in one query, rather than creating tables and joining on those, as this will be ongoing.
Thanks in advance for any help.
You can use a WITH clause to create the two queries and join them, as you said. I will just give the syntax; if you need more help, just ask. You didn't provide full details on your tables, so I am partly guessing in my answer.
WITH tmpTableA as (
select
tbl1.location,
tbl1.invoice,
Sum(tbl1.payments) totalTblA
From
tbl1
group by
tbl1.location,
tbl1.invoice
),
tmpTableB as (
select
tbl2.location,
tbl2.invoice,
Sum(tbl2.payments) totalTblB
From
tbl2
group by
tbl2.location,
tbl2.invoice
)
Select tmpTableA.location, tmpTableA.invoice, tmpTableA.totalTblA,
tmpTableB.location, tmpTableB.invoice, tmpTableB.totalTblB
from tmpTableA, tmpTableB
where tmpTableA.location = tmpTableB.location (+)
and tmpTableA.invoice = tmpTableB.invoice (+)
The (+) operator is Oracle's proprietary outer-join syntax; here it makes the join a left outer join (of course, you can use LEFT JOIN if you prefer).
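The difference between joining raw rows and joining pre-aggregated totals is easy to see on small data. This sketch assumes Python's built-in sqlite3 module in place of Oracle, reuses the tbl1/tbl2 sample rows from the SQL Fiddle answer in this thread, and uses a hypothetical helper `compare_payments`. The naive join multiplies each order's rows (5 x 3 = 15 for location 'a', invoice 1), while aggregating first keeps one row per order on each side.

```python
import sqlite3

TBL1 = [(1, "a", 1, 1), (2, "a", 1, 1), (3, "a", 1, 1), (4, "a", 1, 1), (5, "a", 1, 1),
        (6, "a", 2, 1), (7, "a", 2, 1), (8, "a", 2, 1), (9, "b", 1, 1), (10, "b", 2, 1)]
TBL2 = [(1, "a", 1, 1), (2, "a", 1, 1), (3, "a", 1, 1), (4, "a", 2, 1), (5, "a", 2, 1),
        (6, "b", 1, 1), (7, "b", 1, 1), (8, "b", 1, 1), (9, "b", 1, 1), (10, "b", 1, 1)]

def compare_payments():
    con = sqlite3.connect(":memory:")
    for name, rows in (("tbl1", TBL1), ("tbl2", TBL2)):
        con.execute(f"CREATE TABLE {name} (id INTEGER, location TEXT, invoice INTEGER, payments INTEGER)")
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)

    # Join first, aggregate second: every tbl1 row pairs with every matching tbl2 row.
    naive = con.execute("""
        SELECT t1.location, t1.invoice, SUM(t1.payments), SUM(t2.payments)
        FROM tbl1 t1
        LEFT JOIN tbl2 t2 ON t1.location = t2.location AND t1.invoice = t2.invoice
        GROUP BY t1.location, t1.invoice
        ORDER BY t1.location, t1.invoice""").fetchall()

    # Aggregate each table first, then join one row per order on each side.
    pre_aggregated = con.execute("""
        WITH a AS (SELECT location, invoice, SUM(payments) AS total
                   FROM tbl1 GROUP BY location, invoice),
             b AS (SELECT location, invoice, SUM(payments) AS total
                   FROM tbl2 GROUP BY location, invoice)
        SELECT a.location, a.invoice, a.total, COALESCE(b.total, 0)
        FROM a LEFT JOIN b ON a.location = b.location AND a.invoice = b.invoice
        ORDER BY a.location, a.invoice""").fetchall()
    con.close()
    return naive, pre_aggregated

naive, pre_aggregated = compare_payments()
print(naive)           # [('a', 1, 15, 15), ('a', 2, 6, 6), ('b', 1, 5, 5), ('b', 2, 1, None)]
print(pre_aggregated)  # [('a', 1, 5, 3), ('a', 2, 3, 2), ('b', 1, 1, 5), ('b', 2, 1, 0)]
```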
Two other options:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl1 ( id, location, invoice, payments ) AS
SELECT 1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 4, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 5, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 6, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 7, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 8, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 2, 1 FROM DUAL;
CREATE TABLE tbl2 ( id, location, invoice, payments ) AS
SELECT 1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT 4, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 5, 'a', 2, 1 FROM DUAL
UNION ALL SELECT 6, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 7, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 8, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 1, 1 FROM DUAL;
Query 1:
This one uses a correlated sub-query to calculate the total for the second table:
SELECT location,
invoice,
SUM( payments ) AS total_payments_1,
COALESCE( (SELECT SUM( payments )
FROM tbl2 i
WHERE o.location = i.location
AND o.invoice = i.invoice),
0 ) AS total_payments_2
FROM tbl1 o
GROUP BY
location,
invoice
ORDER BY
location,
invoice
Results:
| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
| a | 1 | 5 | 3 |
| a | 2 | 3 | 2 |
| b | 1 | 1 | 5 |
| b | 2 | 1 | 0 |
Query 2:
This one uses a named sub-query to pre-calculate the totals for table 1 then performs a LEFT OUTER JOIN with the second table and includes the total for table 1 in the group.
Without any indexes, the explain plans suggest that Query 1 is much more efficient, but your indexes might lead the optimizer to find a better plan.
WITH tbl1_sums AS (
SELECT location,
invoice,
SUM( payments ) AS total_payments_1
FROM tbl1
GROUP BY
location,
invoice
)
SELECT t1.location,
t1.invoice,
t1.total_payments_1,
COALESCE( SUM( t2.payments ), 0 ) AS total_payments_2
FROM tbl1_sums t1
LEFT OUTER JOIN
tbl2 t2
ON ( t1.location = t2.location
AND t1.invoice = t2.invoice)
GROUP BY
t1.location,
t1.invoice,
t1.total_payments_1
ORDER BY
t1.location,
t1.invoice
Results:
| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
| a | 1 | 5 | 3 |
| a | 2 | 3 | 2 |
| b | 1 | 1 | 5 |
| b | 2 | 1 | 0 |
Sorry, my first answer was wrong. Thank you for providing the sqlfiddle, MT0.
The point I missed is that you need to sum up the payments in each table first, so that only one row per order is left in each, and then join them. This is what MT0 does in his statements.
If you want a solution that looks more "symmetric", try:
select A.location, A.invoice, B.total sum1, C.total sum2
from (select distinct location, invoice from tbl1) A
left outer join (select location, invoice, sum(payments) as total from tbl1 group by location, invoice) B on A.location=B.location and A.invoice=B.invoice
left outer join (select location, invoice, sum(payments) as total from tbl2 group by location, invoice) C on A.location=C.location and A.invoice=C.invoice
which results in
LOCATION INVOICE SUM1 SUM2
a 2 3 2
a 1 5 3
b 1 1 5
b 2 1 (null)
I have a table "person", an associative table "person_vaccination" and a table "vaccination".
I want to get the people who are missing vaccinations, but so far I have only got it to work for a single person id.
SELECT vac.VACCINATION_Name
FROM VACCINATION vac
WHERE vac.VACCINATION_NUMBER NOT IN
(SELECT v.VACCINATION_NUMBER
FROM PERSON per
Join PERSON_VACCINATION pv ON per.PERSON_NUMBER = pv.PERSON_NUMBER
JOIN VACCINATION v ON pv.VACCINATION_NUMBER = v.VACCINATION_NUMBER
WHERE per.PERSON_NUMBER = 6)
It works fine, but how do I get all the people who are missing vaccinations? For example:
555, Vaccination 1
555, Vaccination 2
666, Vaccination 1
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE VACCINATION ( VACCINATION_NUMBER, VACCINATION_NAME ) AS
SELECT 1, 'Vac 1' FROM DUAL
UNION ALL SELECT 2, 'Vac 2' FROM DUAL
UNION ALL SELECT 3, 'Vac 3' FROM DUAL
UNION ALL SELECT 4, 'Vac 4' FROM DUAL;
CREATE TABLE PERSON_VACCINATION ( VACCINATION_NUMBER, PERSON_NUMBER ) AS
SELECT 1, 1 FROM DUAL
UNION ALL SELECT 2, 1 FROM DUAL
UNION ALL SELECT 3, 1 FROM DUAL
UNION ALL SELECT 4, 1 FROM DUAL
UNION ALL SELECT 1, 2 FROM DUAL
UNION ALL SELECT 2, 2 FROM DUAL
UNION ALL SELECT 3, 2 FROM DUAL;
CREATE TABLE PERSON ( PERSON_NUMBER, PERSON_NAME ) AS
SELECT 1, 'P1' FROM DUAL
UNION ALL SELECT 2, 'P2' FROM DUAL
UNION ALL SELECT 3, 'P3' FROM DUAL;
Query 1:
SELECT p.PERSON_NAME,
v.VACCINATION_NAME
FROM VACCINATION v
CROSS JOIN
PERSON p
WHERE NOT EXISTS ( SELECT 1
FROM PERSON_VACCINATION pv
WHERE pv.VACCINATION_NUMBER = v.VACCINATION_NUMBER
AND pv.PERSON_NUMBER = p.PERSON_NUMBER )
ORDER BY p.PERSON_NAME,
p.PERSON_NUMBER,
v.VACCINATION_NAME,
v.VACCINATION_NUMBER
Results:
| PERSON_NAME | VACCINATION_NAME |
|-------------|------------------|
| P2 | Vac 4 |
| P3 | Vac 1 |
| P3 | Vac 2 |
| P3 | Vac 3 |
| P3 | Vac 4 |
Instead of an INNER JOIN, you should use LEFT JOIN.
Take a look at this link: http://www.w3schools.com/sql/sql_join_left.asp
If you are after people with no vaccinations at all, then you can use a LEFT OUTER JOIN between PERSON and PERSON_VACCINATION, then find all entries where a PERSON_VACCINATION column is NULL.
SELECT P.PERSON_NUMBER
FROM PERSON P
LEFT OUTER JOIN
PERSON_VACCINATION PV
ON P.PERSON_NUMBER = PV.PERSON_NUMBER
WHERE PV.PERSON_NUMBER IS NULL
If you are unfamiliar with LEFT OUTER JOIN, it tries to find matching rows in PERSON_VACCINATION for each row in PERSON. If there are no matching rows, it leaves the PERSON row in the result set, and shows NULL values for all columns in the PERSON_VACCINATION table.
If you are looking for a list of people and the vaccinations they do not have, then @MT0's answer is correct. You need to create a result set containing all possible combinations of PERSON and VACCINATION (a cross join), then check which of those combinations actually exist in PERSON_VACCINATION. Any combination that does not exist is a missing vaccination.
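The cross-join-then-filter idea can be expressed either with NOT EXISTS or with a LEFT JOIN that keeps only unmatched combinations. This sketch runs both forms on the sample data from the SQL Fiddle answer, using Python's built-in sqlite3 module as a stand-in for Oracle (an assumption), with a hypothetical helper `missing_vaccinations`. Both queries return the same five person/vaccination pairs.

```python
import sqlite3

def missing_vaccinations():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE vaccination (vaccination_number INTEGER, vaccination_name TEXT);
        CREATE TABLE person_vaccination (vaccination_number INTEGER, person_number INTEGER);
        CREATE TABLE person (person_number INTEGER, person_name TEXT);
        INSERT INTO vaccination VALUES (1, 'Vac 1'), (2, 'Vac 2'), (3, 'Vac 3'), (4, 'Vac 4');
        INSERT INTO person_vaccination VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2);
        INSERT INTO person VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    """)
    # Every (person, vaccination) combination that has no matching association row.
    not_exists = con.execute("""
        SELECT p.person_name, v.vaccination_name
        FROM vaccination v CROSS JOIN person p
        WHERE NOT EXISTS (SELECT 1 FROM person_vaccination pv
                          WHERE pv.vaccination_number = v.vaccination_number
                            AND pv.person_number = p.person_number)
        ORDER BY p.person_name, v.vaccination_name""").fetchall()
    # Same result as an anti-join: LEFT JOIN and keep rows where nothing matched.
    anti_join = con.execute("""
        SELECT p.person_name, v.vaccination_name
        FROM person p CROSS JOIN vaccination v
        LEFT JOIN person_vaccination pv
               ON pv.person_number = p.person_number
              AND pv.vaccination_number = v.vaccination_number
        WHERE pv.person_number IS NULL
        ORDER BY p.person_name, v.vaccination_name""").fetchall()
    con.close()
    return not_exists, anti_join

not_exists, anti_join = missing_vaccinations()
print(not_exists)
# [('P2', 'Vac 4'), ('P3', 'Vac 1'), ('P3', 'Vac 2'), ('P3', 'Vac 3'), ('P3', 'Vac 4')]
```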