How to select passengers that never flew to a city - sql

The database schema is shown in the attached image.
I tried this SELECT, but I'm afraid it isn't right:
SELECT t.type , a.ICAOId , a.name , ci.id , c.ISOAlpha2ID , p.docReference , ti.docReference , ti.number , p.name , p.surname
FROM dbo.AirportType t
INNER JOIN dbo.Airport a ON t.type = a.type
INNER JOIN dbo.City ci ON a.city = ci.id
INNER JOIN dbo.Country c ON ci.ISOalpha2Id = c.ISOalpha2Id
INNER JOIN dbo.Passenger p ON c.ISOalpha2Id = p.nationality
INNER JOIN dbo.Ticket ti ON p.docReference = ti.docReference
WHERE NOT ci.id = 'Tokyo'
Can you please help to get this right?

You could build a list of the passengers who HAVE flown to the city, then use it as a subquery to select the passengers who are not in that list.
Here is an example of how that can be done.
Subquery:
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
Now put that into another query that selects the rows not in it:
SELECT * FROM passengers
WHERE id NOT IN (
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name= 'tokyo'
)
Notice that I didn't use your attribute names; you will have to change those.
This is a simplified version of what you will have to do, because the city is not stored directly in your tickets table. You will also have to join tickets with coupons and flights to get the people who have flown to a city. From there it is the same.
Overall, I believe this should help you see what you have to do.
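To make the extra joins concrete, here is a minimal sketch run through Python's built-in sqlite3 module. The schema is hypothetical: the table and column names (passenger, ticket, coupon, flight, destination_city) are assumptions for illustration, not the names from the asker's diagram. It uses NOT EXISTS, which behaves like the NOT IN version above but is not affected by NULLs in the subquery.

```python
import sqlite3

# Hypothetical, simplified schema: all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE passenger (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE flight    (id INTEGER PRIMARY KEY, destination_city TEXT);
CREATE TABLE ticket    (id INTEGER PRIMARY KEY, passenger_id INTEGER);
CREATE TABLE coupon    (ticket_id INTEGER, flight_id INTEGER);

INSERT INTO passenger VALUES (1, 'Anna'), (2, 'Paul'), (3, 'Rita');
INSERT INTO flight    VALUES (10, 'Tokyo'), (11, 'Lisbon');
INSERT INTO ticket    VALUES (100, 1), (101, 2);
INSERT INTO coupon    VALUES (100, 10), (101, 11);
""")

# Keep a passenger only if no ticket -> coupon -> flight chain ends in Tokyo.
never_flew_to_tokyo = [row[0] for row in conn.execute("""
SELECT p.name
FROM passenger p
WHERE NOT EXISTS (
    SELECT 1
    FROM ticket t
    JOIN coupon c ON c.ticket_id = t.id
    JOIN flight f ON f.id = c.flight_id
    WHERE t.passenger_id = p.id
      AND f.destination_city = 'Tokyo'
)
ORDER BY p.name
""")]

print(never_flew_to_tokyo)  # ['Paul', 'Rita']
```

Rita has no tickets at all and is still returned, which is the behavior a "never flew to" question usually wants.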

A minimal reproducible example is not provided.
Here is a conceptual example, that could be easily extended to a real scenario.
SQL
-- DDL and sample data population, start
DECLARE @passenger TABLE (passengerID INT PRIMARY KEY, passenger_name VARCHAR(20));
INSERT @passenger (passengerID, passenger_name) VALUES
(1, 'Anna'),
(2, 'Paul');
DECLARE @city TABLE (cityID INT PRIMARY KEY, city_name VARCHAR(20));
INSERT @city (cityID, city_name) VALUES
(1, 'Miami'),
(2, 'Orldando'),
(3, 'Tokyo');
-- Already visited cities
DECLARE @passenger_city TABLE (passengerID INT, cityID INT);
INSERT @passenger_city (passengerID, cityID) VALUES
(1, 1),
(2, 3);
-- DDL and sample data population, end
SELECT * FROM @passenger;
SELECT * FROM @city;
SELECT * FROM @passenger_city;
;WITH rs AS
(
SELECT c.passengerID, b.cityID
FROM @passenger AS c
CROSS JOIN @city AS b -- get all possible combinations of passengers and cities
EXCEPT -- filter out already visited cities
SELECT passengerID, cityID FROM @passenger_city
)
SELECT c.*, b.city_name
FROM rs
INNER JOIN @passenger AS c ON c.passengerID = rs.passengerID
INNER JOIN @city AS b ON b.cityID = rs.cityID
ORDER BY c.passenger_name, b.city_name;
Output
passengerID  passenger_name  city_name
1            Anna            Orldando
1            Anna            Tokyo
2            Paul            Miami
2            Paul            Orldando

Related

Joining together results of two select statements

I have the following schema and I am just stuck on one particular part.
CREATE TABLE Suppliers (ID INT, Name VARCHAR(128), Postcode VARCHAR(10));
CREATE TABLE Branches (ID INT, Name VARCHAR(128), Postcode VARCHAR(10));
CREATE TABLE Postcode_States (ID INT, State VARCHAR(128), Postcode VARCHAR(10));
SELECT S.Name AS SupplierName, PS.State AS SupplierState
FROM Suppliers AS S
LEFT JOIN Postcode_States PS ON S.Postcode = PS.Postcode;
SELECT B.Name AS BranchName, PS.State AS BranchCounty
FROM Branches AS B
LEFT JOIN Postcode_States PS ON B.Postcode = PS.Postcode;
I have three tables, Suppliers, Branches and Postcode_States. I have selected all the Suppliers and their states joined on postcode in one query and all the Branches and their states joined on postcode in another query. Can anyone give me any guidance on how I could combine these two queries so that I could return all the Suppliers and Branches with the same state? Thanks
Sample data and requested query output
You can join:
select s.name as supplier_name, b.name as branch_name, p.state
from suppliers s
inner join branches b on b.postcode = s.postcode
inner join postcode_states p on p.postcode = s.postcode
If you want states with more than one supplier, you can use window functions:
select sb.*
from (select s.name as supplier_name, b.name as branch_name, p.state,
count(*) over (partition by p.state) as cnt
from suppliers s join
branches b
on b.postcode = s.postcode join
postcode_states p
on p.postcode = s.postcode
) sb
where cnt >= 2
order by cnt desc, state;
This doesn't return Texas, but your question suggests that you want duplicates.
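The queries in this answer pair a supplier and a branch only when they share a postcode. If the goal is to pair them whenever their postcodes fall in the same state, one possible approach is to join Postcode_States twice, once per side. The sketch below runs that through Python's sqlite3 module; the sample rows are made up for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (ID INT, Name VARCHAR(128), Postcode VARCHAR(10));
CREATE TABLE Branches (ID INT, Name VARCHAR(128), Postcode VARCHAR(10));
CREATE TABLE Postcode_States (ID INT, State VARCHAR(128), Postcode VARCHAR(10));

-- Made-up sample rows (assumption, for illustration only)
INSERT INTO Suppliers VALUES (1, 'Acme', '73301'), (2, 'Bolt', '10001');
INSERT INTO Branches VALUES (1, 'Austin North', '73344'), (2, 'Miami Beach', '33101');
INSERT INTO Postcode_States VALUES
  (1, 'Texas', '73301'), (2, 'Texas', '73344'),
  (3, 'New York', '10001'), (4, 'Florida', '33101');
""")

# sps resolves the supplier's state; bps finds every postcode in that state,
# which is then matched to branches.
same_state_pairs = conn.execute("""
SELECT s.Name AS SupplierName, b.Name AS BranchName, sps.State
FROM Suppliers s
JOIN Postcode_States sps ON sps.Postcode = s.Postcode
JOIN Postcode_States bps ON bps.State = sps.State
JOIN Branches b ON b.Postcode = bps.Postcode
ORDER BY s.Name, b.Name
""").fetchall()

print(same_state_pairs)  # [('Acme', 'Austin North', 'Texas')]
```

Acme and Austin North have different postcodes but are still paired, because both postcodes map to Texas.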

Figure out the total number of people in an overlapping er database

I am trying to find:
the total number of doctors which aren't patients
the total number of patients which aren't doctors
the total number of people who are both patients and doctors
I can't seem to get the correct answer.
SQL:
CREATE TABLE persons (
id integer primary key,
name text
);
CREATE TABLE doctors (
id integer primary key,
type text,
FOREIGN KEY (id) REFERENCES persons(id)
);
CREATE TABLE patients (
id integer primary key,
suffering_from text,
FOREIGN KEY (id) REFERENCES persons(id)
);
INSERT INTO persons (id, name) VALUES
(1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors (id, type) VALUES
(2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients (id, suffering_from) VALUES
(1, 'flu'), (2, 'diabetes');
Select statement:
select count(d.id) as total_doctors, count(pa.id) as total_patients, count(d.id) + count(pa.id) as both_doctor_and_patient
from persons p
JOIN doctors d
ON p.id = d.id
JOIN patients pa
ON p.id = pa.id;
http://www.sqlfiddle.com/#!17/98ae9/2
One option uses left joins from persons and conditional aggregation:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor,
count(*) filter(where dr.id is null and pa.id is null) cnt_persons_not_doctor_nor_patient
from persons pe
left join doctors dr on dr.id = pe.id
left join patients pa on pa.id = pe.id
As a bonus, this gives you an opportunity to count the persons that are neither patient nor doctor. If you don't need that information, then a full join is simpler, and does not require bringing the persons table:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor
from doctors dr
full join patients pa using (id)
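As a quick check of the conditional-aggregation idea against the question's sample data, here is a sketch using Python's sqlite3 module. It swaps the FILTER clause for SUM(CASE ...), which is an equivalent, more portable spelling chosen here so the example also runs on older SQLite builds; the variable name counts is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id integer primary key, name text);
CREATE TABLE doctors (id integer primary key, type text,
                      FOREIGN KEY (id) REFERENCES persons(id));
CREATE TABLE patients (id integer primary key, suffering_from text,
                       FOREIGN KEY (id) REFERENCES persons(id));
INSERT INTO persons (id, name) VALUES
  (1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors (id, type) VALUES
  (2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients (id, suffering_from) VALUES
  (1, 'flu'), (2, 'diabetes');
""")

# One row per person; each SUM counts the persons matching one combination.
counts = conn.execute("""
SELECT
  SUM(CASE WHEN dr.id IS NOT NULL AND pa.id IS NULL THEN 1 ELSE 0 END),
  SUM(CASE WHEN pa.id IS NOT NULL AND dr.id IS NULL THEN 1 ELSE 0 END),
  SUM(CASE WHEN pa.id IS NOT NULL AND dr.id IS NOT NULL THEN 1 ELSE 0 END),
  SUM(CASE WHEN pa.id IS NULL AND dr.id IS NULL THEN 1 ELSE 0 END)
FROM persons pe
LEFT JOIN doctors dr ON dr.id = pe.id
LEFT JOIN patients pa ON pa.id = pe.id
""").fetchone()

# (doctor not patient, patient not doctor, both, neither)
print(counts)  # (2, 1, 1, 1)
```

With this data, bill and chloe are doctors only, bob is a patient only, james is both, and mark is neither.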
You can also solve this with simple joins:
--Doctors who aren't patients:
SELECT count(*) from doctors as A left join patients as B on A.id=B.id where B.id is null
--Patients who aren't doctors:
SELECT count(*) from patients as A left join doctors as B on A.id=B.id where B.id is null
--Both:
SELECT count(*) from doctors as A inner join patients as B on A.id=B.id
Here is a CTE alternative:
with doc_not_pat
as(
select count(*) as Doc_Not_Pat
from doctors d
where not exists (select 1 from patients p where p.id = d.id)
),
pat_not_doc as(
select count(*) as Pat_Not_Doc
from patients p
where not exists ( select 1 from doctors d where d.id = p.id)
),
pat_and_doc as(
select count(*) as Pat_And_Doc
from patients p
where exists (select 1 from doctors d where d.id = p.id)
)
select (select Doc_Not_Pat
from doc_not_pat dcp) as Doc_Not_Pat,
(select Pat_Not_Doc
from pat_not_doc) as Pat_Not_Doc,
(select Pat_And_Doc
from pat_and_doc) as Pat_And_Doc

Accessing derived tables from outer query

In the following problem
Filtering based on Joining Multiple Tables in SQL
I managed to determine that the poster's problem was happening because he was accessing derived tables from the outer query.
What I don't understand is why this happened.
So if you run the following
create table salesperson (
id int, name varchar(40)
)
create table customer (
id int, name varchar(40)
)
create table orders (
number int, cust_id int, salesperson_id int
)
insert into salesperson values (1, 'abe'); insert into salesperson values (2, 'bob');
insert into salesperson values (5, 'chris'); insert into salesperson values (7, 'dan');
insert into salesperson values (8, 'ken'); insert into salesperson values (11, 'joe');
insert into customer values (4, 'Samsonic'); insert into customer values (6, 'panasung');
insert into customer values (7, 'samony'); insert into customer values (9, 'orange');
insert into orders values (10, 4, 2); insert into orders values (20, 4, 8);
insert into orders values (30, 9, 1); insert into orders values (40, 7, 2);
insert into orders values (50, 6, 7); insert into orders values (60, 6, 7);
insert into orders values (70, 9, 7);
SELECT *
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE s.name NOT IN (
select s.name where c.name='Samsonic'
)
SELECT *
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE s.name NOT IN (
SELECT s.name
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE c.name = 'Samsonic'
)
The first select statement accesses the derived tables in the outer query, while the other creates its own joins and derives its own tables.
Why does the first select contain bob while the other one does not?
In your first query you only remove the rows whose customer name is Samsonic. Since Bob also has an order for samony, that row appears in the output.
In the second query the subquery returns every salesperson who has a Samsonic order, which is both Bob and Ken. NOT IN then removes all of their rows, including Bob's samony order, so Bob does not appear at all.
The difference is that your first query only removes orders which involve Samsonic, because the exclusion looks only at data in the current row, whereas by the sounds of it you want to remove any salesperson who has ever sold to Samsonic. You can see the difference in the results of the following query:
SELECT *, s.name, c.name
, case when s.name NOT IN (
select s.name where c.name='Samsonic'
) then 1 else 0 end /* Order not Samsonic */
, case when not exists (
select 1
from Orders O1
inner join Customer C1 on o1.cust_id = c1.id
where C1.Name = 'Samsonic' and o1.salesperson_id = O.salesperson_id
) then 1 else 0 end /* Salesperson never sold a Samsonic */
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
Your first query has a select with no from clause. So the where is equivalent to:
WHERE s.name NOT IN (CASE WHEN c.name = 'Samsonic' THEN s.name END)
Or more simply:
WHERE c.name <> 'Samsonic'
Bob has an order that is not with 'Samsonic', so Bob is in the result set. In other words, the logic is looking at each row individually.
The second version is looking at all names that have made an order. Bob is one of those names, so this applies to all orders made by Bob.
If you want to exclude all salespersons who have ever made an order to 'Samsonic', then I would recommend using window functions instead of complicated logic:
SELECT *
FROM (SELECT s.id as salesperson_id, s.name as salesperson_name, c.id as customer_id, c.name as customer_name, o.number,
SUM(CASE WHEN c.name = 'Samsonic' THEN 1 ELSE 0 END) OVER (PARTITION BY s.id) as num_samsonic
FROM salesperson s INNER JOIN
orders o
ON s.id = o.salesperson_id INNER JOIN
customer c
ON o.cust_id = c.id
) soc
WHERE num_samsonic = 0
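The row-level versus salesperson-level distinction described in these answers can be reproduced with the question's own data. The sketch below uses Python's sqlite3 module; the first query filters each joined row on its own, the second uses NOT EXISTS to drop every salesperson who has any Samsonic order. The variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (id int, name varchar(40));
CREATE TABLE customer (id int, name varchar(40));
CREATE TABLE orders (number int, cust_id int, salesperson_id int);
INSERT INTO salesperson VALUES (1,'abe'),(2,'bob'),(5,'chris'),(7,'dan'),(8,'ken'),(11,'joe');
INSERT INTO customer VALUES (4,'Samsonic'),(6,'panasung'),(7,'samony'),(9,'orange');
INSERT INTO orders VALUES (10,4,2),(20,4,8),(30,9,1),(40,7,2),(50,6,7),(60,6,7),(70,9,7);
""")

# Row-level filter: only the Samsonic order rows are dropped.
row_level = [r[0] for r in conn.execute("""
SELECT DISTINCT s.name
FROM salesperson s
JOIN orders o ON s.id = o.salesperson_id
JOIN customer c ON o.cust_id = c.id
WHERE c.name <> 'Samsonic'
ORDER BY s.name
""")]

# Salesperson-level filter: anyone with at least one Samsonic order is dropped.
per_salesperson = [r[0] for r in conn.execute("""
SELECT DISTINCT s.name
FROM salesperson s
JOIN orders o ON s.id = o.salesperson_id
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o1
    JOIN customer c1 ON o1.cust_id = c1.id
    WHERE c1.name = 'Samsonic' AND o1.salesperson_id = s.id
)
ORDER BY s.name
""")]

print(row_level)        # ['abe', 'bob', 'dan']
print(per_salesperson)  # ['abe', 'dan']
```

Bob survives the row-level filter through his samony order, but not the salesperson-level one.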

PostgreSQL Select Join Not in List

The project is using Postgres 9.3
I have tables (that I have simplified) as follows:
t_person (30 million records)
- id
- first_name
- last_name
- gender
t_city (70,000 records)
- id
- name
- country_id
t_country (20 records)
- id
- name
t_last_city_visited (over 200 million records)
- person_id
- city_id
- country_id
- There is a unique constraint on person_id, country_id to
ensure that each person only has one last city per country
What I need to do are variations on the following:
Get the ids of Person who are female who have visited country 'UK'
but have never visited country 'USA'
I have tried the following, but it is too slow.
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'UK')
)
where gender = 'female'
except
(
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'USA')
)
)
I would really appreciate any help.
Hint: What you want to do here is to find the females for whom there EXISTS a visit to the UK, but where NOT EXISTS a visit to the US.
Something like:
select ...
from t_person
where ...
and exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'UK')
and not exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'USA')
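For illustration, here is one way the EXISTS / NOT EXISTS shape could look once filled in, run on a tiny made-up data set with Python's sqlite3 module. The correlation on person_id and the join on country_id are assumptions about what the elided parts stand for, and the sample rows are invented. It says nothing about performance on the real table sizes; on data of that scale an index leading with person_id on t_last_city_visited would likely matter (the unique constraint on person_id, country_id may already provide one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_person (id INTEGER PRIMARY KEY, first_name TEXT, gender TEXT);
CREATE TABLE t_country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_last_city_visited (
    person_id INTEGER, city_id INTEGER, country_id INTEGER,
    UNIQUE (person_id, country_id)
);
-- Made-up sample rows (assumption, for illustration only)
INSERT INTO t_person VALUES
  (1, 'Ann', 'female'), (2, 'Bea', 'female'), (3, 'Carl', 'male'), (4, 'Dee', 'female');
INSERT INTO t_country VALUES (1, 'UK'), (2, 'USA');
INSERT INTO t_last_city_visited VALUES
  (1, 100, 1), (2, 100, 1), (2, 200, 2), (3, 100, 1), (4, 200, 2);
""")

# Females with a UK visit and no USA visit; both subqueries correlate on p.id.
uk_not_usa = [r[0] for r in conn.execute("""
SELECT p.id
FROM t_person p
WHERE p.gender = 'female'
  AND EXISTS (
      SELECT 1
      FROM t_last_city_visited v
      JOIN t_country c ON c.id = v.country_id
      WHERE v.person_id = p.id AND c.name = 'UK')
  AND NOT EXISTS (
      SELECT 1
      FROM t_last_city_visited v
      JOIN t_country c ON c.id = v.country_id
      WHERE v.person_id = p.id AND c.name = 'USA')
ORDER BY p.id
""")]

print(uk_not_usa)  # [1]
```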
Another approach, to find the people who have visited the UK and not the US, which you can then join to the people to filter by gender:
select person_id
from t_last_city_visited join
t_country on t_last_city_visited.country_id = t_country.id
where t_country.name in ('USA','UK')
group by person_id
having max(t_country.name) = 'UK'
Could you please run analyze and execute this query?
-- females who visited UK
with uk_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and p.gender = 'female'
where country_id = (select id from t_country where name = 'UK')
),
-- females who visited USA
us_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and p.gender = 'female'
where country_id = (select id from t_country where name = 'USA')
)
-- females who visited UK but not US
select uk.person_id
from uk_person uk
left join us_person us on uk.person_id = us.person_id
where us.person_id is null
This is one of the many ways this query can be formed. You might have to run them to find out which one works best and indexing tweaks you may need to make to have them run faster.
This is the way I would approach it; you can later replace the inner queries with a WITH alias, as @zedfoxus said:
select
v.id
from
(SELECT
p.id id
FROM
t_person p JOIN t_last_city_visited lcv
ON(lcv.person_id = p.id)
JOIN t_country c
ON(lcv.country_id = c.id and c.name = 'UK')
WHERE
p.gender = 'female') v LEFT JOIN
(SELECT
lcv2.person_id id
FROM
t_last_city_visited lcv2
JOIN t_country c2
ON(lcv2.country_id = c2.id and c2.name = 'USA')
) nv
ON(v.id = nv.id)
WHERE
nv.id IS NULL

Getting first line of a LEFT OUTER JOIN

I have 3 tables:
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS FROM ADDRESSES
WHERE ROWNUM <2
ORDER BY UPDATED_DATE DESC)c
ON a.ID = c.ID
An ID can have only one name but can have multiple addresses, and I only want the latest one. This query returns the address as NULL even when an address exists. I guess that is because it fetches only the first address from the whole table and then tries to LEFT JOIN it on the ID, which it cannot find. What is the correct way of writing this query?
Try KEEP DENSE_RANK
Data source:
CREATE TABLE person
(person_id int primary key, firstname varchar2(4), lastname varchar2(9))
/
INSERT ALL
INTO person (person_id, firstname, lastname)
VALUES (1, 'john', 'lennon')
INTO person (person_id, firstname, lastname)
VALUES (2, 'paul', 'mccartney')
SELECT * FROM dual;
CREATE TABLE address
(person_id int, address_id int primary key, city varchar2(8))
/
INSERT ALL
INTO address (person_id, address_id, city)
VALUES (1, 1, 'new york')
INTO address (person_id, address_id, city)
VALUES (1, 2, 'england')
INTO address (person_id, address_id, city)
VALUES (1, 3, 'japan')
INTO address (person_id, address_id, city)
VALUES (2, 4, 'london')
SELECT * FROM dual;
Query:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select person_id,
min(city) -- can change this to max(city). will work regardless of min/max
-- important you do this to get the recent: keep(dense_rank last)
keep(dense_rank last order by address_id)
as recent_city
from address
group by person_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/2
Not every database has an equivalent of Oracle's KEEP DENSE_RANK; you can use a plain window function instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city, x.pick_one_only
from person p
left join (
select
person_id,
row_number() over(partition by person_id order by address_id desc) as pick_one_only,
city as recent_city
from address
) x on x.person_id = p.person_id and x.pick_one_only = 1
Live test: http://www.sqlfiddle.com/#!4/7b1c9/48
Or use tuple testing, which should work on databases that don't support window functions:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
person_id,city as recent_city
from address
where (person_id,address_id) in
(select person_id, max(address_id)
from address
group by person_id)
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/21
Not all databases support tuple testing like in the preceding code, though. You can use JOIN instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
address.person_id,address.city as recent_city
from address
join
(
select person_id, max(address_id) as recent_id
from address
group by person_id
) r
ON address.person_id = r.person_id
AND address.address_id = r.recent_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/24
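Since the JOIN variant avoids both window functions and tuple comparison, it is easy to try outside Oracle. Below is a sketch of it using Python's sqlite3 module with the answer's sample data; the column types are simplified to TEXT, and the third person (george, who has no address) is an added assumption to show that the LEFT JOIN keeps people without any address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE address (person_id INTEGER, address_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO person VALUES (1, 'john', 'lennon'), (2, 'paul', 'mccartney');
-- Added for illustration only: a person with no address at all
INSERT INTO person VALUES (3, 'george', 'harrison');
INSERT INTO address VALUES
  (1, 1, 'new york'), (1, 2, 'england'), (1, 3, 'japan'), (2, 4, 'london');
""")

# r picks the highest address_id per person; joining back to address
# keeps exactly that one row before the outer LEFT JOIN.
latest = conn.execute("""
SELECT p.firstname, x.recent_city
FROM person p
LEFT JOIN (
    SELECT a.person_id, a.city AS recent_city
    FROM address a
    JOIN (
        SELECT person_id, MAX(address_id) AS recent_id
        FROM address
        GROUP BY person_id
    ) r ON a.person_id = r.person_id AND a.address_id = r.recent_id
) x ON x.person_id = p.person_id
ORDER BY p.person_id
""").fetchall()

print(latest)  # [('john', 'japan'), ('paul', 'london'), ('george', None)]
```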
You can use the analytic function RANK
(SELECT DISTINCT ID
FROM IDS) a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES) b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS, ID,
rank() over (partition by id
order by updated_date desc) rnk
FROM ADDRESSES) c
ON ( a.ID = c.ID
and c.rnk = 1)
Without having access to any database at the moment, you should be able to do
(SELECT DISTINCT ID
FROM IDS) a LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b ON a.ID = b.ID LEFT OUTER JOIN
(SELECT TOP 1 ADDRESS
FROM ADDRESSES
ORDER BY UPDATED_DATE DESC) c ON a.ID = c.ID
As you can see, the "TOP 1" in the address subquery returns only the first row of the result set.
Also, are you sure that a.ID and c.ID are the same?
I would imagine you need something like .... c ON a.ID = c.AddressID
If not, I'm not entirely sure how you link multiple addresses to a single ID.
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS, ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY UPDATED_DATE DESC) RN
FROM ADDRESSES
)c
ON a.ID = c.ID
AND c.RN=1