PostgreSQL Select Join Not in List - sql

The project is using Postgres 9.3
I have tables (that I have simplified) as follows:
t_person (30 million records)
- id
- first_name
- last_name
- gender
t_city (70,000 records)
- id
- name
- country_id
t_country (20 records)
- id
- name
t_last_city_visited (over 200 million records)
- person_id
- city_id
- country_id
- There is a unique constraint on person_id, country_id to
ensure that each person only has one last city per country
What I need to do are variations on the following:
Get the ids of persons who are female and have visited country 'UK'
but have never visited country 'USA'
I have tried the following, but it is too slow.
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'UK')
)
where gender = 'female'
except
(
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'USA')
)
)
I would really appreciate any help.

Hint: What you want to do here is to find the females for whom there EXISTS a visit to the UK, but where NOT EXISTS a visit to the US.
Something like:
select ...
from t_person
where ...
and exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'UK')
and not exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'USA')
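To make the hint concrete, here is a minimal sketch that fills in the elided parts one possible way and runs it against a tiny in-memory SQLite database from Python. The sample rows, the aliases p, v and c, and the use of SQLite instead of Postgres are assumptions for illustration only; the table and column names come from the question.

```python
import sqlite3

# Hypothetical miniature of the question's schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_person (id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE t_last_city_visited (
    person_id INTEGER, city_id INTEGER, country_id INTEGER,
    UNIQUE (person_id, country_id));
INSERT INTO t_country VALUES (1, 'UK'), (2, 'USA');
INSERT INTO t_person VALUES (1, 'female'), (2, 'female'), (3, 'male'), (4, 'female');
-- person 1: UK only; person 2: UK and USA; person 3: UK only (male); person 4: no visits
INSERT INTO t_last_city_visited VALUES (1, 10, 1), (2, 10, 1), (2, 20, 2), (3, 10, 1);
""")

query = """
SELECT p.id
FROM t_person p
WHERE p.gender = 'female'
  AND EXISTS (SELECT NULL
              FROM t_last_city_visited v
              JOIN t_country c ON c.id = v.country_id
              WHERE v.person_id = p.id AND c.name = 'UK')
  AND NOT EXISTS (SELECT NULL
                  FROM t_last_city_visited v
                  JOIN t_country c ON c.id = v.country_id
                  WHERE v.person_id = p.id AND c.name = 'USA')
ORDER BY p.id
"""
uk_not_usa = [row[0] for row in conn.execute(query)]
print(uk_not_usa)
```

Only person 1 qualifies in this sample: she is female, has a UK row and has no USA row. On the real tables, how fast this runs would depend on the available indexes, which this sketch does not model.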
Another approach, to find the people who have visited the UK and not the US, which you can then join to the people to filter by gender:
select person_id
from t_last_city_visited join
t_country on t_last_city_visited.country_id = t_country.id
where t_country.name in ('USA','UK')
group by person_id
having max(t_country.name) = 'UK'
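The max() trick relies on 'UK' sorting before the other country name. A rough sketch of the same grouping with an explicit conditional count, which does not depend on string ordering, is below. It runs on an in-memory SQLite database from Python; the sample rows and the aliases v and c are assumptions, and the country names follow the question ('UK' and 'USA').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_last_city_visited (person_id INTEGER, city_id INTEGER, country_id INTEGER);
INSERT INTO t_country VALUES (1, 'UK'), (2, 'USA');
-- person 1: UK only; person 2: UK and USA; person 3: USA only
INSERT INTO t_last_city_visited VALUES (1, 10, 1), (2, 10, 1), (2, 20, 2), (3, 20, 2);
""")

base = """
SELECT v.person_id
FROM t_last_city_visited v
JOIN t_country c ON c.id = v.country_id
WHERE c.name IN ('UK', 'USA')
GROUP BY v.person_id
HAVING {condition}
ORDER BY v.person_id
"""
# Variant from the answer: works because 'UK' sorts before 'USA'.
max_trick = base.format(condition="MAX(c.name) = 'UK'")
# Variant with an explicit count of USA rows per person.
explicit = base.format(
    condition="SUM(CASE WHEN c.name = 'USA' THEN 1 ELSE 0 END) = 0")

by_max = [r[0] for r in conn.execute(max_trick)]
by_count = [r[0] for r in conn.execute(explicit)]
print(by_max, by_count)
```

Both variants return only person 1 here. The explicit count is a little longer, but it keeps working if the country names are changed to ones where the alphabetical order is reversed.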

Could you please run analyze and execute this query?
-- females who visited UK
with uk_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and 'female' = p.gender
where country_id = (select id from t_country where name = 'UK')
),
-- females who visited USA
us_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and 'female' = p.gender
where country_id = (select id from t_country where name = 'USA')
)
-- females who visited UK but not USA
select uk.person_id
from uk_person uk
left join us_person us on uk.person_id = us.person_id
where us.person_id is null
This is one of many ways to write this query. You may have to run each variant to find out which one works best, and which indexing tweaks are needed to make it run faster.

This is the way I would approach it; you can later replace the inner queries with WITH aliases, as @zedfoxus said.
select
v.id
from
(SELECT
p.id id
FROM
t_person p JOIN t_last_city_visited lcv
ON(lcv.person_id = p.id)
JOIN t_country c
ON(lcv.country_id = c.id and c.name = 'UK')
WHERE
p.gender = 'female') v LEFT JOIN
(SELECT
lcv2.person_id id
FROM
t_last_city_visited lcv2
JOIN t_country c2
ON(lcv2.country_id = c2.id and c2.name = 'USA')) nv
ON(v.id = nv.id)
WHERE
nv.id IS NULL

Related

How to select passengers that never flew to a city

I will send the Database Description in an Image.
I tried this Select but I'm afraid that this isn't right
SELECT t.type , a.ICAOId , a.name , ci.id , c.ISOAlpha2ID , p.docReference , ti.docReference , ti.number , p.name , p.surname
FROM dbo.AirportType t
INNER JOIN dbo.Airport a ON t.type = a.type
INNER JOIN dbo.City ci ON a.city = ci.id
INNER JOIN dbo.Country c ON ci.ISOalpha2Id = c.ISOalpha2Id
INNER JOIN dbo.Passenger p ON c.ISOalpha2Id = p.nationality
INNER JOIN dbo.Ticket ti ON p.docReference = ti.docReference
WHERE NOT ci.id = 'Tokyo'
Can you please help to get this right?
You could make a list of the passengers that HAVE flown to the city, then use that as a subquery to select the ones not in the list.
Here is an example of how it could be done.
Subquery:
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
Now put that into another query that selects the passengers not in it:
SELECT * FROM passengers
WHERE id NOT IN (
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
)
Notice I didn't use your attribute names; you will have to change those.
This is a simplified version of what you will have to do, because the city is not directly in your tickets table. You will also have to join tickets with coupons and flights to get the people that have flown to a city. From there it is the same.
Overall I believe this should help you get what you have to do.
A minimal reproducible example is not provided.
Here is a conceptual example, that could be easily extended to a real scenario.
SQL
-- DDL and sample data population, start
DECLARE @passenger TABLE (passengerID INT PRIMARY KEY, passenger_name VARCHAR(20));
INSERT @passenger (passengerID, passenger_name) VALUES
(1, 'Anna'),
(2, 'Paul');
DECLARE @city TABLE (cityID INT PRIMARY KEY, city_name VARCHAR(20));
INSERT @city (cityID, city_name) VALUES
(1, 'Miami'),
(2, 'Orlando'),
(3, 'Tokyo');
-- Already visited cities
DECLARE @passenger_city TABLE (passengerID INT, cityID INT);
INSERT @passenger_city (passengerID, cityID) VALUES
(1, 1),
(2, 3);
-- DDL and sample data population, end
SELECT * FROM @passenger;
SELECT * FROM @city;
SELECT * FROM @passenger_city;
;WITH rs AS
(
SELECT c.passengerID, b.cityID
FROM @passenger AS c
CROSS JOIN @city AS b -- get all possible combinations of passengers and cities
EXCEPT -- filter out already visited cities
SELECT passengerID, cityID FROM @passenger_city
)
SELECT c.*, b.city_name
FROM rs
INNER JOIN @passenger AS c ON c.passengerID = rs.passengerID
INNER JOIN @city AS b ON b.cityID = rs.cityID
ORDER BY c.passenger_name, b.city_name;
Output
passengerID | passenger_name | city_name
1 | Anna | Orlando
1 | Anna | Tokyo
2 | Paul | Miami
2 | Paul | Orlando

What is the most efficient way of selecting data from relational database?

I just started working with databases and
I have this data sample from PostgreSQL tutorial
https://www.postgresqltutorial.com/postgresql-sample-database/
Its diagram looks like this:
I want to find all film categories rented in, for example, Canada. Is there a way of doing it without nesting SELECT within SELECT like this:
SELECT * FROM category WHERE category_id IN (
SELECT category_id FROM film_category WHERE film_id IN (
SELECT film_id FROM film WHERE film_id IN (
SELECT film_id FROM inventory WHERE inventory_id IN (
SELECT inventory_id FROM rental WHERE staff_id IN (
SELECT staff_id FROM staff WHERE store_id IN (
SELECT store_id FROM store WHERE address_id IN (
SELECT address_id FROM address WHERE city_id IN (
SELECT city_id FROM city WHERE country_id IN (
SELECT country_id FROM country WHERE country IN ('Canada')
)
)
)
)
)
)
)
)
)
I'm sure there must be something that I'm missing.
The proper way is to use joins instead of all these nested subqueries:
select distinct c.category_id, c.name
from category c
inner join film_category fc on fc.category_id = c.category_id
inner join inventory i on i.film_id = fc.film_id
inner join rental r on r.inventory_id = i.inventory_id
inner join staff s on s.staff_id = r.staff_id
inner join store sr on sr.store_id = s.store_id
inner join address a on a.address_id = sr.address_id
inner join city ct on ct.city_id = a.city_id
inner join country cr on cr.country_id = ct.country_id
where cr.country = 'Canada'
For your requirement you must join 9 tables (1 less than your code because the table film is not really needed as the column film_id can link the tables film_category and inventory directly).
Notice the aliases for each table which shortens the code and makes it more readable and the ON clauses which are used to link each pair of tables.
Also the keyword DISTINCT is used so you don't get duplicates in the results because all these joins will return many rows for each category.
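To illustrate why DISTINCT matters here, below is a small sketch on a shortened, hypothetical version of the join chain (only category, film_category and inventory, with made-up rows), run on an in-memory SQLite database from Python. It is not the full sample database, just enough to show the row multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy');
-- films 100 and 101 are Action, film 102 is Comedy
INSERT INTO film_category VALUES (100, 1), (101, 1), (102, 2);
-- two copies of film 100, one copy of film 101, no copy of film 102
INSERT INTO inventory VALUES (1, 100), (2, 100), (3, 101);
""")

joins = """
FROM category c
INNER JOIN film_category fc ON fc.category_id = c.category_id
INNER JOIN inventory i ON i.film_id = fc.film_id
ORDER BY c.category_id
"""
# Without DISTINCT: one row per matching inventory copy.
plain = conn.execute("SELECT c.category_id, c.name" + joins).fetchall()
# With DISTINCT: one row per category that has at least one match.
unique = conn.execute("SELECT DISTINCT c.category_id, c.name" + joins).fetchall()
print(len(plain), unique)
```

Without DISTINCT the Action category comes back three times, once per inventory copy. With DISTINCT it appears once, and Comedy is absent either way because none of its films is in the inventory.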

SQL Query, Average climbed and pair that has climbed the most peaks

My tables look like this:
PEAK (NAME, ELEV, DIFF, MAP, REGION)
CLIMBER (NAME, SEX)
PARTICIPATED (TRIP_ID, NAME)
CLIMBED (TRIP_ID, PEAK, WHEN)
PEAK gives info about the mountain peaks that the user is interested in. The table lists the name of each peak, its elevation (in ft), its difficulty level (on a scale of 1-5), the map that it is located on, and the region of the Sierra Nevada that it is located in.
CLIMBER lists the members of the club, and gives their name and gender.
PARTICIPATED gives the set of climbers who participated in each of the various climbing trips. The number of participants in each trip varies.
CLIMBED tells which peaks were climbed on each climbing trip, along with the date that each peak was climbed.
I need help with writing a query for the following:
Compute the average number of peaks scaled by the men in the club and by the women in the club.
Which pair of climbers have climbed the most peaks together, and how many peaks is that?
Who has climbed more than 20 peaks in some 60 day span?
For the first query, so far I have found a way to compute the total number of peaks climbed by either gender, for men:
SELECT SUM(C)
FROM
(SELECT CD.PEAK, COUNT(*) C
FROM CLIMBED CD
WHERE CD.TRIP_ID IN
(SELECT TRIP_ID
FROM PARTICIPATED PA
WHERE PA.NAME IN
(SELECT NAME
FROM CLIMBER
WHERE SEX = 'M'))
GROUP BY CD.PEAK) T;
For the second query, I have the following which I'm fairly sure isn't correct:
SELECT TEMP2.TRIP_ID, COUNT (*)
FROM
(SELECT P1.NAME, P2.NAME, P1.TRIP_ID
FROM PARTICIPATED P1, PARTICIPATED P2
WHERE P1.NAME <> P2.NAME AND
P1.TRIP_ID = P2.TRIP_ID) TEMP1,
(SELECT *
FROM CLIMBED) TEMP2
WHERE TEMP2.TRIP_ID = TEMP1.TRIP_ID
GROUP BY TEMP2.TRIP_ID;
Question 1:
For the total number of trips per climber:
SELECT t1.sex, AVG(t1.trip_count) AS average
FROM
(SELECT c.sex, COUNT(p.trip_id) AS trip_count
FROM climber c LEFT JOIN participated p ON c.name = p.name GROUP BY c.name, c.sex) t1
GROUP BY t1.sex
For each time a UNIQUE peak was climbed:
SELECT t1.sex, AVG(t1.peak_count) AS average
FROM
(SELECT c.sex, COUNT(DISTINCT cl.peak) AS peak_count
FROM climber c LEFT JOIN participated p ON c.name = p.name
LEFT JOIN climbed cl ON cl.trip_id = p.trip_id GROUP BY c.name, c.sex) t1
GROUP BY t1.sex
Question 2:
SELECT P1.Name, P2.Name, COUNT(DISTINCT p1.trip_id) AS trips
FROM participated p1 INNER JOIN participated p2 ON p1.trip_id = p2.trip_id
WHERE p1.name > p2.name -- > instead of <> gets only one of the pairs
GROUP BY P1.Name, P2.Name
HAVING COUNT(DISTINCT p1.trip_id) > 0
ORDER BY trips DESC
Question 3:
SELECT p.name, cl.when AS span_begin_date, DATEADD(day, 60, cl.when) AS span_end_date, COUNT(DISTINCT c2.peak) AS peaks
FROM participated p
JOIN climbed cl ON cl.trip_id = p.trip_id
JOIN participated p2 ON p2.name = p.name
JOIN climbed c2 ON c2.trip_id = p2.trip_id AND c2.when BETWEEN cl.when AND DATEADD(day, 60, cl.when)
GROUP BY p.name, cl.when, DATEADD(day, 60, cl.when)
HAVING COUNT(DISTINCT c2.peak) > 20
ORDER BY peaks
Here is my solution. If you provide sample data, this can be verified. For question 3, the meaning of "some 60 day span" is not clear. Can you please clarify?
Question 1
select x.sex, avg(x.peaks_escalated) as peaks
from (
select u.name, u.sex, count(distinct c.peak) as peaks_escalated
from t1_climbed c
inner join t1_participated p on c.trip_id = p.trip_id
inner join t1_climber u on p.name = u.name
group by u.name, u.sex ) x
group by x.sex
Question 2
with list1 as (
select u.name as member, c.trip_id, c.peak, c.when
from t1_climbed c
inner join t1_participated p on c.trip_id = p.trip_id
inner join t1_climber u on p.name = u.name
)
select a.member as m1, b.member as m2, count(distinct a.peak) as total
from list1 a inner join list1 b
on a.trip_id = b.trip_id
and a.peak = b.peak
and a.when = b.when
and a.member <> b.member
group by a.member, b.member
Oracle Setup:
CREATE TABLE PEAK (
NAME VARCHAR2(50) PRIMARY KEY,
ELEV INT,
DIFF INT,
MAP VARCHAR2(10),
REGION VARCHAR2(10)
);
CREATE TABLE CLIMBER (
NAME VARCHAR2(50) PRIMARY KEY,
SEX CHAR(1) CHECK ( SEX IN ( 'M', 'F' ) )
);
-- Created this to have a primary key
CREATE TABLE TRIPS (
TRIP_ID INT PRIMARY KEY
);
CREATE TABLE PARTICIPATED (
TRIP_ID INT REFERENCES TRIPS( TRIP_ID ),
NAME VARCHAR2(50) REFERENCES CLIMBER( NAME ),
PRIMARY KEY ( TRIP_ID, NAME )
);
CREATE TABLE CLIMBED (
TRIP_ID INT REFERENCES TRIPS( TRIP_ID ),
PEAK VARCHAR2(50) REFERENCES PEAK ( NAME ),
"WHEN" DATE
);
Question 1
SELECT sex,
AVG( num_peaks ) AS avg_peaks
FROM (
SELECT c.*,
COUNT( DISTINCT l.peak ) num_peaks
FROM CLIMBED l
INNER JOIN
PARTICIPATED p
ON ( p.trip_id = l.trip_id )
RIGHT OUTER JOIN
CLIMBER c
ON ( p.name = c.name )
GROUP BY c.name, c.sex
)
GROUP BY sex;
You need to OUTER JOIN climbers as they could have not participated in any trips (so having climbed 0 peaks) and this needs to be taken into account in the average. It is also possible that a person could have climbed a peak multiple times - when you want the number of peaks climbed by a person you want to exclude multiple climbs on the same peak and will need to use COUNT( DISTINCT ... ) (or another similar technique) - if you want to count multiple climbs then remove the DISTINCT keyword.
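The effect of the outer join and of COUNT( DISTINCT ... ) can be checked on a few made-up rows. The sketch below uses an in-memory SQLite database from Python with hypothetical climbers and trips (the WHEN column is left out because it is not needed here) and compares an outer join against an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE climber (name TEXT PRIMARY KEY, sex TEXT);
CREATE TABLE participated (trip_id INTEGER, name TEXT);
CREATE TABLE climbed (trip_id INTEGER, peak TEXT);
-- Cy has never been on a trip
INSERT INTO climber VALUES ('Ann', 'F'), ('Bob', 'M'), ('Cy', 'M');
INSERT INTO participated VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Bob');
-- Peak A is climbed on both trips, so Bob climbs it twice
INSERT INTO climbed VALUES (1, 'Peak A'), (2, 'Peak A'), (2, 'Peak B');
""")

template = """
SELECT sex, AVG(num_peaks) AS avg_peaks
FROM (
    SELECT c.name, c.sex, COUNT(DISTINCT l.peak) AS num_peaks
    FROM climber c
    {join} participated p ON p.name = c.name
    {join} climbed l ON l.trip_id = p.trip_id
    GROUP BY c.name, c.sex
)
GROUP BY sex
ORDER BY sex
"""
# LEFT JOIN keeps Cy with 0 peaks; INNER JOIN silently drops him.
with_outer = conn.execute(template.format(join="LEFT JOIN")).fetchall()
with_inner = conn.execute(template.format(join="INNER JOIN")).fetchall()
print(with_outer, with_inner)
```

With the outer join the men average (2 + 0) / 2 = 1.0 distinct peaks, because Cy counts as zero. With the inner join Cy disappears and the average for men becomes 2.0, which overstates the result.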
Question 2:
SELECT *
FROM (
SELECT name1,
name2,
COUNT( DISTINCT c.peak ) AS num_peaks_climbed
FROM (
SELECT p1.name AS name1,
p2.name AS name2,
p1.trip_id
FROM PARTICIPATED p1
INNER JOIN
PARTICIPATED p2
ON ( p1.trip_id = p2.trip_id AND p1.name < p2.name )
) p
INNER JOIN
climbed c
ON ( p.trip_id = c.trip_id )
GROUP BY name1, name2
ORDER BY num_peaks_climbed DESC
)
WHERE ROWNUM = 1;
Question 3:
SELECT *
FROM (
SELECT p.name,
COUNT( c.peak ) OVER ( PARTITION BY p.name
ORDER BY c."WHEN"
RANGE BETWEEN INTERVAL '60' DAY PRECEDING
AND CURRENT ROW
) AS num_peaks_in_60_days,
c."WHEN" AS last_date_of_range
FROM PARTICIPATED p
INNER JOIN
climbed c
ON ( p.trip_id = c.trip_id )
)
WHERE num_peaks_in_60_days > 20;

SQL query to find a record which has all matching records in another table

I have the 3 tables below and I want to write a SQL query which will list the stores present in every city (here the result should be "Walmart"):
Stores:
ID Name
1 Walmart
2 Target
3 Sears
Stores_City
ID Store_id City_ID
1 1 10
2 1 20
3 2 10
4 1 30
City
ID Name
10 NewYork
20 Boston
30 Eagan
I am unable to find a query that works. Any help is appreciated!
select s.Name
from Stores s
inner join
(
select store_id, count(distinct city_id)
from stores_city
group by store_id
having count(distinct city_id) = (select count(*) from City)
) x
on x.store_id = s.id;
You can do it by grouping on store_id and comparing each store's distinct city count against the number of rows in the City table.
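As a quick check of the counting approach, here is a sketch that loads the sample rows from the question into an in-memory SQLite database from Python and runs the same query. The lowercase column names (id, store_id, city_id) are assumed from the table headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stores_city (id INTEGER PRIMARY KEY, store_id INTEGER, city_id INTEGER);
INSERT INTO stores VALUES (1, 'Walmart'), (2, 'Target'), (3, 'Sears');
INSERT INTO city VALUES (10, 'NewYork'), (20, 'Boston'), (30, 'Eagan');
INSERT INTO stores_city VALUES (1, 1, 10), (2, 1, 20), (3, 2, 10), (4, 1, 30);
""")

query = """
SELECT s.name
FROM stores s
INNER JOIN (
    SELECT store_id
    FROM stores_city
    GROUP BY store_id
    -- keep only stores whose distinct city count equals the total city count
    HAVING COUNT(DISTINCT city_id) = (SELECT COUNT(*) FROM city)
) x ON x.store_id = s.id
ORDER BY s.name
"""
stores_everywhere = [row[0] for row in conn.execute(query)]
print(stores_everywhere)
```

Walmart is linked to all three cities, Target to one and Sears to none, so only Walmart passes the HAVING filter.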
A straight join on its own only lists stores that are present in at least one city:
Select distinct s.name from stores s inner join stores_city sc on s.id = sc.store_id
Inner join city c on
sc.city_id = c.id
Here is another way that will work:
select s.*
from stores s
where not exists (
select c.id
from city c
except
select sc.city_id
from stores_city sc
where sc.store_id = s.id
)
Try this:
SELECT
s.Name
FROM Stores s
WHERE NOT EXISTS (SELECT TOP 1
1
FROM City c
LEFT JOIN Stores_City sc
ON c.ID = sc.City_ID
AND sc.Store_id = s.ID
WHERE sc.ID IS NULL)

select tables relationship

I need help. I have 3 tables like this:
product
* id
- name
supplier
* id
- name
- active
product_supplier
* id_product
* id_supplier
The last table links products to suppliers.
What I need is a query that returns only the active suppliers that are not yet related to a specific product.
Thanks!!
Try this with subquery as below:
SELECT *
FROM supplier
WHERE active = 'Y'
AND id NOT IN (SELECT DISTINCT id_supplier
FROM product_supplier)
This is your question: "What I need is to build a query that returns me only active suppliers and still are not related to specific product."
You are looking for active suppliers that don't have a particular product.
select s.id
from supplier s left join
product_supplier ps
on ps.id_supplier = s.id
where s.active = 1
group by s.id
having sum(case when ps.id_product = XX then 1 else 0 end) = 0;
You can also do this with not exists:
select s.id
from supplier s
where s.active = 1 and
not exists (select 1
from product_supplier ps
where ps.id_supplier = s.id and
ps.id_product = XX
)
And, you can do this with a left join:
select s.*
from supplier s left join
product_supplier ps
on ps.id_supplier = s.id and ps.id_product = XX
where s.active = 1 and ps.id_supplier is null;
This seems like the most natural way to express this in SQL.
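For anyone who wants to compare the variants, here is a small sketch that runs the not exists and left join forms on made-up rows in an in-memory SQLite database from Python. The sample suppliers, the product id 5 standing in for XX, and the integer active flag are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
CREATE TABLE product_supplier (id_product INTEGER, id_supplier INTEGER);
-- supplier 3 is inactive, supplier 4 is active but supplies nothing yet
INSERT INTO supplier VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 0), (4, 'D', 1);
INSERT INTO product_supplier VALUES (5, 1), (6, 1), (6, 2), (6, 3);
""")

product_id = 5  # hypothetical stand-in for XX

not_exists_sql = """
SELECT s.id
FROM supplier s
WHERE s.active = 1
  AND NOT EXISTS (SELECT 1
                  FROM product_supplier ps
                  WHERE ps.id_supplier = s.id AND ps.id_product = ?)
ORDER BY s.id
"""
left_join_sql = """
SELECT s.id
FROM supplier s
LEFT JOIN product_supplier ps
  ON ps.id_supplier = s.id AND ps.id_product = ?
WHERE s.active = 1 AND ps.id_supplier IS NULL
ORDER BY s.id
"""
via_not_exists = [r[0] for r in conn.execute(not_exists_sql, (product_id,))]
via_left_join = [r[0] for r in conn.execute(left_join_sql, (product_id,))]
print(via_not_exists, via_left_join)
```

Both forms return suppliers 2 and 4: supplier 1 already has product 5, and supplier 3 is inactive. Note that supplier 4, who has no rows in product_supplier at all, is still included.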
Thanks very much... It works with this.
SELECT *
FROM supplier s
WHERE NOT
EXISTS (
SELECT *
FROM product_supplier ps
WHERE s.id = ps.id_supplier
AND ps.id_product =5
)
AND s.active =1