SQL Query with multiple Inner Joins returns wrong Count of Values - sql

I'm new to SQL and I'm practicing on a database I created in Oracle about Airbnb listings in Amsterdam and Berlin. I'm trying to join the tables HOSTS (host_id, host_name), LISTINGS (all listings in the two cities, with the attributes listings_id, listings_name, Price, host_id...), Neighbourhoods (Neighbourhood_Group, City and Neighbourhood) and REVIEWS (review_id, listings_id as a foreign key, reviewer_id, reviewer_name and comment).
Now I want to write a query that returns the average price, lowest price, highest price, the city (Berlin or Amsterdam), the neighbourhood (Centrum, Alexanderplatz...), the number of listings and the number of reviews, all grouped by neighbourhood, with a WHERE clause that limits the results to listings hosted by hosts who have fewer than 3 listings in total.
If I run the query without the reviews table and only order it by amount_listings, I get the correct number of listings per neighbourhood in the column "amount_listings".
SELECT avg(l.price) as Mean_Price,
n.city,
n.neighbourhood,
count (l.listings_id) as amount_listings,
min(l.price),
max(l.price)
FROM listings l
INNER JOIN neighborhood n
ON l.neighbourhood = n.neighbourhood
INNER JOIN hosts h
ON l.host_id = h.host_id
WHERE h.host_id IN (
SELECT host_id
FROM listings
GROUP BY host_id
HAVING count(host_id) < 3
)
GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC;
But if I include the number of reviews in the query, the results are incorrect: both the amount_reviews and the amount_listings columns show values that are too high.
SELECT avg(l.price) as Mean_Price, count(l.listings_id) as amount_listings,
min(l.price), max(l.price), n.city, n.neighbourhood, count(r.review_id) as amount_reviews
FROM listings l
INNER JOIN neighborhood n
ON l.neighbourhood = n.neighbourhood
INNER JOIN hosts h
ON l.host_id = h.host_id
INNER JOIN reviews r
ON l.listings_id= r.listings_id
WHERE h.host_id IN (
SELECT host_id
FROM listings
GROUP BY host_id
HAVING count(host_id) < 3
)
GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC, amount_reviews DESC;
I don't know why amount_listings and amount_reviews return such wrong results.

Aggregate before joining. You want to join the review count to each listing and then the aggregated listing information to the neighbourhood.
select
lr.mean_price,
n.city,
n.neighbourhood,
lr.amount_listings,
lr.min_price,
lr.max_price,
lr.amount_reviews
from neighborhood n
join
(
select
l.neighbourhood,
min(l.price) as min_price,
max(l.price) as max_price,
avg(l.price) as mean_price,
count(*) as amount_listings,
coalesce(sum(r.reviews_for_listing), 0) as amount_reviews
from listings l
left join
(
select
listings_id,
count(*) as reviews_for_listing
from reviews
group by listings_id
) r on r.listings_id = l.listings_id
where l.host_id in
(
select host_id
from listings
group by host_id
having count(*) < 3
)
group by l.neighbourhood
) lr on lr.neighbourhood = n.neighbourhood
order by n.city, n.neighbourhood;
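To see why aggregating first matters, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The two tables are cut down to the columns that matter and the three sample rows are made up for illustration; only the shape of the two queries follows the question and the answer above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE listings (listings_id INTEGER PRIMARY KEY, neighbourhood TEXT, price INTEGER);
CREATE TABLE reviews  (review_id INTEGER PRIMARY KEY, listings_id INTEGER);
-- made-up data: listing 1 has three reviews, listing 2 has none
INSERT INTO listings VALUES (1, 'Centrum', 100), (2, 'Centrum', 200);
INSERT INTO reviews  VALUES (10, 1), (11, 1), (12, 1);
""")

# Joining reviews row by row repeats listing 1 once per review,
# and the inner join drops listing 2 because it has no reviews.
naive = con.execute("""
    SELECT count(l.listings_id), count(r.review_id), avg(l.price)
    FROM listings l
    JOIN reviews r ON r.listings_id = l.listings_id
    GROUP BY l.neighbourhood
""").fetchone()

# Counting reviews per listing first keeps exactly one row per listing.
fixed = con.execute("""
    SELECT count(*), coalesce(sum(r.reviews_for_listing), 0), avg(l.price)
    FROM listings l
    LEFT JOIN (SELECT listings_id, count(*) AS reviews_for_listing
               FROM reviews
               GROUP BY listings_id) r
           ON r.listings_id = l.listings_id
    GROUP BY l.neighbourhood
""").fetchone()

print(naive)  # (3, 3, 100.0)
print(fixed)  # (2, 3, 150.0)
```

With the row-by-row join the neighbourhood appears to have three listings at an average price of 100, while the pre-aggregated version reports the two real listings and the correct average.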

The cause is that joining the reviews table produces one row per review, so each listing is counted once for every review it has. Since you are only interested in the number of reviews, join a pre-aggregated review count per listing instead:
SELECT avg(l.price) as Mean_Price, count(l.listings_id) as amount_listings,
min(l.price), max(l.price), n.city, n.neighbourhood, sum(r.review_count) as amount_reviews
FROM listings l
INNER JOIN neighborhood n
ON l.neighbourhood = n.neighbourhood
INNER JOIN hosts h
ON l.host_id = h.host_id
-- one row per listing, so joining it does not duplicate listings
INNER JOIN (select listings_id, count(*) as review_count from reviews group by listings_id) r
ON l.listings_id= r.listings_id
WHERE h.host_id IN (
SELECT host_id
FROM listings
GROUP BY host_id
HAVING count(host_id) < 3
)
GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC, amount_reviews DESC;

Related

Returning the entity with max number of participants

I'm using pgsql to find the cage number (cno) that holds the largest number of animals but doesn't have a bird in it.
The way I tried to do it is by creating a table that counts the number of animals in each cage (not including those with birds) and then return the ones where the count equals the max value.
select temp.cno,temp.size
from
(select cage.cno,cage.size,count(*) as q
from cage,animal
where cage.cno = animal.cno and cage.cno not in (select cno from animal where lower(atype)='bird')
group by cage.cno,cage.size) as temp
where temp.q = (select max(q) from temp)
I'm getting the following error message
ERROR: relation "temp" does not exist
LINE 7: where temp.q = (select max(q) from temp)
Any idea how to overcome this issue? Why isn't temp recognized within the last sub query?
Here are the tables
cage (cno, type, size)
animal (aid, aname, cno, atype)
You already found out that a subquery defined in the FROM is not visible inside another subquery defined in the WHERE clause.
This is easily solvable with the use of a CTE (with a proper join):
WITH temp AS (
SELECT c.cno, c.size, COUNT(*) AS q
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
WHERE c.cno NOT IN (SELECT cno FROM animal WHERE LOWER(atype) = 'bird')
GROUP BY c.cno, c.size
)
SELECT cno, size
FROM temp
WHERE q = (SELECT MAX(q) FROM temp);
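A quick way to check that a CTE can be referenced both in the main FROM and in a subquery is the sketch below. It uses Python's built-in sqlite3 module instead of PostgreSQL, the sample rows are invented, and the CTE is named cage_counts (a hypothetical name chosen here to stay clear of SQLite's own temp schema).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cage   (cno INTEGER PRIMARY KEY, type TEXT, size INTEGER);
CREATE TABLE animal (aid INTEGER PRIMARY KEY, aname TEXT, cno INTEGER, atype TEXT);
-- made-up data: cage 1 is the fullest but contains a bird
INSERT INTO cage VALUES (1, 'indoor', 10), (2, 'outdoor', 20), (3, 'indoor', 30);
INSERT INTO animal VALUES
    (1, 'Tweety', 1, 'Bird'), (2, 'Tom', 1, 'cat'), (3, 'Leo', 1, 'lion'),
    (4, 'Rex', 2, 'dog'), (5, 'Max', 2, 'dog'),
    (6, 'Dolly', 3, 'sheep');
""")

rows = con.execute("""
    WITH cage_counts AS (
        SELECT c.cno, c.size, COUNT(*) AS q
        FROM cage c
        JOIN animal a ON a.cno = c.cno
        WHERE c.cno NOT IN (SELECT cno FROM animal WHERE LOWER(atype) = 'bird')
        GROUP BY c.cno, c.size
    )
    -- the CTE is visible here and inside the scalar subquery
    SELECT cno, size
    FROM cage_counts
    WHERE q = (SELECT MAX(q) FROM cage_counts)
""").fetchall()

print(rows)  # [(2, 20)]
```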
Note that the condition:
c.cno NOT IN (SELECT cno FROM animal WHERE LOWER(atype) = 'bird')
already excludes every cage that contains at least one bird, even when the cage also holds other types of animals. Its main pitfall is that NOT IN matches nothing if the subquery returns a NULL cno.
You can also apply this restriction with aggregation, which avoids the subquery.
If you want/expect only 1 cage as result:
SELECT c.cno, c.size
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
GROUP BY c.cno
HAVING MAX((LOWER(a.atype) = 'bird')::int) = 0
ORDER BY COUNT(*) DESC LIMIT 1;
If several cages can tie for the largest number of animals and you want all of them, use the RANK() window function:
WITH cte AS (
SELECT c.cno, c.size,
RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
GROUP BY c.cno
HAVING MAX((LOWER(a.atype) = 'bird')::int) = 0
)
SELECT cno, size FROM cte WHERE rnk = 1;
Note that since cno is the PRIMARY KEY of cage you only need to group by cno.
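The HAVING-based filter can be tried out with the sketch below, which again uses Python's sqlite3 module and invented rows. SQLite has no ::int cast, so the PostgreSQL expression is swapped for a portable CASE expression that does the same job: it yields 1 for a bird and 0 otherwise, so a maximum of 0 means the cage holds no bird.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cage   (cno INTEGER PRIMARY KEY, type TEXT, size INTEGER);
CREATE TABLE animal (aid INTEGER PRIMARY KEY, aname TEXT, cno INTEGER, atype TEXT);
-- made-up data: cage 1 mixes a bird with other animals
INSERT INTO cage VALUES (1, 'indoor', 10), (2, 'outdoor', 20), (3, 'indoor', 30);
INSERT INTO animal VALUES
    (1, 'Tweety', 1, 'Bird'), (2, 'Tom', 1, 'cat'), (3, 'Leo', 1, 'lion'),
    (4, 'Rex', 2, 'dog'), (5, 'Max', 2, 'dog'),
    (6, 'Dolly', 3, 'sheep');
""")

rows = con.execute("""
    SELECT c.cno, c.size, COUNT(*) AS animals
    FROM cage c
    JOIN animal a ON a.cno = c.cno
    GROUP BY c.cno, c.size
    -- keep only cages where no animal is a bird
    HAVING MAX(CASE WHEN LOWER(a.atype) = 'bird' THEN 1 ELSE 0 END) = 0
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchall()

print(rows)  # [(2, 20, 2)]
```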
I solved it by ordering the results descending and using limit 1 to show the first row (which is the max)

unable to count the occurrence of a guest in 2 different restaurants and display guest name

The question I'm trying to answer is: find the names of guests who visited more than 2 different restaurants on 15-JUNE-20.
There is a:
Guest table with GID,Gname
Visit table with VID, GID, RESTID, VDATE
Restaurant table with RESTID, RNAME
Whenever I tried introducing a GROUP BY, I got an error.
SELECT GuestN.GID, GuestN.Gname
FROM GuestN
WHERE GuestN.GID IN (
SELECT VisitN.GID
FROM VisitN
WHERE VisitN.Vdate = '15-JUN-20' AND VisitN.restID IN (
SELECT RestaurantN.Restid
FROM RestaurantN having count(*)>2));
The table RestaurantN is not needed, since VisitN already contains restID and you are not interested in the restaurants' names, only in how many different ones were visited.
Join GuestN to VisitN, aggregate and set the condition in the HAVING clause:
SELECT g.GID, g.Gname
FROM GuestN g INNER JOIN VisitN v
ON v.GID = g.GID
WHERE v.Vdate = '15-JUN-20'
GROUP BY g.GID, g.Gname
HAVING COUNT(DISTINCT v.restID) > 2
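The difference that DISTINCT makes can be checked with a small sketch in Python's sqlite3 module. The rows are made up, and the date is stored as plain text here, which is an assumption for the sake of the example (in Oracle a real DATE column and an explicit date literal would be safer).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE GuestN (GID INTEGER PRIMARY KEY, Gname TEXT);
CREATE TABLE VisitN (VID INTEGER PRIMARY KEY, GID INTEGER, RESTID INTEGER, VDATE TEXT);
INSERT INTO GuestN VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Ann: three different restaurants; Bob: same restaurant three times;
-- Cy: only two restaurants on the day in question
INSERT INTO VisitN VALUES
    (1, 1, 101, '15-JUN-20'), (2, 1, 102, '15-JUN-20'), (3, 1, 103, '15-JUN-20'),
    (4, 2, 101, '15-JUN-20'), (5, 2, 101, '15-JUN-20'), (6, 2, 101, '15-JUN-20'),
    (7, 3, 101, '15-JUN-20'), (8, 3, 102, '15-JUN-20'), (9, 3, 103, '16-JUN-20');
""")

query = """
    SELECT g.GID, g.Gname
    FROM GuestN g
    JOIN VisitN v ON v.GID = g.GID
    WHERE v.VDATE = '15-JUN-20'
    GROUP BY g.GID, g.Gname
    HAVING {condition}
    ORDER BY g.GID
"""

distinct_rows = con.execute(query.format(condition="COUNT(DISTINCT v.RESTID) > 2")).fetchall()
all_rows = con.execute(query.format(condition="COUNT(*) > 2")).fetchall()

print(distinct_rows)  # [(1, 'Ann')]
print(all_rows)       # [(1, 'Ann'), (2, 'Bob')]
```

Counting visits without DISTINCT also returns Bob, who went to the same restaurant three times.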

SELECT * FROM table in addition of aggregation function

Short context:
I would like to show a list of all companies except if they are in the sector 'defense' or 'government' and their individual total spent on training classes. Only the companies that have this total amount above 1000 must be shown.
So I wrote the following query:
SELECT NAME, ADDRESS, ZIP_CODE, CITY, SUM(FEE-PROMOTION) AS "Total spent on training at REX"
FROM COMPANY INNER JOIN PERSON ON (COMPANY_NUMBER = EMPLOYER) INNER JOIN ENROLLMENT ON (PERSON_ID = STUDENT)
WHERE SECTOR_CODE NOT IN (SELECT CODE
FROM SECTOR
WHERE DESCRIPTION = 'Government' OR DESCRIPTION = 'Defense')
GROUP BY NAME, ADDRESS, ZIP_CODE, CITY
HAVING SUM(FEE-PROMOTION) > 1000
ORDER BY SUM(FEE-PROMOTION) DESC
Now what I actually need is, instead of defining every single column in the COMPANY table, I would like to show ALL columns of the COMPANY table using *.
SELECT * (all tables from COMPANY here), SUM(FEE-PROMOTION) AS "Total spent on training at REX"
FROM COMPANY INNER JOIN PERSON ON (COMPANY_NUMBER = EMPLOYER) INNER JOIN ENROLLMENT ON (PERSON_ID = STUDENT)
WHERE SECTOR_CODE NOT IN (SELECT CODE
FROM SECTOR
WHERE DESCRIPTION = 'Government' OR DESCRIPTION = 'Defense')
GROUP BY * (How to fix it here?)
HAVING SUM(FEE-PROMOTION) > 1000
ORDER BY SUM(FEE-PROMOTION) DESC
I could define every single column from COMPANY in the SELECT and that solution will do the job (as in the first example), but how can I make the query shorter using "SELECT * from the table COMPANY"?
The key idea is to summarize in the subquery to get the total spend for the company. This allows you to remove the aggregation from the outer query:
select c.*, pe.total_spend
from company c join
sector s
on c.sector_code = s.code join
(select p.employer, sum(e.fee - e.promotion) as total_spend
from person p join
enrollment e
on p.person_id = e.student
group by p.employer
) pe
on pe.employer = c.company_number
where s.description not in ('Government', 'Defense') and
pe.total_spend > 1000
order by pe.total_spend desc
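Here is a minimal sketch of that pattern in Python's sqlite3 module. The tables are reduced to a few columns, the rows are invented, and FEE and PROMOTION are assumed to live in ENROLLMENT (the question does not say which table holds them).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sector     (code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE company    (company_number INTEGER PRIMARY KEY, name TEXT, sector_code TEXT);
CREATE TABLE person     (person_id INTEGER PRIMARY KEY, employer INTEGER);
CREATE TABLE enrollment (student INTEGER, fee INTEGER, promotion INTEGER);
-- made-up data
INSERT INTO sector  VALUES ('IT', 'Software'), ('DEF', 'Defense');
INSERT INTO company VALUES (1, 'Acme', 'IT'), (2, 'Army', 'DEF'), (3, 'Tiny', 'IT');
INSERT INTO person  VALUES (1, 1), (2, 1), (3, 2), (4, 3);
INSERT INTO enrollment VALUES (1, 800, 100), (2, 600, 0), (3, 5000, 0), (4, 500, 0);
""")

rows = con.execute("""
    SELECT c.*, pe.total_spend
    FROM company c
    JOIN sector s ON s.code = c.sector_code
    JOIN (SELECT p.employer, SUM(e.fee - e.promotion) AS total_spend
          FROM person p
          JOIN enrollment e ON e.student = p.person_id
          GROUP BY p.employer) pe
      ON pe.employer = c.company_number
    WHERE s.description NOT IN ('Government', 'Defense')
      AND pe.total_spend > 1000
    ORDER BY pe.total_spend DESC
""").fetchall()

print(rows)  # [(1, 'Acme', 'IT', 1300)]
```

Because the outer query no longer aggregates, c.* needs no GROUP BY at all.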

How to perform max on an inner join with 2 different counts on columns?

How to find the user with the most referrals that have at least three blue shoes using PostgreSQL?
table 1 - users
name (matches shoes.owner_name)
referred_by (foreign keyed to users.name)
table 2 - shoes
owner_name (matches users.name)
shoe_name
shoe_color
What I have so far is separate queries returning parts of what I want above:
(SELECT count(*) as shoe_count
FROM shoes
GROUP BY owner_name
WHERE shoe_color = “blue”
AND shoe_count>3) most_shoes
INNER JOIN
(SELECT count(*) as referral_count
FROM users
GROUP BY referred_by
) most_referrals
ORDER BY referral_count DESC
LIMIT 1
Two subqueries seem like the way to go. They would look like:
SELECT s.owner_name, s.shoe_count, r.referral_count
FROM (SELECT owner_name, count(*) as shoe_count
FROM shoes
WHERE shoe_color = 'blue'
GROUP BY owner_name
HAVING count(*) >= 3
) s JOIN
(SELECT referred_by, count(*) as referral_count
FROM users
GROUP BY referred_by
) r
ON s.owner_name = r.referred_by
ORDER BY r.referral_count DESC
LIMIT 1 ;
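The two-subquery approach can be tried on a few invented rows with Python's sqlite3 module, as a sketch rather than a PostgreSQL session. In this sample bob has the most referrals but only two blue shoes, so the expected winner is ann.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (name TEXT PRIMARY KEY, referred_by TEXT REFERENCES users(name));
CREATE TABLE shoes (owner_name TEXT, shoe_name TEXT, shoe_color TEXT);
-- made-up data: ann referred 2 users, bob referred 3
INSERT INTO users VALUES ('ann', NULL), ('bob', 'ann'), ('cy', 'ann'),
                         ('dee', 'bob'), ('eve', 'bob'), ('fay', 'bob');
-- ann owns 3 blue shoes, bob only 2
INSERT INTO shoes VALUES ('ann', 'a1', 'blue'), ('ann', 'a2', 'blue'), ('ann', 'a3', 'blue'),
                         ('bob', 'b1', 'blue'), ('bob', 'b2', 'blue'), ('bob', 'b3', 'red');
""")

row = con.execute("""
    SELECT s.owner_name, s.shoe_count, r.referral_count
    FROM (SELECT owner_name, COUNT(*) AS shoe_count
          FROM shoes
          WHERE shoe_color = 'blue'
          GROUP BY owner_name
          HAVING COUNT(*) >= 3) s
    JOIN (SELECT referred_by, COUNT(*) AS referral_count
          FROM users
          GROUP BY referred_by) r
      ON s.owner_name = r.referred_by
    ORDER BY r.referral_count DESC
    LIMIT 1
""").fetchone()

print(row)  # ('ann', 3, 2)
```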

complex sql query from 4 tables

I am developing an online travel guide with a lot of hotels. Each hotel belongs to a specific category and has many room types, and each room type has a different price per season. I want to write a query over 4 tables that returns the total number of hotels per hotel category where the minimum room price of each hotel is between 2 values, which are adjusted by a slider.
My tables look like:
Categories
id_category
category_name
Hotels
id_hotel
hotel_name
category_id
......
hotels_room_types
id_hotels_room_type
hotel_id
room_type_id
......
hotels_room_types_seasons
hotels_room_types_id
season_id
price
......
for example some values of category_name are: Hotels, apartments, hostels
I would like my results table to have two fields like the following:
Hotels 32
apartments 0
hostels 5
I tried the following query, but it returns the total number of all hotels per category, not the number of hotels whose minimum room price is within the price range.
SELECT c.category_name, count( DISTINCT id_hotel ) , min( price ) min_price
FROM categories c
LEFT JOIN hotels w ON ( c.id_category = w.category_id )
LEFT JOIN (
hotels_room_types
INNER JOIN hotels_room_types_seasons ON hotels_room_types.id_hotels_room_types = hotels_room_types_seasons.hotels_room_types_id)
ON w.id_hotel = hotels_room_types.hotel_id
GROUP BY c.category_name
HAVING min_price >=10 AND min_price <=130
Could anyone help me write the appropriate query?
Thanks!!!
SELECT Categories.category_name, COUNT(DISTINCT ID_Hotel) AS hotel_count
FROM Hotels
INNER JOIN Categories
ON Category_ID = ID_Category
INNER JOIN
( SELECT Hotel_ID, MIN(Price) AS LowestPrice
FROM hotels_room_types
INNER JOIN hotels_room_types_seasons
ON id_hotels_room_type = hotels_room_types_id
-- CONSIDER FILTERING BY SEASON HERE
GROUP BY Hotel_ID
) price
ON price.Hotel_ID = Hotels.ID_Hotel
WHERE LowestPrice BETWEEN 10 AND 130 -- OR WHATEVER YOUR PARAMETERS ARE
GROUP BY Categories.category_name
I don't know which RDBMS you are using, but I don't know of any where your query would work as intended. The problem you were having with the minimum price is (I assume) that you apply the filter after grouping by category, so you count all hotels in every category whose lowest price is between 10 and 130, instead of counting only the hotels whose own lowest room price is between 10 and 130.
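The difference between filtering on the category's minimum and on each hotel's minimum shows up on a tiny made-up data set. The sketch below uses Python's sqlite3 module; the rows and the 10 to 130 range are assumptions for illustration, and only the columns needed for the joins are created.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE categories (id_category INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE hotels (id_hotel INTEGER PRIMARY KEY, hotel_name TEXT, category_id INTEGER);
CREATE TABLE hotels_room_types (id_hotels_room_type INTEGER PRIMARY KEY, hotel_id INTEGER);
CREATE TABLE hotels_room_types_seasons (hotels_room_types_id INTEGER, season_id INTEGER, price INTEGER);
-- made-up data: hotel 1 starts at 50, hotel 2 starts at 300
INSERT INTO categories VALUES (1, 'Hotels');
INSERT INTO hotels VALUES (1, 'Cheap Inn', 1), (2, 'Grand Palace', 1);
INSERT INTO hotels_room_types VALUES (1, 1), (2, 2);
INSERT INTO hotels_room_types_seasons VALUES (1, 1, 50), (1, 2, 200), (2, 1, 300), (2, 2, 400);
""")

# Filter applied after grouping by category: the category minimum (50)
# is in range, so both hotels are counted.
per_category = con.execute("""
    SELECT c.category_name, COUNT(DISTINCT h.id_hotel)
    FROM categories c
    JOIN hotels h ON h.category_id = c.id_category
    JOIN hotels_room_types rt ON rt.hotel_id = h.id_hotel
    JOIN hotels_room_types_seasons s ON s.hotels_room_types_id = rt.id_hotels_room_type
    GROUP BY c.category_name
    HAVING MIN(s.price) BETWEEN 10 AND 130
""").fetchall()

# Minimum computed per hotel first, then filtered, then counted per category.
per_hotel = con.execute("""
    SELECT c.category_name, COUNT(*)
    FROM hotels h
    JOIN categories c ON c.id_category = h.category_id
    JOIN (SELECT rt.hotel_id, MIN(s.price) AS lowest_price
          FROM hotels_room_types rt
          JOIN hotels_room_types_seasons s ON s.hotels_room_types_id = rt.id_hotels_room_type
          GROUP BY rt.hotel_id) p
      ON p.hotel_id = h.id_hotel
    WHERE p.lowest_price BETWEEN 10 AND 130
    GROUP BY c.category_name
""").fetchall()

print(per_category)  # [('Hotels', 2)]
print(per_hotel)     # [('Hotels', 1)]
```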
select
c.Category_name,
count(*) NumHotels
from
( select distinct
byRoomType.hotel_id
from
hotels_room_types_seasons bySeason
join hotels_room_types byRoomType
on bySeason.hotels_room_types_id = byRoomType.id_hotels_room_type
where
bySeason.Price between LowPriceParameter and HighPriceParameter
) QualifiedHotels
join Hotels
on QualifiedHotels.hotel_id = Hotels.id_hotel
join Categories c
on Hotels.category_id = c.id_category
group by
c.Category_name