Multiple joins, average on one table, count on another

I have four tables in a database: City, User, CityRating, CityGreeting. The CityRating table has UserID and CityID as its PK, and those are FKs to the User and City tables. The CityGreeting table has no PK, but has UserID and CityID as FKs (the idea is that a user can greet a city as many times as desired, but only rate a city once).
I am trying to write a query that will return the average rating of the city overall, as well as the times a specific user greeted the city:
select City.CityID, City.CityName, City.CityStateOrProvince,
ROUND(AVG(Cast(RateCity.Rating as float)), 2) as AverageRating,
(select COUNT(HelloCity.CityID) from HelloCity where HelloCity.UserID like '<guid>') as TimesVisited
from City
right join RateCity
on City.CityID = RateCity.CityID
right join HelloCity
on City.CityID = HelloCity.CityID
group by City.CityID, City.CityName,
City.CityStateOrProvince, City.CityCountry, City.CityImageUri
Even if I can get this to work as expected (which it currently does not), I feel like it is really messy. In terms of best practices, would it be better to write two queries? This operation would be performed in an API, and I am not sure whether two separate queries would perform better than one complex query like this. Any insight on this, or on how to get the query to work as expected?
EDIT, to clarify: AverageRating is the average across all users who rated the city, and TimesVisited is the number of times one specific user has visited the city.

I believe you need to aggregate each table other than City separately for this to work correctly:
select c.*, rc.AverageRating, coalesce(hc.TimesVisited, 0) as TimesVisited
from City c join
(select CityId, ROUND(AVG(Cast(rc.Rating as float)), 2) as AverageRating
from RateCity rc
group by CityId
) rc
on c.CityID = rc.CityID left join
(select CityId, count(*) as TimesVisited
from HelloCity hc
where hc.UserID like '<guid>'
group by CityId
) hc
on c.CityId = hc.CityId;
Notes:
Table aliases make the query easier to write and to read.
I doubt you really mean right join. That would imply that there are CityIds in the other two tables that are not in City.
By doing the aggregation for each other table, you don't need an aggregation in the outer query.
I do think you want a left join for the HelloCity table, because not all cities might have visitors.
You might want a left join for the RateCity table as well, if not all cities have ratings.
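The reason for aggregating each table separately can be seen on a small data set. The following is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) rather than the asker's actual server, with made-up rows and the hypothetical user id 'u1'. Joining both child tables directly repeats every greeting row once per rating row, so the count is inflated; aggregating first avoids that.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT);
    CREATE TABLE RateCity (UserID TEXT, CityID INTEGER, Rating INTEGER,
                           PRIMARY KEY (UserID, CityID));
    CREATE TABLE HelloCity (UserID TEXT, CityID INTEGER);
    INSERT INTO City VALUES (1, 'Oslo'), (2, 'Lima');
    INSERT INTO RateCity VALUES ('u1', 1, 4), ('u2', 1, 5), ('u1', 2, 3);
    INSERT INTO HelloCity VALUES ('u1', 1), ('u1', 1), ('u1', 1), ('u2', 1);
""")

# Joining both child tables directly multiplies rows: city 1 has 2 ratings
# and 3 greetings by 'u1', so COUNT sees 2 * 3 = 6 rows instead of 3.
naive = conn.execute("""
    SELECT c.CityID, COUNT(hc.CityID) AS TimesVisited
    FROM City c
    JOIN RateCity rc ON rc.CityID = c.CityID
    JOIN HelloCity hc ON hc.CityID = c.CityID AND hc.UserID = 'u1'
    GROUP BY c.CityID
""").fetchall()

# Aggregating each child table first leaves at most one row per city on
# each side of the join, so nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT c.CityID, rc.AverageRating, COALESCE(hc.TimesVisited, 0)
    FROM City c
    LEFT JOIN (SELECT CityID,
                      ROUND(AVG(CAST(Rating AS REAL)), 2) AS AverageRating
               FROM RateCity
               GROUP BY CityID) rc ON rc.CityID = c.CityID
    LEFT JOIN (SELECT CityID, COUNT(*) AS TimesVisited
               FROM HelloCity
               WHERE UserID = ?
               GROUP BY CityID) hc ON hc.CityID = c.CityID
    ORDER BY c.CityID
""", ("u1",)).fetchall()

print(naive)           # [(1, 6)]
print(pre_aggregated)  # [(1, 4.5, 3), (2, 3.0, 0)]
```

The user id is passed as a bound parameter here, which is also how an API would normally supply it.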

Why not use CTEs and do the individual parts in each one? It helps to break the problem down instead of trying to mash a bunch of joins together. For example:
DECLARE @userId VARCHAR(10) = 'userid1';
WITH
CITY_RATING_CTE (cityId, AverageRating) AS
( SELECT cityId,
AVG(Rating) AS rating
FROM RateCity
GROUP BY cityId),
TIMES_VISITED_CTE AS
( SELECT cityId,
count(*) AS TimesVisited
FROM HelloCity
WHERE UserId = @userId
GROUP BY cityId)
SELECT c.CityId,
c.CityName,
c.CityStateOrProvince,
c.CityImageUri,
cr.AverageRating,
tv.TimesVisited
FROM City c
JOIN CITY_RATING_CTE cr ON cr.cityId = c.CityId
JOIN TIMES_VISITED_CTE tv ON tv.cityId = c.CityId;

Related

Can't figure out whether I need to NEST or JOIN or something else?

I have 3 tables - continents, country info & flights.
I want to run a query that ranks the continents in descending order by the number of countries within them that have 0 flights booked historically, but that shows each continent's name instead of its id.
Continents table - cont_id (P-KEY), name varchar, notes varchar
Country table - cntry_id (P-KEY), name varchar, abbreviation varchar
Flights info table - cntry_id int, cont_id int, flights float, date date
I'm doing this in Metabase, and so far I've managed to get it to do everything except show the continent's name; it only goes as far as showing its id. I have tried nesting the main query and tried using a join instead, but neither has worked.
SELECT "public"."flights_info"."cont_id", count(*) AS "count"
FROM "public"."flights_info"
WHERE "public"."flights_info"."flights" <= 0
GROUP BY "public"."flights_info"."cont_id"
ORDER BY "count" DESC
I'm successfully getting the cont_id, I just need a line of code that will make it run a lookup from the continents table and give me names that match the id's (I only want the names to show not the ID's)

So the simple answer is to take your query, reformat it a little, and then add an INNER JOIN to the continent table. Something like this would probably work:
SELECT
c.name AS continent_name,
COUNT(*) AS "count"
FROM
public.flights_info f
INNER JOIN public.continents c ON c.cont_id = f.cont_id
WHERE
f.flights <= 0
GROUP BY
c.name
ORDER BY
2 DESC;
However, I'm not convinced your original query is correct. You said you wanted to count the number of countries in each continent with no flights booked historically, and I don't think this is what you are counting at all. Instead you are counting the number of rows for each continent with a flights value of 0 or less than zero. Now maybe this is actually how your database works, and if so then cool, the query above should get you onto the right track.
However, if this database works anything like I think it should do then you would need a very different query, e.g. this one:
SELECT
c.name AS continent_name,
COUNT(DISTINCT cn.cntry_id) AS "count"
FROM
public.continents c
INNER JOIN public.country cn ON cn.cont_id = c.cont_id
LEFT JOIN public.flights_info f ON f.cont_id = c.cont_id AND f.cntry_id = cn.cntry_id
WHERE
COALESCE(f.flights, 0) <= 0
GROUP BY
c.name
ORDER BY
2 DESC;
How does this work? Well it starts off with the continent table, and then links this to countries, to get a list of the countries in each continent. Then it performs a LEFT JOIN to the flights table, so it will get hits even if there's no flight data. Finally it counts up the number of countries where there was a flights value of 0 or less, or where there's no flights data at all.
Even this probably isn't correct, as if you had two rows for a country (I'm going to assume the flights table has a row for each continent, country, date), where one had a flights = 0 and one had a flights = 10, then this would still report that country as having no flights. But now I'm getting too far away from the original question I feel...
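One way to handle the caveat above, where a country has both a zero row and a non-zero row, is to test each country with NOT EXISTS instead of filtering individual flight rows. This is only a sketch under assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module, the sample rows are invented, and it assumes (as the query above does) that the country table carries a cont_id column, which the question's schema does not list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE continents (cont_id INTEGER PRIMARY KEY, name TEXT);
    -- Assumption: country has a cont_id column linking it to a continent.
    CREATE TABLE country (cntry_id INTEGER PRIMARY KEY, name TEXT,
                          cont_id INTEGER);
    CREATE TABLE flights_info (cntry_id INTEGER, cont_id INTEGER,
                               flights REAL, date TEXT);
    INSERT INTO continents VALUES (1, 'Europe'), (2, 'Asia');
    INSERT INTO country VALUES (10, 'A', 1), (11, 'B', 1),
                               (20, 'C', 2), (21, 'D', 2);
    INSERT INTO flights_info VALUES
        (10, 1, 0,  '2020-01-01'),
        (10, 1, 10, '2020-01-02'),
        (11, 1, 0,  '2020-01-01'),
        (20, 2, 0,  '2020-01-01');
""")

# Country A has one zero row and one row with 10 flights, so it must not
# be counted. B and C only have zero rows, and D has no rows at all.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS countries_without_flights
    FROM continents c
    JOIN country cn ON cn.cont_id = c.cont_id
    WHERE NOT EXISTS (SELECT 1
                      FROM flights_info f
                      WHERE f.cntry_id = cn.cntry_id
                        AND f.flights > 0)
    GROUP BY c.name
    ORDER BY countries_without_flights DESC
""").fetchall()

print(rows)  # [('Asia', 2), ('Europe', 1)]
```

Because the check is made once per country, a single row with flights > 0 is enough to exclude that country, regardless of how many zero rows it also has.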

You can use join or sub-query to do this
Using JOIN
You just need to join with the Continents table on the cont_id column and fetch the continent's name.
select Continents.name, count(*) AS "count"
FROM flights_info flights_info
join Continents Continents
on Continents.cont_id = flights_info.cont_id
WHERE flights_info.flights <= 0
GROUP BY flights_info.cont_id,Continents.name
ORDER BY "count" DESC
Using Sub-Query
You can write a subquery that gives you the continent's name by matching on cont_id.
select (select Continents.name from Continents Continents where Continents.cont_id =
flights_info.cont_id ) Continent_name, count(*) AS "count"
FROM flights_info flights_info
WHERE flights_info.flights <= 0
GROUP BY flights_info.cont_id
ORDER BY "count" DESC

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general it works, but I was wondering whether there is a way to make it simpler, or whether my approach is incorrect and this code will fail under some conditions. Maybe it needs to be optimized.
I've read that you should not rely on the natural order of a result set in a database, and that is what raised my doubts.

If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little trickier. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participations) p;
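Both queries can be tried on a tiny data set. The sketch below assumes an in-memory SQLite database through Python's sqlite3 module and invented rows; course 3 has no participations, which is exactly the case an INNER JOIN would silently drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
    CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
    INSERT INTO courses VALUES (1, 10), (2, 5), (3, 8);
    INSERT INTO participations (course_id) VALUES (1), (1), (1), (2);
""")

# Per course: the LEFT JOIN keeps course 3, and COUNT(p.id) ignores the
# NULL produced for it, so its free places equal max_participants.
per_course = conn.execute("""
    SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
    FROM courses c
    LEFT JOIN participations p ON p.course_id = c.id
    GROUP BY c.id, c.max_participants
    ORDER BY c.id
""").fetchall()

# Overall: each table is reduced to a single row before the cross join.
overall = conn.execute("""
    SELECT c.max_participants - p.num_participants
    FROM (SELECT SUM(max_participants) AS max_participants FROM courses) c
    CROSS JOIN (SELECT COUNT(*) AS num_participants FROM participations) p
""").fetchone()[0]

print(per_course)  # [(1, 7), (2, 4), (3, 8)]
print(overall)     # 19
```

Selecting c.id alongside free_places also addresses the ordering concern: each value is tied to its course explicitly instead of relying on row position.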

Get row from one table, plus COUNT from a related table

I'm trying to build an SQL query where I grab one table's information (WHERE shops.shop_domain = X) along with the COUNT of the customers table WHERE customers.product_id = 4242451.
The shops table DOES NOT have product.id in it, but the customers table DOES HAVE the shop_domain in it, hence my attempt to do some sort of join.
I essentially want to return the following:
shops.id
shops.name
shops.shop_domain
COUNT OF CUSTOMERS WHERE customers.product_id = '4242451'
Here is my not so lovely attempt at the query.
I think I have the idea right (maybe...) but I can't wrap my head around building this query.
SELECT shops.id, shops.name, shops.shop_domain, COUNT(customers.customer_id)
FROM shops
LEFT JOIN customers ON shops.shop_domain = customers.shop_domain
WHERE shops.shop_domain = 'myshop.com' AND
customers.product_id = '4242451'
GROUP BY shops.shop_id
Relevant database schemas:
shops:
id, name, shop_domain
customers:
id, name, product_id, shop_domain

You are close. The condition on customers needs to go in the ON clause, because this is a LEFT JOIN and customers is the second table:
SELECT s.id, s.name, s.shop_domain, COUNT(c.customer_id)
FROM shops s LEFT JOIN
customers c
ON s.shop_domain = c.shop_domain AND c.product_id = '4242451'
WHERE s.shop_domain = 'myshop.com'
GROUP BY s.id, s.name, s.shop_domain;
I am also inclined to include all three columns in the GROUP BY, although Postgres (and ANSI/ISO standards) are happy with just id if it is declared as the primary key in the table.
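The difference between filtering in WHERE and filtering in ON shows up when no customer matches the product. A minimal sketch, assuming an in-memory SQLite database through Python's sqlite3 module, a single invented customer row, and the id column from the posted schema (rather than customer_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            product_id INTEGER, shop_domain TEXT);
    INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com');
    INSERT INTO customers VALUES (1, 'Ann', 111, 'myshop.com');
""")

# Filter in WHERE: the only joined row has product_id 111, so WHERE removes
# it and the shop disappears from the result entirely.
filter_in_where = conn.execute("""
    SELECT s.id, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c ON c.shop_domain = s.shop_domain
    WHERE s.shop_domain = 'myshop.com' AND c.product_id = 4242451
    GROUP BY s.id
""").fetchall()

# Filter in ON: no customer row matches, so the LEFT JOIN keeps the shop
# with NULLs on the customer side and COUNT(c.id) returns 0.
filter_in_on = conn.execute("""
    SELECT s.id, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c ON c.shop_domain = s.shop_domain
                         AND c.product_id = 4242451
    WHERE s.shop_domain = 'myshop.com'
    GROUP BY s.id
""").fetchall()

print(filter_in_where)  # []
print(filter_in_on)     # [(1, 0)]
```

With the condition in WHERE, the LEFT JOIN effectively behaves like an INNER JOIN, which is why the shop row goes missing when the count should be zero.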

A correlated subquery should be substantially cheaper (and simpler) for the purpose:
SELECT id, name, shop_domain
, (SELECT count(*)
FROM customers
WHERE shop_domain = s.shop_domain
AND product_id = 4242451) AS special_count
FROM shops s
WHERE shop_domain = 'myshop.com';
This way you only need to aggregate in the subquery, and need not worry about undesired effects on the outer query.
I am assuming product_id is a numeric data type, so I use a numeric literal (4242451) instead of the string literal '4242451', which might cause problems otherwise.

Combining distinct and count() from two tables

I have two tables:
Customers (name, address, postcode (FK))
Postcodes (postcode (PK), county)
I want to find out how many customers are in each county.
I am assuming I need an inner join on postcode but don't know how to combine this with a count(customer_id) and distinct(county).

Although you can write queries with SELECT DISTINCT county, that prevents you from using aggregates such as COUNT. Instead you can use GROUP BY, which broadly has the same effect as DISTINCT but with much more power and flexibility.
These two queries give the same results, but the second lets you then go on to add your JOIN and COUNT statements.
SELECT DISTINCT county FROM postcodes
SELECT county FROM postcodes GROUP BY county
By and large, don't use SELECT DISTINCT, but use this kind of pattern...
SELECT
postcodes.county,
COUNT(customers.customer_id)
FROM
postcodes
INNER JOIN
customers
ON customers.postcode = postcodes.postcode
GROUP BY
postcodes.county

Just join the Customers table to the Postcodes table on the common field 'postcode'. Then you can use GROUP BY to get your counts and return one row per county:
SELECT
County,
COUNT(Customer_Id) CustomerCount
FROM
Postcodes pc
JOIN Customers c ON pc.postcode = c.postcode
GROUP BY
County

Correct way to use "NOT IN" Postgres

I have two tables, People and Vehicles. Vehicles belong to people. I'm trying to find the people who do not have a vehicle. I was attempting to do this by joining People and Vehicles, and displaying each person's ID that is NOT IN Vehicles.person_id.
This is returning nothing, and has me wondering if there is something I did wrong, or if there is a more efficient way of doing this.
Query is below
Select People.id
From People
INNER JOIN Vehicles
on People.id=Vehicles.person_id
where People.id NOT IN Vehicles.person_id;

Use a left join to find the people with no vehicles:
Select distinct People.id
From People
LEFT JOIN Vehicles on People.id=Vehicles.person_id
where Vehicles.person_id is NULL

NOT IN can have issues with NULL values, and should probably be avoided for performance reasons if the subquery is very large.
Try NOT EXISTS:
SELECT p.id
FROM People p
WHERE NOT EXISTS (
SELECT 1
FROM Vehicles v
WHERE v.person_id = p.id)
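The NULL issue with NOT IN is easy to reproduce. The following is a small sketch, assuming an in-memory SQLite database through Python's sqlite3 module and invented rows, one of which has a NULL person_id; Postgres applies the same three-valued logic to NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id INTEGER PRIMARY KEY);
    CREATE TABLE Vehicles (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO People VALUES (1), (2), (3);
    INSERT INTO Vehicles (person_id) VALUES (1), (NULL);
""")

# With a NULL in the subquery result, "id NOT IN (...)" is never true:
# for ids 2 and 3 it evaluates to NULL (unknown), so no row is returned.
not_in = conn.execute("""
    SELECT id FROM People
    WHERE id NOT IN (SELECT person_id FROM Vehicles)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL
# person_id has no effect on the result.
not_exists = conn.execute("""
    SELECT p.id FROM People p
    WHERE NOT EXISTS (SELECT 1 FROM Vehicles v WHERE v.person_id = p.id)
    ORDER BY p.id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```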

Another solution, using set operations:
Select id From People
except
SELECT person_id FROM Vehicles

Use a subquery, as below:
Select id
From People
WHERE id NOT IN (SELECT distinct person_id
FROM Vehicles
WHERE person_id IS NOT NULL)
This selects every person whose id is not in the list of people who have a vehicle (SELECT DISTINCT person_id FROM Vehicles). The IS NOT NULL filter is optional, but it keeps a NULL person_id from causing NOT IN to return no rows.
