Find max value in aggregate function - sql

I have the following query:
USE Movies;
SELECT
c.CountryName
,d.DirectorName
,f.FilmRunTimeMinutes AS [TotalRunTime]
FROM
tblFilm as f
JOIN tblCountry as c on c.CountryID = f.FilmCountryID
JOIN tblDirector as d on d.DirectorID = f.FilmDirectorID
ORDER BY
DirectorName
This returns one row per film, as expected. So far, so good.
Then I grouped the result to sum up TotalRunTime for each director and country:
SELECT
c.CountryName
,d.DirectorName
,SUM(CONVERT(DECIMAL, f.FilmRunTimeMinutes)) AS [TotalRunTime]
,COUNT(*)
FROM
tblFilm as f
JOIN tblCountry as c on c.CountryID = f.FilmCountryID
JOIN tblDirector as d on d.DirectorID = f.FilmDirectorID
GROUP BY
CountryName
,DirectorName
This gives me one row per country and director, with the film count in an unnamed column (shown as "(No column name)").
Now I want the director with the highest COUNT(*), and I tried this:
SELECT
c.CountryName
,d.DirectorName
,SUM(CONVERT(DECIMAL, f.FilmRunTimeMinutes)) AS [TotalRunTime]
,COUNT(*)
FROM
tblFilm as f
JOIN tblCountry as c on c.CountryID = f.FilmCountryID
JOIN tblDirector as d on d.DirectorID = f.FilmDirectorID
GROUP BY
CountryName
,DirectorName
HAVING
COUNT(*) = MAX(Count(*))
But it doesn't work. Can you please explain in detail why it doesn't work, and how I can get the row with MAX(COUNT(*))? In this example, it should return the row Japan | Akira, and so on.

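Aggregate functions cannot be nested at the same query level, so MAX(COUNT(*)) in HAVING is rejected (SQL Server reports "Cannot perform an aggregate function on an expression containing an aggregate or a subquery"). Even if it were allowed, HAVING is evaluated once per group, and each group has only one COUNT(*), so there would be nothing to take the maximum over. Just use the TOP (1) clause: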
SELECT TOP (1) c.CountryName, d.DirectorName,
SUM(CONVERT(DECIMAL, f.FilmRunTimeMinutes)) AS [TotalRunTime],
COUNT(*) AS cnt
FROM tblFilm as f JOIN
tblCountry as c
on c.CountryID = f.FilmCountryID JOIN
tblDirector as d
on d.DirectorID = f.FilmDirectorID
GROUP BY CountryName, DirectorName
ORDER BY cnt DESC;
However, this returns only one row even if several directors tie on cnt. If ties are possible, use RANK() instead (or TOP (1) WITH TIES):
SELECT t.*
FROM (SELECT c.CountryName, d.DirectorName,
SUM(CONVERT(DECIMAL, f.FilmRunTimeMinutes)) AS [TotalRunTime],
COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC) AS Seq
FROM tblFilm as f JOIN
tblCountry as c
ON c.CountryID = f.FilmCountryID JOIN
tblDirector as d
ON d.DirectorID = f.FilmDirectorID
GROUP BY CountryName, DirectorName
) t
WHERE seq = 1;
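To answer the "why" more directly: the nested aggregate from the question becomes legal once the outer MAX is a window function, because MAX(COUNT(*)) OVER () is evaluated after grouping and sees the counts of all groups. Below is a minimal sketch in PostgreSQL syntax, with a hypothetical films CTE standing in for the real joins; like RANK(), it keeps every tied row.

```sql
-- Hypothetical sample rows standing in for the joined tblFilm/tblCountry/tblDirector data.
WITH films (CountryName, DirectorName, FilmRunTimeMinutes) AS (
    VALUES ('Japan',  'Akira',      162),
           ('Japan',  'Akira',      141),
           ('Japan',  'Akira',       88),
           ('France', 'Director B', 124),
           ('France', 'Director B', 127)
),
grouped AS (
    SELECT CountryName,
           DirectorName,
           SUM(FilmRunTimeMinutes) AS TotalRunTime,
           COUNT(*) AS cnt,
           -- The window MAX runs after GROUP BY, so it sees every group's count.
           MAX(COUNT(*)) OVER () AS max_cnt
    FROM films
    GROUP BY CountryName, DirectorName
)
SELECT CountryName, DirectorName, TotalRunTime, cnt
FROM grouped
WHERE cnt = max_cnt;  -- keeps all groups tied for the highest count
```

With these made-up rows only Japan | Akira (3 films, 391 minutes) is returned. SQL Server also accepts MAX(COUNT(*)) OVER (); only the sample-data CTE would need to be replaced with the real tables.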

Related

Writing a subquery in SQL, how to combine two queries?

My goal is to find the top 10 cities within the top 10 countries. I successfully used this query:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country, C.city
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
WHERE
country IN ('India', 'China', 'United States', 'Japan', 'Mexico', 'Brazil', 'Russian Federation', 'Philippines', 'Turkey', 'Indonesia')
GROUP BY
C.city, D.country
ORDER BY
number_of_customers DESC
LIMIT 10
But instead of listing the countries, which I found using a previous query, I would like to use that query as a subquery:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
GROUP BY
D.country
ORDER BY
number_of_customers DESC
LIMIT 10
How can I combine these two queries correctly? I keep getting different errors when I try to replace the list of countries with the second query I posted. I apologize if this is a stupid question; I am a beginner.
My attempt:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country, C.city
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
WHERE
country IN (SELECT COUNT(A.customer_id) AS number_of_customers, D.country
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
GROUP BY D.country
ORDER BY number_of_customers DESC
LIMIT 10)
GROUP BY
C.city, D.country
ORDER BY
number_of_customers DESC
LIMIT 10
But I get the error:
subquery has too many columns
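The error occurs because a subquery used with IN must return exactly one column, and yours returns two (number_of_customers and country). One way to fix it, for instance in PostgreSQL, is to move the top-countries query into a WITH query (common table expression) and select only its country column; see the documentation: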
WITH top_countries AS (
SELECT count(A.customer_id) AS number_of_customers,
D.country AS country
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
GROUP BY D.country
ORDER BY number_of_customers DESC
LIMIT 10
)
SELECT count(A.customer_id) AS number_of_customers,
D.country, C.city
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
WHERE D.country IN (SELECT tc.country FROM top_countries tc)
GROUP BY C.city,D.country
ORDER BY number_of_customers DESC
LIMIT 10
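The key point is that the subquery after IN returns a single column. A self-contained sketch of the same pattern, using a hypothetical customers CTE that already holds each customer's city and country (top 2 countries instead of 10, so the filtering is visible with a handful of rows):

```sql
-- Hypothetical customers, already joined to their city and country.
WITH customers (customer_id, city, country) AS (
    VALUES (1, 'Mumbai', 'India'),
           (2, 'Mumbai', 'India'),
           (3, 'Delhi',  'India'),
           (4, 'Tokyo',  'Japan'),
           (5, 'Osaka',  'Japan'),
           (6, 'Lima',   'Peru')
),
top_countries AS (
    -- The count may stay here; the outer IN only reads the country column.
    SELECT country, COUNT(customer_id) AS number_of_customers
    FROM customers
    GROUP BY country
    ORDER BY number_of_customers DESC
    LIMIT 2
)
SELECT COUNT(customer_id) AS number_of_customers, country, city
FROM customers
WHERE country IN (SELECT country FROM top_countries)  -- exactly one column
GROUP BY city, country
ORDER BY number_of_customers DESC
LIMIT 10;
```

India and Japan are the two largest countries in this sample, so Lima (Peru) is filtered out of the city ranking.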

For each country, report the movie genre with the highest average rating

For each country, I need to report the movie genre with the highest average rating. I am missing only one step that I can't figure out.
Here's my current code:
SELECT c.code AS c_CODE, menres.genre AS GENRE, AVG(RATE) AS AVERAGE_rate, MAX(RATE) AS MAXIMUM_rate, MIN(RATE) AS MINIMUM_rate
FROM movrates
LEFT JOIN movgenres ON movgenres.movieid = movrates.movieid
LEFT JOIN users ON users.userid = movrates.userid
LEFT JOIN countries c ON c.code = users.city
LEFT JOIN menres ON movgenres.genreid = menres.code
GROUP BY menres.genre, c.code
ORDER BY c.code ASC, AVG(RATE) DESC, menres.genre DESC;
You can use the ROW_NUMBER window function to assign a unique rank to each of your rows:
* partitioned by country code
* ordered by descending average rating
Once you have this ranking, keep only the rows ranked 1, which are the genres with the highest average rating in each country. Note that ROW_NUMBER picks just one genre per country when two genres tie.
WITH cte AS (
SELECT c.code AS COUNTRY_CODE,
mg.genre AS GENRE,
AVG(rating) AS AVERAGE_RATING,
MAX(rating) AS MAXIMUM_RATING,
MIN(rating) AS MINIMUM_RATING
FROM moviesratings r
INNER JOIN moviesgenres g ON g.movieid = r.movieid
INNER JOIN users u ON u.userid = r.userid
INNER JOIN countries c ON c.code = u.country
LEFT JOIN mGenres mg ON mg.code = g.genreid
GROUP BY mg.genre,
c.code
)
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER(
PARTITION BY COUNTRY_CODE
ORDER BY AVERAGE_RATING DESC) AS rn
FROM cte) ranked_averages
WHERE rn = 1
Note: The code inside the common table expression is equivalent to yours. If you're willing to share your input tables, I may even suggest an improved query.
You should use a window function here: rank the genres within each country with RANK(), then select only the first rank. Unlike ROW_NUMBER(), RANK() keeps every genre tied for the highest average.
with mov_rates (c_code, genre, avg_rate, max_rate, min_rate)
as
(
select d.code,
e.genre,
avg(a.rate),
max(a.rate),
min(a.rate)
from movrates a
LEFT join movgenres b on a.movieid = b.movieid
LEFT join users c on a.userid = c.userid
LEFT join countries d on c.city = d.code
left join mGenres e on b.genreid = e.code
group by d.code, e.genre
),
rategenre (rnk, c_code, genre, avg_rate, max_rate, min_rate)
as
(
select rank() over (partition by c_code order by avg_rate desc),
c_code,
genre,
avg_rate,
max_rate,
min_rate
from mov_rates
)
select *
from rategenre
where rnk = 1
Reference:
OVER Clause
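The practical difference between the two answers shows up with ties. This sketch (PostgreSQL syntax; the genre_avg rows are hypothetical stand-ins for the per-country averages computed above) runs both functions side by side:

```sql
-- Hypothetical per-country genre averages, the shape produced by cte / mov_rates.
WITH genre_avg (c_code, genre, avg_rate) AS (
    VALUES ('FR', 'Drama',  4.5),
           ('FR', 'Comedy', 4.5),
           ('FR', 'Horror', 3.0),
           ('JP', 'Anime',  4.8),
           ('JP', 'Drama',  4.1)
),
ranked AS (
    SELECT c_code, genre, avg_rate,
           -- ROW_NUMBER always yields 1, 2, 3...; genre is added so the FR tie is broken deterministically.
           ROW_NUMBER() OVER (PARTITION BY c_code ORDER BY avg_rate DESC, genre) AS rn,
           -- RANK gives tied rows the same number, so both FR genres at 4.5 get 1.
           RANK()       OVER (PARTITION BY c_code ORDER BY avg_rate DESC) AS rnk
    FROM genre_avg
)
SELECT c_code, genre, avg_rate, rn, rnk
FROM ranked
WHERE rnk = 1
ORDER BY c_code, genre;
```

Filtering on rnk = 1 returns both French genres tied at 4.5 plus JP Anime; filtering on rn = 1 instead would keep only FR Comedy, chosen by the genre tie-breaker.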

How do I query all distinct rows with only their highest values?

I have been trying to query the most popular genre in each city; I only want the row with the highest count for each city. I tried using MAX() with GROUP BY, but it gave me a syntax error.
My CTE query is as follows; it's based on the DBeaver sample dataset:
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
I tried the following query.
I don't have a dataset to test this on, but you should be able to add a ROW_NUMBER() function to your CTE to get the values you are looking for, such as:
with q_table
as
( select City, Genre, count(*) as counts
,ROW_NUMBER() OVER(partition by City order by count(*) desc) RN
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
SELECT City, Genre, Counts
from q_table
WHERE RN=1
Order BY City
This use of MAX should work.
Edit: added the inner join. Thanks to Gordon Linoff for pointing out that my original answer didn't actually achieve anything.
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
SELECT a.City, a.Genre, a.counts
FROM q_table a
INNER JOIN (
SELECT City, MAX(counts) counts
FROM q_table
GROUP BY City
) b ON a.City = b.City AND a.counts = b.counts;
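To see what the inner join adds, here is the same pattern run against a few hypothetical q_table rows (PostgreSQL syntax, values made up for illustration):

```sql
-- Hypothetical (City, Genre, counts) rows, i.e. what q_table produces.
WITH q_table (City, Genre, counts) AS (
    VALUES ('Boston',  'Rock',  12),
           ('Boston',  'Jazz',   7),
           ('Chicago', 'Rock',   5),
           ('Chicago', 'Metal',  9),
           ('Chicago', 'Blues',  9)
)
SELECT a.City, a.Genre, a.counts
FROM q_table a
INNER JOIN (
    -- One row per city holding its highest count.
    SELECT City, MAX(counts) AS counts
    FROM q_table
    GROUP BY City
) b ON a.City = b.City AND a.counts = b.counts
ORDER BY a.City, a.Genre;
```

Boston returns Rock, and Chicago returns both Blues and Metal because they tie at 9; the ROW_NUMBER() answer above would keep only one of them.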
Try this:
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
select t3.City, t3.Genre, t3.counts
from q_table as t3
where t3.counts = (select max(t4.counts) from q_table as t4 where t4.City = t3.City)

Query joining a couple of related tables, SQL Server example (Concert)

I don't know how to restore what I wrote before, apologies.
Using TOP:
SELECT TOP 1
PID, NAME, AGE
FROM (
SELECT
p.*, h.HID
FROM Performer p
INNER JOIN Concert c
ON c.PID = p.PID
INNER JOIN Hall h
ON h.HID = c.HID
INNER JOIN Tickets t
ON t.CID = c.CID
GROUP BY p.PID, p.NAME, p.AGE, h.HID, h.CAPACITY
HAVING COUNT(t.TID) = h.CAPACITY
) t
GROUP BY PID, NAME, AGE
ORDER BY COUNT(*) DESC
This should return the expected result:
;with Cte1 AS (
select C.CID, P.Name AS PerformerName, H.Name AS HallName, H.Capacity, H.HID
from #Performer P
inner join #Concert C on C.PID = P.PID
inner join #Hall H on H.HID = C.HID
)
, Cte2 AS (
select C.CID, H.HID, COUNT(*) SellCount
from #Concert C
inner join #Hall H on H.HID = C.HID
inner join #Tickets T on T.CID = C.CID
group by C.CID, H.HID
)
select Cte1.CID, Cte1.PerformerName, Cte1.HallName, Cte2.SellCount
from Cte1 inner join Cte2 on Cte2.CID = Cte1.CID AND Cte2.HID = Cte1.HID
where Cte1.Capacity = Cte2.SellCount
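Since the original question text is lost, this sketch assumes, as both answers do, that the goal is to count each performer's sold-out concerts (tickets sold equal to hall capacity). It groups per concert rather than per hall, and uses a hypothetical miniature schema in PostgreSQL syntax instead of the real #-prefixed temp tables:

```sql
-- Hypothetical miniature schema mirroring Performer / Hall / Concert / Tickets.
WITH Performer (PID, NAME) AS (
    VALUES (1, 'Ann'), (2, 'Bob')
),
Hall (HID, CAPACITY) AS (
    VALUES (10, 2), (20, 3)
),
Concert (CID, PID, HID) AS (
    VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10)
),
Tickets (TID, CID) AS (
    VALUES (1, 100), (2, 100),            -- concert 100: 2 of 2 seats sold
           (3, 101), (4, 101), (5, 101),  -- concert 101: 3 of 3 seats sold
           (6, 102)                       -- concert 102: 1 of 2 seats sold
),
sold_out AS (
    -- One row per concert whose ticket count reaches the hall capacity.
    SELECT c.CID, c.PID
    FROM Concert c
    INNER JOIN Hall h ON h.HID = c.HID
    INNER JOIN Tickets t ON t.CID = c.CID
    GROUP BY c.CID, c.PID, h.CAPACITY
    HAVING COUNT(t.TID) = h.CAPACITY
)
SELECT p.PID, p.NAME, COUNT(*) AS sold_out_concerts
FROM sold_out s
INNER JOIN Performer p ON p.PID = s.PID
GROUP BY p.PID, p.NAME
ORDER BY sold_out_concerts DESC;
```

Ann's two concerts both sell out, so she is returned with 2; Bob's only concert is half full, so he does not appear. Adding LIMIT 1 (TOP 1 in SQL Server) would keep just the leading performer.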

Identifying percentage in Fact Table

I am new to programming and could not find an answer.
I have the following dimension tables and fact table:
Customer: CustomerId, HomeRegion
Regions: RegionId, RegionName
MyTime: id, MyHour
Fact table: CustomerId, RegionId, TimeId, FactId
I need a report with the columns HomeRegion, Hour, RegionName and UserPercentage.
For example, only 3.67% of the people whose home region is A move to region B at 9 am, and so on.
I need to create a similar report.
The problem is obtaining UserPercentage. Here is the code I have so far:
SELECT c.HomeRegion, mt.myhour as Time, r.RegionName as CurrentRegion,
(SELECT COUNT(*)
/*number of users who move from their home
region to CurrentRegion at specific time*/
)/COUNT(c.CustomerId)*100 as UserPercentage
FROM dbo.FactTable ft
inner join dbo.Customer c
ON ft.CustomerId = c.CustomerId
inner join dbo.myTime mt
ON ft.TimeId = mt.ID
inner join dbo.Regions r
ON ft.RegionId = r.RegionId
WHERE mt.myhour = '09'
GROUP BY c.HomeRegion, mt.myhour, r.RegionName
ORDER BY c.HomeRegion, r.RegionName
Using analytic (window) functions:
* there is no need to select or group by myHour, since it is a constant ('09') here
* this assumes a customer is located in only one region at a time (if not, it would be much harder to select)
select HomeRegion, CurrentRegion,
count(*) * 1.0 / sum(count(*)) over () as overall_share,
count(*) * 1.0 / sum(count(*)) over (partition by HomeRegion) as homeregion_share
from
(SELECT c.HomeRegion, r.RegionName as CurrentRegion, c.CustomerId as CUST
FROM dbo.FactTable ft
inner join dbo.Customer c
ON ft.CustomerId = c.CustomerId
inner join dbo.myTime mt
ON ft.TimeId = mt.ID
inner join dbo.Regions r
ON ft.RegionId = r.RegionId
WHERE mt.myhour = '09'
GROUP BY c.HomeRegion, r.RegionName, c.CustomerId) uni_users
GROUP by HomeRegion, CurrentRegion
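A minimal sketch of the percentage step on its own, assuming hypothetical rows shaped like the output of the uni_users subquery above (PostgreSQL syntax). Multiplying by 100.0 before dividing avoids SQL Server's integer division, which would otherwise truncate every share below 100% to 0:

```sql
-- Hypothetical distinct (HomeRegion, CurrentRegion, CustomerId) rows at 09:00,
-- i.e. the shape of the uni_users subquery.
WITH uni_users (HomeRegion, CurrentRegion, CustomerId) AS (
    VALUES ('A', 'A', 1), ('A', 'A', 2), ('A', 'B', 3), ('A', 'C', 4),
           ('B', 'B', 5), ('B', 'A', 6)
)
SELECT HomeRegion,
       CurrentRegion,
       COUNT(*) AS user_count,
       -- SUM(COUNT(*)) OVER (...) totals the grouped counts per home region;
       -- 100.0 forces decimal rather than integer division.
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY HomeRegion) AS UserPercentage
FROM uni_users
GROUP BY HomeRegion, CurrentRegion
ORDER BY HomeRegion, CurrentRegion;
```

Home region A splits 50% / 25% / 25% across A, B and C, and home region B splits 50% / 50% across A and B.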
Try something like this in place of the commented placeholder in your query:
SELECT (TMP1.[Count] * 100.0) / COUNT(TMP2.CustomerId) AS Percentage
FROM
(
SELECT COUNT(*) AS [Count]
FROM dbo.FactTable ft
inner join dbo.Customer c ON ft.CustomerId = c.CustomerId
inner join dbo.Regions r ON ft.RegionId = r.RegionId
WHERE
r.RegionName IN ('A','B','C','D','E') AND
c.HomeRegion IN ('A','B','C','D','E')
) AS TMP1, dbo.Customer AS TMP2
GROUP BY TMP1.[Count]