Finding the decade with largest records, SQL Server - sql

I have the following db diagram:
I want to find the decade (for example 1990 to 2000) that has the largest number of movies.
Actually, it only involves the "Movies" table.
Any idea on how to do that?

You can use the LEFT function in SQL Server to get the decade from the year. The decade is the first 3 digits of the year. You can group by the decade and then count the number of movies. If you sort, or order, the results by the number of movies - the decade with the largest number of movies will be at the top. For example:
select
count(id) as number_of_movies,
left(cast([year] as varchar(4)), 3) + '0s' as decade
from movies
group by left(cast([year] as varchar(4)), 3)
order by number_of_movies desc

An alternative to the string approach is to use integer division to get the decade:
SELECT [Year]/10*10 as [Decade]
, COUNT(*) as [CountMovies]
FROM Movies
GROUP BY [Year]/10*10
ORDER BY [CountMovies] DESC
This returns all, ordered by the decade(s) with the most movies. You could add a TOP (1) to only get the top, but then you'd need to consider tiebreaker scenarios to ensure you get deterministic results.
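To illustrate the integer-division approach together with a tiebreaker, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the table contents are made up for the example; in SQL Server you would keep the same ORDER BY and add TOP (1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
# Made-up sample rows: one film in the 1980s, three in the 1990s, three in the 2000s.
conn.executemany(
    "INSERT INTO movies (title, year) VALUES (?, ?)",
    [("A", 1989), ("B", 1991), ("C", 1995), ("D", 1999),
     ("E", 2001), ("F", 2004), ("G", 2009)],
)

# Integer division drops the last digit of the year; multiplying by 10
# restores the first year of the decade (1995 -> 199 -> 1990).
rows = conn.execute(
    """
    SELECT year / 10 * 10 AS decade, COUNT(*) AS movie_count
    FROM movies
    GROUP BY year / 10 * 10
    ORDER BY movie_count DESC, decade DESC  -- tiebreaker: most recent decade first
    """
).fetchall()

top_decade = rows[0]
print(rows)
print("top decade:", top_decade)
```

The second ORDER BY key is what makes the "top 1" deterministic when two decades have the same count; which decade should win a tie is a choice you have to make for your own data.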

select substring(cast([year] as varchar), 1, 3) as Decade,
Count(1) [Count]
from Movies
group by substring(cast([year] as varchar), 1, 3)
order by 2 desc

SELECT floor(Year(getdate())/10)*10
, floor(year('5/11/2004')/10)*10
, floor(Year('7/23/1689')/10)*10
, floor(Year('7/09/1989')/10)*10

Despite this being an old question, I found this solution by experimenting (note that this is PostgreSQL-style syntax, not SQL Server):
DATE_PART('decade',(year::date)) AS decade,
DATE_TRUNC('decade',(year::date)) AS decade_truncated,
The same also works for centuries:
DATE_PART('century',(year::date)) AS century,
DATE_TRUNC('century',(year::date)) AS century_truncated,

In my case, year is a string column. To get the movies grouped by decade:
SELECT DISTINCT NUMRANGE(
CAST(FLOOR(CAST(year AS INT)/ 10) * 10 AS INT),
CAST((FLOOR(CAST(year AS INT)/ 10) * 10) + 9 AS INT)
) AS "decades", COUNT(*) AS "movie_count"
FROM movies
WHERE year IS NOT NULL AND year != ''
GROUP BY decades
ORDER BY movie_count DESC;
This gives the number of movies in that decade. Hope this one helps someone...

Related

Getting top 10 most popular within an array column using SQL UNNEST

I am working with a sample data set which gives the following result:
Continuing to work, I am now trying to get the top 10 Production Companies (based on "production_companies" field) that made the most number of movies in the most popular genre for a year.
The output
Rank | Production Company | Popular Genre | Movie Count
I thought breaking this down to getting the most popular genre for the year would be the 1st step with the following query:
select
genres.name AS _genre,
FROM
commons.movies m,
UNNEST(m.genres) as genres
WHERE
SUBSTR(m.release_date, 1, 4) = '2008'
GROUP BY
genres.name
ORDER BY
COUNT(genres.name) DESC
LIMIT
1
I have now got 'Drama' as the most popular genre for the year 2008.
Getting the most popular production companies and their movie counts has been more challenging, and my attempts have failed several times.
After several tries I got to:
select
o_prd_cmp.name,
o_mov.title
from
commons.movies o_mov,
unnest(o_mov.genres) as o_gnr,
UNNEST(o_mov.production_companies) AS o_prd_cmp
where
SUBSTR(o_mov.release_date, 1, 4) = '2008'
AND o_gnr.name = (
select
genres.name AS _genre,
FROM
commons.movies m,
UNNEST(m.genres) as genres
WHERE
SUBSTR(m.release_date, 1, 4) = '2008'
GROUP BY
genres.name
ORDER BY
COUNT(genres.name) DESC
LIMIT
1
)
Any help with this is greatly appreciated.
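One possible direction, as a rough sketch only: the query above lists company and title pairs, so what seems to be missing is a GROUP BY on the production company with a COUNT. The example below is not BigQuery; it uses SQLite through Python's sqlite3 module, and it assumes the two array columns have been flattened into hypothetical movie_genres and movie_companies tables (in BigQuery the UNNEST calls would play that role). All rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, release_date TEXT);
    -- Hypothetical flattened versions of the genres / production_companies arrays.
    CREATE TABLE movie_genres (movie_id INTEGER, genre TEXT);
    CREATE TABLE movie_companies (movie_id INTEGER, company TEXT);
    INSERT INTO movies VALUES
        (1, 'M1', '2008-01-10'), (2, 'M2', '2008-05-02'),
        (3, 'M3', '2008-09-30'), (4, 'M4', '2007-03-03');
    INSERT INTO movie_genres VALUES
        (1, 'Drama'), (2, 'Drama'), (2, 'Comedy'),
        (3, 'Comedy'), (3, 'Drama'), (4, 'Comedy');
    INSERT INTO movie_companies VALUES
        (1, 'Acme'), (2, 'Acme'), (2, 'Globex'), (3, 'Initech'), (4, 'Globex');
    """
)

rows = conn.execute(
    """
    WITH top_genre AS (          -- step 1: most popular genre of 2008
        SELECT g.genre
        FROM movies m
        JOIN movie_genres g ON g.movie_id = m.id
        WHERE substr(m.release_date, 1, 4) = '2008'
        GROUP BY g.genre
        ORDER BY COUNT(*) DESC, g.genre
        LIMIT 1
    )
    SELECT c.company, g.genre, COUNT(DISTINCT m.id) AS movie_count
    FROM movies m
    JOIN movie_genres g ON g.movie_id = m.id
    JOIN top_genre t ON t.genre = g.genre
    JOIN movie_companies c ON c.movie_id = m.id
    WHERE substr(m.release_date, 1, 4) = '2008'
    GROUP BY c.company, g.genre  -- step 2: count movies per company
    ORDER BY movie_count DESC, c.company
    LIMIT 10
    """
).fetchall()

# Add the rank column in Python to keep the SQL short.
ranked = [(rank, *row) for rank, row in enumerate(rows, start=1)]
for line in ranked:
    print(line)
```

With this sample data 'Drama' is the top genre for 2008, and the output has the shape Rank | Production Company | Popular Genre | Movie Count.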

find an average of a column using group with inner join and then filtering through the groups

I've been trying to solve an sqlite question where I have two tables: Movies and movie_cast.
Movies has the columns: id, movie_title, and score. Here is a sample of the data:
11|Star Wars|76.496
62|2001:Space Odyssey|39.064
152|Start Trek|26.551
movie_cast has the columns: movie_id, cast_id, cast_name, birthday, popularity. Here is a sample.
11|2|Mark Hamill|9/25/51|15.015
11|3|Harrison Ford|10/21/56|8.905
11|5|Peter Cushing|05/26/13|6.35
In this case movies.id and movie_cast.movie_id are the same.
The question is to find the top ten cast members who have the highest average movie scores.
▪ Do not include movies with score <25 in the average score calculation.
▪ Exclude cast members who have appeared in two or fewer movies.
My query is as below but it doesn't seem to get me the right answer.
SELECT movie_cast.cast_id,
movie_cast.cast_name,
printf("%.2f",CAST(AVG(movies.score) as float)),
COUNT(movie_cast.cast_name)
FROM movies
INNER JOIN movie_cast ON movies.id = movie_cast.movie_id
WHERE movies.score >= 25
GROUP BY movie_cast.cast_id
HAVING COUNT(movie_cast.cast_name) > 2
ORDER BY AVG(movies.score ) DESC, movie_cast.cast_name ASC
LIMIT 10
The answers I get are in the format cast_id, cast_name, avg score.
An example is: 3 Harrison Ford 52.30
I've analyzed and re-analyzed my logic but to no avail. I'm not sure where I'm going wrong. Any help would be great!
Thank you!
This is how I would write the query:
SELECT mc.cast_id,
mc.cast_name,
PRINTF('%.2f', AVG(m.score)) avg_score
FROM movie_cast mc INNER JOIN movies m
ON m.id = mc.movie_id
WHERE m.score >= 25
GROUP BY mc.cast_id, mc.cast_name
HAVING COUNT(*) > 2
ORDER BY AVG(m.score) DESC, mc.cast_name ASC
LIMIT 10;
I use aliases for the tables to shorten the code and make it more readable.
There is no need to cast the average to a float because the average in SQLite is always a real number.
Both occurrences of COUNT(movie_cast.cast_name) can be simplified to COUNT(*), but the first one, in the SELECT list, is not needed by your requirement (if it is, add it back).
The function PRINTF() returns a string, but if you want a number returned then use ROUND():
ROUND(AVG(m.score), 2) avg_score
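Here is a small runnable sketch of the query above (using the ROUND() variant), run through Python's sqlite3 module. The rows are made up for the example, loosely based on the samples in the question, and the birthday and popularity columns are left out for brevity. It shows that the WHERE filter is applied before grouping, so a low-scored movie counts neither toward the average nor toward the "more than two movies" test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, movie_title TEXT, score REAL);
    CREATE TABLE movie_cast (movie_id INTEGER, cast_id INTEGER, cast_name TEXT);
    INSERT INTO movies VALUES
        (11, 'Star Wars', 76.496), (62, '2001: A Space Odyssey', 39.064),
        (152, 'Star Trek', 26.551), (200, 'Low Scorer', 10.0), (201, 'Other', 50.0);
    -- Made-up cast assignments, for illustration only.
    INSERT INTO movie_cast VALUES
        (11, 3, 'Harrison Ford'), (62, 3, 'Harrison Ford'), (152, 3, 'Harrison Ford'),
        (11, 2, 'Mark Hamill'),   (62, 2, 'Mark Hamill'),   (200, 2, 'Mark Hamill'),
        (11, 5, 'Peter Cushing'), (152, 5, 'Peter Cushing'), (201, 5, 'Peter Cushing');
    """
)

rows = conn.execute(
    """
    SELECT mc.cast_id, mc.cast_name, ROUND(AVG(m.score), 2) AS avg_score
    FROM movie_cast mc
    INNER JOIN movies m ON m.id = mc.movie_id
    WHERE m.score >= 25                 -- drop low-scored movies before grouping
    GROUP BY mc.cast_id, mc.cast_name
    HAVING COUNT(*) > 2                 -- counts only the movies that survived WHERE
    ORDER BY AVG(m.score) DESC, mc.cast_name ASC
    LIMIT 10
    """
).fetchall()

for row in rows:
    print(row)
```

In this sample Mark Hamill appears in three movies, but one of them scores below 25, so only two remain after the WHERE clause and he is excluded by HAVING. Whether that is the intended reading of "appeared in two or fewer movies" depends on the assignment.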

Oracle SQL query, getting a maximum of a sum

Hey, guys. I'm struggling to solve one query, just can't get around it.
Basically, I have some tables from a data mart:
DimTheatre(TheatreId(PK), TheatreNo, Name, Address, MainTel);
DimTrow(TrowId(PK), TrowNo, RowName, RowType);
DimProduction(ProductionId(PK), ProductionNo, Title, ProductionDir, PlayAuthor);
DimTime(TimeId(PK), Year, Month, Day, Hour);
TicketPurchaseFact( TheatreId(FK), TimeId(FK), TrowId(FK),
PId(FK), TicketAmount);
What I'm trying to achieve in Oracle is to retrieve the most popular row type in each theatre by value of ticket sales.
What I'm doing now is:
SELECT dthr.theatreid, dthr.name, max(tr.rowtype) keep(dense_rank last order
by tpf.ticketamount), sum(tpf.ticketamount) TotalSale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid
GROUP BY dthr.theatreid, dthr.name;
It does give me output, but the 'TotalSale' column is way off; it gives much higher numbers than it should. How could I approach this issue?
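I am not sure how MAX() KEEP () would help in your case, if I understand the problem correctly. Note also that your query never joins DimTrow to the fact table, so each purchase is paired with every row in DimTrow, which inflates the sum. The approach below should work: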
SELECT x.theatreid, x.name, x.rowtype, x.total_sale
FROM
(SELECT z.theatreid, z.name, z.rowtype, z.total_sale, DENSE_RANK() OVER (PARTITION BY z.theatreid, z.name ORDER BY z.total_sale DESC) as popular_row_rank
FROM
(SELECT dthr.theatreid, dthr.name, tr.rowtype, SUM(tpf.ticketamount) as total_sale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid AND tr.trowid = tpf.trowid
GROUP BY dthr.theatreid, dthr.name, tr.rowtype) z
) x
WHERE x.popular_row_rank = 1;
You want the row type per theatre with the highest ticket amount. So join purchases and rows and then aggregate to get the total per rowtype. Use RANK to rank your row types per theatre and keep only the best-ranked ones. Finally, join with the theatre table to get the theatre name.
select
theatreid,
t.name,
tr.rowtype
from
(
select
p.theatreid,
r.rowtype,
rank() over (partition by p.theatreid order by sum(p.ticketamount) desc) as rn
from ticketpurchasefact p
join dimtrow r using (trowid)
group by p.theatreid, r.rowtype
) tr
join dimtheatre t using (theatreid)
where tr.rn = 1;
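To see both the inflated totals and the ranking approach in action, here is a minimal sketch. It uses SQLite through Python's sqlite3 module instead of Oracle (window functions need SQLite 3.25 or newer), keeps only the columns that matter, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimTheatre (TheatreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DimTrow (TrowId INTEGER PRIMARY KEY, RowType TEXT);
    CREATE TABLE TicketPurchaseFact (TheatreId INTEGER, TrowId INTEGER, TicketAmount INTEGER);
    INSERT INTO DimTheatre VALUES (1, 'Globe'), (2, 'Lyric');
    INSERT INTO DimTrow VALUES (1, 'Stalls'), (2, 'Balcony'), (3, 'Box');
    INSERT INTO TicketPurchaseFact VALUES
        (1, 1, 100), (1, 1, 50), (1, 2, 120),
        (2, 2, 80),  (2, 3, 200), (2, 3, 10);
    """
)

# Without a join condition on DimTrow, every purchase is repeated once per
# row type (3 here), so each theatre's sum comes out three times too large.
inflated = conn.execute(
    """
    SELECT t.TheatreId, SUM(f.TicketAmount)
    FROM TicketPurchaseFact f, DimTheatre t, DimTrow r
    WHERE t.TheatreId = f.TheatreId
    GROUP BY t.TheatreId
    ORDER BY t.TheatreId
    """
).fetchall()

# Aggregate per theatre and row type first, then rank within each theatre.
best = conn.execute(
    """
    SELECT TheatreId, Name, RowType, total_sale
    FROM (
        SELECT z.*, RANK() OVER (PARTITION BY TheatreId ORDER BY total_sale DESC) AS rn
        FROM (
            SELECT t.TheatreId, t.Name, r.RowType, SUM(f.TicketAmount) AS total_sale
            FROM TicketPurchaseFact f
            JOIN DimTheatre t ON t.TheatreId = f.TheatreId
            JOIN DimTrow r ON r.TrowId = f.TrowId
            GROUP BY t.TheatreId, t.Name, r.RowType
        ) z
    ) x
    WHERE rn = 1
    ORDER BY TheatreId
    """
).fetchall()

print(inflated)
print(best)
```

The real per-theatre totals in this sample are 270 and 290; the first query reports 810 and 870 because of the missing join condition.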

Ratio or Percentage from group by SQL query from column with condition and without condition

I am having some trouble with a SQL query. From a table let's call it Reports:
I want to group all the reports by the name column.
Then for each of those name groups I want to go to the rating column and count the number of times the rating was 15 or less. Let's say this happened 10 times for one of the groups with the name BOBBO.
I also want to know the number of times ratings were submitted (same as total number of records for each name group). So using the name group BOBBO let's say he has 20 ratings.
So under the condition the group BOBBO 50% of the time has a rating 15 or less.
I've seen these posts -- I am still having some trouble cracking this.
using-count-and-return-percentage-against-sum-of-records
getting-two-counts-and-then-dividing-them
getting-a-percentage-from-mysql-with-a-group-by-condition-and-precision
divide-two-counts-from-one-select
After reading those I tried queries like these:
ActiveRecord::Base.connection.execute
("SELECT COUNT(*) Matched,
(select COUNT(rating) from reports group by name) Total,
CAST(COUNT(*) AS FLOAT)/CAST((SELECT COUNT(*) FROM reports group by name) AS FLOAT)*100 Percentage from reports
where rating <= 15 order by Percentage")
ActiveRecord::Base.connection.execute
("select name, sum(rating) / count(rating) as bad_rating
from reports group by name having bad_rating <= 15")
Any help would be very much appreciated!
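Consider a conditional aggregate for the bad ratings divided by the full count. Multiplying by 100.0 gives a percentage and avoids integer division, which on many databases would otherwise truncate the ratio to 0:
SELECT [name],
100.0 * SUM(CASE WHEN [rating] <= 15 THEN 1 ELSE 0 END) / Count(*) AS bad_rating
FROM Reports
GROUP BY [name]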
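Or, as @CL. points out, a shorter conditional aggregate where the logical expression itself is summed (this works in databases where a comparison evaluates to 0 or 1, such as SQLite):
SELECT [name],
100.0 * SUM([rating] <= 15) / Count(*) AS bad_rating
FROM Reports
GROUP BY [name]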
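One thing worth checking with this kind of ratio is integer division. The sketch below runs the conditional aggregate against an in-memory SQLite database through Python's sqlite3 module; the Reports rows are made up, and the behaviour shown is SQLite's (other databases may divide differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (name TEXT, rating INTEGER)")
# Made-up ratings: BOBBO has 2 of 4 at 15 or less, ALICE has 1 of 3.
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("BOBBO", 10), ("BOBBO", 12), ("BOBBO", 20), ("BOBBO", 30),
     ("ALICE", 5), ("ALICE", 40), ("ALICE", 50)],
)

# Integer / integer: SQLite truncates, so every ratio below 1 becomes 0.
int_rows = conn.execute(
    """
    SELECT name, SUM(CASE WHEN rating <= 15 THEN 1 ELSE 0 END) / COUNT(*)
    FROM reports GROUP BY name ORDER BY name
    """
).fetchall()

# Multiplying by 100.0 first forces floating-point arithmetic.
pct_rows = conn.execute(
    """
    SELECT name,
           ROUND(100.0 * SUM(CASE WHEN rating <= 15 THEN 1 ELSE 0 END) / COUNT(*), 1)
    FROM reports GROUP BY name ORDER BY name
    """
).fetchall()

print(int_rows)
print(pct_rows)
```

If you only want names above or below a threshold, the same expression can go into a HAVING clause.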

SQL - Get consecutively minimum numbers

Title may not make sense so I will provide some context.
I have a table, call it Movies.
A movie tuple has the values: Name, Director, Genre, Year
I'm trying to create a query that returns all Directors who have never released two consecutive Horror films more than 4 years apart.
I'm not sure where I'd begin but I'm trying to start off by creating a query that given some specific year, returns the next minimum year, so that I can check if the difference between these two is less than 4, and keep doing that for all movies.
My attempt was:
SELECT D1.Director
FROM Movies D1
WHERE D1.Director NOT IN
(SELECT D2.Director FROM Director D2
WHERE D2.Director = D1.Director
AND D2.Genre = 'Horror'
AND D1.Genre = 'Horror' AND D2.Year - D1.Year > 4
OR D1.Year - D2.Year > 4)
which does not work for obvious reasons.
I've also had a few attempts using joins, and it works on films that follow a pattern such as 2000, 2003, 2006, but fail if more than 3 films.
You could try this:
Select all horror films and use LAG or LEAD to return the previous or next year per director. After that, look at the difference between the two.
WITH TempTable AS (
SELECT
Director,
Year,
-- PriorYear is NULL for a director's first horror film
LAG(Year) OVER (PARTITION BY Director ORDER BY Year ASC) AS PriorYear
FROM
Movies
WHERE
Genre = 'Horror'
)
SELECT
Director
FROM
TempTable
GROUP BY
Director
HAVING
MAX(Year - PriorYear) <= 4
Try this:
SELECT DISTINCT director FROM (
SELECT m1.director, m1.name, min(m2.year-m1.year) as diff
FROM `movies` m1, movies m2
WHERE m1.director = m2.director and m1.name <> m2.name and m1.year<=m2.year
and m1.genre='horror' and m2.genre='horror'
group by m1.director, m1.name
) d1 WHERE diff>4
The inner SELECT pairs each horror movie with the same director's later horror movies and keeps the minimum year difference per movie (the gap to the next film, for consecutiveness). The outer SELECT then returns the directors that have a gap of more than 4 years; the directors you want are those not in this list.
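For comparison, here is a minimal runnable sketch of the window-function idea: compute the gap to the previous horror film per director, then keep the directors whose largest gap is at most 4 years. It uses SQLite (3.25 or newer for LAG) through Python's sqlite3 module, and the directors and years are made up. It assumes a director with fewer than two horror films should not be returned; if they should, that case needs separate handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, director TEXT, genre TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        # Ann: gaps of 3 and 3 years -> qualifies
        ("A1", "Ann", "Horror", 2000), ("A2", "Ann", "Horror", 2003), ("A3", "Ann", "Horror", 2006),
        # Bob: gaps of 2 and 8 years -> does not qualify
        ("B1", "Bob", "Horror", 1990), ("B2", "Bob", "Horror", 1992), ("B3", "Bob", "Horror", 2000),
        # Cy: only one horror film, so there is no gap to compare
        ("C1", "Cy", "Horror", 1981), ("C2", "Cy", "Comedy", 1990),
    ],
)

rows = conn.execute(
    """
    WITH gaps AS (
        SELECT director,
               -- NULL for each director's first horror film
               year - LAG(year) OVER (PARTITION BY director ORDER BY year) AS gap
        FROM movies
        WHERE genre = 'Horror'
    )
    SELECT director
    FROM gaps
    GROUP BY director
    HAVING MAX(gap) <= 4   -- MAX ignores NULLs; all-NULL groups are dropped
    ORDER BY director
    """
).fetchall()

directors = [r[0] for r in rows]
print(directors)
```

Partitioning by director only (not by film name) is what makes LAG look at the director's previous horror film.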