Top 1 Profitable movie for each decade [duplicate] - sql

I need help with a SQL Server query. I need to extract only the top 1 profitable movie for each decade.
Suppose there are 20 distinct decades and I only want to extract the top 1 profitable movie for each decade. Can someone help me with the query?
I have attached a screenshot for reference. My result shows all the profitable movies for each decade, but I only want the top 1 profitable movie for each decade.
Select
decade, Movie_Title, Profit
from
DW.IMDB_MOVIE
group by
decade, Movie_Title, profit
order by
decade, profit desc

One option is using WITH TIES in concert with the window function row_number()
Example
Select top 1 with ties *
From DW.IMDB_MOVIE
Order by row_number() over (partition by decade order by profit desc)
Or a nudge more performant
with cte as (
Select *
,RN = row_number() over (partition by decade order by profit desc)
From DW.IMDB_MOVIE
)
Select *
From cte
Where RN=1
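As a minimal sketch of the CTE variant, the snippet below runs the same pattern against an in-memory SQLite database from Python. SQLite (3.25 or newer, for window functions) is only a stand-in for SQL Server here, and the table name imdb_movie and its rows are made up for illustration. TOP 1 WITH TIES and the RN = ... alias form are SQL Server syntax, so the sketch uses the portable AS rn spelling instead.

```python
import sqlite3

# In-memory database with a hypothetical, simplified movie table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imdb_movie (decade INTEGER, movie_title TEXT, profit INTEGER)"
)
conn.executemany(
    "INSERT INTO imdb_movie VALUES (?, ?, ?)",
    [
        (1990, "Movie A", 500),
        (1990, "Movie B", 900),
        (2000, "Movie C", 300),
        (2000, "Movie D", 100),
        (2010, "Movie E", 700),
    ],
)

# Number the rows within each decade by descending profit, then keep row 1.
top_per_decade = conn.execute(
    """
    WITH cte AS (
        SELECT decade, movie_title, profit,
               ROW_NUMBER() OVER (PARTITION BY decade ORDER BY profit DESC) AS rn
        FROM imdb_movie
    )
    SELECT decade, movie_title, profit
    FROM cte
    WHERE rn = 1
    ORDER BY decade
    """
).fetchall()

for row in top_per_decade:
    print(row)
```

With these sample rows, exactly one movie comes back per decade. If two movies in a decade had the same profit, ROW_NUMBER() would still return only one of them, chosen arbitrarily unless a tiebreaker column is added to the ORDER BY.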

Related

Snowflake SQL code to show only second record for items with duplicate ID

I'm trying to get my head around SQL and am using Snowflake as a testbed to do this. I have a table with products which have multiple reviews against them. I am trying to structure a query to only show products with 2 or more reviews and then only show the second review. As I say, this is merely me trying to better understand SQL so selecting the second review is a random ask. The table is made up of 4 columns. 1 is Product ID, 2 is Product Name, 3 is Review and 4 is Date Review was posted.
Thanks in advance for any help.
You use row_number() for this type of query:
select t.*
from (select t.*,
row_number() over (partition by product_id order by date_review asc) as seqnum
from t
) t
where seqnum = 2;
You can use a windowing function like ROW_NUMBER() to make numbered groupings, e.g.:
WITH Review_Sequence AS (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Review_Date) Review_No
FROM Reviews r
)
SELECT * FROM Review_Sequence WHERE Review_No = 2
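To see why filtering on the sequence number also takes care of the "2 or more reviews" condition, here is a small sketch using SQLite through Python as a stand-in for Snowflake. The table name reviews, its column names, and the rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    "product_id INTEGER, product_name TEXT, review TEXT, review_date TEXT)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        (1, "Kettle", "first", "2021-01-05"),
        (1, "Kettle", "second", "2021-02-10"),
        (1, "Kettle", "third", "2021-03-15"),
        (2, "Toaster", "only one", "2021-01-20"),
        (3, "Blender", "first", "2021-04-01"),
        (3, "Blender", "second", "2021-04-02"),
    ],
)

# Number each product's reviews by date; a product with a single review
# never gets a row numbered 2, so it drops out without an extra filter.
second_reviews = conn.execute(
    """
    WITH review_sequence AS (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY product_id
                                  ORDER BY review_date) AS review_no
        FROM reviews r
    )
    SELECT product_id, product_name, review
    FROM review_sequence
    WHERE review_no = 2
    ORDER BY product_id
    """
).fetchall()

print(second_reviews)
```

The Toaster row is absent from the result because its only review is numbered 1.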

Retrieving the top 20% of a particular column in Oracle SQL? [duplicate]

I've created a database where workers have a charging rate and work a particular number of hours. From this I've done a select statement which simply displays some information about the worker, and then I've created a new column called Earnings which uses rate_per_hour * task_hours to give their total earnings.
My question is, is there a way to only display the top 20% highest earners based on the new Earnings column created in the select statement?
So far I have this:
SELECT worker.worker_id
,worker_first_name
,worker_surname
,worker_case_id
,task_hours
,rate.rate_id
,rate_per_hour
,task_hours * rate_per_hour AS Earnings
FROM worker
,note
,rate
WHERE worker.worker_id = note.worker_id
AND rate.rate_id = note.rate_id;
I just need to display the top 20% of earnings based on that new column I've made. Is this possible?
Thanks, apologies for my lack of experience!
First, you should use explicit join syntax. Second, you can do what you want using percentile_cont() or percentile_disc(). However, I often do this using row_number() and count():
SELECT wnr.*
FROM (SELECT w.worker_id, worker_first_name, worker_surname, worker_case_id,
task_hours, r.rate_id, rate_per_hour,
task_hours * rate_per_hour AS Earnings,
row_number() over (order by task_hours * rate_per_hour desc) as seqnum,
count(*) over () as cnt
FROM worker w JOIN
note n
ON w.worker_id = n.worker_id JOIN
rate r
ON r.rate_id = n.rate_id
) wnr
WHERE seqnum <= cnt * 0.2;
You might also use the rank analytic function instead of row_number if you want equal earnings to receive an equal rank.
Select * from (
Select employee,earnings,rank() over (order by earnings desc )/count(*) over() As top from employees)
Where top<=0.2;
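The difference between the two approaches only shows up when earnings tie at the cut-off. The sketch below uses SQLite through Python as a stand-in for Oracle, with a made-up earnings table of ten workers; the helper top_fifth is hypothetical. The filter is written as seq * 5 <= cnt, which is the integer form of seq <= cnt * 0.2 and avoids both floating-point comparison and SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (worker_id INTEGER, earnings INTEGER)")
amounts = [100, 90, 90, 80, 70, 60, 50, 40, 30, 20]  # two workers tie on 90
conn.executemany(
    "INSERT INTO earnings VALUES (?, ?)",
    [(i + 1, amount) for i, amount in enumerate(amounts)],
)


def top_fifth(ranking_function):
    """Return the top 20% of rows according to the given window expression."""
    sql = f"""
        SELECT worker_id, earnings
        FROM (SELECT worker_id, earnings,
                     {ranking_function} AS seq,
                     COUNT(*) OVER () AS cnt
              FROM earnings) e
        WHERE seq * 5 <= cnt          -- same as seq <= cnt * 0.2
        ORDER BY earnings DESC, worker_id
    """
    return conn.execute(sql).fetchall()


# ROW_NUMBER gives every row a distinct position (worker_id breaks the tie).
by_row_number = top_fifth("ROW_NUMBER() OVER (ORDER BY earnings DESC, worker_id)")
# RANK gives tied rows the same position, so both 90s make the cut.
by_rank = top_fifth("RANK() OVER (ORDER BY earnings DESC)")

print(by_row_number)
print(by_rank)
```

With ten rows, ROW_NUMBER returns exactly two workers, while RANK returns three because both workers earning 90 share rank 2.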

Calculating Top Ten Categories in SSIS

I'm using an SSIS package to update my contents on a daily basis. There are thousands of contents with different Moderation IDs, and I want to calculate the top ten categories FOR EACH Moderation ID. Before I realized that I should calculate it for each ModerationId, I used this query to get the Contents to be updated:
SELECT TOP 10 ModerationId, Category, COUNT(ContentSeqNum) AS Total FROM Content
WHERE Category IS NOT NULL
GROUP BY ModerationId, Category ORDER BY ModerationId, Total DESC
That was a faulty approach, because this query returns only ten rows across all the data, whereas there should be a separate top ten categories for each ModerationId.
How can I change this query to calculate Top 10 Categories for each ModerationId?
Use a window function to calculate the top ten categories for each Moderation ID. Try this.
SELECT moderationid,
category,
total
FROM (SELECT Row_number() OVER (partition BY moderationid
ORDER BY Count(contentseqnum) DESC) Rn,
moderationid,
category,
Count(contentseqnum) AS Total
FROM content
WHERE category IS NOT NULL
GROUP BY moderationid,
category) A
WHERE rn <= 10
Use Row_number() function
select * from
(
select ModerationId, Category, count(ContentSeqNum) as Total,
row_number() over(partition by ModerationId order by count(ContentSeqNum) desc) as sno
from Content WHERE Category IS NOT NULL
group by ModerationId, Category
) as t
where sno<=10
Find more methods at http://beyondrelational.com/modules/2/blogs/70/posts/10845/return-top-n-rows.aspx
Try this:
SELECT m.ModerationId, c.Category, c.Total
FROM (SELECT DISTINCT ModerationId FROM Content) m
CROSS APPLY (SELECT TOP(10) Category, COUNT(ContentSeqNum) AS Total
FROM Content
WHERE ModerationId = m.ModerationId AND Category IS NOT NULL
GROUP BY Category
ORDER BY Total DESC) c
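The per-ModerationId top-N idea from this thread can be sketched end to end as follows, using SQLite through Python as a stand-in for SQL Server. The rows are made up, TOP_N is set to 2 to keep the output short, and the counting and numbering are split into two CTEs purely for readability; none of this is taken from the original package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content ("
    "contentseqnum INTEGER PRIMARY KEY, moderationid INTEGER, category TEXT)"
)
rows = [
    (1, "news"), (1, "news"), (1, "news"),
    (1, "sports"), (1, "sports"),
    (1, "tech"),
    (2, "music"), (2, "music"),
    (2, "film"),
    (2, None),  # rows without a category are excluded by the WHERE clause
]
conn.executemany(
    "INSERT INTO content (moderationid, category) VALUES (?, ?)", rows
)

TOP_N = 2  # hypothetical cut-off; the question asks for 10

top_categories = conn.execute(
    """
    WITH counts AS (
        -- Step 1: one row per (moderationid, category) with its count.
        SELECT moderationid, category, COUNT(contentseqnum) AS total
        FROM content
        WHERE category IS NOT NULL
        GROUP BY moderationid, category
    ),
    ranked AS (
        -- Step 2: number the categories inside each moderationid,
        -- largest count first (category name breaks ties).
        SELECT counts.*,
               ROW_NUMBER() OVER (PARTITION BY moderationid
                                  ORDER BY total DESC, category) AS rn
        FROM counts
    )
    SELECT moderationid, category, total
    FROM ranked
    WHERE rn <= ?
    ORDER BY moderationid, rn
    """,
    (TOP_N,),
).fetchall()

for row in top_categories:
    print(row)
```

Each ModerationId gets its own numbering, so the cut-off applies per group rather than to the whole result set.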

I need the Top 10 results from table

I need to get the Top 10 results for each Region, Market and Name, that is, those with the highest counts (Gaps). There are 4 Regions with 1 to N Markets. I can get the Top 10 but cannot figure out how to do this without using a Union for every Market. Any ideas on how to do this?
SELECT DISTINCT TOP 10
Region, Market, Name, Gaps
FROM
TableName
ORDER BY
Region, Market, Gaps DESC
One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your Region, Market, Name - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH RegionsMarkets AS
(
SELECT
Region, Market, Name, Gaps,
RN = ROW_NUMBER() OVER(PARTITION BY Region, Market, Name ORDER BY Gaps DESC)
FROM
dbo.TableName
)
SELECT
Region, Market, Name, Gaps
FROM
RegionsMarkets
WHERE
RN <= 10
Here, I am selecting only the first 10 entries for each "partition" (i.e. for each Region, Market, Name tuple), ordered by Gaps in descending order. Does that approach what you're looking for?
I think you want row_number():
select t.*
from (select t.*,
row_number() over (partition by region, market order by gaps desc) as seqnum
from tablename t
) t
where seqnum <= 10;
I am not sure if you want name in the partition by clause. If you have more than one name within a market, that may be what you are looking for. (Hint: Sample data and desired results can really help clarify a question.)
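How much the choice of PARTITION BY columns matters can be seen in a small sketch, here with SQLite through Python standing in for SQL Server. The rows and the helper top_n are made up for illustration, and the cut-off is 2 rather than 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (region TEXT, market TEXT, name TEXT, gaps INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("East", "NY", "a", 9),
        ("East", "NY", "b", 7),
        ("East", "NY", "c", 5),
        ("West", "LA", "d", 4),
        ("West", "LA", "e", 8),
        ("West", "LA", "f", 6),
    ],
)


def top_n(partition_columns, n):
    """Top n rows by gaps within each partition (hypothetical helper)."""
    sql = f"""
        SELECT region, market, name, gaps
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY {partition_columns}
                                        ORDER BY gaps DESC) AS seqnum
              FROM tablename t) ranked
        WHERE seqnum <= ?
        ORDER BY region, market, gaps DESC
    """
    return conn.execute(sql, (n,)).fetchall()


per_market = top_n("region, market", 2)
per_name = top_n("region, market, name", 2)

print(per_market)     # two rows per market
print(len(per_name))  # every row survives: each name is its own partition
```

When every (region, market, name) combination occurs only once, partitioning by all three columns numbers each row 1, so the filter removes nothing. Partitioning by region and market alone gives the top rows per market.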

Write an Oracle query to get top 10 products for top 5000 stores [duplicate]

I am looking for an Oracle query to get top 5000 stores and for each store get top 10 products and for each top 10 products get top 5 sub-products. So In total I should get 5000*10*5 rows.
Can someone help me get this using Oracle's analytic functions?
My current query looks like
SELECT
store_id,
product,
sub_product,
count(*) as sales
FROM stores_data
GROUP BY store_id, product, sub_product;
Please assume the table name is stores_data, with columns store_id, product and sub_product.
You should use dense_rank to get the top N rows.
Something like
SELECT
storeid,
store,
productid,
product,
subproductid,
subproduct
FROM
(
SELECT
s.storeid,
s.store,
p.productid,
p.product,
sp.subproductid,
sp.subproduct,
dense_rank() over ( order by s.storeid) as storerank,
dense_rank() over ( partition by s.storeid
order by p.productid) as productrank,
dense_rank() over ( partition by s.storeid, p.productid
order by sp.subproductid) as subproductrank
FROM
stores s
INNER JOIN products p on p.storeid = s.storeid
INNER JOIN subproduct sp on sp.productid = p.productid
) t
WHERE
t.storerank <= 5000 and
t.productrank <= 10 and
t.subproductrank <= 5
Of course, I don't know your tables or the relations between them, nor the actual fields and conditions you want to check, so this is just a simple query that gets the top N records based on their id. Also, this query expects a product to belong to only one store, which might not be the case. At least it shows how to use dense_rank to get three-layered sorting and filtering.
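For readers weighing dense_rank against the other ranking functions, the following sketch shows how ROW_NUMBER, RANK and DENSE_RANK number the same rows when two of them tie. It uses SQLite through Python as a stand-in for Oracle, and the store_sales table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [("A", 50), ("B", 50), ("C", 40), ("D", 30)],  # A and B tie
)

ranking = conn.execute(
    """
    SELECT store, sales,
           ROW_NUMBER() OVER (ORDER BY sales DESC, store) AS row_num,
           RANK()       OVER (ORDER BY sales DESC)        AS rnk,
           DENSE_RANK() OVER (ORDER BY sales DESC)        AS dense_rnk
    FROM store_sales
    ORDER BY sales DESC, store
    """
).fetchall()

for row in ranking:
    print(row)
```

ROW_NUMBER always yields 1, 2, 3, 4. RANK gives the tied stores 1 and 1 and then skips to 3, while DENSE_RANK gives 1 and 1 and continues with 2. A filter such as "rank <= N" therefore keeps a different number of rows depending on which function is used.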
I'll leave the other answer because that looks more like how such a table structure should be, I think.
But in your other thread you described a table that looks like this:
create table store_data (
store varchar2(40),
product varchar2(40),
subproduct varchar2(40),
sales int);
That actually looks like data that is already aggregated and that you now want to analyze again. Your query could look like this. It first aggregates the sum of the sales, so you can order shops and products by sales too (the sales in the table seem to be for the subproducts). After that, you can add ranks to the shops and products by sales. I added a rank to the subproducts too. I used rank here, so there is a gap in the numbering when several records have the same sales. This way, when you have 8 records with a rank of 1 because they all have the same sales, the 9th record will actually have rank 9 instead of 2, so you will select only the 8 top stores (you wanted 5, but why skip the other 3 if they sold exactly the same?) and not 4 others too.
select
ts.*
from
(
select
ss.*,
rank() over (order by storesales desc) as storerank,
rank() over (partition by store order by productsales desc) as productrank,
rank() over (partition by store, product order by subproductsales desc) as subproductrank
from
(
select
sd.*,
sum(sales) over (partition by store) as STORESALES,
sum(sales) over (partition by store, product) as PRODUCTSALES,
sum(sales) over (partition by store, product, subproduct) as SUBPRODUCTSALES
from
store_data sd
) ss
) ts
where
ts.storerank <= 2 and
ts.productrank <= 3 and
ts.subproductrank <= 4
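One thing worth checking before relying on a query of this shape: the ranks are computed over subproduct-level rows, so a store with many rows occupies many rank positions. The sketch below illustrates this with SQLite through Python as a stand-in for Oracle, using made-up rows in store_data. It compares RANK with DENSE_RANK at the store level only; whether the gap matters depends on how many rows each store has in your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_data ("
    "store TEXT, product TEXT, subproduct TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO store_data VALUES (?, ?, ?, ?)",
    [
        ("A", "p1", "s1", 30), ("A", "p1", "s2", 20), ("A", "p2", "s3", 10),
        ("B", "p1", "s1", 40), ("B", "p2", "s4", 10),
        ("C", "p3", "s5", 10),
    ],
)

# Rank every row by its store's total sales, then collapse to one row per store.
store_ranks = conn.execute(
    """
    SELECT DISTINCT store, storesales, row_level_rank, dense_store_rank
    FROM (SELECT ss.*,
                 RANK()       OVER (ORDER BY storesales DESC) AS row_level_rank,
                 DENSE_RANK() OVER (ORDER BY storesales DESC) AS dense_store_rank
          FROM (SELECT sd.*,
                       SUM(sales) OVER (PARTITION BY store) AS storesales
                FROM store_data sd) ss) ranked
    ORDER BY storesales DESC
    """
).fetchall()

for row in store_ranks:
    print(row)
```

Store A has three rows, so with RANK the second-best store B starts at rank 4, and a filter like storerank <= 2 would return only store A. DENSE_RANK numbers the stores 1, 2, 3, at the cost of treating tied stores as a single position.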