Select highest profit from each year SQL - sql

How do I obtain the highest value for each year within a table? Let's say we have a table Movies and I want to find the most profitable film for each year.
This is my attempt so far:
SELECT year, MAX(income - cost) AS profit, title
FROM Movies m, Movies m2
GROUP BY year
I am pretty certain it is going to need some sub selects but I can't visualise what I need to do. I was also thinking probably some sort of distinct option to rule out duplicate years.
Title Year Income Cost Length
A 2000 10 2 2
B 2000 9 7 2
So from this the expected result would be
Title Year Profit
A 2000 8

I'm guessing slightly at what you want, but since you've not specified any RDBMS a generic solution would be:
SELECT m.Year, (m.Income - m.Cost) AS Profit, m.Title
FROM Movies m
INNER JOIN
( SELECT m2.Year, MAX(m2.Income - m2.Cost) AS Profit
FROM Movies m2
GROUP BY m2.Year
) MaxProfit
ON MaxProfit.Year = m.Year
AND MaxProfit.Profit = (m.Income - m.Cost)
ORDER BY m.Year
You can also do this with analytic functions if your DBMS supports them, e.g. in SQL Server:
WITH MovieCTE AS
( SELECT m.Year,
Profit = (m.Income - m.Cost),
m.Title,
RowNumber = ROW_NUMBER() OVER(PARTITION BY m.Year ORDER BY (m.Income - m.Cost) DESC)
FROM Movies m
)
SELECT year, Profit, Title
FROM MovieCTE
WHERE RowNumber = 1
It is possible I have misunderstood your exact criteria, but the same principles apply: you will just need to alter the grouping and the join in the first example, or the PARTITION BY in the second.
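To check the join-against-MAX idea on the two sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The column types are my assumption (the question does not give a schema), and SQLite is only used here because it is easy to run; the query itself is plain SQL.

```python
import sqlite3

# In-memory database holding the two sample rows from the question.
# Column types are assumed; the question only shows names and values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (Title TEXT, Year INTEGER, Income INTEGER, Cost INTEGER, Length INTEGER)"
)
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?, ?, ?)",
    [("A", 2000, 10, 2, 2), ("B", 2000, 9, 7, 2)],
)

# Step 1 (derived table): the best profit per year.
# Step 2 (join): pull back the row(s) whose profit equals that maximum.
query = """
SELECT m.Year, (m.Income - m.Cost) AS Profit, m.Title
FROM Movies m
INNER JOIN
    ( SELECT Year, MAX(Income - Cost) AS Profit
      FROM Movies
      GROUP BY Year
    ) MaxProfit
    ON MaxProfit.Year = m.Year
    AND MaxProfit.Profit = (m.Income - m.Cost)
ORDER BY m.Year
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2000, 8, 'A')]
```

Note that if two films tie for the best profit in a year, this form returns both of them.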

select m1.m1year, m1.m1profit, m2.title
from
(select year as m1year, max(income - cost) as m1profit from movies group by year) m1
join
(select year as m2year, (income - cost) as m2profit, title from movies) m2
on
m1.m1year = m2.m2year
and m1.m1profit = m2.m2profit

This will give the highest-profit movie for each year, and pick the alphabetically first title in the event of a tie:
select a.year, a.profit,
(select min(title) from Movies where year = a.year and income - cost = a.profit) as title
from (
select year, max(income - cost) as profit
from Movies -- title, year, cost, income, number
group by year
) as a
order by year desc

Related

SQL get top 3 values / bottom 3 values with group by and sum

I am working on a restaurant management system. There I have two tables
order_details(orderId,dishId,createdAt)
dishes(id,name,imageUrl)
My customer wants to see a report top 3 selling items / least selling 3 items by the month
For the moment I did something like this
SELECT
*
FROM
(SELECT
SUM(qty) AS qty,
order_details.dishId,
MONTHNAME(order_details.createdAt) AS mon,
dishes.name,
dishes.imageUrl
FROM
rms.order_details
INNER JOIN dishes ON order_details.dishId = dishes.id
GROUP BY order_details.dishId , MONTHNAME(order_details.createdAt)) t
ORDER BY t.qty
This gives me all the dishes sold count order by qty.
I have to manually filter max 3 records and reject the rest. There should be a SQL way of doing this. How do I do this in SQL?
You would use row_number() for this purpose. You don't specify the database you are using, so I am guessing at the appropriate date functions. I also assume that you mean a month within a year, so you need to take the year into account as well:
SELECT ym.*
FROM (SELECT YEAR(od.CreatedAt) as yyyy,
MONTH(od.createdAt) as mm,
SUM(qty) AS qty,
od.dishId, d.name, d.imageUrl,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_desc,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) ASC) as seqnum_asc
FROM rms.order_details od INNER JOIN
dishes d
ON od.dishId = d.id
GROUP BY YEAR(od.CreatedAt), MONTH(od.CreatedAt), od.dishId
) ym
WHERE seqnum_asc <= 3 OR
seqnum_desc <= 3;
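To see the two-row-number trick working end to end, here is a small sketch run through Python's sqlite3 module. Everything about the data is made up for illustration, the qty column on order_details is assumed from the question's query, and strftime stands in for YEAR()/MONTH() because SQLite does not have those. It needs SQLite 3.25 or newer for window functions. I also aggregate in a CTE first and rank afterwards, which is a slightly different shape from the query above but the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: five dishes, all sold in January 2023.
conn.executescript("""
CREATE TABLE dishes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_details (orderId INTEGER, dishId INTEGER, qty INTEGER, createdAt TEXT);
INSERT INTO dishes VALUES (1,'Pizza'),(2,'Pasta'),(3,'Soup'),(4,'Salad'),(5,'Cake');
INSERT INTO order_details VALUES
  (1, 1, 4, '2023-01-03'), (2, 1, 5, '2023-01-20'),
  (3, 2, 7, '2023-01-05'), (4, 3, 5, '2023-01-09'),
  (5, 4, 2, '2023-01-11'), (6, 5, 1, '2023-01-15');
""")

query = """
WITH monthly AS (
    -- total quantity per dish per calendar month
    SELECT strftime('%Y-%m', od.createdAt) AS yyyymm, d.name, SUM(od.qty) AS qty
    FROM order_details od
    JOIN dishes d ON od.dishId = d.id
    GROUP BY strftime('%Y-%m', od.createdAt), d.id, d.name
),
ranked AS (
    -- one ranking from the top, one from the bottom, both per month
    SELECT monthly.*,
           ROW_NUMBER() OVER (PARTITION BY yyyymm ORDER BY qty DESC) AS seqnum_desc,
           ROW_NUMBER() OVER (PARTITION BY yyyymm ORDER BY qty ASC)  AS seqnum_asc
    FROM monthly
)
SELECT yyyymm, name, qty
FROM ranked
WHERE seqnum_desc <= :n OR seqnum_asc <= :n
ORDER BY yyyymm, seqnum_desc
"""
# n = 2 keeps the example small; the question asks for 3.
rows = conn.execute(query, {"n": 2}).fetchall()
print(rows)
# [('2023-01', 'Pizza', 9), ('2023-01', 'Pasta', 7), ('2023-01', 'Salad', 2), ('2023-01', 'Cake', 1)]
```

Soup sits in the middle of the month's ranking, so neither condition keeps it.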
Using the above info, I used a combination of GROUP BY, ORDER BY and LIMIT, as shown below. I hope this is what you are looking for:
SELECT
t.qty,
t.dishId,
t.month,
d.name,
d.imageUrl
from
(
SELECT
od.dishId,
count(od.dishId) AS 'qty',
date_format(od.createdAt,'%Y-%m') as 'month'
FROM
rms.order_details od
group by date_format(od.createdAt,'%Y-%m'),od.dishId
order by qty desc
limit 3) t
join rms.dishes d on (t.dishId = d.id)

Using SQLite, how can I calculate the maximum year on year growth rate for each year?

I am learning about SQL and I am doing a practice exercise called World Populations SQL Practice on Codecademy. There is one table with three columns: country, population, and year. I am interested in calculating the country with the maximum year-on-year growth rate each year. (This wasn't suggested by Codecademy, I just think it's an interesting idea).
I can calculate all of the year-on-year growth rates with this query:
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
and I can calculate the maximum year-on-year growth rate for a particular year, such as 2005, with a query such as this:
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year = 2005
AND year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth DESC
LIMIT 1;
Using python, I can solve the problem using the first query saved as yoy_query if I do this:
yoy_result = c.execute(yoy_query).fetchall()
sorted([record for record in yoy_result if record[1] == max([row[1] for row in yoy_result if row[2] == record[2]])],key=lambda x:x[2])
and I get the desired result:
[('Montserrat', 7.34177215189872, 2000), ('Montserrat', 13.4433962264151, 2001), ('Afghanistan', 5.803891762260126, 2002), ('Montserrat', 10.467706013363028, 2003), ('Liberia', 4.7976709085316545, 2004), ('Jordan', 7.088496587486171, 2005), ('Jordan', 6.764378108744186, 2006), ('Montserrat', 12.638580931263864, 2007), ('Liberia', 4.157111008408977, 2008), ('Niger', 3.737166190281749, 2009)]
But I can't think of a way to do this using SQL. Any ideas? I think the reason it seems much easier in python is because I'm able to save the intermediate result, then run a second calculation on that.
You can do it with window functions LAG() and RANK():
select country, year_on_year_growth, year
from (
select *, rank() over (partition by year order by year_on_year_growth desc) as rnk
from (
select *,
100.0 * (population / lag(population) over (partition by country order by year) - 1) as year_on_year_growth
from population_years
)
where year_on_year_growth is not null
)
where rnk = 1
order by year
The expression:
lag(population) over (partition by country order by year)
returns the country's population the previous year (assuming that there are no gaps between the years).
So I calculated the growth rate as:
((current year's population) / (previous year's population)) - 1
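Here is a small sketch of the LAG() plus RANK() approach that can be run with Python's sqlite3 module (SQLite 3.25+ is needed for window functions). The two countries and their populations are invented for illustration, and I assume population is stored as a REAL so the division is not truncated. Note that with LAG() the growth is attached to the later of the two years, whereas the queries in the question attach it to the earlier one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two countries, three consecutive years each.
conn.execute("CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)")
conn.executemany("INSERT INTO population_years VALUES (?, ?, ?)", [
    ("A", 100.0, 2000), ("A", 110.0, 2001), ("A", 121.0, 2002),
    ("B", 50.0, 2000), ("B", 60.0, 2001), ("B", 63.0, 2002),
])

query = """
SELECT country, year_on_year_growth, year
FROM (
    SELECT *, RANK() OVER (PARTITION BY year ORDER BY year_on_year_growth DESC) AS rnk
    FROM (
        -- LAG() fetches the same country's population from the previous row by year
        SELECT *,
               100.0 * (population / LAG(population) OVER (PARTITION BY country ORDER BY year) - 1)
                   AS year_on_year_growth
        FROM population_years
    )
    WHERE year_on_year_growth IS NOT NULL   -- first year per country has no previous row
)
WHERE rnk = 1                               -- keep only the top country per year
ORDER BY year
"""
# Round in Python to hide floating-point noise in the percentages.
rows = [(c, round(g, 2), y) for c, g, y in conn.execute(query)]
print(rows)  # [('B', 20.0, 2001), ('A', 10.0, 2002)]
```

RANK() keeps ties, so two countries with exactly the same best growth in a year would both be returned.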
I guess the simplest thing to do would actually be to just use a view as follows:
CREATE VIEW yoy_growth
AS
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
SELECT * FROM yoy_growth AS y1
WHERE year_on_year_growth = (
SELECT MAX(year_on_year_growth)
FROM yoy_growth AS y2
WHERE y1.year = y2.year
)
ORDER BY year;
That way I get the result I want, although the query does seem a little slow.

how to filter data in sql based on percentile

I have 2 tables. The first one contains customer information such as id, age, and name. The second table contains the customer id, the product they purchased, and the purchase_date (the dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
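As a rough illustration of how the ntile threshold could be plugged into the 2017 query, here is a sketch run through Python's sqlite3 module (SQLite 3.25+ for window functions). The customers, ages, products and dates are all made up, dates are assumed to be stored as ISO-8601 text, and I treat "older than 75%" as "age at or above the youngest age in the 4th quartile", which is one possible reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT)")

# Hypothetical customers c1..c9 with ages 20, 25, ..., 60.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(i, age, f"c{i}") for i, age in enumerate(ages, start=1)])
# Customers 1..8 bought something in 2016; customers 2, 8 and 9 bought in 2017.
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(i, "x", "2016-06-01") for i in range(1, 9)]
                 + [(2, "pen", "2017-03-01"), (8, "tea", "2017-04-01"), (9, "jam", "2017-05-01")])

query = """
WITH threshold AS (
    -- youngest age inside the top quartile of 2016 purchasers
    SELECT MIN(customer_age) AS min_age
    FROM (SELECT customer_age, ntile(4) OVER (ORDER BY customer_age) AS q4
          FROM table1
          WHERE customer_id IN (SELECT customer_id FROM table2
                                WHERE purchase_date >= '2016-01-01'
                                  AND purchase_date <  '2017-01-01')) q
    WHERE q4 = 4
)
SELECT c.customer_name, p.product
FROM table1 c
JOIN table2 p ON p.customer_id = c.customer_id
CROSS JOIN threshold t
WHERE p.purchase_date >= '2017-01-01' AND p.purchase_date < '2018-01-01'
  AND c.customer_age >= t.min_age
ORDER BY c.customer_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('c8', 'tea'), ('c9', 'jam')]
```

With eight 2016 purchasers the 4th bucket holds ages 50 and 55, so the threshold is 50 and customer c2 (age 25) is filtered out of the 2017 result.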
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75th-percentile age among 2016 purchasers, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

Query to get top product gainers by sales over previous week

I have a database table with three columns.
WeekNumber, ProductName, SalesCount
Sample data is shown in the table below. I want the top 10 gainers (by %) for week 26 over the previous week, i.e. week 25. The only condition is that the product should have a sales count greater than 0 in both weeks.
In the sample data B,C,D are the common products and C has the highest % gain.
Similarly, I will need top 10 losers also.
What I have tried so far is an inner join to get the products common to both weeks. However, I am not able to work out the top-gainers logic.
The output should be like
Product PercentGain
C 400%
D 12.5%
B 10%
This will give you a generic answer, not just for any particular week:
select top 10 product , gain [gain%]
from
(
SELECT curr.product, 100.0 * (curr.salescount - prev.salescount) / prev.salescount gain
from
(select weeknumber, product, salescount from tbl) prev
JOIN
(select weeknumber, product, salescount from tbl) curr
on prev.weeknumber = curr.weeknumber - 1
AND prev.product = curr.product
where prev.salescount > 0 and curr.salescount > 0
)A
order by gain desc
If you are interested in weeks 25 and 26, then just add the condition below in the WHERE clause:
and prev.weeknumber = 25
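The self-join can be tried out with Python's sqlite3 module as sketched below. The rows are invented so that they reproduce the percentages in the question's expected output (the original sample table is not available to me), SQLite's LIMIT stands in for SQL Server's TOP, and the 100.0 factor is there so the division is not done in integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (weeknumber INTEGER, product TEXT, salescount INTEGER)")
# Hypothetical data: B, C and D appear in both weeks; A and E in only one.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (25, "A", 5), (25, "B", 10), (25, "C", 1), (25, "D", 8),
    (26, "B", 11), (26, "C", 5), (26, "D", 9), (26, "E", 3),
])

query = """
SELECT curr.product,
       100.0 * (curr.salescount - prev.salescount) / prev.salescount AS gain
FROM tbl prev
JOIN tbl curr
  ON prev.weeknumber = curr.weeknumber - 1   -- pair each week with the one before it
 AND prev.product = curr.product             -- same product in both weeks
WHERE prev.salescount > 0 AND curr.salescount > 0
  AND prev.weeknumber = 25
ORDER BY gain DESC                           -- use ASC instead for the top losers
LIMIT 10
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('C', 400.0), ('D', 12.5), ('B', 10.0)]
```

Without the 100.0, integer columns would give (11 - 10) / 10 * 100 = 0 for product B, because the division is truncated before the multiplication.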
If you are using SQL Server 2012 (or newer), you could use the LAG function to match this week's sales with the previous week's. From there on, it's just some math:
SELECT TOP 10 product, 1.0 * sales / prev_sales - 1 AS gain
FROM (SELECT product,
weeknumber,
sales,
LAG(sales) OVER (PARTITION BY product
ORDER BY weeknumber) AS prev_sales
FROM mytable) t
WHERE weeknumber = 26 AND
sales > 0 AND
prev_sales > 0 AND
sales > prev_sales
ORDER BY gain DESC
This is the query:
select top 10 product , gain [gain%]
from
(
SELECT curr.Product, ((curr.Sales - prev.Sales) * 100.0) / prev.Sales gain
from
(select weeknumber, product, sales from ProductInfo where weeknumber = 25 ) prev
JOIN
(select weeknumber, product, sales from ProductInfo where weeknumber = 26 ) curr
on prev.product = curr.product
where prev.Sales > 0 and curr.Sales > 0
)A
order by gain desc

TERADATA: Aggregate across multiple tables

Consider the following query where aggregation happens across two tables: Sales and Promo and the aggregate values are again used in a calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value) AS euro_value,
sum(sales.qty) AS qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value) AS euro_value,
sum(promo.qty) AS qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows because of the large number of articles. According to EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume primary key on article_id is present and both the tables are partitioned by year.
Left Outer Join because second table contains optional data.
So, can you suggest a better way of writing this query. Thanks for reading this far :)
Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
(SUM(sales_qty) - SUM(promo_qty))
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;
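To sanity-check the UNION ALL shape (one branch per source table, zeros for the other table's columns, then a single outer GROUP BY), here is a sketch run on SQLite through Python's sqlite3 module rather than on Teradata. The rows are made up, and NULLIF(x, 0) is used as a stand-in for Teradata's NULLIFZERO so that a zero net quantity yields NULL instead of a division error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data. Article 1 has a promotion, article 2 does not.
conn.executescript("""
CREATE TABLE SALES_TABLE (article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
CREATE TABLE PROMOTION_TABLE (article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
INSERT INTO SALES_TABLE VALUES
  (1, 2010, 999.0, 1),   -- filtered out by year >= 2011
  (1, 2011, 100.0, 10), (1, 2012, 50.0, 5),
  (2, 2011, 60.0, 4);
INSERT INTO PROMOTION_TABLE VALUES (1, 2011, 30.0, 3);
""")

query = """
SELECT article_id,
       (SUM(sales_value) - SUM(promo_value)) /
       NULLIF(SUM(sales_qty) - SUM(promo_qty), 0) AS net_unit_value
FROM (
    SELECT article_id,
           SUM(euro_value) AS sales_value, SUM(qty) AS sales_qty,
           0 AS promo_value, 0 AS promo_qty
    FROM SALES_TABLE
    WHERE year >= 2011
    GROUP BY article_id
    UNION ALL
    SELECT article_id,
           0 AS sales_value, 0 AS sales_qty,
           SUM(euro_value) AS promo_value, SUM(qty) AS promo_qty
    FROM PROMOTION_TABLE
    WHERE year >= 2011
    GROUP BY article_id
) AS comb
GROUP BY article_id
ORDER BY article_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 10.0), (2, 15.0)]
```

One difference from the LEFT OUTER JOIN version: an article that appears only in PROMOTION_TABLE would also show up here (with a negative numerator), so add a filter if that can happen in your data.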