Using SQLite, how can I calculate the maximum year on year growth rate for each year? - sql

I am learning about SQL and I am doing a practice exercise called World Populations SQL Practice on Codecademy. There is one table with three columns: country, population, and year. I am interested in finding the country with the maximum year-on-year growth rate for each year. (This wasn't suggested by Codecademy; I just think it's an interesting idea.)
I can calculate all of the year-on-year growth rates with this query:
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
and I can calculate the maximum year-on-year growth rate for a particular year, such as 2005, with a query such as this:
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year = 2005
AND year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth DESC
LIMIT 1;
Using Python, I can solve the problem with the first query saved as yoy_query if I do this:
yoy_result = c.execute(yoy_query).fetchall()
sorted([record for record in yoy_result if record[1] == max([row[1] for row in yoy_result if row[2] == record[2]])],key=lambda x:x[2])
and I get the desired result:
[('Montserrat', 7.34177215189872, 2000), ('Montserrat', 13.4433962264151, 2001), ('Afghanistan', 5.803891762260126, 2002), ('Montserrat', 10.467706013363028, 2003), ('Liberia', 4.7976709085316545, 2004), ('Jordan', 7.088496587486171, 2005), ('Jordan', 6.764378108744186, 2006), ('Montserrat', 12.638580931263864, 2007), ('Liberia', 4.157111008408977, 2008), ('Niger', 3.737166190281749, 2009)]
But I can't think of a way to do this using SQL. Any ideas? I think the reason it seems much easier in Python is that I'm able to save the intermediate result, then run a second calculation on it.

You can do it with window functions LAG() and RANK():
select country, year_on_year_growth, year
from (
select *, rank() over (partition by year order by year_on_year_growth desc) as rnk
from (
select *,
100.0 * (1.0 * population / lag(population) over (partition by country order by year) - 1) as year_on_year_growth
from population_years
)
where year_on_year_growth is not null
)
where rnk = 1
order by year
The expression:
lag(population) over (partition by country order by year)
returns the country's population in the previous year (assuming that there are no gaps between the years), so each growth figure is attached to the later of the two years.
So I calculated the growth rate as:
((current year's population) / (previous year's population)) - 1
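To try the window-function approach outside Codecademy, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and it assumes the SQLite library bundled with your Python is 3.25 or newer (the release that added window functions). Note that with LAG() the growth is labelled with the later year, whereas the query in the question labels it with the earlier one.

```python
import sqlite3

# Hypothetical sample data: (country, population, year). Values are invented.
ROWS = [
    ("A", 100.0, 2000), ("A", 110.0, 2001), ("A", 121.0, 2002),
    ("B", 200.0, 2000), ("B", 210.0, 2001), ("B", 252.0, 2002),
]

QUERY = """
SELECT country, year, year_on_year_growth
FROM (
    SELECT *, RANK() OVER (PARTITION BY year
                           ORDER BY year_on_year_growth DESC) AS rnk
    FROM (
        SELECT country, year,
               100.0 * (population / LAG(population) OVER (
                   PARTITION BY country ORDER BY year) - 1) AS year_on_year_growth
        FROM population_years
    )
    -- the first year of each country has no previous row, so drop its NULL
    WHERE year_on_year_growth IS NOT NULL
)
WHERE rnk = 1
ORDER BY year
"""


def top_growth_per_year(rows):
    """Load rows into an in-memory table and return the top grower per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)"
    )
    conn.executemany("INSERT INTO population_years VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


top = top_growth_per_year(ROWS)
for country, year, growth in top:
    print(country, year, round(growth, 2))
```

RANK() keeps ties, so two countries with exactly the same top growth in a year would both be returned.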

I guess the simplest thing to do would actually be to just use a view as follows:
CREATE VIEW yoy_growth
AS
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
SELECT * FROM yoy_growth AS y1
WHERE year_on_year_growth = (
SELECT MAX(year_on_year_growth)
FROM yoy_growth AS y2
WHERE y1.year = y2.year
)
ORDER BY year;
That way I get the result I want, although the query does seem a little slow.
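If you would rather not create a permanent view, a common table expression (WITH) plays the same "save the intermediate result" role inside a single statement. Below is a rough sketch run through Python's sqlite3 module; the rows are invented, and the self-join is my substitution for the correlated subquery in the view, so treat it as one possible variant rather than the definitive fix. I have not measured whether it is faster on the Codecademy data.

```python
import sqlite3

# Hypothetical sample data: (country, population, year). Values are invented.
ROWS = [
    ("A", 100.0, 2000), ("A", 110.0, 2001), ("A", 121.0, 2002),
    ("B", 200.0, 2000), ("B", 210.0, 2001), ("B", 252.0, 2002),
]

QUERY = """
WITH yoy AS (
    -- growth from p1.year to the following year, labelled with p1.year
    SELECT p1.country, p1.year,
           100.0 * (p2.population - p1.population) / p1.population AS growth
    FROM population_years AS p1
    JOIN population_years AS p2
      ON p2.country = p1.country AND p2.year = p1.year + 1
)
SELECT country, growth, year
FROM yoy AS y1
WHERE growth = (SELECT MAX(growth) FROM yoy AS y2 WHERE y2.year = y1.year)
ORDER BY year
"""


def max_growth_per_year(rows):
    """Return (country, growth, year) for the fastest grower in each year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)"
    )
    conn.executemany("INSERT INTO population_years VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


best = max_growth_per_year(ROWS)
print(best)
```

The inner join drops the last year of each country automatically, because there is no following row to match.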

Related

Get greater (subquery ) list than the AVG (subquery) in SQLite3

consider the following table:
covid_data(
CASES INT,
DEATHS INT,
COUNTRIES VARCHAR(64)
);
I am trying to get the names of the countries whose mortality rate is greater than the AVG mortality rate. The formula I am using to get the number of deaths per 1000 cases is:
(NUMBER OF DEATHS / NUMBER OF CASES) * 1000
To get the AVG I use this query:
SELECT AVG(rate)
FROM (
SELECT CAST(SUM(deaths) AS FLOAT) / SUM(cases) * 1000 AS rate
FROM covid_data
) covid_data;
To list the countries with a greater rate than this AVG this is one of the many attempts I have tried so far.
SELECT countries, CAST(SUM(deaths) AS FLOAT) / SUM(cases) * 1000 AS RATEM
FROM covid_data
GROUP BY countries
HAVING RATEM > (SELECT AVG(RATE)
FROM (
SELECT CAST(SUM(DEATHS) AS FLOAT) / SUM(CASES) * 1000 AS RATE
FROM covid_data
) covid_data);
This is returning an error: no such column: RATEM
As you can see, I am struggling with these basic concepts. I would also appreciate any books/courses/resources to better understand these relations.
You can use window functions:
SELECT cd.countries
FROM (SELECT cd.*,
SUM(deaths * 1.0) OVER () / SUM(cases) OVER () as mortality_ratio
FROM covid_data cd
) cd
WHERE (deaths * 1.0 / NULLIF(cases, 0)) > mortality_ratio;
Note that the average of the mortality ratio in each country is NOT the same as the overall mortality ratio. I think you understand this but I just want to emphasize that point. The average ratio would be:
AVG(deaths * 1.0 / NULLIF(cases, 0))
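To make the difference between the two figures concrete, here is a small sketch with two invented countries, run through Python's sqlite3 module. The numbers are made up purely to show that the overall ratio and the average of per-country ratios can diverge a lot.

```python
import sqlite3

# Hypothetical rows: (cases, deaths, countries). One row per country.
ROWS = [
    (100, 10, "X"),    # ratio 0.10
    (1000, 10, "Y"),   # ratio 0.01
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid_data (cases INT, deaths INT, countries VARCHAR(64))")
conn.executemany("INSERT INTO covid_data VALUES (?, ?, ?)", ROWS)

overall_ratio, average_ratio = conn.execute(
    """
    SELECT SUM(deaths) * 1.0 / SUM(cases),           -- all deaths / all cases
           AVG(deaths * 1.0 / NULLIF(cases, 0))      -- mean of per-country ratios
    FROM covid_data
    """
).fetchone()
conn.close()

print(overall_ratio, average_ratio)
```

Here the overall ratio is 20 / 1100, while the average of the two ratios is (0.10 + 0.01) / 2, so which one you compare against changes which countries count as "above average".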
You could use window functions:
select t.*
from (
select
t.*,
1.0 * deaths / cases rate,
1.0 * sum(deaths) over() / sum(cases) over() avg_rate
from covid_data t
) t
where rate > avg_rate
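If the table holds several rows per country (the GROUP BY in the question suggests it might), one option is to aggregate per country first and then compare against the average of those per-country rates. This is only a sketch with invented rows, run through Python's sqlite3 module; the CTE name per_country is my own.

```python
import sqlite3

# Hypothetical rows: (cases, deaths, countries). X has two rows on purpose.
ROWS = [
    (100, 10, "X"), (100, 30, "X"),   # X: 40 deaths / 200 cases -> 200 per 1000
    (1000, 10, "Y"),                  # Y: 10 per 1000
    (500, 50, "Z"),                   # Z: 100 per 1000
]

QUERY = """
WITH per_country AS (
    SELECT countries,
           SUM(deaths) * 1000.0 / SUM(cases) AS rate   -- deaths per 1000 cases
    FROM covid_data
    GROUP BY countries
)
SELECT countries, rate
FROM per_country
WHERE rate > (SELECT AVG(rate) FROM per_country)
ORDER BY countries
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid_data (cases INT, deaths INT, countries VARCHAR(64))")
conn.executemany("INSERT INTO covid_data VALUES (?, ?, ?)", ROWS)
above_average = conn.execute(QUERY).fetchall()
conn.close()

print(above_average)
```

Because the rate is computed once inside the CTE, the outer query can refer to it by name in WHERE without repeating the expression.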

Selecting a second column with a SUM(SUM(value)) function

I am working on a simple query trying to display the total of the totals for 12 periods. I am using a SUM(SUM(value)) function to retrieve the data that I want, however, I am having a hard time displaying a second column in my result.
SELECT CENTRE, SUM(SUM(AMOUNT)) "TOTAL PAY" FROM AB
WHERE ACCOUNT LIKE 'N%' AND CENTRE = '2001' AND YEAR > 2015 GROUP BY AMOUNT, CENTRE;
The error that I am getting has to do with the grouping of the statement.
Can you please tell me what I have done wrong. I have solved the problem with a sub-query, but I need to fix this query as well because it is used in a more advanced one as a sub-query.
Your question is too vague to know for sure what you want. For example, what do you mean by "totals for 12 periods"? Is it that for YEAR > 2015 you have 12 rows? Do you always have CENTRE = in your WHERE clause? If so, this might be what you want:
SELECT
MAX(centre) "CENTRE",
SUM(amount) "TOTAL PAY"
FROM
ab
WHERE
account LIKE 'N%'
AND
centre = '2001'
AND
year > 2015;
Or, in case CENTRE = 'smth' might not be in your WHERE clause and you need total values for each CENTRE:
SELECT
centre "CENTRE",
SUM(amount) "TOTAL PAY"
FROM
ab
WHERE
account LIKE 'N%'
/* AND
centre = '2001'*/
AND
year > 2015
GROUP BY
centre;
Or in case for every (or one) CENTRE row you need to have total value of all centres:
SELECT
"CENTRE",
total "TOTAL PAY"
FROM
(
SELECT
centre,
ROW_NUMBER() OVER(PARTITION BY
centre
ORDER BY
0
) rn,
SUM(
amount
) OVER(PARTITION BY
0
) total
FROM
ab
WHERE
account LIKE 'N%'
AND
year > 2015
)
WHERE
rn = 1;
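For comparison, here is a sketch of the "total of the totals" idea done in two steps: aggregate per centre in a derived table, then sum those subtotals with a window function. It is run through Python's sqlite3 module rather than the asker's database, the rows are invented, and it assumes SQLite 3.25 or newer for the OVER () clause.

```python
import sqlite3

# Hypothetical rows: (centre, account, year, amount). Values are invented.
ROWS = [
    ("2001", "N1", 2016, 100.0),
    ("2001", "N2", 2017, 50.0),
    ("2002", "N1", 2016, 25.0),
    ("2002", "X1", 2016, 999.0),   # filtered out: account does not start with N
    ("2001", "N1", 2015, 7.0),     # filtered out: year is not > 2015
]

QUERY = """
SELECT centre,
       centre_total,
       SUM(centre_total) OVER () AS grand_total   -- total of the per-centre totals
FROM (
    SELECT centre, SUM(amount) AS centre_total
    FROM ab
    WHERE account LIKE 'N%' AND year > 2015
    GROUP BY centre
)
ORDER BY centre
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ab (centre TEXT, account TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO ab VALUES (?, ?, ?, ?)", ROWS)
totals = conn.execute(QUERY).fetchall()
conn.close()

print(totals)
```

Splitting the work this way avoids nesting one aggregate directly inside another, which is what the grouping error in the question is complaining about.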

SQL Year over year growth percentage from data same query

How do I calculate the percentage difference from 2 different columns, calculated in that same query? Is it even possible?
This is what I have right now:
SELECT
Year(OrderDate) AS [Year],
Count(OrderID) AS TotalOrders,
Sum(Invoice.TotalPrice) AS TotalRevenue
FROM
Invoice
INNER JOIN Order
ON Invoice.InvoiceID = Order.InvoiceID
GROUP BY Year(OrderDate);
Which produces this table
Now I'd like to add one more column with the YoY growth, so that even when 2016 comes around, the growth is there.
EDIT:
I should clarify that I'd like to have for example next to
2015,5,246.28 -> 346,15942029% ((R2015-R2014) / R2014 * 100)
If you save your existing query as qryBase, you can use it as the data source for another query to get what you want:
SELECT
q1.Year,
q1.TotalOrders,
q1.TotalRevenue,
IIf
(
q0.TotalRevenue Is Null,
Null,
((q1.TotalRevenue - q0.TotalRevenue) / q0.TotalRevenue) * 100
) AS YoY_growth
FROM
qryBase AS q1
LEFT JOIN qryBase AS q0
ON q1.Year = (q0.Year + 1);
Access may complain it "can't represent the join expression q1.Year = (q0.Year + 1) in Design View", but you can still edit the query in SQL View and it will work.
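The same self-join idea can be checked outside Access. Below is a minimal sketch using Python's sqlite3 module, with the three yearly revenue figures quoted elsewhere in this thread (55.20, 246.28, 350.47) standing in for qryBase. The CTE name base and its column names are my own; dividing by a NULL previous-year value yields NULL in SQLite, so no IIf is needed here.

```python
import sqlite3

QUERY = """
WITH base(year, total_orders, total_revenue) AS (
    -- stand-in for the saved base query
    VALUES (2014, 2, 55.20), (2015, 2, 246.28), (2016, 7, 350.47)
)
SELECT q1.year,
       q1.total_revenue,
       (q1.total_revenue - q0.total_revenue) / q0.total_revenue * 100 AS yoy_growth
FROM base AS q1
LEFT JOIN base AS q0
       ON q1.year = q0.year + 1      -- q0 is the previous year
ORDER BY q1.year
"""

conn = sqlite3.connect(":memory:")
growth_rows = conn.execute(QUERY).fetchall()
conn.close()

for year, revenue, growth in growth_rows:
    print(year, revenue, growth)
```

The first year comes back with a NULL (None in Python) growth value, because the LEFT JOIN finds no earlier year to compare with.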
What you are looking for is something like this?
Year Revenue Growth
2014 55
2015 246 4.47
2016 350 1.42
You could wrap the original query twice to get the numbers from both years.
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
--originalQuery or table link
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
--originalQuery or table link
) orders
Here's a cheap union'd table example:
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) orders

Select highest profit from each year SQL

How do I obtain the highest value for each year within a table. So let's say we have a table movies and I want to find the highest profiting film for each year.
This is my attempt so far:
SELECT year, MAX(income - cost) AS profit, title
FROM Movies m, Movies m2
GROUP BY year
I am pretty certain it is going to need some sub selects but I can't visualise what I need to do. I was also thinking probably some sort of distinct option to rule out duplicate years.
Title Year Income Cost Length
A 2000 10 2 2
B 2000 9 7 2
So from this the expected result would be
Title Year Profit
A 2000 8
I'm guessing slightly at what you want, but since you've not specified any RDBMS a generic solution would be:
SELECT m.Year, (m.Income - m.Cost) AS Profit, m.Title
FROM Movies m
INNER JOIN
( SELECT Year, MAX(Income - Cost) AS Profit
FROM Movies
GROUP BY Year
) MaxProfit
ON MaxProfit.Year = m.Year
AND MaxProfit.Profit = (m.Income - m.Cost)
ORDER BY m.Year
You can also do this using analytic functions if your DBMS permits. e.g. SQL-Server
WITH MovieCTE AS
( SELECT m.Year,
Profit = (m.Income - m.Cost),
m.Title,
RowNumber = ROW_NUMBER() OVER(PARTITION BY m.Year ORDER BY (m.Income - m.Cost) DESC)
FROM Movies m
)
SELECT year, Profit, Title
FROM MovieCTE
WHERE RowNumber = 1
It is possible I have misunderstood your exact criteria, but I am sure the same principles can be applied; you will just need to alter the grouping and the join in the first example, or the partition by in the second.
select m1.year, m1.profit, m2.title
from
(select year, max(income - cost) as profit from movies group by year) m1
join
(select year, (income - cost) as profit, title from movies) m2
on
m1.year = m2.year
and m1.profit = m2.profit
This will give the highest profit movie for each year, and choose the first title in the event of a tie:
select a.year, a.profit,
(select min(title) from Movies where year = a.year and income - cost = a.profit) as title
from (
select year, max(income - cost) as profit
from Movies -- title, year, cost, income, number
group by year
) as a
order by year desc
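Here is a quick way to check the "join back to the per-year maximum" pattern, using Python's sqlite3 module and the two sample rows from the question. The third row (film C in 2001) is invented so that there is more than one year to group on.

```python
import sqlite3

# (title, year, income, cost, length). A and B come from the question;
# C is a hypothetical extra row.
ROWS = [
    ("A", 2000, 10, 2, 2),
    ("B", 2000, 9, 7, 2),
    ("C", 2001, 5, 1, 2),
]

QUERY = """
SELECT m.year, m.income - m.cost AS profit, m.title
FROM movies AS m
JOIN (
    SELECT year, MAX(income - cost) AS profit   -- best profit in each year
    FROM movies
    GROUP BY year
) AS best
  ON best.year = m.year
 AND best.profit = m.income - m.cost
ORDER BY m.year, m.title
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, year INTEGER, income INTEGER, cost INTEGER, length INTEGER)"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?)", ROWS)
best_per_year = conn.execute(QUERY).fetchall()
conn.close()

print(best_per_year)
```

If two films tie for the top profit in a year, this version returns both of them.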

sql query to calculate monthly growth percentage

I need to build a query with 4 columns (sql 2005).
Column1: Product
Column2: Units sold
Column3: Growth from previous month (in %)
Column4: Growth from same month last year (in %)
In my table the year and months have custom integer values. For example, the most current month is 146 - but also the table has a year (eg 2011) column and month (eg 7) column.
Is it possible to get this done in one query, or do I need to start employing temp tables etc.?
Appreciate any help.
thanks,
KS
KS,
To do this on the fly, you could use subqueries.
SELECT this_month.product, this_month.units_sold,
(this_month.sales-last_month.sales)*100/last_month.sales,
(this_month.sales-last_year.sales)*100/last_year.sales
FROM (SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 146 GROUP BY product) AS this_month,
(SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 145 GROUP BY product) AS last_month,
(SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 134 GROUP BY product) AS last_year
WHERE this_month.product = last_month.product
AND this_month.product = last_year.product
If there's a case where a product was sold in one month but not another month, you will have to do a left join and check for null values, especially if last_month.sales or last_year.sales is 0.
I hope I got them all:
SELECT
Current_Month.product_name, units_sold_current_month,
(units_sold_current_month - units_sold_last_month) * 100.0 / units_sold_last_month prc_last_month,
(units_sold_current_month - units_sold_last_year) * 100.0 / units_sold_last_year prc_last_year
FROM
(SELECT product_id, product_name, sum(units_sold) units_sold_current_month FROM MyTable WHERE YEAR = 2011 AND MONTH = 7 GROUP BY product_id, product_name) Current_Month
JOIN
(SELECT product_id, product_name, sum(units_sold) units_sold_last_month FROM MyTable WHERE YEAR = 2011 AND MONTH = 6 GROUP BY product_id, product_name) Last_Month
ON Current_Month.product_id = Last_Month.product_id
JOIN
(SELECT product_id, product_name, sum(units_sold) units_sold_last_year FROM MyTable WHERE YEAR = 2010 AND MONTH = 7 GROUP BY product_id, product_name) Last_Year
ON Current_Month.product_id = Last_Year.product_id
I am slightly guessing as the structure of the table provided is the result table, right? You will need to do self-join on month-to-previous-month basis:
SELECT <growth computation here>
FROM SALES s1 LEFT JOIN SALES s2 ON (s2.month = s1.month - 1) -- last month join
LEFT JOIN SALES s3 ON (s3.month = s1.month - 12) -- last year join
where <growth computation here> looks like
((s1.sales - s2.sales)/s2.sales * 100),
((s1.sales - s3.sales)/s3.sales * 100)
I use LEFT JOIN for months that have no previous months. Change your join conditions based on actual relations in month/year columns.
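To see the LEFT JOIN variant end to end, here is a rough sketch using Python's sqlite3 module instead of SQL Server 2005. The table name monthly_sales, its columns and the rows are all hypothetical; it assumes one pre-aggregated row per product per month and a running integer month index like the 146 mentioned in the question.

```python
import sqlite3

# Hypothetical rows: (product, month index, units sold). Values are invented.
ROWS = [
    ("widget", 146, 120), ("widget", 145, 100), ("widget", 134, 80),
    ("gadget", 146, 50),   # no history, so both growth figures should be NULL
]

QUERY = """
SELECT cur.product,
       cur.units_sold,
       100.0 * (cur.units_sold - prev.units_sold)
             / NULLIF(prev.units_sold, 0) AS growth_prev_month,
       100.0 * (cur.units_sold - ly.units_sold)
             / NULLIF(ly.units_sold, 0)   AS growth_prev_year
FROM monthly_sales AS cur
LEFT JOIN monthly_sales AS prev
       ON prev.product = cur.product AND prev.month = cur.month - 1
LEFT JOIN monthly_sales AS ly
       ON ly.product = cur.product AND ly.month = cur.month - 12
WHERE cur.month = ?
ORDER BY cur.product
"""


def growth_report(rows, month):
    """Return per-product units and growth percentages for the given month index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_sales (product TEXT, month INTEGER, units_sold INTEGER)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (month,)).fetchall()
    conn.close()
    return result


report = growth_report(ROWS, 146)
print(report)
```

NULLIF guards against dividing by a zero baseline, and products with no earlier row simply get NULL (None in Python) in the growth columns instead of disappearing from the result.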