When I run the query below, I expected some lead_difference values to be negative (2016 value < 1990 value). However, I don't see that in the output. Could you please help check whether I missed anything? Thanks!
WITH a AS
(SELECT
region,
year,
SUM(forest_area_sqkm)/(SUM(total_area_sq_mi*2.59)) as percentage_forest_region
FROM forestation2
GROUP BY region, year)
SELECT
region,
year,
percentage_forest_region,
LEAD(percentage_forest_region) OVER (ORDER BY percentage_forest_region) - percentage_forest_region AS lead_difference
FROM a
WHERE year = '2016' OR year = '1990'
GROUP BY region, year, percentage_forest_region
ORDER BY region, year, percentage_forest_region DESC;
You are leading by the percentage itself. Presumably, you want the next value for the region based on the year. That would be:
LEAD(percentage_forest_region) OVER (PARTITION BY region ORDER BY year) - percentage_forest_region AS lead_difference
If you order by percentage_forest_region, all years and regions are combined into a single window and the next value is simply the next larger value, so the difference can never be negative.
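A minimal sketch of the difference between the two window definitions, run against an in-memory SQLite database from Python. The table `a` stands in for the CTE in the question, and the region names and percentages are made-up sample values, not data from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the CTE "a" from the question.
conn.execute("CREATE TABLE a (region TEXT, year INTEGER, percentage_forest_region REAL)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("Region A", 1990, 0.5), ("Region A", 2016, 0.25),    # forest share fell
        ("Region B", 1990, 0.125), ("Region B", 2016, 0.375), # forest share rose
    ],
)

# Original window: one global window sorted by the value itself,
# so LEAD always returns a value that is >= the current one.
by_value = conn.execute("""
    SELECT region, year,
           LEAD(percentage_forest_region) OVER (ORDER BY percentage_forest_region)
             - percentage_forest_region AS lead_difference
    FROM a
""").fetchall()

# Suggested window: one window per region, sorted by year,
# so LEAD returns the same region's later year.
by_region = conn.execute("""
    SELECT region, year,
           LEAD(percentage_forest_region) OVER (PARTITION BY region ORDER BY year)
             - percentage_forest_region AS lead_difference
    FROM a
    ORDER BY region, year
""").fetchall()

for row in by_region:
    print(row)
```

With the per-region window, the 1990 row of a region whose forest share shrank gets a negative lead_difference, and each 2016 row gets NULL because there is no later year to lead to.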
I am exploring a sales dataset in Microsoft SQL Server Management Studio.
I want to obtain the day with the highest number of items sold for each year, that is, a table like this (the values in the rows are made up):
Year | Purchase Day | Max_Daily_Sales
2011 | 2011-11-12   | 48
2012 | 2012-12-22   | 123
I first tried to run this query:
WITH CTE_DailySales AS
(
SELECT DISTINCT
Purchase_Day,
Year,
SUM(Order_Quantity) OVER (PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold
FROM
[sql_cleaning].[dbo].[Sales$]
)
SELECT
Year, MAX(Daily_Quantity_Sold) AS Max_Daily_Sales
FROM
CTE_DailySales
GROUP BY
Year
ORDER BY
Year
It partially works since it gives me the highest quantity of items sold in a day for each year. However, I would also like to specify what day of the year it was.
If I try to write Purchase_Day in the Select statement, it returns the max for each day, not the single day with the highest number of items sold.
How could I resolve this problem?
I hope I've been clear enough. Thank you all for your help.
I suggest you use ROW_NUMBER to get the max value. Your query would be:
WITH CTE_DailySales AS
(
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) Daily_Quantity_Sold,
ROW_NUMBER() OVER(PARTITION BY Year ORDER BY SUM(Order_Quantity) DESC) as rn
FROM
[sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day,
Year
)
SELECT
*
FROM
CTE_DailySales
WHERE rn = 1
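The ROW_NUMBER approach can be tried end to end with a small sketch like the one below. It uses an in-memory SQLite database from Python rather than SQL Server, a hypothetical `sales` table, and invented rows. The aggregation and the ranking are split into two CTEs here purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the Sales$ table.
conn.execute("CREATE TABLE sales (purchase_day TEXT, year INTEGER, order_quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2011-11-12", 2011, 30), ("2011-11-12", 2011, 18),   # 48 items that day
        ("2011-03-01", 2011, 40),
        ("2012-12-22", 2012, 100), ("2012-12-22", 2012, 23),  # 123 items that day
        ("2012-05-05", 2012, 60),
    ],
)

best_days = conn.execute("""
    WITH daily AS (
        -- one row per day with its total quantity
        SELECT purchase_day, year, SUM(order_quantity) AS daily_quantity_sold
        FROM sales
        GROUP BY purchase_day, year
    ),
    ranked AS (
        -- rank the days inside each year, best day first
        SELECT daily.*,
               ROW_NUMBER() OVER (PARTITION BY year
                                  ORDER BY daily_quantity_sold DESC) AS rn
        FROM daily
    )
    SELECT year, purchase_day, daily_quantity_sold
    FROM ranked
    WHERE rn = 1
    ORDER BY year
""").fetchall()

print(best_days)
```

If two days tie for the yearly maximum, ROW_NUMBER keeps only one of them, chosen arbitrarily. Swapping in RANK would keep every tied day.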
Simply:
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) AS Daily_Quantity_Sold,
-- window over the grouped rows: the largest daily total within each year
MAX(SUM(Order_Quantity)) OVER(PARTITION BY Year) AS MAX_QTY_YEAR
FROM [sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day, Year;
I have a big dataset on ticket sales throughout a single year. The schema I am working with is:
ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)
ticket_price (float)
total_price (float)
I am trying to get, for every month of the year, the weekday on which the highest number of tickets was sold. For example, the output would be:
year | month | weekday | total_tickets
2015 | 01    | SAT     | 5400
2015 | 02    | SUN     | 4300
2015 | 03    | SUN     | 6400
I tried using the following, but admittedly SQL is not my strongest skill:
SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
RANK () OVER (PARTITION BY YEAR, MONTH ORDER BY count(week_day) ASC) weekday_count
from ticket_sales
order by YEAR, MONTH
But I keep running into errors. I tried using a HAVING clause, but I couldn't get anywhere. Any tips on how to use the RANK () OVER (PARTITION BY) clause effectively to get this output, please? Or do I need to use COUNT () OVER?
The analysis exception says:
`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358
but then it is quite a long error.
Update:
So I tried this code:
SELECT DISTINCT year,
month,
week_day,
COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
from ticket_sales
order by year, month, weekday_count DESC
What that did is return every weekday for every month, so the output has 12*7 rows instead of 12. I still have a way to go on this, but at least I am getting somewhere.
Try this query and let me know if it returns the desired result.
I'm not sure whether the field name is number_tickets or total_tickets; I used number_tickets.
First I sum the number of tickets by year, month, and weekday, then return one row per year and month with the weekday on which the most tickets were sold.
WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)
SELECT DISTINCT
YEAR,
MONTH,
FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;
In a PostgreSQL database I got the desired result.
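The same two-step idea (sum per year, month, and weekday first, then pick the top weekday per month) can be checked with a small sketch in Python against an in-memory SQLite database. The rows are invented, and `strftime` is SQLite's stand-in for `EXTRACT`. The year and month are computed inside the CTE, so the window can partition on them as ordinary columns. The question's error was probably caused by partitioning on aliases defined in the same SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the ticket_sales schema from the question.
conn.execute(
    "CREATE TABLE ticket_sales (date_time_sale TEXT, week_day TEXT, number_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?, ?)",
    [
        ("2015-01-03 10:00:00", "SAT", 3000),
        ("2015-01-10 11:00:00", "SAT", 2400),
        ("2015-01-04 09:00:00", "SUN", 5000),
        ("2015-02-01 12:00:00", "SUN", 4300),
        ("2015-02-07 12:00:00", "SAT", 1000),
    ],
)

top_weekdays = conn.execute("""
    WITH total_by_day AS (
        -- step 1: total tickets per year, month and weekday
        SELECT strftime('%Y', date_time_sale) AS year,
               strftime('%m', date_time_sale) AS month,
               week_day,
               SUM(number_tickets) AS tickets
        FROM ticket_sales
        GROUP BY 1, 2, 3
    )
    -- step 2: keep only the best weekday of each month
    SELECT DISTINCT
           year,
           month,
           FIRST_VALUE(week_day) OVER (PARTITION BY year, month
                                       ORDER BY tickets DESC) AS week_day,
           FIRST_VALUE(tickets)  OVER (PARTITION BY year, month
                                       ORDER BY tickets DESC) AS total_tickets
    FROM total_by_day
    ORDER BY year, month
""").fetchall()

print(top_weekdays)
```

DISTINCT is what collapses the seven weekday rows of a month into one, since FIRST_VALUE writes the same winning weekday and total onto every row of the partition.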
I have daily city-level data with some counts. I have to aggregate this data at the monthly level (keyed on the 1st day of each month) and then create lag variables based on the last week before the 1st day of the month.
I have used the following code to create lag variables for the last month, after aggregating the data at the monthly level (with the 1st date of the month):
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate data at the monthly level and create lag variables based on the last 7, 14, 21, and 28 days using a window function?
You can use this:
select
CITY
, month(Date)
, year(date)
, sum(count)
from table1
where date >= dateadd(day, -7, getdate()) -- assumed intent: keep only the last 7 days
group by
City
, month(Date)
, year(date)
I think you're looking for something like this. The first CTE summarizes city counts to the day, week, month, and year. The second summarizes the counts to the week, month, and year. To group sales by weeks starting from the 1st day, it uses the DAY function along with YEAR and MONTH. Since DAY returns an integer, groups of distinct weeks can be created by dividing by 7, i.e. DAY(day_dt)/7.
One way to get the prior week's sales would be to join the weekly summary CTE to itself with the week offset by -1. Since the prior week might have no rows at all, it seems safer to LEFT JOIN than to use LAG, in my opinion.
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7, sum([count])
from city_level_data
group by city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7),
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_count)
from day_sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_count prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws2.wk=ws.wk-1;
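To see why the self-join can be safer than LAG, here is a rough sketch in Python with an in-memory SQLite database. The table layout, the column name `cnt`, and the rows are assumptions made for illustration. The sample data deliberately has no rows for week bucket 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily city-level counts; "cnt" replaces the reserved word "count".
conn.execute("CREATE TABLE city_level_data (city TEXT, day_dt TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO city_level_data VALUES (?, ?, ?)",
    [
        ("Pune", "2021-03-01", 5),   # day 1  -> bucket 0
        ("Pune", "2021-03-03", 7),   # day 3  -> bucket 0
        ("Pune", "2021-03-20", 9),   # day 20 -> bucket 2 (bucket 1 has no rows)
        ("Pune", "2021-03-27", 2),   # day 27 -> bucket 3
    ],
)

weekly = conn.execute("""
    WITH wk_sales AS (
        SELECT city,
               CAST(strftime('%Y', day_dt) AS INTEGER)     AS yr,
               CAST(strftime('%m', day_dt) AS INTEGER)     AS mo,
               CAST(strftime('%d', day_dt) AS INTEGER) / 7 AS wk,  -- integer division
               SUM(cnt) AS sum_count
        FROM city_level_data
        GROUP BY 1, 2, 3, 4
    )
    SELECT ws.city, ws.yr, ws.mo, ws.wk, ws.sum_count,
           ws2.sum_count AS prior_wk_count
    FROM wk_sales ws
    LEFT JOIN wk_sales ws2
           ON ws2.city = ws.city
          AND ws2.yr = ws.yr
          AND ws2.mo = ws.mo
          AND ws2.wk = ws.wk - 1      -- exactly the previous bucket, not the previous row
    ORDER BY ws.city, ws.yr, ws.mo, ws.wk
""").fetchall()

for row in weekly:
    print(row)
```

Bucket 2 gets NULL as its prior count because bucket 1 does not exist, whereas LAG would have returned bucket 0's total. Note that dividing the day by 7 puts days 1 to 6 in bucket 0. Using (day - 1) / 7 instead would align the buckets to days 1 to 7, 8 to 14, and so on.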
I have four grouping variables: Month, State, County, and City. In addition, I have the metric column sales, which can be null. I would like to calculate the percent change of sales per month for each City.
My solution would have the same grouping, but with the sales column replaced by the percent change for each month in calendar year 2019. Any help with a solution is appreciated.
You can use window functions:
select month, state, city, sales,
lag(sales) over (partition by state, city order by month) as prev_month,
(-1 + sales / lag(sales) over (partition by state, city order by month)) as change_ratio
from t;
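Because sales can be null (and a previous month could be zero), it may help to see how the ratio behaves in those cases. The sketch below uses Python with an in-memory SQLite database and invented rows. Wrapping the previous value in NULLIF is an addition to the answer's query, meant to avoid a division-by-zero error on engines that raise one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "t"; county is omitted to match the answer's query.
conn.execute("CREATE TABLE t (month INTEGER, state TEXT, city TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "TX", "Austin", 100.0),
        (2, "TX", "Austin", 150.0),
        (3, "TX", "Austin", None),   # missing sales
        (4, "TX", "Austin", 75.0),
        (1, "TX", "Dallas", 0.0),    # zero sales
        (2, "TX", "Dallas", 50.0),
    ],
)

changes = conn.execute("""
    SELECT month, city,
           sales / NULLIF(LAG(sales) OVER (PARTITION BY state, city
                                           ORDER BY month), 0) - 1 AS change_ratio
    FROM t
    ORDER BY state, city, month
""").fetchall()

for row in changes:
    print(row)
```

The ratio is NULL for the first month of each city, for any month where either the current or the previous sales value is NULL, and for a month that follows zero sales. Multiply by 100 if a percentage is wanted rather than a ratio.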
I'm new to SQL on BigQuery and I'm stuck on a project I have to complete.
I'm being asked to find the year-over-year growth of sales, as a percentage, on a database that doesn't even sum the revenues. I know I have to combine several queries, but I can't figure out how to calculate the growth of sales.
Here is where I am at:
Does anybody have any insight on how to do this?
Thanks a lot!
(1) Starting from what you have, group by product line to get this year and last year's revenue in each row:
#standardsql
with yearly_sales AS (
select year, product_line, sum(revenue) as revenue
from `dataset.sales`
group by product_line, year
),
year_on_year AS (
select array_agg(struct(year, revenue))
OVER(partition by product_line ORDER BY year
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS data
from yearly_sales
)
(2) Compute year-on-year growth from the two values you now have in each row
Below is for BigQuery Standard SQL
#standardSQL
SELECT product_line, year, revenue, prev_year_revenue,
ROUND(100 * (revenue - prev_year_revenue)/prev_year_revenue) year_over_year_growth_percent
FROM (
SELECT product_line, year, revenue,
LAG(revenue) OVER(PARTITION BY product_line ORDER BY year) prev_year_revenue
FROM (
SELECT product_line, year, SUM(revenue) revenue
FROM `project.dataset.table`
GROUP BY product_line, year
)
)
-- ORDER BY product_line, year
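The aggregate-then-LAG pattern above can be tried locally with a sketch like this one, which runs against an in-memory SQLite database from Python instead of BigQuery. The `sales` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-aggregated sales rows: several rows per product line and year.
conn.execute("CREATE TABLE sales (product_line TEXT, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Shoes", 2005, 100.0), ("Shoes", 2005, 100.0),
        ("Shoes", 2006, 250.0),
        ("Shoes", 2007, 200.0),
    ],
)

growth = conn.execute("""
    SELECT product_line, year, revenue, prev_year_revenue,
           ROUND(100 * (revenue - prev_year_revenue) / prev_year_revenue)
               AS year_over_year_growth_percent
    FROM (
        SELECT product_line, year, revenue,
               LAG(revenue) OVER (PARTITION BY product_line ORDER BY year)
                   AS prev_year_revenue
        FROM (
            -- innermost step: one revenue total per product line and year
            SELECT product_line, year, SUM(revenue) AS revenue
            FROM sales
            GROUP BY product_line, year
        )
    )
    ORDER BY product_line, year
""").fetchall()

for row in growth:
    print(row)
```

The first year of each product line has no previous year, so both prev_year_revenue and the growth percentage come back as NULL for that row.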
I tried with your information (plus my own made-up data for 2007) and arrived here:
SELECT
year,
sum(revenue) as year_sum
FROM
YearlyRevenue.SportCompany
GROUP BY
year
ORDER BY
year_sum
Whose result is:
R year year_sum
1 2005 1.159E9
2 2006 1.4953E9
3 2007 1.5708E9
Now the % growth should be added. Have a look here for inspiration.
Let me know if you don't succeed and I will try the hard part, with no guarantees.