I'm new to SQL on BigQuery and I'm stuck on a project I have to complete.
I've been asked to find the year-over-year growth of sales, as a percentage, on a database that doesn't even sum the revenues... I know I have to combine several queries, but I can't figure out how to calculate the growth of sales.
Here is where I am so far:
Does anybody have any insight on how to do this?
Thanks a lot!
(1) Starting from what you have, group by product line to get this year and last year's revenue in each row:
#standardsql
with yearly_sales AS (
select year, product_line, sum(revenue) as revenue
from `dataset.sales`
group by product_line, year
),
year_on_year AS (
select array_agg(struct(year, revenue))
OVER(partition by product_line ORDER BY year
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS data
from yearly_sales
)
(2) Compute year-on-year growth from the two values you now have in each row
Below is for BigQuery Standard SQL
#standardSQL
SELECT product_line, year, revenue, prev_year_revenue,
ROUND(100 * (revenue - prev_year_revenue)/prev_year_revenue) year_over_year_growth_percent
FROM (
SELECT product_line, year, revenue,
LAG(revenue) OVER(PARTITION BY product_line ORDER BY year) prev_year_revenue
FROM (
SELECT product_line, year, SUM(revenue) revenue
FROM `project.dataset.table`
GROUP BY product_line, year
)
)
-- ORDER BY product_line, year
I tried with your information (plus my own made-up data for 2007) and got this far:
SELECT
year,
sum(revenue) as year_sum
FROM
YearlyRevenue.SportCompany
GROUP BY
year
ORDER BY
year_sum
The result is:
R year year_sum
1 2005 1.159E9
2 2006 1.4953E9
3 2007 1.5708E9
Now the % growth should be added. Have a look here for inspiration.
Let me know if you don't succeed and I will try the hard part, with no guarantees.
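For the growth column, one option is to compare each year's sum with the previous year's using LAG. Below is a minimal sketch that runs the idea through Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made only so the example is self-contained; it needs an SQLite build with window functions, 3.25 or later). The table name yearly is hypothetical; the three yearly sums are the ones shown above.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearly (year INTEGER, year_sum REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO yearly VALUES (?, ?)",
    [(2005, 1.159e9), (2006, 1.4953e9), (2007, 1.5708e9)],  # sums from the result above
)

# LAG fetches the previous year's sum; growth is the relative change in percent.
growth_rows = conn.execute("""
    SELECT year,
           year_sum,
           ROUND(100.0 * (year_sum - LAG(year_sum) OVER (ORDER BY year))
                 / LAG(year_sum) OVER (ORDER BY year), 1) AS growth_pct
    FROM yearly
    ORDER BY year
""").fetchall()

for row in growth_rows:
    print(row)
```

The first year has no previous year, so its growth comes back as NULL (None in Python). The same SELECT should carry over to BigQuery with the grouped query in place of the yearly table.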
Related
I am exploring a sales dataset in Microsoft SQL Server Management Studio.
I want to obtain the day with the highest number of items sold for each year, therefore a table like this (the values in the rows are totally random):
Year   Purchase Day   Max_Daily_Sales
2011   2011-11-12     48
2012   2012-12-22     123
I first tried to run this query:
WITH CTE_DailySales AS
(
SELECT DISTINCT
Purchase_Day,
Year,
SUM(Order_Quantity) OVER (PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold
FROM
[sql_cleaning].[dbo].[Sales$]
)
SELECT
Year, MAX(Daily_Quantity_Sold) AS Max_Daily_Sales
FROM
CTE_DailySales
GROUP BY
Year
ORDER BY
Year
It partially works since it gives me the highest quantity of items sold in a day for each year. However, I would also like to specify what day of the year it was.
If I try to write Purchase_Day in the Select statement, it returns the max for each day, not the single day with the highest number of items sold.
How could I resolve this problem?
I hope I've been clear enough, and thank you all for your help.
I suggest you use ROW_NUMBER to get the max value; your query would be:
WITH CTE_DailySales AS
(
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) Daily_Quantity_Sold,
ROW_NUMBER() OVER(PARTITION BY Year ORDER BY SUM(Order_Quantity) DESC) as rn
FROM
[sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day,
Year
)
SELECT
*
FROM
CTE_DailySales
WHERE rn = 1
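To see the ROW_NUMBER approach on concrete rows, here is a small sketch run through Python's built-in sqlite3 module instead of SQL Server (an assumption, purely so it can be executed anywhere; SQLite 3.25 or later is needed for window functions). The sales table and its rows are made up for illustration, and the daily aggregation is split into its own CTE to keep each step visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute(
    "CREATE TABLE sales (purchase_day TEXT, year INTEGER, order_quantity INTEGER)"
)
# Hypothetical order lines: several rows can share the same day.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2011-11-12", 2011, 20),
        ("2011-11-12", 2011, 28),
        ("2011-03-01", 2011, 30),
        ("2012-12-22", 2012, 100),
        ("2012-12-22", 2012, 23),
        ("2012-01-05", 2012, 50),
    ],
)

top_days = conn.execute("""
    WITH daily AS (            -- one row per day with its total quantity
        SELECT year, purchase_day, SUM(order_quantity) AS daily_quantity_sold
        FROM sales
        GROUP BY year, purchase_day
    ),
    ranked AS (                -- rank days within each year, best day first
        SELECT daily.*,
               ROW_NUMBER() OVER (PARTITION BY year
                                  ORDER BY daily_quantity_sold DESC) AS rn
        FROM daily
    )
    SELECT year, purchase_day, daily_quantity_sold
    FROM ranked
    WHERE rn = 1
    ORDER BY year
""").fetchall()

print(top_days)
```

If two days tie for the maximum within a year, ROW_NUMBER keeps only one of them; RANK would keep both.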
Simply:
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) AS Daily_Quantity_Sold,
MAX(SUM(Order_Quantity)) OVER(PARTITION BY Year) AS MAX_QTY_YEAR
FROM [sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day, Year;
Here is my dataset.
It has a reservation (unique ID), a reservation_dt, a fiscal year (the same year for the most part), the month as both number and name, a reservation status, the total number reserved, and a counter (basically 1 for each reservation row).
These are my guidelines (they need to be turned into columns by month):
Requested (count of all distinct reservations)
Num_Requested (sum of total_number_requested by month)
Booked (count of all distinct reservations where status is order created)
Num_Booked (sum of total_number_requested by month where status is order created)
Not_Booked (count of all distinct reservations where status is unfulfilled)
Not_Num_Booked (sum of total_number_requested by month where status is unfulfilled)
I am looking to translate this into a pivot table. This is what I've got so far, and I can't figure out why it's not working.
I figured I would turn each of the above guidelines into a column, using either sum(total_number_requested) or count(total_requested) where the reservation status is ..., and so on.
I'm open to any other ideas on how to make this simpler and make it work.
SELECT [month_name],
fyear AS fyear,
Requested,
Num_Requested
FROM (SELECT reservation,
reservation_status,
total_number_requested,
fyear,
[month_name],
[month],
total_requested
FROM #temp2) SourceTable
PIVOT (SUM(total_number_requested)
FOR reservation_status IN ([Requested])) PivotNumbRequested PIVOT(COUNT(reservation)
FOR total_requested IN ([Num_Requested])) PivotCountRequested
WHERE [month] = 7
ORDER BY fyear,
[month];
Use conditional expressions to emulate a pivot. Example:
SELECT fyear, Month, Monthname, Count(*) AS CountALL, Sum(total_number_requested) AS TotNum,
Sum(IIf(reservation_status = 'Order Created', total_number_requested, Null)) AS SumCreated
FROM tablename
GROUP BY fyear, Month, MonthName
More info:
SQLServer - Multiple PIVOT on same columns
Crosstab Query on multiple data points
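As a rough illustration of how the conditional-aggregation idea could cover all six requested columns, here is a sketch using CASE expressions, run through Python's built-in sqlite3 module rather than SQL Server (an assumption so the example is self-contained). The reservations table, its rows, and the exact status strings 'Order Created' and 'Unfulfilled' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute("""
    CREATE TABLE reservations (
        reservation INTEGER,
        month INTEGER,
        reservation_status TEXT,
        total_number_requested INTEGER
    )
""")
# Hypothetical rows; the status strings are assumed spellings.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?)",
    [
        (1, 7, "Order Created", 5),
        (2, 7, "Unfulfilled", 3),
        (3, 7, "Order Created", 2),
        (4, 8, "Unfulfilled", 4),
    ],
)

# Each CASE keeps only the rows matching one status; other rows contribute
# NULL (ignored by COUNT) or 0 (neutral for SUM).
rows = conn.execute("""
    SELECT month,
           COUNT(DISTINCT reservation) AS Requested,
           SUM(total_number_requested) AS Num_Requested,
           COUNT(DISTINCT CASE WHEN reservation_status = 'Order Created'
                               THEN reservation END) AS Booked,
           SUM(CASE WHEN reservation_status = 'Order Created'
                    THEN total_number_requested ELSE 0 END) AS Num_Booked,
           COUNT(DISTINCT CASE WHEN reservation_status = 'Unfulfilled'
                               THEN reservation END) AS Not_Booked,
           SUM(CASE WHEN reservation_status = 'Unfulfilled'
                    THEN total_number_requested ELSE 0 END) AS Not_Num_Booked
    FROM reservations
    GROUP BY month
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
```

CASE is used instead of IIf only because it is accepted by both SQLite and SQL Server; the logic is the same.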
I have the following table in Google BigQuery (Standard SQL) with the date and the revenue, both stored as integers. I need to get the totals by month and year, but I can't extract the part of the date I'm interested in: basically the digits in positions 1 through 6 of the first column.
Revenue Database:
This is what I have tried, but I would need to run it for every month separately:
SELECT sum(revenue)
from revenue.table
where date between 20210601 and 20210630
Any clue on how to do this? Thanks!
If the value is in int format, then use arithmetic:
select floor(date / 100) as yyyymm, sum(revenue)
from revenue.table
group by yyyymm;
If it were stored properly as a date, then you would use the built-in date_trunc():
select date_trunc(date, month) as yyyymm, sum(revenue)
from revenue.table
group by yyyymm;
There are similar functions for related data types: timestamp_trunc() and datetime_trunc().
If your date is actually an INTEGER, then I would use:
GROUP BY DIV(date, 100)
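The integer arithmetic is easy to check on a tiny table: dividing a yyyymmdd integer by 100 and dropping the remainder leaves yyyymm. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption for the sake of a runnable example). In SQLite, dividing two integers with / already truncates, which plays the role of DIV. The table revenue_table, the column date_int, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for BigQuery here
conn.execute("CREATE TABLE revenue_table (date_int INTEGER, revenue INTEGER)")
# Hypothetical rows with dates stored as yyyymmdd integers.
conn.executemany(
    "INSERT INTO revenue_table VALUES (?, ?)",
    [(20210601, 10), (20210630, 5), (20210715, 7)],
)

# Integer division by 100 drops the two day digits: 20210630 -> 202106.
monthly = conn.execute("""
    SELECT date_int / 100 AS yyyymm, SUM(revenue) AS total_rev
    FROM revenue_table
    GROUP BY yyyymm
    ORDER BY yyyymm
""").fetchall()

print(monthly)
```

Dividing by 10000 instead would leave just the year.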
I would go with the query below:
select
date_trunc(parse_date('%Y%m%d', '' || date), month) month,
sum(revenue) revenue
from `revenue.table`
group by month
Try this:
SELECT substring(cast(Date as string),1,6) YearMonth,
sum(Revenue)
FROM `<Dataset_NAME>.<Table_NAME>`
group by YearMonth;
Adding my own solution in case it helps someone in the future:
SELECT substring(CAST(date AS STRING), 1, 6) AS month, sum(revenue) AS total_rev
FROM `revenue.table`
GROUP BY month
ORDER BY month DESC
I have a table that calculates headcount based on hire date, but I want to see the cumulative headcount for each year. For example, I might have hired only 20 people in 2016, but I should show the overall headcount through 2015 plus those 20 in 2016, and so on for later years.
If my requirement starts from 2019, it should show the cumulative headcount up to 2019 and continue from there.
select FISC_YR_ID,ASSOC_TYPE_NM,
count(ASSOC_BDGE_NBR) over(order by FISC_YR_ID,FISC_MTH_ID rows between unbounded preceding and current row) as CUM_HC
--order by FISC_YR_ID asc )
from HC_table
where FISC_YR_ID >2018
this is the table
I can't quite tell what the query has to do with the question or sample data, so this focuses on the question.
You can use a cumulative sum:
select year, hired, sum(hired) over (order by year)
from t;
If you want to filter this, then use a subquery:
select t.*
from (select year, hired, sum(hired) over (order by year) as cumulative
from t
) t
where year > 2018
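The reason for the subquery is that WHERE is applied before the window function runs, so filtering in the same query makes the running total restart at the first year kept. A small sketch of the difference, using Python's built-in sqlite3 module as a stand-in for the actual database (an assumption; SQLite 3.25 or later is needed). The hires table and its numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.execute("CREATE TABLE hires (year INTEGER, hired INTEGER)")
# Hypothetical yearly hire counts.
conn.executemany(
    "INSERT INTO hires VALUES (?, ?)",
    [(2016, 20), (2017, 10), (2018, 5), (2019, 8), (2020, 2)],
)

# Filter in the same query: rows before 2019 are gone before the sum runs.
filtered_inside = conn.execute("""
    SELECT year, hired, SUM(hired) OVER (ORDER BY year) AS cumulative
    FROM hires
    WHERE year > 2018
    ORDER BY year
""").fetchall()

# Filter outside a subquery: the sum sees every year, then rows are dropped.
filtered_outside = conn.execute("""
    SELECT t.*
    FROM (SELECT year, hired, SUM(hired) OVER (ORDER BY year) AS cumulative
          FROM hires) t
    WHERE year > 2018
    ORDER BY year
""").fetchall()

print(filtered_inside)
print(filtered_outside)
```

Only the second form carries the headcount from earlier years into 2019.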
I'm trying to output the top 3 products per quarter, which should give 12 rows in total (3 top products for each of the 4 quarters).
The closest I've got is the query below; I have no idea how to partition it by quarter.
SELECT * FROM (SELECT QUARTER, PRODUCT_NAME, SUM(QUANTITY) "QTY_SOLD", SALES, SUM(PROFIT) "PROFIT_GENERATED" FROM DELIVERIES_FACT
WHERE EXTRACT(YEAR from SHIP_DATE) = 2015 GROUP BY QUARTER, PRODUCT_NAME, SALES ORDER BY "PROFIT_GENERATED" DESC)
WHERE rownum <= 3
getting an output of
I've written this SQL extracting the calendar quarter from SHIP_DATE; you can adjust as needed.
Similarly, RANK(), ROW_NUMBER(), and DENSE_RANK() all behave differently; you may wish to experiment with each analytic function to see which best fits your data and handles ties the way you want.
SELECT *
FROM (SELECT RANK() OVER (PARTITION BY SHIP_QUARTER
ORDER BY PROFIT_GENERATED desc) AS PROFIT_RANK_BY_Q,
ORIG.*
FROM
(SELECT TO_NUMBER(TO_CHAR(SHIP_DATE, 'Q')) AS SHIP_QUARTER, -- assuming Oracle (the question uses rownum), whose EXTRACT has no QUARTER field
PRODUCT_NAME,
SUM(QUANTITY) "QTY_SOLD", SALES, SUM(PROFIT) "PROFIT_GENERATED"
FROM DELIVERIES_FACT
WHERE EXTRACT(YEAR from SHIP_DATE) = 2015
GROUP BY TO_NUMBER(TO_CHAR(SHIP_DATE, 'Q')), PRODUCT_NAME, SALES
) ORIG
)
WHERE PROFIT_RANK_BY_Q <= 3
order by SHIP_QUARTER, PROFIT_RANK_BY_Q
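To make the difference between the three ranking functions concrete, here is a small sketch with a deliberate tie, run through Python's built-in sqlite3 module rather than Oracle (an assumption so it can be executed anywhere; SQLite 3.25 or later is needed). The q1_profit table and its values are invented, and the product name is added as a tiebreaker for ROW_NUMBER only so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
conn.execute("CREATE TABLE q1_profit (product TEXT, profit INTEGER)")
# Hypothetical profits for a single quarter; A and B tie for first place.
conn.executemany(
    "INSERT INTO q1_profit VALUES (?, ?)",
    [("A", 100), ("B", 100), ("C", 80), ("D", 60)],
)

ranking = conn.execute("""
    SELECT product,
           RANK()       OVER (ORDER BY profit DESC)          AS rnk,
           DENSE_RANK() OVER (ORDER BY profit DESC)          AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY profit DESC, product) AS row_num
    FROM q1_profit
    ORDER BY product
""").fetchall()

for row in ranking:
    print(row)
```

With a filter of rank <= 3, RANK keeps A, B and C; DENSE_RANK keeps all four products because D is only the third distinct profit value; ROW_NUMBER always keeps exactly three rows.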