Cumulative total from a table - sql

I have a table that calculates the headcount (HC) based on the date people are hired, but I want to see a cumulative HC for each year. For example, I might have hired only 20 people in 2016, but 2016 should show my overall HC through 2015 plus those 20, and so on for each following year.
If my requirement is from 2019 onwards, it should show the cumulative total up to 2019 and continue from there.
select FISC_YR_ID,ASSOC_TYPE_NM,
count(ASSOC_BDGE_NBR) over(order by FISC_YR_ID,FISC_MTH_ID rows between unbounded preceding and current row) as CUM_HC
--order by FISC_YR_ID asc )
from HC_table
where FISC_YR_ID >2018
this is the table

I can't quite tell what the query has to do with the question or sample data, so this focuses on the question.
You can use a cumulative sum:
select year, hired, sum(hired) over (order by year)
from t;
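If you want to filter this while still counting earlier years in the running total, use a subquery so that the cumulative sum is computed before the WHERE clause is applied: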
select t.*
from (select year, hired, sum(hired) over (order by year) as cumulative
from t
) t
where year > 2018
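To see why the subquery matters, here is a small sketch that runs both versions side by side. It uses Python's built-in sqlite3 module (SQLite 3.25+ supports window functions) rather than the asker's database, and the table t, its year/hired columns and the hire counts are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: table t(year, hired) as in the answer above;
# the hire counts are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, hired INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2015, 100), (2016, 20), (2017, 30), (2018, 10), (2019, 25), (2020, 15)],
)

# Filtering in the same query: WHERE runs before the window function,
# so the running total only sees 2019 onward and restarts there.
filter_first = conn.execute(
    """
    SELECT year, hired, SUM(hired) OVER (ORDER BY year) AS cumulative
    FROM t
    WHERE year > 2018
    ORDER BY year
    """
).fetchall()

# Filtering outside a subquery: the running total is computed over every
# year first, so the 2019 figure already includes the 2015-2018 hires.
filter_after = conn.execute(
    """
    SELECT *
    FROM (SELECT year, hired, SUM(hired) OVER (ORDER BY year) AS cumulative
          FROM t) AS x
    WHERE year > 2018
    ORDER BY year
    """
).fetchall()

print(filter_first)  # [(2019, 25, 25), (2020, 15, 40)]
print(filter_after)  # [(2019, 25, 185), (2020, 15, 200)]
```

The same reasoning applies to the asker's query: the `where FISC_YR_ID >2018` filter removes the earlier years before the window function can count them.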

Related

Problems with MAX() in SQL: not returning all the desired information

I am exploring a sales dataset in Microsoft SQL Server Management.
I want to obtain the day with the highest number of items sold for each year, i.e. a table like this (the values in the rows are totally random):
Year  Purchase Day  Max_Daily_Sales
2011  2011-11-12    48
2012  2012-12-22    123
I first tried to run this query:
WITH CTE_DailySales AS
(
SELECT DISTINCT
Purchase_Day,
Year,
SUM(Order_Quantity) OVER (PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold
FROM
[sql_cleaning].[dbo].[Sales$]
)
SELECT
Year, MAX(Daily_Quantity_Sold) AS Max_Daily_Sales
FROM
CTE_DailySales
GROUP BY
Year
ORDER BY
Year
It partially works since it gives me the highest quantity of items sold in a day for each year. However, I would also like to specify what day of the year it was.
If I try to write Purchase_Day in the Select statement, it returns the max for each day, not the single day with the highest number of items sold.
How could I resolve this problem?
I hope I've been clear enough, and thank you all for your help.
I suggest using ROW_NUMBER to get the max value; your query would be:
WITH CTE_DailySales AS
(
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) Daily_Quantity_Sold,
ROW_NUMBER() OVER(PARTITION BY Year ORDER BY SUM(Order_Quantity) DESC) as rn
FROM
[sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day,
Year
)
SELECT
*
FROM
CTE_DailySales
WHERE rn = 1
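Here is a runnable sketch of the same ROW_NUMBER idea, using Python's sqlite3 module in place of SQL Server. One assumption to flag: it computes the daily totals in a separate CTE first and then numbers them, rather than ordering by SUM(...) inside the aggregate query as the answer does; the result is the same. The Sales table and its values are hypothetical.

```python
import sqlite3

# Hypothetical Sales table with one row per order line (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Purchase_Day TEXT, Year INTEGER, Order_Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("2011-03-01", 2011, 20),
        ("2011-11-12", 2011, 30),
        ("2011-11-12", 2011, 18),
        ("2012-01-05", 2012, 50),
        ("2012-12-22", 2012, 100),
        ("2012-12-22", 2012, 23),
    ],
)

# Step 1: total quantity per day. Step 2: number the days within each year,
# largest total first, and keep only row number 1.
top_days = conn.execute(
    """
    WITH DailySales AS (
        SELECT Purchase_Day, Year, SUM(Order_Quantity) AS Daily_Quantity_Sold
        FROM Sales
        GROUP BY Purchase_Day, Year
    ),
    Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Year
                                  ORDER BY Daily_Quantity_Sold DESC) AS rn
        FROM DailySales
    )
    SELECT Year, Purchase_Day, Daily_Quantity_Sold
    FROM Ranked
    WHERE rn = 1
    ORDER BY Year
    """
).fetchall()

print(top_days)  # [(2011, '2011-11-12', 48), (2012, '2012-12-22', 123)]
```

ROW_NUMBER breaks ties arbitrarily, so if two days in a year share the top total only one of them is returned; using RANK() instead would keep both.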
Simply:
SELECT *
FROM (SELECT Purchase_Day,
Year,
SUM(Order_Quantity) AS Daily_Quantity_Sold,
MAX(SUM(Order_Quantity)) OVER(PARTITION BY Year) AS MAX_QTY_YEAR
FROM [sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day, Year
) d
WHERE Daily_Quantity_Sold = MAX_QTY_YEAR;

Count number of years since last deduction

I have a table similar to the one below, where the same account has its fiscal years (FY) and the deductions for each year broken out into multiple rows. Accounts can span anywhere from 1 to 20+ years. How do I group this into one unique row per account that shows the current year and how many years it's been since the account had a deduction?
from this:
to this:
I started using the CTE approach, as I have in the past, but as before it got ugly, and I know there has to be a simpler approach...
Assuming the current year is the most recent year, you would use aggregation:
select account, max(fy) as current_fy,
sum(case when fy = max_fy then deductions end) as this_year_deduction,
max(fy) - max(case when deductions < 0 then fy end) as years_since_deduction
from (select t.*, max(fy) over (partition by account) as max_fy
from t
) t
group by account;
Note: I assume the third column is the most recent deduction. The query uses a window function to extract that.
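A small sketch of the aggregation answer, run through Python's sqlite3 module so the result can be checked. It assumes, as the answer does, that a deduction is stored as a negative number; the table t(account, fy, deductions) and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table t(account, fy, deductions); following the answer,
# a deduction is assumed to be stored as a negative number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (account TEXT, fy INTEGER, deductions INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", 2016, -50), ("A", 2017, 0), ("A", 2018, 0), ("A", 2019, 0),
        ("B", 2018, 0), ("B", 2019, -20),
        ("C", 2019, 0),
    ],
)

rows = conn.execute(
    """
    SELECT account,
           MAX(fy) AS current_fy,
           -- deduction recorded in the account's latest year
           SUM(CASE WHEN fy = max_fy THEN deductions END) AS this_year_deduction,
           -- latest year minus latest year that had a deduction
           MAX(fy) - MAX(CASE WHEN deductions < 0 THEN fy END) AS years_since_deduction
    FROM (SELECT t.*, MAX(fy) OVER (PARTITION BY account) AS max_fy
          FROM t) AS x
    GROUP BY account
    ORDER BY account
    """
).fetchall()

print(rows)  # [('A', 2019, 0, 3), ('B', 2019, -20, 0), ('C', 2019, 0, None)]
```

Account C never had a deduction, so its years_since_deduction comes back as NULL (None in Python); wrap the expression in COALESCE if a sentinel value is preferred.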
Haven't used the methods below but I think it is close to what is needed. Corrections welcome. (Code not tested)
with nonZeroes as
(
-- most recent FY with a nonzero deduction, per account
select Account, max(FY) as last_deduction_fy
from YourTable
where deductions <> 0
group by Account
)
select t.Account,
max(t.FY) as FY,
max(t.FY) - max(n.last_deduction_fy) as years_since_deductions
from YourTable t
left join nonZeroes n on n.Account = t.Account
group by t.Account

SQL Server - Cumulative Sum over Last 12 Months, but starting from the Last Month (SQL Server 18)

I need to compute a cumulative sum of a value over the last 12 months. So far my cumulative calculation works, but it starts from the current month.
I need the total of the last 12 months, starting from the previous month.
Currently, I'm using an OVER clause in SQL, which runs the cumulative total starting from the current row/month.
Please refer to my code example below:
SELECT *,
SUM(Amount) OVER (PARTITION BY ID ORDER BY Date_Month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TwelveMoTtl
FROM (
SELECT DISTINCT
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDt), 0) AS DATE) AS Date_Month,
ID,
SUM(Amount) AS Amount
FROM MyTable
WHERE TransactionDt >= '2019-01-01'
GROUP BY
ID,
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDt), 0) AS DATE)
) AS Monthly;
Here are my results (using only one ID to simplify the example):
As you can see, the calculation starts from the current row and runs over the last 12 months.
For example, for the February row I need the cumulative sum from February 2019 through January 2020.
Any suggestions how could I do it?
Thanks,
You seem to understand window functions pretty well. You just have to adjust the window frame:
SUM(Amount) OVER (PARTITION BY ID
ORDER BY Date_Month
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
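A quick way to check the adjusted frame is to run it on a small dense series. This sketch uses Python's sqlite3 module instead of SQL Server; the Monthly table and the amounts 1 to 14 (one per month, Jan 2019 to Feb 2020) are hypothetical, chosen so each window sum is easy to verify by hand.

```python
import sqlite3

# Hypothetical monthly totals for one ID: Jan 2019 .. Feb 2020 (14 months),
# with Amount = 1, 2, ..., 14 so each window sum is easy to check.
months = [f"{2019 + i // 12}-{i % 12 + 1:02d}-01" for i in range(14)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Monthly (ID INTEGER, Date_Month TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Monthly VALUES (1, ?, ?)",
    [(m, i + 1) for i, m in enumerate(months)],
)

# The frame ends at 1 PRECEDING, so the current month is excluded and the
# window covers the 12 rows before it.
totals = dict(
    conn.execute(
        """
        SELECT Date_Month,
               SUM(Amount) OVER (PARTITION BY ID
                                 ORDER BY Date_Month
                                 ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) AS TwelveMoTtl
        FROM Monthly
        """
    ).fetchall()
)

print(totals["2020-02-01"])  # 90: Feb 2019 (2) through Jan 2020 (13)
print(totals["2019-01-01"])  # None: no earlier rows
```

Note that ROWS counts rows, not months: if a month has no row, the 12-row window reaches further back than 12 calendar months.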
I forgot that my table may have missing (NULL) rows for some months, so the cumulative sum has to work even when there are missing dates. For example:
I need it to run over the last 12 calendar months, whether or not there are amounts in those months.
Any ideas?
Thanks,
Rafael
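One common way to handle missing months (a technique not given in the thread itself) is to build a calendar of every month, LEFT JOIN the data onto it with missing amounts treated as 0, and only then apply the ROWS frame, so that 12 rows equal 12 calendar months. The sketch below runs it in Python's sqlite3 module; the Monthly table and its values are hypothetical. In SQL Server the calendar step would use DATEADD(MONTH, 1, Date_Month) instead of SQLite's date() modifier, and a recursive CTE there stops at 100 levels by default unless OPTION (MAXRECURSION n) is added.

```python
import sqlite3

# Hypothetical data with gaps: ID 1 has rows only for Jan 2019, Mar 2019 and Feb 2020.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Monthly (ID INTEGER, Date_Month TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Monthly VALUES (?, ?, ?)",
    [(1, "2019-01-01", 10), (1, "2019-03-01", 20), (1, "2020-02-01", 5)],
)

FRAME = "ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING"

# Without a calendar, "12 PRECEDING" means 12 rows, which here spans
# more than 12 calendar months.
raw = dict(conn.execute(f"""
    SELECT Date_Month, SUM(Amount) OVER (PARTITION BY ID ORDER BY Date_Month {FRAME})
    FROM Monthly""").fetchall())

# With a calendar: generate every month between the first and last month,
# pair it with every ID, and treat missing months as Amount = 0.
filled = dict(conn.execute(f"""
    WITH RECURSIVE calendar(Date_Month) AS (
        SELECT MIN(Date_Month) FROM Monthly
        UNION ALL
        SELECT date(Date_Month, '+1 month') FROM calendar
        WHERE Date_Month < (SELECT MAX(Date_Month) FROM Monthly)
    ),
    dense AS (
        SELECT i.ID, c.Date_Month, COALESCE(m.Amount, 0) AS Amount
        FROM calendar AS c
        CROSS JOIN (SELECT DISTINCT ID FROM Monthly) AS i
        LEFT JOIN Monthly AS m
               ON m.ID = i.ID AND m.Date_Month = c.Date_Month
    )
    SELECT Date_Month, SUM(Amount) OVER (PARTITION BY ID ORDER BY Date_Month {FRAME})
    FROM dense""").fetchall())

print(raw["2020-02-01"])     # 30: includes Jan 2019, 13 months back
print(filled["2020-02-01"])  # 20: only Feb 2019 .. Jan 2020
print(len(filled))           # 14 calendar months
```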

How to produce a customer retention table / cohort analysis with SQL

I'm trying to write an SQL query (Presto SQL syntax) to produce a customer retention table (see sample below).
A customer who makes at least one transaction in a month is considered as retained for that month.
This is the table:
user_id      transaction_date
bdcff651-    2018-01-01
bdcff641     2018-03-15
This is the result I would like to get:
The first row should be understood as follows:
Out of all customers who made their first transaction in the month of Jan 2018 (defined as “Jan Activation Cohort”), 35% subsequently made a transaction during the one month period following their first transaction date, 23% in the next month, 15% in the next month and so on.
Date        1st Month  2nd Month  3rd Month
2018-01-01  35%        23%        15%
2018-02-0   33%        26%        13%
2018-03-0   36%        27%        12%
As an example, if person XYZ makes his first transaction on 10th February 2018, his 1st month will be from 11th February 2018 to 10th March 2018, 2nd month will be from 11th March 2018 to 10th April 2018 and so on. This person’s details need to appear in the Feb 2018 cohort in the Customer Retention Table.
I would appreciate any help! Thanks.
You can use conditional aggregation. However, I am not sure what your real calculations are.
If I just use the built-in definitions of date_diff(), then the logic looks like:
select date_trunc('month', first_td) as yyyymm,
count(distinct user_id) as cnt,
(cast(count(distinct case when date_diff('month', first_td, transaction_date) = 1
then user_id
end) as double) /
count(distinct user_id)
) as month_1_ratio,
(cast(count(distinct case when date_diff('month', first_td, transaction_date) = 2
then user_id
end) as double) /
count(distinct user_id)
) as month_2_ratio
from (select t.*,
min(transaction_date) over (partition by user_id) as first_td
from t
) t
group by date_trunc('month', first_td)
order by yyyymm;
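Here is a runnable sketch of the same conditional-aggregation idea in Python's sqlite3 module. SQLite has no date_diff() or date_trunc(), so as an assumption the month difference is computed as (year * 12 + month) of each transaction minus that of the first transaction, i.e. calendar-month boundaries crossed. That is a simplification of the anniversary-based months described in the question. The user ids and dates are hypothetical.

```python
import sqlite3

# Hypothetical transactions (made-up user ids and dates).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("u1", "2018-01-05"), ("u1", "2018-02-10"), ("u1", "2018-03-01"),
        ("u2", "2018-01-20"), ("u2", "2018-02-02"),
        ("u3", "2018-01-31"),
        ("u4", "2018-02-14"), ("u4", "2018-04-01"),
    ],
)

cohorts = conn.execute(
    """
    WITH tx AS (
        -- month_idx = year * 12 + month, so a difference of 1 means "next calendar month"
        SELECT user_id, transaction_date,
               CAST(strftime('%Y', transaction_date) AS INTEGER) * 12
             + CAST(strftime('%m', transaction_date) AS INTEGER) AS month_idx
        FROM t
    ),
    firsts AS (
        SELECT tx.*,
               MIN(transaction_date) OVER (PARTITION BY user_id) AS first_td,
               MIN(month_idx) OVER (PARTITION BY user_id) AS first_idx
        FROM tx
    )
    SELECT strftime('%Y-%m-01', first_td) AS yyyymm,
           COUNT(DISTINCT user_id) AS cnt,
           -- 1.0 * forces floating-point division instead of integer division
           1.0 * COUNT(DISTINCT CASE WHEN month_idx - first_idx = 1 THEN user_id END)
               / COUNT(DISTINCT user_id) AS month_1_ratio,
           1.0 * COUNT(DISTINCT CASE WHEN month_idx - first_idx = 2 THEN user_id END)
               / COUNT(DISTINCT user_id) AS month_2_ratio
    FROM firsts
    GROUP BY strftime('%Y-%m-01', first_td)
    ORDER BY yyyymm
    """
).fetchall()

for row in cohorts:
    print(row)
```

In this sample the January cohort has 3 users, 2 of whom come back the next month and 1 two months later; the February cohort's only user skips March and returns in April.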
I am not familiar with Presto specifically and have no way to test Presto code. However, from searching around a bit, it looks like it wouldn't be too hard to convert from something like SQL Server syntax to Presto syntax. Here is what I would do in SQL Server; you should be able to carry the concept over to Presto:
with transactions_info_per_user as (
select user_id, min(transaction_date) as first_transaction,
cast(datepart(year, min(transaction_date)) as varchar(4)) + right('0' + cast(datepart(month, min(transaction_date)) as varchar(2)), 2) as activation_cohort
from my_table
group by user_id
),
users_per_activation_cohort as (
select activation_cohort, count(*) as number_of_users
from transactions_info_per_user
group by activation_cohort
),
months_after_activation_per_purchase as (
select distinct mt.user_id, ti.activation_cohort, datediff(month, ti.first_transaction, mt.transaction_date) AS months_after_activation
from my_table mt
left join transactions_info_per_user as ti
on mt.user_id = ti.user_id
),
final as (
select activation_cohort, months_after_activation, count(*) as user_count_per_cohort_with_purchase_per_month_after_activation
from months_after_activation_per_purchase
group by activation_cohort, months_after_activation
)
select f.activation_cohort, f.months_after_activation,
cast(f.user_count_per_cohort_with_purchase_per_month_after_activation as decimal(9,2)) / cast(u.number_of_users as decimal(9,2)) * 100 as pct_of_cohort
from final f
join users_per_activation_cohort u
on u.activation_cohort = f.activation_cohort
--Then pivot months_after_activation into columns
I was very explicit with the naming of things so you could follow the thought process. Here is an example of how to pivot in Presto. Hopefully this helps you!

SQL BigQuery : Calculate Value per time period

I'm new to SQL on BigQuery and I'm blocked on a project I have to compile.
I'm being asked to find the year-over-year growth of sales, as a percentage, on a database that doesn't even sum the revenues... I know I have to combine several queries, but I can't figure out how to calculate the sales growth.
Here is where I am at:
Does anybody have any insight on how to do this?
Thanks a lot !
(1) Starting from what you have, group by product line to get this year and last year's revenue in each row:
#standardsql
with yearly_sales AS (
select year, product_line, sum(revenue) as revenue
from `dataset.sales`
group by product_line, year
),
year_on_year AS (
select product_line, year, revenue,
array_agg(struct(year, revenue))
OVER(partition by product_line ORDER BY year
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS data
from yearly_sales
)
select * from year_on_year
(2) Compute year-on-year growth from the two values you now have in each row
Below is for BigQuery Standard SQL
#standardSQL
SELECT product_line, year, revenue, prev_year_revenue,
ROUND(100 * (revenue - prev_year_revenue)/prev_year_revenue) year_over_year_growth_percent
FROM (
SELECT product_line, year, revenue,
LAG(revenue) OVER(PARTITION BY product_line ORDER BY year) prev_year_revenue
FROM (
SELECT product_line, year, SUM(revenue) revenue
FROM `project.dataset.table`
GROUP BY product_line, year
)
)
-- ORDER BY product_line, year
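The LAG answer can be checked locally with Python's sqlite3 module, since the query uses only standard window syntax. The sales table, product lines and revenues below are hypothetical; 100.0 is used in place of 100 only so the arithmetic stays floating-point regardless of column types.

```python
import sqlite3

# Hypothetical sales rows (product lines and revenues are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_line TEXT, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Golf", 2005, 100), ("Golf", 2005, 50), ("Golf", 2006, 180), ("Golf", 2007, 198),
        ("Tennis", 2005, 200), ("Tennis", 2006, 150),
    ],
)

growth = conn.execute(
    """
    SELECT product_line, year, revenue, prev_year_revenue,
           ROUND(100.0 * (revenue - prev_year_revenue) / prev_year_revenue)
               AS year_over_year_growth_percent
    FROM (
        SELECT product_line, year, revenue,
               -- previous year's total within the same product line (NULL for the first year)
               LAG(revenue) OVER (PARTITION BY product_line ORDER BY year) AS prev_year_revenue
        FROM (
            -- yearly totals per product line
            SELECT product_line, year, SUM(revenue) AS revenue
            FROM sales
            GROUP BY product_line, year
        ) AS yearly
    ) AS with_prev
    ORDER BY product_line, year
    """
).fetchall()

for row in growth:
    print(row)
```

The first year of each product line has no previous year, so its growth is NULL. In BigQuery, SAFE_DIVIDE(x, y) returns NULL instead of raising an error if a previous year's revenue is 0.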
I tried with your information (plus some made-up data of my own for 2007) and got this far:
SELECT
year,
sum(revenue) as year_sum
FROM
YearlyRevenue.SportCompany
GROUP BY
year
ORDER BY
year_sum
Whose result is:
R year year_sum
1 2005 1.159E9
2 2006 1.4953E9
3 2007 1.5708E9
Now the % growth should be added. Have a look here for inspiration.
Let me know if you don't succeed and I will try the hard part, with no guarantees.