How to change the base year of constant dollars - sql

I have a table that contains the monthly values ($) of building permits, per region, per type of structure. I have them in current dollars and constant 2012 dollars. I would like to change the constant dollars to a base of the most recent month, i.e. 2021-05.
The World Bank says this about changing the base year of constant dollars:
For example, you can rescale the 2010 data to 2005 by first creating an index dividing each year of the constant 2010 series by its 2005 value (thus, 2005 will equal 1). Then multiply each year's index result by the corresponding 2005 current U.S. dollar price value.
My table looks something like this (in reality, there are many cities, each having many types of structures, eg: Residential, institutional, etc.):
Period      City      Type of structure  Value   valueAdjustment
2011-01-01  New York  Commercial, total  125478  Current Dollars
2011-01-01  New York  Commercial, total  129276  Constant dollars
2011-02-01  New York  Commercial, total  120568  Current Dollars
2011-02-01  New York  Commercial, total  124110  Constant dollars
...
2021-04-01  New York  Commercial, total  197296  Current Dollars
2021-04-01  New York  Commercial, total  154500  Constant dollars
2021-05-01  New York  Commercial, total  155043  Current Dollars
2021-05-01  New York  Commercial, total  121082  Constant dollars
What I've thought of doing is to create a column, Rank, and then use some variation of ROW_NUMBER to easily compare every month to 2021-05. I populated the rank like so:
WITH cteRank AS(
SELECT t.*,
Rnk = DENSE_RANK()OVER (ORDER BY YEAR([Period]), DATEPART(MONTH,[Period]) )
- COUNT(CASE WHEN YEAR([Period]) = 2021 AND DATEPART(MONTH,[Period])=2 THEN 1 END) OVER ()
- 1 +836
FROM [buildingPermits] t)
UPDATE cteRank SET [Rank] = Rnk FROM cteRank ;
The +836 is because with the way I coded it, it counts every instance of 2021-05, so I added the count to cancel it out. Not very efficient, but it works.
The resulting Rank column looks like this:
Period      Rank  City      Type of structure  Value   valueAdjustment
2011-01-01  -124  New York  Commercial, total  125478  Current Dollars
2011-01-01  -124  New York  Commercial, total  129276  Constant dollars
2011-02-01  -123  New York  Commercial, total  120568  Current Dollars
2011-02-01  -123  New York  Commercial, total  124110  Constant dollars
...
2021-04-01  -1    New York  Commercial, total  197296  Current Dollars
2021-04-01  -1    New York  Commercial, total  154500  Constant dollars
2021-05-01  0     New York  Commercial, total  155043  Current Dollars
2021-05-01  0     New York  Commercial, total  121082  Constant dollars
The last step is the World Bank's formula adapted to my need:
For example, you can rescale the 2012 data to 2021-05 by first creating an index dividing each year of the constant 2012 series by its 2021-05 value (thus, 2021-05 will equal 1). Then multiply each year's index result by the corresponding 2021-05 current U.S. dollar price value.
So for 2011-01, it would be:
(Constant 2011-01) / (Constant 2021-05) * (Current 2021-05)
129276 / 121082 * 155043
=165535
Here's some pseudo code for the division using a subquery, but it obviously returns an error because I didn't target a specific City and Type of structure. Partitioning returned an error as well: Incorrect syntax near the keyword 'over'.
SELECT Period, [Type of structure],City,
Value/ (SELECT Value
FROM [buildingPermits]
WHERE YEAR(Period) = 2021
and DATEPART(MONTH,Period) = 5
and valueAdjustment= 'Constant dollars' and [Rank] = 0)
FROM [buildingPermits]
WHERE valueAdjustment= 'Constant dollars'
Error: Subquery returned more than 1 value.
This is not permitted when the subquery follows =, !=, <, <= , >, >=
or when the subquery is used as an expression.
I think a SELF JOIN, a temp table, subquery or somehow using MAX(Rank) to get the value at rank 0 (2021-05) could do the trick, but I don't know how to go about implementing either of those solutions.
Any help is appreciated

Does this get you closer to what you want?
SELECT Const201101.[Period], Const201101.City, Const201101.[Type of structure],
    Const201101.[Value] as [Const 201101 Value],
    Const202105.[Value] as [Const 202105 Value],
    Curr202105.[Value] as [Curr 202105 Value]
FROM [buildingPermits] Const201101
-- base-month constant-dollar row for the same city and structure type
JOIN [buildingPermits] Const202105 ON
    Const202105.City = Const201101.City AND
    Const202105.[Type of structure] = Const201101.[Type of structure] AND
    Const202105.[Period] = '2021-05-01' AND
    Const202105.valueAdjustment = 'Constant dollars'
-- base-month current-dollar row for the same city and structure type
JOIN [buildingPermits] Curr202105 ON
    Curr202105.City = Const201101.City AND
    Curr202105.[Type of structure] = Const201101.[Type of structure] AND
    Curr202105.[Period] = '2021-05-01' AND
    Curr202105.valueAdjustment = 'Current Dollars'
WHERE Const201101.valueAdjustment = 'Constant dollars' AND Const201101.[Period] = '2011-01-01'
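As a rough sketch of where the self-join leads, the same two joins can compute the rebased value for every period at once instead of only 2011-01. The example below runs against an in-memory SQLite table from Python rather than SQL Server, so the dialect differs slightly; the column name type_of_structure, the parameter name base_period and the result name rebased_value are assumptions made for the illustration, and only four of the question's sample rows are loaded.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table. type_of_structure is a
# hypothetical, simplified spelling of the [Type of structure] column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buildingPermits ("
    "Period TEXT, City TEXT, type_of_structure TEXT, Value REAL, valueAdjustment TEXT)"
)
rows = [
    ("2011-01-01", "New York", "Commercial, total", 125478, "Current Dollars"),
    ("2011-01-01", "New York", "Commercial, total", 129276, "Constant dollars"),
    ("2021-05-01", "New York", "Commercial, total", 155043, "Current Dollars"),
    ("2021-05-01", "New York", "Commercial, total", 121082, "Constant dollars"),
]
conn.executemany("INSERT INTO buildingPermits VALUES (?, ?, ?, ?, ?)", rows)

# constant(period) / constant(base) * current(base), matched per city and type.
REBASE_SQL = """
SELECT c.Period, c.City, c.type_of_structure,
       c.Value / base_const.Value * base_curr.Value AS rebased_value
FROM buildingPermits AS c
JOIN buildingPermits AS base_const
  ON  base_const.City = c.City
  AND base_const.type_of_structure = c.type_of_structure
  AND base_const.Period = :base_period
  AND base_const.valueAdjustment = 'Constant dollars'
JOIN buildingPermits AS base_curr
  ON  base_curr.City = c.City
  AND base_curr.type_of_structure = c.type_of_structure
  AND base_curr.Period = :base_period
  AND base_curr.valueAdjustment = 'Current Dollars'
WHERE c.valueAdjustment = 'Constant dollars'
ORDER BY c.Period
"""
rebased = conn.execute(REBASE_SQL, {"base_period": "2021-05-01"}).fetchall()
for period, city, structure, value in rebased:
    print(period, city, structure, round(value))
```

For 2011-01 this reproduces the 165535 worked out by hand in the question, and the base month itself comes back as its own current-dollar value.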

Related

How to calculate a percentage on different values from same column with different criteria

I'm trying to write a query in SSRS (using SQL) to calculate an income statement percentage of sales for each month (the year is a parameter chosen by the user at runtime). However, the table I have to use for the data lists all of the years, months, accounts, dollars, etc together and looks like this:
ACCT_YEAR  ACCT_PERIOD  ACCOUNT_ID  CREDIT_AMOUNT
-------------------------------------------------
2021       1            4000        20000
2021       2            4000        25000
2021       1            5000        5000
2021       2            5000        7500
2021       1            6000        4000
2021       2            6000        8000
etc, etc (ACCOUNT_ID =4000 happens to be the sales account)
As an example,
I need to calculate
CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD=1, and ACCOUNT_ID=5000
/
CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD=1, and ACCOUNT_ID=4000
* 100
I would then do that for each ACCT_PERIOD in the ACCT_YEAR.
Hope that makes sense...What I want would look like this:
ACCT_YEAR  ACCT_PERIOD  ACCOUNT_ID  PERCENTAGE
----------------------------------------------
2021       1            5000        0.25
2021       2            5000        0.30
2021       1            6000        0.20
2021       2            6000        0.32
I'm trying to create a graph that shows the percentage of sales of roughly 10 different accounts (I know their specific account_ID's and will filter by those ID's) and use the line chart widget to show the trends by month.
I've tried CASE scenarios, OVER scenarios, and nested subqueries. I feel like this should be simple but I'm being hardheaded and not seeing the obvious solution.
Any suggestions?
Thank you!
One important behaviour to note is that window functions are applied after the where clause.
Because you need the window functions to be applied before any where clause (which would filter account 4000 out), they need to be used in one scope, and the where clause in another scope.
WITH
perc AS
(
    SELECT
        *,
        credit_amount * 100.0
        /
        -- sales (account 4000) for the same year and period
        SUM(
            CASE WHEN account_id = 4000 THEN credit_amount END
        )
        OVER (
            PARTITION BY acct_year, acct_period
        )
        AS credit_percentage
    FROM
        your_table
)
SELECT
    *
FROM
    perc
WHERE
    account_id IN (5000,6000)
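To make the scoping point concrete, here is a small sketch that runs the window expression both ways against the question's sample rows. It uses an in-memory SQLite database from Python (assuming a SQLite build with window-function support, 3.25 or later) rather than the asker's actual database; the names WINDOW_EXPR, same_scope and two_scopes are just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "acct_year INTEGER, acct_period INTEGER, account_id INTEGER, credit_amount REAL)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (2021, 1, 4000, 20000), (2021, 2, 4000, 25000),
        (2021, 1, 5000, 5000), (2021, 2, 5000, 7500),
        (2021, 1, 6000, 4000), (2021, 2, 6000, 8000),
    ],
)

WINDOW_EXPR = """
    credit_amount * 100.0
    / SUM(CASE WHEN account_id = 4000 THEN credit_amount END)
      OVER (PARTITION BY acct_year, acct_period)
"""

# Filter in the same scope: account 4000 is removed before the window runs,
# so the denominator is NULL for every remaining row.
same_scope = conn.execute(
    f"SELECT account_id, acct_period, {WINDOW_EXPR} AS pct "
    "FROM your_table WHERE account_id IN (5000, 6000) "
    "ORDER BY account_id, acct_period"
).fetchall()

# Filter in an outer scope: the window still sees account 4000, and the rows
# are only filtered afterwards.
two_scopes = conn.execute(
    f"WITH perc AS (SELECT account_id, acct_period, {WINDOW_EXPR} AS pct FROM your_table) "
    "SELECT account_id, acct_period, pct FROM perc "
    "WHERE account_id IN (5000, 6000) ORDER BY account_id, acct_period"
).fetchall()

print(same_scope)
print(two_scopes)
```

The first result has no usable percentages, while the second gives 25, 30, 20 and 32 percent, which matches the 0.25, 0.30, 0.20 and 0.32 the question asks for once divided by 100.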
You just need to use a matrix with a parent column group for ACCT_YEAR and a child column group for ACCT_PERIOD. Then you can use your calculation. If you format the textbox as a percentage, you won't need to multiply it by 100.
Textbox value: =IIF(IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0) = 0, 0, IIF(ACCOUNT_ID=5000, Sum(CREDIT_AMOUNT), 0) / IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0))

Cumulative sum with proc-sql

I want to create a table "table_min_date_100d_per_country" which contains the first date where the cumulation by date of COVID cases exceeds 100 per country.
I have the columns date, cas_covid, country.
Sample data is..
Date Cas_covid country
2019-12-31 10 France
2020-01-01 15 France
2020-01-02 45 France
2020-01-03 5 France
2020-01-04 15 France
2020-01-05 11 France
The output is
2020-01-05 COVID cases = 101 country = France
Thanks.
If you are using SAS, it is much easier to get the cumulative sum with data step. There is no direct way of doing so with proc sql. Assuming your data is called "old_data" and it is already sorted by country and date, the following code will create a new dataset with the cumulative sum ("cum_sum") variable, by country:
data temp_data;
set old_data;
by country;
if first.country then cum_sum=0;
cum_sum+Cas_covid;
run;
After calculating the cumulative sum by country, you can get your desired output with proc sql, if you prefer, by evaluating only the results of cum_sum over 99 and keeping only the minimum for every country, like:
proc sql;
create table table_min_date_100d_per_country as
select distinct
date,
cum_sum as COVID_cases,
country
from temp_data
where cum_sum >= 100 /*Only rows whose running total is >= 100 are evaluated*/
group by country /*Summary statistics are computed per country*/
having cum_sum = min(cum_sum) /*Per country, keep only the row with the smallest qualifying running total*/;
quit;
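For readers without SAS at hand, the two steps above (a running total per country, then the first row at or over the threshold) can be sketched in plain Python. This is only an illustration on the question's sample rows; the function name first_date_reaching and its threshold parameter are made up for the example, and it uses >= 100 as the answer does.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (date, cas_covid, country).
rows = [
    ("2019-12-31", 10, "France"),
    ("2020-01-01", 15, "France"),
    ("2020-01-02", 45, "France"),
    ("2020-01-03", 5, "France"),
    ("2020-01-04", 15, "France"),
    ("2020-01-05", 11, "France"),
]

def first_date_reaching(rows, threshold=100):
    """Return {country: (date, running_total)} for the first date on which
    the running total of cases reaches the threshold."""
    result = {}
    # Sort by country, then date (ISO date strings sort chronologically),
    # mirroring the sorted BY-group processing of the data step.
    ordered = sorted(rows, key=itemgetter(2, 0))
    for country, group in groupby(ordered, key=itemgetter(2)):
        running = 0  # reset at the start of each country, like first.country
        for date, cases, _ in group:
            running += cases
            if running >= threshold:
                result[country] = (date, running)
                break
    return result

first_hits = first_date_reaching(rows)
print(first_hits)
```

On the sample data the running totals are 10, 25, 70, 75, 90 and 101, so France first reaches the threshold on 2020-01-05 with 101 cases.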
If your data is not sorted, you should first run
proc sort data=old_data;
by country date;
run;
Best regards,

How to add custom YoY field to output?

I'm attempting to determine the YoY growth by month, 2017 to 2018, for number of Company bookings per property.
I've tried casting and windowed functions but am not obtaining the correct result.
Example Table 1: Bookings
BookID Amnt BookType InDate OutDate PropertyID Name Status
-----------------------------------------------------------------
789555 $1000 Company 1/1/2018 3/1/2018 22111 Wendy Active
478141 $1250 Owner 1/1/2017 2/1/2017 35825 John Cancelled
There are only two book types (e.g., Company, Owner) and two Book Status (e.g., Active and Cancelled).
Example Table 2: Properties
Property ID State Property Start Date Property End Date
---------------------------------------------------------------------
33111 New York 2/3/2017
35825 Michigan 7/21/2016
The Property End Date is blank when the company still owns it.
Example Table 3: Months
Start of Month End of Month
-------------------------------------------
1/1/2018 1/31/2018
The previous developer created this table which includes a row for each month from 2015-2020.
I've tried many various iterations of my current code and can't even come close.
Desired Outcome
I need to find the YoY growth by month, 2017 to 2018, for number of Company bookings per property. The stakeholder has requested the output to have the below columns:
Month Name Bookings_Per_Property_2017 Bookings_Per_Property_2018 YoY
-----------------------------------------------------------------------
The number of Company bookings per property in a month should be calculated by counting the total number of active Company bookings made in a month divided by the total number of properties active in the month.
Here is a solution that should be close to what you need. It works by:
LEFT JOINing the three tables; the important part is to properly check the overlaps in date ranges between months(StartOfMonth, EndOfMonth), bookings(InDate, OutDate) and properties(PropertyStartDate, PropertyEndDate): you can have a look at this reference post for general discussion on how to proceed efficiently
aggregating by month, and using conditional COUNT(DISTINCT ...) to count the number of properties and bookings in each month and year. The logic implicitly relies on the fact that this aggregate function ignores NULL values. Since we are using LEFT JOINs, we also need to handle the possibility that a denominator could have a 0 value.
Notes:
you did not provide expected results so this cannot be tested
also, you did not explain how to compute the YoY column, so I left it alone; I assume that you can easily compute it from the other columns
Query:
SELECT
    MONTH(m.StartOfMonth) AS [Month],
    -- 1.0 * forces decimal division; COUNT / COUNT alone would truncate to an integer
    1.0 * COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2017 THEN b.BookID END)
        / NULLIF(COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2017 THEN p.PropertyID END), 0)
        AS Bookings_Per_Property_2017,
    1.0 * COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2018 THEN b.BookID END)
        / NULLIF(COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2018 THEN p.PropertyID END), 0)
        AS Bookings_Per_Property_2018
FROM months m
LEFT JOIN bookings b
    ON m.StartOfMonth <= b.OutDate
    AND m.EndOfMonth >= b.InDate
    AND b.status = 'Active'
    AND b.BookType = 'Company'
LEFT JOIN properties p
    -- a blank end date means the property is still owned
    ON m.StartOfMonth <= COALESCE(p.PropertyEndDate, m.StartOfMonth)
    AND m.EndOfMonth >= p.PropertyStartDate
GROUP BY MONTH(m.StartOfMonth)
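The question does not define YoY, so the following is only one plausible reading: relative growth, (2018 value - 2017 value) / 2017 value, left undefined when the 2017 value is missing or zero (the NULLIF above can produce a NULL ratio). The function name yoy_growth is hypothetical and the input numbers are purely illustrative, not results from the asker's data.

```python
from typing import Optional

def yoy_growth(prev: Optional[float], curr: Optional[float]) -> Optional[float]:
    """Relative change from prev to curr, or None when it is undefined
    (a missing input or a zero previous-year value)."""
    if prev is None or curr is None or prev == 0:
        return None
    return (curr - prev) / prev

# Illustrative bookings-per-property ratios only.
print(yoy_growth(0.5, 0.75))   # growth from 0.5 to 0.75 bookings per property
print(yoy_growth(0, 0.75))     # undefined: nothing to compare against
print(yoy_growth(None, 0.75))  # undefined: no active properties in 2017
```

In SQL the same idea would be the difference of the two ratio columns divided by NULLIF of the 2017 column, computed in an outer query over the result above.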

Finding a lagged cumulative total from a table containing both cumulative and delta entries

I have a SQL table with a schema where a value is either a cumulative value for a particular category, or a delta on top of the previous value. While I appreciate this is not a particularly great design, it comes from an external source and thus I can't change it in any way.
The table looks something like the following:
Date Category AmountSoldType AmountSold
-----------------------------------------------------
Jan 1 Apples Cumulative 100
Jan 1 Bananas Cumulative 50
Jan 2 Apples Delta 20
Jan 2 Bananas Delta 10
Jan 3 Apples Delta 25
Jan 3 Bananas Cumulative 75
For this example, I want to produce the total cumulative number of fruits sold by item at the beginning of each day:
Date Category AmountSold
--------------------------------
Jan 1 Apples 0
Jan 1 Bananas 0
Jan 2 Apples 100
Jan 2 Bananas 50
Jan 3 Apples 170
Jan 3 Bananas 60
Jan 4 Apples 195
Jan 4 Bananas 75
Intuitively, I want to take the most recent cumulative total, and add any deltas that have appeared since that entry.
I imagine something akin to
SELECT Date, Category
LEAD((subquery??), 1) OVER (PARTITION BY Category ORDER BY Date) AS Amt
FROM Fruits
GROUP BY Date, Category
ORDER BY Date ASC
is what I want, but I'm having trouble putting the right subquery together. Any suggestions?
You seem to want to add the deltas to the most recent cumulative -- all before the current date.
If so, I think this logic does what you want:
select f.*,
       (max(case when date = date_cumulative then amountsold else 0 end)
            over (partition by category) +
        sum(case when date > date_cumulative then amountsold else 0 end)
            over (partition by category order by date
                  rows between unbounded preceding and 1 preceding)
       ) amt
from (select f.*,
             -- date of the most recent cumulative row up to and including this row
             max(case when AmountSoldType = 'Cumulative' then date end)
                 over (partition by category order by date
                       rows between unbounded preceding and current row) as date_cumulative
      from fruits f
     ) f
I'm a bit confused by this data set (notwithstanding the mistake in adding up the apples). I assume the raw data states end-of-day figures, so for example 20 apples were sold on Jan 2 (because there is a delta of 20 reported for that day).
In your example results, it does not appear valid to say that zero apples were sold on Jan 1. It isn't actually possible to say how many were sold on that day, because it is not clear whether the 100 cumulative apples were accrued during Jan 1 (and thus should be excluded from the start-of-day figure you seek) or whether they were accrued on previous days (and should be included), or some mix of the two. That day's data should thus be null.
It is also not clear whether all data sets must begin with a cumulative, or whether data sets can begin with a delta (which might require working backwards from a subsequent cumulative), and whether you potentially have access to multiple data sets from your external source which form a continuous consistent sequence, or whether "cumulatives" relate purely to a single data set received. I'm going to assume at least that all data sets begin with a cumulative.
All that said, this problem is a simple case of firstly converting all rows into either all deltas, or all cumulatives. Assuming we go for all cumulatives, then recursing through each row in order, it is a case of either selecting the AmountSold as-is (if the row is a cumulative), or adding the AmountSold to the result of the previous step (if it is a delta).
Once pre-processed like this, then for a start-of-day cumulative, it is all just a question of looking at the previous day's cumulative (which was an end-of-day cumulative, if my initial assumption was correct that all raw data relates to end-of-day figures).
Using the LAG function in this final step to get the previous day's cumulative, will also neatly produce a null for the first row.
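A minimal sketch of that two-step idea in plain Python, using the question's sample rows: walk each category in date order, keep an end-of-day cumulative (taken as-is for cumulative rows, previous plus delta for delta rows), and report the previous end-of-day figure as the start-of-day value. The function name start_of_day_totals is made up, the day numbers 1 to 3 stand in for "Jan 1" to "Jan 3", and it assumes, as stated above, that every category begins with a cumulative row.

```python
# Sample rows from the question: (day, category, amount_sold_type, amount_sold).
rows = [
    (1, "Apples", "Cumulative", 100),
    (1, "Bananas", "Cumulative", 50),
    (2, "Apples", "Delta", 20),
    (2, "Bananas", "Delta", 10),
    (3, "Apples", "Delta", 25),
    (3, "Bananas", "Cumulative", 75),
]

def start_of_day_totals(rows):
    """Return [(day, category, start_of_day_total)]; the total is None on the
    first day of a category, where it cannot be known."""
    end_of_day = {}  # category -> most recent end-of-day cumulative
    out = []
    for day, category, kind, amount in sorted(rows):
        previous = end_of_day.get(category)
        # Start of day equals the previous day's end-of-day figure (the LAG step).
        out.append((day, category, previous))
        if kind == "Cumulative":
            end_of_day[category] = amount
        elif previous is None:
            # Assumption from the text: a category never starts with a delta.
            raise ValueError(f"{category} starts with a delta row")
        else:
            end_of_day[category] = previous + amount
    return out

totals = start_of_day_totals(rows)
for row in totals:
    print(row)
```

On the sample data this gives 100 and 120 for apples on days 2 and 3, and 50 and 60 for bananas, with nulls on day 1.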

Query assistance please

Given the following table (much simplified for the purposes of this question):
id perPeriod actuals createdDate
---------------------------------------------------------
1 14 22 2011-10-04 00:00:00.000
2 14 9 2011-10-04 00:00:00.000
3 14 3 2011-10-03 00:00:00.000
4 14 5 2011-10-03 00:00:00.000
I need a query that gives me the average daily "actuals" figure. Note, however, that there are TWO RECORDS PER DAY (often more), so I can't just do AVG(actuals).
Also, if the daily "actuals" average exceeds the daily "perPeriod" average, I want to take the perPeriod value instead of the "average" value. Thus, in the case of the first two records: The actuals average for 4th October is (22+9) / 2 = 15.5. And the perPeriod average for the same day is (14 + 14) / 2 = 14. Now, 15.5 is greater than 14, so the daily "actuals" average for that day should be the "perPeriod" average.
Hope that makes sense. Any pointers greatly appreciated.
EDIT
I need an overall daily average, not an average per date. As I said, I would love to just do AVG(actuals) on the entire table, but the complicating factor is that a particular day can occupy more than one row, which would skew the results.
Is this what you want?
First, if the perPeriod average needed to be computed over a different grouping (it doesn't in this case), then you would need to use a subquery like this:
Select t.CreatedDate,
    Case When Avg(actuals) < p.PerPeriodAvg
        Then Avg(actuals) Else p.PerPeriodAvg End Average
From table1 t Join
    (Select CreatedDate, Avg(perPeriod) PerPeriodAvg
    From table1
    Group By CreatedDate) as p
    On p.CreatedDate = t.CreatedDate
Group By t.CreatedDate, p.PerPeriodAvg
or, in this case, since the perPeriod average is grouped on the same thing (CreatedDate) as the actuals average, you don't need a subquery, so it is even easier:
Select t.CreatedDate,
    Case When Avg(actuals) < Avg(perPeriod)
        Then Avg(actuals) Else Avg(perPeriod) End Average
From table1 t
Group By t.CreatedDate
with your sample data, both of these return
CreatedDate Average
----------------------- -----------
2011-10-03 00:00:00.000 4
2011-10-04 00:00:00.000 14
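SELECT DAY(createdDate), MONTH(createdDate), YEAR(createdDate),
    -- the smaller of the daily actuals average and the daily perPeriod value
    CASE WHEN AVG(actuals) < MAX(perPeriod) THEN AVG(actuals) ELSE MAX(perPeriod) END
FROM MyTable
GROUP BY DAY(createdDate), MONTH(createdDate), YEAR(createdDate)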
Try this out:
select createdDate,
case
when AVG(actuals) > max(perPeriod) then max(perPeriod)
else AVG(actuals)
end
from SomeTestTable
group by createdDate
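The queries above return one row per date, whereas the EDIT in the question asks for a single overall daily average. One way to get there, sketched here under the assumption that "overall" means the plain average of the capped per-day values, is to wrap the per-day query in an outer AVG. The example runs on an in-memory SQLite table from Python with the question's four sample rows; the names daily_value, per_day and overall_daily_average are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTestTable ("
    "id INTEGER, perPeriod REAL, actuals REAL, createdDate TEXT)"
)
conn.executemany(
    "INSERT INTO SomeTestTable VALUES (?, ?, ?, ?)",
    [
        (1, 14, 22, "2011-10-04"),
        (2, 14, 9, "2011-10-04"),
        (3, 14, 3, "2011-10-03"),
        (4, 14, 5, "2011-10-03"),
    ],
)

# Inner query: one capped value per day. Outer query: average of those days,
# so a day with many rows counts exactly once.
OVERALL_SQL = """
SELECT AVG(daily_value) AS overall_daily_average
FROM (
    SELECT createdDate,
           CASE WHEN AVG(actuals) > AVG(perPeriod)
                THEN AVG(perPeriod) ELSE AVG(actuals) END AS daily_value
    FROM SomeTestTable
    GROUP BY createdDate
) AS per_day
"""
overall = conn.execute(OVERALL_SQL).fetchone()[0]
print(overall)
```

With the sample data, 4 October is capped at 14 (its actuals average is 15.5) and 3 October stays at 4, so the overall figure is 9. Note that the columns are declared REAL here; with integer columns SQL Server's AVG would truncate the per-day averages.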