Calculating cumulative returns using SQL - sql

I currently generate a user's "monthly_return" between two months using the code below. How would I turn "monthly_return" into a cumulative "linked" return similar to the StackOverflow question linked below?
Similar question: Running cumulative return in sql
I tried:
exp(sum(log(1 + cumulative_return) over (order by date)) - 1)
But I get the error:
PG::WrongObjectType: ERROR: OVER specified, but log is not a window function nor an aggregate function LINE 3: exp(sum(log(1 + cumulative_return) over (order by date)) - 1... ^ : SELECT portfolio_id, exp(sum(log(1 + cumulative_return) over (order by date)) - 1) FROM (SELECT date, portfolio_id, (value_cents * 0.01 - cash_flow_cents * 0.01) / (lag(value_cents * 0.01, 1) over ( ORDER BY portfolio_id, date)) - 1 AS cumulative_return FROM portfolio_balances WHERE portfolio_id = 16 ORDER BY portfolio_id, date) as return_data;
The input data would be:
1/1/2017: $100 value, $100 cash flow
1/2/2017: $100 value, $0 cash flow
1/3/2017: $100 value, $0 cash flow
1/4/2017: $200 value, $100 cash flow
The output would be:
1/1/2017: 0% cumulative return
1/2/2017: 0% cumulative return
1/3/2017: 0% cumulative return
1/4/2017: 0% cumulative return
My current code, which shows monthly returns that are not linked (cumulative):
SELECT
date,
portfolio_id,
(value_cents * 0.01 - cash_flow_cents * 0.01) / (lag(value_cents * 0.01, 1) over ( ORDER BY portfolio_id, date)) - 1 AS monthly_return
FROM portfolio_balances
WHERE portfolio_id = 16
ORDER BY portfolio_id, date;

If you want a cumulative sum:
SELECT p.*,
SUM(monthly_return) OVER (PARTITION BY portfolio_id ORDER BY date) as running_monthly_return
FROM (SELECT date, portfolio_id,
(value_cents * 0.01 - cash_flow_cents * 0.01) / (lag(value_cents * 0.01, 1) over ( ORDER BY portfolio_id, date)) - 1 AS monthly_return
FROM portfolio_balances
WHERE portfolio_id = 16
) p
ORDER BY portfolio_id, date;
I don't see that this makes much sense, because you have the cumulative sum of a ratio, but that appears to be what you are asking for.
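If what you are after is the linked (geometric) return rather than a running sum, a common approach is exp(sum(ln(1 + r))) - 1, with the OVER clause attached to SUM rather than to the logarithm. In the attempt in the question it ended up on log(), which is what the error complains about; note also that PostgreSQL's log() is base 10, while ln() is the natural logarithm. The sketch below runs that query against an in-memory SQLite database from Python so it can be checked locally. It has not been tested on PostgreSQL. The table layout, the ISO dates (the four rows treated as consecutive months), the extra portfolio 17 and its values are assumptions for illustration. LN/EXP are registered from Python's math module because not every SQLite build includes them, and it assumes 1 + r is always positive, since the logarithm is undefined otherwise.

```python
import math
import sqlite3

# Hypothetical sample data (amounts in cents). Portfolio 16 mirrors the
# question's four rows, treated as consecutive months; portfolio 17 is
# made up to show non-zero returns.
ROWS = [
    ("2017-01-01", 16, 10000, 10000),
    ("2017-02-01", 16, 10000, 0),
    ("2017-03-01", 16, 10000, 0),
    ("2017-04-01", 16, 20000, 10000),
    ("2017-01-01", 17, 10000, 10000),
    ("2017-02-01", 17, 11000, 0),
    ("2017-03-01", 17, 9900, 0),
]

QUERY = """
WITH r AS (
    SELECT date, portfolio_id,
           -- no previous balance on the first row: treat it as 0%
           COALESCE((value_cents - cash_flow_cents) * 1.0
                    / LAG(value_cents) OVER (PARTITION BY portfolio_id ORDER BY date)
                    - 1, 0) AS monthly_return
    FROM portfolio_balances
)
SELECT date, portfolio_id, monthly_return,
       -- linked return = product of (1 + r) - 1 = exp(sum(ln(1 + r))) - 1;
       -- the OVER clause belongs to SUM, not to LN
       EXP(SUM(LN(1 + monthly_return))
           OVER (PARTITION BY portfolio_id ORDER BY date)) - 1 AS cumulative_return
FROM r
ORDER BY portfolio_id, date
"""


def cumulative_returns(rows):
    conn = sqlite3.connect(":memory:")
    # Not every SQLite build ships LN/EXP, so register Python's versions.
    conn.create_function("ln", 1, math.log)
    conn.create_function("exp", 1, math.exp)
    conn.execute(
        "CREATE TABLE portfolio_balances "
        "(date TEXT, portfolio_id INTEGER, value_cents INTEGER, cash_flow_cents INTEGER)"
    )
    conn.executemany("INSERT INTO portfolio_balances VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cumulative_returns(ROWS):
        print(row)
```

With the question's data, portfolio 16 stays at 0% on every row. Portfolio 17 goes +10% and then -10%, which links to -1%, whereas a plain running sum of the same two returns would show 0%.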

Related

How to spread month to day with amount value divided by total days per month

I have data with an amount for one month and want to spread it over 30 days.
If the amount for one month is 20000, then the amount per day is 666.67.
The following are sample data and results:
Account / Project / Date / Segment / Amount
Acc1 / 1 / September 2022 / Actual / 20000
Result:
I need a query for SQL Server.
You may try a set-based approach using an appropriate number table and a calculation with windowed COUNT().
Data:
SELECT *
INTO Data
FROM (VALUES
('Acc1', 1, CONVERT(date, '20220901'), 'Actual', 20000.00)
) v (Account, Project, [Date], Segment, Amount)
Statement for SQL Server 2016 and later (the number table is generated with a JSON-based approach using OPENJSON()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, CONVERT(int, [key]), d.[Date])
FROM OPENJSON('[1' + REPLICATE(',1', DATEDIFF(day, d.[Date], EOMONTH(d.[Date]))) + ']')
) a (Amount, Date)
Statement for SQL Server 2022 (the number table is generated with GENERATE_SERIES()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, [value], d.[Date])
FROM GENERATE_SERIES(0, DATEDIFF(day, d.[Date], EOMONTH(d.[Date])))
) a (Amount, Date)
Notes:
Both approaches calculate the days for each month. If you always want 30 days per month, replace DATEDIFF(day, d.[Date], EOMONTH(d.[Date])) with 29.
There is a rounding issue with this calculation. You may need to implement an additional calculation for the last day of the month.
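One way to do that additional last-day calculation, sketched in Python rather than T-SQL: round the daily share to cents and let the last day absorb whatever is left. The function name and the use of Decimal are my own choices, and it assumes month_start is the first day of the month.

```python
import calendar
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP


def spread_with_last_day_remainder(amount: Decimal, month_start: date):
    """Split a monthly amount into daily rows; the last day absorbs rounding.

    Assumes month_start is the first day of the month.
    """
    days = calendar.monthrange(month_start.year, month_start.month)[1]
    # Daily share rounded to cents (halves rounded away from zero).
    daily = (amount / days).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    rows = [(month_start + timedelta(days=i), daily) for i in range(days - 1)]
    # Whatever is left after the first days - 1 shares goes on the last day.
    rows.append((month_start + timedelta(days=days - 1), amount - daily * (days - 1)))
    return rows


if __name__ == "__main__":
    for day, share in spread_with_last_day_remainder(Decimal("20000.00"), date(2022, 9, 1))[-2:]:
        print(day, share)
```

For 20000 in September 2022 this gives 666.67 for the first 29 days and 666.57 on September 30, so the total is exactly 20000.00.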
You can use a recursive CTE to generate each day of the month and then divide the amount by the number of days in the month to achieve the required output:
DECLARE @Amount NUMERIC(18,2) = 20000,
@MonthStart DATE = '2022-09-01'
;WITH CTE
AS
(
SELECT
CurrentDate = @MonthStart,
DayAmount = CAST(@Amount/DAY(EOMONTH(@MonthStart)) AS NUMERIC(18,2)),
RemainingAmount = CAST(@Amount - (@Amount/DAY(EOMONTH(@MonthStart))) AS NUMERIC(18,2))
UNION ALL
SELECT
CurrentDate = DATEADD(DAY,1,CurrentDate),
DayAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(@MonthStart)
THEN RemainingAmount
ELSE DayAmount END,
RemainingAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(@MonthStart)
THEN 0
ELSE CAST(RemainingAmount-DayAmount AS NUMERIC(18,2)) END
FROM CTE
WHERE CurrentDate < EOMONTH(@MonthStart)
)
SELECT
CurrentDate,
DayAmount
FROM CTE
In case you want an equal split without rounding errors and without loops, you can use this calculation. It spreads the rounding error across all entries, so they are all as equal as possible.
DECLARE @Amount NUMERIC(18,2) = 20000,
@MonthStart DATE = '20220901'
SELECT DATEADD(DAY,Numbers.i - 1,@MonthStart) AS [Date]
, ShareSplit.Calculated_Share
, SUM(ShareSplit.Calculated_Share) OVER (ORDER BY (SELECT NULL)) AS Calculated_Total
FROM (SELECT DISTINCT number FROM master..spt_values WHERE number BETWEEN 1 AND DAY(EOMONTH(@MonthStart)))Numbers(i)
CROSS APPLY(SELECT CAST(ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) * 0.01
+ CASE
WHEN Numbers.i
<= ABS((@Amount - (ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) / 100.0 * DAY(EOMONTH(@MonthStart)))) * 100)
THEN 0.01 * SIGN(@Amount - (ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) / 100.0 * DAY(EOMONTH(@MonthStart))))
ELSE 0
END AS DEC(18,2)) AS Calculated_Share
)ShareSplit
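The same idea in Python, as a sketch for checking the arithmetic: start from the daily share rounded to whole cents, compute how many cents that leaves over (or overshoots), and shift one cent on that many days. It follows the logic of the query above but is my own translation; the function name is hypothetical and it assumes month_start is the first day of the month.

```python
import calendar
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP


def spread_evenly(amount: Decimal, month_start: date):
    """Split amount across the days of the month so shares differ by at most 1 cent.

    Assumes month_start is the first day of the month.
    """
    days = calendar.monthrange(month_start.year, month_start.month)[1]
    cents = int((amount * 100).to_integral_value(rounding=ROUND_HALF_UP))
    # Daily share rounded to whole cents (halves rounded away from zero).
    base = int((Decimal(cents) / days).to_integral_value(rounding=ROUND_HALF_UP))
    diff = cents - base * days          # leftover cents; may be negative
    step = 1 if diff > 0 else -1
    rows = []
    for i in range(days):
        # The first abs(diff) days get one cent more (or less) than base.
        share = base + (step if i < abs(diff) else 0)
        rows.append((month_start + timedelta(days=i), Decimal(share) / 100))
    return rows
```

For 20000 in September 2022 the rounded share 666.67 overshoots by 0.10, so the first 10 days get 666.66 and the remaining 20 days get 666.67, which adds up to exactly 20000.00.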

SQL - Calculate percentage by group, for multiple groups

I have a table in GBQ in the following format:
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders. I did it as follows:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders by month, so that the percentages for each month add up to 100%.
Currently I execute the query once per month, which is not practical:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way to execute a query once and get this result?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...
SELECT
SIGN(Orders) AS HasOrders,
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
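A local check of the same pattern (a window SUM over the grouped COUNT(*), partitioned by month), run from Python against an in-memory SQLite database. The rows are made up so that month 1 splits 25/75 and month 2 splits 45/55, and the CASE expression from the question is used in place of SIGN() so the sketch does not depend on which math functions the SQLite build includes. It is not BigQuery, so treat it as an illustration of the logic only.

```python
import sqlite3

# Hypothetical rows (UserId, Orders, Month): month 1 has 1 of 4 users
# without orders, month 2 has 9 of 20.
ROWS = (
    [("U0", 0, 1)] + [(f"A{i}", 5, 1) for i in range(3)]
    + [(f"Z{i}", 0, 2) for i in range(9)] + [(f"B{i}", 2, 2) for i in range(11)]
)

QUERY = """
SELECT Month,
       CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders,
       -- the window SUM runs over the grouped counts, once per month
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts
FROM user_orders
GROUP BY Month, CASE WHEN Orders = 0 THEN 0 ELSE 1 END
ORDER BY Month, HasOrders
"""


def parts_by_month(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_orders (UserId TEXT, Orders INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO user_orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Each month's Parts values add up to 100 because the denominator is that month's total, not the whole table's.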
You've stated that it's important for the total to be 100%, so you might consider rounding down for the no-orders group and rounding up for the has-orders group in those scenarios where the percentages fall precisely on an odd multiple of 0.5%. Or perhaps rounding toward even, or rounding the smaller share down, would be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNC(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5
AND MOD(TRUNC(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
It's usually not advisable to test floating-point values for equality, and I don't know how BigQuery's float64 will behave in the comparison against 0.5. One half is nevertheless exactly representable in binary. Here they are for a case where the split is 101 vs 99. I don't have immediate access to BigQuery, so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09
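To sidestep the floating-point question entirely, the tie-breaking can be checked with exact decimal arithmetic. This Python sketch (the function name is hypothetical) rounds the 99 vs 101 split, whose exact shares are 49.5% and 50.5%, both halfway-up and halfway-to-even. With exactly two groups, round-half-to-even always totals 100: the tied shares are n + 0.5 and (99 - n) + 0.5, and since n and 99 - n have different parity, exactly one of them rounds up.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def whole_percentages(counts, rounding):
    """Round each group's share of the month to a whole percent."""
    total = sum(counts.values())
    return {
        group: (Decimal(n) * 100 / total).quantize(Decimal("1"), rounding=rounding)
        for group, n in counts.items()
    }


# The 99 vs 101 split from the demo: exact shares are 49.5% and 50.5%.
month = {0: 99, 1: 101}
half_up = whole_percentages(month, ROUND_HALF_UP)      # 50% and 51%: total 101
half_even = whole_percentages(month, ROUND_HALF_EVEN)  # 50% and 50%: total 100
```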
Consider the approach below:
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`,
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
with output like below

Get greater (subquery ) list than the AVG (subquery) in SQLite3

Consider the following table:
CREATE TABLE covid_data (
CASES INT,
DEATHS INT,
COUNTRIES VARCHAR(64)
);
I am trying to get the names of the countries whose mortality rate is greater than the average mortality rate. The formula I use to get the number of deaths per 1000 cases is:
(NUMBER OF DEATHS / NUMBER OF CASES) * 1000
To get the AVG I use this query:
SELECT AVG(rate)
FROM (
SELECT CAST(SUM(deaths) AS FLOAT) / SUM(cases) * 1000 AS rate
FROM covid_data
) covid_data;
To list the countries with a rate greater than this average, this is one of the many attempts I have tried so far:
SELECT countries, CAST(SUM(deaths) AS FLOAT) / SUM(cases) * 1000 AS RATEM
FROM covid_data
GROUP BY countries
HAVING RATEM > (SELECT AVG(RATE)
FROM (
SELECT CAST(SUM(DEATHS) AS FLOAT) / SUM(CASES) * 1000 AS RATE
FROM covid_data
) covid_data);
This is returning an error: no such column: RATEM
As you can see, I am struggling with these basic concepts. I would also appreciate any books, courses, or resources to better understand these relationships.
You can use window functions:
SELECT cd.countries
FROM (SELECT cd.*,
SUM(deaths * 1.0) OVER () / SUM(cases) OVER () as mortality_ratio
FROM covid_data cd
) cd
WHERE (deaths * 1.0 / NULLIF(cases, 0)) > mortality_ratio;
Note that the average of the per-country mortality ratios is NOT the same as the overall mortality ratio. I think you understand this, but I just want to emphasize that point. The average ratio would be:
AVG(deaths * 1.0 / NULLIF(cases, 0))
You could use window functions:
select t.*
from (
select
t.*,
1.0 * deaths / cases rate,
1.0 * sum(deaths) over() / sum(cases) over() avg_rate
from covid_data
) t
where rate > avg_rate
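A small check of the point about overall vs average ratios, run from Python against an in-memory SQLite database. The first query is the window-function approach above (compare each country with the overall ratio); the second compares with the average of the per-country ratios instead. The three countries and their numbers are made up, and, like the queries above, it assumes one row per country.

```python
import sqlite3

# Made-up data: (country, cases, deaths); one row per country.
ROWS = [("A", 100, 1), ("B", 1000, 50), ("C", 10, 1)]

ABOVE_OVERALL = """
SELECT countries
FROM (SELECT cd.*,
             SUM(deaths * 1.0) OVER () / SUM(cases) OVER () AS overall_ratio
      FROM covid_data cd) cd
WHERE deaths * 1.0 / NULLIF(cases, 0) > overall_ratio
ORDER BY countries
"""

ABOVE_AVERAGE = """
SELECT countries
FROM (SELECT cd.*,
             AVG(deaths * 1.0 / NULLIF(cases, 0)) OVER () AS avg_ratio
      FROM covid_data cd) cd
WHERE deaths * 1.0 / NULLIF(cases, 0) > avg_ratio
ORDER BY countries
"""


def countries_above(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE covid_data (cases INT, deaths INT, countries VARCHAR(64))")
    conn.executemany(
        "INSERT INTO covid_data (countries, cases, deaths) VALUES (?, ?, ?)", rows
    )
    overall = [r[0] for r in conn.execute(ABOVE_OVERALL)]
    average = [r[0] for r in conn.execute(ABOVE_AVERAGE)]
    conn.close()
    return overall, average
```

Here country B (50 deaths per 1000 cases) is above the overall rate of about 46.8 per 1000 but below the average of the three ratios, about 53.3 per 1000, so the two queries return different lists.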

How to query the daily cost of a specific product in BigQuery?

I exported billing data to BigQuery and I want to get the total cost of the Translate service for a specific date (for example, April 1, 2019) or for a whole month.
The sample query in the Google docs gets the monthly total:
SELECT
invoice.month,
SUM(cost)
+ SUM(IFNULL((SELECT SUM(c.amount)
FROM UNNEST(credits) c), 0))
AS total,
(SUM(CAST(cost * 1000000 AS int64))
+ SUM(IFNULL((SELECT SUM(CAST(c.amount * 1000000 as int64))
FROM UNNEST(credits) c), 0))) / 1000000
AS total_exact
FROM `project.dataset.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
GROUP BY 1
ORDER BY 1 ASC
;
But I wrote my query this way:
$myVariable=
"SELECT
COUNT(*) total_times,
SUM(cost) total_cost
FROM
`project.dataset.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE
service.description = 'Translate' AND (usage_end_time >= timestamp('2019-04-04 00:00:00') AND usage_end_time <= timestamp('2019-04-04 23:59:59'))";
I want to get the total cost of the current day and the total cost from the first day of the month to the current day.
sample:
1. 2019/04/04: 4223.05 - (882 Times)
2. 2019/04/Total: 16505.43 - (3882 Times)
You can further add details to your working query:
SELECT
service.description,
timestamp_trunc(usage_start_time,DAY) as time_fragment,
ROUND(SUM(cost)
+ SUM(IFNULL((SELECT SUM(c.amount)
FROM UNNEST(credits) c), 0)),3)
AS total,
round((SUM(CAST(cost * 1000000 AS int64))
+ SUM(IFNULL((SELECT SUM(CAST(c.amount * 1000000 as int64))
FROM UNNEST(credits) c), 0))) / 1000000,3)
AS total_exact
FROM `project.dataset.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE service.description='Translate'
GROUP BY 1,2
ORDER BY 2 desc;
which displays:
You can go down to hourly granularity by changing DAY to HOUR in the timestamp_trunc call on line 3.
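For the second number the question asks for (the total from the first day of the month up to each day), one option is a running SUM over the daily totals, partitioned by month. This is a sketch run from Python against an in-memory SQLite database; the billing table, its columns and the rows are simplified stand-ins for the export (credits are ignored). In BigQuery the equivalent would presumably use DATE(usage_start_time) and DATE_TRUNC(..., MONTH) in place of SQLite's date() and strftime(), but I have not run it there.

```python
import sqlite3

# Simplified, made-up billing rows: (service, usage_start_time, cost).
ROWS = [
    ("Translate", "2019-04-01 10:00:00", 1.5),
    ("Translate", "2019-04-01 12:00:00", 2.25),
    ("Translate", "2019-04-02 09:00:00", 0.25),
    ("Vision", "2019-04-02 09:00:00", 10.0),
    ("Translate", "2019-05-01 00:30:00", 4.0),
]

QUERY = """
WITH daily AS (
    SELECT date(usage_start_time) AS day,
           COUNT(*) AS times,
           SUM(cost) AS day_cost
    FROM billing
    WHERE service = 'Translate'
    GROUP BY date(usage_start_time)
)
SELECT day, times, day_cost,
       -- running totals that restart at the first day of each month
       SUM(day_cost) OVER (PARTITION BY strftime('%Y-%m', day) ORDER BY day) AS month_to_date_cost,
       SUM(times) OVER (PARTITION BY strftime('%Y-%m', day) ORDER BY day) AS month_to_date_times
FROM daily
ORDER BY day
"""


def daily_and_month_to_date(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE billing (service TEXT, usage_start_time TEXT, cost REAL)")
    conn.executemany("INSERT INTO billing VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```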

Value for two days

I have a table that shows the interest rate on an account on a given day. I'm looking to query the interest rate on one day and the interest rate for the next day. Specifically, I'm looking to find accounts where the interest rate changed from one day to the next.
My data looks like this:
Account Number / Loan Number / Date / Interest Rate
1234 / 5656 / 1/1/18 / 12%
1234 / 5656 / 1/2/18 / 12%
1234 / 5656 / 1/3/18 / 0%
1234 / 5656 / 1/4/18 / 0%
I want the query to return just the two days where the interest rate changed from 12% to 0%.
With the above data, it would return the following:
Account Number / Loan Number / Date / Interest Rate / Next Day / Next Day Rate
1234 / 5656 / 1/2/18 / 12% / 1/3/18 / 0%
The code I'm using (see below) is returning the same Date multiple times and modifying the "Next Day Rate" by some factor I cannot identify.
This is what I have so far.
select
tc.loanaccountid AS 'Account'
, l.LoanNumber AS 'Loan Number'
, tc.trialbalancedate AS 'Date'
, tc.interestrate AS 'Interest Rate'
, tb.trialbalancedate AS 'Next Day'
, tb.interestrate AS 'Next Day Rate'
from dbo.dailytrialbalance tc
join dbo.loanaccount l on tc.loanaccountid = l.loanaccountid
left join dbo.dailytrialbalance tb on dateadd(day, 1, tc.trialbalancedate) =
tb.trialbalancedate
where tc.PortfolioCodeId = '10'
and tc.interestrate = '0'
and tb.interestrate > '0'
I'm still learning SQL, so any help is appreciated. Thanks.
Here is a way with LEAD, for SQL Server 2012 onward. I added (SELECT NULL) as a placeholder for a more realistic tie-breaker, since you have duplicate data for 1/2/18:
with cte as(
select
tc.loanaccountid AS 'Account'
, l.LoanNumber AS 'Loan Number'
, tc.trialbalancedate AS 'Date'
, tc.interestrate AS 'Interest Rate'
, NextDay = lead(tc.trialbalancedate) over (partition by tc.loanaccountid order by tc.trialbalancedate, (select null))
, NextDayRate = lead(tc.interestrate) over (partition by tc.loanaccountid order by tc.trialbalancedate, (select null))
from dbo.dailytrialbalance tc
join dbo.loanaccount l on tc.loanaccountid = l.loanaccountid)
select *
from cte
where NextDayRate = 0 and [Interest Rate] != 0
Here it is in action, with your test data:
declare @testData table (AccountNumber int, LoanNumber int, Date date, InterestRate varchar(3))
insert into @testData
values
(1234,5656,'20180101','12%'),
(1234,5656,'20180102','12%'),
(1234,5656,'20180103','0%'),
(1234,5656,'20180104','0%')
;with cte as(
select
AccountNumber
,LoanNumber
,[Date]
,InterestRate
,NextDay = lead([Date]) over (partition by AccountNumber order by [Date])
,NextDayRate = lead(InterestRate) over (partition by AccountNumber order by [Date])
from @testData)
select *
from cte
where NextDayRate = '0%' and InterestRate != '0%'
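The question also mentions finding accounts where the rate changed at all from one day to the next. A sketch of that variant, run from Python against an in-memory SQLite database: the same LEAD idea, but partitioned by account and loan and filtered on next_day_rate <> interest_rate instead of only 12% to 0%. Storing the rate as a number (12.0) rather than a '12%' string, the ISO dates and the snake_case column names are assumptions for illustration.

```python
import sqlite3

# The question's sample rows, with the rate stored as a number and ISO
# dates so they sort correctly.
ROWS = [
    (1234, 5656, "2018-01-01", 12.0),
    (1234, 5656, "2018-01-02", 12.0),
    (1234, 5656, "2018-01-03", 0.0),
    (1234, 5656, "2018-01-04", 0.0),
]

QUERY = """
WITH cte AS (
    SELECT account_number, loan_number, balance_date, interest_rate,
           LEAD(balance_date) OVER (PARTITION BY account_number, loan_number
                                    ORDER BY balance_date) AS next_day,
           LEAD(interest_rate) OVER (PARTITION BY account_number, loan_number
                                     ORDER BY balance_date) AS next_day_rate
    FROM test_data
)
SELECT *
FROM cte
-- any change; the last row has no next day (NULL) and drops out
WHERE next_day_rate <> interest_rate
ORDER BY account_number, loan_number, balance_date
"""


def rate_changes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_data (account_number INTEGER, loan_number INTEGER, "
        "balance_date TEXT, interest_rate REAL)"
    )
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

With the sample data it returns the single row for 2018-01-02 (12%) followed by 2018-01-03 (0%), matching the expected output in the question.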