FIFO Capital Gains Calculations in Python - pandas

I'm trying to calculate capital gains on some earnings in Python, but I'm having trouble figuring out the right data structures and the best approach, because I need to account for the difference between long-term and short-term tax rates.
Below is a typical representation of the data:
Unfortunately, I don't think I can simply add and subtract dollar-cost averages, because buying 50 @ $50 on Jan 1, 2017 and 50 @ $50 on Feb 1, 2017, then selling 100 @ $100 on Jan 15, 2018, results in long-term capital gains (15%) on the first 50 units and short-term capital gains (25%) on the second 50 units.
So it seems like with every sale, I still need to keep track of all prior purchases @ their particular purchase dates, but I also have to amend the first X records to reflect that those purchases have been partially or fully depleted.
I don't know how to do this, or which data structure works best for it.
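One possible shape for this is a FIFO queue of open purchase lots that each sale consumes and splits. The sketch below is only a minimal illustration of that idea: it ignores tax rates, fees and the exact IRS holding-period rules (the 365-day cutoff is a simplification) and just labels each matched leg as long or short.

from collections import deque
from datetime import date, timedelta

def fifo_sale(lots, sale_qty, sale_price, sale_date, long_term=timedelta(days=365)):
    # lots: deque of [qty, unit_cost, buy_date] purchase lots, oldest first.
    # Mutated in place so partially depleted lots carry over to the next sale.
    legs = []
    remaining = sale_qty
    while remaining > 0:
        lot = lots[0]                      # oldest open lot
        qty = min(remaining, lot[0])
        gain = qty * (sale_price - lot[1])
        term = 'long' if sale_date - lot[2] > long_term else 'short'
        legs.append((qty, gain, term))
        lot[0] -= qty                      # deplete the lot...
        if lot[0] == 0:
            lots.popleft()                 # ...and drop it once empty
        remaining -= qty
    return legs

# The example above: buy 50 @ $50 (2017-01-01) and 50 @ $50 (2017-02-01),
# then sell 100 @ $100 on 2018-01-15.
lots = deque([[50, 50.0, date(2017, 1, 1)],
              [50, 50.0, date(2017, 2, 1)]])
print(fifo_sale(lots, 100, 100.0, date(2018, 1, 15)))
# -> [(50, 2500.0, 'long'), (50, 2500.0, 'short')]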

Different Correlation Coefficient for Different Time Ranges

I built a DataFrame containing the following data:
Daily Price of the Gas Future of N Day;
Daily Price of the Petroil Future of N Day;
Daily Price of the Day-Ahead Electricity Market in Italy.
The data cover the 2010 to 2022 time range, so 12 years of historical data.
The DataFrame head looks like this:
PETROIL GAS ELECTRICITY
0 64.138395 2.496172 68.608696
1 65.196161 2.482612 113.739130
2 64.982403 2.505938 112.086957
3 64.272606 2.500000 110.043478
4 65.993436 2.521739 95.260870
On this DataFrame I tried to build the correlation matrix through the pandas method .corr() and ran into one big issue:
If I take all 12 years of data I get:
almost zero correlation between the Electricity and Petroil prices;
low correlation (0.12) between the Electricity and Gas prices;
while if I split the data into three time ranges (2010-2014, 2014-2018, 2018-2022) I get really high correlations (in both cases between 0.40 and 0.60).
So I am asking two questions here:
Why do I get such a big difference when I split the time ranges?
Considering that I am doing this analysis in order to use Petroil and Gas prices to predict the electricity price, which of the two analyses should I rely on: the first one (with low correlation) that covers the entire time range, or the second one (with higher correlation) that is split into different time ranges?
Thank you for your answers.
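For reference, the two computations described can be written roughly as below. The DataFrame here is a synthetic stand-in (random walks) just to make the snippet self-contained; only the .corr() calls and the period split mirror the actual analysis.

import numpy as np
import pandas as pd

# Synthetic stand-in data: three daily "price" series from 2010 to 2022.
idx = pd.date_range("2010-01-01", "2022-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (len(idx), 3)).cumsum(axis=0),
                  index=idx, columns=["PETROIL", "GAS", "ELECTRICITY"])

# Correlation over the full 2010-2022 range
print(df.corr())

# Correlation per sub-period (the same split as in the question)
for start, end in [("2010", "2014"), ("2014", "2018"), ("2018", "2022")]:
    print(start, "-", end)
    print(df.loc[start:end].corr().loc["ELECTRICITY", ["PETROIL", "GAS"]])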

AWS QuickSight: maxOver() calculated field?

I am a stock trader who visualizes data in QuickSight. I identify the trades I want to submit to the market, sometimes for the same stock, at the same time, but in opposite directions depending on the price of the stock at that time. See below for an example of trades I might identify for 1/19/22 0800:
Date     Hour  Stock      Direction  Price  Volume
1/19/22  0800  Apple      BUY        $10    2
1/19/22  0800  Apple      SELL       $20    1
1/19/22  0800  Microsoft  BUY        $15    3
Using QuickSight, I want to visualize (in pivot tables and charts) the volume I trade, using the maximum possible trade volume. For example, QuickSight simply sums the Volume column to 6, when really I want it to sum to 5, because the maximum possible trade volume for that hour is 5: the Apple trades in the example are mutually exclusive, since the stock price cannot be both below $10 (triggering a BUY) and above $20 (triggering a SELL) at the same date-time. Therefore, I want the day's traded volume to reflect the maximum possible volume I could have traded (2 + 3).
I have used the maxOver() function as so: maxOver({volume}, [{stock}, {date}, {hour}], PRE_AGG), but I would like to view my trade volume rolled up to the day as so:
Date  Volume
1/19  5
Is there a way to do this using QuickSight calculated fields? Should this aggregation be done with a SQL custom field?
Add a new calculated field called volume_direction_specifier:
{Volume} * 10 + ifelse({Direction}='BUY', 1, 2)
This is a single number that will indicate the direction and volume. (this is needed in cases where the max possible volume is the same for both the BUY and SELL entries within the same hour).
Then compute the maxOver on this new field in a calculated field called max_volume_direction_specifier
maxOver({volume_direction_specifier}, [{stock}, {date}, {hour}], PRE_AGG)
Add a new field which will give the Volume for rows that have the max volume_direction_specifier per hour
volume_for_max_trade_volume_per_hour
ifelse(volume_direction_specifier = max_volume_direction_specifier, {volume}, null)
And finally, you should be able to add volume_for_max_trade_volume_per_hour to your table (grouped by day) and its SUM will give the maximum possible trade volume per day.
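This isn't QuickSight syntax, but the same logic sketched in pandas against the sample trades, just to sanity-check that the calculated fields above roll up to 5 for the day (the column and variable names here are made up):

import pandas as pd

trades = pd.DataFrame({
    "date":      ["1/19/22", "1/19/22", "1/19/22"],
    "hour":      ["0800", "0800", "0800"],
    "stock":     ["Apple", "Apple", "Microsoft"],
    "direction": ["BUY", "SELL", "BUY"],
    "volume":    [2, 1, 3],
})

# volume_direction_specifier and its max per (stock, date, hour)
trades["vds"] = trades["volume"] * 10 + trades["direction"].map({"BUY": 1, "SELL": 2})
trades["max_vds"] = trades.groupby(["stock", "date", "hour"])["vds"].transform("max")

# keep the volume only for the winning row within each hour, then roll up by day
trades["vol_for_max"] = trades["volume"].where(trades["vds"] == trades["max_vds"])
print(trades.groupby("date")["vol_for_max"].sum())   # 1/19/22 -> 5.0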

Calculating interest using SQL

I am using PostgreSQL, and have a table for a billing cycle and another for payments made in a billing cycle.
I am trying to figure out how to calculate interest based on how much amount was left after each billing cycle's last payment date. Problem is that every time a repayment is made, the interest has to be calculated on the amount remaining after that.
My thoughts on building this query are as follows: build data for all dates from the last pay date of the billing cycle to today; using partitioning, get the remaining amount for the first date; for the second date, take the amount from the previous row, add interest to it, and then calculate interest on that; and so on.
Unfortunately I am stuck just at the thought and can't figure out how to make this into a query!
Here's some sample data to make things easier to understand.
Billing Cycles:
id | ends_at
-----+---------------------
1 | 2017-11-30
2 | 2017-11-30
Payments:
amount | billing_cycle_id | type | created_at
-----------+------------------+---------+----------------------------
6000.0000 | 1 | payment | 2017-11-15 18:40:22.151713
2000.0000 | 1 |repayment| 2017-11-19 11:45:15.6167
2000.0000 | 1 |repayment| 2017-12-02 11:46:40.757897
So, as we can see, the user made a repayment on the 19th, so the amount due for interest after the cycle end date (30th Nov 2017) is only 4000. From the 30th to the 2nd, interest is therefore calculated daily on 4000. However, from the 2nd onward, interest needs to be calculated on 2000 only.
Interest Calculations (today being 2017-12-04):
date       | amount    | interest
-----------+-----------+----------
2017-12-01 | 4000      | 100      // First day of pending dues.
2017-12-02 | 2100      | 52.5     // Second day of pending dues.
2017-12-03 | 2152.5    | 53.8125  // Third day of pending dues.
2017-12-04 | 2206.3125 |          // Fourth day's interest will be added tomorrow.
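For what it's worth, the arithmetic in that table can be reproduced in a few lines of Python. The 2.5% daily rate below is only inferred from the sample numbers (100 interest on 4000); the actual rate isn't given in the question.

from datetime import date, timedelta

DAILY_RATE = 0.025                        # assumption inferred from the sample numbers
repayments = {date(2017, 12, 2): 2000}    # repayments made after the cycle ended

amount = 4000.0                           # outstanding after the 2017-11-19 repayment
day = date(2017, 12, 1)
while day <= date(2017, 12, 4):
    interest = amount * DAILY_RATE
    print(day, amount, interest)
    # yesterday's interest and any repayment show up in the next day's amount
    amount = amount + interest - repayments.get(day + timedelta(days=1), 0)
    day += timedelta(days=1)
# 2017-12-01 4000.0 100.0
# 2017-12-02 2100.0 52.5
# 2017-12-03 2152.5 53.8125
# 2017-12-04 2206.3125 ...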
Your data is too sparse. It doesn't make sense to have to write this query at all, because over time it will only get more complicated. What happens when interest rates change over time?
The table itself (or a secondary table, depending on how you want to structure it) could carry a running balance that you add to every time a deposit or withdrawal is made (I suggest this table be add-only). Otherwise you're making both the calculation and the accounting far harder on yourself than they should be. Even with the way you've presented the problem here, there's not enough information to do the calculation: the interest rate is missing. When that's the case, your stored procedure is going to be too complicated. Complicated means bugs, and people get irritated about bugs when you're talking about their money.
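A rough sketch of that running-balance idea, in Python rather than SQL since the schema is up to you; the sign convention that a "payment" raises the outstanding balance and a "repayment" lowers it is only inferred from the sample data.

from datetime import datetime

ledger = []                               # add-only: rows are appended, never updated

def append_entry(billing_cycle_id, entry_type, amount, created_at):
    prev = ledger[-1]["balance_after"] if ledger else 0.0
    delta = amount if entry_type in ("payment", "interest") else -amount
    ledger.append({"billing_cycle_id": billing_cycle_id, "type": entry_type,
                   "amount": amount, "balance_after": prev + delta,
                   "created_at": created_at})

append_entry(1, "payment",   6000, datetime(2017, 11, 15, 18, 40))
append_entry(1, "repayment", 2000, datetime(2017, 11, 19, 11, 45))
append_entry(1, "repayment", 2000, datetime(2017, 12, 2, 11, 46))
print(ledger[-1]["balance_after"])        # 2000.0 still outstanding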

Identifying trends and classifying using SQL

I have a table xyz with three columns: rcvr_id, mth_id and tpv. rcvr_id is an id given to a customer. mth_id is a column which stores the month number; it is calculated as (year - 1900) * 12 + month number, so for example Dec 2011 has a mth_id of 1344 and Jan 2012 has 1345. tpv is a variable which shows the customer's transaction amount.
Example table
rcvr_id mth_id tpv
1 1344 23
2 1344 27
3 1344 54
1 1345 98
3 1345 102
.
.
.
so on
P.S. If a customer does not have a transaction in a given month, his row for that month won't exist.
Now, the question: based on transactions for the months 1327 to 1350, I need to classify a customer as steady or sporadic.
Here is a description.
The above image is for one customer. I have millions of customers.
How do I go about it? I have no clue how to identify trends in SQL, or rather how to do it the best way possible.
Also, I am working on Teradata.
OK, I have found out how to get the standard deviation. Now the important question is: how do I set a standard deviation limit on my own? I can't just randomly say "if the standard deviation is above 40% he is sporadic, else steady". I thought of calculating the average of the standard deviations of all customers and classifying a customer as sporadic if his is above that, else steady. But I feel there could be a better logic.
I would suggest the STDDEV_POP function - a higher value indicates a greater variation in values.
select rcvr_id, STDDEV_POP(tpv)
from yourtable
group by rcvr_id
STDDEV_POP is the function for Standard Deviation
If this doesn't differentiate enough, you may need to look at regression functions and variance.
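On the follow-up question about choosing a cutoff: one option is to make the measure scale-free first (standard deviation divided by the mean, i.e. the coefficient of variation) and compare each customer against the population median, rather than picking an absolute number. Here is a sketch of that idea in pandas rather than Teradata SQL, using the sample rows from the question; the median cutoff is only a starting point, not an established rule.

import pandas as pd

xyz = pd.DataFrame({"rcvr_id": [1, 2, 3, 1, 3],
                    "mth_id":  [1344, 1344, 1344, 1345, 1345],
                    "tpv":     [23, 27, 54, 98, 102]})

# per-customer mean and standard deviation of monthly tpv
stats = xyz.groupby("rcvr_id")["tpv"].agg(["mean", "std"]).fillna(0)
stats["cv"] = stats["std"] / stats["mean"]        # scale-free spread

# classify relative to the population, not an arbitrary absolute cutoff
threshold = stats["cv"].median()
stats["label"] = (stats["cv"] > threshold).map({True: "sporadic", False: "steady"})
print(stats)

Note that months with no transactions have no row at all, so to treat them as zero activity you would need to fill in the missing months for each customer over the 1327 to 1350 range before computing the spread.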

Predictive Ordering Logic

I have a problem and was wondering if anyone could help or if it is even possible to have an algorithm for something like this.
I need to create a predictive ordering wizard. So, based on previous sales, we will determine that a certain amount of an item is required, e.g. 31 apples. Now I need to work out the number of cases that need to be ordered. If the cases come in, say, 60, 30, 15 and 10 apples, the order should be a case of 30 and a case of 10 apples.
The number of items that need to be ordered changes in each row of the result set. The case sizes could also change for each item, so some items may have an option of 5 different cases and some items may end up with an option of only one case.
Other examples: I need 39 cans of coke and the cases only come in 24 per case, therefore I need 2 cases. I need 2 shots of Baileys and the bottles of Baileys come in 50cl or 70cl, therefore I need the 50cl.
The result set's columns are ItemName, ItemSize, QuantityRequired, PackSize and PackSizeMultiple.
ItemName is the item to be ordered. ItemSize is the size the item is used in, e.g. a can of coke. QuantityRequired is how many of the item (in this case cans of coke) need to be ordered. PackSize is the size of the case. PackSizeMultiple is the number to multiply the item by to work out how many of the items are in the case.
PS: this will be a query in SQL Server 2008.
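The case-picking part could live in application code or a stored procedure; here is a rough greedy sketch (in Python rather than T-SQL) that reproduces the apples, coke and Baileys examples above, assuming a 5cl shot for the Baileys case. It is only one plausible policy and is not guaranteed to minimise waste for every combination of pack sizes.

def pick_cases(quantity_required, pack_sizes):
    # Take the largest pack that still fits inside the remaining requirement,
    # then cover whatever is left with the smallest available pack.
    packs = sorted(pack_sizes, reverse=True)
    order, remaining = [], quantity_required
    for size in packs:
        while remaining >= size:
            order.append(size)
            remaining -= size
    if remaining > 0:
        order.append(min(packs))          # smallest pack always covers the leftover
    return order

print(pick_cases(31, [60, 30, 15, 10]))   # [30, 10]  -> one case of 30 and one of 10
print(pick_cases(39, [24]))               # [24, 24]  -> two cases of coke
print(pick_cases(10, [70, 50]))           # [50]      -> 2 shots (~10cl) fit in the 50cl bottle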
Sounds like you need a UOM (Unit of Measure) table and a function to calculate the co-pack measure count and the unit count measure quantity, with the UOM type based on the time between orders. You would also need to create a cron cycle and a freeze table managed by week/time interval, in order to create a frozen view of the quantity sold each week and the number of units since the last order.
Based on the two orders prior to the current one, you would set the current prediction using the minimum time between the last two freeze cycles containing an order and the number of days between them. Based on the average time between orders and the unit quantity in each order, you can create a unit decay ratio percentage based on days and store it in each slice going forward. With a reference to this data you will be able to create a prediction that can trigger a notice to sales or a message to the client to reorder. In addition, if you capture response data from sales based on unit count feedback from the client, you can compare against actuals and tune your decay rate against your prediction.
You should also consider managing and rolling up these freezes by month, so that you can view historical trends and forecast revenue based on the velocity of reorders and the same period last year. Basically this is similar to sales forecasting, swapping out the opportunity's percentage to close with a Predicted Remaining Quantity percentage.