How to convert income from one currency into another using a historical FX table in Python with pandas - pandas

I have two dataframes: an income df and an FX df. My income df shows income from different accounts on different dates, but it also shows extra income in a different currency. My FX df shows the FX rates for certain currency pairs on the same dates the extra income came into the accounts.
I want to convert the extra income into the same currency as the account. For example, account HP on 23/3 has extra income = 35 GBP, and I want to convert that into EUR, as that's the currency of the account. Please note it has to use the FX table: I have a long history of data points to fill, plus other accounts, so I do not want to manually code 35 * the FX rate. Finally, I want to create another column in the income df that sums the daily income and the extra income in the same currency.
I'm not sure how to bring both dfs together so I can get the correct FX rate for that specific date and convert the extra income into the currency of the account.
My code is below:
import pandas as pd
income_data = {'date': ['23/3/22', '23/3/22', '24/3/22', '25/3/22'], 'account': ['HP', 'HP', 'JJ', 'JJ'],
'daily_income': [1000, 1000, 2000, 2000], 'ccy of account': ['EUR', 'EUR', 'USD', 'USD'],
'extra_income': [50, 35, 10, 12.5], 'ccy of extra_income': ['EUR', 'GBP', 'EUR', 'USD']}
income_df = pd.DataFrame(income_data)
fx_data = {'date': ['23/3/22', '23/3/22', '24/3/22', '25/3/22'], 'EUR/GBP': [0.833522, 0.833522, 0.833621, 0.833066],
'USD/EUR': [0.90874, 0.90874, 0.91006, 0.90991]}
fx_df = pd.DataFrame(fx_data)
The final df should look like this (I flipped the FX rate, so 1/0.833522, to get some of the values):
I would really appreciate it if someone could help me with this. My initial thought was merge, but I don't have a common column, and I'm not sure the map function would work either as I don't have a dictionary. Apologies in advance if any of my code is not great. I am still self-learning, thanks!

Consider creating a common column in both data frames to merge on. The code below uses assign to add columns and Series methods such as add and mul in place of the arithmetic operators (+, -, *, /).
# ADD NEW COLUMN AS CONCAT OF CCY COLUMNS (FROM CCY / TO CCY)
income_df = income_df.assign(
    currency_ratio = lambda df: df["ccy of extra_income"] + "/" + df["ccy of account"]
)
# ADD REVERSED CURRENCY RATIOS (RECIPROCAL OF EACH QUOTED RATE)
# DROP THE REPEATED 23/3/22 ROW SO THE MERGE DOES NOT DUPLICATE INCOME ROWS
# RESHAPE WIDE TO LONG FORMAT
fx_df_long = pd.melt(
    fx_df.drop_duplicates().assign(**{
        "GBP/EUR": lambda df: df["EUR/GBP"].rdiv(1),
        "EUR/USD": lambda df: df["USD/EUR"].rdiv(1)
    }),
    id_vars = "date",
    var_name = "currency_ratio",
    value_name = "fx_rate"
)
# MERGE AND CALCULATE
income_df = (
    income_df.merge(
        fx_df_long,
        on = ["date", "currency_ratio"],
        how = "left"
    ).assign(
        # SAME-CURRENCY ROWS (E.G. EUR/EUR) FIND NO MATCH, SO THEIR RATE IS 1
        fx_rate = lambda df: df["fx_rate"].mask(
            df["ccy of extra_income"].eq(df["ccy of account"]), 1
        ),
        total_income = lambda df: df["daily_income"].add(df["extra_income"].mul(df["fx_rate"]))
    )
)
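If the FX table grows to more pairs, the reversed rates do not have to be typed out one by one. Below is a minimal sketch, assuming every rate column is named BASE/QUOTE and means "1 BASE = rate QUOTE" (consistent with the question's note about flipping 0.833522). The helper name add_inverse_pairs and the column name pair are hypothetical.

```python
import pandas as pd

income_df = pd.DataFrame({
    "date": ["23/3/22", "23/3/22", "24/3/22", "25/3/22"],
    "account": ["HP", "HP", "JJ", "JJ"],
    "daily_income": [1000, 1000, 2000, 2000],
    "ccy of account": ["EUR", "EUR", "USD", "USD"],
    "extra_income": [50, 35, 10, 12.5],
    "ccy of extra_income": ["EUR", "GBP", "EUR", "USD"],
})
fx_df = pd.DataFrame({
    "date": ["23/3/22", "23/3/22", "24/3/22", "25/3/22"],
    "EUR/GBP": [0.833522, 0.833522, 0.833621, 0.833066],
    "USD/EUR": [0.90874, 0.90874, 0.91006, 0.90991],
})

def add_inverse_pairs(fx: pd.DataFrame) -> pd.DataFrame:
    """Return a long table holding every quoted pair plus its reciprocal."""
    long = fx.drop_duplicates().melt(
        id_vars="date", var_name="pair", value_name="fx_rate"
    )
    # Turn "EUR/GBP" into "GBP/EUR" and take 1 / rate for the reversed pair.
    inverse = long.assign(
        pair=long["pair"].map(lambda p: "/".join(reversed(p.split("/")))),
        fx_rate=1 / long["fx_rate"],
    )
    return pd.concat([long, inverse], ignore_index=True)

rates = add_inverse_pairs(fx_df)

# Key each income row as FROM/TO: extra-income currency, then account currency.
keyed = income_df.assign(
    pair=income_df["ccy of extra_income"] + "/" + income_df["ccy of account"]
)
result = keyed.merge(rates, on=["date", "pair"], how="left")

# Rows already in the account currency have no pair to look up: rate is 1.
same_ccy = result["ccy of extra_income"] == result["ccy of account"]
result["fx_rate"] = result["fx_rate"].mask(same_ccy, 1.0)
result["total_income"] = (
    result["daily_income"] + result["extra_income"] * result["fx_rate"]
)
print(result[["date", "account", "extra_income", "fx_rate", "total_income"]])
```

Any row whose pair is missing from the FX table keeps a NaN rate, which makes gaps in the history easy to spot instead of silently converting at 1.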

Related

pandas calculate time difference per several levels in one column df

I have the following dataset:
I would like to get the following result:
The goal is to calculate duration per "Level" column.
Dataset:
import pandas as pd
from datetime import datetime, date
data = {'Time': ["08:35:00", "08:40:00", "08:45:00", "08:55:00", "08:57:00", "08:59:00"],
'Level': [250, 250, 250, 200, 200, 200]}
df = pd.DataFrame(data)
df['Time'] = pd.to_datetime(df['Time'],format= '%H:%M:%S' ).dt.time
I am able to calculate the difference between two times with this code:
t1 = df['Time'].iloc[0]
t2 = df['Time'].iloc[1]
c = datetime.combine(date.today(), t2) - datetime.combine(date.today(), t1)
But I am not able to "automate" the calculation. The code below only works for integers.
df2 = df.groupby('Level').apply(lambda x: x.Time.max() - x.Time.min())
If you keep the date part of Time, the calculation is a lot easier:
df = pd.DataFrame(data)
# Keep the date part, even though it's meaningless
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")
def to_string(duration: pd.Timedelta) -> str:
    total = duration.total_seconds()
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02.0f}:{minutes:02.0f}:{seconds:02.0f}"
level = df["Level"]
# CAUTION: avoid calling to_string until the very last step,
# when you need to display your result. There aren't many
# calculations you can do with strings.
df["Time"].groupby(level).diff().groupby(level).sum().apply(to_string)
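The max-minus-min idea from the question also works once Time keeps its date part, since subtracting two Timestamps yields a Timedelta. Here is a minimal sketch using the question's data (the name span is illustrative):

```python
import pandas as pd

data = {
    "Time": ["08:35:00", "08:40:00", "08:45:00", "08:55:00", "08:57:00", "08:59:00"],
    "Level": [250, 250, 250, 200, 200, 200],
}
df = pd.DataFrame(data)
# Parsing without a date fills in a default date, which cancels out on subtraction.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

# Duration per level = latest time minus earliest time within each group.
grouped = df.groupby("Level")["Time"]
span = grouped.max() - grouped.min()
print(span)
```

This matches the diff-then-sum approach when the times within each level are sorted, because the consecutive differences add up to last minus first.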

Calculate Average/mean() of a column in Python/Pandas/Numpy based on a different values in another column

I'd like to calculate an average of a column using pandas based on different numbers in another column.
I have two columns, A and B. I'd like an extra column showing the average of B over the rows where A is >= 0 and < 20, >= 20 and < 40, >= 40 and < 60, >= 60 and < 80, >= 80 and < 100, and so on. The maximum of 100 is just an example; the ranges should continue up to the largest value in column A, which could be 20000.
I have tried using an if statement, but that only works for a limited number of ranges. What if my max value is 20000 and I want the average over ranges of width 5 for the A values?
Use cut + groupby.transform:
bins = [0, 20, 40, 60, 80, 101]
df['C'] = df['B'].groupby(pd.cut(df['A'], bins=bins, right=False)).transform('mean')
If you want to generate the bins programmatically:
import numpy as np
MAX = 100
STEP = 20
bins = np.arange(0, MAX+1, STEP)
bins[-1] += 1
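Putting the two pieces together on a small made-up frame (the values in A and B are invented for illustration, since the question's data was only shown as an image):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [5, 15, 25, 35, 100], "B": [10.0, 20.0, 30.0, 50.0, 7.0]})

MAX = 100
STEP = 20
bins = np.arange(0, MAX + 1, STEP)  # 0, 20, 40, 60, 80, 100
bins[-1] += 1                       # widen the last bin so A == MAX is included

# Left-closed bins: [0, 20), [20, 40), ...; each row gets its bin's mean of B.
groups = pd.cut(df["A"], bins=bins, right=False)
df["C"] = df["B"].groupby(groups, observed=True).transform("mean")
print(df)
```

For a maximum of 20000 and a range width of 5, only MAX and STEP need to change.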

Generating Percentages from Pandas

I am working with a data set from SQL currently -
import pandas as pd
df = spark.sql("select * from donor_counts_2015")
df_info = df.toPandas()
print(df_info)
The output looks like this (I can't include the actual output for privacy reasons):
As you can see, it's a data set that has the name of a fund and then the number of people who have donated to that fund. What I am trying to do now is calculate what percent of funds have only 1 donation, what percent have 2, 34, etc. Is there an easy way to do this with pandas? I would also like to see the percentage for a range of donation counts, such as what percentage of funds have between 50-100 donations, 500-1000, etc. Thanks!
You can make a histogram of the donations to visualize the distribution. np.histogram might help. Or you can also sort the data and count manually.
For the first task, to get the percentage of rows for each value in the column 'number_of_donations', you can do:
df['number_of_donations'].value_counts(normalize=True) * 100
For the second task, you need to create a new column with categories, and then do the same:
# Create a Series with categories
New_Series = pd.cut(df.number_of_donations, bins=[0,100,200,500,99999999], labels=['Few','Medium','Many','Too Many'])
# Change the name of the Series
New_Series.name = 'Category'
# Concat df and New_Series
df = pd.concat([df, New_Series], axis=1)
# Get the percentage of the Categories
df['Category'].value_counts(normalize=True) * 100
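For the 50-100 or 500-1000 style ranges mentioned in the question, the same cut + value_counts pattern applies. Here is a minimal sketch on invented numbers; the column name number_of_donations follows the answer above and is an assumption about the real table, as are the bin edges and labels:

```python
import pandas as pd

# Hypothetical data: one row per fund.
df = pd.DataFrame({
    "fund": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "number_of_donations": [1, 1, 2, 60, 75, 100, 600, 1],
})

# Percent of funds having each exact donation count.
exact_pct = df["number_of_donations"].value_counts(normalize=True) * 100

# Percent of funds per range; bins are right-closed: (0, 49], (49, 100], ...
edges = [0, 49, 100, 499, 1000]
labels = ["1-49", "50-100", "101-499", "500-1000"]
ranges = pd.cut(df["number_of_donations"], bins=edges, labels=labels)
range_pct = ranges.value_counts(normalize=True) * 100
print(exact_pct)
print(range_pct)
```

Ranges with no funds still show up with 0 percent, because value_counts on a categorical Series reports every category.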

How to plot only business hours and weekdays in pandas

I have hourly stock data.
I need a) to format it so that matplotlib ignores weekends and non-business hours and b) an hourly frequency.
The problem:
Currently, the graph looks crammed and I suspect it is because matplotlib is taking into account 24 hours instead of 8, and 7 days a week instead of business days.
How do I tell pandas to only take into account business hours, Monday to Friday?
How I am graphing the data:
I am looping through a list of price data dataframes, graphing each data frame:
mm = 0
for ii in df:
    Ddate = ii['Date']
    Pprice = ii['Price']
    d = Ddate.to_list()
    p = Pprice.to_list()
    dates = make_dt(d)
    prices = unstring(p)
    plt.figure()
    plt.plot(dates,prices)
    plt.title(stocks[mm])
    plt.grid(True)
    plt.xlabel('Dates')
    plt.ylabel('Prices')
    mm += 1
the graph:
To keep only business days, first flag each row. pd.bdate_range(x, x) returns an empty index when x falls on a weekend, so its length tells you whether x is a business day:
# Add a boolean column IsBDay to the DF: True for Monday to Friday.
df['IsBDay'] = df['date'].apply(lambda x: len(pd.bdate_range(x, x)) > 0)
Now create a new DF that keeps only the rows where IsBDay is True, along with the other columns.
df = df[df['IsBDay']]
Now your DF is ready for plotting. Note that this removes weekends only, not non-business hours.
Hope this helps.
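To also drop non-business hours and close the gaps in the plot, one option is to filter on weekday and time of day, then plot against row position instead of the timestamps. This is a minimal sketch; the 09:00-17:00 window, the column names, and the synthetic prices are assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical hourly data for one full week (Mon 2022-03-21 to Sun 2022-03-27).
idx = pd.date_range("2022-03-21 00:00", "2022-03-27 23:00", freq="h")
df = pd.DataFrame({"Price": range(len(idx))}, index=idx)

# Keep Monday-Friday (dayofweek 0-4), then keep 09:00-17:00 inclusive.
weekdays = df[df.index.dayofweek < 5]
business = weekdays.between_time("09:00", "17:00")

# Plotting against 0..n-1 avoids the empty stretches for nights and weekends;
# the real timestamps can be put back as tick labels afterwards.
business = business.assign(pos=range(len(business)))
print(business.head())
```

With matplotlib, plt.plot(business["pos"], business["Price"]) would then draw the sessions back to back.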

How do I create a new dataframe using grouped output from another dataframe?

I have tick data containing symbols, bid prices, and ask prices. I was able to find the average spread and standard deviation of each symbol.
I'd like to create a confidence interval for each symbol and have the final DataFrame output have the columns
ticker symbol
average spread
lower bound 95% confidence
upper bound 95% confidence
How can I do that? This is how far I've been able to get:
import numpy as np
import pandas as pd
df = pd.read_csv('C:\\Users\\William\\Desktop\\tickdata.csv',
                 dtype={'ticker': str, 'bidPrice': np.float64, 'askPrice': np.float64, 'afterHours': str},
                 usecols=['ticker', 'bidPrice', 'askPrice', 'afterHours'],
                 nrows=3000000
                 )
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
print(df.groupby(['ticker'])['spread'].mean())
print(df.groupby(['ticker'])['spread'].std(ddof=0) * 1.96)
Just call pd.DataFrame on it:
new_df = pd.DataFrame(df.groupby(['ticker'])['spread'].mean())
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
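To get all four columns in one frame, named aggregation plus two derived columns is one option. Below is a minimal sketch on made-up ticks. It reuses the question's mean plus or minus 1.96 times the population standard deviation as the bounds, which is an assumption about the intended interval (a confidence interval for the mean itself would use the standard error, std / sqrt(n), instead). The column names average_spread, lower_bound and upper_bound are illustrative.

```python
import pandas as pd

# Hypothetical tick data.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "bidPrice": [10.0, 10.0, 10.0, 20.0, 20.0],
    "askPrice": [10.1, 10.2, 10.3, 20.5, 20.5],
})
df["spread"] = df["askPrice"] - df["bidPrice"]

# One row per ticker with the mean and population std (ddof=0) of the spread.
stats = df.groupby("ticker")["spread"].agg(
    average_spread="mean",
    std=lambda s: s.std(ddof=0),
).reset_index()

half_width = 1.96 * stats["std"]
new_df = stats.assign(
    lower_bound=stats["average_spread"] - half_width,
    upper_bound=stats["average_spread"] + half_width,
).drop(columns="std")
print(new_df)
```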