Multiplying series across two dataframes via a lookup table (third dataframe) - pandas

I have stock prices in a dataframe called 'stock_data', as shown here:
import pandas as pd
import numpy as np

stock_data = pd.DataFrame(np.random.rand(5, 4) * 100,
                          index=pd.date_range(start='1/1/2022', periods=5),
                          columns="A B C D".split())
stock_data
A B C D
2022-01-01 50.499862 65.011650 91.563112 45.107004
2022-01-02 53.218393 86.534942 54.575897 28.154673
2022-01-03 96.827564 49.782633 19.894127 47.529094
2022-01-04 18.226396 27.908952 67.141263 66.101363
2022-01-05 1.061750 29.833253 94.161190 85.542529
I have the currency each stock is quoted in, in a series called 'currency_list'. Note that its index matches the column names of stock_data:
currency_list=pd.Series(['USD','CAD','EUR','CHF'], index="A B C D".split())
currency_list
A USD
B CAD
C EUR
D CHF
dtype: object
I have currency exchange rates in a dataframe called 'forex_data':
forex_data = pd.DataFrame(np.random.rand(5, 3),
                          index=pd.date_range(start='1/1/2022', periods=5),
                          columns="USD CAD EUR".split())
forex_data
USD CAD EUR
2022-01-01 0.194238 0.996759 0.900205
2022-01-02 0.366476 0.054540 0.474838
2022-01-03 0.709269 0.723097 0.655717
2022-01-04 0.557701 0.878100 0.824146
2022-01-05 0.865796 0.432785 0.222463
Now I want to convert the prices to my base currency (let's say CHF) with the following logic:
the 2022-01-01 price of stock A is 50.499 * 0.194, and so forth.
I am stuck and don't know what to do - could someone help?

Example
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(5, 20, (5, 4)),
                   index=pd.date_range(start='1/1/2022', periods=5),
                   columns=list("ABCD"))
s1 = pd.Series(['USD', 'CAD', 'EUR', 'CHF'], index=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(10, 40, (5, 3)) / 10,
                   index=pd.date_range(start='1/1/2022', periods=5),
                   columns="USD CAD EUR".split())
df1
A B C D
2022-01-01 8 19 6 12
2022-01-02 15 8 18 6
2022-01-03 9 11 14 17
2022-01-04 17 13 17 17
2022-01-05 11 12 10 19
s1
A USD
B CAD
C EUR
D CHF
dtype: object
df2
USD CAD EUR
2022-01-01 2.7 1.0 1.4
2022-01-02 3.6 3.1 1.2
2022-01-03 2.7 2.1 1.0
2022-01-04 3.8 2.4 3.6
2022-01-05 2.0 1.6 3.6
Code
Map the columns of df1 to their currencies via s1, then use mul:
out = (df1.set_axis(df1.columns.map(s1), axis=1)  # relabel stock columns by currency
          .mul(df2)                               # multiply on the aligned currency columns
          .reindex(df2.columns, axis=1))          # keep only currencies present in df2
out
USD CAD EUR
2022-01-01 21.6 19.0 8.4
2022-01-02 54.0 24.8 21.6
2022-01-03 24.3 23.1 14.0
2022-01-04 64.6 31.2 61.2
2022-01-05 22.0 19.2 36.0
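Note that stock D is dropped by the reindex, because df2 has no CHF column. If the base currency should pass through unchanged, a minimal sketch (assuming a CHF-to-CHF rate of 1.0, which is not in the original data) is:
rates = df2.assign(CHF=1.0)  # assumption: the base currency converts at 1.0
out_chf = (df1.set_axis(df1.columns.map(s1), axis=1)  # relabel stock columns by currency
              .mul(rates)                             # multiply on the aligned currency columns
              .set_axis(df1.columns, axis=1))         # restore the stock names A-D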

Related

Monthly max and min values along with corresponding date in df

I have yearly data of stocks in the following format:
Unnamed: 0 SC_NAME SC_GROUP HIGH LOW CLOSE NO_OF_SHRS ISIN_CODE TRADING_DATE Month Year Mt_Year Qtr_Year
0 0 ABB LTD. A 2280.90 2224.20 2234.85 7219 INE117A01022 2021-12-31 12 2021 12-21 2021-4
1 1 AEGIS LOGIS A 223.90 217.65 221.80 49973 INE208C01025 2021-12-31 12 2021 12-21 2021-4
2 2 AMAR RAJA BA A 638.35 621.05 636.85 149244 INE885A01032 2021-12-31 12 2021 12-21 2021-4
3 3 A.SARABHAI X 34.40 31.50 33.65 367979 INE432A01017 2021-12-31 12 2021 12-21 2021-4
4 4 HDFC A 2608.00 2555.60 2586.85 48669 INE001A01036 2021-12-31 12 2021 12-21 2021-4
I am expecting the following: month-wise or quarterly HIGH and LOW values, with the corresponding dates, for every SC_NAME.
result = data.groupby(['Year', 'Mt_Year', 'SC_NAME']).agg({'HIGH': ['max'], 'LOW': ['min']})
along with the corresponding dates for the HIGH and LOW values:
HIGH LOW Date for high Date for low ISIN_CODE SC_GROUP
max min
Year Mt_Year SC_NAME
2021 01-21 EMERALD 17.35 11.40 INE030Q01015 X
20 MICRONS 42.20 36.15 INE144J01027 B
21ST CEN.MGM 11.50 10.30 INE253B01015 B
... ... ... ... ...
12-21 ZODIAC VEN 32.40 22.10 INE945J01027 X
ZOMATO 157.80 124.70 INE758T01015 B
ZUARI AGRO 125.40 101.50 INE840M01016 B
Thank you.
IIUC, you can use idxmax and idxmin to get the row indices of the max and min values, then extract TRADING_DATE for both:
out = (df.groupby(['Year', 'Month', 'SC_NAME'], as_index=False)
         .agg(PRICE_HIGH=('HIGH', 'max'),
              DATE_HIGH=('HIGH', 'idxmax'),
              PRICE_LOW=('LOW', 'min'),
              DATE_LOW=('LOW', 'idxmin'),
              ISIN_CODE=('ISIN_CODE', 'first')))
out['DATE_HIGH'] = df.loc[out['DATE_HIGH'], 'TRADING_DATE'].to_numpy()  # .to_numpy() avoids index alignment on assignment
out['DATE_LOW'] = df.loc[out['DATE_LOW'], 'TRADING_DATE'].to_numpy()
Output:
>>> out
Year Month SC_NAME PRICE_HIGH DATE_HIGH PRICE_LOW DATE_LOW ISIN_CODE
0 2021 12 A.SARABHAI 34.40 2021-12-31 31.50 2021-12-31 INE432A01017
1 2021 12 ABB LTD. 2280.90 2021-12-31 2224.20 2021-12-31 INE117A01022
2 2021 12 AEGIS LOGIS 223.90 2021-12-31 217.65 2021-12-31 INE208C01025
3 2021 12 AMAR RAJA BA 638.35 2021-12-31 621.05 2021-12-31 INE885A01032
4 2021 12 HDFC 2608.00 2021-12-31 2555.60 2021-12-31 INE001A01036
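The question also asked for quarterly aggregates; a hedged variant of the same idea, swapping the sample data's Qtr_Year column into the grouping keys:
out_q = (df.groupby(['Year', 'Qtr_Year', 'SC_NAME'], as_index=False)
           .agg(PRICE_HIGH=('HIGH', 'max'),
                DATE_HIGH=('HIGH', 'idxmax'),
                PRICE_LOW=('LOW', 'min'),
                DATE_LOW=('LOW', 'idxmin')))
out_q['DATE_HIGH'] = df.loc[out_q['DATE_HIGH'], 'TRADING_DATE'].to_numpy()
out_q['DATE_LOW'] = df.loc[out_q['DATE_LOW'], 'TRADING_DATE'].to_numpy()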

How to extract values from a previous dataframe based on row and column conditions?

Sorry for the naive question, but I can't solve this. Any reference or solution?
df1 =
date a b c
0 2011-12-30 100 400 700
1 2021-01-30 200 500 800
2 2021-07-30 300 600 900
df2 =
date c b
0 2021-07-30 NaN NaN
1 2021-01-30 NaN NaN
2 2011-12-30 NaN NaN
desired output:
date c b
0 2021-07-30 900 600
1 2021-01-30 800 500
2 2011-12-30 700 400
Use DataFrame.fillna after converting date to the index in both DataFrames:
df = df2.set_index('date').fillna(df1.set_index('date')).reset_index()
print (df)
date c b
0 2021-07-30 900.0 600.0
1 2021-01-30 800.0 500.0
2 2011-12-30 700.0 400.0
You can reindex_like df2 after setting date as a temporary index:
out = df1.set_index('date').reindex_like(df2.set_index('date')).reset_index()
output:
date c b
0 2021-07-30 900 600
1 2021-01-30 800 500
2 2011-12-30 700 400
Another possible solution, using pandas.DataFrame.update:
df2 = df2.set_index('date')
df2.update(df1.set_index('date'))
df2.reset_index()
Output:
date c b
0 2021-07-30 900.0 600.0
1 2021-01-30 800.0 500.0
2 2011-12-30 700.0 400.0

Merging two series with alternating dates into one grouped Pandas dataframe

Given are two series, like this:
#period1
DATE
2020-06-22 310.62
2020-06-26 300.05
2020-09-23 322.64
2020-10-30 326.54
#period2
DATE
2020-06-23 312.05
2020-09-02 357.70
2020-10-12 352.43
2021-01-25 384.39
These two series are correlated to each other, i.e. they each mark either the beginning or the end of a date period. The first series marks the end of a period1 period, the second series marks the end of a period2 period. The end of a period2 period is at the same time also the start of a period1 period, and vice versa.
I've been looking for a way to aggregate these periods as date ranges, but apparently this is not easily possible with Pandas dataframes. Suggestions extremely welcome.
In the easiest case, the output layout should reflect the end dates of periods, which period type it was, and the amount of change between start and stop of the period.
Explicit output:
DATE CHG PERIOD
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.0 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
However, if there is any possibility of actually grouping by a date range consisting of start AND stop date, that would be much more favorable
Thank you!
p1 = pd.DataFrame(data={'Date': ['2020-06-22', '2020-06-26', '2020-09-23', '2020-10-30'], 'val':[310.62, 300.05, 322.64, 326.54]})
p2 = pd.DataFrame(data={'Date': ['2020-06-23', '2020-09-02', '2020-10-12', '2021-01-25'], 'val':[312.05, 357.7, 352.43, 384.39]})
p1['period'] = 1
p2['period'] = 2
df = pd.concat([p1, p2]).sort_values('Date').reset_index(drop=True)
df['CHG'] = abs(df['val'].diff(periods=1))
df.drop('val', axis=1)
Output:
Date period CHG
0 2020-06-22 1 NaN
1 2020-06-23 2 1.43
2 2020-06-26 1 12.00
3 2020-09-02 2 57.65
4 2020-09-23 1 35.06
5 2020-10-12 2 29.79
6 2020-10-30 1 25.89
7 2021-01-25 2 57.85
EDIT: matching the format START - STOP - CHANGE - PERIOD
Starting from the above data frame:
df['Start'] = df.Date.shift(periods=1)
df.rename(columns={'Date': 'Stop'}, inplace=True)
df = df[['Start', 'Stop', 'CHG', 'period']]
df
Output:
Start Stop CHG period
0 NaN 2020-06-22 NaN 1
1 2020-06-22 2020-06-23 1.43 2
2 2020-06-23 2020-06-26 12.00 1
3 2020-06-26 2020-09-02 57.65 2
4 2020-09-02 2020-09-23 35.06 1
5 2020-09-23 2020-10-12 29.79 2
6 2020-10-12 2020-10-30 25.89 1
7 2020-10-30 2021-01-25 57.85 2
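If grouping by the actual date range is preferred, one hedged option is to index the periods by a pandas IntervalIndex built from the Start/Stop columns above:
# Sketch: represent each period as an Interval so rows can be selected by date range
valid = df.dropna(subset=['Start'])  # the first row has no start date
intervals = pd.IntervalIndex.from_arrays(pd.to_datetime(valid['Start']),
                                         pd.to_datetime(valid['Stop']))
by_range = valid.set_index(intervals)[['CHG', 'period']]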
Another approach, assuming the two series live in single-column DataFrames df1 and df2 indexed by DATE:
# If needed:
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)
df = pd.concat([df1, df2], axis=1)
df.columns = ['start','stop']
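# bfill(axis=1) fills each row's missing 'start' from 'stop', merging the two
# alternating series into a single price column that can be diffed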
df['CNG'] = df.bfill(axis=1)['start'].diff().abs()
df['PERIOD'] = 1
df.loc[df.stop.notna(), 'PERIOD'] = 2
df = df[['CNG', 'PERIOD']]
print(df)
Output:
CNG PERIOD
Date
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.00 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
2021-01-29 14.32 1
2021-02-12 22.57 2
2021-03-04 15.94 1
2021-05-07 45.42 2
2021-05-12 16.71 1
2021-09-02 47.78 2
2021-10-04 24.55 1
2021-11-18 41.09 2
2021-12-01 19.23 1
2021-12-10 20.24 2
2021-12-20 15.76 1
2022-01-03 22.73 2
2022-01-27 46.47 1
2022-02-09 26.30 2
2022-02-23 35.59 1
2022-03-02 15.94 2
2022-03-08 21.64 1
2022-03-29 45.30 2
2022-04-29 49.55 1
2022-05-04 17.06 2
2022-05-12 36.72 1
2022-05-17 15.98 2
2022-05-19 18.86 1
2022-06-02 27.93 2
2022-06-17 51.53 1

GroupBy, Transpose, and flatten rows in Pandas

   as_of_date  industry sector deal  year quarter stage  amount  yield
0  2022-01-01  Mortgage   RMBS  XYZ  2022     NaN     A     111    0.1
1  2022-01-01  Mortgage   RMBS  XYZ  2022       1     A     222    0.2
2  2022-01-01  Mortgage   RMBS  XYZ  2022       2     A     333    0.3
3  2022-01-01  Mortgage   RMBS  XYZ  2022       3     A     444    0.4
4  2022-01-01  Mortgage   RMBS  XYZ  2022       4     A     555    0.5
5  2022-01-01  Mortgage   RMBS  XYZ  2022     Nan     B     123    0.6
6  2022-01-01  Mortgage   RMBS  XYZ  2022       1     B     234    0.7
7  2022-01-01  Mortgage   RMBS  XYZ  2022       2     B     345    0.8
8  2022-01-01  Mortgage   RMBS  XYZ  2022       3     B     456    0.9
9  2022-01-01  Mortgage   RMBS  XYZ  2022       4     B     567    1.0
For each group (as_of_date, industry, sector, deal, year, stage), I need to display all the amounts and yields in one line
I have tried this -
df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage'])[['amount', 'yield']].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()
but this is not working correctly.
Basically, I need this as output rows -
2022-01-01 Mortgage RMBS XYZ 2022 A 111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
2022-01-01 Mortgage RMBS XYZ 2022 B 123 234 345 456 567 0.6 0.7 0.8 0.9 1.0
What would be the correct way to achieve this with Pandas? Thank you
This can be calculated by creating a list for each column first, then combining the lists (using +) and turning the result into a string, removing the [, ] and , characters:
df1 = df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).apply(
    lambda x: str(list(x['amount']) + list(x['yield']))[1:-1].replace(",", ""))
df1
#Out:
#as_of_date industry sector deal year stage
#2022-01-01 Mortgage RMBS XYZ 2022 A 111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
# B 123 234 345 456 567 0.6 0.7 0.8 0.9 1.0
Maybe this (note that ' '.join requires string values, so numeric columns need casting first)?
df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).agg(' '.join).reset_index()
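A minimal sketch of that idea, assuming the numeric amount and yield columns are cast to strings first:
out = (df.astype({'amount': str, 'yield': str})
         .groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage'])[['amount', 'yield']]
         .agg(' '.join)
         .reset_index())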
Does this answer your question?
df2 = df.pivot(index=['as_of_date','industry','sector','deal','year', 'stage'], columns=['quarter']).reset_index()
To flatten the column names:
df2.columns = df2.columns.to_series().str.join('_')
df2
as_of_date_ industry_ sector_ deal_ year_ stage_ amount_1 amount_2 amount_3 amount_4 amount_NaN amount_Nan yield_1 yield_2 yield_3 yield_4 yield_NaN yield_Nan
0 2022-01-01 Mortgage RMBS XYZ 2022 A 222.0 333.0 444.0 555.0 111.0 NaN 0.2 0.3 0.4 0.5 0.1 NaN
1 2022-01-01 Mortgage RMBS XYZ 2022 B 234.0 345.0 456.0 567.0 NaN 123.0 0.7 0.8 0.9 1.0 NaN 0.6
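As a side note, the trailing underscores in names like as_of_date_ come from joining the empty second level of the MultiIndex; a hedged alternative to the str.join above, applied while the columns are still a MultiIndex:
# Keeps single-level names (e.g. as_of_date) intact, joins the rest with '_'
df2.columns = [f'{a}_{b}' if b != '' else a for a, b in df2.columns]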

Pandas groupby issue after melt bug?

Python version 3.8.12
pandas 1.4.1
Given the following dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'id': [1000] * 4,
    'date': ['2022-01-01'] * 4,
    'ts': pd.date_range('2022-01-01', freq='5min', periods=4),
    'A': np.random.randint(1, 6, size=4),
    'B': np.random.rand(4)
})
that looks like this:
     id        date                   ts  A          B
0  1000  2022-01-01  2022-01-01 00:00:00  4    0.98019
1  1000  2022-01-01  2022-01-01 00:05:00  3    0.82021
2  1000  2022-01-01  2022-01-01 00:10:00  4   0.549684
3  1000  2022-01-01  2022-01-01 00:15:00  5  0.0818311
I unpivoted the columns A and B with pandas melt:
melted = df.melt(
    id_vars=['id', 'date', 'ts'],
    value_vars=['A', 'B'],
    var_name='label',
    value_name='value',
    ignore_index=True
)
that looks like this:
     id        date                   ts label      value
0  1000  2022-01-01  2022-01-01 00:00:00     A          4
1  1000  2022-01-01  2022-01-01 00:05:00     A          3
2  1000  2022-01-01  2022-01-01 00:10:00     A          4
3  1000  2022-01-01  2022-01-01 00:15:00     A          5
4  1000  2022-01-01  2022-01-01 00:00:00     B    0.98019
5  1000  2022-01-01  2022-01-01 00:05:00     B    0.82021
6  1000  2022-01-01  2022-01-01 00:10:00     B   0.549684
7  1000  2022-01-01  2022-01-01 00:15:00     B  0.0818311
Then I groupby and select the first group:
melted.groupby(['id', 'date']).first()
that gives me this:
ts label value
id date
1000 2022-01-01 2022-01-01 A 4.0
but I would expect this output instead:
ts A B
id date
1000 2022-01-01 2022-01-01 00:00:00 4 0.980190
2022-01-01 2022-01-01 00:05:00 3 0.820210
2022-01-01 2022-01-01 00:10:00 4 0.549684
2022-01-01 2022-01-01 00:15:00 5 0.081831
What am I not getting? Or is this a bug? Also, why is the ts column converted to a date?
My bad! I thought first would return the first group, but instead it returns the first element of each group, as stated in the documentation for the pandas aggregation functions. Sorry folks, I was doing this late at night and could not think straight :/
To select the first group, I needed to use the get_group function.
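For reference, a minimal sketch of that, using the group keys from above:
g = melted.groupby(['id', 'date'])
first_key = next(iter(g.groups))      # key of the first group, e.g. (1000, '2022-01-01')
first_group = g.get_group(first_key)  # all rows belonging to that group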