How to go from relative dates to absolute dates in DataFrame columns - pandas

I have a pandas DataFrame containing forward prices for future maturities, quoted on multiple different trading months ('trade date'). Trade dates are given in absolute terms ('January'). The maturities are given in relative terms ('M+1').
How can I convert the maturities into an absolute format? For example, for trade date 'January' the maturity 'M+1' should read 'February'.
Here is example data:
import pandas as pd
import numpy as np
data_keys = ['trade date', 'm+1', 'm+2', 'm+3']
data = {'trade date':['jan','feb','mar','apr'],
'm+1':np.random.randn(4),
'm+2':np.random.randn(4),
'm+3':np.random.randn(4)}
df = pd.DataFrame(data)
df = df[data_keys]
Starting data:
trade date m+1 m+2 m+3
0 jan -0.446535 -1.012870 -0.839881
1 feb 0.013255 0.265500 1.130098
2 mar 0.406562 -1.122270 -1.851551
3 apr -0.890004 0.752648 0.778100
Result:
Should have Feb, Mar, Apr, May, Jun, Jul in the columns. NaN will be shown in many instances.

The starting DataFrame:
trade date m+1 m+2 m+3
0 jan -1.350746 0.948835 0.579352
1 feb 0.011813 2.020158 -1.221110
2 mar -0.183187 -0.303099 1.323092
3 apr 0.081105 0.662628 -0.703152
Solution:
1. Define a list of all possible absolute dates you will encounter, in chronological order. Do the same for relative dates.
2. Create a function to act on groups coming from df.groupby. The function will convert the column names of each group appropriately to an absolute format.
3. Apply the function.
4. Pandas handles the clever concatenation of all groups.
Code:
abs_in_order = ['jan','feb','mar','apr','may','jun','jul','aug']
rel_in_order = ['m+0','m+1','m+2','m+3','m+4']
def rel2abs(group, abs_in_order, rel_in_order):
    # every row in a group shares the same trade date
    abs_date = group['trade date'].unique()[0]
    l = len(rel_in_order)
    i = abs_in_order.index(abs_date)
    # map 'm+0', 'm+1', ... onto the months starting at the trade date
    namesmap = dict(zip(rel_in_order, abs_in_order[i:i+l]))
    return group.rename(columns=namesmap)
# group_keys=False keeps the original row index on recent pandas versions
grouped = df.groupby(['trade date'], group_keys=False)
df = grouped.apply(rel2abs, abs_in_order, rel_in_order)
Pandas may mess up the column order. Do this to get back to something in chronological order:
order = ['trade date'] + abs_in_order
cols = [e for e in order if e in df.columns]
df[cols]
Result:
trade date feb mar apr may jun jul
0 jan -1.350746 0.948835 0.579352 NaN NaN NaN
1 feb NaN 0.011813 2.020158 -1.221110 NaN NaN
2 mar NaN NaN -0.183187 -0.303099 1.323092 NaN
3 apr NaN NaN NaN 0.081105 0.662628 -0.703152
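For comparison, here is a minimal sketch of the same reshaping without groupby, using melt and pivot. It is not the method from the answer above. The fixed prices, the names long_df, position and wide, and the assumption that every maturity falls within the same calendar year are all illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: month labels are lowercase three-letter names and no maturity
# wraps past December.
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

# Fixed illustrative prices instead of random numbers.
df = pd.DataFrame({
    'trade date': ['jan', 'feb', 'mar', 'apr'],
    'm+1': [1.0, 2.0, 3.0, 4.0],
    'm+2': [1.1, 2.1, 3.1, 4.1],
    'm+3': [1.2, 2.2, 3.2, 4.2],
})

# Long format: one row per (trade date, relative maturity) pair.
long_df = df.melt(id_vars='trade date', var_name='rel', value_name='price')

# Turn 'm+2' into the integer 2 and add it to the trade month's position.
offset = long_df['rel'].str.extract(r'm\+(\d+)', expand=False).astype(int)
position = long_df['trade date'].map(months.index) + offset
long_df['maturity'] = [months[p] for p in position]

# Back to wide format, with columns and rows in chronological order.
wide = long_df.pivot(index='trade date', columns='maturity', values='price')
ordered = [m for m in months if m in wide.columns]
wide = wide.reindex(index=df['trade date'].tolist(), columns=ordered)
print(wide)
```

Cells for maturities that a given trade date does not quote come out as NaN, as in the result above.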

Your question doesn't contain enough information to answer it.
You say that the prices are quoted on dates given in absolute terms ('January').
January is not a date, but 2-Jan-2015 is.
What is your actual 'date', and what is its type (text, datetime.date, pd.Timestamp, etc.)? You can check with type(date), where date is one of your quote dates.
The easiest solution is to convert your trade dates to pd.Timestamp objects and then add an offset:
>>> trade_date = pd.Timestamp('2015-1-15')
>>> trade_date + pd.DateOffset(months=1)
Timestamp('2015-02-15 00:00:00')
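If the trade dates really are bare month names, one way to apply this idea is to attach a year first. The sketch below assumes the year 2015 and English three-letter month abbreviations. Neither is given in the question, and the names trade_months and maturity_m1 are illustrative.

```python
import pandas as pd

trade_months = pd.Series(['jan', 'feb', 'mar', 'apr'])
year = 2015  # assumption: the question does not state a year

# Parse 'Jan 2015', 'Feb 2015', ... into Timestamps (first day of each month).
trade_dates = pd.to_datetime(trade_months.str.title() + f' {year}',
                             format='%b %Y')

# The 'm+1' maturity is the trade date shifted by one calendar month.
maturity_m1 = trade_dates + pd.DateOffset(months=1)
print(maturity_m1.dt.month.tolist())
```

Working with real timestamps also handles maturities that roll over into the next year, which a plain list of month names cannot express.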

Related

Extract index values from groupby using numpy array

I'm trying to extract 7 rows of data grouped by the DTY column in a Dataframe. I think I need to use a numpy array to filter, but can't seem to get it working.
Here's an extract of my original Climate_DF dataframe (note there are NaN values for the :
Year Month Day DTY Precip9am MaxT ... Wind ms 21 hr
1989 1 1 1 0 29.7 ... 0
1989 1 2 2 0 31.1 ... 4.6
1989 1 3 3 0.4 32 ... 2.1
... ... ... ... ... ... ... ...
2019 12 31 365 21.2 31.3 ... 2.1
First, I created a numpy array filter based on a given date - this works and creates the right array:
#Enter Day of Interest (DOI) yyyy, mm, dd
DOI = datetime.datetime(2020, 2, 6)
DTY = DOI.timetuple()[7]
DTYMinus3 = DTY-3
DTYPlus3 = DTY+3
DTY_Array = np.linspace(DTYMinus3, DTYPlus3,7)
DTY_Array = np.array(DTY_Array)
I now want to extract all values from all years for columns 'Precip9am' through to 'Winds ms 21 hr', keeping only the rows whose DTY is in DTY_Array.
I've grouped Climate_DF by DTY and then tried to apply a filter:
ClimateDTY_DF = Climate_DF.groupby("DTY")
DTY_Climate = ClimateDTY_DF[DTY_Array]
I get the following error:
KeyError: 'Columns not found: 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0'
I'm assuming this is because the filter is trying to find the values in the ClimateDTY_DF columns, but I need **it to be filtering through the DTY column to find all the indexes from the array and extract the values from each column after 'Precip9am'**.
How do I do this? Do I transpose first? Or do I need to create some kind of loop that creates a DF for DTY and each column from 'Precip9am' through to 'Winds ms 21 hr' and extracts just the DTY from the array for each?
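One possible approach, sketched under assumptions: indexing a groupby object with a list selects columns, which is why the KeyError mentions columns. A boolean mask built with Series.isin filters rows instead, with no groupby or transpose needed. The small climate_df below is a made-up stand-in for Climate_DF with only a few of its columns.

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical stand-in for Climate_DF: two years, days 30..42 of each year.
days = np.arange(30, 43)
climate_df = pd.DataFrame({
    'Year': np.repeat([1989, 1990], len(days)),
    'DTY': np.tile(days, 2),
    'Precip9am': 0.0,
    'MaxT': 30.0,
})

doi = datetime.datetime(2020, 2, 6)
dty = doi.timetuple().tm_yday          # day of the year, 37 for 6 February
window = np.arange(dty - 3, dty + 4)   # integer days 34..40

# Keep the rows whose DTY falls in the window, for all years,
# and the columns from 'DTY' through 'MaxT'.
mask = climate_df['DTY'].isin(window)
subset = climate_df.loc[mask, 'DTY':'MaxT']
print(subset)
```

Using np.arange rather than np.linspace keeps the day numbers as integers, so they compare cleanly with an integer DTY column.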

How to split data by month?

Hi I have a time series data set. I would like to make a new column for each month.
data:
creationDate fre skill
2019-02-15T20:43:29Z 14 A
2019-02-15T21:10:32Z 15 B
2019-03-22T07:14:50Z 41 A
2019-03-22T06:47:41Z 64 B
2019-04-11T09:49:46Z 25 A
2019-04-11T09:49:46Z 29 B
output:
skill 2019-02 2019-03 2019-04
A 14 41 25
B 15 64 29
I know I can do it manually like below and make columns (when I have date1_start and date1_end):
dfdate1=data[(data['creationDate'] >= date1_start) & (data['creationDate']<= date1_end)]
But since I have many months, it is not feasible to do it this way for each one.
Use DataFrame.pivot after converting the datetimes to month periods with Series.dt.to_period:
df['dates'] = pd.to_datetime(df['creationDate']).dt.to_period('M')
df = df.pivot(index='skill', columns='dates', values='fre')
Or convert them to custom YYYY-MM strings with Series.dt.strftime:
df['dates'] = pd.to_datetime(df['creationDate']).dt.strftime('%Y-%m')
df = df.pivot(index='skill', columns='dates', values='fre')
EDIT:
If you get this error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicate skill/month pairs. Use DataFrame.pivot_table with an aggregation such as sum or mean:
df = df.pivot_table(index='skill',columns='dates',values='fre', aggfunc='sum')
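As a self-contained illustration, the sketch below runs the strftime variant on the sample rows from the question. The names data and wide are chosen here, and sum is just one reasonable aggregation.

```python
import pandas as pd

data = pd.DataFrame({
    'creationDate': ['2019-02-15T20:43:29Z', '2019-02-15T21:10:32Z',
                     '2019-03-22T07:14:50Z', '2019-03-22T06:47:41Z',
                     '2019-04-11T09:49:46Z', '2019-04-11T09:49:46Z'],
    'fre': [14, 15, 41, 64, 25, 29],
    'skill': ['A', 'B', 'A', 'B', 'A', 'B'],
})

# Label each row with its month as a 'YYYY-MM' string.
data['dates'] = pd.to_datetime(data['creationDate']).dt.strftime('%Y-%m')

# One row per skill, one column per month; duplicates would be summed.
wide = data.pivot_table(index='skill', columns='dates', values='fre',
                        aggfunc='sum')
print(wide)
```

With this sample every skill/month pair is unique, so pivot and pivot_table give the same table.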

Removing Space between bars in seaborn barplot

I am trying to plot the following data. The duration is Jan to Dec. Type varies from 1 to 7. The key point is that not all types exist for each month. These are not missing values; the type simply does not exist.
Month Type Coef
Jan 1 2.3
Jan 2 2.1
..
Code:
ax = sns.barplot(x = 'Month', y = 'Coef_E',hue = 'LCZ',data = df_E, palette=palette)
Result
I want to remove the space marked by arrows.

Groupby month parameter in Multi-level Index in pandas

I have a large DF which is structured like this. It has multiple stocks in level 0 and Date is level 1. Starts monthly data at 12/31/2004 and continues to 12/31/2017 (not shown).
Date DAILY_RETURN
A 12/31/2004 NaN
1/31/2005 -8.26
2/28/2005 8.55
3/31/2005 -7.5
4/29/2005 -6.53
5/31/2005 15.71
6/30/2005 -4.12
7/29/2005 13.99
8/31/2005 22.56
9/30/2005 1.83
10/31/2005 -2.26
11/30/2005 11.4
12/30/2005 -6.65
1/31/2006 1.86
2/28/2006 6.16
3/31/2006 4.31
What I want to do is groupby the month and then count the number of POSITIVE returns in the daily_returns by month (ie 01, then 02, 03, etc from the Date part of the index). This code will give me the count but only by index level=0.
df3.groupby(level=0)['DAILY_RETURN'].agg(['count'])
There are other questions out there, this one being the closest, but I cannot get the code to work. Can someone help out? Ultimately what I want to do is group by stock and then month and FILTER all stocks that have at least 70% positive returns by month. I can't seem to figure out how to get the positive returns from the dataframe either.
How to group pandas DataFrame entries by date in a non-unique column
Here it is for a smaller dataset, using datetime:
import pandas as pd
from datetime import datetime
df = pd.DataFrame()
df['Date'] = ['12/31/2004', '1/31/2005', '12/31/2005', '2/28/2006', '2/28/2007']
df['DAILY_RETURN'] = [-8, 9, 5, 10, 14]
# keep only the positive returns; .copy() avoids a SettingWithCopyWarning below
df = df[df.DAILY_RETURN > 0].copy()
# extract the month number from each date string
df['Date_obj'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').month)
# count the positive returns per month
df.groupby('Date_obj').count()[['DAILY_RETURN']]
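For the MultiIndex layout in the question, a rough sketch could group on both index levels at once. The level names 'Stock' and 'Date', the sample values, and the reading of the 70% rule as a per stock and month threshold are all assumptions, since the question does not pin them down.

```python
import pandas as pd

# Hypothetical miniature of df3: stock in level 0, date string in level 1.
idx = pd.MultiIndex.from_tuples(
    [('A', '1/31/2005'), ('A', '2/28/2005'), ('A', '1/31/2006'),
     ('B', '1/31/2005'), ('B', '2/28/2005'), ('B', '1/31/2006')],
    names=['Stock', 'Date'])
df3 = pd.DataFrame({'DAILY_RETURN': [-8.26, 8.55, 1.86, 2.0, -1.0, 3.0]},
                   index=idx)

stocks = df3.index.get_level_values('Stock')
months = pd.to_datetime(df3.index.get_level_values('Date'),
                        format='%m/%d/%Y').month.rename('Month')

# True where the return is positive; NaN compares as False.
positive = df3['DAILY_RETURN'] > 0

# The mean of a boolean Series is the share of True values per group.
share = positive.groupby([stocks, months]).mean()

# One reading of the 70% rule: keep (stock, month) pairs at or above 0.7.
keep = share[share >= 0.7]
print(keep)
```

Replacing mean with sum would give the count of positive returns per stock and month instead of the share.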

Create datetime from columns in a DataFrame

I got a DataFrame with these columns :
year month day gender births
I'd like to create a new column of type "Date" based on the columns year, month and day, formatted as "yyyy-mm-dd".
I'm just beginning in Python and I just can't figure out how to proceed...
Assuming you are using pandas to create your dataframe, you can try:
>>> import pandas as pd
>>> df = pd.DataFrame({'year':[2015,2016],'month':[2,3],'day':[4,5],'gender':['m','f'],'births':[0,2]})
>>> df['dates'] = pd.to_datetime(df.iloc[:,0:3])
>>> df
year month day gender births dates
0 2015 2 4 m 0 2015-02-04
1 2016 3 5 f 2 2016-03-05
Taken from the example here and the slicing (iloc use) "Selection" section of "10 minutes to pandas" here.
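A variant of the same idea, as a sketch: selecting the three columns by name instead of by position keeps working if the column order changes. The column name 'date' is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5],
                   'gender': ['m', 'f'], 'births': [0, 2]})

# pd.to_datetime assembles dates from columns named year, month and day.
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df)
```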
You can use .assign
For example:
df2 = df.assign(ColumnDate = df.Column1.astype(str) + '-' + df.Column2.astype(str) + '-' + df.Column3.astype(str))
It is simple and it is much faster than lambda if you have tonnes of data.
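Note that plain string concatenation gives values such as '2015-2-4' rather than 'yyyy-mm-dd'. A minimal sketch of a zero-padded version, assuming the columns are named year, month and day as in the question (date_str is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})

# str.zfill(2) pads single-digit months and days with a leading zero.
df2 = df.assign(
    date_str=df['year'].astype(str) + '-'
    + df['month'].astype(str).str.zfill(2) + '-'
    + df['day'].astype(str).str.zfill(2)
)
print(df2)
```

The result is a column of strings, not datetimes; pd.to_datetime can convert it if date arithmetic is needed.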