pybbg: extracting data from Bloomberg using ISIN codes, not tickers

I have a large list of ISIN codes and would like to use them to pull Bloomberg data into Python using pybbg.
For example, this gives me NaN values for all ISIN codes:
fld_list = ['OAS_SPREAD_MID','DUR_ADJ_MID','DUR_ADJ_OAS_MID']
bb = bbg.bdp("US46628LAA61 ISIN", fld_list)
When using the tickers, I get all field values.
Any ideas would be really appreciated.
Many thanks,

The correct syntax to request data for an ISIN is /isin/US46628LAA61.
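With pybbg that would presumably look like the call below (a sketch, reusing the bbg session object and field list from the question):
fld_list = ['OAS_SPREAD_MID', 'DUR_ADJ_MID', 'DUR_ADJ_OAS_MID']
bb = bbg.bdp("/isin/US46628LAA61", fld_list)  # /isin/ prefix instead of "US46628LAA61 ISIN"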

With xbbg you can do this:
In[1]: from xbbg import blp
In[2]: fld_list = ['OAS_SPREAD_MID','DUR_ADJ_MID','DUR_ADJ_OAS_MID']
In[3]: blp.bdp(['US46628LAA61 Mtge', 'US46631JAA60 Mtge'], fld_list)
Out[3]:
              ticker            field  value
0  US46628LAA61 Mtge   OAS_SPREAD_MID  -5.30
1  US46628LAA61 Mtge      DUR_ADJ_MID   6.00
2  US46628LAA61 Mtge  DUR_ADJ_OAS_MID   2.43
3  US46631JAA60 Mtge   OAS_SPREAD_MID  50.10
4  US46631JAA60 Mtge      DUR_ADJ_MID   1.71
5  US46631JAA60 Mtge  DUR_ADJ_OAS_MID   4.09
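The /isin/ prefix from above should also be accepted by xbbg, since identifiers are passed through to the Bloomberg API; a hedged variant of the same call (untested here) would be:
In[4]: blp.bdp(['/isin/US46628LAA61', '/isin/US46631JAA60'], fld_list)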

Related

Specific calculations for unique column values in DataFrame

I want to make a beta calculation in my dataframe, where beta = Σ[(daily return - mean daily return) * (daily market return - mean market return)] / Σ[(daily market return - mean market return)**2]
But I want my beta calculation to apply to specific firms. In my dataframe, each firm has an ID code number (specified in column 1), and I want each ID code to be associated with its own beta.
I tried groupby, loc and a for loop, but it always returns an error, since the beta calculation is quite long and requires many parentheses when inserted.
Any idea how to solve this problem? Thank you!
Dataframe:
index  ID  price  daily_return  mean_daily_return_per_ID  daily_market_return  mean_daily_market_return  date
0      1   27.50  0.008         0.0085                    0.0023               0.03345                   01-12-2012
1      2   33.75  0.0745        0.0745                    0.00458              0.0895                    06-12-2012
2      3   29.20  0.00006       0.00006                   0.0582               0.0045                    01-05-2013
3      4   20.54  0.00486       0.005125                  0.0009               0.0006                    27-11-2013
4      1   21.50  0.009         0.0085                    0.0846               0.04345                   04-05-2014
5      4   22.75  0.00539       0.005125                  0.0003               0.0006
I assume the intended form of your equation is
beta = Σ[(daily_return - mean_daily_return) * (daily_market_return - mean_daily_market_return)] / Σ[(daily_market_return - mean_daily_market_return)**2]
Then the following should compute the beta value for each group identified by ID.
Method 1: Creating our own function to output beta
import pandas as pd
import numpy as np

# beta_data.csv is a csv version of the sample data frame you provided.
df = pd.read_csv("./beta_data.csv")

def beta(daily_return, daily_market_return):
    """
    Returns the beta calculation for two pandas columns of equal length.
    Will return NaN for columns that have just one row each. Adjust
    this function to account for groups that have only a single value.
    """
    mean_daily_return = np.sum(daily_return) / len(daily_return)
    mean_daily_market_return = np.sum(daily_market_return) / len(daily_market_return)
    num = np.sum(
        (daily_return - mean_daily_return)
        * (daily_market_return - mean_daily_market_return)
    )
    denom = np.sum((daily_market_return - mean_daily_market_return) ** 2)
    return num / denom

# Group by the column ID, then 'apply' the function defined above to the
# two relevant columns of each group.
betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
    lambda x: beta(x["daily_return"], x["daily_market_return"])
)
print(f"betas: {betas}")
Method 2: Using pandas' built-in statistical functions
Notice that beta as stated above is just the covariance of DR and DMR divided by the variance of DMR. Therefore we can write the above program much more concisely as follows.
import pandas as pd
import numpy as np

df = pd.read_csv("./beta_data.csv")

def beta(dr, dmr):
    """
    dr: daily_return (pandas column)
    dmr: daily_market_return (pandas column)
    TODO: handle divide-by-zero errors etc.
    """
    num = dr.cov(dmr)
    denom = dmr.var()
    return num / denom

betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
    lambda x: beta(x["daily_return"], x["daily_market_return"])
)
print(f"betas: {betas}")
The output in both cases is:
ID
1 0.012151
2 NaN
3 NaN
4 -0.883333
dtype: float64
The reason for getting NaNs for IDs 2 and 3 is that they only have a single row each. You should modify the function beta to accommodate these corner cases.
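One possible way to handle those corner cases (a sketch that returns NaN explicitly for groups that are too small or have constant market returns):
def beta(daily_return, daily_market_return):
    # A single observation (or constant market returns) gives a zero denominator.
    if len(daily_market_return) < 2:
        return np.nan
    num = np.sum(
        (daily_return - daily_return.mean())
        * (daily_market_return - daily_market_return.mean())
    )
    denom = np.sum((daily_market_return - daily_market_return.mean()) ** 2)
    return num / denom if denom != 0 else np.nan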
Maybe you can start like this?
id_list = list(set(df["ID"].values.tolist()))
for firm_id in id_list:
    new_df = df.loc[df["ID"] == firm_id]
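Continuing that starting point, a minimal sketch (assuming the beta helper from the first answer is already defined) that collects one beta per firm in a dict:
betas = {}
for firm_id in id_list:
    new_df = df.loc[df["ID"] == firm_id]
    # One beta per firm ID; single-row firms come back as NaN.
    betas[firm_id] = beta(new_df["daily_return"], new_df["daily_market_return"])
print(betas)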

Series.replace cannot use dict-like to_replace and non-None value [duplicate]

I've got a pandas DataFrame filled mostly with real numbers, but there are a few NaN values in it as well.
How can I replace the NaNs with the averages of the columns they are in?
This question is very similar to this one: numpy array: replace nan values with average of columns, but unfortunately the solution given there doesn't work for a pandas DataFrame.
You can simply use DataFrame.fillna to fill the NaNs directly:
In [27]: df
Out[27]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 NaN -2.027325 1.533582
4 NaN NaN 0.461821
5 -0.788073 NaN NaN
6 -0.916080 -0.612343 NaN
7 -0.887858 1.033826 NaN
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
In [28]: df.mean()
Out[28]:
A -0.151121
B -0.231291
C -0.530307
dtype: float64
In [29]: df.fillna(df.mean())
Out[29]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325 1.533582
4 -0.151121 -0.231291 0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858 1.033826 -0.530307
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
The docstring of fillna says that value should be a scalar or a dict; however, it works with a Series as well. If you want to pass a dict, you can use df.mean().to_dict().
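For example, the dict route below is equivalent for the numeric columns above (a tiny sketch):
df.fillna(df.mean().to_dict())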
Try:
sub2['income'].fillna((sub2['income'].mean()), inplace=True)
In [16]: df = DataFrame(np.random.randn(10,3))
In [17]: df.iloc[3:5,0] = np.nan
In [18]: df.iloc[4:6,1] = np.nan
In [19]: df.iloc[5:8,2] = np.nan
In [20]: df
Out[20]:
0 1 2
0 1.148272 0.227366 -2.368136
1 -0.820823 1.071471 -0.784713
2 0.157913 0.602857 0.665034
3 NaN -0.985188 -0.324136
4 NaN NaN 0.238512
5 0.769657 NaN NaN
6 0.141951 0.326064 NaN
7 -1.694475 -0.523440 NaN
8 0.352556 -0.551487 -1.639298
9 -2.067324 -0.492617 -1.675794
In [22]: df.mean()
Out[22]:
0 -0.251534
1 -0.040622
2 -0.841219
dtype: float64
Apply per column: fill each column's NaNs with that column's mean.
In [23]: df.apply(lambda x: x.fillna(x.mean()),axis=0)
Out[23]:
0 1 2
0 1.148272 0.227366 -2.368136
1 -0.820823 1.071471 -0.784713
2 0.157913 0.602857 0.665034
3 -0.251534 -0.985188 -0.324136
4 -0.251534 -0.040622 0.238512
5 0.769657 -0.040622 -0.841219
6 0.141951 0.326064 -0.841219
7 -1.694475 -0.523440 -0.841219
8 0.352556 -0.551487 -1.639298
9 -2.067324 -0.492617 -1.675794
Although the code below does the job, its performance takes a big hit once you deal with a DataFrame of roughly 100k records or more:
df.fillna(df.mean())
In my experience, one should replace NaN values (be it with the mean or median) only where it is required, rather than applying fillna() over the whole DataFrame.
I had a DataFrame with 20 variables, and only 4 of them required NaN treatment (replacement). I tried the code above (Code 1), along with a slightly modified version of it (Code 2), where I ran it selectively, i.e. only on the variables which had NaN values.
#------------------------------------------------
#----(Code 1) Treatment on overall DataFrame-----
df.fillna(df.mean())
#------------------------------------------------
#----(Code 2) Selective Treatment----------------
for i in df.columns[df.isnull().any(axis=0)]:   #---Applying only on variables with NaN values
    df[i].fillna(df[i].mean(), inplace=True)
#---df.isnull().any(axis=0) gives a True/False flag (Boolean Series),
#---which, when applied on df.columns[], helps identify variables with NaN values
Below is the performance I observed as I kept increasing the number of records in the DataFrame.
DataFrame with ~100k records
Code 1: 22.06 Seconds
Code 2: 0.03 Seconds
DataFrame with ~200k records
Code 1: 180.06 Seconds
Code 2: 0.06 Seconds
DataFrame with ~1.6 Million records
Code 1: code kept running endlessly
Code 2: 0.40 Seconds
DataFrame with ~13 Million records
Code 1: --did not even try, after seeing performance on 1.6 Mn records--
Code 2: 3.20 Seconds
Apologies for the long answer! Hope this helps!
If you want to impute missing values with the mean and you want to go column by column, then this will only impute with the mean of that column. This might be a little more readable.
sub2['income'] = sub2['income'].fillna((sub2['income'].mean()))
import numpy as np
import pandas as pd

# To read data from the csv file
Dataset = pd.read_csv('Data.csv')
X = Dataset.iloc[:, :-1].values

# To calculate the mean, use the imputer class
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
Directly use df.fillna(df.mean()) to fill all the null values with the column means.
If you want to fill the null values of one column with that column's mean, you can do it like this (here Item_Weight is the column name; we assign the filled column back to itself):
df['Item_Weight'] = df['Item_Weight'].fillna(df['Item_Weight'].mean())
If you want to fill null values with some string instead (here Outlet_Size is the column name):
df.Outlet_Size = df.Outlet_Size.fillna('Missing')
Pandas: How to replace NaN (nan) values with the average (mean), median or other statistics of one column
Say your DataFrame is df and you have one column called nr_items. This is: df['nr_items']
If you want to replace the NaN values of your column df['nr_items'] with the mean of the column:
Use method .fillna():
mean_value=df['nr_items'].mean()
df['nr_item_ave']=df['nr_items'].fillna(mean_value)
I have created a new df column called nr_item_ave to store the new column with the NaN values replaced by the mean value of the column.
You should be careful when using the mean. If you have outliers, it is more advisable to use the median.
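Following the same pattern, a median-based fill would look like this (same nr_items column as above):
median_value = df['nr_items'].median()
df['nr_item_ave'] = df['nr_items'].fillna(median_value)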
Another option besides those above is:
df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.mean()))
It's less elegant than the previous answers for the mean, but it can be shorter if you want to replace nulls using some other column-level function.
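For instance, to fill each column's nulls with that column's median instead of the mean, a sketch following the same pattern would be:
df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.median()))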
Using the sklearn preprocessing class SimpleImputer:
from sklearn.impute import SimpleImputer
import numpy as np

missingvalues = SimpleImputer(missing_values=np.nan, strategy='mean')
missingvalues = missingvalues.fit(x[:, 1:3])
x[:, 1:3] = missingvalues.transform(x[:, 1:3])
Note: in recent versions the missing_values parameter changed from the string 'NaN' to np.nan, and unlike the older sklearn.preprocessing.Imputer, SimpleImputer does not take an axis argument.
I use this method to fill missing values with the average of each column.
fill_mean = lambda col: col.fillna(col.mean())
df = df.apply(fill_mean, axis=0)
You can also use value_counts to fill with the most frequent value; this works across different datatypes.
df = df.apply(lambda x: x.fillna(x.value_counts().index[0]))
See the pandas value_counts API reference for details.

Pandas Check Excel import for isnumeric() data in Dataframe

I am importing data into pandas from Excel and I need to verify, column by column, that the data is numeric.
        month      value       dp       wd  ...  mg  fee  pr       comment
0  2013-07-31  208372.33  4206.84  4692.22  ...   0    0   0  some comment
1  2013-08-31  210669.77     0.00  1270.28  ...   0    0   0
There are about 20 columns and I only need to exclude the "month" and "comment" columns.
Is there something like df.iloc[:, 2:18].isnumeric(), or will this require a loop?
I would like to get a True / False response.
Thank you.
One way is to use select_dtypes and compare the detected numeric columns with the expected ones:
np.array_equal(df.select_dtypes(include='number').columns, df.columns[1:-1])
You can use the apply method to apply Series string methods to the columns of a DataFrame (this assumes those columns were read in as strings and returns an element-wise True/False frame):
df2 = df.drop(["month", "comment"], axis=1)
df2 = df2.apply(lambda x: x.str.isnumeric())
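Another way to get a per-column True/False answer, regardless of whether the columns came in as strings or numbers, is to attempt a numeric conversion (a sketch, assuming the column layout shown in the question):
import pandas as pd

checked = df.drop(columns=["month", "comment"])
# A column counts as numeric if every value survives pd.to_numeric without turning into NaN.
is_numeric = checked.apply(lambda col: pd.to_numeric(col, errors="coerce").notna().all())
print(is_numeric)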

Unexpected Result Updating a Copy of a DF when using iterrows

When I ran this code, I expected df2 to update correctly, but it did not. Here is the code:
import pandas as pd
import numpy as np
exam_data = [{'name':'Anastasia', 'score':12.5}, {'name':'Dima','score':9}, {'name':'Katherine','score':16.5}]
df = pd.DataFrame(exam_data)
df2 = df.copy()
for index, row in df.iterrows():
    df2['score'] = row['score'] * 2
    print(row['name'], row['score'])
print(df2)
As you can see from the output below, the scores did not double; they were all set to 33.0:
Anastasia 12.5
Dima 9.0
Katherine 16.5
name score
0 Anastasia 33.0
1 Dima 33.0
2 Katherine 33.0
What is going on? Why am I seeing this unanticipated result?
Because you assign to the whole df2['score'] column on every iteration, each pass overwrites the entire column with a single scalar; after the last iteration every row holds the last score doubled (16.5 * 2 = 33.0). To update only the current row, assign through the index instead:
df2.loc[index, 'score'] = row['score'] * 2
Pandas works column-wise; instead of iterating over the rows (which is slow), you can just use
df2['score'] = df['score'] * 2
That will update the entire column at once.

How to apply function on multiple-index pandas dataframe elegantly like on panel?

Suppose I have a dataframe like:
ticker MS AAPL
field price volume price volume
0 -0.861210 -0.319607 -0.855145 0.635594
1 -1.986693 -0.526885 -1.765813 1.696533
2 -0.154544 -1.152361 -1.391477 -2.016119
3 0.621641 -0.109499 0.143788 -0.050672
generated by the following code (please ignore the numbers, which are just examples):
columns = pd.MultiIndex.from_tuples([('MS', 'price'), ('MS', 'volume'), ('AAPL', 'price'), ('AAPL', 'volume')], names=['ticker', 'field'])
data = np.random.randn(4, 4)
df = pd.DataFrame(data, columns=columns)
Now, I would like to calculate pct_change() (or any user-defined function) on each price column, and add a new column on the 'field' level to store the result.
I know how to do this elegantly if the data is a Panel, which has been deprecated since version 0.20. Supposing the panel's three axes are date, ticker and field:
p[:, :, 'ret'] = p[:, :, 'price'].pct_change()
That's all. But I have not found a similarly elegant way to do it with a multi-index dataframe.
You can use IndexSlice:
df.loc[:,pd.IndexSlice[:,'price']].apply(pd.Series.pct_change).rename(columns={'price':'ret'})
Out[1181]:
ticker MS AAPL
field ret ret
0 NaN NaN
1 -1.420166 -0.279805
2 3.011155 0.062529
3 -1.609004 0.759954
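If you also want to attach the result back onto the original frame under a new field name, as the question asks, one possible follow-up (a sketch reusing the renamed result above) is:
ret = df.loc[:, pd.IndexSlice[:, 'price']].apply(pd.Series.pct_change).rename(columns={'price': 'ret'})
df = df.join(ret).sort_index(axis=1)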
def cstm(s):
    return s.pct_change()

new = pd.concat(
    [df.xs('price', 1, 1).apply(cstm)],
    axis=1, keys=['new']
).swaplevel(0, 1, 1)

df.join(new).sort_index(1)
ticker AAPL MS
field new price volume new price volume
0 NaN -0.855145 0.635594 NaN -0.861210 -0.319607
1 1.064928 -1.765813 1.696533 1.306863 -1.986693 -0.526885
2 -0.211991 -1.391477 -2.016119 -0.922211 -0.154544 -1.152361
3 -1.103335 0.143788 -0.050672 -5.022430 0.621641 -0.109499
Or
def cstm(s):
    return s.pct_change()

df.stack(0).assign(
    new=lambda d: d.groupby('ticker').price.apply(cstm)
).unstack().swaplevel(0, 1, 1).sort_index(1)