Matplotlib: convert custom dictionaries to line graphs - dataframe

Problem:
I have a list of ~108 dictionaries named list_of_dictionary and I would like to use Matplotlib to generate line graphs.
The dictionaries have the following format (this is one of 108):
{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}
Understanding the dictionary:
The car 2014 Land Rover Range Rover Sport was priced at:
59990 on datetime.datetime(2020, 1, 22, 11, 19, 26)
59890 on datetime.datetime(2020, 1, 23, 13, 12, 33)
60990 on datetime.datetime(2020, 1, 28, 12, 39, 24)
62990 on datetime.datetime(2020, 1, 29, 18, 39, 36)
59990 on datetime.datetime(2020, 1, 30, 18, 41, 31)
59690 on datetime.datetime(2020, 2, 1, 12, 39, 7)
Question:
With this structure how could one create mini-graphs with matplotlib (say 11 rows x 10 columns)?
Where each mini-graph will have:
the title of the graph from car
x-axis from the datetime
y-axis from the price
What I have tried:
df = pd.DataFrame(list_of_dictionary)
df = df.set_index('datetime')
print(df)
I don't know what to do thereafter...
Relevant Research:
Plotting a column containing lists using Pandas
Pandas column of lists, create a row for each list element
I've read these multiple times, but the more I read them, the more confused I get :(.

I don't know if it's sensible to try to fit that many plots on one figure. You'll have to make some choices to fit all the axes decorations on the page (titles, axes labels, tick labels, etc.), but the basic idea would be this:
import datetime
import matplotlib.pyplot as plt
car_data = [{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}]*108
fig, axs = plt.subplots(11, 10, figsize=(20, 22))  # adjust figsize as you please
# zip stops at the shorter sequence, so the two spare axes stay empty
for car, ax in zip(car_data, axs.flat):
    ax.plot(car["datetime"], car['price'], '-')
    ax.set_title(car['car'])
Ideally, all your axes could share the same x and y axes so you could have the labels only on the left-most and bottom-most axes. This is taken care of automatically if you add sharex=True and sharey=True to subplots():
fig, axs = plt.subplots(11,10, figsize=(20,22), sharex=True, sharey=True) # adjust figsize as you please
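To make the idea concrete, here is a minimal sketch on a smaller grid. The three-car data set, the 2 x 2 grid and the smaller title font are assumptions for illustration only; with the real data you would pass list_of_dictionary and an 11 x 10 grid. It also hides the axes that receive no data, which the code above leaves empty.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in data: three cars instead of 108
base = datetime.datetime(2020, 1, 22, 11, 0, 0)
car_data = [
    {
        "car": f"Car {i}",
        "price": [50000 + 1000 * i + 100 * d for d in range(4)],
        "datetime": [base + datetime.timedelta(days=d) for d in range(4)],
    }
    for i in range(3)
]

nrows, ncols = 2, 2  # 11 and 10 for the data in the question
fig, axs = plt.subplots(nrows, ncols, figsize=(8, 6), sharex=True, sharey=True)

# One mini-graph per dictionary: title from 'car', x from 'datetime', y from 'price'
for car, ax in zip(car_data, axs.flat):
    ax.plot(car["datetime"], car["price"], "-")
    ax.set_title(car["car"], fontsize=8)

# Hide the axes that did not receive any data
for ax in axs.ravel()[len(car_data):]:
    ax.set_visible(False)
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() at the end.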

Related

Outliers in data

I have a dataset like so -
15643, 14087, 12020, 8402, 7875, 3250, 2688, 2654, 2501, 2482, 1246, 1214, 1171, 1165, 1048, 897, 849, 579, 382, 285, 222, 168, 115, 92, 71, 57, 56, 51, 47, 43, 40, 31, 29, 29, 29, 29, 28, 22, 20, 19, 18, 18, 17, 15, 14, 14, 12, 12, 11, 11, 10, 9, 9, 8, 8, 8, 8, 7, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Based on domain knowledge, I know that larger values are the only ones we want to include in our analysis. How do I determine where to cut off our analysis? Should we exclude 15 and lower, or 50 and lower, etc.?
You can check the distribution with the quantile function. Then you can remove values below the 1st or 2nd percentile. The following is an example:
import numpy as np
data = np.array(data)
print(np.quantile(data, (.01, .02)))
Another method is to calculate the interquartile range (IQR) and set the lower bound for the analysis at Q1 - 1.5 * IQR:
Q1, Q3 = np.quantile(data, (0.25, 0.75))
data_floor = Q1 - 1.5 * (Q3 - Q1)
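As a rough illustration of both rules, the sketch below applies them to a small sample. The sample is a hand-picked subset of the values in the question, and the helper name keep_at_or_above_quantile and the 0.5 cut-off are assumptions made for the example, not a recommendation.

```python
import numpy as np

# Hypothetical sample: a hand-picked subset of the values from the question
data = np.array([15643, 8402, 2501, 1048, 285, 92, 47, 29,
                 18, 11, 8, 5, 3, 2, 1, 1])


def keep_at_or_above_quantile(values, q):
    """Return the cut-off at quantile q and the values at or above it."""
    cutoff = np.quantile(values, q)
    return cutoff, values[values >= cutoff]


# Quantile rule: the choice of q is a judgment call
cutoff, kept = keep_at_or_above_quantile(data, 0.5)

# IQR rule: floor at Q1 - 1.5 * IQR
q1, q3 = np.quantile(data, (0.25, 0.75))
data_floor = q1 - 1.5 * (q3 - q1)
kept_by_iqr = data[data >= data_floor]

print(cutoff, kept.size, data_floor, kept_by_iqr.size)
```

For a heavily right-skewed sample like this one the IQR floor comes out negative, so that rule removes nothing; the quantile rule is the one that actually trims the low tail here.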

How can I keep a subplot table from stretching?

How should I alter the following script to keep the subplot on the right from stretching? Is there a way to set the plot area of a subplot? It's frustrating: I go through the row/column sizing in the function, but when plotted the table just expands to fill the area. The left subplot holds the full list (22 rows). For the right one I pass only half of the df rows, and it still fills the area vertically. Thanks.
import pandas as pd
import matplotlib.pyplot as plt
import six
plt.rcParams['font.family'] = "Lato"
raw_data = dict(TF_001=[42, 39, 86, 15, 23, 57, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25],
SP500=[52, 41, 79, 80, 34, 47, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25],
Strategy=[62, 37, 84, 51, 67, 32, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22,
23, 24, 25],
LP_Port=[72, 43, 36, 26, 53, 88, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25])
df = pd.DataFrame(raw_data, index=pd.Index(
['Sharpe Ratio', 'Sortino Ratio', 'Calmars Ratio', 'Ulcer Index', 'Max Drawdown', 'Volatility',
'VaR', 'CVaR', 'R-Squared', 'CAGR', 'Risk-of-Ruin', 'Gain-Pain Ratio', 'Pitfall Indicator',
'Serentity Ratio', 'Common Sense Ratio', 'Kelly Criteria', 'Payoff Ratio', 'Ratio-A',
'Ratio-B', 'Ratio-C', 'Ratio-D', 'Ratio-E'], name='Metric'),
columns=pd.Index(['TF_001', 'SP500', 'Strategy', 'LP_Port'], name='Series'))
def create_table(data,
                 ax=None,
                 col_width=None,
                 row_height=None,
                 font_size=8,
                 header_color='#E5E5E5',
                 row_colors=None,
                 edge_color='w',
                 header_columns=0,
                 bbox=None):
    if row_colors is None:
        row_colors = ['#F1F8E9', 'w']
    if bbox is None:
        bbox = [0, 0, 1, 1]
    data_table = ax.table(cellText=data.values,
                          colLabels=data.columns,
                          rowLabels=data.index,
                          bbox=bbox,
                          cellLoc='center',
                          rowLoc='left',
                          colLoc='center',
                          colWidths=([col_width] * len(data.columns)))
    cell_map = data_table.get_celld()
    for i in range(0, len(data.columns)):
        cell_map[(0, i)].set_height(row_height * 0.2)
    data_table.auto_set_font_size(False)
    data_table.set_fontsize(font_size)
    for k, cell in six.iteritems(data_table._cells):
        cell.set_edgecolor(edge_color)
        if k[0] == 0 or k[1] < header_columns:
            cell.set_text_props(weight='heavy', color='black')
            cell.set_facecolor(header_color)
        else:
            cell.set_facecolor(row_colors[k[0] % len(row_colors)])
    for row, col in data_table._cells:
        if (row == 0) or (col == -1):
            data_table._cells[(row, col)].set_alpha(0.8)
    return ax
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 7), constrained_layout=False)
create_table(df, ax1, col_width=1.1, row_height=0.25, font_size=8)
create_table(df.iloc[0:11, ], ax2, col_width=1.1, row_height=0.25, font_size=8)
ax1.set_title("- Conventional Risk Measures -",
fontsize=10,
fontweight='heavy',
loc='center')
ax1.axis('off')
ax2.set_title("- Second Order Risk Measures -",
fontsize=10,
fontweight='heavy',
loc='center')
ax2.axis('off')
plt.suptitle('EF QuantOne - Performance and Risk Assessment ("PaRA")',
x=0.0175,
y=0.9775,
ha='left',
fontsize=12,
weight='heavy')
plt.tight_layout()
plt.savefig('risk_parameter_table[1].pdf',
orientation='portrait',
pad_inches=0.5)
plt.show()
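Figured it out. The table is drawn into bbox=[0, 0, 1, 1], i.e. the whole axes, so the axes size determines the table size. Instead of plt.subplots, I create each axes with fig.add_axes and give it an explicit position and size (in inches, divided by the figure size to get figure fractions), and I no longer call plt.tight_layout():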
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import six
plt.rcParams['font.family'] = "Lato"
raw_data = dict(TF_001=[42, 39, 86, 15, 23, 57, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25],
SP500=[52, 41, 79, 80, 34, 47, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25],
Strategy=[62, 37, 84, 51, 67, 32, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22,
23, 24, 25],
LP_Port=[72, 43, 36, 26, 53, 88, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25])
df = pd.DataFrame(raw_data, index=pd.Index(
['Sharpe Ratio', 'Sortino Ratio', 'Calmars Ratio', 'Ulcer Index', 'Max Drawdown', 'Volatility',
'VaR', 'CVaR', 'R-Squared', 'CAGR', 'Risk-of-Ruin', 'Gain-Pain Ratio', 'Pitfall Indicator',
'Serentity Ratio', 'Common Sense Ratio', 'Kelly Criteria', 'Payoff Ratio', 'Ratio-A',
'Ratio-B', 'Ratio-C', 'Ratio-D', 'Ratio-E'], name='Metric'),
columns=pd.Index(['TF_001', 'SP500', 'Strategy', 'LP_Port'], name='Series'))
def create_table(data,
                 ax=None,
                 col_width=None,
                 row_height=None,
                 font_size=8,
                 header_color='#E5E5E5',
                 row_colors=None,
                 edge_color='w',
                 header_columns=0,
                 bbox=None):
    if row_colors is None:
        row_colors = ['#F1F8E9', 'w']
    if bbox is None:
        bbox = [0, 0, 1, 1]
    data_table = ax.table(cellText=data.values,
                          colLabels=data.columns,
                          rowLabels=data.index,
                          bbox=bbox,
                          cellLoc='center',
                          rowLoc='left',
                          colLoc='center',
                          colWidths=([col_width] * len(data.columns)))
    cell_map = data_table.get_celld()
    for i in range(0, len(data.columns)):
        cell_map[(0, i)].set_height(row_height * 0.2)
    data_table.auto_set_font_size(False)
    data_table.set_fontsize(font_size)
    for k, cell in six.iteritems(data_table._cells):
        cell.set_edgecolor(edge_color)
        if k[0] == 0 or k[1] < header_columns:
            cell.set_text_props(weight='heavy', color='black')
            cell.set_facecolor(header_color)
        else:
            cell.set_facecolor(row_colors[k[0] % len(row_colors)])
    for row, col in data_table._cells:
        if (row == 0) or (col == -1):
            data_table._cells[(row, col)].set_alpha(0.8)
    return ax
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 8.5), constrained_layout=False)
fig = plt.figure(figsize=(12, 10))
w, h = fig.get_size_inches()
div = np.array([w, h, w, h])
col_width = 1.1
row_height = 0.25
ax1_subplot_size = (np.array(df.shape[::-1]) + np.array([0, 1])) * np.array(
[col_width, row_height])
ax1 = fig.add_axes(np.array([1.6, 1, 4.4, 5.75]) / div)
ax2_subplot_size = (np.array(df.shape[::-1]) + np.array([0, 1])) * np.array(
[col_width, row_height])
ax2 = fig.add_axes(np.array([7.5, 3.75, 4.4, 3]) / div)
create_table(df, ax1, col_width, row_height, font_size=8)
create_table(df.iloc[0:11, ], ax2, col_width, row_height, font_size=8)
ax1.set_title("- Conventional Risk Measures -",
fontsize=10,
fontweight='heavy',
loc='center')
ax1.axis('off')
ax2.set_title("- Second Order Risk Measures -",
fontsize=10,
fontweight='heavy',
loc='center')
ax2.axis('off')
plt.suptitle('EF QuantOne - Performance and Risk Assessment ("PaRA")',
x=0.0175,
y=0.9775,
ha='left',
fontsize=12,
weight='heavy')
# plt.tight_layout()
plt.savefig('risk_parameter_table[1].pdf',
orientation='portrait',
pad_inches=0.5)
plt.show()
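The positioning trick in the script above can be isolated into a small helper. This is only a sketch: the name add_axes_inches is made up for illustration, and the inch values are the ones used for ax1 and ax2 above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def add_axes_inches(fig, left, bottom, width, height):
    """Add an axes whose position and size are given in inches.

    fig.add_axes expects figure fractions, so each value is divided by the
    figure width or height. (Hypothetical helper, not a matplotlib API.)
    """
    w, h = fig.get_size_inches()
    rect = np.array([left, bottom, width, height]) / np.array([w, h, w, h])
    return fig.add_axes(tuple(rect))


fig = plt.figure(figsize=(12, 10))
ax1 = add_axes_inches(fig, 1.6, 1.0, 4.4, 5.75)  # tall axes for the 22-row table
ax2 = add_axes_inches(fig, 7.5, 3.75, 4.4, 3.0)  # shorter axes for the 11-row table

print(ax1.get_position().bounds)
print(ax2.get_position().bounds)
```

Because each axes has a fixed size, a table drawn with bbox=[0, 0, 1, 1] fills only that area, so the shorter table no longer stretches to the height of the longer one.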

Appending numpy arrays using numpy.insert

I have a numpy array (inputs) of shape (30,1). I want to insert a 31st value (e.g. x = 2). I'm trying to use the np.insert function, but it gives me an out-of-bounds error.
np.insert(inputs,b+1,x)
IndexError: index 31 is out of bounds for axis 0 with size 30
Short answer: you need to insert it at index b, not b+1.
The index you pass to np.insert is the position before which the new element is inserted. Indexes are zero-based, so an array with 30 elements has last index 29, and inserting at index 30 places the new element last:
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> np.insert(a,30,42)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 42])
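The example above uses a 1-D array, while the array in the question has shape (30, 1). The sketch below, with a stand-in array built from np.arange, shows what np.insert does in that case: without an axis argument the input is flattened first, whereas axis=0 keeps the column shape. np.append is shown as an alternative for adding at the end.

```python
import numpy as np

# Stand-in for the (30, 1) array from the question
inputs = np.arange(30).reshape(30, 1)
x = 2

# No axis given: the array is flattened before inserting
flat = np.insert(inputs, inputs.shape[0], x)

# axis=0: a new row is inserted and the (n, 1) layout is kept
column = np.insert(inputs, inputs.shape[0], x, axis=0)

# Appending at the end does not need an index at all
appended = np.append(inputs, [[x]], axis=0)

print(flat.shape)      # (31,)
print(column.shape)    # (31, 1)
print(appended.shape)  # (31, 1)
```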

How should width be set for a bar in matplotlib?

I'm using python 2, and the following code is just using some example data, my actual data can be of varying lengths and might not be minutely.
import numpy as np
import datetime
import matplotlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x_values = [datetime.datetime(2018, 11, 8, 11, 16),
datetime.datetime(2018, 11, 8, 11, 17),
datetime.datetime(2018, 11, 8, 11, 18),
datetime.datetime(2018, 11, 8, 11, 19),
datetime.datetime(2018, 11, 8, 11, 20),
datetime.datetime(2018, 11, 8, 11, 21),
datetime.datetime(2018, 11, 8, 11, 22),
datetime.datetime(2018, 11, 8, 11, 23),
datetime.datetime(2018, 11, 8, 11, 24),
datetime.datetime(2018, 11, 8, 11, 25),
datetime.datetime(2018, 11, 8, 11, 26),
datetime.datetime(2018, 11, 8, 11, 27),
datetime.datetime(2018, 11, 8, 11, 28),
datetime.datetime(2018, 11, 8, 11, 29),
datetime.datetime(2018, 11, 8, 11, 30),
datetime.datetime(2018, 11, 8, 11, 31)]
y_values = [1392.1017964071857,
1392.2814371257484,
1392.37125748503,
1227.6802721088436,
1083.1,
1317.0461538461539,
1393.059880239521,
1393.4011976047905,
1393.491017964072,
1393.8502994011976,
1318.3461538461538,
1229.4965986394557,
1394.2095808383233,
1394.3892215568862,
1394.6586826347304,
1394.688622754491]
rects1 = ax.bar(x_values, y_values)
fig.tight_layout()
plt.show()
How am I supposed to set the width of the bars automatically? As it is I get the following:
If I set the width to 0.0006 then it looks good for the example data. From that I've worked out that matplotlib is measuring the x-axis in days (0.0007 days is almost exactly 1 minute, which matches my time intervals, and 0.0006 leaves gaps between the bars). But that's no good if I get hourly values, or seconds, or weeks, etc. Surely there's an option for handling this automatically?
If you want the bar width to be no larger than the difference between any successive datetimes, you can calculate that number and supply it to the bar's width argument.
import matplotlib.dates as mdates
width = np.min(np.diff(mdates.date2num(x_values)))
ax.bar(x_values, y_values, width=width, ec="k")
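Wrapped in a function, the same calculation adapts to whatever spacing the data has. This is a sketch only: the helper name bar_width_in_days and its fill parameter (the fraction of the smallest gap a bar should occupy) are assumptions, and it needs at least two x values.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np


def bar_width_in_days(x_values, fill=0.8):
    """Smallest gap between successive datetimes, in days, scaled by fill.

    Hypothetical helper; requires at least two datetimes.
    """
    gaps = np.diff(mdates.date2num(sorted(x_values)))
    return fill * gaps.min()


start = datetime.datetime(2018, 11, 8, 11, 16)
x_minutes = [start + datetime.timedelta(minutes=i) for i in range(5)]
x_hours = [start + datetime.timedelta(hours=i) for i in range(5)]
y = [3, 1, 4, 1, 5]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.bar(x_minutes, y, width=bar_width_in_days(x_minutes), ec="k")
ax2.bar(x_hours, y, width=bar_width_in_days(x_hours), ec="k")
fig.tight_layout()
```

With fill=1.0 the bars touch, as in the answer above; a value below 1 leaves a gap between neighbouring bars.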

Numpy array changes shape when accessing with indices

I have a small matrix A with dimensions MxNxO
I have a large matrix B with dimensions KxMxNxP, with P>O
I have a vector ind of indices of dimension Ox1
I want to do:
B[1,:,:,ind] = A
But, the lefthand of my equation
B[1,:,:,ind].shape
is of dimension Ox1xMxN and therefore I can not broadcast A (MxNxO) into it.
Why does accessing B in this way change the dimensions of the left side?
How can I easily achieve my goal?
Thanks
There's a feature, if not a bug, that when slices are mixed in the middle of advanced indexing, the sliced dimensions are put at the end.
Thus for example:
In [204]: B = np.zeros((2,3,4,5),int)
In [205]: ind=[0,1,2,3,4]
In [206]: B[1,:,:,ind].shape
Out[206]: (5, 3, 4)
The 3,4 dimensions have been placed after the ind, 5.
We can get around that by indexing first with 1, and then the rest:
In [207]: B[1][:,:,ind].shape
Out[207]: (3, 4, 5)
In [208]: B[1][:,:,ind] = np.arange(3*4*5).reshape(3,4,5)
In [209]: B[1]
Out[209]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
This only works when that first index is a scalar. If it too were a list (or array), we'd get an intermediate copy, and couldn't set the value like this.
https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
It's come up in other SO questions, though not recently.
weird result when using both slice indexing and boolean indexing on a 3d array
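Another way to reach the same result, sketched below with made-up sizes, is to keep the single indexing expression from the question and reorder the axes of A to match the shape of the left-hand side. np.moveaxis moves A's last axis to the front, turning (M, N, O) into (O, M, N).

```python
import numpy as np

# Made-up sizes for illustration; P > O as in the question
K, M, N, P, O = 2, 3, 4, 5, 3
A = np.arange(M * N * O).reshape(M, N, O)
ind = [4, 0, 2]  # O indices into the last axis of B

B = np.zeros((K, M, N, P), dtype=int)
# B[1, :, :, ind] has shape (O, M, N), so give it A with the O axis first
B[1, :, :, ind] = np.moveaxis(A, -1, 0)

# Reference: the two-step indexing from the answer above
B_ref = np.zeros((K, M, N, P), dtype=int)
B_ref[1][:, :, ind] = A

print(B[1, :, :, ind].shape)    # (3, 3, 4)
print(np.array_equal(B, B_ref))  # True
```

Unlike the two-step form, this approach does not depend on the first index being a scalar, as long as the right-hand side is arranged to match the shape the indexing expression produces.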