openpyxl: chart.title = "Chart Heading" not functional. How do I add the heading to a scatter chart?

I'm using Python 3.6.3 and openpyxl 2.5.4
I wrote some code and noticed that setting my chart title with chart.title = "Test Heading" does nothing. As a sanity check I copied and ran the example from here:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
wb = Workbook()
ws = wb.active
rows = [
    ['Size', 'Batch 1', 'Batch 2'],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=1, max_row=7)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
Sadly the title in my sample output is still missing:
Oddly, changing title_from_data=True to title_from_data=False also seems to have no effect on the contents of the chart.

This looks very much like a bug in the application you're using to view the file, which I suspect is LibreOffice.

Related

Create new pandas dataframe with fixed distance using interpolate

I have a dataframe of the following form.
df = {'X': [0, 3, 6, 7, 8, 11],
      'Y1': [8, 5, 4, 3, 2, 1.5],
      'Y2': [1, 2, 4, 5, 5, 5]}
I would like to create a new dataframe by interpolation, where 'X' advances in fixed steps [0, 2, 4, 6, 8, 10].
To find the new 'Y' values I need to find f(x)=Y1 and then evaluate it at each step in X. But since I have many Y columns, I think there must be a more clever way to do this.
The solution I found was the following:
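import numpy as np
# b is my existing DataFrame (not shown here)
step_size = 0.25
no_steps = int(np.floor(max(b['X']) / step_size))
# Append one otherwise empty row per fixed step.
# Note: DataFrame.append was removed in pandas 2.0; use pd.concat there.
for i in range(0, no_steps + 1):
    b = b.append({'X': step_size * i, 'StepNo': 10, 'PointNo': 23 + i}, ignore_index=True)
b = b.sort_values(['X'])
b = b.set_index(['X'])
# Interpolate the missing values using the index (X) as the coordinate.
c = b.interpolate('index')
c = c.reset_index()
c = c.sort_values(['PointNo'])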
So first I define the step size. Then I calculate the number of steps and append the steps to the dataframe. Finally I sort the dataframe and set 'X' as the index, so that interpolate can use the index values as the x-coordinates.
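For the small example dataframe in the question, the same result can probably be reached without appending rows at all. The following is only a sketch: it assumes plain linear interpolation is acceptable, that 'X' is sorted in increasing order, and it uses np.interp once per Y column. The names new_x and resampled are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [0, 3, 6, 7, 8, 11],
                   'Y1': [8, 5, 4, 3, 2, 1.5],
                   'Y2': [1, 2, 4, 5, 5, 5]})

# Fixed grid of X values: 0, 2, 4, 6, 8, 10
new_x = np.arange(0, 12, 2)

# Linearly interpolate every Y column onto the new grid.
# np.interp requires the original X values to be increasing.
resampled = pd.DataFrame({'X': new_x})
for col in df.columns.drop('X'):
    resampled[col] = np.interp(new_x, df['X'], df[col])

print(resampled)
```

Because the loop runs over all columns except 'X', it should scale to many Y columns without changes.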

Add a trendline to a bar chart with XlsxWriter

I am trying to create a bar chart with a trendline. I can do this in Excel and would like to automate the process. XlsxWriter is pretty easy to use and I have replicated the bar chart; it is just the trendline that does not work for me. It seems to add two elements: the line and an additional bar on top of each stack.
This is the code to create the chart on the left
import xlsxwriter
# Create the workbook, worksheet and chart
workbook = xlsxwriter.Workbook("Example.xlsx")
worksheet = workbook.add_worksheet()
chart1 = workbook.add_chart({'type': 'column', 'subtype': 'stacked'})
# Add the worksheet data
headings = ['Model 1', 'Model 2', 'Capacity']
data = [
    [10, 40, 50, 20, 10, 50],
    [30, 60, 70, 50, 40, 30],
    [20, 30, 40, 40, 30, 30]
]
worksheet.write_row('A1', headings)
worksheet.write_column('A2', data[0])
worksheet.write_column('B2', data[1])
worksheet.write_column('C2', data[2])
# Configure the first series.
chart1.add_series({
    'name': '=Sheet1!$A$1',
    'values': '=Sheet1!$A$2:$A$7',
})
# Configure the second series.
chart1.add_series({
    'name': '=Sheet1!$B$1',
    'values': '=Sheet1!$B$2:$B$7',
})
# Configure the third series, with a linear trendline.
chart1.add_series({
    'name': '=Sheet1!$C$1',
    'values': '=Sheet1!$C$2:$C$7',
    'trendline': {'type': 'linear'},
})
# Set an Excel chart style.
chart1.set_style(11)
# Add a chart title
chart1.set_title({'name': 'xlsxwriter chart'})
# Insert the chart into the worksheet (with an offset).
worksheet.insert_chart('F1', chart1)
# Finally, close the Excel file
workbook.close()
The extra bars use the data that I am trying to insert as a trend line. Any help would be appreciated.
It looks like what you are trying to do is to add a secondary line chart rather than a trendline. You can do this with the XlsxWriter chart.combine() method.
Like this:
import xlsxwriter
# Create the workbook, worksheet and chart
workbook = xlsxwriter.Workbook("Example.xlsx")
worksheet = workbook.add_worksheet()
chart1 = workbook.add_chart({'type': 'column', 'subtype': 'stacked'})
# Add the worksheet data
headings = ['Model 1', 'Model 2', 'Capacity']
data = [
    [10, 40, 50, 20, 10, 50],
    [30, 60, 70, 50, 40, 30],
    [20, 30, 40, 40, 30, 30]
]
worksheet.write_row('A1', headings)
worksheet.write_column('A2', data[0])
worksheet.write_column('B2', data[1])
worksheet.write_column('C2', data[2])
# Configure the first series.
chart1.add_series({
    'name': '=Sheet1!$A$1',
    'values': '=Sheet1!$A$2:$A$7',
})
# Configure the second series.
chart1.add_series({
    'name': '=Sheet1!$B$1',
    'values': '=Sheet1!$B$2:$B$7',
})
# Add a chart title
chart1.set_title({'name': 'xlsxwriter chart'})
# Create a second line chart.
chart2 = workbook.add_chart({'type': 'line'})
chart2.add_series({
    'name': '=Sheet1!$C$1',
    'values': '=Sheet1!$C$2:$C$7',
})
# Combine the charts.
chart1.combine(chart2)
# Insert the chart into the worksheet (with an offset).
worksheet.insert_chart('F1', chart1)
# Finally, close the Excel file
workbook.close()
Output:

matplotlib histogram with equal bars width

I use a histogram to display the distribution. Everything works fine if the spacing of the bins is uniform. But if the bin intervals differ, the bar widths differ accordingly (as expected). Is there a way to set the width of the bars independently of the size of the bins?
This is what I have
This is what I am trying to draw
from matplotlib import pyplot as plt
my_bins = [10, 20, 30, 40, 50, 120]
my_data = [5, 5, 6, 8, 9, 15, 25, 27, 33, 45, 46, 48, 49, 111, 113]
fig1 = plt.figure()
ax1 = fig1.add_subplot(121)
ax1.set_xticks(my_bins)
ax1.hist(my_data, my_bins, histtype='bar', rwidth=0.9,)
fig1.show()
I cannot mark your question as a duplicate, but I think my answer to this question might be what you are looking for?
I'm not sure how you'll make sense of the result, but you can use numpy.histogram to calculate the height of your bars, then plot those directly against an arbitrary x-scale.
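import numpy as np
from matplotlib import pyplot as plt
x = np.random.normal(loc=50, scale=200, size=(2000,))
bins = [0, 1, 10, 20, 30, 40, 50, 75, 100]
fig = plt.figure()
# Top: regular histogram, bar widths follow the bin widths
ax = fig.add_subplot(211)
ax.hist(x, bins=bins, edgecolor='k')
# Bottom: same counts drawn as equal-width bars at positions 0, 1, 2, ...
ax = fig.add_subplot(212)
h, e = np.histogram(x, bins=bins)
ax.bar(range(len(bins) - 1), h, width=1, edgecolor='k')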
EDIT: Here is the same approach with adjusted x-tick labels, so that the correspondence is easier to see.
my_bins = [10, 20, 30, 40, 50, 120]
my_data = [5, 5, 6, 8, 9, 15, 25, 27, 33, 45, 46, 48, 49, 111, 113]
fig = plt.figure()
ax = fig.add_subplot(211)
ax.hist(my_data, bins=my_bins, edgecolor='k')
ax = fig.add_subplot(212)
h,e = np.histogram(my_data, bins=my_bins)
ax.bar(range(len(my_bins)-1),h, width=1, edgecolor='k')
ax.set_xticks(range(len(my_bins)-1))
ax.set_xticklabels(my_bins[:-1])
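Since the bars no longer sit on a numeric axis, it may help to label each bar with the full range of its bin rather than only the left edge. The sketch below is one possible way to do that, using the data from the question; the names counts, positions and labels are made up for illustration, and the "lo-hi" label format is just an assumption about what reads well.

```python
import numpy as np
from matplotlib import pyplot as plt

my_bins = [10, 20, 30, 40, 50, 120]
my_data = [5, 5, 6, 8, 9, 15, 25, 27, 33, 45, 46, 48, 49, 111, 113]

# Count the values per bin; values below 10 fall outside the bins and are ignored.
counts, _ = np.histogram(my_data, bins=my_bins)

# One equally spaced position per bin, and a "lo-hi" label for each.
positions = np.arange(len(counts))
labels = [f"{lo}-{hi}" for lo, hi in zip(my_bins[:-1], my_bins[1:])]

fig, ax = plt.subplots()
ax.bar(positions, counts, width=0.9, edgecolor='k')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Keep in mind that the x-axis is then categorical, so distances along it no longer reflect the data values.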

Matplotlib: barh full width bars

I'm trying to generate a stacked horizontal bar chart in matplotlib. The issue I am facing is that the width of the bars does not fully fill the available width of the plotting area (additional space on the right).
Unfortunately I couldn't find any information on this online.
What could I do to resolve this?
Chart with additional space on the right of the bars
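import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter
measures = ("A", "B", "C", "D", "A", "B", "C", "D", "A", "B")
measure_bars = y_pos = np.arange(len(measures))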
yes_data = [10, 10, 10, 10, 15, 10, 10, 10, 10, 10]
number_of_answers = [20, 30, 20, 20, 20, 20, 20, 20, 20, 20]
font = {'fontname': 'Arial', 'color': '#10384f'}
yes_data = [i / j * 100 for i, j in zip(yes_data, number_of_answers)]
no_data = [100 - i for i in yes_data]
bar_width = 0.6
plt.rcParams['xtick.top'] = plt.rcParams['xtick.labeltop'] = True
plt.rcParams['xtick.bottom'] = plt.rcParams['xtick.labelbottom'] = False
fig = plt.figure()
plt.barh(measure_bars, yes_data, color='#89d329', height=bar_width, zorder=2)
plt.barh(measure_bars, no_data, left=yes_data, color='#ff3162', height=bar_width, zorder=3)
plt.grid(color=font["color"], zorder=0)
plt.yticks(measure_bars, measures, **font)
plt.title("TECHNICAL AND ORGANIZATIONAL MEASURES", fontweight="bold", size="16", x=0.5, y=1.1, **font)
ax = plt.gca()  # use the axes already drawn on; plt.axes() can create a new, empty one
ax.xaxis.set_major_formatter(PercentFormatter())
ax.spines['bottom'].set_color(font["color"])
ax.spines['top'].set_color(font["color"])
ax.spines['right'].set_color(font["color"])
ax.spines['left'].set_color(font["color"])
ax.xaxis.label.set_color(font["color"])
ax.tick_params(axis='x', colors=font["color"])
for tick in ax.get_xticklabels():
    tick.set_fontname(font["fontname"])
ax.xaxis.set_ticks(np.arange(0.0, 100.1, 10))
plt.gca().legend(('Yes', 'No'), bbox_to_anchor=(0.7, 0), ncol=2, shadow=False)
plt.show()
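Please add the following line (anywhere after ax has been created). The values plotted here are percentages, so the upper limit is 100; use 1 instead if your values are fractions:
ax.set_xlim(0, 100)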
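A stripped-down version of the chart shows the effect of fixing the x-limits. This is only a sketch with made-up numbers (the yes and labels values below are not from the question), and it assumes the values are percentages on a 0 to 100 scale.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter

# Made-up example data, in percent
labels = ("A", "B", "C")
yes = np.array([50.0, 33.3, 75.0])
no = 100 - yes
y_pos = np.arange(len(labels))

fig, ax = plt.subplots()
ax.barh(y_pos, yes, height=0.6, color='#89d329', label='Yes')
ax.barh(y_pos, no, left=yes, height=0.6, color='#ff3162', label='No')
ax.set_yticks(y_pos)
ax.set_yticklabels(labels)
ax.xaxis.set_major_formatter(PercentFormatter())

# Without this line matplotlib adds a small margin beyond 100 %.
ax.set_xlim(0, 100)
ax.legend(ncol=2)
```

With the limits pinned to the data range, the stacked bars should reach the right edge of the plotting area.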

Implementation of hierarchical agglomerative clustering

I am a newbie and want to implement hierarchical agglomerative clustering for RGB images. To do this, I extract all RGB values from an image and process it. Next I compute the pairwise distances and then build the linkage. Now, from the linkage, I want to extract my original data (i.e. the RGB values) at the indices assigned to each cluster. Here is the code I have so far.
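import numpy as np
from PIL import Image
from scipy.cluster.hierarchy import linkage, inconsistent, fcluster, dendrogram
from scipy.spatial.distance import pdist
image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1, 3))
rgb = list(image.getdata())  # getdata() is a PIL Image method, not an ndarray method
X = pdist(im)
Y = linkage(X)
I = inconsistent(Y)
Based on the 4th column of the inconsistency matrix, I chose a small cutoff value in order to get as many clusters as possible.
cutoff = 0.7
# fcluster takes the linkage matrix; fclusterdata expects the raw observations instead
cluster_assignments = fcluster(Y, cutoff)
# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print("%d clusters" % num_clusters)
# cluster_indices is a helper function that is not shown here
indices = cluster_indices(cluster_assignments)
for k, ind in enumerate(indices):
    print("cluster", k + 1, "is", ind)
dendrogram(Y)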
I got results like this:
cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]
This means cluster 6 contains the leaves with indices 6 and 11. At this point I am stuck on how to map these indices back to the original data (i.e. the RGB values), so that each pixel's RGB value is associated with its index. After that I have to generate a codebook to implement agglomerative clustering. I have no idea how to approach this task. I have read a lot of material but found nothing that helped.
Here is my solution:
import numpy as np
from scipy.cluster import hierarchy
im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
               [ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
               [ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
               [127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
               [124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
               [187,47,20],[127,170,52],[ 30,56,0]])
groups = hierarchy.fclusterdata(im, 0.7)
idx_sorted = np.argsort(groups)
group_sorted = groups[idx_sorted]
im_sorted = im[idx_sorted]
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1
np.split(im_sorted, split_idx)
output:
[array([[203, 11, 10],
[198, 25, 21]]),
array([[187, 47, 20]]),
array([[127, 171, 60],
[127, 170, 52]]),
array([[124, 173, 45]]),
array([[112, 159, 57]]),
array([[127, 145, 61]]),
array([[25, 62, 0],
[30, 56, 0]]),
array([[19, 57, 8]]),
array([[19, 46, 0]]),
array([[109, 137, 18],
[120, 133, 19]]),
array([[100, 120, 9],
[ 95, 110, 15]]),
array([[67, 89, 27],
[67, 85, 25]]),
array([[55, 78, 24]]),
array([[ 52, 108, 0],
[ 55, 106, 1]]),
array([[ 54, 101, 9]]),
array([[60, 85, 0]]),
array([[ 74, 128, 30],
[ 76, 127, 35]]),
array([[ 67, 118, 26]]),
array([[ 48, 112, 25]]),
array([[37, 0, 0]])]
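If the row indices of each cluster are needed as well as the RGB values, a boolean mask on the cluster labels gives both. The following is a small sketch on made-up pixel values, not the data above. It uses the 'distance' criterion with a threshold of 5 purely so the grouping is easy to verify by eye, and it takes the mean colour of each cluster as a codebook entry, which is an assumption about what the codebook should contain. The names pixels, members and codebook are hypothetical.

```python
import numpy as np
from scipy.cluster import hierarchy

# Made-up RGB rows: two tight pairs and one isolated colour
pixels = np.array([[0, 0, 0], [1, 1, 1],
                   [100, 100, 100], [101, 100, 100],
                   [200, 0, 0]])

# One cluster label per row; rows closer than 5 units end up together.
labels = hierarchy.fclusterdata(pixels, t=5, criterion='distance')

# Map each cluster label to the row indices that belong to it.
members = {int(c): np.flatnonzero(labels == c) for c in np.unique(labels)}

# Assumed codebook: the mean colour of each cluster.
codebook = {c: pixels[idx].mean(axis=0) for c, idx in members.items()}

for c, idx in members.items():
    print("cluster", c, "rows", idx.tolist(), "rgb", pixels[idx].tolist())
```

The indices in members refer to rows of the flattened pixel array, so for an image reshaped with reshape((-1, 3)) they can be converted back to image coordinates with np.unravel_index and the original height and width.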