Scaling textplot_wordcloud quanteda - error-handling

I want to plot features from my quanteda dfm.
When I use the textplot_wordcloud (see code) I get the error:
In wordcloud(x, min_size, max_size, min_count, max_words, ... : Term x could not be fit on page. It will not be plotted.
dfm_joint <- dfm(tokens_skip)
textplot_wordcloud(dfm_joint, min_size = 2, rotation = 0.25, max_words = 100,
color = rev(RColorBrewer::brewer.pal(10, "RdBu")))
I guess the problem lies in the scaling of the plot, but is there any way to adjust the plot size within the textplot_wordcloud function? The "adjust" argument provided by the function only adapts the size of the words, which doesn't fix the problem.
Thanks very much in advance.

Related

julia savefig() saves wrong marker shape

The code uses PyPlot to draw a scatter plot and save it in Julia.
using PyPlot;pygui(true)
fig = figure()
for i = 1:400
scatter([i,i+1], [i+1, i+2], color = "blue", s = 0.1)
end
PyPlot.savefig("1.png", figsize = (16, 9),dpi = 1200, bbox_inches="tight")
But the displayed plot and the saved figure are different:
What I want is some simple dots:
But in the saved figure the markers are circles:
As you can see, the marker type has changed.
I found this only occurs when scattering very dense dots. How can I fix this?
I just had the exact same problem. The only fix I found was to set markeredgecolor="none", which got rid of the circles around the dots.
Might not be the most efficient solution, but at least I could produce the plots I needed.
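Since PyPlot.jl wraps matplotlib, the same idea can be sketched in plain Python. This is a minimal sketch, not the original poster's code: it assumes that the circles come from the marker edge stroke, and it uses scatter's own keywords (edgecolors and linewidths) rather than markeredgecolor, which is the keyword for plot(). The in-memory buffer is a stand-in for a file such as "1.png", and the figure size is set when the figure is created because savefig does not take a figsize argument.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Same points as the 400-iteration loop in the question, drawn in one call.
i = np.arange(1, 401)
x = np.concatenate([i, i + 1])
y = np.concatenate([i + 1, i + 2])

# figsize belongs to the figure, not to savefig.
fig, ax = plt.subplots(figsize=(16, 9))

# edgecolors="none" and linewidths=0 remove the stroke drawn around each
# marker, so tiny markers are rendered as filled dots only.
dots = ax.scatter(x, y, s=0.1, c="blue", edgecolors="none", linewidths=0)

buffer = io.BytesIO()  # stand-in for a file such as "1.png"
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
```

Drawing all points with one scatter call also avoids creating 400 separate collections, which keeps the figure lighter to render.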

Is it possible to break x and y axis at the same time on lineplot?

I am working on drawing lineplots with matplotlib.
I checked several posts and understand how an axis break works in matplotlib (Break // in x axis of matplotlib).
However, I was wondering whether it is possible to break the x and y axes at the same time.
My current drawing looks like the one below.
As the graph shows, the x-axis range [2000, 5000] wastes a lot of space.
Because I have more data to draw after 7000, I want to save more space.
Is it possible to split the x-axis together with the y-axis?
Or is there another convenient way to hide a specific region of a line plot?
If another library enables this, I am willing to drop matplotlib and adopt it...
Maybe splitting the axis isn't your best choice. I would perhaps try inserting another smaller figure into the open space of your large figure using add_axes(). Here is a small example.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 5000, 1000) # create 1000 time stamps
data = 5*t*np.exp(-t/100) # and some fake data
fig, ax = plt.subplots()
ax.plot(t, data)
box = ax.get_position()
width = box.width*0.6
height = box.height*0.6
x = 0.35
y = 0.35
subax = fig.add_axes([x,y,width,height])
subax.plot(t, data)
subax.axis([0, np.max(t)/10, 0, np.max(data)*1.1])
plt.show()
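If a break in both axes is still wanted, one common matplotlib pattern is a 2x2 grid of subplots that share axes, with each panel showing one x range and one y range. The following is only a sketch of that pattern and is not part of the answer above; the fake data and the chosen ranges (x_limits, y_limits) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10000, 2000)
data = 5 * t * np.exp(-t / 100) + 0.02 * t  # fake data, for illustration only

x_limits = [(0, 2000), (7000, 10000)]  # left column, right column
y_limits = [(130, 210), (0, 60)]       # top row, bottom row

fig, axes = plt.subplots(2, 2, sharex="col", sharey="row",
                         gridspec_kw={"wspace": 0.05, "hspace": 0.05})

# Every panel plots the full curve; the limits decide which part is visible.
for row in range(2):
    for col in range(2):
        ax = axes[row, col]
        ax.plot(t, data)
        ax.set_xlim(*x_limits[col])
        ax.set_ylim(*y_limits[row])

# Hide the spines that face the breaks so the panels read as one plot.
for ax in axes[0, :]:
    ax.spines["bottom"].set_visible(False)
    ax.tick_params(bottom=False)
for ax in axes[1, :]:
    ax.spines["top"].set_visible(False)
for ax in axes[:, 0]:
    ax.spines["right"].set_visible(False)
for ax in axes[:, 1]:
    ax.spines["left"].set_visible(False)
    ax.tick_params(left=False)

fig.savefig("broken_axes_sketch.png")
```

The region between x = 2000 and x = 7000, and between y = 60 and y = 130, is simply never shown, which is the space saving the question asks for.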

Drawing a diagonal line of my matplotlib scatterplot?

I am trying to draw a diagonal line on my figure to demonstrate how my data compares to someone else's, so I want a line representing 1:1 relationship. I'm trying to use plt.plot to do the line between two points but there is no line on my plot. This is my code + the figure. Can anyone tell me why it is not working?
plot23 = sns.regplot(x = Combined['log10(L/L_solar)'], y = Combined['logLum'],
fit_reg=False).set_title('Figure 23: Comparing luminosities')
plt.xlabel('logL from my data', fontsize=13)
plt.ylabel('logL from Drout', fontsize=13)
plt.axis((4, 6, 4, 6))
plt.plot([0,0], [6,6], 'k-')
plt.show()
plot23.figure.savefig("figure23.png")
You made a mistake in using plt.plot. The syntax is plt.plot(xarray, yarray, ...), so [0,0], [6,6] draws from the point (0,6) to the same point (0,6), which has no length.
This should be:
plt.plot([0,6], [0,6], 'k-')
To draw an infinite line at a specified angle, e.g. 45° through the origin at (0,0), you can use
plt.axline((0, 0), slope=1, linestyle='--', color='red')
This line will span the full axes box regardless of the data extremes, and you don't have to determine the axis limits or data limits first.
This is a generalization of plt.axhline() and plt.axvline(), which are used to draw infinite length horizontal and vertical lines.
Ref: https://matplotlib.org/stable/gallery/pyplots/axline.html
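Putting the pieces together, a 1:1 reference line on a scatter plot might look like the sketch below. The random data and the variable names (my_log_l, their_log_l) are made up for illustration and stand in for the columns in the question; axline requires Matplotlib 3.3 or later.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the two luminosity columns in the question.
rng = np.random.default_rng(0)
my_log_l = rng.uniform(4.2, 5.8, size=50)
their_log_l = my_log_l + rng.normal(0, 0.1, size=50)

fig, ax = plt.subplots()
ax.scatter(my_log_l, their_log_l)

# A slope of 1 through (0, 0) is the 1:1 line; it is drawn across the whole
# axes box, whatever limits are set afterwards.
one_to_one = ax.axline((0, 0), slope=1, color="k", linestyle="--")

ax.axis((4, 6, 4, 6))
ax.set_xlabel("logL from my data")
ax.set_ylabel("logL from Drout")
fig.savefig("one_to_one_sketch.png")
```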

colorbars for grid of line (not contour) plots in matplotlib

I'm having trouble giving colorbars to a grid of line plots in Matplotlib.
I have a grid of plots, which each shows 64 lines. The lines depict the penalty value vs time when optimizing the same system under 64 different values of a certain hyperparameter h.
Since there are so many lines, instead of using a standard legend, I'd like to use a colorbar, and color the lines by the value of h. In other words, I'd like something that looks like this:
The above was done by adding a new axis to hold the colorbar, by calling figure.add_axes([0.95, 0.2, 0.02, 0.6]), passing in the axis position explicitly as parameters to that method. The colorbar was then created as in the example code here, by instantiating a ColorbarBase(). That's fine for single plots, but I'd like to make a grid of plots like the one above.
To do this, I tried doubling the number of subplots, and using every other subplot axis for the colorbar. Unfortunately, this led to the colorbars having the same size/shape as the plots:
Is there a way to shrink just the colorbar subplots in a grid of subplots like the 1x2 grid above?
Ideally, it'd be great if the colorbar just shared the same axis as the line plot it describes. I saw that the colorbar.colorbar() function has an ax parameter:
ax
parent axes object from which space for a new colorbar axes will be stolen.
That sounds great, except that colorbar.colorbar() requires you to pass in an imshow image or a ContourSet, but my plot is neither an image nor a contour plot. Can I achieve the same (axis-sharing) effect using ColorbarBase?
It turns out you can have different-shaped subplots, so long as all the plots in a given row have the same height, and all the plots in a given column have the same width.
You can do this using gridspec.GridSpec, as described in this answer.
So I set the columns with line plots to be 20x wider than the columns with color bars. The code looks like:
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm, colorbar, gridspec

# num_rows, num_columns, x_vec_lists, y_vec_lists and color_hyperparam_vecs
# are defined elsewhere; each color_hyperparam_vec is a NumPy array.
grid_spec = gridspec.GridSpec(num_rows,
                              num_columns * 2,
                              width_ratios=[20, 1] * num_columns)
colormap_type = cm.cool
for (x_vec_list,
     y_vec_list,
     color_hyperparam_vec,
     plot_index) in zip(x_vec_lists,
                        y_vec_lists,
                        color_hyperparam_vecs,
                        range(len(x_vec_lists))):
    # Even grid cells hold the line plots, odd cells hold their colorbars.
    line_axis = plt.subplot(grid_spec[plot_index * 2])
    colorbar_axis = plt.subplot(grid_spec[plot_index * 2 + 1])
    colormap_normalizer = mpl.colors.Normalize(vmin=color_hyperparam_vec.min(),
                                               vmax=color_hyperparam_vec.max())
    scalar_to_color_map = mpl.cm.ScalarMappable(norm=colormap_normalizer,
                                                cmap=colormap_type)
    colorbar.ColorbarBase(colorbar_axis,
                          cmap=colormap_type,
                          norm=colormap_normalizer)
    # Color each line by its hyperparameter value.
    for (line_index,
         x_vec,
         y_vec) in zip(range(len(x_vec_list)),
                       x_vec_list,
                       y_vec_list):
        hyperparam = color_hyperparam_vec[line_index]
        line_color = scalar_to_color_map.to_rgba(hyperparam)
        line_axis.plot(x_vec, y_vec, color=line_color, alpha=0.5)
For num_rows=1 and num_columns=1, this looks like:
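As an alternative to the GridSpec layout, recent Matplotlib versions let fig.colorbar take a bare ScalarMappable together with ax=, so the colorbar steals space from the line plot it describes without needing an image or a ContourSet. The sketch below assumes such a Matplotlib version; the hyperparameter values (h_values) and the exponential curves are fake data standing in for the penalty-vs-time lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize

h_values = np.linspace(0.1, 1.0, 64)  # hypothetical hyperparameter values
t = np.linspace(0, 10, 200)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
norm = Normalize(vmin=h_values.min(), vmax=h_values.max())

for ax in axes:
    # The mappable converts a value of h to a color and feeds the colorbar.
    mappable = cm.ScalarMappable(norm=norm, cmap="cool")
    for h in h_values:
        ax.plot(t, np.exp(-h * t), color=mappable.to_rgba(h), alpha=0.5)
    # ax= tells matplotlib which axes to shrink to make room for the colorbar.
    fig.colorbar(mappable, ax=ax, label="h")

fig.savefig("line_colorbar_sketch.png")
```

Each call to fig.colorbar adds one narrow axes next to its parent, so a 1x2 grid ends up with four axes in total.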

Matplotlib / Pandas histogram incorrect alignment

# A histogram
import numpy as np
import matplotlib.pyplot as plt

n = np.random.randn(100000)
fig, axes = plt.subplots(1, 2, figsize=(12,4))
axes[0].hist(n)
axes[0].set_title("Default histogram")
axes[0].set_xlim((min(n), max(n)))
axes[1].hist(n, cumulative=True, bins=50)
axes[1].set_title("Cumulative detailed histogram")
axes[1].set_xlim((min(n), max(n)));
This is from an IPython notebook (cell In[41]).
It seems that the histogram bars don't correctly align with the grids (see first subplot). That is the same problem I face in my own plots.
Can someone explain why?
Look at the align option of matplotlib's hist. You can align the bars 'left', 'mid', or 'right'. With the default ('mid'), each bar spans the edges of its bin, and those edges generally do not coincide with the tick and grid positions, which is why the bars look misaligned. This is spelled out in the matplotlib hist docs: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist
What if you have a Gaussian that spreads from -2647 to +1324? Do you expect to have 3971 bins? Maybe too many. 39? Then you are off by 0.71. What about 40? Off by 0.29.
The way a histogram works is that you can set the bins= parameter (number of bins, default 10). In the default histogram, the scale seems to go from around -4.5 to +4.5, a span of 9, which divided by 10 bins gives 0.9 per bin.
Also, when you make a histogram, it is not obvious how you want to bin things and represent them.
If you have a bin from 0 to 1, is it 0 < x <= 1 or 0 <= x < 1? If you have only integer values, I suspect you would also prefer bins to be centered on integer values, right?
So a histogram is a quick method that gives you insight into the data, but it does not prevent you from setting its parameters to represent the data the way you like.
This blog post has a nice demo of the effect of parameters in histogram plotting and explains some alternative methods of plotting.
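To make the point about choosing bins concrete, here is a small sketch, under assumed data, of two ways to make bars line up with the grid: pass explicit bin edges and put the ticks on those edges, or, for integer data, use half-integer edges so that each bar is centered on an integer. The names edges, rolls and int_edges are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = rng.standard_normal(100000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# 1) Explicit edges: 10 bins of width 1 from -5 to 5, with ticks on the edges.
edges = np.linspace(-5, 5, 11)
counts, returned_edges, _ = axes[0].hist(n, bins=edges)
axes[0].set_xticks(edges)
axes[0].grid(True)
axes[0].set_title("Bin edges on the grid")

# 2) Integer data: edges at 0.5, 1.5, ..., 6.5 center each bar on 1..6.
rolls = rng.integers(1, 7, size=1000)  # fake die rolls
int_edges = np.arange(0.5, 7.5, 1.0)
int_counts, _, _ = axes[1].hist(rolls, bins=int_edges)
axes[1].set_xticks(np.arange(1, 7))
axes[1].set_title("Bars centered on integers")

fig.savefig("aligned_histograms_sketch.png")
```

Values outside the explicit edges are not counted, so the range should be chosen to cover the data of interest.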