Matplotlib / Pandas histogram incorrect alignment - matplotlib

import numpy as np
import matplotlib.pyplot as plt
# A histogram
n = np.random.randn(100000)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(n)
axes[0].set_title("Default histogram")
axes[0].set_xlim((n.min(), n.max()))
axes[1].hist(n, cumulative=True, bins=50)
axes[1].set_title("Cumulative detailed histogram")
axes[1].set_xlim((n.min(), n.max()));
This is from an IPython notebook (cell In[41]).
The histogram bars don't seem to align with the grid lines (see the first subplot), and I face the same problem in my own plots.
Can someone explain why?

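Look at the align option of matplotlib's hist. It accepts 'left', 'mid' (the default) or 'right' and controls where each bar is drawn relative to its bin edges. With the default 'mid', each bar spans exactly from its left bin edge to its right bin edge; the bin edges themselves are computed from the data range, so they generally do not coincide with the grid lines. This is spelled out in the matplotlib hist docs: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist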
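A minimal sketch of what align changes, assuming a recent matplotlib (the random seed and bin count are arbitrary): hist returns the bin edges and the bar patches, so you can compare where the first bar starts with align='mid' and with align='left'.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = rng.standard_normal(100_000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# hist returns (counts, bin_edges, bar_patches)
_, edges_mid, bars_mid = axes[0].hist(n, bins=10, align="mid")
_, edges_left, bars_left = axes[1].hist(n, bins=10, align="left")

width = edges_mid[1] - edges_mid[0]
# 'mid': the first bar starts exactly at the first bin edge
print(bars_mid[0].get_x(), edges_mid[0])
# 'left': each bar is centered on its left edge, i.e. shifted half a bin to the left
print(bars_left[0].get_x(), edges_left[0] - width / 2)
```

The bin edges are identical in both panels; only the drawing position of the bars changes.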

What if you have a Gaussian that spreads from -2647 to +1324, a span of 3971? Do you expect 3971 bins? That is probably too many. With bins 100 wide you would need 39.71 bins: with 39 bins you are off by 0.71 of a bin, with 40 bins by 0.29.
Histograms work by letting you set the bins= parameter (the number of bins, 10 by default). The x-axes, which are set to the data range, seem to go from around -4.5 to +4.5, a span of 9; dividing that into 10 bins gives 0.9 per bin, so the bin edges land at arbitrary positions rather than on the grid lines.
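To see where the default edges come from, here is a sketch using np.histogram, which matplotlib's hist uses to compute its bins: with bins=10 the edges split [n.min(), n.max()] into 10 equal parts, so they rarely fall on round values. Passing explicit edges on a grid (a 0.5 step here, an arbitrary choice) makes the bars line up with grid lines at those values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = rng.standard_normal(100_000)

# Default-style binning: 10 equal-width bins from n.min() to n.max()
counts, edges = np.histogram(n, bins=10)
print(edges[0], edges[-1], edges[1] - edges[0])  # arbitrary-looking start, end and width

# Explicit edges on a 0.5 grid, wide enough to cover all the data
step = 0.5
lo = np.floor(n.min() / step) * step
hi = np.ceil(n.max() / step) * step
aligned_edges = np.arange(lo, hi + step, step)
aligned_counts, _ = np.histogram(n, bins=aligned_edges)

# The same edges can be passed straight to matplotlib: ax.hist(n, bins=aligned_edges)
```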
Also, when you make a histogram, it is not obvious how you want to bin things and represent them.
If you have a bin from 0 to 1, is it 0 < x <= 1 or 0 <= x < 1? If you have only integer values, you would probably also prefer the bins to be centered on the integers, right?
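Both questions can be answered in code. NumPy's convention (which matplotlib's hist inherits, since it bins with np.histogram) is half-open bins [a, b), except the last bin, which is closed. For integer data, putting the edges at half-integers centers each bar on an integer. A minimal sketch; the die-roll data are made up for illustration:

```python
import numpy as np

# numpy.histogram bins are half-open [a, b), except the last one, which is closed [a, b]
counts, _ = np.histogram([0, 1, 2], bins=[0, 1, 2])
print(counts)  # [1 2]: 0 falls in [0, 1); 1 and 2 both fall in [1, 2]

# Hypothetical integer data: die rolls 1..6
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=1000)

# Edges at half-integers, so each integer sits in the middle of its own bin
edges = np.arange(rolls.min() - 0.5, rolls.max() + 1.5)
roll_counts, _ = np.histogram(rolls, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2  # one center per integer value
# With matplotlib: ax.hist(rolls, bins=edges) draws one bar centered on each integer
```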
So a histogram is a quick way to get insight into the data, but nothing prevents you from setting its parameters to represent the data the way you like.
This blog post has a nice demo of the effect of the parameters on histogram plots and explains some alternative ways of plotting.

Related

Why the point size using sns.lmplot is different when I used plt.scatter?

I want to make a scatterplot of the x and y variables, where the point size depends on a numeric variable and the color of each point depends on a categorical variable.
First, I was trying this with plt.scatter:
Graph 1
Then I tried the same with lmplot, but the point sizes differ from the first graph.
I think the two graphs should be equal. Why aren't they?
The point size is different in each graph.
Graph 2
Your question is not very descriptive, but I guess you want to control the size of the markers. Here is more documentation.
Here is a starting point for you.
A numeric variable can also be assigned to size to apply a semantic mapping to the areas of the points:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", size="size")
For seaborn scatterplot:
df = sns.load_dataset("anscombe")
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df)
And to change the size of the points you use the s parameter.
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df, s=100)
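A likely reason for the mismatch: plt.scatter uses s as the marker area in points² exactly as given, whereas seaborn maps a numeric size variable into a range of areas (the sizes=(min, max) argument of sns.scatterplot), so the same column produces different marker sizes in the two libraries. A minimal sketch with synthetic data (the variables and the 20 to 200 range are assumptions) that does the rescaling by hand with matplotlib, one scatter call per category:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)
value = rng.uniform(1, 10, size=50)           # hypothetical numeric size variable
group = rng.choice(["a", "b", "c"], size=50)  # hypothetical categorical variable

def rescale(v, lo=20.0, hi=200.0):
    """Linearly map v onto [lo, hi] marker areas (points**2)."""
    return lo + (v - v.min()) / (v.max() - v.min()) * (hi - lo)

marker_area = rescale(value)
colors = {"a": "tab:blue", "b": "tab:orange", "c": "tab:green"}

fig, ax = plt.subplots()
for name, color in colors.items():
    mask = group == name
    ax.scatter(x[mask], y[mask], s=marker_area[mask], c=color, label=name)
ax.legend()
```

If the same (lo, hi) pair is passed to seaborn as sizes=(20, 200), the two plots should use a comparable size range.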

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see the right image); however, for some the x axis collapses slightly (see the left image) or strongly (see the middle image).
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn draws a box/violin plot (https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py, line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing in code only?
I do not want to modify my dataframe (replace the NaNs with zero).
Below you find my code:
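import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# pf_in / pf_out: paths to the input TSV file and the output image
boxp_df = pd.read_csv(pf_in, sep="\t", skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)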
The output is a plot whose size differs per cancer type
(depending on whether any category is completely NaN).
I expect every plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps:
| Cat1 | Cat2 | Cat3 | Cat4 | Cat5  |
|------|------|------|------|-------|
| 3.93 |      | 0.52 |      | 6.01  |
| 3.34 |      | 0.89 |      | 2.89  |
| 3.39 |      | 1.96 |      | 4.63  |
| 1.59 |      | 3.66 |      | 3.75  |
| 2.73 |      | 0.39 |      | 2.87  |
| 0.08 |      | 1.25 |      | -0.27 |
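The toy data can be loaded into pandas to confirm which categories are entirely NaN, the situation the question first suspected. A minimal sketch, assuming the data are tab-separated like the original file (sep="\t"):

```python
import io

import pandas as pd

# The toy table as tab-separated text; Cat2 and Cat4 have no values
toy_tsv = (
    "Cat1\tCat2\tCat3\tCat4\tCat5\n"
    "3.93\t\t0.52\t\t6.01\n"
    "3.34\t\t0.89\t\t2.89\n"
    "3.39\t\t1.96\t\t4.63\n"
    "1.59\t\t3.66\t\t3.75\n"
    "2.73\t\t0.39\t\t2.87\n"
    "0.08\t\t1.25\t\t-0.27\n"
)
boxp_df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Columns in which every value is NaN (empty fields are read as NaN)
empty_cats = boxp_df.columns[boxp_df.isna().all()].tolist()
print(empty_cats)  # ['Cat2', 'Cat4']
```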
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
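Since the update traces the collapse to the long title and the code calls plt.tight_layout() (which resizes the axes so that titles and labels fit), one possible workaround, not taken from the thread and untested on the original data, is to wrap the title with textwrap.fill and use fixed margins via fig.subplots_adjust, so every figure gets the same axes box. The helper name, margins and wrap width are assumptions; random data stand in for the real categories.

```python
import textwrap

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

def fixed_width_plot(data, title, wrap_at=40):
    """Hypothetical helper: violin plot whose axes width does not depend on the title."""
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.violinplot(data)
    # Break a long title over several lines instead of letting it widen the layout
    ax.set_title(textwrap.fill(title, wrap_at), fontsize=20)
    ax.set_ylabel("label")
    # Fixed margins instead of plt.tight_layout(): the same axes box for every figure
    fig.subplots_adjust(left=0.1, right=0.95, bottom=0.15, top=0.85)
    return fig, ax

rng = np.random.default_rng(0)
data = [rng.normal(size=20) for _ in range(5)]
fig_short, ax_short = fixed_width_plot(data, "Short")
fig_long, ax_long = fixed_width_plot(data, "A VERY LONG TITLE " * 6)
```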
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
For instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

How can I set the number of ticks in Julia using Pyplot?

I am struggling to 'translate' the instructions I find for Python to the use of Pyplot in Julia. This must be a simple question, but do you know how to set the number of ticks in a plot in Julia using Pyplot?
If you have
x = [1,2,3,4,5]
y = [1,3,6,8,11]
you can
PyPlot.plot(x,y)
which draws the plot
and then do
PyPlot.xticks([1,3,5])
for ticks at 1, 3 and 5 on the x-axis
PyPlot.yticks([1,6,11])
for ticks at 1, 6 and 11 on the y-axis
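Tick spacing
If you want, for example, 4 evenly spaced ticks and don't mind floats, you can do
collect(range(x[1], stop=x[end], length=4))
If you need the ticks to be integers, use an integer step; for about 4 ticks you can do
collect(x[1]:max(1, div(x[end] - x[1], 3)):x[end])
(the count can differ slightly from 4 when the span x[end] - x[1] is not a multiple of 3).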
Edit
Maybe this doesn't belong here, but at least you'll see it:
whenever you're looking for a method that's supposed to be in a module X, you can list the candidates by typing X. in the REPL and pressing the TAB key.
To clarify: if you want to search a module for a method you suspect starts with an x, like xticks, type the following in the REPL (terminal/shell)
PyPlot.x
then press TAB twice and you'll see
julia> PyPlot.x
xkcd xlabel xlim xscale xticks
If you're not sure exactly how the method works, e.g. what its arguments are, and there isn't any help available, you can call
methods(PyPlot.xticks)
to see every "version" that method has
Bonus
The module for all the standard methods, like maximum, vcat, etc., is Base.
After some trying and searching, I found a way to do it. One can just set the number of bins that should be on each axis. Here is a minimal example:
using PyPlot
x = range(0, stop=10, length=200)
y = sin.(x)
fig, ax = subplots()
ax.plot(x, y, "r-", linewidth=2, label="sine function", alpha=0.6)
ax.legend(loc="upper center")
ax.locator_params(axis="y", nbins=4)
The last line limits the y-axis to at most 4 tick intervals (nbins is passed on to matplotlib's MaxNLocator and is an upper bound, so you may get fewer). Leaving out the axis argument applies the same setting to both axes.
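Since the question is about translating Python instructions, here is, for comparison, a sketch of the same tick settings in Python's matplotlib, which PyPlot.jl wraps. The data are the x and y from the first answer; the two-panel layout is just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [1, 3, 6, 8, 11]

fig, (ax1, ax2) = plt.subplots(1, 2)

# Explicit tick positions, like PyPlot.xticks([1,3,5]) in Julia
ax1.plot(x, y)
ax1.set_xticks([1, 3, 5])
# 4 evenly spaced (float) ticks between the first and last y value
ax1.set_yticks(np.linspace(y[0], y[-1], 4))

# Let matplotlib choose "nice" ticks, with at most 4 intervals on the y-axis,
# the same call as locator_params in the last answer
ax2.plot(x, y)
ax2.locator_params(axis="y", nbins=4)
```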

matplotlib/pyplot: print only ticks once in scatter plot?

I am looking for a way to clean-up the ticks in my pyplot scatter plot.
To create a scatter plot from a Pandas dataset column with strings as elements, I followed the example in [2], which gave me a nice scatter plot:
The input is 10k data points where the x axis has only ~200 unique 'names', which were mapped to scalars for plotting. Obviously, plotting all 10k ticks on the x axis is rather cluttered. So I am looking for a way to print each unique tick only once rather than once per data point.
My code looks like:
import matplotlib.pyplot as plt
import numpy
fig2 = plt.figure()
WNsUniques, WNs = numpy.unique(taskDataFrame['modificationhost'], return_inverse=True)
scatterWNs = fig2.add_subplot(111)
scatterWNs.scatter(WNs, taskDataFrame['cpuconsumptiontime'])
scatterWNs.set(xticks=range(len(WNsUniques)), xticklabels=WNsUniques)
plt.xticks(rotation='vertical')
plt.savefig("%s_WNs-CPUTime_scatter.%s" % (dfName,"pdf"))
Actually, I was hoping that setting the x ticks to the unique names would be sufficient, but apparently it is not. It is probably something easy, but how do I reduce the ticks of my subplot to the unique ones (shouldn't they already be unique, as returned by numpy.unique?)
Maybe someone has an idea for me?
Cheers and thanks,
Thomas
You can use the set_xticks method to accomplish this. Note that 200 axis ticks with labels are still quite a lot to force on a small plot like this, and this is what you might already be seeing with the above code. Without complete code to play with, I can't say for sure.
Additionally, what is the size of WNsUniques? That can easily be used to check if your call to unique is doing what you think.
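A minimal sketch with synthetic data (the host names and CPU times are made up, standing in for taskDataFrame's columns): np.unique(..., return_inverse=True) maps the strings to integer codes, printing len(uniques) checks the unique count as suggested, and set_xticks labels only every 10th unique name so the axis stays readable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for taskDataFrame['modificationhost'] and ['cpuconsumptiontime']
hosts = rng.choice([f"wn{i:03d}" for i in range(200)], size=10_000)
cpu = rng.exponential(100.0, size=10_000)

# uniques is sorted; codes[i] is the index of hosts[i] in uniques
uniques, codes = np.unique(hosts, return_inverse=True)
print(len(uniques))  # number of distinct names, at most 200 here

fig, ax = plt.subplots(figsize=(12, 4))
ax.scatter(codes, cpu, s=2)

step = 10  # label every 10th host only
positions = np.arange(0, len(uniques), step)
ax.set_xticks(positions)
ax.set_xticklabels(uniques[positions], rotation="vertical")
```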

Change the labels of a colorbar from increasing to decreasing values

I'd like to change the labels for the colorbar from increasing to decreasing values. When I try to do this via vmin and vmax I get the error message:
minvalue must be less than or equal to maxvalue
So, for example, I'd like the colorbar to start at 20 on the left and go down to 15 on the right.
This is my code for the colorbar so far, but in this example the values go from 15 to 20 and I'd like to reverse that order:
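import matplotlib as mpl
import matplotlib.colorbar  # makes mpl.colorbar available
# colorbar1 is the Axes the colorbar is drawn into
cmap1 = mpl.cm.YlOrBr_r
norm1 = mpl.colors.Normalize(15, 20)
cb1 = mpl.colorbar.ColorbarBase(colorbar1, cmap=cmap1, norm=norm1, orientation='horizontal')
cb1.set_label('magnitude')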
The colorbars displayed below are probably not exactly like yours, as they are just example colorbars to function as a proof of concept.
In the following I assume you have a colorbar similar to this, with increasing values to the right:
Method 1: Inverting the x-axis
Inverts the whole x-axis of the colorbar
If you want to invert the x-axis, meaning that the values on the x-axis are descending to the right, making the colorbar "mirrored", you can make use of the ColorbarBase's ax attribute:
cb1 = mpl.colorbar.ColorbarBase(colorbar1,
                                cmap=cmap1,
                                norm=norm1,
                                orientation='horizontal')
cb1.ax.invert_xaxis()
This gives the output below.
It is also possible to change the number of tick labels by setting the colorbar's locator. Here a MultipleLocator is used, although you can use many other locators as well.
from matplotlib.ticker import MultipleLocator
cb1.locator = MultipleLocator(1) # Show ticks only for each multiple of 1
cb1.update_ticks()
cb1.ax.invert_xaxis()
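A self-contained version of Method 1 with the MultipleLocator, assuming matplotlib 3.5 or later: the asker's colorbar1 axes is replaced by a hypothetical cax created with fig.add_axes, and the colorbar is built with fig.colorbar from a ScalarMappable instead of ColorbarBase.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

cmap1 = plt.get_cmap("YlOrBr_r")
norm1 = mpl.colors.Normalize(vmin=15, vmax=20)

fig = plt.figure(figsize=(6, 1.5))
cax = fig.add_axes([0.05, 0.5, 0.9, 0.25])  # hypothetical stand-in for colorbar1

cb1 = fig.colorbar(mpl.cm.ScalarMappable(norm=norm1, cmap=cmap1),
                   cax=cax, orientation="horizontal")
cb1.set_label("magnitude")
# In recent matplotlib, setting the locator updates the long (x) axis directly
cb1.locator = MultipleLocator(1)  # one tick per integer value
cb1.ax.invert_xaxis()             # 20 on the left, 15 on the right
```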
Method 2: Using custom ticklabels
Reverses the order of the ticklabels, keeping the orientation of the colorbar
If you want the orientation of the colorbar itself as it is, and only reverse the order in which the ticklabels appear, you can use the set_ticks and set_ticklabels methods. This is more of a "brute force" approach than the previous solution.
cb1.set_ticks(np.arange(15, 21))
cb1.set_ticklabels(np.arange(20, 14, -1))
This gives the colorbar seen below. Note that the colors are kept intact, only the tick locations and ticklabels have changed.
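An alternative way to produce the colorbar in Method 2 is to create the colorbar with the reversed colormap and then invert its long axis, which for a horizontal colorbar is the x-axis:
cmap1 = cmap1.reversed()
cb1.ax.invert_xaxis()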
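A self-contained sketch of the reversed-colormap approach for the horizontal colorbar in the question, assuming a recent matplotlib; cax is a hypothetical stand-in for colorbar1. Reversing the colormap and then inverting the x-axis puts each color back where it was, while the tick labels now run from 20 down to 15.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

cmap1 = plt.get_cmap("YlOrBr_r")
norm1 = mpl.colors.Normalize(vmin=15, vmax=20)

fig = plt.figure(figsize=(6, 1.5))
cax = fig.add_axes([0.05, 0.5, 0.9, 0.25])  # hypothetical stand-in for colorbar1

# Reverse the colormap *before* building the colorbar ...
cmap_rev = cmap1.reversed()
cb1 = fig.colorbar(mpl.cm.ScalarMappable(norm=norm1, cmap=cmap_rev),
                   cax=cax, orientation="horizontal")
# ... then flip the long (x) axis so that 20 is on the left
cb1.ax.invert_xaxis()

# Value 20 (now on the left) gets the color the original map gave to 15
print(np.allclose(cmap_rev(norm1(20)), cmap1(norm1(15))))
```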
Works for me (for a vertical colorbar, whose values run along the y-axis): variable_you_want.ax.invert_yaxis()