Matplotlib value on top left and remove it - matplotlib

I have this array with 10 values.
I understand that the values in my array have many digits after the decimal point.
But I notice there is a value in the top left corner of the plot.
Does anyone know what it is and how to remove it?
Thank you in advance.
The array:
0.00409960926442099
0.00409960926442083
0.004099609264420652
0.004099609264420653
0.004099609264420585
0.0040996092644205884
0.004099609264420545
0.004099609264420517
0.004099609264420514
0.004099609264420513

As your values are all very close together, the usual tick labels would all be identical. For example, if you use '%.6f' as the tick format, you'd get '0.004100' for each of the ticks. That would not be very helpful. Therefore, matplotlib labels the yticks relative to an additive offset '4.099609264420e-3' combined with a scale factor '1e-16', and shows both in the top left corner. So, the real value at each ytick is the offset plus the scale factor times the tick label.
To get rid of these strange numbers, you have to re-evaluate what exactly you want to achieve with your plot. If you set wider y-limits (e.g. plt.ylim(0.004099, 0.004100)), you'd get a rather dull horizontal line. Note that differences on the order of 1e-16 in values of this size are close to the precision limit of standard double-precision floating-point math, which carries roughly 15 to 17 significant decimal digits.
Here is some demo code to show how it would look with the '%.6f' format:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.plot([0.00409960926442099, 0.00409960926442083, 0.004099609264420652, 0.004099609264420653,
0.004099609264420585, 0.0040996092644205884, 0.004099609264420545, 0.004099609264420517,
0.004099609264420514, 0.004099609264420513])
plt.gca().yaxis.set_major_formatter(mtick.FormatStrFormatter('%.6f'))
plt.tight_layout()
plt.show()
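If the tiny differences themselves are what the plot should show, one option is to plot them directly instead of the raw values. The following is only a sketch and not part of the answer above: it subtracts the smallest value and rescales by 1e-16 (a unit picked by hand for this particular data), so that the tick labels become ordinary numbers and no text is needed in the corner.

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.array([
    0.00409960926442099, 0.00409960926442083, 0.004099609264420652,
    0.004099609264420653, 0.004099609264420585, 0.0040996092644205884,
    0.004099609264420545, 0.004099609264420517, 0.004099609264420514,
    0.004099609264420513,
])

# Assumption: 1e-16 is a hand-picked unit that happens to suit this data.
unit = 1e-16
# Deviation from the smallest value, expressed in multiples of `unit`.
scaled = (values - values.min()) / unit

fig, ax = plt.subplots()
ax.plot(scaled)
# State the reference value and the unit in the axis label instead.
ax.set_ylabel("(value - %.16g) / %g" % (values.min(), unit))
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.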

Related

change scientific notation abbreviation of y axis units to a string

First, I would like to apologize, as I know I am not asking this question correctly (which is why I can't find what is likely a simple answer).
I have a graph
As you can see, above the y axis it says 1e11, meaning that the units are in hundreds of billions. I would like to change the graph to read 100 Billion instead of 1e11.
I am not sure what such a notation is called.
To be clear, I am not asking to change the whole y axis to number values as in other questions; I only want to change the 1e11 at the top to be more readable to those who are less mathematical.
ax.get_yaxis().get_major_formatter().set_scientific(False)
gives an undesired result.
import numpy as np
from matplotlib.ticker import FuncFormatter
def billions(x, pos):
    # Format the tick value x in billions, e.g. 2.5e9 -> '$2.5B'
    return '$%1.1fB' % (x*1e-9)
formatter = FuncFormatter(billions)
ax.yaxis.set_major_formatter(formatter)
(taken from https://matplotlib.org/examples/pylab_examples/custom_ticker1.html)
produces
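One possible approach, sketched here under assumptions, is to divide the tick values by 1e11 inside a FuncFormatter and to write the unit as plain text above the axis. The data and the helper name in_hundred_billions are made up for illustration, and placing the text with annotate is just one choice among several.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Made-up data in the range of hundreds of billions.
y = np.linspace(0.5e11, 4e11, 8)

def in_hundred_billions(value, pos):
    """Hypothetical helper: show a tick value in units of 1e11."""
    return f"{value / 1e11:g}"

fig, ax = plt.subplots()
ax.plot(y)
# A FuncFormatter supplies no offset string, so "1e11" is no longer drawn.
ax.yaxis.set_major_formatter(FuncFormatter(in_hundred_billions))
# Write the unit in words just above the top left corner of the axes.
ax.annotate("100 Billion", xy=(0, 1), xycoords="axes fraction",
            xytext=(0, 4), textcoords="offset points",
            ha="left", va="bottom")
```

The tick labels then read 0.5, 1, 1.5 and so on, with the unit spelled out where the 1e11 used to be.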

Adding descriptive stats to this plot

In pandas/seaborn:
sns.distplot(combo['resubmits'], kde=False, bins=8)
plt.savefig("g1.png")
Makes a very pretty histogram. I want to include a textual "legend" showing the mean, stdev, n, etc as numbers in a box. You would think this is so common that there's a semi automatic way to do it but I can't find it.
There is a feature request for that.
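However, note that using matplotlib.pyplot.axvline, you can easily mark the mean yourself for now.
from matplotlib import pyplot as plt
plt.axvline(x)
where x=combo['resubmits'].mean(). By default the line spans the full height of the axes. The optional ymin and ymax arguments of axvline are fractions of the axes height (between 0 and 1), not data values, so the maximal bin count should not be passed there.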
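For the textual box itself, a minimal sketch using plain matplotlib could look like the following. The random data merely stands in for combo['resubmits'], and the position, number format and box style are arbitrary choices, not something the thread prescribes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data standing in for combo['resubmits'] (an assumption).
rng = np.random.default_rng(0)
resubmits = rng.poisson(3, size=200)

fig, ax = plt.subplots()
ax.hist(resubmits, bins=8)

mean = resubmits.mean()
# Dashed vertical line at the mean, spanning the full axes height.
ax.axvline(mean, color="k", linestyle="--")

# Summary statistics as a multi-line string.
stats = (f"n = {resubmits.size}\n"
         f"mean = {mean:.2f}\n"
         f"std = {resubmits.std(ddof=1):.2f}")
# transform=ax.transAxes places the text in axes coordinates (0..1),
# so the box stays in the top right corner whatever the data range is.
ax.text(0.97, 0.97, stats, transform=ax.transAxes,
        ha="right", va="top",
        bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))
```

Since sns.distplot returns a matplotlib Axes, the same ax.text call should also work on the axes of the seaborn plot.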

Better ticks and tick labels with log scale

I am trying to get better looking log-log plots and I almost got what I want except for a minor problem.
The reason my example throws off the standard settings is that the x values are confined within less than one decade and I want to use decimal, not scientific notation.
Allow me to illustrate with an example:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib as mpl
import numpy as np
x = np.array([0.6,0.83,1.1,1.8,2])
y = np.array([1e-5,1e-4,1e-3,1e-2,0.1])
fig1,ax = plt.subplots()
ax.plot(x,y)
ax.set_xscale('log')
ax.set_yscale('log')
which produces:
There are two problems with the x axis:
The use of scientific notation, which in this case is counterproductive
The horrible "offset" at the lower right corner
After much reading, I added three lines of code:
ax.xaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
ax.xaxis.set_minor_formatter(mpl.ticker.ScalarFormatter())
ax.ticklabel_format(style='plain',axis='x',useOffset=False)
This produces:
My understanding of this is that there are 5 minor ticks and 1 major one. It is much better, but still not perfect:
I would like some additional ticks between 1 and 2
Formatting of label at 1 is wrong. It should be "1.0"
So I inserted the following line before the formatter statement:
ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(0.2))
I finally get the ticks I want:
I now have 8 major and 2 minor ticks. Now, this almost looks right except for the fact that the tick labels at 0.6, 0.8 and 2.0 appear bolder than the others. What is the reason for this and how can I correct it?
The reason some of the labels appear bold is that they are drawn twice, once as a major and once as a minor ticklabel. If two texts perfectly overlap, they appear bolder due to antialiasing.
You may decide to use only minor ticklabels and suppress the major ones with a NullLocator.
Since the locations of the ticklabels you wish to have are really specific, there is no automatic locator that would provide them out of the box. For this special case it may be easiest to use a FixedLocator and specify the locations you wish to have as a list.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
x = np.array([0.6,0.83,1.1,1.8,2])
y = np.array([1e-5,1e-4,1e-3,1e-2,0.1])
fig1,ax = plt.subplots(dpi=72, figsize=(6,4))
ax.plot(x,y)
ax.set_xscale('log')
ax.set_yscale('log')
locs = np.append( np.arange(0.1,1,0.1),np.arange(1,10,0.2))
ax.xaxis.set_minor_locator(ticker.FixedLocator(locs))
ax.xaxis.set_major_locator(ticker.NullLocator())
ax.xaxis.set_minor_formatter(ticker.ScalarFormatter())
plt.show()
For a more generic labeling, one could of course subclass a locator, but we would then need to know the logic to use to determine the ticklabels. (As I do not see a well defined logic for the desired ticks from the question, I feel it would be wasted effort to provide such a solution for now.)
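The question also asks for the label at 1 to read "1.0". As a hedged variation on the code above, the sketch below uses only major ticks at hand-picked positions (the list locs is an assumption about what is wanted) and a FormatStrFormatter with '%.1f', so every label shows exactly one decimal.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

x = np.array([0.6, 0.83, 1.1, 1.8, 2])
y = np.array([1e-5, 1e-4, 1e-3, 1e-2, 0.1])

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xscale("log")
ax.set_yscale("log")

# Hand-picked tick positions (an assumption, chosen for this data range).
locs = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
# Set locators and formatter after set_xscale, which resets them.
ax.xaxis.set_major_locator(ticker.FixedLocator(locs))
ax.xaxis.set_minor_locator(ticker.NullLocator())
# '%.1f' always prints one decimal, so 1 is shown as "1.0".
fmt = ticker.FormatStrFormatter("%.1f")
ax.xaxis.set_major_formatter(fmt)
```

Because there are no minor ticklabels at all here, no two labels can overlap and none appears bold.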

how to shift x axis labels on line plot?

I'm using pandas to work with a data set and am trying to use a simple line plot with error bars to show the end results. It's all working great except that the plot looks funny.
By default, it will put my 2 data groups at the far left and right of the plot, which obscures the error bars to the point that they are not useful (the error bars in this case are key to interpretation, so I want them plainly visible).
Now, I fix that problem by setting xlim to open up some space on either end of the x axis so that the error bars are plainly visible, but then I have an offset from where the x labels are to where the actual x data is.
Here is a simplified example that shows the problem:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df6 = pd.DataFrame( [-0.07,0.08] , index = ['A','B'])
df6.plot(kind='line', linewidth=2, yerr = [ [0.1,0.1],[0.1,0.1 ] ], elinewidth=2,ecolor='green')
plt.xlim(-0.2,1.2) # Make some room at ends to see error bars
plt.show()
I tried to include a plot (image) showing the problem, but I cannot post images yet: I have just joined and do not have enough points.
What I want to know is: How do I shift these labels over one tick to the right?
Thanks in advance.
Well, it turns out I found a solution, which I will just post here in case anyone else has this same issue in the future.
Basically, it all seems to work better in the case of a line plot if you just specify both the labels and the ticks in the same place at the same time. At least that was helpful for me. It sort of forces you to keep the length of those two lists the same, which seems to make the assignment between ticks and labels better behaved (simple 1:1 in this case).
So I could fix my problem by including something like this:
plt.xticks([0, 1], ['A','B'] )
right after the xlim statement in the code from the original question. Now the A and B align perfectly with the place where the data is plotted, not offset from it.
Using the above solution works, but looks less good since the x grid is now very coarse (this is purely an aesthetic consideration). I could fix that by using a different xticks statement, with one label per tick, like:
plt.xticks([-0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2], ['','A','','','','','B',''])
This gives me a nice-looking grid and the data where I need it, but of course is very contrived-looking here. In the actual program I'd find a way to make that less clunky.
Hope that is of some help to fellow seekers....
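A less clunky variant, offered only as a sketch, is to keep the two labeled ticks as major ticks and let unlabeled minor ticks provide the finer grid. It uses plain matplotlib's errorbar with the numbers from the question instead of the pandas plot call, purely to keep the example short; the 0.2 spacing is taken from the xticks list above.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
# Same numbers as in the question, drawn with plain matplotlib.
ax.errorbar([0, 1], [-0.07, 0.08], yerr=0.1,
            linewidth=2, elinewidth=2, ecolor="green")
ax.set_xlim(-0.2, 1.2)  # room at both ends so the error bars are visible

# Major ticks sit exactly on the data and carry the labels ...
ax.set_xticks([0, 1])
ax.set_xticklabels(["A", "B"])
# ... while unlabeled minor ticks every 0.2 provide the finer grid.
ax.xaxis.set_minor_locator(MultipleLocator(0.2))
ax.grid(which="both", axis="x")
```

This avoids maintaining a list of empty strings that has to match the tick list in length.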

hist() - how to force equal bins width?

Assuming I have the following array: [1,1,1,2,2,40,60,70,75,80,85,87,95] and I want to create a histogram out of it based on the following bins - x<=2, [3<=x<=80], [x>=81].
If I do the following: arr.hist(bins=(0,2,80,100)) I get the bins to be at different widths (based on their x range). I want them to represent different size ranges but appear in the histogram at the same width. Is it possible in an elegant way?
I can think of adding a new column for this (holding the bin id that will be calculated based on the boundaries I want) but don't really like this solution..
Thanks!
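Sounds like you want a bar graph; you could use bar:
import numpy as np
import matplotlib.pyplot as plt
arr=np.array([1,1,1,2,2,40,60,70,75,80,85,87,95])
# np.histogram bins are half-open [a, b), except the last one, which is closed;
# for integer data the edges (0,3,81,100) give x<=2, 3<=x<=80 and x>=81
h=np.histogram(arr,bins=(0,3,81,100))
# bars are centered on 0, 1, 2 by default, so the labels go there too
plt.bar(range(3),h[0],width=1)
xlab=['x<=2', '3<=x<=80', 'x>=81']
plt.xticks(range(3),xlab)
plt.show()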