Forcing exact boundaries in matplotlib's histogram - matplotlib

If I feed a collection of values to matplotlib's histogram, I get a min and max boundary that has been rounded to a nice number.
E.g. if I feed values between -1 and 18.4, my histogram axis will run from -10 to 50.
Is there a way to force the axis to be a perfect fit for the data, without padding?

Use the range keyword argument to set the range over which the histogram bins are created, and use xlim to control the exact range displayed on the x-axis.
import numpy as np
from pylab import hist, xlim
# Bin only the values between 5 and 90, then clip the x-axis to the same span
hist(np.random.random(100) * 100, range=(5, 90))
xlim(5, 90)
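To fit the axis to the data itself rather than to hard-coded numbers, the same two settings can be driven by the data's minimum and maximum. This is a minimal sketch, assuming made-up values between -1 and 18.4 as in the question; the variable names and the bin count are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(-1, 18.4, size=200)  # made-up data, roughly -1 to 18.4

fig, ax = plt.subplots()
lo, hi = values.min(), values.max()

# range=(lo, hi) makes the outer bin edges coincide with the data extremes
counts, edges, _ = ax.hist(values, bins=10, range=(lo, hi))

# set_xlim removes the padding the axis would otherwise add around the bars
ax.set_xlim(lo, hi)
```

The first and last entries of edges then equal the smallest and largest value, and the visible axis spans exactly that interval.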

Related

What is the x and y axis showing for this matplotlib.pyplot histogram of a .wav file?

The code below generates a histogram from a .wav file, but what exactly does the histogram show? Is the x axis amplitude binned by sampling rate? Is the y axis a count of how many samples are in each amplitude bin? And how is amplitude calculated?
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
Fs, data = wavfile.read('audio file')  # Fs: sampling rate in Hz, data: array of sample values
plt.hist(data, bins='auto')
plt.show()
When you plot the histogram, the x-axis shows bins of the sample values (the amplitudes get quantized into ranges), and the y-axis shows the counts, meaning the number of samples whose value falls in each bin's range. Each bin covers a different range of amplitudes, and the histogram counts up the number of samples in that range. The sampling rate Fs plays no role in the binning.
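To make that reading concrete, here is a small sketch that uses a synthetic sine wave in place of the .wav file (an assumption, since the original file is not available) and calls np.histogram, which computes the same counts and bin edges that plt.hist draws.

```python
import numpy as np

fs = 8000  # assumed sampling rate in Hz; it only fixes how many samples exist
t = np.arange(fs) / fs  # one second of sample times

# Stand-in for the array returned by wavfile.read: 16-bit integer amplitudes
data = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# counts[i] is the number of samples whose amplitude lies between
# edges[i] and edges[i + 1]; these are the bar heights and bar positions
counts, edges = np.histogram(data, bins="auto")

print("amplitude range covered:", edges[0], "to", edges[-1])
print("total samples counted:", counts.sum())
```

The bin edges run from the smallest to the largest amplitude in the signal, and the counts add up to the number of samples.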
According to this excellent article:
A histogram basically depicts an estimate of the probability
distribution of some variable. To construct a histogram, the range of
possible variable values gets divided into a series of intervals
called bins. The bins must be adjacent to each other and are often
(but not necessarily) of equal width. Then a count of how many values fall
into each interval determines the height of each bin such that the
height is proportional to the number of cases in each bin. A histogram
may also be normalized to display “relative” frequencies. It then
shows the proportion of cases that fall into each of several
categories, with the sum of the heights equaling one.
You can find more detailed information in the matplotlib documentation.
If you still have doubts, I found a very good video series on YouTube by Valerio Velardo, and all the slides and code are available here.

Using matplotlib to plot a matrix with the third variable as source for a color map

Say you have the matrix given by three arrays, being:
x = 1-D array of length N.
y = 1-D array of length M.
And z is a set of "somewhat random" values from -0.3 to 0.3 in a NxM shape. I need to create a plot in which the x values are in the x-axis, y values are in the y-axis and using z as the source to indicate the intensity of each pixel with a color map.
So far, I have tried using
plt.contourf(x,y,z)
and the resulting plot is very nice for me (attached at the end of this paragraph), but a smoothing is automatically applied to the plot! I need to be able to distinguish the pixels and I cannot find a way to do it.
contourf result
I have also studied the possibility of using
ax.matshow(z)
in order to successfully see the pixels... but then I am struggling to customize the x and y axes, since only the index of each pixel is shown (see below).
matshow result
Would you please give me some ideas? Thank you.
Without more information on your x,y data it's hard to know, but I would guess you are looking for pcolormesh.
plt.pcolormesh(x,y,z)
This takes the x and y data as input and hence shows the z data at the appropriate coordinates.
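A minimal sketch of the pcolormesh approach, assuming small made-up x and y arrays and random z values in the stated -0.3 to 0.3 range. Note that pcolormesh expects z with shape (len(y), len(x)), so an N x M array as described in the question would need to be transposed first. The shading="nearest" option (available in matplotlib 3.3 and later) centres each cell on its coordinate.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 6)    # hypothetical x coordinates, N = 6
y = np.linspace(10.0, 13.0, 4)  # hypothetical y coordinates, M = 4

# Rows follow y and columns follow x, i.e. shape (M, N)
z = rng.uniform(-0.3, 0.3, size=(y.size, x.size))

fig, ax = plt.subplots()
# One flat-coloured cell per z value, no smoothing between neighbours
mesh = ax.pcolormesh(x, y, z, shading="nearest", vmin=-0.3, vmax=0.3)
fig.colorbar(mesh, ax=ax)
```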
You can use imshow with the keyword interpolation='nearest'.
plt.imshow(z, interpolation='nearest')
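On its own, imshow labels the axes with pixel indices, which is the same problem the question had with matshow. One way around that, sketched here under the assumption that x and y are evenly spaced, is the extent argument. The arrays below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 6)    # hypothetical, evenly spaced x coordinates
y = np.linspace(10.0, 13.0, 4)  # hypothetical, evenly spaced y coordinates
z = rng.uniform(-0.3, 0.3, size=(y.size, x.size))

# extent describes the outer edges of the image, so pad by half a step
# to centre every pixel on its (x, y) coordinate
dx = x[1] - x[0]
dy = y[1] - y[0]
extent = (x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2)

fig, ax = plt.subplots()
image = ax.imshow(z, interpolation="nearest", origin="lower",
                  extent=extent, aspect="auto")
fig.colorbar(image, ax=ax)
```

If the coordinates are not evenly spaced, extent cannot represent them and pcolormesh is the better fit.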

How do you set the color range for a patch object in matplotlib?

cmap = plt.get_cmap('jet')
I'm using this in a loop to iterate over a data frame and color patches according to the value returned. I'd like to set the upper and lower bounds of the colormap.
currentAxis = plt.gca()
currentAxis.add_patch(Circle((xpos, ypos), radius, fill=False, color=cmap(a[alt_plot_column]), lw=4))
where a[alt_plot_column] returns a float that is used to color the patch.
The question is: how do I scale this colormap so that its minimum and maximum correspond to a defined range of values, e.g. 0-20?
I've tried to set the vmax and vmin attributes, but these don't seem to apply to patch objects.
A colormap takes values between 0 and 1 and maps them to a color. Therefore cmap(20) will not work.
You would need to somehow normalize your data. In this case it seems easy,
norm = lambda x: x/20.
cmap(norm(a))
In the general (linear) case you can use a normalization instance
norm = matplotlib.colors.Normalize(vmin=0, vmax=20)
cmap(norm(a))
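Putting the normalization together with the patch loop from the question might look like the sketch below. The list of values and the circle positions are hypothetical stand-ins for the data frame rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
from matplotlib.patches import Circle

cmap = plt.get_cmap("jet")
norm = Normalize(vmin=0, vmax=20)  # maps 0 -> 0.0 and 20 -> 1.0 linearly

values = [0.0, 5.0, 12.5, 20.0]  # hypothetical stand-ins for a[alt_plot_column]

fig, ax = plt.subplots()
for i, value in enumerate(values):
    # norm(value) lands in [0, 1], which is the range the colormap expects
    ax.add_patch(Circle((i, 0), 0.4, fill=False,
                        color=cmap(norm(value)), lw=4))

# Patches do not trigger autoscaling here, so set the view limits by hand
ax.set_xlim(-1, len(values))
ax.set_ylim(-1, 1)
```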

Interpreting the Y values of a normal distribution

I've written this code to generate a normal distribution of a set of values 1,2,3 :
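import pandas as pd
import random
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'col1':[1,2,3]})
print(df)
fig, ax = plt.subplots(1,1)
# density=True is the current name for normed=True, which newer matplotlib no longer accepts
df.plot(kind='hist', density=True, ax=ax)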
Returns :
The X values are the range of possible values but how are the Y values interpreted ?
Reading http://www.stat.yale.edu/Courses/1997-98/101/normal.htm the Y value is calculated using :
A normal distribution has a bell-shaped density curve described by its
mean μ and standard deviation σ. The density curve is symmetrical,
centered about its mean, with its spread determined by its standard
deviation. The height of a normal density curve at a given point x is
given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
What is the meaning of this formula ?
I think you are confusing two concepts here. A histogram just plots how many times a certain value appears. So for your list [1,2,3], the value 1 appears once, and the same goes for 2 and 3. If you had set normed=False, you would get the plot you have now, but with bar heights of 1.0.
However, when you set normed=True, you turn on normalization. Note that this does not have anything to do with a normal distribution. (In recent matplotlib releases the normed argument has been replaced by density, which does the same job.) Have a look at the documentation for hist, which you can find here: http://matplotlib.org/api/pyplot_api.html?highlight=hist#matplotlib.pyplot.hist
There you can see what the normed option does:
If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., n/(len(x)*dbin), i.e., the integral of the histogram will sum to 1. If stacked is also True, the sum of the histograms is normalized to 1.
So it gives you the formula right there. In your case you have three points, i.e. len(x)=3. If you look at your plot, you can see that your bins have a width of 0.2, so dbin=0.2. Each of the values 1, 2, and 3 appears only once, so n=1 for each occupied bin. Thus the height of your bars should be 1/(3*0.2) = 1.67, which is exactly what you see in your histogram.
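The arithmetic can be checked directly with np.histogram, which is what matplotlib's hist uses to compute bar heights. This is a small sketch, assuming the default of 10 bins; in current NumPy the option is called density.

```python
import numpy as np

x = np.array([1, 2, 3])

# 10 equal-width bins between min(x) = 1 and max(x) = 3
heights, edges = np.histogram(x, bins=10, density=True)

dbin = edges[1] - edges[0]       # width of one bin
expected = 1 / (len(x) * dbin)   # n / (len(x) * dbin) with n = 1

print("bin width:", dbin)
print("non-zero bar heights:", heights[heights > 0])
print("expected height:", expected)
print("area under the histogram:", np.sum(heights * np.diff(edges)))
```

Three bins are occupied, each with the same height, and the bar areas add up to 1.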
Now for the normal distribution: that is just a specific probability density function, defined by the formula you gave. It is useful in many fields as it relates to uncertainties. You'll see it a lot in statistics, for example. The Wikipedia article on it has lots of info.
If you want to generate a list of values that conform to a normal distribution, I would suggest reading the documentation of numpy.random.normal, which will do this for you: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html
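As a rough illustration of how the two ideas connect, the sketch below draws samples with numpy.random.normal and compares their normalized histogram with the normal density formula. The mean, standard deviation, sample size, and the helper name normal_pdf are arbitrary choices for this example.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x (hypothetical helper)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 2.0, 0.5  # arbitrary example parameters
np.random.seed(0)     # fixed seed so repeated runs give the same samples
samples = np.random.normal(loc=mu, scale=sigma, size=100_000)

# Normalized histogram: bar heights estimate the probability density
heights, edges = np.histogram(samples, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Largest difference between the bar heights and the density curve
max_gap = np.max(np.abs(heights - normal_pdf(centers, mu, sigma)))
print("largest gap between histogram and formula:", max_gap)
```

With this many samples the bar heights should track the curve closely, which is the sense in which the Y values of a normalized histogram are density estimates.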

markers on loglog matplotlib figure

I'm plotting multiple curves in loglog scale in matplotlib and, to make them distinguishable, I'm using markers. Since there are a lot of data points, I use markevery=100. But with the horizontal axis on the logarithmic scale, these get clustered. Is there a way to get the markers to space out logarithmically too?
Rather than specifying an integer for markevery, which places a marker at every Nth data point, use a float, which spaces the markers at equal visual distances along the line (regardless of whether a linear or log scale is used). From the documentation:
if every=0.1 (i.e. a float), then markers will be spaced at approximately equal distances along the line; the distance along the line between markers is determined by multiplying the display-coordinate distance of the axes bounding-box diagonal by the value of every.
import numpy as np
import matplotlib.pyplot as plt
t = np.arange(0.01, 30, 0.01)
plt.loglog(t, 20 * np.exp(-t / 10.0), '-o', markevery=0.1)
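For a side-by-side look, this sketch plots the same curve twice on one log-log axes, once with the integer setting from the question and once with the float setting. The factor of 2 on the second curve is only there to keep the two lines from overlapping.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.01, 30, 0.01)
y = 20 * np.exp(-t / 10.0)

fig, ax = plt.subplots()

# Integer: a marker on every 100th data point, which bunches up on a log axis
(line_int,) = ax.loglog(t, y, "-o", markevery=100, label="markevery=100")

# Float: markers spaced by a fraction of the axes diagonal in display space
(line_float,) = ax.loglog(t, 2 * y, "-s", markevery=0.1, label="markevery=0.1")

ax.legend()
fig.canvas.draw()  # render once so the marker positions are actually computed
```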