Matplotlib namespace issues?

I have a question regarding matplotlib.pyplot and namespaces.
See the following code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm

x = np.linspace(0, 1, 28)
color = iter(cm.gist_rainbow_r(np.linspace(0, 1, 28)))
plt.clf()
for s in range(28):
    c = next(color)
    plt.plot(x, x * s, c=c)
plt.show()
The idea was to have the plots in different colors of the rainbow colormap.
What happens is that it works on the first execution, but after that things get weird:
on several consecutive executions the colormap stops being used and the default colors appear instead.
I suspect the problem may lie with the "c=c" in the plot call, but I have played around with different names ("c", "color", ...) and could not find any systematic pattern to the issue.
Can someone reproduce the problem (try the code at least 5 times or so consecutively) and explain what is going on here?
Thanks

This is a known issue with mpl + Python 3.4+ that has been fixed in mpl v1.5+.
Many of the style parameters have multiple aliases (e.g. 'c' vs 'color') which mpl was not merging properly, so the artists were essentially being told two different colors; internally there is a dictionary containing both 'c' and 'color'.
In Python 3.4+ the process-to-process iteration order of dictionaries is random by default, because the seed for the underlying hash table is randomized (this was done to prevent a possible DoS attack based on intentional hash collisions). In older versions of Python the user-supplied color happened to always come later in the iteration order, so things coincidentally worked.
The simple workaround (IIRC) is to use plot(x, y, color=c), or update to mpl 1.5.1.
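For reference, here is a minimal version of the original loop with that workaround applied (my own restatement, not from the original answer):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm

x = np.linspace(0, 1, 28)
colors = iter(cm.gist_rainbow_r(np.linspace(0, 1, 28)))
plt.clf()
for s in range(28):
    c = next(colors)
    plt.plot(x, x * s, color=c)  # the unambiguous keyword sidesteps the alias merge
plt.show()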

Related

Draw an ordinary plot with the same style as in plt.hist(histtype='step')

The method plt.hist() in pyplot has a way to create a 'step-like' plot style when calling
plt.hist(data, histtype='step')
but the 'ordinary' methods that plot raw data without processing (plt.plot(), plt.scatter(), etc.) apparently do not have style options to obtain the same result. My goal is to plot a given set of points using that style, without making a histogram of these points.
Is that achievable with standard library methods for plotting a given 2-D set of points?
I also think that there is at least one hack (generating a fake distribution whose histogram would equal our data) and a 'low-level' solution drawing each segment manually, but neither of these approaches seems favorable.
Maybe you are looking for drawstyle="steps".
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.cumsum(np.random.randn(10))
plt.plot(data, drawstyle="steps")
plt.show()
Note that this is slightly different from histograms, because the lines do not go to zero at the ends.
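For completeness, matplotlib also offers variants of this draw style; a short comparison sketch (an addition of mine, not from the original answer):
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.cumsum(np.random.randn(10))
# "steps" is an alias for "steps-pre"; the variants control where the
# vertical jump happens relative to each x value.
for style in ("steps-pre", "steps-mid", "steps-post"):
    plt.plot(data, drawstyle=style, label=style)
plt.legend()
plt.show()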

Difference between matplotlib.contourf() and matlab.contourf() - odd sharp edges in matplotlib

I am a recent migrant from Matlab to Python and have recently worked with NumPy and Matplotlib. I recoded one of my Matlab scripts, which employs Matlab's contourf function, into Python using matplotlib's corresponding contourf function. I managed to replicate the output in Python, except that the contourf plots are not exactly the same, for a reason that is unknown to me.

When I run the contourf function in matplotlib, I get an otherwise nice figure, but it has sharp edges on the contour levels at the top and bottom, which should not be there (see Figure 1 below, matplotlib output). When I export the arrays I used in Python to Matlab (i.e. exactly the same data set that was used to generate the matplotlib contourf plot) and use Matlab's contourf function, I get a slightly different output, without those sharp contour-level edges (see Figure 2 below, Matlab output). I used the same number of levels in both figures. In Figure 3 I have made a scatter plot of the same data, which shows that there are no such sharp edges in the data as the contourf plot suggests (I added contour lines just for reference).

An example data set can be downloaded through the Dropbox link given below. It contains three txt files: X, Y, Z. Each is a 500x500 array that can be used directly with contourf(), i.e. plt.contourf(X,Y,Z,...). The code I used was
plt.contourf(X,Y,Z,10, cmap=plt.cm.jet)
plt.contour(X,Y,Z,10,colors='black', linewidths=0.5)
plt.axis('equal')
plt.axis('off')
Does anyone have an idea why this happens? I would appreciate any insight on this!
Cheers,
Jussi
Below are the details of my setup:
Python 3.7.0
IPython 6.5.0
matplotlib 2.2.3
[Figure 1: Matplotlib output]
[Figure 2: Matlab output]
[Figure 3: Matplotlib scatter plot]
[Link to data set]
The confusing thing about the Matlab plot is that its colorbar shows many more levels than there actually are in the plot. Hence you don't see the actual intervals that are contoured.
You would achieve the same result in matplotlib by choosing 12 instead of 11 levels.
import numpy as np
import matplotlib.pyplot as plt

# Load the three 500x500 arrays from the data set
X, Y, Z = [np.loadtxt("data/roundcontourdata/{}.txt".format(i)) for i in list("XYZ")]

# 12 boundaries -> 11 filled intervals, spanning the full data range
levels = np.linspace(Z.min(), Z.max(), 12)

cntr = plt.contourf(X, Y, Z, levels, cmap=plt.cm.jet)
plt.contour(X, Y, Z, levels, colors='black', linewidths=0.5)
plt.colorbar(cntr)
plt.axis('equal')
plt.axis('off')
plt.show()
So in conclusion, both plots are correct and show the same data; only the automatically chosen levels differ. This can be circumvented by choosing custom levels, depending on the desired visual appearance.
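To see which boundaries matplotlib actually chose when given just an integer, you can inspect the returned contour set; a quick check, assuming the same data files as above:
import numpy as np
import matplotlib.pyplot as plt
# Assumes the same data files as in the answer above
X, Y, Z = [np.loadtxt("data/roundcontourdata/{}.txt".format(i)) for i in list("XYZ")]
cntr = plt.contourf(X, Y, Z, 10, cmap=plt.cm.jet)
print(cntr.levels)  # the boundaries matplotlib picked on its own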

Adding descriptive stats to this plot

In pandas/seaborn:
sns.distplot(combo['resubmits'], kde=False, bins=8)
plt.savefig("g1.png")
Makes a very pretty histogram. I want to include a textual "legend" showing the mean, stdev, n, etc. as numbers in a box. You would think this is so common that there's a semi-automatic way to do it, but I can't find one.
There is a feature request for that.
However, note that using matplotlib.pyplot.axvline, you can easily do it yourself for now.
from matplotlib import pyplot as plt
plt.axvline(combo['resubmits'].mean())
Note that axvline's optional ymin and ymax arguments are fractions of the axes height (between 0 and 1), not data coordinates; the defaults already make the line span the full height of the plot.
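If you want the full stats box from the question, here is a minimal sketch using ax.text with a bbox; the resubmits array is a hypothetical stand-in for combo['resubmits']:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
resubmits = np.random.poisson(3, 200)  # hypothetical stand-in for combo['resubmits']
fig, ax = plt.subplots()
ax.hist(resubmits, bins=8)
ax.axvline(resubmits.mean(), color='k', linestyle='--')
# Compose the stats and place them in a box; the coordinates are axes
# fractions, so the box stays put regardless of the data range.
stats = "n = {}\nmean = {:.2f}\nstd = {:.2f}".format(
    len(resubmits), resubmits.mean(), resubmits.std())
ax.text(0.95, 0.95, stats, transform=ax.transAxes, ha='right', va='top',
        bbox=dict(boxstyle='round', facecolor='white'))
plt.show()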

Better ticks and tick labels with log scale

I am trying to get better looking log-log plots and I almost got what I want except for a minor problem.
The reason my example throws off the standard settings is that the x values are confined within less than one decade and I want to use decimal, not scientific notation.
Allow me to illustrate with an example:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib as mpl
import numpy as np
x = np.array([0.6,0.83,1.1,1.8,2])
y = np.array([1e-5,1e-4,1e-3,1e-2,0.1])
fig1,ax = plt.subplots()
ax.plot(x,y)
ax.set_xscale('log')
ax.set_yscale('log')
which produces a plot with two problems on the x axis:
- the use of scientific notation, which in this case is counterproductive
- the horrible "offset" at the lower right corner
After much reading, I added three lines of code:
ax.xaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
ax.xaxis.set_minor_formatter(mpl.ticker.ScalarFormatter())
ax.ticklabel_format(style='plain',axis='x',useOffset=False)
This produces a plot where, as I understand it, there are 5 minor ticks and 1 major one. It is much better, but still not perfect:
- I would like some additional ticks between 1 and 2
- the formatting of the label at 1 is wrong; it should be "1.0"
So I inserted the following line before the formatter statement:
ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(0.2))
I finally get the ticks I want: I now have 8 major and 2 minor ticks. This almost looks right, except that the tick labels at 0.6, 0.8 and 2.0 appear bolder than the others. What is the reason for this, and how can I correct it?
The reason some of the labels appear bold is that they are drawn as both major and minor tick labels. If two texts perfectly overlap, they appear bolder due to the antialiasing.
You may decide to use only minor tick labels and suppress the major ones with a NullLocator.
Since the tick locations you wish to have are really specific, there is no automatic locator that would provide them out of the box. For this special case it may be easiest to use a FixedLocator and specify the locations as a list.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

x = np.array([0.6, 0.83, 1.1, 1.8, 2])
y = np.array([1e-5, 1e-4, 1e-3, 1e-2, 0.1])

fig1, ax = plt.subplots(dpi=72, figsize=(6, 4))
ax.plot(x, y)
ax.set_xscale('log')
ax.set_yscale('log')

# Ticks every 0.1 below 1 and every 0.2 from 1 to 10 ...
locs = np.append(np.arange(0.1, 1, 0.1), np.arange(1, 10, 0.2))
ax.xaxis.set_minor_locator(ticker.FixedLocator(locs))
# ... suppress the major ticks, and label the minor ones.
ax.xaxis.set_major_locator(ticker.NullLocator())
ax.xaxis.set_minor_formatter(ticker.ScalarFormatter())
plt.show()
For more generic labeling, one could of course subclass a locator, but we would then need to know the logic used to determine the tick locations. (As I do not see a well-defined logic for the desired ticks in the question, I feel it would be wasted effort to provide such a solution for now.)
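If one also wants full control over the label text (e.g. forcing one decimal place everywhere, so 1 renders as "1.0"), a FuncFormatter is a simple, if manual, alternative; this is my own suggestion, not part of the original answer:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax = plt.subplots()
ax.plot([0.6, 0.83, 1.1, 1.8, 2], [1e-5, 1e-4, 1e-3, 1e-2, 0.1])
ax.set_xscale('log')
ax.set_yscale('log')
ax.xaxis.set_minor_locator(ticker.FixedLocator(np.arange(0.6, 2.2, 0.2)))
ax.xaxis.set_major_locator(ticker.NullLocator())
# Render every tick label with exactly one decimal place.
ax.xaxis.set_minor_formatter(ticker.FuncFormatter(lambda v, pos: "{:.1f}".format(v)))
plt.show()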

matplotlib get rid of max_open_warning output

I wrote a script that calls functions from QIIME to build a bunch of plots, among other things. Everything runs fine to completion, but matplotlib always emits the following warning for every plot it creates (super annoying):
/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.py:412: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_num_figures).
max_open_warning, RuntimeWarning)
I found this page, which seems to explain how to fix this problem, but after following the directions nothing changes:
import matplotlib as mpl
mpl.rcParams['figure.max_open_warning'] = 0
I called matplotlib directly from Python to see which rc file I should be investigating, went into that file, and manually changed the 20 to 0. Still no change. In case the documentation was incorrect, I also changed it to 1000, and I am still getting the same warning messages.
I understand that this could be a problem for people running on computers with limited power, but that isn't a problem in my case. How can I make this feedback go away permanently?
Try setting it this way:
import matplotlib as mpl
mpl.rcParams.update({'figure.max_open_warning': 0})
Not sure exactly why this works, but it mirrors the way I have changed the font size in the past and seems to fix the warnings for me.
Another way I just tried and it worked:
import matplotlib as mpl
mpl.rc('figure', max_open_warning = 0)
When using Seaborn, you can do it like this:
import seaborn as sns
sns.set_theme(rc={'figure.max_open_warning': 0})
Check out this article, which basically says to call plt.close(fig1) after you're done with fig1. That way you don't have too many figures floating around in memory.
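A minimal sketch of that pattern (the loop and file names are just illustrative):
import matplotlib.pyplot as plt

for i in range(50):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, i])
    fig.savefig("plot_{}.png".format(i))
    plt.close(fig)  # release the figure so pyplot does not keep it alive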
In Matplotlib, figure.max_open_warning is a configuration parameter that determines the maximum number of figures that can be opened before a warning is issued. By default its value is 20, meaning that if you open more than 20 figures in a single Matplotlib session, you will see a warning message. You can change the value of this parameter via the matplotlib.rcParams dictionary. For example:
import matplotlib.pyplot as plt
plt.rcParams['figure.max_open_warning'] = 50
This will set the value of figure.max_open_warning to 50, so that you will see a warning message if you open more than 50 figures in a single Matplotlib session.