Selective patterns with Matplotlib imshow without using patches - matplotlib

Is there a way to place patterns into selected areas on an imshow graph? To be precise, I need it so that, in addition to the colored squares carrying the numerical data, other squares carry different patterns indicating different failure modes for the experiment (and I also need to generate a key explaining the meaning of these patterns). An example of a useful pattern would be various types of crosshatching. I need to be able to do this without disrupting the main color-to-number relationship on the graph.
Due to the back-end I am working within for the GUI containing the graph, I cannot use patches: they fail to pickle, and so never make it from the back-end to the front-end via the multiprocessing package. I was wondering if anyone knew of another way to do this.
# (ax, private, show_fail, faildat, placex, placey, xind, yind all come from
#  the surrounding GUI code)
import numpy as np
import matplotlib.pyplot as plt

grid = np.ma.array(grid, mask=np.isnan(grid))
im = ax.imshow(grid, interpolation='nearest', aspect='equal',
               vmax=private.vmax, vmin=private.vmin)
# Up to here works fine and draws the graph showing only the data,
# with white spaces for any point that failed
if show_fail and faildat.size > 0:
    faildat = faildat[np.lexsort((faildat[:, yind], faildat[:, xind]))]
    fails = []
    for i in range(len(faildat)):  # gives coordinates with failures as (x,y)
        fails.append((faildat[i, 1], faildat[i, 0]))
    for F in fails:
        ax.FUNCTION NEEDED HERE
ax.minorticks_off()
ax.set_xticks(range(len(placex)))
ax.set_yticks(range(len(placey)))
ax.set_xticklabels(placex)
ax.set_yticklabels(placey, rotation=0)
plt.colorbar(im)  # Axes objects have no colorbar()/show(); use the pyplot API
plt.show()
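Not part of the original post, but one patch-free sketch of the idea: Line2D markers may pickle where patches do not (worth testing in your setup), so you can overlay a marker glyph such as 'x' at each failed cell and build the key from a legend. All names and data below are illustrative:
import numpy as np
import matplotlib.pyplot as plt

grid = np.random.rand(4, 4)
grid[1, 2] = np.nan                      # pretend this cell's run failed
fails = [(2, 1)]                         # failed cells as (x, y) = (col, row)

fig, ax = plt.subplots()
masked = np.ma.array(grid, mask=np.isnan(grid))
im = ax.imshow(masked, interpolation='nearest', aspect='equal')

# one marker style per failure mode; 'x' stands in for a crosshatch
for fx, fy in fails:
    ax.plot(fx, fy, linestyle='none', marker='x', markersize=25,
            markeredgewidth=3, color='k', label='failure mode A')

ax.legend(loc='upper right', numpoints=1)  # doubles as the key
fig.colorbar(im)
plt.show()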

Related

How to get plot as svg string in sympy plotting backends library?

I use the sympy plotting backends library to create plots directly from sympy expressions. I chose this library because it gives more options for fine-tuning plots compared to the standard sympy plotting module.
As the backend in the chosen library I use matplotlib.
My aim is to get the resulting plot as an SVG string in order to insert it into a web page later. I need to do it programmatically. I use the code below:
from spb import plot, MB
import io
from sympy import symbols, sin, cos

x = symbols("x")
# create plot
p1 = plot(
    (sin(x), "a", dict(color="k", linestyle=":")),
    (cos(x), "b"),
    backend=MB, show=False)
# buffer:
f = io.StringIO()
# save plot in buffer as svg string:
p1._fig.savefig(f, format = "svg")
# return result as svg string to insert it in web page later:
return f.getvalue()
The problem is that I get an exception:
'NoneType' object has no attribute 'savefig'
p1._fig.savefig(f, format = "svg")
But if I slightly modify the code:
...
# buffer:
f = io.StringIO()
# show plot:
p1.show()
# save plot in buffer as svg string:
p1._fig.savefig(f, format = "svg")
...
Everything works just fine. But the problem is that I do not want to show the plot; I need to save it as an SVG string. Does anyone know how to solve this?
Here I am, the developer of that module.
That error is caused by the fact that when you create a plot with MatplotlibBackend and show=False, the figure is not created (too long to explain why); this behavior is specific to MatplotlibBackend, and the other backends should not be affected by it. So, p1.fig is None.
However, the plotting function exposes the save method, which is nothing more than a wrapper to a specific plotting library "save" functionality. If you look at the source code, you'll see that MatplotlibBackend.save calls matplotlib's savefig, but first it checks if the figure has been created. If not, it forces the creation.
So, all you have to do is:
p1.save(f, format = "svg")
One final note. If possible, do not use attributes or methods starting with _ (underscore). They represent private attributes and the names might change from version to version. If you really need to retrieve the matplotlib figure, use p1.fig.
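Putting it together, a corrected version of the snippet from the question might look like this (a minimal sketch following the advice above):
from spb import plot, MB
from sympy import symbols, sin, cos
import io

def plot_as_svg():
    x = symbols("x")
    p1 = plot(
        (sin(x), "a", dict(color="k", linestyle=":")),
        (cos(x), "b"),
        backend=MB, show=False)
    f = io.StringIO()
    # save() creates the figure if it does not exist yet,
    # then delegates to matplotlib's savefig
    p1.save(f, format="svg")
    return f.getvalue()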
EDIT to answer the question in the comment about performance:
For backward-compatibility reasons, the new module by default uses an adaptive algorithm for line plots, which is different from the one used by SymPy. On the one hand it can easily be applied to a wider set of applications; on the other hand it is slower.
You have two options; either way, you may want to change the module's configuration file.
Option 1: the adaptive algorithm minimizes some loss function (loss_fn) and stops when a threshold (adaptive_goal) has been reached. We can increase this threshold (which by default is set to 0.01), thus improving performance but sacrificing the smoothness of the lines.
from spb.defaults import cfg, set_defaults
# it requires a few tries to find an appropriate value
cfg["adaptive"]["goal"] = 0.02
set_defaults(cfg)
# restart the kernel to load the new configuration
Option 2: don't use the adaptive algorithm; switch to the uniform meshing algorithm, which uses NumPy and vectorization (set adaptive=False and possibly an appropriate number of discretization points with n=something). This is very fast in comparison to the adaptive algorithm.
Think about it: generally, our plots are relatively small in comparison to the screen size. 1000 points per line (or whatever number you decide to use) should create smooth-enough lines.
So, you can deactivate the adaptive algorithm on a plot-by-plot basis (with adaptive=False), or you can set the module to always use the uniform meshing algorithm (this is the setup that I use on my machine).
from spb.defaults import cfg, set_defaults
# disable adaptive algorithm
cfg["adaptive"]["used_by_default"] = False
set_defaults(cfg)
# restart the kernel to load the new configuration
Then, when you create a plot and you feel like it should be smoother, simply increase the number of discretization points by setting n=something (default value is 1000).
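For example, a per-plot version of this might look like the following sketch (n=2000 is just an illustrative value):
from spb import plot, MB
from sympy import symbols, sin
x = symbols("x")
# uniform meshing instead of the adaptive algorithm, with more points
p2 = plot(sin(x), backend=MB, adaptive=False, n=2000)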
You can find more customization options on this documentation page.

Multiple axis scale in Lets plot Kotlin

I'm learning some data science related topics and oh boy, this is a jungle of different libraries for everything 😅
Because of things, I went with Lets-Plot, which has a nice Kotlin API that I'm using combined with the Kotlin kernel for Jupyter notebooks.
Overall, things are going pretty well. Most tutorials & docs I see online use different libraries for plotting (e.g. Seaborn, Matplotlib, Plotly), so most of the time I have to do some reading of the Lets-Plot-Kotlin reference and trial and error until I find the equivalent code for my graphs.
Currently, I'm trying to graph the distribution of differences between two values. Overall, this looks pretty good. I can just do something like
(letsPlot(df)
    + geomHistogram { x = "some-column" }
).show()
which gives a nice graph
It would be interesting to see the density estimator as well, geomDensity to the rescue!
(letsPlot(df)
    + geomDensity(color = "red") { x = "some-column" }
).show()
Nice! Now let's watch them both together
(letsPlot(df)
    + geomDensity(color = "red") { x = "some-column" }
    + geomHistogram() { x = "some-column" }
).show()
As you can see, there's a small red line at the bottom (the geomDensity!). The problem here (I would say) is that both layers use the same Y scale. The histogram works with values 0-20 and the density with 0-0.02, so when they're plotted together the density is just a line at the bottom.
Is there any way to add several layers to the same plot, each using its own scale? I've read some blog posts claiming that you should not go for it (this seems to be pretty much accepted by the community).
My target is to achieve something similar to what you can do with Seaborn by doing
plt.figure(figsize=(10,4),dpi=200)
sns.histplot(data=df,x='some_column',kde=True,bins=25)
(yes, I know I took the Lets-Plot screenshot without the bins configured. Not relevant, I'd say ¯\_(ツ)_/¯)
Maybe I'm just approaching the problem with a mindset I should not? As mentioned, I'm still learning so every alternative will be highly welcomed 😃
Just, please, don't go with the "Switch to Python". I'm exploring and I'd prefer to go one topic at a time
In order for the histogram and density layers to share the same y-scale, you need to map the variable "..density.." to the "y" aesthetic in the histogram layer (by default the histogram maps "..count.." to "y").
You will find an example of it in cell [4] in this notebook: https://nbviewer.org/github/JetBrains/lets-plot-kotlin/blob/master/docs/examples/jupyter-notebooks/distributions.ipynb
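Adapted to the plot from the question, that would be something along these lines (a sketch; see the linked notebook for the exact syntax):
(letsPlot(df)
    + geomHistogram { x = "some-column"; y = "..density.." }
    + geomDensity(color = "red") { x = "some-column" }
).show()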
BTW, many of the pages in the Lets-Plot Kotlin API Reference are equipped with links to demo notebooks in the "Examples" section: geomHistogram().
And of course you can find a lot of info online on the R ggplot2 package which is largely applicable to Lets-Plot as well. For example: Histogram with kernel density estimation.
Finally :), calling show() is not necessary - the Jupyter Kotlin kernel will render the plot automatically if the plot expression is the last one in the cell, which is often the case.

Interpolating data onto a line of points

I have some irregularly spaced data and need to analyze it. I can successfully interpolate this data onto a regular grid using mlab.griddata (or rather, the natgrid implementation of it). This allows me to use pcolormesh and contour to generate plots, extract levels, etc. Using plt.contour, I then extract a certain level via get_paths on the paths in CS.collections.
Now, what I'd like to do is then, with my original irregularly spaced data, interpolate some quantities onto this specific contour line (i.e., NOT onto a regular grid). The similarly named griddata function from Scipy allows for this behavior, and it almost works. However, I find that as I increase the number of original points, I can get odd erratic behavior in the interpolation. I'm wondering if there's a way around this, i.e., another way to interpolate irregularly spaced (or regularly spaced data for that matter, since I can use my regularly spaced data from mlab.griddata) onto a specific line.
Let me show some numerical examples of what I'm talking about. Take a look at this figure:
The top left shows my data as points, and the line shows an extracted level of level=0 from some data D that I have at those points (x,y) [note: I have data 'D', 'Energy', and 'Pressure', all defined in this (x,y) space]. Once I have this curve, I can plot the interpolated quantities of D, Energy, and Pressure onto my specific line. First, note the plot of D (middle, right). It should be zero at all points, but it isn't quite. The likely cause is that the line corresponding to the 0 level is generated from a uniform set of points that came from mlab.griddata, whereas the plot of 'D' is generated from my ORIGINAL data interpolated onto that level curve. You can also see some unphysical wiggles in 'Energy' and 'Pressure'.
Okay, seems easy enough, right? Maybe I should just get more original data points along my level=0 curve. Getting some more of these points, I then generate the following plots:
First look at the top left. You can see that I've sampled the hell out of the (x,y) space in the vicinity of my level=0 curve. Furthermore, you can see that my new "D" plot (middle, right) now correctly interpolates to zero in the region that it originally didn't. But now I get some wiggles at the start of the curve, as well as getting some other wiggles in the 'Energy' and 'Pressure' in this space! It is far from obvious to me that this should occur, since my original data points are still there and I've only supplemented additional points. Furthermore, some regions where my interpolation is going bad aren't even near the points that I added in the second run -- they are exclusively neighbored by my original points.
So this brings me to my original question. I'm worried that the interpolation that produces the 'Energy', 'D', and 'Pressure' curves is not working correctly (this is scipy's griddata, imported as scigrid in the code below). Mlab's griddata only interpolates to a regular grid, whereas I want to interpolate onto the specific line shown in the top left plot. What's another way for me to do this?
Thanks for your time!
After posting this, I decided to try scipy.interpolate.SmoothBivariateSpline, which produced the following result:
You can now see that my line is smoothed, so it seems like this will work. I'll mark this as the answer unless someone posts something soon that hints that there may be an even better solution.
Edit: As requested, below is some of the code used to generate these plots. I don't have a minimally working example, and the above plots were generated in a larger framework of code, but I'll write the important parts schematically below with comments.
# (imports assumed for the schematic code below)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata                # the natgrid-backed griddata
from scipy.interpolate import griddata as scigrid   # scipy's griddata
from scipy.interpolate import SmoothBivariateSpline

# x,y,z are lists of data where the first point is x[0],y[0],z[0], and so on
minx = min(x)
maxx = max(x)
miny = min(y)
maxy = max(y)
# convert to numpy arrays
x = np.array(x)
y = np.array(y)
z = np.array(z)
# here we are creating a fine grid to interpolate the data onto
xi = np.linspace(minx, maxx, 100)
yi = np.linspace(miny, maxy, 100)
# here we interpolate our data from the original x,y,z unstructured grid to the new
# fine, regular grid in xi,yi, returning the values zi
zi = griddata(x, y, z, xi, yi)
# now let's do some plotting
plt.figure()
# returns the CS contour object, from which we'll be able to get the path for the
# level=0 curve (contour needs the gridded zi, not the unstructured z)
CS = plt.contour(xi, yi, zi, levels=[0])
# can plot the original data if we want
plt.scatter(x, y, alpha=0.5, marker='x')
# now let's get the level=0 curve
for c in CS.collections:
    data = c.get_paths()[0].vertices
# lineX,lineY are simply the x,y coordinates for our level=0 curve, expressed as arrays
lineX = data[:, 0]
lineY = data[:, 1]
# so it's easy to plot this too
plt.plot(lineX, lineY)
# now what to do if we want to interpolate some other data we have, say z2
# (also at our original x,y positions), onto this level=0 curve?
# well, first I tried using scipy.interpolate.griddata == scigrid like so
origdata = np.transpose(np.vstack((x, y)))  # just organizing this data like the
                                            # scigrid routine expects
lineZ2 = scigrid(origdata, z2, data, method='linear')
# plotting the above curve (as plt.plot(lineZ2)) gave me really bad results, so
# trying a spline approach
Z2spline = SmoothBivariateSpline(x, y, z2)
# the above creates a spline object on our original data. notice we haven't EVALUATED
# it anywhere yet (we'll want to evaluate it on our level curve)
Z2Line = []
# here we evaluate the spline along all our points on the level curve, and store the
# result as a new list
for i in range(len(lineX)):
    Z2Line.append(Z2spline(lineX[i], lineY[i])[0][0])  # [0][0] extracts the scalar
                                                       # value, which is otherwise
                                                       # wrapped in a 2-D array
# you can then easily plot this
plt.plot(Z2Line)
Hope this helps someone!

Put pcolormesh and contour onto same grid?

I'm trying to display 2D data with axis labels using both contour and pcolormesh. As has been noted on the matplotlib user list, these functions obey different conventions: pcolormesh expects the x and y values to specify the corners of the individual pixels, while contour expects the centers of the pixels.
What is the best way to make these behave consistently?
One option I've considered is to make a "centers-to-edges" function, assuming evenly spaced data:
def centers_to_edges(arr):
    dx = arr[1] - arr[0]
    newarr = np.linspace(arr.min() - dx/2, arr.max() + dx/2, arr.size + 1)
    return newarr
Another option is to use imshow with the extent keyword set.
The first approach doesn't play nicely with 2D axis arrays (e.g., as created by meshgrid or indices), and the second discards the axis numbers entirely.
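For illustration, a self-contained sketch of the first approach on synthetic data (centers_to_edges as defined above), so pcolormesh and contour line up:
import numpy as np
import matplotlib.pyplot as plt

def centers_to_edges(arr):
    dx = arr[1] - arr[0]
    return np.linspace(arr.min() - dx/2, arr.max() + dx/2, arr.size + 1)

x = np.arange(-10, 10, 1.0)   # cell centers
y = np.arange(-10, 10, 1.0)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

# pcolormesh gets the cell edges, contour gets the cell centers
plt.pcolormesh(centers_to_edges(x), centers_to_edges(y), Z)
plt.contour(X, Y, Z, 20, colors='k')
plt.show()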
Is your data on a regular mesh? If not, you can use griddata() to obtain one. If your data is too big, sub-sampling or regridding it is always possible; and since the output image will usually be small compared with the data, you can exploit this.
If you use imshow() with "extent" and "interpolation='nearest'", you will see that the data is cell-centered and that extent specifies the outer edges of the cells (the corners). On the other hand, contour assumes that the data is cell-centered and that X,Y are the centers of the cells. So you need to be careful about the input domain for contour. A trivial example is:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10, 1)
X, Y = np.meshgrid(x, x)
P = X**2 + Y**2
plt.imshow(P, extent=[-10, 10, -10, 10], interpolation='nearest', origin='lower')
plt.contour(X + 0.5, Y + 0.5, P, 20, colors='k')
My tests told me that pcolormesh() is a very slow routine and I always try to avoid it; griddata and imshow() have always been a good choice for me.

Creating grid and interpolating (x,y,z) for contour plot sagemath

I have values in the form (x,y,z). By creating a list_plot3d plot I can clearly see that they are not quite evenly spaced. They usually form little "blobs" of 3 to 5 points on the xy plane. So for the interpolation and the final "contour" plot to be better, or should I say smoother(?), do I have to create a rectangular grid (like the squares on a chessboard) so that the blobs of data are somehow "smoothed"? I understand that this might be trivial to some people, but I am trying this for the first time and I am struggling a bit. I have been looking at the scipy packages like scipy.interpolate.interp2d, but the graphs produced at the end are really bad. Maybe a brief tutorial on 2D interpolation in sagemath for an amateur like me? Some advice? Thank you.
EDIT:
https://docs.google.com/file/d/0Bxv8ab9PeMQVUFhBYWlldU9ib0E/edit?pli=1
This is mostly the kind of graph it produces, along with this message:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either s or m too
small. (fp>s)
    kx,ky=3,3 nx,ny=17,20 m=200 fp=4696.972223 s=0.000000
To get this graph I just run this command:
f_interpolation = scipy.interpolate.interp2d(*zip(*matrix(C)), kind='cubic')
plot_interpolation = contour_plot(
    lambda x, y: f_interpolation(x, y)[0],
    (22.419, 22.439), (37.06, 37.08),
    cmap='jet', contours=numpy.arange(0, 1400, 100), colorbar=True)
plot_all = plot_interpolation
plot_all.show(axes_labels=["m", "m"])
Here matrix(C) can be a huge matrix, like 10000 x 3, or even a lot more, like 1000000 x 3. The problem of bad graphs persists even with less data, as in the picture attached now, where matrix(C) was only 200 x 3. That's why I begin to think that, apart from a possible glitch in the program, my approach to using this command might be totally wrong; hence my asking for advice about using a grid rather than just "throwing" my data at a command.
I've had a similar problem using the scipy.interpolate.interp2d function. My understanding is that the issue arises because interp1d/interp2d and related functions use an older wrapping of FITPACK for the underlying calculations. I was able to get a problem similar to yours to work using the spline functions, which rely on a newer wrapping of FITPACK. The spline functions can be identified because they all seem to have capital letters in their names: http://docs.scipy.org/doc/scipy/reference/interpolate.html. Within the scipy installation, these newer functions appear to be located in scipy/interpolate/fitpack2.py, while the functions using the older wrappings are in fitpack.py.
For your purposes, RectBivariateSpline is what I believe you want. Here is some sample code for implementing RectBivariateSpline:
import numpy as np
from scipy import interpolate
# Generate unevenly spaced x/y data for axes
npoints = 25
maxaxis = 100
x = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
y = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
xsort = np.sort(x)
ysort = np.sort(y)
# Generate the z-data, which first requires converting
# x/y data into grids; indexing='ij' gives z[i,j] = f(xsort[i], ysort[j]),
# the layout that RectBivariateSpline expects
xg, yg = np.meshgrid(xsort, ysort, indexing='ij')
z = xg**2 - yg**2
# Generate the interpolated, evenly spaced data
# Note that the min/max of x/y isn't necessarily 0 and 100 since
# randomly chosen points were used. If we want to avoid extrapolation,
# the explicit min/max must be found
interppoints = 100
xinterp = np.linspace(xsort[0],xsort[-1],interppoints)
yinterp = np.linspace(ysort[0],ysort[-1],interppoints)
# Generate the kernel (spline) that will be used for interpolation
# Note that the default version uses cubic splines (kx=3, ky=3).
# Higher-order interpolation can be used by setting kx and ky to larger
# integers, i.e. interpolate.RectBivariateSpline(xsort,ysort,z,kx=5,ky=5)
kernel = interpolate.RectBivariateSpline(xsort,ysort,z)
# Now calculate the linear, interpolated data
zinterp = kernel(xinterp, yinterp)
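A short (assumed) continuation to visualize the result: kernel(xinterp, yinterp) returns an array of shape (len(xinterp), len(yinterp)), hence the transpose for pcolormesh.
import matplotlib.pyplot as plt

# rows of the plotted array must run along y, so transpose zinterp
plt.pcolormesh(xinterp, yinterp, zinterp.T)
plt.colorbar()
plt.show()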