imshow non-uniform matrix bin size - matplotlib

I am trying to create an image with imshow, but the bins in my matrix are not equal.
For example the following matrix
C = [[1,2,2],[2,3,2],[3,2,3]]
is for X = [1,4,8] and for Y = [2,4,9]
I know I can just set xticks and yticks, but I want the axis to be equal. This means that the squares which make up the imshow image will need to be of different sizes.
Is it possible?

This seems like a job for pcolormesh.
From When to use imshow over pcolormesh:
Fundamentally, imshow assumes that all data elements in your array are
to be rendered at the same size, whereas pcolormesh/pcolor associates
elements of the data array with rectangular elements whose size may
vary over the rectangular grid.
pcolormesh plots a matrix as cells and takes the x and y coordinates of the cell edges as arguments, which allows you to draw each cell at a different size.
I assume the X and Y of your example data are meant to be the sizes of the cells, so I converted them into coordinates with:
import numpy as np
import matplotlib.pyplot as plt
xSize=[1,4,8]
ySize=[2,4,9]
x=np.append(0,np.cumsum(xSize)) # gives [ 0 1 5 13]
y=np.append(0,np.cumsum(ySize)) # gives [ 0 2 6 15]
Then, if you want behavior similar to imshow, you need to invert the y axis.
c=np.array([[1,2,2],[2,3,2],[3,2,3]])
plt.pcolormesh(x,-y,c)
Which gives us:
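For reference, the steps above can be put together in one self-contained sketch. It assumes, as the answer does, that X and Y are cell sizes. The helper name cell_edges and the off-screen Agg backend are choices made for this example only, and ax.invert_yaxis() is used here instead of negating y so that the tick labels stay positive.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def cell_edges(sizes):
    """Turn cell sizes into edge coordinates starting at 0 (hypothetical helper)."""
    return np.append(0, np.cumsum(sizes))


c = np.array([[1, 2, 2], [2, 3, 2], [3, 2, 3]])
x = cell_edges([1, 4, 8])  # 4 edges for 3 columns
y = cell_edges([2, 4, 9])  # 4 edges for 3 rows

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x, y, c)  # one rectangle per matrix element
ax.invert_yaxis()              # put row 0 at the top, as imshow does
fig.colorbar(mesh, ax=ax)
```

Call plt.show() or fig.savefig(...) afterwards to view the result.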

Related

Changing variable labels/legend in raster plot to discrete characters

I have just made a plot using raster data that consists of 6 different land types and fit them to polygon vectors. I'm trying to change the values on the continuous scale bar (1-6) to the names of each landtype (e.g. grasslands, urban, etc) which is what each different colour represents. I have tried inserting breaks, however then each box in the legend contains labels (1-2, 2-3, 3-4 etc.)
Raster plot where each diff colour represents diff land type
This is my code:
rasterxpolygonplotcode
Example data
library(terra)
r <- rast(nrows=10, ncols=10)
values(r) <- sample(3, ncell(r), replace=TRUE)
cover <- c("forest", "water", "urban")
You can either do:
plot(r, type="classes", levels=cover)
Or first make the raster categorical
levels(r) <- data.frame(id=1:3, cover=c("forest", "water", "urban"))
plot(r)

Using matplotlib to plot a matrix with the third variable as source for a color map

Say you have a matrix described by three arrays:
x = array of length N.
y = array of length M.
And z is a set of "somewhat random" values from -0.3 to 0.3 in a NxM shape. I need to create a plot in which the x values are in the x-axis, y values are in the y-axis and using z as the source to indicate the intensity of each pixel with a color map.
So far, I have tried using
plt.contourf(x,y,z)
and the resulting plot is very nice for me (attached at the end of this paragraph), but a smoothing is automatically applied to the plot! I need to be able to distinguish the pixels and I cannot find a way to do it.
contourf result
I have also studied the possibility of using
ax.matshow(z)
in order to successfully see the pixels... but then I am struggling to customize the x and y axes, since only the index of each pixel is shown (see below).
matshow result
Would you please give me some ideas? Thank you.
Without more information on your x,y data it's hard to know, but I would guess you are looking for pcolormesh.
plt.pcolormesh(x,y,z)
This would take the x and y data as input and hence shows the z data at the appropriate coordinates.
You can use imshow with the keyword interpolation='nearest'.
plt.imshow(z, interpolation='nearest')
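To make the two suggestions concrete, here is a small sketch with made-up data. It assumes x and y are 1-D arrays of pixel centers and that z has shape (len(y), len(x)); if z is stored as (len(x), len(y)), pass z.T instead. The shading="nearest" option needs matplotlib 3.3 or later, and the extent computed for imshow is only meaningful when x and y are evenly spaced.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)      # N = 8 pixel centers along x (made-up data)
y = np.linspace(10.0, 20.0, 5)    # M = 5 pixel centers along y (made-up data)
z = rng.uniform(-0.3, 0.3, size=(len(y), len(x)))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# pcolormesh: one unsmoothed cell per z value, centered on (x[i], y[j])
mesh = ax1.pcolormesh(x, y, z, shading="nearest")
fig.colorbar(mesh, ax=ax1)

# imshow: extent maps array indices to data coordinates (even spacing only)
dx = x[1] - x[0]
dy = y[1] - y[0]
extent = [x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2]
im = ax2.imshow(z, interpolation="nearest", origin="lower",
                extent=extent, aspect="auto")
fig.colorbar(im, ax=ax2)
```

For unevenly spaced x or y, only the pcolormesh variant places the pixels at the right coordinates.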

Tensorflow extract_glimpse offset

I am trying to use the extract_glimpse function of tensorflow but I encounter some difficulties with the offset parameter.
Let's assume that I have a batch of one single channel 5x5 matrix called M and that I want to extract a 3x3 matrix of it.
When I call extract_glimpse([M], [3,3], [[1,1]], centered=False, normalized=False), it returns the result I am expecting: the 3x3 matrix centered at the position (1,1) in M.
But when I call extract_glimpse([M], [3,3], [[2,1]], centered=False, normalized=False), it doesn't return the 3x3 matrix centered at the position (2,1) in M but it returns the same as in the first call.
What is the point that I don't get?
The pixel coordinates actually have a range of 2 times the size (not documented - so it is a bug indeed). This is at least true in the case of centered=True and normalized=False. With those settings, the offsets range from minus the size to plus the size of the tensor. I therefore wrote a wrapper that is more intuitive to numpy users, using pixel coordinates starting at (0,0). This wrapper and more details about the problem are available on the tensorflow GitHub page.
For your particular case, I would try something like:
offsets1 = [-5 + 3,
-5 + 3]
extract_glimpse([M], [3,3], [offsets1], centered=True, normalized=False)
offsets2 = [-5 + 3 + 2,
-5 + 3]
extract_glimpse([M], [3,3], [offsets2], centered=True, normalized=False)

pyplot scatter plot marker size

In the pyplot document for scatter plot:
matplotlib.pyplot.scatter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None,
faceted=True, verts=None, hold=None, **kwargs)
The marker size
s:
size in points^2. It is a scalar or an array of the same length as x and y.
What kind of unit is points^2? What does it mean? Does s=100 mean 10 pixel x 10 pixel?
Basically I'm trying to make scatter plots with different marker sizes, and I want to figure out what does the s number mean.
This can be a somewhat confusing way of defining the size but you are basically specifying the area of the marker. This means, to double the width (or height) of the marker you need to increase s by a factor of 4. [because A = WH => (2W)(2H)=4A]
There is a reason, however, that the size of markers is defined in this way. Because of the scaling of area as the square of width, doubling the width actually appears to increase the size by more than a factor 2 (in fact it increases it by a factor of 4). To see this consider the following two examples and the output they produce.
# doubling the width of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*4**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Notice how the size increases very quickly. If instead we have
# doubling the area of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*2**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Now the apparent size of the markers increases roughly linearly in an intuitive fashion.
As for the exact meaning of what a 'point' is, it is fairly arbitrary for plotting purposes, you can just scale all of your sizes by a constant until they look reasonable.
Edit: (In response to comment from @Emma)
It's probably confusing wording on my part. The question asked about doubling the width of a circle, so in the first picture each circle (as we move from left to right) has double the width of the previous one, which for the area is an exponential with base 4. Similarly, in the second example each circle has double the area of the last one, which gives an exponential with base 2.
However, it is in the second example (where we are scaling area) that doubling the area appears to make the circle twice as big to the eye. Thus, if we want a circle to appear a factor of n bigger, we would increase the area by a factor of n, not the radius, so the apparent size scales linearly with the area.
Edit to visualize the comment by @TomaszGandor:
This is what it looks like for different functions of the marker size:
x = [0,2,4,6,8,10,12,14,16,18]
s_exp = [20*2**n for n in range(len(x))]
s_square = [20*n**2 for n in range(len(x))]
s_linear = [20*n for n in range(len(x))]
plt.scatter(x,[1]*len(x),s=s_exp, label='$s=2^n$', lw=1)
plt.scatter(x,[0]*len(x),s=s_square, label='$s=n^2$')
plt.scatter(x,[-1]*len(x),s=s_linear, label='$s=n$')
plt.ylim(-1.5,1.5)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), labelspacing=3)
plt.show()
Because other answers here claim that s denotes the area of the marker, I'm adding this answer to clarify that this is not necessarily the case.
Size in points^2
The argument s in plt.scatter denotes the markersize**2. As the documentation says
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
This can be taken literally. In order to obtain a marker which is x points large, you need to square that number and give it to the s argument.
So the relationship between the markersize of a line plot and the scatter size argument is the square. In order to produce a scatter marker of the same size as a plot marker of size 10 points you would hence call scatter( .., s=100).
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([0],[0], marker="o", markersize=10)
ax.plot([0.07,0.93],[0,0], linewidth=10)
ax.scatter([1],[0], s=100)
ax.plot([0],[1], marker="o", markersize=22)
ax.plot([0.14,0.86],[1,1], linewidth=22)
ax.scatter([1],[1], s=22**2)
plt.show()
Connection to "area"
So why do other answers and even the documentation speak about "area" when it comes to the s parameter?
Of course the units of points**2 are area units.
For the special case of a square marker, marker="s", the area of the marker is indeed directly the value of the s parameter.
For a circle, the area of the circle is area = pi/4*s.
For other markers there may not even be any obvious relation to the area of the marker.
In all cases however the area of the marker is proportional to the s parameter. This is the motivation to call it "area" even though in most cases it isn't really.
Specifying the size of the scatter markers in terms of a quantity proportional to the area of the marker makes sense insofar as it is the area of the marker that is perceived when comparing different patches, rather than its side length or diameter. I.e. doubling the underlying quantity should double the area of the marker.
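The relations listed above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it ignores the marker's edge line width, which adds to the drawn size.

```python
import math


def marker_diameter_points(s):
    """Width of a scatter marker in points for a given s (points^2)."""
    return math.sqrt(s)


def marker_area_points2(s, marker="o"):
    """Area covered by a marker in points^2, for the two shapes discussed above."""
    if marker == "s":      # square: the area is s itself
        return s
    if marker == "o":      # circle of diameter sqrt(s): pi/4 * s
        return math.pi / 4 * s
    raise ValueError("no simple area relation assumed for this marker")


print(marker_diameter_points(100))              # 10.0
print(marker_area_points2(100, "s"))            # 100
print(round(marker_area_points2(100, "o"), 2))  # 78.54
```

In both cases, doubling s doubles the area while the width grows only by a factor of sqrt(2).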
What are points?
So far the answer to what the size of a scatter marker means is given in units of points. Points are often used in typography, where fonts are specified in points. Line widths are also often specified in points. The standard size of points in matplotlib is 72 points per inch (ppi), so 1 point is 1/72 inches.
It might be useful to be able to specify sizes in pixels instead of points. If the figure dpi is 72 as well, one point is one pixel. If the figure dpi is different (matplotlib default is fig.dpi=100),
1 point == fig.dpi/72. pixels
While the scatter marker's size in points would hence look different for different figure dpi, one could produce a 10 by 10 pixels^2 marker, which would always have the same number of pixels covered:
import matplotlib.pyplot as plt
for dpi in [72,100,144]:
    fig,ax = plt.subplots(figsize=(1.5,2), dpi=dpi)
    ax.set_title("fig.dpi={}".format(dpi))
    ax.set_ylim(-3,3)
    ax.set_xlim(-2,2)
    ax.scatter([0],[1], s=10**2,
               marker="s", linewidth=0, label="100 points^2")
    ax.scatter([1],[1], s=(10*72./fig.dpi)**2,
               marker="s", linewidth=0, label="100 pixels^2")
    ax.legend(loc=8,framealpha=1, fontsize=8)
    fig.savefig("fig{}.png".format(dpi), bbox_inches="tight")
plt.show()
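The conversion used in the example above can also be written as two small helpers. The function names are hypothetical; the only fact relied on is the 72 points per inch stated earlier.

```python
def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to pixels at the given figure dpi."""
    return points * dpi / 72.0


def scatter_s_for_pixels(side_pixels, dpi):
    """Value of s (points^2) for a marker that is side_pixels wide at the given dpi."""
    side_points = side_pixels * 72.0 / dpi
    return side_points ** 2


print(points_to_pixels(10, 72))      # 10.0
print(points_to_pixels(72, 100))     # 100.0
print(scatter_s_for_pixels(10, 72))  # 100.0
```

A value from scatter_s_for_pixels is only valid for the dpi it was computed with, so it has to be recomputed if the figure is saved at a different dpi.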
If you are interested in a scatter in data units, check this answer.
You can use markersize to specify the size of the circle in plot method
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randn(20)
x2 = np.random.randn(20)
plt.figure(1)
# you can specify the marker size two ways directly:
plt.plot(x1, 'bo', markersize=20) # blue circle with size 20
plt.plot(x2, 'ro', ms=10,) # ms is just an alias for markersize
plt.show()
From here
It is the area of the marker. I mean if you have s1 = 1000 and then s2 = 4000, the relation between the radius of each circle is: r_s2 = 2 * r_s1. See the following plot:
plt.scatter(2, 1, s=4000, c='r')
plt.scatter(2, 1, s=1000 ,c='b')
plt.scatter(2, 1, s=10, c='g')
I had the same doubt when I saw the post, so I did this example then I used a ruler on the screen to measure the radii.
I also attempted to use 'scatter' initially for this purpose. After quite a bit of wasted time - I settled on the following solution.
import matplotlib.pyplot as plt
input_list = [{'x':100,'y':200,'radius':50, 'color':(0.1,0.2,0.3)}]
output_list = []
for point in input_list:
    output_list.append(plt.Circle((point['x'], point['y']), point['radius'], color=point['color'], fill=False))
ax = plt.gca()
ax.cla()
ax.set_aspect('equal') # plt.gca(aspect='equal') is no longer accepted by recent matplotlib versions
ax.set_xlim((0, 1000))
ax.set_ylim((0, 1000))
for circle in output_list:
    ax.add_artist(circle) # the radius is given in data units
This is based on an answer to this question
If the size of the circles corresponds to the square of the parameter in s=parameter, then take the square root of each element you append to your size array, like this: s=[1, 1.414, 1.73, 2.0, 2.24]. When these values are squared for plotting, the relative sizes follow a linear progression.
If I were to square each one as it gets output to the plot: output=[1, 2, 3, 4, 5]. Try a list comprehension: s=[numpy.sqrt(i) for i in s]

plotting matrices with gnuplot

I am trying to plot a matrix in Gnuplot as I would using imshow in Matplotlib. That means I just want to plot the actual matrix values, not the interpolation between values. I have been able to do this by trying
splot "file.dat" u 1:2:3 ps 5 pt 5 palette
This way we are telling the program to use columns 1,2 and 3 in the file, use squares of size 5 and space the points with very narrow gaps. However the points in my dataset are not evenly spaced and hence I get discontinuities.
Does anyone know a method of plotting matrix values in gnuplot when the points are not evenly spaced along the x and y axes?
Gnuplot doesn't need to have evenly spaced X and Y axes (see another one of my answers: https://stackoverflow.com/a/10690041/748858). I frequently deal with grids that look like x[i] = f_x(i) and y[j] = f_y(j). This is quite trivial to plot; the datafile just looks like:
#datafile.dat
x1 y1 z11
x1 y2 z12
...
x1 yN z1N
#<--- blank line (leave these comments out of your datafile ;)
x2 y1 z21
x2 y2 z22
...
x2 yN z2N
#<--- blank line
...
...
#<--- blank line
xN y1 zN1
...
xN yN zNN
(note the blank lines)
A datafile like that can be plotted as:
set view map
splot "datafile.dat" u 1:2:3 w pm3d
The option set pm3d corners2color can be used to fine-tune which corner determines the color of each rectangle.
Also note that you could make essentially the same plot doing this:
set view map
plot "datafile.dat" u 1:2:3 w image
Although I don't use this one myself, so it might fail with a non-equally spaced rectangular grid (you'll need to try it).
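If the grid lives in a script rather than in a file, the blank-line-separated layout described above can be generated with a short helper. This is a Python sketch, not part of gnuplot; the function name and the example values are made up, and z[i][j] is assumed to hold the value at (x[i], y[j]).

```python
import io


def write_gnuplot_grid(stream, x, y, z):
    """Write 'x y z' triples, one block per x value, blocks separated by blank lines.

    z[i][j] is the value at (x[i], y[j]).
    """
    for i, xi in enumerate(x):
        if i > 0:
            stream.write("\n")          # blank line between blocks
        for j, yj in enumerate(y):
            stream.write(f"{xi} {yj} {z[i][j]}\n")


# Made-up, unevenly spaced example grid
x = [1, 4, 8]
y = [2, 4, 9]
z = [[1, 2, 2], [2, 3, 2], [3, 2, 3]]

buffer = io.StringIO()
write_gnuplot_grid(buffer, x, y, z)
print(buffer.getvalue())
```

To produce a file for splot, pass an open file object instead of the StringIO buffer, for example with open("datafile.dat", "w") as f: write_gnuplot_grid(f, x, y, z).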
Response to your comment
Yes, pm3d does generate (M-1)x(N-1) quadrilaterals as you've alluded to in your comment -- It takes the 4 corners and (by default) averages their value to assign a color. You seem to dislike this -- although (in most cases) I doubt you'd be able to tell a difference in the plot for reasonably large M and N (larger than 20). So, before we go on, you may want to ask yourself if it is really necessary to plot EVERY POINT.
That being said, with a little work, gnuplot can still do what you want. The solution is to specify that a particular corner is to be used to assign the color to the entire quadrilateral.
#specify that the first corner should be used for coloring the quadrilateral
set pm3d corners2color c1 #could also be c2,c3, or c4.
Then simply append the last row and last column of your matrix so that they are plotted twice (making up an extra gridpoint to accommodate the larger dataset). You're not quite there yet: you still need to shift your grid values by half a cell so that your quadrilaterals are centered on the point in question. Which way you shift the cells depends on your choice of corner (c1,c2,c3,c4); you'll need to play around with it to figure out which one you want.
Note that the problem here isn't gnuplot. It's that there isn't enough information in the datafile to construct an MxN surface given MxN triples. At each point, you need to know its position (x,y), its value (z), and also the size of the quadrilateral to be drawn there, which is more information than you've packed into the file. Of course, you can guess the size at the interior points (just meet halfway), but there's no guessing at the exterior points. "But why not just use the size of the next interior point?" That's a good question, and it would (typically) work well for rectangular grids, but that is only a special case (although a common one), which would (likely) fail miserably for many other grids. The point is that gnuplot decided that averaging the corners is typically "close enough", but then gives you the option to change it.
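The "meet halfway" guess for a rectangular grid can be sketched as a preprocessing step before writing the datafile. This is a Python illustration under the assumption discussed above (outer cells reuse the half-width of their interior neighbor); the function name is made up and nothing here is built into gnuplot.

```python
def centers_to_edges(centers):
    """Guess cell edges from cell centers.

    Interior edges sit halfway between neighboring centers; the two outer
    edges reuse the half-width of the adjacent interior cell (an assumption
    that only makes sense for rectangular grids).
    """
    if len(centers) < 2:
        raise ValueError("need at least two centers to guess a cell size")
    mids = [(a + b) / 2 for a, b in zip(centers[:-1], centers[1:])]
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    return [first] + mids + [last]


print(centers_to_edges([1, 4, 8]))   # [-0.5, 2.5, 6.0, 10.0]
```

Applying this to both axes gives (M+1)x(N+1) corner coordinates for MxN values, which is the extra row and column that a single-corner coloring such as corners2color c1 needs.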
See the explanation for the input data here. You may have to change your data file's format accordingly.