I am trying to create an image from a matrix z2 over a raster defined by np.meshgrid(grid_x, grid_y), such that the value of the image at vx=grid_x[i], vy=grid_y[j] is z2[i, j]. On top of this image I want to overlay a scatter plot of points given by three vectors x, y, z, where the k-th point has coordinates (x[k], y[k]) and value z[k]. All of the scattered points lie within the region covered by the raster.
Here is an example of the data I am trying to plot.
import numpy as np
np.random.seed(1)
z2 = np.ones((1000, 1000)) * 0.66
z2[0, 0] = 0
z2[-1, -1] = 1
x = np.random.rand(1000) * 1000
y = np.random.rand(1000) * 1000
z = np.random.rand(1000)
grid_x = np.linspace(0, 999, 1000)
grid_y = np.linspace(0, 999, 1000)
To do this, I use a 2D plot in which the x and y values define the positions of the points and z is indicated by a color drawn from a colormap.
The image has three requirements: 1) there should be no white space between the actual plot and the edge of the figure; 2) the unit lengths on the x and y axes should be equal; 3) the image should not be too large. To achieve this, I am using the following code for plotting.
import matplotlib.pyplot as plt
from matplotlib import cm

def plot_img(x, y, z, grid_x, grid_y, z2, set_fig_size=True):
    # determine the figure size
    if set_fig_size:
        height, width = np.array(z2.shape, dtype=float)
        dpi = max(max(640 // height, 640 // width), 1)
        width, height = width * dpi, height * dpi
        plt.gcf().set_size_inches(width, height)
        plt.gcf().set_dpi(dpi)
    # plot the figure
    plt.gca().axis('off')
    plt.gca().axis('equal')
    plt.gca().set_position([0, 0, 1, 1])
    plt.xlim((grid_x[0], grid_x[-1]))
    plt.ylim((grid_y[0], grid_y[-1]))
    # the raster
    cmap = cm.get_cmap('gray')
    cmap.set_bad(color='red', alpha=0.5)
    plt.imshow(z2, cmap=cmap, interpolation='none', origin='lower',
               extent=(grid_x[0], grid_x[-1], grid_y[0], grid_y[-1]))
    # the scatter plot
    min_z, max_z = np.min(z), np.max(z)
    c = (z - min_z) / (max_z - min_z)
    plt.scatter(x, y, marker='o', c=c, cmap='Greens')
    plt.show()
Strangely, when I run plot_img(x, y, z, grid_x, grid_y, z2) using the aforementioned example data, the following image shows up.
Essentially, only the raster data got plotted; the scattered points do not show up at all.
I then tried plot_img(x, y, z, grid_x, grid_y, z2, set_fig_size=False). The result is
Note that, to clearly show the white space in the figure, I kept the surrounding PyCharm background. Essentially, there is white space that I do not want included in this figure.
I wonder why this is happening, and how I can fix the code to get the correct output, which is essentially the second result without the white space. Thanks!
Replace your dpi and figsize code by
# determine the figure size
height, width = np.array(z2.shape, dtype=float)
dpi = 200
# get size in inches:
width, height = width / dpi, height / dpi
plt.gcf().set_size_inches(width, height)
plt.gcf().set_dpi(dpi)
and you will have a 1000x1000 pixel figure, which at 200 dpi is 5"x5".
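(As for why the scatter vanished in the first place: with the original sizing code dpi works out to 1, and scatter marker sizes are given in points, i.e. fractions of an inch, so at 1 dpi the markers are far smaller than a pixel and effectively invisible.) For completeness, here is a sketch of how the fix slots into plot_img; apart from the sizing block it is the question's code unchanged:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

def plot_img(x, y, z, grid_x, grid_y, z2, set_fig_size=True):
    if set_fig_size:
        # size in inches = pixel size / dpi, so the figure gets
        # one pixel per raster cell
        height, width = np.array(z2.shape, dtype=float)
        dpi = 200
        plt.gcf().set_size_inches(width / dpi, height / dpi)
        plt.gcf().set_dpi(dpi)
    plt.gca().axis('off')
    plt.gca().axis('equal')
    plt.gca().set_position([0, 0, 1, 1])   # axes fill the whole figure, no margins
    plt.xlim((grid_x[0], grid_x[-1]))
    plt.ylim((grid_y[0], grid_y[-1]))
    cmap = cm.get_cmap('gray')
    cmap.set_bad(color='red', alpha=0.5)
    plt.imshow(z2, cmap=cmap, interpolation='none', origin='lower',
               extent=(grid_x[0], grid_x[-1], grid_y[0], grid_y[-1]))
    c = (z - np.min(z)) / (np.max(z) - np.min(z))
    plt.scatter(x, y, marker='o', c=c, cmap='Greens')
    plt.show()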
I am trying to plot some data with a discrete color bar. I was following this example (https://gist.github.com/jakevdp/91077b0cae40f8f8244a), but the issue is that it does not carry over one-to-one to a different tick spacing: the example in the link increases by 1, whereas my data increases by 0.5. You can see the output from the code I have below. Any help with this would be appreciated; I know I am missing something key here but can't figure it out.
import matplotlib.pylab as plt
import numpy as np

def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # Note that if base_cmap is a string or None, you can simply do
    #    return plt.cm.get_cmap(base_cmap, N)
    # The following works for string, None, or a colormap instance:
    base = plt.cm.get_cmap(base_cmap)
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)

num = 11

x = np.random.randn(40)
y = np.random.randn(40)
c = np.random.randint(num, size=40)

plt.figure(figsize=(10, 7.5))
plt.scatter(x, y, c=c, s=50, cmap=discrete_cmap(num, 'jet'))
plt.colorbar(ticks=np.arange(0, 5.5, 0.5))
plt.clim(-0.5, num - 0.5)
plt.show()
Not sure what version of matplotlib/pyplot introduced this, but plt.get_cmap now accepts an int argument specifying the number of colors you want, which gives you a discrete colormap.
This automatically results in the colorbar being discrete.
By the way, pandas handles the colorbar even more conveniently (see the short sketch after the code below).
import numpy as np
from matplotlib import pyplot as plt
plt.style.use('ggplot')
# remove if not using Jupyter/IPython
%matplotlib inline
# choose number of clusters and number of points in each cluster
n_clusters = 5
n_samples = 20
# there are fancier ways to do this
clusters = np.array([k for k in range(n_clusters) for i in range(n_samples)])
# generate the coordinates of the center
# of each cluster by shuffling a range of values
clusters_x = np.arange(n_clusters)
clusters_y = np.arange(n_clusters)
np.random.shuffle(clusters_x)
np.random.shuffle(clusters_y)
# get dicts like cluster -> center coordinate
x_dict = dict(enumerate(clusters_x))
y_dict = dict(enumerate(clusters_y))
# get coordinates of cluster center for each point
x = np.array(list(x_dict[k] for k in clusters)).astype(float)
y = np.array(list(y_dict[k] for k in clusters)).astype(float)
# add noise
x += np.random.normal(scale=0.5, size=n_clusters*n_samples)
y += np.random.normal(scale=0.5, size=n_clusters*n_samples)
### Finally, plot
fig, ax = plt.subplots(figsize=(12,8))
# get discrete colormap
cmap = plt.get_cmap('viridis', n_clusters)
# scatter points
scatter = ax.scatter(x, y, c=clusters, cmap=cmap)
# scatter cluster centers
ax.scatter(clusters_x, clusters_y, c='red')
# add colorbar
cbar = plt.colorbar(scatter)
# set ticks locations (not very elegant, but it works):
# - shift by 0.5
# - scale so that the last value is at the center of the last color
tick_locs = (np.arange(n_clusters) + 0.5)*(n_clusters-1)/n_clusters
cbar.set_ticks(tick_locs)
# set tick labels (as before)
cbar.set_ticklabels(np.arange(n_clusters))
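Regarding the pandas remark above, here is a minimal sketch of what I mean (my own toy example; the DataFrame and column names are made up): DataFrame.plot.scatter adds the colorbar for you when c names a column, and passing a discrete colormap keeps the colorbar discrete as well.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical toy data: 100 points in 5 clusters
df = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100),
    'cluster': np.random.randint(5, size=100),
})

# pandas draws the colorbar automatically when c= names a column
df.plot.scatter(x='x', y='y', c='cluster', cmap=plt.get_cmap('viridis', 5))
plt.show()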
OK, so this is the hack I found for my own question. I am sure there is a better way to do this, but it works for what I am doing. Feel free to suggest a better approach.
import numpy as np
import matplotlib.pylab as plt

def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # Note that if base_cmap is a string or None, you can simply do
    #    return plt.cm.get_cmap(base_cmap, N)
    # The following works for string, None, or a colormap instance:
    base = plt.cm.get_cmap(base_cmap)
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)

num = 11

plt.figure(figsize=(10, 7.5))
x = np.random.randn(40)
y = np.random.randn(40)
c = np.random.randint(num, size=40)

plt.scatter(x, y, c=c, s=50, cmap=discrete_cmap(num, 'jet'))
cbar = plt.colorbar(ticks=range(num))
plt.clim(-0.5, num - 0.5)
cbar.ax.set_yticklabels(np.arange(0.0, 5.5, 0.5))
plt.show()
For some reason I cannot upload the image associated with the code above (I get an error when uploading), so I am not sure how to show the final result. But in short: I kept the colorbar ticks at the integer color positions and set the tick labels of the vertical color bar to the values I actually want (0.0 to 5.0 in steps of 0.5), and that produced the correct output.
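If you want to avoid relabelling ticks altogether, a cleaner alternative (my own suggestion, not part of the hack above) is matplotlib's BoundaryNorm, which maps data that really is spaced by 0.5 onto the discrete colormap, so the colorbar ticks can show the actual data values:

import numpy as np
import matplotlib.pylab as plt
from matplotlib.colors import BoundaryNorm

num = 11                               # 11 discrete values: 0.0, 0.5, ..., 5.0
values = np.arange(0, 5.5, 0.5)

x = np.random.randn(40)
y = np.random.randn(40)
c = np.random.choice(values, size=40)  # data really spaced by 0.5

# one color bin per value, each bin centered on its value
boundaries = np.arange(-0.25, 5.5, 0.5)
cmap = plt.get_cmap('jet', num)
norm = BoundaryNorm(boundaries, cmap.N)

plt.figure(figsize=(10, 7.5))
plt.scatter(x, y, c=c, s=50, cmap=cmap, norm=norm)
plt.colorbar(ticks=values)
plt.show()

Here the boundaries sit half a step on either side of each value, so every data value falls in the middle of its own color bin.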
I have a masked array which is used by matplotlib.pyplot.contourf to project a temperature contour on a global map. I was trying to smooth the contour, but unfortunately none of the proposed solutions seems to be able to handle masked arrays. I tested these solutions:

- scipy.ndimage.gaussian_filter
- moving averages
- scipy.ndimage.zoom

None of them works: they all include the masked values in the calculation. Is there any way I can smooth my contour on a masked array?
I have added this part after trying the proposed 'inpaint' solution; the results were unchanged. Here is the code (if it helps):
import Scientific.IO.NetCDF as S
import mpl_toolkits.basemap as bm
import numpy.ma as MA
import numpy as np
import matplotlib.pyplot as plt
import inpaint

def main():
    fileobj = S.NetCDFFile('Bias.ANN.tas_A1_1.nc', mode='r')

    # take the values
    set1 = {'time', 'lat', 'lon'}
    set2 = set(fileobj.variables.keys())
    set3 = set2 - set1
    datadim = set3.pop()
    print "******************datadim: " + datadim
    data = fileobj.variables[datadim].getValue()[0, :, :]

    lon = fileobj.variables['lon'].getValue()
    lat = fileobj.variables['lat'].getValue()
    fileobj.close()

    data, lon = bm.shiftgrid(180., data, lon, start=False)
    data = MA.masked_equal(data, 1.0e20)
    #data2 = inpaint.replace_nans(data, 10, 0.25, 2, 'idw')

    #- Make 2-D longitude and latitude arrays:
    [lon2d, lat2d] = np.meshgrid(lon, lat)

    #- Set up map:
    mapproj = bm.Basemap(projection='cyl',
                         llcrnrlat=-90.0, llcrnrlon=-180.00,
                         urcrnrlat=90.0, urcrnrlon=180.0)
    mapproj.drawcoastlines(linewidth=.5)
    mapproj.drawmapboundary(fill_color='.8')
    #mapproj.drawparallels(N.array([-90, -45, 0, 45, 90]), labels=[1,0,0,0])
    #mapproj.drawmeridians(N.array([0, 90, 180, 270, 360]), labels=[0,0,0,1])
    lonall, latall = mapproj(lon2d, lat2d)

    cmap = plt.cm.Spectral

    #- Make a contour plot of the temperature:
    mymapf = plt.contourf(lonall, latall, data, 20, cmap=cmap)
    #plt.clabel(mymapf, fontsize=12)
    plt.title(cmap.name)
    plt.colorbar(mymapf, orientation='horizontal')
    plt.savefig('sample2.png', dpi=150, edgecolor='red', format='png',
                bbox_inches='tight', pad_inches=.2)
    plt.close()

if __name__ == "__main__":
    main()
I am comparing the output from this code (the first figure) with the output of the same data file from Panoply. Zooming in and looking more closely, it seems it is not really a smoothness problem: the pyplot version is one stripe slimmer, or the contours are cut off earlier (the outer boundaries show this clearly, and the inner contours differ as a consequence of this). That makes the pyplot result look less smooth than the Panoply one. How can I get (nearly) the same result? Am I interpreting this correctly?
I had a similar problem and Google pointed me to this blog post. Basically the author uses an inpainting algorithm to interpolate the missing values and produce a valid array for filtering.
The code is at the end of the post; you can save it to site-packages (or elsewhere) and load it as a module (e.g. inpaint.py):
import inpaint
filled = inpaint.replace_nans(NANMask, 5, 0.5, 2, 'idw')
I'm happy with the result, and I guess it will suit missing temperature values just fine. There is also a newer version here: github, but that code will need some cleaning for general usage as it is part of a larger project.
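To spell out how this fits the question's masked temperature field, here is a rough sketch of the workflow (my own sketch, not from the blog; `data` stands for the masked 2-D array from the question's code, and the filter and its parameters are just examples): convert masked cells to NaN, let inpaint fill them, smooth, then re-apply the original mask so the invalid region stays hidden in the plot.

import numpy as np
import numpy.ma as MA
import scipy.ndimage
import inpaint

# `data` is assumed to be the masked 2-D field from the question
filled = data.filled(np.nan)                     # masked cells -> NaN
filled = inpaint.replace_nans(filled, 5, 0.5, 2, 'idw')

# an ordinary filter now works, because no invalid cells are left
smoothed = scipy.ndimage.gaussian_filter(filled, sigma=1)

# restore the original mask so contourf still leaves the masked region blank
smoothed = MA.masked_array(smoothed, mask=data.mask)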
For reference, ease of use, and preservation's sake, I'll post the code (of the initial version) here:
# -*- coding: utf-8 -*-
"""A module for various utilities and helper functions"""
import numpy as np
#cimport numpy as np
#cimport cython
DTYPEf = np.float64
#ctypedef np.float64_t DTYPEf_t
DTYPEi = np.int32
#ctypedef np.int32_t DTYPEi_t
#@cython.boundscheck(False) # turn off bounds-checking for entire function
#@cython.wraparound(False)  # turn off wraparound-checking for entire function
def replace_nans(array, max_iter, tol, kernel_size=1, method='localmean'):
    """Replace NaN elements in an array using an iterative image inpainting algorithm.

    The algorithm is the following:

    1) For each element in the input array, replace it by a weighted average
       of the neighbouring elements which are not NaN themselves. The weights
       depend on the method type. If ``method=localmean``, weights are equal
       to 1/((2*kernel_size+1)**2 - 1).

    2) Several iterations are needed if there are adjacent NaN elements.
       If this is the case, information is "spread" from the edges of the missing
       regions iteratively, until the variation is below a certain threshold.

    Parameters
    ----------
    array : 2d np.ndarray
        an array containing NaN elements that have to be replaced
    max_iter : int
        the number of iterations
    tol : float
        the mean square difference between iterations below which to stop
    kernel_size : int
        the size of the kernel, default is 1
    method : str
        the method used to replace invalid values. Valid options are
        `localmean`, `idw`.

    Returns
    -------
    filled : 2d np.ndarray
        a copy of the input array, where NaN elements have been replaced.
    """
    # cdef int i, j, I, J, it, n, k, l
    # cdef int n_invalids

    filled = np.empty([array.shape[0], array.shape[1]], dtype=DTYPEf)
    kernel = np.empty((2*kernel_size+1, 2*kernel_size+1), dtype=DTYPEf)

    # cdef np.ndarray[np.int_t, ndim=1] inans
    # cdef np.ndarray[np.int_t, ndim=1] jnans

    # indices where array is NaN
    inans, jnans = np.nonzero(np.isnan(array))

    # number of NaN elements
    n_nans = len(inans)

    # arrays which contain replaced values to check for convergence
    replaced_new = np.zeros(n_nans, dtype=DTYPEf)
    replaced_old = np.zeros(n_nans, dtype=DTYPEf)

    # depending on kernel type, fill kernel array
    if method == 'localmean':
        print 'kernel_size', kernel_size
        for i in range(2*kernel_size+1):
            for j in range(2*kernel_size+1):
                kernel[i, j] = 1
        print kernel, 'kernel'
    elif method == 'idw':
        kernel = np.array([[0,   0.5,  0.5,  0.5,  0],
                           [0.5, 0.75, 0.75, 0.75, 0.5],
                           [0.5, 0.75, 1,    0.75, 0.5],
                           [0.5, 0.75, 0.75, 0.75, 0.5],
                           [0,   0.5,  0.5,  0.5,  0]])
        print kernel, 'kernel'
    else:
        raise ValueError('method not valid. Should be one of `localmean` or `idw`.')

    # fill new array with input elements
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            filled[i, j] = array[i, j]

    # make several passes
    # until we reach convergence
    for it in range(max_iter):
        print 'iteration', it
        # for each NaN element
        for k in range(n_nans):
            i = inans[k]
            j = jnans[k]

            # initialize to zero
            filled[i, j] = 0.0
            n = 0

            # loop over the kernel
            for I in range(2*kernel_size+1):
                for J in range(2*kernel_size+1):

                    # if we are not out of the boundaries
                    if 0 <= i+I-kernel_size < array.shape[0]:
                        if 0 <= j+J-kernel_size < array.shape[1]:

                            # if the neighbour element is not NaN itself
                            # (NaN is the only value that is not equal to itself)
                            if filled[i+I-kernel_size, j+J-kernel_size] == filled[i+I-kernel_size, j+J-kernel_size]:

                                # do not sum the central element itself
                                if I-kernel_size != 0 or J-kernel_size != 0:

                                    # convolve kernel with original array
                                    filled[i, j] = filled[i, j] + filled[i+I-kernel_size, j+J-kernel_size]*kernel[I, J]
                                    n = n + 1*kernel[I, J]

            # divide value by effective number of added elements
            if n != 0:
                filled[i, j] = filled[i, j] / n
                replaced_new[k] = filled[i, j]
            else:
                filled[i, j] = np.nan

        # check if mean square difference between values of replaced
        # elements is below a certain tolerance
        print 'tolerance', np.mean((replaced_new - replaced_old)**2)
        if np.mean((replaced_new - replaced_old)**2) < tol:
            break
        else:
            for l in range(n_nans):
                replaced_old[l] = replaced_new[l]

    return filled
def sincinterp(image, x, y, kernel_size=3):
    """Re-sample an image at intermediate positions between pixels.

    This function uses a cardinal interpolation formula which limits
    the loss of information in the resampling process. It uses a limited
    number of neighbouring pixels.

    The new image :math:`im^+` at fractional locations :math:`x` and :math:`y` is computed as:

    .. math::

        im^+(x,y) = \sum_{i=-\mathtt{kernel\_size}}^{i=\mathtt{kernel\_size}} \sum_{j=-\mathtt{kernel\_size}}^{j=\mathtt{kernel\_size}} \mathtt{image}(i,j) \sin[\pi(i-\mathtt{x})] \sin[\pi(j-\mathtt{y})] / \pi(i-\mathtt{x}) / \pi(j-\mathtt{y})

    Parameters
    ----------
    image : np.ndarray, dtype np.int32
        the image array.
    x : two-dimensional np.ndarray of floats
        an array containing fractional pixel row
        positions at which to interpolate the image
    y : two-dimensional np.ndarray of floats
        an array containing fractional pixel column
        positions at which to interpolate the image
    kernel_size : int
        interpolation is performed over a ``(2*kernel_size+1)*(2*kernel_size+1)``
        submatrix in the neighbourhood of each interpolation point.

    Returns
    -------
    im : np.ndarray, dtype np.float64
        the interpolated value of ``image`` at the points specified
        by ``x`` and ``y``
    """
    # indices
    # cdef int i, j, I, J

    # the output array
    r = np.zeros([x.shape[0], x.shape[1]], dtype=DTYPEf)

    pi = np.pi

    # for each point of the output array
    for I in range(x.shape[0]):
        for J in range(x.shape[1]):

            # loop over all neighbouring grid points
            for i in range(int(x[I, J])-kernel_size, int(x[I, J])+kernel_size+1):
                for j in range(int(y[I, J])-kernel_size, int(y[I, J])+kernel_size+1):
                    # check that we are within the boundaries
                    if 0 <= i < image.shape[0] and 0 <= j < image.shape[1]:
                        if (i-x[I, J]) == 0.0 and (j-y[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j]
                        elif (i-x[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(j-y[I, J]))/(pi*(j-y[I, J]))
                        elif (j-y[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(i-x[I, J]))/(pi*(i-x[I, J]))
                        else:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(i-x[I, J]))*np.sin(pi*(j-y[I, J]))/(pi*pi*(i-x[I, J])*(j-y[I, J]))
    return r

#cdef extern from "math.h":
#    double sin(double)
A simple smoothing function that works with masked data will solve this. One can then avoid the approaches that involve making up data (i.e., interpolating, inpainting, etc.); and making up data should always be avoided.
The main issue when smoothing masked data is that, for each point, smoothing uses the neighboring values to calculate a new value at the center point, but when those neighbors are masked, the new value for the center point will also become masked due to the rules of masked arrays. Therefore, one needs to do the calculation with the unmasked data and explicitly account for the mask. That's easy to do, as is done in the function smooth below.
from numpy import *
import pylab as plt

# make a grid and a striped mask as test data
N = 100
x = linspace(0, 5, N, endpoint=True)
grid = 2. + 1.*(sin(2*pi*x)[:, newaxis]*sin(2*pi*x) > 0.)
m = resize((sin(pi*x) > 0), (N, N))

plt.imshow(grid.copy(), cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('original data')

def smooth(u, mask):
    m = ~mask
    r = u*m  # set all 'masked' points to 0. so they aren't used in the smoothing
    a = 4*r[1:-1,1:-1] + r[2:,1:-1] + r[:-2,1:-1] + r[1:-1,2:] + r[1:-1,:-2]
    b = 4*m[1:-1,1:-1] + m[2:,1:-1] + m[:-2,1:-1] + m[1:-1,2:] + m[1:-1,:-2]  # a divisor that accounts for masked points
    b[b==0] = 1.  # for avoiding divide by 0 error (region is masked so value doesn't matter)
    u[1:-1,1:-1] = a/b

# run the data through the smoothing filter a few times
for i in range(10):
    smooth(grid, m)

mg = ma.array(grid, mask=m)  # put together the mask and the data

plt.figure()
plt.imshow(mg, cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('smoothed with mask')
plt.show()
The main point is that at the boundary of the mask, the masked values are not used in the smoothing. (This is also where the grid squares switch values, so it would be clear in the figure if the masked neighboring values were being used.)
We also just had this problem and the astropy package has us covered:
import numpy as np
import matplotlib.pyplot as plt
# Some Axes
x = np.arange(100)
y = np.arange(100)
#Some Interesting Shape
z = np.array(np.outer(np.sin((x+y)/10),np.sin(y/3)),dtype=float)
# some mask
mask = np.outer(np.sin((x+y)/20),np.sin(y/5))**2>.9
# masked data represent noise, so lets put in some trash into the masked points
z[mask] = (np.random.random(size = (100,100))*10)[mask]
# masked data
z_masked = np.ma.masked_array(z, mask)
# "Conventional" filter
filter_kernelsize = 2
import scipy.ndimage
z_filtered_bad = scipy.ndimage.gaussian_filter(z_masked,filter_kernelsize)
# Lets filter it with astropy, which respects the mask
from astropy.convolution import convolve, Gaussian2DKernel
k = Gaussian2DKernel(1.5)
z_filtered = convolve(z_masked, k, boundary='extend')
### Plots:
fig, axes = plt.subplots(2,2)
plt.sca(axes[0,0])
plt.title('Raw Data')
plt.imshow(z)
plt.colorbar()
plt.sca(axes[0,1])
plt.title('Raw Data Masked')
plt.imshow(z_masked)
plt.colorbar()
plt.sca(axes[1,0])
plt.title('ndimage filter (ignores mask)')
plt.imshow(z_filtered_bad)
plt.colorbar()
plt.sca(axes[1,1])
plt.title('astropy filter (uses mask)')
plt.imshow(z_filtered)
plt.colorbar()
plt.tight_layout()
Output plot of the code