Numerical evaluation of arctangent functions - numpy

I am having difficulty interpreting the results of arctangent functions. The behaviour is consistent across all implementations I have come across, so I will limit myself here to NumPy and MATLAB.
The idea is to have a circle of randomly placed points. The goal is to represent their positions in a polar coordinate system and, since they are uniformly distributed, I expect the θ angle (which is calculated using the atan2 function) to also be uniformly distributed over the interval -π ... π.
Here is the code for MATLAB:
stp = 2*pi/2^8;
siz = 100;
num = 100000000;
x = randi([-siz, siz], [1, num]);
y = randi([-siz, siz], [1, num]);
m = (x.^2+y.^2) < siz^2;
[t, ~] = cart2pol(x(m), y(m));
figure()
histogram(t, -pi:stp:pi);
And here for Python & NumPy:
import numpy as np
import matplotlib.pyplot as pl
siz = 100
num = 100000000
rng = np.random.default_rng()
x = rng.integers(low=-siz, high=siz, size=num, endpoint=True)
y = rng.integers(low=-siz, high=siz, size=num, endpoint=True)
m = (x**2+y**2) < siz**2
t = np.arctan2(y[m], x[m])
pl.hist(t, range=[-np.pi, np.pi], bins=2**8)
pl.show()
In both cases I get results that look like this, where one can easily see "steps" at each multiple of π/4.
It looks like some sort of precision error, but strangely at angles where I would not expect it. The behaviour is also present for the ordinary atan function.

Notice that you are using integers.
So for each pair (p, q), reduced so that gcd(p, q) = 1, there are floor(r / sqrt(p**2 + q**2)) integer points inside the circle of radius r that give the same angle arctan2(q, p), namely the multiples of the reduced pair (p, q).
Notice also that p**2 + q**2 is 1 for the multiples of pi/2 and 2 for the odd multiples of pi/4. From this we can predict that there will be more points at the even multiples of pi/4 (the multiples of pi/2) than at the odd multiples of pi/4, and this agrees with what we see in your plot.
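As a quick numerical sanity check of that formula (my own sketch, not part of the original answer), we can count the lattice points that lie on a given ray inside a circle of radius R and compare with floor(R / sqrt(p**2 + q**2)):
import numpy as np

R = 100
X, Y = np.meshgrid(np.arange(-R, R + 1), np.arange(-R, R + 1))
inside = (X**2 + Y**2 <= R**2) & ((X != 0) | (Y != 0))  # lattice points in the disc, origin excluded

def count_same_angle(p, q):
    # count lattice points in the disc on the ray from the origin through (p, q)
    on_ray = (X * q == Y * p) & (np.sign(X) == np.sign(p)) & (np.sign(Y) == np.sign(q))
    return int(np.sum(inside & on_ray))

print(count_same_angle(1, 0), int(R / np.hypot(1, 0)))  # angle 0, a multiple of pi/2: 100 100
print(count_same_angle(1, 1), int(R / np.hypot(1, 1)))  # angle pi/4, an odd multiple: 70 70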
Example
Let's plot the points with integer coordinates that lie in a circle of radius 10.
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

def gcd(a, b):
    if a == 0 or b == 0:
        return max(a, b)
    while b != 0:
        a, b = b, a % b
    return a
R = 10
x, y = np.indices((R+1, R+1))
m = (x**2 + y**2) <= R**2
x, y = x[m], y[m]
t = np.linspace(0, np.pi / 2)
plt.figure(figsize=(6, 6))
plt.plot(x, y, 'o')
plt.plot(R * np.cos(t), R * np.sin(t))
lines = Counter((xi / gcd(xi, yi), yi / gcd(xi, yi)) for xi, yi in zip(x, y))
plt.axis('off')
for (x, y), f in lines.items():
    if f != 1:
        r = np.sqrt(x**2 + y**2)
        plt.plot([0, R*x/r], [0, R*y/r], alpha=0.25)
        plt.text(R*1.03*x/r, R*1.03*y/r, f'{int(y)}/{int(x)}: {f}')
Here you can see on the plot that a few points share the same angle as others. At 45 degrees there are 7 points, and at the multiples of 90 degrees there are 10. Many of the points have a unique angle.
Basically you have many angles that hit few points and a few angles that hit many points.
But overall the points are distributed nearly uniformly with respect to angle. Below I plot the cumulative frequency, which is nearly a straight line (what it would be if the distribution were uniform), while the bin frequencies form a triangular fractal-like pattern.
R = 20
x,y = np.indices((R+1, R+1))
m = (x**2 + y**2) <= R**2
x,y = x[m], y[m]
plt.figure(figsize=(6,6))
plt.subplot(211)
plt.plot(np.sort(np.arctan2(x,y))*180/np.pi, np.arange(len(x)), '.', markersize=1)
plt.subplot(212)
plt.plot(np.arctan2(x,y)*180/np.pi, np.gcd(x,y), '.', markersize=4)
If the size of the circle increases and you make a histogram with sufficiently wide bins, you will not notice the variations; otherwise you will see this pattern in the histogram.
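A complementary sketch of my own (not part of the original answer), reusing the variable names from the NumPy snippet in the question: if the coordinates are drawn as continuous uniform values instead of integers, the steps in the θ histogram disappear, which confirms that the integer grid is the cause.
import numpy as np
import matplotlib.pyplot as pl

siz = 100
num = 1_000_000  # fewer samples than the original, enough to see the effect
rng = np.random.default_rng()

# continuous coordinates instead of integer ones
x = rng.uniform(-siz, siz, size=num)
y = rng.uniform(-siz, siz, size=num)
m = (x**2 + y**2) < siz**2

t = np.arctan2(y[m], x[m])
pl.hist(t, range=[-np.pi, np.pi], bins=2**8)  # essentially flat now
pl.show()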

Related

Is the pyplot contour plot using bin centers or bin edges?

I have trouble understanding the documentation for matplotlib.pyplot.contour (https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.contour.html). The documentation says that the arguments for contour are X,Y-coordinates and Z heights.
However, my plot is offset by a half bin width. So it seems to me as if the contour plot needs x_edges and y_edges instead. Could someone help me understand how this works and what is the proper way to use this function?
Documentation says this:
X, Y: array-like, optional
The coordinates of the values in Z.
X and Y must both be 2D with the same shape as Z (e.g. created via numpy.meshgrid), or they must both be 1-D such that len(X) == N is the number of columns in Z and len(Y) == M is the number of rows in Z. [...]
Z(M, N): array-like
The height values over which the contour is drawn.
This is my code:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
values = np.tile(3,(10,2)) # array of [[3,3] ... [3,3]]
X = Y = np.arange(0.5, 6.5, 1) # 0.5 ... 5.5
edges = np.arange(0, 7, 1) # 0, 1, 2, 3, 4, 5, 6
Z, xedges, yedges = np.histogram2d(values[0], values[1],
                                   bins=(edges, edges))
# len(edges) == 7; len(X) == len(Y) == 6
# Z.shape == (6, 6)
# contour reference says use "The coordinates of the values in Z."
c = ax.contour(X, Y, Z)
plt.savefig('peak_at_3p5.png')
plt.cla()
# however I only get the intended result when using edges
c = ax.contour(xedges[:-1], yedges[:-1], Z)
plt.savefig('peak_at_3p0.png')
Here's the result:
The two figures that the code produces
I also tried using numpy.meshgrid to construct X and Y as 2D arrays but the peak in the figure is still offset by a half bin width.
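For what it's worth, a sketch of my own (not an authoritative answer): contour wants the coordinates of the values in Z, while np.histogram2d returns bin edges, so one common approach is to convert the edges to bin centers before plotting. Whether the peak should then sit at the bin center (3.5) or at the lower edge (3.0) depends on what you want the coordinates to represent.
import numpy as np
import matplotlib.pyplot as plt

values = np.tile(3, (10, 2))
edges = np.arange(0, 7, 1)
Z, xedges, yedges = np.histogram2d(values[0], values[1], bins=(edges, edges))

# bin centers: midpoints of consecutive edges
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
ax.contour(xcenters, ycenters, Z)  # coordinates now refer to the middle of each bin
plt.show()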

Initial conditions in odeint or solve_ivp

I am new to matplotlib/python. I'd like to solve some simple ODEs with odeint or solve_ivp on some interval, say for example the segment [0, 10], in which the "initial condition" is at t_0 = 5 and not t_0 = 0. It seems to me that both odeint and solve_ivp take, as the value y_0, the initial condition, that is, y_0 is the value of the function at the beginning of the specified time array t.
How can I solve an ODE with a condition in the middle of my interval of integration?
Let's say you have the following ODE for a body in the earth's gravitational field:
'''
x = distance from sea level in meters
R = the radius of the earth in meters
g = the gravitational acceleration at sea level in meters/second^2
v = velocity in meters/s
'''
v * dv / dx = -g * R^2 / (R + x)^2
A ball is tossed from a building that is 30 m high at a velocity of 20 m/s. The non-zero boundary condition is therefore:
v(30) = 20
To plot this solution, shift the values on the x coordinate so that the curve starts at 30 meters (the height of the building), not at 0 meters (the ground).
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def model(v, x):
    # constants
    g = 9.80665   # gravitational acceleration at sea level
    R = 6.371e6   # radius of the earth
    gamma = -g * (R / (R + x))**2
    dvdx = 1 / v * gamma
    return dvdx

# Boundary condition
v_30 = 20  # velocity is 20 m/s at x = 30 m

# x points
x = np.linspace(30, 55)  # start at 30, not 0

# solve ODE
v_x = odeint(model, v_30, x)

# plot results
plt.plot(x, v_x)
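To address the original question more directly, here is a sketch of my own (not part of the answer above): when the condition is given in the middle of the interval, one option is to integrate backward and forward from t_0 with solve_ivp and stitch the two solutions together. The right-hand side f below is only a placeholder.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def f(t, y):
    return -0.5 * y  # placeholder right-hand side dy/dt

t0, y0 = 5.0, 2.0  # condition given at t0 = 5, inside [0, 10]

back = solve_ivp(f, (t0, 0.0), [y0], dense_output=True)   # integrate t0 -> 0
fwd = solve_ivp(f, (t0, 10.0), [y0], dense_output=True)   # integrate t0 -> 10

t = np.linspace(0, 10, 201)
y = np.where(t <= t0,
             back.sol(np.clip(t, 0.0, t0))[0],
             fwd.sol(np.clip(t, t0, 10.0))[0])
plt.plot(t, y)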

Show confidence limits and prediction limits in scatter plot

I have two arrays of data for height and weight:
import numpy as np, matplotlib.pyplot as plt
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
plt.plot(heights,weights,'bo')
plt.show()
How can I produce a plot similar to the following?
Here's what I put together. I tried to closely emulate your screenshot.
Given
import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
# Raw Data
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
Two detailed options to plot confidence intervals:
def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):
    r"""Return an axes of confidence bands using a simple approach.

    Notes
    -----
    .. math:: \left| \: \hat{\mu}_{y|x0} - \mu_{y|x0} \: \right| \; \leq \; T_{n-2}^{.975} \; \hat{\sigma} \; \sqrt{\frac{1}{n}+\frac{(x_0-\bar{x})^2}{\sum_{i=1}^n{(x_i-\bar{x})^2}}}
    .. math:: \hat{\sigma} = \sqrt{\sum_{i=1}^n{\frac{(y_i-\hat{y})^2}{n-2}}}

    References
    ----------
    .. [1] M. Duarte. "Curve fitting," Jupyter Notebook.
       http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/CurveFitting.ipynb
    """
    if ax is None:
        ax = plt.gca()

    ci = t * s_err * np.sqrt(1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))
    ax.fill_between(x2, y2 + ci, y2 - ci, color="#b9cfe7", edgecolor="")

    return ax
def plot_ci_bootstrap(xs, ys, resid, nboot=500, ax=None):
    """Return an axes of confidence bands using a bootstrap approach.

    Notes
    -----
    The bootstrap approach iteratively resamples residuals.
    It plots `nboot` number of straight lines and outlines the shape of a band.
    The density of overlapping lines indicates improved confidence.

    Returns
    -------
    ax : axes
        - Cluster of lines
        - Upper and Lower bounds (high and low) (optional)  Note: sensitive to outliers

    References
    ----------
    .. [1] J. Stults. "Visualizing Confidence Intervals", Various Consequences.
       http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html
    """
    if ax is None:
        ax = plt.gca()

    bootindex = np.random.randint  # the sp.random/sp.polyfit/sp.polyval aliases were removed from SciPy; use NumPy

    for _ in range(nboot):
        resamp_resid = resid[bootindex(0, len(resid) - 1, len(resid))]
        # Make coefficients for the resampled polynomial fit
        pc = np.polyfit(xs, ys + resamp_resid, 1)
        # Plot bootstrap cluster
        ax.plot(xs, np.polyval(pc, xs), "b-", linewidth=2, alpha=3.0 / float(nboot))

    return ax
Code
# Computations ----------------------------------------------------------------
# Modeling with NumPy
def equation(a, b):
    """Return a 1D polynomial."""
    return np.polyval(a, b)

x = heights
y = weights

p, cov = np.polyfit(x, y, 1, cov=True)     # parameters and covariance from the fit of a 1-D polynomial
y_model = equation(p, x)                   # model using the fit parameters; NOTE: parameters here are coefficients

# Statistics
n = weights.size                           # number of observations
m = p.size                                 # number of parameters
dof = n - m                                # degrees of freedom
t = stats.t.ppf(0.975, n - m)              # t-statistic; used for CI and PI bands

# Estimates of Error in Data/Model
resid = y - y_model                        # residuals; difference between actual data and predicted values
chi2 = np.sum((resid / y_model)**2)        # chi-squared; estimates error in data
chi2_red = chi2 / dof                      # reduced chi-squared; measures goodness of fit
s_err = np.sqrt(np.sum(resid**2) / dof)    # standard deviation of the error
# Plotting --------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(8, 6))

# Data
ax.plot(
    x, y, "o", color="#b9cfe7", markersize=8,
    markeredgewidth=1, markeredgecolor="b", markerfacecolor="None"
)

# Fit
ax.plot(x, y_model, "-", color="0.1", linewidth=1.5, alpha=0.5, label="Fit")

x2 = np.linspace(np.min(x), np.max(x), 100)
y2 = equation(p, x2)

# Confidence Interval (select one)
plot_ci_manual(t, s_err, n, x, x2, y2, ax=ax)
#plot_ci_bootstrap(x, y, resid, ax=ax)

# Prediction Interval
pi = t * s_err * np.sqrt(1 + 1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))
ax.fill_between(x2, y2 + pi, y2 - pi, color="None", linestyle="--")
ax.plot(x2, y2 - pi, "--", color="0.5", label="95% Prediction Limits")
ax.plot(x2, y2 + pi, "--", color="0.5")
#plt.show()
The following modifications are optional, originally implemented to mimic the OP's desired result.
# Figure Modifications --------------------------------------------------------
# Borders
ax.spines["top"].set_color("0.5")
ax.spines["bottom"].set_color("0.5")
ax.spines["left"].set_color("0.5")
ax.spines["right"].set_color("0.5")
ax.get_xaxis().set_tick_params(direction="out")
ax.get_yaxis().set_tick_params(direction="out")
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()
# Labels
plt.title("Fit Plot for Weight", fontsize="14", fontweight="bold")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.xlim(np.min(x) - 1, np.max(x) + 1)
# Custom legend
handles, labels = ax.get_legend_handles_labels()
display = (0, 1)
anyArtist = plt.Line2D((0, 1), (0, 0), color="#b9cfe7") # create custom artists
legend = plt.legend(
    [handle for i, handle in enumerate(handles) if i in display] + [anyArtist],
    [label for i, label in enumerate(labels) if i in display] + ["95% Confidence Limits"],
    loc=9, bbox_to_anchor=(0, -0.21, 1., 0.102), ncol=3, mode="expand"
)
frame = legend.get_frame().set_edgecolor("0.5")
# Save Figure
plt.tight_layout()
plt.savefig("filename.png", bbox_extra_artists=(legend,), bbox_inches="tight")
plt.show()
Output
Using plot_ci_manual():
Using plot_ci_bootstrap():
Hope this helps. Cheers.
Details
I believe that since the legend is outside the figure, it does not show up in matplotlib's popup window. It works fine in Jupyter using %matplotlib inline.
The primary confidence interval code (plot_ci_manual()) is adapted from another source producing a plot similar to the OP. You can select a more advanced technique called residual bootstrapping by uncommenting the second option plot_ci_bootstrap().
Updates
This post has been updated with revised code compatible with Python 3.
stats.t.ppf() accepts the lower tail probability. According to the following resources, t = sp.stats.t.ppf(0.95, n - m) was corrected to t = sp.stats.t.ppf(0.975, n - m) to reflect a two-sided 95% t-statistic (or one-sided 97.5% t-statistic).
original notebook and equation
statistics reference (thanks #Bonlenfum and #tryptofan)
verified t-value given dof=17 (a quick check is sketched after this list)
y2 was updated to respond more flexibly with a given model (#regeneration).
An abstracted equation function was added to wrap the model function. Non-linear regressions are possible although not demonstrated. Amend appropriate variables as needed (thanks #PJW).
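A quick check of that critical value (my own sketch, assuming SciPy is available):
from scipy import stats

# two-sided 95% critical value for dof = 17 (n = 19 observations, m = 2 parameters)
print(stats.t.ppf(0.975, 17))  # ~2.1098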
See Also
This post on plotting bands with statsmodels library.
This tutorial on plotting bands and computing confidence intervals with uncertainties library (install with caution in a separate environment).
You can use the seaborn plotting library to create the plot you want.
In [18]: import seaborn as sns
In [19]: heights = np.array([50,52,53,54,58,60,62,64,66,67, 68,70,72,74,76,55,50,45,65])
...: weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
...:
In [20]: sns.regplot(heights,weights, color ='blue')
Out[20]: <matplotlib.axes.AxesSubplot at 0x13644f60>
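Note that with newer seaborn versions (0.12+, as I understand it) the data arguments may need to be passed as keywords:
import seaborn as sns
sns.regplot(x=heights, y=weights, color='blue')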
I need to do this sort of plot occasionally... this was my first time doing it with Python/Jupyter, and this post helped me a lot, especially the detailed Pylang answer.
I know there are 'easier' ways to get there, but I think this way is much more didactic and allows me to learn step by step what's going on. I even learned here that there are 'prediction intervals'! Thanks.
Below is the Pylang code in a more straightforward fashion, including the calculation of Pearson's correlation (and so the r2) and the mean squared error (MSE). Of course, the final plot (!) must be adapted for every dataset...
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
x = heights
y = weights
slope, intercept = np.polyfit(x, y, 1) # linear model adjustment
y_model = np.polyval([slope, intercept], x) # modeling...
x_mean = np.mean(x)
y_mean = np.mean(y)
n = x.size # number of samples
m = 2 # number of parameters
dof = n - m # degrees of freedom
t = stats.t.ppf(0.975, dof) # Students statistic of interval confidence
residual = y - y_model
std_error = (np.sum(residual**2) / dof)**.5 # Standard deviation of the error
# calculating the r2
# https://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/
# Pearson's correlation coefficient
numerator = np.sum((x - x_mean)*(y - y_mean))
denominator = ( np.sum((x - x_mean)**2) * np.sum((y - y_mean)**2) )**.5
correlation_coef = numerator / denominator
r2 = correlation_coef**2
# mean squared error
MSE = 1/n * np.sum( (y - y_model)**2 )
# to plot the adjusted model
x_line = np.linspace(np.min(x), np.max(x), 100)
y_line = np.polyval([slope, intercept], x_line)
# confidence interval
ci = t * std_error * (1/n + (x_line - x_mean)**2 / np.sum((x - x_mean)**2))**.5
# prediction interval
pi = t * std_error * (1 + 1/n + (x_line - x_mean)**2 / np.sum((x - x_mean)**2))**.5
############### Plotting
plt.rcParams.update({'font.size': 14})
fig = plt.figure()
ax = fig.add_axes([.1, .1, .8, .8])
ax.plot(x, y, 'o', color = 'royalblue')
ax.plot(x_line, y_line, color = 'royalblue')
ax.fill_between(x_line, y_line + pi, y_line - pi, color = 'lightcyan', label = '95% prediction interval')
ax.fill_between(x_line, y_line + ci, y_line - ci, color = 'skyblue', label = '95% confidence interval')
ax.set_xlabel('x')
ax.set_ylabel('y')
# rounding and position must be changed for each case and preference
a = str(np.round(intercept))
b = str(np.round(slope,2))
r2s = str(np.round(r2,2))
MSEs = str(np.round(MSE))
ax.text(45, 110, 'y = ' + a + ' + ' + b + ' x')
ax.text(45, 100, '$r^2$ = ' + r2s + ' MSE = ' + MSEs)
plt.legend(bbox_to_anchor=(1, .25), fontsize=12)
For a project of mine, I needed to create intervals for time-series modeling, and to make the procedure more efficient I created tsmoothie: A python library for time-series smoothing and outlier detection in a vectorized way.
It provides different smoothing algorithms together with the possibility to compute intervals.
In the case of linear regression:
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.smoother import *
from tsmoothie.utils_func import sim_randomwalk
# generate 10 randomwalks of length 50
np.random.seed(33)
data = sim_randomwalk(n_series=10, timesteps=50,
                      process_noise=10, measure_noise=30)
# operate smoothing
smoother = PolynomialSmoother(degree=1)
smoother.smooth(data)
# generate intervals
low_pi, up_pi = smoother.get_intervals('prediction_interval', confidence=0.05)
low_ci, up_ci = smoother.get_intervals('confidence_interval', confidence=0.05)
# plot the first smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.plot(smoother.data[0], '.k')
plt.fill_between(range(len(smoother.data[0])), low_pi[0], up_pi[0], alpha=0.3, color='blue')
plt.fill_between(range(len(smoother.data[0])), low_ci[0], up_ci[0], alpha=0.3, color='blue')
In the case of regression with order greater than 1:
# operate smoothing
smoother = PolynomialSmoother(degree=5)
smoother.smooth(data)
# generate intervals
low_pi, up_pi = smoother.get_intervals('prediction_interval', confidence=0.05)
low_ci, up_ci = smoother.get_intervals('confidence_interval', confidence=0.05)
# plot the first smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.plot(smoother.data[0], '.k')
plt.fill_between(range(len(smoother.data[0])), low_pi[0], up_pi[0], alpha=0.3, color='blue')
plt.fill_between(range(len(smoother.data[0])), low_ci[0], up_ci[0], alpha=0.3, color='blue')
I also point out that tsmoothie can carry out the smoothing of multiple time series in a vectorized way. Hope this can help someone.

want to smooth a contour from a masked array

I have a masked array which is used by matplotlib.pyplot.contourf to project a temperature contour on a global map. I was trying to smooth the contour, but unfortunately none of the proposed solutions seems to be able to handle masked arrays. I tested these solutions:
- scipy.ndimage.gaussian_filter - moving averages
- scipy.ndimage.zoom
None of them works (they include the masked values in the computation). Is there any way I can smooth my contour on a masked array?
I have added this part after trying the proposed 'inpaint' solution, and the results were unchanged. Here is the code (if it helps):
import Scientific.IO.NetCDF as S
import mpl_toolkits.basemap as bm
import numpy.ma as MA
import numpy as np
import matplotlib.pyplot as plt
import inpaint

def main():
    fileobj = S.NetCDFFile('Bias.ANN.tas_A1_1.nc', mode='r')

    # take the values
    set1 = {'time', 'lat', 'lon'}
    set2 = set(fileobj.variables.keys())
    set3 = set2 - set1
    datadim = set3.pop()
    print("******************datadim: " + datadim)
    data = fileobj.variables[datadim].getValue()[0,:,:]

    lon = fileobj.variables['lon'].getValue()
    lat = fileobj.variables['lat'].getValue()
    fileobj.close()

    data, lon = bm.shiftgrid(180., data, lon, start=False)
    data = MA.masked_equal(data, 1.0e20)
    #data2 = inpaint.replace_nans(data, 10, 0.25, 2, 'idw')

    #- Make 2-D longitude and latitude arrays:
    [lon2d, lat2d] = np.meshgrid(lon, lat)

    #- Set up map:
    mapproj = bm.Basemap(projection='cyl',
                         llcrnrlat=-90.0, llcrnrlon=-180.00,
                         urcrnrlat=90.0, urcrnrlon=180.0)
    mapproj.drawcoastlines(linewidth=.5)
    mapproj.drawmapboundary(fill_color='.8')
    #mapproj.drawparallels(N.array([-90, -45, 0, 45, 90]), labels=[1,0,0,0])
    #mapproj.drawmeridians(N.array([0, 90, 180, 270, 360]), labels=[0,0,0,1])
    lonall, latall = mapproj(lon2d, lat2d)

    cmap = plt.cm.Spectral

    #- Make a contour plot of the temperature:
    mymapf = plt.contourf(lonall, latall, data, 20, cmap=cmap)
    #plt.clabel(mymapf, fontsize=12)
    plt.title(cmap.name)
    plt.colorbar(mymapf, orientation='horizontal')
    plt.savefig('sample2.png', dpi=150, edgecolor='red', format='png',
                bbox_inches='tight', pad_inches=.2)
    plt.close()

if __name__ == "__main__":
    main()
I am comparing the output from this code (the first figure) with the output for the same data file from Panoply. Zooming in and looking more closely, it seems that it is not a smoothness problem: the pyplot plot is one stripe slimmer, i.e. the contours are cut earlier (the outer boundaries show this clearly, and the inner contours differ because of this). This makes the pyplot plot look less smooth than the Panoply one. How can I get (nearly) the same plot? Am I interpreting this correctly?
I had a similar problem and Google pointed me to this blog post. Basically he's using an inpaint algorithm to interpolate missing values and produce a valid array for filtering.
The code is at the end of the post, and you can save it to site-packages (or elsewhere) and load it as a module (i.e. inpaint.py):
import inpaint
filled = inpaint.replace_nans(NANMask, 5, 0.5, 2, 'idw')
I'm happy with the result, and I guess it will suit missing temperature values just fine. There is also a newer version here: github, but the code will need some cleaning for general usage as it's part of a larger project.
For reference, ease of use and preservation's sake I'll post the code (of the initial version) here:
# -*- coding: utf-8 -*-
"""A module for various utilities and helper functions"""

import numpy as np
#cimport numpy as np
#cimport cython

DTYPEf = np.float64
#ctypedef np.float64_t DTYPEf_t

DTYPEi = np.int32
#ctypedef np.int32_t DTYPEi_t

##cython.boundscheck(False) # turn off bounds-checking for entire function
##cython.wraparound(False)  # turn off bounds-checking for entire function
def replace_nans(array, max_iter, tol, kernel_size=1, method='localmean'):
    """Replace NaN elements in an array using an iterative image inpainting algorithm.

    The algorithm is the following:

    1) For each element in the input array, replace it by a weighted average
       of the neighbouring elements which are not NaN themselves. The weights depend
       on the method type. If ``method=localmean`` weights are equal to 1/( (2*kernel_size+1)**2 -1 )

    2) Several iterations are needed if there are adjacent NaN elements.
       If this is the case, information is "spread" from the edges of the missing
       regions iteratively, until the variation is below a certain threshold.

    Parameters
    ----------
    array : 2d np.ndarray
        an array containing NaN elements that have to be replaced
    max_iter : int
        the number of iterations
    kernel_size : int
        the size of the kernel, default is 1
    method : str
        the method used to replace invalid values. Valid options are
        `localmean`, 'idw'.

    Returns
    -------
    filled : 2d np.ndarray
        a copy of the input array, where NaN elements have been replaced.
    """

    # cdef int i, j, I, J, it, n, k, l
    # cdef int n_invalids
    filled = np.empty( [array.shape[0], array.shape[1]], dtype=DTYPEf)
    kernel = np.empty( (2*kernel_size+1, 2*kernel_size+1), dtype=DTYPEf )

    # cdef np.ndarray[np.int_t, ndim=1] inans
    # cdef np.ndarray[np.int_t, ndim=1] jnans

    # indices where array is NaN
    inans, jnans = np.nonzero( np.isnan(array) )

    # number of NaN elements
    n_nans = len(inans)

    # arrays which contain replaced values to check for convergence
    replaced_new = np.zeros( n_nans, dtype=DTYPEf)
    replaced_old = np.zeros( n_nans, dtype=DTYPEf)

    # depending on kernel type, fill kernel array
    if method == 'localmean':
        print('kernel_size', kernel_size)
        for i in range(2*kernel_size+1):
            for j in range(2*kernel_size+1):
                kernel[i,j] = 1
        print(kernel, 'kernel')
    elif method == 'idw':
        kernel = np.array([[0, 0.5, 0.5, 0.5, 0],
                           [0.5, 0.75, 0.75, 0.75, 0.5],
                           [0.5, 0.75, 1, 0.75, 0.5],
                           [0.5, 0.75, 0.75, 0.5, 1],
                           [0, 0.5, 0.5, 0.5, 0]])
        print(kernel, 'kernel')
    else:
        raise ValueError("method not valid. Should be one of `localmean`, `idw`.")

    # fill new array with input elements
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            filled[i,j] = array[i,j]

    # make several passes
    # until we reach convergence
    for it in range(max_iter):
        print('iteration', it)
        # for each NaN element
        for k in range(n_nans):
            i = inans[k]
            j = jnans[k]

            # initialize to zero
            filled[i,j] = 0.0
            n = 0

            # loop over the kernel
            for I in range(2*kernel_size+1):
                for J in range(2*kernel_size+1):

                    # if we are not out of the boundaries
                    if i+I-kernel_size < array.shape[0] and i+I-kernel_size >= 0:
                        if j+J-kernel_size < array.shape[1] and j+J-kernel_size >= 0:

                            # if the neighbour element is not NaN itself.
                            if filled[i+I-kernel_size, j+J-kernel_size] == filled[i+I-kernel_size, j+J-kernel_size]:

                                # do not sum itself
                                if I-kernel_size != 0 and J-kernel_size != 0:

                                    # convolve kernel with original array
                                    filled[i,j] = filled[i,j] + filled[i+I-kernel_size, j+J-kernel_size]*kernel[I, J]
                                    n = n + 1*kernel[I,J]

            # divide value by effective number of added elements
            if n != 0:
                filled[i,j] = filled[i,j] / n
                replaced_new[k] = filled[i,j]
            else:
                filled[i,j] = np.nan

        # check if mean square difference between values of replaced
        # elements is below a certain tolerance
        print('tolerance', np.mean( (replaced_new-replaced_old)**2 ))
        if np.mean( (replaced_new-replaced_old)**2 ) < tol:
            break
        else:
            for l in range(n_nans):
                replaced_old[l] = replaced_new[l]

    return filled
def sincinterp(image, x, y, kernel_size=3):
    r"""Re-sample an image at intermediate positions between pixels.

    This function uses a cardinal interpolation formula which limits
    the loss of information in the resampling process. It uses a limited
    number of neighbouring pixels.

    The new image :math:`im^+` at fractional locations :math:`x` and :math:`y` is computed as:

    .. math::

        im^+(x,y) = \sum_{i=-\mathtt{kernel\_size}}^{i=\mathtt{kernel\_size}} \sum_{j=-\mathtt{kernel\_size}}^{j=\mathtt{kernel\_size}} \mathtt{image}(i,j) sin[\pi(i-\mathtt{x})] sin[\pi(j-\mathtt{y})] / \pi(i-\mathtt{x}) / \pi(j-\mathtt{y})

    Parameters
    ----------
    image : np.ndarray, dtype np.int32
        the image array.
    x : two dimensions np.ndarray of floats
        an array containing fractional pixel row
        positions at which to interpolate the image
    y : two dimensions np.ndarray of floats
        an array containing fractional pixel column
        positions at which to interpolate the image
    kernel_size : int
        interpolation is performed over a ``(2*kernel_size+1)*(2*kernel_size+1)``
        submatrix in the neighbourhood of each interpolation point.

    Returns
    -------
    im : np.ndarray, dtype np.float64
        the interpolated value of ``image`` at the points specified
        by ``x`` and ``y``
    """

    # indices
    # cdef int i, j, I, J

    # the output array
    r = np.zeros( [x.shape[0], x.shape[1]], dtype=DTYPEf)

    # fast pi
    pi = 3.1419

    # for each point of the output array
    for I in range(x.shape[0]):
        for J in range(x.shape[1]):

            # loop over all neighbouring grid points
            for i in range( int(x[I,J])-kernel_size, int(x[I,J])+kernel_size+1 ):
                for j in range( int(y[I,J])-kernel_size, int(y[I,J])+kernel_size+1 ):
                    # check that we are in the boundaries
                    if i >= 0 and i <= image.shape[0] and j >= 0 and j <= image.shape[1]:
                        if (i-x[I,J]) == 0.0 and (j-y[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j]
                        elif (i-x[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(j-y[I,J]) )/( pi*(j-y[I,J]) )
                        elif (j-y[I,J]) == 0.0:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )/( pi*(i-x[I,J]) )
                        else:
                            r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )*np.sin( pi*(j-y[I,J]) )/( pi*pi*(i-x[I,J])*(j-y[I,J]))
    return r

#cdef extern from "math.h":
#    double sin(double)
A simple smoothing function that works with masked data will solve this. One can then avoid the approaches that involve making up data (i.e., interpolating, inpainting, etc.); and making up data should always be avoided.
The main issue that arises when smoothing masked data is that, for each point, smoothing uses the neighboring values to calculate a new value at the center point, but when those neighbors are masked, the new value for the center point also becomes masked due to the rules of masked arrays. Therefore, one needs to do the calculation with unmasked data and explicitly account for the mask. That's easy to do, as is done in the function smooth below.
from numpy import *
import pylab as plt

# make a grid and a striped mask as test data
N = 100
x = linspace(0, 5, N, endpoint=True)
grid = 2. + 1.*(sin(2*pi*x)[:,newaxis]*sin(2*pi*x)>0.)
m = resize((sin(pi*x)>0), (N,N))

plt.imshow(grid.copy(), cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('original data')

def smooth(u, mask):
    m = ~mask
    r = u*m  # set all 'masked' points to 0. so they aren't used in the smoothing
    a = 4*r[1:-1,1:-1] + r[2:,1:-1] + r[:-2,1:-1] + r[1:-1,2:] + r[1:-1,:-2]
    b = 4*m[1:-1,1:-1] + m[2:,1:-1] + m[:-2,1:-1] + m[1:-1,2:] + m[1:-1,:-2]  # a divisor that accounts for masked points
    b[b==0] = 1.  # avoid divide-by-zero (the region is masked so the value doesn't matter)
    u[1:-1,1:-1] = a/b

# run the data through the smoothing filter a few times
for i in range(10):
    smooth(grid, m)

mg = ma.array(grid, mask=m)  # put together the mask and the data

plt.figure()
plt.imshow(mg, cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('smoothed with mask')
plt.show()
The main point is that at the boundary of the mask, the masked values are not used in the smoothing. (This is also where the grid squares switch values, so it would be clear in the figure if the masked neighboring values were being used.)
We also just had this problem and the astropy package has us covered:
import numpy as np
import matplotlib.pyplot as plt
# Some Axes
x = np.arange(100)
y = np.arange(100)
#Some Interesting Shape
z = np.array(np.outer(np.sin((x+y)/10),np.sin(y/3)),dtype=float)
# some mask
mask = np.outer(np.sin((x+y)/20),np.sin(y/5))**2>.9
# masked data represent noise, so lets put in some trash into the masked points
z[mask] = (np.random.random(size = (100,100))*10)[mask]
# masked data
z_masked = np.ma.masked_array(z, mask)
# "Conventional" filter
filter_kernelsize = 2
import scipy.ndimage
z_filtered_bad = scipy.ndimage.gaussian_filter(z_masked,filter_kernelsize)
# Lets filter it
from astropy.convolution import convolve, Gaussian2DKernel
k = Gaussian2DKernel(1.5)
z_filtered = convolve(z_masked, k, boundary='extend')
### Plots:
fig, axes = plt.subplots(2,2)
plt.sca(axes[0,0])
plt.title('Raw Data')
plt.imshow(z)
plt.colorbar()
plt.sca(axes[0,1])
plt.title('Raw Data Masked')
plt.imshow(z_masked)
plt.colorbar()
plt.sca(axes[1,0])
plt.title('ndimage filter (ignores mask)')
plt.imshow(z_filtered_bad)
plt.colorbar()
plt.sca(axes[1,1])
plt.title('astropy filter (uses mask)')
plt.imshow(z_filtered)
plt.colorbar()
plt.tight_layout()
Output plot of the code

Using matplotlib's contourf function with 1D vectors

I have a large data set with files containing three likewise large single-column vectors (backazimuth, frequency and power) for every day in a month. I would like to display the data on a polar plot using something like contourf. However, I am not sure how to reshape the power data into a 2D array. An example is below,
from pylab import *
x=rand(100)
y=rand(100)
z = rand(100) # 1D
BAZ, FREQ = meshgrid(x, y)
ax = plt.subplot(111, polar=True)
contourf(BAZ, FREQ, z) # z needs to be 2D
Anyone know how I can reshape z so this will work?
thanks,
David
From the link in tiago's comment above, the answer is:
from scipy.interpolate import griddata  # this import was missing from the original snippet

x = rand(100)
y = rand(100)
z = rand(100)
xgrid = np.linspace(x.min(), x.max(), 100)
ygrid = np.linspace(y.min(), y.max(), 100)
xgrid, ygrid = np.meshgrid(xgrid, ygrid)
zgrid = griddata((x, y), z, (xgrid, ygrid))
ax = plt.subplot(111, polar=True)
contourf(xgrid, ygrid, zgrid)
Thanks,
D.