How to perform raster calculation (e.g. aspect) on subset of raster based on point intersection in R - optimization

I'm working with some raster data in R using the raster package. I want to calculate and extract some geographic information (e.g., slope, aspect) from the raster, but only at specific points (I also have some data as a SpatialPointsDataFrame at which I want to calculate slope/aspect/etc.). I'm doing this for several high-resolution rasters, and it seems like a poor use of resources to calculate this for every raster cell when I only need maybe 5-10% of them.
I thought maybe the raster::stackApply function might work, but that seems to perform calculations on subsets of a rasterBrick rather than calculations on subsets of a single raster based on point locations (please correct me if I'm wrong). I also thought I could do a for loop, where I extract the surrounding cells nearest each point of interest, and iteratively calculate slope/aspect that way. That seems clunky, and I was hoping for a more elegant or built-in solution, but it should work.
These are my thoughts so far on the for loop, but I'm not sure how best to even do this.
# Attach packages
library(rgdal)
library(raster)
# Generate example raster data
r = raster()
set.seed(0)
values(r) = runif(ncell(r), min = 0, max = 1000)
# Generate example point data
df.sp = SpatialPoints(
coords = cbind(runif(25, min = -100, max = 100),
runif(25, min = -50, max = 50)),
proj4string = crs(r))
# Iterate on each row of SpatialPoints
for (i in 1:nrow(df.sp)) {
# Find cell index of current SpatialPoint
cell.idx = raster::extract(r, df.sp[i,], cellnumbers = TRUE)[1]
# Find indices of cells surrounding point of interest
neighbors.idx = raster::adjacent(r, cell.idx, directions = 16)
# Get DEM values for cell and surrounding cells
vals.local = r[c(cell.idx, neighbors.idx[,2])]
# Somehow convert this back to an appropriate georeferenced matrix
#r.local = ...
# Perform geometric calculations on local raster
#r.stack = terrain(r.local, opt = c('slope', 'aspect'))
# Remaining data extraction, etc. (I can take it from here...)
}
In summary, I need a method to calculate slope and aspect from a DEM raster only at specific points as given by a SpatialPoints object. If you know of a pre-built or more elegant solution, great! If not, some help finishing the for loop (how to best extract a neighborhood of surrounding cells and run calculations on that) would be most appreciated as well.

Interesting question. Here is a possible approach.
library(raster)
r <- raster()
set.seed(0)
values(r) <- runif(ncell(r), min = 0, max = 1000)
coords <- cbind(runif(25, min = -100, max = 100),
runif(25, min = -50, max = 50))
x <- rasterize(coords, r)
f <- focal(x, w=matrix(1, nc=3, nr=3), na.rm=TRUE)
rr <- mask(r, f)
slope <- terrain(rr, "slope")
extract(slope, coords)
# [1] 0.0019366236 0.0020670699 0.0006305257 0.0025334280 0.0023480935 0.0007527267 0.0002699272 0.0004699626
# [9] 0.0004869054 0.0025651333 0.0010415805 0.0008574920 0.0010664869 0.0017700297 0.0001666226 0.0008405391
#[17] 0.0017682167 0.0009854172 0.0015350466 0.0017714466 0.0012994945 0.0016563132 0.0003276584 0.0020499529
#[25] 0.0006582073
Probably not much efficiency gain, as it still processes all the NA values
So maybe like this, more along your line of thinking:
cells <- cellFromXY(r, coords)
ngbs <- raster::adjacent(r, cells, pairs=TRUE)
slope <- rep(NA, length(cells))
for (i in 1:length(cells)) {
ci <- ngbs[ngbs[,1] == cells[i], 2]
e <- extentFromCells(r, ci)
x <- crop(r, e)
slope[i] <- terrain(x, "slope")[5]
}
slope
#[1] 0.0019366236 0.0020670699 0.0006305257 0.0025334280 0.0023480935 0.0007527267 0.0002699272 0.0004699626
#[9] 0.0004869054 0.0025651333 0.0010415805 0.0008574920 0.0010664869 0.0017700297 0.0001666226 0.0008405391
#[17] 0.0017682167 0.0009854172 0.0015350466 0.0017714466 0.0012994945 0.0016563132 0.0003276584 0.0020499529
#[25] 0.0006582073
But I find that brute force
slope <- terrain(r, "slope")
extract(slope, coords)
is fastest, 9x faster than my first alternative and 4 times faster than the second alternative

Related

systemfit 3SLS Testing for Overidentification Restrictions

currently I'm struggling to find a good way to perform the Hansen/Sargan tests of Overidentification restrictions within a Three-Stage Least Squares model (3SLS) in panel data using R. I was digging the whole day in different networks and couldn't find a way of depicting the tests in R using the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP +I(ln_GDP^2) + ln_Gini
income_c_3sls <-ln_GDP ~ Crime + ln_Gini
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime
inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2)+ Health_Exp + Pov_Head_Count_1.9
system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)
fitsur_c_3sls <-systemfit(system_c_3sls, "3SLS",inst=inst, data=df_new, methodResidCov = "noDfCor" )
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system do not yield in an output of the Hansen/Sargan test, thus I assume the test should be executed aside from the output and probably associated to systemfit class object.
Thanks in advance.
With g equations, l exogenous variables, and k regressors, the Sargan test for 3SLS is
where u is the stacked residuals, \Sigma is the estimated residual covariance, and P_W is the projection matrix on the exogenous variables. See Ch 12.4 from Davidson & MacKinnon ETM.
Calculating the Sargan test from systemfit should look something like this:
sargan.systemfit=function(results3sls){
result <- list()
u=as.matrix(resid(results3sls)) #model residuals, n x n_eq
n_eq=length(results3sls$eq) # number of equations
n=nrow(u) #number of observations
n_reg=length(coef(results3sls)) # total number of regressors
w=model.matrix(results3sls,which='z') #Matrix of instruments, in block diagonal form with one block per equation
#Need to aggregate into a single block (in case different instruments used per equation)
w_list=lapply(X = 1:n_eq,FUN = function(eq_i){
this_eq_label=results3sls$eq[[eq_i]]$eqnLabel
this_w=w[str_detect(rownames(w),this_eq_label),str_detect(colnames(w),this_eq_label)]
colnames(this_w)=str_remove(colnames(this_w),paste0(this_eq_label,'_'))
return(this_w)
})
w=do.call(cbind,w_list)
w=w[,!duplicated(colnames(w))]
n_inst=ncol(w) #w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables
#estimate residual variance (or use residCov, should be asymptotically equivalent)
var_u=crossprod(u)/n #var_u=results3sls$residCov
P_w=w%*%solve(crossprod(w))%*%t(w) #Projection matrix on instruments w
#as.numeric(u) vectorizes the residuals into a n_eq*n x 1 vector.
result$statistic <- as.numeric(t(as.numeric(u))%*%kronecker(solve(var_u),P_w)%*%as.numeric(u))
result$df <- n_inst*n_eq-n_reg
result$p.value <- 1 - pchisq(result$statistic, result$df)
result$method = paste("Sargan over-identifying restrictions test")
return(result)
}

how to get better Kriging result graphs in openturns?

I performed spherical Kriging, but I can't seem to get good output graphs.
The coordinates(x, and y) range from around around 51 latitude and around 6.5 as longitude
my observations range from -70 to +10
here is my code :
import openturns as ot
import pandas as pd
# your input / output data can be easily formatted as samples for openturns
df = pd.read_csv("kreuzkerpenutm.csv")
inputdata = ot.Sample(df[['x','y']].values)
outputdata = ot.Sample(df[['z']].values)
dimension = 2 # dimension of your input (x,y)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={'bbox_to_anchor':(1,1), 'loc':"upper left"})
view.getFigure().tight_layout()
here is my output:
kriging metamodel graph
I don't know why my graph won't show me my inputs aswell as my kriging results.
thanks for ideas and help
If the input data is not scaled in [-1,1]^d, the kriging metamodel may have issues to identify the scale parameters using maximum likelihood optimization. In order to help for this, we may:
provide a better starting point for the scale parameters of the covariance model (this is trick "A" below),
set the bounds of the optimization algorithm so that the interval where the parameters are searched for correspond to the data at hand (this is trick "B" below).
This is what the following script does, using simulated data instead of a csv data file. In the script, I create the data using a g function which is scaled so that it produces results in the [-10, 70] range, as in your problem. Please look carefuly at the setScale() method which sets the initial value of the covariance model: this is the starting point of the optimization algorithm. Then look at the setOptimizationBounds() method, which sets the bounds of the optimization algorithm.
import openturns as ot
dimension = 2 # dimension of your input (x,y)
distribution = ot.ComposedDistribution([ot.Uniform(-10.0, 50.0)] * dimension)
inputdata = distribution.getSample(100)
g = ot.SymbolicFunction(["x", "y"], ["30 + 3.0 * sin(x / 10.0) * (y / 10.0) ^ 2"])
outputdata = g(inputdata)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
covarianceModel.setScale(inputdata.getMax()) # Trick A
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
# Trick B, v2
x_range = inputdata.getMax() - inputdata.getMin()
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * x_range
minimum_scale_bounds = scale_min_factor * x_range
scaleOptimizationBounds = ot.Interval(minimum_scale_bounds, maximum_scale_bounds)
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
metamodel.setInputDescription(["x", "y"])
metamodel.setOutputDescription(["z"])
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={"bbox_to_anchor": (1, 1), "loc": "upper left"})
view.getFigure().tight_layout()
The previous script produces the following figure.
There are other ways to implement trick B. Here is one provided by J.Pelamatti:
# Trick B, v3
for d in range(X_train.getDimension()):
dist = scipy.spatial.distance.pdist(X_train[:,d])
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * np.max(dist)
minimum_scale_bounds = scale_min_factor * np.min(dist)
This topic is discussed in this particular thread in OT's forum.
Sorry for the late answer.
Which version of openturns are you using?
Probably you have an embedded transformation of (input) data, which makes the data range between (-3, 3) approximately (standard scaling). The kriging result should contains the transformation in such a case.
With more recent openturns implementations, this feature has been removed.
Hope this can help.
Cheers

How to implement a method to generate Poincaré sections for a non-linear system of ODEs?

I have been trying to work out how to calculate Poincaré sections for a system of non-linear ODEs, using a paper on the exact system as reference, and have been wrestling with numpy to try and make it run better. This is intended to run within a bounded domain.
Currently, I have the following code
import numpy as np
from scipy.integrate import odeint
X = 0
Y = 1
Z = 2
def generate_poincare_map(function, initial, plane, iterations, delta):
intersections = []
p_i = odeint(function, initial.flatten(), [0, delta])[-1]
for i in range(1, iterations):
p_f = odeint(function, p_i, [i * delta, (i+1) * delta])[-1]
if (p_f[Z] > plane) and (p_i[Z] < plane):
intersections.append(p_i[:2])
if (p_f[Z] > plane) and (p_i[Z] < plane):
intersections.append(p_i[:2])
p_i = p_f
return np.stack(intersections)
This is pretty wasteful due to the integration solely between successive time steps, and seems to produce incorrect results. The original reference includes sections along the lines of
whereas mine tend to result in something along the lines of
Do you have any advice on how to proceed to make this more correct, and perhaps a little faster?
To get a Pointcaré map of the ABC flow
def ABC_ode(u,t):
A, B, C = 0.75, 1, 1 # matlab parameters
x, y, z = u
return np.array([
A*np.sin(z)+C*np.cos(y),
B*np.sin(x)+A*np.cos(z),
C*np.sin(y)+B*np.cos(x)
])
def mysolver(u0, tspan): return odeint(ABC_ode, u0, tspan, atol=1e-10, rtol=1e-11)
you have first to understand that the dynamical system is really about the points (cos(x),sin(x)) etc. on the unit circle. So values different by multiples of 2*pi represent the same point. In the computation of the section one has to reflect this, either by computing it on the Cartesian product of the 3 circles. Let's stay with the second variant, and chose [-pi,pi] as the fundamental period to have the zero location well in the center. Keep in mind that jumps larger pi are from the angle reduction, not from a real crossing of that interval.
def find_crosssections(x0,y0):
u0 = [x0,y0,0]
px = []
py = []
u = mysolver(u0, np.arange(0, 4000, 0.5)); u0 = u[-1]
u = np.mod(u+pi,2*pi)-pi
x,y,z = u.T
for k in range(len(z)-1):
if z[k]<=0 and z[k+1]>=0 and z[k+1]-z[k]<pi:
# find a more exact intersection location by linear interpolation
s = -z[k]/(z[k+1]-z[k]) # 0 = z[k] + s*(z[k+1]-z[k])
rx, ry = (1-s)*x[k]+s*x[k+1], (1-s)*y[k]+s*y[k+1]
px.append(rx);
py.append(ry);
return px,py
To get a full picture of the Poincare cross-section and avoid duplicate work, use a grid of squares and mark if one of the intersections already fell in it. Only start new iterations from the centers of free squares.
N=20
grid = np.zeros([N,N], dtype=int)
for i in range(N):
for j in range(N):
if grid[i,j]>0: continue;
x0, y0 = (2*i+1)*pi/N-pi, (2*j+1)*pi/N-pi
px, py = find_crosssections(x0,y0)
for rx,ry in zip(px,py):
m, n = int((rx+pi)*N/(2*pi)), int((ry+pi)*N/(2*pi))
grid[m,n]=1
plt.plot(px, py, '.', ms=2)
You can now play with the density of the grid and the length of the integration interval to get the plot a little more filled out, but all characteristic features are already here. But I'd recommend re-programming this in a compiled language, as the computation will take some time.

find ranges to create Uniform histogram

I need to find ranges in order to create a Uniform histogram
i.e: ages
to 4 ranges
data_set = [18,21,22,24,27,27,28,29,30,32,33,33,42,42,45,46]
is there a function that gives me the ranges so the histogram is uniform?
in this case
ranges = [(18,24), (27,29), (30,33), (42,46)]
This example is easy, I'd like to know if there is an algorithm that deals with complex data sets as well
thanks
You are looking for the quantiles that split up your data equally. This combined with cutshould work. So, suppose you want n groups.
set.seed(1)
x <- rnorm(1000) # Generate some toy data
n <- 10
uniform <- cut(x, c(-Inf, quantile(x, prob = (1:(n-1))/n), Inf)) # Determine the groups
plot(uniform)
Edit: now corrected to yield the correct cuts in the ends.
Edit2: I don't quite understand the downvote. But this also works in your example:
data_set = c(18,21,22,24,27,27,28,29,30,32,33,33,42,42,45,46)
n <- 4
groups <- cut(data_set, breaks = c(-Inf, quantile(data_set, prob = 1:(n-1)/n), Inf))
levels(groups)
With some minor renaming nessesary. For slightly better level names, you could also put in min(x) and max(x) instead of -Inf and Inf.

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
% distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
% note: we leave off the sqrt as an optimization
distances = sum(bsxfun(#minus, centroids, X(i, :)) .^ 2, 2);
[value, label] = min(distances);
labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance on Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone know of any other way to speed this up, I would be greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
Use the following function to calculate your distances. You should see an order of magnitude speed up
The two matrices A and B have the columns as the dimenions and the rows as each point.
A is your matrix of centroids. B is your matrix of datapoints.
function D=getSim(A,B)
Qa=repmat(dot(A,A,2),1,size(B,1));
Qb=repmat(dot(B,B,2),1,size(A,1));
D=Qa+Qb'-2*A*B';
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(#(x)(sum(bsxfun(#minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),nRows,nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece #(x)(sum(bsxfun(#minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest neighbor search than brute force.
The most popular approach are Kd-Tree. O(log(n)) average query time instead of the O(n) brute force complexity.
Regarding a Maltab implementation of Kd-Trees, you can have a look here