Can I use Lattice auto.key or key to make a legend with points for some data and lines for others?

I often make figures that have observed data represented as points and model-predicted data represented as lines, using distribute.type to assign graph types. Is there a way to make a legend that only shows points for the points data, and lines for the lines data? The auto.key default is points, and if I add lines with "list(lines=TRUE)" the legend shows both points and lines for every data label:
x <- seq(0, 8*pi, by=pi/6)
Y1pred <- sin(x)
Y1obs <- Y1pred + rnorm(length(x), mean=0, sd=0.2)
Y2pred <- cos(x)
Y2obs <- Y2pred + rnorm(length(x), mean=0, sd=0.4)
xyplot(Y1obs + Y2obs + Y1pred + Y2pred ~ x,
type=c('p','p','l','l'),
distribute.type=TRUE,
auto.key=list(lines=TRUE, columns=2)
)
There is a rather complicated example using 'key' on p. 158 of Deepayan's book on Lattice; I am wondering if there are simpler options?

Yes. Following S, the lines component of key supports different types (the points component does not). Using auto.key, you could do
xyplot(Y1obs + Y2obs + Y1pred + Y2pred ~ x,
type=c('p','p','l','l'),
distribute.type=TRUE,
auto.key = list(points = FALSE, lines = TRUE,
columns = 2,
type = c('p','p','l','l')))
Ideally, you would want to put the type only inside the lines component, and that's how you should do it if you use key. For auto.key, there can be only one line anyway, so this should be fine.

systemfit 3SLS Testing for Overidentification Restrictions

I'm currently struggling to find a good way to perform the Hansen/Sargan test of overidentifying restrictions for a three-stage least squares (3SLS) model on panel data in R. I spent the whole day searching different sites and couldn't find a way to run the test in R with the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP +I(ln_GDP^2) + ln_Gini
income_c_3sls <-ln_GDP ~ Crime + ln_Gini
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime
inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2)+ Health_Exp + Pov_Head_Count_1.9
system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)
fitsur_c_3sls <-systemfit(system_c_3sls, "3SLS",inst=inst, data=df_new, methodResidCov = "noDfCor" )
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system does not produce any Hansen/Sargan test output, so I assume the test has to be computed separately, probably from the systemfit object.
Thanks in advance.
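With g equations, l exogenous variables, and k regressors, the Sargan test for 3SLS is
$$ u^\top \left( \Sigma^{-1} \otimes P_W \right) u \;\sim\; \chi^2(gl - k) \quad \text{(asymptotically, under the null)}, $$
where u is the vector of stacked residuals, \Sigma is the estimated residual covariance, and P_W is the projection matrix onto the exogenous variables. See Ch. 12.4 of Davidson & MacKinnon, ETM.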
Calculating the Sargan test from systemfit should look something like this:
library(stringr) # provides str_detect() and str_remove(), used below
sargan.systemfit=function(results3sls){
result <- list()
u=as.matrix(resid(results3sls)) #model residuals, n x n_eq
n_eq=length(results3sls$eq) # number of equations
n=nrow(u) #number of observations
n_reg=length(coef(results3sls)) # total number of regressors
w=model.matrix(results3sls,which='z') #Matrix of instruments, in block diagonal form with one block per equation
#Need to aggregate into a single block (in case different instruments used per equation)
w_list=lapply(X = 1:n_eq,FUN = function(eq_i){
this_eq_label=results3sls$eq[[eq_i]]$eqnLabel
this_w=w[str_detect(rownames(w),this_eq_label),str_detect(colnames(w),this_eq_label)]
colnames(this_w)=str_remove(colnames(this_w),paste0(this_eq_label,'_'))
return(this_w)
})
w=do.call(cbind,w_list)
w=w[,!duplicated(colnames(w))]
n_inst=ncol(w) #w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables
#estimate residual variance (or use residCov, should be asymptotically equivalent)
var_u=crossprod(u)/n #var_u=results3sls$residCov
P_w=w%*%solve(crossprod(w))%*%t(w) #Projection matrix on instruments w
#as.numeric(u) vectorizes the residuals into a n_eq*n x 1 vector.
result$statistic <- as.numeric(t(as.numeric(u))%*%kronecker(solve(var_u),P_w)%*%as.numeric(u))
result$df <- n_inst*n_eq-n_reg
result$p.value <- 1 - pchisq(result$statistic, result$df)
result$method = paste("Sargan over-identifying restrictions test")
return(result)
}
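The function above builds the full Kronecker product, a square matrix with n * n_eq rows, which can become large for long panels. As a side note (not part of the original answer), the same statistic can be written with small matrices only, using the standard vec/Kronecker identity. In this sketch U denotes the n x g residual matrix (the `u` in the code), W the n x l instrument matrix, and u = vec(U); the only assumption is that \Sigma is symmetric, which holds for a covariance estimate.

```latex
\begin{align*}
u^\top \left( \Sigma^{-1} \otimes P_W \right) u
  &= \operatorname{vec}(U)^\top \operatorname{vec}\!\left( P_W \, U \, \Sigma^{-1} \right)
     && \text{since } (A \otimes B)\operatorname{vec}(U) = \operatorname{vec}(B U A^\top) \\
  &= \operatorname{tr}\!\left( U^\top P_W \, U \, \Sigma^{-1} \right)
     && \text{since } \operatorname{vec}(X)^\top \operatorname{vec}(Y) = \operatorname{tr}(X^\top Y) \\
  &= \operatorname{tr}\!\left( \Sigma^{-1} \, (W^\top U)^\top (W^\top W)^{-1} (W^\top U) \right)
     && \text{since } P_W = W (W^\top W)^{-1} W^\top .
\end{align*}
```

The last line only involves l x g and l x l matrices, so neither P_W nor the Kronecker product needs to be formed explicitly.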

How to get better Kriging result graphs in openturns?

I performed spherical Kriging, but I can't seem to get good output graphs.
The coordinates (x and y) are around 51 (latitude) and around 6.5 (longitude), and my observations range from -70 to +10.
Here is my code:
import openturns as ot
import pandas as pd
# your input / output data can be easily formatted as samples for openturns
df = pd.read_csv("kreuzkerpenutm.csv")
inputdata = ot.Sample(df[['x','y']].values)
outputdata = ot.Sample(df[['z']].values)
dimension = 2 # dimension of your input (x,y)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={'bbox_to_anchor':(1,1), 'loc':"upper left"})
view.getFigure().tight_layout()
Here is my output:
[Figure: kriging metamodel graph]
I don't know why my graph doesn't show my inputs as well as my kriging results.
Thanks for any ideas and help.
If the input data is not scaled to [-1,1]^d, the kriging metamodel may have trouble identifying the scale parameters by maximum likelihood optimization. To help with this, we can:
- provide a better starting point for the scale parameters of the covariance model (this is trick "A" below),
- set the bounds of the optimization algorithm so that the interval in which the parameters are searched for corresponds to the data at hand (this is trick "B" below).
This is what the following script does, using simulated data instead of a CSV data file. In the script, I create the data using a function g which is scaled so that it produces results in the [-10, 70] range, as in your problem. Please look carefully at the setScale() method, which sets the initial scale of the covariance model: this is the starting point of the optimization algorithm. Then look at the setOptimizationBounds() method, which sets the bounds of the optimization algorithm.
import openturns as ot
dimension = 2 # dimension of your input (x,y)
distribution = ot.ComposedDistribution([ot.Uniform(-10.0, 50.0)] * dimension)
inputdata = distribution.getSample(100)
g = ot.SymbolicFunction(["x", "y"], ["30 + 3.0 * sin(x / 10.0) * (y / 10.0) ^ 2"])
outputdata = g(inputdata)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
covarianceModel.setScale(inputdata.getMax()) # Trick A
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
# Trick B, v2
x_range = inputdata.getMax() - inputdata.getMin()
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * x_range
minimum_scale_bounds = scale_min_factor * x_range
scaleOptimizationBounds = ot.Interval(minimum_scale_bounds, maximum_scale_bounds)
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
metamodel.setInputDescription(["x", "y"])
metamodel.setOutputDescription(["z"])
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={"bbox_to_anchor": (1, 1), "loc": "upper left"})
view.getFigure().tight_layout()
The previous script produces the following figure.
There are other ways to implement trick B. Here is one provided by J.Pelamatti:
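# Trick B, v3
# X_train is the ot.Sample of training inputs (inputdata in the script above)
import numpy as np
import scipy.spatial.distance
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = []
minimum_scale_bounds = []
for d in range(X_train.getDimension()):
    # pairwise distances between the training points along input dimension d
    dist = scipy.spatial.distance.pdist(np.array(X_train[:, d]))
    maximum_scale_bounds.append(scale_max_factor * np.max(dist))
    minimum_scale_bounds.append(scale_min_factor * np.min(dist))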
This topic is discussed in this particular thread in OT's forum.
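To make the arithmetic behind trick B concrete, here is a minimal sketch in plain Python with no openturns dependency. The helper name `scale_bounds` and the list-of-tuples input are hypothetical stand-ins chosen for illustration. It also skips zero distances, an added guard (not in the original answers) so that duplicated coordinates do not produce a lower bound of zero.

```python
from itertools import combinations


def scale_bounds(points, min_factor=0.1, max_factor=2.0):
    """Return (lower, upper) scale bounds, one value per input dimension.

    points: list of equal-length coordinate tuples (a hypothetical,
    plain-Python stand-in for the training sample).
    """
    dimension = len(points[0])
    lower, upper = [], []
    for d in range(dimension):
        column = [p[d] for p in points]
        # Strictly positive pairwise distances along dimension d.
        dists = [abs(a - b) for a, b in combinations(column, 2) if a != b]
        if not dists:
            raise ValueError(f"dimension {d} has no spread")
        lower.append(min_factor * min(dists))
        upper.append(max_factor * max(dists))
    return lower, upper


# Tiny example: four 2D points, with a repeated coordinate in each dimension.
points = [(0.0, 1.0), (2.0, 1.0), (5.0, 4.0), (5.0, 9.0)]
lower, upper = scale_bounds(points)
```

The two lists could then be passed to ot.Interval(lower, upper) and on to setOptimizationBounds(), as in the v2 script.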
Sorry for the late answer.
Which version of openturns are you using?
You probably have an embedded transformation of the (input) data, which makes the data range over approximately (-3, 3) (standard scaling). In that case the kriging result should contain the transformation.
With more recent openturns implementations, this feature has been removed.
Hope this can help.
Cheers

How can I arrange the bars on the X-axis in numerical order?

The ideal order of the bars on the X-axis is (S1,S2,S3,S4,S5,S6,S7,S8,S9,S10), but I just can't get them into that order :(
ggplot(data=T1, aes(x=Scenarios, y=Yields, fill=Scenarios)) +
geom_bar(stat="identity") +
scale_fill_manual(values=cbPalette)
Under the hood ggplot converts the strings to factors and the default levels will be the strings sorted alphabetically (or depending how you create T1, it could be at that point). If you want to customize the sort order, explicitly cast your x variable as a factor with the level order you prefer.
Either ahead of time:
T1 <- dplyr::mutate(T1, Scenarios = factor(Scenarios, levels = unique(Scenarios)))
or inline with the ggplot call
ggplot(data=T1, aes(x=factor(Scenarios, levels=unique(Scenarios)), y=Yields, fill=Scenarios)) +
geom_bar(stat="identity")
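Setting the levels from the column itself only gives S1 to S10 if the rows of T1 are already in that order. If they are not, the levels have to be sorted by their numeric part. The ordering logic is sketched below in Python rather than R, purely to illustrate the idea; `natural_key` is a hypothetical helper name, not a library function.

```python
import re


def natural_key(label):
    """Split 'S10' into ['S', 10, ''] so that digit runs compare as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", label)]


labels = [f"S{i}" for i in range(1, 11)]

# Plain string sorting puts 'S10' between 'S1' and 'S2'.
alphabetical = sorted(labels)

# Sorting on the numeric part restores S1, S2, ..., S10.
natural = sorted(labels, key=natural_key)
```

The naturally sorted labels are what would be supplied as the factor levels.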

How to perform raster calculation (e.g. aspect) on subset of raster based on point intersection in R

I'm working with some raster data in R using the raster package. I want to calculate and extract some geographic information (e.g., slope, aspect) from the raster, but only at specific points (I also have some data as a SpatialPointsDataFrame at which I want to calculate slope/aspect/etc.). I'm doing this for several high-resolution rasters, and it seems like a poor use of resources to calculate this for every raster cell when I only need maybe 5-10% of them.
I thought maybe the raster::stackApply function might work, but that seems to perform calculations on subsets of a rasterBrick rather than calculations on subsets of a single raster based on point locations (please correct me if I'm wrong). I also thought I could do a for loop, where I extract the surrounding cells nearest each point of interest, and iteratively calculate slope/aspect that way. That seems clunky, and I was hoping for a more elegant or built-in solution, but it should work.
These are my thoughts so far on the for loop, but I'm not sure how best to even do this.
# Attach packages
library(rgdal)
library(raster)
# Generate example raster data
r = raster()
set.seed(0)
values(r) = runif(ncell(r), min = 0, max = 1000)
# Generate example point data
df.sp = SpatialPoints(
coords = cbind(runif(25, min = -100, max = 100),
runif(25, min = -50, max = 50)),
proj4string = crs(r))
# Iterate on each row of SpatialPoints
for (i in 1:nrow(df.sp)) {
# Find cell index of current SpatialPoint
cell.idx = raster::extract(r, df.sp[i,], cellnumbers = TRUE)[1]
# Find indices of cells surrounding point of interest
neighbors.idx = raster::adjacent(r, cell.idx, directions = 16)
# Get DEM values for cell and surrounding cells
vals.local = r[c(cell.idx, neighbors.idx[,2])]
# Somehow convert this back to an appropriate georeferenced matrix
#r.local = ...
# Perform geometric calculations on local raster
#r.stack = terrain(r.local, opt = c('slope', 'aspect'))
# Remaining data extraction, etc. (I can take it from here...)
}
In summary, I need a method to calculate slope and aspect from a DEM raster only at specific points as given by a SpatialPoints object. If you know of a pre-built or more elegant solution, great! If not, some help finishing the for loop (how to best extract a neighborhood of surrounding cells and run calculations on that) would be most appreciated as well.
Interesting question. Here is a possible approach.
library(raster)
r <- raster()
set.seed(0)
values(r) <- runif(ncell(r), min = 0, max = 1000)
coords <- cbind(runif(25, min = -100, max = 100),
runif(25, min = -50, max = 50))
x <- rasterize(coords, r)
f <- focal(x, w=matrix(1, nc=3, nr=3), na.rm=TRUE)
rr <- mask(r, f)
slope <- terrain(rr, "slope")
extract(slope, coords)
# [1] 0.0019366236 0.0020670699 0.0006305257 0.0025334280 0.0023480935 0.0007527267 0.0002699272 0.0004699626
# [9] 0.0004869054 0.0025651333 0.0010415805 0.0008574920 0.0010664869 0.0017700297 0.0001666226 0.0008405391
#[17] 0.0017682167 0.0009854172 0.0015350466 0.0017714466 0.0012994945 0.0016563132 0.0003276584 0.0020499529
#[25] 0.0006582073
Probably not much efficiency gain, as it still processes all the NA values.
So maybe like this, more along your line of thinking:
cells <- cellFromXY(r, coords)
ngbs <- raster::adjacent(r, cells, pairs=TRUE)
slope <- rep(NA, length(cells))
for (i in 1:length(cells)) {
ci <- ngbs[ngbs[,1] == cells[i], 2]
e <- extentFromCells(r, ci)
x <- crop(r, e)
slope[i] <- terrain(x, "slope")[5]
}
slope
#[1] 0.0019366236 0.0020670699 0.0006305257 0.0025334280 0.0023480935 0.0007527267 0.0002699272 0.0004699626
#[9] 0.0004869054 0.0025651333 0.0010415805 0.0008574920 0.0010664869 0.0017700297 0.0001666226 0.0008405391
#[17] 0.0017682167 0.0009854172 0.0015350466 0.0017714466 0.0012994945 0.0016563132 0.0003276584 0.0020499529
#[25] 0.0006582073
But I find that brute force
slope <- terrain(r, "slope")
extract(slope, coords)
is fastest: 9 times faster than my first alternative and 4 times faster than the second.
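For readers curious what a slope computed from a single 3x3 neighborhood looks like, here is a minimal Python sketch of Horn's eight-neighbor finite-difference formula. To my understanding this is the method terrain() applies with its default neighbors=8, but treat that as an assumption. The sketch also assumes a planar grid with square cells of size `cell_size`, unlike the lon/lat raster in the example above, and the function names are hypothetical.

```python
import math


def horn_slope(window, cell_size=1.0):
    """Slope in radians at the centre of a 3x3 elevation window (Horn's formula)."""
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Weighted differences between the east/west columns and south/north rows.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return math.atan(math.hypot(dz_dx, dz_dy))


def slope_at(grid, row, col, cell_size=1.0):
    """Slope at one interior cell of a list-of-lists grid, using only its 3x3 window."""
    if not (0 < row < len(grid) - 1 and 0 < col < len(grid[0]) - 1):
        raise IndexError("cell must have all eight neighbours")
    window = [r[col - 1:col + 2] for r in grid[row - 1:row + 2]]
    return horn_slope(window, cell_size)


# A plane rising by 2 units per cell in the x direction.
grid = [[2.0 * c for c in range(5)] for _ in range(5)]
slope = slope_at(grid, 2, 2)  # equals atan(2) for this plane
```

Only the nine cells around each point are read, which is the same idea as the crop-based loop above.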

color range in LineCollection

I'm overplotting multicolored lines on an image; the color of the lines is supposed to represent a given parameter that varies between roughly -1 and 3.
The following portion of code is the one that builds these lines :
x = self._tprun.r[0,p,::100] # x coordinate
y = self._tprun.r[1,p,::100] # y coordinate
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
# 'color' is the parameter that will color the line
vmin = self._color[p,:].min()
vmax = self._color[p,:].max()
lc = LineCollection(segments,
cmap=plt.get_cmap('jet'),
norm=plt.Normalize(vmin=vmin,vmax=vmax))
lc.set_array(self._color[p,:])
lc.set_linewidth(1)
self._ax.add_collection(lc)
This code is inside a loop on 'p' and so it will create several lines at locations given by the arrays 'x' and 'y' and for which the color should be given by the value of 'self._color[p,:]'.
As I said, '_color[p,:]' roughly varies between -1 and 3. Here is an example of what '_color[p,:]' may be :
My problem is that the lines that are created show little color variation: they all look more or less monochrome dark blue, even though _color[p,:] varies a lot and I ask the normalization to use its min/max values.
Here is an example of such a line (look at the oscillating dark blue line; the other black lines are contours of another value):
Is there something I'm missing in the way these functions work?
Got it!
Answer to the question is here :
x = self._tprun.r[0,p,::100] # re-sample every 100 values !!
y = self._tprun.r[1,p,::100] #
# [...]
#lc.set_array(self._color[p,:]) # self._color[p,:] is not resampled
lc.set_array(self._color[p,::100]) # this works because resampled
meaning that the 'color' array was actually much larger than the arrays used for the positions of the line segments. Only the first values of '_color' were used, and those values do not vary much.
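The mismatch can be reproduced without matplotlib. The sketch below uses plain Python lists and made-up data (a color parameter rising linearly from -1 to 3), and it models the behavior described above by taking one color value per segment from the start of the array; all variable names are illustrative.

```python
step = 100
n_samples = 1000

# Made-up trajectory and color parameter, one value per original sample.
x_full = [0.01 * k for k in range(n_samples)]
y_full = [0.5 * v for v in x_full]
color_full = [-1.0 + 4.0 * k / (n_samples - 1) for k in range(n_samples)]

# Positions are resampled, so there are far fewer segments than samples.
points = list(zip(x_full[::step], y_full[::step]))
segments = list(zip(points[:-1], points[1:]))

# One color value per segment, taken from the start of each array.
used_wrong = color_full[:len(segments)]          # colors not resampled
used_right = color_full[::step][:len(segments)]  # colors resampled like x and y

span_wrong = max(used_wrong) - min(used_wrong)
span_right = max(used_right) - min(used_right)
```

With the unsampled array the segments only see the first few color values, so the span is tiny compared with the normalization range. Resampling the colors with the same step restores the variation.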