GWAS sommer r2 value and marker effect size in sommer 4.2.0

I am trying to get the r2 values and effect sizes for each marker in a GWAS using the GWAS function in the R package "sommer". From the documentation I see they should be in the object mix1$scores:
scores: A dataframe with as many columns as markers analyzed and 5 rows containing the following:
beta: marker effects.
score: marker scores (-log_10 p) for the traits.
Fstat: F-statistic associated with the test.
R2: R2 value for each marker.
R2s: R2 value for each marker, scaled.
However, in my model, only marker scores are listed.
Example:
library(sommer)
data(DT_cpdata)
DT <- DT_cpdata
GT <- GT_cpdata   # marker matrix, needed for A.mat() and M= below
MP <- MP_cpdata
#### create the variance-covariance matrix
A <- A.mat(GT)    # additive relationship matrix
#### look at the data and fit the model
head(DT)
mix1 <- GWAS(color ~ 1,
             random = ~ vs(id, Gu = A) + Rowf + Colf,
             rcov = ~ units,
             data = DT,
             M = GT, gTerm = "u:id")
mix1$scores
Is there an easy way to compute marker effect sizes from sommer's output in R?
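One rough workaround, sketched here purely as an illustration (it is not sommer's own beta/R2 computation and it ignores the relationship matrix, so the values are only approximate; it also assumes the rows of GT line up with DT), is to fit single-marker regressions and collect the slope and R2 of each:
# Hypothetical sketch: per-marker simple regressions, not the mixed-model GWAS effects
single_marker <- apply(GT[, 1:100], 2, function(m) {   # first 100 markers, for speed
  fit <- lm(DT$color ~ m)
  c(beta = unname(coef(fit)[2]),        # slope = marker effect estimate
    R2   = summary(fit)$r.squared)      # variance explained by this marker alone
})
head(t(single_marker))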
Thank you for developing this wonderful package.
This question was already asked at https://stackoverflow.com/questions/66764318/gwas-sommer-r2-scores/74615357#74615357, but for a previous version of sommer; I am using sommer 4.2.0.
Thanks for all your help,
Michel

Related

Growth curves in R with standard deviation

I am trying to plot my data (replicate results for each strain) and I want only one line graph per strain, i.e. the averaged results of the replicates for each strain, with points along the line and error bars (error between replicate data).
If you click on the image above, it shows the plot I have so far, which displays WT and WT.1 as separate lines, along with all the other replicates. However, they are replicates of each strain (WT, DrsbR, DsigB) and I want them to appear as one line of mean results per strain instead. I am using the ggplot2 package and melting the data with the reshape2 package, but cannot figure out how to make my replicates appear as one line together with error bars (standard deviation of the mean results between replicates).
The black-and-white image is the kind of thing I am looking for in my graph: a separate line per strain, with points plotting the replicate data as mean values.
library(reshape2)
library(ggplot2)           # needed for ggplot() below
melted <- melt(abs2)       # this first melt is overwritten by the call below
print(abs2)
melted <- melt(abs2, id.vars = 1,
               measure.vars = c("WT","WT.1","DsigB","DsigB.1","DrsbR","DrsbR.1"))
View(melted)
colnames(melted) <- c("Time","Strain","Values")
## line graph for melted data
melted$Time <- as.factor(melted$Time)
abs2line <- ggplot(melted, aes(Time, Values)) +
  geom_line(aes(colour = Strain, group = Strain))
abs2line +
  stat_summary(fun = mean,
               geom = "point",
               aes(group = Time)) +
  stat_summary(fun.data = mean_cl_boot,
               geom = "errorbar",
               width = .2) +
  xlab("Time") +
  ylab("OD600") +
  theme_classic() +
  labs(title = "Growth Curve of Mutant Strains")
summary(melted)
print(melted)
One approach is to take your melted data frame and separate out the "variable" column into "species" and "strain" using the separate() function from tidyr. I don't have your dataset -- it is appreciated if you are able to share your dataset via dput(your.data.frame) for future questions -- so I made a dummy dataset that's similar to yours. Here we have two "species" (red and blue) and two "strains" for each species.
library(reshape2)   # for melt()
df <- data.frame(
  time = seq(0, 40, by=10),
  blue = c(0:4),
  blue.1 = c(0, 1.1, 1.9, 3.1, 4.1),
  red = seq(0, 8, by=2),
  red.1 = c(0, 2.1, 4.2, 5.5, 8.2)
)
df.melt <- melt(df,
                id.vars = 'time',
                measure.vars = c('blue', 'blue.1', 'red', 'red.1'))
We can then use tidyr::separate() to separate the resulting "variable" column into a "species" column and a "strain" column. Luckily, your data contains a "." which can be a handy character to use for the separation:
library(tidyr)   # for separate(); also provides the %>% pipe used below
df.melt.mod <- df.melt %>%
  separate(col=variable, into=c('species', 'strain'), sep='\\.')
Note: The above code will give you a warning because "blue" and "red" do not contain the "." character, so their "strain" entries are filled with NA. We don't care here, because we're not using that column for anything. In your own dataset you can most likely ignore the warning as well.
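If you prefer to silence the warning, tidyr's fill argument handles the missing pieces explicitly (an optional tweak; the plot below does not require it):
df.melt.mod <- df.melt %>%
  separate(col = variable, into = c('species', 'strain'),
           sep = '\\.', fill = 'right')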
Then, you can actually just use stat_summary() for all geoms... modify as you see fit for your own visual and thematic preference. Note that order matters for layering, so I plot geom_line first, then geom_point, then geom_errorbar. Also note that you can assign the group=species aesthetic in the base ggplot() call and that mapping applies to all geoms unless overwritten.
ggplot(df.melt.mod, aes(x=time, y=value, group=species)) +
  stat_summary(fun = mean,
               geom = 'line',
               aes(color = species)) +
  stat_summary(fun = mean,
               geom = 'point') +
  stat_summary(fun.data = mean_cl_boot,
               geom = 'errorbar',
               width = 0.5) +
  theme_bw()
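The question asks for error bars showing the standard deviation between replicates, while mean_cl_boot draws a bootstrap confidence interval. A small variation (a sketch; ggplot2's mean_sdl summary relies on the Hmisc package being installed) swaps the error-bar layer for mean +/- 1 SD:
# same plot, but with error bars of mean +/- 1 standard deviation
ggplot(df.melt.mod, aes(x=time, y=value, group=species)) +
  stat_summary(fun = mean, geom = 'line', aes(color = species)) +
  stat_summary(fun = mean, geom = 'point') +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = 'errorbar', width = 0.5) +
  theme_bw()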

Miscalculation of new function "pcsegdist" in Matlab R2018b

I am trying to test the new function "pcsegdist" in Matlab R2018b. However, the result of segmenting a point cloud into clusters based on Euclidean distance is wrong.
Example: I tested with 1797 3D data points (please see the attached test.txt file). Note that the smallest distance between two neighbouring points is 0.3736.
tic
clear; clc;
filename = 'test.txt';
load('test.txt');
P = test(:,1:3);            % coordinates (x,y,z): all rows, columns 1,2,3
ptCloud = pointCloud(P);
minDistance = 0.71;         % should be less than the smallest 3D distance between 2 clusters
[labels,numClusters] = pcsegdist(ptCloud,minDistance); % numClusters: number of clusters
                                                       % labels: M-by-1 vector with the cluster index of each point
toc
%% Generate the cell_cluster
cell_cluster = {};
x = P(:,1); y = P(:,2); z = P(:,3);
for i = 1:numClusters
    cluster_i = [x(labels==i), y(labels==i), z(labels==i)]; % x,y,z coords of all points in cluster i
    cell_cluster{end+1} = cluster_i;                        % (1xk) cell, where k = number of clusters
end
figure; Plot_cell(cell_cluster); view(3);   % plot the resulting clusters (custom plotting function)
But when I verify with a manual method (ground-truth data), the result should be as in the figure below.
So I wonder about the result of the new function "pcsegdist" in Matlab R2018b, or have I misunderstood something or made a mistake somewhere?

Use shapefile to mask raster data in ArcGIS, then weighted sum

I want to mask a raster dataset using a shapefile in ArcGIS, and then compute a weighted sum over the masked parts.
The following is the path of the tool I used:
Spatial Analyst Tools -> Extraction -> Extract by Mask.
When I use this tool, I always get several grids. However, what I want is an output having the same shape as my shapefile.
I would like the output to include the several parts so that they can be weighted and summed.
This is a coding site. For questions like this I would try https://gis.stackexchange.com/ instead.
I am not sure what you mean by "weighted sum" in this context, but here is an example of what you can do with R:
Example data
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))[1,]
r <- raster(extent(p)+2, vals=1:100)
plot(r)
plot(p, add=T)
Raster cropped to polygon
x <- crop(r, p)
plot(x)
plot(p, add=T)
Disaggregate cells so that they fit the polygon better, followed by crop and mask:
d <- disaggregate(r, 100)
x <- crop(d, p)
m <- mask(x, p)
plot(m)
plot(p, add=T)
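If by "weighted sum" you mean summing the cell values weighted by the fraction of each cell covered by the polygon, extract() with weights=TRUE can provide those fractions. A minimal sketch under that assumption, reusing r and p from above:
# per polygon, a matrix of cell values and the fraction of each cell covered
vw <- extract(r, p, weights = TRUE, normalizeWeights = FALSE)[[1]]
head(vw)
# weighted sum: cell value times coverage fraction, summed over all cells
sum(vw[, "value"] * vw[, "weight"])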

Creating a grid and interpolating (x,y,z) for a contour plot in sagemath

I have values in the form (x,y,z). By creating a list_plot3d plot I can clearly see that they are not quite evenly spaced. They usually form little "blobs" of 3 to 5 points on the xy plane. So for the interpolation and the final "contour" plot to be better, or should I say smoother, do I have to create a rectangular grid (like the squares on a chess board) so that the blobs of data are somehow smoothed out? I understand that this might be trivial to some people, but I am trying this for the first time and I am struggling a bit. I have been looking at scipy packages like scipy.interpolate.interp2d, but the graphs produced in the end are really bad. Maybe a brief tutorial on 2D interpolation in sagemath for an amateur like me? Some advice? Thank you.
EDIT:
https://docs.google.com/file/d/0Bxv8ab9PeMQVUFhBYWlldU9ib0E/edit?pli=1
This is mostly the kind of graph it produces, along with this message:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either s or m too small. (fp>s)
    kx,ky=3,3 nx,ny=17,20 m=200 fp=4696.972223 s=0.000000
To get this graph I just run this command:
f_interpolation = scipy.interpolate.interp2d(*zip(*matrix(C)), kind='cubic')
plot_interpolation = contour_plot(lambda x, y: f_interpolation(x, y)[0],
                                  (22.419, 22.439), (37.06, 37.08),
                                  cmap='jet', contours=numpy.arange(0, 1400, 100),
                                  colorbar=True)
plot_all = plot_interpolation
plot_all.show(axes_labels=["m", "m"])
Here matrix(C) can be a huge matrix, like 10000 x 3 or even much larger, like 1000000 x 3. The problem of bad graphs persists even with less data, as in the picture I attached now, where matrix(C) was only 200 x 3. That is why I am beginning to think that, apart from a possible glitch in the program, my approach to using this command might be totally wrong, hence my asking for advice about using a grid instead of just "throwing" my data at a command.
I've had a similar problem using the scipy.interpolate.interp2d function. My understanding is that the issue arises because the interp1d/interp2d and related functions use an older wrapping of FITPACK for the underlying calculations. I was able to get a problem similar to yours to work using the spline functions, which rely on a newer wrapping of FITPACK. The spline functions can be identified because they seem to all have capital letters in their names here http://docs.scipy.org/doc/scipy/reference/interpolate.html. Within the scipy installation, these newer functions appear to be located in scipy/interpolate/fitpack2.py, while the functions using the older wrappings are in fitpack.py.
For your purposes, RectBivariateSpline is what I believe you want. Here is some sample code for implementing RectBivariateSpline:
import numpy as np
from scipy import interpolate
# Generate unevenly spaced x/y data for axes
npoints = 25
maxaxis = 100
x = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
y = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
xsort = np.sort(x)
ysort = np.sort(y)
# Generate the z-data, which first requires converting
# x/y data into grids. Use indexing='ij' so that z[i, j]
# corresponds to (xsort[i], ysort[j]), the layout that
# RectBivariateSpline expects.
xg, yg = np.meshgrid(xsort, ysort, indexing='ij')
z = xg**2 - yg**2
# Generate the interpolated, evenly spaced data
# Note that the min/max of x/y isn't necessarily 0 and 100 since
# randomly chosen points were used. If we want to avoid extrapolation,
# the explicit min/max must be found
interppoints = 100
xinterp = np.linspace(xsort[0],xsort[-1],interppoints)
yinterp = np.linspace(ysort[0],ysort[-1],interppoints)
# Generate the kernel that will be used for interpolation.
# Note that the default spline degree is cubic (kx=ky=3).
# Other degrees can be used by setting kx and ky to other
# integers, e.g. interpolate.RectBivariateSpline(xsort,ysort,z,kx=5,ky=5)
kernel = interpolate.RectBivariateSpline(xsort,ysort,z)
# Now evaluate the spline on the evenly spaced grid
zinterp = kernel(xinterp, yinterp)

Efficient computation of "variable (number of points included)" moving average in R

I'm trying to implement a variable exponential moving average on a time series of intraday data (i.e. 10-second bars). By "variable" I mean that the size of the window included in the moving average depends on another factor (e.g. volatility). I was thinking of the following:
MA(t) = alpha(t) * price(t) + (1 - alpha(t)) * MA(t-1),
where alpha corresponds, for example, to a changing volatility index.
In a backtest on huge series (more than 100,000 points), this computation causes me trouble. I have the complete vectors alpha and price, but for the current value of MA I always need the value calculated just before. Thus, so far I do not see a vectorized solution.
Another idea I had was to apply the implemented EMA(.., n=f()) function directly to every data point, always with a different value for f(), but I have not found a fast solution that way either.
It would be very kind if somebody could help me with my problem. Other suggestions for how to construct a variable moving average would also be great.
Thanks a lot in advance
Martin
A very efficient moving average operation is also possible via filter():
## create a weight vector -- this one has equal weights, other schemes are possible
weights <- rep(1/nobs, nobs)
## and apply it as a one-sided moving average calculation, see help(filter)
movavg <- as.vector(filter(somevector, weights, method="convolution", sides=1))
That was left-sided only, other choices are possible.
For timeseries, see the function rollmean in the zoo package.
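A tiny usage sketch of rollmean with made-up data (note that it computes a fixed-width, unweighted rolling mean, so it covers the simple case rather than the volatility-dependent one):
library(zoo)
x <- cumsum(rnorm(20))                          # illustrative series
rollmean(x, k = 5, align = "right", fill = NA)  # 5-point trailing average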
You actually don't calculate a moving average, but some kind of weighted cumulative average. A (weighted) moving average would be something like:
price <- runif(100, 10, 1000)
alpha <- rbeta(100, 1, 0.5)
tp <- embed(price, 2)
ta <- embed(alpha, 2)
MA1 <- apply(cbind(tp, ta), 1, function(x){
  weighted.mean(x[1:2], w = 2*x[3:4]/sum(x[3:4]))
})
Make sure you rescale the weights so they sum to the amount of observations.
For your own calculation, you could try something like:
n <- length(price)
MAt <- price*alpha
ma.MAt <- matrix(rep(MAt, each=n), nrow=n)
ma.MAt[upper.tri(ma.MAt)] <- 0
tt1 <- sapply(1:n, function(x){
  tmp <- rev(c(rep(0, n-x), 1, cumprod(rev(1 - alpha[2:x])))[1:n])
  sum(ma.MAt[x,]*tmp)
})
This calculates the averages as linear combinations of MAt, with weights defined by cumulative products of (1 - alpha).
On a side note: I assumed the index (alpha) to lie somewhere between 0 and 1.
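For completeness, the recursion stated in the question can also be computed directly with a plain loop, which for a series of this length is usually fast enough. A minimal sketch with illustrative data and an arbitrary choice of starting value:
vma_rec <- function(price, alpha) {
  n <- length(price)
  out <- numeric(n)
  out[1] <- price[1]                    # arbitrary seed: start at the first price
  for (t in 2:n) {
    out[t] <- alpha[t] * price[t] + (1 - alpha[t]) * out[t - 1]
  }
  out
}

set.seed(1)
price <- runif(1e5, 10, 1000)
alpha <- rbeta(1e5, 1, 0.5)
system.time(ma <- vma_rec(price, alpha))  # typically well under a second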
I just added a VMA function to the TTR package to do this. For example:
library(quantmod) # loads TTR
getSymbols("SPY")
SPY$absCMO <- abs(CMO(Cl(SPY),20))/100
SPY$vma <- VMA(Cl(SPY), SPY$absCMO)
chartSeries(SPY,TA="addTA(SPY$vma,on=1,col='blue')")
x <- xts(rnorm(1e6),Sys.time()-1e6:1)
y <- xts(runif(1e6),Sys.time()-1e6:1)
system.time(VMA(x,y)) # < 0.5s on a 2.2Ghz Centrino
A couple notes from the documentation:
‘VMA’ calculate a variable-length moving average based on the absolute value of ‘w’. Higher (lower) values of ‘w’ will cause ‘VMA’ to react faster (slower).
The pre-compiled binaries should be on R-forge within 24 hours.