Parsing ESRI projection information to create a CoordinateReferenceSystem - esri

I have a lot of people supplying me ESRI ASC gridded data files that were generated using ESRI tools. When they do this, the PRJ files contain the following type of information. It differs depending on the projection, of course, e.g. UTM, ALBERS, etc.:
Does GeoTools have a parser that can create a CoordinateReferenceSystem from this format of projection definition?
Projection ALBERS
Datum NAD83
Spheroid GRS80
Units METERS
Zunits NO
Xshift 0.0
Yshift 0.0
Parameters
29 30 0.0 /* 1st standard parallel
45 30 0.0 /* 2nd standard parallel
-96 0 0.0 /* central meridian
23 0 0.0 /* latitude of projection's origin
0.0 /* false easting (meters)
0.0 /* false northing (meters)

Not directly, as that doesn't seem to be a standard (or ESRI-variant) WKT projection file. But there is probably enough information in there to build a CoordinateReferenceSystem programmatically, as described here:
import java.util.Collections;
import java.util.Map;
import org.geotools.referencing.ReferencingFactoryFinder;
import org.geotools.referencing.operation.DefiningConversion;
import org.opengis.parameter.ParameterValueGroup;
import org.opengis.referencing.crs.CRSFactory;
import org.opengis.referencing.crs.GeographicCRS;
import org.opengis.referencing.crs.ProjectedCRS;
import org.opengis.referencing.cs.CartesianCS;
import org.opengis.referencing.operation.Conversion;
import org.opengis.referencing.operation.MathTransformFactory;
// Build a projected CRS by hand (this example: WGS 84 / UTM Zone 12N).
// For the Albers PRJ above, presumably request getDefaultParameters("Albers_Conic_Equal_Area") instead and set standard_parallel_1/2, central_meridian, latitude_of_origin, false_easting, false_northing from the file.
MathTransformFactory mtFactory = ReferencingFactoryFinder.getMathTransformFactory(null);
CRSFactory crsFactory = ReferencingFactoryFinder.getCRSFactory(null);
GeographicCRS geoCRS = org.geotools.referencing.crs.DefaultGeographicCRS.WGS84;
CartesianCS cartCS = org.geotools.referencing.cs.DefaultCartesianCS.GENERIC_2D;
ParameterValueGroup parameters = mtFactory.getDefaultParameters("Transverse_Mercator");
parameters.parameter("central_meridian").setValue(-111.0);
parameters.parameter("latitude_of_origin").setValue(0.0);
parameters.parameter("scale_factor").setValue(0.9996);
parameters.parameter("false_easting").setValue(500000.0);
parameters.parameter("false_northing").setValue(0.0);
Conversion conversion = new DefiningConversion("Transverse_Mercator", parameters);
Map<String, ?> properties = Collections.singletonMap("name", "WGS 84 / UTM Zone 12N");
ProjectedCRS projCRS = crsFactory.createProjectedCRS(properties, geoCRS, conversion, cartCS);

Related

Is there any way to convert a portfolio class from portfolio analytics into a data frame

I'm trying to find the optimal weights for a specific target return using the PortfolioAnalytics library and ROI optimization. However, even though I know that the target return should be feasible and should be part of the efficient frontier, the ROI optimization does not find any solution.
The code that I'm using is the following:
for(i in 0:n){
  target <- minret + i * Del
  p <- portfolio.spec(assets = colnames(t_EROAS))  #Specification of asset classes
  p <- add.constraint(p, type = "full_investment")  #An investment that has to sum 1
  p <- add.constraint(portfolio = p, type = "box", min = 0, max = 1)  #No short position, long-only
  p <- add.constraint(p,
                      type = "group",
                      groups = group_list,
                      group_min = VCONSMIN[, 1],
                      group_max = VCONSMAX[, 1])
  p <- add.constraint(p, type = "return", name = "mean", return_target = target)
  p <- add.objective(p, type = "risk", name = "var")
  eff.opt <- optimize.portfolio(t_EROAS, p, optimize_method = "ROI", trace = TRUE)
}
n = 30, but it only finds 27 portfolios, and the efficient frontier I'm creating looks empty from portfolio 27 to portfolio 30; portfolios 28 and 29 seem to have no solution, but I'm not sure that this is correct.
What I want is an efficient frontier in data frame format with a fixed number of portfolios, and it seems that the only way to achieve this is with this method. Any help or ideas?

How should I impute NaN values in a categorical column?

Should I encode the categorical column with label encoding and then impute NaN values with the most frequent value, or are there other ways?
Encoding requires converting the dataframe to an array, and imputing would then require converting the array back to a dataframe (all this for a single column, and there are more columns like that).
For example, I have the variable BsmtQual, which evaluates the height of a basement and has the following categories:
Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
Po Poor (<70 inches)
NA No Basement
Out of 2919 values in BsmtQual, 81 are NaN values.
For future problems like this that don't involve coding, you should post at https://datascience.stackexchange.com/.
This depends on a few things. First of all, how important is this variable in your exercise? Assuming that you are doing classification, you could try removing all rows with NaN values, running a few models, then removing the variable and running the same models again. If you don't see a dip in accuracy, you might consider removing the variable completely.
If you do see a dip in accuracy or can't judge impact due to the problem being unsupervised, then there are several other methods you can try. If you just want a quick fix, and if there aren't too many NaNs or categories, then you can just impute with the most frequent value. This shouldn't cause too many problems if the previous conditions are satisfied.
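If you go the quick-fix route, a minimal pandas sketch could look like the following (the column name BsmtQual is from your example, but the toy values and the DataFrame itself are just for illustration):
import pandas as pd

# Toy frame standing in for the real data; BsmtQual has missing entries.
df = pd.DataFrame({"BsmtQual": ["Gd", "TA", None, "Ex", None, "TA"]})

# Impute missing categories with the most frequent value (the mode).
most_frequent = df["BsmtQual"].mode()[0]
df["BsmtQual"] = df["BsmtQual"].fillna(most_frequent)

print(df["BsmtQual"].value_counts())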
If you want to be more exact, then you could consider using the other variables you have to predict the class of the categorical variable (obviously this will only work if the categorical variable is correlated with some of your other variables). You could use a variety of algorithms for this, including classifiers or clustering. It all depends on the distribution of your categorical variable and how much effort you want to put in to solve your issue.
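A rough sketch of that model-based route, using scikit-learn's RandomForestClassifier (the predictor columns YearBuilt and GrLivArea and all values here are made up for illustration):
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame: BsmtQual has gaps, the other columns are complete and act as predictors.
df = pd.DataFrame({
    "BsmtQual":  ["Gd", "TA", None, "Ex", None, "TA", "Gd", "Fa"],
    "YearBuilt": [1995, 1960, 1980, 2005, 1975, 1962, 1999, 1950],
    "GrLivArea": [1600, 1100, 1300, 2100, 1250, 1050, 1700,  900],
})

predictors = ["YearBuilt", "GrLivArea"]
known = df["BsmtQual"].notna()

# Train on rows where the category is known, then predict the missing ones.
clf = RandomForestClassifier(random_state=0)
clf.fit(df.loc[known, predictors], df.loc[known, "BsmtQual"])
df.loc[~known, "BsmtQual"] = clf.predict(df.loc[~known, predictors])
print(df)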
(I'm only learning as well, but I think that's most of your options.)
"… or there are other ways."
Example:
Ex Excellent (100+ inches) 5 / 5 = 1.0
Gd Good (90-99 inches) 4 / 5 = 0.8
TA Typical (80-89 inches) 3 / 5 = 0.6
Fa Fair (70-79 inches) 2 / 5 = 0.4
Po Poor (<70 inches 1 / 5 = 0.2
NA No Basement 0 / 5 = 0.0
However, such labels express less precision, which affects accuracy if they are combined with actual measurements.
This could be addressed either by scaling values over the category range (e.g. scaling 0-69 inches over 0.0-0.2), or by using an approximate value for each category (more linearly accurate). For example, if the highest value is 200 inches:
Ex Excellent (100+ inches) 100 / 200 = 0.5000
Gd Good (90-99 inches) (((99 - 90) / 2) + 90) / 200 = 0.4725
TA Typical (80-89 inches) (((89 - 80) / 2) + 80) / 200 = 0.4225
Fa Fair (70-79 inches) (((79 - 70) / 2) + 70) / 200 = 0.3725
Po Poor (<70 inches) (69 / 2) / 200 = 0.1725
NA No Basement 0 / 200 = 0.0000
Actual measurement 120 inch 120 / 200 = 0.6000
This produces a decent approximation (the range mid-point value, except for Ex, where it is the minimum of the open-ended range). If calculations on such columns produce inaccuracies, it is because of notation imprecision (the labels express ranges rather than exact values).
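If you want to apply that mapping in pandas, a minimal sketch (the column name BsmtQual and the toy rows are just for illustration; the scores are the mid-point values above, assuming a 200-inch maximum):
import pandas as pd

# Mid-point scores from the table above, assuming a 200-inch maximum.
bsmt_score = {
    "Ex": 0.5000,  # 100+ inches (minimum of the open-ended range)
    "Gd": 0.4725,  # 90-99 inches
    "TA": 0.4225,  # 80-89 inches
    "Fa": 0.3725,  # 70-79 inches
    "Po": 0.1725,  # <70 inches
    "NA": 0.0000,  # no basement
}

df = pd.DataFrame({"BsmtQual": ["Gd", "TA", "NA", "Ex", "Po"]})
df["BsmtQualScore"] = df["BsmtQual"].map(bsmt_score)
print(df)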

Does GrADS have an "astd" command (similar to aave) I could use?

I would like to get the spatial standard deviation for a variable (let's say temperature). In other words, does GrADS have an "astd" command (similar to aave) I could use?
There is no such command in GrADS, but you can compute the standard deviation in two ways:
[1] Compute manually. For example:
*compute the mean
x1 = ave(ts1.1,t=1,t=120)
*compute stdev
s1 = sqrt(ave(pow(ts1.1-x1,2),t=1,t=120)*(n1/(n1-1)))
n1 here is the number of samples.
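As a quick cross-check of that formula outside GrADS, here is the same n/(n-1) correction in a small numpy sketch (the sample values are made up):
import numpy as np

ts = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.4])  # made-up samples
n = ts.size

mean = ts.mean()
# mean of squared deviations, rescaled by n/(n-1): the sample (unbiased) variance
var_sample = np.mean((ts - mean) ** 2) * (n / (n - 1))
std_sample = np.sqrt(var_sample)

print(std_sample, ts.std(ddof=1))  # the two values agree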
[2] You can use the built-in 'stat' output in GrADS, via 'set stat on' or 'set gxout stat'.
These commands will give you statistics such as the following:
Data Type = grid
Dimensions = 0 1
I Dimension = 1 to 73 Linear 0 5
J Dimension = 1 to 46 Linear -90 4
Sizes = 73 46 3358
Undef value = -2.56e+33
Undef count = 1763 Valid count = 1595
Min, Max = 243.008 302.818
Cmin, cmax, cint = 245 300 5
Stats[sum,sumsqr,root(sumsqr),n]: 452778 1.29046e+08 11359.8 1595
Stats[(sum,sumsqr,root(sumsqr))/n]: 283.874 80906.7 284.441
Stats[(sum,sumsqr,root(sumsqr))/(n-1)]: 284.052 80957.4 284.53
Stats[(sigma,var)(n)]: 17.9565 322.437
Stats[(sigma,var)(n-1)]: 17.9622 322.64
Contouring: 245 to 300 interval 5
Sigma here is the standard deviation and Var is variance.
Is this what you are looking for?

Using "rollmedian" function as a input for "arima" function

My time-series data includes date-time and temperature columns as follows:
rn25_29_o:
ambtemp dt
1 -1.96 2007-09-28 23:55:00
2 -2.02 2007-09-28 23:57:00
3 -1.92 2007-09-28 23:59:00
4 -1.64 2007-09-29 00:01:00
5 -1.76 2007-09-29 00:03:00
6 -1.83 2007-09-29 00:05:00
I am using a median smoothing function to smooth out small fluctuations that are caused by imprecise measurements.
library(zoo)  # zoo() and rollmedian() come from the zoo package
unique_timeStamp <- make.time.unique(rn25_29_o$dt)
temp.zoo <- zoo(rn25_29_o$ambtemp, unique_timeStamp)
m.av <- rollmedian(temp.zoo, n, fill = list(NA, NULL, NA))
Subsequently, the output of the median smoothing is used to build a temporal model and obtain predictions with the following code:
te = (x.fit = arima(m.av, order = c(1, 0, 0)))
# fit the model and print the results
x.fore = predict(te, n.ahead=50)
Finally, I encounter the following error:
Error in seq.default(head(tt, 1), tail(tt, 1), deltat) : 'by'
argument is much too small
FYI: the modeling and prediction functions work properly when using the original time-series data.
Please guide me through this error.
The problem occurred because of the properties of the zoo object. It can be avoided by applying rollmedian to the plain temperature vector instead, so the code can be amended to:
Median_ambtemp <- rollmedian(ambtemp, n, fill = list(NA, NULL, NA))
te = (x.fit = arima(Median_ambtemp, order = c(1, 0, 0)))
# fit the model and print the results
x.fore = predict(te, n.ahead=5)

GPS sentence: GPRMA

I'm writing an NMEA sentence parser and couldn't find any documentation about the GPRMA sentence except that it is "Recommended minimum specific Loran-C data". Does anyone know the meaning of this sentence?
Do the longitude and latitude in it refer to the GPS device's current location?
Thanks.
From the very handy guide at http://aprs.gids.nl/nmea/#rma:
eg. $GPRMA,A,llll.ll,N,lllll.ll,W,,,ss.s,ccc,vv.v,W*hh
1 = Data status
2 = Latitude
3 = N/S
4 = longitude
5 = W/E
6 = not used
7 = not used
8 = Speed over ground in knots
9 = Course over ground
10 = Variation
11 = Direction of variation E/W
12 = Checksum
Now, LORAN data is not GPS. It is similar, but it is an older standard that used ground stations to determine position. So, to specifically answer your question: no, this is not GPS data. If you want GPS data, you will need $GPRMC.
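If it helps, here is a minimal Python sketch of splitting such a sentence into the fields listed above (the field names and the sample coordinates are my own assumptions for illustration, not from an official spec):
def parse_gprma(sentence):
    """Split a $GPRMA sentence into named fields, per the field list above."""
    body, _, checksum = sentence.strip().lstrip("$").partition("*")
    fields = body.split(",")
    if fields[0] != "GPRMA":
        raise ValueError("not a GPRMA sentence")
    return {
        "status": fields[1],
        "latitude": fields[2], "lat_hemisphere": fields[3],
        "longitude": fields[4], "lon_hemisphere": fields[5],
        "speed_knots": fields[8],
        "course": fields[9],
        "variation": fields[10], "variation_direction": fields[11],
        "checksum": checksum,  # "hh" below is just the guide's placeholder
    }

print(parse_gprma("$GPRMA,A,4916.45,N,12311.12,W,,,11.2,054.7,3.1,E*hh"))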