Count (or sum) the number of gridpoints from high-resolution 2-D data that are closest to the nearest gridpoints of coarse-resolution 2-D data - numpy

I have two datasets: the first has high spatial resolution and its values are 0 and 1, and the second has coarse spatial resolution (its values are not important in my case).
I would like to count the number of gridpoints from the high-resolution data which are closest to the gridpoints of the coarse-resolution data, where the values of the high-resolution data are 1.
In other words, count the number of high-resolution gridpoints with the value of 1, that fall within the pixels of the coarse-resolution data.
Example of the coarse-spatial-resolution data:
import numpy as np
import xarray as xr

lon = [176.25, 176.75, 177.25, 177.75, 178.25, 178.75, 179.25, 179.75]
lat = [-87.25, -87.75, -88.25, -88.75, -89.25, -89.75]
temperature = np.random.rand(6, 8)
coarse_res = xr.DataArray(temperature, coords={'lat': lat, 'lon': lon}, dims=["lat", "lon"])
Example of the high-spatial-resolution data:
lon = [176.125, 176.375, 176.625, 176.875, 177.125, 177.375, 177.625, 177.875, 178.125, 178.375, 178.625, 178.875, 179.125, 179.375, 179.625, 179.875]
lat = [-87.125, -87.375, -87.625, -87.875, -88.125, -88.375, -88.625, -88.875, -89.125, -89.375, -89.625, -89.875]
ds_2 = np.random.randint(0, 2, size=(12, 16))
high_res = xr.DataArray(ds_2, coords={'lat': lat, 'lon': lon}, dims=["lat", "lon"])
In the end, I would like to calculate the fraction of high_res gridpoints/pixels with the value of 1 surrounding each coarse-resolution gridpoint. For example, if the first gridpoint of the coarse_res data is surrounded by 4 high-res gridpoints whose values are 0, 1, 1, 1, the fraction should be 0.75.

You can do this with xr.Dataset.groupby_bins:
low_lon_edges = np.arange(176., 180.001, 0.5)
low_lat_edges = np.arange(-90, -86.9, 0.5)
low_lon_centers = (low_lon_edges[:-1] + low_lon_edges[1:]) / 2
low_lat_centers = (low_lat_edges[:-1] + low_lat_edges[1:]) / 2
aggregated = (
    high_res
    .groupby_bins('lon', bins=low_lon_edges, labels=low_lon_centers)
    .sum(dim="lon")
    .groupby_bins('lat', bins=low_lat_edges, labels=low_lat_centers)
    .sum(dim="lat")
)
Additionally, if the cells nest perfectly (it looks like you're dealing with 1/4 and 1/2 degree data, which are both centered on the half cell, so this should work fine), you can just use xr.Dataset.coarsen:
aggregated = high_res.coarsen(lat=2, lon=2, boundary="exact").sum()
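If what you ultimately want is the fraction rather than the count, note that with 0/1 data the fraction of 1-pixels in each coarse cell is just the mean over that cell. A minimal sketch under the same perfect-nesting assumption:
# with 0/1 data, the mean over each 2x2 block equals (number of ones) / 4,
# i.e. the fraction of high-res pixels equal to 1 in each coarse cell
fraction = high_res.coarsen(lat=2, lon=2, boundary="exact").mean()
The groupby_bins version works the same way: replace the two .sum() calls with .mean() (the bins are equally sized, so the mean of means equals the mean over the whole 2x2 block).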

Related

Question about Random Forest data preprocessing

I'm trying to analyze which products will break.
I have 4 predictor variables (X) and 1 result (Y).
I have some data from experiments run under the same conditions, and their results are different.
For example:
Y : results(1,0)
X : temp, day, angle, used_time
train_data <- data.frame(
  temp      = c(10, 10, 20, 25, 20, 30, 30, 10),
  day       = c("Mon", "Mon", "Tues", "Tues", "Thurs", "Mon", "Mon", "Thurs"),
  angle     = c(90, 90, 90, 180, 180, 90, 90, 180),
  used_time = c(25, 25, 25, 30, 30, 30, 30, 25),
  results   = c(1, 1, 0, 1, 0, 0, 0, 0)
)
Like in this train_data, rows 1 and 2 and rows 6 and 7 have the same X and the same Y.
Should I combine these identical rows into one row? (I ran the experiment twice.)
I wonder whether leaving these two rows as they are would be more effective from a modeling point of view.

Taking mean of N largest values of group by absolute value

I have some DataFrame:
import numpy as np
import pandas as pd

d = {'fruit': ['apple', 'pear', 'peach'] * 6, 'values': np.random.uniform(-5, 5, 18), 'values2': np.random.uniform(-5, 5, 18)}
df = pd.DataFrame(data=d)
I can take the mean of each fruit group as such:
df.groupby('fruit').mean()
However, for each group of fruit, I'd like to take the mean of the N largest values, ranked by absolute value.
So for example, if my values were as follows and N=3:
[ 0.7578507 , 3.81178045, -4.04810913, 3.08887538, 2.87999752, 4.65670954]
The desired outcome would be (4.65670954 + -4.04810913 + 3.81178045) / 3 = ~1.47
Edit - to clarify that sign is preserved in outcome:
(4.65670954 + -20.04810913 + 3.81178045) / 3 = -3.859
Updating with a new approach that I think is simpler. I was avoiding apply like the plague, but maybe this is one of the more acceptable uses. Plus it fixes the fact that you want to take the mean of the original values as ranked by their absolute values:
def foo(d):
    # select the original (signed) values whose absolute values are the 3 largest
    return d[d.abs().nlargest(3).index].mean()

out = df.groupby('fruit')['values'].apply(foo)
So you index each group by the 3 largest absolute values, then take the mean.
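For a concrete check, here is a hypothetical deterministic run using the example values from the question (the foo helper is the one defined above):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple'] * 6,
    'values': [0.7578507, 3.81178045, -4.04810913, 3.08887538, 2.87999752, 4.65670954],
})

def foo(d):
    # keep the sign: select original values at the indices of the 3 largest absolute values
    return d[d.abs().nlargest(3).index].mean()

out = df.groupby('fruit')['values'].apply(foo)
print(out)  # apple: (4.65670954 - 4.04810913 + 3.81178045) / 3 ~ 1.4735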
And for the record, my original, incorrect, and slower code was:
df['values'].abs().groupby(df['fruit']).nlargest(3).groupby("fruit").mean()

SQL Server Geospatial: find the location of a point at a distance along a linestring

We are investigating migrating a prototype into SQL Server (Azure).
We have LineStrings that also have M values. What we would like to do is, given another M value, find out what its geographical location is.
To aid your visualisation, here is a real-world example:
I have a linestring that represents a flight path. Because the flight goes up and down, the distance the plane has actually moved is not the same as the total length of the linestring. We have calibrated M values as part of the linestring, but need to be able to plot on it where a given event occurred. All we know about this event is its M value.
DECLARE @g geometry;
SET @g = geometry::STGeomFromText('LINESTRING(1 0 NULL 0, 2 2 NULL 5, 1 4 NULL 9, 3 6 NULL 15)', 0);
Given something like the above, what is the lat and long of a point with an M value of 8?
This should be equivalent to PostGIS's ST_LocateAlong.
The M value is not a time but a distance. It should be understood that this distance is arbitrary, does not directly relate to the length of the line, and is calibrated against known points. This is because the dataset is based on historic data that is in no way accurate by today's standards.
*Note: I am not sure whether I have nulled the Z or the M value. The extra parameter we are considering here is the M only.
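SQL Server has no built-in equivalent of ST_LocateAlong, so the interpolation has to be done by walking the vertices. Here is a minimal Python sketch of the idea, assuming M increases monotonically along the line and interpolating linearly between calibrated vertices (the function name is mine, and the vertex list mirrors the LINESTRING above):
# vertices as (x, y, m); M is assumed to be monotonically increasing
verts = [(1, 0, 0), (2, 2, 5), (1, 4, 9), (3, 6, 15)]

def locate_along(verts, m):
    # find the segment whose M range contains m, then interpolate linearly
    for (x0, y0, m0), (x1, y1, m1) in zip(verts, verts[1:]):
        if m0 <= m <= m1:
            t = (m - m0) / (m1 - m0)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    raise ValueError("M value outside the calibrated range")

print(locate_along(verts, 8))  # (1.25, 3.5): 3/4 of the way from (2, 2) to (1, 4)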

VB2010 Setting logarithmic scale intervals

I'm developing a financial application in which I need to display data in a chart with a logarithmic scale on the Y axis. Everything works fine except for the intervals. With the following:
chart.ChartAreas(0).AxisY.IsLogarithmic = True
chart.ChartAreas(0).AxisY.LogarithmBase = 10
chart.ChartAreas(0).AxisY.Interval = 1
chart.ChartAreas(0).AxisY.Minimum = CalcMinYVal(minYVal)
I get the CalcMinYVal multiplied by 10^0, 10^1, 10^2, 10^3 and so on for the Y-axis values.
I would like to have the Y axis values increased by 1. How can I have the interval be REALLY 1?
You can enable the MinorGrid property
chart.ChartAreas(0).AxisY.MinorGrid = True
to show the horizontal lines in between the powers of 10.
However, there is a limitation in showing a value for each subdivision: labels can only appear at fixed intervals, controlled by the Interval property of the LabelStyle.
For example, to show 10 subdivisions, you can set:
Chart.ChartAreas(0).AxisY.LabelStyle.Interval = 0.1
The number of horizontal lines of the MinorGrid can be controlled by using its Interval property:
Chart.ChartAreas(0).AxisY.MinorGrid.Interval = 1
and the values of the labels can be rounded by using the Format property:
Chart.ChartAreas(0).AxisY.LabelStyle.Format = "{0.0}"

How to Resize using Lanczos

I can easily calculate the values of the sinc(x) curve used in Lanczos, and I have read the previous explanations of Lanczos resizing, but being new to this area I do not understand how to actually apply them.
To resample with Lanczos, imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +-3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, with the distance from the output location (in output pixel coordinates) as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
For example, what does "overlay the input and output" actually mean in programming terms?
In the equation given
lanczos(x) = {
    0              if abs(x) > 3,
    1              if x == 0,
    sin(x*pi)/x    otherwise
}
what is x?
As a simple example, suppose I have an input image with 14 values (i.e. in addresses In0-In13):
20 25 30 35 40 45 50 45 40 35 30 25 20 15
and I want to scale this up by 2, i.e. to an image with 28 values (i.e. in addresses Out0-Out27).
Clearly, the value in address Out13 is going to be similar to the value in address In7, but which values do I actually multiply to calculate the correct value for Out13?
What is x in the algorithm?
If the values in your input data are at t coordinates [0 1 2 3 ...], then your output (which is scaled up by 2) has t coordinates at [0 .5 1 1.5 2 2.5 3 ...]. So to get the first output value, you center your filter at 0 and multiply by all of the input values. Then to get the second output, you center your filter at 1/2 and multiply by all of the input values. And so on.
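To put the whole procedure in one place, here is a minimal 1-D Lanczos-3 resampler in Python, under one common convention (kernel support measured in input pixel coordinates, which is the usual choice for upscaling; edge samples are replicated). The function names are mine, not from any library:
import numpy as np

def lanczos3(x):
    # Lanczos-3 kernel: sinc(x) * sinc(x/3), zero for |x| >= 3
    x = np.asarray(x, dtype=float)
    out = np.sinc(x) * np.sinc(x / 3.0)
    out[np.abs(x) >= 3] = 0.0
    return out

def resample_1d(src, scale):
    # output sample i sits at input coordinate i / scale
    n_out = int(len(src) * scale)
    dst = np.empty(n_out)
    for i in range(n_out):
        center = i / scale                          # position in input coordinates
        taps = np.arange(int(np.floor(center)) - 2, int(np.floor(center)) + 4)
        weights = lanczos3(taps - center)           # kernel evaluated at the distances
        weights /= weights.sum()                    # normalize so the weights sum to 1
        idx = np.clip(taps, 0, len(src) - 1)        # replicate samples at the edges
        dst[i] = np.dot(weights, src[idx])
    return dst

src = np.array([20, 25, 30, 35, 40, 45, 50, 45, 40, 35, 30, 25, 20, 15], float)
out = resample_1d(src, 2)
print(out[13])  # Out13 sits at input coordinate 6.5, between In6 (50) and In7 (45)
So for Out13 the x values passed to the kernel are the distances from 6.5 to the six input positions 4..9, i.e. -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, and those six weighted input values are what you sum.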