Selecting time dimension for a given latitude and longitude from an Xarray dataset - indexing

I am looking for suggestions on the fastest way to select a time series for a given latitude and longitude from an xarray dataset. The xarray dataset I am working with is 3-dimensional, with shape [400, 2000, 7200], where the first dimension is time (400), then latitude (2000), then longitude (7200). I simply need to read in an individual time series for each of the grid cells in a given rectangle, so I am reading the time series one by one for each grid cell within the given rectangle.
For this selection I am using the .sel method:
XR.sel(latitude=Y, longitude=X)
where XR is the xarray dataset, and Y and X are the given latitude and longitude.
This works well but turns out to be very slow when repeated many times. Is there a faster option to do this?
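For concreteness, the per-cell pattern described above looks roughly like the sketch below (rect_lats and rect_lons are hypothetical arrays of the cell coordinates inside the rectangle, and XR is the dataset from the question); the last two lines show a common variant of slicing and loading the whole rectangle once before iterating in memory:

series = {}
for y in rect_lats:
    for x in rect_lons:
        # one 400-element time series per grid cell, selected point by point
        series[(y, x)] = XR.sel(latitude=y, longitude=x)

# variant: pull the whole rectangle into memory once, then index it
# (slice bounds assume ascending latitude/longitude coordinates)
rect = XR.sel(latitude=slice(rect_lats.min(), rect_lats.max()),
              longitude=slice(rect_lons.min(), rect_lons.max())).load()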
Thank you for your help!

Related

Extract pixel values using polygons from time series image collection using Colab

I created a Sentinel image collection (s7) and applied ee.Reducer.percentile (3 months of data):
S2_PU_perc = s7.reduce(ee.Reducer.percentile(ee.List([0, 25, 50, 75, 100])))
I now need to extract from this image (S2_PU_perc) 20 random pixel values from the polygon shapefile that I have (with 200 polygons in it) and get my output as a CSV (this will have pixels in the rows and bands in the columns; every band will have 5 percentile values). I need this code to run in Colab.
Thanks very much; I will appreciate the help.
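A rough sketch of one possible approach with the Earth Engine Python API, assuming 20 random pixels are wanted per polygon and that the shapefile has been uploaded as an Earth Engine table asset (the asset ID, scale, and variable names below are placeholders):

import ee

# ee.Authenticate() may be needed first in a fresh Colab session
ee.Initialize()

polygons = ee.FeatureCollection("users/your_username/your_polygons")  # hypothetical asset ID

def sample_polygon(feature):
    # 20 random pixels from this polygon; one property per band/percentile
    return S2_PU_perc.sample(
        region=feature.geometry(),
        scale=10,        # Sentinel-2 resolution; adjust as needed
        numPixels=20,
        geometries=True,
    )

samples = polygons.map(sample_polygon).flatten()

# export one row per sampled pixel, one column per band, to Drive as CSV
task = ee.batch.Export.table.toDrive(
    collection=samples,
    description="S2_PU_percentile_samples",
    fileFormat="CSV",
)
task.start()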

How Can I Find Peak Values of Defined Areas from Spectrogram Data using numpy?

I have spectrogram data from an audio analysis which looks like this:
On one axis I have frequencies in Hz and on the other, times in seconds. I added the grid over the map to show the actual data points. Due to the nature of the frequency analysis used, the best results never give evenly spaced time and frequency values.
To allow comparison of data from multiple sources, I would like to normalize this data. For this reason, I would like to calculate the peak values (maximum and minimum) for specified areas in the map.
The second visualization shows the areas where I would like to calculate the peak values. I marked an area with a green rectangle to visualize this.
While for the time values I would like to use equally spaced ranges (e.g. 0.0-10.0, 10.0-20.0, 20.0-30.0), the frequency ranges are unevenly distributed. At higher frequencies they will be something like 450-550, 550-1500, 1500-2500, ...
You can download an example dataset here: data.zip. You can unpack it like this:
import numpy as np

with np.load(DATA_PATH) as data:
    frequency_labels = data['frequency_labels']
    time_labels = data['time_labels']
    spectrogram_data = data['data']
DATA_PATH has to point to the path of the .npz data file.
As input, I would provide an array of frequency and time ranges. The result should be another 2d NumPy ndarray with either the maximum or the minimum values. As the amount of data is huge, I would like to rely on NumPy as much as possible to speed up the calculations.
How do I calculate the maximum/minimum values of defined areas from a 2d data map?
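One way to do this, sketched below under the assumption that frequency_labels and time_labels are sorted in ascending order and that spectrogram_data is laid out as (frequency, time): convert each range into index bounds with np.searchsorted, then take the max (or min) of the corresponding block. The ranges shown are the hypothetical ones from the question; a range containing no samples would produce an empty block and raise on .max().

import numpy as np

# frequency_labels, time_labels, spectrogram_data loaded from the .npz file as above

# hypothetical range definitions: (low, high) edges per axis
freq_ranges = [(450, 550), (550, 1500), (1500, 2500)]
time_ranges = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]

peaks = np.empty((len(freq_ranges), len(time_ranges)))
for i, (f_lo, f_hi) in enumerate(freq_ranges):
    f0, f1 = np.searchsorted(frequency_labels, [f_lo, f_hi])
    for j, (t_lo, t_hi) in enumerate(time_ranges):
        t0, t1 = np.searchsorted(time_labels, [t_lo, t_hi])
        block = spectrogram_data[f0:f1, t0:t1]
        peaks[i, j] = block.max()   # use block.min() for the minima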

xarray, indexing with multidimensional coordinates

I'm not sure if it is possible, but I would like to add a new coordinate to an xarray DataArray that is the same size as the data it's indexing but has a different number of dimensions.
For example, take a 2D array where one dimension is the date and the other is the time of day:
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(5, 24)
date_idx = pd.date_range("20200921", "20200925")
time_of_day_idx = np.arange(0, np.timedelta64(1, "D"), np.timedelta64(1, "h"))
da = xr.DataArray(data,
                  dims=("date", "time_of_day"),
                  coords={"date": date_idx,
                          "time_of_day": time_of_day_idx})
I can add a new 2D coordinate with the times for each value in the array.
time_idx = np.add.outer(date_idx, time_of_day_idx)
da.assign_coords(time=(("date", "time_of_day"), time_idx))
But what I'd really like to be able to do is stack the dimensions together to create a new one-dimensional coordinate for a time series in the same array, something like this:
da.assign_coords(time=(("date", "time_of_day"), time_idx.flatten()))
However, doing this raises:
ValueError: dimensions ('date', 'time_of_day') must have the same length as the number of data dimensions, ndim=1
I know I can reshape or flatten a 2D xarray to 1D and create the new index for the time series, but it would be really handy to have a 1D coordinate that can index into a 2D xarray. I've played a bit with stacking and MultiIndex, but I found the same problem: each index has separate values to access each dimension, rather than a single value that can be used to index the location across both dimensions.
Does anyone know if this is possible?
This is not currently possible with xarray.
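For reference, the reshape/flatten workaround the question already mentions could look roughly like this sketch, which builds a separate 1-D view of the same values (data and time_idx as defined in the question):

import numpy as np
import xarray as xr

flat = xr.DataArray(
    data.reshape(-1),
    dims="time",
    coords={"time": np.asarray(time_idx).reshape(-1)},
)

flat.sel(time="2020-09-22T13:00")   # select a single value by absolute time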

Gridding/binning data

I have a dataset with three columns: lat, lon, and wind speed. My goal is to have a 2-dimensional lat/lon gridded array that sums the wind speed observations falling within each grid box. It seems like that should be possible with groupby or cut in pandas, but I can't puzzle out how to do it.
Here is an example of what I'm trying to replicate from another language: https://www.ncl.ucar.edu/Document/Functions/Built-in/bin_sum.shtml
It sounds like you are using pandas. Are the data already binned? If so, something like this should work:
data.groupby(["lat_bins", "lon_bins"]).sum()
If the lat and lon data are not binned yet, you can use pandas.cut to create binned value columns like this:
data["lat_bins"] = pandas.cut(x=data["lat"], bins=[...some binning...])

Numpy polyfit: how to get an exact fit for the data points provided

I have x, y data points.
Using these points, I am trying to create a function that fits 50 points (y values) to generate the corresponding x coordinates.
But when I zoom into the plot, I can see that the curve fits the 50 provided points, while the original data points deviate slightly (by roughly delta = 0.001) from the line generated from the 50 points.
How do I generate a perfect curve that fits the data points along with the 50 points provided?
Please refer to the screenshot of the code.
To cover 50 points perfectly you need to increase the order of the polynomial. So instead of polyfit(x, y, 10), try polyfit(x, y, 49)?
See https://arachnoid.com/polysolve/
A "perfect" fit (one in which all the data points are matched) can often be gotten by setting the degree of the regression to the number of data pairs minus one.