(n+1)-dim boolean masking an n-dim array with an array of means as desired output - numpy

I have this 2D-array with values
values=np.random.rand(3,3)
and a 3D-array with boolean masks
masks = np.random.rand(5,3,3)>0.5
My desired output is an array of the means of the masked values. I can do that with:
np.array([values[masks[i]].mean() for i in range(len(masks))])
Is there a more efficient way of achieving that?

You could use matrix multiplication with np.dot like so -
# Counts of valid mask elements for each element in output
counts = masks.sum(axis=(1,2))
# Use matrix multiplication to get sum of elementwise multiplications.
# Then, divide by counts for getting average/mean values as final output.
out = np.dot(masks.reshape(masks.shape[0],-1),values.ravel())/counts
One can also use np.tensordot to perform the dot-product without reshaping, like so -
out = np.tensordot(masks,values,axes=([1,2],[0,1]))/counts
For generic reductions such as min() and max(), you can broadcast values to a 3D array of the same shape as masks, with elements taken from values at True positions and set to NaN elsewhere. Then you can use functions like np.nanmin and np.nanmax, which perform these operations while ignoring NaNs, replicating the desired behavior. Thus, we would have -
# Masked array with values being put at True places of masks, otherwise NaNs
nan_masked_values = np.where(masks,values,np.nan)
# For performing .min() use np.nanmin
out_min = np.nanmin(nan_masked_values,axis=(1,2))
# For performing .max() use np.nanmax
out_max = np.nanmax(nan_masked_values,axis=(1,2))
Thus, the original .mean() calculation could be performed with np.nanmean like so -
out_mean = np.nanmean(nan_masked_values,axis=(1,2))
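A quick way to sanity-check all three approaches against the original list comprehension, on random data of the same shapes as above:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random((3, 3))
masks = rng.random((5, 3, 3)) > 0.5
# Guarantee each mask has at least one True element so no mean is empty
masks[:, 0, 0] = True

# Reference: loop over masks, mean of masked values for each
ref = np.array([values[m].mean() for m in masks])

counts = masks.sum(axis=(1, 2))
out_dot = np.dot(masks.reshape(masks.shape[0], -1), values.ravel()) / counts
out_td = np.tensordot(masks, values, axes=([1, 2], [0, 1])) / counts
out_nan = np.nanmean(np.where(masks, values, np.nan), axis=(1, 2))

assert np.allclose(ref, out_dot)
assert np.allclose(ref, out_td)
assert np.allclose(ref, out_nan)
```

Note the guard line: if any mask were all-False, the dot/tensordot versions would divide by zero and np.nanmean would warn about an all-NaN slice.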


Gdal: how to assign values to pixel based on condition?

I would like to change the pixel values of a geotiff raster so that each pixel is 1 if its value is between 50 and 100, and 0 otherwise.
Following this post, this is what I am doing:
gdal_calc.py -A input.tif --outfile=output.tif --calc="1*(50<=A<=100)" --NoDataValue=0
but I got the following error
0.. evaluation of calculation 1*(50<=A<=100) failed
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think such a notation would only work if the expression returns a single boolean, but this returns an array of booleans. Hence the suggestion to aggregate the array to a scalar with something like any() or all().
You should be able to write it in a way compatible with Numpy arrays with something like this:
1 * ((50 <= A) & (A <= 100))
Your original expression contains an implicit Python and, whereas this uses an explicit &, which translates to np.logical_and and performs an element-wise test of whether both sides are True.
I'm not sure what the multiplication by one adds here beyond casting the bool result to an integer dtype. Even if you need to write the result as int32, you can probably leave that cast to GDAL in this case.
A toy example replicating this would be:
a = np.random.randint(0, 2, 5, dtype=np.bool_)
b = np.random.randint(0, 2, 5, dtype=np.bool_)
With this data, the expression a and b would fail in the same way, because Python can't evaluate an entire array as True/False, whereas a & b returns a new array with the element-wise result.
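As a minimal numpy sketch of the same fix (A here is a small stand-in array, not a real raster band):

```python
import numpy as np

A = np.array([10, 50, 75, 100, 130])

# The chained comparison (50 <= A <= 100) implies `and`, which tries to
# reduce the intermediate boolean array to a single bool and raises.
try:
    mask = 50 <= A <= 100
except ValueError:
    pass  # "The truth value of an array with more than one element is ambiguous"

# Element-wise version: & combines the two boolean arrays per element
mask = (50 <= A) & (A <= 100)
result = 1 * mask  # casts bool -> int
print(result)  # -> [0 1 1 1 0]
```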

Simple question about slicing a Numpy Tensor

I have a Numpy Tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2nd, 3rd, and 4th entries along the first dimension of this tensor, which you can do with
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this, instead of explicitly writing out the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: for my real application, the tensor's shape is something like (30000, 100, 100), and I would like to grab everything from entry 10000 to the end along the first dimension, i.e. a slab of shape (20000, 100, 100).
The simplest way in your case is to use X[1:4]. This is the same as X[[1,2,3]], but notice that with X[1:4] you only need one pair of brackets because 1:4 already represent a range of values.
For an N-dimensional array in NumPy, if you specify indexes for fewer than N dimensions you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only need to pass : explicitly when you index a dimension while taking all elements of a dimension that comes before it, such as X[:, 2:4].
If you wish to select everything from some index to the end of the array, simply use Python slicing notation as below:
X[10000:,:,:]
This selects all rows from 10000 through the end of the array, along with all columns and depths for them.
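Both forms can be checked on a small stand-in tensor (only the shapes matter, not the real data):

```python
import numpy as np

X = np.arange(64).reshape((4, 4, 4))

# Basic slicing X[1:4] selects the same entries as fancy indexing
# X[[1,2,3]] (the slice returns a view, the index list returns a copy)
assert np.array_equal(X[1:4], X[[1, 2, 3], :, :])
assert X[1:4].shape == (3, 4, 4)

# The larger case: everything from entry 10000 to the end along axis 0
# (smaller trailing dims here to keep the example light)
big = np.zeros((30000, 10, 10))
assert big[10000:].shape == (20000, 10, 10)
```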

numpy: Different results between diff and gradient for finite differences

I want to calculate the numerical derivative of two arrays a and b.
If I do
c = diff(a) / diff(b)
I get what I want, but I lose the edge (the last point), so c.shape != a.shape.
If I do
c = gradient(a, b)
then c.shape = a.shape, but I get a completely different result.
I have read how gradient is calculated in numpy, and I guess it does something quite different, although I don't quite understand the difference yet. Is there a way, or another function, to calculate the derivative that also gives the values at the edges?
And why is the result so different between gradient and diff?
These functions, although related, do different actions.
np.diff simply takes the differences of array slices along a given axis; for the n-th difference it returns an array shorter by n along that axis (which is what you observed in the n=1 case). Please see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html
np.gradient produces a set of gradients of an array along all its dimensions while preserving its shape: https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html. Your second argument b is taken as the first non-keyword entry of *varargs, which describes the spacing between the values of the first argument. In older NumPy versions only scalar spacings were supported there, so passing the array b did not do what you intended, hence the results that don't match your intuition. Since NumPy 1.13, np.gradient does accept an array of sample coordinates in that position, in which case gradient(a, b) computes the derivative of a with respect to b; it will still differ from diff(a)/diff(b), because gradient uses central differences in the interior and one-sided differences at the edges, while diff is a purely one-sided forward difference.
I would simply use c = diff(a) / diff(b) and append values to c if you really need to have c.shape match a.shape. For instance, you might append zeros if you expect the gradient to vanish close to the edges of your window.
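A small sketch comparing the two on a known derivative (here b is a uniform grid and a = b**2, so da/db = 2b):

```python
import numpy as np

b = np.linspace(0.0, 1.0, 11)   # uniform grid, spacing 0.1
a = b ** 2                      # derivative da/db = 2b

# Forward differences: result is one element shorter than a
c_diff = np.diff(a) / np.diff(b)
assert c_diff.shape == (10,)

# Pad to restore the original length, e.g. by repeating the last value
c_full = np.append(c_diff, c_diff[-1])
assert c_full.shape == a.shape

# np.gradient keeps the shape; its central differences in the interior
# are exact for this quadratic at interior points
c_grad = np.gradient(a, b)
assert np.allclose(c_grad[1:-1], 2 * b[1:-1])
```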

Is it possible to omit a Tensorflow scalar summary dependent on its value?

I build summary ops and add them to collections, then always evaluate the summary collection as part of the sess.run call during training/validation.
However, there are some cases where the value is nan, and it makes the Tensorboard graphs go bad. (triangles instead of data points, and the smoothing doesn't work with a nan value in between).
Is there a way to omit a particular summary from the collection, dependent on the value being valid? I could replace the nan value with a zero or similar, but any artificially chosen value would pollute the true reported statistics.
I add the summaries like this:
tf.summary.scalar('scc_precision_test', precision_test, [Constants.TEST_SUMMARIES])
Thanks!
You can check the value of your summary before writing it to the FileWriter:
prec_test = tf.summary.scalar('scc_precision_test', precision_test,
                              [Constants.TEST_SUMMARIES])
# ...
..., prec_test_sum = sess.run([..., prec_test], ...)
prec_test_sum = tf.Summary.FromString(prec_test_sum)
if np.isfinite(prec_test_sum.value[0].simple_value):
    writer.add_summary(prec_test_sum.SerializeToString(), global_step=...)
If you have multiple summaries merged into a single tf.Summary object (e.g. made with tf.summary.merge/tf.summary.merge_all), then you would have to filter the value field:
prec_test = tf.summary.scalar('scc_precision_test', precision_test,
                              [Constants.TEST_SUMMARIES])
merged = tf.summary.merge_all(key=Constants.TEST_SUMMARIES)
# ...
..., merged_sum = sess.run([..., merged], ...)
merged_sum = tf.Summary.FromString(merged_sum)
# Reversed traversal so deletions don't invalidate the remaining indices
for i, value in reversed(list(enumerate(merged_sum.value))):
    # Discard the entry if it is a scalar and not finite
    if value.WhichOneof('value') == 'simple_value' and not np.isfinite(value.simple_value):
        del merged_sum.value[i]
# Write all remaining valid summaries
writer.add_summary(merged_sum.SerializeToString(), global_step=...)
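The reversed-traversal deletion trick itself is plain Python and easy to check in isolation (floats stand in for the simple_value of each summary entry):

```python
import math

values = [0.5, float('nan'), 1.2, float('inf'), 3.0]

# Deleting while iterating forward would shift later elements and skip them;
# iterating over a reversed snapshot keeps the remaining indices valid
for i, v in reversed(list(enumerate(values))):
    if not math.isfinite(v):
        del values[i]

print(values)  # -> [0.5, 1.2, 3.0]
```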

Numpy sum over planes of 3d array, return a scalar

I'm making the transition from MATLAB to Numpy and feeling some growing pains.
I have a 3D array, let's say it's 3x3x3, and I want the scalar sum of each plane.
In matlab, I would use:
sum_vec = sum(3dArray,3);
TIA
wbg
EDIT: I was wrong about my matlab code. Matlab only vectorizes in one dim, so a loop would be required. So numpy turns out to be more elegant...cool.
MATLAB
for i = 1:3
sum_vec(i) = sum(sum(3dArray(:,:,i)));
end
You can do
sum_vec = np.array([plane.sum() for plane in cube])
or simply
sum_vec = cube.sum(-1).sum(-1)
where cube is your 3d array. You can specify 0 or 1 instead of -1 (or 2) depending on the orientation of the planes. The latter version is also better because it doesn't use a Python loop, which usually helps to improve performance when using numpy.
You should use the axis keyword in np.sum. Like in many other numpy functions, axis lets you perform the operation along a specific axis. For example, to sum along the last dimension (note that 3dArray isn't a valid Python identifier, so arr3d is used below):
import numpy as np
sum_vec = np.sum(arr3d, axis=-1)
And you'll get a resulting 2D array whose entries are the sums along the last dimension, i.e. over the slices arr3d[i, k, :].
UPDATE
I didn't understand exactly what you wanted: you want to sum over two dimensions (a plane). In this case you can apply two sums. For example, summing over the first two dimensions:
sum_vec = np.sum(np.sum(arr3d, axis=0), axis=0)
Instead of applying the same sum function twice, you may perform the sum on the reshaped array:
a = np.random.rand(10, 10, 10) # 3D array
b = a.view()
b.shape = (a.shape[0], -1)
c = np.sum(b, axis=1)
The above should be faster because you only sum once.
sumvec = np.sum(arr3d, axis=2)
or this works as well
sumvec = arr3d.sum(2)
(again using arr3d for the array, since 3DArray isn't a valid Python name). Remember Python starts at 0, so axis=2 refers to the 3rd dimension.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.sum.html
If you're trying to sum over a plane (and avoid loops, which is always a good idea), you can use np.sum and pass a tuple of axes as its argument.
For example, if you have an (n, 3, 3) array, then using
np.sum(a, (1, 2))
will give a length-n array, summing over each 3x3 plane rather than a single axis.
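A quick check of the tuple-axis form against the per-plane loop:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((5, 3, 3))

# Sum each 3x3 plane in one call by passing a tuple of axes
per_plane = a.sum(axis=(1, 2))
assert per_plane.shape == (5,)   # one scalar per plane, not (5, 1, 1)

# Same result as looping over the planes
loop = np.array([plane.sum() for plane in a])
assert np.allclose(per_plane, loop)

# Chained single-axis sums agree as well
assert np.allclose(a.sum(-1).sum(-1), per_plane)
```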