GDAL: how to assign values to pixels based on a condition?

I would like to change the pixel values of a GeoTIFF raster so that a pixel becomes 1 if its value is between 50 and 100, and 0 otherwise.
Following this post, this is what I am doing:
gdal_calc.py -A input.tif --outfile=output.tif --calc="1*(50<=A<=100)" --NoDataValue=0
but I get the following error:
0.. evaluation of calculation 1*(50<=A<=100) failed
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Such a notation would only work if the expression returned a single boolean, but this returns an array of booleans; hence the error's suggestion to aggregate the array to a scalar with something like any() or all().
You should be able to write it in a way compatible with Numpy arrays with something like this:
1 * ((50 <= A) & (A <=100))
Your original expression has an implicit and in it: Python expands 50<=A<=100 to (50 <= A) and (A <= 100), and and cannot operate element-wise on arrays. The explicit & translates to np.logical_and on boolean arrays, an element-wise test of whether both sides are True.
I'm not sure what the multiplication by one adds in this case; it merely casts the bool result to an integer datatype. Even if you need to write the result as int32, you can probably still leave the casting to GDAL here.
A toy example replicating this would be:
import numpy as np
a = np.random.randint(0, 2, 5, dtype=np.bool_)
b = np.random.randint(0, 2, 5, dtype=np.bool_)
With this data, the expression a and b would fail in the same way, because Python can't evaluate an entire array as True/False, whereas a & b returns a new array with the element-wise result.
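Putting it together, the corrected call from your question would be:
gdal_calc.py -A input.tif --outfile=output.tif --calc="1*((50<=A)&(A<=100))" --NoDataValue=0
If you do want to control the output type yourself, gdal_calc.py also has a --type flag (e.g. --type=Byte), which would make the 1* multiplication redundant.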

Related

Combination function in numpy that can be applied as a vectorized method on a data frame

I have a dataframe with around 45 million rows, and I need to apply a method that calculates combinations of two columns, so the function needs to be applied to all rows. A for loop with the comb function from the math module works, but takes a lot of time; .apply also doesn't seem to be a viable option. I tried 3 other options.
Option 1
comb function from math module as vectorized operation
df1['comb'] = comb(df1['c1'],df1['c2'])
This throws the error TypeError: 'Series' object cannot be interpreted as an integer, because math.comb only accepts plain Python integers, not Series.
Option 2
df1['comb'] = np.vectorize(comb_fun)(df1['c1'],df1['c2'])
This works, but still takes time.
Option 3
import scipy.special as ss
df1['comb'] = ss.comb(df1['c1'],df1['c2'])
This works and gives fast results, but it returns floating point values, which affects my further calculations. When I use exact=True to avoid the floating point imprecision, it gives the following error:
TypeError: cannot convert the series to <class 'int'>
If any of you knows another function or way that can be applied as a vectorized operation on a data frame to calculate combinations, please suggest it. Thanks.
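Not part of the original question, but one possible workaround is to keep the fast, vectorized ss.comb and round the float results back to integers; this is only trustworthy while the true counts stay below 2**53, the limit up to which float64 represents every integer exactly:
import pandas as pd
import scipy.special as ss

df1 = pd.DataFrame({'c1': [5, 10, 20], 'c2': [2, 3, 4]})  # toy stand-in data
# ss.comb is vectorized but returns float64; round back to integers
df1['comb'] = ss.comb(df1['c1'], df1['c2']).round().astype('int64')
print(df1)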

Julia: index a matrix with a vector

Suppose I have a 20-by-10 matrix m
and a 20-by-1 vector v, where each element is an integer between 1 and 10.
Is there a smart indexing command, something like m[:,v],
that would give a vector whose element i is the element of m at index [i, v[i]]?
No, it seems that you cannot do it. Documentation (http://docs.julialang.org/en/stable/manual/arrays/) says:
If all the indices are scalars, then the result X is a single element from the array A. Otherwise, X is an array with the same number of dimensions as the sum of the dimensionalities of all the indices.
So, to get a 1d result from an indexing operation with two indices, one of them has to have dimensionality 0, i.e. be just a scalar -- and then you won't get what you want.
Use comprehension, as proposed in the comment to your question.
To be explicit about the comprehension approach:
[m[i,v[i]] for i = 1:length(v)]
This is concise and clear enough that having a special syntax seems unnecessary.

Using vectorize to apply function to each row in Numpy 2d array

I have a 10000x784 matrix of data (10000 examples and 784 features) called X_valid, and I'd like to apply the following function to each row of this matrix and get a numerical result:
def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)
(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:
v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])
(covariance[num], means[num], and priors[num] are just constants.)
However, I get the following error when running this:
File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'
That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).
How can I alter this to get the desired behavior?
vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.
I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
You describe x_valid as a 2d array that you want to evaluate row by row, and the other terms as 'constants' which you select with [num]. What shape are those constants?
Your function treats a lot of these terms as 2d arrays:
x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
    + mean.T.dot(np.linalg.inv(cov)).dot(mean)
    + np.linalg.slogdet(cov)[1] + np.log(prior)
x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does nothing.
np.linalg.inv(cov) only makes sense if cov is 2d.
mean.T.dot... assumes mean is 2d.
np.linalg.slogdet(cov)[1] picks the log-determinant out of the (sign, logdet) pair that slogdet returns; like inv, it needs cov to be a square 2d array.
You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.
I suggest just using a for loop:
def v_predict_prob(X_valid, c, m, p):
    out = []
    for row in X_valid:
        out.append(predict_prob(row, c, m, p))
    return np.array(out)
Under the hood np.vectorize is doing the same thing: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html
I know this question is a bit outdated, but I thought I would provide an answer for 2020.
Since the release of numpy 1.12 there is a new optional argument, signature, which should allow 2D-array functionality in most cases. Additionally, you will want to list the constants in excluded, since they should not be vectorized.
All you would need to change is:
v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'], signature='(n)->()')
This signifies that the function expects a 1d array of length n and outputs a scalar, and that cov, mean, and prior will not be vectorized. Note that excluded matches keyword arguments by name, so those three must be passed as keywords in the call.
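A minimal end-to-end sketch (toy shapes and random stand-in data of my choosing; assumes a reasonably recent numpy) showing the loop and the signature-based vectorize agreeing:
import numpy as np

rng = np.random.default_rng(0)
X_valid = rng.standard_normal((5, 3))  # 5 toy rows, 3 features
cov = np.eye(3)                        # stand-ins for covariance[num], means[num], priors[num]
mean = np.zeros(3)
prior = 0.5

def predict_prob(x_valid, cov, mean, prior):
    # same formula as in the question; .T is a no-op on a 1d row
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

loop_scores = np.array([predict_prob(row, cov, mean, prior) for row in X_valid])

# excluded arguments must be passed as keywords for the exclusion to apply
v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'],
                              signature='(n)->()')
vec_scores = v_predict_prob(X_valid, cov=cov, mean=mean, prior=prior)

print(np.allclose(loop_scores, vec_scores))  # True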

(n+1)-dim boolean masking of an n-dim array, with an array of means as the desired output

I have this 2D array with values
import numpy as np
values = np.random.rand(3,3)
and a 3D-array with boolean masks
masks = np.random.rand(5,3,3)>0.5
My desired output is an array of the means of the masked values. I can do that with:
np.array([values[masks[i]].mean() for i in range(len(masks))])
Is there a more efficient way of achieving that?
You could use matrix-multiplication with np.dot like so -
# Counts of valid mask elements for each element in output
counts = masks.sum(axis=(1,2))
# Use matrix multiplication to get sum of elementwise multiplications.
# Then, divide by counts for getting average/mean values as final output.
out = np.dot(masks.reshape(masks.shape[0],-1),values.ravel())/counts
One can also use np.tensordot to perform the dot-product without reshaping, like so -
out = np.tensordot(masks,values,axes=([1,2],[0,1]))/counts
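As a quick sanity check (toy sizes from the question; assumes every mask slice has at least one True entry so that counts is never zero):
import numpy as np

values = np.random.rand(3, 3)
masks = np.random.rand(5, 3, 3) > 0.5

baseline = np.array([values[masks[i]].mean() for i in range(len(masks))])
counts = masks.sum(axis=(1, 2))
out = np.dot(masks.reshape(masks.shape[0], -1), values.ravel()) / counts
out_td = np.tensordot(masks, values, axes=([1, 2], [0, 1])) / counts
print(np.allclose(baseline, out), np.allclose(baseline, out_td))  # True True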
For generic cases involving functions like min() & max(), you can broadcast values to a 3D array of the same shape as masks, with elements taken from values at True positions and set to NaN otherwise. Then you can use functions like np.nanmin and np.nanmax, which perform such operations while ignoring the NaNs, replicating the desired behavior. Thus, we would have -
# Masked array with values being put at True places of masks, otherwise NaNs
nan_masked_values = np.where(masks,values,np.nan)
# For performing .min() use np.nanmin
out_min = np.nanmin(nan_masked_values,axis=(1,2))
# For performing .max() use np.nanmax
out_max = np.nanmax(nan_masked_values,axis=(1,2))
Thus, the original .mean() calculation could be performed with np.nanmean like so -
out_mean = np.nanmean(nan_masked_values,axis=(1,2))
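One caveat worth noting (my addition, not part of the original answer): a mask with no True entries produces an all-NaN slice, so the nan-functions return NaN and emit a RuntimeWarning, and counts would be zero in the dot-product approach:
# all-False mask: nanmean of an all-NaN slice is NaN ("Mean of empty slice" RuntimeWarning)
empty = np.zeros((1, 3, 3), dtype=bool)
print(np.nanmean(np.where(empty, values, np.nan), axis=(1, 2)))  # [nan]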

Numpy C-API array_equal

I've tried to find a function comparing two PyArrayObjects, something like numpy's array_equal, but I haven't found anything. Do you know of a function like this?
If not, how can I use numpy's array_equal from my C code?
Here's the code for array_equal:
def array_equal(a1, a2):
    try:
        a1, a2 = asarray(a1), asarray(a2)
    except:
        return False
    if a1.shape != a2.shape:
        return False
    return bool(asarray(a1 == a2).all())
As you can see, it is not a C-API-level function. After making sure both inputs are arrays and that their shapes match, it performs an element-wise == test, followed by all().
This does not work reliably with floats; it's fine with ints and booleans.
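To make the float caveat concrete at the Python level (a small demo of mine, not from the original answer): exact element-wise equality breaks down under rounding error, which is why np.allclose exists:
import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(np.array_equal(a, b))  # False: 0.1 + 0.2 != 0.3 exactly in float64
print(np.allclose(a, b))     # True: tolerance-based comparison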
There probably is some sort of equality function in the c-api, but a clone of this probably isn't what you need.
PyArray_CountNonzero(PyArrayObject* self)
might be a good function. I remember from digging into the code earlier that PyArray_Nonzero uses it to determine how big an array to allocate and return. You could give it an object that compares the elements of your 2 arrays (in whatever way is appropriate given the dtype), and then test for a nonzero count.
Or you could construct your own iterator that bails out as soon as it gets a not-equal pair of elements. Use nditer to get the full array broadcasting power.