NumPy: set values in a range above and below zero to zero

I'm building an optimization model with NumPy, and I have matrices containing values like 1e-17 and -1e-17. I would like to set all values within a range of, let's say, 1e-8 above and below zero to zero, so that the range of matrix coefficients is not as big.
I found this in another post:
a = [0 if a_ > thresh else a_ for a_ in a]
But this only works for non-negative values.
I tried adding another threshold like this:
a = [0 if a_ > thresh and a_ < thresh2 else a_ for a_ in a]
This gave an error message: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()."
I then added .all() like this: a = [0 if (a_ > thresh and a_ < thresh2).all() else a_ for a_ in a].
This did not work either.
Is there a simple solution to this problem? Any ideas on how to solve this?

I think you want to compare the absolute value to the threshold. abs should be a built-in in Python:
a = [0 if abs(a_) < thresh else a_ for a_ in a]
(Note the < comparison: elements whose magnitude is below the threshold get zeroed.) In case I'm missing something about Python's treatment of small numbers without NumPy, there's also a numpy.absolute(a_) function.
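Since the data is already in NumPy matrices, a boolean mask avoids the list comprehension entirely (and keeps the array shape intact). A minimal vectorized sketch, assuming a is a NumPy array and a cutoff of 1e-8:

import numpy as np

a = np.array([1e-17, -1e-17, 0.5, -2.0])
thresh = 1e-8

# Zero every element whose magnitude is below the threshold.
a[np.abs(a) < thresh] = 0
print(a)  # [ 0.   0.   0.5 -2. ]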

Related

Optimize this loss function (any way to vectorize it?)

def get_model_score(preds, actuals, sw):
    total_loss = 0
    for i in range(len(preds)):
        for idx, v in enumerate(actuals[i]):
            if v != 0:
                total_loss += sw[i] * abs(preds[i][idx] - actuals[i][idx])
    loss = total_loss / (sum(sw) * len(preds))
    return loss
I have a loss function which is essentially a weighted mean absolute error. However, we can expect every "true" sample to have only one non-zero value, e.g. [0, 0, 1]. We only want to account for the loss between this non-zero value and the corresponding predicted value.
Take the following examples:
True: [0, 0, 1]
Predicted: [0.5, -0.5, 0.5]
The loss for this sample would simply be 0.5. (In the actual function we also have an array of sample-wise weights, "sw".)
That being said, I'm having trouble figuring out whether my function can be vectorized with NumPy.
Looks like this is what did the trick:
np.sum(np.abs(actuals[~np.isnan(actuals)] - preds[~np.isnan(actuals)]) * sw) / (sum(sw) * len(preds))
I actually ended up going with NaNs instead of zeros, so the condition is ~np.isnan(actuals).
But yeah, I think the trick was to use the condition on both the actuals and preds arrays: applied to the preds array, it grabs the correct indices based on the condition computed from the actuals array. Hope this helps anyone doing something similar.
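For reference, here is that one-liner as a self-contained sketch, assuming NaN marks the ignored slots and each row of actuals has exactly one non-NaN entry (the example values are made up):

import numpy as np

preds = np.array([[0.5, -0.5, 0.5], [0.1, 0.9, 0.2]])
actuals = np.array([[np.nan, np.nan, 1.0], [np.nan, 0.5, np.nan]])
sw = np.array([1.0, 2.0])  # sample-wise weights

mask = ~np.isnan(actuals)  # exactly one True per row
# Masking both arrays pairs each true value with its prediction.
loss = np.sum(np.abs(actuals[mask] - preds[mask]) * sw) / (sw.sum() * len(preds))
print(loss)  # ~0.2167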

Possible to use np.where to check a condition on a vector, but output rows in a 2D array

I have a series and a dataframe. I want to check if the values in the series pass a condition, modify the corresponding row of the dataframe if they do, and otherwise leave it as is.
NumPy has a broadcasting issue with this - is there another way to do this?
import numpy as np
import pandas as pd

ser = pd.Series([74, 80, 24], pd.date_range(start='2020-01-01', periods=3, freq='D'))
test = pd.DataFrame([pd.Series([1, 2], index=['a', 'b'])] * len(ser), index=ser.index)
np.where(ser<50, (test*2), test)
ValueError: operands could not be broadcast together with shapes (3,) (3,2) (3,2)
I think a workaround would be to modify ser to be a dataframe with all equivalent columns, but it seems a little bit clunky.
Use broadcasting in NumPy, so values are not aligned by index; the Series and DataFrame only need to have the same length:
a = np.where(ser.to_numpy()[:, None] < 50, test * 2, test)
print(a)
[[1 2]
[1 2]
[2 4]]
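If you need a DataFrame again rather than a plain array, one way (continuing the snippet above, and assuming you want to keep test's index and columns) is to wrap the result:

out = pd.DataFrame(a, index=test.index, columns=test.columns)
print(out)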

Julia DataFrame: generate an unlinked duplicate of a column

I want to define a new column b based on column a which is afterwards not linked to the original. The naive approach below aliases the column, so mutating b also mutates a:
using DataFrames
x = DataFrame(a=1:3)
x.b = x.a
x.b[1] += 1
There are several ways to do it; the main ones are:
x[:, :b] = x.a
or
x.b = x[:, :a]
You can also write:
x[!, :b] = x[:, :a]
(this can be useful if :b were a variable)
Finally you could also just write:
x.b = copy(x.a)
or
x.b = x.a[:]
All indexing rules for DataFrames.jl can be found at https://juliadata.github.io/DataFrames.jl/stable/lib/indexing/.
In short (simplifying a bit, but these rules are enough to know in practice):
df.col is non-copying both for getting and for setting a column
df[!, :col] is the same as df.col, except that you can easily use a variable instead of a literal for indexing, and broadcasted assignment works even when :col is not yet present in the data frame (df.col does not support that)
df[:, :col] copies when getting a column and is an in-place operation when setting one, unless :col is not present in df, in which case it freshly allocates the column when setting

tf.argmax is returning a random high value, outside the valid dimension range

I have the following piece of code, where I have a tensor of dimensions (150, 240, 240).
From these 150 slices, I want to construct one slice (of size 240 by 240) by comparing all 150 slices at each position of the 240 by 240 matrix. I use tf.argmax for that. It usually works, but in some cases one of the values in the result is huge and seemingly random, like 4294967390. How is that possible? It should return a value between 0 and 149 at every position. The code follows.
Note: in the code below, the variable result has dimensions (20, 150, 240, 240).
for i in range(0, 20):
    denominator = tf.reduce_logsumexp(result[i, :, :, :], axis=0)
    if i == 0:
        stackofArrs = tf.argmax(tf.exp(result[i, :, :, :]-denominator), axis=0)
    else:
        stackofArrs = tf.concat([stackofArrs, tf.argmax(tf.exp(result[i, :, :, :]-denominator), axis=0)], axis=0)
I wondered whether the logsumexp operation was causing an overflow, but even in that case argmax shouldn't return a crazy value like this, right?
tf.argmax will output out-of-range values if it is applied to a tensor containing NaN or Inf values. You should make sure those are not present before applying tf.argmax.
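A minimal sketch of how to guard against this (TensorFlow 2.x; the small example tensor is made up for illustration):

import tensorflow as tf

x = tf.constant([[0.1, float("nan")], [0.3, 0.2]])

# Option 1: fail loudly if NaN/Inf slipped in.
# tf.debugging.check_numerics(x, "x contains NaN/Inf")  # raises InvalidArgumentError

# Option 2: replace non-finite entries (here with 0.0) before taking argmax.
clean = tf.where(tf.math.is_finite(x), x, tf.zeros_like(x))
print(tf.argmax(clean, axis=0))  # indices are now guaranteed to be in range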

Divide one numpy array by another only where both arrays are non-zero

What's the easiest, most Pythonic way to divide one numpy array by another (of the same shape, element-wise) only where both arrays are non-zero?
Where either the divisor or the dividend is zero, the corresponding element in the output array should be zero. (Zero is already the default output when only the dividend is zero, but np.inf is the default when the divisor is zero, and np.nan when both are zero.)
This still tries to divide by 0, but it gives the correct result:
np.where(b==0, 0, a/b)
To avoid doing the divide-by-zero, you can do:
m = b!=0
c = np.zeros_like(a)
np.place(c, m, a[m]/b[m])
I would do it in two lines:
z = x/y
z[y == 0] = 0
As you said, if only the element in x is 0, z will already be 0 at that position. So let NumPy handle that, and then fix up the places where y is 0 using NumPy's boolean indexing.
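Put together as a runnable sketch (the division itself still emits a divide-by-zero RuntimeWarning, which np.errstate can suppress; the example values are made up):

import numpy as np

x = np.array([1.0, 2.0, 0.0, 4.0])
y = np.array([2.0, 0.0, 5.0, 0.5])

# Divide everywhere, silencing the warnings for y == 0 entries.
with np.errstate(divide='ignore', invalid='ignore'):
    z = x / y
z[y == 0] = 0  # overwrite the inf/nan slots with 0
print(z)  # [0.5 0.  0.  8. ]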