Is there a way to easily get the logarithm of a np.ndarray containing errors - numpy

If I have an np.array of values, Y, with an np.array of corresponding errors, Err, the error in the log scale will be
Err_{log} = log(Y+Err) - log(Y) = log ((Y+Err)/Y)
While I can place this in my code, it isn't very readable. Is there a function that does this?

NumPy has the function log1p(x) that computes the log of 1+x. So you could write:
Err_log = np.log1p(Err/Y)
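A quick check that the two forms agree may help. This is a minimal sketch with made-up values for Y and Err, and it assumes "log" in the question means the natural logarithm (for base 10, divide the result by np.log(10)):

```python
import numpy as np

# Hypothetical measurements and their errors (not from the question).
Y = np.array([10.0, 200.0, 3000.0])
Err = np.array([1.0, 10.0, 30.0])

# The formula as written in the question.
err_log_direct = np.log(Y + Err) - np.log(Y)

# The same quantity via log1p, since log((Y + Err) / Y) = log(1 + Err / Y).
err_log = np.log1p(Err / Y)

print(np.allclose(err_log_direct, err_log))
```

log1p is also the more accurate of the two when Err/Y is very small, because it avoids forming 1 + x explicitly.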


Why does tensorflow mse() change with each run?

I used the TensorFlow tf.keras.metrics.MeanSquaredError() metric to evaluate the mean squared error between two numpy arrays, but each time I call mse() it gives a different result.
a = np.random.random(size=(100,2000))
b = np.random.random(size=(100,2000))
for i in range(100):
    v = mse(a, b).numpy()
    plt.scatter(i,v)
    print(v)
where I had previously defined mse = tf.keras.metrics.MeanSquaredError(). Here is the output. Any idea what is going wrong?
np.random.random generates random data on every run, so your code should produce a different MSE each time, shouldn't it?
Run 1:
[0.87148841 0.50221413 0.49858526 ... 0.22311888 0.71320089 0.36298912]
Run 2:
[0.14941241 0.78560523 0.62436783 ... 0.1865485 0.2730567 0.49300401]
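If the aim is to get identical input arrays on every run, seeding the generator is one option. The sketch below only addresses run-to-run variation of the random inputs; the seed value 0 is an arbitrary assumption, and nothing here touches the metric object itself:

```python
import numpy as np

# Two generators created with the same (arbitrary) seed,
# standing in for two separate runs of the script.
rng_run1 = np.random.default_rng(seed=0)
rng_run2 = np.random.default_rng(seed=0)

a_run1 = rng_run1.random(size=(100, 2000))
a_run2 = rng_run2.random(size=(100, 2000))

# With a fixed seed, both "runs" draw exactly the same numbers.
same = np.array_equal(a_run1, a_run2)
print(same)
```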

tf.io.decode_raw returns a tensor: how do I make it bytes or a string?

I've been struggling with this for a while. I searched Stack Overflow and checked the TF2 docs a bunch of times. There is one solution indicated, but I don't understand why my solution doesn't work.
In my case, I store a binary string (i.e., bytes) in tfrecords. If I iterate over the dataset via as_numpy_iterator, or call numpy() directly on each item, I get the binary string back, so iterating the dataset does work.
I'm not sure what exactly map() passes to test_callback. The object it receives has neither a method nor a property named numpy, and the same goes for the type tf.io.decode_raw returns (it is a Tensor, but it has no numpy either).
Essentially, I need to take a binary string, parse it via x = decoder.FromString(y), and then pass x to my encoder, which will turn it into a tensor.
def test_callback(example_proto):
    # I tried to figure out. can I use bytes?decode
    # directly and what is the most optimal solution.
    parsed_features = tf.io.decode_raw(example_proto, out_type=tf.uint8)
    # tf.io.decode_raw returns a tensor with N bytes.
    x = creator.FromString(parsed_features.numpy)
    encoded_seq = midi_encoder.encode(x)
    return encoded_seq
raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(test_callback)
Thank you, folks.
I found one solution but I would love to see more suggestions.
def test_callback(example_proto):
    from_string = creator.FromString(example_proto.numpy())
    encoded_seq = encoder.encoder(from_string)
    return encoded_seq
raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(lambda x: tf.py_function(test_callback, [x], [tf.int64]))
My understanding is that tf.py_function carries a performance penalty.
Thank you

Using numpy for polynomial fit on pandas dataframe

I have a dataframe containing astronomical data. I'm using statsmodels.formula.api to try to apply a polynomial fit to it, using columns labelled log_z and U, B, V, and other variables. So far I've got:
sources['log_z'] = np.log10(sources.z)
mask = ~np.isnan((B-I)) & ~np.isnan(log_z)
model = ols(formula='(B-I) + np.power((U-R),2) ~ log_z', data = [log_z[mask], (B-I)[mask]]).fit()
but I keep getting
PatsyError: Error evaluating factor: TypeError: list indices must be integers or slices, not str
(B-I) + np.power((U-R),2) ~ log_z
^^^^^^^^^^^^^^^^^
even though I'm passing arrays into the function. I get the same error message (apart from the last line) no matter what arrays I use, or how I format them. Can anyone see what I'm doing wrong?
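The traceback is consistent with data being a plain list: the formula machinery looks names up in data by string key, which a list does not support. For a polynomial fit with NumPy alone, as the title suggests, np.polyfit on the masked columns is one option. The sketch below assumes the goal is a quadratic in log_z for a single colour; the arrays are synthetic stand-ins for the dataframe columns, and the name colour is hypothetical:

```python
import numpy as np

# Synthetic stand-ins for sources['log_z'] and the B-I colour (hypothetical values).
log_z = np.linspace(-1.0, 1.0, 50)
colour = 2.0 * log_z**2 - 3.0 * log_z + 1.0
colour[[3, 17]] = np.nan  # pretend two rows have missing photometry

# Same masking idea as in the question: keep rows where both values exist.
mask = ~np.isnan(colour) & ~np.isnan(log_z)

# Least-squares quadratic fit; coefficients come back highest power first.
coeffs = np.polyfit(log_z[mask], colour[mask], deg=2)

# Evaluate the fitted polynomial at the same points.
fitted = np.polyval(coeffs, log_z[mask])
print(coeffs)
```

With a real dataframe, the inputs would be column arrays such as sources['log_z'].to_numpy() instead of the synthetic ones.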

How to print out the minibatchData's value?

minibatch_size = 5
data = reader.next_minibatch(minibatch_size, input_map={  # fetch minibatch
    x: reader.streams.query,
    y: reader.streams.slot_labels
})
evaluator = C.eval.Evaluator(loss, progress_printer)
evaluator.test_minibatch(data)
print("labels=", data[y].as_sequences())
I got an error for data[y].as_sequences() saying:
raise ValueError('cannot convert sparse value to sequences '
ValueError: cannot convert sparse value to sequences without the corresponding variable
How do I fix this? What is a variable? What should I put?
data[y].as_sequences(variable=y) should do the trick, but I wouldn't recommend it.
On larger datasets, as_sequences and asarray quickly cause out-of-memory exceptions.
I ended up using this:
true_labels = cntk.ops.argmax(labels_input).eval(minibatch[labels_input]).astype(int)

Numpy - AttributeError: 'Zero' object has no attribute 'exp'

I'm having trouble resolving a discrepancy: something breaks at runtime, but the exact same data and operations work fine in the Python console.
# f_err - currently has value 1.11819388872025
# l_scales - currently a numpy array [1.17840183376334 1.13456764589809]
sq_euc_dists = self.se_term(x1, x2, l_scales) # this is fine. It calls cdists on x1/l_scales, x2/l_scales vectors
return (f_err**2) * np.exp(-0.5 * sq_euc_dists) # <-- errors on this line
The error that I get is
AttributeError: 'Zero' object has no attribute 'exp'
However, calling those exact same lines, with the same f_err, l_scales, and x1, x2 in the console right after it errors out, somehow does not produce errors.
I was not able to find a post referring to the 'Zero' object error specifically, and the non-'Zero' ones I found didn't seem to apply to my case here.
EDIT: The question was a bit lacking in info, so here's an actual (extracted) runnable example with sample data I took straight out of a failed run. Run in isolation it works fine; I can't reproduce the error except at runtime.
Note that the sqeucl_dist function below is quite bad and I should be using scipy's cdist instead. However, because I'm using sympy's symbols for matrix elementwise gradients with over 15 partial derivatives in my real data, cdist is not an option as it doesn't deal with arbitrary objects.
import numpy as np
def se_term(x1, x2, l):
    return sqeucl_dist(x1/l, x2/l)
def sqeucl_dist(x, xs):
    return np.sum([(i-j)**2 for i in x for j in xs], axis=1).reshape(x.shape[0], xs.shape[0])
x = np.array([[-0.29932052, 0.40997373], [0.40203481, 2.19895326], [-0.37679417, -1.11028267], [-2.53012051, 1.09819485], [0.59390005, 0.9735], [0.78276777, -1.18787904], [-0.9300892, 1.18802775], [0.44852545, -1.57954101], [1.33285028, -0.58594779], [0.7401607, 2.69842268], [-2.04258086, 0.43581565], [0.17353396, -1.34430191], [0.97214259, -1.29342284], [-0.11103534, -0.15112815], [0.41541759, -1.51803154], [-0.59852383, 0.78442389], [2.01323359, -0.85283772], [-0.14074266, -0.63457529], [-0.49504797, -1.06690869], [-0.18028754, -0.70835799], [-1.3794126, 0.20592016], [-0.49685373, -1.46109525], [-1.41276934, -0.66472598], [-1.44173868, 0.42678815], [0.64623684, 1.19927771], [-0.5945761, -0.10417961]])
f_err = 1.11466725760716
l = [1.18388412685279, 1.02290811104357]
result = (f_err**2) * np.exp(-0.5 * se_term(x, x, l)) # This runs fine, but fails with the exact same calls and data during runtime
Any help greatly appreciated!
Here is how to reproduce the error you are seeing:
import sympy
import numpy
zero = sympy.sympify('0')
numpy.exp(zero)
This raises the same exception you are seeing.
You can fix this (inefficiently) by changing your code to the following to make things floating point.
def sqeucl_dist(x, xs):
    return np.sum([np.vectorize(float)(i-j)**2 for i in x for j in xs],
                  axis=1).reshape(x.shape[0], xs.shape[0])
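The same idea can be shown on a small object array. This is only a sketch: the array contents are hypothetical, and it assumes every entry is a concrete sympy number with no free symbols (float() fails on an expression that still contains symbols). Depending on the NumPy version, the failing call raises AttributeError or TypeError, so both are caught:

```python
import numpy as np
import sympy

# Hypothetical object array holding sympy numbers, as symbolic arithmetic can produce.
values = np.array([sympy.Integer(0), sympy.Float(1.5)], dtype=object)

try:
    np.exp(values)  # on object arrays NumPy looks for an .exp() method on each element
    raised = False
except (AttributeError, TypeError):
    raised = True

# Converting to a float dtype first gives NumPy ordinary numbers to work with.
as_float = values.astype(float)
result = np.exp(as_float)
print(raised, result)
```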
It would be better to fix your gradient function using lambdify.
Here's an example of how lambdify can be used on partial derivatives:
from sympy.abc import x, y, z
expression = x**2 + sympy.sin(y) + z
derivatives = [expression.diff(var, 1) for var in [x, y, z]]
derivatives is now [2*x, cos(y), 1], a list of Sympy expressions. To create a function which will evaluate this numerically at a particular set of values, we use lambdify as follows (passing 'numpy' as an argument like that means to use numpy.cos rather than sympy.cos):
derivative_calc = sympy.lambdify((x, y, z), derivatives, 'numpy')
Now derivative_calc(1, 2, 3) will return [2, -0.41614683654714241, 1]. These are ints and numpy.float64s.
A side note: np.exp(M) calculates the exponential of each element of M separately. If you are trying to compute a matrix exponential, you need scipy.linalg.expm.
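The difference between the two is easiest to see on the zero matrix. A minimal sketch, assuming SciPy is installed (the matrix exponential lives in scipy.linalg as expm):

```python
import numpy as np
from scipy.linalg import expm

M = np.zeros((2, 2))

# Element-wise: every entry becomes exp(0) = 1.
elementwise = np.exp(M)

# Matrix exponential: exp of the zero matrix is the identity matrix.
matrix_exp = expm(M)

print(elementwise)
print(matrix_exp)
```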