np.matmul() inside tf.py_func() throws SIGBUS error

I am seeing a fatal error from matrix multiplication inside a py_func call.
In the py_func call I am multiplying a tensor containing a set of 3D coordinates by a rotation matrix. This line reproduces the error:
x = np.matmul(np.ones([640*480, 3]), np.eye(3))
When running outside a TF session this works with no problem, but inside the session when called via py_func I get
Process finished with exit code 138 (interrupted by signal 10: SIGBUS)
Trying different tensor sizes, I see that the line works for shape (29000, 3) but fails for (29200, 3).
I am using TensorFlow-1.12.0.
What could cause this issue and how can I resolve it?
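For reference, a minimal sketch of the setup being described, assuming TF 1.x graph mode (the function name and wiring are illustrative, not the original code):

import numpy as np
import tensorflow as tf

def rotate(points):
    # inside py_func the input arrives as a plain numpy array
    return np.matmul(points, np.eye(3))

points = tf.ones([640 * 480, 3], dtype=tf.float64)
rotated = tf.py_func(rotate, [points], tf.float64)

with tf.Session() as sess:
    sess.run(rotated)  # the SIGBUS is reported during this call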

Related

Error: The shape of dict['input'] provided in model.execute(dict) must be [-1,128,128,3], but was [-1,128,128,3] in Tensorflow JS

I am running a simple benchmarking application with TensorFlow.js and running into this very weird error.
I get my input shape with input_shape = model.inputs[0].shape,
then create a zeros array as dummy input to the model: var zeros = tf.zeros([input_shape]);
Calling await timeModelInference(model, zeros, 1) then yields the error in the title.
util_base.js:153 Uncaught (in promise) Error: The shape of dict['input'] provided in model.execute(dict) must be [-1,128,128,3], but was [-1,128,128,3]
at Vv (util_base.js:153)
at graph_executor.js:572
at Array.forEach (<anonymous>)
at e.t.checkInputShapeAndType (graph_executor.js:563)
at e.<anonymous> (graph_executor.js:345)
at c (runtime.js:63)
at Generator._invoke (runtime.js:293)
at Generator.next (runtime.js:118)
at bv (runtime.js:747)
at o (runtime.js:747)
I've tried changing the first dimension to [1,128,128,3] or [,128,128,3] to no avail.
Thanks.
After debugging this for a bit, the problem was that I was calling tf.zeros([input_shape]) instead of tf.zeros(input_shape). This extra wrapping of the shape array caused the problem even though the reported shapes were exactly the same. I'd better be more careful next time. :)

TypeError: can’t convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first (fastai)

I am following the code here:
https://www.kaggle.com/tanlikesmath/diabetic-retinopathy-with-resnet50-oversampling
However, during the metrics calculation, I am getting the following error:
File "main.py", line 50, in <module>
learn.fit_one_cycle(4,max_lr = 2e-3)
...
File "main.py", line 39, in quadratic_kappa
return torch.tensor(cohen_kappa_score(torch.argmax(y_hat,1), y, weights='quadratic'),device='cuda:0')
...
File "/pfs/work7/workspace/scratch/ul_dco32-conda-0/conda/envs/resnet50/lib/python3.8/site-packages/torch/tensor.py", line 486, in __array__
return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Here are the metrics and the model:
def quadratic_kappa(y_hat, y):
    return torch.tensor(cohen_kappa_score(torch.argmax(y_hat, 1), y, weights='quadratic'), device='cuda:0')

learn = cnn_learner(data, models.resnet50, metrics=[accuracy, quadratic_kappa])
learn.fit_one_cycle(4, max_lr=2e-3)
As is being said in the discussion https://discuss.pytorch.org/t/typeerror-can-t-convert-cuda-tensor-to-numpy-use-tensor-cpu-to-copy-the-tensor-to-host-memory-first/32850/6, I have to bring the data back to the CPU, but I am slightly lost as to how to do it.
I tried adding .cpu() all over the metric but could not solve it so far.
I'm assuming that both y and y_hat are CUDA tensors; that means you need to bring them both to the CPU for cohen_kappa_score, not just one:
def quadratic_kappa(y_hat, y):
    # move *both* tensors to the CPU before handing them to scikit-learn
    return torch.tensor(cohen_kappa_score(torch.argmax(y_hat.cpu(), 1), y.cpu(), weights='quadratic'), device='cuda:0')
Calling .cpu() on a tensor that is already on the CPU has no effect, so it's safe to use in any case.
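For a quick sanity check, here is a self-contained version of the corrected metric with dummy tensors standing in for a real batch (the shapes, class count, and the CPU fallback for the returned tensor's device are illustrative assumptions, not part of the original setup):

import torch
from sklearn.metrics import cohen_kappa_score

def quadratic_kappa(y_hat, y):
    # argmax over the class dimension, then move both tensors to host
    # memory, since scikit-learn operates on numpy arrays
    score = cohen_kappa_score(torch.argmax(y_hat.cpu(), 1), y.cpu(), weights='quadratic')
    return torch.tensor(score, device='cuda:0' if torch.cuda.is_available() else 'cpu')

y_hat = torch.randn(8, 5)        # dummy logits: 8 samples, 5 classes
y = torch.randint(0, 5, (8,))    # dummy ground-truth labels
print(quadratic_kappa(y_hat, y))

This runs identically whether or not a GPU is present, because .cpu() is a no-op on CPU tensors.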
I went from a CPU to a GPU version and received this error. It was due to passing metrics=[mean_absolute_error,mean_squared_error] to the Learner object (in my case tabular_learner).
Removing the metrics parameter solved the issue temporarily for me.

Error evaluating a TensorArray in a while loop

I've built the following TensorArray:
ta = tf.TensorArray(
    dtype=tf.float32,
    size=0,
    dynamic_size=True,
    element_shape=tf.TensorShape([None, None])
)
and called ta = ta.write(idx, my_tensor) inside a while_loop.
When evaluating the output = ta.stack() tensor in a session, I receive this error message:
ValueError: Cannot use '.../TensorArrayWrite/TensorArrayWriteV3' as
input to '.../TensorArrayStack_1/TensorArraySizeV3' because
'.../TensorArrayWrite/TensorArrayWriteV3' is in a while loop. See info
log for more details.
I don't understand this error message; could you please help me?
Update: A minimal example might be difficult to come up with, but this is what I am doing: I am using a reference to the ta TensorArray inside the cell_input_fn of an AttentionWrapper. This callback is used in AttentionWrapper's call method, where another TensorArray named alignment_history is written. The while_loop code is therefore not designed by me; it is part of the TF dynamic RNN computation, tf.nn.dynamic_rnn.
Not sure if this is what's biting you, but you have to make sure your while_loop function takes the tensor array as input and emits an updated one as output; and you have to use the final version of the TensorArray at the end of the while_loop:
def fn(ta_old):
    return ta_old.write(...)

ta_final = tf.while_loop(..., body=fn, loop_vars=[tf.TensorArray(...)])
values = ta_final.stack()

Specifically, you should never access ta_old outside of fn().
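For reference, here is a minimal runnable version of that pattern, assuming TF 1.x graph mode (the loop bound and element values are made up for illustration):

import tensorflow as tf

ta = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)

def cond(i, ta):
    return i < 5

def body(i, ta):
    # write through the loop variable and return the updated array
    return [i + 1, ta.write(i, tf.fill([2], tf.cast(i, tf.float32)))]

i_final, ta_final = tf.while_loop(cond, body, [tf.constant(0), ta])
output = ta_final.stack()  # shape (5, 2)

with tf.Session() as sess:
    print(sess.run(output))

The TensorArray enters the loop as a loop variable, and only the returned ta_final is stacked, which avoids the frame error above.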

Declaring theano variables for pymc3

I am having issues replicating a piece of pymc2 code using pymc3.
I believe it is due to the fact that pymc3 uses theano-type variables, which are not compatible with the numpy operations I am using, so I am using the @theano.compile.ops.as_op decorator.
I have this function:
with pymc3.Model() as model:
    z_stars = pymc3.Uniform('z_star', self.z_min_ssp_limit, self.z_max_ssp_limit)
    Av_stars = pymc3.Uniform('Av_star', 0.0, 5.00)
    sigma_stars = pymc3.Uniform('sigma_star', 0.0, 5.0)

    # Fit observational wavelength
    ssp_fit_output = self.ssp_fit_theano(z_stars, Av_stars, sigma_stars,
                                         self.obj_data['obs_wave_resam'],
                                         self.obj_data['obs_flux_norm_masked'],
                                         self.obj_data['basesWave_resam'],
                                         self.obj_data['bases_flux_norm'],
                                         self.obj_data['int_mask'],
                                         self.obj_data['normFlux_obs'])

    # Define likelihood
    like = pymc3.Normal('ChiSq', mu=ssp_fit_output,
                        sd=self.obj_data['obs_fluxEr_norm'],
                        observed=self.obj_data['obs_fluxEr_norm'])

    # Run the sampler
    trace = pymc3.sample(iterations, step=step, start=start_conditions, trace=db)
where:
@theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar, t.dvector,
                                  t.dvector, t.dvector, t.dvector, t.dvector, t.dscalar],
                          otypes=[t.dvector])
def ssp_fit_theano(self, input_z, input_sigma, input_Av, obs_wave, obs_flux_masked,
                   rest_wave, bases_flux, int_mask, obsFlux_mean):
    ...
    ...
The first three variables are scalars (from the pymc3 uniform distribution). The
remaining variables are numpy arrays and the last one is a float. However, I am
getting this "'numpy.ndarray' object has no attribute 'type'" error:
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 615, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in make_node
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in <genexpr>
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
AttributeError: 'numpy.ndarray' object has no attribute 'type'
Any advice in the right direction will be most welcome.
I had a bunch of time-wasting stops when I went from pymc2 to pymc3. The problem, I think, is that the documentation is quite bad; I suspect it gets neglected while the code is still evolving. Three comments/pieces of advice:
You may find some help using @theano.compile.ops.as_op here: failure to adapt pymc2 into pymc3, or here: how to fit a method belonging to an instance with pymc3?
The drawback of @theano.compile.ops.as_op is that you implicitly exclude any analysis related to the gradient of your function. To have access to the gradient, I think you need to define your function in the more complex way presented in how to fit a method belonging to an instance with pymc3?
Warning: for the moment, using theano seems to be a source of problems if you want to distribute your code under Windows. See build a .exe for Windows from a python 3 script importing theano with pyinstaller, but I am not sure whether it is just my own clumsiness or really a problem. Personally, I had to give up theano to be able to distribute my code...
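For what it's worth, here is a minimal self-contained use of the decorator that does work (function and variable names are illustrative). Note that every input must be a theano variable at call time; passing raw numpy arrays (or self, for a method) is what appears to trigger the "'numpy.ndarray' object has no attribute 'type'" error above, since the type check in make_node runs directly on the inputs:

import numpy as np
import theano
import theano.tensor as t

@theano.compile.ops.as_op(itypes=[t.dscalar, t.dvector], otypes=[t.dvector])
def scale(factor, values):
    # inside the op the inputs are plain numpy values; theano only
    # checks the declared input/output types at graph-building time
    return factor * values

x = t.dscalar('x')
v = t.dvector('v')
f = theano.function([x, v], scale(x, v))
print(f(2.0, np.array([1.0, 2.0, 3.0])))   # -> [2. 4. 6.]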

tf.scatter_nd_update Variable Requirement vs RNN.__call__ method

I am developing an RNN and am using TensorFlow 1.1. I got the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: The node 'model/att_seq2seq/encode/pocmru_rnn_encoder/rnn/while/Variable/Assign' has inputs from different frames. The input 'model/att_seq2seq/encode/pocmru_rnn_encoder/rnn/while/Identity_3' is in frame 'model/att_seq2seq/encode/pocmru_rnn_encoder/rnn/while/model/att_seq2seq/encode/pocmru_rnn_encoder/rnn/while/'. The input 'model/att_seq2seq/encode/pocmru_rnn_encoder/rnn/while/Variable' is in frame ''.
The error is caused by the lambda function in the dynamic RNN method together with a piece of code in my RNN.
In TensorFlow's rnn.py, dynamic_rnn / _dynamic_rnn_loop / _time_step uses a lambda function to call the RNN's __call__ method to loop through all inputs.
My code:
if not isinstance(myObject, tf.Variable):
    tp = tf.Variable(myObject, validate_shape=False)
else:
    tp = myObject
Logically, I repeatedly use tf.scatter_nd_update to update myObject. The pseudocode would be like myObject = scatter_nd_update(myObject, indices, updates). Since tf.scatter_nd_update requires a Variable as its argument and returns a tensor, I need to wrap the returned tensor back into a Variable. Hence the code above (test for a Variable, then wrap). How should I modify my code to make it work? Thanks!
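For concreteness, here is a minimal standalone version of that update pattern outside of any while loop (the shapes, indices, and values are made up for illustration; this only demonstrates the Variable requirement, not a fix for the frame error):

import tensorflow as tf

v = tf.Variable(tf.zeros([4, 3]))                    # the object being updated
indices = tf.constant([[0], [2]])                    # rows to overwrite
updates = tf.constant([[1., 1., 1.], [2., 2., 2.]])

# scatter_nd_update requires a Variable and returns a tensor
updated = tf.scatter_nd_update(v, indices, updates)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(updated))

Inside the while_loop the same call fails, because the wrapping tf.Variable is created in a different frame from the loop body, which is what the "inputs from different frames" message is reporting.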