Adding loss functions in MXNet - "Operator _copyto is non-differentiable because it didn't register FGradient attribute" - mxnet

I have a system that generates training data, and I want to add loss functions together to accumulate a batch before stepping. This is what I am trying to do (full code at the commit in question):
for epoch in range(100):
    with mx.autograd.record():
        loss = 0.0
        for k in range(40):
            (i, x), (j, y) = random.choice(data), random.choice(data)
            # Just compute loss on last output
            if i == j:
                loss = loss - l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
            else:
                loss = loss + l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
    loss.backward()
    trainer.step(BATCH_SIZE)
But I get an error like,
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
<ipython-input-39-14981406278a> in <module>()
21 else:
22 loss = loss + l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
---> 23 loss.backward()
24 trainer.step(BATCH_SIZE)
25 avg_loss += mx.nd.mean(loss).asscalar()
... More trace ...
MXNetError: [16:52:49] src/pass/gradient.cc:187: Operator _copyto is non-differentiable because it didn't register FGradient attribute.
How do I incrementally add loss functions like I am trying to?

What version of MXNet are you using? I couldn't reproduce this using the latest code base. You can try either the GitHub master branch or version 0.12.
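If upgrading isn't an option, a workaround worth trying (my suggestion, not something confirmed in this thread) is to avoid mixing the Python scalar 0.0 into the recorded graph: collect the per-pair losses in a list and sum them once with mx.nd.add_n, e.g.:

import random
import mxnet as mx

for epoch in range(100):
    with mx.autograd.record():
        losses = []
        for k in range(40):
            (i, x), (j, y) = random.choice(data), random.choice(data)
            pair_loss = l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
            # negate same-class pairs instead of subtracting from a scalar
            losses.append(-pair_loss if i == j else pair_loss)
        loss = mx.nd.add_n(*losses)  # sum the NDArrays in one recorded op
    loss.backward()
    trainer.step(BATCH_SIZE)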


What is the TensorFlow 2.4.1 analogue of TPU_strategy.experimental_run_v2 from version 2.1? How do I replace it?

I am following this old notebook on Kaggle for BERT MLM training, where the TensorFlow version is 2.1. I cloned it and tried running the code, but there's an error saying that strategy has no experimental_run_v2.
The official documentation on custom training with TPUs gives this piece of information, but I'm not able to grasp what I have to change in my code to make it run:
# `run` replicates the provided computation and runs it
# with the distributed input.
@tf.function
def distributed_train_step(dataset_inputs):
    per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,
                           axis=None)

@tf.function
def distributed_test_step(dataset_inputs):
    return strategy.run(test_step, args=(dataset_inputs,))
Below is the code I am trying to run; I have commented the troublesome part. Could someone please help me restructure this code properly?
def train_mlm(train_dist_dataset, total_steps=2000, evaluate_every=200):
    step = 0
    ### Training loop ###
    for tensor in train_dist_dataset:
        distributed_mlm_train_step(tensor)  # --------- HERE IS THE ERROR -----
        step += 1
        if (step % evaluate_every == 0):
            ### Print train metrics ###
            train_metric = train_mlm_loss_metric.result().numpy()
            print("Step %d, train loss: %.2f" % (step, train_metric))
            ### Reset metrics ###
            train_mlm_loss_metric.reset_states()
        if step == total_steps:
            break

@tf.function  # What should replace this line of code?
def distributed_mlm_train_step(data):
    strategy.experimental_run_v2(mlm_train_step, args=(data,))  # this is what causes the error
I think I have to use something to add up the total error, like the strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None) from the documentation, but using it gave me another error: ValueError: A non-DistributedValues value None cannot be reduced with the given reduce op ReduceOp.SUM.
Please see this article, written against TF 2.6. In short, tf.distribute.Strategy.experimental_run_v2 was renamed to tf.distribute.Strategy.run.
Things to note in the example there:
It iterates over train_dist_dataset and test_dist_dataset using a for x in ... construct.
The scaled loss is the return value of distributed_train_step; it is aggregated across replicas with tf.distribute.Strategy.reduce, and then across batches by summing the return values of the tf.distribute.Strategy.reduce calls.
tf.keras.Metrics should be updated inside the train_step and test_step executed by tf.distribute.Strategy.run.
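Applied to the function from the question, a minimal sketch (assuming mlm_train_step returns the per-replica loss, which isn't shown in the question):

@tf.function
def distributed_mlm_train_step(data):
    # strategy.run is the TF 2.4+ replacement for experimental_run_v2
    per_replica_losses = strategy.run(mlm_train_step, args=(data,))
    # only reduce if mlm_train_step actually returns a loss; reducing
    # None raises the ValueError quoted above
    return strategy.reduce(tf.distribute.ReduceOp.SUM,
                           per_replica_losses, axis=None)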

Code worked fine one week ago, but keeps getting an error since yesterday: fine-tuning BERT model training via PyTorch on Colab

I am new to BERT. Two weeks ago I successfully ran a fine-tuned BERT model on an NLP classification task, though the outcome was not brilliant. Yesterday, however, when I tried to run the same code and data, an AttributeError kept appearing, saying: 'str' object has no attribute 'dim'. Please note that everything is on Colab and uses PyTorch Transformers.
What should I do to fix it?
Here is one thing I tried when installing transformers, but it turned out not to work:
instead of
!pip install transformers
I tried to use a previous transformers version:
!pip install --target lib --upgrade transformers==3.5.0
Any feedback will be greatly appreciated!
Please see the code and the error message below:
Code:
train definition
# function to train the model
def train():
    model.train()
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 200 batches
        if step % 200 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0; this helps prevent the exploding-gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # update learning rate schedule
        # scheduler.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are in the form (no. of batches, batch size, no. of classes);
    # reshape them to (no. of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
training process
# set initial loss to infinite
best_valid_loss = float('inf')
# empty lists to store training and validation loss of each epoch
train_losses = []
valid_losses = []
# for each epoch
for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    # train model
    train_loss, _ = train()
    # evaluate model
    valid_loss, _ = evaluate()
    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')
    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')
Error message:
Epoch 1 / 10
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-41-c5138ddf6b25> in <module>()
12
13 #train model
---> 14 train_loss, _ = train()
15
16 #evaluate model
5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1686 if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
1687 return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
-> 1688 if input.dim() == 2 and bias is not None:
1689 # fused op is marginally faster
1690 ret = torch.addmm(bias, input, weight.t())
AttributeError: 'str' object has no attribute 'dim'
As far as I remember, there was an older transformers version preinstalled on Colab, something like 2.11.0. Try:
!pip install transformers~=2.11.0
Change the version number until it works.
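Another possibility worth checking (my guess from the traceback, not confirmed in this thread): from transformers 4.x on, models return a ModelOutput object by default instead of a tuple, so code that unpacks the output positionally can end up handing a string key to a linear layer, which produces exactly this 'str' object has no attribute 'dim' error. A sketch of that alternative fix, using a hypothetical wrapper since the question doesn't show the real model class:

import torch.nn as nn
from transformers import AutoModel

class BertArch(nn.Module):
    """Hypothetical wrapper; the question doesn't show the real model."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained('bert-base-uncased')
        self.fc = nn.Linear(768, num_classes)

    def forward(self, sent_id, mask):
        # return_dict=False restores the old tuple output, so the pooled
        # output can still be unpacked positionally
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        return self.fc(cls_hs)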

maximizing binary cross_entropy in a keras model

I don't know how to create a model that maximizes the binary cross-entropy loss in Keras.
Research:
1. https://intellipaat.com/community/17707/how-to-maximize-loss-function-in-keras, which said:
Simply multiply the loss by -1 to maximize the loss function while trying to minimize it:
new_loss = -loss
but using:
model.compile(loss=-1 * 'binary_crossentropy', optimizer=adam_optimizer())
resulted in this error:
ValueError: The model cannot be compiled because it has no loss to optimize.
2. https://stats.stackexchange.com/questions/303229/why-does-keras-binary-crossentropy-loss-function-return-wrong-values
gave me a custom function that approximates the Keras binary_crossentropy loss:
import keras.backend as K

def binary_crossentropy(y_true, y_pred):
    result = []
    for i in range(len(y_pred)):
        y_pred[i] = [max(min(x, 1 - K.epsilon()), K.epsilon()) for x in y_pred[i]]
        result.append(-np.mean([y_true[i][j] * math.log(y_pred[i][j])
                                + (1 - y_true[i][j]) * math.log(1 - y_pred[i][j])
                                for j in range(len(y_pred[i]))]))
    return np.mean(result)
but I cannot use it, since it results in the error:
len is not well defined for symbolic Tensors. (43_54/Sigmoid:0) Please call `x.shape` rather than `len(x)` for shape information.
When I replace len with .shape[0], I get another error:
__index__ returned non-int (type NoneType)
I tinkered with the syntax in several more ways, but nothing seems to work.
Any ideas?
python 3.6
tensorflow 1.15
keras 2.3.1
You just need to define a new loss, based on the keras implementation:
def neg_binary_crossentropy(y_true, y_pred):
    return -1.0 * keras.losses.binary_crossentropy(y_true, y_pred)
And then use it in model.compile:
model.compile(loss=neg_binary_crossentropy, optimizer="adam")
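For context, a minimal end-to-end sketch of how the custom loss plugs in; the toy model here is made up for illustration and is not from the question:

import keras
from keras import layers

def neg_binary_crossentropy(y_true, y_pred):
    # minimizing this maximizes the ordinary binary cross-entropy
    return -1.0 * keras.losses.binary_crossentropy(y_true, y_pred)

model = keras.models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(8,)),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(loss=neg_binary_crossentropy, optimizer='adam')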

Adam optimizer error: one of the variables needed for gradient computation has been modified by an inplace operation

I am trying to implement an Actor-Critic learning algorithm that is not the same as the basic actor-critic algorithm; it's slightly changed.
Anyway, I used the Adam optimizer and implemented it with PyTorch.
When I backward the TD error for the critic first, there is no error.
However, when I backward the loss for the actor, the error occurs.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
46 # update Actor Func
47 optimizer_M.zero_grad()
---> 48 loss.backward()
49 optimizer_M.step()
50
~\Anaconda3\lib\site-packages\torch\tensor.py in backward(self,
gradient, retain_graph, create_graph)
100 products. Defaults to False.
101 """
--> 102 torch.autograd.backward(self, gradient, retain_graph, create_graph)
103
104 def register_hook(self, hook):
~\Anaconda3\lib\site-packages\torch\autograd\__init__.py in
backward(tensors, grad_tensors, retain_graph, create_graph,
grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92
RuntimeError: one of the variables needed for gradient computation has
been modified by an inplace operation
Above is the content of the error.
I tried to find an in-place operation, but I couldn't find one in my code.
I think I don't know how to handle the optimizer.
Here is the main code:
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)

    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)

    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)

    # update value Func
    optimizer_M.zero_grad()
    TD_error.backward()
    optimizer_M.step()

    # update Actor Func
    loss.backward()
    optimizer_M.step()
Here is the agent network:
# Actor-Critic Agent
self.act_pipe = nn.Sequential(
    nn.Linear(state, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, num_action),
    nn.Softmax()
)

self.val_pipe = nn.Sequential(
    nn.Linear(state, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 1)
)

def forward(self, state, flag, test=None):
    temp_action_prob = self.act_pipe(state)
    self.action_prob = self.cal_prob(temp_action_prob, flag)
    self.action = self.get_action(self.action_prob)
    self.value = self.val_pipe(state)
    return self.action
I want to update each network separately.
Also, does the basic TD actor-critic method use the TD error as the loss, or the squared error between r + V(s') and V(s)?
I think the problem is that you zero the gradients right before calling backward, after the forward propagation. Note that for automatic differentiation you need the computation graph and the intermediate results produced during your forward pass.
So zero the gradients before your TD error and target calculations, not after you have finished your forward propagation:
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)

    optimizer_M.zero_grad()  # zero your gradient here

    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)

    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)

    # update value Func
    TD_error.backward()
    optimizer_M.step()

    # update Actor Func
    loss.backward()
    optimizer_M.step()
To answer your second question: the DDPG algorithm, for example, uses the squared error (see the paper).
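For reference, the standard TD(0) formulation written out as code (my restatement, not taken from the question; V stands for the value network and gamma for the discount factor, which the question hides inside cal_td_error):

# delta is the TD error
delta = r + gamma * V(next_state) - V(state)
critic_loss = delta ** 2  # the squared-error variant of the critic loss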
Another recommendation: in many cases, large parts of the value and policy networks are shared in deep actor-critic agents: you keep the same layers up to the last hidden layer and use a single linear output for value prediction and a softmax layer for the action distribution. This is especially useful with high-dimensional visual inputs, as it acts as a sort of multi-task learning, but it's worth trying nevertheless (as I see, you have a low-dimensional state vector).
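A minimal sketch of such a shared trunk (the layer sizes are illustrative, mirroring the dimensions in the question):

import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        # layers shared by both heads, up to the last hidden layer
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU()
        )
        # softmax head for the action distribution
        self.policy_head = nn.Sequential(
            nn.Linear(256, num_actions),
            nn.Softmax(dim=-1)
        )
        # single linear output for value prediction
        self.value_head = nn.Linear(256, 1)

    def forward(self, state):
        h = self.trunk(state)
        return self.policy_head(h), self.value_head(h)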

how to log validation loss and accuracy using tfslim

Is there any way I can log the validation loss and accuracy to TensorBoard when using tf-slim? When I was using Keras, the following code could do this for me:
model.fit_generator(generator=train_gen(), validation_data=valid_gen(),...)
Then the model will evaluate the validation loss and accuracy after each epoch, which is very convenient. But how do I achieve this using tf-slim? The following uses primitive TensorFlow, which is not what I want:
with tf.Session() as sess:
    for step in range(100000):
        sess.run(train_op, feed_dict={X: X_train, y: y_train})
        if n % batch_size * batches_per_epoch == 0:
            print(sess.run(train_op, feed_dict={X: X_train, y: y_train}))
Right now, the steps to train a model using tf-slim are:
tf.contrib.slim.learning.train(
    train_op=train_op,
    logdir="logs",
    number_of_steps=10000,
    log_every_n_steps=10,
    save_summaries_secs=1
)
So how do I evaluate the validation loss and accuracy after each epoch with the above slim training procedure?
Thanks in advance!
The matter is still being discussed on the TensorFlow repo (issue #5987).
The framework lets you easily create an evaluation script to run after / in parallel with your training (solution 1 below), but some people are pushing to be able to implement the "classic cycle of batch training + validation" (solution 2).
1. Use slim.evaluation in another script
TF Slim has evaluation methods, e.g. slim.evaluation.evaluation_loop(), that you can use in another script (which can run in parallel with your training) to periodically load the latest checkpoint of your model and perform evaluation. The TF Slim page contains a good example of what such a script may look like: example.
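A minimal sketch of such an evaluation script (the checkpoint/log paths and the metric are placeholders of my choosing; predictions and labels are assumed to be tensors built from your validation input pipeline, which is not shown):

import tensorflow as tf
slim = tf.contrib.slim

# wrap the validation metrics as streaming (running-average) metrics
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'accuracy': slim.metrics.streaming_accuracy(predictions, labels),
})

slim.evaluation.evaluation_loop(
    '',                      # master
    'logs',                  # checkpoint_dir: where training saves checkpoints
    'logs/eval',             # logdir: where eval summaries for TensorBoard go
    num_evals=100,           # batches per evaluation pass
    eval_op=list(names_to_updates.values()),
    eval_interval_secs=60)   # re-evaluate once a minute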
2. Provide a custom train_step_fn to slim.learning.train()
A patchy solution that the initiator of the discussion came up with makes use of a custom training-step function you can provide to slim.learning.train():
"""
Snippet from code by Kevin Malakoff #kmalakoff
https://github.com/tensorflow/tensorflow/issues/5987#issue-192626454
"""
# ...
accuracy_validation = slim.metrics.accuracy(
tf.argmax(predictions_validation, 1),
tf.argmax(labels_validation, 1)) # ... or whatever metrics needed
def train_step_fn(session, *args, **kwargs):
total_loss, should_stop = train_step(session, *args, **kwargs)
if train_step_fn.step % FLAGS.validation_check == 0:
accuracy = session.run(train_step_fn.accuracy_validation)
print('Step %s - Loss: %.2f Accuracy: %.2f%%' % (str(train_step_fn.step).rjust(6, '0'), total_loss, accuracy * 100))
# ...
train_step_fn.step += 1
return [total_loss, should_stop]
train_step_fn.step = 0
train_step_fn.accuracy_validation = accuracy_validation
slim.learning.train(
train_op,
FLAGS.logs_dir,
train_step_fn=train_step_fn,
graph=graph,
number_of_steps=FLAGS.max_steps
)
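A side note on this second approach (my observation, not from the linked issue): the validation tensors (predictions_validation, labels_validation) must be built into the same graph as the training op, e.g. fed by a separate validation input pipeline, and since the snippet only prints the accuracy, you would still need to add a tf.summary for it to show up in TensorBoard.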