ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1]))

ValueError Traceback (most recent call last)
<ipython-input-30-33821ccddf5f> in <module>
23 output = model(data)
24 # calculate the batch loss
---> 25 loss = criterion(output, target)
26 # backward pass: compute gradient of the loss with respect to model parameters
27 loss.backward()
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
593 self.weight,
594 pos_weight=self.pos_weight,
--> 595 reduction=self.reduction)
596
597
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
2073
2074 if not (target.size() == input.size()):
-> 2075 raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
2076
2077 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1]))
I am training a CNN on the Horses vs Humans dataset. This is my code. I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.01). My final layer is self.fc2 = nn.Linear(512, 1). The last neuron will output 1 for horse and 0 for human, right? Or should I choose 2 output neurons?
16 is the batch size. Since the error says ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1])), I don't understand where I need to make a change to fix the error.

Calling target = target.unsqueeze(1) before passing target to the criterion changed the target tensor size from [16] to [16, 1] and solved the issue. I also needed to call target = target.float() before passing it to the criterion, because the model outputs are floats. Besides that, there was another error in the code: I was applying a sigmoid activation in the last layer, but I shouldn't, because the criterion I am using already has the sigmoid built in.
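For clarity, a minimal sketch of the fix described above, where model, data, and target stand in for the variables of the original training loop:
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

output = model(data)                  # shape [16, 1]: raw logits, no sigmoid in the last layer
target = target.unsqueeze(1).float() # shape [16] -> [16, 1], cast to float
loss = criterion(output, target)      # sizes now match
loss.backward()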

You can also try _, pred = torch.max(output, 1) and then pass the pred variable into the loss function.

I had the same error when I ran my model. I was able to fix it by returning torch.tensor([target]).float().to(device) from the Dataset class.
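For reference, a rough sketch of what that Dataset-level fix might look like; the class name and attributes (HorsesHumansDataset, self.images, self.labels) are made-up placeholders, and only the shape and dtype of the returned target matter:
import torch
from torch.utils.data import Dataset

class HorsesHumansDataset(Dataset):
    def __init__(self, images, labels, device):
        self.images = images
        self.labels = labels
        self.device = device

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # return the target as a float tensor of shape [1] so the collated
        # batch has shape [batch_size, 1], matching the model output
        target = torch.tensor([self.labels[idx]]).float().to(self.device)
        return image, target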

Related

Evaluating the state value function when using the SAC agent of TF-Agents

The state value function v at state x is a quantity of interest of the Markov decision process (MDP) I intend to solve. (My MDP is fully observable: observation = state.)
I use the SAC agent of TF-Agents to learn the action value function q(x,a) and the policy π. Thus, given a state x, the policy returns an approximately optimal action a = π(x), so that v(x) ≈ q(x,π(x)).
Problem description: How can one write q(x,π(x)) as a TF-Agents expression?
I can already reproduce the problem with the SAC tutorial https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial by adding the following lines to the end of the tutorial:
# Resetting the environment to obtain a TimeStep object
time_step = env.reset()
# An observation which respects the observation specs of env, corresponding to x above
observation = time_step.observation
# Calling the evaluation policy we obtain an action, this is essentially π(x) above
action = eval_policy.action(time_step).action
# I was expecting that the next line would return q(x,π(x))
critic_net((observation,action))
The reason for the last line is that the input_tensor_spec of a CriticNetwork is described as a tuple (observation, action) in https://www.tensorflow.org/agents/api_docs/python/tf_agents/agents/ddpg/critic_network/CriticNetwork.
However, critic_net((observation, action)) instead raises the following error:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-32-8446b099696b> in <module>
----> 1 critic_net((observation,action))
2 frames
/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py in __call__(self, inputs, *args, **kwargs)
425 normalized_kwargs.pop("network_state", None)
426
--> 427 outputs, new_state = super(Network, self).__call__(**normalized_kwargs) # pytype: disable=attribute-error # typed-keras
428
429 nest_utils.assert_matching_dtypes_and_inner_shapes(
/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ddpg/critic_network.py in call(***failed resolving arguments***)
166 actions = layer(actions, training=training)
167
--> 168 joint = tf.concat([observations, actions], 1)
169 for layer in self._joint_layers:
170 joint = layer(joint, training=training)
InvalidArgumentError: Exception encountered when calling layer 'CriticNetwork' (type CriticNetwork).
{{function_node __wrapped__ConcatV2_N_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 0 in both shapes must be equal: shape[0] = [28,1] vs. shape[1] = [8,1] [Op:ConcatV2] name: concat
Call arguments received by layer 'CriticNetwork' (type CriticNetwork):
• inputs=('tf.Tensor(shape=(28,), dtype=float32)', 'tf.Tensor(shape=(8,), dtype=float32)')
• step_type=()
• network_state=()
• training=False
Can someone help me with the evaluation of the critic network?
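An untested sketch, based on the shapes (28,) and (8,) in the error message: the critic network appears to expect a leading batch dimension on both tensors, so batching them before the call may be worth trying. Variable names follow the snippet above; the (q_value, network_state) return pair is the usual TF-Agents Network convention.
import tensorflow as tf

# add a batch dimension to the observation and action before calling the critic
batched_observation = tf.expand_dims(observation, axis=0)  # shape (1, 28)
batched_action = tf.expand_dims(action, axis=0)            # shape (1, 8)

q_value, _ = critic_net((batched_observation, batched_action))
print(q_value)  # expected to approximate q(x, π(x))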

TypeError: tuple indices must be integers or slices, not str, facing this error in keras model

I am running a Keras model, LINK IS HERE. I have only changed the dataset for this model, and when I run it, it throws this error: TypeError: tuple indices must be integers or slices, not str. It's an image captioning model, and the dataset is difficult for me to understand.
See the code below, and note the location of the error.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
monitor="val_loss", factor=0.2, patience=3
)
# Create an early stopping callback.
early_stopping = tf.keras.callbacks.EarlyStopping(
monitor="val_loss", patience=5, restore_best_weights=True
)
history = dual_encoder.fit(
train_dataloader,
epochs=num_epochs,
#validation_data=val_dataloader,
#callbacks=[reduce_lr, early_stopping],
)
print("Training completed. Saving vision and text encoders...")
vision_encoder.save("vision_encoder")
text_encoder.save("text_encoder")
print("Models are saved.")
TypeError Traceback (most recent call last)
<ipython-input-31-745dd79762e6> in <module>()
15 history = dual_encoder.fit(
16 train_dataloader,
---> 17 epochs=num_epochs,
18 #validation_data=val_dataloader,
19 #callbacks=[reduce_lr, early_stopping],
11 frames
<ipython-input-26-0696c83bf387> in call(self, features, training)
16 with tf.device("/gpu:0"):
17 # Get the embeddings for the captions.
---> 18 caption_embeddings = text_encoder(features["caption"], training=training)
19 #caption_embeddings = text_encoder(train_inputs, training=training)
20 with tf.device("/gpu:1"):
TypeError: tuple indices must be integers or slices, not str
The error points to this line: caption_embeddings = text_encoder(features["caption"], training=training)
Now I am confused. I don't know whether this error is due to the data I am passing to my model via history = dual_encoder.fit(train_dataloader), or whether it is related to caption_embeddings = text_encoder(features["caption"], training=training) and image_embeddings = vision_encoder(features["image"], training=training), which are defined in the DualEncoder class.
I don't know what features["caption"] and features["image"] are, as defined in the DualEncoder class; I have not changed these two for my new dataset, as you can check in my CODE HERE IN THIS COLAB FILE.
The dataset (train_dataloader) seems to return a tuple of items: link. In particular, the model input is a tuple (images, x_batch_input).
However, your code (in DualEncoder) seems to assume that it's a dict (with keys like "caption", "image", etc.). I think that's the source of the mismatch.
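An untested sketch of one way to resolve the mismatch: unpack the tuple by position inside DualEncoder.call instead of indexing with string keys. The (images, captions) ordering is an assumption about what train_dataloader yields and should be checked against the dataset code; the class below is a simplified stand-in for the one in the notebook.
import tensorflow as tf

class DualEncoder(tf.keras.Model):
    def __init__(self, text_encoder, vision_encoder, **kwargs):
        super().__init__(**kwargs)
        self.text_encoder = text_encoder
        self.vision_encoder = vision_encoder

    def call(self, features, training=False):
        # tuple unpacking instead of features["image"] / features["caption"]
        images, captions = features
        caption_embeddings = self.text_encoder(captions, training=training)
        image_embeddings = self.vision_encoder(images, training=training)
        return caption_embeddings, image_embeddings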

Adam optimizer error: one of the variables needed for gradient computation has been modified by an inplace operation

I am trying to implement an Actor-Critic learning algorithm that is not the same as the basic actor-critic algorithm; it is slightly modified.
Anyway, I used the Adam optimizer and implemented it in PyTorch.
When I backward the TD error for the Critic first, there is no error.
However, when I backward the loss for the Actor, the error occurs.
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call
last) in
46 # update Actor Func
47 optimizer_M.zero_grad()
---> 48 loss.backward()
49 optimizer_M.step()
50
~\Anaconda3\lib\site-packages\torch\tensor.py in backward(self,
gradient, retain_graph, create_graph)
100 products. Defaults to False.
101 """
--> 102 torch.autograd.backward(self, gradient, retain_graph, create_graph)
103
104 def register_hook(self, hook):
~\Anaconda3\lib\site-packages\torch\autograd\__init__.py in
backward(tensors, grad_tensors, retain_graph, create_graph,
grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92
RuntimeError: one of the variables needed for gradient computation has
been modified by an inplace operation
Above is the content of the error.
I tried to find an in-place operation, but I couldn't find one in my code.
I think I don't know how to handle the optimizer.
Here is the main code:
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)

    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)

    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)

    # update value Func
    optimizer_M.zero_grad()
    TD_error.backward()
    optimizer_M.step()

    # update Actor Func
    loss.backward()
    optimizer_M.step()
Here is the agent network:
# Actor-Critic Agent
self.act_pipe = nn.Sequential(
    nn.Linear(state, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, num_action),
    nn.Softmax()
)

self.val_pipe = nn.Sequential(
    nn.Linear(state, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 1)
)

def forward(self, state, flag, test=None):
    temp_action_prob = self.act_pipe(state)
    self.action_prob = self.cal_prob(temp_action_prob, flag)
    self.action = self.get_action(self.action_prob)
    self.value = self.val_pipe(state)
    return self.action
I want to update each network separately.
Also, does the basic TD actor-critic method use the TD error as the loss, or the squared error between r + V(s') and V(s)?
I think the problem is that you zero the gradients right before calling backward, after the forward propagation. Note that for automatic differentiation you need the computation graph and the intermediate results produced during the forward pass.
So zero the gradients before your TD error and target calculations, not after you have finished your forward propagation.
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)

    optimizer_M.zero_grad()  # zero your gradient here

    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)

    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)

    # update value Func
    TD_error.backward()
    optimizer_M.step()

    # update Actor Func
    loss.backward()
    optimizer_M.step()
To answer your second question: the DDPG algorithm, for example, uses the squared error (see the paper).
Another recommendation: in many deep actor-critic agents, large parts of the value and policy networks are shared. You use the same layers up to the last hidden layer, then a single linear output for the value prediction and a softmax layer for the action distribution. This is especially useful with high-dimensional visual inputs, since it acts as a sort of multi-task learning, but you can try it nevertheless. (As I see it, you have a low-dimensional state vector.)
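A minimal, untested sketch of that shared-trunk layout in PyTorch; the hidden sizes (128, 256) mirror the question's network, and state_dim and num_action are placeholders:
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, state_dim, num_action):
        super().__init__()
        # shared layers up to the last hidden layer
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        # one linear head for the value, one softmax head for the policy
        self.policy_head = nn.Linear(256, num_action)
        self.value_head = nn.Linear(256, 1)

    def forward(self, state):
        h = self.trunk(state)
        action_probs = torch.softmax(self.policy_head(h), dim=-1)
        value = self.value_head(h)
        return action_probs, value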

How can I load a saved model from object detection for inference?

I'm pretty new to TensorFlow and have been running experiments with SSDs using the TensorFlow Object Detection API. I can successfully train a model, but by default it only saves the last n checkpoints. I'd instead like to save the n checkpoints with the lowest loss (I'm assuming that's the best metric to use).
I found tf.estimator.BestExporter, and it exports a saved_model.pb along with variables. However, I have yet to figure out how to load that saved model and run inference on it. After running models/research/object_detection/export_inference_graph.py on the checkpoint, I can easily load a checkpoint and run inference on it using the object detection Jupyter notebook: https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
I've found documentation on loading saved models and can load a graph like this:
with tf.Session(graph=tf.Graph()) as sess:
    tags = [tag_constants.SERVING]
    meta_graph = tf.saved_model.loader.load(sess, tags, PATH_TO_SAVED_MODEL)
    detection_graph = tf.get_default_graph()
However, when I use that graph with the above jupyter notebook, I get errors:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-17-9e48f0d04df2> in <module>
7 image_np_expanded = np.expand_dims(image_np, axis=0)
8 # Actual detection.
----> 9 output_dict = run_inference_for_single_image(image_np, detection_graph)
10 # Visualization of the results of a detection.
11 vis_util.visualize_boxes_and_labels_on_image_array(
<ipython-input-16-0df86999596e> in run_inference_for_single_image(image, graph)
31 detection_masks_reframed, 0)
32
---> 33 image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
34 # image_tensor = tf.get_default_graph().get_tensor_by_name('serialized_example')
35
~/anaconda3/envs/sb/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in get_tensor_by_name(self, name)
3664 raise TypeError("Tensor names are strings (or similar), not %s." %
3665 type(name).__name__)
-> 3666 return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
3667
3668 def _get_tensor_by_tf_output(self, tf_output):
~/anaconda3/envs/sb/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
3488
3489 with self._lock:
-> 3490 return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
3491
3492 def _as_graph_element_locked(self, obj, allow_tensor, allow_operation):
~/anaconda3/envs/sb/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
3530 raise KeyError("The name %s refers to a Tensor which does not "
3531 "exist. The operation, %s, does not exist in the "
-> 3532 "graph." % (repr(name), repr(op_name)))
3533 try:
3534 return op.outputs[out_n]
KeyError: "The name 'image_tensor:0' refers to a Tensor which does not exist. The operation, 'image_tensor', does not exist in the graph."
Is there a better way to load the saved model or convert it to an inference graph?
Thanks!
The TensorFlow detection API supports different input formats during export, as described in the documentation of export_inference_graph.py:
image_tensor: Accepts a uint8 4-D tensor of shape [None, None, None, 3]
encoded_image_string_tensor: Accepts a 1-D string tensor of shape [None]
containing encoded PNG or JPEG images. Image resolutions are expected to be
the same if more than 1 image is provided.
tf_example: Accepts a 1-D string tensor of shape [None] containing
serialized TFExample protos. Image resolutions are expected to be the same
if more than 1 image is provided.
So you should check that you used the image_tensor input_type. The chosen input node will be named "inputs" in the exported model, so I suppose that replacing image_tensor:0 with inputs (or maybe inputs:0) will solve your problem.
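A small sketch of that rename, applied inside run_inference_for_single_image from the tutorial notebook; the exact tensor name ('inputs:0' vs 'inputs') is an assumption and may need checking against the exported graph:
# replacing the original lookup of 'image_tensor:0' in the notebook;
# 'inputs:0' is assumed to be the name of the exported input node
image_tensor = tf.get_default_graph().get_tensor_by_name('inputs:0')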
I would also like to recommend a useful tool to run exported models with a few lines of code: tf.contrib.predictor.from_saved_model. Here is an example of how to use it:
import tensorflow as tf
import numpy as np
import cv2

img = cv2.imread("test.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_rgb = np.expand_dims(img, 0)

predict_fn = tf.contrib.predictor.from_saved_model("./saved_model")
output_data = predict_fn({"inputs": img_rgb})
print(output_data)  # detector output dictionary

Adding loss functions in MxNet - "Operator _copyto is non-differentiable because it didn't register FGradient attribute"

I have a system that generates training data, and I want to add individual losses together to form a batch loss. I am trying to do the following (full code at the commit in question):
for epoch in range(100):
    with mx.autograd.record():
        loss = 0.0
        for k in range(40):
            (i, x), (j, y) = random.choice(data), random.choice(data)
            # Just compute loss on last output
            if i == j:
                loss = loss - l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
            else:
                loss = loss + l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
    loss.backward()
    trainer.step(BATCH_SIZE)
But I get an error like,
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
<ipython-input-39-14981406278a> in <module>()
21 else:
22 loss = loss + l2loss(net(mx.nd.array(x)), net(mx.nd.array(y)))
---> 23 loss.backward()
24 trainer.step(BATCH_SIZE)
25 avg_loss += mx.nd.mean(loss).asscalar()
... More trace ...
MXNetError: [16:52:49] src/pass/gradient.cc:187: Operator _copyto is non-differentiable because it didn't register FGradient attribute.
How do I incrementally add losses together like I am trying to?
What version of MXNet are you using? I couldn't reproduce this using the latest code base. You can try either the GitHub master branch or version 0.12.
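For reference, one way to accumulate several losses before a single backward pass without starting from a Python float is to collect them in a list and sum with mx.nd.add_n. This is a rough sketch built from the question's code (data, net, trainer, and BATCH_SIZE are the same placeholders); whether it sidesteps the _copyto error may depend on the MXNet version.
import random
import mxnet as mx

l2loss = mx.gluon.loss.L2Loss()

with mx.autograd.record():
    losses = []
    for k in range(40):
        (i, x), (j, y) = random.choice(data), random.choice(data)
        sign = -1.0 if i == j else 1.0
        losses.append(sign * l2loss(net(mx.nd.array(x)), net(mx.nd.array(y))))
    total_loss = mx.nd.add_n(*losses)  # sum the per-step losses into one NDArray
total_loss.backward()
trainer.step(BATCH_SIZE)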