Evaluating the state value function when using the SAC agent of TF-Agents - tensorflow2.0

The state value function v at state x is a quantity of interest of the Markov decision process (MDP) I intend to solve. (My MDP is fully observable: observation = state.)
I use the SAC agent of TF-Agents to learn the action value function q(x,a) and the policy π. Thus, given a state x, the policy returns an approximately optimal action a = π(x), so that v(x) ≈ q(x,π(x)).
Problem description: how can one write q(x,π(x)) as a TF-Agents expression?
I can already examine the problem with the SAC tutorial https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial by adding the following lines to the end of the tutorial:
# Resetting the environment to obtain a TimeStep object
time_step = env.reset()
# An observation which respects the observation specs of env, corresponding to x above
observation = time_step.observation
# Calling the evaluation policy we obtain an action, this is essentially π(x) above
action = eval_policy.action(time_step).action
# I was expecting that the next line would return q(x,π(x))
critic_net((observation,action))
The reason for the last line is that the input_tensor_spec of a CriticNetwork is described as a tuple of (observation, action) in https://www.tensorflow.org/agents/api_docs/python/tf_agents/agents/ddpg/critic_network/CriticNetwork.
However, critic_net((observation, action)) instead raises the following error:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-32-8446b099696b> in <module>
----> 1 critic_net((observation,action))
2 frames
/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py in __call__(self, inputs, *args, **kwargs)
425 normalized_kwargs.pop("network_state", None)
426
--> 427 outputs, new_state = super(Network, self).__call__(**normalized_kwargs) # pytype: disable=attribute-error # typed-keras
428
429 nest_utils.assert_matching_dtypes_and_inner_shapes(
/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ddpg/critic_network.py in call(***failed resolving arguments***)
166 actions = layer(actions, training=training)
167
--> 168 joint = tf.concat([observations, actions], 1)
169 for layer in self._joint_layers:
170 joint = layer(joint, training=training)
InvalidArgumentError: Exception encountered when calling layer 'CriticNetwork' (type CriticNetwork).
{{function_node __wrapped__ConcatV2_N_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 0 in both shapes must be equal: shape[0] = [28,1] vs. shape[1] = [8,1] [Op:ConcatV2] name: concat
Call arguments received by layer 'CriticNetwork' (type CriticNetwork):
• inputs=('tf.Tensor(shape=(28,), dtype=float32)', 'tf.Tensor(shape=(8,), dtype=float32)')
• step_type=()
• network_state=()
• training=False
Can someone help me with the evaluation of the critic network?
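Judging from the shapes in the error message (an observation of shape (28,) and an action of shape (8,)), the network appears to receive unbatched tensors, while CriticNetwork concatenates observations and actions along axis 1 and therefore expects a leading batch dimension. A minimal, untested sketch of the batched call, reusing the variables defined above:
import tensorflow as tf
# Add a leading batch dimension: (28,) -> (1, 28) and (8,) -> (1, 8).
batched_observation = tf.expand_dims(observation, axis=0)
batched_action = tf.expand_dims(action, axis=0)
# A TF-Agents network call returns (output, network_state); the first element
# should then be the estimate of q(x, π(x)) for the single state in the batch.
q_value, _ = critic_net((batched_observation, batched_action))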

Related

My trained model with TensorFlow on the transformers pipeline pops out an error

I'm using this GitHub text-summarization project and I have a problem. I have been struggling for two weeks and I cannot figure it out.
I'm using a notebook from this github repository:
https://github.com/flogothetis/Abstractive-Summarization-T5-Keras
notebook link:
https://github.com/flogothetis/Abstractive-Summarization-T5-Keras/blob/main/AbstractiveSummarizationT5.ipynb
After training the model I want to use the Hugging Face transformers pipeline to generate summarizations.
from transformers import pipeline
summarizer = pipeline("summarization", model=model, tokenizer="t5-small", framework="tf")
summarizer("some text")
but it returns the following error:
AttributeError: 'Functional' object has no attribute 'config'
Does anyone have an idea how I can solve it?
full error:
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_20/1872405895.py in <module>
----> 1 summarizer = pipeline("summarization", model=model, tokenizer="t5-small", framework="tf")
      2
      3 summarizer("The US has passed the peak on new coronavirus cases, President Donald Trump said and predicted that some states would reopen")
/opt/conda/lib/python3.7/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
    432             break
    433
--> 434     return task_class(model=model, tokenizer=tokenizer, modelcard=modelcard, framework=framework, task=task, **kwargs)
/opt/conda/lib/python3.7/site-packages/transformers/pipelines/text2text_generation.py in __init__(self, *args, **kwargs)
     37
     38     def __init__(self, *args, **kwargs):
---> 39         super().__init__(*args, **kwargs)
     40
     41         self.check_model_type(
/opt/conda/lib/python3.7/site-packages/transformers/pipelines/base.py in __init__(self, model, tokenizer, modelcard, framework, task, args_parser, device, binary_output)
    548
    549         # Update config with task specific parameters
--> 550         task_specific_params = self.model.config.task_specific_params
    551         if task_specific_params is not None and task in task_specific_params:
    552             self.model.config.update(task_specific_params.get(task))
AttributeError: 'Functional' object has no attribute 'config'
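One hedged observation (not from the original thread): pipeline() expects a transformers model object, which carries a .config attribute; a plain Keras Functional model has no such attribute, which is exactly what the traceback complains about at self.model.config. A rough sketch of loading the fine-tuned weights into a T5 class that keeps its config (the checkpoint path is hypothetical):
from transformers import TFT5ForConditionalGeneration, T5Tokenizer, pipeline
# Hypothetical path: wherever the fine-tuned weights were saved with save_pretrained().
model = TFT5ForConditionalGeneration.from_pretrained("path/to/fine-tuned-t5")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer, framework="tf")
print(summarizer("some text to summarize"))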

OSError: [Errno 95] Operation not supported: '/content/drive/Mask_RCNN' on Google Colab

Hello, I'm trying to use saved weights for a Mask R-CNN model within Colab and keep incurring the error message below. I have tried different ways of accessing the .h5 file, which was an issue before, and now I have hit a brick wall. I have tried to train different parts of the model; nothing works. I could not find anything specific about these circumstances on Google Colab.
The following is the cell that throws the issue:
# Training dataset.
dataset_train = linkedinDataset()
dataset_train.load_dataset(dataset_dir, "train")
dataset_train.prepare()
# Validation dataset
dataset_val = linkedinDataset()
dataset_val.load_dataset(dataset_dir, "val")
dataset_val.prepare()
# *** This training schedule is an example. Update to your needs ***
print("Training network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')
Training network heads
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-19-174a93609e58> in <module>()
17 learning_rate=config.LEARNING_RATE,
18 epochs=5,
---> 19 layers='heads')
2 frames
/content/Mask_RCNN/mrcnn/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs,
layers, augmentation, custom_callbacks, no_augmentation_sources)
2334 # Create log_dir if it does not exist
2335 if not os.path.exists(self.log_dir):
-> 2336 os.makedirs(self.log_dir)
2337
2338 # Callbacks
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
208 if head and tail and not path.exists(head):
209 try:
--> 210 makedirs(head, mode, exist_ok)
211 except FileExistsError:
212 # Defeats race condition when another thread created the path
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system
OSError: [Errno 95] Operation not supported: '/content/drive/Mask_RCNN'
You cannot use
'/content/drive/Mask_RCNN'
You should save to either
'/content/Mask_RCNN'
or, if you want to use Google Drive,
'/content/drive/MyDrive/Mask_RCNN'
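As a hedged illustration of that suggestion (modellib and config stand for the usual Mask_RCNN objects; the notebook's actual variable names may differ), the log directory is chosen when the model is constructed, and it is this directory that os.makedirs() later tries to create:
from mrcnn import model as modellib
# Point the model at a writable location; '/content/drive/MyDrive/...' is the mounted Drive path.
MODEL_DIR = "/content/drive/MyDrive/Mask_RCNN"   # or "/content/Mask_RCNN"
model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)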

ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1]))

ValueError Traceback (most recent call last)
<ipython-input-30-33821ccddf5f> in <module>
23 output = model(data)
24 # calculate the batch loss
---> 25 loss = criterion(output, target)
26 # backward pass: compute gradient of the loss with respect to model parameters
27 loss.backward()
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
593 self.weight,
594 pos_weight=self.pos_weight,
--> 595 reduction=self.reduction)
596
597
C:\Users\mnauf\Anaconda3\envs\federated_learning\lib\site-packages\torch\nn\functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
2073
2074 if not (target.size() == input.size()):
-> 2075 raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
2076
2077 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1]))
I am training a CNN on the Horses vs Humans dataset. This is my code. I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.01). My final layer is self.fc2 = nn.Linear(512, 1). Our last neuron will output 1 for horse and 0 for human, right? Or should I choose 2 neurons for the output?
16 is the batch size. Since the error says ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1])), I don't understand where I need to make a change to rectify the error.
target = target.unsqueeze(1), before passing target to criterion, changed the target tensor size from [16] to [16, 1] and solved the issue. Furthermore, I also needed to do target = target.float() before passing it to criterion, because our outputs are floats. Besides, there was another error in the code: I was using a sigmoid activation function in the last layer, but I shouldn't have, because the criterion I am using already comes with sigmoid built in.
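Put together, a minimal sketch of that fix inside the training loop (variable names taken from the snippets above):
optimizer.zero_grad()
output = model(data)                  # shape [16, 1] from nn.Linear(512, 1)
target = target.unsqueeze(1).float()  # [16] -> [16, 1], and match the float dtype
loss = criterion(output, target)      # criterion = nn.BCEWithLogitsLoss()
loss.backward()
optimizer.step()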
You can also try _, pred = torch.max(output, 1) and then pass the pred variable into the loss function.
I had the same error when I ran my model. I was able to correct it by returning torch.tensor([target]).float().to(device) in the Dataset class.

Adam optimizer error: one of the variables needed for gradient computation has been modified by an inplace operation

I am trying to implement an Actor-Critic learning automation algorithm that is not the same as the basic actor-critic algorithm; it is slightly changed.
Anyway, I used the Adam optimizer and implemented it with PyTorch.
When I backward the TD error for the Critic first, there is no error.
However, when I backward the loss for the Actor, the error occurs.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     46     # update Actor Func
     47     optimizer_M.zero_grad()
---> 48     loss.backward()
     49     optimizer_M.step()
     50
~\Anaconda3\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    100                 products. Defaults to False.
    101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    103
    104     def register_hook(self, hook):
~\Anaconda3\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91
     92
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Above is the content of the error.
I tried to find an in-place operation, but I haven't found one in my own code.
I think I don't know how to handle the optimizer.
Here is the main code:
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)
    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)
    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)
    # update value Func
    optimizer_M.zero_grad()
    TD_error.backward()
    optimizer_M.step()
    # update Actor Func
    loss.backward()
    optimizer_M.step()
Here is the agent network:
# Actor-Critic Agent
self.act_pipe = nn.Sequential(nn.Linear(state, 128),
                              nn.ReLU(),
                              nn.Dropout(0.5),
                              nn.Linear(128, 256),
                              nn.ReLU(),
                              nn.Dropout(0.5),
                              nn.Linear(256, num_action),
                              nn.Softmax()
                              )

self.val_pipe = nn.Sequential(nn.Linear(state, 128),
                              nn.ReLU(),
                              nn.Dropout(0.5),
                              nn.Linear(128, 256),
                              nn.ReLU(),
                              nn.Dropout(0.5),
                              nn.Linear(256, 1)
                              )

def forward(self, state, flag, test=None):
    temp_action_prob = self.act_pipe(state)
    self.action_prob = self.cal_prob(temp_action_prob, flag)
    self.action = self.get_action(self.action_prob)
    self.value = self.val_pipe(state)
    return self.action
I want to update each network separately.
And I want to know: does the basic TD actor-critic method use the TD error for the loss, or the squared error between r + V(s') and V(s)?
I think the problem is that you zero the gradients right before calling backward, after the forward propagation. Note that for automatic differentiation you need the computation graph and the intermediate results that you produce during your forward pass.
So zero the gradients before your TD error and target calculations, not after you have finished your forward propagation.
for cur_step in range(1):
    action = M_Agent(state, flag)
    next_state, r = env.step(action)
    optimizer_M.zero_grad()  # zero your gradient here
    # calculate TD Error
    TD_error = M_Agent.cal_td_error(r, next_state)
    # calculate Target
    target = torch.FloatTensor([M_Agent.cal_target(TD_error)])
    logit = M_Agent.cal_logit()
    loss = criterion(logit, target)
    # update value Func
    TD_error.backward()
    optimizer_M.step()
    # update Actor Func
    loss.backward()
    optimizer_M.step()
To answer your second question, the DDPG algorithm for example uses the squared error (see the paper).
Another recommendation: in many cases, large parts of the value and policy networks are shared in deep actor-critic agents: you have the same layers up to the last hidden layer, and use a single linear output for value prediction and a softmax layer for the action distribution. This is especially useful if you have high-dimensional visual inputs, as it acts as a sort of multi-task learning, but you can try it nevertheless. (As I see, you have a low-dimensional state vector.)
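A rough sketch of that shared-trunk layout (layer sizes are illustrative, not taken from the question):
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        # layers shared by the actor and the critic, up to the last hidden layer
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        self.value_head = nn.Linear(256, 1)  # single linear output for the value
        self.policy_head = nn.Sequential(
            nn.Linear(256, num_actions), nn.Softmax(dim=-1),  # action distribution
        )

    def forward(self, state):
        h = self.trunk(state)
        return self.policy_head(h), self.value_head(h)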

How can I solve this elusive error in my multi-GPU Pytorch setup?

I have spent the past day trying to figure out how to use multiple GPUs. In theory, parallelizing models across multiple GPUs is supposed to be as easy as simply wrapping models with nn.DataParallel. However, I have found that this does not work for me. To use the most simple and canonical thing I could find as proof of this, I ran the code in the Data Parallelism tutorial, line for line.
I have tried everything from only having a specific permutation of my GPUs be visible to CUDA to reinstalling everything related to CUDA but can't figure out why I cannot run with multiple GPUs. Some information about my machine:
Operating System: Ubuntu 16.04
GPUs: 4 × 1080 Ti
PyTorch version: 1.01
CUDA version: 10.0
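For reference, the wrapping pattern from that tutorial looks roughly like the following (the model and the random data are simplified stand-ins for the tutorial's objects):
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(5, 2)             # any nn.Module works here
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate the module across all visible GPUs
model.to(device)

for data in [torch.randn(8, 5) for _ in range(3)]:  # stand-in for rand_loader
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(), "output size", output.size())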
The error code is the following:
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-3-0f0d83e9ef13> in <module>
1 for data in rand_loader:
2 input = data.to(device)
----> 3 output = model(input)
4 print("Outside: input size", input.size(),
5 "output_size", output.size())
/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
141 return self.module(*inputs[0], **kwargs[0])
142 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 143 outputs = self.parallel_apply(replicas, inputs, kwargs)
144 return self.gather(outputs, self.output_device)
145
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
151
152 def parallel_apply(self, replicas, inputs, kwargs):
--> 153 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
154
155 def gather(self, outputs, output_device):
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
73 thread.start()
74 for thread in threads:
---> 75 thread.join()
76 else:
77 _worker(0, modules[0], inputs[0], kwargs_tup[0], devices[0])
/usr/local/lib/python3.6/threading.py in join(self, timeout)
1054
1055 if timeout is None:
-> 1056 self._wait_for_tstate_lock()
1057 else:
1058 # the behavior of a negative timeout isn't documented, but
/usr/local/lib/python3.6/threading.py in _wait_for_tstate_lock(self, block, timeout)
1070 if lock is None: # already determined that the C code is done
1071 assert self._is_stopped
-> 1072 elif lock.acquire(block, timeout):
1073 lock.release()
1074 self._stop()
KeyboardInterrupt:
Any insight into this error would be very much appreciated. From my relatively limited systems and CUDA knowledge, it has to do with some sort of locking, but I can't for the life of me figure out how to fix this.