I encounter this error when training with yolov7 RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) - yolo

I want to train with yolov7 and get a .pt file, but I am getting such a problem while doing the training, what could be the reason?

Related

Tensorflow not using multiple GPUs - getting OOM

I'm running into OOM on a multi-gpu machine, because TF 2.3 seems to be allocating a tensor using only one GPU.
tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops.cc:539 :
Resource exhausted: OOM when allocating tensor with shape[20532,64,48,32]
and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc.
But tensorflow does recognize multiple GPUs when I run my code:
Adding visible gpu devices: 0, 1, 2
Is there anything else I need to do to have TF use all GPUs?
The direct answer is yes, you do need to do more to get TF to recognize multiple GPUs. You should refer to this guide but the tldr is
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
...
https://www.tensorflow.org/guide/distributed_training#using_tfdistributestrategy_with_tfkerasmodelfit
But in your case, something else is happening. While this one tensor may be triggering the OOM, it's likely because a few previous large tensors were allocated.
The first dimension, your batch size, is 20532, which is really big. Since the factorization of that is 2**2 × 3 × 29 × 59, I'm going to guess you are working with CHW format and your source image was 3x64x128 which got trimmed after a few convolutions. I'd suspect an inadvertent broadcast. Print a model.summary() and then review the sizes of the tensors coming out of each layer. You may also need to look at your batching.

Set batch size of trained keras model to 1

I am having a keras model trained on my own dataset. However after loading weights the summary shows None as the first dimension(the batch size).
I want to know the process to fix the shape to batch size of 1, as it is compulsory for me to fix it so i can convert the model to tflite with GPU support.
What worked for me was to specify batch size to the Input layer, like this:
input = layers.Input(shape=input_shape, batch_size=1, dtype='float32', name='images')
This then carried through the rest of the layers.
The bad news is that despite this "fix" the tfl runtime still complains about dynamic tensors. I get these non-fatal errors in logcat when it runs:
E/tflite: third_party/tensorflow/lite/core/subgraph.cc:801 tensor.data.raw != nullptr was not true.
E/tflite: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#26 is a dynamic-sized tensor).
E/tflite: Ignoring failed application of the default TensorFlow Lite delegate indexed at 0.
The good news is that despite these errors it seems to be using the GPU anyway, based on performance testing.
I'm using:
tensorflow-lite-support:0.2.0'
tensorflow-lite-metadata:0.2.1'
tensorflow-lite:2.6.0'
tensorflow:tensorflow-lite-gpu:2.3.0'
Hopefully, they'll fix the runtime so it doesn't matter whether the batch size is 'None'. It shouldn't matter for doing inference.

OOM with tensorflow

I'm facing an OOM error whole training my tensorflow model, the structure is as follows:
tf.contrib.layers.embed_sequence initialized with GoogleNewsVector
2 * tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.LSTMCell) #forward
2 * tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.LSTMCell) #backward
tf.nn.bidirectional_dynamic_rnn wrapping the above layers
tf.layers.dense as an output layer
i tried to reduce the batch size down to as low as 64, my input data is padded to 1500, and my vocab size is 8938
The cluster i'm using is very powerful (https://wiki.calculquebec.ca/w/Helios/en) i'm using two nodes with 8 GPUs each and still getting this error:
2019-02-23 02:55:16.366766: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at reverse_op.cc:270 : Resource exhausted: OOM when
allocating tensor with shape[2000,800,300] and type float on
/job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
I'm using the estimator API with MirroredStrategy and still no use, is there a way maybe to ask tensorflow to just run the training using the GPUs and keep the tensors stores on the main machine memory? Any other suggestions are welcome.
Running a particular operation (e.g. some tensor multiplication during training) using GPU requires having those tensors stored on the GPU.
You might want to use Tensorboard or something like that to see which operations require the most memory for your calculation graph. In particular, it's possible that first link between the embeddings and LSTM is the culprit and you'd need to narrow that somehow.

tensorflow model attention_ocr occur oom error when training by use gtx1070

my gpu is gtx1070,when i want to train attention_ocr model ,the log occur error
ResourceExhaustedError:OOM when allocating tensor with shape
the document say has tested with gtx980,so my gtx1070 is ok in theory。

TF Object Detection API - Trouble running quantized network after freezing and quantizing my fine-tuned network

TensorFlow Object Detection API
Using the TensorFlow Object Detection API to retrain MobileNet on my own DataSet. The issue occurs as I try to run my inference graph that has been both frozen and quantized.
System:
Ubuntu 16.04,
TensorFlow 1.2 (from source, CPU only),
Bazel 0.4.5
Issue:
Use provided frozen_graph.pb from model zoo.
Quantize to 8-bit using
bazel-bin/tensorflow/tools/graph_transforms/transform_graph.
Run inference
This works, however,
Re-train and produce my own frozen_graph.pb using object_detection/export_inference_graph.py
Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph
Run inference <-- Produces error
Does NOT work, and the error I'm getting during the attempt to run the graph is:
File
"/home/unibap/TensorFlow/tensorflow-python2-sse4.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py",
line 1298, in _do_call
raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: The node
'Preprocessor/map/while/ResizeImage/ResizeBilinear/eightbit' has
inputs from different frames. The input
'Preprocessor/map/while/ResizeImage/size' is in frame
'Preprocessor/map/while/Preprocessor/map/while/'. The input
'Preprocessor/map/while/ResizeImage/ResizeBilinear_eightbit/Preprocessor/map/while/ResizeImage/ExpandDims/quantize'
is in frame ''.
Since I can quantize and run the provided frozen_graph.pb the issue has to be with the export tool? Which export tool was used to create the frozen_graph.pb that are in the model zoo? Or how was the export tool called?
PS:
Quote from comments in export_inference_graph.pb, assuring me that it should produce a frozen graph if checkpoint is provided.
"Optionally, one can freeze the graph by converting the weights in the provided
checkpoint as graph constants thereby eliminating the need to use a checkpoint
file during inference."
Best