I am trying to run a custom YOLOv5 object detection model and I am getting the following error:

Traceback (most recent call last):
File "C:\Users\Bhavesh\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\multiprocessing\reductions.py", line 36, in del
File "C:\Users\Bhavesh\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\storage.py", line 520, in _free_weak_ref
AttributeError: 'NoneType' object has no attribute '_free_weak_ref'
AttributeError: 'NoneType' object has no attribute '_free_weak_ref'

This problem happens more often when you are running out of CPU resources. It looks like you are running your model on the CPU; try switching to a GPU if you have a big dataset, or make sure the batch size fits your CPU if your dataset isn't that big. Sometimes the problem disappears once other processes finish.
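As a rough illustration of that device switch (assuming the custom weights are loaded through torch.hub; the weight path and image path below are placeholders, not from the question):

import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder path to the custom weights; replace with your own file.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.to(device)

# Run inference on a single image; keep batches small if memory is tight.
results = model("data/images/sample.jpg")
results.print()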

Apparently this AttributeError is related to PyTorch, not to YOLOv5.
Based on the comments from some people on this thread, it can be resolved by downgrading the PyTorch version. However, this AttributeError has no impact on training or on saving the data/model to disk; training works fine and the model is saved correctly.
More details in this thread:
https://github.com/pytorch/pytorch/issues/74016
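If you do want to try the downgrade route, it is just a pip version pin. The exact versions below are an assumption on my part (the issue was reported against the 1.11 release), so pick the pre-1.11 build that matches your CUDA setup:

pip install torch==1.10.2 torchvision==0.11.3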

I had the same problem. I fixed it by deleting the C:\Users\MyUser\.cache\torch folder and running the project again, which reinstalls the dependencies.
I hope it helps you.
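As an aside, torch.hub can also bypass a stale cache programmatically via its force_reload flag (this is a general torch.hub option, not something from the original answer; the weight path is a placeholder):

import torch

# force_reload=True ignores the copy cached under ~/.cache/torch/hub and downloads the repo again.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt", force_reload=True)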

Related

How to drop elements in dataset that can cause an error while training a TensorFlow Lite model

I am trying to train a simple image classification model using TensorFlow Lite. I am following this documentation to write my code. As specified in the documentation, in order to train my model, I have written model = image_classifier.create(train_data, model_spec=model_spec.get('mobilenet_v2'), validation_data=validation_data). After training for a few seconds, however, I get an InvalidArgumentError. I believe that the error is due to something in my dataset, but it is too difficult to eliminate all the sources of the error from the dataset manually because it consists of thousands of images.

After some research, I found a potential solution: I could use tf.data.experimental.ignore_errors, which would "produce a dataset that contains the same elements as the input, but silently drop any elements that caused an error." From the documentation (here), however, I couldn't figure out how to integrate this transformation function with my code. If I place the line dataset = dataset.apply(tf.data.experimental.ignore_errors()) before training the model, the system doesn't know which elements to drop. If I place the line after, the system never reaches the line because an error arises in training. Moreover, the system gives an error message AttributeError: 'ImageClassifierDataLoader' object has no attribute 'apply'.

I would appreciate it if someone could tell me how to integrate tf.data.experimental.ignore_errors() with my model, or possible alternatives to the issue I am facing.
Hi, if you are exactly following the documentation, then tf.data.experimental.ignore_errors won't work for you, because you are not loading your data using tf.data; you are most probably using from tflite_model_maker.image_classifier import DataLoader.
Note: please include the complete code snippet so we can help you solve the issue.
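For reference, this is roughly how ignore_errors is used when the images are loaded through tf.data directly. This is a sketch under that assumption, with a placeholder file pattern and no labels, and it does not plug into the Model Maker DataLoader:

import tensorflow as tf

# Plain tf.data pipeline over image files (placeholder glob pattern).
files = tf.data.Dataset.list_files("images/*/*.jpg")

def load_image(path):
    # Decoding is typically where corrupt files raise InvalidArgumentError.
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(image, (224, 224))

dataset = files.map(load_image)
# Silently drop every element whose processing raised an error.
dataset = dataset.apply(tf.data.experimental.ignore_errors())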

saving RL agent by pickle, cannot save because of pickle.thread_RLock -- what is the source of this error?

I am trying to save my reinforcement learning agent class after training for further training later on by pickling it.
The script used is:
import pickle

with open('agent.pickle', 'wb') as agent_file:
    pickle.dump(agent, agent_file)
I am receiving an error:
TypeError: can't pickle _thread.RLock objects
I have searched this error message but am not sure what the actual source of the error is. The traceback is uninformative about which specific line of code is causing the error. The scripts used come from three independent .py files. A TensorFlow/Keras model has been built in one of them, but again I am unsure about where specifically this is coming from! I have read that this error can come from lambda functions, but none of these are defined by myself, unless they are used internally by a package such as TensorFlow.
I too faced the same error but found a workaround. After model.fit(), use model.save("modelName"). It will create a folder in which the model is saved.
To load the model, use keras.models.load_model("modelName").
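A sketch of that workaround in context (the agent attribute names here are hypothetical; the idea is to save the Keras model with model.save and pickle only plain-Python state):

import pickle
from tensorflow import keras

# Save the Keras model separately; this creates a folder named "modelName".
agent.model.save("modelName")

# Pickle only picklable state, not the model or anything holding thread locks.
state = {"epsilon": agent.epsilon, "memory": agent.memory}  # hypothetical attributes
with open("agent_state.pickle", "wb") as agent_file:
    pickle.dump(state, agent_file)

# Later: restore both pieces.
model = keras.models.load_model("modelName")
with open("agent_state.pickle", "rb") as agent_file:
    state = pickle.load(agent_file)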

OOM error only after many epochs of training a tacotron model

I was checking out Google's Tacotron 2 model and slightly modified it to fit my data. The training runs successfully until about epoch 9000, but then throws an OOM error (I repeated the training, and it stops at the exact same spot every time I try).
I added the swap_memory=True option in the tf.nn.bidirectional_dynamic_rnn function to see if it resolves the issue. After that change, the training runs a bit slower and manages more epochs, but it still throws an OOM error at about epoch 10000.
I'm using a 12 GB Titan X GPU. The model checkpoint files (3 files per checkpoint) are only 500 MB, and 80 MB for the meta and data files. I don't know enough about checkpoints, but if they represent all the model parameters and all variables necessary for training, that seems much smaller than 12 GB and I don't understand why the OOM error occurs.
Does anybody have a clue what might cause the OOM error? How do I check whether stray variables/graphs keep accumulating? Or does the dynamic RNN somehow cause the problem?
I have not encountered this error myself. Maybe you can upgrade your TensorFlow version or CUDA driver, or just reduce the batch size.
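On the question of spotting stray nodes: one simple check in TF1-style graph mode (an assumption about the setup, since that is what this Tacotron code appears to use) is to count graph operations between training steps, or to finalize the graph so any accidental additions raise immediately:

import tensorflow as tf

graph = tf.get_default_graph()
# If this count keeps growing across training steps, ops are being added inside the loop.
print("ops in graph:", len(graph.get_operations()))
# Once the graph is fully built, lock it; any later node construction raises an error.
graph.finalize()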

Orthogonal initializer is being called on a matrix with more elements: Slowness may result

tensorflowjs version: 0.11.7
Keras version: 2.0.4
I am trying to run a Keras-converted model in the browser. For the conversion I used the tensorflowjs converter, and the conversion went fine. However, at load time the following warning appears:
Orthogonal initializer is being called on a matrix with more than 2000
(2560000) elements: Slowness may result.
After this message pops up, the memory blowup happens and finally the browser stops working.
Following is my Keras model architecture:
As we run the TensorFlow.js code to load the aforementioned model, the following warning pops up:
Is there a way to speed up the Orthogonal initialization process, so the model gets loaded in less time and with less effort?
Any help will be appreciated.

TensorFlow's target pruning can't find nodes

I wrote a Python script using the TensorFlow API, including a SummaryWriter that dumps the graph definition so I can look at it in TensorBoard.
When running the script, a NotFoundError is thrown saying PruneForTargets: Some target nodes not found: Reading/data_queue_EnqueueMany_1. As its name implies, the node in question was created by an enqueue_many call on a FIFOQueue (which is then started in a QueueRunner); it does in fact exist, and can be seen clearly in TensorBoard.
What could cause TensorFlow to not find some nodes?
This is a known issue that occurs when you start threads that access the TensorFlow graph (e.g. your QueueRunner) before adding more nodes to the graph. (The underlying tf.Graph data structure is not thread-safe for concurrent reads and writes.)
The solution is to move tf.train.start_queue_runners(sess) (and any other code that starts threads) after the last node is constructed. One way to double-check this is to add a call to tf.get_default_graph().finalize() immediately before calling start_queue_runners().
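A minimal sketch of that ordering (the queue shape and the random data source below are placeholders, not from the original script): construct every node first, optionally finalize the graph, and only then start the queue-runner threads.

import tensorflow as tf

# Build the whole graph first: queue, enqueue op, and everything that reads from it.
queue = tf.FIFOQueue(capacity=100, dtypes=[tf.float32], shapes=[[28, 28]])
enqueue_op = queue.enqueue_many([tf.random_normal([10, 28, 28])])
tf.train.add_queue_runner(tf.train.QueueRunner(queue, [enqueue_op]))
batch = queue.dequeue_many(4)

with tf.Session() as sess:
    # Optional: lock the graph so any late node construction fails loudly.
    tf.get_default_graph().finalize()
    # Start the queue-runner threads only after the last node exists.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(batch).shape)
    coord.request_stop()
    coord.join(threads)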