Error: in the file data/coco.names number of names 80 that isn't equal to classes=13 - yolo

I was using Google Colab to train Yolo-v3 to detect custom objects. I'm new to Colab, and darknet.
I used the following command for training:
!./darknet detector train "/content/gdrive/My Drive/darknet/obj.data" "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" -dont_show
The training finished as follows, and it didn't display any details of the epochs (I don't know how many epochs actually run). Actually, it took very short time until it displayed Done!, and saved the weights as shown in the above image
Then, I tried to detect a test image with the following command:
!./darknet detect "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" "/content/gdrive/My Drive/darknet/img/MN 111-0-515 (45).jpg" -dont-show
However, I got the following error:
Error: in the file data/coco.names number of names 80 that isn't equal to classes=13 in the file /content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg
Even, the resulting image didn't contain any bounding boxes, so I don't know if the training worked or not.
Could you pls advise what might be wrong with the training, and why the error is referring to coco.names, while I'm using other files for names, and configuration?

You did not share the yolov3-PID.cfg, obj.data and coco.names. I am assuming coco.names contain 80 classes as in the repo.
The error likely is in obj.data, where it seems your goal here is to detect 13 custom objects. If this is the case, then set classes=13, also replace names=data/coco.names with names=data/obj.names. Here, obj.names file should contain 13 lines for the custom class names. Also modify yolov3-PID.cfg to contain same amount of classes.
I suggest using this repo below if you are not already using this. It contains google colab training and inference script for yolov3, yolov4.
Here are the instructions for custom object detection training.

Nice work!!! coming this far. Well, everything is fine, you just need to edit the data folder of the darknet. By default it's using coco label, go to darknet folder --> find data folder --> coco.names file --> edit the file by removing 80 classes(in colab just double click to edit and ctrl+s to save) --> Put down your desired class and it's done!!!

i was having the same problem when i was training custom model in colab.
i just cloned darknet again in another folder and edited coco.name and moved it to my training folder. and it worked!!

Related

How do you save progress during a Keras Tuner run?

I am currently shifting through a larger search space with Keras Tuner on a free Google Colab instance. Because of usage limits, my search run will be interrupted before completion. I would like to save the progress of my search periodically in anticipation of those interruptions, and simply resume from the last checkpoint when the Colab resources become available to me again.
I found documentation on how to save specific models from a run, but I want to save the entire state of the search, including what was already tried and the results of those experiments.
Can I just call Tuner.get_state(), save the result, and then resume from where I left off with Tuner.set_state()? Or is there another way?
You don't need to call tuner.get_state() and tuner.set_state(). While instantiating a Tuner, say a RandomSearch, as mentioned in the example,
# While creating the tuner for the first time
tuner = RandomSearch(
build_model,
objective="val_accuracy",
max_trials=3,
executions_per_trial=2,
directory="my_dir",
project_name="helloworld",
)
You need to set the arguments directory and project_name. The checkpoints are saved in this directory. You may save this directory as a ZIP file and download using files.download().
When you get a new Colab instance, unzip that archive and restore the my_dir directory. Instantiate a Tuner again using,
# While loading the Tuner
tuner = RandomSearch(
build_model,
objective="val_accuracy",
max_trials=3,
executions_per_trial=2,
directory="my_dir", # <----- Use the directory as you did earlier
overwrite=False, # <-------
project_name="helloworld",
)
Now start the search and you'll notice that the best params so far hasn't changed. Also, tuner.results_summary() returns all search results.
See the documentation for the Tuner class here.

GluonCV inference with finetuned model - “Please make sure source and target networks have the same prefix” error

I used GluonCV to finetune an object detection model in order to recognize some custom classes, mostly following the related tutorial.
I tried using both “ssd_512_resnet50_v1_coco” and “ssd_512_mobilenet1.0_coco” as base models, and the training process ended successfully (the accuracy on the validation dataset is reasonably high).
The problem is, I tried running inference with the newly trained model, by using for example:
classes = ["CML_mug", "person"]
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
classes=classes,
pretrained_base=False,
ctx=ctx)
net.load_params("saved_weights/-0070.params", ctx=ctx)
but I get the error:
AssertionError: Parameter 'mobilenet0_conv0_weight' is missing in file: saved_weights/CML_mobilenet_00/-0000.params, which contains parameters: 'ssd0_ssd0_mobilenet0_conv0_weight', 'ssd0_ssd0_mobilenet0_batchnorm0_gamma', 'ssd0_ssd0_mobilenet0_batchnorm0_beta', ..., 'ssd0_ssd0_ssdanchorgenerator2_anchor_2', 'ssd0_ssd0_ssdanchorgenerator3_anchor_3', 'ssd0_ssd0_ssdanchorgenerator4_anchor_4', 'ssd0_ssd0_ssdanchorgenerator5_anchor_5'. Please make sure source and target networks have the same prefix.
So, it seems the network parameters are named differently in the .params file and in the model I’m using for inference. Specifically, in the .params file, the name of the network weights is prefixed by the string “ssd0_ssd0_”, which lead to the error when invoking net.load_parameters.
I did this whole procedure a few times in the past without having problems, did anything change? I’m running it on Ubuntu 18.04, with mxnet-mkl (1.6.0) and gluoncv (0.7.0).
I tried loading the .params file by:
from mxnet import nd
model = nd.load(0070.param)
and I wanted to modify it and remove the “ssd0_ssd0_” string that is causing the problem.
I’m trying to navigate the dictionary, but between the keys I only found a:
ssd0_resnetv10_conv0_weight
so, slightly different than indicated in the error.
Anyway, this way of fixing the issue would be a little cumbersome, I’d prefer a more direct way.
Ok, fixed it. Basically, during training I was saving the .params file by using:
net.export(param_file)
and, as I said, loading them during inference by:
net.load_parameters(param_file)
However, it doesn’t work this way, but it does if instead of export I use:
net.save_parameters(param_file)

Incorrect Broadcast input array shape error when trying to use Pretraining

I am trying to use spacy's 'pre-train' feature for a NER task, so here is what I tried doing(I am still trying to use it),
Step 1: I started by initializing the model with 'en_core_web_lg' next I saved this model to disk and tested its NER capability on few lines to see if it recognizes the tags in those test lines. (Made a note of ignored tags)
Step 2: Next I created a .jsonl file with new data to train on (about 20 new lines, I wanted to see the model's capability given new data around an entity(ignored tags found earlier) will it be able to correctly identify tags after doing transfer learning). So using this .jsonl and the model I saved earlier file I used 'spacy pre-train' command to train, this created a token2vec .bin file for me (model999.bin).
Step 3: Next I created a function that takes the location of an earlier saved model(model saved in step 1) and location of token2vec (model999.bin file obtained in step 2). Inside the function it loads the model>creates/gets pipe>disables rest of the files>uses (pipe_name).model.tok2vec.from_bytes(file_.read()) to read from model999.bin and broadcast the learned vectors to base model.
But when I run this function, I get this error:
ValueError: could not broadcast input array from shape (96,3,384) into shape (96,3,480)
(I have uploaded the entire notebook here: [https://github.com/pratikdk/ner_test/blob/master/base_model_contextual_TF.ipynb ]).
In order to pre-train I used this function
python -m spacy pre-train ub.jsonl model_saves w2s
Here are the 20 lines I tried training on top of the base model
[ https://github.com/pratikdk/ner_test/blob/master/ub.jsonl ]
What am I doing wrong here exactly? Please can you also point the fix, I am sure many would need insight on this.
Environment
Operating System: CentOS
Python Version Used: 3.7.3
spaCy Version Used: 2.1.3
Environment Information: Anaconda Jupyter Lab
So I was able to fix this, the developer(on github) answered my question.
Here is the answer:
https://github.com/explosion/spaCy/issues/3616

Tensorflow Lite export looks like it do not add weigths and add unsupported operations

I want to reload some of my model variables with the saved weight in the chheckpoint and then export it to the tflite file.
The question is a bit tricky without see code, so I made this Colab jupyter notebook with the complete code to explain it better (All code is working, you can actually copy in a new collab and change if you want):
https://colab.research.google.com/drive/1wSor4CxEz36LgElVi4y_N8uiSt4-j9b2#scrollTo=XKBQzoW_wd4A
I got it working but with two issues:
The exported .tflite file is like 3Ks, so I do not believe it is the entire model with the weights in it. Only the input is an image of 128x128x3, one weight for each is more than 3K.
When I finally import the model in Android, I have this error: "Didn't find custom op for name 'VariableV2' /n Didn't find custom op for name 'ReorderAxes' /n Registration failed."
Maybe the last error is cause the save/restore operations? They look like are there when I save the graph definition.
Thanks in advance.
I realize my problem.. I'm trying to convert to TFLITE a model without previously freezing it, TFLITE do not allow "VariableV2" nodes cause they should not be there..
All the problem is corrected freezing the model like this:
output_graph_def = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), ["output"])
I lost some time looking for that, hope it helps.

tensorflow retrain model file

im getting started with tensorflow und using retrain.py to teach it some new categories - this works well - however i have some questions:
In the comments of retrain.py it says:
"This produces a new model file that can be loaded and run by any TensorFlow
program, for example the label_image sample code"
however I havent found where this new model file is saved to ?
also: it does contain the whole model, right ? not just the retrained part ?
Thanks for clearing this up
1)I think you may want to save the new model.
When you want to save a model after some process, you can use
saver.save(sess, 'directory/model-name', *optional-arg).
Check out https://www.tensorflow.org/api_docs/python/tf/train/Saver
If you change model-name by epoch or any measure you would like to use, you can save the new model(otherwise, it may overlap with previous models saved).
You can find the model saved by searching 'checkpoint', '.index', '.meta'.
2)Saving the whole model or just part of it?
It's the part you need to learn bunch of ideas on tf.session and savers. You can save either the whole or just part, it's up to you. Again, start from the above link. The moral is that you put the variables you would like to save in a list quoted as 'var_list' in the link, and you can save only for them. When you call them back, you now also need to specify which variables in your model correspond to the variables in the loaded variables.
While running retrain.py you can give --output_graph and --output_labels parameters which specify the location to save graph (default is /tmp/output_graph.pb) and the labels as well. You can change those as per your requirements.