Incorrect broadcast input array shape error when trying to use pretraining - spaCy

I am trying to use spaCy's 'pretrain' feature for an NER task. Here is what I tried (I am still working through it):
Step 1: I started by initializing the model with 'en_core_web_lg'. Next, I saved this model to disk and tested its NER capability on a few lines to see if it recognized the tags in those test lines (I made a note of the tags it missed).
Step 2: Next, I created a .jsonl file with new data to train on (about 20 new lines; I wanted to see whether, given new data around an entity (the missed tags found earlier), the model would correctly identify those tags after transfer learning). Using this .jsonl file and the model I saved earlier, I ran the 'spacy pretrain' command, which produced a tok2vec .bin file for me (model999.bin).
Step 3: Next, I created a function that takes the location of the model saved in step 1 and the location of the tok2vec weights (the model999.bin file obtained in step 2). Inside, the function loads the model, creates/gets the pipe, disables the rest of the pipes, and uses (pipe_name).model.tok2vec.from_bytes(file_.read()) to read model999.bin and broadcast the learned vectors into the base model; a sketch of the function is below.
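A minimal sketch of what such a function looks like, assuming the pipe is 'ner' (the pattern follows spaCy 2.x's pretraining examples; paths and file names are from the question):

import spacy

def load_pretrained_tok2vec(model_dir, tok2vec_path):
    # Load the base model saved in step 1
    nlp = spacy.load(model_dir)
    ner = nlp.get_pipe("ner")
    # Read the pretrained weights (model999.bin from step 2) and
    # copy them into the NER component's tok2vec layer
    with open(tok2vec_path, "rb") as file_:
        ner.model.tok2vec.from_bytes(file_.read())
    return nlp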
But when I run this function, I get this error:
ValueError: could not broadcast input array from shape (96,3,384) into shape (96,3,480)
(I have uploaded the entire notebook here: https://github.com/pratikdk/ner_test/blob/master/base_model_contextual_TF.ipynb)
To pretrain, I used this command:
python -m spacy pretrain ub.jsonl model_saves w2s
Here are the 20 lines I tried training on top of the base model:
https://github.com/pratikdk/ner_test/blob/master/ub.jsonl
What exactly am I doing wrong here? Could you also point out the fix? I am sure many others would benefit from insight on this.
Environment
Operating System: CentOS
Python Version Used: 3.7.3
spaCy Version Used: 2.1.3
Environment Information: Anaconda Jupyter Lab

I was able to fix this; the developer answered my question on GitHub (in short, the pretrained tok2vec weights have to match the width of the tok2vec layer in the model you load them into; the 384 vs. 480 in the error message is exactly that width mismatch). Here is the answer:
https://github.com/explosion/spaCy/issues/3616

Related

Error: in the file data/coco.names number of names 80 that isn't equal to classes=13

I was using Google Colab to train Yolo-v3 to detect custom objects. I'm new to Colab and darknet.
I used the following command for training:
!./darknet detector train "/content/gdrive/My Drive/darknet/obj.data" "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" -dont_show
The training finished as follows, and it didn't display any details of the epochs (I don't know how many epochs actually ran). It took a very short time until it displayed Done! and saved the weights.
Then, I tried to detect a test image with the following command:
!./darknet detect "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" "/content/gdrive/My Drive/darknet/img/MN 111-0-515 (45).jpg" -dont-show
However, I got the following error:
Error: in the file data/coco.names number of names 80 that isn't equal to classes=13 in the file /content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg
Even, the resulting image didn't contain any bounding boxes, so I don't know if the training worked or not.
Could you please advise what might be wrong with the training, and why the error refers to coco.names while I'm using other files for the names and configuration?
You did not share yolov3-PID.cfg, obj.data, or coco.names. I am assuming coco.names contains 80 classes, as in the repo.
The error likely is in obj.data, where it seems your goal is to detect 13 custom objects. If that's the case, set classes=13 and replace names=data/coco.names with names=data/obj.names; here, the obj.names file should contain 13 lines with the custom class names. Also modify yolov3-PID.cfg to use the same number of classes.
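To make that concrete, here is a sketch of what obj.data could look like (the train/valid paths are placeholders for your own image-list files):

classes = 13
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/

And in yolov3-PID.cfg, the usual darknet convention is to set classes=13 in every [yolo] layer and filters=(13+5)*3 = 54 in the [convolutional] layer directly above each [yolo] layer.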
I suggest using the repo below if you are not already doing so. It contains Google Colab training and inference scripts for YOLOv3 and YOLOv4.
Here are the instructions for custom object detection training.
Nice work coming this far! Everything is fine; you just need to edit the data folder of darknet. By default it's using the COCO labels: go to the darknet folder --> find the data folder --> open the coco.names file --> edit the file by removing the 80 classes (in Colab, just double-click to edit and Ctrl+S to save) --> put down your desired classes, and it's done!
I was having the same problem when I was training a custom model in Colab.
I just cloned darknet again into another folder, edited coco.names, and moved it into my training folder, and it worked!

GluonCV inference with finetuned model - “Please make sure source and target networks have the same prefix” error

I used GluonCV to finetune an object detection model in order to recognize some custom classes, mostly following the related tutorial.
I tried using both “ssd_512_resnet50_v1_coco” and “ssd_512_mobilenet1.0_coco” as base models, and the training process ended successfully (the accuracy on the validation dataset is reasonably high).
The problem is, I tried running inference with the newly trained model, using for example:
classes = ["CML_mug", "person"]
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                              classes=classes,
                              pretrained_base=False,
                              ctx=ctx)
net.load_params("saved_weights/-0070.params", ctx=ctx)
but I get the error:
AssertionError: Parameter 'mobilenet0_conv0_weight' is missing in file: saved_weights/CML_mobilenet_00/-0000.params, which contains parameters: 'ssd0_ssd0_mobilenet0_conv0_weight', 'ssd0_ssd0_mobilenet0_batchnorm0_gamma', 'ssd0_ssd0_mobilenet0_batchnorm0_beta', ..., 'ssd0_ssd0_ssdanchorgenerator2_anchor_2', 'ssd0_ssd0_ssdanchorgenerator3_anchor_3', 'ssd0_ssd0_ssdanchorgenerator4_anchor_4', 'ssd0_ssd0_ssdanchorgenerator5_anchor_5'. Please make sure source and target networks have the same prefix.
So, it seems the network parameters are named differently in the .params file and in the model I’m using for inference. Specifically, in the .params file the names of the network weights are prefixed by the string “ssd0_ssd0_”, which leads to the error when invoking net.load_parameters.
I did this whole procedure a few times in the past without having problems; did anything change? I’m running it on Ubuntu 18.04, with mxnet-mkl (1.6.0) and gluoncv (0.7.0).
I tried loading the .params file with:
from mxnet import nd
model = nd.load('saved_weights/-0070.params')
and I wanted to modify it and remove the “ssd0_ssd0_” string that is causing the problem.
I’m trying to navigate the dictionary, but among the keys I only found:
ssd0_resnetv10_conv0_weight
so, slightly different from what is indicated in the error.
Anyway, this way of fixing the issue would be a little cumbersome; I’d prefer a more direct way.
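For completeness, a hypothetical sketch of that cumbersome route, assuming nd.load returns a plain dict of name -> NDArray and that stripping the prefix is all that's needed:

from mxnet import nd

# Load the saved parameters, strip the offending prefix, save a fixed copy
params = nd.load('saved_weights/-0070.params')
fixed = {key.replace('ssd0_ssd0_', ''): value for key, value in params.items()}
nd.save('saved_weights/fixed.params', fixed)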
Ok, fixed it. Basically, during training I was saving the .params file by using:
net.export(param_file)
and, as I said, loading them during inference by:
net.load_parameters(param_file)
However, it doesn’t work this way; it does work if, instead of export, I use:
net.save_parameters(param_file)
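In other words, the save and load calls must be the matching pair. A minimal sketch (the file name is a placeholder):

# During training: save only the weights, readable by load_parameters
net.save_parameters("custom_ssd.params")

# At inference: rebuild the network with the same classes, then load the weights
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                              classes=classes,
                              pretrained_base=False,
                              ctx=ctx)
net.load_parameters("custom_ssd.params", ctx=ctx)

net.export() instead writes a symbol JSON plus a .params file intended to be reloaded with mxnet.gluon.SymbolBlock.imports, not with load_parameters, which is why the parameter names in the exported file don't line up with what load_parameters expects.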

What's the pipeline to train TensorFlow attention-ocr on a custom dataset?

I've read some questions on Stack Overflow about attention-ocr, and most of them are about the implementation details of a specific step. What I want to know is the pipeline for fine-tuning this model on our own dataset.
As far as I know, the steps should be:
0) Should we first download the FSNS dataset? I tried to bypass this step and run inference on just one image, but it always gives me the error "ImportError: No module named 'fsns'". So I wonder if this error will go away once I set up my own dataset.
1) Store our data in the same format as FSNS (a sketch of one such record follows below). (Links on this topic: How to create dataset in the same format as the FSNS dataset?, how to create cutomized dataset for google tensorflow attention ocr?)
2) Download the pre-trained checkpoint (http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz).
3) Somehow modify 'model.py' to fit your own purpose.
4) Somehow modify 'train.py' to train your own model using TensorFlow Serving.
I am still at the early stage of this project (creating my own dataset) and confused about how to do it and what the next stage is.
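For step 1, here is a rough sketch of building one FSNS-style tf.train.Example; the feature keys follow the ones described in the linked questions, so verify them against datasets/fsns.py before relying on this:

import tensorflow as tf

def make_fsns_example(png_bytes, text, char_ids_padded, char_ids_unpadded,
                      width, orig_width):
    # One record in the FSNS tfrecord layout used by attention_ocr
    def bytes_feature(values):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    def int64_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': bytes_feature([png_bytes]),
        'image/format': bytes_feature([b'PNG']),
        'image/width': int64_feature([width]),
        'image/orig_width': int64_feature([orig_width]),
        'image/class': int64_feature(char_ids_padded),        # ids padded to max length
        'image/unpadded_class': int64_feature(char_ids_unpadded),
        'image/text': bytes_feature([text.encode('utf-8')]),
    }))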
The error was caused by an incorrect version of Python. The scripts should be run with Python 2, and you can change the import statement to solve this error: change 'import fsns' to 'from datasets import fsns'.
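That is, in the failing script, a minimal before/after:

# before: fails because fsns is not on the module path
import fsns
# after: import it from the datasets package instead
from datasets import fsns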

TensorBoard doesn't show all data points

I was running a very long training (reinforcement learning with 20M steps) and writing a summary every 10k steps. Between steps 4M and 6M, I saw 2 peaks in my TensorBoard scalar chart for game score; then I let it run and went to sleep. In the morning, it was running at about step 12M, but the peaks between steps 4M and 6M that I saw earlier had disappeared from the chart. I tried to zoom in and found out that TensorBoard (randomly?) skipped some of the data points. I also tried to export the data, but some data points, including the peaks, are also missing in the exported .csv.
I looked for answers and found this in TensorFlow github page:
TensorBoard uses reservoir sampling to downsample your data so that it can be loaded into RAM. You can modify the number of elements it will keep per tag in tensorboard/backend/server.py.
Has anyone ever modified this server.py file? Where can I find the file, and if I installed TensorFlow from source, do I have to recompile it after modifying the file?
You don't have to change the source code for this; there is a flag called --samples_per_plugin.
Quoting from the help command:
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly
specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard
randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long
running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all
samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most
users should not need to set this flag.
(default: '')
So if you want to have a slider of 100 images, use:
tensorboard --samples_per_plugin images=100
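Similarly, since 0 means "keep all samples of that type" (see the help text above), you can keep every scalar data point with (the log directory is a placeholder):

tensorboard --logdir your_log_dir --samples_per_plugin scalars=0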
The comment is out of date - it can actually be modified in tensorboard/backend/application.py, in the "Default Size Guidance". By default, it stores 1000 scalars. You can increase that limit arbitrarily, or set it to 0 to store every scalar.
You don't need to recompile TensorBoard, or even download it from source. You can just modify this file in your TensorBoard installation yourself.
If you installed TensorFlow using pip in a virtualenv (Ubuntu, Mac), then within your virtualenv directory the path to application.py should be something like lib/python2.7/site-packages/tensorflow/tensorboard/backend. If you modify that file, you should get the new setting in your TensorBoard (when you run tensorboard in that virtualenv). If you're like me, you'll put in a print statement too, so you can be sure that you're running modified code :)

Training custom dataset with translate model

Running the model out of the box generates these files in the data dir:
ls
dev-v2.tgz newstest2013.en
giga-fren.release2.fixed.en newstest2013.en.ids40000
giga-fren.release2.fixed.en.gz newstest2013.fr
giga-fren.release2.fixed.en.ids40000 newstest2013.fr.ids40000
giga-fren.release2.fixed.fr training-giga-fren.tar
giga-fren.release2.fixed.fr.gz vocab40000.from
giga-fren.release2.fixed.fr.ids40000 vocab40000.to
Reading the src of translate.py:
https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/translate.py
tf.app.flags.DEFINE_string("from_train_data", None, "Training data.")
tf.app.flags.DEFINE_string("to_train_data", None, "Training data.")
To utilize my own training data, I created the dirs my-from-train-data & my-to-train-data and added my own training data to each of these dirs; the training data is contained in the files mydata.from & mydata.to
my-to-train-data contains mydata.from
my-from-train-data contains mydata.to
I could not find documentation on using your own training data or what format it should take, so I inferred this from the translate.py source and the contents of the data dir created when executing the translate model out of the box.
Contents of mydata.from:
Is this a question
Contents of mydata.to:
Yes!
I then attempt to train the model using:
python translate.py --from_train_data my-from-train-data --to_train_data my-to-train-data
This returns with an error :
tensorflow.python.framework.errors_impl.NotFoundError: my-from-train-data.ids40000
It appears I need to create the file my-from-train-data.ids40000; what should its contents be? Is there an example of how to train this model using custom data?
Great question! Training a model on your own data is way more fun than using the standard data. An example of what you could put in the terminal is:
python translate.py --from_train_data mydatadir/to_translate.in --to_train_data mydatadir/to_translate.out --from_dev_data mydatadir/test_to_translate.in --to_dev_data mydatadir/test_to_translate.out --train_dir train_dir_model --data_dir mydatadir
What goes wrong in your example is that you are not pointing to a file, but to a folder. from_train_data should always point to a plaintext file, whose rows should be aligned with those in the to_train_data file.
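For instance, with the data from the question, the two plaintext files would line up row by row:

mydatadir/to_translate.in:   Is this a question
mydatadir/to_translate.out:  Yes!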
Also: as soon as you run this script with sensible data (more than one line ;) ), translate.py will generate your ids files (40,000 entries if from_vocab_size and to_vocab_size are not set). It is important to know that these files are created in the folder specified by data_dir... if you do not specify one, they are generated in /tmp (I prefer them in the same place as my data).
Hope this helps!
A quick answer to:
It appears I need to create the file my-from-train-data.ids40000; what should its contents be? Is there an example of how to train this model using custom data?
Yes, that's the missing vocab/word-id file, which is generated when preparing the data.
Here is a tutorial from the TensorFlow documentation.
A quick overview of the files, and why you might be confused by the files outputted vs. what to use:
python/ops/seq2seq.py: >> Library for building sequence-to-sequence models.
models/rnn/translate/seq2seq_model.py: >> Neural translation sequence-to-sequence model.
models/rnn/translate/data_utils.py: >> Helper functions for preparing translation data.
models/rnn/translate/translate.py: >> Binary that trains and runs the translation model.
The TensorFlow translate.py file requires several files to be generated when using your own corpus to translate. The data needs to be aligned, meaning: language line 1 in file 1 <> language line 1 in file 2. This allows the model to do encoding and decoding.
You want to make sure the vocabulary has been generated from the dataset (data_utils.py above takes care of this).
Check these steps:
python translate.py \
  --data_dir [your_data_directory] --train_dir [checkpoints_directory] \
  --en_vocab_size=40000 --fr_vocab_size=40000
Note: if your vocabulary is smaller, change those values accordingly.
There is a longer discussion here tensorflow/issues/600
If all else fails, check out this ByteNet implementation in TensorFlow, which handles the translation task as well.