finetuning tensorflow seq2seq model - tensorflow

I've trained a seq2seq model for machine translation (DE-EN). And I have saved the trained model checkpoint. Now, I'd like to fine-tune this model checkpoint to some specific domain data samples which have not been seen in previous training phase. Is there a way to achieve this in tensorflow? Like modifying the embedding matrix somehow.
I couldn't find any relevant papers or works addressing this issue.
Also, I'm aware of the fact that the vocabulary files needs to be updated according to new sentence pairs. But, then do we have to again start training from scratch? Isn't there an easy way to dynamically update the vocabulary files and embedding matrix according to the new samples and continue training from the latest checkpoint?

Related

How to extract weights of DQN agent in TF-Agents framework?

I am using TF-Agents for a custom reinforcement learning problem, where I train a DQN (constructed using DqnAgents from the TF-Agents framework) on some features from my custom environment, and separately use a keras convolutional model to extract these features from images. Now I want to combine these two models into a single model and use transfer learning, where I want to initialize the weights of the first part of the network (images-to-features) as well as the second part which would have been the DQN layers in the previous case.
I am trying to build this combined model using keras.layers and compiling it with the Tf-Agents tf.networks.sequential class to bring it to the necessary form required when passing it to the DqnAgent() class. (Let's call this statement (a)).
I am able to initialize the image feature extractor network's layers with the weights since I saved it as a .h5 file and am able to obtain numpy arrays of the same. So I am able to do the transfer learning for this part.
The problem is with the DQN layers, where I saved the policy from the previous example using the prescribed Tensorflow Saved Model Format (pb) which gives me a folder containing model attributes. However, I am unable to view/extract the weights of my DQN in this way, and the recommended tf.saved_model.load('policy_directory') is not really transparent with respect to what data I can see regarding the policy. If I have to follow the transfer learning as I do in statement (a), I need to extract the weights of my DQN and assign them to the new network. The documentation seems to be quite sparse for this case where transfer learning needs to be applied.
Can anyone help me in this, by explaining how I can extract weights from the Saved Model method (from the pb file)? Or is there a better way to go about this problem?

How can i extract Trained model information from its weight files?

I had got pre-trained model weight files of a model for audio classification. How can i extract model information from that weights given (For examples no of layers used to build architecture)? This question asked in the interview. is if possible to extract model information from its weights.
Check these two threads
How to read keras model weights without a model
https://github.com/keras-team/keras/issues/91

Why does the loss explode during training from scratch? - Tensorflow Object Detection Models

First of all I want to state out that I am familiar with the benefits of transfer learning. Moreover I am able to train a pretrained model from 'modelzoo' on my dataset. But for research purposes I want to train my model from scratch without transferlearning.
I want to adopt the Faster-RCNN Resnet 101 implementation from tensorsflow's Object Detection API to my dataset. If I use one of the pretrained models the training goes as expected and the loss is always in 'normal' ranges (never above about 6). But if I do not use transferlearning the loss jumps very frequently to extrem high values (about 80,000,000), but between those values the loss is in normal ranges. In addition to this I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing which I change is to comment out those two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: Changed optimizer, changed the learning rate, used gradient clipping, changed the initializer used different machines to train on but nothing helps. Moreover I inspected my label_map as well as my record file. To ensure that those files are correct I redid the steps mentioned above by using the pascal voc dataset, the script to create records and the label map from the api, but even with this code from the Object Detection API without any code changes, the loss explodes (Tensorflow Object Detection API own inputs).

Where can I find the pretrained models of fasterRCNN / R-FCN with Mobilenet Feature extractor trained on COCO datset?

I want train a custom dataset on FasterRCNN with Mobilenetv1 or v2. I want to use the pre-trained models in tensorflow zoo. But I cant find faster Rcnn model with mobilenet as base extractor. Where can I get it?
I have already tensorflow zoo in github. I have previous used SSD+Mobilenet config for the same. Now I want to compare the results with FasterRCNN and RCNN with Mobilenet.
The official repo has not released Faster RCNN with mobilenet models yet. But if you want you can still use some other models with mobilenet trained on COCO, the process is a bit complicated.
There are two important steps to proceed.
First one is to have corresponding feature extractor class. For Faster RCNN, the models directory already contains faster_rcnn_mobilenet feature extractor implementation so this step is OK. But for R-FCN, you will have to implement the feature extractor class yourself.
Second one is to change tensor names available in the checkpoint. For example, if you use ssd_mobilenet_v1_xxx as checkpoint, then all tensors within mobilenet scope are named as FeatureExtractor/MobilenetV1/XXX while if in the faster_rcnn_mobilenet_v1 model, the tensor names within mobilenet scope are FirstStageFeatureExtractor/MobilenetV1/XXX (and SecondStageFeatureExtractor/MobilenetV1/XXX). So essentially you need to remove FirstStage (as well as SecondStage) in the names of all feature extractor tensors, then these tensors will have exactly the same name as in the checkpoint, and will be correctly restored. If you do this, the function you need to modify is
def restore_map(self,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
in file faster_rcnn_meta_arch.py.

NER Incremental training with Spacy

I would like to incrementally train a NER Spacy Model.
By incrementally I mean send a first batch of N training samples, get a first model, then send a second batch of M training samples and get a model identical as if the N+M samples would have been sent in one batch and the model trained.
To be clear, this is not about adding samples after the model has been fully trained. Instead it is the ability to save intermediate states in the model so we can "resume" and add more training samples.
This is very useful if the number of samples is large or to create an "active learning" systems.
It seems doable with NLTK according to this article : and I was wondering if this can be done with Spacy.
So far I have trained my own custom NER model with Spacy using nlp.update but it does not seem to store any intermediate state that supports incremental training.
Yes, this is possible in spaCy. Your approach with nlp.update is correct; once you have added your second batch of training samples, you just need to make a call to nlp.to_disk("/path") (https://spacy.io/usage/saving-loading). Then you can continue this process by loading your saved model again.