I'm studying machine learning and deep learning. I'm trying to customize this model from the Keras website https://keras.io/examples/generative/wgan_gp/
My model takes 3 512x512 images in each training iteration, which are then divided into patches used to train the generator and discriminator. I have 10 different directories; the 3 images must be consecutive and come from the same directory. The directory can be chosen randomly in each iteration, and the 3 images must be taken from it.
In summary, for each training iteration, the algorithm must select a random directory, take 3 consecutive images and divide them into patches to train the two networks.
How can I customize the way I iterate over the dataset in fit() to achieve this?
Posting the solution from Shubham Panchal's comment here as an answer, for the benefit of the community.
You can do this using TensorFlow. See this tutorial on DCGAN. With the TensorFlow API, you can create a custom training loop for any existing Keras model. You may implement the custom training loop from the tutorial above and use it with the WGAN model you have.
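For the directory / consecutive-image selection itself, here is a minimal sketch of a sampling function that such a custom training loop could call once per iteration; the directory layout, the patch size, and the function name are assumptions for illustration, not part of the linked example:

```python
import os
import random
import tensorflow as tf

def sample_consecutive_patches(root_dir, patch_size=128, num_images=3):
    """Pick a random sub-directory, load `num_images` consecutive 512x512 images,
    and split each of them into non-overlapping square patches."""
    sub_dir = os.path.join(root_dir, random.choice(os.listdir(root_dir)))
    files = sorted(os.listdir(sub_dir))                     # sorted -> consecutive order
    start = random.randint(0, len(files) - num_images)      # random starting frame

    patches = []
    for fname in files[start:start + num_images]:
        img = tf.io.decode_image(tf.io.read_file(os.path.join(sub_dir, fname)), channels=3)
        img = tf.image.convert_image_dtype(img, tf.float32)
        tiles = tf.image.extract_patches(
            images=img[tf.newaxis, ...],
            sizes=[1, patch_size, patch_size, 1],
            strides=[1, patch_size, patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID")
        patches.append(tf.reshape(tiles, [-1, patch_size, patch_size, 3]))
    return tf.concat(patches, axis=0)   # (num_patches, patch_size, patch_size, 3)
```

Inside an overridden train_step (or a hand-written loop, as in the DCGAN tutorial), you would call this once per iteration and feed the returned batch of patches to the discriminator and generator updates.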
I am training a yolov4 (fully convolutional) in tensorflow 2.3.0.
I would like to change the spatial input shape of the network during training, to further adjust the weights to different scales.
Is this possible?
EDIT:
I know darknet exists, but it lacks some very specific augmentations that I use and have implemented in my repo; that is why I ask explicitly for tensorflow.
To be more precise about what I want to do:
I want to train for several batches at Y1xX1xC, then change the input size to Y2xX2xC and train again for several batches, and so on.
It is not possible. In the past people trained several networks for different scales but the current state-of-the-art approach is feature pyramids.
https://arxiv.org/pdf/1612.03144.pdf
Another great candidate is dilated convolution, which can learn long-distance dependencies among pixels at varying distances. You can concatenate the outputs of several dilation rates, and the model will then learn which distance is important for which case:
https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5
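As an illustration of the dilated-convolution idea, a minimal Keras sketch could look like the following; the filter counts and dilation rates are arbitrary choices, not values prescribed by the linked article:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(None, None, 3))    # fully convolutional: any spatial size

# Parallel branches with increasing dilation rates cover dependencies
# between pixels at different distances.
branches = [
    layers.Conv2D(32, 3, padding="same", dilation_rate=r, activation="relu")(inputs)
    for r in (1, 2, 4, 8)
]

# Concatenate the branches; later layers can learn which receptive field
# (i.e. which pixel distance) matters for which case.
x = layers.Concatenate()(branches)
outputs = layers.Conv2D(32, 1, activation="relu")(x)

model = tf.keras.Model(inputs, outputs)
```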
It's important to mention which TensorFlow repository you're using. You can definitely achieve this. The idea is to keep a fixed spatial input dimension within a single batch.
But an even better approach is to use the darknet repository from AlexeyAB: https://github.com/AlexeyAB/darknet
Just set random=1 in https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg [line 1149]. It will train your network with randomly varying spatial dimensions.
One thing you can do is start your training with the AlexeyAB repo with random=1 set, then bring the trained weights file into tensorflow for fine-tuning.
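In plain TensorFlow/Keras, the "fixed size within a batch, varying size between batches" idea can be sketched roughly as below, assuming the detector is fully convolutional and was built with input_shape=(None, None, 3); build_yolo_model, train_step, the scale list, and train_dataset are placeholders for your own code, not part of the AlexeyAB config:

```python
import random
import tensorflow as tf

# A fully convolutional model accepts any spatial input size.
model = build_yolo_model(input_shape=(None, None, 3))      # hypothetical builder

scales = [(320, 320), (416, 416), (512, 512), (608, 608)]

for step, (images, targets) in enumerate(train_dataset):   # images: one fixed-size batch
    if step % 10 == 0:                                      # pick a new scale every 10 batches
        h, w = random.choice(scales)
    # Every image in this batch shares the same spatial size; box targets are assumed
    # to be in normalized coordinates, so they stay valid after resizing.
    batch = tf.image.resize(images, (h, w))
    loss = train_step(model, batch, targets)                # your existing custom train step
```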
I am using TF-Agents for a custom reinforcement learning problem, where I train a DQN (constructed using DqnAgent from the TF-Agents framework) on some features from my custom environment, and separately use a keras convolutional model to extract these features from images. Now I want to combine these two models into a single model and use transfer learning, where I want to initialize the weights of the first part of the network (images-to-features) as well as the second part, which would have been the DQN layers in the previous case.
I am trying to build this combined model using keras.layers and wrapping it with the TF-Agents tf_agents.networks.sequential class to bring it to the form required when passing it to the DqnAgent() class. (Let's call this statement (a).)
I am able to initialize the image feature extractor network's layers with the weights, since I saved that model as a .h5 file and can obtain numpy arrays of its weights. So I am able to do the transfer learning for this part.
The problem is with the DQN layers. I saved the policy from the previous setup using the prescribed TensorFlow SavedModel format (pb), which gives me a folder containing model attributes. However, I am unable to view/extract the weights of my DQN this way, and the recommended tf.saved_model.load('policy_directory') is not very transparent about what data I can see regarding the policy. If I am to follow the transfer learning approach in statement (a), I need to extract the weights of my DQN and assign them to the new network. The documentation seems quite sparse for this case where transfer learning needs to be applied.
Can anyone help me in this, by explaining how I can extract weights from the Saved Model method (from the pb file)? Or is there a better way to go about this problem?
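One possible way to peek at the stored weights, assuming the policy was exported with the usual SavedModel layout (where the weights live in a checkpoint under <policy_directory>/variables/), is to read that checkpoint directly; the path below is hypothetical:

```python
import tensorflow as tf

# A SavedModel directory stores its weights as a checkpoint under variables/.
ckpt_prefix = "policy_directory/variables/variables"   # hypothetical path

# List the variable names and shapes stored for the saved policy.
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)

# Read the values as numpy arrays, e.g. to copy them into the new combined network.
reader = tf.train.load_checkpoint(ckpt_prefix)
dqn_weights = {name: reader.get_tensor(name)
               for name, _ in tf.train.list_variables(ckpt_prefix)}
```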
First of all, I want to state that I am familiar with the benefits of transfer learning. Moreover, I am able to train a pretrained model from the 'model zoo' on my dataset. But for research purposes I want to train my model from scratch, without transfer learning.
I want to adapt the Faster-RCNN Resnet 101 implementation from tensorflow's Object Detection API to my dataset. If I use one of the pretrained models, the training goes as expected and the loss is always in a 'normal' range (never above about 6). But if I do not use transfer learning, the loss jumps very frequently to extremely high values (about 80,000,000), although between those spikes the loss is in the normal range. In addition, I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing I change is to comment out these two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: changed the optimizer, changed the learning rate, used gradient clipping, changed the initializer, and used different machines to train on, but nothing helps. Moreover, I inspected my label_map as well as my record file. To ensure that those files are correct, I redid the steps mentioned above using the pascal voc dataset, the record-creation script, and the label map from the API, but even with this code from the Object Detection API, without any code changes, the loss explodes (Tensorflow Object Detection API own inputs).
I'm using Google Colab for training my models.
But speed is still low.
So is there a way I can train from two different accounts and combine the training later?
No, you cannot train the same model on colab using 2 accounts. Google Colab is for research purposes only, not for training large-scale production models. Colab also disconnects the kernel every 12 hours.
You can instead train the model using multiple GPUs on a single computer. Keras supports multi-GPU training when using tensorflow as the backend. But training on two different computers/VMs is not possible. How would gradients flow during backpropagation?
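For the single-machine multi-GPU route, one current way in tf.keras is tf.distribute.MirroredStrategy; a minimal sketch (the toy model and train_dataset below are placeholders, not from the question):

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and
# averages the gradients across the replicas.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(train_dataset, epochs=10)   # train_dataset: your tf.data pipeline (assumed)
```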
As for training across two machines, there is a workaround, though not an end-to-end approach. You can split your model into two different models, where the output of the first model becomes the input to the second, and the second produces the final output. For this you need a different training set for each model.
Take this example.
Suppose you are building a face recogniser where the model takes in a raw camera picture and recognises the face as yes/no.
Instead of training this one big network, you could split it into two different nets, where the first net's task is to crop the face and remove other useless parts of the image, and the second's is to recognise the face from the cropped image.
This is a non-end-to-end model: you can train the two models separately on different machines with different datasets and eventually merge them together. This is usually more powerful and easier to train.
Look up this question: Tensorflow Combining Two Models End to End
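As a rough Keras sketch of how the two separately trained nets could be merged afterwards (the file names and input shape are placeholder assumptions):

```python
from tensorflow import keras

# Two models trained separately, possibly on different machines.
crop_net = keras.models.load_model("crop_net.h5")     # raw image -> cropped face
recog_net = keras.models.load_model("recog_net.h5")   # cropped face -> yes/no score

# Chain them into a single inference pipeline.
inputs = keras.Input(shape=(512, 512, 3))
cropped = crop_net(inputs)
prediction = recog_net(cropped)

combined = keras.Model(inputs, prediction)
combined.summary()
```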
Another possibility is to ensemble the two trained models. You'd have to make sure, however, that the data for both models come from the same distribution.
I've trained a seq2seq model for machine translation (DE-EN) and have saved the trained model checkpoint. Now I'd like to fine-tune this checkpoint on some domain-specific data samples which were not seen in the previous training phase. Is there a way to achieve this in tensorflow? Like modifying the embedding matrix somehow.
I couldn't find any relevant papers or works addressing this issue.
Also, I'm aware that the vocabulary files need to be updated according to the new sentence pairs. But then, do we have to start training from scratch again? Isn't there an easy way to dynamically update the vocabulary files and the embedding matrix according to the new samples and continue training from the latest checkpoint?
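As a rough sketch of the "modify the embedding matrix" idea: keep the learned rows for the old vocabulary, initialize only the rows for the newly added tokens, and then continue training from the checkpoint. The handles old_model, new_model, num_new_tokens, and the layer name "embedding" below are assumptions, not an established TensorFlow workflow:

```python
import numpy as np

# old_model: model restored from the previous checkpoint (assumed)
# new_model: same architecture rebuilt with the enlarged vocabulary (assumed)
old_emb = old_model.get_layer("embedding").get_weights()[0]   # shape: (old_vocab, dim)
old_vocab, dim = old_emb.shape
new_vocab = old_vocab + num_new_tokens                        # new tokens appended to the vocab file

# Keep the learned embeddings, randomly initialize only the new rows.
new_emb = np.random.normal(0.0, 0.02, size=(new_vocab, dim)).astype("float32")
new_emb[:old_vocab] = old_emb

new_model.get_layer("embedding").set_weights([new_emb])
# Copy the remaining (vocabulary-independent) weights layer by layer; the decoder's
# output projection needs the same enlarge-and-copy treatment. Then continue
# training / fine-tune on the domain-specific sentence pairs.
```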