How to train on new data continuously in TensorFlow

I use TF-slim to train on the flowers dataset; the script is this. The flowers dataset has only 5 classes. If I add some new image data to the roses class, or add a new class entirely, what should I do after having trained for 1000 steps? Do I need to delete the already trained data, such as the checkpoint files?

There exists a similar question on Data Science Stack Exchange, with an answer that considers your scenario:
Once a model is trained and you get new data which can be used for training, you can load the previous model and train onto it. For example, you can save your model as a .pickle file and load it and train further onto it when new data is available. Do note that for the model to predict correctly, the new training data should have a similar distribution as the past data.
I do the same in my own project, where I started with a small dataset that grew bigger over time. After adding new data, I retrain the model from the last checkpoint.
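A minimal sketch of that workflow in TF 2.x Keras (the model, paths, and dataset name are illustrative, not the TF-slim script from the question): restore the latest checkpoint if one exists, then continue fitting on the enlarged dataset.

    import tensorflow as tf

    # Illustrative 5-class classifier standing in for the flowers model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(5),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

    # Restoring from ./train_dir is a no-op on the very first run.
    ckpt = tf.train.Checkpoint(model=model, optimizer=model.optimizer)
    manager = tf.train.CheckpointManager(ckpt, "./train_dir", max_to_keep=3)
    ckpt.restore(manager.latest_checkpoint)

    # new_ds would contain the old images plus the newly added ones:
    # model.fit(new_ds, epochs=1)
    # manager.save()

Note that adding a new class changes the shape of the final layer, so its weights cannot simply be restored; in that case only the earlier layers can be reused.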

Related

Training with spaCy on the full dataset

When I train my spaCy model as follows:

    spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy

the model gets trained on the train.spacy data file and scored on dev.spacy. Then output_updated/model-best is the model with the highest score.
Is this best model ultimately trained on a combination of both the train and dev data? I understand that it makes sense to split those datasets to avoid overfitting, but given how little training data I have, I would like the final model to be trained on all the data at hand.
No, spaCy does not automatically merge your datasets before training model-best. If you want to do that, you would need to manually create a new training dataset.
If you have so little data that this seems like a good idea, you should probably prioritize getting more data.
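For the record, a minimal sketch (spaCy v3; the filenames are taken from the question) of merging the two .spacy files into a single training file:

    from spacy.tokens import DocBin

    # Load both DocBin files and merge dev into train.
    merged = DocBin().from_disk("train.spacy")
    merged.merge(DocBin().from_disk("dev.spacy"))
    merged.to_disk("all.spacy")

Note that spacy train still expects a dev set for scoring, so you would need to hold something out for --paths.dev even in a final run.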

TensorFlow: Is it possible to identify the data used for training?

I have created a text classification model (.pb) using TensorFlow, and prediction works well.
Is it possible to check whether a sentence given for prediction was already used to train the model? I want to retrain the model when a genuinely new sentence is given to it to predict.
I did some research and couldn't find a way to recover the training data from the .pb file alone, because that file stores only the learned weights and not the actual training data. If you still have the dataset, though, you can easily verify this yourself.
I don't think you can ever recover the exact training data from only the trained model, since the model contains just the learned parameters and not the data itself.
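If the original corpus is still at hand, a simple exact-match lookup is enough; a minimal sketch (the corpus filename is hypothetical, one sentence per line):

    def was_in_training_data(sentence, corpus_path="train.txt"):
        # Return True if the sentence appears verbatim in the corpus.
        with open(corpus_path, encoding="utf-8") as f:
            seen = {line.strip() for line in f}
        return sentence.strip() in seen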

Can I pick validation and test data from the training data after doing data augmentation?

I am training a UNet for semantic segmentation, but I only have 200 labeled images. Given the small size of the dataset, it definitely needs some data augmentation.
I have a question about the test and validation sets.
I have a custom data generator which keeps feeding data from a folder while training the model.
So what I plan to do is:

- do data augmentation for the training set and keep all of it in the same folder
- "randomly" pick some of the training data for the test and validation sets (of course, before training)

I am not sure if this is fine, since we only apply some simple processing (flipping, transposing, adjusting brightness).
Would it be better to separate the data first and do the augmentation only on the data remaining in the training folder?
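For comparison, a minimal sketch (directory name hypothetical) of that split-first alternative: hold out the validation and test images before augmenting, so no augmented variant of a held-out image leaks into training.

    import os
    import random

    files = sorted(os.listdir("labeled_images"))  # the 200 labeled images
    random.seed(0)
    random.shuffle(files)

    test, val, train = files[:20], files[20:40], files[40:]
    # Augment only the training split (flips, transposes, brightness, ...)
    # and leave val/test untouched so they measure performance on real data.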

Train a dataset progressively using TensorFlow

Can we train an image dataset progressively? My previous training dataset was created from 500 images, but now I want to add more images to it.
Should we retrain on the old dataset together with the added images?
In TensorFlow there are checkpoints for this. You import the already-learned weights of an existing model and continue training on new (or existing) data. You can simply add the new images to your dataset. For reproducibility of the training procedure it is useful to create a new record file; of course, you then have to point the training run at the new record file.
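A minimal sketch (TF 2.x, eager mode; the paths and labels are hypothetical) of writing the enlarged image set to a new record file:

    import tensorflow as tf

    def make_example(image_path, label):
        # Serialize one image/label pair as a tf.train.Example.
        image_bytes = tf.io.read_file(image_path).numpy()
        return tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image_bytes])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))

    with tf.io.TFRecordWriter("images_v2.tfrecord") as writer:
        for path, label in [("new_image_001.jpg", 3)]:  # hypothetical list
            writer.write(make_example(path, label).SerializeToString())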

Training a trained seq2seq model on additional training data

I have trained a seq2seq model with 1M samples and saved the latest checkpoint. Now I have an additional 50K sentence pairs which were not seen in the previous training data. How can I adapt the current model to this new data without starting the training from scratch?
You do not have to redo the whole network initialization; you can run incremental training.
Training from pre-trained parameters
Another use case is to use a base model and train it further with new training options (in particular the optimization method and the learning rate). Using -train_from without -continue will start a new training run with parameters initialized from a pre-trained model.
Remember to tokenize your 50K corpus the same way you tokenized the previous one.
Also, beginning with OpenNMT 0.9 you do not have to use the same vocabulary. See the "Updating the vocabularies" section and use the appropriate value for the -update_vocab option.
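As a hedged sketch (OpenNMT Lua; the file names are hypothetical), the incremental run could look like this, with -update_vocab set per the docs (e.g. merge) if the 50K corpus introduces new words:

    th train.lua -data data_50k-train.t7 -save_model model_adapted \
        -train_from base_model_epoch13.t7 -update_vocab merge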