Tensorflow : Is it possible to identify the data is used for training? - tensorflow

I have created text classification model(.pb) using tensorflow. Prediction is good.
Is it possible to check the sentence using for prediction is already used to train the model or not. I need to retrain the model when new sentence is given to model to predict.

I did some research and couldn't find a way to get the train data only with the pb file because that file only stores the features and not the actual train data(obviously),but if you have the dataset,then you can easily verify duh....
I don't think you can ever find the exact train data with only the trained model,cause the model only contains the features and not the actual train data

Related

Training with spacy on full dataset

When I train my spacy model as follows
spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
the model gets trained on train.spacy data file, and scored on dev.spacy. Then output_updated/model-best is the model with the highest score.
Is this best model finally trained on a combination of both train and dev data? I understand, it makes sense to split those datasets to avoid overfitting, but given little training data, I would like the final model to be trained on all data I have at hand.
No, spaCy does not automatically merge your datasets before training model-best. If you want to do that you would need to manually create a new training data set.
If you have so little data that seems like a good idea, you should probably prioritize getting more data.

Training trained seq2seq model on additional training data

I have trained a seq2seq model with 1M samples and saved the latest checkpoint. Now, I have some additional training data of 50K sentence pairs which has not been seen in previous training data. How can I adapt the current model to this new data without starting the training from scratch?
You do not have to re-run the whole network initialization. You may run an incremental training.
Training from pre-trained parameters
Another use case it to use a base model and train it further with new training options (in particular the optimization method and the learning rate). Using -train_from without -continue will start a new training with parameters initialized from a pre-trained model.
Remember to tokenize your 50K corpus the same way you tokenized the previous one.
Also, you do not have to use the same vocabulary beginning with OpenNMT 0.9. See the Updating the vocabularies section and use the appropriate value with -update_vocab option.

How to get access to specific layer using tensorflow estimator and dataset API?

I am using tensorflow 1.3.0 to train a CNN classification model. However I need to get access to the prelogits layer to evaluate my method (i.e. while this is casted as a classification problem, the method is not a classification problem but is used to extract CNN features, i.e. to produce a point in an N-dimensional vector space for an input image test)
I am using both the dataset API (with TFRecord files) and the estimator API to train the model. However, I don't see how I can get access/return the prelogits value using the Estimator API, i.e. estimator.train(), .evaluate() or .predict() since model_fn() needs to return a specific tf.estimator.EstimatorSpec object.
Previously (i.e. using the standard sess=tf.Session() method) I could train the model and get access to the prelogits layer while training (or by loading the model after training) and feed the network with a specific input to get the specific layer output with a sess.run(specific_layer) as long as the layer was named specific_layer.
I have tried to use the prediction output of EstimatorSpec but it did not work. Any ideas/suggestions?

Tensorflow embeddings

I know what embeddings are and how they are trained. Precisely, while referring to the tensorflow's documentation, I came across two different articles. I wish to know what exactly is the difference between them.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they have explicitly trained embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later on save the learnt embeddings as a numpy object and use the
tf.nn.embedding_lookup() function while training an LSTM network.
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable(“word_embeddings”,
[vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training embeddings sections. My doubt is: does the gather function train the embeddings automatically? I am not sure since this op ran very fast on my pc.
Generally: What is the right way to convert words into vectors (link1 or link2) in tensorflow for training a seq2seq model? Also, how to train the embeddings for a seq2seq dataset, since the data is in the form of separate sequences for my task unlike (a continuous sequence of words refer: link 1 dataset)
Alright! anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through the process of exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or Any other encoder-decoder models, we use the second approach where the embedding matrix gets tuned appropriately while the model gets trained.

How to use self trained model in Tensorflow for image classification

I used the following documentation to train my own model to classify flowers as described there:
https://github.com/tensorflow/models/tree/master/inception#how-to-train-from-scratch
bazel-bin/inception/flowers_train --batch_size=32 --train_dir=/tmp/flowers_train --data_dir=/tmp/flowers_data
I specified --max_steps=30 only to see if I can use the model as expected for classification afterwards.
After these training steps I get the following files:
model.ckpt-29.data-00000-of-00001
model.ckpt-29.index
model.ckpt-29.meta
Unfortunately I actually don't know how to use these three files for image classification. Is there any example showing the necessary steps?
There's a section on how to evaluate (https://github.com/tensorflow/models/tree/master/inception#how-to-evaluate). It will use the saved model (those three files) to classify images and test it against the ground truth labels. You can dig into the code (models/inception/inception/inception_eval.py) to see how it loads and does the raw inference.