Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 11 months ago.
I am doing transfer learning with Google AudioSet embeddings. According to the documentation,
the embedding layer does not include a final non-linear activation, so
the embedding value is pre-activation.
I want to train and test a new model on top of this embedding layer, using the embedding data. I plan to do the following:
Create new dense layers.
Convert the embeddings from byte strings to tensors, then split them into train, validation and test datasets.
Feed these tensors to the new model.
Validate and test the model using the validation and test datasets.
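The conversion and split steps above can be sketched roughly as follows. This is a minimal NumPy sketch, not the official AudioSet loading code: it assumes each embedding arrives as a 128-byte string of uint8 values, and `bytes_to_embedding`, `split_indices` and the random stand-in records are hypothetical names and data introduced only for illustration. The scaling to [0, 1] is likewise just one possible choice.

```python
import numpy as np

EMBEDDING_DIM = 128  # assumption: one 128-byte (uint8) vector per audio frame


def bytes_to_embedding(raw: bytes) -> np.ndarray:
    """Decode one byte string into a float32 vector scaled to [0, 1]."""
    vec = np.frombuffer(raw, dtype=np.uint8)
    if vec.size != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} bytes, got {vec.size}")
    return vec.astype(np.float32) / 255.0


def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index arrays."""
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


# Stand-in data: 100 random byte strings instead of real AudioSet records.
rng = np.random.default_rng(0)
raw_records = [
    rng.integers(0, 256, EMBEDDING_DIM, dtype=np.uint8).tobytes()
    for _ in range(100)
]

X = np.stack([bytes_to_embedding(r) for r in raw_records])
train_idx, val_idx, test_idx = split_indices(len(X))
X_train, X_val, X_test = X[train_idx], X[val_idx], X[test_idx]
```

The resulting arrays can then be fed to the new dense layers; the pre-trained network itself is never touched, since its output is already stored in the embeddings.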
There are two points I am unsure about in this implementation:
Is using the embeddings as input to the new layers enough for transfer learning? In some transfer learning implementations I have seen, the pre-trained weights are loaded into the new model and the layers holding those weights are frozen. But those implementations train on new data, not on embeddings from the pre-trained model. I am confused about how that works.
Is it okay to split the embeddings into train, validation and test datasets? I am not sure whether all the embeddings were used to train the pre-trained model. If they all were, does it make sense to use part of them as validation and test datasets?
Is using the embeddings as input of the new layers enough for the transfer learning?
This should work as expected. Of course, you should consider that your generalization capability might be lower than expected for unseen data points (when comparing with data points seen during training of the pre-trained model). Usually, when using a pre-trained model, every data point is unseen for the original network, but in your case some of the data points might have been used for training, so their performance might be "unrealistically too high" when compared with data that your pre-trained model has never seen.
Is it okay to split the embeddings into train, validation and test datasets?
This is a good approach to solve the problem from the previous point. If you don't know which data points were used for training, you could benefit from using cross-validation and creating multiple splits to reduce the impact of this issue.
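As a rough illustration of the cross-validation idea, here is a minimal k-fold index generator in NumPy. The function name `k_fold_indices` and the sample count are hypothetical; a library splitter would do the same job.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)  # folds differ in size by at most one
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# 103 is an arbitrary sample count chosen to show uneven fold sizes.
splits = list(k_fold_indices(n_samples=103, k=5))
```

Training one model per split and averaging the validation scores gives an estimate that depends less on which embeddings happened to land in a single validation set.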
As you may know, recent versions of TensorFlow/Keras allow data augmentation layers to be integrated into the model. This feature of the API is an excellent option, especially when you want to apply image augmentation to only part of the inputs (the image) of a model with multimodal inputs and different sub-networks for different inputs. With this augmentation, the test accuracy increased by 3-5% compared with no augmentation.
But I can't figure out how many training samples were used in the actual training with this augmentation method. For simplicity, let's assume I am passing a list of numpy arrays as the inputs of the model when fitting the model. For example, if I have 1000 training cases for a model with the augmentation layers, will 1000 training cases with transformed images be used in training? If not, how many?
I tried to search all related sites (tutorials and documentation) for an answer to this simple question in vain.
I think I found the answer. Based on the training log of the model, the augmentation layers do not produce additional images but randomly transform the original images. To increase the amount of generated data, the user has to provide multiple copies of the original training data as input to the model.
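The behaviour described here can be mimicked with a small NumPy stand-in for a random augmentation layer. This is only an illustration of the counting argument, not the Keras implementation; `random_horizontal_flip` and the toy image shapes are assumptions made for the example.

```python
import numpy as np


def random_horizontal_flip(images, rng, p=0.5):
    """Flip each image left-right with probability p; the sample count is unchanged."""
    flip = rng.random(len(images)) < p
    out = images.copy()
    out[flip] = out[flip, :, ::-1, :]  # reverse the width axis of the chosen images
    return out


rng = np.random.default_rng(0)
images = rng.random((1000, 8, 8, 3))  # 1000 toy images in NHWC layout

# Each epoch sees 1000 samples, but a different random variant of each one.
epoch_1 = random_horizontal_flip(images, rng)
epoch_2 = random_horizontal_flip(images, rng)

# To present more samples per epoch, the inputs themselves must be repeated.
doubled = np.concatenate([images, images])
```

So with 1000 training cases, every epoch still processes 1000 transformed images; the variety comes from the transform being re-drawn each time the data passes through.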
I am using Google's Dopamine framework to train a specific reinforcement learning use case. I am using an autoencoder to pre-train the convolutional layers of the Deep Q Network and then transfer those pre-trained weights into the final network.
To that end, I have created a separate model (in this case an autoencoder), which I train, saving the resulting model and weights.
The DQN model is created using Keras's model subclassing method, and the model used to save the trained convolutional layer weights was built using the Sequential API. My issue arises when loading the pre-trained weights into my final DQN model. Depending on whether I use the load_model() or load_weights() functionality from TensorFlow's API, I get two different overall behaviors of my network, and I would like to understand why. Specifically, I have the two following scenarios:
Loading the weights with the load_weights() method into the final model. The weights are the weights of the encoder plus one additional layer (added just before saving the weights) to fit the architecture of the final network implemented in Dopamine, where they are loaded.
First loading the saved model with load_model() and then, when defining the new model in the __init__() method, extracting the relevant layers from the loaded model and using them for the final model.
Overall, I would expect the two approaches to yield similar results with regard to the average reward achieved per episode when I use the same pre-trained weights. However, the two approaches differ (approach 1 yields a higher average reward than approach 2, although both use the same pre-trained weights) and I don't understand why.
Furthermore, to validate this behavior I tried loading random weights with the two aforementioned approaches to see whether the behavior changes. With each loading method, the behavior with random weights is very similar to the behavior I get with the trained weights under that same method. It seems that the pre-trained weights have no effect on the overall training behavior in either case. This might be irrelevant to the issue I am investigating here, though, as it is also possible that the pre-trained weights simply offer no benefit overall.
Any thoughts and ideas on this would be much appreciated.
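One check that might help narrow this down (a sketch, not a diagnosis): compare the arrays that each loading path actually leaves in the convolutional layers, for example the lists returned by Keras's `get_weights()`. The helper name `weights_match` and the stand-in arrays below are hypothetical.

```python
import numpy as np


def weights_match(weights_a, weights_b, atol=1e-6):
    """Return True if two lists of weight arrays agree in length, shapes and values."""
    if len(weights_a) != len(weights_b):
        return False
    return all(
        a.shape == b.shape and np.allclose(a, b, atol=atol)
        for a, b in zip(weights_a, weights_b)
    )


# Stand-ins for the weight lists obtained from each loading approach.
path_1 = [np.ones((3, 3, 1, 4)), np.zeros(4)]
path_2 = [w.copy() for w in path_1]
path_3 = [np.ones((3, 3, 1, 4)) * 1.5, np.zeros(4)]

same = weights_match(path_1, path_2)
different = weights_match(path_1, path_3)
```

If the two approaches yield matching weights right after loading, the difference in reward would have to come from somewhere else, such as which layers end up trainable.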
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I trained a doc2vec model in TensorFlow, so I now have embedded vectors for the words in the dictionary and vectors for the documents.
In the paper "Distributed Representations of Sentences and Documents" (Quoc Le, Tomas Mikolov), the authors write:
“the inference stage” to get paragraph vectors D for new paragraphs
(never seen before) by adding more columns in D and gradient
descending on D while holding W,U,b fixed.
I have a pretrained model, so W, U and b are available as graph variables. The question is how to implement inference of D (for a new document) efficiently in TensorFlow.
For most neural networks, the output of the network (class for classification problems, number for regression, ...) is the value you are interested in. In those cases, inference means running the frozen network on some new data (forward propagation) to compute the desired output.
For those cases, several strategies can be used to quickly deliver the desired output for multiple new data points: scaling horizontally, reducing the complexity of the calculation through quantisation of the weights, optimising the frozen graph computation (see https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/), ...
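To make the quantisation idea concrete, here is a minimal sketch of symmetric 8-bit weight quantisation in NumPy. It is a generic illustration, not what TensorRT or TensorFlow do internally; the function names and the random weight matrix are assumptions.

```python
import numpy as np


def quantize_int8(w):
    """Map float weights to int8 with one shared scale (assumes w is not all zeros)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 64)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
max_err = np.abs(dequantize(q, scale) - w).max()  # bounded by about scale / 2
```

The int8 copy takes a quarter of the memory of the float32 weights, at the cost of a rounding error of at most about half a quantisation step per weight.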
doc2vec (and word2vec) are a different use case, however: the neural net is used to compute an output (prediction of the next word), but the meaningful and useful data are the weights of the neural network after training. The inference stage is therefore different: you do not run the neural net to get its output as the vector representation of a new document; you need to train the part of the neural net that provides the vector representation of your document. The rest of the neural net (W, U, b) is then frozen.
How can you efficiently compute D (the document vector) in TensorFlow:
Run experiments to find the optimal learning rate (a smaller value might suit shorter documents better), since it determines how quickly the representation of a document converges.
As the other parts of the neural net are frozen, you can scale the inference across multiple processes / machines.
Identify the bottlenecks: what is currently slow? Model computation? Text retrieval from disk or from an external data source? Storage of the results?
Knowing more about your current issues and the context might help.
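The inference step itself (gradient descent on a new document vector while W, U and b stay fixed) can be sketched in plain NumPy. This is a simplified stand-in, not the paper's or the asker's exact model: it assumes a PV-DM style average of the document vector with the previous `window` word vectors, followed by a full softmax, and every name and shape below is hypothetical.

```python
import numpy as np


def infer_doc_vector(word_ids, W, U, b, window=2, lr=0.1, epochs=50, seed=0):
    """Fit a vector d for one new document; W, U and b are only read, never updated.

    W: (vocab, dim) word vectors, U: (dim, vocab) softmax weights, b: (vocab,) bias.
    Returns d and the mean loss per epoch.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(scale=0.01, size=W.shape[1])
    losses = []
    for _ in range(epochs):
        total = 0.0
        for t in range(window, len(word_ids)):
            context = word_ids[t - window:t]
            target = word_ids[t]
            n_inputs = len(context) + 1
            h = (d + W[context].sum(axis=0)) / n_inputs  # average of d and context words
            logits = h @ U + b
            logits -= logits.max()                       # numerical stability
            p = np.exp(logits) / np.exp(logits).sum()
            total += -np.log(p[target])
            p[target] -= 1.0                             # gradient of the loss w.r.t. logits
            d -= lr * (U @ p) / n_inputs                 # chain rule through the average
        losses.append(total / (len(word_ids) - window))
    return d, losses


# Random stand-ins for a pretrained model and a new 12-word document.
rng = np.random.default_rng(1)
vocab, dim = 50, 16
W = rng.normal(scale=0.5, size=(vocab, dim))
U = rng.normal(scale=0.5, size=(dim, vocab))
b = np.zeros(vocab)
doc = rng.integers(0, vocab, size=12)

W0, U0, b0 = W.copy(), U.copy(), b.copy()
d, losses = infer_doc_vector(doc, W, U, b)
```

Because each document only updates its own `d`, calls for different documents are independent of each other, which is what makes spreading inference over several processes or machines straightforward. In TensorFlow the same idea amounts to passing only the new D rows to the optimizer.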
Closed 2 years ago.
Thanks to Google for providing a few pre-trained models with the TensorFlow API.
I would like to know how to retrain a pre-trained model available from the above repository, by adding new classes to the model.
For example, the model trained on the COCO dataset has 90 classes. I would like to add 1 or 2 classes to the existing ones and get a 92-class object detection model as a result.
The repository's "Running Locally" guide covers training, but it completely replaces the pre-trained classes with the newly trained ones. Only train and eval are mentioned there.
So, is there any other way to retrain the model and get 92 classes as a result?
Question : How do we add a few more classes to my already trained network?
Specifically, we want to keep all the network as-is other than the output of the new classes. This means that for something like ResNet, we want to keep everything other than the last layer frozen, and somehow expand the last layer to have our new classes.
Answer : Combine the existing last layer with a new one you train
Specifically, we will replace the last layer with a fully connected layer that is large enough for your new classes and the old ones. Initialize it with random weights and then train it on your classes and just a few of the others. After training, copy the original weights of the original last fully connected layer into your new trained fully connected layer.
If, for example, the previous last layer was a 1024x90 matrix, and your new last layer is a 1024x92 matrix, copy the 1024x90 into the corresponding space in your new 1024x92. This will destructively replace all your training of the old classes with the pre-trained values but leave your training of your new classes. That is good, because you probably didn't train it with the same number of old classes. Do the same thing with the bias, if any.
Your final network will have only 1024x2 new weight values (plus any bias), corresponding to your new classes.
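The weight copy described above can be sketched with NumPy arrays standing in for the layer's kernel and bias. This sketch assumes the new classes occupy the last columns of the enlarged layer; `merge_last_layer` and the random arrays are hypothetical.

```python
import numpy as np


def merge_last_layer(old_kernel, old_bias, trained_kernel, trained_bias):
    """Overwrite the old-class columns of the newly trained layer with the pre-trained values."""
    n_old = old_kernel.shape[1]
    kernel = trained_kernel.copy()
    bias = trained_bias.copy()
    kernel[:, :n_old] = old_kernel  # restore the pre-trained columns for the old classes
    bias[:n_old] = old_bias         # same for the bias entries
    return kernel, bias


rng = np.random.default_rng(0)
n_features, n_old, n_new = 1024, 90, 2

old_kernel = rng.normal(size=(n_features, n_old))              # pre-trained 1024x90 layer
old_bias = rng.normal(size=n_old)
trained_kernel = rng.normal(size=(n_features, n_old + n_new))  # newly trained 1024x92 layer
trained_bias = rng.normal(size=n_old + n_new)

kernel, bias = merge_last_layer(old_kernel, old_bias, trained_kernel, trained_bias)
```

After the merge, only the last two columns (and the last two bias entries) come from the new training run.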
A word of caution: although this will train fast and provide quick results, it will not perform as well as retraining on a full and comprehensive data set.
That said, it'll still work well ;)
For how to replace the last layer, see the question "How to remove the last layer from trained model in Tensorflow", which someone else answered.
I am working with the TensorFlow Wide and Deep model. It currently trains on a binary classification target (>50K or not).
Can this model be coerced to train directly against numeric values to produce more precise (if less accurate) predictions?
I have seen an example of using LSTM RNNs to make such predictions using TensorFlowEstimator directly here, but DNNLinearCombinedClassifier will not accept n_classes=0.
I like the structure of the Wide and Deep model, especially the ability to run the linear regression and the DNN separately to determine how learnable the data is, but my application involves data that clusters, but in an overlapping, input-dependent fashion.
Use DNNLinearCombinedRegressor for regression problems.