I'm currently using TFX to build a pipeline on the Google AI platform with the Kubeflow engine.
I have a model where the batch size is an important hyper-parameter to tune.
I would like to search this hyper-parameter in the Tuner component.
Is it even possible?
I am following the TFX example with the Penguin dataset, more precisely its Tuner component implementation: found here.
The _get_hyperparameters function returns the sample space for the model hyper-parameters (see line 139).
However, the batch size to train the model is fixed and specified at the end of the tuner_fn (see line 246).
Is there a way to dynamically change the batch size based on a sample from the hyper-parameter space?
I am using TF-Agents for a custom reinforcement learning problem, where I train a DQN (constructed using DqnAgent from the TF-Agents framework) on some features from my custom environment, and separately use a Keras convolutional model to extract these features from images. Now I want to combine these two models into a single model and use transfer learning: I want to initialize the weights of the first part of the network (images-to-features) as well as the second part, which corresponds to the DQN layers in the previous case.
I am trying to build this combined model using keras.layers and wrapping it with the TF-Agents tf_agents.networks.sequential.Sequential class to bring it into the form required when passing it to the DqnAgent() class. (Let's call this statement (a)).
I am able to initialize the image feature extractor network's layers with the weights since I saved it as a .h5 file and am able to obtain numpy arrays of the same. So I am able to do the transfer learning for this part.
The problem is with the DQN layers, where I saved the policy from the previous example using the prescribed Tensorflow Saved Model Format (pb) which gives me a folder containing model attributes. However, I am unable to view/extract the weights of my DQN in this way, and the recommended tf.saved_model.load('policy_directory') is not really transparent with respect to what data I can see regarding the policy. If I have to follow the transfer learning as I do in statement (a), I need to extract the weights of my DQN and assign them to the new network. The documentation seems to be quite sparse for this case where transfer learning needs to be applied.
Can anyone help me with this by explaining how I can extract the weights from the SavedModel (from the pb file)? Or is there a better way to go about this problem?
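One way this might be approached, as a minimal sketch: a SavedModel directory stores its variables as an ordinary checkpoint under variables/variables, and tf.train.list_variables and tf.train.load_variable can read that checkpoint into NumPy arrays without rebuilding the policy. The TinyPolicy module below is a hypothetical stand-in for the exported policy, and the variable names in a real TF-Agents export will differ, so they would need to be inspected first.

```python
import os
import tempfile

import tensorflow as tf


class TinyPolicy(tf.Module):
    """Hypothetical stand-in for an exported policy with one weight matrix."""

    def __init__(self):
        super().__init__()
        self.kernel = tf.Variable(tf.ones([3, 2]), name="kernel")

    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.kernel)


# Export a SavedModel so the example is self-contained.
export_dir = tempfile.mkdtemp()
tf.saved_model.save(TinyPolicy(), export_dir)

# The variables live in a checkpoint with this prefix inside the export.
ckpt_prefix = os.path.join(export_dir, "variables", "variables")

# Read every stored variable value into a dict of NumPy arrays.
weights = {
    name: tf.train.load_variable(ckpt_prefix, name)
    for name, _shape in tf.train.list_variables(ckpt_prefix)
    if name.endswith("VARIABLE_VALUE")
}

for name, value in weights.items():
    print(name, value.shape)
```

The resulting arrays could then be matched to the new network's layers by shape and order and assigned with the layers' set_weights method, in the same way as the arrays obtained from the .h5 file.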
As tf.data augmentations are executed only on CPUs, I need a way to run certain augmentations on the TPU for an audio project.
For example,
CPU: TFRecord read -> audio crop -> noise addition.
TPU: spectrogram -> MixUp augmentation.
Most augmentations can be done as a Keras Layer on top of the model, but MixUp requires both changes in input as well as label.
Is there a way to do this using the tf.keras APIs?
If there is any way to move part of the tf.data pipeline onto the TPU, that would also be helpful.
As you rightly mentioned, and as the TensorFlow documentation also states, tf.data preprocessing is done on the CPU only.
However, as a workaround you can run the preprocessing on the TPU/GPU by applying the transformation function directly inside your model, with something like the code below.
import tensorflow as tf
# transform is your own preprocessing function (tensor in, tensor out)
inputs = tf.keras.layers.Input((512, 512, 3))
x = tf.keras.layers.Lambda(transform)(inputs)
You can follow this Kaggle post for detailed discussion on this topic.
See the TensorFlow guide that discusses preprocessing data before the model or inside the model. Including the preprocessing inside the model means the GPU is used instead of the CPU, makes the model portable, and helps reduce training/serving skew. The guide also has multiple recipes to get you started. It doesn't explicitly state that this works on a TPU, but it can be tried.
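For MixUp specifically, which has to change the inputs and the labels together, a Lambda layer on the inputs is not enough. One option is to write the mixing as plain TensorFlow ops on a whole batch and call it on the accelerator, for example at the start of a custom train_step. The sketch below makes two simplifying assumptions that are mine, not the original posts': the mixing coefficient is drawn from Uniform(0, 1), which is the Beta(1, 1) special case of MixUp, and each sample is paired with its neighbour via tf.roll rather than a random permutation. I have not verified that it compiles on a TPU.

```python
import tensorflow as tf


def mixup_batch(features, labels):
    """Mix each sample with its neighbour in the batch.

    features: float tensor of shape [batch, ...]
    labels:   one-hot (float) tensor of shape [batch, num_classes]
    """
    # One coefficient per batch; Uniform(0, 1) equals Beta(1, 1).
    lam = tf.random.uniform([], minval=0.0, maxval=1.0)
    # Pair sample i with sample i-1 (wrapping around) along the batch axis.
    partner_features = tf.roll(features, shift=1, axis=0)
    partner_labels = tf.roll(labels, shift=1, axis=0)
    mixed_features = lam * features + (1.0 - lam) * partner_features
    mixed_labels = lam * labels + (1.0 - lam) * partner_labels
    return mixed_features, mixed_labels


# Toy batch standing in for spectrograms and one-hot labels.
spectrograms = tf.random.normal([8, 16, 16, 1])
one_hot = tf.one_hot(tf.range(8) % 4, depth=4)
mixed_x, mixed_y = mixup_batch(spectrograms, one_hot)
print(mixed_x.shape, mixed_y.shape)
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label row still sums to 1, so a standard categorical cross-entropy loss can be used unchanged.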
I would like to incrementally train a spaCy NER model.
By incrementally I mean: send a first batch of N training samples and get a first model, then send a second batch of M training samples and get a model identical to the one that would have resulted if all N+M samples had been sent in one batch.
To be clear, this is not about adding samples after the model has been fully trained. Instead it is the ability to save intermediate states in the model so we can "resume" and add more training samples.
This is very useful if the number of samples is large, or for building an "active learning" system.
It seems doable with NLTK according to this article, and I was wondering if this can be done with spaCy.
So far I have trained my own custom NER model with spaCy using nlp.update, but it does not seem to store any intermediate state that supports incremental training.
Yes, this is possible in spaCy. Your approach with nlp.update is correct; once you have added your second batch of training samples, you just need to make a call to nlp.to_disk("/path") (https://spacy.io/usage/saving-loading). Then you can continue this process by loading your saved model again.
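A minimal sketch of that save-and-resume loop, assuming spaCy v3 (where training data is wrapped in Example objects). The sample sentences, the PERSON label, and the train_on_batch helper are hypothetical. As far as I know, to_disk stores the pipeline weights but not the optimizer state, so the resumed model is not guaranteed to be identical to one trained on all N+M samples in a single run.

```python
import tempfile

import spacy
from spacy.training import Example


def train_on_batch(nlp, samples, optimizer, epochs=5):
    """Hypothetical helper: a few update passes over (text, annotations) pairs."""
    losses = {}
    for _ in range(epochs):
        for text, annotations in samples:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
    return losses


first_batch = [("Alice works at Acme", {"entities": [(0, 5, "PERSON")]})]
second_batch = [("Bob lives in Paris", {"entities": [(0, 3, "PERSON")]})]

# First round: build a blank pipeline and train on the first N samples.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PERSON")
optimizer = nlp.initialize()
train_on_batch(nlp, first_batch, optimizer)

# Save the intermediate model to disk.
model_dir = tempfile.mkdtemp()
nlp.to_disk(model_dir)

# Later: load the saved model and continue with the next M samples.
resumed = spacy.load(model_dir)
optimizer = resumed.resume_training()  # fresh optimizer, existing weights kept
losses = train_on_batch(resumed, second_batch, optimizer)
print(losses)
```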
I have built a model and I am able to prune it successfully using tf.contrib's model pruning module with default params and a sparsity of 90%. The problem is that when I run the pruned model, it still takes the same execution time as the original model. My guess is that instead of running only the pruned version, TensorFlow is running the entire graph with masked weights, and that is why there is no improvement even after pruning.
So how do I export the pruned model, with its subgraph and respective weights, and use it?
The strip_pruning_vars utility might be what you're looking for.
From the README file: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning#adding-pruning-ops
Removing pruning ops from the trained graph
Once the model is trained, it is necessary to remove the auxiliary variables (mask, threshold) and pruning ops added to the graph in the steps above. This can be accomplished using the strip_pruning_vars utility.
Would you mind sharing your code?
I am using a TensorFlow-based framework and I have noticed that there are major variations in the size of the TensorFlow model files.
For example the framework provides 2 models:
one pretrained model, to be used for fine-tuning for example
and one which contains an untrained version.
They both have a size of 172.539 kb
When I apply fine tuning in my model with some minor changes in the graph (there is a module in framework for that) and save my model the size remains essentially the same: 178.525 kb.
First, I am a bit surprised that my fine-tuned model is somewhat bigger: since I only change the last layer from 21 to 14 classes, I would expect a somewhat smaller model file. But since the difference is so small, I didn't pay attention to it.
But when I trained the same model starting from the same model file (the pretrained one) and saved it to disk, the file size was quite different: 340.097 kb. By "train" I mean that I allow the network to modify all parameters, not just the parameters of the last layer.
The model being implemented is a variation of ResNet for semantic image segmentation (in case someone can deduce the expected model file size from the model itself).
So, my questions are why I have such a variance in the model file sizes and how come my saved fine-tuned model is larger than the original model? Is there a way to include/exclude parameters in the model to be saved?
P.S.1 Some information that might be handy:
I am using tensorflow v2 model saving while I think the framework files use v1. I am not sure how to identify this besides the fact that the former produces 3 files.
The framework is called tensorflow-deeplab-resnet and can be found here and the models are here.
P.S.2
I am not sure Stack Overflow is the right place for this question either.
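That is because, when a model is trained and then saved, TensorFlow also saves training state for every trained variable alongside the weights. Strictly speaking this is the optimizer's per-variable state (for example momentum accumulators) rather than the raw gradients, but it has the same shape as the weights it belongs to.
So allowing training on only the last layer increases the size of your saved model a little, and allowing training on the whole model essentially doubles the size of the save file, because every trained variable has this extra state saved with it.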
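The doubling can be sanity-checked with rough arithmetic. The sketch below assumes that one extra tensor of the same shape is stored for every trained variable (as a momentum-style optimizer would keep), and it reads the sizes quoted in the question as 172,539 KB and 340,097 KB. Both are my assumptions, and estimate_saved_kb is a hypothetical helper.

```python
def estimate_saved_kb(weights_kb, trained_fraction, extra_per_variable=1):
    """Estimate saved-file size when trained variables carry extra state.

    weights_kb:         size of the weights alone
    trained_fraction:   share of the weights that were trained (0.0 to 1.0)
    extra_per_variable: extra same-shaped tensors stored per trained variable
    """
    return weights_kb * (1 + trained_fraction * extra_per_variable)


pretrained_kb = 172_539          # size quoted for the pretrained model
observed_full_training_kb = 340_097  # size quoted after training all layers

estimated_kb = estimate_saved_kb(pretrained_kb, trained_fraction=1.0)
relative_gap = abs(estimated_kb - observed_full_training_kb) / observed_full_training_kb

print(f"estimated: {estimated_kb:.0f} KB, observed: {observed_full_training_kb} KB")
print(f"relative gap: {relative_gap:.1%}")
```

The estimate lands within about 1.5% of the observed size. The small remaining gap would be consistent with some variables (for instance non-trainable statistics) carrying no extra state, though that is a guess rather than something measured here.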