Change the spatial input dimension during training - tensorflow

I am training a yolov4 (fully convolutional) in tensorflow 2.3.0.
I would like to change the spatial input shape of the network during training, to further adjust the weights to different scales.
Is this possible?
EDIT:
I know about darknet, but it lacks some very specific augmentations that I use and have implemented in my repo, which is why I am asking explicitly about tensorflow.
To be more precise about what I want to do:
I want to train for several batches at Y1xX1xC then change the input size to Y2xX2xC and train again for several batches and so on.

It is not possible. In the past people trained several networks for different scales but the current state-of-the-art approach is feature pyramids.
https://arxiv.org/pdf/1612.03144.pdf
Another good option is dilated convolution, which can learn long-distance dependencies between pixels at varying distances. You can concatenate the outputs of several dilated convolutions, and the model will then learn which distance matters in which case.
https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5
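To make the dilation idea concrete, here is a minimal NumPy sketch of a 1-D dilated convolution. It is only an illustration under simplifying assumptions: the helper name `dilated_conv1d`, the fixed kernel, and the 1-D input are all hypothetical, and a real detector would use learned 2-D kernels (for example the `dilation_rate` argument of a Keras convolution layer).

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D correlation whose kernel taps are `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1      # effective receptive field of the kernel
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for j in range(k):
        # Tap j looks at the input shifted by j * dilation samples.
        out += kernel[j] * x[j * dilation : j * dilation + out_len]
    return out

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])

y1 = dilated_conv1d(x, kernel, dilation=1)  # each output sees 3 input samples
y2 = dilated_conv1d(x, kernel, dilation=2)  # each output sees 5 input samples

# Crop to a common length and stack, mimicking a channel-wise concatenation
# of branches with different dilation rates.
features = np.stack([y1[: len(y2)], y2])
print(features.shape)
```

The same three-tap kernel covers a wider stretch of the input as the dilation grows, without adding parameters; stacking the branches lets later layers weigh the different distances.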

It's important to mention which TensorFlow repository you're using. You can definitely achieve this. The idea is to keep the spatial input dimension fixed within a single batch and change it only between batches.
But an even better approach is to use the darknet repository from AlexeyAB: https://github.com/AlexeyAB/darknet
Just set random=1 in https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg [line 1149]. It will train your network with randomly varying spatial dimensions.
One thing you can do is start your training in the AlexeyAB repo with random=1 set, then take the trained weights file to tensorflow for fine-tuning.
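For the TensorFlow route, the scheduling part of "same size within a batch, new size every few batches" can be sketched in plain Python. This is an assumption-laden sketch, not the method of any particular repository: the function name `multiscale_schedule`, the candidate sizes (multiples of 32 between 320 and 608), and the switch interval of 10 batches are all chosen for illustration.

```python
import random

def multiscale_schedule(num_batches, sizes, batches_per_size, seed=0):
    """Yield (batch_index, (height, width)); the size changes every
    `batches_per_size` batches and is constant within each batch."""
    rng = random.Random(seed)
    size = None
    for i in range(num_batches):
        if i % batches_per_size == 0:
            size = rng.choice(sizes)   # pick a new input resolution
        yield i, size

# Hypothetical candidate resolutions: square inputs, multiples of 32.
SIZES = [(s, s) for s in range(320, 609, 32)]

schedule = list(multiscale_schedule(num_batches=30, sizes=SIZES, batches_per_size=10))
for i, (h, w) in schedule[::10]:
    print(f"batches {i}-{i + 9}: input {h}x{w}")
```

In a training loop, each batch would be resized to the scheduled size (for example with `tf.image.resize`, with the box labels scaled to match) before the training step. This assumes the model was built with unspecified spatial dimensions, such as `tf.keras.Input(shape=(None, None, C))`, which a fully convolutional network permits.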

Related

Strategies for pre-training models for use in tfjs

This is a more general version of a question I've already asked: Significant difference between outputs of deep tensorflow keras model in Python and tensorflowjs conversion
As far as I can tell, the layers of a tfjs model when run in the browser (so far only tested in Chrome and Firefox) will have small numerical differences in the output values when compared to the same model run in Python or Node. The cumulative effect of these small differences across all the layers of the model can cause fairly significant differences in the output. See here for an example of this.
This means a model trained in Python or Node will not perform as well in terms of accuracy when run in the browser. And the deeper your model, the worse it will get.
Therefore my question is, what is the best way to train a model to use with tfjs in the browser? Is there a way to ensure the output will be identical? Or do you just have to accept that there will be small numerical differences and, if so, are there any methods that can be used to train a model to be more resilient to this?
This answer is based on my personal observations. As such, it is debatable and not backed by much evidence. Some things that I do to get the accuracy of 16-bit models close to that of 32-bit models are:
Avoid using activations that have small upper and lower bounds, such as sigmoid or tanh, for hidden layers. These activations cause the weights of the next layer to become very sensitive to small values, and hence to small changes. I prefer using ReLU for such models. Since it is now the standard activation for hidden layers in most models, you should be using it in any case.
Avoid weight decay and L1/L2 regularization on weights while training (the kernel_regularizer parameter in keras), since these increase the sensitivity of the weights. Use Dropout instead; I didn't observe a major drop in performance on TFLite when using it instead of numerical regularizers.
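One way to get a feel for how such differences accumulate is to run the same stack of layers at two precisions and compare the outputs. The sketch below is a toy illustration in NumPy, assuming (as this answer does) that the discrepancy behaves like reduced-precision arithmetic; the layer width, depth, random weights, and the helper name `forward` are all made up for the example.

```python
import numpy as np

def forward(x, weights, dtype):
    """Run a stack of ReLU layers with inputs and weights cast to `dtype`."""
    h = x.astype(dtype)
    for w in weights:
        h = np.maximum(h @ w.astype(dtype), 0)
    return h.astype(np.float64)

rng = np.random.default_rng(0)
width, depth = 32, 8
# He-style scaling keeps activations in a range float16 can represent.
weights = [rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
           for _ in range(depth)]
x = rng.normal(size=(4, width))

out32 = forward(x, weights, np.float32)
out16 = forward(x, weights, np.float16)
max_abs_diff = np.abs(out32 - out16).max()
print("max |float32 - float16| after", depth, "layers:", max_abs_diff)
```

Repeating the comparison for different depths or activations on your own model is a cheap way to check whether a design choice makes the outputs more or less sensitive to precision.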

What does it mean when the weights of a layer are not normally distributed?

I plot all the weights of my neural network on tensorboard, and I found that the weights of some layers are normally distributed:
but some are not.
What does this imply? Should I increase or decrease the capacity of this layer?
Update:
My network is an LSTM-based network. The non-normally distributed weights are the ones multiplied with the input features; the normally distributed weights are the ones multiplied with the states.
One explanation, based on convolutional networks (I don't know whether it holds for other kinds of neural models), might be this: the first layer tries to find distinct small features, so its weights are distributed very widely while the network looks for any useful feature it can find. The following layers use combinations of these distinct features, so a normal distribution of weights makes sense there, because each of the previous features becomes part of a single larger or more representative feature in the next layers.
But this is only my intuition; I have no proof that this is the reason.

Tensorflow: how to restore only specific hidden layers from checkpoint and use them to build a different computational graph for inference?

Let's say I trained a model with a very complex computational graph tailored for training. After a lot of training, the best model was saved to a checkpoint file. Now, I want to use the learned parameters of this best model for inference. However, the computational graph used for training is not exactly the same as the one I intend to use for inference. Concretely, there is a module in the graph with several layers in charge of outputting embedding vectors for items (recommender system context). However, for the sake of computational performance, during inference time I would like to have all the item embedding vectors precomputed in advance, so that the only computation required per request would just involve a couple of hidden layers.
Therefore, what I would like to know is:
1. How to restore just the part of the network that outputs item embedding vectors, in order to precompute these vectors for all items (this would happen off-line in some pre-processing script).
2. Once all item embedding vectors are precomputed, how to restore just the hidden layers in the later parts of the network during on-line inference and make them receive the precomputed item embedding vectors instead.
How can the points above be accomplished? I think point 1. is easier to get done. But my biggest concern is with point 2. In the computational graph used for training, in order to evaluate any layer I would have to provide values for the input placeholders. However, during on-line inference these placeholders would be obsolete because a lot of stuff would be precomputed and I don't know how to tell hidden layers in the later parts of the network that they should no longer depend on these obsolete placeholders but depend on the precomputed stuff instead.

Can Tensorflow Wide and Deep model train to continuous values

I am working with the Tensorflow Wide and Deep model. It currently trains against a binary classification (>50K or not).
Can this model be coerced to train directly against numeric values to produce more precise (if less accurate) predictions?
I have seen an example of using LSTM RNNs to make such predictions using TensorFlowEstimator directly here, but DNNLinearCombinedClassifier will not accept n_classes=0.
I like the structure of the Wide and Deep model, especially the ability to run the linear regression and the DNN separately to determine how learnable the data is, but my application involves data that clusters, but in an overlapping, input-dependent fashion.
Use DNNLinearCombinedRegressor for regression problems.

Sequence Labeling in TensorFlow

I have managed to train a word2vec with tensorflow and I want to feed those results into an rnn with lstm cells for sequence labeling.
1) It's not really clear how to use the trained word2vec model for an rnn. (How to feed the result?)
2) I don't find much documentation on how to implement a sequence labeling lstm. (How do I bring in my labels?)
Could someone point me in the right direction on how to start with this task?
I suggest you start by reading the RNN tutorial and sequence-to-sequence tutorial. They explain how to build LSTMs in TensorFlow. Once you're comfortable with that, you'll have to find the right embedding Variable and assign it using your pre-trained word2vec model.
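As a rough picture of what "feeding the result" means, the word2vec output is just a matrix with one row per vocabulary entry, and the RNN input is obtained by looking up the row for each token id. The NumPy sketch below uses a tiny made-up vocabulary and random vectors in place of a real trained model; `vocab`, `pretrained`, and `embed` are hypothetical names.

```python
import numpy as np

# Hypothetical stand-in for a trained word2vec model: one row per word.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
embedding_dim = 5
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(len(vocab), embedding_dim)).astype(np.float32)
pretrained[0] = 0.0  # keep the padding token at the zero vector

def embed(token_ids, embedding_matrix):
    """Look up one embedding row per token id."""
    return embedding_matrix[np.asarray(token_ids)]

# Two sentences as token ids; the second is padded with id 0.
batch_ids = [[1, 2, 3], [1, 3, 0]]
inputs = embed(batch_ids, pretrained)
print(inputs.shape)  # [batch_size, sequence_length, embedding_dim]
```

In TensorFlow, the same matrix would be assigned to the embedding variable (for instance through the variable's `assign` method or a constant initializer), and the lookup result is what the LSTM receives as input.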
I realize this was posted a while ago, but I found this Gist about sequence labeling and this Gist about variable sequence labeling really helpful for figuring out sequence labeling. The basic outline (the gist of the Gist):
Use dynamic_rnn to handle unrolling your network for training and prediction. This method has moved around some in the API, so you may have to find it for your version, but just Google it.
Arrange your data into batches of shape [batch_size, sequence_length, num_features], and your labels into batches of shape [batch_size, sequence_length, num_classes]. Note that you want a label for every time step in your sequence.
For variable-length sequences, pass a value to the sequence_length argument of the dynamic_rnn wrapper for each sequence in your batch.
Training the RNN is very similar to training any other neural network once you have the network structure defined: feed it training data and target labels and watch it learn!
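The batching step in the outline above can be sketched with NumPy. This is a minimal illustration, assuming integer class labels per time step and zero padding; the helper name `pad_batch` and the toy data are hypothetical.

```python
import numpy as np

def pad_batch(sequences, labels, num_classes):
    """Zero-pad variable-length sequences and one-hot labels to a common length.

    Returns x with shape [batch_size, max_len, num_features],
    y with shape [batch_size, max_len, num_classes],
    and the true length of each sequence.
    """
    lengths = np.array([len(s) for s in sequences])
    max_len = lengths.max()
    num_features = sequences[0].shape[1]
    x = np.zeros((len(sequences), max_len, num_features), dtype=np.float32)
    y = np.zeros((len(sequences), max_len, num_classes), dtype=np.float32)
    for i, (seq, lab) in enumerate(zip(sequences, labels)):
        x[i, : len(seq)] = seq
        y[i, np.arange(len(lab)), lab] = 1.0  # one label per time step
    return x, y, lengths

# Two toy sequences of lengths 3 and 5 with 4 features per step.
seqs = [np.ones((3, 4)), np.ones((5, 4))]
labs = [[0, 1, 2], [2, 2, 1, 0, 0]]
x, y, lengths = pad_batch(seqs, labs, num_classes=3)
print(x.shape, y.shape, lengths)
```

The `lengths` array is what would be passed as the sequence_length argument, so the padded time steps are not unrolled as if they were real data.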
And some caveats:
With variable-length sequences, you will need to build masks for calculating your error metrics and so on. It's all in the second link above, but don't forget this when you write your own error metrics! I ran into this a couple of times, and it made my networks look like they were doing much worse on variable-length sequences.
You might want to add a regularization term to your loss function. I had some convergence issues without this.
I recommend using tf.train.AdamOptimizer with the default settings at first. Depending on your data, this may not converge and you will need to adjust the settings. This article does a good job of explaining what the different knobs do. Start reading from the beginning, since some of the knobs are explained before the Adam section.
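To illustrate the masking caveat, here is a small NumPy sketch of an accuracy metric that ignores padded time steps. The function name `masked_accuracy` and the toy predictions are made up for the example; the linked Gists may build their masks differently.

```python
import numpy as np

def masked_accuracy(pred_classes, true_classes, lengths):
    """Accuracy over real time steps only; padded positions are ignored."""
    max_len = pred_classes.shape[1]
    # mask[b, t] is True while t is inside the true length of sequence b.
    mask = np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]
    correct = (pred_classes == true_classes) & mask
    return correct.sum() / mask.sum()

# Two sequences padded to length 4; the first has only 2 real steps.
true = np.array([[1, 2, 0, 0], [1, 1, 2, 2]])
pred = np.array([[1, 2, 1, 1], [1, 1, 2, 0]])
lengths = [2, 4]

acc_masked = masked_accuracy(pred, true, lengths)  # 5 correct of 6 real steps
acc_naive = (pred == true).mean()                  # 5 correct of 8 positions
print(acc_masked, acc_naive)
```

The naive metric counts the padded positions as mistakes, which is the effect described above: the network looks worse on variable-length sequences than it really is.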
Hopefully these links are helpful to others in the future!