TensorFlow initial_weights: what does it mean? - tensorflow

Why do we have to initialize the weights when using a model for prediction? I can't understand this.
You can refer to: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#checkpoint_the_initial_weights
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)

This tutorial appears to be referring to imbalanced data. You do not need to provide initial weights to TensorFlow's predict method if you don't want to. See this link describing the possible inputs to that method.

Deep learning uses gradient descent and its variants to find optimal weights. If you don't initialize the weights, training may take a long time to converge or may not converge at all.
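To make the tutorial snippet above concrete, here is a minimal sketch with a hypothetical toy model (not the tutorial's actual model): the freshly initialized weights are saved once, then reloaded before each training run so every experiment starts from exactly the same initialization.

import os
import tempfile
import tensorflow as tf

# Hypothetical toy model standing in for the tutorial's model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Save the freshly initialized weights once...
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)

# ...and reload them before each experiment (e.g. with and without class
# weights) so all runs start from the same initialization.
model.load_weights(initial_weights)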

Related

Loss function variational Autoencoder in Tensorflow example

I have a question regarding the loss function in a variational autoencoder. I followed the TensorFlow example https://www.tensorflow.org/tutorials/generative/cvae to create an LSTM-VAE for sampling a sine function.
My encoder input is a set of points (x_i, sin(x_i)) over a specific range (randomly sampled), and as the output of the decoder I expect similar values.
In the TensorFlow guide, cross-entropy is used to compare the encoder input with the decoder output.
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
This makes sense, because the input and output are treated as probabilities. But in reality these probability values represent the points of my sine function.
Can't I simply use a mean squared error instead of the cross-entropy (I tried it and it works well), or does this cause wrong behaviour of the architecture at some point?
Best regards and thanks for your help!
Well, such questions happen when you work too much and stop thinking properly. For the sake of solving this, it makes sense to think about what I'm trying to do.
p(x|z) is the decoder reconstruction, which means that by sampling from z the value x is generated with probability p. In the TensorFlow example, image classification/generation is used, and in that case cross-entropy makes sense. I simply want to minimize the distance between my input and output, so the use of MSE is the logical choice.
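For reference, a rough sketch of what that change could look like, following the structure of the compute_loss function in the CVAE tutorial (the encode/reparameterize/decode methods are assumed to exist on the model, as in that tutorial):

import tensorflow as tf

def compute_loss(model, x):
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_hat = model.decode(z)  # real-valued reconstruction of the input points

    # MSE reconstruction term instead of sigmoid cross-entropy
    reconstruction_loss = tf.reduce_sum(tf.square(x - x_hat), axis=-1)

    # KL divergence between q(z|x) = N(mean, exp(logvar)) and the prior N(0, I)
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mean) - tf.exp(logvar), axis=-1)

    return tf.reduce_mean(reconstruction_loss + kl)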
Hope that helps someone at some point.
Regards.

How to initialize mean and variance of Pytorch BatchNorm2d?

I'm converting a TensorFlow model to PyTorch, and I'd like to initialize the mean and variance of BatchNorm2d from the TensorFlow model.
I’m doing it in this way:
bn.running_mean = torch.nn.Parameter(torch.Tensor(TF_param))
And I get this error:
RuntimeError: the derivative for 'running_mean' is not implemented
But it works for bn.weight and bn.bias. Is there any way to initialize the mean and variance using my pre-trained TensorFlow model? Is there anything like moving_mean_initializer and moving_variance_initializer in PyTorch?
Thanks!
The running mean and variance of a batch norm layer are not nn.Parameters, but rather buffers of the layer.
I think you can simply assign a torch.Tensor to them; there is no need to wrap an nn.Parameter around it.
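A small sketch of what that could look like (tf_moving_mean, tf_moving_variance, tf_gamma and tf_beta are placeholders for the arrays extracted from your TensorFlow checkpoint):

import numpy as np
import torch
import torch.nn as nn

# Placeholder arrays standing in for values read from the TF model.
tf_moving_mean = np.zeros(64, dtype=np.float32)
tf_moving_variance = np.ones(64, dtype=np.float32)
tf_gamma = np.ones(64, dtype=np.float32)
tf_beta = np.zeros(64, dtype=np.float32)

bn = nn.BatchNorm2d(64)

with torch.no_grad():
    # running_mean / running_var are buffers, so copy plain tensors into them.
    bn.running_mean.copy_(torch.from_numpy(tf_moving_mean))
    bn.running_var.copy_(torch.from_numpy(tf_moving_variance))
    # The learnable affine parameters can be copied the same way.
    bn.weight.copy_(torch.from_numpy(tf_gamma))
    bn.bias.copy_(torch.from_numpy(tf_beta))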

How is get_updates() of optimizers.SGD used in Keras during training?

I am not familiar with the inner workings of Keras and have difficulty understanding how Keras uses the get_updates() function of optimizers.SGD during training.
I searched for quite a while on the internet, but only found a few details. Specifically, my understanding is that the parameter/weight update rule of SGD is defined in the get_updates() function. But it appears that get_updates() isn't literally called in every iteration during training; otherwise 'moments' wouldn't carry from one iteration to the next to implement momentum correctly, as it's reset in every call, cf. optimizers.py:
shapes = [K.get_variable_shape(p) for p in params]
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
As pointed out in https://github.com/keras-team/keras/issues/7502, get_updates() only defines 'a symbolic computation graph'. I'm not sure what that means. Can someone give a more detailed explanation of how it works?
For example, how does the 'v' computed in one iteration get passed to 'moments' in the next iteration to implement momentum? I'd also appreciate it if someone could point me to a tutorial about how this works.
Thanks a lot! (BTW, I'm using tensorflow, if it matters.)
get_updates() defines the graph operations that apply the gradients to update the weights (and the optimizer's own state, such as the moments).
When the graph is evaluated for training, it will look something like this:
forward passes compute a prediction value
loss computes a cost
backward passes compute gradients
the weights are updated using the gradients
Applying this update is a graph computation itself; i.e. the snippet of code that you quote defines how to perform the operation by specifying which tensors are involved and what math operations occur. The math operations themselves are not occurring at that point.
moments is a list of tensors defined in the code above. The code creates a graph operation that updates each element of moments.
Every iteration of the graph will run this update operation.
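Here is a toy sketch of that idea (not Keras's actual code), using the TF1-style graph API: a tf.Variable plays the role of one moment, the update is defined symbolically once, and each run of the graph re-executes it while the variable keeps its value between runs.

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

m = tf.Variable(0.0)                    # plays the role of one 'moment'
update_m = tf.assign(m, 0.9 * m + 1.0)  # defined once, symbolically

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        # Each run executes the same update op; m keeps its value from the
        # previous iteration, just like the momentum buffer during training.
        print(sess.run(update_m))       # ~1.0, 1.9, 2.71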
The following link tries to explain the concept of the computational graph in TensorFlow:
https://www.tensorflow.org/guide/graphs
Keras uses the same underlying ideas but abstracts the user from having to deal with the low-level details. Defining a model in the traditional TensorFlow 1.x API requires a much higher level of detail.

Questions about tensorflow GetStarted tutorial

So I was reading the TensorFlow getting-started tutorial and I found it very hard to follow. A lot of explanations were missing about each function and why it is necessary (or not).
In the tf.estimator section, what is the meaning of the "x_eval" and "y_eval" arrays, and what are they supposed to be? The x_train and y_train arrays give the desired output (the corresponding y coordinate) for a given x coordinate. But the x_eval and y_eval values are incorrect: for x=5, y should be -4, not -4.1. Where do those values come from? What do x_eval and y_eval mean? Are they necessary? How did they choose those values?
The difference between "input_fn" (what does "fn" even mean?) and "train_input_fn": I see that the only difference is that one has
num_epochs=None, shuffle=True
and the other has
num_epochs=1000, shuffle=False
but I don't understand what "input_fn" or "train_input_fn" are or do, what the difference between the two is, or whether both are necessary.
3. In the
estimator.train(input_fn=input_fn, steps=1000)
piece of code, I don't understand the difference between "steps" and "num_epochs". What's the meaning of each one? Can you have num_epochs=1000 and steps=1000 too?
The final question is: how do I get the W and the b? In the previous way of doing it (not using tf.estimator) they explicitly found that W=-1 and b=1. If I were building a more complex neural network, involving biases and weights, I would want to recover the actual values of the weights and biases. That's the whole point of why I'm using TensorFlow, to find the weights! So how do I recover them in the tf.estimator example?
These are just some of the questions that bugged me while reading the "getStarted" tutorial. I personally think it leaves a lot to be desired, since it's very unclear what each thing does and you can at best guess.
I agree with you that the tf.estimator is not very well introduced in this "getting started" tutorial. I also think that some machine learning background would help with understanding what happens in the tutorial.
As for the answers to your questions:
In machine learning, we usually minimize the loss of the model on the training set, and then we evaluate the performance of the model on the evaluation set. This is because it is easy to overfit the training set and get 100% accuracy on it, so using a separate validation set makes it impossible to cheat in this way.
Here (x_train, y_train) corresponds to the training set, where the global minimum is obtained for W=-1, b=1.
The validation set (x_eval, y_eval) doesn't have to perfectly follow the distribution of the training set. Although we can get a loss of 0 on the training set, we obtain a small loss on the validation set because we don't have exactly y_eval = -x_eval + 1.
input_fn means "input function". This is to indicate that the object input_fn is a function.
In tf.estimator, you need to provide an input function if you want to train the estimator (estimator.train()) or evaluate it (estimator.evaluate()).
Usually you want different transformations for training or evaluation, so you have two functions train_input_fn and eval_input_fn (the input_fn in the tutorial is almost equivalent to train_input_fn and is just confusing).
For instance, during training we want to train for multiple epochs (i.e. multiple passes over the dataset). For evaluation, we only need one pass over the validation data to compute the metrics we need.
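For instance, with the numpy input functions used in that version of the tutorial, the two could look roughly like this (the numbers are only illustrative, echoing the values mentioned in the question):

import numpy as np
import tensorflow as tf

x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7., 0.])

# Training: repeat the data indefinitely (num_epochs=None) and shuffle it.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)

# Evaluation: a single, deterministic pass over the evaluation data.
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=4, num_epochs=1, shuffle=False)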
The number of epochs is the number of times we repeat the entire dataset. For instance, if we train for 10 epochs, the model will see each input 10 times.
When we train a machine learning model, we usually use mini-batches of data. For instance, if we have 1,000 images, we can train on batches of 100 images, so one epoch is 10 batches and training for 10 epochs means training on 100 batches of data. A step is one parameter update on one batch, so in estimator.train() training stops once either steps batches have been processed or the input function runs out of data (after num_epochs passes), whichever comes first.
Once the estimator is trained, you can access the list of variables through estimator.get_variable_names() and the value of a variable through estimator.get_variable_value().
Usually we never need to do that, as we can for instance use the trained estimator to predict on new examples, using estimator.predict().
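As a rough sketch (the exact variable names depend on the TensorFlow version and on how the feature column is named, so list them first):

# estimator is the trained LinearRegressor from the tutorial.
for name in estimator.get_variable_names():
    print(name)

# With a single numeric feature named "x", the weight and bias typically
# appear under names similar to these (check the printed list first):
w = estimator.get_variable_value('linear/linear_model/x/weights')
b = estimator.get_variable_value('linear/linear_model/bias_weights')
print(w, b)  # should be close to W = -1 and b = 1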
If you feel that the getting-started guide is confusing, you can always submit a GitHub issue to tell the TensorFlow team and explain your point.

Tensorflow: jointly training CNN + LSTM

There are quite a few examples on how to use LSTMs alone in TF, but I couldn't find any good examples on how to train CNN + LSTM jointly.
From what I see, it is not quite straightforward how to do such training, and I can think of a few options here:
First, I believe the simplest solution (or the most primitive one) would be to train the CNN independently to learn features and then to train the LSTM on the CNN features without updating the CNN part, since one would probably have to extract and save these features as numpy arrays and then feed them to the LSTM in TF. But in that scenario, one would probably have to use a differently labeled dataset for pretraining the CNN, which eliminates the advantage of end-to-end training, i.e. learning features for the final objective targeted by the LSTM (besides the fact that one has to have these additional labels in the first place).
The second option would be to concatenate all time slices in the batch dimension (a 4-D tensor), feed it to the CNN, then somehow repack those features into the 5-D tensor needed for training the LSTM, and then apply a cost function. My main concern is whether it is possible to do such a thing. Also, handling variable-length sequences becomes a little bit tricky. For example, in a prediction scenario you would only feed a single frame at a time. Thus, I would be really happy to see some examples if that is the right way of doing joint training. Besides that, this solution looks more like a hack, so if there is a better way to do it, it would be great if someone could share it.
Thank you in advance !
For joint training, you can consider using tf.map_fn as described in the documentation https://www.tensorflow.org/api_docs/python/tf/map_fn.
Let's assume that the CNN is built along similar lines as described here https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py.
def joint_inference(sequence):
    inference_fn = lambda image: inference(image)
    logit_sequence = tf.map_fn(inference_fn, sequence, dtype=tf.float32, swap_memory=True)
    lstm_cell = tf.contrib.rnn.LSTMCell(128)
    output_state, intermediate_state = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=logit_sequence)
    projection_function = lambda state: tf.contrib.layers.linear(state, num_outputs=num_classes, activation_fn=tf.nn.sigmoid)
    projection_logits = tf.map_fn(projection_function, output_state)
    return projection_logits
Warning: you might have to look into device placement as described here https://www.tensorflow.org/tutorials/using_gpu if your model is larger than the memory the GPU can allocate.
An alternative would be to flatten the video batch to create an image batch, do a forward pass through the CNN, and reshape the features back for the LSTM.
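A rough sketch of that alternative (cnn_features is a hypothetical function mapping an image batch to a 2-D feature batch; static shapes are assumed for clarity, and the same TF1-style APIs as above are used):

def cnn_lstm_forward(video, cnn_features, lstm_units=128):
    # video: [batch, time, height, width, channels]
    b, t, h, w, c = video.shape.as_list()
    frames = tf.reshape(video, [b * t, h, w, c])    # merge batch and time
    features = cnn_features(frames)                 # [b * t, feature_dim]
    features = tf.reshape(features, [b, t, -1])     # restore the time axis
    lstm_cell = tf.contrib.rnn.LSTMCell(lstm_units)
    outputs, final_state = tf.nn.dynamic_rnn(lstm_cell, features, dtype=tf.float32)
    return outputs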