What's the difference between tf.placeholder and tf.Variable?


I'm a newbie to TensorFlow and I'm confused about the difference between tf.placeholder and tf.Variable. As I understand it, tf.placeholder is used for input data, and tf.Variable is used to store the state of data. That is all I know.
Could someone explain their differences in more detail? In particular, when should I use tf.Variable and when should I use tf.placeholder?

In short, you use tf.Variable for trainable variables such as weights (W) and biases (B) for your model.
weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
tf.placeholder is used to feed actual training examples.
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
This is how you feed the training examples during the training:
for step in range(FLAGS.max_steps):
    feed_dict = {
        images_placeholder: images_feed,
        labels_placeholder: labels_feed,
    }
    _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
Your tf.Variable objects will be trained (modified) as a result of this training.
See more at https://www.tensorflow.org/versions/r0.7/tutorials/mnist/tf/index.html. (Examples are taken from the web page.)

The difference is that with tf.Variable you have to provide an initial value when you declare it. With tf.placeholder you don't have to provide an initial value; you specify the value at run time with the feed_dict argument of Session.run.

Since TensorFlow computations are composed of graphs, it is better to interpret the two in terms of graphs.
Take, for example, simple linear regression:
WX+B=Y
where W and B stand for the weights and bias and X for the observations' inputs and Y for the observations' outputs.
Obviously X and Y are of the same nature (manifest variables) which differ from that of W and B (latent variables). X and Y are values of the samples (observations) and hence need a place to be filled, while W and B are the weights and bias, Variables (the previous values affect the latter) in the graph which should be trained using different X and Y pairs. We place different samples to the Placeholders to train the Variables.
We only need to save or restore the Variables (at checkpoints) to save or rebuild the graph with the code.
Placeholders are mostly holders for the different datasets (for example training data or test data). Variables, however, are trained in the training process for the specific task, i.e., to predict the outcome of the input or map the inputs to the desired labels. They remain the same until you retrain or fine-tune the model, using different or the same samples to fill the Placeholders, usually through feed_dict. For instance:
session.run(a_graph, feed_dict={a_placeholder_name: sample_values})
Placeholders are also passed as parameters to set models.
If you change placeholders (add, delete, change the shape etc) of a model in the middle of training, you can still reload the checkpoint without any other modifications. But if the variables of a saved model are changed, you should adjust the checkpoint accordingly to reload it and continue the training (all variables defined in the graph should be available in the checkpoint).
To sum up, if the values come from the samples (observations you already have), you can safely make a placeholder to hold them, while if you need a parameter to be trained, use a Variable (simply put, use Variables for the values you want TF to find automatically).
In some interesting models, like a style transfer model, the input pixels are optimized and the parameters normally called model variables are fixed. In that case we should make the input (usually initialized randomly) a variable, as implemented in that link.
For more information, please refer to this simple and illustrative doc.

TL;DR
Variables
For parameters to learn
Values can be derived from training
Initial values are required (often random)
Placeholders
Allocated storage for data (such as for image pixel data during a feed)
Initial values are not required (but can be set, see tf.placeholder_with_default)

The most obvious difference between the tf.Variable and the tf.placeholder is that
you use variables to hold and update parameters. Variables are
in-memory buffers containing tensors. They must be explicitly
initialized and can be saved to disk during and after training. You
can later restore saved values to exercise or analyze the model.
Initialization of the variables is done with sess.run(tf.global_variables_initializer()). Also while creating a variable, you need to pass a Tensor as its initial value to the Variable() constructor and when you create a variable you always know its shape.
On the other hand, you can't update a placeholder. Placeholders also should not be initialized, but because they are a promise to have a tensor, you need to feed a value into them with sess.run(<op>, {a: <some_val>}). Finally, in contrast to a variable, a placeholder might not know its shape. You can either provide some of the dimensions or provide nothing at all.
There are other differences:
the values inside the variable can be updated during optimizations
variables can be shared, and can be non-trainable
the values inside the variable can be stored after training
when the variable is created, 3 ops are added to a graph (variable op, initializer op, ops for the initial value)
placeholder is a function, Variable is a class (hence an uppercase)
when you use TF in a distributed environment, variables are stored in a special place (parameter server) and are shared between the workers.
Interestingly, placeholders are not the only things that can be fed. You can also feed a value to a Variable and even to a constant.
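To make the contrast concrete without depending on a particular TensorFlow version, here is a small framework-free sketch. The classes ToyPlaceholder and ToyVariable and the function run are hypothetical stand-ins invented for this illustration; they are not TensorFlow APIs and do not reflect how TensorFlow is implemented. The sketch only mimics the behaviours described above: a placeholder has no value until it is fed, a variable needs an initial value and keeps its state between runs, and a feed can override a variable too.

```python
class ToyPlaceholder:
    """Hypothetical stand-in: holds no value; must be fed at run time."""
    def __init__(self, name):
        self.name = name


class ToyVariable:
    """Hypothetical stand-in: requires an initial value and keeps state."""
    def __init__(self, initial_value):
        self.value = initial_value

    def assign(self, new_value):
        self.value = new_value


def run(fn, inputs, feed_dict=None):
    """Evaluate fn on the current values of inputs, using feeds where given."""
    feed_dict = feed_dict or {}
    values = []
    for node in inputs:
        if node in feed_dict:                # fed values take precedence
            values.append(feed_dict[node])
        elif isinstance(node, ToyVariable):  # variables fall back to stored state
            values.append(node.value)
        else:                                # an unfed placeholder is an error
            raise ValueError("placeholder %r was not fed" % node.name)
    return fn(*values)


def multiply(a, b):
    return a * b


x = ToyPlaceholder("x")
w = ToyVariable(2.0)

first = run(multiply, [w, x], feed_dict={x: 3.0})    # uses stored w = 2.0
w.assign(5.0)                                        # state persists across runs
second = run(multiply, [w, x], feed_dict={x: 3.0})   # uses updated w = 5.0
override = run(multiply, [w, x], feed_dict={x: 3.0, w: 1.0})  # feed overrides w
print(first, second, override)
```

Calling run(multiply, [w, x]) without feeding x raises an error in this toy model, which mirrors the point that a placeholder is only a promise of a value.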

Adding to the other answers: this is also explained very well in this MNIST tutorial on the TensorFlow website:
We describe these interacting operations by manipulating symbolic
variables. Let's create one:
x = tf.placeholder(tf.float32, [None, 784])
x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to
run a computation. We want to be able to input any number of MNIST
images, each flattened into a 784-dimensional vector. We represent
this as a 2-D tensor of floating-point numbers, with a shape [None,
784]. (Here None means that a dimension can be of any length.)
We also need the weights and biases for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even
better way to handle it: Variable. A Variable is a modifiable tensor
that lives in TensorFlow's graph of interacting operations. It can be
used and even modified by the computation. For machine learning
applications, one generally has the model parameters be Variables.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
We create these Variables by giving tf.Variable the initial value of
the Variable: in this case, we initialize both W and b as tensors full
of zeros. Since we are going to learn W and b, it doesn't matter very
much what they initially are.

TensorFlow uses three kinds of containers to hold values in a graph:
Constants: hold fixed data that does not change.
Variables: hold values that change during training, for example when an optimizer minimizes a cost function.
Placeholders: training/testing data is passed into the graph through them.

Example snippet:
import numpy as np
import tensorflow as tf
### Model parameters ###
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
### Model input and output ###
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
### loss ###
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
### optimizer ###
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
### training data ###
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
### training loop ###
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)  # set W and b to their (deliberately wrong) initial values
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})
As the name says, a placeholder is a promise to provide a value later.
Variables are simply the training parameters (W (matrix), b (bias)), the same as the normal variables you use in day-to-day programming, which the trainer updates/modifies on each run/step.
A placeholder, in contrast, doesn't require any initial value: when you create x and y, TF doesn't allocate any memory for them. Instead, later, when you feed the placeholders in sess.run() using feed_dict, TensorFlow allocates appropriately sized memory for them (x and y). This unconstrained-ness allows us to feed data of any size and shape.
In a nutshell:
Variable - a parameter you want the trainer (e.g. GradientDescentOptimizer) to update after each step.
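To show what "update after each step" means, the following framework-free sketch repeats the training loop from the example snippet above by hand. It assumes the same data, starting values, learning rate and step count, and it hand-derives the gradients of the sum-of-squares loss; it illustrates the update rule and is not what TensorFlow executes internally.

```python
# Same data, starting values, learning rate and step count as the snippet above.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [0.0, -1.0, -2.0, -3.0]
W, b = 0.3, -0.3          # the "Variables": state that persists across steps
learning_rate = 0.01

for step in range(1000):
    # Residuals of the linear model W * x + b on the fed data.
    residuals = [W * x + b - y for x, y in zip(x_train, y_train)]
    # Gradients of loss = sum(residual ** 2) with respect to W and b.
    grad_W = sum(2.0 * r * x for r, x in zip(residuals, x_train))
    grad_b = sum(2.0 * r for r in residuals)
    # Gradient-descent update: only W and b change; the data never does.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

loss = sum((W * x + b - y) ** 2 for x, y in zip(x_train, y_train))
print(W, b, loss)  # W approaches -1 and b approaches 1, since y = -x + 1 fits exactly
```

The training data plays the role of the placeholders (supplied from outside, never modified), while W and b play the role of the Variables (initialized once, then changed by every step).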
Placeholder demo -
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b # + provides a shortcut for tf.add(a, b)
Execution:
print(sess.run(adder_node, {a: 3, b:4.5}))
print(sess.run(adder_node, {a: [1,3], b: [2, 4]}))
resulting in the output
7.5
[ 3. 7.]
In the first case, 3 and 4.5 are passed to a and b respectively, and then to adder_node, which outputs 7.5. In the second case lists are fed, so the addition is element-wise: first 1 and 2 are added, then 3 and 4 (from a and b).
Relevant reads:
tf.placeholder doc.
tf.Variable doc.
Variable VS placeholder.

Variables
A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program. Variables are manipulated via the tf.Variable class. Internally, a tf.Variable stores a persistent tensor. Specific operations allow you to read and modify the values of this tensor. These modifications are visible across multiple tf.Sessions, so multiple workers can see the same values for a tf.Variable. Variables must be initialized before using.
Example:
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2
This creates a computation graph. The variables (x and y) can be initialized and the function (f) evaluated in a tensorflow session as follows:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)
42
Placeholders
A placeholder is a node (same as a variable) whose value can be supplied in the future. These nodes basically output the value assigned to them at run time. A placeholder node can be created using the tf.placeholder() function, to which you can provide arguments such as the type of the variable and/or its shape. Placeholders are extensively used for representing the training dataset in a machine learning model, as the training dataset keeps changing.
Example:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
Note: 'None' for a dimension means 'any size'.
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
print(B_val_1)
[[6. 7. 8.]]
print(B_val_2)
[[ 9. 10. 11.]
 [12. 13. 14.]]
References:
https://www.tensorflow.org/guide/variables
https://www.tensorflow.org/api_docs/python/tf/placeholder
O'Reilly: Hands-On Machine Learning with Scikit-Learn & Tensorflow

Think of a Variable in TensorFlow as a normal variable like the ones we use in programming languages. We initialize variables, and we can modify them later as well. A placeholder, in contrast, does not require an initial value. A placeholder simply allocates a block of memory for future use. Later, we can use feed_dict to feed data into the placeholder. By default, a placeholder has an unconstrained shape, which allows you to feed tensors of different shapes in a session. You can constrain the shape by passing the optional shape argument, as I have done below.
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32,(3,4))
y = x + 2
sess = tf.Session()
print(sess.run(y)) # will cause an error
s = np.random.rand(3,4)
print(sess.run(y, feed_dict={x:s}))
In machine learning tasks, most of the time we do not know the number of rows in advance, but (let's assume) we do know the number of features or columns. In that case, we can use None.
x = tf.placeholder(tf.float32, shape=(None,4))
Now, at run time, we can feed any matrix with 4 columns and any number of rows.
Also, placeholders are used for input data (they are a kind of variable that we use to feed our model), whereas Variables are parameters, such as weights, that we train over time.
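The meaning of None in a declared shape can be sketched as a simple compatibility check. The helper matches_shape below is hypothetical and written only for this illustration; it is a simplified model of the rule described above, not TensorFlow's own shape-checking code.

```python
def matches_shape(declared, actual):
    """Return True if `actual` fits `declared`, where None means "any size"."""
    if declared is None:              # no shape declared: anything is accepted
        return True
    if len(declared) != len(actual):  # the number of dimensions must agree
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


print(matches_shape((None, 4), (10, 4)))  # True: any number of rows, 4 columns
print(matches_shape((None, 4), (10, 5)))  # False: wrong number of columns
print(matches_shape((3, 4), (2, 4)))      # False: a fixed shape must match exactly
print(matches_shape(None, (7,)))          # True: unconstrained placeholder
```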

Placeholder:
A placeholder is simply a variable that we will assign data to at a later time. It allows us to create our operations and build our computation graph without needing the data. In TensorFlow terminology, we then feed data into the graph through these placeholders.
Initial values are not required, but a default value can be set with tf.placeholder_with_default.
We have to provide the value at run time, like:
a = tf.placeholder(tf.int16)  # no value yet; it is supplied at run time
b = tf.placeholder(tf.int16)  # no value yet; it is supplied at run time
add = tf.add(a, b)  # the op to evaluate
Use it in a session like:
sess.run(add, feed_dict={a: 2, b: 3})  # the values are assigned at run time
Variable :
A TensorFlow variable is the best way to represent shared,
persistent state manipulated by your program.
Variables are manipulated via the tf.Variable class. A tf.Variable
represents a tensor whose value can be changed by running ops on it.
Example : tf.Variable("Welcome to tensorflow!!!")

TensorFlow 2.0 compatible answer: placeholders (tf.placeholder) are not available in TensorFlow 2.x (>= 2.0) by default, because the default execution mode is eager execution.
However, we can still use them in graph mode (with eager execution disabled).
The equivalent of tf.placeholder in version 2.x is tf.compat.v1.placeholder.
The equivalent of tf.Variable in version 2.x is tf.Variable; if you are migrating code from 1.x to 2.x, the equivalent is
tf.compat.v2.Variable.
Please refer to this TensorFlow page for more information about TensorFlow version 2.0.
Please refer to the Migration Guide for more information about migrating from version 1.x to 2.x.

Think of a computation graph. In such a graph, we need input nodes to pass our data into the graph; those nodes should be defined as placeholders in TensorFlow.
Do not think of it as a general Python program. You can write a Python program and do everything explained in the other answers with variables alone, but in a TensorFlow computation graph, to feed your data into the graph, you need to define those nodes as placeholders.

For TF V1:
Constant is with initial value and it won't change in the computation;
Variable is with initial value and it can change in the computation; (so good for parameters)
Placeholder is without initial value and it won't change in the computation. (so good for inputs like prediction instances)
For TF V2, same but they try to hide Placeholder (graph mode is not preferred).

In TensorFlow, a variable is just another tensor (like tf.constant or tf.placeholder). It just so happens that variables can be modified by the computation. tf.placeholder is used for inputs that will be provided externally to the computation at run-time (e.g. training data). tf.Variable is used for inputs that are part of the computation and are going to be modified by the computation (e.g. weights of a neural network).

Related

I cant understand LSTM implementation in tensorflow 1

I have been looking at an implementation of LSTM layers in a neural network architecture. An LSTM layer has been defined in it as given below. I am having trouble understanding this code. I have listed my doubts after the code snippet.
code source:https://gist.github.com/awjuliani/66e8f477fc1ad000b1314809d8523455#file-a3c-py
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(RNN_SIZE,state_is_tuple=True)
c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
state_init = [c_init, h_init]
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
rnn_in = tf.expand_dims(self.h3, [0])
step_size = tf.shape(inputs)[:1]
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
lstm_outputs, lstm_state = tf.nn.dynamic_rnn(
    lstm_cell, rnn_in, initial_state=state_in, sequence_length=step_size,
    time_major=False)
lstm_c, lstm_h = lstm_state
state_out = (lstm_c[:1, :], lstm_h[:1, :])
self.rnn_out = tf.reshape(lstm_outputs, [-1, RNN_SIZE])
Here are my doubts:
I understand we need to initialize random context and hidden vectors to pass to our first LSTM cell. But why do we initialize both c_init, h_init and then c_in, h_in? What purpose do they serve?
How are they different from each other? (same for state_in and state_init?)
Why do we use LSTMStateTuple?

def work(self, max_episode_length, gamma, sess, coord, saver, dep):
    ........
    rnn_state = self.local_AC.state_init
def train(self, rollout, sess, gamma, bootstrap_value):
    ......
    rnn_state = self.local_AC.state_init
    feed_dict = {self.local_AC.target_v: discounted_rewards,
                 self.local_AC.inputs: np.vstack(observations),
                 self.local_AC.actions: actions,
                 self.local_AC.advantages: advantages,
                 self.local_AC.state_in[0]: rnn_state[0],
                 self.local_AC.state_in[1]: rnn_state[1]}
At the beginning of work, and then before training a new batch, the network state is filled with zeros.

I understand we need to initialize a random Context and hidden vectors to pass to our first LSTM cell. But why do initialize both c_init, h_init, and then c_in, h_in. What purpose do they serve? How are they different from each other? (same for state_in and state_init?)
To start using an LSTM, one should initialise its cell state and hidden state, named c and h respectively. For every new input, these states are considered 'empty' and should be initialised with zeros. So we have here
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
Why are there two variables, state_in and state_init? The first is just a pair of placeholders that will be filled with the second at the evaluation stage (i.e., session.run), because state_in doesn't contain any actual values. In other words, NumPy arrays are used during the training phase, and tf.placeholders are used during the phase when one defines the architecture of the network.
TL;DR
Why so? Well, tf1.x (was?) is quite a low-level system. It has the following entities:
tf.Session, aka a computational session: the thing that contains a computational graph (or graphs) and allows the user to provide inputs to the graph via session.run.
tf.Graph, which is a representation of a computational graph. Usually an engineer defines a graph using tf.placeholders and tf.Variables. One can connect them 'just like' math operations:
with tf.Session() as sess:
    a = tf.placeholder(tf.float32, (1,))
    b = tf.Variable(1.0, dtype=tf.float32)
    tf.global_variables_initializer()
    c = a * b
    # ...and so on
tf.placeholders are placeholders, not actual values; they are intended to be filled with actual values at the session.run stage. tf.Variables, well, are for the actual weights of the neural network to be optimized. Why not plain NumPy arrays, but something else? It's because TensorFlow automatically adds each tensor and placeholder to the default computational graph (it's impossible to do the same with NumPy arrays); also, it allows one to define an architecture and then initialize/train it with different inputs, which is good.
So, to do a computation (forward/backward propagation, etc.), one has to set placeholders and variables to some values. To do so, in a simple example, we could do the following:
import tensorflow as tf
with tf.compat.v1.Session() as sess:
    a = tf.compat.v1.placeholder(tf.float32, shape=())
    b = tf.compat.v1.Variable(1.0, dtype=tf.float32)
    init = tf.compat.v1.global_variables_initializer()
    c = a + b
    sess.run(init)
    a_value = 2.0
    result = sess.run([c], feed_dict={a: a_value})
    print("value of [c]:", result)
(I use tf.compat.v1 instead of just tf here because I work in tf2 environment; you could omit it)
Note two things. First, I create an init operation, because in tf1.x it is not enough to create a variable like tf.Variable(1.0); the user also has to 'notify' the framework by creating and running an init operation.
Then I do a computation: I initialize an a_value variable and map it to the placeholder a in the sess.run method. Session.run requires a list of tensors to be calculated as its first argument, and a mapping from the placeholders needed to compute the target tensors to their actual values.
Back to your example: state_in is a placeholder and state_init contains values to be fed into this placeholder somewhere in the code.
It would look like this: sess.run(..., feed_dict={state_in: state_init, ...}).
Why do we use LSTMStateTuple?
Addressing the second part of the question: it looks like TensorFlow developers implemented it for some performance optimization. From the source code:
logging.warning(
    "%s: Using a concatenated state is slower and will soon be"
    "deprecated. Use state_is_tuple=True.", self)
and if state_is_tuple=True, the state should be an LSTMStateTuple. But I'm not 100% sure about it - I don't remember how I used it. After all, LSTMStateTuple is just a collections.namedtuple with two named attributes, c and h.
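The remark that the state tuple is "just a namedtuple" can be illustrated with the standard library alone. ToyLSTMStateTuple below is a hypothetical stand-in defined for this sketch, not the class shipped with TensorFlow, and the list values are placeholders for real state arrays.

```python
import collections

# Hypothetical stand-in for illustration; in TF 1.x the real class is
# tf.nn.rnn_cell.LSTMStateTuple.
ToyLSTMStateTuple = collections.namedtuple("ToyLSTMStateTuple", ("c", "h"))

state = ToyLSTMStateTuple(c=[0.0, 0.0], h=[1.0, 1.0])

c, h = state                # unpacks like a plain tuple, as in `lstm_c, lstm_h = lstm_state`
assert state[0] is state.c  # index access, as in `state_in[0]` in the feed_dict
assert state[1] is state.h
print(state._fields)        # ('c', 'h')
```

This is why the question's code can both index the state (state_in[0], state_in[1]) and unpack it into two names.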

Is it possible to recover parts of values of any layer in Tensorflow?

I'm using the DeepLab V3 structure for an image task, but I made a slight change that adds a channel to the input, so the first CNN layer becomes [7,7,4,64] instead of [7,7,3,64].
I plan to do transfer learning, so I hope to restore all parameters except those for the fourth channel of this first CNN layer. But all four channels are held in one tf.Variable, so I don't know how to restore them with tf.train.Saver. (tf.train.Saver can control which tf.Variable should be restored, but not individual values within a tf.Variable, I think.)
Any ideas?
Some related code is shown below:
Load function
def load(saver, sess, ckpt_path):
    saver.restore(sess, ckpt_path)
Part of main function
# All variables need to be restored
restore = [v for v in tf.global_variables()]
# Set up tf session and initialize variables
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config = config)
init = tf.global_variables_initializer()
sess.run(init)
# Load Variables
loader = tf.train.Saver(var_list = restore)
load(loader, sess, args.restore_from)
In the main function, we can see that the restored variables are controlled by 'restore'. In this case, the first entry of 'restore' is:
<tf.Variable shape=(7,7,4,64) dtype=float32_ref>
But all I want to restore is the first three channels from another model, which has size (7,7,3,64), and to initialize the last channel with a zero initializer.
Is there any function that can help with this?

A possible quick hack could be, instead of creating a variable with the new shape and trying to convert parts of it over, just creating a variable with the part that's missing (so shape=[7,7,1,64]) and concatenating it with your variable and using that as the convolution kernel.
For transfer learning to work properly, this should be zero-inited instead of random variables (which should be fine because the other values break the symmetry), or initialized with values that are very small compared to the pretrained ones (assuming the new channel has the same range of values), otherwise the later layers won't see the distributions they expect.
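Why zero initialization preserves the pretrained behaviour can be checked with a deliberately tiny, framework-free sketch. It assumes a single filter evaluated at a single spatial position, so the kernel reduces to one weight per input channel; the weight and pixel values are made up for illustration, and list concatenation stands in for concatenating the two variables along the input-channel axis.

```python
# Weights of one filter at one spatial position, one value per input channel.
pretrained_rgb = [0.2, -0.5, 0.1]      # stands in for the restored (7,7,3,64) part
new_channel = [0.0]                    # stands in for the new (7,7,1,64) part, zero-initialized
kernel = pretrained_rgb + new_channel  # "concatenate along the input-channel axis"


def response(weights, pixel):
    """Dot product over channels: what a convolution computes at one position."""
    return sum(w * p for w, p in zip(weights, pixel))


rgb_pixel = [0.9, 0.4, 0.3]
rgba_pixel = rgb_pixel + [0.7]         # the same pixel with an extra fourth channel

old = response(pretrained_rgb, rgb_pixel)
new = response(kernel, rgba_pixel)
assert old == new                      # zero weights leave the pretrained output unchanged
```

With a nonzero initial weight on the fourth channel, new would differ from old, which is the distribution shift for later layers that the answer warns about.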

Eager execution get trainable variables

In all the tutorials (including the official TF docs) that I have seen about tfe, the example uses the gradient tape and manually adds all the variables to a list that is passed to the gradient computation, e.g.
variables = [w1, b1, w2, b2]  # manually store all the variables
optimizer = tf.train.AdamOptimizer()
with tf.GradientTape() as tape:
    y_pred = model.predict(x, variables)
    loss = model.compute_loss(y_pred, y)
grads = tape.gradient(loss, variables)  # send them to tape.gradient
optimizer.apply_gradients(zip(grads, variables))
But is this the only way? Even for huge models, do we need to accumulate all the parameters ourselves, or can we somehow access the default graph's variable list?
Trying to access tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
or trainable_variables inside a tfe session gave an empty list.

To the best of my understanding, eager mode in TensorFlow stores information about the model in objects, for example in tf.keras.Model or tf.estimator.Estimator. In the absence of a graph, you can get the list of variables only from there, for example using tf.keras.Model.trainable_variables.
Eager mode, however, can work with a graph object created explicitly. In that case, I think the graph will store the list of variables. Without it, the Keras model object will be the only explicit storage for variables.

Tensorflow: how to replace one operation in the graph by another?

Currently I have the following issue: I used a Placeholder to create the input to my network. It is the start node, the operation with index zero:
placeholder_operation = net.graph.get_operations()[0]
But now I want to replace it with a trainable Variable input, to minimize the loss by changing the input (adversarial samples).
I can do it by rebuilding my graph entirely with my algorithm, replacing the first Placeholder operation with a Variable. But is there a more elegant way to do it?
Also, in general, if I only have a graph of operations (and not the algorithm that constructed it), can I replace an arbitrary node in this graph with some other operation, i.e. delete and insert nodes at any place in the graph?

tf.placeholder_with_default is a pretty nifty option. You can use it to switch out any Tensors, be they Variables, Placeholders or Constants.
import tensorflow as tf
sess = tf.InteractiveSession()
trainable_variable = tf.Variable(3.)
sess.run(tf.global_variables_initializer())
my_placeholder = tf.placeholder_with_default(trainable_variable, trainable_variable.shape)
b = tf.constant(2.)
c = my_placeholder * b
# Use this when you want to use `trainable_variable`
sess.run(c) # 6.0
# Use this when you want to use `my_placeholder`
sess.run(c, {my_placeholder: 5.}) # 10.0

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what is said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so this returns the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to complete the loop. But this means that I have to use the fit() function to train the model online with just the target as the final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.
