How to output an array as actions in a Deep Reinforcement Learning Model? - tensorflow

I am currently working on a custom multi-agent DQN environment, and my action space is a list, for example [2,4,3,2,1],
where actionlist[0] is the action taken by the first agent, actionlist[1] is the action taken by the second, and so on.
Normally, for a single-value output, the code looks like this:
states = env.observation_space.shape
actions = env.action_space.n
def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model
...some model building
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn
However, this doesn't work with an output that is not a single value.
How do I go about doing that?

First, you need to know that there are two types of action space (discrete and continuous) and two families of reinforcement learning methods (1. value-based and 2. policy-based or combined policy/value-based).
DQN is a value-based reinforcement learning method that estimates a value for each discrete action given a state vector. It is therefore designed only for discrete action spaces (e.g., the action can only be one of the values in the vector [1,2,3]).
If the "not a single value" action in your description can take any value in a given range (e.g., 2.0120321 in the range [0,10]), it is a continuous action space problem. For continuous action spaces, you need a policy-based reinforcement learning method, such as actor-critic or other popular algorithms built on top of actor-critic, such as PPO and DDPG.
I suggest getting a grounding in reinforcement learning by reading the most popular textbook in this area, Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto.
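If every entry of the action list is itself a discrete choice, one possible workaround (not something the answer above proposes) is to keep DQN and flatten the joint action into a single index, so the network still has one output per joint action. The sketch below assumes five agents with five actions each; `n_actions_per_agent`, `encode_action` and `decode_action` are hypothetical names.

```python
import numpy as np

# Assumption: five agents, each choosing one of 5 discrete actions (0..4).
n_actions_per_agent = (5, 5, 5, 5, 5)
# Size of the joint action space; this would be the `actions` value given to the model.
n_joint_actions = int(np.prod(n_actions_per_agent))

def encode_action(action_list):
    """Map a per-agent action list to a single joint-action index."""
    return int(np.ravel_multi_index(tuple(action_list), n_actions_per_agent))

def decode_action(index):
    """Map a joint-action index back to the per-agent action list."""
    return [int(a) for a in np.unravel_index(index, n_actions_per_agent)]

joint_index = encode_action([2, 4, 3, 2, 1])
action_list = decode_action(joint_index)
```

The joint space grows exponentially with the number of agents (3125 outputs here), so this is only practical for small problems.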


How to disable e.g. Dropout in tf.keras.Model to generate activation maximization images using transfer learning

I am using transfer learning and keras.applications.InceptionV3. I managed to train the model successfully.
However, when I want to generate "activation maximization" images (i.e. the input image that maximizes the activation of one of the custom classes), I struggle to use the pre-trained model, since I do not manage to use it in "fit" mode with all dropouts etc. disabled.
What I do is combine the pre-trained model in a tf.keras.Sequential to do gradient descent on the weights of the first layer (the input image).
Despite setting base_model.trainable = False, it seems as if the pre-trained model is put into training mode (although its weights are not updated) when it is used inside the outer sequential model.
Is there any way to force the base_model (a child of a Sequential) to be in "predict" mode when calling fit on the outer model?
I just came across the same question. After reading some documentation and looking at the source code of TensorFlow's implementations of tf.keras.layers.Layer, tf.keras.layers.Dense, and tf.keras.layers.BatchNormalization, I came to the following understanding.
If training=False is passed when calling the layer, it runs in inference mode. This has nothing to do with the attribute trainable, which means something different. It would probably cause less confusion if the argument had been called training_mode instead.
When doing transfer learning or fine-tuning, training=False should be passed when calling the base model itself. As far as I have seen so far, this only affects layers like tf.keras.layers.Dropout and tf.keras.layers.BatchNormalization and has no effect on the other layers.
Running in inference mode via training=False results in tf.keras.layers.Dropout not applying dropout at all.
As tf.keras.layers.Dropout has no trainable weights, setting the attribute trainable = False has no effect on it at all.
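A minimal sketch of the point above, assuming TensorFlow 2.x. The one-layer `base_model` is only a stand-in for a real pre-trained network such as InceptionV3, and the names `outer`, `out_train` and `out_infer` are made up for illustration. The base model is called with training=False inside a functional model, so its Dropout layer should stay inactive even when the outer model itself runs in training mode.

```python
import numpy as np
import tensorflow as tf

# Stand-in for a pre-trained base model: a single Dropout layer (assumption).
base_model = tf.keras.Sequential([tf.keras.layers.Dropout(0.5)])
base_model.trainable = False  # freezes weights only; does not select inference mode

inputs = tf.keras.Input(shape=(8,))
# training=False pins the base model to inference mode inside the outer model.
outputs = base_model(inputs, training=False)
outer = tf.keras.Model(inputs, outputs)

x = tf.ones((4, 8))
out_train = outer(x, training=True).numpy()   # outer model in training mode
out_infer = outer(x, training=False).numpy()  # outer model in inference mode

# For comparison: a bare Dropout layer called in training mode does alter its input.
dropped = tf.keras.layers.Dropout(0.5)(tf.ones((1, 1000)), training=True).numpy()
```

If the base model were instead added directly to a Sequential and trained with fit, it would receive the outer training flag, which matches the behaviour described in the question.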

Implementing stochastic forward passes in part of a neural network in Keras?

My problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these, e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a Keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time.
def prediction_wrapper(model, name=None):
    # Example code.
    # Arguments
    #   model: the training model
    #   name: optional name for the returned model
    regression = model.outputs[0]
    classification = model.outputs[1]
    predictions = ...      # TODO: perform several stochastic forward passes (dropout during train and test time) here
    avg_predictions = ...  # TODO: combine predictions here, e.g. by computing the mean
    outputs = ...          # TODO: do some processing on avg_predictions
    return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use Keras with a TensorFlow backend.
I appreciate any help!
As I understand it, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the dropout mask is different for each sample in a batch. Since backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch size of one. You could update a global counter inside your custom loss function and return a non-zero loss only when you've averaged the updates the way you want. I don't know if this would work; it's just an idea.
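The "duplicate the sample into a batch" idea can be illustrated with a small NumPy stand-in (this is not Keras code, and it averages predictions rather than weight updates). It assumes, as the answer does, that each row of the batch gets its own independent dropout mask; the weights `W` and the function `stochastic_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # hypothetical weights of a single linear layer

def stochastic_forward(batch, rate=0.5):
    """One forward pass with an independent dropout mask for every row."""
    mask = rng.random(batch.shape) >= rate
    dropped = batch * mask / (1.0 - rate)  # inverted dropout scaling
    return dropped @ W

sample = np.ones((1, 4))
n_passes = 200
# Duplicate the same sample so one batched call yields n_passes stochastic predictions.
batch = np.repeat(sample, n_passes, axis=0)
predictions = stochastic_forward(batch)     # shape (n_passes, 3)
mean_prediction = predictions.mean(axis=0)  # combined prediction
std_prediction = predictions.std(axis=0)    # spread across the stochastic passes
```

In a Keras graph the same pattern would amount to repeating the input along the batch axis and reducing over that axis afterwards, provided dropout is active at prediction time.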

Optimizing number of optimum features

I am training a neural network using Keras. Every time I train my model, I use a slightly different set of features selected using tree-based feature selection via ExtraTreesClassifier(). After each training run, I compute the AUROC on my validation set and then go back in a loop to train the model again with a different set of features. This process is very inefficient, and I want to select the optimal number of features using an optimization technique available in some Python library.
The function to be optimized is the cross-validation AUROC, which can only be calculated after training the model on the selected features. The features are selected via ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto'). Here we see that the objective function does not depend directly on the parameters to be optimized. The objective function, AUROC, comes from training the neural network, and the neural network takes as input the features that are selected on the basis of their importance from ExtraTreesClassifier.
So in a way, the parameters for which I optimize AUROC are n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto' or some other variables in ExtraTreesClassifier. These are not directly related to AUROC.
You should combine GridSearchCV and Pipeline.
Use Pipeline when you need to run a set of steps in sequence to get the optimal config.
For example, you have these steps to run:
1. Select KBest feature(s)
2. Use classifier DecisionTree or NaiveBayes
By combining GridSearchCV and Pipeline, you can select which features are best for a particular classifier, the best config for that classifier, and so on, based on the scoring criterion.
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
# Set your configuration options
param_grid = [
    {
        'classify': [DecisionTreeClassifier()],  # first option: use DT
        'kbest__k': range(1, 22),  # range of k in SelectKBest(k=...)
        # classifier-specific configs
        'classify__criterion': ('gini', 'entropy'),
        'classify__min_samples_split': range(2, 10),
        'classify__min_samples_leaf': range(1, 10),
    },
    {
        'classify': [GaussianNB()],  # second option: use NB
        'kbest__k': range(1, 22),  # range of k in SelectKBest(k=...)
    },
]
# DT is the default here, but GridSearchCV replaces it with each 'classify' option.
pipe = Pipeline(steps=[("kbest", SelectKBest()), ("classify", DecisionTreeClassifier())])
# This is where GridSearchCV does the work; it may take a while, especially with more than one classifier to evaluate.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=10, scoring='f1')
grid.fit(features, labels)  # `features` is an assumed name for your feature matrix
# Print the best params so you can reuse the optimal settings later without rerunning the grid search (by commenting out the grid search lines).
print(grid.best_params_)
# You can now build the pipeline again with the best configs to build your model
pipe = Pipeline(steps=[("kbest", SelectKBest(k=12)), ("classify", DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, min_samples_split=9))])
Hope this helps
The flow of my program has two stages.
In the first stage, I use sklearn's ExtraTreesClassifier along with the SelectFromModel method to select the most important features. Note that ExtraTreesClassifier takes many parameters, such as n_estimators, and different values of n_estimators yield different sets of important features via SelectFromModel. This means that I can optimize n_estimators to get the best features.
In the second stage, I train my Keras NN model on the features selected in the first stage. I am using AUROC as the score for the grid search, but this AUROC is calculated using the Keras-based neural network. I want to use grid search over n_estimators in my ExtraTreesClassifier to optimize the AUROC of the Keras neural network. I know I have to use Pipeline, but I am confused about how to implement both together.
I don't know where to put Pipeline in my code. I am getting an error which says: TypeError: estimator should be an estimator implementing 'fit' method, <function fs at 0x0000023A12974598> was passed
I concatenate the CV set and the train set so that I can select the most important features in both CV and train together.
frames11 = [train_x_upsampled, cross_val_x_upsampled]
train_cv_x = pd.concat(frames11)
frames22 = [train_y_upsampled, cross_val_y_upsampled]
train_cv_y = pd.concat(frames22)
def fs(n_estimators):
    m = ExtraTreesClassifier(n_estimators = tree_number)
    m.fit(train_cv_x, train_cv_y)
    sel = SelectFromModel(m, prefit=True)
The code below is to get the names of the selected important features
feature_idx = sel.get_support()
feature_name = train_cv_x.columns[feature_idx]
feature_name =pd.DataFrame(feature_name)
X_new = sel.transform(train_cv_x)
X_new =pd.DataFrame(X_new)
So now the selected important features are in the data frame X_new. In the
code below, I again divide the data into train and CV sets, but this time
only with the important features selected.
train_selected_x = X_new.iloc[0:train_x_upsampled.shape[0], :]
cv_selected_x = X_new.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0]+cross_val_x_upsampled.shape[0], :]
train_selected_y = train_cv_y.iloc[0:train_x_upsampled.shape[0], :]
cv_selected_y = train_cv_y.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0]+cross_val_x_upsampled.shape[0], :]
Now with this new data which only contains the important features,
I am training a neural network as below.
def create_model():
    model = Sequential()
    model.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_normal', activation='relu'))
    model.add(Dense(10, kernel_initializer='glorot_normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='glorot_normal', activation='sigmoid'))
    optimizer = keras.optimizers.Adam(lr=0.001)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model  # build_fn must return the compiled model
seed = 7
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=400, verbose=0)
param_grid = dict(n_estimators=n_estimators)
grid = GridSearchCV(estimator=fs, param_grid=param_grid, scoring='roc_auc', cv=PredefinedSplit(test_fold=my_test_fold), n_jobs=1)
grid_result = grid.fit(np.concatenate((train_selected_x, cv_selected_x), axis=0), np.concatenate((train_selected_y, cv_selected_y), axis=0))
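One way the two stages might be combined, as a rough sketch: the feature selector becomes a pipeline step, so GridSearchCV receives an estimator with a fit method instead of a plain function. The data here is synthetic, and LogisticRegression is only a stand-in for the Keras network (a scikit-learn-compatible wrapper around the Keras model would take its place). The step names "select" and "classify" are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data standing in for the real training set (assumption).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline(steps=[
    # Stage 1: tree-based feature selection, refitted on every CV training fold.
    ("select", SelectFromModel(ExtraTreesClassifier(random_state=0))),
    # Stage 2: stand-in classifier; a wrapped Keras model would go here.
    ("classify", LogisticRegression(max_iter=1000)),
])

# Nested parameter name: <step>__<attribute of SelectFromModel>__<parameter of the trees>.
param_grid = {"select__estimator__n_estimators": [10, 50]}

grid = GridSearchCV(pipe, param_grid=param_grid, scoring="roc_auc", cv=3)
grid.fit(X, y)
best_n_estimators = grid.best_params_["select__estimator__n_estimators"]
```

Because the selection step is refitted inside each fold, the validation data never influences which features are chosen, unlike selecting on the concatenated train and CV sets.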

tf.zeros vs tf.placeholder as RNN initial state

TensorFlow newbie here! I understand that Variables are trained over time, while placeholders are used for input data that doesn't change as your model trains (like input images and class labels for those images).
I'm trying to implement the forward propagation of an RNN using TensorFlow and am wondering what type I should use to store the output of the RNN cell. The numpy RNN implementation uses
hiddenStates = np.zeros((T, self.hidden_dim)) #T is the length of the sequence
Then it iteratively saves the output into the np.zeros array.
In the case of TF, which one should I use, tf.zeros or tf.placeholder?
What is the best practice in this case? I think it should be fine to use tf.zeros, but I wanted to double check.
First of all, it is important to understand that everything inside TensorFlow is a Tensor. So when you perform some kind of computation (e.g. an RNN implementation like outputs = rnn(...)), the output of this computation is returned as a Tensor. You therefore don't need to store it inside any kind of structure. You can retrieve it by running the corresponding node (i.e. output), like sess.run(output, feed_dict).
That said, I think you need to take the final state of an RNN and provide it as the initial state of a subsequent computation. There are two ways:
A) If you are using RNNCell implementations: during the construction of your model, you can construct the zero state like this:
cell = (some RNNCell implementation)
initial_state = cell.zero_state(batch_size, tf.float32)
B) If you are implementing your own stuff: define the state as a zero Tensor:
initial_state = tf.zeros([batch_size, hidden_size])
Then, in both cases you will have something like:
output, final_state = rnn(input, initial_state)
In your execution loop, you can initialize your state first and then provide the final_state as the initial_state inside your feed_dict:
state = sess.run(initial_state)
for step in range(epochs):
    feed_dict = {initial_state: state}
    _, state = sess.run((train_op, final_state), feed_dict)  # train_op is an assumed name for your training op
How you actually construct your feed_dict depends on the implementation of the RNN.
For a BasicLSTMCell, for example, a state is an LSTMStateTuple object and you need to provide both c and h:
feed_dict = {initial_state.c: state.c, initial_state.h: state.h}
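The state-carrying loop can be checked outside TensorFlow with a small NumPy sketch. It assumes a plain tanh RNN; the weights `W_x` and `W_h` and the function `rnn` are hypothetical. Starting from a zero state and feeding each chunk's final state in as the next chunk's initial state should give the same result as processing the whole sequence at once.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size, batch_size = 3, 4, 2
W_x = rng.normal(size=(input_dim, hidden_size))    # input-to-hidden weights (assumed)
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights (assumed)

def rnn(inputs, initial_state):
    """inputs: (T, batch, input_dim). Returns the outputs (T, batch, hidden) and the final state."""
    state = initial_state
    outputs = []
    for x_t in inputs:
        state = np.tanh(x_t @ W_x + state @ W_h)
        outputs.append(state)
    return np.stack(outputs), state

sequence = rng.normal(size=(6, batch_size, input_dim))

# Zero initial state, then carry the final state from chunk to chunk.
state = np.zeros((batch_size, hidden_size))
chunk_outputs = []
for chunk in np.split(sequence, 3):
    out, state = rnn(chunk, state)
    chunk_outputs.append(out)
chunked = np.concatenate(chunk_outputs)

# Reference: process the full sequence in one call.
full, full_state = rnn(sequence, np.zeros((batch_size, hidden_size)))
```

The outputs are simply returned by the computation; the zero array is only needed for the initial state, which mirrors the role of tf.zeros or cell.zero_state above.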

Tensorflow RNN weight matrices initialization

I'm using bidirectional_rnn with GRUCell but this is a general question regarding the RNN in Tensorflow.
I couldn't find how to initialize the weight matrices (input to hidden, hidden to hidden). Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?
EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I don't currently know how to do that without saving all the states and restoring the entire model.
How to initialize weight matrices for RNN?
I believe people use random normal initialization for the weight matrices of RNNs. Check out the example in the TensorFlow GitHub repo. The notebook is a bit long, but it contains a simple LSTM model that uses tf.truncated_normal to initialize weights and tf.zeros to initialize biases (I have also tried tf.ones for the biases before, and it seemed to work as well). I believe the standard deviation is a hyperparameter you can tune yourself. Weight initialization is sometimes important for gradient flow. As far as I know, though, the LSTM itself is designed to handle the vanishing gradient problem (and gradient clipping helps with the exploding gradient problem), so perhaps you don't need to be super careful with the choice of std_dev for an LSTM. I've read papers recommending Xavier initialization (see the TF API doc for the Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it for RNNs, but you could try it there too to see if it helps.
Now to follow up on @Allen's answer and your follow-up question left in the comments.
How to control initialization with variable scope?
Take the simple LSTM model in the TensorFlow GitHub Python notebook that I linked to as an example.
Specifically, if I wanted to refactor the LSTM part of that code using variable scope control, I might write something like the following:
import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope / tf.get_variable)
def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''initialize LSTMcell weights and biases, set variables to reuse mode'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                # Pass the initializer by keyword: the third positional argument of get_variable is dtype.
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                gate_scope.reuse_variables()  # this line can probably be omitted, because setting the 'LSTMcell' scope to reuse mode below also turns on reuse for all of its child scopes
        scope.reuse_variables()
def get_scope_variables(scope_name, variable_names):
    '''a helper function to fetch variables based on scope_name and variable_names'''
    variables = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            var = tf.get_variable(var_name)
            variables[var_name] = var
    return variables
def LSTMcell(i, o, state):
    '''a function for performing the LSTMcell computation'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            variables = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, variables['wx']) + tf.matmul(o, variables['wt']) + variables['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state
The usage of the refactored code would be something like the following:
initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)
The refactored code may look less straightforward, but using variable scope control ensures scope encapsulation and allows flexible control over variables (in my opinion at least).
How to pre-train some LSTMs and use their weights in a subsequent model without saving all the states and restoring the entire model?
Assuming you have a pre-trained model frozen and loaded in, if you want to use its frozen 'wx', 'wt' and 'bi', you can simply find their parent scope names and variable names, then fetch the variables using a structure similar to the get_scope_variables function:
with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)
The TensorFlow documentation on variable scope and sharing variables explains this in more detail. I hope this is helpful.
The RNN models will create their variables with get_variable, and you can control the initialization by wrapping the code which creates those variables with a variable_scope and passing a default initializer to it. Unless the RNN specifies one explicitly (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.
You should also be able to share model weights by declaring the second model and passing reuse=True to its variable_scope. As long as the namespaces match up, the new model will get the same variables as the first model.
A simple way to initialize all kernel weights with a certain initializer is to pass the initializer to tf.variable_scope(). For example:
with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
    basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
    outputs, state = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
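The two mechanisms described in the answers above (a default initializer set on the scope, and sharing through reuse=True) can be seen in isolation in the sketch below. It assumes TensorFlow 2.x with the tf.compat.v1 API in graph mode, since tf.contrib is not available there; the variable name "w" and the constant initializer are chosen only to make the result easy to check.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style variable scopes under TensorFlow 2.x (assumption)

with tf1.Graph().as_default():
    # The scope's initializer becomes the default for get_variable calls inside it.
    with tf1.variable_scope("rnn", initializer=tf1.constant_initializer(0.5)):
        w = tf1.get_variable("w", shape=[2, 3])
    # Re-entering the same scope with reuse=True fetches the existing variable.
    with tf1.variable_scope("rnn", reuse=True):
        w_shared = tf1.get_variable("w")
    same_variable = w is w_shared
    variable_name = w.name
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        w_value = sess.run(w)
```

An RNN cell built inside such a scope creates its kernels through the same get_variable mechanism, which is why the scope-level initializer applies to them unless the cell specifies its own.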