My code looks like:
import numpy as np
import tensorflow as tf
N, num_ckfs = 16, 5
init_variances = tf.placeholder(tf.float64, shape=[ num_ckfs, N],name='inital_variances')
init_states = tf.placeholder(tf.float64, shape=[num_ckfs, N], name='init_states')
#some more code
predicted_state = prior_state_expanded + kalman_gain * diff_expanded
error_covariance = sum_cov_cholesky + tf.batch_matmul(kg , kalman_gain, adj_x=True)
projected_output = tf.batch_matmul(predicted_state,input_vectors_extra, adj_y=True)
session = tf.Session()
# read data from input file
init_var = [10 for i in range(N)]
init_var_ckfs = [init_var for i in range(num_ckfs)]
init_state = [0 for i in range(N)]
init_state_ckfs = [init_state for i in range(num_ckfs)]
for timestep in range(10):
    out = session.run([projected_output, predicted_state, error_covariance], {init_variances: init_var_ckfs, init_states: init_state_ckfs})
    init_state_ckfs = np.array([i.tolist()[0] for i in out[1]])
    init_var_ckfs = np.array([i.diagonal().tolist() for i in out[2]])
This code runs a Cubature Kalman Filter (CKF) in batched mode. For example:
num_ckfs = 5
means that this code will run 5 CKFs in parallel. Now, what I would like to do is to distribute the workload to multiple nodes depending upon the value of num_ckfs. For example, if I pass num_ckfs as an argument to the code, and it is set to 20,000, then I would distribute the workload to 4 nodes running 5000 each.
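As a rough illustration of the split described above (this is only the bookkeeping, not TensorFlow's distributed API), the per-node share of filters could be computed up front. The helper name `split_workload` and its return format are assumptions made for this sketch.

```python
def split_workload(num_ckfs, num_nodes):
    """Return one (start, stop) index range per node, sizes differing by at most 1."""
    base, extra = divmod(num_ckfs, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional filter each.
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


shards = split_workload(20000, 4)
print(shards)
```

Each node would then only need placeholders of shape [stop - start, N] and its own slice of the initial states and variances.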
I would like to do this using the distributed version of TensorFlow. Can someone please give me some hints on how this could be achieved? Ideally, I would only have to execute the code on a single node, and it would then be distributed to as many nodes as defined in


How to perform custom operations in between keras layers?

I have a neural network with one input and one output, and I need to perform a small operation in between. I have two inputs (drawn from the same distribution, with either mean 0 or mean 1) that I need to feed to the network one at a time, and then compare the two outputs. After the comparison, I generate the model's final prediction. The implementation is as follows:
from tensorflow import keras
import tensorflow as tf
import numpy as np
#define network
x1 = keras.Input(shape=(1,), name="x1")
x2 = keras.Input(shape=(1,), name="x2")
model = keras.layers.Dense(20)
model1 = keras.layers.Dense(1)
x11 = model1(model(x1))
x22 = model1(model(x2))
After this I need to perform the following operation:
if x11>=x22:
Finally I need to do:
out = Vm - 0.5
out= keras.activations.sigmoid(out)
model = keras.Model([x1,x2], out)
tf.keras.utils.plot_model(model) #visualize model
I have pairs of normally distributed data with the same mean (mean 0 and mean 1), generated as below:
#Generating training dataset
from scipy.stats import skewnorm
n=1000 #sample each
s = 1 # scale to change o/p range
X1_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X1_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X2_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X2_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X1_train = list(X1_0) + list(X1_1) #append both data
X2_train = list(X2_0) + list(X2_1) #append both data
y_train = [x for x in (0,1) for i in range(0, n)] #make Y for above conditions
#reshape to proper format
X1_train = np.array(X1_train).reshape(-1,1)
X2_train = np.array(X2_train).reshape(-1,1)
y_train = np.array(y_train)
#train model
model.fit([X1_train, X2_train], y_train, epochs=10)
I am not able to run the program if I include the operation
if x11>=x22:
in between the layers. If I work directly with the maximum of the outputs:
Vm = keras.layers.Maximum()([x11,x22])
the program works fine. But I need to select either x1 or x2 based on the values of x11 and x22.
I guess the problem is that the comparison is evaluated while the model structure is being defined, when x11 and x22 do not yet have values. I am totally new to all of this, so I could not resolve it. I would greatly appreciate any help or suggestions. Thank you.
You can add this functionality via a Lambda layer.
Vm = tf.keras.layers.Lambda(lambda x: tf.where(x[0]>=x[1], x[2], x[3]))([x11, x22, x1, x2])
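To make the selection rule concrete, here is a small sketch of what the `tf.where` call computes per batch row. It uses NumPy's `np.where` as a stand-in, since for three same-shaped arguments it performs the same element-wise selection; the numeric values are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical network outputs for a batch of three input pairs.
x11 = np.array([[0.2], [0.9], [0.5]])
x22 = np.array([[0.7], [0.1], [0.5]])
# The raw inputs that produced those outputs.
x1 = np.array([[1.0], [2.0], [3.0]])
x2 = np.array([[10.0], [20.0], [30.0]])

# Row by row: take x1 where x11 >= x22, otherwise take x2.
vm = np.where(x11 >= x22, x1, x2)
print(vm.ravel().tolist())
```

Unlike a Python `if`, this never asks for a single truth value of a whole tensor, which is why it can be used while the graph is still symbolic.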

Multi-GPU TFF simulation errors "Detected dataset reduce op in multi-GPU TFF simulation"

I ran my code for an emotion detection model using a TensorFlow Federated simulation. My code works perfectly fine using CPUs only. However, I received this error when trying to run TFF with GPUs.
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iteration.Reduce op will be functional after b/159180073.
What is this error about and how can I fix it? I tried to search many places but found no answer.
Here is the call stack, if it helps. It is very long, so I pasted it at this link:
Here is the code containing iterative_process
def startTraining(output_file):
iterative_process = tff.learning.build_federated_averaging_process(
client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
flstate = iterative_process.initialize()
evaluation = tff.learning.build_federated_evaluation(model_fn)
curr_round_result = [0,0,100,0,100,0]
min_val_loss = 100
for round in range(1,ROUND_COUNT + 1):
available_users = fetch_available_users_and_increase_time(ROUND_DURATION_AVERAGE + random.randint(-ROUND_DURATION_VARIATION, ROUND_DURATION_VARIATION + 1))
if(len(available_users) == 0):
train_data = make_federated_data(available_users, 'train')
flstate, metrics = iterative_process.next(flstate, train_data)
val_data = make_federated_data(available_users, 'val')
val_metrics = evaluation(flstate.model, val_data)
curr_round_result[0] = round
curr_round_result[1] = len(available_users)
curr_round_result[2] = metrics['train']['loss']
curr_round_result[3] = metrics['train']['sparse_categorical_accuracy']
curr_round_result[4] = val_metrics['loss']
curr_round_result[5] = val_metrics['sparse_categorical_accuracy']
Here is the code for make_federated_data
def make_federated_data(users, dataset_type):
    offset = 0
    if(dataset_type == 'val'):
        offset = train_size
    elif(dataset_type == 'test'):
        offset = train_size + val_size
    for id in users:
        if(id + offset not in LOADED_USER):
            LOADED_USER[id + offset] = getDatasetFromFilePath(filepaths[id + offset])
    return [
        LOADED_USER[id + offset]
        for id in users
    ]
TFF does support Multi-GPU, and as the error message says one of two things is happening:
The code is using tff.learning but with the default use_experimental_simulation_loop argument value of False. With multiple GPUs, this must be set to True when using APIs such as tff.learning.build_federated_averaging_process. For example, calling with:
training_process = tff.learning.build_federated_averaging_process(
..., use_experimental_simulation_loop=True)
The code contains a custom dataset.reduce(...) call somewhere. This must be replaced with Python code that iterates over the dataset. For example:
result = dataset.reduce(initial_state=0, reduce_func=lambda s, x: s + x)
becomes:
s = 0
for x in iter(dataset):
    s += x
I realized that TFF does not yet support multiple GPUs. Therefore, we need to limit the number of visible GPUs to just one, using:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
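A minimal sketch of that workaround follows. The environment variable is read when CUDA devices are initialized, so it is safest to set it before importing TensorFlow; the helper name `limit_visible_gpus` is hypothetical and only wraps the assignment shown above.

```python
import os


def limit_visible_gpus(device_ids):
    """Expose only the given GPU indices to CUDA-based libraries."""
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before TensorFlow (or any other CUDA library) is imported.
limit_visible_gpus([0])
print(os.environ["CUDA_VISIBLE_DEVICES"])
```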

Tensorflow operation to combine Iterator string handles within a single session feed_dict

I would like to generate minibatches with varying combinations of multiple datasets in a manner that uses the data api and does not cause tensor leakage (i.e., increasing the number of graph ops over time). For example, minibatch 1 might be a1, a2, b1, b2 followed by minibatch 2 with a3, a4, c1, c2.
Is it possible to run a single session on multiple initialized dataset iterators via the "string handle feed_dict method" (see feedable at TF )? Is there an op to combine two Iterator.string_handle objects? I have a minimal working example below that shows my issue at the end, after the sys.exit.
import tensorflow as tf # v.1.4
import sys
# Predetermine minibatch size.
num_per_class = 6
# Create example datasets.
ds0 =, 100, 2)
ds1 =, 101, 2)
# Minibatchify. Note: could use adjustable tensor for minibatch size.
ds0 = ds0.apply(
ds1 = ds1.apply(
# Run forever.
ds0 = ds0.repeat()
ds1 = ds1.repeat()
# Dataset iterators.
ds0_itr = ds0.make_initializable_iterator()
ds1_itr = ds1.make_initializable_iterator()
# Switcher handle placeholder, iterator and ultimate minibatch datums.
switcher_h = tf.placeholder(tf.string, shape=[])
switcher_h_itr =,
mb_datums = switcher_h_itr.get_next()
# Start session.
sess = tf.Session()
# Dataset iterator handles.
ds0_h = sess.run(ds0_itr.string_handle())
ds1_h = sess.run(ds1_itr.string_handle())
# *Separate* dataset feed_dicts.
ds0_fd = {switcher_h: ds0_h}
ds1_fd = {switcher_h: ds1_h}
# Initialize dataset iterators.
sess.run([ds0_itr.initializer, ds1_itr.initializer])
# Print some datums from either (XOR) dataset.
print('ds0 data: {}'.format(sess.run(mb_datums, ds0_fd)))
print('ds1 data: {}'.format(sess.run(mb_datums, ds1_fd)))
ds01_fd = {switcher_h: OP_TO_COMBINE_STRING_HANDLES(ds0_h, ds1_h)}
print('ds0+ds1: {}'.format(sess.run(mb_datums, ds01_fd)))
I know it's old, but for others who get to this question as I did and don't want to figure it out themselves: here's a minimal example that uses one dataset to dynamically select or "get_next()" from one of two other datasets:
import numpy as np
import tensorflow as tf
x = np.full(100, 1)
y = np.full(100, 2)
x_i =
y_i =
with tf.Session() as sesh:
[x_h, y_h] = sesh.run([x_i.string_handle(), y_i.string_handle()])
z_d =
z_d = x: tf.gather([x_h, y_h], tf.cast(tf.round(x), tf.int32)))
z_i = z_d.make_one_shot_iterator()
picker_i =, tf.int64).get_next()
for i in range(100):
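Stripped of TensorFlow, the idea in this answer (a "picker" sequence decides which source the next minibatch is pulled from) can be sketched with plain Python iterators. This is only an analogy under stated assumptions: `switched_batches`, the two stand-in sources, and the picker values are all hypothetical and not taken from the answer's code.

```python
import itertools


def switched_batches(sources, picker, batch_size):
    """Yield one minibatch per picker entry, drawn from the selected source."""
    iterators = [iter(src) for src in sources]
    for index in picker:
        # Pull the next `batch_size` items from the chosen iterator only;
        # the other iterators keep their position untouched.
        yield list(itertools.islice(iterators[index], batch_size))


evens = itertools.cycle(range(0, 10, 2))  # stands in for one repeated dataset
odds = itertools.cycle(range(1, 10, 2))   # stands in for the other
batches = list(switched_batches([evens, odds], picker=[0, 1, 1, 0], batch_size=3))
print(batches)
```

Each source keeps its own position between picks, which is the same property the per-dataset iterators provide in the TensorFlow version.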

If I don't want to train in batches and my state is a vector, what should my tensors have for a shape?

I'm trying to use TensorFlow to solve a reinforcement learning problem. I created a gym environment of my own. The state is a one-dimensional array (size 224) and there are 170 actions to choose from (0...169). I do not want to train in batches. What I want is to get the simplest possible version of the RL problem running with TensorFlow.
My main problem, I guess, is the dimensions. I assumed that TF would allow me to input the state as a 1D tensor, but then I get an error when I want to calculate W*input=action. The dimension errors make it hard to know what's right. Also, examples on the web focus on training from images, in batches.
I started from this tutorial, but there the state is encoded differently, which again makes it hard to follow (especially since I'm not really familiar with Python).
import gym
import numpy as np
import random
import tensorflow as tf
env = gym.make('MyOwnEnv-v0')
n_state = 224
n_action = 170
sess = tf.InteractiveSession()
# Implementing the network itself
inputs1 = tf.placeholder(shape=[1,n_state],dtype=tf.float32)
W = tf.Variable(tf.random_uniform([n_state,n_action],0,0.01))
Qout = tf.transpose(tf.matmul(inputs1,W))
predict = tf.reshape(tf.argmax(Qout,1), [n_action,1])
#Below we obtain the loss by taking the sum of squares difference between the target and prediction Q values.
nextQ = tf.placeholder(shape=[n_action,1],dtype=tf.float32)
loss = tf.reduce_sum(tf.square(nextQ - Qout))
trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
updateModel = trainer.minimize(loss)
# Training the network
init = tf.global_variables_initializer()
print("input: ", inputs1.get_shape()
, "\nW: ", W.get_shape()
, "\nQout: ", Qout.get_shape()
, "\npredict:", predict.get_shape()
, "\nnextQ: ", nextQ.get_shape()
, "\nloss: ", loss.get_shape())
# Set learning parameters
y = .99
e = 0.1
num_episodes = 2000
#create lists to contain total rewards and steps per episode
jList = []
rList = []
with tf.Session() as sess:
    sess.run(init)
    for i in range(num_episodes):
        #Reset environment and get first new observation
        s = env.reset()
        rAll = 0
        d = False
        j = 0
        #The Q-Network
        while j < 99:
            j += 1
            #Choose an action by greedily (with e chance of random action) from the Q-network
            a,allQ = sess.run([predict,Qout],feed_dict={inputs1:s})
            if np.random.rand(1) < e:
                a = env.action_space.sample()
            #Get new state and reward from environment
            s1,r,d,_ = env.step(a)
            #Obtain the Q' values by feeding the new state through our network
            Q1 = sess.run(Qout,feed_dict={inputs1:s1})
            #Obtain maxQ' and set our target value for chosen action.
            maxQ1 = np.max(Q1)
            targetQ = allQ
            #targetQ[0,a[0]] = r + y*maxQ1
            targetQ[a,0] = r + y*maxQ1
            #Train our network using target and predicted Q values
            _,W1 = sess.run([updateModel,W],feed_dict={inputs1:s,nextQ:targetQ})
            rAll += r
            s = s1
            if d == True:
                #Reduce chance of random action as we train the model.
                e = 1./((i/50) + 10)
                break
        jList.append(j)
        rList.append(rAll)
print('Percent of successful episodes: ' + str(sum(rList)/num_episodes) + '%')
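One common convention, assumed here rather than taken from the post, is to treat a single state as a batch of size one, so the 1D state is reshaped to [1, n_state] before the matrix product. The NumPy sketch below only checks the shapes; the zero state and constant weights are placeholders, not values from the environment.

```python
import numpy as np

n_state, n_action = 224, 170

# Assumption: the environment returns the state as a flat vector of shape (224,).
state = np.zeros(n_state, dtype=np.float32)
# Same shape as W in the question; the constant fill is just a placeholder.
W = np.full((n_state, n_action), 0.01, dtype=np.float32)

batch_of_one = state.reshape(1, n_state)       # shape (1, 224), matches the placeholder
q_values = batch_of_one @ W                    # shape (1, 170): one Q-value per action
action = int(np.argmax(q_values, axis=1)[0])   # scalar index in 0..169

print(batch_of_one.shape, q_values.shape, action)
```

With this layout the target Q-values would also have shape [1, n_action], and the chosen action indexes the second axis, so no transpose is needed.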

How to keep calculated values in a Tensorflow graph (on the GPU)?

How can we make sure that a calculated value will not be copied back to CPU/python memory, but is still available for calculations in the next step?
The following code obviously doesn't do it:
import tensorflow as tf
a = tf.Variable(tf.constant(1.),name="a")
b = tf.Variable(tf.constant(2.),name="b")
result = a + b
stored = result
with tf.Session() as s:
    val = s.run([result,stored],{a:1.,b:2.})
    print(val) # 3
    val = s.run([result],{a:4.,b:5.})
    print(val) # 9
    print(stored.eval()) # 3 NOPE:
Error : Attempting to use uninitialized value _recv_b_0
The answer is to store the value in a tf.Variable using the assign operation:
working code:
import tensorflow as tf
with tf.Session() as s:
    a = tf.Variable(tf.constant(1.),name="a")
    b = tf.Variable(tf.constant(2.),name="b")
    result = a + b
    stored = tf.Variable(tf.constant(0.),name="stored_sum")
    assign_op = stored.assign(result)
    val,_ = s.run([result,assign_op],{a:1.,b:2.})
    print(val) # 3
    val = s.run([result],{a:4.,b:5.})
    print(val[0]) # 9
    print(stored.eval()) # ok, still 3