AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8) when using the predict function - tensorflow

Some details for context:
Working on Google Colab using TPU.
Model is fitting successfully without any issues
Running into issues while attempting to use the predict function
Here is the code I'm using to train:
tpu_model.fit(x, y,
batch_size=128,
epochs=60)
Here is the code I'm using to predict:
def generate_output():
generated = ''
#sentence = text[start_index: start_index + Tx]
#sentence = '0'*Tx
usr_input = input("Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: ")
# zero pad the sentence to Tx characters.
sentence = ('{0:0>' + str(maxlen) + '}').format(usr_input).lower()
generated += usr_input
sys.stdout.write("\n\nHere is your poem: \n\n")
sys.stdout.write(usr_input)
for i in range(400):
x_pred = np.zeros((1, maxlen, len(chars)))
for t, char in enumerate(sentence):
if char != '0':
x_pred[0, t, char_indices[char]] = 1.
--> preds = tpu_model.predict(x_pred, batch_size = 128 ,workers = 8,verbose=0)[0]
next_index = sample(preds, temperature = 1.0)
next_char = indices_char[next_index]
generated += next_char
sentence = sentence[1:] + next_char
sys.stdout.write(next_char)
sys.stdout.flush()
if next_char == '\n':
continue
And here is the error (Added an arrow above so you know the location of the error:
AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)
This makes no sense to me as the batch size I used while training is divisible by 8 AND the batch size I've passes in my predict function is divisible by 8.
I'm not sure what the issue is and how to resolve it. Any help would be much appreciated.

From the error:
AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)
It looks like you are using a batch_size of 1, which can be inferred from the first dimension of your input data:
x_pred = np.zeros((1, maxlen, len(chars)))
I think you might want to change it to:
x_pred = np.zeros((8, maxlen, len(chars)))
so that the batch dimension becomes 8 that matches the number of TPU cores in use.
Or you can also keep the current batch_size of 1 but use 1 TPU core.

Related

Multi-GPU TFF simulation errors "Detected dataset reduce op in multi-GPU TFF simulation"

I ran my code for an emotion detection model using Tensorflow Federated simulation. My code work perfectly fine using CPUs only. However, I received this error when trying to run TFF with GPU.
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iteration.Reduce op will be functional after b/159180073.
What is this error about and how can I fix it? I tried to search many places but found no answer.
Here is the call stack if it help. It is very long so I pasted into this link: https://pastebin.com/b1R93gf1
EDIT:
Here is the code containing iterative_process
def startTraining(output_file):
iterative_process = tff.learning.build_federated_averaging_process(
model_fn,
client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
use_experimental_simulation_loop=True
)
flstate = iterative_process.initialize()
evaluation = tff.learning.build_federated_evaluation(model_fn)
output_file.write(
'round,available_users,loss,sparse_categorical_accuracy,val_loss,val_sparse_categorical_accuracy,test_loss,test_sparse_categorical_accuracy\n')
curr_round_result = [0,0,100,0,100,0]
min_val_loss = 100
for round in range(1,ROUND_COUNT + 1):
available_users = fetch_available_users_and_increase_time(ROUND_DURATION_AVERAGE + random.randint(-ROUND_DURATION_VARIATION, ROUND_DURATION_VARIATION + 1))
if(len(available_users) == 0):
write_to_file(curr_round_result)
continue
train_data = make_federated_data(available_users, 'train')
flstate, metrics = iterative_process.next(flstate, train_data)
val_data = make_federated_data(available_users, 'val')
val_metrics = evaluation(flstate.model, val_data)
curr_round_result[0] = round
curr_round_result[1] = len(available_users)
curr_round_result[2] = metrics['train']['loss']
curr_round_result[3] = metrics['train']['sparse_categorical_accuracy']
curr_round_result[4] = val_metrics['loss']
curr_round_result[5] = val_metrics['sparse_categorical_accuracy']
write_to_file(curr_round_result)
Here is the code for make_federated_data
def make_federated_data(users, dataset_type):
offset = 0
if(dataset_type == 'val'):
offset = train_size
elif(dataset_type == 'test'):
offset = train_size + val_size
global LOADED_USER
for id in users:
if(id + offset not in LOADED_USER):
LOADED_USER[id + offset] = getDatasetFromFilePath(filepaths[id + offset])
return [
LOADED_USER[id + offset]
for id in users
]
TFF does support Multi-GPU, and as the error message says one of two things is happening:
The code is using tff.learning but using the default use_experimental_simulation_loop argument value of False. With multiple GPUs, this must be set to True when using APIs including tff.learning.build_federated_averaging_process. For example, calling with:
training_process = tff.learning.build_federated_averaging_process(
..., use_experimental_simulation_loop=True)
The code contains a custom tf.data.Dataset.reduce(...) call somewhere. This must be replaced with Python code that iterates over the dataset. For example:
result = dataset.reduce(initial_state=0, reduce_func=lambda s, x: s + x)
becomes
s = 0
for x in iter(dataset):
s += x
I realized that TFF has not yet supported multi-GPUs. Therefore, we need to limit number visible of GPUs to just 1, using:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

How do I discover the values for variables of an equation with keras/tensorflow?

I have an equation that describes a curve in two dimensions. This equation has 5 variables. How do I discover the values of them with keras/tensorflow for a set of data? Is it possible? Someone know a tutorial of something similar?
I generated some data to train the network that has the format:
sample => [150, 66, 2] 150 sets with 66*2 with the data something like "time" x "acceleration"
targets => [150, 5] 150 sets with 5 variable numbers.
Obs: I know the range of the variables. I know too, that 150 sets of data are too few sample, but I need, after the code work, to train a new network with experimental data, and this is limited too. Visually, the curve is simple, it has a descendent linear part at the beggining and at the end it gets down "like an exponential".
My code is as follows:
def build_model():
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(66*2,)))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['mae'])
return model
def smooth_curve(points, factor=0.9):
[...]
return smoothed_points
#load the generated data
train_data = np.load('samples00.npy')
test_data = np.load('samples00.npy')
train_targets = np.load('labels00.npy')
test_targets = np.load('labels00.npy')
#normalizing the data
mean = train_data.mean()
train_data -= mean
std = train_data.std()
train_data /= std
test_data -= mean
test_data /= std
#k-fold validation:
k = 3
num_val_samples = len(train_data)//k
num_epochs = 100
all_mae_histories = []
for i in range(k):
val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
partial_train_data = np.concatenate(
[train_data[:i * num_val_samples],
train_data[(i + 1) * num_val_samples:]],
axis=0)
partial_train_targets = np.concatenate(
[train_targets[:i * num_val_samples],
train_targets[(i + 1) * num_val_samples:]],
axis=0)
model = build_model()
#reshape the data to get the format (100, 66*2)
partial_train_data = partial_train_data.reshape(100, 66 * 2)
val_data = val_data.reshape(50, 66 * 2)
history = model.fit(partial_train_data,
partial_train_targets,
validation_data = (val_data, val_targets),
epochs = num_epochs,
batch_size = 1,
verbose = 1)
mae_history = history.history['val_mean_absolute_error']
all_mae_histories.append(mae_history)
average_mae_history = [
np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Obviously as it is, I need to get the best accuracy possible, but I am getting an "median absolute error(MAE)" like 96%, and this is inaceptable.
I see some basic bugs in this methodology. Your final layer of the network has a softmax layer. This would mean it would output 5 values, which sum to 1, and behave as a probability distribution. What you actually want to predict is true numbers, or rather floating point values (under some fixed precision arithmetic).
If you have a range, then probably using a sigmoid and rescaling the final layer would to match the range (just multiply with the max value) would help you. By default sigmoid would ensure you get 5 numbers between 0 and 1.
The other thing should be to remove the cross entropy loss and use a loss like RMS, so that you predict your numbers well. You could also used 1D convolutions instead of using Fully connected layers.
There has been some work here: https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq which tries to solve DEs and might be relevant to your work.

indices = 2 is not in [0, 1)

I'm working on a seq2sql project and I successfully build a model but when training I get an error. I'm not using any Keras embedding layer.
M=13 #Question Length
d=40 #Dimention of the LSTM
C=12 #number of table Columns
batch_size=9
inputs1=Input(shape=(M,100),name='question_token')
Hq=Bidirectional(LSTM(d,return_sequences=True),name='QuestionENC')(inputs1) #this is HQ shape is (num_samples,13,80)
inputs2=Input(shape=(C,3,100),name='col_token')
col_lstm_layer=Bidirectional(LSTM(d,return_sequences=False),name='ColENC')
def hidd(te):
t=tf.Variable(initial_value=1,dtype=tf.int32)
for i in range(batch_size):
t=tf.assign(t,i)
Z = tf.nn.embedding_lookup(te, t)
print(col_lstm_layer(Z))
h=tf.reshape(col_lstm_layer(Z),[1,C,d*2])
if i==0:
# cols_last_hidden=tf.Variable(initial_value=h)
cols_last_hidden=tf.stack(h)#this is because it gives an error if we use tf.Variable here
else:
cols_last_hidden=tf.concat([cols_last_hidden,h],0)#shape of this one is (num_samples,num_col,80) 80 is last encoding of each column
return cols_last_hidden
cols_last_hidden=Lambda(hidd)(inputs2)
Hq=Dense(d*2,name='QuestionLastEncode')(Hq)
I=tf.Variable(initial_value=1,dtype=tf.int32)
J=tf.Variable(initial_value=1,dtype=tf.int32)
K=1
def get_col_att(tensors):
global K,all_col_attention
if K:
t=tf.Variable(initial_value=1,dtype=tf.int32)
for i in range(batch_size):
t=tf.assign(t,i)
x = tf.nn.embedding_lookup(tensors[0], t)
# print("tensors[1]:",tensors[1])
y = tf.nn.embedding_lookup(tensors[1], t)
# print("x shape",x.shape,"y shape",y.shape)
y=tf.transpose(y)
# print("x shape",x.shape,"y",y.shape)
Ecol=tf.reshape(tf.transpose(tf.tensordot(x,y,axes=1)),[1,C,M])
if i==0:
# all_col_attention=tf.Variable(initial_value=Ecol,name=""+i)
all_col_attention=tf.stack(Ecol)
else:
all_col_attention=tf.concat([all_col_attention,Ecol],0)
K=0
print("all_col_attention",all_col_attention)
return all_col_attention
total_alpha_sel_lambda=Lambda(get_col_att,name="Alpha")([Hq,cols_last_hidden])
total_alpha_sel=Dense(13,activation="softmax")(total_alpha_sel_lambda)
# print("Hq",Hq," total_alpha_sel_lambda shape",total_alpha_sel_lambda," total_alpha_sel shape",total_alpha_sel.shape)
def get_EQcol(tensors):
global K
if K:
t=tf.Variable(initial_value=1,dtype=tf.int32)
global all_Eqcol
for i in range(batch_size):
t=tf.assign(t,i)
x = tf.nn.embedding_lookup(tensors[0], t)
y = tf.nn.embedding_lookup(tensors[1], t)
Eqcol=tf.reshape(tf.tensordot(x,y,axes=1),[1,C,d*2])
if i==0:
# all_Eqcol=tf.Variable(initial_value=Eqcol,name=""+i)
all_Eqcol=tf.stack(Eqcol)
else:
all_Eqcol=tf.concat([all_Eqcol,Eqcol],0)
K=0
print("all_Eqcol",all_Eqcol)
return all_Eqcol
K=1
EQcol=Lambda(get_EQcol,name='EQcol')([total_alpha_sel,Hq])#total_alpha_sel(12x13) Hq(13xd*2)
EQcol=Dropout(.2)(EQcol)
L1=Dense(d*2,name='L1')(cols_last_hidden)
L2=Dense(d*2,name='L2')(EQcol)
L1_plus_L2=Add()([L1,L2])
pre=Flatten()(L1_plus_L2)
Psel=Dense(12,activation="softmax")(pre)
model=Model(inputs=[inputs1,inputs2],outputs=Psel)
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
model.summary()
earlyStopping=EarlyStopping(monitor='val_loss', patience=7, verbose=0, mode='auto')
history=model.fit([Equestion,Col_Embeddings],y_train,epochs=50,validation_split=.1,shuffle=False,callbacks=[earlyStopping],batch_size=batch_size)
The shapes of the Equestion, Col_Embeddings, and y_train are (10, 12, 3, 100) ,(10, 13, 100) and (10, 12).
I searched about this error but in all cases they have used an embedding layer incorrectly. Here I get this error even though I'm not using one.
indices = 2 is not in [0, 1)
[[{{node lambda_3/embedding_lookup_2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#col_token_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_col_token_2_0_1, lambda_3/Assign_2, lambda_3/embedding_lookup_2/axis)]]
The problem here was the batch size is defined at the graph level.here i have used batch_size =9 for the graph and yes i get batch size of 9 for training by the validation split .1 for the full batch size of 10 but for the validation i left only one sample because 10*.1 is one.
So the batch size of 1 cannot be passed to the graph because it needs batch size of 9.that's why this error comes
As for the solution i put the batch_size=1 and then it works fine also got a good accuracy by using batch_size=1.
Hope this will help someone.
Cheers ..
For me this error was due to bad form of my input data. You have to double check your input data to the model and it depends on your model input.

RNN model: Infer on sentences longer than max sequence length used during training

I am training a RNN model (using rnn.dynamic_rnn method) and my data matrix is of the shape num_examples x max_sequence_length x num_features. During training, I do not want to increase max_sequence_length to more than 50 or 100, since it increases the training time and memory. All the sentences in my training set was less than 50. However, during testing, I want the model to be able to infer on up to 500 tokens. Is it possible? How do I do it?
#sonal - Yes it is possible . Because , in testing most of the time , we are interested in passing a single example , rather than a bunch of data .
So , what you need is , you need to pass the array of single example lets say
test_index = [10 , 23 , 42 ,12 ,24, 50]
to a dynamic_rnn . The prediction has to happen based on the final hidden state . Inside dynamic_rnn , I think you can pass sentences beyond max_length in training . If it is not , you can write a custom decoder function , to calculate the GRU or LSTM states , with the weights you obtained while training . The idea is you can continue generate the output until you reach a maximum length for a testing case or until the model generated a 'EOS' special token . I prefer , you make use of a decoder , after you get the final hidden state from the encoder , this will give better results also .
# function to the while-loop, for early stopping
def decoder_cond(time, state, output_ta_t):
return tf.less(time, max_sequence_length)
# the body_builder is just a wrapper to parse feedback
def decoder_body_builder(feedback=False):
# the decoder body, this is where the RNN magic happens!
def decoder_body(time, old_state, output_ta_t):
# when validating we need previous prediction, handle in feedback
if feedback:
def from_previous():
prev_1 = tf.matmul(old_state, W_out) + b_out
a_max = tf.argmax(prev_1, 1)
#### Try to find the token index and stop the condition until you get a EOS token index .
return tf.gather(embeddings, a_max )
x_t = tf.cond(tf.equal(time, 0), from_previous, lambda: input_ta.read(0))
else:
# else we just read the next timestep
x_t = input_ta.read(time)
# calculate the GRU
z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z) # update gate
r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r) # reset gate
c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c) # proposed new state
new_state = (1-z)*c + z*old_state # new state
# writing output
output_ta_t = output_ta_t.write(time, new_state)
# return in "input-to-next-step" style
return (time + 1, new_state, output_ta_t)
return decoder_body
# set up variables to loop with
output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
time = tf.constant(0)
loop_vars = [time, initial_state, output_ta]
# run the while-loop for training
_, state, output_ta = tf.while_loop(decoder_cond,
decoder_body_builder(feedback = True),
loop_vars,
swap_memory=swap)
This is just a snippet code , try to modify it accordingly . More details can found in https://github.com/alrojo/tensorflow-tutorial

Tensorflow with Buckets Error

I'm trying to train a sequence to sequence model using tensorflow. I see that in the tutorials, buckets help speed up training. So far I'm able to train using just one bucket, and also using just one gpu and multiple buckets using more or less out of the box code, but when I try to use multiple buckets with multiple gpus, I get an error stating
Invalid argument: You must feed a value for placeholder tensor 'gpu_scope_0/encoder50_gpu0' with dtype int32
From the error, I can tell that I'm not declaring the input_feed correctly, so it is expecting the input to be of the size of the largest bucket every time. I'm confused about why this is the case, though, because in the examples that I'm adapting, it does the same thing when initializing the placeholders for the input_feed. As far as I can tell, the tutorials also initialize up to the largest sized bucket, but this error doesn't happen when I use the tutorials' code.
The following is what I think is the relevant initialization code:
self.encoder_inputs = [[] for _ in xrange(self.num_gpus)]
self.decoder_inputs = [[] for _ in xrange(self.num_gpus)]
self.target_weights = [[] for _ in xrange(self.num_gpus)]
self.scope_prefix = "gpu_scope"
for j in xrange(self.num_gpus):
with tf.device("/gpu:%d" % (self.gpu_offset + j)):
with tf.name_scope('%s_%d' % (self.scope_prefix, j)) as scope:
for i in xrange(buckets[-1][0]): # Last bucket is the biggest one.
self.encoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
name="encoder{0}_gpu{1}".format(i,j)))
for i in xrange(buckets[-1][1] + 1):
self.decoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
name="decoder{0}_gpu{1}".format(i,j)))
self.target_weights[j].append(tf.placeholder(tf.float32, shape=[None],
name="weight{0}_gpu{1}".format(i,j)))
# Our targets are decoder inputs shifted by one.
self.losses = []
self.outputs = []
# The following loss computation creates the neural network. The specified
# device hosts the trainable tf parameters.
bucket = buckets[0]
i = 0
with tf.device(param_device):
output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i], self.decoder_inputs[i],
[self.decoder_inputs[i][k + 1] for k in
xrange(len(self.decoder_inputs[i]) - 1)],
self.target_weights[0], buckets,
lambda x, y: seq2seq_f(x, y, True),
softmax_loss_function=self.softmax_loss_function)
bucket = buckets[0]
self.encoder_states = []
with tf.device('/gpu:%d' % self.gpu_offset):
with variable_scope.variable_scope(variable_scope.get_variable_scope(),
reuse=True):
self.encoder_outputs, self.encoder_states = get_encoder_outputs(self,
self.encoder_inputs[0])
if not forward_only:
self.grads = []
print ("past line 297")
done_once = False
for i in xrange(self.num_gpus):
with tf.device("/gpu:%d" % (self.gpu_offset + i)):
with tf.name_scope("%s_%d" % (self.scope_prefix, i)) as scope:
with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
#for j, bucket in enumerate(buckets):
output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i],
self.decoder_inputs[i],
[self.decoder_inputs[i][k + 1] for k in
xrange(len(self.decoder_inputs[i]) - 1)],
self.target_weights[i], buckets,
lambda x, y: seq2seq_f(x, y, True),
softmax_loss_function=self.softmax_loss_function)
self.losses.append(loss)
self.outputs.append(output)
# Training outputs and losses.
if forward_only:
self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
self.encoder_inputs, self.decoder_inputs,
[self.decoder_inputs[0][k + 1] for k in xrange(buckets[0][1])],
self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
softmax_loss_function=self.softmax_loss_function)
# If we use output projection, we need to project outputs for decoding.
if self.output_projection is not None:
for b in xrange(len(buckets)):
self.outputs[b] = [
tf.matmul(output, self.output_projection[0]) + self.output_projection[1]
for output in self.outputs[b]
]
else:
self.bucket_grads = []
self.gradient_norms = []
params = tf.trainable_variables()
opt = tf.train.GradientDescentOptimizer(self.learning_rate)
self.updates = []
with tf.device(aggregation_device):
for g in xrange(self.num_gpus):
for b in xrange(len(buckets)):
gradients = tf.gradients(self.losses[g][b], params)
clipped_grads, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
self.gradient_norms.append(norm)
self.updates.append(
opt.apply_gradients(zip(clipped_grads, params), global_step=self.global_step))
and the following is the relevant code when feeding in data:
input_feed = {}
for i in xrange(self.num_gpus):
for l in xrange(encoder_size):
input_feed[self.encoder_inputs[i][l].name] = encoder_inputs[i][l]
for l in xrange(decoder_size):
input_feed[self.decoder_inputs[i][l].name] = decoder_inputs[i][l]
input_feed[self.target_weights[i][l].name] = target_weights[i][l]
# Since our targets are decoder inputs shifted by one, we need one more.
last_target = self.decoder_inputs[i][decoder_size].name
input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)
last_weight = self.target_weights[i][decoder_size].name
input_feed[last_weight] = np.zeros([self.batch_size], dtype=np.float32)
# Output feed: depends on whether we do a backward step or not.
if not forward_only:
output_feed = [self.updates[bucket_id], self.gradient_norms[bucket_id], self.losses[bucket_id]]
else:
output_feed = [self.losses[bucket_id]] # Loss for this batch.
for l in xrange(decoder_size): # Output logits.
output_feed.append(self.outputs[0][l])
Right now I'm considering just padding every input up to the bucket size, but I expect that this would lose some of the advantages of bucketing
Turns out the issue with this was not in the feeding of the placeholders, but was later on in my code where I referred to placeholders that weren't initialized. As far as I can tell when I fixed the later issues this error stopped