Dimension of tf.Variables changes after some epochs - TensorFlow

I am new to TensorFlow and still learning.
I define some variables and start training. Everything runs smoothly for the first few epochs, but then it suddenly throws the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Matrix size-incompatible: In[0]: [17952,50], In[1]: [0,20]
[[{{node gradients/Embeddings_1/MatMul_grad/MatMul_1}}]]
[[gradients/Embeddings_1/MatMul_grad/tuple/control_dependency/_1867]]
(1) Invalid argument: Matrix size-incompatible: In[0]: [17952,50], In[1]: [0,20]
[[{{node gradients/Embeddings_1/MatMul_grad/MatMul_1}}]]
My question is why it gives this error after some epochs rather than right away. Usually, these types of errors are thrown when the graph is built.
This is my code for creating the variables and embedding the trees:
def __init__(self, vocab, embedding):
    self.add_model_variables()
    with tf.variable_scope("Embeddings", reuse=True):
        with tf.device('/cpu:0'):
            w_embed = tf.get_variable('WE', [self.vocab_embedding_size, self.embed_size])
            b_embed = tf.get_variable('bE', [1, self.embed_size])
            embeddings = tf.get_variable('embeddings')
    self.embeddings = tf.add(tf.matmul(embeddings, w_embed), b_embed)
def add_model_variables(self):
    myinitilizer = tf.random_uniform_initializer(-self.calc_wt_init(), self.calc_wt_init())
    with tf.variable_scope('Embeddings'):
        with tf.device('/cpu:0'):
            w_embed = tf.get_variable('WE', [self.vocab_embedding_size, self.embed_size], initializer=myinitilizer)
            b_embed = tf.get_variable('bE', [1, self.embed_size], initializer=myinitilizer)
            embeddings = tf.get_variable('embeddings',
                                         initializer=tf.convert_to_tensor(self.pretrained_embedding),
                                         dtype=tf.float32)
    with tf.variable_scope('Composition'):
        self.W1 = tf.get_variable('W1', [2 * self.embed_size, self.embed_size], initializer=myinitilizer)
        self.b1 = tf.get_variable('b1', [1, self.embed_size], initializer=myinitilizer)
    with tf.variable_scope('Projection'):
        self.U = tf.get_variable('U', [self.embed_size, 1], initializer=myinitilizer)
        self.bu = tf.get_variable('bu', [self.max_number_nodes, 1], initializer=myinitilizer)
def embed_tree(self, batch_index):
    def combine_children(left_tensor, right_tensor):
        return tf.nn.relu(tf.matmul(tf.concat([left_tensor, right_tensor], axis=1, name='combine_children'), self.W1) + self.b1)

    def embed_word(word_index):
        with tf.device('/cpu:0'):
            return tf.expand_dims(tf.gather(self.embeddings, word_index), 0)

    def loop_body(node_tensors, i):
        node_is_leaf = tf.gather(is_leaf, i)
        word = tf.gather(words, i)
        left_child = tf.gather(left_children, i)
        right_child = tf.gather(right_children, i)
        node_tensor = tf.cond(
            node_is_leaf,
            lambda: embed_word(word),
            lambda: combine_children(
                node_tensors.read(n - right_child),
                node_tensors.read(n - left_child)))
        node_tensors = node_tensors.write(i, node_tensor)
        i = tf.add(i, 1)
        return node_tensors, i

    is_leaf = tf.gather(self.batch_is_leaf, batch_index)
    left_children = tf.gather(self.batch_left_children, batch_index)
    right_children = tf.gather(self.batch_right_children, batch_index)
    words = tf.gather(self.batch_words, batch_index)
    n = tf.reduce_sum(tf.cast(tf.not_equal(left_children, -1), tf.int32)) - 2
    #iself.batch_operation = tf.print(batch_index,'N::::::::',output_stream=sys.stdout)
    node_tensors = tf.TensorArray(tf.float32, size=self.max_number_nodes,
                                  dynamic_size=False, clear_after_read=False, element_shape=[1, self.embed_size])
    loop_cond = lambda node_tensors, i: tf.less(i, n + 2)
    #with tf.control_dependencies([self.batch_operation]):
    node_tensors, _ = tf.while_loop(loop_cond, loop_body, [node_tensors, 0], parallel_iterations=1)
    tree_embedding = tf.convert_to_tensor(node_tensors.stack())
    return tree_embedding
The other problem is that I cannot reliably reproduce the error, as it only happens occasionally.
Update:
When I reduce the batch_size, the chance of getting this error decreases.
Could this be because I am working close to the GPU memory limit?

tf.gather produces zeros for invalid (out-of-range) indices on GPU (it works correctly on CPU, however). In other words, TensorFlow does not range-check indices while running on GPU.
The errors caused by the returned zeros accumulate in the gradient and eventually produce confusing error messages that are unrelated to the original problem.
For reference:
https://github.com/tensorflow/tensorflow/issues/3638
I changed tf.gather to index-based retrieval (a[i]) and the problem was fixed. I don't know exactly why!
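For illustration, this is roughly what the substitution looks like inside loop_body (a sketch using the same tensors as in the question; the rest of the loop body is unchanged):
def loop_body(node_tensors, i):
    # Index-based retrieval instead of tf.gather; on GPU, tf.gather silently
    # returned zeros for invalid indices, while a[i] does not.
    node_is_leaf = is_leaf[i]
    word = words[i]
    left_child = left_children[i]
    right_child = right_children[i]
    node_tensor = tf.cond(
        node_is_leaf,
        lambda: embed_word(word),
        lambda: combine_children(
            node_tensors.read(n - right_child),
            node_tensors.read(n - left_child)))
    node_tensors = node_tensors.write(i, node_tensor)
    return node_tensors, tf.add(i, 1)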

Related

Why am I getting shape errors when trying to pass a batch from the Tensorflow Dataset API to my session operations?

I am dealing with an issue in my conversion over to the Dataset API, and I guess I just don't have enough experience with the API yet to know how to handle the situation below. We currently perform image augmentation using queueing and batching. I was tasked with checking out the new Dataset API and converting our existing implementation to use it rather than queues.
What we would like to do is get a reference to all the paths and handle all operations from just that reference. As you can see in the dataset initialization, I have mapped parse_fn onto the dataset itself, which then reads the file and extracts the initial values from the filenames. However, when I call the iterator's next_batch method and then pass those values to get_summary, I now get an error about shape. I have been trying a number of things, which just keeps changing the error, so I wanted to check whether anyone on SO can see that I am going about this all wrong and should take a different route. Does anything jump out as absolutely wrong in my use of the Dataset API?
Should I not be calling the ops this way any longer? In the majority of the examples I saw, they would get the batch, pass the variables to the op, capture the result in a variable, and pass that to sess.run. However, I haven't yet found an easy way of doing that with our setup that didn't error, so this was the approach I took instead (but it's still erroring). I'll keep trying to trace down the problem and post here should I find anything, but if anyone sees something, please advise. Thanks!
Current Error:
... in get_summary
    summary, acc = sess.run([self._summary_op, self._accuracy], feed_dict=feed_dict)
ValueError: Cannot feed value of shape (32,) for Tensor 'ph_input_labels:0', which has shape '(?, 1)'
Below is the block where the get_summary method is called and the error is fired:
def perform_train():
    if __name__ == '__main__':
        # Get all our image paths
        filenames = data_layer_train.get_image_paths()
        next_batch, iterator = preproc_image_fn(filenames=filenames)

        with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
            with sess.graph.as_default():
                # Set the random seed for tensorflow
                tf.set_random_seed(cfg.RNG_SEED)
                classifier_network = c_common.create_model(len(products_to_class_dict), is_training=True)
                optimizer, global_step_var = c_common.create_optimizer(classifier_network)
                sess.run(tf.local_variables_initializer())
                sess.run(tf.global_variables_initializer())
                # Init tables and dataset iterator
                sess.run(tf.tables_initializer())
                sess.run(iterator.initializer)
                cur_epoch = 0
                blobs = None
                try:
                    epoch_size = data_layer_train.get_steps_per_epoch()
                    num_steps = num_epochs * epoch_size
                    for step in range(num_steps):
                        timer_summary.tic()
                        if blobs is None:
                            # Now populate from our training dataset
                            blobs = sess.run(next_batch)
                        # *************** Below is where it is erroring *****************
                        summary_train, acc = classifier_network.get_summary(sess, blobs["images"], blobs["labels"], blobs["weights"])
...
I believe the error is in preproc_image_fn:
def preproc_image_fn(filenames, images=None, labels=None, image_paths=None, cells=None, weights=None):
    def _parse_fn(filename, label, weight):
        augment_instance = False
        paths = []
        selected_cells = []
        if vals.FIRST_ITER:
            # Perform our check of the path to see if _data_augmentation is within it
            # If so set augment_instance to true and replace the substring with an empty string
            new_filename = tf.regex_replace(filename, "_data_augmentation", "")
            contains = tf.equal(tf.size(tf.string_split([filename], "")), tf.size(tf.string_split([new_filename])))
            filename = new_filename
            if contains is True:
                augment_instance = True

        core_file = tf.string_split([filename], '\\').values[-1]
        product_id = tf.string_split([core_file], ".").values[0]
        label = search_tf_table_for_entry(product_id)
        weight = data_layer_train.get_weights(product_id)

        image_string = tf.read_file(filename)
        img = tf.image.decode_image(image_string, channels=data_layer_train._channels)
        img.set_shape([None, None, None])
        img = tf.image.resize_images(img, [data_layer_train._target_height, data_layer_train._target_width])
        #Previously I was returning the below, but I was getting an error from the op when assigning feed_dict stating that it didnt like the dictionary
        #retval = dict(zip([filename], [img])), label, weight
        retval = img, label, weight
        return retval

    num_files = len(filenames)
    filenames = tf.constant(filenames)

    #*********** Setup dataset below ************
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels, weights))
    dataset = dataset.map(_parse_fn)
    dataset = dataset.repeat()
    dataset = dataset.batch(32)
    iterator = dataset.make_initializable_iterator()
    batch_features, batch_labels, batch_weights = iterator.get_next()
    return {'images': batch_features, 'labels': batch_labels, 'weights': batch_weights}, iterator
def search_tf_table_for_entry(self, product_id):
    '''Looks up keys in the table and outputs the values. Will return -1 if not found '''
    if product_id is not None:
        return self._products_to_class_table.lookup(product_id)
    else:
        if not self._real_eval:
            logger().info("class not found in training {} ".format(product_id))
        return -1
This is where I create the model and the placeholders used previously:
...
def create_model(self):
    weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)
    biases_regularizer = weights_regularizer

    # Input data.
    self._input_images = tf.placeholder(
        tf.float32, shape=(None, self._image_height, self._image_width, self._num_channels), name="ph_input_images")
    self._input_labels = tf.placeholder(tf.int64, shape=(None, 1), name="ph_input_labels")
    self._input_weights = tf.placeholder(tf.float32, shape=(None, 1), name="ph_input_weights")
    self._is_training = tf.placeholder(tf.bool, name='ph_is_training')
    self._keep_prob = tf.placeholder(tf.float32, name="ph_keep_prob")
    self._accuracy = tf.reduce_mean(tf.cast(self._correct_prediction, tf.float32))
    ...
    self.create_summaries()

def create_summaries(self):
    val_summaries = []
    with tf.device("/cpu:0"):
        for var in self._act_summaries:
            self._add_act_summary(var)
        for var in self._train_summaries:
            self._add_train_summary(var)
    self._summary_op = tf.summary.merge_all()
    self._summary_op_val = tf.summary.merge(val_summaries)

def get_summary(self, sess, images, labels, weights):
    feed_dict = {self._input_images: images, self._input_labels: labels,
                 self._input_weights: weights, self._is_training: False}
    summary, acc = sess.run([self._summary_op, self._accuracy], feed_dict=feed_dict)
    return summary, acc
Since the error says:
Cannot feed value of shape (32,) for Tensor 'ph_input_labels:0', which has shape '(?, 1)'
my guess is that your labels in get_summary have the shape [32]. Can you just reshape them to (32, 1)? Or maybe reshape the label earlier in _parse_fn?
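For example, the reshape could be done right before feeding (a sketch only; it assumes labels and weights arrive as flat arrays of length batch_size):
import numpy as np

def get_summary(self, sess, images, labels, weights):
    # Give labels/weights an explicit trailing dimension so they match the
    # (?, 1) placeholders defined in create_model.
    labels = np.reshape(labels, (-1, 1))
    weights = np.reshape(weights, (-1, 1))
    feed_dict = {self._input_images: images, self._input_labels: labels,
                 self._input_weights: weights, self._is_training: False}
    return sess.run([self._summary_op, self._accuracy], feed_dict=feed_dict)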

Tensorflow: Predict in Recurrent Neural Networks for Drawing Classification tutorial

I used the tutorial code from https://www.tensorflow.org/tutorials/recurrent_quickdraw and everything works fine until I try to make a prediction instead of just evaluating the model.
I wrote a new input function for prediction, based on the code in create_dataset.py:
def predict_input_fn():
    def parse_line(stroke_points):
        """Parse an ndjson line and return ink (as np array) and classname."""
        inkarray = json.loads(stroke_points)
        stroke_lengths = [len(stroke[0]) for stroke in inkarray]
        total_points = sum(stroke_lengths)
        np_ink = np.zeros((total_points, 3), dtype=np.float32)
        current_t = 0
        for stroke in inkarray:
            for i in [0, 1]:
                np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
            current_t += len(stroke[0])
            np_ink[current_t - 1, 2] = 1  # stroke_end
        # Preprocessing.
        # 1. Size normalization.
        lower = np.min(np_ink[:, 0:2], axis=0)
        upper = np.max(np_ink[:, 0:2], axis=0)
        scale = upper - lower
        scale[scale == 0] = 1
        np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
        # 2. Compute deltas.
        np_ink = np_ink[1:, 0:2] - np_ink[0:-1, 0:2]
        np_ink = np_ink[1:, :]
        features = {}
        features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
        features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
        f = tf.train.Features(feature=features)
        example = tf.train.Example(features=f)
        #t = tf.constant(np_ink)
        return example

    def parse_example(example):
        """Parse a single record which is expected to be a tensorflow.Example."""
        # feature_to_type = {
        #     "ink": tf.VarLenFeature(dtype=tf.float32),
        #     "shape": tf.FixedLenFeature((0,2), dtype=tf.int64)
        # }
        feature_to_type = {
            "ink": tf.VarLenFeature(dtype=tf.float32),
            "shape": tf.FixedLenFeature([2], dtype=tf.int64)
        }
        example_proto = example.SerializeToString()
        parsed_features = tf.parse_single_example(example_proto, feature_to_type)
        parsed_features["ink"] = tf.sparse_tensor_to_dense(parsed_features["ink"])
        #parsed_features["shape"].set_shape((2))
        return parsed_features

    example = parse_line(FLAGS.predict_input_stroke_data)
    features = parse_example(example)
    dataset = tf.data.Dataset.from_tensor_slices(features)
    # Our inputs are variable length, so pad them.
    dataset = dataset.padded_batch(FLAGS.batch_size, padded_shapes=dataset.output_shapes)
    iterator = dataset.make_one_shot_iterator()
    next_feature_batch = iterator.get_next()
    return next_feature_batch, None  # In prediction, we have no labels
I modified the existing model_fn() function and added the following at the appropriate place:
predictions = tf.argmax(logits, axis=1)
if mode == tf.estimator.ModeKeys.PREDICT:
    preds = {
        "class_index": predictions,
        "probabilities": tf.nn.softmax(logits),
        'logits': logits
    }
    return tf.estimator.EstimatorSpec(mode, predictions=preds)
However, when I call the following code:
if (FLAGS.predict_input_stroke_data != None):
    # prepare_input_tfrecord_for_prediction()
    # predict_results = estimator.predict(input_fn=get_input_fn(
    #     mode=tf.estimator.ModeKeys.PREDICT,
    #     tfrecord_pattern=FLAGS.predict_input_temp_file,
    #     batch_size=FLAGS.batch_size))
    predict_results = estimator.predict(input_fn=predict_input_fn)
    for idx, prediction in enumerate(predict_results):
        type = prediction["class_ids"][0]  # Get the predicted class (index)
        print("Prediction Type: {}\n".format(type))
I get the following error. What is wrong in my code? Could anyone please help me? I have tried quite a few things to get the shape right, but I am unable to. I also tried to first write my stroke data as a TFRecord and then use the existing input_fn to read from the TFRecord; that gives me similar, but slightly different, errors.
File "/Users/farooq/.virtualenvs/tensor1.0/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "/Users/farooq/.virtualenvs/tensor1.0/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 1 for 'Slice' (op: 'Slice') with input shapes: [?], [2], [2].
I finally solved the problem by taking my input strokes and writing them to disk as a TFRecord. I also had to write the same input strokes batch_size times to the same TFRecord, otherwise I got the shape mismatch errors. After that, invoking predict worked.
The main addition for prediction was the following function:
def create_tfrecord_for_prediction(batch_size, stoke_data, tfrecord_file):
    def parse_line(stoke_data):
        """Parse provided stroke data and ink (as np array) and classname."""
        inkarray = json.loads(stoke_data)
        stroke_lengths = [len(stroke[0]) for stroke in inkarray]
        total_points = sum(stroke_lengths)
        np_ink = np.zeros((total_points, 3), dtype=np.float32)
        current_t = 0
        for stroke in inkarray:
            if len(stroke[0]) != len(stroke[1]):
                print("Inconsistent number of x and y coordinates.")
                return None
            for i in [0, 1]:
                np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
            current_t += len(stroke[0])
            np_ink[current_t - 1, 2] = 1  # stroke_end
        # Preprocessing.
        # 1. Size normalization.
        lower = np.min(np_ink[:, 0:2], axis=0)
        upper = np.max(np_ink[:, 0:2], axis=0)
        scale = upper - lower
        scale[scale == 0] = 1
        np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
        # 2. Compute deltas.
        #np_ink = np_ink[1:, 0:2] - np_ink[0:-1, 0:2]
        #np_ink = np_ink[1:, :]
        np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
        np_ink = np_ink[1:, :]
        features = {}
        features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
        features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
        f = tf.train.Features(feature=features)
        ex = tf.train.Example(features=f)
        return ex

    if stoke_data is None:
        print("Error: Stroke data cannot be none")
        return

    example = parse_line(stoke_data)

    # Remove the file if it already exists
    if tf.gfile.Exists(tfrecord_file):
        tf.gfile.Remove(tfrecord_file)

    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    for i in range(batch_size):
        writer.write(example.SerializeToString())
    writer.flush()
    writer.close()
Then in the main function you just have to invoke estimator.predict(), reusing the same input_fn=get_input_fn(...) argument, except pointing it to the temporarily created tfrecord_file.
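Putting it together, the prediction call looks roughly like the commented-out block in the question (a sketch; stroke_data and FLAGS.predict_input_temp_file are assumed to hold the raw strokes and the temporary TFRecord path):
create_tfrecord_for_prediction(FLAGS.batch_size, stroke_data, FLAGS.predict_input_temp_file)
predict_results = estimator.predict(input_fn=get_input_fn(
    mode=tf.estimator.ModeKeys.PREDICT,
    tfrecord_pattern=FLAGS.predict_input_temp_file,
    batch_size=FLAGS.batch_size))
for prediction in predict_results:
    # "class_index" matches the key added to the predictions dict in model_fn
    print("Prediction Type: {}".format(prediction["class_index"]))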
Hope this helps

tensorflow tf.expand_dims error

I'm a beginner in TensorFlow. I use tf.expand_dims and I get an error whose reason I can't understand, so what am I missing?
This is the code:
ML_OUTPUT = None
input_for_classification = None

def ConstructML(input_tensor, layers_count, node_for_each_layer):
    global ML_OUTPUT
    global input_for_classification
    FeatureVector = np.array(input_tensor)
    FeatureVector = FeatureVector.flatten()
    print(FeatureVector.shape)
    ML_ModelINT(FeatureVector, layers_count, node_for_each_layer)

def ML_ModelINT(FeatureVector, layers_count, node_for_each_layer):
    hidden_layer = []
    Alloutputs = []
    hidden_layer.append({'weights': tf.Variable(tf.random_normal([FeatureVector.shape[0], node_for_each_layer[0]])),
                         'biases': tf.Variable(tf.random_normal([node_for_each_layer[0]]))})
    for i in range(1, layers_count):
        hidden_layer.append({'weights': tf.Variable(tf.random_normal([node_for_each_layer[i - 1], node_for_each_layer[i]])),
                             'biases': tf.Variable(tf.random_normal([node_for_each_layer[i]]))})
    FeatureVector = tf.expand_dims(FeatureVector, 0)
    layers_output = tf.add(tf.matmul(FeatureVector, hidden_layer[0]['weights']), hidden_layer[0]['biases'])
    layers_output = tf.nn.relu(layers_output)
    Alloutputs.append(layers_output)
    for j in range(1, layers_count):
        layers_output = tf.add(tf.matmul(layers_output, hidden_layer[j]['weights']), hidden_layer[j]['biases'])
        layers_output = tf.nn.relu(layers_output)
        Alloutputs.append(layers_output)
    ML_OUTPUT = layers_output
    input_for_classification = Alloutputs[1]
    return ML_OUTPUT

ML_Net = ConstructML(input, 3, [1024, 512, 256])
And it gives me an error on this line:
FeatureVector = tf.expand_dims(FeatureVector, 0)
The error is: Expected binary or unicode string, got <tf.Tensor 'Relu_11:0' shape=(?, 7, 7, 512) dtype=float32>
Note: the input is the output tensor of another network, and that network works fine.
Okay, the NumPy part was the error: when the prediction function is first called, there is no feed yet for input_imgs, so the NumPy code cannot execute correctly. I replaced it with TensorFlow ops and it works now.
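As a rough sketch of what replacing the NumPy part with TensorFlow ops can look like (the 7x7x512 shape is taken from the error message; this is an assumption about the fix, not the exact code I ended up with):
# Flatten the symbolic tensor with a TF op instead of np.array(...).flatten(),
# which only works on concrete values, not on graph tensors.
FeatureVector = tf.reshape(input_tensor, [-1, 7 * 7 * 512])
# The first weight matrix must then have 7*7*512 rows, e.g.:
weights0 = tf.Variable(tf.random_normal([7 * 7 * 512, node_for_each_layer[0]]))
biases0 = tf.Variable(tf.random_normal([node_for_each_layer[0]]))
layers_output = tf.nn.relu(tf.add(tf.matmul(FeatureVector, weights0), biases0))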

How to use `sparse_softmax_cross_entropy_with_logits` without getting an Incompatible Shapes error

I would like to use sparse_softmax_cross_entropy_with_logits
with the Julia TensorFlow wrapper.
The operation is defined in the code here.
Basically, as I understand it, the first argument should be the logits that would normally be fed to softmax to turn them into category probabilities (~one-hot output).
And the second should be the correct labels as label ids.
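(For shape reference, a minimal Python equivalent of my understanding of the op, where y_ids is just an assumed name for the integer labels:)
# logits: [batch, num_classes]; labels: integer class ids of shape [batch].
costs = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=L, labels=y_ids)  # -> [batch]
loss = tf.reduce_sum(costs)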
I have adjusted the example code from the TensorFlow.jl readme; see below:
using Distributions
using TensorFlow
# Generate some synthetic data
x = randn(100, 50)
w = randn(50, 10)
y_prob = exp(x*w)
y_prob ./= sum(y_prob,2)
function draw(probs)
    y = zeros(size(probs))
    for i in 1:size(probs, 1)
        idx = rand(Categorical(probs[i, :]))
        y[i, idx] = 1
    end
    return y
end
y = draw(y_prob)
# Build the model
sess = Session(Graph())
X = placeholder(Float64)
Y_obs = placeholder(Float64)
Y_obs_lbl = indmax(Y_obs, 2)
variable_scope("logisitic_model", initializer=Normal(0, .001)) do
global W = get_variable("weights", [50, 10], Float64)
global B = get_variable("bias", [10], Float64)
end
L = X*W + B
Y=nn.softmax(L)
#costs = log(Y).*Y_obs #Dense (Orginal) way
costs = nn.sparse_softmax_cross_entropy_with_logits(L, Y_obs_lbl+1) #sparse way
Loss = -reduce_sum(costs)
optimizer = train.AdamOptimizer()
minimize_op = train.minimize(optimizer, Loss)
saver = train.Saver()
# Run training
run(sess, initialize_all_variables())
cur_loss, _ = run(sess, [Loss, minimize_op], Dict(X=>x, Y_obs=>y))
When I run it, however, I get an error:
Tensorflow error: Status: Incompatible shapes: [1,100] vs. [100,10]
[[Node: gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/mul = Mul[T=DT_DOUBLE, _class=[], _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/ExpandDims, SparseSoftmaxCrossEntropyWithLogits_10:1)]]
in check_status(::TensorFlow.Status) at /home/ubuntu/.julia/v0.5/TensorFlow/src/core.jl:101
in run(::TensorFlow.Session, ::Array{TensorFlow.Port,1}, ::Array{Any,1}, ::Array{TensorFlow.Port,1}, ::Array{Ptr{Void},1}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:96
in run(::TensorFlow.Session, ::Array{TensorFlow.Tensor,1}, ::Dict{TensorFlow.Tensor,Array{Float64,2}}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:143
This only happens when I try to train it.
If I don't include an optimise function/output then it works fine.
So I am doing something that screws up the gradient math.

Tensorflow with Buckets Error

I'm trying to train a sequence-to-sequence model using TensorFlow. I see that in the tutorials, buckets help speed up training. So far I'm able to train using just one bucket, and also using just one GPU and multiple buckets with more or less out-of-the-box code, but when I try to use multiple buckets with multiple GPUs, I get an error stating:
Invalid argument: You must feed a value for placeholder tensor 'gpu_scope_0/encoder50_gpu0' with dtype int32
From the error, I can tell that I'm not declaring the input_feed correctly: it expects the input to be the size of the largest bucket every time. I'm confused about why this is the case, though, because in the examples I'm adapting, it does the same thing when initializing the placeholders for the input_feed. As far as I can tell, the tutorials also initialize up to the largest bucket size, yet this error doesn't happen when I use the tutorials' code.
The following is what I think is the relevant initialization code:
self.encoder_inputs = [[] for _ in xrange(self.num_gpus)]
self.decoder_inputs = [[] for _ in xrange(self.num_gpus)]
self.target_weights = [[] for _ in xrange(self.num_gpus)]
self.scope_prefix = "gpu_scope"
for j in xrange(self.num_gpus):
    with tf.device("/gpu:%d" % (self.gpu_offset + j)):
        with tf.name_scope('%s_%d' % (self.scope_prefix, j)) as scope:
            for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
                self.encoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
                                                             name="encoder{0}_gpu{1}".format(i, j)))
            for i in xrange(buckets[-1][1] + 1):
                self.decoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
                                                             name="decoder{0}_gpu{1}".format(i, j)))
                self.target_weights[j].append(tf.placeholder(tf.float32, shape=[None],
                                                             name="weight{0}_gpu{1}".format(i, j)))

# Our targets are decoder inputs shifted by one.
self.losses = []
self.outputs = []
# The following loss computation creates the neural network. The specified
# device hosts the trainable tf parameters.
bucket = buckets[0]
i = 0
with tf.device(param_device):
    output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i], self.decoder_inputs[i],
                                                    [self.decoder_inputs[i][k + 1] for k in
                                                     xrange(len(self.decoder_inputs[i]) - 1)],
                                                    self.target_weights[0], buckets,
                                                    lambda x, y: seq2seq_f(x, y, True),
                                                    softmax_loss_function=self.softmax_loss_function)
bucket = buckets[0]
self.encoder_states = []
with tf.device('/gpu:%d' % self.gpu_offset):
    with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
        self.encoder_outputs, self.encoder_states = get_encoder_outputs(self, self.encoder_inputs[0])

if not forward_only:
    self.grads = []
    print("past line 297")
    done_once = False
    for i in xrange(self.num_gpus):
        with tf.device("/gpu:%d" % (self.gpu_offset + i)):
            with tf.name_scope("%s_%d" % (self.scope_prefix, i)) as scope:
                with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
                    #for j, bucket in enumerate(buckets):
                    output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i],
                                                                    self.decoder_inputs[i],
                                                                    [self.decoder_inputs[i][k + 1] for k in
                                                                     xrange(len(self.decoder_inputs[i]) - 1)],
                                                                    self.target_weights[i], buckets,
                                                                    lambda x, y: seq2seq_f(x, y, True),
                                                                    softmax_loss_function=self.softmax_loss_function)
                    self.losses.append(loss)
                    self.outputs.append(output)

# Training outputs and losses.
if forward_only:
    self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
        self.encoder_inputs, self.decoder_inputs,
        [self.decoder_inputs[0][k + 1] for k in xrange(buckets[0][1])],
        self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
        softmax_loss_function=self.softmax_loss_function)
    # If we use output projection, we need to project outputs for decoding.
    if self.output_projection is not None:
        for b in xrange(len(buckets)):
            self.outputs[b] = [
                tf.matmul(output, self.output_projection[0]) + self.output_projection[1]
                for output in self.outputs[b]
            ]
else:
    self.bucket_grads = []
    self.gradient_norms = []
    params = tf.trainable_variables()
    opt = tf.train.GradientDescentOptimizer(self.learning_rate)
    self.updates = []
    with tf.device(aggregation_device):
        for g in xrange(self.num_gpus):
            for b in xrange(len(buckets)):
                gradients = tf.gradients(self.losses[g][b], params)
                clipped_grads, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
                self.gradient_norms.append(norm)
                self.updates.append(
                    opt.apply_gradients(zip(clipped_grads, params), global_step=self.global_step))
and the following is the relevant code when feeding in data:
input_feed = {}
for i in xrange(self.num_gpus):
    for l in xrange(encoder_size):
        input_feed[self.encoder_inputs[i][l].name] = encoder_inputs[i][l]
    for l in xrange(decoder_size):
        input_feed[self.decoder_inputs[i][l].name] = decoder_inputs[i][l]
        input_feed[self.target_weights[i][l].name] = target_weights[i][l]
    # Since our targets are decoder inputs shifted by one, we need one more.
    last_target = self.decoder_inputs[i][decoder_size].name
    input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)
    last_weight = self.target_weights[i][decoder_size].name
    input_feed[last_weight] = np.zeros([self.batch_size], dtype=np.float32)

# Output feed: depends on whether we do a backward step or not.
if not forward_only:
    output_feed = [self.updates[bucket_id], self.gradient_norms[bucket_id], self.losses[bucket_id]]
else:
    output_feed = [self.losses[bucket_id]]  # Loss for this batch.
    for l in xrange(decoder_size):  # Output logits.
        output_feed.append(self.outputs[0][l])
Right now I'm considering just padding every input up to the largest bucket size, but I expect that this would lose some of the advantages of bucketing.
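For concreteness, the padding workaround I have in mind would look roughly like this (a sketch only; it assumes buckets[-1] is the largest bucket and reuses the names from the feeding code above):
# Feed zero-filled values for every placeholder beyond the current bucket, up to
# the largest bucket, so that no placeholder referenced by the graph is left unfed.
largest_encoder_size, largest_decoder_size = buckets[-1]
for i in xrange(self.num_gpus):
    for l in xrange(encoder_size, largest_encoder_size):
        input_feed[self.encoder_inputs[i][l].name] = np.zeros([self.batch_size], dtype=np.int32)
    for l in xrange(decoder_size, largest_decoder_size + 1):
        input_feed[self.decoder_inputs[i][l].name] = np.zeros([self.batch_size], dtype=np.int32)
        input_feed[self.target_weights[i][l].name] = np.zeros([self.batch_size], dtype=np.float32)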
Turns out the issue was not in the feeding of the placeholders, but later on in my code, where I referred to placeholders that weren't initialized. As far as I can tell, once I fixed those later issues this error stopped.