slim inception_v4 retrain ValueError: All shapes must be fully defined - tensorflow

I got the following error when retraining inception_v4 with slim:
ValueError: All shapes must be fully defined: [TensorShape([Dimension(299), Dimension(299), Dimension(3)]), TensorShape([Dimension(None)])]
Full traceback
Traceback (most recent call last):
  File "../models/slim/train_vienna_classifier.py", line 575, in <module>
    tf.app.run()
  File "/home/osman/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "../models/slim/train_vienna_classifier.py", line 441, in main
    capacity=5 * FLAGS.batch_size)
  File "/home/osman/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 872, in batch
    name=name)
  File "/home/osman/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 658, in _batch
    capacity=capacity, dtypes=types, shapes=shapes, shared_name=shared_name)
  File "/home/osman/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 685, in __init__
    shapes = _as_shape_list(shapes, dtypes)
  File "/home/osman/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 77, in _as_shape_list
    raise ValueError("All shapes must be fully defined: %s" % shapes)
ValueError: All shapes must be fully defined: [TensorShape([Dimension(299), Dimension(299), Dimension(3)]), TensorShape([Dimension(None)])]
The code
with tf.device(deploy_config.inputs_device()):
    provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset,
        num_readers=FLAGS.num_readers,
        common_queue_capacity=20 * FLAGS.batch_size,
        common_queue_min=10 * FLAGS.batch_size)
    [image, label] = provider.get(['image', 'label'])
    label -= FLAGS.labels_offset

    train_image_size = FLAGS.train_image_size or network_fn.default_image_size

    image = image_preprocessing_fn(image, train_image_size, train_image_size)

    images, labels = tf.train.batch(
        [image, label],
        batch_size=FLAGS.batch_size,
        num_threads=FLAGS.num_preprocessing_threads,
        capacity=5 * FLAGS.batch_size)
    labels = slim.one_hot_encoding(
        labels, dataset.num_classes - FLAGS.labels_offset)
    batch_queue = slim.prefetch_queue.prefetch_queue(
        [images, labels], capacity=2 * deploy_config.num_clones)
Although the images in the dataset have different sizes, I am using the provided preprocessing function to resize them, so this should not cause the error. Am I correct?

The issue is not with the images but with the labels, whose shape is not defined: [TensorShape([Dimension(299), Dimension(299), Dimension(3)]), TensorShape([Dimension(None)])]. The second TensorShape (the label's) shows a None dimension, so giving the labels a fully defined shape fixes the issue.
Use the tf.reshape() function to set the shape of the labels before batching.
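For example, if each label is a single integer class id, something along these lines right after the provider.get() call should give it a fully defined scalar shape (a sketch; the exact target shape depends on how the label is stored in your TFRecords):
[image, label] = provider.get(['image', 'label'])
label = tf.reshape(label, [])  # force a scalar shape instead of (None,)
# for a fixed-length label vector of known size k, use tf.reshape(label, [k]) instead
label -= FLAGS.labels_offset
With the label shape fully defined, tf.train.batch no longer raises the ValueError.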

Related

How to deal with incompatible shapes while trying to fit a model in Tensorflow Keras

So, I have made some modifications to the VGG16 neural network to build a combined regression and classification model.
Finally, I am writing the following code to compile the model and fit my data to it, to start training:
class Facetracker(Model):
    # The initialization function-
    def __init__(self, eyetracker, **kwargs):
        super().__init__(**kwargs)
        self.model = eyetracker  # Instantiating the model

    def compile(self, opt, classlosss, localization_loss, **kwargs):
        super().compile(**kwargs)
        self.classloss = class_loss
        self.localization_loss = regress_loss
        self.opt = optimizer

    # Defining the training step
    def train_step(self, batch, **kwargs):
        X, y = batch  # unpacking our data
        with tf.GradientTape() as tape:
            classes, coords = self.model(X, training=True)
            batch_classloss = self.classloss(y[0], classes)
            batch_localloss = self.localization_loss(tf.cast(y[1], tf.float32), coords)
            # calculating total loss-
            total_loss = batch_localloss + 0.5 * batch_classloss
            grad = tape.gradient(total_loss, self.model.trainable_variables)
        optimizer.apply_gradients(zip(grad, self.model.trainable_variables))
        return {
            "total_loss": total_loss,
            "class_loss": batch_classloss,
            "localilzation_loss": batch_localloss
        }

    def test_step(self, batch):
        X, y = batch
        classes, coords = self.model(X, training=False)
        batch_classloss = self.classloss(y[0], classes)
        batch_localloss = self.localization_loss(tf.cast(y[1], tf.float32), coords)
        total_loss = batch_localloss + 0.5 * batch_classloss
        return {
            "total_loss": total_loss,
            "class_loss": batch_classloss,
            "localilzation_loss": batch_localloss
        }

    # def call(self, X, **kwargs):
    #     return self.model(X, **kwargs)

    # Replacing the call function with a lambda function
    call = lambda self, X, **kwargs: self.model(X, **kwargs)

# Subclassing our model-
print("Subclassing.....")
model = Facetracker(facetracker)
print("Compiling......")
model.compile(optimizer, classlosss=class_loss, localization_loss=localization_loss)

# Preparing the log directory
logdir = "logdir"
tensorboard_callbacks = tf.keras.callbacks.TensorBoard(log_dir=logdir)

print("Fitting the model")
hist = model.fit(train.take(80),
                 epochs=16,
                 initial_epoch=8,
                 validation_data=val,
                 validation_steps=8,
                 validation_freq=2,
                 callbacks=[tensorboard_callbacks])
This includes the class prepared for training the model; the last few lines subclass the model and fit the prepared data to it.
The error I now get seems pretty hefty to me, and it goes like this:
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\main.py", line 535, in <module>
hist = model.fit(train.take(80),
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'gradient_tape/sub_2/BroadcastGradientArgs' defined at (most recent call last):
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\main.py", line 535, in <module>
hist = model.fit(train.take(80),
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\engine\training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\engine\training.py", line 1051, in train_function
return step_function(self, iterator)
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\engine\training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\MarkATT\lib\site-packages\keras\engine\training.py", line 1030, in run_step
outputs = model.train_step(data)
File "C:\Users\Radhe Krishna\OneDrive\Documents\MarkATT\main.py", line 498, in train_step
grad = tape.gradient(total_loss,self.model.trainable_variables)
Node: 'gradient_tape/sub_2/BroadcastGradientArgs'
Incompatible shapes: [2,4] vs. [8]
[[{{node gradient_tape/sub_2/BroadcastGradientArgs}}]] [Op:__inference_train_function_22026]
It would be a great help if someone could examine this problem and help me get out of it; I have been struggling with it for a while.
Thanks in advance!
I had planned to start training with the code provided, but I first received an error for the call function; once I resolved that by converting it into a lambda function, this issue of incompatible shapes came up. I tried adjusting the integers in the input data and the batch-related fields, but nothing worked.

keras tape.gradient error: Input to reshape is a tensor with 1012 values, but the requested shape has 20240 [Op:Reshape]

I use tape.gradient(g_loss, aa_mutator.trainable_variables) to calculate the gradients of a model called aa_mutator and get the following error:
File "/home/tialan/Data/gan/code/g.py", line 297, in <module>
grads_g = tape.gradient(g_loss, aa_mutator.trainable_variables)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 1086, in gradient
unconnected_gradients=unconnected_gradients)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/imperative_grad.py", line 77, in imperative_grad
compat.as_str(unconnected_gradients.value))
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 162, in _gradient_function
return grad_fn(mock_op, *out_grads)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_grad.py", line 782, in _ReshapeGrad
_IndexedSlicesToTensorNoWarning(grad), array_ops.shape(op.inputs[0])),
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape
result = gen_array_ops.reshape(tensor, shape, name)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8368, in reshape
_ops.raise_from_not_ok_status(e, name)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1012 values, but the requested shape has 20240 [Op:Reshape]
Within the aa_mutator model I build a customized Keras layer:
class mutate_func_layer(layers.Layer):
    def __init__(self):
        super(mutate_func_layer, self).__init__()

    def call(self, inputs):
        Mut_pos_layer_out, input_pre, Mutation_3 = inputs
        where = where_func(Mut_pos_layer_out)
        return mutate_func(Mutation_3, where, input_pre)
with mutate_func defined as
@tf.custom_gradient
def mutate_func(x, where, input_pre):  # (Mutation_3, where, input_pre): x = Mutation_3
    print('m3_in_mutate_func')
    print(x.shape)
    aa_aft = gather_nd_func(x, where)
    print('m3_in_mutate_func')
    print(aa_aft.shape)
    aa_aft = K.argmax(aa_aft, axis=-1)
    print('m3_in_mutate_func')
    print(aa_aft.shape)
    aa_aft = tf.reshape(aa_aft, [-1])
    print('m3_in_mutate_func')
    print(aa_aft.shape)
    aa_aft = tf.cast(aa_aft, dtype=tf.float32)
    print('m3_in_mutate_func')
    print(aa_aft.shape)
    aa_seq_out = tf.tensor_scatter_nd_update(input_pre, [where], [aa_aft])
    print('m3_in_mutate_func')
    print(aa_seq_out.shape)

    def grad(upstream):
        return upstream * 1, upstream * 1, upstream * 1

    return aa_seq_out, grad  # a custom-gradient function must return both the result and the grad fn
The shapes printed inside mutate_func are:
m3_in_mutate_func
(1, 1012, 20)
m3_in_mutate_func
(675, 20)
m3_in_mutate_func
(675,)
m3_in_mutate_func
(675,)
m3_in_mutate_func
(675,)
m3_in_mutate_func
(1, 1012)
The model is able to predict given the input; the error only shows up during fitting, at the tape.gradient stage. Is the error raised because of the customized layer? Thanks for any help or suggestions.

Tensor Object is not Iterable with BasicLSTMCell

I have the following code:
def dense_layers(pool3):
    with tf.variable_scope('local1') as scope:
        # Move everything into depth so we can perform a single matrix multiply.
        shape_d = pool3.get_shape()
        shape = shape_d[1] * shape_d[2] * shape_d[3]
        # tf_shape = tf.stack(shape)
        tf_shape = 1024
        print("shape:", shape, shape_d[1], shape_d[2], shape_d[3])
        # So note that tf_shape = 1024, which means that 1024 features are fed into the network, and
        # the batch size = 1024. Therefore, the aim is to divide the batch_size into num_steps.
        reshape = tf.reshape(pool3, [-1, tf_shape])
        # Now we need to reshape/divide the batch_size into num_steps so that we would be feeding a sequence.
        # Note that, most importantly, batch_partition_length comes before step_size in the parameter list.
        lstm_inputs = tf.reshape(reshape, [batch_partition_length, step_size, tf_shape])
        # print('RNN inputs shape: ', lstm_inputs.get_shape())  # -> (128, 8, 1024).
        # Note that the state_size is the number of neurons.
        lstm = tf.contrib.rnn.BasicLSTMCell(state_size)
        lstm_outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=lstm_inputs, initial_state=init_state)
        tf.assign(init_state, final_state)
So, I am taking the output of the pooling layer and trying to feed it into the LSTM in the network.
Initially I declared the following:
state_size = 16
step_size = 8
batch_partition_length = int(batch_size / step_size)
init_state = tf.Variable(tf.zeros([batch_partition_length, state_size])) # -> [128, 16].
Therefore, I am getting an error on:
lstm_outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=lstm_inputs, initial_state=init_state)
As follows:
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/AffectiveComputing/Brady_with_LSTM.py", line 197, in <module>
    predictions = dense_layers(conv_nets_output)
  File "C:/Users/user/PycharmProjects/AffectiveComputing/Brady_with_LSTM.py", line 162, in dense_layers
    lstm_outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=lstm_inputs, initial_state=init_state)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\rnn.py", line 553, in dynamic_rnn
    dtype=dtype)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\rnn.py", line 720, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2623, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2456, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2406, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\rnn.py", line 705, in _time_step
    (output, new_state) = call_cell()
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\rnn.py", line 691, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 238, in __call__
    c, h = state
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 504, in __iter__
    raise TypeError("'Tensor' object is not iterable.")
TypeError: 'Tensor' object is not iterable.
Any help is much appreciated!!
The state for LSTMs really consists of two parts:
1. State for the cell(s)
2. Previous outputs
This is alluded to in the docs for BasicLSTMCell. This paper has a good explanation of how LSTMs work, which will help you understand why you need to keep two sets of states in an LSTM implementation. The reason an error is being thrown is that you need to supply a tuple of tensors for the initial state.
That said, you have two options:
1. Supply an initial state that consists of two tensors.
2. Let the RNN cell generate its own initial state.
You would usually only do 1. if you wanted to override the default behavior. In this case you are using the default (zero) initial state, so you can do 2.:
lstm_outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=lstm_inputs, dtype=tf.float32)
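If you did want option 1, a rough sketch (assuming the state_size and batch_partition_length variables from the question, and the TF 1.x contrib API) would be to pass an LSTMStateTuple rather than a single tensor:
c_state = tf.zeros([batch_partition_length, state_size])  # cell state
h_state = tf.zeros([batch_partition_length, state_size])  # previous output (hidden state)
init_state = tf.contrib.rnn.LSTMStateTuple(c_state, h_state)
lstm_outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=lstm_inputs, initial_state=init_state)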

why dataset.output_shapes returns Dimension(None) after batching

I'm using the Dataset API for input pipelines in TensorFlow (version: r1.2). I built my dataset and batched it with a batch size of 128; the dataset is then fed into an RNN.
Unfortunately, dataset.output_shapes returns Dimension(None) for the first dimension, so the RNN raises an error:
Traceback (most recent call last):
  File "untitled1.py", line 188, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "untitled1.py", line 121, in main
    run_training()
  File "untitled1.py", line 57, in run_training
    is_training=True)
  File "/home/harold/huawei/ConvLSTM/ConvLSTM.py", line 216, in inference
    initial_state=initial_state)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 566, in dynamic_rnn
    dtype=dtype)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 636, in _dynamic_rnn_loop
    "Input size (depth of inputs) must be accessible via shape inference,"
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
I think this error is caused by the shape of the input: the first dimension should be the batch size, not None.
Here is the code:
origin_dataset = Dataset.BetweenS_Dataset(FLAGS.data_path)
train_dataset = origin_dataset.train_dataset
test_dataset = origin_dataset.test_dataset
shuffle_train_dataset = train_dataset.shuffle(buffer_size=10000)
shuffle_batch_train_dataset = shuffle_train_dataset.batch(128)
batch_test_dataset = test_dataset.batch(FLAGS.batch_size)

iterator = tf.contrib.data.Iterator.from_structure(
    shuffle_batch_train_dataset.output_types,
    shuffle_batch_train_dataset.output_shapes)
(images, labels) = iterator.get_next()

training_init_op = iterator.make_initializer(shuffle_batch_train_dataset)
test_init_op = iterator.make_initializer(batch_test_dataset)

print(shuffle_batch_train_dataset.output_shapes)
I print output_shapes and it gives:
(TensorShape([Dimension(None), Dimension(36), Dimension(100)]), TensorShape([Dimension(None)]))
I suppose that it should be 128, because I have batched the dataset:
(TensorShape([Dimension(128), Dimension(36), Dimension(100)]), TensorShape([Dimension(128)]))
This feature has since been added via the drop_remainder parameter, used like the following:
batch_test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)
From the docs:
drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
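A quick way to see the effect (a sketch against a newer tf.data API than the r1.2 used in the question, since drop_remainder was added later):
import tensorflow as tf
ds = tf.data.Dataset.range(14).batch(5, drop_remainder=True)
print(ds.element_spec)  # TensorSpec(shape=(5,), dtype=tf.int64, name=None): the batch dimension is static
Only full batches are emitted, so the final partial batch of 4 elements is dropped.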
They hard-coded the batch size in the implementation, and it will always return None (tf 1.3):
def _padded_shape_to_batch_shape(s):
  return tensor_shape.vector(None).concatenate(
      tensor_util.constant_value_as_shape(s))
In this way, they can batch all elements (e.g. dataset_size=14, batch_size=5, last_batch_size=4).
You can use dataset.filter and dataset.map to fix this issue:
d = contrib.data.Dataset.from_tensor_slices([[5] for x in range(14)])
batch_size = 5
d = d.batch(batch_size)
d = d.filter(lambda e: tf.equal(tf.shape(e)[0], batch_size))

def batch_reshape(e):
    return tf.reshape(e, [args.batch_size] + [s if s is not None else -1 for s in e.shape[1:].as_list()])

d = d.map(batch_reshape)
r = d.make_one_shot_iterator().get_next()
print('dataset_output_shape = %s' % r.shape)

with tf.Session() as sess:
    while True:
        print(sess.run(r))
Output
dataset_output_shape = (5, 1)
[[5][5][5][5][5]]
[[5][5][5][5][5]]
OutOfRangeError

Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files

I'm using the CIFAR10 example. I trained the net as-is with the code provided, and the training completed successfully. As I wanted to evaluate each example on my data set only once, I modified inputs in cifar10_input.py as follows:
def inputs(eval_data, data_dir, batch_size):
  filename = os.path.join(data_dir, TEST_FILE)
  filename_queue = tf.train.string_input_producer([filename], num_epochs=1)
  image, label = read_and_decode(filename_queue)
  float_image = tf.image.per_image_whitening(image)

  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_EVAL *
                           min_fraction_of_examples_in_queue)

  images, label_batch = tf.train.batch(
      [image, label],
      batch_size=batch_size,
      num_threads=1,
      capacity=min_queue_examples + 3 * batch_size)

  tf.image_summary('images', images)
  return images, tf.reshape(label_batch, [batch_size])
I have isolated the problem to the following:
tf.train.string_input_producer([filename], num_epochs=1)
If I don't set num_epochs = 1, everything works fine as it is. If I do, I get the following error.
0x2cf2700 Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files /home/jkschin/tensorflow/my_code/data/svhn/train/model.ckpt-8000
Thank you for your help!
EDIT 3 @mrry:
It still fails. Here's the trace.
Traceback (most recent call last):
  File "cnn_eval.py", line 148, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "cnn_eval.py", line 144, in main
    evaluate()
  File "cnn_eval.py", line 119, in evaluate
    saver = tf.train.Saver([v for v in variables_to_restore if v.name != "input_producer/limit_epochs/epochs"])
AttributeError: 'unicode' object has no attribute 'name'
EDIT 4 @mrry:
softmax_linear/biases/ExponentialMovingAverage
conv2/biases/ExponentialMovingAverage
local4/biases/ExponentialMovingAverage
local3/biases/ExponentialMovingAverage
softmax_linear/weights/ExponentialMovingAverage
conv1/biases/ExponentialMovingAverage
local4/weights/ExponentialMovingAverage
conv2/weights/ExponentialMovingAverage
input_producer/limit_epochs/epochs
local3/weights/ExponentialMovingAverage
conv1/weights/ExponentialMovingAverage
Traceback (most recent call last):
  File "cnn_eval.py", line 148, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "cnn_eval.py", line 144, in main
    evaluate()
  File "cnn_eval.py", line 119, in evaluate
    saver = tf.train.Saver([v for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 784, in __init__
    restore_sequentially=restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 437, in build
    vars_to_save = self._ValidateAndSliceInputs(names_to_variables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 340, in _ValidateAndSliceInputs
    names_to_variables = self._VarListToDict(names_to_variables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 314, in _VarListToDict
    raise TypeError("Variable to save is not a Variable: %s" % var)
TypeError: Variable to save is not a Variable: Tensor("Const:0", shape=(), dtype=string)
EDIT 5 @mrry:
saver = tf.train.Saver([tf.Variable(0.0,validate_shape=False,name=v) for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"])
0x21d0cb0 Compute status: Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [] rhs shape= [10]
[[Node: save/Assign_8 = Assign[T=DT_FLOAT, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](softmax_linear/biases/ExponentialMovingAverage, save/restore_slice_8/_20)]]
TL;DR: In cifar10_eval.py, change the saver constructor so that it is:
saver = tf.train.Saver([v for v in variables_to_restore
                        if v != "input_producer/limit_epochs/epochs"])
This problem arises because tf.train.string_input_producer() internally creates a variable (called "input_producer/limit_epochs/epochs") when its num_epochs argument is not None. When a tf.train.Saver is created in cifar10_eval.py, it uses tf.all_variables(), which includes this implicitly created variable from tf.train.string_input_producer(). This list of variables determines the set of names that TensorFlow looks up in the checkpoint file.
Currently there isn't a great way to refer to implicitly created variables, other than by their name. Therefore, the best fix is to exclude the variable from the Saver constructor by name.
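Note that in cifar10_eval.py, variables_to_restore is the dict returned by ExponentialMovingAverage.variables_to_restore(), so iterating over it yields name strings rather than variables (which is what the EDIT 4 error shows). Under that assumption, the exclusion can also be written as a dict comprehension that preserves the name-to-variable mapping (a sketch):
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver({name: var for name, var in variables_to_restore.items()
                        if name != "input_producer/limit_epochs/epochs"})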
Another way of eliminating the implicit variable "input_producer/limit_epochs/epochs" is to only load the trainable variables:
saver = tf.train.Saver(tf.trainable_variables())