tensorflow estimator from_generator, how to set TensorShape?

tensorflow estimator from_generator, how to set TensorShape? - tensorflow

I am trying use a generator to feed data into estimator. The following is the code. However, when try to run, I got the following error:
Update2: I finally made it work. So the correct tensorshape is
([], [], [])
Update: I added tensorshape ([None], [None], [None]), then I changed ds.batch(10), to an assignment ds = ds.batch(10)
but still got error.
Traceback (most recent call last):
File "xyz.py", line 79, in <module>
tf.app.run(main=main, argv=None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "xyz.py", line 67, in main
model.train(input_fn=lambda: input_fn(100))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 302, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 783, in _train_model
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 521, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 892, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 967, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 952, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1024, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: exceptions.ValueError: `generator` yielded an element of shape () where an element of shape (?,) was expected.
[[Node: PyFunc = PyFunc[Tin=[DT_INT64], Tout=[DT_INT64, DT_STRING, DT_FLOAT], token="pyfunc_1"](arg0)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,?], [?,?], [?,?]], output_types=[DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
So my question, how to set the TensorShape? The from generator takes a third argument of TensorShape but I cannot find any example/doc on how to set it. Any help?
Thanks,
def gen(nn):
ii = 0
while ii < nn:
ii += 1
yield ii, 't{0}'.format(ii), ii*2
def input_fn(n):
ds = tf.data.Dataset.from_generator(lambda: gen(n), (tf.int64, tf.string, tf.float32), ([None], [None], [None]))
ds = ds.batch(10)
x, y, z = ds.make_one_shot_iterator().get_next()
return {'x': x, 'y': y}, tf.greater_equal(z, 10)
def build_columns():
x = tf.feature_column.numeric_column('x')
y = tf.feature_column.categorical_column_with_hash_bucket('y', hash_bucket_size=5)
return [x, y]
def build_estimator():
run_config = tf.estimator.RunConfig().replace(
session_config=tf.ConfigProto(device_count={'GPU': 0}))
return tf.estimator.LinearClassifier(model_dir=FLAGS.model_dir, feature_columns=build_columns(), config=run_config)
def main(unused):
# Clean up the model directory if present
shutil.rmtree(FLAGS.model_dir, ignore_errors=True)
model = build_estimator()
# Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
for n in range(FLAGS.train_epochs // FLAGS.epochs_per_eval):
model.train(input_fn=lambda: input_fn(100))
results = model.evaluate(input_fn=lambda: input_fn(20))

As mentioned by #FengTian in an update, the correct answer was to use shape ([], [], []) as the output shape of the generator:
tf.data.Dataset.from_generator(lambda: gen(n), (tf.int64, tf.string, tf.float32), ([], [], []))

Related

tensorflow training with feature_column_lib.input_layer

I'm trying to train a model with an input layer similar to DNNClassifier.
I used feature_column_lib.input_layer, which is how DNNClassifier constructs its input layer.
However, I got an error when I tried to optimize the loss of my graph. I feel it has something to do with the categorical feature spec, when I removed the categorical feature, it works fine. Is there a way to solve this?
Thanks.
tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at sparse_to_dense_op.cc:126 : Invalid argument: indices[1] = [1,0] is out of bounds: need 0 <= index < [1,1]
Traceback (most recent call last):
File "tf_exp.py", line 120, in <module>
print sess.run(loss)
File "lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = [1,0] is out of bounds: need 0 <= index < [1,1]
[[Node: input_layer/a4_indicator/SparseToDense = SparseToDense[T=DT_INT64, Tindices=DT_INT64, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_layer/a4_indicator/to_sparse_input/indices, input_layer/a4_indicator/to_sparse_input/dense_shape, input_layer/a4_indicator/Select, input_layer/a4_indicator/SparseToDense/default_value)]]
Caused by op u'input_layer/a4_indicator/SparseToDense', defined at:
File "tf_exp.py", line 102, in <module>
features=features, feature_columns=feature_columns)
File "lib/python2.7/site-packages/tensorflow/python/feature_column/feature_column.py", line 277, in input_layer
trainable, cols_to_vars)
File "lib/python2.7/site-packages/tensorflow/python/feature_column/feature_column.py", line 202, in _internal_input_layer
trainable=trainable)
File "lib/python2.7/site-packages/tensorflow/python/feature_column/feature_column.py", line 3332, in _get_dense_tensor
return inputs.get(self)
File "lib/python2.7/site-packages/tensorflow/python/feature_column/feature_column.py", line 2175, in get
transformed = column._transform_feature(self) # pylint: disable=protected-access
File "lib/python2.7/site-packages/tensorflow/python/feature_column/feature_column.py", line 3277, in _transform_feature
id_tensor, default_value=-1)
File "lib/python2.7/site-packages/tensorflow/python/ops/sparse_ops.py", line 996, in sparse_tensor_to_dense
name=name)
File "lib/python2.7/site-packages/tensorflow/python/ops/sparse_ops.py", line 776, in sparse_to_dense
name=name)
File "lib/python2.7/site-packages/tensorflow/python/ops/gen_sparse_ops.py", line 2824, in sparse_to_dense
name=name)
File "lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): indices[1] = [1,0] is out of bounds: need 0 <= index < [1,1]
[[Node: input_layer/a4_indicator/SparseToDense = SparseToDense[T=DT_INT64, Tindices=DT_INT64, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_layer/a4_indicator/to_sparse_input/indices, input_layer/a4_indicator/to_sparse_input/dense_shape, input_layer/a4_indicator/Select, input_layer/a4_indicator/SparseToDense/default_value)]]
Here is my program
dataset = tf.data.TextLineDataset('test_input.txt')
dataset = dataset.shuffle(5)
dataset = dataset.batch(5)
dataset = dataset.repeat(configs.get('epochs',10))
train_data_iterator = dataset.make_one_shot_iterator()
features, labels= _parse_example(train_data_iterator.get_next())
feature_columns = []
feature_columns.append(tf.feature_column.numeric_column(key='f0', dtype=tf.float32))
feature_columns.append(tf.feature_column.numeric_column(key='f1', dtype=tf.float32))
feature_columns.append(tf.feature_column.numeric_column(key='f2', dtype=tf.float32))
feature_columns.append(tf.feature_column.numeric_column(key='f3', dtype=tf.float32))
feature_columns.append(tf.feature_column.indicator_column(
tf.feature_column.categorical_column_with_identity(key='a4', num_buckets=11,
default_value=0)))
input_layer = feature_column_lib.input_layer(
features=features, feature_columns=feature_columns)
logits = tf.layers.dense(inputs=input_layer, units=1)
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss, global_step = global_step)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print sess.run(loss)

Figured it out, the real problem is labels dimension didn't match with logits.

Training model from logits and checkpoint

I am new in tensor flow and I am trying to train the mobile net_v1. To do that, I first created the tfrecords' file for multi-class from a txt file.( example : namefile label1 label2 ...)
import sys, os
import tensorflow as tf
import cv2
import numpy as np
import matplotlib.pyplot as plt
# function
def load_image(addr):
# read an image and resize to (224, 224)
# cv2 load images as BGR, convert it to RGB
img = cv2.imread(addr)
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32)
return img
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[*value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def loadData(inputs):
addrs = []
labels = []
f = open(inputs, 'r')
data = [ln.split(' ') for ln in f ]
f.close()
print(data)
for i in range(0, len(data)):
addrs.append(data[i][0].rstrip())
l = []
for j in range(1,len(data[i])):
if(data[i][j].rstrip().isdigit() == True):
l.append(int(data[i][j].rstrip()))
print(l)
labels.append(l)
return addrs, labels
def CreateTrainFile(input_filename, train_filename,):
path = '/home/rd/Documents/RD2/Databases/Faces/'
# load file and label
train_addrs, train_labels = loadData(input_filename)
print(train_labels)
# open the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)
for i in range(len(train_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Train data: {}/{}'.format(i, len(train_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(train_addrs[i])
label = train_labels[i]
print('label : ', _int64_feature(label))
# Create a feature
feature = {'train/label': _int64_feature(label),
'train/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
# open the TFRecords file
def CreateValidationFile(val_filename):
writer = tf.python_io.TFRecordWriter(val_filename)
for i in range(len(val_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Val data: {}/{}'.format(i, len(val_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(val_addrs[i])
label = val_labels[i]
# Create a feature
feature = {'val/label': _int64_feature(label),
'val/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
# open the TFRecords file
def CreateTestFile(test_filename):
writer = tf.python_io.TFRecordWriter(test_filename)
for i in range(len(test_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Test data: {}/{}'.format(i, len(test_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(test_addrs[i])
label = test_labels[i]
# Create a feature
feature = {'test/label': _int64_feature(label),
'test/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
def ReadRecordFileTrain(data_path):
#data_path = 'train.tfrecords' # address to save the hdf5 file
with tf.Session() as sess:
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([], tf.int64)}
# Create a list of filenames and pass it to a queue
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
# Define a reader and read the next record
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# Decode the record read by the reader
features = tf.parse_single_example(serialized_example, features=feature)
# Convert the image data from string back to the numbers
image = tf.decode_raw(features['train/image'], tf.float32)
# Cast label data into int32
label = tf.cast(features['train/label'], tf.int32)
# Reshape image data into the original shape
image = tf.reshape(image, [224, 224, 3])
# Any preprocessing here ...
# Creates batches by randomly shuffling tensors
images, labels = tf.train.shuffle_batch([image, label], batch_size=2, capacity=30, num_threads=1, min_after_dequeue=10)
return images, labels
def main():
train_filename = 'train.tfrecords' # address to save the TFRecords file
#test_filename = 'test.tfrecords' # address to save the TFRecords file
#val_filename = 'val.tfrecords' # address to save the TFRecords file
CreateTrainFile("data.txt", train_filename)
main()
and to read the tf records :
def ReadRecordFileTrain(data_path):
#data_path = 'train.tfrecords' # address to save the hdf5 file
with tf.Session() as sess:
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([2], tf.int64)}
# Create a list of filenames and pass it to a queue
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
# Define a reader and read the next record
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# Decode the record read by the reader
features = tf.parse_single_example(serialized_example, features=feature)
# Convert the image data from string back to the numbers
image = tf.decode_raw(features['train/image'], tf.float32)
print('label1 :', features['train/label'] )
# Cast label data into int32
label = tf.cast(features['train/label'], tf.int32)
print('label load:', label)
# Reshape image data into the original shape
image = tf.reshape(image, [224, 224, 3])
# Any preprocessing here ...
# Creates batches by randomly shuffling tensors
images, labels = tf.train.batch([image, label], batch_size=2, capacity=30, num_threads=1)
return images, labels
I suppose it works but I am not sure ( I don't have any errors when I called these functions.)
Then, I load the model and its weight. Call the loss function and try to start the training, but it fails at this moment.
g = tf.Graph()
with g.as_default():
# size of the folder
inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
# load dataset
images, labels = ReadRecordFileTrain('train.tfrecords')
print('load dataset done')
print('labels = ', labels)
print('data = ', images)
print(tf.shape(labels))
# load network
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
print('load network done')
print('network : ', network)
variables_to_restore = slim.get_variables_to_restore(exclude=["MobilenetV1/Logits/Conv2d_1c_1x1"])
load_checkpoint = "modele_mobilenet_v1_025/mobilenet_v1_0.25_224.ckpt"
init_fn = slim.assign_from_checkpoint_fn(load_checkpoint, variables_to_restore)
print('custom network done')
# Specify the loss function:
tf.losses.softmax_cross_entropy(labels, network)
total_loss = tf.losses.get_total_loss()
#tf.scalar_summary('losses/total_loss', total_loss)
# Specify the optimization scheme:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
# create_train_op that ensures that when we evaluate it to get the loss,
# the update_ops are done and the gradient updates are computed.
train_tensor = slim.learning.create_train_op(total_loss, optimizer)
print('loss and optimizer chosen')
# Actually runs training.
save_checkpoint = 'model/modelcheck'
# start training
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
The error message :
label1 : Tensor("ParseSingleExample/Squeeze_train/label:0", shape=(2,), dtype=int64)
label load: Tensor("Cast:0", shape=(2,), dtype=int32)
load dataset done
labels = Tensor("batch:1", shape=(2, 2), dtype=int32)
data = Tensor("batch:0", shape=(2, 224, 224, 3), dtype=float32)
Tensor("Shape:0", shape=(2,), dtype=int32)
load network done
network : Tensor("MobilenetV1/Logits/SpatialSqueeze:0", shape=(2, 2), dtype=float32)
custom network done
loss and optimizer chosen
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 106, in <module>
main()
File "test.py", line 103, in main
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 725, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 960, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 788, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 949, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 706, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 256, in prepare_session
config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 188, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1457, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
Caused by op 'save_1/Assign_109', defined at:
File "test.py", line 106, in <module>
main()
File "test.py", line 103, in main
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 642, in train
saver = saver or tf_saver.Saver()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1056, in __init__
self.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1086, in build
restore_sequentially=self._restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 155, in restore
self.op.get_shape().is_fully_defined())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
I don't understand where the problem comes from and how to solve it.

InvalidArgumentError: Assign requires shapes of both tensors to match.
lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
I used to get this error when my model saved in the model directory has conflicts with my current running model. Try deleting your model directory and start training again.

It seems to solve the error but now, when I want to execute it with a tf.Session it failed. I was wondering if the problem comes from my graph or am I doing something wrong in the tf.Session ?
def evaluation(logits, labels):
with tf.name_scope('Accuracy'):
# Operation comparing prediction with true label
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels, 1))
# Operation calculating the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Summary operation for the accuracy
#tf.scalar_summary('train_accuracy', accuracy)
return accuracy
g = tf.Graph()
with g.as_default():
# size of the folder
inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
# load dataset
images, labels = ReadRecordFileTrain('train.tfrecords')
print('load dataset done')
print('labels = ', labels)
print('data = ', images)
print(tf.shape(labels))
# load network
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
print('load network done')
print('network : ', network)
variables_to_restore = slim.get_variables_to_restore(exclude=["MobilenetV1/Logits/Conv2d_1c_1x1"])
load_checkpoint = "modele_mobilenet_v1_025/mobilenet_v1_0.25_224.ckpt"
init_fn = slim.assign_from_checkpoint_fn(load_checkpoint, variables_to_restore)
print('custom network done')
# Specify the loss function:
tf.losses.softmax_cross_entropy(labels, network)
total_loss = tf.losses.get_total_loss()
#tf.scalar_summary('losses/total_loss', total_loss)
# Specify the optimization scheme:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
# create_train_op that ensures that when we evaluate it to get the loss,
# the update_ops are done and the gradient updates are computed.
train_tensor = slim.learning.create_train_op(total_loss, optimizer)
print('loss and optimizer chosen')
# Actually runs training.
save_checkpoint = 'model/modelcheck'
# start training
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
accuracy = evaluation(network, labels)
with tf.Session(graph=g) as sess:
sess.run(network)
print('network load')
sess.run(total_loss)
sess.run(accuracy)
sess.run(train_tensor)
sess.run(learning)
The error :
label1 : Tensor("ParseSingleExample/Squeeze_train/label:0", shape=(2,), dtype=int64)
label load: Tensor("Cast:0", shape=(2,), dtype=int32)
load dataset done
labels = Tensor("batch:1", shape=(4, 2), dtype=int32)
data = Tensor("batch:0", shape=(4, 224, 224, 3), dtype=float32)
Tensor("Shape:0", shape=(2,), dtype=int32)
load network done
network : Tensor("MobilenetV1/Logits/SpatialSqueeze:0", shape=(4, 2), dtype=float32)
custom network done
loss and optimizer chosen
end of graph
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 113, in <module>
main()
File "test.py", line 105, in main
sess.run(network)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]
Caused by op 'MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read', defined at:
File "test.py", line 113, in <module>
main()
File "test.py", line 67, in main
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
File "/home/rd/Documents/RD2/users/Ludovic/tensorflow_mobilenet/mobilenet_v1.py", line 301, in mobilenet_v1
conv_defs=conv_defs)
File "/home/rd/Documents/RD2/users/Ludovic/tensorflow_mobilenet/mobilenet_v1.py", line 228, in mobilenet_v1_base
scope=end_point)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1891, in separable_convolution2d
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 528, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 320, in apply
return self.__call__(inputs, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 286, in __call__
self.build(input_shapes[0])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/normalization.py", line 125, in build
trainable=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 349, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1389, in wrapped_custom_getter
*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 275, in variable_getter
variable_getter=functools.partial(getter, **kwargs))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 228, in _add_variable
trainable=trainable and self.trainable)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1389, in wrapped_custom_getter
*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1334, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1326, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1334, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1326, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 316, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1338, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]

Tensorflow summery merge error : Shape [-1,784] has negative dimensions

I am trying to get summary of a training process of the neural net below.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".\MNIST",one_hot=True)
# Create the model
def train_and_test(hidden1,hidden2, learning_rate, epochs, batch_size):
with tf.name_scope("first_layer"):
input_data = tf.placeholder(tf.float32, [batch_size, 784], name = "input")
weights1 = tf.Variable(
tf.random_normal(shape =[784, hidden1],stddev=0.1),name = "weights")
bias = tf.Variable(tf.constant(0.0,shape =[hidden1]), name = "bias")
activation = tf.nn.relu(
tf.matmul(input_data, weights1) + bias, name = "relu_act")
tf.summary.histogram("first_activation", activation)
with tf.name_scope("second_layer"):
weights2 = tf.Variable(
tf.random_normal(shape =[hidden1, hidden2],stddev=0.1),
name = "weights")
bias2 = tf.Variable(tf.constant(0.0,shape =[hidden2]), name = "bias")
activation2 = tf.nn.relu(
tf.matmul(activation, weights2) + bias2, name = "relu_act")
tf.summary.histogram("second_activation", activation2)
with tf.name_scope("output_layer"):
weights3 = tf.Variable(
tf.random_normal(shape=[hidden2, 10],stddev=0.5), name = "weights")
bias3 = tf.Variable(tf.constant(1.0, shape =[10]), name = "bias")
output = tf.add(
tf.matmul(activation2, weights3, name = "mul"), bias3, name = "output")
tf.summary.histogram("output_activation", output)
y_ = tf.placeholder(tf.float32, [batch_size, 10])
with tf.name_scope("loss"):
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=output))
tf.summary.scalar("cross_entropy", cross_entropy)
with tf.name_scope("train"):
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
with tf.name_scope("tests"):
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
summary_op = tf.summary.merge_all()
sess = tf.InteractiveSession()
writer = tf.summary.FileWriter("./data", sess.graph)
tf.global_variables_initializer().run()
# Train
for i in range(epochs):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
_, summary = sess.run([train_step,summary_op], feed_dict={input_data: batch_xs, y_: batch_ys})
writer.add_summary(summary)
if i % 10 ==0:
test_xs, test_ys = mnist.train.next_batch(batch_size)
test_accuracy = sess.run(accuracy, feed_dict = {input_data : test_xs, y_ : test_ys})
writer.close()
return test_accuracy
if __name__ =="__main__":
print(train_and_test(500, 200, 0.001, 10000, 100))
I am testing the model every 10 step with a random batch of test data.
The problem is in the summery writer. The sess.run() inside the for loop throws following error.
Traceback (most recent call last):
File "<ipython-input-18-78c88c8e6471>", line 1, in <module>
runfile('C:/Users/Suman
Nepal/Documents/Projects/MNISTtensorflow/mnist.py', wdir='C:/Users/Suman
Nepal/Documents/Projects/MNISTtensorflow')
File "C:\Users\Suman Nepal\Anaconda3\lib\site-
packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-
packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow/mnist.py", line 68, in <module>
print(train_and_test(500, 200, 0.001, 100, 100))
File "C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow/mnist.py", line 58, in train_and_test
_, summary = sess.run([train_step,summary_op], feed_dict={input_data: batch_xs, y_: batch_ys})
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: Shape [-1,784] has negative dimensions
[[Node: first_layer_5/input = Placeholder[dtype=DT_FLOAT, shape=[?,784], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'first_layer_5/input', defined at:
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 231, in <module>
main()
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 227, in main
kernel.start()
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tornado\ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2827, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-8-78c88c8e6471>", line 1, in <module>
runfile('C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow/mnist.py', wdir='C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow')
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow/mnist.py", line 86, in <module>
File "C:/Users/Suman Nepal/Documents/Projects/MNISTtensorflow/mnist.py", line 12, in train_and_test
input_data = tf.placeholder(tf.float32, [None, 784], name = "input")
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1530, in placeholder
return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1954, in _placeholder
name=name)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\Suman Nepal\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Shape [-1,784] has negative dimensions
[[Node: first_layer_5/input = Placeholder[dtype=DT_FLOAT, shape=[?,784], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
If I deleted all the summary writers and summary, the model runs fine.
Can you help me spot the problem here? I tried manipulating the shapes of tensors but got nowhere.

From one comment of the deleted answer, from the original poster:
I actually build a neural net under with tf.Graph() as g. I removed the interactive session and started session as with tf.Session(g) as sess. It fixed the problem.
The graph g was not marked as the default graph that way, thus the session (tf.InteractiveSession in the original code) would use another graph instead.
Note that I stumbled upon here because of the same error message. In my case, I had accidentally something like this:
input_data = tf.placeholder(tf.float32, shape=(None, 50))
input_data = tf.tanh(input_data)
session.run(..., feed_dict={input_data: ...})
I.e. I didn't feed the placeholder. It seems that some other tensor operations can then result in this confusing error as internally an undefined dimension is represented as -1.

I was also having this problem. Searching around the basic consensus is to check for problems somewhere else in your code.
What fixed it for me was I was doing a sess.run(summary_op) without feeding in data for my placeholders.
Tensorflow seems to be a bit strange with placeholders, often they won't mind you not feeding them if you're trying to evaluate part of the graph that is independent of them. Here though, it did.

This has may have to do with the InteractiveSession initialization.
I initialized it at the beginning and then it worked - then initialized the global variables within the session.
I am unable to reproduce the error with the old code, which makes it unpredictable or caching settings somewhere.
import tensorflow as tf
sess = tf.InteractiveSession()
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W)+b)
y_ = tf.placeholder(tf.float32, [None,10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess.run(tf.global_variables_initializer())
for _ in range(1000):
batch_xs, batch_ys = mnist.train.next_batch(100)
#print batch_xs.shape, batch_ys.shape
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

expand the output of CNN

I am trying to expand the outputs of my network from 11 to 12 outputs, I have restored the previous checkpoint that is already retrained on 11 outputs. I found an answer from here. Using that I found out how to change the shape of the output layer, to expand it to fit another row of weights, but I don't know if I initialize the weight and biases correctly. Actually I don't get any compile or runtime errors but the the test accuracy drops from 95% to 9%. It may be that there is something wrong somewhere in the code. Here's the code:
w_b_not = {
'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),
'bias_4' : tf.Variable(tf.constant(1.0, shape=[num_labels])),}
w_b = {
'wc1_0': tf.Variable(tf.random_normal([patch_size_1, patch_size_1, num_channels, depth],stddev=0.1)),
.....
'bc1_0' : tf.Variable(tf.zeros([depth]))}
.... #here is the networks model
num_steps = 1001
with tf.Session(graph=graph) as sess:
ckpt = ('path_of_checkpoint.ckpt')
if os.path.isfile(ckpt) :
layer6_weights = tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1))
layer6_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
n_w_b = {
'new_layer_weights' : tf.concat(0,[w_b_not['weight_4'], layer6_weights]),
'new_layer_biases' : tf.concat(0,[w_b_not['bias_4'], layer6_biases])}
resize_var_1 = tf.assign(w_b_not['weight_4'], n_w_b['new_layer_weights'], validate_shape=False)
resize_var_2 = tf.assign(w_b_not['bias_4'], n_w_b['new_layer_biases'], validate_shape=False)
logits = tf.get_collection('logits')[0]
w_b_new_saver = tf.train.Saver()
init_op = tf.initialize_all_variables()
w_b_saver.restore(sess, ckpt)
print("restore complete")
for step in xrange(num_steps):
sess.run(init_op)
print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval() , test_labels, force = False ))
After deleted the for loop i got this error:
Traceback (most recent call last):
File "/home/owner/tensorflow/tensorflow/models/image/mnist/new_dataset/Nets.py", line 237, in <module>
print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval() , test_labels, force = False ))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 502, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3334, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
FailedPreconditionError: Attempting to use uninitialized value Variable
[[Node: Variable/read = Identity[T=DT_FLOAT, _class=["loc:#Variable"], _device="/job:localhost/replica:0/task:0/gpu:0"](Variable)]]
[[Node: Softmax_2/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_715_Softmax_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'Variable/read', defined at:
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/idlelib/run.py", line 116, in main
ret = method(*args, **kwargs)
File "/usr/lib/python2.7/idlelib/run.py", line 324, in runcode
exec code in self.locals
File "/home/owner/tensorflow/tensorflow/models/image/mnist/new_dataset/Nets.py", line 155, in <module>
'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 206, in __init__
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 275, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 609, in identity
return _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()

Change this
w_b_new_saver = tf.train.Saver()
init_op = tf.initialize_all_variables()
w_b_saver.restore(sess, ckpt)
print("restore complete")
for step in xrange(num_steps):
sess.run(init_op)
print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval() , test_labels, force = False ))
to:
w_b_new_saver = tf.train.Saver()
w_b_saver.restore(sess, ckpt)
print("restore complete")
sess.run(tf.initialize_variables( list( tf.get_variable(name) for name in sess.run( tf.report_uninitialized_variables( tf.all_variables( ) ) ) ) ))
# Only initialize the unitialize variables
# That really long line will go through the variables and check which ones are not initialized
# If you already know which ones aren't initialized just pass those in directly
w_b_new_saver2 = tf.train.Saver() # Now you can make a new saver
# in case you want to save this changed model with new weights
print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval() , test_labels, force = False ))
You reinitialized the variables which causes you're accuracy to go back down to 9% because you're resetting all of the weights.

How to calculate AUC with tensorflow?

I've built a binary classifier using Tensorflow and now I would like to evaluate the classifier using AUC and accuracy.
As far as accuracy is concerned, I can easily do like this:
X = tf.placeholder('float', [None, n_input])
y = tf.placeholder('float', [None, n_classes])
pred = mlp(X, weights, biases, dropout_keep_prob)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
When calculating AUC I use the following:
print(tf.argmax(pred, 1).dtype.name)
print(tf.argmax(pred, 1).dtype.name)
a = tf.cast(tf.argmax(pred, 1),tf.float32)
b = tf.cast(tf.argmax(y,1),tf.float32)
auc = tf.contrib.metrics.streaming_auc(a, b)
and in the training loop:
train_acc = sess.run(accuracy, feed_dict={X: batch_xs, y: batch_ys, dropout_keep_prob:1.})
train_auc = sess.run(auc, feed_dict={X: batch_xs, y: batch_ys, dropout_keep_prob:1.})
which gives me the following output (and error) error:
int64
int64
/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py:1197: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
result_shape.insert(dim, 1)
Net built successfully...
Starting training...
Epoch: 000/300 cost: 0.618990561
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 715, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 697, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value auc/false_positives
[[Node: auc/false_positives/read = Identity[T=DT_FLOAT, _class=["loc:#auc/false_positives"], _device="/job:localhost/replica:0/task:0/cpu:0"](auc/false_positives)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./mlp_.py", line 152, in <module>
train_auc = sess.run(auc, feed_dict={X: batch_xs, y: batch_ys, dropout_keep_prob:1.})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value auc/false_positives
[[Node: auc/false_positives/read = Identity[T=DT_FLOAT, _class=["loc:#auc/false_positives"], _device="/job:localhost/replica:0/task:0/cpu:0"](auc/false_positives)]]
Caused by op 'auc/false_positives/read', defined at:
File "./mlp_.py", line 121, in <module>
auc = tf.contrib.metrics.streaming_auc(a, b)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 718, in streaming_auc
predictions, labels, thresholds, ignore_mask)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 603, in _tp_fn_tn_fp
false_positives = _create_local('false_positives', shape=[num_thresholds])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 75, in _create_local
collections=collections)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 211, in __init__
dtype=dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 319, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 831, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/op_def_library.py", line 704, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
I don't understand what I am doing wrong and why when using accuracy only the code runs fine but when using AUC it throws this error.
Could you please hint me in the right direction to understand how to fix this?
My objective is to calculate AUC and ROC for better evaluating the binary classifier performances.

I've found the same issue on github. At the moment, it seems that you also need to run sess.run(tf.initialize_local_variables()) in order to make tf.contrib.metrics.streaming_auc() work. They're working on it.
Here you have an example demonstrating how you can solve this issue:
import tensorflow as tf
a = tf.Variable([0.1, 0.5])
b = tf.Variable([0.2, 0.6])
auc = tf.contrib.metrics.streaming_auc(a, b)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(tf.initialize_local_variables()) # try commenting this line and you'll get the error
train_auc = sess.run(auc)
print(train_auc)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

tensorflow estimator from_generator, how to set TensorShape? - tensorflow

As mentioned by #FengTian in an update, the correct answer was to use shape ([], [], []) as the output shape of the generator: tf.data.Dataset.from_generator(lambda: gen(n), (tf.int64, tf.string, tf.float32), ([], [], []))

Related

tensorflow training with feature_column_lib.input_layer

Training model from logits and checkpoint

Tensorflow summery merge error : Shape [-1,784] has negative dimensions

expand the output of CNN

How to calculate AUC with tensorflow?

Categories

Resources