I want to implement the code in the book TensorFlow for Machine Intelligence. The code runs well the first time, but when I run it again, the error
"Variable rnn/gru_cell/gates/weights already exists, disallowed" occurs. When I restart the console the error disappears, and it comes back after the first run or debug. The code is below:
def prediction(self):
    output, _ = tf.nn.dynamic_rnn(tf.contrib.rnn.GRUCell(300),
                                  self.data,
                                  dtype=tf.float32,
                                  sequence_length=self.length)
    last = self._last_relevant(output, self.length)
    # softmax layer
    num_classes = int(self.target.get_shape()[1])
    weight = tf.Variable(tf.truncated_normal([self.params.rnn_hidden, num_classes], stddev=0.01))
    bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
    prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
    return prediction
Can anyone help me with this problem?
Code that adds ops to your graph (which includes pretty much everything in the function you posted) should usually be run only once. When you want to train your model or have it make a prediction, you use something like sess.run with a feed_dict and the ops you want output from; that executes the existing graph without adding to it.
If you actually want to delete your graph without restarting the console, you can use tf.reset_default_graph().
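For example, a minimal sketch of the build-once pattern; SequenceModel, data, and batch_x here are hypothetical stand-ins for the questioner's model class and inputs:
import tensorflow as tf

tf.reset_default_graph()                       # wipe ops left over from a previous run

model = SequenceModel(data, target, params)    # graph construction happens exactly once
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # repeated calls only execute the existing graph; they add nothing to it
    preds = sess.run(model.prediction, feed_dict={data: batch_x})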
I am trying to use a for loop within a model definition (attempting to recreate TabNet in Keras).
class TabNet(keras.Model):
    def __init__(self, input_dim, output_dim, steps, n_d, n_a, gamma=1.3):
        super().__init__()
        self.n_d, self.n_a, self.steps = n_d, n_a, steps
        self.shared = SharedBlock(n_d+n_a)
        self.first_block = SharedBlock(n_a)
        self.decision_blocks = [DecisionBlock(n_d+n_a)] * steps
        self.prior_scale = Prior(input_dim, gamma)
        self.bn = layers.BatchNormalization()
        self.attention = [AttentiveTransformer(input_dim)] * steps
        self.final = layers.Dense(output_dim)
        self.eps = 1e-8

    @tf.function
    def call(self, x):
        self.prior_scale.reset()
        final_out = 0
        M_loss = 0
        x = self.bn(x)
        attention = self.first_block(self.shared(x))
        for i in range(self.steps):
            mask = self.attention[i](attention, self.prior_scale.P)
            M_loss += tf.reduce_sum(mask * tf.math.log(mask + self.eps), axis=-1) / self.steps
            prior = self.prior_scale(mask)
            out = self.decision_blocks[i](self.shared(x * prior))
            attention, output = out[:, :self.n_a], out[:, self.n_a:]
            final_out += tf.nn.relu(output)
        return self.final(final_out), M_loss
If you're unaware of what those individual blocks are, simply assume that they are linear layers. I have a colab notebook with the full code if you wish to see what they actually are.
However, I cannot train it, as I am getting the error iterating over `tf.Tensor` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.. I have decorated it, and it still does not help.
I am fairly certain it is the for loop that is causing the error when I do model.fit(train_x, train_y). I would appreciate any thoughts on how to implement the above for loop in the TensorFlow way. tf.while_loop is all I have seen so far, and the examples given are fairly simplistic compared to what I want to do.
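For reference, the kind of simplistic tf.while_loop example I mean is a generic accumulator like this (not TabNet-specific); one complication for my model is that a Python list of per-step layers like self.attention[i] cannot be indexed with a symbolic loop variable:
import tensorflow as tf

def run_loop(x, steps):
    # accumulate x into acc for `steps` iterations
    i0 = tf.constant(0)
    acc0 = tf.zeros_like(x)

    def cond(i, acc):
        return i < steps

    def body(i, acc):
        return i + 1, acc + x

    _, acc = tf.while_loop(cond, body, [i0, acc0])
    return acc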
This is my proposal...
I don't know exactly what your network does, but what I can see is that you want to produce 2 outputs and combine them inside your loss. One of your outputs is also the result of some hidden operations inside the network (M_loss).
So if you want to return 2 outputs, Keras needs 2 targets in order to fit. In the code I provide below, the first target is the real labels and the other is a dummy target (an array of zeros).
As said before, you are trying to build a combined loss of the form sparse_entropy(y_true, y_pred) - reg_sparse * M_loss. To make this possible I split the loss into two pieces (one for each output): the sparse part and the M_loss part. The sparse loss is simply SparseCategoricalCrossentropy(from_logits=True) from Keras, while for the M_loss I wrote this function following your code:
def m_loss(y_true, y_pred):
    m = tf.reduce_mean(y_pred, keepdims=True)
    return m
m_loss uses only y_pred, which is the hidden piece of your network; y_true doesn't matter for the required operation, which is why we pass an array of zeros when fitting.
At this point, we have to combine the two losses, and this is possible in Keras like this:
reg_sparse = 0.1
sce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile('Adam', loss=[sce, m_loss], loss_weights=[1, -reg_sparse])
model.fit(train_x, [train_y, np.zeros(train_y.shape[0])], epochs=3)
In this case, the final loss is the combination 1*sce + (-reg_sparse)*m_loss.
this is the full running code: https://colab.research.google.com/drive/152q1rmqTJ0dWLbFN8PqzCBhWkVKirkU5?usp=sharing
I also made some small changes in TabNet, for example in the way final_out and M_loss are created.
No, actually it is not a problem with the for loop. I checked your code; the problem was that you forgot to call the superclass constructor in your SharedBlock, DecisionBlock and Prior classes.
For example, your code should look like:
class SharedBlock(layers.Layer):
    def __init__(self, units, mult=tf.sqrt(0.5)):
        super().__init__()
        self.layer1 = FCBlock(units)
        self.layer2 = FCBlock(units)
        self.mult = mult
After making these changes you will not see that error again, but something else comes up:
TypeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function *
outputs = self.distribute_strategy.run(
<ipython-input-46-f609cb1acdfa>:15 call *
self.prior_scale.reset()
TypeError: tf__reset() missing 1 required positional argument: 'len_x'
To resolve this issue, you will need to make the following change in the class Prior(layers.Layer):
def reset(self, len_x=1.0):
    self.P = 1.0
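For context, here is a sketch of what the full Prior layer might look like after this fix; the attribute names follow the question's usage and the P *= (gamma - mask) update follows the TabNet paper, so treat the details as assumptions rather than the asker's actual code:
class Prior(layers.Layer):
    def __init__(self, input_dim, gamma=1.3):
        super().__init__()           # the superclass call that was missing
        self.gamma = gamma
        self.P = 1.0

    def reset(self, len_x=1.0):      # len_x kept for compatibility with callers
        self.P = 1.0

    def call(self, mask):
        self.P = self.P * (self.gamma - mask)
        return self.P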
Then you will get another issue.
AttributeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function *
outputs = self.distribute_strategy.run(
<ipython-input-46-f609cb1acdfa>:26 call *
out = self.decision[i](self.shared(x * prior))
AttributeError: 'TabNet' object has no attribute 'decision'
For this issue I would request that you open another question, as I think your main issue is resolved.
UPDATE:
You can look into the comment section of this answer; a solution has been provided there for the issue AttributeError: 'TabNet' object has no attribute 'decision'.
UPDATE: 21/07
I have to disappoint you again: the issue is not with the for loop.
If you look closely at the error log, you will see that the issue is due to the full_loss function.
<ipython-input-10-07e59f23d230>:7 full_loss *
logits, M_loss = y_pred
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:561 __iter__
self._disallow_iteration()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:554 _disallow_iteration
self._disallow_when_autograph_enabled("iterating over `tf.Tensor`")
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:532 _disallow_when_autograph_enabled
" decorating it directly with #tf.function.".format(task))
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did not convert this function. Try decorating it directly with #tf.function.
The exact problem is caused by the below statement.
logits, M_loss = y_pred
If you use the code below, which does not use your loss function, you will see a different result.
model.compile('Adam', loss='sparse_categorical_crossentropy')
model.fit(train_x, train_y, batch_size=1)
Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1
[[node sparse_categorical_crossentropy_1/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at <ipython-input-26-d39f533b7a69>:2) ]] [Op:__inference_train_function_18003]
I do not understand the model code completely, and model.summary() is not that helpful in your case. There is some problem with your last layer; at least, the error message suggests that you do not have enough neurons (one for each class).
I would suggest looking into the last layer and the loss function; a sketch of what I mean follows at the end of this answer.
The reason I am sure it is not due to the for loop is that even if you comment out the for loop, you will still receive the same error.
I hope I have helped you further; it took me a few hours to figure this out.
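As that sketch, assuming integer class labels in train_y and guessing the other constructor arguments from the question (the key point is one output unit per class for sparse categorical cross-entropy):
import numpy as np

num_classes = len(np.unique(train_y))       # e.g. 2 for labels {0, 1}
model = TabNet(input_dim=train_x.shape[-1],
               output_dim=num_classes,      # one logit per class
               steps=3, n_d=16, n_a=16)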
I'm having problems with the function resnet50.preprocess_input() from tensorflow.compat.v1.keras.applications.resnet50.
In particular, after some trial and error, I can say the problem arises when, inside a dataset generator function, there is a call:
dataset.map(pre_processing_image)
where
def pre_processing_image(image):
    image = resnet50.preprocess_input(image)
    return image
and the dataset is split into batches. When I reach the last batch, no matter whether it is complete or smaller, I get an error similar to:
Tensor("Const:0", shape=(3,), dtype=float32) must be from the same graph as Tensor("BatchDatasetV2:0", shape=(), dtype=variant)
I can't really understand what is going on, because:
If I use another preprocess_input, such as the one for MobileNet, without changing anything else, then there is no problem. By digging into the code I found that those functions all call this one, but MobileNet uses mode='tf' while ResNet should use mode='caffe' (see the sketch after this list).
The error isn't related to the last batch being smaller than the others; I tried making them all equal, but the error keeps happening at the last step of the first epoch of training.
If I don't use map, and pre_processing_image is instead called directly inside tf.data.Dataset.from_generator, there is no problem; only the code becomes a lot slower.
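For reference, from reading the shared implementation, 'caffe' mode roughly amounts to this sketch (not the library source): RGB to BGR plus zero-centering with the ImageNet channel means; the three-element mean constant it creates matches the Tensor("Const:0", shape=(3,)) in my error.
import numpy as np

def caffe_like_preprocess(x):
    x = x[..., ::-1]                              # RGB -> BGR
    x = x - np.array([103.939, 116.779, 123.68])  # ImageNet channel means (BGR)
    return x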
To give you the full code:
def image_gen(ds_path, ds_scores=None):
    for i, path in enumerate(ds_path):
        img = im.load_img(path,
                          color_mode='rgb',
                          target_size=(NETWORK_INFO.value[1], NETWORK_INFO.value[1]),
                          interpolation='bilinear')
        img_to_numpy = np.array(img)
        if ds_scores is not None:
            yield img_to_numpy, ds_scores[i]
        else:
            yield img_to_numpy

def pre_processing_image(image, score=None):
    image = resnet50.preprocess_input(image)
    if score is None:
        return image
    else:
        return image, score

def generator(batchsize, train=False, val=False, test=False, shuffle=False):
    with tf.Session() as sess:
        if train:
            dataset = tf.data.Dataset.from_generator(lambda: image_gen(train_paths, train_scores),
                                                     output_types=(tf.float32, tf.float32))
        elif val:
            dataset = tf.data.Dataset.from_generator(lambda: image_gen(val_paths, val_scores),
                                                     output_types=(tf.float32, tf.float32))
        else:
            dataset = tf.data.Dataset.from_generator(lambda: image_gen(test_paths),
                                                     output_types=(tf.float32))
        if shuffle:
            dataset = dataset.shuffle(buffer_size=10*batchsize)
        dataset = dataset.batch(batchsize)
        dataset = dataset.map(pre_processing_image,
                              num_parallel_calls=tf.data.experimental.AUTOTUNE)
        dataset = dataset.prefetch(buffer_size=2)
        dataset = dataset.repeat(count=-1)
        iterable = tf.data.make_initializable_iterator(dataset)
        batch = iterable.get_next()
        sess.run(iterable.initializer)
        # yield batches for as long as they are requested
        while True:
            try:
                yield sess.run(batch)
            except tf.errors.OutOfRangeError:
                pass
I tried to mess with the position of the map function and the shuffle/prefetch parameters, but nothing solved the issue. Finally, as you can see, I use the same function for both the training and validation generators; I just change the input parameters to select which dataset the function should use.
Solved the issue.
I searched for something similar regarding other networks that share the same image preprocessing (such as VGG16), and it turns out those related issues were Keras bugs.
I updated the keras-applications module to the latest commit (commit, not release!) and the code now works without problems.
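For anyone else hitting this, installing the module at its latest commit can be done with pip's git support, something like:
pip install -U git+https://github.com/keras-team/keras-applications.git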
This question has been asked here; the difference is that my problem is focused on Estimator.
Some context: we have trained a model using Estimator, with some variables defined within the Estimator's input_fn; this function preprocesses data into batches. Now we are moving to prediction. During prediction, we use the same input_fn to read in and process the data, but we get an error saying the variable (word_embeddings) does not exist (the variables do exist in the checkpoint graph). Here's the relevant bit of code in input_fn:
with tf.variable_scope('vocabulary', reuse=tf.AUTO_REUSE):
    if mode == tf.estimator.ModeKeys.TRAIN:
        word_to_index, word_to_vec = load_embedding(graph_params["word_to_vec"])
        word_embeddings = tf.get_variable(initializer=tf.constant(word_to_vec, dtype=tf.float32),
                                          trainable=False,
                                          name="word_to_vec",
                                          dtype=tf.float32)
    else:
        word_embeddings = tf.get_variable("word_to_vec", dtype=tf.float32)
Basically, when it's in prediction mode, the else branch is invoked to load the variable from the checkpoint. Failure to recognize this variable indicates either a) inappropriate usage of scope, or b) that the graph is not restored. I don't think scope matters much here as long as reuse is set properly.
I suspect this is because the graph has not yet been restored at the input_fn phase. Usually the graph is restored by calling saver.restore(sess, "/tmp/model.ckpt") (reference). Investigating the Estimator source code didn't turn up anything relating to restore; the best candidate is MonitoredSession, a wrapper around training. This has already stretched far from the original problem, and I'm not confident I'm on the right path, so I'm looking for help here if anyone has any insights.
One line summary of my question: How does graph get restored within tf.estimator, via input_fn or model_fn?
Hi, I think your error comes simply because you didn't specify the shape in tf.get_variable (at predict time); it seems that you need to specify the shape even if the variable is going to be restored.
I've made the following test with a simple linear regressor estimator that simply needs to predict x + 5
def input_fn(mode):
    def _input_fn():
        with tf.variable_scope('all_input_fn', reuse=tf.AUTO_REUSE):
            if mode == tf.estimator.ModeKeys.TRAIN:
                var_to_follow = tf.get_variable('var_to_follow', initializer=tf.constant(20))
                x_data = np.random.randn(1000)
                labels = x_data + 5
                return {'x': x_data}, labels
            elif mode == tf.estimator.ModeKeys.PREDICT:
                var_to_follow = tf.get_variable("var_to_follow", dtype=tf.int32, shape=[])
                return {'x': [0, 10, 100, var_to_follow]}
    return _input_fn

featcols = [tf.feature_column.numeric_column('x')]
model = tf.estimator.LinearRegressor(featcols, './outdir')
This code works perfectly fine; the value of the constant is 20, and for fun I also use it in my test set to confirm :p
However, if you remove the shape=[], it breaks. You can also give another initializer, such as tf.constant(500), and everything will still work, with 20 being used.
By running
model.train(input_fn(tf.estimator.ModeKeys.TRAIN), max_steps=10000)
and
preds = model.predict(input_fn(tf.estimator.ModeKeys.PREDICT))
print(next(preds))
You can visualize the graph and you'll see that a) the scoping is normal and b) the graph is restored.
Hope this will help you.
In my TensorFlow code I want my network to take inputs from one of the two StagingArea objects depending upon whether I want to do training or testing.
A part of the graph construction code I wrote is as follows:
with tf.device("/gpu:0"):
for i in range(numgpus):
with tf.variable_scope(tf.get_variable_scope(), reuse=i>0) as vscope:
with tf.device('/gpu:{}'.format(i)):
with tf.name_scope('GPU-Tower-{}'.format(i)) as scope:
phase = tf.get_variable("phase", [], initializer=tf.zeros_initializer(),dtype=tf.uint8, trainable=False)
phaseassigntest = phase.assign(1)
phaseassigntrain = phase.assign(0)
phasetest = tf.equal(phase, 0)
is_training = tf.cond(phasetest, lambda: tf.constant(True), lambda: tf.constant(False))
trainstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[trainbatchsize, 3, 221, 221], [trainbatchsize]], capacity=20)
putoptrain = trainstagingarea.put(train_iterator.get_next())
trainputop.append(putoptrain)
getoptrain = trainstagingarea.get()
traingetop.append(getoptrain)
trainclearop = trainstagingarea.clear()
trainstageclear.append(trainclearop)
trainsizeop = trainstagingarea.size()
trainstagesize.append(trainsizeop)
valstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[valbatchsize, 3, 221, 221], [valbatchsize]], capacity=20)
putopval = valstagingarea.put(val_iterator.get_next())
valputop.append(putopval)
getopval = valstagingarea.get()
valgetop.append(getopval)
valclearop = valstagingarea.clear()
valstageclear.append(valclearop)
valsizeop = valstagingarea.size()
valstagesize.append(valsizeop)
#elem = valgetop[i]
elem = tf.cond(is_training,lambda: traingetop[i],lambda: valgetop[i])
img = elem[0]
label = elem[1]
labelonehot = tf.one_hot(label, depth=numclasses)
net, networksummaries = overfeataccurate(img,numclasses=numclasses, phase=is_training)
I have used tf.cond to make sure that the network is fed by one of the two StagingArea objects. One is meant for training and the other one is meant for validation.
Now, when I try to execute the graph as follows, I do not get any result; in fact, the code just hangs and I have to kill the process.
with tf.Session(graph=g, config=config) as sess:
    sess.run(init_op)
    sess.run(tf.local_variables_initializer())
    sess.run(val_initialize)
    for i in range(20):
        sess.run(valputop)
        print(sess.run(valstagesize))
    writer = tf.summary.FileWriter('.', graph=tf.get_default_graph())
    epoch = 0
    iter = 0
    print("Performing Validation")
    sess.run(phaseassigntest)
    saver = tf.train.Saver()
    while epoch < 10:
        time_init = time.time()
        while True:
            try:
                [val_accu, _, summaries] = sess.run([towervalidation, towervalidationupdateop, validation_summary_op])
                print(val_accu)
When, instead of using tf.cond(), I directly assign elem = valgetop[i], the code works just fine.
Am I missing something here?
What is the right way to feed my network based on whether I want to do training or testing?
NOTE: The error does not go away even if I set numgpus to 1.
Your problem
What you think tf.cond does
Based on the flag, execute what is required to put either traingetop[i] or valgetop[i] into your elem tensor.
What tf.cond actually does
Executes what is required to get both traingetop[i] and valgetop[i], then passes one of them into your elem tensor.
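A minimal demonstration of this behaviour (a sketch; the assign ops' side effects make it visible):
import tensorflow as tf

v = tf.Variable(0)
pred = tf.placeholder(tf.bool, [])
add_op = v.assign_add(1)    # created OUTSIDE the branch lambdas
sub_op = v.assign_sub(1)

result = tf.cond(pred, lambda: add_op, lambda: sub_op)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(result, {pred: True})
    print(sess.run(v))      # prints 0, not 1: both assign ops ran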
So
The reason it hangs forever is that it's waiting for an element to be added to your training staging area (so that it can get that element and discard it). You're forgiven for not realising this is what it's doing; it's actually very counter-intuitive. The documentation is awfully unclear on how to deal with this.
Recommended Solution (by the TensorFlow documentation)
If you really need the queues to be in the same graph, then you need to make two copies of your ENTIRE graph, one that is fed by your training staging area, and one that is fed by your validation staging area. Then you just use the relevant tensor in your sess.run call. I recommend creating a function that takes a queue output tensor, and returns a model_output tensor. Now you have a train_time_output tensor and a validation_time_output tensor, and you can choose which one you want to execute in your sess.run.
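A sketch of that function, reusing the tensor names from the question (overfeataccurate and the staging tensors are the questioner's; the AUTO_REUSE scope is an assumption and is what keeps the two copies sharing weights):
def build_model(elem, training):
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        img, label = elem[0], elem[1]
        net, summaries = overfeataccurate(img, numclasses=numclasses,
                                          phase=tf.constant(training))
        return net

train_time_output = build_model(traingetop[i], training=True)
validation_time_output = build_model(valgetop[i], training=False)

# then: sess.run(train_time_output) while training,
#       sess.run(validation_time_output) while validating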
Warning
You need to make sure that you aren't actually creating new variables to go along with these new ops. To do that take a look at the latest documentation on variables. It looks like they've simplified it from v0.12, and it essentially boils down to using tf.get_variable instead of tf.Variable to create your variables.
My preferred workaround
Although that is the recommended solution (AFAIK), it is extremely unsatisfying to me; you're creating a whole other set of operations on the graph that just happen to use the same weights. There is a lot of potential for programmer error in abusing the separation between train time and test/validation time (resulting in the model acting unexpectedly differently at these times). Worse, it doesn't solve the problem of tf.cond demanding values for the inputs to both branches; it just forces you to copy your whole graph, which is not always possible.
I prefer to just not have my queues in the graph like that, and to treat the model as a function which can be fed an example without caring where it's from. That is, I would instantiate the model with a tf.placeholder as the input, and at execution time I would use feed_dict to actually provide the value. It would function something like this:
# inside main training loop
if time_to_train:
    example = sess.run(traingetop)
else:
    example = sess.run(valgetop)
result = sess.run(model_output, {input_placeholder: example})
It's very useful to note that you can use feed_dict to feed any value for any tensor anywhere in your model. So you can change any model definition that, due to tf.cond, would always require an input, like:
a = tf.constant(some_value)
b = tf.placeholder(tf.float32)
flag = tf.placeholder(tf.bool, [])
one_of_them = tf.cond(flag, lambda: a, lambda: b)
model_output = build_graph(one_of_them)
Into a definition that doesn't, like:
a = tf.constant(some_value)
model_output = build_graph(a)
Remembering that you can always overwrite what a is at execution time:
# In main training loop,
sess.run(train_op, {a: some_other_value})
This essentially pushes the conditional into native python land. In your code you might end up with something like:
if condition_satisfied:
    sess.run(train_op, {a: some_other_value})
else:
    sess.run(train_op)
Performance concerns
If you are using TensorFlow on a single machine, then there is practically no performance cost to this solution, as the numpy array(s) put into the example Python variable are actually still stored on the GPU.
If you are using tensorflow in a distributed fashion, then this solution would kill your performance; it would require sending the example from whatever machine it's on to the master so that it can send it back.
I'm trying to make a small neural network in TensorFlow and I'm a bit new to this. I saw this in a tutorial (http://de.slideshare.net/tw_dsconf/tensorflow-tutorial) and everything is working fine until I try to optimize the weights (with gradient descent), since I get a None value.
with tf.Session() as sess:
    x = tf.placeholder("float", [1, 3], name="x")
    w = tf.Variable(tf.random_uniform([3, 3]), name="w")
    y = tf.matmul(x, w)
    labels = tf.placeholder("float", [1, 3], name="labels")
    relu_out = tf.nn.relu(y)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(relu_out, labels, name="loss")
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train_op = optimizer.minimize(cross_entropy)
    e_labels = np.array([[1.0, 1.0, 0.0]])
    sess.run(tf.initialize_all_variables())
    for step in range(10):
        [out, loss] = sess.run([train_op, cross_entropy],
                               feed_dict={x: np.array([[1.0, 2.0, 3.0]]), labels: e_labels})
        print("the result is:", out)
        print("The loss of the function is:", loss)
So far I have changed the label values (e_labels) and the input values (x), but I always get a None result. My question is: is that None value normal? I don't think so, but if someone could tell me, I would be glad to know what I can do and how to solve it.
I assume you mean that the value of out (i.e., the first return value from sess.run([train_op, cross_entropy], ...)) is None.
This is perfectly normal: train_op is a tf.Operation, and when you pass a tf.Operation to tf.Session.run() (quoting the docs) "The corresponding fetched value will be None."
You can think of a tf.Operation like a function with a void return type (in a language like C or Java). It's something that you run() to cause a side effect (i.e., updating the variables) but it doesn't have a meaningful return value itself.
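For example, a minimal sketch applied to the loop above: keep the train_op fetch (to trigger the variable update) but only use the loss value.
_, loss = sess.run([train_op, cross_entropy],
                   feed_dict={x: np.array([[1.0, 2.0, 3.0]]), labels: e_labels})
print("The loss of the function is:", loss)   # the train_op fetch is always None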