save/restore weights of a tensorflow model

save/restore weights of a tensorflow model - tensorflow

I'm trying to save and restore parameters of a tensorflow model. The code does save the parameters to the given path, but when I try to restore the last checkpoint, perform an operation, and then save again, it doesn't return the last checkpoint. what should I do?
import tensorflow as tf
import os
v1 = tf.Variable(1.32, name="v1")
v2 = tf.Variable(1.33, name="v2")
saver = tf.train.Saver()
init = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
print (v2.eval(sess))
saver.save(sess, "/tmp/model")
print("Model restored.")
for i in range(10):
ckpt = tf.train.get_checkpoint_state(os.path.dirname('/tmp/'))
print(ckpt)
sess = tf.Session()
if(ckpt and ckpt.model_checkpoint_path):
saver.restore(sess, ckpt.model_checkpoint_path)
x = sess.run("v2:0")
x = x + 1
print(x)
saver.save(sess, "/tmp/model", global_step = i)
sess.close()

Related

why tensorflow restore from ckpt the same variable with different values

when I use tensorflow restore, run the code as follows, Each run will produce different results.
import tensorflow as tf
ckpt = tf.train.get_checkpoint_state("./models/")
saver = tf.train.import_meta_graph(ckpt.model_checkpoint_path + '.meta')
with tf.Session() as sess:
saver.restore(sess, ckpt.model_checkpoint_path)
weights = sess.run(tf.get_default_graph().get_tensor_by_name('output_weights/obj_w:0'))
print(weights[0])

How to save and restore Tensorflow model made with Keras

I am currently working on a project where I build a Network in Keras like so:
inputstuff = Input(shape=(32,), name='main_input')
encoded = Dense(16, activation='relu',init='he_normal', activity_regularizer=regularizers.l1(0.01))(inputstuff)
[other layers...]
decoded = Dense(32, activation='relu',init='he_normal', name='main_output')(decoded)
autoencoder = Model(input=inputstuff, output=decoded)
Next to save this file as a tensorflow model, I do the following:
sessK=K.get_session()
saver = tf.train.Saver()
model_path = "/tmp/KerasGlobalAEModel.ckpt"
save_path = saver.save(sessK, model_path)
Next I want to load the file from another program:
model_path = "/tmp/KerasGlobalAEModel.ckpt"
tf.train.NewCheckpointReader(model_path)
with tf.Session() as sess:
# Initialize variables
init = tf.initialize_all_variables()
sess.run(init)
saver = tf.train.import_meta_graph('/tmp/KerasGlobalAEModel.ckpt.meta')
# Restore model weights from previously saved model
saver.restore(sess, model_path)
test = [some data]
graph = tf.get_default_graph()
MyAnswers = sess.run(Y*, feed_dict={X*: test})
If this was the same file I would be able to use:
X* = model.input
Y* = model.output
However, this and all other things I have tried failed.
Here is a list of what DOES NOT work.
Y = graph.get_tensor_by_name("main_output")
Y = model.output
Y = autoencoder.output
I realize I am a noobie at this, and this might be a dumb question but I would appreciate any guidance.
Thank you,

Cannot load int variable from previous session in tensorflow 1.1

I have read many similar questions and just cannot get this to work properly.
I have my model being trained well and checkpoint files are being made every epoch. I want to have it so the program can continue from epoch x once reloaded and also for it to print that is on that epoch with every iteration. I could simply save the data outside of the checkpoint file, however I was also wanting to do this to give me confidence everything else is also being stored properly.
Unfortunately the value in the epoch/global_step variable is always still 0 when I restart.
import tensorflow as tf
import numpy as np
import tensorflow as tf
import numpy as np
# more imports
def extract_number(f): # used to get latest checkpint file
s = re.findall("epoch(\d+).ckpt",f)
return (int(s[0]) if s else -1,f)
def restore(init_op, sess, saver): # called to restore or just initialise model
list = glob(os.path.join("./params/e*"))
if list:
file = max(list,key=extract_number)
saver.restore(sess, file[:-5])
sess.run(init_op)
return
with tf.Graph().as_default() as g:
# build models
total_batch = data.train.num_examples / batch_size
epochLimit = 51
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
restore(init_op, sess, saver)
epoch = global_step.eval()
while epoch < epochLimit:
total_batch = data.train.num_examples / batch_size
for i in range(int(total_batch)):
sys.stdout.flush()
voxels = newData.eval()
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
sess.run(opt_G, feed_dict={z:batch_z, train:True})
sess.run(opt_D, feed_dict={input:voxels, z:batch_z, train:True})
with open("out/loss.csv", 'a') as f:
batch_loss_G = sess.run(loss_G, feed_dict={z:batch_z, train:False})
batch_loss_D = sess.run(loss_D, feed_dict={input:voxels, z:batch_z, train:False})
msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
print(msgOut)
epoch=epoch+1
sess.run(global_step.assign(epoch))
saver.save(sess, "params/epoch{0}.ckpt".format(epoch))
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
voxels = sess.run(x_, feed_dict={z:batch_z})
v = voxels[0].reshape([32, 32, 32]) > 0
util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
I also update the global step variable using assign at the bottom. Any ideas? Any help would be greatly appreciated.

When you call sess.run(init_op) after restoring this resets all variables to their initial values. Comment that line out and things should work.

My original code was wrong for several reasons because I was trying so many things. The first responder Alexandre Passos gives a valid point, but I believe what changed the game was also the use of scopes (maybe?).
Below is the working updated code if it helps anyone:
import tensorflow as tf
import numpy as np
# more imports
def extract_number(f): # used to get latest checkpint file
s = re.findall("epoch(\d+).ckpt",f)
return (int(s[0]) if s else -1,f)
def restore(sess, saver): # called to restore or just initialise model
list = glob(os.path.join("./params/e*"))
if list:
file = max(list,key=extract_number)
saver.restore(sess, file[:-5])
return saver, True, sess
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
sess.run(init_op)
return saver, False , sess
batch_size = 100
learning_rate = 0.0001
beta1 = 0.5
z_size = 100
save_interval = 1
data = dataset.read()
total_batch = data.train.num_examples / batch_size
def fill_queue():
for i in range(int(total_batch*epochLimit)):
sess.run(enqueue_op, feed_dict={batch: data.train.next_batch(batch_size)}) # runnig in seperate thread to feed a FIFOqueue
with tf.variable_scope("glob"):
global_step = tf.get_variable(name='global_step', initializer=0,trainable=False)
# build models
epochLimit = 51
saver = tf.train.Saver()
with tf.Session() as sess:
saver,rstr,sess = restore(sess, saver)
with tf.variable_scope("glob", reuse=True):
epocht = tf.get_variable(name='global_step', trainable=False, dtype=tf.int32)
epoch = epocht.eval()
while epoch < epochLimit:
total_batch = data.train.num_examples / batch_size
for i in range(int(total_batch)):
sys.stdout.flush()
voxels = newData.eval()
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
sess.run(opt_G, feed_dict={z:batch_z, train:True})
sess.run(opt_D, feed_dict={input:voxels, z:batch_z, train:True})
with open("out/loss.csv", 'a') as f:
batch_loss_G = sess.run(loss_G, feed_dict={z:batch_z, train:False})
batch_loss_D = sess.run(loss_D, feed_dict={input:voxels, z:batch_z, train:False})
msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
print(msgOut)
epoch=epoch+1
sess.run(global_step.assign(epoch))
saver.save(sess, "params/epoch{0}.ckpt".format(epoch))
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
voxels = sess.run(x_, feed_dict={z:batch_z})
v = voxels[0].reshape([32, 32, 32]) > 0
util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)

tensorflow error: restore checkpoint file

I built up my own convolutional neural network, in which I track the moving averages of all trainable variables (tensorflow 1.0):
variable_averages = tf.train.ExponentialMovingAverage(
0.9999, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
train_op = tf.group(apply_gradient_op, variables_averages_op)
saver = tf.train.Saver(tf.global_variables(), max_to_keep=10)
summary_op = tf.summary.merge(summaries)
init = tf.global_variables_initializer()
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=False))
sess.run(init)
# start queue runners
tf.train.start_queue_runners(sess=sess)
summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)
# training loop
start_time = time.time()
for step in range(FLAGS.max_steps):
_, loss_value = sess.run([train_op, loss])
duration = time.time() - start_time
start_time = time.time()
assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
if step % 1 == 0:
# print current model status
num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
examples_per_sec = num_examples_per_step/duration
sec_per_batch = duration/FLAGS.num_gpus
format_str = '{} step{}, loss {}, {} examples/sec, {} sec/batch'
print(format_str.format(datetime.now(), step, loss_value, examples_per_sec, sec_per_batch))
if step % 50 == 0:
summary_str = sess.run(summary_op)
summary_writer.add_summary(summary_str, step)
if step % 10 == 0 or step == FLAGS.max_steps:
print('save checkpoint')
# save checkpoint file
checkpoint_file = os.path.join(FLAGS.train_dir, 'model.ckpt')
saver.save(sess, checkpoint_file, global_step=step)
This workes fine and checkpoint files are saved (saver version V2). Then I try to restore the checkpoints in a nother script for evaluating the model. There I have this piece of code
# Restore the moving average version of the learned variables for eval.
variable_averages = tf.train.ExponentialMovingAverage(
MOVING_AVERAGE_DECAY)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
where I get the error "NotFoundError (see above for traceback): Key conv1/Variable/ExponentialMovingAverage not found in checkpoint" where conv1/variable/ is a variable scope.
This error ocuurs even before I try to restore the variables. Can you please help to solve it?
Thanks in advance
TheJude

I solved it in this way:
Call tf.reset_default_graph() before create second ExponentialMovingAverage(...) in the graph.
# reset the graph before create a new ema
tf.reset_default_graph()
# Restore the moving average version of the learned variables for eval.
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
It took me 2 hours...

Tensorflow. Could not open Checkpoint

I have built and trained a tensorflow model, but unfortunately, the checkpoint file cannot be open as shown below by an error.
Now there isn't an error, but a bunch of warnings that don't really tell you anything.
This happens when I run the evalutation code:
import tensorflow as tf
import main
import Process
import Input
eval_dir = "/Users/Zanhuang/Desktop/NNP"
checkpoint_dir = "/Users/Zanhuang/Desktop/NNP"
ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
def evaluate():
with tf.Graph().as_default() as g:
images, labels = Process.eval_inputs()
forward_propgation_results = Process.forward_propagation(images)
init_op = tf.initialize_all_variables()
saver = tf.train.Saver()
top_k_op = tf.nn.in_top_k(forward_propgation_results, labels, 1)
with tf.Session(graph = g) as sess:
tf.train.start_queue_runners(sess = sess)
sess.run(init_op)
saver.restore(sess, eval_dir)
if ckpt and ckpt.model_checkpoint_path:
saver.restore(sess, ckpt.model_checkpoint_path)
for i in range(100):
print(sess.run(top_k_op))
def main(argv = None):
evaluate()
if __name__ == '__main__':
tf.app.run()
The next is how I had generated the checkpoint file:
if step % 2 == 0:
checkpoint_path = os.path.join(FLAGS.data_dir, 'model.ckpt')
saver.save(sess, checkpoint_path, global_step = step)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

save/restore weights of a tensorflow model - tensorflow

Related

why tensorflow restore from ckpt the same variable with different values

How to save and restore Tensorflow model made with Keras

Cannot load int variable from previous session in tensorflow 1.1

tensorflow error: restore checkpoint file

Tensorflow. Could not open Checkpoint

Categories

Resources