I am running TensorFlow and TensorBoard in Docker.
I was trying to write the simplest possible code just to demonstrate how TensorBoard works:
graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):
    a = tf.constant(5.0)
    b = tf.constant(6.0)
    c = a * b
    # Enter data into summary.
    c_summary = tf.scalar_summary("c", c)
    merged = tf.merge_all_summaries()
    with tf.Session(graph=graph) as session:
        writer = tf.train.SummaryWriter("log/test_logs", session.graph_def)
        result = session.run([merged])
        tf.initialize_all_variables().run()
        writer.add_summary(result[0], 0)
I then ran tensorboard --logdir={absolute path to log/test_logs} but no events were listed there. Is there anything I should have written differently in the code?
Note that log/test_logs does contain files like events.out.tfevents.1459102927.0a8840dee548.
I am not sure whether this is your case, but note that SummaryWriter buffers summaries by default and only flushes them to disk periodically (every 120 seconds, I believe, though I am not sure).
So maybe you just did not wait until the flush happened. Try manually flushing the SummaryWriter, or just close() it at the end of your program.
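A minimal sketch of the explicit flush, using the writer variable from the question:

writer.flush()   # force buffered summaries to disk immediately
# ... or, at the very end of the program:
writer.close()   # closes the event file, flushing any remaining summaries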
I am trying to adapt this tf-agents actor<->learner DQN Atari Pong example to my Windows machine, using a TFUniformReplayBuffer instead of the ReverbReplayBuffer (which only works on Linux machines), but I am facing a dimensionality issue.
[...]
---> 67 init_buffer_actor.run()
[...]
InvalidArgumentError: {{function_node __wrapped__ResourceScatterUpdate_device_/job:localhost/replica:0/task:0/device:CPU:0}} Must have updates.shape = indices.shape + params.shape[1:] or updates.shape = [], got updates.shape [84,84,4], indices.shape [1], params.shape [1000,84,84,4] [Op:ResourceScatterUpdate]
The problem is as follows: the tf actor tries to access the replay buffer and initialize it with a certain number of random samples of shape (84,84,4), following this DeepMind paper, but the replay buffer requires samples of shape (1,84,84,4).
My code is as follows:
def train_pong(
        env_name='ALE/Pong-v5',
        initial_collect_steps=50000,
        max_episode_frames_collect=50000,
        batch_size=32,
        learning_rate=0.00025,
        replay_capacity=1000):
    # load atari environment
    collect_env = suite_atari.load(
        env_name,
        max_episode_steps=max_episode_frames_collect,
        gym_env_wrappers=suite_atari.DEFAULT_ATARI_GYM_WRAPPERS_WITH_STACKING)
    # create tensor specs
    observation_tensor_spec, action_tensor_spec, time_step_tensor_spec = (
        spec_utils.get_tensor_specs(collect_env))
    # create training util
    train_step = train_utils.create_train_step()
    # calculate no. of actions
    num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1
    # create agent
    agent = dqn_agent.DqnAgent(
        time_step_tensor_spec,
        action_tensor_spec,
        q_network=create_DL_q_network(num_actions),
        optimizer=tf.compat.v1.train.RMSPropOptimizer(learning_rate=learning_rate))
    # create uniform replay buffer
    replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
        data_spec=agent.collect_data_spec,
        batch_size=1,
        max_length=replay_capacity)
    # observer of replay buffer
    rb_observer = replay_buffer.add_batch
    # create batch dataset
    dataset = replay_buffer.as_dataset(
        sample_batch_size=batch_size,
        num_steps=2,
        single_deterministic_pass=False).prefetch(3)
    # create callable function for actor
    experience_dataset_fn = lambda: dataset
    # create random policy for buffer init
    random_policy = random_py_policy.RandomPyPolicy(
        collect_env.time_step_spec(),
        collect_env.action_spec())
    # create initializer
    init_buffer_actor = actor.Actor(
        collect_env,
        random_policy,
        train_step,
        steps_per_run=initial_collect_steps,
        observers=[replay_buffer.add_batch])
    # initialize buffer with random samples
    init_buffer_actor.run()
(The approach uses the OpenAI Gym env as well as the corresponding wrapper functions.)
I have worked with keras-rl2 and with tf-agents without the actor<->learner setup for other Atari games to create the DQN, and both worked quite well after some adaptations. I suspect my current code would also work after a few adaptations to the tf-agents library functions, but that would defeat the purpose of the library.
My current assumption: the actor<->learner methods are not able to work with the TFUniformReplayBuffer (as I expect them to), due to the missing support of the TFPyEnvironment - or I still have some knowledge gaps regarding this tf-agents approach.
Previous (successful) attempt:
from tf_agents.environments.tf_py_environment import TFPyEnvironment
from tf_agents.drivers.dynamic_step_driver import DynamicStepDriver

tf_collect_env = TFPyEnvironment(collect_env)
init_driver = DynamicStepDriver(
    tf_collect_env,
    random_policy,
    observers=[replay_buffer.add_batch],
    num_steps=200)
init_driver.run()
I would be very grateful if someone could explain to me what I am overlooking here.
I fixed it... partly, but the next error is (in my opinion) an architectural problem.
The problem is that the Actor/Learner setup is built on a PyEnvironment, whereas the TFUniformReplayBuffer works on the TFPyEnvironment, which ends up in the failure above...
Using the PyUniformReplayBuffer with a converted py-spec solved this problem.
from tf_agents.replay_buffers import py_uniform_replay_buffer
from tf_agents.specs import tensor_spec

# convert agent spec to py-data-spec
py_collect_data_spec = tensor_spec.to_array_spec(agent.collect_data_spec)
# create replay buffer based on the py-data-spec
replay_buffer = py_uniform_replay_buffer.PyUniformReplayBuffer(
    data_spec=py_collect_data_spec,
    capacity=replay_capacity * batch_size)
This snippet solved the issue of the incompatible buffer in the background, but it led to another issue:
--> The add_batch function does not work.
I found this approach, which advises using either a batched environment or making the following adaptation to the replay observer (add_batch method).
from tf_agents.utils.nest_utils import batch_nested_array

#********* Adaptation of the add_batch method - START *********#
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
#********* Adaptation of the add_batch method - END *********#

# create batch dataset
dataset = replay_buffer.as_dataset(
    sample_batch_size=32,
    single_deterministic_pass=False)
experience_dataset_fn = lambda: dataset
This helped me solve the issue described in this post, but now I have run into another problem where I need to ask someone from the tf-agents team...
--> It seems that the Learner/Actor structure is not able to work with a buffer other than the ReverbBuffer, because the data-spec processed by the PyUniformReplayBuffer sets up a wrong buffer structure...
For anyone who has the same problem: I just created this GitHub issue report to get further answers and/or fix my lack of knowledge.
The full fix is shown below...
--> The dimensionality issue was valid and indicates that the (uploaded) batched samples are not in the correct shape.
--> The issue happens because the add_batch method loads values with the wrong shape.
rb_observer = replay_buffer.add_batch
Long story short, this line should be replaced by
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
--> Afterwards the (replay buffer) inputs are of the correct shape and the Learner/Actor setup starts training.
The full replay buffer is shown below:
from tf_agents.utils.nest_utils import batch_nested_array

# create buffer for storing experience
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    agent.collect_data_spec,
    1,
    max_length=1000000)
# create batch dataset
dataset = replay_buffer.as_dataset(
    sample_batch_size=32,
    num_steps=2,
    single_deterministic_pass=False).prefetch(4)
# create batched nested-array input for rb_observer
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
# create batched readout of dataset
experience_dataset_fn = lambda: dataset
I am using tf.train.Supervisor to manage my session. I am already using the summary_writer in the Supervisor to write some summaries. I would, however, like to write another set of summaries at other intervals. As far as I can see, the easiest way is to use supervisor.loop. What I want is basically:
Pseudo code:
summary_merged_valid = tf.summary.merge(summary_ops_valid)
valid_writer = tf.train.SummaryWriter(logdir + '/valid')
global_step = slim.get_or_create_global_step()
...
config = tf.ConfigProto(allow_soft_placement=True)
with sv.managed_session(config=config) as sess:
    ...
    sv.loop(validation_interval,
            valid_writer.add_summary,
            (summary_merged_valid, global_step))
How should I go about this?
You can also provide your own summaries to the Supervisor manually, using
sv.summary_computed(sess, summary, global_step)
One interesting thing that doesn't seem to be advertised much is that you can group summaries into collections, like so:
tf.summary.scalar('learning_rate', p_lr, collections=['train'])
tf.summary.scalar('loss', t_loss, collections=['train', 'test'])
s_training = tf.summary.merge_all('train')
and then write only the train summaries by fetching s_training and passing the result to the function above.
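A minimal sketch of that pattern, assuming a Supervisor sv, an open session sess, and a step value from the training loop:

# evaluate only the summaries registered in the 'train' collection
summary_proto = sess.run(s_training)
# hand the serialized summary to the Supervisor's own writer
sv.summary_computed(sess, summary_proto, global_step=step)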
TensorFlow documents this here, under Launching additional services:
Example: Start a thread to print losses. We want this thread to run every 60 seconds, so we launch it with sv.loop().
sv = Supervisor(logdir='/tmp/mydir')
with sv.managed_session(FLAGS.master) as sess:
    sv.loop(60, print_loss, (sess,))
    while not sv.should_stop():
        sess.run(my_train_op)
See the answer by @sunside for good tips on how to do this in a smart way.
Below is a code snippet that I use to monitor events when training a DNNRegressor. I am running from a Jupyter notebook.
During training, I get the following errors in the terminal:
E tensorflow/core/util/events_writer.cc:162] The events file
/Users/eran/Genie/PNP/TB/events.out.tfevents.1473067505.Eran has disappeared.
E tensorflow/core/util/events_writer.cc:131] Failed to flush 2498 events to
/Users/eran/Genie/PNP/TB/events.out.tfevents.1473067505.Eran
def add_monitors():
    validation_metrics = {'MeanSquaredError': tf.contrib.metrics.streaming_mean_squared_error}
    monitors = learn.monitors.ValidationMonitor(valid_X, valid_y, every_n_steps=50,
                                                metrics=validation_metrics)
    return [monitors]

regressor = learn.DNNRegressor(model_dir='/Users/eran/Genie/PNP/TB',
                               hidden_units=[32, 16],
                               feature_columns=learn.infer_real_valued_columns_from_input(X),
                               optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=0.1),
                               config=learn.RunConfig(save_checkpoints_secs=1))

monitors = add_monitors()
regressor.fit(X, y, steps=10000, batch_size=20, monitors=monitors)
Any ideas? When opening TensorBoard I do not see any events being recorded.
Check whether your code recreates the log directory, e.g. with tf.gfile.DeleteRecursively(log_dir); tf.gfile.MakeDirs(log_dir), where log_dir is the path to the events file. This step must be done before any summary writer is created; otherwise TF will not be able to find the right event file.
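A minimal sketch of that ordering (the path is hypothetical; the point is that the recreation happens before anything that writes to the directory is constructed):

import tensorflow as tf

log_dir = '/tmp/my_logs'  # hypothetical path
# Recreate the directory BEFORE constructing the regressor/summary writer;
# deleting it afterwards pulls the event file out from under the writer.
if tf.gfile.Exists(log_dir):
    tf.gfile.DeleteRecursively(log_dir)
tf.gfile.MakeDirs(log_dir)
# ... only now create the DNNRegressor or summary writer pointing at log_dir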
If you use Windows, specify the directory like this:
model_dir='C:\\Users\\eran\\Genie\\PNP\\TB'
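Alternatively, a raw string is equivalent and avoids the doubled backslashes:

model_dir=r'C:\Users\eran\Genie\PNP\TB'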
I encountered a problem where the order of the code influences the final result. At first, the code works; after I move one line, TensorFlow generates an error.
For example,
working version:
probs = net.get_output()
label_node = tf.placeholder(tf.int32, name='label_node')
top_1_op = tf.nn.in_top_k(probs, label_node, 1)
top_5_op = tf.nn.in_top_k(probs, label_node, 5)
threads = image_producer.start(session=sess, coordinator=coordinator)
for (labels, images) in image_producer.batches(sess):
    top_1_result, top_5_result = sess.run([top_1_op, top_5_op],
                                          feed_dict={input_node: images, label_node: labels})
Non-working version:
threads = image_producer.start(session=sess, coordinator=coordinator)  # moved here
probs = net.get_output()
label_node = tf.placeholder(tf.int32, name='label_node')
top_1_op = tf.nn.in_top_k(probs, label_node, 1)
top_5_op = tf.nn.in_top_k(probs, label_node, 5)
for (labels, images) in image_producer.batches(sess):
    top_1_result, top_5_result = sess.run([top_1_op, top_5_op],
                                          feed_dict={input_node: images, label_node: labels})
TensorFlow generates the error
"tensorflow.python.framework.errors.NotFoundError: FeedInputs: unable to find feed output label_node:0".
As you can see, TensorFlow should be able to find "label_node:0". In fact, TensorFlow cannot find top_1_op and top_5_op either.
The content of image_producer.start is something similar to:
op_A = ...
queue_runner = tf.train.QueueRunner(queue_B, [op_B] * num_concurrent)
session.run(op_A)
t = queue_runner.create_threads(session, coord=coordinator, start=True)
Even stranger: in the non-working version, after I add two lines to image_producer.start, the code works again. For example, image_producer.start becomes:
op_C = ... # new
session.run(op_C) # new
op_A = ...
queue_runner = tf.train.QueueRunner(queue_B, [op_B] * num_concurrent)
session.run(op_A)
t = queue_runner.create_threads(session, coord=coordinator, start=True)
Does anyone have an idea about possible causes of this problem? Or any idea about how to debug this?
It sounds like you are suffering from a bug that was fixed after TensorFlow 0.9.0 was released. In that version (and earlier), TensorFlow suffered from a race condition that could lead to unrecoverable errors if you modified the graph after queue runners (or other threads calling sess.run()) had started. The only workaround in version 0.9.0 is to start the queue runners (i.e. the image_producer in your code) after the graph has been completely constructed.
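In other words, the ordering of your working version is the safe pattern: finish building the graph, then start the producer threads. A sketch using the names from the question:

# build the entire graph first ...
probs = net.get_output()
label_node = tf.placeholder(tf.int32, name='label_node')
top_1_op = tf.nn.in_top_k(probs, label_node, 1)
top_5_op = tf.nn.in_top_k(probs, label_node, 5)
# ... and only start the queue-runner threads once construction is done
threads = image_producer.start(session=sess, coordinator=coordinator)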
I'm trying to restrict the number of cores that a tf session uses but it's not working. This is how I'm initializing the session:
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                        intra_op_parallelism_threads=1,
                                        use_per_session_threads=True))
The system has 12 cores / 24 threads, and I can see that 40-60% of them are being used at any given point in time. The system also has 8 GPUs, but I construct the whole graph with tf.device('/cpu:0').
UPDATE: To clarify, the graph itself is a simple LSTM-RNN, that hews very closely to the examples in the tf source code. For completeness here's the full graph:
node_input = tf.placeholder(tf.float32, [n_steps, batch_size, input_size], name = 'input')
list_input = [tf.reshape(i, (batch_size, input_size)) for i in tf.split(0, n_steps, node_input)]
node_target = tf.placeholder(tf.float32, [n_steps, batch_size, output_size], name = 'target')
node_target_flattened = tf.reshape(tf.transpose(node_target, perm = [1, 0, 2]), [-1, output_size])
node_max_length = tf.placeholder(tf.int32, name = 'batch_max_length')
node_cell_initializer = tf.random_uniform_initializer(-0.1, 0.1)
node_cell = LSTMCell(state_size, input_size, initializer = node_cell_initializer)
node_initial_state = node_cell.zero_state(batch_size, tf.float32)
nodes_output, nodes_state = rnn(node_cell,
                                list_input,
                                initial_state=node_initial_state,
                                sequence_length=node_max_length)
node_output_flattened = tf.reshape(tf.concat(1, nodes_output), [-1, state_size])
node_softmax_w = tf.Variable(tf.random_uniform([state_size, output_size]), name = 'softmax_w')
node_softmax_b = tf.Variable(tf.zeros([output_size]), name = 'softmax_b')
node_logit = tf.matmul(node_output_flattened, node_softmax_w) + node_softmax_b
node_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(node_logit, node_target_flattened, name = 'cross_entropy')
node_loss = tf.reduce_mean(node_cross_entropy, name = 'loss')
node_optimizer = tf.train.AdamOptimizer().minimize(node_loss)
node_op_initializer = tf.initialize_all_variables()
One important thing to note is that if the first time I call tf.Session, I pass in the appropriate parameters, then the session does only run on a single core. The problem is that in subsequent runs, I am unable to change the behavior, even though I use use_per_session_threads which is supposed to specifically allow for session-specific settings. I.e. even after I close the session using sess.close() and start a new one with new options, the original behavior remains unchanged unless I restart the python kernel (which is very costly because it takes it nearly an hour to load my data).
use_per_session_threads only affects inter_op_parallelism_threads, not intra_op_parallelism_threads. The intra-op threads are used for the Eigen thread pool (see here), which is always global, so subsequent sessions cannot influence it anymore.
Note that there are other TF functions which can also trigger the initialization of the Eigen thread pool, so it can happen that it is already initialized before you create the first tf.Session. One example is tensorflow.python.client.device_lib.list_local_devices().
I work around this by creating a dummy session with the appropriate values very early in my Python script.
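A minimal sketch of that workaround (TF 1.x API, matching the question):

import tensorflow as tf

# Create a throwaway session as the very first TF activity in the process,
# so the global Eigen (intra-op) thread pool is initialized with one thread.
config = tf.ConfigProto(inter_op_parallelism_threads=1,
                        intra_op_parallelism_threads=1)
with tf.Session(config=config):
    pass  # the thread pools stay pinned for the rest of the process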
TensorFlow performs an optimization where the first time a DirectSession is created it builds static thread pools, which are then reused. If you want to change this, define multiple different thread pools in the session_inter_op_thread_pool configuration field and choose which one to use for each run.
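A hedged sketch of that configuration, assuming the ConfigProto field session_inter_op_thread_pool and the RunOptions field inter_op_thread_pool from the TF 1.x protos:

import tensorflow as tf

config = tf.ConfigProto()
# define two distinct inter-op pools instead of the shared static one
pool0 = config.session_inter_op_thread_pool.add()
pool0.num_threads = 1
pool1 = config.session_inter_op_thread_pool.add()
pool1.num_threads = 8

sess = tf.Session(config=config)
# select pool 0 (single-threaded) for a particular run
run_options = tf.RunOptions(inter_op_thread_pool=0)
# sess.run(fetches, options=run_options)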
In TensorFlow 2.3.2 I managed to limit the CPUs used by means of the psutil library.
I put this at the beginning of the function:
import os
import psutil

# pin the current process to CPUs 0 and 1
pid = psutil.Process(os.getpid())
pid.cpu_affinity([0, 1])
The subsequent call of model.fit then utilized exactly 2 CPUs.