TensorFlow not writing events - tensorflow

Below is a code snippet that I use to monitor events when training a DNNRegressor. I am running from a Jupyter notebook.
During training, I get the following errors in the terminal:
E tensorflow/core/util/events_writer.cc:162] The events file /Users/eran/Genie/PNP/TB/events.out.tfevents.1473067505.Eran has disappeared.
E tensorflow/core/util/events_writer.cc:131] Failed to flush 2498 events to /Users/eran/Genie/PNP/TB/events.out.tfevents.1473067505.Eran
def add_monitors():
    validation_metrics = {'MeanSquaredError': tf.contrib.metrics.streaming_mean_squared_error}
    monitors = learn.monitors.ValidationMonitor(valid_X, valid_y, every_n_steps=50, metrics=validation_metrics)
    return [monitors]

regressor = learn.DNNRegressor(model_dir='/Users/eran/Genie/PNP/TB',
                               hidden_units=[32, 16],
                               feature_columns=learn.infer_real_valued_columns_from_input(X),
                               optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=0.1),
                               config=learn.RunConfig(save_checkpoints_secs=1))

monitors = add_monitors()
regressor.fit(X, y, steps=10000, batch_size=20, monitors=monitors)
Any ideas? When I open TensorBoard, I do not see any events being recorded.

In your code, check whether you added any directory-recreation code such as tf.gfile.DeleteRecursively(log_dir); tf.gfile.MakeDirs(log_dir) for log_dir=path_to_events_file. This step must be done before any summary writer is created; otherwise TensorFlow will not be able to find the right event file.
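A minimal sketch of that ordering with the TF 1.x gfile API (the path below is just the asker's example directory):

import tensorflow as tf

log_dir = '/Users/eran/Genie/PNP/TB'  # adjust to your own log directory
# Recreate the log directory BEFORE any summary writer (or Estimator) is created
if tf.gfile.Exists(log_dir):
    tf.gfile.DeleteRecursively(log_dir)
tf.gfile.MakeDirs(log_dir)

# Only now construct objects that write events into log_dir
writer = tf.summary.FileWriter(log_dir)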

If you use Windows, give the directory like this:
model_dir='C:\\Users\\eran\\Genie\\PNP\\TB'
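If you prefer not to double the backslashes, a raw string gives the same path (plain Python quoting, nothing TensorFlow-specific):

model_dir = r'C:\Users\eran\Genie\PNP\TB'  # raw string: backslashes are taken literally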


Streaming output truncated to the last 5000 lines

The Google Colab output is being truncated. I've looked through the settings and I didn't see a limitation there. What is the best option to solve the problem?
I had the same problem and worked around it by writing the output to a file on Drive:
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir("/content/drive/")
with open('/content/drive/output.txt', 'w') as out:
    out.write(' abcd \n')
I currently have the same issue. I found this link on Medium; check the part "How do I use Colab for long training times/runs?".
So basically, according to this article, you need to store checkpoints on your Drive, and by using callbacks from Keras you will be able to resume training and effectively run it nonstop.
from keras.callbacks import ModelCheckpoint

# Save a checkpoint to Google Drive whenever validation accuracy improves
filepath = "/content/gdrive/My Drive/MyCNN/epochs:{epoch:03d}-val_acc:{val_acc:.3f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
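The callbacks list then has to be passed to the training call; a minimal sketch, where model, x_train and y_train are placeholders for your own code:

# 'model', 'x_train', 'y_train' are hypothetical names from your own training script
model.fit(x_train, y_train,
          validation_split=0.2,   # validation data is needed for 'val_acc' to exist
          epochs=50,
          callbacks=callbacks_list)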
Another solution, based on my research, is to paste the following code into your browser console, but make sure you still save your progress to Drive, because the runtime will be terminated after 12 hours.
function ClickConnect() {
    console.log("Working");
    document
        .querySelector('#top-toolbar > colab-connect-button')
        .shadowRoot.querySelector('#connect')
        .click()
}
setInterval(ClickConnect, 60000)

GPU error occurs when running twice on Google Colaboratory

I want to test FIFOQueue. When I use with tf.device("/device:GPU:0"):, the first run works fine, but when I run it a second time an error occurs saying the GPU cannot be assigned to fifo_queue_EnqueueMany (the error details were shown in a screenshot). Can anyone help me?
Per drpng's note on "Tensorflow: using a FIFO queue for code running on GPUs", I wouldn't expect FIFOQueue to be scheduled on a GPU, and indeed wrapping your code in a .py file (to see TF's logging output) and logging device placement confirms that even the first (successful) invocation schedules on a CPU.
In one cell run:
%%writefile go.py
import tensorflow as tf
config = tf.ConfigProto()
#config.allow_soft_placement=True
config.gpu_options.allow_growth = True
config.log_device_placement=True
def go():
    Q = tf.FIFOQueue(3, tf.float16)
    enq_many = Q.enqueue_many([[0.1, 0.2, 0.3],])
    with tf.device('/device:GPU:0'):
        with tf.Session(config=config) as sess:
            sess.run(enq_many)
            print(Q.size().eval())

go()
go()
And in another cell execute the above as:
!python3 go.py
and observe placement.
Uncomment the allow_soft_placement assignment to make the crash go away.
(I do not know why the first execution succeeds even in the face of non-soft-placement when asking FIFOQueue to schedule on the GPU explicitly as in your code's "first time")

"No graph definition files were found" - TensorBoard error

I used the following code in Pycharm:
import tensorflow as tf
sess = tf.Session()
a = tf.constant(value=5, name='input_a')
b = tf.constant(value=3, name='input_b')
c = tf.multiply(a,b, name='mult_c')
d = tf.add(a,b, name='add_d')
e = tf.add(c,d, name='add_e')
print(sess.run(e))
writer = tf.summary.FileWriter("./tb_graph", sess.graph)
Then, I pasted the following line into the Anaconda Prompt:
tensorboard --logdir=="tb_graph"
I tried it both with "" and with '', as proposed in "Tensorboard: No graph definition files were found.", and neither works for me.
I had a similar issue. The issue occurred when I specified the 'logdir' folder inside single quotes instead of double quotes. Hope this may be helpful to you.
E.g.: tensorboard --logdir='my_graph' -> TensorBoard didn't detect the graph
tensorboard --logdir="my_graph" -> TensorBoard detected the graph
I checked the code on a laptop with Ubuntu 16.04 and another one with Win10, so it probably isn't a system-specific error.
I also tried adding and removing --host=127.0.0.1 in the Anaconda Prompt and checking several times both http://localhost:6006/ and http://desktop-.......:6006/.
Still same error:
No graph definition files were found.
To store a graph, create a tf.summary.FileWriter and pass the graph either via the constructor, or by calling its add_graph() method. You may want to check out the graph visualizer tutorial.
....
Please tell me what is wrong in the code/prompt command?
EDIT: On Ubuntu I used the normal terminal, of course.
EDIT2: I used both = and == in the command prompt.
The answer to my question is:
1) change "./new1_dir" into ".\\new1_dir"
and
2) put the full path in the Anaconda Prompt: --logdir="C:\Users\Admin\Documents\PycharmProjects\try_tb\new1_dir"
Thanks @BugKiller for your help!
EDIT: Working only on Windows for me, but still better than nothing
EDIT2: Works on Ubuntu 16.04 too

TensorFlow: Opening log data written by SummaryWriter

After following this tutorial on summaries and TensorBoard, I've been able to successfully save and look at data with TensorBoard. Is it possible to open this data with something other than TensorBoard?
By the way, my application is to do off-policy learning. I'm currently saving each state-action-reward tuple using SummaryWriter. I know I could manually store/train on this data, but I thought it'd be nice to use TensorFlow's built in logging features to store/load this data.
As of March 2017, the EventAccumulator tool has been moved from Tensorflow core to the Tensorboard Backend. You can still use it to extract data from Tensorboard log files as follows:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
event_acc = EventAccumulator('/path/to/summary/folder')
event_acc.Reload()
# Show all tags in the log file
print(event_acc.Tags())
# E. g. get wall clock, number of steps and value for a scalar 'Accuracy'
w_times, step_nums, vals = zip(*event_acc.Scalars('Accuracy'))
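If you prefer the extracted scalars in tabular form, a small follow-up sketch (assuming pandas is installed):

import pandas as pd

# Build a DataFrame from the tuples unpacked above
df = pd.DataFrame({'wall_time': w_times, 'step': step_nums, 'Accuracy': vals})
print(df.head())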
Easy, the data can actually be exported to a .csv file within TensorBoard under the Events tab, which can e.g. be loaded in a Pandas dataframe in Python. Make sure you check the Data download links box.
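The downloaded CSV can then be read back with pandas; the file name below is hypothetical, so use whatever name the download link actually produced:

import pandas as pd

# Hypothetical file name from the TensorBoard "Data download links" CSV export
df = pd.read_csv('run_train-tag-Accuracy.csv')  # typically contains Wall time, Step, Value columns
print(df.head())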
For a more automated approach, check out the TensorBoard readme:
If you'd like to export data to visualize elsewhere (e.g. iPython
Notebook), that's possible too. You can directly depend on the
underlying classes that TensorBoard uses for loading data:
python/summary/event_accumulator.py (for loading data from a single
run) or python/summary/event_multiplexer.py (for loading data from
multiple runs, and keeping it organized). These classes load groups of
event files, discard data that was "orphaned" by TensorFlow crashes,
and organize the data by tag.
As another option, there is a script
(tensorboard/scripts/serialize_tensorboard.py) which will load a
logdir just like TensorBoard does, but write all of the data out to
disk as json instead of starting a server. This script is setup to
make "fake TensorBoard backends" for testing, so it is a bit rough
around the edges.
I think the data are encoded as protobufs in RecordReader format. To get serialized strings out of the files you can use py_record_reader or build a graph with the TFRecordReader op, and to deserialize those strings to protobufs use the Event schema. If you get a working example, please update this question, since we seem to be missing documentation on this.
I did something along these lines for a previous project. As mentioned by others, the main ingredient is TensorFlow's event accumulator:
from tensorflow.python.summary import event_accumulator as ea
acc = ea.EventAccumulator("folder/containing/summaries/")
acc.Reload()
# Print tags of contained entities, use these names to retrieve entities as below
print(acc.Tags())
# E. g. get all values and steps of a scalar called 'l2_loss'
xy_l2_loss = [(s.step, s.value) for s in acc.Scalars('l2_loss')]
# Retrieve images, e. g. first labeled as 'generator'
img = acc.Images('generator/image/0')
with open('img_{}.png'.format(img.step), 'wb') as f:
    f.write(img.encoded_image_string)
You can also use tf.train.summary_iterator: to extract events from a ./logs folder where only classic scalars lr, acc, loss, val_acc and val_loss are present, you can use this GIST: tensorboard_to_csv.py
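For reference, a minimal sketch of iterating over raw events with the TF 1.x tf.train.summary_iterator (the path and tags are placeholders for your own files):

import tensorflow as tf

# Path is a placeholder; point it at one of your own event files
for event in tf.train.summary_iterator('./logs/events.out.tfevents.xxx'):
    for value in event.summary.value:
        if value.HasField('simple_value'):  # only scalar summaries
            print(event.step, value.tag, value.simple_value)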
Chris Cundy's answer works well when you have fewer than 10000 data points in your tfevents file. However, when you have a large file with over 10000 data points, TensorBoard will automatically sample them and only give you at most 10000 points. This is quite annoying underlying behavior, as it is not well documented. See https://github.com/tensorflow/tensorboard/blob/master/tensorboard/backend/event_processing/event_accumulator.py#L186.
To get around it and keep all data points, a somewhat hacky way is:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
class FalseDict(object):
    def __getitem__(self, key):
        return 0
    def __contains__(self, key):
        return True

event_acc = EventAccumulator('path/to/your/tfevents', size_guidance=FalseDict())
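If you only want to lift the cap rather than spoof every lookup, EventAccumulator also accepts an explicit size_guidance dict in which 0 means "keep everything"; the key names below follow the event_accumulator source linked above, so treat them as an assumption for your version:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# 0 = keep all events of that kind (per the size_guidance handling in event_accumulator.py)
event_acc = EventAccumulator('path/to/your/tfevents',
                             size_guidance={'scalars': 0, 'images': 0, 'histograms': 0})
event_acc.Reload()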
It looks like for tb version >=2.3 you can streamline the process of converting your tb events to a pandas dataframe using tensorboard.data.experimental.ExperimentFromDev().
It requires you to upload your logs to TensorBoard.dev, though, which is public. There are plans to expand the capability to locally stored logs in the future.
https://www.tensorflow.org/tensorboard/dataframe_api
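A short sketch of the flow from that page (the experiment id is a placeholder printed when you upload with tensorboard dev upload):

import tensorboard as tb

# 'EXPERIMENT_ID' is a placeholder returned by `tensorboard dev upload --logdir ...`
experiment = tb.data.experimental.ExperimentFromDev('EXPERIMENT_ID')
df = experiment.get_scalars()  # pandas DataFrame of all scalar summaries
print(df.head())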
You can also use the EventFileLoader to iterate through a tensorboard file
from tensorboard.backend.event_processing.event_file_loader import EventFileLoader
for event in EventFileLoader('path/to/events.out.tfevents.xxx').Load():
    print(event)
Surprisingly, the Python package tbparse has not been mentioned yet.
From documentation:
Installation:
pip install tensorflow # or tensorflow-cpu
pip install -U tbparse # requires Python >= 3.7
Note: If you don't want to install TensorFlow, see Installing without TensorFlow.
We suggest using an additional virtual environment for parsing and plotting the tensorboard events. So no worries if your training code uses Python 3.6 or older versions.
Reading one or more event files with tbparse only requires 5 lines of code:
from tbparse import SummaryReader
log_dir = "<PATH_TO_EVENT_FILE_OR_DIRECTORY>"
reader = SummaryReader(log_dir)
df = reader.scalars
print(df)

Tensorboard not listing any event

Running Tensorflow and Tensorboard on docker here.
I was trying to write the simplest code to just demonstrate how tensorboard may work:
graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):
    a = tf.constant(5.0)
    b = tf.constant(6.0)
    c = a * b

    # Enter data into summary.
    c_summary = tf.scalar_summary("c", c)
    merged = tf.merge_all_summaries()

    with tf.Session(graph=graph) as session:
        writer = tf.train.SummaryWriter("log/test_logs", session.graph_def)
        result = session.run([merged])
        tf.initialize_all_variables().run()
        writer.add_summary(result[0], 0)
I then ran tensorboard --logdir={absolute path to log/test_logs} but no event was listed there. Is there anything I should have written differently in the code maybe?
Note that log/test_logs does contain files like events.out.tfevents.1459102927.0a8840dee548.
I am not sure whether this is your case.
By default, SummaryWriter stores summaries in a buffer and only flushes them periodically (every 120 seconds, I guess? Not sure).
So maybe you just did not wait until the flush happened. Try to manually flush the SummaryWriter, or just close() it at the end of your program.
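A minimal sketch of that suggestion against the old tf.train.SummaryWriter API used in the question (flush() pushes buffered events to disk immediately; close() also flushes):

writer.add_summary(result[0], 0)
writer.flush()   # force buffered events out to the event file now
# ...or, at the very end of the program:
writer.close()   # flushes remaining events and closes the file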