What is the use of the .profile-empty file in the TensorFlow events folder?

There is a file (events.out.tfevents.1611631707.8f60fbcf7419.profile-empty) that appears alongside other event files, e.g. events.out.tfevents.1611897478.844156cf4a75.61.560.v2.
My model training is not going well at all, so I am looking for anything I don't understand that might be the cause. What is this .profile-empty file for?

This is a file written by the TensorFlow profiler. Its purpose is to help TensorBoard identify which directory contains the profile data. It is an empty marker file, so it has no effect on model training.
From the commit c66b603:
save empty event file in logdir when running profiler. TensorBoard will use this event file to identify the logdir that contains profile data
And from the commit 23d8e38:
Save an empty event file when StartTracing is called. This is to help with TensorBoard subdirectory searching.
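If you want to check which files in a log directory are these profiler markers and which hold real summaries, a small stdlib sketch can sort them by name. It assumes, based only on the file names in the question, that markers end in ".profile-empty" and that every other "events.out.tfevents.*" file is a regular summary file; the function name classify_event_files is hypothetical.

```python
import os
import tempfile


def classify_event_files(logdir):
    """Split TensorBoard event files in `logdir` into profiler markers and summary files.

    Assumption: profiler marker files end in ".profile-empty" (as in the question);
    any other "events.out.tfevents.*" file is treated as a regular summary file.
    """
    markers, summaries = [], []
    for name in sorted(os.listdir(logdir)):
        if not name.startswith("events.out.tfevents."):
            continue  # skip unrelated files and subdirectories such as plugins/
        if name.endswith(".profile-empty"):
            markers.append(name)
        else:
            summaries.append(name)
    return markers, summaries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as logdir:
        # Recreate the two file names from the question as empty files.
        for name in (
            "events.out.tfevents.1611631707.8f60fbcf7419.profile-empty",
            "events.out.tfevents.1611897478.844156cf4a75.61.560.v2",
        ):
            open(os.path.join(logdir, name), "wb").close()
        markers, summaries = classify_event_files(logdir)
        print("profiler markers:", markers)
        print("summary files:", summaries)
```

Only the summary files carry training metrics; the marker can be ignored when debugging training.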


How to save TensorFlow models to a RAM disk?

In my original Python code, I frequently restore the model from a ckpt checkpoint file, and reading the checkpoint again and again takes too much time. So I decided to keep the model in memory. A simple way to do this is to create a RAM disk and save the model there. However, something unexpected happened.
I set up a 1 GB RAM disk following the tutorial "How to Create RAM Disk in Windows 10 for Super-Fast Read and Write Speeds". My system is Windows 11.
I made two attempts. In the first, I copied my code to the RAM disk E: and used tf.train.Saver().save(self.sess, './') to save the model, but it reports UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 114: invalid start byte. If I put the code in any other ordinary folder, it runs successfully.
In the second attempt, I put the code on D: and changed the line to tf.train.Saver().save(self.sess, 'E:\\'), and it reports cannot create directory E: Permission Denied. Obviously, E:\ is not a directory that can be created, so I don't know how to handle this.
Your Jupyter/Python environment may not be able to write outside the directory from which Jupyter/Python was started, which is why you get a permission-denied error.
However, you can run shell commands from the Jupyter notebook. If your user has write access to the destination, you can do the following:
model.save("my_model") # This saves the model to the current directory.
!move "my_model" "E:\my_model" # Moves the model to the required directory. This is the Windows (cmd.exe) command; on Linux/macOS use !mv instead.
On a side note, when I search for tf.train.Saver().save(), the only relevant result I find is a page saying that it is used for saving checkpoints, not models. That page also recommends switching to the newer tf.train.Checkpoint or tf.keras.Model.save_weights. Nonetheless, the above method should work as expected.
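As an alternative to the shell command, Python's own shutil.move works the same on Windows, Linux and macOS, so it does not depend on which shell the notebook uses. A minimal sketch, assuming the saved model is a file or folder at `src`; the helper name move_saved_model is hypothetical.

```python
import os
import shutil


def move_saved_model(src, dst):
    """Move a saved model (file or directory) from `src` to `dst` and return the new path.

    shutil.move is cross-platform, so this avoids relying on `mv` vs `move`.
    """
    parent = os.path.dirname(os.path.abspath(dst))
    os.makedirs(parent, exist_ok=True)  # make sure the destination folder exists
    return shutil.move(src, dst)
```

Usage, with the paths from the answer: move_saved_model("my_model", r"E:\my_model"). Whether writing to E: succeeds still depends on your user having write access to the RAM disk.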

Keras "SavedModel file does not exist at..." for a model retrieved from an online URL

Keras "SavedModel file does not exist at..." error occurs for a model retrieved from an online URL and never manually saved at any local directory.
The code had run fine for as long as I'd been working on it, but when I reopened the project today, without changing anything, it now gives me this error.
I managed to solve it myself. Go to the directory mentioned in the error and delete the folder whose name is a random-looking string of numbers and letters. Rerun the program and it will regenerate the files it needs.
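The same fix can be done from Python. A minimal sketch, assuming you copy the folder path from the error message yourself; a plausible cause is that a cached download of the model was left incomplete, so deleting it forces a fresh download on the next run. The function name clear_cached_model is hypothetical.

```python
import os
import shutil


def clear_cached_model(cached_dir):
    """Delete the cached model folder named in the error so it is re-downloaded.

    `cached_dir` is the folder with the random-looking name from the
    "SavedModel file does not exist at ..." message. Returns True if it was removed.
    """
    if not os.path.isdir(cached_dir):
        return False  # nothing to delete
    shutil.rmtree(cached_dir)
    return True
```

Double-check the path before calling it, since shutil.rmtree deletes the whole folder without asking.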

Kaggle: directly download input data from a copied kernel

How can I download all the input data from a kaggle kernel? For example this kernel: https://www.kaggle.com/davidmezzetti/cord-19-study-metadata-export.
Once you make a copy and can edit it, you can run the notebook and make changes.
One thing I have noticed is that every file written to the output directory gets a download button next to its file icon. So I could read each input file and write it to the output directory, but that seems wasteful.
Am I missing something here?
The notebook you link to has two data sources:
another notebook (https://www.kaggle.com/davidmezzetti/cord-19-analysis-with-sentence-embeddings)
and a dataset (https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge)
You can use Kaggle's API to retrieve a kernel's output:
kaggle kernels output davidmezzetti/cord-19-analysis-with-sentence-embeddings
And to download dataset files:
kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
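If you prefer to script both downloads, a sketch like the following builds the two CLI calls from the answer and runs them with subprocess. It assumes the kaggle CLI is installed and your API token is configured; -p sets the download folder and --unzip extracts the dataset archive. The function names and the output folder layout are hypothetical.

```python
import shlex
import subprocess


def kaggle_download_commands(kernel, dataset, out_dir="kaggle_inputs"):
    """Build the two Kaggle CLI commands from the answer, writing into `out_dir`."""
    return [
        # Download the output files of the source notebook.
        ["kaggle", "kernels", "output", kernel, "-p", f"{out_dir}/kernel_output"],
        # Download and unzip the dataset files.
        ["kaggle", "datasets", "download", dataset, "-p", f"{out_dir}/dataset", "--unzip"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = kaggle_download_commands(
        "davidmezzetti/cord-19-analysis-with-sentence-embeddings",
        "allen-institute-for-ai/CORD-19-research-challenge",
    )
    for cmd in cmds:
        print(shlex.join(cmd))  # call run_all(cmds) to actually download
```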

Does Google Colab permanently change files?

I am doing some data pre-processing on Google Colab and am wondering how it handles changes to a dataset. For example, R does not change the original dataset until you use write.csv to export the changed dataset. Does Colab work similarly? Thank you!
Yes. Until you explicitly save your changed data, e.g. using df.to_csv on the same file you read from, the file on disk is not changed.
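A small pandas sketch illustrates this: edits to a DataFrame only change the in-memory copy, and the CSV file changes only when you call to_csv on it. The file contents and the function name demo_in_memory_edit are made up for illustration.

```python
import pandas as pd


def demo_in_memory_edit(path):
    """Return (values on disk before saving, values on disk after saving)."""
    pd.DataFrame({"x": [1, 2, 3]}).to_csv(path, index=False)  # create a sample file

    df = pd.read_csv(path)
    df["x"] = df["x"] * 10  # modifies the DataFrame in memory only

    before = pd.read_csv(path)["x"].tolist()  # the file on disk is untouched
    df.to_csv(path, index=False)  # explicit write overwrites the original file
    after = pd.read_csv(path)["x"].tolist()
    return before, after
```

Calling it on a scratch path returns ([1, 2, 3], [10, 20, 30]): the file keeps its original values until to_csv runs.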
Keep in mind that after a period of inactivity (an hour or so), your Colab session may expire and all progress will be lost.
Update
To download a model, dataset or other large file from Google Drive, the gdown command is already available:
!gdown https://drive.google.com/uc?id=FILE_ID
To download your code from GitHub and run predictions using the model you already downloaded:
!git clone https://USERNAME:TOKEN@github.com/username/project.git # GitHub requires a personal access token here instead of your account password
Prefix a line with ! in Colab and it will be treated as a shell command. For example, you can download files from the internet using wget:
!wget file_url
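If you would rather stay in Python than shell out to wget, the standard library can do a simple download too. A minimal sketch using urllib.request.urlretrieve; the helper name download and the "download" fallback file name are hypothetical, and it does not handle authentication or resuming like wget can.

```python
import os
import urllib.parse
import urllib.request


def download(url, dest_dir="."):
    """Download `url` into `dest_dir`, roughly what `!wget url` does, and return the path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    filename = os.path.basename(urllib.parse.urlparse(url).path) or "download"
    dest = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, download("https://example.com/data.csv", "data") would save the file as data/data.csv.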
You can then commit and push your updated code to GitHub, and upload the updated dataset or model to Google Drive or Dropbox.

Tensorboard, only show latest tfevents

TensorBoard shows all the events it finds in the given logdir.
If I run my training (or anything else) multiple times, I end up with multiple tfevents files in the logdir. TensorBoard then shows all those summaries merged together in one graph, which looks strange.
On stdout, it writes something like:
WARNING:tensorflow:Found more than one graph event per run. Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one "run metadata" event with tag step_0000. Overwriting it with the newest event.
How can I make it only show the summaries/events from the latest tfevents file, so that it ignores all older tfevents files?
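Try putting each run's tfevents files into its own directory named after the run, as described in the TensorBoard documentation. TensorBoard treats each subdirectory of the logdir as a separate run, so runs are no longer merged, and you can point --logdir at a single run's directory to see only that run.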
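One way to get a unique directory per run is to name it with a timestamp. The sketch below is an assumption-based helper, not a TensorBoard API: make_run_logdir creates a fresh subdirectory to pass as the log directory of your summary writer or TensorBoard callback, and latest_run_logdir picks the newest one. Because the names use a sortable timestamp format, the alphabetically last name is the newest run; two runs started within the same second would share a directory.

```python
import os
import time


def make_run_logdir(root="logs"):
    """Create and return a fresh per-run subdirectory, e.g. logs/run-20240101-120000."""
    run_dir = os.path.join(root, time.strftime("run-%Y%m%d-%H%M%S"))
    os.makedirs(run_dir, exist_ok=True)
    return run_dir


def latest_run_logdir(root="logs"):
    """Return the newest run directory under `root` (by timestamped name), or None."""
    runs = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    return os.path.join(root, runs[-1]) if runs else None
```

Then `tensorboard --logdir logs` shows every run as a separate entry, while `tensorboard --logdir` pointed at the path returned by latest_run_logdir() shows only the most recent run.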