TensorFlow: checkpoints simple load

I have a checkpoint file:
checkpoint-20001 checkpoint-20001.meta
How do I extract variables from these files, without having to load the previous model, start a session, and so on?
I want to do something like
cp = load(checkpoint-20001)
cp.var_a

It's not documented, but you can inspect the contents of a checkpoint from Python using the class tf.train.NewCheckpointReader.
Here's a test case that uses it, so you can see how the class works.
https://github.com/tensorflow/tensorflow/blob/861644c0bcae5d56f7b3f439696eefa6df8580ec/tensorflow/python/training/saver_test.py#L1203
Since it isn't a documented class, its API may change in the future.
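For reference, a minimal sketch of how the reader can be used (assuming the TF1-era API; 'var_a' is the hypothetical variable name from the question):
import tensorflow as tf

# Open the checkpoint by its prefix (no .meta extension needed).
reader = tf.train.NewCheckpointReader('checkpoint-20001')

# List every variable stored in the checkpoint, with its shape.
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)

# Fetch a single variable's value as a NumPy array.
var_a = reader.get_tensor('var_a')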

How to access a file inside sagemaker entrypoint script

I want to know how to access a file or folder in a private S3 bucket inside the script.py entry point of SageMaker.
I uploaded the file to S3 using the following code:
boto3_client = boto3.Session(
    region_name='us-east-1',
    aws_access_key_id='xxxxxxxxxxx',
    aws_secret_access_key='xxxxxxxxxxx'
)
sess = sagemaker.Session(boto3_client)
role = sagemaker.session.get_execution_role(sagemaker_session=sess)
inputs = sess.upload_data(path="df.csv", bucket=sess.default_bucket(), key_prefix=prefix)
This is the estimator code:
import sagemaker
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(
    entry_point='script.py',
    instance_type='ml.g4dn.xlarge',
    source_dir='./',
    role=role,
    sagemaker_session=sess,
)
Now, inside the script.py file, I want to access the df.csv file from S3. This is my code inside script.py:
parser = argparse.ArgumentParser()
parser.add_argument("--data-dir", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
args, _ = parser.parse_known_args()

# create session
sess = Session(boto3.Session(region_name='us-east-1'))
S3Downloader.download(s3_uri=args.data_dir,
                      local_path='./',
                      sagemaker_session=sess)
df = pd.read_csv('df.csv')
But this gives the error:
ValueError: Expecting 's3' scheme, got: in /opt/ml/input/data/training., exit code: 1
I think one way is to pass the secret key and access key. But I am already passing sagemaker_session. How can I call that session inside the script.py file and get my file read?
I think this approach is conceptually wrong.
Files within SageMaker jobs (whether training or otherwise) should be passed in during machine initialization. Imagine you have to create a job with 10 machines: do you want each machine to read the file from S3 separately, or have it read once and replicated to them?
In the case of a training job, the files should be passed into fit (in the case of direct code like yours) or as a TrainingInput in the case of a pipeline.
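For the TrainingInput case, a minimal sketch of constructing such a channel input (assuming the v2 SageMaker Python SDK; inputs is the S3 prefix returned by upload_data above), which can then be passed to fit() as shown below:
from sagemaker.inputs import TrainingInput

# Wrap the uploaded S3 prefix as a named channel input.
s3_input_train = TrainingInput(s3_data=inputs)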
You can follow this official AWS example: "Train an MNIST model with PyTorch"
However, the important part is simply passing a dictionary of input channels to the fit:
pytorch_estimator.fit({'training': s3_input_train})
You can name the channel (in this case 'training') any way you want. The S3 path should be the one where your df.csv lives.
Within your script.py, you can read the path to df.csv directly from environment variables (or at least specify it via argparse). Generic code with this default will suffice:
parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
The name follows the pattern "SM_CHANNEL_" + your channel name, upper-cased. So if you had put "train": s3_path, the variable would have been called SM_CHANNEL_TRAIN.
Then you can read your file directly by pointing to the path corresponding to that environment variable.
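Putting it together, a minimal sketch of script.py under these assumptions: the channel was named 'training' in the fit() call, so SageMaker copies the channel's S3 contents to the local path held in SM_CHANNEL_TRAINING, and no S3 client is needed inside the job:
import argparse
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
args, _ = parser.parse_known_args()

# The channel data is already on local disk; just read it.
df = pd.read_csv(os.path.join(args.train, "df.csv"))
print(df.head())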

Filename of graph .meta in a TensorFlow checkpoint directory

How do I get the filename of the latest checkpoint's graph?
In the examples for tf.train.import_meta_graph I typically see the filename hard-coded to be something like checkpoint_dir + 'model.ckpt-1234.meta'.
After importing, .restore can load the latest training parameters as in:
saver.restore(session, tf.train.latest_checkpoint(my_dir))
However, what is a reliable way to get the graph's filename? In my case,
tf.train.import_meta_graph(tf.train.latest_checkpoint(my_dir) + '.meta')
should do the job, but I don't think it's reliable, as checkpoints don't necessarily save the metagraph every time, correct?
I can write a routine that looks through the checkpoint dir and walks back until I find a .meta. But is there a better/built-in way, such as tf.train.latest_metagraph(my_dir)?
I just found an article where someone implemented the routine I mentioned in my question. This isn't an "answer", but it is what I'll use, assuming there isn't a built-in solution.
This is from the very nice article at seaandsalor, written by João Felipe Santos.
I don't know the usage rules, so I didn't want to directly link to his code. Follow that link if interested, and then go to the gist mentioned at the bottom.
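For illustration, a minimal sketch of that kind of routine (an assumption of how it might look, not the code from the article): walk back through the checkpoint state until a checkpoint with a matching .meta file is found.
import os

import tensorflow as tf

def latest_metagraph(checkpoint_dir):
    # CheckpointState lists paths ordered oldest -> newest.
    state = tf.train.get_checkpoint_state(checkpoint_dir)
    if state is None:
        return None
    for path in reversed(state.all_model_checkpoint_paths):
        meta = path + '.meta'
        if os.path.exists(meta):
            return meta
    return None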

How to create an op like conv_ops in tensorflow?

What I'm trying to do
I'm new to C++ and Bazel, and I want to make some changes to the convolution operation in TensorFlow, so I decided that my first step is to create an op just like it.
What I have done
I copied conv_ops.cc from //tensorflow/core/kernels and changed the name of the op registered in my new_conv_ops.cc. I also changed the names of some functions in the file to avoid duplication. And here is my BUILD file.
As you can see, I copied the deps attribute of conv_ops from //tensorflow/core/kernels/BUILD. Then I used "bazel build -c opt //tensorflow/core/user_ops:new_conv_ops.so" to build the new op.
What my problem is
Then I got this error.
I tried deleting bounds_check and got the same error for the next dependency. Then I realized that there is some problem with including header files from //tensorflow/core/kernels in //tensorflow/core/user_ops. So how can I properly create a new op exactly like conv_ops?
Adding a custom operation to TensorFlow is covered in the tutorial here. You can also look at actual code examples.
To address your specific problem, note that the tf_custom_op_library macro adds most of the necessary dependencies to your target. You can simply write the following:
tf_custom_op_library(
    name = "new_conv_ops.so",
    srcs = ["new_conv_ops.cc"],
)
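Once built, the resulting shared library can be loaded from Python with tf.load_op_library; a minimal sketch (the bazel-bin path is an assumption based on the target above; adjust it to your workspace):
import tensorflow as tf

# Load the custom op library produced by the Bazel target above.
new_conv_module = tf.load_op_library(
    'bazel-bin/tensorflow/core/user_ops/new_conv_ops.so')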

How to write summaries for multiple runs in Tensorflow

If you look at the Tensorboard dashboard for the cifar10 demo, it shows data for multiple runs. I am having trouble finding a good example showing how to set the graph up to output data in this fashion. I am currently doing something similar to this, but it seems to be combining data from runs, and whenever a new run starts I see this warning on the console:
WARNING:root:Found more than one graph event per run.Overwritting the graph with the newest event
The solution turned out to be simple (and probably a bit obvious), but I'll answer anyway. The writer is instantiated like this:
writer = tf.train.SummaryWriter(FLAGS.log_dir, sess.graph_def)
The events for the current run are written to the specified directory. Instead of having a fixed value for the logdir parameter, just set a variable that gets updated for each run and use that as the name of a sub-directory inside the log directory:
writer = tf.train.SummaryWriter('%s/%s' % (FLAGS.log_dir, run_var), sess.graph_def)
Then just specify the root log_dir location when starting tensorboard via the --logdir parameter.
As mentioned in the documentation, you can specify multiple log directories when running tensorboard. Alternatively, you can create multiple run subfolders in the log directory to visualize different plots in the same graph.
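A minimal sketch of the per-run pattern, assuming a timestamp is used as the run name (any unique per-run string works; log_dir stands in for FLAGS.log_dir):
import time

import tensorflow as tf

log_dir = '/tmp/logs'
run_var = time.strftime('run_%Y-%m-%d_%H-%M-%S')

with tf.Session() as sess:
    # Each run writes to its own subdirectory under the root log dir.
    writer = tf.train.SummaryWriter('%s/%s' % (log_dir, run_var), sess.graph_def)
Starting tensorboard with --logdir=/tmp/logs then shows each run separately.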

Inferring topics with mallet, using the saved topic state

I've used the following command to generate a topic model from some documents:
bin/mallet train-topics --input topic-input.mallet --num-topics 100 --output-state topic-state.gz
I have not, however, used the --output-model option to generate a serialized topic trainer object. Is there any way I can use the state file to infer topics for new documents? Training is slow, and it'll take a couple of days for me to retrain, if I have to create the serialized model from scratch.
We did not use the command-line tools shipped with Mallet; we just used the Mallet API to create the serialized model for inference on new documents. Two points need special notice:
You need to serialize the pipes you used just after you finish training (in my case, SerialPipes).
And of course the model also needs to be serialized after you finish training (in my case, ParallelTopicModel).
Please check the Javadoc:
http://mallet.cs.umass.edu/api/cc/mallet/pipe/SerialPipes.html
http://mallet.cs.umass.edu/api/cc/mallet/topics/ParallelTopicModel.html
Restoring a model from the state file appears to be a new feature in mallet 2.0.7, according to the release notes:
Ability to restore models from gzipped "state" files. From the new TopicTrainer, use the --input-state [filename] argument. Note that you can manually edit this file. Any token with topic set to -1 will be immediately resampled upon loading.
If you mean you want to see how new documents fit into a previously trained topic model, then I'm afraid there is no simple command you can use to do it right.
The class cc.mallet.topics.LDA in mallet 2.0.7's source code provides such a utility; try to understand it and use it in your program.
P.S.: if my memory serves, there is some problem with the implementation of this method in that class:
public void addDocuments(InstanceList additionalDocuments,
                         int numIterations, int showTopicsInterval,
                         int outputModelInterval, String outputModelFilename,
                         Randoms r)
You have to rewrite it.