Recommended way of profiling distributed TensorFlow

Currently, I am using the TensorFlow Estimator API to train my TF model. I am using distributed training with roughly 20-50 workers and 5-30 parameter servers, depending on the training data size. Since I do not have access to the session, I cannot use RunMetadata with a full trace to look at the Chrome trace. I see there are two other approaches:
1) tf.profiler.profile
2) tf.train.ProfilerHook
I am specifically using
tf.estimator.train_and_evaluate(estimator, train_spec, test_spec)
where my estimator is a prebuilt estimator.
Can someone give me some guidance (concrete code samples and code pointers will be really helpful since I am very new to TensorFlow) on the recommended way to profile an Estimator? Do the two approaches gather different information, or do they serve the same purpose? Is one recommended over the other?

There are two things you can try:
ProfilerContext
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/profiler/profile_context.py
Example usage:
with tf.contrib.tfprof.ProfileContext('/tmp/train_dir') as pctx:
    train_loop()
ProfilerService
https://www.tensorflow.org/tensorboard/r2/tensorboard_profiling_keras
You can start a ProfilerServer via tf.python.eager.profiler.start_profiler_server(port) on all workers and parameter servers, and then use TensorBoard to capture the profile.
Note that this is a very new feature; you may want to use tf-nightly.
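For example, a minimal sketch of what each worker/parameter server process might run (the port is arbitrary, and the module path assumes a tf-nightly build that exposes this API):

# Sketch only: start a profiler server so TensorBoard can connect and capture
# traces from this host; run this on every worker and parameter server.
from tensorflow.python.eager import profiler

profiler.start_profiler_server(6009)  # any port reachable from the TensorBoard host
# ...then continue with the normal training code for this task...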

TensorFlow has recently added a way to sample multiple workers.
Please have a look at the API:
https://www.tensorflow.org/api_docs/python/tf/profiler/experimental/client/trace?version=nightly
The parameter of the above API that is important in this context is:
service_addr: A comma delimited string of gRPC addresses of the workers to profile, e.g.
service_addr='grpc://localhost:6009'
service_addr='grpc://10.0.0.2:8466,grpc://10.0.0.3:8466'
service_addr='grpc://localhost:12345,grpc://localhost:23456'
Also, please look at this API:
https://www.tensorflow.org/api_docs/python/tf/profiler/experimental/ProfilerOptions?version=nightly
The parameter of the above API that is important in this context is:
delay_ms: Requests for all hosts to start profiling at a timestamp that is delay_ms away from the current time. delay_ms is in milliseconds. If zero, each host will start profiling immediately upon receiving the request. The default value is None, allowing the profiler to guess the best value.
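Putting the two together, a rough sketch of a multi-worker capture could look like the following (the addresses, log directory, and durations are placeholders, not values from the question):

import tensorflow as tf

# Capture ~2 seconds of trace from two workers at once and write the result to
# a directory that TensorBoard can read. All addresses/paths are illustrative.
options = tf.profiler.experimental.ProfilerOptions(delay_ms=1000)
tf.profiler.experimental.client.trace(
    service_addr='grpc://10.0.0.2:8466,grpc://10.0.0.3:8466',
    logdir='/tmp/profile_logs',
    duration_ms=2000,
    options=options)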

Related

How to specify the model version label in a REST API request?

As described in the documentation, using the version_labels field, you can assign a label to a model version in order to handle canary deployments.
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/serving_config.md#assigning-string-labels-to-model-versions-to-simplify-canary-and-rollback
For example, you can have model 43 labeled as stable and model 44 labeled as canary.
That feature sounds really neat, but I did not find in the doc how to adapt my POST request to specify the label I want to use.
Until now, I was using something of the sort:
curl -d '{"instances": <<my input data>>}' -X POST http://localhost:8501/v1/models/<<my model name>>:predict
Any idea?
Update:
Based on comments on this GitHub issue, @misterpeddy states that, as of August 14th, 2019:
Re: not being able to access the version using labels via HTTP - this is something that's not possible today (AFAIR) - only through the grpc interface can you declare labels :(
To the best of my knowledge, this feature is yet to be implemented.
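Since the comments point at gRPC, a minimal sketch of selecting a model by label through the gRPC client could look like this (host, port, and model name are placeholders, and I have not verified this end to end):

import grpc
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'         # placeholder model name
request.model_spec.version_label = 'canary'  # label instead of a numeric version
# ...fill request.inputs with the input tensors before calling...
response = stub.Predict(request, 10.0)       # 10 second timeout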
Original Answer:
It looks like the current implementation of the HTTP API Handler expects the version to be numeric.
You can see the regular expression that attempts to parse the URL here.
prediction_api_regex_(
    R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))")
The \d defines an expectation for a numeric version indicator rather than a text label.
I've opened a corresponding TensorFlow Serving issue here.
The REST API for TensorFlow Serving is defined here: https://www.tensorflow.org/tfx/serving/api_rest#url_4
For the predict method it would be:
http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
where ${MODEL_VERSION} would be stable or canary

API design for machine learning models

I'm using the Django REST framework to build an API web service that serves many already-trained machine learning models. Some models can predict on a batch size of 1, i.e. one image at a time. Others need a history of data (timelines) to be able to predict/forecast. Usually these timelines can hardly fit in, and be passed as, a request parameter. Given that, we want to give the requester the ability to request by either:
sending the data (small batches) to predict as parameter.
passing a database id/reference as parameter then the API will query the database and do the predictions.
So the question is: what would be the best API design for identifying which approach the requester chose? Some considered approaches:
Add /db to the path of the endpoint, e.g. POST models/<X>/db. The problem with this approach is that two endpoints are generated for each model.
Add a boolean db parameter to each request. The problem with this approach is that it adds extra overhead to every request just to check which approach was chosen; it also makes the code less readable.
A global setting per requester, chosen when signing up for the API token. The problem is that this restricts the requester to one mode, which is not convenient.
What would be the best approach for this case?
The fact that you currently have more than one source would cause me to seriously consider attempting to abstract the "source" component as much as possible, to allow all manner of sources. For example, suppose that future users would like to pull data out of MongoDB instead of whatever DB you are currently using? Or from some other storage structure? Or pull from a third party? Or, or, or....
In any case the question is now "how much do they all have in common, and what should they all implement?"
class Source(object):
    def __get_batch__(self, batch_size=1):
        raise NotImplementedError()  # each source needs to implement this on its own

# http_library.POST_endpoint("/db")
class DBSource(Source):
    def __init__(self, post_data):
        if post_data["table"] in ["data1", "data2"]:
            self.table = post_data["table"]
        else:
            raise Exception("Must use a predefined table to prevent SQL injection")

    def __get_batch__(self, batch_size=1):
        # the table name is validated above; only the batch size is parameterized
        return sql_library.query("SELECT * FROM {} LIMIT ?".format(self.table), batch_size)

# http_library.POST_endpoint("/local")
class LocalSource(Source):
    def __init__(self, post_data):
        self.data = post_data["data"]
        self.i = 0  # cursor into the posted data

    def __get_batch__(self, batch_size=1):
        batch = self.data[self.i:self.i + batch_size]
        self.i += batch_size
        return batch
This is just an example. However, if a fixed part of your path designates "the source", then you have left yourself open to scale this indefinitely.
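For instance, here is a rough sketch of how the path segment could pick the source class in a Django REST framework view; the MODELS registry, the view name, and the URL wiring are illustrative, not part of the code above:

from rest_framework.views import APIView
from rest_framework.response import Response

SOURCES = {"db": DBSource, "local": LocalSource}

class PredictView(APIView):
    def post(self, request, model_name, source_name):
        source_cls = SOURCES.get(source_name)  # "db" or "local" comes from the URL
        if source_cls is None:
            return Response({"error": "unknown source"}, status=400)
        source = source_cls(request.data)
        batch = source.__get_batch__(batch_size=1)
        prediction = MODELS[model_name].predict(batch)  # MODELS: hypothetical model registry
        return Response({"prediction": prediction})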
Add /db to the path of the endpoint, e.g. POST models/<X>/db. The problem with this approach is that two endpoints are generated for each model.
Inevitable. DRY out common code to sub-methods.
Add a boolean db parameter to each request. The problem with this approach is that it adds extra overhead to every request just to check which approach was chosen; it also makes the code less readable.
There won't be any additional overhead (that's what your underlying framework does to match a URL to a function/method anyway). However, these are two separate functionalities; I would keep them separate, so I would prefer the first approach.
A global setting per requester, chosen when signing up for the API token. The problem is that this restricts the requester to one mode, which is not convenient.
Yikes! Unless you provide a UI that lets a user select his preference and apply it globally (I don't think any UX designer will agree to that).
That being said, the API design should be driven by asking who masters (or owns) the data. If it's the application, and the user already knows the ID of the entity, then you shouldn't be asking the user to send the data.
If it's the user, and the data won't fit in a POST body, then I would say a real-time API may not be the right solution; think about message queues/pub-sub based systems.
If you need a hybrid solution, as you asked in the question, then I would prefer the first approach.

Distributed TensorFlow: about the using of tf.train.Supervisor.start_queue_runners

I'm looking at the code for the distributed Inception model in TF, and I have the following questions about the use of tf.train.Supervisor.start_queue_runners in inception_distributed_train.py:
Why do we need to explicitly call sv.start_queue_runners() in line 264 and line 269 of inception_distributed_train.py? The API doc of start_queue_runners suggests there is no need for such calls:
Note that the queue runners collected in the graph key QUEUE_RUNNERS
are already started automatically when you create a session with the
supervisor, so unless you have non-collected queue runners to start
you do not need to call this explicitly.
I noticed that the values of queue_runners passed to sv.start_queue_runners are different in line 264 and line 269 of inception_distributed_train.py. But aren't the chief_queue_runners also in the collection tf.GraphKeys.QUEUE_RUNNERS (all QUEUE_RUNNERS are obtained in line 263)? If so, then there is no need for line 269, since the chief_queue_runners have already been started in line 264.
Besides, could you please explain to me or show me some references about what queues are created in tf.train.Supervisor?
Thanks for your time!
Not an answer, but some general notes on how to find an answer :)
First of all, using GitHub's blame, inception_distributed was checked in on April 13, while that comment in start_queue_runners was added on April 15th, so it's possible that the functionality was changed but didn't get updated in all the places that use it.
You could comment out that line and see if things still work. And if not, you could add import pdb; pdb.set_trace() in the place where the queue runner gets created (i.e. here) and see who is creating those extra unattended queue runners.
Also, Supervisor development seems to have slowed down and things are getting moved over to FooSession (per the comment here). Those provide a more robust training architecture (your workers won't crash because of a temporary network error), but there are not many examples of how to use them yet.
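If that comment refers to the MonitoredSession family (my assumption), a minimal distributed sketch would look roughly like this; server, task_index, and train_op come from your own cluster setup:

import tensorflow as tf

# Queue runners, variable initialization, and checkpointing are handled by the
# session itself here, so there is no explicit start_queue_runners() call.
with tf.train.MonitoredTrainingSession(
        master=server.target,            # tf.train.Server for this task
        is_chief=(task_index == 0),      # the chief initializes and checkpoints
        checkpoint_dir='/tmp/train_logs') as sess:
    while not sess.should_stop():
        sess.run(train_op)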

BPMN model API to edit Process diagram

I have a process diagram that directs the flow on the basis of threshold variables. For example, for variables x and y: if x<50 I am directed to service task 1, if y<40 to service task 2, and if x>50 && y>40 to some other task.
As intuition suggests, I am using comparison checks on the sequence flows to determine the next task.
x and y are input by the user, but 50 and 40 (let's call these numbers {n}) are part of the process definition (PD).
Now, for a fixed {n} I have deployed a process diagram and it runs successfully.
What should I do if my {n} can vary between process instances? Is there a way to keep the same version of the process definition but have it take {n} dynamically?
I read about the BPMN Model API here, but I can't figure out how to use it to edit my PD dynamically. Do I need to redeploy it each time on Tomcat, or how does it work?
If you change a process model with the Model API, you have to redeploy it to actually use it. If you want a process definition with variable {n} values, you can also use a process variable for it and set it when starting the process instance, either using the Java API, the REST API, or the Tasklist.
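As an illustration only (assuming a Camunda 7 engine with its standard REST endpoint; the process key and variable names are placeholders), starting an instance with the thresholds passed as variables could look like this:

import requests

resp = requests.post(
    "http://localhost:8080/engine-rest/process-definition/key/myProcess/start",
    json={
        "variables": {
            "xThreshold": {"value": 50, "type": "Integer"},
            "yThreshold": {"value": 40, "type": "Integer"},
        }
    },
)
resp.raise_for_status()

The sequence flow conditions would then compare x and y against these variables instead of the hard-coded 50 and 40.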

How do multiple versions of a REST API share the same data model?

There is a ton of documentation on academic theory and best practices for managing versioning of RESTful web services; however, I have not seen much discussion of how multiple REST APIs interact with data.
I'd like to see various architectural strategies or documentation on how to handle hosting multiple versions of your app that rely on the same data pool.
For instance, suppose you make a destructive, database-level change to a table that forces you to increment your major API version to v2.
Now at any given time, users could be interacting with the v1 web service and the v2 web service at the same time and creating data that is visible and editable by both services. How should this be handled?
Most changes introduced to an API affect the content of the response; as long as the changes are incremental this is not a very big problem (note: you should never expose the exact DB model directly to the clients).
When you make a destructive/significant change to the DB model and a new version of the API is introduced, there are two options:
1. Turn the previous version off and answer all requests to it with a 301 and the new location.
2. If 1. is impossible, you need to maintain both the previous and the current version of the API. Since this can be time- and money-consuming, it should be done only for a limited period, and the previous version should eventually be turned off.
What about the DB model? When two versions of the API are active at the same time, I'd try to keep the DB model as consistent as possible, keeping in mind that running two versions at the same time is only temporary. But as I wrote earlier, the DB model should never be exposed directly to the clients; this may help you avoid a lot of problems.
I have given this a little thought...
One solution may be this:
Just because the v1 API should not change doesn't mean the underlying implementation cannot change. You can modify the v1 implementation code to set a default value, omit saving a field, return an unchecked exception, or do some kind of computational logic that keeps the v1 API compatible with the shared data source. Then implement a better, cleaner, more idealistic implementation in v2.
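As a toy sketch of that idea (all names and the db helper are made up for illustration), the v1 handler fills in defaults for fields that only v2 knows about, so both versions can write to the same table:

def save_record_v1(payload, db):
    # v1 clients never send "category" (a field added for v2), so default it
    # here before writing to the shared table.
    db.insert("records", {
        "name": payload["name"],
        "value": payload["value"],
        "category": payload.get("category", "uncategorized"),
    })

def save_record_v2(payload, db):
    # v2 clients must send the new field explicitly.
    db.insert("records", {
        "name": payload["name"],
        "value": payload["value"],
        "category": payload["category"],
    })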
When you are going to change anything in your API structure that can change the response, you must increase your API version.
For example, you have this request and response:
request POST: a, b, c, d
res: {a, b, c+d}
and you are going to add 'e', fetched from the database, to your response.
If 'e' does not change anything for current client versions, you can add it in your current API version.
But if your new changes are going to alter the previous responses, for example:
res: {a+e, b, c+d}
you must increase the API version number to prevent clients from crashing.
Changes to the request inputs work the same way.