Are there associated costs with SageMaker Debugger? - amazon-sagemaker-debugger

Are there associated costs with SageMaker Debugger? Everywhere I look it says that there are no additional costs, but this has me curious as to what underlying resources the service uses.

As mentioned on the SageMaker pricing page, there's no additional cost when you use built-in rules for debugging. If you use custom rules, however, you need to specify an instance size, and you will be charged for that instance for the duration of the training job.
Also, this documentation details the architecture and explains that debugging for built-in rules happens in processing containers managed by Debugger.
https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-how-it-works.html

Related

Real Browser Metrics in eggPlant Performance tool

Our organization needs to run performance tests at both the protocol level (HTTP requests) and the UI level (browser actions).
After some exploration, I found that eggPlant has an option to execute functional scripts in eggPlant Performance. Does anyone know which real browser metrics eggPlant Performance provides?
Any help would be appreciated.
EggPlant Performance doesn't provide a fixed number of metrics. A better mental model is that, while a test suite is running, EggPlant records event log entries. These event log entries are consumed by a collection of analysis tools to derive whatever charts and metrics are determined to be relevant. In addition to the default events captured during a test run, EggPlant supports the logging of custom events.
This is useful because it allows you as the user to choose what metrics are relevant to your specific use case.
Instead of asking "What metrics are there?" a better question to be asking is "What metrics do I care about?".

What is the difference between the `policy` and `collect_policy` of a tf-agent?

I am looking at tf-agents to learn about reinforcement learning. I am following this tutorial. It uses a different policy for training, called collect_policy, than for evaluation (policy).
The tutorial states there is a difference, but IMO it does not explain why there are two policies, as it does not describe a functional difference.
Agents contain two policies:
agent.policy — The main policy that is used for evaluation and deployment.
agent.collect_policy — A second policy that is used for data collection.
I've looked at the source code of the agent. It says
policy: An instance of tf_policy.Base representing the Agent's current policy.
collect_policy: An instance of tf_policy.Base representing the Agent's current data collection policy (used to set self.step_spec).
But I do not see self.step_spec anywhere in the source file. The next closest thing I find is time_step_spec. But that is the first ctor argument of the TFAgent class, so that makes no sense to set via a collect_policy.
So the only thing I could think of was to put it to the test: I used policy instead of collect_policy for training. The agent reached the max score in the environment nonetheless.
So what is the functional difference between the two policies?
There are some reinforcement learning algorithms, such as Q-learning, that use a policy to behave in (or interact with) the environment to collect experience, which is different than the policy they are trying to learn (sometimes known as the target policy). These algorithms are known as off-policy algorithms. An algorithm that is not off-policy is known as on-policy (i.e. the behaviour policy is the same as the target policy). An example of an on-policy algorithm is SARSA. That's why we have both policy and collect_policy in TF-Agents, i.e., in general, the behavioural policy can be different than the target policy (though this may not always be the case).
Why should this be the case? Because during learning and interaction with the environment, you need to explore the environment (i.e. take random actions), while, once you have learned the near-optimal policy, you may not need to explore anymore and can just take the near-optimal action (I say near-optimal rather than optimal because you may not have learned the optimal one).
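To make the distinction concrete, here is a minimal sketch in plain Python (not the TF-Agents API). It assumes a made-up table of action-value estimates for a single state and an epsilon-greedy behaviour policy. The function names, the values, and the choice of epsilon are all illustrative assumptions; whether a particular agent's collect_policy is epsilon-greedy depends on the agent.

```python
import random

def greedy_policy(q_values):
    """Target policy: always pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_policy(q_values, epsilon, rng):
    """Behaviour (collect) policy: with probability epsilon take a random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_policy(q_values)

rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical action-value estimates for one state

# Evaluation: the greedy policy always returns the same action for this state.
eval_actions = [greedy_policy(q_values) for _ in range(1000)]

# Data collection: mostly the greedy action, with occasional exploration.
collect_actions = [epsilon_greedy_policy(q_values, 0.2, rng) for _ in range(1000)]

print("actions seen during evaluation:", sorted(set(eval_actions)))
print("actions seen during collection:", sorted(set(collect_actions)))
```

With epsilon set to 0, the two policies behave identically, which corresponds to the on-policy case where no separate exploration is added.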

How to model Storage Capacity in BPMN?

I am right now trying to model a warehouse with import and export processes. I have the problem that I do not know how I should model the capacity of different storage places in the warehouse. There are processes where vehicles with different loadings come and all of them need to be stored in the warehouse with a limited capacity. Else the arriving goods have to be declined.
I am modeling this process in a BPM Suite and was thinking about using Python to address this problem. I thought that I could simply use variables and if clauses to check the capacity of each storage place. But when I simulate the process with this approach, the variables are re-instantiated with their start values each time and do not hold the current value, because the script is included in the model as a script task.
Does anyone have other ideas for modeling capacity in BPMN?
Have you considered not using BPMN, as it clearly adds more complexity than benefit in your case? Look at Cadence Workflow, which lets you specify orchestration logic in normal code and would support your requirements directly without any ugly workarounds.

Sampling rule depending on duration

Is it possible in AWS X-Ray to create a sampling rule that samples all calls to some service whose duration is greater than some value?
The only way right now to find the lagging sub-service is to sample 100% and then filter by service name and duration. This is pretty expensive with 10K+ segments per second.
AWS X-Ray dev here. Biased sampling on longer durations can skew your service graph statistics and cause false negatives. If you are OK with this data skew, then, depending on which X-Ray SDK you are using, you might be able to achieve conditional sampling by making explicit sampling decisions on segment close. This would require you to override certain parts of the SDK's behavior.
We know this is a popular feature request and we are working on improving it. Please use our AWS X-Ray public forum https://forums.aws.amazon.com/forum.jspa?forumID=241&start=0 to follow the latest feature launches or to provide any additional comments.
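The idea of deciding at segment close can be sketched in plain Python. This is not X-Ray SDK code: the Segment class, the keep_on_close function, the threshold, and the baseline rate are all hypothetical names and values chosen for illustration, and the actual hook you would override differs per SDK.

```python
import random
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical stand-in for a trace segment, not an X-Ray SDK class."""
    service: str
    duration_s: float

def keep_on_close(segment, slow_threshold_s=1.0, baseline_rate=0.01, rng=random):
    """Decide at close time, once the duration is known, whether to emit the segment."""
    if segment.duration_s >= slow_threshold_s:
        return True  # always keep slow calls
    return rng.random() < baseline_rate  # keep only a small share of fast calls

rng = random.Random(42)
segments = [Segment("checkout", d) for d in (0.05, 0.2, 1.5, 0.1, 3.0)]
kept = [s for s in segments if keep_on_close(s, rng=rng)]

for s in kept:
    print(s.service, s.duration_s)
```

Because slow calls are always kept and fast calls rarely are, statistics computed from the kept segments overstate latency. This is the data skew mentioned above.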

Limit GPU memory allocation in skflow

I am training convnets with Tensorflow and skflow, on an EC2 instance I share with other people. For all of us to be able to work at the same time, I'd like to limit the fraction of available GPU memory which is allocated.
This question does it with Tensorflow, but since I'm using skflow I'm never using a tf.Session().
Is it possible to do the same thing through skflow?
At the moment, you can only control the number of cores (num_cores) used in estimators, by passing this parameter to the estimator.
To achieve what you need, one can add gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) to tf.ConfigProto, as suggested in the question you linked.
Feel free to submit a PR to make changes here, as well as to add these additional parameters to all estimators. Otherwise, I'll make the changes some time this week.
Edit:
I have made the changes to allow those options. Please check the "Building A Model Using Different GPU Configurations" example in the examples folder. Let me know if there's any particular need or other options you want to add. Pull requests are always welcome!