Servers and sessions - tensorflow

I'm learning about distributed TensorFlow applications, and I understand jobs, tasks, and servers. A server's target identifies its gRPC location, as in grpc://localhost:1234.
I don't understand what happens when you create a session with a server's target, as in the following code:
with tf.Session(server.target) as sess:
...
The documentation states that server.target identifies the session's execution engine. Another page says that the constructor creates a session on the server. This isn't clear to me. How exactly does a server's target affect the session's execution?

Related

Failed Deployment in App Engine Google Cloud

I am deploying my nodejs application in google cloud app engine but it is giving error
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical request for your application. -- when making request.
I had also saw some stackoverflow answers, but they didn't worked for me.
my app.yaml have this config
runtime: nodejs10
Can anyone help me out
You could add the following to your app.yaml:
inbound_services:
- warmup
And then implement a handler that will catch all warmup requests, so that your application doesn't get the full load. The full explanation is given here. Another detailed post about this topic can be found here.
Additionally you can also add automatic scaling options. You can play a bit with those to find the optimum for your application. Especially the latency related variables are important. Good to note that they can be set in a standard GAE environment.
automatic_scaling:
min_idle_instances: automatic
max_idle_instances: automatic
min_pending_latency: automatic
max_pending_latency: automatic
More scaling options can be found here.
The "request caused a new process to be started" notification usually occurred when there is no warm up request present in your application.
Can you try to implement a health check handler that only returns a ready status when the application is warmed up. This will allow your service to not receive traffic until it is ready.
Warning: Legacy health checks using the /_ah/health path are now
deprecated, and you should migrate to use split health checks.
Here you can find Split health checks for Nodejs
Liveness checks
Liveness checks confirm that the VM and the Docker container are
running. Instances that are deemed unhealthy are restarted.
path: "/liveness_check"
check_interval_sec: 30
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
Readiness checks
Readiness checks confirm that an instance can accept incoming
requests. Instances that don't pass the readiness check are not added
to the pool of available instances.
path: "/readiness_check"
check_interval_sec: 5
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
app_start_timeout_sec: 300
Edit
For App Engine Standard, which doesn't afford you that flexibility, hardware and software failures that cause early termination or frequent restarts can occur without prior warning. link
App Engine attempts to keep manual and basic scaling instances running
indefinitely. However, at this time there is no guaranteed uptime for
manual and basic scaling instances. Hardware and software failures
that cause early termination or frequent restarts can occur without
prior warning and can take considerable time to resolve; thus, you
should construct your application in a way that tolerates these
failures.
Here are some good strategies for avoiding downtime due to instance
restarts:
Reduce the amount of time it takes for your instances restart or for
new ones to start.
For long-running computations, periodically create
checkpoints so that you can resume from that state.
Your app should be "stateless" so that nothing is stored on the instance.
Use queues for performing asynchronous task execution.
If you configure your instances to manual scaling: Use load balancing across > multiple instances. Configure more instances than required to handle normal
traffic. Write fall-back logic that uses cached results when a manual
scaling instance is unavailable.
Instance Uptime

JMeter gives “The target server failed to respond ” Error

We use Jmeter to do performance testing. I gave 200 threads(200 users). and we have two servers. like sever A, Server B. i tested indivisibly for 200 Users, it works. and we load balancing server Like server C. So request goes to ether server A Or Server B. But if configure my same jmx script(200 thread) with Server C. it gives error below (but it works for 50 users-- no error).
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.jmeter.protocol.http.sampler.MeasuringConnectionManager$MeasuredConnection.receiveResponseHeader(MeasuringConnectionManager.java:201)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
If the issue can be reproduced only on higher loads - it's definitely a server (or load balancer) issue so congratulations on finding the first bottleneck.
Now you can investigate the reason and suggest the fixes, the next steps could be:
Inspect application under test / load balancer logs - you can find a clue there
Inspect application under test / load balancer / database / any other middleware configuration. in the majority of cases default configuration is good for development and debugging but you will need to perform some performance tuning before running a prod-like load test
Collect main health metrics on the application under test side (CPU, RAM, Network, Disk, Swap usage, etc.). It might be the case your application simply lacks hardware resources. You can use built-in tools of the operating system(s) or an APM tool or JMeter PerfMon Plugin
Re-run your test with a profiling tool telemetry enabled on the application under test side. This will give you an overview with regards to where the application spends the most time, which are the "heaviest" functions or functions called most frequently so you would know what to optimise.
Make sure that the load balancer equally (or according to the other algorithm) distributes the requests between the backend servers. It might be the case you're hitting only one server, if this is - consider adding the DNS Cache Manager to your Test Plan and re-run your test to see if it helps.

How to build up a parameter server?

In distributed tensorflow, does a parameter server need to be built as a tensorflow server with both master service and worker service ? If the answer is yes, then can a ps machine also be a worker ?
In distributed TensorFlow, every worker is also a master. Furthermore, TensorFlow runtime has only one kind of worker, unlike its predecessor DistBelief, which had specialized parameter server workers.
You implement traditional parameter server architecture by using some workers to store parameters and others to execute session.run requests.

How to construct remote TenosrFlow session without starting a Server?

How can I construct a TensorFlow session that connects to an existing set of TF servers?
Following the distributed TensorFlow guide I started a bunch of TensorFlow servers that are connected to form a cluster.
I would now like to start a session that can connect to those TF servers and assign ops to them. I assume I just need to specify an appropriate target in the constructor of tf session; e.g something like
with tf.Session(
target, config=tf.ConfigProto(log_device_placement=True)) as sess:
However its not clear to me how to construct a target object pointing at any existing cluster of TF servers. The docs only show how to get the cluster spec from the server by calling server.target.
Do I need to start another server in order to construct a client that talks to the existing servers?
I want to connect to the TF cluster remotely. My TF servers are running on GCE VMs. I want to connect and assign ops to them from my local machine. Is this possible?
For the target you can just use a string of the form
grpc://<host>:<port>
Where host and port are the hostname (or ip address) and port of one of the TF servers comprising the cluster. It doesn't matter which one because where an op executes depends on the device you assigned it to when you constructed the graph.

Why are my WebLogic clustered MDB app deployments in warning state?

I have a WebLogic cluster on which I've deployed numerous topics and applications that use them. My applications uniformly show themselves in a Warning status. Looking at Monitoring on the deployment, I see the MDB application connects to Server #1, but on server #2 it shows this:
MDB application appName is NOT connected to messaging system.
My JMS Server is targetted to a migratable target, which is in turn targetted to the #1 server and has a cluster identified. And messages sent to either server all flow as expected. I just don't know why these deployments show in a Warning state.
WebLogic 11g
This can be avoided by using the parameter below
<start-mdbs-with-application>false</start-mdbs-with-application>
In the weblogic-application.xml, Setting start-mdbs-with-application to false forces MDBs to defer starting until after the server instance opens its listen port, near the end of the server boot up process.
If you want to perform startup tasks after JMS and JDBC services are available, but before applications and modules have been activated, you can select the Run Before Application Deployments option in the Administration Console (or set the StartupClassMBean’s LoadBeforeAppActivation attribute to “true”).
If you want to perform startup tasks before JMS and JDBC services are available, you can select the Run Before Application Activations option in the Administration Console (or set the StartupClassMBean’s LoadBeforeAppDeployments attribute to “true”).
Refer :http://docs.oracle.com/cd/E13222_01/wls/docs81/ejb/message_beans.html
this is applicable for the versions till 12c and later
I don't like unanswered questions, so I'm going to answer this one.
The problem is resolved, though I was not involved in its resolution. At present the problem only exists for the length of time it takes the JMS subsystem to fully initialize. During that period (with many queues, it can take a while) the JNDI system throws errors and the apps are truly in warning state. Once the JMS is fully initialized, everything goes green.
My belief is that someone corrected something in the JMS Server / Cluster config. I'll never know what it was.