I need to boot 100 virtual servers and use each for TensorFlow model inference for 30 days. What are some tools to do this?
I currently boot the servers from an image and manually open two tmux sessions: one for the model client and the other for the TensorFlow server. I receive a Slack notification if any server's CPU stops working, which is how I know a server has failed (I also manually SSH in to debug/restart it).
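For context, here is a rough sketch of the kind of health check I mean (the host list, the TensorFlow Serving REST port, and the Slack webhook URL below are all placeholders):

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder
HOSTS = [f"10.0.0.{i}" for i in range(1, 101)]  # the 100 servers

for host in HOSTS:
    try:
        # TensorFlow Serving's REST status endpoint (default port 8501)
        requests.get(f"http://{host}:8501/v1/models/mymodel", timeout=5).raise_for_status()
    except requests.RequestException:
        requests.post(SLACK_WEBHOOK, json={"text": f"server {host} is down"})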
Would appreciate any tips!
Dask-jobqueue seems to be a very nice solution for distributing jobs to PBS/Slurm-managed clusters. However, if I'm understanding its use correctly, you must create an instance of "PBSCluster/SLURMCluster" on the head/login node. Then, on the same node, you can create a client instance to which you can start submitting jobs.
What I'd like to do is let jobs originate on a remote machine, be sent over SSH to the cluster head node, and then get submitted to dask-jobqueue. I see that Dask has support for sending jobs over SSH to a "distributed.deploy.ssh.SSHCluster", but this seems to be designed for immediate execution after SSH, as opposed to taking the further step of putting the job in the queue.
To summarize, I'd like a workflow where jobs go remote --ssh--> cluster-head --slurm/jobqueue--> cluster-node. Is this possible with existing tools?
I am currently looking into this. My idea is to set up an SSH tunnel with paramiko and then use Pyro5 to communicate with the cluster object from my local machine.
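Here is a rough sketch of that idea. For brevity it uses a plain "ssh -L 9090:localhost:9090 head-node" port forward instead of paramiko; the object id, port, and SLURMCluster settings are made up. On the head node:

# head node: expose the SLURMCluster over Pyro5
import Pyro5.api
from dask_jobqueue import SLURMCluster

@Pyro5.api.expose
class ClusterService:
    def __init__(self):
        # made-up resource settings; adjust for your cluster
        self.cluster = SLURMCluster(cores=4, memory="8GB")

    def scale(self, n):
        # asks dask-jobqueue to submit/cancel Slurm jobs to reach n workers
        self.cluster.scale(n)

    def scheduler_address(self):
        return self.cluster.scheduler_address

daemon = Pyro5.api.Daemon(host="localhost", port=9090)
daemon.register(ClusterService(), objectId="cluster")
daemon.requestLoop()

Then on the local machine, with the tunnel up:

import Pyro5.api

cluster = Pyro5.api.Proxy("PYRO:cluster@localhost:9090")
cluster.scale(10)                   # Slurm jobs are submitted on the head node
print(cluster.scheduler_address())  # a remote dask Client would connect here

Note that the Dask scheduler port itself would also need to be forwarded for a remote dask.distributed.Client to connect to it.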
I have multiple standalone NiFi instances (approx. 10) that I want to use to send data to a NiFi cluster (3 NiFi instances) using an RPG (Site-to-Site). However, the flow from the standalone instances to the cluster seems to be slow.
Is this the right approach?
How many Site-to-Site Connections does NiFi allow?
Are there any best practices for Site-to-Site NiFi Data Flow?
You may want to first rule out your network. You could SSH to one of the standalone nodes and then try to SCP a large file from the standalone node to one of the nodes in the NiFi cluster. If that is slow, then it is more of a network problem, and there won't be much you can do in NiFi to make it go faster.
In NiFi, you can tune each side of the site-to-site config...
On the central cluster, you can right-click on the remote Input Port and configure its concurrent tasks, which default to 1. This is the number of threads that can concurrently process data received on the port.
On the standalone NiFi instances, you can also configure the concurrent tasks used to send data to a given port. Right-click on the RPG, select "Manage remote ports", and then change the concurrent tasks for the desired port.
I am using the Azure Container Service with the Kubernetes orchestrator and have an app deployed on a cluster with 3 nodes. It has 5 replicas. How can I verify load balancing in action, e.g., see that each time I hit the external IP I may be routed to a different node? Thanks.
The simplest solution is to connect (over SSH, for example) to all 3 nodes and run WinDump there. If everything is working properly, you will be able to see what happens on every node.
Also here is Microsoft documentation for testing a load balancer:
https://learn.microsoft.com/en-us/azure/virtual-machines/windows/tutorial-load-balancer#test-load-balancer
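If you'd rather script the check than read packet captures, here is a small sketch that hits the external IP repeatedly and tallies the responses. It assumes your app returns something instance-specific (such as its hostname) in the response body; the IP is a placeholder:

from collections import Counter
import requests

EXTERNAL_IP = "http://203.0.113.10/"  # placeholder: your service's external IP
hits = Counter()

for _ in range(50):
    # tally whatever instance-specific marker the app returns
    hits[requests.get(EXTERNAL_IP, timeout=5).text.strip()] += 1

for instance, count in hits.items():
    print(f"{instance}: {count} responses")

If load balancing is working, the 50 requests should be spread across several distinct markers rather than all landing on one.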
The default load balancers available to your Windows Azure Web and Worker roles are software load balancers and are not very configurable; however, they do work in a round-robin fashion. If you want to test this behavior, this is what you need to do:
Create two (or more) instances of your service with RDP access enabled, so you can RDP into both instances.
RDP into both instances and run NETMON or any other network monitoring solution on each.
Now access your Windows Azure web application from your desktop. You need to understand that once a network connection is made from your desktop, it stays alive based on network settings (default 60 seconds), so you need to wait until that default timeout has passed before accessing your Windows Azure web application again.
When you access your Windows Azure web application again, you can verify that the second request went to the next instance. Be sure to wait past the connection timeout, otherwise your requests will keep being handled by the same instance.
Note: If you don't want to use RDP, you can also create a test ASP.NET page with some instance-specific code that shows which instance served the page. The best way to do this is to read the instance ID, as below:
string instanceId = RoleEnvironment.CurrentRoleInstance.Id; // note: Id is a string, not an int
If you want more control over Windows Azure load balancing, I would suggest using Windows Azure Traffic Manager, which will help you route traffic to your site via round-robin, performance, or failover-based scenarios. More info on using Traffic Manager is in this article.
I intend to build a set of skills for Amazon Alexa that will integrate with a custom software suite that runs on a RaspberryPi in my home.
I am struggling to figure out how I can make the Echo / Dot itself make an API call to the RaspberryPi directly, without going through the internet, as the target device will have nothing more than an intranet connection: it will be able to receive commands from devices on the local network, but it is not accessible from the outside world.
From what I have read, the typical workflow is as follows:
Echo -> Alexa Service -> Lambda
where a Lambda function returns a blob of data to the smart home device. Using this return value, is it possible (and how) to make the Alexa device itself issue an API request to a device on the local network after receiving the response from Lambda?
I have the same problem and my solution is to use SQS as the message bus so that my RaspberryPi doesn't need to be accessible from the internet.
Echo <-> Alexa Service <-> Lambda -> SQS -> RaspberryPi
                             ^                  |
                             +------ SQS <------+
This works fine as long as:
you enable long polling (20 sec) of SQS on the RaspberryPi and set the max messages per request to 1 (see the sketch below)
you don't have concurrent messages going back and forth between Alexa and the RaspberryPi
This gives the benefits of:
with a max of 1 message per request, the SQS call returns as soon as one message is available in the queue, even before the long-poll timeout is reached
with only one long poll in flight at a time, an entire month of polling fits under the SQS free tier of 1 million requests
no special firewall permissions are needed to reach your RaspberryPi from the internet, so the Lambda-to-RaspberryPi path always "just works"
more secure than exposing your RaspberryPi to the internet, since there are no open ports exposed for malicious programs to attack
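A minimal sketch of the RaspberryPi side of this setup using boto3 (the queue URLs, region, and handler are placeholders):

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
CMD_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/alexa-commands"    # placeholder
REPLY_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/alexa-replies"  # placeholder

def handle_command(body):
    # hypothetical: run the actual Pi-side action here (e.g. toggle a GPIO pin)
    return f"handled: {body}"

while True:
    # long polling (20 sec) with at most 1 message per request, as described above
    resp = sqs.receive_message(QueueUrl=CMD_QUEUE, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        result = handle_command(msg["Body"])
        sqs.send_message(QueueUrl=REPLY_QUEUE, MessageBody=result)
        sqs.delete_message(QueueUrl=CMD_QUEUE, ReceiptHandle=msg["ReceiptHandle"])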
You could try using AWS IoT:
Echo <-> Alexa Service <-> Lambda <-> IoT <-> RaspberryPi
I thought about using this for my Alexa RaspberryPi project but abandoned the idea, since AWS IoT doesn't offer a permanent free tier. But the free tier is no longer a concern, since Amazon now offers Alexa AWS promotional credits.
https://developer.amazon.com/alexa-skills-kit/alexa-aws-credits
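If you do try it, the RaspberryPi side could look roughly like this with the AWS IoT Python SDK; the endpoint, certificate paths, and topic name are placeholders:

import time
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

def on_command(client, userdata, message):
    # message.payload is whatever blob the Lambda published to the topic
    print("received:", message.payload.decode())

mqtt = AWSIoTMQTTClient("raspberrypi")
mqtt.configureEndpoint("xxxxxxxx-ats.iot.us-east-1.amazonaws.com", 8883)    # placeholder
mqtt.configureCredentials("root-ca.pem", "private.key", "certificate.pem")  # placeholders
mqtt.connect()
mqtt.subscribe("alexa/commands", 1, on_command)  # placeholder topic, QoS 1

while True:
    time.sleep(1)  # keep the process alive; on_command fires on each publish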
One possibility is to install Node-RED on your rPi. Node-RED has plugins (https://flows.nodered.org/node/node-red-contrib-alexa-local) to simulate a Philips Hue device, which makes Alexa talk to it directly. It's an instant response. The downside is that it only works for 3 commands: on, off, and set to x%. It works great for software/devices that control lights, shades, and air-con.
This was answered in this forum a while ago, and I'm afraid to tell you that the situation hasn't changed since:
Alexa is cloud based and requires access to the internet / Amazon servers to function, so you cannot use it only within the intranet without external access.
There are a couple of workaround methods I've seen used.
The first method is one that I've used:
I set up If This Then That (IFTTT) to listen for a specific phrase from Alexa, then transmit commands through the Telegram secure chat/messaging service, where a "chat bot" running on my Raspberry Pi reads and acts on those messages.
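The Pi-side "chat bot" can be as simple as long-polling the Telegram Bot API. A rough sketch, where the bot token and command name are placeholders:

import requests

TOKEN = "123456:ABC-EXAMPLE"  # placeholder token from @BotFather
API = f"https://api.telegram.org/bot{TOKEN}"
offset = 0

while True:
    # long poll: blocks up to 30 sec until a new message arrives
    updates = requests.get(f"{API}/getUpdates",
                           params={"offset": offset, "timeout": 30},
                           timeout=40).json()
    for update in updates.get("result", []):
        offset = update["update_id"] + 1  # acknowledge so it isn't re-delivered
        if update.get("message", {}).get("text") == "/lights_on":
            print("lights on")  # call the actual GPIO / home-automation code here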
The second method I most recently saw uses IFTTT to add rows to a Google spreadsheet, which the Raspberry Pi can monitor and act on.
I wasn't particularly happy with the performance/latency of either of these methods, but if I wrote a custom Alexa service using a similar methodology, it might at least eliminate the IFTTT delay.
Just open an SSH tunnel into your rPi with a service like https://ngrok.com/ and then communicate with that, either as your endpoint or from the Lambda.
You can achieve this by using a proxy. BST has a tool for that; it's the one I currently use: http://docs.bespoken.tools/en/latest/commands/proxy/
So rather than using a Lambda, you can use your local machine.
Essentially it becomes Echo -> Alexa Service -> Local Machine
Install the bst npm package (https://www.npmjs.com/package/bespoken-tools) on your local machine:
npm install bespoken-tools --save
Go to your project's index.js folder and run the proxy command:
bst proxy lambda index.js
This will give you a URL as follows:
https://proxy.bespoken.tools?node-id=xxx-xxx-xxx-xxx-xxxxxxxx
Now go to your Alexa skill on developer.amazon.com and click to configure your skill.
Choose HTTPS as your service endpoint type and enter the URL printed out by BST.
Then click save, and boom: your local machine becomes the final endpoint.
Can I have multiple machines execute the tasks and return the messages that are distributed by Django? I looked into Celery/RabbitMQ, but I'm not sure whether I can set up Celery workers on remote computers. Can anyone guide me through this?
If this is not possible or very hard, is there an alternative solution to the problem?
You can do this by installing your Django project on the remote computer and then ensuring that it is configured to use the correct broker, database server, and media directory (assuming your tasks need access to them).
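As a minimal sketch (the project name, broker host, and credentials are placeholders), the Celery app on every machine just needs to point at the same broker:

# myproject/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery(
    "myproject",
    broker="amqp://user:password@broker-host:5672//",  # shared RabbitMQ (placeholder)
    backend="rpc://",  # send task results back over the broker
)
app.autodiscover_tasks()  # pick up tasks.py modules from installed Django apps

Then start a worker on each remote machine with celery -A myproject worker --loglevel=info; tasks queued by the Django site are picked up by whichever worker is free.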