Can I distribute training and inference of a DNN architecture over cloud and edge devices? - tensorflow

I'm doing research about distributed DNNs. From what I've gathered, we can distribute DNN computation over many GPUs, and we can also run it on mobile devices. Inference architectures are usually single-platform, existing either on mobile or in the cloud.
My question is:
Can we distribute the training and inference phases of a DNN architecture across a joint platform (both cloud and mobile)? If that is possible, how can it be done?

There's a plethora of options to choose from, depending on your framework. Horovod is mostly framework-agnostic and can be used for distributed training. It also satisfies your need to use cloud services. While it is entirely possible to build your own setup using Distributed TensorFlow, be aware that this is a lower-level approach than Horovod and is therefore missing some bells and whistles.
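For the training side, here is a minimal sketch of what Horovod looks like with Keras, assuming Horovod is installed with TensorFlow support (the model and data below are placeholders):

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Pin each process to one GPU (Horovod runs one process per GPU).
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Placeholder model and data.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
    x = tf.random.normal([256, 20])
    y = tf.random.uniform([256], maxval=10, dtype=tf.int32)

    # Scale the learning rate by the number of workers and wrap the optimizer
    # so gradients are averaged across all workers via allreduce.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

    # Broadcast rank 0's initial weights so every worker starts identically.
    model.fit(x, y, epochs=5,
              callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])

Launched with, e.g., horovodrun -np 4 python train.py, this runs four synchronized copies of the training loop; the same launcher works across several cloud machines.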
Distributed inference, on the other hand, is not as common, since inference itself does not require as much computational power as training and is embarrassingly parallel most of the time: each input can be processed independently, so you can simply run more replicas of the model.

Related

What's different between TFServing and KFServing on KubeFlow

TFServing and KFServing both deploy models on Kubeflow and make them easy to consume as a service; users don't need to know the details of Kubernetes, since the infrastructure layers are hidden.
TFServing comes from TensorFlow; it can run on Kubeflow or standalone. See: TFServing on Kubeflow
KFServing comes from Kubeflow and supports multiple frameworks such as PyTorch, TensorFlow, MXNet, etc. See: KFServing
My question is: what is the main difference between these two projects?
If I want to launch my model in production, which should I use, and which has better performance?
KFServing is an abstraction on top of inferencing rather than a replacement. It seeks to simplify deployment and make inferencing clients agnostic to which inference server is doing the actual work behind the scenes (be it TF Serving, Triton (formerly TRT-IS), Seldon, etc.). It does this by seeking agreement among inference server vendors on an inferencing dataplane specification, which allows extra components (such as transformations and explainers) to be more pluggable.
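One practical consequence is that a client can hit TF Serving directly or go through KFServing's v1 dataplane with the same request shape. A minimal sketch (the host, model name, and input values are placeholders):

    import requests

    # The same JSON body works against TF Serving's REST API and against
    # KFServing's v1 protocol; only the URL in front of it changes.
    url = "http://model-host/v1/models/my-model:predict"  # placeholder
    payload = {"instances": [[1.0, 2.0, 5.0]]}

    resp = requests.post(url, json=payload)
    print(resp.json())  # e.g. {"predictions": [...]}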

Why is my Google AutoML Vision model trained on cloud much better than the one trained on edge

I am new to Google Vision and I just tried training on a dataset. I trained it once as an edge model and once as a cloud-hosted model. In both cases I used the suggested node hours. My edge model is much worse than the cloud one. Can someone explain this? Don't they both train in the cloud, and shouldn't they give the same results? I thought the only difference was that the edge model can be exported.
I used image classification.
Kind regards
Yes, both are indeed trained in the cloud, but the difference is where each model is intended to run.
Edge models are lighter, in terms of model size and the computation needed to perform a prediction, which is why they are not as accurate as cloud models. Edge models are intended to run on edge devices (like mobile phones), which do not have nearly the computational power of GPU cloud instances. (This is probably why Google lets you export edge models, so that they can be used offline on mobile devices.)
Models trained for cloud usage, on the other hand, give more weight to accuracy and are intended to run on big GPU/CPU machines.
There is a trade-off between the two: edge models have low memory requirements at the cost of accuracy (and inference can still be slow on weak hardware), whereas cloud models are more accurate but have comparatively higher memory requirements.
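For a rough feel of what "lighter" means in practice, this is the kind of conversion that happens when a model is prepared for edge use; a minimal sketch with TensorFlow Lite (the SavedModel path is a placeholder, and AutoML performs its own export for you):

    import tensorflow as tf

    # Convert a SavedModel into a compact TFLite flatbuffer for on-device use.
    converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g. weight quantization
    tflite_model = converter.convert()

    with open('model.tflite', 'wb') as f:
        f.write(tflite_model)

Quantization is one of the places the accuracy trade-off comes from: smaller weights, slightly less precise predictions.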

How can I use Tensorflow to make cellular automata?

Knowing that TensorFlow is good at working with matrices, would I be able to use TensorFlow to create a cellular automaton? And would this offer a significant speedup over coding it in plain Python?
Are there any tutorials or websites that could point me in the right direction for using TensorFlow for more general-purpose computing than machine learning (for example, simulations)?
If so, could someone point me toward the kinds of TensorFlow operations I would need to learn to build this program? Thanks!
A TensorFlow implementation is likely to offer an improvement in execution time, especially if executed on a GPU, since CA updates can be computed in parallel. See: https://cs.stackexchange.com/a/320/67726.
A starting point for TensorFlow in general might be the official guide and documentation, which do go beyond just machine learning. Also available are two tutorials on non-ML examples: Mandelbrot Set, Partial Differential Equations.
While TensorFlow is usually mentioned in the context of machine learning, it is worth noting that:
TensorFlow™ is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
Edit: here's an implementation and a tutorial about Conway's Game of Life using TF.
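For a flavor of how a CA maps onto tensor operations, here is a minimal sketch of one Game of Life step using a 2-D convolution to count neighbors (the function and variable names are my own, not part of any API):

    import tensorflow as tf

    def life_step(board):
        # board: [H, W] float32 tensor of 0.0s and 1.0s.
        kernel = tf.constant([[1., 1., 1.],
                              [1., 0., 1.],
                              [1., 1., 1.]])[:, :, None, None]  # [3, 3, 1, 1]
        # Count live neighbors for every cell at once.
        neighbors = tf.nn.conv2d(board[None, :, :, None], kernel,
                                 strides=1, padding='SAME')[0, :, :, 0]
        # Standard Conway rules: birth on 3 neighbors, survival on 2 or 3.
        born = tf.logical_and(tf.equal(board, 0.), tf.equal(neighbors, 3.))
        survives = tf.logical_and(
            tf.equal(board, 1.),
            tf.logical_or(tf.equal(neighbors, 2.), tf.equal(neighbors, 3.)))
        return tf.cast(tf.logical_or(born, survives), tf.float32)

    board = tf.cast(tf.random.uniform([64, 64]) < 0.3, tf.float32)
    for _ in range(100):
        board = life_step(board)

The whole grid updates in a handful of vectorized ops per step, which is exactly the pattern that runs well on a GPU.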

Is everything in Tensorflow implemented as a NN?

For example, k-means clustering - is it implemented as a neural network algorithm?
No, why should it be? To better understand TensorFlow, take a look at the original paper; in the abstract it states:
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards.
Hence TensorFlow is a tool to express algorithms and to schedule them on hardware such as CPUs, GPUs, TPUs, and friends. The fact that it is best known for running neural networks doesn't mean that even the simplest things must be implemented with them.
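For instance, k-means fits naturally into plain tensor operations; a minimal sketch of one Lloyd iteration (the names are my own, not a TensorFlow API, and it assumes no cluster ends up empty):

    import tensorflow as tf

    def kmeans_step(points, centroids):
        # points: [N, D], centroids: [K, D]
        # Squared distance from every point to every centroid: [N, K].
        dists = tf.reduce_sum(
            tf.square(points[:, None, :] - centroids[None, :, :]), axis=-1)
        # Assign each point to its nearest centroid.
        assignments = tf.argmin(dists, axis=1)
        # New centroid = mean of the points assigned to it.
        new_centroids = tf.math.unsorted_segment_mean(
            points, assignments, num_segments=tf.shape(centroids)[0])
        return new_centroids, assignments

Nothing here is a neural network; it's just tensor algebra scheduled by the same runtime.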

tensorflow: difference between multi GPUs and distributed tensorflow

I am a little confused about these two concepts.
I saw some examples that use multiple GPUs without using clusters and servers in the code.
Are these two different? What is the difference?
Thanks a lot!
It depends a little on the perspective from which you look at it. In any multi-* setup, either multi-GPU or multi-machine, you need to decide how to split up your computation across the parallel resources. In a single-node, multi-GPU setup, there are two very reasonable choices:
(1) Intra-model parallelism. If a model has long, independent computation paths, then you can split the model across multiple GPUs and have each compute a part of it. This requires careful understanding of the model and the computational dependencies.
(2) Replicated training. Start up multiple copies of the model, train them, and then synchronize their learning (the gradients applied to their weights & biases).
Our released Inception model has some good diagrams in the readme that show how both multi-GPU and distributed training work.
But to tl;dr that source: In a multi-GPU setup, it's often best to synchronously update the model by storing the weights on the CPU (well, in its attached DRAM). But in a multi-machine setup, we often use a separate "parameter server" that stores and propagates the weight updates. To scale that to a lot of replicas, you can shard the parameters across multiple parameter servers.
With multiple GPUs and parameter servers, you'll find yourself being more careful about device placement, using constructs such as with tf.device('/gpu:1'), or placing weights on the parameter servers with tf.train.replica_device_setter, which assigns them to /job:ps or /job:worker.
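As a rough sketch of the difference in code (TF1-style graph API, matching the era of this answer; the host names are placeholders):

    import tensorflow.compat.v1 as tf  # TF1-style graph API
    tf.disable_v2_behavior()

    # Distributed setup: describe the cluster, then let replica_device_setter
    # put variables on /job:ps and ops on the local /job:worker task.
    cluster = tf.train.ClusterSpec({
        'ps': ['ps0.example.com:2222'],
        'worker': ['worker0.example.com:2222', 'worker1.example.com:2222'],
    })
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        weights = tf.get_variable('weights', shape=[784, 10])

    # Single-machine multi-GPU setup: place ops on a specific GPU explicitly.
    with tf.device('/gpu:1'):
        logits = tf.matmul(tf.zeros([32, 784]), weights)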
In general, training on a bunch of GPUs in a single machine is much more efficient -- it takes more than 16 distributed GPUs to equal the performance of 8 GPUs in a single machine -- but distributed training lets you scale to even larger numbers, and harness more CPU.
Well, until recently there was no open-source cluster version of TensorFlow - just a single machine with zero or more GPUs.
The new release, v0.9, may or may not have changed things.
The article in the original release documentation (Oct 2015) showed that Google has cluster-based solutions, but they had not open-sourced them.
Here is what the whitepaper says:
3.2 Multi-Device Execution
Once a system has multiple devices, there are two main complications: deciding which device to place the computation for each node in the graph, and then managing the required communication of data across device boundaries implied by these placement decisions. This subsection discusses these two issues.