For example, k-means clustering - is it implemented as a neural network algorithm?
No, why should it be? To better understand TensorFlow, take a look at the original paper; the abstract states:
TensorFlow [1] is an interface for expressing machine learning
algorithms, and an implementation for executing such algorithms. A
computation expressed using TensorFlow can be executed with little or
no change on a wide variety of heterogeneous systems, ranging from
mobile devices such as phones and tablets up to large-scale
distributed systems of hundreds of machines and thousands of
computational devices such as GPU cards.
Hence TensorFlow is a tool to express algorithms and to schedule them on pieces of hardware such as CPUs, GPUs, TPUs and friends. The fact that it is best known for running neural networks doesn't mean that even the simplest things have to be implemented with them.
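To make that concrete, here is a tiny non-neural-network computation expressed with TensorFlow (a sketch of mine assuming TensorFlow 2.x, not something from the paper). The same code runs unchanged on a CPU, a GPU, or a TPU host:

```python
import tensorflow as tf

# Estimate pi by Monte Carlo sampling -- no neural network anywhere.
n = 1_000_000
xy = tf.random.uniform([n, 2])                # random points in the unit square
inside = tf.reduce_sum(                       # count points inside the quarter circle
    tf.cast(tf.reduce_sum(xy * xy, axis=1) <= 1.0, tf.float32))
pi_estimate = 4.0 * inside / n
print(float(pi_estimate))                     # ~3.14
```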
Related
I'm doing research on distributed DNNs. From what I gathered, we can distribute DNN computation over many GPUs, and we can also do it on our mobile devices. Inference architectures are usually single-platform, so they exist either on mobile or in the cloud.
My question is:
Can we distribute the training and inference phases of a DNN architecture across a joint platform (both cloud and mobile)? If it is possible, how can we do that?
There's a plethora of options to choose from, depending on your framework. Horovod is mostly framework-agnostic and can be used for distributed training. It also satisfies your need to use cloud services. Although it is entirely possible to build your own setup using Distributed TensorFlow, you should be aware that this is a more low-level approach than Horovod and therefore misses some bells and whistles.
Distributed inference, on the other hand, is not as common, since inference itself does not require as much computational power as training and is embarrassingly parallelizable most of the time.
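As a rough illustration, here is a minimal data-parallel training sketch using Horovod's Keras API (this assumes TensorFlow 2.x and Horovod are installed; the MNIST model is a toy example of my own, launched with something like `horovodrun -np 4 python train.py`):

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to one local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the optimizer so gradients are averaged across all workers.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy')

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype('float32') / 255.0

model.fit(x, y, batch_size=64, epochs=1,
          # Make sure all workers start from the same initial weights.
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
          verbose=1 if hvd.rank() == 0 else 0)
```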
Knowing that TensorFlow is good for working with matrices, would I be able to use TensorFlow to create a cellular automaton? And would this offer a great deal of speed over just coding it in plain Python?
Are there any tutorials or websites that could point me in the right direction to use Tensorflow for more general purpose computing than machine learning (for example, simulations)?
If so, could someone help point me in the right direction to the type of Tensorflow commands I would need to learn to make this program? Thanks!
A TensorFlow implementation is likely to offer an improvement in execution time, especially if executed on a GPU, since cellular automaton updates can be computed in parallel. See: https://cs.stackexchange.com/a/320/67726.
A starting point for TensorFlow in general might be the official guide and documentation, which do go beyond just machine learning. Also available are two tutorials on non-ML examples: Mandelbrot Set, Partial Differential Equations.
While TensorFlow is usually mentioned in the context of machine learning, it is worth noting that:
TensorFlow™ is an open source software library for high performance
numerical computation. Its flexible architecture allows easy
deployment of computation across a variety of platforms (CPUs, GPUs,
TPUs), and from desktops to clusters of servers to mobile and edge
devices.
Edit: here's an implementation and a tutorial about Conway's Game of Life using TF.
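For a flavour of what such an implementation looks like, here is a minimal Game of Life step written with TF ops (a sketch of mine assuming TensorFlow 2.x, not the linked tutorial's code); the neighbour count is just a 2D convolution:

```python
import numpy as np
import tensorflow as tf

def life_step(grid):
    """One Game of Life update; grid is a float32 tensor of 0s and 1s."""
    # 3x3 kernel that sums the 8 neighbours (centre weight is 0).
    kernel = tf.constant([[1., 1., 1.],
                          [1., 0., 1.],
                          [1., 1., 1.]])
    kernel = tf.reshape(kernel, [3, 3, 1, 1])
    g = tf.reshape(grid, [1, *grid.shape, 1])
    neighbours = tf.nn.conv2d(g, kernel, strides=1, padding='SAME')
    neighbours = tf.reshape(neighbours, grid.shape)
    # Conway's rules: a live cell survives with 2 or 3 neighbours,
    # a dead cell becomes alive with exactly 3 neighbours.
    survive = (grid == 1.0) & ((neighbours == 2.0) | (neighbours == 3.0))
    born = (grid == 0.0) & (neighbours == 3.0)
    return tf.cast(survive | born, tf.float32)

grid = tf.constant(np.random.randint(0, 2, (64, 64)), dtype=tf.float32)
for _ in range(100):
    grid = life_step(grid)
```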
As the question already suggests, I am new to deep learning. I know that the learning process of the model will be slow without a GPU. If I am willing to wait, will it be OK if I use only a CPU?
Many operations which are performed in computing deep learning (and neural networks in general) can be run in parallel, meaning they can be calculated independently then aggregated later. This is, in part, because most of the operations are on vectors.
A typical consumer CPU has between 4 and 8 cores, and hyperthreading allows them to be treated as 8 or 16 respectively. Server CPUs can have anywhere from 4 to 24 cores (8 to 48 threads, respectively). Additionally, most modern CPUs have SIMD (single instruction, multiple data) extensions which allow them to perform vector operations in parallel on a single thread. Depending on the data type you're working with, an 8-core CPU can perform 8 * 2 * 4 = 64 to 8 * 2 * 8 = 128 vector calculations at once.
Nvidia's new 1080 Ti has 3584 CUDA cores, which essentially means it can perform 3584 vector calculations at once (hyperthreading and SIMD don't come into play here). That's 28 to 56 times more operations at once than an 8-core CPU. So, whether you're training a single network or multiple networks to tune meta-parameters, it will probably be significantly faster on a GPU than a CPU.
Depending on what you are doing, it might take a lot longer. I have had 20x speedups by using a GPU. If you read some computer vision papers, they train their networks on ImageNet for about 1-2 weeks. Now imagine if that took 20x longer...
Having said that: There are much simpler tasks. For example, for my HASY dataset you can train a reasonable network without a GPU in probably 3 hours. Similar small datasets are MNIST, CIFAR-10, CIFAR-100.
The computationally intensive part of a neural network is its many matrix multiplications. How do we make them faster? By doing all the operations at the same time instead of one after the other. That, in a nutshell, is why we use GPUs (graphics processing units) instead of a CPU (central processing unit).
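A quick way to see this for yourself is to time the same matrix multiplication on both devices (a rough sketch assuming TensorFlow 2.x and at least one visible GPU; exact numbers will vary by hardware):

```python
import time
import tensorflow as tf

a = tf.random.normal([4096, 4096])
b = tf.random.normal([4096, 4096])

def timed_matmul(device):
    with tf.device(device):
        start = time.time()
        c = tf.matmul(a, b)
        _ = c.numpy()              # force the computation to finish before timing
        return time.time() - start

print('CPU:', timed_matmul('/CPU:0'))
if tf.config.list_physical_devices('GPU'):
    timed_matmul('/GPU:0')         # warm-up run (kernel setup, data transfer)
    print('GPU:', timed_matmul('/GPU:0'))
```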
Google used to have a powerful system that they had built specially for training huge nets. This system cost $5 billion and consisted of multiple clusters of CPUs.
A few years later, researchers at Stanford built a system with equivalent computational power to train their deep nets, this time using GPUs. They reduced the cost to about $33K while getting the same processing power as Google's system.
Source: https://www.analyticsvidhya.com/blog/2017/05/gpus-necessary-for-deep-learning/
Deep learning is all about building a mathematical model of reality, or of some part of it, for a specific use, by using a lot of training data. You collect a lot of data from the real world and then train your model on it, so that the model can predict outcomes when you give it new data as input. This training needs a lot of data and a lot of computation: many computationally heavy operations have to take place. That is why companies such as Nvidia, which traditionally made gaming GPUs, now get a huge part of their revenue from AI and machine learning and from the scientists who want to train their models, and why companies like Google and Facebook currently use GPUs to train their ML models.
If you ask this question you probably need a GPU/TPU (Tensor Processing Unit).
You can get one with a Google Colab GPU for "free"; they have a pretty cool cloud GPU offering.
You can start working with your Google account in a Jupyter notebook: https://colab.research.google.com/notebooks/intro.ipynb
Kaggle (a Google-owned data science competition site) also has the option to create Jupyter notebooks + GPU, but only in limited cases:
notebooks: https://www.kaggle.com/kernels
Documentation for it: https://www.kaggle.com/docs/notebooks
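Once the notebook is running with a GPU runtime selected (in Colab: Runtime -> Change runtime type -> GPU), a quick sanity check looks like this (assuming TensorFlow 2.x is preinstalled, as it is on both platforms):

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means no accelerator is attached.
print(tf.config.list_physical_devices('GPU'))
print('Built with CUDA:', tf.test.is_built_with_cuda())
```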
Is TensorFlow designed only for implementing neural networks? Can it be used as a general machine learning library, for implementing all sorts of supervised as well as unsupervised techniques (naive Bayes, decision trees, k-means, SVM, to name a few)? Whatever TensorFlow literature I come across generally talks about neural networks. Probably the graph-based architecture of TensorFlow makes it a suitable candidate for neural nets. But can it also be used as a general machine learning framework?
Tensorflow does include additional machine learning algorithms such as:
K-means clustering
Random Forests
Support Vector Machines
Gaussian Mixture Model clustering
Linear/logistic regression
The above list is taken from here, so you can read this link for more details.
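Even without those canned estimators, classical algorithms are easy to express directly in TF ops. For example, here is a minimal k-means sketch of my own (assuming TensorFlow 2.x eager execution and non-empty clusters in this toy setting), just to show that TF works fine as a general numerical library:

```python
import tensorflow as tf

def kmeans(points, k, iterations=20):
    """Toy k-means: points is a (N, D) float tensor, returns (centroids, assignments)."""
    # Initialise centroids with k randomly chosen points.
    idx = tf.random.shuffle(tf.range(tf.shape(points)[0]))[:k]
    centroids = tf.gather(points, idx)
    for _ in range(iterations):
        # Squared distance from every point to every centroid: shape (N, k).
        dists = tf.reduce_sum(
            tf.square(tf.expand_dims(points, 1) - tf.expand_dims(centroids, 0)), axis=2)
        assignments = tf.argmin(dists, axis=1)
        # Recompute each centroid as the mean of its assigned points.
        centroids = tf.stack(
            [tf.reduce_mean(tf.boolean_mask(points, assignments == c), axis=0)
             for c in range(k)])
    return centroids, assignments

points = tf.random.normal([500, 2])
centroids, assignments = kmeans(points, k=3)
```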
I am a little confused about these two concepts.
I saw some examples of multi-GPU usage without any clusters or servers in the code.
Are these two different? What is the difference?
Thanks a lot!
It depends a little on the perspective from which you look at it. In any multi-* setup, either multi-GPU or multi-machine, you need to decide how to split up your computation across the parallel resources. In a single-node, multi-GPU setup, there are two very reasonable choices:
(1) Intra-model parallelism. If a model has long, independent computation paths, then you can split the model across multiple GPUs and have each compute a part of it. This requires careful understanding of the model and the computational dependencies.
(2) Replicated training. Start up multiple copies of the model, train them, and then synchronize their learning (the gradients applied to their weights & biases).
Our released Inception model has some good diagrams in the readme that show how both multi-GPU and distributed training work.
But to tl;dr that source: In a multi-GPU setup, it's often best to synchronously update the model by storing the weights on the CPU (well, in its attached DRAM). But in a multi-machine setup, we often use a separate "parameter server" that stores and propagates the weight updates. To scale that to a lot of replicas, you can shard the parameters across multiple parameter servers.
With multiple GPUs and parameter servers, you'll find yourself being more careful about device placement, using constructs such as with tf.device('/gpu:1'), or placing weights on the parameter servers using tf.train.replica_device_setter to assign them to /job:ps or /job:worker.
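A minimal sketch of what that placement code can look like (TF 1.x-era API, as the constructs above imply, or tf.compat.v1 in TF 2.x; the host names are hypothetical placeholders):

```python
import tensorflow as tf  # TF 1.x, or use tf.compat.v1 under TF 2.x

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],                    # hypothetical hosts
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
})

# Variables land on /job:ps, ops on the local worker task.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    w = tf.get_variable("w", shape=[784, 10])
    b = tf.get_variable("b", shape=[10])

# Within a single machine you can also pin ops to a specific GPU explicitly
# (shown here purely to illustrate the construct):
with tf.device('/gpu:1'):
    x = tf.random_normal([128, 784])
    logits = tf.matmul(x, w) + b
```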
In general, training on a bunch of GPUs in a single machine is much more efficient -- it takes more than 16 distributed GPUs to equal the performance of 8 GPUs in a single machine -- but distributed training lets you scale to even larger numbers, and harness more CPU.
Well, until recently there was no open-source cluster version of TensorFlow - just a single machine with zero or more GPUs.
The new release, v0.9, may or may not have changed things.
The article in the original release documentation (Oct 2015) showed that Google has cluster-based solutions - but they had not open-sourced them.
Here is what the whitepaper says:
3.2 Multi-Device Execution
Once a system has multiple devices, there are two main complications: deciding which device to place the computation for each node in the graph, and then managing the required communication of data across device boundaries implied by these placement decisions. This subsection discusses these two issues.