I've read the XLA prerelease document here.
https://www.tensorflow.org/versions/master/resources/xla_prerelease#xla_accelerated_linear_algebra
It discusses datatypes of elements, but does not go into much detail about the data organization of the tensors themselves. How will operations on SparseTensor objects be handled once XLA is available?
The layouts restrict the data organization of input and output tensors and don't include sparse layouts, although as Jingyue suggests, they could be extended in the future. The internal representation of tensors in the AST can in principle be anything a backend wants, and it is expected that the compiler may reorganize the data to different layouts for the convenience of different operators implemented by different backends.
I am not aware that anyone has put much thought into how to do this efficiently for sparse tensors. In principle maybe it could be done as a compiler pass to infer sparsity and propagate it, with sparse implementations for all the relevant operators. Nothing like that exists today.
No, XLA focuses on dense tensors and doesn't deal with sparse tensors in an efficient way today.
It could be easily extended to allow users to express some sparsity using layouts (e.g. interior padding).
Sparse data is something we'd like to have working, though it has some challenges. E.g. currently XLA depends on knowing the exact size of every buffer statically. We could certainly find a way to deal with that, but have been focusing on dense data so far.
A few years later, XLA seems to have some support for sparse tensors, and it works well at that. My workflow involves sparse tensors for very high-dimensional data that would be prohibitive to keep in memory, then slicing and manipulating them, and finally performing math ops on a lower-dimensional dense tensor. For slicing sparse tensors I'm getting roughly a 4x speed-up with XLA.
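For reference, a minimal sketch of that kind of pipeline. The shapes are illustrative, and since the post doesn't say exactly how XLA was enabled, `jit_compile=True` is applied here only to the dense math as an assumption:

```python
import tensorflow as tf

# Hypothetical high-dimensional sparse input: only a handful of non-zeros.
sp = tf.sparse.SparseTensor(
    indices=[[0, 1], [2, 70000]],
    values=[1.0, 2.0],
    dense_shape=[4, 100000],
)

# Slice out the small region we actually need while still sparse.
sliced = tf.sparse.slice(sp, start=[0, 0], size=[4, 8])

@tf.function(jit_compile=True)  # XLA-compile the dense math (TF >= 2.5)
def dense_math(x):
    return tf.matmul(x, x, transpose_b=True)

# Densify only the low-dimensional slice, then run the compiled dense op.
result = dense_math(tf.sparse.to_dense(sliced))
```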
Related
I am using TensorFlow Federated to simulate a scenario in which clients hosted on a remote server can work with our very sparse dataset in a federated setting.
Presently, the code is capable of running with a small subset of the very sparse dataset being loaded on the server side and passed to the remote workers hosted on another device. The data is in SVM Light format and can be loaded through sklearn's load_svmlight_file function, but needs to be converted into Tensors to work within tff. The current solution involves converting the very sparse data into a dense array, then setting it up through the tf.data.Dataset.from_tensor_slices function for use with a Keras model (following existing examples for tff).
This works, but takes up significant memory resources and is not suitable for the dataset as it cannot be run remotely for more than six samples due to the sparse data's serialized size, nor locally with more than a few hundred samples due to the size in memory.
To mitigate this, I converted the data into SparseTensors, but this approach fails because the tff.learning.from_keras_model function expects a pair of TensorSpec input_spec values, not a SparseTensorSpec for the features with a TensorSpec for the labels.
So, are there any concrete examples or known methods for working with SparseTensors within Keras models in tff, or must they be dense Tensors for now? The data loads fine when it is not converted to regular Tensors, so I need a solution for working with the sparse data directly.
If there is presently no way to do so, are there examples of strategies within tff to work with very small subsets of data at a time, either being loaded directly with the remote client or being passed from the server?
Thanks!
I'd say the best approach now is to work with TF's representation of a tf.SparseTensor, that is, a tuple of three tensors: indices, values, and dense_shape.
Since the problem is that Keras requires the input not to be sparse tensors, you can pass the input as, for instance, a dictionary consisting of these three tensors, and convert it back to a tf.sparse.SparseTensor as part of your tf.data pipeline.
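A minimal sketch of that idea (the feature layout and shapes below are made up for illustration, not taken from the question's dataset):

```python
import tensorflow as tf

# Toy stand-in for data loaded from SVM Light: every example here has the same
# number of non-zeros purely to keep from_tensor_slices happy in this sketch.
features = {
    "indices": [[[0, 3], [0, 17]]],   # per-example [nnz, 2] coordinate list
    "values": [[1.0, 2.5]],           # per-example [nnz] non-zero values
    "dense_shape": [[1, 100]],        # per-example [2] dense shape
}
labels = [0]

def to_sparse(feats, label):
    # Rebuild the SparseTensor from its three dense components.
    x = tf.sparse.SparseTensor(
        indices=tf.cast(feats["indices"], tf.int64),
        values=feats["values"],
        dense_shape=tf.cast(feats["dense_shape"], tf.int64),
    )
    return x, label

ds = tf.data.Dataset.from_tensor_slices((features, labels)).map(to_sparse)
```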
See also this tutorial, which I think is doing something related to what you are looking for, and please ask more detailed questions if needed!
In deep-learning model training, inputs are generally passed in batches. For example, to train a model with a [512]-dimensional input feature vector at batch size 4, we would normally pass a [4, 512]-dimensional input. I am curious what the logical significance is of passing the same input after flattening it across the batch and channel dimensions, i.e. as [2048]. Logically the locality structure will be destroyed, but will it significantly speed up my implementation? And can it affect the performance?
In supervised learning, you would usually be working with data points (e.g. a feature vector or a multi-dimensional input such as an image) paired with some kind of ground truth (a label for classification tasks, or another multi-dimensional object altogether). Feeding your model a flattened tensor containing multiple data points would not make sense in terms of supervision. Assuming you do an inference this way, what would the supervision signal be at the output level of your model? Would you combine the labels as well? All of this seems to depend heavily on the use case: is there some kind of temporal coherence between the elements of the batch?
Performance-wise, this has no implications whatsoever. Tensors are already 'flattened' by design since their memory is laid out in contiguous memory buffers. The idea of multi-dimensionality is an abstraction layer provided by those libraries (namely NumPy's arrays and Torch's tensors) to allow for easier and more flexible control over data.
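To see the "already flattened" point concretely, here is a small NumPy check (not from the original answer; the shape is the one used in the question):

```python
import numpy as np

x = np.zeros((4, 512), dtype=np.float32)   # a [4, 512] batch

flat = x.reshape(-1)                       # "flattened" [2048] view
print(np.shares_memory(x, flat))           # True: same contiguous buffer, no copy
print(x.strides)                           # (2048, 4): rows already sit back to back
```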
Keras's ImageDataGenerator looks great for simply loading images progressively and passing an iterator to the model.fit function. However, it seems to be usable only for images and for classification tasks.
I want to do regression, i.e., my labels are also arrays of the same shape as the training-set ones. In practice, they are multidimensional (>1 channel) arrays, like images, but they are not images.
Any suggestions on what class to use to simply feed batches of data to a Keras model.fit() call for training a deep neural net?
The problem, of course, is that my datasets are much too large to fit in memory, which is why I need to use these generators/iterators.
The best solution for your case is to use tf.data.Dataset.
While it may take a little while to get accustomed to it, it is the recommended way to load your data and use it with model.fit().
You can consult the documentation here: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
It is new, fast, beautifully designed and easily extensible.
For instance, for your problem you may want to use tf.data.Dataset.from_tensor_slices(); I will leave you to discover its features :D.
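from_tensor_slices is the direct route when the arrays fit in memory; for the out-of-memory case described above, a generator-backed dataset is one option. A minimal sketch under that assumption (the shapes and the random generator are purely illustrative; output_signature needs TF 2.4+):

```python
import numpy as np
import tensorflow as tf

def sample_generator():
    # Yield (input, target) pairs one at a time, e.g. read from disk,
    # so the full dataset never has to sit in memory.
    for _ in range(1000):
        x = np.random.rand(64, 64, 3).astype(np.float32)   # multi-channel input
        y = np.random.rand(64, 64, 2).astype(np.float32)   # same-shaped array label
        yield x, y

ds = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(64, 64, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(64, 64, 2), dtype=tf.float32),
    ),
).batch(32).prefetch(tf.data.AUTOTUNE)

# model.fit(ds, epochs=10)   # Keras consumes the dataset directly
```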
A quick solution would be to use Colab, whose GPU instance has 24 GB of RAM to work with. You could also reduce your memory usage when you load the numpy array, like the way I did here.
Many times I have seen in neural network forward propagation that example vectors are multiplied from the left (vector-matrix), and sometimes from the right (matrix-vector). Notation, some TensorFlow tutorials and the datasets I have found seem to prefer the former over the latter, contrary to the way linear algebra tends to be taught (the matrix-vector way).
Moreover, the two conventions represent inverted ways of laying out the parameters: enumerate problem variables along dimension 0, or enumerate neurons along dimension 0.
This confuses me and makes me wonder whether there is really a standard here or it has just been coincidence. If there is one, I would like to know whether it follows from some deeper reason. I would feel much better with this question answered.
(By the way, I know that you will normally use example matrices instead of vectors [or more complex things in conv nets, etc.] because of the use of minibatches, but the point still holds.)
Not sure if this answer is what you are looking for, but in the context of TensorFlow, the standard is to use a dense layer (https://www.tensorflow.org/api_docs/python/tf/layers/dense), which is a higher-level abstraction that wraps up the affine transformation logic you are referring to.
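As a small illustration of that convention (a sketch with made-up shapes): a Keras Dense layer computes `outputs = inputs @ kernel + bias`, with examples as rows, i.e. the vector-matrix (left-multiplication) form the question describes:

```python
import tensorflow as tf

batch = tf.random.normal([4, 512])   # 4 examples, 512 features each, one per row
layer = tf.keras.layers.Dense(128)

out = layer(batch)                   # computes batch @ kernel + bias
print(layer.kernel.shape)            # (512, 128): input features enumerated in dim 0
print(out.shape)                     # (4, 128)
```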
When I was learning TensorFlow, one of its basic concepts was the computational graph, and the graph was said to be static.
And I found that in PyTorch the graph is said to be dynamic.
What is the difference between static computational graphs in TensorFlow and dynamic computational graphs in PyTorch?
Both frameworks operate on tensors and view any model as a directed acyclic graph (DAG), but they differ drastically in how you can define them.
TensorFlow follows the 'data as code and code as data' idiom. In TensorFlow you define the graph statically before the model can run. All communication with the outer world is performed via a tf.Session object and tf.placeholder, tensors that will be substituted with external data at runtime.
In PyTorch things are far more imperative and dynamic: you can define, change and execute nodes as you go, with no special session interfaces or placeholders. Overall, the framework is more tightly integrated with the Python language and feels more native most of the time. When you write TensorFlow code it sometimes feels like your model is behind a brick wall with several tiny holes to communicate through. Anyway, this is still more or less a matter of taste.
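A tiny sketch of that define-then-run flow, assuming the TF 1.x graph API the answer refers to:

```python
import tensorflow as tf   # TF 1.x style API

# 1. Define the graph statically: no data flows yet.
x = tf.placeholder(tf.float32, shape=[None, 3])   # external data enters here
y = tf.reduce_sum(x * 2.0)

# 2. Execute it through a session, feeding real values at runtime.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))   # 12.0
```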
However, those approaches differ not only from a software-engineering perspective: there are several dynamic neural-network architectures that can benefit from the dynamic approach. Recall RNNs: with static graphs, the input sequence length has to stay constant. This means that if you develop a sentiment-analysis model for English sentences you must fix the sentence length to some maximum value and pad all shorter sequences with zeros. Not too convenient, huh. And you will get more problems in the domain of recursive RNNs and tree-RNNs. Currently TensorFlow has limited support for dynamic inputs via TensorFlow Fold; PyTorch has it by default.
Reference:
https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b
https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/
Both TensorFlow and PyTorch allow specifying new computations at any point in time. However, TensorFlow has a "compilation" step which incurs a performance penalty every time you modify the graph. So TensorFlow's optimal performance is achieved when you specify the computation once, and then flow new data through the same sequence of computations.
It's similar to interpreters vs. compilers -- the compilation step makes things faster, but also discourages people from modifying the program too often.
To make things concrete, when you modify the graph in TensorFlow (by appending new computations using the regular API, or removing some computation using tf.contrib.graph_editor), this line is triggered in session.py. It will serialize the graph, and then the underlying runtime will rerun some optimizations, which can take extra time, perhaps 200 microseconds. In contrast, running an op in a previously defined graph, or in numpy/PyTorch, can take as little as 1 microsecond.
In TensorFlow you first have to define the graph, then you execute it.
Once defined, your graph is immutable: you can't add or remove nodes at runtime.
In PyTorch, instead, you can change the structure of the graph at runtime: nodes can be added and removed on the fly, so the graph is built dynamically as the code executes, as in the sketch below.
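A minimal sketch of that define-by-run behaviour, assuming PyTorch: the graph follows ordinary Python control flow and can take a different shape on every forward pass.

```python
import torch

def forward(x):
    # The graph is built as this code runs; each call can take a different path.
    if x.sum() > 0:
        return (x * 2).sum()
    return (-x).sum()

a = torch.randn(3, requires_grad=True)
loss = forward(a)      # nodes are recorded for whichever branch actually ran
loss.backward()        # gradients follow that dynamically built graph
print(a.grad)
```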