At my university we have a cluster with Tesla GPUs. However, the resource is shared by several departments, and the supercomputing department requires users to submit only the module/object code of the program they need to run on the cluster. The supercomputer has a queue system (as is usual for shared supercomputers), and as I understand it, the supercomputing department requires everyone to follow this procedure. So, how can I obtain the object code of a Keras-Theano model compiled for the GPU? Something like what gcc model.c produces (a.out) is what I need.
Any other ideas are also appreciated.
The easiest solution should be pickling the theano function; however, this only saves the optimized graph, not the generated code. I'm not sure whether this works for your case.
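For example, a minimal pickling sketch (the toy function here is illustrative; your Keras model's underlying theano functions would take its place):

    import pickle
    import theano
    import theano.tensor as T

    # Toy compiled function standing in for your model's theano functions.
    x = T.matrix('x')
    f = theano.function([x], T.nnet.sigmoid(x))

    # Pickling stores the optimized graph, not the generated C code.
    with open('fn.pkl', 'wb') as fh:
        pickle.dump(f, fh)

    # On the cluster, unpickling re-optimizes and re-compiles the graph there:
    with open('fn.pkl', 'rb') as fh:
        f_restored = pickle.load(fh)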
You can use the command theano-cache list to find the directories containing the generated Op code, typically under /home/user/.theano/. However, hand-compiling this into a module may be complicated.
There's also a PR for shared library generation, but it hasn't been merged yet.
I'm trying to test all the operations available in TensorFlow. For example, we can find 'Conv2d' in the tf.nn module.
There are some operations whose names start with an underscore, e.g. '_Arg', '_ArrayToList', '_Retval'. I looked into the TensorFlow source code, but I still can't find how to create an operation like '_Arg'. Please give me some instructions on how to find these operations, or explain what these operations do.
Those operations are for internal purposes. They are implemented in C++, so you'll need to download the source code, write your own tests in C++, and compile and run them, since most of those operations do not have a Python wrapper.
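If you just want to enumerate them, here is a minimal sketch (TF 1.x; note that op_def_registry is a private module, so this relies on internals rather than a stable API):

    from tensorflow.python.framework import op_def_registry

    # Every op registered with the C++ runtime, Python wrapper or not.
    registered = op_def_registry.get_registered_ops()  # dict: op name -> OpDef
    internal_ops = sorted(name for name in registered if name.startswith('_'))
    print(internal_ops)  # includes '_Arg', '_Retval', ...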
Here you can find the C++ API.
This tutorial may help you if you are starting with TF operations. It does not do exactly what you want, as it works with custom public operations.
You may have a look at the tests already implemented in the TF code, for example this test file.
However, I strongly recommend that you reconsider whether you really need to test those functions. Testing every single function in TensorFlow, even the internal ones, is going to be a hard job.
I encountered a problem while running a Python script on a Google Cloud compute instance, with Python 3.6 and TensorFlow 1.13.1. I have seen several people on Stack Overflow run into similar problems with loops in the computational graph, but none of them really found the culprit. I also observed something interesting, so maybe someone experienced can figure it out.
The error message looks like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My train.py script looks like this:
import A, B, C
...

def main():
    ....

if __name__ == '__main__':
    main()
So here are the two ways I run this script:
VERSION 1:
In the terminal:
python3 train.py
This gives me the error I stated above. When I used only the CPU, I noticed it threw something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected. So I added a GPU to my instance, but the loop in the computational graph is still there.
VERSION 2 (this is where the weird thing happens):
I simply copied the code in main, with nothing changed, into a Jupyter notebook and ran it there. Then suddenly, no error occurred anymore.
I don't really know what's going on under the hood. I just noticed that the messages printed at the start of the run are not the same between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if someone has any idea what could possibly cause this. Thank you!
Well, it turns out that, no matter what, I had chosen a wrong way to build the graph at the beginning, one that, from my perspective, should not produce a loop. The loop error gave me the hint that I was doing something wrong. The interesting question I mentioned above is still unanswered! However, I'd like to share my mistake, so that anyone who sees the loop error can check whether they are doing the same thing as me.
In my input_fn, I used tensor.eval() to get the corresponding numpy.array in the middle of the function, in order to interact with data outside of it. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress the whole thing into a Dataset directly. But it turns out this approach sabotages the static computational graph design of TensorFlow, so during training it trains on the same batch again and again. So my two cents of advice: if you want to achieve something super complex in your input_fn, you will likely be better off (or even only doing the right thing) by using the old-fashioned modelling way: tf.placeholder.
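To illustrate, a minimal sketch of the placeholder route (TF 1.x; all names, shapes, and the toy model are made up for the example):

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 10])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    pred = tf.layers.dense(x, 1)
    loss = tf.losses.mean_squared_error(y, pred)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(100):
            # Arbitrarily complex Python/numpy preprocessing can happen here,
            # outside the graph, and a fresh batch is fed in every step.
            batch_x = np.random.rand(32, 10).astype(np.float32)
            batch_y = np.random.rand(32, 1).astype(np.float32)
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})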
This year Google produced 5 different packages for seq2seq:
- seq2seq (claimed to be general-purpose, but inactive)
- nmt (active, but probably meant only for NMT)
- legacy_seq2seq (clearly legacy)
- contrib/seq2seq (probably not complete)
- tensor2tensor (similar purpose, also under active development)
Which package is actually worth using for an implementation? It seems they all take different approaches, but none of them is stable enough.
I too have had a headache over this issue: which framework to choose? I want to implement OCR using an encoder-decoder with attention. I tried to implement it using legacy_seq2seq (it was the main library at the time), but it was hard to understand the whole process, and it certainly should not be used any more.
https://github.com/google/seq2seq: to me it looks like an attempt at a command-line training script where you don't write your own code. If you want to train a translation model, this should work, but in other cases it may not (as with my OCR), because there is not enough documentation and too few users.
https://github.com/tensorflow/tensor2tensor: this is very similar to the implementation above, but it is maintained and you can add more of your own code, e.g. for reading your own dataset. The basic use case is again translation, but it also enables tasks like image captioning, which is nice. So if you want to try a ready-to-use library and your problem is txt->txt or image->txt, you could try this. It should also work for OCR. I'm just not sure whether there is enough documentation for each case (like using a CNN as the feature extractor).
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/seq2seq: unlike the above, this is just a pure library, which can be useful when you want to create a seq2seq model yourself in TF. It has functions for adding attention, computing sequence loss, etc. In my case I chose this option, as it gives much more freedom in choosing each part of the pipeline: I can pick the CNN architecture, RNN cell type, bi- or uni-directional RNN, type of decoder, etc. (see the sketch after this list). But then you will need to spend some time getting familiar with the ideas behind it.
https://github.com/tensorflow/nmt: another translation framework, based on the tf.contrib.seq2seq library.
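To give an idea of the third option, here is a minimal decoder-side sketch with tf.contrib.seq2seq (TF 1.x; all shapes and placeholder names are illustrative assumptions, and the encoder is omitted):

    import tensorflow as tf

    batch_size, num_units, vocab_size = 32, 256, 1000

    # Pretend these come from your encoder / embedding lookup:
    encoder_outputs = tf.placeholder(tf.float32, [batch_size, None, num_units])
    decoder_emb_inputs = tf.placeholder(tf.float32, [batch_size, None, num_units])
    decoder_lengths = tf.placeholder(tf.int32, [batch_size])

    # Attention over the encoder outputs, wrapped around a plain LSTM cell.
    attention = tf.contrib.seq2seq.LuongAttention(num_units, encoder_outputs)
    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention)

    # Teacher-forcing decoder for training.
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb_inputs, decoder_lengths)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        attn_cell, helper,
        initial_state=attn_cell.zero_state(batch_size, tf.float32),
        output_layer=tf.layers.Dense(vocab_size))
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
    logits = outputs.rnn_output  # feed into tf.contrib.seq2seq.sequence_loss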
From my perspective you have two options:
If you want to test the idea very fast and be sure you are using very efficient code, use the tensor2tensor library. It should help you get early results or even a very good final model.
If you want to do research, are not sure exactly how the pipeline should look, or want to learn the ideas behind seq2seq, use the library from tf.contrib.seq2seq.
TensorFlow uses reverse-mode automatic differentiation (reverse-mode AD), as shown in https://github.com/tensorflow/tensorflow/issues/675.
Reverse-mode AD needs a data structure called a Wengert list - see https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation.
However, searching through the TensorFlow repository with the keyword "Wengert List", I get nothing.
Do they use a different name, or do they get rid of Wengert List? If so, how?
AD terminology is very old. It was invented when there was no Python and things were complicated. Nowadays you could just use a regular Python list for that purpose.
The implementation of reverse-mode AD is in the gradients function of gradients_impl.py here.
The data structure used to store the tape is initialized on line 532, and it is a Python deque:
# Initialize queue with to_ops.
queue = collections.deque()
However, searching through the TensorFlow repository with the keyword "Wengert List", I get nothing.
This is because TensorFlow is not a tape-based AD system; it is a graph-based AD system. A Wengert list would be the tape describing the order in which operations were originally executed.
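In graph terms, gradients are just more nodes added to the static graph rather than a recorded tape. A minimal TF 1.x illustration:

    import tensorflow as tf

    x = tf.placeholder(tf.float32)
    y = x * x
    # tf.gradients adds gradient ops to the graph; nothing is taped at runtime.
    g = tf.gradients(y, x)[0]
    with tf.Session() as sess:
        print(sess.run(g, feed_dict={x: 3.0}))  # 6.0, since dy/dx = 2x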
There is also source-code-transformation-based AD; a nice example of such a system is Tangent.
Nowadays almost no one uses a tape (Wengert list) any more. Check, for instance, what PyTorch does (page 2).
TensorFlow 2 uses a Wengert list (a tape), as do JAX and Autograd. This is because these tools keep track of the operations performed on variables with some sort of gradient tape.
TensorFlow 1 did not use a Wengert list to keep track of what computation was being performed; instead, it used a static graph. This had certain performance benefits, but limited what TensorFlow 1 was capable of doing.
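A minimal TensorFlow 2 example of that tape in action:

    import tensorflow as tf  # TensorFlow 2.x

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x * x  # this multiplication is recorded on the tape
    dy_dx = tape.gradient(y, x)
    print(dy_dx)  # tf.Tensor(6.0, ...), since dy/dx = 2x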
I have been playing around with building some deep learning models in Python, and now I have a couple of results I would like to show friends and family.
Unfortunately(?), most of my friends and family aren't really up to the task of installing any of the advanced frameworks that are more or less necessary for creating these networks, so I can't just send them my scripts as they are and hope they will run.
But then again, I have already created the nets, and just using the finished product is considerably less demanding than making it. We don't need advanced graph compilers or GPU compute power for the show-and-tell; we just need the ability to do a few matrix multiplications.
"Just" being a weasel word, regrettably. What I would like to do is convert the whole model (connectivity, functions, and parameters) into a model expressed in, e.g., regular NumPy (which, though not part of the standard library, is both much easier to install and easier to bundle reliably with a script).
I have failed to find any ready-made solutions for this (I find it difficult to pick specific search-engine keywords). But it seems to me that I can't be the first person who wants to use a ready-made deep learning model on a lower-spec machine operated by people who aren't necessarily inclined to spend months learning how to set the parameters of an artificial neural network.
Are there established ways of transferring a model from e.g. Theano to Numpy?
I'm not necessarily asking about those specific libraries; the main point is that I want to go from a GPU-capable framework in the creation phase to one that is trivial to install or bundle in the usage phase, to lower or remove the threshold that the dependencies create for users without extensive technical experience.
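To make it concrete, here is the kind of manual translation I have in mind: a minimal sketch assuming a small two-layer dense network whose weights were exported beforehand (all file and variable names here are illustrative):

    # Exported once on the training machine, e.g. with Keras:
    #     np.savez('weights.npz', *model.get_weights())
    # The recipient's machine then only needs numpy.
    import numpy as np

    data = np.load('weights.npz')
    W1, b1, W2, b2 = (data['arr_%d' % i] for i in range(4))

    def relu(x):
        return np.maximum(x, 0.0)

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def predict(x):
        # Forward pass rebuilt with plain matrix multiplications.
        h = relu(x @ W1 + b1)
        return softmax(h @ W2 + b2)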
An interesting option for you would be to deploy your project to Heroku, as explained on this page:
https://github.com/sugyan/tensorflow-mnist