Building a deep neural network whose output is distributed as a multivariate standard normal distribution - tensorflow

I'm looking for a way to build a deep neural network whose output is distributed as a multivariate standard normal distribution ~N(0,1).
I can use Pytorch or TensorFlow, whichever is easier for this task.
I actually have some input X, which for the purposes of this question can be assumed to be just a matrix of values drawn from a uniform distribution.
I feed the input into the network, whose architecture is currently flexible.
In addition to other requirements I will have on the output, I want that if we look at the values obtained over all the possible x's, they look like draws from a multivariate standard normal distribution ~N(0,1).
What I think needs to be done for this to happen is to choose the right loss function.
To do this, I thought of two ways:
1. Use of statistical tests.
2. A loss that tests a large number of properties (mean, standard deviation, ...).
Implementing 2 sounded complicated, so I started with 1.
I looked for statistical tests already implemented in one of the packages as a loss function, but I did not find anything like that.
I implemented statistical tests myself to obtain output that follows a univariate standard normal distribution, and it seemed to work relatively well.
With the multivariate tests, though, I got more tangled up.
Do you know of any understandable TensorFlow/PyTorch functions that do something similar to what I'm trying to do?
Do you have another idea for how to accomplish this?
Do you have any comments regarding the methods I'm trying to work with?
Thanks

Using PyTorch's built-in functions can help you a lot. Since I don't know exactly what you want to do with these results, I can refer you to PyTorch with this link here.
In that link you will find all the PyTorch loss functions and the calculations used in each one of them; just click on one, check how it works, and see if it's what you're looking for.
For your second idea, you can look at the BCEWithLogitsLoss function in that same link, because it may be what you are looking for.
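If it helps, here is a minimal sketch of what idea 2 (a property-based loss) could look like in PyTorch, matching only the first two moments (mean and covariance) of a batch to those of a standard multivariate normal. The function name and the training-loop snippet are illustrative; this is not a full normality test.

```python
# Minimal sketch of a moment-matching loss: penalize deviations of the batch
# mean from 0 and of the batch covariance from the identity matrix.
import torch

def standard_normal_moment_loss(z):
    """z: (batch_size, dim) tensor of network outputs."""
    mean = z.mean(dim=0)                               # should be ~0
    centered = z - mean
    cov = centered.T @ centered / (z.shape[0] - 1)     # should be ~identity
    eye = torch.eye(z.shape[1], device=z.device)
    return (mean ** 2).sum() + ((cov - eye) ** 2).sum()

# Hypothetical usage inside a training loop (net and x are assumed to exist):
# z = net(x)
# loss = standard_normal_moment_loss(z)
# loss.backward()
```

Higher moments (skewness, kurtosis) or a differentiable statistical test could be added as extra terms in the same way.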

Related

Which model (GPT2, BERT, XLNet and etc) would you use for a text classification task? Why?

I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT2 for text classification tasks. However, I'm not sure which one I should pick to start with. Which of these recent NLP models, such as the original Transformer, BERT, GPT2, or XLNet, would you use to start with? And why? I'd rather implement it in TensorFlow, but I'm flexible enough to go for PyTorch too.
Thanks!
It highly depends on your dataset, and it is part of the data scientist's job to find which model is more suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem you will probably test all of the above models and compare them. Which one of them should you try first? Andrew Ng in "Machine Learning Yearning" suggests starting with a simple model so you can quickly iterate and test your ideas, data preprocessing pipeline, etc.
Don’t start off trying to design and build the perfect system. Instead, build and train a basic system quickly—perhaps in just a few days.
Following this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas, and then move on to more complex models and see how they improve your results.
Note that modern NLP models contain a large number of parameters, and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download a pre-trained model, use it as a basis, and fine-tune it on your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state-of-the-art large pretrained model, there is a really easy way to do this: the HuggingFace library called pytorch-transformers. Whether you choose BERT, XLNet, or whatever, the models are easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
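To give a feel for how little code this takes, here is a minimal sketch using that HuggingFace library (now published simply as transformers); the model name, toy sentences and labels below are placeholders.

```python
# Minimal sketch: a fine-tuning-ready sequence classifier with HuggingFace
# transformers. Swap the model name to try BERT, XLNet, etc.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"        # e.g. "xlnet-base-cased" works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy batch of sentences with binary labels
texts = ["This movie was great", "This movie was terrible"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One forward/backward step; a real training loop would add an optimizer
outputs = model(**batch, labels=labels)
outputs.loss.backward()
```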
Well, like others mentioned, it depends on the dataset, and multiple models should be tried so the best one can be chosen.
However, sharing my experience: XLNet has beaten all other models so far by a good margin. Hence, if learning is not the objective, I would simply start with XLNet, then try a few more down the line and conclude. It just saves time in exploring.
Below repo is excellent to do all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses Hugging Face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.

Tensorflow Stored Learning

I haven't tried TensorFlow yet, but I'm still curious: how, and in what form (data type, file type), does it store the acquired learning of a machine learning program for later use?
For example, TensorFlow was used to sort cucumbers in Japan. The computer took a long time to learn, from the example images given, what good cucumbers look like. In what form was that learning saved for future use?
I ask because I think it would be inefficient if the program had to re-learn from the images every time it needs to sort cucumbers.
Ultimately, a high-level way to think about a machine learning model is as three components: the code for the model, the data for that model, and the metadata needed to make the model run.
In Tensorflow, the code for this model is written in Python and is saved in what is known as a GraphDef. This uses a serialization format created at Google called Protobuf. Other libraries commonly use other serialization formats, such as Python's native Pickle.
The main reason you write this code is to "learn" from some training data - which is ultimately a large set of matrices, full of numbers. These are the "weights" of the model - and they too are stored using Protobuf, although other formats like HDF5 exist.
Tensorflow also stores metadata associated with this model - for instance, what the input should look like (e.g. an image? some text?) and the output (e.g. a class of image, i.e. cucumber 1 or 2? with scores or without?). This too is stored in Protobuf.
At prediction time, your code loads up the graph, the weights and the metadata, and takes some input data to give out an output. More information here.
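As a concrete illustration, here is a minimal sketch of the save/load cycle, assuming TensorFlow 2.x with the Keras API; the model, layer sizes and the "cucumber_sorter" path are made up.

```python
# Minimal sketch: train once, persist graph + weights + metadata to disk,
# then reload later without retraining.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... model.fit(train_data, train_labels) would happen here, once ...

# Save in the SavedModel (protobuf) format
model.save("cucumber_sorter")

# Later, possibly in a different process, load it back and predict directly
restored = tf.keras.models.load_model("cucumber_sorter")
# predictions = restored.predict(new_images)
```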
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
Here are some resources that discuss the library and tensor flow
These are some tutorials
And here is some background on the field
And this is the github page
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than any particular application. Your question still is too vague for this website, but I'll try to point you toward a few resources you might find interesting.
The tensorflow used in image recognition often uses an ANN (Artificial Neural Network) as the object on which to act. What this means is that the tensorflow library helps with the number crunching for the neural network, which I'm sure you can read all about with a quick Google search.
The point is that tensorflow isn't a form of machine learning itself; it serves more as a useful number-crunching library, similar to something like numpy in Python, for large-scale deep learning. You should read more here.
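To make the numpy analogy concrete, here is a minimal sketch computing the same matrix product both ways (TF 2.x eager mode assumed; the numbers are arbitrary).

```python
# Minimal sketch: TensorFlow as a numerical library, analogous to numpy
import numpy as np
import tensorflow as tf

a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b = np.array([[5.0], [6.0]], dtype=np.float32)

print(np.matmul(a, b))            # plain numpy matrix product
print(tf.matmul(a, b).numpy())    # same result computed by TensorFlow
```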

Visualizing the detection process in Mask-RCNN

I am working on a project that aims to detect objects in certain difficult circumstances. I ran a test with Mask_RCNN on a dataset that contains that specific type of difficult examples and it did a pretty good job in some of them.
But surprisingly, some other examples didn't get detected, even though there is no obvious reason. To understand the reason behind this performance difference, I was advised to use TensorBoard. But then I realized that it's mostly used for the training phase, as I understood from this video.
At the end of the video, however, they mention an integration project for TensorBoard, namely the TensorFlow Debugger integration. But unfortunately I could not find any further information about the continuation of that feature.
Is there any way to visualize weights and activation maps inside a CNN during inference/evaluation phase?
The main difference between training and inference time for tensorboard will be the global_step value. Most graphs display global step as the x-axis. You can supply your own global step counter if you like, but you'll have to decide what the x-axis should represent to you in this case since "time" isn't really a logical construct during inference. Other tabs such as the images tab don't have a time component, so using them should be the same as during training.
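For example, if you log your own summaries at inference time, you can supply the test-image index as the step. A minimal sketch with the TF 2.x tf.summary API (the log directory and test_images are illustrative assumptions):

```python
# Minimal sketch: write inference-time image summaries to TensorBoard,
# using the test-image index as the "global step" on the x-axis.
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/inference")
with writer.as_default():
    for step, image in enumerate(test_images):      # each image: (H, W, 3), uint8 or float in [0, 1]
        tf.summary.image("input_image", image[None, ...], step=step)
```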
The tensorflow debugger is a nice terminal debugger, but wouldn't really be related to what you're trying to do here. It's certainly not a visualization tool.
Another approach might be to simply generate your own plots and output a set of PDFs with the various visualizations you need using standard tools like matplotlib for each test image. I've found tools like XnView make it really easy to look through a lot of PDF visualizations to understand what's going on. I've used this approach quite effectively. If you want to view many hundreds or thousands of results quickly you might have an easier time if all the visuals are just dumped out to a directory.
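A minimal sketch of that "dump your own visualizations" approach, assuming a tf.keras-style model whose intermediate layers can be reached by name (the layer name, image shapes and output paths are illustrative):

```python
# Minimal sketch: save one PDF per test image showing the input next to a
# channel-averaged activation map from a chosen intermediate layer.
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def dump_activation_maps(model, layer_name, images, out_dir="activation_maps"):
    os.makedirs(out_dir, exist_ok=True)
    # Sub-model that exposes the chosen intermediate feature map
    feature_model = tf.keras.Model(inputs=model.input,
                                   outputs=model.get_layer(layer_name).output)
    feats = feature_model.predict(images)            # shape: (N, H, W, C)
    for i, fmap in enumerate(feats):
        heatmap = fmap.mean(axis=-1)                 # collapse channels to one 2-D map
        fig, axes = plt.subplots(1, 2, figsize=(8, 4))
        axes[0].imshow(images[i].astype("uint8"))
        axes[0].set_title("input")
        axes[1].imshow(heatmap)
        axes[1].set_title(layer_name)
        for ax in axes:
            ax.axis("off")
        fig.savefig(os.path.join(out_dir, f"image_{i:04d}.pdf"))
        plt.close(fig)
```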

How to specify the architecture of deep neural network in Tensorflow?

I am a newbie in TensorFlow.
Actually, I am testing some examples from the TensorFlow website, and I'm starting to understand some features of the framework, but what I don't understand is how I can design my own architecture - I mean the number of layers, the type of layer (conv, pool, ...) - and whether it is even necessary to do that, because there are many predefined architectures like AlexNet.
Thanks,
I would strongly recommend working through their hands-on tutorials, depending on whether you have previous ML experience (https://www.tensorflow.org/get_started/mnist/pros) or not (https://www.tensorflow.org/get_started/mnist/beginners). The questions you are asking are answered in there.
The question of using a predefined architecture versus a self-defined one depends on your use case. If you want to do something easy, like classifying whether or not there is a car in the scene, a shallower architecture might work better, because it is faster and a deeper one is overkill. However, most architectures are similar to the ones already defined in the literature.
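For orientation, a minimal sketch of specifying such a layer stack yourself with the TF 2.x Keras API might look like this; the layer counts and sizes are illustrative, not a recommendation.

```python
# Minimal sketch: a small self-defined conv/pool architecture for a binary
# image classification task (e.g. car / no car).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```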
Another question that arises naturally when talking about predefined architectures is transfer learning / fine-tuning. Predefined architectures have often already been trained on some big dataset (mostly ImageNet) and perform really well out of the box for many tasks. With little training data it makes a lot of sense to use this; with lots of training data it can hinder your progress, though.
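A minimal sketch of that reuse pattern, again assuming the TF 2.x Keras API (MobileNetV2 is just one possible backbone, and the classification head is illustrative):

```python
# Minimal sketch: reuse an ImageNet-pretrained backbone and train only a small
# classification head on top of it.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights="imagenet")
base.trainable = False          # freeze the pretrained weights at first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary task
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```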

How does deep learning work in real time?

I have just made an API, using Django, which recommends movies based on user input. It's designed to execute a deep learning function for every 10 inputs from each user. For each batch of 10 inputs, it takes around 10 seconds to produce output. How can Google or Amazon do real-time updates without delay?
When it comes to execution there is no deep learning, there is just a deep model. Let us assume that this is a regular feed-forward model; then all you have to do is perform K (depth) matrix multiplications. For convolutional layers it is more work, but still matrix multiplications. All these operations are extremely simple to parallelize. In particular, just running them on a GPU will give you a ~20x boost. Using tensorflow, which can scatter computations across many cores/CPUs/machines, also works the same way. There are also numerous other optimizations possible, like training a small net to replicate the behaviour of the big one, or making use of the sparsity of some matrices (if you are using ReLUs, many neurons will produce zeros, so sparse matrices appear), etc.
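To see why this parallelizes so well, here is a minimal sketch of a K-layer feed-forward pass as nothing but K matrix multiplications; the sizes and the PyTorch choice are illustrative.

```python
# Minimal sketch: a K-layer feed-forward pass is K matrix multiplications,
# each of which runs in parallel on a GPU when one is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
K, dim, batch = 10, 512, 256
weights = [torch.randn(dim, dim, device=device) for _ in range(K)]

x = torch.randn(batch, dim, device=device)
h = x
for W in weights:
    h = torch.relu(h @ W)   # one layer = one matrix multiplication + nonlinearity
```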
That being said, a network that takes 10s to process 10 inputs sounds like either a horrible implementation or a really huge network, so before going for general optimization schemes, make sure your current code is fine. For example, if you use tensorflow etc., be aware that many things besides pushing your data through take time - like loading libraries, starting sessions, the session.run call itself, etc.
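In particular, for a web API the model should be loaded once per worker process, not once per request. A minimal sketch of that pattern (the framework choice, model path and function name are illustrative assumptions):

```python
# Minimal sketch: load the model once at import time and reuse it, so each
# API call only pays for a single forward pass.
import numpy as np
import tensorflow as tf

# Done once when the web worker starts, NOT inside the request handler
_MODEL = tf.keras.models.load_model("recommender_model")

def recommend(user_inputs):
    """Run one batched forward pass over the user's collected inputs."""
    batch = np.asarray(user_inputs, dtype="float32")
    scores = _MODEL.predict(batch)
    # return item indices sorted from highest to lowest score
    return np.argsort(scores, axis=-1)[:, ::-1]
```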