CNTK time series anomaly detection tutorial or documentation (RNN/LSTM)? - cntk

Problem
Do you have a tutorial for LSTM or RNN time series anomaly detection using deep learning with CNTK? If not, can you make one or suggest a series of simple steps here for us to follow?
I am a software developer and a member of a team investigating deep learning for anomaly detection on time series data we have. We have not found anything in your Python docs that can help us. Most of the tutorials seem to be for visual recognition problems and not specific to the problem domain of interest to us.
Using LSTM and RNN in Anomaly Detection
I have found the following
This link references why we are trying to use time series for anomaly detection
This paper convinced us that the first link is a respected approach to the problem in general
This link also outlined the same approach
I looked around for CNTK questions here but didn't find any similar question, so I hope this one helps other developers in the future.
Additional Notes and Questions
My problem is that I am finding CNTK not that simple to use or as well documented as I had hoped. Frankly, our framework and stack is heavy on .NET and Microsoft technologies. So I repeat the question again for emphasis with a few follow ups:
Do you have any resources you feel you can recommend to developers learning neural networks, deep learning, and so on to help us understand what is going on under the hood with CNTK?
Build 2017 mentions C# is supported by CNTK. Can you please point us in the direction of where the documentation and support is for this?
Most importantly, can you please help get us unstuck on doing time series anomaly detection using CNTK?
Thank you very much for your time and assistance in reading and answering this question.

Thanks for your feedback. Your suggestions help improve the toolkit.
First Bullet
I suggest starting with the CNTK tutorials.
https://github.com/Microsoft/CNTK/tree/master/Tutorials
They run from CNTK 101 to 301, and I suggest working through them. Although many of them use image data, the concepts and models carry over to numerical data. The 101-103 series is great for understanding the basics of the train-test-predict workflow.
Second Bullet:
Once you have trained the model (Python is recommended for training), model evaluation can be performed using different language bindings, C# being one of them.
https://github.com/Microsoft/CNTK/wiki/CNTK-Evaluation-Overview
Third Bullet
There are different approaches suggested in the papers you have cited. All of them are possible to do in CNTK with some changes to the code in the tutorials.
The key tutorials for you would be CNTK 106, CNTK 105, and CNTK 202.
Anomaly as classification: This involves labeling your target value as 1 of N classes, with one of the classes being "anomaly". You can then combine 106 with 202 to classify the prediction.
Anomaly as an autoencoder: You will need to study the 105 autoencoder tutorial. Instead of a dense network, you could apply the concept to recurrent networks. Train only on the normal data. Once trained, pass any data through the trained model. The difference between the input and its autoencoded version will be small for normal data but much larger for anomalies. The 105 tutorial uses images, but you can train these models with any numerical data.
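As a rough illustration of the autoencoder idea above, here is a minimal sketch in plain Python rather than CNTK. It is not the code from tutorial 105: the "model" is a deliberately trivial stand-in that reconstructs every window as the mean of the normal training windows, and the names (`fit_mean_window`, `fit_threshold`, `make_window`) and the mean-plus-3-standard-deviations threshold are assumptions made for this example. With a real recurrent autoencoder, only the reconstruction step would change; the scoring logic stays the same.

```python
import math
import statistics


def mse(a, b):
    """Mean squared error between a window and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_mean_window(windows):
    """Stand-in 'model': the element-wise mean of the normal training windows."""
    n = len(windows)
    return [sum(w[i] for w in windows) / n for i in range(len(windows[0]))]


def fit_threshold(errors, k=3.0):
    """Assumed rule: flag anything above mean + k * std of the normal errors."""
    return statistics.mean(errors) + k * statistics.pstdev(errors)


def make_window(j, length=8):
    """Hypothetical normal data: one sine period plus small deterministic noise."""
    return [
        math.sin(2 * math.pi * i / length) + 0.01 * (((7 * j + 3 * i) % 5) - 2)
        for i in range(length)
    ]


# Train on normal data only.
normal = [make_window(j) for j in range(20)]
template = fit_mean_window(normal)
train_errors = [mse(w, template) for w in normal]
threshold = fit_threshold(train_errors)

# Score a window containing an injected spike.
spiked = make_window(3)
spiked[4] += 2.0
is_anomaly = mse(spiked, template) > threshold
print("threshold:", threshold, "spiked window flagged:", is_anomaly)
```

The threshold rule is a design choice you would tune on held-out normal data; a percentile of the training errors is a common alternative.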
Hope you find these suggestions helpful.

Related

Which model (GPT2, BERT, XLNet and etc) would you use for a text classification task? Why?

I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT2 for text classification tasks. However, I'm not sure which one I should pick to start with. Which of these recent NLP models, such as the original Transformer, BERT, GPT2, or XLNet, would you start with, and why? I'd rather implement it in TensorFlow, but I'm flexible enough to go with PyTorch too.
Thanks!
It depends heavily on your dataset, and it is part of the data scientist's job to find which model is most suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem you will probably test all of the above models and compare them. Which one should you choose first? Andrew Ng, in "Machine Learning Yearning", suggests starting with a simple model so you can quickly iterate and test your idea, data preprocessing pipeline, etc.
Don’t start off trying to design and build the perfect system.
Instead, build and train a basic system quickly—perhaps in just a few
days
According to this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas and then move on to more complex models and see how they can improve your results.
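To show what a quick baseline can look like before reaching for a pretrained model, here is a minimal sketch. Note that it swaps in a bag-of-words multinomial Naive Bayes classifier written in plain Python, not ULMFiT; the function names and the four-sentence toy dataset are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(sentences, labels):
    """Count words per class for a multinomial Naive Bayes baseline."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(sentences, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 1 = positive, 0 = negative.
sentences = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
labels = [1, 1, 0, 0]

model = train_nb(sentences, labels)
print(predict_nb(model, "great acting"))
print(predict_nb(model, "awful movie"))
```

A baseline like this mainly serves to validate the data pipeline and to give a score that any larger model has to beat.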
Note that modern NLP models contain a large number of parameters, and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download a pre-trained model, use it as a basis, and fine-tune it on your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state-of-the-art large pretrained model, there is a really easy way to do this: the library by HuggingFace called pytorch-transformers. Whether you choose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
As others mentioned, it depends on the dataset; multiple models should be tried and the best one chosen.
However, in my experience, XLNet beats all other models so far by a good margin. Hence, if learning is not the objective, I would simply start with XLNet, then try a few more down the line and conclude. It just saves time in exploring.
The repo below is excellent for doing all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses hugging face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.

What is the real meaning of training steps in deep learning with TensorFlow?

I am following this tutorial about Google Cloud Platform (GCP) for deep learning. According to the tutorial it says
The --train_steps option specifies the total number of training batches.
However, in the code of the same tutorial it says
'--train_steps', help='Steps to run the training job for.'
Now I am confused, because I found several questions on this topic here on Stack Overflow and in other sources saying that the training steps correspond to the number of iterations the optimizer performs to find the minimum. Can someone confirm which of these three definitions is correct?
xerx is correct in that steps and batches are pretty much the same thing, and can be treated as doing exactly the same thing for your model. I write more about the training process, batch sizes, and overfitting in my book about stock prediction, which you can find here - it's great for learning Deep Learning.
Good Luck!
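To make the relationship between steps, batches, and epochs concrete, here is a small sketch in plain Python. It assumes the usual convention that one training step processes one batch and performs one optimizer update; the function names and the numbers (1000 examples, batch size 32) are made up for the example, and the optimizer update itself is only counted, not implemented.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches (and therefore steps) needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def run_training(num_examples, batch_size, train_steps):
    """Simulate a training loop: one batch and one optimizer update per step."""
    updates = 0
    examples_seen = 0
    start = 0
    for _ in range(train_steps):
        end = min(start + batch_size, num_examples)
        examples_seen += end - start
        updates += 1  # placeholder for a single optimizer update on this batch
        start = 0 if end == num_examples else end  # wrap around at epoch end
    return updates, examples_seen


per_epoch = steps_per_epoch(1000, 32)
updates, examples_seen = run_training(1000, 32, train_steps=3 * per_epoch)
print(per_epoch, updates, examples_seen)
```

Under this convention the three descriptions in the question coincide: the number of training batches, the number of steps, and the number of optimizer iterations are the same count.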

How do I determine what type of layers I need for my deep learning model?

Suppose that I want to make a model that does something. When I search for the topic on Google or YouTube, I find many related tutorials, and it seems like some clever programmer has already implemented that model with deep learning.
But how do they know what type of layers, activation functions, loss functions, optimizer, number of units, etc. they need to solve that particular problem using deep learning?
Are there any techniques for knowing this, or is it just a matter of understanding and experience? It would also be very helpful if somebody could point me to some videos or articles answering my question.
This is more of a matter of understanding and experience. When building a model from scratch, you must understand which optimizer, loss, etc. makes sense for your particular problem. In order to choose these appropriately, you must understand the differences between the available optimizers, loss functions, etc.
As for choosing how many layers and nodes, what batch size, what learning rate, etc., these are all hyperparameters that you will need to test and tune as you experiment with your model.
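As a toy illustration of what testing and tuning a hyperparameter means, the sketch below tries several learning rates on a one-parameter problem and keeps the one with the lowest final loss. The quadratic loss, the candidate values, and the function name are assumptions for the example; with a real model you would compare validation metrics instead.

```python
def final_loss(learning_rate, steps=50, start=0.0, target=3.0):
    """Run plain gradient descent on (w - target)^2 and return the final loss."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w -= learning_rate * grad
    return (w - target) ** 2


# Candidate learning rates: too small, reasonable, and too large (diverges).
candidates = [0.001, 0.1, 1.1]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"lr={lr}: final loss {loss:.3g}")
print("best learning rate:", best_lr)
```

The same loop structure extends to other hyperparameters such as layer count or batch size, at the cost of one training run per combination.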
I have a Deep Learning Fundamentals YouTube playlist that you may find helpful. It covers the fundamental basics of each of these topics in short videos. Additionally, this Deep Learning with Keras playlist may also be beneficial if you're wanting to focus more on coding after getting the basic concepts down.
Thanks for the question.
The CS231n Stanford lectures on CNNs are the best for beginners. Refer to the video lectures here; class notes are available here.
After watching the lectures and completing the assignments, you will have a basic idea of what deep learning is and of the algorithms available.
But when it comes to solving real-world problems, this won't be sufficient. So take this course by Jeremy Howard, where he teaches more about how to approach a problem using the Kaggle platform.
Keep solving more problems and experimenting with new models and algorithms on platforms like HackerEarth, Kaggle, Topcoder, etc.

Which kinds of high level API of tensorflow should I learn?

I have studied tensorflow for about one month. I just feel that creating a network with primitive operations of Tensorflow is very verbose. Then I found some high level API, such as TF-Slim, TF Learn, Keras. But multiple choices confuse me so that I don't know which I should learn.
TF-Slim is a lightweight library for defining, training and evaluating complex models in TensorFlow, but as far as I can tell, it's only for convnets. Keras can build a more diverse range of networks.
Can anyone give a comparison between them so that I can choose which high-level API to learn? In terms of:
1. popularity: which ones are the most popular ?
2. practicality: what kinds of network can they build ?
3. performance: what's their training/inference performance ?
... something else
Hope someone could give me a suggestion. Thanks.
I suggest you start with Keras.
It's very easy to learn, it has a broad user base (see Shobhit's link), there is a ton of reference code out there on GitHub and in tutorials / MOOCs / eBooks etc., and you can build almost anything with it. I personally think that it has good documentation (although some might disagree with that...).
Since it's an API that connects to TensorFlow, Theano, CNTK (and possibly more frameworks in the future), you have even more flexibility.
Don't worry too much about performance. That's really not important while you're learning.

How to predict using Tensorflow?

This is a newbie question for the tensorflow experts:
I am reading a lot of data from a power transformer connected to an array of solar panels using Arduinos. My question is: can I use TensorFlow to predict future power generation?
I am completely new to TensorFlow. If you can point me to something similar, or to any GitHub repo doing similar predictive modeling, I can start with that.
Edit: Kyle pointed me to the MNIST data, which I believe is an image dataset. Again, I am not sure whether TensorFlow is the right computation library for this problem, or whether it only works on image datasets.
thanks, Rajesh
Surely you can use tensorflow to solve your problem.
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
So it works not only on image datasets but on other data as well. Don't worry about this.
As for prediction, you first need to train a model (such as linear regression) on your dataset, then predict. The tutorial code can be found on the TensorFlow homepage.
Get your hands dirty and you will find it works on your dataset.
Good luck.
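To show the train-then-predict workflow in its simplest form, here is a sketch that fits a linear regression by ordinary least squares in plain Python and then predicts a future value. It does not use TensorFlow, and the hour/power readings are invented for the example; the point is only the order of operations (fit on past data, then predict).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical readings: hour of measurement vs. generated power.
hours = [0, 1, 2, 3, 4]
power = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(hours, power)   # train
forecast = predict(model, 5)     # predict the next hour
print(model, forecast)
```

Real solar data is strongly periodic, so a straight line is only a starting point before moving on to models that take time of day and history into account.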
You can absolutely use TensorFlow to predict time series. There are plenty of examples out there, like this one. And this is a really interesting one on using RNN to predict basketball trajectories.
In general, TF is a very flexible platform for solving problems with machine learning. You can create any kind of network you can think of in it, and train that network to act as a model for your process. Depending on what kind of costs you define and how you train it, you can build a network to classify data into categories, predict a time series forward a number of steps, and other cool stuff.
There is, sadly, no short answer for how to do this, but that's just because the possibilities are endless! Have fun!
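One concrete first step for predicting a time series forward is turning the raw series into (input window, target) pairs that any model can train on. The sketch below shows one common way to do it in plain Python; the function name, the window length, and the `horizon` parameter are assumptions for the example and not part of any TensorFlow API.

```python
def make_windows(series, window, horizon=1):
    """Slice a series into fixed-length inputs and the value `horizon` steps ahead."""
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Hypothetical readings; each input holds 3 past values, the target is the next one.
series = [1, 2, 3, 4, 5, 6]
inputs, targets = make_windows(series, window=3)
print(inputs)
print(targets)
```

The resulting pairs can be fed to a regression model, an RNN, or anything in between; increasing `horizon` trains the model to predict further ahead.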