Does wit.ai prioritize what it asks me to validate? (e.g. low-confidence results)

I am training wit.ai's understanding. I use a Python script to make API calls and feed wit.ai with sentences stored in a local file.
There is more data (thousands of sentences) than I can manually validate (hundreds).
Does wit.ai prioritize what it asks me to validate, e.g. sentences with a low confidence score or "no match"?
From my experience, I have not seen this kind of optimization in wit.ai. If that is true, I need to optimize my training process and avoid flooding wit.ai with similar data that quickly reaches a high confidence score.
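One way to do that filtering client-side is sketched below, assuming the standard wit.ai /message HTTP endpoint; the token, file name, and the 0.7 threshold are placeholders, not values from the question.

```python
# Minimal sketch: query wit.ai for each sentence and keep only the
# low-confidence ones for manual validation / further training.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder
API_URL = "https://api.wit.ai/message"

def confidence(sentence):
    """Return the top intent confidence for a sentence (0.0 if no match)."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {WIT_TOKEN}"},
        params={"q": sentence},
    )
    resp.raise_for_status()
    intents = resp.json().get("intents", [])
    return intents[0]["confidence"] if intents else 0.0

with open("sentences.txt") as f:  # placeholder file name
    sentences = [line.strip() for line in f if line.strip()]

needs_validation = [s for s in sentences if confidence(s) < 0.7]
print(f"{len(needs_validation)} of {len(sentences)} sentences need manual validation")
```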

Related

Bot Detection via Google Recaptcha V3

Is Google reCAPTCHA v3 a good fit for bot detection? I doubt 100% accuracy is needed; we want statistical confidence that an account generating posts, tweets, or other electronic messages is mostly automated rather than operated by a human.
I would run a Google reCAPTCHA test every few message transmissions. Over a long enough period, one could accumulate enough statistics to determine with confidence whether the account is used mostly by a human or mostly by a bot.
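A minimal sketch of that idea, assuming the standard reCAPTCHA v3 server-side verification endpoint; the secret key, sample count, and decision threshold are placeholders.

```python
# Sketch: verify a reCAPTCHA v3 token every few messages and keep a running
# average of the returned scores (0.0 = likely bot, 1.0 = likely human).
import requests

RECAPTCHA_SECRET = "YOUR_SECRET_KEY"  # placeholder
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

scores = []

def record_score(client_token):
    """Verify one client token server-side and store its score."""
    resp = requests.post(
        VERIFY_URL,
        data={"secret": RECAPTCHA_SECRET, "response": client_token},
    )
    result = resp.json()
    if result.get("success"):
        scores.append(result["score"])

def probably_bot(min_samples=20, threshold=0.5):
    """After enough samples, decide whether the account looks automated."""
    if len(scores) < min_samples:
        return None  # not enough evidence yet
    return sum(scores) / len(scores) < threshold
```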

What is a "convolution warmup"?

I have encountered this phrase a few times, mostly in the context of neural networks and TensorFlow, but I get the impression it is something more general and not restricted to these environments.
Here, for example, they say that this "convolution warmup" process takes about 10k iterations.
Why do convolutions need to warm up? What prevents them from reaching their top speed right away?
One thing I can think of is memory allocation. If that were the cause, I would expect it to be resolved after one (or at least fewer than ten) iterations. Why 10k?
Edit for clarification: I understand that the warmup is a time period, or a number of iterations, that must pass before the convolution operator reaches its top speed (time per operation).
What I am asking is: why is it needed, and what happens during this time that makes the convolution faster?
Training a neural network works by presenting training data, calculating the output error, and backpropagating the error back to the individual connections. To break symmetry, training doesn't start with all weights at zero, but with random connection strengths.
It turns out that with random initialization, the first training iterations aren't very effective. The network isn't anywhere near the desired behavior, so the calculated errors are large. Backpropagating these large errors would lead to overshoot.
A warmup phase is intended to move the initial network away from a random network and towards a first approximation of the desired network. Once that approximation has been reached, the learning rate can be increased.
This is an empirical result. The number of iterations depends on the complexity of your problem domain, and therefore also on the complexity of the necessary network. Convolutional neural networks are fairly complex, so warmup matters more for them.
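To illustrate the warmup phase described above, here is a minimal sketch of a linear learning-rate warmup schedule; the 10,000-step warmup length and base rate are arbitrary placeholders, not values taken from the question.

```python
# Sketch: linear learning-rate warmup followed by a constant rate.
def warmup_lr(step, warmup_steps=10_000, base_lr=1e-3):
    """Scale the learning rate up linearly during the first warmup_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Example: the rate grows from ~0 to base_lr over the warmup period.
for step in (0, 5_000, 10_000, 20_000):
    print(step, warmup_lr(step))
```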
You are not alone in observing that the time per iteration varies.
I ran the same example and had the same question. I can say the main reason is the different input image shapes and the number of objects to detect.
I offer my test results for discussion.
I enabled tracing and looked at the timeline first, and I found that the number of Conv2D occurrences varies between steps in the GPU compute stream. Then I used export TF_CUDNN_USE_AUTOTUNE=0 to disable autotuning.
After that there is the same number of Conv2D ops in the timeline, and each takes about 0.4 s.
The time cost still differs between steps, but it is much closer.
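For reference, a minimal sketch of capturing such a per-step timeline in TensorFlow 1.x, so the Conv2D occurrences can be inspected in chrome://tracing; the toy graph and output file name are placeholders standing in for the real model.

```python
# Sketch: trace one step and dump a Chrome-trace timeline (TensorFlow 1.x).
import numpy as np
import tensorflow as tf
from tensorflow.python.client import timeline

# Toy graph: a single Conv2D so the trace contains something to look at.
images = tf.placeholder(tf.float32, [1, 224, 224, 3])
kernel = tf.Variable(tf.random_normal([3, 3, 3, 16]))
conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding="SAME")

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(conv,
             feed_dict={images: np.zeros((1, 224, 224, 3), np.float32)},
             options=run_options,
             run_metadata=run_metadata)

    # Write the step timeline; open it in chrome://tracing to inspect Conv2D.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open("step_timeline.json", "w") as f:
        f.write(tl.generate_chrome_trace_format())
```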

Intrusion DetecTion DataSet

First of all, I hope my question won't be deleted for being off-topic, but I didn't find a better website to post it on.
I'm working on an intrusion detection project. In my research, most intrusion detection datasets (KDD, DARPA, CDX, ISCX, ...) each have their own format (ARFF, tcpdump, dump, CSV, ...). I want to convert the datasets from dump and tcpdump to the ARFF format (if you have a better idea for getting the datasets into a common format, I'll be thankful). What is the best way to do this?
My last question: which intrusion detection system can best analyse heterogeneous dataset formats and give me the detection rate of every attack?
First, the KDDCup dataset is based on traffic collected for DARPA. The DARPA dataset isn't labeled and contains only traffic collected from experiments, which is why it is in tcpdump format.
So KDDCup builds on DARPA: they analysed all the traffic (packets) and selected features that help algorithms classify traffic as normal or anomalous. We call this approach "offline learning", since we have labeled data.
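Regarding the conversion part of the question, here is a minimal sketch that reads a tcpdump/pcap capture with scapy and writes a very simple ARFF file. The three features and file names are placeholders; real IDS features such as the KDDCup flow statistics require considerably more processing.

```python
# Sketch: convert a pcap capture into a minimal ARFF file.
# The features below (length, protocol, destination port) are placeholders;
# a real conversion would compute KDD-style flow features.
from scapy.all import rdpcap, IP, TCP, UDP

packets = rdpcap("capture.pcap")  # placeholder file name

rows = []
for pkt in packets:
    if IP not in pkt:
        continue
    proto = pkt[IP].proto
    dport = pkt[TCP].dport if TCP in pkt else (pkt[UDP].dport if UDP in pkt else 0)
    rows.append((len(pkt), proto, dport))

with open("capture.arff", "w") as f:
    f.write("@RELATION traffic\n\n")
    f.write("@ATTRIBUTE length NUMERIC\n")
    f.write("@ATTRIBUTE protocol NUMERIC\n")
    f.write("@ATTRIBUTE dst_port NUMERIC\n\n")
    f.write("@DATA\n")
    for length, proto, dport in rows:
        f.write(f"{length},{proto},{dport}\n")
```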

Tensorflow network resource usage

To evaluate the quality of job placements in distributed TensorFlow, I want to obtain the total size in bytes of the data sent through the network during training. This is in preparation for further work on the automatic job placement algorithm. Network usage will measure data locality of the training, and is a proxy for training delays.
My plan is simply to record the sizes of all tensors input to _Send nodes, then output and display this in the Python profiling Timeline. I've read the related discussions here and here and believe this is correct in principle. My only concern is that my experiments have shown that Send and Recv nodes are also used for communication within a process, in addition to inter-process communication, which appears to differ from what is described in the whitepaper: https://www.tensorflow.org/about/bib.
Are there any caveats to my approach, and is this a good approximation of the actual amount of network traffic? Also, is the amount of data transferred a worthwhile quantity to minimize in order to reduce delays caused by job placements?
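A minimal sketch of the approach described above, assuming TensorFlow 1.x run metadata; whether Send/Recv nodes show up under exactly these names (and whether they report output tensors) may depend on the TensorFlow version, and in-process vs. cross-process transfers are not distinguished here.

```python
# Sketch: approximate the bytes moved through Send/Recv pairs in one step by
# summing the output tensor sizes of Recv nodes in the step stats.
import tensorflow as tf

def transferred_bytes(step_stats, name_filter="_Recv"):
    """Sum requested bytes of outputs produced by nodes matching name_filter."""
    total = 0
    for dev_stats in step_stats.dev_stats:
        for node in dev_stats.node_stats:
            if name_filter not in node.node_name and name_filter not in node.timeline_label:
                continue
            for out in node.output:
                total += out.tensor_description.allocation_description.requested_bytes
    return total

# Usage: run one step with full tracing, then inspect the collected stats.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# sess.run(train_op, options=run_options, run_metadata=run_metadata)
# print("approx. bytes transferred:", transferred_bytes(run_metadata.step_stats))
```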

Deep neural network diverges after convergence

I implemented the A3C network in https://arxiv.org/abs/1602.01783 in TensorFlow.
At this point I'm 90% sure the algorithm is implemented correctly. However, the network diverges after convergence. See the attached image, which I got from a toy example where the maximum episode reward is 7.
When it diverges, the policy network starts giving a single action a very high probability (>0.9) for most states.
What should I check for this kind of problem? Is there any reference for it?
Note that in Figure 1 of the original paper the authors say:
For asynchronous methods we average over the best 5 models from 50 experiments.
That can mean that in a lot of cases the algorithm does not work that well. From my experience, A3C often diverges, even after converging. Careful learning-rate scheduling can help. Or do what the authors did: train several agents with different seeds and pick the one that performs best on your validation data. You could also employ early stopping when the validation error starts to increase.
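A minimal sketch of the early-stopping idea mentioned above; the `evaluate` and `save_snapshot` hooks, the patience value, and the evaluation interval are placeholders for your own training loop, not part of the original answer.

```python
# Sketch: stop training (or keep the best snapshot) once the validation
# score stops improving for `patience` consecutive evaluations.
def train_with_early_stopping(agent, evaluate, save_snapshot,
                              max_iterations=100_000,
                              eval_every=1_000,
                              patience=5):
    best_score = float("-inf")
    evals_since_improvement = 0
    for it in range(max_iterations):
        agent.train_step()  # placeholder: one A3C update
        if (it + 1) % eval_every == 0:
            score = evaluate(agent)          # e.g. mean episode reward
            if score > best_score:
                best_score = score
                evals_since_improvement = 0
                save_snapshot(agent)         # keep the best model so far
            else:
                evals_since_improvement += 1
                if evals_since_improvement >= patience:
                    break                    # diverging: stop and keep snapshot
    return best_score
```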