Request intuitive explanation of xgboost leaf scores

I refer once again to the Stack Overflow question at this link regarding the calculation of scores at the leaves by the XGBoost algorithm. I have searched the documentation a lot but could not find any intuitive explanation. Is it possible to give an intuitive meaning to the scores at the leaves in this picture, and to explain how (intuitively) they could have been calculated? Unfortunately, it is difficult for me to understand the mathematics behind all this.
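For concreteness, here is a minimal sketch (not xgboost's actual code) of where a leaf score comes from. XGBoost sets each leaf's weight to w = -sum(g_i) / (sum(h_i) + lambda), where g_i and h_i are the first and second derivatives of the loss over the training rows that end up in that leaf. For squared-error loss, g_i is just the residual and h_i is 1, so the leaf score is the mean residual of those rows, shrunk toward zero by the regularization term lambda:

    # Sketch of the optimal leaf weight w = -sum(g) / (sum(h) + lambda).
    # For squared-error loss, g_i = prediction - target and h_i = 1.
    def leaf_value(targets, current_predictions, reg_lambda=1.0):
        grads = [p - y for p, y in zip(current_predictions, targets)]  # g_i
        hessians = [1.0] * len(targets)                                # h_i
        return -sum(grads) / (sum(hessians) + reg_lambda)

    # Three rows land in this leaf, all currently predicted at 0.5:
    print(leaf_value([1.0, 1.2, 0.8], [0.5, 0.5, 0.5]))  # 0.375, the shrunk mean residual

So, intuitively, each leaf score is a regularized average correction: the step that best reduces the remaining error for the rows in that leaf, which gets added (scaled by the learning rate) to the model's running prediction.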


What is the Benjamini-Yekutieli test

According to the documentation, the tsfresh algorithm makes use of the Benjamini-Yekutieli procedure in its final step. To give you a little bit of context,
tsfresh automatically calculates a large number of time series characteristics, the so-called features. Further, the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks.
I have tried to read the linked references, but I found them very technical. Can anybody provide a high-level description of the Benjamini-Yekutieli procedure and explain why it is needed? I would like to understand what its main purpose is.
If you don’t know what FRESH is, I would still be happy to read an explanation of the Benjamini-Yekutieli test.
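For concreteness: given one p-value per candidate feature, the Benjamini-Yekutieli procedure decides which features to keep while controlling the false discovery rate, and it remains valid even when the tests are dependent on each other. A minimal sketch of such a correction, assuming statsmodels is available and using made-up p-values:

    from statsmodels.stats.multitest import multipletests

    # Hypothetical p-values from per-feature relevance tests, of the kind
    # tsfresh would produce in its selection step.
    pvalues = [0.001, 0.009, 0.04, 0.1, 0.3, 0.6]

    reject, corrected, _, _ = multipletests(pvalues, alpha=0.05, method="fdr_by")
    for p, keep in zip(pvalues, reject):
        print(f"p={p}: {'keep feature' if keep else 'drop feature'}")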

How to model coin-flips with pymc (from Probabilistic Programming and Bayesian Methods for Hackers)

Basically asking about https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/issues/150
At https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb the author discusses a model where students are asked if they cheated on a test or not. They flip coins and answer honestly if heads, otherwise answer randomly. Given the number of "yes" answers, how can we get the distribution of cheaters?
The author basically models this as arising from a probability of cheating, which gives rise to some set of students either cheating or not, which in turn gives rise to some set of answers via the coin flips, which finally yields some observed probability of answering "yes, I cheated."
However, instead of letting that observed probability (or just the sum of "yes" answers) be the observation, he then models a binomial distribution on top of that, and the observation recorded in the experiment is set as the observed value for that distribution.
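To make the setup concrete, here is a minimal sketch of that layered model written against modern PyMC (the linked notebook uses PyMC2, so the API differs); the counts N and X are hypothetical:

    import pymc as pm

    N = 100   # students interviewed (hypothetical)
    X = 35    # recorded "yes" answers (hypothetical)

    with pm.Model():
        p_cheat = pm.Uniform("p_cheat", 0.0, 1.0)              # latent cheating frequency
        cheated = pm.Bernoulli("cheated", p=p_cheat, shape=N)  # did student i cheat?
        flip1 = pm.Bernoulli("flip1", p=0.5, shape=N)          # heads -> answer honestly
        flip2 = pm.Bernoulli("flip2", p=0.5, shape=N)          # on tails: heads -> say "yes"
        # Proportion of students whose procedure yields a "yes" answer.
        p_yes = pm.Deterministic(
            "p_yes", (flip1 * cheated + (1 - flip1) * flip2).sum() / N
        )
        # The extra binomial layer my question is about: the recorded count
        # of "yes" answers is treated as a binomial draw with probability p_yes.
        pm.Binomial("obs", n=N, p=p_yes, observed=X)
        trace = pm.sample(2000)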
My questions:
Is this the right thing to do? If so, why?
Assuming it's not, is there a better solution (short of the radically simplified version he presents)?
The general case of this is having an "observed" value for a sum of random variables. People online seem to suggest this is impossible, but I don't get why you couldn't just, e.g., "observe" a draw from a uniform distribution with the mean at your deterministic observation and bounds at +/- epsilon.
Prob/stat problems are subtle. If I am reading your question correctly, the choice between the binomial and uniform distributions has to do with the significance of the latter part of the scenario setup. It says that if the students flip tails, they answer randomly. This means that an answer of "no" is not as likely as "yes": while a coin flip is uniformly distributed, the answers that follow are not.
If the students absolutely HAD to answer "no" on tails, you could definitely use another uniform distribution. Hope that helps!

How can I benchmark signature algorithms (HMAC vs RSA) and compare them well?

I would like to re-ask a question that was asked here two years ago (Benchmarking symmetric and asymmetric cryptography) but, as I find, was not satisfactorily answered.
1) I would really like to back up, with hard numbers, the notion that RSA-style asymmetric cryptography is much more expensive than, for example, performing an HMAC operation. These numbers should be informative with regard to the comparability of the algorithms.
2) Moreover, in addition to mere mean values of speed, I would also be interested in information about the standard deviation/variance of the measured operation costs. This is because in the protocol in question, predictability of the operation time is actually an issue. This goes so far that if candidate A took significantly longer than B, but the time it took was more reliably predictable than the time that B took, then A would be my option of choice.
So my question is this: does anybody know of a benchmarking tool which can give me the desired information described in 1) and 2) above?
I should also mention that I have tried OpenSSL's "speed" command and found it unsatisfactory. So another question is: do any of you know of further parameters or tools that could help me achieve goal 2)? This would also be very welcome.
If you feel you cannot help me with the first two questions, the last question would be: how exactly is the information given back by "openssl speed rsa" to be read (what are the input sizes, for example)? An answer to this would help me at least achieve goal 1).
Thanks in advance.
Kris
tl;dr: Do you know of any kind of benchmark that gives me clear information on the performance of signature algorithms (more than just some general mean value, as, for example, "openssl speed" will give)?
If so, please tell me.
On a side note: please answer only if you have something to contribute to the questions as stated above. Mere recommendations of cryptographic algorithms or such are not really helpful to me.
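To illustrate the kind of numbers I am after for 1) and 2), here is a minimal sketch in Python (the cryptography package, RSA-2048 with PKCS#1 v1.5, the 1 KiB message, and the run count are my own assumptions; any library exposing the same primitives would do):

    import hmac, hashlib, os, time, statistics

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    def time_op(op, runs=1000):
        """Return (mean, stdev) of op's running time in microseconds."""
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            op()
            samples.append((time.perf_counter() - start) * 1e6)
        return statistics.mean(samples), statistics.stdev(samples)

    message = os.urandom(1024)   # fixed 1 KiB input for both algorithms
    hmac_key = os.urandom(32)
    rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    print("HMAC-SHA256: mean %.1f us, stdev %.1f us"
          % time_op(lambda: hmac.new(hmac_key, message, hashlib.sha256).digest()))
    print("RSA-2048:    mean %.1f us, stdev %.1f us"
          % time_op(lambda: rsa_key.sign(message, padding.PKCS1v15(), hashes.SHA256())))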

Where can I find a large sample of computer languages for naive Bayesian analysis

I am trying to analyse online code and want to use Bayesian classification. However, I need a fair amount of pre-classified code as sample data.
Maybe the twenty or so top languages?
Does anyone know of such a corpus?
There was a data set on Kaggle with questions from Stack Overflow where the objective was to guess the tags related to the question. That could require guessing the language of code samples (or just looking for keywords):
https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction
Another possibility is searching through GitHub, since all that code is free and open.
Stack Overflow itself shares a dump of all user-contributed posts (anonymized).
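Once you have such a corpus, the classifier itself is straightforward; a minimal sketch, assuming scikit-learn and a toy stand-in for the pre-classified snippets (real training data would be thousands of snippets per language):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy stand-in for the pre-classified corpus.
    snippets = [
        "def main():\n    print('hello')",
        '#include <stdio.h>\nint main(void) { printf("hello"); }',
        'public static void main(String[] args) { System.out.println("hello"); }',
    ]
    labels = ["python", "c", "java"]

    # A permissive token pattern so punctuation-heavy code tokens survive.
    clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"), MultinomialNB())
    clf.fit(snippets, labels)
    print(clf.predict(["int main(void) { return 0; }"]))  # ['c']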

Neural Networks Project? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I'm looking for ideas for a Neural Networks project that I could complete in about a month or so. I'm doing it for the National Science Fair, so I need something that has some curb appeal as well since it's being judged.
It doesn't necessarily have to be completely new and unique; I'm just looking for ideas, but it should be complex enough to impress someone who knows the field. My first idea was to implement a spam filter of sorts, but I recently found out that NNs aren't a very good way to do it. I've already got a basic NN simulator with genetic algorithms, and I'm also adding generic back-propagation.
Any ideas?
Look into Numenta's Hierarchical Temporal Memory (HTM) concept. This may be slightly off topic if the expectation is of "traditional" Neural Nets, but it is also an extremely promising avenue for Artificial Intelligence.
Although Numenta introduced HTM and its associated software platform, NuPIC, almost five years ago, the first commercial product based upon this technology was released (in beta) a few weeks ago by Vitamin D. It is called Vitamin D Video and essentially turns any webcam or IP camera into a sophisticated video monitoring system, recognizing classes of items (say persons vs. cats or other animals) in the video feed.
With the proper setup, this type of application could make for an interesting display at the Science Fair, one with much "curb appeal".
To whet your appetite, or even get your feet wet with HTM technology, you can download NuPIC and check its various sample applications. Chances are that you may find something that meets the typical science-fair criteria of both geekiness and coolness.
Generally, HTMs aim at solving problems which are simple for humans but difficult for computers; such a statement is somewhat generic and applicable to Neural Nets as well, but HTMs take this to the "next level".
Although written in C (I think), NuPIC is typically interfaced in Python, which makes it a convenient test bed for simple yet sophisticated proof-of-concept applications.
You could always try to play around with a neural network and stock prices; if I had a month of spare time for a neural network implementation, that's what I would play with.
A friend of mine in college wrote an NN to play Go on a 9x9 board.
I don't think it ever got very good, but I think it would be fun to try.
Look at how a bidirectional associative memory compares with classical edit-distance algorithms (Levenshtein, Damerau-Levenshtein, etc.) for typo correction; a baseline sketch follows below. Also consider the articles on Hebbian unlearning while training your NN; it seems that the confabulation phenomenon is avoided.
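For reference, here is a minimal version of the classical baseline mentioned above, the Levenshtein (edit) distance between two strings:

    def levenshtein(a: str, b: str) -> int:
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(
                    prev[j] + 1,               # deletion
                    cur[j - 1] + 1,            # insertion
                    prev[j - 1] + (ca != cb),  # substitution
                ))
            prev = cur
        return prev[-1]

    print(levenshtein("kitten", "sitting"))  # 3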
I've done some work on top of NNs, mainly an XML-based language (Neural XML). See details here:
http://amazedsaint.blogspot.com/search/label/Neural%20Network
Also, one interesting .NET neural network project is AForge.NET; check that out as well.
You could implement the game Cellz or create a controller for it. It was first created by Simon M. Lucas. It's a nice and interesting game, and I'm sure that everyone will love it. I used it for a school project, too, and it turned out quite well.
You can find on that page some links to other interesting games.
How about applying it to predicting exchange rates (USD-EUR, for example, for sub-minute trading)? It should be fun to show a net gain of money over one month.
I doubt this will work for trades longer than a minute... without a lot of extra work.
I like using committee machines, so why not apply one to face detection in images/movies, or to voiceprint authentication?
Finally, you could get it to play pleasing music and use a crowdsourced fitness function whereby people vote for the best "musicians".