Encog binary classification score for ROC

I am working on a binary classifier using Encog (via Java). I have it set up using an SVM or neural network, and I want to evaluate the quality of the different models using (in part) the area under the ROC curve.
More specifically, I would ideally like to convert the output of the model into some kind of prediction confidence score that can be used for rank ordering in the ROC, but I have yet to find anything in the documentation.
In the code, I get the model results with something like:
MLData result = ((MLRegression) method).compute( pair.getInput() );
String classification = normHelper.denormalizeOutputVectorToString( result )[0];
How do I also get a numerical confidence of the classification?

I have found a way to coax prediction probabilities out of an SVM inside the Encog framework. This method relies upon the equivalent of the -b option of libSVM (see http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html).
To do this, override the SVM class from Encog. The constructor enables probability estimates via the svm_parameter object (see below). Then, when doing the calculation, call the method svm_predict_probability as shown below.
Caveat: below is only a code fragment; to be useful you will probably need to write other constructors and pass the resulting probabilities out of the methods below. This fragment is based upon Encog version 3.3.0.
public class MySVMProbability extends SVM {

    public MySVMProbability(SVM method) {
        super(method.getInputCount(), method.getSVMType(), method.getKernelType());
        // Enable probability estimates
        getParams().probability = 1;
    }

    @Override
    public int classify(final MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final svm_node[] formattedInput = makeSparse(input);
        final double[] probs = new double[svm.svm_get_nr_class(getModel())];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        return (int) d;
    }

    @Override
    public MLData compute(MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final MLData result = new BasicMLData(1);
        final svm_node[] formattedInput = makeSparse(input);
        final double[] probs = new double[svm.svm_get_nr_class(getModel())];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        result.setData(0, d);
        return result;
    }
}
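To actually get a confidence score for rank ordering (e.g. for a ROC curve), you will need to pass probs[] out rather than just the predicted class. A minimal sketch of one way to do that is a helper method added inside MySVMProbability; the name predictProbabilities is my own and is not part of Encog's API, and it assumes the wrapper holds a trained model as discussed in the caveat above:
    // Hypothetical helper (not Encog API): returns the per-class probability
    // estimates so they can be used as confidence scores for a ROC curve.
    public double[] predictProbabilities(final MLData input) {
        final svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final svm_node[] formattedInput = makeSparse(input);
        final double[] probs = new double[svm.svm_get_nr_class(model)];
        svm.svm_predict_probability(model, formattedInput, probs);
        // probs[i] corresponds to the i-th class in the model's internal label order
        return probs;
    }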

Encog has no direct support for ROC curves. A ROC curve is more of a visualization than an actual model type, and model types are the primary focus of Encog.
Generating a ROC curve for SVMs and neural networks is somewhat different. For a neural network, you must establish thresholds for the classification neurons. There is a good paper about that here: http://www.lcc.uma.es/~jja/recidiva/048.pdf
I may eventually add direct support for ROC curves to Encog; they are becoming a very common visualization.
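Once you have a per-example confidence score (for instance a class probability from the SVM wrapper above, or the raw output of a classification neuron), the area under the ROC curve can be computed outside Encog by rank ordering the scores. Below is a minimal, self-contained sketch using the Mann-Whitney U formulation; the class and method names are my own and it is not part of Encog:
import java.util.Arrays;
import java.util.Comparator;

public final class RocAuc {
    /**
     * AUC via the Mann-Whitney U statistic: sort by score (ascending), sum the
     * ranks of the positive examples, and normalise. Ties get no special handling.
     */
    public static double auc(double[] scores, boolean[] isPositive) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> scores[i]));

        double positiveRankSum = 0;
        int numPositive = 0;
        int numNegative = 0;
        for (int rank = 1; rank <= order.length; rank++) {
            if (isPositive[order[rank - 1]]) {
                positiveRankSum += rank;
                numPositive++;
            } else {
                numNegative++;
            }
        }
        // U / (nPos * nNeg): probability that a random positive outranks a random negative
        return (positiveRankSum - numPositive * (numPositive + 1) / 2.0)
                / ((double) numPositive * numNegative);
    }
}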

Related

Fast evaluation of a decision forest

I have some decision trees (1000-3000) which need to be evaluated as fast as possible. They all access the same set of double values. There are no categorical values at all (so all values are just numerical).
What is the fastest way to do this? At the moment I generate some C-code at runtime and compile it with the heaviest optimizations for the local architecture. The generated code looks like this (similar, but much larger):
static inline double eval_tree0() {
    if (*(const double *)0x12345 < 1.2345) {
        if (*(const double *)0x4563456 < 2.2243) {
            return 1.2111;
        }
        else {
            return 5.2111;
        }
    }
    else {
        return 1.234;
    }
}

double eval() {
    return eval_tree0() + eval_tree1() + ...;
}
Is there something more performant? I was thinking about using AVX to evaluate multiple trees at once, but this seems tricky and I'm not sure whether the performance gain would be worth it.
Does anyone have an idea of the fastest possible way to evaluate a bunch of decision trees for a given input (batch size is 1)? Maybe even something with AVX?
Thanks

How can I converge the loss to a lower value? (TensorFlow)

I used the TensorFlow Object Detection API.
Here is my environment.
All images are from the COCO API
Tensorflow version : 1.13.1
Tensorboard version : 1.13.1
Number of test images : 3000
Number of train images : 24000
Pre-trained model : SSD mobilenet v2 quantized 300x300 coco
Number of detecting class : 1(person)
And here is my train_config.
train_config: {
  batch_size: 6
  optimizer {
    adam_optimizer: {
      learning_rate {
        exponential_decay_learning_rate: {
          initial_learning_rate: 0.000035
          decay_steps: 7
          decay_factor: 0.98
        }
      }
    }
  }
  fine_tune_checkpoint: "D:/TF/models/research/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
I can't find an optimized learning rate, or appropriate decay steps and decay factor.
So I have done many training runs, but the result is always similar.
How can I fix this?
I have already spent a week just on this problem.
In another post, someone recommended adding noise to the data set (the images).
But I don't know what that means.
How can I make that happen?
I think what was referenced in the other post was to do some data augmentation by adding noisy images to your training dataset. It means that you apply some random transformations to your input so that the model generalizes better.
A type of noise that can be used is random Gaussian noise (https://en.wikipedia.org/wiki/Gaussian_noise), which is applied per patch in the Object Detection API.
Although it seems that you have enough training images, it is worth a shot.
The noise option would look like this:
...
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
data_augmentation_options {
  random_patch_gaussian {
    # The patch size will be chosen to be in the range
    # [min_patch_size, max_patch_size).
    min_patch_size: 300
    max_patch_size: 300  # if you want the whole image to be noisy
  }
}
...
For the list of data augmentation options you can check:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto
Regarding the learning rate, one common strategy is to try a large learning rate (0.02 for instance) and a very small one, as you have already tried. I would recommend trying 0.02, leaving it for a while, or using the exponential decay learning rate, to see if the results are better; see the sketch below.
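This is only an illustrative snippet mirroring your existing config; the decay_steps and decay_factor values here are placeholders I chose, not recommendations from the Object Detection API:
optimizer {
  adam_optimizer: {
    learning_rate {
      exponential_decay_learning_rate: {
        initial_learning_rate: 0.02  # larger starting rate to try
        decay_steps: 2000            # illustrative value, tune for your dataset
        decay_factor: 0.95           # illustrative value
      }
    }
  }
}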
Changing the batch_size can also have some benefits; try batch_size = 2 instead of 6.
I would also recommend leaving the training running for more steps until you see no improvement at all, maybe up to the 200000 steps defined in your configuration.
Some deeper strategies can help the model perform better; they are described in this answer: https://stackoverflow.com/a/61699696/14203615
That being said, if your dataset is correctly made you should get good results on your test set.

How to detect objects in ML C# using a yolo v3 model (with two outputs)?

I have a tiny yolo v3 pre-trained model and I want to use it in C# in order to be able to detect objects.
I came across the following working sample code but the tutorial is made for a tiny yolo v2 model with the properties:
while my pre-trained model has the properties:
So there is incompatibility not only in the names, but also in the number of outputs and in the parameters for inputs / outputs. Since ML is not my area, I am having difficulties in migrating the code to support this new model that I have.
What I have done so far:
renaming all occurrences of the input param image to the value 000_net
renaming all occurrences of the output param grid to the value 016_convolutional
changing the values (this section)
from:
public const int ROW_COUNT = 13;
public const int COL_COUNT = 13;
public const int CHANNEL_COUNT = 125;
public const int BOXES_PER_CELL = 5;
public const int BOX_INFO_FEATURE_COUNT = 5;
public const int CLASS_COUNT = 20;
public const float CELL_WIDTH = 32;
public const float CELL_HEIGHT = 32;
into the following (based on the comments from the person who provided the model; note that 18 = BOXES_PER_CELL * (BOX_INFO_FEATURE_COUNT + CLASS_COUNT) = 3 * (5 + 1)):
public const int ROW_COUNT = 13;
public const int COL_COUNT = 13;
public const int CHANNEL_COUNT = 18;
public const int BOXES_PER_CELL = 3;
public const int BOX_INFO_FEATURE_COUNT = 5;
public const int CLASS_COUNT = 1;
public const float CELL_WIDTH = 32;
public const float CELL_HEIGHT = 32;
I have also replaced the class names with the one class that the model is trained for.
After all my changes, the application does not throw errors, but it shows warnings and the objects are not detected as they should be. Here is the output that I got:
The warnings say:
...
2020-08-25 14:46:40.5959296 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 022_convolutional_bn_bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-08-25 14:46:40.5970795 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 022_convolutional_bn_mean appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-08-25 14:46:40.5979695 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 022_convolutional_bn_var appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-08-25 14:46:40.5988356 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 022_convolutional_conv_weights appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-08-25 14:46:40.5996638 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 023_convolutional_conv_bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-08-25 14:46:40.6006995 [W:onnxruntime:, graph.cc:863 onnxruntime::Graph::Graph] Initializer 023_convolutional_conv_weights appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.

How can I merge complex shapes stored in an ArrayList with the Geomerative library?

I store shapes of this class:
class Berg {
  int vecPoint;
  float[] shapeX;
  float[] shapeY;

  Berg(float[] shapeX, float[] shapeY, int vecPoint) {
    this.shapeX = shapeX;
    this.shapeY = shapeY;
    this.vecPoint = vecPoint;
  }

  void display() {
    beginShape();
    curveVertex(shapeX[vecPoint-1], shapeY[vecPoint-1]);
    for (int i = 0; i < vecPoint; i++) {
      curveVertex(shapeX[i], shapeY[i]);
    }
    curveVertex(shapeX[0], shapeY[0]);
    curveVertex(shapeX[1], shapeY[1]);
    endShape();
  }
}
in an ArrayList with
shapeList.add(new Berg(xBig,yBig,points));
The shapes are defined with eight (curveVertex-)points (xBig and yBig) forming a shape around a randomly positioned center.
After checking whether the shapes intersect, I want to merge the shapes that overlap each other. I already have the intersection detection working but am struggling with the merging.
I read that the Geomerative library has a way to do something like that with union(), but RShapes are needed as parameters.
So my question is: how can I change my shapes into the required RShape type? Or, more generally (maybe I made some overall mistakes): how can I merge complex shapes stored in an ArrayList, with or without the Geomerative library?
Take a look at the API for RShape: http://www.ricardmarxer.com/geomerative/documentation/geomerative/RShape.html
That lists the constructors and methods you can use to create an RShape out of a series of points. It might look something like this:
class Berg {
  public RShape toRShape() {
    RShape rShape = new RShape();
    for (int i = 0; i < shapeX.length; i++) {
      rShape.addLineTo(shapeX[i], shapeY[i]);
    }
    return rShape;
  }
}
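From there, merging the overlapping shapes could look roughly like the sketch below. It assumes union() is available on RShape and takes another RShape, as the question suggests, and that Geomerative has been initialised (RG.init(this) in a Processing sketch); treat it as an outline rather than tested code:
import geomerative.*;

// Merge every Berg in shapeList into a single RShape, one union at a time.
RShape merged = shapeList.get(0).toRShape();
for (int i = 1; i < shapeList.size(); i++) {
  merged = merged.union(shapeList.get(i).toRShape());
}
merged.draw();  // draw the merged outline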

How do I use Strategy Pattern in this context?

Let me begin by saying I am a mathematician and not a coder. I am trying to code a linear solver. There are 10 methods which I coded. I want the user to choose which solver she wishes to use, like options.solver_choice='CG'.
Now, I have all 10 methods coded in a single class. How do I use the strategy pattern in this case?
Previously, I had 10 different function files which I called from the main program using a switch case:
if options.solver_choice=='CG'
CG(A,x,b);
if options.solver_choice=='GMRES'
GMRES(A,x,b);
.
.
.
This isn't the most exact of answers, but you should get the idea.
Using the strategy pattern, you would have a solver interface that implements a solver method:
public interface ISolver {
    int Solve();
}
You would implement each solver class as necessary:
public class Solver1 : ISolver {
    public int Solve() {
        return 1;
    }
}
You would then pass the appropriate solver class when it's time to do the solving:
public int DoSolve(ISolver solver) {
    return solver.Solve();
}

Foo.DoSolve(new Solver1());
TL;DR
As I've always understood the strategy pattern, the idea is basically that you perform composition of a class or object at run-time. The implementation details vary by language, but you should be able to swap out pieces of behavior by "plugging in" different modules that share an interface. Here I present an example in Ruby.
Ruby Example
Let's say you want to select a strategy for how the #action method will return a set of results. You might begin by composing some modules named CG and GMRES. For example:
module CG
  def action a, x, b
    { a: a, x: x, b: b }
  end
end

module GMRES
  def action a, x, b
    [a, x, b]
  end
end
You then instantiate your object:
class StrategyPattern
end
my_strategy = StrategyPattern.new
Finally, you extend your object with the plug-in behavior that you want. For example:
my_strategy.extend GMRES
my_strategy.action 'q', nil, 1
#=> ["q", nil, 1]
my_strategy.extend CG
my_strategy.action 'q', nil, 1
#=> {:a=>"q", :x=>nil, :b=>1}
Some may argue that the Strategy Pattern should be implemented at the class level rather than by extending an instance of a class, but this way strikes me as easier to follow and is less likely to screw up other instances that need to choose other strategies.
A more orthodox alternative would be to pass the name of the module to include into the class constructor. You might want to read Russ Olsen's Design Patterns in Ruby for a more thorough treatment and some additional ways to implement the pattern.
Other answers present the pattern correctly; however, I don't feel they are clear enough. Unfortunately the link I've provided does the same, so I'll try to demonstrate the spirit of Strategy, IMHO.
The main thing about Strategy is to have a general procedure with some of its details (behaviours) abstracted, allowing them to be changed transparently.
Consider a gradient descent optimization algorithm - basically, it consists of three actions:
gradient estimation
step
objective function evaluation
Usually one chooses which of these steps they need abstracted and configurable. In this example it seems that evaluation of the objective function is not something that you can do in more than one way - you always just ... evaluate the function.
So this introduces two different strategy (or policy) families:
interface GradientStrategy
{
    double[] CalculateGradient(Function objectiveFunction, double[] point);
}

interface StepStrategy
{
    double[] Step(double[] gradient, double[] point);
}
where of course Function is something like:
interface Function
{
    double Evaluate(double[] point);
}

interface FunctionWithDerivative : Function
{
    double[] EvaluateDerivative(double[] point);
}
Then, a solver using all these strategies would look like:
interface Solver
{
    double[] Maximize(Function objectiveFunction);
}

class GradientDescentSolver : Solver
{
    public GradientDescentSolver(GradientStrategy gs, StepStrategy ss)
    {
        this.gradientStrategy = gs;
        this.stepStrategy = ss;
    }

    public double[] Maximize(Function objectiveFunction)
    {
        // choosing the starting point could also be abstracted into a strategy
        double[] currentPoint = ChooseStartingPoint(objectiveFunction);
        double[] bestPoint = currentPoint;
        double bestValue = objectiveFunction.Evaluate(bestPoint);

        while (...) // the termination condition could also
                    // be abstracted into a strategy
        {
            double[] gradient = this.gradientStrategy.CalculateGradient(
                objectiveFunction,
                currentPoint);
            currentPoint = this.stepStrategy.Step(gradient, currentPoint);
            double currentValue = objectiveFunction.Evaluate(currentPoint);
            if (currentValue > bestValue)
            {
                bestValue = currentValue;
                bestPoint = currentPoint;
            }
            else
            {
                // terminate, or step back and reduce the step size, etc.
                // this could also be abstracted into a strategy
            }
        }
        return bestPoint;
    }

    private GradientStrategy gradientStrategy;
    private StepStrategy stepStrategy;
}
So the main point is that you have some algorithm's outline, and you delegate particular, general steps of this algorithm to strategies or policies. Now you could implement a GradientStrategy which works only for FunctionWithDerivative (casting down) and just uses the function's analytical derivative to obtain the gradient. Or you could have another one implementing a stochastic version of gradient estimation. Note that the main solver does not need to know how the gradient is being calculated; it just needs the gradient. The same thing goes for the StepStrategy - it can be a typical step policy with a single step size:
class SimpleStepStrategy : StepStrategy
{
    public SimpleStepStrategy(double stepSize)
    {
        this.stepSize = stepSize;
    }

    public double[] Step(double[] gradient, double[] point)
    {
        double[] result = new double[point.Length];
        for (int i = 0; i < result.Length; ++i)
        {
            result[i] = point[i] + this.stepSize * gradient[i];
        }
        return result;
    }

    private double stepSize;
}
Or it can be a more complicated algorithm that adjusts the step size as it goes.
Also think about the behaviours noted in the comments in the code: TerminationStrategy, DeteriorationPolicy.
The names are just examples - they're probably not the best, but I hope they convey the intent. Also, it's usually best to stick with one naming convention (Strategy or Policy).
PHP Examples
You'd define your strategies so that each implements a single method called solve():
class CG
{
    public function solve($a, $x, $y)
    {
        // ..implementation
    }
}

class GMRES
{
    public function solve($a, $x, $y)
    {
        // implementation..
    }
}
Usage:
$solver = new Solver();
$solver->setStrategy(new CG());
$solver->solve(1,2,3); // the result of CG
$solver->setStrategy(new GMRES());
$solver->solve(1,2,3); // the result of GMRES
class Solver
{
    private $strategy;

    public function setStrategy($strategy)
    {
        $this->strategy = $strategy;
    }

    public function solve($a, $x, $y)
    {
        return $this->strategy->solve($a, $x, $y);
    }
}