Looking for a simulated annealing implementation in VB - vb.net

Is anyone aware of a reasonably well documented example of simulated annealing in Visual Basic that I can examine and adapt?

This project looks pretty well documented: http://www.codeproject.com/KB/recipes/simulatedAnnealingTSP.aspx. It's C#, but it contains only one important source file (TravellingSalesmanProblem.cs), so it's pretty easy to run through a converter. Maybe this one: http://labs.developerfusion.co.uk/convert/csharp-to-vb.aspx?
MSDN Magazine also had an interesting article on neural networks. As I understand simulated annealing, you can combine it with other function-estimation methods (like neural nets), so you could add simulated annealing to the MSDN VB code by shrinking the momentum over time: the network starts 'hot' by backpropagating error with a large momentum and slowly 'cools' as the momentum shrinks, reducing the effect of output error in backpropagation.
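For reference, the core annealing loop itself is small. Here is a minimal sketch in Python (not taken from either article; the `neighbour`/`energy` names and cooling parameters are just placeholders), which should be straightforward to port to VB.NET:

```python
import math
import random

def anneal(initial_state, neighbour, energy, t_start=10.0, t_end=1e-3, cooling=0.95):
    """Generic simulated annealing loop (illustrative only).

    neighbour(state) -> a randomly perturbed copy of state
    energy(state)    -> the cost we want to minimise
    """
    state, e = initial_state, energy(initial_state)
    best, best_e = state, e
    t = t_start
    while t > t_end:
        candidate = neighbour(state)
        e_new = energy(candidate)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if e_new < e or random.random() < math.exp((e - e_new) / t):
            state, e = candidate, e_new
            if e < best_e:
                best, best_e = state, e
        t *= cooling  # geometric cooling schedule
    return best, best_e

# Toy usage: minimise f(x) = (x - 3)^2 over the reals
best_x, best_e = anneal(
    0.0,
    neighbour=lambda x: x + random.uniform(-1, 1),
    energy=lambda x: (x - 3) ** 2,
)
print(best_x, best_e)
```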
Cheers.

I generally refer to "Numerical Recipes in C/C++" for all the pseudocode and adapt it to my own code later. That is the best documentation/implementation you could find. Sometimes you can even find better algorithms or an alternative way of solving the problem (in case Newton-Raphson is not the way to go).

Related

Is TensorFlow the way to go for this optimization problem?

I have to optimize the result of a process that depends on a large number of variables, i.e. a laser engraving system where the engraving depth depends on the laser speed, distance, power and so on.
The final objective is the minimization of the engraving time, or the maximization of the laser speed. All the other parameters can vary, but must stay within safe bounds.
I have never used any machine learning tools, but to my very limited knowledge this seems like a good use case for TensorFlow or any other machine learning library.
I would experimentally gather data points to train the algorithm, test it and then use a gradient descent optimizer to find the parameters (within bounds) that maximize the laser travel velocity.
Does this sound feasible? How would you approach such a problem? Can you link to any examples available online?
Thank you,
Riccardo
I'm not quite sure I understood the problem correctly; could you add some example data and the desired output?
As far as I understand, it could be feasible to use TensorFlow, but I believe there are better solutions to this problem. Let me expand on this.
TensorFlow is a framework focused on the development of deep learning models. These usually require lots of data (how much really depends on the problem), and I don't believe manually gathering that data yourself would be enough unless your team is quite big or you already have some data collected.
Also, since you have a minimization (or maximization) problem with variables that lie within known ranges, I think this is a case for Operations Research (OR) optimization rather than machine learning. Check this example of OR.
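As a minimal sketch of what that could look like, assuming you first fit a simple empirical model of engraving depth from your measurements (the depth model, bounds, target depth, and starting point below are all made-up placeholders, not real laser parameters):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical empirical model: engraving depth as a function of
# laser speed and power, fitted to your own measurements beforehand.
def depth(speed, power):
    return 0.8 * power / speed  # illustrative placeholder only

target_depth = 0.5  # mm, assumed requirement

# Maximise speed  <=>  minimise -speed
def objective(x):
    speed, power = x
    return -speed

constraints = [{"type": "ineq", "fun": lambda x: depth(x[0], x[1]) - target_depth}]
bounds = [(10.0, 500.0),   # safe speed range (mm/s), assumed
          (1.0, 30.0)]     # safe power range (W), assumed

result = minimize(objective, x0=[50.0, 10.0], bounds=bounds,
                  constraints=constraints, method="SLSQP")
print(result.x)  # speed and power that maximise speed while meeting the depth target
```

If the fitted relationships turn out to be linear, a plain linear program (e.g. scipy.optimize.linprog or any LP/MILP solver) would do the same job.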

Limitations of optimisation software such as CPLEX

Which of the following optimisation methods can't be done in an optimisation software such as CPLEX? Why not?
Dynamic programming
Integer programming
Combinatorial optimisation
Nonlinear programming
Graph theory
Precedence diagram method
Simulation
Queueing theory
Can anyone point me in the right direction? I didn't find too much information regarding the limitations of CPLEX on the IBM website.
Thank you!
That's kind of a big shopping list, and most of the things on it are not optimisation methods.
For sure CPLEX does integer programming, non-linear programming (just quadratic, SOCP, and similar, but not general non-linear) and combinatorial optimisation out of the box.
It is usually possible to re-cast things like DP as MILP models, but that will obviously require a bit of work. Lots of MILP models are also based on graphs, so yes, it is certainly possible to solve a lot of graph problems using a MILP solver such as CPLEX.
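For instance, here is a toy sketch of recasting a classic DP exercise (0/1 knapsack) as a MILP. It uses the open-source PuLP modeller just to keep the example self-contained; the same model could be handed to CPLEX (e.g. via its Python API or an LP file). All the data values are made up.

```python
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, PULP_CBC_CMD

# Toy data: item values, weights, and knapsack capacity (illustrative only)
values   = [10, 13, 7, 8]
weights  = [4, 6, 3, 5]
capacity = 10

prob = LpProblem("knapsack_as_milp", LpMaximize)
x = [LpVariable(f"x{i}", cat="Binary") for i in range(len(values))]

prob += lpSum(v * xi for v, xi in zip(values, x))               # objective
prob += lpSum(w * xi for w, xi in zip(weights, x)) <= capacity  # capacity constraint

prob.solve(PULP_CBC_CMD(msg=False))
print([xi.value() for xi in x])  # which items to take
```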
Looking more widely at topics like simulation: that is quite a different approach. Simulation really is NOT an optimisation method, but it can be used alongside optimisation to get extra insights which may be useful in a business context. It might be used, for example, to discover some empirical relationships that could then be used in an optimisation model solved by CPLEX.
The same can probably also be said for things like queueing theory, precedence, etc. Basically, use CPLEX as an optimisation tool to solve part or all of your problem once you have structured and analysed it via one of these other approaches.
Hope that helps.

Idea behind xgboost/lightgbm/catboost in comparison

I'm trying to decide which one of the following I will use in practice for regression tasks: xgboost, lightgbm, or catboost (Python 3).
So, what is the general idea behind each of them? Why should I choose one over another?
I'm not interested in a very slight difference in accuracy, like 0.781 vs 0.782. The results should hold up, and the tool should be robust and convenient to use. A workhorse.
As I understand these methods, the main difference is in how they are implemented; otherwise they all implement GBM (gradient boosting machine) methods.
So you should just try some hyperparameter tuning.
Also, it's a good idea to read this article:
catboost-vs-light-gbm-vs-xgboost
You cannot determine a priori which tree algorithm (or any algorithm) will automatically be the best; this is a consequence of the no-free-lunch theorem: https://en.wikipedia.org/wiki/No_free_lunch_theorem
It's best to try them all out. You should also throw in Random Forest (RF) as another one to try.
I will say that http://CatBoost.ai (CB) does have one advantage over the others: if you have categorical variables, CB will most likely beat the others because it can handle categorical variables directly, without one-hot encoding.
You might try http://H2O.ai 's grid search, which supports several algorithms (RF, XGBoost, GBM, Linear Regression) with hyperparameter tuning, to see which one works best. You can run this overnight. (CB is not included in H2O's grid search.)
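If it helps, here is a minimal sketch of "try them all": cross-validating the three boosters plus RF on the same data with near-default settings. The dataset and parameters are placeholders; on data with categorical features you would additionally pass cat_features=[...] to CatBoostRegressor to exploit the advantage mentioned above.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

X, y = fetch_california_housing(return_X_y=True)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, n_jobs=-1),
    "xgboost":       XGBRegressor(n_estimators=200),
    "lightgbm":      LGBMRegressor(n_estimators=200),
    "catboost":      CatBoostRegressor(n_estimators=200, verbose=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:14s} mean R^2 = {scores.mean():.3f}")
```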

How do you find the most discriminant terms in binary document classification?

I want to use feature selection to find the terms in a document that are most useful for a binary classification task.
I've been looking around:
This mentions Mutual Information and the chi-squared test metric
http://nlp.stanford.edu/IR-book/html/htmledition/feature-selection-1.html
MATLAB has a number of functions as well:
http://www.mathworks.com/help/toolbox/stats/brj0qbu.html
Feature Selection in MATLAB
Of the above, relieff and rankfeatures look promising.
I do not know if my data follows a normal distribution. Any thoughts on which technique performs the best? Are there any newer methods you would suggest? The focus is to increase classification accuracy.
Thank you!
Since the answer is highly dependent on the nature of your data, I'd suggest playing with several options, possibly using a hold-out set for verification.
The easiest path would probably be to use Weka or RapidMiner for experimenting. Choosing from the plethora of options provided by them, you'll probably get acquainted with several other methods.
Having said that, I have found Mutual Information/Infogain to be useful on a large variety of problems.
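As a concrete starting point, here is a small sketch using scikit-learn's chi-squared and mutual-information scorers on a two-class text problem (the dataset and vectorizer settings are only illustrative). Note that neither method assumes normally distributed data, which addresses your concern about the distribution.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2, mutual_info_classif

# Two classes only, to mirror the binary classification setting
data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space"],
                          remove=("headers", "footers", "quotes"))

vec = CountVectorizer(stop_words="english", min_df=5)
X = vec.fit_transform(data.data)
y = data.target
terms = np.array(vec.get_feature_names_out())

# Chi-squared scores (fast, works directly on sparse counts)
chi2_scores, _ = chi2(X, y)
print("Top chi2 terms:", terms[np.argsort(chi2_scores)[-15:][::-1]])

# Mutual information (slower, no distributional assumptions)
mi = mutual_info_classif(X, y, discrete_features=True)
print("Top MI terms:  ", terms[np.argsort(mi)[-15:][::-1]])
```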

What's the difference between code written for a desktop machine and a supercomputer?

Hypothetically speaking, if my scientific work was leading toward the development of functions/modules/subroutines (on a desktop), what would I need to know to incorporate it into a large-scale simulation to be run on a supercomputer (which might simulate molecules, fluids, reactions, and so on)?
My impression is that it has to do with taking advantage of certain libraries (e.g., BLAS, LAPACK) where possible, revising algorithms (reducing iteration), profiling, parallelizing, and considering memory/disk/processor use and access patterns. I am aware of the adage, "want to optimize your code? don't do it", but if one were interested in learning about writing efficient code, what references might be available?
I think this question is language agnostic, but since many number-crunching packages for biomolecular simulation, climate modeling, etc. are written in some version of Fortran, this language would probably be my target of interest (and I have programmed rather extensively in Fortran 77).
Profiling is a must at any level of machinery. In common usage, I've found that scaling to larger and larger grids requires a better understanding of the grid software and the topology of the grid. In that sense, everything you learn about optimizing for one machine is still applicable, but understanding the grid software gets you additional mileage. Hadoop is one of the most popular and widespread grid systems, so learning about the scheduler options, interfaces (APIs and web interfaces), and other aspects of usage will help. Although you may not use Hadoop for a given supercomputer, it is one of the less painful methods for learning about distributed computing. For parallel computing, you may pursue MPI and other systems.
Additionally, learning to parallelize code on a single machine, across multiple cores or processors, is something you can begin learning on a desktop machine.
Recommendations:
Learn to optimize code on a single machine:
Learn profiling
Learn to use optimized libraries (after profiling: so that you see the speedup)
Be sure you know algorithms and data structures very well (*)
Learn to do embarrassingly parallel programming on multi-core machines (see the sketch after this list).
Later: consider multithreaded programming. It's harder and may not pay off for your problem.
Learn about basic grid software for distributed processing
Learn about tools for parallel processing on a grid
Learn to program for alternative hardware, e.g. GPUs, various specialized computing systems.
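As a minimal illustration of the embarrassingly parallel step (Python here purely for brevity; the Fortran analogue would be an OpenMP parallel loop or simply independent batch jobs submitted to the scheduler), with a placeholder worker function:

```python
from multiprocessing import Pool

def run_case(params):
    """One independent simulation / parameter set (placeholder work)."""
    x = params
    return x * x  # stand-in for the real computation

if __name__ == "__main__":
    cases = range(1000)           # independent inputs, no shared state
    with Pool() as pool:          # one worker process per core by default
        results = pool.map(run_case, cases)
    print(sum(results))
```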
This is language agnostic. I have had to learn the same sequence in multiple languages and multiple HPC systems. At each step, take a simpler route to learn some of the infrastructure and tools; e.g. learn multicore before multithreaded, distributed before parallel, so that you can see what fits for the hardware and problem, and what doesn't.
Some of the steps may be reordered depending on local computing practices, established codebases, and mentors. If you have a large GPU or MPI library in place, then, by all means, learn that rather than foist Hadoop onto your collaborators.
(*) The reason to know algorithms very well is that as soon as your code is running on a grid, others will see it. When it is hogging up the system, they will want to know what you're doing. If you are running a process that is polynomial and should be constant, you may find yourself mocked. Others with more domain expertise may help you find good approximations for NP-hard problems, but you should know that the concept exists.
Parallelization would be the key.
Since the problems you cited (e.g. CFD, multiphysics, mass transfer) are generally expressed as large-scale linear algebra problems, you need matrix routines that parallelize well. MPI is a standard for those types of problems.
Physics can influence this as well. For example, it's possible to solve some elliptic problems efficiently using explicit dynamics and artificial mass and damping matrices.
3D multiphysics means coupled differential equations with varying time scales. You'll want a fine mesh to resolve details in both space and time, so the number of degrees of freedom will rise rapidly; time steps will be governed by the stability requirements of your problem.
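To make that last point concrete (my illustration, not part of the original point): for an explicit scheme on an advection-dominated problem, stability typically imposes a CFL-type limit of roughly Δt ≤ C·Δx/u for characteristic speed u and mesh spacing Δx, so refining the mesh in space also forces smaller time steps; implicit schemes relax that limit at a higher cost per step.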
If someone ever figures out how to run linear algebra as a map-reduce problem they'll have it knocked.
Hypothetically speaking, if my scientific work was leading toward the development of functions/modules/subroutines (on a desktop), what would I need to know to incorporate it into a large-scale simulation to be run on a supercomputer (which might simulate molecules, fluids, reactions, and so on)?
First, you would need to understand the problem. Not all problems can be solved in parallel (and I'm using the term "parallel" in as wide a meaning as it can get). So, see how the problem is solved now. Can it be solved more quickly with some other method? Can it be divided into independent parts? ... and so on ...
Fortran is the language specialized for scientific computing, and in recent years, along with the development of new language features, there has also been some very interesting development aimed at this "market". The term "coarrays" could be an interesting read.
But for now, I would suggest first reading a book like Using OpenMP - OpenMP is a simpler model, but the book (with Fortran examples inside) explains the fundamentals nicely. The Message Passing Interface (MPI, to its friends :) is a larger model, and one of the most often used; your next step after OpenMP should probably go in this direction. Books on MPI programming are not rare.
You also mentioned libraries - yes, some of those you mentioned are widely used, and others are also available. A person who does not know exactly where the performance problem lies should, IMHO, never undertake rewriting library routines.
There are also books on parallel algorithms that you might want to check out.
I think this question is language agnostic, but since many number-crunching packages for biomolecular simulation, climate modeling, etc. are written in some version of Fortran, this language would probably be my target of interest (and I have programmed rather extensively in Fortran 77).
In short, it comes down to understanding the problem, learning where the performance problem lies, re-solving the whole problem with a different approach, and iterating a few times; by that time you'll already know what you're doing and where you're stuck.
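Since MPI will almost certainly come up, here is a minimal sketch using the Python bindings (mpi4py), just to show the model of ranks and collectives; the same concepts map directly onto the Fortran MPI API.

```python
# Run with e.g.: mpiexec -n 4 python partial_sum.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank computes a partial sum over its own slice of the data ...
n = 1_000_000
local = np.arange(rank, n, size, dtype=np.float64)
local_sum = local.sum()

# ... and the root rank reduces the partial results into a total.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("sum =", total)
```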
We're in a position similar to yours.
I'm most in agreement with #Iterator's answer, but I think there's more to say.
First of all, I believe in "profiling" by the random-pausing method, because I'm not really interested in measuring things (it's easy enough to do that) but in pinpointing the code that is causing wasted time, so I can fix it. It's like the difference between a floodlight and a laser.
For one example, we use LAPACK and BLAS. Now, in taking my stack samples, I saw that a lot of the samples were in the routine that compares characters. This was called from a general routine that multiplies and scales matrices, which in turn was called from our code. The matrix-manipulating routine, in order to be flexible, has character arguments that tell it things such as whether a matrix is lower-triangular. In fact, if the matrices are not very large, the routine can spend more than 50% of its time just classifying the problem. Of course, the next time it is called from the same place, it does the same thing all over again. In a case like that, a special routine should be written. When it is optimized by the compiler, it will be as fast as it reasonably can be, and it will save all that classifying time.
For another example, we use a variety of ODE solvers. These are optimized to the nth degree, of course. They work by calling user-provided routines to calculate derivatives and possibly a Jacobian matrix. If those user-provided routines don't actually do much, samples will indeed show the program counter in the ODE solver itself. However, if the user-provided routines do much more, samples will mostly find the lower end of the stack in those routines, because they take longer, while the ODE code takes roughly the same time. So optimization should be concentrated in the user-provided routines, not the ODE code.
Once you've done several of the kind of optimization that is pinpointed by stack sampling, which can speed things up by 1-2 orders of magnitude, then by all means exploit parallelism, MPI, etc. if the problem allows it.