Test case specifically constructed to make algorithm fail - testing

This is a question on nomenclature. In complexity theory of algorithms, there is a name for a test case constructed specifically to make the algorithm fail. I had that word in my head, but can't recall it for the life of me now. I'm hoping someone here can help. What is the word for an input that is incredibly unlikely to occur in the real world but someone who knows the algorithm can construct to make it fail or perform badly?

You may be thinking of an adversary:
The idea is that an all-powerful malicious adversary pretends to choose an input for the algorithm. When the algorithm wants looks at a bit, the adversary sets that bit to whatever value will make the algorithm do the most work.

The word I was looking for was "pathological case". When reading about the Cauchy distribution (https://en.wikipedia.org/wiki/Cauchy_distribution), I chanced upon it again seeing it described as a "pathological distribution".

Related

Information about CGAL and alternatives

I'm working on a problem that will eventually run in an embedded microcontroller (ESP8266). I need to perform some fairly simple operations on linear equations. I don't need much, but do need to be able work with points and linear equations to:
Define an equations for lines either from two known points, or one
point and a gradient
Calculate a new x,y point on an equation line that is a specific distance from another point on that equation line
Drop a perpendicular onto an equation line from a point
Perform variations of cosine-rule calculations on points and triangle sides defined as equations
I've roughed up some code for this a while ago based on high school "y = mx + c" concepts, but it's flawed (it fails with infinities when lines are vertical), and currently in Scala. Since I suspect I'm reinventing a wheel that's not my primary goal, I'd like to use someone else's work for this!
I've come across CGAL, and it seems very likely it's capable of all this and more, but I have two questions about it (given that it seems to take ages to get enough understanding of this kind of huge library to actually be able to answer simple questions!)
It seems to assert some kind of mathematical perfection in it's calculations, but that's not important to me, and my system will be severely memory constrained. Does it use/offer memory efficient approximations?
Is it possible (and hopefully easy) to separate out just a limited subset of features, or am I going to find the entire library (or even a very large subset) heading into my memory limited machine?
And, I suppose the inevitable follow up: are there more suitable libraries I'm unaware of?
TIA!
The problems that you are mentioning sound fairly simple indeed, so I'm wondering if you really need any library at all. Maybe if you post your original code we could help you fix it--your problem sounds like you need to redo a calculation avoiding a division by zero.
As for your point (2) about separating a limited number of features from CGAL, giving the size and the coding style of that project, from my experience that will be significantly more complicated (if at all possible) than fixing your own code.
In case you want to try a simpler library than CGAL, maybe you could try Boost.Geometry
Regards,

How can I benchmark signature algorithms (HMAC vs RSA) and compare them well?

I would like to kind of re-ask a question that was asked here two years ago (Benchmarking symmetric and asymmetric cryptography) but, as I find, was not satisfyingly answered.
1) I too would really like to back up the notion that RSA-like asymmetric cryptography is much more expensive than for example performing an HMAC operation with hard numbers. These numbers should be informative with regard to comparability of algorithms.
2) Moreover, I would be interested, in addition to mere mean values of speed, also in information about standard derivation/variance of the measured operation costs. This is because in the protocol in question, predictability of the operation time is actually an issue. This goes so far that if candidate A took significantly longer than B, but the time it took was more reliably predicabtle than the time that B took, then A would be my option of choice.
So my question is this: does anybody know of a Benchmarking tool which can give me the desired information 1) and 2) described above?
I should also mention that I have tried the "speed" command of OpenSSL, and found it unsatisfactory. So another question is: do any of you know of any further parameters or tools for that which could help me achieve my goal 2)? This would also be very welcome.
If you feel you can not help me with the first two questions, the last question would be: how is the information given back by "openssl speed rsa" to be read exactly (what are the input sizes, for example 0.o)? An answer to this would help me at least achieve goal 1).
Thanks in advance.
Kris
tl;dr: Do you know of any kind of Benchmark that gives me clear information on the performance of signature algorithms (more than just some general mean value, as for example "openssl speed" will give)?
If so, please tell me.
On a side note: please answer only if you have something to contribute to the questions as stated above. Mere recommendations of cryptographic algorithms or such are not really helpful to me.

Finding best path trought strongly connected component

I have a directed graph which is strongly connected and every node have some price(plus or negative). I would like to find best (highest score) path from node A to node B. My solution is some kind of brutal force so it takes ages to find that path. Is any algorithm for this or any idea how can I do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not to difficult to implement, but there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms who can outperform it, but they usually require graph pre-processing. If the problem is not to complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User #templatetypedef is right when he says the Bellman-Ford algorithm may not work in here.
The B-F works with graphs where there are edges with negative weight. However, the algorithm stops upon finding a negative cycle. I believe that is a useful behavior. Optimizing the shortest path in a graph that contains a cycle of negative weights will be like going down a Penrose staircase.
What should happen if there's the possibility of reaching a path with "minus infinity cost" depends on the problem.

Looking for ideas/references/keywords: adaptive-parameter-control of a search algorithm (online-learning)

I'm looking for ideas/experiences/references/keywords regarding an adaptive-parameter-control of search algorithm parameters (online-learning) in combinatorial-optimization.
A bit more detail:
I have a framework, which is responsible for optimizing a hard combinatorial-optimization-problem. This is done with the help of some "small heuristics" which are used in an iterative manner (large-neighborhood-search; ruin-and-recreate-approach). Every algorithm of these "small heuristics" is taking some external parameters, which are controlling the heuristic-logic in some extent (at the moment: just random values; some kind of noise; diversify the search).
Now i want to have a control-framework for choosing these parameters in a convergence-improving way, as general as possible, so that later additions of new heuristics are possible without changing the parameter-control.
There are at least two general decisions to make:
A: Choose the algorithm-pair (one destroy- and one rebuild-algorithm) which is used in the next iteration.
B: Choose the random parameters of the algorithms.
The only feedback is an evaluation-function of the new-found-solution. That leads me to the topic of reinforcement-learning. Is that the right direction?
Not really a learning-like-behavior, but the simplistic ideas at the moment are:
A: A roulette-wheel-selection according to some performance-value collected during the iterations (near past is more valued than older ones).
So if heuristic 1 did find all the new global best solutions -> high probability of choosing this one.
B: No idea yet. Maybe it's possible to use some non-uniform random values in the range (0,1) and i'm collecting some momentum of the changes.
So if heuristic 1 last time used alpha = 0.3 and found no new best solution, then used 0.6 and found a new best solution -> there is a momentum towards 1
-> next random value is likely to be bigger than 0.3. Possible problems: oscillation!
Things to remark:
- The parameters needed for good convergence of one specific algorithm can change dramatically -> maybe more diversify-operations needed at the beginning, more intensify-operations needed at the end.
- There is a possibility of good synergistic-effects in a specific pair of destroy-/rebuild-algorithm (sometimes called: coupled neighborhoods). How would one recognize something like that? Is that still in the reinforcement-learning-area?
- The different algorithms are controlled by a different number of parameters (some taking 1, some taking 3).
Any ideas, experiences, references (papers), keywords (ml-topics)?
If there are ideas regarding the decision of (b) in a offline-learning-manner. Don't hesitate to mention that.
Thanks for all your input.
Sascha
You have a set of parameter variables which you use to control your set of algorithms. Selection of your algorithms is just another variable.
One approach you might like to consider is to evolve your 'parameter space' using a genetic algorithm. In short, GA uses an analogue of the processes of natural selection to successively breed ever better solutions.
You will need to develop an encoding scheme to represent your parameter space as a string, and then create a large population of candidate solutions as your starting generation. The genetic algorithm itself takes the fittest solutions in your set and then applies various genetic operators to them (mutation, reproduction etc.) to breed a better set which then become the next generation.
The most difficult part of this process is developing an appropriate fitness function: something to quantitatively measure the quality of a given parameter space. Your search problem may be too complex to measure for each candidate in the population, so you will need a proxy model function which might be as hard to develop as the ideal solution itself.
Without understanding more of what you've written it's hard to see whether this approach is viable or not. GA is usually well suited to multi-variable optimisation problems like this, but it's not a silver bullet. For a reference start with Wikipedia.
This sounds like hyper heuristics which you're trying to do. Try looking for that keyword.
In Drools Planner (open source, java) I have support for tabu search and simulated annealing out the box.
I haven't implemented the ruin-and-recreate-approach (yet), but that should be easy, although I am not expecting better results. Challenge: Prove me wrong and fork it and add it and beat me in the examples.
Hyper heuristics are on my TODO list.

How to test numerical analysis routines?

Are there any good online resources for how to create, maintain and think about writing test routines for numerical analysis code?
One of the limitations I can see for something like testing matrix multiplication is that the obvious tests (like having one matrix being the identity) may not fully test the functionality of the code.
Also, there is the fact that you are usually dealing with large data structures as well. Does anyone have some good ideas about ways to approach this, or have pointers to good places to look?
It sounds as if you need to think about testing in at least two different ways:
Some numerical methods allow for some meta-thinking. For example, invertible operations allow you to set up test cases to see if the result is within acceptable error bounds of the original. For example, matrix M-inverse times the matrix M * random vector V should result in V again, to within some acceptable measure of error.
Obviously, this example exercises matrix inverse, matrix multiplication and matrix-vector multiplication. I like chains like these because you can generate quite a lot of random test cases and get statistical coverage that would be a slog to have to write by hand. They don't exercise single operations in isolation, though.
Some numerical methods have a closed-form expression of their error. If you can set up a situation with a known solution, you can then compare the difference between the solution and the calculated result, looking for a difference that exceeds these known bounds.
Fundamentally, this question illustrates the problem that testing complex methods well requires quite a lot of domain knowledge. Specific references would require a little more specific information about what you're testing. I'd definitely recommend that you at least have Steve Yegge's recommended book list on hand.
If you're going to be doing matrix calculations, use LAPACK. This is very well-tested code. Very smart people have been working on it for decades. They've thought deeply about issues that the uninitiated would never think about.
In general, I'd recommend two kinds of testing: systematic and random. By systematic I mean exploring edge cases etc. It helps if you can read the source code. Often algorithms have branch points: calculate this way for numbers in this range, this other way for numbers in another range, etc. Test values close to the branch points on either side because that's where approximation error is often greatest.
Random input values are important too. If you rationally pick all the test cases, you may systematically avoid something that you don't realize is a problem. Sometimes you can make good use of random input values even if you don't have the exact values to test against. For example, if you have code to calculate a function and its inverse, you can generate 1000 random values and see whether applying the function and its inverse put you back close to where you started.
Check out a book by David Gries called The Science of Programming. It's about proving the correctness of programs. If you want to be sure that your programs are correct (to the point of proving their correctness), this book is a good place to start.
Probably not exactly what you're looking for, but it's the computer science answer to a software engineering question.