Using multidimensional test data to optimize an output - optimization

I am looking for a machine learning strategy that will allow me to load in test data, along with a test outcome, and optimize for a particular scenario to adjust future testing parameters.
(See example in the edit)
Original example: For example, consider that I have a 3 dimensional space (environmental chamber) that I place a physical test device into. I will then select a number of locations and physical attributes with which to test the device. First I will select to test the device at every location configuration, across multiple temperatures, humidities, and pressures. At each test increment, or combination of variables, I log the value of each feature, e.g. x,y,z positional data, as well as the temperature, humidity, pressure, etc.. after setting these parameters I will initiate a software test on the physical device that is affected by the environmental factors in ways too complex for me to predict. This software test can output three outputs that vary with an unknown (until tested) probability based on the logged physical parameters. Of the three outputs, one is a failure, one is a success, and one is that the test finishes without any meaningful output (we can ignore this case in processing).
Once the test finishes testing every physical parameter I provide it, I would like to then run an algorithm on this test data to select the controllable parameters e.g. x,y,z positions, or temperature to maximize my chance of a successful test while also minimizing my chance at a failure (in some cases we find parameters that have a high chance at failure and a high chance at success, failures are more time expensive, thus we need to avoid them). The algorithm should use these to suggest an alternative set of test parameter ranges to initiate the next test iteration that it believes will get us closer to our goal.
Currently, I can maximize the success case by using an iterative decision tree and ignoring the results that end in failure.
Any ideas are appreciated
Edit:
Another example (this is contrived, lets not get into the details of PRNGS)-- Lets say that I have an embedded device that has a hardware pseudo random number generator (PRNG) that is affected by environmental factors, such as heat and magnetometer data. I have a program on the device that uses this PRNG to give me a random value. Suppose that this PRNG just barely achieves sufficient randomization in the average case, in the best case gives me a good random value, and in the worst case fails to provide a random value. By changing the physical parameters of the environment around the device I can find values which with some probability cause this PRNG to fail to give me a random number, give me an 'ok' random number, and which cause it to succeed in generating a cryptographically secure random number. Lets suppose in cases in which it fails to generate a good enough random number the program enters a long running loop trying to find one before ultimately failing, which we would like to avoid as it is computationally expensive. To test this, we first start off by testing every combination of variables in which we can control (temperature, position, etc..) perhaps by jumping several degrees at a time, to give a rough picture on what the device's response looks like over a large range. I would like then to run an algorithm on this test data, narrow my testing windows and iterate over the newly selected feature parameters to arrive at an optimized solution. In this case, since failures are expensive, the optimized solution would be something that minimizes failures, while simultaneously maximizing successes.

Related

How do I handle variability of output in Anylogic?

I have been working on a simulation model for battery swapping in Anylogic. So far I have developed the simulation model, optimization experiment and parameters variation experiment.
There are no errors in the model but the output values are unsatisfactory. Small changes such as changing the step size of the decision variables results in a drastic change in the best value obtained after every experiment. Though the objective does not change much but I am concerned about the other variables that are changing with each run. Even with multiple optimization runs it is difficult to come to a conclusion.
For reference I am posting an output of parameters variation experiment here. I ran the experiment with an optimized value but I was getting feasible results (percentile > 95%) far off the expected input values. Although, the overall result is correct (decreasing percentile with increasing charging time) but it is difficult to understand the variability.
Can anyone help?enter image description here
When building a model, this is a common problem you will have when looking at high level overall outputs. You could have a model bug, but it is just as likely (if not more likely) that there is some dynamic to your system that was not clear in simple Excel spreadsheets or mental models. The DES may be telling us something truly interesting about the system behavior, but without additional outputs, there is no way to understand what that is.
A few suggestions:
Run this as a simple single scenario, where you manually update inputs. When you run this with the low range of input values and then the high range of input values, what do you see on the animation or additional outputs that is different than you expected or could explain the overall output trend? Try running several intermediate points.
Add additional output metrics. If you look at queue sizes, resource utilizations, turn-around-times, etc; do you see anything at that level that is different than expected?
Add a "replication" log. When you run a set of inputs for multiple scenarios, does any single replication stand out as an outlier? If so, re-run the scenario with that set of inputs and that random seed.
There is no substitute for understanding underlying system behavior, and without understanding those dynamics, looking at overall correlation with optimization or parameter variation experiments will often lead companies to make the wrong policies decisions.

relation between random testing and fuzz testing

I am wondering what the relation between random and fuzz testing is. I understand that random testing has been there for a longer time, but I cannot see any differences between them. They both seem to make use of random input to see if the program goes into an unexpected state (i.e. crash). Is the main difference difference that fuzz testing is automated?
Random(ized) testing has no intention of crashing a system. You can randomize valid values. The goals could be to increase coverage or to find out new/unexpected information about the system (possibly bugs, but could be simply unknown behaviour).
Fuzz(y) testing is about sending complete rubbish (e.g. could be random bytes instead of HTTP request) into the system and see whether it can handle it gracefully (not crash or hang). The data is not necessarily random - it's just meaningless to the software.

Automatically create test cases for web page?

If someone has a webpage, the usual way of testing the web site for user interaction bugs is to create each test case by hand and use selenium.
Is there a tool to create these testcases automatically? So if I have a webpage that gets altered, new test cases get created automatically?
You can look at a paid product. That type of technology is not being developed as open source and will probably cost a bit. Some of the major test tools get closer to this, but full auto I have not heard of.
If this was the case the role of QA Engineer and especially Automation Engineer would not be as important and the jobs would spike downwards pretty quickly. I would imagine that if such a tool was out there that it would be breaking news to the entire industry and be world wide.
If you go down the artificial intelligence path this is possible in theory and concept, however, usually artificial intelligence development efforts costs more than the app being developed that needs the testing, so...that's not going to happen.
The best to do at this point is separate out as much of the maintenance into a single section from the rest so you limit the maintenance headache when modyfying and keep a core that stays the same. I usually focus on control manipulation as generic and then workflow and specific maps and data change. That will allow it to function against any website...but you still have to write/update the tests and maintain the maps.
I think Growing Test Cases Automatically is more of what your asking. To be more specific I'll try to introduce basics and if you're interested take a closer look at Evolutionary Testing
Usually there is a standard set of constraints we meet like changing functionality of the system under test (SUT), limited timeframe, lack of appropriate test tools and the list goes on… Yet there is another type of challenge which arises as technological solutions progress further – increase of system complexity.
While the typical constraints are solvable through different technical and management approaches, in the case of system complexity we are facing the limit of our capability of defining a straight-forward analytical method for assessing and validating system behavior. Complex system consist of multiple, often heterogeneous components which when working together amplify each other’s statistical and behavioral deviations, resulting in a system which acts in ways that were not part of its initial design. To make matter worse, complex systems increase sensitivity to their environment as well with the help of the same mechanism.
Options for testing complex systems
How can we test a system which behaves differently each time we run a test scenario? How can we reproduce a problem which costs days and millions to recover from, but happens only from time to time under conditions which are known just approximately?
One possible solution which I want to focus on is to embrace our lack of knowledge and work with whatever we have by using evolutionary testing. In this context the evolutionary testing can be viewed as a variant of black-box testing, because we are working with feeding input into and evaluating output from a SUT without focusing on its internal structure. The fine line here is that we are organizing this process of automatic test case generation and execution on a massive scale as an iterative optimization process which mimics the natural evolution.
Evolutionary testing
Elements:
• Population – set of test case executions, participating into the optimization process
• Generation – the part of the Population, involved into given iteration
• Individual – single test case execution and its results, an element from the Population
• Genome – unified definition of all test cases, model describing the Population
• Genotype – a single test case instance, a model describing an Individual, instance of the Genome
• Recombination – transformation of one or more Genotypes into a new Genotype
• Mutation – random change in a Genotype
• Fitness Function – formalized criterion, expressing the suitability of the Individual against the goal of the optimization
How we create these elements?
• Definition of the experiment goal (selection criteria) – sets the direction of the optimization process and is related to the behavior of the SUT. Involves certain characteristics of SUT state or environment during the performed test case experiments. Examples:
o “SUT should complete the test case execution with an error code”
o “The test case should drive the SUT through the largest number of branches in SUT’s logical structure”
o “Ambient temperature in the room where SUT is situated should not exceed 40 ºC during test case execution”
o “CPU utilization on the system, where SUT runs should exceed 80% during test case execution”
Any measurable parameters of SUT and its environment could be used in a goal statement. Knowledge of the relation between the test input and the goal itself is not obligatory. This gives a possibility to cover goals which are derived directly from requirements, rather than based on some late requirement derivative like business, architectural or technical model.
• Definition of the relevant inputs and outputs of the tested system – identification of SUT inputs and outputs, as well as environment parameters, relevant to the experiment goal.
• Formal definition of the experiment genome – encoding the summarized set of test cases into a parameterized model (usually a data structure), expressing relevant SUT input data, environment parameters and action sequences. This definition also needs to comply with the two major operations applied over genome instances – recombination and mutation. The mechanism for those two operations can be predefined for the type of data or action present in the genome or have custom definitions
• Formal definition of the selection criteria (fitness function) – an evaluation mechanism which takes SUT output or environment parameters resulting from a test case execution (Individual) and calculates a number (Fitness), signifying how close is this particular Individual to the experiment goal.
How the process works?
We use the Genome to create a Generation of random Genotypes (test case instances).
We execute the test cases (Genotypes) generating results (Individuals)
We evaluate each execution result (Individual) against our goal using the Fitness Function
We select only those Individuals from given Generation which have Fitness above a given threshold (the top 10 %, above the average, etc.)
We use the selected individuals to produce a new, full Generation set by applying Recombination and Mutation
We repeat the process, returning on step 2
The iteration process usually stops by setting a condition with regard to the evaluated Fitness of a Generation. For example:
• If the top Fitness hasn’t changed with more than 0.1% since the last Iteration
• If the difference between the top and the bottom Fitness in a Generation is less than 0.3%
then probably it is time to stop.
Upsides and downsides
Upsides:
• We can work with limited knowledge for the SUT and goal-oriented test definitions
• We use a test case model (Genome) which allows us to mass-produce a large number of test cases (Genotypes) with little effort
• We can “seed” test cases (Genotypes) in the first iteration instead of generating them at random in order to speed up the optimization process.
• We could run test cases in parallel in order to speed up the process
• We could find multiple solutions which meet our test goal
• If the optimization process in convergent we have a guarantee that each following Generation is a better approximate solution of our test goal. This means that even if we need to stop before we have reached optimal Fitness we will still have better test cases than the one we started with.
• We can achieve replay of very complex, hard to reproduce test scenarios which mimic real life and which are far beyond the reach of any other automated or manual testing technique.
Downsides:
• The process of defining the necessary elements for evolutionary test implementation is non-trivial and requires specific knowledge.
• Implementing such automation approach is time- and resource-consuming and should be employed only when it is justifiable.
• The convergence of the optimization process depends on the smoothness of the Fitness Function. If its definition results in a zones of discontinuity or small/no gradient then we can expect slow or no convergence
Update:
I also recommend you to look at Genetic algorithms and this article about Test data generation can give you approaches and guidelines.
I happen to develop ecFeed - an open-source tool that may assist in test design. It's in pre-release phase and we are going to add better integration with Selenium, but you may have a look at the current snapshot: https://github.com/testify-no/ecFeed/wiki . The next version should arrive in October and will have major improvements in usability. Anyway, I am looking forward for constructive criticism.
In the Microsoft development world there is Visual Studio's Coded UI Test framework. This will record your actions in a web browser and generate test cases to replicate that use case. It won't update test cases with any changes to code though, you would need to update them manually or re-generate.

Algorithmically suggest best node to perform demanding computation

At work we perform demanding numerical computations.
We have a network of several Linux boxes with different processing capabilities. At any given time, there can be anywhere from zero to dozens of people connected to a given box.
I created a script to measure the MFLOPS (Million of Floating Point Operations per Second) using the Linpack Benchmark; it also provides number of cores and memory.
I would like to use this information together with the load average (obtained using the uptime command) to suggest the best computer for performing a demanding computation. In other words, its 3:00pm; I have a meeting in two hours; I need to run a demanding process: what node will get me the answer fastest?
I envision a script which will output a suggestion along the lines of:
SUGGESTED HOSTS (IN ORDER OF PREFERENCE)
HOST1.MYNETWORK
HOST2.MYNETWORK
HOST3.MYNETWORK
Such suggestion should favor fast computers (high MFLOPS) if the load average is low and, as load average increases for a given node, it should favor available nodes instead (i.e., I'd rather run in a slower computer with no users than in an eight-core with forty dudes logged in).
How should I prioritize? What algorithm (rationale) would you use? Again, what I have is:
Load Average (1min, 5min, 15min)
MFLOPS measure
Number of users logged in
RAM (installed and available)
Number of cores (important to normalize the load average)
Any thoughts? Thanks!
You don't have enough data to make an well-informed decision. It sounds as though the scheduling is very volatile: "At any given time, there can be anywhere from zero to dozens of people connected to a given box." So the current load does not necessarily reflect the future load of the machines.
To properly asses what hosts someone should use to minimize computation time would require knowing when the current jobs will terminate. If a powerful machine is about to be done doing most of its jobs, it would be a good candidate even though it currently has a high load.
If you want to guess purely on the current situation, you can do a weighed calculation to find out which hosts have the most MFLOPS available.
MFLOPS available = host's MFLOPS + (number of logical processors - load average)
Sort the hosts by MFLOPS available and suggest them in a descending order.
This formula assumes that the MFLOPS of a host is linearly related to its load average. This might not be exactly true, but it's probably fairly close.
I would favor the most recent load average since it's closer to the current/future situation, whereas, jobs from 15 minutes ago might have completed by now.
Have you considered a distributed approach to computation? Not all computations can be broken up such that more than one cpu can work on them. But perhaps your problem space can benefit from some parallelization. Have a look at Hadoop.
You don't need to know FLOPS. beowulf modules paralell computing center has I go to has the script for sure
PDC operates leading-edge, high-performance computers on a national level. PDC offers easily accessible computational resources that primarily cater to the ...

Correctness testing for process modelling application

Our group is building a process modelling application that simulates an industrial process. The final output of this process is a set of number representing chemistry and flow rates.
This application is based on some very old software that uses the exact same underlying mathematical model to create the simulation. Thousands of variables are involved in the simulation.
Although each component has been unit tested, we now need to be able to make sure that the data output produced by our software matches that of the old simulation software. I am wondering how best to approach this issue in a formalised and rigorous manner.
The old program works by specifying the input via a text file, so I was thinking we could programatically take each variable, adjust its value in the file (and correspondingly in our new application), then compare the outputs between the new and old application. We do this for every variable in the model.
We know the allowable range for each variable so I suppose a random sample across each variable of a few values is enough to show correctness for that particular variable.
Any thoughts on this approach? Any other ideas?
The comparison of output of the old and new applications id definitely good idea. This is sometime called back-to-back testing.
Regarding test input samples - get familiarized with following concepts:
Equivalence partitioning
Boundary-value analysis