Correctness testing for process modelling application - testing

Our group is building a process modelling application that simulates an industrial process. The final output of this process is a set of numbers representing chemistry and flow rates.
This application is based on some very old software that uses the exact same underlying mathematical model to create the simulation. Thousands of variables are involved in the simulation.
Although each component has been unit tested, we now need to be able to make sure that the data output produced by our software matches that of the old simulation software. I am wondering how best to approach this issue in a formalised and rigorous manner.
The old program takes its input from a text file, so I was thinking we could programmatically take each variable, adjust its value in the file (and correspondingly in our new application), then compare the outputs of the new and old applications. We would do this for every variable in the model.
We know the allowable range for each variable so I suppose a random sample across each variable of a few values is enough to show correctness for that particular variable.
Any thoughts on this approach? Any other ideas?

Comparing the output of the old and new applications is definitely a good idea. This is sometimes called back-to-back testing.
Regarding test input samples, get familiar with the following concepts:
Equivalence partitioning
Boundary-value analysis
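
A minimal sketch of how the two ideas could be combined, assuming both applications can be wrapped as Python callables that take a dict of inputs and return a list of numbers. The names `old_model`, `new_model`, `buggy_model`, `sample_values` and `back_to_back` are hypothetical, and the tolerance is an arbitrary placeholder. Each variable is tested at its range boundaries, at the midpoint, and at a few random interior points, one variable at a time as proposed in the question:

```python
import math
import random


def sample_values(low, high, n_random=3, seed=0):
    """Boundary values, the midpoint, and a few random interior points."""
    rng = random.Random(seed)
    interior = [rng.uniform(low, high) for _ in range(n_random)]
    return [low, high, (low + high) / 2] + interior


def back_to_back(old_model, new_model, baseline, ranges, rel_tol=1e-9):
    """Vary one variable at a time; return (name, value) pairs that disagree."""
    mismatches = []
    for name, (low, high) in ranges.items():
        for value in sample_values(low, high):
            inputs = dict(baseline, **{name: value})
            old_out = old_model(inputs)
            new_out = new_model(inputs)
            same = len(old_out) == len(new_out) and all(
                math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)
                for a, b in zip(old_out, new_out)
            )
            if not same:
                mismatches.append((name, value))
    return mismatches


# Hypothetical stand-ins for the old and new simulations.
def old_model(inputs):
    return [inputs["flow"] * inputs["conc"], inputs["flow"] + 1.0]


def new_model(inputs):
    return [inputs["conc"] * inputs["flow"], 1.0 + inputs["flow"]]


def buggy_model(inputs):
    # Deviates from old_model only near the upper end of the flow range.
    flow = inputs["flow"]
    return [flow * inputs["conc"], flow + (1.5 if flow > 90 else 1.0)]


baseline = {"flow": 10.0, "conc": 0.5}
ranges = {"flow": (0.0, 100.0), "conc": (0.0, 1.0)}
agreeing = back_to_back(old_model, new_model, baseline, ranges)
disagreeing = back_to_back(old_model, buggy_model, baseline, ranges)
```

In this toy case the upper boundary value of `flow` exposes the deviation regardless of where the random samples happen to fall, which is the motivation for boundary-value analysis. Varying one variable at a time will not reveal errors that only appear for combinations of values.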


Calculating Production chain using a database (factorio)

I'm playing the game Factorio, where you build a factory.
For the time being, I made a kind-of flowchart using libreoffice calc to calculate how many machines I need to produce a certain material.
Example image from the spreadsheet
Each block has a recipe saved (blue). This recipe includes what and how much it produces and needs and how much time it takes.
It takes the demand from the previous Block (yellow) and, using the recipe, calculates how many machines (green) it needs to fulfill this demand.
Based on the amount of machines it calculates its own demands (orange).
Then the following blocks do the same, until it has reached the last block.
Doing this in a spreadsheet does work, but it is quite a tedious task.
I showed this to my dad, as I'm quite proud of what I made, and he said that maybe a database would be more suitable.
I definitely see its advantages. For example I could easily summarize the final demands of raw resources, or the total power consumption, etc.
So I got myself Microsoft Access, and I'm pretty lost now. I know the basics of Databases and some SQL-Coding, but I'm not quite sure how I would make this.
My first attempt was:
one table for machines. It includes the machines production speed and other relevant stats.
one table for recipes. Each recipe clearly states what it produces, what it needs, the amount of each, and whether or not it is a basic. Basic means that it is a raw resource, i.e. the production chain would end with this.
one table for units. Each unit has a machine, a recipe and an amount. For example I would have one unit using basic assemblers to produce iron gears. This unit also says how many machines there are, so it needs more and produces more.
I did manage to make a query that calculates the total in and outputs of all units based on their machine and recipe, as well as a total energy consumption.
However, that is nowhere near the spreadsheet I made.
For now we can probably set the graphical overlay aside; that would probably be overkill. However, this is what I do want to be able to do:
enter how much I want of a certain resource
based on that entry the database would create a new table. The first entry would be the unit that produces the requested resource. The second would fulfill the first's demand, the third would fulfill the second's demand, and so on.
So in the end I would end up with a list of units that will produce my requested resource.
I hope someone can help me. There are programs out there that already do this kind of stuff, but I want to do this myself. If this is a problem that a database isn't suited for, then please tell me so.
Thanks for any help!
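
For reference, the calculation described above (walk from the requested resource down to the raw resources, accumulating demand along the way) can be sketched outside a database. This is only an illustration under assumptions: the recipes, the `MACHINE_SPEED` value and the function names are hypothetical, every unit uses the same machine type, and demand is propagated as exact rates rather than from the rounded-up machine count that the spreadsheet uses.

```python
import math
from collections import defaultdict

# Hypothetical recipes: item -> (seconds per craft, items per craft, ingredients).
# Anything without a recipe is treated as a raw ("basic") resource.
RECIPES = {
    "gear": (1.0, 1, {"iron_plate": 2}),
    "belt": (0.5, 2, {"gear": 1, "iron_plate": 1}),
}
MACHINE_SPEED = 0.5  # assumed crafting speed, identical for every machine


def accumulate_demand(item, rate, rates, raw):
    """Add `rate` items per second of `item` and recurse into its ingredients."""
    if item not in RECIPES:
        raw[item] += rate
        return
    rates[item] += rate
    _seconds, output, ingredients = RECIPES[item]
    crafts_per_second = rate / output
    for ingredient, amount in ingredients.items():
        accumulate_demand(ingredient, crafts_per_second * amount, rates, raw)


def plan(item, rate):
    """Return ({item: machine count}, {raw resource: items per second})."""
    rates, raw = defaultdict(float), defaultdict(float)
    accumulate_demand(item, rate, rates, raw)
    machines = {}
    for name, needed in rates.items():
        seconds, output, _ = RECIPES[name]
        per_machine = output * MACHINE_SPEED / seconds  # items/s of one machine
        # The small epsilon guards against floating-point noise before rounding up.
        machines[name] = math.ceil(needed / per_machine - 1e-9)
    return machines, dict(raw)


machines, raw = plan("belt", 2.0)  # request 2 belts per second
```

With these made-up numbers, `machines` lists one unit per intermediate product and `raw` gives the total raw resource demand, which corresponds to the list of units and the summary of final demands asked for above.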

Using multidimensional test data to optimize an output

I am looking for a machine learning strategy that will allow me to load in test data, along with a test outcome, and optimize for a particular scenario to adjust future testing parameters.
(See example in the edit)
Original example: For example, consider that I have a 3 dimensional space (environmental chamber) that I place a physical test device into. I will then select a number of locations and physical attributes with which to test the device. First I will select to test the device at every location configuration, across multiple temperatures, humidities, and pressures. At each test increment, or combination of variables, I log the value of each feature, e.g. x,y,z positional data, as well as the temperature, humidity, pressure, etc. After setting these parameters I will initiate a software test on the physical device that is affected by the environmental factors in ways too complex for me to predict. This software test can produce one of three outputs, each occurring with an unknown (until tested) probability that depends on the logged physical parameters. Of the three outputs, one is a failure, one is a success, and one is that the test finishes without any meaningful output (we can ignore this case in processing).
Once the test finishes testing every physical parameter I provide it, I would like to then run an algorithm on this test data to select the controllable parameters e.g. x,y,z positions, or temperature to maximize my chance of a successful test while also minimizing my chance at a failure (in some cases we find parameters that have a high chance at failure and a high chance at success, failures are more time expensive, thus we need to avoid them). The algorithm should use these to suggest an alternative set of test parameter ranges to initiate the next test iteration that it believes will get us closer to our goal.
Currently, I can maximize the success case by using an iterative decision tree and ignoring the results that end in failure.
Any ideas are appreciated
Another example (this is contrived, let's not get into the details of PRNGs) -- Let's say that I have an embedded device that has a hardware pseudo random number generator (PRNG) that is affected by environmental factors, such as heat and magnetometer data. I have a program on the device that uses this PRNG to give me a random value. Suppose that this PRNG just barely achieves sufficient randomization in the average case, in the best case gives me a good random value, and in the worst case fails to provide a random value. By changing the physical parameters of the environment around the device I can find values which with some probability cause this PRNG to fail to give me a random number, give me an 'ok' random number, or cause it to succeed in generating a cryptographically secure random number. Let's suppose that in cases in which it fails to generate a good enough random number the program enters a long running loop trying to find one before ultimately failing, which we would like to avoid as it is computationally expensive. To test this, we first start off by testing every combination of variables that we can control (temperature, position, etc.), perhaps by jumping several degrees at a time, to give a rough picture of what the device's response looks like over a large range. I would then like to run an algorithm on this test data, narrow my testing windows and iterate over the newly selected feature parameters to arrive at an optimized solution. In this case, since failures are expensive, the optimized solution would be something that minimizes failures, while simultaneously maximizing successes.
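
To make the goal concrete, here is a minimal sketch of one possible coarse-to-fine refinement step. It is a plain grid heuristic, not a machine learning method, and everything in it is an assumption for illustration: the function names, the `failure_weight` penalty, and the example data. Each tested parameter combination is scored as (successes minus weighted failures) divided by the number of conclusive runs, inconclusive runs are ignored, and the next iteration's ranges are centred on the best-scoring combination.

```python
from collections import defaultdict


def score_cells(results, failure_weight=2.0):
    """Score each parameter combination; failures count `failure_weight` times."""
    counts = defaultdict(lambda: {"success": 0, "failure": 0})
    for params, outcome in results:
        if outcome in ("success", "failure"):  # skip inconclusive runs
            counts[params][outcome] += 1
    scores = {}
    for params, c in counts.items():
        total = c["success"] + c["failure"]
        scores[params] = (c["success"] - failure_weight * c["failure"]) / total
    return scores


def narrow_ranges(results, step_sizes, failure_weight=2.0):
    """Return the best combination and a half-step window around it."""
    scores = score_cells(results, failure_weight)
    best = max(scores, key=scores.get)
    windows = [(v - s / 2, v + s / 2) for v, s in zip(best, step_sizes)]
    return best, windows


# Made-up log entries: ((temperature, x position), outcome)
results = [
    ((20.0, 0.0), "success"), ((20.0, 0.0), "success"), ((20.0, 0.0), "failure"),
    ((30.0, 0.0), "success"), ((30.0, 0.0), "success"), ((30.0, 0.0), "success"),
    ((40.0, 0.0), "success"), ((40.0, 0.0), "failure"), ((40.0, 0.0), "failure"),
    ((40.0, 0.0), "none"),
]
best, windows = narrow_ranges(results, step_sizes=(10.0, 1.0))
```

Raising `failure_weight` pushes the selection away from combinations that have both a high success rate and a high failure rate, which is the trade-off described above. A single best cell is sensitive to noise when each combination has only a few runs, so in practice the scores would need enough repetitions per combination to be meaningful.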

Unit testing strategy of a Mathematical system like Sage (and MACSYMA, Mathematica)

To test the correctness and performance of a mathematical system like Sage, do people use a standard test data set of math problems?
If so I'd appreciate a link or reference to the data set.
I have taken a look at some of the documents related to testing of Sage, such as Running Sage’s doctests.
I cannot answer regarding Mathematica or Macsyma (or Maple or ...), but both Sage and Maxima have unit tests that are indeed run with each micro-release; however, they are usually not a 'standard' set of problems in either case, though both have some subset thereof. Depending on the area, some may be part of a standard set - Sage tries to test as many of Wester's problems in calculus as it can, and Maxima does them in all sorts of areas. Some papers and books have full doctests built into Sage, e.g. the k-Schur function primer. But otherwise it is just a set of representative tests in both cases, e.g. Maxima Lambert W or Sage normal form games.
If any such data sets exist, it would be a very worthwhile contribution to turn them into a testing file for any given system - Sympy comes to mind, for instance, as another worthy target.

Run the same IPython notebook code on two different data files, and compare

Is there a good way to modularize and re-use code in IPython Notebook (Jupyter) when doing the same analysis on two different sets of data?
For example, I have a notebook with a lot of cells doing analysis on a data file. I have another data file of the same format, and I'd like to run the same analysis and compare the output. None of these options looks particularly appealing for this:
Copy and paste the cells to a second notebook. The analysis code is now duplicated and harder to update.
Move the analysis code into a module and run it for both files. This would lose the cell-by-cell format of the figures that are currently generated and simply jumble them all together in one massive cell.
Load both files in one notebook and run the analyses side by side. This also involves a lot of copy-and-pasting, and doesn't generalize well to 3 or 4 different data files.
Is there a better way to do this?
You could lace demo directives into the standalone module, as per the IPython Demo Mode example.
Then when actually executing it in the notebook, you make a call to the demo object wrapper each time you want to step to the next important part. So your cells would mostly consist of calls to that demo wrapper object.
Option 2 is clearly the best for code re-use; it is arguably the de facto standard in all of software engineering.
I argue that the notebook concept itself doesn't scale well to 3, 4, 5, ... different data files. Notebook presentations are not meant to be batch processing receptacles. If you find yourself needing to do parameter sweeps across different data sets, and wanting to re-run analyses on top of the different data loaded for each parameter group (even when the 'parameters' might be as simple as different file names) it raises a bad code smell. It likely means the level of analysis being performed in an 'interactive' way is wrong. Witnessing analysis 'interactively' and at the same time performing batch processing are two pretty much incompatible goals. A much better idea would be to batch process all of the parameter sets separately, 'offline' from the point of view of any presentation, and then build a set of stand-alone functions that can produce visual results from the computed and stored batch results. Then the notebook will just be a series of function calls, each of which produces summary data (some of which could be examples from a selection of parameter sets during batch processing) across all of the parameter sets at once to invite the necessary comparisons and meaningfully present the result data side-by-side.
'Witnessing' an entire interactive presentation that performs analysis on one parameter set, then changing some global variable / switching to a new notebook / running more cells in the same notebook in order to 'witness' the same presentation on a different parameter set sounds borderline useless to me, in the sense that I cannot imagine a situation where that mode of consuming the presentation is not strictly worse than consuming a targeted summary presentation that first computed results for all parameter sets of interest and assembled important results into a comparison.
Perhaps the only case I can think of would be toy pedagogical demos, like some toy frequency data and a series of notebooks that do some simple Fourier analysis or something. But that's exactly the kind of case that begs for the analysis functions to be made into a helper module, and the notebook itself just lets you selectively declare which toy input file you want to run the notebook on top of.
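
A minimal sketch of that structure, assuming each data file is a CSV with a numeric `value` column (the column name, function names and data are all hypothetical). The first three functions would live in a helper module; the notebook then reduces to a single call that processes every data set and returns results keyed by name for side-by-side comparison. In-memory `io.StringIO` objects stand in for real files here so the example is self-contained:

```python
import csv
import io
import statistics


def load_values(source):
    """Read the numeric 'value' column from an open CSV file object."""
    return [float(row["value"]) for row in csv.DictReader(source)]


def summarize(values):
    """The analysis step; a real one would compute far more than this."""
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}


def compare(named_sources):
    """Batch step: run the same analysis on every data set."""
    return {name: summarize(load_values(src)) for name, src in named_sources.items()}


# Notebook cell: one call covers any number of data sets.
results = compare({
    "run_a": io.StringIO("value\n1\n2\n3\n"),
    "run_b": io.StringIO("value\n2\n4\n6\n"),
})
```

Plotting functions can take the `results` dict in the same way, so adding a third or fourth data file means adding one entry rather than copying cells.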

Automatically create test cases for web page?

If someone has a webpage, the usual way of testing the web site for user interaction bugs is to create each test case by hand and use selenium.
Is there a tool to create these testcases automatically? So if I have a webpage that gets altered, new test cases get created automatically?
You can look at a paid product. That type of technology is not being developed as open source and will probably cost a bit. Some of the major test tools get closer to this, but full auto I have not heard of.
If this was the case the role of QA Engineer and especially Automation Engineer would not be as important and the jobs would spike downwards pretty quickly. I would imagine that if such a tool was out there that it would be breaking news to the entire industry and be world wide.
If you go down the artificial intelligence path this is possible in theory and concept; however, artificial intelligence development efforts usually cost more than the app being developed that needs the testing, so...that's not going to happen.
The best you can do at this point is to separate as much of the maintenance as possible into a single section, apart from the rest, so you limit the maintenance headache when modifying and keep a core that stays the same. I usually keep control manipulation generic, and let the workflows and the specific maps and data change. That will allow it to function against any website...but you still have to write/update the tests and maintain the maps.
I think Growing Test Cases Automatically is more of what you're asking. To be more specific, I'll try to introduce the basics, and if you're interested take a closer look at Evolutionary Testing.
Usually there is a standard set of constraints we meet like changing functionality of the system under test (SUT), limited timeframe, lack of appropriate test tools and the list goes on… Yet there is another type of challenge which arises as technological solutions progress further – increase of system complexity.
While the typical constraints are solvable through different technical and management approaches, in the case of system complexity we are facing the limit of our capability of defining a straightforward analytical method for assessing and validating system behavior. Complex systems consist of multiple, often heterogeneous components which, when working together, amplify each other’s statistical and behavioral deviations, resulting in a system which acts in ways that were not part of its initial design. To make matters worse, the same mechanism also makes complex systems more sensitive to their environment.
Options for testing complex systems
How can we test a system which behaves differently each time we run a test scenario? How can we reproduce a problem which costs days and millions to recover from, but happens only from time to time under conditions which are known just approximately?
One possible solution which I want to focus on is to embrace our lack of knowledge and work with whatever we have by using evolutionary testing. In this context the evolutionary testing can be viewed as a variant of black-box testing, because we are working with feeding input into and evaluating output from a SUT without focusing on its internal structure. The fine line here is that we are organizing this process of automatic test case generation and execution on a massive scale as an iterative optimization process which mimics the natural evolution.
Evolutionary testing
• Population – set of test case executions participating in the optimization process
• Generation – the part of the Population involved in a given iteration
• Individual – single test case execution and its results, an element from the Population
• Genome – unified definition of all test cases, model describing the Population
• Genotype – a single test case instance, a model describing an Individual, instance of the Genome
• Recombination – transformation of one or more Genotypes into a new Genotype
• Mutation – random change in a Genotype
• Fitness Function – formalized criterion, expressing the suitability of the Individual against the goal of the optimization
How do we create these elements?
• Definition of the experiment goal (selection criteria) – sets the direction of the optimization process and is related to the behavior of the SUT. Involves certain characteristics of SUT state or environment during the performed test case experiments. Examples:
o “SUT should complete the test case execution with an error code”
o “The test case should drive the SUT through the largest number of branches in SUT’s logical structure”
o “Ambient temperature in the room where SUT is situated should not exceed 40 ºC during test case execution”
o “CPU utilization on the system, where SUT runs should exceed 80% during test case execution”
Any measurable parameters of SUT and its environment could be used in a goal statement. Knowledge of the relation between the test input and the goal itself is not obligatory. This gives a possibility to cover goals which are derived directly from requirements, rather than based on some late requirement derivative like business, architectural or technical model.
• Definition of the relevant inputs and outputs of the tested system – identification of SUT inputs and outputs, as well as environment parameters, relevant to the experiment goal.
• Formal definition of the experiment genome – encoding the summarized set of test cases into a parameterized model (usually a data structure), expressing relevant SUT input data, environment parameters and action sequences. This definition also needs to comply with the two major operations applied over genome instances – recombination and mutation. The mechanism for those two operations can be predefined for the type of data or action present in the genome or have custom definitions.
• Formal definition of the selection criteria (fitness function) – an evaluation mechanism which takes SUT output or environment parameters resulting from a test case execution (Individual) and calculates a number (Fitness), signifying how close this particular Individual is to the experiment goal.
How does the process work?
1. We use the Genome to create a Generation of random Genotypes (test case instances).
2. We execute the test cases (Genotypes), generating results (Individuals).
3. We evaluate each execution result (Individual) against our goal using the Fitness Function.
4. We select only those Individuals from a given Generation which have Fitness above a given threshold (the top 10 %, above the average, etc.).
5. We use the selected Individuals to produce a new, full Generation by applying Recombination and Mutation.
6. We repeat the process, returning to step 2.
The iteration process usually stops by setting a condition with regard to the evaluated Fitness of a Generation. For example:
• If the top Fitness hasn’t changed by more than 0.1% since the last Iteration
• If the difference between the top and the bottom Fitness in a Generation is less than 0.3%
then probably it is time to stop.
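
The loop described above can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the genome is simply a list of floats within fixed bounds, `toy_sut` is a made-up stand-in whose return value is used directly as the Fitness, selection keeps the top 20 %, and a fixed number of generations replaces the convergence-based stopping conditions. The selected Individuals are also carried over unchanged into the next Generation, which is an addition to the steps above that keeps the top Fitness from ever decreasing.

```python
import random


def evolve(fitness, genome_length, low, high,
           pop_size=30, generations=40, seed=1):
    """Return the fittest genotype found after a fixed number of generations."""
    rng = random.Random(seed)
    # Create the first Generation of random Genotypes.
    population = [[rng.uniform(low, high) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Execute and evaluate every Genotype, then select the top 20 %.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 5)]
        # Build a full new Generation (parents are kept as-is).
        children = list(parents)
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length) if genome_length > 1 else 0
            child = a[:cut] + b[cut:]                      # Recombination
            i = rng.randrange(genome_length)               # Mutation of one gene
            nudged = child[i] + rng.gauss(0, 0.1 * (high - low))
            child[i] = min(high, max(low, nudged))
            children.append(child)
        population = children
    return max(population, key=fitness)


def toy_sut(inputs):
    """Hypothetical SUT measurement to maximize, peaking at x = 3, y = -1."""
    x, y = inputs
    return 100.0 - (x - 3.0) ** 2 - (y + 1.0) ** 2


start = evolve(toy_sut, genome_length=2, low=-10.0, high=10.0, generations=0)
best = evolve(toy_sut, genome_length=2, low=-10.0, high=10.0)
```

With a real SUT the `fitness` callable would execute the test case and measure the goal (error code, branch coverage, CPU utilization and so on), which is usually by far the most expensive part of each iteration.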
Upsides and downsides
• We can work with limited knowledge for the SUT and goal-oriented test definitions
• We use a test case model (Genome) which allows us to mass-produce a large number of test cases (Genotypes) with little effort
• We can “seed” test cases (Genotypes) in the first iteration instead of generating them at random in order to speed up the optimization process.
• We could run test cases in parallel in order to speed up the process
• We could find multiple solutions which meet our test goal
• If the optimization process is convergent, we have a guarantee that each following Generation is a better approximate solution of our test goal. This means that even if we need to stop before we have reached optimal Fitness we will still have better test cases than the ones we started with.
• We can achieve replay of very complex, hard to reproduce test scenarios which mimic real life and which are far beyond the reach of any other automated or manual testing technique.
• The process of defining the necessary elements for evolutionary test implementation is non-trivial and requires specific knowledge.
• Implementing such automation approach is time- and resource-consuming and should be employed only when it is justifiable.
• The convergence of the optimization process depends on the smoothness of the Fitness Function. If its definition results in zones of discontinuity or small/no gradient, then we can expect slow or no convergence.
I also recommend looking at Genetic algorithms; this article about Test data generation can give you approaches and guidelines.
I happen to develop ecFeed - an open-source tool that may assist in test design. It's in a pre-release phase and we are going to add better integration with Selenium, but you may have a look at the current snapshot: . The next version should arrive in October and will have major improvements in usability. Anyway, I am looking forward to constructive criticism.
In the Microsoft development world there is Visual Studio's Coded UI Test framework. This will record your actions in a web browser and generate test cases to replicate that use case. It won't update test cases when the code changes, though; you would need to update them manually or re-generate them.