Is including system test cases in the final packaged product of your application contributing to bloat or increasing risk?

I am packaging up an RPM which has a %post (post-install) scriptlet that detects certain conditions and runs a suite of unit, function, and system tests. I am getting some push-back that it exposes some of the internal structure, since the tests use some of the same environment variables the code itself uses for diagnostics. Thoughts?
UPDATE: I am not planning on running the tests automatically or exposing their existence to the end users. I am proposing that the testing package simply be available on any machine where the suite lands. It adds roughly 3% to the final size of the package and requires an obscene amount of internal knowledge to execute properly.
The program itself is a library which others may use, exposed through an API. The internal knowledge of how things function is not at issue. My main motivation is the lack of suitable test resources and the large variability in the target environment. Some of the tests are really simple (similar to what configure might do to determine that all the right features are available from the compiler). Other tests are more involved and prove the basic functions the library should provide.

If you want to avoid the complaint that the tests run on every install, at least use RPM's %check section, which runs them at build time rather than on the end user's machine.
Sounds like people are concerned about "reverse engineering". So the software is proprietary? This would seem to be the crux of your problem. Regardless, it's common for the test suite to be separate from the packaged software.
However, you're not being unrealistic: Allowing users to run tests themselves on their systems and give you the results is a great aspect of a collaborative relationship with users. Unfortunately, you're running up against the proprietary business model.
Perhaps you can compromise by trimming down or rewriting the tests and diagnostics so they prove an adequate amount of fitness without revealing too much. I wouldn't throw out the tests and diagnostics you've written so far, though.
You really should make the argument that users will be pleased and have more confidence in a software package shipped with a thorough testing system, and that these outweigh any fears of revealing the software's internals.

Related

Usage of STAF/STAX in Automation

I have been exploring STAF/STAX for the past week.
It is used for test automation execution and is somewhat similar to Hudson.
I would like to know which types of tests it can be used for, e.g. functional tests, load tests, etc. Our functional automation tests depend on their own framework for how they run and how pass/fail status is reported. How can we integrate such tests with a test automation framework like STAF?
I've been using STAF/STAX for over 4 years.
PROs:
Open Source
Cross-platform
Concurrent execution
Extensible (i.e. you can write your own services)
Decent support from IBM through the STAF website
CONs:
Sometimes buggy
Difficult to diagnose problems
Programming STAX scripts is awkward and ugly (i.e. scripting via XML tags and embedded jython)
I've found that STAF/STAX is useful for systems test. It enables you, for example, to launch a server on one system and a client on another, then test their interaction. It's also helpful if you need to test cross-platform, or for multiple language bindings. I also like the fact that it can be used both in large, networked systems, as well as on an individual's desktop.
On the other hand, I would probably avoid using it for unit testing, or tests that are relatively simple and can be run on a single system. I'd probably use a language specific unit framework for that.
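For illustration only, here is a minimal sketch of that launch-a-server, run-a-client pattern in plain Python (not the STAF API); it runs both processes on one machine, and the script names and port are hypothetical:
```python
# Illustration only (plain Python, not the STAF API): launch a server, run a
# client against it, and check the result. The script names and port are
# hypothetical; STAF would let you do the same across several machines.
import subprocess
import time

def run_system_test():
    # Start the server under test in the background.
    server = subprocess.Popen(["python", "server_under_test.py", "--port", "9000"])
    try:
        time.sleep(2)  # crude readiness wait; a real harness would poll

        # Drive the system like a real client and capture its output.
        client = subprocess.run(
            ["python", "test_client.py", "--connect", "localhost:9000"],
            capture_output=True, text=True, timeout=60,
        )

        # The test passes if the client exits cleanly and reports success.
        assert client.returncode == 0, client.stderr
        assert "OK" in client.stdout
    finally:
        server.terminate()
        server.wait()

if __name__ == "__main__":
    run_system_test()
    print("system test passed")
```
The point is only the shape of a systems test: set up a known state, drive the system the way a real client would, and assert on the observable effects.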
STAF is not comparable to Hudson.
When I look at something like Hudson/Jenkins or Buildbot, I see a GUI with an emphasis on scheduling and on viewing what's going on, what was done, and how it went.
STAF, on the other hand, is more like the plumbing for a QA framework over a distributed environment. It helps with launching processes, collecting logs, locking resources, etc.

What things to test when re-compiling software for another operating system?

Our software vendor is currently working on a project to migrate our enterprise scale laboratory system from Tru64 unix to Red Hat.
This obviously means recompiling with a new compiler and performing lots of testing.
While the vendor will do their own testing, we also need to do acceptance testing.
We don't entirely trust that the vendor will be as thorough with their testing as we hope.
So I have been tasked to think of things that will need to be tested.
This is a laboratory system, so things such as calculations and rounding (and general maths) need to be tested.
But I thought I would ask the SO community for advice on what to test, or for past experiences with this sort of thing.
You will need to test everything. Whatever you tested in your original environment, you will need to test in your new environment.
Eventually, you'll gain confidence that most of your tests will simply never fail in the new environment. There will surely be a set of tests that will always succeed, as long as the old and new environments are Unix-based systems. That's fine - that's a set of tests you won't need to run constantly. I'd still keep those tests around to run once per release of the new OS or per release of your product, however, just to be safe.
Check that it works on 32- and 64-bit CPUs, that it handles spaces in filenames, and that users don't need admin rights to run it or change its configuration.
One unix to another isn't a huge leap.
If you can come up with a suite of regression tests, you can use those scenarios via an automated tool against the original and ported systems to make sure they match. The QA and UAT tests that you currently run against the system would probably be a good starting point, and then you could add any critical edge cases (such as those that test the math in detail) as needed. Paul's suggestion above about compiler issues would also allow derivation of some good edge cases; I'd recommend looking at that scope from the perspective of both the Tru64 and RHEL compilers.
A fair amount of my recent experience is with JMeter, which has a number of assertions, pre-conditions, and post-conditions that can be evaluated to ensure compliance. A number of the tools in this space would also allow you to do load testing, if appropriate.
If your system doesn't have a remotely accessible interface (like a web-based or socket-based interface), you could potentially do the same thing with scripted tools.
Thirteen or fourteen years ago, I couldn't move an Informix database from SCO OpenServer to Linux because SCO used 16-bit inode numbers, Linux used 32-bit inode numbers, and Linux's 'personalities' support was nowhere near as advanced as it is today. I can appreciate your skepticism.
If you can re-run old experiments with saved data and saved outcomes, that would be my preferred place to start. Even for simple data types, the precision or range of operations may differ between compilers and platforms, so small differences in output are likely and exact matches may not be realistic. The results should still be close enough not to influence the larger 'outcomes' of your testing runs.
Rather than searching for new test cases, test the system against what you've already done with it. (As an aside, that's also a good way to build test cases for software development.)
Also watch for differences in precision between the standard math library functions; they are not the same on different systems. If you need consistent calculations, you will need to replace them. Look into crlibm and/or fdlibm.
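A minimal sketch of the re-run-and-compare approach suggested above, assuming the saved outcomes can be exported as a CSV of test IDs and numeric values (the file names, column layout, and tolerances here are assumptions):
```python
# Compare results computed on the ported (Red Hat) build against baselines
# saved from the Tru64 system, allowing a small relative tolerance so that
# benign floating-point differences don't count as failures.
# The CSV layout, file names, and tolerance values are assumptions.
import csv
import math

def compare_against_baseline(baseline_csv, new_csv, rel_tol=1e-9, abs_tol=1e-12):
    def load(path):
        with open(path, newline="") as f:
            return {row["test_id"]: float(row["value"]) for row in csv.DictReader(f)}

    baseline = load(baseline_csv)
    new = load(new_csv)
    failures = []
    for test_id, expected in baseline.items():
        actual = new.get(test_id)
        if actual is None:
            failures.append((test_id, "missing on new platform"))
        elif not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((test_id, f"expected {expected!r}, got {actual!r}"))
    return failures

if __name__ == "__main__":
    for test_id, reason in compare_against_baseline("tru64_baseline.csv", "rhel_results.csv"):
        print(f"FAIL {test_id}: {reason}")
```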

What kinds of tests are there?

I've always worked alone, and my method of testing is usually to compile very often, make sure the changes I made work well, and fix them if they don't. However, I'm starting to feel that that is not enough, and I'm curious about the standard kinds of tests there are.
Can someone please tell me about the basic tests, a simple example of each, and why it is used/what it tests?
Thanks.
Different people have slightly different ideas about what constitutes what kind of test, but here are a few ideas of what I happen to think each term means. Note that this is heavily biased towards server-side coding, as that's what I tend to do :)
Unit test
A unit test should only test one logical unit of code - typically one class for the whole test case, and a small number of methods within each test. Unit tests are (ideally) small and cheap to run. Interactions with dependencies are usually isolated with a test double such as a mock, fake or stub.
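For instance, a minimal unit test in Python's unittest, where a hypothetical OrderService is tested in isolation by replacing its payment-gateway dependency with a mock:
```python
# A minimal unit test: one logical unit (OrderService), with its dependency
# (a payment gateway) replaced by a mock. The class names are hypothetical.
import unittest
from unittest.mock import Mock

class OrderService:
    def __init__(self, gateway):
        self.gateway = gateway              # dependency is injected, not created here

    def place_order(self, amount):
        return self.gateway.charge(amount)  # delegate the actual charging

class OrderServiceTest(unittest.TestCase):
    def test_place_order_charges_gateway(self):
        gateway = Mock()
        gateway.charge.return_value = "receipt-1"

        service = OrderService(gateway)
        result = service.place_order(42)

        self.assertEqual(result, "receipt-1")
        gateway.charge.assert_called_once_with(42)

if __name__ == "__main__":
    unittest.main()
```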
Integration test
An integration test will test how different components work together. External services (ones not part of the project scope) may still be faked out to give more control, but all the components within the project itself should be the real thing. An integration test may test the whole system or some subset.
System test
A system test is like an integration test but with real external services as well. If this is automated, typically the system is set up into a known state, and then the test client runs independently, making requests (or whatever) like a real client would, and observing the effects. The external services may be production ones, or ones set up in just a test environment.
Probing test
This is like a system test, but using the production services for everything. These run periodically to keep track of the health of your system.
Acceptance test
This is probably the least well-defined term - at least in my mind; it can vary significantly. It will typically be fairly high level, like a system test or an integration test. Acceptance tests may be specified by an external entity (a standard specification or a customer).
Black box or white box?
Tests can also be "black box" tests, which only ever touch the public API, or "white box" tests which take advantage of some extra knowledge to make testing easier. For example, in a white box test you may know that a particular internal method is used by all the public API methods, but is easier to test. You can test lots of corner cases by calling that method directly, and then do fewer tests with the public API. Of course, if you're designing the public API you should probably design it to be easily testable to start with - but it doesn't always work out that way. Often it's nice to be able to test one small aspect in isolation of the rest of the class.
On the other hand, black box testing is generally less brittle than white box testing: by definition, if you're only testing what the API guarantees in its contracts, then the implementation can change as much as it wants without the tests changing. White box tests, on the other hand, are sensitive to implementation changes: if the internal method changes subtly - or gains an extra parameter, for example - then you'll need to change the tests to reflect that.
It all boils down to balance, in the end - the higher the level of the test, the more likely it is to be black box. Unit tests, on the other hand, may well include an element of white box testing... at least in my experience. There are plenty of people who would refuse to use white box testing at all, only ever testing the public API. That feels more dogmatic than pragmatic to me, but I can see the benefits too.
Starting out
Now, as for where you should go next - unit testing is probably the best thing to start with. You may choose to write the tests before you've designed your class (test-driven development) or at roughly the same time, or even months afterwards (not ideal, but there's a lot of code which doesn't have tests but should). You'll find that some of your code is more amenable to testing than others... the two crucial concepts which make testing feasible (IMO) are dependency injection (coding to interfaces and providing dependencies to your class rather than letting them instantiate those dependencies themselves) and test doubles (e.g. mocking frameworks which let you test interaction, or fake implementations which do everything a simple way in memory).
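To make those two ideas concrete, here is a small hypothetical sketch: the class receives its repository through the constructor (dependency injection), so the test can hand it a simple in-memory fake instead of a real database:
```python
# Dependency injection plus a fake: UserRegistration receives its repository
# rather than constructing a database connection itself, so a test can hand it
# an in-memory stand-in. All names here are hypothetical.
class InMemoryUserRepository:
    def __init__(self):
        self.users = {}

    def save(self, username, email):
        self.users[username] = email

    def exists(self, username):
        return username in self.users

class UserRegistration:
    def __init__(self, repository):
        self.repository = repository    # injected dependency

    def register(self, username, email):
        if self.repository.exists(username):
            raise ValueError("username already taken")
        self.repository.save(username, email)

def test_register_rejects_duplicates():
    repo = InMemoryUserRepository()
    registration = UserRegistration(repo)

    registration.register("alice", "alice@example.com")
    try:
        registration.register("alice", "other@example.com")
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for the duplicate username")

if __name__ == "__main__":
    test_register_rejects_duplicates()
    print("ok")
```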
I would suggest reading at least one book about this, since the domain is quite large and books tend to synthesize such concepts better.
E.g., a very good basis might be: Software Testing: Testing Across the Entire Software Development Life Cycle (2007).
I think such a book will explain everything better than a few out-of-context examples we could post here.
I would like to add on to Jon Skeet's answer.
Building on white-box testing (or structural testing) and black-box testing (or functional testing), the following are other testing techniques under each respective category:
STRUCTURAL TESTING Techniques
Stress Testing
This is used to test bulk volumes of data on the system, more than the system normally takes. If a system can stand these volumes, it can surely handle normal values well.
E.g.
You could test system overflow conditions: trying to withdraw more than your available bank balance shouldn't work, while withdrawing up to the maximum threshold should.
Used when:
Mainly when you are unsure about the volumes your system can handle.
Execution Testing
Done to check how proficiently the system performs.
E.g.
To calculate turnaround time for transactions.
Used when:
Early in the development process, to see whether performance criteria are met.
Recovery Testing
To see if a system can recover to original form after a failure.
E.g.
A very common everyday example is System Restore in Windows, which keeps restore points used for recovery.
Used when:
When an application that is critical to the user at that point in time has stopped working but needs to continue working, so the user performs a recovery.
Other types of testing which you may find useful include:
Operations Testing
Compliance Testing
Security Testing
FUNCTIONAL TESTING Techniques include:
Requirements Testing
Regression Testing
Error-Handling Testing
Manual-Support Testing
Intersystem testing
Control Testing
Parallel Testing
There is a very good book titled "Effective Methods for Software Testing" by William Perry of the Quality Assurance Institute (QAI), which I would suggest is a must-read if you want to go in depth on software testing.
More detail on the above-mentioned testing types is available in that book.
There are also two other very broad categories of Testing namely
Manual Testing: this is typically done for user interfaces.
Automated Testing: testing driven through software testing tools like LoadRunner, QTP, etc., and often involving white-box testing.
Lastly I would like to mention a particular type of testing called
Exhaustive Testing
Here you try to test every possible condition, hence the name. As one would expect, this is pretty much infeasible, since the number of test conditions can be effectively infinite.
There are various tests one can perform; the question is how to organize them. Testing is a vast and enjoyable process.
Start testing with:
1. Smoke testing. Once it passes, go ahead with functionality testing. This is the backbone of testing: if the functionality works, 80% of the testing effort has paid off.
2. Then move on to user interface testing. At times the user interface attracts the client more than the functionality; it is the look and feel that clients respond to.
3. Now it's time to look at cosmetic bugs. These are generally ignored because of time constraints, but they can matter a great deal depending on where they are found: a spelling mistake becomes major when it appears on the splash screen, the landing page, or in the app name itself. So they cannot be overlooked either.
4. Conduct compatibility testing, i.e. testing across various browsers and browser versions, and perhaps across devices and operating systems for responsive applications.
Happy testing :)

How do you organise/layout your test scripts

I'm interested in how others organise their test scripts or have seen good test scripts organised anywhere they've worked. Also, what level of detail is in those test scripts. This specifically relates to test scripts created for manual testing as opposed to those created for any automated test purposes.
The problem as I see it is this, there is a lot of complexity in test scripts but without the benefit of the principles used in organising a complex or large code base. You need to be able to specify what a piece of code should do but without boring someone to death as they read it.
Also, how do you lay out test scripts? I'm not keen to create fully specified scripts suitable to be run by data-entry types, as that isn't the team we have and the overhead of maintaining them seems too high. It also feels to me that specifying the process in such detail removes responsibility for the quality of the product from the person actually doing the testing. Do people specify every button click and value to be entered? If not, what level of detail is specified?
Tests executed by humans should be at a very high level of abstraction.
E.g. a test case for stackoverflow registration:
Good:
A site visitor with an existing OpenId account registers as a stackoverflow user and posts an answer.
Bad:
1) Navigate to http://stackoverflow.com
2) Click on the login link
3) Etc...
This is important for several reasons:
a) it keeps the tests maintainable. So you don't have to update your test script every time navigation elements are relabeled (e.g. 'login' changes to 'sign in').
b) it saves your testers from going insane from the tedium of minute details.
c) writing detailed manual test scripts is a poor use of your finite test resources.
Detailed manual test scripts will divert your testers into writing bugs for minor documentation issues. You want to use your time to find the real bugs that will impact customers.
Tests can be grouped by priority. The BVT/smoke tests could have the highest priority with functional, integration, regression, localization, stress, and performance having lower priorities. Depending on your test pass you would select a priority and run all tests with that or higher priorities. All you need to do is determine which priority a particular test is.
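As one possible way to wire that up, assuming a pytest-based suite, you could tag tests with custom priority markers and pick a cut-off per test pass (the marker names p0/p1/p2 are just an example):
```python
# Tag tests with priority markers and choose a cut-off per test pass.
# The marker names (p0/p1/p2) are just an example; register them under
# "markers =" in pytest.ini to avoid "unknown marker" warnings.
import pytest

@pytest.mark.p0          # BVT/smoke: run on every pass
def test_service_starts():
    assert True

@pytest.mark.p1          # functional
def test_report_totals():
    assert 2 + 2 == 4

@pytest.mark.p2          # stress/performance: run less often
def test_large_import():
    assert True

# Smoke pass:       pytest -m p0
# Functional pass:  pytest -m "p0 or p1"
# Full pass:        pytest
```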
I try to make manual tests fit into an automated structure---you can have both.
The organization schemes used by automated tests (e.g., the xUnit frameworks) work for
me. In fact, they can be used to semi-automate the tests, by stopping and calling for a manual test to be run, or input to be entered, or a GUI to be inspected. The scheme is usually to mirror the directory structure of the production code, or to include the tests inside the production code, sometimes as inner classes. Tests above the unit level can often be fit into the higher level directories (assuming you have a deep enough directory tree). These higher level tests can go in (mirrored) directories that have no production code, but are there for organizational purposes.
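A small sketch of that semi-automation idea, with a hypothetical helper that pauses the run and asks the tester for a verdict on the manual step:
```python
# An automated test that pauses for a manual step, as described above.
# The setup/teardown around it stays automated; only the judgement is manual.
# The helper name and step text are illustrative.
import unittest

def manual_step(instructions):
    """Show instructions to the tester and record a pass/fail verdict."""
    print("\nMANUAL STEP:", instructions)
    verdict = input("Did it pass? [y/n] ").strip().lower()
    return verdict == "y"

class GuiSmokeTest(unittest.TestCase):
    def test_main_window_renders(self):
        # ... automated setup would launch the application here ...
        self.assertTrue(
            manual_step("Check that the main window opens with no visual glitches."),
            "Tester reported a rendering problem",
        )

if __name__ == "__main__":
    unittest.main()
```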
The level of detail---well, that depends, right?
Matt Andresen has provided a good answer for the general case, but there are situations where you can't do it that way. For example, when you are working on validated applications that must comply with regulations from other parties such as the FDA, and that go through very intensive audit, review, and sign-off, the second, fully specified style from your example is required. In that case, though, I would opt for moving to automation with HP QuickTest Pro or IBM Rational Robot.
Maybe you should try a test repository? There are again tools such as HP Quality Center and IBM products, but these can be expensive. You can find cheaper ones that let you organize tests into tree structures by requirement/feature, assign them priorities, group them into test suites for releases, group them into regression suites, etc.

What exactly defines production?

Like almost anyone who's been programming for a while, I'm familiar with the term "production code" and have a vague sense of what it means. However, can someone offer a semi-rigorous definition, since it seems Wikipedia and Google can't? It seems like there are a lot of gray areas in what counts as production, such as internal tools that are used by a small group of people and therefore not "formalized" in terms of UI, documentation, etc. and open source apps that are feature complete, reasonably bug free and working, but lack polish, UI and extensive testing.
When your code runs on a production system, that means it is being used by the intended audience in a real-world situation.
Production code, however, does not necessarily mean robust, reliable, or stable code. The Daily WTF provides plenty of evidence in this regard.
Production means anything that you need to work reliably and consistently, whether it is a build script or a public-facing web server.
When others rely on your code, particularly folks who may not understand it (even "smart" developers who are perhaps not in your group but are using a library you wrote), that code is production code.
It's production because "work stops" and "money is lost" when the production code fails.
The definition as I understand it is that production code is any code that is installed or in use on a live, non-test-bed system. A server used internally to a company is a production system if it is the live system used by the employees of the company. The point here is that code running on a server internal to the company writing the code can be production code.
Usually, a good distinction when looking at internal code is whether or not the group maintaining the code is separate from the group using the code. If the groups are separate, odds are that the code is production code. If running the business depends on the code, then it is certainly production code, even if it is developed and maintained in-house.
EDIT: The short answer: If you are "betting the farm on it", it is "production".
This is a great question--an absolutely critical distinction that routinely gets everyone in trouble due to misunderstandings. The question of what is "production" is a subset of the related question of what is an "environment".
So part of the answer is that "production" is THE "environment" that is most
important and is most trusted as THE "real" thing.
So now we must define "environment" (and then revisit "production"). We are still far from a satisfactory answer.
We programmers use the term "environment" constantly to refer to computer systems consisting of hardware that is executing software. That software is the code that we wrote plus software that it depends upon, which was written by others. We write our code and integrate it with the other software, then we typically run the integrated software through an escalating series of tests (unit tests, integration tests, functional tests, acceptance tests, regression tests, etc.), until we finally run the integrated software in the full manner in which it was intended.
Of course, not everything is fully automated. There are usually numerous people involved, and they have manual processes to perform. We programmers look for ways to automate as many of these processes as possible, but there is always a "man/machine boundary" in the systems we work on. Often, there are many such boundaries in any particular case.
On the other hand, there may not be any significant automation at all. For example, we spoke of "production" way back when we had a room full of people performing manual labor which produced a product. So, there doesn't have to be any automation present in our "production" "environment". There is also a middle ground, where the automation involved does not include software, such as in the case of a person running a loom to weave cloth.
Also, there may not be a product, since we have adapted our language of "production" "environment" to include product-less service providers.
Likewise, the testing may not involve software, since we may be testing a non-software-driven machine (e.g., the loom) or even the people (training and evaluation).
Now we have touched on all the crucial elements of an "environment":
there is a purpose, an intent, being pursued
an intent requires an intender, so there must be a sponsor (a person or group, but not a machine) that specifies the intent
that intent is pursued through various processes that are performed by various actors
those actors may be people, they may be software executing on hardware, or they may be non-software-driven machines, so there may or may not be automation present
Now we can properly and fully define our original terms.
An environment consists of all the processes and their actors that
collaborate to pursue a particular intent on behalf of its sponsor. That
means software executing on hardware, that means non-software-driven machines, and that
means people performing their various duties. It is the intent that primarily
defines an environment, not its processes or its actors.
Furthermore...
If the intent being pursued in a particular environment is the
sponsor's ultimate goal, which usually involves producing a product or
providing a service in exchange for money, then we refer to that
environment as production.
Now we can go a bit further.
If the intent being pursued in an environment is the verification of
processes and their actors in preparation for production, we call
that a test environment.
We further call it an integration environment if that testing involves the
initial joining together of significant individuals or groups of processes and
their actors.
If that preparation involves the "programming" of human actors to perform new
processes, or the subsequent verification (evaluation), then we call that a
training environment.
Armed with these distinctions and definitions, we can now understand several common scenarios.
An environment can be mislabeled with a name that does not match its intent, such as when a training environment is used as test.
An environment can be grossly misused, such as when integration or training is done in production.
An environment can be misrepresented, such as when key processes or actors are left unidentified (e.g., manual reconciliations, or even by ignoring the people altogether).
An environment can be retasked, by repurposing its processes and actors to a new intent. A very successful technique for some organizations is to routinely "flip" several sets of actors (servers hosting software) between production, test, training, and integration upon each release.
In most cases, a single actor (person or hardware) can execute multiple processes which can participate in multiple environments. For example, a single computer server can host software that performs production transactions while also hosting other software that performs test or training functions.
Normally, a single instance of an actor should participate in only one environment at a time. On very rare occasion, a single actor can be shared across environments if the intents are mutually compatible. Most of the time, it is very unwise to attempt such sharing because the intents are not really compatible. A perfect example is running a test process on a server that also supports production processes, resulting in downtime because the test caused the entire server to fail.
Therefore, the intent of an environment must be construed with very wide latitude, to include concepts such as availability, reliability, performance, disaster recovery, accuracy, precision, repeatability, longevity, etc. This means that the actors and processes must often be construed to include things like providing power, cooling, backups, and redundancy.
Finally, note that the situation can get quite complex. For example, a desktop computer (actor) may be tasked by the development team (sponsor) to host their source control (process), which the team relies upon for their primary jobs (production). Nevertheless, the IT staff sees that same desktop computer as simply a developer workstation (development, not production) and treats it with contempt and nonchalance when it develops a hardware problem. But the developers are producing production code, so aren't they also part of production? Perspective matters.
EDIT: Production quality
A solid verification (testing) methodology should take packaged code from development and run it through a series of tests (integration, TQA, functional, regression, acceptance, etc.) until it comes out the other side "stamped" for production use. However, that makes the package production quality, but not actually production. The package only becomes production when a sponsor actually deploys it into an environment with that ultimate level of intent.
However, if your organization merely produces that package (its product) for the consumption of others, then such a release comes as close to production as that organization will experience with respect to that product, so it is common to stretch the term production to apply rather than clarify that it is production quality. In reality, that organization's production environment consists of the actors and processes involved in its development/release efforts that result in that product.
I said it could get quite complex...
Any code that will be used by its intended user base would fit my definition of 'production code'.
Of course, the grey area in that definition would be clearly defining who your userbase is.
G-Man
The production software can handle the necessary workload without disruption or degradation of the service
The software has been successfully tested in different production scenarios
Transforming a working prototype into production software that runs on a fail-safe, redundant architecture in a real business (i.e. production) environment takes time, code refactoring, and attention to detail
The production code has an acceptable level of maintainability and is reasonably well commented
The documentation explains the functionality and all features, and facilitates maintenance
If the production software is an international service or application, it must be localized
Production code is used by end users, often customers, under the conditions described in a Terms of Service agreement
Production software does not necessarily mean reliable, mission-critical software
The software does well what it was intended to do
Log files provide an accurate picture of run-time performance and software reliability metrics, which facilitates debugging and software maintenance
I think the best way to describe it is as any code that "leads to" deployment and "follows" deployment. Deployment itself is defined as all of the activities that make a software system available for use. If your code is ready to be used by people, in-house or otherwise, then it is production code.
In simple words: production code is code that is live and in use by its intended audience.
The term "production code" mixes two different concepts. One is deployment management and the other is release life cycle.
In the strict sense of the word, a system is in production when it is being used as part of business or service operations. What is not in production: development, testing, QA, demo, and staging systems. A production system does not immediately imply quality.
From the release life cycle's point of view, a "production" build is the build that is released to the general public or to clients. It is the stage after pre-alpha, alpha, beta (feature complete, code complete, etc.) and release candidate. For shrink-wrapped products that cannot easily deploy updates, reaching the production stage likely implies a series of tests and bug fixes.