Self-diagnostic test for software layers?

With the increasing effort put into dividing software into independent layers and decoupling them with dynamic discovery and dependency injection, it has become harder to determine which "layer" in the system is contributing to a system-wide failure of the "application".
Unit tests help by making sure that all the "modules" that make up a layer work as expected. But unit tests are written in such a way that each "module" is isolated, using techniques such as stubbing and mocking.
Consider the simplistic example below:
L1. Database -> L2. Database Layer -> L3. Windows Service -> L4. Client Application
For instance, if the database engine is down, the system will not operate normally. It will be hard to tell whether the database engine is really down or whether there is a bug in the database layer (L2) code. To check, you would have to launch some kind of database management tool and see if the database engine is running.
What we are trying to achieve is a developer tool that can be launched whenever "there is something wrong" with the system; the tool would "query" each layer for its "integrity" or "diagnostic" data. The tool would list the software layers and their "integrity status", and could then say, right off the bat, that layer X is the cause of the issue (e.g. the database engine is down).
Of course, each layer will be responsible for providing its own "diagnostic method" that can be queried by the tool.
I guess what we're trying to achieve here is some kind of "integration test" framework, or something similar, that can be used at run time (not at compile/build time like a unit test). The inspiration came from physical devices that have their own on-board diagnostics, like cars. A good example in the software world is the power-on self-test that runs every time a computer is turned on.
Has anyone seen or heard of something like this? Any suggestions or pointers will surely help a lot!

You could have a common interface that each layer would implement as a WCF service. That way you could connect to every layer and diagnose it. Having these diagnostics available might be useful, but if you implement it everywhere (every layer) it becomes yet another thing that may fail - how would you diagnose that? Your system would also swarm with WCF services, which requires a lot of maintenance, makes the system less stable, and takes a lot of work to implement.
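For illustration, here is a minimal sketch of what such a shared diagnostic contract could look like; the interface, type and member names are hypothetical, not taken from any existing framework:

    using System;
    using System.Data.SqlClient;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    // Hypothetical contract that every layer would expose as a WCF endpoint.
    [ServiceContract]
    public interface ILayerDiagnostics
    {
        [OperationContract]
        DiagnosticReport CheckIntegrity();
    }

    [DataContract]
    public class DiagnosticReport
    {
        [DataMember] public string LayerName { get; set; }
        [DataMember] public bool IsHealthy { get; set; }
        [DataMember] public string Details { get; set; }
    }

    // Example implementation for the database layer: it reports "unhealthy"
    // if it cannot open a connection to the database engine.
    public class DatabaseLayerDiagnostics : ILayerDiagnostics
    {
        private readonly string connectionString;

        public DatabaseLayerDiagnostics(string connectionString)
        {
            this.connectionString = connectionString;
        }

        public DiagnosticReport CheckIntegrity()
        {
            try
            {
                using (var connection = new SqlConnection(connectionString))
                {
                    connection.Open();
                }
                return new DiagnosticReport { LayerName = "Database Layer", IsHealthy = true };
            }
            catch (Exception ex)
            {
                return new DiagnosticReport
                {
                    LayerName = "Database Layer",
                    IsHealthy = false,
                    Details = ex.Message
                };
            }
        }
    }

The diagnostic tool would then simply enumerate the configured endpoints, call CheckIntegrity() on each one, and display the results - or flag the first layer that reports a failure.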
The alternative I would suggest is to have a good logging system in place. The minimum would be to have every module log an error in every catch block, but I suggest more than that, especially for debugging purposes. I recommend Log4Net, which is free and very flexible. It is efficient about skipping messages below the configured threshold, so leaving the logging calls in does not hurt performance even in production code, and you can change the logging level at runtime by editing a setting in a config file. I use Log4Net a lot and it works well.
Once your code is logging, you can configure Log4Net so that all logs go into a central database. You then have one place where you can relatively easily diagnose what's going on, what has failed, where, and what the exception or message was. You can even set up email notifications for when something goes wrong.
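As a rough sketch of the pattern (the class, method and message here are just examples), the calling code stays the same regardless of where the log entries end up:

    using log4net;

    // Watch = true makes log4net re-read the config file when it changes,
    // so logging levels can be adjusted at runtime without a restart.
    [assembly: log4net.Config.XmlConfigurator(Watch = true)]

    public class OrderService
    {
        // One logger per class is the usual log4net convention.
        private static readonly ILog Log = LogManager.GetLogger(typeof(OrderService));

        // Hypothetical operation, for illustration only.
        public void ProcessOrder(int orderId)
        {
            Log.DebugFormat("Processing order {0}", orderId);
            try
            {
                // ... actual work ...
            }
            catch (System.Exception ex)
            {
                // Log the exception (message plus stack trace), then rethrow.
                Log.Error("Failed to process order " + orderId, ex);
                throw;
            }
        }
    }

Sending the logs to a central database is then just a matter of adding an AdoNetAppender to the log4net section of the config file; the code above does not change.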

Related

I want a sandboxed test environment that is *always* an exact copy of Production

I'm having an issue with a web application I am responsible for maintaining.
The system experiences regular bugs, and our support vendors are always asking us to see if we can "replicate the error in UAT". This is obviously a reasonable request. A lot of the time, for various reasons (some of which are clear, some of which are not), these errors are not present in UAT. This lack of bug reproducibility in a testing environment is adding a huge amount of friction to the bug resolution process.
There are 3 key pieces of our system architecture where these bugs are flaring up (the CMS, the API layer, and the database). I am proposing we set up a system job that perpetually clones these 3 parts of the system into a sandboxed test environment. This cloning would happen periodically (e.g. once every 24 hours) and automatically.
Is there a technical term for this sort of environment? Is this an established method of helping diagnose system issues? Is there somewhere I can read up on the industry best practices for establishing something like this? Thanks.
The technical term for this kind of process is replication. It is often done for systems like databases, but normally not for testing purposes; rather, it is done to increase availability, with the replica used as a failover spare.
An exact copy of a production system, with all of its data, is not something you'll find often, due to the high demand on resources. Also, at some point the two systems have to differ: most systems (that I know of) have tons of interfaces, so you just can't copy a complete system.
Also: you only need the copy of the production system when you are actually debugging an issue. And if you are in the middle of that, you probably don't want everything to go away and get replaced by a new copy.
So instead I would recommend setting up scripts that allow you to obtain a copy of the relevant parts on demand.
You might also want to consider how you could modify your system to make it easier to set up a copy.
For example, when you have all of the setup automated (with Chef/Docker or similar), you should be able to set up the same system again anywhere you want, so now you just have to get the production data over.
Which is an interesting point: production data often contains secret information (because it is vital to the business, or because it is personal data). You don't want that kind of data hanging around in a test system that everybody can access.

How to get started with WCF Performance profiling

I'm trying to figure out how to profile a WCF service so I can identify any bottlenecks.
I have found a bit of information online, but nothing that assumes no prior knowledge, which is where I'm at.
What are recommended FREE tools?
- visual studio tools
- clrprofiler
Here is information I found on using vsperfcmd.exe to profile a WCF service, and according to it the process is very simple, but I need to fill in the gaps on where to start. My assumption is that I copy VsPerfCLREnv and VsPerfCmd to the server that hosts my WCF service and perform some configuration steps that I'm not quite sure about. I'm also not quite sure how I would be able to see the call stack to evaluate the performance of each call.
CLRProfiler seems a bit simpler. I assume I would copy clrprofiler.exe to the server, use File->Profile Service and add the name and start/stop commands (is this a friendly name, a filename, or the service display name?). I assume I would then run my tests against the service and could see the call stack in CLRProfiler. Does that sound correct?
[edit]
I'm not so interested in testing the network, since this is on a test server. This is a large WCF project with multiple devs on it, and I am unable to make changes to the project for the sole purpose of monitoring performance. I want to focus on the performance of the actual methods within it.
Any assistance on getting started is greatly appreciated.
For WCF it is not enough to profile your own code, as a bunch of things happen on the channel stack (security, deserialization, formatting, etc.). A good way to visualise that is to enable WCF tracing at the verbose level and then use the Service Trace Viewer to see how long each step of message processing takes. Read up on how to configure and use WCF tracing; this is the single thing that has helped me most when diagnosing WCF issues.
Of course all other code profiling, DB profiling, etc. are valid approaches as well. You may even use a tool like SoapUI to test your network communication and client-side performance overhead for a more end-to-end benchmark.
Some things I've learned that someone might find helpful:
You cannot remotely profile a service, even over your local network. The profiler must be running on the same machine as the service. (This actually took me quite a while to figure out. Maybe it's obvious to you, but it was never spelled out, so I kept trying to do it.)
Visual Studio didn't work for me for profiling my WCF service. I was able to get a bit of help from the VS profiler team, but never came out of it with a working solution.
VS was slow to connect and disconnect the profiler, and it often instrumented my binaries and left them in a corrupted state.
.NET binaries do not need to be instrumented, since they already contain the method metadata, which makes it odd that Visual Studio kept hosing my binaries trying to instrument them.
I also tried the standalone VS profiler, but it is very complex to use and requires reboots of my server.
I ended up getting an internal profiler to work (after getting a private build from the team) so I'm not sure how many profilers out there are designed to work with a WCF service.
I actually set the profiler to watch the WAS service and then added my additional binaries to the profiler.
Process Explorer is useful for troubleshooting whether the profiler is connected or not. Use it to look at the inetinfo.exe environment.
Can you run it under a debugger?
Can you stand a simple, old-fashioned method that just works? Here's one.
In addition to Mike's comments, you can use the built-in WCF performance counters to see a number of performance-related metrics, and you can also see call times in a WCF trace. Once you know which operations are "slow", it's usually easier to add some custom timing/logging code to those operations than to use a general-purpose profiler for something like this. This is coming from someone who used to work on commercial profilers.
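As a minimal sketch of that kind of custom timing code (the operation and its data-access call are invented for illustration):

    using System.Diagnostics;

    public class CustomerService
    {
        // Hypothetical WCF operation wrapped with simple timing code.
        public string GetCustomer(int id)
        {
            var stopwatch = Stopwatch.StartNew();
            try
            {
                return LoadCustomerFromDatabase(id);   // the actual work
            }
            finally
            {
                stopwatch.Stop();
                // Swap in your logging framework of choice (log4net, ETW, ...).
                Trace.WriteLine(string.Format(
                    "GetCustomer({0}) took {1} ms", id, stopwatch.ElapsedMilliseconds));
            }
        }

        private string LoadCustomerFromDatabase(int id)
        {
            // Placeholder for the real data access.
            return "customer " + id;
        }
    }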
Tools you should look into: SvcTraceViewer (and turn on tracing in both your service and clients), SoapUI for simulating load (and doing analysis), and Fiddler, an excellent HTTP sniffer/diagnostics tool.

Testing fault tolerant code

I'm currently working on a server application where we have agreed to try to maintain a certain level of service. The level of service we want to guarantee is: if a request is accepted by the server and the server sends an acknowledgement to the client, we want to guarantee that the request will happen, even if the server crashes. As requests can be long running and the acknowledgement time needs to be short, we implement this by persisting the request, then sending an acknowledgement to the client, then carrying out the various actions to fulfill the request. As actions are carried out they too are persisted, so the server knows the state of a request on start-up, and there are also various reconciliation mechanisms with external systems to check the accuracy of our logs.
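In outline, the flow looks something like this (the types here are simplified, hypothetical stand-ins for our real components):

    using System.Collections.Generic;

    public interface IRequestStore
    {
        void SaveAccepted(Request request);                       // durable write before the ack
        void SaveActionCompleted(Request request, string action); // progress record for recovery
    }

    public interface IClientConnection
    {
        void SendAcknowledgement(string requestId);
    }

    public interface IRequestAction
    {
        string Name { get; }
        void Execute();
    }

    public class Request
    {
        public string Id { get; set; }
        public IList<IRequestAction> Actions { get; set; }
    }

    public class RequestHandler
    {
        private readonly IRequestStore store;

        public RequestHandler(IRequestStore store) { this.store = store; }

        public void Handle(Request request, IClientConnection client)
        {
            store.SaveAccepted(request);            // 1. persist, so the request survives a crash
            client.SendAcknowledgement(request.Id); // 2. only then acknowledge

            foreach (var action in request.Actions) // 3. long-running work, with persisted progress
            {
                action.Execute();
                store.SaveActionCompleted(request, action.Name);
            }
        }
    }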
This all seems to work fairly well, but we have difficulty saying so with any conviction, as we find it very hard to test our fault-tolerant code. So far we've come up with two strategies, but neither is entirely satisfactory:
Have an external process watch the server code and then try to kill it at what the external process thinks is an appropriate point in the test
Add code to the application that will cause it to crash at certain known critical points
My problem with the first strategy is that the external process cannot know the exact state of the application, so we cannot be sure we're hitting the most problematic points in the code. My problem with the second strategy, although it gives more control over where the fault takes place, is that I do not like having fault-injection code inside my application, even with conditional compilation etc. I fear it would be too easy to overlook a fault injection point and have it slip into a production environment.
I think there are three ways to deal with this. First, I would suggest a comprehensive set of integration tests for these various pieces of code, using dependency injection or factory objects to produce broken actions during those integrations.
Secondly, running the application with random kill -9s and disabling network interfaces may be a good way to test these things.
I would also suggest testing file system failure. How you would do that depends on your OS; on Solaris or FreeBSD I would create a ZFS file system in a file, and then rm the file while the application is running.
If you are using database code, then I would suggest testing failure of the database as well.
Another alternative to dependency injection, and probably the solution I would use, is interceptors: you can enable crash-test interceptors in your code that know the state of the application and introduce the failures listed above (or any others you may want to create) at the correct time. It does not require changes to your existing code, just some additional code to wrap it.
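A minimal sketch of that wrapper idea, reusing a hypothetical IRequestAction interface like the one sketched in the question; only the test configuration would ever register the wrapper:

    using System;

    // Hypothetical interface for one of the persisted actions mentioned above.
    public interface IRequestAction
    {
        string Name { get; }
        void Execute();
    }

    // Test-only wrapper: behaves exactly like the real action, but kills the
    // process at a well-defined point so crash recovery can be exercised.
    public class CrashAfterExecuteAction : IRequestAction
    {
        private readonly IRequestAction inner;

        public CrashAfterExecuteAction(IRequestAction inner) { this.inner = inner; }

        public string Name { get { return inner.Name; } }

        public void Execute()
        {
            inner.Execute();
            // Simulate a crash after the action ran but before its completion
            // was persisted - exactly the window the recovery code has to handle.
            Environment.FailFast("Injected crash after " + inner.Name);
        }
    }

In production only the real action is registered with the container; the wrapper lives in the test project, so there is no fault-injection code to accidentally ship.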
A possible answer to the first point is to multiply the experiments with your external process so that the probability of hitting problematic parts of the code is increased. You can then analyze the core dump file to determine where the code actually crashed.
Another way is to increase observability and/or controllability by stubbing library or kernel calls, i.e. without modifying your application code.
You can find some resources on the Fault Injection page of Wikipedia, in particular in the Software Implemented Fault Injection section.
Your concern about fault injection is not a fundamental one. You merely need a foolproof way to prevent such code ending up in a deployment. One way to do so is to design your fault injector as a debugger, i.e. the faults are injected by a process external to your process. This already provides a level of isolation. Furthermore, most OSes provide some kind of access control which prevents debugging unless it is specifically enabled. In the most primitive form it is limited to root; other operating systems require a specific "debug privilege". Naturally, nobody will have that on production, so your fault injector cannot even run there.
Practically, the fault injector can set breakpoints at specific addresses, i.e. at a function or even a line of code. You can then react to that, e.g. by terminating the process after a certain breakpoint has been hit three times.
I was just about to write the same as Justin :)
The component I would suggest replacing during testing is the logging component (if you have one; if not, I'd strongly suggest implementing one...). It's relatively easy to replace it with code that generates errors, and the logger usually gets enough information to know the current application state.
It also seems feasible to make sure that the testing code doesn't go into production. I would discourage conditional compilation, though, and rather go with a configuration file to select the logging component.
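A minimal sketch of such a replaceable, fault-generating logger (ILogger here is a hypothetical application interface, not a particular library's):

    using System;
    using System.IO;

    public interface ILogger
    {
        void Info(string message);
    }

    // Normal implementation used in production.
    public class FileLogger : ILogger
    {
        public void Info(string message) { /* append to the log file */ }
    }

    // Test implementation: throws at a configurable point, so the surrounding
    // code's fault handling can be exercised without touching production code.
    public class FaultInjectingLogger : ILogger
    {
        private readonly ILogger inner;
        private readonly int failOnCall;
        private int calls;

        public FaultInjectingLogger(ILogger inner, int failOnCall)
        {
            this.inner = inner;
            this.failOnCall = failOnCall;
        }

        public void Info(string message)
        {
            if (++calls == failOnCall)
                throw new IOException("Injected logging failure at: " + message);
            inner.Info(message);
        }
    }

Which implementation gets instantiated is decided by the configuration file, so the fault-injecting version is only ever selected in the test configuration.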
Using "random" kills might help to detect errors but is not well suited for systematic testing because of its non-determinism. Therefore I wouldn't use it for automatic tests.

BPMS or just plain programming?

What do you prefer (from your developer's point of view) when it comes to implement a business process?
A Business Process Management System (BPMS) or just your favorite IDE with the needed tools and frameworks (a reporting tool for example)?
What is, from your point of view, the greatest benefit of a BPMS compared to an IDE with your personal tools and frameworks?
OK, maybe I should be more specific... I got to know one specific BPMS which is supposed to make it easy to implement a business process by configuring rules. But for me as a developer it is hard to work with the system. I would like to work with text files which I can refactor, and I would like to be able to choose the right technology or framework for the job I have to do. Instead, the system forces me to configure.
There are rules where I can use Java, but even then I have to stick to the system's editor, without IntelliSense etc.
So this leads me to the answer to my own question: I would like to use the tools I am used to instead of having to learn how to work with a BPMS (at least the one I know), because it limits me more than it helps. The BPMS I know is a framework from which it is hard to escape! At this time, I would prefer a framework like Grails over any BPMS I know.
So maybe the more specific question is: do you feel the same, or are there BPMSs which support you in being a developer and thinking like a developer, or do most of them force you to do your job in a different way?
In my experience, the development environments provided by BPMS products are third-rate and unproductive, and practically force you to write hard-to-maintain, poorly designed code (due to their limitations). Almost all the "features" (UI, integrations, etc.) provided by the BPMS I'm familiar with (the one sold by that company named for its database) were not worth the money we paid.
If you're forced to use a BPMS, my advice as a developer would be to build as much of your application as possible in a conventional development environment, such as Java or .NET, build as little as possible in the BPMS environment itself, and integrate the two. The only thing that should go in the BPMS is the minimum needed to make the business process work.
Not sure what exactly you're asking, but the choice between BPM and plain programming will depend on the requirements. A "business process" is a relatively vague term in software engineering.
Here are a few criteria for evaluating your needs:
complexity of the rules - Are the decisions/rules embodied in your process simple, complicated, configurable, hard-coded?
volatility of the process - How frequently does your process change? Who should be able to make the change?
integration needs - Is your process realized using multiple heterogeneous services, or is it all implemented in the same language?
synchronous/asynchronous - Is your process "long-running", with the need to handle asynchronous actions?
human tasks - Does your process involve human interaction, with tasks being assigned/routed to people according to their roles/responsibilities?
monitoring of the process - What level of control do you want over the process instances being executed? Do you need to audit the actions, etc.?
error handling - Depending on the previous points, how do you plan to deal with errors, or retries of faulty process executions?
Depending on the answers to these questions, you may realize that your process is closer to a simple state chart with a few actions and decisions that can be executed in sequence, or you may realize that you need something more elaborate and that you don't want to re-implement it all yourself.
Between plain programming and a full-fledged BPM solution (e.g. the Oracle BPM suite, which contains BPEL, a rule engine, etc.), there are intermediate solutions such as jBPM or Windows Workflow Foundation, and probably a lot of others. These intermediate solutions are frequently a good trade-off.
I have worked with BizTalk in the past and more recently with jBPM. My opinion is biased against BPMSs for the following reasons:
Steep learning curve: to make a process work, I have to understand how the system and the editor work. It is hard enough for a developer to understand the system, let alone a business user. The drag-and-drop visual representation is a great demo tool. It certainly impresses managers (who ultimately pay for it), but a developer's productivity just drops.
Non-developers changing the workflow: I haven't seen one BPM solution do this flawlessly. Though it doesn't look like code, you right-click on the box and you do have to put some code in, otherwise it is not going to work. So you definitely need a developer to do it. The best part is that it is neither developer-friendly nor business-user-friendly, just demo-user-friendly.
Testability and refactoring: it is virtually impossible to test-drive a BPMS. You do have "unit test frameworks" advertised, but most of them are hacks and hard to use. Recently I tried the jBPM one; I ended up writing a lot of glue code and fake workflow handlers to make it work. The deal breaker for me, though, is refactoring. If the business radically changes its mind about how a business process should look, then good luck re-arranging the boxes, because just re-arranging them won't work; all the variables bound to the boxes also need to be re-arranged. I would prefer the power of the IDE and tests for refactoring my business process.
If your application has workflow, then you could try a workflow library (with or without persistent state). It will still manage your workflows without all the bloat that comes with a BPMS. If a business user needs to understand the code, then let the business prepare good process flowcharts and translate them into good domain-driven code. Use Cucumber-style acceptance tests to bring the developers and the business together. A BPMS is just something that tries to do too many things and ends up doing all of them badly.
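As a rough sketch of what "workflow as plain code" can look like (the step names and the Order type in the usage comment are invented for illustration):

    using System;
    using System.Collections.Generic;

    // A workflow is just an ordered list of named steps acting on some state.
    public class Workflow<TState>
    {
        private readonly List<KeyValuePair<string, Action<TState>>> steps =
            new List<KeyValuePair<string, Action<TState>>>();

        public Workflow<TState> Then(string name, Action<TState> step)
        {
            steps.Add(new KeyValuePair<string, Action<TState>>(name, step));
            return this;
        }

        public void Run(TState state)
        {
            foreach (var step in steps)
            {
                Console.WriteLine("Running step: " + step.Key);
                step.Value(state);
            }
        }
    }

    // Usage (hypothetical order-handling process), readable by the business,
    // refactorable and unit-testable like any other code:
    // new Workflow<Order>()
    //     .Then("Validate order", o => Validate(o))
    //     .Then("Reserve stock", o => ReserveStock(o))
    //     .Then("Send confirmation", o => SendConfirmation(o))
    //     .Run(order);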
BPMS: a lot of common business cases and use cases are already implemented, so you just have to know how to use it. For a common workflow you don't even need to write a single line of code, though you will usually have to write some scripts to cover things that are not yet implemented.
Plain programming: just use the IDE to hack out the code. The positive side: more control. The negative? A lot of time is spent rewriting boilerplate code, and you have to maintain it.
So in a nutshell, I would prefer a Business Process Management System. One that I would recommend is ProcessMaker. It features an intuitive process designer that allows you to design a workflow with drag and drop, and you can always write triggers to extend the process's functionality. It's open source as well.

How do you test your applications for reliability under badly behaving I/O?

Almost every application out there performs I/O operations, either with the disk or over the network.
My applications work fine in the development environment, but I want to be sure they will still work when the Internet connection is slow or unstable, or when the user attempts to read data from a badly written CD.
What tools would you recommend to simulate:
slow I/O (opening files, closing files, reading and writing, enumeration of directory items)
occasional I/O errors
occasional 'access denied' responses
packet loss in TCP/IP
etc...
EDIT:
Windows:
The closest solution for the job as described seems to be Holodeck, commercial software (>$900).
Linux:
An open solution hasn't been found yet, but the same effect can be achieved as described by smcameron and krosenvold.
The decorator pattern is a good idea.
It would require wrapping my I/O classes, but would result in a testing framework.
The only remaining untested code would be in 3rd-party libraries.
Yet I decided not to go this way, but to leave my code as it is and simulate I/O errors from outside.
I now know that what I need is called 'fault injection'.
I thought it was a commonplace practice with plenty of ready-made solutions I just didn't know about.
(By the way, another similar good idea is 'fuzz testing', thanks to Lennart.)
To my mind, the problem is still not worth $900.
I'm going to implement my own open-source tool based on hooks (targeting win32).
I'll update this post when I'm done with it. Come back in 3 or 4 weeks or so...
What you need is a fault-injection testing system. James Whittaker's 'How to Break Software' is a good read on this subject and includes a CD with many of the tools needed.
If you're on Linux you can do tons of magic with iptables:
iptables -I OUTPUT -p tcp --dport 7991 -j DROP
This can simulate connections going up and down as well. There are lots of tutorials out there.
Check out "Fuzz testing": http://en.wikipedia.org/wiki/Fuzzing
At a programming level, many frameworks will let you wrap the I/O stream classes and delegate calls to the wrapped instance. I'd do this and add a couple of wait calls in the key methods (writing bytes, closing the stream, etc.), as well as throwing I/O exceptions. You could write a few of these with different failure or issue types and use the decorator pattern to combine them as needed.
This should give you quite a lot of flexibility in tweaking which operations are slowed down, inserting "random" errors every so often, etc.
The other advantage is that you could develop it in the same codebase as your software, so maintenance wouldn't require any new skills.
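A minimal sketch of such a decorator around .NET's Stream (the delay and failure rate are arbitrary example values):

    using System;
    using System.IO;
    using System.Threading;

    // Wraps any Stream and makes it slow and occasionally unreliable.
    public class FlakyStream : Stream
    {
        private readonly Stream inner;
        private readonly Random random = new Random();

        public FlakyStream(Stream inner) { this.inner = inner; }

        public override int Read(byte[] buffer, int offset, int count)
        {
            Thread.Sleep(200);                       // simulate a slow device/connection
            if (random.Next(100) < 5)                // ~5% of reads fail
                throw new IOException("Injected I/O error");
            return inner.Read(buffer, offset, count);
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            Thread.Sleep(200);
            inner.Write(buffer, offset, count);
        }

        // The remaining members just delegate to the wrapped stream.
        public override bool CanRead { get { return inner.CanRead; } }
        public override bool CanSeek { get { return inner.CanSeek; } }
        public override bool CanWrite { get { return inner.CanWrite; } }
        public override long Length { get { return inner.Length; } }
        public override long Position
        {
            get { return inner.Position; }
            set { inner.Position = value; }
        }
        public override void Flush() { inner.Flush(); }
        public override long Seek(long offset, SeekOrigin origin) { return inner.Seek(offset, origin); }
        public override void SetLength(long value) { inner.SetLength(value); }
    }

Because it is itself a Stream, it can be handed to any code that expects one, and several such decorators (slow, failing, access-denied) can be stacked as needed.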
You don't say which OS, but if it's Linux or Unix-ish, you can wrap open(), read(), write(), or any library or system call, with an LD_PRELOAD-able library to inject faults.
Along these lines:
http://scaryreasoner.wordpress.com/2007/11/17/using-ld_preload-libraries-and-glibc-backtrace-function-for-debugging/
I didn't end up writing my own file system filter, as I had initially planned, because there's a simpler solution.
1. Network I/O
I've found at least 2 ways to simulate I/O errors here.
a) Running a virtual machine (such as VMware) allows you to configure bandwidth and packet loss rate. VMware supports on-machine debugging.
b) Running a proxy on the local machine and tunneling all the traffic through it. For UDP/TCP communication a proxifier (e.g. WideCap) can be used.
2. File I/O
I've managed to reduce this scenario to the previous one by mapping a drive letter to a network share which resides inside the virtual machine. The file I/O will be slow.
A cheaper alternative exists: set up a local FTP server (e.g. FileZilla), configure speeds, and use Novell's NetDrive to access it.
You'll want to set up a test lab for this. What type of application are you building, anyway? Are you really expecting the application to be fed corrupt data?
A testing technique I know the Microsoft Exchange Server people tried was sending noise to the server: basically feeding every possible input with seemingly random data. They managed to crash the server quite often this way.
But still, if you can't trust input that hasn't been signed, then general rules apply. Track every operation which could potentially be untrusted (the result of corrupt data) and you should be able to handle most problems gracefully.
Just test your application's behavior on random input; that should catch most problems, but you'll never be able to fully protect yourself from corrupt data. That's just not possible, as the data could be part of some internal buffer being handed off within the application itself.
Be mindful of when and how you decode data. That is all.
The first thing you'll need to do is define what "correct" means under these circumstances. You can only test against a definition of what behaviour is intended.
The tactics of testing will depend on technology. In the context of automated unit testing, I have found it very useful, in OO languages such as Java, to use various flavors of "mocking" or "stubbing" to pass e.g. misbehaving InputStreams to parts of my code that used file I/O.
Consider Holodeck for some of the fault injection. If you have access to spare hardware, you can simulate network impairment using netem, or a commercial product based on it, the Mini-Maxwell, which is much more expensive than free but possibly easier to use.