How to simulate a production concurrency issue using tests

I have an application which receives an object from a queue, transforms it and publishes it on a topic. It's a message-driven bean (a Spring message listener container) with a fair number of inner beans.
Some strange activity recently happened on the prod boxes. We want to check whether this is a concurrency issue, which is great, but not something I've done before.
My approach is to pump a load of messages at the application, write a piece of software to listen to its publishing topic, then consume these and process them via something similar to JUnit tests which compare each object's attributes with the expected result.
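For illustration, this is roughly the kind of harness I have in mind (a sketch assuming Spring JMS over an ActiveMQ broker; the queue/topic names and payload format are made up, and the real comparison would check every attribute of the transformed object):

import org.apache.activemq.ActiveMQConnectionFactory;
import org.junit.Test;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class LoadAndCompareTest {

    private static final int MESSAGE_COUNT = 1000;

    @Test
    public void everyInputProducesTheExpectedOutput() throws Exception {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");

        Queue<String> received = new ConcurrentLinkedQueue<>();
        CountDownLatch latch = new CountDownLatch(MESSAGE_COUNT);

        // Subscribe to the application's output topic before pumping messages in.
        DefaultMessageListenerContainer consumer = new DefaultMessageListenerContainer();
        consumer.setConnectionFactory(cf);
        consumer.setPubSubDomain(true);                  // it's a topic, not a queue
        consumer.setDestinationName("app.output.topic");
        MessageListener listener = msg -> {
            try {
                received.add(((TextMessage) msg).getText());
            } catch (JMSException e) {
                throw new RuntimeException(e);
            } finally {
                latch.countDown();
            }
        };
        consumer.setMessageListener(listener);
        consumer.afterPropertiesSet();
        consumer.start();

        // Pump a load of messages at the application's input queue.
        JmsTemplate jms = new JmsTemplate(cf);
        for (int i = 0; i < MESSAGE_COUNT; i++) {
            jms.convertAndSend("app.input.queue", "payload-" + i);
        }

        assertTrue("timed out waiting for published output",
                latch.await(60, TimeUnit.SECONDS));
        assertEquals(MESSAGE_COUNT, received.size());
        // Real test: compare each transformed object's attributes with the expected result here.
        consumer.stop();
    }
}

The idea is just to hammer the listener container with enough concurrent load that any race shows up as a wrong, duplicated or missing output.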
I've added the above for a bit of scope on the problem, but basically: are there any applications on the market that I could plug into either my code or my IDE which would enable me to do that? I think this is somewhat beyond the capabilities of JUnit.

I would recommend Java Pathfinder (JPF) for this type of exercise.
JPF is essentially a JVM that simulates execution of your code in every possible way - e.g. simulating all the possible interleavings of the bytecode instructions. JPF works as a model checker that explores the set of possible states an application can enter and evaluates them against predefined rules - e.g. freedom from deadlock, absence of null dereferences, etc.
Model checkers usually fail when applications become too complex, but JPF is able to execute the app without traversing the whole state space and is usually able to reach problem states pretty quickly. Give it a look.
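To make that concrete, here is the kind of deliberately racy class JPF can chew on, plus the sort of .jpf configuration you would point at it (a sketch; the listener name is from memory, so check the listeners shipped with your JPF version):

// Two threads increment a shared counter without synchronization. A normal JVM
// will pass this most of the time; JPF explores the interleavings and finds the
// one where both threads read 0 and one update is lost.
public class RacyCounter {

    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> counter++);
        Thread t2 = new Thread(() -> counter++);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        if (counter != 2) {
            throw new IllegalStateException("lost update: counter = " + counter);
        }
    }
}

// RacyCounter.jpf (illustrative):
//   target = RacyCounter
//   classpath = build/classes
//   listener = gov.nasa.jpf.listener.PreciseRaceDetector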

Related

IWantToRunWhenBusStartsAndStops not for production?

I'm new to NServiceBus (4.7.5) and have just implemented an NSB host.exe-hosted service (implementing IWantToRunWhenBusStartsAndStops) that detects changes to database tables and notifies subscribing web apps by publishing events, e.g. "CustomerDataWasUpdatedEvent". In the future we will obviously perform the actual updates through message handlers receiving commands, but at the moment this publishing service just polls the database.
It all works well. However, approaching production, I noticed that David Boike, in his latest edition of "Learning NServiceBus", states that classes implementing
IWantToRunWhenBusStartsAndStops are really mostly for development and rarely used in production. I set up my database change detection in the Start method and it works nicely; does anyone know why this is discouraged?
Here is the comment in the actual book:
https://books.google.se/books?id=rvpzBgAAQBAJ&pg=PA110&lpg=PA110&dq=nservicebus+iwanttorunwhenbusstartsandstops+in+production+david+boike&source=bl&ots=U6sNII0nm3&sig=qIXffOVFhcy-_3qDnSExRpwRlD4&hl=sv&sa=X&ei=lHWRVc2_BKrWywPB65fIBw&ved=0CBsQ6AEwAA#v=onepage&q=nservicebus%20iwanttorunwhenbusstartsandstops%20in%20production%20david%20boike&f=false
The actual quote is:
...it isn't common to have widespread use of them in a production system.
Uncommon is not the same thing as discouraged.
That said, I do think the author's intent is to highlight the point made further up the page: this is not a good place to be doing lots of coding, as an unhandled exception can cause the whole process to fail.
The author does go on to mention a possible use case for when you may want to load a resource (or resources) to do work within the handler.
OK, maybe it's just this scenario of ours that is a bit uncommon.
Agreed - there is nothing fundamentally wrong with your approach. I recently did the same thing as you, wiring up SqlDependency to listen for database events and then publish a message as a result. In these scenarios there is literally nothing else you can do other than to use IWantToRunAtStartup.
Also, David himself often trawls the nservicebus tag; maybe he'll provide a more definitive answer than mine.
I'll copy the answer I gave in the Particular Software Google Group...
I'll quote myself directly here:
An implementation of IWantToRunWhenBusStartsAndStops is a great place to create a quick interface in order to test messages during debugging by allowing you to send messages based on the console input. Apart from this, it isn't common to have widespread use of them in a production system. One possible production use case will be to provision a resource needed by the endpoint at startup and then tear it down when the endpoint stops.
I think if I could add a little bit of emphasis it would be to "widespread use". I'm not trying to say you won't/can't have an IWantToRunWhenBusStartsAndStops in production code or that avoiding them is a best practice. I am trying to say that having a ton of them is probably a code smell.
Above that paragraph in the book, I warn about IWantToRunWhenBusStartsAndStops not having any ambient transactions or try/catch stuff going on. THAT is really the key part. If you end up throwing an exception in an IWantToRunWhenBusStartsAndStops, you can run into big problems. If you use something like a .NET Timer and then throw an exception, you can crash your process!
Let me tell you how I screwed up on this in my first-ever NServiceBus system. The system (still in use today, from what I hear) is responsible for ingesting more than 3000 RSS feeds (probably a lot more than that now) into a CMS. Processing each feed, breaking it up into items, resizing images, encoding attached video for mobile ... all those things were handled in NServiceBus message handlers, which were scaled out to multiple servers, and that was all fantastic.
The problem was the scheduler. I implemented that as an IWantToRunWhenBusStartsAndStops (well, actually IWantToRunAtStartup at that time) and it quickly turned into a mess. I kept a whole table's worth of feed information in memory so that I could calculate when to fire off the next ProcessFeed command. I was using the .NET Timer class, and IIRC, I eventually had to use threading primitives like ManualResetEvent in order to coordinate the activity. And because I was using .NET Timer, if the scheduler threw an exception, that endpoint failed and had to restart. There were lots of weird edge cases and it was always a quagmire of bugs. Plus, this was now a singleton "commander app", so while the feed/item processors could be scaled out, the scheduler could not.
As I got more experienced with NServiceBus, I realized that each feed should have been a saga, starting from a FeedCreated event, controlled through PauseProcessing and ResumeProcessing commands, using timeouts to control the next processing time, and finally (perhaps) ended via a FeedRemoved event. This would have been MUCH more straightforward and everything would have executed inside transactionally-controlled message handlers.
That experience led me to be a little bit distrustful/skeptical of IWantToRunWhenBusStartsAndStops. Not saying it's bad, just something to be aware of. Always be prepared to consider if what you're trying to do couldn't be better accomplished in another way.

Self-diagnostic test for software layers?

With the increasing effort put into dividing software pieces into independent layers and decoupling them using dynamic discovery and dependency injection, it has become harder to determine which "layer" in the system is contributing to a system-wide failure of the "application".
Unit tests help by making sure that all the "modules" that make up a layer are working as expected. But unit tests are written in such a way that each "module" is isolated using techniques such as stubbing and mocking.
Consider the simplistic example below:
L1. Database -> L2. Database Layer -> L3. Windows Service -> L4. Client Application
For instance, if the database engine is down, then the system will not operate normally. It will be hard to tell whether the database engine is really down or there is a bug in the database layer (L2) code. To check, you would have to launch some kind of database management tool to see whether the database engine is running.
What we are trying to achieve is a developer tool that can be launched whenever "there is something wrong" with the system, and this tool will "query" each layer for its "integrity" or "diagnostic" data. The tool will provide the list of software layers and their "integrity status". It will then be able to say, right off the bat, that layer X is the cause of the issue (i.e. the database engine is down).
Of course, each layer will be responsible for providing its own "diagnostic method" that can be queried by the tool.
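To illustrate what we imagine (names are made up, and the sketch is in plain Java purely for illustration - the same contract could just as well be a service endpoint per layer), each layer would expose something like this, and the tool would simply walk the layers:

import java.util.ArrayList;
import java.util.List;

/** Contract each layer implements so the diagnostic tool can query its health. */
interface DiagnosableLayer {
    String name();
    HealthReport selfTest();   // should be cheap and side-effect free
}

/** Result of one layer's self-test. */
final class HealthReport {
    final boolean healthy;
    final String detail;
    HealthReport(boolean healthy, String detail) {
        this.healthy = healthy;
        this.detail = detail;
    }
}

/** Example: the database layer checks that it can actually reach the engine. */
class DatabaseLayerDiagnostics implements DiagnosableLayer {
    public String name() { return "L2. Database Layer"; }
    public HealthReport selfTest() {
        try (java.sql.Connection c = java.sql.DriverManager
                .getConnection("jdbc:...", "user", "password")) {   // placeholder connection string
            return c.isValid(2)
                    ? new HealthReport(true, "database reachable")
                    : new HealthReport(false, "connection is not valid");
        } catch (java.sql.SQLException e) {
            return new HealthReport(false, "database engine unreachable: " + e.getMessage());
        }
    }
}

/** The developer tool just walks the registered layers and reports the failures. */
class DiagnosticTool {
    public static void main(String[] args) {
        List<DiagnosableLayer> layers = new ArrayList<>();
        layers.add(new DatabaseLayerDiagnostics());
        // ... register L3 (Windows Service) and L4 (Client Application) the same way
        for (DiagnosableLayer layer : layers) {
            HealthReport report = layer.selfTest();
            System.out.println(layer.name() + ": "
                    + (report.healthy ? "OK" : "FAIL - " + report.detail));
        }
    }
}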
I guess what we're trying to achieve here is some kind of "integration test" framework or something similar that can be used at run time (not at compile/build time like a unit test). The inspiration came from physical devices that have their own on-board diagnostics, like cars. A good example in the software world is the power-on self-test that runs every time a computer is turned on.
Has anyone seen or heard of something like this? Any suggestions or pointers will surely help a lot!
You could have a common interface that each layer would implement as a WCF service. This way you could connect to every layer and diagnose it. Having these diagnostics available might be useful, but if you want to implement it everywhere (every layer), it becomes another thing that may fail - how would you diagnose that? Your system would swarm with WCF services, which is not good, as it requires a lot of maintenance and makes the system less stable. Plus, it requires a lot of work to implement.
The alternative I would suggest is to have a good logging system in place. The minimum would be to have every module log an error in all catch sections, but I suggest more than that, especially for debugging purposes. I recommend using Log4Net, which is free and very flexible. It is effective at not logging when it is not required, meaning that you can set the logging level high and it does not impact performance, even in production code. You can change the logging level at runtime by changing a setting in a config file. I use Log4Net a lot and it works well.
Once you have your code logging stuff, you can configure Log4Net so that all logs go into a central database. You would then have one place where you could relatively easily diagnose what's going on, what has failed, where, and what the exception or message was. You can even set up email messages to be sent when something goes wrong.

Monitoring service with web API and GUI

I need to build a cross-platform (Linux/Windows) application which runs in the background on the server. It monitors the local system via network sockets, monitors CPU usage, uses some external commands to find out the state of the system, starts/stops non-responsive or faulty processes on the same server, reports results to a remote MySQL cluster, and exposes a web API and GUI so one can see what is happening and configure it.
If this process faults, it needs to auto-restart, send a mail notification, etc.
I know it can sound weird, but the app could be designed to run, say, 100 monitoring threads (each as a separate thread), none of which blocks the others. This is easier to implement than a single thread with all non-blocking I/O.
Note that simplicity of implementation is more important here than coding philosophy. This also matters because of the real-time requirements: each check must be performed every second and must act instantly.
I have done such an app in C# and it took me only one day. It uses Windows-specific stuff, so it's not portable to Mono at all, but it has now been running for 6 months on 40 production systems and has never faulted, because it handles all exceptions properly. That's why I would prefer to do this in a language which has nice try/catch statements like C#; it's just plain effective.
This Windows app also uses a standard WinForms/WPF interface for configuration etc., but for Linux I guess I would need to use a web-based GUI.
Now I need to do the same for Linux, but I would rather build something which runs on Windows too, so both share at least part of the same code.
There are some performance requirements - this app receives UDP messages on 50 ports (it needs to make sure that the server is receiving the streams), each at 10 Mbit, and C# has no problem handling that. It's really just checking whether the streams are on; it's not analyzing the packets.
I would like to ask: what is the best architecture/language to design and implement this kind of app? E.g. one could do a PHP GUI and a C/C++ (or Perl or Python) backend monitor, but to run PHP I would need to run Apache, which is added complexity.
Since these 10 Mbit streams must be handled such that exactly 1 second of the UDP stream not coming in is detected, I was wondering whether Java, with a garbage collector which likes to pause, is a good choice (well, these pauses seem to be proportional to memory usage, which would not be huge in this case). I guess it could be a viable option, but I don't know how to design such an app so that it runs a proper background process as well as a web interface.
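For example, this is roughly how I imagine one of those monitoring threads in Java (an untested sketch, port numbers made up): a blocking receive with a one-second timeout, so a silent stream gets flagged immediately:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

/** Watches one UDP port; flags the stream as down if nothing arrives for 1 second. */
class StreamWatchdog implements Runnable {

    private final int port;

    StreamWatchdog(int port) {
        this.port = port;
    }

    @Override
    public void run() {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            socket.setSoTimeout(1000);                    // the 1-second requirement
            byte[] buf = new byte[64 * 1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    socket.receive(packet);               // any packet means the stream is alive
                } catch (SocketTimeoutException e) {
                    reportStreamDown(port);               // act instantly: restart process, send mail, ...
                }
            }
        } catch (Exception e) {
            reportStreamDown(port);                       // a socket failure also counts as down
        }
    }

    private void reportStreamDown(int port) {
        // placeholder: write to the MySQL cluster, send mail, restart the faulty process, etc.
        System.err.println("No UDP traffic on port " + port + " for 1 second");
    }
}

class Monitor {
    public static void main(String[] args) {
        // One watchdog thread per monitored port, e.g. 50 ports starting at 5000 (made-up numbers).
        for (int port = 5000; port < 5050; port++) {
            new Thread(new StreamWatchdog(port), "watchdog-" + port).start();
        }
    }
}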
Do I need to run GlassFish or Tomcat? Do I need to use any libraries?
These servers have lots of memory and CPU.
It's mainly just for managing availability in the clusters.
I used to program Java 10 years ago in Eclipse and NetBeans, but I have lots of time for learning :D I have both installed on my laptop - they are both very nice.

Testing fault tolerant code

I'm currently working on a server application where we have agreed to try to maintain a certain level of service. The level of service we want to guarantee is: if a request is accepted by the server and the server sends an acknowledgement to the client, we want to guarantee that the request will happen, even if the server crashes. As requests can be long-running and the acknowledgement time needs to be short, we implement this by persisting the request, then sending an acknowledgement to the client, then carrying out the various actions to fulfill the request. As actions are carried out they too are persisted, so the server knows the state of a request on start-up, and there are also various reconciliation mechanisms with external systems to check the accuracy of our logs.
This all seems to work fairly well, but we have difficulty saying so with any conviction as we find it very difficult to test our fault-tolerant code. So far we've come up with two strategies, but neither is entirely satisfactory:
Have an external process watch the server process and then try to kill it off at what the external process thinks is an appropriate point in the test
Add code to the application that will cause it to crash at certain known critical points
My problem with the first strategy is that the external process cannot know the exact state of the application, so we cannot be sure we're hitting the most problematic points in the code. My problem with the second strategy, although it gives more control over where the fault takes place, is that I do not like having code to inject faults within my application, even with optional compilation etc. I fear it would be too easy to overlook a fault injection point and have it slip into a production environment.
I think there are three ways to deal with this. First, if available, I would suggest a comprehensive set of integration tests for these various pieces of code, using dependency injection or factory objects to produce broken actions during these integrations.
Secondly, running the application while issuing random kill -9s and disabling network interfaces may be a good way to test these things.
I would also suggest testing file system failure. How you would do that depends on your OS; on Solaris or FreeBSD I would create a ZFS file system in a file and then rm the file while the application is running.
If you are using database code, then I would suggest testing failure of the database as well.
Another alternative to dependency injection, and probably the solution I would use, is interceptors: you can enable crash-test interceptors in your code, and these would know the state of the application and introduce the failures listed above, or any others you may want to create, at the correct time. It would not require changes to your existing code, just some additional code to wrap it.
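To make the interceptor/injection idea concrete, here's a rough sketch (Java, with made-up interface names) of wrapping a real action in a JDK dynamic proxy that fails at a chosen call, so no fault-injection code has to live in the production classes themselves:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical application interface: one step in fulfilling a persisted request. */
interface RequestAction {
    void execute(String requestId);
}

/** Test-only interceptor: delegates to the real action, then fails at a chosen call. */
class CrashingInterceptor implements InvocationHandler {

    private final Object target;
    private final String methodToFail;
    private final int failOnCall;
    private int calls;

    CrashingInterceptor(Object target, String methodToFail, int failOnCall) {
        this.target = target;
        this.methodToFail = methodToFail;
        this.failOnCall = failOnCall;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();   // preserve the real action's own exceptions
        }
        if (method.getName().equals(methodToFail) && ++calls == failOnCall) {
            // Simulate a failure right after the real work happened but before the caller
            // could persist or acknowledge it. (If the server under test runs in its own
            // process, the test could kill that process here instead of throwing.)
            throw new IllegalStateException("injected fault in " + method.getName());
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, String methodToFail, int failOnCall) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new CrashingInterceptor(target, methodToFail, failOnCall));
    }
}

// In an integration test, inject the wrapped action instead of the real one, e.g.:
//   RequestAction action = CrashingInterceptor.wrap(realAction, RequestAction.class, "execute", 3);
//   server.setRequestAction(action);   // hypothetical setter on the system under test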
A possible answer to the first point is to multiply experiments with your external process so that the probability of hitting problematic parts of the code is increased. Then you can analyze the core dump file to determine where the code actually crashed.
Another way is to increase observability and/or controllability by stubbing library or kernel calls, i.e. without modifying your application code.
You can find some resources on the Fault Injection page of Wikipedia, in particular in the Software Implemented Fault Injection section.
Your concern about fault injection is not a fundamental one. You merely need a foolproof way to prevent such code from ending up in deployment. One way to do so is by designing your fault injector as a debugger, i.e. the faults are injected by a process external to your process. This already provides a level of isolation. Furthermore, most OSes provide some kind of access control which prevents debugging unless specifically enabled. In the most primitive form this means limiting it to root; on other operating systems it requires a specific "debug privilege". Naturally, in production nobody will have that, and thus your fault injector cannot even run there.
Practically, the fault injector can set breakpoints at specific addresses, i.e. a function or even a line of code. You can then react to that, e.g. by terminating the process after a certain breakpoint has been hit three times.
I was just about to write the same as Justin :)
The component I would suggest replacing during testing is the logging component (if you have one; if not, I'd strongly suggest implementing one...). It's relatively easy to replace it with code that generates errors, and the logger usually gets enough information to know the current application state.
It also seems feasible to make sure that the testing code doesn't go into production. I would discourage conditional compilation, though, and rather go with some configuration file to select the logging component.
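A rough sketch of what I mean (made-up names, shown in Java): the logger implementation is picked from configuration at startup, and only a test configuration ever selects the fault-injecting one:

/** The application's own logging abstraction (assumed to exist in some form). */
interface AppLogger {
    void info(String message);
}

/** Normal implementation used in production. */
class ConsoleLogger implements AppLogger {
    public void info(String message) {
        System.out.println(message);
    }
}

/** Test-only implementation: behaves normally, then fails on the Nth call. */
class FaultInjectingLogger implements AppLogger {
    private final AppLogger delegate;
    private final int failOnCall;
    private int calls;

    FaultInjectingLogger(AppLogger delegate, int failOnCall) {
        this.delegate = delegate;
        this.failOnCall = failOnCall;
    }

    public void info(String message) {
        delegate.info(message);
        if (++calls == failOnCall) {
            // The logger sees enough context to fail at a well-defined application state.
            throw new RuntimeException("injected fault after logging: " + message);
        }
    }
}

/** Selected via configuration (a system property here), never via conditional compilation. */
class LoggerFactory {
    static AppLogger create() {
        AppLogger base = new ConsoleLogger();
        String failAt = System.getProperty("test.logger.failOnCall");   // unset in production
        return failAt == null ? base : new FaultInjectingLogger(base, Integer.parseInt(failAt));
    }
}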
Using "random" kills might help to detect errors but is not well suited for systematic testing because of its non-determinism. Therefore I wouldn't use it for automatic tests.

Unattended Processing - Application Automation

I'm looking for information [I hesitate to infer "Best Practices"] for Automating Applications. I'm specifically referring to replacing that which is predictably repeatable through traditional manual means [humans manipulating the GUI] with something that is scheduled by the User and performed "Automatically".
We use AutoIT internally for performing Automated Testing and have considered the same approach for providing Unattended Processing of our applications, but we're reluctant due to the possibility of the user "accidentally" interacting with the Application in parallel with the execution of a scheduled "automation" and therefore "breaking" the automation.
Short of building our own scheduler into the application, with known events and fixed arguments for controlling a predefined set of actions, what approaches should I evaluate/consider, and which tools would be required?
Additional Information:
Some would refer to this capability as "Batch Processing" within the application context.
In general it is a hazardous practice to automate UIs. It can be a useful hack for a short term problem: I find myself using AutoHotKey to run some tedious tasks in some situations... but only if the task is not worthy of writing code to implement the change (i.e., a one time, 15 minute task).
Otherwise, you will likely suffer from inconsistent runs due to laggy response of some screens, inconsistent UIs, etc. Most applications have an API available, and not using it is going to be far more painful than acquiring and using it in 99% of cases.
In the unfortunate but possible situation that there is no API and you are reduced to screen scraping/manipulation, a tool that performs automated testing is probably as good as you will get. It allows you to verify the state of the app (to some degree) and thus lets you build in some safety nets. Additionally, I would dedicate a workstation to this task... with the keyboard and mouse locked away from curious users. (A remote desktop or VNC style connection works well for this: you can kick off the process and disconnect, making it resistant to tampering.)
However, I would consider that approach only as a desperate last resort. Manipulating an API is far, far, far, far (did I get enough "fars" in there?) more sustainable.
If I understand correctly, you want to do automated processing using some tool that will execute a predefined list of actions in a given software system, this being different from automated testing.
I would urge you to avoid using tools meant for testing to perform processing. Many major software systems have public APIs you can use to perform actions without direct user interaction. This is a much more robust and reliable way to schedule automated processes. Contact the vendor of the software you are working with; sometimes the APIs are available upon request.
Godeke and Dave are absolutely correct that, if available, the API is the best route. However, in practice this is sometimes not possible and you have to go the GUI automation route. In addition to the previously mentioned dedicated workstation(s) to run the automation, I recommend coding in some audit trails so that it is easier to debug or backtrack if problems arise. Your batch processing automation should keep a detailed log of which records were processed, when they were processed, and how they were processed. You should set it up so that the records themselves (in the native application) reflect that they were updated/processed via automation. For example, if each record has an updateable notes/comments field, the automation should add text to this field like "Processed by automation user, 2009-02-25 10:05:11 AM, Account field changed from 'ABC123' to 'DEF456'". That way, the automated mods will be readily apparent to a user manually pulling up the record in the GUI.
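As a small illustration of that audit trail (a hypothetical helper, sketched in Java): build one note string per record, stamp it into the record's comments field, and append the same line to the batch run's log file:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: one audit line per automated change, stamped on the record and logged. */
class AutomationAudit {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Builds the note text the automation writes into the record's notes/comments field. */
    static String noteFor(String user, String field, String oldValue, String newValue) {
        return String.format("Processed by %s, %s, %s field changed from '%s' to '%s'",
                user, LocalDateTime.now().format(STAMP), field, oldValue, newValue);
    }

    /** Appends the same entry, plus the record id, to the batch run's audit log. */
    static void logRecord(String recordId, String note) throws IOException {
        try (PrintWriter log = new PrintWriter(new FileWriter("automation-audit.log", true))) {
            log.println(recordId + " | " + note);
        }
    }
}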