Are there well-identified patterns for software scalability testing? - testing

I've recently become quite interested in identifying patterns for software scalability testing. Due to the variable nature of different software solutions, it seems to like there are as many good solutions to the problem of scalability testing software as there are to designing and implementing software. To me, that means that we can probably distill some patterns for this type of testing that are widely used.
For the purposes of eliminating ambiguity, I'll say in advance that I'm using the wikipedia definition of scalability testing.
I'm most interested in answers proposing specific pattern names with thorough descriptions.

All the testing scenarios I am aware of use the same basic structure for the test which involves generating a number of requests on one or more requesters targeted at the processing agent to be tested. Kurt's answer is an excellent example of this process. Generally you will run the tests to find some thresholds and also run some alternative configurations (less nodes, different hardware etc...) to build up an accurate averaged data.
A requester can be a machine, network card, specific software or thread in software that generates the requests. All it does is generate a request that can be processed in some way.
A processing agent is the software, network card, machine that actually processes the request and returns a result.
However what you do with the results determines the type of test you are doing and they are:
Load/Performance Testing: This is the most common one in use. The results are processed is to see how much is processed at various levels or in various configurations. Again what Kurt is looking for above is an example if this.
Balance Testing: A common practice in scaling is to use a load balancing agent which directs requests to a process agent. The setup is the same as for Load Testing, but the goal is to check distribution of requests. In some scenarios you need to make sure that an even (or as close to as is acceptable) balance of requests across processing agents is achieved and in other scenarios you need to make sure that the process agent that handled the first request for a specific requester handles all subsequent requests (web farms are commonly needed like this).
Data Safety: With this test the results are collected and the data is compared. What you are looking for here is locking issues (such as a SQL deadlock) which prevents writes or that data changes are replicated to the various nodes or repositories you have in use in an acceptable time or less.
Boundary Testing: This is similar to load testing except the goal is not processing performance but how much is stored effects performance. For example if you have a database how many rows/tables/columns can you have before the I/O performance drops below acceptable levels.
I would also recommend The Art of Capacity Planning as an excellent book on the subject.

I can add one more type of testing to Robert's list: soak testing. You pick a suitably heavy test load, and then run it for an extended period of time - if your performance tests usually last for an hour, run it overnight, all day, or all week. You monitor both correctness and performance. The idea is to detect any kind of problem which builds up slowly over time: things like memory leaks, packratting, occasional deadlocks, indices needing rebuilding, etc.
This is a different kind of scalability, but it's important. When your system leaves the development shop and goes live, it doesn't just get bigger 'horizontally', by adding more load and more resources, but in the time dimension too: it's going to be running non-stop on the production machines for weeks, months or years, which it hasn't done in development.

Related

PDF generation performance

I can't find enough data about pdf generation performance. I'm planning to create some system and one of its features is to generate PDFs. Mostly simple ones that have about 3-5 pages only with text and tables, occasionally some logo.
What's bothering me is the requirement to support high user traffic (about 2500 requests per second).
Do you know any tools (preferably in java) that are fast and reliable to serve that bunch of users as fast as possible ? How long will it take to serve this amount of people on a single, average machine? I would appreciate any info about experience on this topic.
You almost certainly have to execute some tests with your typical workload on your typical machine. This is probably the only way you can evaluate whether any tools will be able to do what you need.
2500 requests per second is a non-trivial requirement so you are right to be concerned. If that 2500/sec is a sustained load and each request has to produce the 3-5 page pdf you simply might not be able to keep up on a "single average machine". It's not only processing power you'll have to consider, but memory and IO performance.
From experience iText is fast and Docmosis has some built-in facilities to distribute load to other hosts. I've seen both working stably under load. Be careful with memory management when you have that many documents on the fly - if you fall behind you might "blow up" no matter what document engine you use.

stress testing web applications on less capable hardware

My organization is having an interesting internal debate right now that raises a question that I would like to open to the community at large.
The issue at hand is our environment in which we do stress-testing, capacity-testing, performance-regression-testing, and the like.
On one side of the debate are some software engineers who would like this environment to mirror the production environment as much as possible, in the interest of making the results as meaningful as possible. While we currently do have an environment for such testing, it is far less capable than the production system, and these software engineers feel that they are reaching the limits of what they can learn from it.
On the other side of the debate are some network engineers who both administer the environments and control the purse-strings. While they concede that capacity-testing would be better in an environment that is a better replica of the production environment, they argue that – for the purposes of stress testing – a more modest environment would have the effect of magnifying performance bottlenecks, making them easier to discover and fix.
This finally brings us to the part that piqued my interest: one software engineer suggests that while a more modest stress environment will increase the likelihood that you will encounter some bottleneck, it does not necessarily follow that it would help you find the next bottleneck you may encounter in production. The scaling effect, he argues, may not be linear.
Is there merit to that point of view? If yes, then why? What are the sources of that nonlinearity?
There are a lot of moving parts involved here: a cluster of java application servers, a cluster of database servers, lots of dynamic content being generated for each HTTP hit.
Edit: I appreciate everybody's thoughts so far, but I was really hoping that someone would do more than re-affirm one side or the other and actually tackle the question of "why". If there is such nonlinearity, what gives rise to it? Better yet it would be great if the reasons were expressed in terms of the CPU, memory, bandwidth, latency, interactions between subsystems, what have you... TerryE, you have come the closest. You should re-post your comment as an answer for the bounty if no one else steps up
Your software developer is right and I will take the point even further.
When you test an application components, like a web service, to see its behaviour under load, it is understandable to use a less capable environment. You can find the bottlenecks about memory, io etc. And most probably will find bugs and oversights like out of memory errors and log files getting huge.
But when your application components are running as intended and you need to test the whole shebang, you need to test the real environment.
When you run stress tests on an environment, you measure that environment's behaviour under load and its bottlenecks. While this tests may provide valuable information, this information will not be about your production system. The bottlenecks you find might not be relevant to your real system and you may spend precious development time to fix the bugs that do not exist. To know about bottlenecks you really might face with, you should run your stress tests on your real production system (preferably before the grand opening).
The assumption of the network engineers is that modest system is basically a scale model of the production system. They are also assuming that the various characteristics of the production environment which would be affecting the software performance are mirrored in the more modest system just at lower levels however in the same ratios. For instance, the CPU is not as fast, there is not quite as much memory, the storage is a bit slower, etc. and all of these differences are in similar ratios such that if everything were magically multiplied by some factor, say 1.77, the resulting changed modest system would be exactly like the production system.
However that the modest system is an exact scale model in all particulars of the production system is difficult for me to believe.
Here is a specific example. Lets say that measurements on the production system indicates that CPU utilization, the percentage of time the CPU is not idle, is too high. So you put the software on the modest system and do measurements and discover that on the modest system, the CPU utilization is lower. An investigation reveals that the modest system has slower storage so the CPU is spending more time idle waiting on data transfer from storage to complete because the application is I/O bound on the modest system where as on the production system it is not. This difference is due to the modest machine not being an exact scale model of the production machine because the CPU ratio is different than the I/O transfer ratio.
Another example would be having more memory allowing fewer page faults in the production environment. When the software is loaded onto the more modest machine, there are more page faults due to having less physical memory. With the various applications paging in and out, they begin to affect each other as pages of other applications are swapped out and then swapped back in again. On the product machine with larger memory, this cascading page fault behavior is not seen because there is sufficient memory to hold more applications simultaneously.
The point that I am really trying to make here is that a computer with all its various parts and applications is a complex, dynamic system. The idea that one computing environment is just a scale model of another is too simplistic of an assumption. Using a modest system can certainly provide valuable data. However once the gross adjustments have been made to the software and you are beginning to get into more subtle detailed adjustments, the differences in the environment can have a large impact on the results of the testing.
Some citations.
Computer systems are dynamical systems by Tod Mytkowicz, Amer Diwan, and Elizabeth Bradley.
Bayesian fault detection and diagnosis in dynamic systems by Uri Lerner, Ronald Parr, Daphne Koller, and Gautam Biswas.
I have encountered the similar situation in my production environment. We use modest system just for initial and basic level testing and findings. It is true that you can never find real bottlenecks and other performance issues on your testing environment. So to find real performance related issues and to find bottlenecks you must do it on production environment, there's no other way.
We have hosted over 2.5Million websites, although it might not be case with yours but let me tell you this, that in our case, we have faced horrible situations of linear bottlenecks. Meaning, we first faced memory issues when our traffic was getting increased. We resolved that by adding more memory. Until then we didn't even notice that having only 256 threads of httpd was our next bottleneck because limited memory was hiding it, once we resolved memory issue it quickly came down to the problem that why our webservers were slow again after just few weeks? We found out that 256 httpd threads are just not enough to serve that much traffic. We not only increased threads but also installed HA parallel load balancers in front of our WebServers to mitigate the issue.
Fortunately it solved our slow page loading problems. But after few months as traffic continuously grew we got into next bottleneck of storage system. You know what this time disk I/Os was the issue. To make the story short, we put parallel NFS based physical storage systems. Each NFS machine now serve files by having over 2000 threads running.
I forgot to mentioned that Database was also a big culprit of slowness that we resolved that issue by installing Master-Slaves model in cluster. We had to do a lot of performance tweaks in our application code as well and we had to physically distribute our application into different modules over different servers.
I'm just mentioning all this to prove a point that it's very likely that all performance related issues almost come in a linear way, at least that's what we have faced in our WebBased model. Even you have tested a lot on your modest systems you still have chances hidden bottlenecks which you can't find on testing environments.
What I have learned in my last 6 years experience that try to DISTRIBUTE your model AS MUCH AS POSSIBLE if you think you might going to have a lot of traffic or hits/sec. Centralized model can hold your traffic for some time by doing much tweaks but in the end your system gets busted.
I'm not saying you will face some bottlenecks or issues in your situation but I just wanted to warn you that these cases happen sometimes, just so you better aware.
**Sorry for my English.
good question. learning and optimization is best on modest hardware. but testing is safer on mirror (or at least something from same epoch)
it seams like you try to predict the first bottleneck that will appears and when it will happen. i'm not sure if that's the correct objective and the correct way. i assume we don't speak about a typical CRUD where client says 'it should work as fast as every other web application'. if you want to do tests correctly then, before you start your tests, you should know the expected load. expected number of users, expected number of events, response time etc. it's a part of your product specification. if you don't have the numbers, that means your analysts didn't do their job.
if you have the numbers then you don't need exact tests result. you just need to know the order of magnitude. you should also check how your software/hardware scale. how many instances do you need to handle x users/requests/whatever and how many to handle y
We load test systems for our customers every day -- and we see a wide range of problems. Certain classes of problems can be found on down-sized systems. Other cannot. Some can ONLY be found in production...because no matter how closely you mirror the two systems, they can never be identical. You can get REALLY close, if you work hard enough.
So, simple fact of testing: the closer your system is to the production system, the more accurate your tests will be.
IMO, this is one of the best reasons for moving to the cloud: you can spin up a system that is very close to your production system (about as identical as you could ever get) and run your load tests on that.
It is probably worth mentioning that we've occasionally seen customers waste a lot of hours chasing problems in their test environments that never would have occurred in production. The more different the environments are, the more likely this is to happen :(
I think you have partially answered your own question - you already have a production level environment and are already finding it is not at the same level / not as capable as the production environment. The bottom line is that with all the money in the world you will never be able to replicate the exact functioning of the production website - timings of events, volumes, cpu utilisation, memory utilisation, db IO, when it's all working in anger the behaviour can be non-deterministic to a certain extend. My point is you can never make it exactly the same. And on the other side of the coin a production environment by it's nature is going to be an expensive environment with a lot of kit in order to make it perform and handle your production volume of data / transactions. This is a big expense / overhead to the business, and in these times of frugality should we not be looking to avoid additional cost to the business.
Maybe a different tactic should be taken - learn the performance profile of your production software - how it scales with volume, does running times increase linearly, exponentially or logarithmically? Can you model this? Firstly you can verify that the test environment is behaving in a similar way - this is key to having a valid test. Then the other important part is taking relative tests rather than absolutes - you aren't going to get absolute running times that are the same as production, but run your performance tests before deploying the code changes to give you your baseline, then deploy your code changes and re-run the performance tests - this will give you the relative changes in production (e.g. will the performance degrade with this code release), based on your models of performance you will be able to verify that the software is scaling in the same way with extra volume.
So my viewpoint is that there is a great deal you can learn about your software and hardware performance in the lower environment, and doing this on a smaller / less capable infrastructure saves your company money, and if used right can give you most of your answers to performance testing that you are looking for.

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximation of max concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching(like memcached or similiar (still one server))
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism, I am just trying to figure out when its time to move from one implementation to the next, because they each come with their own implementation feat either programming wise or setup wise.
In your question you talk about caching and this is probably one of the most important factors in a web architecture r.e performance and capacity.
Memcache is useful, but actually, before that, you should be ensuring proper HTTP cache directives on your server responses. This does 2 things; it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). This can also be improved by using an HTTP accelerator like Varnish and a CDN.
Another factor to consider is whether your system is stateless. By stateless, it usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible. The less state the more horizontally scalable a system. Most people introduce state when confronted with issues of personalisation - i.e serving up different content for different users. In such cases you should first investigate using the HTML5 session storage (i.e store the complete user data in javascript on the client, obviously over https) or if the data set is smaller, secure javascript cookies. That way you can still serve up cached resources and then personalise with javascript on the client.
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system then again it should be quite easy to horizontally scale. If there are reads and writes, its typically better to separate the read write datasets into a separate database and have the read only in another. You can then use more relevant methods to scale.
These setups do not spit out a single answer that you can then compare to each other. The answer will vary on way more factors than you have listed.
Even if they did spit out a single answer, then it is just one metric out of dozens. What makes this the most important metric?
Even worse, each of these alternatives is not free. There is engineering effort and maintenance overhead in each of these. Which could not be analysed without understanding your organisation, your app and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.

First Time Architecturing?

I was recently given the task of rebuilding an existing RIA. The new RIA that I've designed is based on Silverlight, with a WCF service to connect to MS SQL Server. This is my first time doing something like this, so I'm not sure how to design the entire thing.
Basically, the client can look through graphs of "stocks" (allowing the client to choose different time periods, settings, etc). I've written the whole application essentially, but I'm not sure how to put it together.
The graphs are supposed to be directly based on the database, and to create the datapoints on the graph, some calculations need to be done (not very expensive ones).
The problem I'm having is to decide where to put the calculations (client or serverside? Or half and half?)
What factors should I look for to help me decide where the calculations should be done? And how can I go about optimizing this (caching, etc)?
Obviously this is a very broad subject, so I'm not expecting an immediate answer, but any help/pointing in the right direction/resources would be appreciated.
A few tips for this kind of app.
Put as much logic as possible on the client.
Make the client responsible for session data, making all your server code stateless.
Try to minimize traffic to and from the server (Bigger requests are more efficient than multiple smaller ones) so consolidate requests when possible.
If this project is likely to grow beyond it's current feature set I think it's probably a good idea to perform the calculations client side. This can avoid scaling issues, because you're using all the client side CPUs ratther than you're single, precious server CPU. This does however rely on being able to transfer the required data to the client in an efficient way, otherwise you replace a processor bottleneck with a network bottleneck.
As for caching it depends on your inputs, what variables can users of the client affect? If any of the variables they can alter are discrete (ie they can be a fixed set of values) then they're candidates for caching. For example if a user can select a date range of stock variations to view then that's probably not so useful, if however they can only select a year then you could cache your data sets by year (download each data set to the client and perform your calculation). I'd not worry about caching too much unless you find it's a real performance problem, it'll only make your code more complex, so don't add it until you have proven you need it.
One other thing, if this project is unlikely to be a long term concern then implement the calculations wherever is easiest and fastest, you can revisit if the project becomes more important later on.
Be REALLY REALLY careful about implementing client-side caching. Caching is INSANELY hard to do right while maintaining performance, security and correctness. Note that your DB Server's caching mechanism is already likely to be way better than any local caching mechanism you're likely to implement in less than 2 weeks' effort!
I would urge you to do as much work on the back-end as possible and to limit your client to render the data in a manner that is appropriate for your users. While many may balk at this suggestion, it's based on a number of observations from building many such systems in the past:
If you're going to filter some of the data returned by your service, you've just wasted thousands of clock cycles shipping data that need never have left your server
If you're going to sort your data, your DB could have done the sorting for you (often using otherwise idle CPU ticks) while waiting for the data to be read from its disks.
Your server most likely has more CPU and RAM available than your clients and has a surprising amount of "free time" to use for sorting, filtering, running inline calculations, etc., while its waiting for disks to read sectors etc.
As Roman suggested: Minimize your round-trips between your client and your server as much as possible.
But perhaps most importantly:
BEFORE YOU START DESIGNING YOUR SYSTEM, state your performance goals
Design what you think will achieve those goals. Try to find bottlenecks in your design, particularly areas where you make blocking calls. Re-design those areas to use async patterns wherever you can.
Build your intended solution
Measure your actual perforamnce under actual real-world load
If you're within your expected performance goals, then you're done.
If not, work out where you're spending too long and tune the design of that portion of the system. Goto 3.
Don't try to build the perfect system in one try - chances are that you won't manage it, no matter how hard you try, for a variety of reasons including user expectations, your servers ability to process the required load, your clients' ability to handle the returned data, your network's ability to carry the traffic, etc.
They're a little old now, but I suggest you read through some of the earlier posts at http://blogs.msdn.com/richardt for more thoughts around designing and constructing Service Oriented and distributed systems.

How often should applications be stress or load tested?

Is there a rule on how often an application should be stress or load tested? I normally do it before putting into production a new version, when the hardware changes or when the expected amount of users is known to change.
But today i'm asked if this should be a standard practice for an application that is in production even if no changes are introduced. If so, how often?
It really depends on how you want to address it for your company's needs. Personally, we load test our integration (test) builds daily - just like the builds go out. After the build runs at approx 1a, we have it scripted to be load tested as well. Our goal is specifically looking for build over build changes in performance. Even if we do not introduce changes into the code, the servers that the code is load tested on still recieves updates/patches/hot fixes/service packs/etc. At worst, once automated, it provides additional historical data.
We are going this route (build relativity) because it is cost prohibitive to try and replicate our hardware environment in production. In the event that we see a sudden change (or gradual changes) to key performance monitors, we can look into what changesets were introduced at that time and isolate potential code changes that adversely impacted performance.
From the sound of it, you are testing against a lab that replicates production? That is a different approach then we had, because we are going under the assumption that most of our bottlenecks would be code-induced and not directly dependent on hardware. We use VMs to approximate, but not duplicate, our production environment.
One thing that affects system performance, even though the code is unchanged, is data.
An example might be performance of a database query. As data is added to a table the cost of maintaining indexes goes up. Page splits in the index can degrade performance. As indexes grow, the number of 'levels' in the index will every so often have to be increased. When that happens you see a sudden, apparently inexplicable change in performance.
Running stress tests in a production environment is not always possible - it affects your day-to-day business. More often systems are instrumented to provide on-going feedback about performance. Maybe using something like ganglia. The data are used to detect issues and for capacity planning.
I think that whenever you change something in the application - code, data files, use-cases - and that includes but is not limited to expected amount of users, you should test it.
My two cents: sometimes you won't have changed anything and your site's performance could suffer. It could be from the app handling too much data (ie: caches overflowing). It could be from third party advertisements on the site slowing down. Heck, it could be because of fault RAM!
The main thing is that while it's generally advised you do testing after any known change, it's also not a bad idea to do occasional performance testing to check for possible unknown changes that could affect performance.
My company, BrowserMob, provides a free website monitoring service that runs real Selenium scripts every X minutes against your site. While it's not a load test, it definitely can help you identify trends and bottlenecks in your production site.
This depends on how mission-critical your system is ... if it is just a small tool that one can do without, once you put it on production. If your life depends on it, after every single build.
As far as I'm concerned, that is.
Depends on how much effort the testing requires. If it is easy enough, more testing never hurts. However, I see no reason to test if there are no changes expected. If this would lead to the software not tested for a long time, then it might be appropriate to run the tests from time to time in case there are unexpected changes.
Needless to say, it also depends on whether your software is running a nuclear reactor or a bulletin board.
If no changes are introduced in your product and the load testing simply repeats the same process over and over for some interval, I don't see the benefit of re-running.
I used to have stress tests as part of my ant file, so I could run those tests every night, if I wanted, and I would run them when I was making any changes, or testing a possible new change, that would in one way or another impact what I was testing, to see if there was any improvement.
I think how often would depend on your environment.
If you can't really stress test in development then you may have to wait until you get to QA, where you are testing just before you go to production.
I think the sooner you do it the better, as you can find problems and fix them faster, since you know it worked two nights ago, last night it failed the test, so there was some change during the day that caused it.
I used junitperf for many of my stress tests.
I think we don't want to stress test Notepad. It's a joke. Never mind =).
Stress testing not for all applications. :-)
If your software can kill someone, i think you should.
Imagine someone dying because the blood didn't arrive because some timeout on the system.
"Timeout.exception: your heart are not coming anymore. Try again later"