Is it premature optimization to develop on slow machines? - premature-optimization

We should develop on slow boxen because it forces us to optimize early.
Randall Hyde points out in The Fallacy of Premature Optimization, there are plenty of misconceptions around the Hoare quote:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
In particular, even though machines these days scream compared with those in Hoare's day, it doesn't mean "optimization should be avoided." So does my respected colleague have a point when he suggests that we should develop on boxes of modest tempo? The idea is that performance bottlenecks are more irritating on a slow box and so they are likely to receive attention.

This should be community wiki since it's pretty subjective and there's no "right" answer.
That said, you should develop on the fastest machine available to you. Yes, anything slower will introduce irritation and encourage you to fix the slowdowns, but only at a very high price:
Your productivity as a programmer is directly related to the number of things you can hold in your head, and anything which slows down your process or impedes you at all lengthens the amount of time you have to hold those ideas in short term-memory, making you more likely to forget them, and have to go re-learn them.
Waiting for a program to compile allows the stack of bugs, potential issues, and fixes to drop out of your head as you get distracted. Waiting for a dialog to load, or a query to finish interrupts you similarly.
Even if you ignore that effect, you've still got the truth of the later statement - early optimization will leave you chasing yourself round in circles, breaking code that already works, and guessing (with often poor accuracy) about where things might get bogged down. Design your code properly in the first place, and you can forget about optimization until it's had a chance to settle for a bit, at which point any necessary optimization will be obvious.

Slow computers are not going to help you find your performance problems.
If your test data is only a few hundred rows in a table your db will cache it all and you'll never find badly written queries or bad table/index design. If your server application is not multi-threaded server you will not find that out until you stress test it with 500 users. Or if the app bottlenecks on bandwidth.
Optimization is "A Good Thing" but as I say to new developers who have all sorts of ideas about how to do it better 'I don't care how quickly you give me the wrong answer'. Get it right first, then make it faster when you find a bottleneck. An experienced programmer is going to design and build it reasonably well to start with.
If performance is really critical (real time? millisecond-transactions?) then you need to design and implement a set of benchmarks and tools to scientifically prove to yourselves that your changes are making it faster. There are way too many variables out there that affect performance.
Plus there's the classic programmer excuse they will bring out - 'but it's running slow because we have deliberately picked slow computers, it will run much faster when we deploy it.'
If your colleague thinks its important give him a slow computer and put him in charge of 'performance' :-)

I guess it would depend on what you're making and what the intended audience is.
If you're writing software for fixed hardware (say, console games) then use equipment (at least test equipment) that is similar or the same as what you will deploy on.
If you're developing desktop apps or something in that realm then develop on whatever machine you want and then tune it afterward to run on the desired min-spec hardware. Likewise, if you're developing in-house software, there is likely to be a min-spec for the machines that the company wants to buy. In that case, develop on a fast machine (to decrease development time and therefore costs) and test against that min-spec.
Bottom line, develop on the fastest machine you can get your hands on, and test on the minimum or exact hardware that you'll be supporting.

If you are programming on hardware that is close to the final test and production environments, you tend to find that there are less nasty surprises when it comes time to release the code.
I've seen enough programmers get side-swiped by serious, but unexpected problems caused by their machines being way faster than their most of their users. But also, I've seen the same problem occur with data. The code is tested on a small dataset and then "crumbles" on a large one.
Any differences in development and deployment environments can be the source of unexpected problems.
Still, since programming is expensive and time-consuming, if the end-user is running slow out-of-date equipment, the better solution is to deal with it at testing time (and schedule in a few early tests just to check usability and timing).
Why cripple your programmers just because you're worried about missing a potential problem? That's not a sane development strategy.
Paul.

for the love of Codd, use profiling tools, not slow development machines!

Optimization should be avoided, didn't that give us Vista? :p
But in all seriousness, its always a matter of tradeoffs. Important questions to ask yourself
What platform will your end users be using?
Can I drop cycles? What will happen if I do?
I agree with most that initial development should be done on the fastest or most efficient (not neccesarily the same) machine available to you. But for running tests, run it on your target platform, and test often and early.

Depends on your time to delivery. If you are in a 12 month delivery cycle then you should develop on a box with decent speed since your customers' 12 months from now will have better "average" boxes than the current "average".
As your development cycle approaches "today", your development machines should approach the current "average" speed of your clients' boxes.

I typically develop on the fastest machine I can get my hands on.
Most of the time I'm running a debug build, which is slow enough already.

I think it is a sound concept (but maybe because it works for me).
If my developer workstation is too fast I find I don't think ideas through thorougly enough simply because there is little time-penalty in re-generating the software image or downloading it to the target. I'd say at least half my downloads were unnecessary because I just remembered something I'd missed right before I was going to debug the code.
The target machine could well contain a throttled processor. If - on an embedded MCU - you have half the FLASH, RAM and clock cycles per second chances are developers will be a lot more careful when designing their code. I once suggested byte variables for the lengths of individual records in a data area (not in RAM but in a serial eeprom) and received the reply "we don't need to be stingy." A few months later they hit the RAM ceiling (128KiB). My reflection was that for this app there would never be any records larger than 256 bytes simply because there was no RAM to copy them to.
For server applications I think it would be a great idea to have a (much) lower-performing hardware to test on. Two or four cores instead of sixteen (or more). 1.6 GHz istdo 2.8. The list goes on. A server is usually - due to the very fact that everyone talks to it - a bottleneck in the system architecture. And that is long before you start developing the (server) application for it.

Related

stress testing web applications on less capable hardware

My organization is having an interesting internal debate right now that raises a question that I would like to open to the community at large.
The issue at hand is our environment in which we do stress-testing, capacity-testing, performance-regression-testing, and the like.
On one side of the debate are some software engineers who would like this environment to mirror the production environment as much as possible, in the interest of making the results as meaningful as possible. While we currently do have an environment for such testing, it is far less capable than the production system, and these software engineers feel that they are reaching the limits of what they can learn from it.
On the other side of the debate are some network engineers who both administer the environments and control the purse-strings. While they concede that capacity-testing would be better in an environment that is a better replica of the production environment, they argue that – for the purposes of stress testing – a more modest environment would have the effect of magnifying performance bottlenecks, making them easier to discover and fix.
This finally brings us to the part that piqued my interest: one software engineer suggests that while a more modest stress environment will increase the likelihood that you will encounter some bottleneck, it does not necessarily follow that it would help you find the next bottleneck you may encounter in production. The scaling effect, he argues, may not be linear.
Is there merit to that point of view? If yes, then why? What are the sources of that nonlinearity?
There are a lot of moving parts involved here: a cluster of java application servers, a cluster of database servers, lots of dynamic content being generated for each HTTP hit.
Edit: I appreciate everybody's thoughts so far, but I was really hoping that someone would do more than re-affirm one side or the other and actually tackle the question of "why". If there is such nonlinearity, what gives rise to it? Better yet it would be great if the reasons were expressed in terms of the CPU, memory, bandwidth, latency, interactions between subsystems, what have you... TerryE, you have come the closest. You should re-post your comment as an answer for the bounty if no one else steps up
Your software developer is right and I will take the point even further.
When you test an application components, like a web service, to see its behaviour under load, it is understandable to use a less capable environment. You can find the bottlenecks about memory, io etc. And most probably will find bugs and oversights like out of memory errors and log files getting huge.
But when your application components are running as intended and you need to test the whole shebang, you need to test the real environment.
When you run stress tests on an environment, you measure that environment's behaviour under load and its bottlenecks. While this tests may provide valuable information, this information will not be about your production system. The bottlenecks you find might not be relevant to your real system and you may spend precious development time to fix the bugs that do not exist. To know about bottlenecks you really might face with, you should run your stress tests on your real production system (preferably before the grand opening).
The assumption of the network engineers is that modest system is basically a scale model of the production system. They are also assuming that the various characteristics of the production environment which would be affecting the software performance are mirrored in the more modest system just at lower levels however in the same ratios. For instance, the CPU is not as fast, there is not quite as much memory, the storage is a bit slower, etc. and all of these differences are in similar ratios such that if everything were magically multiplied by some factor, say 1.77, the resulting changed modest system would be exactly like the production system.
However that the modest system is an exact scale model in all particulars of the production system is difficult for me to believe.
Here is a specific example. Lets say that measurements on the production system indicates that CPU utilization, the percentage of time the CPU is not idle, is too high. So you put the software on the modest system and do measurements and discover that on the modest system, the CPU utilization is lower. An investigation reveals that the modest system has slower storage so the CPU is spending more time idle waiting on data transfer from storage to complete because the application is I/O bound on the modest system where as on the production system it is not. This difference is due to the modest machine not being an exact scale model of the production machine because the CPU ratio is different than the I/O transfer ratio.
Another example would be having more memory allowing fewer page faults in the production environment. When the software is loaded onto the more modest machine, there are more page faults due to having less physical memory. With the various applications paging in and out, they begin to affect each other as pages of other applications are swapped out and then swapped back in again. On the product machine with larger memory, this cascading page fault behavior is not seen because there is sufficient memory to hold more applications simultaneously.
The point that I am really trying to make here is that a computer with all its various parts and applications is a complex, dynamic system. The idea that one computing environment is just a scale model of another is too simplistic of an assumption. Using a modest system can certainly provide valuable data. However once the gross adjustments have been made to the software and you are beginning to get into more subtle detailed adjustments, the differences in the environment can have a large impact on the results of the testing.
Some citations.
Computer systems are dynamical systems by Tod Mytkowicz, Amer Diwan, and Elizabeth Bradley.
Bayesian fault detection and diagnosis in dynamic systems by Uri Lerner, Ronald Parr, Daphne Koller, and Gautam Biswas.
I have encountered the similar situation in my production environment. We use modest system just for initial and basic level testing and findings. It is true that you can never find real bottlenecks and other performance issues on your testing environment. So to find real performance related issues and to find bottlenecks you must do it on production environment, there's no other way.
We have hosted over 2.5Million websites, although it might not be case with yours but let me tell you this, that in our case, we have faced horrible situations of linear bottlenecks. Meaning, we first faced memory issues when our traffic was getting increased. We resolved that by adding more memory. Until then we didn't even notice that having only 256 threads of httpd was our next bottleneck because limited memory was hiding it, once we resolved memory issue it quickly came down to the problem that why our webservers were slow again after just few weeks? We found out that 256 httpd threads are just not enough to serve that much traffic. We not only increased threads but also installed HA parallel load balancers in front of our WebServers to mitigate the issue.
Fortunately it solved our slow page loading problems. But after few months as traffic continuously grew we got into next bottleneck of storage system. You know what this time disk I/Os was the issue. To make the story short, we put parallel NFS based physical storage systems. Each NFS machine now serve files by having over 2000 threads running.
I forgot to mentioned that Database was also a big culprit of slowness that we resolved that issue by installing Master-Slaves model in cluster. We had to do a lot of performance tweaks in our application code as well and we had to physically distribute our application into different modules over different servers.
I'm just mentioning all this to prove a point that it's very likely that all performance related issues almost come in a linear way, at least that's what we have faced in our WebBased model. Even you have tested a lot on your modest systems you still have chances hidden bottlenecks which you can't find on testing environments.
What I have learned in my last 6 years experience that try to DISTRIBUTE your model AS MUCH AS POSSIBLE if you think you might going to have a lot of traffic or hits/sec. Centralized model can hold your traffic for some time by doing much tweaks but in the end your system gets busted.
I'm not saying you will face some bottlenecks or issues in your situation but I just wanted to warn you that these cases happen sometimes, just so you better aware.
**Sorry for my English.
good question. learning and optimization is best on modest hardware. but testing is safer on mirror (or at least something from same epoch)
it seams like you try to predict the first bottleneck that will appears and when it will happen. i'm not sure if that's the correct objective and the correct way. i assume we don't speak about a typical CRUD where client says 'it should work as fast as every other web application'. if you want to do tests correctly then, before you start your tests, you should know the expected load. expected number of users, expected number of events, response time etc. it's a part of your product specification. if you don't have the numbers, that means your analysts didn't do their job.
if you have the numbers then you don't need exact tests result. you just need to know the order of magnitude. you should also check how your software/hardware scale. how many instances do you need to handle x users/requests/whatever and how many to handle y
We load test systems for our customers every day -- and we see a wide range of problems. Certain classes of problems can be found on down-sized systems. Other cannot. Some can ONLY be found in production...because no matter how closely you mirror the two systems, they can never be identical. You can get REALLY close, if you work hard enough.
So, simple fact of testing: the closer your system is to the production system, the more accurate your tests will be.
IMO, this is one of the best reasons for moving to the cloud: you can spin up a system that is very close to your production system (about as identical as you could ever get) and run your load tests on that.
It is probably worth mentioning that we've occasionally seen customers waste a lot of hours chasing problems in their test environments that never would have occurred in production. The more different the environments are, the more likely this is to happen :(
I think you have partially answered your own question - you already have a production level environment and are already finding it is not at the same level / not as capable as the production environment. The bottom line is that with all the money in the world you will never be able to replicate the exact functioning of the production website - timings of events, volumes, cpu utilisation, memory utilisation, db IO, when it's all working in anger the behaviour can be non-deterministic to a certain extend. My point is you can never make it exactly the same. And on the other side of the coin a production environment by it's nature is going to be an expensive environment with a lot of kit in order to make it perform and handle your production volume of data / transactions. This is a big expense / overhead to the business, and in these times of frugality should we not be looking to avoid additional cost to the business.
Maybe a different tactic should be taken - learn the performance profile of your production software - how it scales with volume, does running times increase linearly, exponentially or logarithmically? Can you model this? Firstly you can verify that the test environment is behaving in a similar way - this is key to having a valid test. Then the other important part is taking relative tests rather than absolutes - you aren't going to get absolute running times that are the same as production, but run your performance tests before deploying the code changes to give you your baseline, then deploy your code changes and re-run the performance tests - this will give you the relative changes in production (e.g. will the performance degrade with this code release), based on your models of performance you will be able to verify that the software is scaling in the same way with extra volume.
So my viewpoint is that there is a great deal you can learn about your software and hardware performance in the lower environment, and doing this on a smaller / less capable infrastructure saves your company money, and if used right can give you most of your answers to performance testing that you are looking for.

How often should applications be stress or load tested?

Is there a rule on how often an application should be stress or load tested? I normally do it before putting into production a new version, when the hardware changes or when the expected amount of users is known to change.
But today i'm asked if this should be a standard practice for an application that is in production even if no changes are introduced. If so, how often?
It really depends on how you want to address it for your company's needs. Personally, we load test our integration (test) builds daily - just like the builds go out. After the build runs at approx 1a, we have it scripted to be load tested as well. Our goal is specifically looking for build over build changes in performance. Even if we do not introduce changes into the code, the servers that the code is load tested on still recieves updates/patches/hot fixes/service packs/etc. At worst, once automated, it provides additional historical data.
We are going this route (build relativity) because it is cost prohibitive to try and replicate our hardware environment in production. In the event that we see a sudden change (or gradual changes) to key performance monitors, we can look into what changesets were introduced at that time and isolate potential code changes that adversely impacted performance.
From the sound of it, you are testing against a lab that replicates production? That is a different approach then we had, because we are going under the assumption that most of our bottlenecks would be code-induced and not directly dependent on hardware. We use VMs to approximate, but not duplicate, our production environment.
One thing that affects system performance, even though the code is unchanged, is data.
An example might be performance of a database query. As data is added to a table the cost of maintaining indexes goes up. Page splits in the index can degrade performance. As indexes grow, the number of 'levels' in the index will every so often have to be increased. When that happens you see a sudden, apparently inexplicable change in performance.
Running stress tests in a production environment is not always possible - it affects your day-to-day business. More often systems are instrumented to provide on-going feedback about performance. Maybe using something like ganglia. The data are used to detect issues and for capacity planning.
I think that whenever you change something in the application - code, data files, use-cases - and that includes but is not limited to expected amount of users, you should test it.
My two cents: sometimes you won't have changed anything and your site's performance could suffer. It could be from the app handling too much data (ie: caches overflowing). It could be from third party advertisements on the site slowing down. Heck, it could be because of fault RAM!
The main thing is that while it's generally advised you do testing after any known change, it's also not a bad idea to do occasional performance testing to check for possible unknown changes that could affect performance.
My company, BrowserMob, provides a free website monitoring service that runs real Selenium scripts every X minutes against your site. While it's not a load test, it definitely can help you identify trends and bottlenecks in your production site.
This depends on how mission-critical your system is ... if it is just a small tool that one can do without, once you put it on production. If your life depends on it, after every single build.
As far as I'm concerned, that is.
Depends on how much effort the testing requires. If it is easy enough, more testing never hurts. However, I see no reason to test if there are no changes expected. If this would lead to the software not tested for a long time, then it might be appropriate to run the tests from time to time in case there are unexpected changes.
Needless to say, it also depends on whether your software is running a nuclear reactor or a bulletin board.
If no changes are introduced in your product and the load testing simply repeats the same process over and over for some interval, I don't see the benefit of re-running.
I used to have stress tests as part of my ant file, so I could run those tests every night, if I wanted, and I would run them when I was making any changes, or testing a possible new change, that would in one way or another impact what I was testing, to see if there was any improvement.
I think how often would depend on your environment.
If you can't really stress test in development then you may have to wait until you get to QA, where you are testing just before you go to production.
I think the sooner you do it the better, as you can find problems and fix them faster, since you know it worked two nights ago, last night it failed the test, so there was some change during the day that caused it.
I used junitperf for many of my stress tests.
I think we don't want to stress test Notepad. It's a joke. Never mind =).
Stress testing not for all applications. :-)
If your software can kill someone, i think you should.
Imagine someone dying because the blood didn't arrive because some timeout on the system.
"Timeout.exception: your heart are not coming anymore. Try again later"

Profiling a VxWorks system

We've got a fairly large application running on VxWorks 5.5.1 that's been developed and modified for around 10 years now. We have some simple home-grown tools to show that we are not using too much memory or too much processor, but we don't have a good feel for how much headroom we actually have. It's starting to make it difficult to do estimates for future enhancements.
Does anybody have any suggestions on how to profile such a system? We've never had much luck getting the Wind River tools to work.
For bonus points: the other complication is that our system has very different behaviors at different times; during start-up it does a lot of stuff, then it sits relatively idle except for brief bursts of activity. If there is a profiler with some programmatic way to have to record state information, I think that'd be very useful too.
FWIW, this is compiled with GCC and written entirely in C.
I've done a lot of performance tuning of various kinds of software, including embedded applications. I won't discuss memory profiling - I think that is a different issue.
I can only guess where the "well-known" idea originated that to find performance problems you need to measure performance of various parts. That is a top-down approach, similar to the way governments try to control budget waste, by subdividing. IMHO, it doesn't work very well.
Measurement is OK for seeing if what you did made a difference, but it is poor at telling you what to fix.
What is good at telling you what to fix is a bottom-up approach, in which you examine a representative sample of microscopic units of what is being spent, and finding out the full explanation of why each one is being spent. This works for a simple statistical reason. If there is a reason why some percent (for example 40%) of samples can be saved, on average 40% of samples will show it, and it doesn't require a huge number of samples. It does require that you examine each sample carefully, and not just sort of aggregate them into bigger bunches.
As a historical example, this is what Harry Truman did at the outbreak of the U.S. involvement in WW II. There was terrific waste in the defense industry. He just got in his car, drove out to the factories, and interviewed the people standing around. Then he went back to the U.S. Senate, explained what the problems were exactly, and got them fixed.
Maybe this is more of an answer than you wanted. Specifically, this is the method I use, and this is a blow-by-blow example of it.
ADDED: I guess the idea of finding-by-measuring is simply natural. Around '82 I was working on an embedded system, and I needed to do some performance tuning. The hardware engineer offered to put a timer on the board that I could read (providing from his plenty). IOW he assumed that finding performance problems required timing. I thanked him and declined, because by that time I knew and trusted the random-halt technique (done with an in-circuit-emulator).
If you have the Auxiliary Clock available, you could use the SPY utility (configurable via the config.h file) which does give you a very rough approximation of which tasks are using the CPU.
The nice thing about it is that it does not require being attached to the Tornado environment and you can use it from the Kernel shell.
Otherwise, btpierre's suggestion of using taskHookAdd has been used successfully in the past.
I've worked on systems that have had luck using locally-built monitoring utilities based on taskSwitchHookAdd and related functions (delete hook, etc).
"Simply" use this to track the number of ticks a given task runs. I realize that this is fairly gross scale information for profiling, but it can be useful depending on your needs.
To see how much cpu% each task is using, calculate the percentage of ticks assigned to each task.
To see how much headroom you have, add a lowest priority "idle" task that just does "while(1){}", and see how much cpu% it is assigned to it. Roughly speaking, that's your headroom.

Setting up a lab for developers performance testing

Our product earned bad reputation in terms of performance. Well, it's a big enterprise application, 13 years old, that needs a refreshment treat, and specifically a boost in its performance.
We decided to address the performance problem strategically in this version. We are evaluating a few options on how to do that.
We do have an experienced load test engineers equipped with the best tools in the market, but usually they get a stable release late in the version development life cycle, therefore in the last versions developers didn't have enough time to fix all their findings. (Yes, I know we need to deliver earlier a stable versions, we are working on this process as well, but it's not in my area)
One of the directions I am pushing is to set up a lab environment installed with the nightly build so developers can test the performance impact of their code.
I'd like this environment to be constantly loaded by scripts simulating real user's experience. On this loaded environment each developer will have to write a specific script that tests his code (i.e. single user experience in a real world environment). I'd like to generate a report that shows each iteration impact on existing features, as well as performance of new features.
I am a bit worried that I'm aiming too high, and it it will turn out to become too complicated.
What do you think of such an idea?
Does anyone have an experience with setting up such an environment?
Can you share your experience?
It sounds like a good idea, but in all honesty, if your organisation can't get a build to the expensive load test team it has employed just for this purpose, then it will never get your idea working.
Go for the low hanging fruit first. Get a nightly build available to the performance testing team earlier in the process.
In fact, if this version is all about performance, why not have the team just take this version to address all the performance issues that came late in the iteration for the last version.
EDIT: "Don't developers have a responsibility to performance test code" was a comment. Yes, true. I personally would have every developer have a copy of YourKit java profiler (it's cheap and effective) and know how to use it. However, unfortunately performance tuning is a really, really fun technical activity and it is possible to spend a lot of time doing this when you would be better developing features.
If your developer team are repeatedly developing noticeably slow code then education on performance or better programmers is the only answer, not more expensive process.
One of the biggest boost in productivity is an automated build system which runs overnight (this is called Continuous Integration). Errors made yesterday are caught today early in the morning, when I'm still fresh and when I might still remember what I did yesterday (instead of several weeks/months later).
So I suggest to make this happen first because it's the very foundation for anything else. If you can't reliably build your product, you will find it very hard to stabilize the development process.
After you have done this, you will have all the knowledge necessary to create performance tests.
One piece of advice though: Don't try to achieve everything at once. Work one step at a time, fix one issue after the other. If someone comes up with "we must do this, too", you must do the same triage as you do with any other feature request: How important is this? How dangerous? How long will it take to implement? How much will we gain?
Postpone hard but important tasks until you have sorted out the basics.
Nightly builds are the right approach to performance testing. I suggest you require scripts that run automatically each night. Then record the results in a database and provide regular reports. You really need two sorts of reports:
A graph of each metric over time. This will help you see your trends
A comparison of each metric against a baseline. You need to know when something drops dramatically in a day or when it crosses a performance threshold.
A few other suggestions:
Make sure your machines vary similarly to your intended environment. Have low and high end machines in the pool.
Once you start measuring, never change the machines. You need to compare like to like. You can add new machines, but you can't modify any existing ones.
We built a small test bed, to do sanity testing - ie did the app fire up and work as expected when the buttons were pushed, did the validation work etc. Ours was a web app and we used Watir a ruby based toolkit to drive the browser. The output from those runs are created as Xml documents, and the our CI tool (cruise control) could output the results, errors and performance as part of each build log. The whole thing worked well, and could have been scaled onto multiple PCs for proper load testing.
However, we did all that because we had more bodies than tools. There are some big end stress test harnesses that will do everything you need. They cost, but that will be less than the time spent to hand roll. Another issue we had was getting our Devs to write Ruby/Watir tests, in the end that fell to one person and the testing effort was pretty much a bottleneck because of that.
Nightly builds are excellent, lab environments are excellent, but you're in danger of muddling performance testing with straight up bug testing I think.
Ensure your lab conditions are isolated and stable (i.e. you vary only one factor at a time, whether that's your application or a windows update) and the hardware is reflective of your target. Remember that your benchmark comparisons will only be bulletproof internally to the lab.
Test scripts written by the developers who wrote the code tends to be a toxic thing to do. It doesn't help you drive out misunderstandings at implementation (since the same misunderstanding will be in the test script), and there is limited motivation to actually find problems. Far better is to take a TDD approach and write the tests first as a group (or a separate group), but failing that you can still improve the process by writing the scripts collaboratively. Hopefully you have some user-stories from your design stage, and it may be possible to replay logs for real world experience (app varying).

How to avoid the dangers of optimisation when designing for the unknown?

A two parter:
1) Say you're designing a new type of application and you're in the process of coming up with new algorithms to express the concepts and content -- does it make sense to attempt to actively not consider optimisation techniques at that stage, even if in the back of your mind you fear it might end up as O(N!) over millions of elements?
2) If so, say to avoid limiting cool functionality which you might be able to optimise once the proof-of-concept is running -- how do you stop yourself from this programmers habit of a lifetime? I've been trying mental exercises, paper notes, but I grew up essentially counting clock cycles in assembler and I continually find myself vetoing potential solutions for being too wasteful before fully considering the functional value.
Edit: This is about designing something which hasn't been done before (the unknown), when you're not even sure if it can be done in theory, never mind with unlimited computing power at hand. So answers along the line of "of course you have to optimise before you have a prototype because it's an established computing principle," aren't particularly useful.
I say all the following not because I think you don't already know it, but to provide moral support while you suppress your inner critic :-)
The key is to retain sanity.
If you find yourself writing a Theta(N!) algorithm which is expected to scale, then you're crazy. You'll have to throw it away, so you might as well start now finding a better algorithm that you might actually use.
If you find yourself worrying about whether a bit of Pentium code, that executes precisely once per user keypress, will take 10 cycles or 10K cycles, then you're crazy. The CPU is 95% idle. Give it ten thousand measly cycles. Raise an enhancement ticket if you must, but step slowly away from the assembler.
Once thing to decide is whether the project is "write a research prototype and then evolve it into a real product", or "write a research prototype". With obviously an expectation that if the research succeeds, there will be another related project down the line.
In the latter case (which from comments sounds like what you have), you can afford to write something that only works for N<=7 and even then causes brownouts from here to Cincinnati. That's still something you weren't sure you could do. Once you have a feel for the problem, then you'll have a better idea what the performance issues are.
What you're doing, is striking a balance between wasting time now (on considerations that your research proves irrelevant) with wasting time later (because you didn't consider something now that turns out to be important). The more risky your research is, the more you should be happy just to do something, and worry about what you've done later.
My big answer is Test Driven Development. By writing all your tests up front then you force yourself to only write enough code to implement the behavior you are looking for. If timing and clock cycles becomes a requirement then you can write tests to cover that scenario and then refactor your code to meet those requirements.
Like security and usability, performance is something that has to be considered from the beginning of the project. As such, you should definitely be designing with good performance in mind.
The old Knuth line is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." O(N!) to O(poly(N)) is not a "small efficiency"!
The best way to handle type 1 is to start with the simplest thing that could possibly work (O(N!) cannot possibly work unless you're not scaling past a couple dozen elements!) and encapsulate it from the rest of the application so you could rewrite it to a better approach assuming that there is going to be a performance issue.
Optimization isn't exactly a danger; its good to think about speed to some extent when writing code, because it stops you from implementing slow and messy solutions when something simpler and faster would do. It also gives you a check in your mind on whether something is going to be practical or not.
The worst thing that can happen is you design a large program explicitly ignoring optimization, only to go back and find that your entire design is completely useless because it cannot be optimized without completely rewriting it. This never happens if you consider everything when writing it--and part of that "everything" is potential performance issues.
"Premature optimization is the root of all evil" is the root of all evil. I've seen projects crippled by overuse of this concept. At my company we have a software program that broadcasts transport streams from disk on the network. It was originally created for testing purposes (so we would just need a few streams at once), but it was always in the program's spec requirements that it work for larger numbers of streams so it could later be used for video on demand.
Because it was written completely ignoring speed, it was a mess; it had tons of memcpys despite the fact that they should never be necessary, its TS processing code was absurdly slow (it actually parsed every single TS packet multiple times), and so forth. It handled a mere 40 streams at a time instead of the thousands it was supposed to, and when it actually came time to use it for VOD, we had to go back and spend a huge amount of time cleaning it up and rewriting large parts of it.
"First, make it run. Then make it run fast."
or
"To finish first, first you have to finish."
Slow existing app is usually better than ultra-fast non-existing app.
First of all peopleclaim that finishign is only thing that matters (or almost).
But if you finish a product that has O(N!) complexity on its main algorithm, as a rule of thumb you did not finished it! You have an incomplete and unacceptable product for 99% of the cases.
A reasonable performance is part of a working product. A perfect performance might not be. If you finish a text editor that needs 6 GB of memory to write a short note, then you have not finished a product at all, you have only a waste of time at your hands.. You must remember always that is not only delivering code that makes a product complete, is making it achieve capability of supplying the costumer/users needs. If you fail at that it matters nothing that you have finished the code writing in the schedule.
So all optimizations that avoid a resulting useless product are due to be considered and applied as soon as they do not compromise the rest of design and implementation proccess.
"actively not consider optimisation" sounds really weird to me. Usually 80/20 rule works quite good. If you spend 80% of your time to optimize program for less than 20% of use cases, it might be better to not waste time unless those 20% of use-cases really matter.
As for perfectionism, there is nothing wrong with it unless it starts to slow you down and makes you miss time-frames. Art of computer programming is an act of balancing between beauty and functionality of your applications. To help yourself consider learning time-management. When you learn how to split and measure your work, it would be easy to decide whether to optimize it right now, or create working version.
I think it is quite reasonable to forget about O(N!) worst case for an algorithm. First you need to determine that a given process is possible at all. Keep in mind that Moore's law is still in effect, so even bad algorithms will take less time in 10 or 20 years!
First optimize for Design -- e.g. get it to work first :-) Then optimize for performance. This is the kind of tradeoff python programmers do inherently. By programming in a language that is typically slower at run-time, but is higher level (e.g. compared to C/C++) and thus faster to develop, python programmers are able to accomplish quite a bit. Then they focus on optimization.
One caveat, if the time it takes to finish is so long that you can't determine if your algorithm is right, then it is a very good time to worry about optimization earlier up stream. I've encountered this scenario only a few times -- but good to be aware of it.
Following on from onebyone's answer there's a big difference between optimising the code and optimising the algorithm.
Yes, at this stage optimising the code is going to be of questionable benefit. You don't know where the real bottlenecks are, you don't know if there is going to be a speed problem in the first place.
But being mindful of scaling issues even at this stage of the development of your algorithm/data structures etc. is not only reasonable but I suspect essential. After all there's not going to be a lot of point continuing if your back-of-the-envelope analysis says that you won't be able to run your shiny new application once to completion before the heat death of the universe happens. ;-)
I like this question, so I'm giving an answer, even though others have already answered it.
When I was in grad school, in the MIT AI Lab, we faced this situation all the time, where we were trying to write programs to gain understanding into language, vision, learning, reasoning, etc.
My impression was that those who made progress were more interested in writing programs that would do something interesting than do something fast. In fact, time spent worrying about performance was basically subtracted from time spent conceiving interesting behavior.
Now I work on more prosaic stuff, but the same principle applies. If I get something working I can always make it work faster.
I would caution however that the way software engineering is now taught strongly encourages making mountains out of molehills. Rather than just getting it done, folks are taught to create a class hierarchy, with as many layers of abstraction as they can make, with services, interface specifications, plugins, and everything under the sun. They are not taught to use these things as sparingly as possible.
The result is monstrously overcomplicated software that is much harder to optimize because it is much more complicated to change.
I think the only way to avoid this is to get a lot of experience doing performance tuning and in that way come to recognize the design approaches that lead to this overcomplication. (Such as: an over-emphasis on classes and data structure.)
Here is an example of tuning an application that has been written in the way that is generally taught.
I will give a little story about something that happened to me, but not really an answer.
I am developing a project for a client where one part of it is processing very large scans (images) on the server. When i wrote it i was looking for functionality, but i thought of several ways to optimize the code so it was faster and used less memory.
Now an issue has arisen. During Demos to potential clients for this software and beta testing, on the demo unit (self contained laptop) it fails due to too much memory being used. It also fails on the dev server with really large files.
So was it an optimization, or was it a known future bug. Do i fix it or oprtimize it now? well, that is to be determined as their are other priorities as well.
It just makes me wish I did spend the time to reoptimize the code earlier on.
Think about the operational scenarios. ( use cases)
Say that we're making a pizza-shop finder gizmo.
The user turns on the machine. It has to show him the nearest Pizza shop in meaningful time. It Turns out our users want to know fast: in under 15 seconds.
So now, any idea you have, you think: is this going to ever, realistically run in some time less than 15 seconds, less all other time spend doing important stuff..
Or you're a trading system: accurate sums. Less than a millisecond per trade if you can, please. (They'd probably accept 10ms), so , agian: you look at every idea from the relevant scenarios point of view.
Say it's a phone app: has to start in under (how many seconds)
Demonstrations to customers fomr laptops are ALWAYS a scenario. We've got to sell the product.
Maintenance, where some person upgrades the thing are ALWAYS a scenario.
So now, as an example: all the hard, AI heavy, lisp-customized approaches are not suitable.
Or for different strokes, the XML server configuration file is not user friendly enough.
See how that helps.
If I'm concerned about the codes ability to handle data growth, before I get too far along I try to set up sample data sets in large chunk increments to test it with like:
1000 records
10000 records
100000 records
1000000 records
and see where it breaks or becomes un-usable. Then you can decide based on real data if you need to optimize or re-design the core algorithms.