What exactly does A/B test mean in this scenario? - testing

When multiple developers are continually applying changes to databases
within an organization, there may be no structured change log. Then
it's difficult to find out what change caused the database performance
to slow down. In a call center, where results must be instantaneous
for customers waiting on the phone, an adverse change can be
detrimental to the business. The DEA tests a change that might have an
impact, and compares it to the unchanged database. You apply the A/B
testing method to analyze one change at a time, helping you to clearly
know its effect. The change might mean an improvement to performance,
degradation, or maybe no performance difference, which is equally
acceptable.
I am looking for an example to understand the following statement from the above quote: The DEA tests a change that might have an impact, and compares it to the unchanged database. You apply the A/B testing method to analyze one change at a time, helping you to clearly know its effect.
For example - suppose a setting is currently having the value as 0 and I want to set it to 1. How will the above A/B testing help me exactly?

You got two environments and the only difference between them is the setting you specified. And then you split the traffic to the DB into two (50% vs. 50% for example), and part of them goes to the first enviroment and part of them goes to another. You monitor the metrics (performance or something else) and if the difference is statistically significant, you would have high confidence that setting is the change that moves the metric.

Related

Is it possible to reject excessively large queries on specific views?

I'm working with MS-SQL Server, and we have several views that have the potential to return enormous amounts of processed data, enough to spike our servers to 100% resource usage for 30 minutes straight with a single query (if queried irresponsibly).
There is absolutely no business case in which such huge amounts of data would need to be returned from these views, so we'd like to lock it down to make sure nobody can DoS our SQL servers (intentionally or otherwise) by simply querying these particular views without proper where clauses etc.
Is it possible, via triggers or another method, to check the where clause etc. and confirm whether a given query is "safe" to execute (based on thresholds we determine), and reject the query if it doesn't meet our guidelines?
Or can we configure the server to reject given execution plans based on estimated time-to-completion etc.?
One potential way to reduce the overall cost of certain queries coming from a certain group of people is to use the resource governor. You can throttle how much CPU and/or memory is used up be a particular user/group. This is effective if you have a "wild west" kind of environment where some users submit bad queries that eat your resources alive. See here.
Another thing to consider is to set your MAXDOP (max degree of parallelism) to prevent any single query from taking all of the available CPU threads. That is, if MAXDOP is 1, then any query can only take 2 CPU threads to process. This is useful to prevent a large query from letting smaller quick ones processing. See here.
Kind of hacky but put a top x in every view
You cannot enforce it at the SQL side but on the app size they could use a TimeOut. But if they lack QC they probably lack the discipline for TimeOut. If you have some queries going 30 minutes they are probably setting a value longer than the default.
I'm not convinced about Blam's top X in each view. Without a corresponding ORDER BY clause the data will be returned in an indeterminate order. There may benefits to CDC's MAXDOP suggestion. Not so much for itself, but for the other queries that want to run at the same time.
I'd be inclined to look at moving to stored procedures. Then you can require input parameters and evaluate them before the query gets run in earnest. If, for example, a date range is too big, you can restrict it. You should also find out who is running the expensive query and what they really need. Seems like they might benefit from some ETL. Just some ideas.

How do you test your software on possible sql server deadlocks before production?

Imagine that you developed an application that reads/writes/updates/deletes data in SQL Server database.
What tests do you run to see all possible sql deadlocks before putting it into production?
You need to follow the basics of testing.
Unit Testing
Positive test cases, simple test cases that get you started
Negative test cases, add bad data that fails validations, and check if its been written
Boundary (dont confuse it with negative), check for all limits, validations, 0, maxint, minint. For gps, check for +90 degrees, -90 degrees, -180 degrees... Basically testing is not as easy as you think. Doing it right is as/more painful than coding.
All the above tests should focus on code COVERAGE (path coverage too, if you have the guts :)).
And yes, you need to test for add, update, remove anomalies, from your system. Do this per table, per connection per relation.
Scale-ability
Each table, you need to add data from different clients, increase the number of clients until you break. IT MUST BREAK. That is what you can scale to max.
Performance
Is comparative in nature. You cannot test for performance by itself. It has to be between builds, between competitive product (exactly same data, exactly same hardware, exactly same scenario)
Stress
Add a lot of data from all sides, concurrently (dont use scale-ability max's here, the intent is "NOT" to break, but to find out if it can deal with a lot of data, users, concurrency.
Reliability
Here, use normal data, normal scenario that your customer is looking for. And run it over 48/72 hours. The gc's should bot go crazy, memory leaks should not keep growing, performance should be "consistent" (need not be the best). But should not "DROP" over time. Look for "things going bad with time".

How important is optimization?

I got into a bit of a debate yesterday with my boss about the proper role of optimization when building software. Essentially, his position was that optimization needs to be a primary concern during the entire process of development.
My opinion is that you need to make the right algorithmic decisions during development, but you should never be counting cycles during development. In fact, I feel so strongly about this I had to walk away from the conversation. I've seen too many bad programming decisions in the name of "optimization", and too much bad code defended with the excuse "this way is faster".
What does the StackOverflow.com community think?
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
- Donald Knuth
I think the premature optimization quote is used by too many to avoid thinking about the hard stuff concerning how well the application will run. I guarantee the users want you to think about how to design it so it will run as fast as possible.
This is not to say you should be timing everything, but the design phase is the easiest place to optimize and not cost lots of time later.
There are often several ways to do anything, you should pick in the design phase the one which is most likely to perform the best (if it turns out to be one of the times when it isn't the best, then optimize later). This should trump the need to have easy to read code.
If you aren't considering performance in the design phase, you aren't going to have a well designed system. That doesn't mean it should be the only concern (although in a database I'd rate it as 3rd in importance, right after data integrity and security), but trying to fix a system where poorly performing techniques were used throughout because the developers thought they were easier to understand is a nightmare. Being a user of such a system where you have to wait for minutes everytime you want to move from one screen to another is a nightmare (developers reallly should spend all day everyday for at least a week, using their systems!) for everyone who is stuck with the badly designed system. It costs less to design properly than to fix later and considering performance is critical to designing properly.
I work somewhere where the orginal developers drank the koolaid about premature optimization and did everything the way they thought was simplest (but which in almost every case was the wrong choice from a performance perspective). Now we are at 10 times the size we were three years ago and every screen on every website takes 30 seconds or so to load (or worse times out) and we are losing customers because of it. But changing it will be too hard because at the base they designed the database without considering how it would perform and redesigning a database with many many gigabytes of data into a new structure is way too time consuming and costly. If it had been designed to perform from the start it would be both easier to maintain and faster for the clients. We aren't talking about the need to performance tune the top 10 slowest queries here, we are talking about the fact that the overall structure requires a drastic change (one that would affect virtually every query against the system) to perform well.
Yes don't do micro optimization until you nmeed to, but please do the macro stuff. Consider if is this the best way before you commit to the path. Don't write cursors to hit tables with millions of records when a set-based statement will do. Don't try to have as few tables as possible becasue that seems to be a more elegant solution when the tables are storing disparate items (such as people, places, and vehicles) causing every single query to hit the same table and causing every delete to check all sorts of foreign key tables that will not ever have a record for that type of entity (it takes minutes to delete one record from the main table in our database, it's a real joy when something goes wrong in an import (bad data from a client usually) and we have to delete 200,000 let me tell you).
Optimization is almost tautologically a tradeoff: you gain runtime efficiency at the cost of other things (readability, maintainability, flexibility, compile times, etc.). As such, it's really never a good idea to do unless you know what the tradeoff is and why it's worthwhile.
Even worse, thinking about "how do I do X fast" can be very distracting. To much energy in that direction can easily lead to you misisng out on method Y which is much better (and often faster --- this is how "optimization" can make your code slower). Particularly if you do too much of this on a big project from the beginning, it represents a lot of momentum. If you can't afford to overcome that momentum, you can easily become locked into a bad design because you can't afford the time to restructure it. This way lie dragons
What your boss may be thinking of is more of an issue of writing bad code via inappropriate representations and algorithms. It's not really the same thing as optimizing, but an approach where you pay no attention whatsoever to appropriate data structures etc. can result in a codebase that is slow everywhere, and (much like the above "lock in") requires heroic effort to fix.
In general though, premature optimization really honestly is a terrible idea. Particularly when you end up with a complex, finely tuned, well documented (because that's the only way you can understand it) piece of code you end up not using anyway. And that's not even getting into the issue of subtle bugs that are often introduced when "optimizing"
[edit: pshaw, of course a Knuth quote encapsulates this well. That's what I get for typing too much]
Engineer throughout, optimize at the end.
Since with going with pithy, I'll say that optimization is as important as the impact of not doing it.
I think the "premature optimization is root of all evil" has to be understood literally - it does not say when is premature, and does not say you should optimize only at the end. Just not too early. Also, the "use the right algorithm, O(n^2) vs O(N)" is a bit dangerous if taken literally - because for many problems, the N is actually small, etc...
I think it depends a lot of the type of software you are doing: some software are such as every part is very independent, and can be optimized separately. But that's not always the case. For many (most ?) applications, speed just does not matter at all, the brute force but obviously correct way is the best one. But for projects where speed matters, it often has to be taken into account early - maybe that's another possible interpretation of Knuth's saying: many applications don't need to be optimized at all, just know which ones need and plan ahead.
Optimization is a primary concern through development when you have a good reason to expect that performance will be unfixably bad if optimization is a secondary concern.
This depends a lot what kind of code you're writing, but there are often better reasons to believe that your code will be unfixably difficult to use; or maintain; or full of bugs; or late; if all those things become secondary to tweaking performance.
Bad managers say, "all of those things are our primary concerns". Good managers work to find out which are dangers for this project.
Of course, good design does have to consider all these things, and the earlier you have a back-of-the-envelope estimate of any of them, the better. If all your manager is saying, is that if you never think about how fast your code will run then too much of your code will be dog-slow, then he's right. I just wouldn't say that makes optimization your "primary" concern.
If the USP of your software is that it's faster than your competitors', then optimization is a primary concern. With experience, you can often predict what sorts of operations will be the bottlenecks, design those with optimization in mind right from the start, and more-or-less ignore optimization elsewhere. A lot of projects won't even need this: they'll be fast enough without much effort, provided you use sensible algorithms and don't do anything stupid. "Don't do anything stupid" is always a primary concern, with no need to mention performance in particular.
I think that code needs to be, first and foremost, readable and understandable. So, optimisations that are done, should not be at the expense of readability. However, optimisation is often a trade-off.
Whether or not you should optimise your code depends on your application domain. If you are working on an embedded processor with only 8Mb of memory, then optimisation is probably something that every team member needs to keep in mind, when writing code - optimising for space vs speed.
However, pre-mature optimisation is not useful unless your system has been clearly spec'ed and understood. This is because most programmers do not make good optimisation decisions unless they can factor in the influence of the overall system, including processor architectural factors such as cache memory, hardware threads, pipelines, etc.
From 2 years of building highly optimized Java code (and that needed to be optimized that way) I would say that there is a time-spent rule that governs optimization:
optimizing on the spot: 5%-10% of your development time, because you have to do it countless times (every single time you have to amend your design)
optimizing just when you have had it working: 2% of your development time (you do it only once)
going back to it and optimizing when it's too slow: 30% of your development time, because you have to plunge yourself back into the system
SO I would come to the conclusion that there is a right time and a right way to optimize: do it entity by entity (class by class, if you have classes that have a single, well defined job to do, and can be tested and optimized), test well, make sure the logic is working, optimize just afterward, and forget about that entity's implementation details forever.
When developing, just keep it simple. IMHO, most performance problems are caused by over-engineering - making mountains out of molehills, often because of wanting "the right algorithm".
Periodically, stress test with a big data set, profiling or (my favorite technique) manual random sampling. You find a problem, you fix it. You find another, you fix it.
That way you avoid creating slugs (slowness bugs), and when they do arise, you kill them.
Added: If I can just elaborate on point 1. OO is seemingly the law of the land, and it certainly has good reasons behind it. Unfortunately it causes many young programmers to feel that programming is all about having lots of data structure, with layers upon layers of abstraction. Not that those are inherently bad, but combine that with the natural tendency to assume that the time something takes is roughly proportional to the number of characters you have to type to invoke it, and that this tendency multiplies over the layers (and besides, the machines are really fast), it's easy to create a perfect storm of cycle-waste.
Quote from a friend: "It's easier to make a working system efficient than to make an efficient system work".
I think it is important to use smart practices and patterns from the start, but get the system actually running for small test cases then do performance analysis. Frequently the areas of code that have poor performance aren't anticipated at the beginning, so get some real data and then optimize the bottlenecking 3% (or 20%, or whatever it is).
I think your boss is a lot more right than you are.
Allt too often the user experience is lost only to be "rediscovered" at the last possible moments when performance activites are prohibitively costly and inefficient. Or when it is discovered that the batch program that will process today's transactions requires forty hours to run.
Things such as database organization, when and when not to do which SELECTs are examples of design decisions that can make or break an application. Still you run the risk of a single programmer deciding to to otherwise, misinterpret or simply not understand what to do. Following-up on performance during an entire project decreases the risk that such things will happen. It also allows design decisions to be changed when there is a need for that without puting the entire poroject at risk.
"You need to make the right algorithmic decisions during development" is certainly true yet how many mainstream programmers are able to do that? Browsing the net for information does not guarantee finding a high quality solution. "Right" could be interpreted as meaning that it is ok to choose a poor algorithm because it is easy to understand and implement (= less development time, lower cost) than a more complicated one (= more development time, higher cost).
The pendulum of quantity vs quality is almost always on the quantity side because more code per hour or faster development time means money in the short term. The quality side means money in the long term.
EDIT
This article discusses performance and optimization quite thoroughly.
Performance preacher Rico Mariana sums it up in the short statement "never give up your performance accidentally."
Premature optimization is the root of all evil...There is a fine balance between, but I would say 95% of the time you need to optimize at the end; however, there are decisions you can make early on to help prevent issues. For example assume we are talking about an e-commerce web site. You have a requirement to display the catalog. Now you can grab all 100,000 items and display 50 of them, or you can grab just 50 from the database. These type of decisions should be made up front.
Cycle counting should only be done when a problem has been identified.
Your boss is partly right, optimisation does need to be considered throughout the development lifecycle but it is rarely the primary concern. Also, the term 'optimisation' is vague - it's an adjective, 'optimise for ...' which could be 'memory', 'speed', 'usability', 'maintainability' and so on.
However, the OP is right that counting cycles is pointless for many projects. For most PC applications the CPU is never the bottleneck. Also, the IA32 is not consistent - what worked well on one architecture, performs poorly on another. Cycle counting should only ever be done when it will actually make a difference - usually in CPU limited code or code with very specific timing needs.
Optimisation, of any kind, must always be driven by hard evidence. Never assume anything about the system or how the code is behaving. In an ideal world, application performance / constraints will be specified in the initial product design and tools to monitor the application's performance during development will be added early on in the development phase to guide the programmers as the product is made.

How to prove that code isn't broken, but the hardware is?

I'm sure it repeats everywhere. You can 'feel' network is slow, or machine or slow or something. But the server/chassis logs are not showing anything, so IT doesn't believe you. What do you do?
Your regressions are taking twice the time ... but that's not enough
Okay you transfer 100 GB using dd etc, but ... that's not enough.
Okay you get server placed in different chassis for 2 week, it works fine ... but .. that's not enough...
so HOW do you get IT to replace the chassis ?
More specifically:
Is there any suite which I can run on two setups ( supposed to be identical ), which can show up difference in network/cpu/disk access .. which IT will believe ?
Computers don't age and slow down the same way we do. If your server is getting slower -- actually slower, not just feels slower because every other computer you use is getting faster -- then there is a reason and it is possible that you may be able to fix it. I'd try cleaning up some disk space, de-fragmenting the disk, and checking what other processes are running (perhaps someone's added more apps to the system and you're just not getting as many cycles).
If your app uses a database, you may want to analyze your query performance and see if some indices are in order. Queries that perform well when you have little data can start taking a long time as the amount of data grows if they have to use table scans. As a former "IT" guy, I'd also be reluctant to throw hardware at a problem because someone tells me the system is slowing down. I'd want to know what has changed and see if I could get the system running the way it should be. If the app has simply out grown the hardware -- after you've made suitable optimizations -- then upgrading is a reasonable choice.
Run a standard benchmark suite. See if it pinpoints memory, cpu, bus or disk, when compared to a "working" similar computer.
See http://en.wikipedia.org/wiki/Benchmark_(computing)#Common_benchmarks for some tips.
The only way to prove something is to do a stringent audit.
Now traditionally, we should keep the system constant between two different sets while altering the variable we are interested. In this case the variable is the hardware that your code is running on. So in simple terms, you should audit the running of your software on two different sets of hardware, one being the hardware you are unhappy about. And see the difference.
Now if you are to do this properly, which I am sure you are, you will first need to come up with a null hypothesis, something like:
"The slowness of the application is
unrelated to the specific hardware we
are using"
And now you set about disproving that hypothesis in favour of an alternative hypothesis. Once you have collected enough results, you can apply statistical analyses on them, to decide whether any differences are statistically significant. There are analyses to find out how much data you need, and then compare the two sets to decide if the differences are random, or not random (which would disprove your null hypothesis). The type of tests you do will mostly depend on your data, but clever people have made checklists to help us decide.
It sounds like your main problem is being listened to by IT, but raw technical data may not be persuasive to the right people. Getting backup from the business may help you and that means talking about money.
Luckily, both platforms already contain a common piece of software - the application itself - designed to make or save money for someone. Why not measure how quickly it can do that e.g. how long does it take to process an order?
By measuring how long your application spends dealing with each sub task or data source you can get a rough idea of the underlying hardware which is under performing. Writing to a local database, or handling a data structure larger than RAM will impact the disk, making network calls will impact the network hardware, CPU bound calculations will impact there.
This data will never be as precise as a benchmark, and it may require expensive coding, but its easier to translate what it finds into money terms. Log4j's NDC and MDC features, and Springs AOP might be good enabling tools for you.
Run perfmon.msc from Start / Run in Windows 2000 through to Vista. Then just add counters for CPU, disk etc..
For SQL queries you should capture the actual queries then run them manually to see if they are slow.
For instance if using SQL Server, run the profiler from Tools, SQL Server Profiler. Then perform some operations in your program and look at the capture for any suspicous database calls. Copy and paste one of the queries into a new query window in management studio and run it.
For networking you should try artificially limiting your network speed to see how it affects your code (e.g. Traffic Shaper XP is a simple freeware limiter).

How to avoid the dangers of optimisation when designing for the unknown?

A two parter:
1) Say you're designing a new type of application and you're in the process of coming up with new algorithms to express the concepts and content -- does it make sense to attempt to actively not consider optimisation techniques at that stage, even if in the back of your mind you fear it might end up as O(N!) over millions of elements?
2) If so, say to avoid limiting cool functionality which you might be able to optimise once the proof-of-concept is running -- how do you stop yourself from this programmers habit of a lifetime? I've been trying mental exercises, paper notes, but I grew up essentially counting clock cycles in assembler and I continually find myself vetoing potential solutions for being too wasteful before fully considering the functional value.
Edit: This is about designing something which hasn't been done before (the unknown), when you're not even sure if it can be done in theory, never mind with unlimited computing power at hand. So answers along the line of "of course you have to optimise before you have a prototype because it's an established computing principle," aren't particularly useful.
I say all the following not because I think you don't already know it, but to provide moral support while you suppress your inner critic :-)
The key is to retain sanity.
If you find yourself writing a Theta(N!) algorithm which is expected to scale, then you're crazy. You'll have to throw it away, so you might as well start now finding a better algorithm that you might actually use.
If you find yourself worrying about whether a bit of Pentium code, that executes precisely once per user keypress, will take 10 cycles or 10K cycles, then you're crazy. The CPU is 95% idle. Give it ten thousand measly cycles. Raise an enhancement ticket if you must, but step slowly away from the assembler.
Once thing to decide is whether the project is "write a research prototype and then evolve it into a real product", or "write a research prototype". With obviously an expectation that if the research succeeds, there will be another related project down the line.
In the latter case (which from comments sounds like what you have), you can afford to write something that only works for N<=7 and even then causes brownouts from here to Cincinnati. That's still something you weren't sure you could do. Once you have a feel for the problem, then you'll have a better idea what the performance issues are.
What you're doing, is striking a balance between wasting time now (on considerations that your research proves irrelevant) with wasting time later (because you didn't consider something now that turns out to be important). The more risky your research is, the more you should be happy just to do something, and worry about what you've done later.
My big answer is Test Driven Development. By writing all your tests up front then you force yourself to only write enough code to implement the behavior you are looking for. If timing and clock cycles becomes a requirement then you can write tests to cover that scenario and then refactor your code to meet those requirements.
Like security and usability, performance is something that has to be considered from the beginning of the project. As such, you should definitely be designing with good performance in mind.
The old Knuth line is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." O(N!) to O(poly(N)) is not a "small efficiency"!
The best way to handle type 1 is to start with the simplest thing that could possibly work (O(N!) cannot possibly work unless you're not scaling past a couple dozen elements!) and encapsulate it from the rest of the application so you could rewrite it to a better approach assuming that there is going to be a performance issue.
Optimization isn't exactly a danger; its good to think about speed to some extent when writing code, because it stops you from implementing slow and messy solutions when something simpler and faster would do. It also gives you a check in your mind on whether something is going to be practical or not.
The worst thing that can happen is you design a large program explicitly ignoring optimization, only to go back and find that your entire design is completely useless because it cannot be optimized without completely rewriting it. This never happens if you consider everything when writing it--and part of that "everything" is potential performance issues.
"Premature optimization is the root of all evil" is the root of all evil. I've seen projects crippled by overuse of this concept. At my company we have a software program that broadcasts transport streams from disk on the network. It was originally created for testing purposes (so we would just need a few streams at once), but it was always in the program's spec requirements that it work for larger numbers of streams so it could later be used for video on demand.
Because it was written completely ignoring speed, it was a mess; it had tons of memcpys despite the fact that they should never be necessary, its TS processing code was absurdly slow (it actually parsed every single TS packet multiple times), and so forth. It handled a mere 40 streams at a time instead of the thousands it was supposed to, and when it actually came time to use it for VOD, we had to go back and spend a huge amount of time cleaning it up and rewriting large parts of it.
"First, make it run. Then make it run fast."
or
"To finish first, first you have to finish."
Slow existing app is usually better than ultra-fast non-existing app.
First of all peopleclaim that finishign is only thing that matters (or almost).
But if you finish a product that has O(N!) complexity on its main algorithm, as a rule of thumb you did not finished it! You have an incomplete and unacceptable product for 99% of the cases.
A reasonable performance is part of a working product. A perfect performance might not be. If you finish a text editor that needs 6 GB of memory to write a short note, then you have not finished a product at all, you have only a waste of time at your hands.. You must remember always that is not only delivering code that makes a product complete, is making it achieve capability of supplying the costumer/users needs. If you fail at that it matters nothing that you have finished the code writing in the schedule.
So all optimizations that avoid a resulting useless product are due to be considered and applied as soon as they do not compromise the rest of design and implementation proccess.
"actively not consider optimisation" sounds really weird to me. Usually 80/20 rule works quite good. If you spend 80% of your time to optimize program for less than 20% of use cases, it might be better to not waste time unless those 20% of use-cases really matter.
As for perfectionism, there is nothing wrong with it unless it starts to slow you down and makes you miss time-frames. Art of computer programming is an act of balancing between beauty and functionality of your applications. To help yourself consider learning time-management. When you learn how to split and measure your work, it would be easy to decide whether to optimize it right now, or create working version.
I think it is quite reasonable to forget about O(N!) worst case for an algorithm. First you need to determine that a given process is possible at all. Keep in mind that Moore's law is still in effect, so even bad algorithms will take less time in 10 or 20 years!
First optimize for Design -- e.g. get it to work first :-) Then optimize for performance. This is the kind of tradeoff python programmers do inherently. By programming in a language that is typically slower at run-time, but is higher level (e.g. compared to C/C++) and thus faster to develop, python programmers are able to accomplish quite a bit. Then they focus on optimization.
One caveat, if the time it takes to finish is so long that you can't determine if your algorithm is right, then it is a very good time to worry about optimization earlier up stream. I've encountered this scenario only a few times -- but good to be aware of it.
Following on from onebyone's answer there's a big difference between optimising the code and optimising the algorithm.
Yes, at this stage optimising the code is going to be of questionable benefit. You don't know where the real bottlenecks are, you don't know if there is going to be a speed problem in the first place.
But being mindful of scaling issues even at this stage of the development of your algorithm/data structures etc. is not only reasonable but I suspect essential. After all there's not going to be a lot of point continuing if your back-of-the-envelope analysis says that you won't be able to run your shiny new application once to completion before the heat death of the universe happens. ;-)
I like this question, so I'm giving an answer, even though others have already answered it.
When I was in grad school, in the MIT AI Lab, we faced this situation all the time, where we were trying to write programs to gain understanding into language, vision, learning, reasoning, etc.
My impression was that those who made progress were more interested in writing programs that would do something interesting than do something fast. In fact, time spent worrying about performance was basically subtracted from time spent conceiving interesting behavior.
Now I work on more prosaic stuff, but the same principle applies. If I get something working I can always make it work faster.
I would caution however that the way software engineering is now taught strongly encourages making mountains out of molehills. Rather than just getting it done, folks are taught to create a class hierarchy, with as many layers of abstraction as they can make, with services, interface specifications, plugins, and everything under the sun. They are not taught to use these things as sparingly as possible.
The result is monstrously overcomplicated software that is much harder to optimize because it is much more complicated to change.
I think the only way to avoid this is to get a lot of experience doing performance tuning and in that way come to recognize the design approaches that lead to this overcomplication. (Such as: an over-emphasis on classes and data structure.)
Here is an example of tuning an application that has been written in the way that is generally taught.
I will give a little story about something that happened to me, but not really an answer.
I am developing a project for a client where one part of it is processing very large scans (images) on the server. When i wrote it i was looking for functionality, but i thought of several ways to optimize the code so it was faster and used less memory.
Now an issue has arisen. During Demos to potential clients for this software and beta testing, on the demo unit (self contained laptop) it fails due to too much memory being used. It also fails on the dev server with really large files.
So was it an optimization, or was it a known future bug. Do i fix it or oprtimize it now? well, that is to be determined as their are other priorities as well.
It just makes me wish I did spend the time to reoptimize the code earlier on.
Think about the operational scenarios. ( use cases)
Say that we're making a pizza-shop finder gizmo.
The user turns on the machine. It has to show him the nearest Pizza shop in meaningful time. It Turns out our users want to know fast: in under 15 seconds.
So now, any idea you have, you think: is this going to ever, realistically run in some time less than 15 seconds, less all other time spend doing important stuff..
Or you're a trading system: accurate sums. Less than a millisecond per trade if you can, please. (They'd probably accept 10ms), so , agian: you look at every idea from the relevant scenarios point of view.
Say it's a phone app: has to start in under (how many seconds)
Demonstrations to customers fomr laptops are ALWAYS a scenario. We've got to sell the product.
Maintenance, where some person upgrades the thing are ALWAYS a scenario.
So now, as an example: all the hard, AI heavy, lisp-customized approaches are not suitable.
Or for different strokes, the XML server configuration file is not user friendly enough.
See how that helps.
If I'm concerned about the codes ability to handle data growth, before I get too far along I try to set up sample data sets in large chunk increments to test it with like:
1000 records
10000 records
100000 records
1000000 records
and see where it breaks or becomes un-usable. Then you can decide based on real data if you need to optimize or re-design the core algorithms.