TPC or other DB benchmarks for SSD drives

TPC or other DB benchmarks for SSD drives - sql

I have been interested in SSD drives for quite sometime. I do a lot of work with databases, and I've been quite interested to find benchmarks such as TPC-H performed with and without SSD drives.
On the outside it sounds like there would be one, but unfortunately I have not been able to find one. The closest I've found to an answer was the first comment in this blog post.
http://dcsblog.burtongroup.com/data_center_strategies/2008/11/intels-enterprise-ssd-performance.html
The fellow who wrote it seemed to be a pretty big naysayer when it came to SSD technology in the enterprise, due to a claim of lack of performance with mixed read/write workloads.
There have been other benchmarks such as
this
and
this
that show absolutely ridiculous numbers. While I don't doubt them, I am curious if what said commenter in the first link said was in fact true.
Anyways, if anybody can find benchmarks done with DBs on SSDs that would be excellent.

I've been testing and using them for a while and whilst I have my own opinions (which are very positive) I think that Anandtech.com's testing document is far better than anything I could have written, see what you think;
http://www.anandtech.com/show/2739
Regards,
Phil.

The issue with SSD is that they make real sense only when the schema is normalized to 3NF or 5NF, thus removing "all" redundant data. Moving a "denormalized for speed" mess to SSD will not be fruitful, the mass of redundant data will make SSD too cost prohibitive.
Doing that for some existing application means redefining the existing table (references) to views, encapsulating the normalized tables behind the curtain. There is a time penalty on the engine's cpu to synthesize rows. The more denormalized the original schema, the greater the benefit to refactor and move to SSD. Even on SSD, these denormalized schemas will run slower, likely, due to the mass of data which must be retrieved and written.
Putting logs on SSD is not indicated; this is a sequential write-mostly (write-only under normal circumstances) operation, physics of SSD (flash type; a company named Texas Memory Systems has been building RAM based sub-systems for a long time) makes this non-indicated. Conventional rust drives, duly buffered, will do fine.
Note the anandtech articles; the Intel drive was the only one which worked right. That will likely change by the end of 2009, but as of now only the Intel drives qualify for serious use.

I've been running a fairly large SQL2008 database on SSDs for 9 months now. (600GB, over 1 billion rows, 500 transactions per second). I would say that most SSD drives that I tested are too slow for this kind of use. But if you go with the upper end Intels, and carefully pick your RAID configuration, the results will be awesome. We're talking 20,000+ random read/writes per second. In my experience, you get the best results if you stick with RAID1.
I can't wait for Intel to ship the 320GB SSDs! They are expected to hit the market in September 2009...

The formal TPC benchmarks will probably take a while to appear using SSD because there are two parts to the TPC benchmark - the speed (transactions per unit time) and the cost per (transaction per unit time). With the high speed of SSD, you have to scale the size of the DB even larger, thus using more SSD, and thus costing more. So, even though you might get superb speed, the cost is still prohibitive for a fully-scaled (auditable, publishable) TPC benchmark. This will remain true for a while yet, as in a few years, while SSD is more expensive than the corresponding quantity of spinning disk.

Commenting on:
"...quite interested to find benchmarks such as TPC-H performed with and without SSD drives."
(FYI and full disclosure, I am pseudonymously "J Scouter", the "pretty big naysayer when it came to SSD technology in the enterprise" referred to and linked above.)
So....here's the first clue to emerge.
Dell and Fusion-IO have published the first EVER audited benchmark using a Flash-memory device for storage.
The benchmark is the TPC-H, which is a "decision support" benchmark. This is important because TPC-H entails an almost exclusively "read-only" workload pattern -- perfect context for SSD as it completely avoids the write performance problem.
In the scenarios painted for us by the Flash SSD hypesters, this application represents a soft-pitch, a gentle lob right over the plate and an easy "home run" for a Flash-SSD database application.
The results? The very first audited benchmark for a flash SSD based database application, and a READ ONLY one at that resulted in (drum roll here)....a fifth place finish among comparable (100GB) systems tested.
This Flash SSD system produced about 30% as many Queries-per-hour as a disk-based system result published by Sun...in 2007.
Surely though it will be in price/performance that this Flash-based system will win, right?
At $1.46 per Query-per-hour, the Dell/Fusion-IO system finishes in third place. More than twice the cost-per-query-per-hour of the best cost/performance disk-based system.
And again, remember this is for TPC-H, a virtually "read-only" application.
This is pretty much exactly in line with what the MS Cambridge Research team discovered over a year ago -- that there are no enterprise workloads where Flash makes ROI sense from economic or energy standpoints
Can't wait to see TPC-C, TPC-E, or SPC-1, but according the the research paper that was linked above, SSDs will need to become orders-of-magnitude cheaper for them to ever make sense in enterprise apps.

Related

What is the expected performance gap switching from SQL to TSDB for handling time series?

We are in the case of using a SQL database for a single node storage of roughly 1 hour of high frequency metrics (several k inserts a second). We quickly ran into I/O issues which proper buffering would not simply handle, and we are willing to put time into solving the performance issue.
I suggested to switch to a specialised database for handling time series, but my colleague stayed pretty skeptical. His argument is that the gain "out of the box" is not guaranteed as he knows SQL well and already spent time optimizing the storage, and we in comparison do not have any kind of TSDB experience to properly optimize it.
My intuition is that using a TSDB would be much more efficient even with an out of box configuration but I don't have any data to measure this, and internet benchs such as InfluxDB's are nowhere near trustable. We should run our own, except we can't affoard to loose time in a dead end or a mediocre improvement.
What would be, in my use case but very roughly, the performance gap between relational storage and TSDB, when it comes to single node throughput ?

This question may be bordering on a software recommendation. I just want to point one important thing out: You have an existing code base so switching to another data store is expensive in terms of development costs and time. If you have someone experienced with the current technology, you are probably better off with a good-faith effort to make that technology work.
Whether you switch or not depends on the actual requirements of your application. For instance, if you don't need the data immediately, perhaps writing batches to a file is the most efficient mechanism.
Your infrastructure has ample opportunity for in-place growth -- more memory, more processors, solid-state disk (for example). These might meet your performance needs with a minimal amount of effort.
If you cannot make the solution work (and 10k inserts per second should be quite feasible), then there are numerous solutions. Some NOSQL databases relax some of the strict ACID requirements of traditional RDBMSs, providing faster throughout.

stress testing web applications on less capable hardware

My organization is having an interesting internal debate right now that raises a question that I would like to open to the community at large.
The issue at hand is our environment in which we do stress-testing, capacity-testing, performance-regression-testing, and the like.
On one side of the debate are some software engineers who would like this environment to mirror the production environment as much as possible, in the interest of making the results as meaningful as possible. While we currently do have an environment for such testing, it is far less capable than the production system, and these software engineers feel that they are reaching the limits of what they can learn from it.
On the other side of the debate are some network engineers who both administer the environments and control the purse-strings. While they concede that capacity-testing would be better in an environment that is a better replica of the production environment, they argue that – for the purposes of stress testing – a more modest environment would have the effect of magnifying performance bottlenecks, making them easier to discover and fix.
This finally brings us to the part that piqued my interest: one software engineer suggests that while a more modest stress environment will increase the likelihood that you will encounter some bottleneck, it does not necessarily follow that it would help you find the next bottleneck you may encounter in production. The scaling effect, he argues, may not be linear.
Is there merit to that point of view? If yes, then why? What are the sources of that nonlinearity?
There are a lot of moving parts involved here: a cluster of java application servers, a cluster of database servers, lots of dynamic content being generated for each HTTP hit.
Edit: I appreciate everybody's thoughts so far, but I was really hoping that someone would do more than re-affirm one side or the other and actually tackle the question of "why". If there is such nonlinearity, what gives rise to it? Better yet it would be great if the reasons were expressed in terms of the CPU, memory, bandwidth, latency, interactions between subsystems, what have you... TerryE, you have come the closest. You should re-post your comment as an answer for the bounty if no one else steps up

Your software developer is right and I will take the point even further.
When you test an application components, like a web service, to see its behaviour under load, it is understandable to use a less capable environment. You can find the bottlenecks about memory, io etc. And most probably will find bugs and oversights like out of memory errors and log files getting huge.
But when your application components are running as intended and you need to test the whole shebang, you need to test the real environment.
When you run stress tests on an environment, you measure that environment's behaviour under load and its bottlenecks. While this tests may provide valuable information, this information will not be about your production system. The bottlenecks you find might not be relevant to your real system and you may spend precious development time to fix the bugs that do not exist. To know about bottlenecks you really might face with, you should run your stress tests on your real production system (preferably before the grand opening).

The assumption of the network engineers is that modest system is basically a scale model of the production system. They are also assuming that the various characteristics of the production environment which would be affecting the software performance are mirrored in the more modest system just at lower levels however in the same ratios. For instance, the CPU is not as fast, there is not quite as much memory, the storage is a bit slower, etc. and all of these differences are in similar ratios such that if everything were magically multiplied by some factor, say 1.77, the resulting changed modest system would be exactly like the production system.
However that the modest system is an exact scale model in all particulars of the production system is difficult for me to believe.
Here is a specific example. Lets say that measurements on the production system indicates that CPU utilization, the percentage of time the CPU is not idle, is too high. So you put the software on the modest system and do measurements and discover that on the modest system, the CPU utilization is lower. An investigation reveals that the modest system has slower storage so the CPU is spending more time idle waiting on data transfer from storage to complete because the application is I/O bound on the modest system where as on the production system it is not. This difference is due to the modest machine not being an exact scale model of the production machine because the CPU ratio is different than the I/O transfer ratio.
Another example would be having more memory allowing fewer page faults in the production environment. When the software is loaded onto the more modest machine, there are more page faults due to having less physical memory. With the various applications paging in and out, they begin to affect each other as pages of other applications are swapped out and then swapped back in again. On the product machine with larger memory, this cascading page fault behavior is not seen because there is sufficient memory to hold more applications simultaneously.
The point that I am really trying to make here is that a computer with all its various parts and applications is a complex, dynamic system. The idea that one computing environment is just a scale model of another is too simplistic of an assumption. Using a modest system can certainly provide valuable data. However once the gross adjustments have been made to the software and you are beginning to get into more subtle detailed adjustments, the differences in the environment can have a large impact on the results of the testing.
Some citations.
Computer systems are dynamical systems by Tod Mytkowicz, Amer Diwan, and Elizabeth Bradley.
Bayesian fault detection and diagnosis in dynamic systems by Uri Lerner, Ronald Parr, Daphne Koller, and Gautam Biswas.

I have encountered the similar situation in my production environment. We use modest system just for initial and basic level testing and findings. It is true that you can never find real bottlenecks and other performance issues on your testing environment. So to find real performance related issues and to find bottlenecks you must do it on production environment, there's no other way.
We have hosted over 2.5Million websites, although it might not be case with yours but let me tell you this, that in our case, we have faced horrible situations of linear bottlenecks. Meaning, we first faced memory issues when our traffic was getting increased. We resolved that by adding more memory. Until then we didn't even notice that having only 256 threads of httpd was our next bottleneck because limited memory was hiding it, once we resolved memory issue it quickly came down to the problem that why our webservers were slow again after just few weeks? We found out that 256 httpd threads are just not enough to serve that much traffic. We not only increased threads but also installed HA parallel load balancers in front of our WebServers to mitigate the issue.
Fortunately it solved our slow page loading problems. But after few months as traffic continuously grew we got into next bottleneck of storage system. You know what this time disk I/Os was the issue. To make the story short, we put parallel NFS based physical storage systems. Each NFS machine now serve files by having over 2000 threads running.
I forgot to mentioned that Database was also a big culprit of slowness that we resolved that issue by installing Master-Slaves model in cluster. We had to do a lot of performance tweaks in our application code as well and we had to physically distribute our application into different modules over different servers.
I'm just mentioning all this to prove a point that it's very likely that all performance related issues almost come in a linear way, at least that's what we have faced in our WebBased model. Even you have tested a lot on your modest systems you still have chances hidden bottlenecks which you can't find on testing environments.
What I have learned in my last 6 years experience that try to DISTRIBUTE your model AS MUCH AS POSSIBLE if you think you might going to have a lot of traffic or hits/sec. Centralized model can hold your traffic for some time by doing much tweaks but in the end your system gets busted.
I'm not saying you will face some bottlenecks or issues in your situation but I just wanted to warn you that these cases happen sometimes, just so you better aware.
**Sorry for my English.

good question. learning and optimization is best on modest hardware. but testing is safer on mirror (or at least something from same epoch)
it seams like you try to predict the first bottleneck that will appears and when it will happen. i'm not sure if that's the correct objective and the correct way. i assume we don't speak about a typical CRUD where client says 'it should work as fast as every other web application'. if you want to do tests correctly then, before you start your tests, you should know the expected load. expected number of users, expected number of events, response time etc. it's a part of your product specification. if you don't have the numbers, that means your analysts didn't do their job.
if you have the numbers then you don't need exact tests result. you just need to know the order of magnitude. you should also check how your software/hardware scale. how many instances do you need to handle x users/requests/whatever and how many to handle y

We load test systems for our customers every day -- and we see a wide range of problems. Certain classes of problems can be found on down-sized systems. Other cannot. Some can ONLY be found in production...because no matter how closely you mirror the two systems, they can never be identical. You can get REALLY close, if you work hard enough.
So, simple fact of testing: the closer your system is to the production system, the more accurate your tests will be.
IMO, this is one of the best reasons for moving to the cloud: you can spin up a system that is very close to your production system (about as identical as you could ever get) and run your load tests on that.
It is probably worth mentioning that we've occasionally seen customers waste a lot of hours chasing problems in their test environments that never would have occurred in production. The more different the environments are, the more likely this is to happen :(

I think you have partially answered your own question - you already have a production level environment and are already finding it is not at the same level / not as capable as the production environment. The bottom line is that with all the money in the world you will never be able to replicate the exact functioning of the production website - timings of events, volumes, cpu utilisation, memory utilisation, db IO, when it's all working in anger the behaviour can be non-deterministic to a certain extend. My point is you can never make it exactly the same. And on the other side of the coin a production environment by it's nature is going to be an expensive environment with a lot of kit in order to make it perform and handle your production volume of data / transactions. This is a big expense / overhead to the business, and in these times of frugality should we not be looking to avoid additional cost to the business.
Maybe a different tactic should be taken - learn the performance profile of your production software - how it scales with volume, does running times increase linearly, exponentially or logarithmically? Can you model this? Firstly you can verify that the test environment is behaving in a similar way - this is key to having a valid test. Then the other important part is taking relative tests rather than absolutes - you aren't going to get absolute running times that are the same as production, but run your performance tests before deploying the code changes to give you your baseline, then deploy your code changes and re-run the performance tests - this will give you the relative changes in production (e.g. will the performance degrade with this code release), based on your models of performance you will be able to verify that the software is scaling in the same way with extra volume.
So my viewpoint is that there is a great deal you can learn about your software and hardware performance in the lower environment, and doing this on a smaller / less capable infrastructure saves your company money, and if used right can give you most of your answers to performance testing that you are looking for.

How to store 15 x 100 million 32-byte records for sequential access?

Me got 15 x 100 million 32-byte records. Only sequential access and appends needed. The key is a Long. The value is a tuple - (Date, Double, Double). Is there something in this universe which can do this? I am willing to have 15 seperate databases (sql/nosql) or files for each of those 100 million records. I only have a i7 core and 8 GB RAM and 2 TB hard disk.
I have tried PostgreSQL, MySQL, Kyoto Cabinet (with fine tuning) with Protostuff encoding.
SQL DBs (with indices) take forever to do the silliest query.
Kyoto Cabinet's B-Tree can handle upto 15-18 million records beyond which appends take forever.
I am fed up so much that I am thinking of falling back on awk + CSV which I remember used to work for this type of data.

If you scenario means always going through all records in sequence then it may be an overkill to use a database. If you start to need random lookups, replacing/deleting records or checking if a new record is not a duplicate of an older one, a database engine would make more sense.
For the sequential access, a couple of text files or hand-crafted binary files will be easier to handle. You sound like a developer - I would probably go for an own binary format and access it with help of memory-mapped files to improve the sequential read/append speed. No caching, just a sliding window to read the data. I think that it would perform better and even on usual hardware than any DB would; I did such data analysis once. It would also be faster than awking CSV files; however, I am not sure how much and if it satisfied the effort to develop the binary storage, first of all.
As soon as the database becomes interesting, you can have a look at MongoDB and CouchDB. They are used for storing and serving very large amounts of data. (There is a flattering evaluation that compares one of them to traditional DBs.). Databases usually need a reasonable hardware power to perform better; maybe you could check out how those two would do with your data.
--- Ferda

Ferdinand Prantl's answer is very good. Two points:
By your requirements I recommend that you create a very tight binary format. This will be easy to do because your records are fixed size.
If you understand your data well you might be able to compress it. For example, if your key is an increasing log value you don't need to store it entirely. Instead, store the difference to the previous value (which is almost always going to be one). Then, use a standard compression algorithm/library to save on data size big time.

For sequential reads and writes, leveldb will handle your dataset pretty well.

I think that's about 48 gigs of data in one table.
When you get into large databases, you have to look at things a little differently. With an ordinary database (say, tables less than a couple million rows), you can do just about anything as a proof of concept. Even if you're stone ignorant about SQL databases, server tuning, and hardware tuning, the answer you come up with will probably be right. (Although sometimes you might be right for the wrong reason.)
That's not usually the case for large databases.
Unfortunately, you can't just throw 1.5 billion rows straight at an untuned PostgreSQL server, run a couple of queries, and say, "PostgreSQL can't handle this." Most SQL dbms have ways of dealing with lots of data, and most people don't know that much about them.
Here are some of the things that I have to think about when I have to process a lot of data over the long term. (Short-term or one-off processing, it's usually not worth caring a lot about speed. A lot of companies won't invest in more RAM or a dozen high-speed disks--or even a couple of SSDs--for even a long-term solution, let alone a one-time job.)
Server CPU.
Server RAM.
Server disks.
RAID configuration. (RAID 3 might be worth looking at for you.)
Choice of operating system. (64-bit vs 32-bit, BSD v. AT&T derivatives)
Choice of DBMS. (Oracle will usually outperform PostgreSQL, but it costs.)
DBMS tuning. (Shared buffers, sort memory, cache size, etc.)
Choice of index and clustering. (Lots of different kinds nowadays.)
Normalization. (You'd be surprised how often 5NF outperforms lower NFs. Ditto for natural keys.)
Tablespaces. (Maybe putting an index on its own SSD.)
Partitioning.
I'm sure there are others, but I haven't had coffee yet.
But the point is that you can't determine whether, say, PostgreSQL can handle a 48 gig table unless you've accounted for the effect of all those optimizations. With large databases, you come to rely on the cumulative effect of small improvements. You have to do a lot of testing before you can defensibly conclude that a given dbms can't handle a 48 gig table.
Now, whether you can implement those optimizations is a different question--most companies won't invest in a new 64-bit server running Oracle and a dozen of the newest "I'm the fastest hard disk" hard drives to solve your problem.
But someone is going to pay either for optimal hardware and software, for dba tuning expertise, or for programmer time and waiting on suboptimal hardware. I've seen problems like this take months to solve. If it's going to take months, money on hardware is probably a wise investment.

How to prove that code isn't broken, but the hardware is?

I'm sure it repeats everywhere. You can 'feel' network is slow, or machine or slow or something. But the server/chassis logs are not showing anything, so IT doesn't believe you. What do you do?
Your regressions are taking twice the time ... but that's not enough
Okay you transfer 100 GB using dd etc, but ... that's not enough.
Okay you get server placed in different chassis for 2 week, it works fine ... but .. that's not enough...
so HOW do you get IT to replace the chassis ?
More specifically:
Is there any suite which I can run on two setups ( supposed to be identical ), which can show up difference in network/cpu/disk access .. which IT will believe ?

Computers don't age and slow down the same way we do. If your server is getting slower -- actually slower, not just feels slower because every other computer you use is getting faster -- then there is a reason and it is possible that you may be able to fix it. I'd try cleaning up some disk space, de-fragmenting the disk, and checking what other processes are running (perhaps someone's added more apps to the system and you're just not getting as many cycles).
If your app uses a database, you may want to analyze your query performance and see if some indices are in order. Queries that perform well when you have little data can start taking a long time as the amount of data grows if they have to use table scans. As a former "IT" guy, I'd also be reluctant to throw hardware at a problem because someone tells me the system is slowing down. I'd want to know what has changed and see if I could get the system running the way it should be. If the app has simply out grown the hardware -- after you've made suitable optimizations -- then upgrading is a reasonable choice.

Run a standard benchmark suite. See if it pinpoints memory, cpu, bus or disk, when compared to a "working" similar computer.
See http://en.wikipedia.org/wiki/Benchmark_(computing)#Common_benchmarks for some tips.

The only way to prove something is to do a stringent audit.
Now traditionally, we should keep the system constant between two different sets while altering the variable we are interested. In this case the variable is the hardware that your code is running on. So in simple terms, you should audit the running of your software on two different sets of hardware, one being the hardware you are unhappy about. And see the difference.
Now if you are to do this properly, which I am sure you are, you will first need to come up with a null hypothesis, something like:
"The slowness of the application is
unrelated to the specific hardware we
are using"
And now you set about disproving that hypothesis in favour of an alternative hypothesis. Once you have collected enough results, you can apply statistical analyses on them, to decide whether any differences are statistically significant. There are analyses to find out how much data you need, and then compare the two sets to decide if the differences are random, or not random (which would disprove your null hypothesis). The type of tests you do will mostly depend on your data, but clever people have made checklists to help us decide.

It sounds like your main problem is being listened to by IT, but raw technical data may not be persuasive to the right people. Getting backup from the business may help you and that means talking about money.
Luckily, both platforms already contain a common piece of software - the application itself - designed to make or save money for someone. Why not measure how quickly it can do that e.g. how long does it take to process an order?
By measuring how long your application spends dealing with each sub task or data source you can get a rough idea of the underlying hardware which is under performing. Writing to a local database, or handling a data structure larger than RAM will impact the disk, making network calls will impact the network hardware, CPU bound calculations will impact there.
This data will never be as precise as a benchmark, and it may require expensive coding, but its easier to translate what it finds into money terms. Log4j's NDC and MDC features, and Springs AOP might be good enabling tools for you.

Run perfmon.msc from Start / Run in Windows 2000 through to Vista. Then just add counters for CPU, disk etc..
For SQL queries you should capture the actual queries then run them manually to see if they are slow.
For instance if using SQL Server, run the profiler from Tools, SQL Server Profiler. Then perform some operations in your program and look at the capture for any suspicous database calls. Copy and paste one of the queries into a new query window in management studio and run it.
For networking you should try artificially limiting your network speed to see how it affects your code (e.g. Traffic Shaper XP is a simple freeware limiter).

Is it premature optimization to develop on slow machines?

We should develop on slow boxen because it forces us to optimize early.
Randall Hyde points out in The Fallacy of Premature Optimization, there are plenty of misconceptions around the Hoare quote:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
In particular, even though machines these days scream compared with those in Hoare's day, it doesn't mean "optimization should be avoided." So does my respected colleague have a point when he suggests that we should develop on boxes of modest tempo? The idea is that performance bottlenecks are more irritating on a slow box and so they are likely to receive attention.

This should be community wiki since it's pretty subjective and there's no "right" answer.
That said, you should develop on the fastest machine available to you. Yes, anything slower will introduce irritation and encourage you to fix the slowdowns, but only at a very high price:
Your productivity as a programmer is directly related to the number of things you can hold in your head, and anything which slows down your process or impedes you at all lengthens the amount of time you have to hold those ideas in short term-memory, making you more likely to forget them, and have to go re-learn them.
Waiting for a program to compile allows the stack of bugs, potential issues, and fixes to drop out of your head as you get distracted. Waiting for a dialog to load, or a query to finish interrupts you similarly.
Even if you ignore that effect, you've still got the truth of the later statement - early optimization will leave you chasing yourself round in circles, breaking code that already works, and guessing (with often poor accuracy) about where things might get bogged down. Design your code properly in the first place, and you can forget about optimization until it's had a chance to settle for a bit, at which point any necessary optimization will be obvious.

Slow computers are not going to help you find your performance problems.
If your test data is only a few hundred rows in a table your db will cache it all and you'll never find badly written queries or bad table/index design. If your server application is not multi-threaded server you will not find that out until you stress test it with 500 users. Or if the app bottlenecks on bandwidth.
Optimization is "A Good Thing" but as I say to new developers who have all sorts of ideas about how to do it better 'I don't care how quickly you give me the wrong answer'. Get it right first, then make it faster when you find a bottleneck. An experienced programmer is going to design and build it reasonably well to start with.
If performance is really critical (real time? millisecond-transactions?) then you need to design and implement a set of benchmarks and tools to scientifically prove to yourselves that your changes are making it faster. There are way too many variables out there that affect performance.
Plus there's the classic programmer excuse they will bring out - 'but it's running slow because we have deliberately picked slow computers, it will run much faster when we deploy it.'
If your colleague thinks its important give him a slow computer and put him in charge of 'performance' :-)

I guess it would depend on what you're making and what the intended audience is.
If you're writing software for fixed hardware (say, console games) then use equipment (at least test equipment) that is similar or the same as what you will deploy on.
If you're developing desktop apps or something in that realm then develop on whatever machine you want and then tune it afterward to run on the desired min-spec hardware. Likewise, if you're developing in-house software, there is likely to be a min-spec for the machines that the company wants to buy. In that case, develop on a fast machine (to decrease development time and therefore costs) and test against that min-spec.
Bottom line, develop on the fastest machine you can get your hands on, and test on the minimum or exact hardware that you'll be supporting.

If you are programming on hardware that is close to the final test and production environments, you tend to find that there are less nasty surprises when it comes time to release the code.
I've seen enough programmers get side-swiped by serious, but unexpected problems caused by their machines being way faster than their most of their users. But also, I've seen the same problem occur with data. The code is tested on a small dataset and then "crumbles" on a large one.
Any differences in development and deployment environments can be the source of unexpected problems.
Still, since programming is expensive and time-consuming, if the end-user is running slow out-of-date equipment, the better solution is to deal with it at testing time (and schedule in a few early tests just to check usability and timing).
Why cripple your programmers just because you're worried about missing a potential problem? That's not a sane development strategy.
Paul.

for the love of Codd, use profiling tools, not slow development machines!

Optimization should be avoided, didn't that give us Vista? :p
But in all seriousness, its always a matter of tradeoffs. Important questions to ask yourself
What platform will your end users be using?
Can I drop cycles? What will happen if I do?
I agree with most that initial development should be done on the fastest or most efficient (not neccesarily the same) machine available to you. But for running tests, run it on your target platform, and test often and early.

Depends on your time to delivery. If you are in a 12 month delivery cycle then you should develop on a box with decent speed since your customers' 12 months from now will have better "average" boxes than the current "average".
As your development cycle approaches "today", your development machines should approach the current "average" speed of your clients' boxes.

I typically develop on the fastest machine I can get my hands on.
Most of the time I'm running a debug build, which is slow enough already.

I think it is a sound concept (but maybe because it works for me).
If my developer workstation is too fast I find I don't think ideas through thorougly enough simply because there is little time-penalty in re-generating the software image or downloading it to the target. I'd say at least half my downloads were unnecessary because I just remembered something I'd missed right before I was going to debug the code.
The target machine could well contain a throttled processor. If - on an embedded MCU - you have half the FLASH, RAM and clock cycles per second chances are developers will be a lot more careful when designing their code. I once suggested byte variables for the lengths of individual records in a data area (not in RAM but in a serial eeprom) and received the reply "we don't need to be stingy." A few months later they hit the RAM ceiling (128KiB). My reflection was that for this app there would never be any records larger than 256 bytes simply because there was no RAM to copy them to.
For server applications I think it would be a great idea to have a (much) lower-performing hardware to test on. Two or four cores instead of sixteen (or more). 1.6 GHz istdo 2.8. The list goes on. A server is usually - due to the very fact that everyone talks to it - a bottleneck in the system architecture. And that is long before you start developing the (server) application for it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas