What things to test when re-compiling software for another operating system?

Our software vendor is currently working on a project to migrate our enterprise-scale laboratory system from Tru64 UNIX to Red Hat.
This obviously means recompiling with a new compiler and performing lots of testing.
While the vendor will do their own testing, we also need to do acceptance testing.
We don't exactly trust the vendor will be as thorough with their testing as we hope.
So I have been tasked to think of things that will need to be tested.
This is a laboratory system, so things such as calculations and rounding (and general maths) need to be tested.
But I thought I would ask the SO community for advice on what to test or perhaps past experiences with this sort of thing?

You will need to test everything. Whatever you tested in your original environment, you will need to test in your new environment.
Eventually, you'll gain confidence that most of your tests will simply never fail in the new environment. There will surely be a set of tests that will always succeed, as long as the old and new environments are Unix-based systems. That's fine - that's a set of tests you won't need to run constantly. I'd still keep those tests around to run once per release of the new OS or per release of your product, however, just to be safe.

Check that it works on both 32-bit and 64-bit CPUs, that it handles spaces in filenames, and that users don't need admin rights to run it or change its configuration.
Moving from one Unix to another isn't a huge leap.

If you can come up with a suite of regression tests, you can use those scenarios via an automated tool against the original and ported systems to make sure they match. The QA and UAT tests that you currently run against the system would probably be a good starting point, and then you could add any critical edge cases (such as those that test the math in detail) as needed. Paul's suggestion above about compiler issues would also allow derivation of some good edge cases; I'd recommend looking at that scope from the perspective of both the Tru64 and RHEL compilers.
A fair amount of my recent experience is with JMeter, which has a number of assertions, pre-conditions, and post-conditions that can be evaluated to ensure compliance. A number of the tools in this space would also allow you to do load testing, if appropriate.
If your system doesn't have a remotely accessible interface (like a web-based or socket-based interface), you could potentially do the same thing with scripted tools.

Thirteen or fourteen years ago, I couldn't move an Informix database from SCO OpenServer to Linux because SCO used 16-bit inode numbers, Linux used 32-bit inode numbers, and Linux's 'personalities' support was nowhere near as advanced as it is today. I can appreciate your skepticism.
If you can re-run old experiments with saved data and saved outcomes, that would be my preferred place to start. Even with simple datatypes, the precision or range of operations can differ considerably between compilers and platforms, so I wouldn't be surprised if small differences in output are common. Exact matches may not be realistic, but the results should certainly be close enough not to influence the larger 'outcomes' of your testing runs.
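A minimal sketch of that kind of "close enough" comparison, assuming each saved run can be reduced to a mapping of result names to numbers. The function name `compare_results`, the sample values, and the tolerances are all made up for illustration; the real tolerances should come from the precision your lab actually reports.

```python
import math


def compare_results(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """Return (name, old_value, new_value) for each result that differs beyond tolerance.

    `old` and `new` map a result name to a float. The default tolerances
    are placeholders, not recommendations.
    """
    mismatches = []
    for name in sorted(old.keys() | new.keys()):
        if name not in old or name not in new:
            # A result present on only one platform always counts as a mismatch.
            mismatches.append((name, old.get(name), new.get(name)))
            continue
        if not math.isclose(old[name], new[name], rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((name, old[name], new[name]))
    return mismatches


if __name__ == "__main__":
    # Hypothetical saved outcomes from the old and new platforms.
    tru64 = {"mean_ph": 7.3500000001, "glucose": 5.1}
    rhel = {"mean_ph": 7.3500000002, "glucose": 5.2}
    for name, a, b in compare_results(tru64, rhel):
        print(f"{name}: old={a!r} new={b!r}")
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to see whether the differences cluster around one calculation.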
Rather than searching for test cases, use it for what you've already done with it. (As an aside, that's also a good way to build test cases for software development.)

Test for differences in precision between the standard math library functions; they are not the same on different systems. If you need consistent calculations, you will need to replace them. Look into crlibm and/or fdlibm.
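One way to spot such differences is to dump results bit-exactly on both systems and diff the files, since printed decimals can hide last-bit disagreements. This is only a sketch: the helper name `fingerprint` and the choice of functions and inputs are arbitrary. It assumes a Python build whose `math` module calls into the platform's C math library, which is typical but worth confirming for your build.

```python
import math


def fingerprint(functions, inputs):
    """Return lines of 'name(x) = <hex float>' for every function/input pair.

    float.hex() is an exact representation of the double, so two runs agree
    on a line only if the results are bit-for-bit identical.
    """
    lines = []
    for name, func in functions:
        for x in inputs:
            lines.append(f"{name}({x!r}) = {func(x).hex()}")
    return lines


if __name__ == "__main__":
    funcs = [("sin", math.sin), ("exp", math.exp), ("log", math.log)]
    points = [0.1, 1.0, 10.0, 1e6]
    # Run this on both systems, save the output, and diff the two files.
    print("\n".join(fingerprint(funcs, points)))
```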

Related

How to obtain test cases

Recently I have been focusing on static analysis software, especially the Indus and Soot Java frameworks. I want to test these tools. Can anyone provide comprehensive test cases? I think the test cases I write are not typical enough.

My standard advice in evaluating static analysis tools is to test them on the real software you’ll be using them for: “Pitfall II: Don’t buy a tool based on bugs it finds in other people’s code. Before you commit to a static analysis tool, make sure that it finds important bugs in your real code. Bugs found in open source or demo code can be very impressive; but your organization’s code, while it’s under development (which is the cheapest time to find bugs) will be very different from code which has already been made public.” Supplemental Proceedings of the 21st IEEE International Symposium on Software Reliability Engineering, http://pobox.com/~flash/Static_Analysis_Deployment_Pitfalls.pdf.

Your best bet is to contact the vendors of these software packages and ask them for test cases. It is in their own interest to have as many as possible right now.

One way of obtaining test cases is holding onto the input files you get from your users when things break -- maybe distill the input to the least amount of input necessary to trigger the bug in the broken version, and make sure newer versions work correctly.
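A rough sketch of how such saved inputs could be replayed, assuming a made-up layout where each minimized input lives in `<name>.in` next to a `<name>.expected` file holding the output the fixed version should produce. The function name and file extensions are illustrative only.

```python
import tempfile
from pathlib import Path


def run_saved_cases(case_dir, process):
    """Run `process` on every saved input and return the names of failing cases.

    `process` takes the input text and returns the output text. A case with
    no .expected file is reported as failing rather than silently skipped.
    """
    failures = []
    for input_path in sorted(Path(case_dir).glob("*.in")):
        expected_path = input_path.with_suffix(".expected")
        if not expected_path.exists():
            failures.append(input_path.stem)
            continue
        if process(input_path.read_text()) != expected_path.read_text():
            failures.append(input_path.stem)
    return failures


if __name__ == "__main__":
    # Demo in a throwaway directory, with str.upper standing in for the real program.
    with tempfile.TemporaryDirectory() as case_dir:
        Path(case_dir, "mixed_case.in").write_text("abc")
        Path(case_dir, "mixed_case.expected").write_text("ABC")
        print(run_saved_cases(case_dir, str.upper))
```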

Is there a right way to implement a continuous improvement (AKA software hardening) process?

Each release it seems that our customers find a few old issues with our software. It makes it look like every release has multiple bugs, when in reality our new code is generally solid.
We have tried to implement some additional testing where we have testers do several hours of monthly regression testing on a single app each month in an effort to stay ahead of small issues. We refer to this process as our Software Hardening process, but it does not seem like we are catching enough of the bugs and it feels like a very backburner process since there is always new code to write.
Is there a trick to this kind of testing? Do I need to target one specific feature at a time?

When you develop your testing procedures, you may want to implement these kinds of tests:
unit testing (testing individual components of your project in isolation). These tests are important because they allow you to pinpoint where in the software an error comes from. In each test you exercise a single piece of functionality and use mock objects to simulate the behavior and return values of other objects/entities.
regression testing, which you mentioned
characterization testing. One example is automatically running the program on automatically generated input (simulating user input), storing the results, and comparing the results of every later version against them.
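The characterization idea can be sketched roughly as follows, assuming the code under test can be called as a function of one integer and returns something JSON-serializable. The function name `characterize`, the input range, and the baseline file are illustrative choices, not part of any particular framework.

```python
import json
import random
import tempfile
from pathlib import Path


def characterize(func, golden_path, seed=0, cases=100):
    """Compare func's outputs on generated inputs against a stored baseline.

    On the first run (no baseline file yet) the outputs are recorded and an
    empty list is returned. Later runs return the inputs whose output changed.
    """
    rng = random.Random(seed)  # fixed seed so every run sees the same inputs
    inputs = [rng.randint(-1000, 1000) for _ in range(cases)]
    current = {str(x): func(x) for x in inputs}

    golden_file = Path(golden_path)
    if not golden_file.exists():
        golden_file.write_text(json.dumps(current, sort_keys=True))
        return []

    baseline = json.loads(golden_file.read_text())
    return sorted(key for key in current if baseline.get(key) != current[key])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp, "golden.json")
        print(characterize(lambda x: x * x, golden))  # first run records the baseline
        print(characterize(lambda x: x * x, golden))  # same behaviour, nothing reported
        print(len(characterize(lambda x: x * x + 1, golden)))  # changed behaviour is reported
```

The baseline only says what the program did before, not what it should do, so a reported difference still needs a human to decide whether it is a bug or an intended change.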
At the beginning this will be very heavy to put in place, but as more releases go out and more bug fixes are added to the automated non-regression tests, you should start to save time.
It is very important that you do not fall into the trap of designing huge numbers of dumb tests. Testing should make your life easier. If you spend too much time working out where the tests have broken, redesign them so that they give you better messages and a better understanding of the problem, so you can locate the issue quickly.
Depending on your environment, these tests can be linked to the development process.
In my environment, we use SVN for versioning. A bot runs the tests against every revision and reports the failing tests and their messages, along with the revision that broke them and the contributor's login.
EDIT:
In my environment, we use a combination of C++ and C# to deliver analytics used in finance. The code was C++ and is quite old; we are trying to migrate the interfaces toward C# while keeping the core of the analytics in C++ (mainly because of speed requirements).
Most of the C++ tests are hand-written unit tests and regression tests.
On the C# side we are using NUnit for unit testing. We have a couple of general tests.
We have a zero-warnings policy: we explicitly forbid people from committing code that generates warnings unless they can justify why it is useful to bypass the warning for that part of the code. We also have conventions about exception safety, the use of design patterns, and many other aspects.
Explicitly setting conventions and best practices is another way to improve the quality of your code.

Is there a trick to this kind of testing?
You said, "we have testers do several hours of monthly regression testing on a single app each month in an effort to stay ahead of small issues."
I guess that by "regression testing" you mean "manually exercising old features".
You ought to decide whether you're looking for old bugs which have never been found before (which means, running tests which you've never run before); or, whether you're repeating previously-run tests to verify that previously-tested functionality is unchanged. These are two opposite things.
"Regression testing" implies to me that you're doing the latter.
If the problem is that "customers find a few old issues with our software" then either your customers are running tests which you've never run before (in which case, to find these problems you need to run new tests of old software), or they're finding bugs which you have previously tested for and found, but which you apparently never fixed after you found them.
Do I need to target one specific feature at a time?
What are you trying to do, exactly:
Find bugs before customers find them?
Convince customers that there's little wrong with the new development?
Spend as little time as possible on testing?
Very general advice is that bugs live in families: so when you find a bug, look for its parents and siblings and cousins, for example:
You might have this exact same bug in other modules
This module might be buggier than other modules (written by someone on an off day, perhaps), so look for every other kind of bug in this module
Perhaps this is one of a class of problems (performance problems, or low-memory problems) which suggests a whole area (or whole type of requirement) which needs better test coverage
Other advice is that it's to do with managing customer expectations: you said, "It makes it look like every release has multiple bugs, when in reality our new code is generally solid" as if the real problem is the mistaken perception that the bug is newly-written.
it feels like a very backburner process since there is always new code to write
Software development doesn't happen in the background, on a burner: either someone is working on it, or they're not. Management must decide whether to assign anyone to this task (i.e. look for existing previously-unfound bugs, or fix previously-found-but-not-yet-reported bugs), or whether they prefer to concentrate on new development and let the customers do the bug-detecting.
Edit: It's worth mentioning that testing isn't the only way to find bugs. There's also:
Informal design reviews (35%)
Formal design inspections (55%)
Informal code reviews (25%)
Formal code inspections (60%)
Personal desk checking of code (40%)
Unit test (30%)
Component test (30%)
Integration test (35%)
Regression test (25%)
System test (40%)
Low volume beta test (<10 sites) (35%)
High-volume beta test (>1000 sites) (70%)
The percentage which I put next to each is a measure of the defect-removal rate for each technique (taken from page 243 of McConnell's Software Estimation book). The two most effective techniques seem to be formal code inspection and high-volume beta tests.
So it might be a good idea to introduce formal code reviews: that might be better at detecting defects than black-box testing is.
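As a back-of-the-envelope illustration of why adding a technique helps: if you assume each technique removes its stated fraction of whatever defects remain, independently of the others, the combined rate is 1 minus the product of the individual miss rates. That independence is a simplifying assumption made here for the arithmetic and is not a claim taken from the book. The function name is likewise just for illustration.

```python
def combined_removal_rate(rates):
    """Fraction of defects removed when several techniques are applied in turn.

    Assumes each technique removes its stated fraction of the defects still
    present, independently of the other techniques (a simplification).
    """
    remaining = 1.0
    for rate in rates:
        remaining *= 1.0 - rate
    return 1.0 - remaining


if __name__ == "__main__":
    # Rates from the list above: unit test 30%, regression test 25%.
    print(f"{combined_removal_rate([0.30, 0.25]):.1%}")
    # The same two plus formal code inspections (60%).
    print(f"{combined_removal_rate([0.30, 0.25, 0.60]):.1%}")
```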

As soon as your coding ends, you should first do unit testing. There you will find some bugs, which should be fixed, and you should then perform another round of unit testing to check whether the fixes introduced new bugs. After you finish unit testing, you should move on to functional testing.
You mentioned that your testers perform regression testing on a monthly basis and old bugs are still coming out. So it is better to sit down with the testers and review the test cases, as I feel they need to be updated regularly. During the review, focus on which modules or functionality the bugs are coming from. Add more test cases in those areas, and add them to your regression test cases so that they are run whenever a new build comes in.
One more thing you can try: if your project is a long-term one, talk with the testers about automating the regression test cases. That lets you run the test cases during off hours, such as at night, and get the results the next day. The regression test cases should also be kept up to date: the major problem arises when they are not updated regularly, so that even when you run both the old regression test cases and the new progression test cases, a few modules are left untested.

There is a lot of talk here about unit testing and I couldn't agree more. I hope that Josh understands that unit testing is a mechanized process. I disagree with PJ in that unit tests should be written before coding the app and not after. This is called TDD or Test Driven Development.
Some people write unit tests that exercise the middle tier code but neglect testing the GUI code. That is imprudent. You should write unit tests for all tiers in your application.
Since unit tests are also code, there is the question of QA for your test suite. Is the code coverage good? Are there false positives or false negatives in the unit tests? Are you testing for the right things? How do you assure the quality of your quality assurance process? Basically, the answer to that comes down to peer review and cultural values. Everyone on the team has to be committed to good testing hygiene.
The earlier a bug is introduced into your system, the longer it stays in the system, the harder and more costly it is to remove it. That is why you should look into what is known as continuous integration. When set up correctly, continuous integration means that the project gets compiled and run with the full suite of unit tests shortly after you check in your changes for the day.
If the build or unit tests fail, then the offending coder and the build master get a notification. They work with the team lead to determine what the most appropriate course correction should be. Sometimes it is as simple as fixing the problem and checking the fix in. The build master and team lead need to get involved to identify any overarching patterns that require additional intervention. For example, a family crisis can cause a developer's coding quality to bottom out. Without CI and some managerial oversight, it might take six months of bugs before you realize what is going on and take corrective action.
You didn't mention what your development environment is. If yours were a J2EE shop, then I would suggest that you look into the following.
CruiseControl for continuous integration
Subversion for the source code versioning control because it integrates well with CruiseControl
Spring because DI makes it easier to mechanize the unit testing for continuous integration purposes
JUnit for unit testing the middle tier
HttpUnit for unit testing the GUI
Apache JMeter for stress testing

Going back and implementing a testing strategy for (all) existing stuff is a pain. It's long, it's difficult, and no one will want to do it. However, I strongly recommend that as a new bug comes in, a test be developed around that bug. If you don't get a bug report on it, then either it (a) works or (b) the user doesn't care that it doesn't work. Either way, a test is a waste of your time.
As soon as it's identified, write a test that goes red. Right now. Then fix the bug. Confirm that it is fixed. Confirm that the test is now green. Repeat as new bugs come in.
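A small sketch of what that can look like. The function `parse_quantity` and the bug (a comma used as the decimal separator) are invented for illustration. The test functions are plain asserts named so that a runner such as pytest would collect them, and they can also be called directly.

```python
def parse_quantity(text):
    """Hypothetical function under test: parse a decimal quantity.

    Imagined bug report: "1,5" (comma as decimal separator) raised
    ValueError. The replace() call below is the fix.
    """
    return float(text.strip().replace(",", "."))


def test_comma_decimal_separator():
    # Written from the bug report first. It was red before the fix,
    # because float("1,5") raises ValueError.
    assert parse_quantity("1,5") == 1.5


def test_plain_decimal_still_works():
    # Guards the behaviour that already worked, so the fix cannot break it.
    assert parse_quantity(" 2.25 ") == 2.25
```

Keeping the test in the suite afterwards is what stops the same bug from quietly coming back in a later release.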

Sorry to say that but maybe you're just not testing enough, or too late, or both.

Is including system test cases into the final packaged product of your application contributing to bloat or increasing risk?

I am packaging up an rpm file which has a %postinstall section that detects certain conditions and runs a suite of unit, function, and system tests. I am getting some push back that it exposes some of the internal structure as I use some of the same environment variables the code itself uses for diagnostics. Thoughts?
UPDATE: I am not planning on running the tests automatically nor exposing their existence to the end users. I am proposing that the testing package simply be available on any machine where the suite lands. It adds roughly 3% to the final size of the package and requires an obscene amount of internal knowledge to execute properly.
The program itself is a library which others may use and is exposed in an API. The internal knowledge of how things function is not at issue. My main motivation is the lack of suitable test resources and the large variability in the target environment. Some of the tests are really simple (similar to what configure might do to determine that all the right features are available from the compiler). Other tests are more involved and they prove the basic functions the library should provide.

If you want to avoid the complaint that it runs on every install, at least use the %check rule of RPM.
Sounds like people are concerned about "reverse engineering". So the software is proprietary? This would seem to be the crux of your problem. Regardless, it's common for the test suite to be separate from the packaged software.
However, you're not being unrealistic: Allowing users to run tests themselves on their systems and give you the results is a great aspect of a collaborative relationship with users. Unfortunately, you're running up against the proprietary business model.
Perhaps you can compromise by trimming down or rewriting the tests and the diagnostics to only prove an adequate amount of fitness without revealing too much. I wouldn't back down from throwing out the tests and diagnostics of what you've written so far.
You really should make the argument that users will be pleased and have more confidence in a software package shipped with a thorough testing system, and that these outweigh any fears of revealing the software's internals.

Are embedded developers more conservative than their desktop brethren?

I've been in the embedded space for a while now, and it seems that most programmers I talk to seem to be doing things pretty much the same way it was done 15 years or more ago: Waterfall(ish) Development, command line tools and a small group uses lint.
Contrast this with the server/desktop environment, where there seems to be lots of activity related to all sorts of facets of programming:
XP, Scrum, Iterative, Lean/Agile
Continuous Integration
Automated Builds
Automated Unit Testing Frameworks
Refactoring tool support
Is it just that the embedded environment makes it more difficult to implement new practices or tools?
Is it that the mindset of embedded programmers steers them away from new tools/concepts?
Is it that management in the typical embedded industry is behind the curve compared to IT-focused fields?
I do realize that this is a generalization, and some embedded projects do use Scrum, Agile, CI, Automated Builds (in fact I worked at a company that had that in place since the 80s). But my impression is that it is a very small percentage.

We are all used to the fact that our desktop PC crashes once in a while (or at least an application on the desktop suddenly disappears). It's no big deal. The next patch will fix it.
In the embedded space, you are building something which can't be patched. Lives can depend on your device (in a car, an elevator or a medical system). Most devices are installed and then must run unattended for years. So embedded people tend to be very conservative. TCP/IP is often "too modern". They stick to their trusty serial bus with a communication "stack" that is roughly 50 lines of assembler code.
What's worse, you simply don't have the abundance of space on the device which means you can't use one of the latest programming languages which make TDD and automated builds a bliss.
Next, a lot of embedded development environments are proprietary. If your supplier doesn't support it, you won't get it. Linux has started to weaken this in the past years but a whole lot of devices are not powerful enough to run Linux, yet. And even if they were, the CPU power would be used for something else instead of running a fancy OS which comes with source.
So yes, there are powerful forces working in the background to keep the embedded space where it is.

Are embedded developers more conservative than their desktop brethren?
Yes, because they are more concerned with the consequences of making errors. It’s a big deal to patch an embedded device. Not so much for a desktop app.
Waterfallish development is necessary in the embedded world because you are generally building hardware at the same time as the software. You need to know as soon as possible how much memory, how much processor speed, how big a flash, what (if any) special hardware is necessary, etc. The hardware design can’t be completed until you know these answers. Once you decide, that is pretty much it. The lead time for redoing a board is far too long. If you mess up, then the software is going to have to work around any shortcomings. Not usually an ideal situation.
As for the tools, that is largely based on what the supplier provides and any biases of the developers. On some projects I have used XP Embedded and got pretty much everything that the desktop developer gets.
XP, Scrum, Iterative, Lean/Agile:
Since most of the design is done up front (by necessity), and you usually don’t have working hardware when it is time to code, the quick turn-around processes don’t really provide much benefit.
Continuous Integration/Automated Builds
Nice to have, but not really necessary. What…it takes about 15 seconds to open the IDE and press the compile button.
Automated Unit Testing
No reason why this shouldn't be done, but only part of the code can truly be automatically tested because the other part is either hardware dependent or has some other dependencies like timing. So you can't really tell if the code is working by the automated tests.
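To illustrate the part that can be tested automatically: if the decision logic takes the hardware access as a parameter, a stub can stand in for it on the host. The sketch below is in Python for brevity (the same separation is possible in C with function pointers), and the thermostat, its names, and its thresholds are all invented for the example. It says nothing about timing or real sensor behaviour, which is exactly the part the automated tests can't cover.

```python
def thermostat_step(read_temperature_c, heater_on, setpoint_c=20.0, hysteresis_c=0.5):
    """Decide the next heater state from one temperature reading.

    `read_temperature_c` is any zero-argument callable. On the target it
    would wrap the real sensor driver; on the host a stub supplies values.
    """
    temperature = read_temperature_c()
    if temperature < setpoint_c - hysteresis_c:
        return True
    if temperature > setpoint_c + hysteresis_c:
        return False
    return heater_on  # inside the dead band: keep the current state


def stub_sensor(value):
    """Host-side stand-in for the sensor driver; always reports `value`."""
    return lambda: value


if __name__ == "__main__":
    print(thermostat_step(stub_sensor(18.0), heater_on=False))
    print(thermostat_step(stub_sensor(22.0), heater_on=True))
```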
Refactoring Tool Support
The embedded processor vendors' product is the processor. They provide IDE support in order to encourage you to purchase their processor. They couldn't possibly afford to pay for a Visual Studio-sized development team to add all the bells and whistles to an IDE that isn't even their product.

These are some reasons I can think of:
Embedded teams are usually smaller than desktop/Web teams. The code base is smaller.
System testing is much more important than unit testing. The software needs to be tested together with the hardware. Automated testing is often not possible for that, and can only be applied to a small fraction of the code base.
Embedded engineers have a different skill set than software engineers. They interact with hardware and know how to use an oscilloscope and a logic analyzer. Usually, the difficult part of their job is to find a glitch in the hardware. They do not have the time to adopt modern software methodologies.

Embedded programmers are mostly electrical engineers, not computer scientists or software engineers.
They excel in their field of expertise. They bring a slower more methodical approach than most computer programmers. When it comes to programming firmware, electrical engineers know just enough to be dangerous.
Here are some of the things I've noticed electrical engineers doing in C:
All code in ONE single file
Math like variable names: x, y, z
No or missing indentation
No standard comment headers
No comments at all
Too many comments
In their defense, EEs didn't train to be computer programmers; it's not their job. I think software is the hardest part of creating embedded devices. Designing PCBs and choosing components requires skill but pales in comparison to the complexity of 10,000 lines of code.
Embedded programmers also have to deal with IDEs that look and behave like the IDEs of the '90s.
MPLAB
AVR Studio

Is it just that the embedded environment makes it more difficult to implement new practices or tools?
It's partly a matter of scale. Software is NOT the product, the product is the product. However, there are thousands of different types of microcontrollers and microprocessors out there, and the most popular thousand have 3-4 different compilers that aren't completely compatible.
So a given tool is only going to be used by a few hundred or thousand engineers.
In Windows development, however, there are millions of programmers of many levels. The tools directly produce software, which is the product, and so they are going to get more eyeballs and more money.
Each new product that an engineer puts out might have a different processor.
Is it that the mindset of embedded programmers steers them away from new tools/concepts?
Embedded programmers are generally software or firmware engineers, as opposed to programmers. Engineering implies a certain amount of design, design analysis, and design proof prior to implementation. In other words, a ton of work is done before the first line of code is written, and the documentation, ideally, is specific enough that implementation is merely turning pseudocode-like documentation into compilable code.
New tools and concepts are needed in the design phase, not the implementation phase. An IDE with intellisense may be nice, but by the time the code is being written it's useless cruft - they already know what they need.
CAD (computer-aided design) tools are being developed for firmware engineers; they are used in the design phase to develop models and simulations that are turned directly into code. MATLAB and Simulink are good examples of this. The system as a whole is designed.
In fact, one might wonder why software developers are still writing code while the engineers are making data/program flow charts and state machine diagrams. Why is UML uptake so slow in the application world? It sounds like application developers can use some of the tools in common use among embedded systems engineers...
Is it that management in the typical embedded industry is behind the curve compared to IT-focused fields?
Actually, it's likely the reverse. When a project starts the engineers have to pick the processor.
The processor manufacturers get less money on older chips, so they pitch the latest and greatest, and they are generally cheaper overall than the chips used in the previous design (either by die shrinks, more integration, etc).
So the design is actually using the latest and greatest chips.
The downside is that the compiler and tools are often immature. They can only build so much on the older tools, and since the target moves with each new processor, they can't focus on a lot of the nice features application programmers might like. Especially since many of those features won't be useful to an embedded engineer.
There are many other factors, some of which are enumerated by other answers, but it's really a different field even though they both involve programming.
-Adam

I would also add a couple of points here:
In general, embedded projects tend to be smaller than desktop projects. This decreases the need for very elaborate software processes.
Requirements for embedded projects are often more precise and better defined. Therefore Scrum and agile are not so crucial.
Finally, embedded projects are generally a mix of software and hardware. Since the software is only a part of the project, embedded developers invest less time in software processes.

I agree with much that's been written here:
Old tools without the bells and whistles (far fewer refactorings available due to C/C++'s preprocessor directives, if any at all) (time consuming to choose a unit test framework vs simply using JUnit).
It's true that waterfall feels more efficient. If I'm going to open the hood and get into a hard-to-access place, I'll want to do as much as I can while I'm there, rather than exiting and closing the hood after each task just to open it again. The idea that creating the most important features first allows you the option of shipping when promised instead of going late can also be hard to grasp when you believe nothing is optional, which might be true. IME, though, when the deadline looms something always becomes unnecessary.
Less visibility into the system makes it riskier to revisit existing code to refactor or change functionality. There are often timing issues, which automated tests running on the host using stubs and mocks won't catch. It can be hard for someone who's been bitten by these issues to take a different perspective.
I'll add one more; the language of agile/scrum is in workstation programmer's terms. To an embedded developer who knows just enough C to get the job done, what is a class, object, or method? When the "user" is typically regarded as a physical person clicking and typing, and the product has no person user interface, it's easy to dismiss the idea as Not Applicable. This may change with James Grenning's forthcoming book about TDD in C. I've been reading the beta ebook and it's quite good.

I would say it's more lack of good toolsets. It's really frustrating when you want to use C++ for its compile-time features not present in C (templates, namespaces, object-orientedness, etc) rather than its run-time features (exceptions, virtual functions) -- but the device manufacturers & 3rd parties just give you a C compiler, not C++. This probably results more from market size (hundreds of millions of PCs running Windows, with hundreds of thousands or even millions of developers -- vs. hundreds of thousands of Chip X, with hundreds or low thousands of developers) than from device capability.
edit: w/r/t robustness: there are different markets out there. The car/elevator/aeronautics/medical device market is going to have to be rigorous about getting rid of bugs. Other markets (toys, MP3 players, & other consumer electronics) can be less rigorous, especially if it's possible to upgrade code in the field. ("Oops! We're sorry we deleted your music library! We just fixed that bug, you can grab the latest release at our website at your convenience!")

I'd say different sorts of problem environments.
The biggest problem with the waterfall methodology is that requirements change. In every environment I've been in, there has been at least the likelihood of a requirements change, which means that the successful methodologies are those that keep flexibility as long as possible. Even if the customer has signed off in blood, and stands to forfeit his left hand if he suggests a change, there are changes coming in the future.
In embedded programming, it is possible to nail the requirements down up front. They come from the behavior of the system as a whole, and engineers are good at nailing down system requirements. Nobody's going to come in halfway through and say that the user now wants the pacemaker to deliver syncopated impulses while the recipient is dancing.
Once the requirements are frozen beyond thawing, which never happens in software designed for human use, waterfall is a very efficient methodology. The team proceeds from well-specified requirements to overall design, then detailed design, then coding, verifying all the way that the stages are done correctly. Then it's time to debug the code (since it's never perfect when written), and final tests to make sure the code meets the requirements.

I would also posit that some fields are inherently conservative. The transportation industry for example, where trains and planes may have life spans of 30 years or so. Customers tend to require tried and true practices, probably derived from IEEE. Waterfall is what customers know, waterfall is what customers demand.

What exactly defines production?

Like almost anyone who's been programming for a while, I'm familiar with the term "production code" and have a vague sense of what it means. However, can someone offer a semi-rigorous definition, since it seems Wikipedia and Google can't? It seems like there are a lot of gray areas in what counts as production, such as internal tools that are used by a small group of people and therefore not "formalized" in terms of UI, documentation, etc. and open source apps that are feature complete, reasonably bug free and working, but lack polish, UI and extensive testing.

When your code runs on a production system, that means it is being used by the intended audience in a real-world situation.
Production code, however, does not necessarily mean robust, reliable, or stable code. The Daily WTF provides plenty of evidence in this regard.

Production means anything that you need to work reliably and consistently, whether it is a build script or a public-facing web server.
When others rely on your code, particularly folks who may not understand it (for example, "smart" developers outside your group who use a library you wrote), that code is production code.
It's production because "work stops" and "money is lost" when the production code fails.

The definition as I understand it is that production code is any code that is installed or in use on a live, non-test-bed system. A server used internally to a company is a production system if it is the live system used by the employees of the company. The point here is that code running on a server internal to the company writing the code can be production code.
Usually, a good distinction when looking at internal code is whether or not the group maintaining the code is separate from the group using the code. If the groups are separate, odds are that the code is production code. If running the business depends on the code, then it is certainly production code, even if it is developed and maintained in-house.

EDIT: The short answer: If you are "betting the farm on it", it is "production".
This is a great question--an absolutely critical distinction that routinely gets everyone in trouble due to misunderstandings. The question of what is "production" is a subset of the related question of what is an "environment".
So part of the answer is that "production" is THE "environment" that is most important and is most trusted as THE "real" thing.
So now we must define "environment" (and then revisit "production"). We are still far from a satisfactory answer.
We programmers use the term "environment" constantly to refer to computer systems consisting of hardware that is executing software. That software is the code that we wrote plus software that it depends upon, which was written by others. We write our code and integrate it with the other software, then we typically run the integrated software through an escalating series of tests (unit tests, integration tests, functional tests, acceptance tests, regression tests, etc.), until we finally run the integrated software in the full manner in which it was intended.
Of course, not everything is fully automated. There are usually numerous people involved, and they have manual processes to perform. We programmers look for ways to automate as many of these processes as possible, but there is always a "man/machine boundary" in the systems we work on. Often, there are many such boundaries in any particular case.
On the other hand, there may not be any significant automation at all. For example, we spoke of "production" way back when we had a room full of people performing manual labor which produced a product. So, there doesn't have to be any automation present in our "production" "environment". There is also a middle ground, where the automation involved does not include software, such as in the case of a person running a loom to weave cloth.
Also, there may not be a product, since we have adapted our language of "production" "environment" to include product-less service providers.
Likewise, the testing may not involve software, since we may be testing a non-software-driven machine (e.g., the loom) or even the people (training and evaluation).
Now we have touched on all the crucial elements of an "environment":
there is a purpose, an intent, being pursued
an intent requires an intender, so there must be a sponsor (a person or group, but not a machine) that specifies the intent
that intent is pursued through various processes that are performed by various actors
those actors may be people, they may be software executing on hardware, or they may be non-software-driven machines, so there may or may not be automation present
Now we can properly and fully define our original terms.
An environment consists of all the processes and their actors that collaborate to pursue a particular intent on behalf of its sponsor. That means software executing on hardware, that means non-software-driven machines, and that means people performing their various duties. It is the intent that primarily defines an environment, not its processes or its actors.
Furthermore...
If the intent being pursued in a particular environment is the sponsor's ultimate goal, which usually involves producing a product or providing a service in exchange for money, then we refer to that environment as production.
Now we can go a bit further.
If the intent being pursued in an environment is the verification of processes and their actors in preparation for production, we call that a test environment.
We further call it an integration environment if that testing involves the initial joining together of significant individuals or groups of processes and their actors.
If that preparation involves the "programming" of human actors to perform new processes, or the subsequent verification (evaluation), then we call that a training environment.
Armed with these distinctions and definitions, we can now understand several common scenarios.
An environment can be mislabeled with a name that does not match its intent, such as when a training environment is used as test.
An environment can be grossly misused, such as when integration or training is done in production.
An environment can be misrepresented, such as when key processes or actors are left unidentified (e.g., manual reconciliations, or even by ignoring the people altogether).
An environment can be retasked, by repurposing its processes and actors to a new intent. A very successful technique for some organizations is to routinely "flip" several sets of actors (servers hosting software) between production, test, training, and integration upon each release.
In most cases, a single actor (person or hardware) can execute multiple processes which can participate in multiple environments. For example, a single computer server can host software that performs production transactions while also hosting other software that performs test or training functions.
Normally, a single instance of an actor should participate in only one environment at a time. On very rare occasion, a single actor can be shared across environments if the intents are mutually compatible. Most of the time, it is very unwise to attempt such sharing because the intents are not really compatible. A perfect example is running a test process on a server that also supports production processes, resulting in downtime because the test caused the entire server to fail.
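To make the "one actor, one environment" rule a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the definitions above, not a real tool: the names `Intent`, `Actor`, `Environment`, and `shared_actors`, as well as the example server and sponsor names, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List, Set


class Intent(Enum):
    """The intent that defines an environment (hypothetical names)."""
    PRODUCTION = "production"
    TEST = "test"
    INTEGRATION = "integration"
    TRAINING = "training"


@dataclass(frozen=True)
class Actor:
    """A person, a server running software, or a non-software machine."""
    name: str
    kind: str  # e.g. "person", "server", "machine" (illustrative only)


@dataclass
class Environment:
    """An intent pursued by a set of actors on behalf of a sponsor."""
    sponsor: str
    intent: Intent
    actors: Set[Actor] = field(default_factory=set)


def shared_actors(environments: Iterable[Environment]) -> Dict[Actor, List[Intent]]:
    """Return the actors that participate in more than one environment,
    mapped to the intents of the environments they appear in."""
    seen: Dict[Actor, List[Intent]] = {}
    for env in environments:
        for actor in env.actors:
            seen.setdefault(actor, []).append(env.intent)
    return {actor: intents for actor, intents in seen.items() if len(intents) > 1}


# Example: one database server is used by both production and test.
db = Actor("db-server-1", "server")
web = Actor("web-server-1", "server")
prod = Environment("operations", Intent.PRODUCTION, {db, web})
test = Environment("qa-team", Intent.TEST, {db})

for actor, intents in shared_actors([prod, test]).items():
    print(actor.name, "is shared by:", [i.value for i in intents])
```

Any actor this reports is a candidate for the kind of accident described above, where a test takes down a server that production also depends on. Whether the sharing is acceptable still depends on whether the intents are really compatible, which code like this cannot decide for you.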
Therefore, the intent of an environment must be construed with very wide latitude, to include concepts such as availability, reliability, performance, disaster recovery, accuracy, precision, repeatability, longevity, etc. This means that the actors and processes must often be construed to include things like providing power, cooling, backups, and redundancy.
Finally, note that the situation can get quite complex. For example, a desktop computer (actor) may be tasked by the development team (sponsor) to host their source control (process), which the team relies upon for their primary jobs (production). Nevertheless, the IT staff sees that same desktop computer as simply a developer workstation (development, not production) and treats it with contempt and nonchalance when it develops a hardware problem. But the developers are producing production code, so aren't they also part of production? Perspective matters.
EDIT: Production quality
A solid verification (testing) methodology should take packaged code from development and run it through a series of tests (integration, TQA, functional, regression, acceptance, etc.) until it comes out the other side "stamped" for production use. However, that makes the package production quality, but not actually production. The package only becomes production when a sponsor actually deploys it into an environment with that ultimate level of intent.
However, if your organization merely produces that package (its product) for the consumption of others, then such a release comes as close to production as that organization will experience with respect to that product, so it is common to stretch the term production to apply rather than clarify that it is production quality. In reality, that organization's production environment consists of the actors and processes involved in its development/release efforts that result in that product.
I said it could get quite complex...

Any code that will be used by its intended userbase would fit into my definition of 'production code'.
Of course, the grey area in that definition would be clearly defining who your userbase is.
G-Man

The production software can perform at the necessary workload without disruption or degradation of the service
Software has been successfully tested in different production scenarios
Transforming a working prototype into production software that runs on a fail-safe, redundant architecture and can work in a real business (i.e. production) environment takes time, code refactoring, and attention to detail
The production code has an acceptable level of maintainability and is reasonably well commented
The documentation manual explains the functionality and all features, and facilitates maintenance
If the production software is an international service or application, it must be localized
Production code is used by end users, often customers, under the conditions described in a Terms-of-Service agreement
Production software does not necessarily mean reliable, mission-critical software
The software does well what it was intended to do
Log files provide an accurate description of run-time performance and software reliability metrics, and this reporting facilitates debugging and software maintainability

I think the best way to describe it is as any code that "leads to" deployment and "follows up" deployment. Deployment itself is defined as all of the activities that make a software system available for use. If your code is ready to be used by people, in-house or otherwise, then it is production code.

In simple words: production code is code that is live and in use by its intended audience.

The term "production code" mixes two different concepts. One is deployment management and the other is release life cycle.
In the strict sense of the word, a system is in production when it is being used as part of a business or service operation. What's not in production are development, testing, QA, demo, and staging systems. Being a production system does not automatically imply quality.
From the release life cycle's point of view, a "production" build is the build that is released to the general public or to clients. It is the stage after pre-alpha, alpha, beta (feature complete, code complete, etc.), and release candidate. For shrink-wrap products that cannot easily deploy updates, reaching the production stage likely implies a series of tests and bug fixes.
