I want a sandboxed test environment that is *always* an exact copy of Production

I'm having an issue with a web application I am responsible for maintaining.
The system experiences regular bugs, and our support vendors are always asking us to see if we can "replicate the error in UAT". This is obviously a reasonable request. A lot of the time, for various reasons (some of which are clear, some of which are not), these errors are not present in UAT. This lack of bug reproducibility in a testing environment is adding huge amounts of friction to the bug resolution process.
There are 3 key pieces of our system architecture where these bugs are flaring up (the CMS, the API layer, and the database). I am proposing we set up a system job that perpetually clones these 3 parts of the system into a sandboxed test environment. This cloning would happen periodically (e.g., once every 24 hours) and automatically.
Is there a technical term for this sort of environment? Is this an established method of helping diagnose system issues? Is there somewhere I can read up on the industry best practices for establishing something like this? Thanks.

The technical term for this kind of process is replication. It is often done for some systems like databases, but normally not for testing purposes; rather, it is done to increase availability, so the replica is used as a failover spare.
An exact copy of a production system, with all the data, is not something you'll find often, due to the high demand on resources. Also, at some point the two systems have to differ. Most systems (I know of) have tons of interfaces; you just can't copy a complete system.
Also: you only need the copy of the production system when you are actually debugging an issue. And if you are in the middle of that, you probably don't want everything to go away and get replaced by a new copy.
So instead I would recommend setting up scripts that allow you to obtain a copy of the relevant parts on demand.
You might also want to consider how you could modify your system to make it easier to set up a copy.
For example, when you have all the setup automated (with Chef/Docker or similar), you should be able to set up the same system again anywhere you want, so then you just have to get the production data over.
Which is an interesting point. Production data often contains secret information (because it is vital to the business, or because it is personal data). You don't want this kind of stuff hanging around in a test system that everybody can access.
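For illustration, here is a minimal sketch of what such an on-demand clone could look like for the database part, assuming PostgreSQL and adding a masking step at the end; the hosts, database names, and masking query are all hypothetical:

    # Hypothetical on-demand clone: dump prod, restore to a sandbox, mask secrets.
    # Assumes PostgreSQL client tools on PATH and credentials in ~/.pgpass.
    import subprocess

    PROD = "postgresql://readonly@prod-db.example.com/app"      # hypothetical
    SANDBOX = "postgresql://admin@sandbox-db.example.com/app"   # hypothetical

    # 1. Dump production and restore it into the sandbox.
    dump = subprocess.run(["pg_dump", "--format=custom", PROD],
                          check=True, capture_output=True).stdout
    subprocess.run(["pg_restore", "--clean", "--dbname", SANDBOX],
                   input=dump, check=True)

    # 2. Mask personal/secret data so it never lingers in the sandbox.
    subprocess.run(["psql", SANDBOX, "-c",
                    "UPDATE users SET email = 'user' || id || '@example.invalid';"],
                   check=True)

The same idea extends to the CMS and API layers once their setup is automated: rebuild from the automation, then pull over (masked) data on demand.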

Related

IBM S/390 mainframe COBOL source code

We have an S/390 mainframe at my new job that’s been running COBOL applications since the late 90’s. The mainframe is getting old enough that we need to migrate to a newer system. We’re a small enough business that we can’t warrant spending the money to upgrade to new mainframe hardware and the program logic has been a constant work in progress for 30+ years, so it has a lot of functional value. I’ve been considering moving the functionality to a Linux machine and using something like OpenCOBOL to recompile as an executable binary instead of trying to rewrite it in a newer language. I haven’t messed with a mainframe enough to have any clue how or where to access this information and the gentleman that wrote all of the programs is unfortunately no longer with us. I’ve read that SSH is an option, but I’m not even sure how to get the ball rolling on that with a mainframe. I use Linux on a fairly regular basis, so I’m familiar with SSH, but from my understanding those mainframes aren’t a simple OS that you can merely connect to and navigate the file system to retrieve data like we can in modern operating systems. Can anyone give me some pointers to get a sense of direction for accessing the source code for the COBOL programs? Are there default locations that they are stored, etc.? They’re somewhat simple programs that don’t use any DB2 functionality and will hopefully compile on a different system with relatively minimal debugging and fixes. I’m certain that I’ve left out necessary information that would help getting an answer to this question, and I can provide any additional information that is needed to help you all help me. I suspect that SSH isn’t enabled by default, but maybe I’m wrong there too. Any assistance is greatly appreciated. Thanks everyone!
Although this is not a programming question, I'll provide some guidance I think might help you.
First, this is a business decision about where to invest.
Do we upgrade the system to a newer model, upgrade some software, and acquire the skills to keep the system running? (System programming, an OS upgrade and the cost of migration, a newer platform (a used z13 could be an economical option), and storage systems to support the mainframe.)
Migration of existing workloads to other platforms. (Cost to migrate code, sizing of performance needs, new technologies to replace existing access methods like VSAM or, dare I say, ISAM if the applications are old enough.)
Status Quo ... leave things where they are and keep the lights on
In evaluating any option you have to assess the risk to the business and what a disruption would cost. IMHO, it's less about a technology like SSH or COBOL on Linux; it requires some serious assessment of the current state, the acceptable to-be scenarios, and the cost of pursuing one of those options.
My comments are not intended to instill fear but to provide a framework for how I approach analyzing a challenge of this magnitude.
There is no default location where source code is stored on z/OS (it is z/OS you're talking about, right?). Source code is usually stored in PDS data sets. The naming of those depends on the installation, i.e. the company, and whether or not any software like Endevor, ChangeMan, etc. is being used to maintain the sources.
Since this is old z/OS (OS/390) COBOL code, chances are the code is making use of OS specifics such as record level I/O, VSAM data sets, etc. These are the parts that will not work on a non-z/OS platform without major rewrite. So, you will need to look into the sources.
SSH is available on z/OS, but it needs to be configured and enabled. You need to check with your z/OS sysprog. FTP, and NFS are other options, but again, they need to be configured and enabled.
Transferring the sources is the least of your problems, I'd say.
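As a hedged illustration of that transfer step (the host, credentials, and data set name below are invented), z/OS FTP servers let you address a PDS member by its fully qualified name in single quotes, which scripts cleanly with Python's ftplib:

    # Sketch: fetch a COBOL source member from a PDS via z/OS FTP.
    # Host, credentials, and data set name are hypothetical.
    from ftplib import FTP

    ftp = FTP("mainframe.example.com")
    ftp.login("USERID", "PASSWORD")

    # On z/OS, fully qualified data set names go in single quotes.
    with open("payroll.cbl", "w", encoding="utf-8") as out:
        ftp.retrlines("RETR 'PROD.SOURCE.COBOL(PAYROLL)'",
                      lambda line: out.write(line + "\n"))
    ftp.quit()

Text-mode retrieval (retrlines) lets the server handle the EBCDIC-to-ASCII conversion.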
I have to agree with the prior two answers, but have some additional suggestions. What to do with the system is a business decision.
Finding the program to understand what it does is the first requirement. Since you know what program is running, that may be the name of the source file, which you will need to find. The source file will probably be in some library manager; the first place to look is in the ISPF menu system. There will be an option for the library manager you are using, if you are using one. Based on your description, you may be using something called SCLM, which would show up there, or you might see Librarian or Panvalet. You will need to get into ISPF by connecting with a 3270 terminal emulator. Once you find the file, FTP or SFTP may be the best way to transfer it, or your emulator may simply provide a transfer mechanism. You will need to find the related files as well, which should also be defined in the library manager.
Once you have the file, you will need to figure out what it uses. As mentioned above, it will be working with some kind of data file, and that will be the biggest part to deal with.
If it is a batch program it is probably part of a schedule, and there are other programs also running that you will need to find and figure out how they fit together.
Once you have an understanding of all the parts, then you can work to make the right business decision as to how this should be run. You may want to upgrade, or you may want to look at getting z/OS as a cloud service if you don't want to upgrade but you want the function. Or it may be a simple program you could move. That will be much easier to figure out once you have the details.
You say the program logic has been changing for 30+ years. Was it only one person making all the changes? Would anyone on the team have some idea about the PDSs that the user had access to? That might be one of the places to look. As the previous answers suggested, most shops would have stored the source code in some kind of config management tool like SCLM or Panvalet. If you have access to the load code, there are utilities that can be used to inspect the load member to get a CSECT listing, which would have the names of the object members that make up that load; you can check with your mainframe admins. That can get you the source code file names. We use SSH from USS in our shop to move code from an HFS folder to GitLab. I have also used plain FTP to just transfer source code files to my workstation. But yes, first you have to find where it is stored.

How to address multi-vendor ATM support in a Windows application

After reading the CEN/XFS programming reference, I thought it would be "easy" to write ATM software that would be supported on all ATMs. At first view, the whole standard seems reasonable to me in terms of portability.
However, to my great surprise, I have had access to some ATMs from well known vendors that do not have even the Microsoft XFS manager (msxfs.dll, etc.) installed. I thought this would be a very rare case.
I have been told that some vendors have their own XFS manager. Is it true? I thought JXFS or a vendor specific layer would depend on the CEN/XFS manager under the hood.
If so, do I have to be aware of all vendor-dependent APIs? I refuse to believe this industry works like this.
The sad truth is that generic software doesn't work that well on any of the ATMs out there.
Generally speaking, I believe every vendor creates their own XFS manager. The XFS manager itself is pretty generic, though, so whoever the XFS manager provider is, it is not that big a deal. The actual device and service provider implementations are where the real differences lie.
So you could write your software to a common subset of the features, and you could even get a decent level of operability using that approach. Well, until you need to start handling the error cases, that is. At that point the limitations create situations that just make such generic software useless in practice.
The reason is simply that all the devices are so different at the implementation level, and thus can do different things during and after error conditions.
So even though the CEN/XFS error codes might be the same for two vendors, the required operations can be quite different, as their responses may indicate different severities; the error condition might even be self-clearing on one device but require operator intervention on another.
You naturally want all the available benefits from the hardware you have, so at that point you start to need configuration options that are simply outside the scope of CEN/XFS. Once you go that way you get the benefits of the hardware, but that also means higher complexity in your software. Oh, and you'll need lots and lots of testing, as sadly you can't really trust vendor documentation either...
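To make that concrete, here is a sketch of what such out-of-scope configuration tends to look like in practice; the vendor names and the error label are invented for illustration, not real CEN/XFS identifiers:

    # Hypothetical per-vendor error policy: the same nominal XFS-level error
    # can demand different recovery actions depending on the hardware.
    RECOVERY_POLICY = {
        ("VendorA", "DEV_HW_ERROR"): "retry",          # self-clearing on this model
        ("VendorB", "DEV_HW_ERROR"): "call_operator",  # needs intervention here
    }

    def recovery_action(vendor: str, error: str) -> str:
        # Fall back to the most conservative action for unknown combinations.
        return RECOVERY_POLICY.get((vendor, error), "call_operator")

    print(recovery_action("VendorA", "DEV_HW_ERROR"))  # -> retry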

How should release management be structured for an agile Professional Services department?

Background
Professional Services departments provide add-on services to customers of a product.
A lot of these projects are small (4-10 hours) and need to be turned around quickly. Additionally, these are important projects as they are enhancements that customers rely on for their business.
Some challenges are:
There is a good amount of rework or feature change, as customers often change their minds or make tiny additional requests. Aside from the obvious point that this is a management issue (managing scope creep, etc.), the fact remains that often there are minor tweaks that need to be implemented after the project is "live".
Sometimes something breaks for whatever reason and issues need to be handled with expedience. Again, these are in-production processes that customers rely on.
Currently, our release management is very ad hoc:
Engineers manage the projects from soup to nuts, including scoping, customer relationships, code development, production deployment, and project support (for any subsequent issues).
We have dev servers and we have production servers. The servers exist on-site in a server farm. They are not backed up ever, and they have no redundancy because they are not in the colo - they kind of get second class service from operations.
The engineers have full root (Linux)/admin (Windows) access to the dev and prod servers. They develop on the dev servers and, when the project is ready, deploy to prod (basically, just copy the files up). When issues come up, they just work directly on the servers.
We use svn for source control, but it's basically just: check out to dev, work on the project, check in as necessary, and deploy to prod by copying files up to the server (a hypothetical sketch of this step follows below).
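For concreteness, this kind of ad hoc copy-up deploy boils down to something like the following; the repository URL, host, and paths here are invented:

    # Hypothetical ad hoc deploy: export a clean copy from svn, push it to prod.
    import subprocess

    REPO = "https://svn.example.com/ps-projects/widget-report/trunk"  # invented
    STAGING = "/tmp/widget-report-export"
    PROD = "deploy@prod01:/var/www/widget-report/"                    # invented

    subprocess.run(["svn", "export", "--force", REPO, STAGING], check=True)
    subprocess.run(["rsync", "-av", "--delete", STAGING + "/", PROD], check=True)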
The problem:
The problem is basically the second point above. The servers are not treated with the same reverence by operations as our product servers (in the colo) are. We need the servers to be first-class citizens for operations. However, their proposal is to put them in the colo, which makes them untouchable. If we do that, we will need to go through operations to get projects deployed. Basically it will be the same arduous and painful process that the product engineers go through when releasing an update to our software product.
This would take away all our agility in responding to these tiny projects and the issues that arise that need immediate attention.
The question
How should we solve this problem?
Should we put the servers in colo and just work with the formal release process?
How should this situation be handled?
Any help making this question better is welcome!
"The servers exist on-site in a server farm. They are not backed up ever, and they have no redundancy because they are not in the colo - they kind of get second class service from operations."
So you want these servers to be self-serviceable by your PS engineers, yet have good redundancy, backups, etc., without having to go through formal ops processes. Can't you move them from the on-site server farm to the cloud (EC2 or other)? By the way, the full-root-access and copy-files-to-prod practices above are accidents waiting to happen, but that is not material to the main question here.
This is an old question, but it sounds very similar to our company, in that the production team requires a lot of small changes.
I'm having a hard time understanding the question but I'll attempt an answer.
You should not place development servers in the colo, because it will slow down your development process. If operations is not able to give you the support you need in development, could you designate a developer, or bring on someone who can support your team's needs when it comes to server management/requirements? Ideally a build engineer, a release manager, or even, say, a QA resource. Unfortunately, it sounds like a political management issue. In that case you need to clearly lay out your issues and address them with management. If I completely missed the mark, let me know.

Why use AMQP/ZeroMQ/RabbitMQ

as opposed to writing your own library.
We're working on a project here that will be a self-dividing server pool: if one section grows too heavy, the manager will divide it and put it on another machine as a separate process. It will also alert all affected clients to connect to the new server.
I am curious about using ZeroMQ for inter-server and inter-process communication. My partner would prefer to roll his own. I'm looking to the community to answer this question.
I'm a fairly novice programmer myself and just learned about message queues. As I've googled and read, it seems everyone is using message queues for all sorts of things, but why? What makes them better than writing your own library? Why are they so common, and why are there so many?
"What makes them better than writing your own library?"
When rolling out the first version of your app, probably nothing: your needs are well defined and you will develop a messaging system that will fit your needs: small feature list, small source code etc.
Those tools are very useful after the first release, when you actually have to extend your application and add more features to it.
Let me give you a few use cases:
your app will have to talk to a big-endian machine (SPARC/PowerPC) from a little-endian machine (x86, Intel/AMD). Your messaging system had some endian-ordering assumptions: go and fix it (see the byte-order sketch after this list)
you designed your app so it is not a binary protocol/messaging system, and now it is very slow because you spend most of your time parsing messages (the number of messages increased and parsing became a bottleneck): adapt it so it can transport binary/fixed-encoding messages
at the beginning you had 3 machines inside a LAN, with no noticeable delays, and everything got to every machine. Your client/boss/pointy-haired-devil-boss shows up and tells you that you will install the app on a WAN you do not manage, and then you start having connection failures, bad latency, etc. You need to store messages and retry sending them later on: go back to the code and plug this stuff in (and enjoy)
messages sent need to have replies, but not all of them: you send some parameters in and expect a spreadsheet as a result, instead of just sends and acknowledges: go back to the code and plug this stuff in (and enjoy)
some messages are critical, and their reception/sending needs proper backup/persistence. Why, you ask? Auditing purposes
And many other use cases that I forgot ...
You can implement it yourself, but do not spend much time doing so: you will probably replace it later on anyway.
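To make the first use case above concrete, here is a minimal, purely illustrative sketch of how an explicit byte order removes the endianness assumption a home-grown protocol often bakes in:

    import struct

    # Pack a message header with an explicit, machine-independent byte order.
    # ">" = big-endian ("network order"); the same bytes decode correctly on
    # x86 and on SPARC/PowerPC alike.
    msg_type, payload_len = 7, 512
    header = struct.pack(">HI", msg_type, payload_len)  # 2-byte type, 4-byte length

    # Unpacking makes the same explicit choice, so no host-order assumption leaks in.
    decoded_type, decoded_len = struct.unpack(">HI", header)
    assert (decoded_type, decoded_len) == (7, 512)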
That's very much like asking: why use a database when you can write your own?
The answer is that using a tool that has been around for a while and is well understood in lots of different use cases, pays off more and more over time and as your requirements evolve. This is especially true if more than one developer is involved in a project. Do you want to become support staff for a queueing system if you change to a new project? Using a tool prevents that from happening. It becomes someone else's problem.
Case in point: persistence. Writing a tool to store one message on disk is easy. Writing a persistor that scales and performs well and stably, in many different use cases, and is manageable, and cheap to support, is hard. If you want to see someone complaining about how hard it is then look at this: http://www.lshift.net/blog/2009/12/07/rabbitmq-at-the-skills-matter-functional-programming-exchange
Anyway, I hope this helps. By all means write your own tool. Many many people have done so. Whatever solves your problem, is good.
I'm considering using ZeroMQ myself - hence I stumbled across this question.
Let's assume for the moment that you have the ability to implement a message queuing system that meets all of your requirements. Why would you adopt ZeroMQ (or other third party library) over the roll-your-own approach? Simple - cost.
Let's assume for a moment that ZeroMQ already meets all of your requirements. All that needs to be done is integrating it into your build, read some doco and then start using it. That's got to be far less effort than rolling your own. Plus, the maintenance burden has been shifted to another company. Since ZeroMQ is free, it's like you've just grown your development team to include (part of) the ZeroMQ team.
If you ran a Software Development business, then I think that you would balance the cost/risk of using third party libraries against rolling your own, and in this case, using ZeroMQ would win hands down.
Perhaps you (or rather, your partner) suffer, as so many developers do, from the "Not Invented Here" syndrome? If so, adjust your attitude and reassess the use of ZeroMQ. Personally, I much prefer the benefits of the Proudly Found Elsewhere attitude. I'm hoping I can be proud of finding ZeroMQ... time will tell.
EDIT: I came across this video from the ZeroMQ developers that talks about why you should use ZeroMQ.
"What makes them better than writing your own library?"
Message queuing systems are transactional, which is conceptually easy to use as a client, but hard to get right as an implementor, especially considering persistent queues. You might think you can get away with writing a quick messaging library, but without transactions and persistence, you'd not have the full benefits of a messaging system.
Persistence in this context means that the messaging middleware keeps unhandled messages in permanent storage (on disk) in case the server goes down; after a restart, the messages can be handled and no retransmit is necessary (the sender does not even know there was a problem). Transactional means that you can read messages from different queues and write messages to different queues in a transactional manner, meaning that either all reads and writes succeed or (if one or more fail) none succeeds. This is not really much different from the transactionality known from interfacing with databases and has the same benefits (it simplifies error handling; without transactions, you would have to assure that each individual read/write succeeds, and if one or more fail, you have to roll back those changes that did succeed).
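As a hedged illustration of the persistence and acknowledgement side (not the full cross-queue transactions described above, which are broker-specific), here is a minimal sketch with pika, RabbitMQ's Python client, assuming a broker running on localhost:

    import pika

    # Connect to a local RabbitMQ broker (assumed to be running on localhost).
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # A durable queue survives broker restarts.
    channel.queue_declare(queue="work", durable=True)

    # A persistent message is written to disk, so it is not lost if the server goes down.
    channel.basic_publish(
        exchange="",
        routing_key="work",
        body=b"job-42",
        properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
    )

    # Consumer side: acknowledge only after the work is done, so an unhandled
    # message is redelivered if the consumer dies mid-processing.
    method, properties, body = channel.basic_get(queue="work")
    if method is not None:
        print(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection.close()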
Before writing your own library, read the 0MQ Guide here: http://zguide.zeromq.org/page:all
Chances are that you will either decide to install RabbitMQ, or else you will make your library on top of ZeroMQ since they have already done all the hard parts.
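For a feel of how little code those hard parts hide behind, here is a minimal request/reply round trip with pyzmq (both ends in one process, purely for the demo):

    import zmq

    context = zmq.Context()

    # Server end: a reply socket.
    server = context.socket(zmq.REP)
    server.bind("tcp://127.0.0.1:5555")

    # Client end: a request socket.
    client = context.socket(zmq.REQ)
    client.connect("tcp://127.0.0.1:5555")

    # One strict request/reply round trip; ZeroMQ handles framing,
    # connection setup, and reconnection behind the scenes.
    client.send(b"ping")
    print(server.recv())  # b'ping'
    server.send(b"pong")
    print(client.recv())  # b'pong'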
If you have a little time, give it a try and roll out your own implementation! The lessons of this exercise will convince you of the wisdom of using an already tested library.

What exactly defines production?

Like almost anyone who's been programming for a while, I'm familiar with the term "production code" and have a vague sense of what it means. However, can someone offer a semi-rigorous definition, since it seems Wikipedia and Google can't? It seems like there are a lot of gray areas in what counts as production, such as internal tools that are used by a small group of people and therefore not "formalized" in terms of UI, documentation, etc. and open source apps that are feature complete, reasonably bug free and working, but lack polish, UI and extensive testing.
When your code runs on a production system, that means it is being used by the intended audience in a real-world situation.
Production code, however, does not necessarily mean robust, reliable, or stable code. The Daily WTF provides plenty of evidence in this regard.
Production means anything that you need to work reliably and consistently.
Whether it's a build script or a public-facing web server.
When others rely on your code, particularly folks who may not understand it (i.e., even "smart" developers, perhaps not in your group, who are using a library you wrote), that code is production code.
It's production because "work stops" and "money is lost" when the production code fails.
The definition as I understand it is that production code is any code that is installed or in use on a live, non-test-bed system. A server used internally to a company is a production system if it is the live system used by the employees of the company. The point here is that code running on a server internal to the company writing the code can be production code.
Usually, a good distinction when looking at internal code is whether or not the group maintaining the code is separate from the group using the code. If the groups are separate, odds are that the code is production code. If running the business depends on the code, then it is certainly production code, even if it is developed and maintained in-house.
EDIT: The short answer: If you are "betting the farm on it", it is "production".
This is a great question--an absolutely critical distinction that routinely gets everyone in trouble due to misunderstandings. The question of what is "production" is a subset of the related question of what is an "environment".
So part of the answer is that "production" is THE "environment" that is most important and is most trusted as THE "real" thing.
So now we must define "environment" (and then revisit "production"). We are still far from a satisfactory answer.
We programmers use the term "environment" constantly to refer to computer systems consisting of hardware that is executing software. That software is the code that we wrote plus software that it depends upon, which was written by others. We write our code and integrate it with the other software, then we typically run the integrated software through an escalating series of tests (unit tests, integration tests, functional tests, acceptance tests, regression tests, etc.), until we finally run the integrated software in the full manner in which it was intended.
Of course, not everything is fully automated. There are usually numerous people involved, and they have manual processes to perform. We programmers look for ways to automate as many of these processes as possible, but there is always a "man/machine boundary" in the systems we work on. Often, there are many such boundaries in any particular case.
On the other hand, there may not be any significant automation at all. For example, we spoke of "production" way back when we had a room full of people performing manual labor which produced a product. So, there doesn't have to be any automation present in our "production" "environment". There is also a middle ground, where the automation involved does not include software, such as in the case of a person running a loom to weave cloth.
Also, there may not be a product, since we have adapted our language of "production" "environment" to include product-less service providers.
Likewise, the testing may not involve software, since we may be testing a non-software-driven machine (e.g., the loom) or even the people (training and evaluation).
Now we have touched on all the crucial elements of an "environment":
there is a purpose, an intent, being pursued
an intent requires an intender, so there must be a sponsor (a person or group, but not a machine) that specifies the intent
that intent is pursued through various processes that are performed by various actors
those actors may be people, they may be software executing on hardware, or they may be non-software-driven machines, so there may or may not be automation present
Now we can properly and fully define our original terms.
An environment consists of all the processes and their actors that collaborate to pursue a particular intent on behalf of its sponsor. That means software executing on hardware, that means non-software-driven machines, and that means people performing their various duties. It is the intent that primarily defines an environment, not its processes or its actors.
Furthermore...
If the intent being pursued in a particular environment is the sponsor's ultimate goal, which usually involves producing a product or providing a service in exchange for money, then we refer to that environment as production.
Now we can go a bit further.
If the intent being pursued in an environment is the verification of processes and their actors in preparation for production, we call that a test environment.
We further call it an integration environment if that testing involves the initial joining together of significant individuals or groups of processes and their actors.
If that preparation involves the "programming" of human actors to perform new processes, or the subsequent verification (evaluation), then we call that a training environment.
Armed with these distinctions and definitions, we can now understand several common scenarios.
An environment can be mislabeled with a name that does not match its intent, such as when a training environment is used as test.
An environment can be grossly misused, such as when integration or training is done in production.
An environment can be misrepresented, such as when key processes or actors are left unidentified (e.g., manual reconciliations, or even by ignoring the people altogether).
An environment can be retasked, by repurposing its processes and actors to a new intent. A very successful technique for some organizations is to routinely "flip" several sets of actors (servers hosting software) between production, test, training, and integration upon each release.
In most cases, a single actor (person or hardware) can execute multiple processes which can participate in multiple environments. For example, a single computer server can host software that performs production transactions while also hosting other software that performs test or training functions.
Normally, a single instance of an actor should participate in only one environment at a time. On very rare occasion, a single actor can be shared across environments if the intents are mutually compatible. Most of the time, it is very unwise to attempt such sharing because the intents are not really compatible. A perfect example is running a test process on a server that also supports production processes, resulting in downtime because the test caused the entire server to fail.
Therefore, the intent of an environment must be construed with very wide latitude, to include concepts such as availability, reliability, performance, disaster recovery, accuracy, precision, repeatability, longevity, etc. This means that the actors and processes must often be construed to include things like providing power, cooling, backups, and redundancy.
Finally, note that the situation can get quite complex. For example, a desktop computer (actor) may be tasked by the development team (sponsor) to host their source control (process), which the team relies upon for their primary jobs (production). Nevertheless, the IT staff sees that same desktop computer as simply a developer workstation (development, not production) and treats it with contempt and nonchalance when it develops a hardware problem. But the developers are producing production code, so aren't they also part of production? Perspective matters.
EDIT: Production quality
A solid verification (testing) methodology should take packaged code from development and run it through a series of tests (integration, TQA, functional, regression, acceptance, etc.) until it comes out the other side "stamped" for production use. However, that makes the package production quality, but not actually production. The package only becomes production when a sponsor actually deploys it into an environment with that ultimate level of intent.
However, if your organization merely produces that package (its product) for the consumption of others, then such a release comes as close to production as that organization will experience with respect to that product, so it is common to stretch the term production to apply rather than clarify that it is production quality. In reality, that organization's production environment consists of the actors and processes involved in its development/release efforts that result in that product.
I said it could get quite complex...
Any code that will be used by its intended user base would fit into my definition of 'production code'.
Of course, the grey area in that definition would be clearly defining who your user base is.
The production software can handle the necessary workload without disruption or degradation of the service
The software has been successfully tested in different production scenarios
Transforming a working prototype into production software that runs on a fail-safe, redundant architecture and can work in a real business (i.e., a production environment) takes time, code refactoring, and attention to detail
The production code has an acceptable level of maintainability and is reasonably well commented
The documentation explains the functionality and all features, and facilitates maintenance
If the production software is an international service or application, it must be localized
Production code is used by end users, often customers, under conditions described in a Terms of Service agreement
Production software does not necessarily mean reliable, mission-critical software
The software does well what it was intended to do
Log files provide an accurate description of run-time performance and software reliability metrics, and reporting that facilitates debugging and software maintainability
I think the best way to describe it is as any code that "leads-to" deployment and "follows-up" deployment. Deployment itself is defined as all of the activities that make a software system available for use. If your code is ready to be used by people, in-house or otherwise, then it is production code.
In simple words: "Production code is code which is live and in use by its intended audience."
The term "production code" mixes two different concepts. One is deployment management and the other is release life cycle.
In the strict sense of the word, a system is in production when it is being used as part of a business or service operation. What's not in production: development, testing, QA, demo, and staging systems. A production system does not immediately imply quality.
From the release life cycle's point of view, a "production" build is the build that is released to the general public or to clients. It is the stage after pre-alpha, alpha, beta (feature complete, code complete, etc.), and release candidate. For shrink-wrapped products that cannot easily deploy updates, reaching the production stage likely implies a series of tests and bug fixes.