Using different virtualenv for jobs in apscheduler - apscheduler

I have an apscheduler implementation that can run different kinds of tasks. These tasks might have different dependencies, which need to be installed when they are executed. The best way is probably to create a virtualenv for each task, install its dependencies from a resource file, and then maybe release the virtualenv when the task is done.
I have been trying to implement this but haven't had much success. The idea is probably to have a custom executor that starts a subprocess connected to a separate Python interpreter in the respective virtualenv, runs the task there, and gets some results back. Note: I only have process pools for running tasks.
Does anybody have any idea how to proceed with this or any code snippets?

Nobody has asked for this yet, so I'd say you need to implement the custom executor you mentioned.
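As a starting point for such an executor, here is a minimal sketch of the subprocess part only: it runs a piece of task code under a different Python interpreter and passes data in and out as JSON over stdin/stdout. The names `run_in_interpreter` and `TASK_CODE` are hypothetical, and creating the virtualenv and installing the dependencies is assumed to happen elsewhere. In real use `python_path` would point at the interpreter inside the task's virtualenv.

```python
import json
import subprocess
import sys


def run_in_interpreter(python_path, code, payload, timeout=60):
    """Run `code` under the interpreter at `python_path`.

    The payload is sent to the child as JSON on stdin, and the child is
    expected to print a single JSON document on stdout as its result.
    """
    proc = subprocess.run(
        [python_path, "-c", code],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # Surface the child's stderr so the scheduler can log the failure.
        raise RuntimeError(f"task failed: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Hypothetical task body: reads JSON from stdin, writes a JSON result to stdout.
TASK_CODE = (
    "import json, sys\n"
    "data = json.load(sys.stdin)\n"
    "json.dump({'total': sum(data['numbers'])}, sys.stdout)\n"
)

# sys.executable stands in for the virtualenv's own interpreter here.
result = run_in_interpreter(sys.executable, TASK_CODE, {"numbers": [1, 2, 3]})
```

A scheduled job could then be a small wrapper function that calls this helper with the right interpreter path, so the process pool only ever imports the wrapper and never the task's dependencies.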

How do regression tests best fit within a CI/CD workflow?

I'm writing an API that consists of several microservices. I have the code in a private Gitlab repo. I have a custom CI/CD pipeline configured to run a couple of different steps automatically on every commit to master (e.g. build, test, deploy to a dev environment). Deploying to prod is manual.
I have written some unit tests around this code, which naturally test only small units of the code. These, of course, are run with every commit, because if they fail, that means something in the code has broken.
I also have regression tests which we run after deploying. One of these is actually a bash script that uses curl to hit my production endpoint with certain parameters and checks to make sure that I'm getting 200 responses. I have parameterized this script so I can easily point it at my dev environment (instead of prod).
I use this regression test (and others like it) to check that my already-deployed service is functioning properly. I run it right after deploying, as a final double-check to confirm that everything is working. But I want to automate that.
My question is: where does this fit in a CI/CD workflow? It wouldn't make sense to run this kind of regression test on a commit, because that commit is not necessarily coupled with a deploy, and because there are any number of reasons why the service might be down that are unrelated to whatever code changes went into the most recent commit. In other words, the pipeline should not fail because of external circumstances.
Are there any best practices for running and automating regression tests?
Great question. There are a couple of interesting points here.
When to run the regression tests (as they exist today) in your CI / CD environment.
The obvious answer is to run them as a post-deploy step. Using the same approach you currently use to limit the deploy step to the master branch, you can limit this post-deploy step to the master branch as well.
If you add more details about your environment, for example the CI/CD system that you are using and your current configuration, I would be very happy to provide more concrete details on how to achieve this.
It wouldn't make sense to run this kind of regression test on a commit
An interesting approach that I have seen a couple of times is using a cloud service (AWS, GCloud, etc.) to spin up an environment on each CI run. This means that the full pipeline can be run for every commit. While it takes more resources, it means that you can find issues prior to merging to master. Of course, it is up to you whether the ROI adds up in your environment.
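Whichever environment the check runs against, the post-deploy step itself can stay small. The following is a rough Python sketch of the same idea as the curl script: request a list of endpoints and report any that do not return 200. The function names and the example URL are hypothetical, and the `fetch` parameter exists only so the logic can be exercised without a live service.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def failing_endpoints(base_url, paths, fetch=fetch_status):
    """Return (path, status) pairs for every endpoint that did not answer 200."""
    failures = []
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status != 200:
            failures.append((path, status))
    return failures
```

A post-deploy job could call `failing_endpoints("https://dev.example.test", ["/health"])` (a placeholder URL) and exit with a non-zero status when the returned list is not empty, which is what makes the pipeline stage fail.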

Write a YARN application for a Non-JVM application

Assume I want to use a YARN cluster to run a non-JVM distributed application (e.g. one based on .NET; is this a good idea?). From what I have read so far, I need to develop a YARN application that consists of
a YARN client that is used to submit the job to the yarn framework
a YARN ApplicationMaster, which is the core that orchestrates the application in the cluster.
It seems that the two pieces need to be written using Yarn APIs, which are offered as Jar libraries. It means they have to be written using one of the JVM languages. It seems it's possible to write the YARN client with REST APIs, correct? If yes, it means the client can be written with any language (e.g. C# on .Net). However, for application master, this does not seem to be the case, and it has to use JVM. Correct?
I'm new to YARN. Just want to confirm whether my understanding is correct or not.
The YARN Client and AppMaster need to be written in Java as they are the ones that write to the YARN Java API. The RESTful API, https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html, is really about offering up the commands that you can do from the CLI.
Fortunately, your "container" processes can be created with just about anything. http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/ says it best with the following quote:
"This allows the ApplicationMaster to work with the NodeManager to launch containers ranging from simple shell scripts to C/Java/Python processes on Unix/Windows to full-fledged virtual machines (e.g. KVMs)."
That said, if you are trying to bring a non-Java application (or even a Java one!) that is already a distributed application of sorts then the Slider framework, http://slider.incubator.apache.org/, is probably the best starting point for you.

Automatic Jenkins deployment

I want to be able to automate Jenkins server installation using a script.
Given a Jenkins release version and a list of (plugin, version) pairs, I want to run a script that deploys a new Jenkins server and starts it using Jetty or Tomcat.
It sounds like a common thing to do (I need to replicate a Jenkins master environment or create a clean one). Do you know what the best practice is in this case?
Searching Google only gives me examples of how to deploy products with Jenkins but I want to actually deploy Jenkins.
Thanks!
This may require some additional setup at the beginning, but it could save you time in the long run. You could use a product called Puppet (puppetlabs.com) to automatically trigger the script when you want. I'm basically using that to trigger build-outs of my development environments. As I find new things that need to be modified, I simply update my Puppet modules and don't need to worry about what has to be done to recreate the environments for the next round of testing.

How Can I Automate Running Pig Batch Jobs on Elastic MapReduce without Amazon GUI?

I have some Pig batch jobs in .pig files that I'd love to run automatically on EMR once every hour or so. I found a tutorial for doing that here, but it requires using Amazon's GUI for every job I set up, which I'd really rather avoid. Is there a good way to do this using Whirr? Or the Ruby elastic-mapreduce client? I have all my files in S3, along with a couple of Pig jars with functions I need to use.
Though I don't know how to run Pig scripts with the tools that you mention, I know of two possible ways:
To run files locally: you can use cron
To run files on the cluster: you can use Oozie
That being said, most tools with a GUI can be controlled via the command line as well (though setup may be easier if you have the GUI available).

How to validate deployment packages created by msbuild? (preferably using mstest or nunit)

Our msbuild process creates a variety of zip packages for deployment (mostly web sites, but other things as well). We have a variety of recurring problems that keep sneaking back: files included that shouldn't be, missing resources. This screams for automated validation. The criteria to test for are simple:
Validation of foosite package:
Resource files are present.
No test result files, obj files, or other build artifacts
And so forth.
Ideally, I could use nunit or mstest, which everyone is familiar with. Msbuild knows where the packages are. We have a lot of packages, and possibly concurrent builds on different branches. Ergo, the locations and names of the packages are not deterministic, so the tests don't know where the packages are.
What is the simplest way to feed msbuild information to mstest or nunit? The answer to this question would be one possible answer; however, that question got architectural advice instead of an answer. I know this isn't a unit test, but the test framework is handy anyway. I could create an exe to validate the build, but why add a couple of hours to the project?
Or, do you have a better suggestion for automatically validating build packages? (MSIs, zips, whatever)?
What I've ended up doing is having a bunch of custom MSBuild tasks which spin up a virtual machine on Virtual Server, copy the MSI onto the machine, silently deploy it and then validate against it. I used PSExec to start the MSI. You could then use the MSTest command-line runner to run your test bits.
This is probably overkill for you, but using a VM allows you to start clean and not be affected by any previous installs on your dev box.
If you want a fast fail, like a unit test, then I suggest you create unit tests against your packages. Such a test would unzip the .ZIP packages, and run some asserts against the contents.
You could even use some TDD techniques against the packages. For instance, if you have a deployment fail because a particular file is missing, then write a unit test that fails because the file is missing; change the build so that the file is present; then make sure the unit test succeeds.
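The package-level check described above can be quite short. Here is a minimal sketch in Python rather than NUnit or MSTest, purely to illustrate the idea: open the zip, confirm the required files are present, and flag anything that matches a forbidden pattern. The function name, file names, and patterns are hypothetical, and the package location is assumed to be passed in by the build (for example via an environment variable or command-line argument).

```python
import fnmatch
import zipfile


def validate_package(package_file, required, forbidden_patterns):
    """Return a list of problems found in the package; empty means it passed.

    `package_file` is a path or file-like object. Patterns use shell-style
    wildcards and are matched against the full entry name inside the zip.
    """
    with zipfile.ZipFile(package_file) as package:
        names = package.namelist()
    problems = [f"missing: {name}" for name in required if name not in names]
    for name in names:
        for pattern in forbidden_patterns:
            # fnmatchcase avoids platform-dependent case and slash handling.
            if fnmatch.fnmatchcase(name, pattern):
                problems.append(f"forbidden: {name} (matches {pattern})")
    return problems
```

A test would then assert that `validate_package(path, ["web.config"], ["obj/*", "*.trx"])` returns an empty list. Each deployment failure caused by a missing or stray file can be turned into one more entry in those two lists.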
But in general, deployment issues are broader than that, and I echo the suggestion from blowdart. Deploy into one or more virtual machines, then run automated tests over the deployed environments. These tests would not only check simple things, like whether an error was returned during the installation itself; they would also check whether the IIS virtual directories were set up correctly, with the correct properties and contents, and whether the web site basically runs.
I'd use several different virtual machines to test different deployment scenarios: one for a clean deploy; one for an upgrade from version N-1, etc. It's possible that the same or similar IVT tests could be run for each environment.
Even if you can't do this all at once, the thought process involved in this exercise should lead to a more formal definition of what your deployment environment really is. This will be helpful when you get a chance to embody this formal definition in actual tests.