Hadoop development environment, what does yours look like?

I would like to know what your Hadoop development environment looks like.
Do you deploy jars to test cluster, or run jars in local mode?
What IDE do you use and what plugins do you use?
How do you deploy completed projects to be run on servers?
What other recommendations do you have for setting up my own Hadoop development/test environment?

It's extremely common to see people writing Java MR jobs in an IDE like Eclipse or IntelliJ. Some even use plugins like Karmasphere's dev tools, which are handy. As for testing, the normal process is to unit test business logic as you normally would. You can unit test some of the surrounding MR infrastructure using the MRUnit classes (see Hadoop's contrib). The next step is usually testing in the local job runner, but note there are a number of caveats here: the distributed cache doesn't work in local mode, and you're single-threaded (so static variables are accessible in ways they won't be in production). The next step (and most common test environment) is pseudo-distributed mode - all daemons running, but on a single box. This runs code in different JVMs with multiple tasks in parallel and will reveal most developer errors.
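To illustrate the first step - plain unit tests on the business logic, with no Hadoop classes involved - here is a sketch assuming a hypothetical word-count-style mapper whose tokenizing logic has been pulled out into a pure method (class and method names are invented for the example):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: the mapper's tokenizing logic extracted into a
// pure, Hadoop-free method so it can be unit tested without any cluster.
public class WordCountLogic {

    // Lowercases a line and splits it into words, dropping punctuation.
    // A real Mapper.map() would call this and emit (word, 1) pairs.
    public static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Plain in-JVM check - no local job runner, no daemons needed.
        System.out.println(extractWords("Hadoop, hadoop everywhere!"));
    }
}
```

MRUnit then covers the step above this one: driving the actual Mapper/Reducer classes with in-memory inputs before moving on to the local job runner or pseudo-distributed mode.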
MR job jars are distributed to the client machine in different ways; custom deployment processes are common here. Some folks use tools like Capistrano or configuration management tools like Chef or Puppet to automate this.
My personal development is usually done in Eclipse with Maven. I build jars using Maven's Assembly plugin (it packages all dependencies into a single jar for easier deployment, at the cost of a fatter jar). I regularly test using MRUnit and then pseudo-distributed mode. The local job runner isn't very useful in my experience. Deployment is almost always via a configuration management system. Testing can be automated with a CI server like Hudson.
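For reference, a minimal sketch of the Assembly plugin setup described above, using the standard jar-with-dependencies descriptor (plugin version and the surrounding build section of the pom.xml are omitted):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- built-in descriptor: unpack all dependencies into one fat jar -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <!-- build the fat jar automatically on "mvn package" -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The resulting `*-jar-with-dependencies.jar` can then be shipped to the client machine and submitted with `hadoop jar` without worrying about the classpath.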
Hope this helps.

Related

E2E Test Automation workflow with GitLab CI/CD

I am to build a test automation system for E2E testing for a company. The product is React/Node.js based and runs in a cloud (Docker & Kubernetes). The code is stored in GitLab repositories, for which there are CI/CD pipelines set up for test/lint/deployment.
I plan to use Jest for test orchestration and Selenium/Appium for the UI testing (the framework being in TypeScript), while creating a generator to test our proprietary backend interface.
My code is in a similar repository and will be containerized and uploaded to the test environment.
In my former workplaces we used TeamCity and similar tools to manage test sessions, but I cannot seem to find the right link between our existing GitLab CI/CD setup and the E2E testing framework.
I know it could be implemented as part of the pipeline, but to me that seems lacking (which may also be due to my inexperience).
Could you advise some tools/methods for handling test session management for system testing in such an environment?
(with a GUI where I can see the progress of all sessions, being able to manage them, run / rerun / run on certain platforms only, etc)
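The "part of the pipeline" approach can get surprisingly close to those requirements. A rough .gitlab-ci.yml sketch (stage layout, job names, image, and the JUnit reporter wiring are all hypothetical assumptions, not from the thread): manual per-platform jobs give run/rerun control from GitLab's pipeline UI, and JUnit report artifacts surface per-test results there.

```yaml
# Hypothetical sketch: a dedicated E2E stage with manual, per-platform jobs.
stages:
  - build
  - e2e

.e2e-template: &e2e
  stage: e2e
  image: node:18            # assumes the Jest/Selenium suite runs under Node
  script:
    - npm ci
    - npx jest --ci         # assumes jest-junit is configured to write junit.xml
  artifacts:
    when: always
    reports:
      junit: junit.xml      # shows per-test results in GitLab's pipeline UI

e2e-chrome:
  <<: *e2e
  when: manual              # run / rerun on demand from the GUI
  variables:
    BROWSER: chrome

e2e-android:
  <<: *e2e
  when: manual
  variables:
    PLATFORM: android       # hypothetical Appium target
</pre>
```

For richer session management on top of this, a reporting layer such as a test management tool fed by the JUnit artifacts is a common pattern, but the pipeline view alone already covers progress, rerun, and platform selection.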

Code and DB Automation Migration Tool

Does anyone know of a code/DB migration tool? I'm looking for a scriptable way of automating code pushes from Dev to Production environments. The tool should be language/DB agnostic so it can work with multiple databases and development languages.
You can take a look at these tools (for deployment):
Capistrano + best practices
Maven
Heroku
In case you are talking about binaries, this concept is strongly recommended by Continuous Delivery, which considers it an antipattern to promote at the source-code level rather than at the binary level. You can use CI tools to trigger production deployments, or you could easily write a simple interface to drive your deploy scripts. Alternatively, there are numerous tools available (open source as well as enterprise) that can be used to drive your deployments. Jenkins is quite popular.
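The binary-promotion idea can be sketched as a small script a CI job would call: copy the already-built artifact forward and verify its checksum, never rebuilding it per environment. All names and paths here are hypothetical placeholders, not from the answer above.

```shell
#!/bin/sh
set -eu

# promote ARTIFACT RELEASE_DIR
# Copies the built binary into the release area and verifies its checksum,
# so the exact bytes that were tested are the bytes that get deployed.
promote() {
    artifact="$1"
    release_dir="$2"
    sum_before=$(sha256sum "$artifact" | cut -d' ' -f1)
    cp "$artifact" "$release_dir/"
    sum_after=$(sha256sum "$release_dir/$(basename "$artifact")" | cut -d' ' -f1)
    if [ "$sum_before" != "$sum_after" ]; then
        echo "checksum mismatch" >&2
        return 1
    fi
    echo "promoted $(basename "$artifact")"
}

# Demo with throwaway directories (a CI job would pass real paths instead)
tmp=$(mktemp -d)
mkdir "$tmp/releases"
printf 'binary-bytes' > "$tmp/myapp.jar"
promote "$tmp/myapp.jar" "$tmp/releases"
rm -rf "$tmp"
```

A CI server like Jenkins would run this as a post-build or promotion step, with the artifact coming from the build that already passed the tests.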

Bamboo build plans vs deployment plans for custom environment configurations

I'm evaluating Bamboo to replace our Jenkins setup and have a couple questions. I have a .NET solution that generates two artifacts: a packaged website and an MSI. I have three environments I deploy to: test, stage, production. Our Jenkins server in turn has three jobs--one for each environment. Each job builds the solution, copies in configuration files for the environment it will be deployed to and then deploys the artifacts. Reading the documentation and other stuff (https://answers.atlassian.com/questions/19562/plans-stages-jobs-best-practices), I'm getting mixed signals about how deployment should work with Bamboo. It seems to me like deployment plans expect artifacts to exist and then deploy them. But, build plans include deployment steps as well. How is all of this supposed to interact together?
The reason I'm confused is because I have environment specific configuration files that get packaged during a build. Any direction on how this should work?
I posted the question to the Atlassian board as well and got an answer I think I like the best:
Jason Monsorno · Aug 30 '13 at 04:38 PM
Deployment projects in Bamboo seem to be dependent on the existence of an artifact; the catch is you don't necessarily need to use that artifact, so you could use an empty artifact and do completely independent steps. Deployment projects are still fairly new to Bamboo, and your structure may favor the "normal" workflow, where each environment would be a separate manual stage.
Deployment projects do have a separate workflow and versioning. To use Deployment projects in your scenario, I'd suggest making the artifact the entire checkout; then each Deployment environment can build a copy of the artifact. The space-saving (but less time-efficient) option would be to just save the current revision in a file as the artifact, and use that to check out and build in each Deployment environment.
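The space-saving option can be sketched in a few lines of shell. This assumes a git checkout (the same idea works with an SVN revision number); the throwaway repo below only stands in for the real one, and in Bamboo the two halves would run in the build plan and the Deployment environment respectively:

```shell
#!/bin/sh
set -eu

# Throwaway repo standing in for the real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "build"

# Build-plan side: the artifact is just a tiny file holding the commit id.
git rev-parse HEAD > revision.txt

# Deployment-environment side: check out exactly that revision, then run
# the environment-specific build (config files copied in, etc.) there.
git checkout -q "$(cat revision.txt)"
echo "deploy at $(cat revision.txt)"
```

The trade-off is exactly as the answer says: the artifact is tiny, but every environment pays for its own checkout and build.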

Fitnesse deployment practices

Is there some documentation on the best ways to organize the deployment of Fitnesse for use in projects?
I have many questions:
How should it be stored? Should the whole FitNesse root be stored in SVN? How do you deal with acceptance tests that span multiple SVN repositories?
We have some code that runs only on Linux (server) and other code that runs only on Windows (client), which together make up the complete system. How do you run these? Do you have multiple FitNesse servers?
In the company where I work, we are setting up FitNesse for functional tests integrated with SVN and Selenium.
Here is our basic idea:
Store FitNesse in a repository on SVN (yes, the root)
Store Selenium tests in another repository on SVN (per project, as both .html tests and TestNG-generated .java)
Use Hudson to automate checkouts from SVN and put everything to run on a QA environment. If a FitNesse acceptance test spans multiple SVN repositories, Hudson is able to download and build the projects. This way, FitNesse does not need to deal with this issue.
We are still integrating the tools. We also use Jira, Testlink, Sonar and MediaWiki.

Maven Cargo configuring a Glassfish 2.1 instance to run integration-tests?

I was wondering whether it is possible to use Maven2 to automatically configure a Glassfish 2.1 with JNDI Resources, Datasources and Mail-Sessions for my integration tests.
Also I wonder whether it is possible to create some sort of benchmarks that might then be tracked using continuum or Hudson.
I was wondering whether it is possible to use Maven2 to automatically configure a Glassfish 2.1 with JNDI Resources, Datasources and Mail-Sessions for my integration tests.
I'm not sure Cargo provides anything to configure Mail-Sessions. And anyway, from what I can see in DataSource+and+Resource+Support, there is no support at all for GlassFish. I'd simply configure the installed container against which you run your integration tests.
Also I wonder whether it is possible to create some sort of benchmarks that might then be tracked using continuum or Hudson.
You could run JMeter performance tests. Hudson has a Performance Plugin that can generate a trend graph report from the results. Also, maybe have a look at JChav (it seems dormant, though).