MarkLogic Content Pump YARN support - hadoop-yarn

We are running mlcp.sh in distributed mode on CDH 5.2.4, but the job always runs in local mode and is never submitted to YARN/ResourceManager. Has anyone successfully implemented mlcp on CDH 5+?
We are using marklogic-contentpump-1.0.5.jar:
bin/mlcp.sh export \
  -host xxx.xx.xx.xxx \
  -port xxxx \
  -username <user> \
  -password xxxxx \
  -output_type sequence \
  -compress_type record \
  -output_file_path /tmp \
  -mode distributed \
  -job_queue cp11 \
  -query_type unfiltered \
  -max_split_size 500 \
  -query_config file.properties \
  -after_ts 2015-01-01T16:55:05-04:00 \
  -before_ts 2015-04-10T17:55:37-04:00 \
  -perm_path /data/mlcp

Fixed after changing the classpath from client-0.20 to client for YARN.
Using JAR Files Provided in the hadoop-client Package
Make sure you add to your project all of the JAR files provided under /usr/lib/hadoop/client-0.20 (for MRv1 APIs) or /usr/lib/hadoop/client (for YARN).
For example, you can add this location to the JVM classpath:
$ export CLASSPATH=/usr/lib/hadoop/client-0.20/\*
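For reference, a minimal sketch of the environment change that got mlcp submitting to YARN (the HADOOP_CONF_DIR value is an assumption based on a standard CDH layout, not something from the original post):
$ export CLASSPATH=/usr/lib/hadoop/client/\*
$ export HADOOP_CONF_DIR=/etc/hadoop/conf   # where CDH keeps yarn-site.xml and mapred-site.xml
$ bin/mlcp.sh export -mode distributed ...  # same options as above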

Related

How to Use Docker Build Secrets with Kaniko

Context
Our current build system builds Docker images inside of a Docker container (Docker in Docker). Many of our Docker builds need credentials to be able to pull from private artifact repositories.
We've handled this with Docker secrets: passing the secret to the docker build command and, in the Dockerfile, referencing the secret in the RUN command where it's needed. This means we're using Docker BuildKit. This article explains it.
We are moving to a different build system (GitLab), and the admins have disabled Docker in Docker (for security reasons), so we are moving to Kaniko for Docker builds.
Problem
Kaniko doesn't appear to support secrets the way Docker does (there are no command-line options to pass a secret through the Kaniko executor).
The credentials the Docker build needs are stored in GitLab variables. For DinD, you simply add those variables to the docker build as secrets:
DOCKER_BUILDKIT=1 docker build . \
--secret=type=env,id=USERNAME \
--secret=type=env,id=PASSWORD \
And then in the Dockerfile, use the secret:
RUN --mount=type=secret,id=USERNAME --mount=type=secret,id=PASSWORD \
USER=$(cat /run/secrets/USERNAME) \
PASS=$(cat /run/secrets/PASSWORD) \
./scriptThatUsesTheseEnvVarCredentialsToPullArtifacts
...rest of build..
Without a --secret flag on the Kaniko executor, I'm not sure how to take advantage of Docker secrets, nor do I understand the alternatives. I also want to continue to support developer builds. We have a build.sh script that takes care of gathering credentials and adding them to the docker build command.
Current Solution
I found this article and was able to sort out a working solution. I want to ask the experts if this is valid or what the alternatives might be.
I discovered that when the Kaniko executor runs, it appears to mount a volume into the image being built at /kaniko. That directory does not exist when the build is complete and does not appear to be cached in the Docker layers.
I also found out that if the Dockerfile secret is not passed in via the docker build command, the build still executes.
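A quick way to sanity-check both observations after a CI run (the image name and tag are the ones used in the excerpt below; this is just an illustrative sketch):
# Confirm /kaniko (and the credentials file) did not end up in the published image
docker pull "${IMAGE_NAME}:${BUILD_TAG}"
docker run --rm --entrypoint sh "${IMAGE_NAME}:${BUILD_TAG}" -c 'ls /kaniko || echo "no /kaniko in final image"'
# And that no layer captured it
docker history --no-trunc "${IMAGE_NAME}:${BUILD_TAG}" | grep -i kaniko || echo "no kaniko traces in layer history"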
So my gitlab-ci.yml file has this excerpt (the REPO_USER/REPO_PWD variables are GitLab CI variables):
- echo "${REPO_USER}" > /kaniko/repo-credentials.txt
- echo "${REPO_PWD}" >> /kaniko/repo-credentials.txt
- /kaniko/executor
    --context "${CI_PROJECT_DIR}/docker/target"
    --dockerfile "${CI_PROJECT_DIR}/docker/target/Dockerfile"
    --destination "${IMAGE_NAME}:${BUILD_TAG}"
The key piece here is echoing the credentials to a file in the /kaniko directory before calling the executor. That directory is (temporarily) mounted into the image which the executor is building. And since all of this happens inside the Kaniko image, that file disappears when the Kaniko (GitLab) job completes.
The developer build script (snip):
# To keep it simple, this assumes that the developer has their credentials
# cached in a file (ignored by git) called dev-credentials.txt
DOCKER_BUILDKIT=1 docker build . \
--secret id=repo-creds,src=dev-credentials.txt
Basically the same as before; I had to put the credentials in a file instead of environment variables.
The Dockerfile (snip):
RUN --mount=type=secret,id=repo-creds,target=/kaniko/repo-credentials.txt \
    USER=$(sed '1q;d' /kaniko/repo-credentials.txt) \
    PASS=$(sed '2q;d' /kaniko/repo-credentials.txt) \
    ./scriptThatUsesTheseEnvVarCredentialsToPullArtifacts
...rest of build..
This Works!
In the Dockerfile, mounting the secret under the /kaniko folder makes it work with both the DinD developer build and the CI Kaniko executor.
For dev builds, the DinD secret works as always (I had to change it to a file rather than environment variables, which I didn't love).
When the build is run by Kaniko, I suppose that since the secret in the RUN command is not found, it doesn't even try to write the temporary credentials file (which I expected would fail the build). Instead, because I wrote the variables directly into the temporarily mounted /kaniko directory, the rest of the RUN command was happy.
Advice
To me this seems kludgier than I expected, and I want to find out about other/alternative solutions. Finding out that the /kaniko folder is mounted into the image at build time seems to open a lot of possibilities.

Redis .conf file problem when running a slave Redis instance

The problem is that my Windows 10 machine does not understand Redis commands. I downloaded the CLI and server .msi files and installed them to D:/Program Files/Redis.
I run the command:
redis-server D:/Program Files/Redis/redis-slave.windows.conf
and expect to get a Redis slave instance with the configuration provided by the course in the .conf file, but I get this error:
Invalid argument during startup: Failed to open the .conf file: Files/Redis/redis-slave.windows.conf CWD=D:\Program Files\Redis
The problem is not a wrong configuration, because I can copy the default Redis .conf file and the result is the same.
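(Note: the error message shows the path being cut off at the space in "Program Files", so quoting the path is the usual first thing to try; a sketch, not verified on this setup:)
redis-server "D:/Program Files/Redis/redis-slave.windows.conf"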
Another problem, not as important to me as the one above, but similar: when I try to run a cluster, Windows 10 does not know how to open the file. I run the command:
D:\Program Files\Redis2\redis-7.0.4\utils\create-cluster>./create-cluster start
and I get a Windows dialog asking me to choose a program to run this file (which does not seem to happen in your macOS case). Anyway, this create-cluster file has no extension, so I do not know what to do to make it run.

PowerCLI deploy template

I am trying to deploy an OVA file using PowerCLI on my laptop. The script works if the -Source is on a UNC share, or in this case when $ovfpath is a mapped drive on my laptop. But this drags the 12 GB OVA file across the network every time a new VM gets created. What I would like is to have the -Source on the datastore and only have to copy it across the WAN one time. I've tried using https:\host.... but the script fails. If I use the vSphere GUI to deploy from template and use the HTTPS URL, it works. Any ideas for how to access the -Source from a datastore?
$ovfpath = Get-ChildItem z:\
$myDatastore = Get-Datastore -Name "Datastore2"
$vmHost = Get-VMHost -Name "$newHost"
$vmHost | Import-VApp -Source "$ovfpath\Win2012_R2_Std.ova" -Name newVM01 -Datastore $myDatastore -Force
Can you not import the OVF once, convert the imported VM to a template, and then deploy additional VMs from the template?
That way it will not use the network each time.
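A rough PowerCLI sketch of that approach (the template name is illustrative; the other variables are the ones from the question):
$baseVM = Get-VMHost -Name $newHost |
    Import-VApp -Source "$ovfpath\Win2012_R2_Std.ova" -Name "Win2012_R2_Std_Base" -Datastore $myDatastore -Force
# Convert the imported VM to a template stored on the datastore
Set-VM -VM $baseVM -ToTemplate -Confirm:$false
# Deploy new VMs from the template without re-copying the OVA over the WAN
New-VM -Name "newVM01" -Template (Get-Template -Name "Win2012_R2_Std_Base") -VMHost (Get-VMHost -Name $newHost) -Datastore $myDatastore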

Pentaho 5.4.0 Community Edition Remote Execution

I am using Pentaho Community Edition 5.4.0. Let me explain my requirement simply:
1) I have my jobs and transformations on my local Windows machine, and I would like to execute them on my client's machine, so I installed the same Pentaho Community version 5.4.0 there. For remote execution I have heard about the Carte.bat service. I searched for the installation procedure and configuration settings for remote execution, but I did not get a clear idea of it. Please give me a clear, step-by-step procedure for how to run my jobs remotely on the client machine.
2) Is it possible to schedule those jobs and transformations in Pentaho Community Edition 5.4.0? If it is possible, please explain how.
Thanks and Regards
Dhamodharan.
Install Jenkins:
https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins
At least read up on which variables are available in Jenkins; it is handy to know them.
Download PDI (Kettle) from http://pentaho.com and unzip it into any suitable directory.
Configure the executables and PDI variables as described here:
How to configure Database connection for production environment in Pentaho data integration Kettle transformation
Start Jenkins and log in to the admin panel. Create a new job;
in the Build section add an "Execute shell" step, and inside its text area add these lines:
cd $WORKSPACE
kitchen.sh -file=main.kjb
Done.
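A slightly more explicit version of that shell step (the log level flag is an assumption; KETTLE_HOME is the variable configured in .bashrc below):
cd "$WORKSPACE"
# kitchen.sh exits non-zero on failure, so Jenkins marks the build as failed automatically
"$KETTLE_HOME/kitchen.sh" -file=main.kjb -level=Basic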
There are a lot of Jenkins plugins. You can add post-build actions:
notify by email
archive/publish the results
and so on
Jenkins is worth using if it is already used for other things, i.e. it already exists in your infrastructure; otherwise Carte will be enough.
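For the Carte route, a minimal sketch of starting the Carte server on the remote machine (the bind address and port are illustrative; the default credentials are cluster/cluster):
./carte.sh 0.0.0.0 8081     # Linux
Carte.bat 0.0.0.0 8081      # Windows
You can then define that host/port as a slave server in Spoon and run jobs or transformations on it remotely.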
Variables configured in .bashrc and .bash_profile (the user should be the same one Jenkins runs as):
#.bashrc
export KETTLE_HOME=/opt/R1/data-integration
export KETTLE_JNDI_ROOT=$KETTLE_HOME/simple-jndi
export PATH=$PATH:$KETTLE_HOME
To force .bashrc to be evaluated on SSH login, add this to .bash_profile:
#.bash_profile
if [ -f .bashrc ]; then
. ~/.bashrc
fi
Then
source .bashrc
Afterwards, restart Jenkins (not from the admin panel).

Jenkins: Publish over SSH after failed build

I am trying to use the Publish Over SSH plugin to publish many kinds of build artifact to an external server. Examples of build artifacts are compiled builds, XML output from testing, and JSON output from linting.
If testing or linting results in errors, the build will fail or be marked unstable. In the case of a failed build, the Publish Over SSH plugin will not copy the build artifacts, writing to the console:
SSH: Current build result is [FAILURE], not going to run.
I see no reason why I wouldn't want to publish this information if it exists, and I would like to continue to report errors as build failures. So, is there any way to force Jenkins to publish build artifacts even if the job is marked as a failure?
I thought I could use the Flexible Publish to force this, by wrapping the Publish Over SSH in an "always" condition, but this gave the same output as before on a build failure.
I can think of a couple of work-arounds:
a) store the build status in an environment variable; force the status to SUCCESS; perform the publish step; then recover the build status from the environment variable using java -jar jenkins-cli.jar set-build-status $STORED_STATUS
OR
b) Write a bash script to perform the publishing step manually using SSH, cutting out the Publish Over SSH plugin altogether
Before I push forward with either of these solutions (neither of which I like), is there any piece of configuration that I'm missing?
The solution I ended up using was to copy the files manually with rsync/ssh in a post-build script. I configured this in my Jenkins Job Builder YAML like so:
- publisher:
    name: publish-to-archive
    publishers:
      - post-tasks:
          - matches:
              - log-text: ".*"
            script: |
              ssh -i ${{HOME}}/.ssh/id_rsa jenkins@archiver "mkdir -p {archive_path}"
              rsync -Pravdtze "ssh -i ${{HOME}}/.ssh/id_rsa" {source_path} jenkins@archiver:{archive_path}
Quoting old hooky on jenkinsci-users:
How can I force Publish Over SSH to work even if the build has been marked a failure?
Use "Send files or execute commands over SSH after the build runs" in the "Build Environment" configuration section
(Job configuration / Build Environment / Send files or execute commands over SSH after the build runs)
instead of using a post-build step or build step.