I'm trying to create a cluster with two nodes using GlassFish 4.1.1 build 1.
One node is local and the other one is an SSH node. The node seems to be working: if I ping it, it responds OK (Successfully made SSH connection to node node2 (gfNode2)).
I have set up SSH, created the node, and created one instance (i2) on that node, but when I want to start the instance I get:
i2: Could not start instance i2 on node node2 (gfNode2). Command failed on node node2 (gfNode2): Previous synchronization failed at Sep 10, 2016 12:25:27 PM Will perform full synchronization. Removing all cached state for instance i2. CLI802 Synchronization failed for directory config, caused by: remote failure: SynchronizeFiles: Exception reading request Command start-local-instance failed. To complete this operation run the following command locally on host gfNode2 from the GlassFish install location /opt/glassfish4: lib/nadmin start-local-instance --node node2 --sync normal i2
If I run this command on the node 2 machine, I get:
./nadmin start-local-instance --node node2 --sync normal i2
Previous synchronization failed at Sep 10, 2016 12:25:27 PM
Will perform full synchronization.
Removing all cached state for instance i2.
Enter admin user name> admin
Enter admin password for user "admin">
CLI802 Synchronization failed for directory config, caused by:
remote failure: SynchronizeFiles: Exception reading request
Command start-local-instance failed.
Any idea what to try next?
Update:
The DAS is reachable and SSH is working properly (ping-node-ssh works from the DAS).
What I have noticed is that even after I have installed with install-node-ssh and created the node with create-node-ssh, node 2 has no files inside.
At /glassfish4/glassfish/nodes/node2/i2 there is only one file, .syncstate, which is empty. The node2/i2 directories are there but there is nothing in i2, maybe due to: Removing all cached state for instance i2.
This is what I got in the DAS logs:
[2016-09-10T19:31:14.806+0000] [glassfish 4.1] [WARNING] [] [javax.enterprise.system.core] [tid: _ThreadID=106 _ThreadName=admin-listener(5)] [timeMillis: 1473535874806] [levelValue: 900] [[
Could not start instance i2 on node node2 (gfNode2).: Command ' /opt/glassfish4/glassfish/lib/nadmin --_auxinput - --interactive=false start-local-instance --node node2 --sync normal i2' failed on node node2 (gfNode2): Previous synchronization failed at Sep 10, 2016 12:25:27 PM
Will perform full synchronization.
Removing all cached state for instance i2.
Command start-local-instance failed.
CLI802 Synchronization failed for directory config, caused by:
remote failure: SynchronizeFiles: Exception reading request: To complete this operation run the following command locally on host gfNode2 from the GlassFish install location /opt/glassfish4:
lib/nadmin start-local-instance --node node2 --sync normal i2]]
[2016-09-10T19:31:14.818+0000] [glassfish 4.1] [SEVERE] [] [org.glassfish.admingui] [tid: _ThreadID=102 _ThreadName=admin-listener(1)] [timeMillis: 1473535874818] [levelValue: 1000] [[
RestResponse.getResponse() gives FAILURE. endpoint = 'https://localhost:4848/management/domain/servers/server/i2/start-instance'; attrs = '{}']]
[2016-09-10T19:31:14.820+0000] [glassfish 4.1] [SEVERE] [] [org.glassfish.admingui] [tid: _ThreadID=102 _ThreadName=admin-listener(1)] [timeMillis: 1473535874820] [levelValue: 1000] [[
Error in instanceAction ;
endpoint=https://localhost:4848/management/domain/servers/server/i2/start-instance;attrsMap=null]]
If I try to run the command from node2, I get what is shown in the first code block of the post...
The problem here is that the remote instance i2 can't communicate with the DAS to download its configuration.
You will need to verify:
Is the DAS online?
Is the server where the DAS runs reachable from the remote node?
Is SSH communication working properly? (Use the asadmin command ping-node-ssh; see the example commands below.)
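For example, a quick way to check all three from the DAS machine (a sketch; node2 is the node name used above):
asadmin uptime              # confirms the DAS is up and answering on its admin port
asadmin list-nodes          # the SSH node should be listed
asadmin ping-node-ssh node2 # verifies SSH connectivity from the DAS to the node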
If you open the server.log files for the instance and for the DAS, they should give you a more detailed error message and indicate whether or not the request is reaching the DAS.
The instance logs are located in:
$GLASSFISH_HOME/glassfish/nodes/node2/i2/logs/server.log
The domain logs are located in:
$GLASSFISH_HOME/glassfish/domains/domain1/logs/server.log
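For example, to check the most recent entries in each (run the first on the node2 machine and the second on the DAS host):
tail -n 100 $GLASSFISH_HOME/glassfish/nodes/node2/i2/logs/server.log
tail -n 100 $GLASSFISH_HOME/glassfish/domains/domain1/logs/server.log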
Following the instructions from here, I'm attempting to get to-be-continuous up and running.
I've created the empty to-be-continuous root group and the Maintainer non-individual GitLab account, and generated its appropriately scoped personal access token.
Upon executing the curl command to recursively copy the tbc group, I notice that the tools sub-group isn't cloned.
Seeing that the tracking repo from the tools group is required for the next step, I manually created the tools sub-group and manually cloned each of the repos under it individually, effectively mirroring the structure and content of the authoritative tbc repo.
Additionally, I've configured my self-hosted GitLab's CA in the OpenShift GitLab runner so that I no longer get x509 errors.
With the above in place, including an available GitLab runner on my OpenShift cluster, I attempted to manually run the tracking repo's pipeline (as I understand this to be prerequisite to any other pipeline runs?).
The GitLab runner seemed to pick up the pipeline, as the runner's log printed the following:
Checking for jobs... received job=6103 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
Checking for jobs... received job=6104 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
Checking for jobs... received job=6105 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
WARNING: Job failed: command terminated with exit code 1 duration_s=9.30956493 job=6103 project=876 runner=b3CyGtqD
WARNING: Failed to process runner builds=2 error=command terminated with exit code 1 executor=kubernetes runner=b3CyGtqD
WARNING: Job failed: command terminated with exit code 1 duration_s=9.808499871 job=6105 project=876 runner=b3CyGtqD
WARNING: Failed to process runner builds=1 error=command terminated with exit code 1 executor=kubernetes runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
Job succeeded duration_s=30.342517342 job=6104 project=876 runner=b3CyGtqD
At the same time, the pipeline log on GitLab shows the following:
Running with gitlab-runner 14.1.0 (8925d9a0)
on gitlab-runner-runner-5bc5455cfb-pmrpl b3CyGtqD
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: dle-test
Using Kubernetes executor with image hadolint/hadolint:latest-alpine ...
Using attach strategy to execute scripts...
Preparing environment
00:07
Waiting for pod dle-test/runner-b3cygtqd-project-876-concurrent-0fvm2z to be running, status is Pending
Waiting for pod dle-test/runner-b3cygtqd-project-876-concurrent-0fvm2z to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-logs]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-b3cygtqd-project-876-concurrent-0fvm2z via gitlab-runner-runner-5bc5455cfb-pmrpl...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/b3CyGtqD/0/to-be-continuous/tools/tracking/.git/
Created fresh repository.
Checking out e31d6d28 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
$ # BEGSCRIPT # collapsed multi-line command
/scripts-876-6103/step_script: eval: line 162: can't create /etc/ssl/certs/ca-certificates.crt: Permission denied
Uploading artifacts for failed job
00:00
Uploading artifacts...
WARNING: reports/hadolint-*.json: no matching files
ERROR: No files to upload
Uploading artifacts...
WARNING: reports/hadolint-*.json: no matching files
ERROR: No files to upload
Cleaning up file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
Having spent quite a few hours getting this far, I'm stumped. Any idea what I'm doing wrong?
Added kaniko log as requested:
Running with gitlab-runner 14.1.0 (8925d9a0)
on gitlab-runner-runner-5bc5455cfb-4ggsp n8KiyZgX
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: dle-test
Using Kubernetes executor with image gcr.io/kaniko-project/executor:debug ...
Using attach strategy to execute scripts...
Preparing environment
00:13
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-logs]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-n8kiyzgx-project-876-concurrent-0knvl9 via gitlab-runner-runner-5bc5455cfb-4ggsp...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/n8KiyZgX/0/to-be-continuous/tools/tracking/.git/
Created fresh repository.
Checking out e31d6d28 as master...
Skipping Git submodules setup
Restoring cache
00:00
Checking cache for master-docker-2...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
Downloading artifacts
00:01
Downloading artifacts for docker-hadolint (6121)...
Downloading artifacts from coordinator... ok id=6121 responseStatus=200 OK token=LRUFpXw7
WARNING: reports/hadolint-dde65eefd6c9a71b70c22f15c806082e.json: lchown reports/hadolint-dde65eefd6c9a71b70c22f15c806082e.json: operation not permitted (suppressing repeats)
Downloading artifacts for go-build-test (6122)...
Downloading artifacts from coordinator... ok id=6122 responseStatus=200 OK token=nqXz2-2P
WARNING: bin/: lchown bin/: operation not permitted (suppressing repeats)
Executing "step_script" stage of the job script
00:08
$ # BEGSCRIPT # collapsed multi-line command
[WARN] =======================================================================================================
[WARN] The template docker:1.2.0 you're using is not up-to-date: consider upgrading to version 2.1.1
[WARN] (set $TEMPLATE_CHECK_UPDATE_DISABLED to disable this message)
[WARN] =======================================================================================================
[INFO] Custom CA certificates configured in /kaniko/ssl/certs/ca-certificates.crt
[INFO] Docker authentication configured for
$ run_build_kaniko "$DOCKER_SNAPSHOT_IMAGE" --build-arg http_proxy="$http_proxy" --build-arg https_proxy="$https_proxy" --build-arg no_proxy="$no_proxy"
[INFO] Build & deploy image /snapshot:master
[INFO] Kaniko command: /kaniko/executor --context . --dockerfile ./Dockerfile --destination /snapshot:master --cache --cache-dir=/builds/n8KiyZgX/0/to-be-continuous/tools/tracking/.cache --verbosity info --build-arg CI_PROJECT_URL --build-arg TRACKING_CONFIGURATION --build-arg http_proxy= --build-arg https_proxy= --build-arg no_proxy=
E1013 18:05:11.931688 44 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "/snapshot:master": GET https://index.docker.io/v2/snapshot/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:snapshot Type:repository]]
Uploading artifacts for failed job
00:01
Uploading artifacts...
WARNING: docker.env: no matching files
ERROR: No files to upload
Cleaning up file based variables
00:00
ERROR: Job failed: command terminated with exit code 1
First of all, thanks for your feedback. I investigated thoroughly and you're right: we've recently introduced a bug in our gitlab-sync.sh script that prevented it from recursing :(
A fix is on its way; you should be able to retry once it's merged.
About your second issue, the logs clearly suggest the hadolint job failed importing your custom CA certificates, but that should not happen using the hadolint/hadolint:latest-alpine image.
See the same job logs on gitlab.com:
[INFO] Custom CA certificates imported in /etc/ssl/certs/ca-certificates.crt
I don't see clearly where the problem could come from.
A few questions to help me investigate:
Which kind of GitLab runners did you configure?
Which technique did you use to configure your custom CA certificates? Did you configure a global DEFAULT_CA_CERTS as suggested in our doc (see the sketch after these questions)?
Is docker-hadolint the only job to fail? You should also have go-build-test and go-ci-lint in the same stage, which import the custom CA certificates in the same way...
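For reference, a minimal sketch of what declaring a global DEFAULT_CA_CERTS directly in a .gitlab-ci.yml could look like (a group-level CI/CD variable is the more typical place for it, and the certificate content below is only a placeholder):
variables:
  DEFAULT_CA_CERTS: |
    -----BEGIN CERTIFICATE-----
    ... your self-hosted GitLab CA chain, PEM-encoded ...
    -----END CERTIFICATE-----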
I try to run the rabbitmq-server command in my cmd, but it gives me this error...
Configuring logger redirection
13:44:01.865 [warning] Using RABBITMQ_ADVANCED_CONFIG_FILE: c:/Users/saikat/AppData/Roaming/RabbitMQ/advanced.config
13:44:02.838 [error]
13:44:02.838 [error] BOOT FAILED
BOOT FAILED
13:44:02.838 [error] ===========
===========
13:44:02.838 [error] ERROR: distribution port 25672 in use by another node: rabbit#DESKTOP-1I7H1RC
ERROR: distribution port 25672 in use by another node: rabbit#DESKTOP-1I7H1RC
13:44:02.838 [error]
13:44:03.839 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason {dist_port_already_used,25672,"rabbit","DESKTOP-1I7H1RC"} in context start_error
13:44:03.840 [error] CRASH REPORT Process <0.152.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,"rabbit","DESKTOP-1I7H1RC"}}},{rabbit_prelaunch_app,start,[normal,[]]}} in application_master:init/4 line 138
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,\"rabbit\",\"DESKTOP-1I7H1RC\"}}},{rabbit_prelaunch_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,"rabbit","DESKTOP-1I7H1RC"}}},{r
Crash dump is being written to erl_crash.dump...done
No doubt you have already resolved this, but in case it helps anyone else: I had the same issue on Windows and managed to resolve it by doing the following.
Open powershell as admin in *\rabbitmq_server-3.8.9\sbin*
Stop the service by running: .\rabbitmq-service.bat stop
Start the service by running: .\rabbitmq-server.bat
If you are on Windows, go to Services
Search for RabbitMQ and right-click
Stop the service
Open cmd as administrator
run cd C:\Program Files\RabbitMQ Server\rabbitmq_server-3.8.17\sbin
and then run rabbitmq-server
For me, what worked was killing the erl process from Task Manager and then running the command:
rabbitmq-server.bat
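If it isn't obvious which process is holding the distribution port, a quick check on Windows (a sketch; the PID printed by netstat is what you pass to taskkill) is to list the process bound to 25672, end it, and then restart the server:
netstat -ano | findstr 25672
taskkill /PID <pid-from-netstat> /F
rabbitmq-server.bat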
I wanted to learn more about Fabric8; however, I can't even build a very simple project. I am running it locally on a Minikube cluster.
The setup is:
Mac OS Sierra
Minikube v0.18.0
Fabric8 v0.4.122
So I have a simple Spring Boot application in the local Gogs repository. The builds are failing with this message:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:153)
[...]
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
So I took the ca.crt from Minikube (~/minikube/ca.crt) and added it (base64-encoded) to the jenkins-git-ssh secret which gets mounted in the Jenkins pod in /var/run/secrets/kubernetes.io/serviceaccount. The next build ended with this error:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default/. Message: Unauthorized.
The same happens when I use apiserver.crt from Minikube.
When using ca.pem instead I get:
Caused by: java.security.cert.CertificateException: Unable to initialize, java.io.IOException: extra data given to DerValue constructor
at sun.security.x509.X509CertImpl.<init>(X509CertImpl.java:198)
at sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:102)
I can access the Kubernetes API from the Jenkins pod only when adding both apiserver.crt and apiserver.key to the secret. Executing
curl -k --cert apiserver.crt --key apiserver.key https://kubernetes.default/.
is successful then - but the Jenkins build is still failing.
So I'm a bit lost here. Does anybody have an idea how to continue?
Thanks and regards,
Daniel
We have a fix, but it's not released yet. Details can be found at https://github.com/fabric8io/fabric8/issues/6829#issuecomment-301467664, which also describes a workaround.
TL;DR you can edit the jenkins service account and remove the following lines before restarting the jenkins master pod:
-secrets:
-- name: "jenkins-git-ssh"
-- name: "jenkins-master-ssh"
-- name: "jenkins-release-gpg"
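For example (a sketch; the namespace and pod name are placeholders for your own environment, and the master pod is assumed to be recreated automatically by its controller):
kubectl edit sa jenkins -n <namespace>
# remove the three secret entries listed above, save, then restart the master:
kubectl delete pod <jenkins-master-pod> -n <namespace>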
Hope that helps.
While setting up a single-node cluster without Cygwin on Windows 10, I followed this document: Link for Hadoop installation in Windows 10
I am facing the error below while starting HDFS using D:\hadoop-2.6.2.tar\hadoop-2.6.2\hadoop-2.6.2\sbin>start-dfs.cmd
Error message stack trace:
17/01/12 12:25:42 FATAL datanode.DataNode: Exception in secureMain java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:620)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462) 17/01/12 12:25:42 INFO util.ExitUtil: Exiting with status 1
I also get this error message about starting the namenode:
17/01/12 12:25:43 FATAL namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
17/01/12 12:25:43 INFO util.ExitUtil: Exiting with status 1
[Problem analysis] The permissions on the /data directory are insufficient, so the NameNode cannot be started.
[Solution]
(1) As root, assign ownership of the /data directory to the hadoop user;
(2) empty the /data directory;
(3) reformat the NameNode and restart the Hadoop cluster.
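A rough sketch of those steps on a Linux installation (the /data path and the hadoop user are assumptions; use the directories from your dfs.namenode.name.dir / dfs.datanode.data.dir settings):
chown -R hadoop:hadoop /data   # (1) give the hadoop user ownership of the data directory
rm -rf /data/*                 # (2) empty the directory
hdfs namenode -format          # (3) reformat the NameNode
start-dfs.sh                   # restart HDFS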
I am trying to configure a slave for my Jenkins master. I did the steps below:
Enabled passwordless authentication to the remote host (GNU/Linux)
Configured the slave on the master
I can see slave.jar being copied to the remote host folder, but it is failing with the error below:
Expanded the channel window size to 4MB
[11/07/14 19:11:54] [SSH] Starting slave process: cd "/test/app/abc/slavetest" && /usr/java /jdk1.6.0_29 -XX:MaxPermSize=2048m -Xmx2048m -jar slave.jar
bash: /usr/java/jdk1.6.0_29: is a directory
hudson.util.IOException2: Slave JVM has terminated. Exit code=126
at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:953)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:133)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: unexpected stream termination
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:200)
at hudson.remoting.Channel.<init>(Channel.java:419)
at hudson.remoting.Channel.<init>(Channel.java:398)
at hudson.remoting.Channel.<init>(Channel.java:394)
at hudson.remoting.Channel.<init>(Channel.java:383)
at hudson.remoting.Channel.<init>(Channel.java:375)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:344)
at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:945)
... 7 more
[11/07/14 19:11:54] Launch failed - cleaning up connection
[11/07/14 19:11:54] [SSH] Connection closed.
Any idea what I am doing wrong?
You have your slave's path to the java executable misconfigured:
/usr/java /jdk1.6.0_29 -XX:MaxPermSize=2048m -Xmx2048m -jar slave.jar
The blank space should be removed, and the full path should be
/usr/java/jdk1.6.0_29/bin/java
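With the space removed and the full path to the executable, the launch command from the log above becomes:
cd "/test/app/abc/slavetest" && /usr/java/jdk1.6.0_29/bin/java -XX:MaxPermSize=2048m -Xmx2048m -jar slave.jar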
I just ran into this as well. It's best to check the Docker container/slave's Java path by logging into the container and running whereis java.
The Java paths of the host and the agent are probably not the same, and that JAR and the java command are executed from within the agent.
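For example, logging into the agent and running (assuming the JDK location from the question above):
whereis java
/usr/java/jdk1.6.0_29/bin/java -version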