Attempting to install to-be-continuous with OpenShift 4 and self-managed GitLab - gitlab-ci

Following the instructions from here, I'm attempting to get to-be-continuous up and running.
I've created the empty to-be-continuous root group and the non-individual GitLab account with the Maintainer role, and generated an appropriately scoped personal access token for it.
Upon executing the curl command to recursively copy the tbc group, I noticed that the tools sub-group wasn't cloned.
Seeing that the tracking repo from the tools group is required for the next step, I manually created the tools sub-group and manually cloned each of the repos under it, effectively mirroring the structure and content of the authoritative tbc group.
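(For reference, the manual clone of each repo amounted to something like the following, shown here for the tracking project; the empty target project has to exist on my instance first.)
# mirror-clone everything (all branches and tags) from the authoritative repo
git clone --mirror https://gitlab.com/to-be-continuous/tools/tracking.git
cd tracking.git
# push it into the pre-created empty project on my self-managed instance
git push --mirror https://git.corp.odfl.com/to-be-continuous/tools/tracking.git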
Additionally I've configured my self-hosted GitLab's CA in the OpenShift GitLab runner so that I no longer get x509 errors.
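(For the record, what I did for the CA is roughly the Helm-chart way of handing certificates to the runner; the secret and file names below are just the ones I picked, with the file named after the GitLab host:)
# create a secret in the runner's namespace containing the corporate root CA
kubectl -n dle-test create secret generic corp-gitlab-ca \
  --from-file=git.corp.odfl.com.crt=./corp-root-ca.pem
# then reference it from the runner deployment, e.g. certsSecretName: corp-gitlab-ca
# in the gitlab-runner Helm chart values, and restart the runner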
With the above in place, including an available GitLab runner on my OpenShift cluster, I attempted to manually run the tracking repo's pipeline (as I understand this to be a prerequisite for any other pipeline runs?).
The GitLab runner seemed to pick up the pipeline, as the runner's log showed the following:
Checking for jobs... received job=6103 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
Checking for jobs... received job=6104 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
Checking for jobs... received job=6105 repo_url=https://git.corp.odfl.com/to-be-continuous/tools/tracking.git runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
WARNING: Job failed: command terminated with exit code 1 duration_s=9.30956493 job=6103 project=876 runner=b3CyGtqD
WARNING: Failed to process runner builds=2 error=command terminated with exit code 1 executor=kubernetes runner=b3CyGtqD
WARNING: Job failed: command terminated with exit code 1 duration_s=9.808499871 job=6105 project=876 runner=b3CyGtqD
WARNING: Failed to process runner builds=1 error=command terminated with exit code 1 executor=kubernetes runner=b3CyGtqD
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
ERROR: Could not create cache adapter error=cache factory not found: factory for cache adapter "" was not registered
Job succeeded duration_s=30.342517342 job=6104 project=876 runner=b3CyGtqD
At the same time, the pipeline log on GitLab shows the following:
Running with gitlab-runner 14.1.0 (8925d9a0)
on gitlab-runner-runner-5bc5455cfb-pmrpl b3CyGtqD
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: dle-test
Using Kubernetes executor with image hadolint/hadolint:latest-alpine ...
Using attach strategy to execute scripts...
Preparing environment
00:07
Waiting for pod dle-test/runner-b3cygtqd-project-876-concurrent-0fvm2z to be running, status is Pending
Waiting for pod dle-test/runner-b3cygtqd-project-876-concurrent-0fvm2z to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-logs]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-b3cygtqd-project-876-concurrent-0fvm2z via gitlab-runner-runner-5bc5455cfb-pmrpl...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/b3CyGtqD/0/to-be-continuous/tools/tracking/.git/
Created fresh repository.
Checking out e31d6d28 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
$ # BEGSCRIPT # collapsed multi-line command
/scripts-876-6103/step_script: eval: line 162: can't create /etc/ssl/certs/ca-certificates.crt: Permission denied
Uploading artifacts for failed job
00:00
Uploading artifacts...
WARNING: reports/hadolint-*.json: no matching files
ERROR: No files to upload
Uploading artifacts...
WARNING: reports/hadolint-*.json: no matching files
ERROR: No files to upload
Cleaning up file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
Having spent quite a few hours getting this far, I'm stumped. Any idea what I'm doing wrong?
Added kaniko log as requested:
Running with gitlab-runner 14.1.0 (8925d9a0)
on gitlab-runner-runner-5bc5455cfb-4ggsp n8KiyZgX
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: dle-test
Using Kubernetes executor with image gcr.io/kaniko-project/executor:debug ...
Using attach strategy to execute scripts...
Preparing environment
00:13
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-logs]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod dle-test/runner-n8kiyzgx-project-876-concurrent-0knvl9 to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-n8kiyzgx-project-876-concurrent-0knvl9 via gitlab-runner-runner-5bc5455cfb-4ggsp...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/n8KiyZgX/0/to-be-continuous/tools/tracking/.git/
Created fresh repository.
Checking out e31d6d28 as master...
Skipping Git submodules setup
Restoring cache
00:00
Checking cache for master-docker-2...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
Downloading artifacts
00:01
Downloading artifacts for docker-hadolint (6121)...
Downloading artifacts from coordinator... ok id=6121 responseStatus=200 OK token=LRUFpXw7
WARNING: reports/hadolint-dde65eefd6c9a71b70c22f15c806082e.json: lchown reports/hadolint-dde65eefd6c9a71b70c22f15c806082e.json: operation not permitted (suppressing repeats)
Downloading artifacts for go-build-test (6122)...
Downloading artifacts from coordinator... ok id=6122 responseStatus=200 OK token=nqXz2-2P
WARNING: bin/: lchown bin/: operation not permitted (suppressing repeats)
Executing "step_script" stage of the job script
00:08
$ # BEGSCRIPT # collapsed multi-line command
[WARN] =======================================================================================================
[WARN] The template docker:1.2.0 you're using is not up-to-date: consider upgrading to version 2.1.1
[WARN] (set $TEMPLATE_CHECK_UPDATE_DISABLED to disable this message)
[WARN] =======================================================================================================
[INFO] Custom CA certificates configured in /kaniko/ssl/certs/ca-certificates.crt
[INFO] Docker authentication configured for
$ run_build_kaniko "$DOCKER_SNAPSHOT_IMAGE" --build-arg http_proxy="$http_proxy" --build-arg https_proxy="$https_proxy" --build-arg no_proxy="$no_proxy"
[INFO] Build & deploy image /snapshot:master
[INFO] Kaniko command: /kaniko/executor --context . --dockerfile ./Dockerfile --destination /snapshot:master --cache --cache-dir=/builds/n8KiyZgX/0/to-be-continuous/tools/tracking/.cache --verbosity info --build-arg CI_PROJECT_URL --build-arg TRACKING_CONFIGURATION --build-arg http_proxy= --build-arg https_proxy= --build-arg no_proxy=
E1013 18:05:11.931688 44 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "/snapshot:master": GET https://index.docker.io/v2/snapshot/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:snapshot Type:repository]]
Uploading artifacts for failed job
00:01
Uploading artifacts...
WARNING: docker.env: no matching files
ERROR: No files to upload
Cleaning up file based variables
00:00
ERROR: Job failed: command terminated with exit code 1

First of all, thanks for your feedback. I thoroughly investigated and you're right: we've recently introduced a bug in our gitlab-sync.sh script that prevented it from recursing :(
A fix is on its way; you should be able to retry once it's merged.
About your second issue, the logs clearly suggest the hadolint job failed while importing your custom CA certificates, but that should not happen with the hadolint/hadolint:latest-alpine image.
See the same job logs on gitlab.com:
[INFO] Custom CA certificates imported in /etc/ssl/certs/ca-certificates.crt
I can't immediately see where the problem could be coming from.
A few questions to help me investigate:
Which kind of GitLab runners did you configure?
Which technique did you use to configure your custom CA certificates? Did you configure a global DEFAULT_CA_CERTS as suggested in our doc? (See the sketch after this list.)
Is docker-hadolint the only job that fails? You should also have go-build-test and go-ci-lint in the same stage, which import the custom CA certificates in the same way...
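For reference, DEFAULT_CA_CERTS is expected as a CI/CD variable (usually defined once on your root group) containing your custom CA certificate(s) as a PEM bundle. Setting it through the API would look roughly like this (group ID, token and file name are placeholders):
# register the CA bundle as a group-level CI/CD variable
curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://git.corp.odfl.com/api/v4/groups/<to-be-continuous-group-id>/variables" \
  --form "key=DEFAULT_CA_CERTS" \
  --form "value=$(cat corp-root-ca.pem)"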

Related

Hyperledger Fabric error: "TLS: bad certificate server" when installing chaincode

I'm just starting to learn HLF, and I have an error while following the tutorial from the docs: link
I downloaded fabric-samples using this command (replaced bit.ly link with the destination):
curl -sSL https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh | bash -s -- 2.2.2 1.4.9
I run logspout in one terminal and try to execute peer lifecycle chaincode install basic.tar.gz in another one, and this is the result I get:
Error: failed to retrieve endorser client for install: endorser client
failed to connect to localhost:7051: failed to create new connection:
context deadline exceeded
Log presented by Logspout:
peer0.org1.example.com|2022-03-15 13:03:24.452 UTC [core.comm]
ServerHandshake -> ERRO 04a Server TLS handshake failed in 2.650245ms
with error remote error: tls: bad certificate server=PeerServer
remoteaddress=172.22.0.1:61126
I set the environment variables in the terminal as instructed in the docs, and I checked that the CORE_PEER_TLS_ROOTCERT_FILE variable points to an existing file. The content of the file is the same as on the container.
What I tried to do:
download fabric-samples again and redo the whole setup, copy-pasting the commands directly from the docs
Do you have any suggestions on where to look for the issue?
I resolved the problem: I was still using peer version 2.2.1 from previous experiments, and it probably collided with FABRIC_CFG_PATH.
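In case it helps someone else, this is roughly how I checked which binaries were being picked up and pointed my shell at the freshly downloaded 2.2.2 ones (run from fabric-samples/test-network; paths follow the default fabric-samples layout):
# see which peer binary the shell resolves and what version it reports
which peer
peer version
# prefer the binaries and config shipped with fabric-samples
export PATH=${PWD}/../bin:$PATH
export FABRIC_CFG_PATH=${PWD}/../config/
peer version   # should now report 2.2.2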

meshcat_visualizer_test fails fetching repository 'yaml_cpp'

I'm trying to run drake/bindings/pydrake/systems/test/meshcat_visualizer_test.py as per the commented instructions at the top of said file, but I am unfamiliar with how I would add yaml-cpp, which I believe is causing the errors. I have the drake repo cloned and pydrake configured, and was able to Run Server (the first required command), but I do not know how to add the yaml-cpp package if it is missing.
phil@philpc:~/drake/bindings/pydrake/systems/test$ bazel run --run_under='env TEST_ZMQ_URL=tcp://127.0.0.1:6000' //bindings/pydrake/systems:py/meshcat_visualizer_test -- 'TestMeshcat.test_point_cloud_visualization'
INFO: Repository yaml_cpp instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule pkg_config_repository defined at:
/home/phil/drake/tools/workspace/pkg_config.bzl:276:25: in <toplevel>
ERROR: An error occurred during the fetch of repository 'yaml_cpp':
Unable to complete pkg-config setup for @yaml_cpp repository: error 1 from [/usr/bin/pkg-config, "yaml-cpp"]:
INFO: Repository remotejdk11_linux instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
/home/phil/.cache/bazel/_bazel_phil/a5ca8dfa5bc97606d4bf1d23312635a2/external/bazel_tools/tools/build_defs/repo/http.bzl:336:16: in <toplevel>
INFO: Repository remote_java_tools_linux instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
/home/phil/.cache/bazel/_bazel_phil/a5ca8dfa5bc97606d4bf1d23312635a2/external/bazel_tools/tools/build_defs/repo/http.bzl:336:16: in <toplevel>
INFO: Repository fmt instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule _github_archive_real defined at:
/home/phil/drake/tools/workspace/github.bzl:102:24: in <toplevel>
INFO: Repository lcm instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule _github_archive_real defined at:
/home/phil/drake/tools/workspace/github.bzl:102:24: in <toplevel>
ERROR: /home/phil/drake/tools/install/libdrake/BUILD.bazel:251:1: //tools/install/libdrake:libdrake_runtime_so_deps depends on @yaml_cpp//:yaml_cpp in repository @yaml_cpp which failed to fetch. no such package '@yaml_cpp//': Unable to complete pkg-config setup for @yaml_cpp repository: error 1 from [/usr/bin/pkg-config, "yaml-cpp"]:
ERROR: Analysis of target '//bindings/pydrake/systems:py/meshcat_visualizer_test' failed; build aborted: no such package '@yaml_cpp//': Unable to complete pkg-config setup for @yaml_cpp repository: error 1 from [/usr/bin/pkg-config, "yaml-cpp"]:
INFO: Elapsed time: 0.515s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (6 packages loaded, 0 targets configured)
FAILED: Build did NOT complete successfully (6 packages loaded, 0 targets configured)
Try running /usr/bin/pkg-config yaml-cpp --libs on the command line and see what happens. It should report no error and just print -lyaml-cpp.
Have you installed Drake's dependencies using the command from https://drake.mit.edu/ubuntu.html? It should have installed libyaml-cpp-dev, which should be all that is needed for pkg-config to succeed.
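If pkg-config does report an error, something like this should fix it on Ubuntu (the setup script path assumes a source checkout of Drake):
# install the yaml-cpp development package directly
sudo apt-get install libyaml-cpp-dev
# or (re)run Drake's prerequisite setup script from the checkout root
sudo ./setup/ubuntu/install_prereqs.sh
# then confirm pkg-config can resolve it
/usr/bin/pkg-config yaml-cpp --libs   # expect: -lyaml-cpp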

Yarn Launch Container Failed with Privilege Issue

Stack: Ambari 2.4.2.0, HDP 2.5.3.0, CentOS 6.8, FreeIPA 3.0.0
When I try to use the hdp user to submit a job on YARN, the _000001 container is created and launched successfully, but I get an error when the _000002 container is launched after being created:
2018-11-27 22:13:35,919 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(170)) - Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 1
main : run as user is hdp
main : requested yarn user is hdp
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[/usr/hdp/current/hadoop-yarn-nodemanager/bin/container-executor, hdp, hdp, 1, application_1543327888220_0001, container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/usercache/hdp/appcache/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/launch_container.sh, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.tokens, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid, /hadoop/yarn/local, /hadoop/yarn/log, cgroups=none]
2018-11-27 22:13:35,921 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
There is no further log output about the privilege issue. Does anybody have an idea?
Thanks in advance!
Problem resolved: the issue was with the submitted job itself, not with YARN or privileges.
My suggestion is to look for the details in the container logs rather than in the ResourceManager/NodeManager logs.
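For example, using the IDs from your output:
# aggregated container logs for the whole application
yarn logs -applicationId application_1543327888220_0001
# or the raw log directory of the failing container on the NodeManager host
ls /hadoop/yarn/log/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/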

Fabric8 / Minikube: Builds in Jenkins are failing due to authorization problems

I wanted to learn more about Fabric8; however, I am not able to build even a very simple project. I am running it locally on a Minikube cluster.
The setup is:
Mac OS Sierra
Minikube v0.18.0
Fabric8 v0.4.122
So I have a simple Spring Boot application in the local Gogs repository. The builds are failing with this message:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:153)
[...]
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
So I took the ca.crt from Minikube (~/minikube/ca.crt) and added it (base64-encoded) to the jenkins-git-ssh secret which gets mounted in the Jenkins pod in /var/run/secrets/kubernetes.io/serviceaccount. The next build ended with this error:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default/. Message: Unauthorized.
The same happens when I use apiserver.crt from Minikube.
When using ca.pem instead I get:
Caused by: java.security.cert.CertificateException: Unable to initialize, java.io.IOException: extra data given to DerValue constructor
at sun.security.x509.X509CertImpl.<init>(X509CertImpl.java:198)
at sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:102)
I can access the Kubernetes API from the Jenkins pod only when adding both apiserver.crt and apiserver.key to the secret. Executing
curl -k --cert apiserver.crt --key apiserver.key https://kubernetes.default/.
is successful then - but the Jenkins build is still failing.
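For comparison, the normal in-cluster way for a pod to authenticate against the API server is with its mounted service account token and CA (standard paths, shown here just as a sketch):
SA=/var/run/secrets/kubernetes.io/serviceaccount
# use the pod's own service account token and the cluster CA bundle
curl --cacert $SA/ca.crt \
     -H "Authorization: Bearer $(cat $SA/token)" \
     https://kubernetes.default/api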
So I'm a bit lost here. Does anybody have an idea how to continue?
Thanks and regards,
Daniel
We have a fix, but it's not released yet. Details can be found at https://github.com/fabric8io/fabric8/issues/6829#issuecomment-301467664, which also describes a workaround.
TL;DR: you can edit the jenkins service account and remove the following lines before restarting the jenkins master pod:
-secrets:
-- name: "jenkins-git-ssh"
-- name: "jenkins-master-ssh"
-- name: "jenkins-release-gpg"
Hope that helps.

glassfish 4.1.1 cluster ssh node error

I'm trying to create a cluster with two nodes using GlassFish 4.1.1 build 1.
One node is local and the other one is an SSH node. The node is working: if I ping it, it responds OK (Successfully made SSH connection to node node2 (gfNode2)).
I have set up SSH, created the node, and created one instance (i2) on that node, but when I want to start the instance I get:
i2: Could not start instance i2 on node node2 (gfNode2). Command failed on node node2 (gfNode2): Previous synchronization failed at Sep 10, 2016 12:25:27 PM Will perform full synchronization. Removing all cached state for instance i2. CLI802 Synchronization failed for directory config, caused by: remote failure: SynchronizeFiles: Exception reading request Command start-local-instance failed. To complete this operation run the following command locally on host gfNode2 from the GlassFish install location /opt/glassfish4: lib/nadmin start-local-instance --node node2 --sync normal i2
If I run this command on the node 2 machine I get:
./nadmin start-local-instance --node node2 --sync normal i2
Previous synchronization failed at Sep 10, 2016 12:25:27 PM
Will perform full synchronization.
Removing all cached state for instance i2.
Enter admin user name> admin
Enter admin password for user "admin">
CLI802 Synchronization failed for directory config, caused by:
remote failure: SynchronizeFiles: Exception reading request
Command start-local-instance failed.
Any idea what to try next?
Update:
The DAS is reachable, and SSH is working properly (ping-node-ssh works from the DAS).
What I have noticed is that even after installing with install-node-ssh and creating with create-node-ssh, node 2 has no files inside.
At /glassfish4/glassfish/nodes/node2/i2 there is only one file, .syncstate, which is empty. The node2/i2 directories are there, but there is nothing in i2. Maybe this is due to: Removing all cached state for instance i2.
This is what I got in the DAS logs:
[2016-09-10T19:31:14.806+0000] [glassfish 4.1] [WARNING] [] [javax.enterprise.system.core] [tid: _ThreadID=106 _ThreadName=admin- listener(5)] [timeMillis: 1473535874806] [levelValue: 900] [[
Could not start instance i2 on node node2 (gfNode2).: Command ' /opt/glassfish4/glassfish/lib/nadmin --_auxinput - --interactive=false start-local-instance --node node2 --sync normal i2' failed on node node2 (gfNode2): Previous synchronization failed at Sep 10, 2016 12:25:27 PM
Will perform full synchronization.
Removing all cached state for instance i2.
Command start-local-instance failed.
CLI802 Synchronization failed for directory config, caused by:
remote failure: SynchronizeFiles: Exception reading request: To complete this operation run the following command locally on host gfNode2 from the GlassFish install location /opt/glassfish4:
lib/nadmin start-local-instance --node node2 --sync normal i2]]
[2016-09-10T19:31:14.818+0000] [glassfish 4.1] [SEVERE] [] [org.glassfish.admingui] [tid: _ThreadID=102 _ThreadName=admin-listener(1)] [timeMillis: 1473535874818] [levelValue: 1000] [[
RestResponse.getResponse() gives FAILURE. endpoint = 'https://localhost:4848/management/domain/servers/server/i2/start-instance'; attrs = '{}']]
[2016-09-10T19:31:14.820+0000] [glassfish 4.1] [SEVERE] [] [org.glassfish.admingui] [tid: _ThreadID=102 _ThreadName=admin-listener(1)] [timeMillis: 1473535874820] [levelValue: 1000] [[
Error in instanceAction ;
endpoint=https://localhost:4848/management/domain/servers/server/i2/start-instance;attrsMap=null]]
If I try to run the command from node2, I get what is shown in the first code block of the post...
The problem here is that the remote instance i2 can't communicate with the DAS to download its configuration.
You will need to verify:
Is the DAS online?
Is the server where the DAS runs reachable from the remote node?
Is SSH communication working properly? (Use the asadmin command ping-node-ssh.)
If you open the server.log file for the instance and the one on the DAS, they should give you a more detailed error message and indicate whether or not the request is reaching the DAS.
The instance logs are located in:
$GLASSFISH_HOME/glassfish/nodes/node2/i2/logs/server.log
The domain logs are located in:
$GLASSFISH_HOME/glassfish/domains/domain1/logs/server.log
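Concretely, on your setup that would be something like:
# from the DAS host: verify SSH connectivity to the node
asadmin ping-node-ssh node2
# on gfNode2: watch the instance log (once it exists) while retrying the start
tail -f /opt/glassfish4/glassfish/nodes/node2/i2/logs/server.log
# on the DAS host: watch the domain log at the same time
tail -f /opt/glassfish4/glassfish/domains/domain1/logs/server.log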