DCOS 1.8 cluster built on IBM z (s390x) platform does not show stats on DCOS-UI

We are in the process of porting DCOS to the IBM z platform (s390x architecture). So far we have built v1.8 from source and installed a cluster with 3 masters and 2 agents.
Marathon is accessible on port 8080 and we are able to deploy applications on the cluster.
However, the DCOS UI (http://<master ip>/) does not display any statistics about the cluster. For example, the Dashboard shows 0% for all three parameters: 'CPU Allocation', 'Memory Allocation' and 'Disk Allocation'.
The System/Components health check displays all services as Healthy except dcos-signal.
DCOS-CLI is able to get information relating to our apps in Marathon.
However, the following command fails with this error message:
dcos package list
URL [http:///package/list] is unreachable:
('Connection aborted.', BadStatusLine("''",))
We are looking for pointers to debug this issue. Could you please let us know any areas/services/configurations in the DCOS cluster that need to be checked?

Related

Repeated IBM bluemix Node Red app crashing; status 1

My Node Red application in IBM BlueMix is repeatedly crashing - once an hour - with no real error message other than "exited with status: 1."
How can I troubleshoot this issue?
Is there someone from IBM BlueMix support that monitors this that could take a look?
I looked at my logs and there's nothing in there that really says what's going on.
Edit per requests:
The regular log for "OUT/ERR" is scrolling so fast with HTTPD logs that I can't copy and paste it. When I filter to the "ERR" channel, the only thing I see is the line below. I believe this error occurs during deploy, when the application restarts.
[App/0] ERR js-bson: Failed to load c++ bson extension, using pure JS version
My Node Red application is gathering data from Wink, LIFX, and other IoT services and compiles them together into a Freeboard dashboard.
Caught crash on screenshot here -- not enough cred to post images so it'll only post as a link
The zlib error was fixed in the 0.13.2 Node-RED release (that shipped 19/02/16).
If you restage your application, it should pick up the new version of Node-RED.
You can restage the application using the cf command-line tool:
cf restage <app name>

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters run in Standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
This is what I want to achieve:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark Master URL to spark://local1:7077 or spark://remote1:7077, and then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit the job.
But I ran into a problem:
When I use the local cluster, everything goes well. Both running the code in IntelliJ IDEA and using spark-submit submit the job to the cluster, and the job finishes.
But when I use the remote cluster, I get this warning in the log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Note that it says "sufficient resources", not "sufficient memory"!
This log line keeps printing and nothing else happens. Both spark-submit and running the code in IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to a remote cluster?
If so, does it need any special configuration?
What are the possible reasons that can cause my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark Master set to the local virtual machine cluster, it works. With the remote cluster, I get the "Initial job has not accepted any resources;..." warning instead.
I want to know whether a security policy or firewall can cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least, there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads within the Spark developer community on the topic.
My answer addresses one part of your post, specifically this message:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The typical cause is a mismatch between the memory and/or number of cores your job requests and what is available on the cluster. Possibly, when you submitted from IntelliJ IDEA, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not properly match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark master web UI on port 8080 to verify that the resources you requested are actually available on the cluster.
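The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the worker dictionaries and their keys (`id`, `memory_free_mb`, `cores_free`) are hypothetical and would be filled in by hand from the figures shown in the master web UI, and the check mirrors the general idea that a worker can host an executor only if it has enough free memory and cores. It is not Spark's actual scheduling code.

```python
import re

# Multipliers for Spark-style memory suffixes, expressed in MiB.
_UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_memory_mb(value):
    """Convert a memory string such as '8g' or '512m' to MiB."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError("unrecognised memory string: %r" % value)
    amount, unit = match.groups()
    return int(amount) * _UNITS_MB[unit]


def workers_that_fit(workers, executor_memory, executor_cores):
    """Return ids of workers with enough free memory and cores for one executor.

    `workers` is a list of dicts with the hypothetical keys 'id',
    'memory_free_mb' and 'cores_free' (values copied from the web UI).
    """
    needed_mb = parse_memory_mb(executor_memory)
    return [
        w["id"]
        for w in workers
        if w["memory_free_mb"] >= needed_mb and w["cores_free"] >= executor_cores
    ]


# Hypothetical cluster: two workers, each with 4 GiB and 2 cores free.
workers = [
    {"id": "worker-1", "memory_free_mb": 4096, "cores_free": 2},
    {"id": "worker-2", "memory_free_mb": 4096, "cores_free": 2},
]

# The settings from the answer (8g, 8 cores) fit on no worker here,
# while a smaller request fits on both.
print(workers_that_fit(workers, "8g", 8))
print(workers_that_fit(workers, "2g", 2))
```

If the first list is empty for the values in your spark-defaults.conf, the job is requesting more than any single worker can offer, which is one way to end up with the "Initial job has not accepted any resources" warning.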

elasticsearch-mesos not getting listed under frameworks of mesosUI

I am trying to run elasticsearch-mesos on Mesos. My machine is running Ubuntu 14.04. I have a running Mesos cluster installed with the Mesosphere packages by following these instructions. When I run the test frameworks, they get listed under Frameworks in the Mesos web UI, but elasticsearch-mesos does not get listed there. I want to run elasticsearch-mesos on top of Mesos. I followed the instructions given here. When I run ./elasticsearch-mesos, I get this message in the terminal:
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I tried running ./elasticsearch-mesos on both mesos masters and slaves.
The last few lines of the terminal output are given below:
2015-01-08 17:24:01,881:23844(0x7f175bfff700):ZOO_INFO#zookeeper_init#786: Initiating
client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f1762a3e6a0
sessionId=0 sessionPasswd=<null> context=0x7f1710002530 flags=0
I0108 17:24:01.881392 23858 sched.cpp:137] Version: 0.21.1
2015-01-08 17:24:01,881:23844(0x7f172b7fe700):ZOO_INFO#check_events#1703: initiated
connection to server [127.0.0.1:2181]
2015-01-08 17:24:01,897:23844(0x7f172b7fe700):ZOO_INFO#check_events#1750: session
establishment complete on server [127.0.0.1:2181], sessionId=0x14ac7c469270006,
negotiated timeout=10000
I0108 17:24:01.898455 23861 group.cpp:313] Group process (group(1)#127.0.1.1:38668)
connected to ZooKeeper
I0108 17:24:01.898509 23861 group.cpp:790] Syncing group operations: queue size (joins,
cancels, datas) = (0, 0, 0)
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
According to the README at https://github.com/mesosphere/elasticsearch-mesos, you may need to modify mesos.master.url to point to the same ZK URL that the Mesos master is using (maybe not localhost). If you're using a single-master Mesos cluster, you can skip the ZK URL and point this parameter directly to the Mesos master.
Please also note that the elasticsearch framework is a bit outdated, so use it with caution.
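As a quick sanity check on the value you put in mesos.master.url, the sketch below flags ZooKeeper hosts that point at the local machine, which is what the log above shows (host=localhost:2181). It assumes the usual `zk://host1:port1,host2:port2/path` form; the function names and the example addresses are made up for illustration and are not part of elasticsearch-mesos.

```python
import ipaddress


def zk_hosts(url):
    """Split a 'zk://host1:port1,host2:port2/path' URL into host names."""
    if not url.startswith("zk://"):
        raise ValueError("not a zk:// URL: %r" % url)
    authority = url[len("zk://"):].split("/", 1)[0]
    # Drop optional "user:password@" credentials before the host list.
    authority = authority.rsplit("@", 1)[-1]
    return [entry.rsplit(":", 1)[0] for entry in authority.split(",") if entry]


def loopback_hosts(url):
    """Return the hosts in the URL that are 'localhost' or a loopback IP."""
    flagged = []
    for host in zk_hosts(url):
        if host.lower() == "localhost":
            flagged.append(host)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(host)
        except ValueError:
            pass  # an ordinary host name; it is not resolved here
    return flagged


# The URL seen in the log versus a hypothetical three-node ZooKeeper ensemble.
print(loopback_hosts("zk://localhost:2181/mesos"))
print(loopback_hosts("zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos"))
```

A non-empty result only means the URL points at the machine the framework runs on. That is fine when ZooKeeper really runs there, but it is worth comparing against the URL the Mesos master was started with.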

OpenSplice DDS across a network

I am completely new to the DDS world. I understand basic concepts like publish and subscribe, and what can be learned from the documentation. I am attempting to use OpenSplice DDS, and was able to get through the tutorial without much difficulty. However, I want to get two different computers on the same network to talk to each other, which seems like a relatively simple task, but I can find no documentation on it.
For example, in the message chat room tutorial, how would I get the message board running on one machine and the chatter on another machine?
Thanks!
Found it! http://opensplice.org/pipermail/developer/2009-July/000094.html.
To summarize from the link:
Set up your environment on node 1 by running the release file in the OSPL_HOME directory (release.bat)
Start the OpenSplice daemon on node 1 (ospl start)
Run the messageboard application on node 1
Set up your environment on node 2 by running the release file in the OSPL_HOME directory (release.bat)
Start the OpenSplice daemon on node 2 (ospl start)
Run the chatter application on node 2

Brisk TaskTracker not starting in a multi-node Brisk setup

I have a 3 node Brisk cluster (Briskv1.0_beta2). Cassandra is working fine (all three nodes see each other and data is balanced across the ring). I started the nodes with the brisk cassandra -t command. I cannot, however, run any Hive or Pig jobs. When I do, I get an exception saying that it cannot connect to the task tracker.
During the startup process, I see the following in the log:
TaskTracker.java (line 695) TaskTracker up at: localhost.localdomain/127.0.0.1:34928
A few lines later, however, I see this:
Retrying connect to server: localhost.localdomain/127.0.0.1:8012. Already tried 9 time(s).
INFO [TASK-TRACKER-INIT] RPC.java (line 321) Server at localhost.localdomain/127.0.0.1:8012 not available yet, Zzzzz...
Those lines are repeated non-stop as long as my cluster is running.
My cassandra.yaml file specifies the box IP (not 0.0.0.0 or localhost) as the listen_address, and the rpc_address is set to 0.0.0.0.
Why is the client attempting to connect to a different port than the log shows the task tracker as using? Is there anywhere these addresses/ports can be specified?
I figured this out. In case anyone else has the same issues, here's what was going on:
Brisk uses the first entry in the Cassandra cluster's seed list to pick the initial jobtracker. One of my nodes had 127.0.0.1 in the seed list. This worked for the Cassandra setup since all the other nodes in the cluster connected to that box to get the cluster topology but this didn't work for the job tracker selection.
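A small check along these lines could catch the problem before starting the nodes. It is a sketch only: it assumes you paste in the comma-separated seed string from each node's cassandra.yaml by hand, and the function names and example addresses are hypothetical. It does not reproduce Brisk's own jobtracker selection; it just flags a loopback address in first position.

```python
import ipaddress

# Host names that normally resolve to the local machine.
_LOCAL_NAMES = {"localhost", "localhost.localdomain"}


def is_loopback(host):
    """Return True for a local host name or a loopback IP such as 127.0.0.1."""
    if host.lower() in _LOCAL_NAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # some other host name; it is not resolved here


def check_first_seed(seeds):
    """Return a warning if the first seed is a loopback address, else None.

    `seeds` is the comma-separated seed string copied from cassandra.yaml.
    """
    entries = [s.strip() for s in seeds.split(",") if s.strip()]
    if entries and is_loopback(entries[0]):
        return "first seed %s is a loopback address" % entries[0]
    return None


# Hypothetical seed lists from two nodes of the same cluster.
print(check_first_seed("127.0.0.1,10.0.0.2"))
print(check_first_seed("10.0.0.1,10.0.0.2"))
```

Running the check against the seed list of every node makes it easy to spot the one node whose first seed differs from the rest.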
It looks like your jobtracker isn't running. What do you see when you run "brisktool jobtracker"?