Cloudera service monitor unable to start - apache

I get the following error when restarting the Cloudera Management Service in a Docker container (quickstart:latest). I restarted it after an error showed that the Service Monitor was not running:
Mar 15, 8:45:43.760 AM ERROR com.cloudera.cmon.firehose.Main
Failed to start Firehose
java.io.IOException: Unknown version of the versioned LevelDB store.
at com.cloudera.cmon.tstore.leveldb.LDBUtils.openVersionedDB(LDBUtils.java:253)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.<init>(LDBPartitionMetadataStore.java:139)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.<init>(LDBPartitionMetadataStore.java:133)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.createInPartitionMetadataSubdirectory(LDBPartitionMetadataStore.java:119)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.createLDBPartitionManager(LDBPartitionManager.java:193)
at com.cloudera.cmon.firehose.LDBWorkDetailsTable.<init>(LDBWorkDetailsTable.java:90)
at com.cloudera.cmon.firehose.LDBWorkDetailsStore.<init>(LDBWorkDetailsStore.java:67)
at com.cloudera.cmon.firehose.LDBWorkStoreFactory.createYarnWorkDetailsStore(LDBWorkStoreFactory.java:139)
at com.cloudera.cmon.firehose.Firehose.<init>(Firehose.java:222)
at com.cloudera.cmon.firehose.Main.main(Main.java:515)
The following is also shown in the cloudera.quickstart dashboard:
Unable to issue query: the Service Monitor is not running
This is a common error in Cloudera Docker containers booted on a single node.

I solved it by removing the old /var/lib/cloudera-service-monitor.
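A more cautious variant of the fix above is to move the directory aside rather than delete it, so the old store can be restored if needed. The following is only a sketch: the function name `move_aside` and the `.bak-<timestamp>` suffix are illustrative choices, and it assumes (as the fix above implies) that the Service Monitor can start without the old store in place.

```python
import shutil
import time
from pathlib import Path


def move_aside(store_dir: str) -> str:
    """Rename store_dir to '<name>.bak-<timestamp>' and return the new path."""
    src = Path(store_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{store_dir} is not a directory")
    # Hypothetical naming scheme; any unused sibling name would do.
    backup = src.with_name(f"{src.name}.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    shutil.move(str(src), str(backup))
    return str(backup)
```

Inside the container this would be called as `move_aside("/var/lib/cloudera-service-monitor")` with the Service Monitor stopped, followed by a restart of the management service.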

Getting AccessDenied error while upgrading EKS cluster

I am trying to upgrade my EKS cluster from 1.15 to 1.16 using the same CI pipeline that created the cluster, so the credentials themselves should not be the issue. However, I am receiving an AccessDenied error. I am using the eksctl upgrade cluster command to upgrade the cluster.
info: cluster test-cluster exists, will upgrade it
[ℹ] eksctl version 0.33.0
[ℹ] using region us-east-1
[!] NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
[ℹ] will upgrade cluster "test-cluster" control plane from current version "1.15" to "1.16"
Error: AccessDeniedException:
status code: 403, request id: 1a02b0fd-dca5-4e54-9950-da29cac2cea9
My eksctl version is 0.33.0.
I am not sure why the same CI pipeline that created the cluster now throws an AccessDenied error when trying to upgrade it. Are there any permissions I need to add to the IAM policy for the user? I can't find anything in the prerequisites document, so please let me know what I am missing here.
I figured out that the error was due to a missing IAM permission.
I used --verbose 5 to diagnose the issue.
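Once the verbose output names the failing API call, the pipeline user's policy can be compared against the actions the upgrade needs. The sketch below is a rough check under loud assumptions: the answer does not say which permission was missing, so `eks:UpdateClusterVersion` in the example is only an illustrative guess, and the helper ignores Deny statements, NotAction, Resource and Condition elements.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single item or a list in Statement and Action."""
    return value if isinstance(value, list) else [value]


def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow statement in the policy matches."""
    allowed = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") == "Allow" and "Action" in stmt:
            allowed.extend(a.lower() for a in _as_list(stmt["Action"]))
    # Action names are compared case-insensitively; '*' and '?' act as wildcards.
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]


# Hypothetical policy attached to the CI user.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["eks:CreateCluster", "eks:Describe*"], "Resource": "*"}
    ],
}
```

With this example policy, `missing_actions(example_policy, ["eks:DescribeCluster", "eks:UpdateClusterVersion"])` reports only the update action as missing, which is the kind of gap that lets cluster creation succeed while an upgrade is denied.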

PCF Dev RabbitMQ Management Dashboard not installed

On a fresh install of PCF Dev, after logging in:
cf create-service p-rabbitmq standard my_rabbitmq
Showing info of service my_rabbitmq in org system / space system as admin...
name: my_rabbitmq
service: p-rabbitmq
bound apps:
tags:
plan: standard
description: RabbitMQ is a robust and scalable high-performance multi-protocol messaging broker.
documentation:
dashboard: https://rabbitmq-management.local.pcfdev.io/#/login/mu-3ea7453d-7cff-44bd-b7a7-ce0290d9b4d6-v509qlipcuhnu6relaeuat49ca/25849257198976921347999121731293969259
Showing status of last operation from service my_rabbitmq...
status: create succeeded
message:
started: 2018-01-05T01:48:17Z
updated: 2018-01-05T01:48:17Z
When I go to the management dashboard URL, the browser displays the following error:
404 Not Found: Requested route ('rabbitmq-management.local.pcfdev.io')
does not exist.
How can I get the RabbitMQ Management Dashboard to be installed and respond?
The only fix I found was to install an earlier version of PCF Dev, which solved the issue.
For more details, check this thread on GitHub.

ActiveMQ 5.15 HTTP ERROR: 503

Run environment: Linux (CentOS 7), JDK 1.8, and ActiveMQ 5.15
I started ActiveMQ and then visited the management page in Chrome. When I try to log in with the default username and password, I get the following error:
HTTP ERROR: 503
Problem accessing /admin/. Reason:
Service Unavailable Powered by Jetty://
How can I resolve this problem?
I was getting this same error. It turns out that I had run it as root user originally, then later I stopped it and ran it as a non-root user. Certain data files that had been created and owned by the original root instance were not accessible to the non-root user.
Check the ownership of the files, and change them if necessary to match the user that the broker is running as.
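The ownership check described above can be sketched as a small helper that walks a directory and lists everything not owned by the user the broker runs as. The function name `files_not_owned_by` is hypothetical, and which directory to scan depends on your installation.

```python
import os


def files_not_owned_by(root: str, uid: int) -> list:
    """Return paths under root (root included) whose owner is not uid."""
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

For example, `files_not_owned_by("/opt/apache-activemq-5.15.0/data", os.getuid())`, run as the broker user, would list the files left behind by an earlier root-owned run; the path here is an assumption based on a default extraction under /opt. Anything it reports is a candidate for chown.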
Had the same issue.
Maybe something went wrong during the extraction of the package.
I downloaded this:
wget https://archive.apache.org/dist/activemq/5.15.0/apache-activemq-5.15.0-bin.tar.gz
and extracted it with:
sudo tar -zxvf apache-activemq-5.15.0-bin.tar.gz -C /opt
then it worked for me.
My two cents:
I started with the activemq package from the Ubuntu repo, but later changed to the binary package from the official website.
In my case, the repo version left behind an /etc/default/activemq config file, which runs activemq as the user "activemq". It turned out that in previous experiments I had not killed the old processes running under "activemq" before starting activemq under my own user name. There were two activemq processes running under different user names, and connecting to the admin console gave me a 503.
I deleted the /etc/default/activemq file, killed all activemq processes running under "activemq", and restarted activemq under my own user name. The 503 was gone.

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload the deploy image to S3 from CircleCI, but I am unable to deploy it from S3 to the EC2 instance. Here's the error:
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error was reported by CodeDeploy. I can't figure out why it happens or how to fix it.
I'd appreciate any advice.
If you are running on Ubuntu, there can be many reasons for this. Here is a checklist you can verify:
Check that the CodeDeploy agent is installed on your EC2 instance. Refer to this document to install the CodeDeploy agent:
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
If you are running Ubuntu release 20.x and you get this error:
./install:22:in `block in method_missing': undefined method `path' for #<IO:> (NoMethodError)
try running the install file this way:
sudo ./install auto > /tmp/logfile
Check that the instance has an EC2 instance role for CodeDeploy: create a CodeDeploy role and assign it to the instance, see https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
If you assign the EC2 role after the instance has launched, restart the server.
Check the placement of your appspec.yml file as described in the top answer, and avoid long timeouts in it.
Log into your instance and check the error log:
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
This error is commonly caused by problems in the configuration of the appspec.yml or appspec.json file (depending on the format you are using).
If you have any hooks, I recommend removing them and checking whether the deployment works. Then add the hooks back one by one to identify the one causing the error.
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
    │-- install_dependencies
    │-- start_server
    └-- stop_server
In the scripts folder, place the scripts that you want executed for each hook.
Here is an example of the appspec.yml file:
version: 0.0
os: linux
files:
  - source: /index.html
    destination: /var/www/html/
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
I hope I can help you 😃👻🕺🏾
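Before uploading a revision, the layout described above can be checked locally. The sketch below is a naive check, not a YAML parser: the function name `check_revision` is hypothetical, it only looks for unquoted `location:` entries, and it assumes the revision is an unpacked directory on disk.

```python
import os
import re


def check_revision(root: str) -> list:
    """Return a list of problems found in a CodeDeploy revision directory."""
    appspec = os.path.join(root, "appspec.yml")
    if not os.path.isfile(appspec):
        return ["appspec.yml is not at the root of the revision"]
    problems = []
    with open(appspec, encoding="utf-8") as fh:
        for line in fh:
            # Naive match for hook entries such as "- location: scripts/start_server".
            match = re.match(r"\s*-?\s*location:\s*(\S+)", line)
            if match and not os.path.isfile(os.path.join(root, match.group(1))):
                problems.append(f"hook script not found: {match.group(1)}")
    return problems
```

An empty list from `check_revision("path/to/revision")` means the file is at the root and every referenced hook script exists; it says nothing about whether the scripts themselves succeed on the instance.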
Make sure the CodeDeploy host agent service is running on your target EC2 instance.
The error you are facing is a generic message thrown when any lifecycle event fails, which could be BeforeBlockTraffic, BlockTraffic, ApplicationStop, etc.
If the first event (BeforeBlockTraffic) failed, the first step is to check whether the CodeDeploy agent is running.
The failure message for the event tells you the exact underlying error.
From the failed deployments, I can see all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in Stopped state but I can see the IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile [1].
For such failures, you can always begin with a general troubleshooting checklist for a failed deployment [2] and then look for troubleshooting guides on deployment issues and instance issues [3].
[1] http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html
[2] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html
[3] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html
Check the status of the CodeDeploy agent. In my case, the agent wasn't up.
Check the role given to the EC2 machine (where the agent is running). It should have S3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
To resolve this error, place your appspec.yml file in the root folder of your revision, so that the agent can find it and access your before and after scripts.

elasticsearch-mesos not getting listed under frameworks of mesosUI

I am trying to run elasticsearch-mesos on Mesos. My machine is running Ubuntu 14.04. I have a running Mesos cluster installed from the Mesosphere packages by following these instructions. When I run the test frameworks, they get listed under Frameworks in the Mesos web UI, but elasticsearch-mesos does not. I want to run elasticsearch-mesos on top of Mesos, and I followed the instructions given here. When I run ./elasticsearch-mesos I get this message in the terminal:
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I tried running ./elasticsearch-mesos on both mesos masters and slaves.
The last few lines of the terminal output are given below:
2015-01-08 17:24:01,881:23844(0x7f175bfff700):ZOO_INFO#zookeeper_init#786: Initiating
client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f1762a3e6a0
sessionId=0 sessionPasswd=<null> context=0x7f1710002530 flags=0
I0108 17:24:01.881392 23858 sched.cpp:137] Version: 0.21.1
2015-01-08 17:24:01,881:23844(0x7f172b7fe700):ZOO_INFO#check_events#1703: initiated
connection to server [127.0.0.1:2181]
2015-01-08 17:24:01,897:23844(0x7f172b7fe700):ZOO_INFO#check_events#1750: session
establishment complete on server [127.0.0.1:2181], sessionId=0x14ac7c469270006,
negotiated timeout=10000
I0108 17:24:01.898455 23861 group.cpp:313] Group process (group(1)#127.0.1.1:38668)
connected to ZooKeeper
I0108 17:24:01.898509 23861 group.cpp:790] Syncing group operations: queue size (joins,
cancels, datas) = (0, 0, 0)
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
According to the README at https://github.com/mesosphere/elasticsearch-mesos,
you may need to modify mesos.master.url to point to the same ZK url that the Mesos master is using (maybe not localhost). If you're using a single-master Mesos cluster, you can skip the ZK url and point this parameter directly to the Mesos master.
Please also note that the elasticsearch framework is a bit outdated, so use with caution
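The log above shows the framework connecting to ZooKeeper at localhost:2181, which is what the answer suggests checking. As a rough sketch, a zk:// master URL of the form `zk://host1:port1,host2:port2/path` can be split into hosts and each host tested for being a loopback address. The helper names `zk_hosts` and `is_loopback` are hypothetical, and IPv6 literals are not handled.

```python
import ipaddress


def zk_hosts(master_url: str) -> list:
    """Split 'zk://host1:port1,host2:port2/path' into its host names."""
    if not master_url.startswith("zk://"):
        raise ValueError("expected a zk:// URL")
    host_part = master_url[len("zk://"):].split("/", 1)[0]
    # Drop optional 'user:password@' credentials, then each ':port' suffix.
    host_part = host_part.rsplit("@", 1)[-1]
    return [entry.rsplit(":", 1)[0] for entry in host_part.split(",") if entry]


def is_loopback(host: str) -> bool:
    """True for 'localhost' and for loopback IP addresses such as 127.0.0.1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a host name other than localhost
```

If every host in the URL used by the framework is a loopback address while the Mesos masters register with a ZooKeeper ensemble on other machines, the framework and the masters are likely looking at different ZooKeeper instances.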