DataStax Lifecycle Manager Failing to Install New Cluster

We are using Lifecycle Manager to distribute DSE to a new Cassandra cluster. The install fails no matter what we try. The log shows the following (IP addresses redacted):
Meld failed on name="node1_name" ssh-management-address="node1_ip" node-id="37292a99-f324-4967-803c-d9c50ab0b87b" job-id="d413b5ba-ba87-4de4-bac7-2efac334d2d4"
stdout="503 Server Error: Connect failed for url: http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status event-resource=http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status
2017-07-10 15:53:43,925 - opsc-meld - ERROR - 503 Server Error: Connect failed for url: http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status event-resource=http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status"
stderr=""
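The stdout above shows the node-side installer failing to call back to OpsCenter on port 8888 to report status. A basic first check, run from the failing node, is whether the opscenter hostname from the log resolves there and whether the port is reachable (hostname and port are taken from the log and may differ in your LCM job definition):

# Run from the failing node: does the OpsCenter hostname resolve, and is port 8888 reachable?
getent hosts opscenter
curl -v --max-time 10 http://opscenter:8888/
# A connection failure or timeout here would explain the 503 "Connect failed" in the meld log.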

Related

Datadog trace errors while trying to enable APM

I am trying to enable APM for my application. In my deployment.yaml I have set the flag DD_APM_ENABLED to true, and I install dd-java-agent.jar in my Dockerfile. In spite of that, after deployment I see in Datadog that APM is not enabled for my service, and I keep getting the errors below:
[dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 81 traces to the DD agent. java.net.SocketTimeoutException: timeout (Will not log errors for 5 minutes)
[dd.trace 2023-02-16 ] [OkHttp http://10.184.117.129:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile Status: 503 Service Unavailable
Any suggestions as to what I am doing wrong here?
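One thing worth verifying is whether the pod can actually reach the trace agent at the address the tracer is using (10.184.117.129:8126 in the log above). A rough check under the assumption that curl exists in the image (the pod name below is a placeholder):

# From inside the application pod, probe the trace agent address seen in the log
kubectl exec -it my-app-pod -- curl -sv --max-time 5 http://10.184.117.129:8126/info
# A timeout here suggests DD_AGENT_HOST points at the wrong address, or the agent's
# trace port is not exposed where the tracer expects it, which matches the
# SocketTimeoutException being logged.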

Error while running query on Impala with Superset

I'm trying to connect Impala to Superset. When I test the connection it prints "Seems OK!", and when I browse the Impala databases in the SQL Editor's left-hand panel it shows all databases without problems.
Preview of Databases/Tables
But when I write a query and click "Run Query", it gives the error: "Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Ticket expired)'"
Error running query
I'm running Superset with SSL in production mode (with Gunicorn), and Impala with SSL in a Kerberized Hadoop cluster. My Impala database config is:
Impala Config
And in the extras I put:
{
  "metadata_params": {},
  "engine_params": {
    "connect_args": {
      "port": 21050,
      "use_ssl": "True",
      "ca_cert": "path/to/my/ca_cert.pem",
      "auth_mechanism": "GSSAPI"
    }
  },
  "metadata_cache_timeout": {},
  "schemas_allowed_for_csv_upload": []
}
How can I solve this error? My Superset log only shows:
Triggering query_id: 65
INFO:superset.views.core:Triggering query_id: 65
Query 65: Running query on a Celery worker
INFO:superset.views.core:Query 65: Running query on a Celery worker
Versions: Superset 0.36.0, Impyla 0.16.2
I was able to fix this error with these steps:
1 - Created a service user for the celery worker, obtained a Kerberos ticket for it, and created a crontab entry to renew the ticket (see the sketch below).
2 - Ran the celery worker as this service user instead of running it as root.
3 - Killed a celery worker that was running on another machine of my cluster.
4 - Restarted Impala and Superset.
I think this error occurred because some queries, instead of using the celery worker on my Superset machine, were routed to the celery worker on another machine that did not have a valid Kerberos ticket. I tracked it down by reading the celery worker log, which showed a failed connection to the worker on the other machine while a query was running.
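For reference, a minimal sketch of step 1, assuming the service user is named superset-celery and a keytab has been exported for it (the principal, realm, and keytab path are placeholders):

# Obtain the initial ticket for the celery worker's service user
kinit -kt /etc/security/keytabs/superset-celery.keytab superset-celery@EXAMPLE.COM
# Crontab entry for that user (added with crontab -e): renew the ticket every 8 hours
# so it never reaches the "Ticket expired" state seen in the Superset error.
0 */8 * * * kinit -kt /etc/security/keytabs/superset-celery.keytab superset-celery@EXAMPLE.COM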

Puppet agent is not running successfully after updating ssl certs

I am running Puppet 3.7. My certs are expiring, so I have updated them (after creating a backup so I can get back to the original state, and that's fine). After updating the certs on the puppetmaster using this, on the agent using this, and on PuppetDB using this, I am unable to run the puppet agent successfully on a client box. It gives me the following error:
root@ip-10-181-36:/var/lib/puppet# sudo puppet agent -t
Warning: Setting templatedir is deprecated. See http://links.puppetlabs.com/env-settings-deprecations
(at /usr/lib/ruby/vendor_ruby/puppet/settings.rb:1139:in 'issue_deprecation_warning')
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: Error 403 on SERVER: Forbidden request: newer-generic-host(127.0.0.1) access to /node/ip-10-181-36 [find] authenticated at :39
Error: Could not retrieve catalog from remote server: Error 403 on SERVER: Forbidden request: newer-generic-host(127.0.0.1) access to /catalog/ip-10-181-36 [find] authenticated at :1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Error 403 on SERVER: Forbidden request: newer-generic-host(127.0.0.1) access to /report/ip-10-181-36 [save] authenticated at :91
I am stuck at this point, and no amount of googling, reading docs, or checking logs is helping. Does anyone have any ideas?
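Not a definitive fix, but in Puppet 3.x these 403 "Forbidden request" errors often mean the master no longer matches the agent's certificate name against its auth.conf rules after a cert swap. One hedged way to rule that out is to regenerate the agent's cert against the new CA (the certname ip-10-181-36 is taken from the error output and may differ from your actual certname; you already have backups, so the removal below is recoverable):

# On the agent: remove the old SSL state so a fresh CSR is generated
sudo rm -rf /var/lib/puppet/ssl
# On the master: clean any stale cert for this node
sudo puppet cert clean ip-10-181-36
# On the agent: request a new cert
sudo puppet agent -t
# On the master: sign the new request
sudo puppet cert sign ip-10-181-36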

Unable to authenticate cluster user: HORNETQ.CLUSTER.ADMIN.USER

I am getting the following error in WildFly 8.2.0.Final domain mode. The error appears when I start the second node.
ERROR [org.hornetq.core.server] (default I/O-1) HQ224018: Failed to create session: HornetQClusterSecurityException[errorType=CLUSTER_SECURITY_EXCEPTION message=HQ119099: Unable to authenticate cluster user: HORNETQ.CLUSTER.ADMIN.USER]
This was fixed by adding ${jboss.messaging.cluster.password:#password#} as the cluster password in the messaging subsystem in standalone.xml.
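For that expression to resolve to a real secret, the jboss.messaging.cluster.password system property has to be set (otherwise the #password# default is used), and it must be the same on every node in the cluster. A rough sketch of one way to set it via the WildFly CLI (the value is a placeholder):

# Define the system property that the ${jboss.messaging.cluster.password:...}
# expression resolves against on the running server / domain controller.
$JBOSS_HOME/bin/jboss-cli.sh --connect "/system-property=jboss.messaging.cluster.password:add(value=changeme)"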

RabbitMQ Management : webmachine error: path="/api/overview"

After I log in to RabbitMQ, I get the following error:
Got response code 500 with body
Internal Server Error
The server encountered an error while processing this request:
{error,{error,{badmatch,{error,nxdomain}},
[{rabbit_nodes,cluster_name_default,0},
{rabbit_nodes,cluster_name,0},
{rabbit_mgmt_wm_overview,to_json,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3},
{webmachine_decision_core,resource_call,1},
{webmachine_decision_core,decision,1},
{webmachine_decision_core,handle_request,2}]}}
I see the following error in the log file in /var/log/rabbitmq:
=ERROR REPORT==== 31-Oct-2014::06:20:40 ===
webmachine error: path="/api/overview"
{error,{error,{badmatch,{error,nxdomain}},
[{rabbit_nodes,cluster_name_default,0},
{rabbit_nodes,cluster_name,0},
{rabbit_mgmt_wm_overview,to_json,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3},
{webmachine_decision_core,resource_call,1},
{webmachine_decision_core,decision,1},
{webmachine_decision_core,handle_request,2}]}}
The workers are able to connect to the broker and are receiving messages, and the New Relic plugin for RabbitMQ seems to be working fine. However, I am unable to log in through the management plugin. Any pointers would be helpful.
I had updated the hostname of the system, and that was causing the issue. See the link below:
https://groups.google.com/forum/#!msg/rabbitmq-users/9P-BAwGVHJU/fwOpZPJywwYJ
I added 127.0.0.1 <hostname> to /etc/hosts (see the sketch after the diagnostics output below). That solved the management plugin problem. However, rabbitmqctl still showed the following error; restarting RabbitMQ solved the rabbitmqctl problem as well.
Listing queues ...
Error: unable to connect to node 'rabbit@<hostname>': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@<hostname>']
rabbit@<hostname>:
* connected to epmd (port 4369) on <hostname>
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: <nodename>
- home dir: <homedir>
- cookie hash: <cookiehash>
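For completeness, a rough sketch of the fix described above (the hostname is whatever the hostname command returns on the box; adjust the restart command to your init system):

# Map the machine's hostname to the loopback address so RabbitMQ can resolve it
echo "127.0.0.1 $(hostname)" | sudo tee -a /etc/hosts
# Restart RabbitMQ so both the management plugin and rabbitmqctl pick up the change
sudo service rabbitmq-server restart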