Datadog trace errors while trying to enable APM - amazon-eks

I am trying to enable APM for my application . In my deployment.yaml, I have set the flag : DD_APM_ENABLED to true and am installing the dd-java-agent.jar in my dockerfile . In spite of that after deployment , I see that in datadog APM Service for my service is not enabled . And I keep getting the below errors :
[dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 81 traces to the DD agent. java.net.SocketTimeoutException: timeout (Will not log errors for 5 minutes)
[dd.trace 2023-02-16 ] [OkHttp http://10.184.117.129:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile Status: 503 Service Unavailable
Any suggestions as to what I am doing wrong here ?

Related

X-Ray Daemon don't receive any data from envoy

I have a service running a task definition with three containers:
service itself
envoy
x-ray daemon
And I want to trace and monitor my services interacting with each other with x-ray.
But I don't see any data in x-ray.
I can see the request logs and everything in the envoy logs but there are no error messages about missing connection to the x-ray daemon.
Envoy container has three env variables:
APPMESH_VIRTUAL_NODE_NAME = mesh/mesh-name/virtualNode/service-virtual-node
ENABLE_ENVOY_XRAY_TRACING = 1
ENVOY_LOG_LEVEL = trace
The x-ray daemon is pretty plain and has just a name and an image (amazon/aws-xray-daemon:1).
But when looking in the logs of the x-ray dameon, there is only the following:
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Initializing AWS X-Ray daemon 3.0.0
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Using buffer memory limit of 76 MB
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] 1216 segment buffers allocated
2022-05-31T14:48:05.051+02:00 2022-05-31T12:48:05Z [Info] Using region: eu-central-1
2022-05-31T14:48:05.788+02:00 2022-05-31T12:48:05Z [Error] Get instance id metadata failed: RequestError: send request failed
2022-05-31T14:48:05.788+02:00 caused by: Get http://169.254.169.254/latest/meta-data/instance-id: dial tcp xxx.xxx.xxx.254:80: connect: invalid argument
2022-05-31T14:48:05.789+02:00 2022-05-31T12:48:05Z [Info] Starting proxy http server on 127.0.0.1:2000
As far as I read, the error you can see in these logs doesn't affect the functionality (https://repost.aws/questions/QUr6JJxyeLRUK5M4tadg944w).
I'm pretty sure I'm missing a configuration or access right.
It's running already on staging but I set this up several weeks ago and I don't find any differences between the configurations.
Thanks in advance!
In my case, I made a copy-paste mistake by copying trailing line break into the name of the environment variable ENABLE_ENVOY_XRAY_TRACING which wasn't visible in the overview and only inside the text field.

neutron-linuxbridge-agent oslo_service.service amqp.exceptions.InternalError: Connection.open: (541) INTERNAL_ERROR

Openstack Train version's neutron-linuxbridge-agent component's log show error:
2022-03-17 14:38:36.727 6 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/amqp/connection.py", line 648, in _on_close
2022-03-17 14:38:36.727 6 ERROR oslo_service.service (class_id, method_id), ConnectionError)
2022-03-17 14:38:36.727 6 ERROR oslo_service.service amqp.exceptions.InternalError: Connection.open: (541) INTERNAL_ERROR - access to vhost '/' refused for user 'openstack': vhost '/' is down
2022-03-17 14:38:36.727 6 ERROR oslo_service.service
2022-03-17 14:38:36.729 6 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.
docker logs neutron_linuxbridge_agent get:
++ /usr/bin/update-alternatives --query iptables
update-alternatives: error: no alternatives for iptables
++ . /usr/local/bin/kolla_neutron_extend_start
+ echo 'Running command: '\''neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'\'''
+ exec neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini
Running command: 'neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'
All openstack network agent list show state are UP, but Alive are XXX.
What's the problem with my cluster, and how could I fixed that? Thanks a lot.
The key server is rabbitmq reference of amqp.exceptions.InternalError, and the rabbit#node-3.log shows:
2022-03-18 06:50:35.270 [error] <0.21119.0> Error on AMQP connection <0.21119.0> (1.1.1.2:12345 -> 1.1.1.3:55672 - neutron-linuxbridge-agent:7:11111111-1111-1111-1111-111111111111, vhost: 'none', user: 'openstack', state: opening), channel 0:
{handshake_error,opening,
{amqp_error,internal_error,
"access to vhost '/' refused for user 'openstack': vhost '/' is down",
'connection.open'}}
While check and login the rabbitmq server site(http://1.1.1.3:15672/), I get this error tip:
rabbitmq virtual host experienced an error on node and may be inaccessible
Solve it by:
1, come in the rabbitmq container, and remove or move out recovery.dets file in directory /var/lib/rabbitmq/mnesia/rabbit#node-3/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L.
2, restart rabbitmq container.
Because of:
In RabbitMQ versions starting with 3.7.0 all messages data is combined in the msg_stores/vhosts directory and stored in a subdirectory per vhost. Each vhost directory is named with a hash and contains a .vhost file with the vhost name, so a specific vhost's message set can be backed up separately.
In RabbitMQ versions prior to 3.7.0 messages are stored in several directories under the node data directory: queues, msg_store_persistent and msg_store_transient. Also there is a recovery.dets file which contains recovery metadata if the node was stopped gracefully.
My whole cluster was reboot by accident, it was recoveried by this method.
if you wanna fix your problem easily please deploy your Rabbimq again with Kolla-ansible.
kolla-ansible -i <INVENTORY> deploy -t rabbitmq -vvvv
it's my experience that the easiest way with the lowest cost of fixing Rabbimq or oslo problem in OpenStack is to redeploy Rabbitmq and invest your time.

Unable to start second rabbitmq node on single Windows host

I am trying to run two rabbitmq nodes on a single windows host. My end goal is to run two rabbitmq services.
currently, I have the following commands for the second node in rabbitmq-env-conf.bat :
set RABBITMQ_CONFIG_FILE=%APPDATA%\RabbitMQ\rabbitmq.conf
set RABBITMQ_NODENAME=rabbit2#hostname
set RABBITMQ_DIST_PORT=5673
set RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]"
Running .\rabbitmq-server.bat start produces the following error :
. . .
Starting broker...Logger - error: {removed_failing_handler,rabbit_log}
BOOT FAILED
===========
Error during startup: {error,
{rabbitmq_management,
{bad_return,
{{rabbit_mgmt_app,start,[normal,[]]},
{'EXIT',
{{could_not_start_listener,
[{cowboy_opts,[{sendfile,false}]},{port,15672}],
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,
{acceptor,{0,0,0,0,0,0,0,0},15672},
eaddrinuse}}}},
. . .
From the log :
Application rabbitmq_management exited with reason: {{could_not_start_listener,[{cowboy_opts,[{sendfile,false}]},{port,15672}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,{acceptor,{0,0,0,0,0,0,0,0},15672},eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_management_tcp,[{cowboy_opts,[{sendfile,false}]},{port,15672}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[],[],rabbit_mgmt_wm_static,{priv_file,rabbitmq_management,"www/index.html"}},{[<<"api">>,<<"overview">>],[],rabbit_mgmt_wm_overview,...},...]}],...},...]}}
It looks like I am unable to setup the rabbitmq management port successfully despite suppling starting args.
15672 is the first rabbitmq's management port number and I am not sure why this number is being picked up.
Some troubleshooting pointers will be appreciated.

Spinnaker: URLRedirection cannot recognize subpath

I know its experimental, I am trying to setup use docker-compose to build Spinnaker. I am seeing an error when trying to browse localhost:9000. Its trying to redirect to this page.
http://localhost:8084/auth/redirectto=http%3A%2F%2Flocalhost%3A9000%2F%23%2Finfrastructure
Looks like either its a fiat or gate issue. Tried adding proxy to apache2.
Errors in Fiat:
RetrofitError: unexpected url: front50/serviceAccounts
2017-09-15 19:24:31.642 WARN 1 --- [ont50Service-10]
c.n.s.f.p.internal.Front50Service : [] Falling back to service
account cache. Cause: unexpected url: front50/serviceAccounts
2017-09-15 19:24:31.645 WARN 1 --- [ecutionAction-1]
c.n.s.fiat.roles.UserRolesSyncer : [] User permission sync
failed. Server status is DOWN. Trying again in 10000 ms. Cause:
(Provider: DefaultServiceAccountProvider) retrofit.RetrofitError:
unexpected url: front50/serviceAccounts
Errors in gate:
2017-09-15 19:18:19.386 ERROR 1 --- [ost-startStop-1]
o.s.b.b.PropertiesConfigurationFactory : Properties configuration
failed validation
2017-09-15 19:18:19.394 ERROR 1 --- [ost-startStop-1]
o.s.b.b.PropertiesConfigurationFactory : Field error in object
'target' on field 'services[ORCA_HOST]': rejected value [orca]; codes
For errors in fiat, you can configure the environment variable in docker compose to
environment:
- "SERVICES_CLOUDDRIVER_BASEURL=http://clouddriver:7002"
- "SERVICES_FRONT50_BASEURL=http://front50:8080"
Update: This workaround work for Gate
environment:
- "services.clouddriver.host=clouddriver"
- "services.echo.host=echo"
- "services.front50.host=front50"
- "services.igor.host=igor"
- "services.orca.host=orca"
- "services.rosco.host=rosco"

DataStax Lifecycle Manager Failing to Install New Cluster

We are using Lifecycle Manager to distribute DSE to a new Cass cluster. The install fails no matter what is tried. The log shows the following where I edited ip addresses.
Meld failed on name="node1_name" ssh-management-address="node1_ip" node-id="37292a99-f324-4967-803c-d9c50ab0b87b" job-id="d413b5ba-ba87-4de4-bac7-2efac334d2d4" stdout="503 Server Error: Connect failed for url: http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status event-resource=http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status2017-07-10 15:53:43,925 - opsc-meld - ERROR - 503 Server Error: Connect failed for url: http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status event-resource=http://opscenter:8888/api/v1/lcm/internal/nodes/37292a99-f324-4967-803c-d9c50ab0b87b/status " stderr=""