Context deadline exceeded - selenoid

Selenoid is throwing "context deadline exceeded" after the tests have been running for a while. Initially the tests run fine, but after 15-20 minutes it starts throwing exceptions.
Selenoid and Selenoid UI are started with the following commands:
./selenoid -conf /home/ec2-user/browsers.json -limit 10 -retry-count 3 -session-attempt-timeout 4m -log-output-dir /home/ec2-user/selenoid-logs -timeout 4m -video-recorder-image "selenoid/video-recorder:latest-release"
./selenoid-ui
Error from console:
selenoid-ui
2019/08/05 18:18:30 [ERROR] [Can't get status: context deadline exceeded]
2019/08/05 18:19:07 Removed client. 0 registered clients
selenoid
2019/08/05 18:04:19 [4093] [CLIENT_DISCONNECTED] [unknown] [xx.xx.xx.xx] [Error: context canceled]
2019/08/05 18:06:44 [4102] [CLIENT_DISCONNECTED] [unknown] [xx.xx.xx.xx] [Error: context canceled]
2019/08/05 18:14:24 [4107] [SESSION_DELETED] [4c4d9e47a8853ba972476cb70ffde4a3]
2019/08/05 18:17:44 [4107] [CLIENT_DISCONNECTED] [unknown] [xx.xx.xx.xx] [Error: context canceled]
2019/08/05 18:18:35 [4105] [CLIENT_DISCONNECTED] [unknown] [xx.xx.xx.xx] [Error: dial tcp 127.0.0.1:33292: i/o timeout]
Test Infrastructure
1) AWS EC2 t2.micro 1 cpu
2) NightwatchJS 1.1.11
3) Chrome 75.0
Additional Issue
I also see no session logs saved in -log-output-dir, even though the directory has write permissions.
I tried setting the session timeout to 4m, but it is not helping.
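Not part of the original post, but one way to check whether the instance is simply saturated when the timeouts start is to poll Selenoid's /status endpoint while the suite runs. A minimal sketch, assuming Selenoid is listening on its default port 4444 on the same host:

# Print the session counters every 30 seconds; rising "queued"/"pending"
# values right before the "context deadline exceeded" errors point to
# resource exhaustion on the t2.micro rather than a client-side problem.
watch -n 30 'curl -s http://127.0.0.1:4444/status | python -m json.tool'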

Related

Datadog trace errors while trying to enable APM

I am trying to enable APM for my application. In my deployment.yaml I have set the flag DD_APM_ENABLED to true, and I install dd-java-agent.jar in my Dockerfile. In spite of that, after deployment I see in Datadog that APM is not enabled for my service, and I keep getting the errors below:
[dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 81 traces to the DD agent. java.net.SocketTimeoutException: timeout (Will not log errors for 5 minutes)
[dd.trace 2023-02-16 ] [OkHttp http://10.184.117.129:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile Status: 503 Service Unavailable
Any suggestions as to what I am doing wrong here?
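Purely as an illustration (not the poster's actual manifest): with dd-java-agent the tracer also needs to know where the node-local agent is reachable, which in Kubernetes is commonly done by pointing DD_AGENT_HOST at the node IP. A hedged sketch of the relevant env section of a deployment.yaml, where everything except the documented DD_* variables is a placeholder:

env:
  - name: DD_APM_ENABLED            # already set by the poster
    value: "true"
  - name: DD_AGENT_HOST             # lets dd-java-agent reach the node-local agent
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: DD_TRACE_AGENT_PORT       # default trace intake port
    value: "8126"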

X-Ray daemon doesn't receive any data from Envoy

I have a service running a task definition with three containers:
service itself
envoy
x-ray daemon
I want to trace and monitor how my services interact with each other using X-Ray, but I don't see any data in X-Ray.
I can see the request logs and everything else in the Envoy logs, but there are no error messages about a missing connection to the X-Ray daemon.
Envoy container has three env variables:
APPMESH_VIRTUAL_NODE_NAME = mesh/mesh-name/virtualNode/service-virtual-node
ENABLE_ENVOY_XRAY_TRACING = 1
ENVOY_LOG_LEVEL = trace
The x-ray daemon is pretty plain and has just a name and an image (amazon/aws-xray-daemon:1).
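For context, this is roughly what such a bare-bones daemon container looks like in the task definition. An illustrative sketch only: the image is the one from the post, while the container name, the essential flag, and the UDP 2000 port mapping (the port the daemon listens on for segments) are assumptions:

{
  "name": "xray-daemon",
  "image": "amazon/aws-xray-daemon:1",
  "essential": true,
  "portMappings": [
    { "containerPort": 2000, "protocol": "udp" }
  ]
}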
But when looking at the logs of the X-Ray daemon, there is only the following:
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Initializing AWS X-Ray daemon 3.0.0
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Using buffer memory limit of 76 MB
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] 1216 segment buffers allocated
2022-05-31T14:48:05.051+02:00 2022-05-31T12:48:05Z [Info] Using region: eu-central-1
2022-05-31T14:48:05.788+02:00 2022-05-31T12:48:05Z [Error] Get instance id metadata failed: RequestError: send request failed
2022-05-31T14:48:05.788+02:00 caused by: Get http://169.254.169.254/latest/meta-data/instance-id: dial tcp xxx.xxx.xxx.254:80: connect: invalid argument
2022-05-31T14:48:05.789+02:00 2022-05-31T12:48:05Z [Info] Starting proxy http server on 127.0.0.1:2000
As far as I have read, the error you can see in these logs doesn't affect the functionality (https://repost.aws/questions/QUr6JJxyeLRUK5M4tadg944w).
I'm pretty sure I'm missing a configuration setting or an access right.
The same setup is already running on staging, but I set that up several weeks ago and I can't find any differences between the configurations.
Thanks in advance!
In my case, I had made a copy-paste mistake: a trailing line break was copied into the name of the environment variable ENABLE_ENVOY_XRAY_TRACING. It wasn't visible in the overview and only showed up inside the text field.
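A quick way to surface that kind of invisible character, assuming you can exec into the Envoy container and GNU coreutils are available, is to dump the raw environment with control characters made visible (a sketch, not an official diagnostic):

# Variables are NUL-separated in /proc/1/environ; cat -A marks control
# characters, so a name that picked up a line break no longer lines up
# as a single NAME=value pair in this output.
tr '\0' '\n' < /proc/1/environ | cat -A | grep -i -A 1 XRAY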

yarn application command hangs due to absence of Kerberos ticket

Within a bash script, I am invoking the yarn application command in order to get the applications currently running on a Cloudera Hadoop cluster secured by Kerberos. If my application is not running, it needs to be restarted:
spark_rtp_app_array=( $(yarn application --list -appTypes SPARK -appStates ACCEPTED,RUNNING | awk -F "\t" ' /'my_user'/ && /'my_app'/ {print $1}') )
Whenever the Kerberos ticket has expired, I need to invoke the kinit command to renew it before calling yarn application --list:
kinit -kt my_keytab_file.keytab my_kerberos_user
Otherwise, I can end up with an authentication error that keeps repeating indefinitely, with the following traces:
19/02/13 15:00:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
19/02/13 15:00:22 WARN security.UserGroupInformation: PriviledgedActionException as:my_kerberos_user (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[...]
Is there any way of setting a maximum number of connection retries to YARN?
The bash script is executed from a cron task, so it must not hang under any circumstances.
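One pattern that avoids the hang entirely, sketched here with the keytab and principal from the question and an arbitrary 120-second limit, is to renew the ticket only when it is missing or expired and to put an upper bound on the yarn call itself with coreutils' timeout:

# klist -s is silent and exits non-zero when there is no valid ticket.
klist -s || kinit -kt my_keytab_file.keytab my_kerberos_user

# timeout kills the command if it is still running after 120 seconds,
# so a cron run can never hang on endless IPC retries.
spark_rtp_app_array=( $(timeout 120s yarn application --list -appTypes SPARK -appStates ACCEPTED,RUNNING | awk -F "\t" ' /'my_user'/ && /'my_app'/ {print $1}') )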

Data Node and Node Manager are not starting in pseudo-cluster mode (Apache Hadoop)

The Data Node and Node Manager are not starting in pseudo-cluster mode (Apache Hadoop).
Seeing this error in the log file:
2017-08-22 17:15:08,403 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from archit doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:278)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:272)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:496)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
2017-08-22 17:15:08,404 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
The "ResourceManager: NodeManager from archit doesn't satisfy minimum allocations" error is seen when node on which node manager is being started does not have enough resources w.r.t yarn.scheduler.minimum-allocation-vcores and yarn.scheduler.minimum-allocation-mb configurations.
Reduce values of yarn.scheduler.minimum-allocation-vcores and yarn.scheduler.minimum-allocation-mb then restart yarn.
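Both properties live in yarn-site.xml. A hedged example with placeholder values; pick numbers that are at or below what the node actually offers through yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores:

<!-- yarn-site.xml: example values only -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>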

Way to show AWS API calls being made by Packer in post-processors section?

I have a Packer template with the following post-processors section:
"post-processors": [
{
"type": "amazon-import",
"ami_name": "my_image-{{user `os_version`}}",
"access_key": "{{user `aws_access_key`}}",
"secret_key": "{{user `aws_secret_key`}}",
"region": "us-east-1",
"s3_bucket_name": "my_s3_bucket",
"tags": {
"Description": "Packer build {{timestamp}}",
"Version": "{{user `build_version`}}"
},
"only": ["aws"]
}
I'm trying to debug a policy/permissions issue and want to see more detail about which AWS API calls Packer is making here with the amazon-import post-processor.
I'm aware of the PACKER_LOG=1 environment variable, but is there anything more verbose than that? This output doesn't give me much to go on:
2017/08/11 23:55:24 packer: 2017/08/11 23:55:24 Waiting for state to become: completed
2017/08/11 23:55:24 packer: 2017/08/11 23:55:24 Using 2s as polling delay (change with AWS_POLL_DELAY_SECONDS)
2017/08/11 23:55:24 packer: 2017/08/11 23:55:24 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
2017/08/12 00:29:59 ui: aws (amazon-import): Import task import-ami-fg0qxxdb complete
aws (amazon-import): Import task import-ami-fg0qxxdb complete
2017/08/12 00:29:59 ui: aws (amazon-import): Starting rename of AMI (ami-c01125bb)
aws (amazon-import): Starting rename of AMI (ami-c01125bb)
2017/08/12 00:29:59 ui: aws (amazon-import): Waiting for AMI rename to complete (may take a while)
2017/08/12 00:29:59 packer: 2017/08/12 00:29:59 Waiting for state to become: available
aws (amazon-import): Waiting for AMI rename to complete (may take a while)
2017/08/12 00:29:59 packer: 2017/08/12 00:29:59 Using 2s as polling delay (change with AWS_POLL_DELAY_SECONDS)
2017/08/12 00:29:59 packer: 2017/08/12 00:29:59 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
2017/08/12 00:29:59 packer: 2017/08/12 00:29:59 Error on AMIStateRefresh: UnauthorizedOperation: You are not authorized to perform this operation.
2017/08/12 00:29:59 packer: status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
2017/08/12 00:29:59 [INFO] (telemetry) ending amazon-import
2017/08/12 00:29:59 [INFO] (telemetry) found error: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
2017/08/12 00:29:59 Deleting original artifact for build 'aws'
2017/08/12 00:29:59 ui error: Build 'aws' errored: 1 error(s) occurred:
* Post-processor failed: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
2017/08/12 00:29:59 Builds completed. Waiting on interrupt barrier...
2017/08/12 00:29:59 machine readable: error-count []string{"1"}
2017/08/12 00:29:59 ui error:
==> Some builds didn't complete successfully and had errors:
2017/08/12 00:29:59 machine readable: aws,error []string{"1 error(s) occurred:\n\n* Post-processor failed: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403, request id: f53ea750-788e-4213-accc-def6ca459113"}
Build 'aws' errored: 1 error(s) occurred:
2017/08/12 00:29:59 ui error: --> aws: 1 error(s) occurred:
* Post-processor failed: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
2017/08/12 00:29:59 ui:
==> Builds finished but no artifacts were created.
* Post-processor failed: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
==> Some builds didn't complete successfully and had errors:
--> aws: 1 error(s) occurred:
* Post-processor failed: Error waiting for AMI (ami-3f132744): UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: f53ea750-788e-4213-accc-def6ca459113
==> Builds finished but no artifacts were created.
2017/08/12 00:30:00 [WARN] (telemetry) Error finalizing report. This is safe to ignore. Post https://checkpoint-api.hashicorp.com/v1/telemetry/packer: context deadline exceeded
2017/08/12 00:30:00 waiting for all plugin processes to complete...
2017/08/12 00:30:00 /usr/local/bin/packer: plugin process exited
2017/08/12 00:30:00 /usr/local/bin/packer: plugin process exited
2017/08/12 00:30:00 /usr/local/bin/packer: plugin process exited
I'm assuming this is a policy permissions issue but I can't tell what I'm missing from the above output.
Unfortunately there is no more debugging to enable.
I recommend that you verify you have created all policies according to the docs and review the permissions for the user. You can do that by pasting the access key ID into the IAM search.
As a last resort it can be useful to go through the process manually with the AWS CLI.
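For example, an illustrative (not exact) replay of the same flow with the AWS CLI, using the bucket from the template, a placeholder file name, and the task id from the log, which usually surfaces the exact call that is being denied:

aws s3 cp my_image.ova s3://my_s3_bucket/my_image.ova
aws ec2 import-image --disk-containers "Format=ova,UserBucket={S3Bucket=my_s3_bucket,S3Key=my_image.ova}"
aws ec2 describe-import-image-tasks --import-task-ids import-ami-fg0qxxdb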
Not within Packer, but you could use AWS CloudTrail to see which APIs have been called:
https://aws.amazon.com/cloudtrail/
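If CloudTrail is enabled in the account, the denied call can also be pulled out with the CLI. A sketch; the event name is an assumption based on the AMIStateRefresh error, which polls DescribeImages:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeImages \
  --max-results 5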