My AKS Cluster was brought down, how can I recover? - azure-container-service

I have been playing around with load-testing my application on a single agent cluster in AKS. During the testing, the connection to the dashboard stalled and never resumed. My application seems down as well, so I am assuming the cluster is in a bad state.
The API server is restate-f4cbd3d9.hcp.centralus.azmk8s.io
kubectl cluster-info dump shows the following error:
{
"name": "kube-dns-v20-6c8f7f988b-9wpx9.14fbbbd6bf60f0cf",
"namespace": "kube-system",
"selfLink": "/api/v1/namespaces/kube-system/events/kube-dns-v20-6c8f7f988b-9wpx9.14fbbbd6bf60f0cf",
"uid": "47f57d3c-d577-11e7-88d4-0a58ac1f0249",
"resourceVersion": "185572",
"creationTimestamp": "2017-11-30T02:36:34Z",
"InvolvedObject": {
"Kind": "Pod",
"Namespace": "kube-system",
"Name": "kube-dns-v20-6c8f7f988b-9wpx9",
"UID": "9d2b20f2-d3f5-11e7-88d4-0a58ac1f0249",
"APIVersion": "v1",
"ResourceVersion": "299",
"FieldPath": "spec.containers{kubedns}"
},
"Reason": "Unhealthy",
"Message": "Liveness probe failed: Get http://10.244.0.4:8080/healthz-kubedns: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
"Source": {
"Component": "kubelet",
"Host": "aks-agentpool-34912234-0"
},
"FirstTimestamp": "2017-11-30T02:23:50Z",
"LastTimestamp": "2017-11-30T02:59:00Z",
"Count": 6,
"Type": "Warning"
}
As well as some Pod Sync errors in Kube-System.
Example of issue:
az aks browse -g REstate.Server -n REstate
Merged "REstate" as current context in C:\Users\User\AppData\Local\Temp\tmp29d0conq
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out

You'll probably need to ssh to the node to see if the Kubelet service is running. For future you can set Resource quotas from exhausting all resources in the cluster nodes.
Resource Quotas -https://kubernetes.io/docs/concepts/policy/resource-quotas/

Related

Google Home doesn't send UDP broadcast to my smart home device

I am trying to develop a smart home action that works with local fulfillment but my device doesn't receive the UDP broadcast request. I have a google home device at home which is connected to my account.
I have made the next steps:
add otherDevicesIds field to my sync response
add device scan configuration (I have chosen UDP )
implement UDP server on the device. it listens on 8888 ports and responds to echo -n "test data" | nc -u -b 255.255.255.255 8888
Packets from my laptop are received by my DIY smart home device but I don't see any packets from the google home assistant device(it is in the same network as my laptop). It seems that google assistant does not send UDP broadcast at all.
I have added an example of my sync response and a screenshot of Device Scan Configuration for this action.
How to make my device receive UDP broadcast? Tell me if I have understood the local fulfillment wrong.
SYNC response example below
{
"payload": {
"agentUserId": "fas87df6a8s7d6f",
"devices": [
{
"otherDeviceIds": [
{
"agentId": "fasf87da",
"deviceId": "sdfta87sd6f"
}
],
"deviceInfo": {
"model": "LIGHT",
"manufacturer": "87sd6f87asd",
"swVersion": "1.0",
"hwVersion": "LIGHT"
},
"customData": {},
"id": "light-1234112",
"attributes": {},
"type": "action.devices.types.LIGHT",
"name": {
"defaultNames": [
"light"
],
"nicknames": [
"light"
],
"name": "7f6as87fa8"
},
"traits": [
"action.devices.traits.OnOff",
"action.devices.traits.Brightness"
],
"willReportState": false
}
]
},
"requestId": "7298347129347192374"
}
Below is the example of google action UDP config.
As mentioned in the comments by you, you didn’t create a local fulfillment app which is necessary for developing & testing local fulfillment for any of your existing Smart Home Actions. You can find more information about this at https://developers.google.com/assistant/smarthome/develop/local?hl=en
Additionally, make sure that you have added the devices correctly after enabling the test suite.

geth process is stopped because debug_traceBlockByNumber

geth version:
Version: 1.9.24-stable
When I send the request, the geth process stops.
{
"jsonrpc": "2.0",
"method": "debug_traceBlockByNumber",
"params": [
"0x242c60",
{"tracer": "evmdisTracer"}
],
"id": 1
}
geth error logs:
error logs
how should i solve this problem?

How to monitor a EMR cluster whether it is terminated using cloudwatch

I want to set alarm, when any EMR cluster is terminated(caused by internal errors), I know there is a "IsIdle" option, but my EMR clusters are designed to be persistent, so "IsIdle" is not really fit my case. Is there a health-check metric that I can used?
You can configure Amazon CloudWatch to send a "State Change" event to another service like an AWS Lambda function or an Amazon SNS topic.
To achieve this, open the CloudWatch console, in the navigation pane click on Rules > Create rule.
Service Name: EMR
Event Type: State Change
Specific detail type(s): EMR Cluster State Change
Specific State: TERMINATED and TERMINATED_WITH_ERRORS
Targets: Put the receiving service of your choice.
Here's an example of such an event:
{
"version": "0",
"id": "8535abb0-f87e-4640-b7b6-8de000dfc30a",
"detail-type": "EMR Cluster State Change",
"source": "aws.emr",
"account": "123456789012",
"time": "2016-12-16T21:00:23Z",
"region": "us-east-1",
"resources": [],
"detail": {
"severity": "INFO",
"stateChangeReason": "{\"code\":\"USER_REQUEST\",\"message\":\"Terminated by user request\"}",
"name": "Development Cluster",
"clusterId": "j-1YONHTCP3YZKC",
"state": "TERMINATED",
"message": "Amazon EMR Cluster j-1YONHTCP3YZKC (Development Cluster) has terminated at 2016-12-16 21:00 UTC with a reason of USER_REQUEST."
}
}

Packer ssh timeout

I am trying to build images with packer in a jenkins pipeline. However, the packer ssh provisioner does not work as the ssh never becomes available and error out with timeout.
Farther investigation of the issue shows that, the image is missing network interface files ifconfig-eth0 in /etc/sysconfig/network-scripts directory so it never gets an ip and does not accept ssh connection.
The problem is, there are many such images to be generated and I can't open each one manually in GUI of virtualbox and correct the issue and repack. Is there any other possible solution to that?
{
"variables": {
"build_base": ".",
"isref_machine":"create-ova-caf",
"build_name":"virtual-box-jenkins",
"output_name":"packer-virtual-box",
"disk_size":"40000",
"ram":"1024",
"disk_adapter":"ide"
},
"builders":[
{
"name": "{{user `build_name`}}",
"type": "virtualbox-iso",
"guest_os_type": "Other_64",
"iso_url": "rhelis74_1710051533.iso",
"iso_checksum": "",
"iso_checksum_type": "none",
"hard_drive_interface":"{{user `disk_adapter`}}",
"ssh_username": "root",
"ssh_password": "Secret1.0",
"shutdown_command": "shutdown -P now",
"guest_additions_mode":"disable",
"boot_wait": "3s",
"boot_command": [ "auto<enter>"],
"ssh_timeout": "40m",
"headless":
"true",
"vm_name": "{{user `output_name`}}",
"disk_size": "{{user `disk_size`}}",
"output_directory":"{{user `build_base`}}/output-{{build_name}}",
"format": "ovf",
"vrdp_bind_address": "0.0.0.0",
"vboxmanage": [
["modifyvm", "{{.Name}}","--nictype1","virtio"],
["modifyvm", "{{.Name}}","--memory","{{ user `ram`}}"]
],
"skip_export":true,
"keep_registered": true
}
],
"provisioners": [
{
"type":"shell",
"inline": ["ls"]
}
]
}
When you don't need the SSH connection during the provisioning process you can switch it off. See the packer documentation about communicator, there you see the option none to switch of the communication between host and guest.
{
"builders": [
{
"type": "virtualbox-iso",
"communicator": "none"
}
]
}
Packer Builders DOCU virtualbox-iso

Can't Connect to Service via Marathon-lb using DCOS

I recently went through the tutorial for load balancing apps in DCOS using marathon-lb (in the example they balance some nginx containers: https://dcos.io/docs/1.9/networking/marathon-lb/marathon-lb-advanced-tutorial/). I am trying to use this approach to internally load balance my own custom application. The custom app I am using is a play scala app. I have the internal marathon-lb set up and can successfully use it for the nginx container but when I try to use my own docker image I cannot get this to work. I start up my service with my custom image and I can access the service fine by using the IP and port that gets assigned to it (i.e. if the service gets deployed on 10.0.0.0 and is available on port 1234 then curl http://10.0.0.0:1234/ works as expected and I can also make my api calls as defined in my application routes). However, when I try to access the app through the load balancer (curl -i http://marathon-lb-internal.marathon.mesos:10002, where 10002 is the service port) then I get this message:
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
For reference, here is my json file I'm using to start my custom service:
{
"id": "my-app",
"container": {
"type": "DOCKER",
"docker": {
"image": "my_repo/my_image:1.0.0",
"network": "BRIDGE",
"portMappings": [
{ "hostPort": 0, "containerPort": 9000, "servicePort": 10002, "protocol": "tcp" }
],
"parameters": [
{ "key": "env", "value": "USER_NAME=user" },
{ "key": "env", "value": "USER_PASSWORD=password" }
],
"forcePullImage": true
}
},
"instances": 1,
"cpus": 1,
"mem": 1000,
"healthChecks": [{
"protocol": "HTTP",
"path": "/v1/health",
"portIndex": 0,
"timeoutSeconds": 10,
"gracePeriodSeconds": 10,
"intervalSeconds": 2,
"maxConsecutiveFailures": 10
}],
"labels":{
"HAPROXY_GROUP":"internal"
},
"uris": [ "https://s3.amazonaws.com/my_bucket/my_docker_credentials" ]
}
I had the same problem and found the solution here
marathon-lb health check failing on all spray.io containers
Need to add
"HAPROXY_0_BACKEND_HTTP_HEALTHCHECK_OPTIONS": " http-send-name-header Host\n timeout check {healthCheckTimeoutSeconds}s\n"
To your config so that the REST layer doesn't bark on the health check from marathon