Getting AccessDenied error while upgrading EKS cluster - amazon-eks

I am trying to upgrade my EKS cluster from 1.15 to 1.16 using the same CI pipeline that created the cluster, so the credentials should not be the issue. However, I am receiving an AccessDenied error. I am using the eksctl upgrade cluster command to perform the upgrade.
info: cluster test-cluster exists, will upgrade it
[ℹ] eksctl version 0.33.0
[ℹ] using region us-east-1
[!] NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
[ℹ] will upgrade cluster "test-cluster" control plane from current version "1.15" to "1.16"
Error: AccessDeniedException:
status code: 403, request id: 1a02b0fd-dca5-4e54-9950-da29cac2cea9
My eksctl version is 0.33.0.
I am not sure why the same CI pipeline that created the cluster now throws an AccessDenied error when trying to upgrade it. Are there any permissions I need to add to the IAM policy for the user? I can't find anything in the prerequisites documentation. Please let me know what I am missing here.

I figured out that the error was due to a missing IAM permission.
I ran eksctl with --verbose 5 to diagnose the issue.
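As a rough sketch of that diagnosis: rerunning the upgrade with --verbose 5 turns on eksctl's AWS debug logging, so the rejected API call should show up in the output. The wrapper name upgrade_verbose and the log file name are made up for illustration. The answer does not say which permission was missing; because the control plane upgrade goes through the EKS UpdateClusterVersion API, eks:UpdateClusterVersion would be a reasonable first guess, but that is an assumption.

```shell
# Hypothetical helper: rerun the control plane upgrade with eksctl's most
# detailed logging (level 5 adds AWS debug output) and keep a copy of it.
upgrade_verbose() {
  cluster="$1"
  logfile="${2:-eksctl-upgrade.log}"   # assumed file name, change as needed
  eksctl upgrade cluster --name "$cluster" --approve --verbose 5 2>&1 | tee "$logfile"
}

# Usage (needs eksctl and the pipeline's AWS credentials):
#   upgrade_verbose test-cluster
```

Searching the saved log for the request that returned status code 403 should then point at the action to add to the IAM policy.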

Related

How to update existing deployment in AWS EKS Cluster?

I have my application deployed in an AWS EKS cluster, and now I want to update the deployment with the new image that I built from the most recent Git commit.
I tried:
kubectl set image deployment/mydeploy mydeploy=ECR:2.0
error: unable to find container named "stag-simpleui-deployment"
I also tried:
kubectl rolling-update mydeploy mydeploy.2.0 --image=ECR:2.0
Command "rolling-update" is deprecated, use "rollout" instead
Error from server (NotFound): replicationcontrollers "stag-simpleui-deployment" not found
It is confusing that so many articles describe different ways, but none of them works.
I was able to crack it. In the command below, the name before the = sign (mydeploy=) must match the container name shown under spec.template.spec.containers when you run kubectl edit deployment mydeploy:
kubectl set image deployment/mydeploy mydeploy=ECR:2.0
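To avoid guessing the name, one option is to read the container names straight from the deployment before calling set image. A minimal sketch, reusing the mydeploy and ECR:2.0 placeholders from the question; the function names list_containers and update_image are hypothetical:

```shell
# Print the container names defined in a deployment's pod template.
list_containers() {
  kubectl get deployment "$1" -o jsonpath='{.spec.template.spec.containers[*].name}'
}

# Point one container at a new image, then wait for the rollout to finish.
update_image() {
  deploy="$1"; container="$2"; image="$3"
  kubectl set image "deployment/$deploy" "$container=$image" &&
    kubectl rollout status "deployment/$deploy"
}

# Usage (needs kubectl access to the cluster):
#   list_containers mydeploy
#   update_image mydeploy <container-name-from-above> ECR:2.0
```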

Trouble setting up cert-manager without helm or rbac on gke

I believe I have followed this guide:
https://medium.com/#hobochild/installing-cert-manager-on-a-gcloud-k8s-cluster-d379223f43ff
which has me install the without-rbac version of cert-manager from this repo:
https://github.com/jetstack/cert-manager
However, when the cert-manager pod boots up, it starts spamming this error:
leaderelection.go:224] error retrieving resource lock cert-manager/cert-manager-controller: configmaps "cert-manager-controller" is forbidden: User "system:serviceaccount:cert-manager:default" cannot get configmaps in the namespace "cert-manager": Unknown user "system:serviceaccount:cert-manager:default"
Hoping someone has some ideas.
The errors seem to be coming from RBAC. If you're running this in minikube you can grant the default service account in the cert-manager namespace the proper rights by running:
kubectl create clusterrolebinding cert-manager-cluster-admin --clusterrole=cluster-admin --serviceaccount=cert-manager:default
After creating the role binding, cert-manager should complete its startup.
You should use the 'with-rbac.yaml' variant if you are installing in GKE, unless you have explicitly disabled RBAC on the GKE cluster!
This should resolve the issues you're seeing here, as by the looks of your error message, you do have RBAC enabled!
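The permission named in the error message can also be checked directly by impersonating the service account. A small sketch; the helper name sa_can_get_configmaps is made up, and the --as flag only works if your own user is allowed to impersonate service accounts:

```shell
# Ask the API server whether a service account may read configmaps in its
# own namespace. kubectl prints "yes" or "no".
sa_can_get_configmaps() {
  ns="$1"; sa="$2"
  kubectl auth can-i get configmaps \
    --namespace "$ns" \
    --as "system:serviceaccount:$ns:$sa"
}

# Usage (needs kubectl access to the cluster):
#   sa_can_get_configmaps cert-manager default
```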

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload the deployment image to S3 from CircleCI, but I am unable to deploy it from S3 to the EC2 instance. Here's the error:
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error comes from CodeDeploy. I can't figure out why it occurs or how to fix it.
I'd appreciate it if you could give me some advice.
If you are running on Ubuntu, there can be plenty of reasons. Here is a checklist you can go through:
Check that the CodeDeploy agent is installed on your EC2 instance. Please refer to this document to install the CodeDeploy agent:
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
If you are running Ubuntu release 20.x and you get this error:
./install:22:in block in method_missing': undefined method path' for
#<IO:> (NoMethodError)
try running the install file like this:
sudo ./install auto > /tmp/logfile
Check that the EC2 instance has a CodeDeploy role: create a role for CodeDeploy and assign it to the instance. See https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
If you assign the EC2 role after the instance was launched, restart the server.
Check the placement of your appspec.yml file as described in the top answer, and try to avoid long timeouts in it.
Log into your instance and check the error log:
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
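The same instance details can be pulled from the command line. A sketch, assuming the AWS CLI is configured with credentials that can read CodeDeploy; the function names are hypothetical and the deployment and instance IDs in the usage lines are placeholders:

```shell
# List the instances that failed in a given deployment.
failed_instances() {
  aws deploy list-deployment-instances \
    --deployment-id "$1" \
    --instance-status-filter Failed \
    --query 'instancesList' --output text
}

# Show each lifecycle event for one instance, with its status and any
# diagnostic message CodeDeploy recorded.
instance_events() {
  aws deploy get-deployment-instance \
    --deployment-id "$1" --instance-id "$2" \
    --query 'instanceSummary.lifecycleEvents[].[lifecycleEventName,status,diagnostics.message]' \
    --output text
}

# Usage:
#   failed_instances d-EXAMPLE
#   instance_events d-EXAMPLE i-EXAMPLE
```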
This error is commonly due to problems in the configuration of the appspec.yml or appspec.json file (depending on the format you are using).
If you have any hooks, I recommend removing them and checking whether the deployment works. Then add the hooks back one by one so you can identify which one causes the error.
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
    │-- install_dependencies
    │-- start_server
    └-- stop_server
In the scripts folder, place the scripts that you want to be executed for each hook.
Here is an example of the appspec.yml file:
version: 0.0
os: linux
files:
  - source: /index.html
    destination: /var/www/html/
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
I hope I can help you 😃👻🕺🏾
Make sure the CodeDeploy Host Agent Service is running in your target EC2 instance.
The error you are facing is a generic message raised when any lifecycle event fails, such as BeforeBlockTraffic, BlockTraffic, or ApplicationStop.
The first step is to check whether the CodeDeploy agent is running, especially if the first event (BeforeBlockTraffic) failed.
The failure message for the event tells you the exact error behind it.
From the failed deployments, I can see that all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in the Stopped state, and I can also see that its IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile [1].
For such failures, you can always begin with the general troubleshooting checklist for a failed deployment [2] and then look at the troubleshooting guides on deployment issues and instance issues [3].
[1] http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html
[2] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html
[3] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html
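Whether an instance has a profile attached can be checked from the CLI as well. A minimal sketch, assuming the AWS CLI is configured and allowed to call ec2:DescribeInstances; the function name instance_profile_arn is hypothetical:

```shell
# Print the ARN of the IAM instance profile attached to an EC2 instance.
# An empty or "None" result suggests that no profile is attached.
instance_profile_arn() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[].Instances[].IamInstanceProfile.Arn' \
    --output text
}

# Usage:
#   instance_profile_arn i-0bcc36e73851297f2
```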
Check the status of the CodeDeploy agent. In my case, the agent wasn't up.
Please check the role given to the EC2 machine (where the agent is running). It should have S3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
Please place your appspec.yml file in the root folder of the revision to solve this error, so that the agent can find it and reach your before and after scripts.
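A quick local check before uploading the revision can catch both problems: an appspec.yml that is not at the root, and hook scripts that the file references but the bundle does not contain. This is only a sketch; the function name check_revision is made up, it reads location: entries with a simple sed pattern rather than a real YAML parser, and it does not handle paths containing spaces.

```shell
# Verify that appspec.yml sits at the revision root and that every script
# listed under a "location:" key exists relative to that root.
check_revision() {
  root="$1"
  if [ ! -f "$root/appspec.yml" ]; then
    echo "missing: appspec.yml is not at the revision root"
    return 1
  fi
  status=0
  # Extract the value of each "location:" line (with or without a leading "-").
  for script in $(sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*location:[[:space:]]*//p' "$root/appspec.yml"); do
    if [ ! -f "$root/$script" ]; then
      echo "missing: $script"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "ok"
  return "$status"
}

# Usage, from the directory that will be zipped and uploaded:
#   check_revision .
```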

Cloudera service monitor unable to start

I get the following error when restarting the Cloudera Management Service in a Docker container (quickstart:latest). I restarted it after an error showed that the Service Monitor was not running:
Mar 15, 8:45:43.760 AM ERROR com.cloudera.cmon.firehose.Main
Failed to start Firehose
java.io.IOException: Unknown version of the versioned LevelDB store.
at com.cloudera.cmon.tstore.leveldb.LDBUtils.openVersionedDB(LDBUtils.java:253)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.<init>(LDBPartitionMetadataStore.java:139)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.<init>(LDBPartitionMetadataStore.java:133)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionMetadataStore.createInPartitionMetadataSubdirectory(LDBPartitionMetadataStore.java:119)
at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.createLDBPartitionManager(LDBPartitionManager.java:193)
at com.cloudera.cmon.firehose.LDBWorkDetailsTable.<init>(LDBWorkDetailsTable.java:90)
at com.cloudera.cmon.firehose.LDBWorkDetailsStore.<init>(LDBWorkDetailsStore.java:67)
at com.cloudera.cmon.firehose.LDBWorkStoreFactory.createYarnWorkDetailsStore(LDBWorkStoreFactory.java:139)
at com.cloudera.cmon.firehose.Firehose.<init>(Firehose.java:222)
at com.cloudera.cmon.firehose.Main.main(Main.java:515)
The following is also shown in the cloudera.quickstart dashboard:
Unable to issue query: the Service Monitor is not running
This is a common error in a Cloudera Docker container booted on a single node.
I solved it by removing the old /var/lib/cloudera-service-monitor.
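Because removing the directory presumably discards whatever the Service Monitor had stored there, a more cautious variant is to move it aside first. A minimal sketch, assuming the Service Monitor is stopped beforehand and restarted from Cloudera Manager afterwards; the helper name move_aside and the backup suffix are my own choices, not part of the fix described above:

```shell
# Rename a directory to a timestamped backup and print the backup path.
move_aside() {
  dir="$1"
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$dir" "$backup" && echo "$backup"
}

# Usage (as root, inside the container):
#   move_aside /var/lib/cloudera-service-monitor
```

If the Service Monitor starts cleanly afterwards, the backup can be deleted.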

Unexpected error: CLIENT_ERROR on Openshift

Following the guide at https://www.openshift.com/developers/download-cartridges, I wanted to try installing the CDK to see what it brings to the table. Unfortunately, I was unable to install the cartridge because of the following error:
Unexpected error: CLIENT_ERROR: Download of 'http://cdk-claytondev.rhcloud.com/archive/2ccd7a3a7762e4ebb873c0d64a247b180e0600b8/cdk.zip' exceeded Content-Length of 9728. Download aborted.
Execute rhc create-app cdk http://cdk-claytondev.rhcloud.com/manifest/2ccd7a3a7762e4ebb873c0d64a247b180e0600b8 against a local installation of OpenShift Origin, or try to create an app through the web console (again on a local installation; both are untested on rhcloud).
It looks like this was also an issue on OpenShift Online, where it has been fixed, but it is still an issue on OpenShift Origin. Here is the Bugzilla ticket for this issue (https://bugzilla.redhat.com/show_bug.cgi?id=1017776). I suggest you add your email address to it to be notified as they make progress on fixing it.