How do I add the Linux capabilities SYS_NICE and DAC_READ_SEARCH to a container in AWS Fargate? - aws-fargate

I'm trying to set up a task definition in ECS Fargate for running Koha containers, but Fargate won't accept
--cap-add=SYS_NICE --cap-add=DAC_READ_SEARCH
(or any other kernel capability except SYS_PTRACE) in the task definition JSON file. I tried adding "linuxParameters": {"capabilities": {"add": ["SYS_NICE", "DAC_READ_SEARCH"]}} to the task definition JSON file, but Fargate simply deletes it.
The mpm_itk module fails without these capabilities, and my container throws a 500 error with the following warning in the logs:
[mpm_itk:warn] [pid 17146] (itkmpm: pid=17146 uid=33, gid=33) itk_post_perdir_config(): setgid(1000): Operation not permitted
How do I work around this? Is there a way to pass on these capabilities after the container has started up? Any help will be appreciated, thanks!

According to AWS, Fargate only allows you to add the SYS_PTRACE kernel capability. It is not possible to add any other capabilities at the moment. The only viable workaround I can see is to use ECS on EC2.
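To see in advance which entries Fargate would drop, a small check like the following can be run against a container definition. This is only a sketch: the allowed set is taken from the statement above (SYS_PTRACE only), and the function name and the sample container are hypothetical.

```python
# Assumption taken from the answer above: Fargate accepts only SYS_PTRACE.
FARGATE_ALLOWED_CAPS = {"SYS_PTRACE"}


def rejected_capabilities(container_def):
    """Return the entries of linuxParameters.capabilities.add that are not in the allowed set."""
    added = (
        container_def.get("linuxParameters", {})
        .get("capabilities", {})
        .get("add", [])
    )
    return [cap for cap in added if cap not in FARGATE_ALLOWED_CAPS]


# Hypothetical container definition, shaped like the fragment in the question.
container = {
    "name": "koha",
    "linuxParameters": {
        "capabilities": {"add": ["SYS_NICE", "DAC_READ_SEARCH", "SYS_PTRACE"]}
    },
}

print(rejected_capabilities(container))
```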

A container created by Docker's runc is bounded by a capability mask, i.e.
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
The container can only obtain capabilities from this set.
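The mask can be decoded by hand to confirm that neither requested capability is in it. The sketch below assumes the standard bit numbering from linux/capability.h and covers only the first 32 capabilities, which is enough for this mask; the function name is made up for illustration.

```python
# Capability names indexed by bit number (bits 0..31 of linux/capability.h).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
]


def decode_caps(mask):
    """Return the names of all capabilities whose bit is set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


bounding = decode_caps(0x00000000A80425FB)
print(",".join(bounding))
# Neither capability asked for in the question is part of the set.
print("cap_sys_nice" in bounding, "cap_dac_read_search" in bounding)
```

Bits 23 (cap_sys_nice) and 2 (cap_dac_read_search) are both clear in this mask.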

Related

GitLab CI stuck at "Waiting Fargate task to be ready" - but Fargate task is in fact running, but never completes

Having set up GitLab CI and AWS Fargate resources as described in the documentation, we have a situation where the runner can trigger the Fargate task, which goes into RUNNING state, but the master runner never seems to realize this.
Running with gitlab-runner 14.7.0 (98daeee0)
on gitlab-fargate-master DyE5BsVA
Preparing the "custom" executor
INFO[2022-01-27T13:54:49Z] Starting fargate PID=1447 version="0.2.0 (933d940)"
INFO[2022-01-27T13:54:49Z] Executing the command PID=1447 command=config_exec
Using Custom executor with driver fargate 0.2.0 (933d940)...
INFO[2022-01-27T13:54:49Z] Starting fargate PID=1452 version="0.2.0 (933d940)"
INFO[2022-01-27T13:54:49Z] Executing the command PID=1452 command=prepare_exec
INFO[2022-01-27T13:54:56Z] Starting new Fargate task PID=1452 command=prepare_exec
INFO[2022-01-27T13:54:58Z] Persisting data that will be used by other commands PID=1452 command=prepare_exec taskARN="arn:aws:ecs:us-east-1:558517226390:task/gitlab-ci-cluster/ee488fa1d7d7475fab9be01d5bad180e"
INFO[2022-01-27T13:54:58Z] Waiting Fargate task to be ready PID=1452 command=prepare_exec taskARN="arn:aws:ecs:us-east-1:558517226390:task/gitlab-ci-cluster/ee488fa1d7d7475fab9be01d5bad180e"
Within AWS, the task has created its Log Stream in CloudWatch, but there are no events in that log. It's unclear what is actually happening.
What can be done to find out?
We have reverted to using a vanilla Docker container from the GitLab documentation, registry.gitlab.com/tmaczukin-test-projects/fargate-driver-debian:latest, but exactly the same thing happens.
Solved - the problem was a missing AWS permission, ECS:DescribeTasks, which for some reason did not cause an error message in the Runner.
(I had mistakenly added AmazonEC2_FullAccess, not AmazonECS_FullAccess as described in the docs)
Having run a "Generate Policy" in AWS based on CloudTrail Events (awesome new feature!), I can now confirm the permissions actually being used are:
EC2: DescribeNetworkInterfaces.
ECS: StopTask, DescribeTasks, RunTask
Note the EC2 permission, which is missing from the docs.
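For reference, the four actions listed above can be collected into a single IAM policy document. The sketch below only builds and prints the JSON; the wildcard Resource is an assumption made for illustration and is not something the answer states, so it should be narrowed for real use.

```python
import json


def runner_policy():
    """Build an IAM policy document containing the actions reported in the answer above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask",
                    "ecs:StopTask",
                    "ecs:DescribeTasks",
                    "ec2:DescribeNetworkInterfaces",
                ],
                # Assumption: "*" is used only to keep the sketch short.
                "Resource": "*",
            }
        ],
    }


print(json.dumps(runner_policy(), indent=2))
```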
Not sure if you have solved your problem, but I noticed this question because I had the exact same issue yesterday. For me it was caused by my GitLab manager task using an IAM role that was limited to starting and stopping tasks; it was apparently missing the permission to check whether a task is in the RUNNING state. So I fixed my ECS execution role and then it started working for me.

Running AWS Log Agent from inside a Fargate container

Trying to run the AWS Logs Agent inside a Docker container running on AWS ECS Fargate.
This has been working fine under EC2 for several years. Under Fargate, it does not seem to be able to resolve the task role being passed to it.
Permissions on the Task Role should be good... I've even tried giving it full CloudWatch permissions to eliminate that as a reason.
I've managed to hack the python based launcher script to add a --debug flag which gave me this in the log:
Caught retryable HTTP exception while making metadata service request to
http://169.254.169.254/latest/meta-data/iam/security-credentials
It does not appear to be properly resolving the credentials that are passed into the task as the 'Task Role'
I managed to find a hack workaround, that may illustrate what I believe to be a bug or inadequacy in the agent. I had to hack the launcher script using sed as follows:
sed -i "s|HTTPS_PROXY|AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI HTTPS_PROXY|" /var/awslogs/bin/awslogs-agent-launcher.sh
This essentially de-references the ENV variable holding the URI for retrieving the task role and passes it to the agent's launcher.
It results in something like this:
/usr/bin/env -i AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/f4ca7e30-b73f-4919-ae14-567b1262b27b (etc...)
With this in place, I restart the log agent and it works as expected.
Note that you can use a similar sed edit to add the --debug flag to the launcher, which was very helpful in figuring out where it went astray.
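What the sed substitution does to the launcher can be sketched in Python as follows. Like sed without the g flag, it rewrites only the first HTTPS_PROXY on each line. The sample launcher line, the agent path in it, and the fallback URI are hypothetical and stand in for the real script contents.

```python
import os


def patch_launcher(script_text, relative_uri):
    """On each line, prefix the first HTTPS_PROXY with the credentials variable."""
    injected = f"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI={relative_uri} HTTPS_PROXY"
    return "\n".join(
        line.replace("HTTPS_PROXY", injected, 1)
        for line in script_text.split("\n")
    )


# Hypothetical launcher line; the real script may look different.
sample = "/usr/bin/env -i HTTPS_PROXY=$HTTPS_PROXY /path/to/agent"

# Inside a Fargate task the variable is set by ECS; the fallback is a placeholder.
uri = os.environ.get(
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI", "/v2/credentials/example-id"
)
print(patch_launcher(sample, uri))
```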

Build spinnaker with docker-compose, redirect to localhost

I built Spinnaker using docker-compose, following the instructions here,
but it always redirects to localhost. How can I fix this?
e.g.
http://localhost:8084/auth/redirect?to=http%3A%2F%2F192.168.99.100%3A9000%2F%23%2Finfrastructure
I set host: 0.0.0.0 in spinnaker-local.yml and configured Deck's apache2 with proxyPreserve=On, but it's not working.
Where is the configuration for the redirect?
All containers are running well, but fiat gets error messages like this:
WARN 1 --- [ecutionAction-1] c.n.s.fiat.roles.UserRolesSyncer : [] User permission sync failed. Server status is DOWN. Trying again in 10000 ms. Cause:(Provider: DefaultServiceAccountProvider) retrofit.RetrofitError: unexpected url: front50/serviceAccounts
I'm sure I set fiat to false. Does this matter?
Thanks.
The linked docker-compose project is not available anymore, and that deployment type is no longer supported.
The easiest way I can suggest for people to get started quickly is Armory's open source Minnaker. It runs on top of a small K3s cluster and contains a functional Spinnaker deployment.
It's a great way to get started.
I tried the Debian local deployment and it failed every time.
Enjoy your CD operations.

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload the deployment image to S3 from CircleCI, but I'm unable to deploy it from S3 to the EC2 instance. Here's the error.
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error was reported by CodeDeploy. I can't figure out why it occurs or how to fix it.
I'd appreciate any advice.
If you are running on Ubuntu, there can be plenty of reasons. Here is a checklist you can verify:
Check that the CodeDeploy agent is installed on your EC2 instance. Refer to this document to install the CodeDeploy agent:
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
If you are running Ubuntu release 20.x and you get this error:
./install:22:in block in method_missing': undefined method path' for
#<IO:> (NoMethodError)
try running the install file like this:
sudo ./install auto > /tmp/logfile
Check that the EC2 instance has a CodeDeploy role: create a CodeDeploy role and assign it to the instance, https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
If you assign the EC2 role after the instance has started, restart the server.
Check the placement of your appspec.yml file as per the top answer, and try to avoid any long timeouts in it.
Log into your instance and check the error log:
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
This error is commonly due to problems in the configuration of the appspec.yml or appspec.json file (depending on the format you are using).
If you have any hooks, I recommend that you remove them and check whether the deployment works. Then add the hooks back one by one so you can identify which one causes the error.
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
    │-- install_dependencies
    │-- start_server
    └-- stop_server
In the scripts folder, place the scripts that you want to be executed for each hook.
Here is an example of the appspec.yml file
version: 0.0
os: linux
files:
  - source: /index.html
    destination: /var/www/html/
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
I hope I can help you 😃👻🕺🏾
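Before uploading a revision, the layout described above can be checked locally. The following is a rough sketch: the function name is made up, and the hook script paths are passed in by hand rather than parsed from appspec.yml, to avoid depending on a YAML library.

```python
import os
import tempfile


def appspec_problems(revision_root, hook_scripts):
    """Return a list of layout problems found in an unpacked revision directory."""
    problems = []
    if not os.path.isfile(os.path.join(revision_root, "appspec.yml")):
        problems.append("appspec.yml is not at the revision root")
    for script in hook_scripts:
        if not os.path.isfile(os.path.join(revision_root, script)):
            problems.append(f"missing hook script: {script}")
    return problems


# Demo on a throwaway directory that is missing scripts/stop_server.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "scripts"))
    for name in ("appspec.yml", "scripts/install_dependencies", "scripts/start_server"):
        open(os.path.join(root, name), "w").close()
    hooks = [
        "scripts/install_dependencies",
        "scripts/start_server",
        "scripts/stop_server",
    ]
    print(appspec_problems(root, hooks))
```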
Make sure the CodeDeploy Host Agent Service is running in your target EC2 instance.
The error you are facing is a generic message thrown when any lifecycle event fails, which could be BeforeBlockTraffic, BlockTraffic, ApplicationStop, etc.
If the first event (BeforeBlockTraffic) has failed, the first step is to check whether the CodeDeploy agent is running.
The event failure message in the deployment details tells you the exact error behind the failure.
From the failed deployments, I can see all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in Stopped state but I can see the IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile [1].
For such failures, you can always begin with a general troubleshooting checklist for a failed deployment [2] and then look for troubleshooting guides on deployment issues and instance issues [3].
[1] http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html
[2] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html
[3] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html
Check the status of the CodeDeploy agent. In my case, the agent wasn't up.
Please check the role given to the EC2 machine (where the agent is running). It should have S3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
Place your appspec.yml file in your root folder to solve this error, so that the agent can find it and run your before and after scripts.

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters run in Standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to do this:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark Master URL to spark://local1:7077 or spark://remote1:7077, and then run my code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit the job.
But I ran into a problem:
When I use the local cluster, everything goes well. Both running the code in IntelliJ IDEA and using spark-submit submit the job to the cluster, and the job finishes.
But when I use the remote cluster, I get this warning in the log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Note that it says sufficient resources, not sufficient memory!
This log line keeps printing and nothing else happens. Both spark-submit and running the code in IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit codes from IntelliJ IDEA to remote cluster?
If it's OK, does it need configuration?
What are the possible reasons that can cause my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark Master set to the local virtual machine cluster, it works. With the remote cluster, I get the Initial job has not accepted any resources;... warning instead.
I want to know whether the security policy or a firewall can cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least, there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads on the topic within the Spark developer community.
My answer here is about a portion of your post: specifically the
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The reason is typically a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Possibly when submitting from IJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the resources you requested are actually available on the cluster.
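The comparison described above, requested executor resources against what a worker offers, can be sketched as a small check. This is a simplified illustration and not how Spark's scheduler is implemented. The function names and the worker figures in the demo are hypothetical; the executor values are the ones from the settings above.

```python
import re

# Multipliers to MiB for the size suffixes handled by this sketch.
_UNITS_MIB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def to_mib(size):
    """Convert a size string such as '8g' or '512m' to MiB."""
    match = re.fullmatch(r"(\d+)([mgt])b?", size.strip().lower())
    if not match:
        raise ValueError(f"unsupported size string: {size!r}")
    return int(match.group(1)) * _UNITS_MIB[match.group(2)]


def worker_can_host(worker_cores, worker_memory, executor_cores, executor_memory):
    """True if one executor with the requested cores and memory fits on the worker."""
    return (
        executor_cores <= worker_cores
        and to_mib(executor_memory) <= to_mib(worker_memory)
    )


# Hypothetical workers; read the real figures from the Spark UI.
print(worker_can_host(4, "6g", 8, "8g"))
print(worker_can_host(8, "16g", 8, "8g"))
```

If no worker passes this kind of check, the job will keep waiting with the warning quoted above.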