Access a specific task inside a Fargate service

I'm new to AWS, so forgive me if the question is trivial.
I have a cluster running a single Fargate service with two tasks that hosts my internal API. I can access the API via the main URL and everything works.
https://<serviceid>.execute-api.us-east-1.amazonaws.com/lookupx will return the lookupx result from one of two tasks as determined by the load balancer.
I would like to get the result from each task individually. I know the ENI for each task and I know the private IPs.
What do I need to do in order to access a specific task in a call?
Why do I care? The service reads 40+ files from S3 into memory at startup and provides an endpoint to look up a value and return the corresponding data. I'd like to add an endpoint to reload a single file on demand, but I need to make sure both tasks get updated. Not my design, and I don't have the time or budget to rebuild it. I'm just looking for a better solution than restarting the tasks and reloading all 40+ files just to update one. That wasn't bad with weekly updates; it kinda sucks with daily updates.

Please note that the private IP can change after a task is restarted.
You can run an extra scheduled/on-demand task (with the same or a different task definition) that finds the service via the AWS API, gets its current tasks and their IPs, and then calls your API on each of them.
The script can be written in bash or any other supported language:
https://aws.amazon.com/developer/tools/
With bash, you can list all service tasks:
aws ecs list-tasks --cluster <clusterName> --service-name <serviceName>
and get their IPs:
aws ecs describe-tasks --cluster <clusterName> --tasks <taskARN1 taskARN2> --query 'tasks[].attachments[].details[?name==`privateIPv4Address`].value[]'
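Putting it together, a minimal sketch (assuming the containers listen on port 8080 and expose a hypothetical /reload endpoint; the cluster and service names, port and path are placeholders to adjust):
# Fill in your own cluster and service names.
CLUSTER=<clusterName>
SERVICE=<serviceName>
# Collect the ARNs of the tasks currently running in the service.
TASK_ARNS=$(aws ecs list-tasks --cluster "$CLUSTER" --service-name "$SERVICE" --query 'taskArns[]' --output text)
# Resolve each task's private IPv4 address from its ENI attachment.
IPS=$(aws ecs describe-tasks --cluster "$CLUSTER" --tasks $TASK_ARNS --query 'tasks[].attachments[].details[?name==`privateIPv4Address`].value[]' --output text)
# Call the hypothetical reload endpoint on every task; this has to run somewhere
# that can reach the private IPs, e.g. another task in the same VPC.
for ip in $IPS; do
  curl -fsS "http://${ip}:8080/reload" || echo "reload failed for ${ip}"
done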

Related

What are the best practices for Tekton implementation with multiple repositories with multiple deployments

We have multiple repositories that have multiple deployments in K8S.
Today, we have Tekton with the following setup:
We have 3 different projects that should be built and deployed the same way (they just have different repos and different names).
We defined 3 Tasks: Build Image, Deploy to S3, and Deploy to K8S cluster.
We defined 1 Pipeline that accepts parameters from the PipelineRun.
Our problem is that we want to receive webhooks externally from GitHub and run the appropriate Pipeline automatically, without the need to run it with params.
In addition, we want to be able to have the PipelineRun with default parameters, so users can invoke deployments automatically.
So: does our configuration and setup seem OK? Should we do something differently?
This sounds OK. The GitHub webhook initiates PipelineRuns of your Pipeline through a Trigger, so nobody has to pass the params by hand. But your Pipeline can also be initiated by users directly in the cluster, or by using the Tekton Dashboard.
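As a rough illustration of such a manual run, a sketch using the tkn CLI (the pipeline and parameter names here are made up, not your actual setup):
tkn pipeline start build-and-deploy \
  --param repo-url=https://github.com/my-org/service-a \
  --param image-name=service-a \
  --showlog
A Trigger does essentially the same thing, except the params come from a TriggerBinding that extracts them from the GitHub webhook payload.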

Ideal way of triggering an Airflow DAG with config

Current Structure:
I am currently deploying Airflow on our servers. I have a server dedicated to Airflow. There are also a few other worker servers, each of which has the applications needed to perform the Airflow tasks.
Usage:
For each DAG, I am using SSHOperators to run SSH commands on the worker servers to complete the tasks.
Config:
Each task needs to access a config file that contains the file paths and keyed values for the operation. The config file is likely to be slightly different for every DAG run.
I do understand that there are many ways to trigger a DAG, including:
passing a config object at run time, either via the CLI or the REST API (a short sketch follows this list)
having a config.json stored on the worker servers, and having each task load it when the task starts
saving the config information in the Airflow admin page, and accessing the config elements using XCom
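For that first option, a rough sketch (assuming Airflow 2.x; the DAG id, webserver host and credentials are placeholders):
# Trigger with config via the CLI:
airflow dags trigger my_dag --conf '{"foo": {"input_path": "/data/in"}}'
# Trigger with config via the stable REST API:
curl -X POST "http://<airflow-webserver>/api/v1/dags/my_dag/dagRuns" \
  -H "Content-Type: application/json" \
  --user "<user>:<password>" \
  -d '{"conf": {"foo": {"input_path": "/data/in"}}}'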
Concerns:
I am currently passing the config as a JSON string (2-3 KB) via the REST API, and embedding the config in the SSH bash commands:
/task/foo --do-something --config '{{ dag_run.conf["foo"] }}'
I worry that this will one day overload the Airflow database, or that someone will mistakenly send a huge config (>10 MB).
Questions:
I am wondering what the ideal way to trigger an Airflow DAG with config is. How is the dag_run config stored? Is there any garbage collection feature that will clean out the cached configs periodically?

Does Serverless, Inc ever see my AWS credentials?

I would like to start using serverless-framework to manage lambda deploys at my company, but we handle PHI so security’s tight. Our compliance director and CTO had concerns about passing our AWS key and secret to another company.
When doing a serverless deploy, do AWS credentials ever actually pass through to Serverless, Inc?
If not, can someone point me to where in the code I can prove that?
Thanks!
Running serverless deploy isn't just one call, it's many.
AWS example (oversimplification):
Check if the deployment S3 bucket already exists
Create an S3 bucket
Upload packages to the S3 bucket
Call CloudFormation
Check the CloudFormation stack status
Get info on the created resources (e.g. endpoint URLs of created APIs)
And those calls can change depending on what you are doing and what you have done before.
The point I'm trying to make is that these calls, which contain your credentials, are not all located in one place, and if you want to do a full code review of the Serverless Framework and all its dependencies, have fun with that.
But under the hood, we know that it's actually using the JavaScript aws-sdk (go check out the package.json), and we know what endpoints that uses: {service}.{region}.amazonaws.com.
So to prove to your employers that nothing containing your credentials goes anywhere except AWS, you can just run a serverless deploy with Wireshark running (other network packet analyzers are available). That way you can see anything that's not going to amazonaws.com.
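For example, one rough way to do that check (assuming a Linux box with tcpdump installed; note that hostnames already in the DNS cache won't show up, so a full packet capture in Wireshark is the more thorough option):
# Terminal 1: record every DNS lookup made while the deploy runs.
sudo tcpdump -n -i any port 53 -w dns-during-deploy.pcap
# Terminal 2: run the deploy you want to audit.
serverless deploy
# Afterwards: review which hostnames were resolved during the deploy.
tcpdump -n -r dns-during-deploy.pcap    # or open the pcap in Wireshark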
But wait, why are calls being made to serverless.com and serverlessteam.com when I run a deploy?
Well, that's just tracking some stats, and you can see what they track here. But if you are uber paranoid, this can be turned off with serverless slstats --disable.

Amazon EC2 Load Testing

I am designing an AWS deployment solution for a new dynamic website project. I have acquired an EC2 instance for testing the environment. I need some help on how to do load testing on an EC2 instance to determine how many HTTP requests it can safely handle... P.S. I am new to the AWS platform.
Thanks...
RedLine offers an EC2 Load Testing solution that will automate the distribution of load tests on your own EC2 instances.
Late to the party, but this could help someone in the future:
A possible tool for load tests, stress tests, or whatever you may call them, is Apache JMeter, but there are plenty of alternatives.
A simple starting setup, further explained in this excellent tutorial on DigitalOcean, can consist of a Thread Group containing an HTTP Request Sampler and a View Results in Table Listener. The Thread Group is used to configure the number of "clients" you want to simulate. The Request Sampler is used to configure the server's properties (hostname, path, etc.). The Table View Listener outputs a handy CSV file that can be used to calculate means, compare different types of EC2 instances, and so on.
JMeter is a beautiful program with a GUI that can be run on your local workstation, producing an XML test plan that can then be executed on another EC2 instance, for instance. You can even make simple manual edits to the XML file on your server afterward, if necessary.
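For illustration, running such a saved plan in non-GUI mode on the load-generating instance might look like this (assuming JMeter is installed there and the plan was saved as plan.jmx):
# -n = non-GUI mode, -t = test plan, -l = results file used for the analysis above
jmeter -n -t plan.jmx -l results.csv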
Take a look at Amazon's testing policy to make sure you're not doing anything illegal.
A couple of quick points:
Set the environment up exactly like it's supposed to run. If there's a database involved, you'll want to involve that in the testing too. Synthetic CPU-based benchmarks won't help you much, since normally very little of the time spent replying to HTTP requests is actual CPU time.
A recommendation is to use a service for the benchmarking. Setting up load testing is not without its complexities, and unless you consider benchmarking your core business, you're probably better off using something like Neustar to load and measure your site (there are many services; they're not necessarily the best fit for you, I just pulled one from memory).
Of course you can set a load test up yourself, but getting that done right is not anything that can be described in a few sentences. There are very well paid people that only do that for a living :)
There is good experience using curl-loader (aka the Davilka tool), including in Amazon EC2 environments:
http://curl-loader.sourceforge.net

Not able to back up the log files during instance termination issued by an Auto Scaling policy

I have EC2 instances with Auto Scaling enabled on them.
Now, as part of the scale-down policy, when one of the instances is issued a termination, the log files remaining on that instance need to be backed up to S3, but I haven't found any way to ship that instance's log files to S3. I have tried putting the needed script in the rc0.d directory through chkconfig with the highest priority. I also tried putting my script in /lib/systemd/system/halt.service (or reboot.service or poweroff.service), but no luck so far.
I have found some threads related to this on Stack Overflow and the AWS forum, but no proper solution so far.
Can anyone please let me know the solution to this problem?
The only reliable way I have found of achieving this behaviour is to use rsyslog/syslog to transfer the log files to a central host as soon as they are written to the syslog subsystem.
This means you will need to run another instance that receives the log files and ships them to S3, or use an SQS-based system such as logstash.
Unfortunately there is no other way to ensure all of your log messages will be stored on S3 - you can not guarantee that your script will finish before autoscaling "pulls the plug".
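For illustration only, a minimal client-side forwarding rule could look like the sketch below (it assumes rsyslog is in use and that a log aggregator is reachable at the placeholder hostname):
# Forward everything syslog receives to the central host over TCP
# (@@ = TCP, a single @ = UDP), so delivery does not depend on the shutdown sequence.
cat <<'EOF' | sudo tee /etc/rsyslog.d/90-forward.conf
*.* @@loghost.example.internal:514
EOF
sudo service rsyslog restart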