How to determine the CloudWatch log stream for a Fargate service? - amazon-cloudwatch

I've got a Fargate service running and can view its CloudWatch log streams using the AWS console (navigate to the service and click on its Logs tab).
I'm looking at the AWS documentation for GetLogEvents and see that you can access the logs using the log group name and log stream name. While I know the log group name for the service, the log stream name is generated dynamically.
How do I obtain the current log stream name for the running Fargate service?
I'm checking the AmazonECSClient documentation, any pointers would be helpful.
EDIT:
I found that the log group is actually specified for the container, not the service. After retrieving the task definition for the service, I can iterate over the container definitions, each of which has a LogConfiguration section listing the Options. However, that only provides the log group and a stream prefix, not the log stream name:
- service
  - task definition
    - container definitions
      - LogConfiguration:
          LogDriver: awslogs
          Options: awslogs-group=/ecs/myservice
                   awslogs-region=us-east-1
                   awslogs-stream-prefix=ecs
EDIT 2:
I see that in the AWS Console, the link in the Logs tab does contain the log stream name. See the stream value in this sample URL:
https://us-east-1.console.aws.amazon.com/cloudwatch/home
?region=us-east-1
#logEventViewer:group=/ecs/myservice;stream=ecs/myservice/ad7246dd-bb0e-4eff-b059-767d30d40e69
How does the AWS Console obtain that value?

I finally found the format of the log stream name in the AWS documentation here:
awslogs-stream-prefix
Required: No, unless using the Fargate launch type in which case it is required.
The awslogs-stream-prefix option allows you to associate a log stream
with the specified prefix, the container name, and the ID of the Amazon
ECS task to which the container belongs. If you specify a prefix with
this option, then the log stream takes the following format:
prefix-name/container-name/ecs-task-id
Note that the ecs-task-id is the GUID portion of the task's ARN:
For this sample Task ARN:
arn:aws:ecs:us-east-1:123456789012:task/12373b3b-84c1-4398-850b-4caef9a983fc
the ecs-task-id to use for the log stream name is:
12373b3b-84c1-4398-850b-4caef9a983fc
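The format above can be turned into a small helper. This is only a minimal sketch: the function name log_stream_name is hypothetical, and it assumes the task ID is always the last "/"-separated segment of the task ARN, as in the sample ARN above.

```python
def log_stream_name(prefix: str, container_name: str, task_arn: str) -> str:
    """Build the awslogs stream name: prefix-name/container-name/ecs-task-id."""
    # The ecs-task-id is the part of the task ARN after the final "/".
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{prefix}/{container_name}/{task_id}"


if __name__ == "__main__":
    arn = "arn:aws:ecs:us-east-1:123456789012:task/12373b3b-84c1-4398-850b-4caef9a983fc"
    print(log_stream_name("ecs", "myservice", arn))
```

The prefix comes from awslogs-stream-prefix and the container name from the container definition. Together with the log group name, the result is what GetLogEvents expects. The task ARNs of a running service can be listed with the ECS ListTasks API.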

Related

How to programmatically set up Airflow 1.10 logging with localstack s3 endpoint?

In an attempt to set up airflow logging to localstack s3 buckets, for local and kubernetes dev environments, I am following the airflow documentation for logging to s3. To give a little context, localstack is a local AWS cloud stack with AWS services, including s3, running locally.
I added the following environment variables to my airflow containers, similar to this other stack overflow post, in an attempt to log to my local s3 buckets. This is what I added to docker-compose.yaml for all airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
I've also added my localstack s3 creds to airflow.cfg:
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
Additionally, I've installed apache-airflow[hooks] and apache-airflow[s3], though it's not clear from the documentation which one is really needed.
I've followed the steps in a previous stack overflow post in an attempt to verify whether the S3Hook can write to my localstack s3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test','test',bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding the credentials in the airflow console under /admin/connection/edit, a new exception is returned: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue, and it may have been related to networking.
Regardless, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering AWS credentials explicitly with boto), but it needs to work as part of airflow.
Is there a proper way to set up host / port / credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the airflow s3 hooks source code, it seems a custom s3 URL may not yet be supported by airflow. However, based on the airflow aws_hook source code (parent) it seems it should be possible to set the endpoint_url including port, and it should be read from airflow.cfg.
I am able to inspect and write to my s3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs returns the contents of the bucket from the airflow container. And aws --endpoint-url=http://localhost:4572 s3 ls returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to localstack s3 buckets from airflow running in docker, with automated setup? Is this even supported yet?
I think you're supposed to use localhost not localstack for the endpoint, e.g. host = http://localhost:4572.
In Airflow 1.10 you can override the endpoint on a per-connection basis but unfortunately it only supports one endpoint at a time so you'd be changing it for all AWS hooks using the connection. To override it, edit the relevant connection and in the "Extra" field put:
{"host": "http://localhost:4572"}
I believe this will fix it?
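Since the question asks for a programmatic setup, the "Extra" value can also be generated rather than typed in. The sketch below only builds the JSON string. The function name build_s3_connection_extra is hypothetical, the "host" key follows the answer above, and the credential keys are an assumption about what the Airflow 1.10 AWS hook reads from the Extra field.

```python
import json


def build_s3_connection_extra(endpoint_url, access_key_id=None,
                              secret_access_key=None, region_name=None):
    """Return the JSON string for a connection's "Extra" field."""
    # "host" carries the custom endpoint, e.g. the localstack s3 port.
    extra = {"host": endpoint_url}
    # Assumed optional keys; only include the ones that were provided.
    optional = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region_name,
    }
    extra.update({key: value for key, value in optional.items() if value})
    return json.dumps(extra, sort_keys=True)


if __name__ == "__main__":
    print(build_s3_connection_extra("http://localhost:4572", "foo", "bar", "us-east-1"))
```

Whether the endpoint should be localhost or localstack depends on where the Airflow process runs relative to the localstack container.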
I managed to make this work by referring to this guide. Basically, you need to create a connection using the Connection class and pass the credentials that you need. In my case, I needed AWS_SESSION_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and REGION_NAME. Use this function as the python_callable of a PythonOperator, which should be the first task of the DAG.
import os
import json
from airflow.models.connection import Connection
from airflow.exceptions import AirflowFailException


def _create_connection(**context):
    """
    Sets the connection information about the environment using the Connection
    class instead of doing it manually in the Airflow UI
    """
    AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
    AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
    AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
    REGION_NAME = os.getenv("REGION_NAME")
    credentials = [
        AWS_SESSION_TOKEN,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
        REGION_NAME,
    ]
    if not credentials or any(not credential for credential in credentials):
        raise AirflowFailException("Environment variables were not passed")
    extras = json.dumps(
        dict(
            aws_session_token=AWS_SESSION_TOKEN,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=REGION_NAME,
        ),
    )
    try:
        Connection(
            conn_id="s3_con",
            conn_type="S3",
            extra=extras,
        )
    except Exception as e:
        raise AirflowFailException(
            f"Error creating connection to Airflow :{e!r}",
        )

DMS S3 source endpoint connection fails

I am getting the connection error below when trying to validate the S3 source endpoint of DMS.
Test Endpoint failed: Application-Status: 1020912, Application-Message: Failed to connect to database.
Followed all the steps listed in the below links but still maybe I am missing something...
https://aws.amazon.com/premiumsupport/knowledge-center/dms-connection-test-fail-s3/
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.S3.html
The role associated with the endpoint does have access to the endpoint's S3 bucket, and DMS is listed as a trusted entity.
I got this same error when trying to use S3 as a target.
The one thing not mentioned in the documentation, and which turned out to be the root cause for my error, is that the DMS Replication Instance and the Bucket need to be in the same region.

"Failed to Get Framework Assemblies Local Path During Pushing Edge Package" message on Azure Stream Analytics Edge Module deployment

I had a very simple ASA edge job deployed and running on a device for 1 week, and as of last Thursday (11/07/2019), the module disappeared from my device and I can no longer add it. It returns the following message: "Failed to Get Framework Assemblies Local Path During Pushing Edge Package".
It looks like the ASA job definition is not being saved to the storage container. I tried to configure the storage account/container both manually and automatically. When I click save, it shows the operation successful message and the logs show that the job received an update, but if I open the storage account setting on the ASA job, it is not configured. If I explore the storage container, it's empty.
The storage account is configured as a blob account, hot, publicly accessible.
The region is Central (US).
It turns out it is a region bug. We created the ASA job on the East US region and it worked. Go figure...

Google Cloud Platform - Catch and log errors and automatically terminate VM

I am running a workflow on an n1-ultramem-40 instance that will run for several days. If an error occurs, I would like to catch and log the error, be notified, and automatically terminate the Virtual Machine. Could I use Stackdriver and gcloud logging to achieve this? How could I automatically terminate the VM using these tools? Thanks!
Let's break the puzzle into two parts. The first is logging an error to Stackdriver and the second is performing an external action automatically when such an error is detected.
Stackdriver provides a wide variety of language bindings and package integrations that result in log messages being written. You could include such API calls in your application which detects the error. If you don't have access to the source code of your application but it instead logs to an external file, you could use the Stackdriver agents to monitor log files and relay the log messages to Stackdriver.
Once you have the error messages being sent to Stackdriver, the next task would be defining a Stackdriver log export definition. This is the act of defining a "filter" that looks for the specific log entry message(s) that you are interested in acting upon. Associated with this export definition and filter would be a PubSub topic. A PubSub message would then be written to this topic whenever a matching Stackdriver log entry is made.
Finally, we now have the trigger to perform your action. We could use a Cloud Function triggered from a PubSub message to execute arbitrary API logic. This could be code that performs an API request to GCP to terminate the VM.
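The last step could be sketched roughly as follows. This is not a complete Cloud Function: it assumes the exported log entry arrives as base64-encoded JSON in the PubSub event's "data" field and that the entry's monitored resource is a gce_instance carrying project_id, zone and instance_id labels. The names parse_log_entry, handle_error_log and stop_instance are hypothetical, and the actual Compute Engine API call is left to the injected stop_instance callable.

```python
import base64
import json


def parse_log_entry(event):
    """Extract the VM coordinates from a PubSub-delivered log entry."""
    # PubSub delivers the exported log entry base64-encoded in "data".
    payload = base64.b64decode(event["data"]).decode("utf-8")
    entry = json.loads(payload)
    # Assumption: the entry was logged against a gce_instance resource.
    labels = entry.get("resource", {}).get("labels", {})
    return {
        "project": labels.get("project_id"),
        "zone": labels.get("zone"),
        "instance": labels.get("instance_id"),
    }


def handle_error_log(event, context=None, stop_instance=None):
    """Entry point sketch: find the VM and hand it to a terminate callable."""
    target = parse_log_entry(event)
    if stop_instance is not None:
        # stop_instance is a placeholder for the real API request to GCP.
        stop_instance(**target)
    return target
```

Keeping the terminate call injectable makes the parsing logic easy to test locally before wiring in a real Compute Engine client.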

How to create a Datalake using Apache Kafka, Amazon Glue and Amazon S3?

I want to store all the data from a Kafka topic in Amazon S3. I have a Kafka cluster that receives 200,000 messages per second in one topic, and each message value has 50 fields (strings, timestamps, integers, and floats).
My main idea is to use Kafka Connect to store the data in an S3 bucket and then use Amazon Glue to transform the data and store it in another bucket. I have the following questions:
1) How should I do it? Will that architecture work well? I tried Amazon EMR (Spark Streaming), but I had too many concerns; see "How to decrease the processing time and failed tasks using Apache Spark for events streaming from Apache Kafka?"
2) I tried to use Kafka Connect from Confluent, but I have a few questions:
Can I connect to my Kafka Cluster from other Kafka instance and
run in a standalone way my Kafka Connector s3?
What means this error "ERROR Task s3-sink-0 threw an uncaught an
unrecoverable exception"?
ERROR Task s3-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:142)
java.lang.NullPointerException
    at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:122)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:290)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:421)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:146)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2018-10-05 15:32:26,086] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:143)
[2018-10-05 15:32:27,980] WARN could not create Dir using directory from url file:/targ. skipping. (org.reflections.Reflections:104)
java.lang.NullPointerException
    at org.reflections.vfs.Vfs$DefaultUrlTypes$3.matches(Vfs.java:239)
    at org.reflections.vfs.Vfs.fromURL(Vfs.java:98)
    at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
    at org.reflections.Reflections.scan(Reflections.java:237)
    at org.reflections.Reflections.scan(Reflections.java:204)
    at org.reflections.Reflections.<init>(Reflections.java:129)
    at org.apache.kafka.connect.runtime.AbstractHerder.connectorPlugins(AbstractHerder.java:268)
    at org.apache.kafka.connect.runtime.AbstractHerder$1.run(AbstractHerder.java:377)
    at java.lang.Thread.run(Thread.java:745)
[2018-10-05 15:32:27,981] WARN could not create Vfs.Dir from url. ignoring the exception and continuing (org.reflections.Reflections:208)
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/targ]
either use fromURL(final URL url, final List urlTypes) or use the static setDefaultURLTypes(final List urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
    at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
    at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
    at org.reflections.Reflections.scan(Reflections.java:237)
    at org.reflections.Reflections.scan(Reflections.java:204)
    at org.reflections.Reflections.<init>(Reflections.java:129)
    at org.apache.kafka.connect.runtime.AbstractHerder.connectorPlugins(AbstractHerder.java:268)
    at org.apache.kafka.connect.runtime.AbstractHerder$1.run(AbstractHerder.java:377)
    at java.lang.Thread.run(Thread.java:745)
[2018-10-05 15:32:35,441] INFO Reflections took 12393 ms to scan 429 urls, producing 13521 keys and 95814 values (org.reflections.Reflections:229)
If you can resume the steps to connect to Kafka and keep on s3 from
another Kafka instance, how will you do?
What means all these fields key.converter, value.converter, key.converter.schemas.enable, value.converter.schemas.enable, internal.key.converter,internal.value.converter, internal.key.converter.schemas.enable, internal.value.converter.schemas.enable?
What are the possible values for key.converter, value.converter?
3) Once my raw data is in a bucket, I would like to use Amazon Glue to take these data, to deserialize Protobuffer, to change the format of some fields, and finally to store it in another bucket in Parquet. How can I use my own java protobuffer library in Amazon Glue?
4) If I want to query with Amazon Athena, how can I load the partitions automatically (year, month, day, hour)? With the crawlers and schedulers of Amazon Glue?
To complement @cricket_007's answer
Can I connect to my Kafka Cluster from other Kafka instance and run in a standalone way my Kafka Connector s3?
Kafka S3 Connector is part of the Confluent distribution, which also includes Kafka, as well as other related services, but it is not meant to run on your brokers directly. Rather, it runs:
- as a standalone worker running a Connector's configuration given when the service is launched
- or as an additional workers' cluster running on the side of your Kafka Brokers' cluster. In that case, interaction/running of connectors is better via the Kafka Connect REST API (Search for "Managing Kafka Connectors" for documentation with examples)
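For the REST API route, a connector is created by POSTing a JSON document with a name and a config map to the /connectors endpoint of a Connect worker. The sketch below builds such a document for the Confluent S3 sink. The function names, the flush size and the JSON format and converter choices are illustrative assumptions, not a recommendation for the Protobuf data in the question.

```python
import json
import urllib.request


def build_s3_sink_config(name, topics, bucket, region, flush_size=1000):
    """Build the JSON body for creating an S3 sink connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Assumption: values are JSON; other formats need other classes.
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            "flush.size": str(flush_size),
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


def submit_connector(connect_url, payload):
    """POST the connector definition to a Kafka Connect worker."""
    request = urllib.request.Request(
        f"{connect_url.rstrip('/')}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as submit_connector("http://localhost:8083", build_s3_sink_config("s3-sink", ["events"], "my-bucket", "us-east-1")) would register the connector, assuming a worker listens on that hypothetical address.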
If you can resume the steps to connect to Kafka and keep on s3 from
another Kafka instance, how will you do?
Are you talking about another Kafka Connect instance?
If so, you can simply execute the Kafka Connect service in distributed mode, which was meant to give the reliability you seem to be looking for...
Or do you mean another Kafka (brokers) cluster?
In that case, you could try (but that would be experimental, and I haven't tried it myself...) to run Kafka Connect in standalone mode and simply update the bootstrap.servers parameter of your connector's configuration to point to the new cluster. Why that might work: in standalone mode the offsets of your sink connector(s) are stored locally on your worker (unlike distributed mode, where the offsets are stored on the Kafka cluster directly...). Why that might not work: it's simply not intended for this use, and I'm guessing you might need your topics and partitions to be exactly the same...?
What are the possible values for key.converter, value.converter?
Check Confluent's documentation for kafka-connect-s3 ;)
How can I use my own java protobuffer library in Amazon Glue?
Not sure of the actual method, but Glue jobs spawn off an EMR cluster behind the scenes so I don't see why it shouldn't be possible...
If I want to query with Amazon Athena, how can I load the partitions automatically (year, month, day, hour)? With the crawlers and schedulers of Amazon Glue?
Yes.
Assuming daily partitioning, you could schedule the crawler to run first thing in the morning, as soon as you can expect new data to have created that day's folder on S3 (so at least one object for that day exists on S3)... The crawler will add that day's partition, which will then be available for querying along with any newly added object.
We use S3 Connect for hundreds of topics and process data using Hive, Athena, Spark, Presto, etc. Seems to work fine, though I feel like an actual database might return results faster.
In any case, to answer about Connect
Can I connect to my Kafka Cluster from other Kafka instance and run in a standalone way my Kafka Connector s3?
I'm not sure I understand the question, but Kafka Connect needs to connect to one cluster; you don't need two Kafka clusters to use it. You'd typically run Kafka Connect processes as part of their own cluster, not on the brokers.
What means this error "ERROR Task s3-sink-0 threw an uncaught an unrecoverable exception"?
It means you need to look at the logs to figure out what exception is being thrown and stopping the connector from reading data.
WARN could not create Dir using directory from url file:/targ ...
If you're using the HDFS connector, I don't think you should be using the default file:// URI
If you can resume the steps to connect to Kafka and keep on s3 from another Kafka instance, how will you do?
You can't "resume from another Kafka instance". As mentioned, Connect can only consume from a single Kafka cluster, and any consumed offsets and consumer groups are stored with it.
What means all these fields
internal.key.converter, internal.value.converter, internal.key.converter.schemas.enable, internal.value.converter.schemas.enable
These fields are removed from the latest Kafka releases, so you can ignore them. You definitely should not change them.
key.converter, value.converter
These are your serializers and deserializers, like the ones the regular producer and consumer APIs have.
key.converter.schemas.enable, value.converter.schemas.enable
I believe these are only important for JSON converters. See https://rmoff.net/2017/09/06/kafka-connect-jsondeserializer-with-schemas-enable-requires-schema-and-payload-fields
to deserialize Protobuf, to change the format of some fields, and finally to store it in another bucket in Parquet
Kafka Connect would need to be loaded with a Protobuf converter, and I don't know whether there is one (I think Blue Apron wrote something... Search github).
Generally speaking, Avro would be much easier to convert to Parquet because native libraries already exist to do that. S3 Connect by Confluent doesn't currently write Parquet format, but there is an open PR. The alternative is to use the Pinterest Secor library.
I don't know Glue, but if it's like Hive, you would use ADD JAR during a query to load external code plugins and functions.
I have minimal experience with Athena, but Glue maintains all the partitions as a Hive metastore. The automatic part would be the crawler; you can put a filter on the query to do partition pruning.
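To make the partition discussion concrete, here is a minimal sketch of a Hive-style year/month/day/hour layout and of the statement that registers one partition by hand instead of waiting for a crawler run. The table name, bucket URI and both function names are hypothetical, and it assumes the table was declared with string partition columns named year, month, day and hour.

```python
from datetime import datetime


def partition_prefix(dt):
    """Hive-style key=value path for one hourly partition."""
    return f"year={dt:%Y}/month={dt:%m}/day={dt:%d}/hour={dt:%H}"


def add_partition_ddl(table, base_uri, dt):
    """Build an ALTER TABLE statement that registers one hourly partition."""
    location = f"{base_uri.rstrip('/')}/{partition_prefix(dt)}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{dt:%Y}', month='{dt:%m}', day='{dt:%d}', hour='{dt:%H}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    when = datetime(2018, 10, 5, 15)
    print(partition_prefix(when))
    print(add_partition_ddl("events", "s3://my-bucket/events", when))
```

Queries that filter on these columns, for example WHERE year = '2018' AND month = '10', only read the matching prefixes, which is the partition pruning mentioned above.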