Kafka Connect + Camel + S3 sink: keyName attribute behavior

I am trying to use the Camel S3 sink connector within Kafka Connect.
I would like to be able to configure the S3 object key based on the Kafka message key.
However, the keyName attribute doesn't behave the way I expected.
This is my connector configuration:
{
"connector.class": "org.apache.camel.kafkaconnector.awss3sink.CamelAwss3sinkSinkConnector",
"camel.kamelet.aws-s3-sink.overrideEndpoint": true,
"camel.kamelet.aws-s3-sink.uriEndpointOverride": "http://localstack:4566",
"camel.kamelet.aws-s3-sink.bucketNameOrArn": "outbox",
"camel.kamelet.aws-s3-sink.region": "eu-west-1",
"camel.kamelet.aws-s3-sink.accessKey": "test",
"camel.kamelet.aws-s3-sink.secretKey": "test",
"camel.kamelet.aws-s3-sink.keyName": "adjlaksdlaksdj",
"topics": "outbox.s3",
"value.converter": "io.debezium.converters.BinaryDataConverter",
"xxx": "xxx"
}
Whatever I put into the camel.kamelet.aws-s3-sink.keyName attribute, the key in S3 ends up as a machine-generated string like 3A63F047E2E0AE2-0000000000000006.
My understanding was that this attribute should let me configure the S3 object key, possibly using some expression language.
Is there any setting I can apply to fix this?

Related

Error exporting state transition results to S3

When I enable the distributed Map task and export the state transition history to S3, an exception is triggered with the following error message:
An error occurred while executing the state 'Map' (entered at the event id #12). Failed to write a test manifest into the specified output bucket. | Message from S3: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint
This is my ResultWriter object definition:
/// other keys omitted...
"ResultWriter": {
"Resource": "arn:aws:states:::s3:putObject",
"Parameters": {
"Bucket": "sds-qa-nv",
"Prefix": "distributed_excecutions/"
}
}
I enabled "Export Map state results to Amazon S3" to save the state transitions, and I expect the results to be written to S3 without failing.

Kafka Connect S3 Sink add MetaData

I am trying to add metadata to the output written from Kafka into the S3 bucket.
Currently, the output contains only the values of the messages from the Kafka topic.
I want each record wrapped with the following metadata: topic, timestamp, partition, offset, key, value.
Example:
{
"topic":"some-topic",
"timestamp":"some-timestamp",
"partition":"some-partition",
"offset":"some-offset",
"key":"some-key",
"value":"the-orig-value"
}
Note: when I fetch the messages through a consumer, I do get all the metadata, as I wished.
My connector configuration:
{
"name" : "test_s3_sink",
"config" : {
"connector.class" : "io.confluent.connect.s3.S3SinkConnector",
"errors.log.enable" : "true",
"errors.log.include.messages" : "true",
"flush.size" : "10000",
"format.class" : "io.confluent.connect.s3.format.json.JsonFormat",
"name" : "test_s3_sink",
"rotate.interval.ms" : "60000",
"s3.bucket.name" : "some-bucket-name",
"storage.class" : "io.confluent.connect.s3.storage.S3Storage",
"topics" : "some.topic",
"topics.dir" : "some-dir"
}
}
Thanks.
Currently, the output contains only the values of the messages from the Kafka topic.
Correct, this is the documented behavior. There is a setting for including the key data, if you want that as well, but no settings that will add the rest of the metadata.
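If you do want the keys, newer releases of the Confluent S3 sink expose properties along these lines (a hedged sketch; check the documentation of your connector version for the exact names and availability):
"store.kafka.keys" : "true",
"keys.format.class" : "io.confluent.connect.s3.format.json.JsonFormat"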
For the record timestamp, you could edit your producer code to simply add it as part of your record values (and everything else, for that matter, if you are able to query the topic's next offset every time you produce).
For topic and partition, those are part of the S3 object name, so whatever reads the files should be able to parse out that information; the starting offset is also part of the filename, and adding the line number within the file gives the (approximate) offset of each record.
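For example, with the default partitioner and the configuration above, the object keys come out roughly like this (topics.dir/topic/partition prefix, then topic+partition+startOffset in the filename; exact zero-padding and extension depend on the format class and connector version):
some-dir/some.topic/partition=0/some.topic+0+0000010000.json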
Alternatively, you can use a Connect transform such as this archive one, which relocates the Kafka record metadata (except offset and partition) into the Connect Struct value so that the sink connector writes all of it to the files:
https://github.com/jcustenborder/kafka-connect-transform-archive
Either way, ConnectRecord has no offset field; a SinkRecord does, but I think that is too late in the API for transforms to access it.
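If you try the archive transform, the additions to the connector config would look something like the following (the transform class name is taken from that repository's README, so verify it against the version you install):
"transforms" : "archive",
"transforms.archive.type" : "com.github.jcustenborder.kafka.connect.archive.Archive"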

Can't create an S3 bucket with KMS_MANAGED key and bucketKeyEnabled via CDK

I have an existing S3 bucket that uses a KMS_MANAGED key with the bucket key enabled.
I'm trying to create a bucket with this same configuration via CDK:
Bucket.Builder.create(this, "test1")
.bucketName("com.myorg.test1")
.encryption(BucketEncryption.KMS_MANAGED)
.bucketKeyEnabled(true)
.build()
But I'm getting this error:
Error: bucketKeyEnabled is specified, so 'encryption' must be set to KMS (value: MANAGED)
This seems like a bug to me, but I'm relatively new to CDK so I'm not sure. Am I doing something wrong, or is this indeed a bug?
I encountered this issue recently and found the answer. I want to share the findings here in case anyone else gets stuck.
Yes, this was a bug in the AWS CDK. The fix was merged this month: https://github.com/aws/aws-cdk/pull/22331
According to the CDK docs (https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_s3.Bucket.html#bucketkeyenabled), if bucketKeyEnabled is set to true, S3 uses its own time-limited key instead, which helps reduce KMS costs (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html); it is only relevant when encryption is set to BucketEncryption.KMS or BucketEncryption.KMS_MANAGED.
The bucketKeyEnabled flag, straight from the docs:
Specifies whether Amazon S3 should use an S3 Bucket Key with
server-side encryption using KMS (SSE-KMS)
BucketEncryption has four options:
NONE - no encryption
KMS_MANAGED - KMS key managed by AWS
S3_MANAGED - key managed by S3
KMS - key managed by the user; a new KMS key will be created
We don't need to set bucketKeyEnabled at all for any scenario. In this case, all we need is the aws/s3 key, so:
bucketKeyEnabled: need not be set (it only applies to SSE-KMS)
encryption: should be set to BucketEncryption.S3_MANAGED
Example:
const buck = new s3.Bucket(this, "my-bucket", {
bucketName: "my-test-bucket-1234",
encryption: s3.BucketEncryption.KMS_MANAGED,
});

Filter EC2 based on tag while sending EC2 Instance State-change Notification through SNS using a CloudWatch event rule

I am trying to configure an AWS CloudWatch Events rule using an event pattern. By default the pattern is:
{
"source": [
"aws.ec2"
],
"detail-type": [
"EC2 Instance State-change Notification"
]
}
I want to filter the EC2 instances based on a tag. Let's say all of my EC2 instances have a unique AppID tag attached, e.g. 20567. The reason I want to filter is that other teams have EC2 instances under the same AWS account, and I want to configure SNS only for the instances that belong to me, based on the 'App ID' tag.
As the target I have selected an SNS topic, using an input transformer with the value:
{"instance":"$.detail.instance-id","state":"$.detail.state","time":"$.time","region":"$.region","account":"$.account"}
Any suggestion on where I can pass the tag key/value to filter my EC2 instances?
I can only speak for CloudWatch Events (now called EventBridge). We do not get tag information from EC2 prior to rule matching. A sample EC2 event is shown at https://docs.aws.amazon.com/eventbridge/latest/userguide/event-types.html#ec2-event-type:
{
"id":"7bf73129-1428-4cd3-a780-95db273d1602",
"detail-type":"EC2 Instance State-change Notification",
"source":"aws.ec2",
"account":"123456789012",
"time":"2015-11-11T21:29:54Z",
"region":"us-east-1",
"resources":[
"arn:aws:ec2:us-east-1:123456789012:instance/i-abcd1111"
],
"detail":{
"instance-id":"i-abcd1111",
"state":"pending"
}
}
So your best course of action would be to fetch the tags for the resource and filter the events after receiving them.
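As a rough sketch of that approach, a consumer of the event (for example a Lambda function placed between the rule and SNS) could look up the instance's tags with the EC2 API before forwarding the notification; the tag key AppID and value 20567 below come from the question, and the function name is just an illustration:
import boto3

ec2 = boto3.client("ec2")

def is_my_instance(event, app_id="20567"):
    # The state-change event carries no tag information, so look the tags up explicitly.
    instance_id = event["detail"]["instance-id"]
    response = ec2.describe_tags(
        Filters=[
            {"Name": "resource-id", "Values": [instance_id]},
            {"Name": "key", "Values": ["AppID"]},
        ]
    )
    # Forward to SNS (or otherwise process) only when the AppID tag matches.
    return any(tag["Value"] == app_id for tag in response["Tags"])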

Accessing FlowFile content in the NiFi PutS3Object processor

I am new to NiFi and want to push data from Kafka to an S3 bucket. I am using the PutS3Object processor and can push data to S3 if I hard-code the Bucket value as mphdf/orderEvent, but I want to choose the bucket based on a field in the FlowFile content, which is JSON. So, if the JSON content is {"menu": {"type": "file","value": "File"}}, can I set the Bucket property to mphdf/$.menu.type? I tried that and get the error below. I want to know whether there is a way to access the FlowFile content from the PutS3Object processor and make bucket names configurable, or whether I will have to build my own processor.
ERROR [Timer-Driven Process Thread-10]
o.a.nifi.processors.aws.s3.PutS3Object
com.amazonaws.services.s3.model.AmazonS3Exception: The XML you
provided was not well-formed or did not validate against our
published schema (Service: Amazon S3; Status Code: 400; Error Code:
MalformedXML; Request ID: 77DF07828CBA0E5F)
I believe what you want to do is use an EvaluateJsonPath processor, which evaluates arbitrary JsonPath expressions against the JSON content and extracts the results to FlowFile attributes. You can then reference a FlowFile attribute using NiFi Expression Language in the PutS3Object configuration (see your first property, Object Key, which references ${filename}). In this way, you would evaluate $.menu.type and store it into an attribute menuType in the EvaluateJsonPath processor, and then in PutS3Object you would set Bucket to mphdf/${menuType}.
You might have to play around with it a bit but off the top of my head I think that should work.
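A minimal sketch of the two processor configurations described above (menuType is just an attribute name chosen for this example):
EvaluateJsonPath
  Destination: flowfile-attribute
  menuType: $.menu.type
PutS3Object
  Bucket: mphdf/${menuType}
  Object Key: ${filename}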