Accessing FlowFile content in NiFi PutS3Object Processor - amazon-s3

I am new to NiFi and want to push data from Kafka to an S3 bucket. I am using the PutS3Object processor and can push data to S3 if I hard-code the Bucket value as mphdf/orderEvent, but I want to choose the bucket based on a field in the content of the FlowFile, which is JSON. For example, if the JSON content is {"menu": {"type": "file","value": "File"}}, can I set the Bucket property to mphdf/$.menu.type? I have tried this and get the error below. Is there a way to access the FlowFile content from the PutS3Object processor and make bucket names configurable, or will I have to build my own processor?
ERROR [Timer-Driven Process Thread-10]
o.a.nifi.processors.aws.s3.PutS3Object
com.amazonaws.services.s3.model.AmazonS3Exception: The XML you
provided was not well-formed or did not validate against our
published schema (Service: Amazon S3; Status Code: 400; Error Code:
MalformedXML; Request ID: 77DF07828CBA0E5F)

I believe what you want is the EvaluateJsonPath processor, which evaluates arbitrary JsonPath expressions against the JSON content and extracts the results into FlowFile attributes. You can then reference the FlowFile attribute using NiFi Expression Language in the PutS3Object configuration (see your first property, Object Key, which already references ${filename}). In this way, you would evaluate $.menu.type and store it into an attribute menuType in the EvaluateJsonPath processor, then in PutS3Object you would set Bucket to mphdf/${menuType}.
You might have to play around with it a bit but off the top of my head I think that should work.
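For example, the flow might look roughly like this (a sketch only; menuType is just an illustrative attribute name, and property names may differ slightly between NiFi versions):

EvaluateJsonPath
    Destination: flowfile-attribute
    menuType: $.menu.type          (dynamic property: attribute name -> JsonPath expression)

PutS3Object
    Object Key: ${filename}
    Bucket: mphdf/${menuType}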

Related

Weed Filer backup errors in the log

I started a weed filer.backup process to back up all the data to an S3 bucket. A lot of logs are being generated with the error messages below. Do I need to update any config to resolve them, or can these messages be ignored?
s3_write.go:99] [persistent-backup] completeMultipartUpload buckets/persistent/BEV_Processed_Data/2011_09_30/2011_09_30/GT_BEV_Output/0000000168.png: EntityTooSmall: Your proposed upload is smaller than the minimum allowed size
Apr 21 09:20:14 worker-server-004 seaweedfs-filer-backup[3076983]: #011status code: 400, request id: 10N2S6X73QVWK78G, host id: y2dsSnf7YTtMLIQSCW1eqrgvkom3lQ5HZegDjL4MgU8KkjDG/4U83BOr6qdUtHm8S4ScxI5HwZw=
Another message
malformed xml the xml you provided was not well formed or did not validate against
This issue happens with empty files or files with small content. It looks like the AWS S3 multipart upload does not accept streaming empty files. Is there any setting on SeaweedFS that I am missing?

S3 always returns error code: NoSuchKey even with incorrect Bucket Name instead of some specific error code detailing about bucket

S3 always returns the error code NoSuchKey, i.e. both
"when the bucket name given in the request is incorrect"
and
"when the bucket name given in the request is correct but the object key is invalid".
Is there any way to have the S3 API return a specific error code stating that the bucket does not exist, instead of the generic NoSuchKey error, when an invalid bucket name is passed while requesting an object?
First, check that the S3 object URL and the requested object URL are the same. Then check that the S3 upload is handled correctly when it runs asynchronously.
There may be a GetObject request that happens before the upload has completed.
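If you also want to tell a missing bucket apart from a missing key on the client side, one option beyond the checks above (a rough sketch using the AWS SDK for Java v1; the bucket and key names are placeholders) is to verify the bucket explicitly before fetching the object:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class BucketCheck {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucketName = "my-bucket";   // placeholder bucket name

        // doesBucketExistV2 issues its own request for the bucket, so a missing
        // bucket is reported here instead of surfacing later as NoSuchKey.
        if (!s3.doesBucketExistV2(bucketName)) {
            System.out.println("Bucket does not exist: " + bucketName);
            return;
        }

        // Safe to request the object now; a failure here points at the key.
        System.out.println(s3.getObjectAsString(bucketName, "my/key.json"));
    }
}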

How to push data from AWS IoT MQTT broker to a random file in S3 bucket

I have created a rule to forward all messages published to any topic, e.g. foo/bar, of my AWS IoT Core managed MQTT broker to a nested folder in an S3 bucket. For that, I am using the key section. I can send data to a nested folder like a/b/c. The problem is that it takes c as the destination file, and this file gets updated with new data as it arrives. Is there any configuration that lets me put the data in the bucket in a new file (with any random name) as it arrives, similar to how it happens when we forward data from Firehose to S3?
You can change your key to use the newuuid() function. e.g.
a/b/${newuuid()}
This will write the data to a file in the a/b folder with a filename that is a generated UUID.
The key in the AWS IoT S3 action allows you to use the AWS IoT SQL reference functions to form the folder and filename.
The documentation for the key states:
The path to the file where the data is written. For example, if the value of this argument is "${topic()}/${timestamp()}", the topic the message was sent to is "this/is/my/topic", and the current timestamp is 1460685389, the data is written to a file called "1460685389" in the "this/is/my/topic" folder on Amazon S3.
If you don't want to use a timestamp, then you could form the name of the file using other functions such as a random float (rand()), a hash (md5()), a UUID (newuuid()), or the trace ID of the message (traceid()).
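For reference, a sketch of where that key template sits in the topic rule payload (the rule SQL, role ARN, and bucket name are placeholders; check the CreateTopicRule documentation for the exact shape):

{
  "sql": "SELECT * FROM 'foo/bar'",
  "ruleDisabled": false,
  "actions": [
    {
      "s3": {
        "roleArn": "arn:aws:iam::123456789012:role/iot-s3-role",
        "bucketName": "my-iot-bucket",
        "key": "a/b/${newuuid()}"
      }
    }
  ]
}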

S3 java SDK - set expiry to object

I am trying to upload a file to S3 and set an expiry date for it using the Java SDK.
This is the code I have:
Instant expiration = Instant.now().plus(3, ChronoUnit.DAYS);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setExpirationTime(Date.from(expiration));
metadata.setHeader("Expires", Date.from(expiration));
s3Client.putObject(bucketName, keyName, new FileInputStream(file), metadata);
The object has no expiration date on it in the S3 console.
What can I do?
Regards,
Ido
These are two unrelated things. The expiration time shown in the console is x-amz-expiration, which is populated by the system based on the bucket's lifecycle policies. It is read-only.
x-amz-expiration
Amazon S3 will return this header if an Expiration action is configured for the object as part of the bucket's lifecycle configuration. The header value includes an "expiry-date" component and a URL-encoded "rule-id" component.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectHEAD.html
Expires is a header which, when set on an object, is returned in the response when the object is downloaded.
Expires
The date and time at which the object is no longer able to be cached. For more information, go to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
It isn't possible to tell S3 when to expire (delete) a specific object -- this is only done as part of bucket lifecycle policies, as described in the User Guide under Object Lifecycle Management.
According to the documentation, the method setExpirationTime() is for internal use only and does not define the expiration time for the uploaded object:
public void setExpirationTime(Date expirationTime)
For internal use only. This will not set the object's expiration
time, and is only used to set the value in the object after receiving
the value in a response from S3.
So you can’t directly set an expiration date for a particular object. To solve this problem you can (see the sketch after the link below):
Define a lifecycle rule for the whole bucket (remove the bucket's objects after a number of days)
Define a lifecycle rule at the bucket level to remove objects with a specific tag or prefix after a number of days
To define those rules, see the documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html
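For the prefix-based variant, a minimal sketch using the AWS SDK for Java v1 (the bucket name, rule ID, prefix, and retention period are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;

public class SetLifecycleRule {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Expire (delete) everything under the "tmp/" prefix three days after creation.
        BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
                .withId("expire-tmp-after-3-days")
                .withFilter(new LifecycleFilter(new LifecyclePrefixPredicate("tmp/")))
                .withExpirationInDays(3)
                .withStatus(BucketLifecycleConfiguration.ENABLED);

        // The rule applies at the bucket level, not to a single object,
        // so it covers current and future objects matching the prefix.
        s3.setBucketLifecycleConfiguration("my-bucket",
                new BucketLifecycleConfiguration().withRules(rule));
    }
}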

How can I retrieve the name of the S3 bucket a CloudFront distribution is using?

Is there any way I can ask the CloudFront API for the name of the bucket it uses on Amazon S3?
This is possible via the GET Distribution action:
To get the information about a distribution, you do a GET on the 2012-03-15/distribution/ resource.
Have a look at the sample syntax in the Responses section, which specifically includes fragments for either S3Origin or CustomOrigin, e.g. abbreviated:
<Distribution xmlns="http://cloudfront.amazonaws.com/doc/2012-03-15/">
  <!-- ... -->
  <DistributionConfig>
    <S3Origin>
      <DNSName>myawsbucket.s3.amazonaws.com</DNSName>
      <OriginAccessIdentity>origin-access-identity/cloudfront/E127EXAMPLE51Z</OriginAccessIdentity>
    </S3Origin>
    <!-- ... -->
  </DistributionConfig>
</Distribution>
Please note that the S3Origin element is returned only if you use an Amazon S3 origin for your distribution, whereas the CustomOrigin element is returned only if you use a custom origin. For more information about the CustomOrigin and S3Origin elements, see the DistributionConfig complex type.
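If you prefer to do this programmatically, a minimal sketch using the AWS SDK for Java v1 might look like the following (the distribution ID is a placeholder; note that current API versions expose origins under an Origins list rather than the S3Origin element shown in the legacy XML above):

import com.amazonaws.services.cloudfront.AmazonCloudFront;
import com.amazonaws.services.cloudfront.AmazonCloudFrontClientBuilder;
import com.amazonaws.services.cloudfront.model.GetDistributionRequest;
import com.amazonaws.services.cloudfront.model.Origin;

public class FindDistributionOrigin {
    public static void main(String[] args) {
        AmazonCloudFront cloudFront = AmazonCloudFrontClientBuilder.defaultClient();

        // "E127EXAMPLE51Z" is a placeholder distribution ID.
        GetDistributionRequest request = new GetDistributionRequest().withId("E127EXAMPLE51Z");

        // For an S3 origin, the domain name looks like "myawsbucket.s3.amazonaws.com",
        // so the bucket name is the first label of the domain.
        for (Origin origin : cloudFront.getDistribution(request)
                .getDistribution()
                .getDistributionConfig()
                .getOrigins()
                .getItems()) {
            System.out.println(origin.getDomainName());
        }
    }
}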