Weed Filer backup errors in the log - amazon-s3

I started a weed filer.backup process to back up all the data to an S3 bucket. A lot of log entries are being generated with the error messages below. Do I need to update any config to resolve this, or can these messages be ignored?
s3_write.go:99] [persistent-backup] completeMultipartUpload buckets/persistent/BEV_Processed_Data/2011_09_30/2011_09_30/GT_BEV_Output/0000000168.png: EntityTooSmall: Your proposed upload is smaller than the minimum allowed size
Apr 21 09:20:14 worker-server-004 seaweedfs-filer-backup[3076983]: #011status code: 400, request id: 10N2S6X73QVWK78G, host id: y2dsSnf7YTtMLIQSCW1eqrgvkom3lQ5HZegDjL4MgU8KkjDG/4U83BOr6qdUtHm8S4ScxI5HwZw=
Another message:
MalformedXML: The XML you provided was not well-formed or did not validate against our published schema
This issue happens with empty files or files with small content. It looks like the AWS S3 multipart upload does not accept streaming empty files. Is there any setting in SeaweedFS that I am missing?
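For context (and not as a SeaweedFS setting): in an S3 multipart upload every part except the last must be at least 5 MB, which is why very small or empty objects are normally written with a single PutObject rather than a multipart upload. A minimal boto3 sketch of that distinction, with made-up bucket/key names and a hypothetical helper, just to illustrate the S3-side constraint behind EntityTooSmall:

import io
import boto3

s3 = boto3.client("s3")
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last

def upload(bucket: str, key: str, data: bytes) -> None:
    # Hypothetical helper: small or empty payloads go through a single
    # PutObject, which has no minimum size; larger payloads can use the
    # managed (multipart) upload.
    if len(data) < MIN_PART_SIZE:
        s3.put_object(Bucket=bucket, Key=key, Body=data)
    else:
        s3.upload_fileobj(io.BytesIO(data), bucket, key)

Whether SeaweedFS exposes a knob for this I can't say; the sketch only shows why the same bytes succeed as a plain PutObject but fail as an undersized multipart part.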

Related

Trying to pass binary files through Logstash

Some process is producing binary files into my Kafka topic (from Java they arrive as a byte array).
I'm trying to consume from Kafka with Logstash and upload the files to S3.
My pipeline:
input {
  kafka {
    bootstrap_servers => "my-broker:9092"
    topic => "my-topic"
    partition_assignment_strategy => "org.apache.kafka.clients.consumer.StickyAssignor"
    value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
  }
}
filter {
  mutate {
    remove_field => ["#timestamp", "host"]
  }
}
output {
  s3 {
    region => "eu-west-1"
    bucket => "my_bucket"
    time_file => 1
    prefix => "files/"
    rotation_strategy => "time"
  }
}
As you can see, I used a different deserializer class. However, it seems that Logstash by default uses a codec that converts the byte array to a string. My goal is to upload the file to S3 as it is. Is there a codec that doesn't do anything to the input data and uploads it to S3 as-is?
Right now the files are uploaded to S3, but I can't read or open them. The binary content was corrupted by Logstash somehow. For example, I tried sending a gzip that contains multiple files and I can't open it afterwards in S3.
The warning that I get from Logstash:
0-06-02T10:49:29,149][WARN ][logstash.codecs.plain ][my_pipeline] Received an event that has a different character encoding than you configured. {:text=>"7z\\xBC\\xAF'\\u001C\\u0000\\u0002\\xA6j<........more binary data", :expected_charset=>"UTF-8"}
I'm not sure that Logstash is the best fit for passing binary data, and in the end I implemented a Java consumer, but the following solution worked for me with Logstash:
The data sent to Kafka must be serialized as binary data. For example, I used Filebeat to send the binary data; in its Kafka output module there is a parameter called "value_serializer", and it should be set to "org.apache.kafka.common.serialization.ByteArraySerializer".
In your Logstash settings (kafka input), set value_deserializer_class to "org.apache.kafka.common.serialization.ByteArrayDeserializer", just as I did in the post above.
Your output in Logstash can be any resource that can receive binary data (a rough sketch of this step, outside Logstash, follows below).
Be aware that the output will receive binary data and you will need to deserialize it.
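Here is that sketch. Python is used only for brevity; the broker, topic, region, bucket, and prefix come from the question, while the key naming is made up. The consumer treats the Kafka value as opaque bytes and uploads it unchanged:

import uuid

import boto3
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="my-broker:9092",
    # No value_deserializer configured, so message.value stays raw bytes.
)
s3 = boto3.client("s3", region_name="eu-west-1")

for message in consumer:
    # Upload the byte array exactly as it came off the topic.
    s3.put_object(
        Bucket="my_bucket",
        Key="files/{}".format(uuid.uuid4()),
        Body=message.value,
    )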
I don't think you really understand what Logstash is for.
As its name (log-stash) suggests, it is for streaming ASCII-type files, using an EOL delimiter to distinguish between different log events.
I did manage to find a community-developed kafkabeat for reading data from Kafka topics; there are 2 options:
kafkabeat - reads data from Kafka topics.
kafkabeat2 - reads data (json or plain) from Kafka topics.
I didn't test these myself, but using the S3 output option with them might do the trick. If the S3 option is not yet supported, you can develop it yourself and push it to the open-source project so everyone can enjoy it :-)

Accessing FlowFile content in NIFI PutS3Object Processor

I am new to NiFi and want to push data from Kafka to an S3 bucket. I am using the PutS3Object processor and can push data to S3 if I hard-code the Bucket value as mphdf/orderEvent, but I want to specify the bucket based on a field in the content of the FlowFile, which is JSON. So, if the JSON content is {"menu": {"type": "file","value": "File"}}, can I have the value for the Bucket property be mphdf/$.menu.type? I have tried this and get the error below. I want to know if there is a way to access the FlowFile content with the PutS3Object processor and make bucket names configurable, or will I have to build my own processor?
ERROR [Timer-Driven Process Thread-10]
o.a.nifi.processors.aws.s3.PutS3Object
com.amazonaws.services.s3.model.AmazonS3Exception: The XML you
provided was not well-formed or did not validate against our
published schema (Service: Amazon S3; Status Code: 400; Error Code:
MalformedXML; Request ID: 77DF07828CBA0E5F)
I believe what you want to do is use an EvaluateJSONPath processor, which evaluates arbitrary JSONPath expressions against the JSON content and extracts the results to flowfile attributes. You can then reference the flowfile attribute using NiFi Expression Language in the PutS3Object configuration (see your first property Object Key which references ${filename}). In this way, you would evaluate $.menu.type and store it into an attribute menuType in the EvaluateJSONPath processor, then in PutS3Object you would have Bucket be mphdf/${menuType}.
You might have to play around with it a bit but off the top of my head I think that should work.
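For illustration, the two processor configurations might look roughly like this (property names as they appear in the processors; menuType is just an attribute name invented for this example):
EvaluateJSONPath
  Destination: flowfile-attribute
  menuType: $.menu.type
PutS3Object
  Bucket: mphdf/${menuType}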

Amazon Redshift COPY always return S3ServiceException:Access Denied,Status 403

I'm really struggling with how to transfer data from an Amazon S3 bucket to Redshift with the COPY command.
So far, I have created an IAM user with the 'AmazonS3ReadOnlyAccess' policy assigned. But when I call the COPY command as follows, an Access Denied error is always returned.
copy my_table from 's3://s3.ap-northeast-2.amazonaws.com/mybucket/myobject' credentials 'aws_access_key_id=<...>;aws_secret_access_key=<...>' REGION 'ap-northeast-2' delimiter '|';
Error:
Amazon Invalid operation: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid EB18FDE35E1E0CAB,ExtRid ,CanRetry 1
Details: -----------------------------------------------
error: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid EB18FDE35E1E0CAB,ExtRid ,CanRetry 1
code: 8001
context: Listing bucket=s3.ap-northeast-2.amazonaws.com prefix=mybucket/myobject
query: 1311463
location: s3_utility.cpp:542
process: padbmaster [pid=4527]
-----------------------------------------------;
Can anyone give me some clues or advice?
Thanks a lot!
Remove the endpoint s3.ap-northeast-2.amazonaws.com from the S3 path:
COPY my_table
FROM 's3://mybucket/myobject'
CREDENTIALS ''
REGION 'ap-northeast-2'
DELIMITER '|'
;
(See the examples in the documentation.) While the Access Denied error is definitely misleading, the returned message gives some hint as to what went wrong:
bucket=s3.ap-northeast-2.amazonaws.com
prefix=mybucket/myobject
We'd expect to see bucket=mybucket and prefix=myobject, though.
Check the encryption of the bucket.
According to the doc: https://docs.aws.amazon.com/en_us/redshift/latest/dg/c_loading-encrypted-files.html
The COPY command automatically recognizes and loads files encrypted using SSE-S3 and SSE-KMS.
Check the kms: permissions on your key/role.
If the files come from EMR, check the security configurations for S3.
Your Redshift cluster role does not have the right to access the S3 bucket. Make sure the role you use for Redshift has access to the bucket and that the bucket does not have a policy that blocks access.
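(For example, instead of embedding keys you can attach a role to the cluster and reference it in the COPY itself, along the lines of IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-role>' in place of the CREDENTIALS clause; the ARN here is only a placeholder.)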

Amazon S3 error - A conflicting conditional operation is currently in progress against this resource.

Why do I get this error when I try to create a bucket in Amazon S3?
This error means that the bucket was recently deleted and is queued for deletion in S3. You must wait until the bucket name is available again.
Kindly note, I received this error when my access privileges were blocked.
The error means your operation for creating a new bucket in S3 was aborted.
There can be multiple reasons for this; you can check the points below to rectify the error:
Is the bucket available, or is it queued for deletion?
Do you have adequate access privileges for this operation?
Your bucket name must be unique.
P.S.: I edited this answer to add more details shared by Sanity below; his answer is more accurate, with updated information.
You can view the related errors for this operation here.
I am editing my answer so that the correct answer posted below can be selected as the correct answer to this question.
Creating an S3 bucket policy and the S3 public access block for a bucket at the same time will cause this error.
Terraform example
resource "aws_s3_bucket_policy" "allow_alb_access_bucket_elb_log" {
bucket = local.bucket_alb_log_id
policy = data.aws_iam_policy_document.allow_alb_access_bucket_elb_log.json
}
resource "aws_s3_bucket_public_access_block" "lb_log" {
bucket = local.bucket_alb_log_id
block_public_acls = true
block_public_policy = true
}
Solution
resource "aws_s3_bucket_public_access_block" "lb_log" {
bucket = local.bucket_alb_log_id
block_public_acls = true
block_public_policy = true
#--------------------------------------------------------------------------------
# To avoid OperationAborted: A conflicting conditional operation is currently in progress
#--------------------------------------------------------------------------------
depends_on = [
aws_s3_bucket_policy.allow_alb_access_bucket_elb_log
]
}
We have also observed this error several times when trying to move a bucket from one account to another. In order to achieve this you should do the following:
Back up the content of the S3 bucket you want to move.
Delete the S3 bucket in the account.
Wait for 1-2 hours.
Create a bucket with the same name in the other account.
Restore the S3 bucket backup.
I received this error running a terraform apply:
Error: error creating public access block policy for S3 bucket
(bucket-name): OperationAborted: A conflicting
conditional operation is currently in progress against this resource.
Please try again.
status code: 409, request id: 30B386F1FAA8AB9C, host id: M8flEj6+ncWr0174ftzHd74CXBjhlY8Ys70vTyORaAGWA2rkKqY6pUECtAbouqycbAZs4Imny/c=
It said to "please try again" which I did and it worked the second time. It seems there wasn't enough wait time when provisioning the initial resource with Terraform.
To fully resolve this error, I inserted a 5-second sleep between the requests. There was nothing else I had to do.
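The same retry idea can also be put into whatever is making the calls. A minimal boto3 sketch, assuming the retry count and the 5-second wait, and reusing the bucket-name placeholder from the error above; it simply retries the public access block call while S3 reports OperationAborted:

import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for attempt in range(5):
    try:
        s3.put_public_access_block(
            Bucket="bucket-name",  # placeholder from the error above
            PublicAccessBlockConfiguration={
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        )
        break
    except ClientError as err:
        if err.response["Error"]["Code"] != "OperationAborted":
            raise
        time.sleep(5)  # give the conflicting operation time to finish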

Authorization header is invalid -- one and only one ' ' (space) required - Amazon S3

Trying to precompile my assets in a Rails app and sync them with Amazon S3 storage, I get this message. Any feedback appreciated:
Expected(200) <=> Actual(400 Bad Request)
response => #<Excon::Response:0x00000007c45a98 #data={:body=>"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidArgument</Code><Message>Authorization header is invalid -- one and only one ' ' (space) required</Message><ArgumentValue>AWS [\"AKIAINSIQYCZLWYSROWQ\", \"7RAxhY5nLkbACICMqjDlee5pCaEhf4LKgSpJ+R9k\"]:LakbTXVMX6I72MViNie/fe+79qU=</ArgumentValue><ArgumentName>Authorization</ArgumentName><RequestId>250C76936044E6D5</RequestId><HostId>j2jK/dv0xTnNddtSFHuVicGv5wWjXl4zXuhOyPcO6+2WWlAYWSkn0CHPwdtnOPet</HostId></Error>", :headers=>{"x-amz-request-id"=>"250C76936044E6D5", "x-amz-id-2"=>"j2jK/dv0xTnNddtSFHuVicGv5wWjXl4zXuhOyPcO6+2WWlAYWSkn0CHPwdtnOPet", "Content-Type"=>"application/xml", "Transfer-Encoding"=>"chunked", "Date"=>"Tue, 20 Aug 2013 13:28:36 GMT", "Connection"=>"close", "Server"=>"AmazonS3"}, :status=>400, :remote_ip=>"205.251.235.165"}, #body="<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidArgument</Code><Message>Authorization header is invalid -- one and only one ' ' (space) required</Message><ArgumentValue>AWS [\"AKIAINSIQYCZLWYSROWQ\", \"7RAxhY5nLkbACICMqjDlee5pCaEhf4LKgSpJ+R9k\"]:LakbTXVMX6I72MViNie/fe+79qU=</ArgumentValue><ArgumentName>Authorization</ArgumentName><RequestId>250C76936044E6D5</RequestId><HostId>j2jK/dv0xTnNddtSFHuVicGv5wWjXl4zXuhOyPcO6+2WWlAYWSkn0CHPwdtnOPet</HostId></Error>", #headers={"x-amz-request-id"=>"250C76936044E6D5", "x-amz-id-2"=>"j2jK/dv0xTnNddtSFHuVicGv5wWjXl4zXuhOyPcO6+2WWlAYWSkn0CHPwdtnOPet", "Content-Type"=>"application/xml", "Transfer-Encoding"=>"chunked", "Date"=>"Tue, 20 Aug 2013 13:28:36 GMT", "Connection"=>"close", "Server"=>"AmazonS3"}, #status=400, #remote_ip="205.251.235.165">
I have had an error with the same message twice now, and both times it was due to pasting an extra space at the end of the access key or secret key in the config file.
Check where you're setting the aws_access_key_id to use with your asset syncer.
This should be something that looks like AKIAINSIQYCZLWYSROWQ, whereas it looks like you've set it to a 2-element array containing both your access key id and the secret access key.
Furthermore, given that you've now placed those credentials in the public domain you should revoke them immediately.
An extra space at the end of the access key is one issue, and the reason is that copying from the Amazon IAM UI adds the extra space.
The other issue is when the configuration in the ~/.aws/credentials file, or other configuration, conflicts with environment values. This happened to me when configuring CircleCI and Docker machines.
This error also happens if you haven't enabled GET/POST in CloudFront and try to send GET/POST requests to an API hosted behind CloudFront.
Error 400 occurs in more than 20 cases. Here is a PDF that describes all the errors: List of AWS S3 Error Codes