SignatureDoesNotMatch error while connecting to Amazon S3 through JAVA - amazon-s3

I have been getting a "SignatureDoesNotMatch" error for the last 4 hours. Please check my code and let me know what is missing:
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
s3Client.setEndpoint("***my-service-end-point");
s3Client.setRegion("my-region");
java.util.Date expiration = new java.util.Date();
long milliSeconds = expiration.getTime();
milliSeconds += 1000 * 60 * 60;
expiration.setTime(milliSeconds);
GeneratePresignedUrlRequest generatePresignedUrlRequest =
new GeneratePresignedUrlRequest(existingBucketName, keyName);
generatePresignedUrlRequest.setMethod(HttpMethod.GET);
generatePresignedUrlRequest.setExpiration(expiration);
URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
System.out.println("s3Client: " + s3Client.getBucketLocation(existingBucketName));

First, try integrating the latest AWS SDK into your code. If it still doesn't work, then I think you are facing an issue with the Signature Version 4 authentication mechanism. The Amazon documentation states:
"In the Asia Pacific (Mumbai), Asia Pacific (Seoul), EU (Frankfurt) and
China (Beijing) regions, Amazon S3 supports only Signature Version 4.
In all other regions, Amazon S3 supports both Signature Version 4 and
Signature Version 2."
Please look into these links:
http://doatt.com/2015/01/19/aws-s3-and-the-signaturedoesnotmatch-error/
http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html
http://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html
I hope this gives you enough information to solve the problem.
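If upgrading the SDK alone doesn't help, SDK 1.x also lets you force Signature Version 4 through a signer override on the client configuration. A minimal sketch, assuming the AWS SDK for Java 1.x on the classpath and credentials in the default profile (the region here is only an example; use the one that matches your bucket):
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3Client;

public class SigV4Client {
    public static void main(String[] args) {
        // Force the SigV4 signer; older 1.x SDKs may otherwise fall back to SigV2
        ClientConfiguration config = new ClientConfiguration();
        config.setSignerOverride("AWSS3V4SignerType");

        AmazonS3Client s3Client =
                new AmazonS3Client(new ProfileCredentialsProvider(), config);
        // Use the region that the bucket actually lives in, e.g. Mumbai
        s3Client.setRegion(Region.getRegion(Regions.AP_SOUTH_1));
    }
}
Presigned URLs generated from a client configured this way should then be signed with SigV4.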

Related

Why we need to setReadLimit(int) in AWS S3 Java client

I am working with the AWS Java S3 library.
This is my code, which uploads a file to S3 using the high-level AWS API:
ClientConfiguration configuration = new ClientConfiguration();
configuration.setUseGzip(true);
configuration.setConnectionTTL(1000 * 60 * 60);
AmazonS3Client amazonS3Client = new AmazonS3Client(configuration);
TransferManager transferManager = new TransferManager(amazonS3Client);
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setContentLength(message.getBodyLength());
objectMetadata.setContentType("image/jpg");
transferManager.getConfiguration().setMultipartUploadThreshold(1024 * 10);
PutObjectRequest request = new PutObjectRequest("test", "/image/test", inputStream, objectMetadata);
request.getRequestClientOptions().setReadLimit(1024 * 10);
request.setSdkClientExecutionTimeout(1000 * 60 * 60);
Upload upload = transferManager.upload(request);
upload.waitForCompletion();
I am trying to upload a large file. It works most of the time, but sometimes I get the error below. I have set readLimit to (1024 * 10).
2019-04-05 06:41:05,679 ERROR [com.demo.AwsS3TransferThread] (Aws-S3-upload) Error in saving File[media/image/osc/54/54ec3f2f-a938-473c-94b7-a55f39aac4a6.png] on S3[demo-test]: com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:255)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
What is the purpose of readLimit?
How is it useful?
What should I do to avoid this kind of exception?
After researching this for a week, I found that if the file you are uploading is smaller than 48 GB, you can set the readLimit value to about 5.01 MB. AWS splits the file into multiple parts, and each part is 5 MB (if you have not changed the minimum part size). Per the AWS specs, the last part can be smaller than 5 MB. So I set readLimit to just over 5 MB, and that solved the issue.
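The sizing rule above can be written out in plain Java: the default minimum part size is 5 MB, and the usual advice is to make readLimit at least one byte larger than the biggest chunk the SDK may need to re-read on a retry. This is an illustrative arithmetic sketch; the names are not SDK API:

```java
public class ReadLimitSizing {
    public static void main(String[] args) {
        long partSize = 5L * 1024 * 1024;   // TransferManager default minimum part size: 5 MB
        // One byte beyond a full part, so a failed part upload can always be replayed
        int readLimit = (int) (partSize + 1);
        System.out.println(readLimit);      // 5242881
        // then e.g.: request.getRequestClientOptions().setReadLimit(readLimit);
    }
}
```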
InputStream readLimit purpose:
Marks the current position in this input stream. A subsequent call to the reset method repositions this stream at the last marked position so that subsequent reads re-read the same bytes. The readLimit argument tells this input stream to allow that many bytes to be read before the mark position gets invalidated. The general contract of mark is that, if the method markSupported returns true, the stream somehow remembers all the bytes read after the call to mark and stands ready to supply those same bytes again if and whenever the method reset is called. However, the stream is not required to remember any data at all if more than readLimit bytes are read from the stream before reset is called.
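That contract is easy to see with a plain BufferedInputStream from the JDK, with no AWS SDK involved: reset() succeeds as long as no more than readLimit bytes were read since mark():

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefghij".getBytes();
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data));
        in.mark(4);               // remember this position; valid for up to 4 bytes
        in.read();                // 'a'
        in.read();                // 'b'  (2 bytes read, still within the limit)
        in.reset();               // reposition to the mark
        System.out.println((char) in.read());  // prints 'a' again
        in.close();
    }
}
```

During a part upload the SDK marks your stream before sending and resets it if the part must be retried, which is why a readLimit smaller than a part triggers the ResetException above.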

AWS S3 file upload - 1 gb file

I am trying to upload large files (less than 5 GB, hence normal upload rather than multipart) using the Java SDK. Smaller files get uploaded in no time, but files above 1 MB don't upload. My code gets stuck on the line where the actual upload happens. I tried using the transfer manager (TransferManager.upload) function; when I check the number of bytes transferred, it keeps transferring more than 1 MB and keeps running until I force-stop my Java application. What could be the reason? Where am I going wrong? The same code works for smaller files; the issue is only with larger files.
DefaultAWSCredentialsProviderChain credentialProviderChain = new DefaultAWSCredentialsProviderChain();
TransferManager tx = new TransferManager(credentialProviderChain.getCredentials());
Upload myUpload = tx.upload(S3bucket,fileKey, file);
while (myUpload.isDone() == false) {
    System.out.println("Transfer: " + myUpload.getDescription());
    System.out.println("  - State: " + myUpload.getState());
    System.out.println("  - Progress: "
            + myUpload.getProgress().getBytesTransferred());
}
s3Client.putObject(new PutObjectRequest(S3bucket, fileKey, file));
I tried both the transfer manager upload and putObject methods. Same issue with both.
TIA.

S3ResponseError: 403 Forbidden. An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist

try:
    conn = boto.connect_s3(access_key, secret_access_key)
    bucket = conn.get_bucket(bucket_name, validate=False)
    k1 = Key(bucket)
    k1.key = 'Date_Table.csv'
    # k = bucket.get_key('Date_Table.csv')
    k1.make_public()
    k1.get_contents_to_filename(tar)
except Exception as e:
    print(e)
I am getting the error:
S3ResponseError: 403 Forbidden
AccessDenied (request ID: D9ED8BFF6D6A993E, host ID: aw0KmxskATNBTDUEo3SZdwrNVolAnrt9/pkO/EGlq6X9Gxf36fQiBAWQA7dBSjBNZknMxWDG9GI=)
I have tried every possibility and still get the same error. Please guide me on how to solve this issue.
I tried another way, as below, and got the error:
An error occurred (NoSuchKey) when calling the GetObject operation:
The specified key does not exist.
session = boto3.session.Session(aws_access_key_id=access_key, aws_secret_access_key=secret_access_key,region_name='us-west-2')
print ("session:"+str(session)+"\n")
client = session.client('s3', endpoint_url=s3_url)
print ("client:"+str(client)+"\n")
stuff = client.get_object(Bucket=bucket_name, Key='Date_Table.csv')
print ("stuff:"+str(stuff)+"\n")
stuff.download_file(local_filename)
Always use boto3; boto is deprecated.
As long as you have set up AWS CLI credentials, you don't need to pass hard-coded credentials. Read the boto3 credential setup documentation thoroughly.
There is no reason to initiate a boto3.session unless you are using a different region or user profile.
Take your time and study the difference between a service client (boto3.client) and a service resource (boto3.resource).
The low-level boto3.client is easier to use for experiments. Use the high-level boto3.resource if you need to pass around arbitrary objects.
Here is simple code for boto3.client("s3").download_file:
import boto3
# initiate the proper AWS services client, i.e. S3
s3 = boto3.client("s3")
s3.download_file('your_bucket_name', 'Date_Table.csv', '/your/local/path/and/filename')

S3 path error with Flume HDFS Sink

I have a Flume consolidator which writes every entry to an S3 bucket on AWS.
The problem is with the directory path.
The events are supposed to be written to /flume/events/%y-%m-%d/%H%M, but they end up in //flume/events/%y-%m-%d/%H%M.
It seems that Flume is appending one more "/" at the beginning.
Any ideas on this issue? Is it a problem with my path configuration?
master.sources = source1
master.sinks = sink1
master.channels = channel1
master.sources.source1.type = netcat
# master.sources.source1.type = avro
master.sources.source1.bind = 0.0.0.0
master.sources.source1.port = 4555
master.sources.source1.interceptors = inter1
master.sources.source1.interceptors.inter1.type = timestamp
master.sinks.sink1.type = hdfs
master.sinks.sink1.hdfs.path = s3://KEY:SECRET#BUCKET/flume/events/%y-%m-%d/%H%M
master.sinks.sink1.hdfs.filePrefix = event
master.sinks.sink1.hdfs.round = true
master.sinks.sink1.hdfs.roundValue = 5
master.sinks.sink1.hdfs.roundUnit = minute
master.channels.channel1.type = memory
master.channels.channel1.capacity = 1000
master.channels.channel1.transactionCapacity = 100
master.sources.source1.channels = channel1
master.sinks.sink1.channel = channel1
The Flume NG HDFS sink doesn't implement anything special for S3 support. Hadoop has some built-in support for S3, but I don't know of anyone actively working on it. From what I have heard, it is somewhat out of date and may have some durability issues under failure.
That said, I know of people using it because it's "good enough".
Are you saying that "//xyz" (with multiple adjacent slashes) is a valid path name on S3? As you probably know, most Unixes collapse adjacent slashes.

How can I backup or sync an Amazon S3 bucket?

I have critical data in an Amazon S3 bucket. I want to make a weekly backup of its contents to another cloud service, or even inside S3. The best way would be to sync my bucket to a new bucket in a different region, in case of data loss.
How can I do that?
I prefer to back up locally using sync, where only changes are updated. That is not the perfect backup solution, but you can implement periodic updates later as you need:
s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/
If you have never used s3cmd, install and configure it using:
pip install s3cmd
s3cmd --configure
There should also be S3 backup services for around $5/month, but I would also check Amazon Glacier, which lets you store a single archive file of up to about 40 TB if you use multipart upload.
http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html#qfacts
Remember, if your S3 account is compromised, you could lose all of your data, since you would sync an empty folder or malformed files. So you had better write a script that archives your backup several times, e.g. by detecting the start of the week.
Update 01/17/2016:
Python based AWS CLI is very mature now.
Please use: https://github.com/aws/aws-cli
Example: aws s3 sync s3://mybucket .
This script backs up an S3 bucket:
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time
def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects
    backup_bucket_names = []
    for bucket in buckets:
        if re.search('backup-' + r'\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)
    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())
    # The buckets are sorted earliest to latest, so we want to keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return
    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + str(now.year) + '-' + str('%02d' % now.month) + '-' + str('%02d' % now.day)
    print "Creating new bucket " + new_backup_bucket_name
    connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)
    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)
        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name
            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'
        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break
        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3"
task :backup do
  S3ID = "AKIAJM3FAKEFAKENRWVQ"
  S3KEY = "0A5kuzV+F1pbaMjZxHQAZfakedeJd0dfakeNpry"
  SRCBUCKET = "primary-mzgd"
  NUM_BACKUP_BUCKETS = 2
  Dir.chdir("#{Rails.root}/lib/tasks")
  system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
The AWS CLI supports this now.
aws s3 cp s3://first-bucket-name s3://second-bucket-name --recursive
I've tried to do this in the past, and it's still annoyingly difficult, especially with large, multi-GB, many-millions-of-files buckets. The best solution I ever found was S3S3Mirror, which was made for exactly this purpose.
It's not as trivial as just flipping a switch, but it's still better than most other DIY solutions I've tried. It's multi-threaded and will copy the files much faster than similar single-threaded approaches.
One suggestion: Set it up on a separate EC2 instance, and once you run it, just shut that machine off but leave the AMI there. Then, when you need to re-run, fire the machine up again and you're all set. This is nowhere near as nice as a truly automated solution, but is manageable for monthly or weekly backups.
The best way would be to have the ability to sync my bucket with a new bucket in a different region in case of a data loss.
As of 24 Mar 2015, this is possible using the Cross-Region Replication feature of S3.
One of the listed Use-case Scenarios is "compliance requirements", which seems to match your use-case of added protection of critical data against data loss:
Although, by default, Amazon S3 stores your data across multiple geographically distant Availability Zones, compliance requirements might dictate that you store data at even further distances. Cross-region replication allows you to replicate data between distant AWS regions to satisfy these compliance requirements.
See How to Set Up Cross-Region Replication for setup instructions.