JClouds S3: Specify Content Length when uploading file

I wrote an application using JClouds 1.6.2 and had file upload code like
java.io.File file = ...
blobStore.putBlob(containerName,
    blobStore.blobBuilder(name)
        .payload(file)
        .calculateMD5()
        .build()
);
This worked perfectly well.
Now, in jclouds 1.7, the calculateMD5() builder method is deprecated. Furthermore, even when I calculate the MD5 hash manually (using Guava's Hashing) and pass it with contentMD5(), I get the following error:
java.lang.IllegalArgumentException: contentLength must be set, streaming not supported
So obviously, I also have to set the content length.
What is the easiest way to calculate the correct content length?
I don't believe jclouds would suddenly drop features and make uploading files much more difficult. Is there a way to let jclouds calculate the MD5 and/or content length?

You should work with Guava's ByteSource, which offers several helper methods:
ByteSource byteSource = Files.asByteSource(new File(...));
Blob blob = blobStore.blobBuilder(name)
    .payload(byteSource)
    .contentLength(byteSource.size())
    .contentMD5(byteSource.hash(Hashing.md5()).asBytes())
    .build();
blobStore.putBlob(containerName, blob);
jclouds made these changes to remove functionality duplicated by Guava and to make the costs of some operations, e.g. hashing, more obvious.

Related

Why am I getting a different asset amount using the Algorand Indexer vs the daemon?

So I created a new ASA (Algorand Standard Asset) and set the total amount of that asset to the maximum.
Here's a quick snippet of how I did it:
const UINT64_MAX: bigint = BigInt('18446744073709551615');
Now, when I check how many tokens the asset creator has with Algorand's daemon API
curl http://localhost:8980/v2/accounts/3IELQKOD...3C5IB3BP4V4A/assets
I get exactly the right amount: 18446744073709551615
But when I check it with the indexer in the SDK, it's different.
It shows the total assets as "18446744073709552000", which is not correct.
What am I doing wrong here, or is this an error in the library?
You need to set your client to use bigint or mixed integer decoding, as JavaScript numbers can only safely represent integers up to 2^53 - 1.
You can easily do this by setting the IntDecoding method used for all JSON requests created by the client.

Apache OAK Direct Binary Access with S3DataStore

I'm trying to figure out how the direct binary access feature works with Apache Oak.
My understanding so far is that I can set binary properties on nodes and, later, get a direct download link for them (from S3).
First, I created a node and added a binary property with the contents of some file.
val ntFile = session.getRootNode.addNode(path, "nt:file")
val ntResource = ntFile.addNode("jcr:content", "nt:resource")
ntResource.setProperty("jcr:mimeType", "application/octet-stream")
ntResource.setProperty("jcr:lastModified", Calendar.getInstance())
val fStream = new FileInputStream("/home/evren/cast.webm")
val bin = session.getValueFactory.asInstanceOf[JackrabbitValueFactory].createBinary(fStream)
ntResource.setProperty("jcr:data", bin)
And I can see on the AWS Console, my binary is uploaded.
But I still cannot generate a direct download URI, even when following the documentation on the Oak website. (The code continues:)
session.save()
session.refresh(false)
val binary = session.getRootNode.getNode(path)
  .getNode("jcr:content").getProperty("jcr:data").getValue.getBinary
val uri = binary.asInstanceOf[BinaryDownload].getURI(BinaryDownloadOptions.DEFAULT)
It's always returning null.
Could someone please point out what I am doing wrong, or whether my understanding is off?
Thanks in advance.
I figured it out. In case anyone else is facing the same issue, the trick is to register your BlobStore as a BlobAccessProvider using a Whiteboard.
This also explains the issue: I could upload files directly through the BlobStore, but Oak itself could not use the BlobStore's functionality to generate a direct download link.
val wb = new DefaultWhiteboard()
// register s3/azure as BlobAccessProvider
wb.register(
  classOf[BlobAccessProvider],
  blobStore.asInstanceOf[BlobAccessProvider],
  Collections.emptyMap()
)
val jcrRepo = new Jcr(nodeStore).`with`(wb).createRepository()
And once you create your JCR Repo like this, direct binary download/upload works as expected.

Is multipart copy really needed to revert an object to a prior version?

For https://github.com/wlandau/gittargets/issues/6, I am trying to programmatically revert an object in an S3 bucket to an earlier version. From reading https://docs.aws.amazon.com/AmazonS3/latest/userguide/RestoringPreviousVersions.html, it looks like copying the object to itself (old version to current version) is recommended. However, I also read that there is a 5 GB limit for copying objects in S3. Does that limit apply to reverting an object to a previous version in the same bucket? A local download followed by a multi-part upload seems extremely inefficient for this use case.
You can create a multi-part transfer request that transfers from S3 to S3. It still takes time, but it doesn't require downloading the object's data and uploading it again, so in practice it tends to be considerably faster than other options:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('example-bucket')
bucket.copy(
    {
        'Bucket': 'example-bucket',
        'Key': 'test.dat',
        'VersionId': '0011223344',  # From previous call to bucket.object_versions
    },
    Key='test.dat',
)
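If you need to look up the VersionId first, a minimal sketch using the same boto3 resource API (the bucket and key names are just the placeholders from above) could look like this:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('example-bucket')

# List the stored versions of the key and note the id of the one to restore.
for version in bucket.object_versions.filter(Prefix='test.dat'):
    print(version.id, version.last_modified)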

Using Leigh's version of S3Wrapper.cfc, can't get past init

I am new to S3 and need to use it for image storage. I found half a dozen versions of an S3 wrapper for CF, but it appears that the only one set up for signature v4 is the one modified by Leigh:
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
I dropped it into the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error:
So I changed line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh's code and changing things is a bad idea, since I have lurked here for years and know Leigh's code is solid.
Does anyone know if there are any examples of how to use this anywhere? If not, what am I doing wrong? If it makes a difference, I am using Lucee 5 and not Adobe's CF engine.
UPDATE:
I followed Leigh's directions and the error is now gone. I added some more code to my test page, which now looks like this:
<cfscript>
    s3 = createObject('component', 'com.S3v4').init(application.s3.AccessKeyId, application.s3.SecretAccessKey);
    bucket = "imgbkt.domain.com";
    obj = "fake.ping";
    region = "s3-us-west-1";
    test = s3.getObject(bucket, obj, region);
    writeDump(test);
    test2 = s3.getObjectLink(bucket, obj, region);
    writeDump(test2);
    writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj, or region, I get:
Just in case, I did go to AWS and get new keys:
Leigh, if you are still around, or anyone who has used one of the S3 wrappers: any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The link I receive from getObjectLink is not valid, and getObject never downloads an object. I thought I would try the putObject method:
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there was any additional information, and I received this:
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old, and since S3 specifically suggests using dots in bucket names I don't think it is relevant any longer. There is obviously something I am doing wrong, but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns an HTTP URL for the file fake.ping, which is found by looking in the bucket imgbkt.domain.com of region s3-us-west-1. This link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of the S3v4.cfc with the filename fake.ping by default. Finally, the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will have been downloaded completely when the function returns the file path.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large, since fileRead/fileReadBinary read the whole file into a buffer. Use fileOpen to stream the content.)
Does that help you?

Can't replace mongo document

I am attempting to save documents to a MongoDB cluster (sharded replica sets) and am having a strange issue. I am using PyMongo 2.7.2 and TokuMX 1.5 (MongoDB 2.4.10).
When I attempt to save (overwrite) existing documents, I get an exception suggesting that the document I am saving is too large:
doc = db.collection.find_one()
db.collection.save(doc)
pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"
However this works fine:
doc = db.collection.find_one()
db.collection.remove({'_id': doc['_id']})
db.collection.save(doc)
The document in question is about 9 MB, so it looks like when I attempt to replace the document its size is somehow doubled, exceeding the 16 MB limit.
Any ideas as to what could cause this behavior?
Apparently this is a known issue with TokuMX. Oplog entries are twice the size of the document, so replacing a 9 MB document results in an 18 MB oplog entry, which raises the exception.
The solution would be to limit document writes to less than 8 MB so that oplog entries never exceed 16 MB.
I think this is a side effect of how save is implemented in PyMongo.
Under the hood, if the document has an _id then save(doc) is turned into update(doc, doc). That is where the doubling comes into play, since the query plus the update is 18 MB.
When you removed the _id you changed save(doc) into an insert(doc) of a new document with a new _id. I don't think that is what you wanted.
Rather than use save, I would recommend constructing a query with just the _id field from the original document and doing the update call manually (sketched below). I would even go so far as to say you should file a Jira ticket to get PyMongo to do this for you.
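A minimal sketch of that manual update with the PyMongo 2.x API, reusing the collection from the question:
doc = db.collection.find_one()
# Query by _id only so the filter stays small; send the full document as the replacement.
db.collection.update({'_id': doc['_id']}, doc)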
HTH,
Rob.