I am trying to add new metadata to an S3 object so I can tell which objects I have already processed, but for some reason the new data I add is not persistent: when I exit the program, it is not there anymore. I do not see the new field "processed" in the old metadata, but I do see it in the new metadata. I expect the newly added metadata field to be present on the object permanently, yet it is gone after I exit the program.
// Fetch the object's current metadata
ObjectMetadata objMetaData = s3.getObjectMetadata(bucketName, prefix);
Map<String, String> map = objMetaData.getUserMetadata();
System.out.println("old Meta data is " + map.toString());

// Add the new user metadata entry (locally)
objMetaData.addUserMetadata("x-amz-meta-processed", "true");
Map<String, String> newMap = objMetaData.getUserMetadata();
System.out.println("New processed data is " + newMap.toString());
I suspect your confusion comes from not understanding an important part of the design of S3:
S3 objects can't be modified, and neither can their metadata. It's all immutable.
Wait, what? It's technically true.
The only way to modify object metadata is to make a copy of the object and set the metadata.
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
You can, in practice, "add" metadata to an object, but what that really means is that you're asking S3 to make a copy of the object, with the same key for the source and target, but using "different" metadata.
If the "different" metadata you want on the object includes metadata that was already present, you have to include that in the request.
S3 supports copying an object onto itself, so you don't actually have to re-upload the object.
What you are doing now is just changing the values in your local code's data structures.
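To actually persist the new value, you have to send that metadata back to S3 as part of a copy request. Here is a minimal sketch using the AWS SDK for Java v1 (the same client and variable names as in the question); note that with this SDK the user-metadata key should not include the x-amz-meta- prefix, because the SDK adds it for you:
// Read the current metadata, add the new key locally, then copy the object onto itself
ObjectMetadata metadata = s3.getObjectMetadata(bucketName, prefix);
metadata.addUserMetadata("processed", "true"); // stored as x-amz-meta-processed

CopyObjectRequest copyRequest = new CopyObjectRequest(bucketName, prefix, bucketName, prefix)
        .withNewObjectMetadata(metadata); // replaces the object's metadata with this set
s3.copyObject(copyRequest);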
I'm trying to figure out how the direct binary access feature works with Apache Oak.
My understanding so far is that I can set binary properties on nodes and, later, I should be able to get a direct download link (from S3).
First, I created a node and added a binary property with the contents of some file.
val ntFile = session.getRootNode.addNode(path, "nt:file")
val ntResource = ntFile.addNode("jcr:content", "nt:resource")
ntResource.setProperty("jcr:mimeType", "application/octet-stream")
ntResource.setProperty("jcr:lastModified", Calendar.getInstance())
val fStream = new FileInputStream("/home/evren/cast.webm")
val bin = session.getValueFactory.asInstanceOf[JackrabbitValueFactory].createBinary(fStream)
ntResource.setProperty("jcr:data", bin)
And I can see on the AWS Console that my binary is uploaded.
But I still cannot generate a direct download URI, even following the documentation on the Oak website. (So the code continues:)
session.save()
session.refresh(false)
val binary = session.getRootNode.getNode(path)
.getNode("jcr:content").getProperty("jcr:data").getValue.getBinary
val uri = binary.asInstanceOf[BinaryDownload].getURI(BinaryDownloadOptions.DEFAULT)
It's always returning null.
Could someone please point out what I am doing wrong, or whether my understanding is off?
Thanks in advance.
I figured it out. In case anyone else is facing the same issue, the trick is to register your BlobStore as a BlobAccessProvider using a Whiteboard.
This also explains the issue: I could upload files directly through the blob storage, but Oak itself could not use the BlobStore functionality to generate a direct download link.
val wb = new DefaultWhiteboard()
// register s3/azure as BlobAccessProvider
wb.register(
  classOf[BlobAccessProvider],
  blobStore.asInstanceOf[BlobAccessProvider],
  Collections.emptyMap()
)
val jcrRepo = new Jcr(nodeStore).`with`(wb).createRepository()
And once you create your JCR Repo like this, direct binary download/upload works as expected.
I am trying to update the existing metadata of my S3 object, but instead of updating it, it creates a new one. I am doing it the same way the documentation shows, but I don't know why it is not able to update it.
k = s3.head_object(Bucket='test-bucket', Key='test.json')
s3.copy_object(
    Bucket='test-bucket',
    Key='test.json',
    CopySource='test-bucket' + '/' + 'test.json',
    Metadata={'Content-Type': 'text/plain'},
    MetadataDirective='REPLACE'
)
I was able to update it using the copy_from method:
import boto3

s3 = boto3.resource('s3')
object = s3.Object(bucketName, uploadedKey)
object.copy_from(
    CopySource={'Bucket': bucketName, 'Key': uploadedKey},
    MetadataDirective="REPLACE",
    ContentType=value
)
S3 metadata is effectively read-only, so updating only the metadata of an existing S3 object is not possible. The only way to update the metadata is to recreate/copy the object. Check the first paragraph of the official docs:
You can set object metadata at the time you upload it. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata.
When I try to serialize an object containing a LocalDate, I get the following error:
csv generator does not support object values for properties
I have the JSR-310 module enabled, with WRITE_DATES_AS_TIMESTAMPS, and I can convert the same object to JSON without a problem.
For now I have resorted to mapping the object to another, string-only object, but it's decadent and wasteful.
Is there a way for the Jackson CSV mapper to acknowledge LocalDates? Should I somehow enable JSR-310 specifically for the CSV mapper?
I had the same problem because I was configuring the mapper after creating the schema. Make sure you are using the latest version of Jackson and its modules. This code works for me:
final CsvMapper mapper = new CsvMapper();
mapper.findAndRegisterModules();
mapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false); //Optional
final CsvSchema schema = mapper.schemaFor(PojoWithLocalDate.class);
// Use this mapper and schema as you need to: get readers, writers etc.
No additional annotations are needed in the POJO class.
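For reference, here is a minimal end-to-end sketch under the same assumptions (jackson-dataformat-csv and jackson-datatype-jsr310 on the classpath, so findAndRegisterModules() picks up the JSR-310 module); the PojoWithLocalDate class and its fields are hypothetical:
// Hypothetical POJO with a java.time.LocalDate field
public class PojoWithLocalDate {
    public String name;
    public LocalDate date;
}

CsvMapper mapper = new CsvMapper();
mapper.findAndRegisterModules();
mapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false);
CsvSchema schema = mapper.schemaFor(PojoWithLocalDate.class).withHeader();

PojoWithLocalDate row = new PojoWithLocalDate();
row.name = "example";
row.date = LocalDate.now();
String csv = mapper.writer(schema).writeValueAsString(row); // the LocalDate is written as an ISO-8601 string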
I am using the Microsoft Azure .NET client libraries to interact with Azure cloud storage. I need to be able to access additional information about each blob from its metadata collection. I am currently using the CloudBlobDirectory.ListBlobs() method to get a list of blobs in a particular directory of a directory structure I've devised in the blob names. The ListBlobs() method returns a list of IListBlobItem objects, which only have a couple of properties: Url and references to the parent directory and parent container. I need to get to the metadata of the actual blob objects.
I envisioned there would be a way to either cast the IListBlobItem to a BlockBlob object or use the IListBlobItem to get a reference to the BlockBlob, but I can't seem to find a way to do that.
My question is: Is there a way to get a BlockBlob object from this method, or do I have to use a different way of getting the actual BlockBlob objects? If different, can you suggest a way to achieve this while still being able to filter by the "directory" scheme?
OK... I found a way to do this, and while it seems a little clunky and indirect, it does achieve the main thing I thought should be doable, which is to cast the IListBlobItem directly to a CloudBlockBlob object.
What I am doing is getting the list from the directory object's ListBlobs() method, looping over each item in the list, casting the item to a CloudBlockBlob object, and then calling the FetchAttributes() method to retrieve the properties (including the metadata). Then I add a new "info" object to a new list of info objects. Here's the code I'm using:
CloudBlobDirectory dir = container.GetDirectoryReference(dirPath);
var blobs = dir.ListBlobs(true);
foreach (IListBlobItem item in blobs)
{
    CloudBlockBlob blob = (CloudBlockBlob)item;
    blob.FetchAttributes();
    files.Add(new ImageInfo
    {
        FileUrl = item.Uri.ToString(),
        FileName = item.Uri.PathAndQuery.Replace(restaurantId.ToString().PadLeft(3, '0') + "/", ""),
        ImageName = blob.Metadata["Name"]
    });
}
The whole "Blob" concept seems needlessly complex and doesn't seem to achieve what I'd have thought would have been one of the main features of the Blob wrapper. That is, a way to expand search capabilities by allowing a query over name, directory, container and metadata. I'd have thought you could construct a linq query that would read somewhat like: "return a list of all blobs in the 'images' container, that are in the 'natural/landscapes/' directory path that have a metadata key of 'category' with the value of 'sunset'". There doesn't seem to be a way to do that and that seems to be a missed opportunity to me. Oh, well.
If I'm wrong and way off base here, please let me know.
This approach was developed for Java, but I hope it can be adapted to any other supported language. Although the functionality you ask for has not been explicitly developed yet, I think I found a different (hopefully less clunky) way to access CloudBlockBlob data from a ListBlobItem element.
The following code can be used to delete, for example, every blob inside a specific directory.
CloudBlobClient blobClient = /* Obtain your blob client */
try {
    CloudBlobContainer container = /* Obtain your blob container */
    for (ListBlobItem blobItem : container.listBlobs(blobPrefix)) {
        if (blobItem instanceof CloudBlob) {
            CloudBlob blob = (CloudBlob) blobItem;
            if (blob.exists()) {
                System.out.println("Deleting blob " + blob.getName());
                blob.delete();
            }
        }
    }
} catch (URISyntaxException | StorageException ex) {
    Logger.getLogger(BlobOperations.class.getName()).log(Level.SEVERE, null, ex);
}
The previous answers are good. I just wanted to point out 2 things:
1) Nowadays async programming is recommended and is supported by the Azure SDK as well, so try to use it:
CloudBlobDirectory dir = container.GetDirectoryReference(dirPath);
var blobs = dir.ListBlobs(true);
foreach (IListBlobItem item in blobs)
{
    CloudBlockBlob blob = (CloudBlockBlob)item;
    await blob.FetchAttributesAsync(); // Use async calls...
}
2) Fetching metadata in a separate call is not efficient; the code above makes 2 HTTP requests per blob object. The ListBlobs() method can also return metadata in a single call by setting the BlobListingDetails parameter:
CloudBlobDirectory dir = container.GetDirectoryReference(dirPath);
var blobs = dir.ListBlobs(useFlatBlobListing: true, blobListingDetails: BlobListingDetails.Metadata);
I recommend using the second snippet if possible, since it is the most efficient way to fetch metadata.
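If you are on the Java side (as in the earlier answer), the legacy com.microsoft.azure.storage SDK has a comparable overload; a rough sketch, assuming container is the same CloudBlobContainer as before:
// Flat listing that asks the service to include metadata in the response,
// so no per-blob downloadAttributes()/FetchAttributes() call is needed
for (ListBlobItem item : container.listBlobs(
        null,                                     // no prefix
        true,                                     // flat listing
        EnumSet.of(BlobListingDetails.METADATA),  // include metadata
        null, null)) {
    if (item instanceof CloudBlockBlob) {
        CloudBlockBlob blob = (CloudBlockBlob) item;
        System.out.println(blob.getName() + " -> " + blob.getMetadata());
    }
}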
How do you rename an S3 key in a bucket with boto?
You can't rename files in Amazon S3. You can copy them with a new name, then delete the original, but there's no proper rename function.
Here is an example of a Python function that will copy an S3 object using Boto 2:
import boto

def copy_object(src_bucket_name,
                src_key_name,
                dst_bucket_name,
                dst_key_name,
                metadata=None,
                preserve_acl=True):
    """
    Copy an existing object to another location.

    src_bucket_name   Bucket containing the existing object.
    src_key_name      Name of the existing object.
    dst_bucket_name   Bucket to which the object is being copied.
    dst_key_name      The name of the new object.
    metadata          A dict containing new metadata that you want
                      to associate with this object. If this is None,
                      the metadata of the original object will be
                      copied to the new object.
    preserve_acl      If True, the ACL from the original object
                      will be copied to the new object. If False,
                      the new object will have the default ACL.
    """
    s3 = boto.connect_s3()
    bucket = s3.lookup(src_bucket_name)

    # Look up the existing object in S3
    key = bucket.lookup(src_key_name)

    # Copy the key to the destination, carrying over or replacing the metadata
    return key.copy(dst_bucket_name, dst_key_name,
                    metadata=metadata, preserve_acl=preserve_acl)
There is no direct method to rename a file in S3. What you have to do is copy the existing file with a new name (just set the target key) and delete the old one. Thank you
// Copy the object
AmazonS3Client s3 = new AmazonS3Client("AWSAccesKey", "AWSSecretKey");
CopyObjectRequest copyRequest = new CopyObjectRequest()
    .WithSourceBucket("SourceBucket")
    .WithSourceKey("SourceKey")
    .WithDestinationBucket("DestinationBucket")
    .WithDestinationKey("DestinationKey")
    .WithCannedACL(S3CannedACL.PublicRead);
s3.CopyObject(copyRequest);

// Delete the original
DeleteObjectRequest deleteRequest = new DeleteObjectRequest()
    .WithBucketName("SourceBucket")
    .WithKey("SourceKey");
s3.DeleteObject(deleteRequest);
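For completeness, the same copy-then-delete "rename" is only a couple of lines with the AWS SDK for Java v1 as well; a minimal sketch with placeholder bucket and key names:
// "Rename" = copy the object to the new key, then delete the old key
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
s3.copyObject("my-bucket", "old/key.txt", "my-bucket", "new/key.txt");
s3.deleteObject("my-bucket", "old/key.txt");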