S3 objects have been deleted randomly - amazon-s3

Is there an AWS CLI command to restore versioned files?
I've been developing a web server using Django.
One day I found that image files were being deleted randomly from S3.
I think Django's sorl-thumbnail is deleting them, and I tried to fix that, but it failed.
So I came up with a temporary solution.
AWS S3 offers versioning, and I use it to recover the files manually every day.
This is very annoying to do, so I am writing a script.
But I could not find a way to restore a file that has a delete marker.
Has anyone dealt with the situation above?
Thank you!

Recovering objects is a bit tricky in S3. As per the AWS documentation http://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingObjects.html:
When you delete an object from a versioned bucket, S3 creates a new object called a delete marker, which has its own, new version ID.
If you delete that "version" of the object, it will restore your object's visibility.
You can use this command
aws s3api delete-object --bucket <bucket> --key <key> --version-id <version_id_of_delete_marker>
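Since you mentioned writing a script, here is a minimal sketch in Python with boto3 that removes the latest delete markers under a prefix so the underlying versions become visible again. The bucket name and prefix are placeholders, not taken from your setup.
# Sketch: remove current delete markers so the previous object versions become visible again.
# "my-bucket" and "media/" are placeholder names; adjust them to your bucket and key prefix.
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
prefix = "media/"

paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for marker in page.get("DeleteMarkers", []):
        # Only remove markers that are the current (latest) version of a key.
        if marker["IsLatest"]:
            s3.delete_object(
                Bucket=bucket,
                Key=marker["Key"],
                VersionId=marker["VersionId"],
            )
            print("Restored", marker["Key"])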

Related

Will objects with the same name uploaded to AWS S3 be overwritten?

I'm uploading images/videos to S3 using the API and putObject.
When I use the upload method of com.amazonaws.services.s3.transfer to post the same PutObjectRequest twice, will the object be overwritten by the latest one, or will AWS save the object twice with different version IDs?
I didn't find the answer in the official AWS documentation. I've checked SO, but the existing question is quite old and I don't know how current versions behave.
Yes, it will be overwritten, because versioning on S3 buckets is disabled by default. With versioning enabled, S3 would instead keep both uploads as separate versions with different version IDs.
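If you do want both uploads kept, here is a minimal sketch that checks and enables versioning. The question uses the Java SDK, so this Python/boto3 version is only an illustration of the idea, and the bucket name is a placeholder.
# Sketch: check and enable versioning on a bucket ("my-bucket" is a placeholder).
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"

status = s3.get_bucket_versioning(Bucket=bucket).get("Status", "Disabled")
print("Current versioning status:", status)

if status != "Enabled":
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )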

Upload multiple files to AWS S3 bucket without overwriting existing objects

I am very new to AWS.
I want to add some files to an existing S3 bucket without overwriting existing objects. I am using Spring Boot for my project.
Can anyone please suggest how to add/upload multiple files without overwriting existing objects?
AWS S3 supports object versioning at the bucket level: when you upload the same key again, S3 keeps each upload as a separate version rather than overwriting it.
You can enable the versioning feature using the AWS Console or the CLI.
You probably already found an answer to this, but if you're using the CDK or the CLI you can specify a destinationKeyPrefix. If you want multiple folders in an S3 bucket, which was my case, the folder name becomes your destinationKeyPrefix.
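The question is about Spring Boot, but as a language-neutral illustration of the idea, here is a minimal Python/boto3 sketch that checks whether a key already exists and, if it does, uploads under a new unique key instead of overwriting. The bucket, key, and file names are placeholders.
# Sketch: only upload under a key that does not already exist.
# "my-bucket", "uploads/report.pdf", and "report.pdf" are placeholder names.
import uuid
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-bucket"
key = "uploads/report.pdf"

def key_exists(bucket, key):
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            return False
        raise

if key_exists(bucket, key):
    # Avoid overwriting by prepending a unique prefix, similar in spirit to a destinationKeyPrefix.
    key = f"{uuid.uuid4()}/{key}"

s3.upload_file("report.pdf", bucket, key)
print("Uploaded as", key)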

Automatic sync S3 to local

I want to automatically sync my local folder with an S3 bucket. I mean, when a file changes in S3, that file should automatically be updated in the local folder.
I tried using a scheduled task and the AWS CLI, but I think there is a better way to do this.
Do you know of an app or a better solution?
I hope you can help me.
#mgg, you can mount the S3 bucket on the local server using s3fs; this way you can sync your local changes to the S3 bucket.
You could execute code (Lambda functions) that responds to events in a given bucket (such as an object being created, changed, or deleted). You could then run a simple HTTP service that receives a POST or GET request from that Lambda and updates your local data accordingly; a sketch follows the links below.
Read more:
Tutorial, Using AWS Lambda with Amazon S3
Working with Lambda Functions
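As a minimal sketch of that approach, assuming the Lambda is subscribed to the bucket's event notifications and that a hypothetical HTTP endpoint on your side applies the changes locally:
# Sketch of a Lambda handler subscribed to S3 event notifications.
# It forwards each event to a hypothetical HTTP service that updates the local copy.
import json
import urllib.request

ENDPOINT = "https://example.local/sync"  # hypothetical endpoint on your side

def handler(event, context):
    for record in event.get("Records", []):
        payload = {
            "event": record["eventName"],  # e.g. ObjectCreated:Put, ObjectRemoved:Delete
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        req = urllib.request.Request(
            ENDPOINT,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)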
The other approach (which I don't recommend) is to have some code polling for changes in the bucket and then reflecting those changes locally. At first glance it looks easier to implement, but it gets complicated when you try to handle more than just creation events.
And of course, on each cycle your polling component has to check every element in your local directory against every element in the bucket, which is a performance-killing approach!

EMRFS file sync with S3 not working

After running a spark job on an Amazon EMR cluster, I deleted the output files directly from s3 and tried to rerun the job again. I received the following error upon trying to write to parquet file format on s3 using sqlContext.write:
'bucket/folder' present in the metadata but not s3
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:455)
I tried running
emrfs sync s3://bucket/folder
which did not appear to resolve the error, even though it did remove some records from the DynamoDB table that keeps track of the metadata. I'm not sure what else to try. How do I resolve this error?
It turned out that I needed to run
emrfs delete s3://bucket/folder
first before running sync. Running the above solved the issue.
Mostly this consistency problem comes from the retry logic in Spark and Hadoop. When a process that creates a file on S3 fails after it has already updated DynamoDB, and Hadoop then restarts the process, the entry is already present in DynamoDB and the consistency error is thrown.
If you want to delete the S3 metadata stored in DynamoDB for objects that have already been removed, these are the steps:
1. Delete all the metadata. This deletes all the records for the path; emrfs delete uses a hash function to find the records, so it may also delete unwanted entries, which is why we do the import and sync in the subsequent steps.
emrfs delete s3://path
2. Retrieve the metadata for the objects that are physically present in S3 back into DynamoDB.
emrfs import s3://path
3. Sync the data between S3 and the metadata.
emrfs sync s3://path
After all these operations, to check whether a particular object is present in both S3 and the metadata:
emrfs diff s3://path
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-cli-reference.html
I arrived at this page because I was getting the error "key is marked as directory in metadata but is file in s3" and was very puzzled. I think what happened is that I accidentally created both a file and a directory with the same name. Deleting the file solved my issue:
aws s3 rm s3://bucket/directory_name_without_trailing_slash

AWS elasticbeanstalk automating deletion of logs published to S3

I have enabled publishing of logs from AWS Elastic Beanstalk to S3 by following these instructions: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.loggingS3.title.html
This is working fine. My question is: how do I automate the deletion of old logs from S3, say those over one week old? Ideally I'd like a way to configure this within AWS, but I can't find the option. I have considered using logrotate but was wondering if there is a better way. Any help is much appreciated.
I eventually discovered how to do this. You can create an S3 Lifecycle rule to delete particular files, or all files under a prefix, once they are more than N days old. Note: you can also archive instead of delete, or archive for a while before deleting, among other things; it's a great feature.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectExpiration.html
and http://docs.aws.amazon.com/AmazonS3/latest/dev/manage-lifecycle-using-console.html
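For reference, here is a minimal Python/boto3 sketch of such a rule, assuming a placeholder bucket name and a placeholder log prefix (adjust both to wherever Elastic Beanstalk publishes your logs):
# Sketch: expire objects under a prefix after 7 days using an S3 lifecycle rule.
# "my-elasticbeanstalk-logs-bucket" and "resources/environments/logs/" are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-elasticbeanstalk-logs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Filter": {"Prefix": "resources/environments/logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }
        ]
    },
)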