How to delete a lot of objects named with a common prefix from an S3 bucket? - amazon-s3

I have files in an S3 bucket, and their names have the following format:
username#file_id#...
How can I remove all john#doe#* items without listing them? There are thousands of them, so when a user asks my app to delete all of them, they have to wait.

For anyone who stumbles upon this now, you can create a lifecycle rule to either delete or set an expiration on files with a certain prefix.
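For instance, with boto3 (the bucket name and rule ID below are placeholders), a rule that expires everything under a prefix could look roughly like this:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-john-doe",
                "Filter": {"Prefix": "john#doe#"},
                "Status": "Enabled",
                # Matching objects are expired (deleted) one day after
                # creation; S3 applies lifecycle rules asynchronously.
                "Expiration": {"Days": 1},
            }
        ]
    },
)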

There's no way to tell S3 to delete all files that meet specific criteria - you have to delete one key at a time.
Most client libraries offer a way to filter and paginate, such that you only list the files you need to delete and can provide a status update. For example, Boto's bucket listing accepts prefix as one of its parameters.
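As a rough boto3 sketch of that approach (bucket name and prefix are placeholders): list only the keys under the user's prefix and delete them in batches of up to 1,000, which is the limit per delete_objects call.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each page holds at most 1,000 keys, so one delete_objects call per page.
for page in paginator.paginate(Bucket="my-bucket", Prefix="john#doe#"):
    objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if objects:
        s3.delete_objects(Bucket="my-bucket", Delete={"Objects": objects})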

I mistakenly created logging files in the same bucket, and now there are tons of log files in my bucket.
Luckily I came across a Node.js utility, node-s3-utils, and it saved my day!
An example of deleting files with the foo/ prefix and the .txt extension:
$ s3utils files delete -c ./.s3-credentials.json -p foo/ -r 'foo\/(\w)+\.txt'

Related

How do I create folders for each file when it comes into my S3 bucket?

Is it possible to create a folder for each file coming into my S3 bucket? Something like this simple batch script I found: if you put this .bat file in a folder and run it, it creates a folder for each file. So basically, every time a user's file syncs to my S3 bucket, it would automatically create a folder for each file.
@echo off
for %%i in (*) do (
    if not "%%~ni" == "organize" (
        md "%%~ni" && move "%%~i" "%%~ni"
    )
)
The first thing to understand is that S3 does not have folders. Everything in a bucket in S3 is a file, or UI sugar meant to replicate the visual experience of a file system. A "folder" is nothing more than a prefix ending in / (with the caveat that, when you create an empty folder, S3 creates an invisible file to have something to refer to at that prefix).
Therefore, what you're trying to accomplish is effectively just renaming files, to give each file a unique prefix. You can do this in a number of ways:
If you want it to happen as files are written, I would recommend a Lambda function that triggers when an object is written, copies it to a new object with the prefix you desire, and deletes the original. You'll need to make sure, of course, not to trigger on the files the Lambda itself writes; this can be accomplished by writing to a specific prefix (before your file-specific prefix) which you exclude from the Lambda trigger.
If you want it to happen as a batch process (like the .bat file you mention), you have many more options. You can write a Lambda function that does the renaming and hook it up to a CloudWatch Events trigger to run on a schedule, or you can just write a bash script using the AWS CLI.
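A rough Python/boto3 sketch of the event-triggered Lambda approach described above (the organized/ prefix, key-naming scheme, and handler wiring are placeholders; in practice you would configure the bucket's event notification so it does not fire for the prefix the function writes to):

import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Give each file its own "folder" named after the file (minus its
        # extension), mirroring what the batch script does locally.
        name = key.rsplit("/", 1)[-1]
        folder = name.rsplit(".", 1)[0]
        new_key = f"organized/{folder}/{name}"

        s3.copy_object(Bucket=bucket, Key=new_key,
                       CopySource={"Bucket": bucket, "Key": key})
        s3.delete_object(Bucket=bucket, Key=key)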

Adding meta-data to a folder in amazon S3

I can set the 'Cache-Control' metadata on a particular file or a particular bucket in Amazon S3.
But I want Cache-Control to be set for every file in a particular folder (not the entire bucket, only that folder).
Also, when I upload a new file to that particular folder, I want the Cache-Control header to be set automatically for the new file.
I have followed this and the S3 documentation.
Is there any way this can be achieved?
Based on another question's answer here, I found that with this tool we can run a recursive modify for all files in a particular folder, but this won't be applied to new files.
Steps:
git clone https://github.com/s3tools/s3cmd
Run s3cmd --configure
(You will be asked for the two keys - copy and paste them from your confirmation email or from your Amazon account page. Be careful when copying them! They are case sensitive and must be entered accurately or you'll keep getting errors about invalid signatures or similar. Remember to add s3:ListAllMyBuckets permissions to the keys or you will get an AccessDenied error while testing access.)
./s3cmd --recursive modify --add-header="Cache-Control: public, max-age=31536000" s3://your_bucket_name/
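Since S3 has no folder-level default headers, new uploads still need the header set at upload time (or rewritten afterwards). A rough boto3 sketch, with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# Set Cache-Control when uploading a new file into the "folder".
s3.upload_file(
    "image.jpg",
    "your_bucket_name",
    "your_folder/image.jpg",
    ExtraArgs={"CacheControl": "public, max-age=31536000"},
)

# Rewrite the header on an existing object by copying it onto itself.
# (Other metadata such as ContentType may need to be re-specified here,
# since REPLACE discards the original headers.)
s3.copy_object(
    Bucket="your_bucket_name",
    Key="your_folder/image.jpg",
    CopySource={"Bucket": "your_bucket_name", "Key": "your_folder/image.jpg"},
    CacheControl="public, max-age=31536000",
    MetadataDirective="REPLACE",
)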
@FreeFly
It seems like with the new updates to the S3 console UI, there is now an option to add metadata on a directory, and doing so will recursively apply metadata to each object within that directory.
Go to:
1) Your bucket
2) Check the object (it can be a directory)
3) Click on Select and from the dropdown menu select Edit metadata
4) Then select Add metadata and enter the metadata
5) Click on Edit metadata
This should recursively apply the metadata to all objects inside the current directory.

Migrate to Amazon S3 - Keeping my hierarchical directories?

I have a Rails 3 app with Paperclip gem.
Currently, my local directory structure is based on my records' UUIDs to store images:
5D5E5641-FCE8-4D0B-A413-A9F993CD0E34
becomes:
5/D/5/E/5/6/....... 3/4/full/image.jpg
5/D/5/E/5/6/....... 3/4/thumb/image.jpg
so that I never have more than 32,000 nodes per directory.
I want to migrate to S3:
1) Can I keep this directory structure on S3? Could it be a performance issue?
2) Does Amazon S3 have its own directory management per bucket?
Thanks.
There is no such thing as folders in Amazon S3. It is a "flat" file system. The closest you can get to folders is adding prefixes to your file names, like you said: in 5/D/image.jpg, 5 is a prefix and 5/D is also a prefix, and your delimiter would be /.
Even though several S3 tools will show you stuff as if they were contained inside folders, this concept does not exist on S3. Please see this and this related threads.
You can definitely use the pattern you suggested, and I don't think you will suffer any performance penalties by doing so.
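To make the prefix/delimiter idea concrete, here is a small boto3 illustration (bucket name and keys are made up):

import boto3

s3 = boto3.client("s3")

# The key is just a flat string; the slashes only become "folders" when
# you list with a delimiter.
s3.put_object(Bucket="my-bucket", Key="5/D/5/E/full/image.jpg", Body=b"...")

resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="5/D/", Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # e.g. ['5/D/5/']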

AWS: Append only mode for S3 bucket

Context
I want to have a machine upload a file dump.rdb to s3/blahblahblah/YEAR-MONTH-DAY-HOUR.rdb on the hour.
Thus, I need this machine to have the ability to upload new files to S3.
However, I don't want this machine to have the ability to (1) delete existing files or (2) overwrite existing files.
In a certain sense, it can only "append" -- it can only add in new objects.
Question:
Is there a way to configure an S3 setup like this?
Thanks!
I cannot comment yet, so here is a refinement to @Viccari's answer...
The answer is misleading because it only addresses #1 in your requirements, not #2. In fact, it appears that it is not possible to prevent overwriting existing files, using either method, although you can enable versioning. See here: Amazon S3 ACL for read-only and write-once access.
Because you add a timestamp to your file names, you have more or less worked around the problem. (Same would be true of other schemes to encode the "version" of each file in the file name: timestamps, UUIDs, hashes.) However, note that you are not truly protected. A bug in your code, or two uploads in the same hour, would result in an overwritten file.
Yes, it is possible.
There are two ways to add permissions to a bucket and its contents: Bucket policies and Bucket ACLs. You can achieve what you want by using bucket policies. On the other hand, Bucket ACLs do not allow you to give "create" permission without giving "delete" permission as well.
1-Bucket Policies:
You can create a bucket policy (see some common examples here) allowing, for example, a specific IP address to have specific permissions.
For example, you can allow s3:PutObject and not allow s3:DeleteObject.
More on S3 actions in bucket policies can be found here.
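As an illustration (the bucket name, path, and source IP below are placeholders), a policy in that spirit grants s3:PutObject to the uploading machine and simply never grants s3:DeleteObject:

import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUploadOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-dump-bucket/blahblahblah/*",
            # Restrict the permission to the machine's address; deletes are
            # never granted, so they remain denied by default.
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.10/32"}},
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="my-dump-bucket", Policy=json.dumps(policy)
)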
2-Bucket ACLs:
Using Bucket ACLs, you can only give the complete "write" permission, i.e. if a given user is able to add a file, he is also able to delete files.
This is NOT possible! S3 is a key/value store and thus inherently doesn't support append-only writes. A PUT (or cp) to S3 can always overwrite a file. By enabling versioning on your bucket, you are still safe in case the account uploading the files gets compromised.
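Enabling versioning is a one-liner with boto3 (bucket name is a placeholder); once enabled, an overwrite or delete leaves the previous version recoverable:

import boto3

boto3.client("s3").put_bucket_versioning(
    Bucket="my-dump-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)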

Folder won't delete on Amazon S3

I'm trying to delete a folder created as a result of a MapReduce job. Other files in the bucket delete just fine, but this folder won't delete. When I try to delete it from the console, the progress bar next to its status just stays at 0. I have made multiple attempts, including logging out and back in between them.
I had the same issue and used AWS CLI to fix it:
aws s3 rm s3://<your-bucket>/<your-folder-to-delete>/ --recursive
(this assumes you have run aws configure and aws s3 ls s3://<your-bucket>/ already works)
First and foremost, Amazon S3 doesn't actually have a native concept of folders/directories; rather it is a flat storage architecture composed of buckets and objects/keys only. The directory-style presentation seen in most tools for S3 (including the AWS Management Console itself) is based solely on convention, i.e. simulating a hierarchy for objects with identical prefixes - see my answer to How to specify an object expiration prefix that doesn't match the directory? for more details on this architecture, including quotes/references from the AWS documentation.
Accordingly, your problem might stem from a tool using a different convention for simulating this hierarchy, see for example the following answers in the AWS forums:
Ivan Moiseev's answer to the related question Cannot delete file from bucket, where he suggests using another tool to inspect whether you might have such a problem and remedy it accordingly.
The AWS team response to What are these _$folder$ objects? - This is a convention used by a number of tools including Hadoop to make directories in S3. They're primarily needed to designate empty directories. One might have preferred a more aesthetic scheme, but well that is the way that these tools do it.
Good luck!
I was getting the following error when I tried to delete a folder in a bucket that held log files from CloudFront.
An unexpected error has occurred. Please try again later.
After I disabled logging in CloudFront, I was able to delete the folder successfully.
My guess is that it was a system folder used by CloudFront that did not allow deletion by the owner.
In your case, you may want to check if MapReduce is holding on to the folder in question.
I was facing the same problem. I tried many login/logout attempts and refreshes, but the problem persisted. I searched Stack Overflow and found suggestions to cut and paste the folder into a different folder and then delete it, but that didn't work either.
Another thing you should look at is versioning, which might affect your bucket; suspending versioning may allow you to delete the folder.
My solution was to delete it with code. I used the boto package in Python for file handling over S3, and the deletion worked when I deleted that folder from my Python code.
import boto
from boto.s3.key import Key

keyId = "your_aws_access_key"
sKeyId = "your_aws_secret_key"
fileKey = "dummy/foldertodelete/"  # Key of the folder placeholder to delete
bucketName = "mybucket001"         # Name of the bucket where the key resides

conn = boto.connect_s3(keyId, sKeyId)  # Connect to S3
bucket = conn.get_bucket(bucketName)   # Get the bucket object
k = Key(bucket, fileKey)               # Build a Key for the given object
k.delete()                             # Delete it
S3 doesn't keep directories; it just has a flat file structure, so everything is managed with keys.
For you it's a folder, but for S3 it's just a key.
If you want to delete a folder named dummy,
then the key would be
fileKey = "dummy/"
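Note that deleting the folder key on its own removes only the empty placeholder object. To clear out everything stored under that prefix as well, a sketch with the same boto package and the same placeholder names:

import boto

keyId = "your_aws_access_key"
sKeyId = "your_aws_secret_key"

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket("mybucket001")

# Iterate every key under the prefix and delete each one.
for key in bucket.list(prefix="dummy/foldertodelete/"):
    key.delete()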
First, read the contents of the directory with the getBucket method; that gives you an array of all files. Then delete each file with the deleteObject method.
if (($contents = $this->S3->getBucket(AS_S3_BUCKET, "file_path")) !== false)
{
    foreach ($contents as $file)
    {
        $result = $this->S3->deleteObject(AS_S3_BUCKET, $file['name']);
    }
}
$this->S3 is an S3 class object, and AS_S3_BUCKET is the bucket name.