aws s3 search sub-folders containing one specific file - amazon-s3

I understand that S3 does not have "folders", but I will still use the term to illustrate what I am looking for.
I have this folder structure in s3:
my-bucket/folder-1/file-named-a
my-bucket/folder-2/...
my-bucket/folder-3/file-named-a
my-bucket/folder-4/...
I would like to find all folders containing "file-named-a", so folder-1 and folder-3 in the example above would be returned. I only need to search the "top-level" folders under my-bucket. There could be tens of thousands of folders to search. How do I construct the ListObjectsRequest to do that?
Thanks,
Sam

The contents of an Amazon S3 bucket can be listed with the ListObjects API call (which requires the s3:ListBucket permission), and the call can be limited by a Prefix. However, it is not possible to put a wildcard within the prefix.
Therefore, you would need to retrieve the entire bucket listing and look through it for these files. Each call returns at most 1,000 keys, so this requires repeated calls if there are a large number of objects.
Example: Listing Keys Using the AWS SDK for Java
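As a minimal sketch of the filtering step described above, assuming the complete list of keys has already been collected through repeated listing calls. The function name and the sample keys are illustrative and not part of any AWS SDK:

```python
from typing import Iterable, List


def top_level_folders_containing(keys: Iterable[str], file_name: str) -> List[str]:
    """Return the top-level 'folders' that directly contain file_name."""
    matches = set()
    for key in keys:
        parts = key.split("/")
        # Only keys shaped exactly like "<folder>/<file_name>" count;
        # deeper nesting such as "<folder>/x/<file_name>" is ignored.
        if len(parts) == 2 and parts[0] and parts[1] == file_name:
            matches.add(parts[0])
    return sorted(matches)


# Hypothetical keys standing in for a complete bucket listing.
sample_keys = [
    "folder-1/file-named-a",
    "folder-2/other-file",
    "folder-3/file-named-a",
    "folder-4/nested/file-named-a",
]
print(top_level_folders_containing(sample_keys, "file-named-a"))
# prints ['folder-1', 'folder-3']
```

Because the function accepts any iterable, keys could be fed in page by page rather than held in memory all at once.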

Related

How is it possible to have folders in object storage? [duplicate]

As per my understanding, object storage has a 'flat' structure so you cannot create folders within buckets. However, in both GCP & AWS, I am able to upload regular folders to the buckets, which also look like regular folders on their web UI console. What is the difference between the folders I am seeing on these buckets, and the folders which are there in a file-storage system (like my personal laptop)?
As far as I know Object Storage has a 'flat' structure so you cannot create folders within buckets, nor can you nest buckets in buckets.
If you need to have some form of 'folder'-like structure, then using prefixes is the way to go. You'll then end up with this structure: {endpoint}/{bucket-name}/{object-prefix}/{object-name}.
That is what you are seeing, as far as I can tell.
Amazon S3 has a flat structure instead of a hierarchy as you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. It does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.
For example, you can create a folder on the console named photos and store an object named myphoto.jpg in it. The object is then stored with the key name photos/myphoto.jpg, where photos/ is the prefix.
Here are two more examples:
If you have three objects in your bucket (logs/date1.txt, logs/date2.txt, and logs/date3.txt), the console will show a folder named logs. If you open the folder in the console, you will see three objects: date1.txt, date2.txt, and date3.txt.
If you have an object named photos/2017/example.jpg, the console will show you a folder named photos containing the folder 2017. The folder 2017 will contain the object example.jpg.
When you create a folder in Amazon S3, S3 creates a 0-byte object with a key that's set to the folder name that you provided. For example, if you create a folder named photos in your bucket, the Amazon S3 console creates a 0-byte object with the key photos/. The console creates this object to support the idea of folders.
You can read more in the Amazon S3 user guide.
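The grouping described above can be imitated locally. The following is a rough sketch, assuming a plain list of key names. It makes no S3 calls, and list_level is a hypothetical helper name rather than an SDK function:

```python
from typing import Iterable, List, Tuple


def list_level(keys: Iterable[str], prefix: str = "",
               delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Split the keys under prefix into (folders, objects), console-style."""
    folders = set()
    objects = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself, e.g. the 0-byte "photos/"
            # placeholder; this sketch hides it from the folder's contents.
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so show one folder entry.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(folders), sorted(objects)


keys = [
    "logs/date1.txt",
    "logs/date2.txt",
    "logs/date3.txt",
    "photos/",
    "photos/2017/example.jpg",
]
print(list_level(keys))             # (['logs/', 'photos/'], [])
print(list_level(keys, "photos/"))  # (['photos/2017/'], [])
```

Calling list_level(keys, "logs/") returns no folders and the three date files, which matches what the console shows when the logs folder is opened.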

Listing S3 folders via Java API, excluding files

I have an AWS S3 bucket with several folders, subfolders, and files. I want to get a list of all subfolders of a folder, excluding files. I think I understand that the S3 key concept makes such distinctions iffy, but the AWS web gui allows users to create folders without files.
The listObjects() method defined in com.amazonaws.services.s3.AmazonS3 returns an ObjectListing with a list of S3ObjectSummary entries for the actual files. Knowing the delimiter, it would be possible to split the keys into a folder hierarchy and filenames, but this appears complicated and error-prone.
Is there an API to get a list of folders without parsing the key property of S3ObjectSummary?
'Objects' in S3 (i.e. files) do not live in actual folders. The keys for those objects all sit at the same level, and they contain slashes that the console displays as if the objects were in folders. To find only the 'folders' as opposed to the files, you need to work out, for each key in the bucket, whether there are other objects whose keys start with that key followed by a slash (/) and more characters. For example, if you had objects with keys like:
a
b
b/c
b/c/d1
b/c/d2
you would only know that c is a folder because there are other objects 'inside' it (i.e. with extra characters after the slash).
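The reasoning in this answer can be sketched as a small function that derives every implied 'folder' from a list of keys. This is an illustration only, with a hypothetical function name. The S3 list operation also accepts a delimiter and then reports common prefixes, which may avoid this kind of client-side parsing:

```python
from typing import Iterable, List


def implied_folders(keys: Iterable[str], delimiter: str = "/") -> List[str]:
    """Return every 'folder' prefix implied by the given keys."""
    folders = set()
    for key in keys:
        parts = key.split(delimiter)
        # Each leading run of segments that has something after it
        # (even an empty remainder, as in "photos/") acts as a folder.
        for i in range(1, len(parts)):
            folders.add(delimiter.join(parts[:i]) + delimiter)
    return sorted(folders)


print(implied_folders(["a", "b", "b/c", "b/c/d1", "b/c/d2"]))
# prints ['b/', 'b/c/']
```

Note that b shows up both as an object and as a folder prefix here, which is why the distinction cannot be read off a single key in isolation.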

Amazon S3, storing large number of files (millions, and many TB of data)

I'll have to store millions of files (many TB in the future) in S3.
Are there any limitations? (Not price :) ) I'm asking about architectural limitations (for example: don't store it this way, another way would be better or faster).
My files are in a hierarchy
/{country}/{number}/{code}/docs
and I checked that I can keep them that way (to access them easily through REST)
(of course I know S3 stores them differently internally, but that is not important to me).
So, are there any limitations or pitfalls?
S3 has no limits that you would hit. The files are not really in folders; their keys are just strings that act as locations. Make the folder structure something that is easy for you to keep track of and organize.
You do NOT want to be listing the "folder" contents in S3 to find things.
S3 is slow at giving directory listings, because they are not really directories.
You should be storing either the whole path /{country}/{number}/{code}/docs in a database or the logic should be so repeatable that you can be confident that the file will be in that location.
James Brady gave an excellent and very detailed answer to how s3 treats file storage in a question here https://stackoverflow.com/a/394505/4179009
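AWS S3 does have request-rate limits for keys that share a similar path prefix. Older guidance recommended care once a workload exceeded roughly 100 requests per second; current documentation cites at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. See the official docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html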
On the other hand, a hierarchical approach makes the logic complicated. The trade-off depends on your requirements. One good option is to put a key of at least 4 characters (a primary ID or a hash) at the front of the path. If you have a limited number of countries, try using multiple buckets with the country code as the bucket name; this also lets you choose a specific physical location if required.
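One way to read the hash-prefix suggestion is sketched below. The choice of SHA-256, the 4-character length, and the function name are assumptions made for illustration; the answer does not prescribe a particular hash:

```python
import hashlib


def with_hash_prefix(key: str, length: int = 4) -> str:
    """Prepend a short, deterministic hash of the key to spread prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # The same key always maps to the same prefix, so the full object
    # name can be recomputed later without a lookup table.
    return f"{digest[:length]}/{key}"


# Hypothetical key following the {country}/{number}/{code}/docs layout.
print(with_hash_prefix("us/1234/abc/docs"))
```

The trade-off is that keys no longer sort by country, so any listing by country would have to rely on a database or on recomputing the prefixed names.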

S3 — Auto generate folder structure?

I need to store user-uploaded files in Amazon S3. I'm new to S3, but from what I gathered from the docs, S3 requires me to specify the file upload path in the PUT method.
I'm wondering if there is a way to send a file to S3 and simply get a link for HTTP(S) access. I would like Amazon to handle all the headaches related to the file/folder structure itself. For example, I just pipe a file from node.js to S3, and in the callback I get an HTTP link with no expiration date. Amazon itself creates something like /2014/12/01/.../$hash.jpg and just returns the final link to me. Such a use case seems to be quite common.
Is it possible? If not, could you suggest any options to simplify the file storage/filesystem tree structure in S3?
Many thanks.
S3 doesn't have folders, actually. In a normal filesystem, 2014/12/01/blah.jpg would mean you've got a 2014 folder with a folder called 12 inside it and so on, but in S3 the entire 2014/12/01/blah.jpg is the key, essentially a single long filename. You don't have to create any folders.
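Since the whole path is just a key, the client can build it before uploading. A minimal sketch, assuming a date-plus-content-hash naming scheme similar to the one in the question; the function name, the use of SHA-256, and the lower-cased extension are illustrative choices, not something S3 provides:

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def build_object_key(content: bytes, original_name: str, day: date) -> str:
    """Build a key such as 2014/12/01/<sha256-hex>.jpg from the upload."""
    digest = hashlib.sha256(content).hexdigest()
    # Keep the original file extension, lower-cased for consistency.
    suffix = PurePosixPath(original_name).suffix.lower()
    return f"{day:%Y/%m/%d}/{digest}{suffix}"


key = build_object_key(b"example bytes", "Holiday.JPG", date(2014, 12, 1))
print(key)
```

The resulting string would be passed as the object key when uploading; no folder has to be created first.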

Migrate to Amazon S3 - Keeping my hierarchical directories?

I have a Rails 3 app with Paperclip gem.
Currently, my local directory structure for storing images is based on my record UUID:
5D5E5641-FCE8-4D0B-A413-A9F993CD0E34
becomes:
5/D/5/E/5/6/....... 3/4/full/image.jpg
5/D/5/E/5/6/....... 3/4/thumb/image.jpg
so that I never have more than 32,000 nodes per directory.
I want to migrate to S3:
1) Can I keep this directory structure on S3? Could it be a performance issue?
2) Does Amazon S3 have its own directory management per bucket?
Thanks.
There is no such thing as a folder in Amazon S3. It is a "flat" file system. The closest you can get to folders is adding prefixes, as you said, to your file names, e.g. 5/D/image.jpg. In this case, 5 is a prefix and 5/D is also a prefix. Your delimiter, in turn, could be /.
Even though several S3 tools will show you objects as if they were contained inside folders, this concept does not exist on S3. Please see the related threads.
You can definitely use the pattern you suggested, and I don't think you will suffer any performance penalties by doing so.
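As a sketch of how the existing layout could be carried over unchanged, the function below turns a record UUID into the same slash-separated key. It is written in Python purely for illustration (the question uses Rails and Paperclip), and the function and parameter names are hypothetical:

```python
def uuid_to_key(record_uuid: str, style: str, file_name: str) -> str:
    """Map a UUID to a key with one character per path segment."""
    chars = record_uuid.replace("-", "")
    # Each character becomes its own "folder" segment, followed by
    # the style name (full or thumb) and the file name.
    return "/".join(list(chars) + [style, file_name])


key = uuid_to_key("5D5E5641-FCE8-4D0B-A413-A9F993CD0E34", "full", "image.jpg")
print(key)
```

The result begins with 5/D/5/E/5/6/ and ends with 3/4/full/image.jpg, mirroring the local directory tree.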