List directories in Amazon S3 with the AWS SDK

I am trying to list folders in S3:
string delimiter = "/";
string folder = "a/";
ListObjectsResponse r = s3Client.ListObjects(new Amazon.S3.Model.ListObjectsRequest()
{
    BucketName = BucketName,
    Prefix = folder,
    MaxKeys = 1000,
    Delimiter = delimiter
});
and I expect a list of directories such as:
a/Folder1
a/Folder2
....
a/FolderN
but my actual result is only one object:
'a1'

Folders are not treated as objects in S3.
Instead, I need to read the CommonPrefixes property (a list of strings) on the response, which contains my subfolder prefixes.
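For comparison, the same behaviour in Python with boto3: with a Delimiter set, the "subfolders" come back in CommonPrefixes rather than Contents. A minimal sketch, with a placeholder bucket name:

import boto3

s3 = boto3.client('s3')

# With a delimiter, subfolder prefixes are returned in CommonPrefixes, not Contents
response = s3.list_objects_v2(Bucket='my-bucket', Prefix='a/', Delimiter='/')
for cp in response.get('CommonPrefixes', []):
    print(cp['Prefix'])  # e.g. a/Folder1/, a/Folder2/, ...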

Related

Pass credentials / .json file stored in S3 bucket to GoogleServiceAccountClient

Here is the code that initializes the GoogleServiceAccountClient using the credentials from the JSON key file:
oauth2_client = oauth2.GoogleServiceAccountClient(key_file, oauth2.GetAPIScope('ad_manager'))
The key_file .json is stored in an S3 bucket.
Is there any way to pass a .json file (with credentials) stored in S3 to GoogleServiceAccountClient?
You can read the file from S3 and write it as a JSON file to your /tmp folder:
import json
import boto3
from pathlib import Path

def readFileFromS3(file_name):
    tmp_path = "/tmp/" + file_name
    file_path = Path(tmp_path)
    # Reuse the local copy if it has already been downloaded
    if file_path.is_file():
        return tmp_path
    s3 = boto3.resource(
        's3',
        aws_access_key_id=<AWS_ACCESS_KEY>,
        aws_secret_access_key=<AWS_SECRET>,
        region_name=<YOUR_REGION_NAME>
    )
    # Read the object body and parse it as JSON
    content_object = s3.Object(<BUCKET_NAME>, file_name)
    file_content = content_object.get()['Body'].read().decode('utf-8')
    json_content = json.loads(file_content)
    # Write the credentials to /tmp and return the local path
    with open(tmp_path, 'w') as res_file:
        json.dump(json_content, res_file, indent=4)
    return tmp_path
Then use the path returned from the above function in GoogleServiceAccountClient
key_file = readFileFromS3(<key_file_name_in_s3>)
oauth2_client = oauth2.GoogleServiceAccountClient(key_file, oauth2.GetAPIScope('ad_manager'))

How can I get the custom storage filename for uploaded images using UploadCare?

I want to use my own S3 storage and display the image that was just uploaded. How can I get the filename that was uploaded to S3?
For example, I can upload a .jpg image to UploadCare.
This is the output I can get:
fileInfo.cdnUrl: "https://ucarecdn.com/6314bead-0404-4279-9462-fecc927935c9/"
fileInfo.name: "0 (3).jpg"
But if I check my S3 bucket this is the file name that was actually uploaded to S3: https://localdevauctionsite-us.s3.us-west-2.amazonaws.com/6314bead-0404-4279-9462-fecc927935c9/03.jpg
Here is the JavaScript I have so far:
var widget = uploadcare.Widget('[role=uploadcare-uploader]');
widget.onChange(group => {
    group.files().forEach(file => {
        file.done(fileInfo => {
            // Try to list the file from aws s3
            console.log('CDN url:', fileInfo.cdnUrl);
            // https://uploadcare.com/docs/file_uploader_api/files_uploads/
            console.log('File name: ', fileInfo.name);
        });
    });
});
Filenames are sanitized before copying to S3, so the output file name can contain the following characters only:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
The following function can be used to get a sanitized file name:
function sanitize(filename) {
    var extension = '.' + filename.split('.').pop();
    var name = filename.substring(0, filename.length - extension.length);
    return name.replace(/[^A-Za-z0-9_]+/g, '') + extension;
}
The final S3 URL can be composed of the S3 base URL, a file UUID, which you can obtain from the fileInfo object, and a sanitized name of the uploaded file.
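As a concrete sketch of that composition (shown here in Python for brevity; the bucket base URL, UUID, and file name are taken from the example above and are otherwise hypothetical):

import re

def compose_s3_url(base_url, uuid, filename):
    # Keep only the characters UploadCare preserves, then re-attach the extension
    extension = '.' + filename.rsplit('.', 1)[-1]
    name = filename[:-len(extension)]
    sanitized = re.sub(r'[^A-Za-z0-9_]+', '', name) + extension
    return f"{base_url}/{uuid}/{sanitized}"

# Values from the question's example
print(compose_s3_url(
    'https://localdevauctionsite-us.s3.us-west-2.amazonaws.com',
    '6314bead-0404-4279-9462-fecc927935c9',
    '0 (3).jpg',
))  # -> .../6314bead-0404-4279-9462-fecc927935c9/03.jpg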

Use Terraform to create a folder and subfolder in an S3 bucket

How can I create a folder and subfolder in an S3 bucket using Terraform?
This is what my current code looks like:
resource "aws_s3_bucket_object" "Fruits" {
bucket = "${aws_s3_bucket.s3_bucket_name.id}"
key = "${var.folder_fruits}/"
content_type = "application/x-directory"
}
variable "folder_fruits" {
type = string
}
I would need a folder structure like fruits/apples
Folders in S3 are simply objects that end with a / character. You should be able to create the fruits/apples/ folder with the following Terraform code:
variable "folder_fruits" {
type = string
}
resource "aws_s3_bucket_object" "fruits" {
bucket = "${aws_s3_bucket.s3_bucket_name.id}"
key = "${var.folder_fruits}/"
content_type = "application/x-directory"
}
resource "aws_s3_bucket_object" "apples" {
bucket = "${aws_s3_bucket.s3_bucket_name.id}"
key = "${var.folder_fruits}/apples/"
content_type = "application/x-directory"
}
It is likely that this would also work without the fruits folder.
For more information, see https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html
You can create a null object with a key that ends with '/'. All objects in a bucket are at the same hierarchy level, but AWS displays them as folders, using '/' as the separator.
resource "aws_s3_bucket_object" "fruits" {
bucket = "your-bucket"
key = "fruits/"
source = "/dev/null"
resource "aws_s3_bucket_object" "apples" {
bucket = "your-bucket"
key = "fruits/apples/"
source = "/dev/null"
}
This creates the following folder-like structure:
s3://your-bucket/fruits/
s3://your-bucket/fruits/apples/
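The same zero-byte "folder" objects can also be created outside Terraform, for example with boto3. A minimal sketch, assuming the same hypothetical bucket name:

import boto3

s3 = boto3.client('s3')

# Zero-byte objects whose keys end in '/' are displayed as folders in the console
for key in ('fruits/', 'fruits/apples/'):
    s3.put_object(Bucket='your-bucket', Key=key, Body=b'')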

Python boto3: load a model tar file from S3 and unpack it

I am using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load into sklearn. I've been testing with list_objects and a delimiter to get to the tar.gz files:
response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)
for i in response['Contents']:
    print(i['Key'])
And then I plan to extract with
import tarfile
tf = tarfile.open(model.read())
tf.extractall()
But how do I get to the actual tar.gz file from S3 instead of some boto3 object?
You can download objects to files using s3.download_file(). This will make your code look like:
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects matching your criteria
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)

# Iterate over each file found and download it
for i in response['Contents']:
    key = i['Key']
    dest = os.path.join('/tmp', key)
    # The key contains '/' separators, so make sure the target directory exists
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )
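Once the archives are on local disk, the tarfile step from the question works by passing the file path rather than the raw object body. A minimal sketch, assuming a hypothetical downloaded path under /tmp:

import os
import tarfile

# Hypothetical local path produced by the download loop above
archive_path = '/tmp/aleks-weekly/models/model.tar.gz'

# Open the archive by path (not by raw bytes) and extract it alongside the download
with tarfile.open(archive_path, 'r:gz') as tf:
    tf.extractall(path=os.path.dirname(archive_path))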

S3 Boto3 Python - change all files' ACLs to public-read

I am trying to change the ACL of 500k files within an S3 bucket folder from 'private' to 'public-read'.
Is there any way to speed this up?
I am using the below snippet.
from boto3.session import Session
from multiprocessing.pool import ThreadPool

pool = ThreadPool(processes=100)

BUCKET_NAME = ""
aws_access_key_id = ""
aws_secret_access_key = ""
Prefix = 'pics/'

session = Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
_s3 = session.resource("s3")
_bucket = _s3.Bucket(BUCKET_NAME)

def upload(eachObject):
    eachObject.Acl().put(ACL='public-read')

counter = 0
filenames = []
for eachObject in _bucket.objects.filter(Prefix=Prefix):
    counter += 1
    filenames.append(eachObject)
    if counter % 100 == 0:
        pool.map(upload, filenames)
        print(counter)

if filenames:
    pool.map(upload, filenames)
As far as I can tell, short of applying the ACL to the entire bucket, there is no way to apply it to all items under the same prefix without iterating over each item, as below:
import boto3

bucketName = 'YOUR_BUCKET_NAME'
prefix = 'YOUR_FOLDER_PREFIX'
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucketName)
# One ACL update per object under the prefix
for obj in bucket.objects.filter(Prefix=prefix):
    obj.Acl().put(ACL='public-read')
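If throughput is the main concern, those per-object ACL calls can be fanned out over a thread pool, much like the ThreadPool in the question. A rough sketch, assuming default credentials and the same placeholder bucket and prefix:

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.resource('s3')
bucket = s3.Bucket('YOUR_BUCKET_NAME')

def make_public(obj):
    # One PUT Object ACL request per key; S3 has no bulk ACL operation
    obj.Acl().put(ACL='public-read')

# Fan the requests out over worker threads; list() forces evaluation so errors surface
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(make_public, bucket.objects.filter(Prefix='YOUR_FOLDER_PREFIX')))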