How to get information of S3 bucket? - amazon-s3

Say for example I have the following bucket set up:
bucketone
…/folderone
…/text1.txt
…/text2.txt
…/foldertwo
…/file1.json
…/folderthree
…/folderthreesub
…/file2.json
…/file3.json
But it only goes down one level.
What’s the proper way of retrieving information under a bucket?
Will be sure to accept/upvote answer.

Whats wrong with just doing this from the CLI?
aws s3 cp s3://bucketing . --recursive

Contrary to the way you'd think it will work, rsplit() actually returns the splits from left-right, even though it applies it right-to-left.
Therefore, you actually want to obtain the last element of the split:
filename = obj['Key'].rsplit('/', 1)[-1]
See: Python rsplit() documentation
Also, be careful of 'pretend directories' that might be created via the console. They are actually zero-length files the make the folder appear in the UI. Therefore, skip files with no name after the final slash.
Make those fixes and it works as desired:
import boto3
import os
s3client = boto3.client('s3')
for obj in s3client.list_objects_v2(Bucket='my-bucket')['Contents']:
filename = obj['Key'].rsplit('/', 1)[-1]
localfiledir = os.path.join('/tmp', filename)
if filename != '':
s3client.download_file('my-bucket', obj['Key'], localfiledir)

Related

AWS structure of S3 trigger

I am building a Python Lambda in AWS and wanted to add an S3 trigger to it. Following these instructions I saw how to get the bucket and key on which I got the trigger using:
def func(event):
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
There is an example of such an object in the link, but I wasn't able, however, to find a description of the entire event object anywhere in AWS' documentation.
Is there a documentation for this object's structure? Where might I find it?
You can find documentation about the whole object in the S3 documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html
I would also advise to iterate the records, because there could be multiple at once:
for record in event['Records']:
bucket = record['s3']['bucket']['name']
key = record['s3']['object']['key']
[...]

Unable to figure out why my code is printing "None". even though it should bring values

I am trying to access S3 object and when i am printing the data objects, it prints "None" basically I am unable to move beyond that point since my "if" condition is failing. Could anyone please assist
if s3_resource.Bucket(s3_bucket).creation_date:
print("UTKARSH")
this piece of code which you have specified is correct. Check whether you are able to access the object first and then go for if condition.
s3_object = boto.resource('s3').Bucket("Your Bucket Name")
or else you can also get the bucket info using s3 client like:
busket_list = boto.client('s3').list_buckets()
try to get the details first then you can filter the details according to your need.
Creation_date is one of the property of s3 bucket which stores the date of creation of that particular bucket.
What is your s3_bucket? is that String? Check my code.
import boto3
session = boto3.session.Session(region_name='ap-northeast-2')
s3 = session.resource('s3')
if s3.Bucket('seoul-dev-datalake').creation_date:
print(s3.Bucket('my-bucket-name').name, 'Done')
It gives the correct result.
my-bucket-name Done

How to download multiple files via Flask and boto3 from S3

I have a list of .zip files on S3 which is dynamically created and passed to flask view /download. My problem doesn't seem to be in looping through this list, but rather in returning a response so that all files from the list are downloaded to users computer. If I have a view return a response it only downloads the first file, as return closes the connection.
I have tried a number of things and looked at similar issues (like this one: Boto3 to download all files from a S3 Bucket ), but so far had no luck in resolving this. I have also looked at streaming (as in here: http://flask.pocoo.org/docs/1.0/patterns/streaming/ ) and tried creating a subfunction which is a generator, but the same issue persists as I still have to pass a return value to View function - here is that last code example:
#app.route('/download', methods=['POST'])
def download():
download=[]
download = request.form.getlist('checked')
def generate(result):
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket(S3_BUCKET)
d_object = my_bucket.Object(result).get()
yield d_object['Body'].read()
for filename in download:
return Response (generate(filename), mimetype='application/zip', headers={'Content-Disposition': 'attachment;filename=' + filename})
What would be the best way of doing this so that all the files are downloaded?
Is there an easier way to pass a list to boto3 or any other flask module to download those files?

Using Leigh version of S3Wrapper.cfc Can't get past Init

I am new to S3 and need to use it for image storage. I found a half dozen versions of an s2wrapper for cf but it appears that the only one set of for v4 is one modified by Leigh
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
Dropped in the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error :
So I changed the line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh code and start changing things is a bad idea since I have lurked here for year an know Leigh's code is solid.
Does any know if there are any examples on how to use this anywhere? If not what I am doing wrong. If it makes a difference I am using Lucee 5 and not Adobe's CF engine.
UPDATE :
I followed Leigh's directions and the error is now gone. I am addedsome more code to my test page which now looks like this :
<cfscript>
s3 = createObject('component','com.S3v4').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
bucket = "imgbkt.domain.com";
obj = "fake.ping";
region = "s3-us-west-1"
test = s3.getObject(bucket,obj,region);
writeDump(test);
test2 = s3.getObjectLink(bucket,obj,region);
writeDump(test2);
writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj or region I get :
JIC I did go to AWS and get new keys:
Leigh if you are still around or anyone how has used one of the s3Wrappers any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The Link I receive from getObjectLink is not valid and getObject never does download an object. I thought I would try the putObject method
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there is any additional information, I received this :
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old and since S3 specifically suggests using dots in bucketnames I don't that it is relevant any longer. There is obviously something I am doing wrong but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns a HTTP URL for the file fake.ping that is found looking in the bucket imgbkt.domain.com of region s3-us-west-1. This link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of the S3v4.cfc with the filename fake.ping by default. Finally the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will be downloaded completely when the functions returns the filepath.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large since fileRead/fileReadBinary reads the whole file into buffer. Use fileOpen to stream the content.
Does that help you?

Locally calculate dropbox hash of files

Dropbox rest api, in function metatada has a parameter named "hash" https://www.dropbox.com/developers/reference/api#metadata
Can I calculate this hash locally without call any remote api rest function?
I need know this value to reduce upload bandwidth.
https://www.dropbox.com/developers/reference/content-hash explains how Dropbox computes their file hashes. A Python implementation of this is below:
import hashlib
import math
import os
DROPBOX_HASH_CHUNK_SIZE = 4*1024*1024
def compute_dropbox_hash(filename):
file_size = os.stat(filename).st_size
with open(filename, 'rb') as f:
block_hashes = b''
while True:
chunk = f.read(DROPBOX_HASH_CHUNK_SIZE)
if not chunk:
break
block_hashes += hashlib.sha256(chunk).digest()
return hashlib.sha256(block_hashes).hexdigest()
The "hash" parameter on the metadata call isn't actually the hash of the file, but a hash of the metadata. It's purpose is to save you having to re-download the metadata in your request if it hasn't changed by supplying it during the metadata request. It is not intended to be used as a file hash.
Unfortunately I don't see any way via the Dropbox API to get a hash of the file itself. I think your best bet for reducing your upload bandwidth would be to keep track of the hash's of your files locally and detect if they have changed when determining whether to upload them. Depending on your system you also likely want to keep track of the "rev" (revision) value returned on the metadata request so you can tell whether the version on Dropbox itself has changed.
This won't directly answer your question, but is meant more as a workaround; The dropbox sdk gives a simple updown.py example that uses file size and modification time to check the currency of a file.
an abbreviated example taken from updown.py:
dbx = dropbox.Dropbox(api_token)
...
# returns a dictionary of name: FileMetaData
listing = list_folder(dbx, folder, subfolder)
# name is the name of the file
md = listing[name]
# fullname is the path of the local file
mtime = os.path.getmtime(fullname)
mtime_dt = datetime.datetime(*time.gmtime(mtime)[:6])
size = os.path.getsize(fullname)
if (isinstance(md, dropbox.files.FileMetadata) and mtime_dt == md.client_modified and size == md.size):
print(name, 'is already synced [stats match]')
As far as I am concerned, No you can't.
The only way is using Dropbox API which is explained here.
The rclone go program from https://rclone.org has exactly what you want:
rclone hashsum dropbox localfile
rclone hashsum dropbox localdir
It can't take more than one path argument but I suspect that's something you can work with...
t0|todd#tlaptop/p8 ~/tmp|295$ echo "Hello, World!" > dropbox-hash-demo/hello.txt
t0|todd#tlaptop/p8 ~/tmp|296$ rclone copy dropbox-hash-demo/hello.txt dropbox-ttf:demo
t0|todd#tlaptop/p8 ~/tmp|297$ rclone hashsum dropbox dropbox-hash-demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt
t0|todd#tlaptop/p8 ~/tmp|298$ rclone hashsum dropbox dropbox-ttf:demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt