How to read a trained data file from S3 - amazon-s3

I'm trying to build a face recognition service using AWS Lambda.
I want to deploy a .zip file that includes the trained data file.
But AWS Lambda won't deploy it because of its size.
So I changed my approach: upload the trained data file to S3 and read it from there.
However, I don't know how to do that.
Could you tell me how to read a trained data file from S3 inside an AWS Lambda function?

Once you have the data in S3, you can copy the file from S3 into Lambda. Lambda provides 512 MB of storage in the /tmp folder that is writable at run time.
import boto3

# Download the object from S3 into Lambda's writable /tmp directory
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
https://docs.aws.amazon.com/lambda/latest/dg/limits.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file
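For a fuller picture, here is a minimal handler sketch, assuming the bucket is named 'mybucket', the object key is 'trained_data.dat', and the commented-out loading step is a placeholder for your own ML library. It downloads the file on cold start and reuses it while the container stays warm:

import os
import boto3

s3 = boto3.resource('s3')
LOCAL_PATH = '/tmp/trained_data.dat'  # /tmp is the only writable path in Lambda

def handler(event, context):
    # Download only if not already present; /tmp persists across warm invocations
    if not os.path.exists(LOCAL_PATH):
        s3.meta.client.download_file('mybucket', 'trained_data.dat', LOCAL_PATH)
    # model = load_trained_data(LOCAL_PATH)  # placeholder: load with your ML library
    return {'status': 'ok'}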

Related

AWS SageMaker Notebook's Default S3 Bucket - Can't Access Uploaded Files within Notebook

In SageMaker Studio, I created directories and uploaded files to my SageMaker's default S3 bucket using the GUI, and was exploring how to work with those uploaded files using a SageMaker Studio Notebook.
Within the SageMaker Studio Notebook, I ran
import boto3
import sagemaker

sess = sagemaker.Session()
bucket = sess.default_bucket() # sagemaker-abcdef
prefix = "folderJustBelowRoot"
conn = boto3.client('s3')
conn.list_objects(Bucket=bucket, Prefix=prefix)
# this returns a response dictionary with the corresponding metadata, including 'HTTPStatusCode': 200 and 'server': 'AmazonS3', which means the request-response was successful
What I don't understand is why the 'Contents' key and its value are missing from the conn.list_objects dictionary response.
And when I go to my SageMaker default bucket in the S3 console, why are my uploaded files not appearing?
===============================================================
I was expecting
the response from conn.list_objects(Bucket=bucket, Prefix=prefix) to contain the 'Contents' key (within my SageMaker Studio Notebook)
the S3 console to show the files I uploaded to 'my SageMaker's default bucket'
Question 2: When I go to my SageMaker default bucket in the S3 console, why are my uploaded files not appearing?
It seems that when you upload files from your local desktop/laptop onto AWS SageMaker Studio using the GUI, your files are stored in the Elastic Block Storage (EBS) of your SageMaker Studio instance.
To access items within your SageMaker Studio instance's file system:
Folder Path - "subFolderLayer1/subFolderLayer2/subFolderLayer3" => to access 'subFolderLayer3'
File Path - "subFolderLayer1/subFolderLayer2/subFolderLayer3/fileName.extension" => to access 'fileName.extension' within your subFolderLayers
=========
To access the files in the default S3 bucket for your AWS SageMaker instance, first identify it with:
sess = sagemaker.Session()
bucket = sess.default_bucket() #sagemaker-abcdef
Then go to the bucket and upload your files and folders. When you have done that, move to the response for question 1.
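If you prefer to upload programmatically rather than through the S3 console, here is a short sketch; the local file name localData.csv is a placeholder, and 'folderJustBelowRoot' reuses the prefix from the question:

import boto3
import sagemaker

sess = sagemaker.Session()
bucket = sess.default_bucket()

s3 = boto3.client('s3')
# Upload a local file into the default bucket under the chosen prefix
s3.upload_file('localData.csv', bucket, 'folderJustBelowRoot/localData.csv')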
=================================================================
Question 1: What I don't understand is why the 'Contents' key and its value are missing from the conn.list_objects dictionary response?
prefix = "folderJustBelowYourBucket"
conn = boto3.client('s3')
conn.list_objects(Bucket=bucket, Prefix=prefix)
The conn.list_objects response now contains a 'Contents' key whose value is a list of metadata dictionaries, one for each file/folder under that prefix ('folderJustBelowYourBucket').
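For example, a small sketch that walks the response and prints each object key, guarding against the case where nothing matches the prefix:

response = conn.list_objects(Bucket=bucket, Prefix=prefix)
# 'Contents' is only present when at least one object matches the prefix
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])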
You can upload and download files between Amazon SageMaker and Amazon S3 using the SageMaker Python SDK. The SageMaker S3 utilities provide S3Uploader and S3Downloader classes to work with S3 easily from within SageMaker Studio notebooks.
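A minimal sketch of those utilities, assuming a local file named localData.csv and a 'data' prefix (both placeholders):

import sagemaker
from sagemaker.s3 import S3Uploader, S3Downloader

sess = sagemaker.Session()
bucket = sess.default_bucket()

# Upload a local file to s3://<default-bucket>/data/ and download it back
s3_uri = S3Uploader.upload('localData.csv', f's3://{bucket}/data')
S3Downloader.download(s3_uri, './downloaded')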
A comment about the 'file system' in your question 2: the files are stored on the SageMaker Studio user profile's Amazon Elastic File System (Amazon EFS) volume, not EBS (SageMaker classic notebooks use EBS volumes). Refer to this blog for a more detailed overview of the SageMaker architecture.

How to make OHIF look at S3 for loading studies

I have built an object storage plugin to store Orthanc data in an S3 bucket in legacy mode. I am now trying to eliminate Orthanc's local storage of files and move it to S3 completely. I also have the OHIF viewer integrated, and it is serving Orthanc data. How do I make it fetch from the S3 bucket? I have read that a JSON file describing the DICOM study can be used for this, but I don't know how to do that because the JSON file needs the URL of each instance in the S3 bucket. How do I generate this JSON file, if this is the way to do it?
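One building block for such a JSON file, sketched under assumptions (the bucket name and study prefix below are placeholders, and the exact manifest schema your OHIF version expects is not covered here), is to list the instance objects and generate a presigned GET URL for each:

import boto3

s3 = boto3.client('s3')
bucket = 'my-orthanc-bucket'   # placeholder
prefix = 'dicom/study-uid/'    # placeholder: objects belonging to one study

# Collect a presigned URL for every instance object under the prefix
urls = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        urls.append(s3.generate_presigned_url(
            'get_object',
            Params={'Bucket': bucket, 'Key': obj['Key']},
            ExpiresIn=3600))
# these URLs can then be written into the JSON manifest that OHIF loads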

How to access a text file from an S3 bucket in SageMaker for training a model?

I am trying to train a chatbot model with TensorFlow and a seq2seq architecture using SageMaker. I have completed the coding in Spyder, but when
I try to access the Cornell Movie Corpus dataset from the S3 bucket in SageMaker, it says no such file or directory, even after granting access to the S3 bucket.
If you're in a notebook: aws s3 cp s3://path_to_the_file /home/ec2-user/SageMaker will copy data from S3 to your SageMaker directory in the notebook (if you have the IAM permissions to do so).
If you're in the Docker container of a SageMaker training job: you need to pass the S3 path to the SDK training call, estimator.fit({'mydata': 's3://path_to_the_file'}), and inside the container your TensorFlow code must read from this path: /opt/ml/input/data/mydata
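As a sketch of that training-job pattern (the entry point script, IAM role ARN, framework versions, and instance type are all placeholders; the S3 path reuses the placeholder above):

from sagemaker.tensorflow import TensorFlow

# All names below are placeholders for your own script, role and data location
estimator = TensorFlow(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='2.11',
    py_version='py39',
)

# The 'mydata' channel is mounted inside the container at /opt/ml/input/data/mydata
estimator.fit({'mydata': 's3://path_to_the_file'})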

How would cv2 access videos in an AWS S3 bucket

I can use the SageMaker notebook now, but here is a significant problem. When I want to use cv2.VideoCapture to read a video in the S3 bucket, it says the path doesn't exist. One answer on Stack Overflow said cv2 only supports local files, which means we have to download videos from the S3 bucket to the notebook, but I don't want to do this. I wonder how you read the videos? Thanks.
I found one solution is to use CloudFront, but would this be charged, and is it fast?
You are using Python in SageMaker, so you could use:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('deepfake2020', 'dfdc_train_part_1/foo.mp4', 'foo.mp4')
This will download the file from Amazon S3 to the local disk, in a file called foo.mp4.
See: download_file() in boto3
This requires that the SageMaker instance has been granted permissions to access the Amazon S3 bucket.
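Putting the two steps together, here is a short sketch that downloads the video and then opens it with cv2.VideoCapture (the bucket and key reuse the example above; the /tmp location is an assumption):

import boto3
import cv2

s3_client = boto3.client('s3')
# cv2.VideoCapture needs a local path, so download the video first
s3_client.download_file('deepfake2020', 'dfdc_train_part_1/foo.mp4', '/tmp/foo.mp4')

cap = cv2.VideoCapture('/tmp/foo.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # process the frame here
cap.release()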
This solution also works.
To use AWS SageMaker:
1) Go to the Support Center and ask to raise the notebook instance limit. They normally reply within a day.
2) When creating a notebook, change the local disk size to 1 TB (double the data size).
3) Open JupyterLab and type cd SageMaker in the terminal.
4) Use CurlWget to get the download link of the dataset.
5) After downloading, unzip the data:
unzip dfdc_train_all.zip
unzip '*.zip'
There you go.

How to load a .hdf5 model from S3 to EC2 directly?

I used Keras to create a .hdf5 model and stored it in S3. Now I want to load this model from S3 onto EC2 directly. I wonder what the best practice is for doing this? I can use aws s3 cp to move the model down from S3 to EC2, but I feel like there should be a better way to do it.
I know I can use load_model (from keras.models), but the file has to be on EC2.
Also, how do I write things directly from EC2 to S3 instead of copying them to S3?
Thank you!
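A sketch of the download-then-load pattern the question already describes, assuming boto3 plus a placeholder bucket and key (it goes through local disk rather than streaming the model straight from S3):

import boto3
from tensorflow.keras.models import load_model

s3 = boto3.client('s3')

# Download the saved model from S3 to local disk, then load it with Keras
s3.download_file('my-model-bucket', 'models/model.hdf5', '/tmp/model.hdf5')
model = load_model('/tmp/model.hdf5')

# Writing back to S3 from EC2 works the same way in reverse
model.save('/tmp/model_v2.hdf5')
s3.upload_file('/tmp/model_v2.hdf5', 'my-model-bucket', 'models/model_v2.hdf5')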