How to load a .hdf5 model from S3 to EC2 directly?

I used Keras to create a .hdf5 model and stored it in S3. Now I want to load this model from S3 onto EC2 directly. What is the best practice for doing this? I can run aws s3 cp to copy the model down from S3 to EC2, but I feel like there should be a better way.
I know I can use load_model (from keras.models), but the file has to be on EC2.
Also, how do I write things from EC2 directly to S3 instead of copying them over afterwards?
Thank you!
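For what it's worth, a minimal sketch of the usual pattern (the bucket name my-models-bucket, the keys, and the /tmp paths are placeholders, not from the question): download the object with boto3, load it with Keras, and write results back by saving locally and uploading with the same client.
import boto3
from keras.models import load_model

s3 = boto3.client('s3')

# Pull the model from S3 onto the EC2 instance's local disk, then load it
s3.download_file('my-models-bucket', 'model.hdf5', '/tmp/model.hdf5')
model = load_model('/tmp/model.hdf5')

# Writing back works the same way in reverse: save locally, then upload
model.save('/tmp/model_v2.hdf5')
s3.upload_file('/tmp/model_v2.hdf5', 'my-models-bucket', 'model_v2.hdf5')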

Related

How would cv2 access videos in an AWS S3 bucket

I can use the SageMaker notebook now, but there is a significant problem. When I tried to use cv2.VideoCapture to read a video in the S3 bucket, it said the path doesn't exist. One answer on Stack Overflow said cv2 only supports local files, which means we would have to download the videos from the S3 bucket to the notebook, but I don't want to do this. How do you read the videos? Thanks.
One solution I found is to use CloudFront, but would that incur charges, and is it fast?
You are using Python in SageMaker, so you could use:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('deepfake2020', 'dfdc_train_part_1/foo.mp4', 'foo.mp4')
This will download the file from Amazon S3 to the local disk, in a file called foo.mp4.
See: download_file() in boto3
This requires that the SageMaker instance has been granted permissions to access the Amazon S3 bucket.
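Once the object is on local disk, cv2 can open it like any other local video. A minimal sketch (the bucket and key are taken from the answer above; the /tmp path is a placeholder):
import boto3
import cv2

s3_client = boto3.client('s3')
s3_client.download_file('deepfake2020', 'dfdc_train_part_1/foo.mp4', '/tmp/foo.mp4')

# cv2 only reads local paths, so point it at the downloaded copy
cap = cv2.VideoCapture('/tmp/foo.mp4')
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # process the frame here
cap.release()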
This solution also works.
To use AWS SageMaker:
1) Go to the Support Center and ask to raise your notebook instance limit. They normally reply within a day.
2) When creating the notebook, increase the local disk size to 1 TB (double the dataset size).
3) Open JupyterLab and run cd SageMaker in a terminal.
4) Use CurlWget to get the download link for the dataset.
5) After downloading, unzip the data:
unzip dfdc_train_all.zip
unzip '*.zip'
There you go.

How to read a trained data file in S3

I'm trying to build a face recognition service using AWS Lambda.
I want to deploy a .zip file that includes the trained data file.
But AWS Lambda won't deploy it because of its size.
So I changed my approach: upload the trained data file to S3 and use it from there.
But I don't know how to do that.
Could you tell me how to read a trained data file from S3 in an AWS Lambda function?
Once you have the data in S3, you can copy the file from S3 into Lambda. Lambda provides 512 MB of storage in the /tmp folder that is writable at run-time.
import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
https://docs.aws.amazon.com/lambda/latest/dg/limits.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file
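A minimal sketch of how this looks inside a Lambda function (the bucket and key names are placeholders): doing the download outside the handler lets warm invocations reuse the file already sitting in /tmp.
import boto3

s3 = boto3.resource('s3')

# Runs once per container, so warm invocations reuse the downloaded file
s3.meta.client.download_file('mybucket', 'trained_data.dat', '/tmp/trained_data.dat')

def lambda_handler(event, context):
    # /tmp is the only writable location in Lambda (512 MB limit)
    with open('/tmp/trained_data.dat', 'rb') as f:
        data = f.read()
    # ... run face recognition with the trained data here ...
    return {'statusCode': 200}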

Copy a file from an AWS S3 bucket to a Google Cloud Storage bucket

I am trying to copy a file from an AWS S3 bucket to a Google Cloud Storage bucket, and I would like to implement it with the Python API. Please help me if anyone has done this.
Thanks in advance.
You can use gsutil to do this.
You can also use the GCS Transfer Service.
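If you want to do it from Python, a minimal sketch is to route the object through the machine running the script, using boto3 for S3 and the google-cloud-storage client for GCS. The bucket names, key, and local path are placeholders, and both sets of credentials must already be configured.
import boto3
from google.cloud import storage  # pip install google-cloud-storage

# Download the object from S3 to a local temporary file
s3 = boto3.client('s3')
s3.download_file('my-s3-bucket', 'path/to/file.csv', '/tmp/file.csv')

# Upload the local file to the GCS bucket
gcs = storage.Client()
gcs.bucket('my-gcs-bucket').blob('path/to/file.csv').upload_from_filename('/tmp/file.csv')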

Merging PDF files stored on Amazon S3

Currently I download all my PDF files to my server and then use PDFBox to merge them together. It works perfectly fine, but it's very slow, since I have to download them all first.
Is there a way to perform all of this on S3 directly? I've been trying to find a way to do it, in Java or even Python, and have been unable to.
I read the following:
Merging files on S3 Amazon
https://github.com/boazsegev/combine_pdf/issues/18
Is there a way to merge files stored in S3 without having to download them?
EDIT
The way I ended up doing it was with concurrent.futures, using a concurrent.futures.ThreadPoolExecutor. I set a maximum of eight worker threads to download all the PDF files from S3.
Once all the files were downloaded, I merged them with PDFBox. Simple.
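A minimal sketch of that download step (the bucket name and keys are placeholders):
import concurrent.futures
import boto3

s3 = boto3.client('s3')
bucket = 'my-pdf-bucket'
keys = ['a.pdf', 'b.pdf', 'c.pdf']  # the S3 keys of the PDFs to merge

def download(key):
    local_path = '/tmp/' + key.replace('/', '_')
    s3.download_file(bucket, key, local_path)
    return local_path

# Download up to 8 PDFs in parallel, then merge the local copies afterwards
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    local_files = list(executor.map(download, keys))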
S3 is just a data store, so at some level you need to transfer the PDF files from S3 to a server and then back. You'll probably gain the best speed by doing your conversions on an EC2 instance located in the same region as your S3 bucket.
If you don't want to spin up an EC2 instance yourself just to do this then another alternative may be to make use of AWS Lambda, which is a compute service where you can upload your code and have AWS manage the execution of it.

block file system on S3

I am a little puzzled and I hope someone can help me out.
We create some ORC files that we would like to query while they are stored on S3.
We noticed that the S3 native filesystem (s3n) does not really work out for this. I am not really sure what the problem is, but my guess is that the reader cannot seek to specific bytes inside the file, so it has to load the whole file before it can query it.
So we tried storing the files on S3 with the s3:// URI scheme, which is a block filesystem backed by S3, just like HDFS, and it worked great.
But I am a little worried after reading up on this source about Amazon EMR, which says:
Amazon S3 block file system (URI path: s3bfs://)
The Amazon S3 block file system is a legacy file storage system. We strongly discourage the use of this system.
Important
We recommend that you do not use this file system because it can trigger a race condition that might cause your cluster to fail. However, it might be required by legacy applications.
EMRFS (URI path: s3://)
EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3.
I am not using EMR; I create my files by launching an EC2 cluster and then use S3 as cold storage. But I am puzzled right now and not sure which filesystem I am using when I store my files on S3 with the s3:// URI scheme: am I using EMRFS, or the deprecated s3bfs filesystem?
Amazon S3 is an object storage system. It is not recommended to "mount" S3 as a filesystem. Amazon Elastic Block Store (EBS) is a block storage system that appears as volumes on Amazon EC2 instances.
When used from Amazon Elastic MapReduce (EMR), Hadoop has extensions that make it easy to work with Amazon S3. However, if you are not using EMR, there is no need to use EMRFS (which is available only on EMR), nor should you use S3 as a block storage system.
The easiest way to use S3 from EC2 is via the AWS Command-Line Interface (CLI). You can copy files to/from S3 by using the aws s3 cp command. There's also a sync command to make it easy to synchronize data to/from S3.
You can also programmatically connect to Amazon S3 via an SDK, so that your app can directly transfer files to/from S3.
As to which to choose... typically, applications like to work with files on a local filesystem, so copy your files from S3 to a local device. However, if your app can communicate directly with S3, there will be fewer "moving parts".
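For example, with the Python SDK (boto3) the transfer is a couple of calls; the bucket name and paths here are placeholders:
import boto3

s3 = boto3.client('s3')

# Write a local file straight to S3 from the EC2 instance
s3.upload_file('/data/results.csv', 'my-bucket', 'results/results.csv')

# Read it back onto local disk when needed
s3.download_file('my-bucket', 'results/results.csv', '/data/results.csv')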