How would cv2 access videos in an AWS S3 bucket?

I can use the SageMaker notebook now, but here is a significant problem: when I try to use cv2.VideoCapture to read a video in the S3 bucket, it says the path doesn't exist. One answer on Stack Overflow said cv2 only supports local files, which means we would have to download the videos from the S3 bucket to the notebook, but I don't want to do that. How do you read the videos? Thanks.
One solution I found is to use CloudFront, but would that incur charges, and is it fast?

You are using Python in SageMaker, so you could use:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('deepfake2020', 'dfdc_train_part_1/foo.mp4', 'foo.mp4')
This will download the file from Amazon S3 to the local disk, in a file called foo.mp4.
See: download_file() in boto3
This requires that the SageMaker instance has been granted permissions to access the Amazon S3 bucket.
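If you really want to avoid writing the video to disk, a possible alternative is to generate a presigned URL for the object and hand that URL to cv2.VideoCapture. This generally only works when OpenCV was built with the FFmpeg backend, so treat the sketch below as an assumption to verify rather than a guaranteed solution (the bucket and key are the same ones used in the example above):
import boto3
import cv2

s3_client = boto3.client('s3')

# Create a temporary HTTPS URL for the object (valid for one hour here).
url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'deepfake2020', 'Key': 'dfdc_train_part_1/foo.mp4'},
    ExpiresIn=3600,
)

# OpenCV's FFmpeg backend can usually open HTTP(S) URLs directly.
cap = cv2.VideoCapture(url)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... process frame here ...
cap.release()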

This solution also works.
To use AWS SageMaker:
1) Go to the Support Center and request an increase to the notebook instance limit. They normally reply within a day.
2) When creating the notebook, set the local disk size to 1 TB (double the dataset size).
3) Open JupyterLab and run cd SageMaker in a terminal.
4) Use CurlWget to get the download link for the dataset.
5) After downloading, unzip the data:
unzip dfdc_train_all.zip
unzip '*.zip'
There you go.
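If you prefer to do the extraction from Python instead of the shell, a rough equivalent of step 5 using only the standard library might look like this (file names follow the answer above; adjust paths to your setup):
import glob
import zipfile

# Extract the top-level archive first.
with zipfile.ZipFile('dfdc_train_all.zip') as z:
    z.extractall('.')

# Then extract every part archive it contained.
for part in glob.glob('*.zip'):
    if part == 'dfdc_train_all.zip':
        continue  # already extracted
    with zipfile.ZipFile(part) as z:
        z.extractall('.')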

Related

How to read a trained data file in S3

I'm trying to make a face recognition service using AWS Lambda.
I want to deploy a .zip file that includes the trained data file.
But AWS Lambda won't deploy it because of its size.
So I changed my approach: upload the trained data file to S3 and use it from there.
But I don't know how to do that.
Could you tell me how to read the trained data file from S3 in an AWS Lambda function?
Once you have the data in S3, you can copy the file from S3 into Lambda. Lambda provides 512 MB of storage in the /tmp directory, which is writable at run time.
import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
https://docs.aws.amazon.com/lambda/latest/dg/limits.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file
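As a rough illustration of how that fits into a Lambda function, the handler below downloads the trained data file to /tmp once per container and reuses it on warm invocations; the bucket, key, and file names are placeholders, not values from your setup:
import os
import boto3

s3 = boto3.client('s3')

# Placeholder names for illustration; replace with your bucket and key.
BUCKET = 'mybucket'
KEY = 'models/trained_data.dat'
LOCAL_PATH = '/tmp/trained_data.dat'

def _ensure_model():
    # Download once per container; /tmp persists across warm invocations.
    if not os.path.exists(LOCAL_PATH):
        s3.download_file(BUCKET, KEY, LOCAL_PATH)
    return LOCAL_PATH

def lambda_handler(event, context):
    model_path = _ensure_model()
    # ... load the trained data from model_path and run face recognition ...
    return {'model_path': model_path}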

How to create an interaction between Google Drive and AWS S3?

I'm trying to set up a connection between a Google Drive folder and an S3 bucket, but I'm not sure where to start.
I've already created a sort of "Frankenstein process", but it's only easy for me to use, and sharing it with my co-workers is a pain.
I have a script that generates a plain text file and saves it into a Drive folder. To upload it, I installed Drive File Stream so it syncs to my Mac, and then I wrote a Python 3 script using the boto3 library to upload the text file into different S3 buckets depending on the file name.
I was thinking I could create a Lambda to process the file into the S3 buckets, but I can't work out how to create the connection between Drive and S3. I would appreciate it if someone could give me some advice on how to start with this.
Thanks
If you simply want to connect Google Drive and AWS S3, there is a service called Zapier that provides different types of integrations without writing a line of code:
https://zapier.com/apps/amazon-s3/integrations/google-drive
You can check this link out for more details.
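If you would rather script the transfer yourself instead of using a third-party service, one possible approach is to read the file from Drive with the Google Drive API and stream it to S3 with boto3. The sketch below assumes a service account with read access to the file; the credentials file, Drive file ID, bucket, and key are all placeholders:
import io
import boto3
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

# Placeholder credentials and identifiers for illustration.
creds = service_account.Credentials.from_service_account_file(
    'service-account.json',
    scopes=['https://www.googleapis.com/auth/drive.readonly'],
)
drive = build('drive', 'v3', credentials=creds)
FILE_ID = 'your-drive-file-id'

# Download the Drive file into memory.
buf = io.BytesIO()
downloader = MediaIoBaseDownload(buf, drive.files().get_media(fileId=FILE_ID))
done = False
while not done:
    _, done = downloader.next_chunk()
buf.seek(0)

# Upload the in-memory file to S3 without touching the local disk.
boto3.client('s3').upload_fileobj(buf, 'my-bucket', 'drive-exports/file.txt')
A script like this could also run inside a scheduled Lambda, which is roughly the direction the question describes.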

How to list several specific files from Amazon S3 quickly

I want to check whether some files are really in my S3 bucket. Using the AWS CLI I can do it for one file with ls, like
aws s3 ls s3://mybucket/file/path/file001.jpg
but I need to be able to do it for several files. Something like
aws s3 ls s3://mybucket/file/path/file001.jpg ls s3://mybucket/file/path/file002.jpg
won't work, nor will
aws s3 ls s3://mybucket/file/path/file001.jpg s3://mybucket/file/path/file005.jpg
Of course
aws s3 ls s3://mybucket/file/path/file001.jpg;
aws s3 ls s3://mybucket/file/path/file005.jpg
work perfectly, but slowly. It takes about 1 second per file, because the connection is opened and closed each time.
I have hundreds of files to check on a regular basis, so I need a fast way to do it. Thanks.
I'm not insisting on using ls or on passing a path; a "find" on the filenames would also do (but the AWS CLI seems to lack a find). Another tool would be fine, as long as it can be invoked from the command line.
I don't want to get a list of all files, or have a script look at all files and then post-process. I need a way to ask S3 to give me files a, r, z in one go.
I think the s3api list-objects call should be the one, but I can't work out the syntax for asking for several file names at once.
You can easily do that using the Python boto3 SDK for AWS:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')
for obj in bucket.objects.all():
    print(obj.key)
where mausamrest is the bucket
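The loop above lists every object, which is exactly what the question wanted to avoid. If the goal is only to check a handful of specific keys, one option is to reuse a single boto3 client (so the connection is not reopened per file) and call head_object for each key; the bucket and keys below are placeholders based on the paths in the question:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')  # one client, one reused connection pool

# Placeholder keys; list the objects you want to verify.
keys = [
    'file/path/file001.jpg',
    'file/path/file005.jpg',
]

for key in keys:
    try:
        s3.head_object(Bucket='mybucket', Key=key)
        print(key, 'exists')
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            print(key, 'missing')
        else:
            raise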

AWS S3 auto save folder

Is there a way I can autosave AutoCAD files, or changes to AutoCAD files, directly to an S3 bucket? Is there perhaps an API I can utilize for this workflow?
While I was not able to quickly find a plug-in that does that for you, you can do one of the following:
Mount S3 bucket as a drive. You can read more at CloudBerry Drive - Mount S3 bucket as Windows drive
This might create some performance issues with AutoCAD.
Sync saved files to S3
You can set a script to run every n minutes that automatically syncs your files to S3 using aws s3 sync. You can read more about AWS S3 Sync here. Your command might look something like
aws s3 sync /path/to/cad/files s3://bucket-with-cad/project/test
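A minimal sketch of the "run every n minutes" idea, simply calling the same sync command from Python in a loop (the interval is arbitrary, and a cron job or Windows Task Scheduler entry would work just as well):
import subprocess
import time

SYNC_INTERVAL_SECONDS = 5 * 60  # every 5 minutes; adjust as needed

while True:
    # Same command as above; only new or changed files are uploaded.
    subprocess.run(
        ['aws', 's3', 'sync', '/path/to/cad/files',
         's3://bucket-with-cad/project/test'],
        check=False,  # keep looping even if one sync fails
    )
    time.sleep(SYNC_INTERVAL_SECONDS)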

Merging PDF files stored on Amazon S3

Currently I'm using PDFBox to download all my PDF files to my server and then using PDFBox to merge them together. It works perfectly fine, but it's very slow, since I have to download them all.
Is there a way to perform all of this on S3 directly? I've been trying to find a way to do it, even if not in Java then in Python, and have been unable to.
I read the following:
Merging files on S3 Amazon
https://github.com/boazsegev/combine_pdf/issues/18
Is there a way to merge files stored in S3 without having to download them?
EDIT
The way I ended up doing it was with concurrent.futures, implemented with concurrent.futures.ThreadPoolExecutor. I set a maximum of 8 worker threads to download all the PDF files from S3.
Once all files were downloaded I merged them with PDFBox. Simple.
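A minimal sketch of what that parallel download step might look like, assuming a placeholder bucket and key list; the merge itself is left to PDFBox (or a Python library such as pypdf), as described above:
import os
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client('s3')  # boto3 clients can be shared across threads

# Placeholder bucket and keys for illustration.
BUCKET = 'my-pdf-bucket'
keys = ['reports/a.pdf', 'reports/b.pdf', 'reports/c.pdf']

def download(key):
    local_path = os.path.join('/tmp', os.path.basename(key))
    s3.download_file(BUCKET, key, local_path)
    return local_path

# 8 worker threads, as described in the edit above.
with ThreadPoolExecutor(max_workers=8) as pool:
    local_files = list(pool.map(download, keys))

# local_files can now be merged with PDFBox or pypdf.
print(local_files)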
S3 is just a data store, so at some level you need to transfer the PDF files from S3 to a server and then back. You'll probably gain the best speed by doing your conversions on an EC2 instance located in the same region as your S3 bucket.
If you don't want to spin up an EC2 instance yourself just to do this then another alternative may be to make use of AWS Lambda, which is a compute service where you can upload your code and have AWS manage the execution of it.