Uploading an image from Google Drive to Amazon S3 using Apps Script

I am integrating a Google Form with our backend system. The form accepts image uploads, which are stored on Google Drive. I am trying to move those Google Drive images to S3 whenever a form is submitted.
I am using this to fetch an image from Google Drive:
var driveFile = DriveApp.getFileById("imageId");
For uploading the image to S3, I am using the Apps Script library S3-for-Google-Apps-Script. The file is uploaded to S3, but its format is not correct.
The code for uploading the image to S3 is:
var s3 = S3.getInstance(awsAccessKeyId, awsSecretKey);
s3.putObject(bucket, "file name", driveFile.getBlob(), {logRequests:true});
I am not able to open the image after downloading it from S3; I get the error "It may be damaged or use a file format that Preview doesn’t recognize."
Thanks in Advance.

First run
pip install boto3 googledrivedownloader requests
then use the code given below:
import boto3
from google_drive_downloader import GoogleDriveDownloader as gdd
import os

ACCESS_KEY = 'get-from-aws'
SECRET_KEY = 'get-from-aws'
SESSION_TOKEN = 'not-mandatory'
REGION_NAME = 'ap-southeast-1'
BUCKET = 'dev-media-uploader'

def drive_to_s3_download(drive_url):
    if "drive.google" not in drive_url:
        return drive_url  # since it's not a drive url.

    client = boto3.client(
        's3',
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
        region_name=REGION_NAME,
        aws_session_token=SESSION_TOKEN  # Optional
    )

    file_id = drive_url.split('/')[5]
    print(file_id)
    gdd.download_file_from_google_drive(file_id=file_id,
                                        dest_path=f'./{file_id}.jpg',
                                        unzip=True)
    client.upload_file(Bucket=BUCKET, Key=f"{file_id}.jpg", Filename=f'./{file_id}.jpg')
    os.remove(f'./{file_id}.jpg')

    return f'https://{BUCKET}.s3.amazonaws.com/{file_id}.jpg'


client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    region_name=REGION_NAME,
    aws_session_token=SESSION_TOKEN  # Optional
)

# client.download_file(Bucket='test-bucket-drive-to-s3-upload', Key=, Filename=f'./test.jpg')

print(drive_to_s3_download('https://drive.google.com/file/d/1Wlr1PdAv8nX0qt_PWi0SJpx0IYgQDYG6/view?usp=sharing'))
The above code downloads the Drive file to local storage, uploads it to S3, and then returns the S3 URL, through which the file can be viewed by anyone, depending on the bucket's permissions.
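If you prefer not to write a temporary file to disk, roughly the same idea can stream the bytes through memory instead. This is only a sketch: the direct-download URL format is an assumption and works only for publicly shared files.

import io
import boto3
import requests

def drive_to_s3_in_memory(file_id, bucket, key, s3_client):
    # Assumed direct-download endpoint; works only for publicly shared Drive files.
    url = f"https://drive.google.com/uc?export=download&id={file_id}"
    resp = requests.get(url)
    resp.raise_for_status()
    # Upload the raw bytes without touching the local filesystem.
    s3_client.upload_fileobj(io.BytesIO(resp.content), bucket, key)
    return f"https://{bucket}.s3.amazonaws.com/{key}"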

Related

How to configure my credentials s3 in heroku [duplicate]

With boto I used to specify my credentials when connecting to S3 like this:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
With boto3, all the examples I found look like this:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't specify my credentials, and thus all attempts failed with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
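Putting the two together for the delete in the question, a minimal sketch (the bucket and key names are placeholders):

import boto3

# Build an explicit session, then delete the object through the resource API.
# `settings` holds your keys, as in the question.
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
s3 = session.resource('s3')
s3.Object('my-bucket', 'path/to/key').delete()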
You can also get a client with a new session directly, like below.
s3_client = boto3.client('s3',
                         aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
                         aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
                         region_name=REGION_NAME
                         )
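With the client API, the delete from the question is then a single call (the bucket and key names are placeholders):

# Equivalent delete through the low-level client API.
s3_client.delete_object(Bucket='my-bucket', Key='path/to/key')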
This is an older question, but I'm placing this here for my own reference too. boto3.resource just uses the default session, so you can pass session details straight through boto3.resource.
Help on function resource in module boto3:

resource(*args, **kwargs)
    Create a resource service client by name using the default session.
    See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
There you can see that it just takes the same arguments as boto3.Session:
import boto3
S3 = boto3.resource('s3', region_name='us-west-2', aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY, aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY)
S3.Object( bucket_name, key_name ).delete()
I'd like to expand on @JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to get the CLI set up from the terminal:
$> pip install awscli  # can add the --user flag
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:
After this you can access boto3 and any of the APIs without having to specify keys (unless you want to use different credentials).
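Once aws configure has written the config, a minimal sketch needs no keys in the code at all:

import boto3

# Credentials and region are resolved from ~/.aws/credentials and ~/.aws/config.
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)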
If you rely on your ~/.aws/credentials file to store the id and key for a user, it will be picked up automatically.
For instance:
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region=ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can set the default AWS environment variables for the secret and access keys; that way you don't need to change the default client creation code, though it is better to pass them as parameters if you have non-default credentials.
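A minimal sketch of that approach, setting the standard variables from Python before the client is created (normally you would export them in the shell instead; the values reuse the placeholders above):

import os
import boto3

# boto3 reads these standard environment variables when no explicit credentials are given.
os.environ['AWS_ACCESS_KEY_ID'] = 'AAABBBCCCDDDEEEFFFGG'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'FooFooFoo'
os.environ['AWS_DEFAULT_REGION'] = 'ap-southeast-2'

s3 = boto3.client('s3')  # no credential arguments needed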

How to integrate CEPH with Amazon-S3?

I'm trying to adapt the open-source project mmfashion on Amazon SageMaker, which requires the CEPH module as a backend. Unfortunately pip install ceph doesn't work. The only workaround was to build the Ceph source code manually by running the following in my container:
!git clone git://github.com/ceph/ceph
!git submodule update --init --recursive
This does allow me to import ceph successfully, but it throws the following error when it comes to fetching data from Amazon S3:
AttributeError: module 'ceph' has no attribute 'S3Client'
Has someone integrated CEPH with Amazon S3 Bucket or has suggestions in the same line on how to tackle this?
You can use the Ceph S3 API to connect to AWS buckets. Here is a simple Python example script to connect to any S3 API:
import boto
import boto.s3.connection

access_key = 'put your access key here!'
secret_key = 'put your secret key here!'

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='objects.dreamhost.com',
    # is_secure=False,  # uncomment if you are not using ssl
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
Then you will be able to list the buckets:
for bucket in conn.get_all_buckets():
    print("{name}\t{created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    ))
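If you would rather stay on boto3 (which the rest of this page uses), a sketch of the same idea points endpoint_url at the Ceph RADOS Gateway; the host name below is a placeholder, and access_key/secret_key are reused from above.

import boto3

# Point boto3 at a Ceph RGW (or any S3-compatible) endpoint instead of AWS.
s3 = boto3.resource(
    's3',
    endpoint_url='https://objects.example.com',  # placeholder RGW endpoint
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
)
for bucket in s3.buckets.all():
    print(bucket.name, bucket.creation_date)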

Configuring Google cloud bucket as Airflow Log folder

We just started using Apache Airflow in our project for our data pipelines. While exploring the features, we came to know about configuring a remote folder as the log destination in Airflow. For that we:
Created a Google Cloud bucket.
From the Airflow UI, created a new GS connection.
I am not able to understand all the fields. I just created a sample GS bucket under my project from the Google console and gave that project ID to this connection, leaving the key file path and scopes blank.
Then I edited the airflow.cfg file as follows:
remote_base_log_folder = gs://my_test_bucket/
remote_log_conn_id = test_gs
After these changes I restarted the web server and scheduler, but my DAGs are still not writing logs to the GS bucket. I am able to see the logs being created in base_log_folder, but nothing is created in my bucket.
Is there any extra configuration needed from my side to get this working?
Note: Using Airflow 1.8. (I faced the same issue with Amazon S3 as well.)
Updated on 20/09/2017: Tried the GS method (screenshot attached), but I am still not getting logs in the bucket.
Thanks,
Anoop R
I advise you to use a DAG to connect Airflow to GCP instead of the UI.
First, create a service account on GCP and download the JSON key.
Then execute this DAG (you can modify the scope of your access):
import json
from datetime import datetime

from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

def add_gcp_connection(ds, **kwargs):
    """Add an Airflow connection for GCP"""
    new_conn = Connection(
        conn_id='gcp_connection_id',
        conn_type='google_cloud_platform',
    )
    scopes = [
        "https://www.googleapis.com/auth/pubsub",
        "https://www.googleapis.com/auth/datastore",
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/devstorage.read_write",
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/cloud-platform",
    ]
    conn_extra = {
        "extra__google_cloud_platform__scope": ",".join(scopes),
        "extra__google_cloud_platform__project": "<name_of_your_project>",
        "extra__google_cloud_platform__key_path": '<path_to_your_json_key>'
    }
    conn_extra_json = json.dumps(conn_extra)
    new_conn.set_extra(conn_extra_json)

    session = settings.Session()
    if not (session.query(Connection).filter(Connection.conn_id == new_conn.conn_id).first()):
        session.add(new_conn)
        session.commit()
    else:
        msg = '\n\tA connection with `conn_id`={conn_id} already exists\n'
        msg = msg.format(conn_id=new_conn.conn_id)
        print(msg)

dag = DAG('add_gcp_connection', start_date=datetime(2016, 1, 1), schedule_interval='@once')

# Task to add a connection
AddGCPCreds = PythonOperator(
    dag=dag,
    task_id='add_gcp_connection_python',
    python_callable=add_gcp_connection,
    provide_context=True)
Thanks to Yu Ishikawa for this code.
Yes, you need to provide additional information for both the S3 and GCP connections.
S3
Configuration is passed via the extra field as JSON. You can provide only a profile
{"profile": "xxx"}
or credentials
{"profile": "xxx", "aws_access_key_id": "xxx", "aws_secret_access_key": "xxx"}
or a path to a config file
{"profile": "xxx", "s3_config_file": "xxx", "s3_config_format": "xxx"}
In the case of the first option, boto will try to detect your credentials.
Source code - airflow/hooks/S3_hook.py:107
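As a sketch, the S3 connection can also be created programmatically with its extra field set to that JSON, mirroring the GCP DAG above; the conn_id and credential values here are placeholders, and this is just one way to register the connection.

import json

from airflow import settings
from airflow.models import Connection

# Placeholder S3 logging connection with credentials passed via the extra JSON.
s3_conn = Connection(
    conn_id='test_s3',
    conn_type='s3',
    extra=json.dumps({
        "aws_access_key_id": "xxx",
        "aws_secret_access_key": "xxx",
    }),
)

session = settings.Session()
session.add(s3_conn)
session.commit()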
GCP
You can either provide key_path and scope (see Service account credentials) or credentials will be extracted from your environment in this order:
Environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to a file with stored credentials information.
Stored "well known" file associated with gcloud command line tool.
Google App Engine (production and testing)
Google Compute Engine production environment.
Source code - airflow/contrib/hooks/gcp_api_base_hook.py:68
The reason for logs not being written to your bucket could be related to the service account rather than the configuration in Airflow itself. Make sure it has access to the mentioned bucket. I had the same problems in the past.
Try adding more generous permissions to the service account, e.g. even project-wide Editor, and then narrowing them down. You could also try using the GCS client with that key and see if you can write to the bucket, as in the sketch below.
For me personally this scope works fine for writing logs: "https://www.googleapis.com/auth/cloud-platform"
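A minimal sketch of that check, using the google-cloud-storage client with the same service account key (the bucket name and key path are placeholders):

from google.cloud import storage

# Authenticate with the same service account key that Airflow uses.
client = storage.Client.from_service_account_json('/path/to/key.json')
bucket = client.bucket('my_test_bucket')

# If this upload fails, the service account lacks write access to the bucket.
bucket.blob('airflow-log-write-test.txt').upload_from_string('test')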

S3 Python client with boto3 SDK

I'd like to make a Python S3 client to store data in the S3 Dynamic Storage service provided by the appcloud. So I've discovered the boto3 SDK for Python and was wondering how this works on the appcloud. Locally you install the AWS CLI to configure your credentials, but how do you do that on the cloud? Does someone have experience with creating an S3 Python client for the internal appcloud, and could you provide a short example (boto3 or a different approach)?
Greetings
Edit 1:
Tried this:
import boto3
s3 = boto3.client('s3', endpoint_url='https://ds31s3.swisscom.com/', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET)
s3.create_bucket(Bucket="sc-testbucket1234")
But I got this exception:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://ds31s3.swisscom.com"
import boto3
conn = boto3.resource('s3',
                      region_name='eu-west-1',
                      endpoint_url='https://x',
                      aws_access_key_id='xx',
                      aws_secret_access_key='xx',)
conn.create_bucket(Bucket="bucketname")
This works with the following configuration (with Python 3.5):
import boto3
conn = boto3.resource('s3', region_name='eu-west-1', endpoint_url=HOST, aws_access_key_id=KEY, aws_secret_access_key=SECRET)
conn.create_bucket(Bucket="pqdjmalsdnf12098")
Thanks to @user3080315
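Once the resource is wired to the right endpoint, a small usage sketch for actually storing and listing data (the bucket name reuses the one created above, and the key and body are placeholders):

# Upload an object to the newly created bucket and list its contents.
bucket = conn.Bucket("pqdjmalsdnf12098")
bucket.put_object(Key="hello.txt", Body=b"hello from boto3")

for obj in bucket.objects.all():
    print(obj.key, obj.size)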

Laravel uploaded image fires a 404 Not Found error

I've written a Laravel 5.3 app that uploads an image and saves its details to the database. When I used it on localhost, everything worked fine. As soon as I deployed it to Forge, all images uploaded via the app return a 404 code instead of displaying the image.
Some details:
The new files are uploaded correctly. I can access them via the console and ls -l the files.
The new files have the same permissions and owner as the files I uploaded locally and then pushed to the server. The old files (those pushed) are showing. The new files (those newly uploaded) are giving 404.
The new files and the old files have an identical URL structure. That is, they both go to /storage/uploads/file_name.jpg
The upload code:
if ($image = $request->file('image')) {
    $file_data = [];
    $file_data['path'] = $image->store('uploads', 'public');
    $file_details = pathinfo($file_data['path']);
    $file_data['url'] = 'uploads/';
    $file_data['filename'] = $file_details['filename'];
    $file_data['extension'] = $file_details['extension'];
    $file_data['is_image'] = 1;

    $file = new File($file_data);
    $file->save();

    $data['image'] = $file->id;
}
Help, please!