How to test Luigi with FakeS3?

I'm trying to test my Luigi pipelines inside a Vagrant machine, using FakeS3 to simulate my S3 endpoints. For boto to be able to interact with FakeS3, the connection must be set up with the OrdinaryCallingFormat, as in:
from boto.s3.connection import S3Connection, OrdinaryCallingFormat
conn = S3Connection('XXX', 'XXX', is_secure=False,
                    port=4567, host='localhost',
                    calling_format=OrdinaryCallingFormat())
but when using Luigi this connection is buried in the s3 module. I was able to pass most of the options by modifying my luigi.cfg and adding an s3 section as in
[s3]
host=127.0.0.1
port=4567
aws_access_key_id=XXX
aws_secret_access_key=XXXXXX
is_secure=0
but I don't know how to pass the required object for the calling_format.
Now I'm stuck and don't know how to proceed. Options I can think of:
Figure out how to pass the OrdinaryCallingFormat to S3Connection through luigi.cfg
Figure out how to force boto to always use this calling format in my Vagrant machine, by setting an option unknown to me in either .aws/config or boto.cfg
Make FakeS3 accept the default calling_format used by boto, which happens to be SubdomainCallingFormat (whatever that means).
Any ideas about how to fix this?

Can you not pass it into the constructor as kwargs for the S3Client?
from boto.s3.connection import OrdinaryCallingFormat
from luigi.s3 import S3Client, S3Target  # luigi.contrib.s3 on newer Luigi versions

client = S3Client(aws_access_key, aws_secret_key,
                  calling_format=OrdinaryCallingFormat())
target = S3Target('s3://somebucket/test', client=client)

I did not encounter any problems when using boto3 to connect to FakeS3.
import boto3
s3 = boto3.client(
    "s3", region_name="fakes3",
    use_ssl=False,
    aws_access_key_id="",
    aws_secret_access_key="",
    endpoint_url="http://localhost:4567"
)
No special calling format is required.
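To sanity-check the connection (assuming FakeS3 is listening on localhost:4567; the bucket and key names below are placeholders), a quick round trip with the s3 client created above:
# Quick round-trip test against FakeS3; bucket/key names are placeholders.
s3.create_bucket(Bucket="test-bucket")
s3.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hello")
body = s3.get_object(Bucket="test-bucket", Key="hello.txt")["Body"].read()
print(body)  # b'hello'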
Perhaps I am wrong that you really need OrdinaryCallingFormat. If my code doesn't work, please go through the GitHub issue on boto3 support at:
https://github.com/boto/boto3/issues/334

You can set it with the calling_format parameter. Here is a configuration example for fake-s3:
[s3]
aws_access_key_id=123
aws_secret_access_key=abc
host=fake-s3
port=4569
is_secure=0
calling_format=boto.s3.connection.OrdinaryCallingFormat
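For completeness, here is a rough sketch of a Luigi task writing to FakeS3 once that configuration is in place; the bucket and key are placeholders, and depending on your Luigi version the import may be luigi.s3 or luigi.contrib.s3:
import luigi
from luigi.s3 import S3Target  # luigi.contrib.s3 on newer versions

class UploadReport(luigi.Task):
    def output(self):
        # S3Client picks up the [s3] section of luigi.cfg, including calling_format.
        return S3Target('s3://somebucket/report.csv')

    def run(self):
        with self.output().open('w') as f:
            f.write('col_a,col_b\n1,2\n')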

Related

sqlite3.OperationalError: When trying to connect to S3 Airflow Hook

I'm currently exploring implementing hooks in some of my DAGs. For instance, in one DAG I'm trying to connect to S3 to send a CSV file to a bucket, which then gets copied to a Redshift table.
I have a custom module written which I import to run this process. I'm currently trying to set up an S3Hook to handle this process instead, but I'm a little confused about setting up the connection and how everything works.
First, I import the hook
from airflow.hooks.S3_hook import S3Hook
Then I try to make the hook instance
s3_hook = S3Hook(aws_conn_id='aws-s3')
Next I try to set up the client
s3_client = s3_hook.get_conn()
However, when I run the client line above, I receive this error:
OperationalError: (sqlite3.OperationalError)
no such table: connection
[SQL: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted
FROM connection
WHERE connection.conn_id = ?
LIMIT ? OFFSET ?]
[parameters: ('aws-s3', 1, 0)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
I'm trying to diagnose the error, but the traceback is long. I'm a little confused about why sqlite3 is involved when I'm trying to use S3. Can anyone unpack this? Why is this error being thrown when trying to set up the client?
Thanks
Airflow is not just a library - it's also an application.
To execute Airflow code you must have an Airflow instance running, which also means having a database with the needed schema.
To create the tables you must run airflow db init (or airflow initdb on older 1.10 releases).
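Once the metadata database exists and an 'aws-s3' connection has been created (for example via the Airflow UI under Admin > Connections), the hook should work. A minimal sketch, with placeholder file, key, and bucket names:
from airflow.hooks.S3_hook import S3Hook

# Assumes an 'aws-s3' connection exists in the metadata DB; all names below are placeholders.
s3_hook = S3Hook(aws_conn_id='aws-s3')
s3_hook.load_file(
    filename='/path/to/local/report.csv',  # local file produced by your custom module
    key='exports/report.csv',              # destination key in the bucket
    bucket_name='my-bucket',
    replace=True,
)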
Edit:
After the discussion in the comments: your issue is that you have a working Airflow application inside Docker, but your DAGs are written to your local disk. Docker is a closed environment; if you want Airflow to recognize your DAGs, you must move the files into the DAG folder inside the container.

boto3 load custom models

For example:
session = boto3.Session()
client = session.client('custom-service')
I know that I can create a JSON file with the API definitions under ~/.aws/models and botocore will load it from there. The problem is that I need to get this done in an AWS Lambda function, where that looks impossible to do.
I'm looking for a way to tell boto3 where the custom JSON API definitions are, so it can load them from the defined path.
Thanks
I have only a partial answer. There's a bit of documentation about botocore's loader module, which is what reads the model files. In a discussion about loading models from ZIP archives, a monkey patch was offered up which extracts the ZIP to a temporary filesystem location and then extends the loader search path to that location. It doesn't seem like you can load model data directly from memory based on the API, but Lambda does give you some scratch space in /tmp.
Here's the important bits:
import boto3
session = boto3.Session()
session._loader.search_paths.extend(["/tmp/boto"])
client = session.client("custom-service")
The directory structure of /tmp/boto needs to follow the resource loader documentation. The main model file needs to be at /tmp/boto/custom-service/yyyy-mm-dd/service-2.json.
The issue also mentions that alternative loaders can be swapped in using Session.register_component, so if you wanted to write a scrappy loader that returned a model straight from memory, you could try that too. I don't have any info about how to go about doing that.
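For what it's worth, here is a rough, untested sketch of that idea, assuming botocore's 'data_loader' component name and the Loader.load_service_model interface; MODELS is a hypothetical dict holding parsed service-2.json content in memory:
import boto3
from botocore.loaders import Loader

# Hypothetical: parsed service-2.json content kept in memory, keyed by (service, type).
MODELS = {("custom-service", "service-2"): {"metadata": {}, "operations": {}, "shapes": {}}}

class InMemoryLoader(Loader):
    def load_service_model(self, service_name, type_name, api_version=None):
        # Serve the in-memory model if we have it; otherwise fall back to the
        # normal file-based search path.
        key = (service_name, type_name)
        if key in MODELS:
            return MODELS[key]
        return super().load_service_model(service_name, type_name, api_version)

session = boto3.Session()
# Swap the data loader on the underlying botocore session (untested).
session._session.register_component('data_loader', InMemoryLoader())
client = session.client('custom-service')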
Just adding more details:
import boto3
import zipfile
import os
s3_client = boto3.client('s3')
s3_client.download_file('your-bucket','model.zip','/tmp/model.zip')
os.chdir('/tmp')
with zipfile.ZipFile('model.zip', 'r') as archive:
    archive.extractall()
session = boto3.Session()
session._loader.search_paths.extend(["/tmp/boto"])
client = session.client("custom-service")
model.zip is just a compressed file that contains:
Archive: model.zip
Length Date Time Name
--------- ---------- ----- ----
0 11-04-2020 16:44 boto/
0 11-04-2020 16:44 boto/custom-service/
0 11-04-2020 16:44 boto/custom-service/2018-04-23/
21440 11-04-2020 16:44 boto/custom-service/2018-04-23/service-2.json
Just remember to have the proper lambda role to access S3 and your custom-service.
boto3 also honors the AWS_DATA_PATH environment variable, which can point to a directory path of your choice.
[boto3 Docs]
Everything you package into a Lambda layer is extracted under /opt/ (the function's own deployment package lands under /var/task).
Let's assume all your custom models live under a models/ folder in that layer. When the layer is mounted into the Lambda environment, the folder will live under /opt/models/.
Simply specify AWS_DATA_PATH=/opt/models/ in the Lambda configuration and boto3 will pick up models in that directory.
This is better than fetching models from S3 during runtime, unpacking, and then modifying session parameters.
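As a sketch, assuming a layer laid out as /opt/models/custom-service/2018-04-23/service-2.json, the variable can also be set from code, as long as that happens before the session (and its data loader) is created:
import os
import boto3

# Setting AWS_DATA_PATH in the Lambda configuration has the same effect; when set
# from code, it must happen before the Session (and its data loader) is created.
os.environ["AWS_DATA_PATH"] = "/opt/models"

session = boto3.Session()
client = session.client("custom-service")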

boto3 s3 copy_object with ContentEncoding argument

I'm trying to copy an S3 object with a boto3 command like the one below:
import boto3
client = boto3.client('s3')
client.copy_object(Bucket=bucket_name, ContentEncoding='gzip', CopySource=copy_source, Key=new_key)
The copy succeeds, but the ContentEncoding metadata is not added to the object.
When I use the console to add the Content-Encoding metadata, there is no problem.
But the boto3 copy command does not set it.
Here's the documentation link for client.copy_object():
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object
And the application versions are like this.
python=2.7.16
boto3=1.0.28
botocore=1.13.50
Thank you in advance.
Try adding MetadataDirective='REPLACE' to your copy_object call
client.copy_object(Bucket=bucket_name, ContentEncoding='gzip', CopySource=copy_source, Key=new_key, MetadataDirective='REPLACE')
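The reason is that copy_object defaults to MetadataDirective='COPY', which keeps the source object's metadata and ignores any metadata supplied in the request. A fuller sketch, with placeholder bucket and key names:
import boto3

client = boto3.client('s3')

# Placeholder bucket/key names.
copy_source = {'Bucket': 'source-bucket', 'Key': 'data/file.gz'}
client.copy_object(
    Bucket='dest-bucket',
    Key='data/file.gz',
    CopySource=copy_source,
    ContentEncoding='gzip',
    ContentType='text/csv',        # with REPLACE, metadata must be re-specified in full
    MetadataDirective='REPLACE',   # without this, the new ContentEncoding is ignored
)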

Accessing a database using zeep

I am trying to programmatically retrieve information from a database (BRENDA) using zeep.
The following is the code.
import zeep
import hashlib
wsdl = "https://www.brenda-enzymes.org/soap/brenda.wsdl"
password = hashlib.sha256("xx".encode('utf-8')).hexdigest()
parameters = "xxx," + password + ",ecNumber*{}#organism*{}#".format("2.7.1.2", "Homo sapiens")
client = zeep.Client(wsdl=wsdl)
print(client)
km_string = client.getKmValue(parameters)
However, I get the following error
AttributeError: 'Client' object has no attribute 'getKmValue'
Could someone help me with this?
The above code works fine when using the SOAPpy library in Python 2. However, I couldn't successfully install SOAPpy in Python 3, so I tried zeep.
The sample code that shows SOAP implementation is available here
We fixed the web service. It should work now. Please have a look at the SOAP documentation on our website.
Not the resolution, but some hints.
1) With zeep you need to put .service between the client and the name of the method; the correct syntax is client.service.getKmValue(parameters) (take a look at the documentation).
That said, for zeep getKmValue doesn't exist (although it exists in the WSDL schema and SoapUI sees it).
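For reference, the call from the question with the .service syntax would look like this (same credentials handling and parameter string as above):
import zeep
import hashlib

wsdl = "https://www.brenda-enzymes.org/soap/brenda.wsdl"
password = hashlib.sha256("xx".encode('utf-8')).hexdigest()
parameters = "xxx," + password + ",ecNumber*{}#organism*{}#".format("2.7.1.2", "Homo sapiens")

client = zeep.Client(wsdl=wsdl)
# Methods are exposed on client.service, not on the client object itself.
km_string = client.service.getKmValue(parameters)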
2) You can also try py-suds, but for some reason I obtain a 403 when calling the WSDL:
from suds.client import Client
import hashlib
client = Client("https://www.brenda-enzymes.org/soap/brenda.wsdl")

Renaming an Amazon CloudWatch Alarm

I'm trying to organize a large number of CloudWatch alarms for maintainability, and the web console grays out the name field on an edit. Is there another method (preferably something scriptable) for updating the name of CloudWatch alarms? I would prefer a solution that does not require any programming beyond simple executable scripts.
Here's a script we use to do this for the time being:
import sys
import boto

def rename_alarm(alarm_name, new_alarm_name):
    conn = boto.connect_cloudwatch()

    def get_alarm():
        alarms = conn.describe_alarms(alarm_names=[alarm_name])
        if not alarms:
            raise Exception("Alarm '%s' not found" % alarm_name)
        return alarms[0]

    alarm = get_alarm()

    # work around boto comparison serialization issue
    # https://github.com/boto/boto/issues/1311
    alarm.comparison = alarm._cmp_map.get(alarm.comparison)

    alarm.name = new_alarm_name
    conn.update_alarm(alarm)

    # update actually creates a new alarm because the name has changed, so
    # we have to manually delete the old one
    get_alarm().delete()

if __name__ == '__main__':
    alarm_name, new_alarm_name = sys.argv[1:3]
    rename_alarm(alarm_name, new_alarm_name)
It assumes you're either on an ec2 instance with a role that allows this, or you've got a ~/.boto file with your credentials. It's easy enough to manually add yours.
Unfortunately, it looks like this is not currently possible.
I looked around for the same solution, but it seems neither the console nor the CloudWatch API provides that feature.
Note: we can, however, copy the existing alarm with the same parameters and save it under a new name.
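A rough boto3 sketch of that copy-and-rename approach for a simple metric alarm; the key list is an assumption and does not cover composite or anomaly-detection alarms:
import boto3

cloudwatch = boto3.client("cloudwatch")

def copy_alarm(old_name, new_name):
    # Fetch the existing alarm definition.
    alarm = cloudwatch.describe_alarms(AlarmNames=[old_name])["MetricAlarms"][0]
    # Keep only keys that put_metric_alarm accepts; describe_alarms also returns
    # read-only fields such as AlarmArn and StateValue.
    keys = [
        "AlarmDescription", "ActionsEnabled", "OKActions", "AlarmActions",
        "InsufficientDataActions", "MetricName", "Namespace", "Statistic",
        "ExtendedStatistic", "Dimensions", "Period", "EvaluationPeriods",
        "DatapointsToAlarm", "Threshold", "ComparisonOperator",
        "TreatMissingData", "Unit",
    ]
    params = {k: alarm[k] for k in keys if k in alarm}
    cloudwatch.put_metric_alarm(AlarmName=new_name, **params)
    # The copy exists now, so remove the old alarm.
    cloudwatch.delete_alarms(AlarmNames=[old_name])

copy_alarm("old-alarm-name", "new-alarm-name")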