How to batch export CloudWatch Logs to S3 using Lambda (Python)

The AWS developer guide does not explain how to batch export CloudWatch Logs to S3 using Lambda. Below is my code using boto3. I've researched this for quite some time and could not find any example that batch exports CloudWatch Logs to S3 from Lambda.
Any help is appreciated!
import boto3

region = 'us-east-2'

def lambda_handler(event, context):
    # CloudWatch Logs client in the region that holds the log group
    client = boto3.client('logs', region_name=region)
    response = client.create_export_task(
        taskName='export to S3',
        logGroupName='/aws/lambda/my_bucket',
        logStreamNamePrefix='default',
        fromTime=456858943000,  # start of the range, in milliseconds since the epoch
        to=5655086899000,       # end of the range, in milliseconds since the epoch
        destination='my_bucket',
        destinationPrefix='default'
    )
    print(response)
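One possible way to turn the single export above into a batch is sketched below. It is only a sketch under several assumptions: the function name export_log_groups, the 24-hour window, the task-name scheme, and the one-prefix-per-log-group layout are all illustrative choices that are not in the question. It relies on the boto3 CloudWatch Logs calls get_paginator('describe_log_groups'), create_export_task, and describe_export_tasks, and it waits for each task to finish before starting the next, on the assumption that the account allows only one active export task at a time.

```python
import time


def export_log_groups(logs_client, bucket, hours=24, poll_seconds=5, sleep=time.sleep):
    """Export the last `hours` of every log group to `bucket`, one task at a time."""
    now_ms = int(time.time() * 1000)          # the API expects milliseconds
    from_ms = now_ms - hours * 60 * 60 * 1000
    task_ids = []

    paginator = logs_client.get_paginator('describe_log_groups')
    for page in paginator.paginate():
        for group in page['logGroups']:
            name = group['logGroupName']
            response = logs_client.create_export_task(
                # Hypothetical naming scheme, e.g. "export-aws-lambda-my_function"
                taskName='export-' + name.strip('/').replace('/', '-'),
                logGroupName=name,
                fromTime=from_ms,
                to=now_ms,
                destination=bucket,
                # Hypothetical layout: one S3 prefix per log group
                destinationPrefix=name.strip('/'),
            )
            task_id = response['taskId']
            task_ids.append(task_id)

            # Poll until this task reaches a final state before starting the next.
            while True:
                tasks = logs_client.describe_export_tasks(taskId=task_id)
                status = tasks['exportTasks'][0]['status']['code']
                if status in ('COMPLETED', 'FAILED', 'CANCELLED'):
                    break
                sleep(poll_seconds)
    return task_ids
```

Inside lambda_handler this could be called as export_log_groups(boto3.client('logs'), 'my_bucket'). With many log groups the polling may outlast the Lambda timeout, so the function might need to be split across invocations.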

Related

DataNotFoundError: Unable to load data for: endpoints

I'm trying to write a data frame as CSV directly to an S3 bucket.
I've tried the StringIO method, but I run into the "KeyTooLong" error.
import boto3
client = boto3.client('s3')
client.create_bucket(Bucket = 'poolpo-rent-a-car-bucket')
# checking if the bucket was created
response = client.list_buckets()
response['Buckets']
bucket_name = 'poolpo-rent-a-car-bucket'
car_costs.to_csv(f"s3://{bucket_name}/{car_costs}.csv")
This is the StringIO one
from io import StringIO
bucket_name = 'poolpo-rent-a-car-bucket'
csv_buffer = StringIO()
branch_locations.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket_name, f'{branch_locations}.csv').put(Body=csv_buffer.getvalue())
And the error
ClientError: An error occurred (KeyTooLongError) when calling the PutObject operation: Your key is too long
These are medium size dataframes, like 5000 rows and like 3-5 columns
For an unrelated reason, I had to reinstall Anaconda, and the problems went away.
I ended up using a much simpler approach.
import boto3
client = boto3.client('s3')
client.create_bucket(Bucket = 'poolpo-rent-a-car-bucket')
response = client.list_buckets()
response['Buckets']
bucket_name = 'poolpo-rent-a-car-bucket'
car_costs.to_csv(f"s3://{bucket_name}/car_costs.csv")
One other thing I noticed: when I put {car_costs} or {branch_locations} in the f-string, I was inserting the dataframe itself (its full text representation) into the object key instead of a name, which is why I got the KeyTooLongError.
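A small guard can catch this mistake before the upload. The sketch below is illustrative only: build_csv_key and fake_table_text are hypothetical names, and the long string merely stands in for the text you get when a dataframe is formatted into an f-string. It relies on the S3 limit of 1,024 bytes of UTF-8 per object key.

```python
MAX_S3_KEY_BYTES = 1024  # S3 object keys may be at most 1,024 bytes of UTF-8


def build_csv_key(name):
    """Build '<name>.csv' and fail early if S3 would reject the key."""
    key = f"{name}.csv"
    size = len(key.encode("utf-8"))
    if size > MAX_S3_KEY_BYTES:
        raise ValueError(f"S3 key is {size} bytes; did you pass the data instead of a name?")
    return key


# Stand-in for the rendered text of a table (not a real dataframe).
fake_table_text = "\n".join(f"{i}  branch_{i}  city_{i}" for i in range(5000))

print(build_csv_key("car_costs"))  # car_costs.csv

try:
    build_csv_key(fake_table_text)
except ValueError as err:
    print("rejected:", err)
```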

AWS S3 and Sagemaker: No such file or directory

I have created an S3 bucket 'testshivaproject' and uploaded an image in it. When I try to access it in sagemaker notebook, it throws an error 'No such file or directory'.
# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
# Define IAM role
role = get_execution_role()
my_region = boto3.session.Session().region_name # set the region of the instance
print("success :"+my_region)
Output: success :us-east-2
role
Output: 'arn:aws:iam::847047967498:role/service-role/AmazonSageMaker-ExecutionRole-20190825T121483'
bucket = 'testprojectshiva2'
data_key = 'ext_image6.jpg'
data_location = 's3://{}/{}'.format(bucket, data_key)
print(data_location)
Output: s3://testprojectshiva2/ext_image6.jpg
test = load_img(data_location)
Output: No such file or directory
There are similar questions (Load S3 Data into AWS SageMaker Notebook), but I did not find a solution there.
Thanks for using Amazon SageMaker!
I sort of guessed from your description, but are you trying to use the Keras load_img function to load images directly from your S3 bucket?
Unfortunately, the load_img function is designed to load files only from disk, so passing an s3:// URL to it will always fail with a FileNotFoundError.
It's common to first download images from S3 before using them, so you can use boto3 or the AWS CLI to download the file before calling load_img.
Alternatively, since the load_img function simply creates a PIL Image object, you can create the PIL object directly from the data in S3 using boto3, and not use the load_img function at all.
In other words, you could do something like this:
from io import BytesIO
import boto3
from PIL import Image
s3 = boto3.client('s3')
# read the object's bytes from S3 and open them as a PIL image
test = Image.open(BytesIO(
    s3.get_object(Bucket=bucket, Key=data_key)['Body'].read()
))
Hope this helps you out in your project!
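The first option mentioned in this answer, downloading the file before calling load_img, could look roughly like the sketch below. The helper name download_to_local is hypothetical; the only boto3 call it depends on is the S3 client's download_file(Bucket, Key, Filename). The client is passed in as an argument so the helper itself needs nothing beyond the standard library.

```python
import os
import tempfile


def download_to_local(s3_client, bucket, key, directory=None):
    """Download s3://bucket/key into `directory` and return the local path."""
    directory = directory or tempfile.mkdtemp()
    local_path = os.path.join(directory, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Usage in the notebook (assumes boto3 and Keras are installed):
#   import boto3
#   path = download_to_local(boto3.client('s3'), bucket, data_key)
#   test = load_img(path)
```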
You may use the following code to pull in a CSV file into sagemaker.
import pandas as pd
bucket='your-s3-bucket'
data_key = 'your.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
df = pd.read_csv(data_location)
Alternative formatting for the data_location variable:
data_location = f's3://{bucket}/{data_key}'

how to link s3 bucket to sagemaker notebook

I am trying to link my S3 bucket to a notebook instance, but I am not able to.
Here is what I have so far:
from sagemaker import get_execution_role
role = get_execution_role
bucket = 'atwinebankloadrisk'
datalocation = 'atwinebankloadrisk'
data_location = 's3://{}/'.format(bucket)
output_location = 's3://{}/'.format(bucket)
to call the data from the bucket:
df_test = pd.read_csv(data_location/'application_test.csv')
df_train = pd.read_csv('./application_train.csv')
df_bureau = pd.read_csv('./bureau_balance.csv')
However, I keep getting errors and am unable to proceed.
I haven't found answers that help much.
PS: I am new to AWS.
You can load S3 data into an AWS SageMaker notebook by using the sample code below. Make sure the Amazon SageMaker role has a policy attached that grants it access to S3.
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
import boto3
import botocore
import pandas as pd
from sagemaker import get_execution_role
role = get_execution_role()
bucket = 'Your_bucket_name'
data_key = 'your_data_file.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
pd.read_csv(data_location)
You're trying to use Pandas to read files from S3. Pandas can always read files from your local disk, but it can read s3:// paths only when the optional s3fs package is installed.
If that is not available, download the files from S3 to your local disk first, then use Pandas to read them.
import boto3
import botocore
BUCKET_NAME = 'my-bucket' # replace with your bucket name
KEY = 'my_image_in_s3.jpg' # replace with your object key
s3 = boto3.resource('s3')
try:
    # download as local file
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
    # OR read directly to memory as bytes:
    # bytes = s3.Object(BUCKET_NAME, KEY).get()['Body'].read()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
You can use s3fs (https://s3fs.readthedocs.io/en/latest/) to read S3 files directly with pandas. The code below is taken from here
import os
import pandas as pd
from s3fs.core import S3FileSystem
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'
s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'  # S3 keys use forward slashes; '\t' in a Python string is a tab
bucket = 'your-bucket-name'
df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
In pandas 1.0.5, if you've already provided access to the notebook instance, reading a csv from S3 is as easy as this (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-remote-files):
df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')
During the notebook setup process I attached a SageMakerFullAccess policy to the notebook instance granting it access to the S3 bucket. You can also do this via the IAM Management console.
If you need credentials, there are three ways to provide them (https://s3fs.readthedocs.io/en/latest/#credentials):
the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables
configuration files such as ~/.aws/credentials
for nodes on EC2, the IAM metadata provider
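As a quick sanity check before reading from S3, the first two sources in the list above can be detected with the standard library alone. This is only a sketch: available_credential_sources is a hypothetical helper, it assumes the upper-case environment variable names that botocore reads, and an empty result does not mean access will fail, because an IAM role supplied through instance metadata is not visible this way.

```python
import os


def available_credential_sources(environ=None, credentials_file=None):
    """Report which locally visible credential sources appear to be set up."""
    environ = os.environ if environ is None else environ
    if credentials_file is None:
        credentials_file = os.path.expanduser('~/.aws/credentials')

    sources = []
    if 'AWS_ACCESS_KEY_ID' in environ and 'AWS_SECRET_ACCESS_KEY' in environ:
        sources.append('environment variables')
    if os.path.isfile(credentials_file):
        sources.append('credentials file')
    return sources


print(available_credential_sources())
```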
import boto3
# Files are referred to as objects in S3.
# The file name is referred to as the key name in S3.
def write_to_s3(filename, bucket_name, key):
    with open(filename, 'rb') as f:  # read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_fileobj(f)
# Simply call the write_to_s3 function with the required arguments
write_to_s3('file_name.csv',
            bucket_name,
            'file_name.csv')

AWS S3 bucket write error

I created an AWS S3 bucket and tried the sample k-means example in a Jupyter notebook.
As the account owner I have read/write permissions, but I am unable to write logs and get the following error:
ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
Here's the k-means sample code:
from sagemaker import get_execution_role
role = get_execution_role()
bucket='testingshk'
import pickle, gzip, numpy, urllib.request, json
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
from sagemaker import KMeans
data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_example/output'.format(bucket)
print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))
kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
kmeans.fit(kmeans.record_set(train_set[0]))
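Even if you own the bucket, a private bucket only accepts a put from credentials that are allowed to write to it, so you need to provide an access key and secret with that permission. In a SageMaker notebook the credentials come from the execution role, so that role needs permission to put objects in the bucket. Alternatively, if you make the bucket publicly writable you can push objects to it without any problem, although that opens it to everyone.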
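For reference, a policy that lets a role write to one bucket could be generated as in the sketch below. The helper name s3_write_policy and the exact set of actions are assumptions chosen for illustration; the JSON layout follows the standard IAM policy document format. The result would be attached to the notebook's execution role in the IAM console.

```python
import json


def s3_write_policy(bucket):
    """Build an IAM policy document allowing list, get and put on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading and writing apply to the objects inside it
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


print(json.dumps(s3_write_policy("testingshk"), indent=2))
```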

Python Redis Queue ValueError: Functions from the __main__ module cannot be processed by workers

I'm trying to enqueue a basic job in Redis using python-rq, but it throws this error:
"ValueError: Functions from the __main__ module cannot be processed by workers"
Here is my program:
import requests
def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())
from rq import Connection, Queue
from redis import Redis
redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
Split the provided code into two files:
count_words.py:
import requests
def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())
and main.py (where you'll import the required function):
from rq import Connection, Queue
from redis import Redis
from count_words import count_words_at_url # added import!
redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
I always keep the tasks in separate files from the logic that runs them. It's just better organization. Also note that you can define a class of tasks and import/schedule tasks from that class instead of the (over-simplified) structure I suggest above. This should get you going.
Also see here to confirm you're not the first to struggle with this example. RQ is great once you get the hang of it.
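RQ raises this error because a worker runs in a separate process and looks the function up by its module path; a function defined in the script you run belongs to __main__, which the worker cannot import. You therefore cannot pass a function from the same file to enqueue without explicitly importing it.
Assuming the file is named app.py, just add from app import count_words_at_url above the enqueue call: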
import requests
def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())
from rq import Connection, Queue
from redis import Redis
redis_conn = Redis()
q = Queue(connection=redis_conn)
from app import count_words_at_url
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
The other way is to have the functions in a separate file and import them.
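Why the import matters can be illustrated without RQ at all. The sketch below is not RQ's code; worker_import_path is a hypothetical helper that mimics the idea of turning a function into a dotted path that another process could import, and it refuses functions whose module is __main__ for the same reason the error message gives.

```python
import json


def worker_import_path(func):
    """Return the dotted path a separate worker process could import."""
    if func.__module__ == "__main__":
        raise ValueError(
            "Functions from the __main__ module cannot be processed by workers"
        )
    return f"{func.__module__}.{func.__qualname__}"


# A function that lives in an importable module resolves to a usable path.
print(worker_import_path(json.dumps))  # json.dumps
```

A function defined in the script being run reports __main__ as its module, while the same function imported from count_words.py reports count_words, which a worker can import.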