AWS S3 and Sagemaker: No such file or directory

I have created an S3 bucket 'testshivaproject' and uploaded an image to it. When I try to access it in a SageMaker notebook, it throws the error 'No such file or directory'.
# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
# Define IAM role
role = get_execution_role()
my_region = boto3.session.Session().region_name # set the region of the instance
print("success :"+my_region)
Output: success :us-east-2
role
Output: 'arn:aws:iam::847047967498:role/service-role/AmazonSageMaker-ExecutionRole-20190825T121483'
bucket = 'testprojectshiva2'
data_key = 'ext_image6.jpg'
data_location = 's3://{}/{}'.format(bucket, data_key)
print(data_location)
Output: s3://testprojectshiva2/ext_image6.jpg
test = load_img(data_location)
Output: No such file or directory
Similar questions have been asked (Load S3 Data into AWS SageMaker Notebook), but I did not find a solution there.

Thanks for using Amazon SageMaker!
I sort of guessed from your description, but are you trying to use the Keras load_img function to load images directly from your S3 bucket?
Unfortunately, the load_img function is designed to load files only from disk, so passing an s3:// URL to it will always raise a FileNotFoundError.
It's common to first download images from S3 before using them, so you can use boto3 or the AWS CLI to download the file before calling load_img.
Alternatively, since the load_img function simply creates a PIL Image object, you can create the PIL object directly from the data in S3 using boto3, and not use the load_img function at all.
In other words, you could do something like this:
from io import BytesIO
import boto3
from PIL import Image
s3 = boto3.client('s3')
# Read the object's bytes from S3 and open them as a PIL image
test = Image.open(BytesIO(
    s3.get_object(Bucket=bucket, Key=data_key)['Body'].read()
))
Hope this helps you out in your project!
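For the download-first route mentioned above, here is a minimal sketch. The helper name download_to_temp is hypothetical, and the client is passed in as an argument so the function itself has no hard dependency on credentials; it relies only on the boto3 client's download_file(Bucket, Key, Filename) call.

```python
import os
import tempfile


def download_to_temp(s3_client, bucket, key):
    """Download s3://bucket/key to a temporary file and return its local path."""
    # Keep the original extension so image loaders can recognize the format.
    suffix = os.path.splitext(key)[1]
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the client writes the file itself
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Usage inside the notebook (needs AWS credentials, so left commented out):
# import boto3
# local_path = download_to_temp(boto3.client('s3'), bucket, data_key)
# test = load_img(local_path)  # load_img as already imported in the question
```

The temporary file is not deleted automatically, so remove it with os.remove(local_path) once the image is loaded.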

You can use the following code to pull a CSV file into SageMaker.
import pandas as pd
bucket='your-s3-bucket'
data_key = 'your.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
df = pd.read_csv(data_location)
Alternative formatting for the data_location variable:
data_location = f's3://{bucket}/{data_key}'
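Going the other way can also be handy: boto3 calls take the bucket and key separately, so an s3:// string like the one built above sometimes has to be split back apart. A small sketch using only the standard library (split_s3_uri is a hypothetical helper name, not part of boto3 or pandas):

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URI: {!r}".format(uri))
    # The bucket is the network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, data_key = split_s3_uri("s3://your-s3-bucket/your.csv")
print(bucket, data_key)  # prints: your-s3-bucket your.csv
```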

Related

How to download csv file from S3 bucket into numpy array

I have a csv file in an AWS S3 bucket. How do I download the CSV and assign it to a numpy array?
[Using python 3.6/boto3]
I've tried various forms including:
s3 = boto3.resource('s3', region_name=region)
obj = s3.Object(bucket, key)
with io.BytesIO(obj.get()["Body"].read()) as f:
    # rewind the file
    f.seek(0)
    arr_data = numpy.load(f)
arr_data = numpy.genfromtxt('https://BUCKETNAME.s3-eu-west-1.amazonaws.com/folder/infile.csv',dtype='str',delimiter=',')
This also doesn't work
Essentially I'm trying to replicate in S3:
arr_data = np.genfromtxt('path...input.csv',dtype='str',delimiter=',')
I was able to convert a csv to a numpy array using pandas in-between... not sure if that's what you're looking for. But here's how I did it:
import pandas as pd
import numpy as np
data_location = 's3://<path>'
data = pd.read_csv(data_location)
# Assumes the CSV has a column named 'value'; use data.values for the whole table
data_numpy = data.value.values.reshape(-1,1)
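If you would rather not go through pandas, another option is to fetch the object with boto3 and parse the bytes yourself. This is only a sketch: csv_rows_from_s3 is a hypothetical helper, the client is passed in, and the file is assumed to be UTF-8 text small enough to hold in memory.

```python
import csv
import io


def csv_rows_from_s3(s3_client, bucket, key, encoding="utf-8"):
    """Fetch a CSV object from S3 and return its rows as lists of strings."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    # csv.reader expects text, so decode the downloaded bytes first.
    return list(csv.reader(io.StringIO(body.decode(encoding))))


# Usage (needs AWS credentials, so left commented out):
# import boto3
# import numpy as np
# rows = csv_rows_from_s3(boto3.client('s3'), 'BUCKETNAME', 'folder/infile.csv')
# arr_data = np.array(rows)  # 2-D string array when every row has the same length
```

Every value stays a string, which is close to what genfromtxt with dtype='str' gives for a local file.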

How to load data from your S3 bucket to Sagemaker jupyter notebook to train the model?

I have CSV files in an S3 bucket, and I want to use them to train a model in SageMaker.
I am using this code, but it gives an error (file not found):
import boto3
import pandas as pd
region = boto3.Session().region_name
train_data_location = 's3://taggingu-{}/train.csv'.format(region)
df=pd.read_csv(train_data_location, header = None)
print df.head
What could be the solution to this?
Not sure, but could this Stack Overflow question answer it? Load S3 Data into AWS SageMaker Notebook
To quote @Chhoser:
import boto3
import pandas as pd
from sagemaker import get_execution_role
role = get_execution_role()
bucket='my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
pd.read_csv(data_location)
You can use AWS SDK for Pandas, a library that extends Pandas to work smoothly with AWS data stores.
import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
Most notebook kernels already have it; if it is missing, it can be installed via pip install awswrangler.

AWS Sagemaker: AttributeError: module 'pandas' has no attribute 'core'

Let me prefix this by saying I'm very new to tensorflow and even newer to AWS Sagemaker.
I have some tensorflow/keras code that I wrote and tested on a local dockerized Jupyter notebook and it runs fine. In it, I import a csv file as my input.
I use SageMaker to spin up a Jupyter notebook instance with conda_tensorflow_p36. I modified the pandas.read_csv() code to point to my input file, now hosted in an S3 bucket.
So I changed this line of code from
import pandas as pd
data = pd.read_csv("/input.csv", encoding="latin1")
to this
import pandas as pd
data = pd.read_csv("https://s3.amazonaws.com/my-sagemaker-bucket/input.csv", encoding="latin1")
and I get this error
AttributeError: module 'pandas' has no attribute 'core'
I'm not sure if it's a permissions issue. I read that as long as I name my bucket with the string "sagemaker" it should have access to it.
Pull our data from S3 for example:
import boto3
import io
import pandas as pd
# Set below parameters
bucket = '<bucket name>'
key = 'data/training/iris.csv'
endpointName = 'decision-trees'
# Pull our data from S3
s3 = boto3.client('s3')
f = s3.get_object(Bucket=bucket, Key=key)
# Make a dataframe
shape = pd.read_csv(io.BytesIO(f['Body'].read()), header=None)

how to link s3 bucket to sagemaker notebook

I am trying to link my S3 bucket to a notebook instance, but I am not able to.
Here is what I have so far:
from sagemaker import get_execution_role
role = get_execution_role
bucket = 'atwinebankloadrisk'
datalocation = 'atwinebankloadrisk'
data_location = 's3://{}/'.format(bucket)
output_location = 's3://{}/'.format(bucket)
to call the data from the bucket:
df_test = pd.read_csv(data_location/'application_test.csv')
df_train = pd.read_csv('./application_train.csv')
df_bureau = pd.read_csv('./bureau_balance.csv')
However I keep getting errors and unable to proceed.
I haven't found answers that can assist much.
PS: I am new to this AWS
You can load S3 data into an AWS SageMaker notebook by using the sample code below. Make sure the Amazon SageMaker role has a policy attached that grants it access to S3.
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
import boto3
import botocore
import pandas as pd
from sagemaker import get_execution_role
role = get_execution_role()
bucket = 'Your_bucket_name'
data_key = 'your_data_file.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
pd.read_csv(data_location)
You're trying to use Pandas to read files from S3. Pandas reads files from your local disk out of the box, but reading s3:// paths directly only works when the optional s3fs package is installed.
A dependable alternative is to download the files from S3 to your local disk, then use Pandas to read them.
import boto3
import botocore
BUCKET_NAME = 'my-bucket' # replace with your bucket name
KEY = 'my_image_in_s3.jpg' # replace with your object key
s3 = boto3.resource('s3')
try:
    # download as local file
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
    # OR read directly to memory as bytes:
    # bytes = s3.Object(BUCKET_NAME, KEY).get()['Body'].read()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
You can use s3fs (https://s3fs.readthedocs.io/en/latest/) to read S3 files directly with pandas. The code below is taken from here
import os
import pandas as pd
from s3fs.core import S3FileSystem
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'
s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'  # S3 keys use forward slashes; '\t' in a Python string would be a tab
bucket = 'your-bucket-name'
df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
In pandas 1.0.5, if you've already provided access to the notebook instance, reading a csv from S3 is as easy as this (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-remote-files):
df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')
During the notebook setup process I attached a SageMakerFullAccess policy to the notebook instance granting it access to the S3 bucket. You can also do this via the IAM Management console.
If you need credentials, there are three ways to provide them (https://s3fs.readthedocs.io/en/latest/#credentials):
aws_access_key_id, aws_secret_access_key, and aws_session_token environment variables
configuration files such as ~/.aws/credentials
for nodes on EC2, the IAM metadata provider
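To see which of the first two sources are visible from the notebook, a rough check like the one below can help. It is only a sketch: visible_credential_sources is a hypothetical helper, it uses the upper-case variable names that boto3 reads (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), it tests for presence only and does not validate anything, and it does not probe the IAM metadata provider.

```python
import os


def visible_credential_sources(environ=None, home=None):
    """Return which common credential sources appear to be present.

    Presence only: the credentials themselves are not validated.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    found = []
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    if os.path.isfile(os.path.join(home, ".aws", "credentials")):
        found.append("~/.aws/credentials")
    return found


print(visible_credential_sources())
```

An empty list on a SageMaker notebook instance is not necessarily a problem, since credentials there normally come from the attached execution role.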
import boto3
# Files are referred to as objects in S3.
# A file's name is referred to as its key in S3.
def write_to_s3(filename, bucket_name, key):
    with open(filename, 'rb') as f:  # Read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_fileobj(f)
# Simply call the write_to_s3 function with the required arguments
bucket_name = 'your-bucket-name'  # placeholder: replace with your bucket
write_to_s3('file_name.csv',
            bucket_name,
            'file_name.csv')

AWS S3 bucket write error

I created AWS S3 bucket and tried sample kmeans example on Jupyter notebook.
As the account owner I have read/write permissions, but I am unable to write logs and get the following error:
ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
Here's the kmeans sample code:
from sagemaker import get_execution_role
role = get_execution_role()
bucket='testingshk'
import pickle, gzip, numpy, urllib.request, json
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
from sagemaker import KMeans
data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_example/output'.format(bucket)
print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))
kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
kmeans.fit(kmeans.record_set(train_set[0]))
Even if you have full access to the bucket, you need to provide an access key and secret in order to put an object into it if the bucket is private. Alternatively, if you make the bucket publicly accessible, you can push objects to it without any problem.