I get the Lambda event S3 key URL-encoded, but I want it as the original string - amazon-s3

I am converting an S3 video to HLS. If the S3 file name contains a space or a bracket, it is converted to "+" in the event key, so when the code searches the S3 bucket using the "+" it cannot find the object.
import os
import uuid

def handler(event, context):
    assetID = str(uuid.uuid4())
    sourceS3Bucket = event['Records'][0]['s3']['bucket']['name']
    sourceS3Key = event['Records'][0]['s3']['object']['key']
    sourceS3 = 's3://' + sourceS3Bucket + '/' + sourceS3Key
    sourceS3Basename = os.path.splitext(os.path.basename(sourceS3))[0]
My S3 URI is:
s3://bucket_4999/ne gi.mp4
But in Lambda I get it as
s3://bucket_4999/ne+gi.mp4
So in AWS Elemental MediaConvert, I get the error:
Unable to open input file [s3://bucket_4999/ne+gi.mp4]: [Failed probe/open: [Can't read input stream: [Failed to read data: HeadObject failed]]]
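S3 event notifications deliver the object key URL-encoded, so spaces show up as "+". A minimal sketch of the handler with the key decoded back via urllib.parse.unquote_plus (names are taken from the question; this assumes the key only needs URL-decoding):
import os
import uuid
import urllib.parse

def handler(event, context):
    assetID = str(uuid.uuid4())
    sourceS3Bucket = event['Records'][0]['s3']['bucket']['name']
    # Decode the URL-encoded key so "ne+gi.mp4" becomes "ne gi.mp4" again
    sourceS3Key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    sourceS3 = 's3://' + sourceS3Bucket + '/' + sourceS3Key
    sourceS3Basename = os.path.splitext(os.path.basename(sourceS3))[0]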

Related

How can I get the custom storage filename for uploaded images using UploadCare?

I want to use my own S3 storage and display the image that was just uploaded. How can I get the filename that was uploaded to S3?
For example I can upload a jpg image to UploadCare.
This is the output I can get:
fileInfo.cdnUrl: "https://ucarecdn.com/6314bead-0404-4279-9462-fecc927935c9/"
fileInfo.name: "0 (3).jpg"
But if I check my S3 bucket this is the file name that was actually uploaded to S3: https://localdevauctionsite-us.s3.us-west-2.amazonaws.com/6314bead-0404-4279-9462-fecc927935c9/03.jpg
Here is the JavaScript I have so far:
var widget = uploadcare.Widget('[role=uploadcare-uploader]');
widget.onChange(group => {
  group.files().forEach(file => {
    file.done(fileInfo => {
      // Try to list the file from aws s3
      console.log('CDN url:', fileInfo.cdnUrl);
      // https://uploadcare.com/docs/file_uploader_api/files_uploads/
      console.log('File name: ', fileInfo.name);
    });
  });
});
Filenames are sanitized before copying to S3, so the output file name can contain the following characters only:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
The following function can be used to get a sanitized file name:
function sanitize(filename) {
  var extension = '.' + filename.split('.').pop();
  var name = filename.substring(0, filename.length - extension.length);
  return name.replace(/[^A-Za-z0-9_]+/g, '') + extension;
}
The final S3 URL can be composed of the S3 base URL, a file UUID, which you can obtain from the fileInfo object, and a sanitized name of the uploaded file.

Downloading a file from S3 using boto3 - key error

I am trying to download a joblib file from S3 but I am getting errors with the key format.
This is my S3 path to the file:
"s3://v1/v2/v3/v4/model.joblib"
This is my code:
import boto3
bucketname = "v1"
key = "v2/v3/v4"
filename = "model.joblib"
s3 = boto3.resource('s3')
obj = s3.Object(bucketname, key)
body = obj.get()['label_model.joblib'].read()
Ultimately I want to be able to do:
from joblib import load
model = load("model.joblib")
Error I got:
NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
You are trying to access the file without the filename.
Your code is:
import boto3
bucketname = "v1"
key = "v2/v3/v4"
filename = "model.joblib"
s3 = boto3.resource('s3')
obj = s3.Object(bucketname, key)
body = obj.get()['label_model.joblib'].read()
But you need to add the filename to the key variable. Here is an example downloading the file from s3:
import boto3

bucketname = "v1"
key = "v2/v3/v4"
filename = "model.joblib"
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucketname)
# Open the local target file using the variable, not the literal string 'filename'
with open(filename, 'wb') as f:
    bucket.download_fileobj(f'{key}/{filename}', f)
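Once the object has been downloaded, the local copy can be loaded the way the question describes; a short sketch, assuming the file really is a joblib dump:
from joblib import load

# Load the model from the local file written by download_fileobj above
model = load(filename)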

Python boto3 load model tar file from s3 and unpack it

I am using Sagemaker and have a bunch of model.tar.gz files that I need to unpack and load in sklearn. I've been testing using list_objects with delimiter to get to the tar.gz files:
response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)
for i in response['Contents']:
    print(i['Key'])
And then I plan to extract with
import tarfile
tf = tarfile.open(model.read())
tf.extractall()
But how do I get to the actual tar.gz file from s3 instead of some boto3 object?
You can download objects to files using s3.download_file(). This will make your code look like:
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects matching your criteria
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)

# Iterate over each file found and download it
for i in response['Contents']:
    key = i['Key']
    dest = os.path.join('/tmp', key)
    # The key contains slashes, so make sure the target directory exists
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )

Cannot download_file from S3 to lambda after notification that S3 object was created

I have configured SES to put some emails into an S3 bucket and set an S3 trigger to fire a Lambda function on object creation. In the Lambda, I need to parse and process the email. Here is my Lambda (relevant part):
s3client = boto3.client('s3')

def lambda_handler(event, context):
    my_bucket = s3.Bucket(‘xxxxxxxx')
    my_key = event['Records'][0]['s3']['object']['key']
    filename = '/tmp/' + my_key
    logger.info('Target file: ' + filename)
    s3client.download_file(my_bucket, my_key, filename)
    # Process email file
download_file throws an exception:
expected string or bytes-like object: TypeError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 22, in lambda_handler
    s3client.download_file(my_bucket, my_key, filename)
  ...
  File "/var/runtime/botocore/handlers.py", line 217, in validate_bucket_name
    if VALID_BUCKET.search(bucket) is None:
TypeError: expected string or bytes-like object
Any idea what is wrong? The bucket is fine, and the object exists in the bucket.
The error is related to the bucket name (and you have a strange curly quote in your code).
The recommended way to retrieve the object details is:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    ...
    s3_client.download_file(bucket, key, download_path)
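Put together, the handler could look roughly like this (a sketch: the client name and the /tmp target path are assumptions, and the key's basename is used so keys containing slashes still map to a writable path):
import os
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # /tmp is the only writable location inside Lambda
        download_path = os.path.join('/tmp', os.path.basename(key))
        s3_client.download_file(bucket, key, download_path)
        # Process email file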
Edit: My first answer was probably wrong, here's another attempt
The validation function that throws the exception can be found here
# From the S3 docs:
# The rules for bucket names in the US Standard region allow bucket names
# to be as long as 255 characters, and bucket names can contain any
# combination of uppercase letters, lowercase letters, numbers, periods
# (.), hyphens (-), and underscores (_).
VALID_BUCKET = re.compile(r'^[a-zA-Z0-9.\-_]{1,255}$')

# [I excluded unrelated code here]

def validate_bucket_name(params, **kwargs):
    if 'Bucket' not in params:
        return
    bucket = params['Bucket']
    if VALID_BUCKET.search(bucket) is None:
        error_msg = (
            'Invalid bucket name "%s": Bucket name must match '
            'the regex "%s"' % (bucket, VALID_BUCKET.pattern))
        raise ParamValidationError(report=error_msg)
boto3 uses the S3Transfer Download Manager under the hood, which then uses the download method that is defined as follows:
def download(self, bucket, key, fileobj, extra_args=None,
             subscribers=None):
    """Downloads a file from S3

    :type bucket: str
    :param bucket: The name of the bucket to download from
    ...
It expects the bucket parameter to be a string and you're passing an s3.Bucket(‘xxxxxxxx') object, which probably isn't a string.
I'd try to pass the bucket name to download_file as a string.
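In the question's terms that change could look like this (a sketch; the placeholder bucket name comes from the question):
# Pass the bucket name itself (a plain string) instead of an s3.Bucket object
s3client.download_file('xxxxxxxx', my_key, filename)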
Old and most likely wrong answer as pointed out in the comments
Some sample code in the Boto Documentation shows us how downloads from S3 can be performed:
import boto3
import botocore

BUCKET_NAME = 'my-bucket' # replace with your bucket name
KEY = 'my_image_in_s3.jpg' # replace with your object key

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Looking at your code, it seems as if you're calling the download_file method the wrong way; it should look like this - you need to call the method on the Bucket object:
s3 = boto3.resource('s3')  # Bucket objects come from the resource API, not the client

def lambda_handler(event, context):
    my_bucket = s3.Bucket('xxxxxxxx')
    my_key = event['Records'][0]['s3']['object']['key']
    filename = '/tmp/' + my_key
    logger.info('Target file: ' + filename)
    my_bucket.download_file(my_key, filename)
    # Process email file
The important part is my_bucket.download_file(my_key, filename)

How to upload input file for batch prediction in gcloud ml-engine?

I'm trying to create a batch prediction job in google cloud ml-engine. Unfortunately, I always get the same error:
{
  insertId: "wr85wwg6shs9ek"
  logName: "projects/tensorflow-test-1-168615/logs/ml.googleapis.com%2Ftest_job_23847239"
  receiveTimestamp: "2017-08-04T16:07:29.524193256Z"
  resource: {
    labels: {
      job_id: "test_job_23847239"
      project_id: "tensorflow-test-1-168615"
      task_name: "service"
    }
    type: "ml_job"
  }
  severity: "ERROR"
  textPayload: "TypeError: decoding Unicode is not supported"
  timestamp: "2017-08-04T16:07:29.524193256Z"
}
I create the file in Java and upload it to a bucket with the following code:
BufferedImage bufferedImage = ImageIO.read(new URL(media.getUrl()));
int[][][] imageMatrix = convertToImageToMatrix(bufferedImage);
String imageString = matrixToString(imageMatrix);
String inputContent = "{\"instances\": [{\"inputs\": " + imageString + "}]}";
byte[] inputBytes = inputContent.getBytes(Charset.forName("UTF-8"));
Blob inputBlob = mlInputBucket.create(media.getId().toString() + ".json", inputBytes, "application/json");
inputPaths.add("gs://" + Properties.getCloudBucketNameInputs() + "/" + inputBlob.getName());
In this code, I download the image, convert it to a uint8 matrix, and format the matrix as a JSON string. The file gets created and is present in the bucket. I also verified that the JSON file is valid.
In the next step, I collect all created files and start the prediction job:
GoogleCloudMlV1PredictionInput input = new GoogleCloudMlV1PredictionInput();
input.setDataFormat("TEXT");
input.setVersionName("projects/" + DatastoreOptions.getDefaultProjectId() + "/models/" + Properties.getMlEngineModelName() + "/versions/" + Properties.getMlEngineModelVersion());
input.setRegion(Properties.getMlEngineRegion());
input.setOutputPath("gs://" + Properties.getCloudBucketNameOutputs() + "/" + jobId);
input.setInputPaths(inputPaths);
GoogleCloudMlV1Job job = new GoogleCloudMlV1Job();
job.setJobId(jobId);
job.setPredictionInput(input);
engine.projects().jobs().create("projects/" + DatastoreOptions.getDefaultProjectId() , job).execute();
Finally, the job gets created, but the result is the error shown at the beginning.
I also tried to start the job with the gcloud SDK, but the result is the same. However, when I modify the file to remove the instances object and match the correct format for online prediction, it works (to make it work, I need to remove most of the rows from the input because of the payload quota for online predictions).
I'm using the trained pets model from the object detection example. One of my created input files can be found here.
What am I doing wrong here?
Did I answer your question in "tensorflow serving prediction not working with object detection pets example"? The input for batch prediction should not include the '{"instances": ...}' wrapper.
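For illustration, each line of the batch prediction input file would then be a single instance object on its own line, for example (a sketch, assuming the model's input tensor is named inputs as in the question's code):
{"inputs": [[[0, 0, 0], [255, 255, 255]]]}
while the {"instances": [...]} wrapper belongs only in the body of an online prediction request.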