I'm trying to convert the MLOps template for model building, training, and deployment (a CloudFormation template) into a CDK project so I can easily update the definitions, synth the template, and upload it to Service Catalog so it can be used as a project template in SageMaker Studio.
I'm quite new to CDK, though, and I'm having some trouble trying to initialize a CodeCommit repository with the SageMaker pipeline seed code stored in S3, which the original template accomplishes as follows:
'ModelBuildCodeCommitRepository':
  'Type': 'AWS::CodeCommit::Repository'
  'Properties':
    'RepositoryName':
      'Fn::Sub': 'sagemaker-${SageMakerProjectName}-${SageMakerProjectId}-modelbuild'
    'RepositoryDescription':
      'Fn::Sub': 'SageMaker Model building workflow infrastructure as code for the Project ${SageMakerProjectName}'
    'Code':
      'S3':
        'Bucket': 'sagemaker-servicecatalog-seedcode-sa-east-1'
        'Key': 'toolchain/model-building-workflow-v1.0.zip'
      'BranchName': 'main'
The CDK API docs do mention the code parameter of codecommit.Repository as an initialization option, but it only handles local files being zipped and uploaded to S3, because it assumes the CDK project will actually be deployed. I only want the template generated by cdk synth.
Of course I could always use codecommit.CfnRepository and its code parameter to point to S3, but then I can't pass it to the repository parameter of the pipeline stage's codepipeline_actions.CodeCommitSourceAction, because that expects an IRepository object.
I also want to stick to aws-cdk-lib.aws_codepipeline to grasp the fundamental logic of CodePipeline (which I'm also quite new to) and avoid using the high-level aws-cdk-lib.pipelines.
Any ideas on how I can accomplish this?
Construct a Repository without a Code prop, get an escape-hatch reference to its L1 CfnRepository layer, and set the Code property manually to point at the existing S3 bucket:
import * as codecommit from 'aws-cdk-lib/aws-codecommit';

const repo = new codecommit.Repository(this, 'Repo', { repositoryName: 'my-great-repo' });

// Escape hatch: reach down to the underlying L1 CfnRepository
const cfnRepo = repo.node.defaultChild as codecommit.CfnRepository;
cfnRepo.addPropertyOverride('Code', {
  S3: {
    Bucket: 'sagemaker-servicecatalog-seedcode-sa-east-1',
    Key: 'toolchain/model-building-workflow-v1.0.zip',
  },
  BranchName: 'main',
});
The above code synthesizes the YAML shown in the question. Pass repo as the repository of the pipeline's source action.
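For completeness, here is a rough sketch of wiring repo into the pipeline's source action, shown with the Python CDK bindings; the construct IDs, action name, and stage layout are illustrative assumptions, so adapt them to your stack:

from aws_cdk import aws_codepipeline as codepipeline
from aws_cdk import aws_codepipeline_actions as codepipeline_actions

# Artifact that will receive the seed code checked out from the repository
source_output = codepipeline.Artifact()

source_action = codepipeline_actions.CodeCommitSourceAction(
    action_name="ModelBuildSource",   # illustrative name
    repository=repo,                  # the L2 Repository construct, not the CfnRepository
    branch="main",
    output=source_output,
)

pipeline = codepipeline.Pipeline(self, "ModelBuildPipeline")
pipeline.add_stage(stage_name="Source", actions=[source_action])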
Don't forget to grant the necessary IAM permissions on the S3 bucket.
I am building a Python Lambda in AWS and want to add an S3 trigger to it. Following these instructions, I saw how to get the bucket and key of the object that fired the trigger:
import urllib.parse

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
There is an example of such an object in the link, but I wasn't able to find a description of the entire event object anywhere in AWS's documentation.
Is there documentation for this object's structure? Where might I find it?
You can find documentation about the whole object in the S3 documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html
I would also advise iterating over the records, because a single event can contain more than one:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    [...]
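Putting the two together, a minimal handler might look like the sketch below; the get_object call is only a placeholder for whatever processing you actually do:

import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them first
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"], encoding="utf-8")
        response = s3.get_object(Bucket=bucket, Key=key)
        # ... process response["Body"] as needed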
I am trying to copy data from a large number of files in s3 over to Redshift. I have read-only access to the s3 bucket which contains these files. In order to COPY them efficiently, I created a manifest file that contains the links to each of the files I need copied over.
Bucket 1:
- file1.gz
- file2.gz
- ...
Bucket 2:
- manifest
Here is the command I've tried to copy data from bucket 1 using the manifest in bucket 2:
-- Load data from s3
copy data_feed_eval from 's3://bucket-2/data_files._manifest'
CREDENTIALS 'aws_access_key_id=bucket_1_key;aws_secret_access_key=bucket_1_secret'
manifest
csv gzip delimiter ',' dateformat 'YYYY-MM-DD' timeformat 'YYYY-MM-DD HH:MI:SS'
maxerror 1000 TRUNCATECOLUMNS;
However, when running this command, I get the following error:
09:45:32 [COPY - 0 rows, 7.576 secs] [Code: 500310, SQL State: XX000] [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 901E02533CC5010D,ExtRid tEvf/TVfZzPfSNAFa8iTYjTBjvaHnMMPmuwss58SwopY/sZSkhUBe3yMGHTDyA0yDhDCD7ybX9gl45pV/eQ=,CanRetry 1
Details:
-----------------------------------------------
error: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 901E02533CC5010D,ExtRid tEvf/TVfZzPfSNAFa8iTYjTBjvaHnMMPmuwss58SwopY/sZSkhUBe3yMGHTDyA0yDhDCD7ybX9gl45pV/eQ=,CanRetry 1
code: 8001
context: s3://bucket-2/data_files._manifest
query: 2611231
location: s3_utility.cpp:284
process: padbmaster [pid=10330]
-----------------------------------------------;
I believe the issue here is I'm passing bucket_1 credentials in my COPY command. Is it possible to pass credentials for multiple buckets (bucket_1 with the actual files, and bucket_2 with the manifest) to the COPY command? How should I approach this assuming I don't have write access to bucket_1?
You have indicated that the bucket_1_key key (which belongs to an IAM user) has permissions limited to read-only access to bucket_1. If this is the case, the error occurs because that key has no permission to read from bucket_2. You have already mentioned this as a possible cause, and it is exactly that.
There is no option to supply two sets of keys to the COPY command. But you should consider the following options:
Option 1
According to this "You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file."
If there is a common prefix for the set of files you want to load, you can use that prefix in bucket_1 in the COPY command.
See http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
You have mentioned you have read-only access to bucket 1. Make sure this is sufficient access as defined in http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html#copy-usage_notes-iam-permissions
All the other options require changes to your key/IAM user permissions or Redshift itself.
Option 2
Extend the permissions of the bucket_1_key key so that it can read from bucket_2 as well. You will have to make sure that it has LIST access to bucket_2 and GET access to the bucket_2 objects (as documented here).
This way you can continue using the bucket_1_key key in the COPY command. This method is referred to as key-based access control and uses a plain-text access key ID and secret access key. AWS recommends using role-based access control (option 3) instead.
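As an illustration of what that extra permission could look like, here is a sketch that attaches an inline policy to the IAM user behind bucket_1_key; the user name, policy name, and bucket name are placeholders:

import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        # LIST access on the manifest bucket itself
        {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::bucket-2"},
        # GET access on the objects inside it (the manifest file)
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::bucket-2/*"},
    ],
}

iam.put_user_policy(
    UserName="copy-user",                  # placeholder: the user that owns bucket_1_key
    PolicyName="AllowReadManifestBucket",  # placeholder
    PolicyDocument=json.dumps(policy),
)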
Option 3
Use an IAM role in the COPY command instead of keys (option 2). This is referred to as role-based access control. It is also the strongly recommended authentication option to use with the COPY command.
This IAM role would need LIST access on buckets 1 and 2 and GET access to the objects in those buckets.
More info about Key-Based and Role-Based Access Control is here.
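As a rough sketch of option 3: the CREDENTIALS clause is simply replaced by IAM_ROLE, assuming the role is attached to the cluster and allowed to read both buckets. The role ARN and connection details below are placeholders; here the COPY is issued from Python via psycopg2:

import psycopg2

copy_sql = """
copy data_feed_eval from 's3://bucket-2/data_files._manifest'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
manifest
csv gzip delimiter ',' dateformat 'YYYY-MM-DD' timeformat 'YYYY-MM-DD HH:MI:SS'
maxerror 1000 truncatecolumns;
"""

# Placeholder connection details
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.sa-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)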
I want to add a folder to my Amazon S3 bucket using code.
Can you please suggest how I can achieve this?
There are no folders in Amazon S3. It's just that most of the available S3 browser tools show the parts of the key name separated by slashes as folders.
If you really need that, you can create an empty object with a slash at the end, e.g. "folder/". It will look like a folder if you open it with a GUI tool or the AWS Console.
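For example, a minimal sketch with boto3 in Python (the bucket and folder names are placeholders):

import boto3

s3 = boto3.client("s3")

# A zero-byte object whose key ends in "/" shows up as a folder in the console
s3.put_object(Bucket="my-bucket", Key="folder/")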
As everyone has told you, there aren't any "folders" in AWS S3; you're thinking of them incorrectly. AWS S3 has "objects", and these objects can look like folders, but they aren't really folders in the fullest sense of the word. If you search for how to create folders in Amazon AWS S3, you won't find a lot of good results.
There is a way to create "folders", in the sense that you can create a simulated folder structure in S3, but again, wrap your head around the fact that you are creating objects, not folders. Along with that, you will need the put-object command to create this simulated folder structure. To use this command you need the AWS CLI tools installed; see AWS CLI Installation for instructions.
The command is this:
aws s3api put-object --bucket your-bucket-name --key path/to/file/yourfile.txt --body yourfile.txt
Now, the fun part about this command is that you do not need to have all of the "folders" (objects) created before you run it. This means you can have a "folder" (object) to contain things, and then use the command to create the simulated folder structure within that "folder" (object), as discussed earlier. For example, I have a "folder" (object) named "importer" within my S3 bucket; let's say I want to insert sample.txt within a "folder" (object) structure of the year, the month, and then a sample "folder" (object) within all of that.
If I only have the "importer" object within my bucket, I do not need to create the year, month, and sample objects ("folders") beforehand. I can run the command like so:
aws s3api put-object --bucket my-bucket-here --key importer/2016/01/sample/sample.txt --body sample.txt
The put-object command will then go in and create the path that I have specified in the --key flag. Here's a bit of a jewel: even if you don't have a file to upload to S3, you can still create objects ("folders") within the bucket. For example, I created a shell script to "create folders" within the bucket by leaving off the --body flag, not specifying a file name, and leaving a slash at the end of the path provided in the --key flag; the system then creates the desired simulated folder structure without inserting a file in the process.
Hopefully this helps you understand the system a little better.
Note: once you have a "folder" structure created, you can use S3's sync command to synchronize the descendant "folder" with a folder on your local machine, or even with another S3 bucket.
Java with AWS SDK:
There are no folders in s3, only key/value pairs. The key can contain slashes (/), and that will make it appear as a folder in the management console, but programmatically it's not a folder; it is a String value.
If you are trying to structure your s3 bucket, then your naming conventions (the keys you give your files) can simply follow normal directory patterns, i.e. folder/subfolder/file.txt.
When searching (depending on the language you are using), you can search via a prefix with a delimiter. In Java, it would be a listObjects(String storageBucket, String prefix, String delimiter) method call.
The storageBucket is the name of your bucket, the prefix is the key you want to search, and the delimiter is used to filter your search based off the prefix.
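For comparison, a similar prefix/delimiter listing sketched in Python with boto3 (the bucket and prefix are placeholders):

import boto3

s3 = boto3.client("s3")

# List the "subfolders" and objects directly under a given prefix
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="folder/subfolder/", Delimiter="/")

for prefix in resp.get("CommonPrefixes", []):   # the simulated subfolders
    print(prefix["Prefix"])
for obj in resp.get("Contents", []):            # the objects themselves
    print(obj["Key"])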
The AWS::S3 Rails gem does this by itself:
AWS::S3::S3Object.store("teaser/images/troll.png", file, AWS_BUCKET)
Will automatically create the teaser and images "folders" if they don't already exist.
With the AWS SDK for .NET it works perfectly; just add "/" at the end of the folder name:
var folderKey = folderName + "/"; //end the folder name with "/"
AmazonS3 client = Amazon.AWSClientFactory.CreateAmazonS3Client(AWSAccessKey, AWSSecretKey);
var request = new PutObjectRequest();
request.WithBucketName(AWSBucket);
request.WithKey(folderKey);
request.WithContentBody(string.Empty);
S3Response response = client.PutObject(request);
Then refresh your AWS console and you will see the folder.
With the AWS CLI, it is possible to copy an entire folder to a bucket.
aws s3 cp /path/to/folder s3://bucket/path/to/folder --recursive
There is also the option to sync a folder using aws s3 sync.
This is a divisive topic, so here is a screenshot in 2019 of the AWS S3 console for adding folders and the note:
When you create a folder, the S3 console creates an object with the above name appended by the suffix "/", and that object is displayed as a folder in the S3 console.
Then 'using coding' you can simply adjust the object name by prepending a valid folder name string and a forward slash.
For Swift I created a method where you pass in a String for the folder name.
Swift 3:
import AWSS3
func createFolderWith(Name: String!) {
    let folderRequest: AWSS3PutObjectRequest = AWSS3PutObjectRequest()
    folderRequest.key = Name + "/"
    folderRequest.bucket = bucket
    AWSS3.default().putObject(folderRequest).continue({ (task) -> Any? in
        if task.error != nil {
            assertionFailure("* * * error: \(task.error?.localizedDescription)")
        } else {
            print("created \(Name) folder")
        }
        return nil
    })
}
Then just call
createFolderWith(Name:"newFolder")
In iOS (Objective-C), I did it the following way.
You can add the code below to create a folder inside an Amazon S3 bucket programmatically. This is a working code snippet. Any suggestions welcome.
-(void)createFolder{
    AWSS3PutObjectRequest *awsS3PutObjectRequest = [AWSS3PutObjectRequest new];
    awsS3PutObjectRequest.key = [NSString stringWithFormat:@"%@/", @"FolderName"];
    awsS3PutObjectRequest.bucket = @"Bucket_Name";
    AWSS3 *awsS3 = [AWSS3 defaultS3];
    [awsS3 putObject:awsS3PutObjectRequest completionHandler:^(AWSS3PutObjectOutput * _Nullable response, NSError * _Nullable error) {
        if (error) {
            NSLog(@"error creating folder");
        } else {
            NSLog(@"folder created successfully");
        }
    }];
}
Here's how you can achieve what you're looking for (from code/cli):
--create/select the file (locally) which you want to move to the folder:
~/Desktop> touch file_to_move
--move the file to s3 folder by executing:
~/Desktop> aws s3 cp file_to_move s3://<path_to_your_bucket>/<new_folder_name>/
A new folder will be created in your s3 bucket, and you'll now be able to execute cp, mv, rm ... statements, i.e. manage the folder as usual.
If the new file created above is not required, simply delete it. You now have an s3 folder created.
You can select the language of your choice from the available AWS SDKs.
Alternatively, you can try the Minio client libraries, available in Python, Go, .NET, Java, and JavaScript, for your application development environment; they have an example directory with all the basic operations listed.
Disclaimer: I work for Minio
In Swift 2.2 you can create a folder using:
func createFolderWith(Name: String!) {
    let folderRequest: AWSS3PutObjectRequest = AWSS3PutObjectRequest()
    folderRequest.key = Name + "/"
    folderRequest.bucket = "Your Bucket Name"
    AWSS3.defaultS3().putObject(folderRequest).continueWithBlock({ (task) -> AnyObject? in
        if task.error != nil {
            assertionFailure("* * * error: \(task.error?.localizedDescription)")
        } else {
            print("created \(Name) folder")
        }
        return nil
    })
}
Below creates an empty directory called "mydir1".
Below is Node.js code; it should be similar for other languages.
The trick is to have a slash (/) at the end of the object name, as in "mydir1/"; otherwise a file named "mydir1" will be created.
let AWS = require('aws-sdk');
AWS.config.loadFromPath(__dirname + '\\my-aws-config.json');
let s3 = new AWS.S3();

var params = {
    Bucket: "mybucket1",
    Key: "mydir1/",
    ServerSideEncryption: "AES256"
};

s3.putObject(params, function (err, data) {
    if (err) {
        console.log(err, err.stack); // an error occurred
        return;
    } else {
        console.log(data); // successful response
        return;
        /*
        data = {
            ETag: "\"6805f2cfc46c0f04559748bb039d69ae\"",
            ServerSideEncryption: "AES256",
            VersionId: "Ri.vC6qVlA4dEnjgRV4ZHsHoFIjqEMNt"
        }
        */
    }
});
Source: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
Creating a directory inside an s3 bucket and copying contents into it is pretty simple. The following s3 command can be used:
aws s3 cp abc/def.txt s3://mybucket/abc/
Note: the trailing / is a must, as it is what makes the directory; otherwise it will become a file in s3.
I guess your query is simply about creating a folder inside a folder (a subfolder). So while copying any directory's data into a bucket subfolder, use a command like this:
aws s3 cp mudit s3://mudit-bucket/Projects-folder/mudit-subfolder --recursive
It will create the subfolder and put your directory contents in it. Also, once the subfolder becomes empty, it will automatically be deleted.
You can use the copy command to create a folder while copying a file.
aws s3 cp test.xml s3://mybucket/myfolder/test.xml