Bazel extension that allows loading a file from S3 into a BUILD file

Currently I have a list of dictionaries in a .bzl file:
test_data = [
    { "name": "test", "data": "test_data" }
]
I load this in a BUILD file and perform some magic with a list comprehension:
[
    foo(name=data["name"], data=data["data"])
    for data in test_data
]
I need to be able to pull this file in from S3 and provide its contents to the BUILD file the same way I do with the static .bzl file.


Filepulse Connector error with S3 provider (Source Connector)

I am trying to poll CSV files from S3 buckets using the FilePulse source connector. When the task starts I get the following error. What additional libraries do I need to add to make this work from an S3 bucket? The config file is below.
Where did I go wrong?
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:208)
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.base/java.nio.file.Path.of(Path.java:212)
at java.base/java.nio.file.Paths.get(Paths.java:98)
at io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalFileStorage.exists(LocalFileStorage.java:62)
Config file:
{
  "name": "FilePulseConnector_3",
  "config": {
    "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
    "filters": "ParseCSVLine, Drop",
    "filters.Drop.if": "{{ equals($value.artist, 'U2') }}",
    "filters.Drop.invert": "true",
    "filters.Drop.type": "io.streamthoughts.kafka.connect.filepulse.filter.DropFilter",
    "filters.ParseCSVLine.extract.column.name": "headers",
    "filters.ParseCSVLine.trim.column": "true",
    "filters.ParseCSVLine.seperator": ";",
    "filters.ParseCSVLine.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
    "fs.cleanup.policy.triggered.on": "COMMITTED",
    "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing",
    "fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
    "fs.listing.interval.ms": "10000",
    "file.filter.regex.pattern": ".*\\.csv$",
    "offset.policy.class": "io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy",
    "offset.attributes.string": "name",
    "skip.headers": "1",
    "topic": "connect-file-pulse-quickstart-csv",
    "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader",
    "tasks.file.status.storage.class": "io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore",
    "tasks.file.status.storage.bootstrap.servers": "172.27.157.66:9092",
    "tasks.file.status.storage.topic": "connect-file-pulse-status",
    "tasks.file.status.storage.topic.partitions": 10,
    "tasks.file.status.storage.topic.replication.factor": 1,
    "tasks.max": 1,
    "aws.access.key.id": "<<>>",
    "aws.secret.access.key": "<<>>",
    "aws.s3.bucket.name": "mytestbucketamtrak",
    "aws.s3.region": "us-east-1"
  }
}
What should I put in the libraries to make this work? Note: the Lenses connector sources from the S3 bucket without issues, so it's not a credentials issue.
As mentioned in the comments by @OneCricketeer, github.com/streamthoughts/kafka-connect-file-pulse/issues/382 pointed to the root cause.
Modifying the config file to use this property sourced the file:
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3RowFileInputReader"

Is it possible to automatically select a command option while running an extension command in a task?

I'm using the "PlantUML" extension, and I want to export a diagram automatically on file save. For this purpose I found the "Trigger Task on Save" extension and tried to write a task that executes the command "command:plantuml.exportCurrent" on save of any .puml file.
settings.json (for the "Trigger Task on Save" extension):
{
  "triggerTaskOnSave.on": true,
  "triggerTaskOnSave.restart": true,
  "triggerTaskOnSave.tasks": {
    "Export current diagram to SVG": [
      "*.puml"
    ]
  },
}
tasks.json
{
  "version": "2.0.0",
  "cwd": "${workspaceFolder}",
  "tasks": [
    {
      "label": "Export current diagram to SVG",
      "command": "${command:plantuml.exportCurrent}",
    }
  ],
}
This export command requires the user to select a file format (from a file format picker). I need to export the diagram in SVG format. Is it possible to automatically select the "svg" option in the task?
It should be possible if you set your export format to SVG in the PlantUML extension settings, so it won't ask you for the format when you export.
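For example, with the jebbs.plantuml extension the relevant settings.json entry should look something like this (the exact setting name is an assumption on my part; check the extension's settings UI):

{
  "plantuml.exportFormat": "svg"
}

With that set, the export command should no longer prompt for a format.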

Using CDSAPI in Google Colab

I installed a Python library called cdsapi in Google Colab.
To use it I need to locate its config file (which on a typical Linux system is $HOME/.cdsapirc) and add my account key to it.
More details can be found here (https://cds.climate.copernicus.eu/api-how-to).
I am having a problem with this step:
Copy the code displayed beside, in the file $HOME/.cdsapirc (in your Unix/Linux environment): url: {api-url} key: {uid}:{api-key}
I tried using !cd /home/ in the Colab notebook, but that directory doesn't contain this file.
I have also tried !cat /home/.cdsapirc, it gave error:
cat: /home/.cdsapirc: No such file or directory
I achieved this successfully. My code in Colab is as follows:
First, create '.cdsapirc' in the root user's home directory (/root) and write your URL and key to it:
url = 'url: https://cds.climate.copernicus.eu/api/v2'
key = 'key: your uid and key'

with open('/root/.cdsapirc', 'w') as f:
    f.write('\n'.join([url, key]))

with open('/root/.cdsapirc') as f:
    print(f.read())
Then, install cdsapi:
!pip install cdsapi
Run example:
import cdsapi

c = cdsapi.Client()
c.retrieve("reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib"
    }, "/target/dir/download.grib")
The target dir could be your Google Drive folder.
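If the target should live in Drive, here's a minimal sketch of mounting Google Drive in Colab first (the path under MyDrive is just an example):

# Mount Google Drive so the retrieved file can be written there.
from google.colab import drive

drive.mount('/content/drive')

# Then pass e.g. "/content/drive/MyDrive/era5/download.grib" as the target path to c.retrieve().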
You can specify your UID, API key and the CDS API endpoint directly as arguments to the constructor:
uid = "<YOUR UID HERE>"
apikey = "<YOUR APIKEY>"
c = cdsapi.Client(key=f"{uid}:{apikey}", url="https://cds.climate.copernicus.eu/api/v2")

Unbundling a pre-built JavaScript file built using browserify

I have a third-party library, not uglified, which was bundled using browserify. Unfortunately the original sources are not available.
Is there a way to unbundle it into its different files/sources?
You should be able to 'unbundle' the pre-built Browserify bundle using browser-unpack.
It will generate JSON output like this:
[
  {
    "id": 1,
    "source": "\"use strict\";\r\nvar TodoActions = require(\"./todo\"); ... var VisibilityFilterActions = require(\"./visibility-filter\"); ...",
    "deps": {
      "./todo": 2,
      "./visibility-filter": 3
    }
  },
  {
    "id": 2,
    "source": "\"use strict\";\r\n ...",
    "deps": {}
  },
  {
    "id": 3,
    "source": "\"use strict\";\r\n ...",
    "deps": {}
  },
  ...
]
It should be reasonably straightforward to transform the JSON output into source files that can be required. Note that the mappings of require literals (like "./todo") are in the deps. That is, the module required as "./todo" corresponds to the source with an id of 2.
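For instance, here's a minimal Python sketch (assuming the browser-unpack output has been saved to modules.json, e.g. with browser-unpack < bundle.js > modules.json) that writes each module's source to its own file:

import json
import os

# Load the array of module records produced by browser-unpack.
with open("modules.json") as f:
    modules = json.load(f)

os.makedirs("unbundled", exist_ok=True)
for mod in modules:
    # Each record has a numeric id, the module source, and a deps map from
    # require() literals to the ids they resolve to.
    path = os.path.join("unbundled", "module_{}.js".format(mod["id"]))
    with open(path, "w") as out:
        out.write(mod["source"])
    print(mod["id"], "->", path, "deps:", mod["deps"])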
There is also a browserify-unpack tool - which writes the contents as files - but I've not used it.

Create folder inside S3 bucket using CloudFormation

I'm able to create an S3 bucket using CloudFormation, but I would like to create a folder inside that S3 bucket, like:
<mybucket> --> <myfolder>
Please let me know the template to be used to create a folder inside a bucket; both should be created at the same time.
I'm using AWS Lambda as below:
import boto3

stackname = 'myStack'
client = boto3.client('cloudformation')
response = client.create_stack(
    StackName=stackname,
    TemplateURL='https://s3.amazonaws.com/<myS3bucket>/<myfolder>/nestedstack.json',
    Parameters=<params>
)
AWS doesn't provide an official CloudFormation resource to create objects within an S3 bucket. However, you can create a Lambda-backed Custom Resource to perform this function using the AWS SDK, and in fact the gilt/cloudformation-helpers GitHub repository provides an off-the-shelf custom resource that does just this.
As with any Custom Resource, the setup is a bit verbose, since you need to first deploy the Lambda function and IAM permissions, then reference it as a custom resource in your stack template.
First, add the Lambda::Function and associated IAM::Role resources to your stack template:
"S3PutObjectFunctionRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Version" : "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [ "lambda.amazonaws.com" ]
},
"Action": [ "sts:AssumeRole" ]
}
]
},
"ManagedPolicyArns": [
{ "Ref": "RoleBasePolicy" }
],
"Policies": [
{
"PolicyName": "S3Writer",
"PolicyDocument": {
"Version" : "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:ListBucket",
"s3:PutObject"
],
"Resource": "*"
}
]
}
}
]
}
},
"S3PutObjectFunction": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Code": {
"S3Bucket": "com.gilt.public.backoffice",
"S3Key": "lambda_functions/cloudformation-helpers.zip"
},
"Description": "Used to put objects into S3.",
"Handler": "aws/s3.putObject",
"Role": {"Fn::GetAtt" : [ "S3PutObjectFunctionRole", "Arn" ] },
"Runtime": "nodejs",
"Timeout": 30
},
"DependsOn": [
"S3PutObjectFunctionRole"
]
},
Then you can use the Lambda function as a Custom Resource to create your S3 object:
"MyFolder": {
"Type": "Custom::S3PutObject",
"Properties": {
"ServiceToken": { "Fn::GetAtt" : ["S3PutObjectFunction", "Arn"] },
"Bucket": "mybucket",
"Key": "myfolder/"
}
},
You can also use the same Custom Resource to write a string-based S3 object by adding a Body parameter in addition to Bucket and Key (see the docs).
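For example, a sketch along those lines (the key and body here are made up; exact behavior is per the helper's docs):

"MyFile": {
  "Type": "Custom::S3PutObject",
  "Properties": {
    "ServiceToken": { "Fn::GetAtt": [ "S3PutObjectFunction", "Arn" ] },
    "Bucket": "mybucket",
    "Key": "myfolder/readme.txt",
    "Body": "Hello, world"
  }
},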
This is not possible using an AWS CloudFormation template.
It should be mentioned that folders do not actually exist in Amazon S3. Instead, the path of an object is prepended to the name (key) of an object.
So, file bar.txt stored in a folder named foo is actually stored with a Key of: foo/bar.txt
You can also copy files to a folder that doesn't exist, and the folder will appear to be created automatically (though, again, no folder object actually exists). The Management Console will still present the appearance of such a folder, and the object's path will suggest that it is stored in one.
Bottom line: There is no need to pre-create a folder. Just use it as if it were already there.
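To illustrate with boto3 (a sketch; the bucket name is made up), uploading an object under a prefix is all it takes for the "folder" to show up in the console:

import boto3

s3 = boto3.client("s3")

# No folder has to exist beforehand; the "foo/" prefix is simply part of the key.
s3.put_object(Bucket="mybucket", Key="foo/bar.txt", Body=b"hello")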
We cannot (at least as of now) create a subfolder inside an S3 bucket by itself.
You can try it using the following command:
aws s3 mb s3://yavdhesh-bucket/inside-folder
Then try to list all the folders inside the bucket with:
aws s3 ls s3://yavdhesh-bucket
You will observe that the subfolder was not created.
There is only one way to create a subfolder: by creating/copying a file into a non-existent subfolder or subdirectory (relative to the bucket).
For example,
aws s3 cp demo.txt s3://yavdhesh-bucket/inside-folder/
Now if you list the files present inside your subfolder, it should work:
aws s3 ls s3://yavdhesh-bucket/inside-folder/
It should list all the files present in this subfolder.
Hope it helps.
I ended up with a small Python script. It has to be run manually, but it does the sync automatically. It's for lazy people who don't want to create a Lambda-backed Custom Resource.
import subprocess
import json

STACK_NAME = ...
S3_RESOURCE = <name of your s3 resource, as in CloudFormation template file>
LOCAL_DIR = <path of your local dir>

# Look up the physical bucket name behind the logical resource id.
res = subprocess.run(
    ['aws', 'cloudformation', 'describe-stack-resource', '--stack-name', STACK_NAME, '--logical-resource-id', S3_RESOURCE],
    capture_output=True,
)
out = res.stdout.decode('utf-8')
resource_details = json.loads(out)
resource_id = resource_details['StackResourceDetail']['PhysicalResourceId']

# Sync the local directory into the bucket.
res = subprocess.run(
    ['aws', 's3', 'sync', LOCAL_DIR, f's3://{resource_id}/', '--acl', 'public-read']
)
The link provided by wjordan to gilt/cloudformation-helpers doesn't work anymore.
This article from the AWS Knowledge Center outlines how to do it with both JSON and YAML templates:
https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-s3-custom-resources/
Note this little line:
Note: In the following resolution, all the S3 bucket content is deleted when the CloudFormation stack is deleted.