FilePulse Connector error with S3 provider (Source Connector)

I am trying to poll CSV files from an S3 bucket using the FilePulse source connector. When the task starts I get the error below. What additional libraries do I need to add to make this work with an S3 bucket? My config file is below.
Where did I go wrong?
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:208)
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.base/java.nio.file.Path.of(Path.java:212)
at java.base/java.nio.file.Paths.get(Paths.java:98)
at io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalFileStorage.exists(LocalFileStorage.java:62)
Config file:
{
  "name": "FilePulseConnector_3",
  "config": {
    "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
    "filters": "ParseCSVLine, Drop",
    "filters.Drop.if": "{{ equals($value.artist, 'U2') }}",
    "filters.Drop.invert": "true",
    "filters.Drop.type": "io.streamthoughts.kafka.connect.filepulse.filter.DropFilter",
    "filters.ParseCSVLine.extract.column.name": "headers",
    "filters.ParseCSVLine.trim.column": "true",
    "filters.ParseCSVLine.seperator": ";",
    "filters.ParseCSVLine.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
    "fs.cleanup.policy.triggered.on": "COMMITTED",
    "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing",
    "fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
    "fs.listing.interval.ms": "10000",
    "file.filter.regex.pattern": ".*\\.csv$",
    "offset.policy.class": "io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy",
    "offset.attributes.string": "name",
    "skip.headers": "1",
    "topic": "connect-file-pulse-quickstart-csv",
    "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader",
    "tasks.file.status.storage.class": "io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore",
    "tasks.file.status.storage.bootstrap.servers": "172.27.157.66:9092",
    "tasks.file.status.storage.topic": "connect-file-pulse-status",
    "tasks.file.status.storage.topic.partitions": 10,
    "tasks.file.status.storage.topic.replication.factor": 1,
    "tasks.max": 1,
    "aws.access.key.id": "<<>>",
    "aws.secret.access.key": "<<>>",
    "aws.s3.bucket.name": "mytestbucketamtrak",
    "aws.s3.region": "us-east-1"
  }
}
What should I put in the libraries to make this work? Note: the Lenses S3 connector sources from the same S3 bucket without issues, so it is not a credentials problem.

As mentioned in the comments by @OneCricketeer, the suggestion to follow github.com/streamthoughts/kafka-connect-file-pulse/issues/382 pointed to the root cause.
Modifying the config file to use this property sourced the file:
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3RowFileInputReader"

Related

Use variables in Azure Stream Analytics properties

I want to reduce the number of overrides during the deployment of my ASA job by using environment variables in my properties files.
Expectations
Have variables defined in the asaproj.json or JobConfig.json file, or in a .env file:
{
  ...
  "variables": [
    "environment": "dev"
  ]
}
Then call those variables in a properties file, such as an SQL reference data input properties file:
{
  "Name": "sql-query",
  "Type": "Reference data",
  "DataSourceType": "SQL Database",
  "SqlReferenceProperties": {
    "Database": "${environment}-sql-bdd",
    "Server": "${environment}-sql",
    "User": "user",
    "Password": null,
    "FullSnapshotPath": "sql-query.snapshot.sql",
    "RefreshType": "Execute periodically",
    "RefreshRate": "06:00:00",
    "DeltaSnapshotPath": null
  },
  "DataSourceCredentialDomain": null,
  "ScriptType": "Input"
}
Attempt
I could use a PowerShell script to override values in the ARM variables file generated by the npm package azure-streamanalytics-cicd, but that is not clean at all.
Problem
I can't find any resources about environment variables in Azure Stream Analytics online. Does such a thing exist? If so, can you point me to some documentation?

What should be used for endpoint in renovate-bot config.json?

I am trying to set up config.json for Bitbucket Cloud to automatically update dependencies in npm repos on Bitbucket Cloud. I found one example, but cannot figure out two things:
endpoint - what should go there (ABC)? Our company's Bitbucket namespace link looks like: https://bitbucket.org/uvxyz/
Can I use renovate-bot to issue PRs without Bitbucket Pipelines? If so, can I make Renovate update only particular repos via config.json modifications, or should I put a renovate.json file in each repo where automatic dependency updates are required?
I'd appreciate any examples on the latter.
config.json:
module.exports = {
  "platform": "bitbucket",
  "username": "<my.username>",
  "password": "<bitbucket token on my account>",
  "endpoint": "ABC",
  "hostRules": [
    {
      "hostType": "bitbucket",
      "domainName": "ABC",
      "timeout": 10000,
      "username": "<my.username>",
      "password": "<bitbucket token on my account>"
    }
  ]
};
According to the code:
const BITBUCKET_PROD_ENDPOINT = 'https://api.bitbucket.org/';
const defaults = { endpoint: BITBUCKET_PROD_ENDPOINT };
there is a default, and it works for me without setting endpoint at all.
What you see in the documentation is all you need.
I was able to get Renovate working with Bitbucket after putting the following host rule into its config.js file:
{
  hostType: 'bitbucket',
  matchHost: 'https://api.bitbucket.org/2.0/',
  username: "bb-username",
  password: "<special app password generated for bb-username>",
}
For Bitbucket app passwords, please look at
https://support.atlassian.com/bitbucket-cloud/docs/create-an-app-password/
and
https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
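Putting both answers together, a minimal self-hosted config.js sketch could look like the following; the username and app password are placeholders, and the explicit endpoint is optional since it matches the built-in default:
module.exports = {
  platform: 'bitbucket',
  username: 'bb-username',
  password: '<app password generated for bb-username>',
  // optional: this is already the built-in default endpoint
  endpoint: 'https://api.bitbucket.org/',
  hostRules: [
    {
      hostType: 'bitbucket',
      matchHost: 'https://api.bitbucket.org/2.0/',
      username: 'bb-username',
      password: '<app password generated for bb-username>'
    }
  ]
};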

Terraform data source of an existing S3 bucket fails plan stage attempting a GetBucketWebsite request which returns NoSuchWebsiteConfiguration

I'm trying to use a data source of an existing S3 bucket like this:
data "aws_s3_bucket" "src-config-bucket" {
bucket = "single-word-name" }
And Terraform always fails the plan stage with the message:
Error: UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: XXXXX
The failing requests can be viewed with the following info in the results:
{
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "IAMUser",
    "principalId": "ANONYMIZED",
    "arn": "arn:aws:iam::1234567890:user/terraformops",
    "accountId": "123456789012",
    "accessKeyId": "XXXXXXXXXXXXXXXXXX",
    "userName": "terraformops"
  },
  "eventTime": "2021-02-02T18:12:19Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetBucketWebsite",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": "X.Y.Z.W",
  "userAgent": "[aws-sdk-go/1.36.28 (go1.15.5; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.14.4 (+https://www.terraform.io)]",
  "errorCode": "NoSuchWebsiteConfiguration",
  "errorMessage": "The specified bucket does not have a website configuration",
  "requestParameters": {
    "bucketName": "s3-bucket-name",
    "website": "",
    "Host": "s3-bucket-name.s3.eu-west-1.amazonaws.com"
  }
}
Why can't I use an existing S3 bucket as a data source within Terraform? I don't treat it as a website anywhere in the Terraform project, so I don't know why it makes the GetBucketWebsite call and fails. Hope someone can help.
Thanks.
"I don't know why it asks the server the GetBucketWebsite call and fail."
It calls GetBucketWebsite because the aws_s3_bucket data source returns this information through its website_endpoint and website_domain attributes.
So you need permission to call this action on the bucket. The error message suggests that the IAM user/role you use to query the bucket does not have all the permissions needed to read this information.
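A minimal sketch of the missing permission, using the bucket name from the question as a placeholder (the data source may also need other s3:Get*/List* read permissions), could look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketWebsite",
      "Resource": "arn:aws:s3:::single-word-name"
    }
  ]
}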

Connecting to s3 bucket with Media Library Strapi

I've used Strapi for a while as a headless CMS and, in their most recent update, they changed the File Upload plugin to Media Library. You used to be able to connect an S3 bucket to your app via File Upload's settings. Does anyone have any idea how you do the same thing now that Media Library has replaced it?
If you are using Strapi version 3.0.0-beta.20.x, what you have to do is create a settings.json file with the config below:
./extensions/upload/config/settings.json
{
  "provider": "aws-s3",
  "providerOptions": {
    "accessKeyId": "dev-key",
    "secretAccessKey": "dev-secret",
    "region": "aws-region",
    "params": {
      "Bucket": "my-bucket"
    }
  }
}
You can check out the plugin documentation for more details.
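Note that the "aws-s3" provider also has to be installed as a project dependency; assuming the standard provider package name and a version matching the Strapi version mentioned above, the relevant part of package.json would look roughly like this:
{
  "dependencies": {
    "strapi-provider-upload-aws-s3": "3.0.0-beta.20"
  }
}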

Druid RabbitMQ Firehose

I'm trying to set up Druid to work with the RabbitMQ firehose, but I am getting the following error from Tranquility:
java.lang.IllegalArgumentException: Could not resolve type id 'rabbitmq' into a subtype of [simple type, class io.druid.data.input.FirehoseFactory]
I did the following:
1. Installed Druid
2. Downloaded the druid-rabbitmq extension
3. Copied druid-rabbitmq into the Druid extensions directory
4. Copied the amqp-client jar to the Druid lib directory
5. Added druid-rabbitmq to druid.extensions.loadList in common.runtime.properties (see the snippet after the firehose config below)
6. Added the firehose config to the Tranquility server.json configuration:
"ioConfig" : {
"type" : "realtime",
"firehose" : {
"type" : "rabbitmq",
"connection" : {
"host": "localhost",
"port": "5672",
"username": "blackbox",
"password": "blackbox",
"virtualHost": "blackbox-vhost",
"uri": "amqp://localhost:5672/blackbox-vhost"
},
"config" : {
"exchange": "test-exchange",
"queue" : "test-q",
"routingKey": "#",
"durable": "true",
"exclusive": "false",
"autoDelete": "false",
"maxRetries": "10",
"retryIntervalSeconds": "1",
"maxDurationSeconds": "300"
}
}
}
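For reference, the loadList change from step 5 in common.runtime.properties looks like this (any other extensions you already load would stay in the list):
druid.extensions.loadList=["druid-rabbitmq"]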
I'm using Imply 1.3.0. I think Tranquility is for stream pushing while a firehose is used for stream pulling, so I think that was the problem. I have since created a realtime node and it is running fine; I also had to copy the lyra jar file into the Druid lib directory. Now I can publish data from RabbitMQ, it gets inserted into Druid, and I can query the data, but the problem is that in RabbitMQ the messages still show as unacked. Any idea?