I'm trying to set up Druid to work with the RabbitMQ firehose, but I'm getting the following error from Tranquility:
java.lang.IllegalArgumentException: Could not resolve type id 'rabbitmq' into a subtype of [simple type, class io.druid.data.input.FirehoseFactory]
I did the following:
1. Installed Druid
2. Downloaded the druid-rabbitmq extension
3. Copied druid-rabbitmq into the Druid extensions directory
4. Copied the amqp-client jar into the Druid lib directory
5. Added druid-rabbitmq to druid.extensions.loadList in common.runtime.properties (a sketch of this entry follows the ioConfig below)
6. Added the firehose config to the Tranquility server.json configuration:
"ioConfig" : {
"type" : "realtime",
"firehose" : {
"type" : "rabbitmq",
"connection" : {
"host": "localhost",
"port": "5672",
"username": "blackbox",
"password": "blackbox",
"virtualHost": "blackbox-vhost",
"uri": "amqp://localhost:5672/blackbox-vhost"
},
"config" : {
"exchange": "test-exchange",
"queue" : "test-q",
"routingKey": "#",
"durable": "true",
"exclusive": "false",
"autoDelete": "false",
"maxRetries": "10",
"retryIntervalSeconds": "1",
"maxDurationSeconds": "300"
}
}
}
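For reference (step 5), a minimal sketch of what the loadList entry in common.runtime.properties might look like, assuming no other extensions are being loaded:
druid.extensions.loadList=["druid-rabbitmq"]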
I'm using Imply 1.3.0, but I think Tranquility is for stream pushing while a firehose is used for stream pulling, so I think that was the problem. I have now created a realtime node and it's running fine (I also had to copy the lyra jar into the Druid lib directory). Now I can publish data from RabbitMQ, it gets inserted into Druid, and I can query it, but the message still shows as unacked in RabbitMQ. Any idea?
I am trying to poll CSV files from S3 buckets using the FilePulse source connector. When the task starts I get the following error. What additional libraries do I need to add to make this work from an S3 bucket? The config file is below.
Where did I go wrong?
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:208)
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.base/java.nio.file.Path.of(Path.java:212)
at java.base/java.nio.file.Paths.get(Paths.java:98)
at io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalFileStorage.exists(LocalFileStorage.java:62)
Config file:
{
  "name": "FilePulseConnector_3",
  "config": {
    "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
    "filters": "ParseCSVLine, Drop",
    "filters.Drop.if": "{{ equals($value.artist, 'U2') }}",
    "filters.Drop.invert": "true",
    "filters.Drop.type": "io.streamthoughts.kafka.connect.filepulse.filter.DropFilter",
    "filters.ParseCSVLine.extract.column.name": "headers",
    "filters.ParseCSVLine.trim.column": "true",
    "filters.ParseCSVLine.seperator": ";",
    "filters.ParseCSVLine.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
    "fs.cleanup.policy.triggered.on":"COMMITTED",
    "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing",
    "fs.listing.filters":"io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
    "fs.listing.interval.ms": "10000",
    "file.filter.regex.pattern":".*\\.csv$",
    "offset.policy.class":"io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy",
    "offset.attributes.string": "name",
    "skip.headers": "1",
    "topic": "connect-file-pulse-quickstart-csv",
    "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader",
    "tasks.file.status.storage.class": "io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore",
    "tasks.file.status.storage.bootstrap.servers": "172.27.157.66:9092",
    "tasks.file.status.storage.topic": "connect-file-pulse-status",
    "tasks.file.status.storage.topic.partitions": 10,
    "tasks.file.status.storage.topic.replication.factor": 1,
    "tasks.max": 1,
    "aws.access.key.id":"<<>>",
    "aws.secret.access.key":"<<>>",
    "aws.s3.bucket.name":"mytestbucketamtrak",
    "aws.s3.region":"us-east-1"
  }
}
What should I put in the libraries to make this work? Note: the Lenses connector sources from the S3 bucket without issues, so it's not a credentials issue.
As mentioned in the comments by @OneCricketeer, following github.com/streamthoughts/kafka-connect-file-pulse/issues/382 pointed to the root cause.
Modifying the config file to use this property sourced the file:
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3RowFileInputReader"
I am trying to get started with a monitoring server solution. I have the Sensu clients, RabbitMQ and Uchiwa configured, but when I tried Graphite there were so many parts to configure that I switched to InfluxDB instead. I am stuck configuring Sensu to send metrics to InfluxDB.
Is there a part missing in the below configuration?
Client [Sensu] > RabbitMQ <> Sensu Server <> InfluxDB <> Grafana
Any suggestions?
cat influx.json
{
  "influxdb": {
    "hosts" : ["192.168.1.1"],
    "host" : "192.168.1.1",
    "port" : "8086",
    "database" : "sensumetrics",
    "time_precision": "s",
    "use_ssl" : false,
    "verify_ssl" : false,
    "initial_delay" : 0.01,
    "max_delay" : 30,
    "open_timeout" : 5,
    "read_timeout" : 300,
    "retry" : null,
    "prefix" : "",
    "denormalize" : true,
    "status" : true
  }
}
cat handler.json
{
  "handlers": {
    "influxdb": {
      "type": "pipe",
      "command": "/opt/sensu/embedded/bin/metrics-influxdb.rb"
    }
  }
}
checks1:
{
  "checks": {
    "check_memory_linux": {
      "handlers": ["influxdb","default"],
      "command": "/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95",
      "interval": 60,
      "occurrences": 5,
      "subscribers": [ "TEST" ]
    }
  }
}
checks2:
{
  "checks": {
    "check_cpu_linux-elkctrl-pipe": {
      "type": "metric",
      "command": "/opt/sensu/embedded/bin/check-cpu.rb -w 80 -c 90",
      "subscribers": ["TEST"],
      "interval": 10,
      "handlers": ["debug","influxdb"]
    }
  }
}
To use InfluxDB to persist your data, you must have:
The InfluxDB plugin installed
Definitions for the plugin (an influxdb.json containing at least the host, port, user, password and database to be used by Sensu)
The definition, like the other config files, must be in /etc/sensu/conf.d/
Handler configuration set properly (also in conf.d)
A mutator for InfluxDB (extensions)
Your checks must send results to the handler, so their definition must contain:
"handlers": [
"influxdb"
]
Or whatever name you gave your handler.
If the influxdb config you provided above is the full extent of your configuration, it seems to be missing the username/password attributes required by the InfluxDB configuration. If they're present but just not included in the post, no big deal. However, I'd recommend checking your Sensu logs:
grep -i influxdb /var/log/sensu/sensu-server.log
and seeing whether the check results are getting sent to your InfluxDB instance. If they are, you should see an error there that points a bit more toward what's going on.
You can also check your influxdb logs to see if they're getting a post from your Sensu server:
journalctl -u influxdb.service -f
But yeah, if the username/password is missing from the configuration, that's the first place I'd start.
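For reference, a minimal sketch of an influxdb.json with credentials added; the username and password values are placeholders, and the exact attribute names should be checked against the version of the sensu-plugins-influxdb handler you have installed:
{
  "influxdb": {
    "host": "192.168.1.1",
    "port": "8086",
    "database": "sensumetrics",
    "username": "<your-influxdb-user>",
    "password": "<your-influxdb-password>"
  }
}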
First I configured my ModeShape configuration file like this:
"storage" : {
"persistence" : {
"type" : "db",
"connectionUrl": "${database.url}",
"driver": "${database.driver}",
"username": "${database.user}",
"password": "${database.password}",
"tableName": "GOVERNANCE_MODESHAPE",
"poolSize" : 5,
"createOnStart" : true,
"dropOnExit" : false
}
}
After I create a node, set a property on it, and save it in my local environment, I can still find the node and the property locally. But it can't be found in my colleague's local environment.
Then I changed the configuration like this:
"storage" : {
"persistence" : {
"type" : "db",
"connectionUrl": "${database.url}",
"driver": "${database.driver}",
"username": "${database.user}",
"password": "${database.password}",
"tableName": "GOVERNANCE_MODESHAPE",
"poolSize" : 5,
"createOnStart" : true,
"dropOnExit" : false
},
"binaryStorage" : {
"type" : "file",
"directory": "/var/thinkbig/modeshape",
"minimumBinarySizeInBytes" : 5000000
}
}
Now I can find the node and property created in my local environment, and my colleague can also find them in his local environment. But I can't find the directory /var/thinkbig/modeshape.
So I want to know where ModeShape stores binaries. Why is it that once I add the "binaryStorage" config to the configuration file, everybody can find the node and property? Thanks in advance!
Per the doc, minimumBinarySizeInBytes is "the minimum size (in bytes) above which binary values will be stored in the store. Any binary value lower in size will be stored together with the other node information."
This means that binaries smaller than the specified size are stored in the database rather than the file system. You could change this to a value of 1 byte if you want to ensure that all binaries get stored in the file system.
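For example, a sketch of the binaryStorage block from the configuration above with the threshold dropped to 1 byte, so that effectively all binaries go to the file store:
"binaryStorage" : {
  "type" : "file",
  "directory": "/var/thinkbig/modeshape",
  "minimumBinarySizeInBytes" : 1
}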
I'm trying to make a storage plugin for Hadoop (HDFS) in Apache Drill.
Actually I'm confused: I don't know what port to set for the hdfs:// connection, and what to set as the location.
This is my plugin:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://localhost:54310",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}
So, is it correct to set localhost:54310? I got that with the command:
hdfs getconf -nnRpcAddresses
Or should it be :8020?
Second question: what do I need to set for location? My Hadoop folder is in:
/usr/local/hadoop
and there you can find /etc, /bin, /lib, /log and so on. So, do I need to set the location on my datanode, or somewhere else?
Third question. When connecting to Drill, I go through sqlline and then connect to my ZooKeeper like this:
!connect jdbc:drill:zk=localhost:2181
My question here is: after I make the storage plugin and connect to Drill with zk, can I query HDFS files?
I'm very sorry if this is a noob question, but I haven't found anything useful on the internet, or at least it hasn't helped me.
If you can explain some of this to me, I'll be very grateful.
As per the Drill docs:
{
  "type" : "file",
  "enabled" : true,
  "connection" : "hdfs://10.10.30.156:8020/",
  "workspaces" : {
    "root" : {
      "location" : "/user/root/drill",
      "writable" : true,
      "defaultInputFormat" : null
    }
  },
  "formats" : {
    "json" : {
      "type" : "json"
    }
  }
}
In "connection",
put namenode server address.
If you are not sure about this address.
Check fs.default.name or fs.defaultFS properties in core-site.xml.
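For example, a typical core-site.xml entry looks like this (the host and port here are placeholders; use whatever your cluster actually reports):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>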
Coming to "workspaces",
you can save workspaces in this. In the above example, there is a workspace with name root and location /user/root/drill.
This is your HDFS location.
If you have files under /user/root/drill hdfs directory, you can query them using this workspace name.
Example: abc.csv is under this directory.
select * from dfs.root.`abc.csv`
After successfully creating the plugin, you can start Drill and start querying. You can query any directory irrespective of workspaces. Say you want to query employee.json in the /tmp/data HDFS directory. The query is:
select * from dfs.`/tmp/data/employee.json`
I had a similar problem: Drill could not read the dfs server. In the end, the problem was caused by the namenode port.
The default address of namenode web UI is http://localhost:50070/.
The default address of namenode server is hdfs://localhost:8020/.
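If in doubt, one way to check which address Drill should use (assuming a standard Hadoop installation with the hdfs command on the path) is:
hdfs getconf -confKey fs.defaultFS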
I am new to Apache Drill. While creating the storage plugin for Apache Hive, I am getting an error. I have tried two ways; the configuration is below.
1. First approach:
{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift2:localhost:10000",
    "fs.default.name": "hdfs://localhost:9000/",
    "hive.metastore.sasl.enabled": "false"
  }
}
2. Second approach:
{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "",
    "javax.jdo.option.ConnectionURL": "jdbc:derby://localhost:1527/metastore_db;create=true",
    "hive.metastore.warehouse.dir": "/user/tmp/warehouse/hive",
    "fs.default.name": "hdfs://localhost:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
I am using plain Apache components, and both Drill and Hive 2 are installed on the same machine.
For both cases I am getting this error in the GUI:
Please retry: error (unable to create/ update storage)
Kindly help me in resolving this. Thanks in advance!
I was able to connect through the first approach, i.e. the Hive remote metastore connection.
Here is the configuration:
{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "fs.default.name": "hdfs://localhost:9000/",
    "hive.metastore.sasl.enabled": "false"
  }
}
Also make sure that the Hive metastore is up and running. It can be started using the command below:
hive --service metastore &
Also, the parameter hive.metastore.uris in hive-site.xml should be set to thrift://localhost:9083.
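For reference, a sketch of the corresponding hive-site.xml entry (localhost:9083 as above; adjust if your metastore runs elsewhere):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>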
Thanks