Cacti graphs showing empty data though rrd files are getting generated - cacti

The RRD files are being generated by the cron job.
I have not configured SNMP, and I set the same in graph management.
So there should be no need to configure SNMP on the Unix server (localhost).

It usually takes a while for Cacti to collect enough data to generate graphs. If you set it up today, give it time to gather data before expecting the graphs to show anything.
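One way to check whether the RRD files already hold usable samples, rather than only unknown (NaN) values, is to inspect the text output of `rrdtool fetch <file> AVERAGE`. The Python sketch below parses that output; the helper name `count_known_rows` is made up for illustration, and it assumes the usual `timestamp: value value ...` row layout.

```python
import math
from typing import Tuple


def count_known_rows(fetch_output: str) -> Tuple[int, int]:
    """Return (rows_with_data, total_rows) for `rrdtool fetch` text output.

    Assumes each data row looks like "1700000000: 1.5e+00 nan".
    The header line with data-source names and blank lines are skipped.
    """
    known = 0
    total = 0
    for line in fetch_output.splitlines():
        stamp, sep, values = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header or blank line, not a data row
        total += 1
        # A row counts as "known" if at least one value is not NaN.
        if any(not math.isnan(float(v)) for v in values.split()):
            known += 1
    return known, total
```

If every row comes back as NaN, the poller is creating the files but not storing values yet, which would match empty graphs.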

Related

Read S3 file based on the path that comes in Kafka - Apache Flink

I have a pipeline that listens to a Kafka topic that receives the s3 file-name & path. The pipeline has to read the file from S3 and do some transformation & aggregation.
I see that Flink supports reading S3 files directly through a source connector, but in this use case the file has to be read as part of the transformation stage.
I don't believe this is currently possible.
An alternative might be to keep a Flink session cluster running, and dynamically create and submit a new Flink SQL job running in batch mode to handle the ingestion of each file.
Another approach you might be tempted by would be to implement a RichFlatMapFunction that accepts the path as input, reads the file, and emits its records one by one. But this is likely to not work very well unless the files are rather small because Flink really doesn't like to have user functions that run for long periods of time.
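The flat-map idea described above can be sketched in plain Python (this is not Flink's API): take a path, open the file, and emit its records one at a time instead of loading it whole. `records_from_path` and the `open_file` hook are hypothetical names; `open_file` stands in for whatever S3 client would actually open the object. The caveat about long-running user functions still applies to large files.

```python
from typing import Callable, IO, Iterator


def records_from_path(
    path: str, open_file: Callable[[str], IO[str]] = open
) -> Iterator[str]:
    """Yield one record (line) at a time for the file at `path`.

    `open_file` is a hypothetical hook; by default it is the built-in
    `open`, so this reads local files. An S3-backed opener is assumed
    to return a text file-like object usable as a context manager.
    """
    with open_file(path) as handle:
        for line in handle:
            record = line.rstrip("\n")
            if record:  # skip empty lines
                yield record
```

Because the function is a generator, each record is handed downstream as soon as it is read, which mirrors emitting through a collector inside a flat-map function.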

Saving Apache Superset query results to S3

When Apache Superset runs a query, I want to save the results to S3 so they can be further reused and not lost.
How can I configure the default output directory for Apache Superset?
Superset is a reporting tool, so the results of the queries are used for creating charts, filters, etc.
For your purpose (reusing a result), I think you should have a look at the Superset cache (more info on the official page).
If you want to store the data in a CSV file in S3, I recommend that you use an ETL tool or custom development (with Python, for instance). That would mean defining the view at the database level (so it can be accessed from the ETL) and then reusing the view in Superset.
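For the custom-development route, a minimal Python sketch could serialize the rows to CSV in memory and hand the bytes to an uploader. The function names and the `upload` callable are hypothetical; fetching the rows from the database view is left out.

```python
import csv
import io
from typing import Callable, Iterable, Sequence


def rows_to_csv_bytes(header: Sequence[str], rows: Iterable[Sequence[object]]) -> bytes:
    """Serialize a header and rows to UTF-8 encoded CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def export_result(
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
    key: str,
    upload: Callable[[str, bytes], None],
) -> int:
    """Build the CSV and pass it to `upload(key, body)`.

    `upload` is a hypothetical hook for the storage client.
    Returns the number of bytes handed to it.
    """
    payload = rows_to_csv_bytes(header, rows)
    upload(key, payload)
    return len(payload)
```

With boto3, the `upload` hook would typically wrap the S3 client's `put_object(Bucket=..., Key=..., Body=...)` call, with the bucket name supplied by you.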

Clean up and prevent excessive data accumulation in a MobileFirst Analytics 8.0 environment

Our analytics data is taking up almost 100% of the disk space on the file system. How do we remove the older data and prevent such a situation from occurring again?
You can follow https://mobilefirstplatform.ibmcloud.com/tutorials/en/foundation/8.0/installation-configuration/production/server-configuration/#setting-up-jndi-properties-for-mobilefirst-server-web-applications to set up JNDI properties in MobileFirst. Set the TTL values based on your business requirements, and keep them as short as possible so that data does not pile up again. To clean up the existing data, do the following:
1) Set up the Analytics server with the JNDI properties for TTL and other configuration.
2) Stop the Analytics server.
3) Delete the contents of the /analyticsData directory to discard any initial data, so that no directories are left inside analyticsData. (On a fresh setup this has no impact, as no data has accumulated yet.) Note: /analyticsData is the default location; refer to http://mobilefirstplatform.ibmcloud.com/tutorials/en/foundation/8.0/installation-configuration/production/analytics/configuration/ to verify the actual value in your environment.
4) Restart the Analytics server. The index is now created from scratch with the TTLs in effect, so data is purged properly from then on.
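The directory cleanup step could be scripted. Below is a minimal Python sketch that empties a directory while keeping the directory itself. The function name is hypothetical, the path you pass in must be the analytics data directory you verified for your environment, and it should only be run while the Analytics server is stopped.

```python
import shutil
from pathlib import Path


def clear_directory_contents(data_dir: str) -> int:
    """Delete everything inside `data_dir`, keeping the directory itself.

    Returns the number of top-level entries removed.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    removed = 0
    # Materialize the listing first so we do not delete while iterating.
    for entry in list(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # regular files and symlinks
        removed += 1
    return removed
```

The deletion is irreversible, so double-check the path before calling it.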

How are Apache Pig UDFs distributed to data nodes?

There is plenty of documentation about how to write Pig UDFs in the various languages, but I haven't found anything on how they are distributed to the data nodes.
Are they distributed automatically when the Pig script is invoked? If it makes any difference, I'd be writing the UDF in Java.
Let me make it clearer. When we write a UDF and Pig is running in HDFS mode, the UDF, which initially resides on the local (client) side, is carried to the cluster by Hadoop's internal machinery. The UDF's work is then performed by a TaskTracker, and it is the JobTracker's duty to assign it to a TaskTracker close to the data node where the input file resides.
Note: It is always the JobTracker (a master daemon, commonly run alongside the NameNode) that decides which TaskTracker executes the UDF.
If the input file is on the local file system (local mode), the UDFs are executed in the local JVM.
The fact is that Apache Pig works in two modes:
1) local mode
2) hdfs mode
To answer your question, which concerns Pig running in HDFS mode: we only need to make sure that the input file we are loading is present in HDFS (on the data nodes). As for the UDF, it is simply a function used to process the input file, just like Pig Latin itself. We write both the UDFs and the Pig Latin on the client-side node, so all of that code is stored on the client machine.
Above all, Pig has to be configured so that the client can interact with HDFS to produce the required result.
Hope this helps

Running multiple Kettle transformation on single JVM

We want to use pan.sh to execute multiple Kettle transformations. After exploring the script, I found that it internally calls the spoon.sh script, which runs in PDI. The problem is that every time a new transformation starts, it creates a separate JVM for its execution (invoked via a .bat file). I want to group them into a single JVM to overcome the memory constraints that the multiple JVMs put on the batch server.
Could somebody guide me on how I can achieve this, or share documentation/resources with me?
Thanks for the good work.
Use Carte. This is exactly what it is for. You can start up a server (on the local box if you like) and then submit your jobs to it. One JVM, one heap, shared resources.
A further benefit is scalability: when your box becomes too busy, just add another one, also running Carte, and start sending some of the jobs to that server.
There's an old but still current blog post here:
http://diethardsteiner.blogspot.co.uk/2011/01/pentaho-data-integration-remote.html
There is also documentation on the Pentaho website.
Starting the server is as simple as:
carte.sh <hostname> <port>
There is also a status page, which you can use to query your carte servers, so if you have a cluster of servers, you can pick a quiet one to send your job to.
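To illustrate picking a quiet server: Carte serves its status page at `/kettle/status/`, and adding `?xml=Y` returns it as XML. The Python sketch below counts running transformations in such an XML document and picks the server with the fewest. The element names `transstatus` and `status_desc` are assumptions about the XML layout that you should verify against your Carte version, the server names are made up, and fetching the page (with HTTP basic authentication) is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def count_running(status_xml: str) -> int:
    """Count transformations reported as "Running" in a Carte status XML.

    Assumes each transformation is a <transstatus> element with a
    <status_desc> child; check this against your Carte version.
    """
    root = ET.fromstring(status_xml)
    return sum(
        1
        for node in root.iter("transstatus")
        if (node.findtext("status_desc") or "").strip() == "Running"
    )


def pick_quietest(status_by_server: Dict[str, str]) -> str:
    """Return the server name whose status XML shows the fewest running transformations."""
    return min(
        status_by_server,
        key=lambda name: count_running(status_by_server[name]),
    )
```

A fuller version would probably also weigh running jobs and free memory, but the selection logic stays the same.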