Pentaho Kettle and Scheduling Tool

I am using Pentaho Data Integration 5.2 CE and wish to use a scheduling tool with a web UI and the ability to send alerts by email on success or failure.
Please suggest a good open source tool for the purpose.

PDI comes with its own Carte server, which allows you to do some monitoring and remote execution. Scheduling can be done with cron, and you can then use the Carte server to monitor.
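For example, a crontab entry along these lines (paths, job name, and log locations are placeholders to adapt) runs a PDI job nightly via kitchen.sh and appends the output to a log you can review alongside Carte:

# m h dom mon dow -- run the nightly ETL job at 02:00
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/opt/etl/jobs/nightly_load.kjb -level=Basic >> /var/log/etl/nightly_load.log 2>&1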
If you'd like a nicer web UI, I can recommend Azkaban, created by the folks at LinkedIn, which I've found to be easy to set up, very stable, and simple for looking at success and failure of transformations or jobs. You basically set up a job file with the executable and upload that to the server. Then you can schedule through its web UI, as well as look at the results at run time and actually go into the details and review the PDI output logs, which is great.
I've also read good things about Chronos, but haven't tried it, as it was a bit more complicated to set up and requires a Mesos installation.

Related

MarkLogic 10 - configure App Server, Forest and Database from remote location on the server

I am looking for full automation, right from configuring the App Server, Forest, and Database. I found a way to configure MarkLogic through a configuration program as per https://docs.marklogic.com/guide/admin-api/configure
I am looking for a way that can be integrated into the build process as a part of automation which will take care of this one-time configuration for MarkLogic 10.
ml-gradle might be what you're looking for:
https://github.com/marklogic-community/ml-gradle
"ml-gradle is a Gradle plugin that can automate everything you do with MarkLogic. Deploy an application, add a host..."

Automated Testing of NiFi flows using Jenkins

Is there any way to automatically run regression/functional tests on NiFi flows using a Jenkins pipeline?
I've searched for it, without any success.
Thanks for your help.
With the recent releases of NiFi 1.5.0 and NiFi Registry 0.1.0, the community has come together to produce a number of SDLC/CI-CD integration tools to make using things like Jenkins Pipeline easier.
There are both Python (NiPyAPI) and Java (NiFi Toolkit CLI) API wrappers being produced by a team of collaborators to allow scripted manipulation of NiFi flows across different environments.
Common functions include interaction with integrated version control, import/export of flows as JSON documents, deployment between environments, start/stop of flows, etc.
We are working quickly towards supporting things like an integrated wrapper for declarative Jenkins Pipelines, and I would add that it is all being done in a public codebase under the Apache license, so we (I am the lead NiPyAPI author) would welcome your collaboration.
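To give a flavour of what this looks like from a Jenkins shell step, something along these lines (URLs, bucket, and flow IDs are illustrative placeholders) uses the NiFi Toolkit CLI to inspect the Registry and deploy a versioned flow:

# List buckets in the NiFi Registry, then import version 1 of a flow into NiFi
./bin/cli.sh registry list-buckets -u http://registry-host:18080
./bin/cli.sh nifi pg-import -u http://nifi-host:8080 -b <bucket-id> -f <flow-id> -fv 1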

Better JMeter report

Currently I use the JMeter Aggregate Report or Summary Report for submitting reports, but the recipients expect something extra. How can I provide that? Are there any plugins for capturing server resource usage during a load test?
Reporting: since JMeter 3.0 there is an HTML Reporting Dashboard which can be generated during the test run. It contains exhaustive overview information. If you need to find the reason for a bottleneck, a memory leak, or the like, you can consider the extra graphs available via the JMeter Plugins project.
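For instance, running the test in non-GUI mode with flags along these lines (file names are placeholders) generates the dashboard at the end of the run:

# -n non-GUI, -t test plan, -l results file, -e generate report, -o output directory
jmeter -n -t test_plan.jmx -l results.jtl -e -o ./dashboard-report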
The same JMeter Plugins project provides PerfMon, a client-server application which is able to collect over 70 different metrics and plot them via a JMeter listener. See the How to Monitor Your Server Health & Performance During a JMeter Load Test guide for detailed setup and usage instructions.
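If you go the PerfMon route, the Server Agent is started on each monitored host roughly like this (4444 is its default port), and the PerfMon Metrics Collector listener in your test plan then connects to it:

# On the server under test; the agent listens on port 4444 by default
./startAgent.sh --tcp-port 4444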
There are quite a few plug-ins available that can help you analyze the results better; you can refer to https://jmeter-plugins.org/ for them.
The most popularly used ones are:
Response Times Over Time
Response Times Percentiles
Transactions per Second
Response Latencies Over Time
For server usage you can use the following, which comes with the JMeter plug-ins: the PerfMon Metrics Collector and Server Agent. Alternatively:
On Unix-based systems, use the sar command that comes with the sysstat package, or vmstat; on Windows-based systems, use Perfmon to capture system utilization data while the test is running. You can then use kSar to plot graphs from the data collected with sar (a command sketch follows below): https://sourceforge.net/projects/ksar/
If you have collected data using Perfmon, plot the graphs using PAL: https://pal.codeplex.com/
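A sketch of the sar/vmstat option (intervals and file names are arbitrary): record counters to a binary file during the run, then open that file in kSar:

# Sample all counters every 5 seconds, 720 times (one hour), into a binary file
sar -o /tmp/loadtest.sar 5 720
# Or watch CPU/memory live while the test runs
vmstat 5
# Afterwards, print CPU utilisation from the recorded file (or load it in kSar)
sar -u -f /tmp/loadtest.sar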
In this case, I would suggest using Grafana. It shows real-time results, and the best thing is that it can be configured according to your needs.
Now, how do you use it? It is not that tough.
If you're using a Mac or Linux (any flavour), things are easy. If you're using Windows, I would suggest using a virtual machine, because Windows blocks traffic after a certain number of requests, and that causes a lot of headaches.
In my case, I used a virtual machine to set up Ubuntu inside it and then configured Grafana.
For working with Grafana, you need to have these two things installed.
Grafana itself
InfluxDB for the backend
Links for both here below:
https://grafana.com/grafana/download?platform=linux
https://portal.influxdata.com/downloads/
Once installed and set up, you need to use JMeter's Backend Listener to push results to the Graphite client (installed along with InfluxDB automatically).
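As a minimal sketch of the InfluxDB side (assuming InfluxDB 1.x; the database name is a placeholder), note that newer JMeter versions also offer a native InfluxDB client in the Backend Listener as an alternative to Graphite:

# Create a database for JMeter results
influx -execute 'CREATE DATABASE jmeter'
# In JMeter's Backend Listener, choose InfluxdbBackendListenerClient and set:
#   influxdbUrl = http://influx-host:8086/write?db=jmeter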
I know it is a bit confusing, but once you understand it, you and your client will love the detailed reports.
Remember, Grafana is all about configuration.
Let me know if you have any confusion regarding this.
Happy to help. :)

What's the best way to monitor rabbitmq to make sure everything is running smoothly?

Many times, I get:
- Frozen, load goes to 5.0. Can't use my box.
- Just doesn't work.
Do the following steps:
1. rabbitmq-plugins enable rabbitmq_management
2. service rabbitmq-server restart
3. Browse to http://rabbitmq-server-ip:15672
4. Log in with
username: guest
password: guest
Don't forget to change your password later.
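For example (the username and password here are obviously placeholders), you can create a proper administrator and remove the default guest user with rabbitmqctl:

rabbitmqctl add_user admin 's3cret-password'
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
rabbitmqctl delete_user guest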
As sheki notes, rabbitmqctl is your first port of call for diagnostics, and for building monitoring on top of, but being a manual command-line tool it's not suitable for actual monitoring directly.
I've found DataDog very good to monitor both the MQ details, plus the host platform in parallel. e.g. you can watch the queue levels and set alerts on queues backing-up, while also watching the CPU/memory/IO inflicted by these queue levels. It really helps to get ratios of resource usage, and the alerts are good. Having a uniform platform for both infrastructure and application level monitoring is surprisingly rare, but speeds up diagnoses of production issues hugely.
NewRelic is similar and also has a RabbitMQ plugin. Although I've not used this plugin specifically, I've used NR for years and found it invaluable in diagnosing operational issues.
AppDynamics is another example. Similarly this allows you to drill down into your app from a high-level dashboard, and visually navigate from problems to causes. It's especially good with visualising the network of a distributed application across various services/servers. I've used this, for example, to find complex problems in .NET applications and SQL Server clusters using 3rd party Web Services (e.g. latency and its consequences to your app over chatty protocols). These things are very difficult to diagnose, especially for developers who are limited to checking their code. Diagnosing operational issues requires a much broader picture.
I gave up trying to even install and configure Nagios. I know it's the 'best' but it's the best of an old breed of self-configured beasts which we don't have time to manage. I didn't even get it going... and eventually turned to the more 'modern' cloud approach. Once you get over the trust factor, it's pretty liberating.
I'm using these APM platforms together* to aggregate data from:
Windows O/S level Event Logs/Services
Linux O/S level
AWS console level
RDS, EC2
Apache
MySQL
App integrations / custom NR plugins I've written
Rabbit MQ
*NewRelic can feed into Datadog! So if you are already using NR you don't need to install DD on those hosts as well.
Being able to view all these levels together gives you a view on the publishers, middleware, MQ servers, workers and front-end app - all in one dashboard.
I would highly recommend an approach like this, because just looking at one server alone leads you to a lot of head-scratching. Seeing an entire stack in one customisable dashboard is just so illuminating it takes most of the guesswork out of it.
Worried about installing these things? I found New Relic to be especially light-weight and unobtrusive. AppDynamics seemed to stress the host a bit more, but mostly that's because you had to run the visualisation tools on the host! (this may have changed). DataDog seems performant, but creates a lot of control panels/icons on the target host (perhaps just a visual impression).
To a four-year-old question - this answer probably wasn't available in 2011, but in 2015 these once 'startup'-style APM services cost just tens or hundreds of dollars a month for an unbelievably rich enterprise-level solution.
There are a bunch of RabbitMQ monitoring plugins available for different monitoring systems like Nagios, Zabbix, etc.
Look at http://www.rabbitmq.com/how.html#management
Using rabbitmqctl is the most straightforward solution to check the status of the node.
$ rabbitmqctl status
This should tell you the status of the RabbitMQ node.
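Beyond the overall status, a couple of other rabbitmqctl subcommands are handy for spotting trouble, for example:

# Queue depths and consumer counts; a steadily growing 'messages' column is a red flag
rabbitmqctl list_queues name messages consumers
# Open connections and their states
rabbitmqctl list_connections name state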
If you have PRTG (or any probe system with an HTTP sensor check), you can check the server status as described at the following page:
https://blog.cdemi.io/monitoring-rabbitmq-in-prtg/
In particular you have to enable the Management Plugin:
The rabbitmq-management plugin provides an HTTP-based API for management and monitoring of your RabbitMQ server, along with a browser-based UI and a command-line tool, rabbitmqadmin. The management plugin is included in the RabbitMQ distribution. To enable it, run rabbitmq-plugins enable rabbitmq_management on the RabbitMQ nodes. For more details on the management plugin refer to the RabbitMQ documentation.
The web UI is located at http://server-name:15672/ and the HTTP API and its documentation are both located at http://server-name:15672/api/
Once done, you can check the overview of your server with the API:
http://server-name:15672/api/overview
This returns a JSON document with all details about the server, active connections, queues, etc.
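From a probe or a cron job, the same endpoint can be polled with curl using the management credentials (guest/guest is just the default):

# Returns HTTP 200 and a JSON body when the broker is healthy
curl -u guest:guest http://server-name:15672/api/overview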
This command will help you: service rabbitmq-server status
Or try service rabbitmq-server stop and service rabbitmq-server start, then service rabbitmq-server status.

Best methodology for developing C# long-running processor apps

I have several different C# worker applications that run various continuous tasks: sending emails from a queue, importing new orders from the website database to the orders database, making database backups and restores, running data processing for OLTP -> OLAP, and other related tasks. Before, I released these as Windows services, but currently I release them as regular console applications. They are all based on a common task-runner framework I created, and I am happy with that; however, I am not sure what is the best way to deploy these types of applications.
I like the console version because it is quick and easy, and it is possible to quickly see program activity and output. The downside is that the worker computer has several console screens running and it gets messy. On the other hand, the service method seems to take too long to deploy, and I have to go through event logs to see messages. What are some experiences/comments on this?
I like the console app approach. I typically have things set up so I can pass a switch like -unattended that suppresses the console screen.
A Windows service would be a good choice: it runs in the background no matter if you close the current session, and you can configure it to start automatically after a Windows restart when performing a patch update on the server. You can log important messages to the event viewer or a database table.
For a thing like this, the standard way of doing it is with Windows services. You want the service to run under the network account so it won't require a logged-in user.
I worked on something a few years ago that had similar issues. Logically I needed a service, but sometimes I needed to see what was going on, and generally I wanted a history. So I developed a service which did the work; any time it wanted to log, it called out to its subscribers (implemented as an observer pattern).
The service registered its own data logger (writing to a database), and at run time the user could run a GUI which connected to the service using remoting to become a live listener!
I'm going to vote for Windows Services. It's going to get to be a real pain managing those console applications.
Windows Service deployment is easy: after the initial install, you just turn them off and do an XCOPY. No need to run any complicated installers. It's only semi-complicated the first time, and even then it's just
installutil MyApp.exe
Configure the services to run under a domain account for the best security and easiest interop with other machines.
Use a combination of event logs (with Error, Warning, and Information) for important notifications, and just dump verbose logging to a text file.
Why not get the best of all worlds and use something like:
http://topshelf-project.com/
It will allow you to run your program as a command-line app or a Windows service.
I'm not sure if this applies to your applications or not, but when I have console applications that are not dependent on user input, or are the kind of applications that just do their job and quit, I run them on a virtual server; this way I don't see a screen popping up while I'm working, and virtual servers are easy to create and restart.
We regularly use Windows services as the background processes. I don't like command-line apps, as you need to be logged into the server for them to run. Services run in the background all the time (assuming they're set to auto-start). They're also trivial to install with the sc.exe command-line tool that's built into Windows. I like it better than the bloat-ware that is installutil.exe. Of course installutil does more, but I don't need what it does; I just want to register my service.
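As an illustration of how little sc.exe needs (the service name and path are placeholders):

rem Register the service and set it to start automatically, then start it
sc create MyWorkerSvc binPath= "C:\Services\MyWorker.exe" start= auto
sc start MyWorkerSvc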
We've also created an infrastructure where we have a generic service .exe that loads DLLs based on an interface definition, so adding a new "service" is as simple as dropping in a new DLL and restarting the service host.
However, we started to move away from services. The problem we have with them is that they lock the DLLs (for obvious reasons), so it's a pain to upgrade them: we need to stop, upgrade, and then restart. Not hard, but additional steps. Instead we're moving to special "pages" in our ASP.NET apps that run the actual background jobs we need done. There's still a service, but all it does is invoke the ASP.NET pages, so it doesn't lock any of our DLLs. Then we can replace the DLLs in the ASP.NET bin directory, and the normal ASP.NET rules for app-domain restart kick in.