What can be done using CKAN's asynchronous background jobs? - redis

I have just started installing CKAN.
I looked through the installation documentation and the main ckan.org website, but I could not find out why Redis is required to run CKAN 2.7 or later.
For example:
Why do we need Redis for running CKAN?
That question says Redis is necessary for the new system of asynchronous background jobs used by CKAN. So what kinds of asynchronous jobs can CKAN run, and when does CKAN use them?

Currently, CKAN core only provides the infrastructure for asynchronous background jobs but doesn't actually create such jobs on its own (this might change in future releases).
There are, however, CKAN extensions that use the system, for example ckanext-extractor.
Disclosure: I'm the author of ckanext-extractor.
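As a minimal sketch of how an extension might hand work to that infrastructure: this assumes CKAN 2.7 or later, where ckan.plugins.toolkit.enqueue_job is available. The names log_greeting and enqueue_greeting are hypothetical and only for illustration. Queued jobs only run if Redis is reachable and a job worker process is running.

```python
import logging

log = logging.getLogger(__name__)


def log_greeting(name):
    """Hypothetical job function; it runs later inside a worker process."""
    message = "Hello, {}!".format(name)
    log.info(message)
    return message


def enqueue_greeting(name):
    """Queue log_greeting as a background job (needs CKAN and Redis)."""
    # Imported lazily so this module also loads outside a CKAN environment.
    import ckan.plugins.toolkit as toolkit

    # enqueue_job takes the function plus a list of positional arguments.
    return toolkit.enqueue_job(log_greeting, [name])
```

The job function is an ordinary importable Python function, so it can be called and tested directly without going through the queue.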


Automated Testing of Nifi flows using Jenkins

Is there any way to automatically run regression/functional tests on NiFi flows using a Jenkins pipeline?
Searched for it, without any success.
Thanks for your help.
With the recent release of NiFi-1.5.0 and NiFi-Registry-0.1.0, the community has come together to produce a number of SDLC/CICD integration tools to make using things like Jenkins Pipeline easier.
There are both Python (NiPyAPI) and Java (NiFi-Toolkit-CLI) API wrappers being produced by a team of collaborators to allow scripted manipulation of NiFi flows across different environments.
Common functions include interaction with integrated version control, import/export of flows as JSON documents, deployment between environments, start/stop of flows, etc.
So, we are working quickly towards supporting things like an integrated wrapper for declarative Jenkins Pipelines. I would add that it is being done fully in a public codebase under the Apache license, so we (I am the lead NiPy author) would welcome your collaboration.

How to package and deploy cumulocity server-side agents?

We are creating a server-side agent which periodically fetches data from nodes and maps this data to Cumulocity measurements and events.
What is an elegant approach for hosting and/or packaging such a server-side agent?
We are hosting our own instance of the Cumulocity platform.
It's preferable to keep this server-side agent as 'close' to the core platform as possible, e.g. share some core agent framework dependencies.
We'd like to limit the amount of setting up additional environments or containers (e.g. Tomcat).
Cumulocity uses Karaf; would it make any sense to deploy the server-side agent into Karaf as a bundle?
Is there any recommended approach for hosting server-side agents? Does the Cumulocity platform offer an alternative to deploying the agent to some "own environment"?
The Cumulocity examples repository contains the "tracker-agent" server-side agent example, which is an embedded Tomcat Java application. There is little information about the intended deployment location.
I don't recommend deploying agents/microservices directly into the core Karaf server, since that endangers the resources available to the core APIs and is not supported. (I.e., will likely be overwritten with the next upgrade...)
Typically, people just provision an additional VM or Docker container next to Cumulocity to place their agents/microservices in. On top of that, we, for example, often use Spring Boot, so the effort is pretty low (java -jar ...).
We do have a hosting system for agents/microservices and will make that generally available also for others to use in Q1/2018. Follow the announcement channel at https://support.cumulocity.com to stay posted...

Can a Java application server (WebLogic) manage a native executable?

Is it possible (...knowing full well that this is crazy and seriously ill-advised...) to have a J2EE application running in a Java app server (using weblogic presently), and have a native executable process started, used, and stopped as part of this Java application's lifecycle? (Note: this is not JNI, it's actually a separate native process. It's unix/linux, but should also run on windows.) I haven't found any docs on the subject -- and for good reason, probably.
Background: The native process is actually some monolithic 3rd party software package that is un-hackable and there's no API other than stdin/stdout. The Java app requires the native app to perform certain services. I can easily wrap the native process via ProcessBuilder and start/stop and communicate with it (using stdin/stdout). For testing purposes I have a simple exe (C++) that communicates via stdin/stdout and can receive "start", "shutdown" and performs a simple "echo" service. (The "start" is a no-op, but simply returns "ok" if the native process started successfully.)
So, ideally, when the app server is started/shutdown, and/or the deployed Java app is started/shutdown, the associated native process can also be started/shutdown. And ideally, this can happen cleanly & reliably (no lingering processes after shutdown, all startup failures logged, the lifecycle timing issues synchronized).
If this actually worked, then "part 2" of the question would be if this could actually work in a cluster/failover environment. The native process could be tied to a platform and software-specific monitoring & management service, but I'd like to have everything bundled and managed with the Java app, if possible.
If Glassfish or any other OSGi type environment would make this simpler, please feel free to let me know (it could be an option... I'd prefer Glassfish, but WLS is the blanket mandate.)
I'm trying to put together a proof-of-concept, but any clear answer "yes, I've done it" or "no, it won't work" would be much appreciated & a huge time-saver (with supporting doc links, if you have them).
Edit: just to clarify (the subject may be misleading): there is a considerable Java application running as well (which I've written & can freely modify as necessary); the 3rd party native process just performs a service that the Java application requires. I'm not merely trying to manage a native process via an app server.
The answer to part 1 is yes, it is absolutely possible to have a Java application server manage a native system process. It sounds like you've pretty much figured this out for yourself, if you're thinking about using a ProcessBuilder to spawn the external program and interact with it. That's pretty much the way to do it.
I have used exactly that kind of setup in the past to implement a media transcoding service on top of a Java server (the Java server spawned transcoding jobs via ffmpeg processes, monitoring their status and reporting back to the rest of the application on success/failure/etc.). How cleanly it can all be done depends upon how you implement it and upon the behavior of your external app (i.e. is it guaranteed to respond gracefully and quickly to a shutdown request?), but it will be very difficult (if not impossible) to get it completely perfect. At a minimum, if someone does a kill -9 on your Java server process, there is no way for you to gracefully shut down the native process, at least not until the server is restarted and you see that the native process is already running.
The second part depends upon exactly what you mean by "work in a cluster/failover environment". In terms of managing the native process, if you can start it and interact with it in Java then you can also manage it in Java. But if you mean you want perfect failover behavior such that if the node with the native process on it goes down then a new node automatically resumes the process in the exact same state as it was before, then that may be very difficult or even impossible. But, if you abstract out interactions with the external process so that it just appears as a service that your Java code interacts with (for instance, perhaps by sending requests to some facade class that understands how to interact with and manage the external process) then you should be able to get some fairly good results.
The transcoding service that I implemented ran in a clustered environment (using JBoss/Tomcat), and the way it worked was that when a transcoding job was requested a message would be dispatched. This message would be received by a coordinating class that would manage the queue of transcode requests, spawning jobs as worker processes became available. The state of the queue was replicated across the cluster, so if the node running the ffmpeg processes went down the currently scheduled jobs would be remembered, and then resumed as soon as a suitable node was available again (the transcoding service was configurable so that it could be enabled/disabled per node). In practice the system proved to be quite robust.
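The facade idea described above can be sketched as follows. This is an illustration in Python rather than Java, under stated assumptions: NativeService is a hypothetical class name, and CHILD_SOURCE is a stand-in for the third-party executable that mimics the "start" / "shutdown" / echo test program from the question. A Java version built on ProcessBuilder would likely have the same shape (start, request, stop), but this is not a definitive implementation.

```python
import subprocess
import sys

# Stand-in for the third-party executable: a tiny line-based echo server.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    command = line.strip()
    if command == "start":
        print("ok", flush=True)
    elif command == "shutdown":
        print("bye", flush=True)
        break
    else:
        print("echo: " + command, flush=True)
"""


class NativeService:
    """Facade that owns the child process and hides the stdin/stdout protocol."""

    def __init__(self, argv):
        self._argv = argv
        self._proc = None

    def start(self):
        self._proc = subprocess.Popen(
            self._argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes
        )
        # The child answers "ok" to "start" once it is ready.
        if self.request("start") != "ok":
            self.stop()
            raise RuntimeError("native process did not start cleanly")

    def request(self, command):
        """Send one line to the child and return its one-line reply."""
        self._proc.stdin.write(command + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().strip()

    def stop(self, timeout=5.0):
        """Ask for a graceful shutdown; kill the child if that fails."""
        if self._proc is None or self._proc.poll() is not None:
            return
        try:
            self.request("shutdown")
            self._proc.wait(timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            self._proc.kill()
            self._proc.wait()


if __name__ == "__main__":
    service = NativeService([sys.executable, "-u", "-c", CHILD_SOURCE])
    service.start()
    print(service.request("hello"))
    service.stop()
```

The rest of the application only talks to start, request and stop, so the same calls can be wired into application startup and shutdown hooks. As noted above, nothing in this sketch protects against the parent being killed with kill -9.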

What's the best way to monitor rabbitmq to make sure everything is running smoothly?

Many times, I get:
-Frozen, load goes to 5.0. Can't use my box.
-Just doesn't work.
Do the following steps:
1. rabbitmq-plugins enable rabbitmq_management
2. service rabbitmq-server restart
3. Browse to http://rabbitmq-server-ip:15672
4. Log in with
username: guest
password: guest
Don't forget to change your password later.
As sheki notes, rabbitmqctl is your first port of call for diagnostics, and for building monitoring on top of, but as a manual command-line tool it is not suitable for direct monitoring.
I've found DataDog very good for monitoring both the MQ details and the host platform in parallel. For example, you can watch the queue levels and set alerts on queues backing up, while also watching the CPU/memory/IO load caused by these queue levels. It really helps to get ratios of resource usage, and the alerts are good. Having a uniform platform for both infrastructure and application level monitoring is surprisingly rare, but speeds up diagnoses of production issues hugely.
NewRelic is similar and also has a RabbitMQ plugin. I've not used this plugin specifically, but I've used NR for years and found it invaluable in diagnosing operational issues.
AppDynamics is another example. Similarly this allows you to drill down into your app from a high-level dashboard, and visually navigate from problems to causes. It's especially good with visualising the network of a distributed application across various services/servers. I've used this, for example, to find complex problems in .NET applications and SQL Server clusters using 3rd party Web Services (e.g. latency and its consequences to your app over chatty protocols). These things are very difficult to diagnose, especially for developers who are limited to checking their code. Diagnosing operational issues requires a much broader picture.
I gave up trying to even install and configure Nagios. I know it's the 'best' but it's the best of an old breed of self-configured beasts which we don't have time to manage. I didn't even get it going... and eventually turned to the more 'modern' cloud approach. Once you get over the trust factor, it's pretty liberating.
I'm using these APM platforms together* to aggregate data from:
Windows O/S level Event Logs/Services
Linux O/S level
AWS console level
RDS, EC2
Apache
MySQL
App integrations / custom NR plugins I've written
Rabbit MQ
*NewRelic can feed into Datadog! So if you are already using NR you don't need to install DD on those hosts as well.
Being able to view all these levels together gives you a view on the publishers, middleware, MQ servers, workers and front-end app - all in one dashboard.
I would highly recommend an approach like this, because just looking at one server alone leads you to a lot of head-scratching. Seeing an entire stack in one customisable dashboard is just so illuminating it takes most of the guesswork out of it.
Worried about installing these things? I found New Relic to be especially light-weight and unobtrusive. AppDynamics seemed to stress the host a bit more, but mostly that's because you had to run the visualisation tools on the host! (this may have changed). DataDog seems performant, but creates a lot of control panels/icons on the target host (perhaps just a visual impression).
This is a four-year-old question, and this answer probably wasn't available in 2011, but in 2015 these once 'startup'-style APM services cost just tens or hundreds of dollars a month for an unbelievably rich enterprise-level solution.
There are a bunch of RabbitMQ monitoring plugins available for different monitoring systems like Nagios, Zabbix, etc.
Look at http://www.rabbitmq.com/how.html#management
Using rabbitmqctl is the most straightforward way to check the status of the node.
$ rabbitmqctl status
This should tell you the status of the RabbitMQ node.
If you have PRTG (or any probe system with an HTTP sensor check), you can check the server status as described on the following page:
https://blog.cdemi.io/monitoring-rabbitmq-in-prtg/
In particular you have to
Enable Management Plugin
The rabbitmq-management plugin provides an HTTP-based API for management and monitoring of your RabbitMQ server, along with a browser-based UI and a command line tool, rabbitmqadmin. The management plugin is included in the RabbitMQ distribution. To enable it, we need to run: rabbitmq-plugins enable rabbitmq_management on the RabbitMQ nodes. For more details on the Management plugin refer to RabbitMQ Documentation.
The web UI is located at: http://server-name:15672/
The HTTP API and its documentation are both located at: http://server-name:15672/api/
Once done, you can check the overview of your server with the API:
http://server-name:15672/api/overview
This returns a JSON document with details about the server, active connections, queues, etc.
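A minimal sketch of polling that endpoint from a script, using only the Python standard library. The function names, the max_ready threshold and the choice of fields are assumptions for illustration. The field names (queue_totals, object_totals and their members) follow what the management API commonly returns, but check them against the documentation served at /api/ on your own server.

```python
import base64
import json
import urllib.request


def fetch_overview(base_url, user, password):
    """GET <base_url>/api/overview with HTTP basic auth and decode the JSON."""
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/overview",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(overview, max_ready=1000):
    """Reduce the overview document to a few numbers worth alerting on."""
    queue_totals = overview.get("queue_totals", {})
    object_totals = overview.get("object_totals", {})
    ready = queue_totals.get("messages_ready", 0)
    return {
        "messages_ready": ready,
        "messages_unacknowledged": queue_totals.get("messages_unacknowledged", 0),
        "connections": object_totals.get("connections", 0),
        "queues": object_totals.get("queues", 0),
        # Hypothetical alert rule: too many messages waiting for consumers.
        "backlog_alert": ready > max_ready,
    }
```

A cron job or monitoring probe could call summarize(fetch_overview("http://server-name:15672", user, password)) and raise an alert when backlog_alert is true.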
This command will help you: service rabbitmq-server status
Or try these: service rabbitmq-server stop and service rabbitmq-server start, then service rabbitmq-server status.

EC2 automation tools / strategies?

What tools or strategies are you using for automation of EC2 activities?
I need to be able to bring up a number of EC2 instances, provision various software on them (primarily Python packages), interact with S3 (primarily to download data), and run various jobs. I'll be doing this both on-demand and on a scheduled basis.
I'm trying to decide if I should:
Create an AMI with all my software loaded on it
or
Launch a plain vanilla Linux AMI instance and scp my software to it
For the provisioning and automation, Boto looks pretty good. Or I could write something with Paramiko. Would you recommend either, or is there anything else I should be looking at?
Basically I'm looking for advice / success stories, let me know what's working for you.
To answer your bullets about selecting AMIs, I would say that it depends on how much software you're installing.
I have been successful with a hybrid approach, where I build an AMI and load my heavyweight and more stable software. This is the stuff that needs to run an installer, or takes considerable time to install (remember that if you re-install a package every time as part of your startup process, you're paying for the install every time). Then, I upload the small and volatile software at provisioning/startup time. In this bucket goes most of the application code, data, etc. That way, I can change my app and not have to touch the AMI.
The benefits of this approach:
Don't have to pay for running the same software install thousands of times.
AMI can stay fairly stable over time.
Can use software that requires intervention or GUI interaction to install.
Major drawbacks:
Your AMI's OS version will become stale over time.
Your AMI may not be flexible as to the instance type/architecture it will run on. For instance, you may create it on a 32-bit OS and thereby prevent it from running on the High CPU instance types, or vice versa. So you may lock yourself into a pricing scheme.
I don't use Python, so I can't comment on either of the APIs you referenced.
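For Python users, the hybrid approach might be sketched like this. It assumes boto3 (the current AWS SDK for Python, used here in place of the older Boto library named in the question), an AMI that already contains pip and the AWS CLI, and an instance role that can read the bucket. The function names, the /opt/app/data path and the example arguments are hypothetical.

```python
import shlex


def build_user_data(bucket, key, packages):
    """Return a boot script that installs the small, volatile parts at startup."""
    lines = [
        "#!/bin/bash",
        "set -e",
        # Lightweight packages are installed at boot; heavy ones live in the AMI.
        "pip install " + " ".join(shlex.quote(p) for p in packages),
        # Hypothetical target path; assumes the AWS CLI is baked into the AMI.
        "aws s3 cp s3://{}/{} /opt/app/data".format(bucket, key),
    ]
    return "\n".join(lines) + "\n"


def launch_worker(ami_id, instance_type, user_data):
    """Start one instance from the prebuilt AMI and pass it the boot script."""
    # Imported lazily because boto3 is a third-party dependency.
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
    )
    return response["Instances"][0]["InstanceId"]
```

Keeping the boot script in code means the application can change without rebuilding the AMI, which is the main point of the hybrid approach.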
AWS just released the Systems Manager suite, which includes an Automation service that will (among other things) handle your use cases around AMIs.
This question was asked some time ago, but I believe my answer could be useful to other users. I believe the best automation tools available on the market are provided by cloud management platforms. For example, they offer auto-scaling, integration with configuration software (Chef/Puppet), database replication, DNS management...
The most popular cloud management platforms are Scalr (disclaimer: I work there), RightScale and enStratus. Scalr is open-source and released under the Apache 2 license.
Regarding your specific question on AMIs, cloud management platforms usually provide pre-configured AMIs (at Scalr, we call them roles). If you want to create your own AMI built on an existing instance, you'll be able to create snapshots and use them as a template for future instances.