How to build a development and production environment in apache nifi - apache

I have 2 apache nifi servers that are development and production hosted on AWS, currently the migration between development and production is done manually. I would like to know if it is possible to automate this process and ensure that people do not develop in production?
I thought about uploading the entire nifi in github and having it deploy the new nifi on the production server, but I don't know if that would be correct to do.

One option is to use NiFi registry, store the flows in the registry and share the registry between Development and Production environments. You can then promote the latest version of the flow from dev to prod.
As you say, another option is to potentially use Git to share the flow.xml.gz between environments and using a deploy script. The flow.xml.gz stores the data flow configuration/canvas. You can use parameterized flows (https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Parameters) to point NiFi at different external dev/prod services (eg. NiFi dev processor uses a dev database URL, NiFi prod points to prod database URL).
One more option is to export all or part of the NiFi flow as a template, and upload the template to your production NiFi, however registry is probably a better way of handling this. More info on templates here: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates.
I believe the original design plan behind NiFi was not necessarily to have different environments, and to allow live changes in production. I guess you would build your initial data flow using some test data in production and then once it's ready start the live data flow. But I think it's reasonable to want to have separate environments.

Related

Is it possible to share datasource testcontainer between quarkus apps in dev mode?

For kafka, redis and other testcontainer services there is a quarkus.*.devservices.shared configuration option (e.g. https://quarkus.io/guides/dev-services#quarkus-kafka-client-config-group-kafka-dev-services-build-time-config_quarkus.kafka.devservices.shared), which will reuse testcontainers of that type if there is already an existing one running.
Is there a way to achieve something similar with datasources/dbs?
Example:
I have two quarkus apps and I want to share a mysql db between them in dev mode. Setting up the tables is done with flyway.

We are using Airflow in dev environment. How to migrate changes from one environment to another environment?

Currently DAG code is in a github and can be migrated to other environments easily. But what is the best way to migrate variables and connections to another environments?
We just browsed and found that we can do it through CLI but that is not working. Is there any other way?
You could write a DAG that uses the API to read variables and connections from dev and creates or updates them on the other environments.
Airflow REST API Variables
Airflow REST API Connections
In order to use the API, you need to activate API authentication.

Google Cloud Manage Tomcat Service

Does google cloud or aws provide manage Apache tomcat which just take war file and do auto-scaling based on load increase and decrease ? not compute engine. I dont want to create VM. this should be manage by manage service.
Google App Engine can directly take and run a WAR file - just use the appcfg deployment method.
You will have more options if you package with docker, as this then provides an image type that can be run in many places (Multilpe GCP, AWS and Azure options, on-prem Kubernetes, etc). This can even be as simple as building a dockerfile that just copies the WAR into a jetty image:
FROM jetty:latest
COPY YOUR_WAR.war /var/lib/jetty/webapps
It might be better to explode the war though - see discussion in this question
AWS provide ** AWS Elastic Beanstalk **
The AWS Elastic Beanstalk Tomcat platform is a set of environment configurations for Java web applications that can run in a Tomcat web container. Each configuration corresponds to a major version of Tomcat, like Java 8 with Tomcat 8.
Platform-specific configuration options are available in the AWS Management Console for modifying the configuration of a running environment. To avoid losing your environment's configuration when you terminate it, you can use saved configurations to save your settings and later apply them to another environment.
To save settings in your source code, you can include configuration files. Settings in configuration files are applied every time you create an environment or deploy your application. You can also use configuration files to install packages, run scripts, and perform other instance customization operations during deployments.
It also provide autoscaling
The Auto Scaling group in your Elastic Beanstalk environment uses two Amazon CloudWatch alarms to trigger scaling operations. The default triggers scale when the average outbound network traffic from each instance is higher than 6 MB or lower than 2 MB over a period of five minutes. To use Amazon EC2 Auto Scaling effectively, configure triggers that are appropriate for your application, instance type, and service requirements. You can scale based on several statistics including latency, disk I/O, CPU utilization, and request count.

Prometheus target management

We are using prometheus in our production envirment recently. Before we only have 30-40 nodes for each service and those servers not change very often, so we just write it in the prometheus.yml, but right now it become too long to hold in one file and change much frequently then before, so my question is should i use file_sd_config to put those server list out of yml file and change those config files sepearately, or using consul for service discovery(same much easy to handle changes).
I have install 3 nodes consul cluster in data center and as i can see if i change to use consul to slove this problem , i also need to install consul client in each server(node) and define its services info. Is that correct? or does anyone have good advise.
Thanks
I totally advocate the use of a service discovery system. It may be a bit hard to deploy at first but surely it will worth it in the future.
That said, Prometheus comes with a lot of service discovery integrations. It's possible that you don't need a Consul cluster. If your servers are in a cloud provider like AWS, GCP, Azure, Openstack, etc, prometheus are able to autodiscover the instances.
If you keep running with Consul, the answer is yes, the agent must be running in every node. You can also register services and nodes via API but it's easier to deploy the agent.

Push code from VSTS repository to on-prem TFS?

this is my first post on here so forgive me if I've missed an existing answer to this question.
Basically my company conducts off-site development for various clients in government. Internally, we use cloud VSTS, Octopus deploy and Selenium to ensure a continuous delivery pipeline in our internal Azure environments. We are looking to extend this pipeline into the on-prem environments of our clients to cut down on unnecessary deployment overheads. Unfortunately, due to security policies we are unable to use our VSTS/Octopus instances to push code directly into the client environment, so I'm looking for a way to get code from our VSTS environment into an on-prem instance of TFS hosted on their end.
What I'm after, really, is a system whereby the client logs into our VSTS environment, validates the code, then pushes some kind of button which will pull it to their local TFS, where a replica of our automated build and test process will manage the CI pipeline through their environments and into prod.
Is this at all possible? What are my options here?
There is not a direct way to achieve migrating source code with history from VSTS to a on-premise TFS. You would need 3rd party tool, like Commercial Edition of OpsHub (note it is not free).
It sounds like you need a new feature that is comming to Octopus Deploy, see https://octopus.com/blog/roadmap-2017 --> Octopus Release Promotions
I quote:
Many customers work in environment where releases must flow between more than one Octopus server - the two most common scenarios being:
Agencies which use one Octopus for dev/test, but then need an Octopus server at each of their customer's sites to do production deployments
I will suggest the following. Though it contains small custom script.
Add build agent to your vsts which is physically located on customer's premises. This is easy, just register agent with online endpoint.
Create build definition in vsts that gets code from vsts. But instead of building commits it to local tfs. You will need a small powershell code here. You can add it as custom powershell step in build definition.
Local tfs orchestrates the rest.
Custom code:
Say your agent is on d:/agent
1. Keep local tfs mapped to some directory (say c:/tfs)
The script copies new sources over some code from d:/agent/work/ to c:/tfs
Commits from c:/tfs to local tfs.
Note:You will need /force option (and probably some more) to prevent conflicts.
I believe this not as ugly as it sounds.