How to convert CloudWatch dashboard source JSON to CDK

We've created a dashboard in CloudWatch and we want it initialized by CDK on every deployment, across all our environments. We noticed the View/edit source option lets you copy and paste the dashboard's JSON, and we wondered: is there a way to convert that View/edit source JSON into CDK objects or widgets so the dashboard is easier to maintain?

You can do this using the low-level L1 CfnDashboard construct. L1 constructs map 1 to 1 to CloudFormation resources, and since CloudFormation supports creating a dashboard from JSON, this can be done in CDK.
Simply provide your JSON string to the dashboardBody prop of CfnDashboard.
Keep in mind, though, that all the metric names and regions will be hardcoded, so if you need them to change based on the environment, you'll need to do that yourself.
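For example, a minimal sketch in TypeScript (the widget JSON, function name, and dashboard name below are placeholders; paste your own View/edit source JSON and substitute environment-specific values such as the region):

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export class DashboardStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // JSON copied from the CloudWatch console's "View/edit source" dialog.
    // The region is taken from the stack instead of being hardcoded.
    const dashboardBody = {
      widgets: [
        {
          type: 'metric',
          x: 0, y: 0, width: 12, height: 6,
          properties: {
            metrics: [['AWS/Lambda', 'Invocations', 'FunctionName', 'my-function']],
            region: this.region,
            title: 'Lambda invocations',
          },
        },
      ],
    };

    new cloudwatch.CfnDashboard(this, 'Dashboard', {
      dashboardName: `my-dashboard-${this.region}`,
      dashboardBody: JSON.stringify(dashboardBody),
    });
  }
}
```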
If your goal is ease of maintainability, though, I would strongly suggest converting your dashboard to CDK code using the higher-level Dashboard and widget constructs. This should be straightforward to do and will give you readability and ease of modification.
Reference: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_cloudwatch.CfnDashboard.html#dashboardbody

Related

Best Practice for deploying ADF pipelines

I am brand new to ADF and am creating my very first data factory. I am using the UI option (if anyone can point me to any documents on doing this in code instead, I'd be most grateful).
I will have 3 different environments - dev/test/prod. Each of these has slightly different configs (yes, I know!), so my datasets and linked services will need to change for each environment. What is the best way to do this? How would you approach it?
(p.s: We also have BitBucket and Jenkins/Octopus for CI/CD, so ideally would like to create scripts to automate this if possible.)
Thank you
You can create a data factory using code; the Azure Data Factory documentation covers this in detail.
There are two approaches to deploying ADF pipelines:
ARM template
Custom approach (JSON files, via the REST API) - with this approach we can fully automate the CI/CD process, because the collaboration branch becomes the source for deployment. This is why the approach is also known as (direct) deployment from code (JSON files); a rough sketch follows below.
Refer to this blog by Kamil Nowinski.
The scope of the question is broad, but this video by Mohamed Radwan shows in practice how to deploy and manage three different environments, i.e. ADF-DEV, ADF-PROD and ADF-UAT.
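As a hedged illustration of the JSON-based approach, the sketch below pushes one pipeline definition straight from the repository to a factory. It assumes the @azure/arm-datafactory and @azure/identity packages; the subscription, resource group, factory and pipeline names are placeholders:

```typescript
import { readFileSync } from 'fs';
import { DefaultAzureCredential } from '@azure/identity';
import { DataFactoryManagementClient } from '@azure/arm-datafactory';

// Placeholder identifiers - substitute your own per environment (dev/test/prod).
const subscriptionId = '<subscription-id>';
const resourceGroup = 'rg-adf-dev';
const factoryName = 'adf-dev';

async function deployPipeline(pipelineName: string, jsonPath: string): Promise<void> {
  const client = new DataFactoryManagementClient(new DefaultAzureCredential(), subscriptionId);

  // The JSON checked into the collaboration branch typically wraps the pipeline body
  // in a "properties" field; the management SDK expects just the body.
  const definition = JSON.parse(readFileSync(jsonPath, 'utf8'));
  await client.pipelines.createOrUpdate(
    resourceGroup,
    factoryName,
    pipelineName,
    definition.properties ?? definition
  );
  console.log(`Deployed ${pipelineName} to ${factoryName}`);
}

deployPipeline('CopySalesData', 'pipeline/CopySalesData.json').catch(console.error);
```

In a Jenkins or Octopus job you would loop something like this over every JSON file in the repository, once per environment.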

Can we automate ETL in Azure?

I am currently working on a very interesting ETL project using Azure, transforming my data manually. However, transforming data manually becomes exhausting and lengthy once I have several source files to process. My pipeline works fine for now because I only have a few files to transform, but what if I have thousands of Excel files?
What I want to achieve is to extend the project: extract the Excel files arriving by email using a Logic App, then apply ETL directly on top of them. Is there any way I can automate ETL in Azure? Can I do ETL without manually modifying the pipeline for each different type of data? How can I make my pipeline flexible enough to handle data transformation for various types of source data?
Thank you in advance for your help.
Can I do ETL without modifying the pipeline for a different type of data manually?
Based on your description, I assume you already know that the ADF connector is supported in Logic Apps. You can execute an ADF pipeline from a Logic App flow and even pass parameters into the pipeline.
Normally the source and sink services are fixed in a single Copy activity, but you can define a dynamic file path in the datasets, so you don't need to create multiple Copy activities.
If the data types are different, you can pass a parameter from the Logic App into ADF and then, before the data transfer, use a Switch activity to route execution into different branches.
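For illustration, a rough sketch of triggering such a parameterized pipeline (assuming the @azure/arm-datafactory and @azure/identity packages; the factory, pipeline and parameter names are placeholders - a Logic App's ADF action would pass the same parameters):

```typescript
import { DefaultAzureCredential } from '@azure/identity';
import { DataFactoryManagementClient } from '@azure/arm-datafactory';

const client = new DataFactoryManagementClient(new DefaultAzureCredential(), '<subscription-id>');

// Start the pipeline with parameters describing the incoming file; inside the pipeline,
// a Switch activity on @pipeline().parameters.fileType routes to the matching branch,
// and the datasets use the file name parameter to build a dynamic path.
async function runEtl(fileName: string, fileType: string): Promise<void> {
  const run = await client.pipelines.createRun('rg-etl', 'adf-etl', 'TransformIncomingFile', {
    parameters: { sourceFileName: fileName, fileType },
  });
  console.log(`Started pipeline run ${run.runId}`);
}

runEtl('sales-2023-01.xlsx', 'excel').catch(console.error);
```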

When AEM is configured to use an S3 data store, will it make blue/green deployments faster?

Background
We know it's possible to set up a DevOps pipeline that deploys updates to AEM via a blue/green approach by using crx2oak to migrate the content from the old to the new environment. The why is out of scope for this question.
The problem with this approach is that the content copy operation can take a significant amount of time as the amount of content in the JCR grows. Other ideas to mitigate this are appreciated.
We also know that AEM can have an S3 datastore that off-loads the binary content into an S3 bucket, which would not need to be rebuilt during a blue/green deployment, as per:
https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/storage-elements-in-aem-6.html#OverviewofStorageinAEM6
What is unclear from Adobe's documentation is whether the same S3 bucket can be shared across AEM instances (i.e. blue/green instances). Maybe it's just my google fu that has failed...
Question(s)
When a new AEM instance is configured to use an S3 datastore that already contains content from the old instance, and crx2oak is used to migrate the content, will the new instance be able to access the existing content?
Are there any articles/blogs that describe what the potential time savings of this approach would be?
Yes, I could do an experiment, and may do so in the future to answer my own question. For now I'm looking for information from anyone who has already done this. I'm an engineer, so I will not re-invent the wheel if someone else has done so.
You can certainly share the same S3 bucket between instances - in fact, this is commonly used along with binary-less replication from author->publisher(s) and is a tried and true configuration.
It's even possible to share the same bucket between completely different environments (e.g. DEV/STAGE, or BLUE/GREEN in your case). The main "gotcha" to be aware of is with regard to DataStore Garbage Collection (DSGC) because it's very possible that there will be blobs which are referenced by only some of the instances sharing the bucket and so when purging unused blobs this needs to be taken into account.
This is all part of the design though, and there is a flag designed specifically for this purpose which tells DSGC to only execute the first phase (the "mark" phase) of GC, and skip the 2nd "sweep" phase, until all instances have marked which blobs they wish to keep/discard. Once all instances have done so the sweep phase can be run to purge blobs not needed by any instances using the bucket.
For a more detailed explanation see the Oak docs:
https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html#Shared_DataStore_Blob_Garbage_Collection_Since_1.2.0
I find it helps to understand that pretty much all of the datastore implementations are done such that blobs are stored according to their checksum, so the same file uploaded twice will only have one copy stored in the datastore, and there will be two segment store records referencing that same blob. In the same way, multiple AEM instances sharing the same bucket will be able to find a given blob regardless of which instance put it there in the first place.
You can see this in action easily with FileDataStore by finding a blob and sha256'ing it - e.g. (this example is on OS X; the checksum command on Linux/Windows will be slightly different):
$ shasum -a256 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
There you can see that a) the filename is the checksum, and b) it's nested using the first 3 pairs of characters from that checksum, so you can locate the file by just knowing the hash and if you store the same binary, even if the name or JCR metadata is different, the blob referenced will be the same literal file on disk.
From memory, the S3 datastore uses prefixes rather than directory nesting because this performs better, but the principle is the same.
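The same content-addressing can be expressed as a small sketch (TypeScript, with placeholder paths), deriving where FileDataStore would place a given binary from its checksum:

```typescript
import { createHash } from 'crypto';
import { readFileSync } from 'fs';
import { join } from 'path';

// Compute the SHA-256 of a binary and derive the FileDataStore location:
// <datastore>/<chars 1-2>/<chars 3-4>/<chars 5-6>/<full checksum>
function dataStorePath(dataStoreRoot: string, file: string): string {
  const hash = createHash('sha256').update(readFileSync(file)).digest('hex');
  return join(dataStoreRoot, hash.slice(0, 2), hash.slice(2, 4), hash.slice(4, 6), hash);
}

// The same binary always maps to the same path, no matter which instance stored it.
console.log(dataStorePath('crx-quickstart/repository/datastore', 'asset.jpg'));
```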
Finally, a couple of things to consider are:
1) S3 storage is relatively cheap (and practically unlimited) so there is an argument to be made that it's not as necessary to perform regular DSGC unless you're really trying to pinch pennies.
2) If you do run DSGC you need to think about how this will work with whatever backup strategy you're using for the AEM instances. For instance, if you roll back a segment store shortly after running DSGC you'll likely have to recover some of those purged blobs. You can use versioning and/or lifecycle rules to help with this, but it can add significant additional complexity and time to your restore process.
If you opt to simply skip DSGC and leave the blobs there indefinitely it's a good idea to make sure the access key or IAM roles AEM is using doesn't have the DeleteObject permission for the bucket, just to be sure a rogue GC process can't delete anything.
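One way to do that (a hedged sketch; the account ID, role name and bucket name are placeholders) is a bucket policy statement that explicitly denies deletes from the role AEM uses:

```typescript
// Bucket policy statement denying object deletion for the AEM datastore role.
// All ARNs and names below are hypothetical - substitute your own.
const denyDatastoreDeletes = {
  Sid: 'DenyDatastoreDeletes',
  Effect: 'Deny',
  Principal: { AWS: 'arn:aws:iam::123456789012:role/aem-datastore-role' },
  Action: 's3:DeleteObject',
  Resource: 'arn:aws:s3:::my-aem-datastore/*',
};

// Print a complete policy document that can be attached to the bucket.
console.log(JSON.stringify({ Version: '2012-10-17', Statement: [denyDatastoreDeletes] }, null, 2));
```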
Hope this helps.
Edit
In all that I forgot to actually answer your question - yes it will save some time in cloning in most cases. You'll still need to sync the segment store (obviously) and there are various approaches for this. crx2oak is certainly one - you'll see in the documentation there are specific options for using it w/ S3 where you supply a configuration file (basically a serialised .config file like you'd use with Felix/OSGi).
You can also use something like rsync to simply copy the TAR files over (while at least the target AEM is stopped; Oak is generally atomic, so a hot copy from the source can work in theory, but YMMV).
Finally, you could obviously use Mongo and cluster the segment store that way, but all the usual cost/complexity/performance issues with doing so apply.
Another interesting development on the horizon for blue/green-type deployments is the CompositeNodeStore - there is a good talk from the 2017 adaptTo() conference that covers this:
https://adapt.to/2017/en/schedule/zero-downtime-deployments-for-the-sling-based-apps-using-docker.html
An external datastore will help a lot, as most of the space is usually taken up by binary assets; the pure content typed in by real people is much smaller.
On my current project (quite small, but the proportions should be typical):
Repository 4,8 GB total (4.1 GB Segment Store, 780 MB Index)
File DataStore 222 GB total
If you want to do this, I have the following remarks:
There are different datastores available. For testing I would start with the File DataStore.
In my view, the S3 DataStore only makes sense if you are hosting on Amazon AWS anyway. Adobe Managed Services does this, so S3 makes sense for them, but even then only if you have more than 500 GB of assets.
If you use the blue/green approach, be careful with DataStore garbage collection (just run it manually). The shared DataStore is meant for several publishers that have the same content. For example, you could end up in the following situation: your editors delete some assets, you run the DataStore GC, and then you roll back your environment. The assets are then still in the content repository, but their binaries have been cleaned out of the DataStore.
In order to use a shared file datastore, you need to do the following (a scripted sketch of these steps follows the list):
Unpack the Quickstart: java -jar AEM_6.3_Quickstart.jar -unpack
Create a directory for the file datastore (anywhere outside of the crx-quickstart folder)
Create a directory called install inside the extracted crx-quickstart folder
Create a file called org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg inside this install folder
This file contains just one line: path=<path to file datastore> (see https://jackrabbit.apache.org/oak/docs/osgi_config.html)
Place a reference.key file inside the datastore directory. The first time, it will be created automatically, but if you always use the same key, the same hash values are used across all datastores in all your environments. This is also a prerequisite for a feature called "binary-less replication" (so a binary is only replicated the first time between author and publisher).
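A minimal sketch of the directory and config steps as a Node/TypeScript script (all paths are placeholders; unpacking the Quickstart and managing reference.key are left manual):

```typescript
import { existsSync, mkdirSync, writeFileSync } from 'fs';
import { join } from 'path';

// Hypothetical locations - adjust to where you unpacked the Quickstart
// and where the shared datastore should live.
const crxQuickstart = '/opt/aem/crx-quickstart';
const dataStorePath = '/mnt/shared/aem-datastore';

// The shared datastore directory, outside crx-quickstart.
mkdirSync(dataStorePath, { recursive: true });

// The install folder and the FileDataStore config pointing at the shared directory.
const installDir = join(crxQuickstart, 'install');
mkdirSync(installDir, { recursive: true });
writeFileSync(
  join(installDir, 'org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg'),
  `path=${dataStorePath}\n`
);

// Reuse the same reference.key across environments; AEM generates one on first start.
if (!existsSync(join(dataStorePath, 'reference.key'))) {
  console.log('No reference.key yet - copy the one from your first environment here.');
}
```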
kind regards,
Alex

Is there a way to import backups in NiFi?

Using NiFi v0.6.1 is there a way to import backups/archives?
And by backups I mean the files that are generated when you call
POST /controller/archive using the REST api or "Controller Settings" (tool bar button) and then "Back-up flow" (link).
I tried unzipping the backup and importing it as a template, but that didn't work. After comparing it to an exported template file, the formats are quite different. Perhaps there is a way to transform it into a template?
At the moment my workaround is to not select any components on the top-level flow and then select "create template", which adds a template containing all of my components. Then I just export that. My issue with this is that it's a bit trickier to automate via the REST API. I used Fiddler to determine what the UI is doing: it first generates a snippet that includes all the components (labels, processors, connections, etc.), then it calls create template (POST /nifi-api/controller/templates) using the snippet ID. So the template call is easy enough, but generating the definition for the snippet is going to take some work.
Note: Once the following feature request is implemented I'm assuming I would just use that instead:
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
The entire flow for a NiFi instance is stored in a file called flow.xml.gz in the conf directory (flow.xml.tar in a cluster). The back-up functionality is essentially taking a snapshot of that file at the given point in time and saving it to the conf/archive directory. At a later point in time you could stop NiFi and replace conf/flow.xml.gz with one of those back-ups to restore the flow to that state.
Templates are a different format from the flow.xml.gz. Templates are more public facing and shareable, and can be used to represent portions of a flow, or the entire flow if no components are selected. Some people have used templates as a model to deploy their flows, essentially organizing their flow into process groups and making template for each group. This project provides some automation to work with templates: https://github.com/aperepel/nifi-api-deploy
You just need to stop NiFi, replace the NiFi flow configuration file (for example, flow.xml.gz in the conf directory) and start NiFi back up.
If you have trouble finding it, check your nifi.properties file for the string nifi.flow.configuration.file= to find out what you've set this to.
If you are using clustered mode you need only do this on the NCM.
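For example, a small restore sketch in TypeScript (the paths are assumptions based on the defaults above; check nifi.properties for your actual flow file location, and stop NiFi before running it):

```typescript
import { copyFileSync } from 'fs';
import { join } from 'path';

// Usage: ts-node restore-flow.ts <archived-flow-file-name>
// NIFI_HOME and the conf/archive layout are assumptions based on a default install.
const nifiHome = process.env.NIFI_HOME ?? '.';
const backupName = process.argv[2];
if (!backupName) {
  throw new Error('Pass the name of a file from conf/archive to restore');
}

// NiFi must be stopped before overwriting the flow, and started again afterwards.
copyFileSync(
  join(nifiHome, 'conf', 'archive', backupName),
  join(nifiHome, 'conf', 'flow.xml.gz')
);
console.log(`Restored ${backupName} over conf/flow.xml.gz`);
```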

How to add more data to the Jenkins REST API build record

To keep the question simple: I know that I can get some build information with https://jenkins_server/...///api/json|xml|python, and I get a lot of information for that build record.
However, I want to add more information to that build record - for example, the Docker image created, or the tickets or files changed since the last build (to create release notes), etc. How do I do that?
For now, I use a script to create a JSON file as an artifact and read that JSON file to get this information, but it seems duplicative when I could instead add the data to the Jenkins build object directly.
The Jenkins remote access API is designed to provide access to generic Jenkins-internal information, like build numbers, timestamps, fingerprints etc.
If you want to add your own data there, then you must extend Jenkins accordingly, e.g., by designing a plugin that advertises your (custom) information items as standard Jenkins-"internal" data. If you want to do that, you may want to have a look at the way fingerprint information is handled (I found that quite instructive).
However, I'd recommend that you stick with your current approach, and keep generic Jenkins-internal information separated from Job-specific data. It is less effort and clearly separates your own data from Jenkins' data.
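For the artifact approach, the consumer side can stay very simple. A rough sketch (the job name, build number and build-info.json artifact name are hypothetical; fetch requires Node 18+):

```typescript
// Fetch the standard build record plus the job's own archived JSON artifact.
const jenkins = 'https://jenkins.example.com';
const job = 'my-job';
const build = 42;

async function getBuildInfo(): Promise<void> {
  // Generic Jenkins-internal data: number, timestamp, result, changesets, ...
  const core = await fetch(`${jenkins}/job/${job}/${build}/api/json`).then(r => r.json());

  // Custom data the build published itself as an archived artifact.
  const custom = await fetch(`${jenkins}/job/${job}/${build}/artifact/build-info.json`).then(r => r.json());

  console.log(core.result, custom.dockerImage);
}

getBuildInfo().catch(console.error);
```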