i am creating infra with terraform. I have decided to modularize it. But after modularizing and using terraform plan, i can see my Plan: 28 to add, 0 to change, 28 to destroy.
If i change the existing structure to modularize will terraform destroy all? Is there any way to not delete infra
Since you decided to split your infrastructure code in several modules, terraform will treat your resources as a new ones, because their location did change.
Moving resource blocks from one module into several child modules causes Terraform to see the new location as an entirely different resource.
Documentation: https://www.terraform.io/language/modules/syntax#transferring-resource-state-into-modules
There're several ways you can proceed now:
a. You can use refactoring feature of Terraform (available from version 1.1): https://www.terraform.io/language/modules/develop/refactoring and utilize moved block to map old resources to the new ones.
b. You can start with the clean terraform state and manually import resources from your actual infrastructure to the state (https://www.terraform.io/cli/import) (you need to do it for all 28 resources)
But if its your new project, the easiest way would be to just recreate resources from scratch (of course if it's not the production environment containing important data)
Related
With Azure DevOps Server 2019 RC it is possible to enable inherited process model on new collections (see release notes). Is there any way to use the inherited process model also for existing collections, where no customization on the process was made
Inherited process model is currently only supported for new collections created with Azure DevOps Server 2019 and not for existing collections.
See this Developer Community entry which asks for it.
I added a set of comments on how I hacked my way from an existing XML collection with a set of Projects to the Inherited type.
https://developercommunity.visualstudio.com/content/idea/614232/bring-inherited-process-to-existing-projects-for-a.html
Working as long as a vanilla workflow is applied to an existing XML collection before doing the voodoo thing.
Not exactly an answer for your question but we recently had the same task and I want to share how we handled this. We also wanted to move to the inherited model and we did not want to do any hacking. So we decided to create a new Collection on our Azure Devops Server 2020 with the inherited model and also migrate our tfvc repository to git.
Create the new Collection. Documentation
git-tfs to create a local repository from our tfvc repository and push it
azure-devops-migration-tools to copy all work items from the old collection to the new collection
In the old collection add the ReflectedWorkItemId for every WorkItem look here
In the new collection add the ReflectedWorkItemId for every WorkItem by using the process editor
Pro-Tip: create a full backup of the new collection to revert to this state easily. I had multiple try-error-restores.
You can't migrate shared steps or shared parameters like this, because you can't edit these work item types in the new collection. There is a workaround
We used the WorkItemTrackingProcessor to migrate all Epics/Features/Product Backlog Items/Bugs/Tasks/Test Cases. Then the same processor but with the mentioned workaround for Shared Steps and Shared Parameters.
This processor also migrates the Iterations and Area Paths
Finally we used the TestPlansAndSuitesMigration to migrate the Test Plans and Suites
For speeding up the migration, you can chunk the work items (for example by date or id) and start the migration multiple times.
Our build and release pipelines + task groups were migrated manually by import and export
We migrated the variable groups by using the API
The teams were created manually and we added the default area paths also by hand
Background
We know it's possible to setup a devops pipeline that deploys updates to AEM via a blue/green approach by using crx2oak to migrate the content from old to new environment. Why is out of scope of this question.
The problem with this approach is the content copy operation can take a significant time, as the amount of content in the JCR grows. Other ideas to mittigate this are appreciated.
We also know that AEM can have a S3 datastore that off-loads the binary content into a S3 bucket which would not be re-built during blue/green deployment as per:
https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/storage-elements-in-aem-6.html#OverviewofStorageinAEM6
What is unclear from Adobe's documentation is whether the same S3 bucket can be shared across AEM instances (i.e. blue/green instances). Maybe it's just my google fu that has failed...
Question(s)
When a new AEM instance is configured to use a S3 datastore that already has content in it from the old instance, when crx2oak is used to migrate content, will the new instance be able to access the existing content?
Are there any articles/blogs that describe what the potential time savings of this approach would be?
Yes I could do an experiment, and may do so in the future to answer my own question. I'm looking for information from anyone who has already done this? I'm an engineer so will not re-invent the wheel if someone else has done so.
You can certainly share the same S3 bucket between instances - in fact, this is commonly used along with binary-less replication from author->publisher(s) and is a tried and true configuration.
It's even possible to share the same bucket between completely different environments (e.g. DEV/STAGE, or BLUE/GREEN in your case). The main "gotcha" to be aware of is with regard to DataStore Garbage Collection (DSGC) because it's very possible that there will be blobs which are referenced by only some of the instances sharing the bucket and so when purging unused blobs this needs to be taken into account.
This is all part of the design though, and there is a flag designed specifically for this purpose which tells DSGC to only execute the first phase (the "mark" phase) of GC, and skip the 2nd "sweep" phase, until all instances have marked which blobs they wish to keep/discard. Once all instances have done so the sweep phase can be run to purge blobs not needed by any instances using the bucket.
For a more detailed explanation see the Oak docs:
https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html#Shared_DataStore_Blob_Garbage_Collection_Since_1.2.0
I find it helps to understand that pretty much all of the datastore implementations are done such that blobs are stored according to their checksum, so the same file added uploaded twice will only have one copy stored in the datastore, and there will be two segment store records referencing that same blob. In the same way, multiple AEM instances sharing the same bucket will be able to find a given blob regardless of which instance put it there in the first place.
You can observe see this in action easily with FileDataStore by finding a blob and sha256'ing it - e.g. (this example is on OS X, the checksum command on Linux/Windows will be slightly different):
$ shasum -a256 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
There you can see that a) the filename is the checksum, and b) it's nested using the first 3 pairs of characters from that checksum, so you can locate the file by just knowing the hash and if you store the same binary, even if the name or JCR metadata is different, the blob referenced will be the same literal file on disk.
From memory S3 datastore uses prefixes rather than directory nesting because this performance better, but the principle is the same.
Finally, a couple of things to consider are:
1) S3 storage is relatively cheap (and practically unlimited) so there is an argument to be made that it's not as necessary to perform regular DSGC unless you're really trying to pinch pennies.
2) If you do run DSGC you need to think about how this will work with whatever backup strategy you're using for the AEM instances. For instance, if you roll back a segment store shortly after running DSGC you'll likely have to recover some of those purged blobs. You can use versioning and/or lifecycle rules to help with this, but it can add significant additional complexity and time to your restore process.
If you opt to simply skip DSGC and leave the blobs there indefinitely it's a good idea to make sure the access key or IAM roles AEM is using doesn't have the DeleteObject permission for the bucket, just to be sure a rogue GC process can't delete anything.
Hope this helps.
Edit
In all that I forgot to actually answer your question - yes it will save some time in cloning in most cases. You'll still need to sync the segment store (obviously) and there are various approaches for this. crx2oak is certainly one - you'll see in the documentation there are specific options for using it w/ S3 where you supply a configuration file (basically a serialised .config file like you'd use with Felix/OSGi).
You can also use something like rsync to simply copy the TAR files over (while at least the target AEM is stopped. Oak is generally atomic so a hot copy from the source can work in theory, but YMMV).
Finally you could obviously use Mongo and cluster the segment store that way, but all the usual cost/complexity/performance issues with doing so apply).
Another interesting development on the horizon for blue/green type is the CompositeNodeStore - there is a good talk from the 2017 adaptTo() conference that talks about this:
https://adapt.to/2017/en/schedule/zero-downtime-deployments-for-the-sling-based-apps-using-docker.html
An external datastore will help a lot, as usually the most space is used by binary assets. The pure content typed in by real people is much less.
On my current project (quite small, but relations should be normal):
Repository 4,8 GB total (4.1 GB Segment Store, 780 MB Index)
File DataStore 222 GB total
If you wanna do it, I have the following remarks:
There are different datastores available. For testing I would start with the File DataStore.
The S3 DataStore makes only sense in my point of view, if you are hosting at Amazons AWS anyway. Adobe Managed Services is doing this, and so S3 makes sense for them. But also there only if you have more than 500 GB assets.
If you use the green/blue approach, then be careful the DataStore garbage collection (just do it manually). The shared Datastore is meant for several publishers, that have the same content. As example you could have the following situation: Your editors delete some assets, you run the DataStore GC and finally your rollback your environment. That means the assets are still in the content repository, but the binaries are cleaned out of the DataStore.
In order to to use a shared file datastore, you need to do the following:
Unpack Quickstart java -jar AEM_6.3_Quickstart.jar -unpack
Create an directory for the file datastore (anywhere outside of the crx-quickstart folder)
Create a directory install inside the extracted crx-quickstart folder
Create a file called org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg inside this install folder
This file contains just 1 line path=<path to file datastore> (see https://jackrabbit.apache.org/oak/docs/osgi_config.html)
Place a reference.key file inside the datastore directory. First time it will be created automatically. But if you use always the same key, the same hash-values are used all datastores across all your environments. This is also a prerequisite for a feature called "binary-less replication" (so binary would only be replicated the first time between author and publisher)
kind regards,
Alex
My app consists of two containers: the app itself and a database. I'm planning to wrap the app into a chart, thus paving a way for easy reproducible deployment.
Apart from setting/reading environment envs (which helm+kubernetes seems to handle really well), part of app's configuration is:
making sure the database is pre-filled with special auxiliary data (e.g. admin user exists, some user role names required to create new users are there, etc.).
I like the idea of having readable yaml files hold the entire configuration in a human readable format. However at a glance it doesn't seem that helm in any way would help with this (DB records) kind of configuration.
That being said, what is the best place to put code/configuration ensuring that DB contains certain auxiliary records? A config yaml file? An container init script, written in bash?
You are right, Kubernetes or Helm cannot help with preparing your pre-filled database records/schema.
You should probably have your application initialize those pre-filled data. If you don't want to put this logic into your application, you can ship an initialization script and configure an init container with Kubernetes.
Kubernetes makes sure every time your application container is restarted, the init container runs first. In the init container, you can execute a bash/python/... script that makes sure the records you want are there.
I am working on a legacy application currently incorporating Jackrabbit 2.6, which at some point used the jackrabbit versioning (I am not even sure if it was with this or another jackrabbit version). Currently the versioning is still present in the configuration and its corresponding DB tables (*_BINVAL, *_BUNDLE, *_NAMES, *_REFS) are still there.
I would like to have the versioning disabled and completely removed as it takes up space in our database and slows down the Jackrabbit garbage collection with an empty run over the versioning persistence manager. I cannot find any information though about how to proceed with it.
Is it safe to simply remove the <Versioning>...</Versioning> tag from the xml configuration and to drop the related tables? How should I proceed?
Unfortunately, versioning is mandatory. Therefore we needed to clean as much of the version information as possible. In my case it turned out that somehow the mix:versionable mixins disappeared (probably due to changes in the custom node types and OCM), leaving the version related properties behind. What I ended up doing:
Iterate over the whole repository deleting the version history for each node (either by removing the mixin or the versioning properties in my case), saving the session after every X of changed nodes.
Close the Jackrabbit repository and rename the versioning tables (*_BINVAL, *_BUNDLE, *_NAMES, *_REFS) in the database to hide them from Jackrabbit.
Start Jackrabbit again - the tables in the database have been recreated and besides three default nodes are empty
After confirming that the repository is intact, drop the the hidden tables.
The garbage collection has become faster - we went down from two weeks to 4 hours. The version history contained millions of entries, which were completely unnecessary.
I have a situation where I would like to ignore specific folders inside of where Flyway is looking for the migration files.
Example
/db/Migration
2.0-newBase.sql
/oldScripts
1.1-base.sql
1.2-foo.sql
I want to ignore everything inside of the 'oldScripts' sub folder. Is there a flag that I can set in Flyway configs like ignoreFolder=SOME_FOLDER or scanRecursive=false?
An example for why I would do this is say, I have 1000 scripts in my migration folder. If we onboard a new member, instead of having them run the migration on 1000 files, they could just run the one script (The new base) and proceed from there. The alternative would be to never sync those files in the first place, but then people would need to remember to check source control to prior migrations instead of just looking on their local drive.
This is not currently supported directly. You could put both directories at the same level in the hierarchy (without nesting them) and selectively configure flyway.locations to achieve the same thing.
Since Flyway 6.4.0 wildcards are supported in flyway.locations. Examples:
db/**/test
db/release1.*
db/release1.?
More info at https://flywaydb.org/blog/organising-your-migrations