How to break down a Terraform state file by module

I am looking for guidance/advice on how to best break down a terraform state file into smaller state files.
We currently have one state file per environment and it has become unmanageable, so we are now looking to have a state file per Terraform module, which means separating out the current state file.
Would it be best to point each module at a new S3 bucket, then run a plan and apply for the broken-down modules to generate a fresh state file per module, or is there an easier or better way to achieve this?

This all depends on how your environment has been provisioned and how critical downtime is.
Below are the two general scenarios I can think of from your question.
First scenario (if you can take downtime)
Destroy everything you have and start from scratch, defining a separate backend for each module, and provision the infrastructure from that point on. You then get backend segregation, and infrastructure management becomes easier.
Second scenario (if you can't take downtime)
Let's say you are running mission-critical workloads that absolutely can't take any downtime.
In this case, you will have to come up with a proper plan for migrating the huge monolithic backend to smaller backends.
Terraform has a command, terraform state mv, which can help you migrate state from one state file to another.
When you work through this scenario, start with the lower-level environments and work up from there.
Note down any caveats you encounter during these migrations in the lower-level environments; the same caveats will apply in the higher-level environments as well.
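For the no-downtime scenario, the per-module migration step might look like the sketch below (the module name network, the paths, and the directory layout are placeholders for illustration):

```shell
# Keep a local copy of the monolithic state as the migration source
terraform state pull > monolith.tfstate

# Move one module's resources into a fresh state file for that module
terraform state mv \
  -state=monolith.tfstate \
  -state-out=network/terraform.tfstate \
  module.network module.network

# In the module's new directory: point it at its own backend and
# verify that the plan is empty (no changes expected)
cd network
terraform init
terraform plan
```

An empty plan after the move is the signal that the split preserved the state correctly; repeat per module, lower environments first.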
Some useful links
https://www.terraform.io/docs/cli/commands/state/mv.html
https://www.terraform.io/docs/cli/commands/init.html#backend-initialization

The other answer (as of now) lists only two options, but there is a third: simply make separate Terraform repos (or folders, however you are handling your infrastructure) and then run terraform import to bring the existing infrastructure into those repos.
Once all of the imports have proven to be successful, you can remove the original repo/source/etc. of the monolithic terraform state.
The caveat is that the code for each of the new state sources must match the existing code and state, otherwise this will fail.
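A sketch of that import flow for a single resource (the resource address aws_vpc.main and the ID vpc-0abc123 are hypothetical examples):

```shell
# In the new per-module repo, with .tf code that mirrors the live VPC
terraform init

# Bring the existing resource under this repo's fresh state
terraform import aws_vpc.main vpc-0abc123

# An empty plan proves the code and the imported state agree
terraform plan
```

If the plan shows changes, the new code does not yet match the real infrastructure, which is exactly the caveat described above.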

Related

Liquibase incremental snapshots

We've got a rather interesting use-case where we're using Liquibase to deploy a database for our application, but we're not actually in control of the database. This means we've got to add a lot of extra logic around each Liquibase run to avoid encountering errors during the actual run.
One way we've done that is by generating snapshots of what the DB should look like for each release of our product, and then comparing that snapshot with the running DB to confirm it's in a compatible state. The snapshot files for our complete database aren't gigantic, but if we have to keep a full one for every possible release, our software package could accumulate a lot of dead weight over time.
We've looked at using the Linux patch command to create offset files, since the deltas between these files will typically be very small (i.e. 1 column change, etc.), but the issue is that the generated IDs in the snapshot are not consistent across runs:
"snapshotId": "aefa109",
"table": "liquibase.structure.core.Table#aefa103"
Is there any way to force the IDs to be consistent or attack this problem in a different way?
Thanks!
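One way to sidestep the inconsistent IDs before diffing is to normalize them. Below is a sketch (not a Liquibase feature): it rewrites every generated ID, in order of first appearance, into a stable placeholder, assuming only the two ID shapes shown above.

```python
import re

def normalize_snapshot_ids(text):
    """Rewrite Liquibase snapshot IDs into stable sequential placeholders
    so two snapshots of the same schema diff/patch cleanly.

    Assumes the two ID shapes shown in the question: a "snapshotId"
    value, and a '#<id>' suffix on an object reference."""
    pattern = re.compile(r'(?:(?<="snapshotId": ")|(?<=#))([0-9a-f]{7,8})')
    mapping = {}

    def stable_id(match):
        raw = match.group(1)
        if raw not in mapping:
            # first-seen order yields the same placeholder across runs
            mapping[raw] = "id%04d" % len(mapping)
        return mapping[raw]

    return pattern.sub(stable_id, text)
```

Running both the stored snapshot and a freshly generated one through this before invoking diff/patch should leave only real schema changes as deltas.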
Perhaps we should change how we think about PROD deployments. When I read:
This means that we've got to add in a lot of extra logic around each time we run Liquibase to avoid encountering any errors during the actual run.
This is sort of an anti-pattern in the world of Liquibase. Typically, Liquibase is used in a CI/CD pipeline and deployments of SQL are done on "lower environments" to practice for the PROD deployment (which many do not have control over, so your situation is a common one).
When we try to accommodate possible errors during a PROD deployment, I feel we are already in a bad place with our deployment automation. We should have been testing the deploys on lower environments that look like PROD.
For example, your pipeline for your DB could look like:
DEV->QA->PROD
Create SQL for deployment in a changelog
DEV & QA seeded with a restore from the current state of PROD (maybe minus the row data)
You would have all control in DEV (the wild west)
Less control in QA (typically only by QA)
Iterate till you have no errors in your DEV & QA environments
Deploy to PROD
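The pipeline above might be driven with one properties file per environment (the file names are placeholders; `--defaults-file` is the Liquibase 4.x spelling of the option):

```shell
# Each *.properties file holds that environment's JDBC URL and credentials
liquibase --defaults-file=dev.liquibase.properties update    # DEV: the wild west
liquibase --defaults-file=qa.liquibase.properties update     # QA: PROD-like rehearsal
liquibase --defaults-file=prod.liquibase.properties update   # PROD: should be a non-event
```

The changelog is identical at every stage; only the connection properties differ, which is what makes the PROD run a rehearsal-tested non-event.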
If you still have errors, I would argue that you must root cause why and resolve so you can have a pipeline that is automatable.
Hope that helps,
Ronak

Can you make separate schedules for workflows on staging versus prod?

I would like to have certain workflows run on a different schedule in staging and in production (ex: one workflow run multiple times a day in staging and only once a day in production). Doing so would help with getting quicker feedback on the runs and also save on compute costs. How can I do this with flytekit? Is it recommended?
There is no easy way to do this, as it goes against the main integration/deployment strategy championed by Flyte.
Flyte entities, comprising tasks, workflows, and launch plans, are designed to be iterated on in a user's development domain. After iterating, users are expected to deploy that version to the staging domain and then to production. The more differences there are between those domains, the more confusion there will be down the road, we suspect.
That said, it is possible to do this because the registration step allows the user to specify a different configuration file. One of the entries in the configuration file is the workflow_packages construct. This basically enables the user to look at different folders when registering in staging vs. production, for instance.
To get a launch plan to exist in only one domain, you'll need to put it in a new folder/module that is inaccessible from any of the existing workflow packages, and then put the other domain's launch plan in yet another.
In the staging file,
[sdk]
workflow_packages=app.workflows,staging_lps
In the production file,
[sdk]
workflow_packages=app.workflows,production_lps

Updating Redis Click-To-Deploy Configuration on Compute Engine

I've deployed a single micro-instance redis on compute engine using the (very convenient) click-to-deploy feature.
I would now like to update this configuration to have a couple of instances, so that I can benchmark how this increases performance.
Is it possible to modify the config while it's running?
The other option would be to add a whole new redis deployment, bleed traffic onto that over time and eventually shut down the old one. Not only does this sound like a pain in the butt, but, I also can't see any way in the web UI to click-to-deploy multiple clusters.
I've got my learners license with all this, so would also appreciate any general 'good-to-knows'.
I'm on the Google Cloud team working on this feature and wanted to chime in. Sorry no one replied to this for so long.
We are working on some of the features you describe that would surely make the service more useful and powerful. Stay tuned on that.
I admit that there really is not a good solution for modifying an existing deployment to date, unless you launch a new cluster and migrate your data over / redirect reads and writes to the new cluster. This is a limitation we are working to fix.
As a workaround for creating two deployments using Click to Deploy with Redis, you could create a separate project.
Also, if you wanted to migrate to your own template using the Deployment Manager API https://cloud.google.com/deployment-manager/overview, keep in mind Deployment Manager does not have this limitation, and you can create multiple deployments from the same template in the same project.
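The multiple-deployments point might look like this with the gcloud CLI (deployment and config names are placeholders):

```shell
# Same template, two independent deployments in one project
gcloud deployment-manager deployments create redis-a --config redis-template.yaml
gcloud deployment-manager deployments create redis-b --config redis-template.yaml
```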
Chris

Rails "sub-environment" -- still production (or test, etc.) but different

How should we best handle code that is part of a single Rails app, but is used in several different "modes"?
We have several different cases of an app that is driven from the same data sources (MySQL, MongoDB, SOLR) and shares core logic, assets, etc. across multiple different uses.
Background/details:
HTML vs REST API
A common scenario is that we have HTML and REST interfaces. These differences are handled through routing (e.g. /api/v1/user/new vs /user/new) -- with minor differences they provide the same functions. This seems reasonably clean to me.
Multi-tenant
Another common scenario is that the app is "multi-tenant", determined mainly by the subdomain of the URL, e.g. partner1.example.com and partner2.example.com (or a query-string parameter for API customers) -- each has a number of features or properties that differ. This is handled by a filter in ApplicationController, using data largely stored in a set of tenant-specific database tables, with tenant-specific functionality encapsulated by methods. This also seems reasonably clean to me.
Offline Tasks
One scenario is that a great deal of the data is acquired through a very large number of tasks, running pretty much continuously: feed loaders, scrapers, crawlers, and other tasks of this sort ... the kinds of things you would find in a search engine, which is a large part of what we do. These tasks are launched on idle server instances and run periodically ... but are just rake tasks that are part of the app.
These tasks are characteristically different than our front-end code -- they update data, run calculations, do maintenance tasks and so on -- some tasks run for days (e.g. update 30M documents from an external web service). In the end, these tasks create and keep fresh the core data that our front end app uses.
This one doesn't seem as clean to me; in particular, in some cases these tasks are running and doing data updates at the same time as our application is using the data, so they occasionally need to defer to the front-end app when we're under peak loads.
Major Variants of the App
This last case is clearly wrong -- we have made major customizations of our app -- 15% or 20% different, by making branches and then running as an entirely separate app, sharing some of the core data sometimes, but using some of its own data other times. We have mostly fixed this now, as it was, of course, untenable.
OK, there's a question in here somewhere, right?
So in particular for the offline tasks I feel like the app really needs to be launched in a "mode" or perhaps "sub-environment". But we still have normal development, test, qa, demo, pre_release, production environments that have their own isolated data and other configuration parameters. For each of these, we want to be able to run, develop, test and deploy the various "modes" of the application.
Can anyone suggest an appropriate architecture that is similar to the declarative notions of standard Rails environments?
If the number of modes is ever-increasing:
Perhaps the offline tasks could be separated from the main app, into their own application (or a parent abstract task with actual tasks inheriting from it and deployed individually).
If the number of modes is relatively small and won't be changing often:
You could put the per-mode configuration into a config file, logically separate from the rest of the code. Then during the deployments, you would be able to provide a combination of (environment, mode, set of hosts) and get a good level of control of your environments while using the same codebase.
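The config-file approach might look like this (the file name, keys, and values are all hypothetical; the mode could be selected with something like an APP_MODE environment variable at deploy time):

```yaml
# config/modes.yml (hypothetical) -- orthogonal to the standard
# Rails environments, loaded once at boot
web:
  solr_timeout_seconds: 2
  defer_to_frontend: false
crawler:
  solr_timeout_seconds: 30
  defer_to_frontend: true
```

Each (environment, mode) pair then resolves to one environment file plus one modes.yml entry, so the matrix of deployments stays declarative without branching the codebase.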

Redis databases on a dev machine with multiple projects

How do you manage multiple projects on your development and/or testing machine, when some of those projects use Redis databases?
There are 2 major problems:
Redis doesn't have named databases (only numbers, 0-15 by default)
Tests are likely to execute FLUSHDB on each run
Right now, I think we have three options:
Assign different databases for each project, each dev and test environment
Prefix keys with a project name using something like redis-namespace
Nuke and seed the databases anytime you switch between projects
The first one is problematic if multiple projects assign "0" for the main use, "1" for test, and so on. Even if Project B decided to change to "2" and "3", another member of the project might then have a conflict with yet another project. In other words, that approach is not SCM friendly.
The second is a bad idea simply because it adds needless overhead in runtime performance and memory efficiency. And no matter what you do, another project might coincidentally already be using the same key by the time you join it.
The third option is rather a product of compromise, but sometimes I want to keep my local data untouched while I deploy small patches for other projects.
I know this could be a feature request for Redis, but I need a solution now.
Any ideas, practices?
If the projects are independent and so do not need to share data, it is much better to use multiple redis instances - each project configuration has a port number rather than a database name/id. Create an appropriately named config file and startup script for each one so that you can get whichever instance you need running with a single click.
Make sure you update the save settings in each config file as well as setting the ports - Multiple instances using the same dump.rdb file will work, but lead to some rather confusing bugs.
I also use separate instances for development and testing so that the test instance never writes anything to disk and can be flushed at the start of each test.
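A sketch of the per-project instance setup (ports, paths, and file names are placeholders):

```conf
# redis-projectA.conf -- dev instance for one project
port 6390
dbfilename projectA.rdb
dir /var/lib/redis/projectA

# redis-projectA-test.conf would pair with it on its own port, with
# persistence disabled so tests can FLUSHDB freely:
#   port 6391
#   save ""
```

Distinct `dir`/`dbfilename` settings are what prevent the confusing shared-dump.rdb bugs mentioned above.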
Redis is moving away from multiple databases, so I would recommend you start migrating out of that mechanism sooner rather than later. This means one instance per DB. Given the very low overhead of running Redis, this isn't a problem from a resources standpoint.
That said, you can specify the number of databases, and providing a naming standard would work. For example, configure Redis to have, say, 60 databases, and add 10 to get the test DB: db 3 uses db 13 for testing.
It sounds like your dev, test, and prod environments are pretty tied together. If so, I'd suggest moving away from that. Using separate instances is the easiest route and provides protection against contamination across projects. Between this and the future of Redis being single-DB per instance, separate instances are the best route.