Managing complex configurations

I would like to ask your opinion on best practices for managing large numbers of complex configuration files (for example XML, .properties, custom formats, etc.), since nowadays every non-trivial project consists of more of them than you can count.
How do you avoid getting lost in such a mess? How do you reuse them in the best way? Is there any good tool that can help (maybe something Eclipse-based)?

Obviously, the most fundamental thing to do is to put those configuration files into source control, as they really are a kind of source code.
But the real challenge with configuration management is deciding how many files to have and what to put where so that you can reuse common configurations. Those are design decisions and very project- and environment-specific, and no tool can make them for you.
A common approach is to have a master config file that contains default values, and environment-specific files that contain only those config values that differ for their environment and which override the defaults. The merging happens as part of an automated build process (which you really, really should have).
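As a rough illustration, here is a minimal sketch (in Python, with hypothetical file names like defaults.properties and prod.properties) of the kind of merge step such an automated build could perform:

```python
# Minimal sketch of a build-time merge step (file names are hypothetical).
# defaults.properties holds the master values; prod.properties holds only the
# values that differ for the "prod" environment and should override the defaults.

def read_properties(path):
    """Parse simple key=value lines, ignoring blanks and comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def merge(defaults_path, env_path, out_path):
    merged = read_properties(defaults_path)
    merged.update(read_properties(env_path))  # environment-specific values win
    with open(out_path, "w") as f:
        for key in sorted(merged):
            f.write(f"{key}={merged[key]}\n")

merge("defaults.properties", "prod.properties", "build/app.properties")
```

The same idea carries over to XML or custom formats; only the parsing and writing steps change.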

I'm not sure if this would be applicable to your situation, but I'd try to store them all in a database in a normalised fashion. Maybe you could write some import tools that grab the info from the files and add it to the db.
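If you go the database route, a sketch of what such an import tool could look like (assuming simple key=value files under a hypothetical conf/ directory and a made-up sqlite schema) might be:

```python
# Sketch of an import tool loading key=value config files into a normalised
# sqlite table (hypothetical schema: one row per file/key pair).
import sqlite3
from pathlib import Path

conn = sqlite3.connect("config.db")
conn.execute("""CREATE TABLE IF NOT EXISTS config_entry (
                    source_file TEXT, key TEXT, value TEXT,
                    PRIMARY KEY (source_file, key))""")

for path in Path("conf").glob("*.properties"):
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conn.execute("INSERT OR REPLACE INTO config_entry VALUES (?, ?, ?)",
                     (path.name, key.strip(), value.strip()))
conn.commit()
```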
Alternatively, reading this or this might help you.

What are the advantages of creating an Odoo module as opposed to forking it?

We are interested in using Odoo, but we would need to modify it slightly for our use case, for instance modifying the partner model by adding fields, and integrating with an external system.
Is it best to fork it or to make a module with the changes in? The changes would be quite specific to our use case and existing system so it's unlikely it would be useful to anyone else as a module/app.
My thinking is that by forking it would be easier to stay up to date with Odoo - we just have to pull in changes from upstream occasionally. It seems like with a module you would end up with lots of stale code that's difficult to update because you've moved it outside the source tree.
It also seems like it would be easier to deploy because you have all the code in one place rather than two.
From my point of view, and based on many years of ERP experience, the best advice is to always implement these kinds of changes in your own module (inheriting all the required standard components) and to leave the standard untouched. This applies to very specific customizations as well as to general improvements.
This procedure gives you the greatest flexibility in installing, updating, distributing and maintaining your code, while keeping it untouched when the standard modules are updated on the target systems.
You will also be able to share and move your code between dev/test/prod systems and keep it under version control.
Please always make sure you comply with the license obligations that apply to your code (especially when inheriting standard modules).
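For illustration, a minimal sketch of such an inheriting module (module, file and field names are hypothetical, and older Odoo versions import from openerp instead of odoo):

```python
# models/res_partner.py in a hypothetical custom module "my_customizations".
# The standard partner model is extended via inheritance, so Odoo's own
# source tree stays untouched.
from odoo import fields, models

class ResPartner(models.Model):
    _inherit = "res.partner"

    # Project-specific additions, e.g. for integrating with an external system.
    external_system_id = fields.Char(string="External System ID")
    is_key_account = fields.Boolean(string="Key Account", default=False)
```

The module's manifest would declare a dependency on the standard module it extends, so that updates to the standard code never touch your additions.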
Hope this helps ;-)

Automation & piping of diverse tasks

I am looking for recommendations for a very generic automation/task execution tool. The scope is somewhere between a script, a build system like make, and orchestration tools like Ansible or Puppet. The best I can do is describe my rather vague 'requirements' and hope for clues about how others have solved these problems. Sorry for the long description; I guess I don't really know what exactly I want the solution to do. I profit from programming answers on SO all the time, but I am not entirely sure if my open-ended question is acceptable here.
--
We work as data analysts/system validators in a corporate setting. We perform a range of diverse tasks and interact with lots of ever-changing systems. Each little step we do is arguably mundane/easy, but the bigger picture only emerges if lots of iterations with slightly different inputs or combinations are repeated. It is a bit like looking for a needle in a haystack, but the concrete problem is slightly different every time. This makes it hard to use a normal script or automation tool, which requires more structure to work. But doing things semi-manually without a big team does not allow us to cover all the analyses/cases we want/need.
To give an applied example: a typical task could involve setting up a big calculation in a vendor system, extracting its ASCII output from a web server and parsing it. Then we would pull raw input data from a set of configuration files and databases. This is piped into some of our home-grown replication tools/models living in C++. Then both the system's results and our replication are scanned for interesting outliers (e.g. regression tested), and only this subset is uploaded for human analysts to investigate, nicely presented in an Excel sheet.
We can do all these things easily by hand as a one-off, or maybe using ad-hoc tools/scripts. We just can't do it repeatedly for ever so slightly different settings. We seem to need a library of 'common tasks' that are specialized by just a few inputs (e.g. a task to download a time series and scan it for outliers; its parameters would be db access/login and maybe parameters defining what an outlier is in that context). And then I need to chain these tasks together to make complex tasks repeatable and simple to build up from atomic steps.
I have not found anything that really does something like this. There seem to be specialist scripts or tools for each niche, but not something that combines all the different tasks I need to perform.
I have so far been toying on and off with a minimalist sqlite database which controls a set of Python 'scripts'/wrappers. These scripts take their input parameters from the database, and they are chained/piped based on the database. The scripts write their results back to the database, mostly as plain text and floats/ints. This kind of db interface is very error-prone and complicated for humans; the idea is to have (template) scripts write (concrete/parametrised) scripts to the db for execution, like the system rolling itself out before executing. I am not sure if this is a smart idea, but the db drives the scripts, with little interaction among these building-block scripts, rather than the conventional bunch of scripts calling each other and dumping some data into the db as an afterthought. So far we have lots of separate wrappers (scripts) to talk to all the systems and do the work; what is really missing is something tying it all together and controlling it.
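For what it is worth, a minimal sketch of the db-driven dispatcher described above might look like the following (the table layout, script names and the JSON-parameter convention are all hypothetical):

```python
# Sketch of a sqlite-driven dispatcher: each row in "task" names a wrapper
# script, its parameters (as JSON), and, once run, its status and result.
import json
import sqlite3
import subprocess

conn = sqlite3.connect("tasks.db")
conn.execute("""CREATE TABLE IF NOT EXISTS task (
                    id INTEGER PRIMARY KEY, script TEXT, params TEXT,
                    status TEXT DEFAULT 'pending', result TEXT)""")

def run_pending():
    rows = conn.execute(
        "SELECT id, script, params FROM task WHERE status = 'pending'").fetchall()
    for task_id, script, params in rows:
        # Each building-block script receives its parameters as JSON in argv[1]
        # and prints its result to stdout; the dispatcher records both.
        proc = subprocess.run(["python", script, params],
                              capture_output=True, text=True)
        status = "done" if proc.returncode == 0 else "failed"
        conn.execute("UPDATE task SET status = ?, result = ? WHERE id = ?",
                     (status, proc.stdout, task_id))
        conn.commit()

# Example: queue one parametrised instance of a generic wrapper script.
conn.execute("INSERT INTO task (script, params) VALUES (?, ?)",
             ("download_timeseries.py",
              json.dumps({"db": "riskdb", "outlier_sigma": 3.0})))
conn.commit()
run_pending()
```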
I am (obviously) more interested in data/flow transparency, repeatability and chaining mini-programs into bigger units than in speed or scaling to larger data sets. All the heavy lifting is either done in the systems we interact with, or it is delegated to C++ called from these Python scripts. This is not a production system with stability requirements and fixed goals, but rather a flexible analysis/investigation helper.
I really hope someone here has previously run into exactly this problem, which is severely limiting our productivity, and that we can just piggyback off your solution or ideas.
I would suggest that you consider STAF (Software Test Automation Framework). It's open source, distributed, and cross-platform. It will run just about any task on just about any platform. It has a variety of plugin "Services" available for specific purposes, or you can create your own custom Service. You can also extend the functionality through scripting (Jython). It's also well documented and reasonably well supported by IBM through user forums.

Shorten ColdFusion namespaces for components

I am making an object-oriented app in ColdFusion, and so I have really broken down the code. As a result, I have really long namespaces for my components; for example:
folder1.folder2.plugin1.datatypes.Object
I seem to be repeating a lot of stuff, but at the same time, some of these things are acting like "modules". What I mean by this is that "folder2" in the example really contains, for lack of a better term, "stand-alone" components/applications (think of them like plugins). So, aside from calling other plugins' resources, they act on their own. But, due to the folder structure, I still have to refer to them all as folder1.folder2.... and so on.
So, let us assume that "folder1.folder2." could change on a whim. (This will not happen, but since "plugin1" defines a stand-alone component, it does not care what "folder1" or "folder2" contain, if they even exist.)
When I am writing code within the plugin, is there any way I can shorten the namespace string? Is there such a thing as "relative" namespacing, just like using relative href links?
Such a thing would save me a lot of time, but it would also help ensure these things are more stand-alone, as they would not be tied to their encapsulating folder structure...
You could use ColdFusion mappings, specifically per-application mappings in Application.cfc.
You do this in Application.cfc
<cfset this.mappings["/com"] = expandPath("folder1/folder2/plugin1") />
Then you could reference components as com.datatypes.Object.
I cannot recall when per-app mappings came about, but they have been around for a few releases.
Sounds like you may want to consider dependency injection, such as WireBox. This would allow you to have a single configuration file with the full paths and to use an alias to obtain your models. In fact, you can even have WireBox scan locations so you don't have to list every object you create.
WireBox was extracted from the amazing ColdBox framework. It is available independently of the ColdBox framework and should be somewhat simple to introduce into your application.
There is a helpful Google Group for ColdBox (and the related boxes), recorded ColdBox Connection meetings, and other types of training available for WireBox.
I cannot imagine building a sophisticated OO application without dependency injection. It is well worth the effort to learn and implement.

Can everything be done programmatically in WCF or are configuration files required for certain features?

I have a strong preference for working in code, leveraging IntelliSense and opening up all of the power of the C# language to work with WCF, but I want to make sure that I'm not moving in a direction that ultimately will limit the WCF feature set I can access. My experience with WCF is so limited that I don't understand the benefits of using the configuration files, especially if you can do everything in code (?).
Note: I'm using .NET 3.5.
Can you do 'everything' with WCF programmatically or are configuration files required for the full WCF feature set?
You can do about 99.8% of things in code as well as config.
Some things can be done only in code - like setting user name and password on a call that requires those two for authentication.
And there appear to be a few things that can be done in config only - see this other recent SO question for one example.
But I think, if you prefer code, you should be fine for the vast majority of cases.
Marc
An overgrown comment...
Marc_s' answer and the question's perspective are good (two +1s from me).
I have no doubt that the following will not be news to either of you, but wanted to point it out in case someone encounters this and isn't aware of the cons of a purely programmatic approach.
Moving to programmatic configuration from config-file based setup means
you lose the ability to adjust (read: hack!) things in the field -- your only avenue of recourse will be to recompile and redeploy binaries. For many scenarios (including one of mine) this is not an option.
you lose the ability to switch between multiple sets of configurations by juggling them in the config file.
I admit that both of the cited 'losses' are debatable - they can encourage bad habits and prevent you from reaching the most solid solution for your customers in the quickest manner possible.
UPDATE: I've implemented a mechanism where I use ChannelFactory<T> but pick up a customised config from the app.config if it's present, or provide a default if it isn't (my scenario is that I'm a guest in someone else's process and hence can't assume a config file is easy to update / has been updated, yet I don't want to lose the option of tweaking settings after deployment).

What is the best way to save my POJOs into Jackrabbit JCR?

In Jackrabbit I have experienced two ways to save my POJOs into repository nodes for storage in the Jackrabbit JCR:
writing my own layer
and
using Apache Graffito
Writing my own code has proven time-consuming and labor-intensive (I had to write and run a lot of ugly automated tests), though quite flexible.
Using Graffito has been a disappointment because it seems to be a "dead" project, stuck in 2006.
What are some better alternatives?
Another alternative is to completely skip an OCM framework and simply use javax.jcr.Node as a very flexible DAO itself. The fundamental reason why OCM frameworks exist is because with RDBMS you need a mapping from objects to the relational model. With JCR, which is already very object-oriented (node ~= object), this underlying reason is gone. What is left is that with DAOs you can restrict what your programmers can access in their code (incl. the help of autocompletion). But this approach does not really leverage the JCR concept, which means schema-free and flexible programming. Using the JCR API directly in your code is the best way to follow that concept.
Imagine you want to add a new property to an existing node/object later in the life of your application - with an OCM framework you have to modify it as well and make sure it still works properly. With direct access to nodes it is simply a single point of change. I know, this is a good way to get problems with typos in, e.g., property names; but this fear is not really backed by reality, since you will in most cases very quickly notice typos or non-matching names when you test your application. A good solution is to use string constants for the common node or property names, even as part of your APIs if you expose the JCR API across them. This still gives you the flexibility to quickly add new properties without having to adapt OCM layers.
For having some constraints on what is allowed or what is mandatory (i.e. a "semi-schema"), you can use node types and mixins (since JCR 2.0 you can also change the node type of existing content): thus you can handle this completely at the repository level and don't have to care about typing and constraints inside your application code - apart from catching the exceptions ;-)
But, of course, this choice depends on your requirements and personal preferences.
You might want to have a look at Jackrabbit OCM, which is alive and kicking. Of course, another way is to manually serialize/deserialize the POJOs; for that there are many different options. The question is whether you need a fixed schema to query the objects in JCR. If you just want to serialize into XML, then XStream is a very painless way to do so. If you need a more fixed schema, there is also Betwixt from Apache Commons.
It depends on your needs. When you directly use javax.jcr.Node, your code is heavily coupled to the underlying mechanism. In medium-sized and even some small projects, this is not a good idea. Obviously the question then becomes how to get from the Node to your own domain model. The problem is quite similar to going from a JDBC ResultSet to your own domain model. Mind you, the problem is similar from a technical point of view; from a functional point of view, there are huge differences between using JDBC and JCR.
Another deciding factor is whether you can impose a structure on your JCR content or not. Some application domains can (and still match better with JCR than JDBC); in other domains the content may be highly unstructured in nature. In such cases OCM is clearly overkill. I'd still advise writing your own wrapper layer around the javax.jcr.* classes.
There's also https://github.com/ilikeorangutans/omf, a very flexible object-to-JCR mapper. Unfortunately it doesn't have write support yet. However, we're successfully using this framework in a large CMS installation.
There is also the JCROM project at http://code.google.com/p/jcrom/. That project went dormant for a couple of years, but there have been a few new releases as of summer 2013.