Configuration Settings Service/Repository: Are They Used in a Real World - configuration-management

I’m currently enjoying reading "Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation" and the part that caught my attention is that on managing configuration settings for applications.
What's proposed in the book is that all configuration settings are externalized and centralized in a repository of some sort, be it an LDAP directory, ESCAPE Server or somesuch and then retrieved from there.
This sounds really compelling to me as this approach can provide a number of tangible benefits, but after Googling around for a bit it seems to me that this is not exactly a widespread approach.
I know there is a Twelve-Factor App article on this subject, but it suggests using environment variables instead of a centralized repository. This approach seems to be the most commonly used one, but it feels like a dirty one compared to a repository-based solution.
So, is the central-configuration-repository approach used in any significant manner in a real world, and if not -- what are the reasons for this?

Apparently, Zookeeper is frequently used for managing config variables:
http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_outOfTheBox
Real World Use of Zookeeper
Doozer looks interesting as well:
https://github.com/ha/doozerd
And here is RESTful wrapper over Git inspired by the same text you refer to:
http://www.andycaine.com/configuration-management-with-restful-git/
Finally, env vars may feel dirty but they are actually quite a clean approach
https://devcenter.heroku.com/articles/config-vars

Related

Confusion between mobx-state-tree and mobx-keystone. When to use which?

Both recommended in the official Mobx page if one wants an opinionated way to using mobx for state management.
Based on these(1,2), keystone seems like an improvement of state-tree. Having everything that state-tree has + more. Nowhere I could find anything that state-tree has that keystone doesn't.
I see keystone is nowhere nearly as mature as state-tree. That's probably the main point stopping me from picking it instead. What are other good points for state-tree over keystone?
P.S. It's going to be used in a React app.
I'm the current maintainer of MobX-State-Tree. I think the primary benefit of MST over MobX-Keystone is that MST is used more widely and has more broad third-party support. For example, mobx-devtools supports MST but not MobX-Keystone, as does Reactotron.
With all that said, I'm very interested in exploring MobX-Keystone for our own usage at my consulting company. Even though I'm maintaining MST, I'm not opposed to MobX-Keystone, and the much better TypeScript support is very tempting. If we end up using it in a project and it goes well, we will likely build in support to Reactotron for it.
I hope this perspective helps.
(With regard to the other answer asking if you really need more than just MobX, it's my opinion that MST and MobX-Keystone bring super useful patterns and tools that help you scale across a whole application in a more cohesive way than remaking them yourself using MobX.)
First ask yourself if you really need those libraries, because you can get really far with just mobx and good ol' OOP patterns. In the official docs you have an example of a store that does auto-saving and serializing.
Having said that, I would go for the mobx-keystone. Typescript works right out of the box, and you can use classes to construct your store, which is IMHO, easier that MST stores. Plus the author is very responsive and he is also a contributor to mobx library.
mobx-keystone is super awesome. After using both of the two libraries for a while I would recommend using it over mobx-state-tree in every possible occasions. It's much more intuitive and easer to learn. It saved a lot of time for my project (switching from zustand to jotai to mobx-keystone, IMO each next one is a bit better than previous one in my use case)
After extensive testing / reading the source code / playing with test suite it feels that mobx-keystone provides a more robust developer experience with Typescript and also has enough escape hatches if you need to work with performance sensitive code. I would implementing a small problem (but complex enough) problem with both libraries to judge for yourself.

SmalltalkHub versus SqueakSource3

With the next closing of SqueakSource repository, I wonder what is the advantage of SmalltalkHub over SqueakSource3. Is there some feature missing in SqueakSource3?
I have noticed that basic functionality like explore projects seems broken in SmalltalkHub, which is understandable because it is still in beta, but why to move or register projects to SmalltalkHub then?
Pharo's core development happens completely on SmalltalkHub.
It is much faster than SqueakSource3, which is the main point in favor of it. SmalltalkHub is still being extended and new features are on the way. For a typical use case from the image side there is little or no difference between the two.
SmalltalkHub is faster than SS3. It has a new look. You can manage team and collaborators with a modern interface. Nicolas Petton uses it daily for work and in addition it will continue to enhance it. All the Pharo core packages are hosted there. Finally, you can also easily install and use SmalltalkHub in your company if you need it. The software stack behind SmalltalkHub is really robust and nice. Several companies uses it for other projects.
From a risk-avoidance POV, there is something to be said for storing them in both. Their software stacks and feature sets are sufficiently different. Just make sure you identify one as primary. Copying the monticello files from one repository to the other is easily automated. Stef provided some scripts to copy whole projects from squeaksource to smalltalkhub recently, they would equally well work for canary or ss3.

Should I default the environment for someone using my library?

I have been having this debate with a friend where i have a library (its python but I didn't include that as a tag as the question is applicable to any language) that has a few dependencies. The debate is whether to provide a default environment in the initialization or force the user of the code to explicitly set one.
My opinion is to force the user as its explicit and will avoid confusion and make it clear what they are pointing to.
My friend this is safer and more convenient to default to an environment and let the user override if he wants to.
Thoughts ? Are there any good references or examples / patterns in popular libraries that support either of our arguments? also, any popular blogs or articles that discuss this API design point?
I don't have any references, but here are my thoughts as a potential user of said library.
I think it's good to have a default configuration available to allow developers to quickly evaluate the library. I don't want to have to go through a bunch of configuration just to see if the library will do what I need. Once I'm happy that the library will do what i need it to do, then I'm happy to configure it the way I want.
A good example is Microsoft's ASP.Net MVC framework. When you create a new MVC project it hooks in a default authentication and membership provider, which allows the developer to very quickly get a functioning application up and running. It is also easy to configure different providers to be used if the default one's don't meet the requirements of the application in question.
As a slightly different example, Atlassian Confluence is wiki software which supports many different back-end databases. Atlassian could have chosen to have no default DB configuration, but instead Confluence ships with a default, simple, file-based database to allow users to evaluate the software. For production installations you can then hook up to Oracle, SQL Server, mySQL or whatever else you like.
There may be instances where a default configuratino for a library doesn't really make sense, but I think that would be a special case, rather than a general rule.
It depends. If you can provide sensible defaults, you might want to do that: it will make life easer on the occasional user of the library as they can set only the relevant settings, as opposed to the whole environment (with possibly settings the implications of which they don't fully understand (yet)). You are correct, that in situations it is possible this leads to frustration and confusion as the defaulted settings might cause unexpected behavior (unexpected by the (inexperienced) user) -- you have to weigh the reduced frustration of convenience against the price of not-understood defaults to make the choice for each of these possible-to-default settings, which choice might affect the choice for other, related settings as well
On the other hand, if there is no sensible default (e.g. DB credentials, remote address), you should require the user to provide those settings.
The key in both cases is to provide enough information in the documentation of the library and in the error messages (either for missing settings or conflicting ones) that the user can figure out what those settings actually mean/control without having to read through the source code of the library. This part is hard because 1) it is usally tedious from the point of view of the library developer (so it is often skimped) and 2) the documentation has to be written from the mindset of a newbie to the library, which is often different from the library developer's mindset -- the latter knows the implicit connections/implications, the former has to be told about those in an understandable way.
Although not exactly identical in terms of problem domain, this strikes me as the Convention over Configuration argument.
There has been quite a lot momentum behind CoC in recent years, and in my mind, it makes a whole lot of sense. As long as flexibility is not lost, you have everything to gain. Lower friction development is what we are all after, and if I've got to configure every aspect of your API in order to get it working, I'm less inclined to use it over another API of equal functionality.
I happen to like Hanselman's podcasts, so if you want a little light listening, check out this podcast.
I think your question needs some clarification. For starters, I don't think a library should have any runtime configuration. In terms of dependencies, library dependencies should be handled in a manner appropriate to the environment they are being written for. In python, those dependencies should be in the setup.py file (under requirements), and ultimately that file should meet the requirements of whatever service you plan on making it available on (i.e. pypi for python).
For applications, it is completely okay to require runtime configuration, but you should try to have sensible defaults. If your application depends on libraries, that dependency should be handled in the same way a library dependency would be handled, even though that information may be redundant in the context of an installer (if needed). For the most part first-run scripts and their ilk should be apart of the installer/rpm.
For Web Frameworks, it is typical that your app would carry configuration with it, and likely that it would need to be installed in a different way than traditional applications. Here, about the only thing you can do is try to follow the conventions of whatever framework you are writing in.

HTTP requests and Apache modules: Creative attack vectors

Slightly unorthodox question here:
I'm currently trying to break an Apache with a handful of custom modules.
What spawned the testing is that Apache internally forwards requests that it considers too large (e.g. 1 MB trash) to modules hooked in appropriately, forcing them to deal with the garbage data - and lack of handling in the custom modules caused Apache in its entirety to go up in flames. Ouch, ouch, ouch.
That particular issue was fortunately fixed, but the question's arisen whether or not there may be other similar vulnerabilities.
Right now I have a tool at my disposal that lets me send a raw HTTP request to the server (or rather, raw data through an established TCP connection that could be interpreted as an HTTP request if it followed the form of one, e.g. "GET ...") and I'm trying to come up with other ideas. (TCP-level attacks like Slowloris and Nkiller2 are not my focus at the moment.)
Does anyone have a few nice ideas how to confuse the server's custom modules to the point of server-self-immolation?
Broken UTF-8? (Though I doubt Apache cares about encoding - I imagine it just juggles raw bytes.)
Stuff that is only barely too long, followed by a 0-byte, followed by junk?
et cetera
I don't consider myself a very good tester (I'm doing this by necessity and lack of manpower; I unfortunately don't even have a more than basic grasp of Apache internals that would help me along), which is why I'm hoping for an insightful response or two or three. Maybe some of you have done some similar testing for your own projects?
(If stackoverflow is not the right place for this question, I apologise. Not sure where else to put it.)
Apache is one of the most hardened software projects on the face of the planet. Finding a vulnerability in Apache's HTTPD would be no small feat and I recommend cutting your teeth on some easier prey. By comparison it is more common to see vulnerabilities in other HTTPDs such as this one in Nginx that I saw today (no joke). There have been other source code disclosure vulnerablites that are very similar, I would look at this and here is another. lhttpd has been abandoned on sf.net for almost a decade and there are known buffer overflows that affect it, which makes it a fun application to test.
When attacking a project you should look at what kind of vulnerabilities have been found in the past. Its likely that programmers will make the same mistakes again and again and often there are patterns that emerge. By following these patterns you can find more flaws. You should try searching vulnerablites databases such as Nist's search for CVEs. One thing that you will see is that apache modules are most commonly compromised.
A project like Apache has been heavily fuzzed. There are fuzzing frameworks such as Peach. Peach helps with fuzzing in many ways, one way it can help you is by giving you some nasty test data to work with. Fuzzing is not a very good approach for mature projects, if you go this route I would target apache modules with as few downloads as possible. (Warning projects with really low downloads might be broken or difficult to install.)
When a company is worried about secuirty often they pay a lot of money for an automated source analysis tool such as Coverity. The Department Of Homeland Security gave Coverity a ton of money to test open source projects and Apache is one of them. I can tell you first hand that I have found a buffer overflow with fuzzing that Coverity didn't pick up. Coverity and other source code analysis tools like the open source Rats will produce a lot of false positives and false negatives, but they do help narrow down the problems that affect a code base.
(When i first ran RATS on the Linux kernel I nearly fell out of my chair because my screen listed thousands of calls to strcpy() and strcat(), but when i dug into the code all of the calls where working with static text, which is safe.)
Vulnerability resarch an exploit development is a lot of fun. I recommend exploiting PHP/MySQL applications and exploring The Whitebox. This project is important because it shows that there are some real world vulnerabilities that cannot be found unless you read though the code line by line manually. It also has real world applications (a blog and a shop) that are very vulnerable to attack. In fact both of these applications where abandoned due to security problems. A web application fuzzer like Wapiti or acuentix will rape these applications and ones like it. There is a trick with the blog. A fresh install isn't vulnerable to much. You have to use the application a bit, try logging in as an admin, create a blog entry and then scan it. When testing a web application application for sql injection make sure that error reporting is turned on. In php you can set display_errors=On in your php.ini.
Good Luck!
Depending on what other modules you have hooked in, and what else activates them (or is it only too-large requests?), you might want to try some of the following:
Bad encodings - e.g. overlong utf-8 like you mentioned, there are scenarios where the modules depend on that, for example certain parameters.
parameter manipulation - again, depending on what the modules do, certain parameters may mess with them, either by changing values, removing expected parameters, or adding unexpected ones.
contrary to your other suggestion, I would look at data that is just barely short enough, i.e. one or two bytes shorter than the maximum, but in different combinations - different parameters, headers, request body, etc.
Look into HTTP Request Smuggling (also here and here) - bad request headers or invalid combinations, such as multiple Content-Length, or invalid terminators, might cause the module to misinterpret the command from Apache.
Also consider gzip, chunked encoding, etc. It is likely that the custom module implements the length check and the decoding, out of order.
What about partial request? e.g requests that cause a 100-Continue response, or range-requests?
The fuzzing tool, Peach, recommended by #TheRook, is also a good direction, but don't expect great ROI first time using it.
If you have access to source code, a focused security code review is a great idea. Or, even an automated code scan, with a tool like Coverity (as #TheRook mentioned), or a better one...
Even if you don't have source code access, consider a security penetration test, either by experienced consultant/pentester, or at least with an automated tool (there are many out there) - e.g. appscan, webinspect, netsparker, acunetix, etc etc.

Can everything be done programmatically in WCF or are configuration files for certain features?

I have a strong preference for working in code, leverage IntelliSense and opening up all of the power of the C# language to work with WCF but I want to make sure that I'm not moving in a direction that ultimately will limit the WCF feature set I can access. My experience is so limited with WCF that I don't understand the benefits of using the configuration files, especially if you can do everything in code (?).
Note: I'm using .NET 3.5.
Can you do 'everything' with WCF programmatically or are configuration files required for the full WCF feature set?
You can do about 99.8% of things in code as well as config.
Some things can be done only in code - like setting user name and password on a call that requires those two for authentication.
And there appear to be a few things that can be done in config only - see this other recent SO question for one example.
But I think, if you prefer code, you should be fine for the vast majority of cases.
Marc
An overgrown comment...
Marc_s' answer and the question's perspective is good (two +1s from me).
I have no doubt that the following will not be news to either of you, but wanted to point it out in case someone encounters this and isn't aware of the cons of a purely programmatic approach.
Moving to programmatic configuration from config-file based setup means
you lose the ability to adjust (read: hack!) things in the field -- your only avenue of recourse will be to recompile and redeploy binaries. For many scenarios (including one of mine) this is not n.
you lose the ability to switch between multiple sets of configurations by juggling them in the config file.
I admit that both of the cited 'losses' are debatable - they can encourage bad habits and prevent you from reaching the most solid solution for your customers in the quickest manner possible.
UPDATE: I've implemented a mechanism where I use ChannelFactory<T> but pick up a customised config from the app.config if it's present, or provide a default if it isn't (my scenario is that I'm a guest in someone else's process and hence can't assume a config fuile is easy to update / has been updated, yet dont want to lose the option of tweaking settings after deployment)