Can everything be done programmatically in WCF, or are configuration files required for certain features? - wcf

I have a strong preference for working in code, leveraging IntelliSense and opening up all of the power of the C# language to work with WCF, but I want to make sure that I'm not moving in a direction that will ultimately limit the WCF feature set I can access. My experience with WCF is so limited that I don't understand the benefits of using the configuration files, especially if everything can be done in code (?).
Note: I'm using .NET 3.5.
Can you do 'everything' with WCF programmatically or are configuration files required for the full WCF feature set?

You can do about 99.8% of things in code as well as config.
Some things can be done only in code - like setting a user name and password on a call that requires them for authentication.
And there appear to be a few things that can be done in config only - see this other recent SO question for one example.
But I think, if you prefer code, you should be fine for the vast majority of cases.
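For illustration, here is a minimal sketch of configuring a client entirely in code, including the username/password piece mentioned above; the contract, address, binding settings and credentials are placeholders, not anything from the question:

```csharp
using System;
using System.ServiceModel;

// Placeholder contract for the sketch.
[ServiceContract]
public interface IMyService
{
    [OperationContract]
    string Ping(string text);
}

class Program
{
    static void Main()
    {
        // Everything that would normally live in <system.serviceModel> is set up in code.
        var binding = new WSHttpBinding(SecurityMode.Message);
        binding.Security.Message.ClientCredentialType = MessageCredentialType.UserName;

        var address = new EndpointAddress("http://localhost:8080/MyService");
        var factory = new ChannelFactory<IMyService>(binding, address);

        // This part is code-only: credentials cannot be supplied via config.
        factory.Credentials.UserName.UserName = "someUser";
        factory.Credentials.UserName.Password = "somePassword";

        IMyService proxy = factory.CreateChannel();
        Console.WriteLine(proxy.Ping("hello"));

        ((IClientChannel)proxy).Close();
        factory.Close();
    }
}
```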
Marc

An overgrown comment...
Marc_s's answer and the question's perspective are both good (two +1s from me).
I have no doubt that the following will not be news to either of you, but I wanted to point it out in case someone encounters this and isn't aware of the cons of a purely programmatic approach.
Moving to programmatic configuration from config-file based setup means
you lose the ability to adjust (read: hack!) things in the field -- your only avenue of recourse will be to recompile and redeploy binaries. For many scenarios (including one of mine) this is not an option.
you lose the ability to switch between multiple sets of configurations by juggling them in the config file.
I admit that both of the cited 'losses' are debatable - they can encourage bad habits and prevent you from reaching the most solid solution for your customers in the quickest manner possible.
UPDATE: I've implemented a mechanism where I use ChannelFactory<T> but pick up a customised config from the app.config if it's present, or provide a default if it isn't (my scenario is that I'm a guest in someone else's process and hence can't assume a config file is easy to update / has been updated, yet don't want to lose the option of tweaking settings after deployment).
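A rough sketch of that kind of fallback (the endpoint name, defaults, and the IMyService placeholder contract from the earlier sketch are illustrative, not the actual implementation):

```csharp
using System.Configuration;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Configuration;

public static class ServiceProxyFactory
{
    // Prefer a client endpoint named "MyService" from app.config when one exists,
    // otherwise fall back to a hard-coded programmatic default.
    public static ChannelFactory<IMyService> Create()
    {
        var clientSection = ConfigurationManager.GetSection("system.serviceModel/client")
            as ClientSection;

        if (clientSection != null)
        {
            foreach (ChannelEndpointElement endpoint in clientSection.Endpoints)
            {
                if (endpoint.Name == "MyService")
                {
                    // Address, binding and behaviors all come from the config file.
                    return new ChannelFactory<IMyService>("MyService");
                }
            }
        }

        // No usable config file (e.g. we're a guest in someone else's process):
        // fall back to sensible built-in defaults.
        Binding binding = new BasicHttpBinding();
        var address = new EndpointAddress("http://localhost:8080/MyService");
        return new ChannelFactory<IMyService>(binding, address);
    }
}
```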

Related

Configuration Settings Service/Repository: Are They Used in the Real World?

I'm currently enjoying reading "Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation", and the part that caught my attention is the one on managing configuration settings for applications.
What's proposed in the book is that all configuration settings are externalized and centralized in a repository of some sort, be it an LDAP directory, an ESCAPE server or some such, and then retrieved from there.
This sounds really compelling to me as this approach can provide a number of tangible benefits, but after Googling around for a bit it seems to me that this is not exactly a widespread approach.
I know there is a Twelve-Factor App article on this subject, but it suggests using environment variables instead of a centralized repository. This approach seems to be the most commonly used one, but it feels like a dirty one compared to a repository-based solution.
So, is the central-configuration-repository approach used in any significant manner in the real world, and if not -- what are the reasons for this?
Apparently, ZooKeeper is frequently used for managing config variables:
http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_outOfTheBox
Real World Use of Zookeeper
Doozer looks interesting as well:
https://github.com/ha/doozerd
And here is a RESTful wrapper over Git inspired by the same book you refer to:
http://www.andycaine.com/configuration-management-with-restful-git/
Finally, env vars may feel dirty, but they are actually quite a clean approach:
https://devcenter.heroku.com/articles/config-vars
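For what it's worth, the env-var approach boils down to something this small (the variable names and defaults are invented for illustration):

```csharp
using System;

// Read settings from environment variables and fall back to development defaults.
// APP_DB_URL and APP_HTTP_PORT are made-up names for illustration.
public static class AppConfig
{
    public static string DatabaseUrl
    {
        get
        {
            return Environment.GetEnvironmentVariable("APP_DB_URL")
                ?? "postgres://localhost/dev";
        }
    }

    public static int HttpPort
    {
        get
        {
            string raw = Environment.GetEnvironmentVariable("APP_HTTP_PORT");
            int port;
            return int.TryParse(raw, out port) ? port : 8080;
        }
    }
}
```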

Best ESB/message queue for AppHarbor

I'm currently trying to find the best message queue solution for an AppHarbor application. Most of the ones I've looked at assume you have a Windows environment with MSMQ and DTC installed, which I don't believe the AppHarbor environment provides.
I would like something that works well with RavenDB, as that is the database we are using. Something whose only dependency is RavenDB would be ideal, especially if it integrates with our existing unit of work, i.e. when SaveChanges is called in our controller action the messages are saved in the same transaction.
It would also need a host that works in a console application for background processing.
Ideally I would like something that "just works" in a development environment as well. With RavenDB, for example, we use the embedded mode while developing, and I would like something that doesn't require installation.
I've looked at NServiceBus, which seems to fail these conditions because it needs a transport (MSMQ, SQL, etc.) and much of the documentation is out of date.
I also looked at Rhino Service Bus, but there is a distinct lack of documentation and community. I'm also not sure if it can depend entirely on RavenDB.
The others I looked at all seemed quite heavyweight and required installation and configuration to run in a development environment.
Edit: the other option is to implement our own.
First of all, congratulations on being the 1000th NServiceBus question on StackOverflow!
Second, if you were to use SQL for persisting your business data, then you could run NServiceBus on top of that same SQL where all the messages go through tables (instead of queues) and then you wouldn't need the DTC.
Third, if you did want to go with RavenDB as your transport for NServiceBus, you would have to implement the ISendMessages and IReceiveMessages interfaces on top of it, but I believe that somebody in the community has already started working on that, so possibly you could join forces with them.
Finally, I wouldn't recommend writing your own ESB these days - not when there are so many good choices already out there. You mentioned the issues of community and documentation - those tend to be handled the worst when writing your own infrastructure.

Should I default the environment for someone using my library?

I have been having this debate with a friend where I have a library (it's Python, but I didn't include that as a tag, as the question is applicable to any language) that has a few dependencies. The debate is whether to provide a default environment in the initialization or force the user of the code to explicitly set one.
My opinion is to force the user, as it's explicit, avoids confusion, and makes it clear what they are pointing to.
My friend thinks it is safer and more convenient to default to an environment and let the user override it if he wants to.
Thoughts? Are there any good references or examples/patterns in popular libraries that support either of our arguments? Also, are there any popular blogs or articles that discuss this API design point?
I don't have any references, but here are my thoughts as a potential user of said library.
I think it's good to have a default configuration available to allow developers to quickly evaluate the library. I don't want to have to go through a bunch of configuration just to see if the library will do what I need. Once I'm happy that the library will do what I need it to do, then I'm happy to configure it the way I want.
A good example is Microsoft's ASP.NET MVC framework. When you create a new MVC project it hooks in a default authentication and membership provider, which allows the developer to very quickly get a functioning application up and running. It is also easy to configure different providers if the default ones don't meet the requirements of the application in question.
As a slightly different example, Atlassian Confluence is wiki software which supports many different back-end databases. Atlassian could have chosen to have no default DB configuration, but instead Confluence ships with a default, simple, file-based database to allow users to evaluate the software. For production installations you can then hook up to Oracle, SQL Server, MySQL or whatever else you like.
There may be instances where a default configuration for a library doesn't really make sense, but I think that would be a special case, rather than a general rule.
It depends. If you can provide sensible defaults, you might want to do that: it will make life easier on the occasional user of the library, as they can set only the relevant settings, as opposed to the whole environment (with possibly settings whose implications they don't fully understand yet). You are correct that in some situations this leads to frustration and confusion, as the defaulted settings might cause behavior that is unexpected by the (inexperienced) user. You have to weigh the reduced frustration of convenience against the price of not-understood defaults to make the choice for each of these possible-to-default settings, and that choice might affect the choice for other, related settings as well.
On the other hand, if there is no sensible default (e.g. DB credentials, remote address), you should require the user to provide those settings.
The key in both cases is to provide enough information in the documentation of the library and in the error messages (either for missing settings or conflicting ones) that the user can figure out what those settings actually mean/control without having to read through the source code of the library. This part is hard because 1) it is usually tedious from the point of view of the library developer (so it is often skimped on) and 2) the documentation has to be written from the mindset of a newbie to the library, which is often different from the library developer's mindset -- the latter knows the implicit connections/implications, the former has to be told about those in an understandable way.
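To make that concrete, here is a hypothetical client class (all names invented) that defaults the settings that have sensible defaults and requires the ones that don't:

```csharp
using System;

// Hypothetical library entry point: credentials and host have no sensible default,
// so they are required; timeout and retry count are defaulted but overridable.
public class ReportingClient
{
    private readonly string host;
    private readonly string apiKey;
    private readonly TimeSpan timeout;
    private readonly int retryCount;

    public ReportingClient(string host, string apiKey)
        : this(host, apiKey, TimeSpan.FromSeconds(30), 3)
    {
    }

    public ReportingClient(string host, string apiKey, TimeSpan timeout, int retryCount)
    {
        if (string.IsNullOrEmpty(host))
            throw new ArgumentException("host is required", "host");
        if (string.IsNullOrEmpty(apiKey))
            throw new ArgumentException("apiKey is required", "apiKey");

        this.host = host;
        this.apiKey = apiKey;
        this.timeout = timeout;
        this.retryCount = retryCount;
    }
}
```

The short constructor gets the occasional user going quickly; the longer overload keeps every defaulted setting overridable for everyone else.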
Although not exactly identical in terms of problem domain, this strikes me as the Convention over Configuration argument.
There has been quite a lot of momentum behind CoC in recent years, and in my mind, it makes a whole lot of sense. As long as flexibility is not lost, you have everything to gain. Lower-friction development is what we are all after, and if I've got to configure every aspect of your API in order to get it working, I'm less inclined to use it over another API of equal functionality.
I happen to like Hanselman's podcasts, so if you want a little light listening, check out this podcast.
I think your question needs some clarification. For starters, I don't think a library should have any runtime configuration. In terms of dependencies, library dependencies should be handled in a manner appropriate to the environment they are being written for. In Python, those dependencies should be in the setup.py file (under its requirements), and ultimately that file should meet the requirements of whatever service you plan on making it available on (i.e. PyPI for Python).
For applications, it is completely okay to require runtime configuration, but you should try to have sensible defaults. If your application depends on libraries, that dependency should be handled in the same way a library dependency would be handled, even though that information may be redundant in the context of an installer (if needed). For the most part, first-run scripts and their ilk should be a part of the installer/RPM.
For Web Frameworks, it is typical that your app would carry configuration with it, and likely that it would need to be installed in a different way than traditional applications. Here, about the only thing you can do is try to follow the conventions of whatever framework you are writing in.

Boilerplate for a WCF project, versioning anticipated

I'm starting to find myself getting more and more into using WCF for projects I implement for internal use (automating company tasks, making sure all clients are on the same page, etc.). This is largely due to the 3-10 clients I am automating at once whenever I implement a solution, and (even if it was a small sample) the company is growing, which continually adds more clients to the pool and thus a higher demand for reliability/consistency. With that said, I'm recognizing how important it is to make sure I make things expandable, as (previously) pushing a release was getting harder the more clients depended on the service.
My latest project has the potential of being externalized. Until now I've done it the way I know works, but I'd still like to travel down the "right" path in terms of future updates. How should I be setting up my project file to make this as easy and seamless as possible to keep maintained, up to date and expandable? Should I be placing version numbers into the namespace (as in Company.Interfaces.Contracts.June2011.IMyService), using pseudo folders, ...?
I just don't feel confident in this aspect of moving forward. I'd like to know that whatever groundwork I put in place now won't place burdens on future expansion/customization. I'd also like to stick to the "development norm" as much as possible, as it's getting more plausible that we'll hire additional programmers to help with the workload.
Does anyone with this kind of experience have any thoughts, suggestions, guidance in this field? I would really appreciate any examples, books, documentation, etc. that you can provide.
Update (06-17-2011)
To give some insight, I'm also looking for some specific questions. These include:
How do you decorate a service class vs. a DTO in terms of namespace? I've seen http://service.domain.com/ServiceName/Version used on the service class itself & http://types.domain.com/ServiceName/Version used on the DTOs. Is this common? (Separate the namespace into a type collection and a service collection?)
Should I be implementing IExtensibleDataObject on all my objects on the basis that they could potentially be evolved in future releases? (Lay the groundwork now.)
If my database has constraints on it for (e.g.) string length, I should be implementing IParameterInspector and using it for validation (keeping logic and validation separate), correct? (A sketch follows this list.)
Should the "actual service" be broken out into its own class so that, as I version, the service contract classes just call into that code (keeping each new version release as minimal as possible)? Or should I keep it within the service class and inherit from it for any new methods (and likewise, what happens should you remove a method?)
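On the IParameterInspector question above, here is a minimal sketch of what such an inspector could look like (the length limit and fault message are placeholders); it still has to be wired up via an operation or endpoint behavior, which is omitted here:

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Dispatcher;

// Sketch: enforce a maximum string length on incoming parameters before the
// operation runs, keeping validation out of the business logic itself.
public class StringLengthInspector : IParameterInspector
{
    private readonly int maxLength;

    public StringLengthInspector(int maxLength)
    {
        this.maxLength = maxLength;
    }

    public object BeforeCall(string operationName, object[] inputs)
    {
        foreach (object input in inputs)
        {
            string text = input as string;
            if (text != null && text.Length > maxLength)
            {
                throw new FaultException(
                    string.Format("Parameter exceeds {0} characters.", maxLength));
            }
        }
        return null; // no correlation state needed
    }

    public void AfterCall(string operationName, object[] outputs,
                          object returnValue, object correlationState)
    {
        // nothing to do after the call
    }
}
```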
I'm sorry if I have a lot of questions; I just see two ends of the spectrum in the documentation. I see "setting up WCF" and then it jumps directly to "this is a versioned WCF service" -- no segue/steps in between. I'm assuming it's going to just "click" once I get enough information, but I'm (sadly) not there yet.
tl;dr
When you start writing a WCF service that you know is going to hit several iterations, how do you set up your project(s) to make it as easy as possible in the future (on yourself and your teammates)?
I have had success using a "strict" versioning policy (it seems from past experience you are heading in this direction anyway) where you simply create a new endpoint (or endpoints) each time a new definition is released. This means you won't have any contract backwards-compatibility concerns for legacy clients - older versions can easily be turned off once logging indicates all clients have upgraded. It is generally necessary, however, to write bridging code for any legacy endpoints so they can continue to call into the modified business logic.
In terms of project organisation, I would create a new project for each version so they can easily be deployed separately. Namespaces using v1, v2 normally work well enough. The endpoint names can also include a version number, which should easily distinguish them from each other.
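For example, a strict-versioning layout might look roughly like this (the CLR namespaces, WSDL namespaces and addresses are illustrative only), with each version's contract in its own project and exposed on its own endpoint:

```csharp
using System.ServiceModel;

namespace Company.Contracts.V1
{
    [ServiceContract(Namespace = "http://service.domain.com/MyService/2011/06")]
    public interface IMyService
    {
        [OperationContract]
        string GetStatus(string id);
    }
}

namespace Company.Contracts.V2
{
    // The new version gets its own contract, WSDL namespace and endpoint; the old
    // endpoint stays up (bridged to the current business logic) until logging shows
    // all clients have moved over.
    [ServiceContract(Namespace = "http://service.domain.com/MyService/2012/01")]
    public interface IMyService
    {
        [OperationContract]
        string GetStatus(string id);

        [OperationContract]
        string GetStatusDetails(string id);
    }
}

// Hosting both versions side by side (addresses are placeholders):
// host.AddServiceEndpoint(typeof(Company.Contracts.V1.IMyService), binding, "http://localhost/MyService/v1");
// host.AddServiceEndpoint(typeof(Company.Contracts.V2.IMyService), binding, "http://localhost/MyService/v2");
```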
Alternatively, you could try using a "lax" versioning policy where you have the ability to add or remove data members by implementing the IExtensibleDataObject interface in all your services. Some useful MSDN article links can be found in a popular response to a similar question: WCF client's and versioning.
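A minimal example of the IExtensibleDataObject pattern on a data contract (the type and namespace are illustrative):

```csharp
using System.Runtime.Serialization;

// Members a newer (or older) version of the contract sends that this version doesn't
// know about are round-tripped through ExtensionData instead of being silently dropped.
[DataContract(Namespace = "http://types.domain.com/MyService/2011/06")]
public class CustomerDto : IExtensibleDataObject
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public string Email { get; set; }

    public ExtensionDataObject ExtensionData { get; set; }
}
```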
Another "lax" kind of option is to move more towards a messaging solution (which WCF can support through message contracts and/or the MSMQ binding). Here is a podcast by SOA guru Udi Dahan that provides an interesting perspective and is definitely worth a listen - there is no IDog2.
Finally here is a good blog post with some further more fine-grained guidelines on whichever strategy you end up using:
http://wcfpro.wordpress.com/2010/12/21/wcf-versioning-guidelines-2/.

HTTP requests and Apache modules: Creative attack vectors

Slightly unorthodox question here:
I'm currently trying to break an Apache with a handful of custom modules.
What spawned the testing is that Apache internally forwards requests that it considers too large (e.g. 1 MB trash) to modules hooked in appropriately, forcing them to deal with the garbage data - and lack of handling in the custom modules caused Apache in its entirety to go up in flames. Ouch, ouch, ouch.
That particular issue was fortunately fixed, but the question's arisen whether or not there may be other similar vulnerabilities.
Right now I have a tool at my disposal that lets me send a raw HTTP request to the server (or rather, raw data through an established TCP connection that could be interpreted as an HTTP request if it followed the form of one, e.g. "GET ...") and I'm trying to come up with other ideas. (TCP-level attacks like Slowloris and Nkiller2 are not my focus at the moment.)
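For reference, the kind of raw-request tool described above can be as simple as the following sketch, which here sends a deliberately malformed request (two conflicting Content-Length headers); the host, port and payload are placeholders, and this should only be pointed at servers you are authorized to test:

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Sketch: open a TCP connection and write raw bytes that only loosely resemble
// an HTTP request, e.g. with two conflicting Content-Length headers.
class RawRequestSender
{
    static void Main()
    {
        string payload =
            "POST /some/handler HTTP/1.1\r\n" +
            "Host: target.example\r\n" +
            "Content-Length: 5\r\n" +
            "Content-Length: 1048576\r\n" +   // conflicting length
            "\r\n" +
            "AAAAA";

        using (var client = new TcpClient("target.example", 80))
        using (var stream = client.GetStream())
        {
            byte[] bytes = Encoding.ASCII.GetBytes(payload);
            stream.Write(bytes, 0, bytes.Length);

            var buffer = new byte[4096];
            int read = stream.Read(buffer, 0, buffer.Length);
            Console.WriteLine(Encoding.ASCII.GetString(buffer, 0, read));
        }
    }
}
```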
Does anyone have a few nice ideas how to confuse the server's custom modules to the point of server-self-immolation?
Broken UTF-8? (Though I doubt Apache cares about encoding - I imagine it just juggles raw bytes.)
Stuff that is only barely too long, followed by a 0-byte, followed by junk?
et cetera
I don't consider myself a very good tester (I'm doing this out of necessity and lack of manpower; I unfortunately don't even have more than a basic grasp of Apache internals that would help me along), which is why I'm hoping for an insightful response or two or three. Maybe some of you have done similar testing for your own projects?
(If Stack Overflow is not the right place for this question, I apologise. Not sure where else to put it.)
Apache is one of the most hardened software projects on the face of the planet. Finding a vulnerability in Apache's HTTPD would be no small feat, and I recommend cutting your teeth on some easier prey. By comparison, it is more common to see vulnerabilities in other HTTPDs, such as this one in nginx that I saw today (no joke). There have been other source code disclosure vulnerabilities that are very similar; I would look at this one, and here is another. lhttpd has been abandoned on sf.net for almost a decade and there are known buffer overflows that affect it, which makes it a fun application to test.
When attacking a project you should look at what kind of vulnerabilities have been found in the past. It's likely that programmers will make the same mistakes again and again, and often there are patterns that emerge. By following these patterns you can find more flaws. You should try searching vulnerability databases such as NIST's search for CVEs. One thing that you will see is that Apache modules are most commonly compromised.
A project like Apache has been heavily fuzzed. There are fuzzing frameworks such as Peach. Peach helps with fuzzing in many ways; one way it can help you is by giving you some nasty test data to work with. Fuzzing is not a very good approach for mature projects; if you go this route I would target Apache modules with as few downloads as possible. (Warning: projects with really low download counts might be broken or difficult to install.)
When a company is worried about security they often pay a lot of money for an automated source analysis tool such as Coverity. The Department of Homeland Security gave Coverity a ton of money to test open source projects, and Apache is one of them. I can tell you first hand that I have found a buffer overflow with fuzzing that Coverity didn't pick up. Coverity and other source code analysis tools, like the open source RATS, will produce a lot of false positives and false negatives, but they do help narrow down the problems that affect a code base.
(When I first ran RATS on the Linux kernel I nearly fell out of my chair because my screen listed thousands of calls to strcpy() and strcat(), but when I dug into the code all of the calls were working with static text, which is safe.)
Vulnerability research and exploit development is a lot of fun. I recommend exploiting PHP/MySQL applications and exploring The Whitebox. This project is important because it shows that there are some real-world vulnerabilities that cannot be found unless you read through the code line by line manually. It also has real-world applications (a blog and a shop) that are very vulnerable to attack. In fact, both of these applications were abandoned due to security problems. A web application fuzzer like Wapiti or Acunetix will tear these applications, and ones like them, apart. There is a trick with the blog: a fresh install isn't vulnerable to much. You have to use the application a bit, try logging in as an admin, create a blog entry and then scan it. When testing a web application for SQL injection, make sure that error reporting is turned on. In PHP you can set display_errors=On in your php.ini.
Good Luck!
Depending on what other modules you have hooked in, and what else activates them (or is it only too-large requests?), you might want to try some of the following:
Bad encodings - e.g. overlong UTF-8 like you mentioned; there are scenarios where the modules depend on that, for example for certain parameters.
Parameter manipulation - again, depending on what the modules do, certain parameters may mess with them, either by changing values, removing expected parameters, or adding unexpected ones.
Contrary to your other suggestion, I would look at data that is just barely short enough, i.e. one or two bytes shorter than the maximum, but in different combinations - different parameters, headers, request body, etc.
Look into HTTP Request Smuggling (also here and here) - bad request headers or invalid combinations, such as multiple Content-Length headers or invalid terminators, might cause the module to misinterpret the command from Apache.
Also consider gzip, chunked encoding, etc. It is likely that the custom module implements the length check and the decoding out of order.
What about partial requests? E.g. requests that cause a 100-Continue response, or range requests?
The fuzzing tool Peach, recommended by @TheRook, is also a good direction, but don't expect great ROI the first time you use it.
If you have access to the source code, a focused security code review is a great idea. Or even an automated code scan, with a tool like Coverity (as @TheRook mentioned), or a better one...
Even if you don't have source code access, consider a security penetration test, either by an experienced consultant/pentester, or at least with an automated tool (there are many out there) - e.g. AppScan, WebInspect, Netsparker, Acunetix, etc.