Can the SVN and HTTP protocols be used safely on the same repository simultaneously? - apache

We would like to evaluate whether the SVN protocol works better for our team than HTTP, but we don't want to commit to a full switch just yet.
Right now we have an Apache server serving up our main repository. Can we safely use svnserve.exe with the same repository so that a few of our developers can test it? My initial guess is that we can, but we don't want to risk corrupting our repository.

Yes, it's possible. The official SVN book has a chapter devoted to this situation:
http://svnbook.red-bean.com/en/1.5/svn.serverconfig.multimethod.html. There are some pitfalls, but they mostly have to do with permission settings.
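The pitfall in practice is that httpd and svnserve usually run as different users, so files created in the repository by one process can end up unreadable or unwritable by the other. A minimal sketch of the usual fix (the user/group names and repository path here are placeholders; adjust for your system):

    # 1. Put both server users in a common group and make the repo group-writable:
    groupadd svn
    usermod -a -G svn www-data     # the user Apache (mod_dav_svn) runs as
    usermod -a -G svn svnuser      # the user that runs svnserve
    chgrp -R svn /var/svn/repos
    chmod -R g+rw /var/svn/repos
    # 2. Keep new files group-writable: launch svnserve (and httpd) through a
    #    small wrapper script that sets "umask 002" before exec'ing the real binary.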

Exactly. Subversion is designed to support concurrent access via multiple protocols, something that causes major problems with CVS. Not only can you use http:// and svn://, but also file:// (if you happen to be working locally on the machine, for example with a continuous integration tool or other post-commit hook), https://, svn+ssh://, etc.
In my experience, one method hasn't proven to be objectively "better" than the other, but each has certain benefits. For example, Apache is extremely adept at handling lots of accesses at once. On the other hand, if you're not already using Apache, or don't want to make it handle SVN traffic, the svnserve daemon is lightweight and quite performant. On my Macs, I set up svnserve using launchd to start up only when a request comes in, so it doesn't use any resources when there is no repository activity. What works best will largely be a factor of the access patterns you see in practice.
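For reference, that on-demand launchd setup looks roughly like this; the label and repository path are just examples, and the job uses svnserve's inetd mode so launchd only spawns it when a connection arrives:

    <!-- /Library/LaunchDaemons/org.example.svnserve.plist (hypothetical) -->
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>org.example.svnserve</string>
        <key>ProgramArguments</key>
        <array>
            <string>/usr/bin/svnserve</string>
            <string>--inetd</string>
            <string>--root=/var/svn/repos</string>
        </array>
        <key>inetdCompatibility</key>
        <dict>
            <key>Wait</key>
            <false/>
        </dict>
        <key>Sockets</key>
        <dict>
            <key>Listeners</key>
            <dict>
                <key>SockServiceName</key>
                <string>svn</string>
            </dict>
        </dict>
    </dict>
    </plist>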

Related

Do companies that provide APIs use a shim or proxy in front of their APIs?

I'm researching how large companies manage their public APIs. I'm thinking of companies with mature established APIs such as Google, Facebook, Twitter, and Amazon.
These companies have a number of different APIs that they expose to the public. Google, for example, has Plus, AdSense, AdWords etc. APIs that are publicly consumable. I'd like to understand if they use a cluster of reverse-proxy servers in front of those APIs to provide common functionality so that their specialist API servers don't need to implement that.
For example: Throttling and Authentication could be handled at this layer instead of implementing it in each API cluster.
The questions: Does anyone use a shim or reverse proxy in front of their APIs to handle common tasks? What are the use cases that make a reverse-proxy a good or bad idea for a cluster of API servers?
Most large companies explore a variety of things to handle the traffic and load on their servers. Roughly speaking:
A load balancer sits at the entry point, between the clients and the actual backend servers.
A reverse proxy often sits between these to handle static files, pre-computed/rendered views, and other largely static assets.
Anycast is used for DNS purposes, so that you are routed towards the nearest server that handles that URL.
Back pressure is employed in systems to limit the number of requests feeding through a single pipeline, so that services don't tip over.
Memcached, Redis and the like are used as short-term caches. That is, if it's going to be roughly the same result every 5 seconds, then that result can be cached in memory for faster delivery. Some proxies can be configured to read out of these (a rough configuration sketch follows this list).
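To make the throttling, caching and failover points concrete, here is a rough reverse-proxy sketch using nginx (one common choice; the addresses, rates and cache times are made up):

    # Hypothetical nginx front end for a pool of API servers
    limit_req_zone $binary_remote_addr zone=api_rl:10m rate=10r/s;   # per-client throttling
    proxy_cache_path /var/cache/nginx keys_zone=api_cache:10m;

    upstream api_backend {
        server 10.0.0.10:8080;
        server 10.0.0.11:8080;
        server 10.0.0.12:8080 backup;    # only used if the primaries fail
    }

    server {
        listen 80;
        location /api/ {
            limit_req zone=api_rl burst=20;
            proxy_cache api_cache;
            proxy_cache_valid 200 5s;    # the "same result every 5 seconds" case
            proxy_pass http://api_backend;
        }
    }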
If you're really interested, start reading some of the Netflix blog. Take a look at some of the open source they've used, like Hystrix or Zuul. You can also take a look at some of their videos. They make heavy use of proxies and have built in some very advanced distributed behavior.
As far as a reverse proxy being a good idea, think in terms of failure. If your service calls out to another API by a direct route and that service fails, then your service will fail and the failure will cascade up to the end user. On the other hand, if it's hitting a reverse proxy, then that proxy can be configured to, or can even automatically, detect failures and divert traffic to backup servers.
As far as a reverse proxy being a good idea, think in terms of load. Sometimes individual servers can only handle a fraction of the traffic, so the load must be shared across many servers. This is true not just of CPU-bound but also IO-bound resources (even if the return payload itself is not the cause of the IO cap).
Daisy chaining like this presents its own special little hell, but it's sometimes unavoidable. The main downside, and what makes it a really bad choice if you can avoid it at all, is the loss of deterministic behavior. Sometimes the stupidest things will bring your servers down. And by stupid, I mean really, really dumb stuff that you never thought in a million years might bite you in the butt (think server clocks out of sync). You have to start using rolling deploys of code, take down servers manually or forcefully if they stop responding, and keep those proxy configs in good order.
HTTP/1.1 support can also be an issue. Not all reverse proxies adhere to the spec; in fact, some of them only cover around 50% of it. HAProxy does not do SSL. If you're on limited hardware, a thread-based proxy can unexpectedly swamp the system with threads.
Finally, adding in a proxy is one more thing that will break (not can, will). You have to monitor it just like any other piece of the platform, aggregate its logs, and run mock drills on it too.

Using Akka to load balance HTTP SOAP request between multiple backend servers

I am working on a project which has the following requirements:
Perform sticky load balancing (based on SOAP session ID) onto multiple backend servers.
Possibility to plug in my own custom load balancer.
Easy to write and deploy.
A central configuration file (possibly XML) to take care of all the backend servers.
Easy extraction of a node from this configuration file (possibly with XPath).
I tried working with Camel for a while but wasn't able to perform certain tasks with it.
So I thought of giving Akka a try.
Will Akka be able to satisfy the above requirements?
If so, is there a load balancing example in Akka, or a proxy example?
I would really appreciate some feedback.
You can do everything you've described with Akka.
You don't mention what language you're working with, Scala or Java. I've included links to the Scala documentation.
Before you do anything with Akka you HAVE TO read the documentation and understand how Akka works.
http://doc.akka.io/docs/akka/2.0.3/
Doing so, you'll find Akka is perfect for the project you've described with some minor caveats.
Once you read the documentation the following answers should make a lot of sense.
Perform sticky load balancing (based on SOAP session ID) onto multiple backend servers.
Load balancing is already part of the framework (it's called Routing in Akka http://doc.akka.io/docs/akka/2.0.3/scala/routing.html) and Remoting (http://doc.akka.io/docs/akka/2.0.3/scala/remoting.html) will take care of the backend servers. You can easily combine the two.
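For instance, in Akka 2.0 a round-robin router whose routees live on remote backend nodes can be declared entirely in configuration, roughly like this (the actor path and node addresses are placeholders):

    akka.actor.deployment {
      /backendRouter {
        router = "round-robin"
        nr-of-instances = 6
        target {
          nodes = ["akka://backend@host1:2552", "akka://backend@host2:2552"]
        }
      }
    }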
To my knowledge, sticky load balancing is not built into Akka, but I can envision accomplishing it with a Map using the session ID as the key and the actor name (or path) as the value. A quick actorFor will take care of the rest. Not fully thought out, but it should give you a good idea of where to start (a rough sketch follows).
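Here is that idea as a rough Scala sketch; SoapRequest is a made-up message type, and failure handling is ignored entirely:

    import akka.actor.{Actor, ActorRef}

    // Hypothetical message carrying the SOAP session ID extracted upstream
    case class SoapRequest(sessionId: String, payload: String)

    // "Sticky" router: requests with the same session ID always go to the same
    // backend actor; new sessions are assigned round-robin.
    class StickyRouter(backends: Vector[ActorRef]) extends Actor {
      private var assignments = Map.empty[String, ActorRef]
      private var next = 0

      def receive = {
        case req @ SoapRequest(sessionId, _) =>
          val backend = assignments.getOrElse(sessionId, {
            val chosen = backends(next % backends.size)
            next += 1
            assignments += (sessionId -> chosen)
            chosen
          })
          backend forward req
      }
    }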
Possibility to plug in my own custom load balancer.
Refer to the Routing documentation.
Easy to write and deploy.
This depends on your aptitude and effort, but after you read certain parts of the documentation you should be able to build a proof of concept in a couple of hours.
Deployment can be a bit frustrating, mostly because the documentation isn't great with respect to deploying Akka networks with remote components. However, there are enough examples on the web that you can figure out how to get it done... eventually. Once you've done it once, it's no big deal.
A central configuration file (possibly XML) to take care of all the backend servers.
Akka uses Typesafe Config (https://github.com/typesafehub/config), which is a lot easier to work with than XML (but I hate XML, so take that with a grain of salt). As for a central configuration, I'm not sure exactly what you're trying to accomplish, but it sounds like something that can be solved using remote actor creation. Again, see the Remoting documentation.
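For example, a central list of backend servers in Typesafe Config might look like this (the key names are made up); you read it back with the ordinary Config API, e.g. ConfigFactory.load().getConfigList("backend.servers"):

    # application.conf (hypothetical keys)
    backend.servers = [
      { host = "backend1.example.com", port = 2552 },
      { host = "backend2.example.com", port = 2552 }
    ]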
Easy extraction of a node from this configuration file (possibly with XPath).
Akka provides a lookup method, actorFor. There's no need to go back to the configuration file once the system is up and running.
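For example (the actor path is hypothetical, and system is your ActorSystem):

    // Look up a remote backend by its actor path; no config lookup needed at runtime
    val backend = system.actorFor("akka://backend@host1:2552/user/service")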
If so, is there a load balancing example in Akka, or a proxy example?
Google is your friend.

Apache Camel equivalent in Rails

Is there an equivalent to Apache Camel in Rails?
I'm creating an application that needs to "listen" for messages from one source (for example: email via POP3) and send them to another source (for example: a logfile, or email via SMTP).
Any ideas?
I am not sure about a complete equivalent to Apache Camel, but to just listen for mails from a POP3 server and send them to another source, try the mailman gem.
EDIT: You should also look at the mailcatcher gem.
I am pretty sure there are no ports of Apache Camel to other languages, including Ruby (there was a recent question about .NET as well). However, you can use Apache Camel alongside your application: treat Camel as an independent daemon that you configure, conveniently, via XML. If you need some of your Ruby code to be invoked during processing, you can use Camel's JRuby (org.jruby:jruby) support. It may be less than ideal, but it works well. To interact with external systems, Camel already supports a large number of protocols (including the ones you mentioned), but you can also plug in your own.
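To illustrate the XML-configured daemon idea, a Camel route that polls a POP3 mailbox and forwards each message over SMTP looks roughly like this (hosts, users and passwords are placeholders):

    <camelContext xmlns="http://camel.apache.org/schema/spring">
      <route>
        <!-- poll the mailbox and hand each message to the SMTP endpoint -->
        <from uri="pop3://reader@pop.example.com?password=secret"/>
        <to uri="smtp://sender@smtp.example.com?password=secret&amp;to=archive@example.com"/>
      </route>
    </camelContext>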
Given Camel's support for many languages, protocols and data formats, I doubt anybody will go through the significant effort of porting it to other languages, but you never know.
You should definitely look at Llama.
It's at an early stage, but it seems they are going to build "an integration-framework on top of EventMachine that helps with tying together various backend services", which is what Camel is.

Tomcat test and production environment

What is the best design for having many environments for one web-app? Is it better to have multiple Tomcat instances, or multiple web-app instances deployed on one Tomcat server?
If one server can handle the load, I would say it's better to have just one Tomcat instance and deploy the web-app multiple times if necessary.
This way:
You'll have only one server to take care of (secure, administer, back up).
You share hardware resources among applications (RAM, disk, CPU).
The idea of deploying the same web-app several times in order to reduce administration burden is good.
But in my opinion, this isn't an acceptable solution: suppose you deploy a web-app twice, once for a TEST environment and a second time for a PRODUCTION environment. The web-app may encounter exceptions/errors (typically, memory-related issues) that can bring the whole Tomcat server down. In such a situation, problems encountered in one environment would make the other one unavailable.
Therefore, I would rather install as many Tomcat instances as there are environments.
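A common way to do that without duplicating the whole install is to share one CATALINA_HOME and give each environment its own CATALINA_BASE; the directory names and paths below are just examples:

    # One shared Tomcat install, one CATALINA_BASE per environment
    mkdir -p /srv/tomcat-test/{conf,logs,temp,webapps,work}
    cp /opt/tomcat/conf/* /srv/tomcat-test/conf/   # then change the ports in server.xml

    CATALINA_HOME=/opt/tomcat CATALINA_BASE=/srv/tomcat-test /opt/tomcat/bin/startup.sh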
Ideally, you should keep all production code in a completely separate environment, as much as possible, just to avoid mistakes and for security reasons.
Depending on your resources and team size: say, for example, you have an enclave for production containing your web server, database and mail server. This enclave should have rules disallowing any development resources from accessing production resources, and vice versa. If your dev resources are compromised, or you run a script against the wrong resource, there is a layer of protection.
Yes, this is all inconvenient, but it could save you from having big headaches in the long run.

Apache resource usage vs Mongoose or other lightweight web server

How much memory and/or other resources does Apache web server use?
How much more efficient are lightweight servers?
Say Apache vs. the Mongoose web server.
Neil Butterworth, you out there?
Thanks.
Yes, lightweight servers are more efficient with memory and resources, as the term 'lightweight' would indicate. nginx is a popular one.
Apache's memory and resource usage depends a lot on what you're doing with it - which modules are loaded, what your PHP etc. scripts are doing. There's no single answer.
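As a rough illustration, with the prefork MPM the resident footprint is approximately the per-process size (which grows with loaded modules and things like mod_php) multiplied by the number of worker processes, so settings like these (Apache 2.2-style directives, values made up) are what actually drive memory use:

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients          150      # upper bound on worker processes
        MaxRequestsPerChild 1000
    </IfModule>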
You have to take into account your specific task, and also the fact that almost every web server has some sort of specialization (a niche).
Apache is configurable and stable.
nginx is extremely fast, but on its own it only serves static content (dynamic content has to be proxied to a backend).
lighttpd is small, fast, and does both static and dynamic content.
Mongoose is embeddable, small and easy to use.
There are many more web servers; I won't go through the whole list here. You need to decide which features you require for your task, and make a choice accordingly.
Apache httpd is great if you need lots of flexibility, which is provided via various mods. If you're looking for straight-up file serving or proxying, then some lightweight options might be better. I manage the Maven Central repo, which gets millions of hits a day, and I have some experience with nginx.