Does proxy have to load everything completely first before sending them back? - apache

I've been using proxy services, and I want to know some details behind it, regarding its speed and efficiency. Consider following scenario:
There's a mp3 file on server M, a client wants to download that file, but he doesn't want to expose himself, so he decides to use a proxy website to download. The get mp3 request is therefore send to proxy server P first, then proxy server would get that mp3 for the client, here's my question about some details:
Does P have to download the entire mp3 file first before it can pass it to the client? If so, the file is downloaded twice (first on proxy server, then on client's machine) , taking about twice amount of time?

Proxies normally operate in two modes: HTTP and Connect.
The Connect mode is for blackbox protocols like HTTPS or ftp. Where most of the data is meaningless octet streams. Because they are encrypted or unstructured files.
However, for HTTP, proxies are pretty smart. One of the things that they do is caching stuff. Like images and web page contents when you are downloading a website in your browser via proxy.Moreover, for octet streams under HTTP, proxies show the connect behavior, meaning that they open a relay socket and let you download the content. In the meanwhile, they will store it locally, and if it doesn't exceed a certain size the file will be also cached.
The files are also forwarded, or relayed, or sometimes called rewrited. This here is a sample config file that shows squid configured to forward Youtube videos and not caching them.
Another reason why downloading and forwarding is not an option is doubling the round trip time (RTT). It is really counterintuitive when you add another RTT that slows down a HTTP session.

Related

HTTPS request served by Apache2 slower than via Reverse Proxy

My situation:
A website is hosted using a default apache2 installation on an ubuntu server.
Served on port 443 using HTTPS and a self-signed certificate (for developping).
Now I have a simple service written in golang that listens at port 8080 and acts as a Reverse Proxy to take https requests, forward them to apache locally and return the response back to the client. This webservice doesn't cache any files and only forwards requests.
Code: https://play.golang.org/p/tnfKVWyLuZQ
My "problem":
Calling apache directly, i.e. https://foo.com/bar/ is remarkably slower (200-400ms) than calling the website through my reverse proxy, i.e. https://foo.com:8080/bar/
Why is it slower to call apache2 directly? I expected to have overhead using a reverse proxy, not a speedup. -> Comparison for example page: https://i.imgur.com/TqznM2v.png
UPDATE: Sketch to show the current setup:
Current Setup
Regarding the encoding: The Encoding is consistent in both situations: Encoding header and Content-Length is in both cases (Situation 1 vs 2) the same, the client also receives the file size. Not sure why in the HAR Viewer it only displays the uncompressed size in the second case. If checking in Chrome I can see the compressed size in both case.
Update #2: I came to the conclusion that the golang implementation handles multiple requests from the same client in a short time more efficiently than apache2 in it's default configuration. Sicne I only test with few clients I can't say how well it scales - I imagine the webservice will fall behind when under load.
I see this as closed, thanks all for the help.
As far as i can see. There are two possible reasons.
The apache reverse proxy handled some cache contain static file like images, css or javascript.
When you browse a https url. A process named "ssl certificate uninstall" will happen, and it can cause huge server load. So if the web application and the ssl certificate are deployed on the same server, the load may cause high-latency. Generally, we use a special device named load-balancer to uninstall ssl certificate, Just like a reverse proxy.

Local HTTPS proxy possible?

TL;DR
I want to set up a local HTTPS proxy that can (LOCALLY) modify the content of HTML pages on my machine. Is this possible?
Motivation
I have used an HTTP Proxy called GlimmerBlocker for years. It started in 2008 as a proxy-based approach to blocking ads (as opposed to browser extensions or other OS X-specific hacks like InputManagers). But besides blocking ads, it also allows the user to inject their own CSS or JavaScript into the page. Development has seriously slowed, but it remains incredibly useful.
The only problem is that it doesn’t do HTTPS (from its FAQ):
Ads on https pages are not blocked
When Safari fetches an https page using a proxy, it doesn't really use the http protocol, but makes a tunneled tcp connection so Safari receives the encrypted bytes. The advantage is that any intermediate proxies can't modify or read the contents of the page, nor the URL. The disadvantage is, that GlimmerBlocker can't modify the content. Even if GlimmerBlocker tried to work as a middleman and decoded/encoded the content, it would have no means of telling Safari to trust it, nor to tell Safari if the websites certificate is valid, so Safari would think you have visited a dubious website.
Fortunately, most ad-providers are not going to switch to https as serving pages using https are much slower and would have a huge processing overhead on the ad-providers servers.
Back in 2008, maybe that last part was true…but not any more.
To be clear, I think the increasing use of SSL is a good thing. I just want to get back the control I had over the content after it arrives on my end.
Points of Confusion
While searching for a solution, I’ve become confused by some apparently contradictory points.
(Also, although I’m quite experienced with the languages of web pages, I’ve always had a difficult time grokking networks and protocols. On that note, sorry if I’m missing something that is way obvious!)
I found this StackOverflow question asking whether HTTPS proxies were possible. The best answer says that “TLS/SSL (The S in HTTPS) guarantees that there are no eavesdroppers between you and the server you are contacting, i.e. no proxies.” (The same answer then described a hack to pull it off, but I don’t understand the instructions. It was very theoretical, anyway.)
In OS X under Network Preferences ▶︎ Advanced… ▶︎ Proxies, there is clearly a setting for an HTTPS proxy. This seems to contradict the previous statement that TLS/SSL’s guarantee against eavesdropping implies the impossibility of proxies.
Other things of note
I can’t remember where, but I read that it is possible to set up an HTTPS proxy, but that it makes HTTPS pointless (by breaking the secure communication in the process). I don’t want this! Encryption is good. I don’t want to filter anyone else’s traffic; I just want something to customize the content after I’ve already received it.
GlimmerBlocker has a nice GUI interface, but I’m fine with non-GUI solutions, too. I may have a poor understanding of networking and protocols, but I’m perfectly comfortable on the command line, tweaking settings in text editors, and so on.
Is what I’m asking possible? Or is my question a case of “either you get security, or you can break it with hacks and get to customize your content—but not both”?
The common idea of a HTTP proxy is a server which accepts a CONNECT request which includes the target hostname and port and then just builds a tunnel to the target server. All the https is done inside the tunnel, so there is no way for the proxy to modify it (end-to-end security from browser to web server).
To modify the data you need to have a proxy which plays man-in-the-middle. In this case you have a https connection between the proxy and the web server and another https connection between the browser and the proxy. Between proxy and web server the original server certificate is used, while between browser and proxy a newly created certificate is used, which is signed by a CA specific to the proxy. Of course this CA must be imported as trusted into he browser, otherwise it would complain all the time about possible attacks.
Of course - all the verification of the original server certificate has to be done in the proxy now, and not all solutions do this the correct way. See also http://www.secureworks.com/cyber-threat-intelligence/threats/transitive-trust/
There are several proxy solution which might do this SSL interception, like squid, mitmproxy (python) or App::HTTP_Proxy_IMP (perl). The last two are specifically designed to let you modify the content with your own code, so these might be good places to start.

Should I run Tomcat by itself or Apache + Tomcat?

I was wondering if it would be okay to run Tomcat as both the web server and container? On the other hand, it seems that the right way to go about scaling your webapp is to use Apache HTTP listening on port 80 and connecting that to Tomcat listening on another port?
Are both ways acceptable? What is being used nowdays? Whats the prime difference? How do most major websites go about this?
Thanks.
Placing an Apache (or any other webserver) in front of your application server(s) (Tomcat) is a good thing for a number of reasons.
First consideration is about static resources and caching.
Tomcat will probably serve also a lot of static content, or even on dynamic content it will send some caching directives to browsers. However, each browser that hits your tomcat for the first time will cause tomcat to send the static file. Since processing a request is a bit more expensive in Tomcat than it is in Apache (because of Apache being super-optimized and exploiting very low level stuff not always available in Tomcat, because Tomcat extracting much more informations from the request than Apache needs etc...), it may be better for the static files to be server by Apache.
Since however configuring Apache to serve part of the content and Tomcat for the rest or the URL space is a daunting task, it is usually easier to have Tomcat serve everything with the right cache headers, and Apache in front of it capturing the content, serving it to the requiring browser, and caching it so that other browser hitting the same file will get served directly from Apache without even disturbing Tomcat.
Other than static files, also many dynamic stuff may not need to be updated every millisecond. For example, a json loaded by the homepage that tells the user how much stuff is in your database, is an expensive query performed thousands of times that can safely be performed each hour or so without making your users angry. So, tomcat may serve the json with proper one hour caching directive, Apache will cache the json fragment and serve it to any browser requiring it for one hour. There are obviously a ton of other ways to implement it (a caching filter, a JPA cache that caches the query etc...), but sending proper cache headers and using Apache as a reverse proxy is quite easy, REST compliant and scales well.
Another consideration is load balancing. Apache comes with a nice load balancing module, that can help you scale your application on a number of Tomcat instances, supposed that your application can scale horizontally or run on a cluster.
A third consideration is about ulrs, headers etc.. From time to time you may need to change some urls, or remove or override some headers. For example, before a major update you may want to disable caching on browsers for some hours to avoid browsers keep using stale data (same as lowering the DNS TTL before switching servers), or move the old application on another url space, or rewrite old URLs to new ones when possible. While reconfiguring the servlets inside your web.xml files is possible, and filters can do wonders, if you are using a framework that interprets the URLs you may need to do a lot of work on your sitemap files or similar stuff.
Having Apache or another web server in front of Tomcat may help a lot changing only Apache configuration files with modules like mod_rewrite.
So, I always recommend having Apache httpd in front of Tomcat. The small overhead on connection handling is usually recovered thanks to caching of resources, and the additional configuration works is regained the first time you need to move URLs or handle some headers.
It depends on your network and how you wish to have security set up.
If you have a two-firewall DMZ, with applications deployed inside the second firewall, it makes sense to have an Apache or IIS instance in between the two firewalls to handle security and proxy calls into the app server. If it's acceptable to put the Tomcat instance in the DMZ you're free to do so. The only downside that I see is that you'll have to open a port in the second firewall to access a database inside. That might put the database at risk.
Another consideration is traffic. You don't say anything about traffic, sizing servers, and possible load balancing and clustering. A load balancer in front of a cluster of app servers is more likely to be kept inside the second firewall. The Tomcat instance is capable of handling traffic on its own, but there are always volume limitations depending on the hardware it's deployed on and what the application is doing with each request. It's almost impossible to give a yes or no answer without more detailed, application-specific information.
Search the site for "tomcat without apache" - it's been asked before. I voted to close before finding duplicates.

HTTPS or SSH for server-server communication

I have two servers. I wish to send some data ( was doing it with HTTP GET till now ) to a php file residing on the server and get some output from it.
Of late, I saw the requests per second went up to 50 and Apache served HTTP 500 error for some of those. This server has 512 MB RAM and the script, in php-cli mode, usually eats up around 10 MB of memory.
I wish, if it were to reduce the load on server, to use SSH instead of HTTPS. Will it reduce the memory usage on this server (minus what the script itself needs)? Or will too many SSH connections still cause hindrance?
Note - I do not have HTTPS setup right now. But planning to switch over to it. And just then, this issue cropped up.
SSH will not speed up your program. What you can do, is to create your own server on the destination server (which will receive data). The web server do much more than just receive data, like interpret HTTP headers and route your requests to files. Your own server can do the same job in a much lighter way.
http://br.php.net/manual/en/sockets.examples.php has an example of how to do this.
I would use normal HTTP, but encrypt the data sent.
How big is the data you're wanting to transfer? Depending on the size SSH (in particular, SFTP) might very well be better. I say that because... well, when was the last time you tried to upload a 10MB file via a webpage and succeeded? Uploading small files works for HTTP but, at the end of the day, it's not a file transfer protocol.
My recommendation would be to use phpseclib, a pure PHP SFTP implementation. Upload a file via SFTP and then run a PHP script on that file via SSH.

disable request buffering in nginx

It seems that nginx buffers requests before passing it to the updstream server,while it is OK for most cases for me it is very bad :)
My case is like this:
I have nginx as a frontend server to proxy 3 different servers:
apache with a typical php app
shaveet(a open source comet server) built by me with python and gevent
a file upload server built again with gevent that proxies the uploads to rackspace cloudfiles
while accepting the upload from the client.
#3 is the problem, right now what I have is that nginx buffers all the request and then sends that to the file upload server which in turn sends it to cloudfiles instead of sending each chunk as it gets it (those making the upload faster as i can push 6-7MB/s to cloudfiles).
The reason I use nginx is to have 3 different domains with one IP if I can't do that I will have to move the fileupload server to another machine.
As soon as this [1] feature is implemented, Nginx is able to act as reverse proxy without buffering for uploads (bug client requests).
It should land in 1.7 which is the current mainline.
[1] http://trac.nginx.org/nginx/ticket/251
Update
This feature is available since 1.7.11 via the flag
proxy_request_buffering on | off;
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
According to Gunicorn, they suggest you use nginx to actually buffer clients and prevent slowloris attacks. So this buffering is likely a good thing. However, I do see an option further down on that link I provided where it talks about removing the proxy buffer, it's not clear if this is within nginx or not, but it looks as though it is. Of course this is under the assumption you have Gunicorn running, which you do not. Perhaps it's still useful to you.
EDIT: I did some research and that buffer disable in nginx is for outbound, long-polling data. Nginx states on their wiki site that inbound requests have to be buffered before being sent upstream.
"Note that when using the HTTP Proxy Module (or even when using FastCGI), the entire client request will be buffered in nginx before being passed on to the backend proxied servers. As a result, upload progress meters will not function correctly if they work by measuring the data received by the backend servers."
Now available in nginx since version nginx-1.7.11.
See documentation
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
To disable buffering the upload specify
proxy_request_buffering off;
I'd look into haproxy to fulfill this need.