What does HTTP/2 mean for a reverse proxy server?

How does HTTP/2 affect the implementation of a proxy server? In particular, when a client sends an HTTP/2 request to a content server that only supports HTTP/1.x, should the proxy server transform the HTTP/2 request into an HTTP/1.x request before directing it to the content server? And upon receiving the response from the content server, should the proxy server transform the response into HTTP/2 format before sending it back to the client?

As dsign discussed, your understanding is correct.
However, I thought it was worth pointing out that there are still massive advantages to HTTP/2 at your edge connection (i.e. your reverse proxy), as the issues HTTP/2 solves (primarily latency) are less of a problem over the typically shorter, typically high-bandwidth hop from the reverse proxy to the content server.
For example, if you have a 100ms delay to the reverse proxy at the edge and only a 1ms delay between the reverse proxy and the content server, then the fact that the content server speaks HTTP/1.1 to the proxy server probably won't have much impact on performance, thanks to the super-fast 1ms latency. So the end client (speaking HTTP/2 to the reverse proxy) still sees a massive performance boost over HTTP/1.1.
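To make this concrete, here is a minimal nginx sketch of that topology (hostname, ports and certificate paths are assumptions): the proxy terminates HTTP/2 on the slow edge connection and speaks plain HTTP/1.1 over the fast hop to the content server, doing exactly the conversion discussed above.

    server {
        listen 443 ssl http2;              # HTTP/2 toward the distant clients
        server_name www.example.com;       # hypothetical hostname

        ssl_certificate     /etc/nginx/tls/cert.pem;   # assumed paths
        ssl_certificate_key /etc/nginx/tls/key.pem;

        location / {
            # HTTP/1.1 over the ~1ms hop to the content server
            proxy_pass http://127.0.0.1:8080;
            proxy_http_version 1.1;
        }
    }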

Yes, it is exactly as you say. Conversion from HTTP/2 to HTTP/1.1 must happen in one direction, and conversion from HTTP/1.1 to HTTP/2 in the other.
In practice this means that although the HTTP/2 protocol itself doesn't need a traditional text-based parser, a comprehensive HTTP/2 server still needs an HTTP/1.1 parser, not only to serve clients that are HTTP/1.1-only (among them crawlers) but also to talk to the applications behind it.
In terms of usage, one of the most important backend application protocols is FastCGI. FastCGI likewise requires parsing HTTP/1.1-style responses from the application and converting them into HTTP/2 responses for the client.
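As an illustration of that last point, here is a sketch of an nginx front end that accepts HTTP/2 from clients while talking FastCGI to the application (the hostname, socket path and document root are assumptions):

    server {
        listen 443 ssl http2;              # HTTP/2 toward clients
        server_name app.example.com;       # hypothetical hostname

        ssl_certificate     /etc/nginx/tls/cert.pem;
        ssl_certificate_key /etc/nginx/tls/key.pem;

        location ~ \.php$ {
            include fastcgi_params;        # standard CGI-style variables
            fastcgi_param SCRIPT_FILENAME /var/www$fastcgi_script_name;
            # the server parses the application's HTTP/1.1-style response
            # and re-encodes it as HTTP/2 frames for the client
            fastcgi_pass unix:/run/php-fpm.sock;
        }
    }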

Related

Keep-Alive vs proxy-send-timeout vs proxy-read-timeout vs the Express server's default keep-alive request header in the Nginx ingress-nginx controller for Kubernetes

In a Kubernetes cluster I have an Nginx ingress controller that, I believe, is termed the "proxy" in front of the underlying Express server pod service.
As such, there are controls on the Nginx controller / load balancer that must be taken into account, such as the keep-alive timeout.
With SSE you have to keep the connection alive, and I have the keep-alive timeout set on my Express server to 30 minutes.
What I am not sure of is how the Nginx service works in coordination with the Express server, since they perform a similar function.
Does it even matter if I set the Express server's header timeout? Can I remove it and consolidate that functionality in the load balancer?
For the proxy-send and proxy-read timeouts, I am assuming the send side (described as a write) is what SSE uses. For a WebSocket I would assume both apply, and that a message sent by the client would fall under the proxy read timeout. I am not sure whether that is correct.
In that case, regarding proxy timeouts vs keep-alive on the Nginx controller: is the keep-alive setting even necessary? It seems to be a more granular, usage-based control rather than an overall one. In other words, can I just remove the keep-alive setting, since if my proxy timeouts are shorter, the keep-alive would in theory never be reached?
I have an illustration of the setup.
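For reference, the ingress-nginx timeouts in question are set per-Ingress via annotations; a minimal sketch (the resource names are hypothetical, and the 1800-second values are assumptions chosen to match the 30-minute SSE window):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: sse-app                        # hypothetical name
      annotations:
        # how long nginx waits between reads from the Express pod;
        # this is the timeout an idle SSE stream hits first
        nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
        # how long nginx waits while writing toward the Express pod
        nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    spec:
      rules:
        - http:
            paths:
              - path: /events
                pathType: Prefix
                backend:
                  service:
                    name: express-svc      # hypothetical service name
                    port:
                      number: 3000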

Is it normal for TCP requests to randomly "get lost in the internet"?

I created and manage a SOAP API built in ASP.NET ASMX. The API processes about 10,000 requests per day. Most days, about 3 requests sent by the client (we only have 1 client) do not reach the web server (IIS). There is no discernible pattern.
We are actually using 2 web servers that sit behind a load balancer. From the IIS logs, I am 100% confident that the requests are not reaching either web server.
The team that manages the network and the load balancer has not been able to 'confirm or deny' whether the problem is occurring at the load balancer. They suggested it's normal for requests to sometimes "get lost in the internet", and said that we should add retry logic to the API.
The requests are using TCP (and TLS). The client has confirmed that there is no problem occurring on their end.
My question is: is it normal for TCP requests to "get lost in the internet" at the frequency we are seeing (about 3 out of 10,000 per day)?
BTW, both the web server and the client are located in the same country. For what it's worth, the country in question is an anglosphere country, so it's not the case that our internet infrastructure is shoddy.
There is no such thing as a TCP request getting lost, since there is no such thing as a TCP request in the first place. There is a TCP connection, within it a TLS tunnel, and within that the HTTP protocol is spoken; only at this HTTP level is there a concept of request and response, which is what then becomes visible in the server logs.
Problems can occur in many places: failing to establish the TCP connection in the first place due to no route (i.e. no internet) or too much packet loss; random problems at the TLS level, such as bit flips that cause integrity errors and thus a connection close; and problems at the HTTP level, for example when HTTP keep-alive is in use and the server closes an idle connection at the same moment the client is trying to send another request on it. And there are probably more.
The client has confirmed that there is no problem occurring on their end.
I have no idea what exactly this means. 'No problem' would mean the client is sending the request and getting a response. But that is obviously not the case here, so the client is either failing to establish the TCP connection, failing at the TLS level, failing while sending the request, failing while reading the response, getting timeouts ... Or maybe the client is simply ignoring some errors, so no problem is visible at the client's end.
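One practical way to narrow down where requests die, assuming you can run tests from the client's network (the hostname and path are placeholders), is to exercise each layer separately:

    # does the TCP connection and TLS handshake even succeed?
    # (-brief needs OpenSSL 1.1.0+)
    openssl s_client -connect api.example.com:443 -brief </dev/null

    # verbose request with simple retries; on failure the output shows
    # whether it died at connect, TLS, send or receive time
    curl -v --retry 3 https://api.example.com/service.asmx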

Do we need to enable HTTP/2 on the Apache server if the CDN has HTTP/2 enabled?

We have a website that is behind a CDN, and HTTP/2 is enabled on the CDN.
Does it make any difference if we also enable HTTP/2 on the Apache server?
It depends on the CDN you are using and whether it can speak HTTP/2 back to the origin server. Cloudflare, for example, only uses HTTP/1.1 back to the origin: Can I use HTTP/2 between origin and cloudflare servers? (Apache)
Most of the benefit of HTTP/2 is from the client to the first point of contact (i.e. the CDN), as HTTP/2 helps most on higher-latency, lower-bandwidth connections. CDN-to-origin connections are likely to be lower latency and higher bandwidth, so they are likely to benefit less from HTTP/2 over HTTP/1.1.
Finally, using a CDN is one of the easiest ways to get HTTP/2 without having to make any changes on your side, so yes, it's perfectly acceptable to leave the origin connection on HTTP/1.1. Obviously that means you won't get the benefit the whole way through, and you may not be able to use features like server push (unless it is implemented at the CDN level), but it should still be a good improvement until HTTP/2 becomes more readily available in server distributions.
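If you later decide to enable HTTP/2 on the origin as well, it is a small change on Apache 2.4.17 or later (a sketch, assuming your build ships mod_http2; the module path varies by distribution):

    # load the HTTP/2 module
    LoadModule http2_module modules/mod_http2.so

    # prefer HTTP/2 over TLS (h2), falling back to HTTP/1.1
    Protocols h2 http/1.1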

Apache mod_proxy content-length vs chunked encoding

I have Apache configured to proxy for an (old, creaky, and basically HTTP 1.0) web server that does not send a Content-Length header, relying instead on closing the connection to mark the end of the data.
Under Apache 2.2, mod_proxy handled this by using Transfer-Encoding: chunked and delivering the data as fast as the remote server produced it. Under Apache 2.4, mod_proxy handles this by waiting for the entire response from the remote server, then delivering the page with a Content-Length. As the backend server can take 30+ seconds to gradually fill a page of results, the older behavior is preferable. There is no obvious configuration change that would have caused this; I've tried proxy-sendchunked, but it doesn't seem to help (as documented, it relates to data being uploaded by POST requests, which isn't the issue here).
Is this configurable and I've just missed it?
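For anyone comparing the two behaviours, a quick way to see which mode the proxy is in (the URL is a placeholder for the proxied slow page):

    # -N disables curl's buffering, -D - dumps the response headers;
    # look for Transfer-Encoding: chunked (streaming, the 2.2 behaviour)
    # versus Content-Length (fully buffered, the 2.4 behaviour)
    curl -sN -D - http://proxy.example.com/slow-page -o /dev/null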

apache to tomcat: mod_jk vs mod_proxy

What are the advantages and disadvantages of using mod_jk and mod_proxy for fronting a Tomcat instance with Apache?
I've been using mod_jk in production for years, but I've heard that it's "the old way" of fronting Tomcat. Should I consider changing? Would there be any benefits?
A pros/cons comparison of those modules exists at http://blog.jboss.org/
mod_proxy
* Pros:
  o No need for a separate module compilation and maintenance. mod_proxy, mod_proxy_http, mod_proxy_ajp and mod_proxy_balancer come as part of the standard Apache 2.2+ distribution.
  o Ability to use the HTTP, HTTPS or AJP protocols, even within the same balancer.
* Cons:
  o mod_proxy_ajp does not support AJP packet sizes larger than 8K.
  o Basic load balancer.
  o Does not support the domain model of clustering.
mod_jk
* Pros:
  o Advanced load balancer.
  o Advanced node failure detection.
  o Support for large AJP packet sizes.
* Cons:
  o Need to build and maintain a separate module.
If you wish to stay in Apache land, you can also try the newer mod_proxy_ajp, which uses the AJP protocol to communicate with Tomcat instead of plain old HTTP, but which leverages mod_proxy to do the work.
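A minimal sketch of that mod_proxy_ajp setup (assuming Tomcat's AJP connector listens on its default port 8009):

    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

    # forward /app to Tomcat over AJP instead of HTTP
    ProxyPass "/app" "ajp://localhost:8009/app"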
AJP vs HTTP
When using mod_jk, you are using AJP. When using mod_proxy, you will use HTTP or HTTPS. This is essentially what makes all the difference.
The Apache JServ Protocol (AJP)
The Apache JServ Protocol (AJP) is a binary protocol that can proxy inbound requests from a web server through to an application server that sits behind the web server. AJP is a highly trusted protocol and should never be exposed to untrusted clients, which could use it to gain access to sensitive information or execute code on the application server.
Pros
Easy to set up as the correct forwarding of HTTP headers is not required.
It is less resource-intensive because requests are forwarded in a compact binary format rather than through a costlier text-based HTTP exchange.
Cons
Transferred data is not encrypted. It should only be used within trusted networks.
Hypertext Transfer Protocol (HTTP)
HTTP functions as a request–response protocol in the client–server computing model. A web browser, for example, may be the client and an application running on a computer hosting a website may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
Pros
Can be encrypted with SSL/TLS making it suitable for traffic across untrusted networks.
It is flexible, as it allows the request to be modified before forwarding, for example to set custom headers (see the sketch after this list).
Cons
More overhead, as the correct forwarding of the HTTP headers has to be ensured.
More resource-intensive, as the request is fully parsed before forwarding.
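As a sketch of the HTTP option with a custom header, illustrating the flexibility point above (the path, port and header value are assumptions; RequestHeader needs mod_headers):

    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so
    LoadModule headers_module modules/mod_headers.so

    # modify the request before forwarding, then proxy over plain HTTP
    RequestHeader set X-Forwarded-Proto "https"
    ProxyPass        "/app" "http://localhost:8080/app"
    ProxyPassReverse "/app" "http://localhost:8080/app"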