Under what conditions are HTTP request headers removed by proxies? - api

I'm looking at various methods of RESTfully versioning APIs, and there are three major contenders. I believe I've all but settled on using X-API-Version. Putting that debate aside, one of the arguments against using that header, and custom headers in general, is that you can't control when headers are manipulated by proxy servers. I'm curious about what real-world examples there are of this, when it happens on the internet at large, or when it might be used on an intranet or server cluster, or when it might occur in any other situation.

The Guidelines for Web Content Transformation Proxies 1.0 is pretty much the definitive guide to understanding and predicting standards-compliant proxy server behavior. In terms of your question, the Proxy Forwarding of Request portion of the document might be especially helpful.
Each proxy software package and its individual configuration will vary, but HTTP proxies are generally expected to follow the W3C guidelines. Here are some highlights.
4.1 Proxy Forwarding of Request:
Other than to convert between HEAD and GET proxies must not alter request methods.
If the request contains a Cache-Control: no-transform directive, proxies must not alter the request other than to comply with transparent HTTP behavior defined in [RFC 2616 HTTP] sections 14.9.5 and 13.5.2 and to add header fields as described in 4.1.6 Additional HTTP Header Fields below.
4.1.3 Treatment of Requesters that are not Web browsers
Before altering aspects of HTTP requests and responses proxies need to take account of the fact that HTTP is used as a transport mechanism for many applications other than "Traditional Browsing". Increasingly browser based applications involve exchanges of data using XMLHttpRequest (see 4.2.8 Proxy Decision to Transform) and alteration of such exchanges is likely to cause misoperation.
4.1.5 Alteration of HTTP Header Field Values
Other than the modifications required by [RFC 2616 HTTP] proxies should not modify the values of header fields other than the User-Agent, Accept, Accept-Charset, Accept-Encoding, and Accept-Language header fields and must not delete header fields (see 4.1.5.5 Original Header Fields).
Other than to comply with transparent HTTP operation, proxies should not modify any request header fields unless one of the following applies:
the user would be prohibited from accessing content as a result of the server responding that the request is "unacceptable" (see 4.2.4 Server Rejection of HTTP Request);
the user has specifically requested a restructured desktop experience (see 4.1.5.3 User Selection of Restructured Experience);
the request is part of a sequence of requests comprising either included resources or linked resources on the same Web site (see 4.1.5.4 Sequence of Requests).
These circumstances are detailed in the following sections.
Note:
It is emphasized that requests must not be altered in the presence of Cache-Control: no-transform as described under 4.1.2 no-transform directive in Request.
The URI referred to in the request plays no part in determining whether or not to alter HTTP request header field values. In particular the patterns mentioned in 4.2.8 Proxy Decision to Transform are not material.
4.1.6 Additional HTTP Header Fields
Irrespective of the presence of a no-transform directive:
proxies should add the IP address of the initiator of the request to the end of a comma separated list in an X-Forwarded-For HTTP header field;
proxies must (in accordance with RFC 2616) include a Via HTTP header field (see 4.1.6.1 Proxy Treatment of Via Header Field).
There is also a lot of information regarding the alteration of response headers and how to detect those changes.
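To make this concrete, here is a minimal client-side sketch in C# (the endpoint is hypothetical, and X-API-Version is the custom header from your question) that sends Cache-Control: no-transform to ask compliant intermediaries not to alter the request:

using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical API host; X-API-Version is the custom versioning header under discussion.
using var client = new HttpClient { BaseAddress = new Uri("https://api.example.com/") };
var request = new HttpRequestMessage(HttpMethod.Get, "orders/42");

// Ask compliant proxies not to transform the request (or the response).
request.Headers.CacheControl = new CacheControlHeaderValue { NoTransform = true };
request.Headers.Add("X-API-Version", "2");

var response = await client.SendAsync(request);
Console.WriteLine(response.StatusCode);

Per the guidelines quoted above, a proxy may still append X-Forwarded-For and Via even when no-transform is present, but it should not remove or rewrite your custom header.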
As for web service REST API versioning, there is a very lucid and useful SO thread at Best practices for API versioning? that should provide a wealth of helpful insight.
I hope all of this helps. Take care.

This isn't an answer per se, but rather a mention of a real-world scenario.
My current environment uses a mixed CAS/AD solution in order to allow SSO across several different platforms (classic ASP, ASP.NET, J2EE, you name it).
Recently we identified some issues: part of the solution involves aggregating auth tokens into HTTP headers whenever necessary to propagate credentials. One specific solution, making considerably heavy use of cookies, was chained with an nginx implementation whose HTTP header limit was set to 4 KiB. If the cookie payload went over 2 KiB, headers would start getting dropped.
Consequently, applications that had some sort of state/scope control being coordinated via HTTP headers (session cookies included) suddenly started behaving erratically.
On an interesting, related note, REST services using URL versioning (http://server/api/vX.X/resource, for example) were unaffected.

Related

ASP.NET Core - difference between Output caching and Response caching

ASP.NET Core 7 preview 6 just introduced Output caching, which caches the endpoint output. However, ASP.NET already has Response caching, which seems to provide the same feature.
What is the difference between the two and when should one be used and when should the other be used?
I was looking for answers and trying to understand the differences between the two, and it took me a huge amount of time to understand them and to figure out when (and when not) to use each one.
As of November 2022, .NET 7 has been released, but the documentation is not very clear about the differences between them. The documentation and all the videos only talk about OutputCache as a replacement for ResponseCache.
Also, searching for OutputCache brings up a lot of results from the old ASP.NET (full framework) MVC 5.
So let's clarify the differences and how we can use each one.
ResponseCache
First, ResponseCache can be divided into two parts that work independently and are different concepts of how and where the information is cached. Let's look at each:
ResponseCacheAttribute: Basically, it manipulates cache headers like Vary, Cache-Control, and others. It works by telling browsers or proxies to store (or not store) the response content. Used correctly, this technique can reduce the number of requests made to the server.
The ResponseCache attribute sets response caching headers. Clients and intermediate proxies should honor the headers for caching responses under the HTTP 1.1 Caching specification.
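As an illustration, a minimal controller sketch (names are hypothetical) that uses the attribute to emit those headers:

using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ProductsController : ControllerBase
{
    // Tell clients and intermediate proxies they may cache this response for 60 seconds.
    [HttpGet("{id}")]
    [ResponseCache(Duration = 60, Location = ResponseCacheLocation.Any)]
    public IActionResult Get(int id) => Ok(new { id, name = "sample" });
}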
Response Caching Middleware: Basically, it performs server-side caching based on the headers defined by ResponseCacheAttribute. Depending on the request headers sent to the server, a response may never be cached on the server side.
Enables caching server responses based on HTTP cache headers. Implements the standard HTTP caching semantics. Caches based on HTTP cache headers like proxies do. Is typically not beneficial for UI apps such as Razor Pages because browsers generally set request headers that prevent caching. Output caching, which is available in ASP.NET Core 7.0 and later, benefits UI apps. With output caching, configuration decides what should be cached independently of HTTP headers.
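For reference, wiring the middleware into a minimal ASP.NET Core app looks roughly like this (a sketch with an illustrative endpoint, not a complete application):

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddResponseCaching();   // register the Response Caching Middleware services

var app = builder.Build();
app.UseResponseCaching();                // serve cached copies based on HTTP cache headers

app.MapGet("/time", (HttpContext context) =>
{
    // The middleware only caches responses that carry cacheable headers.
    context.Response.Headers.CacheControl = "public,max-age=30";
    return Results.Ok(DateTime.UtcNow);
});

app.Run();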
And this is where OutputCache comes in as a replacement for the Response Caching Middleware.
OutputCache (available in ASP.NET Core 7.0 and later)
The OutputCache configuration decides what should be cached (server side) independently of HTTP headers. It also comes with a lot of new features, such as cache entry invalidation, storage medium extensibility, and others.
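A minimal sketch of the same kind of endpoint using output caching instead (endpoint and duration are illustrative):

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOutputCache();       // register output caching services

var app = builder.Build();
app.UseOutputCache();                    // enable the output caching middleware

// Cache this endpoint's output on the server for 30 seconds,
// regardless of any cache headers the client sends.
app.MapGet("/time", () => DateTime.UtcNow)
   .CacheOutput(policy => policy.Expire(TimeSpan.FromSeconds(30)));

app.Run();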
Conclusion
To get the benefits of both worlds, you can use:
ResponseCacheAttribute: to manipulate response headers and enable clients/proxies to store content on the client side;
OutputCache: to store responses on the server side and increase throughput when responses are cached.
Both work independently. You can choose the one that best fits your application.
I haven't watched the video CodingMytra provided, but I think Output caching has some enhancements over Response caching. For example, you can specify caching for just a few seconds.
I found a useful video with some demos that will help you learn more about Output caching in .NET 7. I think you can find the differences in this video.
We can find out why there is a need for Output caching in this GitHub issue.
Link : Add support for Output Caching #27387

How to deal with proxy servers that block specific HTTP request methods?

I'm designing a REST web API, but noticed something weird lately.
Apparently some proxy servers are blocking specific HTTP request methods; in my case, the PUT and PATCH methods, which are crucial for modifying resources. This partially breaks the functionality of the API I'm designing...
Is there a good way to bypass this problem without breaking the RESTful architecture constraints? In my opinion there isn't, because fully using the HTTP verbs is advocated when designing a REST web API over HTTP...
You have a few options:
Ignore it. People who willingly break the(ir) web (experience) using a misconfigured proxy server will have to deal with the consequences themselves.
Ask the proxy administrators to whitelist your host or the methods it accepts.
Rewrite your API, "breaking" REST principles.
Use HTTPS, so the proxy will only see the CONNECT method.

Proper RESTful response to unsupported protocol (e.g. HTTP/HTTPS)?

I'm writing a RESTful API service, that will only work through HTTPS protocol. What kind of response code should be returned if the request comes via HTTP?
"301 Moved Permanently" where the server redirect the client from a http to https. This is the most commonly used pattern and what I would recommend you to implement on a server level. Implementation of this depend on the webserver your have and I would guess that there is plenty of good guide online to your specific server.
This will also tell the client to switch from HTTP to HTTPS Permanently.
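How you configure the 301 depends on your server; as one illustrative sketch (assuming ASP.NET Core, which may not be your stack), the permanent redirect can be set up like this:

var builder = WebApplication.CreateBuilder(args);

// Issue 301 Moved Permanently instead of the default temporary redirect.
builder.Services.AddHttpsRedirection(options =>
{
    options.RedirectStatusCode = StatusCodes.Status301MovedPermanently;
    options.HttpsPort = 443;
});

var app = builder.Build();
app.UseHttpsRedirection();   // redirect plain-HTTP requests to HTTPS
app.MapGet("/", () => "Hello over HTTPS");
app.Run();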
If the question were more about the method rather than HTTP/HTTPS, then "405 Method Not Allowed" would be the correct choice. That is what you should respond with if the client is not allowed to call the method itself. Here it would be misleading, since the HTTP/HTTPS protocol is the problem and not the method (GET, POST, PUT, etc.) used by the client.
426 Upgrade Required. This suggests that HTTPS should be used.

need distributed web load testing tool with custom HTTP requests

I searched some similar questions, but haven't found the right solution yet.
I need to test a web cluster (which consists of many nodes providing a set of RESTful APIs).
Not only HTTP GET requests, I also need to generate dynamic POST/PUT requests in some manner. There are many tools, but I couldn't find the right tool for generating POST/PUT requests with non-static data.
Since I need to generate quite a large amount of requests, the load test tool should run on distributed nodes. In short:
ability to write custom requests for HTTP GET, POST, and PUT (any major language such as Java, Ruby, etc. is okay)
ability to work in a distributed Linux environment (i.e. use multiple nodes to generate the requests)
ability to work over both HTTP and HTTPS
optional: generating nice-looking graphs
optional: constructing a new request and queuing it for later (for stateful API testing)
Based on certain conditions, the request generator needs to parse the JSON document in the HTTP body and process it to make another GET/POST/PUT request.
Check out Tsung, Faban, and Rain. Most likely, you will have to edit some scripts within their frameworks.

Injecting data caching and other effects into the WCF pipeline

I have a service that always returns the same results for a given parameter. So naturally I would like to cache those results on the client.
Is there a way to introduce caching and other effects inside the WCF pipeline? Perhaps a custom binding class that could sit between the client and the actual HTTP binding.
EDIT:
Just to be clear, I'm not talking about HTTP caching. The endpoint may not necessarily be HTTP and I am looking at far more effects than just caching. For example, one effect I need is to prevent multiple calls with the same parameters.
The WCF service can use Cache-Control directives in the HTTP header to tell the client how it should use its client-side cache. There are many options, which are part of the HTTP protocol. For example, you can define how long the client may just take the data from its local cache instead of making requests to the server. All clients that implement HTTP, like all web browsers, will follow the instructions. If your client uses Ajax requests to the WCF server, the corresponding Ajax call will just return the data from the local cache.
Moreover, one can implement many interesting caching scenarios. For example, if one sets "Cache-Control" to "max-age=0" (see here for an example), the client will always revalidate the cache with the server. Typically the server sends a so-called "ETag" in the header together with the data. The "ETag" represents an MD5 hash, or any other value, that changes whenever the data changes. The client automatically sends the "ETag" it previously received from the server in the header of its next GET request. The server can answer with the special response HTTP/1.1 304 Not Modified (instead of the typical HTTP/1.1 200 OK response) with an empty body. In that case the client can safely take the data from its local cache.
I use "Cache-Control: max-age=0" together with "Cache-Control: private", which switches off caching of the data on proxies and declares that the data may be cached, but not shared with other users.
If you want to read more about cache control via HTTP headers, I recommend reading the following Caching Tutorial.
UPDATED: If you want to implement some general-purpose caching, you can use the Microsoft Enterprise Library, which contains the Caching Application Block. The Microsoft Enterprise Library is published on CodePlex with the source code. As an alternative, in .NET 4.0 you can use System.Runtime.Caching. It can be used not only in ASP.NET (see here).
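If you do go the general-purpose route, here is a small sketch of client-side memoization with System.Runtime.Caching (the service call is a stand-in for your WCF proxy) so that repeated calls with the same parameter reuse the cached result:

using System;
using System.Runtime.Caching;

public class CachingServiceClient
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public string GetResult(string parameter)
    {
        string key = "GetResult:" + parameter;
        var cached = Cache.Get(key) as string;
        if (cached != null)
            return cached;                      // same parameter: reuse the previous answer

        string result = CallService(parameter); // stand-in for the real WCF call
        Cache.Set(key, result, DateTimeOffset.UtcNow.AddMinutes(10));
        return result;
    }

    private string CallService(string parameter)
    {
        return "result for " + parameter;
    }
}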
I continue to recommend using HTTP binding with HTTP caching if it is at all possible in your environment. That way you can save a lot of development time and end up with a simpler, more scalable, and more effective application. Because HTTP is so important, a great deal of useful functionality has already been implemented that you can use out of the box. Caching is only one of those features.