Google Cloud Function injecting 'Cache-Control: Private' header in response to Bingbot User Agent independent of function code - http-headers

We have a Google Cloud Function in production that returns the correct redirects for a now-defunct site using a very simple Python script. It sits behind a CDN that caches the responses to avoid triggering the function more than necessary.
The function itself works fine. However, we have noticed that when a request carries a specific User-Agent (Bingbot), Google Cloud Functions injects a Cache-Control: private header into the response, independent of the function code (which does not set a Cache-Control header on the 301 response it sends back). As a result, the CDN passes every Bingbot request to the backend, so our Cloud Function usage and costs are much higher than they would otherwise be.
This also causes changes to Content and Transfer Encoding, although we are less concerned about this.
We tested this by stripping out the User-Agent header at the CDN level before the request to the backend (the function) and confirmed that without the Bingbot headers we get 0 persistent passes; allowing the header back through recreated the issue of far more Passes than we should be seeing.
We've begun stripping all User-Agent headers at this point which has solved the issue on a shallow basis, but we are concerned that this is undocumented behaviour and we have no information about when Cloud Functions may in other circumstances inject or manipulate response headers in response to request headers.
To confirm this isn't coming from our Python script, the relevant portion returning our response is as follows:
try:
    return flask.redirect(redirect_dict[request.path], code=301)
except:
    return flask.redirect(os.environ.get('FALLBACK_URL'), code=301)
Curl with Bingbot UA (actual URL & host obscured):
curl -v -X GET $function/$path -H 'User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' -H $host
And the relevant response:
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: jzlrm3k4ndhv
< Location: $redirectURL
< X-Cloud-Trace-Context: 83841aa8390d4ea4c1c8349c3aca21be
< Content-Encoding: gzip
< Date: Mon, 20 May 2019 13:02:22 GMT
< Server: Google Frontend
< Cache-Control: private
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
< Transfer-Encoding: chunked
Without Bingbot UA, the response is:
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: t8frc9wsdvzp
< Location: $redirectURL
< X-Cloud-Trace-Context: 1f817eecdc84ad4a7542fba5898caf50;o=1
< Date: Mon, 20 May 2019 13:02:37 GMT
< Server: Google Frontend
< Content-Length: 319
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
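To make the difference between the two responses explicit, the header sets can be compared programmatically. The following is only a minimal sketch: the helper names `parse_headers` and `diff_header_names` are hypothetical, and the input is assumed to be `curl -v` response lines like the ones shown above.

```python
def parse_headers(curl_output):
    """Parse 'Name: value' lines (optionally prefixed with '< ') into a dict."""
    headers = {}
    for line in curl_output.splitlines():
        line = line.lstrip("< ").strip()
        # Skip the status line and anything that is not a header line.
        if line.startswith("HTTP/") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_header_names(first, second):
    """Return (names only in first, names only in second), both sorted."""
    return sorted(set(first) - set(second)), sorted(set(second) - set(first))


if __name__ == "__main__":
    # Abbreviated copies of the two responses from the question.
    with_bingbot = parse_headers(
        "< HTTP/1.1 301 Moved Permanently\n"
        "< Content-Encoding: gzip\n"
        "< Server: Google Frontend\n"
        "< Cache-Control: private\n"
        "< Transfer-Encoding: chunked\n"
    )
    without_bingbot = parse_headers(
        "< HTTP/1.1 301 Moved Permanently\n"
        "< Server: Google Frontend\n"
        "< Content-Length: 319\n"
    )
    print(diff_header_names(with_bingbot, without_bingbot))
```

Running the same comparison against other User-Agent strings would show whether the behaviour is specific to Bingbot.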
We would expect responses to be the same as we are not injecting any cache-control headers in response to queries. Clearly varying the User Agent causes Google Cloud Functions to inject additional headers, vary encoding and otherwise transform responses. The concern is that there is no documentation or other information about this (unless I've missed it). If someone could point me at any kind of explanation, or if someone from Google could explain why this happens and any other settings we could use to prevent it, that would be the ideal outcome here.
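One experiment that may be worth running is to set Cache-Control explicitly on the redirect. This rests on an unconfirmed assumption: that the frontend only adds its own value when the function omits the header. The helper name `with_explicit_cache_control` and the one-day max-age are hypothetical choices for illustration, not a documented fix.

```python
def with_explicit_cache_control(response, max_age_seconds=86400):
    """Set a public Cache-Control header on a response object and return it.

    Works with any object exposing a dict-like ``headers`` attribute,
    such as the response returned by flask.redirect().
    """
    response.headers["Cache-Control"] = "public, max-age=%d" % max_age_seconds
    return response


# Hypothetical use inside the function from the question:
#   return with_explicit_cache_control(
#       flask.redirect(redirect_dict[request.path], code=301))
```

If the header still comes back as private for the Bingbot User-Agent after this change, that would suggest the frontend is overriding the value rather than filling in a missing one.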

Related

Cannot generate an authorization code on API Explorer

I'm trying to collect and download my lifelog user data. The first step into doing this is getting a user-access token. I am encountering problems while requesting authorization.
From the Sony developer authentication page I am told to input the following code into my API explorer:
https://platform.lifelog.sonymobile.com/oauth/2/authorize?client_id=YOUR_CLIENT_ID&scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read
I am supposed to receive the authorization code as such:
https://YOUR_CALLBACK_URL?code=abcdef
However, this is what the current situation is actually like:
I have replaced my actual client ID below with MY_CLIENT_ID for security reasons
INPUT:
GET /oauth/2/authorize?client_id=MY_CLIENT_ID&scope=lifelog.profile.read%2Blifelog.activities.read%2Blifelog.locations.read HTTP/1.1
Authorization: Bearer kN2Kj5BThn5ZvBnAAPM-8JU0TlU
Host: platform.lifelog.sonymobile.com
X-Target-URI: https://platform.lifelog.sonymobile.com
Connection: Keep-Alive
RESPONSE:
HTTP/1.1 302 Found
Content-Length: 196
Location: https://auth.lifelog.sonymobile.com/oauth/2/authorize?scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read&client_id=MY_CLIENT_ID
Access-Control-Max-Age: 3628800
X-Amz-Cf-Id: HILH9w3eOm-6ebs_74ghegYQyWS4xyqA1l0gXPRJuuubsoZ6eiiS3g==
Access-Control-Allow-Methods: GET, PUT, POST, DELETE
X-Request-Id: 76caccfc976d40259ef30415d10980e9
Connection: keep-alive
Server: Apigee Router
X-Cache: Miss from cloudfront
X-Powered-By: Express
Access-Control-Allow-Headers: origin, x-requested-with, accept
Date: Sun, 22 Jan 2017 03:00:42 GMT
Access-Control-Allow-Origin: *
Vary: Accept
Via: 1.1 dc698cd00b7ec82887573cfaba9ecca6.cloudfront.net (CloudFront)
Content-Type: text/plain; charset=utf-8
Found. Redirecting to https://auth.lifelog.sonymobile.com/oauth/2/authorize?scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read&client_id=MY_CLIENT_ID
Nowhere in the above response can I see the authorization code. I even tried pasting the URL (on the last line) into my browser, but it says "localhost.com took too long to respond".
I am not sure whether it is an issue with the callback URL. I don't have an actual website or app, so I just used the default localhost.
I am a beginner in this and would really appreciate all help.
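For reference, the two URL-handling steps described above (building the authorize URL and reading the code from the callback URL) can be sketched with the Python standard library. This is only an illustration: the function names are hypothetical, and the endpoint and parameter names are taken from the URLs quoted in the question. Note that the scopes are separated by `+` in the documented URL but by `%2B` in the request shown under INPUT; whether that difference matters here is not established.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as quoted in the question.
AUTHORIZE_ENDPOINT = "https://platform.lifelog.sonymobile.com/oauth/2/authorize"


def build_authorize_url(client_id, scopes):
    """Build the authorize URL; urlencode turns the spaces between scopes into '+'."""
    query = urlencode({"client_id": client_id, "scope": " ".join(scopes)})
    return AUTHORIZE_ENDPOINT + "?" + query


def extract_code(callback_url):
    """Return the 'code' query parameter from a callback URL, or None if absent."""
    values = parse_qs(urlsplit(callback_url).query).get("code")
    return values[0] if values else None


if __name__ == "__main__":
    print(build_authorize_url(
        "YOUR_CLIENT_ID",
        ["lifelog.profile.read", "lifelog.activities.read", "lifelog.locations.read"],
    ))
    print(extract_code("https://YOUR_CALLBACK_URL?code=abcdef"))
```

The authorization code only appears once the browser is redirected to the callback URL, so `extract_code` would be applied to that final URL, not to the 302 response shown above.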

How to return a 419 HTTP status in a Symfony 2 Response object

I'm currently working on an HTTP API in Symfony 2 for which I have implemented OAuth2 Password Credentials for authentication. On authentication the client receives two tokens, one access and one refresh token.
Somehow I need to be able to inform the client that the access token is no longer valid and that the refresh token must be used to obtain a new access token.
I have gone through the list of HTTP codes [1] and to me the 419 Authentication Timeout status seems the most appropriate. However, when I use this in the Response object I actually receive a 500 status instead:
return new Response("OK", 419);
Results:
$ curl -X GET http://local.api
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 23 Jan 2014 21:34:39 GMT
* Server Apache/2.4.6 (Ubuntu) is not blacklisted
< Server: Apache/2.4.6 (Ubuntu)
< X-Powered-By: PHP/5.5.3-1ubuntu2.1
< Cache-Control: no-cache
< X-Debug-Token: b7c386
< X-Debug-Token-Link: /app_dev.php/_profiler/b7c386
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
I get the impression that the Response object doesn't seem to support this code (looking through the list of constants defined in the class) but from the Response class code itself I can't detect any logic that wouldn't allow me to send a 419.
I'm currently falling back on the 511 Network Authentication Required but I prefer to use the 419 instead. How can I do this?
[1] http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
Look at the Response constructor: it throws an InvalidArgumentException if the status code is not valid. You will need to use a different status code (take a look at the class constants for the valid codes in Symfony).

Why is apache accepting this invalid range request?

I'm trying to debug a ranged request issue in my app, so I've been using curl to see the headers. If I do curl -v -H "Range: bytes=200-100" THEURL the server responds with:
< HTTP/1.1 206 Partial Content
< Date: Sat, 19 Jan 2013 17:46:52 GMT
< Server: Apache
< Content-Range: bytes 200-100/1096985137
< Etag: --REDACTED BY OP--
< Transfer-Encoding: chunked
< Content-Type: application/x-zip-compressed
Wouldn't returning 206 imply that the content range is valid and that range is going to be served?
Another thing I've noticed is that even if I use a valid, but small, content range like Range: bytes=0-100, the server responds with 206 but sends way more data than 100 bytes.
Am I doing something wrong?
Edit: It seems no matter what range I send this server, I always get back the full download. Strange.
I'll go ahead and answer my own question, even though the answer is unsatisfying. Turns out it was just a bug with the version of Apache running on the server. Once the server was updated the problem was resolved.
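For anyone checking their own requests, here is a minimal sketch of the validity rule at issue: in a `bytes=first-last` range, the last position must not be smaller than the first. The function name `parse_byte_range` is hypothetical, and the sketch handles only a single range with an explicit first position (no suffix ranges such as `bytes=-500`, no multiple ranges).

```python
import re


def parse_byte_range(header_value):
    """Return (first, last) for a single 'bytes=first-last' range.

    ``last`` is None for an open-ended range such as 'bytes=500-'.
    Returns None if the value is malformed or last < first.
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", header_value.strip())
    if match is None:
        return None
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else None
    if last is not None and last < first:
        return None  # e.g. 'bytes=200-100' is not a valid range
    return first, last


if __name__ == "__main__":
    print(parse_byte_range("bytes=200-100"))
    print(parse_byte_range("bytes=0-100"))
```

Both positions are inclusive, so `bytes=0-100` asks for 101 bytes, not 100.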

RabbitMQ HTTP API call to aliveness-test returns 404 but other calls work

When using the HTTP API I am trying to make a call to the aliveness-test for monitoring purposes. At the moment I am testing using curl and the following command:
curl -i http://guest:guest@localhost:55672/api/aliveness-test/
And I get the following response:
HTTP/1.1 404 Object Not Found
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Date: Mon, 05 Nov 2012 17:18:58 GMT
Content-Type: text/html
Content-Length: 193
<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD><BODY><H1>Not Found</H1>The requested document was not found on this server.<P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></BODY></HTML>
When making a request just to list the users or vhosts, the requests returns successfully:
$ curl -I http://guest:guest@localhost:55672/api/users
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Date: Mon, 05 Nov 2012 17:51:44 GMT
Content-Type: application/json
Content-Length: 11210
Cache-Control: no-cache
I'm using the latest stable version (2.8.7) of RabbitMQ and obviously have the management plugin installed for the API to work with the users call (the response is left out due to it containing company data but is just regular JSON as expected).
There isn't much on the internet about this call failing so I am wondering if anyone has seen this before?
Thanks,
Kristian
It turns out that the '/' at the beginning of the vhost name is not implicit, even as part of a URL. To get this to work I simply changed my request from:
curl -i http://guest:guest@localhost:55672/api/aliveness-test/
To
curl -i http://guest:guest@localhost:55672/api/aliveness-test/%2F
As %2F is '/' URL-encoded, my request now queries the vhost named '/' and returns a 200 response which looks like:
{"status":"ok"}
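When building this URL in code, the vhost name can be percent-encoded rather than hard-coding %2F. A minimal sketch, where the helper name `aliveness_url` is hypothetical and the host and port are the ones from the curl commands above (credentials are left out and would need to be supplied separately):

```python
from urllib.parse import quote


def aliveness_url(host, port, vhost="/"):
    """Build the aliveness-test URL with the vhost name percent-encoded."""
    # safe="" makes quote() encode '/' as well, which it leaves alone by default.
    return f"http://{host}:{port}/api/aliveness-test/{quote(vhost, safe='')}"


if __name__ == "__main__":
    print(aliveness_url("localhost", 55672))
```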

How to get a long url from a short url

I would like to determine what the long url of a short url is. I have tried using http HEAD requests, but very few of the returned header fields actually contain any data pertaining to the destination/long url.
Is there:
1. Any way to determine the long url?
2. If so, can it be done without downloading the body of the destination?
Thank you
Issue an HTTP GET request, don't follow the redirect, analyse the Location header. That's where the target of redirection is.
Specifically in Cocoa, use an asynchronous request with a delegate, handle the didReceiveResponse in the delegate. The first response will be the redirection one. Once you extract the URL in the handler, call [cancel] on the connection.
EDIT: depending on the provider, HEAD instead of GET might or might not work. And if you don't follow the redirect, the response data won't be loaded anyway, so there's no transmission overhead to having a GET.
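Outside Cocoa, the same idea (send one request, do not follow the redirect, read the Location header) can be sketched in Python with `http.client`, which never follows redirects on its own. The function name `expand_short_url` is hypothetical; the method defaults to HEAD but can be switched to GET for providers where HEAD does not work.

```python
import http.client
from urllib.parse import urlsplit


def expand_short_url(short_url, method="HEAD", timeout=10):
    """Return the Location header of a redirect response, or None if there is none."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request(method, path)
        response = conn.getresponse()
        # Only 3xx responses carry a redirect target.
        if 300 <= response.status < 400:
            return response.getheader("Location")
        return None
    finally:
        # Closing without reading the body avoids downloading the destination.
        conn.close()
```

Some short links redirect through more than one hop, so the returned URL may itself need to be expanded again.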
Do a HEAD and look for the Location header.
% telnet bit.ly 80
Trying 168.143.173.13...
Connected to bit.ly.
Escape character is '^]'.
HEAD /cwz5Jd HTTP/1.1
Host: bit.ly
HTTP/1.1 301 Moved
Server: nginx/0.7.42
Date: Fri, 12 Mar 2010 18:37:46 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Set-Cookie: _bit=4b9a89fa-002bd-030af-baa08fa8;domain=.bit.ly;expires=Wed Sep 8 14:37:46 2010;path=/; HttpOnly
Location: http://www.engadget.com/2010/03/12/motorola-milestone-with-android-2-1-hitting-bulgaria-by-march-20/?utm_source=twitterfeed&utm_medium=twitter
MIME-Version: 1.0
Content-Length: 404
LongUrlPlease offers an API which expands short urls.