I would like to determine what the long URL of a short URL is. I have tried using HTTP HEAD requests, but very few of the returned header fields actually contain any data pertaining to the destination/long URL.
Is there:
1. Any way to determine the long url?
2. If so, can it be done without downloading the body of the destination?
Thank you
Issue an HTTP GET request, don't follow the redirect, and analyse the Location header. That's where the target of the redirection is.
Specifically in Cocoa, use an asynchronous request with a delegate and handle connection:didReceiveResponse: in the delegate. The first response will be the redirection one. Once you extract the URL in the handler, call [connection cancel].
EDIT: depending on the provider, HEAD instead of GET might or might not work. And if you don't follow the redirect, the response body won't be loaded anyway, so there's no transmission overhead to using a GET.
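For illustration, here's a minimal Python sketch of the same idea outside Cocoa (stdlib only; bit.ly and the /cwz5Jd path are just the example used in the telnet session below):

import http.client

# Send the request by hand and inspect the redirect target without
# following it. The body is never downloaded because we never call read().
conn = http.client.HTTPConnection("bit.ly", 80)
conn.request("GET", "/cwz5Jd")
resp = conn.getresponse()
if 300 <= resp.status < 400:
    print(resp.getheader("Location"))
conn.close()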
Do a HEAD and look for the Location header.
% telnet bit.ly 80
Trying 168.143.173.13...
Connected to bit.ly.
Escape character is '^]'.
HEAD /cwz5Jd HTTP/1.1
Host: bit.ly
HTTP/1.1 301 Moved
Server: nginx/0.7.42
Date: Fri, 12 Mar 2010 18:37:46 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Set-Cookie: _bit=4b9a89fa-002bd-030af-baa08fa8;domain=.bit.ly;expires=Wed Sep 8 14:37:46 2010;path=/; HttpOnly
Location: http://www.engadget.com/2010/03/12/motorola-milestone-with-android-2-1-hitting-bulgaria-by-march-20/?utm_source=twitterfeed&utm_medium=twitter
MIME-Version: 1.0
Content-Length: 404
LongUrlPlease offers an API which expands short URLs.
We have a Google Cloud Function live in production which essentially returns the correct redirects for us from a now-defunct site using a very simple Python script, backed by a CDN which caches the responses to avoid triggering the function more than necessary.
We're not having any problems with how the function itself works. However, we have noticed that when a specific User-Agent (Bingbot) is passed with the request, Google Cloud Functions injects a Cache-Control: private header into the response, independent of the function code (which does not set a Cache-Control header on the 301 response it sends back). This causes every request from Bingbot to be passed to the backend, making our Cloud Function usage much higher than it would ordinarily be and incurring higher costs.
This also changes the Content-Encoding and Transfer-Encoding, although we are less concerned about that.
We tested this by stripping the User-Agent header at the CDN level before the request reached the backend (the function) and confirmed that without the Bingbot header we get zero persistent passes; allowing the header back through recreated the issue of far more passes than we should be seeing.
We've begun stripping all User-Agent headers, which has solved the issue superficially, but we are concerned that this is undocumented behaviour and we have no information about when Cloud Functions may, in other circumstances, inject or manipulate response headers in response to request headers.
To confirm this isn't coming from our Python script, the relevant portion returning our response is as follows:
try:
    # Known path: look up the redirect target and send a 301.
    return flask.redirect(redirect_dict[request.path], code=301)
except KeyError:
    # Unknown path: fall back to the default URL from the environment.
    return flask.redirect(os.environ.get('FALLBACK_URL'), code=301)
Curl with Bingbot UA (actual URL & host obscured):
curl -v -X GET "$function/$path" -H 'User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' -H "$host"
And the relevant response:
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: jzlrm3k4ndhv
< Location: $redirectURL
< X-Cloud-Trace-Context: 83841aa8390d4ea4c1c8349c3aca21be
< Content-Encoding: gzip
< Date: Mon, 20 May 2019 13:02:22 GMT
< Server: Google Frontend
< Cache-Control: private
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
< Transfer-Encoding: chunked
Without Bingbot UA, the response is:
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: t8frc9wsdvzp
< Location: $redirectURL
< X-Cloud-Trace-Context: 1f817eecdc84ad4a7542fba5898caf50;o=1
< Date: Mon, 20 May 2019 13:02:37 GMT
< Server: Google Frontend
< Content-Length: 319
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
We would expect the responses to be the same, as we are not injecting any Cache-Control headers in response to queries. Clearly, varying the User-Agent causes Google Cloud Functions to inject additional headers, vary the encoding and otherwise transform responses. The concern is that there is no documentation or other information about this (unless I've missed it). If someone could point me at any kind of explanation, or if someone from Google could explain why this happens and any settings we could use to prevent it, that would be the ideal outcome here.
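One workaround that might be worth testing (an assumption on my part, not documented behaviour): since the injection seems to happen only when the function sends no Cache-Control header of its own, set one explicitly on the redirect. A minimal sketch of a modified handler, with an arbitrary example TTL:

def redirect_with_explicit_cache(target):
    # Hypothetical variant of the handler above: attach an explicit
    # Cache-Control so the platform has nothing to inject. The TTL
    # (86400 seconds) is an arbitrary example value.
    response = flask.redirect(target, code=301)
    response.headers['Cache-Control'] = 'public, max-age=86400'
    return response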
I'm trying to collect and download my Lifelog user data. The first step in doing this is getting a user-access token. I am encountering problems while requesting authorization.
From the Sony developer authentication page, I am told to input the following into my API explorer:
https://platform.lifelog.sonymobile.com/oauth/2/authorize?client_id=YOUR_CLIENT_ID&scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read
I am supposed to receive the authorization code as such:
https://YOUR_CALLBACK_URL?code=abcdef
However, this is what the current situation is actually like:
I have replaced my actual client ID below with MY_CLIENT_ID for security reasons
INPUT:
GET /oauth/2/authorize?client_id=MY_CLIENT_ID&scope=lifelog.profile.read%2Blifelog.activities.read%2Blifelog.locations.read HTTP/1.1
Authorization: Bearer kN2Kj5BThn5ZvBnAAPM-8JU0TlU
Host: platform.lifelog.sonymobile.com
X-Target-URI: https://platform.lifelog.sonymobile.com
Connection: Keep-Alive
RESPONSE:
HTTP/1.1 302 Found
Content-Length: 196
Location: https://auth.lifelog.sonymobile.com/oauth/2/authorize?scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read&client_id=MY_CLIENT_ID
Access-Control-Max-Age: 3628800
X-Amz-Cf-Id: HILH9w3eOm-6ebs_74ghegYQyWS4xyqA1l0gXPRJuuubsoZ6eiiS3g==
Access-Control-Allow-Methods: GET, PUT, POST, DELETE
X-Request-Id: 76caccfc976d40259ef30415d10980e9
Connection: keep-alive
Server: Apigee Router
X-Cache: Miss from cloudfront
X-Powered-By: Express
Access-Control-Allow-Headers: origin, x-requested-with, accept
Date: Sun, 22 Jan 2017 03:00:42 GMT
Access-Control-Allow-Origin: *
Vary: Accept
Via: 1.1 dc698cd00b7ec82887573cfaba9ecca6.cloudfront.net (CloudFront)
Content-Type: text/plain; charset=utf-8
Found. Redirecting to https://auth.lifelog.sonymobile.com/oauth/2/authorize?scope=lifelog.profile.read+lifelog.activities.read+lifelog.locations.read&client_id=MY_CLIENT_ID
Nowhere in the above can I see the authorization code. I even tried copying and pasting the URL (on the last line) into my browser; it says "localhost.com took too long to respond".
I am not sure whether it is an issue with the callback URL. I don't have an actual website or app; I just used the default localhost.
I am a beginner in this and would really appreciate all help.
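Not an answer to why the explorer behaves this way, but the usual pattern with a localhost callback is to open the authorize URL in a normal browser and run a tiny local server to catch the code parameter on the redirect. A Python sketch, assuming (hypothetically) that the registered callback URL is http://localhost:8080/:

import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The authorization code arrives as a query parameter on the
        # redirect, e.g. http://localhost:8080/?code=abcdef
        params = urllib.parse.parse_qs(urllib.parse.urlparse(self.path).query)
        print("authorization code:", params.get("code", ["<missing>"])[0])
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"You can close this tab now.")

# Serve exactly one request (the redirect), then exit.
HTTPServer(("localhost", 8080), CallbackHandler).handle_request()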
We moved our website a while ago to a new hosting provider and sporadically experience issues where people cannot log out anymore. I'm not sure whether that has anything to do with the hosting environment or with a code change.
This is the Wireshark log of the relevant bit - all is happening in the same TCP stream.
Logout request from the browser (note the authentication cookie):
GET /cirrus/logout HTTP/1.1
Host: subdomain.domain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://subdomain.domain.com/cirrus/CA/Admin/AccountSwitch
Cookie: USER.AUTH=AOvDEjH3w6xIxUC0sYNOAQR5BZ7pPmEF0RMxqohERN87Ti03Eqxd7rQC/BveqmaszmFg8QoSonP+Z+mtQQivKpvloFsQYretYKR8ENubj+moUBF479K5e4albKxS9mBEWT5Xy/XCnEyCPqLASGLY09ywkmIilNU1Ox4J3fCtYXHelE/hyzuKe9y3ui5AKEbbGs3sN9q1zYjVjHKKiNIGaHvjJ2zn7ZUs042B82Jc9RHzt0JW8dnnrl3mAkN1lJQogtlG+ynQSCyQD8YzgO8IpOnSXLJLaCMGMQcvSyX4YKJU/9sxgA5r5cZVCkHLsReS3eIJtXoxktMO6nxVZJY6MX1YwuJOgLRQvwBy9FFnQ6ye
X-LogDigger-CliVer: client-firefox 2.1.5
X-LogDigger: logme=0&reqid=fda96ee5-2db4-f543-81b5-64bdb022d358&
Connection: keep-alive
Server response. It clears the cookie value and redirects:
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 22 Nov 2013 14:40:22 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 124
Connection: keep-alive
Cache-Control: private, no-cache="Set-Cookie"
Location: /cirrus
Set-Cookie: USER.AUTH=; expires=Fri, 22-Jul-2005 14:40:17 GMT; path=/cirrus
X-Powered-By: ASP.NET
X-UA-Compatible: chrome=IE8
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
Browser follows the redirection, but with the old cookie value:
GET /cirrus HTTP/1.1
Host: subdomain.domain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://subdomain.domain.com/cirrus/CA/Admin/AccountSwitch
Cookie: USER.AUTH=AOvDEjH3w6xIxUC0sYNOAQR5BZ7pPmEF0RMxqohERN87Ti03Eqxd7rQC/BveqmaszmFg8QoSonP+Z+mtQQivKpvloFsQYretYKR8ENubj+moUBF479K5e4albKxS9mBEWT5Xy/XCnEyCPqLASGLY09ywkmIilNU1Ox4J3fCtYXHelE/hyzuKe9y3ui5AKEbbGs3sN9q1zYjVjHKKiNIGaHvjJ2zn7ZUs042B82Jc9RHzt0JW8dnnrl3mAkN1lJQogtlG+ynQSCyQD8YzgO8IpOnSXLJLaCMGMQcvSyX4YKJU/9sxgA5r5cZVCkHLsReS3eIJtXoxktMO6nxVZJY6MX1YwuJOgLRQvwBy9FFnQ6ye
X-LogDigger-CliVer: client-firefox 2.1.5
X-LogDigger: logme=0&reqid=0052e1e1-2306-d64d-a308-20f9fce4702e&
Connection: keep-alive
Is there anything obvious missing in the Set-Cookie header which could prevent the browser from deleting the cookie?
To change the value for an existing cookie, the following cookie parameters must match:
name
path
domain
name and path are set explicitly; the domain is not. Could that be the problem?
Edit: As it has been asked why the expiration date is set in the past, here is a bit more background.
This is using a slight modification of the AppHarbor Security plug-in: https://github.com/appharbor/AppHarbor.Web.Security
The modification is to include the path on the cookie. Here is the modified logout method:
public void SignOut(string path)
{
    _context.Response.Cookies.Remove(_configuration.CookieName);
    _context.Response.Cookies.Add(new HttpCookie(_configuration.CookieName, "")
    {
        Expires = DateTime.UtcNow.AddMonths(-100),
        Path = path
    });
}
The expiration date in the past is done by the AppHarbor plug-in and is common practice. See http://msdn.microsoft.com/en-us/library/ms178195(v=vs.100).aspx
At a guess, I'd say the historical expiry date is causing the whole Set-Cookie line to be ignored (why set a cookie that expired 8 years ago?):
expires=Fri, 22-Jul-2005
We have had issues with deleting cookies in the past, and yes, the domain and path must match the domain and path of the cookie you are trying to delete.
Try setting the correct domain and path in the HttpCookie.
Great question, and excellent notes. I've had this problem recently also.
There is one fail-safe approach to this, beyond what you ought to already be doing:
Set expiration in the past.
Set a path and domain.
Put bogus data in the cookie being removed!
Set-Cookie: USER.AUTH=invalid; expires=Fri, 22-Jul-2005 14:40:17 GMT; path=/cirrus; domain=subdomain.domain.com
The fail-safe approach goes like this:
Add a special string to all cookies, at the end. Unless that string exists, reject the cookie and forcibly reset it. For example, all new cookies must look like this:
Set-Cookie: USER.AUTH=AOvDEjH3w6xIxUC0sYNOAQR5BZ7pPmEF0RMxqohERN87Ti03Eqxd7rQC/BveqmaszmFg8QoSonP+Z+mtQQivKpvloFsQYretYKR8ENubj+moUBF479K5e4albKxS9mBEWT5Xy/XCnEyCPqLASGLY09ywkmIilNU1Ox4J3fCtYXHelE/hyzuKe9y3ui5AKEbbGs3sN9q1zYjVjHKKiNIGaHvjJ2zn7ZUs042B82Jc9RHzt0JW8dnnrl3mAkN1lJQogtlG+ynQSCyQD8YzgO8IpOnSXLJLaCMGMQcvSyX4YKJU/9sxgA5r5cZVCkHLsReS3eIJtXoxktMO6nxVZJY6MX1YwuJOgLRQvwBy9FFnQ6ye|1386510233; expires=Fri, 22-Jul-2005 14:40:17 GMT; path=/cirrus; domain=subdomain.domain.com
Notice the change: that extremely long string stored in USER.AUTH ends with |1386510233, which is the Unix timestamp of the moment the cookie was set.
This adds a simple extra step to cookie parsing. Now you need to test for the presence of | and discard the timestamp unless you care to know when the cookie was set. To make it faster, you can just check for string[length-10] == '|' rather than parsing the whole string. The way I do it, I split the string at | and check for two values after the split; that avoids a two-part parsing process, but this is language-specific and really just a preference. If you plan to discard the value, just check the specific index where you expect the | to be.
In the future, if you change hosts again, you can test that timestamp and reject cookies older than a certain point in time. At most this adds two extra steps to your cookie handler: removing the |unixepoch and, if desired, checking when it was issued so you can reject the cookie if you change hosts again. This adds about 0.001s to a page load, or less. That is worth it compared to customer service failures and mass brain damage.
Your new cookie strategy allows you to reject all cookies without the |unixepoch immediately, because you know they are old. And yes, people might complain about this approach, but it is the only way to truly know the cookie is valid. You cannot rely on the client side to provide valid cookies, and you cannot keep a record of every single cookie out there unless you want to warehouse a ton of data. If you warehouse every cookie and check it on every request, that can add 0.01s to a page load versus 0.001s with this strategy, so the warehousing route is not worth it.
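A sketch of that validation in Python (names are hypothetical; MIGRATION_CUTOFF stands for the Unix timestamp of the host move):

MIGRATION_CUTOFF = 1386510233  # hypothetical cutoff: when we changed hosts

def parse_auth_cookie(value):
    # Split off the |unixepoch suffix described above.
    token, sep, issued_at = value.rpartition('|')
    if not sep or not issued_at.isdigit():
        return None  # old-format cookie without the suffix: force a reset
    if int(issued_at) < MIGRATION_CUTOFF:
        return None  # minted before the migration: reject
    return token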
An alternative approach is to use USER.AUTHENTICATION rather than USER.AUTH as the new cookie name, but that is perhaps more invasive. And you don't gain the benefits described above if/when you change hosts again.
Good luck with your transition. I hope you get this sorted out. Using the strategy above, I was able to.
When using the HTTP API, I am trying to call the aliveness-test endpoint for monitoring purposes. At the moment I am testing using curl and the following command:
curl -i http://guest:guest@localhost:55672/api/aliveness-test/
And I get the following response:
HTTP/1.1 404 Object Not Found
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Date: Mon, 05 Nov 2012 17:18:58 GMT
Content-Type: text/html
Content-Length: 193
<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD><BODY><H1>Not Found</H1>The requested document was not found on this server.<P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></BODY></HTML>
When making a request just to list the users or vhosts, the request returns successfully:
$ curl -I http://guest:guest@localhost:55672/api/users
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Date: Mon, 05 Nov 2012 17:51:44 GMT
Content-Type: application/json
Content-Length: 11210
Cache-Control: no-cache
I'm using the latest stable version (2.8.7) of RabbitMQ and obviously have the management plugin installed for the API to work with the users call (the response is left out because it contains company data, but it is just regular JSON as expected).
There isn't much on the internet about this call failing so I am wondering if anyone has seen this before?
Thanks,
Kristian
Turns out that the '/' at the beginning of the vhost name is not implicit, even as part of a URL. To get this to work I simply changed my request from:
curl -i http://guest:guest@localhost:55672/api/aliveness-test/
To
curl -i http://guest:guest@localhost:55672/api/aliveness-test/%2F
As %2F is '/' percent-encoded, my request now queries the vhost named '/' and returns a 200 response which looks like:
{"status":"ok"}
We have a Tomcat server where we're trying to log the HTTP version the response is sent with. We've seen a few times that it seems to be HTTP/0.9, which kills the content (not supported, I guess?). We would like to get some stats on this from the access log. However, since the status line isn't a "Name: value" header field, we cannot use the %{xxx}o logging directive.
Is there a way to get this?
An example:
Response is:
HTTP/1.1 503 This application is not currently available
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=utf-8
Content-Length: 1090
Date: Wed, 12 May 2010 12:53:16 GMT
Connection: close
And we'd like to catch HTTP/1.1 (alternatively, the full status line: HTTP/1.1 503 This application is not currently available).
Is this possible? We do not have access to the application being served, so we need to do this either as a Java filter or in the Tomcat access log, preferably the access log.
Enabling the <Valve className="org.apache.catalina.valves.RequestDumperValve"/> in server.xml writes out the request and response headers for each request.
Example:
19-May-2010 12:26:18 org.apache.catalina.valves.RequestDumperValve invoke
INFO: protocol=HTTP/1.1
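If the request's protocol is an acceptable proxy for the version the response goes out with, the stock AccessLogValve has a %H pattern code for it (and %r, the request line, contains it too). A server.xml sketch along the lines of the valve above; the directory/prefix values are just examples:

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="access." suffix=".log"
       pattern='%h %t "%r" %s %b %H' />

Note this records what the client asked for, not what Coyote answered with, so a true response-side check would still need a filter.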