What is the technical term for when a URL is repeated within the same URL? - api

I have an API that returns URL that render an image from Adobe Scene7. The API is doubling up, or repeating, the URL in the xml return and is causing parsing issues in my presentation. Is there a technical term for this repeated URL? I need to know how to report this to my developers.
Here is an example, in the url can see where the protocol starts and repeats the exact same URL...
<d1p1:ProductImageUrl>http://s7d7.scene7.com/is/image/BBE004?&op_colorize=0f5196http://s7d7.scene7.com/is/image/BBE004?&op_colorize=0f5196</d1p1:ProductImageUrl>

I think the technical term for this is "invalid URL".

Related

How does google index web chats that load messages dynamically via XHR or WebSocket?

Why i am able to google messages in (for example) gitter.im? How did google indexed all this: https://gitter.im/neoclide/coc.nvim?at=5ea00cdda3612210839689f1 ?
Does gitter.im return its content to google in another format or via some specific interface/protocol declared in special section for web crawlers somewhere? Did google spent some resources on development to build a gitter.im-specific crawler that is able to do specific XHR-requests?
Simple:
Google ask https://gitter.im/gitter/developers
There is N recent messages embedded in HTML already, say 50. Then google just extract all the links from the HTML (from that time-tag "18:15", for example). Each time-tag gives you url of form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e and google doesnt care why. Just remember urls.
Google asks that grabbed 50 urls of form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e
Each such URL gives you ~50 messages around that exact message. So search engine think: "ok, this URL gives you THIS text".
So when you search THIS test it just gives you the url closer-to that text or maybe just any url with that text...

Facebook App in Page Tab receiving signed_request but missing page data

I have a page tab app that I am hosting. I have both http and https supported. While I receive a signed_request package as expected, after I decode it does not contain page information. That data is simply missing.
I verified that like schemes are being used (https) among facebook, my hosted site and even the 'go between'-- facebook's static page handler.
Also created a new application with page tab support but got the same results-- simply no page information in the signed_request.
Any other causes people can think of?
I add the app to the page tab using this link:
https://www.facebook.com/dialog/pagetab?app_id=176236832519816&next=https://www.intelligantt.com/Facebook/application.html
Here is the page tab I am using (Note: requires permissions):
https://www.facebook.com/pages/School-Auction-Test-2/154869721351873?id=154869721351873&sk=app_176236832519816
Here is the decoded signed_request I am receiving:
{"algorithm":"HMAC-SHA256","code":!REMOVED!,"issued_at":1369384264,"user_id":"1218470256"}
5/25 Update - I thought maybe the canvas app urls didn't match the page tab urls so I spent several hours going through scenarios where they both had a trailing slash or not. Where they both had a trailing ? or not, with query parameters or not.
I also tried changing the 'next' value when creating the page tab to the canvas app url and the page tab url.
No success on either count.
I did read where because I'm seeing the 'code' value in the signed_request it means Facebook either couldn't match my urls or that I'm capturing the second request. However, I given all the URL permutations I went through I believe the urls match. I also subscribed to the 'auth.authResponseChange' which should give me the very first authResponse that should contain the signed_request with page.id in it (but doesn't).
If I had any reputation, I'd add a bounty to this.
Thanks.
I've just spent ~5 hours on this exact same problem and posted a prior answer that was incorrect. Here's the deal:
As you pointed out, signed_request appears to be missing the page data if your tab is implemented in pure javascript as a static html page (with *.htm extension).
I repeated the exact same test, on the exact same page, but wrapped my html page (including js) within a Perl script (with *.cgi extension)... and voila, signed_request has the page info.
Although confusing (and should be better documented as a design choice by Facebook), this may make some sense because it would be impossible to validate the signed_request wholly within Javascript without placing your secretkey within the scope (and therefore revealing it to a potential hacker).
It would be much easier with the PHP SDK, but if you just want to use JavaScript, maybe this will help:
Facebook Registration - Reading the data/signed request with Javascript
Also, you may want to check out this: https://github.com/diulama/js-facebook-signed-request
simply you can't get the full params with the javascript signed_request, use the php sdk to get the full signed_request . and record the values you need into javascript variabls ...
with the php sdk after instanciation ... use the facebook object as following.
$signed_request = $facebook->getSignedRequest();
var_dump($signed_request) ;
this is just to debug but u'll see that the printed array will contain many values that u won't get with js sdk for security reasons.
hope that helped better anyone who would need it, cz it seems this issue takes at the min 3 hours for everyone who runs into.

How to use regular urls without the hash symbol in spine.js?

I'm trying to achieve urls in the form of http://localhost:9294/users instead of http://localhost:9294/#/users
This seems possible according to the documentation but I haven't been able to get this working for "bookmarkable" urls.
To clarify, browsing directly to http://localhost:9294/users gives a 404 "Not found: /users"
You can turn on HTML5 History support in Spine like this:
Spine.Route.setup(history: true)
By passing the history: true argument to Spine.Route.setup() that will enable the fancy URLs without hash.
The documentation for this is actually buried a bit, but it's here (second to last section): http://spinejs.com/docs/routing
EDIT:
In order to have urls that can be navigated to directly, you will have to do this "server" side. For example, with Rails, you would have to build a way to take the parameter of the url (in this case "/users"), and pass it to Spine accordingly. Here is an excerpt from the Spine docs:
However, there are some things you need to be aware of when using the
History API. Firstly, every URL you send to navigate() needs to have a
real HTML representation. Although the browser won't request the new
URL at that point, it will be requested if the page is subsequently
reloaded. In other words you can't make up arbitrary URLs, like you
can with hash fragments; every URL passed to the API needs to exist.
One way of implementing this is with server side support.
When browsers request a URL (expecting a HTML response) you first make
sure on server-side that the endpoint exists and is valid. Then you
can just serve up the main application, which will read the URL,
invoking the appropriate routes. For example, let's say your user
navigates to http://example.com/users/1. On the server-side, you check
that the URL /users/1 is valid, and that the User record with an ID of
1 exists. Then you can go ahead and just serve up the JavaScript
application.
The caveat to this approach is that it doesn't give search engine
crawlers any real content. If you want your application to be
crawl-able, you'll have to detect crawler bot requests, and serve them
a 'parallel universe of content'. That is beyond the scope of this
documentation though.
It's definitely a good bit of effort to get this working properly, but it CAN be done. It's not possible to give you a specific answer without knowing the stack you're working with.
I used the following rewrites as explained in this article.
http://www.josscrowcroft.com/2012/code/htaccess-for-html5-history-pushstate-url-routing/

What is Twitter's definition of an URL?

I already asked the same question over at dev.twitter.com, however, I didn't get an answer there. So maybe someone here on SO ran into the same issue and has an answer.
In my application I count the length of the characters the user enters to compose a tweet. However, if the user enters an URL, this will be shortened automatically (by Twitter's API) when posting the tweet. So I have to replace the length of the URL with the length of the resulting t.co URL in my character counter.
However, the problem is now, what is Twitter's definition of an URL so that I know when you adapt my character counter and when not. For example www.verylongexampleurl.de gets shortened, while verylongexampleurl.de (without the www) doesn't, but verylongexampleurl.com does get shortened again.
I couldn't find any documentation, but maybe I missed it. All hints are appreciated.
Quoting from dev.twitter.com:
Need help parsing tweet text?
Take a look on the Twitter text processing library we’re using for auto linking and extraction of usernames, lists & hashtags.
Ruby: https://github.com/twitter/twitter-text-rb
Java: https://github.com/twitter/twitter-text-java
Javascript: https://github.com/twitter/twitter-text-js
The actual specification (tests) can be found here: https://github.com/twitter/twitter-text-conformance/blob/master/autolink.yml

Do I need to send a 404?

We're in the middle of writing a lot of URL rewrite code that would basically take ourdomain.com/SomeTag and some something dynamic to figure out what to display.
Now if the Tag doesn't exist in our system, we're gonna display some information helping them finding what they were looking for.
And now the question came up, do we need to send a 404 header? Should we? Are there any reasons to do it or not to do it?
Thanks
Nathan
You aren't required to, but it can be useful for automated checkers to detect the response code instead of having to parse the page.
I certainly send proper response codes in my applications, especially when I have database errors or other fatal errors. Then the search engine knows to give up and retry in 5 mins instead of indexing the page. e.g. code 503 for "Service Unavailable" and I also send a Retry-After: 600 to tell it to try again...search engines won't take this badly.
404 codes are sent when the page should not be indexed or doesn't exist (e.g. non-existent tag)
So yes, do send status codes.
I say do it - if the user is actually an application acting on behalf of the user (i.e. cURL, wget, something custom, etc...) then a 404 would actually help quite a bit.
You have to keep in mind that the result code you return is not for the user; for the standard user, error codes are meaningless so don't display this info to the user.
However think about what could happen if the crawlers access your pages and consider them valid (with a 200 response); they will start indexing the content and your page will be added to the index. If you tell the search engine to index the same content for all your not found pages, it will certainly affect your ranking and if one page appears in the top search results, you will look like a fool.