How does Google index web chats that load messages dynamically via XHR or WebSocket? - xmlhttprequest

Why am I able to google messages in (for example) gitter.im? How did Google index all of this: https://gitter.im/neoclide/coc.nvim?at=5ea00cdda3612210839689f1 ?
Does gitter.im return its content to Google in another format, or via some specific interface or protocol declared in a special section for web crawlers somewhere? Did Google spend resources on building a gitter.im-specific crawler that is able to make specific XHR requests?

Simple:
Google requests https://gitter.im/gitter/developers
The N most recent messages, say 50, are already embedded in the HTML. Google then just extracts all the links from the HTML (from the time tag "18:15", for example). Each time tag gives you a URL of the form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e, and Google doesn't care why. It just remembers the URLs.
Google then requests those 50 URLs of the form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e
Each such URL returns ~50 messages around that exact message. So the search engine thinks: "OK, this URL gives THIS text".
So when you search for THIS text, it just gives you the URL closest to that text, or maybe just any URL with that text...
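The link-harvesting step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google's crawler is actually implemented: the HTML snippet, the class name PermalinkExtractor, the helper extract_permalinks and the second message ID are all made up for the example, and the real Gitter markup may look different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class PermalinkExtractor(HTMLParser):
    """Collects absolute URLs of <a href> links that carry an ?at= parameter."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.permalinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page that was fetched.
        absolute = urljoin(self.base_url, href)
        # Keep only message permalinks, skipping duplicates.
        if "at" in parse_qs(urlparse(absolute).query):
            if absolute not in self.permalinks:
                self.permalinks.append(absolute)


def extract_permalinks(html, base_url):
    parser = PermalinkExtractor(base_url)
    parser.feed(html)
    return parser.permalinks


# Hypothetical markup: each message timestamp is wrapped in a permalink.
SAMPLE_HTML = """
<div class="chat-item"><a href="?at=610011abc9f8852a970e808e"><time>18:15</time></a> hello</div>
<div class="chat-item"><a href="/gitter/developers?at=610011abc9f8852a970e808f"><time>18:16</time></a> world</div>
<a href="/login">Sign in</a>
"""

links = extract_permalinks(SAMPLE_HTML, "https://gitter.im/gitter/developers")
for link in links:
    print(link)
```

A crawler would then fetch each collected URL and index whatever message text is embedded in the HTML it gets back, with no XHR or WebSocket involved.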

Related

API Request URL returns "Invalid Access"

I'm trying to scrape data from a website but I have no experience with scraping or APIs except for making a Discord Bot once. So I followed the steps described here to find the API:
http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api
The Request URL in the Headers tab with the important information is this one:
https://api.amiami.com/api/v1.0/item?gcode=FIGURE-119023&lang=eng
When I try to open this page, like he does, it only returns:
{"RSuccess":false,"RValue":{"HttpStatusCode":400},"RMessage":"Invalid access."}
If you want to try getting the Request URL yourself, the original page I used was:
https://www.amiami.com/eng/detail/?gcode=FIGURE-119023
Removing the language argument doesn't seem to change anything either. So I guess there's something that detects that I'm not accessing it in a normal way. Any ideas on how to fix this?

How to remove URLs with argument in google result

I have a website that I recently started, and I have also submitted my sitemap in Google Webmaster Tools. My site got indexed within a short time, but whenever I search for my website on Google, I see two or three versions of the same pages, each with different URL arguments.
Suppose my site name is example.com; when I search for example.com on Google, I get results like the following:
www.example.com/?page=2
www.example.com/something/?page=3
www.example.com
As far as I know, result 1 and result 3 are the same, so why are they shown separately? I don't have any such URL in my sitemap or in any of my HTML pages, so I am a little confused about why this is happening. I want to get rid of it.
Also, result 2 should be displayed simply as www.example.com/something
and not as www.example.com/something?page=3
There is actually a setting in Google Webmaster Tools that helps remove URLs with parameters. To access and configure it, navigate to Webmaster Tools --> Crawl --> URL Parameters and set the parameters according to your needs.
I also found the following article useful for understanding the concept behind those parameters and how to stop pages from being crawled with unnecessary parameters:
http://www.shoutmeloud.com/google-webmaster-tool-added-url-parameter-option-seo.html

How to get such header title and search input in Google search for website

How can I get a header title and search input like the one below in Google search results for my website?
I am building my website with core PHP (not a CMS like WordPress, Drupal etc.). So please help me get such a result in Google.
This is called Sitelinks.
Check it out here: https://support.google.com/webmasters/answer/47334?hl=en
It's an automated Google process and you can't do much to control it, although a Google search for "how to get sitelinks" gives you plenty of results on how to get them.
Or perhaps you can purchase them as part of your AdWords advertisement.
As far as I know, it is not much related to PHP. It's more about Search Engine Optimization (SEO).

Vimeo search videos "app not allowed to perform that action"

Man it seems improbably difficult just to get a URL that searches Vimeo videos. They've got feed URLs to get specific users' videos, or info on a specific video, but seemingly not for a generic video search.
From other posts and the docs, I eventually came up with this:
https://api.vimeo.com/videos?query=vimeo&client_id=xxxxxxxxxxxxxx
...where client_id is valid (having registered my app).
However that yields an error that...
the app is not allowed to perform that action.
Any thoughts?
Edit - All API Apps now have access to the beta API by default. If you encounter this error it's a different issue, and you should contact Vimeo at https://vimeo.com/help/contact
It looks like you are trying to use API3.
Currently API3 is in beta and has to be explicitly enabled: go to the URL for editing your app, "developer.vimeo.com/apps/:id/edit", and add the query string "?oauth2=enabled".
Your final URL should look like "developer.vimeo.com/apps/:id/edit?oauth2=enabled". Now save your app again with the new "OAuth 2" check box ticked and try your search request again.

Track incoming Referring site via link in PDF file?

I have recently placed an ad in a weekly publication that sends out a PDF file. My ad is directly linked so that the reader can click on it and go to my website. The PDF file is hosted on a different server, but is, in fact, a PDF file that has to be downloaded and viewed on that site, not emailed or shared that way. I have Google Analytics and a couple other stats tracking programs installed and I can't see the referring URL from this other site at all, in anything. Is there something I can ask the designer of the PDF file to include in her links to make them trackable? Or is this simply not possible?
Use Google Analytics Campaign Tagging.
This tool will help set it up. You'll want to classify the variables such that the source and the medium are set, at minimum.
http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=55578
So, for example, if your URL is http://example.com, you could set the parameters as such:
utm_source: BlahNews
utm_medium: newsletter
utm_campaign: july10issue
Your resulting URL would be http://example.com/?utm_source=BlahNews&utm_medium=newsletter&utm_campaign=july10issue
Google Analytics would track these hits under that Campaign, Source and medium.
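If you need to tag many links, the same URL can be assembled programmatically. Below is a minimal Python sketch using only the standard library; the function name tag_campaign_url is made up for this example, and the parameter values are the ones from above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl


def tag_campaign_url(url, source, medium, campaign):
    """Return url with utm_source, utm_medium and utm_campaign appended."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    # A bare domain has an empty path; use "/" so the result is well formed.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


tagged = tag_campaign_url(
    "http://example.com", "BlahNews", "newsletter", "july10issue"
)
print(tagged)
# http://example.com/?utm_source=BlahNews&utm_medium=newsletter&utm_campaign=july10issue
```

Values are URL-encoded by urlencode, so a source name containing spaces or ampersands will not break the link.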
If the URL is displayed raw and you want to avoid showing an ugly URL, you could set up an internal redirect to that URL. It looks like you're using WordPress, and there are a few free plugins that manage redirects like this (I happen to like 'Redirection').
So, you could tell the plugin to redirect
http://example.com/blahnews TO http://example.com/?utm_source=BlahNews&utm_medium=newsletter&utm_campaign=july10issue
Can you ask them to put some token in the query string of the URL to the site?