How do sites like HubSpot track inbound links?

Are all these types of sites just illegally scraping Google or another search engine?
As far as I can tell there is no 'legal' way to get this data for a commercial site. The Yahoo! API ( http://developer.yahoo.com/search/siteexplorer/V1/inlinkData.html ) is only for noncommercial use, Yahoo! BOSS does not allow automated queries, etc.
Any ideas?

For example, if you wanted to find all the links to Google's homepage, search for
link:http://www.google.com
So if you want to find all the inbound links, you can simply traverse your website's tree and, for each page you find, query Google for:
link:URL
and you'll get a collection of all the links that Google knows about from other websites into your website.
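For illustration, here's a rough Python sketch of that loop. It assumes your site publishes a standard sitemap.xml (the URL is a placeholder), and it only prints the queries for you to run by hand, since firing them at Google automatically is against their terms of service (and Google has since restricted the link: operator anyway):

import urllib.request
import xml.etree.ElementTree as ET

# Placeholder sitemap URL; swap in your own.
SITEMAP_URL = "http://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Fetch and parse the sitemap, then emit one link: query per page.
with urllib.request.urlopen(SITEMAP_URL) as resp:
    tree = ET.parse(resp)

for loc in tree.findall(".//sm:loc", NS):
    print("link:" + loc.text.strip())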
As for the legality of such harvesting, I'm sure it's not-exactly-legal to make a profit from it, but that's never stopped anyone before, has it?
(So I wouldn't bother wondering whether they did it or not. Just assume they do.)

I don't know what HubSpot does, but if you want to find out what sites link to your site and you don't have the hardware to crawl the web, one thing you can do is monitor the HTTP_REFERER of visitors to your site. This is, for example, how Google Analytics (as far as I know) can tell you where your visitors are arriving from. It is not 100% reliable, since not all browsers set the header, particularly in "Privacy Mode", but you only need one visitor per link to know that it exists!
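As a rough sketch of the idea, here is a stdlib-only Python WSGI middleware that records the Referer header of each request (the log file name and the demo app are placeholders; any framework hook that exposes request headers would do the same job):

import logging
from wsgiref.simple_server import make_server

logging.basicConfig(filename="referrers.log", level=logging.INFO)

class ReferrerLogger:
    """Wraps a WSGI app and logs the Referer header of every request."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        referrer = environ.get("HTTP_REFERER")  # one "r", as in the HTTP spec
        if referrer:
            logging.info("%s -> %s", referrer, environ.get("PATH_INFO", "/"))
        return self.app(environ, start_response)

def app(environ, start_response):
    # Stand-in for your real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

if __name__ == "__main__":
    make_server("", 8000, ReferrerLogger(app)).serve_forever()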
In practice, this kind of collection is often accomplished by embedding a script into each of your web pages (often in a common header or footer). For example, if you examine the source of the page you are currently reading, you will find (right down at the bottom) a script that reports information about your visit back to Google.
Now this won't tell you if there are links out there that no one has ever used to get to your site, but let's face it, they are a lot less interesting than the ones people actually use.

Related

Track how often a link was clicked

I am currently running a website where I promote different coffees from pubs in my city. On my website I have links to the different coffees.
I have recently seen some of these links being shared on Facebook and other social networks.
So I was wondering whether it is somehow possible to track how often one of these links is clicked?
I have tried using redirects through my site, but then Facebook uses my pictures in the previews, which I don't want because it is misleading.
I have seen that this works with Bitly, so it must somehow be possible?
There are of course various services providing this, but it would be nice if it worked without relying on third-party services.
So basically I am looking for a solution that will let me know how often a link originating from my site was clicked on Facebook, Google+ or any other forum.
There definitely is. Try looking into Google Analytics; it will show you so much data about your personal websites and links that it can blow your mind! Here is the link:
Google Analytics helps you analyze visitor traffic and paint a complete picture of your audience and their needs. Track the routes people take to reach you and the devices they use to get there with reporting tools like Traffic Sources. Learn what people are looking for and what they like with In-Page Analytics. Then tailor your marketing and site content for maximum impact.
You can even get a free package to use!
Hope this helps!
Yes, you have plenty of analytics options.
Something as straightforward as Google Analytics, for example.
If you are using cPanel on your host's server, you also have options such as AWStats, which will provide similar information.
If all else fails, you can even use the request data recorded in your Apache/Nginx access logs; see the sketch below.
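For example, here is a rough Python sketch that counts hits per path straight out of a "combined"-format access log, which is the default for both Apache and Nginx (the log path and the /coffee/ prefix are placeholders):

import re
from collections import Counter

# Matches the request line inside a "combined"-format log entry.
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        m = LOG_LINE.search(line)
        if m and m.group("path").startswith("/coffee/"):
            counts[m.group("path")] += 1

# Print the most-requested links first.
for path, n in counts.most_common():
    print(f"{n:6d}  {path}")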
Since you have amended your question, you might want to check out this tool. It is not Google. :)
It is called ClickMeter; it performs link tracking and provides click reports, etc.

Automatic Google Indexing

We have implemented Google Site Search on our company website and need to automate Google's indexing of the site.
For example, when our customers update the forum, we need the up-to-date forum information to show in our forum search.
Is there any option in a Google API, or any other API, that would help?
You can use an XML sitemap. This tells the search engines where your content is so they can find it and crawl it. Keep in mind there is no way to make the search engines crawl your site exactly when you want them to; they crawl on a schedule they determine to be right for your site. (You can set a crawl rate in Google Webmaster Tools, but that rate is relative to the crawl rate Google has already set for you; setting it to fastest will not speed up their crawl.)
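If you want to generate the sitemap yourself, a minimal Python sketch looks like this (the URLs and change frequencies are placeholders; a real generator would usually pull them from your site's database or router):

import xml.etree.ElementTree as ET

# Placeholder pages; a real site would enumerate these programmatically.
pages = [
    ("http://www.example.com/", "daily"),
    ("http://www.example.com/forum/", "hourly"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, freq in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "changefreq").text = freq

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)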
Unfortunately, Google will only crawl your site when it feels like it. Many variables determine how often this occurs (site ranking, standards compliance, and so on). An XML sitemap is a useful way to help Google decide which parts of your site to index, but even if you don't have one, Google will find your content by crawling links on other parts of your site and will update its index when a page changes.
The more visitors you get, and the more often your site's links appear on other sites, the more frequently Google will index it.
To start, I'd suggest using http://validator.w3.org/ to validate your site and get it as close as possible to error-free. This makes it easier for Google to index your site, because it can find the information it expects without having to crawl over invalid markup. Also, chances are that a site that validates with very few errors is more credible than one containing many. It tells the search engine that you maintain your site so that almost all browsers can use it and that it is accessible.
Also validating your site gives you some bragging rights over those who don't meet W3 standards :)
Hope this helps!

How is it possible for new content to appear in Google results mere minutes after it is created?

For example, when I post to Stack Overflow, the post appears in the Google index a minute later. How is this accomplished? What do I have to do to my website to get the same frequency of indexing?
You could start by:
getting 65,000-odd regular users on your site;
getting your site linked to from all over the place;
making your site very active;
providing very useful content.
This is all standard SEO stuff which will up your "importance" in the eyes of Google (and other search engines, presumably, but who cares :-).
The faster a page changes, the more often Google will re-index it, provided, obviously, that your site is "important" enough for Google.
You should check out Google Webmaster Tools: http://www.google.com/webmasters/tools
To help with indexing from Google, but also Yahoo and MS, you'll want to use the Sitemaps protocol; see http://en.wikipedia.org/wiki/Sitemaps .
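Once your sitemap changes, you can also nudge the engines to re-fetch it. The major engines have offered simple HTTP "ping" endpoints for the Sitemaps protocol; these come and go, so verify them before relying on this sketch (the sitemap URL is a placeholder):

import urllib.parse
import urllib.request

SITEMAP = "http://www.example.com/sitemap.xml"  # placeholder

# Ping endpoints as documented for the Sitemaps protocol; check that
# they are still live before depending on them.
PING_ENDPOINTS = [
    "http://www.google.com/ping?sitemap=",
    "http://www.bing.com/ping?sitemap=",
]

for endpoint in PING_ENDPOINTS:
    url = endpoint + urllib.parse.quote(SITEMAP, safe="")
    with urllib.request.urlopen(url) as resp:
        print(url, "->", resp.status)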
Simply put, if you want to do that you first need to lure the Google robot to your site.
To do this, you should do these things:
Build as many hyperlinks between your site and high-ranked, active, relevant sites as possible.
Keep your own site active; that way, Google believes your site is worthwhile to visit frequently!
In addition, you can provide first-rate content and structure (a site map).
To sum it all up: you need to build a great site in the eyes of the search engines!
Good luck!

Is this a blackhat SEO technique?

I have a site which has been developed completely in Flash. The site owners do not want to shift to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site which the Googlebot will get redirected to (by checking the user agent). My question is: is this officially allowed by Google?
If not, then how come there are many subscription-based sites which display a different set of data to Google than to their users? Is that allowed?
Thank you very much.
I've dealt with this exact scenario for a large ecommerce site, and Google essentially ignored the site. Google considers it cloaking and addresses it directly here, saying:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Instead, create an ADA-compliant version of the website so that users with screen readers and vision aids can use it. As long as there is a link from your home page to your ADA-compliant pages, Google will index them.
The official advice seems to be: offer a visible link to a non-Flash version of the site. Fooling the Googlebot is a surefire way to get into trouble. And remember, Google results will link to the matching page, so do not create useless results.
Google already indexes Flash content, so my suggestion would be to check how your site is being indexed. Maybe you don't have to do anything.
I don't think showing an alternate version of the site is good from a Google perspective.
If you serve up your page at the exact same address, then you're probably fine. For example, if you show 'http://www.somesite.com/' to users but direct Googlebot to 'http://www.somesite.com/alt.htm', then Google might send search users to alt.htm. You don't want that, right?
This is called cloaking. I'm not sure exactly what the penalties are, but it is certainly not whitehat. I am pretty sure Google is working on a way to crawl Flash now, so it might not even be a concern.
I'm assuming you're not really doing a redirect but instead a PHP include or something similar, so it shows up as the same page. If you're actually redirecting, then Google is just going to index the other page as normal.
Some sites offer a different level of content: they LIMIT the content rather than offering alternative or additional content. This is generally done so that unrelated things don't get indexed.

How to find inbound links to a given URL on the fly?

Technorati's got their Cosmos API, which works fairly well but limits you to noncommercial use and no more than 500 queries a day.
Yahoo's got a Site Explorer InLink Data API, but it defines the task very literally, returning links from sidebar widgets in blogs rather than just links from inside blog content.
Is there any other alternative for tracking who's linking to a given URL (think of the discussion links that run below stories on Techmeme.com)? Or will I have to roll my own?
Well, it's not an API, but if you google (for example): "link:nytimes.com", the search results that come back show inbound links to that site.
I haven't tried to implement what you want yet, but the Google search API almost certainly has that functionality built in.
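For what it's worth, here is a rough Python sketch against the old Google AJAX Search API. That API has since been deprecated, and I haven't verified that it passes the link: operator through like normal web search, so treat the endpoint and response shape as historical and illustrative only:

import json
import urllib.parse
import urllib.request

# Historical (now deprecated) Google AJAX Search API endpoint.
query = urllib.parse.quote("link:nytimes.com")
url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=" + query

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# The old API wrapped results in responseData.results.
for result in data["responseData"]["results"]:
    print(result["url"])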
Is this for links to URLs under your control?
If so, you could whip up something quick that logs entries from the Referer HTTP header.
If you wanted to do this for an entire website without altering application code, you could implement it as an ISAPI filter or the equivalent for your web server of choice.
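As a sketch of the quick version, here is a Python script that pulls the distinct external referrers out of a "combined"-format access log (the log path and hostname are placeholders):

import re
from urllib.parse import urlparse

OWN_HOST = "www.example.com"  # placeholder: your own hostname
# A "combined"-format log line ends with: "referer" "user-agent"
REFERER = re.compile(r'"(?P<ref>https?://[^"]+)" "[^"]*"$')

inbound = set()
with open("/var/log/apache2/access.log") as log:
    for line in log:
        m = REFERER.search(line)
        # Skip internal navigation; keep only referrers from other hosts.
        if m and urlparse(m.group("ref")).netloc != OWN_HOST:
            inbound.add(m.group("ref"))

for ref in sorted(inbound):
    print(ref)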
Information available publicly from web crawlers is always going to be incomplete and unreliable (not that my solution isn't...).