How do the Facebook Like button and Google +1 button deal with a redirected URL?

I understand the og:url meta tag is the canonical URL for the resource in the Open Graph.
What strategies can I use if I wish to support 301 redirecting of the resource, while preserving its place in the Open Graph? I don't want to lose my likes because I've changed the URLs.
Is the best way to do this to store the original URL of the content and refer to that? Are there any other strategies for dealing with this?
To clarify - I have page:
/page1, with an og:url of http://www.example.com/page1
I now want to move it to
/page2, using a 301 redirect to http://www.example.com/page2
Do I have any options to avoid losing the likes and comments other than setting the og:url meta to /page1?

Short answer, you can't.
Once the object has been created on Facebook's side its URL in Facebook's graph is fixed - the Likes and Comments are associated with that URL and object; you need that URL to be accessible by Facebook's crawler in order to maintain that object in the future. (note that the object becoming inaccessible doesn't necessarily remove it from Facebook, but effectively you'd be starting over)
What I usually recommend here is (with examples http://www.example.com/oldurl and http://www.example.com/newurl):
On /newurl, keep the og:url tag pointing to /oldurl
Add an HTTP 301 redirect from /oldurl to /newurl
Exempt the Facebook crawler from this redirect
Continue to serve the meta tags for the page on http://www.example.com/oldurl if the request comes from the Facebook crawler.
No need to return any actual content to the crawler, just a simple HTML page with the appropriate tags
Thus:
Existing instances of the object on Facebook will, when clicked, bring users to the correct (new) page via your redirect
The Like button on the (new) page will still produce a like of the correct object (but at the old URL)
If you're moving a lot of URLs around or completely rewriting your URL scheme you should use the new URLs for new articles/products/etc, but you'll need to keep the redirect in place if you want to retain likes, comments, etc on the older content.
This includes if you're changing domain.
The only problem here is maintaining the old URL -> new URL mapping somewhere in your code, but it's not technically difficult, just an additional thing to maintain in the future.
BTW, the Facebook crawler UA is currently facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
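A minimal PHP sketch of the crawler exemption described above, assuming the /oldurl and /newurl examples from earlier and a simple user-agent substring check (how you detect the crawler, and the og: values, are assumptions here, not a definitive implementation):
<?php
// Sketch: serve the old page's Open Graph tags to Facebook's crawler,
// and give everyone else a 301 from /oldurl to /newurl.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isFacebookCrawler = stripos($userAgent, 'facebookexternalhit') !== false;
if ($isFacebookCrawler) {
    // A bare-bones HTML page with the appropriate tags; no real content needed.
    header('Content-Type: text/html; charset=utf-8');
    echo '<!DOCTYPE html><html><head>';
    echo '<meta property="og:url" content="http://www.example.com/oldurl" />';
    echo '<meta property="og:title" content="Example article" />';
    echo '<meta property="og:type" content="article" />';
    echo '</head><body></body></html>';
} else {
    header('Location: http://www.example.com/newurl', true, 301);
}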

I'm having the same problem with my old sites. Domains are changing, admins want to change URLs for SEO, etc.
I came to the conclusion that it's best to have some sort of unique ID in the database just for Facebook, right from the beginning. For articles, for example, I have myurl.com/a/123, where 123 is the ID of the article.
The real URL is myurl.com/category/article-title. The article can then be put in a different category, renamed, etc., with extensive logic for 301 redirects behind it, but the basic Facebook identifier can stay the same forever.
Of course this is viable only when starting with a fresh site or when implementing Facebook comments for the first time.
Just an idea if you can plan ahead :) Let me know what you think.
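A rough sketch of what the /a/123 handler could do, assuming the ID is rewritten into a query parameter and using a hypothetical getArticleSlugById() lookup (not a real library function):
<?php
// Sketch: the stable, Facebook-facing URL /a/123 simply 301s to wherever
// the article currently lives, so the identifier never has to change.
// getArticleSlugById() is a hypothetical lookup you would implement yourself.
$articleId = (int) (isset($_GET['id']) ? $_GET['id'] : 0);  // e.g. /a/123 rewritten to ?id=123
$slug = getArticleSlugById($articleId);                     // e.g. "category/article-title"
if ($slug !== null) {
    header('Location: http://myurl.com/' . $slug, true, 301);
} else {
    http_response_code(404);
}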

Related

Deprecated domain in Google index

We have a deprecated domain, www.deprecateddomain.com. The specific fact is that we have a reverse proxy working and redirecting all requests from this domain to the new one, www.newdomain.com.
The problem is that when you type "deprecateddomain.com" into Google search, there is a link to www.deprecateddomain.com in the search results alongside results for "newdomain.com". That means there are still such entries in Google's index. Our customer doesn't want to see links to the old site.
It was suggested that we create a fake robots.txt with a Disallow: / directive for www.deprecateddomain.com, plus reverse proxy rules to serve this file from some directory. But after investigating the subject, I started doubting that it will help. Will it remove the old domain's entries from the index?
Why not just create a request in Search Console to remove www.deprecateddomain.com from the index? In my opinion it might help.
Anyway, I'm a novice on this subject. Could you give me advice on what to do?
Google takes time to remove old/obsolete entries from its rankings, especially for low-traffic or low-value pages. You have no control over it. Google needs to revisit each page to see the redirect you have implemented.
So DO NOT implement a disallow on the old website, because it will make the problem worse. Bots won't be able to crawl those pages and see the redirect you have implemented, so those pages will stay in the rankings longer.
You must also make sure you implement a proper 301 redirect (i.e. a permanent one, not a temporary one) for all pages of the old website. Otherwise, some pages may stay in the rankings for quite some time.
If some pages are obsolete and should be deleted rather than redirected, return a 404 for them. Google will remove them quickly from its index.
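For example, the backend behind the reverse proxy could distinguish the two cases roughly like this; this is only a sketch, and the mapping entries are invented for illustration:
<?php
// Sketch: 301 old pages that have a new home, 404 the obsolete ones.
// The mapping below is purely illustrative.
$redirectMap = array(
    '/about.html'    => 'http://www.newdomain.com/about/',
    '/products.html' => 'http://www.newdomain.com/products/',
);
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (isset($redirectMap[$path])) {
    header('Location: ' . $redirectMap[$path], true, 301);  // permanent redirect
} else {
    http_response_code(404);  // obsolete page: let Google drop it from the index
}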

Is rel=self the correct rel tag to use for forum permalinks?

I have been building a forum from scratch with my friends just for fun, and we're starting to see bots and scrapers go by. The problem we're having is that you can load a page /post/1 with four replies, and each reply includes a little permalink to itself /reply/1#reply-1. If I am on /post/1 and navigate to /reply/1, I'll end up right back where I started, just with the anchor to the reply. But! Scrapers have no idea this is the case, so they're opening every /post link and then following every /reply link, and it's causing performance issues, so I've been looking around SEO sites to try to fix it.
I've started using rel=canonical on the /reply page to tell the bots they're all the same, but as far as I can tell that doesn't help me until the bot has already loaded the page, and thus I wind up with tons of traffic. Would it be correct to change my
<a href="/reply/1#reply-1">Permalink</a>
tags to
<a href="/reply/1#reply-1" rel="self">Permalink</a>
since they should be the same content? Or would this be misusing rel="self", and is there another, better rel tag I should be using instead?
The self link type is not defined for HTML (but for Atom), so it can’t be used in HTML5 documents.
The canonical link type is appropriate for your case (if you make sure that it always points to the correct page, in case the thread is paginated), but it doesn’t prevent bots from crawling the URLs.
If you want to prevent crawling, no link type will help (not even the nofollow link type, but it’s not appropriate for your case anyway). You’d have to use robots.txt, e.g.:
User-agent: *
Disallow: /reply/
That said, you might want to consider changing the permalink design. I think it’s not useful (neither for your users nor for bots) to have such an architecture. It’s a good practice to have exactly one URL per document, and if users want to link to a certain post, there is no reason to require a new page load if it’s actually the same document.
So I would either use the "canonical" URL and add a fragment component (/post/1#reply-1, or what might make more sense: /threads/1#post-1), or (if you think it can be useful for your users) I would create a page that only contains the reply (with a link back to the full thread).
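If you keep the existing /reply/ URLs working for old links, one option is to make them nothing more than a cheap 301 to the fragment URL. A sketch, with a hypothetical getPostIdForReply() lookup and the ID assumed to be rewritten into a query parameter:
<?php
// Sketch: turn /reply/{id} into a cheap 301 to the thread URL plus a fragment,
// instead of rendering the whole thread again under a second URL.
// getPostIdForReply() is a hypothetical lookup you would implement yourself.
$replyId = (int) (isset($_GET['id']) ? $_GET['id'] : 0);  // e.g. /reply/1 rewritten to ?id=1
$postId  = getPostIdForReply($replyId);
if ($postId !== null) {
    header('Location: /post/' . $postId . '#reply-' . $replyId, true, 301);
} else {
    http_response_code(404);
}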

How do I create "internal Outlinks" for better SEO?

I was searching on the web after I analyzed the link structure of Yoast. There he uses links to redirect users to a different page.
Here is an example:
https://yoast.com/out/synthesis/
Can someone tell me what this is called, or how I create such links as well?
It's actually really simple. He isn't using it for SEO purposes, since it's just a 301 redirect. He is purposefully hiding the affiliate URL AND adding 'onclick' Google Analytics tracking to the link. Also, the "/out/" directory is being blocked by robots.txt and then redirects back to the index page.
To answer your question:
This is not for SEO reasons. He is using it for both tracking clicks and hiding his affiliate link/URL.
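A hedged sketch of how such an /out/ handler might be built in PHP; the slug and target URL are made up, and this is not Yoast's actual implementation:
<?php
// Sketch: /out/{slug} does a plain 301 to the (hidden) affiliate URL.
// Pair it with "Disallow: /out/" in robots.txt so crawlers skip these URLs.
// The slug and target below are purely illustrative.
$outLinks = array(
    'synthesis' => 'http://affiliate.example.com/?ref=my-affiliate-id',
);
$slug = basename(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
if (isset($outLinks[$slug])) {
    header('Location: ' . $outLinks[$slug], true, 301);
} else {
    http_response_code(404);
}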
These are called internal links: links to one of your own domain or subdomain pages. Internal links add value for SEO because they make crawlers aware of those existing pages. There are many options for generating internal links, depending on your page structure: some of the common ones are an HTML sitemap (like TripAdvisor's) or header and footer links. For an HTML sitemap example, go to http://www.tripadvisor.com/ and scroll all the way down to the footer section. There you can find a sitemap link, which is a pathway to many internal links.

Removing URL duplicates when using pretty URLs

I'm using pretty URLs in my web app; one example is 'forum/post/1', which invokes PostController in the Forum module, which loads the post with id=1. This is what I need, but that post is also accessible from 'forum/post/view/id/1'. That's bad, because search crawlers don't like it when the same page is accessible from several URLs, right?
I'm using the Yii framework, which supports a 'useStrictParsing' option that requires every incoming request to match at least one "pretty" route; otherwise the request fails with a 404. However, it's not a perfect solution, because I don't have pretty URLs for every controller/action.
Ideally, the framework should redirect 'forum/post/view/id/1' to 'forum/post/1' with a 301 status code. How did you solve this problem? It's not a Yii/PHP-specific question: how does your framework/tool deal with it?
The best way to make sure search engines rank only one page (the pretty URL) over another, if there are multiple ways to view the content, is to use a canonical tag within the head of your document:
<link rel="canonical" href="http://www.mydomain.com/nice-url/" />
This is very useful with Windows-based systems, as IIS is not case sensitive with its web pages, but the web standard is case sensitive.
So
www.maydomain.com/Newpage.aspx
www.maydomain.com/newpage.aspx
www.maydomain.com/NEWPAGE.aspx
These are all seen by Google as different pages, and you are then marked down for having a site with duplicate content. Not so with a canonical: each page in the case above would have the same canonical tag, and that URL is the only one which will be used by the search engines.
Provided that no one links to your non-pretty URLs, the search engines will never know that they exist.
If you do want to eliminate them, you could bypass your web framework by adding an alias in your web server's configuration file; the URL will be redirected before it ever reaches the framework.
Frameworks like Django, which don't provide 'magic' routing, don't face this issue; the only routes that exist are those you define manually. In that case, you could define a view for the non-pretty URL which returns the appropriate redirect.
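As a framework-agnostic illustration in plain PHP (this is not Yii's actual API), the non-pretty route could simply issue the 301 itself before normal dispatch; the URL parsing here is deliberately simplistic:
<?php
// Sketch: if the request came in via the non-pretty route (forum/post/view/id/1),
// issue a 301 to the pretty equivalent (forum/post/1) and stop.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (preg_match('#^/forum/post/view/id/(\d+)$#', $path, $m)) {
    header('Location: /forum/post/' . $m[1], true, 301);
    exit;
}
// ...otherwise fall through to the normal controller/action dispatch.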

Google Page Rank - New Domain / Link Structure Migration

I've been tasked with reorganizing a pure HTML site into a CMS. If all goes well, the new site will eventually become the main URL, and the old domain will be phased out. The old domain has a decent enough page rank, and the company wishes to mitigate any loss of page rank. In looking over the options available, I've discovered a few things:
It's better to use a 301 redirect when you're ready to make the switch (source).
The current site does not have a sitemap, so adding one and submitting it may help their future page rank.
I'll need to suggest to them that they contact people currently linking to them to update their links.
The process for regaining an old page rank takes a while, so plan on rebuilding links while we see if the new site is flexible enough to warrant switching over completely.
My question is: as a result of the move to a CMS-driven site, the links to various pages will change to accommodate the new structure. Will this be an issue for trying to maintain (or improve) the current page rank? What sort of methods are available to mitigate the issue of changing individual page URLs? Is there a preferable method beyond mapping individual pages to their new locations with 301 redirects? (The site has literally hundreds of pages, ugh...)
Ex.
http://domain.com/Messy_HTML_page_with_little_categorization.html ->
http://newdomain.com/nice/structured/pages.php
I realize this isn't strictly a programming question; however, I felt the information could be useful to developers who are tasked with handling this sort of thing in addition to developing the site.
If you truly want to ensure that page rank is not lost, you will want to replace the old content with something that performs a proper 301 redirect to the new location. With a 301 redirect, the search spiders will know that the content has moved, and the page rank typically carries over. It also keeps existing external links working.
However, the downside is that after a certain period of time you just have to get rid of the old domains.
You can make a handler for HTML files and map the old pages to the new structure with a 301 redirect.
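For example, a catch-all handler registered for the old .html URLs could look roughly like this; it's only a sketch, and the single mapping entry just reuses the example URLs from the question:
<?php
// Sketch: handler for legacy *.html requests on the old domain,
// mapping each old page to its new CMS URL with a 301.
$map = array(
    '/Messy_HTML_page_with_little_categorization.html'
        => 'http://newdomain.com/nice/structured/pages.php',
);
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (isset($map[$path])) {
    header('Location: ' . $map[$path], true, 301);  // permanent: page rank should carry over
} else {
    http_response_code(404);
}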