Google duplicate content issue for social network applications - seo

I am making a social network application where users will come and share posts, like Facebook. But now I have a doubt: let's say a user shares content by copying it from another site, and the same goes for images. Does the Google crawler consider that duplicate content or not?
If yes, how can I tell the Google crawler "don't consider it spam; it's a social networking site and the content is shared by the user, not by me"? Is there any way or technique that would help me?

Google might consider it to be duplicate content, in which case the search algorithm will choose one version - the one it believes to be the original or more important - and drop the other.
This isn't a bad thing per se - unless you see that most of your site's content is becoming duplicated.
You can use canonical URL declarations to do what you are describing, but I wouldn't advise it.

If your website is one of these types - a forum or an e-commerce site - it will generally not be punished for duplicate content, and a "social platform" is arguably a type of forum.
If your pages are too similar, the two or more similar pages will split the click-through rate, traffic, etc., so their rank in the SERPs may suffer.
I suggest being careful with "canonical": it tells crawlers to consolidate indexing signals onto the canonical page, so if you use it, you will see the number of indexed pages in the webmaster tools decrease a lot.
Don't worry too much about the duplicate content issue. You can read this article: Google's Matt Cutts: Duplicate Content Won't Hurt You, Unless It Is Spammy

Related

SEO: secure pages and rel=nofollow

Should one apply rel="nofollow" attribute to site links that are bound for secure/login required pages?
We have a URI date-based link structure where the previous year's news content is free, while the current year, and any year prior to the last, are paid, login-required content.
The net effect is that when doing a search for our company name in google, what comes up first is Contact, About, Login, etc., standard non-login required content. That's fine, but ideally we have our free content, the pages we want to promote, shown first in the search engine results.
Toward this end, the link structure now generates rel="follow" for the free content we want to promote, and rel="nofollow" for all paid content and Contact, About, Login, etc. screens that we want at the bottom of the SEO search result ladder.
I have yet to deploy the new linking scheme for fear of, you know, blowing up the site SEO-wise ;-) It's not in great shape to begin with, despite our decent ranking, but I don't want us to disappear either.
Anyway, words of wisdom appreciated.
Thanks
nofollow
I think Emil Vikström is wrong about nofollow. You can use the rel value nofollow for internal links; neither the microformats spec nor the HTML5 spec says otherwise.
Google even gives such an example:
Crawl prioritization: Search engine robots can't sign in or register as a member on your forum, so there's no reason to invite Googlebot to follow "register here" or "sign in" links. Using nofollow on these links enables Googlebot to crawl other pages you'd prefer to see in Google's index. However, a solid information architecture — intuitive navigation, user- and search-engine-friendly URLs, and so on — is likely to be a far more productive use of resources than focusing on crawl prioritization via nofollowed links.
This does apply to your use case, so you could nofollow the links to your login page. Note, however, that if you also meta-noindex them, people who search for "YourSiteName login" probably won't get the desired page in their search results.
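As a concrete sketch (the paths are invented, not from the question), nofollowing such links is just a rel attribute on the anchor:

```html
<!-- Hypothetical navigation: nofollow keeps Googlebot from spending crawl
     budget on pages it can't use, such as login or registration forms. -->
<a href="/login" rel="nofollow">Sign in</a>
<a href="/register" rel="nofollow">Register</a>

<!-- Ordinary links need no rel value; the absence of nofollow means "follow". -->
<a href="/news/2010/some-free-article">Some free article</a>
```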
follow
There is no rel value "follow". It's not defined in the HTML5 spec nor in the HTML5 Link Type extensions. It isn't even mentioned in http://microformats.org/wiki/existing-rel-values at all. A link without the rel value nofollow is automatically a "follow link".
You can't overwrite a meta-nofollow for certain links (the two nofollow values even have different semantics).
Your case
I'd use nofollow for all links to restricted/paid content. I wouldn't nofollow the links to the informational pages about the site (About, Contact, Login), because they are useful, people might search especially for them, and they give information about your site, while all the content pages give information about the various topics.
Nofollow is only for external links; it does not apply to links within your own domain. Search engines will try to give the most relevant content for the query asked, and they generally actively avoid taking the website owner's wishes into account. Thus, nofollow will not help you here.
What you really want to do is make the news content the best choice for a search on your company name. A user searching for your company name may do this for two reasons: They want your homepage (the first page) or they more specifically want to know more about your company. This means that your homepage as well as "About", "Contact", etc, are generally actually what the user is looking for and the search engines will show them at the top of their results pages.
If you don't want this you must make those pages useless for one wanting to know more about your company. This may sound really silly. To make your "About" and "Contact" pages useless to one searching for your company you should remove your company name from those pages, as well as any information about what your company does. Put that info on the news pages instead and the search engines may start to rank the news higher.
Another option is to not let the search engine index those other pages at all by adding them to a robots.txt file.
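A minimal sketch of that robots.txt approach, assuming the informational pages live under these paths (the paths are invented):

```
# Hypothetical robots.txt - adjust the paths to your site's real structure.
User-agent: *
Disallow: /about
Disallow: /contact
Disallow: /login
```

One caveat: robots.txt only blocks crawling, not indexing; a blocked URL can still show up in results (without a snippet) if other sites link to it. A noindex meta tag is the stronger tool if a page must stay out of the index entirely.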

SEO: Allowing crawler to index all pages when only few are visible at a time

I'm working on improving the site for the SEO purposes and hit an interesting issue. The site, among other things, includes a large directory of individual items (it doesn't really matter what these are). Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
The directory is large - it has about 100,000 items. Naturally, only a few items are listed on any given page. For example, on the main site homepage there are links to about 5 or 6 items, on some other page there are links to about a dozen different items, etc.
When real users visit the site, they can use the search form to find items by keyword or location, producing a list that matches their search criteria. However, when, for example, a Google crawler visits the site, it won't attempt to type text into the keyword search field and submit the form. Thus, as far as the bot is concerned, after indexing the entire site it has covered only a few dozen items at best. Naturally, I want it to index each individual item separately. What are my options here?
One thing I considered is to check the user agent and IP ranges and if the requestor is a bot (as best I can say), then add a div to the end of the most relevant page with links to each individual item. Yes, this would be a huge page to load - and I'm not sure how google bot would react to this.
Any other things I can do? What are best practices here?
Thanks in advance.
One thing I considered is to check the user agent and IP ranges and if the requestor is a bot (as best I can say), then add a div to the end of the most relevant page with links to each individual item. Yes, this would be a huge page to load - and I'm not sure how google bot would react to this.
That would be a very bad thing to do. Serving up different content to the search engines specifically for their benefit is called cloaking and is a great way to get your site banned. Don't even consider it.
Whenever a webmaster is concerned about getting their pages indexed, an XML sitemap is an easy way to ensure the search engines are aware of your site's content. They're very easy to create and update, too, if your site is database driven. The XML file does not have to be static, so you can dynamically produce it whenever the search engines request it (Google, Yahoo, and Bing all support XML sitemaps). You can find out more about XML sitemaps at sitemaps.org.
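To make the "dynamically produce it" idea concrete, here is a minimal Python sketch that renders a list of item URLs as a Sitemaps-protocol document (the domain and item IDs are the question's examples and invented values, not real pages):

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Render an iterable of absolute URLs as a sitemaps.org XML document."""
    entries = "\n".join(
        "  <url><loc>{}</loc></url>".format(escape(u)) for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "\n</urlset>"
    )


# In practice the IDs would come from the items table in the database.
item_ids = [101, 102, 103]
xml = build_sitemap(
    "http://www.mysite.com/item.php/{}/title".format(i) for i in item_ids
)
print(xml)
```

Serving this from a script mapped to /sitemap.xml means the sitemap always reflects the current database contents; for 100,000 items you would paginate it into a sitemap index file, since the protocol caps each file at 50,000 URLs.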
If you want to make your content available to search engines and want to benefit from semantic markup (i.e. HTML), you should also make sure all of your content can be reached through hyperlinks (in other words, not through form submissions or JavaScript). The reason for this is twofold:
The anchor text in the links to your items will contain the keywords you want to rank well for. This is one of the more heavily weighted ranking factors.
Links count as "votes", especially to Google. Links from external websites, especially related websites, are what you'll hear people recommend the most and for good reason. They're valuable to have. But internal links carry weight, too, and can be a great way to prop up your internal item pages.
(Bonus) Google has PageRank which used to be a huge part of their ranking algorithm but plays only a small part now. But it still has value and links "pass" PageRank to each page they link to increasing the PageRank of that page. When you have as many pages as you do that's a lot of potential PageRank to pass around. If you built your site well you could probably get your home page to a PageRank of 6 just from internal linking alone.
Having an HTML sitemap that somehow links to all of your products is a great way to ensure that search engines, and users, can easily find all of your products. It is also recommended that you structure your site so more important pages are closer to the root of your website (home page), branching out to sub-pages (categories) and then to specific items. This gives search engines an idea of which pages are important and helps them organize them (which helps them rank them). It also helps them follow those links from top to bottom and find all of your content.
Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
This is also bad for SEO. When you can pull up the same page using two different URLs you have duplicate content on your website. Google is on a crusade to increase the quality of their index and they consider duplicate content to be low quality. Their infamous Panda Algorithm is partially out to find and penalize sites with low quality content. Considering how many products you have it is only a matter of time before you are penalized for this. Fortunately the solution is easy. You just need to specify a canonical URL for your product pages. I recommend the second format as it is more search engine friendly.
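As a sketch using the question's example URLs (the id 123 is invented), the item page would declare its preferred version in its <head>, so that both URL forms consolidate onto one:

```html
<!-- Served on both http://www.mysite.com/item.php?id=123 and
     http://www.mysite.com/item.php/123/title; this tells search engines
     which version should receive the ranking signals. -->
<link rel="canonical" href="http://www.mysite.com/item.php/123/title" />
```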
Read my answer to an SEO question at the Pro Webmaster's site for even more information on SEO.
I would suggest for starters having an xml sitemap. Generate a list of all your pages, and submit this to Google via webmaster tools. It wouldn't hurt having a "friendly" sitemap either - linked to from the front page, which lists all these pages, preferably by category, too.
If you're concerned with SEO, then having links to your pages is hugely important. Google could see your page and think "wow, awesome!" and give you lots of authority - this authority (some like to call it "link juice") is then passed down to pages that are linked from it. You ought to make a hierarchy of pages, with more important ones closer to the top, and/or make it wide instead of deep.
Also, showing different stuff to the Google crawler than the "normal" visitor can be harmful in some cases, if Google thinks you're trying to con it.
Sorry -- A little bias on Google here - but the other engines are similar.

SEO Question, and about Server.Transfer (Asp.net)

So, we're trying to up our application in the rankings in the search engines, and one way our SEO guy told us to do that was to register similar domains...for example we have something like
http://www.myapplication.com/parks.html
so..we acquired the domain parks.com (again just an example).
Now when people go to http://www.parks.com ...we want it to display the content of http://www.myapplication.com/parks.html.
I could just put a forwarding page there, but from what I've been told that makes us look bad because it's technically a permanent redirect - and we're trying to get higher in the search engine rankings, not lower.
Is this a situation where we would use the Server.Transfer method of ASP.net?
How are situations like this handled? I've definitely seen this done by many websites.
We also don't want to cheat the system - we are showing relevant content, not spam, and not tricking customers in any way - so the proper way to achieve what I'm looking for would be great.
Thanks
Use your "similar" domain names to host individual and targeted landing pages that point to your master content.
It's easier to manage and you will get a higher conversion rate.
Having to create individual pages will force you to write relevant content and will increase the popularity of those pages.
I also suggest building not only landing pages but mini-sites (of a few pages).
SEO is a very demanding task.
Regarding technical aspects: Server.Transfer is what you should use. Avoid Response.Redirect, which issues a temporary (302) redirect; Google and other search engines treat those less favourably and it can hurt your ranking.
I used permanent URL rewriting in the past. I changed my website, and since lots of traffic was coming from other websites linking to mine, I wanted a permanent solution.
Read more about URL rewriting : http://msdn.microsoft.com/en-us/library/ms972974.aspx

Is a deep directory structure a bad thing for SEO?

A friend of mine told me that the company he works at is redoing the SEO for their large website. Large == both the number of pages and the traffic they get a day.
Currently they have (quote) a "deeply nested site", which I'm assuming means /x/y/z/a/b/c.. or something. I also know it's very unRESTful from some of the pages I've seen -> e.g. foo.blah?a=1&b=2&c=3......z=24 (yep, lots of crap in the url).
So updating their SEO sounds like a much needed thing.
But, they are going flat. I mean -> totally flat. eg. /foo-bar-pew-pew-abc-article1
This scares the bollox out of me.
From what he said (if I understood him right), each - character doesn't mean a new hierarchical level.
So /foo-bar-pew-pew-abc-article1 does not mean /foo/bar/pew/pew/abc/article1.
A space would be replaced by a -. A + represents a space, but only if the two words are supposed to be one word (whatever that means). I.e. Jean-Luke would be jean+luke, but a subject like 'hello world' would be listed as 'hello-world'.
Excuse me while I blow my head up.
Is this just me, or is it totally silly to go completely flat? I was under the impression that when SEO people say keep it as flat as possible, they mean keep it to 1 or 2 levels, with 4 as the utter max.
Is it just me, or is a flat hierarchy a 'really really good thing' for SEO ... for MEDIUM and LARGE sites (lots of resources, not necessarily lots of hits/page views)?
Well, let's take a step back and look at what SEO is supposed to accomplish; it's meant to help a search engine identify quality, relevant content for users based on key phrases and terms.
Take, for example, the following blog URLs:
* http://blog.example.com/articles/2010/01/20/how-to-improve-seo/
* http://blog.example.com/how-to-improve-seo/
Yes, one is deep and the other is flat; but the URL structure is important for two reasons:
URL terms and phrases are high-value targets for determining relevance of a page by a search engine
A confusing URL may immediately force a user to skip your link in the search results
Let's face it: Google and other search engines can associate even the worst URLs with relevant content.
Take, for example, a search for "sears kenmore white refrigerator" in Google: http://www.google.com/search?q=sears+kenmore+white+refrigerator&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a.
Notice the top hit? The URL is http://www.sears.com/shc/s/p_10153_12605_04665802000P , and yet Google replaces the lousy URL with www.sears.com › Refrigerators › Top Freezers. (Granted, 2 results down is the true URL.)
If your goal for SEO is optimized organic relevance, then I would wholeheartedly recommend generating either key/value pairs in the URL, like www.sears.com/category/refrigerators/company/kenmore (meh), or phrase-like URLs like www.sears.com/kenmore/refrigerators/modelNumber. You want to align your URLs with the user's search terms and phrases to maximize your effort.
In the end, if you offer valuable content and you structure your content and site properly, the search engines will accurately gather it. You just need to help them realize how specific and authoritative your content is. :)
Generally, the less navigation needed to reach content the better. But with a logical breadcrumb strategy and well-thought-out deep linking, excess directory depth can be managed so it doesn't hurt SEO and visibility in search.
Remember that Google is trying to return the most relevant link and the best user experience, so if your site has 3 URLs coming up for the same search term and it takes 2 or 3 clicks to find the appropriate content, Google will read that as bad and start lowering all of your URLs in the SERPs.
You have to consider how visitors will find your content - not just how they navigate it. Think content discovery, not just navigation.
HTH
Flat or deeply nested really shouldn't affect the SEO. The key part is how those individual pages are linked to, which will determine how they get ranked. I did write some basic stuff on this years ago (see here), but essentially, as long as pages are not buried deeply within a site - i.e. it doesn't take several clicks (or links, from Google's perspective) to reach them - they should rank fairly much the same in either case. Google used to put a lot more weight on keywords in URLs, but this has been scaled back in more recent algorithm changes. It helps to have keywords there, but it's no longer the be-all and end-all.
What you/they will need to consider are the following two important points:
1) How will the URL structure be perceived by the users of the site? Will they be able to easily navigate the site without having to rely on the URL structure in the address bar?
2) When making navigational changes such as this, it is vitally important to set up redirects from the old URLs. Google hates 404s; you should return either a 410 (Gone) HTTP response for pages that are no longer valid, or a 301 HTTP response for permanent redirects (pointing at the new URL).
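Assuming an Apache server (an assumption - other servers have equivalent mechanisms) and invented example paths, those redirects and gone responses could be sketched like this:

```apache
# Hypothetical mappings: old deep URLs 301-redirect to the new flat ones.
Redirect permanent /x/y/z/old-article /foo-bar-pew-pew-abc-article1

# Pages that are gone for good return 410 so crawlers drop them quickly.
Redirect gone /x/y/z/retired-article
```

For a site with many thousands of URLs you would generate these mappings from the database, or use a rewrite rule that derives the new URL from the old one, rather than listing each page by hand.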
In making any large changes such as this you can save loads of time getting the site indexed successfully by utilising XML sitemaps and Google's webmaster console.

Does a "blog" sub-domain help the pagerank of your main site?

I have my main application site https://drchrono.com, and I have a blog sub-domain under http://blog.drchrono.com. I was told by some bloggers that the blog sub-domain of your site helps the pagerank of your main site. Does traffic to your blog sub-domain help the Google Pagerank of your site and count as traffic to your main site?
I don't think Google gives any special treatment to sub domains named "blog". If they did, that would be a wide open door for abuse, and they're smart enough to realize that.
At one time, I think there were advantages to putting your blog on a separate subdomain though. Links from your blog to your main site could help with your main site's page rank if your blog has a decent page rank.
However, it seems like that has changed. Here's an interesting post about setting up blog subdomains vs. folders. It seems like they are actually treated the same by Google now, although nobody but Google really knows for sure how they treat them.
With regard to traffic, your Google ranking is only incidentally related to the amount of traffic your site gets. Google rankings are based primarily on content and number & quality of incoming links, not on how much traffic you get. Which makes sense since Google really has no way of knowing how much traffic you get to your site other than perhaps the traffic they send there via Google searches.
Not directly, but...
I do not know if "blog" specifically helps the pagerank of your site in some special way - google guards its pagerank secrets fairly well. If you really wanted to find out, you would create two sites with roughly the same content, one with blog in the domain name and one without, index them, and see if the pagerank values differ. My gut instinct is - no.
It is known that google indexes the name of the site, and it improves your chances of getting listed in the search results if the site name corresponds to the search terms. So it would be reasonable to assume (unless google specifically removed indexing of the word blog) that when someone searches for a main search term plus "blog", the chances of your site showing up would be slightly higher.
For example, it should help searches for: drchrono blog.
By the way, google changes its algorithms all the time, so this is just speculation.
According to an article on hubspot.com:
The search engines are treating subdomains more and more as just portions of the main website, so the SEO value of your blog is going to add to your main website domain. If you want your blog to be seen as part of your company, you should do it this way (or the next way).
however they go on to say there isn't a big difference between blog.domain.com and domain.com/blog
you can read the full article here: hubspot article on blog domains
One thing using a sub-domain will help is your site's Alexa rank.
Alexa gives rank to all pages under your main domain. If you use the Alexa Toolbar, you'll see all subdomains have the same rank as your main page. So hits to your subdomains will count toward your site's Alexa rank.
I don't think the subdomain will add anything to the pagerank, but it might make content easier to find than in a folder.
Let's say you search for something on google, from your page, I could search for
site:blog.drchrono.com someTopic or articleImLookingFor
Since it is a subdomain, I would guess it counts as traffic to the main site.
Personally, if I was to setup a blog, I would go for the subdomain and would probably set up a redirect from
http://drchrono.com/blog to
http://blog.drchrono.com
blog.domain.tld and www.domain.tld are not treated as unrelated sites, assuming they're handled by the same final NS authority. It has never been clear to me whether pages are ranked entirely independently, or whether a reputation for a domain, and hence its subdomains, figures into it beyond just being linked to.
But if I read your question differently, I'd say there's no difference in doing either:
I've tried setting up pages at both photos.domain.tld/stuffAboutPhotos and www.domain.tld/photos/stuffAboutPhotos for a month at a time. I found no noticeable difference between the search engine referral rates.
But then it's actually hard to do this independently of other factors.
Therefore I conclude that despite the human logic indicating that the domain is more important, there is no advantage to putting a keyword in the domain as opposed to the rest of the url, except to be sure it's clearly delimited (use slash, dash, or underscore in the rest of the url).
If Google has a shortlist of keywords that do rank better in a domain name than in the rest of the url, they're definitely not sharing it with anyone not wearing a Google campus dampened exploding collar.
Google treats a subdomain as a domain. If this weren't true, then all those blogspot blogs would rank higher in the SERPs.
With subdomains it is a bit easier, as Google "knows" it is a "separate" site; with sub-directories it is trickier, though with sub-domains the result is much the same. Google has ranked these anywhere between PR0 and PR3 in the past year; currently:
PR1: of-cour.se
Cheers!
Not really. Blogs do some nice things for the SEO of your sites, but if the blog is inside the site it doesn't work the same way.
A better option is have a completely separate domain that contains the blog (something like drchronoblog.com), and have lots of links from the blog site to the main site.
That way search engines see the links but do not make the connection between the blog and the main site, and thus it makes your page rank better.
It won't give your site higher priority just because you have a blog subdomain.
But I'm sure more people will find your site if they search for blogs.
And therefore more traffic, more visits through the search engines, and so on.
So I'd say yes :)
Since PageRank deals with rank in the search engines, let's make a little test:
https://www.google.com/search?q=blog
you may see that
example.com/blog
rank higher than
blog.example.com
It is almost the same picture for most domains.
However, if it were possible, I would fight harder to get blog.wordpress.com, as any search engine treats it as my own profile, than a folder named wordpress.com/blog, which for sure still belongs to wordpress.com.
The only way a blog can help you as far as SEO goes depends on the content of your blog. Just having a blog isn't enough.