URL scheme for a multi-version web app - seo

I'm looking for the best URL schema to use for a web app that has multiple versions, namely several languages and a simplified version for use by mobile phones - both aspects can be combined, so there's an English regular and mobile version, a German regular and mobile version, etc.
Goals (in order of importance):
User-friendliness
Search engine friendliness
Ease of development
Aspects to consider:
What should the URLs look like?
How should the user navigate between versions?
How much logic should there be to automatically decide on a version?
I'll describe my concept so far below, maybe some of you have better ideas.

My current concept:
When a new user arrives, the app decides, based on cookies (see below), the Accept-Language: header and the user agent string (used to identify mobile browsers) which version to show, but does not reflect this in the URL (no redirects)
It defaults to the non-simplified English version
There are prominently displayed icons (flags, a stylized mobile phone) to choose other versions
When the user explicitly chooses a different version, this is reflected both in a changed URL and a browser cookie
The URL schema is / for the "automatic" version, /en/, /de/, etc. for the language version, /mobile/ for the simplified version, /normal/ for the non-simplified one, and combinations thereof, e.g. /mobile/en/ and /normal/de/
mod_rewrite is used to strip these URL prefixes and convert them to GET parameters for the app to parse (a sketch follows below this list)
robots.txt disallows /mobile/ and /normal/
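For illustration, a rough sketch of what those rewrite rules and the robots.txt could look like. The entry point index.php and the parameter names layout, lang and path are assumptions for the sake of the example, not part of the concept itself:

# .htaccess -- strip the version prefixes and hand them to the app as GET parameters
RewriteEngine On
RewriteRule ^(mobile|normal)/(en|de)/(.*)$ index.php?layout=$1&lang=$2&path=$3 [L,QSA]
RewriteRule ^(mobile|normal)/(.*)$ index.php?layout=$1&path=$2 [L,QSA]
RewriteRule ^(en|de)/(.*)$ index.php?lang=$1&path=$2 [L,QSA]

# robots.txt -- keep the explicit layout prefixes out of the index
User-agent: *
Disallow: /mobile/
Disallow: /normal/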
Advantages:
The different language versions are all indexed separately by search engines
Cookies help, but are not necessary
There's a good chance that people will see the version that's ideal for them without having to make any choice
The user can always explicitly choose which version they want (this makes the /normal/ URL necessary)
Each version has a URL which will display exactly that version when passed to others
/mobile/ and /normal/ are ignored by search engines; they would only be duplicate content.
Disadvantages:
Requires heavy use of mod_rewrite, which I find rather cryptic
Users could send their current URL to someone and that person, when visiting it, could end up seeing a different version, which could cause confusion
There is still duplicate content between / and /en/ - I can't disallow / in robots.txt - should I trust the search engines not to penalize me for exact duplicate content on the same domain, or disallow /en/ and accept that people coming to / via a search engine may see a different version than what they found in the search engine?

I suggest subdomains, personally.
I wouldn't include mobile in the URL at all - use the user agent to determine this, and possibly a cookie in case the user wants to view the full site on their mobile (think how Flickr and Google do it). But for languages, yes - primary language at http://mydomain.com/, secondary languages at e.g. http://de.mydomain.com/ or http://fr.mydomain.com/

I am unclear why you would want to incorporate any kind of versioning information, such as accept-language or user-agent designations, in the URL scheme. The URL scheme should be indicative of the content only. The server should inspect the various request headers to determine how to retrieve and/or format the response.

Related

How to geotarget website with multiple languages but one link only?

I have a website with two languages, which works like this:
example.com/changelanguage.xx?lang=de
redirects to the German version, and calling the same URL again with
example.com/changelanguage.xx?lang=en
redirects to the English version.
The URL remains the same (example.com) after the redirect; only the language changes.
How to add the hreflang attribute here (for Google indexing)?
It’s a bad practice to use the same URL for different (i.e., translated) content.
Consumers, like search engine bots, would use rel-alternate + hreflang markup to find translations. For this to work, you have to provide a different URL for the translated page.
From the perspective of the search engine, it doesn’t work for their users if using the same URL: when they give http://example.com/foobar as search result, they want to make sure that their users get the language the search engine intended (e.g., someone searching for German terms should get the German page). But with your system, this doesn’t work; the search engine user might end up with the English version.
Instead, you should represent the language in the URL, e.g. with the language code as the first path segment:
http://example.com/en/contact
http://example.com/de/kontakt
(Or use different domains/subdomains, or add a query parameter, etc. If you can make sure that translated pages would never have the same URL slug, you could even omit the language codes.)
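For example, the rel-alternate-hreflang markup in the head of each of those two pages might look like this (a sketch based on the example URLs above; both elements go on both the English and the German page, so each translation points to the full set):

<link rel="alternate" hreflang="en" href="http://example.com/en/contact" />
<link rel="alternate" hreflang="de" href="http://example.com/de/kontakt" />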
This is a year late, but https://www.bablic.com/ does exactly this!
Furthermore, it can automatically detect the language set in the user's browser and show the user your website in that language!

Duplicate content and international sites clarification

Something is not clear; here is my case:
I want to have the same content for US and UK visitors.
Could I safely avoid duplicate content with these URLs:
www.example.us/info.html (hosted on us server)
www.example.co.uk/info.html (hosted on uk server)
From Google:
Websites that provide content for different regions and in different languages sometimes create content that is the same or similar but available on different URLs. This is generally not a problem as long as the content is for different users in different countries. While we strongly recommend that you provide unique content for each different group of users, we understand that this might not always be possible. There is generally no need to "hide" the duplicates by disallowing crawling in a robots.txt file or by using a "noindex" robots meta tag. However, if you're providing the same content to the same users on different URLs (for instance, if both example.de/ and example.com/de/ show German language content for users in Germany), you should pick a preferred version and redirect (or use the rel=canonical link element) appropriately. In addition, you should follow the guidelines on rel-alternate-hreflang to make sure that the correct language or regional URL is served to searchers.
It's still not clear to me. What do you think about my case?
Go for hreflang. When implemented properly, you will avoid all duplicate content issues.
if you're providing the same content to the same users on different URLs (for instance, if both example.de/ and example.com/de/ show German language content for users in Germany), you should pick a preferred version and redirect (or use the rel=canonical link element) appropriately. In addition, you should follow the guidelines on rel-alternate-hreflang to make sure that the correct language or regional URL is served to searchers
That covers your scenario:
Choose one as your preferred URL for the US and make it redirect (or use canonical), and
Follow hreflang guidelines: https://support.google.com/webmasters/answer/189077?hl=en
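As a rough sketch of the hreflang part for your two URLs (the en-US/en-GB values are an assumption based on your description of targeting US and UK visitors; the same two elements would appear on both pages):

<link rel="alternate" hreflang="en-US" href="http://www.example.us/info.html" />
<link rel="alternate" hreflang="en-GB" href="http://www.example.co.uk/info.html" />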

How to get sites identical in content but different in language and TLD indexed by major search engines?

Is it possible to get two "editions" of a website both indexed by the major search engines (Google/Yahoo/Bing/Teoma) which differ in content language only and are hosted under different TLDs?
Say English content is available at "http://domain.com/", German content at "http://domain.de/". Now, if e.g. Google.com is used I want it to list the "domain.com" entry and vice versa. Is "Duplicate Content" an issue here?
Depending on the website software you use (WordPress, Joomla, custom, etc.), you might have a plugin or add-on that supports multiple domains and search-engine pinging/SEO. If that's the case, it should be possible.
I'm assuming your website layout is the same but you have a ".com" and ".de" TLD pointing to the same directory/software installation and an (auto?) language selector to choose between English and German.
Edit: (for quick readers)
You shouldn't need separate webspace for each site. What I do for my sites to get them submitted is use Sitemaps. I've never generated one myself, so I can't help in that aspect. However, you could generate sitemaps for each language (e.g. sitemap.en.xml.gz | sitemap.de.xml.gz) and have your application ping the search engines with these sitemaps. Essentially, you'll have the same content but in different languages, and it'll be in a sitemap which can be submitted to Google/Bing/Yahoo/etc.
I used this method on a wordpress blog I had and every time I submitted/changed content, it would re-generate sitemaps (updating links/etc) and ping the search engines again.
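As a rough illustration, such a per-language sitemap (e.g. the sitemap.de.xml.gz mentioned above) is just an ordinary sitemap restricted to that domain's URLs; the page paths below are made up for the example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- hypothetical pages on the German TLD -->
  <url>
    <loc>http://domain.de/</loc>
  </url>
  <url>
    <loc>http://domain.de/kontakt</loc>
  </url>
</urlset>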

Will rel=canonical break site: queries?

Our company publishes our software product's documentation with a custom-built content management system, using a dynamic URL namespace like this:
http://ourproduct.com/documentation/version/pageid
Where "version" is the version number to which the documentation applies, and "pageid" is a unique string which identifies that page in our back-end content management system. For example, if content (e.g. a page about configuration best practices) is unchanged from version 3.0 and 4.0 of our product, it'd be reachable by two different URLs:
http://ourproduct.com/documentation/3.0/configuration-best-practices
http://ourproduct.com/documentation/4.0/configuration-best-practices
This URL scheme allows us to scope Google search results to see only documentation for a particular product version, like this:
configuration site:ourproduct.com/documentation/4.0
But when the user is searching across all versions, we don't want Google to arbitrarily choose one of the URLs to show in results. Instead, we always want the latest version to show up. Hence our planned use of rel=canonical, so we can prescriptively tell Google which URL we want to show up if multiple versions are being searched. (Users who do oddball things like searching 2 versions but not all of them are a corner case, so we don't care which version(s) show up in that case - the primary use cases we care about are searching one version or searching all versions.)
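For concreteness, the element emitted in the head of the 3.0 copy of that page would be something like this (assuming 4.0 is the latest version):

<link rel="canonical" href="http://ourproduct.com/documentation/4.0/configuration-best-practices" />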
But what will happen to scoped searches if we do this? If my rel=canonical URL points to version 4.0, but my search is scoped to 3.0, will Google return a result?
Even if you don't know the answer offhand, do you know of a site which uses rel=canonical to redirect across folders in a URL namespace? If so, I could run a few Google searches and figure out the answer.
The rel=canonical link element helps search engines to determine the URL that they should index, so ultimately, by specifying it for a URL, you're telling them to drop the old version and only to index the new version. In practice, it might be that both versions are indexed for a while (depending on how they're discovered and crawled), but in the long run only the canonical will generally remain indexed. In other words, if you do this for your site, over time the site:-query results for the old versions will drop (which probably makes sense).
If you need to have both versions indexed, then I wouldn't use the rel=canonical link element; I'd just link from the old versions to the new versions (e.g. "The current version of this document can be found at X").
Wikia uses rel=canonical link elements fairly extensively, though I don't think they use it in folders, but you can still see the results for individual URLs.

Redirect depending on the Country?

We basically have 2 sites (Java/JSP/Apache web server):
something.ca & something.com
The .ca is Canadian content, and the .com is American content.
We need users to be redirected based on the IP address.
We want US users to get the .com site and Canadian users get the .ca site.
What is the best way to do this (at a webserver level or otherwise ) ?
Please elaborate.
In my web surfing experience, most websites - UPS.com for example - ask the user to select their country site rather than trying to figure it out themselves. They remember the selection in a cookie. Much depends on how voluntary your use case requires this redirection to be.
On the implementation side, I'd use a filter that would check the setting and fire a redirect if need be.
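For instance, a minimal servlet filter along those lines might look like the sketch below. The cookie name "country", the target hostnames, and the IP-based fallback are assumptions for illustration, not a definitive implementation:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

public class CountryRedirectFilter implements Filter {

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // 1. Honour an explicit choice the user made earlier, stored in a cookie
        String country = null;
        Cookie[] cookies = request.getCookies();
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("country".equals(c.getName())) {
                    country = c.getValue();
                }
            }
        }

        // 2. Otherwise fall back to some IP-based lookup (GeoIP or similar, not shown here)
        if (country == null) {
            country = lookupCountryByIp(request.getRemoteAddr());
        }

        // 3. Canadian visitors hitting the .com host get redirected to the .ca site
        if ("CA".equals(country) && !request.getServerName().endsWith(".ca")) {
            response.sendRedirect("http://something.ca" + request.getRequestURI());
            return;
        }

        chain.doFilter(req, res);
    }

    private String lookupCountryByIp(String ip) {
        // placeholder -- plug in a real geolocation lookup here
        return "US";
    }
}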
I've used GeoIP from MaxMind and it works well. They have a free version, GeoLite Country, that's 99.3% accurate, along with a Java API. I would follow Google's practice of having a link back to the original version if you do the redirect.
Check out GeoDirection. It may handle what you want through JavaScript.
http://www.geobytes.com/GeoDirection.htm
Another option would be to grab the culture from the browser environment settings and map those cultures to countries in your application. Depending on what you are actually trying to do this may not work for you as this will not give you the user's physical location, but will get you their preferred culture. So if a Canadian travels to the US they will still get the Canadian site unless they changed their browser settings for some reason.
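In a Java servlet, that preferred culture is exposed through the Accept-Language header. A small sketch (the redirect target is just illustrative):

// Inside a servlet's doGet/doPost (javax.servlet.http):
java.util.Locale locale = request.getLocale();   // best match from the Accept-Language header, e.g. en_CA
String country = locale.getCountry();            // "CA"; empty if the header carries no region

if ("CA".equals(country)) {
    response.sendRedirect("http://something.ca" + request.getRequestURI());
    return;
}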
There are a lot of IP geolocation APIs out there - I don't know if there's anything good that you don't have to pay for.
Using culture settings is an option, but doesn't work in some cases. What if you have a German user in the US who likes his dates etc. displayed in the format he's comfortable with? Doesn't change the fact that he's in the US.
I think that's one of the reasons why most companies simply ask the user and then store that information in a cookie (UPS, FedEx and most major airlines do that). Check out www.lufthansa.com. They actually ask for location and language (to account for countries with more than one official language, like Switzerland).