Redirect based on Accept-Language - apache

I need to honor the web browser's list of language preferences. Supported languages are English and French. For example: http_accept_language="jp-JP;fr;en-US;en" redirects to a directory called /French/. How can I do this with rewrite rules in my .htaccess file?

I wouldn't use mod_rewrite for this but a more powerful language, because Accept-Language is a list of weighted values (see quality values): the mere occurrence of one of the identifiers does not mean it is preferred over another value (in particular, q=0 means not acceptable at all).
As said above, use a more powerful language than mod_rewrite: parse the list of values and find the best match between the preferred options and the available options.
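For completeness, the naive mod_rewrite approach the question asks about would look roughly like the sketch below. Note that it only checks whether the header begins with fr and ignores quality values entirely, which is exactly the limitation described above.
# Naive sketch only: redirect the site root to /French/ when the
# Accept-Language header begins with "fr"; q-values are not honored
RewriteEngine On
RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ /French/ [R=302,L]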

Related

Category & product url SEO

What is considered to be best practice for URL structuring these days?
For some reason I thought you only included an extension at the end of a URL once you got down to the 'lowest' part of your hierarchy, e.g.
/category/sub-category/product.html
and then all category URLs would be:
/category/sub-category/
rather than including an extension at the end, because there is still further to go down the structure.
Looking forward to your thoughts.
Andy.
EDIT
Just for clarification purposes: I'm looking at this from an ecommerce perspective.
Your question is not very clear, but I'll reply as I understand it.
As to whether or not to use file extensions: according to Google's Matt Cutts, Google happily crawls .html, .php, or .asp URLs, but you should keep away from .exe, .dll, and .bin. Those extensions usually signify binary data, so they may be ignored by Googlebot.
Still, when designing SEO-friendly URLs, keep in mind that they should be short and descriptive, so you can use your keywords to rank higher. If you have good keywords in your category names, why not let them be visible in the URL?
Make sure you're using static rather than dynamic URLs: they are easier to remember, and they don't change.
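If the underlying pages are generated by a script, one common way to get static-looking URLs is an internal rewrite in .htaccess. A minimal sketch, assuming a hypothetical index.php that takes cat and sub parameters:
RewriteEngine On
# Map /category/sub-category/ onto the dynamic script without changing the visible URL
RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)/$ index.php?cat=$1&sub=$2 [L,QSA]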

automatic language translation and SEO

I'm looking into translating a large UK English site into a number of other European languages. I was wondering what free options are out there for automatic translation?
Also, in regards to SEO, how do search engines treat language copies of web pages in regards to the duplicate content rules?
Thanks
My limited experience in the matter is that the big G treats automatic language translation as duplicate content. It seems that the DC detection algorithms are language-agnostic. However, when I hand-translate into languages that I know, the 'new' pages rate highly. In fact, I would say that translating highly-rated (PR 4 and above) pages leads to better-performing pages (more search engine landings, and more varied terms as well) than even new original-content pages.
I have done no comparisons in this regard to other search engines, as they typically supply less than 10-20% of my traffic anyway.
You'd be better off hiring translators to write language specific versions that aren't simply translated versions of your English copy. You'll end up with better results, too.
The majority of your audience can probably understand English well enough to navigate your site, so I think you might be putting too much thought into this.
If someone finds this question nowadays: there are multiple free and paid translation tools that can be used today. As for the duplicate content caused by the other languages, you can set up meta tags to declare that your content has alternate language versions and to tell the search engines which one is the canonical version. Have a look at Google's documentation for examples:
https://developers.google.com/search/docs/specialty/international/localized-versions
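As a small illustration of what that documentation describes (the URLs are placeholders), each language version of a page would carry link elements pointing at its alternates:
<link rel="alternate" hreflang="en" href="https://example.com/en/page.html" />
<link rel="alternate" hreflang="fr" href="https://example.com/fr/page.html" />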

Robots.txt: Is this wildcard rule valid?

Simple question. I want to add:
Disallow: */*details-print/
Basically, I want to block URLs of the form /foo/bar/dynamic-details-print, where foo and bar in this example can also be totally dynamic.
I thought this would be simple, but then on www.robotstxt.org there is this message:
Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: bot", "Disallow: /tmp/*" or "Disallow: *.gif".
So we can't do that? Do search engines abide by it? But then, there's Quora.com's robots.txt file:
Disallow: /ajax/
Disallow: /*/log
Disallow: /*/rss
Disallow: /*_POST
So, who is right? Or am I misunderstanding the text on robotstxt.org?
Thanks!
The answer is, "it depends". The robots.txt "standard" as defined at robotstxt.org is the minimum that bots are expected to support. Googlebot, MSNbot, and Yahoo Slurp support some common extensions, and there's really no telling what other bots support. Some say what they support and others don't.
In general, you can expect the major search engine bots to support the wildcards that you've written, and the one you have there looks like it will work. Your best bet would be to run it past one or more of the online robots.txt validators, or use Google's Webmaster Tools to check it.
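For example, the major crawlers that do support wildcards are documented to understand a rule along these lines (a sketch based on the pattern in the question; it matches any path containing details-print/ regardless of the leading segments):
User-agent: *
Disallow: /*details-print/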

SEO: Multiple languages [closed]

Okay, I know this question has been asked plenty of times already, but I haven't found any actual answer.
Considering SEO, what is the best way to construct the URL for multiple languages? One top-level domain for each language would feel unnecessary, so I'm thinking about different subdomains or sub-folders. And in that case, which would be better: en.mydomain.com or english.mydomain.com? And if, e.g., the English version is viewed more than the Swedish version, how do I tell the search engines that they actually are the same page?
Pretty much everything is answered in this Google Webmasters article: Multi-regional and multilingual sites.
Here's a summary of relevance:
URL structures
Consider using a URL structure that makes it easy to geotarget parts of your site to different regions. The following table outlines your options:
ccTLDs (country-code top-level domain names)
Example: example.de
Pros:
Clear geotargeting
Server location irrelevant
Easy separation of sites
Cons:
Expensive (and may have limited availability)
Requires more infrastructure
Strict ccTLD requirements (sometimes)
Subdomains with gTLDS (generic top-level domain name)
Example: de.example.com
Pros:
Easy to set up
Can use Webmaster Tools geotargeting
Allows different server locations
Easy separation of sites
Cons:
Users might not recognize geotargeting from the URL alone (is "de" the language or country?)
Subdirectories with gTLDs
Example: example.com/de/
Pros:
Easy to set up
Can use Webmaster Tools geotargeting
Low maintenance (same host)
Cons:
Users might not recognize geotargeting from the URL alone
Single server location
Separation of sites harder
URL parameters
Example: example.com?loc=de
Not recommended.
Cons:
URL-based segmentation difficult
Users might not recognize geotargeting from the URL alone
Geotargeting in Webmaster Tools is not possible
Duplicate content and international sites
Websites that provide content for different regions and in different languages sometimes create content that is the same or similar but available on different URLs. This is generally not a problem as long as the content is for different users in different countries. While we strongly recommend that you provide unique content for each different group of users, we understand that this may not always be possible. There is generally no need to "hide" the duplicates by disallowing crawling in a robots.txt file or by using a "noindex" robots meta tag. However, if you're providing the same content to the same users on different URLs (for instance, if both example.de/ and example.com/de/ show German language content for users in Germany), you should pick a preferred version and redirect (or use the rel=canonical link element) appropriately.
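For the example.de vs. example.com/de/ case mentioned above, a minimal .htaccess sketch of such a redirect (the host names are the placeholders from the quoted example) could look like this:
RewriteEngine On
# Permanently redirect the duplicate German domain to the preferred URL
RewriteCond %{HTTP_HOST} ^(www\.)?example\.de$ [NC]
RewriteRule ^(.*)$ http://example.com/de/$1 [R=301,L]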
Google's guidelines are:
Make sure the page language is obvious
Make sure each language version is easily discoverable
This point specifically references URLs as needing to be kept separate. The example they provide is:
For example, the following .ca URLs use fr as a subdomain or subdirectory to clearly indicate French content: http://example.ca/fr/vélo-de-montagne.html and http://fr.example.ca/vélo-de-montagne.html.
They also state:
It’s fine to translate words in the URL, or to use an Internationalized Domain Name (IDN). Make sure to use UTF-8 encoding in the URL (in fact, we recommend using UTF-8 wherever possible) and remember to escape the URLs properly when linking to them.
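For example, when linking to the vélo-de-montagne.html URL above, the é is percent-encoded as %C3%A9 (its UTF-8 bytes), giving v%C3%A9lo-de-montagne.html.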
Targeting the site content to a specific country
This is done through ccTLDs, geotargeting settings in Search Console, server location, and 'other signals'.
If you're worried about duplicate content, they state:
Websites that provide content for different regions and in different languages sometimes create content that is the same or similar but available on different URLs. This is generally not a problem as long as the content is for different users in different countries. While we strongly recommend that you provide unique content for each different group of users, we understand that this might not always be possible.
If you do re-use the same content across the same website (but in a different language), then:
There is generally no need to "hide" the duplicates by disallowing crawling in a robots.txt file or by using a "noindex" robots meta tag.
But!
However, if you're providing the same content to the same users on different URLs (for instance, if both example.de/ and example.com/de/ show German language content for users in Germany), you should pick a preferred version and redirect (or use the rel=canonical link element) appropriately. In addition, you should follow the guidelines on rel-alternate-hreflang to make sure that the correct language or regional URL is served to searchers.
So, be sure to declare the relationship between different languages using hreflang.
Example below:
<link rel="alternate" href="http://example.com" hreflang="en-us" />
You can use this in a number of places including your page markup, HTTP headers, or even the sitemap.
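For non-HTML resources, or if you prefer not to touch the markup, the same information can be sent in an HTTP Link header. A sketch with placeholder URLs for an English and a Swedish version:
Link: <https://example.com/en/>; rel="alternate"; hreflang="en", <https://example.com/sv/>; rel="alternate"; hreflang="sv"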
Here's a link to a hreflang generator which you might find useful.
Hope this helps.

Is there an easy, simple, lazy way to test rules against Apache's mod_rewrite?

I want to test the effects of my RewriteRules without going through all the trouble of setting up a vhost and a RewriteLog and throwing URLs at the browser (or curling them).
But I don't just want to test regular expressions. I want my URLs to actually go through Apache's mod_rewrite stack, and I want to see the response that would come out of it.
It would be awesome if I could get some trace of which rules acted on the URL, in which order, and what the interim results were. (I guess most of this appears in the rewrite log, but I wanted to avoid that setup.)
Is there any tool for this out there?
I'm ok with it not being able to handle RewriteConds, since those generally rely on the request headers and whatnot.
I haven't come across a mod_rewrite validator, but setting up a vhost may have been quicker than posting here :)
Your best bet is unit testing: provide rewrite rules and a list of expected results, then get a regular report. I don't know your environment, but a quick search turns up promising options.
Hope that points you in the right direction!
Here is an online htaccess tester:
http://htaccess.madewithlove.be/
(as per https://stackoverflow.com/a/5907896/190791)
As the answer here https://stackoverflow.com/a/30966316/4338417 says...
http://www.generateit.net/mod-rewrite/index.php lets you create the RewriteRule and also allows you to modify the resulting URL as per your needs.