How to add Content-Language for a single page in the HTTP header

My index page is trilingual. For this scenario, W3 informs us that the original 'ID solution' was dropped without a replacement.
W3 does suggest the use of HTTP headers, but fails to explain how this is accomplished.
Can stackoverflow solve this problem?
Background
W3 suggests that this code is not good/should not be used:
<meta http-equiv="Content-Language" content="de, fr, en">
However, they then say that there is nothing to replace it:
One implication of HTML5 dropping the meta element for declaring
language is that there is now no obvious way to provide metadata about
the document inside the document itself.
That's a painful statement, but... they then go on to suggest that "content-language" should be specified in an HTTP header.
This information is associated with a particular page by settings on
the server or by server-side scripting.
Fantastic... they even show a typical example... great!
HTTP/1.1·200·OK
Date:·Sat,·23·Jul·2011·07:28:50·GMT
Server:·Apache/2
Content-Location:·qa-http-and-lang.en.php
Vary:·negotiate,accept-language,Accept-Encoding
TCN:·choice
P3P:·policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection:·close
Transfer-Encoding:·chunked
Content-Type:·text/html; charset=utf-8
Content-Language:·en
But where is this file... and why is this character "·" used?
Why not use a comma-separated list such as en, fr, de?
Rant (after hours of researching):
If website programmers are advised not to use in-doc programming, it would be better if we were told exactly how to edit the HTTP header for any given page.
Therefore the question is simple:
Using cPanel or FileZilla (and perhaps Notepad++), how do I modify the HTTP header for index.html to show that it contains English, French, and German?
Note: I am currently using the bad code PLUS 'lang tags', e.g.:
<li lang="fr">
I'm trying to do what is right, but after looking on 'HTTP header help-sites', I never once found a statement re:
Exact file location
Filename and extension
Can anybody help solve this mystery?

In case you didn't manage to find this: an HTTP header is what you are after. Content-Language describes the language(s) of the intended audience, and it can list several languages. HTTP headers are set on your web server, and a header set in the server configuration or in the site's root .htaccess applies to every page below it; Apache's <Files> section can narrow it to a single file such as index.html.
If you are using Apache (with mod_headers enabled), find the .htaccess file and add something along the lines of:
Header set Content-Language "en"
If you are using IIS then:
Navigate to your website in the IIS GUI
Double-click 'HTTP Response Headers'
Click 'Add...'
Enter Content-Language as the name and the language you want to use as the value, for example en for any kind of English; use commas to separate multiple languages
Click OK
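The header can also be set from server-side code, which makes it easy to limit it to one page. The Node.js sketch below is an illustration, not the Apache or IIS mechanism described above; the `PAGE_LANGUAGES` table, the function names, and treating `/` and `/index.html` as the trilingual page are assumptions. Several languages share a single header value, separated by commas.

```javascript
// Hypothetical sketch: send Content-Language only for the trilingual index page.
// Assumption: "/" and "/index.html" are the only paths that need the header.
const PAGE_LANGUAGES = new Map([
  ["/", ["en", "fr", "de"]],
  ["/index.html", ["en", "fr", "de"]],
]);

// Header value for a path, or null when no Content-Language should be sent.
function contentLanguageFor(pathname) {
  const langs = PAGE_LANGUAGES.get(pathname);
  // Several language tags go into one header field, comma-separated.
  return langs ? langs.join(", ") : null;
}

// Request handler suitable for http.createServer(handler); it is not started here.
function handler(req, res) {
  const { pathname } = new URL(req.url, "http://localhost");
  const value = contentLanguageFor(pathname);
  if (value !== null) {
    res.setHeader("Content-Language", value);
  }
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<!DOCTYPE html><title>Index</title>");
}
```

Running `require("http").createServer(handler).listen(8080)` would then answer `/index.html` with `Content-Language: en, fr, de` and send no Content-Language for other paths.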
I got most of my info from here:
https://www.w3.org/International/questions/qa-html-language-declarations#metadata
Here is a list of the subtags you can use:
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
thickguru supplied the .htaccess solution above (many thanks); his answer is here:
Language not declared Ideally

Related

Google URL Parameter tool - what to exclude?

Situation: Site built on OpenCart, which utilizes faceted navigation.
Problem: Google Webmaster Tools' "URL Parameters" tool reports a huge number of URLs with parameters like "sort", "order", "limit", "search", and "page".
I would like to exclude them, but I'm worried about 2 things:
1.) Maybe there's a better way to handle this issue? Exclusion directives in robots.txt? Something else? I.e. fixing the problem on the site, before Google detects it in the first place.
2.) I don't want to accidentally exclude actual content.
So... anyone familiar with SEO and/or OpenCart, please give me a 2nd opinion on which of these parameters I should exclude, or change the settings for?
Thanks!
I'm not aware of a robots.txt option, but you might solve this using HTTP headers and/or HTML headers.
You could set <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> for duplicate pages in HTML header (cf. http://www.robotstxt.org/meta.html).
Another approach could be to provide canonical URLs (e.g., in HTML header or HTTP headers, cf. https://yoast.com/rel-canonical/, https://support.google.com/webmasters/answer/139066 and https://support.google.com/webmasters/answer/1663744).
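As a sketch of the HTTP-header route, a server can compute a canonical URL by dropping presentation-only parameters and send it in a `Link: <...>; rel="canonical"` response header. Which parameters count as presentation-only is an assumption here: `sort`, `order` and `limit` are treated as merely re-sorting or resizing the same listing, while `page` and `search` are kept because they can select different content. The function names are hypothetical.

```javascript
// Hypothetical sketch: canonical URL for OpenCart-style faceted listings.
// Assumption: these parameters only re-sort or resize the same listing;
// "page" and "search" are kept because they can select different content.
const PRESENTATION_PARAMS = new Set(["sort", "order", "limit"]);

// Decoded parameter name of a raw "name=value" pair.
function paramName(pair) {
  const raw = pair.split("=")[0];
  try {
    return decodeURIComponent(raw.replace(/\+/g, " "));
  } catch {
    return raw; // malformed percent-encoding: compare the raw name
  }
}

function canonicalUrl(requestUrl) {
  const url = new URL(requestUrl);
  // Filter the raw query so kept pairs retain their original encoding
  // (URLSearchParams would re-encode "route=product/category" with %2F).
  const kept = url.search
    .slice(1)
    .split("&")
    .filter((pair) => pair !== "" && !PRESENTATION_PARAMS.has(paramName(pair)));
  url.search = kept.length > 0 ? "?" + kept.join("&") : "";
  url.hash = "";
  return url.href;
}

// Value for an HTTP "Link" response header naming the canonical URL.
function canonicalLinkHeader(requestUrl) {
  return `<${canonicalUrl(requestUrl)}>; rel="canonical"`;
}
```

For the robots route, the HTTP counterpart of the robots meta tag is the `X-Robots-Tag` response header, e.g. `X-Robots-Tag: noindex, follow`.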

What are the most used Accept-Language in HTTP header?

I am building a website and want to use the Accept-Language HTTP header to help visitors find their language. However, I have a hard time finding statistics about the use of Accept-Language.
Will most visitors have something set as their Accept-Language? In some places it is written that "most modern browsers support Accept-Language", but does anyone have an overview of which specific browser versions support it? And will the browser language usually be sent as Accept-Language by default if the user doesn't actively change their Accept-Language settings? I guess most people don't change these settings, but that doesn't mean that Accept-Language is left blank?
Does anyone have statistics for the most used language codes sent in Accept-Language? I can make a mapping system to map them to my site's languages, but I also have trouble finding good statistics about the most used codes. An overview would help a lot in making this work better!
Thanks in advance!
Browsers send an Accept-Language header field out of the box. By default, they request the same language that is used for the browser's user interface.
As Oswald said, by default browsers set this to the language used by the browser UI. So no, the default setting isn't blank, it's something like "en-US,en".
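Since browsers send a value by default, a site can map it onto its own languages. The sketch below is a simplified reading of the header, not a full implementation of RFC 4647 matching: it orders tags by their `q` weight, tries an exact match, then falls back to the primary subtag (`fr-CA` to `fr`). The function names and the `fallback` argument are illustrative.

```javascript
// Minimal sketch: parse an Accept-Language value and pick one of the site's languages.
// Simplified: "*" is ignored and only exact-tag or primary-subtag matches are tried.
function parseAcceptLanguage(header) {
  return (header || "")
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      let q = 1; // default weight when no q parameter is given
      for (const p of params) {
        const [key, value] = p.trim().split("=");
        if (key.toLowerCase() === "q") q = Number(value);
      }
      return { tag: tag.trim(), q, index };
    })
    .filter((entry) => entry.tag !== "" && entry.q > 0) // q=0 means "not acceptable"
    .sort((a, b) => b.q - a.q || a.index - b.index) // highest weight first, ties keep order
    .map(({ tag, q }) => ({ tag, q }));
}

function pickLanguage(header, supported, fallback) {
  const lower = supported.map((s) => s.toLowerCase());
  for (const { tag } of parseAcceptLanguage(header)) {
    const t = tag.toLowerCase();
    let i = lower.indexOf(t); // exact match, e.g. "fr-ca"
    if (i === -1) i = lower.indexOf(t.split("-")[0]); // primary subtag, e.g. "fr"
    if (i !== -1) return supported[i];
  }
  return fallback;
}
```

For example, `pickLanguage("fr-CA,fr;q=0.9", ["en", "fr"], "en")` returns `"fr"`, and a missing header falls back to `"en"`.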
The only figures that I have found are on https://panopticlick.eff.org/results?#fingerprintTable . That page tests for the amount of information contained in HTTP requests. On the test result page, after clicking on "Show full results for fingerprinting", for various pieces of information it shows their frequency in column "one in x browsers have this value".
In row "HTTP_ACCEPT Headers" it shows the frequency of a combination of some Accept header values given by the browser. For example, it says that one in 5.25 browsers send the value "text/html, */*; q=0.01 gzip, deflate, br en-US,en;q=0.5". Unfortunately, that value seems to be the concatenation of the values of the headers "Accept" (somewhat stripped), "Accept-Encoding" ("gzip, deflate, br", where br stands for Brotli compression), and "Accept-Language".
As I wrote, when I experimented with panopticlick, it showed "one in 5.25 requests" for "en-US,en". This value is one of the more common ones, if not the most common one. One in 295.2 requests had just "en-US", one in 547.18 requests had just "en" and one in 1076.94 requests had "en,en-US" (which should have the same effects as "en", so it does not really make sense to use it).
Varying only the configuration of accepted languages, we can infer the frequency of the languages as seen by panopticlick. A more direct way would of course be to simply write to them and ask them for a table. I'm sure that the sample set of panopticlick is not representative for the entire internet, but at least it's a start.
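Panopticlick's "one in x browsers" figures convert to shares of its own sample by taking the reciprocal. For the values quoted above (shares of panopticlick's visitors only, not of the web as a whole):

```latex
\text{share} = \frac{1}{x}:\qquad
\frac{1}{5.25} \approx 0.190 = 19.0\%,\quad
\frac{1}{295.2} \approx 0.0034 = 0.34\%,\quad
\frac{1}{547.18} \approx 0.0018 = 0.18\%,\quad
\frac{1}{1076.94} \approx 0.00093 = 0.093\%
```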

Is using Content-Language header appropriate to localize a side effect of an HTTP POST?

I'm designing a REST API where content in the form of HTML is being posted to an endpoint. I'm using the lang attribute in the HTML to specify the language of the document or sections thereof. That is working nicely.
However, the content can be posted to a 'default' pseudo-resource, whose user-visible name is automatically generated, and thus needs to be localized. I need a way to specify which language to use when creating this name on the fly as a side effect of a first POST to the default section. Unfortunately, I'm not able to derive my user's preferred language from their login profile.
Does it seem reasonable to use the Content-Language header to specify this? There could be a clash with the language(s) of the actual HTML content, and it is not strictly the language of the entity being POSTed.
Would it even make sense to treat the side effect as a type of 'response' and thus use Accept-Language instead?
Content language is subject to content negotiation, just like content type. Browsers automatically generate Accept-Language values from the user's settings, e.g.
Accept-Language: en-US,en;q=0.8,cs;q=0.6
so all you get is the user's language preference.
You can also use content-language (note that multiple languages are supported, e.g. content-language: en, de) to denote language(s) of content.
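On the server, the two headers could be combined like this when naming the auto-generated default resource. This is a sketch only: the precedence (the highest-weighted Accept-Language tag first, treating the side effect like a response, then the first Content-Language tag of the POST, then a default), the function name, and the lower-case header keys (as in Node's `req.headers`) are assumptions, not a standard rule.

```javascript
// Hypothetical sketch: choose the locale for a server-generated resource name.
// Assumed precedence: best Accept-Language tag, then first Content-Language tag,
// then a fixed default. Header keys are lower-case, as in Node's req.headers.
function localeForGeneratedName(headers, defaultLocale = "en") {
  let best = null;
  for (const part of (headers["accept-language"] || "").split(",")) {
    const [tag, ...params] = part.trim().split(";");
    const qParam = params.map((p) => p.trim()).find((p) => p.toLowerCase().startsWith("q="));
    const q = qParam ? Number(qParam.slice(2)) : 1; // missing q means weight 1
    if (tag && tag !== "*" && q > 0 && (best === null || q > best.q)) {
      best = { tag, q }; // strict ">" keeps the earlier tag on ties
    }
  }
  if (best) return best.tag;

  // Content-Language may list several languages ("en, de"); take the first.
  const contentLanguage = (headers["content-language"] || "").split(",")[0].trim();
  return contentLanguage || defaultLocale;
}
```

With this order, an API client that sends no Accept-Language still gets a name in the language of the content it posted.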
I would discourage you from giving users the ability to affect final URLs, but I guess you are doing it for SEO, right? If so, a common practice is to use something like mod_rewrite to strip the dynamically generated 'nice URL', e.g.
http://example.org/some really nice name to be indexed/232323
to
http://example.org/?id=232323
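That rewrite keeps only the trailing ID and throws away the human-readable part. A JavaScript sketch of the same mapping, handy for checking which paths it would catch before writing the real mod_rewrite rule; the function name and the assumption that the ID is the last, all-digit path segment are illustrative.

```javascript
// Hypothetical sketch: map "/<any slug>/<numeric id>" to "/?id=<numeric id>".
// The slug is ignored entirely, so "%20"-encoded spaces in it do not matter.
function rewriteNiceUrl(pathname) {
  const match = /^\/(?:.*\/)?(\d+)\/?$/.exec(pathname);
  return match ? `/?id=${match[1]}` : null; // null: the rule would not apply
}
```

Because only the ID is used, renaming a page changes its visible URL without breaking old links.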
As for your question: the only clash I can think of is with built-in browser translators, as I'm not aware of any other component that uses Content-Language.

How to direct multiple clean URL paths to a single page?

(Hi! This is my first time asking a question on Stack Overflow after years of finding answers here... Thanks!)
I have a dynamic page, and I'd like to have fixed URLs that point to different states of that page. For example, "www.mypage.co" (/index.php) is the base page, and it rearranges its content based on user choices. I'd then like to be able to point to "www.mypage.co/contentA" or "www.mypage.co/contentB" so that the base page at "www.mypage.co" automatically loads with the desired content.
At heart the problem is an aesthetic one. I know I could simply write www.mypage.co/index.html?state=contentA to reach the desired end, but I want to keep the URL simple and readable (i.e., clean). Also, due to limitations in my hosting relationship, I would most appreciate a solution that is server-independent (across LAM[PHP] stacks, at least), if possible.
Also, if I just have incorrect assumptions about how to implement clean URLs, I'd appreciate direction to a good, comprehensive explanation. I can't seem to find one...
You could use an .htaccess file to rewrite all requests to one location and then determine there what to return to the client. Look over the .htaccess/dispatch system that Tonic uses.
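The dispatch half of that approach (every request reaches one entry point, which then decides what to show) might look like this in JavaScript; in the PHP setup from the question, the same logic would live in index.php behind a rewrite rule. The state names `contentA` and `contentB` come from the question; the function name, the `KNOWN_STATES` set, and the 404 fallback are assumptions.

```javascript
// Hypothetical front-controller sketch: every clean URL reaches one handler,
// which turns the first path segment into the page state to render.
const KNOWN_STATES = new Set(["contentA", "contentB"]);

// Returns { status, state } for a request path such as "/contentA".
function dispatch(pathname) {
  const segment = pathname.split("/").filter(Boolean)[0];
  if (segment === undefined || segment === "index.php") {
    return { status: 200, state: null }; // base page with default content
  }
  if (KNOWN_STATES.has(segment)) {
    return { status: 200, state: segment };
  }
  return { status: 404, state: null }; // unknown clean URL
}
```

Keeping the list of valid states explicit means a typo in a URL yields a proper 404 instead of an empty base page.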
If you use Apache, you can use mod_rewrite. I have a rule like this where multiple RESTful URLs all go to the same page, using a regex and moving parts of the old URL into parameters for the new URL:
RewriteRule ^/testapp/(name|number|rn|sid|unii|inchikey|formula)(/(startswith))?/?(.*) /testapp/ProxyServlet?objectHandle=Search&actionHandle=drillIn&searchtype=$1&searchterm=$4&startswith=$3 [NC,PT]
That particular regex accepts urls like
testapp/name
testapp/name/zuchini
testapp/name/startswith/zuchini
and forwards them to the same page.
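To check what that rule does with each example before deploying it, the pattern can be replayed in JavaScript, with the `i` flag standing in for `[NC]` and unmatched groups becoming empty strings, as `$3` does in mod_rewrite. This sketch reproduces only the pattern and substitution, not the `[PT]` pass-through, and `applyRule` is a hypothetical name. The pattern's leading `/` matches paths as seen in the main server configuration; in an .htaccess file the directory prefix is stripped first, so there the pattern would need to start with `^testapp/`.

```javascript
// Same pattern as the RewriteRule above; "i" mirrors the [NC] flag.
const RULE =
  /^\/testapp\/(name|number|rn|sid|unii|inchikey|formula)(\/(startswith))?\/?(.*)/i;

// Returns the internal target for a request path, or null if the rule does not match.
function applyRule(path) {
  const m = RULE.exec(path);
  if (!m) return null;
  // Unmatched groups are undefined in JS; mod_rewrite substitutes "" for them.
  const [, searchtype, , startswith = "", searchterm = ""] = m;
  return (
    "/testapp/ProxyServlet?objectHandle=Search&actionHandle=drillIn" +
    `&searchtype=${searchtype}&searchterm=${searchterm}&startswith=${startswith}`
  );
}
```

For example, `applyRule("/testapp/name/startswith/zuchini")` ends in `searchtype=name&searchterm=zuchini&startswith=startswith`.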
I also use UrlRewriteFilter for Tomcat, but since you mentioned PHP, that probably won't be useful.

Hash character in URLs (accessing and redirecting in Apache)

It looks as though this question has been asked in part by some others, but I can't find the answer I'm looking for specifically, so I thought I'd pose my particular scenario in case anyone is able to help.
We have an old website (developed externally by a third party) that is due to be retired and replaced by a new site designed in house. For reasons best known to themselves, the developers of the old site used the hash character as part of the URL for the old site (www.mysite.com/#/my-content-stuff). To assist with the transition and help with SEO I need to set up 301 redirects for the top performing URLs from the old site. As I'm now discovering however, I'm not able to set up a simple redirect in the .htaccess file as I believe it takes the hash character to be a comment and ignores the remainder of the line. I've tried escape characters, using %23 instead, wildcard matching, nothing seems to work.
As a workaround, I wondered about simply creating dummy files with the same paths and URLs as the old site had, then creating HTML redirects within them to drive traffic to the correct new pages, but it looks as though the server is doing something similar with the hash character in the URL and ignoring anything after it. So, if I create a sub-folder on my new server called '#' and create a file in there called 'test.html', I expected to be able to just go to 'www.myNEWsite.com/#/test.html', but it just takes me to the default root file of my site.
Please can anyone shed any light on how I might get around this? I must admit I'm not that clued up on Apache so I'm having to learn a lot as I go.
Many thanks in advance for any pointers or info anyone can provide.
Cheers,
Rich
A hash character in the URL specifies the anchor, and it's not even sent to your webserver. A redirect is impossible on the server side, and the old developer probably did it using JavaScript. Implement fallback URLs without the hash instead, and have a global JavaScript script detect these URLs and redirect automatically.
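The split can be seen with the WHATWG `URL` parser built into Node.js and browsers: the request a browser sends carries only the path and query, while the fragment stays on the client. `requestTarget` is an illustrative helper, not part of any library.

```javascript
// Shows why the server never sees "#/my-content-stuff": the HTTP request line
// carries only the path and query; the fragment stays in the browser.
function requestTarget(href) {
  const url = new URL(href);
  return url.pathname + url.search;
}
```

For `http://www.mysite.com/#/my-content-stuff`, `requestTarget` gives `/`, which is why every old URL lands on the site root.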
Fragments (the part after the #) cannot be read by the server. They are regarded as locations within the document and are therefore not exposed to the server; only the client sees them. The best you could do is use a "meta refresh" tag, or alternatively use JavaScript to detect the URL and, if it's one that requires a 301 redirect, use "window.location" to send the user to a full URL where mod_rewrite or a PHP page can issue a 301 header.
However, neither approach is SEO-friendly, and both only really solve the issue for users who click through to an old link from an external site.
<!-- Put in head tag so the page does not wait to load the content-->
<script type="text/javascript">
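  // The fragment (e.g. "#/something_old") is only visible to the browser, never to the server.
  if (window.location.hash !== "") {
    // Strip the leading "#" and an optional "/".
    var h = window.location.hash.replace(/^#\/?/, "");
    switch (h) {
      case "something_old":
        // replace() keeps the old URL out of the history, so Back does not bounce here again.
        window.location.replace("/something_new.html");
        break;
      case "something_also_old":
        window.location.replace("/something_also_new.html");
        break;
    }
  }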
</script>