Precedence of X-Robots-Tag HTTP header vs robots.txt

For example, if this is set in Apache settings:
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
and this is set in a robots.txt file:
User-agent: *
Allow: /
which one will take precedence?

These are different instructions, so there is no conflict here: the robots.txt file controls crawling (and in this case allows it), while the X-Robots-Tag header controls indexing. Note that crawling must stay allowed for the header to have any effect, because a crawler blocked by robots.txt never fetches the URL and so never sees the header.
If you had shown the same directive expressed in two different formats, then (for Googlebot at least) the more restrictive one would be honoured.

How to add conditions to htaccess or conf file for robots noindex

Question 1
I currently use the following to noindex a site in .htaccess:
Header set X-Robots-Tag "noindex, follow"
I have tried all sorts of ways to noindex a pattern and am lost, which is why I need help from you experts.
I would like to noindex /tags/ and /s/ and all pages within those categories.
Question 2
I also have another question which is related, so I'll ask here instead of posting a new question.
I have a number of aliases on a server and one .htaccess file. How would I noindex a single URL such as https://www.website.com and allow the others to be indexed?
Can you help?
Use something like:
<IfModule mod_headers.c>
<IfModule mod_setenvif.c>
# Set an env var for any URL under /tags/ or /s/
SetEnvIf Request_URI "^/(tags|s)/" x_tag=yes
# Send the noindex header only when that env var is set
Header set X-Robots-Tag "noindex, follow" env=x_tag
</IfModule>
</IfModule>
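For Question 2, one possible approach (a sketch, assuming mod_setenvif is available; www.website.com is the placeholder host from the question) is to key the header off the Host request header, so only that one alias is noindexed:
<IfModule mod_headers.c>
<IfModule mod_setenvif.c>
# Match requests whose Host header is exactly www.website.com
SetEnvIfNoCase Host "^www\.website\.com$" x_noindex=yes
# Send the noindex header only for that host; the other aliases are unaffected
Header set X-Robots-Tag "noindex, follow" env=x_noindex
</IfModule>
</IfModule>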

Prevent Googlebot from indexing file types in robots.txt and .htaccess

There are many Stack Overflow questions on how to prevent Googlebot from indexing, for instance, .txt files. There's this:
robots.txt
User-agent: Googlebot
Disallow: /*.txt$
.htaccess
<Files ~ "\.txt$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
However, what is the syntax for both of these when trying to prevent two file types from being indexed? In my case: txt and doc.
In your robots.txt file:
User-agent: Googlebot
Disallow: /*.txt$
Disallow: /*.doc$
More details at Google Webmasters: Create a robots.txt file
In your .htaccess file:
<FilesMatch "\.(txt|doc)$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
More details here: http://httpd.apache.org/docs/current/sections.html
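To verify the header is actually being sent, you can make a HEAD request from the command line (the host and file path here are placeholders):
curl -sI https://www.example.com/file.txt | grep -i x-robots-tag
# expected output:
# X-Robots-Tag: noindex, nofollow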

Allow Access-Control for Subdomain in .htaccess

Having issues setting up a generic Allow Origin for any subdomain in my .htaccess file. The following works for a singular subdomain:
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin http://subdomain.website.com
</IfModule>
But what I am looking for is something similar to this:
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin {ANY SUBDOMAIN}.website.com
</IfModule>
I have tried using a simple *.website.com wildcard, but that does not seem to work. Do you have to specify exactly what is coming in?
If you're looking to do it for whatever subdomain was being requested, try the following:
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin %{HTTP_HOST}
</IfModule>
If you need something more advanced, use mod_rewrite to set an environment variable and then refer to it using %{variable_name}e.
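For example, here is a sketch of that approach using mod_setenvif (website.com stands in for your domain; note that browsers match Access-Control-Allow-Origin against the page's Origin request header, so echoing the matched Origin back is the usual pattern):
<IfModule mod_headers.c>
<IfModule mod_setenvif.c>
# Capture the Origin request header when it is a subdomain of website.com
SetEnvIf Origin "^(https?://[^/]+\.website\.com)$" cors_origin=$1
# Echo the captured origin back, only when it matched
Header set Access-Control-Allow-Origin "%{cors_origin}e" env=cors_origin
</IfModule>
</IfModule>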

Header add Access-Control-Allow-Origin "*" causes internal server error

Our assets are on a subdomain, and in order to work around security features of our platform so we can make a JSON request, we have to add the following .htaccess code:
<FilesMatch "\.(ttf|otf|eot|woff)$">
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin "*"
</IfModule>
</FilesMatch>
Header add Access-Control-Allow-Origin "*"
However, the last line, Header add Access-Control-Allow-Origin "*", creates an internal server error on my local machine, which is odd because we do not get the same error in our prod environment. We are using Apache 2.2.22 and PHP 5.4.3.
Any help is appreciated thanks.
Is it possible you do not have mod_headers enabled?
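If not, on Debian/Ubuntu-style installs it can usually be enabled like this (a sketch; commands vary by distro, and the module may instead be loaded via LoadModule in httpd.conf):
a2enmod headers
# reload Apache so the module is picked up
apachectl -k graceful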
Secondly, I think you may want to put the IfModule block outside the FilesMatch block, like so:
# Allow access from all domains for web fonts
<IfModule mod_headers.c>
<FilesMatch "\.(eot|font.css|otf|ttc|ttf|woff)$">
Header set Access-Control-Allow-Origin "*"
</FilesMatch>
</IfModule>
Code taken directly from https://github.com/h5bp/html5-boilerplate/blob/master/.htaccess#L45

".htaccess" doesn't work for cache-control in subdirectories

I made my own cache-control rules in httpd.conf and need to apply different rules to different subdirectories.
I set no-cache for the .do extension as the default (httpd.conf):
# use .htaccess files for overriding,
AccessFileName .htaccess
...
<IfModule mod_headers.c>
<LocationMatch "\.(do)$">
Header append Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
</LocationMatch>
</IfModule>
And I need caching for some directories (via .htaccess).
Example URL: XXX.com/en/product.do
So I made a .htaccess in <webRoot>/en:
<IfModule mod_headers.c>
<LocationMatch "\.(do)$">
Header set Cache-Control "max-age=216000, public, must-revalidate"
</LocationMatch>
</IfModule>
Am I doing something wrong? Is there another way to apply different rules to different directories?
Directives like <LocationMatch> cannot be used in .htaccess; they will generate a server error.
Also, usually *.do is proxied, in which case no filesystem directory would ever be read for .htaccess.
I suggest putting the second stanza first, and adding ^/en/ to the front.
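In httpd.conf, that arrangement might look like this (a sketch; the two patterns are written to be mutually exclusive so that neither Header set can overwrite the other, and the PCRE lookahead assumes a reasonably modern Apache):
<IfModule mod_headers.c>
# Cacheable: .do responses under /en/
<LocationMatch "^/en/.*\.do$">
Header set Cache-Control "max-age=216000, public, must-revalidate"
</LocationMatch>
# Default: no caching for .do responses outside /en/
<LocationMatch "^/(?!en/).*\.do$">
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
</LocationMatch>
</IfModule>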