Why doesn't the Files directive work in Apache's httpd.conf?

I had to noindex PDF files. I have done this many times, so I used a Files directive to add a noindex header via X-Robots-Tag, as Google recommends:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
Whenever I used this before, it worked like a charm. But in this case I saw neither the X-Robots-Tag header itself nor its content (noindex, nofollow) in the response headers. mod_headers was enabled.
I tried
<FilesMatch ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
with no luck.
After much further trial and error I got it working with
<LocationMatch ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>
But I don't really understand why the rule I used for years stopped working and the rule I blindly tried suddenly works.
Could somebody explain it to me?

The documentation for Apache states that FilesMatch takes a regular expression pattern <FilesMatch regexp> and is preferred over using <Files ~ "regexp">
The <FilesMatch> directive limits the scope of the enclosed directives by filename, just as the <Files> directive does. However, it accepts a regular expression.
In my experience with RegEx, this means using a wildcard to match all, rather than the normal <Files> directive which matches on a substring.
As for matching all named files in an expression, that means a small tweak is required to your existing code (note that <FilesMatch> takes the pattern directly, without the ~ that <Files> needs):
<FilesMatch ".+\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
If you expect to have a file named .pdf that you also need to exclude, replace + in that expression with *. This is due to how RegEx matches:
. Match any character, once.
+ The previous modifier or block must occur one or more times
* The previous modifier or block may occur zero or more times
This means .+ matches all files with at least one character before .pdf in the filename, and .* matches all files ending in .pdf.
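To see the difference between the two quantifiers, here is a minimal Python sketch. Python's re module stands in for the regex engine Apache uses, which should behave the same for patterns this simple; the filenames are made up for illustration.

```python
import re

# Hypothetical filenames; FilesMatch tests the file's name, not the full path.
filenames = ["report.pdf", ".pdf", "report.pdf.bak", "notes.txt"]

def matching(pattern, names):
    """Return the names in which the pattern is found (unanchored search)."""
    return [name for name in names if re.search(pattern, name)]

plus_matches = matching(r".+\.pdf$", filenames)  # needs at least one character before ".pdf"
star_matches = matching(r".*\.pdf$", filenames)  # also accepts a file named just ".pdf"
bare_matches = matching(r"\.pdf$", filenames)    # the pattern from the question

print(plus_matches)  # ['report.pdf']
print(star_matches)  # ['report.pdf', '.pdf']
print(bare_matches)  # ['report.pdf', '.pdf']
```

Because the search is not anchored at the start, the bare \.pdf$ pattern selects the same names as .*\.pdf$ in this sketch.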
As for an explanation on why your Files directive doesn't work:
The Files directive may be overridden by other Files directives appearing later in the same configuration, or within a .htaccess file in the directory where you keep the PDF files. Furthermore, there is an order in which the sections are processed, and each one can override the previous steps:
Directory < Files in Directory < .htaccess < Files in .htaccess < Location. So it's most probably a different part of the configuration that overrides the Files directive.
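As a rough mental model of that override chain, the following Python sketch merges per-section header settings in the order listed above, with later sections winning. It is a toy model and not Apache's actual merge code; the competing Location rule is hypothetical.

```python
# Toy model: each configuration section contributes header settings, and
# sections merged later override earlier ones. None stands for "Header unset".
MERGE_ORDER = [
    "Directory",
    "Files in Directory",
    ".htaccess",
    "Files in .htaccess",
    "Location",
]

def effective_headers(sections):
    """Merge header settings in MERGE_ORDER; later sections override earlier ones."""
    merged = {}
    for name in MERGE_ORDER:
        merged.update(sections.get(name, {}))
    # Drop headers whose final state is "unset".
    return {header: value for header, value in merged.items() if value is not None}

# Hypothetical configuration: the Files rule sets the header,
# but a Location block elsewhere unsets it again.
files_only = {"Files in Directory": {"X-Robots-Tag": "noindex, nofollow"}}
with_location = {**files_only, "Location": {"X-Robots-Tag": None}}

print(effective_headers(files_only))     # {'X-Robots-Tag': 'noindex, nofollow'}
print(effective_headers(with_location))  # {}
```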

Related

In htaccess, how to set a response header for all URLs except one?

I want to use this rule:
<IfModule mod_headers.c>
Header always set X-FRAME-OPTIONS "DENY"
</IfModule>
But only for the front pages of my website.
That is, I have a back office at example.com/gestion to which the rule should not apply; I want it applied only to the rest of example.com (all URLs without gestion).
Any ideas?
Try something like this, using an Apache <If> expression to match all URLs except those that start with /gestion, contain multiple path segments, or contain dots (i.e. actual files).
For example:
<If "%{REQUEST_URI} =~ m#^/(?!gestion)[\w-]*$#">
Header always set X-FRAME-OPTIONS "DENY"
</If>
This uses a negative lookahead to avoid matching any URL that starts /gestion.
I'm assuming that your "front page" URLs only consist of single path segments containing characters in the range [0-9a-zA-Z_-].
The <IfModule> wrapper is not required (unless this is optional and you are using the same config on multiple servers where mod_headers may not be enabled, which is unlikely).
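To check which paths the expression selects, here is a small Python sketch. It uses Python's re module as a stand-in for Apache's regex engine (both support this lookahead syntax), with the ASCII flag so that \w matches only [0-9a-zA-Z_]; the request paths are invented.

```python
import re

# The pattern from the <If> expression, without the m#...# delimiters.
FRONT_PAGE = re.compile(r"^/(?!gestion)[\w-]*$", re.ASCII)

def gets_header(request_uri):
    """Return True if the URL path would satisfy the <If> condition."""
    return FRONT_PAGE.search(request_uri) is not None

# Hypothetical request paths.
for path in ["/", "/contact", "/gestion", "/gestionnaire", "/gestion/users", "/css/site.css"]:
    print(path, gets_header(path))
```

Only / and /contact pass here. Note that /gestionnaire is rejected too, because the lookahead excludes anything that begins with gestion.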

How to add X-Robots-Tag "noindex" for multiple subdirectories (from .htaccess / shared host)

I have a list of folders (named as numbers) located in domain.com/user/uploaded/ directory (for example: ../435/, ../580/ etc.).
I'm trying to use Header set X-Robots-Tag "noindex" from .htaccess for these folders, for example:
domain.com/user/uploaded/435/
domain.com/user/uploaded/580/
etc. for other folders within /user/uploaded/{number} folders.
That means that directory named /435/, /580/ etc. should have 'X-Robots-Tag: noindex' added.
I only have access to .htaccess (it's a shared host running LiteSpeed). I tried to add this:
<FilesMatch "^user/uploaded/?$">
Header set X-Robots-Tag: "noindex"
</FilesMatch>
but it doesn't seem to work.
You should put a new .htaccess file in the user/uploaded/ directory. In this file you can specify your rule:
Header set X-Robots-Tag: "noindex"
You don't need to use FilesMatch except if you want to target specific files.
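The original pattern cannot match because FilesMatch compares the regular expression with the file's name only, not with the directory path leading to it. A small Python sketch of that behavior, using a made-up filesystem path:

```python
import posixpath
import re

def files_match(pattern, filesystem_path):
    """Mimic FilesMatch: test the regex against the last path component only."""
    return re.search(pattern, posixpath.basename(filesystem_path)) is not None

# Hypothetical file inside one of the numbered upload folders.
path = "/home/site/public_html/user/uploaded/435/photo.jpg"

print(files_match(r"^user/uploaded/?$", path))  # False: a file name never contains a slash
print(files_match(r"\.jpg$", path))             # True: the pattern only sees "photo.jpg"
```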

Why is this FilesMatch not matching correctly?

We have been attempting to configure our server not to cache our .htm files as it is causing a few issues with our analytics package as well as not displaying the pages correctly if the visitor hits the back button in their browser.
We have attempted to tackle it by adding:
<FilesMatch "\.(htm)$">
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "Wed, 11 Jan 1984 05:00:00 GMT"
Header set Warning "Testing"
</FilesMatch>
to our httpd file, but it does not appear to take effect. However, when we move the Header set lines outside of the FilesMatch, they work fine.
Anyone have any ideas where we are going wrong?
I recently needed to figure out the same kind of problem and, although this post pointed me in the right direction, I wanted to share some clarifying information for the edification of those who search on this topic in the future.
David, your initial FilesMatch was not working because FilesMatch only works on real, physical files that exist on your filesystem. http://httpd.apache.org/docs/current/sections.html states it as:
The Directory and Files directives, along with their regex counterparts, apply directives to parts of the filesystem.
This is also why your second post using LocationMatch resolved the issue. Also from http://httpd.apache.org/docs/current/sections.html, it states:
The Location directive and its regex counterpart, on the other hand, change the configuration for content in the webspace. < SNIP > The directive need not have anything to do with the filesystem. For example, the following example shows how to map a particular URL to an internal Apache HTTP Server handler provided by mod_status. No file called server-status needs to exist in the filesystem.
<Location /server-status>
SetHandler server-status
</Location>
The Apache docs summarize this behavior with the following statement:
Use Location to apply directives to content that lives outside the filesystem. For content that lives in the filesystem, use Directory and Files. An exception is <Location />, which is an easy way to apply a configuration to the entire server.
For those that want to understand more of the mechanics, this is how I understand the internals:
Location directives match based on the HTTP request URI (e.g. example.com/this/is/a/uri.htm without the example.com part).
Directory and Files directives, on the other hand, match based on whether there is a directory path or file in the filesystem under the DocumentRoot that corresponds to the respective part of the HTTP request URI.
The Apache docs summarize this behavior as:
What to use When
Choosing between filesystem containers and webspace containers is actually quite easy. When applying directives to objects that reside in the filesystem always use Directory or Files. When applying directives to objects that do not reside in the filesystem (such as a webpage generated from a database), use Location.
[IMPORTANT!] It is important to never use Location when trying to restrict access to objects in the filesystem. This is because many different webspace locations (URLs) could map to the same filesystem location, allowing your restrictions to be circumvented.
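The warning about Location can be illustrated with a toy Python model. It assumes a hypothetical server where two URL prefixes are mapped onto the same directory; the prefixes, paths, and helper functions are invented for the example.

```python
# Hypothetical URL-prefix-to-directory mapping (think of two Alias directives
# that point at the same directory).
URL_MAP = {
    "/private/": "/srv/data/private/",
    "/mirror/": "/srv/data/private/",
}

def to_filesystem(uri):
    """Translate a URL path to a filesystem path using URL_MAP, or None."""
    for prefix, directory in URL_MAP.items():
        if uri.startswith(prefix):
            return directory + uri[len(prefix):]
    return None

def blocked_by_location(uri):
    """Webspace rule: deny URLs under /private/ (like a <Location> block)."""
    return uri.startswith("/private/")

def blocked_by_directory(uri):
    """Filesystem rule: deny files under /srv/data/private/ (like a <Directory> block)."""
    path = to_filesystem(uri)
    return path is not None and path.startswith("/srv/data/private/")

for uri in ["/private/report.htm", "/mirror/report.htm"]:
    print(uri, to_filesystem(uri), blocked_by_location(uri), blocked_by_directory(uri))
```

Both URLs resolve to the same file, but only the filesystem-based rule blocks the request that arrives through /mirror/.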
This issue has now been resolved.
In order to get it to work we have changed from using FilesMatch to LocationMatch and now the headers are being set perfectly.
We believe this is because the page is being redirected from a JSP page to an HTML page.
<LocationMatch "\.(htm|html)$">
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "Wed, 11 Jan 1984 05:00:00 GMT"
Header set Warning "Testing"
</LocationMatch>
Hopefully others will find this helpful.

Set Content-Disposition header to attachment only on files in a certain directory?

I've got this rule in my htaccess file to force linked files to download rather than open in the browser:
<FilesMatch "\.(gif|jpe?g|png)$">
ForceType application/octet-stream
Header set Content-Disposition attachment
</FilesMatch>
Is there a way to alter the RegExp so it only applies to files in a certain directory?
Thanks
Like @gumbo said, put the .htaccess file in the highest-level folder you want to affect, and those settings will trickle down to subfolders. You may also want to make sure the headers module is enabled before using this in your htaccess file. The following line will generate an error if the headers module is not enabled:
Header set Content-Disposition attachment
Here's an example that forces download of mp3 files only if the headers module is enabled:
<IfModule mod_headers.c>
<FilesMatch "\.(mp3|MP3)$">
ForceType audio/mpeg
Header set Content-Disposition "attachment"
Allow from all
</FilesMatch>
</IfModule>
Note: it does not enable the module, it just ignores anything inside the IfModule tags if the module is not enabled.
To enable apache modules you'll either need to edit your httpd.conf file or in wamp server you can click the wamp tray icon and select "Apache -> Apache Modules -> headers_module" or make sure it is checked.
You will probably need to put the directives in the .htaccess file in the particular directory.
Put it in a <Location> directive, and/or modify the regex to exclude slashes or as appropriate.

.htaccess allow one specific file format only in directory listing

I have a directory but only want one file type to be listed.
I've tried the following:
<FilesMatch "\.(?!ext).*$">
Order Allow,Deny
Deny from all
</FilesMatch>
However it gives me a 403.
Is there any way to do this?
Check out the IndexIgnore directive
The IndexIgnore directive adds to the list of files to hide when listing a directory. File is a shell-style wildcard expression or full filename. Multiple IndexIgnore directives add to the list, rather than replacing the list of ignored files. By default, the list contains . (the current directory).
IndexIgnore README .htaccess *.bak *~
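IndexIgnore patterns are shell-style wildcards rather than regular expressions. As a rough illustration, Python's fnmatch module implements similar wildcard matching; the file names below are invented, and Apache's own matcher may differ in edge cases.

```python
from fnmatch import fnmatchcase

# The patterns from the IndexIgnore line above.
IGNORE_PATTERNS = ["README", ".htaccess", "*.bak", "*~"]

def hidden_from_listing(filename):
    """Return True if any ignore pattern matches the file name."""
    return any(fnmatchcase(filename, pattern) for pattern in IGNORE_PATTERNS)

# Hypothetical directory contents.
listing = ["README", "index.html", "site.bak", "notes.txt~", "data.ext"]
visible = [name for name in listing if not hidden_from_listing(name)]
print(visible)  # ['index.html', 'data.ext']
```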