remove //xx after directory in URL - apache

I need to redirect all URLs like this:
example.com/podcasts//rebt
to
example.com/podcasts
I am trying to adjust this code to do both but I can't get it to work:
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]

To remove //<something> at the end of the URL-path (eg. /podcasts//rebt to /podcasts, try the following instead at the top of the root .htaccess file:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s([^?]+?)//
RewriteRule . %1 [R=301,L]
THE_REQUEST server variable contains the first line of the initial request headers (eg. GET /podcasts/rebt HTTP/1.1) and does not change when the request is internally rewritten (unlike REQUEST_URI).
The regex \s([^?]+?)// captures the part of the URL-path before the first instance of a double slash. Anything after and including the double slash, are discarded. This regex also ensures we do not inadvertently match against the query string (if any).
The %1 backreference contains the captured subpattern (ie. everything before the first double slash in the URL-path) from the preceding CondPattern.
Aside: Note that this will not work properly if the preceding URL-path maps to a physical directory, since it will result in two redirects. eg. /directory//something to /directory to /directory/ (by mod_dir). In this case, you should avoid removing the first trailing slash.
You should test first with a 302 (temporary) redirect to avoid any potential caching issues and only change to a 301 (permanent) redirect when you are sure it's working as intended. You should clear your browser cache before testing.
A look at your existing rule...
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
This code is intended to reduce multiple slashes to single slashes in the URL-path, not remove the double slash and remaining path entirely. eg. /podcasts//rebt to /podcasts/rebt. However, since it checks against the REQUEST_URI server variable (which can change throughout the request) it may not work as intended.
Also, the condition that checks against the REQUEST_METHOD would seem to be redundant, unless you are erroneously POSTing to double-slashed URLs internally? A 301 redirect removes any POST data (since the browser converts it to GET) - hence why the check may be necessary in certain cases.

Related

I want to remove a string with a question mark at the end of my URL with .htaccess

I want to remove the string
?mobile=1
out from different URLs with .htaccess. So:
https://www.example.com/?mobile=1 should become https://www.example.com/
and
https://www.example.com/something/?mobile=1 should become https://www.example.com/something/
I tried the following
RewriteEngine On
RewriteRule ^(.+)?mobile=1 /$1 [R=301,L,NC]
But that does not seem to work. Any ideas?
RewriteRule ^(.+)?mobile=1 /$1 [R=301,L,NC]
The RewriteRule pattern matches against the URL-path only, which notably excludes the query string. So the above would never match. (Unless there was a %-encoded ? in the URL-path, eg. %3F)
To match the query string you need an additional condition (RewriteCond directive) and match against the QUERY_STRING server variable.
The regex .+ (1 or more) will not match the document root (ie. your first example: https://www.example.com/?mobile=1). You need to allow for an empty URL-path in this case. eg. .* (0 or more).
For example, try the following near the top of your root .htaccess file:
RewriteCond %{QUERY_STRING} =mobile=1
RewriteRule (.*) /$1 [QSD,R=301,L]
This matches the query string mobile=1 exactly, case-sensitive (as in your examples). No other URL parameters can exist. The = prefix on the CondPattern makes this an exact match string comparison, rather than a regex as it normally would.
And redirects to the same URL-path, represented by the $1 backreference in the substitution string that contains the URL-path from the captured group in the RewriteRule pattern.
The QSD (Query String Discard) flag removes the query string from the redirect response.
Test first with a 302 (temporary) redirect and and only change to a 301 (permanent) - if that is the intention - once you have confirmed this works as intended. 301s are cached persistently by the browser so can make testing problematic.

Log image filename that's cached by external cdn using htaccess

I want to keep a log of image file names whenever a specific cdn caches our images but I can't quite get it. Right now, my code looks something like:
RewriteCond %{HTTP_USER_AGENT} Photon/1.0
RewriteRule ^(.*)$ log.php?image=$1 [L]
The above always logs the image as being "log.php" even if I'm making the cdn cache "example.jpg" and I thoroughly don't understand why.
The above always logs the image as being "log.php" even if I'm making the cdn cache "example.jpg" and I thoroughly don't understand why.
Because in .htaccess the rewrite engine loops until the URL passes through unchanged (despite the presence of the L flag) and your rule also matches log.php (your rule matches everything) - so this is the "image" that is ultimately logged. The L flag simply stops the current pass through the rewrite engine.
For example:
Request /example.jpg
Request is rewritten to log.php?image=example.jpg
Rewrite engine starts over, passing /log.php?image=example.jpg to the start of the second pass.
Request is rewritten to log.php?image=log.php by the same RewriteRule directive.
Rewrite engine starts over, passing /log.php?image=log.php to the start of the third pass.
Request is rewritten to log.php?image=log.php (again).
URL has not changed in the last pass - processing stops.
You need to make an exception so that log.php itself is not processed. Or, state that all non-.php files are processed (instead of everything). Or, if only images are meant to be processed then only check for images.
For example:
# Log images only
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule ^(.+\.(?:png|jpg|webp|gif))$ log.php?image=$1 [L]
Remember to backslash-escape literal dots in the regex.
Or,
# Log Everything except log.php itself
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteCond %{REQUEST_URI} ^/(.+)
RewriteRule !^log\.php$ log.php?image=%1 [L]
In the last example, %1 refers to the captured subpattern in the preceding CondPattern. I only did it this way, rather than using REQUEST_URI directly since you are excluding the slash prefix in your original logging directive (ie. you are passing image.jpg to your script when /image.jpg is requested). If you want to log the slash prefix as well, then you can omit the 2nd condition and pass REQUEST_URI directly. For example:
# Log Everything except log.php itself (include slash prefix)
RewriteCond %{HTTP_USER_AGENT} Photon/1.0
RewriteRule !^log\.php$ log.php?image=%{REQUEST_URI} [L]
Alternatively, on Apache 2.4+ you can use the END flag instead of L to force the rewrite engine to stop and prevent further passes through the rewrite engine. For example:
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule (.+) log.php?image=$1 [END]

htaccess RewriteRule : Problem to omit all after 1st argument

Goal: Want to rewrite all URLs of type
https://www.example.com/page/1234/?/blog/foo/bar/
to
https://www.example.com/page/1234/
In .htaccess I tried many variations along the line
RewriteEngine On
RewriteBase /
RewriteRule ^page/(\d+)/(.*)$ /page/$1 [R=301,L]
Using an .htaccess tester I see that at least the matching pattern is valid.
I would expect that the rewrite would not include anything after $1, but it does, and show the complete original URL.
What am I missing?
https://www.mypage.com/page/1234/?/blog/foo/bar/
Everything after the first ? is the query string part of the URL. By default, Apache passes the query string unaltered from the request to the target URL (unless you create a new query string yourself on the RewriteRule substitution). This explains why you are seeing the same query string on the target URL, without seemingly doing anything with it.
Incidentally, the RewriteRule pattern only matches against the URL-path only - this notably excludes the query string. To match the query string in mod_rewrite you need an additional condition that checks the QUERY_STRING server variable.
On Apache 2.4+ you can use the QSD (Query String Discard) flag to remove the query string from the target URL. Or, specify an empty query string on the substitution - by including a trailing ? (the ? itself does not appear on the resulting URL).
For example (on Apache 2.4):
RewriteCond %{QUERY_STRING} .
RewriteRule ^page/(\d+)/ /page/$1/ [QSD,R=301,L]
The RewriteCond directive checks for the presence of a query string, which is necessary to prevent a redirect loop.
The trailing (.*)$ on the RewriteRule pattern was superfluous.
You had omitted the trailing slash on the end of the substitution (that is present on the example URL). This would have also prevented a redirect loop, but as mentioned, this is not as per your example. (Alternatively, you could include the slash in the captured backreference.)
If you are still on Apache 2.2 then you would need to include a trailing ? instead of the QSD flag. For example:
RewriteRule ^page/(\d+)/ /page/$1/? [R=301,L]
You will need to clear your browser cache before testing, as 301 (permanent) redirects are cached persistently by the browser. For this reason, it is often easier to first test with 302 (temporary) redirects.

.htaccess mod_rewrite linking to wrong page

I have in my .htaccess the following code:
RewriteEngine On
RewriteRule ^/?([^/\.]+)/?$ $1.php [L]
RewriteRule ^/?([^/\.]+).php$ $1/ [R,L]
RewriteRule ^/?([^/\.]+)/?$ $1.php [L] is working fine. What this is doing is taking a url like http://www.example.com/whatever and making it read the page as http://www.example.com/whatever.php.
However, what I'd like to be able to do is take a url like http://www.example.com/whatever.php and automatically send it to http://www.example.com/whatever, hence the second line of the code. However, this isn't working. What its doing now, is as soon as it comes across a link ending in .php, the url becomes http://localhost/C:/Sites/page/whatever/, and pulling a 403: Forbidden page.
All I want to know is what I can to so that http://www.example.com/whatever.php will be read as http://www.example.com/whatever, and that if http://www.example.com/whatever.php is entered into the URL bar, it will automatically redirect to http://www.example.com/whatever.
Does that make any sense?
EDIT
Ok, so it appears I wasn't all too clear.. basically, I want /whatever/ to read as whatever.php while the URL still stays as /whatever/, right? However, if the URL was /whatever.php, I want it to actually redirect the users URL to /whatever/, and then once again read it as whatever.php. Is this possible?
If you're rules are inside an .htaccess file, you can omit the leading slash when you match against a URI:
RewriteRule ^([^/\.]+)/?$ /$1.php [L]
Also note that a leading slash is included in the target (/$1.php), this makes sure /whatever/ gets rewritten to /whatever.php. When you redirect, if you are missing this leading slash, apache prepends the document root to it. Thus /whatever.php gets redirected to the document root C:/Sites/page/whatever/. Even if you include the leading slash, this will never work because you're going to cause a redirect loop:
Enter "http://www.example.com/whatever.php" in your address bar
apache redirects you to "http://www.example.com/whatever/"
apache gets the URI whatever/ and applies the first rule and the URI gets rewritten to /whatever.php
The URI gets put through the rewrite engine again
the URI /whatever.php matches the second rule and redirects the browser to "http://www.example.com/whatever/"
repeat steps 3-5
You need to add a condition that the actual request is for /whatever.php:
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /([^/\.]+)\.php
RewriteRule ^ /%2/ [R,L]
So altogether, you'll have:
RewriteEngine On
RewriteRule ^([^/\.]+)/?$ /$1.php [L]
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /([^/\.]+)\.php
RewriteRule ^ /%2/ [R,L]
You're making a relative path substitution in a per-directory context (.htaccess is a per-directory context). This requires RewriteBase. Per-directory rewrites are done in a later stage of processing, when URLs have been mapped to paths. But the rewrite must produce a URL, which is processed again. I think without the RewriteBase to supply the URL prefix, you end up with a filesystem prefix instead of the URL. That may be why you're getting the C:/Sites thing. Try RewriteBase. But after a correct RewriteBase to specify the correct URL prefix to be tacked in front to the relative rewritten part, I'm afraid you will have the rewrite loop, because you're rewriting whatever.php to whatever; and whatever to whatever.php.
Reference: http://httpd.apache.org/docs/current/rewrite/tech.html

What's going on with my mod_rewrite?

I have a simple mod_rewrite system set up on my site which basically converts
http://site.com/file -> http://site.com/file.php
Here's the .htaccess file
Options -MultiViews
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.site.com
RewriteRule ^(.*)$ http://site.com/$1 [R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z]+)/?$ http://site.com/$1.php [L]
This was working for a long time and then a couple of days ago I realized that while the RewriteRule was working, it was actually changing my URL in the status bar.
For instance, it would redirect /photos to /photos.php, but it would also change the URL to show the .php. This has never happened before and I'm not sure what happened to trigger the change.
Any ideas?
The first rewrite rule needs the [L] flag. From the mod_rewrite documentation for the [R] flag:
You will almost always want to use [R] in conjunction with [L] (that is, use [R,L]) because on its own, the [R] flag prepends http://thishost[:thisport] to the URI, but then passes this on to the next rule in the ruleset, which can often result in 'Invalid URI in request' warnings.
In this case, you don't get a warning, but appending the ".php" extension happens before issuing the redirect rather than when the second, redirected request comes in.
Also, remove the scheme and domain name from the substitution in the second rewrite rule. A full URL can cause an implicit redirect. From the documentation for RewriteRule:
The Substitution of a
rewrite rule is the string that replaces the original URL-path that
was matched by Pattern. The Substitution may
be a:
[...]
Absolute URL
If an absolute URL is specified,
mod_rewrite checks to see whether the
hostname matches the current host. If it does, the scheme and
hostname are stripped out and the resulting path is treated as
a URL-path. Otherwise, an external redirect is performed for
the given URL. To force an external redirect back to the
current host, see the [R] flag below.