Why does RewriteRule ^page/?$ page.php [L] match site.com/page// - apache

RewriteEngine on
RewriteRule ^page/?$ page.php [L]
This ends up matching the url www.site.com/page// but internally it acts differently than www.site.com/page/ because the stylesheets and images no longer appear properly. Am I doing something wrong or is this just something I need to deal with if I don't want to go through a lot of trouble?
To me it looks like it should only match www.site.com/page or www.site.com/page/

Apache strips empty path segments, so /path// is treated as /path/. Your browser doesn’t, so it resolves relative URLs against /path//.
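To see why the stylesheets and images break, here is a rough sketch of how a browser resolves a relative reference against each request URL. It only models the path-merge step of RFC 3986 (no dot-segment handling), and css/style.css is a made-up stylesheet link used purely for illustration.

```python
def resolve_relative(base_path: str, reference: str) -> str:
    """Simplified RFC 3986 merge: keep the base path up to and including
    its last slash, then append the relative reference."""
    directory = base_path[: base_path.rfind("/") + 1]
    return directory + reference


# css/style.css is a hypothetical relative link inside page.php
for base in ("/page", "/page/", "/page//"):
    print(base, "->", resolve_relative(base, "css/style.css"))
```

The three request URLs produce three different stylesheet URLs, even though the same page.php answers all of them.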
If you want to remove the multiple slashes, you can use the following rule:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/\ ]+/)*)/+([^\ ]*)
RewriteRule ^ /%1%3 [L,R=301]
Explanation
Although Apache removes empty path segments internally, the THE_REQUEST variable (holding the HTTP request line) stays untouched. So we can use this value to check for multiple slashes.
^[A-Z]+\ / matches the request method, the following space and the first slash character of the URI path.
(([^/\ ]+/)*) matches all following non-empty path segments (foo/, foo/bar/, foo/bar/baz/, etc.) or nothing, if there are none.
/+ matches the empty path segments as the character before this slash is always another slash (see the expressions before).
([^\ ]*) matches the rest of the URI (that may contain further empty path segments).
Example: Let’s say we request http://example.com/foo/bar//baz, the request line will look like this:
GET /foo/bar//baz HTTP/1.1
The pattern would then match as follows:
0: GET /foo/bar//baz
1: foo/bar/
2: bar/
3: baz
So the requested path /foo/bar//baz would be redirected to /foo/bar/baz (/%1%3).
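The walkthrough above can be checked with a short Python sketch. It assumes Python's re module treats this particular pattern the same way Apache's regex engine does, which should hold for the simple constructs used here; redirect_target is a hypothetical helper name.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond above; "\ " is an escaped space.
REQUEST_LINE = re.compile(r"^[A-Z]+\ /(([^/\ ]+/)*)/+([^\ ]*)")


def redirect_target(request_line: str) -> Optional[str]:
    """Return the path the rule would redirect to, or None when the
    request line contains no empty path segment."""
    m = REQUEST_LINE.search(request_line)
    if m is None:
        return None
    return "/" + m.group(1) + m.group(3)  # mirrors /%1%3


m = REQUEST_LINE.search("GET /foo/bar//baz HTTP/1.1")
print(m.group(0), m.group(1), m.group(2), m.group(3), sep=" | ")
print(redirect_target("GET /page// HTTP/1.1"))
print(redirect_target("GET /foo/bar HTTP/1.1"))
```

Each redirect removes only the first run of empty segments, so a path such as /a//b//c is cleaned up over two successive redirects.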

Set environment variable in .htaccess when the URL starts with directory

I'm trying to set an environment variable in my .htaccess based on the 1st subdirectory of a URL. As long as the URL contains dira as the 1st subdirectory, it should set my variable (myvar) to a value of 123. For example, these URLs should work:
test.com/dira
test.com/dira/
test.com/dira/b
test.com/dira/b/
test.com/dira/b/random.html
test.com/dira/b/2
test.com/dira/b/2/
test.com/dira/b/2/and-so-in-into-infinity
Any other URL should be ignored, like these:
test.com
test.com/
test.com/dirb
test.com/dirc/123/456/789/blah
test.com/123/dira/
test.com/dirz/
I've tried all these rules in my .htaccess without success:
RewriteRule ^dira(.*)$ [E=myvar:123]
RewriteRule ^dira/(.*)$ [E=myvar:123]
RewriteRule ^dira/? [E=myvar:123]
RewriteRule ^dira/?$ [E=myvar:123]
In every case, myvar isn't set. So my question is, what's the syntax to wildcard everything after the first sub-directory? Thanks.
You are missing the substitution string (2nd argument) so none of these directives are actually setting an environment variable. [E=myvar:123] is seen as the substitution string (2nd argument), so would result in a rather malformed internal rewrite.
It should read something like this:
RewriteRule ^dira($|/) - [E=myvar:123]
Note the single hyphen (-) as the substitution string (2nd argument) indicating "no substitution".
The regex ^dira($|/) matches any URL-path that has dira as its first complete path segment. For example, it matches dira, dira/ and dira/anything, but not dirasomething.
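As a quick sanity check, the pattern can be run against the URLs from the question in Python. This is only a sketch of the regex behaviour, not of mod_rewrite itself. The sample paths are written without a leading slash because that is how a RewriteRule pattern sees them in .htaccess.

```python
import re

FIRST_SEGMENT = re.compile(r"^dira($|/)")

# URL-paths as seen by a RewriteRule in .htaccess (no leading slash).
should_match = ["dira", "dira/", "dira/b", "dira/b/random.html", "dira/b/2/"]
should_not_match = ["", "dirb", "dirc/123/456/789/blah", "123/dira/", "dirz/", "dirasomething"]

for path in should_match + should_not_match:
    print(repr(path), bool(FIRST_SEGMENT.search(path)))
```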
However, there are additional "problems" resulting from other directives in your .htaccess file...
RewriteRule ^([^/\.]+)/?$ index.php?acid={ENV:myvar} [L]
(Aside: That should be %{ENV:myvar} - you are missing the % prefix.)
If that is the only other rule you have in your .htaccess file then the myvar env var should now be successfully set when requesting /dira/something, but not /dira or /dira/. (Because both /dira and /dira/ are internally rewritten by the above rule.)
The reason being, when matched, the above RewriteRule triggers a second pass of the rewrite engine (when used in .htaccess). This second pass causes any environment variables that are already set to be renamed with a REDIRECT_ prefix. ie. myvar becomes REDIRECT_myvar. myvar is not (re)set on the second pass because index.php does not match your rule.
You can either:
Modify the above RewriteRule directive (that internally rewrites the request) to prevent a second pass of the rewrite engine. This can be done by changing the L flag to END (requires Apache 2.4). For example:
RewriteRule ^([^/.]+)/?$ index.php?acid=%{ENV:myvar} [END]
Note that I corrected the env var reference by including the % prefix. And there is no need to backslash-escape literal dots when used inside a regex character class.
However, you will need to be mindful of future rules you add in order to prevent loops by the rewrite engine. And sometimes the looping nature of the rewrite engine can be desirable.
OR
Check for both myvar and REDIRECT_myvar in your PHP code (ie. $_SERVER['REDIRECT_myvar']). Since not all requests that start dira are rewritten by the above rule, you'll have either myvar or REDIRECT_myvar set at different times. Specifically, /dira and /dira/ will result in REDIRECT_myvar being set and /dira/something will have the expected myvar set (since it's not rewritten).
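The second option amounts to a small fallback lookup. The answer refers to PHP's $_SERVER; the sketch below expresses the same idea in Python, with a plain dict standing in for the server variables and read_rewrite_var as a hypothetical helper name.

```python
from typing import Mapping, Optional


def read_rewrite_var(server_vars: Mapping[str, str], name: str) -> Optional[str]:
    """Return the variable under its plain name or, failing that, under
    the REDIRECT_ prefix added on a second pass of the rewrite engine."""
    for key in (name, "REDIRECT_" + name):
        if key in server_vars:
            return server_vars[key]
    return None


print(read_rewrite_var({"myvar": "123"}, "myvar"))           # /dira/something
print(read_rewrite_var({"REDIRECT_myvar": "123"}, "myvar"))  # /dira or /dira/
```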

htaccess url redirect with get parameters ID and reduce value

I want to do an url redirect to a new domain by retrieving the ID parameter but only taking the first 4 characters. Anyone know how to do this?
For example, an original url:
http://www.original.example/see/news/actualite.php?newsId=be9e836&newsTitle="blablabla"
To :
https://www.new.example/actualites/be9e
I have tested :
RewriteCond %{QUERY_STRING} ^newsId=(.*)$ [NC]
RewriteRule ^$ https://www.new.example/actualites/%1? [NC,L,R]
There are a couple of problems with this:
The regex ^$ in the RewriteRule pattern only matches the document root. The URL in your example is /see/news/actualite.php - so this rule will never match (and the conditions are never processed).
The regex ^newsId=(.*)$ is capturing everything after newsId=, including any additional URL parameters. You only need the first 4 characters of this particular URL param.
As an aside, your existing condition is dependent on newsId being the first URL parameter. Maybe this is always the case, maybe not. But it is relatively trivial to check for this URL parameter, regardless of order.
Also, do you need a case-insensitive match? Or is it always newsId as stated in your example. Only use the NC flag if this is necessary, not as a default.
Try the following instead:
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1 [QSD,R,L]
The %1 backreference now contains just the first 4 characters of the newsId URL parameter value (ie. non & characters), as denoted by the regex ([^&]{4}).
The QSD flag (Apache 2.4) discards the original query string from the redirect response. There is no need to append ? (an empty query string) to the substitution string, as would have been required in earlier versions of Apache.
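The condition's regex can be tried out on sample query strings in Python. This is a sketch of the pattern's behaviour only; redirect_url is a hypothetical helper that builds the same target as the substitution string.

```python
import re
from typing import Optional

NEWS_ID = re.compile(r"(?:^|&)newsId=([^&]{4})")


def redirect_url(query_string: str) -> Optional[str]:
    """Build the redirect target from the first 4 characters of newsId,
    or return None when the condition would not match."""
    m = NEWS_ID.search(query_string)
    if m is None:
        return None
    return "https://www.new.example/actualites/" + m.group(1)


print(redirect_url('newsId=be9e836&newsTitle="blablabla"'))
print(redirect_url("newsTitle=x&newsId=be9e836"))  # parameter order does not matter
print(redirect_url("newsId=abc"))                  # fewer than 4 characters: no match
```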
UPDATE:
I have an anchor link (#) which is added at the end of the link, is there a possibility of deleting it to make a clean link? Example, currently I have: https://www.new.example/news/4565/#title Ideally : https://www.new.example/news/4565
The "problem" here is that the browser manages the "fragment identifier" (fragid) (ie. the "anchor link (#)") and preserves this through the redirect. In other words, the browser re-appends the fragid to the redirect response from the server. The fragid is never sent to the server, so we cannot detect this server side prior to issuing the HTTP redirect.
The only thing we can do is to append an empty fragid (ie. a trailing #) in the hope that the browser discards the original fragment. Unfortunately, you will likely end up with a trailing # on your redirected URLs (browser dependent).
For example (simplified):
:
RewriteRule .... https://example.com/# [R=301,NE,L]
Note that you will need the NE flag here to prevent Apache from URL-encoding the # in the redirect response.
Like I say above, browsers might handle this differently.
Further reading:
URL Fragment and 302 redirects
redirect is keeping hash
How to clear fragment identifier on 302 redirect?

What is really replaced by a substitution in a RewriteRule inside .htaccess files?

I am trying to understand the Apache Rewrite Module. The documentation at RewriteRule Directive Documentation says:
The Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern.
Does it mean that the whole URL-Path is replaced when a match occurs or only the part that is matched by the regex?
For example:
RewriteRule ^test new
Now I type into my browser: http://example.com/test/file.php
test/file.php is matched against ^test and the substitution is new. What will the result look like?
new/file.php or just new?
What exactly is replaced here?
"new/file.php" or just "new"?
Just "new". The substitution string replaces the whole URL-path if the Pattern matched.
In order to get the result new/file.php, having requested /test/file.php, you would need a slightly different directive. For example:
RewriteRule ^test/(.*) new/$1 [L]
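The difference between the two directives can be illustrated with a toy model in Python. This is a simplified sketch, not how mod_rewrite is implemented: apply_rule is a hypothetical function that replaces the whole URL-path when the pattern matches, and only $0 to $9 backreferences are expanded.

```python
import re


def apply_rule(pattern: str, substitution: str, url_path: str) -> str:
    """Toy model of an internal rewrite: if the pattern matches anywhere,
    the whole URL-path is replaced by the expanded substitution."""
    m = re.search(pattern, url_path)
    if m is None:
        return url_path  # rule does not apply

    def expand(ref):
        # Replace $N with capture group N (empty if it does not exist).
        n = int(ref.group(1))
        return (m.group(n) or "") if n <= m.re.groups else ""

    return re.sub(r"\$(\d)", expand, substitution)


print(apply_rule(r"^test", "new", "test/file.php"))            # new
print(apply_rule(r"^test/(.*)", "new/$1", "test/file.php"))    # new/file.php
# For contrast, a plain regex replace only swaps the matched part:
print(re.sub(r"^test", "new", "test/file.php"))                # new/file.php
```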
UPDATE: Note that the above is an internal rewrite (as opposed to an external redirect). The rewrite is internal to the server, the user does not see this.
Also note that new/$1 is a relative path (as opposed to root-relative - starting with a slash, or absolute - with a scheme + hostname).
If the .htaccess file is in the root directory then you shouldn't need to include the slash prefix (for an internal rewrite) since - by default - it's relative to the directory that contains the .htaccess file. (No harm to add it though.) The directory-prefix (the absolute filesystem path of where the .htaccess file is located) is added back to relative path substitutions at the end. So new/file.php actually becomes /path/to/root-directory/new/file.php (when the .htaccess file is located at /path/to/root-directory/.htaccess).
However, if you were creating an external redirect then you would need a slash prefix (in order to make it root-relative - relative to the document root), or set the RewriteBase directive (which overrides the directory-prefix). Otherwise, you would end up with an external redirect to http://example.com/path/to/root-directory/new/file.php - which is most probably "wrong".
For example:
RewriteRule ^test/(.*) /new/$1 [R,L]
Note the R flag, which triggers an external redirect back to the client.

apache redirect rule with sharp character in query string

I have an old url of the form: http://example.com/foo.php?title=foobar#9. The person who constructed the url did not mean for the sharp character to be an actual anchor to the page, it is a special character within the title param value; title IS actually 'foobar#9'.
Now I need to create a rewrite rule. Using the following:
RewriteCond %{query_string} ^title=foobar#9$
RewriteRule ^/foo.php$ http://example.com/test? [R=301,L]
the condition is never matched. Using the following
RewriteCond %{query_string} ^title=foobar%239$
RewriteRule ^/foo.php$ http://example.com/test? [R=301,L]
the condition is only matched when the url is actually encoded (http://example.com/foo.php?title=foobar%239). Is there any way I can achieve that a user clicks on the (non-encoded) url http://example.com/foo.php?title=foobar#9 link and the condition is matched (and the rewrite rule takes effect)?
I'm not sure this can work. The # is something that is usually interpreted only by browsers.
The browser doesn't actually send this character to the webserver (unless encoded) so your rule won't ever match.
This was the log line I saw when trying:
127.0.0.1 - - [15/Apr/2014] "GET /?title=foobar HTTP/1.1" 200 466 "-"
Note that the #9 is missing from the request altogether.
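The same split can be reproduced with Python's standard URL parser. This is only a sketch of how a client divides the URL before sending the request, not a model of any particular browser.

```python
from urllib.parse import quote, urlsplit

parts = urlsplit("http://example.com/foo.php?title=foobar#9")
print(parts.query)     # title=foobar
print(parts.fragment)  # 9

# Only the path and query go into the request line; the fragment stays client-side.
request_target = parts.path + "?" + parts.query
print(request_target)  # /foo.php?title=foobar

# The link would have to be generated with the # percent-encoded to reach the server.
encoded_query = "title=" + quote("foobar#9", safe="")
print(encoded_query)   # title=foobar%239
```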

modrewrite alter 1 element of query string

Flash movies are called based on dynamic links on mypage.php. mypage.php has the flash player embedded. The links look like mypage.php?moviefolder=folder1/folder2&swfTitle=sometitle.swf. mypage.php is parsed on each link click (per the href). Folder2 is always the same but movieTitle.swf is dynamic. Sometimes subfolders will be called (folder2/subfolder2/sometitle.swf).
Can mod_rewrite allow the query string to reflect folder2 but instead silently serve folder3 as well as occasional subfolders? I would place all files in folder3. The goal is to have the user not know where the swfs are. Thanks in advance again!
Using a RewriteCond to match the contents of the query string (since it is not examined by the RewriteRule pattern), you can extract swfTitle=sometitle.swf and substitute folder1/folder3 for folder1/folder2 in moviefolder.
This will use a regex pattern like ([^&]+) to match everything up to the next & (which denotes another query param).
# Capture everything after folder2 (possibly nothing) into %1
RewriteCond %{QUERY_STRING} moviefolder=folder1/folder2([^&]*) [NC]
# %N only refers to the last matched condition, so carry %1 forward in the
# test string and capture the swfTitle value into %2
RewriteCond %1#%{QUERY_STRING} ^([^#]*)#(?:.*&)?swfTitle=([^&]+) [NC]
# Then silently rewrite mypage.php to substitute folder3,
# and pass in the original swfTitle captured above
RewriteRule ^mypage\.php$ mypage.php?moviefolder=folder1/folder3%1&swfTitle=%2 [L]
Hopefully, you won't get a rewrite loop, since the rewritten folder1/folder3 won't match the second time. [NC] allows for a case-insensitive match.
I did manage to successfully test this over at http://htaccess.madewithlove.be/, using the sample input:
http://example.com/mypage.php?swfTitle=thetitle.swf&moviefolder=folder1/folder2/thing
---> http://example.com/mypage.php?moviefolder=folder1/folder3/thing&swfTitle=thetitle.swf
http://example.com/mypage.php?moviefolder=folder1/folder2/thing999zzz&swfTitle=thetitle.swf
---> http://example.com/mypage.php?moviefolder=folder1/folder3/thing999zzz&swfTitle=thetitle.swf
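The intended transformation can also be sketched in Python, which makes it easy to try further inputs. This models the end result only, not Apache's processing; rewrite_query is a hypothetical helper, and allowing an empty suffix after folder2 is an assumption based on the link format in the question.

```python
import re
from typing import Optional


def rewrite_query(query_string: str) -> Optional[str]:
    """Swap folder2 for folder3 in moviefolder and keep swfTitle;
    return None when either parameter is missing (no rewrite)."""
    folder = re.search(r"(?:^|&)moviefolder=folder1/folder2([^&]*)", query_string, re.IGNORECASE)
    title = re.search(r"(?:^|&)swfTitle=([^&]+)", query_string, re.IGNORECASE)
    if folder is None or title is None:
        return None
    return f"moviefolder=folder1/folder3{folder.group(1)}&swfTitle={title.group(1)}"


print(rewrite_query("swfTitle=thetitle.swf&moviefolder=folder1/folder2/thing"))
print(rewrite_query("moviefolder=folder1/folder2/thing999zzz&swfTitle=thetitle.swf"))
# An already rewritten query string no longer matches, so there is no loop:
print(rewrite_query("moviefolder=folder1/folder3/thing&swfTitle=thetitle.swf"))
```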