Set environment variable in .htaccess when the URL starts with directory - apache

I'm trying to set an environment variable in my .htaccess based on the 1st subdirectory of a URL. As long as the URL contains dira as the 1st subdirectory, it should set my variable (myvar) to a value of 123. For example, these URLs should work:
test.com/dira
test.com/dira/
test.com/dira/b
test.com/dira/b/
test.com/dira/b/random.html
test.com/dira/b/2
test.com/dira/b/2/
test.com/dira/b/2/and-so-in-into-infinity
Any other URL should be ignored, like these:
test.com
test.com/
test.com/dirb
test.com/dirc/123/456/789/blah
test.com/123/dira/
test.com/dirz/
I've tried all these rules in my .htaccess without success:
RewriteRule ^dira(.*)$ [E=myvar:123]
RewriteRule ^dira/(.*)$ [E=myvar:123]
RewriteRule ^dira/? [E=myvar:123]
RewriteRule ^dira/?$ [E=myvar:123]
In every case, myvar isn't set. So my question is, what's the syntax to wildcard everything after the first sub-directory? Thanks.

You are missing the substitution string (2nd argument), so none of the directives you tried actually sets an environment variable. Instead, [E=myvar:123] is seen as the substitution string (2nd argument), which would result in a rather malformed internal rewrite.
It should read something like this:
RewriteRule ^dira($|/) - [E=myvar:123]
Note the single hyphen (-) as the substitution string (2nd argument) indicating "no substitution".
The regex ^dira($|/) matches any URL-path whose first complete path segment is dira. E.g. it matches dira, dira/ and dira/anything, but not dirasomething.
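To check the pattern against the URLs from the question, here is a quick Python sketch. It assumes the URL-path is the part after test.com/ with no leading slash (which is how a RewriteRule in the root .htaccess sees it), and that Python's re module treats this simple pattern the same way as Apache's PCRE.

```python
import re

# Same pattern as the RewriteRule: "dira" as the first complete path segment
DIRA_PATTERN = re.compile(r"^dira($|/)")

def sets_myvar(url_path: str) -> bool:
    """Return True if the RewriteRule pattern would match this URL-path."""
    return DIRA_PATTERN.search(url_path) is not None

# URL-paths from the question (test.com/ removed), plus "dirasomething"
should_match = ["dira", "dira/", "dira/b", "dira/b/random.html",
                "dira/b/2/", "dira/b/2/and-so-in-into-infinity"]
should_not_match = ["", "dirb", "dirc/123/456/789/blah", "123/dira/",
                    "dirz/", "dirasomething"]

for path in should_match + should_not_match:
    print(f"{path!r:40} -> {sets_myvar(path)}")
```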
However, there are additional "problems" resulting from other directives in your .htaccess file...
RewriteRule ^([^/\.]+)/?$ index.php?acid={ENV:myvar} [L]
(Aside: That should be %{ENV:myvar} - you are missing the % prefix.)
If that is the only other rule you have in your .htaccess file then the myvar env var should now be successfully set when requesting /dira/something, but not /dira or /dira/. (Because both /dira and /dira/ are internally rewritten by the above rule.)
The reason is that, when it matches, the above RewriteRule triggers a second pass of the rewrite engine (when used in .htaccess). This second pass causes any environment variables that are already set to be renamed with a REDIRECT_ prefix, i.e. myvar becomes REDIRECT_myvar. myvar is not (re)set on the second pass because index.php does not match your rule.
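A quick Python check of the two patterns shows which requests get rewritten (and so trigger the second pass) and why the second pass does not set myvar again. It assumes URL-paths as seen by the root .htaccess (no leading slash) and that Python's re agrees with Apache's PCRE on these simple patterns. It only checks the patterns; it does not model the REDIRECT_ renaming itself.

```python
import re

# The user's rewrite pattern (redundant backslash removed) and the myvar pattern
REWRITE_PATTERN = re.compile(r"^([^/.]+)/?$")
DIRA_PATTERN = re.compile(r"^dira($|/)")

# "index.php" is the URL-path a rewritten request is checked as on the second pass
for path in ["dira", "dira/", "dira/something", "index.php"]:
    rewritten = REWRITE_PATTERN.search(path) is not None
    sets_myvar = DIRA_PATTERN.search(path) is not None
    print(f"{path!r:18} rewritten: {rewritten!s:5}  sets myvar: {sets_myvar}")
```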
You can either:
Modify the above RewriteRule directive (that internally rewrites the request) to prevent a second pass of the rewrite engine. This can be done by changing the L flag to END (requires Apache 2.4). For example:
RewriteRule ^([^/.]+)/?$ index.php?acid=%{ENV:myvar} [END]
Note that I corrected the env var reference by including the % prefix. And there is no need to backslash-escape literal dots when used inside a regex character class.
However, you will need to be mindful of future rules you add in order to prevent loops by the rewrite engine. And sometimes the looping nature of the rewrite engine can be desirable.
OR
Check for both myvar and REDIRECT_myvar in your PHP code (i.e. $_SERVER['REDIRECT_myvar']). Since not all requests that start with dira are rewritten by the above rule, you'll have either myvar or REDIRECT_myvar set, depending on the request. Specifically, /dira and /dira/ will result in REDIRECT_myvar being set, and /dira/something will have the expected myvar set (since it's not rewritten).
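The fallback logic is simple; here it is sketched in Python rather than PHP, with a plain dict standing in for $_SERVER. The function name get_myvar and the sample data are hypothetical.

```python
from typing import Mapping, Optional

def get_myvar(server: Mapping[str, str]) -> Optional[str]:
    """Return myvar, falling back to REDIRECT_myvar after an internal rewrite.

    `server` stands in for PHP's $_SERVER array (hypothetical example data).
    """
    # Not rewritten (e.g. /dira/something): the variable keeps its own name
    if "myvar" in server:
        return server["myvar"]
    # Rewritten (e.g. /dira or /dira/): the second pass renamed it
    return server.get("REDIRECT_myvar")

print(get_myvar({"myvar": "123"}))           # e.g. /dira/something
print(get_myvar({"REDIRECT_myvar": "123"}))  # e.g. /dira or /dira/
print(get_myvar({}))                         # any other URL
```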

Related

htaccess url redirect with get parameters ID and reduce value

I want to redirect a URL to a new domain by retrieving the ID parameter, but only taking its first 4 characters. Does anyone know how to do this?
For example, an original url:
http://www.original.example/see/news/actualite.php?newsId=be9e836&newsTitle="blablabla"
To :
https://www.new.example/actualites/be9e
I have tried:
RewriteCond %{QUERY_STRING} ^newsId=(.*)$ [NC]
RewriteRule ^$ https://www.new.example/actualites/%1? [NC,L,R]
There are a couple of problems with this:
The regex ^$ in the RewriteRule pattern only matches the document root. The URL in your example is /see/news/actualite.php - so this rule will never match (and the conditions are never processed).
The regex ^newsId=(.*)$ is capturing everything after newsId=, including any additional URL parameters. You only need the first 4 characters of this particular URL param.
As an aside, your existing condition is dependent on newsId being the first URL parameter. Maybe this is always the case, maybe not. But it is relatively trivial to check for this URL parameter, regardless of order.
Also, do you need a case-insensitive match? Or is it always newsId, as stated in your example? Only use the NC flag if it is necessary, not as a default.
Try the following instead:
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1 [QSD,R,L]
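The %1 backreference now contains just the first 4 characters of the newsId URL parameter value (i.e. non-& characters), as denoted by the regex ([^&]{4}). Note that this also means the condition only matches when the value is at least 4 characters long.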
The QSD flag (Apache 2.4) discards the original query string from the redirect response. There is no need to append ? (an empty query string) to the substitution string, as would have been required in earlier versions of Apache.
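To see what %1 ends up holding, here is a minimal Python model of the RewriteCond regex and the resulting redirect target. It assumes Python's re behaves like Apache's PCRE for this pattern, and it does not model the RewriteRule path check (^see/news/actualite\.php$). The function name redirect_target is hypothetical.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond: newsId anywhere in the query string
NEWS_ID = re.compile(r"(?:^|&)newsId=([^&]{4})")

def redirect_target(query_string: str) -> Optional[str]:
    """Return the redirect URL the rule would produce, or None if the condition fails."""
    m = NEWS_ID.search(query_string)
    if m is None:
        return None
    # %1 is the first 4 chars of newsId; with QSD no query string is appended
    return "https://www.new.example/actualites/" + m.group(1)

print(redirect_target('newsId=be9e836&newsTitle="blablabla"'))
print(redirect_target("newsTitle=x&newsId=abcd123"))  # parameter order doesn't matter
```

Because of the (?:^|&) prefix, a parameter such as xnewsId=... is not mistaken for newsId.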
UPDATE:
I have an anchor link (#) which is added at the end of the link, is there a possibility of deleting it to make a clean link? Example, currently I have: https://www.new.example/news/4565/#title Ideally : https://www.new.example/news/4565
The "problem" here is that the browser manages the "fragment identifier" (fragid) (ie. the "anchor link (#)") and preserves this through the redirect. In other words, the browser re-appends the fragid to the redirect response from the server. The fragid is never sent to the server, so we cannot detect this server side prior to issuing the HTTP redirect.
The only thing we can do is to append an empty fragid (ie. a trailing #) in the hope that the browser discards the original fragment. Unfortunately, you will likely end up with a trailing # on your redirected URLs (browser dependent).
For example (simplified):
:
RewriteRule .... https://example.com/# [R=301,NE,L]
Note that you will need the NE flag here to prevent Apache from URL-encoding the # in the redirect response.
As I said above, browsers might handle this differently.
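The reason the trailing # can help is the fragment rule in the HTTP specification (RFC 7231, section 7.1.2, carried over into RFC 9110): if the Location header has no fragment, the user agent keeps the original one; if it has its own fragment, even an empty one, that is used instead. Below is a simplified Python model of that rule only, not of any particular browser, which may behave differently. The function name followed_url is hypothetical.

```python
from urllib.parse import urldefrag

def followed_url(original_url: str, location: str) -> str:
    """Model of how a user agent combines a redirect Location with the original fragment."""
    if "#" in location:
        # Location has its own fragment (possibly empty), so it wins
        return location
    # No fragment in Location: inherit the original URL's fragment, if any
    _, fragment = urldefrag(original_url)
    return f"{location}#{fragment}" if fragment else location

orig = "http://www.original.example/see/news/actualite.php?newsId=4565#title"
print(followed_url(orig, "https://www.new.example/news/4565"))   # fragment inherited
print(followed_url(orig, "https://www.new.example/news/4565#"))  # empty fragment wins
```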
Further reading:
URL Fragment and 302 redirects
redirect is keeping hash
How to clear fragment identifier on 302 redirect?

.htaccess : Pretty URL with whatever number+names of parameters

Hello !
I know there are already a lot of topics about URL rewriting, and I honestly swear I've spent a lot of time trying to apply them to my problem, but I can't see any of them that applies exactly to my situation (if you find otherwise, please give the link).
-----
Here's the problem :
I'm learning MVC model and URL rewriting and I have my URL like this :
http://localhost/blahblahblah/mywebsite/index.php?param1=value1&param2=value2&param3=value3 ... etc ...
What I want (for some MVC template goals) is to have this kind of URL :
http://localhost/blahblahblah/mywebsite/value1/value2/value3 ... etc ...
-----
Whatever are the names of the parameters and whatever are the values.
This is the most essential thing I can't find a solution for.
(Also don't mind the localhost/blahblahblah part. This has to work on remote websites too, but I trust it will work fine online, as this part of the URL should have no bearing on what I want to do.)
Thanks a lot for your time if you can help me seeing clearer in what I need to do.
If the .htaccess file is located in the document root (ie. effectively at http://localhost/.htaccess) then you would need to do something like the following using mod_rewrite:
RewriteEngine On
RewriteRule ^(blahblahblah/mywebsite)/(\w+)$ $1/index.php?param1=$2 [L]
RewriteRule ^(blahblahblah/mywebsite)/(\w+)/(\w+)$ $1/index.php?param1=$2&param2=$3 [L]
RewriteRule ^(blahblahblah/mywebsite)/(\w+)/(\w+)/(\w+)$ $1/index.php?param1=$2&param2=$3&param3=$4 [L]
# etc.
Where $n is a backreference to the corresponding captured group in the preceding RewriteRule pattern (1st argument).
UPDATE: \w is a shorthand character class that matches a-z, A-Z, 0-9 and _ (underscore).
A new directive is required for every number of parameters. You could combine them into a single (complex) directive but you would have lots of empty parameters when only a few parameters were passed (rather than not passing those parameters at all).
I'm assuming your URLs do not end in a trailing slash.
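To show how the $n backreferences are filled in, here is a Python model of the three root-level rules. Python writes backreferences as \n rather than $n, and re.ASCII keeps \w to a-z, A-Z, 0-9 and _ as described above. The function name rewrite is hypothetical; the first matching rule wins.

```python
import re

# The three RewriteRule pattern/substitution pairs from the root .htaccess,
# with Apache's $n backreferences written as Python's \n
RULES = [
    (r"^(blahblahblah/mywebsite)/(\w+)$",
     r"\1/index.php?param1=\2"),
    (r"^(blahblahblah/mywebsite)/(\w+)/(\w+)$",
     r"\1/index.php?param1=\2&param2=\3"),
    (r"^(blahblahblah/mywebsite)/(\w+)/(\w+)/(\w+)$",
     r"\1/index.php?param1=\2&param2=\3&param3=\4"),
]

def rewrite(url_path: str) -> str:
    """Apply the first matching rule and return the rewritten URL-path."""
    for pattern, substitution in RULES:
        m = re.search(pattern, url_path, re.ASCII)
        if m:
            return m.expand(substitution)
    return url_path  # no rule matched: the request is left unchanged

print(rewrite("blahblahblah/mywebsite/value1/value2"))
print(rewrite("blahblahblah/mywebsite/value1/"))  # trailing slash: not rewritten
```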
If, however, the .htaccess file is located in the /blahblahblah/mywebsite directory, then the directives can be simplified a bit:
RewriteRule ^(\w+)$ index.php?param1=$1 [L]
RewriteRule ^(\w+)/(\w+)$ index.php?param1=$1&param2=$2 [L]
RewriteRule ^(\w+)/(\w+)/(\w+)$ index.php?param1=$1&param2=$2&param3=$3 [L]
# etc.
Don't use URL parameters (alternative method)
An alternative approach is to not convert the path segments into URL parameters in .htaccess and instead just pass everything to index.php and let your PHP script split the URL into parameters. This allows for any number of parameters.
For example, your .htaccess file then becomes rather more simple:
RewriteRule ^\w+(/\w+)*$ index.php [L]
(This assumes the .htaccess file is located in /blahblahblah/mywebsite directory, otherwise you need to add the necessary directory prefix as above.)
The RewriteRule pattern simply validates the request URL is of the form /value1 or /value1/value2 or /value1/value2/value3 etc. And the request is rewritten to index.php (the front-controller) to handle everything.
In index.php you then examine $_SERVER['REQUEST_URI'] and parse the requested URL.
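The parsing step is language-agnostic; here is a sketch of it in Python rather than PHP. It assumes the site lives under /blahblahblah/mywebsite/ and reuses the same \w+(/\w+)* validation as the RewriteRule. BASE_PATH and parse_params are hypothetical names.

```python
import re
from typing import List
from urllib.parse import unquote, urlsplit

BASE_PATH = "/blahblahblah/mywebsite/"                 # assumed install directory
VALID_PATH = re.compile(r"\w+(?:/\w+)*", re.ASCII)     # same check as the RewriteRule

def parse_params(request_uri: str) -> List[str]:
    """Split a REQUEST_URI like /blahblahblah/mywebsite/v1/v2 into ['v1', 'v2']."""
    path = unquote(urlsplit(request_uri).path)  # drop any query string
    if not path.startswith(BASE_PATH):
        raise ValueError(f"unexpected path: {path}")
    relative = path[len(BASE_PATH):]
    if not VALID_PATH.fullmatch(relative):
        raise ValueError(f"invalid path: {relative}")
    return relative.split("/")

print(parse_params("/blahblahblah/mywebsite/value1/value2/value3"))
```

Any number of path segments is accepted, which is the point of this approach.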

What is really replaced by a substitution in a RewriteRule inside .htaccess files?

I am trying to understand the Apache Rewrite Module. The documentation at RewriteRule Directive Documentation says:
The Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern.
Does it mean that the whole URL-Path is replaced when a match occurs or only the part that is matched by the regex?
For example:
RewriteRule ^test new
Now I type into my browser: http://example.com/test/file.php
test/file.php is matched against ^test and the substitution is new. What will the result look like?
new/file.php or just new?
What exactly is replaced here?
"new/file.php" or just "new"?
Just "new". The substitution string replaces the whole URL-path if the Pattern matched.
In order to get the result new/file.php, having requested /test/file.php, you would need a slightly different directive. For example:
RewriteRule ^test/(.*) new/$1 [L]
UPDATE: Note that the above is an internal rewrite (as opposed to an external redirect). The rewrite happens internally on the server; the user does not see it.
Also note that new/$1 is a relative path (as opposed to root-relative - starting with a slash, or absolute - with a scheme + hostname).
If the .htaccess file is in the root directory then you shouldn't need to include the slash prefix (for an internal rewrite) since - by default - it's relative to the directory that contains the .htaccess file. (No harm to add it though.) The directory-prefix (the absolute filesystem path of where the .htaccess file is located) is added back to relative path substitutions at the end. So new/file.php actually becomes /path/to/root-directory/new/file.php (when the .htaccess file is located at /path/to/root-directory/.htaccess).
However, if you were creating an external redirect then you would need a slash prefix (in order to make it root-relative - relative to the document root), or set the RewriteBase directive (which overrides the directory-prefix). Otherwise, you would end up with an external redirect to http://example.com/path/to/root-directory/new/file.php - which is most probably "wrong".
For example:
RewriteRule ^test/(.*) /new/$1 [R,L]
Note the R flag, that triggers an external redirect back to the client.
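To make the "whole URL-path is replaced" point concrete, here is a small Python model of a RewriteRule match. It is a simplification: it ignores flags, conditions and the directory-prefix handling described above, and apply_rewrite_rule is a hypothetical name. Python's Match.expand() returns just the expanded template, which mirrors how the substitution replaces the whole URL-path, whereas re.sub() only replaces the matched text.

```python
import re

def apply_rewrite_rule(url_path: str, pattern: str, substitution: str) -> str:
    """Model of RewriteRule: if the pattern matches anywhere, the WHOLE
    URL-path is replaced by the substitution (backreferences expanded)."""
    m = re.search(pattern, url_path)
    if m is None:
        return url_path            # no match: URL-path unchanged
    return m.expand(substitution)  # only the expanded substitution remains

path = "test/file.php"
print(apply_rewrite_rule(path, r"^test", "new"))           # new
print(apply_rewrite_rule(path, r"^test/(.*)", r"new/\1"))  # new/file.php
# For contrast, re.sub() replaces only the matched part, unlike mod_rewrite:
print(re.sub(r"^test", "new", path))                       # new/file.php
```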

Rewrite rule and the_request

How to rewrite search/2 from index.php?search="x"&&searc_by="y"&page_no=2?
If I am not wrong %REQUEST_URI is search/2, right? Also what is %THE_REQUEST in this case.
The page where search/2 link is located is rewritten as just home_page.
%{REQUEST_URI} and %{THE_REQUEST} are variables in mod_rewrite. These variables contain the following:
%{REQUEST_URI} will contain everything after the hostname and before the query string. In the URL http://www.example.com/its/a/scary/polarbear?truth=false, %{REQUEST_URI} would contain /its/a/scary/polarbear. This variable is updated after every rewrite.
%{THE_REQUEST} is a variable that contains the entire request as it was made to the server. This is something in the form of GET /its/a/scary/polarbear?truth=false HTTP/1.1. Since the request that was made to the server is static in the lifespan of one such request, this variable does not change when a rewrite is made. It is therefore helpful in certain situations where you only want to rewrite if an external request contained something. It is often used to prevent infinite loops from happening.
A complete list of these variables can be found in the Apache mod_rewrite documentation for the RewriteCond directive.
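As a rough illustration of how the two variables relate for the example request, this Python sketch splits the request line into its parts. It only mimics the values before any rewrite; Apache computes these variables itself.

```python
from urllib.parse import urlsplit

the_request = "GET /its/a/scary/polarbear?truth=false HTTP/1.1"

# %{THE_REQUEST}: the raw request line, "METHOD request-target PROTOCOL"
method, target, protocol = the_request.split(" ")

parts = urlsplit(target)
request_uri = parts.path     # like %{REQUEST_URI} before any rewrite
query_string = parts.query   # like %{QUERY_STRING}

print(method, request_uri, query_string, protocol)
```

After an internal rewrite, only the REQUEST_URI-like value would change; the_request stays as it was, which is what makes THE_REQUEST useful for loop prevention.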
In your case you will have a link to search/2?search=x&search_by=y. You want to internally rewrite this to index.php?search=x&search_by=y&page_no=2. You can do this with the following rule:
RewriteRule ^search/([0-9]+)$ /index.php?page_no=$1 [QSA,L]
The first argument matches the external request that comes in. It is then rewritten to /index.php?page_no=2. The QSA (query string append) flag appends the original query string to the rewritten query string, so you end up with /index.php?page_no=2&search=x&search_by=y (the order of the parameters makes no difference to PHP). The L flag stops this 'round' of rewriting; it's just an optimization.
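A minimal Python model of what QSA does with the query strings, following the Apache documentation's description that the original query string is appended to the one in the substitution. apply_qsa is a hypothetical name; PHP's $_GET ends up with the same three parameters regardless of their order.

```python
def apply_qsa(substitution: str, original_query: str) -> str:
    """Model of the QSA flag: append the original query string after the
    query string in the substitution."""
    if not original_query:
        return substitution
    # If the substitution has no query string, the original one is simply kept
    separator = "&" if "?" in substitution else "?"
    return substitution + separator + original_query

print(apply_qsa("/index.php?page_no=2", "search=x&search_by=y"))
```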

modrewrite alter 1 element of query string

Flash movies are called based on dynamic links on mypage.php. mypage.php has the flash player embedded. The links look like mypage.php?moviefolder=folder1/folder2&swfTitle=sometitle.swf. mypage.php is parsed on each link click (per the href). Folder2 is always the same but movieTitle.swf is dynamic. Sometimes subfolders will be called (folder2/subfolder2/sometitle.swf).
Can mod_rewrite allow the query string to reflect folder2 but instead silently serve folder3 as well as occasional subfolders? I would place all files in folder3. The goal is to have the user not know where the swfs are. Thanks in advance again!
Use RewriteCond to match the contents of the query string (since the query string is not part of what a RewriteRule pattern is matched against). That way you can extract swfTitle=sometitle.swf and substitute folder1/folder3 for folder1/folder2 in the moviefolder parameter.
This will use a regex pattern like ([^&]+) to match everything up to the next & (which denotes another query param).
# Capture everything after folder2 into %1
RewriteCond %{QUERY_STRING} moviefolder=folder1/folder2([^&]+) [NC]
# Capture everything in the swfTitle param into %2
# Both conditions must be matched...
RewriteCond %{QUERY_STRING} swfTitle=([^&]+) [NC]
# Then silently rewrite mypage.php to substitute folder3,
# and pass in the original swfTitle captured above
RewriteRule ^mypage\.php$ mypage.php?moviefolder=folder1/folder3%1&swfTitle=%2 [L]
Hopefully, you won't get a rewrite loop, since the rewritten folder1/folder3 won't match the second time. [NC] allows for a case-insensitive match.
I did manage to successfully test this over at http://htaccess.madewithlove.be/, using the sample input:
http://example.com/mypage.php?swfTitle=thetitle.swf&moviefolder=folder1/folder2/thing
---> http://example.com/mypage.php?moviefolder=folder1/folder3/thing&swfTitle=thetitle.swf
http://example.com/mypage.php?moviefolder=folder1/folder2/thing999zzz&swfTitle=thetitle.swf
---> http://example.com/mypage.php?moviefolder=folder1/folder3/thing999zzz&swfTitle=thetitle.swf
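The same two test cases can be reproduced locally with a Python model of the two conditions and the rewrite. It assumes Python's re matches these patterns like Apache's PCRE ([NC] becomes re.IGNORECASE) and only looks at the query string. rewrite_moviefolder is a hypothetical name.

```python
import re
from typing import Optional

def rewrite_moviefolder(query: str) -> Optional[str]:
    """Return the rewritten mypage.php target, or None if either condition fails."""
    # RewriteCond 1: capture everything after folder2 into %1
    cond1 = re.search(r"moviefolder=folder1/folder2([^&]+)", query, re.IGNORECASE)
    # RewriteCond 2: capture the swfTitle value into %2
    cond2 = re.search(r"swfTitle=([^&]+)", query, re.IGNORECASE)
    if cond1 is None or cond2 is None:
        return None  # both conditions must match, otherwise no rewrite
    return ("mypage.php?moviefolder=folder1/folder3" + cond1.group(1)
            + "&swfTitle=" + cond2.group(1))

print(rewrite_moviefolder("swfTitle=thetitle.swf&moviefolder=folder1/folder2/thing"))
# The rewritten query no longer contains folder2, so a second pass does nothing
print(rewrite_moviefolder("moviefolder=folder1/folder3/thing&swfTitle=thetitle.swf"))
```

Note that, as written, ([^&]+) needs at least one character after folder2, so a plain moviefolder=folder1/folder2 is not rewritten; using ([^&]*) instead would cover that case.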