.htaccess file dropping parts when not using [R] - apache

I want URLs that are of the structure /news/categories/CATEGORY to redirect to /news/categories/dynamic-categories.php?category=CATEGORY
And I have this working for most situations using this .htaccess file rule:
RewriteRule ^news/categories/([a-zA-Z0-9\s]+)/?$ /news/categories/dynamic-categories.php?category=$1 [L]
However, some category names contain spaces, and then this falls apart. For example, /news/categories/with%20space is rewritten in such a way that the category GET parameter only has the value with.
Oddly, if I add the redirect flag ([R]), the rule works (although with a redirect...) and the whole category (with space) gets passed.
What do I need to change here?

This actually appears to be an artifact of some other things.
The PHP page we're redirecting to is embedded through a CMS, and it looks like the query strings are getting stripped off during one of the transfer stages it goes through. The rewrite rule was right all along.

Related

Log htaccess-rejected urls

An .htaccess file contains a set of rules to reject some ill-formed URLs, for example:
RewriteCond %{QUERY_STRING} (select|\/\*\*\/) [NC]
RewriteRule ^ - [F,L]
How can I get a log of all rejected URLs?
Or how can I best log these rejected URLs, efficiently or temporarily?
[EDIT with more context:] My site sometimes goes down because of excessive attempts by hacker bots to find a way into it. To avoid that, I have set up some rules in the .htaccess that reject the most common patterns found in the bots' URLs. This works fine, or at least it looks like it does. I now wish to check every so often whether
some rules are useless and I could remove them
some rules are too broad and reject legitimate requests
To do so, I could build a script that applies the exact same rules (taken from the .htaccess) to the Apache access logs, which contain all requests. But I would have to sync the script every time I update the .htaccess. Hence, I wish to know whether there is a setting or a "good" way to log all and only the URLs rejected by the .htaccess.
I am beginning to understand now, with the additional comment you made above. What you ask is actually not clear from what you wrote in your question. You wrote "a log of all rejected urls", and I understood that to mean requested and rejected URLs, because that is what an HTTP server deals with. But now I understand that you are actually not interested in URLs at all, but in a list of all possible query strings matching that condition. So we are talking about theoretical computer science here: formal languages, a part of complexity theory.
What you ask is not possible. The reason is that the list you ask for is obviously infinitely large. All you could do is set up an algorithm that generates one matching string after another according to a specific rule set. But I dare say that this won't really help; the actual rule set is probably more interesting for you....
I would phrase it this way: your regular expression will match any string that contains either of the two substrings "select" or "/**/" anywhere, that is, at the beginning, in the middle or at the end, regardless of what comes before and after it. Take a look at this: https://regex101.com/r/tHkqZE/1 In there, "foo" and "bar" can be anything ...
Maybe you want to limit that set. A first step, and probably a sensible one, would be to anchor the expression at the beginning or end of the full string or at the "&" character, considering the typical construction of a query string.
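As a rough illustration of narrowing the match, here is a variant of the rule from the question that uses word boundaries instead of anchors. This is only a sketch of one possible tightening, not the rule set the answer had in mind; whether it suits your traffic is an assumption you would need to check against real requests.

```apache
# \b requires "select" to stand alone as a word, so query strings such as
# ?q=selection or ?tab=preselect are no longer rejected, while
# ?id=1+union+select+... still is.
RewriteCond %{QUERY_STRING} (\bselect\b|/\*\*/) [NC]
RewriteRule ^ - [F,L]
```

Note that %{QUERY_STRING} is matched in its raw, still percent-encoded form, so an encoded variant such as s%65lect would not be caught by either version of the pattern.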
As @arkascha mentioned, Apache records the response status of each request in its access.log.
So the best approach is to get it from there.
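If you have access to the server or virtual host configuration, one way to get a log of only the requests rejected by these rules is to tag them with an environment variable and use conditional logging. The following is a minimal sketch under stated assumptions: the variable name htaccess_rejected and the log path are made up for illustration, and the combined log format nickname is assumed to be defined already (it is in most default configurations).

```apache
# In .htaccess: forbid the request and also tag it.
# The variable name "htaccess_rejected" is hypothetical.
RewriteCond %{QUERY_STRING} (select|/\*\*/) [NC]
RewriteRule ^ - [F,E=htaccess_rejected:1]

# In the server or virtual host config (CustomLog is not allowed in .htaccess):
# write only the tagged requests to a separate file. The path is an assumption.
CustomLog "logs/htaccess_rejected.log" combined env=htaccess_rejected
```

Without access to the main configuration, filtering the regular access.log for status 403 is the fallback, with the caveat that other causes of 403 responses will show up as well. Either way, it is worth testing on your own setup before relying on it.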

Rewriting Link with mod-rewrite

I have a link being sent to users in one format, but I need to make sure it passes through a main index page for login purposes. I figured mod rewrite was the way to go.
Link being clicked on by user:
https://sub.domain.com/link/link.jsp?pageId=1234567&id=12345
Where it needs to go:
https://sub.domain.com/index.html?o=(full original URL from above, including query string)
The o= in this case will let the user login and pass them along to that original URL. Now, there are also some images and a style sheet involved, so I need to have the rewrite ignore them.
After reading documentation and a number of code examples (many from this site), I tried to just get a basic code going to see if my rewrites will even work, as I'm new to this.
This appears to force the URL to rewrite but isn't passing the user along to the original link:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^sub.domain.com [NC]
RewriteRule .* https://sub.domain.com/index.html?o=%{HTTP_HOST}%{REQUEST_URI}?%{QUERY_STRING} [L]
Also please note this is all going inside a virtual host, not htaccess. As I said, the code above appears to redirect when I test it, but it might be unintended and not at all how I should write it.
I also tried adding in this code for images/stylesheet ignores
RewriteCond %{REQUEST_URI} !\.(css|jpg|jpeg)$ [NC]
At that point everything went crazy, but I know it's because I'm slapping code together and thinking it's going to work. From that point I tried a lot of changes, but most resulted in a loop condition and I kept falling back to square one (original code you see above). Apologies for the long winded post, but I'm hitting my head against a wall at something I thought wouldn't be difficult. Obviously, despite reading, I'm lacking some understanding. Any guidance would be very helpful.
The problem with your current solution is that the resulting URL contains two ? characters, and only the first one starts the query string.
Note that your code will send a redirect status code, so the redirect happens in the browser, because you provided a fully qualified URL. Try doing an internal rewrite by just using /index.html?o=.... If that doesn't work either (not sure right now), URL-encode the second ?.
[Edit]: rewrite, guess I got the question wrong.

Simple mod_rewrite, replace one word in every instance

I've been looking for an answer to this forever and can't find it, yet it seems like it should be so simple!
I want to use mod_rewrite to replace a word in a url in every instance that it shows up, but I don't want a redirect to happen, just changing the way the url appears to site users.
Example:
Change
mysite.com/something/groups/anything...
to:
mysite.com/something/projects/anything...
I know I could go through and start tweaking files but mod_rewrite would work much better because I'm sure I'll mess something up otherwise (for reference I'm using joomla/jomsocial).
RewriteEngine On
# Capture everything after "groups/" so that $1 has something to refer to
RewriteRule ^something/groups/(.*)$ something/projects/$1 [L]

apache mod_rewrite: using database to update rewrite rules

Total newbie at mod_rewrite.
Let's say I want to create nice URLs for every manufacturer on my site,
so I have
www.mysite.com/samsung
www.mysite.com/sony
www.mysite.com/acme
works well enough.
However, if I have hundreds of manufacturers and if they're changing constantly, what then? There are some vague references for something called rewrite map somewhere but nothing that explains it and no tutorials. Can anyone help?
Also, why is this problem not the main topic covered in tutorials for mod_rewrite? How is mod_rewrite possibly useful when you have to maintain it manually (assuming you have new content on your site once in a while)?
There is also mention of needing to have access to httpd.conf
How do I access httpd.conf on my hosting provider's server? How does every other site do this?
Thanks
Just came across this answer while searching for a similar solution. Searching a bit further, I discovered that mod_rewrite now has the RewriteMap directive, which will do exactly what you want without the need to run PHP or another scripting language.
It lets you define a mapping rule with a text file, a DBM file, an external script or an SQL query.
I hope that helps!
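As a rough sketch of what that can look like with a plain text map: the map name manufacturers, the file path, the id values and the manuf.php?id= target below are all assumptions made up for illustration. RewriteMap itself has to be declared in the server or virtual host configuration (it is not allowed in .htaccess), which is where the need for httpd.conf access comes from.

```apache
# Server or virtual host config.
# Hypothetical map file /etc/apache2/manufacturers.txt, one "key value" pair per line:
#   samsung 1
#   sony    2
#   acme    3
RewriteEngine On
RewriteMap manufacturers "txt:/etc/apache2/manufacturers.txt"

# Look up the first path segment in the map. An unknown key yields an empty
# string, so the condition fails and the rule is skipped for it.
RewriteCond "${manufacturers:$1}" "^(.+)$"
# %1 is the value found in the map (the manufacturer id in this sketch).
RewriteRule "^/([a-z0-9-]+)/?$" "/manuf.php?id=%1" [L,QSA]
```

Adding or removing a manufacturer then means editing the map file rather than the rewrite rules.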
The way this would typically be done is that you would take all URLs that match a specific pattern and route them to a PHP file (or whatever your server-side programming language is) for more complex routing. Something like this:
RewriteRule ^(.*)$ myroute.php?url=$1 [QSA,L]
Then, in your myroute.php file, you can include logic to look at the "url" query string parameter, since it will contain the original URL that came in. Perhaps you could match it to a manufacturer in the database, or whatever else is required.
This example obviously takes all URLs and maps them to myroute.php. Another example might be something like:
RewriteRule ^/?manufacturers/(.*)$ manuf.php?name=$1 [QSA,L]
In this case, it will map URLs like so:
/manufacturers/sony => /manuf.php?name=sony
/manufacturers/samsung => /manuf.php?name=samsung
etc...
In this case, your manuf.php file could look up the database based on the name query string parameter.
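One thing to watch with the catch-all rule: it also matches requests for real files such as images, stylesheets and myroute.php itself. A common way around that, sketched here on the assumption that the rules live in an .htaccess file in the document root, is to skip anything that exists on disk:

```apache
RewriteEngine On

# Leave requests for existing files and directories alone
# (images, CSS, and myroute.php itself).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to the router script, with the original path in "url".
RewriteRule ^(.*)$ myroute.php?url=$1 [QSA,L]
```

With this in place, myroute.php only sees URLs that do not correspond to a real file, which is where the database lookup would go.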

Apache - Prettifying URLs with mod_rewrite while also catching some edge cases

Sorry to bug everyone with another mod_rewrite problem but you know the drill.
Basically, I have viewer.php, which accepts two arguments, chapter and page. Sometimes people will request a chapter only, and sometimes they will request a chapter and page. i.e. viewer.php?chapter=10 or viewer.php?chapter=10&page=5. The php is smart enough to display page one for users who don't specify a page, and I don't care about users who request viewer.php?page=3&chapter=50, nobody will do that.
I want to hide viewer.php from the public and make the format c5/p3.html and c5 canonical. i.e. example.com/c5/p3.html displays the results of example.com/viewer.php?chapter=5&page=3 and example.com/c5 displays the results of example.com/viewer.php?chapter=5. If I can I'd also like to catch people who forget the .html, i.e. example.com/c14/p3. In all these cases I want their address-bar URL to change as well as them being served the appropriate viewer.php content.
This is my current attempt at doing that, but it has problems.
## PRETTIFY URLS
# We'll help those who screw it up and forget the .html (i.e. /c12/p3), but..
RewriteRule c([0-9\.]+)/p([0-9]+)?$ /c$1/p$2.html [R=Permanent,NC]
# (this is a vestige of when I thought I wanted /p1.html appended for those who didn't specify a page, changed my mind now)
RewriteRule c([0-9\.]+)(/)?$ /c$1/p1.html [R=Permanent,NC]
# The canonical form is /c12/p3.html and that's that.
RewriteRule c([0-9\.]+)/p([0-9]+).html?$ /viewer.php?chapter=$1&page=$2
This works great for c1, c14/p3.html and c14/p3. But: by virtue of the second RewriteRule (which I can't figure out how to remove without Apache showing a "Moved permanently" error page that links to itself) it transforms c5/ into c5/p1.html when I'd rather it just remove the trailing slash and become c5. It also throws a 404 if the user requests c5/p4/ instead of knowing what they meant and transforming it into c5/p4.html.
As an additional problem, I have a form somewhere that uses method="get" to submit a chapter to viewer.php, and in that case the underlying viewer.php?chapter=5 structure is shown to them in the resultant URL, so maybe I should add a rule that grabs direct requests to viewer.php and puts them in the newer style somehow.
So, could anyone help me with this? I hope I've been clear enough in what I want. It would seem to me that if modifying my existing code, I need to handle trailing slashes better and somehow clean up requests for viewer.php in the c5 style without causing an infinite loop.
Help is so so much appreciated.
Try these rules:
RewriteRule ^c([0-9]+)/p([0-9]+)/?$ /c$1/p$2.html [R=Permanent,NC]
RewriteRule ^c([0-9]+)/?$ /c$1/p1.html [R=Permanent,NC]
RewriteRule ^c([0-9]+)/p([0-9]+)\.html$ viewer.php?chapter=$1&page=$2
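The rules above still send c5 to c5/p1.html and do not touch direct requests for viewer.php. A possible extension that keeps c5 canonical, strips the trailing slash and redirects viewer.php requests is sketched below. It assumes an .htaccess file in the document root and integer chapter numbers (as in the rules above), and it has not been tested against the asker's site.

```apache
RewriteEngine On

# 1. Direct browser requests for viewer.php are redirected to the pretty form.
#    THE_REQUEST holds the original request line, so the internal rewrites in
#    step 3 do not trigger these rules again (no loop).
#    The trailing "?" drops the old query string from the redirect target.
RewriteCond %{THE_REQUEST} \s/viewer\.php\?chapter=([0-9]+)&page=([0-9]+)\s
RewriteRule ^viewer\.php$ /c%1/p%2.html? [R=301,L]
RewriteCond %{THE_REQUEST} \s/viewer\.php\?chapter=([0-9]+)\s
RewriteRule ^viewer\.php$ /c%1? [R=301,L]

# 2. Normalise sloppy URLs: /c5/ becomes /c5, and /c5/p4 or /c5/p4/ become /c5/p4.html.
RewriteRule ^c([0-9]+)/$ /c$1 [R=301,L]
RewriteRule ^c([0-9]+)/p([0-9]+)/?$ /c$1/p$2.html [R=301,L]

# 3. Canonical URLs are served by viewer.php without changing the address bar.
RewriteRule ^c([0-9]+)$ viewer.php?chapter=$1 [L]
RewriteRule ^c([0-9]+)/p([0-9]+)\.html$ viewer.php?chapter=$1&page=$2 [L]
```

Permanent redirects are cached aggressively by browsers, so it may be easier to test with R=302 first and switch to 301 once the behaviour is right.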