Apache rewriting eats one level of escaping (%23) - apache

I want to use fancy URLs for a tag filter on my website. The URLs should look like http://example.com/source/+tag1+tag2. This should filter for all items tagged with tag1 and tag2. I came up with the following rewrite rule for that, which I have saved to the root directory of the site:
RewriteRule ^([^+]+)(\+.+)$ $1?tags=$2 [L]
This works fine for all normal tag names, but it fails for the tag name "c#". I know that the hash character is not sent to the server, so the tag name is URL-encoded and the link in the HTML page looks like this: ./+c%23. But the target page only sees the "c" in its tags parameter; the "#" and anything after it are gone.
I have enabled Apache's rewrite logging and saw that it already logs the incoming URL request like …/+c#. This made me think that another level of escaping could be required. So I tried with %2523 which actually passed the rewriting successfully and the whole string "c#" turned up in my page.
But then again, when I access the page with its internal URL like ./?tags=c%23, it already works, too. So why is Apache eating up one level of escaping? Is there a hidden rewrite flag I can use to avoid that? Do I need to use public URLs that are double-encoded for my fancy URLs to work? Or will it be too messy and I should instead just rename my tag to "csharp"?

I think you need the B flag, which escapes the backreferences again before they are inserted into the substitution (so use [L,B]).
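To see where the level of escaping goes, here is a rough JavaScript model of the rule from the question. It is only a sketch under stated assumptions: the names rewrite and tagsSeenByPage are made up for illustration, the model assumes the pattern is matched against the already-decoded path (which is what the rewrite log in the question suggests), and it re-escapes only the tag backreference, which may differ from what the real B flag does to the path part.

```javascript
// Simplified model of the rule from the question; not Apache's real code.
// Assumption: the pattern is matched against the already-decoded path.
function rewrite(requestPath, escapeBackrefs) {
  const decoded = decodeURIComponent(requestPath); // "+c%23" becomes "+c#"
  const m = decoded.match(/^([^+]+)(\+.+)$/);
  if (!m) return null; // the rule does not apply
  // Without re-escaping, the raw "#" ends up in the substitution.
  const tags = escapeBackrefs ? encodeURIComponent(m[2]) : m[2];
  return m[1] + "?tags=" + tags;
}

// Hypothetical helper: parse the rewritten URL like any other URL and
// read the "tags" parameter the way the target page would.
function tagsSeenByPage(rewritten) {
  const url = new URL(rewritten, "http://example.com/");
  return url.searchParams.get("tags");
}

console.log(rewrite("source/+c%23", false)); // source/?tags=+c#
console.log(rewrite("source/+c%23", true));  // source/?tags=%2Bc%23
console.log(tagsSeenByPage(rewrite("source/+c%23", true))); // +c#
```

In this model the unescaped variant loses the "#" because everything from the hash onward is read as a fragment, while the re-escaped variant and the double-encoded %2523 request both keep it.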

Related

S3 no trailing slash removes query parameters

I have a website set up using S3 as a static website host. If I have a link such as "xxx.com/play?test=1", this gets a 302 redirect to "xxx.com/play/" with the query parameter stripped.
I am trying to find a way to preserve the query string parameters. I cannot change the original link (xxx.com/play?test=1), but it feels like I can make this work either within the redirect rules or within the objects themselves. Is this possible?
Don't know if you found a solution by now... but here it is for future reference.
I'm guessing you have a "play" folder in your bucket and that's why it's redirected.
The solution is to create a "play" object/file (alongside the folder) without the ".html" extension and set its content type metadata to "text/html".

.htaccess file dropping parts when not using [R]

I want URLs that are of the structure /news/categories/CATEGORY to redirect to /news/categories/dynamic-categories.php?category=CATEGORY
And I have this working for most situations using this .htaccess file rule:
RewriteRule ^news/categories/([a-zA-Z0-9\s]+)/?$ /news/categories/dynamic-categories.php?category=$1 [L]
However, in certain situations, the category names have spaces, and this falls apart. Stuff like /news/categories/with%20space gets rewritten to where I'm only seeing the category GET parameter having the value of with.
However, an odd thing to add to this, if I add the redirect flag ([R]) into it, the rule works (although with a redirect...) and the whole category (with space) gets passed.
What do I need to change here?
This actually appears to be an artifact of some other things.
The PHP page we're redirecting to is embedded through a CMS, and it looks like the query strings are getting stripped off during one of the transfer stages it goes through. The rewrite rule was right all along.

Hash character in URLs (accessing and redirecting in Apache)

It looks as though this question has been asked in part by some others, but I can't find the answer I'm looking for specifically, so I thought I'd pose my particular scenario in case anyone is able to help.
We have an old website (developed externally by a third party) that is due to be retired and replaced by a new site designed in house. For reasons best known to themselves, the developers of the old site used the hash character as part of the URL for the old site (www.mysite.com/#/my-content-stuff). To assist with the transition and help with SEO I need to set up 301 redirects for the top performing URLs from the old site. As I'm now discovering however, I'm not able to set up a simple redirect in the .htaccess file as I believe it takes the hash character to be a comment and ignores the remainder of the line. I've tried escape characters, using %23 instead, wildcard matching, nothing seems to work.
As a workaround, I wondered about simply creating dummy files with the same paths and URLs as the old site had, then creating HTML redirects within them to drive traffic to the correct new pages, but it looks as though the server is doing something similar regarding the hash character in the URL and ignoring anything after it. So, if I create a sub-folder on my new server called '#' and create a file in there called 'test.html', I expected to be able to just go to 'www.myNEWsite.com/#/test.html', but it just takes me to the default root file of my site.
Please can anyone shed any light on how I might get around this? I must admit I'm not that clued up on Apache so I'm having to learn a lot as I go.
Many thanks in advance for any pointers or info anyone can provide.
Cheers,
Rich
A hash character in the URL specifies the anchor, and it's not even sent to your webserver. A redirect is impossible on the server side, and the old developer probably did it using JavaScript. Implement fallback URLs without the hash instead, and have a global JavaScript script detect these URLs and redirect automatically.
Hash fragments cannot be read by the server. They are regarded as locations within the document and are therefore not sent to the server; the client is the only one that sees them. The best you could do is use a "meta refresh" tag or, alternatively, use JavaScript to detect the URL and, if it's one that requires a 301 redirect, use "window.location" to move the user to a full URL where mod_rewrite or a PHP page can issue a 301 header.
However, neither is SEO friendly, and they only really solve the issue for users who click on an old link from an external site.
<!-- Put this in the head tag so the redirect does not wait for the content to load -->
<script type="text/javascript">
// Only act when the URL carries a fragment such as #/something_old.
if (window.location.hash !== "") {
    // Strip the leading "#" and an optional "/" to get the old route name.
    var h = window.location.hash.match(/#\/?(.*)/i)[1];
    switch (h) {
        case "something_old":
            window.location = "/something_new.html";
            break;
        case "something_also_old":
            window.location = "/something_also_new.html";
            break;
    }
}
</script>
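With more than a handful of old URLs, the switch above could be replaced by a lookup table. The following is a minimal sketch of that idea, not the answerer's code: the table REDIRECTS and the function redirectTargetFor are hypothetical names, and the two entries are just the placeholder routes from the snippet above.

```javascript
// Hypothetical table of old hash routes and their new URLs.
const REDIRECTS = new Map([
  ["something_old", "/something_new.html"],
  ["something_also_old", "/something_also_new.html"],
]);

// Returns the new URL for a location.hash value, or null if there is none.
function redirectTargetFor(hash) {
  // Accept "#route" as well as "#/route"; an empty hash does not match.
  const m = /^#\/?(.*)$/.exec(hash);
  if (!m) return null;
  return REDIRECTS.has(m[1]) ? REDIRECTS.get(m[1]) : null;
}

// In the browser this would be wired up roughly as:
//   const target = redirectTargetFor(window.location.hash);
//   if (target) window.location.replace(target);
```

Keeping the lookup in a pure function means the mapping can be checked without a browser, and unknown hashes simply fall through without redirecting.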

apache mod_rewrite: using database to update rewrite rules

Total newbie at mod_rewrite.
Let's say I want to create nice URLs for every manufacturer on my site,
so I have
www.mysite.com/samsung
www.mysite.com/sony
www.mysite.com/acme
works well enough.
However, if I have hundreds of manufacturers and if they're changing constantly, what then? There are some vague references for something called rewrite map somewhere but nothing that explains it and no tutorials. Can anyone help?
Also, why is this problem not the main topic covered in tutorials for mod_rewrite? How is mod_rewrite possibly useful when you have to maintain it manually (assuming you have new content on your site once in a while)?
There is also mention of needing to have access to httpd.conf
How do I access httpd.conf on my hosting provider's server? How does every other site do this?
Thanks
Just came across this answer while searching for a similar solution — searching a bit further I discovered that mod_rewrite now has the RewriteMap directive, which will do exactly what you want without the need to run PHP or another scripting language.
It lets you define a mapping rule with a text file, a DBM file, an external script or an SQL query.
I hope that helps!
The way this would typically be done is that you would take all URLs that match a specific pattern and route them to a PHP file (or whatever your server-side programming language is) for more complex routing. Something like this:
RewriteRule ^(.*)$ myroute.php?url=$1 [QSA,L]
Then, in your myroute.php file, you can include logic to look at the "url" query string parameter, since it will contain the original URL that came in. Perhaps you could match it to a manufacturer in the database, or whatever else is required.
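This example obviously takes all URLs and maps them to myroute.php. Another example might be something like this (in an .htaccess file the pattern is matched against the path without its leading slash, so the pattern omits it):
RewriteRule ^manufacturers/(.*)$ manuf.php?name=$1 [QSA,L]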
In this case, it will map URLs like so:
/manufacturers/sony => /manuf.php?name=sony
/manufacturers/samsung => /manuf.php?name=samsung
etc...
In this case, your manuf.php file could look up the database based on the name query string parameter.
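The lookup itself is simple, which is why new manufacturers need a new database row rather than a new rewrite rule. Here is a minimal sketch of that logic, written in JavaScript purely for illustration (the answer assumes PHP). The MANUFACTURERS map stands in for the database table, and both it and findManufacturer are hypothetical names.

```javascript
// Hypothetical stand-in for a database table of manufacturers.
const MANUFACTURERS = new Map([
  ["sony", { id: 1, name: "Sony" }],
  ["samsung", { id: 2, name: "Samsung" }],
]);

// Takes the value the rewrite rule passes as "name" and returns the
// matching record, or null so the caller can answer with a 404.
function findManufacturer(nameParam) {
  const key = String(nameParam || "").trim().toLowerCase();
  return MANUFACTURERS.has(key) ? MANUFACTURERS.get(key) : null;
}

console.log(findManufacturer("sony")); // { id: 1, name: 'Sony' }
console.log(findManufacturer("acme")); // null
```

Adding a manufacturer then means adding one entry to the data source; the rewrite rule stays untouched.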

Apache - Prettifying URLs with mod_rewrite while also catching some edge cases

Sorry to bug everyone with another mod_rewrite problem but you know the drill.
Basically, I have viewer.php, which accepts two arguments, chapter and page. Sometimes people will request a chapter only, and sometimes they will request a chapter and page. i.e. viewer.php?chapter=10 or viewer.php?chapter=10&page=5. The php is smart enough to display page one for users who don't specify a page, and I don't care about users who request viewer.php?page=3&chapter=50, nobody will do that.
I want to hide viewer.php from the public and make the format c5/p3.html and c5 canonical. i.e. example.com/c5/p3.html displays the results of example.com/viewer.php?chapter=5&page=3 and example.com/c5 displays the results of example.com/viewer.php?chapter=5. If I can I'd also like to catch people who forget the .html, i.e. example.com/c14/p3. In all these cases I want their address-bar URL to change as well as them being served the appropriate viewer.php content.
This is my current attempt at doing that, but it has problems.
## PRETTIFY URLS
# We'll help those who screw it up and forget the .html (i.e. /c12/p3), but..
RewriteRule c([0-9\.]+)/p([0-9]+)?$ /c$1/p$2.html [R=Permanent,NC]
# (this is a vestige of when I thought I wanted /p1.html appended for those who didn't specify a page, changed my mind now)
RewriteRule c([0-9\.]+)(/)?$ /c$1/p1.html [R=Permanent,NC]
# The canonical form is /c12/p3.html and that's that.
RewriteRule c([0-9\.]+)/p([0-9]+).html?$ /viewer.php?chapter=$1&page=$2
This works great for c1, c14/p3.html and c14/p3. But: by virtue of the second RewriteRule (which I can't figure out how to remove without Apache showing a "Moved permanently" error page that links to itself) it transforms c5/ into c5/p1.html when I'd rather it just remove the trailing slash and become c5. It also throws a 404 if the user requests c5/p4/ instead of knowing what they meant and transforming it into c5/p4.html.
As an additional problem, I have a form somewhere that uses method="get" to submit a chapter to viewer.php, and in that case the underlying viewer.php?chapter=5 structure is shown to them in the resultant URL, so maybe I should add a rule that grabs direct requests to viewer.php and puts them in the newer style somehow.
So, could anyone help me with this? I hope I've been clear enough in what I want. It would seem to me that if modifying my existing code, I need to handle trailing slashes better and somehow clean up requests for viewer.php in the c5 style without causing an infinite loop.
Help is so so much appreciated.
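Try these rules (the L flag on each rule stops further processing once that rule has matched, which is generally advisable together with R):
RewriteRule ^c([0-9]+)/p([0-9]+)/?$ /c$1/p$2.html [R=Permanent,NC,L]
RewriteRule ^c([0-9]+)/?$ /c$1/p1.html [R=Permanent,NC,L]
RewriteRule ^c([0-9]+)/p([0-9]+)\.html$ viewer.php?chapter=$1&page=$2 [L]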
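To check what these three patterns do with the URLs from the question, here is a rough JavaScript model of them. It is a sketch, not a substitute for testing on the server: applyRules is a made-up name, the paths are given without a leading slash as in an .htaccess file, and the model assumes processing stops at the first rule that matches.

```javascript
// Rough model of the three rules above, tried in order; the first match wins.
// Assumption: each rule ends processing once it matches.
function applyRules(path) {
  let m;
  // Rule 1: chapter and page without ".html", optional trailing slash.
  if ((m = /^c([0-9]+)\/p([0-9]+)\/?$/i.exec(path))) {
    return { action: "redirect", target: `/c${m[1]}/p${m[2]}.html` };
  }
  // Rule 2: chapter only, optional trailing slash.
  if ((m = /^c([0-9]+)\/?$/i.exec(path))) {
    return { action: "redirect", target: `/c${m[1]}/p1.html` };
  }
  // Rule 3: canonical form, served internally by viewer.php.
  if ((m = /^c([0-9]+)\/p([0-9]+)\.html$/.exec(path))) {
    return { action: "rewrite", target: `viewer.php?chapter=${m[1]}&page=${m[2]}` };
  }
  return null; // no rule applies
}

for (const p of ["c5/p4/", "c14/p3", "c5/", "c5/p3.html"]) {
  console.log(p, applyRules(p));
}
```

In this model c5/p4/ is redirected to /c5/p4.html instead of producing a 404, but c5/ is still redirected to /c5/p1.html, so the second rule would need a different target if plain c5 is meant to be the canonical form.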