.htaccess - rewriting url requests and taking care of resource files - apache

I thought I should open a new question for a matter related to my previous one (you can find it here). The situation is very similar to that question, but I found a different approach and would like to know if it's correct.
I wanted to rewrite the URLs for my site and also take care of resource files (css, js, images, etc.) so that the browser doesn't search for them in the wrong directory.
Following are snippets of the html code of a sample page found browsing for example to articles/writer/erenor (here htaccess rewrites the url to /articles.php?writer=erenor, and this part works well).
Inside the <head> tag:
<script type="text/javascript" src="./inc-javascript-files/jquery.js"></script>
<style type="text/css">
@import url(./inc-css-files/index.css);
</style>
Inside the <body> tag:
<img alt="Avatar" src="./inc-images-files/avatar.png">
<img alt="Pattern" src="./inc-images-files/pattern/violet.png">
<br><br>
Writer Erenor
Now, I have this snippet from the .htaccess file:
# take care of resource files
RewriteRule inc\-(css|javascript|images)\-files/(.*)\.(png|jpe?g|gif|js|css) includes-$1/$2.$3 [L]
# url rewriting
RewriteRule ^articles/writer/(\w*)/?$ articles.php?writer=$1
And, finally, the example of the file structure:
/mysite
/mysite/.htaccess //this is the htaccess file we are talking about ;-)
/mysite/articles.php
/mysite/includes-css
/mysite/includes-css/index.css
/mysite/includes-javascript
/mysite/includes-javascript/jquery.js
/mysite/includes-images
/mysite/includes-images/avatar.png
/mysite/includes-images/pattern
/mysite/includes-images/pattern/violet.png
I just tested the code, and it seems to work: my browser asks for the css file (searching for it in the "wrong" place) and retrieves it correctly, so I'm quite happy with it :)
Links in the page will be like the one in the html shown above, which seems to work well.
Questions:
1. Is this a good approach to handle browser requests for resources in the "wrong" place? (I know that I will have to add more extensions when it comes to other files, for example videos, txt, tga, etc.)
2. If I move this site to a production server, will it work without modifications? In other words, is this a kind of "box" that can be moved here and there easily?
A little note: since these are just code snippets, tell me if something appears to be missing. Maybe I just forgot to copy/paste it ;-)

You seem to be requesting the files relative to the current directory. Why do you do that? Why don't you request them from the site root (remove the '.')? If I read it correctly, on /articles/writer/asdf and /articles/writer/zxcv the pages will request /articles/writer/asdf/inc-css-files/index.css and /articles/writer/zxcv/inc-css-files/index.css. The browser sees these as two different files. If the user visits 100 writers, it will download index.css 100 times and cache the same file 100 times under different URLs. That is wasteful.
I recommend requesting the files relative to the site root instead if you have the files stored in a folder in the site root.
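As a rough sketch of how the rules might look once the pages use root-relative links (for example src="/inc-javascript-files/jquery.js"): the resource rule can then be anchored at the start of the path. This assumes the .htaccess file sits in the site's document root, which the question does not confirm, so adjust accordingly.

```apache
RewriteEngine On

# Resource files, requested root-relative (e.g. /inc-css-files/index.css).
# Anchored with ^ so only requests at the site root match.
# jpe?g matches both .jpg and .jpeg.
RewriteRule ^inc-(css|javascript|images)-files/(.+\.(?:png|jpe?g|gif|js|css))$ includes-$1/$2 [L]

# URL rewriting for writer pages (unchanged apart from the [L] flag)
RewriteRule ^articles/writer/(\w+)/?$ articles.php?writer=$1 [L]
```

With this, every page requests the same URL for index.css, so the browser caches it once.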

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
If not all query parameter combinations are actually used in the dumped files, not all of the redirect lines need to be included of course.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
  Content-Type: application/rss+xml
I hope this helps somebody.

How to make a pretty URL using .htaccess

I am creating my whole application with the .html extension; to work with the database I am using jQuery Ajax. I have structured the project like WordPress: each page has its own folder, and inside that folder there is an index.html file.
In the above picture, I have created user/equipment/index.html, which lists all the equipment. Now, when a user clicks on a piece of equipment, I want the URL to look like 'domain.com/user/equipment/equipment-title' while the file actually served is user/equipment/details/index.html.
I believe that this can be done with a .htaccess file.
Any solution for the problem would be much appreciated.
Well, you need to store the references in that index file the way you want them to appear; request rewriting (to which you refer as ".htaccess") cannot do that for you. What you can do with request rewriting, including inside a distributed configuration file (".htaccess"), is internally rewrite the incoming requests. For that you need a mapping from request URLs to your detail pages. If the mapping is simply the name as found in the "equipment" folder (this is unclear from your question), then you can indeed implement a simple rewriting rule.
This would be such an example:
RewriteEngine on
RewriteRule ^/?user/equipment/(.*)/?$ /equipment/$1 [END]
This will deliver the content of the file /equipment/equipment-title when the URL https://example.com/user/equipment/equipment-title gets requested and that file exists.
For this to work the rewriting module has to be enabled inside your http server and, if you want to use a distributed configuration file for this, the interpretation of such files also needs to be enabled for that location inside your http server. Usually the better alternative is to place such rules in the real http server's host configuration, though.
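If the goal is instead to serve the single details page for every equipment title, as the question describes, a minimal sketch could look like the following. It assumes Apache 2.4 (for the END flag) and a .htaccess file in the document root; the conditions keep existing files and folders, such as user/equipment/index.html and the details folder itself, from being rewritten.

```apache
RewriteEngine On

# Leave requests for real files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# /user/equipment/<title> is served by the shared details page;
# the browser keeps showing the pretty URL
RewriteRule ^/?user/equipment/([^/]+)/?$ /user/equipment/details/index.html [END]
```

Since the rewrite is internal, the script on the details page would have to read the title from the address bar (location.pathname) and pass it to the Ajax call.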

htaccess rule - relative files

It's probably an easy thing to fix however I tried to google this stuff but I'm not sure how to put it in words so I couldn't find anything which would help me.
I have a problem where my very simple .htaccess changes my URL as it's supposed to, but all the resources are loaded from the wrong place.
My Url:
http://domain.com/index.html?sport=test
My re-written Url:
http://domain.com/test/
.htaccess:
RewriteEngine On
RewriteRule ^([^/]*)/$ /index.html?sport=$1 [L]
Now when I type in http://domain.com/test/ it loads the correct index file, but every resource file is requested from the test folder.
this is an example resource file location (relative to the index.html):
css/styles.css
js/main.js
but it's looking for them in:
test/css/styles.css
test/js/main.js
Cheers
You've hit the most common problem people face when switching to pretty URL schemes. The solution is simple: use absolute paths for your css, js and image files rather than relative ones. That means the paths of these files must start either with http:// or with a slash /.
OR you can try adding this in your page's HTML header:
<base href="/" />
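If editing the HTML is not an option, a server-side fallback (a different technique from the two suggestions above) could be sketched as follows. It assumes the resources live only in the css and js folders named in the question and that the .htaccess file is in the document root.

```apache
RewriteEngine On

# Fallback: a resource requested under a "pretty" prefix, such as
# /test/css/styles.css, is redirected to its real location /css/styles.css
RewriteRule ^[^/]+/((?:css|js)/.+)$ /$1 [R=302,L]

# Original pretty-URL rule from the question
RewriteRule ^([^/]*)/$ /index.html?sport=$1 [L]
```

This costs one extra redirect per resource, so fixing the paths in the page remains the cleaner option.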

Mod Rewrite to add query string to all static files for cache busting

I am trying to append a query string to the end of all .js files, without actually making a code change. The purpose of this would be so static files get pulled from the server rather than the cache when they are changed so we don't have any stale static files.
So when my html says
<script type="text/javascript" src="file1.js">
I want it to actually pull
<script type="text/javascript" src="file1.js?v=1">
from the server. Is this possible?
So far I have:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^(.*)\.js$ /$1.js?v=1 [L,R]
But I don't think it's quite doing what I want it to...
This doesn't make sense: if the resource is cached with an expiry date, the browser will never make the request in the first place.
The much more straightforward way is to set up proper caching rules for the JS files as outlined here: How to prevent http file caching in Apache httpd (MAMP), adjusting the FilesMatch directive to your liking.
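A minimal sketch of such a caching rule, assuming mod_headers is enabled and that revalidating on every use is acceptable for your .js files (the linked answer may use different header values):

```apache
<IfModule mod_headers.c>
    # Apply only to JavaScript files
    <FilesMatch "\.js$">
        # "no-cache" lets the browser keep a copy but forces it to
        # revalidate with the server before each use
        Header set Cache-Control "no-cache"
    </FilesMatch>
</IfModule>
```

Unchanged files are then answered with a short 304 response instead of a full download.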
This will double the number of requests for the files (all .js files in this case), but it should work otherwise.
I'm using a directory (effectively modifying the filename), rather than a query string because that is the preferred technique. [1][2]
First, redirect to a different folder name
ReWriteRule ^(.*\.js)$ /rev/000/$1 [L,R]
Replace the 000 part with your revision number (you should be able to automate this)
The L flag stops processing rewrites, and the R does an HTTP redirect, initiating a second request to your server using the new file name.
Then catch requests tagged using the new "folder" and point them back to the original file using a rewrite rule that does not invoke another request (your first rule will be passed over this time around)
ReWriteRule ^/rev/[0-9]+/(.*)$ $1
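Put together for a per-directory .htaccess file, the two rules might be sketched like this. It assumes Apache 2.4 and a .htaccess file in the document root; 000 remains a placeholder for the revision number. In .htaccess context the matched path has no leading slash, and the internal rule is listed first with END so that the redirect rule does not fire again on the rewritten request.

```apache
RewriteEngine On

# Versioned URL: serve the original file internally and stop rewriting
RewriteRule ^rev/[0-9]+/(.+\.js)$ /$1 [END]

# Unversioned .js request: redirect to the current revision folder
RewriteRule ^(.+\.js)$ /rev/000/$1 [R,L]
```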
Ok I know the question is old, but while looking for something similar I found two clever Apache plugins:
mod_pagespeed:
This thing rewrites the html page that you are serving and replaces the referenced static resource links (CSS, JS, imgs) with modified ones. When static resource on disk changes, the plugin will generate a completely new url for it. There is some content based hash included I guess. And this busts the cache because the client will have to make a new request.
So caching time of the original html page should be rather short, but all related static resources can be cached for extremely long times.
mod_proxy_html:
This one can also edit html on the fly. But it is a lot simpler and only offers link rewriting, without minification and the other fancy features of mod_pagespeed. Sometimes that is all you need, and it probably means there is less risk that the page will be broken by rewriting.

URL Rewrite for relative URLs on localhost

I just moved my website to a folder (called test) inside the root of my Apache web server and now I am getting 404 errors while the page tries to fetch different resources.
When I looked at the URLs, they seem to be pointing to the root of the server, as they are root-relative URLs (as opposed to pointing to the folder [test] inside the root).
For example: when the index page of my site has a reference to an image like
<img src="/images/img-1.jpg" alt="Image 1" />
The page when executed, tries to fetch the image from the following url
http://localhost/slider-images/img-1.jpg
instead of accessing the image from the url
http://localhost/slider-images/test/img-1.jpg
I have been trying various options such as setting a RewriteBase, but that doesn't seem to work!
What works instead is changing the resource path to a relative one, as shown below, but that is too much work to do for all the resources!
<img src="./images/img-1.jpg" alt="Image 1" />
Any help is much appreciated.
Thanks
Update
A similar question has been posted earlier, but I don't see a response that solved the issue:
Converting relative URL requests to absolute URL request using mod_rewrite
After hours of troubleshooting, I found a workaround which can be used as a temporary fix.
Posting it here to help those who possibly could be having the same problem.
Simply changed the DocumentRoot in httpd.conf file to point to the sub-directory where I had moved my website files for testing.
Hope this helps those who can make use of this fix.
I would still be really interested to know of a way to address the root-relative URL rewrites via htaccess and hopefully someone here would be able to help.
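One possible direction, offered only as an untested sketch: a .htaccess file in the server's document root (not in the test folder) could internally map root-relative requests into the subfolder when a matching file exists there. The folder name test comes from the question; everything else is an assumption.

```apache
RewriteEngine On

# Skip requests that already point into the subfolder
RewriteCond %{REQUEST_URI} !^/test/
# Only rewrite when the file actually exists under /test
RewriteCond %{DOCUMENT_ROOT}/test%{REQUEST_URI} -f
# e.g. /images/img-1.jpg is served from /test/images/img-1.jpg
RewriteRule ^ /test%{REQUEST_URI} [L]
```

Note that this affects every site served from that document root, so it is best kept to a local test setup.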
Thanks!