Showing non-ASCII characters in URL - Apache

I'm trying to make a page that will show Arabic/Hebrew in the URL.
For example: www.mydomain.co.ar/אבא.php
The problem is that when I upload the page to the Apache server and try to browse to it,
either with "www.mydomain.co.ar/אבא.php" or the percent-encoded form
"www.mydomain.co.ar/%D7%90%D7%91%D7%90.php", I get a 404.
When I then list the directory, Apache shows the file as àáà.php.
I know there is a way to show non-ASCII characters in a URL; Wikipedia has been doing it for ages.
My thought is that maybe an .htaccess rewrite would do it? If so, how can I accomplish that?

Looks like you have to tell Apache that the file system is encoded in UTF-8 (or whatever encoding it actually uses). Maybe starting Apache with a UTF-8 locale active (LC_CTYPE=ar.utf8 or similar) helps there.
Wikipedia parses the URLs in its PHP software (and then asks the database for the right article), so its behavior does not necessarily tell you how Apache handles this.
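To see why the request might miss the file, here is a minimal Python sketch comparing the bytes a browser sends with the bytes that may be on disk. The assumption (not confirmed in the question) is that the file was uploaded with a single-byte Hebrew name such as ISO-8859-8; that would explain the àáà.php listing.

```python
from urllib.parse import quote, unquote

filename = "אבא.php"

# Browsers percent-encode non-ASCII path characters as UTF-8 bytes.
encoded = quote(filename)
print(encoded)  # %D7%90%D7%91%D7%90.php

# Assumption: the uploaded file name was stored in ISO-8859-8, not UTF-8.
on_disk = filename.encode("iso-8859-8")

# Reading those bytes as Latin-1 reproduces the listing from the question.
listing = on_disk.decode("latin-1")
print(listing)  # àáà.php

# The requested bytes and the stored bytes differ, hence the 404.
requested = unquote(encoded).encode("utf-8")
print(requested == on_disk)  # False
```

If that assumption holds, renaming the file on the server so that its name is stored as UTF-8 bytes may be enough, with no rewrite rule needed.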

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
If not all query parameter combinations are actually used in the dumped files, not all of the redirect lines need to be included of course.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
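If there are many pages, the rules above could be generated rather than typed by hand. This is only a sketch: redirect_rules is a hypothetical helper, and it assumes the parameters appear in the dumped file names in the same order as in the list you pass in.

```python
from itertools import combinations


def redirect_rules(page, params):
    """Build one Netlify rewrite rule per non-empty subset of params,
    most specific (largest subset) first."""
    rules = []
    for size in range(len(params), 0, -1):
        for combo in combinations(params, size):
            # Query conditions, e.g. "param1=:param1 param2=:param2"
            conditions = " ".join(f"{p}=:{p}" for p in combo)
            # File name suffix as written by wget, e.g. "param1=:param1&param2=:param2"
            suffix = "&".join(f"{p}=:{p}" for p in combo)
            rules.append(f"{page} {conditions} {page}#{suffix}.html 200!")
    return rules


for rule in redirect_rules("/mypage.php", ["param1", "param2"]):
    print(rule)
```

For the two parameters above this prints the same three lines shown earlier.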
I wrote a _headers file to set the content type of my RSS file:
/rss.php
Content-Type: application/rss+xml
I hope this helps somebody.

-Apache- files from "website" not UTF8

OK, let's start from scratch. I just realized this is Apache and not phpMyAdmin, my bad.
Anyway, I needed some sort of file storage accessible through the web. I deleted the index.html to list the other files in /var/www. Now if I open the JSON file (UTF-8 without BOM) in the browser, special characters like ä, ü, ö are not displayed correctly (normal characters are). If I download the file, everything is correct on my system.
So the file itself is fine, but the stream from Apache to the web is not in UTF-8, or something like that. That is what I would like to change.
I need this for an Android app, where I parse the content of the JSON file with the Volley library. But there the special characters come out wrong as well.
I hope this is more useful than before. My apologies for that.
The only thing that is wrong is that your browser doesn't know it should interpret the UTF-8 encoded JSON file as UTF-8. Instead it falls back to its default Latin-1 interpretation, in which certain characters will screw up, because it's using the wrong encoding to interpret the file.
That is all. The file will appear fine if it is interpreted using the correct encoding, UTF-8 in this case.
Use the View → Encoding menu of your browser to force it to UTF-8 and see it work.
Why doesn't the browser use UTF-8? Because there's no HTTP Content-Type header telling it to do so. Why is there no appropriate HTTP header set? Because you didn't tell your web server that it should set this header for .json files. How do you tell Apache to do so? By adding this line in an .htaccess file:
AddCharset UTF-8 .json
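The effect is easy to reproduce outside the browser. A small Python sketch (the JSON content is made up for illustration) that decodes the same bytes both ways:

```python
import json

# Hypothetical file content; ensure_ascii=False keeps the umlauts as real characters.
payload = json.dumps({"umlauts": "äöü"}, ensure_ascii=False)
raw = payload.encode("utf-8")  # the bytes the server actually sends

# What a client shows when it assumes Latin-1 because no charset was declared.
wrong = raw.decode("latin-1")
# What it shows once it knows the bytes are UTF-8.
right = raw.decode("utf-8")

print(wrong)  # {"umlauts": "Ã¤Ã¶Ã¼"}
print(right)  # {"umlauts": "äöü"}
```

The bytes are identical in both cases; only the declared encoding changes how they are read, which is why the downloaded file looks fine.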

How do I rewrite URLs with Nginx admin / Apache / Wordpress

I have the following URL format:
www.example.com/members/admin/projects/?projectid=41
And I would like to rewrite them to the following format:
www.example.com/avits/projectname/
Project names do not have to be unique when a user creates them therefore I will be checking for an existing name and appending an integer to the end of the project name if a project of the same name already exists. e.g. example.project, example.project1, example.project2 etc.
I am happy setting up the GET request to query the database by project name; however, I am having huge problems setting up these pretty URLs.
I am using Apache with Nginx Admin installed, which means that all static content is served via Nginx without the overhead of Apache.
I am totally confused as to whether I should be employing an nginx rewrite rule in my nginx.conf file or standard rewrites in my .htaccess file.
To confuse matters further, although this is a rather large custom application, it is built on top of a WordPress backbone for easy blogging functionality, meaning that I also have the built-in WordPress rewrite module at my disposal.
I have tried all three methods with absolutely no success. I have read a lot on the matter but simply cannot seem to get anything to work. I am certain this is purely down to a complete lack of understanding of URL rewriting. Combined with the fact that I don't know which type of rewriting applies in my case, this means that I am doing nothing more than going round in circles.
Can anyone clear up this matter for me and explain how to rewrite my URLs in the manner described above?
Many thanks.
If you are proxying all the non static file requests to Apache, do the rewrites there - you don't need to do anything on nginx as it will just pass the requests to the back end.
The problem with what you are proposing is that it's not actually a rewrite. A rewrite takes the first URL and just changes it around, or moves the user to another location.
What you need actually takes logic to look up the project name from the project ID.
For example you can rewrite:
www.example.com/members/admin/projects/?projectid=41
To:
www.example.com/avits/41/
That is fairly easy, but you then need to map that /41/ to /projectname/ in your app code, because a URL rewrite can't do that.
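The app-side part could look roughly like this. It is a Python sketch only, with a dict standing in for the database; unique_name, create_project and resolve are hypothetical names, and a WordPress-based app would do the equivalent in PHP.

```python
# Hypothetical in-memory stand-in for the projects table: name -> project ID.
projects = {}


def unique_name(desired, taken):
    """Return desired, or desired with 1, 2, ... appended until it is unused."""
    if desired not in taken:
        return desired
    n = 1
    while f"{desired}{n}" in taken:
        n += 1
    return f"{desired}{n}"


def create_project(project_id, desired):
    """Store the project under a unique name and return that name."""
    name = unique_name(desired, projects)
    projects[name] = project_id
    return name


def resolve(path):
    """Map '/avits/<projectname>/' to a project ID, or None if unknown."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "avits":
        return projects.get(parts[1])
    return None


create_project(41, "example.project")
create_project(42, "example.project")
print(resolve("/avits/example.project/"))   # 41
print(resolve("/avits/example.project1/"))  # 42
```

The rewrite layer then only has to hand every /avits/... request to one script, and the name lookup stays in the application.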

Apache URL Redirect Alternatives

One of my clients (before I came along) decided to use htaccess redirects as their form of URL shortening/search engine friendly URLs. They have literally thousands of them.
The new version of the site now has friendly URLs, but they aren't equivalent to the old redirects, so the redirects are still needed.
My question to you all is: is there another way than populating this file with thousands of lines of "Redirect /folder1 /folder2"?
Thanks
If you cannot make simple rules to catch all of them, as in @chris henry's solution, you can use the RewriteMap feature of mod_rewrite. You can write these thousands of rules in a text file, then convert that text file into a hash file, and mod_rewrite will try to match the URL against it (a hash file lookup is quite fast). After that, mod_rewrite can generate a 301 redirect with the [L,R=301] flags.
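Building the text map from the existing file can be scripted. A rough Python sketch, assuming the old rules are plain "Redirect [status] /from /to" lines; redirects_to_map is a hypothetical helper and ignores anything else in the file:

```python
def redirects_to_map(htaccess_text):
    """Turn 'Redirect [status] /from /to' lines into 'key value' lines,
    the plain-text format RewriteMap reads."""
    entries = []
    for raw in htaccess_text.splitlines():
        parts = raw.split()
        # Skip comments, blank lines and any other directive.
        if len(parts) < 3 or parts[0].lower() != "redirect":
            continue
        # The last two fields are the old path and the new target.
        entries.append(f"{parts[-2]} {parts[-1]}")
    return "\n".join(entries) + "\n"


# Made-up sample input for illustration.
sample = """\
# legacy short URLs
Redirect /folder1 /folder2
Redirect 301 /promo /products/spring-sale
"""

print(redirects_to_map(sample), end="")
```

The resulting text file can then be converted to a DBM hash with httxt2dbm. Note that the RewriteMap directive itself has to be declared in the server or virtual host config, not in .htaccess.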
Yep, look at using the Apache config (httpd.conf or httpd-vhosts.conf) to set up site-wide folder aliasing. E.g.:
Alias /folder1 c:/www/folder2
Look at http://httpd.apache.org/docs/2.0/mod/core.html#directory for more info.
Depending on how different the URLs being redirected are, one solution might be to come up with a rewrite rule that covers all of them, and maintain the short / long URLs in your application, or even a database.

Updating Files on Apache

I'm having trouble with my Apache Web Server. I have a folder (htdocs\images) where I have a number of images already in place. I can browse them and see them on my web server (and access them via HTML). I added a new image in there today, and went to browse to it, and it can't be found. I double and triple checked the path and everything. I even restarted Apache and that didn't seem to help.
I'm really confused as to what's going on here. Anybody have any suggestions?
Thank you.
Edit: I just turned on the ability for the images directory to be listed, browsed to it (http://127.0.0.1/images/) and I was able to see all the previous images that were in the folder, but not the new one.
Turn directory indexes on for htdocs\images, remove (or move out of the way) any index.* files, and point your browser at http://yoursite/images/
That should give you a full listing of files in that directory. If the file you're looking for isn't there, then Apache is looking at a different directory than you think it is. You'll have to search your httpd.conf for clues -- DocumentRoot, Alias, AliasMatch, Redirect, RedirectMatch, RewriteRule -- there are probably dozens of apache directives that could be causing the web server to get its documents from somewhere other than where you think it's looking.
Make sure the caSE and spelling are 100% correct.
There is no magic in programming (some may disagree :), so look for silly errors. Wrong server? Case of your letters? Wrong extension?
There's a chance it could be due to the cookies stored on your device. I would delete all cookies for the website you're working on before you refresh again.
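A quick way to rule out a case mismatch is to compare the name you are requesting with what is really in the folder. A small Python sketch; near_matches is a hypothetical helper, and the temporary directory and file name are made up for the demo:

```python
import os
import tempfile


def near_matches(directory, wanted):
    """List names in directory that equal wanted except for letter case."""
    return [
        name
        for name in os.listdir(directory)
        if name.lower() == wanted.lower() and name != wanted
    ]


# Demo with a throwaway directory standing in for htdocs\images.
with tempfile.TemporaryDirectory() as images:
    open(os.path.join(images, "Photo.JPG"), "w").close()
    found = near_matches(images, "photo.jpg")
    print(found)  # ['Photo.JPG']
```

An empty result together with a missing file would point back at the "Apache is looking at a different directory" explanation.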