OK, let's start from scratch. I just realized this is Apache and not phpMyAdmin, my bad.
Anyway, I needed some sort of file storage accessible through the web. I deleted the index.html so that Apache lists the other files in /var/www. Now if I open the JSON file (UTF-8 without BOM) in the browser, special characters like ä, ü, ö are not displayed correctly (normal characters are). If I download the file, everything is correct on my system.
So the file itself is fine, but the stream from Apache to the web is not UTF-8, or something like that. That is what I would like to change.
I need this for an Android app, where I parse the content of the JSON file with the Volley library. There it also gets the special characters wrong.
I hope this is more useful than before; my apologies for that.
The only thing that is wrong is that your browser doesn't know it should interpret the UTF-8 encoded JSON file as UTF-8. Instead it falls back to its default Latin-1 interpretation, in which certain characters will screw up, because it's using the wrong encoding to interpret the file.
That is all. The file will appear fine if it is interpreted using the correct encoding, UTF-8 in this case.
Use the View → Encoding menu of your browser to force it to UTF-8 and see it work.
Why doesn't the browser use UTF-8? Because there's no HTTP Content-Type header telling it to do so. Why is there no appropriate HTTP header set? Because you didn't tell your web server that it should set this header for .json files. How do you tell Apache to do so? By adding this line in an .htaccess file:
AddCharset UTF-8 .json
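To confirm the change took effect, you can inspect the response headers; this is a minimal check, and http://yourserver/data.json is a placeholder for the real file:
# Reload Apache, then verify the charset is announced
# (the URL below is a placeholder):
curl -sI http://yourserver/data.json | grep -i content-type
# Expected something like:
# Content-Type: application/json; charset=utf-8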
I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
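For reference, the full crawl combining these options would look roughly like this (https://example.com/ is a stand-in for the real site):
# Mirror the site, appending .html to pages served as HTML and
# replacing characters like ? that some hosts reject:
wget --recursive --adjust-extension --restrict-file-names=windows https://example.com/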
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
Of course, if not all query-parameter combinations actually occur in the dumped files, not all of the redirect lines need to be included.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
  Content-Type: application/rss+xml
I hope this helps somebody.
When I try to GET images that have special characters like ấ in the filename, I can't read the files on the frontend. Navigating to the URL directly also throws a 404 error.
My server OS is CentOS, and my site is running on Apache with Node.js. I was wondering if I have to somehow change the file encoding in order to read images with special characters. All normal images work fine; it just seems not to recognize the images with special characters at all.
There are a lot of files, which unfortunately makes renaming them all not an option for me. If anyone knows what I have to do to get the files into the correct encoding, please let me know.
Update: I've discovered a way to find the files, but I don't understand the encoding pattern. For example, a file known as kt-giấy-2.jpg can be viewed directly using kt-gia%CC%82%CC%81y-2.jpg. Does anyone know what kind of encoding this is? It doesn't line up with standard URI encoders.
For anyone that has this issue: my problem was that I transferred the files from Mac OS X to CentOS directly, as a zip file uploaded through cPanel. The files are fine, but you need to use convmv to rename them. The files were readable, but they weren't in the expected Unicode normalization form.
Mac OS X stores filenames in NFD (decomposed form), while most other systems use NFC (composed form). That's also the pattern in the question: %CC%82 and %CC%81 are the combining circumflex and acute accents of a decomposed ấ.
Use this command in the directory containing the files you want to convert:
convmv -r -f utf8 -t utf8 --nfc --notest .
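Note that convmv only prints what it would rename unless --notest is passed, so you can preview the changes first:
# Dry run (default behaviour without --notest): shows the planned
# NFD-to-NFC renames without touching anything.
convmv -r -f utf8 -t utf8 --nfc .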
I'm writing an Apache2 module.
By default, when viewed in a web browser, the module should print only the first lines of a large file, converted to HTML.
If the user chooses 'Save as...', the whole raw file should be downloaded.
Is it possible to detect this choice on the server side? (For example, is there a specific HTTP header set?)
Note: I would like to avoid any parameter in the GET URL (e.g. "http://example.org/file?mode=raw").
Pierre
Added my own answer to close the question: as #alexeyten said, there is no difference. I ended up with JavaScript code that alters the index.html generated by Apache.
I have an application that generates XML files, and they might contain special characters. My problem is that Apache will not serve the XML file if the special character in the URL is percent-encoded.
Example:
File ABCö.xml is accessible via http://host/path/ABCö.xml, but if accessed with the encoded URL http://host/path/ABC%F6.xml, Apache gives me a 404.
Is this a setting in httpd.conf, or do I need some rewriting to make the XML files accessible by both URLs?
You may have an encoding issue.
Most (all?) modern browsers use UTF-8 when encoding special characters in URLs that the user inputs directly into the address bar.
So when you enter ABCö.xml say in Firefox, it will transform ö into its UTF-8 multi-byte representation, so the end result will be
ABC%C3%B6.xml
and not the single-byte
ABC%F6.xml
Only one of them will work. Check which encoding is used in your file name on disk.
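A quick way to check, assuming shell access to the directory holding the file, is to dump the raw bytes of the filename:
# UTF-8 encodes ö as the two bytes c3 b6; Latin-1 uses the single byte f6.
ls ABC*.xml | xxd
If the dump shows 41 42 43 c3 b6, the filename is UTF-8 and the %C3%B6 form will match; if it shows 41 42 43 f6, it is Latin-1 and %F6 will.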
I'm trying to make a page that will show Arabic/Hebrew in the URL.
For example: www.mydomain.co.ar/אבא.php
The problem is, when I upload the page to the Apache server and try to browse to that
page either with "www.mydomain.co.ar/אבא.php" or the percent-encoded way
"www.mydomain.co.ar/%D7%90%D7%91%D7%90.php", I get a 404.
When I list the directory, Apache sees àáà.php.
I know there is a way to show non-ASCII in URLs; Wikipedia has been doing it for ages.
My thought is maybe an .htaccess rewrite? If so, how can I accomplish that?
It looks like you have to tell Apache that the file system is encoded in UTF-8 (or whatever encoding applies). Maybe starting Apache with a UTF-8 locale active (LC_CTYPE=ar.utf8 or similar) helps there.
Wikipedia parses the URLs in the PHP software (and then asks the database for the right article), so that does not necessarily show how Apache handles this.
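As a sketch of the locale approach: on Debian/Ubuntu, the environment Apache starts with lives in /etc/apache2/envvars (the path and the he_IL.UTF-8 locale here are assumptions; adjust both for your distribution and language):
# In /etc/apache2/envvars, replace the default "export LANG=C"
# with a UTF-8 locale so Apache interprets filenames as UTF-8:
export LANG=he_IL.UTF-8
export LC_ALL=he_IL.UTF-8
# Then restart Apache for the change to take effect:
sudo service apache2 restart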