Does there exist a web service or set of scripts that will tell you whether your web page is badly configured if your goal is to be internationally friendly?
To be more precise, I'm wondering if something like this exists:
Checking URL: http://www.example.com
GET / HTTP/1.0
Accept-Charset: utf-8
...
HTTP/1.0 200 OK
Content-Type: text/html; charset=iso-8859-1
..<?xml version="1.0" encoding="utf-8" ?>
WARNING: Header/document conflict: your server claims to return iso-8859-1, but
includes octet values outside the legal range. This can happen when your documents
are saved with a different character set than your web server is configured to serve.
From my understanding, it's unlikely that this will help me make a website that will allow people to post in Japanese or Hebrew, but it might help my English websites reach a larger international audience.
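In case it helps to make this concrete, the header-vs-document comparison is easy to script; here is a minimal sketch in PHP (the URL and regexes are illustrative assumptions, and it only covers the XML prolog and meta-charset declaration styles):

<?php
// Minimal sketch: compare the charset the server claims in its
// Content-Type header with the charset declared inside the document.
$url = 'http://www.example.com/'; // placeholder URL

$body = file_get_contents($url);

// file_get_contents() populates $http_response_header for http:// URLs.
$headerCharset = null;
foreach ($http_response_header as $header) {
    if (preg_match('/^Content-Type:.*charset=([\w-]+)/i', $header, $m)) {
        $headerCharset = strtolower($m[1]);
    }
}

// Look for an in-document declaration (XML prolog or HTML meta charset).
$docCharset = null;
if (preg_match('/<\?xml[^>]*encoding=["\']([\w-]+)["\']/i', $body, $m) ||
    preg_match('/<meta[^>]*charset=["\']?([\w-]+)/i', $body, $m)) {
    $docCharset = strtolower($m[1]);
}

if ($headerCharset && $docCharset && $headerCharset !== $docCharset) {
    echo "WARNING: header says $headerCharset but document declares $docCharset\n";
} else {
    echo "No header/document charset conflict detected.\n";
}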
I believe the W3C validator does it, but maybe not to the extent you are looking for...
Related
I'm currently having trouble with the W3C markup validation service https://validator.w3.org and the use of HTTPS. When I enter the website address with https there, I get the following response:
Sorry! This document cannot be checked.
Together with an error 500 saying that it can't connect to the site. Also, on the site I have a link that takes the visitor to the validator and shows that the site has been validated. When clicking the link without HTTPS everything works, but with HTTPS I get the message
Sorry! This document cannot be checked. No Referer header found!
which I believe is because the secure connection doesn't send the Referer header, right?
Now, how can I use HTTPS and avoid these problems with the validation?
Please always directly use https://validator.w3.org/nu/ (the current W3C HTML Checker) instead of https://validator.w3.org/ (the legacy W3C Markup Validator).
The HTML Checker is able to check documents at https URLs just fine. So if you find an https site that it doesn't work with as expected, then that's likely a bug I need to fix. (I maintain the checker, and recently updated it to get HTTPS support using HTTP Components HttpClient 4.4, the latest Apache HTTP client library, including full support for HTTPS sites that use SNI.)
A note about which W3C tool to use for checking HTML documents
On the W3C backend, when you use the https://validator.w3.org/ legacy Markup Validator to check documents with <!DOCTYPE html> doctypes, it just hands off the request to the same backend that directly drives the https://validator.w3.org/nu/ HTML Checker. But the HTML Checker has a UI with more features, and using it from https://validator.w3.org/nu/ is faster.
We (the W3C) plan to swap those two around eventually (that is, move the current HTML Checker to https://validator.w3.org/ and move the legacy Markup Validator to https://validator.w3.org/legacy/ or some such), but it will be a while yet before that happens. So in the meantime, as I said, I suggest always just doing all your HTML checking from the https://validator.w3.org/nu/ site.
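If you want to automate your checking, the checker also exposes a web service. A minimal PHP sketch (assuming the checker's doc and out=json query parameters, and a placeholder page URL) might look like this:

<?php
// Hedged sketch: query the HTML Checker's web service for a page and
// print any messages it returns. Assumes the "doc" (URL to check) and
// "out=json" query parameters; the page URL is a placeholder.
$page = 'https://example.com/';
$api  = 'https://validator.w3.org/nu/?out=json&doc=' . urlencode($page);

$context = stream_context_create([
    'http' => ['header' => "User-Agent: my-checker-script\r\n"],
]);
$result = json_decode(file_get_contents($api, false, $context), true);

foreach ($result['messages'] ?? [] as $message) {
    echo strtoupper($message['type']), ': ', $message['message'], "\n";
}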
There seems to be a bug in the W3C Nu validator, where the "referer" value is not processed fully. :-/
That is, the code for their badge <a target="_blank" href="http://validator.w3.org/check/referer"><img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Transitional" title="Valid XHTML 1.0 Transitional" style="height: 31px; width: 88px;" /></a>
does not validate my nested sub-page when the badge in the footer of that deep sub-page is clicked, but just the root page of the whole website instead. Sad. :-/
And the alternative parameterized .../check?uri=referer URL has the same issue. :-/
I have my website configured to serve static content using gzip compression, like so:
<link rel='stylesheet' href='http://cdn-domain.com/css/style.css.gzip?ver=0.9' type='text/css' media='all' />
I don't see any website doing anything similar. So, the question is, what's wrong with this? Am I to expect shortcomings?
More precisely, as I understand it, most websites are configured to serve normal static files (.css, .js, etc.) and gzipped content (.css.gz, .js.gz, etc.) only if the request comes with an Accept-Encoding: gzip header. Why should they be doing this when all browsers support gzip just the same?
PS: I am not seeing any performance issues at all because all the static content is gzipped prior to uploading it to the CDN which then simply serves the gzipped files. Therefore, there's no stress/strain on my server.
Supporting Content-Encoding: gzip isn't a requirement of any current HTTP specification; that's why there is a trigger in the form of the request header.
In practice? If your audience is using a web browser and you are only worried about legitimate users, then the chance that anyone will actually be affected by only having preprocessed gzipped versions available is very, very slim to none. It's a remnant of a bygone age. Browsers these days should handle being force-fed gzipped content even if they don't request it, as long as you also provide them correct headers for the content being given to them.

It's important to realise that an HTTP request/response is a conversation, and that most of the headers in a request are just that; a request. For the most part, the server on the other end is under no obligation to honor any particular headers, and as long as it returns a valid response that makes sense, the client on the other end should do its best to make sense of what was returned. This includes enabling gzip if the server responds that it has used it.
If your target is machine consumption, however, then be a little wary. People still sometimes think it's a smart idea to write their own HTTP/SMTP/etc. parsers, even though the topic has been done to death in multiple libraries for pretty much every language out there. The established libraries should all support gzip just fine, but hand-rolled parsers usually won't.
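To make the conventional setup concrete, here is a hedged PHP sketch of the Accept-Encoding negotiation described above (the file paths are placeholders, and a real site would usually let the web server itself do this): it serves the precompressed copy only to clients that advertise gzip support, with the headers that tell the browser what it is getting:

<?php
// Hedged sketch of conventional Accept-Encoding negotiation.
// style.css.gz is assumed to be a precompressed copy of style.css.
$plain   = __DIR__ . '/css/style.css';
$gzipped = $plain . '.gz';

$acceptsGzip = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false;

header('Content-Type: text/css');
header('Vary: Accept-Encoding'); // caches must key on the request header

if ($acceptsGzip && is_file($gzipped)) {
    // Tell the browser the body is gzip-compressed so it inflates it.
    header('Content-Encoding: gzip');
    readfile($gzipped);
} else {
    readfile($plain);
}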
I have a web server hosted with 1and1, which is evidently located in Germany, so if I try to do an XMLHttpRequest GET for data from Google or Facebook, I am presented with German return data because their site presumes I am a German user.
Does anyone know if it is a server setting which needs to be changed, or is Facebook recognising the IP location?
If the resource is available in two or more languages, the server must decide which version to serve. It often does this by examining the Accept-Language HTTP header. Probably the header in the request issued by your server says that it accepts any language, so the remote server prefers to send German rather than English based on your server's IP. Try adding the header manually to your request:
Accept-Language: en
so your AJAX call will look like this:
xmlhttpobject.setRequestHeader('Accept-Language', 'en');
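If the request is actually being made from the server itself rather than from a browser (the question suggests a server-side fetch; PHP with curl is an assumption here), the same header does the job:

<?php
// Hedged sketch: the server-side equivalent, assuming PHP and curl.
$ch = curl_init('https://www.example.com/some-data'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Ask the remote site for English explicitly, instead of letting it
// guess a language from the server's (German) IP address.
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Accept-Language: en']);
$response = curl_exec($ch);
curl_close($ch);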
I have a web application written in PHP. The templating engine is Smarty. My question is very simple, yet the answer does not seem to be that easy, because I have searched the hell out of it to no avail.
When I telnet to port 80 and run the following command:
GET /some_directory_on_my_server/?""><SCRIPT>alert(123)</SCRIPT>
The server responds with an HTML page. When I save this HTML page and open it in a browser, the alert(123) fires at the top of the page, which means that the site is vulnerable to Cross-Site Scripting (XSS).
My question is: how can I access the actual URL entered by the user in order to sanitize it? When it comes to user input sanitization for forms or database queries, the scenario seems much easier, because you actually have a variable on hand to manipulate. But in the case of the actual URL entered by the user in a browser, how can I get hold of the URL itself to sanitize it?
For your information, I have already read all the modules which provide library functions for XSS prevention, but none gives me an example of how to deal with an actual URL XSS attack. By the way, magic_quotes_gpc in my PHP configuration is already turned off. What should I do now? Any thoughts?
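To make the question concrete: as far as I can tell, PHP exposes the raw request URL through $_SERVER, so I imagine the fix looks something like this hedged sketch (assuming the template echoes the current URL somewhere), though I'm not sure this is the right place to do it:

<?php
// Hedged sketch: the "variable on hand" for the request URL in PHP.
// $_SERVER['REQUEST_URI'] holds the path and query string exactly as
// the client sent them, including any injected <SCRIPT> payload.
$requestUri = $_SERVER['REQUEST_URI'];

// Unsafe: echoing the raw URL reflects the payload into the page.
// echo $requestUri;

// Safer: HTML-escape on output so the payload renders as inert text.
echo htmlspecialchars($requestUri, ENT_QUOTES, 'UTF-8');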
I'm working on a web server that I didn't totally set up, and I'm trying to figure out which parts of a web page are being sent encrypted and which aren't. Firefox tells me that parts of the page are encrypted, but I want to know what, specifically, is encrypted.
The problem is not always bad links in your page.
If you link to resources at an external site using https://, and the external site then does its own HTTP redirect to non-SSL pages, that will break the SSL lock on your page.
But when you view the source or the information in the media tab, you will not see any http://, because your page is properly using only https:// links.
As suggested above, the Firebug Net tab will show this and any other problems. Follow these steps:
Install the Firebug add-on into Firefox if you don't already have it, and restart FF when prompted.
Open Firebug (F12 or the little insect menu to the right of your search box).
In Firebug, choose the "Net" tab. Hit "Enable" (text link) to turn it on.
Refresh your problem page without using the cache by hitting Ctrl-Shift-R (or Command-Shift-R in OS X). You will see the "Net" tab in Firefox fill up with a list of each HTTP request made.
Once the page is done loading, hover your mouse over the left column of each HTTP request shown in the Net tab. A tooltip will appear showing you the actual link used. It will be easy to spot any that are http:// instead of https://.
If any of your links resulted in an HTTP redirect, you will see "301 Moved Permanently" in the HTTP status column, and another HTTP request will appear just below it for the new location. If the problem was due to an external redirect, that's where the evidence will be: the new location's request will be plain HTTP.
Expand any of those 301 relocations with the plus sign at the left, and review the response headers to see what is going on. The Location: header will tell you the new location the external server is asking browsers to use.
Make note of this info in the redirect, then send a friendly, polite email to the external site in question and ask them to remove the https:// -> http:// redirects for you. Explain how it's breaking the SSL on your site, and ideally include a link to the broken page if possible, so that they can see the error for themselves. (This will spur faster action than if you just tell them about the error.)
Here is sample output from Firebug for the external redirect issue. In my case I found that a page calling https:// data feeds was getting the feeds rewritten by the external server to http://.
I've renamed my site to "mysite.example.com" and the external site to "external.example.com", but otherwise left the headers intact. The request headers are shown at the bottom, below the response headers. Note that I'm requesting an https:// link from my site, but getting redirected to an http:// link, which is what was breaking my SSL lock:
Response Headers
Server nginx/0.8.54
Date Fri, 07 Oct 2011 17:35:16 GMT
Content-Type text/html
Content-Length 185
Connection keep-alive
Location http://external.example.com/embed/?key=t6Qu2&width=940&height=300&interval=week&baseAtZero=false
Request Headers
Host external.example.com
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Accept */*
Accept-Language en-gb,en;q=0.5
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Referer https://mysite.example.com/real-time-data
Cookie JSESSIONID=B33FF1C1F1B732E7F05A547A9CB76ED3
Pragma no-cache
Cache-Control no-cache
So, the important thing to note is that in the Response Headers (above), you are seeing a Location: that starts with http://, not https://. Your browser will take this into account when figuring out whether the lock is valid, and report only partially encrypted content! (This is actually an important browser security feature to alert users to potential XSRF and/or phishing attacks.)
The solution in this case is not something you can fix on your site; you have to ask the external site to stop their redirect to http. Often this was done on their side for convenience, without realizing the consequence, and a well-written, polite email can get it fixed.
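If you want to check for this kind of redirect without Firebug, the probe is easy to script. This hedged PHP sketch (with a placeholder URL) requests the https:// resource without following redirects and inspects the Location header:

<?php
// Hedged sketch: detect an https -> http redirect on an external resource.
$url = 'https://external.example.com/embed/'; // placeholder URL

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in output
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow redirects
$headers = curl_exec($ch);
$status  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($status >= 300 && $status < 400
    && preg_match('/^Location:\s*(http:\/\/\S+)/mi', $headers, $m)) {
    echo "SSL-breaking redirect: $url -> {$m[1]}\n";
} else {
    echo "No https -> http redirect detected.\n";
}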
For each element loaded in the page, check its scheme:
If it starts with https://, it is encrypted.
If it starts with http://, it is not encrypted.
(You can see a relatively complete list in Firefox by right-clicking on the page and selecting "View Page Info", then the "Media" tab.)
EDIT: FF only shows images and multimedia elements. There are also JavaScript and CSS files which have to be checked. And Firebug is a good tool to find what you need.
Some elements may not list http or https; in this case, whichever scheme was used for the page will be used for those items, i.e. if the page was requested over SSL then these resources will come encrypted, while if the page was requested without SSL they will come unencrypted. Fiddler in Internet Explorer may also be useful in tracking down some of this information.
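If you would rather script this check than click through dialogs, a hedged PHP sketch along these lines (placeholder URL, deliberately rough regex; a real tool should use a DOM parser) can list the scheme of every src/href on a page:

<?php
// Hedged sketch: classify the scheme of every src/href attribute on a page.
$page = 'https://www.example.com/'; // placeholder URL
$html = file_get_contents($page);

// Deliberately rough attribute matching; use a DOM parser for real work.
preg_match_all('/(?:src|href)=["\']([^"\']+)["\']/i', $html, $m);
foreach (array_unique($m[1]) as $link) {
    if (strpos($link, 'http://') === 0) {
        echo "NOT ENCRYPTED: $link\n";
    } elseif (strpos($link, 'https://') === 0) {
        echo "encrypted: $link\n";
    } else {
        echo "inherits page scheme: $link\n"; // relative or scheme-relative
    }
}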
Sniff the packets; that'll tell you really quickly. Wireshark is a good program for such a task.
Can Firebug do this?
Edit: Looks like Firebug will also do this using the "Net" panel, which also gives you some other interesting statistics.
The best tool I have found for detecting http links on a https connection is Fiddler. It's also great for many other troubleshooting efforts.
I use the FF plugin HTTPFox for this.
https://addons.mozilla.org/en-us/firefox/addon/httpfox/